Reinventing the Python wheel
It is no secret that the Python packaging world is at something of a crossroads; there have been debates and discussions about the packaging landscape that started long before our 2023 series describing some of the difficulties. There has been progress since then—and incremental improvements all along, in truth—but a new initiative is looking to overhaul packaging for the language. At PyCon US 2025, Barry Warsaw and Jonathan Dekhtiar gave a presentation on the WheelNext project, a community effort that aims to improve the experience for users and providers of Python packages while also working with toolmakers and other parts of the ecosystem to "reinvent the wheel". While the project's name refers to Python's wheel binary distribution format, its goals stretch much further than the format itself.
Warsaw started things off by noting that, while he and Dekhtiar both work for NVIDIA, WheelNext is a "community-driven initiative that spans all of the entire Python community". He put up profile pictures from around 30 different people who had already been contributing to the WheelNext GitHub repository; "it's really open to anybody", Warsaw said.
Before getting into the meat of WheelNext, it is important to "celebrate the wins for the Python packaging community", he said, showing some screen shots that had been taken a few weeks before the talk in May. The numbers of projects (600K+), releases (nearly 7 million), files (14.1 million), and users (920K+) listed on the Python Package Index (PyPI) home page are eye-opening, for example. The PyPI Stats page showed numbers that "certainly blew me away; 1.6 billion downloads a day [on that day], 20 billion downloads a month". He showed some other graphs that illustrated the "prodigious amount of data and packages that are being vended by PyPI".
It is clear that Python packages, and wheels in particular, have been extremely successful. They are used "every day, in many many different ecosystems, in corners of the Python world that we're not even aware of" and that is the result of a lot of hard work over decades by many people, organizations, and companies. "Wheels pretty much serve the needs of most users most of the time ... so that's awesome."
Over time, though, as Python reaches more development communities and additional use cases arise, "cracks are beginning to show".
WheelNext
![Barry Warsaw](https://static.lwn.net/images/2025/pycon-warsaw-sm.png)
His elevator pitch for WheelNext: "An incubator for thinking about the problems and solutions of the packaging community for the next, let's say, ten years, five years, 30 years". WheelNext goes well beyond just the wheel format; "we're really talking about evolving the Python-packaging ecosystem". Among the companies, organizations, and individuals involved are all of the different stakeholders for packaging, beyond just users: "tool makers, environment managers, installers, package managers", many tools that consume wheels, people building packages for PyPI, and so on. All of those are affected by what is working well—and not so well—for packaging.
He showed a slide of the logos of a dozen or so communities and projects that are part of the effort, noting that it was just a sampling of them. He said that they were mostly in the scientific-computing part of the Python world; "that's just because I think the packaging ecosystem doesn't serve their needs quite as well as many other communities".
A lot of the inspiration for WheelNext came out of the pypackaging-native web site; he recommended visiting that site for "really excellent, detailed information about the problem space that we're trying to solve". WheelNext is the other side of the coin, trying to find solutions for the problems outlined there.
For example, wheels and tools like pip are not set up to handle GPUs, CPU micro-architectures, and specialized libraries (e.g. for linear algebra); in those cases, users want a wheel that targets specific versions of those variables. Limits on the size of wheels that are imposed by PyPI need to be addressed as part of WheelNext. In addition, there are native dependencies: libraries written in C, C++, Rust, or other languages that are needed by Python modules, along with any dependencies. It is difficult for users to specify exactly what they need in those cases.
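Today, the closest thing packages have for expressing platform requirements is the environment-marker language from PEP 508, which can vary dependencies by operating system, CPU architecture, or Python version, but has no vocabulary at all for GPUs, instruction sets, or BLAS choices. A minimal illustration, using the PyPA packaging library (a third-party module that pip and most build tools vendor):

```python
# Evaluate a PEP 508 environment marker against the running interpreter;
# markers can see the OS, CPU architecture, and Python version, but nothing
# about GPUs, CPU features, or installed native libraries.
from packaging.markers import Marker

marker = Marker('platform_machine == "x86_64" and python_version >= "3.10"')
print(marker.evaluate())
```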
The WheelNext project has a set of axioms that it has adopted as part of its philosophy. "If it works for you now, great, it'll continue to work for you later"; the project is not looking to change things for parts of the community where the status quo already works. Beyond that, the project is prioritizing the user experience of package installation and trying to push any complexity into the tools. WheelNext does not want to create another silo or its own ecosystem; it wants to meet users where they are today. Users already have tools that they like, so it does not make sense to force them to learn another.
The idea is for WheelNext to come up with ecosystem-wide solutions, not ones that only work for a single tool or service. For example, there are "lots of third-party indexes that exist" beyond just PyPI, Warsaw said; the intent is to think about "how are we going to standardize what we need to standardize for interoperability". Backward compatibility will be prioritized, but if there comes a need to break that to improve things, the plan is "to do so intentionally and explicitly with a very defined migration path".
Problems
At that point, Dekhtiar took the stage to further discuss the problems that WheelNext is trying to address. First up were the PyPI file-size limitations: by default, files on PyPI can be up to 100MB in size; that limit can be raised to 1GB by PyPI staff, but doing so is a manual, and thus somewhat painful, process. Meanwhile, a project is limited to 10GB in total size for all of its files; "you are kind of boxed in by multiple limits at the same time". This problem is particularly acute for scientific projects and those shipping large AI models.
![Jonathan Dekhtiar](https://static.lwn.net/images/2025/pycon-dekhtiar-sm.png)
One possible solution to that is for those who need to distribute larger objects to have their own indexes. "That does not work as well as we would wish", in part because pip's interface is painful to use with extra indexes. In addition, there can be security problems because multiple indexes can provide packages with the same names, some of which could be buggy or malicious. The dependency resolver will try to choose the best version to install based on version numbers, which can be set by attackers; "in terms of security, it is difficult to manage". So, improving the mechanism for resolving and prioritizing indexes is an important target for WheelNext.
As Warsaw had noted, backward compatibility is a major goal for the initiative; the intent is to fix the problems "without reinventing the workflow", Dekhtiar said. It is difficult to do so using the existing wheel format because it cannot be extended in a backward-compatible way; there is no way to express that older versions of pip should see one thing, while newer versions should see a different format. That lack is holding up the ability to add support for symbolic links, Zstandard compression (which would help somewhat with the PyPI file-size limitations), and METADATA.json files (which would help with implementing Python dependency lock files and package resolution).
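The versioning hook that does exist is blunt: every wheel carries a Wheel-Version field in its .dist-info/WHEEL file, and the specification tells installers to warn on a newer minor version and abort on a newer major version, so bumping it would break every older installer at once. A quick way to look at the field for a wheel on disk (the filename below is just a placeholder):

```python
# Read the Wheel-Version declared inside a wheel's .dist-info/WHEEL file;
# essentially every wheel published today says "1.0".
import zipfile

def wheel_format_version(path):
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".dist-info/WHEEL"):
                for line in zf.read(name).decode("utf-8").splitlines():
                    if line.startswith("Wheel-Version:"):
                        return line.split(":", 1)[1].strip()
    return None

print(wheel_format_version("example_pkg-1.0-py3-none-any.whl"))
```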
Many Python scientific-computing packages rely on binary libraries of various sorts. A given application might use multiple packages, each of which relies on (and ships) its own version of the OpenBLAS linear-algebra library. Being able to share common libraries between packages in an efficient manner would reduce the number of redundant libraries that are shipped in packages and loaded at run time.
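The scale of that duplication is easy to see in a scientific-computing environment: auditwheel-repaired manylinux wheels put their vendored libraries into per-package ".libs" directories, so a rough inventory takes only a few lines (the directory layout assumed here is the auditwheel convention on Linux; other platforms differ):

```python
# List the shared libraries that installed packages bundle in their vendored
# ".libs" directories, along with their sizes.
import pathlib
import sysconfig

site_packages = pathlib.Path(sysconfig.get_paths()["purelib"])
for lib in sorted(site_packages.glob("*.libs/*.so*")):
    size_mib = lib.stat().st_size / 2**20
    print(f"{size_mib:8.1f} MiB  {lib.relative_to(site_packages)}")
```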
Some packages support multiple options for backends that provide visualization tools, web servers, or GUI toolkits, for example. They often require that one of the options is chosen when the package is installed, but users may not have an opinion about which one they want. Like Python does with default arguments to functions, he said, it would be nice to have a way to specify a default "extra" that will get installed if no other choice is made.
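Without a default extra, projects are left doing a runtime fallback dance; the sketch below (using hypothetical package names) shows the pattern that PEP 771 is meant to make unnecessary by letting a project declare which extra is installed when the user expresses no preference:

```python
# Hypothetical example: a package that needs one of several optional
# backends, none of which is guaranteed to have been installed.
import importlib

_BACKENDS = ("mypkg_qt", "mypkg_gtk", "mypkg_tk")  # preference order

def load_backend():
    for name in _BACKENDS:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "no backend found; install one with e.g. 'pip install mypkg[qt]'"
    )
```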
Right now, wheels are identified by a set of platform identifiers that do not include all of the different possibilities. In particular, packages can be built and optimized for specialized hardware, such as GPUs, FPGAs, CPU micro-architectures, specialized instruction sets (e.g. AVX-512), and so on, but there is no mechanism to select wheels based on those criteria. Without fine-grained selection, "what you end up having is the lowest common denominator, which is optimized for nobody", Dekhtiar said. The problem has been "solved" for some projects by having a web-site selector that allows users to choose the right package, but that forces them to read the documentation and set up a different index for getting their packages. "This is awesome, because it allows us to do what we need", but "we are a little bit sad that this is the best answer we have today and we wish we could do better".
Since the talk was 30 minutes in length, he said, they could not cover the entirety of WheelNext, but he wanted to quickly go through some of the PEPs that have been discussed along the way. He started with the (withdrawn) PEP 759 ("External Wheel Hosting"); it is a proposed solution for the problems with multiple repositories (indexes) for Python artifacts. PEP 771 ("Default Extras for Python Software") is meant to address the need for specifying default backends as he had described earlier. He said that PEP 777 ("How to Re-invent the Wheel") was meant to help ensure that the wheel format evolves in ways that would not require backward-incompatible changes in the future. PEP 778 ("Supporting Symlinks in Wheels"), which was recently deferred, is challenging because not all filesystems support symbolic links, but there is a need to share libraries as he mentioned earlier. The build isolation passthrough PEP does not have a number, but it is meant to help in building Python extensions based on experimental or development versions of packages on the local machine.
Governance
With that, Warsaw stepped up to talk about packaging governance and PEP 772 ("Packaging Council governance process") in particular. Over the years, in the Python community, "there's been as little bureaucracy as we could possibly get away with and more of a grassroots movement for handling things". As it becomes clear "that we need some more formalism, we figure out how to do that"; the creation of the Python steering council is a good example of how that works.
The community has recognized a need for some more formalism in packaging governance recently. There are essentially two developers who each "have a vertical slice of the packaging ecosystem and they have standing delegations from the Python steering council" to decide on packaging PEPs, said Warsaw, who is a member of the steering council. There are concerns about the "bus factor", but having people in that situation "also means that there is a lot of burden on that one person to do everything and make sure that they get it right" for their slice.
So the PEP is an effort to bring the steering-council model ("which is mostly successful") to the packaging community. The idea is that the steering council can delegate packaging decisions to a council that is elected by the large community of packaging stakeholders. Those who are familiar with the workings of the steering council will find the election of the packaging council and its operation to be similar. There are some differences, due to the nature of the packaging community; currently the main effort is to define the voting community that will vote on the five members of the packaging council. His hope was that the PEP could go to the steering council for a vote soon and that the packaging council could get started sometime this year; an updated version of the PEP was announced shortly after PyCon.
More PEPs
Dekhtiar returned to the podium to talk about more WheelNext initiatives; PEP 766 ("Explicit Priority Choices Among Multiple Indexes") was up first. As he had said, the pip interface for using multiple indexes is cumbersome; it would be better if users could specify "PyTorch comes from here, NumPy comes from there, and the CUDA wheels come from there". The interface needs work, but there is also a need to protect against security problems when the installers are choosing packages based on their origin and version numbers.

PEP 766 is more meant "to define the vocabulary, the wording, than actually behavior"; the intent is to have common language for the installers to use when describing their resolution behavior with respect to multiple indexes.
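As a rough illustration of the behaviors that need naming, consider a toy resolver choosing between an internal index and a public one that both carry a package of the same name. What pip has historically done is often described as "version priority" (take the best version, wherever it lives), while some tools offer an "index priority" mode (earlier indexes win outright). The sketch below uses made-up data and is not how any real installer is implemented:

```python
# Toy data: the index you configured first, plus a public index where someone
# has uploaded a package with the same name and an inflated version number.
indexes = [
    ("https://internal.example.com/simple", {"1.4.1", "1.4.2"}),
    ("https://pypi.org/simple", {"9999.0"}),
]

def _key(version):
    return tuple(int(part) for part in version.split("."))

def version_priority(candidates):
    # Pool every candidate from every index, then take the highest version.
    pool = [(v, url) for url, versions in candidates for v in versions]
    return max(pool, key=lambda item: _key(item[0]))

def index_priority(candidates):
    # Take the best version from the first index that has the package at all.
    for url, versions in candidates:
        if versions:
            return max(versions, key=_key), url
    raise LookupError("no candidates found")

print(version_priority(indexes))  # ('9999.0', 'https://pypi.org/simple')
print(index_priority(indexes))    # ('1.4.2', 'https://internal.example.com/simple')
```

The point of the PEP is simply that installers document which of these (or other) behaviors they implement, using shared terminology.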
Sharing binary files, like OpenBLAS, between wheels or with system-installed libraries (if they are present) is difficult; "there is no safe way to do that that is common and standardized across the ecosystem". The WheelNext participants want to find a solution for a native library loader that is a kind of "best practice" approach to the problem, which can be shared throughout the community. He likened it to importlib, but one "that's specific around loading binaries". There is a saying in the WheelNext community, he said, "'good enough is better than nothing at all' and right now we have nothing at all" for handling shared libraries.
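What such a loader might look like is still an open question; the sketch below shows the kind of thing individual projects hand-roll today with ctypes and importlib.resources (the package and library names are hypothetical), which is precisely the logic a shared "best practice" library would standardize:

```python
# Load a shared library that some other installed package ships, using only
# what the standard library provides today.
import ctypes
from importlib import resources

def load_native(package, libname):
    resource = resources.files(package).joinpath(libname)
    with resources.as_file(resource) as path:
        return ctypes.CDLL(str(path))

# Hypothetical usage:
#   blas = load_native("someproject.libs", "libopenblas.so.0")
```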
Wheel variants are the subject of another potential PEP. Dekhtiar said that he wanted to share the dream that WheelNext participants have about the next iteration of the wheel format. Today, the platform that is associated with a wheel includes the Python ABI version (e.g. 3.12, 3.13, or 3.13 free-threaded), the operating system, the C library (e.g. manylinux for the GNU C library), and the CPU architecture. Those are encoded into the name of the wheel file. Those tags are not sufficient to describe all of the platforms in use, but "constantly adding tags to better describe your platform is not a scalable practice". There are different GPUs, application-specific integrated circuits (ASICs), and, some day, quantum-computing devices; even if the community wanted to fully describe today's systems, "we don't have the language to be able to do that".
Instead, the idea of wheel variants is to have the installer determine what the local system has installed, then use that information to choose the right wheel. For example, for JAX and PyTorch, the installer could determine which version of CUDA is installed, what kind of Tensor Processing Unit (TPU) there is, and which instructions are supported by the CPU, and then "pick the best".
He went through some scenarios using a prototype pip that would download
vendor plugins to detect various aspects of the environment (CPU
micro-architecture or CUDA version, for example). From a combination of
the package metadata and the running system, it would determine which wheels
to request for installation. At the time of the talk, the prototype worked with a subset of packages
and just pip as an installer, but the hope is to get it working with others
in order to collect more feedback.
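The plugin interface is still being prototyped, so the following is a purely hypothetical sketch of the sort of detection such a plugin performs; it asks the CUDA driver library for its version (via the real cuDriverGetVersion() entry point) and reports a few properties an installer could match wheel variants against:

```python
# Hypothetical variant-provider plugin: report properties of the local
# machine that could be matched against published wheel variants.
import ctypes
import platform

def detect_cuda_driver():
    """Return the CUDA driver version as (major, minor), or None."""
    try:
        cuda = ctypes.CDLL("libcuda.so.1")  # Linux driver library
    except OSError:
        return None
    version = ctypes.c_int()
    if cuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value // 1000, (version.value % 1000) // 10

def variant_properties():
    props = {"cpu_arch": platform.machine()}
    cuda = detect_cuda_driver()
    if cuda is not None:
        props["cuda_driver"] = f"{cuda[0]}.{cuda[1]}"
    return props

if __name__ == "__main__":
    print(variant_properties())
```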
Conclusion
Warsaw finished the presentation with a "call to action", inviting people to get involved with WheelNext and to bring their use cases. The project has various ways to participate and is actively seeking feedback and contributions. For interested readers, the YouTube video of the talk is also available.
[Thanks to the Linux Foundation for its travel sponsorship that allowed me to travel to Pittsburgh for PyCon US.]
| Index entries for this article | |
|---|---|
| Conference | PyCon/2025 |
| Python | Packaging |
Wheel we get multiple file compression?
Posted Jul 9, 2025 15:38 UTC (Wed)
by nickodell (subscriber, #125165)
[Link] (1 responses)

This may seem like a weird and unnecessary feature. If you have repeated content, why not just remove the duplication? This would be the ideal solution, but it is often tricky. Recently, a project I work on that distributes many Cython files saved two megabytes in the distributed version by enabling Cython's new shared library. (Matus Valo's work in this area has been wonderful.) But this change to Cython took thousands of lines of code changes to accomplish, and it would have been much less necessary if wheel formats could compress inter-file duplication better.
Posted Jul 18, 2025 21:55 UTC (Fri)
by zahlman (guest, #175387)
[Link]
Of course, there are multiple steps to implementing any of these kinds of changes across the ecosystem. Ideally, changes to metadata standards should ensure that older installers (i.e. older versions of pip) automatically ignore packages they won't know how to handle. And newer ones of course have to actually implement the new specifications. That's especially an issue for symlinks — the packaging formats need to be able to describe them even for archive formats that don't provide native support, but more importantly, Windows installers need to be able to work without admin rights. Presumably this means describing a more abstract form of link, and then Python code would have to be rewritten to actually use those links. .pth files work for Python source, but for everything else (data files, compiled C .so files etc.) an entirely new mechanism would be needed. And right now, it seems that the PEP 778 author/sponsor/delegate aren't even sure if they want to tackle that wide of a scope.
On the flip side, I worry that there isn't actually enough demand for these features to get them prioritized. It seems like lots of developers out there are perfectly happy to e.g. specify "numpy" without a version as a dependency for today's new one-off hundred-line script, and download 15-20MB for it again because the latest point version isn't in cache.
Posted Jul 10, 2025 4:18 UTC (Thu)
by donald.buczek (subscriber, #112892)
[Link] (12 responses)
terrible
Posted Jul 10, 2025 7:20 UTC (Thu)
by gpth (subscriber, #142055)
[Link] (11 responses)
Thanks
Posted Jul 10, 2025 7:54 UTC (Thu)
by cpitrat (subscriber, #116459)
[Link] (4 responses)
I guess the "terrible" comes from the concern of transparently executing remote code.
Yet you're already downloading code that you'll execute anyway, coming, in theory, from the same source. And I think pip is already executing some python code at installation time. So I'm not sure if this idea should bring additional new concerns ...
Posted Jul 10, 2025 8:01 UTC (Thu)
by cpitrat (subscriber, #116459)
[Link] (1 responses)
This supposes, of course, that the source cannot be trusted or has been compromised. The script would have to be obfuscated enough that the targeting is not obvious (e.g. relying on some very rare piece of hardware that the target is known to use).
A similar thing can already be done in the code of the package itself, but it would be in plain view for any user. With the exotic package, it could be hidden (unless there's a way to identify that the package is not built from published source) in a package that nobody is likely to review.
Posted Jul 18, 2025 22:12 UTC (Fri)
by zahlman (guest, #175387)
[Link]
My understanding is that NVidia has already been doing this sort of thing with setup.py for years. Not involving malware, I presume, but my understanding is that they explicitly re-direct pip to download the real package from their own index, after running code to determine which one.
In principle, setup.py can be audited before running, but in practice you have to go quite some distance out of your way. `pip download` is not usable for this task (see https://zahlman.github.io/posts/2025/02/28/python-packagi...) so you need to arrange your own separate download explicitly and then convince pip to use that file. And then multiply that by all your transitive dependencies, of course.
Such code isn't included in, and doesn't run from wheels (i.e. specifying `--only-binary=:all:`), but then you don't get the potential benefits from trusted code, either. Assuming a wheel for your platform is available in the first place.
It seems that people want to be able to install code and its dependencies in a completely streamlined, fast way; but they also want to be able to use packages that interface to C code, and not have to worry about niche platform details (I saw https://faultlore.com/blah/c-isnt-a-language/ the other day and it seems relevant here), and avoid redundancy, and also have everything be secure. It really seems like something's gotta give.
Posted Jul 10, 2025 16:45 UTC (Thu)
by SLi (subscriber, #53131)
[Link] (1 responses)
So this proposal sounds like something that would, perhaps, solve the needing-to-download part. But how would it play otherwise with the problem? Or is computable dependency resolution a lost cause?
Posted Jul 18, 2025 22:33 UTC (Fri)
by zahlman (guest, #175387)
[Link]
Packages adhering to recent metadata standards get their metadata files extracted automatically and made separately available on PyPI (the relevant standards are described in https://peps.python.org/pep-0658/). But the metadata for a modern source distribution (a `PKG-INFO` file — not pyproject.toml, which you may think of as "source" for the "built" PKG-INFO metadata) is still allowed to declare everything except the metadata version, package name and package version as dynamic. And in older versions of the standard, anything you omit is implicitly dynamic.
There is, now, a hook defined for getting this metadata (https://peps.python.org/pep-0517/#prepare-metadata-for-bu...), but there's nothing to force packages (more realistically, the build systems they depend on) to implement it. By default, the flow is: your installer program builds the entire "wheel" for the package (by setting up any build-system dependencies in an isolated environment, and then running the source package's included build-orchestration code), then checks what metadata is in *that* resulting archive file. (I sort-of touched on this in https://lwn.net/Articles/1020576/ , but without specifically talking about metadata.)
It isn't really supposed to be this way. In principle, https://peps.python.org/pep-0508/ describes a system for making a project's dependencies conditional on the target Python version, OS etc. But apparently this system still can't provide enough information for some packages — and many others are just packaged lazily, or haven't been updated in many years and are packaged according to very outdated standards. And this only helps for dependencies, not for anything else that might be platform specific. (Apparently, *licenses* are platform-specific for some projects out there, if I understood past discussion correctly.)
This is arguably just what you get when you want to support decades of legacy while having people ship projects that mix Python and C (and Fortran, and now Rust, and probably some other things in rare cases).
Posted Jul 10, 2025 10:08 UTC (Thu)
by donald.buczek (subscriber, #112892)
[Link] (5 responses)
I'm not a fan of this for multiple reasons. Let me elaborate on just one of these reasons, which I wanted to express with the term "nondeterministic installation".
We run our own in-house distribution. You can think of me as a packager. The systems on which we install software are different from those where the software eventually runs. So every installer which wrongly assumes that the environment it sees at installation time is the same as at runtime needs extra work from us. We operate in a scientific environment, and want reproducibility. Of course, if at all possible, we prefer building from source. Regardless of whether we build from source, whenever an installer attempts to download something, we need to reverse-engineer the process. We then download the data in advance and modify the installer to use the local copy. We don't want the installation to fail or produce different results the next time because something external has changed or is no longer available.
Additionally, I wouldn't trust "vendor plugins" at all to correctly detect "various aspects of the environment". In my experience they only work for standard installations on one or two big distributions and nothing else.
Posted Jul 10, 2025 12:53 UTC (Thu)
by gpth (subscriber, #142055)
[Link] (1 responses)
To me Python packaging, even with frozen, specific versions seems not enough for complete reproducibility, exactly for the reasons you highlighted. Without knowing all the details, multi-architecture container images (ie. https://developers.redhat.com/articles/2023/11/03/how-bui...) sound like a solution that would provide better reproducibility here.
Posted Jul 11, 2025 14:36 UTC (Fri)
by donald.buczek (subscriber, #112892)
[Link]
If software just follows standards, everything is fine. For example, use execlp() to activate external programs, as it respects the PATH environment variable. Activate shared libraries with the dynamic linker which honors LD_LIBRARY_PATH. Open files by their canonical names, which allows redirection by symbolic links.
If, on the other hand, an installer 'locates' all the files at installation time and configures them with resolved paths, the setup may fail if the environment changes. This can occur, for example, if you switch to a machine with a different GPU, necessitating a different driver version and library variants.
But even outside our environment the prototypical user with their personal notebook and nothing else should be annoyed, when updating one thing calls for re-installation of another thing, because the vendor-supplied oracle needs to look at the environment a second time. Backup and Restore to a replacement system will no longer work.
Posted Jul 11, 2025 22:54 UTC (Fri)
by dbnichol (subscriber, #39622)
[Link]
Posted Jul 14, 2025 19:07 UTC (Mon)
by rjones (subscriber, #159862)
[Link] (1 responses)
Of course, besides the downloading part.
Provided they "do the right thing" and support isolated or offline installs intelligently and in a standardized way then you shouldn't have to do any reverse engineering. Whenever dealing with python or most any other language you have to figure out a way to cache it to your networks or locally if you want to avoid depending on pulling whatever is hosted on the internet.
It could range from "easier then dealing with rpm or deb files" to "being nearly impossible to use" depending on the details of their implementation.
Posted Jul 15, 2025 9:12 UTC (Tue)
by donald.buczek (subscriber, #112892)
[Link]
> Unless that stuff is well documented and thought out there has always been a element of 'reverse engineering' when dealing with that sort of thing.

Well, while in theory configure-scripts could also do all kinds of unwanted stuff, most of the time they are generated by GNU Autotools. And while GNU Autotools is archaic and unnecessarily complex, it has set and conforms to standards which are very much welcome. Autotools-generated scripts usually respect environment variables like $DESTDIR. They won't attempt to "edit" files into your /etc or replace some library in your /usr/lib with patched versions.
Typically, a "configure" checks for the "availability" of a feature, for example a library with a specific minimum API version. If it's not available, the script will either abort the configuration process or adjust the configuration to exclude features that require the missing library. However, once you have such a required component in your system, there is no reason it would go away. If the dependencies are well-designed, upgrading them typically does not cause problems. A typical configure script will not check for hardware attributes like the amount of memory or the number of cores or the specific version of your graphics card.
Yes, sometimes we can't just rebuild very old stuff, because of changes in the C-compiler or when the required software was not as well designed. But generally, packages with Autotools work well for us.
> It could range from "easier then dealing with rpm or deb files" to "being nearly impossible to use" depending on the details of their implementation.
Right. Maybe the problem is as follows: When I think of "vendor plugins to detect various aspects of the environment", I think of the NVIDIA driver installer and that is a major pain point.
Posted Jul 14, 2025 12:54 UTC (Mon)
by vivo (subscriber, #48315)
[Link] (3 responses)
I do really hope that the most used languages in open source converge to one and only one package manager, be it based on rpm, dpkg or portage. This would make the life of the distributions much easier and the entry barrier for developers lower; as a bonus, it would improve the quality of dependency management for everybody.
Posted Jul 14, 2025 13:05 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
And here lies the crux of the problem. Rpm and dpkg would need massive re-engineering to cope with what portage does, I suspect, while portage is massive overkill for what rpm or dpkg do.
And then one only has to look at Red Hat and SuSE - two rpm-based distros that (less now than previously) were pretty much incompatible despite sharing the same package manager.
Everyone thinks that rpm distros were mostly cross-compatible, but that was for the same reason that dpkg distros are compatible - dpkg distros are all (afaik) Debian derivatives. MOST rpm distros are descended from Red Hat, but SUSE is descended from Yggdrasil/Slackware (and the rest... I can't remember the complications).
Cheers,
Wol
Posted Jul 15, 2025 7:17 UTC (Tue)
by taladar (subscriber, #68407)
[Link]
Take e.g. Rust features; they are also on/off flags like USE flags in portage, but work entirely differently again, with cargo assuming that any version of a crate with a specific feature enabled (regardless of the values of other features) will satisfy a dependency that asks for that crate with that feature.
If you wanted a grand unification of all package managers you should probably first start trying to define what a package actually is (including details like optional compile time features and other compile time options such as alternative dependencies (think openssl/gnutls/...)). You would quickly realize that this is somewhere between hard and impossible.
Posted Jul 18, 2025 22:52 UTC (Fri)
by zahlman (guest, #175387)
[Link]
My understanding is that Guido van Rossum understood this quite well and made it very explicitly not his problem, which is why pip and Setuptools are technically third party and distutils got deprecated and eventually removed from the standard library.
For pure Python projects that can rely on the basic already-compiled bits of C code in the standard library (for basic filesystem interaction etc.), dependency management is generally not a big issue in Python IMX. Python's design makes it impractical to have multiple versions of a package in the same environment, which occasionally causes problems. But usually, everything is smooth even with compiled C code as long as it can be pre-compiled and the system is sufficiently standard to select a pre-compiled version. When I switched over to Linux for home use it never even occurred to me to worry about whether I'd lose (or complicate) access to popular Python packages, and indeed I didn't.
> I do really hope that the most used languages in open source converge to one and only one package manager be it based on rpm, dpkg or portage.
Perhaps it's gauche to point it out on LWN, but this will not satisfy the needs of the very large percentage of Python programmers who are on Windows.
But also, to my understanding, none of these tools (or their corresponding package formats) are oriented towards installing things in a custom location, which is essential for Python development and even for a lot of end users. I'm not sure even chroot would help here — currently, pip needs to consult the `sysconfig` standard library (https://docs.python.org/3/library/sysconfig.html) to determine install paths, and it also supports installing for the system (recent Python versions may require a security override), in a separate user-level directory or in a virtual environment. (And you really do need virtual environments.)
Posted Jul 15, 2025 21:47 UTC (Tue)
by gray_-_wolf (subscriber, #131074)
[Link] (10 responses)
Well, I just wonder whether "wheel" is really the best model to base reproducible (scientific) computing on. Was a more radical approach considered?
Posted Jul 16, 2025 1:45 UTC (Wed)
by DemiMarie (subscriber, #164188)
[Link] (8 responses)
Posted Jul 18, 2025 18:37 UTC (Fri)
by dvdeug (guest, #10998)
[Link] (6 responses)
As for the results being scientifically equivalent? Early weather simulations established that simulations which are not bit-for-bit identical will diverge. Even in less chaotic cases, how do you know they're scientifically equivalent? Bit-for-bit is easy to check; scientifically equivalent is hard, if not impossible, to check.
Posted Jul 18, 2025 22:54 UTC (Fri)
by zahlman (guest, #175387)
[Link]
The latter does not prove the former.
Posted Jul 19, 2025 13:25 UTC (Sat)
by DemiMarie (subscriber, #164188)
[Link] (4 responses)
Posted Jul 19, 2025 15:12 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (1 responses)
But do they? Depends on the chaos!
Some chaotic structures diverge rapidly with small differences in the inputs. Others (it's called a "strange attractor" iirc) find it very hard to break away from a stable pattern. Your high-pressure summer weather system that resolutely refuses to move is one of these.
Cheers,
Wol
Posted Jul 19, 2025 21:09 UTC (Sat)
by kleptog (subscriber, #1183)
[Link]
And there's a whole branch of mathematics about how to fix algorithms to improve numerical stability. Floating-point numbers on computers have properties that mean you sometimes won't get the answer you hope for with a naive implementation.
Posted Jul 31, 2025 6:03 UTC (Thu)
by dvdeug (guest, #10998)
[Link] (1 responses)
Posted Jul 31, 2025 8:33 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
With a "yes"/"no", you're never going to know whether it's because you hit a lucky outcome; with a full PDF, you can see that "most likely is asteroid miss by 100 km, but there's a good chance of an asteroid hit", or "outcome is chaotic - all possible sizes of storm are equally likely from the measurements.
Posted Jul 20, 2025 6:58 UTC (Sun)
by donald.buczek (subscriber, #112892)
[Link]
To answer that question: In reality we (IT) are happy when we can provide an environment where you can run and recompile applications which were developed a decade ago. The actual scientific applications are developed, for example, by Bioinformaticians, who should think about the aspects of reproducibility on this level. Some do, most don't. If the output depends on races or other sources of randomness, we can't help. Most of the time, it doesn't matter for the scientific conclusions, though. Still it would be good for review if you could reproduce the output exactly and not just statistically.
Posted Jul 16, 2025 13:36 UTC (Wed)
by aragilar (subscriber, #122569)
[Link]
Posted Jul 17, 2025 6:51 UTC (Thu)
by callegar (guest, #16148)
[Link] (12 responses)
1. it hinders the possibility of picking the best blas for your system or of testing different options;
2. most importantly, it breaks the possibility of doing multithreading right. Every individual blas ends up with its own view of how many cores the system has and of how many of them it is parallelizing over. This means that if multiple packages end up doing things concurrently and each uses blas, you end up with a very suboptimal case where more threads than cores are employed. For this reason it looks like many packages that vendor some blas in their wheel build it to be single-core. And again this is a performance loss.
3. it makes packages bigger than they should be and memory usage larger than it should be, with an obvious loss in cache performance.
Posted Jul 17, 2025 6:54 UTC (Thu)
by callegar (guest, #16148)
[Link]

I mean being able to solve at least this issue would already be a huge win. I don't know if this can be done *in the short term* without having to thoroughly change the way in which things are packaged.
Posted Jul 17, 2025 7:38 UTC (Thu)
by intelfx (subscriber, #130118)
[Link] (10 responses)
I think this is the final undeniable proof that every language-specific package silo which is created to bypass "those useless distros" eventually becomes just a worse ad-hoc distro as soon as people start using it to solve actually hard packaging problems.
Posted Jul 17, 2025 8:59 UTC (Thu)
by callegar (guest, #16148)
[Link] (9 responses)
A notable aspect with Python (and other languages) is that, while packages can in principle be shared among many applications (ultimately looking a bit like shared libraries), it is impossible to make different versions of the same package co-exist, which makes the traditional way of packaging things adopted in distros a nightmare.
If you need applications A, B and C and they all rely on package X, then either you vendor X in A, B and C or the distro needs to find a single version of X that can satisfy A, B and C at the same time. If this is impossible, then the traditional distro approach will fail or force the developers to patch downstream A, B, C or even X to solve the issue. For languages that enable having different versions of the same package coexist, it becomes a matter of providing both a package of X-1 and of X-2, so that, for example A can depend on X-1 and pull it in when installed and B can depend on X-2.
The reason why you need tools like conda or uv, capable of managing a huge cache of pre-built packages, of creating virtual environments, and of creating in there a forest of links to the packages in the cache, is not just "providing some isolation", but also (and I would say in great portion) a workaround for not having the possibility of dropping all packages in a single place in multiple versions as needed and having the projects themselves go seek the versions they can work with.
In some sense the "keep it simple here" (no package versioning) ends up "making it complex somewhere else" (no practical possibility to rely on common distro packaging tools and the *real* need to reinvent them).
Posted Jul 17, 2025 20:27 UTC (Thu)
by raven667 (subscriber, #5198)
[Link] (8 responses)
Is this a solvable problem, creating a new mechanism for loading modules or declaring dependencies to get a soname-like experience for Python that can be retrofitted in in a way that affects new code which is updated to take advantage of it but not existing code which doesn't know about it? Maybe some special attribute of __init__ or something which can provide version range info, and a new directory structure or naming convention for module_name@version or something, with a constraint that the same python interpreter maybe cannot load two different versions of the same module_name at the same time and will have an import error exception instead if its attempted. This could allow the python interpreter to have the same behavior as if you used a virtualenv but integrated with the system-wide directory structure and far more tractable for a package manager to update by not having overlapping files.
Posted Jul 18, 2025 23:26 UTC (Fri)
by zahlman (guest, #175387)
[Link] (7 responses)
Many people have this idea (there was a DPO thread recently, even: https://discuss.python.org/t/_/97416) but it really isn't feasible, even without considering "retrofitting".
When you import a module in Python by the default means, it's cached process-wide (in a dictionary exposed as `sys.modules`). This allows for module objects to function as singletons — doing setup work just once, allowing for "lazy loading" of a module imported within a function, customizing an import with runtime logic (commonly used to implement a fallback) etc. etc. And of course it avoids some kinds of circular import problems (though there is still a problem if you import specific names `from` a module - https://stackoverflow.com/questions/9252543), and saves time if an import statement is reached multiple times.
But this means that, even if you come up with a syntax to specify a package version, and a scheme for locating the right source code to load, you break a contract that huge amounts of existing code rely upon for correctness. In particular, you will still have diamond dependency problems.
Suppose that A.py does `import B` and `import C`, and calls `B.foo()` and `C.bar()`; those modules both `import X`, and try to implement their functions using X functionality. Suppose further that they're written for different versions of X. Now suppose we add a syntax that allows each of them to find and use a separate X.py (such that each one implements the API they expect), and revamp the import system so that the separate X module-objects can coexist in `sys.modules` (so that B and C can keep using them).
*Now the code in A can break*. It may, implicitly, expect the `B.foo()` call to have impacted the state of X in a way that is relevant to the `C.bar()` call, but in reality C has been completely isolated from that state change. And there is no general solution to that, because in general the internal state for the different versions of X can be mutually incomprehensible. They are effectively separate libraries that happen to look similar and have the same name.
In the real world, you *can* vendor B with its own vendored X1, and vendor C with its own vendored X2, and patch the import statements so that the vendored B and C access their own vendored Xs directly. But you can only do this with the foresight that B and C both need X, and then you have to write the A code with awareness of the X state-sharing problems. And none of what you vendor will be practically usable by *other* code that happens to have the same dependencies. In practice, vendoring is pretty rare in the Python ecosystem. (Pip does it, but that's because of bootstrapping issues.)
Posted Jul 19, 2025 4:22 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (6 responses)
No, not that. I'm nowhere near qualified to be a language designer, but I was not suggesting that Python could be evolved to support two modules of different versions loaded in one interpreter process at the same time; once they are loaded in the interpreter there should be only one instance of X, and if the second import specifies that it is not compatible with the version of X which is loaded then it should fail (which is actually better than today, which only has version checks on import if they are explicitly coded and not automatically, I don't think).
Solving the whole problem, like Rust does where different parts can load different versions of libraries, which can be implemented in different versions of the language standard, is amazing, but defining a more easily tractable subset of the problem then solving that is often good enough.
Posted Jul 24, 2025 6:26 UTC (Thu)
by callegar (guest, #16148)
[Link] (5 responses)
So you get `foo1`, `foo2` and `foo3` rather than `foo` and you can have `foo1`, `foo2` and `foo3` coexist. This clearly does not let you *share state* between `foo1` and `foo2`. However, if the APIs are different the very idea of *sharing state* becomes scary.
Obviously, this requires discipline in package naming. But I think it would help a lot the work of distros if more packages followed this approach.
Posted Jul 24, 2025 10:50 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (4 responses)
This pushes towards the glibc solution (one version, but with backwards compatibility), rather than parallel installability - not least because parallel installability leads towards an NP-complete problem for the packaging team trying to minimise the number of versions of foo that they maintain in the distro.
Posted Jul 25, 2025 4:45 UTC (Fri)
by donald.buczek (subscriber, #112892)
[Link] (3 responses)
Exactly, the system that has long been chosen for Unix-like operating systems sorts files according to function (/usr/bin, /etc, ...).
You can get quite far with $PREFIX installations and wrappers, but it's ugly and opaque. I sometimes think that a system that always bundles software in name-version-variant directories and supports the dynamic networking of these components as a core principle would be better from today's perspective.
Posted Jul 25, 2025 7:22 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (2 responses)
To me that feels just like pushing the problem onto the user. The complexity is all still there, the distro just does not have to care so much about it but the user who wants to use the components together still does and has to constantly make choices related to that.
Posted Jul 26, 2025 6:14 UTC (Sat)
by donald.buczek (subscriber, #112892)
[Link] (1 responses)
The user, acting as an admin, would still be able to install new versions and the distribution-provided package manager would analyze the dependencies. It would just not try to resolve this to a result in which each package can only exist once. It might keep other/older variants around which are required by other packages. It might support some kind of diamond dependencies, too.
The basic difference would be, that packages go into their own file system tree so that they don't conflict with each other if multiple variants of the same package are wanted or needed.
Posted Jul 26, 2025 10:59 UTC (Sat)
by farnz (subscriber, #17727)
[Link]

This begins to sound like a reinvention of NixOS and similar distributions, which puts every package into its own prefix, and can support just about any dependency setup you care about as a result.