LWN: Comments on "Reinventing the Python wheel" https://lwn.net/Articles/1028299/ This is a special feed containing comments posted to the individual LWN article titled "Reinventing the Python wheel". This does not feel radical enough https://lwn.net/Articles/1031992/ https://lwn.net/Articles/1031992/ farnz A lot of that is about what your simulation is meant to tell you; a simulation that says "will happen" or "won't happen" is only really of use as a component of a larger simulation that turns "measurements with error bars" into "Probability Density Function of event happening". <p>With a "yes"/"no", you're never going to know whether it's because you hit a lucky outcome; with a full PDF, you can see that "most likely is asteroid miss by 100 km, but there's a good chance of an asteroid hit", or "outcome is chaotic - all possible sizes of storm are equally likely from the measurements". Thu, 31 Jul 2025 08:33:06 +0000 This does not feel radical enough https://lwn.net/Articles/1031986/ https://lwn.net/Articles/1031986/ dvdeug <div class="FormattedComment"> If you're checking to see if an asteroid could hit Earth, it's not good enough to say that a simulation that was scientifically equivalent showed it won't. Yes, with current measurements there's a scientifically equivalent simulation that says it won't, but we need to know if it could, consistent with the current measurements, so we know if we need to measure it better. Likewise with weather, we know that, within the limits of our current knowledge, it's consistent that there won't be a tornado three days from now. The question is, could there be? Moreover, you don't know beforehand whether chaos will pop up; being bit-for-bit identical lets you check that, whereas "scientifically equivalent" is unrepeatable.<br> </div> Thu, 31 Jul 2025 06:03:53 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1031509/ https://lwn.net/Articles/1031509/ farnz This begins to sound like a reinvention of <a href="https://nixos.org/">NixOS</a> and similar distributions, which puts every package into its own prefix, and can support just about any dependency setup you care about as a result. Sat, 26 Jul 2025 10:59:27 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1031500/ https://lwn.net/Articles/1031500/ donald.buczek <div class="FormattedComment"> But the user would still be free to run a simple command or use ./configure without selecting specific versions or variants of existing software, and the system would assume the single recommended version or variant.<br> <p> The user, acting as an admin, would still be able to install new versions and the distribution-provided package manager would analyze the dependencies. It would just not try to resolve this to a result in which each package can only exist once. It might keep other/older variants around which are required by other packages.
It might support some kind of diamond dependencies, too.<br> <p> The basic difference would be, that packages go into their own file system tree so that they don't conflict with each other if multiple variants of the same package are wanted or needed.<br> </div> Sat, 26 Jul 2025 06:14:20 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1031376/ https://lwn.net/Articles/1031376/ taladar <div class="FormattedComment"> <span class="QuotedText">&gt; I sometimes think that a system that always bundles software in name-version-variant directories and supports the dynamic networking of these components as a core principle would be better from today's perspective.</span><br> <p> To me that feels just like pushing the problem onto the user. The complexity is all still there, the distro just does not have to care so much about it but the user who wants to use the components together still does and has to constantly make choices related to that.<br> </div> Fri, 25 Jul 2025 07:22:56 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1031361/ https://lwn.net/Articles/1031361/ donald.buczek <div class="FormattedComment"> <span class="QuotedText">&gt; Otherwise, you get things like "we read /etc/foo/config" in both foo1 and foo3, but foo's developers forget that a particular key had a meaning in foo1, and reuse it for a different meaning in foo3</span><br> <p> Exactly, the system that has long been chosen for Unix-like operating systems sorts files according to function (/usr/bin, /etc, ...).<br> <p> You can get quite far with $PREFIX installations and wrappers, but it's ugly and opaque. I sometimes think that a system that always bundles software in name-version-variant directories and supports the dynamic networking of these components as a core principle would be better from today's perspective.<br> </div> Fri, 25 Jul 2025 04:45:06 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1031187/ https://lwn.net/Articles/1031187/ farnz There's two problems with this approach (it's been tried before, and doesn't help distros): <ol> <li>Distros don't want to package <tt>foo1</tt>, <tt>foo2</tt> and <tt>foo3</tt>. They want to just package a single maintained version, ideally, but can compromise if there are multiple maintained versions. However, in practice what seems to happen is that <tt>foo1</tt> gets abandoned by its upstream, <tt>foo3</tt> is the only maintained version, and the distro is now on the hook for pushing all the packages that depend on <tt>foo</tt> to stop using <tt>foo1</tt> or <tt>foo2</tt>. We've seen this with, for example, GTK+, where GTK+ major versions can coexist, and the work of telling projects to move to a supported version of GTK+ has entirely fallen on the distros. <li>The same discipline required from <tt>foo</tt>'s developers to ensure that you cannot have different major versions sharing state is also the discipline needed to do things like glibc's versioning linker script. Otherwise, you get things like "we read <tt>/etc/foo/config</tt>" in both <tt>foo1</tt> and <tt>foo3</tt>, but <tt>foo</tt>'s developers forget that a particular key had a meaning in <tt>foo1</tt>, and reuse it for a different meaning in <tt>foo3</tt> - after all, no-one maintains <tt>foo1</tt> any more, so no-one remembers it that well. 
</ol> <p>This pushes towards the glibc solution (one version, but with backwards compatibility), rather than parallel installability - not least because parallel installability leads towards an <a href="https://leaflessca.wordpress.com/2017/02/12/dll-hell-and-avoiding-an-np-complete-problem/">NP-complete problem</a> for the packaging team trying to minimise the number of versions of <tt>foo</tt> that they maintain in the distro. Thu, 24 Jul 2025 10:50:25 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1031181/ https://lwn.net/Articles/1031181/ callegar <div class="FormattedComment"> What is practically happening is that for some packages what would be the "major" version number, i.e., the number indicating non-backward-compatible API changes in semantic versioning, becomes a part of the package name.<br> <p> So you get `foo1`, `foo2` and `foo3` rather than `foo` and you can have `foo1`, `foo2` and `foo3` coexist. This clearly does not let you *share state* between `foo1` and `foo2`. However, if the APIs are different, the very idea of *sharing state* becomes scary.<br> <p> Obviously, this requires discipline in package naming. But I think it would help the work of distros a lot if more packages followed this approach.<br> </div> Thu, 24 Jul 2025 06:26:31 +0000 This does not feel radical enough https://lwn.net/Articles/1030627/ https://lwn.net/Articles/1030627/ donald.buczek <div class="FormattedComment"> <span class="QuotedText">&gt; By “reproducible”, do you mean that the results are bit-for-bit identical, or that the results are scientifically equivalent?</span><br> <p> To answer that question: In reality we (IT) are happy when we can provide an environment where you can run and recompile applications which were developed a decade ago. The actual scientific applications are developed, for example, by bioinformaticians, who should think about the aspects of reproducibility on this level. Some do, most don't. If the output depends on races or other sources of randomness, we can't help. Most of the time, it doesn't matter for the scientific conclusions, though. Still, it would be good for review if you could reproduce the output exactly and not just statistically.<br> </div> Sun, 20 Jul 2025 06:58:20 +0000 This does not feel radical enough https://lwn.net/Articles/1030602/ https://lwn.net/Articles/1030602/ kleptog <div class="FormattedComment"> Sure, but it's usually fairly straightforward to determine whether a model is stable or not. If your model isn't stable then the results are going to be suspect no matter what you do.<br> <p> And there's a whole branch of mathematics about how to fix algorithms to improve numerical stability. Floating-point numbers on computers have properties that mean you sometimes won't get the answer you hope for with a naive implementation.<br> </div> Sat, 19 Jul 2025 21:09:21 +0000 This does not feel radical enough https://lwn.net/Articles/1030578/ https://lwn.net/Articles/1030578/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; because even tiny differences in the inputs to the simulation would also cause divergence.</span><br> <p> But do they? Depends on the chaos!<br> <p> Some chaotic structures diverge rapidly with small differences in the inputs. Others (it's called a "strange attractor" iirc) find it very hard to break away from a stable pattern.
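<br> <p> A toy illustration of the difference in plain Python (the textbook logistic map, not any real weather model):<br>
<pre>
# Two regimes of the logistic map: x_next = r * x * (1 - x).
# At r = 4.0 the map is chaotic, so a 1e-12 difference in the start blows up;
# at r = 2.5 everything near 0.6 is pulled to the same fixed point and the
# tiny difference is forgotten.
def run(r, x, steps=60):
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

for r in (4.0, 2.5):
    a = run(r, 0.600000000000)
    b = run(r, 0.600000000001)
    print(r, abs(a - b))
</pre>
<p>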
Your high-pressure summer weather system that resolutely refuses to move is one of these stable patterns.<br> <p> Cheers,<br> Wol<br> </div> Sat, 19 Jul 2025 15:12:03 +0000 This does not feel radical enough https://lwn.net/Articles/1030554/ https://lwn.net/Articles/1030554/ DemiMarie <div class="FormattedComment"> Two results are scientifically equivalent if the difference between them is within the margin of error. The divergent behavior you mentioned is real, but what it indicates is that neither result is to be trusted sufficiently far in the future, because even tiny differences in the inputs to the simulation would also cause divergence.<br> </div> Sat, 19 Jul 2025 13:25:25 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1030544/ https://lwn.net/Articles/1030544/ raven667 <div class="FormattedComment"> <span class="QuotedText">&gt; Suppose that A.py does `import B` and `import C`, and calls `B.foo()` and `C.bar()`; those modules both `import X`, and try to implement their functions using X functionality. Suppose further that they're written for different versions of X. Now suppose we add a syntax that allows each of them to find and use a separate X.py (such that each one implements the API they expect), and revamp the import system so that the separate X module-objects can coexist in `sys.modules` (so that B and C can keep using them). </span><br> <p> No, not that. I'm nowhere near qualified to be a language designer, but I was not suggesting that Python could be evolved to support two modules of different versions loaded in one interpreter process at the same time. Once they are loaded in the interpreter there should be only one instance of X, and if the second import specifies that it is not compatible with the version of X which is loaded then it should fail (which is actually better than today, where version checks on import only happen if they are explicitly coded, not automatically, I think).<br> <p> Solving the whole problem, like Rust does, where different parts can load different versions of libraries, which can themselves be implemented in different versions of the language standard, is amazing, but defining a more easily tractable subset of the problem and then solving that is often good enough.<br> </div> Sat, 19 Jul 2025 04:22:34 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1030538/ https://lwn.net/Articles/1030538/ zahlman <div class="FormattedComment"> <span class="QuotedText">&gt; Is this a solvable problem, creating a new mechanism for loading modules or declaring dependencies to get a soname-like experience for Python that can be retrofitted in, in a way that affects new code which is updated to take advantage of it but not existing code which doesn't know about it?</span><br> <p> Many people have this idea (there was a DPO thread recently, even: <a rel="nofollow" href="https://discuss.python.org/t/_/97416">https://discuss.python.org/t/_/97416</a>) but it really isn't feasible, even without considering "retrofitting".<br> <p> When you import a module in Python by the default means, it's cached process-wide (in a dictionary exposed as `sys.modules`). This allows for module objects to function as singletons — doing setup work just once, allowing for "lazy loading" of a module imported within a function, customizing an import with runtime logic (commonly used to implement a fallback) etc. etc.
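<br> <p> A minimal sketch of that caching behaviour (the module names are just examples, not anyone's actual code):<br>
<pre>
import sys

import json                           # first import runs json's module body once
assert sys.modules["json"] is json    # later imports are only a cache lookup

# The same cache is what makes the common fallback idiom work: whichever
# module wins the try/except becomes the single shared instance for everyone.
try:
    import lxml.etree as etree                # preferred parser, if installed
except ImportError:
    import xml.etree.ElementTree as etree     # stdlib fallback
print(etree.__name__)
</pre>
<p>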
And of course it avoids some kinds of circular import problems (though there is still a problem if you import specific names `from` a module - <a rel="nofollow" href="https://stackoverflow.com/questions/9252543">https://stackoverflow.com/questions/9252543</a>), and saves time if an import statement is reached multiple times.<br> <p> But this means that, even if you come up with a syntax to specify a package version, and a scheme for locating the right source code to load, you break a contract that huge amounts of existing code rely upon for correctness. In particular, you will still have diamond dependency problems.<br> <p> Suppose that A.py does `import B` and `import C`, and calls `B.foo()` and `C.bar()`; those modules both `import X`, and try to implement their functions using X functionality. Suppose further that they're written for different versions of X. Now suppose we add a syntax that allows each of them to find and use a separate X.py (such that each one implements the API they expect), and revamp the import system so that the separate X module-objects can coexist in `sys.modules` (so that B and C can keep using them). <br> <p> *Now the code in A can break*. It may, implicitly, expect the `B.foo()` call to have impacted the state of X in a way that is relevant to the `C.bar()` call, but in reality C has been completely isolated from that state change. And there is no general solution to that, because in general the internal state for the different versions of X can be mutually incomprehensible. They are effectively separate libraries that happen to look similar and have the same name.<br> <p> In the real world, you *can* vendor B with its own vendored X1, and vendor C with its own vendored X2, and patch the import statements so that the vendored B and C access their own vendored Xs directly. But you can only do this with the foresight that B and C both need X, and then you have to write the A code with awareness of the X state-sharing problems. And none of what you vendor will be practically usable by *other* code that happens to have the same dependencies. In practice, vendoring is pretty rare in the Python ecosystem. (Pip does it, but that's because of bootstrapping issues.)<br> </div> Fri, 18 Jul 2025 23:26:18 +0000 This does not feel radical enough https://lwn.net/Articles/1030537/ https://lwn.net/Articles/1030537/ zahlman <div class="FormattedComment"> <span class="QuotedText">&gt; Note that old console video games are bit-for-bit identical; TASbot gives the same series of inputs, and the game gives the same outputs every time.</span><br> <p> The latter does not prove the former.<br> </div> Fri, 18 Jul 2025 22:54:07 +0000 Every language reinvent the wheel https://lwn.net/Articles/1030535/ https://lwn.net/Articles/1030535/ zahlman <div class="FormattedComment"> <span class="QuotedText">&gt; managing dependancies is much more difficult than language architects realize, that's why Linux distibution package manager are usually nontrivial programs.</span><br> <p> My understanding is that Guido van Rossum understood this quite well and made it very explicitly not his problem, which is why pip and Setuptools are technically third party and distutils got deprecated and eventually removed from the standard library.<br> <p> For pure Python projects that can rely on the basic already-compiled bits of C code in the standard library (for basic filesystem interaction etc.), dependency management is generally not a big issue in Python IMX. 
Python's design makes it impractical to have multiple versions of a package in the same environment, which occasionally causes problems. But usually, everything is smooth even with compiled C code as long as it can be pre-compiled and the system is sufficiently standard to select a pre-compiled version. When I switched over to Linux for home use it never even occurred to me to worry about whether I'd lose (or complicate) access to popular Python packages, and indeed I didn't.<br> <p> <span class="QuotedText">&gt; I do really hope that the most used languages in open source converge to one and only one package manager be it based on rpm, dpkg or portage.</span><br> <p> Perhaps it's gauche to point it out on LWN, but this will not satisfy the needs of the very large percentage of Python programmers who are on Windows.<br> <p> But also, to my understanding, none of these tools (or their corresponding package formats) are oriented towards installing things in a custom location, which is essential for Python development and even for a lot of end users. I'm not sure even chroot would help here — currently, pip needs to consult the `sysconfig` standard library (<a rel="nofollow" href="https://docs.python.org/3/library/sysconfig.html">https://docs.python.org/3/library/sysconfig.html</a>) to determine install paths, and it also supports installing for the system (recent Python versions may require a security override), in a separate user-level directory or in a virtual environment. (And you really do need virtual environments.)<br> </div> Fri, 18 Jul 2025 22:52:15 +0000 Nomdeterministic installations https://lwn.net/Articles/1030533/ https://lwn.net/Articles/1030533/ zahlman <div class="FormattedComment"> <span class="QuotedText">&gt; packages can dynamically compute some metadata relevant to the resolving (I forget the exact details). Obviously made worse by not being able to get any of that without downloading the package.</span><br> <p> Packages adhering to recent metadata standards get their metadata files extracted automatically and made separately available on PyPI (the relevant standards are described in <a rel="nofollow" href="https://peps.python.org/pep-0658/">https://peps.python.org/pep-0658/</a>). But the metadata for a modern source distribution (a `PKG-INFO` file — not pyproject.toml, which you may think of as "source" for the "built" PKG-INFO metadata) is still allowed to declare everything except the metadata version, package name and package version as dynamic. And in older versions of the standard, anything you omit is implicitly dynamic.<br> <p> There is, now, a hook defined for getting this metadata (<a rel="nofollow" href="https://peps.python.org/pep-0517/#prepare-metadata-for-build-wheel">https://peps.python.org/pep-0517/#prepare-metadata-for-bu...</a>), but there's nothing to force packages (more realistically, the build systems they depend on) to implement it. By default, the flow is: your installer program builds the entire "wheel" for the package (by setting up any build-system dependencies in an isolated environment, and then running the source package's included build-orchestration code), then checks what metadata is in *that* resulting archive file. (I sort-of touched on this in <a rel="nofollow" href="https://lwn.net/Articles/1020576/">https://lwn.net/Articles/1020576/</a> , but without specifically talking about metadata.)<br> <p> It isn't really supposed to be this way. 
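<br> <p> For illustration, dynamic metadata usually comes from something like this (a contrived, hypothetical setup.py, not taken from any real project):<br>
<pre>
# setup.py: the dependency list only exists after this code has run on the
# target machine, so the sdist cannot state it statically up front.
import sys
from setuptools import setup

deps = ["numpy"]
if sys.platform == "win32":    # hypothetical platform-specific dependency
    deps.append("pywin32")

setup(name="example-project", version="1.0", install_requires=deps)
</pre>
<p>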
In principle, <a rel="nofollow" href="https://peps.python.org/pep-0508/">https://peps.python.org/pep-0508/</a> describes a system for making a project's dependencies conditional on the target Python version, OS etc. But apparently this system still can't provide enough information for some packages — and many others are just packaged lazily, or haven't been updated in many years and are packaged according to very outdated standards. And this only helps for dependencies, not for anything else that might be platform specific. (Apparently, *licenses* are platform-specific for some projects out there, if I understood past discussion correctly.)<br> <p> This is arguably just what you get when you want to support decades of legacy while having people ship projects that mix Python and C (and Fortran, and now Rust, and probably some other things in rare cases).<br> </div> Fri, 18 Jul 2025 22:33:58 +0000 Nomdeterministic installations https://lwn.net/Articles/1030531/ https://lwn.net/Articles/1030531/ zahlman <div class="FormattedComment"> <span class="QuotedText">&gt; Similar thing can already be done in the code of the package itself, but it would be in plain view for any user. With the exotic package, it could be hidden (unless there's a way to identify the package is not build from published source) in a package that nobody is likely to review.</span><br> <p> My understanding is that NVidia has already been doing this sort of thing with setup.py for years. Not involving malware, I presume, but my understanding is that they explicitly re-direct pip to download the real package from their own index, after running code to determine which one.<br> <p> In principle, setup.py can be audited before running, but in practice you have to go quite some distance out of your way. `pip download` is not usable for this task (see <a rel="nofollow" href="https://zahlman.github.io/posts/2025/02/28/python-packaging-3/">https://zahlman.github.io/posts/2025/02/28/python-packagi...</a>) so you need to arrange your own separate download explicitly and then convince pip to use that file. And then multiply that by all your transitive dependencies, of course.<br> <p> Such code isn't included in, and doesn't run from wheels (i.e. specifying `--only-binary=:all:`), but then you don't get the potential benefits from trusted code, either. Assuming a wheel for your platform is available in the first place.<br> <p> It seems that people want to be able to install code and its dependencies in a completely streamlined, fast way; but they also want to be able to use packages that interface to C code, and not have to worry about niche platform details (I saw <a rel="nofollow" href="https://faultlore.com/blah/c-isnt-a-language/">https://faultlore.com/blah/c-isnt-a-language/</a> the other day and it seems relevant here), and avoid redundancy, and also have everything be secure. It really seems like something's gotta give.<br> </div> Fri, 18 Jul 2025 22:12:19 +0000 Wheel we get multiple file compression? https://lwn.net/Articles/1030529/ https://lwn.net/Articles/1030529/ zahlman <div class="FormattedComment"> I have seen ideas thrown around for that in the community. There have even been suggestions about support for putting the actual code in a second internal archive while leaving metadata arranged as usual. Certainly it's at least desired to support other compression formats - after all, lzma support exists in the standard library (and there's a fairly popular third-party package for zstd). 
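<br> <p> Since a wheel is just a zip archive, the standard library is enough to see the current per-entry compression (the file name below is made up):<br>
<pre>
import zipfile

# Each entry in a wheel records its own compression method and sizes; the
# ideas above amount to allowing more methods (or a nested archive) without
# confusing older installers.
with zipfile.ZipFile("example_project-1.0-py3-none-any.whl") as whl:
    for info in whl.infolist():
        print(info.filename, info.compress_type, info.compress_size, info.file_size)
</pre>
<p>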
For cases with *identical* files, PEP 778 (mentioned in the article) aims at support for symlinks.<br> <p> Of course, there are multiple steps to implementing any of these kinds of changes across the ecosystem. Ideally, changes to metadata standards should ensure that older installers (i.e. older versions of pip) automatically ignore packages they won't know how to handle. And newer ones of course have to actually implement the new specifications. That's especially an issue for symlinks — the packaging formats need to be able to describe them even for archive formats that don't provide native support, but more importantly, Windows installers need to be able to work without admin rights. Presumably this means describing a more abstract form of link, and then Python code would have to be rewritten to actually use those links. .pth files work for Python source, but for everything else (data files, compiled C .so files etc.) an entirely new mechanism would be needed. And right now, it seems that the PEP 778 author/sponsor/delegate aren't even sure if they want to tackle that wide of a scope.<br> <p> On the flip side, I worry that there isn't actually enough demand for these features to get them prioritized. It seems like lots of developers out there are perfectly happy to e.g. specify "numpy" without a version as a dependency for today's new one-off hundred-line script, and download 15-20MB for it again because the latest point version isn't in cache.<br> </div> Fri, 18 Jul 2025 21:55:21 +0000 This does not feel radical enough https://lwn.net/Articles/1030518/ https://lwn.net/Articles/1030518/ dvdeug <div class="FormattedComment"> Why would bit-for-bit identical be impossible? Don't use randomness, and be careful about using multiple threads. You might need to worry about different ISAs; best to run the same binary on all. There's a lot of work on Debian to build arbitrary packages to be bit-for-bit identical; it's definitely possible. Note that old console video games are bit-for-bit identical; TASbot gives the same series of inputs, and the game gives the same outputs every time.<br> <p> As for whether the results are scientifically equivalent? Early weather simulations established that simulations which are not bit-for-bit identical will diverge. Even in less chaotic cases, how do you know they're scientifically equivalent? Bit-for-bit is easy to check; scientifically equivalent is hard, if not impossible, to check.<br> </div> Fri, 18 Jul 2025 18:37:36 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1030367/ https://lwn.net/Articles/1030367/ raven667 <div class="FormattedComment"> <span class="QuotedText">&gt; A notable aspect with Python (and other languages) is that while packages can in principle be shared among many applications (ultimately looking a bit like shared libraries), it is impossible to make different versions of the same package co-exist</span><br> <p> Is this a solvable problem, creating a new mechanism for loading modules or declaring dependencies to get a soname-like experience for Python that can be retrofitted in, in a way that affects new code which is updated to take advantage of it but not existing code which doesn't know about it?
Maybe some special attribute of __init__ or something which can provide version range info, and a new directory structure or naming convention for module_name@version or something, with a constraint that the same Python interpreter maybe cannot load two different versions of the same module_name at the same time and will raise an import error instead if it's attempted. This could give the Python interpreter the same behavior as if you used a virtualenv, but integrated with the system-wide directory structure and far more tractable for a package manager to update, since there are no overlapping files.<br> </div> Thu, 17 Jul 2025 20:27:19 +0000 Re-inventing distro mechanisms https://lwn.net/Articles/1030278/ https://lwn.net/Articles/1030278/ callegar <div class="FormattedComment"> Reinventing distribution-like mechanisms is sometimes not just the consequence of an initial sense of superiority with respect to the "useless" distros, but unfortunately a necessity to work around some inherent aspects of certain modern programming languages and environments.<br> <p> A notable aspect with Python (and other languages) is that while packages can in principle be shared among many applications (ultimately looking a bit like shared libraries), it is impossible to make different versions of the same package co-exist, which makes the traditional way of packaging things adopted in distros a nightmare.<br> <p> If you need applications A, B and C and they all rely on package X, then either you vendor X in A, B and C or the distro needs to find a single version of X that can satisfy A, B and C at the same time. If this is impossible, then the traditional distro approach will fail or force the developers to patch downstream A, B, C or even X to solve the issue.
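<br> <p> A toy way to see that conflict with the `packaging` library (the package name and version ranges are invented):<br>
<pre>
from packaging.requirements import Requirement
from packaging.version import Version

a_req = Requirement("X==1.*")   # hypothetical: application A needs the 1.x API
b_req = Requirement("X==2.*")   # hypothetical: application B needs the 2.x API

for v in ("1.9", "2.4"):
    print(v, Version(v) in a_req.specifier, Version(v) in b_req.specifier)
# No single version satisfies both specifiers, so one shared site-packages
# directory cannot serve A and B at the same time.
</pre>
<p>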
For languages that enable having different versions of the same package coexist, it becomes a matter of providing both a package of X-1 and of X-2, so that, for example, A can depend on X-1 and pull it in when installed and B can depend on X-2.<br> <p> The reason why you need tools like conda or uv, capable of managing a huge cache of pre-built packages, of creating virtual environments and of creating in there a forest of links to the packages in the cache, is not just "providing some isolation", but also (and I would say in large part) a workaround for not having the possibility of dropping all packages in a single place in multiple versions as needed and having the projects themselves go seek out the versions they can work with.<br> <p> In some sense the "keep it simple here" (no package versioning) ends up "making it complex somewhere else" (no practical possibility to rely on common distro packaging tools and the *real* need to reinvent them).<br> </div> Thu, 17 Jul 2025 08:59:21 +0000 Shared library packages https://lwn.net/Articles/1030266/ https://lwn.net/Articles/1030266/ intelfx <div class="FormattedComment"> <span class="QuotedText">&gt; What would be very useful at this point is the ability to package binary libraries to support other packages and to be shared among them</span><br> <p> I think this is the final undeniable proof that every language-specific package silo which is created to bypass "those useless distros" eventually becomes just a worse ad-hoc distro as soon as people start using it to solve actually hard packaging problems.<br> </div> Thu, 17 Jul 2025 07:38:21 +0000 Shared library packages https://lwn.net/Articles/1030263/ https://lwn.net/Articles/1030263/ callegar I mean being able to solve at least this issue would already be a huge win. I don't know if this can be done *in the short term* without having to thoroughly change the way in which things are packaged. Thu, 17 Jul 2025 06:54:27 +0000 Shared library packages https://lwn.net/Articles/1030262/ https://lwn.net/Articles/1030262/ callegar <div class="FormattedComment"> What would be very useful at this point is the ability to package binary libraries to support other packages and to be shared among them. An obvious example is BLAS. As of today every PyPI wheel seems to vendor its own BLAS, which is horrible:<br> <p> 1. it hinders the possibility of picking the best BLAS for your system or of testing different options;<br> <p> 2. most importantly, it breaks the possibility of doing multithreading right. Every individual BLAS ends up with its own view of how many cores the system has and on how many of them it is parallelizing. This means that if multiple packages end up doing things concurrently and each uses BLAS, you end up with a very suboptimal case where more threads than cores are employed. For this reason it looks like many packages that vendor some BLAS in their wheel build it to be single-core. And again this is a performance loss.<br> <p> 3. it makes packages bigger than they should be and memory usage larger than it should be, with an obvious loss in cache performance.<br> </div> Thu, 17 Jul 2025 06:51:36 +0000 This does not feel radical enough https://lwn.net/Articles/1030072/ https://lwn.net/Articles/1030072/ aragilar <div class="FormattedComment"> The elephant in the room is Windows. Once you're willing to drop Windows there are lots of options, but practically once you include Windows you only have conda (and various reimplementations of it).
For some reason people want to push the PyPI/PyPA ecosystem to be a clone of the conda ecosystem due to some perceived issues with conda (when it's likely the issues people have with conda are due to the constraints needed to support such a setup).<br> </div> Wed, 16 Jul 2025 13:36:16 +0000 This does not feel radical enough https://lwn.net/Articles/1030018/ https://lwn.net/Articles/1030018/ DemiMarie <div class="FormattedComment"> By “reproducible”, do you mean that the results are bit-for-bit identical, or that the results are scientifically equivalent? The former is often impossible. The latter is much more likely to be feasible.<br> </div> Wed, 16 Jul 2025 01:45:33 +0000 This does not feel radical enough https://lwn.net/Articles/1029996/ https://lwn.net/Articles/1029996/ gray_-_wolf <div class="FormattedComment"> Since (at least partially) the motivation here is scientific computations and reproducibility, I wonder whether it would not be better to just adopt GNU Guix (or, I guess, Nix). You would be able to express dependencies on native libraries and fine-tune to specific architectures as much as you would want. If binary substitutes were available, the package would be fetched; otherwise it would be compiled from source.<br> <p> Well, I just wonder whether "wheel" is really the best model to base reproducible (scientific) computing on. Was a more radical approach considered?<br> </div> Tue, 15 Jul 2025 21:47:37 +0000 Nondeterministic installations https://lwn.net/Articles/1029883/ https://lwn.net/Articles/1029883/ donald.buczek <div class="FormattedComment"> <span class="QuotedText">&gt; I don't see much difference between this proposal and the sort of "./configure &amp;&amp; make &amp;&amp; make install" dance that is very common in any sort of compiled language where it sets various flags and attributes based on the environment it finds itself in.</span><br> <span class="QuotedText">&gt; Unless that stuff is well documented and thought out there has always been an element of 'reverse engineering' when dealing with that sort of thing.</span><br> <p> Well, while in theory configure-scripts could also do all kinds of unwanted stuff, most of the time they are generated by GNU Autotools. And while GNU Autotools is archaic and unnecessarily complex, it has set and conforms to standards which are very much welcome. Autotools-generated scripts usually respect environment variables like $DESTDIR. They won't attempt to "edit" files into your /etc or replace some library in your /usr/lib with patched versions.<br> <p> Typically, a "configure" checks for the "availability" of a feature, for example a library with a specific minimum API version. If it's not available, the script will either abort the configuration process or adjust the configuration to exclude features that require the missing library. However, once you have such a required component in your system, there is no reason it would go away. If the dependencies are well-designed, upgrading them typically does not cause problems. A typical configure script will not check for hardware attributes like the amount of memory or the number of cores or the specific version of your graphics card.<br> <p> Yes, sometimes we can't just rebuild very old stuff, because of changes in the C-compiler or when the required software was not as well designed.
But generally, packages with Autotools work well for us.<br> <p> <span class="QuotedText">&gt; It could range from "easier than dealing with rpm or deb files" to "being nearly impossible to use" depending on the details of their implementation.</span><br> Right. Maybe the problem is as follows: When I think of "vendor plugins to detect various aspects of the environment", I think of the NVIDIA driver installer and that is a major pain point.<br> <p> </div> Tue, 15 Jul 2025 09:12:53 +0000 Every language reinvent the wheel https://lwn.net/Articles/1029879/ https://lwn.net/Articles/1029879/ taladar <div class="FormattedComment"> That is never going to happen, mostly because languages differ quite significantly in how they handle compile-time options and most of the distro package managers other than portage don't handle them at all. Portage on the other hand has a lot of special casing for situations where the simple on/off USE flags don't work.<br> <p> Take e.g. Rust features: they are also on/off flags like USE flags in portage but work entirely differently again, with cargo assuming that any version of a crate with a specific feature enabled (regardless of the values of other features) will satisfy a dependency that asks for that crate with that feature.<br> <p> If you wanted a grand unification of all package managers you should probably first start trying to define what a package actually is (including details like optional compile-time features and other compile-time options such as alternative dependencies (think openssl/gnutls/...)). You would quickly realize that this is somewhere between hard and impossible.<br> </div> Tue, 15 Jul 2025 07:17:49 +0000 Nondeterministic installations https://lwn.net/Articles/1029855/ https://lwn.net/Articles/1029855/ rjones <div class="FormattedComment"> I don't see much difference between this proposal and the sort of "./configure &amp;&amp; make &amp;&amp; make install" dance that is very common in any sort of compiled language, where it sets various flags and attributes based on the environment it finds itself in. Unless that stuff is well documented and thought out there has always been an element of 'reverse engineering' when dealing with that sort of thing.<br> <p> Of course, besides the downloading part. <br> <p> Provided they "do the right thing" and support isolated or offline installs intelligently and in a standardized way then you shouldn't have to do any reverse engineering. Whenever dealing with Python or most any other language you have to figure out a way to cache it to your networks or locally if you want to avoid depending on pulling whatever is hosted on the internet. <br> <p> It could range from "easier than dealing with rpm or deb files" to "being nearly impossible to use" depending on the details of their implementation. <br> </div> Mon, 14 Jul 2025 19:07:01 +0000 Every language reinvent the wheel https://lwn.net/Articles/1029780/ https://lwn.net/Articles/1029780/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; rpm, dpkg or portage.</span><br> <p> And here lies the crux of the problem.
Rpm and dpkg would need massive re-engineering to cope with what portage does, I suspect, while portage is massive overkill for what rpm or dpkg do.<br> <p> And then one only has to look at Red Hat and SuSE - two rpm-based distros that (less now than previously) were pretty much incompatible despite sharing the same package manager.<br> <p> Everyone thinks that rpm distros were mostly cross-compatible, but that was for the same reason that dpkg distros are compatible - dpkg distros are all (afaik) Debian derivatives. MOST rpm distros are descended from Red Hat, but SUSE is descended from Yggdrasil/Slackware (and the rest... I can't remember the complications).<br> <p> Cheers,<br> Wol<br> </div> Mon, 14 Jul 2025 13:05:31 +0000 Every language reinvent the wheel https://lwn.net/Articles/1029778/ https://lwn.net/Articles/1029778/ vivo <div class="FormattedComment"> Managing dependencies is much more difficult than language architects realize; that's why Linux distribution package managers are usually nontrivial programs.<br> <p> I do really hope that the most used languages in open source converge to one and only one package manager, be it based on rpm, dpkg or portage.<br> This would make the life of the distributions much easier and the entry barrier for developers lower; as a bonus, it would improve the quality of dependency management for everybody.<br> </div> Mon, 14 Jul 2025 12:54:34 +0000 Nondeterministic installations https://lwn.net/Articles/1029653/ https://lwn.net/Articles/1029653/ dbnichol <div class="FormattedComment"> That was basically my first thought, too. You definitely want to have an explicit path so the installer is deterministic. There are lots of cases where you need to know that you're going to consistently resolve a specific set of packages. However, having heuristics that are more likely to just DTRT is a win for casual developers. I didn't read the actual proposal, but I'd be surprised if this wasn't being considered already.<br> </div> Fri, 11 Jul 2025 22:54:05 +0000 Nondeterministic installations https://lwn.net/Articles/1029622/ https://lwn.net/Articles/1029622/ donald.buczek <div class="FormattedComment"> Thanks. I didn't want to go into every detail, but we already have our own solutions utilizing wrappers, environment variables, and symbolic links. These provide the programs with the runtime environment they need and enable us to install several versions and variations of the same software stack in parallel, which users can select from at runtime. These tools can also hide hardware complexity from the applications by providing hardware-specific variants of libraries etc.<br> <p> If software just follows standards, everything is fine. For example, use execlp() to activate external programs, as it respects the PATH environment variable. Activate shared libraries with the dynamic linker, which honors LD_LIBRARY_PATH. Open files by their canonical names, which allows redirection by symbolic links. <br> <p> If, on the other hand, an installer 'locates' all the files at installation time and configures them with resolved paths, the setup may fail if the environment changes.
This can occur, for example, if you switch to a machine with a different GPU, necessitating a different driver version and library variants.<br> <p> But even outside our environment, the prototypical user with their personal notebook and nothing else would be annoyed when updating one thing calls for re-installation of another thing, because the vendor-supplied oracle needs to look at the environment a second time. Backup and restore to a replacement system will no longer work.<br> <p> </div> Fri, 11 Jul 2025 14:36:29 +0000 Nomdeterministic installations https://lwn.net/Articles/1029430/ https://lwn.net/Articles/1029430/ SLi <div class="FormattedComment"> Tools like poetry now have problems, I believe, resolving dependencies because packages can dynamically compute some metadata relevant to the resolving (I forget the exact details). Obviously made worse by not being able to get any of that without downloading the package.<br> <p> So this proposal sounds like something that would, perhaps, solve the needing-to-download part. But how would it play otherwise with the problem? Or is computable dependency resolution a lost cause?<br> </div> Thu, 10 Jul 2025 16:45:43 +0000 Nondeterministic installations https://lwn.net/Articles/1029352/ https://lwn.net/Articles/1029352/ gpth <div class="FormattedComment"> I don't have all the details of your environment and you might have already considered this, in which case I apologize in advance :).<br> <p> To me, Python packaging, even with frozen, specific versions, seems not enough for complete reproducibility, exactly for the reasons you highlighted. Without knowing all the details, multi-architecture container images (i.e. <a href="https://developers.redhat.com/articles/2023/11/03/how-build-multi-architecture-container-images">https://developers.redhat.com/articles/2023/11/03/how-bui...</a>) sound like a solution that would provide better reproducibility here.<br> </div> Thu, 10 Jul 2025 12:53:48 +0000 Nondeterministic installations https://lwn.net/Articles/1029346/ https://lwn.net/Articles/1029346/ donald.buczek <div class="FormattedComment"> I apologize for the lazy single-word negative comment and the typo in the subject line.<br> <p> I'm not a fan of this for multiple reasons. Let me elaborate on just one of these reasons, which I wanted to express with the term "nondeterministic installation".<br> <p> We run our own in-house distribution. You can think of me as a packager. The systems on which we install software are different from those where the software eventually runs. So every installer which wrongly assumes that the environment it sees at installation time is the same as at runtime needs extra work from us. We operate in a scientific environment, and want reproducibility. Of course, if at all possible, we prefer building from source. Regardless of whether we build from source, whenever an installer attempts to download something, we need to reverse-engineer the process. We then download the data in advance and modify the installer to use the local copy. We don't want the installation to fail or produce different results the next time because something external has changed or is no longer available.<br> <p> Additionally, I wouldn't trust "vendor plugins" at all to correctly detect "various aspects of the environment".
In my experience they only work for standard installations on one or two big distributions and nothing else.<br> <p> </div> Thu, 10 Jul 2025 10:08:34 +0000 Nomdeterministic installations https://lwn.net/Articles/1029340/ https://lwn.net/Articles/1029340/ cpitrat <div class="FormattedComment"> I just thought of a potential additional concern: supposing the script runs and returns which version of the package to use, one idea that comes to mind is some targeted attack: if hostname == target: return "compromised_package"<br> <p> This supposes, of course, that the source cannot be trusted or has been compromised. The script would have to be obfuscated enough that the targeting is not obvious (e.g. relying on some very rare piece of hardware that the target is known to use). <br> <p> A similar thing can already be done in the code of the package itself, but it would be in plain view for any user. With the exotic package, it could be hidden (unless there's a way to identify that the package is not built from published source) in a package that nobody is likely to review.<br> </div> Thu, 10 Jul 2025 08:01:19 +0000 Nomdeterministic installations https://lwn.net/Articles/1029339/ https://lwn.net/Articles/1029339/ cpitrat <div class="FormattedComment"> IIUC the idea is to download some code (a small Python script, probably) which runs on your machine and returns some information about your system.<br> <p> I guess the "terrible" comes from the concern of transparently executing remote code. <br> <p> Yet you're already downloading code that you'll execute anyway, coming, in theory, from the same source. And I think pip is already executing some Python code at installation time. So I'm not sure if this idea should bring additional new concerns ...<br> </div> Thu, 10 Jul 2025 07:54:18 +0000