
Python packaging and its tools

By Jake Edge
March 1, 2023

Python packaging

The Python-packaging discussions continued in January and February; they show no sign of abating in March either. This time around, we look (again) at tools for packaging, including a brand new Rust-based entrant. There is also a proposal to have interested parties create Python Enhancement Proposals (PEPs) for packaging solutions that would be judged by a panel of PEP delegates in order to try to choose something that the whole community can rally around—without precluding the existence of other options. As always, it is all a difficult balancing act.

One tool

Picking up from where our last article left off, there was interest in finding a single tool that the Python Packaging Authority (PyPA) could push as the default. But Donald Stufft said that he is skeptical that the PyPA has the means to "bless a singular tool in a way that people will actually recognize it as 'the' tool". Beyond that, though, he is not sure that the clamored-for single, unified tool is even possible; everyone expects too much of such a tool.

I suspect that 100% of the users that want a unified tool, just blindly assumed that whatever their preferred workflow, or something like it, would of course be included in that tool, and they don't consider that they might have to make drastic changes to their workflow to get it-- but somebody is going to have to make drastic changes, because the reality is what exists now in the world are so varied that a singular tool can't possibly solve them all IMO.

Greg Roodt wondered if an enhanced pip might be the right path. Pradyun Gedam agreed, noting that pip already occupies the privileged/default position. Even though it is not sensible to "combine all the workflows/innovations into one thing", getting to a 90% solution is "not an intractable task", he said.

Paul Moore was concerned that adding these features to pip is not something that can be done quickly; "pip has a lot of technical debt which we would need to pay off before we could add a lot of extra complexity". Roodt acknowledged that, but, like Gedam, thought a few extra pieces would be sufficient to help simplify the ecosystem substantially. Stufft said that adding features to pip "is probably the least controversial way of arriving at a unified tool".

Both Moore and Stufft were worried about adding the virtual-environment management that is offered by other tools to pip, but they did discuss some possible approaches. Brett Cannon cautioned that, based on some investigation of virtual-environment workflows he had done recently, "there is no 90% answer, so there will be some teeth gnashing regardless of the decision".

Poetry creator Sébastien Eustace wondered if there was even a need for the PyPA to "endorse or promote a single tool". He noted that Poetry is an independent packaging tool that came about because of missing pieces in the PyPA tool set; it was never endorsed by the PyPA but still has become "the second most downloaded 'packaging' tool". Stufft said that the push for a single recommended tool arose because it is one of the most requested features from users; the current status quo works, but users are not happy with it. Any endorsement would simply set the default:

I don't think anyone is suggesting preventing there to be options. The question is whether there should be a recommended or "default" option, not whether we should provide an only option. Obviously users want it, and they don't feel well served by the status quo.

Ofek Lev was concerned about the "massive undertaking" required to add these extra features to pip; throughout, he has been advocating Hatch as a better candidate for the unified tool. Stufft's (and others') arguments boil down to a question of practicality, though; pip exists, it is used by a majority of Python users, is already recommended by the Python Package Index (PyPI), so it is the default default, so to speak. But Lev thinks that fundamentally changing pip will be difficult to do for a number of reasons:

Do we think the backends of Flit, Hatch, Poetry, PDM, etc. were created just for fun or because PEP 517 told us we could? No, it was because setuptools was too difficult to contribute to. And consider in that case that is merely improving upon its central and only purpose of building packages. In the case we're talking about here we're in a code base of equivalent size with even more complexity and we're talking about not just adding new features but fundamentally changing what it does/is.

It is probably not surprising that Stufft disagreed; he does not think it constitutes a fundamental change to pip, just further evolution of the tool:

In fact, pip can start adding those features today, without anyone's permission, and I suspect if they did so the "please provide a unified tool" talking point would just go away, because pip is already the default tool, it's just implementing the features that people keep asking for.

However, the ability of the pip maintainers to find the time to do that work is worrisome to various commenters. "H. Vetinari" said that it is "an intriguing idea to flesh out pip in this way", but there is a need to expand the maintainer group in order to "realistically grow all those features in something less than 'years'". Moore agreed that it would require some "fundamental changes to how pip is maintained" to speed up the development of these extra features. He had a long list of reasons that it would take a long time, including limited maintainer bandwidth and the preservation of legacy pip workflows.

But Moore also agreed that choosing pip neatly sidestepped the "which tool to bless" question; "I don't think we should underestimate the challenges", however. While users want a unified tool, they may not want to wait as long as it would take; "In 3-5 years? Maybe we could get pip to the point of being that tool in that sort of time period. In 6 months? Not a chance." Stufft said that users are mostly reasonable and will just want to see progress toward the goal; "I don't think there is a world where they get it in 6mos no matter what we do".

Ralf Gommers is skeptical about the pip-based plan. He said that "the weight of history, the complex and legacy code, the backlog of issues and difficulty of working on pip, and the important lower-level role as a pure installer it already fulfills are already stacked against this idea". He suggested that some combination of Poetry, Hatch, and PDM might be the right approach; "Each has its own problems and isn't complete enough, however if you'd take the best features of each you'd have about the right thing." Authors of each of those tools have commented in the thread, he said, so they could simply get together and produce a unified tool:

[...] I think it's safe to say that if these projects would join forces, we'd have something very promising and worth recommending as the workflow & Python project management tool. And it doesn't require a 200+ message thread with everyone involved in packaging agreeing - if a handful of authors would agree to do this and make it happen, we'd be good here and could "bless" it after the fact.

Gedam announced his (also lengthy) blog post that attempted to summarize his views and fill in lots of the background on the topic. He concluded that adding features to pip would be desirable, but that it is daunting. Developers interested in packaging have generally developed their own tools:

[...] we've made it fairly tractable to "build your own" in a sandbox that lets you ignore the need to support entire swaths of workflows, and that's something you can't compete with easily for contributor experience. And, when the alternative is "spend a few months trying to implement something in a 'legacy' codebase, while catering to needs that you don't have, also convince a bunch of people with limited availability that your idea is a good one and wait for them to review what you wrote", it's not surprising that we end up with a bunch of "new things" and have multiple groups building multiple workflow tools.

We still don't have agreement that this is the direction that we, as a community, want pip to go.

Battling PEPs?

Stufft was generally in agreement with Gedam's "excellent post", but he did take exception to the idea that the community is not in agreement. He believes that most users are in agreement that pip (or some other tool that is shipped with Python) should provide the "unified experience". Since pip is that tool, it should be enhanced, or some other tool should be shipped with Python, which would require agreement from the Python core developers by way of the steering council (SC). He proposed a kind of PEP "battle" to figure out which direction to go:

Interested parties write a PEP on how they think we should solve the "unification" problem within some time frame, all of these PEPs will have the same set of PEP-Delegates, the various proposals will be discussed just like any other PEP, then the PEP-Delegates will pick one and that's the direction we'll go in. [...] If they are unable to come to an agreement, then it will get kicked up to the SC to make a choice.

My recommendation is that we do something a little unorthodox and instead of having a singular PEP-Delegate, we instead have a team of 3 of them, who will together select the direction we go in. My rationale here is that this is our first time making a decision quite like this, it's going to be a decision with a large impact, and there is no one singular person who could make this decision who isn't biased in some way.

Christopher A. M. Gerlach (C.A.M. Gerlach), who is one of the PEP editors, further refined the idea and offered his assistance. There has been at least one volunteer for the PEP-Delegate group that would evaluate the PEPs, but, as of yet, there has been no visible action on the creation of PEPs to consider. It is not at all clear that those who might be in a position to propose a PEP and push it through to "completion" want to put in the enormous effort required to do so. Multiple competing visions seem like they may be even more of a stretch, but we shall see—it has only been a little over a month since Stufft suggested that path.

A new tool

On January 20, though, Nathaniel J. Smith announced a new tool (and binary format) that, to a certain extent, upends the usual order of things. He noted that one of the goals of Kushal Das, who was one of the authors of PEP 582 ("Python local packages directory") back in 2018, was that Python beginners should only need to download a single thing in order to get started with the language. The PEP, which is still being discussed, was a means to that end. Smith looked at the problem from a different angle:

Historically, our tools have started with the assumption that you already have a Python, and now you want to manage it. That means every tool needs to be prepared to cope with every possible way of installing/managing Python. It means a beginner-friendly workflow tool has to be part of the interpreter (the main motivation for PEP 582), even with all the limitations that imposes [...]

But what if we went the other way, and uploaded CPython to PyPI, so you could pip install python? Well, OK, you couldn't actually pip install it because pip is written in Python, but pretend we had a tool that could do this. Then Kushal's beginners could install this one tool, and it could bootstrap Python + the packages they needed.

Pybi is Smith's format for packaging CPython binaries for distribution, which is similar in form to the wheel format used by PyPI. That way, some tool could download the latest Python, install it, and pre-populate the install with some packages of interest from PyPI. As noted, though, that tool would not have access to a Python environment, so Smith also developed posy in Rust. In part, posy is meant to be a way for Smith to exercise his Rust skills. The GitHub site README starts with an homage, calling the project: "Me messing around in Rust for fun (just a hobby, won't be big and serious like pip)". The eventual goal sounds fairly serious, however:

  • A project-oriented Python workflow manager, designed to make it easy for beginners to write their first Python script or notebook, and then grow with you to developing complex standalone apps and libraries with many contributors.
  • A combined replacement for pyenv, deadsnakes, tox, venv, pip, pip-compile/pipenv, and PEP 582, all in a single-file executable with zero system requirements (not even Python).

The reception to the announcement ranged from generally positive to something approaching "over the moon", though there are still plenty of reservations, of course. For the most part, posy simply implements the existing packaging standards, but it also takes into account the lifecycle model that was discussed at a 2018 core sprint. That model ranges from beginners (or, more broadly, simple projects, perhaps consisting of a handful of scripts) through deployable web applications, reusable libraries, and standalone applications; it has come up multiple times in these packaging discussions.

Gedam was concerned that posy is inventing yet another scheme for virtual-environment handling, among other things; he raised the inevitable specter from the xkcd: Standards comic. Moore agreed with some of those concerns, but was happy to see posy take the full lifecycle into account:

Most tools and approaches I've seen either frame themselves as "beginner friendly" (stage 1 and maybe 2), or as aimed at stage 3 (deployable webapp/reusable library/standalone app) and later. And both groups assume that stages 1 and 2 - "simple scripts" and "sharing with others" are beginner workflows, not needed by more advanced users [...] Or at least, that's how the documentation, examples and discussions feel to me.

I've no idea whether this project will succeed in unifying the full lifecycle described in that document. I don't know if it'll make our existing problems worse. I'm concerned about the fact that it's inventing new mechanisms for things like isolation that may or may not work. I suspect that a model based around heavy manipulation of sys.path will cause huge problems for the static typing community, for example. But I'm pleased that someone is looking at a problem which I feel like [I] struggled to express well enough to get the existing tools to pay attention to [...], and I'm glad that we're still innovating, and not just fighting to consolidate what we have and deal with legacy issues.

While "some of the ideas here are interesting", Stufft said, there were some things that he was "not particularly enthused about", including an unclear deployment story, an unnecessary extra binary format, and the implementation language. "I personally enjoy Rust, but I think it speaks to a serious shortcoming in the idea that it relies on being written in an external language to make it viable." He argued that the only Rust property that was being employed was that it can create a standalone binary, which is really just a property of compiled languages; it could have "used one of the various strategies that exist to create a single file executable out of Python", instead. He is concerned that it gives the wrong impression:

The language choice is a short coming, because it has the implication that the packaging tool isn't capable enough to produce real world software that is meant to be deployed to end user machines, machines that you can't rely on the system Python on. After all, there's nothing inherently special about posy here, it's just an application that wants to run without the dependence of an existing Python install.

Smith pointed out that posy would not exist at all if it were not written in Rust, however, since that was part of why he wrote it. Stufft acknowledged that, but is concerned that by sidestepping (via Rust) the problem of delivering a Python command-line application to users, that important part of the overall Python-packaging story is being skipped as well. He clarified that point further in another post: "My assertion is that packaging things for distribution to end users is also part of the packaging story, because well it is, and it's one of the most chronically underserved parts of our packaging story [...]".

The thread continued on for a ways, and it appears there is a fair amount of enthusiasm for Smith's approach. Where that goes from here is hard to say, but there is still plenty of work needed to get to the point where posy can fully fill the niche he envisions for it. It may well make sense to merge the pybi and wheel formats into a "wheel 2.0" or similar; there is talk of doing so, which might be an effort that is independent of posy's future.

A new thread

As January came to a close, the thread for part one of the packaging-strategy discussion wound down and was eventually closed. In early February, the thread for the second part of the strategy discussion was opened, though it seems that much of the energy has gone out of the conversation(s), as the new thread had a rather desultory tone. Moore wondered if the discussion time might be better spent elsewhere. He asked: "But are these strategy discussions likely to deliver anything better, or are they just taking energy and bandwidth away from the people working on making progress?" Part of the problem is that the PyPA is effectively simply an interest group, rather than a decision-making body:

Discussions like this tend mostly to demonstrate that there's no uniform view on direction among PyPA members (let alone among non-PyPA projects like conda and poetry). [...] There wasn't much consensus on the previous discussion, so does that mean we have no strategy? Or will someone propose a strategy, in which case without a change in PyPA governance, what difference will that make? (Even with a change in governance, I don't see anyone imposing a particular direction on packaging projects - there's too much history of independence for that to happen any time soon).

That led Gedam to start something of a meta-thread where he responded to the frustrations that had been voiced about the discussions; he sympathized with those feelings, but felt that progress was being made. Beyond that, despite people feeling a sense of urgency to immediately solve the packaging problems, it is going to take a while to get there. "We're not going to magically/quickly solve issues that are happening at a larger-than-ever scale and that have grown into their current shape over more than a decade!"

Steve Dower suggested that some kind of focused, in-person gathering might be a better way to get some kind of resolution, though other options are possible: "Less ideal is to have regularly scheduled meetings in amongst other distractions, and at the bottom end is to have an online-only, text-only, open-invite discussion without a specific goal (sound familiar? :) )". Gommers wondered if there were any plans for such a gathering, but Gedam said that there were not, at least yet.

There is still fruitful discussion going on in various threads in the Packaging category of the Python discussion forum. It is clear that none of these questions or problems is going to be resolved anytime soon, though progress is slowly being made in various areas, just as it has been over the past decade or more.

It is probably the right time to let things play out a ways before we check back in on this freewheeling Python-packaging conversation. It will be interesting to see what, if anything, concrete comes out of it. There is, already, the pypackaging-native site, which describes many of the problems, but are there PEPs in the works to solve some of them? While the discussion is somewhat fragmented—and a bit fractious at times—there is a lot of attention being drawn to the problems right now, which may help lead the community to a workable path for a solution (or, more likely, solutions). Stay tuned ...



Python packaging and its tools

Posted Mar 2, 2023 9:56 UTC (Thu) by egor_tensin (subscriber, #118591) [Link] (1 responses)

I really wish some kind of solution existed. Currently, the official docs at https://packaging.python.org/en/latest/tutorials/packagin... offer _four_ options for "backends" in pyproject.toml. I really don't know or care about the difference between them, and as far as I can see, it's not even explained. Where did the setup.cfg file go? It was there a couple of versions back. What a mess!

Python packaging and its tools

Posted Mar 2, 2023 14:28 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

setup.cfg is still there if you use setuptools, but you can also put all your configuration in a pyproject.toml file.

Information on how to build the wheel can be in other files (setup.py for setuptools, meson.build for mesonpy, etc.) but the common information about the project (dependencies, scripts, name/version/author/description, etc.) is always in pyproject.toml.
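For illustration, a minimal pyproject.toml for a setuptools-backed project might look something like this (the project name and dependency here are hypothetical):

    [build-system]
    requires = ["setuptools"]
    build-backend = "setuptools.build_meta"

    [project]
    name = "example-project"
    version = "1.0"
    dependencies = ["requests"]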

I looked at it a couple months back in the context of writing a program which had Cython bits in it. It's actually pretty nice—except that finding the right documentation is a Herculean task.

Python packaging and its tools

Posted Mar 2, 2023 11:16 UTC (Thu) by LtWorf (subscriber, #124958) [Link]

Thanks for this. I try to follow their discussion, but there are too many comments for me to really manage to keep track of it.

Python packaging and its tools

Posted Mar 2, 2023 12:33 UTC (Thu) by cyperpunks (subscriber, #39406) [Link] (3 responses)

> The language choice is a short coming, because it has the implication that the packaging tool isn't capable enough to produce real world software that is meant to be deployed to end user machines, machines that you can't rely on the system Python on.

It's sad, but that's my current conclusion, it's not just packaging.

The short lifetime of each Python 3.X release combined with frequent breaks for backward compatibility and the unique ability in Python to create runtime issues rather than build time issues turns it into a no-go for serious projects.

Python packaging and its tools

Posted Mar 2, 2023 13:46 UTC (Thu) by pizza (subscriber, #46) [Link] (1 responses)

> The short lifetime of each Python 3.X release combined with frequent breaks for backward compatibility and the unique ability in Python to create runtime issues rather than build time issues turns it into a no-go for serious projects.

Of course, by the time your organization realizes this you're so far down the rabbit hole the only thing you can do is keep digging deeper into technical debt (or be branded as the person/group who was responsible for the project being cancelled)

Python packaging and its tools

Posted Mar 3, 2023 1:38 UTC (Fri) by kenmoffat (subscriber, #4807) [Link]

As someone who only uses python modules as a build-time or runtime dependency, I totally agree with this. But for those of us who have to, or choose to, maintain existing systems there is one bright spot - once python3.N.x is working on a system I can update python through the 3.N series and expect that series to be maintained for longer than the old system. Of course, if you are into really longterm systems then that is not a lot of help.

I usually keep current and some past desktop systems available and maintained for relevant vulnerabilities for a year or two, although I'll build new (LFS/BLFS) systems frequently. Before I discovered that past python minor versions did get updates for vulnerabilities it was a pain in the proverbial to rebuild everything which created python modules.

But as a from-source builder I'm still trying to forget the aggravation of the move to wheel and the general deprecation of the old ways of specifying dependencies and how to run tests.

At the risk of starting a flamewar - at least with perl modules you can expect that in 99.9% of cases the tests run, with meaningful errors if a test fails or a dependency is missing, and similarly they describe the dependencies for building, testing, and runtime. With python it really feels like falling into a snake pit.

Python packaging and its tools

Posted Mar 2, 2023 15:54 UTC (Thu) by taladar (subscriber, #68407) [Link]

It is not even just projects.

Personally I avoid Python as a language for tools I merely want to use as much as I can because I don't want to come to rely on tools that will inevitably break at some point at runtime like so many Python tools I have used in the past have done.

Next time someone asks why, I will probably link them to this series of articles here on LWN too, as they illustrate quite well the point that Python can't really be relied on.

Python packaging and its tools

Posted Mar 2, 2023 16:31 UTC (Thu) by NN (subscriber, #163788) [Link] (17 responses)

Some discussion I find missing is lessons to be learnt from the js ecosystem. Or really, from any ecosystem with a non-terrible packaging system (R, C, etc). In R, you just download the thing. In C, you just get the file and the headers. In js, npm takes care of things for you. In Linux, you have various package managers which each do their own thing, but there is a certain level playing field because you can build from source. But in Python, the process is just uniquely terrible.

Like, coming from js and with a very superficial knowledge of Python, the core question for me is why can't you have something like npm for Python. What sacrifices do you have to make?

Python packaging and its tools

Posted Mar 2, 2023 18:10 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (16 responses)

As I have explained in comments on prior articles in this series, it's a two-part problem:

* History.
* C extensions.

"History" basically boils down to this: When Python was first getting started as a Real Language™, language-specific package management wasn't really a thing. As a result, they did not provide any native tooling for it, and everyone just sort of figured out their own solution, or let their Linux distro do it for them. They have been trying to untangle the resulting technical debt for the last couple of decades or so, but nobody seems to agree on how, or even whether, to standardize a solution.

C extensions are a more interesting issue. Compiling and distributing C extensions is complicated, because you don't know what libraries (and versions of libraries) will be available on the target machine. That leaves you with four options:

1. Pick a "reasonable" set of libraries and versions that are "probably" installed. This is basically what manylinux does, and it's why it's called "many" rather than "all." The drawback is that this is probably going to be a fairly small set of relatively old libraries, so it doesn't really solve the problem very thoroughly.
2. Vendor and/or statically link everything, like Rust and Go. Now all of the distros hate your guts because they have to unvendor your work. OTOH, there is a reason that multiple other languages have reached this conclusion. Distros may just have to learn to live with vendoring.
3. Make your own private enclave on the target machine, where you can install whatever libraries you want, and package those libraries yourself. In other words, you basically roll your own package management, not just for Python, but for C libraries and all other dependencies. This is what Conda does, and I imagine the distros would hate this even more than (2), if all Python software were distributed like that. Fortunately, most things are packaged for both Conda and Pip, so distros can just quietly pretend it doesn't exist.
4. Distribute source code, and if it doesn't compile, it's the user's problem. This is what Pip does in practice (whenever manylinux is inadequate).

Python packaging and its tools

Posted Mar 2, 2023 18:58 UTC (Thu) by k8to (guest, #15413) [Link] (5 responses)

Thanks, this was a pretty helpful comment. It crystallized why the problems I've had with pip exist, which I sort of half-understood before. Situations would happen like build automation expecting to slap down the cryptography package via pip and suddenly I'm debugging OpenSSL build problems. And this happens because someone else at the company thought pip was just the normal way to provide dependencies for their tool.

As a python developer, my frustration is that I want to deliver complete running packages to users. I want to give them a tar and/or zip that unpacks and works, but yet the (python) libraries I end up needing to use tend to only document pip as a means of getting the library working, and pip tends to lean towards assuming local install. And the ecosystem tends to lean towards shifting dependencies semi-often.

So it feels like I end up sort of crafting my own bad packaging hacks on top of packaging tools to excise unwanted C extensions and so on, to get a runs-everywhere redistributable. I end up feeling very fragile and foolish in this approach, but asking my customers to become pip experts is a non-starter.

Sometimes it feels like the easiest path is to rewrite my key selling applications in something other than python, but there are many years of sunk cost there.

Python packaging and its tools

Posted Mar 2, 2023 23:50 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

In practice, it is my opinion that some variant of (3) (such as Conda, Docker, Flatpak, or the like) tends to be the most portable way of distributing complete applications. Note that venv is *not* a good implementation of (3), because a venv is designed to be created in situ (rather than created in advance and distributed as a package). venvs offer relatively limited isolation from the host system, and also encode their absolute paths into various places (so that you can't easily relocate them). Note also that some Linux distros *really* don't like it when applications are distributed in this manner (but I don't know whether you care what they think).
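As a rough sketch of why relocation fails, a freshly created venv records absolute paths both in its configuration file and in the shebang lines of its scripts (the paths and version shown here are illustrative):

    $ python3 -m venv /home/user/venv
    $ cat /home/user/venv/pyvenv.cfg
    home = /usr/bin
    include-system-site-packages = false
    version = 3.11.2
    $ head -1 /home/user/venv/bin/pip
    #!/home/user/venv/bin/python3

Moving the venv directory leaves those embedded paths pointing at the old location.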

Python packaging and its tools

Posted Mar 3, 2023 9:48 UTC (Fri) by cyperpunks (subscriber, #39406) [Link]

> As a python developer, my frustration is that I want to deliver complete running packages to users.

Unless you have very deep knowledge about C, shared libraries, Rust and Python on all target platforms (macOS, Windows, and a series of Linux distros) it can't be done. It's just more or less impossible to distribute software written in Python in this way.

Python packaging and its tools

Posted Mar 11, 2023 7:06 UTC (Sat) by auxsvr (guest, #120007) [Link] (1 responses)

Python supports loading the dependencies from a single zip file, which is e.g. what PyInstaller uses to store all dependencies and produce an executable. https://github.com/yt-dlp/yt-dlp is an example of this method: the result is a single executable that is a zip file with a shebang line calling the interpreter.
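For example, the standard-library zipapp module can produce such an archive; in this sketch, the myapp package and its myapp.cli:main entry point are hypothetical:

    $ python3 -m zipapp myapp/ -m "myapp.cli:main" -p "/usr/bin/env python3" -o myapp.pyz
    $ ./myapp.pyz

The -p option writes the shebang line and the -m option names the callable that runs when the archive is executed.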

Python packaging and its tools

Posted Mar 13, 2023 17:10 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

That works as long as there aren't any compiled modules in the dependency tree, right? Where else does the runtime loader get the loadable content from for them?

Python packaging and its tools

Posted Apr 20, 2023 12:45 UTC (Thu) by Klavs (guest, #10563) [Link]

Problem is python depends on IMPORTANT system libraries - like the ssl lib.
So either you have to "test for specific distros" and then security updates are managed "by owner of system"... or you package it all - and YOU bear that responsibility.

If you want to take that on - then docker (or podman etc.) container images is probably the way to go.

Python packaging and its tools

Posted Mar 3, 2023 7:10 UTC (Fri) by LtWorf (subscriber, #124958) [Link]

> but for C libraries and all other dependencies

such as the entire rust toolchain as well

Python packaging and its tools

Posted Mar 3, 2023 17:22 UTC (Fri) by sionescu (subscriber, #59410) [Link] (7 responses)

It's more than that. The root of the problem is an obsession of pretty much all language communities (Perl, Python, Ruby, Rust, Erlang, Go, Javascript, Scheme, Common Lisp) with making their own build system and package manager that doesn't integrate with other languages except those commonly accepted as "system languages", i.e. C/C++/Fortran.

Imagine if there was a universal package manager that worked across languages, and that permitted various integrators to specify dependencies, like a Go library build-depending on an R script which depends on a Python script that depends on the Python interpreter which depends on a C compiler and a bunch of C libraries, etc...

That would make life easier for all languages and for distribution maintainers, but right now the best contender for a universal build system would be Bazel; imagine what the users of those languages would say at the prospect of depending on a Java project.

Python packaging and its tools

Posted Mar 3, 2023 17:56 UTC (Fri) by pizza (subscriber, #46) [Link] (1 responses)

> The root of the problem is an obsession of pretty much all language communities (Perl,

I don't think Perl should be on this list; Not only does it predate Linux itself (Perl 4 was released five months before Torvalds announced his Linux kernel), it has always striven to play (and integrate) well in others' sandboxes, as befits its initial focus as a "glue" language.

Also, I recall that Perl has had, for quite some time (at least a decade, likely even longer), the tooling to automagically generate deb and rpms from arbitrary CPAN packages, including proper dependencies. And that provides the basis of most of what's packaged in RH/Fedora-land.

Python packaging and its tools

Posted Mar 5, 2023 2:33 UTC (Sun) by cozzyd (guest, #110972) [Link]

Python setuptools has bdist_rpm but sadly it seems to be deprecated (and I don't think it properly expresses dependencies anyway...)

Python packaging and its tools

Posted Mar 5, 2023 9:04 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Speaking as a Google employee who regularly uses Blaze (the internal equivalent), Bazel is great if:

* You know what all of your dependencies are.
* You can enumerate all of them in a reasonable, machine-readable format.
* Preferably, your build process is at least halfway sensible. It doesn't "care" that much about the build environment, working directory, filesystem mount options, phase of the moon, etc., and just looks at the source code you tell it to.
* Everything is amenable to build automation. Nothing requires a human to e.g. click a button in a GUI, faff about with hardware, etc. just to build a binary.
* You want reproducible builds, or at least reproducible-by-default builds.
* You are willing to treat all of the above as strict requirements of all artifacts in your dependency graph, all the way down to raw source code of everything (or binary blobs, if you have binary blobs). You are willing to fix "irregular" build processes rather than accepting them as technical debt.

It's the last bullet point that tends to become a problem for people. There's always that one thing that has a ridiculous build process.

Python packaging and its tools

Posted Mar 7, 2023 6:12 UTC (Tue) by ssmith32 (subscriber, #72404) [Link] (3 responses)

Er. I was with you until the end, and, then, Bazel?

How about apt or yum?

Work across languages, and are far closer to universal. Certainly would make life easier for distributions, if people just, you know, used the package manager provided with the distribution. Certainly checks all the boxes you asked for.

If you're gonna harp on languages' obsession with having their own package managers, pitching a build tool that came out of Google's obsession with having their own.. everything.. is gonna generate a few funny looks.

Also, build tools are not package management. Or at least they shouldn't be.

Python packaging and its tools

Posted Mar 7, 2023 10:56 UTC (Tue) by farnz (subscriber, #17727) [Link]

Neither apt nor yum is a package manager - they're repository-handling tools built atop the dpkg and RPM package managers. And dpkg and RPM don't supply a build system - at core, they specify a way to run a build system, and then how to find the files produced by the build to turn into a binary RPM.

Once you have a build system, you have one problem to solve: how do I track down my dependencies and integrate them into my build? If you use distribution package managers, you end up with two problems:

  1. How do I support people who don't use the distribution package manager I chose? E.g. supporting Windows and macOS, or supporting Debian users if I base around RPM? This is the problem I already had, and I've still got it.
  2. What ensures that my tool always outputs distribution policy compliant packages, even as the distribution policy changes, and as my users do things that work for them?

Given that reusing the distribution package manager doesn't solve a problem, but does add one more, why would I do that?

Python packaging and its tools

Posted Mar 8, 2023 0:17 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> How about apt or yum?

They don't solve the problem of reproducible builds. By default, they will install the latest available versions of packages. So in practice every build environment will be slightly different.

It's possible to hack up a system that will add a "lockfile" with exact package versions that you can install in a container, but I'm not aware of any large-scale system using this approach.

Python packaging and its tools

Posted Mar 9, 2023 5:17 UTC (Thu) by pabs (subscriber, #43278) [Link]

Debian buildinfo files are basically lockfiles (plus some other things), and IIRC there are tools you can use to rebuild packages with the exact versions listed in them.

Python packaging and its tools

Posted Mar 11, 2023 11:17 UTC (Sat) by deltragon (guest, #159552) [Link]

Note that npm does seem to have found a solution to the C extensions problem as well, since the old node-sass (which has now been replaced with dart-sass, but was quite popular before) was just bindings to the C libsass.
It used either prebuilt binaries or node-gyp to compile on the user's machine at install time, and seeing how popular node-sass was/still is, that seems to have worked out.

Python packaging and its tools

Posted Mar 11, 2023 0:26 UTC (Sat) by jensend (guest, #1385) [Link]

One thing that comes to mind when discussing replacements for wheel etc:

Python installs tend towards having tons of tiny files. A "batteries included" Python installation with popular libraries can include over 100,000 files, with a median size of under 3kB.

This leads to wasted slack space and presumably to reduced performance due to filesystem overhead. How much will vary considerably across different filesystems.

On a 32GB+ ExFAT drive, for instance, the default cluster size is 128kB. So if you put a Python install on a USB stick, it can waste 10GB of slack space for 2GB of files.
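To make that arithmetic concrete, here is a rough Python sketch that estimates the slack for a directory tree, assuming the 128kB cluster size mentioned above:

    import os

    CLUSTER = 128 * 1024  # assumed ExFAT cluster size

    def slack(path):
        data = allocated = 0
        for root, _, files in os.walk(path):
            for name in files:
                size = os.path.getsize(os.path.join(root, name))
                data += size
                # each file occupies a whole number of clusters on disk
                allocated += ((size + CLUSTER - 1) // CLUSTER) * CLUSTER
        return data, allocated - data

    # 100,000 small files allocate one 128kB cluster apiece: about
    # 12.8GB on disk for roughly 2GB of data, i.e. ~10GB of slack.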

Libraries imported from archives, as with e.g. .jar, would avoid leaning so much on filesystem characteristics. Compression wouldn't be required. But zipimport doesn't work for many packages, and wheels are not designed to be used this way at all.

Python packaging and its tools

Posted Apr 5, 2023 22:15 UTC (Wed) by pnovotnak (guest, #107233) [Link]

IMO:

The language package manager should do exactly one thing: manage packages (and do it well). Typically the provided tool is correct, but it languishes, a community tool is allowed to grow, and the first thing that package manager does is bundle a task system. This flourishing of weird 3rd-party tools is a symptom of a problem. Though Golang is not interpreted, its package manager is the shining example here that I'm aware of. While not perfect, it solves the community's problems, it's extremely fast, it's simple to understand, and it doesn't really leave any room for competitors (as shown by it killing them all off when it was released).

One additional thing I've realized lately is that the lack of an explicit requirement for a language version manager sows confusion. I'd love to see interpreted languages installed by default via separate version managers (the version manager being the thing you use to install the interpreter), although perhaps a more perfect world would have a standard UNIX shim utility that can be used to juggle system-provided versions more safely. However, the way local installs are linked to the system installs is a potential source of confusion here. asdf would appear to be the closest thing we've got to that vision at the moment.

