
Python packaging and its tools


Posted Mar 2, 2023 16:31 UTC (Thu) by NN (subscriber, #163788)
Parent article: Python packaging and its tools

Some discussion I find missing is lessons to be learnt from the js ecosystem. Or really, from any ecosystem with a non-terrible packaging system (R, C, etc). In R, you just download the thing. In C, you just get the file and the headers. In js, npm takes care of things for you. In Linux, you have various package managers which each do their own thing, but there is a certain level playing field because you can build from source. But in Python, the process is just uniquely terrible.

Like, coming from js and with a very superficial knowledge of Python, the core question for me is why can't you have something like npm for Python. What sacrifices do you have to make?



Python packaging and its tools

Posted Mar 2, 2023 18:10 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (16 responses)

As I have explained in comments on prior articles in this series, it's a two-part problem:

* History.
* C extensions.

"History" basically boils down to this: When Python was first getting started as a Real Language™, language-specific package management wasn't really a thing. As a result, they did not provide any native tooling for it, and everyone just sort of figured out their own solution, or let their Linux distro do it for them. They have been trying to untangle the resulting technical debt for the last couple of decades or so, but nobody seems to agree on how, or even whether, to standardize a solution.

C extensions are a more interesting issue. Compiling and distributing C extensions is complicated, because you don't know what libraries (and versions of libraries) will be available on the target machine. That leaves you with four options:

1. Pick a "reasonable" set of libraries and versions that are "probably" installed. This is basically what manylinux does, and it's why it's called "many" rather than "all." The drawback is that this is probably going to be a fairly small set of relatively old libraries, so it doesn't really solve the problem very thoroughly.
2. Vendor and/or statically link everything, like Rust and Go. Now all of the distros hate your guts because they have to unvendor your work. OTOH, there is a reason that multiple other languages have reached this conclusion. Distros may just have to learn to live with vendoring.
3. Make your own private enclave on the target machine, where you can install whatever libraries you want, and package those libraries yourself. In other words, you basically roll your own package management, not just for Python, but for C libraries and all other dependencies. This is what Conda does, and I imagine the distros would hate this even more than (2), if all Python software were distributed like that. Fortunately, most things are packaged for both Conda and Pip, so distros can just quietly pretend it doesn't exist.
4. Distribute source code, and if it doesn't compile, it's the user's problem. This is what Pip does in practice (whenever manylinux is inadequate).
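For context on option (1): the manylinux compatibility promise shows up directly in wheel filenames. A minimal sketch of how the tags are laid out, assuming a plausible (not necessarily real) release filename:

```python
# A wheel filename encodes compatibility tags (PEP 427): distribution,
# version, Python tag, ABI tag, and platform tag. The manylinux platform
# tag is how pip decides a prebuilt binary wheel "probably" works on your
# distro. This filename is illustrative, not a specific real release.
wheel = "cryptography-41.0.7-cp311-cp311-manylinux_2_17_x86_64.whl"

dist, version, py_tag, abi_tag, plat_tag = wheel[: -len(".whl")].split("-")
print(plat_tag)  # manylinux_2_17_x86_64: assumes glibc >= 2.17 on the target
```

If no wheel's platform tag matches the installing machine, pip falls back to the source distribution, which is where option (4) kicks in.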

Python packaging and its tools

Posted Mar 2, 2023 18:58 UTC (Thu) by k8to (guest, #15413) [Link] (5 responses)

Thanks, this was a pretty helpful comment. It crystallized why the problems I've had with pip exist, which I only half-understood before. Situations would happen like build automation expecting to slap down the cryptography package via pip, and suddenly I'm debugging OpenSSL build problems. And this happens because someone else at the company thought pip was just the normal way to provide dependencies for their tool.

As a Python developer, my frustration is that I want to deliver complete running packages to users. I want to give them a tar and/or zip that unpacks and works, yet the (Python) libraries I end up needing to use tend to only document pip as a means of getting the library working, and pip tends to lean towards assuming a local install. And the ecosystem tends to lean towards shifting dependencies semi-often.

So it feels like I end up sort of crafting my own bad packaging hacks on top of packaging tools to excise unwanted C extensions and so on, to get a runs-everywhere redistributable. I end up feeling very fragile and foolish in this approach, but asking my customers to become pip experts is a non-starter.

Sometimes it feels like the easiest path is to rewrite my key selling applications in something other than Python, but there are many years of sunk cost there.

Python packaging and its tools

Posted Mar 2, 2023 23:50 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

In practice, it is my opinion that some variant of (3) (such as Conda, Docker, Flatpak, or the like) tends to be the most portable way of distributing complete applications. Note that venv is *not* a good implementation of (3), because a venv is designed to be created in situ (rather than created in advance and distributed as a package). venvs offer relatively limited isolation from the host system, and also encode their absolute paths into various places (so that you can't easily relocate them). Note also that some Linux distros *really* don't like it when applications are distributed in this manner (but I don't know whether you care what they think).
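The non-relocatability is easy to see from the stdlib alone; a minimal sketch (directory names are whatever tempfile hands back):

```python
import pathlib
import tempfile
import venv

# Create a venv and show that an absolute path gets baked onto disk,
# which is why you can't simply tar up a venv and unpack it elsewhere.
where = pathlib.Path(tempfile.mkdtemp()) / "env"
venv.create(where, with_pip=False)

# pyvenv.cfg records the absolute location of the *base* interpreter
# ("home = /usr/bin" or similar), tying the venv to this machine's layout.
cfg = (where / "pyvenv.cfg").read_text()
print(cfg)
```

The activation scripts similarly embed paths, so moving the directory leaves stale references behind.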

Python packaging and its tools

Posted Mar 3, 2023 9:48 UTC (Fri) by cyperpunks (subscriber, #39406) [Link]

> As a python developer, my frustration is that I want to deliver complete running packages to users.

Unless you have very deep knowledge about C, shared libraries, Rust and Python on all target platforms (macOS, Windows, and a series of Linux distros) it can't be done. It's just more or less impossible to distribute software written in Python in this way.

Python packaging and its tools

Posted Mar 11, 2023 7:06 UTC (Sat) by auxsvr (guest, #120007) [Link] (1 responses)

Python supports loading the dependencies from a single zip file, which is e.g. what PyInstaller uses to store all dependencies and produce an executable. https://github.com/yt-dlp/yt-dlp is an example of this method: the result is a single executable that is a zip file with a shebang line calling the interpreter.
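The stdlib exposes this packing step directly as `zipapp`; a minimal sketch of producing such a single-file executable:

```python
import pathlib
import tempfile
import zipapp

# Pack a trivial package into a single runnable file: a shebang line
# followed by zip data, the same shape as the yt-dlp executable.
src = pathlib.Path(tempfile.mkdtemp()) / "app"
src.mkdir()
(src / "__main__.py").write_text("print('hello from a zipapp')\n")

target = src.parent / "app.pyz"
zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")

# The file starts with the shebang, then the zip archive itself.
print(target.read_bytes()[:22])  # b'#!/usr/bin/env python3'
```

Running `./app.pyz` then works anywhere a compatible interpreter exists, subject to the pure-Python caveat raised in the reply below.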

Python packaging and its tools

Posted Mar 13, 2023 17:10 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

That works as long as there aren't any compiled modules in the dependency tree, right? Where else does the runtime loader get the loadable content from for them?

Python packaging and its tools

Posted Apr 20, 2023 12:45 UTC (Thu) by Klavs (guest, #10563) [Link]

Problem is python depends on IMPORTANT system libraries - like the ssl lib.
So either you have to "test for specific distros" and then security updates are managed "by owner of system".. or you package it all - and YOU bear that responsibility.

If you want to take that on - then docker (or podman etc.) container images are probably the way to go.

Python packaging and its tools

Posted Mar 3, 2023 7:10 UTC (Fri) by LtWorf (subscriber, #124958) [Link]

> but for C libraries and all other dependencies

such as the entire rust toolchain as well

Python packaging and its tools

Posted Mar 3, 2023 17:22 UTC (Fri) by sionescu (subscriber, #59410) [Link] (7 responses)

It's more than that. The root of the problem is an obsession of pretty much all language communities (Perl, Python, Ruby, Rust, Erlang, Go, Javascript, Scheme, Common Lisp) for making their own build system and package manager that doesn't integrate with other languages except those commonly accepted as "system languages", i.e. C/C++/Fortran.

Imagine if there was a universal package manager that worked across languages, and that permitted various integrators to specify dependencies, like a Go library build-depending on an R script which depends on a Python script that depends on the Python interpreter which depends on a C compiler and a bunch of C libraries, etc...

That would make life easier for all languages and for distribution maintainers, but right now the best contender for a universal build system is Bazel; imagine what the users of those languages would say at the prospect of depending on a Java project.

Python packaging and its tools

Posted Mar 3, 2023 17:56 UTC (Fri) by pizza (subscriber, #46) [Link] (1 responses)

> The root of the problem is an obsession of pretty much all language communities (Perl,

I don't think Perl should be on this list; Not only does it predate Linux itself (Perl 4 was released five months before Torvalds announced his Linux kernel), it has always striven to play (and integrate) well in others' sandboxes, as befits its initial focus as a "glue" language.

Also, I recall that Perl has had, for quite some time (at least a decade, likely even longer), the tooling to automagically generate debs and RPMs from arbitrary CPAN packages, including proper dependencies. And that provides the basis of most of what's packaged in RH/Fedora-land.

Python packaging and its tools

Posted Mar 5, 2023 2:33 UTC (Sun) by cozzyd (guest, #110972) [Link]

Python setuptools has bdist_rpm but sadly it seems to be deprecated (and I don't think it properly expresses dependencies anyway...)

Python packaging and its tools

Posted Mar 5, 2023 9:04 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Speaking as a Google employee who regularly uses Blaze (the internal equivalent), Bazel is great if:

* You know what all of your dependencies are.
* You can enumerate all of them in a reasonable, machine-readable format.
* Preferably, your build process is at least halfway sensible. It doesn't "care" that much about the build environment, working directory, filesystem mount options, phase of the moon, etc., and just looks at the source code you tell it to.
* Everything is amenable to build automation. Nothing requires a human to e.g. click a button in a GUI, faff about with hardware, etc. just to build a binary.
* You want reproducible builds, or at least reproducible-by-default builds.
* You are willing to treat all of the above as strict requirements of all artifacts in your dependency graph, all the way down to raw source code of everything (or binary blobs, if you have binary blobs). You are willing to fix "irregular" build processes rather than accepting them as technical debt.

It's the last bullet point that tends to become a problem for people. There's always that one thing that has a ridiculous build process.
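Concretely, "enumerate all of them in a machine-readable format" means every target declares its full input set up front, along the lines of this illustrative BUILD file (target and repo names are made up):

```python
# Bazel BUILD sketch: a Python binary whose dependency closure is declared
# explicitly; the sandboxed build cannot read anything left undeclared.
py_binary(
    name = "app",
    srcs = ["app.py"],
    deps = [
        "//lib:parsing",      # in-repo library target (illustrative)
        "@pypi//:requests",   # external dep resolved from a pinned lockfile
    ],
)
```

Anything that cannot be expressed as such a target is exactly the "irregular build process" the last bullet is about.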

Python packaging and its tools

Posted Mar 7, 2023 6:12 UTC (Tue) by ssmith32 (subscriber, #72404) [Link] (3 responses)

Er. I was with you until the end, and, then, Bazel?

How about apt or yum?

Work across languages, and are far closer to universal. Certainly would make life easier for distributions, if people just, you know, used the package manager provided with the distribution. Certainly checks all the boxes you asked for.

If you're gonna harp on languages' obsession with having their own package managers, pitching a build tool that came out of Google's obsession with having their own.. everything.. is gonna generate a few funny looks.

Also, build tools are not package management. Or at least they shouldn't be.

Python packaging and its tools

Posted Mar 7, 2023 10:56 UTC (Tue) by farnz (subscriber, #17727) [Link]

Neither apt nor yum are package managers - they're repository handling tools built atop the dpkg and RPM package managers. And dpkg and RPM don't supply a build system - at their core, they specify a way to run a build system, and then how to find the files produced by the build to turn into a binary package.

Once you have a build system, you have one problem to solve: how do I track down my dependencies and integrate them into my build? If you use distribution package managers, you end up with two problems:

  1. How do I support people who don't use the distribution package manager I chose? E.g. supporting Windows and macOS, or supporting Debian users if I base around RPM? This is the problem I already had, and I've still got it.
  2. What ensures that my tool always outputs distribution policy compliant packages, even as the distribution policy changes, and as my users do things that work for them?

Given that reusing the distribution package manager doesn't solve a problem, but does add one more, why would I do that?

Python packaging and its tools

Posted Mar 8, 2023 0:17 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> How about apt or yum?

They don't solve the problem of reproducible builds. By default, they will install the latest available versions of packages. So in practice every build environment will be slightly different.

It's possible to hack up a system that will add a "lockfile" with exact package versions that you can install in a container, but I'm not aware of any large-scale system using this approach.

Python packaging and its tools

Posted Mar 9, 2023 5:17 UTC (Thu) by pabs (subscriber, #43278) [Link]

Debian buildinfo files are basically lockfiles (plus some other things), and IIRC there are tools you can use to rebuild packages with the exact versions listed in them.

Python packaging and its tools

Posted Mar 11, 2023 11:17 UTC (Sat) by deltragon (guest, #159552) [Link]

Note that npm does seem to have found a solution to the C extensions problem as well: the old node-sass (which has now been replaced with dart-sass, but was quite popular before) was just bindings to the C libsass.
It used either prebuilt binaries or node-gyp to compile on the user's machine at install time, and seeing how popular node-sass was/still is, that seems to have worked out.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds