A survey of the Python packaging landscape
Over the past several months, there have been wide-ranging discussions in the Python community about difficulties users have with installing packages for the language. There is a bewildering array of options for package-installation tools and Python distributions focused on particular use cases (e.g. scientific computing); many of those options do not interoperate well—or at all—so they step on each other's toes. The discussions have focused on where solutions might be found to make it easier on users, but lots of history and entrenched use cases need to be overcome in order to get there—or even to make progress in that direction.
To follow along with these lengthy discussions, though, an overview of Python's packaging situation and the challenges it presents may be helpful. Linux users typically start by installing whichever Python version is supplied by their distribution, then installing various other Python packages and applications from their distribution's repositories. That works fine so long as the versions of all of those pieces are sufficient for the user's needs. Eventually, though, users may encounter some package they want to use that is not provided by their distribution, so they need to install it from somewhere else.
PyPI and pip
The Python Package Index (PyPI) contains a huge number of useful packages that can be installed in a system running Python. That is typically done using the pip package installer, which will install the package either in a site-wide location or somewhere user-specific, depending on whether it was invoked with privileges. pip will also download any needed dependencies, but it only looks for those dependencies on PyPI, since it has no knowledge of the distribution's package manager. That can lead to pip installing a dependency that is actually available in the distribution's repository, which is just one of the ways pip and the distribution package manager (DNF, Apt, etc.) can get crosswise.
Beyond that, there can be conflicting dependency needs between different packages or applications. If application A needs version 1 of a dependency, but application B needs version 2, only one can be satisfied because only a single version of a package can be active for a particular Python instance. It is not possible to specify that the import statement in A picks up a different version than the one that B picks up. Linux distributions solve those conflicting-version problems in various ways, which sometimes results in applications not being available because another, more important package required something that conflicted. The Linux-distribution path is not a panacea, especially for those who want bleeding-edge Python applications and modules. For those not following that path, this is where the Python virtual environment (venv) comes into play.
Virtual environments
A virtual environment is a lightweight way to create a Python instance with its own set of packages that are precisely tuned to the needs of the application or user. Virtual environments were added to Python itself by PEP 405 ("Python Virtual Environments"), which was adopted for Python 3.3 in 2012, but they had already become popular via the virtualenv module on PyPI. At this point, it is almost an expectation that developers use virtual environments to house and manage their dependencies; there is even talk of forcing pip and other tools to only install into them.
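As a rough illustration of how this addresses the conflicting-version problem described above, here is a minimal sketch (not from the article) that creates two environments, each with its own copy of a dependency. The package name "somepkg" and the version numbers are hypothetical, and the bin/ paths assume a POSIX system (Windows venvs use Scripts\ instead):

```python
# A minimal sketch: two applications with conflicting dependency versions
# can coexist, each in its own virtual environment.  "somepkg" and the
# version numbers are hypothetical.
import subprocess
import venv

for env_dir, version in [("venv-app-a", "1.0"), ("venv-app-b", "2.0")]:
    venv.create(env_dir, with_pip=True)  # equivalent to "python -m venv ..."
    subprocess.run(
        [f"{env_dir}/bin/pip", "install", f"somepkg=={version}"],
        check=True,
    )

# Each environment now has its own site-packages directory:
# "venv-app-a/bin/python" imports somepkg 1.0, "venv-app-b/bin/python" 2.0.
```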
When the module to be installed is pure Python, installation with pip is fairly straightforward, but Python modules can also have pieces that are written to the C API, so those pieces need to be built for the target system from source code in C, C++, Rust, or other languages. That requires the proper toolchain to be available on that system, which is typically easy to ensure on Linux, but less so on other operating systems. So projects can provide pre-built binary "wheels" in addition to source distributions on PyPI.
But wheels are highly specialized for the operating system, architecture, C library, and other characteristics of the environment, which leads to a huge matrix of possibilities. PyPI relies on all of the individual projects to build "all" of the wheels that users might need, which distributes the burden, but also means that there are gaps for projects that do not have the resources of a large build farm to create wheels. Beyond that, some Python applications and libraries, especially in the scientific-computing world, depend on external libraries of various sorts, which are also needed on target systems.
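To give a sense of what makes wheels so specialized, the tags embedded in a wheel's filename can be decoded with the third-party packaging library (the library that pip vendors for its own tag handling). A small sketch, using a simplified NumPy-like filename rather than an actual file from PyPI:

```python
# Decode the tags in a wheel filename; requires the third-party "packaging"
# library (pip install packaging).  The filename below is a simplified,
# NumPy-like example, not necessarily a real file on PyPI.
from packaging.utils import parse_wheel_filename

name, version, build, tags = parse_wheel_filename(
    "numpy-1.24.1-cp311-cp311-manylinux_2_17_x86_64.whl"
)
for tag in tags:
    # e.g. interpreter "cp311" (CPython 3.11), ABI "cp311",
    # platform "manylinux_2_17_x86_64" (glibc 2.17+ on x86-64)
    print(tag.interpreter, tag.abi, tag.platform)
```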
Distributions
This is where Python distributions pick up the slack. For Linux users, their regular distribution may well provide what is needed for, say, NumPy. But if the version that the distribution provides is insufficient for some reason, or if the user is running some other operating system that lacks a system-wide package manager, it probably makes sense to seek out Anaconda, or its underlying conda package manager.
The NumPy installation page demonstrates some of the complexities with Python packaging. It has various recommendations for ways to install NumPy; for beginners on any operating system, it suggests Anaconda. For more advanced users, Miniforge, which is a version of conda that defaults to using the conda-forge package repository, seems to be the preferred solution, but pip and PyPI are mentioned as an alternate path.
There are a number of differences between pip and conda that are described in the "Python package management" section of the NumPy installation page. The biggest difference is that conda manages external, non-Python dependencies, compilers, GPU compute libraries, languages, and so on, including Python itself. On the other hand, pip only works with some version of Python that has already been installed from, say, python.org or as part of a Linux distribution. Beyond that, conda is an integrated solution that handles packages, dependencies, and virtual environments, "while with pip you may need another tool (there are many!) for dealing with environments or complex dependencies".
In fact, the "pip" recommendation for NumPy is not to actually use
that tool, but to use Poetry instead, because it
"provides a dependency resolver and environment management capabilities
in a similar fashion as conda does
". So a conda-like approach is what
NumPy suggests and the difference is that Poetry/pip use PyPI,
while conda normally uses conda-forge. The split is bigger than that, though,
because conda does not use binary wheels, but instead uses its own format
that is different from (and, in some cases, predates) the packaging
standards that pip and much of the rest of the Python packaging
world use.
PyPA
The Python Packaging Authority (PyPA) is a working group in the community that maintains pip, PyPI, and other tools; it also approves packaging-related Python Enhancement Proposals (PEPs) as a sort of permanent PEP-delegate for the steering council (a role that was inherited from former benevolent dictator for life Guido van Rossum). How the PEP process works is described on its "PyPA Specifications" page. Despite its name, though, the PyPA has no real authority in the community; it leads by example and its recommendations (even in the form of PEPs) are simply that—tool authors can and do ignore or skirt them as desired.
The PyPA maintains multiple tools, the "Python Packaging User Guide", and more. The organization's goals are specified on its site, but they are necessarily rather conservative because the Python software-distribution ecosystem "has a foundation that is almost 15 years old, which poses a variety of challenges to successful evolution".
In a lengthy (and opinionated) mid-January blog post, Chris Warrick looked at the proliferation of tools, noting that he found 14 of them, most of which are actually maintained by the PyPA, though it is not at all clear from that organization's documentation which of those tools should be preferred. Meanwhile, the tools that check most of the boxes in Warrick's comparison chart, Poetry and PDM, are not maintained by the working group, but instead by others who are not participating in the PyPA, he said.
The situation is, obviously, messy; the PyPA is well aware of that and has been trying to wrangle various solutions for quite some time. The discussions of the problems have seemingly become more widespread—or more visible—over the past few months, in part because of an off-hand comment in Brett Cannon's (successful) self-re-nomination to the steering council for 2023. He surely did not know how much discussion would be spawned from a note tucked into the bottom of that message: "(I'm also still working towards lock/pinned dependencies files on the packaging side and doing stuff with the Python Launcher for Unix, but that's outside of Python core)."
Several commented in that thread on their hopes that the council (or someone) could come up with some kind of unifying vision for Python packaging. Those responses were split off into a separate "Wanting a singular packaging tool/vision" thread, which grew from there. That discussion led to other threads, several of which are still quite active as this is being written. Digging into those discussions is a subject for next week—and likely beyond.
Readers who want to get a jump-start on the discussions will want to read Warrick's analysis and consult the pypackaging-native site that was announced by Ralf Gommers in late December. Also of interest are the results of the Python packaging survey, which further set the stage for much of the recent discussion and work. Packaging woes have been a long-simmering (and seriously multi-faceted) problem for Python, so it is nice to see some efforts toward fixing, or at least improving, the situation in the (relatively) near term. But there is still a long way to go. Stay tuned ...
Index entries for this article: Python/Packaging
Posted Jan 17, 2023 23:51 UTC (Tue)
by ryanduve (subscriber, #127786)
[Link] (16 responses)
1. Pyenv, to manage installation of different Python versions. From what I can tell, it's the equivalent of building Python from source, but it takes care of the common build flags and does free namespace management. I used to use the system package manager for this, but Pyenv supports every minor/patch version of Python I've thrown at it, which is really useful.
2. Poetry, to manage dependencies for each project and generate a lock file, which nails down transitive dependency versions.
System-level dependencies are usually just sort of agreed upon within the working group, which works because they rarely need formal specification for our work. If that's not the case, we replace Pyenv with a Docker image and specify whatever is needed on top from there.
It's honestly not ideal. There's a ton of `poetry env remove ...` voodoo that has to be done from time to time, and it's always a headache determining which virtual environment is active in someone else's IDE. I hope the PyPA comes up with a single replacement for both of the above, packaged with Python's standard library.
Posted Jan 18, 2023 0:53 UTC (Wed)
by acarno (subscriber, #123476)
[Link] (14 responses)
Posted Jan 18, 2023 1:47 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (11 responses)
If there's a tool that's right in the box and is at least pretty good for beginners, then why would they use something else at first while learning? And once all the beginners use it, you're starting to build up a population who want this tool even if maybe an alternative could be better for some things, they've "grown up" with the bundled tool and that's what they want improved.
If Python shipped with a tool that can "Just Do It" when it comes to using modules like Requests, whatever the popular way is to talk to PostgreSQL or MariaDB or whatever, maybe a couple of popular options for machine learning or AWS S3 or other stuff some people need a lot but others don't, that could be huge even if it can't install Bob's Third Mediocre JSON parser v0.1.2 or whatever.
Posted Jan 18, 2023 5:03 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (10 responses)
Having said that, they're far from perfect. Problems with pip and venv:
0. You have to create the venv. It doesn't just magically spring into existence when you start installing packages, or prompt you to create it, or something of that nature. (It would probably still be good to have a way to do manual venv creation, but the fact that it's not automatic means that less experienced users will not realize they need to do it in the first place.)
1. pip still wants to install into the global site-packages dir by default (i.e. if no flags are passed and a venv is not active). This is bad. pip should not do that, at least not without some kind of --i-know-what-im-doing flag. It baffles me that the Python folks are dragging their feet on this one - I could've told you that five years ago. To be fair, this is not a totally unreasonable thing to do on platforms with no package manager (i.e. Windows), but even so it's still not ideal for development purposes, and end users probably should not be using pip anyway.
2. pip's handling of dependency management and versioning is rather minimal, and possibly inadequate for some situations.
3. More generally, pip and venv are slightly less abstract than what some users (i.e. academics and other non-software-engineers) would like them to be. They are not opinionated enough, and where they do have opinions, those opinions are sometimes the "wrong" ones (for the scientific use case in particular).
4. A venv is basically the world's leakiest userspace container, which is to say that venvs are really not containers at all, but just a bunch of scripts and environment variables that ask Python nicely to pretend that it's running in an isolated environment. Deploying a venv in production "as is" is questionable at best. You're probably better off using real containers instead, at which point the venv is redundant.
5. If you don't have a C compiler all set up and ready for pip to invoke, then some packages simply can't be installed (by pip), because they are not distributed as compiled binaries and/or because distributing binaries on Linux is generally awkward.
6. Several Linux distributions, notably including Debian and its derivatives, have intentionally sabotaged pip and (indirectly) venv to prevent problem (1) from breaking people's systems. On those systems, you have to install pip separately, even though upstream distributes it as part of the language. This leads to confusion and the perception of pip and venv as "not ready for general use" (e.g. "the tutorial said to run python -m venv env, but when I do that, it gives this weird error about pip and says it isn't going to work!").
7. There are probably at least a few domain-specific problems that I'm unaware of, but the above is just what I could think of off the top of my head.
Posted Jan 18, 2023 6:36 UTC (Wed)
by k8to (guest, #15413)
[Link] (1 responses)
It feels like there probably is some middle path of cooperation between the distribution and Python that could make this less brittle for the common culprits. But maybe not.
Posted Jan 18, 2023 14:27 UTC (Wed)
by SnoopJ (guest, #162807)
[Link]
Mostly it would prevent users from damaging their distribution copy by giving distributions with strong opinions about How Python Should Be the option to tell pip (or other package managers) that the site is someone else's problem.
Posted Jan 18, 2023 6:52 UTC (Wed)
by LtWorf (subscriber, #124958)
[Link] (1 responses)
I don't understand. A venv is not there to provide isolation. It's there basically to provide a path used to load libraries. That's all. It's a completely different thing than a container.
Also, most containers, by default, do not provide any isolation.
Posted Jan 18, 2023 18:11 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link]
I am not familiar with your definition of "isolation."
Posted Jan 18, 2023 13:45 UTC (Wed)
by zorro (subscriber, #45643)
[Link] (3 responses)
I'm on Windows and what I want to do is ZIP the contents of my venv folder and use the resulting ZIP file as a distribution package for my application. My hope is that I can simply unzip the package on any other Windows computer and run the application there without needing to install or cause interference with anything, including Python itself. Am I too optimistic?
Posted Jan 18, 2023 15:26 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
Okay, I run Gentoo, and I installed Sphinx as per the instructions (use venv, I think it said). I'm not a Python user, I don't understand it, and very quickly all hell broke loose.
At which point, I realised that Gentoo has a Sphinx package, so I installed that instead, and now everything appears to work fine.
The problem, as it appears to me, is that if you are dealing with a clueless luser (which is me and Python), venv is both hard to automate, and hard to explain. BOOM! Now I leave everything to emerge, it's not a problem. "It Just Works (tm)".
Cheers,
Wol
Posted Jan 18, 2023 16:32 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
FWIW, this means that "activating" a venv is never actually necessary. At least I never do it. Instead, I just run the `pip`, `python`, or any other tool directly from the venv directory and everything Just Works™.
Posted Jan 19, 2023 12:05 UTC (Thu)
by zorro (subscriber, #45643)
[Link]
I ended up following the Windows embeddable package approach described in https://fpim.github.io/posts/setting-up-python-windows-em.... This gave me exactly what I wanted: no system-level installation of Python, all dependencies contained in a single folder (including Python itself), and no hardcoded absolute paths. Better than using venv, AFAIC.
Posted Jan 19, 2023 9:20 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Yeah, venvs are fine for development, but for production we use a two-stage Docker build process that first takes the list of dependencies and produces a bunch of wheels, and then builds a clean Docker image with all the wheels installed globally.
The biggest weakness IMHO of pip is that PyPI has no metadata file, so it has to actually download the package (and possibly execute the setup.py) to determine the dependencies. So the package resolver, if it gets into an unresolvable situation, can keep downloading older and older versions in an attempt to make it work. One time we got strange errors from the buildbot and figured out it had downloaded three-year-old versions of various packages which somehow worked (and it had taken all night to do it).
Posted Jan 20, 2023 0:32 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
I think the "standard" solution to this problem is pip freeze, but while this does solve the "I have a setup that works, now how do I replicate it?" problem, it does *not* solve the "I have a setup that works, but it's outdated, now how do I build a newer setup that also works?" problem, and that's arguably the more important problem to solve.
Posted Jan 19, 2023 3:22 UTC (Thu)
by ranger207 (subscriber, #134731)
[Link] (1 responses)
Posted Jan 19, 2023 18:33 UTC (Thu)
by acarno (subscriber, #123476)
[Link]
Thanks for sharing! :)
Posted Jan 18, 2023 1:08 UTC (Wed)
by hazmat (subscriber, #668)
[Link]
Posted Jan 18, 2023 5:00 UTC (Wed)
by PengZheng (subscriber, #108006)
[Link]
I think it is plausible to combine a C/C++ package manager (yes, I mean Conan, which is also written in Python) with pip and virtualenv, and let Conan deal with these external libraries.
Posted Jan 18, 2023 9:46 UTC (Wed)
by cyperpunks (subscriber, #39406)
[Link] (1 responses)
It's often more work to deploy a Python application than to maintain the actual Python code.
Posted Jan 26, 2023 9:07 UTC (Thu)
by milesrout (subscriber, #126894)
[Link]
Posted Jan 18, 2023 13:56 UTC (Wed)
by jkingweb (subscriber, #113039)
[Link] (3 responses)
Posted Jan 18, 2023 14:15 UTC (Wed)
by rahulsundaram (subscriber, #21946)
[Link] (2 responses)
Interesting question. Apparently it is a bit of an inside joke. Refer to
https://discuss.python.org/t/where-the-name-wheel-comes-f...
More about the wheel format in https://peps.python.org/pep-0427/
Posted Jan 22, 2023 8:25 UTC (Sun)
by CChittleborough (subscriber, #60775)
[Link] (1 responses)
I always assumed the old hacker jargon term “wheel” had something to do with it. Hmm...
Posted Jan 26, 2023 11:46 UTC (Thu)
by sammythesnake (guest, #17693)
[Link]
Posted Jan 18, 2023 14:18 UTC (Wed)
by jhoblitt (subscriber, #77733)
[Link]
The scientific computing world has heavily embraced conda, which solves some of the issues, but it is still messy to try to use two tools at the same time with conflicting dependency chains.
Posted Jan 18, 2023 20:45 UTC (Wed)
by iabervon (subscriber, #722)
[Link] (3 responses)
Posted Jan 19, 2023 1:38 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
* pkg_resources, generally
* pkgutil.get_data(package, path)
* importlib.resources.read_text() etc.
* importlib.resources.files(package) and all of the various methods you can call on its return value
Of these, pkgutil and importlib.resources.files() are not explicitly marked deprecated, and both are in the stdlib, so obviously we're supposed to... flip a coin? Well, pkgutil has a much simpler API, so it's probably the one that people will use in practice, but importlib.resources is far newer (despite already having a significant portion of its API surface marked as deprecated), so it's probably the one that they *want* people to use.
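For reference, a minimal sketch of the files() API in question (available since Python 3.9); the package name "mypkg" and the file "data.txt" are hypothetical:

```python
# A minimal sketch of the importlib.resources.files() API discussed above.
# "mypkg" and "data.txt" are hypothetical names for an installed package
# and a data file bundled inside it.
from importlib.resources import files

text = files("mypkg").joinpath("data.txt").read_text()
print(text)
```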
Posted Jan 19, 2023 23:27 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
> FYI the long-term plan is to deprecate pkgutil, so I would use newer APIs as provided by importlib.
So it's to-be-deprecated but not actually deprecated yet. And the churn continues...
Posted Jan 19, 2023 23:57 UTC (Thu)
by iabervon (subscriber, #722)
[Link]
For what it's worth, objects that fit the duck type of pathlib.Path as far as any operations you'd want to attempt are really nice to work with, although I wouldn't be annotating any variables with the current return type of importlib.resources.files().
Posted Jan 19, 2023 1:34 UTC (Thu)
by pj (subscriber, #4506)
[Link]
1. python versioning via nix (as a non-main package manager on my ubuntu system)
2. python dependency version pinning via pip-tools
but then I still use setup.py and bdist_wheel for packaging.
Posted Jan 19, 2023 5:11 UTC (Thu)
by marcH (subscriber, #57642)
[Link]
My super ugly way to work around this issue has been to go and try "apt-get/dnf install python[3]-X" where X is the pip name. It works about 80% of the time.
pip install finally added some "--dry-run" flag that helps with that.
It would be nice for Linux distros to make that mapping easier. Maybe packages that don't follow that naming convention could have a "python-X" alias when possible. It does not have to work 100% of the time, going from 80% to 95% would already help a lot.
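A sketch of that workaround as code, under stated assumptions: a Debian-style system where apt-get is available, a recent pip with --dry-run support, and a hypothetical package name "somepkg":

```python
# A sketch (not from the comment) of the "try the distro first" heuristic:
# look for the package under the common "python3-X" naming convention, and
# only fall back to pip if that fails.  Assumes apt-get (Debian-style) and
# a recent pip with --dry-run; "somepkg" is a hypothetical name.
import subprocess
import sys

def try_distro_first(pip_name: str) -> bool:
    """Return True if python3-<pip_name> could be installed from the distro."""
    result = subprocess.run(
        ["apt-get", "install", "--yes", f"python3-{pip_name}"],
        capture_output=True,
    )
    return result.returncode == 0

if not try_distro_first("somepkg"):
    # Show what pip would install, without actually installing anything.
    subprocess.run([sys.executable, "-m", "pip", "install", "--dry-run", "somepkg"])
```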
Posted Jan 19, 2023 12:48 UTC (Thu)
by ehiggs (subscriber, #90713)
[Link] (1 responses)
To wit: pyenv is written in shell scripts and while it's a bit janky, it works very well because it doesn't suffer from this bootstrap problem.
I would recommend Rust for this, but someone will just end up writing plugins in PyO3 and reintroducing the bootstrap problem - so it would need something with no (or unpopular) Python bindings. The obvious conclusion: Python tooling should be written in Go.
Posted Jan 26, 2023 6:20 UTC (Thu)
by njs (subscriber, #40338)
[Link]
Posted Jan 26, 2023 13:35 UTC (Thu)
by callegar (guest, #16148)
[Link]
> It is not possible to specify that the import statement in A picks up a different version than the one that B picks up.
Regardless of solutions to other, more difficult problems, such as the management of "native" dependencies, the point above keeps puzzling me, since it seems addressable yet still does not seem to get much consideration. On one hand, it gets mentioned only with regard to pip, but it is in fact common to all the mentioned packaging options (pip, setuptools, or conda), and it does not go away (not completely, at least) even if you use virtual environments.
The main problem is that it gives a single author huge power to make a whole ecosystem lag. If your project has a single dependency X whose author is not prompt (or even not willing) to update its dependency requirements, then that single X is going to prevent you from using up-to-date versions of all of its dependencies that may themselves be dependencies of your project. In turn, these older packages will impose limits on how new their own dependencies can be, until your whole project ends up having to rely on old stuff, giving up the opportunity to use new features. Using a virtual environment here does not help.