Cooperative package management for Python
A longstanding tug-of-war between system package managers and Python's own installation mechanisms (primarily pip, but there are others) looks to be on its way to being resolved—or at least regularized. PEP 668 ("Graceful cooperation between external and Python package managers") has been created to provide ways for the two types of package installation to work together, rather than at cross-purposes at times.
Since many operating systems depend on Python tools, with package versions
that may differ from those of users' Python applications, making them play together
nicely should result in more stable systems.
The root cause of the problem is that distribution package managers and Python package managers ("pip" serves as shorthand for the latter throughout the rest of the article) often share the same "site-packages" directory for storing installed packages. Updating a package, or, worse yet, removing one, may make perfect sense in the context of one package manager, but completely foul up the other. As the PEP notes, that can cause real havoc:
This may pose a critical problem for the integrity of distros, which often have package-management tools that are themselves written in Python. For example, it's possible to unintentionally break Fedora's dnf command with a pip install command, making it hard to recover.
The sys.path system parameter governs where Python looks for modules when it encounters an import statement; it gets initialized from the PYTHONPATH environment variable, with some installation- and invocation-specific directories added. sys.path is a Python list of directories that get consulted in order, much like the shell PATH environment variable that it is modeled on. Python programs can manipulate sys.path to redirect the search, which is part of what makes virtual environments work.
Using virtual environments with pip, instead of installing packages system-wide, has been the recommended practice to avoid conflicts with OS-installed packages for quite some time. But it is not generally mandatory, so users sometimes still run into problems. One goal of PEP 668 is to allow distributions to indicate that they provide another mechanism for managing Python packages, which will then change the default behavior of pip. Users will still be able to override that default, but that will hopefully alert them to the problems that could arise.
A distribution that wants to opt into the new behavior will tell pip that it manages Python packages with its tooling by placing a configuration file called EXTERNALLY-MANAGED in the directory where the Python standard library lives. If pip finds the EXTERNALLY-MANAGED file there and is not running within a virtual environment, it should exit with an error message unless the user has explicitly overridden the default with a command-line flag; the PEP recommends --break-system-packages for the flag name. The EXTERNALLY-MANAGED file can contain an error message that pip should return when it exits due to those conditions being met; the messages can be localized in the file as well. The intent is for the message to give distribution-specific information guiding the user to the proper way to create a virtual environment.
Another problem that can occur is when packages are removed from system-wide installs by pip. If, for example, the user installs a package system-wide and runs into a problem, the "obvious" solution to that may cause bigger problems:
There is a worse problem with system-wide installs: if you attempt to recover from this situation with sudo pip uninstall, you may end up removing packages that are shipped by the system's package manager. In fact, this can even happen if you simply upgrade a package - pip will try to remove the old version of the package, as shipped by the OS. At this point it may not be possible to recover the system to a consistent state using just the software remaining on the system.
A second change proposed in the PEP would limit pip to only operating on the directories specified for its use. The idea is that distributions can separate the two kinds of packages into their own directories, which is something that several Linux distributions already do:
For example, Fedora and Debian (and their derivatives) both implement this split by using /usr/local for locally-installed packages and /usr for distro-installed packages. Fedora uses /usr/local/lib/python3.x/site-packages vs. /usr/lib/python3.x/site-packages. (Debian uses /usr/local/lib/python3/dist-packages vs. /usr/lib/python3/dist-packages as an additional layer of separation from a locally-compiled Python interpreter: if you build and install upstream CPython in /usr/local/bin, it will look at /usr/local/lib/python3/site-packages, and Debian wishes to make sure that packages installed via the locally-built interpreter don't show up on sys.path for the distro interpreter.)
So the proposal would require pip to query the location where it is meant to place its packages and only modify files in that directory. Since the locally installed packages are normally placed ahead of the system-wide packages on sys.path, though, this can lead to pip "shadowing" a distribution package. Shadowing an installed package can, of course, lead to some of the problems mentioned, so it is recommended that pip emit a warning when this happens.
The PEP has an extensive analysis of the use cases and the impact these changes will have. "The changed behavior in this PEP is intended to 'do the right thing' for as many use cases as possible." In particular, the changes to allow distributions to have two different locations for packages and for pip not to change the system-wide location are essentially standardizing the current practice of some distributions. The "Recommendations for distros" section of the PEP specifically calls out that separation as a best practice moving forward.
There are situations where distributions would not want to default to this new behavior, however. Containers for single applications may not benefit from the restrictions, so the PEP recommends that distributions change their behavior for those container images:
Distros that produce official images for single-application containers (e.g., Docker container images) should remove the EXTERNALLY-MANAGED file, preferably in a way that makes it not come back if a user of that image installs package updates inside their image (think RUN apt-get dist-upgrade). On dpkg-based systems, using dpkg-divert --local to persistently rename the file would work. On other systems, there may need to be some configuration flag available to a post-install script to re-remove the EXTERNALLY-MANAGED file.
In general, the PEP seems not to be particularly controversial. The PEP discussion thread is positive for the most part, though Paul Moore, who may be the PEP-Delegate deciding on the proposal, is concerned that those affected may not even know about it:
One thing I would be looking for is a bit more discussion - the linux-sig discussion mentioned was only 6 messages since May, and there's only a couple of messages here. I'm not convinced that "silence means approval" is sufficient here, it's difficult to be sure where interested parties hang out, so silence seems far more likely to imply "wasn't aware of the proposal" in this case. In fact, I'd suggest that the PEP gets a section listing distributions that have confirmed their intent to support this proposal, including the distribution, and a link to where the commitment was made.
Assuming said confirmations are forthcoming, or that any objections and suggestions can be accommodated, PEP 668 seems like a nice step forward for Python. Having tools like DNF and apt fight with pip and others is obviously a situation that has caused problems in the past and will do so again. Finding a way to cooperate without causing any major backward-compatibility headaches is important. Ensuring that other distributions are on-board with these changes, all of which are ultimately optional anyway, should lead to more stability and, ultimately, happier users—both for Python and for the distributions.
Index entries for this article:
- Python: Linux distributions
- Python: Python Enhancement Proposals (PEP)/PEP 668
Posted Aug 31, 2021 21:03 UTC (Tue) by NYKevin (subscriber, #129325)
Posted Aug 31, 2021 22:01 UTC (Tue) by beagnach (guest, #32987)
Posted Aug 31, 2021 22:42 UTC (Tue) by NYKevin (subscriber, #129325)
No really, click through. Look at the sheer *number* of people who were (or possibly still are?) inconvenienced by this behavior. I'm sure they had their reasons, but they broke a lot of workflows when they did that.
Posted Sep 1, 2021 0:57 UTC (Wed) by JanC_ (guest, #34940)
There’s lots of other parts of “Python” that are not installed by default, e.g. development headers, documentation, tests, examples, IDLE, etc. Basically, a default install includes everything you need to run Python programs, but not everything you might need to develop in Python. I assume this is mostly to reduce its size in environments where all those aren’t needed (which is by far the majority of installations).
Posted Sep 1, 2021 4:13 UTC (Wed) by NYKevin (subscriber, #129325)
Posted Sep 1, 2021 5:49 UTC (Wed) by stefanor (subscriber, #32895)
The good news is that the python3-full binary package now exists to meet this need. If you're a Pythonista, you can install this and get the stdlib you expect.
But let's dig deeper and question the assumption you made: why do distros break it up?
There are some optional parts of the stdlib (e.g. database drivers), and then there are the mechanics of the operating system to consider.
If an app in your operating system is written in Python, it is reasonable for that application to depend on a subset of the Python standard library, to reduce install footprint by minimizing dependencies. This may sound unreasonable at first, but there are things in the Python standard library that you really don't need for app runtime: e.g. documentation, dev headers, stdlib test suite, Tkinter, IDLE, lib2to3, ensurepip, or distutils. As a Python developer, these may seem like sacrosanct parts of the stdlib, but as a distro maintainer, they are not something that you need to waste install CD space for, and most desktop end-users will never miss them. You can significantly reduce the installed size of the Python stack and its dependencies on most end-user systems by making these components optional.
Generally distributions break complex packages up into multiple pieces, trying to find a balance between a minimal core and all the optional bits that users may need for their particular use case. (Not every optional feature will be supported, of course). In Debian, libreoffice is broken into around 200 binary packages, as an extreme example.
Python in Debian is broken into several major pieces:
python3: The main CPython interpreter package, including the runtime stdlib.
> The entire purpose of ensurepip was to *ensure* that everyone had pip available with every installation of Python, by incorporating it into the stdlib.
And yet ensurepip never really made sense in a typical package-managed Linux distro. Distros have package-managers that are responsible for installing things. They don't tend to get on well with other tools messing in the same trees of the filesystem. (That's what this article is about.) Debian expects Debian users to install pip by apt installing python3-venv or python3-pip (as appropriate), not by running ensurepip to install things in /usr. For this reason Debian has always hobbled ensurepip to print an error message explaining this, when executed directly. When used by the venv module, it just does what you expect, and creates a venv seeded with pip.
Posted Sep 1, 2021 9:40 UTC (Wed) by MrWim (subscriber, #47432)
I think this is exactly why it makes sense to include venv wherever you include pip. venv is the mechanism that people use to avoid messing with the same trees of the filesystem. By not including it you have a pip that can mess with the distro provided packages, but you don't have the capability to sandbox off these changes.
Note: you don't need to be a Python developer to want pip. You'll need it whenever you want to run any non-distro-provided Python software - not only when developing it. It's exactly these users who are not familiar with the Python packaging tools that are at most risk from breaking their systems in a way that they don't know how to diagnose or fix.
Posted Sep 1, 2021 12:31 UTC (Wed) by beagnach (guest, #32987)
> The good news is that the python3-full binary package now exists to meet this need. If you're a Pythonista, you can install this and get the stdlib you expect
The difference between these two packages isn't quite clear to me.
Are there parts of the standard library absent from the python3 package in latest Debian? If so is there an easy way to find out the details?
My experience has been that this is often under-documented - it can be quite hard to establish exactly how components are divided up between the various debian packages relating to a single large project.
Posted Sep 1, 2021 18:22 UTC (Wed) by stefanor (subscriber, #32895)
The written documentation for the split is here:
Posted Sep 7, 2021 4:47 UTC (Tue) by nevyn (guest, #33129)
Using the word "python3" as though it's a single thing is confusing you, and probably others trying to follow along. For almost any project in a distribution there are at least three things worth talking about: 1) All of it (roughly what you get if you installed from an upstream tarball). 2) The major usable piece of it (in this case, /usr/bin/python3 and all of the stdlib, so you can run arbitrary python3 programs). 3) Specific dependencies so that distribution package FOO can run correctly.
All three of those things might install /usr/bin/python3, and could thus be called "python3", but are likely to be very different otherwise.
Posted Sep 1, 2021 20:59 UTC (Wed) by NYKevin (subscriber, #129325)
For quite a while, this was not the case, and you simply got a broken venv. Fortunately, it would print an error message explaining the problem, but that's still incredibly obnoxious behavior.
Posted Sep 1, 2021 22:16 UTC (Wed) by stefanor (subscriber, #32895)
That is still the case. It doesn't complete creating the venv, prints an error explaining why, and exits non-zero.
Posted Sep 1, 2021 22:42 UTC (Wed) by NYKevin (subscriber, #129325)
Maybe you disagree with me about where exactly to draw the line between "developer-oriented" and "regular" parts of the stdlib. But for better or for worse, the documentation at docs.python.org strongly suggests that this is a standard part of Python, and that the harried sysadmin who just wants to automate their deployment process (without doing a whole lot of faffing about with Docker and Kubernetes) can reasonably assume it will work, because it's part of the whole Batteries Included thing. And of course, it actually *does* work on the sysadmin's laptop, because at some point they installed venv or something which depends on it, and then forgot about it, because setting up a venv is practically the first thing you do after you graduate from writing Hello World programs in Python.
I can understand limiting ensurepip so that it can only install into venv. But splitting out venv altogether just smacks of the pretentious "for your own good" nonsense I got sick of with the GNOME folks years ago. This is not some huge package like GCC or Make, nor is it a set of bindings to a big thing that might not be installed (like Tkinter). venv is a relatively small set of self-contained tools which upstream has loudly advertised as Included.™ I don't see which users actually benefit from breaking it out into a separate package, unless the only benefit is a slightly smaller installation size. But then you might as well go around breaking everything out into separate packages. Let's have python3-pathlib and python3-itertools and python3-enum and...
Posted Sep 2, 2021 0:22 UTC (Thu) by stefanor (subscriber, #32895)
Yes there is, it's minimising the payload of Python to support applications written in Python.
On a clean Debian bullseye install, installing python3-venv requires an additional 7 binary packages using 5MiB of disk space, over just python3.
We've simplified things over time. python-pip-whl currently contains 31 wheels, in the past these were all separate binary packages.
The harried sysadmin you describe will get an error message describing exactly what they need to do to resolve their issue. This is probably one of tens of such issues they'll see while automating their deployment.
Posted Sep 2, 2021 19:14 UTC (Thu) by NYKevin (subscriber, #129325)
I am willing to concede that it is technically possible that someone, somewhere, might not want to have this ~400 KiB package, and/or OpenSSL, installed.
The other five, however, are nonsense. Two of the packages you list are just more specific versions of venv, and pip-whl and distutils are both part of the Python stdlib proper, which Debian has artificially split out. That leaves lib2to3, and I would be shocked if that *wasn't* self-inflicted by Debian supporting old versions of Python. Upstream certainly shouldn't be depending on it anymore.
In short: The only real argument I'm seeing here is "some people want Python, but don't want OpenSSL." I'm skeptical that this is a very large or interesting class of users, worth the inconvenience it causes to everybody else.
> The harried sysadmin you describe will get an error message describing exactly what they need to do to resolve their issue.
No. They will click the big shiny "deploy" button, deployment will not actually happen, and they will then spend 20+ minutes digging through logs until they eventually figure out that it's the venv step which is failing.
Posted Sep 10, 2021 9:09 UTC (Fri) by laarmen (subscriber, #63948)
My guess is that a choice has been made to spare those MiB for all the Debian installs which simply want to use a Python program, not *develop* one (or deploy an application with dependencies not satisfied by Debian...). This is fairly logical and in keeping with other Debian packaging practices, such as not shipping the library headers with the main lib package, etc...
And yes, there's a lot of other ways some space could be saved on Debian installs.
Posted Sep 7, 2021 18:22 UTC (Tue) by dbnichol (subscriber, #39622)
Posted Sep 1, 2021 14:27 UTC (Wed) by xecycle (subscriber, #140261)
Posted Sep 1, 2021 14:22 UTC (Wed) by martin.langhoff (subscriber, #61417)
- "EXTERNALLY_MANAGED" indicates perspective - external to whom? Instead, doing something like `echo rpm > MANAGED_BY` would be much better: scalable and future-proof.
- We need this same dialog (and mechanisms) for PHP, Perl, Ruby, JS...
Posted Sep 1, 2021 17:48 UTC (Wed) by MattBBaker (guest, #28651)
Oh this is going to be a fun flag to explain to users.
Posted Sep 1, 2021 18:08 UTC (Wed) by mathstuf (subscriber, #69389)
Posted Sep 1, 2021 21:59 UTC (Wed) by NYKevin (subscriber, #129325)
Posted Sep 2, 2021 16:45 UTC (Thu) by kpfleming (subscriber, #23250)
Posted Sep 1, 2021 20:42 UTC (Wed) by cyperpunks (subscriber, #39406)
$ upip install spam
would first search the distro repos for spam and install any found package with the native package tool.
If not found in the native repos, it would fall back to pip (in the background, of course) and install into special pip directories.
It would keep track of packages installed by both pip and the native tooling.
If doing:
$ upip remove eggs
would break something in either system, it would complain and prevent the action.
Posted Sep 1, 2021 21:12 UTC (Wed) by NYKevin (subscriber, #129325)
1. The vast majority of the time, a regular end user (not a developer) either wants to install via their native package manager, or via something like an Ubuntu PPA or Snap/Flatpak. Using pip is a Bad Idea because tools like unattended-upgrades don't know how to interact with it (so you don't get automated security updates, your sysadmin can't audit your installed packages, etc. unless someone goes around building out all of that functionality to support pip). Also, pip can easily install packages which are Not Ready For Production Use,™ usually with very little or no warning to the end user, because it's a developer-oriented tool.
Posted Sep 5, 2021 20:35 UTC (Sun) by j0057 (subscriber, #143939)
Posted Sep 6, 2021 23:35 UTC (Mon) by Conan_Kudo (subscriber, #103240)
Posted Sep 8, 2021 12:54 UTC (Wed) by j0057 (subscriber, #143939)
python3-minimal: Intended for install environments and size constrained CD images. Just the CPython interpreter + a minimalist subset of the stdlib.
python3-doc: Documentation.
python3-dev: C header files, the -config script, and a static version of libcpython.
python3-distutils: The distutils module (only needed at build time)
python3-examples: Examples, Demos and Tools
python3-dbg: A debug build of the CPython interpreter
python3-gdbm: The GNU dbm driver (and dependency on libgdbm)
python3-tk: tk module (and Tcl/Tk dependencies)
python3-venv: Depends on the wheels that ensurepip requires to bootstrap pip into a venv.
libpython3.X-testsuite: The stdlib test suite
idle-python-3.X: IDLE
Technically, there are a few more packages, but these are the functional break-points.
> python3: The main CPython interpreter package, including the standard runtime stdlib.
https://www.debian.org/doc/packaging-manuals/python-polic...
And yes, that often lags behind the reality in the archive, it's easy to forget to update documentation. It is currently accurate, though.
They are: ca-certificates openssl python-pip-whl python3-distutils python3-lib2to3 python3-venv python3.9-venv
Package: ca-certificates
Version: 20210119
Priority: optional
Section: misc
Maintainer: Julien Cristau <jcristau@debian.org>
Installed-Size: 391 kB
Depends: openssl (>= 1.1.1), debconf (>= 0.5) | debconf-2.0
Breaks: ca-certificates-java (<< 20121112+nmu1)
Enhances: openssl
Download-Size: 247 kB
APT-Manual-Installed: yes
APT-Sources: [my employer's private mirror]
Description: Common CA certificates
Contains the certificate authorities shipped with Mozilla's browser to allow
SSL-based applications to check for the authenticity of SSL connections.
.
Please note that Debian can neither confirm nor deny whether the
certificate authorities whose certificates are included in this package
have in any way been audited for trustworthiness or RFC 3647 compliance.
Full responsibility to assess them belongs to the local system
administrator.
"Why would I want an option to hose my system install?"
"Okay, it's not an affirmative action telling it to hose your system. Pip just gained the ability to tell if you're about to hose your system, and advise you not to. But it gives you the option to do it anyway if you really want to."
(which are outside native pkg manager dirs.)
2. The vast majority of the time, a developer wants to install into a venv (virtualenv, or "virtual environment"), which is basically a directory with a bunch of locally-installed pip packages and some environment-variable-modifying scripts. The net effect is that, when the venv is "active," Python looks inside the venv before or instead of the system-wide packages. venvs are very lightweight; you can remove them with rm -r, and they leave no trace on the system. In practice, most native package managers are not designed to do that sort of thing (unless you use something much heavier than a venv, such as spinning up an entire Docker container).
"dist-packages" was always a Debian-ism. At least I've never seen it in any other distribution family. Ruby has a concept of "vendor directories" and "site directories", and Python would probably benefit from a standardized approach like that.