
Cooperative package management for Python

Posted Sep 1, 2021 4:13 UTC (Wed) by NYKevin (subscriber, #129325)
In reply to: Cooperative package management for Python by JanC_
Parent article: Cooperative package management for Python

As mentioned, both venv and ensurepip have programmatic interfaces and are part of the Python stdlib. If you tell me that your operating system has "Python" on it, I am going to assume that I can use and call into every single part of the stdlib. I'm not going to split out parts of the stdlib as separate dependencies, or instruct people to run distro-specific hacks to fix the half-an-installation they got by default. The entire purpose of ensurepip was to *ensure* that everyone had pip available with every installation of Python, by incorporating it into the stdlib. That's why it's called "ensurepip," and not "maybepip" or "optionalpip." By removing it, you are breaking API compatibility with standard (upstream) Python.
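
Both modules can indeed be driven from code. Below is a minimal sketch, assuming a stock CPython 3.x where both are importable. `stdlib_has` and `make_env` are hypothetical helper names. `venv.create()` and `ensurepip.version()` are the stdlib's own functions. The `ensurepip` call is guarded because distro builds may patch that module.

```python
import importlib.util
import tempfile
import venv
from pathlib import Path


def stdlib_has(module_name: str) -> bool:
    """Return True if the module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


def make_env(target: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment at `target` via the stdlib venv API."""
    # with_pip=True would make venv run ensurepip inside the new environment.
    venv.create(target, with_pip=with_pip)
    return target


if __name__ == "__main__":
    print("venv available:", stdlib_has("venv"))
    print("ensurepip available:", stdlib_has("ensurepip"))
    if stdlib_has("ensurepip"):
        import ensurepip
        try:
            # Reports the pip version bundled with this interpreter.
            print("bundled pip:", ensurepip.version())
        except Exception as exc:  # distro builds may patch ensurepip
            print("ensurepip.version() failed:", exc)
    with tempfile.TemporaryDirectory() as tmp:
        env = make_env(Path(tmp) / "demo-env")
        print("pyvenv.cfg written:", (env / "pyvenv.cfg").is_file())
```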

Cooperative package management for Python

Posted Sep 1, 2021 5:49 UTC (Wed) by stefanor (subscriber, #32895) [Link] (10 responses)

> If you tell me that your operating system has "Python" on it, I am going to assume that I can use and call into every single part of the stdlib.

The good news is that the python3-full binary package now exists to meet this need. If you're a Pythonista, you can install this and get the stdlib you expect.

But let's dig deeper and question the assumption you made. Why do distros break it?

There are some optional parts of the stdlib (e.g. database drivers), and then there are the mechanics of the operating system to consider.

If an app in your operating system is written in Python, it is reasonable for that application to depend on a subset of the Python standard library, to reduce install footprint by minimizing dependencies. This may sound unreasonable at first, but there are things in the Python standard library that you really don't need for app runtime: e.g. documentation, dev headers, stdlib test suite, Tkinter, IDLE, lib2to3, ensurepip, or distutils. As a Python developer, these may seem like sacrosanct parts of the stdlib, but as a distro maintainer, they are not something that you need to waste install CD space for, and most desktop end-users will never miss them. You can significantly reduce the installed size of the Python stack and its dependencies on most end-user systems by making these components optional.
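
To get a rough sense of the footprint argument, the following sketch sums the on-disk size of several of the stdlib packages named above in whatever interpreter runs it. `package_size` and `OPTIONAL_PACKAGES` are hypothetical names. Documentation and C headers are not importable modules, so they are left out. The sizes include any `__pycache__` files.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Importable stdlib packages named above as not needed at app runtime.
OPTIONAL_PACKAGES = ["test", "tkinter", "idlelib", "lib2to3", "ensurepip"]


def package_size(name: str) -> Optional[int]:
    """Total bytes under a package's directory, or None if it is not a package here."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None  # absent, or a plain/extension module rather than a package
    total = 0
    for location in spec.submodule_search_locations:
        for path in Path(location).rglob("*"):
            if path.is_file():
                total += path.stat().st_size
    return total


if __name__ == "__main__":
    for name in OPTIONAL_PACKAGES:
        size = package_size(name)
        label = "absent" if size is None else f"{size / 1024:.0f} KiB"
        print(f"{name:<10} {label}")
```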

Generally, distributions break complex packages up into multiple pieces, trying to find a balance between a minimal core and all the optional bits that users may need for their particular use case. (Not every optional feature will be supported, of course.) In Debian, libreoffice is broken into around 200 binary packages, as an extreme example.

Python in Debian is broken into several major pieces:

python3: The main CPython interpreter package, including the standard runtime stdlib.
python3-minimal: Intended for install environments and size constrained CD images. Just the CPython interpreter + a minimalist subset of the stdlib.
python3-doc: Documentation.
python3-dev: C header files, the -config script, and a static version of libcpython.
python3-distutils: The distutils module (only needed at build time).
python3-examples: Examples, Demos and Tools.
python3-dbg: A debug build of the CPython interpreter.
python3-gdbm: The GNU dbm driver (and dependency on libgdbm).
python3-tk: tk module (and Tcl/Tk dependencies).
python3-venv: Depends on the wheels that ensurepip requires to bootstrap pip into a venv.
libpython3.X-testsuite: The stdlib test suite.
idle-python3.X: IDLE.
Technically, there are a few more packages, but these are the functional break-points.
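
Assuming the split above, a script can check for a few of the optional pieces by probing the modules they provide. The module-to-package mapping below is an illustrative assumption based on the list, for example that the `_tkinter` extension comes from python3-tk. It is not read from Debian's metadata. `missing_pieces` is a hypothetical helper.

```python
import importlib.util
import sys

MINOR = sys.version_info.minor

# Illustrative mapping (an assumption, not Debian metadata): a module that
# should be importable once the named Debian binary package is installed.
MODULE_TO_DEBIAN_PACKAGE = {
    "_tkinter": "python3-tk",
    "_gdbm": "python3-gdbm",
    "idlelib": f"idle-python3.{MINOR}",
}


def missing_pieces() -> dict:
    """Return {module: package} for every probed module that cannot be found."""
    return {
        module: package
        for module, package in MODULE_TO_DEBIAN_PACKAGE.items()
        if importlib.util.find_spec(module) is None
    }


if __name__ == "__main__":
    missing = missing_pieces()
    if missing:
        print("Consider: apt install " + " ".join(sorted(set(missing.values()))))
    else:
        print("All probed optional pieces are present.")
```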

> The entire purpose of ensurepip was to *ensure* that everyone had pip available with every installation of Python, by incorporating it into the stdlib.

And yet ensurepip never really made sense in a typical package-managed Linux distro. Distros have package-managers that are responsible for installing things. They don't tend to get on well with other tools messing in the same trees of the filesystem. (That's what this article is about.) Debian expects Debian users to install pip by apt installing python3-venv or python3-pip (as appropriate), not by running ensurepip to install things in /usr. For this reason, Debian has always hobbled ensurepip so that it prints an error message explaining this when executed directly. When used by the venv module, it just does what you expect, and creates a venv seeded with pip.

Cooperative package management for Python

Posted Sep 1, 2021 9:40 UTC (Wed) by MrWim (subscriber, #47432) [Link]

> ensurepip never really made sense in a typical package-managed Linux distro. Distros have package-managers that are responsible for installing things. They don't tend to get on well with other tools messing in the same trees of the filesystem.

I think this is exactly why it makes sense to include venv wherever you include pip. venv is the mechanism that people use to avoid messing with the same trees of the filesystem. By not including it you have a pip that can mess with the distro provided packages, but you don't have the capability to sandbox off these changes.
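
The sandboxing is easy to observe. Inside a venv, `sys.prefix` points at the environment, while `sys.base_prefix` still points at the base installation. The sketch below uses hypothetical helper names. It creates a throwaway venv without pip so that it does not need ensurepip's wheels.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix


def env_python(env_dir: Path) -> Path:
    """Location of the interpreter inside a venv on this platform."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def sandbox_reports_venv() -> bool:
    """Create a throwaway pip-less venv and ask its interpreter whether it is one."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "sandbox"
        # symlinks mirrors the `python -m venv` default on POSIX.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        result = subprocess.run(
            [str(env_python(env_dir)), "-c",
             "import sys; print(sys.prefix != sys.base_prefix)"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("this interpreter is in a venv:", in_virtualenv())
    print("sandbox interpreter is in a venv:", sandbox_reports_venv())
```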

Note: you don't need to be a Python developer to want pip. You'll need it whenever you want to run any non-distro-provided Python software - not only when developing it. It's exactly these users, the ones not familiar with the Python packaging tools, who are most at risk of breaking their systems in ways they don't know how to diagnose or fix.

Cooperative package management for Python

Posted Sep 1, 2021 12:31 UTC (Wed) by beagnach (guest, #32987) [Link] (2 responses)

So the distro package of python3 may not include all the standard library... that came as quite the surprise. But yes, it makes sense from a distro package manager point of view.

> The good news is that the python3-full binary package now exists to meet this need. If you're a Pythonista, you can install this and get the stdlib you expect
> python3: The main CPython interpreter package, including the standard runtime stdlib.

The difference between these two packages isn't quite clear to me.

Are there parts of the standard library absent from the python3 package in latest Debian? If so is there an easy way to find out the details?

My experience has been that this is often under-documented - it can be quite hard to establish exactly how components are divided up between the various Debian packages relating to a single large project.

Cooperative package management for Python

Posted Sep 1, 2021 18:22 UTC (Wed) by stefanor (subscriber, #32895) [Link]

The easy way to find out the details is to look at what python3-full depends on.
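
On a Debian system, that check can be scripted with `apt-cache depends python3-full`. The parser below is a hedged sketch. It assumes that each direct dependency appears on a line of the form `Depends: name`, with alternatives prefixed by `|`. It returns an empty list when `apt-cache` is not installed. Both function names are hypothetical.

```python
import shutil
import subprocess


def parse_depends(apt_cache_output: str) -> list:
    """Collect package names from the `Depends:` lines of `apt-cache depends` output."""
    deps = []
    for line in apt_cache_output.splitlines():
        entry = line.strip().lstrip("|")  # "|" marks an alternative
        if entry.startswith("Depends:"):
            deps.append(entry.split(":", 1)[1].strip())
    return deps


def python3_full_depends() -> list:
    """Query apt for python3-full's direct dependencies, if apt-cache exists."""
    if shutil.which("apt-cache") is None:
        return []
    result = subprocess.run(
        ["apt-cache", "depends", "python3-full"],
        capture_output=True, text=True, check=False,
    )
    return parse_depends(result.stdout)


if __name__ == "__main__":
    for dep in python3_full_depends():
        print(dep)
```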

The written documentation for the split is here:
https://www.debian.org/doc/packaging-manuals/python-polic...
And yes, that often lags behind the reality in the archive; it's easy to forget to update documentation. It is currently accurate, though.

Cooperative package management for Python

Posted Sep 7, 2021 4:47 UTC (Tue) by nevyn (guest, #33129) [Link]

> So the distro package of python3 may not include all the standard library

Using the word "python3" as though it's a single thing is confusing you, and probably others trying to follow along. For almost any project in a distribution, you have at least three things worth talking about: 1) All of it (roughly what you get if you installed from an upstream tarball). 2) The major usable piece of it (/usr/bin/python3 and all of the stdlib, so you can run arbitrary python3 programs, in this case). 3) Specific dependencies so that distribution package FOO can run correctly.

All three of those things might install /usr/bin/python3, and could thus be called "python3", but they are likely to be very different otherwise.

Cooperative package management for Python

Posted Sep 1, 2021 20:59 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (5 responses)

> When used by the venv module, it just does what you expect, and creates a venv seeded with pip.

For quite a while, this was not the case, and you simply got a broken venv. Fortunately, it would print an error message explaining the problem, but that's still incredibly obnoxious behavior.

Cooperative package management for Python

Posted Sep 1, 2021 22:16 UTC (Wed) by stefanor (subscriber, #32895) [Link] (4 responses)

> For quite a while, this was not the case, and you simply got a broken venv. Fortunately, it would print an error message explaining the problem, but that's still incredibly obnoxious behavior.

That is still the case. It doesn't complete creating the venv, prints an error explaining why, and exits non-zero.

Cooperative package management for Python

Posted Sep 1, 2021 22:42 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (3 responses)

There is no logical reason for that limitation. The only thing that venv actually does is make a directory with a very specific set of regular files and directories underneath. It does not interact with the system's package manager or site-packages in any way. The scripts which it creates are specifically designed to *prevent* the user from interacting with the system's package manager or site-packages. venv is not "just for developers" either, since it can be used in (for example) an automated deployment script for a cattle-not-pets environment. That's why it has a programmatic interface.
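
That claim is easy to check. The minimal sketch below creates a venv without pip in a temporary directory, lists its top-level entries, and reads back `pyvenv.cfg`. The helper names are hypothetical, and the exact set of directories varies by platform and Python version.

```python
import os
import tempfile
import venv
from pathlib import Path


def venv_layout(env_dir: Path) -> list:
    """Create a pip-less venv at env_dir and return its top-level entries."""
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return sorted(entry.name for entry in env_dir.iterdir())


def read_pyvenv_cfg(env_dir: Path) -> dict:
    """Parse the `key = value` lines of the venv's pyvenv.cfg."""
    cfg = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = Path(tmp) / "deploy-env"
        # Expect pyvenv.cfg, a scripts directory (bin/ or Scripts\) and a library directory.
        print(venv_layout(env))
        cfg = read_pyvenv_cfg(env)
        print("base interpreter home:", cfg["home"])
        print("system site-packages:", cfg["include-system-site-packages"])
```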

Maybe you disagree with me about where exactly to draw the line between "developer-oriented" and "regular" parts of the stdlib. But for better or for worse, the documentation at docs.python.org strongly suggests that this is a standard part of Python, and that the harried sysadmin who just wants to automate their deployment process (without doing a whole lot of faffing about with Docker and Kubernetes) can reasonably assume it will work, because it's part of the whole Batteries Included thing. And of course, it actually *does* work on the sysadmin's laptop, because at some point they installed venv or something which depends on it, and then forgot about it, because setting up a venv is practically the first thing you do after you graduate from writing Hello World programs in Python.

I can understand limiting ensurepip so that it can only install into venv. But splitting out venv altogether just smacks of the pretentious "for your own good" nonsense I got sick of with the GNOME folks years ago. This is not some huge package like GCC or Make, nor is it a set of bindings to a big thing that might not be installed (like Tkinter). venv is a relatively small set of self-contained tools which upstream has loudly advertised as Included.™ I don't see which users actually benefit from breaking it out into a separate package, unless the only benefit is a slightly smaller installation size. But then you might as well go around breaking everything out into separate packages. Let's have python3-pathlib and python3-itertools and python3-enum and...

Cooperative package management for Python

Posted Sep 2, 2021 0:22 UTC (Thu) by stefanor (subscriber, #32895) [Link] (2 responses)

> There is no logical reason for that limitation.

Yes, there is: minimising the payload of Python needed to support applications written in Python.

On a clean Debian bullseye install, installing python3-venv requires an additional 7 binary packages using 5MiB of disk space, over just python3.
They are: ca-certificates openssl python-pip-whl python3-distutils python3-lib2to3 python3-venv python3.9-venv

We've simplified things over time: python-pip-whl currently contains 31 wheels, which in the past were all separate binary packages.

The harried sysadmin you describe will get an error message describing exactly what they need to do to resolve their issue. This is probably one of tens of such issues they'll see while automating their deployment.

Cooperative package management for Python

Posted Sep 2, 2021 19:14 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

% apt show ca-certificates
Package: ca-certificates
Version: 20210119
Priority: optional
Section: misc
Maintainer: Julien Cristau <jcristau@debian.org>
Installed-Size: 391 kB
Depends: openssl (>= 1.1.1), debconf (>= 0.5) | debconf-2.0
Breaks: ca-certificates-java (<< 20121112+nmu1)
Enhances: openssl
Download-Size: 247 kB
APT-Manual-Installed: yes
APT-Sources: [my employer's private mirror]
Description: Common CA certificates
Contains the certificate authorities shipped with Mozilla's browser to allow
SSL-based applications to check for the authenticity of SSL connections.
.
Please note that Debian can neither confirm nor deny whether the
certificate authorities whose certificates are included in this package
have in any way been audited for trustworthiness or RFC 3647 compliance.
Full responsibility to assess them belongs to the local system
administrator.

I am willing to concede that it is technically possible that someone, somewhere, might not want to have this ~400 KiB package, and/or OpenSSL, installed.

The other five, however, are nonsense. Two of the packages you list are just more specific versions of venv, and pip-whl and distutils are both part of the Python stdlib proper, which Debian has artificially split out. That leaves lib2to3, and I would be shocked if that *wasn't* self-inflicted by Debian supporting old versions of Python. Upstream certainly shouldn't be depending on it anymore.

In short: The only real argument I'm seeing here is "some people want Python, but don't want OpenSSL." I'm skeptical that this is a very large or interesting class of users, worth the inconvenience it causes to everybody else.

> The harried sysadmin you describe will get an error message describing exactly what they need to do to resolve their issue.

No. They will click the big shiny "deploy" button, deployment will not actually happen, and they will then spend 20+ minutes digging through logs until they eventually figure out that it's the venv step which is failing.
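
One way to shorten that hunt is to make the venv step fail loudly with the tool's own message. The sketch below assumes that the deployment is driven by a Python script. `create_venv_or_fail` and `VenvCreationError` are hypothetical names. The check relies only on `python -m venv` exiting non-zero on failure, as described earlier in the thread.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class VenvCreationError(RuntimeError):
    """Raised when `python -m venv` exits non-zero."""


def create_venv_or_fail(env_dir: Path, python: str = sys.executable,
                        with_pip: bool = True) -> None:
    """Run `python -m venv` and put its own output at the top of any failure."""
    cmd = [python, "-m", "venv", str(env_dir)]
    if not with_pip:
        cmd.append("--without-pip")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        output = "\n".join(
            s for s in (result.stdout.strip(), result.stderr.strip()) if s
        )
        raise VenvCreationError(
            f"venv step failed (exit {result.returncode}) for {env_dir}:\n{output}"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        try:
            # with_pip=False keeps this demo independent of ensurepip's wheels;
            # a real deployment would normally keep the default.
            create_venv_or_fail(Path(tmp) / "app-env", with_pip=False)
            print("venv ready")
        except VenvCreationError as exc:
            # Abort the deploy with the explanation as the first thing in the log.
            sys.exit(str(exc))
```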

Cooperative package management for Python

Posted Sep 10, 2021 9:09 UTC (Fri) by laarmen (subscriber, #63948) [Link]

You're focusing on the functionality here, but there's an important point made by stefanor: venv pulls in several MiB of dependencies.

My guess is that a choice has been made to spare those MiB for all the Debian installs which simply want to use a Python program, not *develop* one (or deploy an application with dependencies not satisfied by Debian...). This is fairly logical and in keeping with other Debian packaging practices, such as not shipping the library headers with the main lib package, etc.

And yes, there are a lot of other ways some space could be saved on Debian installs.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds