Convergence in the pip and conda worlds?
The discussions about the world of Python packaging and the problems caused by its disparate tools and incompatible ecosystems are still ongoing. Last week, we looked at the beginnings of the conversation in mid-November, as the discussion turned toward a possible convergence between two of the major package-management players: pip and conda. There are numerous barriers to bringing the two closer together, not least inertia, but the advantages for users of both, as well as for new users to come, could be substantial.
conda versus pip
As our overview of the packaging landscape outlines, the Anaconda distribution for Python, which developed conda as its package manager, is largely aimed at the scientific-computing world, while pip comes out of the Python Packaging Authority (PyPA). These days, pip is one of the "batteries included" with Python so it is often seen as the "official" packaging solution, even though the PyPA does not necessarily see it that way. The belief that pip is official is part of the problem, H. Vetinari said:
If the python packaging authority doesn't mention conda anywhere, a lot of people will never even discover it. And even people who are aware are doubtful - I see the confusion all the time (in my dayjob and online) about which way is "the right way" to do python packaging and dependency management.

I firmly believe that the vast majority of users would adapt to any paradigm that solves their problems and doesn't get in their way too much. I think the strongest resistance actually comes from those people knee-deep in packaging entrails, and the significance of that group is that many of them are the movers and shakers of the (non-conda) packaging ecosystem.
Vetinari is active in the conda-forge package repository community; he thinks that the conda "side" is willing to change to try to find a way to cover the needs of those who do not currently use it. "End users really don't benefit from a zoo of different solutions [...]". PyPA developer Paul Moore had suggested that it would require a fair amount of work to bring pip and conda together, though he is personally not an advocate of that plan. Steve Dower said that he did not see a need to "reconcile conda into a 'Python packaging vision'", since conda is a "full-stack" solution that provides everything, including Python itself. But Vetinari sees things differently:
Conda is full-stack because that's – unfortunately – what's necessary to deal with the inherent complexity of the problem space. But it's not a beneficial state of affairs for anyone IMO; that divergence is an issue that affects a huge amount of python deployments (e.g. having to decide how to prioritize the benefits of pyproject.toml / poetry's UX [user experience], etc. vs. the advantages of conda) – it's possible to claim that it's too much work to reconcile, but fundamentally, that schism shouldn't have to exist.
Ralf Gommers pointed out that discussions of this sort often go nowhere because the participants are talking past each other. The "regular" Python users, who can mostly just pick up their dependencies from the Python Package Index (PyPI) using pip or who get packages via their Linux distribution, have a much different picture from those doing scientific computing or machine learning with Python. The two groups generally do not experience the same problems, thus it is not surprising that they do not see solutions that bridge both worlds.
The problems for scientific Python—which are "related to compiler toolchains, ABIs, distributing packages with compiled code in them, being able to express dependencies on non-Python libraries and tools, etc."—are complex, but they have not been explained well over the years. Gommers was pre-announcing an effort to fill that hole: "I'm making a serious attempt at comprehensively describing the key problems scientific, ML/AI and other native-code-using folks have with PyPI, wheels and Python packaging." The pypackaging-native site is meant as a reference, "so we hopefully stop talking past each other". He formally announced (and linked to) the site at the end of December.
History and future
Bryan Van de Ven recounted some of the history of conda, noting that it came about around the same time as the PyPA and before the wheel binary package format was born. Decisions that were made at that time would probably be made much differently today. Van de Ven noted that he is no longer a conda developer, but he did have a specific wish list of features for more unified packaging if he "could wave a wand":
- conda-style environments (because a link farm is more general)
- wheel packages for most/all Python packages (because they are sufficient)
- "conda packages" (or something like them) for anything else, e.g. non-python requirements
He was asked about conda's "link farm" environment, which is another way to provide a virtual environment, like those created by venv in the standard library or virtualenv on PyPI. Van de Ven briefly described the idea since he was unaware of any documentation on it:
The gist is that every version of every package is installed in a completely isolated directory, with its own entire "usr/local" hierarchy underneath. Then "creating an environment" means making a directory <envname> with an empty "usr/local" hierarchy, and linking in all the files from the relevant package version hierarchies there. Now "activating an environment" means "point your PATH at <envname>/bin".
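None of this is how conda is actually implemented, but a minimal sketch helps make the idea concrete; here, the package-store layout, names, and the create_env() helper are all invented for illustration:

    import os
    from pathlib import Path

    # Hypothetical package store: every version of every package is
    # installed into its own isolated prefix, e.g. pkgs/numpy-1.24.1/bin,
    # pkgs/numpy-1.24.1/lib, and so on.
    PKG_STORE = Path("pkgs")

    def create_env(env_name: str, packages: list[str]) -> None:
        """Create an environment by linking in files from package prefixes."""
        env = Path(env_name)
        for pkg in packages:
            prefix = PKG_STORE / pkg
            for src in prefix.rglob("*"):
                if src.is_dir():
                    continue
                dest = env / src.relative_to(prefix)
                dest.parent.mkdir(parents=True, exist_ok=True)
                if not dest.exists():
                    os.symlink(src.resolve(), dest)  # a link farm, not copies

    # "Activating" myenv is then just prepending myenv/bin to $PATH
    create_env("myenv", ["python-3.11.2", "numpy-1.24.1"])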
A Python virtual environment created by venv is a separate directory structure that effectively builds atop an existing Python installation on the host system. Packages are installed into a venv-specific site-packages directory; a venv-specific bin holds a link to the Python binary as well as an activation script that can be run to "enter" the environment. The venv arranges that executing a script from the bin automatically activates the environment for that invocation; actually doing an activation sets up the shell path and Python sys.prefix and sys.exec_prefix variables to point into the environment until it is deactivated.
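By way of contrast, the stock mechanism can be exercised entirely from the standard library; here is a small sketch (the directory name is just an example, and on Windows the bin directory is called Scripts) that creates an environment and shows the pieces described above:

    import pathlib
    import venv

    # Create a virtual environment in ./demo-env and bootstrap pip into it
    venv.create("demo-env", with_pip=True)

    env = pathlib.Path("demo-env")
    # pyvenv.cfg records the base Python installation the environment uses
    print((env / "pyvenv.cfg").read_text())
    # bin/ holds the activation scripts and a link to the python binary
    print(sorted(p.name for p in (env / "bin").iterdir()))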
Moore wondered whether it made sense to start working on some of those items that Van de Ven had listed. The changes that the PyPA is working on "have extremely long timescales" because getting people to move away from their existing practices "is an incredibly slow process, if you don't want to just alienate everyone". Given that, it makes sense to start now with incremental changes and with establishing standards moving forward.
Of course, there's no guarantee that everyone shares your view on the ideal solution (and if you're looking to standardise on conda-style environments, that will include the core devs, as venv is a stdlib facility) but I'd hope that negotiation and compromise isn't out of the question here :)
Gommers agreed with that as a "desired solution direction", but as he and Moore discussed it further in the thread, it was clear there is still a fairly wide gulf to somehow bridge. Nathaniel J. Smith thought that it made more sense for conda to integrate more from pip than the other way around:
I think the simplest way to make conda/pip play well together would be for conda to add first-class support for the upstream python packaging formats – wheels, .dist-info directories, etc. Then conda could see a complete picture of everything that's installed, whether from conda packages or wheels, handle conflicts between them, etc.

This seems a lot easier than pip growing to support conda, because pip is responsible for supporting all python environments – venv, distro, whatever – while conda is free to specialize. Also the python packaging formats are much better documented than the conda equivalents.
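The upstream formats Smith mentions are introspectable with nothing but the standard library, which is part of why this direction seems plausible. A rough sketch of how a tool might enumerate what wheel-based installers have put into an environment (note that the INSTALLER file is an optional record, so it may be absent):

    import importlib.metadata

    # Walk every .dist-info/.egg-info directory visible on sys.path
    for dist in importlib.metadata.distributions():
        name = dist.metadata["Name"]
        version = dist.version
        # Installers such as pip record themselves in dist-info/INSTALLER
        installer = (dist.read_text("INSTALLER") or "unknown").strip()
        print(f"{name} {version} (installed by {installer})")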
The discussion so far had proceeded without any conda developers weighing in, but that changed when conda tech lead (and PyPA co-founder) Jannis Leidel posted. "I hope to build bridges between conda and the PyPA stack as much as possible to improve the user experience of both ecosystems." He noted that conda has moved to a multi-stakeholder governance model; "Anaconda is still invested (and increasingly so) but it's not the only stakeholder anymore". He thinks that both conda and the PyPA "made the same painful mistakes of over-optimizing for a particular subset of users", which is a fundamental problem. He also made some general points about the packaging situation and on working together.
Moore had two specific questions for Leidel. Did he think that conda would ever be usable with a Python installation that was not created by conda? Would conda builds of Python packages ever be usable by non-conda tools like pip? Moore concluded: "For me, those are the two key factors that will determine whether we should be thinking in terms of a single unified ecosystem, or multiple independent ones."
Leidel replied that it would be hard to get conda to work with other Python installations "since for conda Python is just another package that it expects to have been consistently built and available". Using conda packages elsewhere is more plausible, but there is still quite a bit of work to get there. For one thing, he would like to see "an evolution of the wheel format to optionally include conda-style features". He agreed, however, that the question of unification versus multiple independent projects was an important one to answer.
Vendoring
One problem area is that PyPI packages often bundle (or "vendor") other libraries and such into their wheels in order to make it easier for users who may not have the specialized libraries available. Those who use Linux package managers typically do not have those problems because the distribution packages the dependencies separately and the package manager installs them automatically—the same goes for conda users. Dower said that this vendoring is one of the main reasons that conda cannot simply consult PyPI to pick up its dependencies since it may well also get incompatible versions of other libraries that are along for the ride.
If Conda also searched PyPI for packages, this would mean packagers would just have to publish a few additional wheels that:

- don't vendor things available as conda packages
- do include additional dependencies for those things
- link against the import libraries/headers/options used for the matching Conda builds of dependencies

Those three points are the critical ones that make sharing builds between Conda and PyPI impossible (or at least, against the design) regardless of direction.
Numpy installed through PyPI needs to vendor anything that can't be assumed to be on the system. Numpy installed through Conda must not vendor it, because it should be using the same shared library as everything else in the environment. This can only realistically be reconciled with multiple builds and separate packages (or very clever packages).
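To see the PyPI side of that in practice, the libraries a binary wheel bundles can be listed directly from the wheel file; manylinux wheels that have been through auditwheel, for instance, carry them in a <package>.libs directory. A small sketch, where the wheel filename is only an example (any locally downloaded wheel will do):

    import zipfile

    # Example filename; substitute any wheel you have on hand
    wheel = "numpy-1.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"

    with zipfile.ZipFile(wheel) as wf:
        vendored = [n for n in wf.namelist()
                    if ".libs/" in n and n.endswith(".so")]
        # For NumPy this typically shows a bundled OpenBLAS, the very
        # library a conda build would instead take from the environment
        for name in vendored:
            print(name)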
Adding some platform/ABI tags to wheels for conda, as Leidel suggested, could make PyPI/pip and conda more interoperable, Dower said. He outlined a set of things that needed to be done, starting with a way to define "native requirements" (for the non-Python dependencies). Gommers explained what that would look like using SciPy as an example. It has various Python dependencies (e.g. NumPy, Cython, Meson, etc.) that it declares in its pyproject.toml file, but there is also a list of dependencies that cannot be declared that way: C/C++ compiler, Fortran compiler, BLAS and LAPACK, and so on. He is interested in working on a Python Enhancement Proposal (PEP) to add a way to declare the native dependencies; he thinks that could help improve the user experience, especially for packages with more complicated needs:
And SciPy is still simple compared to other cases, like GPU or distributed libraries. Right now we just start a build when someone types pip install scipy and there's no wheel. And then fail halfway through with a hopefully somewhat clear error message. And then users get to read the html docs to figure out what they are missing. At that point, even a "system dependencies" list that pip can only show as an informative error message at the end would be a big help.
Moore was certainly in favor of that approach. Being able to check in advance whether it will be possible to build something for a Python package would be useful so that tools can at least tell users what it is that they are missing. More capable tools may be able to actually go out and fetch the needed pieces; "even for pip, having better errors and not starting builds that are guaranteed to fail would be a great step forward".
After some more discussion on the need for additional metadata in order to support those kinds of changes, the conversation began to trail off—in that thread, anyway. At the end of November, the results of a survey of users about packaging were announced, which, perhaps unsurprisingly, resulted in more discussion, there and in a strategy discussion thread that was started shortly after the new year. Beyond that, several PEPs have been floating around for discussion, while yet another packaging tool and binary format was announced. It is, obviously, a wildly busy time in the packaging realm or, perhaps more accurately at this point: in the discussions about said realm.
Index entries for this article:
Python: Packaging
Posted Feb 2, 2023 10:05 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (3 responses)
> pip is one of the "batteries included" with Python so it is often seen as the "official" packaging solution, even though the PyPA does not necessarily see it that way.
Which surprised me. If pip isn't official then no Python packaging solution is official. Which is just bizarre. It certainly surprised everyone I suggested it to.
On the other hand we have:
> because pip is responsible for supporting all python environments – venv, distro, whatever – while conda is free to specialize
If pip is responsible for supporting all environments, then that would imply it's official. Otherwise this responsibility would not exist.
ISTM that the quickest win is to standardise a way for packages to declare external dependencies in normal python packages. Even if pip doesn't use it, it opens the way for other tools to use it.
Posted Feb 2, 2023 10:38 UTC (Thu)
by dottedmag (subscriber, #18590)
[Link] (1 responses)
Conda, Nix, Debian and other closed infrastructures can do it by controlling the whole set of packages.
However Python packages are exposed to the wild world, where even "I need a C compiler" won't do, as there are many C compilers with different supported C versions, bugs, incompatible extensions, CLI interfaces, target platforms, ideas about ABI etc.
Posted Feb 9, 2023 15:29 UTC (Thu)
by fung1 (subscriber, #144307)
[Link]
A big part of why Pip is not viewed as "the official solution" to package installation is that PyPA has been striving to reinvent packaging tools so that they're based on published standards and specifications rather than a de facto "whatever this tool does is the standard" approach. This means even something as seemingly ubiquitous as Pip is supposed to just be one possible implementation of those standards, in order to allow for fair competition from anyone else who wants to develop an interoperable replacement.

Officially blessing one solution is viewed by many as favoritism, making it very hard if not nearly impossible for any alternative to gain sufficient mind-share. The initial standards are being derived from what these tools do in order to not cause them to suddenly be non-compliant, but with the idea that as people want to implement sweeping new features or scope changes in the tools, they need to get them reflected in reviewed and agreed-upon published standards first.
Pip's maintainers see it as being responsible for meeting many of these use cases, but that doesn't mean the responsibility is placed on it by the CPython project. Rather, it's a scope the maintainers have chosen to give it, mostly in order to maintain backwards compatibility for users of earlier versions and predecessors like easy_install. They technically also have the "freedom to specialize" but they prefer not to exercise it, as that would leave many current users of its more general approach in the lurch.
Posted Feb 3, 2023 10:22 UTC (Fri)
by cortana (subscriber, #24596)
[Link] (2 responses)
> The belief that pip is official is part of the problem

I'm just in despair at this point. The official Install Python Modules documentation says quite clearly:

> pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.

I'm now becoming more and more inclined to disregard any recommendations from PyPA. They burned all my goodwill with the Pipenv fiasco, and now this. These days I've settled on a combination of Poetry and micropipenv as the least annoying Python packaging tools, but I expect before too long, something else will force me to move on to newer tools with names generated by picking a random combination of terms from the list [py, dist, package, setup, env, utils, virt, build, virtual, wheel, v, pack, file, ...].
Posted Feb 3, 2023 20:15 UTC (Fri)
by intelfx (subscriber, #130118)
[Link] (1 responses)
What was the "Pipenv fiasco"?
Posted Feb 6, 2023 0:56 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link]
https://chriswarrick.com/blog/2018/07/17/pipenv-promises-...

https://old.reddit.com/r/Python/comments/8jd6aq/why_is_pi...

I cannot speak to the accuracy of anything said on either of those pages, because I have never used pipenv myself and did not follow this issue at the time. Note also that both links are several years old, and the situation may have changed since 2018.
Posted Feb 8, 2023 15:50 UTC (Wed)
by qwertyface (subscriber, #84167)
[Link]
Before reading it, I'd drafted a comment saying that conda absolutely shouldn't become a standard solution to distributing Python packages. It has most of the scope of a Linux distribution, but without the helpful restriction of being Linux only. I still mostly believe that, but now think something like it might always be necessary in some circumstances.
Posted Feb 9, 2023 14:56 UTC (Thu)
by mboisson (guest, #163560)
[Link]
* don’t vendor things available as <conda=>system> packages
* do include additional dependencies for those things
* link against the import libraries/headers/options used for the matching <conda=>system> builds of dependencies

On cluster environments, we actually ask our users to *not* use conda, in large part due to these reasons.

conda is more like yum/apt than it is like pip, and that does not play well on a cluster, but our users keep coming to us wanting to use conda instead of pip (and then we show them that pip works better on clusters).

https://docs.alliancecan.ca/wiki/Anaconda/en