
Convergence in the pip and conda worlds?

By Jake Edge
February 1, 2023

Python packaging

The discussions about the world of Python packaging and the problems caused by its disparate tools and incompatible ecosystems are still ongoing. Last week, we looked at the beginnings of the conversation in mid-November, as the discussion turned toward a possible convergence between two of the major package-management players: pip and conda. There are numerous barriers to bringing the two closer together, not least inertia, but the advantages for users of both, as well as for new users to come, could be substantial.

conda versus pip

As our overview of the packaging landscape outlines, the Anaconda distribution for Python, which developed conda as its package manager, is largely aimed at the scientific-computing world, while pip comes out of the Python Packaging Authority (PyPA). These days, pip is one of the "batteries included" with Python, so it is often seen as the "official" packaging solution, even though the PyPA does not necessarily see it that way. The belief that pip is official is part of the problem, H. Vetinari said:

If the python packaging authority doesn't mention conda anywhere, a lot of people will never even discover it. And even people who are aware are doubtful - I see the confusion all the time (in my dayjob and online) about which way is "the right way" to do python packaging and dependency management.

I firmly believe that the vast majority of users would adapt to any paradigm that solves their problems and doesn't get in their way too much. I think the strongest resistance actually comes from those people knee-deep in packaging entrails, and the significance of that group is that many of them are the movers and shakers of the (non-conda) packaging ecosystem.

Vetinari is active in the conda-forge package repository community; he thinks that the conda "side" is willing to change to try to find a way to cover the needs of those who do not currently use it. "End users really don't benefit from a zoo of different solutions [...]". PyPA developer Paul Moore had suggested that it would require a fair amount of work to bring pip and conda together, though he is personally not an advocate of that plan. Steve Dower said that he did not see a need to "reconcile conda into a 'Python packaging vision'", since conda is a "full-stack" solution that provides everything, including Python itself. But Vetinari sees things differently:

Conda is full-stack because that's – unfortunately – what's necessary to deal with the inherent complexity of the problem space. But it's not a beneficial state of affairs for anyone IMO; that divergence is an issue that affects a huge amount of python deployments (e.g. having to decide how to prioritize the benefits of pyproject.toml / poetry's UX [user experience], etc. vs. the advantages of conda) – it's possible to claim that it's too much work to reconcile, but fundamentally, that schism shouldn't have to exist.

Ralf Gommers pointed out that discussions of this sort often go nowhere because the participants are talking past each other. The "regular" Python users, who can mostly just pick up their dependencies from the Python Package Index (PyPI) using pip or who get packages via their Linux distribution, have a much different picture from those doing scientific computing or machine learning with Python. The two groups generally do not experience the same problems, thus it is not surprising that they do not see solutions that bridge both worlds.

The problems for scientific Python—which are "related to compiler toolchains, ABIs, distributing packages with compiled code in them, being able to express dependencies on non-Python libraries and tools, etc."—are complex, but they have not been explained well over the years. Gommers was pre-announcing an effort to fill that hole: "I'm making a serious attempt at comprehensively describing the key problems scientific, ML/AI and other native-code-using folks have with PyPI, wheels and Python packaging." The pypackaging-native site is meant as a reference site, "so we hopefully stop talking past each other". He formally announced (and linked to) the site at the end of December.

History and future

Bryan Van de Ven recounted some of the history of conda, noting that it came about around the same time as the PyPA and before the wheel binary package format was born. Decisions that were made at that time would probably be made much differently today. Van de Ven noted that he is no longer a conda developer, but he did have a specific wish list of features for more unified packaging if he "could wave a wand":

  • conda-style environments (because a link farm is more general)
  • wheel packages for most/all Python packages (because they are sufficient)
  • "conda packages" (or something like them) for anything else, e.g. non-python requirements

He was asked about conda's "link farm" environment, which is another way to provide a virtual environment, like those created by venv in the standard library or virtualenv on PyPI. Van de Ven briefly described the idea since he was unaware of any documentation on it:

The gist is that every version of every package is installed in a completely isolated directory, with its own entire "usr/local" hierarchy underneath. Then "creating an environment" means making a directory <envname> with an empty "usr/local" hierarchy, and linking in all the files from the relevant package version hierarchies there. Now "activating an environment" means "point your PATH at <envname>/bin".
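
The idea can be sketched in a few lines of Python. This is only an illustration of the link-farm concept, not conda's actual implementation; the pkgs/ and envs/ paths and the package names are invented:

    # Hypothetical sketch of a conda-style link farm; not conda's real code.
    # Each package version lives in its own fully isolated tree under pkgs/;
    # an environment is just a directory of links into those trees.
    import os

    PKGS = "/opt/pkgs"    # assumed layout, e.g. /opt/pkgs/numpy-1.24.1/...

    def create_env(env_path, packages):
        for pkg in packages:
            pkg_root = os.path.join(PKGS, pkg)
            for dirpath, _dirnames, filenames in os.walk(pkg_root):
                rel = os.path.relpath(dirpath, pkg_root)
                target_dir = os.path.join(env_path, rel)
                os.makedirs(target_dir, exist_ok=True)
                for name in filenames:
                    # Hard-link (or symlink) each file into the environment.
                    os.link(os.path.join(dirpath, name),
                            os.path.join(target_dir, name))

    # "Activating" the result just means putting /opt/envs/myenv/bin on PATH.
    create_env("/opt/envs/myenv", ["python-3.11.1", "numpy-1.24.1"])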

A Python virtual environment created by venv is a separate directory structure that effectively builds atop an existing Python installation on the host system. Packages are installed into a venv-specific site-packages directory; a venv-specific bin directory holds a link to the Python binary, as well as an activation script that can be run to "enter" the environment. The venv arranges things so that executing a script from that bin automatically activates the environment for that invocation; actually doing an activation sets the shell path and the Python sys.prefix and sys.exec_prefix variables to point into the environment until it is deactivated.
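
The standard-library venv module makes that mechanism easy to see; here is a minimal sketch (POSIX layout assumed, where the environment's scripts live in bin/):

    # Create a virtual environment with the standard-library venv module,
    # then ask its interpreter where it thinks Python lives.
    import subprocess
    import venv

    venv.create("demo-env", with_pip=True)    # writes demo-env/bin, demo-env/pyvenv.cfg, ...

    # Running the environment's own interpreter activates it for that
    # invocation: sys.prefix points into the environment, while the base
    # installation remains untouched.
    subprocess.run(["demo-env/bin/python", "-c",
                    "import sys; print(sys.prefix, sys.exec_prefix)"])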

Moore wondered whether it made sense to start working on some of those items that Van de Ven had listed. The changes that the PyPA is working on "have extremely long timescales" because getting people to move away from their existing practices "is an incredibly slow process, if you don't want to just alienate everyone". Given that, it makes sense to start now with incremental changes and with establishing standards moving forward.

Of course, there's no guarantee that everyone shares your view on the ideal solution (and if you're looking to standardise on conda-style environments, that will include the core devs, as venv is a stdlib facility) but I'd hope that negotiation and compromise isn't out of the question here :)

Gommers agreed with that as a "desired solution direction" but as he and Moore discussed it further in the thread, it was clear there is still a fairly wide gulf to somehow bridge. Nathaniel J. Smith thought that it made more sense for conda to integrate more from pip than the other way around:

I think the simplest way to make conda/pip play well together would be for conda to add first-class support for the upstream python packaging formats – wheels, .dist-info directories, etc. Then conda could see a complete picture of everything that's installed, whether from conda packages or wheels, handle conflicts between them, etc.

This seems a lot easier than pip growing to support conda, because pip is responsible for supporting all python environments – venv, distro, whatever – while conda is free to specialize. Also the python packaging formats are much better documented than the conda equivalents.
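
The wheel side of that picture is already visible to any tool that reads the standard metadata installers write. As a short sketch, the standard-library importlib.metadata can enumerate everything installed in the current environment; a conda that understood this format could merge it with its own records:

    # List every distribution installed in the current environment, using
    # the .dist-info metadata that wheel installers such as pip write.
    from importlib import metadata

    for dist in metadata.distributions():
        requires = dist.requires or []    # may be None when unset
        print(dist.metadata["Name"], dist.version,
              f"({len(requires)} declared requirements)")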

The discussion so far had proceeded without any conda developers weighing in, but that changed when conda tech lead (and PyPA co-founder) Jannis Leidel posted. "I hope to build bridges between conda and the PyPA stack as much as possible to improve the user experience of both ecosystems." He noted that conda has moved to a multi-stakeholder governance model; "Anaconda is still invested (and increasingly so) but it's not the only stakeholder anymore". He thinks that both conda and the PyPA "made the same painful mistakes of over-optimizing for a particular subset of users", which is a fundamental problem. He also made some general points about the packaging situation and about working together.

Moore had two specific questions for Leidel. Did he think that conda would ever be usable with a Python installation that was not created by conda? Would conda builds of Python packages ever be usable by non-conda tools like pip? Moore concluded: "For me, those are the two key factors that will determine whether we should be thinking in terms of a single unified ecosystem, or multiple independent ones."

Leidel replied that it would be hard to get conda to work with other Python installations "since for conda Python is just another package that it expects to have been consistently built and available". Using conda packages elsewhere is more plausible, but there is still quite a bit of work to get there. For one thing, he would like to see "an evolution of the wheel format to optionally include conda-style features". He agreed that the question of unification versus multiple independent projects was an important one to answer, however.

Vendoring

One problem area is that PyPI packages often include (or "vendor") other libraries and such into their wheels in order to make it easier for users who may not have the specialized libraries available. Those who use Linux package managers typically do not have those problems because the distribution packages the dependencies separately and the package manager installs them automatically—the same goes for conda users. Dower said that this vendoring is one of the main reasons that conda cannot simply consult PyPI to pick up its dependencies since it may well also get incompatible versions of other libraries that are along for the ride.

If Conda also searched PyPI for packages, this would mean packagers would just have to publish a few additional wheels that:
  • don't vendor things available as conda packages
  • do include additional dependencies for those things
  • link against the import libraries/headers/options used for the matching Conda builds of dependencies
Those three points are the critical ones that make sharing builds between Conda and PyPI impossible (or at least, against the design) regardless of direction.

Numpy installed through PyPI needs to vendor anything that can't be assumed to be on the system. Numpy installed through Conda must not vendor it, because it should be using the same shared library as everything else in the environment. This can only realistically be reconciled with multiple builds and separate packages (or very clever packages).
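
That difference is easy to observe, because a wheel is just a zip file. Here is a short sketch that lists the shared libraries bundled into a manylinux wheel from PyPI; the wheel filename is only an example:

    # List the shared libraries vendored into a wheel (a zip archive).
    # Tools like auditwheel copy them into a *.libs/ directory; a conda
    # build of the same package links against shared packages instead.
    import zipfile

    WHEEL = "numpy-1.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"

    with zipfile.ZipFile(WHEEL) as whl:
        for name in whl.namelist():
            if name.endswith(".so") and ".libs/" in name:
                print(name)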

Adding some platform/ABI tags to wheels for conda, as Leidel suggested, could make PyPI/pip and conda more interoperable, Dower said. He outlined a set of things that needed to be done, starting with a way to define "native requirements" (for the non-Python dependencies). Gommers explained what that would look like using SciPy as an example. It has various Python dependencies (e.g. NumPy, Cython, Meson, etc.) that it declares in its pyproject.toml file, but there is also a list of dependencies that cannot be declared that way: C/C++ compiler, Fortran compiler, BLAS and LAPACK, and so on. He is interested in working on a Python Enhancement Proposal (PEP) to add a way to declare the native dependencies; he thinks that could help improve the user experience, especially for packages with more complicated needs:

And SciPy is still simple compared to other cases, like GPU or distributed libraries. Right now we just start a build when someone types pip install scipy and there's no wheel. And then fail halfway through with a hopefully somewhat clear error message. And then users get to read the html docs to figure out what they are missing. At that point, even a "system dependencies" list that pip can only show as an informative error message at the end would be a big help.
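
No such metadata exists yet, so any concrete example is speculative, but a hypothetical [external] table in pyproject.toml (the table name and its keys are invented here for illustration) could be read by any tool with the standard-library tomllib:

    # Parse a *hypothetical* native-dependency table from pyproject.toml.
    # No such standard exists yet; this only shows what a PEP might enable.
    import tomllib    # standard library as of Python 3.11

    PYPROJECT = """
    [external]
    build-requires = ["c-compiler", "fortran-compiler", "blas", "lapack"]
    """

    config = tomllib.loads(PYPROJECT)
    needed = config["external"]["build-requires"]
    # A tool like pip could at least report these before starting a build:
    print("system dependencies required:", ", ".join(needed))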

Moore was certainly in favor of that approach. Being able to check in advance whether it will be possible to build something for a Python package would be useful so that tools can at least tell users what it is that they are missing. More capable tools may be able to actually go out and fetch the needed pieces; "even for pip, having better errors and not starting builds that are guaranteed to fail would be a great step forward".

After some more discussion on the need for additional metadata in order to support those kinds of changes, the conversation began to trail off—in that thread, anyway. At the end of November, the results of a survey of users about packaging were announced, which, perhaps unsurprisingly, resulted in more discussion, there and in a strategy discussion thread that was started shortly after the new year. Beyond that, several PEPs have been floating around for discussion, while yet another packaging tool and binary format was announced. It is, obviously, a wildly busy time in the packaging realm or, perhaps more accurately at this point: in the discussions about said realm.


Index entries for this article
Python/Packaging



Convergence in the pip and conda worlds?

Posted Feb 2, 2023 10:05 UTC (Thu) by kleptog (subscriber, #1183) [Link] (3 responses)

On the one hand we have:

> pip is one of the "batteries included" with Python so it is often seen as the "official" packaging solution, even though the PyPA does not necessarily see it that way.

Which surprised me. If pip isn't official then no Python packaging solution is official. Which is just bizarre. It certainly surprised everyone I suggested it to.

On the other hand we have:

> because pip is responsible for supporting all python environments – venv, distro, whatever – while conda is free to specialize

If pip is responsible for supporting all environments, then that would imply it's official. Otherwise this responsibility would not exist.

ISTM that the quickest win is to standardise a way for packages to declare external dependencies in normal python packages. Even if pip doesn't use it, it opens the way for other tools to use it.

Convergence in the pip and conda worlds?

Posted Feb 2, 2023 10:38 UTC (Thu) by dottedmag (subscriber, #18590) [Link] (1 responses)

The tricky part will be the specification.

Conda, Nix, Debian and other closed infrastructures can do it by controlling the whole set of packages.

However Python packages are exposed to the wild world, where even "I need a C compiler" won't do, as there are many C compilers with different supported C versions, bugs, incompatible extensions, CLI interfaces, target platforms, ideas about ABI etc.

Convergence in the pip and conda worlds?

Posted Feb 2, 2023 10:38 UTC (Thu) by dottedmag (subscriber, #18590) [Link]

infrastructures -> ecosystems

Convergence in the pip and conda worlds?

Posted Feb 9, 2023 15:29 UTC (Thu) by fung1 (subscriber, #144307) [Link]

The "Authority" in "Python Packaging Authority" (PyPA) was originally meant as a joke, highlighting the lack of control that group is effectively able to exert. It's gotten increasing recognition from the core CPython developers and SC since then, but at the end of the day it's still only a self-identifying collective of people developing interrelated packaging solutions for a large segment of the Python packaging ecosystem, and the opinions expressed by folks who count themselves as a part of that collective don't necessarily the reflect those of the CPython project, even if they're probably some of the most visible and easiest opinions to find sometimes referred to by official CPython documentation. They have, for example, fairly recently (in Python lifetime terms) gotten Pip included directly in builds of the standard library shipped as part of CPython, which has increased the perception of official status for it.

A big part of why Pip is not viewed as "the official solution" to package installation is that PyPA has been striving to reinvent packaging tools so that they're based on published standards and specifications rather than a de facto "whatever this tool does is the standard" approach. This means even something as seemingly ubiquitous as Pip is supposed to just be one possible implementation of those standards, in order to allow for fair competition from anyone else who wants to develop an interoperable replacement. Officially blessing one solution is viewed by many as favoritism, making it very hard if not nearly impossible for any alternative to gain sufficient mind-share. The initial standards are being derived from what these tools do in order to not cause them to suddenly be non-compliant, but with the idea that as people want to implement sweeping new features or scope changes in the tools, they need to get them reflected in reviewed and agreed-upon published standards first.

Pip's maintainers see it as being responsible for meeting many of these use cases, but that doesn't mean the responsibility is placed on it by the CPython project. Rather, it's a scope the maintainers have chosen to give it, mostly in order to maintain backwards compatibility for users of earlier versions and predecessors like easy_install. They technically also have the "freedom to specialize" but they prefer not to exercise it, as that would leave many current users of its more general approach in the lurch.

Convergence in the pip and conda worlds?

Posted Feb 2, 2023 10:45 UTC (Thu) by gerdesj (subscriber, #5446) [Link]

"I'm Brian, and so's my wife" ...

Convergence in the pip and conda worlds?

Posted Feb 3, 2023 2:05 UTC (Fri) by smitty_one_each (subscriber, #28989) [Link]

Hopefully 3.13 can be the version where Python makes a point of making packaging more pythonic.

Convergence in the pip and conda worlds?

Posted Feb 3, 2023 10:22 UTC (Fri) by cortana (subscriber, #24596) [Link] (2 responses)

The belief that pip is official is part of the problem

I'm just in despair at this point. The official Install Python Modules documentation says quite clearly:

pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.

I'm now becoming more and more inclined to disregard any recommendations from the PyPA. They burned all my goodwill with the Pipenv fiasco, and now this. These days I've settled on a combination of Poetry and micropipenv as the least annoying Python packaging tools, but I expect before too long, something else will force me to move on to newer tools with names generated by picking a random combination of terms from the list [py, dist, package, setup, env, utils, virt, build, virtual, wheel, v, pack, file, ...].

Convergence in the pip and conda worlds?

Posted Feb 3, 2023 20:15 UTC (Fri) by intelfx (subscriber, #130118) [Link] (1 responses)

> I'm now becoming more and more inclined to disregard any recommendations from PyPA, They burned all my goodwill with the Pipenv fiasco <...>

What was the "Pipenv fiasco"?

Convergence in the pip and conda worlds?

Posted Feb 6, 2023 0:56 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

With some Googling, I dug up these links, which I think might be relevant:

https://chriswarrick.com/blog/2018/07/17/pipenv-promises-...
https://old.reddit.com/r/Python/comments/8jd6aq/why_is_pi...

I cannot speak to the accuracy of anything said on either of those pages, because I have never used pipenv myself and did not follow this issue at the time. Note also that both links are several years old, and the situation may have changed since 2018.

Convergence in the pip and conda worlds?

Posted Feb 8, 2023 15:50 UTC (Wed) by qwertyface (subscriber, #84167) [Link]

Just to say, that pypackaging-native site is a fantastic exploration of the issues. None of this is simple!

Before reading it, I'd drafted a comment saying that conda absolutely shouldn't become a standard solution to distributing Python packages. It has most of the scope of a Linux distribution, but without the helpful restriction of being Linux only. I still mostly believe that, but now think something like it might always be necessary in some circumstances.

Convergence in the pip and conda worlds?

Posted Feb 9, 2023 14:56 UTC (Thu) by mboisson (guest, #163560) [Link]

I think that the things conda "blames" pip for are the same things that HPC cluster administrators blame conda for, if you replace "conda" with "system", i.e.:

* don’t vendor things available as <conda=>system> packages
* do include additional dependencies for those things
* link against the import libraries/headers/options used for the matching <conda=>system> builds of dependencies

On cluster environments, we actually ask our users to *not* use conda, in large part due to these reasons.
https://docs.alliancecan.ca/wiki/Anaconda/en

conda is more like yum/apt than it is like pip, and that does not play well on a cluster, but our users keep coming to us wanting to use conda instead of pip (and then we show them that pip works better on clusters).


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds