
Package management in Gentoo Linux

July 2, 2007

This article was contributed by Donnie Berkholz

Package management is one of the key defining characteristics of a distribution. The question of where package management is going should be of interest to anyone involved with a distribution or administering a Unix-based box of any sort. In many distributions, package management appears to have reached a near standstill. For example, the RPM format has hardly changed in years. In Gentoo, however, ongoing development of package management is so popular that three separate, actively developed package managers exist.

Over the past couple of years, many developers have grown increasingly dissatisfied with Gentoo's default package manager, Portage. Portage is a high-level interface to Gentoo's package format, a series of scripts called ebuilds. Unfortunately, Portage was never planned as a whole, and features have been added ad hoc over the course of many years. Today, it's extremely difficult to add features to Portage or to interface with it because of its complex interdependencies and virtually nonexistent API. Consequently, two groups of developers decided to start fresh with two separate projects: Paludis and Pkgcore.

Paludis is implemented in C++ and bash, with a C++ API and an optional Ruby scripting API. One of the biggest features that Portage lacks but Paludis supports is the ability to remove all unused dependencies of a package when removing that package. Also, it has a much more flexible configuration system, user-definable hooks into the build process, user-defined sets of packages, and clean support for multiple repositories. In Portage, secondary repositories (called "overlays") are second-class citizens. Furthermore, Paludis added a number of features Gentoo developers have been requesting for years that add flexibility to how dependencies can be specified. Paludis contains a number of modules, including:

  • paludis—package installation, removal, and queries
  • contrarius—a client for building cross-compiling toolchains
  • inquisitio—a package searching client
  • qualudis—a quality assurance tool for ebuilds
  • adjutrix—a tool for architecture teams
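
To make the unused-dependency removal mentioned above concrete, here is a minimal sketch in Python. It is not Paludis's actual algorithm or API: the function name, the plain dictionary of dependencies, and the set of explicitly requested packages are all assumptions made for illustration. The idea is simply that anything no longer reachable from a package the user still wants can be removed along with the target.

```python
from typing import Dict, Set


def removable_with(target: str,
                   deps: Dict[str, Set[str]],
                   explicit: Set[str]) -> Set[str]:
    """Return the installed packages that become unneeded once `target` goes.

    deps     -- hypothetical map: installed package -> its direct dependencies
    explicit -- hypothetical set of packages the user asked for directly
    """
    # Every explicitly requested package except the target must survive.
    roots = explicit - {target}

    # Walk the dependency graph from the surviving roots.
    needed: Set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(deps.get(pkg, set()))

    # Anything installed but unreachable from a root is safe to drop.
    # If another root still depends on the target, the target stays too.
    return set(deps) - needed


installed = {
    "editor": {"libgui", "libc"},
    "libgui": {"libc"},
    "browser": {"libc"},
    "libc": set(),
}
print(sorted(removable_with("editor", installed, {"editor", "browser"})))
```

In this toy data set, removing "editor" also frees "libgui", while "libc" stays because "browser" still needs it.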

Paludis includes experimental Portage support as of the end of March. This means you can try it out without wasting time migrating config files over, which significantly lowers its barrier to adoption.

Pkgcore is implemented in Python, the same language as Portage, with a few time-critical modules in C. It was designed so that there's no reason it has to be Gentoo-specific—it could easily support other package formats. Its philosophy is to maintain complete backwards compatibility with Portage while recoding it in a clean, maintainable, extensible fashion. Some of the code written for Pkgcore has been pulled back into Portage, such as the cache-handling code. Its 0.3 release finally reached a point of usability because it added frontends with comprehensible output—one that mirrors Portage and another that mirrors Paludis. Despite being in Python, it runs shockingly fast—it is a good example that not all programs written in high-level languages need be slow. The Pkgcore API is also viewable online. Some of the utilities Pkgcore includes are:

  • pmerge—package merging and unmerging
  • pmaint—repository maintenance: syncing, etc.
  • pquery—package searching
  • pcheck—QA checker for ebuilds

A couple of interesting features Pkgcore has are N-parent inheritance of eclasses (a Portage feature that allows inheritance to be used in bash code) and an ebuild daemon. The daemon has a number of benefits including near-linear scaling to multiple processors for some tasks—Pkgcore's home page cites ~90% scaling on a quad Pentium 3. And of course, one benefit over Paludis is that you don't need to use the occasionally less-than-speedy g++ to compile it.

Pkgcore and Paludis seem fairly well-matched in the features department. They both support sets, the additional dependency flexibility, integrated checking for security vulnerabilities, and Portage's on-disk format. Another useful feature they both support is the ability to restrict which packages may be installed based on their licenses. This lets users choose how free they want their installations to be, from FSF-compliant to packed with proprietary software. Both projects have active teams of between 5 and 10 developers each. In comparison, Portage is primarily maintained by potential masochist Zac Medico; a glance through the ChangeLog showed that he was the only committer since January.
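
The license restriction described above boils down to a set comparison. The following Python sketch shows the idea only; it does not reflect how either Pkgcore or Paludis is configured or implemented. The function name, the package names, and the license labels are hypothetical, and it assumes the simplest possible model in which a package lists licenses that must all be acceptable to the user.

```python
from typing import Dict, List, Set


def installable(packages: Dict[str, Set[str]], accepted: Set[str]) -> List[str]:
    """Return the names of packages whose every license is in `accepted`."""
    return sorted(
        name
        for name, licenses in packages.items()
        if licenses <= accepted  # subset test: no unaccepted license remains
    )


# Hypothetical package -> licenses data, for illustration only.
candidates = {
    "app-a": {"GPL-2"},
    "app-b": {"GPL-2", "Proprietary-X"},
    "app-c": {"BSD"},
}
print(installable(candidates, accepted={"GPL-2", "BSD"}))
```

Here "app-b" is filtered out because one of its licenses is not in the accepted set. A real package manager would also have to handle packages that offer a choice between licenses, which this sketch ignores.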

The advent of multiple package managers accelerated Gentoo's need to adopt a formal Package Manager Specification. In the past, new features or breaks in backwards compatibility in Portage simply forced a wait of roughly 6 months, at which point it was assumed that nobody was using those old Portage versions anymore. Problems with that should be readily apparent. When new package managers came along, additional questions arose about which aspects of ebuild behavior were intrinsic and which were Portage-specific details. With only one implementation and no spec, it's hard to draw a line.

Together, these two developments motivated the creation of an Ebuild API, or EAPI. The current generation will be EAPI=0, which is being documented in a formal specification. Once this spec is done, Gentoo will have a process in place for handling ebuilds that use new features and for handling breaks in compatibility: each ebuild will declare the EAPI it supports. This will enable near-instant use of new features that Gentoo developers have been awaiting for years. It will also establish agreement on where all these package managers must behave identically and where they are free to differ.
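
As a rough illustration of how a per-ebuild EAPI declaration could gate compatibility, consider the Python sketch below. It is not taken from Portage, Paludis, or Pkgcore. The helper names are hypothetical, the line-based parsing is a deliberate simplification of what a real package manager would do, and treating an ebuild with no declaration as EAPI 0 is an assumption made here for the example.

```python
# Hypothetical: the set of EAPIs this imaginary package manager understands.
SUPPORTED_EAPIS = {"0"}


def declared_eapi(ebuild_text: str) -> str:
    """Return the EAPI an ebuild declares, or "0" if it declares none."""
    for line in ebuild_text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        if stripped.startswith("EAPI="):
            # Accept EAPI=1, EAPI="1" and EAPI='1' alike.
            return stripped[len("EAPI="):].strip("\"'")
    return "0"  # assumption: no declaration means the original behavior


def can_handle(ebuild_text: str) -> bool:
    """True if this manager may safely process the ebuild."""
    return declared_eapi(ebuild_text) in SUPPORTED_EAPIS


old_style = 'DESCRIPTION="demo package"\n'
new_style = 'EAPI="1"\nDESCRIPTION="demo package"\n'
print(can_handle(old_style), can_handle(new_style))
```

The point of the design is that a manager that does not recognize an ebuild's declared EAPI can skip that ebuild cleanly instead of misinterpreting it, so new features no longer require waiting for old package manager versions to fall out of use.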



Package management in Gentoo Linux

Posted Jul 5, 2007 12:27 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

Great article.
Venerable though portage may be, the nail in its coffin is going to be benchmarking. As a paludis user, I can tell you that dependency resolution is _a lot_ faster with McCreesh's solution than with portage, and I won't hazard a guess about accuracy.

How painful is the migration?

Posted Jul 5, 2007 18:33 UTC (Thu) by felixfix (subscriber, #242) [Link]

I have just started an emerge of paludis and will read up on it, but maybe you can answer two quickie questions?

How painful is switching to Paludis?

Is it possible to run Paludis in parallel with portage, and what is the pain level?

How painful is the migration?

Posted Jul 5, 2007 20:01 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

There is a shell script that worked great for me going portage->paludis on the paludis.pioto.org site.
I have no experience running the two in parallel, but I'll venture that you'll move away from emerge rather speedily. ;)
After switching to paludis, I installed layman, and used it to kick-start some of the overlays.gentoo.org stuff. Paludis expects more discipline, apparently, and there was some manual effort to configure, say, the emacs overlay. Had to create several files and directories to silence the error traces.
I think there is some provision for using the old-style emerge configuration files, but paludis again does a better job of herding the cats into a few coherent files.
Hats off to emerge. It had a great run.

How painful is the migration?

Posted Jul 6, 2007 0:43 UTC (Fri) by dberkholz (subscriber, #23346) [Link]

I mentioned that the latest paludis releases have compatibility with Portage config files. Try USE=portage emerge paludis, and see how things go. It should pretty much work.

Package management in Gentoo Linux

Posted Jul 6, 2007 16:39 UTC (Fri) by cventers (guest, #31465) [Link]

What I think would be really neat is if these package managers had the
following:

1. Dynamic library tracking, because revdep-rebuild SUCKS! The package
manager knows what binaries it is installing into the live system, it
should be able to 'ldd' and remember from then on. The hope, of course,
is that the package manager would be smart enough to recompile any
packages dependent on the library.

2. Ebuild cache... periodically, Gentoo deletes ebuilds from the portage
tree. The problem is that you may have an old version of some package
that Gentoo no longer supports which depends on a library you want to
upgrade. If you still have the source, Gentoo should happily rebuild the
old (no longer supported) version of the software against the new
library.

The lack of this ability leads to occasional frustration when you have to
upgrade a library due to a security vulnerability, only to discover that
you now have to upgrade other packages just because Gentoo deleted the
ebuild for the version you were using.

3. Transactional upgrades. If you want to upgrade a slew of software,
merge all the files into a temporary holding directory and wait until all
packages and their dependencies have successfully compiled before
updating the live system. Having to chase down build failures in the
middle of an "emerge", when your system is currently in a broken state,
is irritating.

4. A better etc-update. The one that is included should be taken out back
and shot :P

Gentoo is great, but in some ways I feel like it is just the tallest
midget. I really wish I had the time to help on the code, because I feel
that these features would greatly enhance the OS. A guy can dream, can't
he?

Package management in Gentoo Linux

Posted Jul 6, 2007 19:56 UTC (Fri) by dberkholz (subscriber, #23346) [Link]

Great suggestions!

- It needs to do more than just ldd, so it can handle all types of languages. For example, Perl or Python modules need to get handled a bit smarter. Various people have worked a little on this problem, but nobody in Gentoo has done a good job of finishing it.

- The ebuild cache as it exists now is a little subpar. You've got the current ebuild in /var/db/pkg/, or you can look in the CVS Attic via your anoncvs checkout or http://sources.gentoo.org/.

- I really like the transactional idea.

- Some other possibilities do exist for updating your config files such as dispatch-conf, conf-update, cfg-update (all of which are part of Portage itself or in the main Portage tree) or the new etc-proposals (sunrise overlay). Try 'em out.

Package management in Gentoo Linux

Posted Jul 8, 2007 20:29 UTC (Sun) by dirtyepic (subscriber, #30178) [Link]

> 3. Transactional upgrades. If you want to upgrade a slew of software, merge all the files into a temporary holding directory and wait until all packages and their dependencies have successfully compiled before updating the live system. Having to chase down build failures in the middle of an "emerge", when your system is currently in a broken state, is irritating.

Interesting idea, but I'm not sure how that would work or why introducing massive changes to the system all at once rather than incrementally would help anything. Would you link to the system libraries or to the ones you've just built in the holding area? What happens when those libraries are suddenly relocated or overwritten?

The best way to handle updating is one package at a time. If something breaks, then you only have to deal with that package. Blindly running emerge world is usually what gets people into trouble in the first place.

> 4. A better etc-update. The one that is included should be taken out back and shot :P

I don't honestly know why it's still around and the default. dispatch-conf forever.

Package management in Gentoo Linux

Posted Jul 10, 2007 16:00 UTC (Tue) by rise (guest, #5045) [Link]

If you're not using static binaries anywhere in the process (a big if) a CheckInstall/installwatch-style solution might work. Basically it uses a library preload to catch all file accesses and redirect them to a temporary area while overlaying the results over the actual filesystem. Then it bundles up all the changes it saw into a package. This is a nice but sub-optimal solution for software lacking true packages, though I use it heavily to make random source-compiled software trackable and uninstallable. However it should work nicely for transactions - just delay committing the overlay until the process completes properly.

AUFS for Transactional Upgrades

Posted Jul 12, 2007 20:19 UTC (Thu) by hathawsh (guest, #11289) [Link]

For near-transactional upgrades, consider an aufs-based chroot. (Note that this idea applies equally well to any package manager.) Here is someone who tried it:

http://blog.vrplumber.com/1889

AUFS for Transactional Upgrades

Posted Jul 19, 2007 15:48 UTC (Thu) by ferringb (guest, #20752) [Link]

Actually I tried something similar a while back: unionfs sandboxing of the phases, to try to get the ability to truly track/reverse what pre_inst/pre_rm were up to, and to track ebuild builds where userpriv restrictions were in effect. The problem I had was that it always wound up making gcc spew in a non-obvious way during compilation.

Either way, interesting to see someone playing with it still (nature of some of the phases, it's kind of required long term imo).

Package management in Gentoo Linux

Posted Jul 27, 2007 11:45 UTC (Fri) by fintan (guest, #46464) [Link]

I think conary is more along the right approach. http://wiki.rpath.com/wiki/Conary

Package management in Gentoo Linux

Posted Jul 13, 2007 14:55 UTC (Fri) by muwlgr (guest, #35359) [Link]

That very Paludis that pushed DRobbins back out of Gentoo:

http://www.google.com/search?q=robbins+gentoo+paludis
http://lwn.net/Articles/224082/
http://lwn.net/Articles/224615/

Package management in Gentoo Linux

Posted Jul 14, 2007 9:10 UTC (Sat) by tres (guest, #352) [Link]

Not as much as Paludis' main developer, Ciaranm, did. That guy has somewhat of an abrasive personality (to say the least).

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds