It's been three years since LWN last covered Gentoo Linux, so checking in on Gentoo's activities since then seems appropriate. Let's start with a re-introduction to Gentoo. Gentoo is a source-based distribution that is unlike the more common binary distributions because packages are compiled on your machines rather than remotely on the distribution's infrastructure. Source-based distributions allow for far more customization than is possible with binary distributions because you can not only control which packages are installed, but also which features of a given package are enabled (and consequently how many dependencies get pulled in).
This leads to compelling advantages for a number of use cases, although Gentoo isn't suitable for everyone. For example, Gentoo is superb on developer workstations because you get a proven-working toolchain and all development packages (headers and so on) by default, as well as good support for building live packages directly from Git/Subversion/CVS/etc. It also stands out for use in embedded or other minimal configurations, systems that need every last drop of performance (since you can control the compilation flags for every package), and other places requiring significant customization. Furthermore, Gentoo's "teach a man to fish" philosophy makes it an excellent distribution for learning more about how Linux works; even the installation process is performed by hand, by following an extensive handbook that explains every concept along the way.
Since mid-2008, Gentoo's made a number of improvements to its packaging format and release schedule, and its development community has mostly held steady in terms of both developers and code committed. There haven't been any drastic changes, but in a mature project with roughly 175 active developers and around 375 irregular contributors, sheer inertia means you shouldn't expect many. Let's take a look at where Gentoo stands today, some of the biggest changes over the past few years, and what looming issues it still faces.
The best way to get an idea of where Gentoo's developer community and code stand today is to check the numbers rather than rely on subjective opinion. Ohloh helpfully provides such statistics, so we'll rely on it for this analysis. An overall picture of Gentoo's lines of code since its origins over a decade ago is shown in the graph at right.
What's immediately visible is steady growth from 2002-2005, a meteoric rise in 2006, and another period of steady growth from 2007 on. We're going to focus on recent years in this article, since LWN has already analyzed Gentoo's previous history. There are two main features worth noting in this graph. First, the slope of the 2002-2005 period is much steeper than that of the 2007-2011 period. All the hype around Gentoo during that earlier period coincided with a higher rate of development, whereas today's development is slower-paced. Second, from 2007 to 2011, the slope appears to be gradually trailing off. The apparent increase at the very end is likely an artifact of additional repositories being registered with Ohloh rather than a sudden increase in code production. Gentoo's codebase is growing more slowly, suggesting a drop in the size of our community or its productivity.
Now, let's take a look at how many contributors Gentoo has and how that's changed over time, as shown in the graph at right. (Note: Ohloh defines contributors as people who committed during a 30-day period.) The data closely matches the first graph, with a peak near 250 developers around 2006 followed by a steep drop to around 200 in 2008 and then a gradual decline to 175 today. What could have caused the sharp drop from 2006 to 2008? Ubuntu was announced much earlier, in late 2004, and quickly became the new "hot" distribution, so the timing is off for this to be the cause.
I suspect the community-related problems Gentoo battled in this time frame (culminating in the forced removal of three developers for abusive behavior in early 2008) demotivated existing contributors and scared away potential new contributors. Gentoo's gradual decline today suggests that its reputation and community never fully recovered from that crisis. Although there is a general perception within the development community that some important things aren't being done, nobody has previously pointed out the quantitative drop in contributors or its potential connection to those issues. My hope is that exposing this gradual but very real decline will spur efforts to address Gentoo's most visible and damaging problems.
Now that we've looked at the health of the project in terms of code and contributors, we'll more closely examine the specific improvements the project has made. Since it's a distribution, it's no surprise that those improvements primarily involve packaging.
To understand much of Gentoo's progress over the past few years, you'll need a basic understanding of Gentoo's packages — called ebuilds. They are essentially bash scripts with a number of helper functions and a primitive form of inheritance. Ebuilds build packages by stepping through a series of bash functions called phases. Each phase corresponds to a conceptual step like unpacking the tarball, patching the code, configuration, compilation, or installation. The key difference from the RPM or deb packages used in binary distributions is that ebuilds must allow for flexibility regarding how the package is built, so they're full of conditionals about how to configure, build, and install specific features.
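As a rough illustration of such conditionals, a pre-EAPI=2 ebuild might implement its compile phase along the following lines. This is only a sketch: the `ssl` USE flag and the `--with-openssl` option are hypothetical, and `use`, `econf`, `emake`, and `die` are helpers supplied by the package manager, so the fragment only does real work when run under it.

```bash
# Fragment of a hypothetical ebuild; not a complete package.
IUSE="ssl"

src_compile() {
    local myconf=""
    # Only pass the (hypothetical) configure option when the user
    # has enabled the "ssl" USE flag for this package.
    if use ssl; then
        myconf="--with-openssl"
    fi
    # Older ebuilds check each helper explicitly and abort on failure.
    econf ${myconf} || die "configure failed"
    emake || die "make failed"
}
```

The same source package can thus be built with or without the feature, depending entirely on the flags chosen on the machine doing the build.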
Gentoo's packaging format, the ebuild, is explicitly versioned to allow for improvements to the format using an Ebuild API (EAPI). Unlike most other distributions, these improvements occur in Gentoo on a fairly regular basis: roughly once a year there's a new EAPI. In late 2008, Gentoo's governing council approved EAPI=2, which contained a significant series of changes to the ebuild format; I'll describe a few of the most important.
First, it added default implementations for ebuild phases. Previously, we had to re-implement all the default code for a phase if it required modification at all. For example, to install one additional documentation file, we had to rewrite the default code that runs make install with a series of Gentoo-specific arguments; now, we could instead call a function named default to run that code, then just install the docs.
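The same pattern applies to any phase that has a default implementation. Here is a minimal sketch for the compile phase, assuming a package whose Makefile has an extra `doc` target (a hypothetical name chosen for illustration); `default` and `emake` are provided by the package manager at build time.

```bash
# Fragment of a hypothetical EAPI=2 ebuild; not a complete package.
EAPI=2

src_compile() {
    # Run the package manager's default code for this phase...
    default
    # ...then do the one extra thing this package needs.
    # "doc" is a made-up Makefile target used only in this sketch.
    emake doc
}
```

Without `default`, the function body would have to repeat whatever the default compile step does before adding the extra target.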
Second, EAPI=2 provided finer-grained control over different steps of the build process. It added two new phases specifically for preparing unpacked code for a build (e.g. applying patches) and configuring the code (running configure or its equivalent). This allowed for shorter, more maintainable ebuilds because more of the code for unpacking and building can fall back to the default implementations. The final important feature of EAPI=2 is the ability to require that specific features be built into dependencies (a.k.a. USE dependencies). Gentoo's USE flags generally correspond to --enable-foo in a configure script or its equivalent, and packages higher in the stack often require that ones lower in the stack be built with or without certain features.
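To make these features concrete, here is a sketch of how the two new phases and a USE dependency might look together in one ebuild. The package `dev-libs/libfoo`, the `ssl` flag, and the patch file name are all hypothetical. `epatch` comes from the eutils eclass, which a real ebuild would need to inherit, and `econf` and `use_enable` are supplied by the package manager.

```bash
# Fragment of a hypothetical EAPI=2 ebuild; not a complete package.
EAPI=2

IUSE="ssl"

# USE dependency: when our own "ssl" flag is on, require that
# dev-libs/libfoo (a made-up package) was itself built with "ssl".
RDEPEND="ssl? ( dev-libs/libfoo[ssl] )"
DEPEND="${RDEPEND}"

src_prepare() {
    # Prepare the unpacked sources, here by applying one patch.
    # A real ebuild would inherit the eutils eclass to get epatch.
    epatch "${FILESDIR}/${P}-fix-build.patch"
}

src_configure() {
    # use_enable expands to --enable-ssl or --disable-ssl
    # depending on the state of the flag.
    econf $(use_enable ssl)
}
```

Unpacking and compilation are not mentioned at all, so both fall back to their default implementations.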
The next major improvement in packaging was EAPI=4, which came with a number of changes, and I will highlight a few of the most important. First was a new, very early phase called pkg_pretend to perform checks during the dependency calculation. This allows developers to perform particular checks before starting the build process, so an extended build of many packages won't die in the middle because the correct kernel options aren't enabled, for example.
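A kernel-option check of that sort could be sketched roughly as follows. `KCONFIG_FILE` is a hypothetical variable invented for this example, as is the choice of `CONFIG_FUSE_FS`; real ebuilds usually lean on the linux-info eclass for this rather than grepping the configuration by hand. `die` is provided by the package manager.

```bash
# Fragment of a hypothetical EAPI=4 ebuild; not a complete package.
EAPI=4

pkg_pretend() {
    # KCONFIG_FILE is a made-up override used only in this sketch;
    # it defaults to the conventional kernel source location.
    if ! grep -q '^CONFIG_FUSE_FS=[ym]' \
            "${KCONFIG_FILE:-/usr/src/linux/.config}" 2>/dev/null; then
        # Abort before any package in the batch has started building.
        die "CONFIG_FUSE_FS must be enabled in the kernel configuration"
    fi
}
```

Because the check runs up front, a missing option is reported within seconds instead of hours into a long build.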
Second, EAPI=4 improved error handling by forcing all ebuild utilities to die on failure by default. This shortens error-handling code because typically the failure of any ebuild command during the build should result in failure for the entire package. Third, ebuilds could indicate whether they had interdependencies among various features they support. Often, more complex packages will have various features that are dependent upon other features within the same package; to enable Y, you must also enable X. A new variable called REQUIRED_USE allowed developers to set dependencies and also indicate conflicting features.
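As a sketch of what these last two changes look like in practice, consider an ebuild with a handful of flags. The flag names and the documentation file are hypothetical; `default` and `dodoc` are supplied by the package manager.

```bash
# Fragment of a hypothetical EAPI=4 ebuild; not a complete package.
EAPI=4

IUSE="gtk qt4 ssl gnutls"

# Feature interdependencies within this one package:
#   - gnutls can only be enabled together with ssl
#   - gtk and qt4 conflict with each other
REQUIRED_USE="
    gnutls? ( ssl )
    gtk? ( !qt4 )
"

src_install() {
    default
    # Under EAPI=4, a failing helper aborts the build by itself,
    # so no explicit "|| die" is needed after dodoc.
    # docs/HACKING is a made-up file name for illustration.
    dodoc docs/HACKING
}
```

An invalid flag combination is then rejected during dependency calculation rather than surfacing as a confusing configure error midway through the build.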
If you follow Gentoo, you may have noticed the conspicuous lack of any announcements regarding security updates (a.k.a. GLSAs) since January of this year, and sparse announcements since late 2009. This doesn't mean that security updates do not occur; package maintainers, who comprise the majority of Gentoo developers, continue to quickly add patches and new releases for security fixes. It does mean, however, that the security team is heavily understaffed and cannot keep up with the pace of security holes in a distribution's worth of software. Many of the people who care deeply about security updates maintain servers, and often are paid to do so. If they desire GLSAs, perhaps they could contribute to creating them; a modest effort from enough of these people could help to revive these security announcements.
Gentoo follows a rolling-release model, with constant updates to individual packages showing up hourly, 24 hours a day. Previously, it made releases semi-annually by taking snapshots of its package database, performing lots of QA on them, and creating LiveCDs — a process that required intensive manual effort. Gentoo then moved to a "rolling release" strategy for its releases by creating weekly automatic builds rather than formal releases. This was a big win in terms of reducing developer effort but came with an unexpected loss of PR for Gentoo.
When coupled with the current lack of a weekly or monthly newsletter, Gentoo has nearly disappeared from news sites. It turns out that official releases drive news articles; without a major reason to write about an open-source project, like a release announcement, news sites often ignore it. For that reason, as well as users clamoring for full-featured LiveDVDs with pretty artwork, Gentoo again started producing DVD releases, with the most recent being 11.2 in August.
Gentoo seems to have the core aspects right: code and community. But all the peripheral components necessary to a thriving open-source project, at least one of this size, have been lacking in recent years. The weekly newsletter lost its contributors (it first went monthly, then "on hiatus"); Gentoo's previously award-winning documentation has begun to get stale because too few documentation contributors remain to update it for recent changes; and the release engineering and security teams have suffered the same shortage.
Major years-long initiatives like a migration to Git and a redesigned website have largely stalled, again because the people involved don't have enough time to work on them. Although some of these aspects have improved very recently (real releases again, and former documentation lead Sven Vermeulen just returned to Gentoo), others remain an open question. It seems that the shrinkage in the developer community has affected some of the most important contributors, resulting in a major hit to the distribution that it's still working to recover from.
First, there's the expected: Gentoo will continue to improve upon its ebuild format with new EAPIs. The Google Summer of Code program has brought some welcome new blood into the project, with around two-thirds of the roughly 15 internship students each year becoming Gentoo developers. Work is ongoing to enable integration with new technologies like systemd, although it's unclear at this point whether it will replace Gentoo's custom init system (OpenRC) or become yet another option. Gentoo is about providing choice wherever it makes sense rather than enforcing its own choices upon its users, so this same idea of choosing between alternatives also applies to GNOME 3 vs KDE, and if anyone makes Unity integrate with Gentoo, that will become an option too.
As is unfortunately far too common in open-source projects, progress can sometimes be slow due to lack of volunteer time, especially on larger or complex issues. Some of them have been dragging on for years now, like a migration of Gentoo's main repository to Git from CVS, which is both a large and complex issue. To date, sample conversions exist (such as the one Ohloh uses for statistics), and a scheme was developed in collaboration with upstream Git developers to individually sign every commit for improved security. A tracker bug and mailing list exist for anyone interested in following (or even better, helping with) the work on the Git conversion.
Another longstanding question that's constantly discussed but rarely acted upon is how to fix the problems with Gentoo's organizational structure. The current model is of a seven-member council that is entirely up for re-election every year. In addition, there is a nonprofit foundation that controls Gentoo's finances, copyrights and trademarks, and hardware, with its own independent board of trustees. The members-at-large model of the council (rather than members being in charge of specific areas) means that no progress can happen on any global Gentoo issue without a majority vote of the seven members, and this can take months. A number of ideas have been floated, like shrinking the council, returning to the previous model of a benevolent dictator, or installing the foundation trustees as a corporate board that would appoint a project leader, similar to a CEO. The goal should be whatever allows Gentoo to make faster progress; my entirely biased opinion is that open-source projects exist to accomplish a purpose and should focus on that, rather than attempting to be a democratic government where everyone is equal.
Gentoo's largest problems, however, come down to a single core issue: not enough people are working in the areas outside of development, perhaps because they're under-appreciated — security, release engineering, newsletters, and documentation, to name a few. If Gentoo can focus on finding and retaining contributors there, perhaps by applying lessons from its involvement in the Google Summer of Code, it could improve its reputation and increase its publicity. That could well bring in the contributors to rejuvenate what has become a somewhat sluggish open-source project.
Page editor: Rebecca Sobol
Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds