September 14, 2011
This article was contributed by Donnie Berkholz
It's been three years since
LWN last covered Gentoo Linux, so
checking in on Gentoo's activities since then seems appropriate. Let's start
with a re-introduction to Gentoo. Gentoo is a source-based distribution
that is
unlike the more common binary distributions because packages are compiled
on your machines rather than remotely on the distribution's
infrastructure. Source-based distributions allow for far more customization
than is possible with binary distributions because you can not only control
which packages are installed, but also which features of a given package are
enabled (and consequently how many dependencies get pulled in).
This leads to compelling advantages for a number of use cases, although
Gentoo isn't suitable for everyone. For example, Gentoo is superb on developer workstations
because you get a proven-working toolchain and all development packages
(headers and so on) by default, as well as good support for building live
packages directly from Git/Subversion/CVS/etc. It also stands out for use in
embedded
or other minimal configurations, systems that need every last drop of
performance (since you can control the compilation flags for every package),
and other places requiring significant customization. Furthermore, Gentoo's
"teach a man to fish" philosophy makes it an excellent distribution for
learning more about how Linux works; even the installation process is
performed by hand, by following an extensive
handbook that explains every concept along the way.
Since mid-2008, Gentoo has made a number of improvements to its packaging
format and release schedule, and its development community has mostly held
steady in terms of both developers and code committed. There haven't been
any drastic changes, but in a mature project with roughly 175 active developers and
around 375 irregular contributors, sheer inertia means you shouldn't expect
many. Let's take a look at where Gentoo stands today, some of the biggest
changes over the past few years, and what looming issues it still faces.
Gentoo by the numbers
The best way to get an idea of where Gentoo's developer community and code
stand today is to check the numbers rather than relying on subjective
opinion. Ohloh helpfully provides such statistics, so we'll rely on it for
this analysis. An overall picture of Gentoo's lines of code since
its origins over a decade ago is shown in the graph at right.
What's immediately visible is a steady growth from 2002-2005, a meteoric
rise in 2006, and another steady growth from 2007 on. We're going to focus
on recent years in this article, since LWN has already analyzed Gentoo's
previous history. There are two main features worth noting in this
graph. First, the slope of the 2002-2005 period is much steeper than that of the
2007-2011 period. The hype around Gentoo during those early years coincided
with a higher rate of development, whereas today's development
is slower-paced. Second, from 2007 to 2011, the slope
appears to be gradually trailing off.
The apparent increase at the very end is likely an artifact due to
additional repositories being registered with Ohloh rather than a sudden
increase in code production.
Gentoo's codebase is growing more
slowly, suggesting a drop in the size of our community or its
productivity.
Now, let's take a look at how many contributors Gentoo has,
and how that's changed over time (Note: Ohloh defines contributors as people
who committed during a 30-day period), which is shown in the graph at right.
The data closely matches the first graph, with a peak near 250 developers
around 2006 followed by a steep drop to around 200 in 2008 and then a gradual decline
to 175 today. What could have caused the sharp drop from 2006 to 2008?
Ubuntu was announced much earlier, in late 2004, and it then became the new
"hot" distribution, so the timing is off for this to be the cause.
I suspect
the community-related problems Gentoo battled in this time frame (culminating
in the forced removal of three developers for abusive behavior in early 2008)
demotivated existing contributors and scared away potential new
contributors. Gentoo's gradual decline today suggests that its reputation
and community never fully recovered from that crisis. Although there is a
general perception within the development community
that some important things aren't being done, nobody has previously
pointed out the quantitative drop in contributors or its potential
connection to those issues. My hope is that exposing this gradual but
very real decline will spur efforts to address Gentoo's most visible and
damaging problems.
Now that we've looked at the health of the project in terms of code and
contributors, we'll more closely examine the
specific improvements the project has made. Since it's a distribution,
it's no surprise that those improvements primarily involve packaging.
Updates to the ebuild packaging format
To understand much of Gentoo's progress over the past few years, you'll need
a basic understanding of Gentoo's packages — called ebuilds. They
are essentially bash scripts with a number of helper functions and a
primitive form of inheritance. Ebuilds build packages by stepping through a
series of bash functions called phases. Each phase corresponds to a
conceptual step like unpacking the tarball, patching the code,
configuration, compilation, or installation. The key difference from the RPM
or deb packages used in binary distributions is that ebuilds must allow for
flexibility regarding how the package is built, so they're full of
conditionals about how to configure, build, and install specific features.
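To make the idea of phases and feature conditionals concrete, here is a toy shell model. It is not a real ebuild and not the package manager's code: the flag names, the `ENABLED_USE` variable, and the phase bodies are assumptions made up for illustration, and the `use` and `use_enable` functions are simplified stand-ins for the helpers of the same names.

```bash
# Toy model only: in a real ebuild the package manager supplies helpers
# such as `use` and calls the phases itself. Everything here is a stand-in.

ENABLED_USE="ssl"   # hypothetical: features the user switched on

# Stand-in for the `use` helper: succeeds if the named flag is enabled.
use() {
    case " ${ENABLED_USE} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Stand-in for `use_enable`: prints --enable-FLAG or --disable-FLAG.
use_enable() {
    if use "$1"; then echo "--enable-$1"; else echo "--disable-$1"; fi
}

# Each phase is an ordinary shell function; the bodies only describe the step.
src_unpack()  { echo "unpack: extract the source tarball"; }
src_compile() { echo "compile: ./configure $(use_enable ssl) $(use_enable gtk) && make"; }
src_install() { echo "install: make install into a staging directory"; }

# The package manager steps through the phases in a fixed order.
PHASE_OUTPUT=$(
    for phase in src_unpack src_compile src_install; do
        "${phase}"
    done
)
echo "${PHASE_OUTPUT}"
```

Changing `ENABLED_USE` changes the configure arguments without touching the phase functions, which is the kind of flexibility a binary package does not need to express.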
Gentoo's packaging format, the ebuild, is explicitly versioned to allow for
improvements to the format using an Ebuild API (EAPI). Unlike most
other distributions, these improvements occur in Gentoo on a fairly regular
basis: roughly once a year there's a new EAPI. In late 2008, Gentoo's
governing council
approved EAPI=2, which contained a significant series of
changes to the ebuild format, of which I'll describe a few of the most
important examples.
First, it added default implementations for ebuild
phases. Previously, we had to re-implement all the default code for a phase
if it required modification at all. For example, to install one additional
documentation file, we had to rewrite the default code that runs make install with a series of Gentoo-specific arguments; now, we could
instead call a function named default to run that code, then just
install the docs.
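The following toy sketch shows the shape of that change. It assumes a simplified dispatcher: the `CURRENT_PHASE` variable, the `record` helper, and the body of `default_src_install` are placeholders invented for illustration, and the file name `README.extra` is hypothetical. Only the pattern of calling `default` and then adding one step reflects the text above.

```bash
# Toy model of the `default` helper; not the package manager's code.
CURRENT_PHASE=""
ACTIONS=""

# Append a description of a step to ACTIONS, separated by " | ".
record() { ACTIONS="${ACTIONS}${ACTIONS:+ | }$1"; }

# Placeholder for the package manager's built-in install step.
default_src_install() { record "default install step"; }

# `default` runs the built-in implementation of whichever phase is active.
default() { "default_${CURRENT_PHASE}"; }

# What an ebuild author writes: reuse the default, then add one extra step.
src_install() {
    default
    record "dodoc README.extra"   # hypothetical extra documentation file
}

CURRENT_PHASE="src_install"
src_install
echo "${ACTIONS}"
```

The ebuild's own `src_install` shrinks to two lines, and any later improvement to the built-in step is picked up automatically.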
Second, EAPI=2 provided finer-grained control over
different steps of the build process. It added two new phases specifically
for preparing unpacked code for a build (e.g. applying patches) and
configuring the code (running configure or its equivalent). This allowed for
shorter, more maintainable ebuilds because more of the code for unpacking
and building can fall back to the default implementations. The final
important feature of EAPI=2 is the ability to require that specific features
be built into dependencies (a.k.a. USE dependencies). Gentoo's USE flags
generally correspond to --enable-foo in a configure script or its
equivalent, and packages higher in the stack often require that ones lower
in the stack be built with or without certain features.
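A short fragment in ebuild syntax may help picture these three features together. The package names, the flags, and the `sed` edit are hypothetical; helpers such as `econf`, `use_enable`, and `die` are provided by the package manager, so the phase functions are only meaningful inside a real build.

```bash
# Fragment in ebuild syntax; the package names, flags, and sed edit are
# hypothetical. Helpers such as econf, use_enable, and die come from the
# package manager, so the phases below only run inside a real build.
EAPI=2

# USE dependencies: require libexample built with "ssl" enabled and
# otherlib built with "gtk" disabled.
DEPEND="dev-libs/libexample[ssl]
	x11-libs/otherlib[-gtk]"
RDEPEND="${DEPEND}"

# New in EAPI=2: adjust the unpacked source before the build.
src_prepare() {
    sed -i -e 's:/usr/local:/usr:' Makefile.in || die "sed failed"
}

# New in EAPI=2: configuration split out from compilation.
src_configure() {
    econf $(use_enable ssl)
}
```

Because unpacking and compilation are not defined here, they fall back to the default implementations, which is where the savings in ebuild length come from.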
The next major improvement in packaging was EAPI=4,
which came with a number of changes, and I will highlight a few of the
most important. First was
a new, very early phase called pkg_pretend to perform checks during
the dependency calculation. This allows developers to perform particular
checks before starting the build process, so an extended build of many
packages won't die in the middle because the correct kernel options aren't
enabled, for example.
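The benefit is easiest to see in a toy model of a build queue. In a real ebuild the phase is simply named pkg_pretend; the per-package function names, the `KERNEL_CONFIG` string, and the `kernel_has` helper below are assumptions invented so the sketch can run on its own.

```bash
# Toy model: run every package's cheap pre-flight check before building any.
KERNEL_CONFIG="CONFIG_EXT4_FS=y CONFIG_FUSE_FS=y"   # hypothetical kernel options

# Hypothetical helper: succeeds if the named option is set to "y".
kernel_has() {
    case " ${KERNEL_CONFIG} " in
        *" $1=y "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Stand-ins for the pkg_pretend phase of two queued packages.
pkg_a_pretend() { return 0; }
pkg_b_pretend() { kernel_has CONFIG_TUN; }

QUEUE="pkg_a pkg_b"
STATUS="ok"
BUILT=""

# First pass: checks only, no building.
for pkg in ${QUEUE}; do
    if ! "${pkg}_pretend"; then
        STATUS="blocked by ${pkg}"
        break
    fi
done

# Second pass: build only if every check passed.
if [ "${STATUS}" = "ok" ]; then
    for pkg in ${QUEUE}; do
        BUILT="${BUILT}${BUILT:+ }${pkg}"
    done
fi

echo "status: ${STATUS}; built: ${BUILT:-nothing}"
```

Here the second package's check fails, so nothing is built at all, rather than the first package being compiled before the problem surfaces.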
Second, EAPI=4 improved error handling by forcing all
ebuild utilities to die on failure by default. This shortens error-handling
code because typically the failure of any ebuild command during the build should
result in failure for the entire package. Third, ebuilds could indicate
whether they had interdependencies among various features they
support. Often, more complex packages will have various features that are
dependent upon other features within the same package; to enable Y, you must
also enable X. A new variable called REQUIRED_USE allowed developers to set
dependencies and also indicate conflicting features.
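As a rough illustration of the last two points, consider the fragment below in ebuild syntax. The flag names and the installed file are hypothetical, and `dobin` is a helper supplied by the package manager, so the function only does anything inside a real build.

```bash
# Fragment in ebuild syntax; flag names and the installed file are hypothetical.
EAPI=4

IUSE="ssl https gnutls openssl"

# "https" needs "ssl" as well, and "gnutls" conflicts with "openssl".
REQUIRED_USE="
	https? ( ssl )
	gnutls? ( !openssl )
"

src_install() {
    # Before EAPI=4 this call needed an explicit `|| die`;
    # helpers now abort the build on failure by themselves.
    dobin example-tool
}
```

With these constraints declared, an invalid combination of features can be reported to the user up front instead of failing somewhere inside the build.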
Where did the security updates go?
If you follow Gentoo, you may have noticed the conspicuous lack of any
announcements regarding security updates (a.k.a. GLSAs) since January of
this year, and sparse announcements since late 2009. This doesn't mean that
security updates do not occur; package maintainers, who comprise the
majority of Gentoo developers, continue to quickly add patches and new
releases for security fixes. It does mean, however, that the security team
is heavily undermanned and cannot keep up with the pace of security holes in
a distribution's worth of software. A significant population of the people
who care deeply about security updates maintain servers, and often are paid
to do so. If they desire GLSAs, perhaps they could contribute to creating
them; a modest effort from enough of these people could help to revive
these security announcements.
Changes to Gentoo's release strategy
Gentoo follows a rolling-release model, with constant updates to individual
packages showing up hourly, 24 hours a day. Previously, it made releases
semi-annually by taking snapshots of its package database, performing lots
of QA on them, and creating LiveCDs — a process that required intensive
manual effort. Gentoo then moved to a "rolling release" strategy for its
releases by creating weekly automatic builds rather than formal
releases. This was a big win in terms of reducing developer effort but came
with an unexpected loss of PR for Gentoo.
When coupled with the current
lack of a
weekly or monthly newsletter, Gentoo has nearly disappeared from news
sites. It turns out that official releases drive news articles; without a
major reason to write about an open-source project, like a release announcement,
news sites often ignore it. For that reason, as well as users clamoring
for full-featured LiveDVDs with pretty artwork, Gentoo again started
producing DVD releases, with the most recent being 11.2 in August.
What makes a healthy project?
Gentoo seems to have the core aspects right: code and community. But all the
peripheral components necessary to a thriving open-source project, at least
one of this size, have been lacking in recent years. Contributors have faded
away from the weekly newsletter (which first became monthly, then went "on hiatus"); Gentoo's previously
award-winning documentation has begun to get stale due to a lack of
documentation contributors to update it for recent changes; and the same has
been true for its release engineers and security team.
Major years-long
initiatives like a migration to Git and a redesigned website have largely
stalled, again because the people involved don't have enough time to work on
them. Although some of these aspects have improved very recently (real
releases again, and former documentation lead Sven Vermeulen just returned
to Gentoo), others remain an open question. It seems that the shrinkage in
the developer community has affected some of the most important
contributors, resulting in a major hit to the distribution that it's still
working to recover from.
What's in Gentoo's future?
First, there's the expected: Gentoo will continue to improve upon its ebuild
format with new EAPIs. The Google Summer of Code program has brought some
welcome new blood into the project, with around two-thirds of the roughly
15 internship
students each year becoming Gentoo developers. Work is ongoing to enable
integration with new technologies like systemd, although it's unclear at
this point whether it will replace Gentoo's custom init system (OpenRC) or
become yet another option. Gentoo is about providing choice wherever it
makes sense rather than enforcing its own choices upon its users, so this
same idea of choosing between alternatives also applies to GNOME 3 vs KDE,
and if anyone makes Unity integrate with Gentoo, that will become an option
too.
As is unfortunately far too common in open-source projects, progress can
sometimes be slow due to lack of volunteer time, especially on larger or
complex issues. Some of them have been dragging on for years now, such as the
migration of Gentoo's main repository from CVS to Git.
To date, sample conversions exist (such as the one Ohloh
uses for statistics), and a scheme was developed in collaboration with
upstream Git developers to individually sign every commit for improved
security. A tracker
bug and mailing
list exist for anyone interested in following (or even better, helping
with) the work on the Git conversion.
Another longstanding question that's constantly discussed but rarely acted
upon is how to fix the problems with Gentoo's organizational structure. The current
model is of a seven-member council that is entirely up for re-election every
year. In addition, there is a nonprofit foundation that controls Gentoo's
finances, copyrights and trademarks, and hardware, with its own independent
board of trustees. The members-at-large model of the council (rather than
members being in charge of specific areas) means that no progress can happen
on any global Gentoo issue without a majority vote of the seven members, and
this can take months. A number of ideas have been floated, like shrinking
the council, returning to the previous model of a benevolent dictator, or
installing the foundation trustees as a corporate board that would appoint a
project leader, similar to a CEO. The goal should be whatever allows Gentoo
to make faster progress; my entirely biased opinion is that open-source
projects exist to accomplish a purpose and should focus on that, rather than
attempting to be a democratic government where everyone is equal.
Gentoo's largest problems, however, come down to a single core issue: not
enough people are working in the areas outside of development, perhaps
because they're under-appreciated — security, release engineering,
newsletters, and documentation, to name a few. If Gentoo can focus on
finding and retaining contributors there, perhaps by applying lessons from
its involvement in the Google Summer of Code, it could improve its
reputation and increase its publicity. That could well bring in the
contributors to rejuvenate what has become a somewhat sluggish open-source
project.