Since LWN has published
statistics on who wrote the Linux kernel, I thought readers might also
be interested in who's writing other major open-source projects. I recently
obtained the entire CVS repository history for Gentoo Linux, courtesy of
Robin Johnson <robbat2 -AT- gentoo -dot- org>. Some of the code has
recently moved to Subversion or Git, so these numbers may not be 100%
accurate, but the techniques used to analyze commits should be generally
useful in understanding the progress of, and contributors to, any project.
First, I wanted to understand the developer community. How much experience
do our developers have with Gentoo, and how has that changed over time? To
do this, I created a number called "lifetime" that's the length of time
between the developer's first and last commits. Then I scanned across each
month, checking the average developer lifetime. For developers still active
in the month being scanned, I used that month in place of their last commit,
so the result is the developer's experience at that time, not the
developer's experience today.
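
The lifetime scan described above could be sketched roughly as follows. This is a minimal illustration, not the scripts actually used for this article: the (developer, date) input format, the function names and the sample data are all assumptions, and experience is measured in days.

```python
from datetime import date

def first_last(commits):
    """Map each developer to (first_commit_date, last_commit_date)."""
    spans = {}
    for dev, day in commits:
        first, last = spans.get(dev, (day, day))
        spans[dev] = (min(first, day), max(last, day))
    return spans

def average_experience(spans, when):
    """Average days of experience, as of `when`, among developers active then.

    A developer counts as active if `when` falls between the first and last
    commits. Experience is measured up to `when`, not up to the last commit.
    """
    active = [first for first, last in spans.values() if first <= when <= last]
    if not active:
        return 0.0
    return sum((when - first).days for first in active) / len(active)

# Hypothetical sample data, not real Gentoo history.
commits = [
    ("alice", date(2003, 1, 10)),
    ("alice", date(2005, 1, 10)),
    ("bob", date(2004, 1, 10)),
    ("bob", date(2004, 7, 10)),
]
spans = first_last(commits)
print(average_experience(spans, date(2004, 4, 10)))
```

Calling average_experience() once per month over the whole history would give the data points for a plot of experience against time.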
What you can see is that the lifetimes go up roughly as a function of time
since CVS history begins. This shows that the "average Gentoo developer"
joins and stays involved for more than a year. Over a span of 3 years, the
average lifetime increases from 1 year to 2 years.
Another way to look at this is to ask how many active and retired developers
there are today as a function of when they gained commit access. The
majority of active developers joined in 2005 and 2006, while most
retired developers joined in 2003 and 2004. This again shows that the
average lifetime is around 2 years.
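
One way to build such a breakdown is sketched below. Two assumptions are mine and not taken from the data: the year of a developer's first commit stands in for the year commit access was granted, and a developer counts as "retired" after 180 days without a commit (the idle_days parameter is hypothetical).

```python
from collections import Counter
from datetime import date, timedelta

def cohorts(spans, today, idle_days=180):
    """Count active and retired developers by the year of their first commit.

    `spans` maps developer -> (first_commit_date, last_commit_date).
    The idle_days cutoff is an assumed definition of "retired".
    """
    active, retired = Counter(), Counter()
    cutoff = today - timedelta(days=idle_days)
    for first, last in spans.values():
        bucket = active if last >= cutoff else retired
        bucket[first.year] += 1
    return active, retired

# Hypothetical sample data.
spans = {
    "alice": (date(2003, 5, 1), date(2005, 2, 1)),
    "bob": (date(2005, 3, 1), date(2007, 6, 1)),
    "carol": (date(2005, 9, 1), date(2007, 5, 15)),
}
active, retired = cohorts(spans, today=date(2007, 7, 1))
print(dict(active), dict(retired))
```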
The number of developers at any given time is also of interest. I found
this by scanning across months again, counting the developers whose commit
lifetimes include each month.
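
A sketch of that count, assuming the same hypothetical mapping of developer to (first commit, last commit) dates. Months are handled as integer indices so that lifetimes spanning a year boundary need no special casing. Months in which nobody was active are simply absent from the result.

```python
from datetime import date

def month_index(day):
    """Number of months since year 0, so consecutive months differ by 1."""
    return day.year * 12 + (day.month - 1)

def developers_per_month(spans):
    """Return {(year, month): developers whose lifetime covers that month}."""
    counts = {}
    for first, last in spans.values():
        for m in range(month_index(first), month_index(last) + 1):
            counts[m] = counts.get(m, 0) + 1
    return {(m // 12, m % 12 + 1): n for m, n in sorted(counts.items())}

# Hypothetical sample data.
spans = {
    "alice": (date(2005, 11, 3), date(2006, 2, 20)),
    "bob": (date(2006, 1, 15), date(2006, 1, 30)),
}
per_month = developers_per_month(spans)
for (year, month), n in per_month.items():
    print(f"{year}-{month:02d} {n}")
```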
The most interesting part is a sharp decline starting in early 2006. I
wanted to attribute this in part to the addition of Subversion, which
happened right around that time, but that would only account for it if the
developers committing to Subversion no longer committed to CVS. That
certainly isn't the case for more than 100 people, since the main package
tree remains in CVS.
Instead, I now attribute this drop to Gentoo's developer population
returning toward an equilibrium after an explosive, uncontrolled growth. The
Gentoo structure and governance could not scale quickly enough to deal with
all the new developers, but it took some time to normalize and continues to
Now that we've learned something about our developers, how about our code?
The next three graphs show commits per month to each CVS module. The
"gentoo-x86" module contains all of the ebuilds (the packages). There's
nothing particularly unusual about this, except for a huge peak in early
2006, which I suspect came from someone accidentally branching the entire
repository. Interestingly, there isn't as much of a decline in commits as
you might expect, given the drop in developers by more than a
third. Apparently, the actively committing developers weren't the ones who
quit. The "gentoo" module contains the website files, some projects such
as the installer and the Catalyst LiveCD creator, and patchsets for more
complex packages. The website is fairly stable at this
point, and many of the projects in this repository have reached maturity, so
development has slowed down. The "gentoo-src" module contains a number of
projects as well, but the huge drop near the beginning of 2006 indicates a
move of active development to Subversion.
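
Per-module monthly totals like these come down to counting commits by (module, year, month). The sketch below assumes the CVS history has already been reduced to (module, date) pairs; that input format and the sample data are hypothetical.

```python
from collections import Counter
from datetime import date

def commits_per_month(commits):
    """Count commits per (module, year, month) from (module, date) pairs."""
    return Counter((module, day.year, day.month) for module, day in commits)

# Hypothetical sample data.
commits = [
    ("gentoo-x86", date(2006, 1, 5)),
    ("gentoo-x86", date(2006, 1, 20)),
    ("gentoo-x86", date(2006, 2, 1)),
    ("gentoo", date(2006, 1, 9)),
]
monthly = commits_per_month(commits)
for (module, year, month), n in sorted(monthly.items()):
    print(f"{module} {year}-{month:02d} {n}")
```

Writing one such line per month to a file per module gives something gnuplot can plot directly.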
And finally, let's tie the developers and the code together with a
histogram. This shows the number of commits each developer has made, with a
bin size of 100. You can see the incredibly long tail of the most active
committers, with most developers under 20,000 (note the scale) but the top
developer at 120,000 commits.
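
The binning behind a histogram like this is simple integer division. A minimal sketch, assuming per-developer commit totals are already available in a dictionary (the names and numbers are made up):

```python
from collections import Counter

def commit_histogram(commits_by_dev, bin_size=100):
    """Return {bin_start: number of developers}, in ascending bin order.

    A developer with n commits falls in the bin starting at
    (n // bin_size) * bin_size, so 0-99 is bin 0, 100-199 is bin 100, etc.
    """
    bins = Counter((n // bin_size) * bin_size for n in commits_by_dev.values())
    return dict(sorted(bins.items()))

# Hypothetical sample data.
commits_by_dev = {"alice": 42, "bob": 99, "carol": 150, "dave": 120000}
histogram = commit_histogram(commits_by_dev)
print(histogram)
```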
Now let's take a closer look at the long tail of the developers with the
largest commit counts. The tables show any developer with at least 1% of
the total commits.
[Table: developers with at least 1% of all-time commits. Only the entry "Robin H. Johnson" survives.]
About 40% of the all-time commits to Gentoo come from just 18
developers. Unfortunately, I didn't have access to the size of the
commits, just the number of them, so I couldn't try to rank them by
changes in lines of code. One thing to be wary of is the very small
commits, such as those indicating that a package works on a given
architecture. But this list is not dominated by architecture developers.
In 2007 so far, 26 developers accounted for nearly 60% of
commits. Unlike the all-time list, a significant fraction of these
developers are architecture developers, including the top committer.
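
The 1% cutoff used for both lists, and the combined share of the developers above it, can be computed along these lines. This is a sketch with made-up numbers; the function name and the input dictionary are assumptions.

```python
def top_committers(commits_by_dev, threshold=0.01):
    """Return [(developer, share)] for shares >= threshold, largest first."""
    total = sum(commits_by_dev.values())
    if total == 0:
        return []
    top = [
        (dev, n / total)
        for dev, n in commits_by_dev.items()
        if n / total >= threshold
    ]
    top.sort(key=lambda item: item[1], reverse=True)
    return top

# Hypothetical sample data.
commits_by_dev = {"alice": 600, "bob": 395, "carol": 5}
top = top_committers(commits_by_dev)
combined = sum(share for _, share in top)
for dev, share in top:
    print(f"{dev} {share:.1%}")
print(f"combined {combined:.1%}")
```

Restricting the input to a single year's commits before calling the function gives the per-year version of the list.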
This analysis was mostly automated, using a combination of awk, bash shell,
Python and gnuplot. The scripts are available upon request to the
author <dberkholz -AT- gentoo -dot- org>.