News and Editorials
Since LWN has published statistics on who wrote the Linux kernel, I thought readers might also be interested in who's writing other major open-source projects. I recently obtained the entire CVS repository history for Gentoo Linux, courtesy of Robin Johnson <robbat2 -AT- gentoo -dot- org>. Some of the code has recently moved to Subversion or Git, so these numbers may not be 100% accurate, but the techniques used to analyze commits should be generally useful in understanding the progress of, and contributors to, any project.
First, I wanted to understand the developer community. How much experience do our developers have with Gentoo, and how has that changed over time? To do this, I created a number called "lifetime": the length of time between a developer's first and last commits. Then I scanned across each month, checking the average developer lifetime. For developers who were active in the month being scanned, I treated that month as the date of their last commit, which gives the developer's experience at that time rather than the developer's experience today.
What you can see is that the lifetimes go up roughly as a function of time since CVS history begins. This shows that the "average Gentoo developer" joins and stays involved for more than a year. Over a span of 3 years, the average lifetime increases from 1 year to 2 years.
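The lifetime scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's actual script: the function names and the input format (a list of `(developer, date)` pairs, one per commit) are hypothetical, and experience is measured in whole months.

```python
from datetime import date

def month_index(day):
    """Map a date to a running month number so months can be compared and ranged."""
    return day.year * 12 + (day.month - 1)

def average_lifetime_by_month(commits):
    """commits: iterable of (developer, date) pairs (assumed input format).

    Returns {(year, month): average months of experience of the developers
    who were active in that month}."""
    first, last = {}, {}
    for dev, day in commits:
        m = month_index(day)
        first[dev] = min(first.get(dev, m), m)
        last[dev] = max(last.get(dev, m), m)
    averages = {}
    if not first:
        return averages
    for m in range(min(first.values()), max(last.values()) + 1):
        # Experience is measured up to the scanned month, not up to today.
        experience = [m - first[dev] for dev in first
                      if first[dev] <= m <= last[dev]]
        if experience:
            averages[(m // 12, m % 12 + 1)] = sum(experience) / len(experience)
    return averages
```

Plotting the returned values against their month keys would give a curve of the kind discussed here.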
Another way to look at this is to ask how many active and retired developers there are today as a function of when they gained commit access. The majority of active developers joined in 2005 and 2006, while the largest number of retired developers joined in 2003 and 2004. This again shows that the average lifetime is around 2 years.
The number of developers at any given time is also of interest. I found this by scanning across months again, counting the developers whose commit lifetimes include each month.
The most interesting part is a sharp decline starting in early 2006. I wanted to attribute this in part to the addition of Subversion, which was right around that time, but that would only account for it if the developers committing to Subversion no longer committed to CVS. That certainly isn't the case for more than 100 people, since the main package tree remains in CVS.
Instead, I now attribute this drop to Gentoo's developer population returning toward an equilibrium after explosive, uncontrolled growth. The Gentoo structure and governance could not scale quickly enough to deal with all the new developers; the population took some time to normalize and continues to do so.
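The monthly developer count discussed above comes from the same kind of scan. The following is a rough sketch, again assuming a hypothetical list of `(developer, date)` commit pairs; the function name is illustrative and not taken from the original scripts.

```python
from collections import Counter
from datetime import date

def active_developers_by_month(commits):
    """commits: iterable of (developer, date) pairs (assumed input format).

    Returns {(year, month): number of developers whose commit lifetime,
    from first to last commit, includes that month}."""
    first, last = {}, {}
    for dev, day in commits:
        m = day.year * 12 + (day.month - 1)
        first[dev] = min(first.get(dev, m), m)
        last[dev] = max(last.get(dev, m), m)
    counts = Counter()
    for dev in first:
        # Count the developer once for every month between first and last commit.
        for m in range(first[dev], last[dev] + 1):
            counts[(m // 12, m % 12 + 1)] += 1
    return dict(counts)
```

Note that this counts a developer as active in every month between the first and last commit, even months with no commits at all, which matches the lifetime-based definition used in the text.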
Now that we've learned something about our developers, how about our code? The next three graphs show commits per month to each CVS module. The "gentoo-x86" module contains all of the ebuilds (the packages). There's nothing particularly unusual about this, except for a huge peak in early 2006, which I suspect marks the point when someone accidentally branched the entire repository. Interestingly, there isn't as much of a decline in commits as you might expect, given the drop in developers by more than a third. Apparently, the actively committing developers weren't the ones who quit. The "gentoo" module contains the website files, some projects such as the installer and the Catalyst LiveCD creator, and patchsets for more complex packages. The website is fairly stable at this point, and many of the projects in this repository have reached maturity, so development has slowed down. The "gentoo-src" module contains a number of projects as well, but the huge drop near the beginning of 2006 indicates a move of active development to Subversion.
And finally, let's tie the developers and the code together with a histogram. This shows the number of commits each developer has made, with a bin size of 100. You can see the incredibly long tail of the most active committers, with most developers under 20,000 (note the scale) but the top developer at 120,000 commits.
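Binning per-developer commit totals for such a histogram is straightforward. Here is a small sketch with the bin size of 100 mentioned above; the function name and the `{developer: commit count}` input are assumptions made for illustration.

```python
from collections import Counter

def commit_histogram(commit_counts, bin_size=100):
    """commit_counts: {developer: total commits} (assumed input format).

    Returns {bin_start: number of developers whose total falls in
    [bin_start, bin_start + bin_size)}, ordered by bin_start."""
    bins = Counter((n // bin_size) * bin_size for n in commit_counts.values())
    return dict(sorted(bins.items()))
```

The resulting pairs could be written out one per line and handed to a plotting tool such as gnuplot.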
Now let's take a closer look at the long tail of the developers with the largest commit counts. The tables show any developer with at least 1% of the total commits.
|Robin H. Johnson||1.98|
About 40% of the all-time commits to Gentoo come from just 18 developers. Unfortunately, I didn't have access to the size of the commits, just the number of them, so I couldn't try to rank them by changes in lines of code. One thing to be wary of is the very small commits, such as those indicating that a package works on a given architecture. But this list is not dominated by architecture developers.
In 2007 so far, 26 developers accounted for nearly 60% of commits. Unlike the all-time list, a significant fraction of these developers are architecture developers, including the top committer.
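The tables of top committers boil down to a percentage calculation with a 1% cutoff. As a hedged sketch (the function name and input dictionary are hypothetical, and only commit counts are used, since commit sizes were not available):

```python
def top_committers(commit_counts, threshold=1.0):
    """commit_counts: {developer: total commits} (assumed input format).

    Returns (rows, combined_share): rows is a list of (developer, percent)
    for developers with at least `threshold` percent of all commits, sorted
    from largest to smallest; combined_share is the sum of their percentages."""
    total = sum(commit_counts.values())
    if total == 0:
        return [], 0.0
    shares = [(dev, 100.0 * n / total) for dev, n in commit_counts.items()]
    rows = sorted((row for row in shares if row[1] >= threshold),
                  key=lambda row: row[1], reverse=True)
    return rows, sum(percent for _, percent in rows)
```

Restricting the input to commits from a single year before counting would give the per-year version of the table.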
This analysis was mostly automated, using a combination of awk, bash shell, Python and gnuplot. The scripts are available upon request to the author <dberkholz -AT- gentoo -dot- org>.
New Releases
announced the release of openSUSE version 10.3. "Enhancements to openSUSE 10.3 include the newest versions of the GNOME* and KDE desktop environments, including a KDE 4 preview. OpenOffice.org 2.3 makes sharing files with Microsoft Office users easy, and the newest version of AppArmor(TM) protects the Linux operating system and applications from attacks, viruses and malicious applications. OpenSUSE 10.3 also now includes MP3 support out of the box for Banshee(TM) and Amarok, which are the default media players in openSUSE. In addition, openSUSE 10.3 offers the latest open source applications for developing applications, setting up a home network and running a Web server, as well as the latest virtualization software such as Xen* 3.1 and VirtualBox 1.5." There are more links in this announcement from the openSUSE team. the release tour page for more information.
announced the immediate availability of Linspire 6.0, the latest commercial release of the desktop Linux operating system. "Building on the best of open source software using Ubuntu as its base line, Linspire 6.0 adds licensed proprietary drivers, codecs, and software in its core distribution to provide a better user experience."
Fedora Electronic Lab live CD has been updated to the latest Fedora test release. "The idea of Fedora Electronic Lab is not to include as many as packages for electronic simulations, but mainly to ensure that design flows can be achieved. Because it's useless for the user to have those packages if his/her data can't be process with other applications thus implying the user will waste his/her time."
In distribution since 2001, Guardian Digital's EnGarde has been a staple for security enthusiasts, administrators and organizations looking to run servers easily and securely. Solely designed as an integrated server platform, EnGarde Secure Linux provides web, DNS and email functions simply and securely, while delivering on integrated intrusion detection, advanced kernel and network security features, manageable SELinux policies, robust engineering and graphical auditing and reporting."
Debian GNU/Linux
The summer has finished, and it's about time I summarised how we got on. We had 9 Summer of Code students working for us, and we had a 100% success rate this year. Woo! Last year we only managed 6 successful projects out of 10, so that's a major improvement. A couple of things helped a great deal this time: several of our students were already contributors to the Debian community at various levels, and for the first time this year the SoC programme also included an extra chunk of time to allow the students to get involved ("bonding time") before they had to start coding work. These meant that our students were much more involved in Debian than last year, and that was a very good outcome."
[T]he Alioth team is proud to announce YARCS (Yet Another Revision Control System). Alioth can now host your Darcs repositories in pretty much the same way as it can host your CVS, Subversion, Arch/Bazaar, Bzr, Hg and Git repositories. Please refer to http://wiki.debian.org/Alioth/darcs for details."
Mandriva Linux
A new subscription mode in the commercial Powerpack product will provide users to install or upgrade their system and take advantage of all the new technologies integrated in the Powerpack product. Regarding this new pricing policy, the new subscription mode in the Powerpack is the replacement of the former downloading services of the Club. The 'community services' of Club platform will be now available to all. This new plan is designed for users who want to stay tuned with the latest Linux technologies. With the 2008 release, customers can subscribe to this plan to download 2 Powerpack releases. This plan is offered for only 59 euro per year ($69 USD)."
Other distributions
Arch Linux is getting a change in leadership. Project founder Judd Vinet is stepping down and passing on the leadership role to Aaron Griffin. "Though I will still be around for discussions and anything else I can do for Arch (within my feeble time constraints), I feel this is the time to say Goodbye, at least as leader. I will still be around as Arch's Number One Cheerleader, but not as its visionary." Judd's blog has a bit more about what he's up to, besides Arch.
New Distributions
reports on a new distribution called Vixta. "Vixta's home page touts the imminent availability of version 095, and enumerates the goals of the project as follows: 1. Absolutely free, in every sense; 2. Spread Linux to the "masses"; 3. ABN -- Absolutely No Config.; 4. User-Friendly; 5. Eye-catching. Familiar look and feel."
Distribution Newsletters
DistroWatch Weekly for October 8, 2007 is out. "The big openSUSE 10.3 release week is now behind us. All went without a hitch and many users are enjoying the newest software, improved package management, and extended support for the latest hardware in this new version. No major bugs have been reported so far, but let's wait for the first reviews before concluding that this is indeed openSUSE's best release ever. In other news, Mandriva Linux 2008 has been released to "early seeders", Ubuntu has begun accepting pre-orders for "Gutsy Gibbon", and Judd Vinet has resigned as the lead developer of Arch Linux. Finally, don't miss the featured story of this week - a Susan Linton's report on the major new release from Puppy Linux, version 3.00."
Newsletters and articles of interest
interview with Máirín Duffy, lead of the Fedora art team, on the process of creating the artwork for Fedora 8. "We don't have any hard restriction saying that you can only produce software using the free and open source tools available in Fedora, but all of the artwork as far as I know was produced exclusively in tools available in Fedora, including Inkscape and the GIMP. So Fedora 8's artwork serves as a pretty good example of what you can do with the tools readily available in Fedora itself."
sets up a GNOME desktop on openSUSE 10.3. "This tutorial shows how you can set up an OpenSUSE 10.3 desktop that is a full-fledged replacement for a Windows desktop, i.e. that has all the software that people need to do the things they do on their Windows desktops. The advantages are clear: you get a secure system without DRM restrictions that works even on old hardware, and the best thing is: all software comes free of charge."
some tips for speeding up Ubuntu, including filesystem, swapping, and other tweaks. "On default installation Ubuntu chooses 'journal data ordered' and In data=ordered mode, ext3 only officially journals metadata it logically groups metadata and data blocks into a single unit called a transaction. When it want to write the new metadata out to disk, the associated data blocks are written first. data=ordered mode effectively solves the corruption problem found in data=writeback mode and most other journaled filesystems, and it is done without requiring full data journaling. In general, data=ordered ext3 filesystems perform slightly slower than data=writeback filesystems, but slightly faster than the full data journaling counterparts. To speed it up we're going to change it to data=writeback system."
Distribution reviews
takes a look at the latest version of Puppy Linux. "Puppy breathes new life into old hardware and runs well on diskless PCs and thin workstations. Needless to say, it's a total speed demon on state-of-the-art hardware. While I've emphasized Puppy's special role on constrained hardware, the product is fully competitive on current systems. My friends and I run it on our newest computers, too."
reviews the 0.4 version of SystemRescueCD, a Gentoo-based distribution for repairing broken systems. "Another major improvement is that you can now use PXE network booting. With PXE, you can boot a troubled PC remote over your LAN into SystemRescueCD. This is great, for example, for a help desk repairing systems scattered over an office or campus. To get this to work, the PCs will need to be set to use wake-on-LAN and network boot. That's been a standard PC feature since 2001, but it usually must be made active in the BIOS before you can use it."
looks at the seven top Linux distributions. "GNU/Linux offers a bewildering variety of flavors -- or distributions, as they're called. To a newcomer's eye, many of these seem virtually identical to each other. Yet, the more you learn about a distribution and the community that surrounds it, the more different they become. Here, in alphabetical order, is a list of the seven distributions that have most affected GNU/Linux as a whole."
Page editor: Rebecca Sobol
Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds