How old is our kernel?
Let it rip!
The community did, indeed, let it rip; some 180,000 changesets have been added to the repository since then. Typically hundreds of thousands of lines of code are changed with each three-month development cycle. A while back, your editor began to wonder how much of the kernel had actually been changed, and how much of our 2.6.33-to-be kernel dates back to 2.6.12-rc2, which was tagged at the opening of the git era? Was there anything left of the kernel we were building in early 2005?
Answering this question is a simple matter of bashing out some ugly scripts and dedicating many hours of processing time. In essence, the "git blame" command can be used to generate an annotated version of a file which lists the last commit to change each line of code. Those commit IDs can be summed, then associated with major version releases. At the end of the process, one has a simple table showing the percentage of the current kernel code base which was created for each major release since 2.6.12. Here's what it looks like:
In summary: just over 41% 31% of the kernel tree dates
back to 2.6.12, and has not been
modified since then. Our kernel may be changing quickly, but parts of it
have not changed at all for nearly five years. Since then, we see a steady
stream of changes, with more recent kernels being more strongly represented
than the older ones. That curve will partly be a result of the general
increase in the rate of change over time; 2.6.13 had fewer than 4,000
commits, while 2.6.33 will have almost 11,000. Still, one has to wonder
what happened with 2.6.20 (5,000 commits) to cause that
release to represent less than 2% of the total code base.
Much of the really old material is interspersed with newer lines in many files; comments and copyright notices, in particular, can go unchanged for a very long time. The 2.6.12 top-level makefile set VERSION=2 and PATCHLEVEL=6, and those lines have not changed since; the next line (SUBLEVEL=33) was changed in December.
There are interesting conclusions to be found at the upper end of the graph as well. Using this yardstick, 2.6.33 is the smallest development cycle we have seen in the last year, even though this cycle will have replaced some code added during the previous cycles. 4.2% of the code in 2.6.33 was last touched in the 2.6.33 cycle, while each of the previous four kernels (2.6.29 through 2.6.32) still represents more than 5.5% of the code to be shipped in 2.6.33.
Another interesting exercise is to look for entire files which have not been touched in five years. Given the amount of general churn and API change which has happened over that time, files which have not changed at all have a good chance of being entirely unused. Here is a full list of files which are untouched since 2.6.12 - all 1062 of them. Some conclusions:
- Every kernel tarball carries around drivers/char/ChangeLog, which is
mostly dedicated to documenting the mid-90's TTY exploits of Ted
Ts'o. There is only one change since 1998, and that was in 2001.
Files like this may be interesting from a historical point of view,
but they have little relevance to current kernels.
- Unsurprisingly, the documentation directory contains a great deal of
material which has not been updated in a long time. Much of it need
not change; the means by which one configures an ISA Sound Blaster
card is pretty much as it always was - assuming one can find such a
card and an ISA bus to plug it into. Similarly, Klingon language
support (Documentation/unicode.txt), Netwinder support, and such have
not seen much development activity recently, so the documentation can
be deemed to be current, if not particularly useful. All told,
41% of the documentation directory dates back to 2.6.12. There was a
big surge of
documentation work in 2.6.32; without that, a larger percentage of
this subtree would look quite old.
- Some old interfaces haven't changed in a long time, resulting in a lot
of static files in include/.
<linux/sort.h> declares sort(), which is used
in a number of places. <include/fcdevice.h> declares
alloc_fcdev(), and includes a warning that "
This file will get merged with others RSN.
" Much of the sunrpc interface has remained static for a long time as well. - Ancient code abounds in the driver tree, though, perhaps
unsurprisingly, old header files are much more common than old C
files. The ISDN driver tree has been quite static.
- Much of sound/oss has not been touched for a long time
and must be nicely filled with cobwebs by now; there hasn't been much
of a reason to touch the OSS code for some time.
- net/decnet/TODO contains a "
quick list of things that need finishing off
"; it, too, hasn't been changed in the git era. One wonders how the DECnet hackers are doing on that list...
So which subsystem is the oldest? This plot looks at the kernel subsystems (as defined by top-level directories) and gives the percentage of 2.6.12 code in each:
The youngest subsystem, unsurprisingly, is tools/, which did not exist prior to 2.6.29. Among subsystems which did exist in 2.6.12, the core kernel, with about 13% code dating from that release, is the newest. At the other end, the sound subsystem is more than 45% 2.6.12 code - the highest in the kernel. For those who are curious about the age distribution in specific subsystems, this page contains a chart for each.
In summary: even in a code base which is evolving as rapidly as the kernel, there is a lot of code which has not been touched - even by coding style or white space fixes - in the last five years. Code stays around for a long time.
(For those who would like to play with this kind of data, the scripts used have been folded into the gitdm repository at git://git.lwn.net/gitdm.git).
Note: this article has been edited to fix an error which overstated
the amount of 2.6.12 code remaining in the full kernel.
| Index entries for this article | |
|---|---|
| Kernel | Git |
| Kernel | Releases/2.6.33 |
