By Jonathan Corbet
July 20, 2011
Last week's Kernel Page included
an article on
the top contributors to the 3.0 kernel, an extended version of our
traditional look at who participated in each kernel development cycle.
Since this article is traditional, it tends not to draw a lot of
attention. This time was different: quite a few publications have picked
up on the fact that Microsoft was one of the top contributors of changesets
by virtue of a long series of cleanups to its "hv" driver in the staging
tree. Those sites, seemingly, got a lot more mileage out of those results
than LWN did, an amusing outcome which can be expected occasionally with
subscriber-only content. That said, this outcome is a bit dismaying for
other reasons.
Some background: the hv driver helps Linux to function as a virtualized
guest under
Windows. It is useful code which, with luck, will soon move out of the
staging tree and into the mainline kernel proper. After a period of
relative neglect, developers at Microsoft have started cleaning up hv with
a vengeance - 366 hv patches were merged for the 3.0 development cycle.
This work has clear value: it is aimed at getting this code ready to
graduate from staging, and it is worth having.
That said, let's look at the actual patches. 47 of them simply
move functions around to eliminate the need for forward declarations; 39 of
them rename functions and variables; 135 of them take the form "get rid of
X" for some value of (usually unused) X. Clearly this 15,000-line driver
needed a lot of cleaning, and it's good that people are doing the work.
But it also seems somewhat uncontroversial to say that this particular body
of work does not constitute one of the more significant contributions to
the 3.0 kernel.
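For a sense of scale, the arithmetic on those three categories can be checked
in a few lines of Python. The counts are the ones given above; the category
labels are shorthand chosen for this sketch, not anything taken from the
changelogs.

```python
# Counts quoted in the text; the dictionary keys are informal labels.
hv_patches_total = 366
cleanup_counts = {
    "moved functions": 47,   # reordering to drop forward declarations
    "renames": 39,           # renamed functions and variables
    "removals": 135,         # "get rid of X" patches
}

cleanup_total = sum(cleanup_counts.values())
cleanup_share = cleanup_total / hv_patches_total

print(f"{cleanup_total} of {hv_patches_total} patches ({cleanup_share:.0%})")
# prints "221 of 366 patches (60%)"
```

In other words, roughly three out of five hv patches in this cycle fall into
one of those three cleanup categories.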
Part of the problem, beyond any doubt, is the creation of lists of top
changeset contributors in the first place. The number of changes is an
extremely poor metric if one is interested in how much real, interesting
work was contributed to the kernel. Some changes are clearly more
interesting than others. Highlighting changeset counts may have ill effects
beyond misleading readers; if the number of changesets matters, developers
will have an incentive to bloat their counts through excessive splitting of
changes - an activity which, some allege, has been going on for a while
now.
LWN does post another type of statistic - the number of lines changed. As
with changeset counts, this number does contain a modicum of information.
But as a metric for the value of kernel contributions, it is arguably even
worse than changeset counts. Picking a favorite Edsger Dijkstra quote is a
challenging task, but this one would be a contender:
    If we wish to count lines of code, we should not regard them as
    "lines produced" but as "lines spent".
Just because a patch changes (or adds) a lot of code does not mean that
there is a lot of value to be found therein.
Given these problems, one might be tempted to just stop producing these
statistics at all. Yet these metrics clearly have enough value to be
interesting. When LWN first started posting these numbers, your editor was
approached at conferences by representatives from two large companies who
wanted to discuss discrepancies between those numbers and the ones they had
been generating internally. We are routinely contacted by companies wanting to
be sure that all of their contributions are counted properly. Developers
have reported receiving job offers as a result of their appearance in the
lists of top contributors. Changeset counts are also used to generate the
initial list of nominees to the Kernel Summit. For better or for worse,
people want to know who the most significant contributors to the kernel
are.
So it would be good to find some kind of metric which yields that
information in a more useful way than a simple count of changesets or lines
of code. People who understand the code can usually look at a patch and
come to some sort of internal assessment - though your editor might be
disqualified by virtue of having once suggested that merging devfs would be
a good idea. But the reason why even that flawed judgment is not used in LWN's
lists is simple: when a typical development cycle brings in 10,000
changesets, the manual evaluation process simply does not scale.
So we would like to have a metric which would try, in an automated fashion,
to come up with an improved estimate of the value of each patch. That does
not sound like an easy task. One could throw out some ideas for heuristics
as a place to start; here are a few:
- Changes to core code (in the kernel/, mm/, and
fs/ directories, say) affect more users and are usually more
heavily reviewed; they should probably be worth more.
- Bug fixes have value. A heuristic could try to look to see if the
changelog contains a bug ID, whether the patch appears in a stable
update, or whether it is a revert of a previous change.
- Patches that shuffle code but add no functional change generally have
relatively low value. Patches adding documentation, instead, are
priceless.
- Patches that remove code are generally good. A patch that adds code
to a common directory and removes it from multiple other locations may
be even better.
Patches adding significant code which appears to be cut-and-pasted
from elsewhere may have negative value.
- Changes merged late in the development cycle may not have a high
value, but, if the development process is working as it should be,
they should be worth something above the minimum.
- Changes merged directly by Linus presumably have some quality which
caught his attention.
Once a scoring system was in place, one could, over time, try to develop a
series of rules like the above in an attempt to better judge the true value
of a developer's contribution.
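As a rough illustration only, a first cut at such a scoring system might look
like the Python sketch below. Everything in it is an assumption made for the
sake of the example: the Patch fields, the weights, the regular expression
used to spot bug IDs, and the file paths are all hypothetical, and no such
tool is used for LWN's lists. Two of the heuristics above (spotting pure code
shuffling and spotting cut-and-pasted code) are left out, since they would
need to look at the patch contents rather than its metadata.

```python
import re
from dataclasses import dataclass

# All weights and patterns here are illustrative guesses, not calibrated values.
CORE_DIRS = ("kernel/", "mm/", "fs/")
BUG_ID_RE = re.compile(r"bugzilla|\bbug\s*#?\d+", re.IGNORECASE)


@dataclass
class Patch:
    """Hypothetical summary of one changeset."""
    subject: str
    changelog: str
    files: list               # paths touched by the patch
    lines_added: int
    lines_removed: int
    in_stable: bool = False        # also shipped in a stable update
    merged_late: bool = False      # merged late in the development cycle
    merged_by_linus: bool = False  # applied directly by Linus


def score(patch):
    """Return a heuristic value for a patch; higher means more valuable."""
    value = 1.0  # every merged patch is worth at least the minimum

    # Core code affects more users and is more heavily reviewed.
    if any(path.startswith(CORE_DIRS) for path in patch.files):
        value += 2.0

    # Bug fixes: a bug ID, a stable backport, or a revert.
    is_revert = patch.subject.startswith("Revert ")
    if BUG_ID_RE.search(patch.changelog) or patch.in_stable or is_revert:
        value += 2.0

    # Documentation-only patches get a large bonus.
    if patch.files and all(p.startswith("Documentation/") for p in patch.files):
        value += 3.0

    # Net code removal is generally good.
    if patch.lines_removed > patch.lines_added:
        value += 1.0

    # Late-cycle changes should be worth something above the minimum.
    if patch.merged_late:
        value += 0.5

    if patch.merged_by_linus:
        value += 1.0

    return value


# Two made-up patches: a core bug fix and a staging rename.
fix = Patch("mm: fix a leak", "Fixes bug #1234", ["mm/example.c"],
            lines_added=3, lines_removed=5, in_stable=True)
rename = Patch("staging: hv: rename a function", "No functional change.",
               ["drivers/staging/hv/example.c"],
               lines_added=10, lines_removed=10)

print(score(fix), score(rename))  # prints "6.0 1.0"
```

Summing such scores per developer would yield a ranking, though the result
would be only as good as the weights, which is exactly where the arguments
would start.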
That said, any such metric will certainly be seen as unfair by at least
some developers - and rightly so, since it will undoubtedly be
unfair. This problem has no solution that will be universally seen as
correct. So, while we may well play with some of these ideas, it seems
likely that we are stuck with changeset and lines-changed counts for the
indefinite future. These metrics, too, are unfair, but at least they are
unfair in an easily understood and objectively verifiable way.