Who wrote 2.6.20?

Posted Feb 21, 2007 2:44 UTC (Wed) by pr1268 (subscriber, #24648)
Parent article: Who wrote 2.6.20?

Using lines of code as a metric is pure evil. Sorry for venting, but I've learned and read that LOC is the single most misused and abused metric in all of software engineering.

However, I do respect and appreciate the hard work our editor has done. I assume there was no easier way to quantify and qualify the data above and turn it into meaningful information that accurately represents the state of authorship of the Linux Kernel. Is that a fair assessment? Finally, is there a correlation between the quantity of patches in a particular functional section of the Kernel (e.g. virtualization, filesystems, network device drivers, etc.) and whichever company has a vested interest in ensuring that functionality adds value to its Linux product(s)?

Thank you, Jon, for this research.


(Log in to post comments)

Who wrote 2.6.20?

Posted Feb 21, 2007 2:51 UTC (Wed) by corbet (editor, #1)

As I noted in the article, measuring these things is hard, and I agree that lines-of-code is of limited utility. Still, there's some information there, so I thought it was worth a look.

Delving into the various kernel subsystems is an area of future research. I did some quick-and-dirty runs which suggest that the representation of the various companies does not change as much as one might expect from one subsystem to another. It also looks like the "hobbyist" contribution to the core parts of the kernel is just as high as in, say, the driver tree. I will be looking at this more in the future.

Who wrote 2.6.20?

Posted Feb 21, 2007 6:20 UTC (Wed) by jamesm (guest, #2273)

Grepping for names and email addresses in the kernel source is sometimes useful (try grep -ri davem /usr/src/kernel, for example).

If not SLOC, then what?

Posted Feb 21, 2007 6:49 UTC (Wed) by ldo (guest, #40946)

"LOC is the single most misused and abused metric in all of software engineering."

So what? What's the alternative?

Who wrote 2.6.20?

Posted Feb 21, 2007 16:23 UTC (Wed) by richardl@redhat.com (guest, #31678)

LOC is a perfectly valid metric as long as you normalize against language, etc. In this case, LOC is used as a relative metric. The effort required to produce 100 LOC in C for the kernel is different from the effort required to produce 100 LOC in, say, Ruby for a webapp -- but that's not what the editor is doing here.

I'd be interested in hearing why you think LOC is "pure evil." I think it all depends on how you use it.

Who wrote 2.6.20?

Posted Feb 21, 2007 16:46 UTC (Wed) by lmb (subscriber, #39048)

Counting LoC changed is tricky, though. For example, I could iterate 100 times trying to get a single line of code right. But then, software metrics are hard.

One suggestion for a possibly interesting metric, so that I don't have to code it myself:

Annotate the whole of the tree: Who last changed which line? Number of lines * age = Author score.

This can then be extended to a historical score: who contributed how many lines of code, and how long did they remain in the tree before being removed or changed? Developers changing their own code would simply keep accumulating score under their own name, so this is essentially neutral.
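A minimal sketch of the "number of lines * age" author score, assuming the annotation comes from `git blame --line-porcelain`, which repeats the full header (including the `author` and `author-time` fields) for every blamed line. The function name `author_scores`, measuring age from author time rather than commit time, and expressing age in days are all illustrative choices, not part of the original suggestion. The historical extension is not covered.

```python
from collections import defaultdict


def author_scores(porcelain: str, now: int) -> dict[str, float]:
    """Return, per author, the summed age in days of every line they last touched.

    `porcelain` is assumed to be the text printed by
    `git blame --line-porcelain <file>`: every blamed line gets its own full
    header ("author ...", "author-time ...") followed by a tab-prefixed copy
    of the source line. `now` is a Unix timestamp.
    """
    scores: dict[str, float] = defaultdict(float)
    author, when = None, None
    for raw in porcelain.splitlines():
        if raw.startswith("\t"):
            # The tab-prefixed content line closes one blamed-line record:
            # each line adds its own age, so an author's total is lines x age.
            if author is not None and when is not None:
                scores[author] += (now - when) / 86400
            author, when = None, None
        elif raw.startswith("author "):
            author = raw[len("author "):]
        elif raw.startswith("author-time "):
            when = int(raw[len("author-time "):])
    return dict(scores)
```

In practice one would feed it something like `subprocess.run(["git", "blame", "--line-porcelain", path], capture_output=True, text=True, check=True).stdout` together with `int(time.time())`, then sum the results over every file in the tree. Plain `--porcelain` would not work with this parser, because it omits the header for repeated commits.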

LOC metric

Posted Feb 23, 2007 1:23 UTC (Fri) by giraffedata (subscriber, #1954)

"...as long as you normalize against language, etc. In this case, LOC is used as a relative metric. The effort required to produce 100 LOC in C for the kernel is different from the effort required to produce 100 LOC in, say, Ruby for a webapp"

I saw a study long ago that had the remarkable result that there is nothing to normalize here. It was looking specifically at the cost to develop and test new software, and found that 100 LOC costs the same regardless of the language or subject. What I've seen is consistent with that.

The study did find a few variables that added precision to a LOC-based estimate. With modification of existing code, there were some measurements of the code base that helped. I think number of files touched added precision too.

Who wrote 2.6.20?

Posted Feb 24, 2007 11:05 UTC (Sat) by bockman (guest, #3650)

Well, for one thing, you can often accomplish something equivalent with 1000 lines of dumb code or with 300 lines of very smart code. Most of the programming effort goes into figuring out the 'commonalities' between potential code blocks and writing customizable code (loops, routines, classes, templates) that exploits those commonalities. But the more time a developer spends on this kind of exercise, the shorter the final code turns out to be.

I don't say that LOC measurements are meaningless, just that they are statistics and should not be used outside of that context (for instance, they should not be used to measure the productivity of a developer or even a team).

Ciao
-----
FB

Who wrote 2.6.20?

Posted Mar 1, 2007 21:00 UTC (Thu) by jboorn (guest, #43808)

So what? You can write really slow, naive brute-force code for some problem in 300 lines, or you can use a fancy, complicated algorithm that takes 1000 lines of code but is much faster.

In this case the code is for the same project, and I think using lines of code within a project is good enough for the analysis sought here.

It is a bit annoying to see the same pointless argument about line counts come up again. Sure, it is possible to find examples of code that is smaller and as efficient as (or more efficient than) a given larger implementation. But that does not rule out larger code that is more desirable for a given project by some metric other than executable size.

LOC is quite ok...

Posted Feb 21, 2007 21:25 UTC (Wed) by nettings (subscriber, #429)

"Using lines of code as a metric is pure evil. "

wrong. absolute lines-of-code counts are certainly bogus as a measure for productivity, but the purpose of this article was to find a relative measure of where commits come from.
unless you can demonstrate that corporate-backed hackers produce a significantly different amount of functionality or utility per line of code (which would introduce a systemic error), the method is perfectly valid, because the inherent bogosity of LOC measurements will level out.
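To make the "levels out" argument concrete: the article's figures are shares of changed lines, and a share does not move if every group's line count is inflated by the same factor. A minimal sketch, assuming per-commit records of (affiliation, lines changed); the function name `relative_shares` and the group labels in the example are hypothetical.

```python
from collections import Counter


def relative_shares(changes):
    """Return each group's percentage of the total lines changed.

    `changes` is an iterable of (group, lines_changed) pairs, for example
    one pair per commit with the author's employer as the group.
    """
    totals = Counter()
    for group, lines in changes:
        totals[group] += lines
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # A uniform bias (every group's LOC scaled by the same factor k)
    # multiplies numerator and denominator alike, so it cancels here.
    return {group: 100.0 * n / grand_total for group, n in totals.items()}
```

Only a systematic difference between groups, such as one group producing more lines per unit of functionality, would change these percentages, which is exactly the caveat raised above.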

LOC is quite ok...

Posted Mar 3, 2007 17:36 UTC (Sat) by jzbiciak (subscriber, #5246)

Also, LOC is only meaningful if the output of the measurement isn't an input into future productivity. If coders are incentivized by their KLOC numbers (either directly, such as through wages and promotions, or indirectly through ego boosting), then KLOC can quickly become meaningless.

LOC metrics

Posted Feb 21, 2007 23:32 UTC (Wed) by man_ls (guest, #15091)

LOC is a perfectly valid metric. All metrics can be abused, and LOC have suffered more than their due, but when well understood, and with a little effort (e.g. removing blanks and comments), they are very useful.
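A minimal sketch of the "removing blanks and comments" step for C source: a line counts only if it contains at least one non-whitespace character outside `//` and `/* ... */` comments. The function name `count_sloc` is hypothetical, and the sketch deliberately ignores string and character literals, so a `/*` inside a string will confuse it; a real counter would need a proper tokenizer.

```python
def count_sloc(source: str) -> int:
    """Count lines of C source that contain code outside comments.

    Blank lines, lines holding only comments, and the interior of
    multi-line /* ... */ comments are not counted. Simplification:
    comment markers inside string or character literals are not detected.
    """
    sloc = 0
    in_block = False  # inside a /* ... */ comment carried across lines
    for line in source.splitlines():
        i, has_code = 0, False
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                break  # the rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            sloc += 1
    return sloc
```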

Laird and Brennan said it well: LOC are like square meters for an apartment. Sure, 160 m^2 in Madrid are not directly comparable to 160 m^2 in rural Teruel. And even within the same city, if you compare the price per m^2 of luxury attics with that of old basements, you are probably going to make a bad decision. But if you are going to buy a house, you had better know how many m^2 it has, instead of relying on subjective impressions of size.

In this case, what do you propose measuring? Function points? In case you don't know, when you don't have direct FP counts from construction data, you backfire them from... lines of code, by applying a coefficient.
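Backfiring, as described above, is just a division of a LOC count by a per-language "lines per function point" coefficient. A minimal sketch follows; the coefficients in `LOC_PER_FP` are round placeholder values chosen for illustration, not figures from any published backfiring table, and the function name is hypothetical.

```python
# Placeholder coefficients (source lines per function point), for
# illustration only; substitute values from a backfiring table you trust.
LOC_PER_FP = {"c": 100.0, "ruby": 50.0}


def backfire_function_points(loc: int, language: str, table=LOC_PER_FP) -> float:
    """Estimate function points from a LOC count (blanks and comments
    already removed) by dividing by the language's LOC-per-FP coefficient."""
    try:
        ratio = table[language]
    except KeyError:
        raise ValueError(f"no backfiring coefficient for {language!r}") from None
    return loc / ratio
```

With these placeholders, 5000 lines of C backfire to 50 function points. The estimate can never be more accurate than the LOC count and the coefficient it starts from, which is the point being made above.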

LOC metrics

Posted Feb 23, 2007 0:00 UTC (Fri) by giraffedata (subscriber, #1954)

"But if you are going to buy a house, you had better know how many m^2 it has, instead of relying on subjective impressions of size."

I'd say just the opposite. If you're looking at the house, your subjective impression of size is what really counts. The square meters in the listing are a cheap estimate -- cheaper than visiting the house -- of how spacious it is.

And so it is with LOC. If you're asking what it would cost to duplicate the development of 2.6.20 from 2.6.19, getting a bunch of professionals to look at the function and give their impression of how many person-hours it would take would be a lot better than counting LOC, but LOC is much cheaper. And history shows that the quality of the estimate you get by multiplying by LOC is quite acceptable.

Who wrote 2.6.20?

Posted Feb 25, 2007 15:55 UTC (Sun) by kingdon (guest, #4526)

To his credit, Jon gave higher praise to deleting code than writing it.

So although I agree that a naive attitude of "more lines of code means the developers are working harder/better" is dead wrong, I wouldn't tar this analysis with that brush.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds