
Comparing free and proprietary software defect rates

[This article was contributed by Joe 'Zonker' Brockmeier]

On Tuesday a company called Reasoning, Inc. released a study that seems to support what Open Source developers have been saying for years: Open code, and the inspection that it allows, produces a better product. Specifically, the company compared the Linux TCP/IP stack against a number of commercial TCP/IP stacks and found that the Linux implementation had fewer defects than the proprietary implementations.

The paper, "How Open-Source and Commercial Software Compare," is available from Reasoning by request, so we decided to take a look at it to see how they had reached their conclusions.

Reasoning lined up the TCP/IP implementation from the 2.4.19 Linux kernel against five commercial implementations. In 81,852 lines of Linux TCP/IP code, Reasoning found only 8 defects. Four of the five implementations compared with Linux were at least ten years old; the fifth is about three years old. The company did not name the specific operating systems, but Reasoning's CEO Scott Trappe confirmed that two were commercial Unix systems, one was "not Unix but in very broad use," and the embedded implementations were by "major vendors of networking equipment." Trappe said that Reasoning couldn't name companies specifically, but the companies had agreed to let Reasoning use the aggregate data.

As always, it helps to understand the company doing the research, and the context of the research, before taking the results too seriously. We spoke with Trappe to clarify some information not in the white paper and to get a feel for Reasoning's background. Reasoning specializes in automated testing of software written in C/C++, which it has been doing since 2001. Prior to that, the company had specialized in Y2K testing. The company plans to add testing of Java software to its services later this year.

The study was not commissioned by any of the Linux vendors or companies who might be competing with Linux. Instead, Trappe said that the company had performed the study primarily to highlight its services. Unlike with the other projects that Reasoning works on, the company was free to release its results along with specific code examples from the Linux TCP/IP stack. Trappe also said that the company was looking to prove that inspection itself was important in providing quality software and that "testing alone can never uncover all the defects in software."

The company chose the TCP/IP stack because it provided a good point of comparison. Trappe admitted that it might be stretching it to draw too many conclusions from the study of one piece of software, but that their study "does support some claims that it can rival commercial quality." Trappe also mentioned that the company may do further studies in the future comparing Open Source software to commercial software.

The company looks for five kinds of defects in code: memory leaks, null pointer dereferences, bad deallocations, out-of-bounds array accesses and uninitialized variables. According to Trappe, none of the errors found in the Linux TCP/IP stack were security issues. At least one of the issues, a memory leak, was fixed in the 2.4.20 kernel before Reasoning notified the kernel team of the defects. Four of the problems found (an uninitialized variable and some out-of-bounds errors) are not truly defects, since they do not cause the code to behave incorrectly. So, of eight defects reported, four are not real, three are debatable and one has been fixed.

Taking the revised information into account, the Linux TCP/IP stack has a defect density of 0.013 per 1,000 lines of code. The implementation with the fewest defects after Linux is one of the embedded stacks, with 0.08 defects per 1,000 lines of code. One implementation, one of the commercial OSes, had 183 defects in about 269,100 lines of code, or roughly 0.7 per thousand.
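For readers who want to check the arithmetic, defect density is simply the number of defects divided by the number of lines, scaled to 1,000 lines. The following is a minimal sketch in Python; the function name defects_per_kloc is our own invention for illustration and does not come from Reasoning's paper. It uses only the raw counts quoted in this article.

```python
def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Return the number of defects per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / lines_of_code * 1000


# Raw counts quoted in the article (hypothetical variable names).
linux_as_reported = defects_per_kloc(8, 81_852)      # all eight reported defects
commercial_os = defects_per_kloc(183, 269_100)       # the commercial OS cited above

print(f"Linux, as reported: {linux_as_reported:.3f} per 1,000 lines")
print(f"Commercial OS:      {commercial_os:.2f} per 1,000 lines")
```

The sketch covers only the as-reported counts. The revised Linux figure depends on how many of the eight reports are counted as real defects, so it is not recomputed here.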

To be sure, the Reasoning study raises some interesting points, though there's not enough data to say conclusively that Open Source software is always of higher quality than its proprietary counterparts. The study looked only at one small piece of the Linux kernel, and only considered a small set of information. The Linux kernel has also been extensively checked for this sort of error by the Stanford checker and the new "smatch" program, so it should be relatively clean. Reasoning's study says nothing about performance or features, and it does not address the functionality of the code. However, it does supply some data in favor of the argument that open code leads to higher quality -- at least in terms of specific defects.

We'll be interested to see what kinds of studies Reasoning does in the future, and how other Open Source projects compare to commercial code.



Comparing free and proprietary software defect rates

Posted Feb 13, 2003 16:15 UTC (Thu) by rakoch (guest, #4666) [Link]

In a commercial setting, "under the hood" work of any kind is not very
rewarding. It's much more interesting for a programmer or a team to
implement features that are visible. On the other hand the Linux TCP/IP
stack is probably one of the most scrutinized pieces of software in the
free software world.

There are plenty of worthy free software projects on sf.net. Most of them
simply cannot compare to their commercial competitors. Why? Because they
have a problem the Linux kernel has not: They lack developers.

A TCP stack simply isn't the focus of most commercial OSes. An exception
might be routers which was probably the "embedded device" in the test. You
bet for Cisco the TCP/IP stack is important. But for Sun/IBM/HP it's much
more important to have their Unix scale to dozens of CPUs. And for MS the
IE was probably the most important part of the OS until DRM and Palladium
came along.

So drawing conclusions about the quality of the rest of the system from
the quality of the TCP/IP stack is pretty misleading. I'd be curious about
a comparison of compilers, though.

Comparing free and proprietary software defect rates

Posted Feb 13, 2003 19:53 UTC (Thu) by giraffedata (guest, #1954) [Link]

I think the results will be similar everywhere.

Around 1992, there was a research paper on the same topic, with the same results. In that case, a team of programmers fed random input to a whole bunch of Unix programs and noted when the programs crashed. Where they had source code, they debugged the crash and sent the information to the maintainer of the code.

Actually, the paper's main point was the source of the bugs (buffer overrun, etc.) but there was an unmistakable difference in bug rates between free software and commercial software. At the time, this surprised many of us because the common wisdom in the industry said commercial software was bound to have fewer bugs because of all the investment in testing and because commercial publishers had more to lose from bugs.

But there was another result in that paper which I found much more interesting, which I think explained the phenomenon. The study was a followup on a study done the same way years earlier, which had found the same difference. In the followup, the programmers looked for the same bugs that had been reported in the original study. In commercial software, nearly all of the bugs were still present. In free software, nearly all of the bugs had been removed.

As a software developer for a major software publisher at the time, this didn't surprise me one bit. The software development machine of IBM is not capable of fixing a product just because it learns it's broken. But an individual free software developer not only is capable of releasing a fix, but insists on it as a matter of pride.

--
Bryan Henderson bryanh@giraffe-data.com
San Jose, California

Smatch

Posted Feb 15, 2003 22:00 UTC (Sat) by error27 (subscriber, #8346) [Link]

Thanks for the Smatch plug. :) The website is smatch.sf.net and since the article mentions null pointer dereferences the results for the dereference testing of 2.5.60 are here.

Smatch is too new to have had much influence yet. The Stanford Checker has made a measurable difference on the number of kernel errors. Unfortunately, their bug database only goes up to 2.4.1. They have released newer lists of bugs to lkml but when I released bug lists it was not as effective as a database (especially for long lists).

I am excited about the possibilities for Smatch to improve the kernel. The code for the Smatch dereference check is about 130 lines long. A lot of that was cut and paste. Any of the other four checks should be possible with Smatch as well.

Stanford CHECKER?

Posted Feb 20, 2003 13:48 UTC (Thu) by wh (guest, #9477) [Link]

I believe that the only reason why Linux has so few of these
computer-findable bugs is that the people from the Stanford "Checker"
project already ran their program on the Linux source and all the bugs
found by it got fixed afterwards.

Stanford CHECKER?

Posted Feb 20, 2003 14:33 UTC (Thu) by haraldt (guest, #961) [Link]

Still, if the Linux kernel is one of the few to have been tested this way, what does this tell us?

Stanford CHECKER?

Posted Feb 20, 2003 18:45 UTC (Thu) by ncm (subscriber, #165) [Link]

Exactly: Linux came out better because it attracts public and academic scrutiny, and improves as a direct result of that scrutiny. It's not because the coders are more competent, or motivated. Sometimes they are, sometimes they aren't. The difference is in how it improves.

Stanford CHECKER?

Posted Feb 20, 2003 19:44 UTC (Thu) by wh (guest, #9477) [Link]

True, but the story's author interprets the results of Reasoning's findings differently. Just because Linux has fewer memory leaks (found by the Stanford Checker) than other operating systems doesn't mean that Linux has fewer race conditions (not found by the Stanford Checker) than other operating systems.

Race conditions

Posted Feb 23, 2003 8:36 UTC (Sun) by ncm (subscriber, #165) [Link]

Linux also has many more people looking for race conditions in it than any of the proprietary systems that they compared it to. Therefore, we can reasonably expect it to compare favorably in that area vs. the other systems.

This is not to say Linux would show well, on an absolute scale, in its number of remaining race conditions. It probably stinks. The proprietary stacks, though, probably stink far, far worse, for exactly the same reasons that they leak worse.


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds