Comparing free and proprietary software defect rates
[This article was contributed by Joe 'Zonker' Brockmeier]
On Tuesday, a company called Reasoning, Inc. released a study that seems to prove what Open Source developers have been saying for years: open code, and the inspection that it allows, produces a better product. Specifically, the company compared the Linux TCP/IP stack against a number of commercial TCP/IP stacks and found that the Linux implementation had fewer defects than the proprietary implementations. The paper, "How Open-Source and Commercial Software Compare," is available from Reasoning by request, so we decided to take a look at it to see how they had reached their conclusions.
Reasoning lined up the Linux TCP/IP implementation from the 2.4.19 Linux kernel against five commercial implementations. In total, out of 81,852 lines of code, Reasoning found only eight defects in the Linux TCP/IP code. All but one of the five implementations compared with Linux were at least ten years old; the other is about three years old. The company did not name the specific operating systems, but Reasoning's CEO Scott Trappe confirmed that two were commercial Unix systems, one was "not Unix but in very broad use," and the embedded implementations were by "major vendors of networking equipment." Trappe said that Reasoning couldn't name companies specifically, but the companies had agreed to let Reasoning use the aggregate data.
As always, it helps to understand the company doing the research, and the context of the research, before taking the results too seriously. We spoke with Trappe to clarify some information not in the white paper and to get a feel for Reasoning's background. Reasoning specializes in automated testing of software written in C/C++, which it has been doing since 2001. Prior to that, the company had specialized in Y2K testing. The company plans to add testing of Java software to its services later this year.
The study was not commissioned by any of the Linux vendors or companies who might be competing with Linux. Instead, Trappe said that the company had performed the study primarily to highlight its services. Unlike with its other projects, Reasoning was free to release its results here, along with specific code examples from the Linux TCP/IP stack. Trappe also said that the company was looking to prove that inspection itself was important in providing quality software and that "testing alone can never uncover all the defects in software."
The company chose the TCP/IP stack because it provided a good point of comparison. Trappe admitted that it might be stretching it to draw too many conclusions from the study of one piece of software, but that their study "does support some claims that it can rival commercial quality." Trappe also mentioned that the company may do further studies in the future comparing Open Source software to commercial software.
The company looks for five kinds of defects in code: memory leaks, null pointer dereferences, bad deallocations, out-of-bounds array accesses and uninitialized variables. According to Trappe, none of the errors found in the Linux TCP/IP stack were security issues. At least one of the issues, a memory leak, was fixed in the 2.4.20 kernel before Reasoning notified the kernel team of the defects. Four of the problems found (an uninitialized variable and some out-of-bounds errors) are not truly defects, since they do not cause the code to behave incorrectly. So, of eight defects reported, four are not real, three are debatable and one has been fixed.
Taking the revised information into account, the Linux TCP/IP stack has a defect density of 0.013 per 1,000 lines of code. The implementation with the fewest defects after Linux is one of the embedded stacks, with 0.08 defects per 1,000 lines of code. One implementation, one of the commercial OSes, had 183 defects out of about 269,100 lines of code, or roughly 0.7 per thousand.
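For readers who want to check the arithmetic, defect density here is simply the defect count divided by the code size in thousands of lines. The short Python sketch below is ours, not Reasoning's; the function name is made up for illustration, and the only inputs are the counts quoted in this article. The white paper's revised Linux figure depends on how many of the eight reports one counts as real, so the last line just shows what a single remaining defect would give, as an assumption.

```python
def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Return the number of defects per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Counts quoted in the article.
linux_as_reported = defects_per_kloc(8, 81_852)
commercial_os = defects_per_kloc(183, 269_100)

# Assumption for illustration only: exactly one of the eight Linux
# reports is counted as a real defect.
linux_one_defect = defects_per_kloc(1, 81_852)

print(f"Linux, all eight reports: {linux_as_reported:.3f} per KLOC")
print(f"Commercial OS:            {commercial_os:.3f} per KLOC")
print(f"Linux, one defect:        {linux_one_defect:.3f} per KLOC")
```

With all eight reports counted, Linux comes out just under 0.1 defects per thousand lines; counting a single defect gives roughly 0.012, in the neighborhood of the revised figure quoted above.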
To be sure, the Reasoning study raises some interesting points, though there's not enough data to say conclusively that Open Source software is always of higher quality than its proprietary counterparts. The study looked only at one small piece of the Linux kernel, and considered only five classes of defect. The Linux kernel has also been extensively checked for this sort of error by the Stanford checker and the new "smatch" program, so it should be relatively clean. Reasoning's study says nothing about performance or features, and it does not address the functionality of the code. However, it does supply some data in favor of the argument that open code leads to higher quality -- at least in terms of specific defects.
We'll be interested to see what kinds of studies Reasoning does in the
future, and how other Open Source projects compare to commercial code.
