Comparing free and proprietary software defect rates
[Posted February 12, 2003 by corbet]
[This article was contributed by Joe 'Zonker'
Brockmeier]
Tuesday a company called
Reasoning,
Inc. released a study that seems to prove what Open Source
developers have been saying for years: Open code, and the inspection
that it allows, produces a better product. Specifically, the company
compared the Linux TCP/IP stack against a number of commercial TCP/IP
stacks and found that the Linux implementation had fewer defects than
the proprietary implementations.
The paper, "How Open-Source and Commercial Software Compare," is
available from Reasoning by request, so we decided to take a look at it
to see how they had reached their conclusions.
Specifically, Reasoning lined up the Linux TCP/IP implementation from
the 2.4.19 Linux kernel against five commercial implementations. In
total, out of 81,852 lines of code, Reasoning found only 8 defects in
the Linux TCP/IP code. All but one of the five implementations
compared with Linux were at least ten years old; the other is about
three years old. The company did not name the specific operating
systems, but Reasoning's CEO Scott Trappe confirmed that two were
commercial Unix systems, one was "not Unix but in very broad use," and
the other two were embedded implementations by "major vendors of
networking equipment." Trappe said that Reasoning couldn't name the
companies, but that they had agreed to let Reasoning use the
aggregate data.
As always, it helps to understand the company doing the research, and
the context of the research, before taking the results too seriously. We
spoke with Trappe to clarify some information not in the white paper
and to get a feel for Reasoning's background. Reasoning has specialized
in automated testing of software written in C/C++ since 2001. Prior to
that, the company had
specialized in Y2K testing. The company plans to add testing of Java
software to its services later this year.
The study was not commissioned by any of the Linux vendors or companies
who might be competing with Linux. Instead, Trappe said that the company
had performed the study primarily to highlight its services. Unlike
with its other projects, Reasoning was free to release its results
here, along with specific code examples from the Linux TCP/IP
stack. Trappe also said that the company was looking to prove that
inspection itself was important in providing quality software and that
"testing alone can never uncover all the defects in software."
The company chose the TCP/IP stack because it provided a good point of
comparison. Trappe admitted that drawing broad conclusions from a study
of one piece of software might be a stretch, but said that, for Open
Source software, the
study "does support some claims that it can rival commercial quality."
Trappe also mentioned that the company may do further studies in the
future comparing Open Source software to commercial software.
The company looks for five kinds of defects in code: memory leaks, null
pointer dereferences, bad deallocations, out-of-bounds array accesses and
uninitialized variables. According to Trappe, none of the errors found
in the Linux TCP/IP stack were security issues. At least one of the
issues, a memory leak, was fixed in the 2.4.20 kernel before Reasoning
notified the kernel team of the defects. Four of the problems found (an
uninitialized variable and some out-of-bounds errors) are not
truly defects, since they do not cause the code to behave incorrectly.
So, of eight defects reported, four are not real, three are
debatable and one has been fixed.
Taking the revised information into account, the Linux TCP/IP stack
has a defect density of 0.013 per 1,000 lines of code. The
implementation with the fewest defects after Linux is one of the
embedded stacks, with 0.08 defects per 1,000 lines of code. One of the
commercial OSes had 183 defects in about 269,100 lines of code, or
roughly 0.7 per thousand.
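For readers who want to check the arithmetic, a defect density is just
the defect count divided by the line count in thousands. What follows is
a minimal Python sketch using the figures quoted in this article. The
function name is our own invention, and treating the revised Linux count
as a single remaining defect is an assumption, since the figures above
do not spell out exactly which defects the revised density includes.

```python
def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Return the number of defects per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects * 1000 / lines_of_code


# Figures quoted in the article.
LINUX_LINES = 81_852
LINUX_REPORTED_DEFECTS = 8
COMMERCIAL_LINES = 269_100   # "about 269,100" in the article
COMMERCIAL_DEFECTS = 183

# Assumption: only one of the eight reported defects is counted
# after the revisions described in the article.
LINUX_REVISED_DEFECTS = 1

linux_reported = defects_per_kloc(LINUX_REPORTED_DEFECTS, LINUX_LINES)
linux_revised = defects_per_kloc(LINUX_REVISED_DEFECTS, LINUX_LINES)
commercial = defects_per_kloc(COMMERCIAL_DEFECTS, COMMERCIAL_LINES)

print(f"Linux, as reported:       {linux_reported:.3f} per KLOC")
print(f"Linux, one defect left:   {linux_revised:.3f} per KLOC")
print(f"Commercial OS (183 bugs): {commercial:.3f} per KLOC")
```

Under that assumption the revised Linux figure comes out near 0.012,
close to the 0.013 quoted above; the small gap may come from rounding
or from a slightly different line count. The commercial figure rounds
to the 0.7 per thousand given in the text.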
To be sure, the Reasoning study raises some interesting points, though
there's not enough data to say conclusively that Open Source software is
always of higher quality than its proprietary counterparts. The study
looked only at one small piece of the Linux kernel, and considered only
a narrow set of defect types. The Linux kernel has also been extensively
checked for this sort of error by the Stanford checker and the new "smatch"
program, so it should be relatively clean.
Reasoning's study says nothing about
performance or features, and it does not address the
functionality of the code. However, it does supply some data in favor of
the argument that open code leads to higher quality -- at least in terms
of specific defects.
We'll be interested to see what kinds of studies Reasoning does in the
future, and how other Open Source projects compare to commercial code.