Reasoning Inc. has released the
results of a study of MySQL. "Reasoning's inspection study
shows that the code quality of MySQL was six times better than that of
comparable proprietary code. A key quality indicator is defect density,
which is defined as the number of defects found per thousand lines of
source code. In its latest study, Reasoning found 21 software defects in
236,000 lines of MySQL source code. The defect density of the MySQL code
was 0.09 defects per thousand lines of source code. Using a benchmark that
covered over 200 recent projects totaling 35 million lines of commercial
code, Reasoning found that the commercial average defect density of these
projects came to 0.57 defects per thousand lines of source code."
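The arithmetic behind the quoted figures can be checked with a minimal sketch. The function name `defect_density` is an assumption made for illustration; the numbers are the ones quoted above.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per thousand lines of source code."""
    return defects / (lines_of_code / 1000)


# Figures quoted in the press release.
mysql_density = defect_density(21, 236_000)
commercial_density = 0.57  # reported benchmark average, per thousand lines

# Ratio of the commercial average to the MySQL figure.
ratio = commercial_density / mysql_density

print(round(mysql_density, 2))  # 0.09
print(round(ratio, 1))          # 6.4
```

The ratio comes out a little above six, which matches the "six times better" claim.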
Reasoning Study Reveals Code Quality of MySQL
Posted Dec 15, 2003 17:55 UTC (Mon) by jamienk (subscriber, #1144)
Does this mean that if MySQL releases a version 4.01.r with all 21 "defects" fixed, a new Reasoning study would find that this version has 0 defects per thousand lines of source code?
Reasoning Study Reveals Code Quality of MySQL
Posted Dec 15, 2003 18:04 UTC (Mon) by dwheeler (subscriber, #1216)
Yes, changing the code to remove these bugs would mean that there would be "0 bugs" reported by the tool -- unless new problems are inserted. But that would be true for the proprietary programs too. The comparison of open source software vs. proprietary software is only interesting as a comparison if NEITHER used Reasoning's tool before, and then the tool is used to examine both. Since there are defects identified in both sets, that's probably true.
There are certainly going to be defects missed by the tool, and the tool probably has false positives too (it did last time).
However, if the tool is used the same way on both sets there's still a case to be made that this is a valid test. It's certainly evidence suggesting a link.
Note that there was an earlier study comparing Linux's TCP/IP stack to other stacks using the same tool, with similar results.
Definition of Defects
Posted Dec 15, 2003 18:07 UTC (Mon) by ncm (subscriber, #165)
As with the previous press release from Reasoning, Inc., it's
worth pointing out that their measures of defect density are
defined by what their tool flags. This sort of tool is
subject to high rates of false positives, which means its
output must be vetted thoroughly to discover the real faults
among the "defects" it identifies. Such tools also have an even
higher rate of false negatives, which are faults in program
logic or specification that the tool can know nothing about.
Neither failing cripples it as a tool for improving code
quality, but they do make statistics based on its output hard to
interpret. It may be that MySQL was coded with better discipline,
or it may be that MySQL authors are more clever at burying faults
in their code. Most likely, it means somebody has been running
valgrind on MySQL and combing out the bugs it reveals, and also
the sources of spurious noise in its output. While the latter
aren't faults, they are likely to show up as defects in source code
analysis.
Thus, the evidence for quality that Reasoning reports may be indirect.
Rather than reporting the true fault density, they are reporting on
the degree to which the project uses methods other than ordinary
testing to improve quality.
The true value of tools like Reasoning's, and of very different tools
like valgrind, as well as of traditional unit testing, lies not so
much in the actual bugs they reveal, but in how they call attention
to problem areas in the code. Bugs come in swarms, so that if either
tool detects a defect, there are probably real faults nearby, even if
not at the exact point the tool identifies.
Definition of Defects
Posted Dec 15, 2003 18:49 UTC (Mon) by JoeBuck (subscriber, #2330)
Reasoning doesn't just run code through their tool and report what it says; human beings pre-filter the results before they write them up. This means that the reported MySQL defects are those that the Reasoning folks believe are real flaws, not false positives. The fact that they operate this way, rather than selling their tool as a product, possibly means either that the tool needs handholding, or produces large numbers of false positives, or is based partly on GPL code (meaning that the only way to keep their work proprietary is not to distribute it), or some combination.
Of course, any false negatives (flaws not detected by their methodology) remain.
Definition of Defects
Posted Dec 15, 2003 21:05 UTC (Mon) by ncm (subscriber, #165)
I seem to recall from their analysis of the Linux TCP/IP
stack that, of the defects they identified, only one was considered
"real" by the kernel hackers. Maybe I'm hallucinating. Or maybe
Reasoning and the kernel hackers disagreed about what amounts to
a real defect.
I didn't realize that Reasoning doesn't release their tool
to customers. That's an interesting business model.
Definition of Defects
Posted Dec 15, 2003 18:52 UTC (Mon) by JoeBuck (subscriber, #2330)
Tools like valgrind can only find a flaw if you have a testcase that tweaks the flaw. If the code contains a buffer overflow, valgrind won't help you find it if no input testcase overflows the buffer.
Static analysis, on the other hand, can find many buffer overflows without any test cases.
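A small sketch of the point about test cases, written in Python as an analogy: Python raises `IndexError` where C would silently write past the end of the buffer. The names `copy_into`, `triggers_flaw`, and `BUF_SIZE` are hypothetical. The unchecked write is only observed at run time when a test input is longer than the buffer, so a test suite made of short inputs never exposes it.

```python
BUF_SIZE = 8  # hypothetical fixed buffer size


def copy_into(buf: bytearray, data: bytes) -> None:
    """Copy data into buf with no length check (the latent flaw)."""
    for i, byte in enumerate(data):
        buf[i] = byte  # out of range once len(data) > len(buf)


def triggers_flaw(data: bytes) -> bool:
    """Return True if this test input exposes the flaw at run time."""
    buf = bytearray(BUF_SIZE)
    try:
        copy_into(buf, data)
    except IndexError:
        return True
    return False


print(triggers_flaw(b"short"))          # False: the flaw stays hidden
print(triggers_flaw(b"much too long"))  # True: only this input reveals it
```

A static checker, by contrast, could in principle flag the missing bound check in `copy_into` by reading the loop alone, without any input at all.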
Reasoning Study Reveals Code Quality of MySQL
Posted Dec 15, 2003 19:18 UTC (Mon) by leandro (guest, #1460)
Do they mention that less complexity correlates with higher quality, and that MySQL is simple because it shifts code into your application?
Theoretically, another DBMS such as PostgreSQL could have lower quality due to higher complexity, yet a deployed system could be of higher quality due to a simpler application.
Reasoning Study Reveals Code Quality of MySQL
Posted Dec 15, 2003 20:12 UTC (Mon) by lakeland (subscriber, #1157)
Not really. These tests could be roughly summarised as checking for ugly code. Specifically, code that appears buggy to both a machine and a human.
I think that open source programmers know their code is going to be seen by others, so they don't tend to write as much ugly-looking code. I don't think the result says very much about real defects or MySQL vs. PostgreSQL, etc.
Reasoning Study Reveals Code Quality of MySQL
Posted Dec 18, 2003 13:54 UTC (Thu) by dps (subscriber, #5725)
Before you can prove a program defect-free, you need a specification. Database tables could be characterised as partial functions that map their primary keys to data. Searches are inverses of those functions applied to a target set, etc.
Once you have a specification you *can* prove a program implements it and a complete proof guarantees no bugs. Typical examples display a trivial program and proceed to take several pages to prove it works. 200,000-line programs could in theory be proved correct too, but it is obviously impractical to do so.
Simpler analysis might be sufficient to identify bugs like dereferencing NULL, use of uninitialised data, etc., and it might be possible to automate it. I assume this is what Reasoning Inc.'s tool does, and it will miss things like the code not performing the right function.
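The partial-function view of tables mentioned above can be sketched in a few lines. This is only an illustration under assumptions: the table contents, the `Row` layout, and the names `lookup` and `search` are hypothetical, and a Python dict stands in for the partial function from primary keys to rows.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Row = Tuple[str, int]  # hypothetical columns: (name, age)

# The table as a partial function: defined only for keys that are present.
table: Dict[int, Row] = {
    1: ("alice", 30),
    2: ("bob", 25),
    3: ("carol", 30),
}


def lookup(key: int) -> Optional[Row]:
    """Apply the partial function; None where it is undefined."""
    return table.get(key)


def search(column: Callable[[Row], object], targets: Set[object]) -> Set[int]:
    """Inverse image: the primary keys whose column value lies in targets."""
    return {key for key, row in table.items() if column(row) in targets}


print(lookup(2))                          # ('bob', 25)
print(lookup(9))                          # None
print(search(lambda row: row[1], {30}))   # {1, 3}
```

A specification in this style would state properties such as "every key returned by `search` maps to a row whose column value is in the target set", which is the kind of statement a proof would then have to establish for the real implementation.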
Reasoning Study Reveals Code Quality of MySQL
Posted Dec 22, 2003 14:26 UTC (Mon) by shane (subscriber, #3335)
"Once you have a specification you *can* prove a program implements it and a complete proof guarantees no bugs."
The problem with this assertion is that specifications can be no better than the requirements. There is an inverse relationship between the ability of humans to understand the requirements, and the rigorousness of the language used to write them. To say this another way, people can't easily understand how pages full of algebraic symbols relate to their real-world needs.
People can understand natural language (that is English, Russian, etc.), and most requirements are written in a semi-formal natural language style. But finding inconsistencies and ambiguities in this sort of framework is difficult - certainly not something you can run through your favourite Prolog interpreter and have it spit out problems!
Domain specialists (which is a fancy way to say "people who know about the subject matter, rather than software") can be trained to understand the language of formal proofs, but the cost of this is so high that very, very few projects bother with this.