Hello njs:
I see that I didn't explain my thinking very clearly.
I think that there are a few ways that you could have a development process that produces
fewer rare, subtle bugs. They probably come at the cost of also producing good code more slowly.
One of these is relying more on code inspection and less on experiment for your
bug detection. Then there is "writing more carefully" in the first place, which is, I suppose,
just like code inspection done by the original author. That one presumably means you get less
written per day.
Another is emphasizing automated tests vs. manual tests. I really like the way that in the
Twisted Python project, it is pretty hard to get the developers to fix a bug without
submitting a unit test that deterministically demonstrates that bug. The Twisted project, as
a policy (more or less), doesn't apply patches to code that isn't tested, and doesn't add
features or fix bugs unless the feature or bug is exercised by a unit test.
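To make that concrete, here is a minimal sketch of what a test that "deterministically demonstrates the bug" might look like. It is not taken from Twisted: the helper split_host_port and the bug it guards against (crashing on an address with no port) are hypothetical, and it uses the standard library's unittest rather than Twisted's own test runner.

```python
import unittest


def split_host_port(address, default_port=80):
    """Split 'host:port' into (host, port). Hypothetical helper for illustration."""
    host, sep, port = address.rpartition(":")
    if not sep:
        # The imagined bug: an earlier version crashed when no port was given.
        # The fix is to fall back to the default port.
        return address, default_port
    return host, int(port)


class SplitHostPortRegressionTest(unittest.TestCase):
    def test_address_without_port_uses_default(self):
        # Reproduces the imagined bug report with a fixed input, so the
        # test fails every time on the broken code and passes on the fix.
        self.assertEqual(split_host_port("example.org"), ("example.org", 80))

    def test_address_with_port(self):
        # Guards the behaviour that already worked before the fix.
        self.assertEqual(split_host_port("example.org:8080"), ("example.org", 8080))
```

Saved in a file, this can be run with python -m unittest. The point is that the test depends only on its own inputs, not on timing, network state, or a human retrying things by hand.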
Another is "designing for testability". I work with an excellent programmer named Brian
Warner and I sometimes see him do something like "Hm, doing it this way would have plenty of
good properties, but I can't figure out how to write a unit test that would exercise all
of that code in a deterministic way. I think I'll do it a different way."
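One common form of this trade-off, sketched under my own assumptions (this is not Brian's code, and ExpiringCache and FakeClock are hypothetical names), is to pass the clock in as a parameter instead of reading the system time directly. The design is slightly less direct, but the expiry path can then be exercised deterministically:

```python
import time


class ExpiringCache:
    """Cache whose entries expire after ttl_seconds. Hypothetical example."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        # The clock is injected so that tests can control the passage of time.
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return value


class FakeClock:
    """Manually advanced clock for tests."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def test_entry_expires_after_ttl():
    clock = FakeClock()
    cache = ExpiringCache(ttl_seconds=10, clock=clock)
    cache.put("k", "v")
    clock.advance(9)
    assert cache.get("k") == "v"   # still within the TTL
    clock.advance(1)
    assert cache.get("k") is None  # exactly at the TTL, so expired
```

A version that called time.monotonic() directly inside get() would be a little shorter, but testing expiry would then mean sleeping in the test and hoping the timing works out.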
Some of these may be inappropriate for kernel code (Twisted Python, and the project that Brian
Warner and I work on, are both free of the obligation to deal with hardware, for example). Or
some of them may be appropriate for kernel code in general, but would be inappropriate for
Linux's goals of rapid evolution and high performance. (For example, OpenBSD apparently does
extensive code review.) But I hope that these give you some ideas about what I am thinking of
as possible alternatives.
In general, I have the suspicion that any development process that emphasizes producing code
quickly in Step 1, and then QA'ing it later in Step 2, is likely to add subtle rare bugs, as
contrasted with a more "integrated" development process in which specific techniques intended to
prevent subtle rare bugs are part of Step 1. This is doubly true when Step 2 is largely ad
hoc, i.e., it is not systematic or automated.
This general strategy -- rely on lots of users and downstream distributors and the like to do
lots of manual testing for you -- seems to be a core part of the Linux development process
culture. (You can always tell that an idea has a profound effect on a culture when the
members of that culture think that there is no possible alternative. :-))
I do wish that there were a keen technical and cultural observer like Jon Corbet who wrote
detailed analyses of the development processes of other operating systems, starting with
OpenBSD and Solaris.
Posted Jul 30, 2008 17:01 UTC (Wed) by mcortese (guest, #52099)
In general, I have the suspicion that any development process that emphasizes producing code
quickly in Step 1, and then QA'ing it later in Step 2, is likely to add subtle rare bugs, as
contrasted with a more "integrated" development process in which specific techniques intended to
prevent subtle rare bugs are part of Step 1.
What you describe is called "First Time Quality" by lean production specialists. They, like you, seem pretty sure that doing it right the first time is better than doing it quickly and then refining it later.
I've been looking for a proof of that for a long time, but to date with no success.