Your complaints are bogus. They were discussing a potential problem and a potential fix.
Linus proposed a fix and admitted that he hadn't tested it yet. Later it was decided that
there was no actual kernel bug in this case, just poorly-written userspace code. There was
never any chance of this untested patch getting into the kernel. Context matters.
Posted Jul 17, 2008 21:40 UTC (Thu) by zooko (subscriber, #2589)
Of course context matters -- this off-handed quip from Linus is not the issue, it is only
indicative of the issue. My impression of the issue itself comes mostly from reading LWN.net
itself cover-to-cover (excepting the "Linux in the News" and "Announcements" pages) for about
a decade, as well as lots of other general experience and reading since I started using Linux
in 1993.
If you have read the LWN kernel pages carefully for the last couple of years and you *don't*
think that my characterization of how the Linux development and QA process works is accurate,
then I'd like to know how your view of that process differs from mine.
Here, let me point to some of the most recent and most significant LWN.net articles and LKML
posts:
http://lwn.net/Articles/285088/
http://lwn.net/Articles/281109/
http://lwn.net/Articles/277872/
http://lwn.net/Articles/281132/
http://lwn.net/Articles/277967/
http://lwn.net/Articles/277970/
http://lwn.net/Articles/278149/
Oh, and of course the Linus Torvalds quote that started this thread.
As far as I can tell, my assertion that the kernel QA process relies on users reporting bugs
is not disputed by anyone. Andrew Morton's repeated suggestions that the result of this
process might be that new Linux kernels are buggier than old ones are disputed. My assertion
that a QA process like this one *ought to be expected* to reduce the incidence of
99-percentile bugs while continually increasing the number of rarer bugs is new (as far as I
know).
the Linux process for generating many rare flaws
Posted Jul 19, 2008 21:10 UTC (Sat) by njs (subscriber, #40338)
>As far as I can tell, my assertion that the kernel QA process relies on users reporting bugs
>is not disputed by anyone. Andrew Morton's repeated suggestions that the result of this
>process might be that new Linux kernels are buggier than old ones are disputed. My assertion
>that a QA process like this one *ought to be expected* to reduce the incidence of
>99-percentile bugs while continually increasing the number of rarer bugs is new (as far as I
>know).
I don't actually understand the reasoning behind your claim. By definition, rare, subtle bugs
are rare and subtle; any QA process is going to have trouble finding them. There's no a
priori reason to think that just changing how you do QA will let you find more of them. I
guess you're suggesting that specifically, a more "professional" QA process -- paid testers
probing edge cases, large automated regression tests, etc. -- will do an incrementally better
job at finding these bugs?
It could be true, but it isn't obvious to me. For one thing, many of those "users" you refer
to actually *are* corporations that do exactly this sort of testing. (RH, IBM, Google, ...)
Then there are the security researchers who pore over the code for obscure interactions in hope
of fame, the general userbase who has far more permutations of weird hardware than any formal
QA team could hope to access, etc.
If there's a problem with kernel quality, I don't think it's with the bug discovery process;
every meaningful bug gets tripped over by somebody. (By definition, because who cares about
bugs that never affect anybody? :-)) I'd look instead to the processes whereby these bugs --
once found -- are reported, tracked, analyzed, and fixed. That's the hard part...
the Linux process for generating many rare flaws
Posted Jul 19, 2008 23:18 UTC (Sat) by nix (subscriber, #2304)
The problem is the bugs with security implications that nobody spots *but*
the blackhats :( they're hardly going to tell anyone about it (other than
other blackhats), but they'll use the knowledge, oh yes.
the Linux process for generating many rare flaws
Posted Jul 20, 2008 0:26 UTC (Sun) by njs (subscriber, #40338)
Oh, sure, that's a problem -- though I'd guess one quiet source of bug reports is people
watching black hat message boards -- but how is it a problem unique to Linux's
development/QA process?
the Linux process for generating many rare flaws
Posted Jul 20, 2008 11:06 UTC (Sun) by nix (subscriber, #2304)
It isn't, of course. If anything we can get better metrics on bug density,
because the source is open and all bugfixes to public trees are visible (even
if they aren't marked up as 'security' all the time, they're all bugs and
from the POV of stopping them happening it doesn't matter if they're a
security bug or a 'mere' data-corruption bug).
On that note, am I the only one who considers it utterly bizarre that
crash bugs are considered by some more serious than data-corruption bugs
merely because some of the crash bugs are remotely-triggerable, while
data-corruption bugs rarely are? The consequences of data-corruption bugs
are so much worse, yet the crash bugs are 'security holes! patch now!'...
the Linux process for generating many rare flaws
Posted Jul 20, 2008 18:08 UTC (Sun) by zooko (subscriber, #2589)
(About whether the Linux kernel development process is more likely to introduce security holes
than alternative development processes.)
"It isn't, of course."
What -- where do you get your confidence? I think that it is plausible that the Linux kernel
development process produces more bugs and security holes than alternative processes, such as
for example the way that OpenBSD or Solaris are developed. (I also think that the Linux
development process produces new features and improvements faster than the OpenBSD process
does.)
I'm not entirely confident of this -- I could be wrong. But how did you become so confident
of the opposite hypothesis?
the Linux process for generating many rare flaws
Posted Jul 20, 2008 18:56 UTC (Sun) by nix (subscriber, #2304)
The question was whether the problem was *unique* to Linux's development
process. Of course it isn't. Proprietary systems have security holes too.
You don't need 'confidence' to know that.
the Linux process for generating many rare flaws
Posted Jul 20, 2008 22:01 UTC (Sun) by njs (subscriber, #40338)
I read "it isn't, of course" as responding to my question about whether black-hat scrutiny was
something unique to Linux's development process. These threads get a little spread out...
I would still be curious to hear your response to my original post, because a priori I don't
see why any one of Linux/Solaris/OpenBSD's models should be better. (Actually, I don't have a
lot of confidence in OpenBSD myself, because I've gotten the impression that in general it's
buggier -- probably just due to lack of manpower, and prioritizing security features
proportionately higher than non-security testing and bugfixes. And I don't like non-security
bugs much better than security bugs.)
the Linux process for generating many rare flaws
Posted Jul 21, 2008 3:52 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246)
I think the point is that an ad hoc testing process can only go so far. Without a more
directed testing process that explicitly targets corner conditions, you can only say you've
found and fixed the common cases, for an ever expanding definition of "common."
If you have a system with very tight reliability requirements (think, for instance, medical
equipment, ABS brake controllers, life support systems), such a model is inappropriate.
I do disagree with the notion that the number of bugs above a certain level of rarity
monotonically increases with time. If that were true, in the limit you'd end up with a
system that fails every time it's used, and in a completely different way each time. No
repeatable bugs, but rather a custom bug every time. That defies belief.
Of course that's a ridiculous endpoint, but we have occasionally seen the precursors to that
in localized areas that start gaining a reputation of being pervasively flaky. (Linux's VM
some years ago, for example.)
When a subsystem starts developing behavior that's too subtle and too quirky to adequately
analyze and diagnose, it gets reworked. Rather than having its bugs fixed one at a time, it
gets replaced wholesale. We've seen that a number of times with varying levels of success.
The main point is that any quirks that were accumulating there have now been flushed away.
the Linux process for generating many rare flaws
Posted Jul 21, 2008 6:19 UTC (Mon) by nix (subscriber, #2304)
Quite so. I suspect this will always be true for something like the
kernel. In a sense it's true even for perfectly reproducible systems like
compilers, but with the kernel, half the things can't have testsuites
because they require specific hardware, and the other half can't have
complete enough testsuites because a lot of the bugs are races or
otherwise timing-dependent and can't be reliably reproduced. (The mm
subsystem is a classic example of the latter.)
Filesystems can *probably* have good testsuites, but I'm at a loss with
respect to most of the rest.
the Linux process for generating many rare flaws
Posted Jul 21, 2008 7:22 UTC (Mon) by njs (subscriber, #40338)
IIRC iptables has a build system setup where they link the same code into either the kernel or
a user-space test harness... but that's just another example of the filesystem case.
the Linux process for generating many rare flaws
Posted Jul 24, 2008 13:06 UTC (Thu) by zooko (subscriber, #2589)
Hello njs:
I see that I didn't explain my thinking very clearly.
I think that there are a few ways that you could have a development process that produces
fewer rare, subtle bugs. They probably come at the cost of also producing good code slower.
One of these is relying more on code inspection and less on experiment for bug detection.
Then there is "writing more carefully" in the first place, which is, I suppose,
just like code inspection done by the original author. That one presumably means you get less
written per day.
Another is emphasizing automated tests vs. manual tests. I really like the way that in the
Twisted Python project, it is pretty hard to get the developers to fix a bug without
submitting a unit test that deterministically demonstrates that bug. The Twisted project, as
a policy (more or less) doesn't apply patches to code that isn't tested, and doesn't add
features or fix bugs unless the feature or bug is exercised by a unit test.
Another is "designing for testability". I work with an excellent programmer named Brian
Warner and I sometimes see him do something like "Hm, doing it this way would have plenty of
good properties, but I can't figure out how to write a unit test that would exercise all
of that code in a deterministic way. I think I'll do it a different way."
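One common way to design timing-dependent code for testability is to inject the clock instead of reading the system time directly. The sketch below is a hypothetical Python example (the `Lease` and `FakeClock` names are invented here); Twisted ships a similar helper, `twisted.internet.task.Clock`, for reactor-based code. It only illustrates the general technique, not anything from the project Brian Warner and I work on.

```python
import time


class Lease:
    """Hypothetical lease that expires `duration` seconds after it is granted.

    The clock is passed in rather than hard-wired to time.monotonic(), so a
    test can control time and check expiry deterministically instead of
    sleeping and hoping the timing works out.
    """

    def __init__(self, duration, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + duration

    def is_expired(self):
        return self._clock() >= self._expires_at


class FakeClock:
    """Hypothetical test clock that only moves when the test advances it."""

    def __init__(self, now=0):
        self.now = now

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def test_lease_expiry():
    clock = FakeClock()
    lease = Lease(30, clock=clock)
    assert not lease.is_expired()
    clock.advance(29)  # one second before the deadline
    assert not lease.is_expired()
    clock.advance(1)   # exactly at the deadline
    assert lease.is_expired()


if __name__ == "__main__":
    test_lease_expiry()
    print("ok")
```

In production the default `time.monotonic` is used; only the test swaps in the fake clock, so the same code path is exercised both ways.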
Some of these may be inappropriate for kernel code (Twisted Python, and the project that Brian
Warner and I work on, are both free of the obligation to deal with hardware, for example). Or
some of them may be appropriate for kernel code in general, but would be inappropriate for
Linux's goals of rapid evolution and high performance. (For example, OpenBSD apparently does
extensive code review.) But I hope that these give you some ideas about what I am thinking of
as possible alternatives.
In general, I have the suspicion that any development process that emphasizes producing code
quickly in Step 1, and then QA'ing it later in Step 2, is likely to add subtle rare bugs, as
contrasted with a more "integrated" development process in which specific techniques intended to
prevent subtle rare bugs are part of Step 1. This is doubly true when Step 2 is largely ad
hoc, i.e. it is not systematic or automated.
This general strategy -- rely on lots of users and downstream distributors and the like to do
lots of manual testing for you -- seems to be a core part of the Linux development process
culture. (You can always tell that an idea has a profound effect on a culture when the
members of that culture think that there is no possible alternative. :-))
I do wish that there were a keen technical and cultural observer like Jon Corbet who wrote
detailed analyses of the development processes of other operating systems, starting with
OpenBSD and Solaris.
the Linux process for generating many rare flaws
Posted Jul 30, 2008 17:01 UTC (Wed) by mcortese (guest, #52099)
>In general, I have the suspicion that any development process that emphasizes producing code
>quickly in Step 1, and then QA'ing it later in Step 2, is likely to add subtle rare bugs, as
>contrasted with a more "integrated" development process in which specific techniques intended to
>prevent subtle rare bugs are part of Step 1.
What you say is called "First Time Quality" by lean production specialists. They, like you, seem pretty sure that doing it right the first time is better than doing it quickly and then refining it later.
I've been looking for a proof of that for a long time, but to date with no success.