I think the point is that an ad hoc testing process can only go so far. Without a more
directed testing process that explicitly targets corner conditions, you can only say you've
found and fixed the common cases, for an ever expanding definition of "common."
If you have a system with very tight reliability requirements (think, for instance, medical
equipment, ABS brake controllers, life support systems), such a model is inappropriate.
I do disagree with the notion that the number of bugs above a certain level of rarity
monotonically increases with time. If that were true, in the limit you'd end up with a
system that fails every time, in a completely different way every time it's used. No
repeatable bugs, but rather a custom bug every time. That defies belief.
Of course that's a ridiculous endpoint, but we have occasionally seen the precursors to that
in localized areas that start gaining a reputation for being pervasively flaky. (Linux's VM
some years ago, for example.)
When a subsystem starts developing behavior that's too subtle and too quirky to adequately
analyze and diagnose, it gets reworked. Rather than having its bugs fixed one by one, the
subsystem gets replaced wholesale. We've seen that a number of times, with varying levels of
success. The main point is that any quirks that had been accumulating there have now been
flushed away.
Posted Jul 21, 2008 6:19 UTC (Mon) by nix (subscriber, #2304)
Quite so. I suspect this will always be true for something like the
kernel. In a sense it's true even for perfectly reproducible systems like
compilers, but with the kernel, half the things can't have testsuites
because they require specific hardware, and the other half can't have
complete enough testsuites because a lot of the bugs are races or
otherwise timing-dependent and can't be reliably reproduced. (The mm
subsystem is a classic example of the latter.)
Filesystems can *probably* have good testsuites, but I'm at a loss with
respect to most of the rest.
the Linux process for generating many rare flaws
Posted Jul 21, 2008 7:22 UTC (Mon) by njs (subscriber, #40338)
IIRC iptables has a build system setup where they link the same code into either the kernel or
a user-space test harness... but that's just another example of the filesystem case.
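For anyone who hasn't seen the pattern, here is a rough sketch of how one source file could be
built both into the kernel and into a user-space test harness. This is not iptables' actual
code or build setup (I haven't checked how they do it); struct port_rule, rule_matches() and
run_selftest() are names invented purely for illustration. The only assumption about the real
world is that kernel builds define __KERNEL__ and get u8/u16 from <linux/types.h>.

```c
/* Hypothetical sketch: one source file, two build targets. */
#ifdef __KERNEL__
#include <linux/types.h>        /* kernel build: u8 and u16 come from here */
#else
#include <assert.h>             /* user-space build: used by the harness below */
#include <stdint.h>
typedef uint8_t  u8;            /* stand-ins for the kernel's fixed-width types */
typedef uint16_t u16;
#endif

/* Invented rule type: a protocol number plus an inclusive port range. */
struct port_rule {
    u8  proto;
    u16 port_lo;
    u16 port_hi;
};

/*
 * Pure matching logic with no kernel dependencies, so the same object
 * code can be linked into either target. Returns 1 on match, 0 otherwise.
 */
int rule_matches(const struct port_rule *r, u8 proto, u16 dport)
{
    if (r->proto != proto)
        return 0;
    return dport >= r->port_lo && dport <= r->port_hi;
}

#ifndef __KERNEL__
/*
 * User-space harness: exercises the boundaries of the range directly,
 * with no need to boot a kernel or push real packets through it.
 * Aborts on the first failed check; returns 0 if all checks pass.
 */
int run_selftest(void)
{
    const struct port_rule r = { 6, 1000, 2000 };   /* proto 6, ports 1000-2000 */

    assert(rule_matches(&r, 6, 1000));      /* lower bound is inclusive */
    assert(rule_matches(&r, 6, 2000));      /* upper bound is inclusive */
    assert(!rule_matches(&r, 6, 999));      /* just below the range */
    assert(!rule_matches(&r, 6, 2001));     /* just above the range */
    assert(!rule_matches(&r, 17, 1500));    /* wrong protocol */
    return 0;
}
#endif
```

This only works for logic that can be separated from hardware and timing, which is why it's
really the filesystem case again and doesn't help with races in something like the mm code.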