LWN.net

Quotes of the week

That said, I didn't actually _test_ my patch. That's what users are for!
-- Linus Torvalds

6924d1ab8b7bbe5ab416713f5701b3316b2df85b is a work of art. Is it ascii-art tetris? a magic eye picture? you decide! It looks even more spectacular in gitk.
-- Dave Jones

[The art of Ingo Molnar]
-- Ingo Molnar

But I could also see the second number as being the "year", and 2008 would get 2.8, and then next year I'd make the first release of 2009 be 2.9.1 (and probably avoid the ".0" just because it again has the connotations of a "big new untested release", which is not true in a date-based numbering scheme). And then 2010 would be 3.0.1 etc..

Anyway, I have to say that I personally don't have any hugely strong opinions on the numbering. I suspect others do, though, and I'm almost certain that this is an absolutely _perfect_ "bikeshed-painting" subject where thousands of people will be very passionate and send me their opinions on why _their_ particular shed color is so much better.

-- Linus Torvalds opens the can of worms
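A minimal sketch of the date-based scheme floated above, assuming the second number is the last digit of the year, the first number starts at 2 for 2000-2009 and bumps once per decade (so 2010 gives 3.0), and the third number counts releases within the year starting at 1 to avoid ".0". The function name `date_based_version` is hypothetical; this only illustrates the proposal and is not how kernel versions are actually assigned.

```python
def date_based_version(year: int, release_in_year: int) -> str:
    """Hypothetical mapping of (year, nth release that year) to a version string.

    Assumptions: second number = last digit of the year; first number = 2 for
    2000-2009, bumped once per decade; third number counts releases within the
    year starting at 1 (no ".0", per the quote above).
    """
    if release_in_year < 1:
        raise ValueError("release numbering starts at 1, not 0")
    major = 2 + (year - 2000) // 10
    minor = year % 10
    return f"{major}.{minor}.{release_in_year}"

print(date_based_version(2009, 1))  # 2.9.1
print(date_based_version(2010, 1))  # 3.0.1
```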

Indeed, I apologise for reviewing the code on a monitor that is wider than yours. If only we could make sure that all Linux developers used smaller monitors then the code quality would surely improve!
-- Herbert Xu

And we should obviously have _a_ version of the firmware available with the kernel when that is possible. But I'd hate for it to be 1:1 with a particular driver version - because at that point it smells of being a single work, and if it is more than mere aggregation it's no longer viable with most of our firmware (I don't think we have source for more than one or two cases).
-- Linus Torvalds

Quotes of the week

Posted Jul 17, 2008 12:01 UTC (Thu) by darwish07 (subscriber, #49520)

Andrew was really right to get mad about code not matching the kernel coding style. I get
**really mad** at project code where I see completely different styles in different files.
Heck, it sometimes even happens in the same source file!

And guess what, there are a lot of those projects, especially among the old userland
packages that are nonetheless core and widely used.

Conforming to coding style

Posted Jul 18, 2008 18:43 UTC (Fri) by giraffedata (subscriber, #1954)

> Andrew was really right to get mad about code not matching the kernel coding style. I get **really mad** at project code where I see completely different styles in different files.

I used to feel that way, until I got involved in open source code and found myself reading code from multiple projects, sometimes within the space of an hour. I got used to it, and now I don't care if the code within a single project is diverse either. I adjust, like the child of bilingual parents.

the Linux process for generating many rare flaws

Posted Jul 17, 2008 18:36 UTC (Thu) by zooko (subscriber, #2589)

> "That said, I didn't actually _test_ my patch. That's what users are for!"
> -- Linus Torvalds

Sigh.  I know Linus likes to be brash, but this is, for me, a "Ha ha only serious" moment.

This really is the Linux development paradigm in a nutshell -- the core developers generate
new code and update existing code as fast as possible and without too much time and effort
spent on quality control, because they are relying on a vast and ill-organized community of
other hackers, distributions, users, etc. to test it and report bugs.  (And then, as Andrew
Morton and others have repeatedly stated, most of those bug reports go nowhere.)

From a systems/evolutionary viewpoint, this process should be expected to be very efficient at
quickly generating lots of valuable new functionality and fixing what I'll call the
"99th-percentile" flaws -- the ones that show up frequently enough and are analyzable enough
that this process will fix them.  This is ESR's "given enough eyeballs, all bugs are shallow"
notion.

Except that it doesn't apply to *all* bugs, only to the ones that are frequent enough and
analyzable enough that they can be noticed by users, communicated effectively to developers,
and reproduced or analyzed by developers.

So by the same token, this process should be expected to generate more and more of the rare
bugs, the three-nines or four-nines bugs, i.e. the ones that are visible only in 1 run out of
10,000 runs or only for one user out of 1000 users.
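To make the "1 run out of 10,000" figure concrete: under the simplifying assumption that each run independently triggers a bug with probability p, the chance of seeing it at least once in n runs is 1 - (1 - p)^n. The sketch below computes that, plus the number of runs needed to reach a given confidence of seeing it once. The independence assumption and the helper names are illustrative only; real timing-dependent bugs are rarely this well-behaved.

```python
import math

def p_seen_at_least_once(p: float, runs: int) -> float:
    """Probability that a bug hitting each run independently with probability p
    shows up at least once in `runs` runs (simplifying independence assumption)."""
    return 1.0 - (1.0 - p) ** runs

def runs_needed(p: float, confidence: float) -> int:
    """Smallest number of runs giving at least `confidence` chance of one sighting."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

p = 1 / 10_000  # a "four-nines" bug, as in the comment above
for n in (100, 1_000, 10_000, 50_000):
    print(f"{n:>6} runs: {p_seen_at_least_once(p, n):.1%} chance of seeing it")
print(runs_needed(p, 0.95), "runs for a 95% chance of seeing it even once")
```

Under these assumptions, a 1-in-10,000 bug needs roughly 30,000 runs before there is a 95% chance that anyone has seen it even once, which is one way to read why such bugs can accumulate under ad hoc testing.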

And indeed, there is some circumstantial evidence (nicely chronicled by LWN.net's reporting on
the concerns of Andrew Morton and other Linux core developers) that this is exactly what it is
doing.

Now obviously there is great value in a tool which is very featureful and flexible and
performant and widely supported and swarming with rare, subtle, small bugs.  Linux is very
widely used and provides great value to its users.  But this kind of tool is not the right
tool for every job.

I really hope to find a Free and Open operating system that I can use whose core developers
are a little more...  careful.  Fortunately the new open source version of Solaris has worked
fine for me so far.  Thank goodness for diversity and competition.  Now I've got to figure out
how to gain insight into the Solaris core development practices and see if it is any
different.

the Linux process for generating many rare flaws

Posted Jul 17, 2008 19:03 UTC (Thu) by zooko (subscriber, #2589)

By the way, I posted a copy of my comment onto my blog.

the Linux process for generating many rare flaws

Posted Jul 17, 2008 20:54 UTC (Thu) by jimparis (subscriber, #38647)

Your complaints are bogus.  They were discussing a potential problem and a potential fix.
Linus proposed a fix and admitted that he hadn't tested it yet.  Later it was decided that
there was no actual kernel bug in this case, just poorly-written userspace code.  There was
never any chance of this untested patch getting into the kernel.  Context matters.

the Linux process for generating many rare flaws

Posted Jul 17, 2008 21:40 UTC (Thu) by zooko (subscriber, #2589)

Of course context matters -- this off-handed quip from Linus is not the issue, it is only
indicative of the issue.  My impression of the issue itself comes mostly from reading LWN.net
itself cover-to-cover (excepting the "Linux in the News" and "Announcements" pages) for about
a decade, as well as lots of other general experience and reading since I started using Linux
in 1993.

If you have read the LWN kernel pages carefully for the last couple of years and you *don't*
think that my characterization of how the Linux development and QA process works is accurate,
then I'd like to know how your view of that process differs from mine.

Here, let me point to some of the most recent and most significant LWN.net articles and LKML
posts:

http://lwn.net/Articles/285088/
http://lwn.net/Articles/281109/
http://lwn.net/Articles/277872/
http://lwn.net/Articles/281132/
http://lwn.net/Articles/277967/
http://lwn.net/Articles/277970/
http://lwn.net/Articles/278149/

Oh, and of course the Linus Torvalds quote that started this thread.

As far as I can tell, my assertion that the kernel QA process relies on users reporting bugs
is not disputed by anyone.  Andrew Morton's repeated suggestions that the result of this
process might be that new Linux kernels are buggier than old ones are disputed.  My assertion
that a QA process like this one *ought to be expected* to reduce the incidence of
99th-percentile bugs while continually increasing the number of rarer bugs is new (as far as I
know).

the Linux process for generating many rare flaws

Posted Jul 19, 2008 21:10 UTC (Sat) by njs (guest, #40338)

> As far as I can tell, my assertion that the kernel QA process relies on users reporting bugs
> is not disputed by anyone.  Andrew Morton's repeated suggestions that the result of this
> process might be that new Linux kernels are buggier than old ones are disputed.  My assertion
> that a QA process like this one *ought to be expected* to reduce the incidence of
> 99th-percentile bugs while continually increasing the number of rarer bugs is new (as far as I
> know).

I don't actually understand the reasoning behind your claim.  By definition, rare, subtle bugs
are rare and subtle; any QA process is going to have trouble finding them.  There's no a
priori reason to think that just changing how you do QA will let you find more of them.  I
guess you're suggesting that specifically, a more "professional" QA process -- paid testers
probing edge cases, large automated regression tests, etc. -- will do an incrementally better
job at finding these bugs?

It could be true, but it isn't obvious to me.  For one thing, many of those "users" you refer
to actually *are* corporations that do exactly this sort of testing.  (RH, IBM, Google, ...)
Then there are the security researchers who pore over the code for obscure interactions in hope
of fame, the general userbase who has far more permutations of weird hardware than any formal
QA team could hope to access, etc.

If there's a problem with kernel quality, I don't think it's with the bug discovery process;
every meaningful bug gets tripped over by somebody.  (By definition, because who cares about
bugs that never affect anybody? :-))  I'd look instead to the processes whereby these bugs --
once found -- are reported, tracked, analyzed, and fixed.  That's the hard part...

the Linux process for generating many rare flaws

Posted Jul 19, 2008 23:18 UTC (Sat) by nix (subscriber, #2304)

The problem is the bugs with security implications that nobody spots *but* 
the blackhats :( they're hardly going to tell anyone about it (other than 
other blackhats), but they'll use the knowledge, oh yes.

the Linux process for generating many rare flaws

Posted Jul 20, 2008 0:26 UTC (Sun) by njs (guest, #40338)

Oh, sure, that's a problem -- though I'd guess one place bug reports quietly flow from is
people quietly watching black hat message boards -- but how is it a problem unique to Linux's
development/QA process?

the Linux process for generating many rare flaws

Posted Jul 20, 2008 11:06 UTC (Sun) by nix (subscriber, #2304)

It isn't, of course. If anything we can get better metrics on bug density, 
because the source is open and all bugfixes to public trees visible (even 
if they aren't marked up as 'security' all the time, they're all bugs and 
from the POV of stopping them happening it doesn't matter if they're a 
security bug or a 'mere' data-corruption bug).

On that note, am I the only one who considers it utterly bizarre that 
crash bugs are considered by some more serious than data-corruption bugs 
merely because some of the crash bugs are remotely-triggerable, while 
data-corruption bugs rarely are? The consequences of data-corruption bugs 
are so much worse, yet the crash bugs are 'security holes! patch now!'...

the Linux process for generating many rare flaws

Posted Jul 20, 2008 18:08 UTC (Sun) by zooko (subscriber, #2589)

(About whether the Linux kernel development process is more likely to introduce security holes
than alternative development processes.)

"It isn't, of course."

What -- where do you get your confidence?  I think that it is plausible that the Linux kernel
development process produces more bugs and security holes than alternative processes, such as
for example the way that OpenBSD or Solaris are developed.  (I also think that the Linux
development process produces new features and improvements faster than the OpenBSD process
does.)

I'm not entirely confident of this -- I could be wrong.  But how did you become so confident
of the opposite hypothesis?

the Linux process for generating many rare flaws

Posted Jul 20, 2008 18:56 UTC (Sun) by nix (subscriber, #2304)

The question was whether the problem was *unique* to Linux's development 
process. Of course it isn't. Proprietary systems have security holes too.

You don't need 'confidence' to know that.

the Linux process for generating many rare flaws

Posted Jul 20, 2008 22:01 UTC (Sun) by njs (guest, #40338)

I read "it isn't, of course" as responding to my question about how black-hat scrutiny was
something unique to Linux's development process.  These threads get a little spread out...

I would still be curious to hear your response to my original post, because a priori I don't
see why any one of Linux/Solaris/OpenBSD's models should be better.  (Actually, I don't have a
lot of confidence in OpenBSD myself, because I've gotten the impression that in general it's
buggier -- probably just due to lack of manpower, and prioritizing security features
proportionately higher than non-security testing and bugfixes.  And I don't like non-security
bugs much better than security bugs.)

the Linux process for generating many rare flaws

Posted Jul 21, 2008 3:52 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246)

I think the point is that an ad hoc testing process can only go so far.  Without a more
directed testing process that explicitly targets corner conditions, you can only say you've
found and fixed the common cases, for an ever expanding definition of "common."

If you have a system with very tight reliability requirements (think, for instance, medical
equipment, ABS brake controllers, life support systems), such a model is inappropriate.

I do disagree with the notion that the number of bugs above a certain level of rarity
monotonically increases with time.  If that were true, in the limit you'd end up with a
system that fails every time, in a completely different way every time it's used.  No
repeatable bugs, but rather a custom bug every time.  That defies belief.

Of course that's a ridiculous endpoint, but we have occasionally seen the precursors to that
in localized areas that start gaining a reputation of being pervasively flaky.  (Linux's VM
some years ago, for example.)

When a subsystem starts developing behavior that's too subtle and too quirky to adequately
analyze and diagnose, it gets reworked.  Rather than having its bugs fixed one by one, it gets
replaced wholesale.  We've seen that a number of times with varying levels of success.  The
main point is that any quirks that were accumulating there have now been flushed away.

the Linux process for generating many rare flaws

Posted Jul 21, 2008 6:19 UTC (Mon) by nix (subscriber, #2304)

Quite so. I suspect this will always be true for something like the 
kernel. In a sense it's true even for perfectly reproducible systems like 
compilers, but with the kernel, half the things can't have testsuites 
because they require specific hardware, and the other half can't have 
complete enough testsuites because a lot of the bugs are races or 
otherwise timing-dependent and can't be reliably reproduced. (The mm 
subsystem is a classic example of the latter.)

Filesystems can *probably* have good testsuites, but I'm at a loss with 
respect to most of the rest.
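One partial answer to timing-dependent bugs is to take the timing out of the test: model each thread as a sequence of steps and enumerate every interleaving, rather than hoping a stress run happens to hit the bad one. The toy model below (two hypothetical threads each doing a non-atomic read-then-write increment of a shared counter) only illustrates the idea; it is not how kernel mm code is actually tested, and real code has far too many steps for brute-force enumeration.

```python
from itertools import combinations

def run_interleaving(order):
    """Run two non-atomic 'counter += 1' operations in the given step order.

    Each thread has two steps: step 0 reads the shared counter into a local,
    step 1 writes local + 1 back. `order` lists thread ids (0 or 1), one per step.
    """
    shared = 0
    local = [None, None]
    next_step = [0, 0]
    for tid in order:
        if next_step[tid] == 0:
            local[tid] = shared          # read
        else:
            shared = local[tid] + 1      # write back
        next_step[tid] += 1
    return shared

def all_interleavings(steps_per_thread=2):
    """Yield every ordering of two threads' steps that keeps each thread's own order."""
    total = 2 * steps_per_thread
    for slots in combinations(range(total), steps_per_thread):
        order = [1] * total
        for i in slots:
            order[i] = 0                 # thread 0 runs in these slots
        yield tuple(order)

results = {order: run_interleaving(order) for order in all_interleavings()}
lost_updates = [o for o, final in results.items() if final != 2]
print(f"{len(lost_updates)} of {len(results)} interleavings lose an update")
# 4 of 6 interleavings lose an update
```

Every interleaving in which both reads happen before either write loses an update; a test that enumerates them hits the bug deterministically, whereas a timing-based stress test only hits it when the scheduler cooperates.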

the Linux process for generating many rare flaws

Posted Jul 21, 2008 7:22 UTC (Mon) by njs (guest, #40338)

IIRC iptables has a build system setup where they link the same code into either the kernel or
a user-space test harness... but that's just another example of the filesystem case.

the Linux process for generating many rare flaws

Posted Jul 24, 2008 13:06 UTC (Thu) by zooko (subscriber, #2589)

Hello njs:

I see that I didn't explain my thinking very clearly.

I think that there are a few ways that you could have a development process that produces
fewer rare, subtle bugs.  They probably come at the cost of also producing good code slower.

One of these is relying more on code inspection and less on experiment, for your
bug-detection.  Then there is "writing more carefully" in the first place, which is I suppose
just like code inspection done by the original author.  That one presumably means you get less
written per day.

Another is emphasizing automated tests vs. manual tests.  I really like the way that in the
Twisted Python project, it is pretty hard to get the developers to fix a bug without
submitting a unit test that deterministically demonstrates that bug.  The Twisted project, as
a policy (more or less) doesn't apply patches to code that isn't tested, and doesn't add
features or fix bugs unless the feature or bug is exercised by a unit test.
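A minimal sketch of what "no fix without a test that demonstrates the bug" looks like in practice. It uses Python's standard `unittest` module rather than Twisted's own `trial` runner, and the `chunks` helper and its original bug are hypothetical: the first test failed against the buggy version and passes after the fix, so the bug cannot silently return.

```python
import unittest

def chunks(items, size):
    """Split `items` into lists of at most `size` elements (hypothetical helper).

    Hypothetical original bug: iterating over range(0, len(items) - size, size)
    silently dropped the final chunk.
    """
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]

class ChunksRegressionTest(unittest.TestCase):
    def test_last_chunk_is_kept(self):
        # Deterministically demonstrates the original bug; failed before the fix.
        self.assertEqual(chunks([1, 2, 3, 4, 5], 2), [[1, 2], [3, 4], [5]])

    def test_exact_multiple(self):
        self.assertEqual(chunks([1, 2, 3, 4], 2), [[1, 2], [3, 4]])

    def test_rejects_zero_size(self):
        with self.assertRaises(ValueError):
            chunks([1], 0)
```

Running `python -m unittest` on the module executes the tests. The design point is that the test is written against the reported failure first, so the patch and its evidence travel together.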

Another is "designing for testability".  I work with an excellent programmer named Brian
Warner and I sometimes see him do something like "Hm, doing it this way would have plenty of
good properties, but I can't figure out how to write a unit test that would exercise all
of that code in a deterministic way.  I think I'll do it a different way."
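One common way to buy that determinism, sketched under my own assumptions (the `RateLimiter` and `FakeClock` classes are hypothetical and not taken from Brian Warner's code): instead of reading the wall clock inside the logic, pass the clock in, so a test can drive time step by step instead of sleeping and hoping.

```python
import time
from typing import Callable

class RateLimiter:
    """Allow at most `limit` events per sliding `window` seconds (hypothetical).

    The clock is injected rather than hard-coded, so tests can control time.
    """
    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events: list[float] = []

    def allow(self) -> bool:
        now = self.clock()
        # Forget events that have fallen out of the sliding window.
        self.events = [t for t in self.events if now - t < self.window]
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

class FakeClock:
    """A test clock that only moves when told to."""
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds

clock = FakeClock()
limiter = RateLimiter(limit=2, window=10.0, clock=clock)
assert limiter.allow() and limiter.allow()
assert not limiter.allow()   # third event inside the window is refused
clock.advance(10.0)
assert limiter.allow()       # window has passed, so events are allowed again
```

Production code uses the default `time.monotonic`; only the test swaps in `FakeClock`, so the same logic runs in both places and the edge case at exactly 10 seconds is exercised every time.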

Some of these may be inappropriate for kernel code (Twisted Python, and the project that Brian
Warner and I work on, are both free of the obligation to deal with hardware, for example).  Or
some of them may be appropriate for kernel code in general, but would be inappropriate for
Linux's goals of rapid evolution and high performance.  (For example, OpenBSD apparently does
extensive code review.)  But I hope that these give you some ideas about what I am thinking of
as possible alternatives.

In general, I have the suspicion that any development process that emphasizes producing code
quickly in Step 1, and then QA'ing it later in Step 2, is likely to add subtle rare bugs, as
contrasted with a more "integrated" development process in which specific techniques intended
to prevent subtle rare bugs are part of Step 1.  This is doubly true when Step 2 is largely ad
hoc, i.e. it is not systematic or automated.

This general strategy -- rely on lots of users and downstream distributors and the like to do
lots of manual testing for you -- seems to be a core part of the Linux development process
culture.  (You can always tell that an idea has a profound effect on a culture when the
members of that culture think that there is no possible alternative. :-))

I do wish that there were a keen technical and cultural observer like Jon Corbet who wrote
detailed analyses of the development processes of other operating systems, starting with
OpenBSD and Solaris.

the Linux process for generating many rare flaws

Posted Jul 30, 2008 17:01 UTC (Wed) by mcortese (guest, #52099)

> In general, I have the suspicion that any development process that emphasizes producing code quickly in Step 1, and then QA'ing it later in Step 2, is likely to add subtle rare bugs, as contrasted with a more "integrated" development process in which specific techniques intended to prevent subtle rare bugs are part of Step 1.

What you say is called "First Time Quality" by lean production specialists. They, like you, seem pretty sure that doing it right the first time is better than doing it quickly and then refining it later.

I've been looking for a proof of that for a long time, but to date with no success.

the Linux process for generating many rare flaws

Posted Jul 24, 2008 8:57 UTC (Thu) by eduperez (guest, #11232)

So, what you are saying about linux is:

a) New features and enhancements are developed and released quickly.
b) Only a few hard-to-hit bugs reach users.

Well, that seems fantastic, doesn't it?

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds