New bugs and old bugs

By Jonathan Corbet
December 12, 2007
As the 2.6.24 release slowly gets closer, the desire to shrink the list of known regressions grows. As can be seen from the current list (as of just before 2.6.24-rc5), there is still work to be done. That list is long enough that, as Linus pointed out in the -rc5 announcement, the traditional holiday release may not happen this year.

One of those regressions is a failure of a certain model of DVD drive to work with the 2.6.24-rc kernels; this drive works fine with 2.6.23. A look at the corresponding bugzilla entry shows that quite a bit of effort has been expended (by both developers and testers) toward tracking this one down, but, as of this writing, its exact cause remains unknown. So there is not (again, as of this writing) a well-defined fix for the problem.

What is known is which patch broke the device. Tejun Heo describes it this way: "It's introduced by setting ATAPI transfer chunk size to actual transfer size which is the right thing to do generally." The current development code (destined for 2.6.25) works just fine with this device, but backporting that code would be far too big a patch to put into the 2.6.24 kernel at this stage in the cycle. So Tejun (along with others) continues to look for a simpler fix. He also has a backup plan:

If we fail to find out the solution in time, we always have the alternative of backing out the ATAPI transfer chunk size update. This will break some other cases which were fixed by the change but those won't be regressions at least and we can add transfer chunk size update with other changes to 2.6.25.

This plan drew an immediate complaint from Alan Cox, who notes that backing out this fix will break quite a few devices which had finally been made to work while fixing only one which is known to have problems with the new code. This change, he says, "...is nonsensical and not in the general good". Alan would rather take the hit of breaking one device for the benefit of making a larger number of others work properly for the first time. If need be, the failing drive could be handled via a special blacklist in 2.6.24.

That idea, however, was firmly shot down by Linus:

"The one off regression" is likely the tip of an iceberg. If something regresses for one person, for that one person who tested and noticed and made a bug-report, there's probably a thousand people who haven't even tested the development kernel, or who had problems and just went back to the previous version.

In contrast, reverting something will be guaranteed to not have those kinds of issues, since the only people who could notice are people for who it never worked in the first place. There's no "silent mass of people" that can be affected.

In recent years, as the kernel's complexity and concerns about its quality have grown, the development community has taken an increasingly hard line against regressions. As Linus points out above, regressions cause visible problems for people whose systems were once working; that is a clear way to lose testers and (eventually) users. On the other hand, something which has never worked, and which still does not work, does not make life worse for Linux users. For this reason, the avoidance of regressions has become one of the highest development priorities.

There is another, related reason: the aforementioned kernel quality concerns. One can easily ask whether the quality of the kernel is improving or not, but truly answering that question is not an easy thing to do. A better kernel may, by attracting additional users, actually result in more bug reports; similarly, a buggier kernel may drive testers away, with the result that the number of reported bugs goes down. One cannot simply look at the lists of known problems and come to a reasonably defensible conclusion as to whether a given kernel is better than another or not.

What one can do, however, is ensure that everything which works now continues to work in future versions. If working things do not break, then, on the assumption that other problems are occasionally being fixed, it is reasonable to conclude that the kernel is getting better. If regressions are allowed, instead, then one never really knows. Regressions thus are the closest thing we have to an objective measurement of the quality of a given kernel release, and fixing regressions is an unambiguous way of improving that quality. So it's no wonder that the higher priority placed on improving kernel quality has led to a stronger focus on regressions.
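
The reasoning can be made concrete with a small sketch. It is only an illustration, not anything the kernel developers use: the function name and the device labels are hypothetical, and the sets are made up to mirror the situation described above. A regression is anything that worked in the old release and fails in the new one; something that never worked cannot show up in that set.

```python
def compare_releases(worked_before, works_now):
    """Classify hardware by how its status changed between two releases."""
    worked_before, works_now = set(worked_before), set(works_now)
    return {
        # Worked in the old release, broken in the new one.
        "regressions": worked_before - works_now,
        # Broken in the old release, working in the new one.
        "fixes": works_now - worked_before,
        # Working in both releases.
        "still_working": worked_before & works_now,
    }


# Hypothetical device labels, invented for this illustration.
old_release = {"dvd-drive-a", "disk-b"}
with_change = {"disk-b", "atapi-c", "atapi-d"}   # change applied
reverted = {"dvd-drive-a", "disk-b"}             # change backed out

print(compare_releases(old_release, with_change))
print(compare_releases(old_release, reverted))
```

With the change applied, the one drive appears as a regression alongside two fixes; with the change backed out, the regression set is empty, and the devices that lose out are only those that did not work in the old release either.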

Anybody who has watched Alan Cox's work knows that he cares deeply about the quality of the kernel. But he thinks that the anti-regression policy is being taken a little too far this time around:

To blindly argue regressions are critical is sometimes (as in this case) to argue that "this freeway is no longer compatible with a horse and cart" means the freeway should be turned back into a dirt road.

It may yet be that a proper fix for this problem will be found for 2.6.24, at which point the larger change can go through. Failing that, though, it appears that the horses and carts will win the day for now. Those needing the full freeway will have to wait until the horse-compatible version becomes available in 2.6.25.

(Update: it appears that the problem has now been fixed.)

New bugs and old bugs

Posted Dec 13, 2007 17:09 UTC (Thu) by iabervon (subscriber, #722)

The other thing about lack of features versus regressions is that it's easier to apply a patch
in order to get your system working in the first place and apply that patch to each kernel
version until it's included in general than it is to find a patch to keep your working system
working. As long as the progression for a given system is always from "it can't be made to
work" to "it requires messing with" to "it just works", lagging in the middle isn't too much
of an issue. And, of course, for distro kernels, they can decide whether to ship a kernel with
the patch in advance of its inclusion in a mainline kernel and deal with the potential fallout
themselves, so long as the base they're working from always improves and each patch eventually
gets merged in some form.

New bugs and old bugs

Posted Dec 19, 2007 0:57 UTC (Wed) by hingo (guest, #14792)

(Update: it appears that the problem has now been fixed.)

Another good reason to keep an uncompromising attitude: if compromises are not allowed, often somebody will be determined enough to actually fix the whole problem. Good job!

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds