By Jonathan Corbet
December 12, 2007
As the 2.6.24 release slowly gets closer, the desire to shrink the list of
known regressions grows. As can be seen from the current list (as of just
before 2.6.24-rc5), there is still work to be done. That list is long
enough that, as Linus pointed out in the -rc5 announcement, the traditional
holiday release may not happen this year.
One of those regressions is a failure of a certain model of DVD drive to
work with the 2.6.24-rc kernels; this drive works fine with 2.6.23. A look
at the
corresponding bugzilla entry shows that quite a bit of effort has been
expended (by both developers and testers) toward tracking this one down,
but, as of this writing, its exact cause remains unknown.
So there is not (again, as of this writing) a well-defined fix for the problem.
What is known is which patch broke the device. Tejun Heo describes it this way: "It's introduced
by setting ATAPI transfer chunk size to actual transfer size which is the
right thing to do generally." The current development code
(destined for 2.6.25) works just fine with this device, but bringing that
code over would be far too big a change for the 2.6.24 kernel at this stage
in the cycle. So Tejun (along with others) continues to look for a simpler fix.
He also has a backup plan:
If we fail to find out the solution in time, we always have the
alternative of backing out the ATAPI transfer chunk size update.
This will break some other cases which were fixed by the change but
those won't be regressions at least and we can add transfer chunk
size update with other changes to 2.6.25.
This plan drew an immediate complaint from
Alan Cox, who notes that backing out this fix will break quite a few
devices which had finally been made to work while fixing only one which is
known to have problems with the new
code. Such a revert, he says, "...is nonsensical and not in the
general good". Alan would rather take the hit of breaking one
device for the benefit of making a larger number of others work properly
for the first time. If need be, the failing drive could be handled via a
special blacklist in 2.6.24.
That idea, however, was firmly shot down by
Linus:
"The one off regression" is likely the tip of an iceberg. If
something regresses for one person, for that one person who tested
and noticed and made a bug-report, there's probably a thousand
people who haven't even tested the development kernel, or who had
problems and just went back to the previous version.
In contrast, reverting something will be guaranteed to not have
those kinds of issues, since the only people who could notice are
people for who it never worked in the first place. There's no
"silent mass of people" that can be affected.
In recent years, as the complexity of the kernel and concerns about its
quality have grown, the development community has taken an increasingly
hard line against regressions. As Linus points out above, regressions cause
visible problems for people whose systems were once working; that is a
clear way to lose testers and (eventually) users. On the other hand,
something which has never worked, and which still does not work,
does not make life worse for Linux users. For this reason, the avoidance
of regressions has become one of the highest development priorities.
There is another, related reason: the aforementioned kernel quality
concerns. One can easily ask whether the quality of the kernel is
improving or not, but truly answering that question is not an easy thing to
do. A better kernel may, by attracting additional users, actually result
in more bug reports; similarly, a buggier kernel may drive testers away,
with the result that the number of reported bugs goes down. One cannot
simply look at the lists of known problems and come to a reasonably
defensible conclusion as to whether a given kernel is better than another
or not.
What one can do, however, is ensure that everything which works now
continues to work in future versions. If working things do not break,
then, on the assumption that other problems are occasionally being fixed,
it is reasonable to conclude that the kernel is getting better. If
regressions are allowed, instead, then one never really knows. Regressions
thus are the closest thing we have to an objective measurement of the
quality of a given kernel release, and fixing regressions is an unambiguous
way of improving that quality. So it's no wonder that the higher priority
placed on improving kernel quality has led to a stronger focus on
regressions.
Anybody who has watched Alan Cox's work knows that he cares deeply about
the quality of the kernel. But he thinks that the anti-regression policy
is being taken a little too far this time
around:
To blindly argue regressions are critical is sometimes (as in this
case) to argue that "this freeway is no longer compatible with a
horse and cart" means the freeway should be turned back into a dirt
road.
It may yet be that a proper fix for this problem will be found for 2.6.24,
at which point the larger change can go through. Failing that, though, it
appears that the horses and carts will win the day for now. Those needing
the full freeway will have to wait until the horse-compatible version
becomes available in 2.6.25.
(Update: it appears that
the problem has now been fixed.)