Kernel bugs: out of control?
Posted May 11, 2006 13:54 UTC (Thu)
by k8to (guest, #15413)
In reply to: Kernel bugs: out of control? by malor
Parent article: Kernel bugs: out of control?
My impression is that of someone who feels they know how to engineer software quality more successfully than the Linux kernel developers, yet does not know how to engineer software. Perhaps I have made a mistake in my readings.

In complex software, security problems crop up, patches are released, and installing them involves a certain level of annoyance. If the approach of continually re-engineering interfaces and systems to eliminate categories of problems offends you, then the Linux kernel in general should offend you, since this has been its mode of operation since day one. There _are_ other free Unixes which have a much more conservative approach. They are not horrible.
Posted May 11, 2006 14:46 UTC (Thu)
by malor (guest, #2973)
[Link] (3 responses)

No, I'm not a developer, but I have been using Linux a long, LONG time (since around kernel 0.8 or 0.9). So I'm certainly qualified to comment on the way it used to be (stable) and the way it is now (unstable). The development process and the lack of focus on quality would appear to be the cause.

Do you have an alternate explanation?

Posted May 11, 2006 15:03 UTC (Thu)
by k8to (guest, #15413)
[Link] (2 responses)

I agree that kernel development is being handled differently now, which is resulting in a larger number of releases in the stable line. I do not agree that this indicates a lower level of quality. I think it is simply factual that the one does not necessarily imply the other.

Kernel releases with corrected functionality are being created faster than in the past, enabling users to get these fixes sooner. If users feel the need to reboot for every one of these updates, then the fixes may be seen as something of a nuisance. However, the alternative is to not apply the fixes. In the past, the choice was not available: fixes were provided less frequently and less rapidly, so there was a longer window of vulnerability and no possibility of frequent reboots. You can simulate the old situation by installing fewer kernels.

Posted May 13, 2006 20:19 UTC (Sat)
by Baylink (guest, #755)
[Link] (1 response)

This sub-thread speaks to a topic near and dear to my heart: what does a version number *mean*?

Let me quote here my contribution to the Wikipedia page on the topic, based on my 20 years of observation of various software packages:

A different approach is to use the major and minor numbers, along with an alphanumeric string denoting the release type, e.g. 'alpha', 'beta' or 'release candidate'. A release train using this approach might look like 0.5, 0.6, 0.7, 0.8, 0.9 == 1.0b1, 1.0b2 (with some fixes), 1.0b3 (with more fixes) == 1.0rc1, which, if it proves stable enough, == 1.0. If 1.0rc1 turns out to have bugs which must be fixed, it becomes 1.0rc2, and so on. The important characteristic of this approach is that the first version of a given level (beta, RC, production) must be identical to the last version of the level below it: you cannot make any changes at all from the last beta to the first RC, or from the last RC to production. If you do, you must roll out another release at that lower level.

The purpose of this is to permit users (or potential adopters) to evaluate how much real-world testing a given build of code has actually undergone. If changes are made between, say, 1.3rc4 and the production release of 1.3, then that release, which asserts that it has had a production-grade level of testing in the real world, in fact contains changes which have not necessarily been tested in the real world at all.

The assertion here seems to be that an even higher level of overloading on version numbering ("even revision kernels are stable") and its associated 'social contract' are no longer being upheld by the kernel development team.

If that is, in fact, a reasonable interpretation of what's going on, then indeed, it's probably not the best thing. I'm not close enough to kernel development to know the facts, but I do feel equipped to comment on the 'law'.

Posted May 17, 2006 23:17 UTC (Wed)
by k8to (guest, #15413)
[Link]

I think your comments on versioning are not far from the mark. The fact of these "minor" stable releases, e.g. 2.6.X.Y, is that they are _smaller_ changes than have ever occurred in the stable series before. It is true that these smaller changes do not receive widespread real-world production evaluation, but no non-stable release kernel (rc versions included) ever receives enough attention to catch even some showstopper bugs.

So I think you are right to question this change, but the balancing facts are that the release-candidate process for the Linux kernel doesn't seem very effective, and the changes made in the revision series are _strongly_ conservative.

It is important to remember that in this particular (highly visible, highly open) development process, there is very little pressure to deviate from the conservative perspective in these updates.

Posted May 11, 2006 19:04 UTC (Thu)
by oak (guest, #2786)
[Link] (4 responses)

> If the approach of continuing to re-engineer interfaces and
> systems to eliminate categories of problems offends you,
> then the Linux kernel in general should offend you, since
> this has been the mode of operation since day one.

This reminds me of the recent change in glibc: it now aborts programs which do double frees. Yes, more programs may now appear unstable, but I personally prefer an application being terminated to it silently corrupting my data as it hobbles forward with inconsistent state. Broken apps should be shot down as soon as possible so that people know to fix them; this is the Unix way.

If you don't force quality, you don't get it. You end up with an unmaintainable mess instead.

> There _are_ other free Unixes which have a much more
> conservative approach. They are not horrible.

I'm sure the person complaining here would then complain about the lack of features and HW support...

Posted May 11, 2006 23:24 UTC (Thu)
by malor (guest, #2973)
[Link] (3 responses)

I agree with you about forcing quality... that's a great idea. If I thought the new development process would actually DO that, I'd be enthusiastically behind it. Instead, it's just about speed, speed, speed... and avoiding the stuff that's no fun to do, like bugfixing and testing.

Waving your hands in the air and expecting other people to fix your programs is not, in my long experience supporting developers, the way to get them fixed, particularly not properly.

As far as switching OSes goes, I've already stopped using Linux on my firewalls because of the unending stream of security reboots. Netfilter is faster and more featureful than OpenBSD's pf, and its language is more amenable to shell scripting, but the first mission of a firewall is to stay up. I can throw OpenBSD on a firewall and not have to update it again for a couple of years. That means no downtime, which means happy users. I've never seen any Linux kernel that lasted that long without security holes.

FreeBSD is looking better all the time... I've been talking about switching over, but haven't yet. If matters continue as they have, maybe I will. And you'll have one less complaining user, which, from your tone, you may prefer.

Posted May 15, 2006 4:27 UTC (Mon)
by ChristopheC (guest, #28570)
[Link] (2 responses)

I think it is unfair to say the kernel developers do not test their patches. However, they can only test them on the few combinations of hardware they have access to. To discover the bugs, the kernel needs widespread testing. But few people are willing to test the development releases (-rc); the problem has been mentioned countless times on lkml and here on LWN. So they have to release often to get the needed coverage. (This is a somewhat simplified explanation, of course.)

Posted May 15, 2006 5:48 UTC (Mon)
by malor (guest, #2973)
[Link] (1 response)

Linux used to be stable while also doing those things in a development branch. It no longer is: the development branch and the mainline kernel are one and the same thing, forcing us all into alpha testing. 2.6.14 broke *traceroute*. Give me a break.

Posted May 21, 2006 17:05 UTC (Sun)
by nix (subscriber, #2304)
[Link]

Er, how often do you *run* traceroute? I don't run it so often myself that I'd notice immediately if it broke. It could easily be a week or so between runs...