A survey on kernel quality
As has been reported on LWN
recently, Andrew Morton has been heard to worry that bugs are being added
to the kernel more quickly than they are being fixed. But it is hard to
know for sure. In an attempt to obtain a little more data on the problem,
Andrew has asked LWN to run a survey of its subscribers. The results will,
hopefully, shed some light on how a wider part of the community sees the
kernel quality issue; they will be discussed at the upcoming kernel summit.
This opportunity is an honor for LWN subscribers, who are seen as being more than sufficiently knowledgeable to provide good answers while being unlikely to attempt to skew the results. It is a chance for all of us to help with the development process. If you are an LWN subscriber, please take a few minutes, proceed to the survey and help out.
A survey on kernel quality Posted Jul 10, 2006 3:20 UTC (Mon) by frazier (subscriber, #3060) [Link] This is really cool. I'm not the person to comment on kernel-specific things and thus didn't fill out the survey. Regardless, it is flattering that LWN subscribers are being asked for feedback. Lots of other subscribers do have knowledge in this area. Seems like a good place to ask for feedback on the topic, myself excluded. -Brock Frazier
A survey on kernel quality Posted Jul 10, 2006 15:49 UTC (Mon) by tjc (subscriber, #137) [Link] > I'm not the person to comment on kernel-specific things and thus didn't fill out the survey.
Nor did I. But if the Gnome project ever asks LWN to do a quality survey, I'll have something to say...
A survey on kernel quality Posted Jul 11, 2006 14:13 UTC (Tue) by southey (subscriber, #9466) [Link] I think you should at least answer the first few questions. After all, it will give LWN and the developers some idea of who is using Linux.
A survey on kernel quality Posted Jul 10, 2006 4:24 UTC (Mon) by guest (guest, #2027) [Link] Not too long ago, I heard from a Red Hat developer that the IDE (disk and CD) code has no one assigned.
This recent information does not explain why I can't
I understand a new IDE system has been in the wings for some time.
Default CD boots on distros should have Minimal functions
drivers: still can't use my ancient Colorado HP parallel port scanner,
pauses:
A survey on kernel quality Posted Jul 10, 2006 6:58 UTC (Mon) by nix (subscriber, #2304) [Link] Chalk one up to `posted a disparaging blog entry', I guess...
A survey on kernel quality Posted Jul 10, 2006 7:33 UTC (Mon) by Wol (guest, #4433) [Link] As I remember it, IDE is a mare's nest of bugs (that's not the kernel, that's the hardware, and every new chipset includes a brand new bugset) :-(
Alan tried to get the kernel driver as clean as possible WITHOUT breaking all the numerous bugfixes. Then he went on sabbatical. When he came back, he discovered that the person who'd taken over from him had revamped the code, "cleaned up" a large number of bugfixes (note - they were fixes to chipsets, not the driver) such that the driver no longer worked with those chipsets, and there was a major row.
Basically, IDE has *always* been an absolute nightmare to maintain. Probably the reason Windows has far fewer problems is that (a) Windows code is as big a mess as the chipsets, and (b) Windows needs to support far fewer chipsets because they have a habit of obsoleting hardware... one occasion where that policy actually makes good programming sense, rather than just good business sense...
Cheers,
A survey on kernel quality Posted Jul 10, 2006 11:30 UTC (Mon) by superstoned (subscriber, #33164) [Link] Well, most irritations I have are IDE related: now and then, a new kernel version introduces less stable performance, i.e. stalls etc. Lately things have been good, btw. I think the kernel is getting better, not worse. But it might not be improving fast enough, and I'm sure that if you have newer hardware, you might be bugged a lot more often.
Won't install/boot on older machines Posted Jul 11, 2006 12:51 UTC (Tue) by vonbrand (subscriber, #4458) [Link] It most probably doesn't work because today most distros assume i686 or equivalent. Without further details, very little can be said. Have you reported your troubles to the relevant distributions? Did you ask Google or search the relevant mailing lists?
A survey on kernel quality Posted Jul 10, 2006 6:38 UTC (Mon) by iabervon (subscriber, #722) [Link] I'm not sure what to say about the two probable kernel bugs I've encountered.
One is that on one of my systems the BIOS occasionally reserves different memory, and then a USB hub gets assigned the same IRQ as the mouse, and mouse interrupts never arrive. I'm running 2.6.8.1, and I have no idea if it's been fixed or noticed by anyone else, and it happens rarely and can be worked around by just rebooting.
The other is that my server running 2.6.15-gentoo-r1 somehow confused one of its hard drives, such that it kept giving errors suggesting that the hard drive wasn't understanding commands. Then it tried a bus reset, which didn't fix it, and then I rebooted. Everything worked, no corruption, hard drive doesn't report any media problems. It had been running for a while, and it hasn't happened again, but it hasn't been running that long.
There doesn't seem to be any way of responding for bugs where I couldn't collect enough information to make a real bug report, and can't repeat the failure to determine if it affects later versions.
For what it's worth, this seems to me to be in line with other stable series I've used previously.
A survey on kernel quality Posted Jul 10, 2006 10:00 UTC (Mon) by ken (subscriber, #625) [Link] I started to fill out the forms but soon realised that the type of bugs that really bother me did not fit into the questions asked.
My problem is that, since I started to use 2.6 kernels, I have had several hard lockups where the machine just stops working; even ping was not responding on the few occasions I tried.
Since I'm a software developer myself, I know just how impossible it would be to make any progress on that report, so I have never filed one. But this type of hang really did not happen before.
And yes, I have a binary-only nvidia blob that is a really good candidate for this type of hard lockup, but still, it is not 100% certain that it's nvidia's fault.
A survey on kernel quality Posted Jul 10, 2006 12:09 UTC (Mon) by MathFox (subscriber, #6104) [Link] As a developer, I once wrote a device driver for a non-FOSS OS. From my experience I can tell that these kinds of "random lockups" are hard to debug: the bugs usually are timing-sensitive, and adding debug statements to the code makes the bug go away. Furthermore, there is no easy way to get the information out of the computer when the kernel hangs.
The best way to make progress here is to find a workload that makes reproducing the bug easy (having it occur once every day) and instrument the computer with "bus snooping" hardware (logic analysers, etc.) that can provide you with a log of the activity in the milliseconds before the crash.
N.B. These kinds of Heisenbugs are influenced by any attempt to pin them down; some species can reliably detect hardware probes.
Hard lockups Posted Jul 10, 2006 21:17 UTC (Mon) by ringerc (guest, #3071) [Link] Do the keyboard LEDs blink (if running under X11, as I assume you are given your use of the NVidia drivers) when the machine crashes?
If so, it'll be panicking. You should be able to retrieve a dump of the panic by hooking another machine up to the crashing box with a null modem cable (xover serial, essentially) and booting with serial console. There's plenty of info on how to do this on the 'net. If you can reproduce the fault _without_ the NVidia drivers loaded and send off that dump info along with a hardware summary etc, then you might actually have a bug report to make.
Hard lockups Posted Jul 11, 2006 12:55 UTC (Tue) by vonbrand (subscriber, #4458) [Link] In my experience, hard lockups without any traces in the logs are due to hardware faults, notably CPU overheating.
A survey on kernel quality Posted Jul 13, 2006 9:05 UTC (Thu) by jschrod (subscriber, #1646) [Link] Same here, also hard lockups, regularly every 2nd or 3rd night. I also had nvidia loaded and suspected it at first, but using nv brought the same result.
Eventually, I traced it to the ionice call in the updatedb cron job. (This is SUSE 10.0, btw.) Discarding ionice caused my system to run smoothly. Since I have no problems at all with my disks in other heavy usage, I really suspect that it's a kernel problem.
Joachim
A survey on kernel quality Posted Jul 10, 2006 13:12 UTC (Mon) by arcticwolf (guest, #8341) [Link] Strange - I always thought the difference between subscribers and non-subscribers was that the former pay money, nothing else. I must've missed the part where you restricted subscriptions to those who could prove that they're "sufficiently knowledgeable"...
Or maybe I should just resubscribe; I sure could use an instant boost of increased knowledge, not to mention the maturity that would make it less likely that I'd attempt to skew the results.
After all, this kind of process is already successfully used by other forums, too, like the lkml, for example, so it must be a good idea to limit the ability to provide feedback to those who pay for the privilege of being able to do so.
(N.B.: Just in case, I am definitely not opposed to LWN's subscription model in general, even though I cannot afford a subscription anymore these days. I just think that while requiring subscriptions for early access to feature articles, the weekly edition etc. makes sense, the same does not hold true when it comes to a *survey*. Not to mention that concerns about skewed results seem a bit silly when the survey will be available to everyone in the not too distant future, anyway. In any case... I know you're unlikely to change anything as a result of this comment, but consider it some unsolicited feedback from someone who at least cares enough to provide feedback, even if he's just a member of the unwashed non-subscribing Lumpenproletariat masses. :))
Subscriber-only Posted Jul 10, 2006 13:38 UTC (Mon) by corbet (editor, #1) [Link] Please do resubscribe - but do note that we require new subscribers to pass a kernel hacking test first :)
This particular pool of people was chosen (by the relevant kernel folks, not by LWN) in an attempt to get around some of the worst problems with web surveys. Somebody who finds LWN worth paying for is reasonably likely to have at least some experience in the area of interest. But, just as importantly, they are relatively unlikely to attempt to skew the survey for reasons of their own. As soon as you open a web survey to everybody, you ask for a bunch of responses from Hank the Angry Drunken Windows User.
Closing the survey to non-subscribers will exclude quite a few people who would otherwise provide good input. (And it will stay closed to them, BTW; the survey as a whole closes before the subscription period ends). That is an unfortunate result. But putting out a wide-open survey risks wiping out what little significance this exercise has.
A survey on kernel quality Posted Jul 10, 2006 15:53 UTC (Mon) by jimmybgood (guest, #26142) [Link] As arcticwolf mentions, the suggestion that being a paid subscriber makes one more knowledgeable is nonsense. I do have the money to subscribe and would gladly pay it, were that to be true.
The difference between subscribers and nonsubscribers is largely that subscribers identify professionally with Linux as a career and are doing well enough at it to afford the subscription. I would think this makes them _more_ biased not less. Imagine if you were to ask a group of automotive engineers if automobiles were becoming less fuel efficient or less safe. Of course, they would be concerned about the image of the industry that supports their livelihood and would have a bias to protect it by minimizing problems in their responses.
While the above claim is mere conjecture, the well-recognized psychological phenomenon of cognitive consonance would, if such a bias were present, lead the subjects to convince themselves that they were not biased.
So long as Andrew Morton is aware of this bias, though, the bias should be fairly predictable and can be corrected for in a rough manner. In other words, if LWN subscribers claim that the kernel quality is staying stable, it's probably really declining.
Designing surveys that produce meaningful responses is not easy. One poster has already reported that he can't fit his observations into the survey's structure. Are the survey questions going to be released for public perusal or will they remain permanently restricted?
A survey on kernel quality Posted Jul 10, 2006 18:04 UTC (Mon) by Los__D (subscriber, #15263) [Link] No, the difference is that subscribers are willing to support LWN... I'm a student, and have close to _NO_ money; I still cashed out to help LWN...
A survey on kernel quality Posted Jul 13, 2006 14:44 UTC (Thu) by lysse (subscriber, #3190) [Link] I gave up my widows' mites too; LWN is worth it.
However, since for the most part I regard 2.4 series kernels as those nasty newfangled things I grudgingly have to put up with, I don't see my input to this survey being that useful...
Subscribe because LWN helps the developers (me thinks) Posted Jul 10, 2006 20:03 UTC (Mon) by jstAusr (subscriber, #27224) [Link] There are lots of reasons to subscribe, if it helps someone who helps the community and has a long history of helping - that seems to be a good reason to me.
jstAusr
A survey on kernel quality Posted Jul 11, 2006 12:27 UTC (Tue) by vblum (guest, #1151) [Link] I agree that, here, excluding the knowledgeable non-subscribers is really justified because and only because you close the door on a lot of potential noise. I am sure though that no one will mind if non-subscribers post their valuable experiences in a less formalized way in the comments section here - and all are helped.
Very much off topic: If you have the money to subscribe and would gladly pay it, then be glad that you were not around when Jon Corbet announced the end of LWN due to financial difficulties - or else you might be paying after all.
Honestly, no one on LWN's side ever demanded subscriptions - in fact, the opposite is true, Jon Corbet explicitly ruled them out for many years. It was LWN's readership who asked for a subscription option - and loudly so - when Jon Corbet announced that LWN would be closed due to lack of funding.
It is a good question whether I _have_ the money to subscribe to LWN. I do not pay for any other subscriptions whatsoever. This one I pay for; because I was there when there was no other option. Perhaps subscribing to LWN does make a good filter to find people that definitively care about Linux - at least they care enough to keep a high-quality Linux publication going which would otherwise have ceased to exist, quietly and modestly. Sure, you have lots of false negatives, people who really care but don't subscribe. But then, if you do really care, write up your kernel bugs as a comment, no?
Noise is better than bias Posted Jul 11, 2006 18:31 UTC (Tue) by jimmybgood (guest, #26142) [Link] I'm going to try real hard once more to explain my point, which seems to have been missed.
If you want to find out something with a survey, a large noisy sample is far better than a small biased sample. If Andrew Morton thinks he's going to find out something by limiting his survey to a small sample of "knowledgeable" respondents, he's making a mistake. He works in an environment where expert knowledge is highly desirable and out of habit, he imagines that experts might be able to tell him whether the kernel is getting buggier.
Even studying the kernel itself with objective code analysis tools is not a valid way to answer his questions, because those tools have recently been applied to the kernel. Many of the bugs those tools can detect have already been fixed, so the bias will tend to make the kernel appear to be less buggy than it really is. A survey is a good approach to finding out if the Linux kernel is getting buggier.
I have no proof that LWN subscribers are biased, but it is generally accepted that professional societies _are_ biased. I think LWN functions more as a professional society than as a social group.
If you want a good survey, procure as many unique responses from as wide a sample as you can get.
Noise is better than bias Posted Jul 11, 2006 19:36 UTC (Tue) by nix (subscriber, #2304) [Link] Fine, so figure out a way of preventing a single malicious attacker from poisoning an open survey by means of an auto-submission robot. Avoid penalizing multiple people behind a single proxy, and detect a single malicious attacker routing false requests via a network such as tor or a bunch of compromised hosts (he needn't be *running* said botnet: there are vast numbers of known botted hosts with open proxies running on them; he can use some of those).
Until you've done that, a scheme whereby survey responses are tied to single entities (like the registered-subscriber scheme) is needed.
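As a rough illustration of what "tied to single entities" could mean in practice, here is a minimal Python sketch. It assumes each submission arrives as a pair of an authenticated subscriber ID (or `None` when the sender is not logged in) and an answer; the function name, the IDs, and the answer strings are all hypothetical, and nothing here describes how the actual LWN survey was implemented.

```python
from typing import Iterable, Optional


def one_response_per_subscriber(
    submissions: Iterable[tuple[Optional[int], str]],
) -> dict[int, str]:
    """Keep only the first answer from each authenticated subscriber ID."""
    accepted: dict[int, str] = {}
    for subscriber_id, answer in submissions:
        if subscriber_id is None:
            # Anonymous submission: it cannot be tied to one entity, so drop it.
            continue
        # setdefault leaves the stored answer alone if this ID already answered.
        accepted.setdefault(subscriber_id, answer)
    return accepted


# Hypothetical submissions: IDs and answers are made up for the example.
submissions = [
    (101, "stable"),
    (None, "worse"),   # anonymous, e.g. arriving through an open proxy
    (None, "worse"),   # a robot can repeat this as often as it likes
    (202, "worse"),
    (101, "better"),   # second attempt from an ID that already answered
]
accepted = one_response_per_subscriber(submissions)
```

The point of the sketch is only that an account, unlike an IP address, gives the survey something to deduplicate on: people sharing a proxy are not penalized, and a flood of anonymous requests is simply ignored.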
Bias BS Posted Jul 11, 2006 21:48 UTC (Tue) by s_cargo (guest, #10473) [Link] I think your objections would be valid if Andrew Morton asked to survey kernel developers. He did not. This is where your earlier analogy regarding automotive engineers falls flat. I consider it a completely reasonable assumption that LWN subscribers are "serious" users with no motivation for creating any false sense of kernel quality.
And to be blunt, you and I have paid nothing to keep LWN going. If everyone were "freeloading" as we are, there wouldn't be any LWN to conduct a survey in the first place. You should be grateful you only have to wait one week to have access to what subscribers have paid to access. So stop your whining.
Noise is better than bias Posted Jul 13, 2006 14:53 UTC (Thu) by lysse (subscriber, #3190) [Link] > If you want to find out something with a survey, a large noisy sample is far better than a small biased sample.
Not if the bias actually consists of a useful property you want to capture, not if the large noisy sample results in such a poor signal-to-noise ratio that anything meaningful descends to the level of statistical insignificance, and not if the noise itself is biased.
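The disagreement in this sub-thread can be made concrete with a little arithmetic. The sketch below is a simplified model, not an analysis of the real survey: it assumes a yes/no question ("is quality declining?"), a hypothetical true rate of 0.30 among informed users, and an open survey in which a made-up 20% of responses are junk. It shows that a larger sample shrinks the standard error but does nothing about the shift caused by junk answers that do not average out to the true rate.

```python
import math


def expected_estimate(true_rate: float, junk_fraction: float, junk_rate: float) -> float:
    """Expected 'declining' share when some fraction of responses is junk."""
    return (1.0 - junk_fraction) * true_rate + junk_fraction * junk_rate


def standard_error(rate: float, sample_size: int) -> float:
    """Standard error of a proportion estimated from independent responses."""
    return math.sqrt(rate * (1.0 - rate) / sample_size)


TRUE_RATE = 0.30  # hypothetical share of informed users who see a decline

# Small closed sample: 500 responses, assumed free of junk.
closed_estimate = expected_estimate(TRUE_RATE, 0.0, 0.0)
closed_se = standard_error(closed_estimate, 500)

# Large open sample: 50,000 responses, 20% of them junk.
coin_flip_estimate = expected_estimate(TRUE_RATE, 0.20, 0.50)  # junk answers at random
robot_estimate = expected_estimate(TRUE_RATE, 0.20, 1.00)      # junk always says "declining"
open_se = standard_error(coin_flip_estimate, 50_000)

print(f"closed: {closed_estimate:.3f} +/- {closed_se:.4f}")
print(f"open, coin-flip junk: {coin_flip_estimate:.3f} +/- {open_se:.4f}")
print(f"open, robot junk: {robot_estimate:.3f}")
```

Under these assumptions the open survey is far more precise than the closed one, yet it is precisely wrong: even coin-flip junk pulls the expected answer from 0.30 to 0.34, and a robot that always votes the same way pulls it to 0.44. The same formula applies in the other direction, of course; if the closed sample carried a systematic bias of its own, as jimmybgood suspects, its small size would not be the problem either.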
Not listed Posted Jul 10, 2006 13:58 UTC (Mon) by ikm (subscriber, #493) [Link] Why isn't this article listed under the 'Recent features' on the left? Some subscribers might miss it as it goes down in the list of news.
A survey on kernel quality Posted Jul 10, 2006 16:13 UTC (Mon) by g2boojum (subscriber, #152) [Link] The survey seems to be missing an obvious response to how one might respond to a kernel bug: "Used ${favorite-search-engine} to discover that the bug had already been found and fixed." That's been the case for all of the 2.6 bugs that have hit me.
Survey II Your favorite kernel bug? Posted Jul 10, 2006 17:09 UTC (Mon) by jimmybgood (guest, #26142) [Link] Who can forget the Thanksgiving day, 2001 "Greased Turkey" kernel? It contained an ext2fs bug that caused massive file system corruption the _second_ time the file system was mounted. Even the best make mistakes. This one was particularly hard to detect, because it didn't show up the first time you booted into the kernel.
Any other favorites?
Survey II Your favorite kernel bug? Posted Jul 10, 2006 20:34 UTC (Mon) by pr1268 (subscriber, #24648) [Link] usbhid.c in 2.6.15
The kernel wouldn't even load because of a kernel in the USB drivers somewhere - the root cause seemed to be where someone replaced all occurrences of some pointer variable but missed one in usbhid.c and I got a "cannot dereference null pointer" crash at boot-up. The ironic thing is that since this didn't happen on my laptop, and the only USB device attached to my desktop that wasn't attached to my laptop was a Microsoft Force Feedback Pro joystick, then it was the likely cause...
Yes, I took the above survey and made the appropriate choices for this bug. I have been fortunate not to experience the issues above with the IDE driver (or filesystem driver) others have had. In fact, this is the only bug I've noticed with 2.6 other than some differences of opinion between Jorg Schilling's cdrtools code and 2.6 kernels in general (not really a bug but rather a minor inconvenience). The USB crash was fixed by 2.6.15.4 or so (but 2.6.16 was released shortly after that).
Overall, I've been pleased with the kernel but I sometimes worry that it will get so huge and bloated as to become the same mess Windows Vista has become. I am forever grateful for all the kernel developers' hard work, especially Andrew Morton's dedication. Thank you, Andrew!
A survey on kernel quality Posted Jul 10, 2006 22:37 UTC (Mon) by dang (subscriber, #310) [Link] This is actually a bit tricky because the line between a kernel issue and a distro issue is not always clear (nor are differences between distro mgmt response time and kernel developer response time always clear....). And as the sanest way to experience the kernel is through the lens of a distro....
Or maybe I've just been fortunate enough to not have had nagging, pernicious kernel issues at home or at work. Still, I wonder if Andrew really ought to be polling the distro maintainers rather than the end-users.
A survey on kernel quality Posted Jul 10, 2006 23:42 UTC (Mon) by dlang (subscriber, #313) [Link] I'm sure that he is (along with polling other kernel developers)
A survey on kernel quality Posted Jul 11, 2006 0:26 UTC (Tue) by jd (subscriber, #26381) [Link] List of bugs I have personally experienced:
Although not bugs I know of, there are plenty of areas where the code is ugly and possibly hiding bugs that haven't been found. This is definitely true in a lot of the QoS code and multicast code.
Definitely not bugs, but features I'd definitely like to see added: KTau kernel profiling, BLUE and GREEN QoS methods, DSM and RDMA support (ccNUMA is so 20th century) and a better underlying mechanism for kernel-to-kernel communication so clustering projects can get somewhere useful. Hey, whilst I'm at it, multicast DSM and multicast RDMA would be awesome and much-needed.
Useful information: IPv6 coders might want to run the various TAHI compliance suites. At present, nobody seems to know if/how Linux is standards-compliant - either vanilla or with the USAGI patches. If USAGI turns out to be greatly superior to the standard IPv6 implementation, then the maintainers need to EITHER do some serious work on the code, OR apply the patches. If the existing system is better, then we need to know that too - if only so us IPv6 users can stop patching our kernels.
big iron Posted Jul 11, 2006 2:07 UTC (Tue) by xoddam (subscriber, #2322) [Link] Careful, or the rest of us will start coveting your high-end hardware :-)
A survey on kernel quality, my 2 cents Posted Jul 13, 2006 4:50 UTC (Thu) by filteredperception (subscriber, #5692) [Link] I started filling out the survey, but bailed when I realized it wasn't suited to ingest the input that I think I have which is useful. My input is this-
Most of the kernel bugs I've run into had to do with bleeding edge non-
However, the general answer I have for mainline stuff is this-
Has anyone made a time plot of the kernel update packages to redhat/fedora
Really, that may be a criticism for fedora. I wonder if it's possible that
I guess I just long for the day when the average time between kernel
Survey is kind of straitjacket-like Posted Jul 14, 2006 19:07 UTC (Fri) by kingdon (subscriber, #4526) [Link] The survey didn't have choices for "don't know", "no relevant experience", "none of the above", or free-form essay responses. So although I filled it out, I'm not sure whether my responses will be interpreted the way I intended.
My biggest issue with the kernel as a user has been the non-merging of a driver for the rt2500 wireless (http://sourceforge.net/projects/rt2400/). Which, as I understand it, might have a few causes but the lack of an agreed upon in-tree wireless stack implementation is probably high on the list.
So based on my personal experience I would say "worry more about features, not so much about bugs" ;-). I don't think I'm really advocating that, but the one thing which keeps biting me in my own use is a missing feature, not a bug....
I do get a few lockups and similar strange symptoms, but I have neither the ability nor desire to track down whether that is the kernel or hardware or what.
survey inflexibility Posted Jul 15, 2006 0:42 UTC (Sat) by roelofs (subscriber, #2599) [Link] I don't have direct experience with any 2.6 bugs (unless you count Ubuntu's failure to boot from HD using 2.6 due to a missing device node in the ramdisk, IIRC, but I've been blaming that on Ubuntu). However, I am aware of several that came up at work, all with Red Hat's 2.6.9 kernel and all (except maybe the most recent) fixed in the same version AFAIK. That particular use case doesn't really fit the survey, however; its "fixed in" option starts at 2.6.10, and I have no clue when/how the RH fixes might have gotten submitted "upstream."
Just another ill-fitting data point...
Greg
VFAT corruption bug Posted Jul 24, 2006 21:16 UTC (Mon) by barrygould (guest, #4774) [Link] I've been having a VFAT/FAT32 corruption problem in recent Fedora kernels: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=195440
I've been running Fedora on the same machine with the same HD for a few years, so I'm pretty sure this is something that popped up in one of the more recent kernels.
Feel free to contact me if more information is needed (my email is in the URL above).
Barry
Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds