A survey on kernel quality

A survey on kernel quality

Posted Jul 10, 2006 10:00 UTC (Mon) by ken (subscriber, #625)
Parent article: A survey on kernel quality

I started to fill out the form but soon realised that the type of bug that really bothers me did not fit the questions asked.

My problem is that, since I started using 2.6 kernels, I have had several hard lockups where the machine just stops working; even ping was not responding the few times I tried.

Since I'm a software developer myself, I know just how impossible it would be to make any progress on such a report, so I have never filed one. But this type of hang really did not happen before.

And yes, I have a binary-only nvidia blob that is a really good candidate for this type of hard lockup, but it's still not 100% certain that it's nvidia's fault.


A survey on kernel quality

Posted Jul 10, 2006 12:09 UTC (Mon) by MathFox (guest, #6104)

As a developer, I once wrote a device driver for a non-FOSS OS. From my experience I can tell that these kinds of "random lockups" are hard to debug: the bugs are usually timing-sensitive, and adding debug statements to the code makes the bug go away. Furthermore, there is no easy way to get the information out of the computer when the kernel hangs.
The best way to make progress here is to find a workload that makes the bug easy to reproduce (say, having it occur once a day) and to instrument the computer with "bus snooping" hardware (logic analysers, etc.) that can provide you with a log of the activity in the milliseconds before the crash.

N.B. These kinds of Heisenbugs are influenced by any attempt to pin them down; some species can reliably detect hardware probes.

Hard lockups

Posted Jul 10, 2006 21:17 UTC (Mon) by ringerc (subscriber, #3071)

Do the keyboard LEDs blink (if running under X11, as I assume you are given your use of the NVidia drivers) when the machine crashes?

If so, it'll be panicking. You should be able to retrieve a dump of the panic by hooking another machine up to the crashing box with a null modem cable (xover serial, essentially) and booting with a serial console. There's plenty of info on how to do this on the 'net. If you can reproduce the fault _without_ the NVidia drivers loaded and send off that dump info along with a hardware summary etc., then you might actually have a bug report to make.
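As a rough illustration of the serial console setup described above, here is a minimal sketch. It assumes the crashing box's first serial port (ttyS0) at 115200 baud, 8 data bits, no parity; the port name, the speed, and the variable names are assumptions for the example, not details from the original report. It only builds the kernel parameters to append to the boot line; adding them to the bootloader configuration is left to the distribution's usual tools.

```shell
#!/bin/sh
# Assumed values: adjust to the serial port and speed actually in use.
SERIAL_PORT=ttyS0
BAUD=115200

# Kernel parameters for the crashing box's boot line.
# Kernel messages go to every console listed; the last one listed
# also becomes /dev/console.
CONSOLE_ARGS="console=tty0 console=${SERIAL_PORT},${BAUD}n8"
echo "$CONSOLE_ARGS"

# On the second machine, a terminal program can capture the output
# arriving over the null modem cable, for example (not run here):
#   screen -L /dev/ttyS0 115200
```

With these parameters on the boot line, panic output should also appear on the serial line, where the second machine can log it even if the crashing box's display is frozen.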

Hard lockups

Posted Jul 11, 2006 12:55 UTC (Tue) by vonbrand (subscriber, #4458)

In my experience, hard lockups without any traces in the logs are due to hardware faults, notably CPU overheating.

A survey on kernel quality

Posted Jul 13, 2006 9:05 UTC (Thu) by jschrod (subscriber, #1646)

Same here: hard lockups, regularly every 2nd or 3rd night. I also had nvidia loaded and suspected it at first, but using nv brought the same result.

Eventually, I traced it to the ionice call in the updatedb cron job. (This is SUSE 10.0, btw.) Removing the ionice call made my system run smoothly. Since I have no problems at all with my disks under other heavy usage, I really suspect that it's a kernel problem.

Joachim
