Crash dumps with kexec

Posted Oct 28, 2004 11:45 UTC (Thu) by NRArnot (subscriber, #3033)
Parent article: Crash dumps with kexec

Crash dump to net is a good idea - much safer than writing with what might be corrupt software onto what might be a failing disk with valuable and retrievable data on it. Absolute worst it could do is a DoS on your network, and a faulty network card can do that anyway.

You don't need a full IP network stack to dump over the network. You need just enough to accomplish reliable 2-way communication with a crash dump receiver across a LAN - a "packet driver" for the network hardware plus a minimal protocol for handshaking with the receiver.

This wouldn't be much baggage to carry with a crash-dump kernel, though it would have to be configured for a particular network card (by the normal kernel, at normal boot time?).
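To make the "minimal protocol" idea concrete, here is a rough sketch of a stop-and-wait handshake: each chunk of memory is sent in a frame carrying a sequence number and a CRC, and the sender retransmits until the receiver acknowledges that sequence number. Everything here is an assumption for illustration (the frame layout, the names `make_frame`, `parse_frame`, `Receiver` and `send_dump`, and the `link` callable standing in for the packet driver); it is not how any existing crash-dump code works.

```python
import struct
import zlib

HEADER = struct.Struct("!IH")   # hypothetical layout: sequence number, payload length
TRAILER = struct.Struct("!I")   # CRC32 over header + payload


def make_frame(seq, payload):
    """Build one frame: header, payload, CRC32 trailer."""
    head = HEADER.pack(seq, len(payload))
    return head + payload + TRAILER.pack(zlib.crc32(head + payload))


def parse_frame(frame):
    """Return (seq, payload), or None if the frame is truncated or damaged."""
    if len(frame) < HEADER.size + TRAILER.size:
        return None
    body, trailer = frame[:-TRAILER.size], frame[-TRAILER.size:]
    (crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:
        return None
    seq, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if len(payload) != length:
        return None
    return seq, payload


class Receiver:
    """Crash dump receiver: accepts frames in order and acknowledges them."""

    def __init__(self):
        self.expected = 0
        self.chunks = []

    def handle(self, frame):
        """Return the acknowledged sequence number, or None for no reply."""
        parsed = parse_frame(frame)
        if parsed is None:
            return None                 # damaged frame: stay silent
        seq, payload = parsed
        if seq == self.expected:
            self.chunks.append(payload)
            self.expected += 1
        # Re-acknowledge duplicates so a lost ack does not stall the sender.
        return seq if seq < self.expected else None


def send_dump(memory, link, chunk_size=1024, max_tries=5):
    """Send `memory` chunk by chunk; `link(frame)` returns an ack or None."""
    for seq, offset in enumerate(range(0, len(memory), chunk_size)):
        frame = make_frame(seq, memory[offset:offset + chunk_size])
        for _ in range(max_tries):
            if link(frame) == seq:
                break                   # acknowledged, move to the next chunk
        else:
            raise IOError("no acknowledgement for chunk %d" % seq)
```

In a real dump kernel `link` would be the packet driver sending a raw frame and waiting, with a timeout, for the reply; here it is just a function so the logic can be tried out without hardware.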

BTW - I'm currently trying to work out what is wrong with a system that wedges hard at rare intervals, with an MTBF of about a fortnight. A crash dump would help only if there were a way to get one out of a system that appears to be responsive only to the reset switch (though maybe it's still doing something with keyboard interrupts; there's no way I can tell). Yes, it's probably a hardware fault (the same kernel doesn't do this on other very similar systems) - but a dump might point at which hardware. Nothing else does.


Crash dumps with kexec

Posted Oct 29, 2004 16:48 UTC (Fri) by amh (guest, #1902)

I don't think crash dump to net is much better than any other kind: a corrupt kernel can still corrupt your data.

You might try a watchdog timer (even a card) to help with your system problem. There are various drivers for them. And I guess a kernel dump could well help afterwards.
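For what it's worth, the userspace side of a watchdog is small. The sketch below assumes the standard Linux watchdog device interface, where any write to the device restarts the hardware countdown and the machine is reset if the writes stop. The function name `pet_watchdog` and the `is_healthy` check are made up for illustration.

```python
import time


def pet_watchdog(device, is_healthy, interval=1.0, sleep=time.sleep):
    """Write a keepalive to `device` for as long as is_healthy() is true.

    Returns the number of keepalives written. Once this loop stops (or the
    whole system wedges), nothing restarts the countdown any more and the
    watchdog hardware resets the machine when it expires.
    """
    count = 0
    while is_healthy():
        device.write(b"\0")     # any write counts as a keepalive
        device.flush()
        count += 1
        sleep(interval)
    return count


# Possible use on a machine with a watchdog driver loaded (needs root):
#
#     with open("/dev/watchdog", "wb", buffering=0) as dev:
#         pet_watchdog(dev, lambda: True, interval=5.0)
```

The interval has to be comfortably shorter than the watchdog's timeout, which depends on the driver and its configuration.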

Crash dumps with kexec

Posted Oct 31, 2004 0:33 UTC (Sun) by brouhaha (subscriber, #1698)

"I don't think crash dump to net is much better than any other kind: a corrupt kernel can still corrupt your data."

I strongly disagree. If the crashdump code doesn't even TRY to access your disk drives, it's not very likely to corrupt the filesystems thereon. Of course, the crash that led to the invocation of the crashdump might already have done that, but you can't blame that on the crashdump code.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds