Crash dumps with kexec

[Posted October 27, 2004 by corbet]

One of the longstanding wishlist items for the Linux kernel is a built-in crash dump capability. Companies providing support for Linux, such as vendors of "enterprise" distributions, want this capability for the help it can provide in tracking down those obnoxious problems which only happen at the customer's site. Numerous implementations exist, but none have made it into the mainline kernel. Among the reasons for this is a lack of comfort with the crash dump code itself. That code runs when the state of the system is known to be compromised; people tend to worry that the kernel, in that state, could do unpleasant things, like corrupting filesystems. Even code which takes pains to never touch a disk is not entirely to be trusted when the system is reeling from a panic.

The -mm tree contains an approach to crash dumps which may inspire a bit more trust. The core idea is to get the failing kernel out of the way entirely, as soon as possible, and to boot into a new kernel which can handle the real crash dump tasks.

The mechanism used is the kexec system call, which loads and boots directly into a new kernel. The original goal was simply to speed up reboots by avoiding the BIOS and the whole set of time-consuming boot-time rituals which it performs; it's the sort of feature which appeals to kernel developers. Kexec patches have been circulating for some time, though the call has yet to make its way into a mainline kernel.

Using kexec to perform crash dumps requires some additional work and advance planning. A separate kernel must be built to run when the crash dump capability is desired. This kernel needs to be as small as possible, and it must be specially configured to load into a portion of memory not used by the primary kernel. This kernel is also set up so that it only uses a small piece of the total system memory; it must be able to boot and run without changing the primary kernel's memory.

When the system is running, kexec is used to preload the crash dump kernel into its reserved portion of memory. If all goes well, it simply sits there, wasting memory, and never gets run. That overhead is simply the price one pays for running an enterprise-class kernel.

Should the system panic, however, the crash dump kernel has its day. The primary kernel, once it decides that something has gone drastically wrong, must first make a copy of the very bottom part of memory (it will get stepped on in the booting process). Once that is done, kexec is invoked to boot directly into the crash dump kernel. That kernel starts up, aware of all memory in the system, but only using the small portion which was reserved to it before. The result is a full, running Linux system with complete access to the memory state of the failed kernel.
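The sequence above can be pictured with a toy model. What follows is only a sketch under stated assumptions, not the patch's code: memory is a plain bytearray, the sizes are invented, and storing the backup of low memory at the end of the reserved region is an assumption made for illustration. The function name simulate_crash and its parameters are hypothetical.

```python
def simulate_crash(memory, reserved_start, reserved_size, low_size):
    """Toy model of the panic path; returns the old kernel's memory image.

    memory         -- bytearray standing in for all of physical RAM
    reserved_start -- start of the region set aside for the crash kernel
    reserved_size  -- size of that region
    low_size       -- size of the low memory that booting will overwrite
    All of these are illustrative values, not taken from the real patch.
    """
    reserved_end = reserved_start + reserved_size
    if reserved_end > len(memory):
        raise ValueError("reserved region extends past the end of memory")
    if low_size > reserved_start or low_size > reserved_size:
        raise ValueError("low memory must lie below, and fit in, the reserved region")

    # Step 1: the panicking kernel copies the bottom of memory somewhere safe.
    # Assumption: the copy goes to the tail of the reserved region.
    backup_at = reserved_end - low_size
    memory[backup_at:reserved_end] = memory[:low_size]

    # Step 2: booting the crash dump kernel steps on low memory.
    memory[:low_size] = bytes(low_size)

    # Step 3: the crash dump kernel sees all of memory but runs only in the
    # reserved region, so everything outside it is still the old kernel's state.
    return (bytes(memory[backup_at:reserved_end])    # saved low memory
            + bytes(memory[low_size:reserved_start])  # untouched, below the region
            + bytes(memory[reserved_end:]))           # untouched, above the region
```

The returned image leaves out the reserved region itself, since that memory never belonged to the failed kernel.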

To help with debugging of kernel crashes, the crash dump kernel provides a couple of mechanisms for inspecting the failed kernel's memory. The file /proc/vmcore provides the old kernel's memory as an ELF-format core dump; it can be looked at with gdb or another debugging tool. If need be, a char device (/dev/oldmem) can also be set up; it provides raw access to the old kernel's memory.
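Because /proc/vmcore is an ordinary ELF core file, its layout can be inspected with a few lines of code as well as with gdb. The sketch below lists the program headers of such a file. It assumes a 64-bit little-endian ELF image, which may not match every architecture, and the function name list_segments is made up for this example.

```python
import struct

PT_LOAD = 1   # a chunk of the old kernel's memory
PT_NOTE = 4   # notes, such as saved register state

# Standard ELF64 layouts: 64-byte file header, 56-byte program header.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def list_segments(data):
    """Return (type, file_offset, phys_addr, file_size) per program header."""
    (ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
     e_flags, e_ehsize, e_phentsize, e_phnum,
     e_shentsize, e_shnum, e_shstrndx) = ELF64_EHDR.unpack_from(data, 0)

    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")

    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_align) = ELF64_PHDR.unpack_from(
            data, e_phoff + i * e_phentsize)
        segments.append((p_type, p_offset, p_paddr, p_filesz))
    return segments
```

In the crash dump kernel, one might read the first few kilobytes of /proc/vmcore in binary mode and pass them to list_segments(); each PT_LOAD entry then describes one range of the failed kernel's physical memory and where it sits in the file.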

A developer might choose to work within the crash dump kernel and try to track down the problem immediately. In most deployed situations, however, the crash dump kernel is more likely to be configured to simply copy the old kernel's memory image to a disk file somewhere, then reboot back into the primary system. The crash dump file can then be examined at leisure, perhaps by remote support staff.
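The copy-then-reboot step could be as simple as the following sketch, run from the crash dump kernel's user space. The function name save_dump, the chunk size, and any destination path are assumptions for illustration; the patch does not prescribe a particular tool.

```python
import os
import shutil


def save_dump(src_path, dst_path, chunk_size=1 << 20):
    """Copy the memory image at src_path to dst_path; return bytes written."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Copy in fixed-size chunks so the small crash kernel never needs
        # to hold the whole image in its limited memory.
        shutil.copyfileobj(src, dst, chunk_size)
        # Push the data to disk before anyone reboots the machine.
        dst.flush()
        os.fsync(dst.fileno())
        return dst.tell()
```

A call such as save_dump("/proc/vmcore", "/var/crash/vmcore") would be followed by a reboot into the primary system; the destination path here is hypothetical.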

The end result of all this work should be a mechanism which can be used to track down the cause of infrequent crashes at remote sites. That should be good for the stability of the kernel as a whole - and the bottom line of enterprise support companies. See Documentation/kdump.txt from the patch for more information.

Index entries for this article
Kernel	Crash dumps
Kernel	Debugging
Kernel	Kexec

Crash dumps with kexec

Posted Oct 28, 2004 9:51 UTC (Thu) by mcatkins (guest, #4270) [Link] (2 responses)

I'd never thought much about this issue before reading this article, but
a thought occurred when I did: rather than trying to save a crash dump
to a disk file, why not send it over a network to a crash dump server
on another (i.e., not currently crashing!) machine?

Thus the local disks need not be touched - one would need to have a
working network stack though (doesn't the console logging over-the-network
stuff have some useful stuff for this purpose?).

The result might be small/safe enough to be in the main kernel, and hence
not require all that crash-kernel memory!

Just a thought!

Martin

FC2 already has netdump (and netconsole) functionality

Posted Oct 28, 2004 18:46 UTC (Thu) by epithumia (subscriber, #23370) [Link] (1 responses)

Fedora Core 2 includes network crash dumping and a network console. See /etc/sysconfig/netdump and "service netdump".

FC2 already has netdump (and netconsole) functionality

Posted Oct 29, 2004 11:41 UTC (Fri) by mcatkins (guest, #4270) [Link]

Ah. That's the trouble with good ideas - someone else has
usually had them first!

Thanks for letting me know!

Martin

Crash dumps with kexec

Posted Oct 28, 2004 11:45 UTC (Thu) by NRArnot (subscriber, #3033) [Link] (2 responses)

Crash dump to net is a good idea - much safer than writing with what might be corrupt software onto what might be a failing disk with valuable and retrievable data on it. Absolute worst it could do is a DoS on your network, and a faulty network card can do that anyway.

You don't need a full IP network stack to dump over the network. You need just enough to accomplish reliable 2-way communication with a crash dump receiver across a LAN - a "packet driver" for the network hardware plus a minimal protocol for handshaking with the receiver.

This wouldn't be much baggage to carry with a crash-dump kernel, though it would have to be configured for a particular network card (by the normal kernel at normal kernel boot time?)

BTW - I'm currently trying to work out what is wrong with a system that wedges hard at rare intervals, with an MTBF of about a fortnight. Crash dump would help only if there was a way to get a crash dump out of a system that appears to be responsive only to the reset switch (though maybe it's still doing something with keyboard interrupts, no way I can tell). Yes, it's probably a hardware fault (the same kernel doesn't do this on other very similar systems) - but a dump might point at what hardware. Nothing else does.

Crash dumps with kexec

Posted Oct 29, 2004 16:48 UTC (Fri) by amh (guest, #1902) [Link] (1 responses)

I don't think crash dump to net is much better than any other kind: a corrupt kernel can still corrupt your data.

You might try a watchdog timer (even a card) to help with your system problem. There are various drivers for them. And I guess a kernel dump could well help afterwards.

Crash dumps with kexec

Posted Oct 31, 2004 0:33 UTC (Sun) by brouhaha (subscriber, #1698) [Link]

"I don't think crash dump to net is much better than any other kind: a corrupt kernel can still corrupt your data."

I strongly disagree. If the crashdump code doesn't even TRY to access your disk drives, it's not very likely to corrupt the filesystems thereon. Of course, the crash that led to the invocation of the crashdump might already have done that, but you can't blame that on the crashdump code.