In brief

By Jonathan Corbet
June 3, 2009

Retrying core dump writes. Paul Smith posted a patch that would retry short or interrupted writes while dumping core, thus preventing the creation of an incomplete core dump when a signal arrives. Alan Cox NAK-ed the patch, noting: "The existing behaviour is an absolute godsend when you've something like a core dump stuck on an NFS mount or something trying to core dump to very slow media." But the idea did lead to some interesting discussion of which signals should interrupt a core dump (thus leaving a short core file) and which should be ignored.

There is an inherent difference between an interactive program dumping core, which a user might wish to interrupt with SIGINT, and a non-interactive process, whose core dump the user or developer might want to see run to completion. Smith describes one scenario: "a worker process might appear unresponsive due to a core being dumped and the parent would decide to shoot it with SIGINT based on various timeouts etc." No decision was made, but Roland McGrath analyzed four signal categories and noted that at least two of them need to be addressed because the current code mishandles them.
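For readers who have not met the "retry short or interrupted writes" idiom, here is a minimal user-space sketch of it in Python. It is not Smith's patch (which is kernel C code); the write_all() name and the injectable write parameter are assumptions made purely for illustration.

```python
import os


def write_all(fd, data, write=os.write):
    """Write all of data to fd, retrying short and interrupted writes.

    Illustrative only: `write` defaults to os.write and is injectable so
    the retry logic can be exercised without a real file descriptor.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            # A short write returns fewer bytes than were requested.
            written = write(fd, view[total:])
        except InterruptedError:
            # A signal arrived before any byte was written: try again.
            continue
        total += written
    return total
```

A loop like this never gives up, which is exactly the property Cox objected to: a dump stuck on a dead NFS mount could no longer be interrupted.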

Device tree. The Open Firmware "device tree" is a description of a system's hardware configuration in a standardized data structure. Some platforms have used device trees to separate the description of the hardware from the kernel running on that hardware; that, in turn, allows one kernel to support a wider variety of systems. Janboe Ye recently proposed adding device tree support to the ARM architecture, which arguably supports the widest variety of hardware of all. That has, in turn, led to a long discussion of how much device tree really helps, and how feasible it is to create a single kernel for all systems of a given architecture.

Developers of architectures using device tree seem to be happy with the results; a 2008 OLS paper describes how things went with the PowerPC architecture. Maintainers of other architectures are less convinced, though. ARM maintainer Russell King worries that device tree could turn out to be an expensive dead end; he would like to see a subset of ARM architectures converted first to find out whether it is likely to work well or not. An incremental approach probably makes sense in general, so that's how things are likely to go.

Host protected area. The "host protected area" (HPA) is an IDE concept which allows a controller to hide a portion of a drive from the operating system's view. When the HPA was introduced years ago, its primary use was to make large drives (by the standards of the day) appear small so that certain legacy operating systems would not be confused. Linux, naturally, never had any such problem, so the Linux IDE layer would traditionally disable the HPA during the probing process. That was the right thing to do at the time; it allowed Linux systems to make use of the entire drive.

It has been a while since operating systems required protection from the shock of seeing an overly-large drive. But the HPA remains for different reasons. Vendors will use the HPA to stash RAID information, for example. Windows systems often come with a full "reinstall this system from the beginning" recovery image - apparently a useful feature on that platform. Rootkits sometimes hide information there. And so on. In all cases but the last, it is probably a mistake for the operating system to overwrite the HPA on contemporary systems. So turning off HPA protection by default is no longer the right thing to do.

The libata driver subsystem has observed the HPA since the beginning, but the IDE code retains its old default. That could change, though, with a patch set posted by IDE maintainer Bartlomiej Zolnierkiewicz. These patches will cause the IDE layer to preserve the HPA by default - unless the drive has partitions which cover the HPA already. That test should be enough to ensure that older systems continue to function while avoiding trashing the HPA on newer drives. For systems not properly covered by this change, the nohpa module parameter can be used to control HPA behavior directly.
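The "preserve the HPA unless a partition already covers it" test can be sketched in a few lines of Python. This is only a model of the decision as described above, not the code from the patch set; the function name, its parameters, and the (start, count) partition format are all assumptions.

```python
def should_unlock_hpa(visible_sectors, native_sectors, partitions):
    """Decide whether the HPA should be disabled for a drive.

    visible_sectors: sectors the drive reports with the HPA in place
    native_sectors:  sectors the drive really has
    partitions:      iterable of (start_sector, sector_count) pairs

    All names here are hypothetical and chosen for illustration.
    """
    if native_sectors <= visible_sectors:
        # No HPA is configured, so there is nothing to unlock.
        return False
    # Unlock only if some partition extends past the visible end of the
    # drive, meaning an older setup already uses the hidden area.
    return any(start + count > visible_sectors for start, count in partitions)
```

For example, a drive with 1000 visible sectors out of 1200 keeps its HPA if the only partition is (0, 900), but is unlocked if a partition such as (500, 700) runs into the hidden area.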

reflink(). There's another reflink() proposal out there. This one simplifies the preserve argument slightly, replacing the set of flags with an all-or-none option for now. So reflink() can be used in the full snapshot mode (with suitable privilege) or in the reflink-as-copy mode, but with no options in between.

Control over process IDs. The proposed checkpoint/restart feature has a number of challenges to overcome. One of those is that processes can become very confused if their process ID changes suddenly. So restarting a checkpointed process requires that the process's old ID be restored as well. The use of PID namespaces can help to ensure that the requisite IDs are available, but there's no way in Linux to request that a process be started with a specific ID.

Sukadev Bhattiprolu has a proposal for a new system call to address this problem: clone_with_pids(). It would behave like ordinary clone(), but it takes an additional argument: an array of process IDs. The array contains one desired process ID for each namespace in the current hierarchy, with the first being the global namespace. A deeply-nested process can thus be created with a specific ID in each namespace where it will appear.
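The shape of that array may be easier to see in a toy model. The Python below is not the proposed system call and cannot create processes; PidNamespace, clone_with_pids_model(), and the choice to fail when an ID is already taken are all assumptions made for illustration.

```python
class PidNamespace:
    """Toy PID namespace: remembers its parent and which IDs are taken."""

    def __init__(self, parent=None):
        self.parent = parent
        self.in_use = set()

    def hierarchy(self):
        """Return the namespaces from the global one down to this one."""
        chain = []
        ns = self
        while ns is not None:
            chain.append(ns)
            ns = ns.parent
        chain.reverse()
        return chain


def clone_with_pids_model(ns, pids):
    """Reserve one requested ID per namespace level, global level first."""
    chain = ns.hierarchy()
    if len(pids) != len(chain):
        raise ValueError("need exactly one process ID per namespace level")
    # Check every level before reserving anything, so that a failure
    # leaves no namespace partially updated (a modeling assumption).
    for level, pid in zip(chain, pids):
        if pid in level.in_use:
            raise ValueError(f"process ID {pid} is already in use")
    for level, pid in zip(chain, pids):
        level.in_use.add(pid)
    return list(pids)
```

With a global namespace, a container inside it, and a third namespace nested inside the container, a request of [4242, 17, 1] asks for ID 4242 globally, 17 in the container, and 1 in the innermost namespace.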

This patch has been "gently tested" and not posted outside of the containers list, so it has seen relatively little review thus far. Expect some changes if this code starts to get closer to the mainline.

In brief

Posted Jun 4, 2009 2:56 UTC (Thu) by felixfix (subscriber, #242)

I like this new section. More than anything, it reminds me of the ship's bells I got used to in the navy -- it became a nice background event, and I suddenly realized one day that I knew what the time of day was without having to think about it, a nice change when working deep in the bowels of a ship for 12 hours at a stretch.

Thus with in brief -- as it begins to accumulate a history, such as reflink, the kernel takes on a different feeling.

Many thanks for this.

Me too

Posted Jun 6, 2009 23:46 UTC (Sat) by Velmont (guest, #46433)

Yes. I'm sorry to come with such an uninteresting comment, but I've tried writing something on the last «In brief» articles just to be able to say I really love getting these regular updates. I don't have time for LKML or other high-volume news sources, which is why I read LWN. It's the important stuff, well written and pre-selected for me. Great.

In brief

Posted Jun 4, 2009 6:44 UTC (Thu) by bronson (subscriber, #4806)

> Windows systems often come with a full "reinstall this system from the beginning" recovery image - apparently a useful feature on that platform.

That's hilarious! I always just blow away the recovery partition; I never thought about how the whole concept stinks.

Also, I like In Brief. Hope it becomes a semi-regular feature.

Wrong link

Posted Jun 4, 2009 6:55 UTC (Thu) by nikanth (guest, #50093)

Device tree. Janboe Ye recently proposed ..

In brief

Posted Jun 4, 2009 12:28 UTC (Thu) by alankila (guest, #47141)

I've written about this earlier in some LWN comment, but will say it again. Observing HPA is dangerous if not every part of the toolchain is doing it the same way.

I have made Linux installations which seem to use the entire disk image, but when the time comes to make the first write to the sectors reserved by HPA (which happens sometime much later as disk fills up), I suddenly discover Linux failing to write there, claiming that the write occurs beyond the end of device.

It took a while to figure out that the install-time system (fdisk or its equivalent) apparently did not care about HPA at all. The bug has since been fixed such that install images shrink the partitions to match HPA...

HPA can be disabled with hdparm -N: first get the last sector of the device, then set the visible limit equal to that and persist the change in the drive's config.