By Jonathan Corbet
June 3, 2009
Retrying core dump writes. Paul Smith posted a patch that would retry short or interrupted
writes while dumping core, thus preventing the creation of an incomplete
core dump when a signal arrives. Alan Cox NAKed the patch, noting: "The existing behaviour is an absolute godsend when you've something like
a core dump stuck on an NFS mount or something trying to core dump to
very slow media." But the idea did lead to some interesting
discussion of which signals should cause a core dump to be
interrupted—thus leaving a short core file—and which should be
ignored.
There is an inherent difference between an interactive program, whose
core dump a user might wish to interrupt with SIGINT, and a
non-interactive process, whose core dump the user or developer
might wish to see completed.
Smith describes one scenario: "a worker process might appear unresponsive due to a core being dumped
and the parent would decide to shoot it with SIGINT based on various
timeouts etc." No decision was made, but Roland McGrath analyzed four signal categories and noted that
at least two of the categories needed to be addressed as they are
mishandled by the current code.
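The retry behavior under discussion can be illustrated in user space. What follows is a minimal C sketch, not the kernel patch itself: write_all() is a hypothetical helper that keeps calling write() until every byte has been written, retrying after short writes and after EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical user-space helper (not the kernel code): write all `len`
 * bytes of `buf` to `fd`, retrying after short writes and after writes
 * interrupted by a signal. Returns 0 on success, -1 with errno set on
 * any other failure.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;          /* real error: give up */
        }
        if (n == 0) {
            errno = EIO;        /* no progress; avoid looping forever */
            return -1;
        }
        p += n;                 /* short write: skip what was written */
        len -= (size_t)n;
    }
    return 0;
}
```

A loop of this shape also shows why the patch was contentious: once EINTR is retried unconditionally, a signal can no longer cut the write short, which is exactly the behavior Alan Cox wanted to keep available for dumps stuck on slow media.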
Device tree. The Open Firmware "device tree" is a description of a
system's hardware configuration in a standardized data structure. Some
platforms have used device trees to separate the description of the
hardware from the kernel running on that hardware; that, in turn, allows
one kernel to support a wider variety of systems. Janboe Ye recently proposed adding device tree support to the ARM
architecture, which arguably supports the widest variety of hardware of
all. That has, in turn, led to a long discussion of how much device tree
really helps, and how feasible it is to create a single kernel for all
systems of a given architecture.
Developers of architectures using device tree seem to be happy with the
results; see this
2008 OLS paper [PDF] for a description of how things went with the
PowerPC architecture. Maintainers of other architectures are less
convinced, though. ARM maintainer Russell King worries that device tree could turn out to be
an expensive dead end; he would like to see a subset of ARM architectures
converted first to find out whether it is likely to work well or not. An
incremental approach probably makes sense in general, so that's how things
are likely to go.
The "host protected area" (HPA) is an IDE concept which allows a
controller to hide a portion of a drive from the operating
system's view. When HPA was introduced years ago, its primary use was to
make large drives (by the standards of the day) appear small so that
certain legacy operating systems would not be confused. Linux, naturally,
never had any such problem, so the Linux IDE layer would traditionally
disable the HPA during the probing process. That was the right thing to do
at the time; it allowed Linux systems to make use of the entire drive.
It has been a while since operating systems required protection from the
shock of seeing an overly-large drive. But the HPA remains for
different reasons. Vendors will use the HPA to stash RAID information, for
example. Windows systems often come with a full "reinstall this system
from the beginning" recovery image - apparently a useful feature on that
platform. Rootkits sometimes hide information there. And so on. In all
cases but the last, it is probably a mistake for the operating system to
overwrite the HPA on contemporary systems. So turning off HPA protection
by default is no longer the right thing to do.
The libata driver subsystem has respected the HPA since the beginning, but
the IDE code retains its old default. That could change, though, with a patch set posted by IDE
maintainer Bartlomiej Zolnierkiewicz. These patches will cause the IDE
layer to preserve the HPA by default - unless the drive has partitions
which cover the HPA already. That test should be enough to ensure that
older systems continue to function while avoiding trashing the HPA on newer
drives. For systems not properly covered by this change, the
nohpa module parameter can be used to control HPA behavior
directly.
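The "partitions which cover the HPA" test can be sketched roughly as follows. This is an illustrative C fragment under stated assumptions, not the code from the patch set: struct part_range and should_preserve_hpa() are hypothetical names, and partitions are modeled simply as sector ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified partition record: a sector range on the drive. */
struct part_range {
    uint64_t start;      /* first sector of the partition */
    uint64_t nsectors;   /* length of the partition in sectors */
};

/*
 * Return true if the HPA should be left in place.
 *
 * visible_sectors is the capacity the drive reports with the HPA
 * enabled; the hidden area begins at that sector. If any existing
 * partition extends past it, the drive was evidently partitioned with
 * the HPA disabled, so the HPA must be disabled again for that
 * partition to remain usable.
 */
bool should_preserve_hpa(uint64_t visible_sectors,
                         const struct part_range *parts, size_t nparts)
{
    assert(parts != NULL || nparts == 0);

    for (size_t i = 0; i < nparts; i++) {
        /* One past the last sector used by this partition. */
        uint64_t end = parts[i].start + parts[i].nsectors;

        if (end > visible_sectors)
            return false;   /* partition already reaches into the HPA */
    }
    return true;
}
```

A drive with no partitions, or with partitions that all fit inside the visible area, keeps its HPA under this rule; only drives already laid out across the full native capacity lose it.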
reflink(). There's another reflink()
proposal out there. This one simplifies the preserve argument
slightly, replacing the set of flags with an all-or-none option for now.
So reflink() can be used in the full snapshot mode (with suitable
privilege) or in the reflink-as-copy mode, but with no options in between.
Control over process IDs. The proposed checkpoint/restart feature
has a number of challenges to overcome. One of those is that processes can
become very confused if their process ID changes suddenly. So restarting a
checkpointed process requires that the process's old ID be restored as
well. The use of PID namespaces can help to ensure that the requisite IDs
are available, but there's no way in Linux to request that a process be
started with a specific ID.
Sukadev Bhattiprolu has a
proposal for a new system call to address this problem:
clone_with_pids(). It would behave like ordinary
clone(), but it takes an additional argument: an array of
process IDs. The array contains one desired process ID for each namespace
in the current hierarchy, with the first being the global namespace.
A deeply-nested process can thus be created with a specific ID in each
namespace where it will appear.
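The layout of that array can be modeled in a few lines of C. This is only a sketch of the data structure as described here; struct pid_request and requested_pid() are hypothetical names, and nothing below reflects the actual signature of the proposed system call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model of the PID array: entry 0 is the ID wanted in the
 * global namespace, entry 1 the ID in its child namespace, and so on
 * down to the namespace in which the new process will live.
 */
struct pid_request {
    size_t nr_levels;     /* number of namespace levels covered */
    const pid_t *pids;    /* pids[0] belongs to the global namespace */
};

/*
 * Return the PID requested at namespace depth `level` (0 = global),
 * or -1 if the array does not cover that level.
 */
pid_t requested_pid(const struct pid_request *req, size_t level)
{
    assert(req != NULL);

    if (level >= req->nr_levels)
        return -1;
    return req->pids[level];
}
```

For a process two namespaces below the global one, a restart tool would fill in three entries: the ID seen globally, the ID seen in the intermediate namespace, and the ID the process itself expects to see.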
This patch has been "gently tested" and not posted outside of the
containers list, so it has seen relatively little review thus far. Expect
some changes if this code starts to get closer to the mainline.