By Jonathan Corbet
May 13, 2009
Editor's note: it's no secret that far more happens on the kernel
mailing lists than can ever be reported on this page. As a result,
interesting discussions and developments often slip by without a mention
here. This article is the beginning of an experimental attempt to improve
that situation. The idea is to briefly mention important topics which have
not yet been developed into a full Kernel Page article. Some items will
be followups from previous discussions; others may foreshadow full articles
to come.
The "In brief" article will probably not appear every week. But, if it
works out, it should become a semi-regular feature filling out LWN's kernel
coverage. Comments are welcome.
reflink(): the proposed reflink() system call was covered last week. Since then,
there have been some followup postings. reflink() v2, posted on
May 7, maintained the reflink-as-snapshot semantics. When asked about
that decision, Joel Becker responded:
"reflink() is a snapshotting call, not a kitchen sink." It seemed
like there was to be no comfort for those wanting reflink-as-copy
semantics.
reflink() v4, posted on the 11th, changed
that tune somewhat. In this version, a process which either (1) owns the
target file, or (2) has sufficient capabilities will create a link which
copies the original security information - reflink-as-snapshot,
essentially. A process lacking ownership and privilege, but having read
access to the target file, will get a reflink with "new file" security
information - reflink-as-copy. The idea is to do the right thing in all
situations, but some developers are now concerned about a system call which
has different semantics for processes running as root. This conversation
has a while to go yet.
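The v4 rules described above can be written down as a small decision function. This is only a model of the behavior as summarized here, not code from the patch: the names ReflinkMode and reflink_mode() are invented for illustration, and the failure case for a process with no ownership, no privilege, and no read access is an assumption, since the summary does not say what happens there.

```python
from enum import Enum


class ReflinkMode(Enum):
    SNAPSHOT = "snapshot"  # new link copies the original security information
    COPY = "copy"          # new link gets "new file" security information


def reflink_mode(owns_target: bool, has_capability: bool, can_read: bool) -> ReflinkMode:
    """Model which semantics a caller would get under the v4 rules.

    Hypothetical helper: the parameters stand in for the ownership,
    capability, and read-permission checks described in the text.
    """
    # Owner or sufficiently privileged caller: reflink-as-snapshot.
    if owns_target or has_capability:
        return ReflinkMode.SNAPSHOT
    # Unprivileged non-owner who can read the file: reflink-as-copy.
    if can_read:
        return ReflinkMode.COPY
    # Assumption: with none of the above, the call is refused.
    raise PermissionError("caller may not reflink this file")


if __name__ == "__main__":
    print(reflink_mode(owns_target=False, has_capability=True, can_read=True))
    print(reflink_mode(owns_target=False, has_capability=False, can_read=True))
```

The first branch is where the concern about root comes from: the same call on the same file yields different security information depending only on the caller's privilege.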
devtmpfs was also covered
last week. This patch, too, has been reposted; the resulting
conversation, again, looks to go on for a while. The return of devfs was
always going to be controversial; the first version, after all, inspired
flame wars for years before being merged. The devtmpfs developers feel
that they need this feature to provide distributions which boot quickly and
reliably in a number of situations; others think that there are better
solutions to the problem. There is no consensus on merging this code at
this time, but it is worth noting that the discussion has slowly shifted
away from general opposition and toward fixing problems with the code.
Wakelocks are back, but now the facility has been rebranded as "suspend block." The core idea is
the same: it allows code in kernel or user space to keep the system from
suspending for a brief period of time. The user-space API has changed;
there is now a /dev/suspend_blocker device which provides a couple
of ioctl() calls. Closing the device releases the block,
eliminating a potential problem with the wakelock API where a failed
process could leave a block in place indefinitely.
There has been relatively little discussion of the new code; either
everybody is happy with it now, or nobody has really noticed the new
posting yet.
Doctor, it HZ. Much of the kernel is now tickless and equipped with
high-resolution timers. So, says
Alok Kataria, there is really no need to run x86 systems with a 1ms
clock tick anymore. Running with HZ=1000 measurably slows the execution of
a CPU-bound loop. So why not lower it?
There are problems with a lower HZ value, though, many of which have, at
their source, the same problem which makes HZ=1000 more expensive: the
kernel is still not truly tickless. Yes, the periodic clock interrupt is
turned off when the processor is idle. But, when the CPU is busy, the
clock ticks away as usual. Making the system fully tickless is a harder
job than just making the idle state tickless; among other things, it pretty
much requires doing away with the jiffies variable and all that
depends on it. But, until that happens, lowering HZ will have costs of its
own.
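One of those costs is easy to see with some arithmetic. As long as timeouts are counted in jiffies, they can be no finer than one tick, so a lower HZ means coarser timeouts. The sketch below is a simplified model with hypothetical helper names (tick_period_ms(), ms_to_ticks(), effective_timeout_ms()); it rounds up to whole ticks and ignores everything else the kernel's timer code does.

```python
def tick_period_ms(hz: int) -> float:
    """Length of one clock tick in milliseconds for a given HZ."""
    return 1000.0 / hz


def ms_to_ticks(ms: int, hz: int) -> int:
    """Convert a timeout in milliseconds to whole ticks, rounding up.

    Simplified model: a tick-based timeout cannot expire between ticks,
    so any partial tick is rounded up to a full one.
    """
    return (ms * hz + 999) // 1000


def effective_timeout_ms(ms: int, hz: int) -> float:
    """Timeout actually obtained once the request is rounded to ticks."""
    return ms_to_ticks(ms, hz) * tick_period_ms(hz)


if __name__ == "__main__":
    for hz in (1000, 250, 100):
        print(f"HZ={hz}: tick={tick_period_ms(hz)}ms, "
              f"1ms request -> {effective_timeout_ms(1, hz)}ms, "
              f"15ms request -> {effective_timeout_ms(15, hz)}ms")
```

In this model, a 1ms request is honored as 1ms at HZ=1000 but stretches to 10ms at HZ=100. Code that uses high-resolution timers instead of jiffies is not subject to this rounding.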
Wu Fengguang has been trying for a while to extend /proc/kpageflags;
his patch adds a great deal
of information about the usage of memory in the system. One might think
that adding more useful information would be uncontroversial, but Ingo
Molnar continues to oppose its inclusion.
Ingo does not like the interface or the fact that it lives in
/proc; his preferred solution looks more like an extension to ftrace. More
thought toward the creation of uniform instrumentation interfaces is
probably a good idea, but the current /proc/kpageflags interface
has proved useful. It's also an established kernel ABI, so it's not going
away anytime soon. But whether /proc/kpageflags will be extended
further remains to be seen.
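For readers who have not used the existing interface, here is a rough sketch of how it can be read from user space. It assumes the layout described in the kernel's pagemap documentation of the time: one native-endian 64-bit word of flags per page frame number (PFN), with bits 0 through 10 as named in the table below. The function names are invented for this example, reading the file normally requires root, and none of the additional flags proposed in the new patch are shown.

```python
import struct

# Assumed bit assignments for the existing /proc/kpageflags interface;
# the proposed extension would add more, which are not modeled here.
FLAG_NAMES = {
    0: "LOCKED", 1: "ERROR", 2: "REFERENCED", 3: "UPTODATE",
    4: "DIRTY", 5: "LRU", 6: "ACTIVE", 7: "SLAB",
    8: "WRITEBACK", 9: "RECLAIM", 10: "BUDDY",
}

WORD = struct.Struct("=Q")  # one native-endian unsigned 64-bit word per page


def decode_flags(word: int) -> list:
    """Return the names of the known flag bits set in one 64-bit word."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if word & (1 << bit)]


def parse_kpageflags(data: bytes, first_pfn: int = 0) -> dict:
    """Decode a raw buffer of flag words into {pfn: [flag names]}."""
    result = {}
    for i in range(len(data) // WORD.size):
        (word,) = WORD.unpack_from(data, i * WORD.size)
        result[first_pfn + i] = decode_flags(word)
    return result


def read_kpageflags(first_pfn: int, count: int, path: str = "/proc/kpageflags") -> dict:
    """Read and decode `count` entries starting at `first_pfn` (needs root)."""
    with open(path, "rb") as f:
        f.seek(first_pfn * WORD.size)  # the file is indexed by PFN
        return parse_kpageflags(f.read(count * WORD.size), first_pfn)


if __name__ == "__main__":
    sample = struct.pack("=2Q", (1 << 4) | (1 << 5), 1 << 10)
    print(parse_kpageflags(sample))
```

Keeping the decoding separate from the file access means the parsing can be tried on a synthetic buffer, as in the example at the bottom, without root privileges.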