The current 2.6 prepatch is 2.6.21-rc5, released
on March 25. It
contains a number of fixes, including a set for timer-related regressions.
Says Linus: "Those timer changes ended up much more painful than
anybody wished for, but big thanks to Thomas Gleixner for being on it like
a weasel on a dead rat, and the regression list has kept shrinking."
See the changelog for the details.
Several dozen fixes have been merged into the mainline git repository
since -rc5 was released.
The current -mm tree is 2.6.21-rc5-mm2. Recent changes
to -mm include a new lumpy reclaim patch, an updated deadline staircase
(formerly RSDL) scheduler, a number of futex enhancements, and the integrity
management patch set (see below).
The current stable 2.6 kernel is 184.108.40.206, released on March 23.
For older kernels: 220.127.116.11 was released with
several fixes on March 26.
In the 2.4 world, 18.104.22.168
was released on March 24; it only contains two changes. 2.4.35-pre2 is also out with a
rather larger set of fixes.
Kernel development news
Anyway, if it doesn't fix a bug it is nowhere near a high-priority
patch for that seething bugfest which we like to call a kernel, so
I'll drop it.
-- Andrew Morton
In [the] future, I'd recommend adding a witty comment to any such
trivial patch: it's really the only way to get it featured on LWN's
Kernel Quote of the Week.
-- Rusty Russell
In talking with a lot of different companies recently, I've come to
the realization that we really need to do something about companies
that violate the kernel's GPLv2 license. It has been a common
criticism that "Well, our company abides by the GPL by releasing
the code properly for our kernel modules, but what about all of
those other companies that do not?" The companies that are good
members of the community are getting a lot of pressure by people
internal to them to stop releasing the code. This is justified by
pointing to the companies that do not release their code as they
are not having any "penalties" by doing this.
The "hugetlb" feature of the kernel allows applications to create and use
"huge" pages in memory. These pages use a special page table mode which
allows a single page table entry to provide the translation for up to 16MB
of contiguous memory (on some architectures). The advantage to doing
things this way is that references to the entire huge page only take up one
slot in the translation lookaside buffer (TLB), and that can have good
effects on performance.
Access to huge pages is through the hugetlbfs filesystem. Hugetlbfs is a
virtual filesystem much like tmpfs, but with a twist: mappings of files
within the filesystem use huge pages. It's not possible to do normal reads
and writes from this filesystem, but it is possible to create a
file, extend it, and use mmap() to map it into virtual memory.
This interface gets the job done, but it's evidently a little too involved
for some application programmers.
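The three steps described above (create, extend, map) can be sketched as follows. This is a minimal illustration, not code from the patch discussion: the function name `map_huge_file`, the use of `ftruncate()` for the "extend" step, and any mount point such as `/mnt/huge` are assumptions, and the requested length is assumed to be a multiple of the system's huge page size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of a file living in a mounted hugetlbfs filesystem.
 * `path` is hypothetical (for example "/mnt/huge/scratch"); `len` is
 * assumed to be a multiple of the huge page size.
 * Returns NULL on any failure.
 */
static void *map_huge_file(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    /* Step 1: create the file inside hugetlbfs. */
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    /* Step 2: extend it (assumed here to be done with ftruncate()). */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    /* Step 3: map it; the resulting mapping is backed by huge pages. */
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would release the memory with `munmap()` and remove the backing file with `unlink()` when done. Even in this short form, the need to know a mount point and the page size hints at why some programmers find the interface involved.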
To make life simpler, Ken Chen has
proposed /dev/hugetlb. This
device is much like /dev/zero, except that it uses huge pages.
Applications can simply open the device and use mmap() to create
as much huge-paged anonymous memory as they need. The patch is simple and
seemingly uncontroversial; Andrew Morton did note, though:
afaict the whole reason for this work is to provide a quick-n-easy
way to get private mappings of hugetlb pages. With the emphasis on
quick-n-easy.
We can do the same with hugetlbfs, but that involves (horror) fuss.
The way to avoid "fuss" is of course to do it once, do it properly
then stick it in a library which everyone uses.
He goes on to observe, however, that getting yet another library
distributed widely can be a difficult task - to the point that it's easier
to just add more functionality within the kernel itself. He concludes:
"This comes up regularly, and it's pretty sad."
In a separate message, Andrew talked about
how kernel interfaces should be designed in general:
The fact that a kernel interface is "hard to use" really shouldn't
be an issue for us, because that hardness can be addressed in
libraries. Kernel interfaces should be good, and complete, and
maintainable, and etcetera. If that means that they end up hard to
use, well, that's not necessarily a bad thing. I'm not sure that
in all cases we want to be optimising for ease-of-use just because
In many cases, the C library fills this role by providing a more
application-friendly interface to kernel calls. But there are limits to
how much code even the glibc developers want to stuff into the library, and
things like a friendlier huge page interface may be on the wrong side of
the line. A separate library for developers trying to do obscure and
advanced things with the kernel might be the right solution.
The right solution, Andrew suggests, is to have a user-space API library
which is maintained as part of the kernel itself. That would keep
oversight over the API and help to ensure that the library is maintained
into the future while minimizing the amount of code which goes into the
kernel solely for the purpose of creating friendlier interfaces. Somebody
would have to step up to create and maintain that library, though; as of
this writing, volunteers are in short supply.
The dynamic tick code
in the upcoming 2.6.21 kernel seeks to avoid processor wakeups by turning
off the periodic timer tick when nothing is happening. Before stopping the clock,
the kernel must decide when it should wake up again; this decision involves
looking at the timer queue to see when the next timer expires. In the
absence of other events (hardware interrupts, for example), the system will
sleep until the nearest timer is due.
Many of these timers should, in fact, run as soon as the requested period
has expired. Others, however, are less important - to the point that they
are not worth waking up the processor. These non-critical timeouts can run
some fraction of a second later (when the processor wakes up for other
reasons) and nobody will notice the difference. So it would be nice if there
were a way to tell the kernel that a specific timer does not require
immediate action on expiration and that the processor should not wake up
for the sole purpose of handling it.
Venki Pallipadi has created such a way with the deferrable timers patch. There is just one
new function added to the internal kernel API:
void init_timer_deferrable(struct timer_list *timer);
Timers which are initialized in this fashion will be recognized as
deferrable by the kernel. They will not be considered when the kernel
makes its "when should the next timer interrupt be?" decision. When the
system is busy these timers will fire at the scheduled time. When things
are idle, instead, they will simply wait until something more important
wakes up the processor.
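The effect on the "next timer interrupt" decision can be modeled in user space. The sketch below is not kernel code: `struct sim_timer`, `next_idle_wakeup()`, and `timers_due()` are hypothetical names invented for illustration, and tick-counter wraparound is ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for a kernel timer; not the real struct timer_list. */
struct sim_timer {
    unsigned long expires;  /* expiry time, in ticks */
    bool deferrable;        /* true if set up as a deferrable timer */
};

#define NO_WAKEUP (~0UL)

/*
 * Choose the tick at which an idle processor must wake up: the earliest
 * expiry among the non-deferrable timers. Returns NO_WAKEUP if none exist.
 */
static unsigned long next_idle_wakeup(const struct sim_timer *t, size_t n)
{
    unsigned long next = NO_WAKEUP;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].deferrable)
            continue;  /* not worth waking the processor for */
        if (t[i].expires < next)
            next = t[i].expires;
    }
    return next;
}

/* Count every timer, deferrable or not, that is due once awake at `now`. */
static size_t timers_due(const struct sim_timer *t, size_t n, unsigned long now)
{
    size_t due = 0;

    for (size_t i = 0; i < n; i++)
        if (t[i].expires <= now)
            due++;
    return due;
}
```

With timers expiring at ticks 100 (deferrable), 250, and 400, the idle wakeup is chosen as 250; at that point both the deferred timer and the one that caused the wakeup are due and run together.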
Venki appears to have gone to great lengths to minimize the changes required
by this patch. So, in particular, the timer_list structure does
not change at all. Instead, the low-order bit on an internal pointer
(which is known to always be zero) is repurposed as a "deferrable" flag.
The result is that the timer_list structure does not grow to support this new
functionality, at the cost of requiring all code using the internal
base pointer to mask out the "deferrable" bit.
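The low-order-bit trick is a general technique and can be shown outside the kernel. The following is a sketch under stated assumptions: `struct sim_base`, `DEFERRABLE_FLAG`, and the helper names are hypothetical and do not reflect the identifiers used in the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's internal timer base; contents are illustrative. */
struct sim_base {
    int cpu;
};

#define DEFERRABLE_FLAG ((uintptr_t)1)

/* Pack a base pointer and the "deferrable" flag into one word. */
static uintptr_t base_tag(struct sim_base *base, bool deferrable)
{
    uintptr_t raw = (uintptr_t)base;

    /* The trick only works if the low bit of the pointer is always zero. */
    assert((raw & DEFERRABLE_FLAG) == 0);
    return deferrable ? (raw | DEFERRABLE_FLAG) : raw;
}

/* Read the flag back out of the tagged word. */
static bool tagged_is_deferrable(uintptr_t tagged)
{
    return (tagged & DEFERRABLE_FLAG) != 0;
}

/* Every user of the pointer must mask the flag bit before dereferencing. */
static struct sim_base *tagged_base(uintptr_t tagged)
{
    return (struct sim_base *)(tagged & ~DEFERRABLE_FLAG);
}
```

The structure holding the tagged word stays the same size, which is the benefit described above; the cost is that any code forgetting to call something like `tagged_base()` would dereference a pointer that is off by one byte.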
The patch, as presented, only affects timers used within the kernel; no
code has been changed to actually use deferrable timers yet. There could
be potential in extending this interface somehow to user space. Our user
space remains full of applications which feel the need to wake up
frequently to check
the state of the world; these applications are a real
problem for power-limited systems. If those applications truly cannot be
fixed, perhaps they could at least indicate a willingness to wait when
nothing important is going on.
Certain patches seem to pop up occasionally on the kernel lists for years.
One of those is the whole integrity management patch set from IBM; these
patches were last covered here in November, 2005. They are back
for consideration yet again. Integrity management still looks like it is
not ready for inclusion into the mainline, but it is getting closer; at
some point it will force consideration of some interesting questions.
The core idea behind integrity management is providing some sort of
assurance that the files on the system have not been messed with. David
Safford described it this way:
[B]asically this integrity provider is designed to complement
mandatory access control systems like selinux and slim. Such
systems can protect a running system against on-line attacks, but
do not protect against off-line attacks (booting Knoppix and
changing executables or their selinux labels), or against attacks
which find weaknesses in the kernel or the LSM module itself.
The current patches work, at the lowest level, by defining a new set of
security module hooks for an "integrity provider." The provider can hook
into system calls which access or execute files and check the integrity of
those files; should it conclude that Bad Things have happened, access to
the files can be denied. On top of that is the EVM ("extended verification
module") code, which checks the integrity of files (and their
metadata) by checksumming them and comparing the result with a value stored
as an extended attribute. The IBAC (integrity-based access control) module
can then use EVM and the LSM hooks to allow or deny access to files based
on the conclusions reached by the integrity checker.
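The check-and-deny flow can be illustrated with a small user-space model. This is only a sketch: the function names are hypothetical, the stored value is passed in directly rather than read from an extended attribute, and FNV-1a is used as a stand-in because it fits in a few lines. It is not a cryptographic checksum and is not what the EVM code uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a: a simple, non-cryptographic stand-in for the real checksum. */
static uint64_t toy_checksum(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/*
 * Hypothetical hook body: recompute the checksum of the file contents
 * and compare it with the value that would have been stored alongside
 * the file. Returns 0 to allow access, -EACCES to deny it.
 */
static int integrity_check(const unsigned char *data, size_t len,
                           uint64_t stored)
{
    return toy_checksum(data, len) == stored ? 0 : -EACCES;
}
```

In this model, a file whose contents change by even one byte after its checksum was recorded fails the comparison and access is refused. Protecting the stored value itself from tampering is a separate problem, which is where the signing described below comes in.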
All of this can work using a passphrase supplied by the system
administrator, but the intended mode of operation uses the trusted platform
module (TPM) built into an increasing number of computers. With
cooperation from the system's BIOS, the TPM can do an effective job of
checksumming the software running on the system. The TPM also performs
basic cryptographic functions, like signing the checksums used to verify
the integrity of files. The key aspect of the system, though, is that the
TPM can be set up to create these signatures only if the checksums for the
running system match a set of pre-configured values. The end result is
that the checksums associated with files cannot be changed on another
system or by booting a different kernel - at least, not in a way which
preserves their value as checksums. If the system holds together as
advertised, it should be able to prevent attacks based on changing
the files used by the system.
Beyond that, this system supports remote attestation: providing a
TPM-signed checksum to a third party which proves that only approved
software is running on the system.
There are clear advantages to a structure like this. A Linux-based teller
machine, say, or a voting machine could ensure that it has not been
compromised and prove its integrity to the network. Administrators in
charge of web servers can use the integrity code in similar ways. In
general, integrity management can be a powerful tool for people who want to
be sure that the systems they own (or manage) have not been reconfigured into
spam servers when they weren't looking.
The other side of this coin is that integrity management can be a powerful
tool for those who wish to maintain control over systems they do not own.
Should it be merged, the kernel will come with the tools needed to create a
locked-down system out of the box. As these modules get
closer to mainline inclusion, we may begin to see more people getting
worried about them. Quite a few kernel developers may oppose license terms
intended to prevent "tivoization," but that doesn't mean they want to actively
support that sort of use of their software. Certainly it would be harder
to argue against the shipping of locked-down, Linux-based gadgets when the
kernel, itself, provides the lockdown tools.
For now, that issue can be avoided; there are still plenty of more mundane
problems with this patch set. But, sooner or later, the integrity
management developers are going to get past the lower-level issues; they
have certainly shown persistence in working on this patch. Based on his
prior statements, Linus is
unlikely to oppose the merging of these modules
once they are ready. Whether the rest of the development community will be
so welcoming remains to be seen.
Page editor: Jonathan Corbet