There have been no kernel releases over the last week. The 2.6.23
merge window remains open, and patches are flooding into the mainline
repository; see the article below for a summary.
Kernel development news
I just really _really_ wish we could have two fairly stable
releases in a row. I think 2.6.22 has the potential to be a pretty
good setup, and I'd really like to avoid having another 2.6.21
-- Linus Torvalds
Sysfs never tried to be an ABI/API in the usual sense, parts of it are
just a nicer looking "kernel dump". :)
You have to follow _very_ special rules to extract information here in a
way that will not produce unexpected results between kernel releases, or
even a second later on the same system.
-- Kay Sievers
In my opinion any hibernation framework that doesn't take the above
requirements into account in any way will be a failure. Moreover,
the existing frameworks fail to follow some of them too, so I
consider all of these frameworks as a work in progress. For this
reason, I will much more appreciate ideas allowing us to improve
the existing frameworks in a more or less evolutionary way, then
attempts to replace them all with something entirely new.
-- Rafael Wysocki
Some 2600 changesets have been merged into the mainline kernel
repository since last week. The shape of 2.6.23 is now becoming clearer; this kernel will
- New drivers for Dallas DS1682 elapsed time recorder chips, PMC-Sierra
MSP71xx i2c controllers, Renesas M66592 USB peripheral controllers,
Renesas R8A66597 USB host controllers, OTi-6858 USB-to-RS232 bridge
controllers, Samsung S3C24xx SoC USB device controllers, Intel iop32x,
iop33x, and iop13xx DMA engines, Xilinx SystemACE compact flash
interfaces, BCM1250 dual UART devices, OMAP24xx multichannel SPI
controllers, Atmel AVR32 AT32AP700x real-time clocks, ST M41T80 and ST
M48T59 real-time clocks, Dallas DS1216 real-time clocks, TI OMAP
framebuffers, display controllers, and LCD controllers (along with
support for a number of panels), Atmel AT32AP700X watchdog devices,
IBM z/VM virtual card readers and punches, Afatech AF9005
- After years of work, the core Xen i386 implementation has been
merged. Xen is finally a part of the mainline kernel. (Anybody who
is tempted to believe that predictions found in LWN are worth anything
may be amused by Dave Jones
poking fun at a suggestion, published in 2004, that Xen could be
merged sometime soon).
- The fallocate()
system call has been merged, but without the deallocation options.
- The developmental ext4 filesystem has gained a number of new features,
including fallocate() support, nanosecond timestamps, and
support for directories containing more than 65,000 other directories.
- The new "macvlan" driver allows the system administrator to create
virtual interfaces mapped to and from specific MAC addresses.
- A number of virtual drivers for Sun logical domains (on the SPARC64
architecture) have been added. LDOM CPU hotplug support has also been
- The bsg code - a new generic SCSI device driver based on the block
layer - has been merged.
- IPV4 multipath cached routing support has been dropped; this code
never did work very well, and never got out of the experimental
- Basic, experimental support for PPP over L2TP sockets has been added.
- A device model extension (marked experimental) can export a laptop's
desktop management information (DMI) data through sysfs. This will
allow distributors to load just the drivers needed for a specific
laptop instead of the "load them all and let the hardware sort them
out" technique which is often used now.
- The highly experimental "USB persist" feature attempts to maintain the
state of USB devices when they lose power. The driving motivation
behind this patch is to be able to suspend a system containing
filesystems on USB storage and still have those filesystems mounted
and working at resume time.
- As scheduled, the speedstep-centrino CPU governor has been removed in
favor of the acpi-cpufreq code.
- The XFS filesystem now has a "stream of files" concept which allows it
to place related files (a series of frames in a video stream, for
example) contiguously on disk.
- The AFS filesystem now has file locking support.
- The raw block driver has been un-deprecated since it appears it will
not be going away anytime soon.
- The O_CLOEXEC
open flag has been added.
- There is a new clone() flag - CLONE_NEWUSER - which
creates a new user namespace for the process; it is intended for use
with container systems.
- The long-debated memory
fragmentation avoidance patches have been merged at last; the
associated lumpy reclaim
code has been merged as well.
- The kernel virtual machine (KVM) code can now support SMP guests.
Changes visible to kernel developers include:
It's worth noting a couple of things which will not be in 2.6.23.
The first is the process
containers patch, which is not quite considered to be ready yet. Some
other features (notably CFS group scheduling) are waiting for process
containers, so chances are good that this code will be in shape for merging
The other big omission is the x86_64 clockevents, dynamic tick, and high-resolution timers
code. This patch is considered by its authors to be ready (and your editor
has been running it without ill effect), but, after the troubles caused by
the integration of the i386 version of this code in 2.6.21, there is a
desire felt by some developers to go a bit more slowly and carefully. The
result was a somewhat unhappy discussion on the mailing lists and a plan to
better split these patches so they can be carefully reviewed for the next
Universal serial bus (USB) devices do not normally have much of a security
model associated with them. If a user is able to plug a USB device into
the system, said system assumes that the device is properly authorized to
be there. There are situations where the connection of a USB device causes
people to worry; the usual scenario is the fear of corporate secrets being
copied into some sort of USB storage device and being carried out of the
building. In general, in situations where such fears run strong, the
response has involved (attempted) bans of USB devices or simply filling the
USB ports of accessible computers with glue.
Wireless USB changes the situation slightly. This protocol allows USB
devices to operate remotely, without that pesky cable to trip over; it can
be thought of as occupying a niche similar to that of Bluetooth. While a
typical laptop user might be expected to notice an attacker plugging a
normal USB keyboard into their system, said attacker could attempt to
connect a wireless USB keyboard without coming near. Clearly, some sort of
security layer is required. The wireless USB specification
has anticipated this need; it provides for a whole series of acronym-laden
techniques for (1) ensuring that both hosts and devices authenticate
themselves to each other, and (2) ensuring that wireless USB communications are
sufficiently well encrypted that they cannot be eavesdropped upon.
Iñaky Perez-Gonzalez is working on wireless USB support for Linux. He has
come to the conclusion that the grungy details of wireless USB
authentication belong in user space; the kernel cannot, on its own, keep
track of which devices are known to the system and are allowed to connect. It
is, however, up to the kernel to implement the authorization part of the
equation: a wireless USB device which is not authorized should not be able
to perform any sort of exchange with the host system. Iñaky's response to
the authorization problem is this
set of patches to the USB subsystem.
These patches add three new flags to the usb_device structure:
wusb, authorized, and authenticated. The first
indicates that a device is wireless, and the last (which is not
yet used) indicates that the device has passed authentication. In the
middle is the authorized flag which indicates whether it is OK to
talk to the device. If the device is not authorized, the kernel will not even read
its configuration to find the endpoints it provides; the only thing that
can happen at that point is authentication. To that end, various points in
the USB stack are changed to check the authorized flag before
allowing access to a USB device.
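The gating described above can be modeled in a few lines. This is a toy Python model, not kernel code: the three attribute names mirror the flags named in the text, while the UsbDevice class, the read_configuration() function, and the use of PermissionError are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsbDevice:
    """Toy stand-in for the kernel's usb_device structure (illustrative only)."""
    wusb: bool = False           # the device is wireless
    authorized: bool = False     # it is OK to talk to the device
    authenticated: bool = False  # passed authentication (not yet used, per the text)


def read_configuration(dev: UsbDevice) -> str:
    """Hypothetical access point that checks the authorized flag first."""
    if not dev.authorized:
        # The configuration of an unauthorized device is never read.
        raise PermissionError("device not authorized")
    return "configuration read"
```

In this model a newly created device starts out unauthorized, so any access attempt fails until something sets the authorized flag.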
User space is brought into the picture by way of the usual device-attach
announcement and the creation of an associated sysfs tree. The sysfs
directories for USB devices gain a new authorized attribute which
corresponds to the internal flag; user space can enable access to the
device by writing a non-zero value to that attribute. That infrastructure
is all that is required for some sort of user-space daemon to notice the
arrival of a new wireless USB device, check its database of known devices,
possibly pop up some sort of pairing dialog to the user, and implement a
decision on whether the device should be allowed to connect or not.
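A user-space daemon of the kind described above might look roughly like the following sketch. Everything here beyond the authorized attribute name is an assumption: the caller supplies the device's sysfs directory (the text does not spell out the path layout), device_id is a hypothetical identifier string, the database of known devices is reduced to a set, and ask_user stands in for a pairing dialog. Writing "0" to refuse a device is also an assumption; the text only says that a non-zero value enables access.

```python
from pathlib import Path
from typing import Callable, Set


def set_authorized(device_dir: Path, allow: bool) -> None:
    """Write a non-zero value to the device's authorized attribute to allow it."""
    (device_dir / "authorized").write_text("1" if allow else "0")


def handle_new_device(device_dir: Path, device_id: str,
                      known_devices: Set[str],
                      ask_user: Callable[[str], bool]) -> bool:
    """Decide whether a newly announced device may connect, then apply the decision."""
    # Known devices are allowed outright; unknown ones are referred to the user.
    allow = device_id in known_devices or ask_user(device_id)
    set_authorized(device_dir, allow)
    return allow
```

A real daemon would call something like handle_new_device() from its device-attach event handler; how those events are received is outside the scope of this sketch.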
Iñaky has taken things a step further by realizing that this authorization
mechanism need not be limited to wireless devices; it can, in fact, be used
to allow some sort of management code to pass judgment on any USB device.
There is a set of per-host authorized_default flags which can be
configured by the administrator; simply setting the default to zero with no
other action will disallow the connection of any new devices, whether wired
A more complex implementation might allow only certain types of devices to
connect. Keyboards and mice might be acceptable, but anything which could
remove data from a system - storage devices or printers, say - would be
disallowed. Or storage devices could be allowed, but only if they contain
some sort of properly signed authorization certificate which can be
verified by the host system. There are a number of interesting
possibilities. The resulting security will be less than that which could
be had by filling in the ports or simply configuring USB out of the system
entirely, but it might be just what is needed at some sites.
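One possible shape for such a type-based policy is sketched below. The class codes are the standard ones assigned by the USB specification; the policy itself, the function name, and the has_valid_certificate parameter are hypothetical. Certificate verification is not shown, and how a daemon would learn the class of a device that has not yet been authorized is left open, since the kernel does not read the configuration of an unauthorized device.

```python
# Class codes as assigned by the USB specification.
USB_CLASS_HID = 0x03           # keyboards, mice
USB_CLASS_PRINTER = 0x07
USB_CLASS_MASS_STORAGE = 0x08

# Hypothetical site policy: input devices are always acceptable.
ALLOWED_CLASSES = {USB_CLASS_HID}


def policy_allows(usb_class: int, has_valid_certificate: bool = False) -> bool:
    """Allow input devices, allow storage only with a verified certificate, deny the rest."""
    if usb_class in ALLOWED_CLASSES:
        return True
    if usb_class == USB_CLASS_MASS_STORAGE:
        return has_valid_certificate
    # Printers and anything unrecognized fall through to the default: deny.
    return False
```

Denying by default keeps unknown device classes out unless the administrator explicitly adds them to the allowed set.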
Overall, it's a relatively simple patch set which adds some interesting
capabilities. Much of the hard work - authentication and encryption setup
- remains, but that's a job for user space. Iñaky has asked that this code
be merged for 2.6.23; it's just a bit late, though, for a relatively
untested (in the wider world) chunk of code to slip through the merge
window. 2.6.24 seems more likely.
Back in early 2006, there was an ongoing, energetic debate over the future
of the software suspend (to disk) code - a situation which remains true
to this day. In the middle of it all, Andrew Morton jumped in
with a suggestion for
a different approach:
If you want my cheerfully uninformed opinion, we should toss both
of them out and implement suspend3, which is based on the
kexec/kdump infrastructure. There's so much duplication of intent
here that it's not funny. And having them separate like this
weakens both in the area where the real problems are: drivers.
Eighteen months later, it looks like we might just get that "suspend3" in
the form of the kexec jump
patch, posted by Ying Huang.
Ying's patch builds on the existing kdump facility. The purpose of kdump is
to provide safe and useful crash dumps in situations where the state of the
operating system is uncertain. If the system panics, it is nice to be able to save
its current state for post-mortem debugging. It is important, however,
that the buggy kernel - which is now in an untrustworthy state - not be
used to do dangerous things like write crash dump data to disk.
To avoid that situation, a small "dump kernel" is
placed in a reserved area of memory where, most of the time, it lurks
unnoticed and unneeded. Should a panic occur, a kexec() call is
made to transfer control to the dump
kernel, which will be able to start up in a known state. As long as the
dump kernel stays within its reserved area of memory, it will be able to
write the rest of the system state to disk (or wherever) in a relatively
What Andrew recognized last year is that suspend-to-disk (which is slowly
being rebranded "hibernation") does essentially the same thing: system
activity is stopped and the current system state is written to disk. If
the dump kernel could read that state back into memory and return to the
original kernel, it would be able to hibernate (and resume) the system. An
implementation along these lines would have the advantage of unifying much
of the kdump and hibernation code, thus concentrating development effort
and generally simplifying things. Plus it would be a way to eliminate the
current code, which, despite many years' tenure in the mainline, remains
The current patch does not do all of that; it is really just the first
step: making it possible to jump from the secondary kernel back into the
original kernel. The code is relatively simple, though it does rely on
much of the existing infrastructure to properly suspend and power down all
devices in the system for the jump in either direction. So if device
drivers are interfering with hibernation now, that problem will still exist
in a kexec-based implementation. But much of the other hibernation code,
including the much-maligned process freezer, would be unneeded and could be
There are a few little details to take care of before one can take a hatchet
to the current hibernation code, though. Powering down devices between the
two kernels is not really necessary or desirable; they just need to go into
a quiet "hibernate" state. A kdump kernel needs to be placed in reserved
memory from the beginning; trying to load it at panic time would be far too
late. A kernel used for hibernation, instead, need not occupy system
memory all the time, so some sort of on-demand secondary kernel loading is
needed. The actual task of saving and restoring the system image is yet to
be implemented - that can all be done easily in user space, however, with
very little in the way of kernel support. Making the resume process fast
enough will take some work - users might take a dim view of having to wait
for two kernels to boot before getting their system back. And so on.
So, in other words, nobody should be holding their breath for kexec-based
hibernation in the near future. But the initial response to this approach
was mostly positive; there seems to be a lot of interest in simply starting
over in this area. Some of that enthusiasm might fade as work progresses
and it turns out that, even with a new approach, hibernation is still a
difficult and somewhat grungy problem. So only time will tell if this code
will develop into a better hibernation implementation.
Page editor: Jonathan Corbet