Brief items
The current development kernel is 2.6.39-rc1,
released by Linus on March 29. "So
2.6.39-rc1 is out there, and the merge window is closed. I still have to
look over the cleancache pull request (which I got in plenty of time, but
decided that I want to review after the merge window craziness is over),
but other than that, we're done." Significant changes in 2.6.39 will
include the
open by handle system calls,
CLOCK_BOOTTIME, an option to force all
interrupt handlers to run as threads,
ipset, the
transcendent memory core, the
media controller subsystem, the
CHOKe flow scheduler, and much more; see
the
long-format changelog for details.
Stable updates: the 2.6.32.36,
2.6.33.9, 2.6.37.6, and 2.6.38.2 stable kernel updates were released
on March 28. Each contains another long list of important fixes; note
that the 2.6.37.6 update will be the last for the 2.6.37 series.
Previously, 2.6.32.35 was released on
March 24 with a single build fix.
The C preprocessor... It is ugly, inelegant, painful, annoying,
and should have been strangled at birth -- but it is always there
when you need it!
--
Paul McKenney
Make Linux Software presents the fastest ever embedded Linux boot
for 720 MHz ARM and NAND flash memory. Linux boot time is 300
milliseconds from boot loader to shell.
--
Constantine Shulyupin
Our hammer is kernel patches and all problems look like nails, but
we'd end up with better user interfaces and a better kernel if we'd
just stop stuffing more and fatter user interface code into the
kernel.
--
Andrew Morton
By Jonathan Corbet
March 30, 2011
The
jump label mechanism was last seen here
in October, 2010. In short, jump label allows the optimization of "highly
unlikely" code branches to the point that their normal overhead is close to
zero. This speedup is done with runtime code patching; that is also the
cost: enabling or disabling the unlikely case is an expensive operation.
Thus, jump label is best used for code which is almost never enabled;
tracepoints and
dynamic debugging
statements are obvious cases.
There were a number of complaints about the initial jump label
implementation, including the fact that it was somewhat awkward to use. In
response, a reworked version has been
posted which changes the interface considerably. One starts by declaring a
"jump key":
    #include <linux/jump_label.h>
    struct jump_label_key my_key;
Enabling and disabling the key is a simple matter of calling:
    void jump_label_inc(struct jump_label_key *key);
    void jump_label_dec(struct jump_label_key *key);
And using the key to control the execution of rarely-needed code becomes:
    if (static_branch(&my_key)) {
        /* Unlikely stuff happens here */
    }
In the absence of full jump label support, a jump key is represented by an
atomic_t value. jump_label_inc() becomes
atomic_inc(), jump_label_dec() becomes
atomic_dec(), and static_branch() is implemented with
atomic_read(). If jump label is configured into the kernel,
enabling and disabling a jump key become heavier operations,
while static_branch() becomes nearly free. For the intended use
cases for jump labels, that is a worthwhile tradeoff.
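The fallback behavior described above can be sketched in user space. What follows is an illustration only, not the kernel's code: it uses C11 atomics in place of the kernel's atomic_t, and the enabled field, the my_key variable, the hot_path() function, and the unlikely_hits counter are all made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a jump key when jump label is not available: a counter.
 * The field name "enabled" is an assumption made for this sketch. */
struct jump_label_key {
    atomic_int enabled;
};

/* Enable the key (or add one more user of it); mirrors atomic_inc(). */
void jump_label_inc(struct jump_label_key *key)
{
    atomic_fetch_add(&key->enabled, 1);
}

/* Drop one user of the key; mirrors atomic_dec(). */
void jump_label_dec(struct jump_label_key *key)
{
    atomic_fetch_sub(&key->enabled, 1);
}

/* The test made on the hot path; mirrors atomic_read(). */
bool static_branch(struct jump_label_key *key)
{
    return atomic_load(&key->enabled) != 0;
}

/* Hypothetical user: counts how often the rarely-enabled path runs. */
struct jump_label_key my_key;
int unlikely_hits;

void hot_path(void)
{
    if (static_branch(&my_key))
        unlikely_hits++;    /* Unlikely stuff happens here */
}
```

In this form every call to hot_path() pays for an atomic read and a conditional branch. With real jump label support that test is patched out of the instruction stream, and the cost moves into jump_label_inc() and jump_label_dec() instead.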
As of this writing, these changes have not been merged for 2.6.39. There
is always a possibility that they could be pulled in before -rc2, but
chances are that, at this point, the new jump label will have to jump into
2.6.40.
By Jonathan Corbet
March 30, 2011
The APM power management interface has never been much loved - even ACPI
was seen as a better alternative. There has been little or no hardware
made which depends on APM for some years; Windows evidently stopped
supporting it in 2006. Linux does still support APM, though, and that
support has a cost, so it is perhaps not surprising that Len Brown
would like to remove that support as of
2.6.40.
Removal of APM support on that schedule is almost certainly not going to
happen; a number of developers have expressed concerns that there may still
be hardware out there in use which would then be unable to run new
kernels. In general, the Linux kernel tries not to abandon users running
older hardware. So APM may stay for a while, but there is a problem:
keeping APM support, it seems, conflicts with some needed changes to the
cpuidle code. The need to keep APM working, in other words, threatens to hold back
improvements for the majority of users who have more current hardware.
The solution to this conflict may take the form of a partial
removal of APM support. The most important APM feature for users of old
systems is likely to be the ability to power-off the system; other features
may be less important. As Andi Kleen noted, idle support probably matters less to
such users:
Phasing out APM idle at least would be reasonable. Presumably even
if the old laptops still work they are likely on AC because their
batteries have long died. So using a bit more power in idle
shouldn't be a big issue.
So APM support, as such, may stick around for a while, but it may begin to
lose features as the kernel moves on.
Kernel development news
By Jonathan Corbet
March 29, 2011
There have been just over 2,200 non-merge changesets pulled into the mainline since
the second installment in this series; that
makes 8,757 total changes for this development cycle. The
2.6.39 merge window is now closed, so the feature set for this kernel
development cycle should be complete. User-visible changes merged in the
final part of the merge window include:
- Beginning user namespace support has
been merged. User namespaces are a sort of container where processes
can safely be given root access within the container without being
able to affect the rest of the system. Full container support is a
long-term project, but the user namespace patches get the kernel one
step closer.
- It is now possible for a suitably privileged process to write
to a process's /proc/pid/mem file.
- The "group isolation" tunable for the CFQ I/O scheduler has been
removed; group isolation is always provided now that the performance
issues associated with that mode have been fixed.
- There is a new "mtdswap" block device which allows swapping directly
to memory technology devices.
- New hardware support includes:
- Processors and systems: Samsung Laptop SABI interfaces,
WMI Hotkeys for Dell All-In-One series,
Intel Medfield platform thermal sensors, and
Asus Notebook WMI interfaces.
- Miscellaneous: MSM chipset SMD packet ports,
Texas Instruments TWL4030 hardware monitoring controllers,
ST-Ericsson AB8500 voltage monitors,
Maxim Semiconductor MAX8997/8966 PMICs,
Maxim 8997/8966 regulators,
Texas Instruments TPS61050/61052 boost converters,
Ricoh R5C592 card readers, and
OLPC XO-1.5 ebook switches.
- Video4Linux: Technisat USB2.0 DVB-S/S2 receivers,
Siliconfile NOON010PC30 CIF camera sensors,
DiBcom 9000 tuners,
3com homeconnect "ViCam" cameras,
OmniVision OV9740 sensors,
ST Microelectronics STV0367 demodulators,
OMAP3 camera controllers,
Divio NW80x-based camera controllers, and
ITE Tech IT8712/IT8512 infrared transceivers.
Changes visible to kernel developers include:
The 2.6.39 kernel now goes into the stabilization phase of the development
cycle. If the usual pattern holds, we can expect to see on the order of
2000 fixes merged between now and the final release, which is likely to
happen in early June.
By Jonathan Corbet
March 29, 2011
Linux users in the Good Old Days were treated to a number of experiences
which are denied to newcomers; one of those was the tiresome task of
figuring out where peripheral devices had chosen to put their I/O ports and
interrupt lines and communicating that information to the kernel.
Contemporary, self-describing hardware has taken a lot of the fun away in
the name of making things Just Work.
This kind of joy can still be had at the embedded level, though, where the
trend toward discoverable hardware has not caught on in the same way.
Recent discussions show that there is not, yet, a consensus among kernel
developers regarding how such hardware should be configured.
The OMAP-based PandaBoard is a popular
platform for those who are interested in experimenting with embedded
applications. It comes with a dual-core processor, high-definition video
capability, wireless networking, Bluetooth, an HDMI output, and the sadly
standard closed graphics usually associated with these devices. It also
has a "USB-attached" network port which is actually soldered to the board;
it looks like a USB device, but it's not something the user could unplug
without an act of significant violence.
This network port has moved developers toward violence for other reasons as
well. It is recognizable as a network device, but there is no way to know
that it is wired down. The board developers, in a move which is common in
this area, also left out the small EEPROM which would normally contain the
MAC address for this interface. In response to these design decisions, a
standard Linux kernel
booting on this board will call its network interface usb0 (a name
normally used for
USB point-to-point connections), and will generate a random MAC address for
it. Anybody who might depend on a MAC address which is stable across boots
will be out of luck.
This kind of non-discoverable hardware is common in the embedded sphere, so
a number of techniques have been developed to allow the kernel to run on
the resulting systems. The traditional approach is through the creation of
"board files"; see board-msm7x30.c as an
example. These files are meant to provide the kernel with enough
information to understand the topology of the hardware it is running on;
information related to specific devices is typically passed through a set
of static platform_device structures, and through that structure's
platform_data pointer in particular. As the driver initializes
the device, it can refer to the platform_data pointer (which
points to some sort of device-specific structure) for any information which
it cannot get from the hardware itself.
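The board-file pattern can be illustrated with a stripped-down, user-space model. The structures below are simplified stand-ins for the kernel's struct device and struct platform_device, and everything named my_eth or board_eth (including the MAC address) is hypothetical; the sketch only shows how a driver pulls device-specific configuration out of an untyped pointer.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct device {
    void *platform_data;          /* opaque, device-specific configuration */
};

struct platform_device {
    const char *name;
    struct device dev;
};

/* Hypothetical configuration structure agreed on by board file and driver. */
struct my_eth_platform_data {
    const char *ifname;
    unsigned char mac[6];
};

/* "Board file" side: static description of a device known to be present. */
struct my_eth_platform_data board_eth_data = {
    .ifname = "eth0",
    .mac = { 0x02, 0x00, 0x00, 0x12, 0x34, 0x56 },   /* made-up address */
};

struct platform_device board_eth_device = {
    .name = "my-eth",
    .dev = { .platform_data = &board_eth_data },
};

/* "Driver" side: state filled in at probe time. */
struct my_eth_priv {
    char ifname[16];
    unsigned char mac[6];
    int mac_from_board;           /* 1 if the board file supplied the MAC */
};

int my_eth_probe(struct platform_device *pdev, struct my_eth_priv *priv)
{
    /* The compiler cannot check that this really is the expected type. */
    const struct my_eth_platform_data *pd = pdev->dev.platform_data;

    memset(priv, 0, sizeof(*priv));
    if (!pd) {
        /* No board data: fall back to a default name and no fixed MAC. */
        snprintf(priv->ifname, sizeof(priv->ifname), "usb0");
        return 0;
    }
    snprintf(priv->ifname, sizeof(priv->ifname), "%s", pd->ifname);
    memcpy(priv->mac, pd->mac, sizeof(priv->mac));
    priv->mac_from_board = 1;
    return 0;
}
```

Calling my_eth_probe(&board_eth_device, &priv) leaves priv holding the name and address chosen by the board file. The assignment from the void pointer at the top of the probe function compiles without complaint no matter what the board file actually stored there.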
The current platform_data implementation will not work for the
PandaBoard, though, because platform_data is not passed to USB
devices. These devices are meant to be entirely discoverable and
self-describing, so it was thought that there would be no need for external
configuration data in the kernel. The fact that these devices are dynamic
means that their existence cannot be known or guaranteed when the board
file is written, so trying to create static platform data for them would
seem to make little sense.
The problem with this reasoning is that the PandaBoard's network interface is not
fully discoverable and it is not dynamic. It is a sort of platform device
disguised as a USB device. So Andy Green thought it would be reasonable to use platform
data as a way of configuring this device; in particular, he would like to
pass the device name (eth0 instead of usb0) and a MAC
address via a platform_data pointer. What he got was an extended
discussion making it clear that (1) the platform data mechanism is not
universally loved, and (2) there is not a complete consensus on how
this kind of problem should really be solved.
There are a couple of perceived problems with platform data; the first of those
is that it encodes the information about a specific hardware configuration
in the kernel itself. That leads to a proliferation of board files in the
kernel source - each of which is controlled by its own configuration option
- and makes it hard to build kernels which can run on multiple boards. The
platform_data pointer itself, being a void pointer, is
seen as not being type-safe: there is no way for the compiler to ensure
that every board file is passing the right type of pointer to every device
driver. For these reasons, there is strong opposition to expanding the
platform data mechanism.
What are the alternatives? One of those is to do everything in user space,
using udev rules. This approach appeals to those who want to see
no policy in kernel space, but it is hard to implement in this case; there
is no information available to distinguish this wired-down network
controller from the traditional USB variety. Some developers are also
unconvinced that replacing in-kernel board files with fragile-looking (to
them) user-space configuration files which must be pushed to distributors
is the way toward a more robust solution. It is also argued that the
device naming policy (usb0, in this case) is already in the
kernel; the discussion is about the details of what that policy should be.
The other approach would be to use device
trees, which are meant for just this type of application. A device
tree would allow the passing of configuration-specific information into
drivers without the need to put board-specific hacks into the drivers
themselves. As more components show up in both consumer and deep embedded
situations, this capability will only become more useful. For these
reasons, Arnd Bergmann thought that this
problem would be an ideal place to demonstrate the use of device trees:
Let's make this the first use case where a lot of people will want
to have the device tree on ARM. The patch to the driver to check
for a mac-address property is trivial, and we can probably come up
with a decent way of parsing the device tree for USB devices, after
all there is an existing spec for it.
The problem with the device tree approach is that its adoption, in general,
is slow, especially in the ARM architecture which, arguably, has the most
need of it. It does not seem like a solution for people who have a
PandaBoard now and would like it to work; it is also not immediately
applicable to all of those systems which are currently described by board
files and platform data. While many people seem to see a transition to
device trees as something which will happen eventually, few of them are
holding their breath in anticipation of an immediate changeover.
So what is a PandaBoard owner to do? There are, it seems, a couple of
short-term solutions which will fix this particular board without waiting
for longer-term answers. One is a patch from
Arnd which will cause USB-attached Ethernet devices to carry
an ethN name unless they are known to be point-to-point
connections. For the MAC address problem, Alan Cox has suggested a hack which would allow the board
file to take control of the address assignment for a specific interface.
Neither of these solutions addresses the real problem, but they will give
some breathing room while the proper fix is debated.
By Jonathan Corbet
March 29, 2011
Unix-like systems tend to be well hardened against attacks from outside,
but more vulnerable to attacks by local users. One of the softer spots in
most systems has to do with "fork bombs" - processes which madly
fork() until they run the system out of resources. These attacks
are difficult to defend against and difficult to stop without a reboot;
they can also, at times, be created inadvertently. If Hiroyuki Kamezawa
has his way, fork bombs will be less of a problem in the future.
The problem with fork bombs is that they are moving targets; by the time a
system administrator notices a rapidly-forking process, it may have created
vast numbers of children and exited. Killing processes individually in a
fork bomb situation is not really an option; even a program written
especially for this task can be hard put to keep up with the stream of new
processes. There is just no way to get a handle on the entire tree of
offending processes from user space. So it is not surprising that the best
response in this situation can be to hit the Big Red Button and start over.
Even if, as in Kamezawa-san's case, hitting the button involves walking to
another building where the afflicted system is housed.
Indeed, it can be hard to get a handle on this tree from kernel space as
well. The process tree only exists, as such, as long as the parent processes
remain alive; once a process exits, all of its children are reparented to
the init process. That causes a flattening of the tree structure and makes
it hard to identify all of the processes involved in the attack. So
Kamezawa-san's patch starts with the
addition of a new process tracking structure. It is organized as a simple
tree reflecting the actual family structure of the processes on the
system. It differs from existing data structures, though, in that this
"history tree" persists even when some processes exit. That allows
the kernel to view the entire tree of processes involved in a fork bomb
even if those which launched the attack have long since gone away.
Keeping the entire history of all processes created over the lifetime of a
Linux system would be a costly endeavor. Clearly, there comes a point
where history needs to be discarded. Every so often (30 seconds by
default), the kernel will try to determine whether there might possibly be
a fork bomb attack in progress; if no signs of an attack are detected, any
tracking history which has existed for more than 30 seconds will be
deleted.
How does the kernel decide whether it might be under attack? The way fork
bombs incapacitate a system is usually through memory exhaustion, so the
code looks for signs of memory stress: in particular, it looks to see if
there have been any memory allocation stalls or kswapd runs since the last
check. It also looks at whether the total number of processes on the
system has increased. If none of those checks shows any reason for concern,
the older history data will be removed from the system. If, instead,
memory allocations are getting harder to come by or the number of processes
is growing, the tracking structure will be kept around.
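The periodic check might look something like the following. This is a sketch of the logic as described here, not code from the patch; struct mm_sample, its fields, and both function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters examined at each check. */
struct mm_sample {
    unsigned long alloc_stalls;   /* memory allocation stalls so far */
    unsigned long kswapd_runs;    /* kswapd wakeups so far */
    unsigned long nr_processes;   /* processes currently on the system */
};

/* True if anything since the previous check hints at memory stress
 * or a growing process count. */
bool may_be_under_attack(const struct mm_sample *prev,
                         const struct mm_sample *now)
{
    return now->alloc_stalls != prev->alloc_stalls ||
           now->kswapd_runs != prev->kswapd_runs ||
           now->nr_processes > prev->nr_processes;
}

/* History older than the reset interval is dropped, but only when the
 * system looks healthy. */
bool should_discard(unsigned long age_ms, unsigned long interval_ms,
                    bool under_attack)
{
    return !under_attack && age_ms > interval_ms;
}
```

With the default interval of 30,000 milliseconds, a 31-second-old history entry would be discarded on a quiet system and kept if any of the three indicators had moved.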
If a fork bomb runs the system out of memory, the kernel's first response
will be to fire up the out-of-memory (OOM) killer. Given time, the OOM
killer might manage to clean up the mess, but the fact of the matter is
that the OOM killer is designed around finding the one process which is
creating the problem and killing it. The OOM killer cannot identify a
whole tree of rapidly-forking processes and do away with all of them.
Enter the fork bomb killer, which is invoked by the OOM killer. The
fork bomb killer will perform a depth-first traversal of the process
history tree, filling in each node with information on the total number of
processes below that node and the total memory used by those processes. At
the end, the process with the highest score is examined; if there are at
least ten processes in the history below the high scorer, it is deemed to
be a fork bomb; that process and all of its descendants will be killed.
Problem solved - hopefully.
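A rough model of that traversal follows. Several things here are assumptions rather than details taken from the patch: the node layout, the use of total subtree memory as the score, and the choice to compare only the subtrees hanging directly off the root (which stands in for init). The recursion is for brevity; kernel code would more likely walk the tree iteratively.

```c
#include <stddef.h>

#define FORKBOMB_MIN_DESCENDANTS 10   /* the hardcoded threshold of ten */

/* Hypothetical history-tree node; persists after its process exits. */
struct hist_node {
    unsigned long mem;            /* memory charged to this process */
    struct hist_node *child;      /* first child */
    struct hist_node *sibling;    /* next sibling */
    unsigned long nr_below;       /* filled in: processes below this node */
    unsigned long mem_total;      /* filled in: memory of node and subtree */
};

/* Depth-first walk which fills in the per-node totals. */
void fill_totals(struct hist_node *n)
{
    n->nr_below = 0;
    n->mem_total = n->mem;
    for (struct hist_node *c = n->child; c; c = c->sibling) {
        fill_totals(c);
        n->nr_below += 1 + c->nr_below;
        n->mem_total += c->mem_total;
    }
}

/* Return the root of the suspected fork bomb, or NULL if the highest
 * scorer has too few descendants to qualify. */
struct hist_node *find_fork_bomb(struct hist_node *root)
{
    struct hist_node *best = NULL;

    fill_totals(root);
    for (struct hist_node *c = root->child; c; c = c->sibling)
        if (!best || c->mem_total > best->mem_total)
            best = c;         /* assumed score: total subtree memory */

    if (best && best->nr_below >= FORKBOMB_MIN_DESCENDANTS)
        return best;
    return NULL;
}
```

A caller would then kill the returned node's process and everything beneath it. A subtree with only nine descendants, however much memory it holds, is left for the ordinary OOM killer to deal with.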
There are a couple of control knobs which have been placed under
/sys/kernel/mm/oom. History tracking will only be performed if
mm_tracking_enabled is set to "enabled" (which is the
default setting). The value in mm_tracking_reset_interval_msecs
controls how often the process tracking tree is cleaned up; the default
value is 30,000 milliseconds. A possibly surprising omission is the lack
of a knob controlling how many descendants a process must have before it is
declared to be a fork bomb; the hardcoded value of ten seems low.
The reception for this patch has not been entirely favorable; commenters
worry about the runtime cost of maintaining the tracking structure and
suggest that user-space solutions may be better. Kamezawa-san seems resigned that the patch may not go in,
saying "To go to other buildings to press reset-button is good for
my health." Other administrators, who may not be within easy
walking distance of their systems, may feel their health is better
served by some extra fork bomb protection, though.
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Filesystems and block I/O
Janitorial
Memory management
Architecture-specific
Security-related
- Mimi Zohar: EVM.
(March 29, 2011)
Benchmarks and bugs
Miscellaneous
Page editor: Jonathan Corbet