Brief items
The 2.6.39 merge window remains open as of this writing; see the
separate summary below for details on what has been merged over the last
week.
Stable updates:
The 2.6.32.34, 2.6.37.5, and 2.6.38.1 stable kernel updates were released
on March 23; each contains a number of important fixes.
2.6.33.8 was released on March 21. 2.6.33 had gone out of maintenance, but Greg
Kroah-Hartman has resumed creating updates because the realtime preemption
patch set is stuck at this release.
Comments (1 posted)
I tend to view arch specific embedded code as rather like very
dubious parties. What goes on in other peoples' house out of sight
is none of my business.
The 8250 however is core code so it should keep its clothes on and
behave in a manner befitting its status.
--
Alan Cox
I also believe that Greg spends lots of lonely nights looking into
git commits, drinking his favorite Latte and cursing at all the git
patches that fix a bug that didn't have a Cc: stable tag
attached. He use to have a huge mane of hair on his head before
taking over as stable maintainer.
--
Steven Rostedt explains the stable series
If it's some desperate cry for attention by somebody, I just wish
those people would release their own sex tapes or something, rather
than drag the Linux kernel into their sordid world.
--
Linus Torvalds
I do get the impression that you're extremely unhappy with the way
ARM stuff works, and I've no real idea how to solve that. I think
much of it is down to perception rather than anything tangible.
Maybe the only solution is for ARM to fork the kernel, which is
something I really don't want to do - but from what I'm seeing its
the only solution which could come close to making you happy.
--
Russell King
This was discussed before, and it was felt that perhaps 75000 lines
of ocaml code was not really appropriate for the Linux source tree.
--
Julia Lawall
Comments (2 posted)
A group of Video4Linux2 developers recently gathered in Warsaw to discuss a
number of topics of interest to that subsystem; Hans Verkuil has posted a
report from that gathering. Issues discussed include an API for compressed
video formats, subdev hierarchies, cropping and composing, pipeline
configuration, HDMI support, and more.
Full Story (comments: none)
By Jonathan Corbet
March 23, 2011
Kernel developers might rightly complain about being confused over which
functions should be used to convert strings to integer types. Old
functions like simple_strtoul() will silently ignore junk at the
end of an integer value, so "100xx" successfully converts to an
unsigned integer type. Alternatives like strict_strtoul() have
been encouraged instead, but they have problems too, including the lack of
overflow checks. So what's a kernel hacker to do?
As of 2.6.39, there is a new set of string-to-integer converters which is
expected to be used in preference to all others.
- Unsigned conversions can be done with any of kstrtoull(),
kstrtoul(), kstrtouint(), kstrtou64(),
kstrtou32(), kstrtou16(), or kstrtou8().
- Conversions to signed integers can be done with kstrtoll(),
kstrtol(), kstrtoint(), kstrtos64(),
kstrtos32(), kstrtos16(), or kstrtos8().
All of these functions are marked __must_check, so callers are
expected to check to ensure that the conversion happened successfully. The
older functions are marked deprecated, and will eventually be removed.
These new kstrto*() functions are now the Official Best Way To
Convert Strings, so developers need wonder no longer.
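To make the difference in behavior concrete, here is a minimal user-space sketch. It does not call the kernel functions themselves: lenient_parse_ul() and strict_parse_uint() are hypothetical stand-ins built on the C library's strtoul() and strtoull(), and the return convention (0 on success, a negative errno value on failure) is an assumption modeled on the __must_check usage described above.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Old-style behavior: parse a leading number, silently ignore the rest. */
unsigned long lenient_parse_ul(const char *s)
{
	assert(s != NULL);
	return strtoul(s, NULL, 10);
}

/*
 * Strict behavior (hypothetical helper): the whole string must be a
 * decimal number that fits in an unsigned int.  Returns 0 on success,
 * -EINVAL for malformed input, -ERANGE on overflow.
 */
int strict_parse_uint(const char *s, unsigned int *res)
{
	char *end;
	unsigned long long val;

	assert(s != NULL && res != NULL);
	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;		/* rejects "", "-1", " 5", "xx" */

	errno = 0;
	val = strtoull(s, &end, 10);
	if (errno == ERANGE || val > UINT_MAX)
		return -ERANGE;		/* value does not fit */
	if (*end != '\0')
		return -EINVAL;		/* trailing junk such as "100xx" */

	*res = (unsigned int)val;
	return 0;
}
```

With these helpers, "100xx" quietly becomes 100 through the lenient path, while the strict path refuses it and forces the caller to look at the return value.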
Comments (none posted)
Kernel development news
By Jonathan Corbet
March 23, 2011
As of this writing, some 5,500 non-merge changesets have been merged into
the mainline since last week's 2.6.39 merge window summary.
A wide-ranging set of new features, cleanups, and
performance improvements has been added to the kernel. Some of the more
significant user-visible changes include:
Changes visible to kernel developers include:
- After many years of work by a large number of developers, the big
kernel lock has been removed from the kernel.
- The dynamic debug mechanism has some
new control flags allowing for
control over whether the function name, line number, module name, and
current thread ID are printed.
- The kernel can export raw DMI table data via sysfs, making it
available in user space without needing to go through
/dev/mem.
- Network drivers can now enable hardware support for receive flow
steering via the new ndo_rx_flow_steer() method.
- The "pstore" filesystem provides access to platform-specific
persistent storage which can be used to carry information across
reboots.
- The EXTRA_CFLAGS and EXTRA_AFLAGS makefile variables
have been replaced with ccflags-y, ccflags-m,
asflags-y, and asflags-m.
- kmem_cache_name(), which returned the name of a slab cache,
has been removed from the kernel.
- The SLUB memory allocator now has a lockless fast path for
allocations, speeding performance considerably. "Sadly this
does nothing for the slowpath which is where the main issues with
performance in slub are but the best case performance rises
significantly."
- Kernel threads can be created on a specific NUMA node with the new
kthread_create_on_node() function.
- The new function delete_from_page_cache() does what its name
implies; unlike remove_from_page_cache() (which has now been
deleted), it also decrements
the page's reference count. It thus more closely mirrors
add_to_page_cache().
- There is a whole new set of functions which are the preferred way to
convert strings to integer values; see this article for details.
- The new "hwspinlock" framework allows the implementation of
synchronization primitives on systems where different cores are
running different operating systems. See Documentation/hwspinlock.txt for more
information.
If the usual two-weeks rule holds, the 2.6.39 merge window can be expected
to close on March 28. Watch this space next week for a summary of the
final changes merged for this development cycle.
Comments (11 posted)
By Jonathan Corbet
March 22, 2011
The kernel's "dynamic debugging" interface saw some minor changes for
2.6.39. As it happens, LWN has never written about how dynamic debug
works, so this seems like an opportune time to fill in the gap.
It can be nice to instrument kernel code with abundant print statements
that illustrate what is going on inside. The problem, of course, is that
those statements can generate vast amounts of output which is usually not
of interest. These statements can be left commented out most of the time, but
that leads to situations where an edit/rebuild/reboot cycle is needed to
get the output. In response, many developers have created
mechanisms which enable or disable specific print statements at run time.
The dynamic debugging interface was added as a way of providing a uniform
control interface for debugging output while avoiding cluttering the kernel
with various hand-rolled alternatives.
Dynamic debug operates on print statements written with either of:
pr_debug(char *format, ...);
dev_dbg(struct device *dev, char *format, ...);
If the CONFIG_DYNAMIC_DEBUG option is not set, the above functions
will be turned into normal printk() statements at the
KERN_DEBUG level (or compiled out entirely if DEBUG is not defined
for the file in question). If the option is enabled, though, the code sets
aside a special descriptor for every call site, noting the module,
function, and file names, along with the line number and format string. At
system boot, all of these debug statements are turned off, so their output
will not appear even if debug-level kernel messages are routed somewhere
useful by the syslog daemon.
Turning on dynamic debug causes a new virtual file to appear at
/sys/kernel/debug/dynamic_debug/control (modulo any individual
preferences for the location of debugfs, naturally). Writing to that file
will enable or disable specific debugging functions, as specified by a
simple but flexible language.
As an example, drivers/char/tpm/tpm_nsc.c contains the following
code at line 346:
dev_dbg(&pdev->dev, "NSC TPM detected\n");
Turning on that specific line could be done with a line like:
echo file tpm_nsc.c line 346 +p > .../dynamic_debug/control
(Where the full path to debugfs has been replaced with "..."). As
it happens, that dev_dbg() line does not stand alone - there is a
long series of them providing information on the newly-detected device.
One could enter a series of lines like the above to enable them all
individually, but either of the following would also work:
echo file tpm_nsc.c line 346-373 +p > .../dynamic_debug/control
echo file tpm_nsc.c func init_nsc +p > .../dynamic_debug/control
Along with selection by file name, line number, and function name, the
interface also allows "module name" to select a specific
module, and "format fmt" to select any line whose format
string contains "fmt". If more than one selector is given,
all must match for a given statement to be enabled.
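The "all selectors must match" rule can be modeled in a few lines of user-space C. This is a simplified sketch rather than the kernel's implementation: struct callsite, struct query, and query_matches() are hypothetical names, and a NULL string or a zero first_line in the query is assumed to mean "selector not given."

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of the per-call-site descriptor. */
struct callsite {
	const char *module;
	const char *function;
	const char *file;
	unsigned int line;
	const char *format;
};

/* A query; NULL strings and a zero first_line mean "not specified". */
struct query {
	const char *module;
	const char *function;
	const char *file;
	unsigned int first_line, last_line;
	const char *format;	/* substring to look for in the format */
};

/* Returns true only if every selector present in the query matches. */
bool query_matches(const struct query *q, const struct callsite *cs)
{
	assert(q != NULL && cs != NULL);
	if (q->module && strcmp(q->module, cs->module) != 0)
		return false;
	if (q->function && strcmp(q->function, cs->function) != 0)
		return false;
	if (q->file && strcmp(q->file, cs->file) != 0)
		return false;
	if (q->first_line &&
	    (cs->line < q->first_line || cs->line > q->last_line))
		return false;
	if (q->format && strstr(cs->format, q->format) == NULL)
		return false;
	return true;
}
```

A query naming only a file and a line range thus selects every call site in that range, while adding a function or format selector can only narrow the set.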
Commands to the control file must end with a "flags" operation telling the
system what to do; "+p" turns on printk() output, while
"-p" turns it off. There is also a set of flags (new for 2.6.39)
controlling information added to each output line: "f" adds the
function name, "l" adds the line number, "m" adds the
module name, and "t" adds the thread ID. One can use "="
to set the full mask of flags to a specific value - "=plm" will enable printing
with line numbers and module names while disabling thread ID and function output regardless of their prior setting. The only way to clear all of the
flags is with "-pflmt".
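The three operators behave like ordinary bitmask operations. The sketch below models that in user-space C; the FLAG_* bit assignments and the apply_flags() helper are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit assignments for the five flag letters. */
enum {
	FLAG_P = 1 << 0,	/* print the message */
	FLAG_F = 1 << 1,	/* include the function name */
	FLAG_L = 1 << 2,	/* include the line number */
	FLAG_M = 1 << 3,	/* include the module name */
	FLAG_T = 1 << 4,	/* include the thread ID */
};

/* Map one flag letter to its bit; unknown letters map to 0. */
static unsigned int flag_bit(char c)
{
	switch (c) {
	case 'p': return FLAG_P;
	case 'f': return FLAG_F;
	case 'l': return FLAG_L;
	case 'm': return FLAG_M;
	case 't': return FLAG_T;
	default:  return 0;
	}
}

/*
 * Apply a flags operation such as "+p", "-pflmt", or "=plm" to the
 * current mask and return the new mask.  The mask is returned
 * unchanged if the operator is not one of '+', '-', or '='.
 */
unsigned int apply_flags(unsigned int cur, const char *spec)
{
	unsigned int bits = 0;
	const char *p;

	assert(spec != NULL);
	if (spec[0] == '\0')
		return cur;
	for (p = spec + 1; *p != '\0'; p++)
		bits |= flag_bit(*p);

	switch (spec[0]) {
	case '+': return cur | bits;	/* turn the named flags on */
	case '-': return cur & ~bits;	/* turn the named flags off */
	case '=': return bits;		/* replace the whole mask */
	default:  return cur;
	}
}
```

In this model "=plm" discards whatever was set before, which is why it clears the function and thread ID flags without naming them.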
Reading the control file will produce a list of all known call
sites, along with the flags currently set for each.
Sometimes the interesting action happens before the system reaches a point
where the control file can be accessed. Dynamic debug output can be turned
on early in the boot process with the ddebug_query boot parameter.
More information on how to use this facility can be found in Documentation/dynamic-debug-howto.txt.
Dynamic debug has been in the kernel since 2.6.30, but it is still common
to see code submitted which contains its own, home-brewed mechanism for
controlling debug output. Chances are that reviewers will ask for such
mechanisms to be taken out before the code is merged. Given the
flexibility and ease of use of the in-kernel implementation, it makes sense
to use it from the beginning.
Comments (15 posted)
By Jake Edge
March 23, 2011
When Linux systems crash, there are various ways to find out what went
wrong, but generally those rely on writing to log files on
disk. For some systems, disk may not be available, or trusted in the case
of a crash, so a way to poke some data into a platform-specific place for
use by a subsequent kernel boot would be useful. That's exactly what the pstore
filesystem, which was just added during the current 2.6.39 merge
window, will provide.
The idea for pstore came out of a conversation between Tony Luck and Thomas Gleixner
at last year's Linux Plumbers Conference. Luck wanted to use the ACPI
error record serialization table (ERST) to store crash information across a
reboot. The ERST is a mechanism defined in the ACPI specification
(section 17.4, page 519) that allows saving and retrieving
hardware error information to and from
a non-volatile location (like flash).
Rather than just doing something specific for the x86
architecture, he decided to create a more general framework so that other
platforms could use whatever persistent storage they had available. It
would be, as Luck put it, "a generic
layer for persistent storage usable to pass tens or hundreds
of kilobytes of data from the dying breath of a crashing
kernel to its successor".
There have been a number of iterations of the code since Luck first posted
it for comments back in November. At Alan Cox's suggestion,
pstore moved from its original firmware driver with a sysfs interface
to a more straightforward filesystem-based implementation.
The basic idea is that a platform can register the availability of a
persistent storage location with a call to pstore_register() and
pass a pointer to a struct pstore_info, which looks like:
struct pstore_info {
    struct module *owner;
    char *name;
    struct mutex buf_mutex; /* serialize access to 'buf' */
    char *buf;
    size_t bufsize;
    size_t (*read)(u64 *id, enum pstore_type_id *type,
                   struct timespec *time);
    u64 (*write)(enum pstore_type_id type, size_t size);
    int (*erase)(u64 id);
};
The platform driver needs to provide three I/O routines and a buffer. There
is also a mutex present to protect against simultaneous access to the
buffer. With that, pstore will implement a
filesystem that can be accessed from the kernel, or from user space once
it has been mounted. The underlying ERST storage is record oriented, and
Luck posits that other platform storage areas will be also, so the I/O
interface is record oriented as well.
In addition to
the pstore framework, the ERST driver was modified
to take advantage of pstore; that change was also merged, so there is
an in-kernel user of pstore. The pstore_info buffer is allocated
and managed by drivers/acpi/apei/erst.c, and is larger than the
bufsize advertised to account for the record and section headers
required by ERST. Users of the I/O interface either fill the buffer before
calling pstore_info.write() or read the data from the buffer
after a call to pstore_info.read().
Each item is stored with a type, either PSTORE_TYPE_DMESG for log
messages (likely oops output), PSTORE_TYPE_MCE for hardware
errors, or PSTORE_TYPE_UNKNOWN for other undefined types. When
stored, each item gets a record ID associated with it, which gets returned
from the pstore_info.write() call. That ID can then be used in
read() and erase() operations, but it also appears in the
filenames in the pstore filesystem.
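The fill-the-buffer-then-write pattern and the role of the record ID can be illustrated with a toy, RAM-backed store in user-space C. Everything here (struct toy_store, toy_write(), toy_read(), toy_erase(), and the REC_* types) is hypothetical; in particular, toy_read() looks a record up by ID, which is a simplification and not the signature of the read() callback shown above. The store is assumed to be zero-initialized before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum rec_type { REC_DMESG, REC_MCE, REC_UNKNOWN };

#define MAX_RECORDS 4
#define BUF_SIZE    64

struct record {
	uint64_t id;			/* 0 marks a free slot */
	enum rec_type type;
	size_t size;
	char data[BUF_SIZE];
};

struct toy_store {
	char buf[BUF_SIZE];		/* shared I/O buffer */
	struct record slots[MAX_RECORDS];
	uint64_t next_id;
};

static struct record *find_slot(struct toy_store *st, uint64_t id)
{
	size_t i;

	for (i = 0; i < MAX_RECORDS; i++)
		if (st->slots[i].id == id)
			return &st->slots[i];
	return NULL;
}

/* Save the first 'size' bytes of buf; returns the record ID, 0 on failure. */
uint64_t toy_write(struct toy_store *st, enum rec_type type, size_t size)
{
	struct record *r;

	assert(st != NULL);
	r = find_slot(st, 0);		/* look for a free slot */
	if (r == NULL || size > BUF_SIZE)
		return 0;
	r->id = ++st->next_id;
	r->type = type;
	r->size = size;
	memcpy(r->data, st->buf, size);
	return r->id;
}

/* Load record 'id' into buf; returns its size, or 0 if there is none. */
size_t toy_read(struct toy_store *st, uint64_t id, enum rec_type *type)
{
	struct record *r;

	assert(st != NULL && type != NULL);
	r = id ? find_slot(st, id) : NULL;
	if (r == NULL)
		return 0;
	*type = r->type;
	memcpy(st->buf, r->data, r->size);
	return r->size;
}

/* Free record 'id'; returns 0 on success, -1 if it does not exist. */
int toy_erase(struct toy_store *st, uint64_t id)
{
	struct record *r;

	assert(st != NULL);
	r = id ? find_slot(st, id) : NULL;
	if (r == NULL)
		return -1;
	r->id = 0;			/* slot can now be reused */
	return 0;
}
```

A real backend would also take the mutex around every access to the shared buffer; that locking is omitted here to keep the sketch short.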
The filesystem can be mounted using:
# mount -t pstore - /dev/pstore
Files will appear there with names based on the type, name of the storage
driver, and the ID, so the first dmesg record for ERST would be
/dev/pstore/dmesg-erst-1. The typical scenario would be for the
filesystem to be mounted at boot time, then some user-space process would
check for any files there, copy them to some more permanent place, and
delete the files with rm. That will allow the storage facility
driver to reclaim the space in order to be ready for other crashes or
hardware errors.
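The naming scheme is simple enough to sketch in user-space C. The helper below is hypothetical; the "dmesg" prefix comes from the example above, while the "mce" and "unknown" prefixes are assumptions chosen to match the record types.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum file_type { FT_DMESG, FT_MCE, FT_UNKNOWN };

/*
 * Build "<type>-<driver>-<id>" in 'out'.  Returns 0 on success, or -1
 * if the name did not fit in 'len' bytes.
 */
int record_filename(char *out, size_t len, enum file_type type,
		    const char *driver, uint64_t id)
{
	const char *prefix;
	int n;

	assert(out != NULL && driver != NULL);
	switch (type) {
	case FT_DMESG: prefix = "dmesg";   break;
	case FT_MCE:   prefix = "mce";     break;	/* assumed prefix */
	default:       prefix = "unknown"; break;	/* assumed prefix */
	}

	n = snprintf(out, len, "%s-%s-%" PRIu64, prefix, driver, id);
	if (n < 0 || (size_t)n >= len)
		return -1;		/* error or truncated output */
	return 0;
}
```

Calling record_filename() with FT_DMESG, a driver name of "erst", and an ID of 1 yields the dmesg-erst-1 name used in the example above.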
By default, pstore will register a dump handler with kmsg_dump to write the
last 10K bytes of data from the kernel log to the pstore device when there
is a kernel oops or panic. The amount of data to store can be configured
at mount time using the kmsg_bytes parameter.
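Saving "the last N bytes" of a log amounts to a bounded copy from the end of a buffer. The following user-space sketch shows the idea; copy_log_tail() is a hypothetical helper, and it treats the log as one flat buffer, which is a simplification of how the kernel log is actually handed to a dump handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most max_bytes from the end of 'log' into 'out', which must
 * have room for max_bytes.  Returns the number of bytes copied.
 */
size_t copy_log_tail(const char *log, size_t log_len,
		     char *out, size_t max_bytes)
{
	size_t n = log_len < max_bytes ? log_len : max_bytes;

	assert(log != NULL && out != NULL);
	memcpy(out, log + (log_len - n), n);
	return n;
}
```

Here max_bytes plays the role that kmsg_bytes plays for pstore: a larger value preserves more context from before the crash at the cost of more persistent storage per record.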
Luck has also put out an RFC patch to
disable dumping information into pstore for some kinds of kmsg_dump reasons
(e.g. KMSG_DUMP_HALT or KMSG_DUMP_RESTART), but various
other developers weren't so sure. Seiji Aguchi pointed to two use cases
(1, 2) he has found for needing
to store the tail of the kernel log messages in most of those cases. In
addition, Artem Bityutskiy pointed out that
having pstore decide which kmsg_dump reasons to handle "smells like
policy in the kernel". Adding more options to control that behavior
is certainly possible, but Luck seems to be of a mind to wait a bit before making any change.
There are other persistent storage methods for kernel log messages,
notably drivers/mtd/mtdoops.c and drivers/char/ramoops.c.
But those are targeted at the
embedded space where NVRAM devices are prevalent or for platforms where RAM
can be reserved that will not be cleared on a restart. Pstore is more
flexible, as it can store more than just kernel logs, while the two
*oops devices are wired into storing the output of kmsg_dump.
Now that pstore has been merged, others will likely start using it. David
Miller has already indicated
that he will use it for sparc64, where a region of memory can be set aside
to persist across reboots. One would guess that other architectures that
have hardware support for similar mechanisms will as well.
Comments (7 posted)
Page editor: Jonathan Corbet