Brief items
The current development kernel is 3.0-rc4,
released on June 20. It consists of a
bunch of fixes (some for a couple of significant performance
regressions) and a couple of new drivers. It also apparently has a new
compilation error which may require the application of this patch to get
around. The full changelog has all the details.
Stable updates: there have been no stable updates released in the
last week.
Hardware often makes me want to dress all in black, sit at the end
of the bar, drink, and cry. Often Matthew Garrett is right there
with me so at least I have company on my trip to black, black
oblivion.
-- Dan Williams
So being such a hopeless optimist I set out to solve all pin
controlling in this subsystem. The sales rap would be something
like:
- Do you need to bias your pin to 3.3 V with an 1MOhm resistor?
- Do you want to mux your pin with different functions?
- Do you want to set the load capacitance of your pin to 10pF?
- Do you want to drive your pin as open collector?
- Is all or some the above software-controlled in your ASIC?
- Do you despair from the absence of relevant frameworks?
DON'T WORRY!
The pinctrl subsystem is here to save YOU!
-- Linus Walleij
Due to intermittent email access on my vacation right now, the
stable and longterm kernels will be delayed until late this week,
or early next week.
In place of them, here's a lovely haiku to sooth you:
Sand castle contest
Rain falling sideways and cold
Summer in Oregon
-- Greg Kroah-Hartman
By Jonathan Corbet
June 22, 2011
The Linux networking developers held their roughly annual networking
minisummit on June 14 and 15 in Toronto. There will probably not
be a detailed report from the gathering, and no videotaping was done, but
there is still some information out there. Interested readers can start
with the agenda for the gathering, which includes the slides for almost all
of the presentations. Also worth a read is Ben Hutchings's summary with
notes from most of the talks:
World IPv6 Day seems to have mostly worked. However there are
still some gaps and silly bugs in IPv6 support in both Linux kernel
(e.g. netfilter can't track DHCPv6 properly) and user-space
(e.g. ping6 doesn't restrict hostname lookup to IPv6 addresses).
[...]
David [Miller] wants to get rid of the IPv4 routing cache.
Removing the cache entirely seems to make route lookup take about
50% longer than it currently does for a cache hit, and much less
time than for a cache miss. It avoids some potential for denial of
service (forced cache misses) and generally simplifies routing.
All told, it looks like an interesting and productive gathering; if past
patterns hold, we will get a more thorough summary at the 2011 Kernel
Summit in October.
Kernel development news
By Jake Edge
June 22, 2011
Device names, particularly for disks, can be confusing to Linux administrators
because they get assigned at boot time based on the order in which the disks are
discovered. So the same physical disk can be assigned a different device
name (in /dev) on each boot, which means that kernel log messages
and the output of various utilities may not correspond with the
administrator's view of the system. A recent patch set looks to change that situation, but
it is meeting some resistance from kernel hackers who think it should be
handled by user space.
The patches posted by Nao Nishijima are pretty straightforward. They just
add a
preferred_name entry to struct device, which can be
updated via sysfs. The patches then change some SCSI log messages and
/proc/partitions output to use the preferred name if it has been
set. Greg Kroah-Hartman expressed concerns about
changing /proc/partitions as various tools parse that file so it
is part of the kernel's user-space interface. Adding the preferred name
as a new field on each line might very well confuse utilities that parse
the file.
More importantly,
though, he notes that one could just change the tools so that they use
the names as arguments or in their output. Any scheme that would map
preferred names to specific disks requires some kind of mapping file, so
tools that wanted to use these preferred names (things like mount,
smartd, and other disk-related tools) could do so using that
mapping without involving the kernel at all:
Seriously, this could be done by now, it's been over a year since this
was first discussed. All distros could have the updated packages by now
and this would not be an issue.
I still think this is the correct way to solve the problem as it is a
userspace issue, not a kernel one.
While the patches only use preferred_name for disk devices, the
idea is to allow them to be added to any device (and then change log
messages and utilities to use them). It is modeled after the
ifalias entry that was added for network devices back in 2008, but
some don't see that as something to emulate. Allowing only one alias for
network devices is generally not enough "because people want not a
single but multiple names at the same time", Kay Sievers said, so ifalias is only used by some SNMP
utilities. Currently, udev
maintains a set of links in /dev/disk/by-* that relate disks to
kernel devices by a variety of characteristics (ID, label, path, and
UUID). James Bottomley would like to see
that be extended for the
preferred names:
All userspace naming will be taken care of by the usual udev rules, so
for disks, something like /dev/disk/by-preferred/<fred> which would be
the usual symbolic link.
This will ensure that kernel output and udev input are consistent. It
will still require that user space utilities which derive a name for a
device will need modifying to print out the preferred name.
But there are problems inherent in that idea. In order for udev
to know that the preferred name was set, a uevent would have to be
generated. That could be done, but it leads to other problems, as
Sievers points out (instead of
by-preferred, he uses by-pretty):
What would happen if we mount:
/dev/disk/by-pretty/foo
and some tool later thinks the pretty name should better be 'bar', it
writes the name to /sys, we get a uevent, the old link disappears, we
get a new link, mount has no device node anymore for the mounted
device ...
Essentially, udev keeps track of the devices present in the system (and
their attributes like, potentially, preferred name), but doesn't have any
concept of tracking "no longer valid names" as Sievers puts it. That means that udev can't
just leave older entries around when user space changes the preferred name: "We can not just add stuff to /dev without a udev database entry, it
would never get removed on device unplug and leave a real mess
behind."
One possible solution for the renaming problem is to only allow one write
to preferred_name, so that, once established, those aliases
couldn't be changed without a reboot. udev could set up the
proper links, and various tools could use the aliases as needed. That
would solve the renaming problem at the cost of some flexibility. In
general, no one was really opposed to the idea of some kind of
more-mnemonic name for disks; it is more a question of how to get
there.
Sievers proposed adding a way for udev to list all of the symlinks
that it creates during device discovery. Anyone (or any tool) that needed
to associate an alias with a particular disk could use that output to
determine the current device being used (based on the UUID for example),
then make the substitution as appropriate. That would work, in general,
but Bottomley sees it as overly complex for
users:
However, even if we assume they choose
one of the current names, they still have to do the mapping manually;
even if they have all the information, they can't just cut and paste
from dmesg say, they have to cut, edit the buffer to put in the
preferred name and then paste ... that's just one annoying step too far
for most users. I agree that all the output tools within reason can be
fixed to do this automatically, but fixing cat say, just so
cat /proc/partitions works would never be acceptable upstream.
The reason for storing this in the kernel is just that it's easier than
trying to update all the tools, and it solves 90% of the problem, which
makes the solution usable, even if we have to update tools to get to
100%.
But Sievers and driver core maintainer Kroah-Hartman see it as papering
over more substantial issues. Sievers, at least, would like to see
text-file-style debug and error message output replaced (or supplemented)
with something more
structured:
We need _smart_ userspace
with a debug/error message channel from the kernel to userspace that
pops out _structured_ data. Userspace needs to index the data, and
merge a lot of userspace information into it.
Adding just another single-name to the kernel just makes the
much-too-dumb free-text printk() a bit more readable, but still sounds
not like a solution. Pimping up syslog is not the solution to this
problem, and it can't be solved in the kernel alone.
But, from the user's perspective, disks may already have names (with labels
on the enclosures themselves for example) and it would
be quite convenient for the kernel's messages to reflect them. In the end,
Sievers isn't opposed to a disk-specific
(rather than for all devices)
solution, though he thinks it isn't really the right direction to go.
Kroah-Hartman agrees and is
adamant that this change not go into the driver core. Based on that,
Nishijima plans to redo the patches, moving
the name to struct gendisk, renaming the field to
alias_name (rather than "preferred") to better reflect its
purpose, and generating a uevent when the name changes.
Following the lead of the network ifalias, though, will add to the
kernel's user-space interface, only this time for disks. While it may
solve an immediate problem for administrators, it will also leave behind some
legacy code when, or if, a better solution comes around. That's
unfortunate, but, since it solves a real problem, and the change is
restricted to a subsystem whose
maintainer (Bottomley) is in favor of it, it may well turn up in the
mainline before long. Any change to system error and debug logging along
the lines of what Sievers described is certainly quite a ways off, though
there have long been calls for structured kernel output.
Sometimes it is just easier to make a change like this in one place, rather
than trying to identify and fix all of the places outside of the kernel
that would need it.
By Jonathan Corbet
June 21, 2011
In the very early days, Linux users often had to tell the kernel where
specific devices were to be found before their systems would work. In the
absence of this information, the driver could not know which I/O ports and
interrupt line(s) the device was configured to use. Happily, we now live
in the days of busses like PCI which have discoverability built into them;
any device sitting on a PCI bus can tell the system what sort of device it
is and where its resources are. So the kernel can, at boot time, enumerate
the devices available and everything Just Works.
Alas, life is not so simple; there are plenty of devices which are still
not discoverable by the CPU. In the embedded and system-on-chip world,
non-discoverable devices are, if anything, increasing in number. So the
kernel still needs to provide ways to be told about the hardware that is
actually present. "Platform devices" have long been used in this role in
the kernel. This article will describe the interface for platform devices;
it is meant as needed background material for a
following article on integration with device trees.
Platform drivers
A platform device is represented by struct platform_device, which,
like the rest of the relevant declarations, can be found in
<linux/platform_device.h>. These devices are deemed to be
connected to a virtual "platform bus"; drivers of platform devices must
thus register themselves as such with the platform bus code. This
registration is done by way of a platform_driver structure:
struct platform_driver {
    int (*probe)(struct platform_device *);
    int (*remove)(struct platform_device *);
    void (*shutdown)(struct platform_device *);
    int (*suspend)(struct platform_device *, pm_message_t state);
    int (*resume)(struct platform_device *);
    struct device_driver driver;
    const struct platform_device_id *id_table;
};
At a minimum, the probe() and remove() callbacks must be
supplied; the other callbacks have to do with power management and should
be provided if they are relevant.
The other thing the driver must provide
is a way for the bus code to bind actual devices to the driver; there are
two mechanisms
which can be used for that purpose. The first is the id_table
member of struct platform_driver; the relevant structure is:
struct platform_device_id {
    char name[PLATFORM_NAME_SIZE];
    kernel_ulong_t driver_data;
};
If an ID table is present, the platform bus code will scan through it every
time it has to find a driver for a new platform device. If
the device's name matches the name in an ID table entry, the device will be
given to the driver for management; a pointer to the matching ID table
entry will be made available to the driver as well.
As it happens, though, most platform
drivers do not provide an ID table at all; they simply provide a
name for the driver itself in the driver field. As an example,
the i2c-gpio driver turns two GPIO lines into an i2c bus; it sets itself up
as a platform driver with:
static struct platform_driver i2c_gpio_driver = {
    .driver = {
        .name  = "i2c-gpio",
        .owner = THIS_MODULE,
    },
    .probe  = i2c_gpio_probe,
    .remove = __devexit_p(i2c_gpio_remove),
};
With this setup, any device identifying itself as "i2c-gpio" will
be bound to this driver; no ID table is needed.
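The two binding mechanisms can be pictured with a small user-space model. This is a sketch rather than the kernel's code: the model_ structures and functions, the "foomatic-v1" and "foomatic-v2" names, and the value given to PLATFORM_NAME_SIZE are assumptions made for illustration. It also assumes that a driver supplying an ID table is matched through that table alone, with the driver name used only when no table is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PLATFORM_NAME_SIZE 20   /* value assumed for this model */

/* Simplified stand-in for the kernel structure of the same name. */
struct platform_device_id {
	char name[PLATFORM_NAME_SIZE];
	unsigned long driver_data;
};

/* Hypothetical, cut-down view of a platform driver: just what matching needs. */
struct model_match_driver {
	const char *name;                           /* driver.name in the kernel */
	const struct platform_device_id *id_table;  /* optional; ends with an empty name */
};

/* Return the ID table entry whose name equals dev_name, or NULL. */
const struct platform_device_id *
model_match_id(const struct platform_device_id *id, const char *dev_name)
{
	assert(id != NULL && dev_name != NULL);
	for (; id->name[0] != '\0'; id++)
		if (strcmp(dev_name, id->name) == 0)
			return id;
	return NULL;
}

/*
 * Decide whether drv should be bound to a device called dev_name.
 * With an ID table, only the table is consulted; without one, the
 * device name is compared against the driver's own name.
 */
int model_match(const struct model_match_driver *drv, const char *dev_name)
{
	assert(drv != NULL && dev_name != NULL);
	if (drv->id_table)
		return model_match_id(drv->id_table, dev_name) != NULL;
	return strcmp(dev_name, drv->name) == 0;
}

/* A driver matched by name alone, like i2c-gpio in the text. */
const struct model_match_driver model_i2c_gpio = { .name = "i2c-gpio" };

/* A fictional driver that handles two device variants through an ID table. */
const struct platform_device_id model_foomatic_ids[] = {
	{ "foomatic-v1", 1 },
	{ "foomatic-v2", 2 },
	{ "", 0 }   /* terminator */
};
const struct model_match_driver model_foomatic = {
	.name = "foomatic",
	.id_table = model_foomatic_ids,
};
```

In this model a device named "foomatic-v2" binds to the fictional foomatic driver, and the matched entry's driver_data tells the driver which variant it has been handed.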
Platform drivers make themselves known to the kernel with:
int platform_driver_register(struct platform_driver *driver);
As soon as this call succeeds, the driver's probe() function can
be called with new devices. That function gets as an argument a
platform_device pointer describing the device to be instantiated:
struct platform_device {
    const char *name;
    int id;
    struct device dev;
    u32 num_resources;
    struct resource *resource;
    const struct platform_device_id *id_entry;
    /* Others omitted */
};
The dev structure can be used in contexts where it is needed, such as the
DMA mapping API. If the device was matched using an ID table
entry, id_entry will point to the specific entry matched. The
resource array can be used to learn where various resources,
including memory-mapped I/O registers and interrupt lines, can be found.
There are a number of helper functions for getting data out of the resource
array; these include:
struct resource *platform_get_resource(struct platform_device *pdev,
                                       unsigned int type, unsigned int n);
struct resource *platform_get_resource_byname(struct platform_device *pdev,
                                              unsigned int type, const char *name);
int platform_get_irq(struct platform_device *pdev, unsigned int n);
The "n" parameter says which resource of that type is desired,
with zero indicating the first one. Thus, for example, a driver could find
its second MMIO region with:
r = platform_get_resource(pdev, IORESOURCE_MEM, 1);
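The meaning of "n" is perhaps easiest to see in a user-space model of the lookup. The sketch below is not the kernel's implementation; the model_ names, the flag values, and the sample resource table are all invented for illustration. It also shows that a resource's end address is inclusive, which matters when computing the size of a region.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag values chosen for this model, not the kernel's. */
#define MODEL_IORESOURCE_MEM 0x1u
#define MODEL_IORESOURCE_IRQ 0x2u

struct model_resource {
	unsigned long start;
	unsigned long end;      /* inclusive: the last address in the range */
	unsigned int flags;
	const char *name;
};

/* Return the n-th resource (counting from zero) of the given type, or NULL. */
const struct model_resource *
model_get_resource(const struct model_resource *res, size_t count,
		   unsigned int type, unsigned int n)
{
	size_t i;

	assert(res != NULL || count == 0);
	for (i = 0; i < count; i++) {
		/* n is only counted down for resources of the requested type */
		if ((res[i].flags & type) && n-- == 0)
			return &res[i];
	}
	return NULL;
}

/* Size of a resource; the +1 accounts for the inclusive end address. */
unsigned long model_resource_size(const struct model_resource *r)
{
	assert(r != NULL);
	return r->end - r->start + 1;
}

/* Hypothetical device with two MMIO regions separated by an IRQ entry. */
const struct model_resource model_sample_resources[] = {
	{ 0x10000000, 0x10000fff, MODEL_IORESOURCE_MEM, "regs" },
	{ 20,         20,         MODEL_IORESOURCE_IRQ, "irq"  },
	{ 0x10002000, 0x100020ff, MODEL_IORESOURCE_MEM, "fifo" },
};
```

Asking this model for memory resource 1 yields the "fifo" entry even though it sits at index 2 of the array; the IRQ entry in between is skipped because it is of a different type.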
Assuming the probe() function finds the information it needs, it
should verify the device's existence to the extent possible, register the
"real" devices associated with the platform device, and return zero.
Platform devices
So now we have a driver for a platform device, but no actual devices yet.
As was noted at the beginning, platform devices are inherently not
discoverable, so there must be another way to tell the kernel about their
existence. That is typically done with the creation of a static
platform_device structure providing, at a minimum, a name which is
used to find the associated driver. So, for example, a simple (fictional)
device might be set up this way:
static struct resource foomatic_resources[] = {
    {
        .start = 0x10000000,
        .end   = 0x10000fff,   /* inclusive: last byte of the one-page region */
        .flags = IORESOURCE_MEM,
        .name  = "io-memory"
    },
    {
        .start = 20,
        .end   = 20,
        .flags = IORESOURCE_IRQ,
        .name  = "irq",
    }
};

static struct platform_device my_foomatic = {
    .name          = "foomatic",
    .resource      = foomatic_resources,
    .num_resources = ARRAY_SIZE(foomatic_resources),
};
These declarations describe a "foomatic" device with a one-page
MMIO region starting at 0x10000000 and using IRQ 20. The
device is made known to the system with:
int platform_device_register(struct platform_device *pdev);
Once both a platform device and an associated driver have been registered,
the driver's probe() function will be called and the device will
be instantiated. Registration of device and driver are usually done in
different places and can happen in either order. A call to
platform_device_unregister() can be used to remove a platform
device.
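That registration can happen in either order follows from the bus attempting a match whenever something new shows up on either side. A minimal user-space sketch of that idea follows; everything named model_ here is hypothetical, the fixed-size arrays stand in for the kernel's lists, and matching is reduced to a simple name comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 8   /* arbitrary capacity for this model */

struct model_bus_device {
	const char *name;
	int bound;          /* set once a driver's probe() has accepted the device */
};

struct model_bus_driver {
	const char *name;
	int (*probe)(struct model_bus_device *dev);
};

struct model_bus {
	struct model_bus_device *devices[MODEL_MAX];
	struct model_bus_driver *drivers[MODEL_MAX];
	size_t ndevices;
	size_t ndrivers;
};

/* Bind dev to drv if the names match and probe() returns zero. */
static void model_try_bind(struct model_bus_device *dev,
			   struct model_bus_driver *drv)
{
	if (!dev->bound && strcmp(dev->name, drv->name) == 0 &&
	    drv->probe(dev) == 0)
		dev->bound = 1;
}

/* Add a device, then offer it to every driver already registered. */
int model_device_register(struct model_bus *bus, struct model_bus_device *dev)
{
	size_t i;

	assert(bus != NULL && dev != NULL);
	if (bus->ndevices == MODEL_MAX)
		return -1;
	bus->devices[bus->ndevices++] = dev;
	for (i = 0; i < bus->ndrivers; i++)
		model_try_bind(dev, bus->drivers[i]);
	return 0;
}

/* Add a driver, then offer it every device already registered. */
int model_driver_register(struct model_bus *bus, struct model_bus_driver *drv)
{
	size_t i;

	assert(bus != NULL && drv != NULL);
	if (bus->ndrivers == MODEL_MAX)
		return -1;
	bus->drivers[bus->ndrivers++] = drv;
	for (i = 0; i < bus->ndevices; i++)
		model_try_bind(bus->devices[i], drv);
	return 0;
}

int model_probe_calls;   /* counts probe() invocations */

int model_foomatic_probe(struct model_bus_device *dev)
{
	(void)dev;
	model_probe_calls++;
	return 0;
}

/* Register one device and one driver in the chosen order; return the
 * number of probe() calls if the device ended up bound, else -1. */
int model_run(int driver_first)
{
	struct model_bus bus;
	struct model_bus_device dev = { "foomatic", 0 };
	struct model_bus_driver drv = { "foomatic", model_foomatic_probe };

	memset(&bus, 0, sizeof(bus));
	model_probe_calls = 0;
	if (driver_first) {
		model_driver_register(&bus, &drv);
		model_device_register(&bus, &dev);
	} else {
		model_device_register(&bus, &dev);
		model_driver_register(&bus, &drv);
	}
	return dev.bound ? model_probe_calls : -1;
}
```

Whichever order is chosen, model_run() sees probe() called exactly once, when the second of the pair is registered.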
Platform data
The above information is adequate to instantiate a simple platform device,
but many devices are more complex than that. Even the simple i2c-gpio
driver described above needs two additional pieces of information: the
numbers of the GPIO lines to be used as i2c clock and data lines. The
mechanism used to pass this information is called "platform data"; in
short, one defines a structure containing the specific information needed
and passes it in the platform device's dev.platform_data field.
With the i2c-gpio example, a full configuration looks like this:
#include <linux/i2c-gpio.h>

static struct i2c_gpio_platform_data my_i2c_plat_data = {
    .scl_pin = 100,
    .sda_pin = 101,
};

static struct platform_device my_gpio_i2c = {
    .name = "i2c-gpio",
    .id   = 0,
    .dev  = {
        .platform_data = &my_i2c_plat_data,
    }
};
When the driver's probe() function is called, it can fetch the
platform_data pointer and use it to obtain the rest of the
information it needs.
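What that looks like on the driver side can be sketched in user space. The structures below are simplified stand-ins with invented model_ names, and the choice of -ENXIO for missing platform data is an assumption made for this example; the point is that platform_data is an untyped pointer which the driver converts to the structure type it expects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the structure a board file would fill in. */
struct model_i2c_gpio_pdata {
	unsigned int sda_pin;
	unsigned int scl_pin;
};

/* Only the field of interest from the kernel's struct device. */
struct model_pdata_dev {
	void *platform_data;    /* untyped: the driver must know what to expect */
};

struct model_pdata_platform_device {
	const char *name;
	struct model_pdata_dev dev;
};

/* Hypothetical per-bus state set up by probe(). */
struct model_i2c_bus {
	unsigned int sda_pin;
	unsigned int scl_pin;
};

int model_i2c_gpio_probe(struct model_pdata_platform_device *pdev,
			 struct model_i2c_bus *bus)
{
	const struct model_i2c_gpio_pdata *pdata;

	assert(pdev != NULL && bus != NULL);

	/* Nothing verifies that the pointer refers to the expected type. */
	pdata = pdev->dev.platform_data;
	if (!pdata)
		return -ENXIO;

	bus->sda_pin = pdata->sda_pin;
	bus->scl_pin = pdata->scl_pin;
	return 0;
}

/* Sample devices: one configured as in the text, one lacking platform data. */
struct model_i2c_gpio_pdata model_sample_pdata = {
	.sda_pin = 101,
	.scl_pin = 100,
};
struct model_pdata_platform_device model_configured = {
	.name = "i2c-gpio",
	.dev = { .platform_data = &model_sample_pdata },
};
struct model_pdata_platform_device model_unconfigured = {
	.name = "i2c-gpio",
};
struct model_i2c_bus model_sample_bus;
```

If a board file handed this probe function a pointer to some other structure, the compiler would not object; the driver would simply read the wrong values.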
Not everybody in the kernel community is enamored with platform devices;
they seem like a bit of a hack used to encode information about specific
hardware platforms into the kernel. Additionally, the platform data
mechanism lacks any sort of type checking; drivers must simply assume that
they have been passed a structure of the expected type. Even so, platform
devices are heavily used, and that's unlikely to change, though the means
by which they are created and discovered is changing. The way of the
future appears to be device trees, which will be described in the following
article.
By Jonathan Corbet
June 21, 2011
The
first part of this pair of articles
described the kernel's mechanism for dealing with non-discoverable devices:
platform devices. The platform device scheme has a long history and is
heavily used, but it has some disadvantages, the biggest of which is the
need to instantiate these devices in code. There are alternatives coming
into play, though; this article will describe how platform devices interact
with the device tree mechanism.
The current platform device mechanism is relatively easy to use for a
developer trying to bring up Linux on a new system. It's just a matter of
creating the descriptions for the devices present on that system and
registering all of the devices at boot time. Unfortunately, this approach
leads to the proliferation of "board files," each of which describes a
single type of computer. Kernels are typically built around a single board
file and cannot boot on any other type of system. Board files sort of
worked when there were relatively small numbers of embedded system types to
deal with. Now Linux-based embedded systems are everywhere, architectures
which have typically depended on board files (ARM, in particular) are
finding their way into more types of systems, and the whole scheme looks
poised to collapse under its own weight.
The hoped-for solution to this problem goes by the term "device trees"; in
essence, a device tree is a textual description of a specific system's
hardware configuration. The device tree is passed to the kernel at boot
time; the kernel then reads through it to learn about what kind of system
it is actually running on. With luck, device trees will abstract the
differences between systems into boot-time data and allow generic kernels
to run on a much wider variety of hardware.
This article is a
good introduction to the device tree format and how it can be used to
describe real-world systems; it is recommended reading for anybody
interested in the subject.
It is possible for platform devices to work on a device-tree-enabled system
with no extra work at all, especially once Grant Likely's improvements are merged. If
the device tree includes a platform device
(where such devices, in the device tree context, are those which are
direct children of the root or are attached to a "simple bus"), that device
will be
instantiated and matched against a driver. The memory-mapped I/O and
interrupt resources will be marshalled from the device tree description and
made available to the device's probe() function in the usual way.
The driver need not know that the device was instantiated out of a device
tree rather than from a hard-coded platform device definition.
Life is not always quite that simple, though.
Device names appearing in the device tree (in the "compatible"
property) tend to take a standardized form which does not necessarily match
the name given to the driver in the Linux kernel; among other things,
device trees really are meant to work with more than one operating system.
So it may be desirable to attach specific names to a platform device for
use with device trees. The kernel provides an of_device_id
structure which can be used for this purpose:
static const struct of_device_id my_of_ids[] = {
    { .compatible = "long,funky-device-tree-name" },
    { }
};
When the platform driver is declared, it stores a pointer to this table in
the driver substructure:
static struct platform_driver my_driver = {
    /* ... */
    .driver = {
        .name = "my-driver",
        .of_match_table = my_of_ids
    }
};
The driver can also declare the ID table as a device table to enable
autoloading of the module as the device tree is instantiated:
MODULE_DEVICE_TABLE(of, my_of_ids);
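How such a table might be compared against a device tree node can be modeled in user space. In this sketch the model_ structures, the NULL-terminated layout, and the "vendor,..." strings are invented, and an exact string comparison is assumed; the kernel's own matching rules may well be more involved. A node's "compatible" property is treated as a list of names, and the first table entry found in that list wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified match table entry; a NULL compatible string ends the table. */
struct model_of_id {
	const char *compatible;
};

/* Simplified device tree node: just its list of compatible strings. */
struct model_of_node {
	const char *const *compatible;   /* NULL-terminated list */
};

/* Return the first table entry naming one of the node's compatible
 * strings, or NULL if the driver does not claim this node. */
const struct model_of_id *
model_of_match_node(const struct model_of_id *matches,
		    const struct model_of_node *node)
{
	const char *const *c;

	assert(matches != NULL && node != NULL && node->compatible != NULL);
	for (; matches->compatible != NULL; matches++)
		for (c = node->compatible; *c != NULL; c++)
			if (strcmp(*c, matches->compatible) == 0)
				return matches;
	return NULL;
}

/* Table modeled on the my_of_ids example in the text. */
const struct model_of_id model_my_of_ids[] = {
	{ "long,funky-device-tree-name" },
	{ NULL }
};

/* A node listing a board-specific name first, then a more generic one. */
const char *const model_funky_compatible[] = {
	"vendor,board-specific-name",
	"long,funky-device-tree-name",
	NULL
};
const struct model_of_node model_funky_node = { model_funky_compatible };

/* A node this driver knows nothing about. */
const char *const model_other_compatible[] = { "vendor,unrelated", NULL };
const struct model_of_node model_other_node = { model_other_compatible };
```

The first node is claimed through its second compatible string, which is how one driver can serve several boards that each add their own, more specific name.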
The one other thing capable of complicating the situation is platform
data. Needless to say, the device tree code is unaware of the specific
structure used by a given driver for its platform data, so it will be unable
to provide that information in that form. On the other hand, the device
tree mechanism is equipped to allow the passing of just about any
information that the driver may need to know. Making use of that
information will require the driver to become a bit more aware of the
device tree subsystem, though.
Drivers expecting platform data should check the dev.platform_data
pointer in the usual way. If there is a non-null value there, the driver
has been instantiated in the traditional way and device tree does not enter
into the picture; the platform data should be used in the usual way. If,
however, the driver has been instantiated from the device tree code, the
platform_data pointer will be null, indicating that the
information must be acquired from the device tree directly.
In this case, the driver will find a device_node pointer in the
platform device's dev.of_node field. The various device tree
access functions (of_get_property(), primarily) can then be used
to extract the needed information from the device tree. After that, it's
business as usual.
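The decision described above can be sketched as a user-space model. None of this is kernel code: the model_ types, the "fifo-depth" property, and the fictional driver's platform data are assumptions made for the example, and model_of_get_property() only imitates the general shape of a property lookup. One detail the model does try to capture is that device tree cells are stored big-endian, so a raw property value needs converting before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct model_dt_property {
	const char *name;
	const unsigned char *value;   /* raw bytes as stored in the tree */
	int length;
};

struct model_dt_node {
	const struct model_dt_property *props;
	size_t nprops;
};

/* Platform data for a fictional driver. */
struct model_dt_pdata {
	uint32_t fifo_depth;
};

/* The two fields of interest from the kernel's struct device. */
struct model_dt_device {
	void *platform_data;
	const struct model_dt_node *of_node;
};

/* Look up a property by name; optionally report its length in bytes. */
const void *model_of_get_property(const struct model_dt_node *np,
				  const char *name, int *lenp)
{
	size_t i;

	assert(np != NULL && name != NULL);
	for (i = 0; i < np->nprops; i++) {
		if (strcmp(np->props[i].name, name) == 0) {
			if (lenp)
				*lenp = np->props[i].length;
			return np->props[i].value;
		}
	}
	return NULL;
}

/* Decode one big-endian 32-bit cell, whatever the host byte order. */
uint32_t model_be32_to_cpu(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Prefer platform data; fall back to the device tree. Returns 0 or -1. */
int model_get_fifo_depth(const struct model_dt_device *dev, uint32_t *depth)
{
	const struct model_dt_pdata *pdata;
	const unsigned char *cell;
	int len = 0;

	assert(dev != NULL && depth != NULL);
	pdata = dev->platform_data;
	if (pdata) {                       /* traditional instantiation */
		*depth = pdata->fifo_depth;
		return 0;
	}
	if (!dev->of_node)
		return -1;
	cell = model_of_get_property(dev->of_node, "fifo-depth", &len);
	if (!cell || len != 4)
		return -1;
	*depth = model_be32_to_cpu(cell);
	return 0;
}

/* One device described by a tree node, one by board-file platform data. */
const unsigned char model_depth_cell[4] = { 0x00, 0x00, 0x01, 0x00 }; /* 256 */
const struct model_dt_property model_props[] = {
	{ "fifo-depth", model_depth_cell, 4 },
};
const struct model_dt_node model_tree_node = { model_props, 1 };
struct model_dt_pdata model_board_pdata = { 64 };
const struct model_dt_device model_tree_device = { NULL, &model_tree_node };
const struct model_dt_device model_board_device = { &model_board_pdata, NULL };
uint32_t model_depth_out;
```

Both sample devices end up configured by the same function; only the source of the number differs, which is the sense in which the rest of the driver can carry on as usual.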
In summary: making platform drivers work with device trees is a relatively
straightforward task. It is mostly a matter of getting the right names in
place so that the binding between a device tree node and the driver can be
made, with a bit of additional work required in cases where platform data
is in use. The nice result is that the static platform_device
declarations can go away, along with the board files that contain them.
That should, eventually, allow the removal of a bunch of boilerplate code
from the kernel while simultaneously making the kernel more flexible.
Page editor: Jonathan Corbet