Brief items
The current stable 2.6 release is 2.6.16.5.
2.6.16.2 was released on
April 7 with a fair number of fixes;
2.6.16.3 and
2.6.16.4 - each containing a
single security fix - both came out on April 11, and
2.6.16.5, with a pair of x86-64
fixes, was released on April 12.
The current 2.6 prepatch remains 2.6.17-rc1; no prepatches have been
released over the last week. 2.6.17-rc2 would appear to be imminent,
however, and may be out by the time you read this.
The patches merged since 2.6.17-rc1 are mostly fixes, but there are a few
more substantive changes, including a simplified form of the scheduler starvation avoidance
patch, some tweaks to the memory overcommit algorithm, the removal of
the obsolete blkmtd driver, the removal of the unmaintained Sangoma WAN
drivers, and a new "kernel internal pipe" object (and other changes) for the
splice() system call.
The current -mm tree is 2.6.17-rc1-mm2. Recent changes to
-mm include a new set of patches for 64-bit resource tracking and some core
dump code tweaks.
Comments (none posted)
Kernel development news
"Virtualization" is the act of making a set of processes believe that it
has a dedicated system to itself. There are a number of approaches being
taken to the virtualization problem, with Xen, VMWare, and User-mode Linux
being some of the better-known options. Those are relatively heavy-weight
solutions, however, with a separate kernel being run for each virtual
machine. Often, that is exactly the right solution to the problem; running
independent kernels gives strong separation between environments and
enables the running of multiple operating systems on the same hardware.
Full virtualization and paravirtualization are not the only approaches
being taken, however. An alternative is lightweight virtualization,
generally based on some sort of container concept. With containers, a
group of processes still appears to have its own dedicated system, but it
is really running in a specially isolated environment. All containers run
on top of the same kernel. With containers, the ability to run different
operating systems is lost, as is the strong separation between virtual
systems. Thus, one might not want to give root access to processes running
within a container environment. On the other hand, containers can have
considerable performance advantages, enabling large numbers of them to run
on the same physical host.
There is no shortage of container-oriented projects. These include
relatively simple efforts like the BSD jail module through more
thorough efforts like Linux-VServer, OpenVZ, and the proprietary Virtuozzo (based on OpenVZ) offering. Many of these
projects would like to get at least some of their code into the kernel and
shed the load of carrying out-of-tree patches. There is little
interest, however, in merging code which only supports some of these
projects. The container people are going to have to get together and work
out some common solutions which they can all use.
It appears that this is exactly what the container developers are doing. A
loose agreement has been put in place
wherein developers from a few projects will discuss proposed changes and
jointly work them into a form where they meet everybody's needs. Once a
particular patch has reached a point where all of the developers are
willing to sign off on it, it can be forwarded for eventual merging into
the mainline.
The more complex and intrusive changes, such as PID virtualization, appear to be
on hold for now. Instead, it looks like the first jointly-agreed patch
might be the UTS namespace
virtualization patch. The aim of the patch is relatively straightforward:
it allows each container (as represented by a family tree of processes) to
have its own version of the utsname structure, which holds the
node name, domain name, operating system version, and a few other things.
In essence, it replaces a single global structure with multiple structures
attached at various places in the process tree. It still requires a
five-part patch, with every reference to the global system_utsname
structure replaced by a call to the new utsname() function.
Longer-range plans call for the virtualization of every global namespace in
the kernel, including SYSV IPC, process IDs, and even netfilter rules.
There was an interesting discussion on the virtualization of security
modules; some think that each container should be able to load its own
security policy, while others argue in favor of a single system security
policy which is aware of (and able to use) containers. Unsurprisingly,
SELinux is already equipped with a type hierarchy mechanism which can be
used with containers in the single-policy approach.
Containers might still prove to be a hard sell with some developers, who
will see them as complicating access to many internal kernel data structures
without adding a whole lot of value. It is clear, however, that there is a
demand for this sort of lightweight virtualization - OpenVZ, alone, claims to be running over 300,000 virtual
environments. So the pressure to standardize this code and move it into
the mainline will only grow over time. Once they are clean enough to
satisfy the development community, pieces of the container concept are
likely to be merged.
Comments (6 posted)
The
new splice() system
call was covered here last week. As was predicted then, this new
kernel API has continued to evolve; many of the non-fix patches going into
the post-2.6.17-rc1 mainline involved changes to
splice().
For starters, the prototype of the splice() system call has
changed:
long splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
size_t len, unsigned int flags);
The two offset values (off_in and off_out) are new; they
indicate where each file descriptor should be positioned prior to beginning
the transfer of data. Note that these offsets are passed via pointers;
user space can use a NULL pointer to indicate that the current offset
should be used. Note also that these offsets do not work like the
offsets in pread() or pwrite(): they will actually change
the offset associated with the file descriptor. Providing an offset for a
file descriptor associated with a pipe is an error.
Internally, the splice() code has seen a couple of interesting
changes. One of them (in the regular pipe code, actually) is the creation
of a new pipe_inode_info structure to represent the core machinery
behind a pipe. This structure can stand apart from the normal
inode structure. Many of the internal interfaces have been
changed to use the new structure, including the new methods in the
file_operations structure:
ssize_t (*splice_write)(struct pipe_inode_info *pipe,
struct file *out, size_t len,
unsigned int flags);
ssize_t (*splice_read)(struct file *in, struct pipe_inode_info *pipe,
size_t len, unsigned int flags);
Since there are still few implementations of these methods in the kernel,
the changes are not particularly disruptive.
Next in the list is support for directly splicing two file descriptors
where neither is a pipe. This functionality is not (yet) available to user
space via splice(), but it is used internally to implement
sendfile() with the splice() mechanism. The direct
splicing is implemented using a hidden pipe_inode_info structure
(i.e. a pipe); it is stored in the new splice_pipe field of the
task structure, so each process can only have one such connection running
at any given time.
One patch which has not been merged - and will likely wait until 2.6.18 at
this point - is the
tee() system call:
long tee(int fdin, int fdout, size_t len, unsigned int flags);
This call requires that both file descriptors be pipes. It simply connects
fdin to fdout, transferring up to len bytes
between them. Unlike splice(), however, tee() does not
consume the input, enabling the input data to be read normally later on by the calling
process. Jens Axboe provides an example implementation of the user-space
tee utility, which comes down to a couple of calls:
len = tee(STDIN_FILENO, STDOUT_FILENO, INT_MAX, SPLICE_F_NONBLOCK);
splice(STDIN_FILENO, out_file, len, 0);
The input data will be written both to the standard output and the file
represented by out_file without ever being copied to or from user
space.
To be sure of copying the entire input data stream, the application must
perform the above calls within a loop, of course; see the full example at
the end of the tee()
patch.
This call is quite new, and may well change before it makes it into the
mainline. Among other things, it might get a new name, since something as
simple as tee() may already be in use in a number of
applications.
Comments (6 posted)
Once upon a time, setting up a Linux system was a long and problematic
process, with no assurance that a given system would work without great
pain. Most of those problems have been overcome for years, and, to a great
extent, a system can be expected to "just work" with Linux. A few
problematic areas remain, however, and wireless networking is one of them.
Even when the available hardware is supported (often not the case), making
a wireless connection work in a fully-featured way can often be a
challenge.
A lot is happening in the wireless area, however. To help things along,
the folks at the Open Source Development Labs hosted a summit for
wireless networking developers on April 6
and 7 at OSDL headquarters in Portland. This event brought
together a diverse mix of developers from around the world, many of whom
had never met before. Its purpose - to chart a path forward for the
creation of a reasonable Linux wireless networking implementation -
appeared to have been largely achieved.
Your editor was fortunate enough to be able to attend this meeting. The
following report is an attempt to summarize the conclusions from the summit
- it is not a set of detailed minutes, and your editor will engage in some
chronological reordering along the way. Hopefully the result will provide
a sense for where things stand, and where they are likely to go in the near
future.
History
As John Linville (the recently-named wireless networking maintainer) noted
in a conversation with your editor, early wireless adapters were marketed
as "wireless Ethernet," and Linux kernel developers treated them as a sort
of slow, unreliable, fiddly Ethernet adapters. But wireless is not
Ethernet in any way - it is a completely different set of networking
standards with its own quirks, special features, and distinct needs.
Treating wireless as a form of Ethernet slowed support for those special
features, and, more importantly, impeded the development of any sort of
internal kernel support for wireless. Each developer who set out to
write a driver for a wireless adapter ended up implementing everything from
the beginning. So there was no general wireless API, no comprehensive
support of wireless features, and a great deal of divergence and
duplication of code between drivers.
In 1997, Jean Tourrilhes decided to do something about this situation. The
result was WE-1 - the first version of the Linux wireless extensions.
There was still no 802.11 standard at the time, but the WE API enabled the
configuration and operation of wireless adapters with a single set of
tools. Jean's wireless tools are still the core utilities for managing
wireless adapters, though the graphical interfaces are replacing the
wireless tools for most users.
Development of the wireless extensions continued, with WE-9 - supporting
the new 802.11 standard - being released in 1999. WE-18, merged last year,
added support for WPA ("WiFi Protected Access"). The current revision,
WE-20, adds a new, netlink-based interface as a future replacement for the
current ioctl() API.
Though development continues, there appears to be a general, shared feeling
that the wireless extensions are heading toward the end of their useful
life. A replacement API - which does not exist yet - would work with the
entire wireless networking stack, rather than being an interface directly
to the low-level drivers. Regardless of how that plays out, however, the
wireless extensions are likely to be around for a long time to come.
The current status
The current effort to create a proper wireless stack for Linux started in
2004, when Jeff Garzik announced the creation of a
special wireless tree, initially seeded with the HostAP code. The merging
of HostAP enabled support of some relatively current networking cards and
the use of a Linux system as a wireless access point. The creation of this
tree did help to get things going, but HostAP has not turned out to be
everything that had been hoped for. In particular, there is no support in
HostAP for cards which need software MAC ("softmac") implementations. But
many contemporary cards rely upon the host software for many low-level
operations; these cards can not be supported by the wireless stack found in
the current (2.6.16) kernel.
The result is, as John Linville put it, a Linux wireless implementation
which supports "anything which is obsolete." Some cards are supported,
some better than others. Most noteworthy among current hardware is the set
of Intel IPW drivers, which, thanks to Intel, have very good support in the
kernel - but these adapters do not need softmac support.
What is lacking, at this point, is a small list of mildly desirable
features, including support for much widely-used hardware. Ease of use is
also lacking - despite improvements in the graphical tools, configuring a
wireless connection can still be a painful procedure. Perhaps the best
demonstration of these two problems was to be found at the summit itself,
where about 25% of the participants ended up using an Ethernet cable to
plug directly into the OSDL network.
Other problems include consistency (or the lack thereof) across hardware -
there are still a number of adapter-specific APIs in the kernel and in the
out-of-tree drivers. The documentation of APIs is, well, nonexistent; a
complaint was heard that Linux Device Drivers does not describe how to
write a wireless driver. There is no coordinated process for extending
APIs. Quality of service support is not present - an issue we'll return to
shortly. There are no driver test suites in general circulation. And the
whole regulatory issue looms over the wireless networking arena, and is the
largest single cause of out-of-tree (or nonexistent) wireless drivers.
Many vendors simply do not feel that they can release programming
information or free drivers and remain compliant with regulatory regimes
worldwide.
Meanwhile, the upcoming 2.6.17 kernel will see some improvements in its
wireless support. John merged one of the many softmac implementations out
there, on the theory that it was one of the most active projects and that
it would help to support driver development. The bcm43xx (Broadcom)
driver, which uses softmac, was also merged, and there are a couple of
other softmac-based drivers under development. Even so, the consensus
appears to be that softmac is not the way forward; that, instead, the
Devicescape stack is the real future of Linux wireless.
Devicescape
Devicescape is a company which offers
a number of products and services around wireless networking. In
developing it offerings, Devicescape created its own, Linux-based 802.11
stack with a number of nice features - including good softmac and WPA
support . This stack was recently released under the GPL and has been
fixed up for the kernel by Jiri Benc. It is regarded by many as
being the best of the available free stacks.
When Jeff Garzik maintained the wireless tree, he took a firm position
against moving to the Devicescape stack, stating instead that the in-kernel
code should be evolved toward the needed capabilities. He appears to have
found himself in the minority, however, and John Linville seems poised to
merge this stack for a future kernel release. He maintains a separate
development tree which includes Devicescape, and some drivers (notably
bcm43xx) have been ported to this stack. Nobody at the summit was heard to
argue against merging Devicescape.
Devicescape hacker Simon Barber talked about this code for a bit,
and a separate breakout session addressed it as well. This stack is a
large body of code. The freely-released code available now includes the
802.11 stack, the "openap" access point code, and a link-layer bridging
module. Work which will be released soon includes improvements to the
hostapd daemon (802.11g support, among other things; this code is being
merged now); bridging and VLAN integration, and various improvements to
Ethereal for wireless developers. There is also "a complete home gateway
distribution" in the works. There is the inevitable web portal being put
together to provide access to all this code.
Quite a bit of work is foreseen for the Devicescape stack. It is composed,
internally, of a long list of handler functions which deal with frames
(both data packets and 802.11 management frames) on their way to and from
the adapter. Future plans call for enabling loadable modules to plug in
their own handler functions. More of the management code may also
eventually be moved out to user space. To that end, some additional
management capabilities will be added to the hostapd daemon, which handles
authentication and management tasks. Merging hostapd with wpa_supplicant, which
handles the client side of the authentication process, is envisioned;
evidently a number of things become easier when the two functions are
merged into the same process.
There is also a great deal of complexity coming with the long list of
future 802.11 standards. These standards will require support as they are
adopted.
One interesting area of development has to do with quality of service
support. 802.11 defines four service levels: "voice," "video," "best
effort," and "background." There is a priority range for each service
level, and the ranges overlap. All voice packets will go out before any
background packets, but the rest of the levels will share the available
bandwidth. With proper QoS support, a wireless user can carry on a
voice-over-IP conversation, stream video of the latest "breaking news"
celebrity sighting from CNN, grab a new kernel by FTP, and distribute
materials (best not to ask what) via Bittorrent. Each activity can operate
at its own quality of service level, and all should get the best available
performance.
Some wireless network adapters have quality of service support in the
form of four separate transmit queues. If the host places each packet in
the appropriate queue, the adapter will divide the available bandwidth
between them in a way which respects each level's service quality. The
problem is that the Linux networking stack only supports one transmit queue
per device. This presents a problem when one of the four device-level
queues fills up. There is no way to tell the kernel that no more
background packets can be queued, but there is still space for voice
packets, for example; the only thing the driver can do is to stop the queue
for all packets.
The Devicescape hackers have worked around this problem using the traffic
control mechanism built into the networking stack, which normally operates
at a level not seen by driver code. By creating a separate internal queue
for each service level, the Devicescape stack can, for all practical
purposes, implement a separate transmit queue for each service level. Even
better, it becomes possible to configure policy - which types of traffic
get which service level - from user space using the normal traffic control
tools. What would be nice, however, would be to generalize this use of the
queueing discipline code, and to make it available for other sorts of
hardware as well.
Another area requiring work is user-space API definition. There is no
well-understood API which, for example, can be used by a graphical wireless
management utility to talk with the networking stack and with processes like
hostapd. There isn't really even a discussion of how such an API should
look at the moment.
Other open issues include the usual regulatory hassles, the lack of a
user-space MAC-layer management environment, the need for better scanning,
support for adapters which perform MAC management in hardware,
power management support, and a rework of the configuration interface.
Configuration is handled by way of ioctl() calls and a
/proc interface. It was noted, in a pointed manner, that the
Devicescape code will not make it into the mainline as long as it contains
/proc files. It seems that the Devicescape stack also needs some
work before it will operate properly on SMP systems.
Finally, adding proper wireless support to the kernel will involve the
creation of a specific net_device type for 802.11 devices. An
802.11-specific sk_buff structure should also be defined. Current
code still uses the Ethernet types and drags along the extra needed
information on the side.
The biggest open issue, however, may be this: what happens to the
just-merged softmac code when Devicescape is merged? There is much
duplication of functionality there, and nobody is thrilled by the idea of
having to maintain two separate 802.11 stacks indefinitely into the
future. There is a clear parallel with the OSS and ALSA sound drivers;
ALSA was supposed to replace OSS, but removing the OSS drivers has proved
to be a difficult thing to do. It is not clear what can be done to make
removing softmac any easier.
Tools
The summit was mostly attended by kernel-oriented developers, but there was
also some discussion of user-space tools; NetworkManager
hacker Daniel Williams was present. It is recognized by all that,
while the quality of the available tools has improved significantly in the
last couple of years, there is some ground to cover yet. In particular,
configuring an interface can be relatively painless when things go well,
but, as soon as something doesn't quite work, the whole experience falls
apart.
Improving the situation will require support from the kernel side. When
things go wrong, user space needs to know just what the problem is. But
there is no consistent set of error codes returned by the kernel to
indicate, for example, that the required adapter firmware is not present,
or the provided WEP key is not valid. Some drivers support more of the
current API than others, which does not help, and API documentation is
generally not available. Better scanning support would also be useful.
Hardware support
While getting the networking stack and user-space tools into shape is
necessary, improving hardware support is also a necessary step toward a
Linux wireless implementation which truly "just works." Some hardware
(Intel, others) is well supported now, others (Broadcom) will be supported
soon. Some, such as the Atheros chipset, may be a long time in coming.
The existing Atheros driver (as found in OpenBSD) appears to be severely
tainted by code of questionable origin, to the point that its chances of
being merged into Linux are about zero. There is an effort to
document the Atheros hardware from the currently-available code, enabling a
clean-room driver implementation in the future, but there is quite a bit of
work yet to be done.
The regulatory compliance issue came up again in this context. Some
adapters (such as Atheros) are, for all practical purposes, general-purpose
radios which can be programmed to operate far out of the 802.11
specification. When a free driver is developed for such hardware, it would
be a Good Thing to be sure that it runs the hardware in a manner compliant
with the applicable regulations, even if it cannot necessarily be certified
as such. That sort of testing requires specialized equipment, however, and
is evidently a multi-day process. The necessary equipment does exist at
companies like Nokia and at some universities, but there is currently no
process for obtaining access to that equipment for compliance testing.
Much of the current driver work is done outside of the mainline tree, and
the kernel developers would like to see that changed. Once code gets into
the mainline, it is easier for others to review and improve. Greg
Kroah-Hartman encouraged driver developers to merge their code as early as
possible, even if it doesn't work yet.
Communications regarding wireless drivers, it was agreed, would remain on
the netdev mailing list for now. If, at some point, that conversation
threatens to overwhelm other traffic on netdev, a new list can be created.
There will also likely be a web set put together for wireless driver
information in the near future.
Other issues
One purpose behind the summit was simply to try to pull more of the
relevant developers into the wider kernel process. To that end, there was
a talk on source control systems - git and quilt in particular. The "merge
early" approach was advocated many times.
Stephen Hemminger gave a talk on the state of the bridging code.
Bridging is of interest to wireless developers - it can be used for
connection sharing and mesh networking applications. To that end, the
bridging code is likely to be reworked and much of it moved to user space.
Just like routing is mostly handled by user-space daemons now, bridging
management - including the spanning tree maintenance - will move to user
space in the future.
Some representatives of the Personal
Telco Project were brave enough to compete with a delivery of pizza for
the developers' attention at lunch time. These folks have put together a
network of over 100 Linux-based free wireless hotspots around Portland.
They had a number of requests of the kernel developers, including
free Atheros drivers which don't crash the system and good,
zero-configuration mesh networking. This is an interesting project which
shows the power of what a few "unemployed geeks" can do.
Overall, the wireless summit was an optimistic event. While the
shortcomings of Linux wireless support were well recognized and understood,
there was also a clear sense that not only could the problems be solved,
but that many of the solutions were already well advanced. If all goes
according to plan, the day when Linux wireless "just works" is not that far
off.
Comments (30 posted)
Patches and updates
Kernel trees
Core kernel code
- Jens Axboe: sys_tee.
(April 11, 2006)
Development tools
Device drivers
Documentation
Filesystems and block I/O
Kernel building
Memory management
Networking
Architecture-specific
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet
Next page: Distributions>>