The current 2.6 prepatch is 2.6.21-rc4, released by Linus on
March 16. It consists mostly of fixes, but there is also a patch
which lets device-oriented code request a callback (from process
context) in the near future. See the changelog for more details on
2.6.21-rc4.
The current -mm tree is 2.6.21-rc4-mm1. Recent changes
to -mm include a new version of the lumpy reclaim patch, some
anti-fragmentation work, an updated RSDL scheduler, and the revoke() system call.
There is a 18.104.22.168 stable kernel update in the works as this is written;
it may well be released by the time you read it.
For older kernels: 22.214.171.124 was released on
March 20 with a fair number of fixes, a couple of which are
Kernel development news
Quite frankly, I was *planning* on merging RSDL very early after
2.6.21, but there is one thing that has turned me completely off
the whole thing:
- the people involved seem to be totally unwilling to even admit there
might be a problem.
This is like alcoholism. If you cannot admit that you might have a
problem, you'll never get anywhere. And quite frankly, the RSDL
proponents seem to be in denial ("we're always better", "it's your
problem if the old scheduler works better", "just one report of old
scheduler being better").
-- Linus Torvalds
When memory gets tight (a situation which usually comes about shortly after
starting an application like tomboy), the kernel must find a way to free up
some pages. To an extent, the kernel can free memory by cleaning up its
own internal data structures - reducing the size of the inode and dentry
caches, for example. But, on most systems, the bulk of memory will be
occupied by user pages - that is what the system is there for in the first
place, after all. So the kernel, in order to accommodate current demands
for user pages, must find some existing pages to toss out.
To help in the choice of pages to remove, the kernel maintains two big
linked lists for each memory zone. The "active" list contains pages which
have been recently accessed, while the "inactive" list has those which have
not been used in the recent past. When the kernel looks for pages to
evict, it will scan through the inactive list, on the theory that the pages
least likely to be needed soon are to be found there.
There is an additional complication, though: there are two fundamental
types of pages to be found on these lists. "Anonymous" pages are those
which are not associated with any files on disk; they are process memory
pages. "Page cache" pages, instead, are an in-memory representation of
(portions of) files on the disks. A proper balance between anonymous and
page cache pages must be maintained, or the system will not perform well.
If either type of page is allowed to predominate at the expense of the
other, thrashing will result.
The kernel offers a knob called swappiness which controls how
this balance is struck. If the system administrator sets a higher value of
swappiness, the kernel will allow the page cache to occupy a larger portion
of memory. Setting swappiness to a very low value is a way to tell the
kernel to keep anonymous pages around at the expense of the page cache. In
general, the system can be expected to perform better if page cache pages
are reclaimed first; they can often be reclaimed without needing to be
written back to disk, and their layout on the disk can make recovery faster
should they be needed again. For this reason, the default value for
swappiness favors the eviction of page cache pages; anonymous pages will
only be targeted when memory pressure becomes relatively severe.
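For readers who want to see where their own system stands, here is a minimal sketch which reads the current setting. It assumes the knob is exposed at /proc/sys/vm/swappiness, as it is on Linux systems with /proc mounted; the function name read_swappiness() is made up for this example.

```c
#include <stdio.h>

/*
 * Return the current swappiness value, or -1 if it cannot be read
 * (for example, on a system without /proc mounted).
 */
int read_swappiness(void)
{
    FILE *f = fopen("/proc/sys/vm/swappiness", "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Changing the value means writing a new number to the same file as root; the sketch deliberately stops at reading.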
Swappiness clearly affects how the process of scanning pages for eviction
candidates is done. If swappiness is low,
anonymous pages will simply be passed over. As it turns out, this behavior
can lead to performance problems; there may be a lot of anonymous pages
which must be scanned over before the kernel finds any page cache pages,
which are the ones it was looking for in the first place. It would be nice
to avoid all of that extra work, especially since it comes at a time when
the system is already under stress.
Rik van Riel has posted a
patch which tries to improve this situation. The approach taken is
quite simple: the active and inactive lists are each split into two new
lists: one pair (active and inactive) for anonymous pages and one pair for
page cache pages. With
separate lists for the page cache, the kernel can go after those pages
without having to iterate over a bunch of uninteresting anonymous pages on
the way. The result should be better scalability on larger systems.
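The effect of the split can be illustrated with a small user-space sketch. This is not code from Rik's patch; the structure and function names (zone_lru, lru_add(), lru_reclaim()) are hypothetical, and the real kernel lists are considerably more involved. The point is only that, with one inactive list per page type, reclaiming page cache pages never touches an anonymous page.

```c
#include <stddef.h>

enum page_type { PAGE_ANON, PAGE_FILE, NR_PAGE_TYPES };
enum lru_state { LRU_INACTIVE, LRU_ACTIVE, NR_LRU_STATES };

struct page {
    enum page_type type;
    struct page *next;
};

/* Four lists per zone: {anonymous, page cache} x {inactive, active}. */
struct zone_lru {
    struct page *head[NR_PAGE_TYPES][NR_LRU_STATES];
    size_t scanned;   /* pages examined while looking for victims */
};

/* Callers must now say which list a page goes on; the type picks the pair. */
void lru_add(struct zone_lru *z, struct page *p, enum lru_state state)
{
    p->next = z->head[p->type][state];
    z->head[p->type][state] = p;
}

/*
 * Take up to max pages off the inactive list for one page type.
 * Pages of the other type are never examined.
 */
size_t lru_reclaim(struct zone_lru *z, enum page_type type, size_t max)
{
    size_t freed = 0;

    while (freed < max && z->head[type][LRU_INACTIVE] != NULL) {
        struct page *p = z->head[type][LRU_INACTIVE];

        z->head[type][LRU_INACTIVE] = p->next;
        p->next = NULL;
        z->scanned++;
        freed++;
    }
    return freed;
}
```

With a single combined inactive list, the scanned counter would also include every anonymous page passed over on the way to a page cache page.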
The idea is simple, but the patch is reasonably large. Any code which
puts pages onto one of the lists must be changed to specify which list is
to be used; that requires a number of small changes throughout the memory
management and filesystem code. Beyond that, the current patch does not
really change how the page reclamation code works, though Rik does note:
For now the swappiness parameter can be used to tweak swap
aggressiveness up and down as desired, but in the long run we may
want to simply measure IO cost of page cache and anonymous memory
There tends to be a lot of sympathy for changes which remove tuning knobs
in favor of automatic adaptation within the kernel itself. So if this
approach could be made to work, it might well be adopted. Getting system
tuning right is hard; it's often better if the computer can figure it out by itself.
Meanwhile, the list-splitting patch, so far, lacks widespread testing or
benchmarking. So, at this point, it is difficult to say when (or in what
form) this patch will find its way into the mainline.
Applications do not normally worry about the allocation of blocks for files
they create; instead, they simply write the data and assume that the kernel
will do a proper job of finding a home for that data. There are times when
it is useful to take a more active role in block allocation, though. If an
application knows how much data it will be writing, it can request the
needed blocks ahead of time, enabling the kernel to allocate them all at
once, contiguously on the disk. Application developers concerned about
reliability may also want to know that the needed disk space has already
been procured before beginning a critical operation.
Unix systems have not traditionally provided a way for applications to
control block allocation. An application on a current Linux kernel has
only one way to force allocation: write a stream of data to the relevant
portion of the file. This technique works, but it loses one of the
advantages of preallocation: letting the kernel do all the work at once and
ensure that the blocks are contiguous on disk if possible. Writing useless
data to the disk solely for the purpose of forcing block allocation is also wasteful.
The POSIX way of preallocating disk space is the posix_fallocate()
function, defined as:
int posix_fallocate(int fd, off_t offset, off_t len);
On success, this call will ensure that the application can write up to
len bytes to fd starting at the given offset and
know that the disk space is there for it.
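A minimal sketch of how an application might use this call follows. The helper name open_preallocated() is invented for the example; only posix_fallocate() itself comes from the standard. Note that the call returns an error number directly rather than setting errno.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or open) path and reserve len bytes of disk space starting at
 * offset 0. Returns an open file descriptor, or -1 if the file could not
 * be opened or the space could not be reserved.
 */
int open_preallocated(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    int err;

    if (fd < 0)
        return -1;

    /* Returns 0 on success or an error number on failure. */
    err = posix_fallocate(fd, 0, len);
    if (err != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

An application about to write a megabyte of critical data could call open_preallocated(path, 1 << 20) and proceed only if the returned descriptor is valid.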
Linux does not currently have an implementation of
posix_fallocate() in the kernel. This patch by Amit Arora may
change that situation, however. Amit's patch has been through a couple of
rounds of review which have changed the interface considerably; the current
form of the proposed system call is:
long fallocate(int fd, int mode, loff_t offset, loff_t len);
The fd, offset, and len arguments have the same
meaning as with posix_fallocate(), making it easy for the C library to
implement the standard interface. The additional mode argument
changes the way the call operates; normal usage will be to specify
FA_ALLOCATE, which causes the requested blocks to be allocated.
If, instead, FA_DEALLOCATE is given, the requested block range
will be deallocated, allowing an application to punch a hole in the file.
Internally, the system call does not do much of the work; instead, it calls
the new fallocate() inode operation. Thus, each filesystem must
implement its own fallocate() support. Future plans call for
a possible generic implementation for filesystems which lack
fallocate() support, but the generic version would almost
certainly have to rely on writing zeroes to the file. By pushing the
operation into the filesystem itself, the kernel gives the filesystem the
opportunity to satisfy the allocation in a more efficient way, without the
need to write filler data. Filesystems do need to be sure that
applications cannot use fallocate() to read old data from the
allocated blocks, though.
For now, filesystem-level support is scarce. There are patches circulating
which add fallocate() support to ext4. The XFS filesystem has
supported preallocation (through a special ioctl() call) for some
time, but will need to be modified to do preallocation through the new
inode operation. It's not clear when other filesystems may get native
support; the tracking of allocated but unwritten blocks is a significant
addition. So, for the near future, the efficiency benefits of
fallocate() may be unavailable for most users.
Fifty members of the Linux storage and file system communities met
February 12 and 13 in San Jose,
California to give status updates, present new ideas and discuss issues during
the 2007 Linux Storage and File Systems Workshop. The workshop was chaired
by Ric Wheeler and sponsored by EMC, NetApp, Panasas, Seagate and Oracle.
Day 1: Joint Session
Ric Wheeler opened with an explanation of the basic contract that storage
systems make with the user: the complete set of data will be
stored, bytes are correct and in order, and raw capacity is utilized as
completely as possible. It is so simple that it seems that there should be no
open issues, right?
Today, this contract is met most of the time, but Ric posed a number of
questions. How do we validate that no files have been lost? How do we
verify that the
bytes are correctly stored? How can we utilize disks efficiently for small
files? How do errors get communicated between the layers?
Through the course of the next two days some of these questions were discussed,
others were raised and a few ideas proposed. Continue reading for the details.
Ext4 Status Update
Mingming Cao gave a status update on ext4, the recent fork of the ext3 file
system. The primary goal of the fork was the move to 48-bit block numbers;
this change allows the file system to support up to 1024 petabytes of storage.
This feature was originally designed to be merged into ext3, but was seen as too disruptive. The patch is also
built on top of the new extents
system. Support for more than 32K directory entries will also be merged into ext4.
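The 1024-petabyte figure follows from simple arithmetic if one assumes 4KB blocks, which is an assumption here; the block size was not stated in the talk summary. The sketch below (the function name is made up) computes the addressable capacity for a given block-number width.

```c
#include <stdint.h>

/*
 * Maximum addressable capacity, in TiB (2^40 bytes), for a file system
 * with 2^block_number_bits blocks of 2^block_size_shift bytes each.
 * The two arguments must sum to at least 40 and to less than 104.
 */
uint64_t max_capacity_tib(unsigned block_number_bits, unsigned block_size_shift)
{
    return (uint64_t)1 << (block_number_bits + block_size_shift - 40);
}
```

With 48-bit block numbers and 4KB blocks (a shift of 12), the result is 2^20 TiB, which is 1024 PiB. The same calculation with 32-bit block numbers yields 16 TiB.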
On top of these changes a number of ext3 options will be enabled by default
including: directory indexing which improves file access for large directories,
"resize inodes" which reserve space in the block group descriptor for online
growing, and 256-byte inodes. Ext3 users can use these features today with
a command like:
mkfs.ext3 -I 256 -O resize_inode,dir_index /dev/device
A number of other features are also being considered for inclusion into ext4
and have been posted on the list as RFCs.
These include a patch that will add nanosecond
timestamps and the creation of persistent
file allocations, which will be similar to posix_fallocate() but won't waste
time writing zeros to the disk.
Ext4 currently stores a limited number of extended attributes in-inode and has
space for one additional block of extended attribute data, but this may not be
enough to satisfy xattr-hungry applications. For example, Samba needs
additional space to support Vista's heavy use of ACLs, and eCryptFS can store
arbitrarily large keys in extended attributes. This led everyone to the
conclusion that data needs to be collected on how extended attributes are being used to help
developers decide how to best implement them. Until larger extended attributes
are supported, application developers need to pay attention to the limits that
exist on current file systems, e.g. one block on ext3 and 64K on XFS.
Online shrinking and growing was briefly discussed and it was suggested that
online defragmentation, which is a planned feature, will be the first step
toward online shrinking. A bigger issue, however, is storage management, and Ted
Ts'o suggested that the Linux file system community can learn from ZFS on how
to create easy-to-manage systems. Christoph Hellwig sees the disk management
issue as being a user space problem that can be solved with kernel hooks and
sees ZFS as a layering violation. Either way it is clear that disk management
should be improved.
The fsck Problem
Zach Brown and Valerie Henson were slated to speak on the topic of file system
repair. While Val booted her laptop, she introduced us to the latest fashion:
laptop rhinestones, a great discussion piece if you are waiting on a fsck. If
Val's estimates for fsck time in 2013 come true, having a way to pass the time
will become very important.
Val presented an estimate of 2013 fsck times. She first measured a fsck of her
37GB /home partition (with 21GB in use) which took 7.5 minutes and read 1.3GB of
file system data. Next, she used projections of disk technology from Seagate to
estimate the time to fsck a circa-2013 home partition, which will be 16 times larger.
Although 2013 disks will have a five-fold bandwidth increase, seek times will
only improve about 1.2 times (to 10ms) leading to an increase in fsck time from about 8
minutes to 80 minutes! The primary reason for long fscks is seek latency, since
fsck spends most of its time seeking over the disk discovering and fetching
dynamic file system data like directory entries, indirect blocks and extents.
Reducing seeks and avoiding the seek latency punishment is key to reducing fsck
times. Val suggested one solution would be keeping a bitmap on disk that
tracks the blocks that contain file system metadata; this would allow for
reading all data in a single arm sweep. This optimization, in the best case,
would make a single sequential sweep over the disk and, on the future disk, reading
all file system metadata would only take around 134 seconds, a large
improvement over 80 minutes. A full explanation of the findings and possible
solutions can be found in the paper Repair-Driven File System
Design [PDF]. Also, Val announced that she is working full time on a file system
that will make speed and ease of repair a primary design goal.
Zach Brown presented some blktrace output from e2fsck. The outcome of the trace
is that, while the disk can stream data at 26 MB/s, fsck is achieving only 12 MB/s.
This situation could be improved to some degree without on-disk layout changes
if the developers had a vectorized I/O call. Zach explained that in many cases
you know the block locations that you need, but with the current API you can
only read one at a time.
A vectorized read would take a number of buffers and a list of blocks to read
as arguments. Then the application could submit all of the reads at once.
Such a system call could save a significant amount of time since the I/O
scheduler can reorder requests to minimize seeks and merge requests that are
nearby. Also, reads to blocks that are located on different disks could be
parallelized. Although a vectorized read could speed up fsck, file system
layout changes will eventually be needed to make fsck faster.
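No such system call exists, so the following is only a user-space sketch of what the interface might look like; read_blocks(), struct block_req, and the 4096-byte block size are all hypothetical. It sorts the requests by block number and then falls back to one pread() per block, which is exactly the limitation Zach described. A real vectorized call would hand the whole list to the kernel at once so that the I/O scheduler could do the reordering and merging.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLOCK_SIZE 4096

struct block_req {
    off_t block;   /* block number to read */
    void *buf;     /* destination buffer, SKETCH_BLOCK_SIZE bytes */
};

static int cmp_block(const void *a, const void *b)
{
    const struct block_req *x = a, *y = b;

    return (x->block > y->block) - (x->block < y->block);
}

/*
 * Read n blocks from fd. The request array is sorted in place by block
 * number so the reads are issued in ascending disk order. Returns 0 on
 * success, or -1 on the first failed or short read.
 */
int read_blocks(int fd, struct block_req *reqs, size_t n)
{
    size_t i;

    qsort(reqs, n, sizeof(*reqs), cmp_block);
    for (i = 0; i < n; i++) {
        ssize_t got = pread(fd, reqs[i].buf, SKETCH_BLOCK_SIZE,
                            reqs[i].block * SKETCH_BLOCK_SIZE);

        if (got != SKETCH_BLOCK_SIZE)
            return -1;
    }
    return 0;
}
```

Each request keeps its own buffer pointer, so the caller can still find its data after the array has been reordered.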
libata: bringing the ATA community together
Jeff Garzik gave an update on the progress of libata, the in-kernel library to
support ATA hosts and devices. He first presented the ATAPI/SATA features that
libata now supports including: PATA+C/H/S, NCQ, FUA, SCSI SAT, and
CompactFlash. The growing support for parallel ATA (PATA) drives in libata
will eventually deprecate the IDE driver; Fedora developers are helping to
accelerate testing and adoption of the libata PATA code by disabling the IDE
driver in Fedora 7 test 1.
Native Command Queuing (NCQ) is a new command protocol introduced in the SATA
II extensions and is now supported under libata. With NCQ the host can have
multiple outstanding requests on the drive at once. The drive can reorder and
reschedule these requests to improve disk performance. A useful feature of NCQ
drives is the force unit access (FUA) bit; a write command with this bit set
will not report success until its data has been written to disk.
This has the potential of enabling the kernel to have both synchronous and
non-synchronous commands in flight. There was a recent discussion
about both NCQ FUA and SATA FUA in libata on the linux-ide mailing list.
Jeff briefly discussed libata's support for SCSI ATA translation (SAT) which
lets an ATA device appear to be a SCSI device to the system. The motivation
for this translation is the reuse of error handling and support for distribution
installers which already know how to handle SCSI devices.
There are also a number of items slated as future work for libata. Many
drivers need better suspend/resume support and the driver API is due for a sane
initialization model using an allocate/register/unallocate/free system and "Greg
blessed" kobjects. Currently libata is written under the SCSI layer and
debate continues on how to restructure libata to minimize or eliminate its SCSI
dependence. Error handling has been substantially improved by Tejun Heo and
his changes are now in mainline. If you have had issues with SATA or libata
error handling, try an updated kernel to see if those issues have been
resolved. Tejun and others continue to add features and tune the libata stack.
Communication Breakdown: I/O and File Systems
During the morning a number of conversations sprang up about communication
between I/O and file systems. A hot topic was getting information from the
block layer about non-retryable errors that affect an entire range of bytes and
passing that data up to user space. There are situations when retries are
happening on a large range of bytes even when the I/O layer knows that an
entire range of blocks is missing or bad.
A "pipe" abstraction was discussed to communicate data on byte ranges that are
currently in error, under performance strain (because of a RAID5 disk failure),
or temporarily unplugged. If a file system were aware of ranges that are
currently handling a recoverable error, have unrecoverable errors or are
temporarily slow, it may be able to handle the situation more gracefully.
File systems currently do not receive unplug events and handling unplug
situations can be tricky. For example, if a fibre channel disk is pulled for a
moment and plugged back in, it may be down for only 30 seconds, but how should
the file system handle the situation? Ext3 currently remounts the entire file
system as read only. XFS has a configurable timeout for fibre channel disks
that must be reached before it sends an EIO error. And what should be done
with USB drives that are unplugged? Should the file system save state and hope
the device gets plugged back in? How long should it wait and should it still
work if it is plugged into a different hub? All of these questions were raised
but there are no clear answers.
The Filesystems Track
The workshop split into different tracks; your author decided to follow the
one dedicated to filesystems.
Michael Halcrow, eCryptFS developer, presented an idea to use SELinux to
make file encryption/decryption dependent on application execution. For example, a
policy could be defined so that the data would be unencrypted when OpenOffice
is using the file but encrypted when the user copies the file to a USB key.
After presenting the mechanism and mark-up language for this idea Michael
opened the floor
to the audience. The general feeling was that SELinux is often disabled
by users and that per-mount-point encryption may be a more useful and easy to
understand user interface.
Why Linux Sucks for Stacking
Josef Sipek, Unionfs
maintainer, went over some of the issues involved with stacking file systems
under Linux. A stacking file system, like Unionfs, provides an alternative
view of a lower file system. For example, Unionfs takes a number of mounted
directories, which could be NFS/ext3/etc, as arguments at mount time and merges
their name space.
The big unsolved issue with stacking file systems is handling modifications to
the lower file systems in the stack. Several people suggested that leaving the
lower file system available to the user is just broken and that by default the
lower layers should only be mounted internally.
The new fs/stack.c file was discussed too. This file currently contains
simple inode copy routines that are used by Unionfs and eCryptfs, but in the
future more stackable file system routines should be pushed to this file.
Future work for Unionfs includes getting it working under lockdep and
additional experimentation with an on-disk format. The on-disk format for
Unionfs is currently under development; it will store white-out files
(representing files which have been deleted by a user but which still exist on
the lower-level filesystems) and
persistent Unionfs inode data.
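The interaction between merged name spaces and white-outs can be shown with a toy lookup routine. This is not Unionfs code; struct branch and union_lookup() are hypothetical, and each branch is reduced to two lists of names. Branches are searched from the top down, and a white-out in a higher branch hides any file of the same name below it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct branch {
    const char **files;      /* NULL-terminated list of names present here */
    const char **whiteouts;  /* NULL-terminated list of names deleted here */
};

static bool contains(const char **names, const char *name)
{
    for (; names != NULL && *names != NULL; names++)
        if (strcmp(*names, name) == 0)
            return true;
    return false;
}

/*
 * Look up name in a stack of branches, highest priority first.
 * Returns the index of the branch that provides the file, or -1 if the
 * name does not exist in the merged view.
 */
int union_lookup(const struct branch *branches, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (contains(branches[i].whiteouts, name))
            return -1;          /* deleted by the user at this level */
        if (contains(branches[i].files, name))
            return (int)i;
    }
    return -1;
}
```

A file removed from the merged view thus stays untouched in the lower branch; only the white-out in the upper branch records the deletion.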
B-trees for a Shadowed FS
Many file systems use B-trees to represent files and directories. These
structures keep data sorted, are balanced, and allow for insertion and deletion
in logarithmic time. However, there are difficulties in using them with
shadowing. Ohad Rodeh presented his approach to using B-trees and shadowing in
an object storage device, but the methods are general and useful for any
Shadowing may also be called copy-on-write (COW); the basic idea is that
when a write is made the block is read into memory, modified, and written to a
new location on disk. Then the tree is recursively updated starting at the
child and using COW until the root node is atomically updated. In this way the
data is never in an inconsistent state; if the system crashes before the root
node is updated then the write is lost but the previous contents remain intact.
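The shadowing idea can be sketched in a few lines of C. To keep it short, the example uses a plain binary search tree in memory rather than an on-disk B-tree, so it is a simplification and not Ohad's algorithm; cow_insert() and cow_lookup() are names invented here. Every node on the path from the root to the modified entry is copied, everything else is shared, and the old root remains a complete, consistent version of the tree.

```c
#include <stdlib.h>

struct node {
    int key;
    int value;
    const struct node *left;
    const struct node *right;
};

/*
 * Insert or update key without modifying the existing tree. Returns the
 * root of the new version, or NULL on allocation failure. Old versions
 * are never freed in this sketch.
 */
struct node *cow_insert(const struct node *root, int key, int value)
{
    struct node *copy = malloc(sizeof(*copy));
    struct node *child;

    if (copy == NULL)
        return NULL;
    if (root == NULL) {
        copy->key = key;
        copy->value = value;
        copy->left = NULL;
        copy->right = NULL;
        return copy;
    }

    *copy = *root;              /* shadow this node */
    if (key == root->key) {
        copy->value = value;
        return copy;
    }
    child = cow_insert(key < root->key ? root->left : root->right, key, value);
    if (child == NULL) {
        free(copy);
        return NULL;
    }
    if (key < root->key)
        copy->left = child;
    else
        copy->right = child;
    return copy;
}

/* Return a pointer to the value stored under key, or NULL if absent. */
const int *cow_lookup(const struct node *root, int key)
{
    while (root != NULL && root->key != key)
        root = key < root->key ? root->left : root->right;
    return root != NULL ? &root->value : NULL;
}
```

The caller's assignment of the returned pointer plays the role of the atomic root update: until that assignment happens, readers of the old root see the previous contents.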
Replicating the details of his presentation would be a wasted effort as his
paper, B-trees, Shadowing, and Clones [PDF], is well written and easy to read. Enjoy!
eXplode the code
Storage systems have a simple and important contract to keep: given user data
they must save that data to disk without loss or corruption even in the face of
system crashes. Can Sar gave an overview of eXplode [PDF], a systematic
approach to finding bugs in storage systems.
eXplode systematically explores all possible choices that can be made at each
choice point in the code to make low-probability events, or corner cases, just
as probable as the main running path. And it does this exploration on a real
running system with minimal modifications.
This system has the advantage of being conceptually simple and very effective.
Bugs were found in every major Linux file system, including a fsync bug that
can cause data corruption on ext2. This bug can be produced by doing the
following: create a new file, B, which recycles an indirect block from a
recently truncated file, A, then call fsync on file B and crash the system
before file A's truncate gets to disk. There is now inconsistent data on disk
and when e2fsck tries to fix the inconsistency it corrupts file B's data. A
discussion of the bug has been started on the linux-fsdevel mailing list.
The second day of the file systems track started with a discussion of an NFS
race. The race appears when a client opens up a file between two writes
that occur during the same second. The client that just opened the file will be
unaware of the second write and will keep an out-of-date version of the file in
cache. To fix the problem, a "change" attribute was suggested. This number would
be consistent across reboots, unit-less and would increment on every write.
In general everyone agreed that a change attribute is the right solution;
however, Val Henson pointed out that implementing this on legacy file systems
will be expensive and will require on-disk format changes.
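The race, and the way a change attribute closes it, can be modeled in a few lines. The structure and function names below are hypothetical and do not correspond to any NFS client code; the sketch only contrasts validating a cache by a one-second timestamp with validating it by a counter that increments on every write.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

struct file_attrs {
    time_t mtime;      /* modification time, one-second granularity */
    uint64_t change;   /* unit-less counter, incremented on every write */
};

/* Model a write arriving at the server at time now. */
void server_write(struct file_attrs *f, time_t now)
{
    f->mtime = now;
    f->change++;
}

/* Timestamp check: cannot see a second write within the same second. */
bool cache_valid_by_mtime(const struct file_attrs *cached,
                          const struct file_attrs *current)
{
    return cached->mtime == current->mtime;
}

/* Change-attribute check: any write makes the cached copy stale. */
bool cache_valid_by_change(const struct file_attrs *cached,
                           const struct file_attrs *current)
{
    return cached->change == current->change;
}
```

If a client snapshots the attributes between two writes that carry the same timestamp, the first check wrongly reports the cache as valid while the second does not.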
Discussion then turned to NFSv4 access control lists (ACLs). Trond Myklebust
said they are becoming a standard and Linux should support them. Andreas
Gruenbacher is working on patches to add NFSv4 ACL support to Linux but currently
only ext3 is supported; more information can be found on the Native NFSv4 ACLs on Linux page.
A possibly difficult issue will be mapping current POSIX ACLs to NFSv4 ACLs,
but a draft document, Mapping
Between NFSv4 and Posix Draft ACLs, lays out a mapping scheme.
Steven Whitehouse gave an overview of the recent changes in the Global File
System 2 (GFS2), a cluster file system where a number of peers share
access to the storage device.
The important changes include a new journal layout that can
support mmap(), splice() and other system calls on
journaled files, page cache
level locking, readpages() and partial writepages()
support, and ext3 standard
ioctls lsattr and chattr.
readdir() was discussed at some length, particularly the ways in which it is
broken. A directory insert on GFS2 may cause a reorder of the extensible hash
structure GFS2 uses for directories. In order to support readdir() every hash
chain must be sorted. The audience generally agreed that readdir() is difficult
to implement and Ted Ts'o suggested that someone should try to go through
committee to get telldir/seekdir/readdir fixed or eliminated.
A brief OCFS2 status report was given by Mark Fasheh. Like GFS2, OCFS2 is a
cluster file system, designed to share a file system across nodes in a cluster.
The current development focus is on adding features, as the basic file system
features are working well.
After the status update the audience asked a few questions. The most requested
OCFS2 feature is forced unmount and several people suggested that this should
be a future virtual file system (vfs) feature. Mark also said that users
really enjoy the easy setup of OCFS2 and the ability to use it as a local file
system. A performance hot button for OCFS2 are the large inodes and occupy an
In the future Mark would like to mix extent and extended attribute data
in-inode to utilize all of the available space. However, as the audience
pointed out, this optimization can lead to some complex code. In the future
Mark would also like to move to GFS's distributed lock manager.
DualFS: A New Journaling File System for Linux
DualFS is a file system by Juan Piernas that separates data and metadata into
separate file systems. The on-disk format for the data disk is similar to ext2
without metadata blocks. The metadata file system is a log file system, a
design that allows for very fast writes since they are always made at the head
of the log which reduces expensive seeks. A few performance numbers were
presented; under a number of micro- and macro-benchmarks DualFS performs
better than other Linux journaling file systems. In its current form, DualFS
uses separate partitions for data and metadata, forcing the user to answer
a difficult question: how much metadata do I expect to have?
More information, including performance comparisons, can be found on the DualFS LKML announcement and the project homepage. The currently
available code is a patch on top of 2.4.19 and can be found on SourceForge.
pNFS Object Storage Driver
Benny Halevy gave an overview of pNFS (parallel NFS), which is part of the IETF
NFSv4.1 draft and
tries to solve the single server performance bottleneck of NFS storage systems.
pNFS is a mechanism for an NFS client to talk directly to a disk device without
sending requests through the NFS server, fanning the storage system out to the
number of SAN devices. There are many proprietary systems that do a similar
thing including EMC's High Road, IBM's TotalStorage SAN, SGI's CXFS and Sun's
QFS. Having an open protocol would be a good thing.
However, Jeff Garzik was skeptical of including pNFS in the NFSv4.1 draft,
particularly because to support pNFS the kernel will need to provide
implementations of all three access protocols: file storage, object storage and
block storage. This will add significant complexity to the Linux NFSv4 implementation.
Benny explained that the pNFS implementation in Linux is modular to support
multiple layout-type specific drivers which are optional. Each layout driver
dynamically registers itself using its layout type and the NFS client calls it
across a well-defined API. Support for specific layout types is optional. In
the absence of a layout driver for some specific layout type the NFS client
falls back to doing I/O through the server.
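The registration-and-fallback scheme Benny described might look roughly like the sketch below. None of these names come from the actual pNFS code; the layout type values, register_layout_driver(), select_io_path(), and the two stand-in I/O functions are all invented for illustration.

```c
#include <stddef.h>

/* Placeholder layout type numbers, chosen for this example only. */
enum { LAYOUT_FILE = 1, LAYOUT_OBJECT, LAYOUT_BLOCK, LAYOUT_MAX };

typedef int (*layout_io_fn)(const void *buf, size_t len);

static layout_io_fn layout_drivers[LAYOUT_MAX];

/* Stand-in for ordinary I/O sent through the NFS server. */
int io_through_server(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0;
}

/* Stand-in for an optional object-layout driver's I/O routine. */
int object_layout_io(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 1;
}

/* Register a driver for one layout type; returns 0 on success, -1 if rejected. */
int register_layout_driver(int type, layout_io_fn io)
{
    if (type <= 0 || type >= LAYOUT_MAX || io == NULL)
        return -1;
    if (layout_drivers[type] != NULL)
        return -1;              /* a driver is already registered */
    layout_drivers[type] = io;
    return 0;
}

/* Pick the I/O path for a layout type, falling back to the server. */
layout_io_fn select_io_path(int type)
{
    if (type > 0 && type < LAYOUT_MAX && layout_drivers[type] != NULL)
        return layout_drivers[type];
    return io_through_server;
}
```

A client built without the object driver simply never registers it, and select_io_path(LAYOUT_OBJECT) returns the server path.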
After this overview Benny turned to the topic of OSDs, or object based storage
devices. These devices provide a more abstract view of the disk than the
classic "array of blocks" abstraction seen in today's disks. Instead of blocks,
objects are the basic unit of an OSD, and each object contains both meta-data
and data. The disk manages the allocation of the bytes on disk and presents
the object data as a contiguous array to the system. Having this abstraction
in hardware would make file system implementation much simpler. To support
OSDs in Linux, Benny and others are working to get bi-directional SCSI command
support into the kernel, along with support for variable length command descriptor
blocks.
Hybrid disks with an NVCache (flash memory) will be in consumers' hands soon.
Timothy Bisson gave an overview of this new technology. The NVCache will
have 128-256MB of non-volatile flash memory that the disk can manage as a cache
(unpinned) or the operating system can manage by pinning specified blocks to
the non-volatile memory. This technology can reduce power consumption or
increase disk performance.
To reduce power consumption the block layer can enable the NVCache Power Mode,
which tells the disk to redirect writes to the NVCache, reducing disk spin-up
operations. In this mode the 10 minute writeback threshold of Linux laptop
mode can be removed. Another strategy is to pin all file system metadata in the
NVCache, but spin-ups will still occur on non-metadata reads. An open question
is how this pinning should be managed when two or more file systems are using
the same disk.
Performance can be increased by using the NVCache as a cache for writes
requiring a long seek. In this mode the block layer would pin the target
blocks ensuring a write to the cache instead of incurring the expensive seek.
Also, a file system can use the NVCache to store its journal and boot files for
additional performance and reduced system start-up time.
If Linux developers decide to manage the NVCache there are many open questions.
Which layer should manage the NVCache? The file system or block layer? And what
type of API should be created to leverage the cache? Another big question is
how much punishment can these caches take? According to Timothy it takes about
a year (using a desktop workload) to fry the cache if you are using it as a
Scaling Linux to Petabytes
Sage Weil presented Ceph, a network file system that is designed to scale to
petabytes of storage. Ceph is based on a network of object based storage
devices and complete copies of each object are distributed across multiple nodes
using an algorithm called CRUSH. This distribution makes it possible for nodes
to be added and removed from the system dynamically. More information on the
design and implementation can be found on the Ceph homepage.
The workshop concluded with the general consensus that bringing together SATA,
SCSI and file system people was a good idea and that the status updates and
conversations were useful. However, the workshop was a bit too large for code
discussion and more targeted workshops will need to be held to work out the
details of some of the issues discussed at LSF'07. Topics for future workshops
include virtual memory and file system issues and extensions that are needed to
Page editor: Jonathan Corbet