Longtime GStreamer hacker Wim Taymans opened the first-ever GStreamer
conference with a look at where the multimedia framework came from, where
it stands, and where it will be going in the future. The framework is a
bit over 11 years old and Taymans has been working on it for ten of those
years, as conference organizer Christian Schaller noted in his
introduction. From a simple project that was started by Erik Walthinsen
on an airplane flight, GStreamer has grown into a very capable framework
that is heading toward its 1.0 release—promised by Taymans by the end
of 2011.
Starting off with the "one slide about what GStreamer is",
Taymans described the framework as a library for making multimedia
applications. The core of the framework, which provides the plugin system for
inputs, codecs, network devices, and so on, is the interesting part to
him. The actual implementations of the plugins are contained in separate
modules, with a core-provided "pipeline that allows you to connect" those
elements together.
When GStreamer was started, the state of Linux multimedia was "very
poor". XAnim was the utility for playing multimedia formats on
Linux, but it was fairly painful to use. Besides GStreamer, various other
multimedia projects (e.g. VLC, Ogle, MPlayer, FFmpeg, etc.) started in the
1999/2000 timeframe, which was something of an indication of where things
were. The competitors were well ahead: QuickTime had appeared in 1991
and DirectShow in 1996. Linux was "way behind", Taymans said.
GStreamer's architecture came out of an Oregon Graduate Institute research
project with some ideas from DirectShow (but not the bad parts) when the
project was started in 1999. Originally, GStreamer was not necessarily targeted at
multimedia, he said.
The use cases for GStreamer are quite varied, with music players topping
the list. Those were "one of the first things that actually
worked" using GStreamer. Now there are also video players (which
are moving into web browsers), streaming servers, audio and video editors,
and transcoding applications. One of the more recent uses for GStreamer,
which was "unpredicted from my point of view", is for
voice-over-IP (VoIP) and both the Empathy messaging application and
Tandberg video conferencing application are using it.
After the plane flight, Walthinsen released version 0.0.1 in June 1999. By
July 2002, 0.4.0 was released with GNOME support, though it was "very
rough". In February 2003, 0.6.0 was released as the first version
where audio worked well. After a major redesign to support
multi-threading, 0.10.0 was released in December 2005. That is still the
most recent major version, though there have been 30 minor releases, and
0.10.31 is coming soon. 0.10.x has been working very well, he said, which
raises the question about when there will be a 1.0.
To try to get a sense for the size of the community and how it is growing,
Taymans collected some statistics. There are more than 30 core developers
in the project along with more than 200 contributors for a codebase that is
roughly 205K lines of code. He also showed various graphs of the
commits per month for the project and pointed out a spike around the time of
the redesign for 0.10. There was also a trough at the point of the Git
conversion. As expected, the trend of the number of commits per month
rises over the life of the project.
In order to confirm a suspicion that he had, Taymans made the same graph
for just the core, without the plugins, and found that commits per month
has trailed off over the last year or so. The project has not been doing
much in the way of new things in the core recently and this is reflected in
the commit rate. He quoted Andy Wingo as an explanation for that:
"We are in 'a
state of decadence'".
When looking at a graph of the number of lines of code, you can see
different growth rates between the core and plugins as well. The core
trend line is a flat, linear growth rate. In contrast, the trend line for the
plugins shows exponential growth. This reflects the growing number of
plugins, many of which are also adding new features, while the core just gets
incremental improvements and features.
The current state
Taymans then spent some time describing the features of GStreamer. It is
fully multi-threaded now; that code is stable and works well. The advanced
trick mode playback is also a high point, and it allows easy seeking within
audio and video streams. The video editing support is coming along, while
the RTP and streaming support are "top notch". The plugins are
extensive and well-tested because they are out there and being used by lots
of people. GStreamer is used by GNOME's Totem video player, which puts
it in more hands. "Being in GNOME helps", he said.
The framework has advanced auto-plugging features that allow for dynamic
pipeline changes to support a wide variety of application types. It is
also very "binding friendly" as it has bindings for most
languages that a developer might want to use. Developers will also find that
it is "very debuggable".
There are many good points with the 0.10 codebase, and he is very happy
with it, which is one of the reasons it has taken so long to get to a 1.0
release. The design of 0.10 was quite extensible, and allowed many more
features to be added to it. Structures were padded so that additional
elements could be added for new features, without breaking the API or ABI.
For example, the state changes and clocks handling code was rewritten
during the 0.10 lifetime. The developers were also able to add new features like
navigation, quality of service, stepping, and buffering in 0.10.
Another thing that GStreamer did well was to add higher-level objects.
GStreamer itself is fairly low-level, but for someone who just wants to
play a file, there are a set of higher-level constructs to make that easy—like playbin2, for playing video and audio content, and tagreadbin
to extract media metadata. The base classes that were implemented for
0.10, including those that have been added over the last five years, are
also a highlight of the framework. Those classes handle things like sinks,
transforms, decoders, encoders, and so on.
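For readers who have not seen those high-level objects, here is a minimal
sketch of the kind of playback playbin2 enables. It assumes GStreamer 0.10
with the playbin2 element installed; the media URI is a stand-in, and a real
application would also watch the pipeline's bus for errors and end-of-stream:

    #include <gst/gst.h>

    int main(int argc, char *argv[])
    {
        GstElement *play;

        gst_init(&argc, &argv);

        /* playbin2 assembles the entire decode-and-play pipeline itself. */
        play = gst_element_factory_make("playbin2", "player");
        g_object_set(play, "uri", "file:///tmp/example.ogg", NULL); /* stand-in */
        gst_element_set_state(play, GST_STATE_PLAYING);

        /* This sketch just spins a main loop until interrupted. */
        g_main_loop_run(g_main_loop_new(NULL, FALSE));
        return 0;
    }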
There are also a number of bad points in the current GStreamer. The
current negotiation of formats, codecs, and various other variable
properties is too slow. The initial idea was to have an easy and
comprehensible way to ask an object what it can do. That query will return
the capabilities of the object, as well as the capabilities of everything
that it is connected to, so the framework spends a lot of time generating a
huge list of capabilities. Those capabilities are expressed in too verbose
a format, in Taymans's opinion. Reducing the verbosity and rethinking the
negotiation API would result in major performance gains.
The "biggest mistake of all" in GStreamer is that there is no
extensible buffer metadata. Buffers are passed between the GStreamer
elements, and there is no way to attach new information, like pointers to
multiple video planes or information to handle non-standard strides, to
those buffers. There also need to be generic ways to map the buffer data
to support GPUs and DSPs, especially in embedded hardware. It is very
difficult to handle that with GStreamer currently, and solving it is
important for embedded use cases.
While dynamic pipeline modifications work in the current code, "the
moment you try it, you will suffer the curse of new segments",
Taymans said. Those can cause the application to lose its timing and
synchronization, and because it is not easy to influence the timing of a
stream, it is difficult for an application to recover. The original idea was that
applications would create objects that encapsulated dynamic modifications,
but that turned out not to be the case. There are also a handful of minor
problems with 0.10, including an accumulation of deprecated APIs, running
out of padding
in some structures, and it becoming harder to add new features without
breaking the API/ABI.
A look to the future
To address those problems, Taymans laid out the plans for GStreamer for the
next year or so. In the short term, there will be a focus on speeding up
the core, while still continuing to improve the plugins. There are more
applications trying to do realtime manipulation of GStreamer pipelines, so
it is important to make the core faster to support them. Reducing overhead
by removing locks in shared data structures will be one of the ways used to
achieve that.
In the medium term, over the next 2 or 3 months, Taymans will be collecting
requirements for the next major version. The project will be looking at
how to fix the problems that have been identified, so if anyone "has
other problems that need fixing, please tell me". There will also
be some experimentation in Git branches for features like adding extensible
buffer metadata.
Starting in January, there will be a 0.11 branch and code will be merged
there. Porting plugins to 0.11 will then start, with an eye toward having
them all done by the end of 2011. Once the plugins have started being
ported, applications will be as well. Then there will be a 1.0 release near
the end of 2011, not 2010 as was announced in the past. "This time
we'll do it,
promise". Taymans concluded his talk with a joking
promise that "world domination" would then be the result of a GStreamer 1.0
release.
The term "high dynamic range photography" (HDR) encompasses a variety of
techniques for working with specialized image formats that are capable of
handling extremes of brightness and shadow beyond what can be stored in
more pedestrian file formats like TIFF and JPEG, and beyond what can be
displayed on CRT and LCD monitors. The leading HDR application for desktop
Linux users is Luminance
HDR, though that dominance is mostly by default: Linux-based HDR
applications are quite scarce, and have, if anything, become more so since
the Grumpy Editor's HDR with Linux
article was published in 2007. Luminance recently released an update,
which makes progress
on the usability front, but still leaves considerable room for growth in
that department.
Version 2.0.1 was unveiled on October 9, the
first update to the 2.0-series released by the project's new maintainer
Davide Anastasia. Anastasia inherited maintenance duties in September,
making him the fourth project leader in two years. At that time he outlined a short list of
goals on the project blog, beginning with fixing long-standing crashes,
then working to undo feature regressions introduced in the 2.0 release, and
finally improving on what many users and software reviewers have
(accurately) described as a confusing user interface. 2.0.1 introduces a
few cleanups, but primarily consists of bug-fixes.
Linux users can download source code
packages from the project's SourceForge.net site (the URL comes from the
application's original name, Qtpfsgui — arguably the most
intimidating project moniker open source software has ever produced).
There are Mac OS X and Windows binaries provided as well. The code is
simple to compile; it uses the Qmake build tool and depends on Qt4, the
image processing libraries Exiv2, libTIFF, and OpenEXR, and the FFTW3 and
GNU Scientific Library math libraries. The only hiccup that I encountered
in the build process was that Qt4-specific versions of Qmake, the Qt user
interface compiler UIC, and Qt meta-object compiler MOC are required; those
who build Qt4 applications regularly should have no trouble whatsoever.
HDR workflow: image creation
Presumably, anyone with hardware capable of natively capturing and
displaying HDR content also has special-purpose editing software provided
by Skywalker Ranch, Weta Digital, or some other professional studio.
Luminance is designed for the rest of us, with standard-issue digital
cameras and displays. Thus, its workflow consists of two major tasks:
importing a set of low dynamic range (LDR) images to blend into a single
HDR image, and taking an HDR image and mapping it into an LDR format for
distribution over the web or in print.
Most readers have seen HDR-based photos on Flickr or other online sites.
The canonical example scenarios are city streets photographed late at night
(where the buildings, the street lights, and the lit windows are all
visible in one shot) and scenes in broad daylight, where both sun-lit and
shaded subjects are properly exposed. In all of these situations, the key
is taking multiple exposures at different settings: some exposed for the
shadow areas, some exposed for the highlights. In software, we can blend
them together, leaving neither washed-out bright spots nor murky,
underexposed shadows.
Creating a new HDR image in Luminance consists of loading in the set of
LDR originals, taken at bracketed exposure settings, lining them up, and
blending the stack down into a single image. The image importer allegedly
supports a wide variety of formats, including any camera raw file type
supported by DCraw,
JPEG, and TIFF. Once imported, Luminance reads the exposure setting from
each file's EXIF tags, or allows you to input it manually if such a tag
cannot be found. There are two automatic image-alignment algorithms
available — an internal scheme labeled "Median threshold bitmap," and
the align_image_stack function from the open source panorama tool Hugin. Alternately, you can
choose to manually align the images using built-in editing tools. Finally,
you must choose an HDR image "profile," consisting of a weighting function,
response curve, and HDR creation algorithm. The settings you choose are
applied to your stack of input images, and the result pops up in a preview
window, where you can inspect it in all of its HDR glory.
In my tests, however, there were more than a few pitfalls to this
process. First, selecting and loading the LDR images is more difficult
than it needs to be, because you must select all of the images you
want to use in the file selector, at the same time (i.e., there is no "add
another" button). This means they must all be in the same directory, and on
a practical level it means you must look them up in an image previewer
first, because there is no thumbnail preview, and after a while the
contents of IMG_4342.CR2 and IMG_4243.CR2 become harder to memorize.
Luminance also cannot read Exif tags from TIFF files, and I was unable to
successfully load JPEG conversions of my raw images, with Luminance
complaining that they were an "invalid size," regardless of what size they
were saved at.
The alignment step is also problematic; the Median threshold algorithm
crashed every time I tried it, and align_image_stack tended to hang
indefinitely. Eventually I decided to align my test images in Hugin
directly, but this is also a very trying process. The wiki documentation
is more than two releases out-of-date, and I could not decipher which
combination of checkboxes needed to be set for Hugin to align the images
geometrically without attempting to blend their exposure settings. That
experiment ended up being a useless tangent anyway, however, because
Luminance could not read any of the TIFF files Hugin produced. At the
Hugin wiki's suggestion, I also attempted to use the Perl-based hdrprep
for alignment, but it too failed to read the Exif data from the TIFFs.
The manual alignment tools offer some fine-grained control, including
multiple ways to overlay two images on the canvas in order to eyeball their
overlap and a masking function called Anti Ghosting. Sadly these tools
also fall a bit short, primarily because there is no way to correct
rotation problems, only vertical and horizontal pixel shifts. Even when I
took test photos with a tripod, a small amount of rotation misalignment was
part of the natural wind-and-camera-shake effects.
It is also difficult to make an informed choice about the HDR profiles,
which are named "Profile 1" through "Profile 6." The weighting function,
response curve, and HDR creation algorithm options are similarly opaque,
and because installing 2.0.1 from source evidently does not include the
user manual, looking up the HDR terminology online is the confused
user's only recourse.
HDR workflow: image output
Luminance is capable of saving directly to several HDR image formats,
including TIFF and OpenEXR.
These formats use floating point numbers rather than integers for pixel
data, allowing them to encode a much wider range of total values —
potentially 38 f-stops, depending on the options.
This is orders of magnitude greater than a modern PC screen can display,
so Luminance provides an HDR "visualizer" that allows you to explore an HDR
image by adjusting the gamma and exposure with sliders. It might be
confusing to new users, because it appears at first glance as though
the process of importing and blending the source images has produced
nothing more than another LDR image; in fact, the visualizer only shows
a portion of the image's dynamic range at a time, due to the physical
limitations of the display.
If your goal is to save the image to OpenEXR or another format, you can
do so as soon as the import process is complete. Most of the time,
however, you will be interested in the second of Luminance's major tasks,
compressing the HDR image back into a common LDR format — in such a
way that it preserves as much detail as possible. You do this with the
"Tonemap HDR image" menu entry, which brings up a workspace where you can
select and test nine different tone-mapping algorithms, creating
thumbnail-sized preview images before committing to a final choice.
Here again the user interface confronts the user with a formidable list
of techno-speak options and little in the way of explanation. At some
level, that is expected; the algorithms have scientific (rather than
marketing-approved) names such as Mantiuk '06 and Reinhard '02 because they
are named after their
creators. But without reading the original papers, it is unreasonable
to expect a user to decipher all of the individual settings. The Ashikhmin
algorithm, for example, sports a checkbox labeled "Simple" and a radio
button allowing you to choose between "Equation Number 2" and "Equation
Number 4." Anyone who can guess what that means without looking is my
Still, at least Luminance gets it right by allowing you to experiment
with multiple test images and to compare them side-by-side. Other parts of
the GUI (such as the image loader) have a frustrating lack of backup or
undo operations. The final output, after all, is the end that justifies
all of the means — so if a user can experiment with different
algorithms and eventually stumble across a pleasant result, he or she will
be happy even if the underlying formulas remain a mystery.
The upper-bound on usability
The tone-mapping algorithm "issue" raises an important question for
Luminance and other niche graphics applications, namely: is it
always possible to build a user interface with novice-level
simplicity, or are some tasks inherently complicated? Do users really
need all nine tone-mapping algorithms? Perhaps Luminance could be
refactored to hide all of the mathematical details from the user, or dress
them up in friendlier terms — but maybe that process would destroy
too much of the application itself, turning it into a toy. The same
question could probably be asked about Hugin or any of several complex GIMP
plug-ins.
I tend to think that photographers (like everyone else) have a greater
capacity for understanding the scary mathematical and theoretical tasks
than they give themselves credit for. Most have gotten used to the arcane
demosaicing and noise-removal algorithms found in raw image editors, after
all. While Luminance 2.0.1 was frustrating to work with for many reasons,
the bulk of the frustration came not from exposing too much scientific
technobabble, but from the same sort of usability and interface problems
that plague any understaffed project: the lack of thumbnail previews,
vacant tooltips, missing "undo" buttons, unsupported file formats, and
sudden crashes. My guess is that, absent those stumbling blocks, almost
any user could get used to the peculiarities of HDR image creation and
editing.
That having been said, Anastasia has his work cut out for him.
Luminance has had many cooks in recent years, a fact that has undoubtedly
contributed to its perplexing user interface and crash-proneness. Cleaning
it up is high on Anastasia's to-do list as project maintainer; those of us
who want to see a high-quality open source HDR tool can only hope he
manages to build some momentum. Version 2.0.1, although it was only a
bug-fix release, is a tantalizing first step because it came mere weeks
after Anastasia took over the reins — the gap between the last 1.9.x
release and 2.0.0 lasted well over a year. Today, Luminance has an active
maintainer, a new release, and a TODO file included with the source code
package. It isn't perfect, but it could be the beginning of something good.
The exploration of design patterns is, importantly, a historical
search. It is possible to tell in the present that a particular
approach to design or coding works adequately in a particular
situation, but to identify patterns which repeatedly work, or
repeatedly fail to work, a longer term or historical perspective is
needed. We benefit primarily from hindsight.
An earlier series of articles on design patterns took advantage of the
development history of the Linux kernel only implicitly, looking at
the patterns that could be found in the kernel at the time with little
reference to how they got there. Perspective was provided by looking
at the results of multiple long-term development efforts, all included
in the one code base.
For this series we try to look for patterns which become visible only over
an extended time period. As development of a system proceeds, early
decisions can have consequences that were not fully appreciated when
they were made. If we can find patterns relating these decisions to
their outcomes, it might be hoped that a review of these patterns
while making new decisions will help to avoid old mistakes or to
leverage established successes.
A very appropriate starting point for this exploration is the Ritchie
and Thompson paper, published in Communications of the ACM, which introduced
Unix.
In that paper the authors claimed that the success of Unix was not in
"new inventions but rather in the full exploitation of a carefully
selected set of fertile ideas."
The importance of "careful selection" implies a historical
perspective much like the one here proposed for exploring design
patterns. A selection can only be made if previous experience is
available which demonstrates a number of design avenues to choose
between. It is to be hoped that identifying patterns would be one
aspect of the care taken in that selection.
Over four weeks we will explore four design patterns which can be traced
back to that early Unix of which Ritchie and Thompson wrote, but which
can be seen much more clearly from the current perspective.
Unfortunately they are not all good, but both good and bad can provide
valuable lessons for guiding subsequent design.
"Full exploitation" is essentially a pattern in itself, and one we
will come back to repeatedly. Whether it is applied to software
development, architecture, or music composition, exploiting a good
idea repeatedly can enhance the integrity and cohesion of the result
and is - hopefully - a pattern that does not need further
That said, "full exploitation" can benefit from detailed
illumination. We will gain such illumination for this, as for the
other three patterns, by examining two specific examples.
Ritchie and Thompson identified in their abstract several features of
Unix which they felt were noteworthy. The first two of these will be our
two examples. Using their words:
- A hierarchical file system incorporating demountable volumes,
- Compatible file, device, and inter-process I/O,
The second of these is sometimes seen as a key hallmark of Unix and
has been rephrased as "Everything is a file". However that term does
the idea an injustice as it overstates the reality. Clearly
everything is not a file. Some things are devices and some things are
pipes and while they may share some characteristics with files, they
certainly are not files.
A more accurate, though less catchy, characterization would be
"everything can have a file descriptor". It is the file descriptor as
a unifying concept that is key to this design. It is the file
descriptor that makes files, devices, and inter-process I/O compatible.
Though files, devices and pipes are clearly different objects with
different behaviors, they nonetheless have some behaviors in common
and by using the same abstract handle to refer to them, those
similarities can be exploited. A program or library routine that does not
care about the differences does not need to know about those differences
at all, and a program that does care about the differences only needs
to know at the specific places where those differences are relevant.
By taking the idea of a file descriptor and exploiting it also for
serial devices, tape devices, disk devices, pipes, and so forth, Unix
gained an integrity that has proved to be of lasting value. In modern
Linux we also have file descriptors for network sockets, for receiving
timer events and other events, and for accessing a whole range of new
types of devices that were barely even thought of when Unix was first
developed. This ability to keep up with ongoing development
demonstrates the strength of the file-descriptor concept and is
central to the value of the "full exploitation" pattern.
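As an illustration of that uniformity (a sketch, not drawn from the original
paper), the routine below copies data without knowing or caring whether its
descriptors refer to files, pipes, devices, or sockets:

    #include <unistd.h>

    /* Copy everything from one descriptor to another. Works identically
     * whether the descriptors refer to files, pipes, devices, or sockets. */
    ssize_t copy_stream(int in_fd, int out_fd)
    {
        char buf[4096];
        ssize_t n;

        while ((n = read(in_fd, buf, sizeof(buf))) > 0)
            if (write(out_fd, buf, n) != n)
                return -1;
        return n;   /* 0 on end-of-stream, -1 on read error */
    }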
As we shall see, the file descriptor concept was not exploited as
fully as possibly it could have been, either initially or during ongoing
development. Some of the weaknesses that we will find are in
places where there was missed opportunity for full exploitation of
file descriptors or related ideas, and many of the strengths are in
places where file descriptors were used to enable new functionality.
Single, Hierarchical namespace
The other noteworthy feature identified by Ritchie and Thompson (first
in their list) was a hierarchical filesystem incorporating demountable
volumes.
There are three key aspects to this file system which are particularly
significant for the present illustration.
- It was hierarchical. We are so used to hierarchical namespaces
today that this seems like it should be a given. However at the time
it was somewhat innovative. Some contemporaneous filesystems, such as
the one used in CP/M, were completely flat with no sub-directories.
Others might have a fixed number of levels in the hierarchy, typically
two. The Unix filesystem allowed an arbitrarily deep hierarchy.
- It allowed demountable volumes. While each distinct storage
volume could store a separate hierarchical set of files, this
separation was hidden by combining all of these file sets into a
single all-encompassing hierarchy. Thus the idea of hierarchical
naming was exploited not just for a single device, but across the
union of all storage devices.
- It contained device-special files. These are filesystem objects
that provide access to devices, both character devices like modems
and block devices like disk drives. Thus the hierarchical naming
scheme covered not only files and directories, but also all devices.
The design idea being fully exploited here is the hierarchical namespace.
The result of exploiting it within a single storage device,
across all storage devices, and providing access to devices as well as
storage, is a "single namespace". This provides a uniform naming
scheme giving access to a wide variety of the objects managed by the kernel.
The most obvious area where this exploitation continued in subsequent
development is the area of virtual filesystems, such as procfs and
sysfs in Linux. These allowed processes and many other entities which
were not strictly devices or files to appear in the same common
namespace.
Another effective exploitation is in the various autofs or auto-mount
implementations which allow other objects, which are not necessarily
storage, to appear in the namespace. Two examples are
/net/hostname which includes hosts on the local
network into the namespace, and /home/username which
allows user names to appear. While these don't make hosts and users
first-class namespace objects they are still valuable steps forward.
In particular the latter removes the need for the tilde prefix
supported by most shells and some editors (i.e. the mapping from
~username to that user's home directory). By incorporating
this feature directly in the namespace, the functionality becomes available to
all programs.
As with file descriptors, the hierarchical namespace concept was not
exploited as fully as might have been possible so we don't really have
a single namespace. Some aspects of this incompleteness are simple
omissions which have since been rectified as mentioned above. However
there is one area where a hierarchical namespace was kept separate,
with unfortunate consequences that still aren't fully resolved today.
That namespace is the namespace of devices. The
device-special files used to include devices into the single
namespace, while effective to some degree, are a poor second cousin to
doing it properly.
A little reflection will show that the device namespace in Unix is a
hierarchical space with three or more levels. The top level
distinguishes between 'block' and 'character' devices. The second
level, encoded in the major device number, usually identifies the driver which
manages the device. Beneath this are one or two levels encoded in bit
fields of the minor number. A disk drive controller might use some
bits to identify the drive and others to identify the partition on
that drive. A serial device driver might identify a particular
controller, and then which of several ports on that controller
corresponds to a particular device.
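A small user-space sketch (the device path is just an example) shows this
encoding peeking through a device special file:

    #include <sys/stat.h>
    #include <sys/sysmacros.h>
    #include <stdio.h>

    int main(void)
    {
        struct stat st;

        /* /dev/sda1 is an example; any device special file works. */
        if (stat("/dev/sda1", &st) == 0 && S_ISBLK(st.st_mode))
            printf("block device: major %u (driver), minor %u (disk and partition)\n",
                   major(st.st_rdev), minor(st.st_rdev));
        return 0;
    }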
The device special files in Unix provide only limited access to this
namespace. It can be helpful to see them as symbolic links into this
alternate namespace which add some extra permission checking.
However while symlinks can point to any point in the hierarchy,
device special files can only point to the actual devices,
so they don't provide access to the structure of the namespace.
It is not possible to examine the different levels in the
namespace, nor to get a 'directory listing' of all entries from some
particular node in the hierarchy.
Linux developers have made several attempts to redress this omission
with initiatives such as devfs, devpts, udev, sysfs, and more recently
devtmpfs. Given the variety of attempts, this is clearly a hard
problem. Part of the difficulty is maintaining backward compatibility
with the original Unix way of using device special files which gave,
for example, stable permission setting on devices. There are
doubtless other difficulties as well.
Not only was the device hierarchy not fully accessible, it was not
fully extensible. The old limit of 255 major numbers and 255 minor
numbers has long since been extended with minimal pain. However the
top level of "block or char" distinction is more deeply entrenched and harder to
change. When network devices came along they didn't really
fit either as "block" or "character" so, instead of being squeezed
into a model where they didn't fit, network devices got their very
own separate namespace which has its own separate functions for
enumerating all devices, opening devices, renaming devices etc.
So while hierarchical namespaces were certainly well exploited in the
early design, they fell short of being fully exploited, and this led to
later extensions not being able to continue the exploitation fully.
These two examples - file descriptors and a uniform hierarchical
namespace - illustrate the pattern of "full exploitation" which can
be a very effective tool for building a strong design. While we can
see with hindsight that neither was carried out perfectly, they both
added considerable value to Unix and its successors, adequately
demonstrating the value of the pattern. Whenever one is looking to
add functionality it is important to ask "how can this build on what
already exists rather than creating everything from scratch?" and
equally "How can we make sure this is open to be built upon in the
The next article in this series will explore two more examples, examine their historical
development, and extract a different pattern -- one that brings
weakness rather than strength. It is a pattern that can be recognized
early, but still is an easy trap for the unwary.
The interested reader might like to try the following exercises to
further explore some of the ideas presented in this article. There
are no definitive answers, but rather the questions are starting
points that might lead to interesting discoveries.
- Make a list of all kernel-managed objects that can be referenced
using a file descriptor, and the actions that can be effected through
that file descriptor. Make another list of actions or objects which do
not use a file descriptor. Explain how one such action or object
could benefit by being included in a fuller exploitation of file
descriptors.
- Identify three distinct namespaces in Unix or Linux that are not
primarily accessed through the "single namespace". For each,
identify one benefit that could be realized by incorporating the
namespace into the single namespace.
- Identify an area of the IP protocol suite where "full exploitation"
has resulted in significant simplicity, or otherwise been of benefit.
- Identify a design element that was fully exploited in the NFSv2
protocol. Compare and contrast this with NFSv3 and NFSv4.
Tavis Ormandy has been busy of late, poking around in the guts of GNU
libc. Out of that have come two separate local privilege escalations that
exploit an obscure corner (the dynamic linker auditing API) of glibc, while
the exploits themselves use—abuse—some Linux features that many
probably aren't aware of. These vulnerabilities and exploits provide good
examples of the way that security researchers look at code and
systems—a way of looking that more developers would do well to emulate.
The runtime library auditing API is a way for developers to intercept the
actions of the dynamic linker to see the steps that it is taking while
searching for .so files and resolving symbols from them. When a
program is executed with the LD_AUDIT environment variable
pointing to one or more shared libraries, the linker will make callbacks
into functions in those libraries for various events that happen in the
linking process. There are various events specified in the rtld-audit
man page, including searching for an object, opening an object, binding to
a symbol, and so on. It seems like a useful facility, but one that is
likely not in the toolbox of many Linux developers.
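For the curious, a minimal (and entirely benign) audit library might look
like the sketch below. The two callback names come from the rtld-audit API;
everything else is illustrative. Built as a shared object, it could be run
as, say, LD_AUDIT=./audit.so ls to log each object the linker loads:

    #define _GNU_SOURCE
    #include <link.h>
    #include <stdint.h>
    #include <stdio.h>

    /* The dynamic linker calls la_version() first; returning the
     * interface version it offers activates the audit library. */
    unsigned int la_version(unsigned int version)
    {
        return version;
    }

    /* Called each time the linker loads a new shared object. */
    unsigned int la_objopen(struct link_map *map, Lmid_t lmid,
                            uintptr_t *cookie)
    {
        fprintf(stderr, "loaded: %s\n", map->l_name);
        return 0;
    }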
The simpler of the two problems that Ormandy found was that setuid
programs will open whatever arbitrary library a user specifies in
LD_AUDIT, as long as that library lives on the trusted library path. The more well-known LD_PRELOAD environment
variable, which preloads the specified libraries before the linker searches
for others, is specifically prohibited from operating on setuid
programs unless the library is on the trusted path and has the
setuid bit set. Exploiting ping (or some other setuid program)
with LD_PRELOAD would be trivial—a user-provided library could
remap any call ping made to anything the attacker wanted—so
it was an
obvious restriction. LD_AUDIT using non-setuid libraries was evidently not so obvious.
The problem with allowing user-provided libraries to be used for auditing
setuid programs is not anywhere in the auditing API, but is
instead inherent in the way the runtime linker processes libraries. When
the library is opened with dlopen() to determine whether the
auditing callback symbols are present, any library initialization routines
run. So, an exploit is done by finding a vulnerable system library
(it must be on the trusted path) that was not written with setuid
execution in mind (and thus does not have that bit set in the filesystem).
In his description of the
flaw, Ormandy gives an example of using the libpcprofile.so
library, which writes an output file to the path specified by the
PCPROFILE_OUTPUT environment variable. Using ping for
its setuid nature, he sets LD_AUDIT to the library,
points PCPROFILE_OUTPUT where he wants, and
ping ends up putting a user-writable file in
/etc/cron.d. The details will vary depending on the distribution,
but most will be vulnerable to the flaw.
There is nothing particularly special about libpcprofile.so, as
Ormandy describes ways to find other vulnerable system libraries, which are
likely to be numerous—those libraries weren't meant to be used by
setuid programs.
The other vulnerability is
more difficult to exploit, but stems from a similar laxness in
LD_AUDIT handling. In the Linux executable file format, ELF,
library search paths can be specified in the executable itself using
DT_RPATH or DT_RUNPATH tags. Those tags can contain a
$ORIGIN value, which is replaced with the location of the executable
in the filesystem. That way, a library used by a single executable can be
located in a program-specific location rather than in the system library
directories.
The ELF specification recommends that $ORIGIN be disallowed for
setuid executables, but glibc ignores that recommendation.
Ormandy doesn't really see a problem with that:
It is tough to form a thorough complaint about this glibc behaviour however,
as any developer who believes they're smart enough to safely create suid
programs should be smart enough to understand the implications of $ORIGIN
and hard links on load behaviour. The glibc maintainers are some of the
smartest guys in free software, and well known for having a "no hand-holding"
stance on various issues, so I suspect they wanted a better argument than this
for modifying the behaviour (I pointed it out a few years ago, but there was
little interest).
Unfortunately, the $ORIGIN substitution code was reused in the
LD_AUDIT path. There was seemingly an attempt to restrict the use
of $ORIGIN in LD_AUDIT for privileged programs, but it
was insufficient. $ORIGIN will be expanded if it is the
only entry in LD_AUDIT. Since $ORIGIN expands to the
directory that contained the program, it isn't necessarily obvious that
there is anything there to exploit. But, there are known ways to exploit this
kind of situation.
If the directory that contains the executable can be replaced with an
exploit library object between the time $ORIGIN is expanded and the
value is used, the library will be loaded and the attacker can do what they
like. It is essentially a
race condition, but one that can be reliably won by the attacker.
Ormandy's example basically pauses the execution of a ping that
has been hardlinked into an attacker-controlled directory after the
expansion of $ORIGIN has been done. He then removes the directory
and its contents, and puts a library that has exploit code in its
initialization function in the place of the directory.
That particular exploit mechanism is fairly modern, using relatively recent
Linux kernel features, but there are others. Ormandy describes several
other ways to exploit the flaw, with differing requirements (e.g. a C
compiler or winning an easily winnable race) that might serve different
attack strategies. While both are local privilege escalations, they very
well might be used in conjunction with a web application or other flaw to
turn them into a remote root vulnerability.
Both of these vulnerabilities are quite serious for systems that allow
untrusted users to log in. Their impact on other systems depends on
whether there are other vulnerable, network-facing programs. While it is a
bit ironic that it was an audit of LD_AUDIT behavior that found
these bugs, it seems clear that there isn't enough of that kind of auditing
being done for Linux systems. It's always a bit worrisome to think of how
many of these kinds of flaws are still lingering out there.
The Mozilla Security Blog warns
of a new Firefox vulnerability which is already being exploited.
"Users who visited an infected site could have been affected by the
malware through the vulnerability. The trojan was initially reported as
live on the Nobel Peace Prize site, and that specific site is now being
blocked by Firefox's built-in malware protection. However, the exploit code
could still be live on other websites.
Disabling JavaScript (or running NoScript) will block exploit attempts."
festival: code execution
Created: October 22, 2010; updated: November 3, 2010.
From the openSUSE advisory:
festival_server uses an unsafe LD_LIBRARY_PATH. Local users
could exploit that to execute code as another user if that
user runs festival_server.
glibc: privilege escalation
Created: October 21, 2010; updated: April 15, 2011.
From the Red Hat advisory:
It was discovered that the glibc dynamic linker/loader did not handle the
$ORIGIN dynamic string token set in the LD_AUDIT environment variable
securely. A local attacker with write access to a file system containing
setuid or setgid binaries could use this flaw to escalate their privileges.
For a detailed look, see Tavis Ormandy's report.
glibc: privilege escalation
Created: October 22, 2010; updated: January 12, 2011.
From the Debian advisory:
Ben Hawkes and Tavis Ormandy discovered that the dynamic loader in GNU
libc allows local users to gain root privileges using a crafted
LD_AUDIT environment variable.
libsmi: arbitrary code execution
Created: October 22, 2010; updated: January 25, 2011.
From the Mandriva advisory:
A buffer overflow was discovered in libsmi when a long OID was given
in numerical form. This could lead to arbitrary code execution.
pidgin: denial of service
Created: October 21, 2010; updated: March 14, 2011.
From the Red Hat advisory:
Multiple NULL pointer dereference flaws were found in the way Pidgin
handled Base64 decoding. A remote attacker could use these flaws to crash
Pidgin if the target Pidgin user was using the Yahoo! Messenger Protocol,
MSN, MySpace, or Extensible Messaging and Presence Protocol (XMPP) protocol
plug-ins, or using the Microsoft NT LAN Manager (NTLM) protocol for
authentication.
tuxguitar: code execution
Created: October 21, 2010; updated: October 27, 2010.
From the Red Hat bugzilla entry:
Raphael Geissert conducted a review of various packages in Debian and found
that tuxguitar contained a script that could be abused by an attacker to
execute arbitrary code.
The vulnerability is due to an insecure change to LD_LIBRARY_PATH, an
environment variable used by ld.so(8) to look for libraries in directories
other than the standard paths. When there is an empty item in the
colon-separated list of directories in LD_LIBRARY_PATH, ld.so(8) treats it as a
'.' (current working directory). If the given script is executed from a
directory where a local attacker could write files, there is a chance for
code execution.
The 2.6.37 merge window is open
as of this writing, so there is no
current development kernel prepatch. The merge window can be expect to
close right around the end of the month. See the article below for a
summary of activity in this merge window so far.
Stable updates: there have been no stable updates in the last week.
Several stable updates are currently in the review
process and may be released at any time.
All we really need to do is get someone a case of the beverage of
their choice and turn them loose on the problem. I think that the
few anti-stacking holdouts (I was one, but converted a couple years
ago) can be swayed by a reasonable implementation. It won't be
easy, there are plenty of problems that need to be solved, but
anyone who wants easy should stick to developing web portals and
stay out of the kernel.
-- Casey Schaufler
This option enables support for Zalgo in kernel
messages. Zalgo is a corruption. The arrival of Zalgo has
been foretold. Zalgo will not... wait.
-- Matthew Garrett
People do tend to prefer to do localised expedient things rather
than sticking their necks out and implementing proper, generic
kernel-wide functions. If I see it happen, I'll tell them.
Usually I don't see it until months after it's merged.
-- Andrew Morton
Linus has sent out a notice that the 2.6.37 merge window will indeed be
shorter than usual; it will probably conclude on October 30 or 31, just in
time for the 2010 Kernel Summit. "And so far, in the five days since the 2.6.36 release, we've merged
5500+ commits. That has turned my "maybe we can do a shorter merge
window" into a 'we can definitely do a shorter merge window'. Because
we already have enough changes, and there's almost a week to go - so I
think we're well on track for doing that."
Bryce Lelbach has announced that he has managed to build and boot a
(mostly) working kernel using the LLVM-based Clang compiler. It seems that
there are a lot of
problems remaining, though, and he had to use a couple of GCC-compiled
pieces to get the system to boot. "SELinux, Posix ACLs, IPSec,
eCrypt, anything that uses the crypto API - None of these will compile, due
to either an ICE or variable-length arrays in structures (don't remember
which, it's in my notes somewhere). If it's variable-length arrays or
another intentionally unsupported GNUtension, I'm hoping it's just used in
some isolated implementation detail (or details), and not a fundamental
part of the crypto API (honestly just haven't had a chance to dive into the
crypto source yet)."
As a general rule, kernel developers work to avoid running code in hardware
interrupt context; there is a whole array of mechanisms by which
interrupt-driven work can be deferred to less pressing times. Apparently,
however, there is an occasional need to run arbitrary code in the hardware
interrupt context - and there is no hardware conveniently signaling
interrupts at the time. To enable the running of code in hardware
interrupt context, a new API has been added to 2.6.37.
The first step is to fill in an irq_work structure:
struct irq_work my_work;
    init_irq_work(struct irq_work *entry, void (*func)(struct irq_work *));
There is then a fairly familiar pair of functions for running the work
indicated by this structure:
bool irq_work_queue(struct irq_work *entry);
void irq_work_sync(struct irq_work *entry);
The intended area of use is apparently code running from non-maskable
interrupts which needs to be able to interact with the rest of the system.
One should assume that just about any other use of this feature is likely
to be scrutinized closely.
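As a rough sketch of how these pieces fit together, using only the calls
described above (the handler and function names are illustrative):

    #include <linux/irq_work.h>
    #include <linux/kernel.h>

    /* Runs in hard interrupt context: must not sleep or block. */
    static void my_irq_func(struct irq_work *work)
    {
            pr_info("irq_work handler ran\n");
    }

    static struct irq_work my_work;

    static void example(void)
    {
            init_irq_work(&my_work, my_irq_func);
            irq_work_queue(&my_work);   /* run the callback in IRQ context */
            irq_work_sync(&my_work);    /* wait for it to complete */
    }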
The kernel is filled with tests whose results almost never change. A
classic example is tracepoints, which will be disabled on running systems
with only very rare exceptions. There has long been interest in optimizing
the tests done in such places; with 2.6.37, the "jump label" feature
will make those tests go away entirely.
Consider the definition of a typical tracepoint, which, behind all of the
preprocessor madness, looks something like:
    static inline void trace_foo(args)
    {
        if (unlikely(trace_foo_enabled))
            goto do_trace;
        return;
    do_trace:
        /* Actually do tracing stuff */
    }
The cost of a test for a single tracepoint is essentially zero. The number
of tracepoints in the kernel is growing, though, and each one adds a new
test. Each test must fetch a value from memory, adding to the pressure on
the cache and hurting performance. Given that the value almost never changes, it
would be nice to find a way to optimize the "tracepoint disabled" case.
In 2.6.37, this tracepoint can be rewritten using a new macro:
    #define JUMP_LABEL(key, label) \
        if (unlikely(*key))        \
            goto label;
The nice thing is that JUMP_LABEL() does not have to be
implemented like that. It can, instead, (1) note the location of the
test and the key value in a special table, and (2) simply
insert a no-op instruction. That reduces the cost of the test (and the
tracepoint) to zero for the common "not enabled" case. Most of the time,
the tracepoint will never be enabled and the omitted test will never be
missed.
The tricky part happens when somebody wants to enable the tracepoint.
Changing its status now requires calling one of a pair of special
functions:
void enable_jump_label(void *key);
void disable_jump_label(void *key);
A call to enable_jump_label() will look up the key in the jump
label table, then replace the special no-op instructions with the assembly
equivalent of "goto label", enabling the tracepoint.
Disabling the jump label will cause the no-op instruction to be restored.
The end result is a significant reduction in the overhead of disabled
tracepoints. This feature only works on architectures which support it
(x86 only, at the moment) and only with relatively recent versions of GCC;
otherwise the preprocessor version is used.
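Putting the pieces together, the earlier tracepoint might be recast along
these lines (a sketch; the key variable and label names are illustrative):

    static char trace_foo_key;    /* almost always zero */

    static inline void trace_foo(int arg)
    {
        JUMP_LABEL(&trace_foo_key, do_trace);
        return;
    do_trace:
        /* Actually do tracing stuff */
        return;
    }

    /* Enabling the tracepoint patches the no-op into a real jump: */
    /*     enable_jump_label(&trace_foo_key);                      */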
Kernel development news
The 2.6.36 kernel was released on October 20, and the 2.6.37 merge window
duly started shortly thereafter. As of this writing, some 6450
changes have been merged for the next development cycle, with more surely
to come. Some of the more significant, user-visible changes merged for
2.6.37 include:
- The first parts of the inode scalability patch set have been merged,
but, as of this writing, the core locking changes have not yet been
pushed for inclusion. See this
article for more information on the inode scalability work.
- The x86 architecture now uses separate stacks for interrupt handling
when 8K stacks are in use. The option to use 4K stacks has been
removed.
- The big kernel lock removal process continues; the core kernel is
almost entirely BKL-free. There is now a configuration option which
may be used to build a kernel without the BKL. File locking still
requires the BKL, though; schemes are afoot to fix it before the
close of the merge window, but this work is not yet complete. If file
locking can be cleaned up, it will be possible for many (or most)
users to run a BKL-free 2.6.37 kernel.
- The "rados block device" has been added. RBD allows the creation
of a special block device which is backed by objects stored in the
Ceph distributed system.
- The GFS2 cluster filesystem is no longer marked "experimental." GFS2
has also gained support for the fallocate() system call.
- A new sysfs file, /sys/selinux/status, allows a user-space
application to quickly notice when security policies have changed.
The intended use is evidently daemons which cache the results of
access-control decisions and need to know when those results might
change. A separate file, called policy, has been added for
those simply wanting to read the current policy from the kernel.
- The scheduler now works harder to avoid migrating high-priority
realtime tasks. The
scheduler also will no longer charge processor time used to handle
interrupts to the process which happened to be running at the time.
- VMware's VMI paravirtualization support has been deprecated
by the company and, as scheduled, removed from the 2.6.37 kernel.
- Some hibernation improvements have been merged, including the ability
to compress the hibernation image with LZO.
- The ARM architecture has gained support for the seccomp (secure computing)
facility.
- The block layer can now throttle I/O bandwidth to specific devices,
controlled by the cgroup mechanism. This is the second piece of the
I/O bandwidth controller puzzle which allows the establishment of
specific bandwidth limits which will be enforced even if more I/O
bandwidth is available.
- The new "ttyprintk" device allows suitably-privileged user space to
feed messages through the kernel by way of a pseudo TTY device.
- The kernel has gained support for the point-to-point tunneling
protocol (PPTP); see the
accel-pptp project page for more information.
- The NFS client has a new "idmapper" implementation for the translation
between user and group names and IDs. The new code is more flexible
and performs better; see Documentation/filesystems/nfs/idmapper.txt
for details.
- There is a new -olocal_lock= mount option for the NFS client
which can cause it to treat either (or both) of flock() and
POSIX locks as local.
- Most of the functions of the nfsservctl() system call have
been deprecated and marked for removal in 2.6.40. There is a new
configuration option for those who would like to remove this
functionality ahead of time.
- Simple support for the pNFS protocol has been merged.
- Huge pages can now be migrated between nodes like normal memory pages.
- There is the usual pile of new drivers:
- Systems and processors: Flexibility Connect boards,
Telechips TCC ARM926-based systems,
Telechips TCC8000-SDK development kits,
Vista Silicon Visstrim_m10 i.MX27-based boards,
LaCie d2 Network v2 NAS boards,
Qualcomm MSM8x60 RUMI3 emulators,
Qualcomm MSM8x60 SURF eval boards,
Eukrea CPUIMX51SD modules,
Freescale MPC8308 P1M boards,
APM APM821xx evaluation boards,
Ito SH-2007 reference boards,
IBM "SMI-free" realtime BIOS's,
MityDSP-L138 and MityDSP-1808 systems,
OMAP3 Logic 3530 LV SOM boards,
OMAP3 IGEP modules, and
taskit Stamp9G20 CPU modules.
- Block: Chelsio T4 iSCSI offload engines.
- Input: Roccat Pyra gaming mice,
UC-Logic WP4030U, WP5540U and WP8060U tablets,
several varieties of Waltop tablets,
OMAP4 keyboard controllers,
NXP Semiconductor LPC32XX touchscreen controllers,
Hanwang Art Master III tablets,
ST-Ericsson Nomadik SKE keyboards,
ROHM BU21013 touch panel controllers, and
TI TNETV107X touchscreens.
- Miscellaneous: Freescale eSPI controllers,
Topcliff platform controller hub devices,
OMAP AES crypto accelerators,
NXP PCA9541 I2C master selectors,
Intel Clarksboro memory controller hubs,
OMAP 2-4 onboard serial ports,
Linear Technology LTC4261 Negative Voltage Hot Swap Controllers,
TI BQ20Z75 gas gauge ICs,
OMAP TWL4030 BCI chargers,
ROHM BH1770GLC and OSRAM SFH7770 combined ALS and proximity sensors,
Avago APDS990X combined ALS and proximity sensors,
Intersil ISL29020 ambient light sensors, and
Medfield Avago APDS9802 ALS sensor modules.
- Network: Brocade 1010/1020 10Gb Ethernet cards,
Conexant CX82310 USB ethernet ports,
Atheros AR9170 "otus" 802.11n USB devices, and
Topcliff PCH Gigabit Ethernet controllers.
- Sound: Marvell 88pm860x codecs,
TI WL1273 FM radio codecs,
HP iPAQ RX1950 audio devices,
Native Instruments Traktor Kontrol S4 audio devices,
Aztech Sound Galaxy AZT1605 and AZT2316 ISA sound cards,
Wolfson Micro WM8985 and WM8962 codecs,
Wolfson Micro WM8804 S/PDIF transceivers,
Samsung S/PDIF controllers, and
Cirrus Logic EP93xx AC97 controllers.
- USB: Intel Langwell USB OTG transceivers,
YUREX "leg shake" sensors, and
USB-attached SCSI devices.
- The old ieee1394 stack has been removed, replaced at last by
the "firewire" drivers.
Changes visible to kernel developers include:
- The jump label
optimization mechanism has been merged; its initial purpose is to
reduce the overhead of inactive tracepoints.
- Yet another RCU variant has been added: "tiny preempt RCU" is meant
for uniprocessor systems. "This implementation uses but a
single blocked-tasks list rather than the combinatorial number used
per leaf rcu_node by TREE_PREEMPT_RCU, which reduces memory
consumption and greatly simplifies processing. This version also
takes advantage of uniprocessor execution to accelerate grace periods
in the case where there are no readers."
- New tracepoints have been added in the network device layer, places
where sk_buff structures are freed,
softirq_raise(), workqueue operations, and
memory management LRU list shrinking operations.
There is also a new script for using perf to analyze network device
behavior.
- The wakeup latency tracer now has function graph support.
- There is a new mechanism for running
arbitrary code in hardware interrupt context.
- The power management layer now has a formal concept of "wakeup
sources" which can bring the system out of a sleep state. Among other
things, it can collect statistics to help the user determine what is
keeping a system awake. Wakeup events can abort the freezing of
tasks, reducing the time required to recover from an aborted suspend
or hibernate operation.
- A new mechanism for managing the automatic suspending of idle devices
has been added.
- There is a new set of functions for managing the "operating
performance points" of system-on-chip components. (commit).
- A long list of changes to the memblock (formerly LMB) low-level
management code has been merged, and the x86 architecture now uses
memblock for its early memory management.
- The default handling for lseek() has changed: if a driver
does not provide its own llseek() function, the VFS layer
will cause all attempts to change the file position to fail with an
ESPIPE error. All in-tree drivers which lacked
llseek() functions have been changed to use
noop_llseek(), which preserves the previous behavior (see the first
sketch after this list).
- There is a new way to create workqueues:
struct workqueue_struct *alloc_ordered_workqueue(const char *name,
unsigned int flags);
Items submitted to the resulting workqueue will be run in order, one
at a time. It's meant to eventually replace the old single-threaded
workqueue interface.
Also added is:
bool flush_work_sync(struct work_struct *work);
This function will wait until a specific work item has completed.
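  In use, that might look like this sketch (my_work is assumed to have
  been set up elsewhere with INIT_WORK()):

    static struct workqueue_struct *my_wq;
    static struct work_struct my_work;

    static int my_init(void)
    {
        my_wq = alloc_ordered_workqueue("mydrv", 0);
        if (!my_wq)
            return -ENOMEM;
        queue_work(my_wq, &my_work);   /* items run one at a time, in order */
        flush_work_sync(&my_work);     /* block until this item has finished */
        return 0;
    }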
- The ALSA ASoC API has been significantly extended to support sound
cards with multiple codecs and DMA controllers. (commit).
- The stack-based
kmap_atomic() patch has been merged, with an associated
API change. See the new Documentation/vm/highmem.txt file for details.
- There are two new memory allocation helpers:
void *vzalloc(unsigned long size);
void *vzalloc_node(unsigned long size, int node);
Both behave like the equivalent vmalloc() calls, but they
also zero the allocated memory.
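  So a driver that previously wrote:

    buf = vmalloc(size);
    if (buf)
        memset(buf, 0, size);

  can now simply say:

    buf = vzalloc(size);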
- Most of the work needed to remove the concept of hard
barriers from the block layer has been merged. This task will
probably be completed before the closing of the merge window.
Linus has let it be known that he expects this merge window to be shorter
than usual so that it can be closed before the 2010 Kernel Summit begins on
November 1. Expect patches to be merged at a high rate until the end
of October; an update next week will cover the changes merged in the last
part of the 2.6.37 merge window.
Comments (13 posted)
Nick Piggin's VFS scalability patch set
has been under development for well over one year. Linus was
ready to pull this work during the 2.6.36 merge window, but Nick asked for
more time for things to settle out; as a result, only some of the simpler
parts were merged then. Last week, we mentioned
that some developers
became concerned when it started to become clear that the remaining work
would not be ready for 2.6.37 either. Out of that concern came a competing
version of the patch set (by Dave Chinner) and a big fight. This
discussion was of the relatively deep and intimidating variety, but your
editor, never afraid to make a total fool of himself, will attempt to
clarify the core disagreements and a possible path forward anyway.
The global inode_lock is used within the virtual filesystem layer
(VFS) to protect several data structures and a wide variety of
inode-oriented operations. As a global lock,
it has become an increasingly annoying bottleneck as the number of CPUs and
threads in systems increases; it clearly needs to be broken up in a way
which makes it more scalable. Unfortunately, like a number of old locks in
the VFS, the boundaries of what's protected by inode_lock are not
always entirely clear, so any attempts to change locking in that area must
be done with a great deal of caution. That is why improving inode locking
scalability has been such a slow affair.
Getting rid of inode_lock requires putting some other locking in
place for everything that inode_lock protects. Nick's patch set
creates separate global locks for some of those resources:
wb_inode_list_lock for the list of inodes under writeback, and
inode_lru_lock for the list of inodes in the cache. The standalone
inodes_stat statistics structure is converted over to atomic
types. Then the existing i_lock per-inode spinlock is used to
cover everything else in the inode structure; once that is done,
inode_lock can be removed. The remainder of the patch set (more
than half of the total) is then dedicated to reducing the coverage of
i_lock, often by using read-copy-update (RCU) instead.
Before any of that, though, Nick's patch set changed the way the core
memory management "shrinker" code works. Shrinkers are callbacks which can
be invoked by the core when memory is tight; their job is then to reduce
the amount of memory used by a specific data structure. The inode and
dentry caches can take up quite a bit of memory, so they both have
shrinkers which will free up (hopefully) unneeded cache entries when the
memory is needed elsewhere. Nick changed the shrinker API to cause it to
target specific memory zones; that allows the core to balance free memory
across memory types and across NUMA nodes.
The per-zone shrinkers were one of the early flash points in this debate.
Dave Chinner and others on the VFS side of the house worried that invoking
shrinkers in such a fine-grained way would increase contention at the
filesystem level and make it
harder to shrink the caches in an efficient way. They also thought that
this change was orthogonal to the core goal of eliminating the scalability
problems caused by the global inode_lock. Nick fought hard for
per-zone shrinkers, and he clearly believes that they are necessary, but he
has also dropped them from his patch set for now in an attempt to push
the rest of the work forward.
The next disagreement has to do with the coverage of i_lock; Dave
Chinner's alternative patch set avoids using i_lock to cover most
of the inode structure. Instead, Dave introduces other locks from
the outset, reaching a point where he has relatively fine-grained lock
coverage by the time inode_lock is removed at the end of his
series. Compared to this approach, Nick's patches have been criticized as
being messy and not as scalable.
Nick's response is that the "width" of i_lock is a detail which
can be resolved later. His
intent was to do the minimal amount of work required to allow the removal
of inode_lock, without going straight for the ultimate scalable
solution. The goal was to be able to ensure that the locking remains
correct by changing as little as possible before the removal of the global
lock; that way, hopefully, there are fewer chances of breaking things.
Beyond that, any bugs which do slip through before the patch removing
inode_lock will almost certainly not reveal themselves until after
that removal. That means that anybody trying to use bisection to find a
bug will end up at the inode_lock removal patch instead of the
real culprit. Thus, minimizing the number of changes before that removal
should make debugging easier.
That is why Nick removes inode_lock before the middle of his patch
series, while Dave's series does that removal near the end. Both patch
sets include a number of the same changes - putting per-bucket locks onto
the inode hash table, for example - but Nick does it after removing
inode_lock, while Dave does it before. There are also
differences, with Nick heading deep into RCU territory while Dave avoids
using RCU. Both developers claim to be aiming for similar end results,
they just take different roads to get there.
Finally, there is also a deep disagreement over the locking of the inode
cache itself. In current kernels, the cache data structure (the LRU and
writeback lists, essentially) is covered by inode_lock with the
rest. Both patch sets create separate locks for the LRU and for
writeback. The problem is with lock ordering; one of the hardest problems
in the VFS is ensuring that all locks are taken in the proper order so that
the system will not deadlock. Nick's patches require the VFS to acquire
i_lock for the inode(s) of interest prior to acquiring the
writeback or LRU locks; Dave, instead, wants i_lock to be taken
after those list locks.
The problem is that it is not always possible to acquire the locks in the
specified order. Code which is working through the LRU list, for example, must
have that list locked; if it then decides to operate on an inode found in
the LRU list, it must lock the inode. But that violates Nick's locking
order. To make things work correctly, Nick uses spin_trylock() in
such situations to avoid hanging. Uses of spin_trylock() tend to
attract scrutiny, and that is the case here; Dave has described the code as "a
large mess of trylock operations" which he has gone out of his way
to avoid. Nick responds that the code is
not that bad, and that Dave's approach brings locking complexities of its
own.
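To see where the trylocks come from, consider an illustrative sketch
(not code from either patch set; the list and field names are invented,
though inode_lru_lock is from Nick's series) of a shrinker pass under
Nick's ordering, where i_lock must be taken before the LRU lock:

    again:
    spin_lock(&inode_lru_lock);
    list_for_each_entry(inode, &inode_lru_list, i_list) {
        if (!spin_trylock(&inode->i_lock)) {
            /*
             * Blocking here would take i_lock after the LRU lock,
             * inverting the documented order; back off instead.
             */
            spin_unlock(&inode_lru_lock);
            cpu_relax();
            goto again;   /* the list may have changed; rescan */
        }
        /* ... decide whether to evict the inode ... */
        spin_unlock(&inode->i_lock);
    }
    spin_unlock(&inode_lru_lock);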
This is about where Al Viro jumped in,
calling both approaches wrong. Al would like to see the writeback locks
taken prior to i_lock (because code tends to work from the list
first, prior to attacking individual inodes), but he says the LRU lock
should be taken after i_lock because code changing the LRU status
of an inode will normally already have that inode's lock. According to Al, Nick is overly concerned with
the management of the various inode lists and, as a result,
"overengineering" the code. After some discussion, Dave eventually agreed with something close to Al's view and
acknowledged that Nick's placement of the LRU lock below i_lock
was correct, eliminating that point of contention.
Al has also described the way he would like things
to proceed; this is a good thing. When it comes to VFS locking, few are
willing to challenge his point of view; that means that he can probably
bring about a resolution to this particular dispute. He wants a patch
series which starts with the split of the writeback and LRU lists, then
proceeds by pulling things out from under inode_lock one at a
time. He is apparently pulling together a tree based on both Nick's and
Dave's work, but with things done in the order he likes. The end result
will probably be credited to Nick, who figured out how to solve a long list
of difficult problems around inode_lock, but it will differ
significantly from what he initially proposed.
What is not at all clear, though, is how much of this will come together
for the 2.6.37 merge window. Al has a long history of last-second pull
requests full of hairy changes; Linus tends to let him get away with it.
But this would be very last minute, and the changes are deep, so, while Al
has pushed some of the initial changes, the core locking work may not be
ready in time for 2.6.37. Either way, once inode scalability has been
taken care of, discussion can begin
on the removal of dcache_lock, which is a rather more complex
problem than inode_lock; that should be interesting to watch.
Comments (none posted)
One tends to think of "the NASDAQ" as a single exchange based in the US,
but, in fact, NASDAQ OMX operates
exchanges all over the world - and they
run on Linux. In the US, for instance, that includes markets like the
NASDAQ Stock Market, The NASDAQ Options Market, and NASDAQ OMX's
newest market, which launched on October 8. At a brief presentation at the
Linux Foundation's invitation-only End User Summit in Jersey City, NASDAQ
OMX vice president Bob Evans talked about the ups and downs of using Linux
in a seriously mission-critical environment.
NASDAQ OMX's exchanges run on thousands of Linux-based servers. These
servers handle realtime transaction processing, monitoring, and development
as well. The big challenge in this environment, of course, is performance;
real money depends on whether the exchange can keep up with the order
stream. Latency matters as much as throughput, though; orders must be
responded to (and executed) within a bounded period of time. Needless to say,
reliability is also crucially important; down time is not well received, to
say the least.
To meet these requirements, NASDAQ OMX runs large clusters of thousands of
machines. These clusters can process hundreds of millions of orders per day
- up to one million orders per second - with 250µs latency.
According to Bob, Linux has incorporated some useful technologies in recent
years. The NAPI interrupt mitigation technique for network drivers has, on
its own, freed up about 1/3 of the available CPU time for other work. The
epoll system call cuts out much of the per-call overhead, taking 33µs off
of the latency in one benchmark. Handling clock_gettime() in user space via
the VDSO page cuts almost another 60ns. Bob was also quite pleased with how
the Linux page cache works; it is effective enough, he says, to eliminate
the need to use asynchronous I/O, simplifying the code considerably.
On the other hand, there are some things which have not worked out as
well for them. These include I/O signals; they are complex to program with
and, if things get busy, the signal queue can overflow. The user-space
libaio asynchronous I/O (AIO) implementation is thread-based; it scales
poorly, he says, and does not integrate well with epoll. Kernel-based
asynchronous I/O, instead, lacks proper socket support. He also mentioned
the recvmsg() system call, which requires a call into the kernel for every
incoming packet.
There is some new stuff coming along which shows some promise. The new
recvmmsg() system call can receive multiple packets with a single
call. For now, though, it is just a wrapper around the internal
recvmsg() implementation and does not hold the socket lock across
the entire operation. But, he said, recvmmsg() is a good example
of how the ability to add new APIs to Linux is a good thing. He also likes the
combination of kernel-based AIO and the eventfd() system call; that makes
it possible to integrate file-based AIO into an application's normal
event-processing loop. There is also some potential in syslets, which he
sees as a way of delivering cheap notifications to user space; it's not
clear whether syslets will scale usefully, though.
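For illustration, a user-space receive path batching up to 32 datagrams
per system call might look like this sketch (the constants and the
receive_batch() function are invented):

    #define _GNU_SOURCE
    #include <sys/socket.h>

    #define VLEN  32
    #define BUFSZ 2048

    static char bufs[VLEN][BUFSZ];
    static struct iovec iov[VLEN];
    static struct mmsghdr msgs[VLEN];

    /* Returns the number of datagrams received, or -1 on error. */
    int receive_batch(int sock)
    {
        for (int i = 0; i < VLEN; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len = BUFSZ;
            msgs[i].msg_hdr.msg_iov = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        /* One kernel entry for up to VLEN packets. */
        return recvmmsg(sock, msgs, VLEN, MSG_DONTWAIT, NULL);
    }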
What NASDAQ OMX would really like to see in Linux now is good socket-based
AIO. That would make it possible to replace epoll/recvmsg/sendmsg sequences
with fewer system calls. Even better would be if the kernel could provide
notifications for multiple events at a time. Best would be if the interface
to this functionality were completely based on sockets. He described a
vision of an "epoll-like kernel object" which would handle in-kernel
network traffic processing. The application could post asynchronous send
and receive requests to the queue, and receive notifications when they have
been executed. He would like to see multiple sockets attached to a single
object, and a file descriptor suitable for passing to poll() for
notifications. With a setup like that, it should be possible to push more
network traffic through the kernel with lower latencies.
In summary, NASDAQ OMX seems to be happy with its use of Linux. They also
seem to like to go with current software - the exchange is currently
rolling out 2.6.35 kernels. "Emerging APIs" are helping operations like
NASDAQ OMX realize real-world performance gains in areas that
matter. Linux, Bob says, is one of the few systems that are willing to
introduce new APIs just for performance reasons. That is an interesting
point of view to contrast with Linus Torvalds's often-stated claim that
nobody uses Linux-specific APIs; it seems that there are users, they just
tend to be relatively well hidden.
Comments (80 posted)
Patches and updates
Core kernel code
Filesystems and block I/O
Page editor: Jonathan Corbet
From the 20th to the 23rd of October 2010, the second international openSUSE conference
took place in Nuremberg, Germany. With the motto "Collaboration across
Borders", all users, contributors, and supporters of the openSUSE project
and free software in general were invited to four days of learning,
discussing, and hacking. More than 250 openSUSE enthusiasts came to
Nuremberg, and it was an excellent opportunity to see how the openSUSE
project is doing these days.
Get your ass up
Hendrik "Henne" Vogelsang, a founder and board member of the openSUSE
project, gave the first keynote, "Get your ass up!". He kicked off his
presentation with a question to the audience about how old they thought
SUSE was. People tend to forget that SUSE is one of the oldest
distributions: it's already 18 years old. Compare that with Debian which is 17 or Red Hat/Fedora which is 16. People from all over the world have been using SUSE since they were very young, and it has a large user base. But at the same time, the community is very young: the openSUSE project was founded only 5 years ago, and Henne explained that it has only become a real open source project very recently: "For the first 3 years we really struggled in the transition from a company-made product to an open source project. Only when Factory was opened in 2009, openSUSE became really open."
This means that openSUSE is in a unique position: it is a very young project with a very old distribution and a very large user base. According to Henne, now is the time to take advantage of this position and make a difference: "We have no rules, so we have all freedom to start to do things today." And he immediately followed this with the advice to take responsibility:
There is no big dark overlord in the background who fixes things if we fail or who tells us what to do. Novell won't, the openSUSE board won't, the team leaders won't, the strategy team won't. It's you that makes the difference, so step up and take responsibility.
Then Henne highlighted some examples of people who stuck their necks out and made a difference. Andrew Wafaa spearheaded a MeeGo version of openSUSE, Smeegol. This is not a Novell-initiated project (although there are some Novell people contributing as individuals), but completely done by volunteers. The openSUSE wiki is another example of the power of individuals: when the openSUSE project was started 5 years ago, one of the first things some volunteers did was to start a wiki. And a couple of months ago, the openSUSE wiki team launched a complete overhaul of the wiki with a new structure, theme, and search engine. Last but not least, Henne praised the strategy team and community manager Jos Poortvliet for investing their time.
He also stressed that you don't need to wait for an OK from everybody before you start with such an initiative: just make the difference, consensus is not needed. One of the things often stopping us from stepping up is the fear of duplication, he explained: "Why do we offer 8 desktop environments, why do we have Vim, Emacs and Gedit, why do we have KDE's Plasma Netbook and MeeGo for netbooks, and so on." According to Henne, there is nothing wrong with this: duplicated efforts are not a waste of time, because we can't all possibly want the same things. His advice is simple: don't let other people tell you to not do something because someone else already did it; we need diversity. He put it somewhat bluntly: "If you want to help, then help; but if you see someone doing something you are not interested in, just shut up and get out of the way."
Another thing that discourages people from stepping up is the fear of failure. But of course, if you always take a safe route and don't fail, then the project never really advances, never innovates. That's why it's so important to let each other fail and pick each other up after that. Henne's last piece of advice in the talk was a direct consequence of this approach: "Don't always think things through: you can't always have a 100% solution. Even if you have a small idea, try it out. Be playful, this is what open source is about." All in all, Henne's talk was a great reminder of the responsibility that each individual community member in an open source project bears.
OpenSUSE's community manager Jos Poortvliet presented an update about the strategy discussion we wrote about in June and in September. He started his talk with a remark: "When I joined Novell, I was glad that this discussion was going on, because I hadn't a really good idea of what openSUSE was either." He summarized that a strategy has two main goals: to help make decisions, and to help focus. Both goals are needed not only for technical matters, but also for marketing. For example, if the openSUSE marketing team decides to create a leaflet, the result should obviously depend on the target audience and the goals of the distribution. For instance, will you present the openSUSE Build Service and YaST in the leaflet? Probably not if you're targeting beginners. And will you make a default choice for a desktop environment? If you target beginners, you can choose for the users so they don't have to; but if you target power users, you pick a default desktop environment but allow choosing, or you include all necessary information for the users to make an informed choice themselves.
Jos also made it clear from the beginning that a strategy is not a vehicle to limit the community: he referred to Henne's message that you should not tell people not to do something, and he added that this holds even if it's something going against the strategy. And in a philosophical mood, he said "If you're not seeing yourself in the strategy document, the document has to change; not you!" In a free software community, people will always work on all sorts of things, but if you have a clear identity (not just "We are just another Linux distribution, and we're green") and clear goals, it's much easier to invite and attract other people.
The current openSUSE vision and strategy proposal is published on co-ment, a web-based document collaboration and annotation tool. The target user of openSUSE is described like this:
We cater to beginner or advanced users who are interested in computers and want to experiment, learn or get work done. We offer them a stable and comfortable computing experience which does not limit their freedom of choice, offering sane defaults and easy configuration.
Jos gave some hypothetical examples: if you're an audio professional and need JACK and a realtime kernel, you are one of openSUSE's target users. And if you're a student new to Linux but wanting to learn and experiment, you're also a target user. The strategy also spells out what openSUSE offers:
We are the openSUSE Community - a friendly, welcoming, vibrant, and active community. Within the openSUSE Project we provide an open and innovative atmosphere to collaboratively work on a variety of distribution- and packaging related technologies and products.
Our development philosophy is stability and flexibility rather than being bleeding edge; innovative community infrastructure; and actively seeking collaboration with the wider Free and Open Source community.
The openSUSE project is built upon three pillars mentioned in the above quote: the community, the distribution, and the infrastructure. Therefore, the strategy proposal describes all three pillars. The community ("the heart of the openSUSE project") is described as collaborative and contributing improvements to upstream projects. Moreover, the community works closely with companies in its ecosystem that provide additional value, including support and enterprise offerings on top of or derived from openSUSE technology. Also, the barrier to becoming part of the openSUSE community should be lowered wherever possible. And last but not least, the proposal emphasizes that openSUSE aims to foster the development of free and open source software, but takes a pragmatic approach to what they ship to their users. Jos explained this as "We prefer to ship free software, but we'll not screw our users if they want audio or Flash support."
The openSUSE distribution is described as follows:
The openSUSE distribution offers a powerful, stable core and enables everybody to contribute additional packages and tools through the openSUSE Build Service: freedom and choice are our keywords.
Jos added that one of the goals of the distribution is a good out-of-the-box experience based on sane defaults. The freedom of choice is exemplified in a wide software selection and compatibility with other operating systems, including Windows and Mac OS X. The last pillar of the project is the infrastructure:
The freely available openSUSE tools and services aim to support the collaborative development process within openSUSE and we encourage other projects to leverage them for their own usage.
This infrastructure part refers to one of the really strong points of the openSUSE distribution in recent times. The openSUSE Build Service makes it possible to make up to date packages available for multiple current releases, even for other distributions. Moreover, with the Kiwi build system and SUSE Studio, the project provides technology to easily build openSUSE derivatives in the form of live images, appliances, and even full distributions. Your author experienced that this is not just theory: when he wanted to create a Dutch version of the openSUSE 11.3 KDE4 live CD this week and his first attempts failed, he asked Jos who could help him and got an immediate response from openSUSE Boosters Will Stephenson and Stephan Kulow. After some configuration changes and two kiwi commands, the result was the desired Dutch live CD.
The strategy proposal also lists some things that openSUSE doesn't focus on. For example, it won't oversimplify the system to the point where configuring it becomes harder: "We prefer flexibility over an extreme focus on ease of use". OpenSUSE will also not aim at having the latest and greatest in shipped releases, nor will it provide feature upgrades for a shipped release. But flexibility also means that if you really want to install the latest packages, you can through the openSUSE Build Service. This way, you preserve the stability and integrity of the rest of your system.
After the presentation, there was plenty of time for questions, and questions there were. A valid criticism that was raised is that the wording of the current strategy proposal is too developer-centric. Jos agreed that this is the case and said that he wants to address this in a next version. Another person had the opinion that the strategy proposal is too boring and negative (he summarized it as "We don't want to be selfish like Ubuntu or unstable like Fedora"), to which Jos answered that a strategy document is indeed boring, but that it's needed to build upon and to create exciting marketing material.
Someone else remarked that this strategy doesn't seem actionable: "How will it change how we do openSUSE? What are the next steps?" According to Jos, this strategy is indeed not sufficient, some people need to step up and really move the community forward to its goals. Someone else proposed to split the document into a short one with a sexy high-level description of the strategy, and an implementation document that describes how the community will implement this strategy. The latter document can then talk about detailed things like openSUSE not doing feature upgrades for a shipped release.
openSUSE and Novell
Gerald Pfeifer, Director of Product Management at Novell's Open Platform Solutions Business Unit and thus responsible for all SUSE Linux Enterprise products and SUSE Studio, was the keynote speaker on Friday, with a talk entitled "openSUSE and Novell: an unlikely couple?" In his talk, Gerald tried to correct some misconceptions about how openSUSE and Novell work together. He started by stressing that there is no such thing as "Novell employees" as opposed to "the community": many of the employees of Novell's Open Platform Solutions business unit, even in the management team, have a history with free software. "A lot of Novell employees are part of the community, and these people with their feet in both Novell and the openSUSE community are not only important for openSUSE but also for Novell." As a side note, he remarked that it doesn't even make much sense to talk about "the openSUSE community", as there is an openSUSE kernel community, openSUSE forums community, openSUSE wiki community, openSUSE KDE community, openSUSE GNOME community, and so on, all behaving differently.
So why does Novell support the openSUSE community? Gerald presented Novell's twofold goal: increase the share of Linux as opposed to other operating systems, and maximize the amount of openSUSE and SUSE Linux Enterprise used. But even just awareness - if people know what openSUSE is - is already important. That said, Novell is a company with shareholders and a board, and it has certain rules and regulations to follow. For instance, a company is supposed to increase revenue or decrease costs with every action it takes, so Gerald explained that Novell can't fulfill each request from the openSUSE community like "Why don't you add 50 more people to the openSUSE Boosters team?"
Obviously Novell needs openSUSE because it is the base for SUSE Linux Enterprise (SLE). Gerald made this clear: "It's very hard to develop an enterprise Linux distribution every three to four years if you don't base it on a current distribution." For SLE, Novell does a lot of quality assurance, so it becomes more mature and stable than openSUSE, but Gerald stressed that stability is not a side criterion for openSUSE: "OpenSUSE is not a crash test facility for our enterprise Linux distribution, and it never was meant to be: I want it to be as stable as possible." He admitted, though, that there was at least one painful case where this went wrong: many SUSE users will remember the broken update mechanism in SUSE Linux 10.1.
One of those areas where Novell has a clear focus on contributing to openSUSE directly is the openSUSE Boosters team. These are thirteen people paid by Novell to support and "boost" the openSUSE community: they don't develop specific projects or maintain specific packages, but if they see a stumbling block for users or contributors, they remove these obstacles, e.g. in the areas of documentation or infrastructure.
But Gerald emphasized that Novell is contributing a lot more people to
the openSUSE project than just the Boosters: there is for example the
security team, the openSUSE community manager Jos Poortvliet, kernel
people, a GCC maintainer who maintains GCC packages for openSUSE even if
these versions are not and will not be used in SUSE Linux Enterprise
(Richard Günther), and so on. Moreover, Novell contributes hardware
and other infrastructure, and even pays legal costs, e.g. for checking
license compliance. Tools like the openSUSE Build Service and SUSE Studio,
developed by Novell, are directly beneficial to the openSUSE community as
well. Of course, Novell also provides significant contributions to upstream projects of various sorts, which also benefits openSUSE.
Gerald concluded his talk by saying that it's important both for Novell
and openSUSE that openSUSE stands more on its own feet: "I want to
see more openSUSE volunteers at next year's conference." His
reasoning was: the more Novell needs to direct efforts to baseline work,
the less will land higher up the stack in terms of innovation. That's also
why he said that Novell is very supportive of an openSUSE Foundation,
e.g. by investing quite a bit of lawyers' time for the needed legal work.
After his talk, Gerald left some room for questions, which were
numerous. Andrew Wafaa asked the most interesting one from the point of
view of the relationship between Novell and openSUSE: why doesn't Novell
open up its internal mailing lists? Gerald's answer was that internal
mailing lists will always exist, and other companies working with open
source projects also have them. Some conversations should just remain
internal to the company. However, he has seen several cases of discussions
on an internal mailing list, e.g. about the kernel, where someone mentioned
openSUSE and someone else requested taking the matter to the opensuse-kernel mailing list. "At Novell, we all keep an eye on which discussions benefit from being done publicly." Someone else in the audience added the remark that the openSUSE community also has a responsibility here: "Keep the openSUSE mailing lists friendly, otherwise not only external people but also Novell employees get scared and discuss their stuff on the internal mailing lists."
Between focus and anarchy
These three talks give a good picture of what is going on in the
openSUSE community right now. The presence of Gerald's talk in the schedule
seems to suggest that Novell feels the need to defend the specifics of its
relationship with openSUSE to the community. His assurance that Novell is
very supportive of an openSUSE Foundation should be nice to hear for
members of the community who want a stronger and more independent
distribution that is able to attract more corporate sponsors.
The two other talks had somewhat contradictory messages. While Henne emphasized that everyone can do what they want in the openSUSE project, Jos tried to convince his audience that openSUSE needs a focused strategy to attract new people. Both of these approaches have some truth in them, but combining them will be a delicate task for the project. If everything is possible, like Henne maintains, then openSUSE will be a wonderful playground for technology enthusiasts and anarchist programmers, but most people from outside the community may be scared to join this chaos. On the other hand, if openSUSE chooses a strategy that is too focused, it may be easily able to attract new people that are interested in its goals, but it may alienate many of its current users.
The trick will be to fine-tune the current strategy to the point where
most members of the community will choose to focus on the strategy's goals themselves, while those who want to explore other topics still have the freedom to do so. In practice, this doesn't seem a big change from the current situation, but it's important that people who contribute or want to contribute to openSUSE now have some written guidelines. Coupled with Henne's reminder that each individual community member bears responsibility for the project, the message is clear: now that openSUSE seems to have figured out its place in the Linux ecosystem, it's time to take action.
Comments (none posted)
The surprise controversy this week was that I removed /usr from the list of default mount points in the UI. That doesn't mean you can't still use it. It just means you have to type it in manually as opposed to choosing it from a drop down. The rationale is that Fedora in general does a pretty bad job of supporting /usr on its own mount point. We do such a bad job that even the install guide recommends against it.
Well, someone actually read the anaconda changelog (which is probably the most surprising of all) and decided to comment. The whole mail thread was focused on whether or not Fedora as a whole should allow /usr on its own partition so it was really only tangentially about anaconda. We largely stayed out of the conversation and it kind of died without any real conclusion. The more interesting parts of the thread were about per-user /tmp which doesn't really have anything to do with the initial post.
In a way, it's not all that different from what Moblin and Maemo did. They
used GNOME technologies with a different shell. We were ok with that
because they were expanding into new markets - netbooks and tablets - and
because it didn't seem like a step away from GNOME but a step forward with
GNOME. Canonical's move with Unity is similar. Except that they aren't
starting from scratch, they are moving from a traditional GNOME desktop to
Unity. So we feel the change more.
-- Stormy Peters
Comments (none posted)
Debian Edu/Skolelinux has released Debian "squeeze" based 6.0.0 alpha1.
"This is the second test release based on Squeeze. The focus of this
release is the thin clients and the diskless workstation setup. Please
install a thin client server, and make sure all programs in the KDE menu
work on both thin clients and diskless workstations. Especially sound is
important to test."
Full Story (comments: none)
Click below for a recap of the October 21, 2010 special meeting of the
Fedora Board. Names for Fedora 15, the voting schedule, and spins were
among the topics discussed.
Full Story (comments: none)
Click below for a recap of the October 25, 2010 meeting of the Fedora
Board. Topics include F14 release planning, F15 release names, Fedora
elections, and several other items.
Full Story (comments: none)
Fedora 14 was declared ready during the Go/No-Go meeting. Look for the
release announcement on November 2.
Full Story (comments: none)
Newsletters and articles of interest
Comments (none posted)
Ars technica has a report
from Mark Shuttleworth's keynote at the Ubuntu Developer Summit,
where Shuttleworth announced that the Unity shell will become Ubuntu's
default user interface for both the desktop and netbook editions. "I also asked Shuttleworth why Canonical is building its own shell rather than customizing the GNOME Shell. He says that Canonical made an effort to participate in the GNOME Shell design process and found that Ubuntu's vision for the future of desktop interfaces was fundamentally different from that of the upstream GNOME Shell developers. He says that GNOME's rejection of global menus, for example, is one of the key philosophical differences that would be difficult to reconcile. Canonical has accumulated a team of professional designers with considerable expertise over the past few years. They want to set their own direction and create a user experience that meets the needs of their audience. The other major Linux vendors, who are setting the direction of GNOME Shell's design, have different priorities and are arguably less focused than Ubuntu on serving basic desktop users.
Comments (127 posted)
Ars technica has a review
of Ubuntu 10.10. "Ubuntu 10.10, codenamed Maverick Meerkat, emerged from its burrow this month with some important changes. The user interface got a lift from some theming improvements and a new default font. Usability got a nice boost from a wide range of design improvements and feature enhancements in the Software Center and Ubiquity installer. Canonical's effort to clean up the notification area took another step forward with the addition of playback controls in the sound indicator menu. The latest version of GNOME is included, with a handful of minor improvements, and the F-Spot photo manager was replaced with Shotwell.
Comments (none posted)
Raphaël Hertzog blogs about
his work on the new source format known as "3.0 (quilt)". "This patch can have two functions: creating the required files in the debian sub-directory and applying changes to the upstream sources. Over time, if the maintainer made several modifications to the upstream source code, they would end up entangled (and undocumented) in this single patch. In order to solve this problem, patch systems were created (dpatch, quilt, simple-patchsys, dbs, ...) and many maintainers started using them. Each implementation is slightly different but the basic principle is always the same: store the upstream changes as multiple patches in the debian/patches/ directory and apply them at build-time (and remove them during cleanup).
Comments (none posted)
PCWorld has a review
of Linux Mint 10 RC. "Along with Ubuntu 10.10, Linux Mint 10 RC is based on version 2.6.35 of the Linux kernel along with version 2.32 of the GNOME desktop environment and X.org 7.5. All of these bring with them a raft of security and other improvements.
Comments (none posted)
Red Hat News takes a look
at the relationship between Fedora and Red Hat. "Red Hat participates in this process as part of the Fedora community, and its contributions to Fedora help enhance the technology selected by Fedora's substantial user and contributor base. Fedora helps Red Hat meet a goal of more scalable, extensible, and interoperable Red Hat Enterprise Linux, which is derived from Fedora.
Comments (none posted)
Yet Another Linux Blog looks at
TinyMe. "TinyMe is based on Unity Linux 2010 and was previously based on PCLinuxOS. It uses LXPanel, PCManFM and the Openbox Window Manager to handle the heavy desktop lifting. The ISO I used was a release candidate and lacked much of the polish of the TinyMe stable release of the past. Even though it's a release candidate, I still found it quite stable and usable..especially since I know my way around the openbox window manager.
Comments (none posted)
Page editor: Rebecca Sobol
Once hailed as the "next-generation" of package management, Conary was
introduced by Erik Troan in 2004
at the Ottawa Linux Symposium (OLS). Though Conary hasn't replaced
traditional Linux packaging technologies, it is in wider use than one might
think. The next release promises better system management, but is anyone
actually using Conary, and where's it going? The answers are yes, and
possibly beyond Linux.
Conary was meant to solve some deficiencies that exist in standard
package formats. For instance, package versioning as expressed by RPM
or Debian packages does not allow for branches, only a linear newer/older
model. Conary was developed to make it easier for users to create their own
distribution from a collection of repositories. The idea was that one
might pick and choose repositories from which to install GNOME, Firefox,
etc., rather than getting all of their software from a single distribution
like Fedora or Debian.
This has not quite come to pass, at least for most users. While there
are distributions using Conary, the primary usage of Conary these days
seems to be building custom distribution appliances for businesses.
How is Conary Different?
To learn more about Conary and its current state, we interviewed rPath's
Michael K. Johnson, founding engineer at rPath and founding technical
leader of the Fedora Project.
Despite its age, many Linux users have probably never heard of Conary or
only have heard of it in passing. Fewer still are likely to be familiar
with the details. Though Conary is lumped into discussions of package
management, it's a bit more than that. Conary is described as a "distributed
software management system" for Linux distributions, as opposed to a
package management system. Rather than managing software as "specialized
archives" (as Johnson calls RPMs and Debian packages), Conary packages are
references to files in a database. The packages contain references to
components, which are divided by their roles in a package — such as
runtime requirements, documentation, libraries, etc.
Conary actually works as a sort of distributed source control
system. Software comes from specific repositories, and the associations are
much more granular than Debian packages or RPMs. For example, it's possible
to remove a file from the system and, when the package that owns the file
is updated, the individual file is not reinstalled. Files are treated as
first class objects in Conary, and can be managed individually if necessary.
Packages can have branches called shadows, a customized
version of the package that references the original plus changes, or for
minimal changes it's possible to have a "derived package" that applies
changes without rebuilding a package. As the SCM heritage suggests, Conary
also has rollback capabilities that are much more elegant than what is
allowed by RPM or dpkg.
Conary also allows for "groups," something like a metapackage or task,
that pulls together the components that make up a collection of software
meant to be installed together. GNOME or KDE might be distributed as a
group, or a collection of server software that contains all of the
libraries, applications, and supporting software that needs to be
installed.
In short, Conary introduces much more detail and flexibility in managing
software.
Conary 2.2 is due out "soon," and Johnson says "near-final" snapshots
are already being used in Foresight Linux development. Johnson says that
2.2 introduces a new and more flexible way to manage systems:
Conary 2.2 introduces "system models", which you can think of as
"groups lite". A system model allows you to describe concisely how a
system is different from a group on which it is based. Instead of
building a group for each unique software combination, you can build
fewer base groups, and then express minor unique variations on a
per-system basis where conflict is unlikely. It makes it possible to
have a more dynamic building-block approach to configuring systems,
without giving up the control that Conary provides.
As an example, it is reasonable to have a model that expresses,
'This is one of my web application server systems, built from my
group-mywebapp manifest. It is a Dell server, so I will add to it the
Dell hardware support packages that my organization uses, which I have
bundled together in group-dell-packages. It is being deployed in my
Atlanta data center, so I need to add the administrative credentials
required for all systems deployed in my Atlanta data center, so also
include my atlanta-data-center-credentials package that sets up those
credentials.'
Johnson went on to give a detailed example of the process and group
definitions behind the changes that would be required to set up the server,
in just a few lines. The upshot here is that Conary 2.2 adds features that
make it very easy to clone and manage systems in a few commands.
System models are the primary new feature in 2.2, but Johnson says it
also has memory improvements and uses less bandwidth.
Conary and rPath Adoption
For all its technical advantages, Conary (like rPath) has yet to take
the world by storm. None of the major distributions have switched to Conary
as their package management system or base for development. But that
doesn't mean that it's not in use.
Conary and other package management systems are not mutually
exclusive. Johnson says there was an "epiphany," about what Conary could do
early in 2009.
We realized that we could analyze packages for
different packaging systems (e.g. RPM) and provide the same essential group
build capabilities for sets of packages (e.g. RPMs).
"encapsulating" other package formats was introduced in Conary 2.1. rPath
has been offering custom versions of CentOS, Red Hat Enterprise Linux, SUSE
Linux Enterprise, and rPath.
If you don't see much of Conary in the wild, where is it being used?
Johnson says that its largest audience is "enterprises using rPath's
product line, based on Conary, to manage diverse systems on a massive
scale," in other words "enterprise appliances." Johnson says that ISVs are
also a significant audience for Conary, and are using Conary and rPath's
tools to deliver software as software, hardware, and virtual appliances. He
cites the Department of Energy, EMC, Fujitsu, IBM, and Qualcomm as
customers who are using Conary and other rPath tools to build and manage
appliances.
In general, Johnson says that there are hundreds of rPath derivatives,
but not all are public:
The derivative products we have in mind are
not like the typical 'respin.' Our derivative products will be less
visible, precisely because our products were intentionally "unflavored" so
that the derivatives wouldn't require co-branding.
Some derivatives that use Conary aren't rPath-based at all. Johnson says
many customers are using rPath supported versions of SLES, RHEL, and CentOS
to create multiple products or "system definitions" from those
distributions.
Of course there's also Foresight Linux, which is based
on rPath and Conary. Foresight has had its ups and downs, and was on the ropes briefly when rPath
laid off the developers who were working on the distribution.
There's also interest in using Conary with major distributions, albeit
in a slightly different way. For instance, there's Boots,
a Conary-encapsulated mirror of Fedora. Interest in Boots started when Johnson proposed a change in
direction for Foresight Linux.
And Conary has been adopted for some derivative versions, like the Openfiler Storage Appliance.
In 2005, Johnson suggested that Conary was not limited to Linux, and
could be used by the BSDs and other operating systems. So far, Johnson says that
rPath has "not received significant feedback" from customers saying it'd be
worthwhile for the company to package any of the BSDs. It's technically
possible, but not in demand.
But the company is building support for managing Windows
packages. Johnson says that the company is building in support to create
MSI installable packages for Windows, and the company is specifically hiring
field engineers who have experience with MSI packaging and other
Windows system provisioning.
Johnson says that the company is also working on managing other package
types, so it may not be long before the rPath rBuilder tools support
creating Ubuntu or Debian based appliances as well.
Though Conary has not replaced traditional package management for most
Linux users or developers, nor has rPath become a household name for Linux
users, it's more successful than one might think at first glance. Conary
may well be worth a look for developers and ISVs that create software
appliances, or for enterprises that wish to have more control over the
management of their systems.
Comments (7 posted)
In fact, overall Py3k uptake is very slow. Which isn't very
surprising, as there's very little to gain by switching to it and
more than a little headache.
At any rate, most of the world appears to still be on Python 2.6
and can blissfully ignore that there's no 2.8 planned for
probably another year or more. Which means the Python devs still
have a year to come to their senses about discontinuing the
language people actually use.
-- Matt Mackall
Recently, a few python packagers from a couple Linux distributions
started thinking about wanting to port more python modules to
python3 to aid in migration and respond to the user demand for
python3 versions of some software. We came to the conclusion that
we're all feeling our way down this path, writing a little patch
here and a little patch there as we try to make a package here or a
package there compatible. This is good organic growth but has some
limitations: different distributions tending to reinvent the same
changes, patches floating around unaccepted by upstreams, and
common porting issues having to be discovered by each person
working to improve the situation.
-- Toshio Kuratomi
The Oracle employees who are members of the OpenOffice.org project
and who expressed themselves these past days have displayed a
disturbing lack of understanding of Free and Open Source Software;
LibreOffice is, after all, and until proven otherwise, a downstream
version of OpenOffice.org, and as such deserves inclusion into the
OpenOffice.org community. I can only imagine what it would be like
if Debian was rejecting the Ubuntu employees among its teams,
calling it a fork.
The sad thing in the bigger picture, is to see the community of
companies grow (and fail) faster than the lessons / experience of
past failures percolate through to their leadership. In my
experience most people totally miss the most difficult piece of
software engineering: which sounds like it should be software - but
is really about people - and more importantly - them working
together collaboratively in a constructive, friendly, and [...]
There is not one out-and-out success story of a company building a
great high-quality custom user interface on the standard Linux
stack, except Android, which is hardly a model of collaborative
development.
Comments (none posted)
The Asterisk 1.8 telephony system has been
released; this is a major,
long-term-supported release. New features include secure RTP support, IPv6
SIP support, calendaring integration, a new call logging system, and more;
see the announcement for more information than you will ever possibly be able
to use. (Thanks to Graham Cantin).
Comments (14 posted)
dvdauthor is the tool which
does the real work behind every other
Linux-based DVD authoring application. Version 0.7.0 has been released;
changes include better encoding support, more flexible configuration, and
more.
Full Story (comments: none)
KDE.News has the KDevelop 4.1 announcement. There are improvements in patch exporting,
script integration, PHP support, and a new hex editor, but the headline
feature seems to be Git support. "That means that we have support
for the basic features for management of a VCS-controlled project, like
moving, adding and removing files inside the project. Additionally we
integrate the basic VCS features like comparing and reviewing local
changes, sending our changes back to the server, updating the local
checkout and annotating files."
Comments (none posted)
Mozy, an online backup provider, has announced
the release of some of its
code under the BSD license; that includes the Mordor
C++ I/O library.
"Mordor is a high performance I/O library. One of its main goals is
to provide very easy-to-use abstractions and encapsulation of difficult and
complex concepts, yet still provide near absolute power in wielding them if
needed."
Comments (none posted)
Mozilla Labs has announced
the "Chromeless" project, aimed at making it easier for developers to
create browser interfaces. "The 'Chromeless' project experiments
with the idea of removing the current browser user interface and replacing
it with a flexible platform which allows for the creation of new browser UI."
There is a "pre-alpha prototype" available now.
Comments (30 posted)
Shed Skin is a Python-to-C++ compiler; the 0.6 release is now available.
It includes some major changes in how program analysis is done, allowing it
to scale to larger ("several thousands of lines") programs. See the
release announcement for more information.
Comments (none posted)
Valgrind 3.6.0 is available. New features include support for the ARM
architecture, updated distribution support, an understanding of the SSE4.2
instruction set, an improved profiler, an experimental new heap profiler,
and more; see the release notes for details.
Comments (none posted)
Newsletters and articles
Comments (none posted)
Over at Linux Magazine, Joe "Zonker" Brockmeier takes Zimbra Desktop 2.0 for a spin
, comparing it to the email interface that Gmail provides. While there is much to like with Zimbra, which is open source, he found Gmail to be easier to use. "Another major feature for Zimbra is that it allows you to use pretty much any mail service. So you can tie Zimbra Desktop to an IMAP server that you control, and all your mail belongs to you. Totally. For some folks, this feature alone is going to make Zimbra (or another email client) far more desirable than Gmail. While I am sometimes uneasy with Google's ever-increasing collection of data, I'm not personally concerned that someone at Google is reading my email. And Google's Gmail reliability has improved to the point that I haven't had a problem reaching my mail in several months."
Comments (14 posted)
The H has a
lengthy review of Rosegarden. "If access to computers and
networks have given us the means to copy and distribute recorded works,
they have also given us the means to create our own music. If computers and
networks have made it possible to take the programming art into the home,
they have also taken the black arts of the recording studio closer to the
back bedroom, and in so doing, have made the making of music more
accessible to greater numbers of people - much as free software has made it
easier for 'hobbyist' programmers to join in and test their skills and make
[...] Tools for GNU / Linux and free software have played their part in this evolution, and Rosegarden is a significant part of the canon, a well structured MIDI / audio sequencer and musical notation editor with a well thought out user interface which has put usability and ease of learning to the fore."
Comments (4 posted)
LinuxDevices.com has an
overview of the Yocto Project
just announced by the Linux Foundation. "Unlike build systems based on shell scripts or makefiles, the Yocto Project automates the fetching of sources from upstream sources or local project repositories, says the project. Its customization architecture is said to allow the choice of a wide variety of footprint sizes as well as control over the choice or absence of components such as graphics subsystems, visualization middleware, and services.
Yocto is based on the GNOME-derived Poky Linux, a well established
platform-independent, cross-compiling build system that uses the same
architecture as the OpenEmbedded build system.
Comments (9 posted)
Page editor: Jonathan Corbet
At the Embedded Linux Conference Europe, Tim Bird, architecture chair of
the Consumer Electronics Linux Forum (CELF), announced that the
organization was joining the Linux Foundation. Bird said that CELF
"couldn't be more happy to have the opportunity" to work within the LF.
Jim Zemlin, LF executive director, congratulated both organizations and
mentioned that the LF would be doubling the funding that CELF currently
puts into promoting embedded Linux. He also said that there would be some
more information about the new Yocto project
—an effort to standardize the embedded Linux development environment—later in the conference.
Update: The Linux Foundation press release about the merger is available as well.
Comments (2 posted)
The Fedora Scholarship program is accepting applicants from students who
will be entering college in Fall 2011. "The Fedora Scholarship
program recognizes one high school senior per year for contributions to the
Fedora Project and free software/content in general. The scholarship is a
$2,000 USD reward per year over each of the four years the recipient is in
college, which is funded by Red Hat's Community Architecture team, as well
as travel and lodging to the nearest FUDCon for each year of the
scholarship."
Full Story (comments: 4)
The mobile patent thicket grows thicker: a company called Gemalto has announced
the filing of a lawsuit
against Google, HTC, Motorola, and Samsung, claiming that
Android violates its patents. The
latter two of those patents were just issued in October; all seem to cover the revolutionary
concept of running an interpreted language on a microcontroller.
Comments (35 posted)
Articles of interest
Ars technica reports
on some changes to Nokia's mobile platform strategy. It plans to do more rapid and incremental Symbian releases, while making Qt the "sole focus" of its application development. "Nokia's plan to use Qt for all of its own applications is also significant. It will enable richer user interfaces and more consistency between Symbian and MeeGo. It also sends a strong message to third-party developers that Qt is ready for prime time on Nokia devices. The recent Qt 4.7 release brings some extremely compelling new functionality for building modern touch-friendly mobile software. Taking advantage of these capabilities will make the Symbian user experience better and help ameliorate some of the issues that detract from Symbian's competitiveness. During my recent tests of the N8, I often found myself thinking that the whole experience would be better if Qt was used pervasively in the bundled applications."
Comments (9 posted)
Ars technica looks
at the advantages of the Qt toolkit. "A point that I think often gets overlooked in the toolkit debate is that adopting Qt doesn't necessarily imply ditching GNOME or switching to KDE. As we discussed in our review of Qt 4.5 last year, Qt has relatively robust support for Gtk+ theming, including conformity with the GNOME HIG and support for native GNOME dialogs. When everything is properly configured, Qt applications look entirely at home in GNOME environments. Adding a standard Qt library stack to a fresh Ubuntu installation requires only 16.5MB of packages, which expands to approximately 50MB on disk."
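The Gtk+ integration the article describes comes from Qt's QGtkStyle class, available since Qt 4.5, which is normally selected automatically when a Qt application runs inside a GNOME session. As a minimal sketch of what that looks like in code (the button and its label are purely illustrative, not from the article):

    // Qt 4.5+ sketch: explicitly select the Gtk+-based style so that
    // widgets are drawn with the current GNOME theme. Under a GNOME
    // session this style is usually picked automatically; forcing it
    // here is only for illustration.
    #include <QApplication>
    #include <QGtkStyle>
    #include <QPushButton>

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);
        QApplication::setStyle(new QGtkStyle);  // QApplication takes ownership

        QPushButton button("Hello, GNOME");     // rendered via the Gtk+ theme engine
        button.show();
        return app.exec();
    }

The same effect should be available without any code changes by passing -style gtk+ on a Qt application's command line.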
Comments (65 posted)
The Free Software Foundation Europe has posted an interview with
. "I am fighting within The Pirate Party, as well as
in the Freedom not Fear movement, for Free Software. In both movements a
lot of people haven't understood yet how important Free Software is: FS
does not really connect the one with the other. They are connected in
different ways and I can also understand their critique about Free Software."
Comments (6 posted)
The H reports
that Christoph Noack, Florian Effenberger and Thorsten Behrens have
resigned from the OpenOffice.org community council. "Noack says in his email that his "idea of a stable and working open-source environment differs from what I currently perceive when we talk about certain community structure characteristics." Effenberger notes that he feels it's unfortunate that some people view OpenOffice.org and LibreOffice as separate and conflicting projects and that he hopes there will be a resolution in the future."
Comments (none posted)
Simon Phipps worries
that excessive focus on license compliance actions obscures the fact that free
software licenses make life easy for users. "Open source does not
place a compliance burden on the end user, does not mandate acceptance of
an end-user license agreement, does not subject you to para-police action
from the BSA. That is a significant advantage, and there's no wonder that
proprietary vendors want to hide it from you and make you think open source
licensing is somehow complex, burdensome or risky. If all you want to do is
use the software - which is all you are allowed to do with proprietary
software as the other three freedoms are entirely absent - then open source
software carries significantly less risk."
Comments (46 posted)
No Starch Press has released "Land of Lisp" by Conrad Barski, billed as a "Unique, Cartoon-Filled Guide" that "Makes Lisp Programming Fun".
Full Story (comments: none)
MAKE Magazine Volume 24 from O'Reilly Media is available.
Full Story (comments: none)
The October issue of the Linux Foundation newsletter covers the Linux
Foundation User Survey; New Open Compliance Resources Available; Linux
Kernel Summit & Plumbers Conferences Are Coming Up; Aava Mobile,
Insprit and OpenLogic Join The Linux Foundation; the Linux Foundation in
the News; and Upcoming Training Opportunities.
Full Story (comments: none)
Contests and Awards
The Electronic Frontier Foundation (EFF) has announced the winners of its 2010 Pioneer Awards. The winners are Pamela Jones
and Groklaw, Steven Aftergood, James Boyle, and Hari Krishna Prasad
Vemuru. "When Pamela Jones created Groklaw in 2003, she envisioned a new kind of participatory journalism and distributed discovery -- a place where programmers and engineers could educate lawyers on technology relevant to legal cases of significance to the Free and Open Source community, and where technologists could learn about how the legal system works. Groklaw quickly became an essential resource for understanding such important legal debates as the SCO-Linux lawsuits, the European Union antitrust case against Microsoft, and whether software should qualify for patent protection."
Comments (none posted)
Education and Certification
The Free Technology Academy (FTA) and the Free Software Foundation (FSF)
have announced their partnership in the FTA's Associate Partner Network.
"The Network aims to expand the availability of professional
educational courses and materials covering the concepts and applications of
Free Software and free standards."
Full Story (comments: none)
The linux.conf.au 2011 organizing team has announced that Vinton G. Cerf
will be a keynote speaker for lca2011. "Vinton G. Cerf has served as
vice president and chief Internet evangelist for Google since October
2005. In this role, he is responsible for identifying new enabling
technologies to support the development of advanced, Internet-based
products and services from Google. He is also an active public face for
Google in the Internet world."
Full Story (comments: none)
Events: November 4, 2010 to January 3, 2011
The following event listing is taken from the LWN.net Calendar.
ApacheCon North America 2010 (Atlanta, GA, USA)
Linux Plumbers Conference (Cambridge, MA, USA)
2010 LLVM Developers' Meeting (San Jose, CA, USA)
Free Society Conference and Nordic Summit
Technical Dutch Open Source Event
OpenOffice.org HackFest 2010
Free Open Source Academia Conference
OpenStack Design Summit (San Antonio, TX, USA)
NLUUG Fall conference: Security
8th International Firebird Conference 2010
Japan Linux Conference
Mini-DebConf in Vietnam 2010 (Ho Chi Minh City, Vietnam)
MeeGo Conference 2010
OpenFest - Bulgaria's biggest Free and Open Source conference
Kiwi PyCon 2010 (Waitangi, New Zealand)
Open Source Developers' Conference
Open Source Conference Shimane 2010
12. LinuxDay 2010
European OpenSource & Free Software Law Event
London Perl Workshop 2010 (London, United Kingdom)
PGDay Europe 2010
Open Source Conference Fukuoka 2010
If your event does not appear here, please
tell us about it.
Page editor: Rebecca Sobol