By Michael Kerrisk
January 30, 2013
Bdale Garbee has become a rather well-known presence at linux.conf.au,
having attended and spoken at every instance of the conference since 2002.
Until his retirement in 2012, he was also the public face of the
conference's long-time sponsor, Hewlett-Packard (HP). His keynote speech on the
opening day of linux.conf.au 2013 was entitled "The Future of the Linux
Desktop". However, given the various interests and projects that he has
championed and spoken about during his long involvement with the
conference, he also took some time to update the audience on some of those
projects.
Bdale, FOSS, LCA, and rockets
Bdale noted that his first personal contribution to what is now called
open source was in 1979. "My suspicion is that a few of you in
the audience weren't alive then." The first LCA that he had the
"privilege" to attend was in 2002, when he was invited to participate in the
Debian miniconference. He noted that attending LCA that year had a
significant impact on his life thereafter. At the conference, he was
encouraged to run once more for the role of Debian Project Leader (DPL),
and was this time successful, which had a ripple effect in many other
areas of his life.
Thus, for example, as DPL, Bdale was invited to be a keynote speaker at
LCA 2003. He was invited
to be a keynote speaker again at LCA 2004, and in the same year
his employer became a sponsor of the conference. After that, he noted that
there was a shift in the content of his talks at LCA. "I had
achieved a standing in the company that let me do things like flying at
company expense to Australia to talk about my hobbies."
As many people know, one of those hobbies is rockets. "On my
last official day of employment at HP, I was in Kansas at a rocket
launch." As well as launching his own rockets at great speeds and
to great heights, Bdale has also been involved in the creation of the Tripoli
Mentoring Program, which allows children as young as twelve to become
involved in
rocketry. Bdale and Keith Packard have also turned their rocketry hobby
into a successful 100% open source small business, Altus Metrum, selling rocketry
components.
Bdale remains involved with the FreedomBox
project, the subject of one of his talks at LCA 2012. He noted that
work on the project was taking longer than hoped, but was moving
forward. He leaves LCA on Friday to go straight to FOSDEM in Brussels,
where he will present the current state of FreedomBox with Eben Moglen.
How do we define the Linux desktop?
Bdale began his discussion of the Linux desktop with the observation
that the metaphor of a desktop is somewhat stale. In most parts of the
world, there has been a dramatic transition away from computers as things
that sit on desks. A question that follows from that observation is:
"When we talk about the 'desktop', are we really just talking about
the user interface of the device in front of us?"
Bdale's answer to that question is "no". His fridge and TV may be
running Linux, but he does not want a desktop interface for them. Rather,
what he sees as the desktop is the interface that is provided by a
universal computing device—one that lets him read mail, browse the
web, design rocket components, design objects for 3D printing, give
presentations, do accounts, develop software, run simulations, and so on.
Will Linux ever displace Windows?
Bdale is frequently asked "when will HP start shipping Linux on
desktop systems?". His answer is "a whole bunch of years
ago", since HP has for several years been shipping computers running
Linux to many parts of the world. However, he acknowledged that the user
interface that most people get when they buy their universal computing
device is likely to be Microsoft Windows, especially in the developed
world.
In "green field deployments", there have been many cases where Linux
desktops have been successfully rolled out. Here, Bdale noted a few of the
more well-known deployments, such as the deployment of Linux on around
80,000 desktops in the Extremadura school system, and deployments in the
city of Munich and the French parliament. But, he noted, if you have an
existing body of users who have been using Microsoft Windows, then the cost
of change can be pretty high. Retraining users on a new computing
environment can be a major task.
There are strong disincentives for OEMs such as HP to move away from
Windows. People used to ask Bdale whether HP might not save a lot of money
on license fees if they moved away from Windows, and instead preinstalled
Linux on the machines they sold. He used to ask the same question himself,
until he had the experience of spending six months working with HP's
netbook and notebook division in 2009 with the goal of helping them develop
an open source strategy. He made the mistake of trying to explain to the
senior vice president (SVP) with whom he worked that Linux could
save the division a lot of money.
However, the SVP explained "You don't understand. This doesn't
cost us money. The preinstalled software on a PC is a substantial
source of revenue." All of the preinstalled software that is
present on new Windows machines is there because two things happen once an
OEM starts shipping millions of PCs. First, the OEM starts paying much
lower license fees for Windows. Second, the OEM becomes in effect a major
software distribution channel. Software producers and providers of network
services want, and will pay for, the opportunity to appear in front of
millions of users. Thus, the ecosystem of preinstalled Windows-based
software subsidizes the cost of a PC; replacing that with Linux would
actually increase the price of the system.
In Bdale's opinion, it's difficult to displace Windows as the operating
system that is preinstalled on new PCs. Either one has to abandon that
idea, or come up with a user experience that is sufficiently
compelling that users prefer your system. We've already seen this happening
in the mobile device space, he said.
Displacing Windows is also difficult because of the joint marketing
that companies sometimes engage in. Bdale mentioned the
example of the Media Vault, a small personal network-attached storage
device that runs Linux and Samba. Microsoft approached HP proposing
to build a similar higher-specification device based on Windows. Over
time, the two companies invested significant amounts of money in developing
and marketing the new device, to the point where it ultimately displaced
the earlier free-software-based device from the market. Given the amount of
money that companies are prepared to invest in order to create successful
products, a result such as this is a natural outcome; it is difficult for
individual developers to compete with these sorts of perfectly legitimate
company behaviors.
Bdale noted that similar challenges apply when it comes to considering
whether Linux could displace Apple. In addition, there are other
challenges as well, such as the extent to which Apple's devices and walled
gardens "captivate" users.
A few rants regarding the Linux desktop
Bdale then noted that there would be "a few rants that I'd be
remiss in not mentioning". One of those rants springs from
observing how successful Linux has become (in the form of Android) in
the mobile market. One of his frustrations in this area is the amount of
energy that has gone into open source mobile projects that didn't succeed.
Furthermore, environments like Android use a lot of open source software
and employ many developers to work on open source technologies, but the
resulting products and ecosystems are not very open.
There is technical work being done in the mobile space (for example,
work on the kernel)
that is certainly proving
useful to the universal Linux desktop, Bdale said. On the other
hand, can one user interface really span all sorts of devices? The idea is
appealing. But the problem with user interfaces is that the capabilities of
devices vary so widely, with a wide range of screen sizes and input models
that range from keyboard-centric to touch-centric. These differences have
a big impact on the model of how a user interface should work.
"I thought the whole idea of personal computers with free software
was to really empower people." Our licensing structures are designed
to allow any user to also become a developer. Nevertheless, it's
perfectly okay to want to expand the user base of our software to welcome
people who don't consider themselves programmers. But the problem is that
some desktop projects seem to have become confused about who their
target audience is. "The problem is that so much of the work
that has gone into some of those projects has left me—and a lot of
users like me—feeling abandoned by Linux desktop developers."
Any time you think you're designing something for someone else, and not
something you want to use yourself, you are on a slippery slope.
Bdale said that he had had some horrifying experiences over the years
with desktop developers who were clearly not eating their own dog food. By
way of example, he noted that in conversation with some Evolution
developers at GUADEC some years ago, he found that not one of the
developers would admit to using Evolution to read their own
mail. "They laughed at me for using it to try and read my
mail." In this case, one of the fundamental tenets of free software
was being cast aside: the developers were not scratching their own itches.
What really matters on the desktop?
Recently, Debian changed its default desktop for the Wheezy release.
The problem was that GNOME became so large that it could not fit with the
rest of the Debian system on a single installation CD. As a consequence,
Joey Hess, one of the Debian maintainers, made an arbitrary decision in
August 2012 to change
the default desktop environment to the smaller Xfce desktop system, which
allowed the Debian install system to fit on a single CD. [Note: As commenters (and Bdale) have pointed out, the Xfce switch was never uploaded, so the report is "accurate", but the speaker misspoke.]
Bdale noted that the decision to change Debian's default desktop
generated relatively little heat. One reason for this is likely because the
user can change the desktop after installation. But, in his view, there is
also another reason: most people care more about applications than
desktops. The desktop doesn't matter until it gets in the way. Bdale's
key point here was that users are happy as long as they can run any
application on any desktop that they choose. Conversely, applications that
pull in dependencies on a particular desktop are a source of frustration for users,
who don't want to be forced to include software components they didn't
choose.
Efficiency is another thing that really matters to users. When users
get a faster computer, they expect their applications to run faster,
although this often doesn't turn out to be true in practice. Battery
life gives users another reason to care about efficiency. Bdale's point was
that a lot of the things that people get excited about in the desktop world
aren't that exciting to him. For example, he is not so interested in
compositing and other "bling" graphics features that tend to be expensive
in terms of CPU load and energy consumption.
Customization is another thing that really matters to users. For
example, when Bdale's then small daughter first encountered Star Office,
she wanted to know how to do things such as changing the font type and
size. This sort of customization is, he said, part of the process of taking
ownership of technology.
The ability to automate repetitive tasks is also
important. Scripts are many people's first step toward programming.
Providing graphical interfaces to allow common tasks to be quickly executed
is fine, but don't hide access to text interfaces that may be useful for
scripting repetitive tasks.
The final aspect of the desktop interface that Bdale noted as being
important is hackability. The interface should be something that the user
can work on and improve. We should be thinking of building a world where
the people using our systems want to learn our systems; we have
nothing to win by competing with people who are intent on creating
appliances that don't require any understanding by the user. "This
is why I hope we'll refocus on building things that we really like to
use."
Being able to understand and fix the software we use is important.
Bdale ultimately gave up trying to build the Evolution mail client because
the build environment was just too complex. That sort of complexity
prevents casual contributions to a project, because the effort of trying to
understand the system is too great. Bdale noted the Linux kernel as an
example of a project that has done a better job of managing complexity in
the build environment. He also referred to the statistics on Linux kernel
contributions that show a long tail of contributors who make just a single
improvement to the kernel during each release cycle. Those long-tail
contributions occur because users feel able to master the tools that are
needed in order to make a contribution.
What does it all mean?
Bdale finished up with a few summary points. First, it's great to feel
good about the success of Linux in the mobile space, in the form of
Android. Second, we need to pick realistic goals when developing desktop
software for Linux. There are some powerful market forces that make it
difficult to unseat Windows on the PC. Instead, we should put all of our
time and energy into building the systems that we want to use. Our
collaborative model is "awesomely powerful" and allows us to make huge
changes in the world. The key to success is that when we choose to
differentiate, we should do so in interoperable ways. Finally, we should
always empower users to contribute, so that we get the long-tail
contribution effect.
By Nathan Willis
January 30, 2013
The Serval project accounted for a number of
separate talks at linux.conf.au 2013,
but it was not the only project
focused on free software in mobile computing. Among the others making
a showing were OpenPhoenux,
which is an ongoing effort
to build and support new motherboards for the venerable Openmoko Freerunner phone, and Mozilla's Firefox OS,
whose team had demonstration hardware in tow. But there were also
several talks that dealt with that most mainstream of mobile Linux platforms,
Android, although in offbeat ways—such as deploying cutting-edge
digital radio modes like codec2 on commodity mobile handsets.
OpenPhoenux
Often, new mobile platform projects (such as Jolla with Sailfish OS)
spend a considerable amount of time searching for hardware partners
interested in building devices, but the OpenPhoenux project does just
the opposite. It is plowing straight ahead at making its own phone
motherboards, with interested hackers committing to pre-orders of
small batches. The result of this approach is a more expensive
per-unit price, but the hardware reaches the community quickly. Neil
Brown presented a session explaining the goals of OpenPhoenux and its
current status, with hints at what may be coming in the next few months.
In its day, the Openmoko Freerunner was an exceptional device; the
hardware was open in addition to the software. But the last
motherboards produced by Openmoko were made in 2009; enthusiasts have
subsequently been forced to look for old stock on eBay or other
second-hand markets—and even when found, the 2009-era hardware
shows its age. OpenPhoenux offers a brand new motherboard that fits
into the same case as the original Freerunner, but sports a faster CPU
(a 1GHz ARM Cortex-A8), more RAM (512MB), a 3G-capable modem, and an assortment of new or
updated sensors (GPS, FM transceiver, accelerometer, compass, gyroscope, and
altimeter). In addition, the various sensors and components have
shrunk enough in size since 2009 that there is even more room today,
so the new motherboard can overcome limitations in the original
Freerunner design, like the lack of stereo speakers.
The OpenPhoenux motherboard is called the GTA04, and it is
currently in revision 4 (denoted as GTA04A4). Brown said there are
about 300 of the motherboards in the wild, and users are running a
variety of software environments on them (including the Replicant Android distribution,
QtMoko, and a derivative
of the original GTK+-based Openmoko software), with several users
using the GTA04 as their primary phone. Brown devotes most of his time to
making sure that the mainline Linux kernel runs on the GTA04, which he
described as a work in progress. There are a number of missing
drivers at the moment, including the FM transceiver and the altimeter,
and there are other sensors which are simply tricky to figure out how
to support. For example, he said, what is an accelerometer? In some
senses, it is used as an input device, responding to the user's
movement, but at the same time, it produces a constant stream of
sensor data whether the user is actively using it or not. As a result,
there are two interfaces that need to be supported: that of a standard
input device, and that of an Industrial IO (IIO) device. The kernel's
IIO subsystem has recently moved out of staging into the mainline
kernel, he said, but it is still overly complex and not great to work with.
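For the curious, an IIO device typically appears as a directory of sysfs attributes. The following Python sketch shows roughly how one might read a calibrated accelerometer value from such a device; the device path and channel names are illustrative (they vary from driver to driver), and this is not code from the GTA04 project itself.

```python
import pathlib

def read_iio_accel(dev_dir="/sys/bus/iio/devices/iio:device0", axis="x"):
    """Read one accelerometer axis from an IIO sysfs device.

    Returns the value in physical units: the raw reading multiplied
    by the driver-reported scale factor. The default device path and
    the in_accel_* attribute names follow the IIO sysfs conventions,
    but the exact files present depend on the driver.
    """
    dev = pathlib.Path(dev_dir)
    raw = int((dev / f"in_accel_{axis}_raw").read_text())
    scale = float((dev / f"in_accel_{axis}_scale").read_text())
    return raw * scale
```

The same physical sensor may additionally be exposed as an evdev input device (under /dev/input/) for the "respond to the user's movement" use case, which is exactly the dual-interface problem Brown described.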
Keeping the kernel running is a constant challenge, Brown said, as
there are always new bugs and old bugs reintroduced. He outlined a
dozen or so from the period between kernels 3.5 and 3.7, such as CPU
checks that incorrectly assume that cpu_is_omap34xx and
cpu_is_omap3630 are mutually exclusive conditions—which
they are not, as the GTA04 proves. There is also an ongoing struggle
to minimize power consumption on the device, which Brown described as
a nebulous challenge that touches many areas of the system at once and
can be hard to track down. "If I see a green tinge on the
screen, at least I know it is a problem with the video
subsystem," he said. Still, it is a fun exercise, he said, and
one that he learns from constantly. The next revision of the GTA04
boards (GTA04A5) is tentatively scheduled for March, if enough
preorders are made by that time. The exact price depends on the
number of orders, but is around the $500 range—there are plenty
of cheaper alternatives, he cautioned: the OpenPhoenux's primary
selling point is its complete openness. For now, that openness
extends beyond the motherboard: users have recently started making
their own alternative cases with 3D printers. In the future, some
project members are looking for ways to add hardware keyboards, or to
put the GTA04 into a tablet form factor with a bigger screen and battery.
Firefox OS and Android
On the subject of soon-to-be-available phone hardware, Mozilla developer Ben Kero presented a
demonstration of Firefox OS running on the recently-announced
developer phones at the mobile
FOSS miniconf on Monday. Users have been able to run Firefox OS in a
simulated environment through an add-on
for quite some time, but seeing it running on actual devices is far
more compelling. The developer phones are modest compared to a
cutting-edge Android device (the model on hand was the lower-end of
the two, with a 1GHz processor), but the demo made it clear that
Firefox OS is fast and responsive on them.
The developer phones are slated to be sold through Geeksphone, a
company that specializes in unlocked, hackable phone hardware, but two
models is still not much of a selection. Several people in the
audience expressed interest in what other hardware devices Firefox OS
would be capable of running on. The good news is that Mozilla
evidently tracks the official builds of the CyanogenMod project, and
the Firefox OS build system automatically builds images for any device
supported by CyanogenMod 9 or later. Firefox OS uses the kernel and hardware adaptation from
CyanogenMod, Kero said, "we just tore out the Java."
Perhaps predictably, a round of applause from the audience followed that comment.
Which is not to say that Android was a punching bag at the mobile
computing talks at LCA 2013; there were in fact several talks that
focused on Android specifically. But by and large, Android was seen
as the commodity test platform on which other, more interesting work
was based. Project
Grimlock, for example, is a port of the Rhino
JavaScript implementation to Android's Dalvik. Grimlock developer
Maksim Lin also presented a session on modifying Android to run as an
appliance-like operating system, like one might find in a kiosk or
digital sign: a single application, presenting a stripped down interface to the user.
As it turns out, running a kiosk-like appliance is relatively
straightforward, once one knows where to look for the undocumented
privileged APIs that Google apps routinely take advantage of. Lin
said that many of the API calls are visible when reading
through Android source, but that there are directives in the
documentation that hide them whenever the documentation is rendered to
HTML for display on a public web site. It was a thought-provoking
talk that might make one seriously consider reading the Android source
for the first time.
Codec2 and digital voice radio
Pushing boundaries was a common theme outside of the mobile FOSS
miniconf as well. David
Rowe presented his recent
work on an extremely low-bitrate speech
codec called codec2 which he
has been developing for digital radio in the amateur-licensed HF and VHF bands. Codec2
can encode intelligible human speech in 1400 bits per second, which is
a fraction of the bitrate required by analog voice modes in amateur
radio. More importantly, it is significantly lower than the 2400 bits
per second needed by comparable proprietary codecs. Digital voice
modes in radio are a hot topic at the moment, and Rowe cautioned that
if proprietary offerings were allowed to be blessed into standards,
they would lock out open source, potentially for several decades to
come.
The good news is that Rowe's work on codec2 does appear to be
gaining traction with ham radio enthusiasts—and that, he said,
bodes well for its future, because ham radio groups are influential
with police, medical, and emergency services when the latter look to
establish standards. At the moment, codec2 is usable immediately with
Rowe's FreeDV, a cross-platform GUI
application. Rowe demonstrated FreeDV and compared samples encoded
with codec2 to other options. The FreeDV GUI is primarily used on
Windows, he said, because most ham radio users are Windows-based,
"but we are converting them over."
A ham radio codec might sound out of place at first, considering
that ham radio is generally associated with hardware transceivers and
fixed stations, but therein lies the secret. Following Rowe's talk
was a demonstration by Joel Stanley, who showed codec2 running with
software defined radio (SDR) on an Android handset. The combination
of SDR and today's more powerful digital signal processors has the
potential to blur the lines significantly between previously distinct
classes of communication, he said.
Stanley's demo was decidedly "homebrew" in appearance, but much of
that stems from limits imposed by trying to create a
device-vendor-neutral solution. He would have preferred to build a
dongle that could plug directly into an Android phone's microphone
port and receive HF radio, but that proved impossible because of the
incompatible pinouts used by different phone makers. Instead, Stanley
hooked the Android phone up to a USB sound card, which did indeed
serve as a working codec2 receiver. As was the case with the Serval project talks
earlier in the event, it is intriguing to see free software used to
redefine something so widespread (and taken for granted) as a mobile
phone. It is a refreshing reminder that although Linux, through
Android, has made enormous waves in the mobile industry, those waves
do not stop at the edge of the app store.
By Nathan Willis
January 30, 2013
Linux on mobile devices is a perpetually hot topic, but the discussion
typically centers around Android, webOS, MeeGo, and other
commercially backed operating system projects. The Mobile FOSS
miniconf at linux.conf.au 2013
offered a decidedly different program,
highlighting projects that pushed mobile computing in directions of
little interest to phone carriers, such as the Serval project, which
focuses on freeing mobile phones from the cellular infrastructure
altogether.
Miniconfs are one-day tracks organized by volunteers, separate from
the main event schedule. The first two days of LCA featured thirteen
miniconfs, with topics ranging from security to open government.
Monday's Mobile FOSS miniconf featured eleven talks, although the
schedule was dominated—in a friendly way—by a series of
interconnected Serval sessions. The project builds a free software
mesh networking framework for mobile and
embedded devices. These devices can operate over the mesh network
without cellular connections, instead using WiFi, or (potentially) any
other radio layer. One of the sessions dealt with the project's
interest in building specialty phone hardware to use other radio
frequency bands, but most of the talks described how Serval runs on
current Android devices today.
Haiti and the origin of Serval
Paul Gardner-Stephen of Flinders University explained that the
Serval Project had its origin in the 2010 Haiti
earthquake, in the
wake of which humanitarian relief organizations faced unprecedented
logistical challenges—starting with the unavailability of
airports, highways, and harbors to even get physical access to the
disaster site. Gardner-Stephen likened the telecommunications
challenge to that physical access challenge, noting that without the
cellular telecom infrastructure, even high-end smartphones were
unusable—in spite of the fact that the devices are physically
capable of communicating directly to each other. Software, he said,
is all that prevents smartphones from offering even the short-range
connections offered by cheap walkie-talkies for children. Serval is
his attempt to overcome that limitation.
The Serval framework runs on top of Linux; at this time primarily
on Android devices, though the project has used routers as well (such
as the Mesh Potato) to
provide access points or uplinks to the Internet. In theory, phones should be
able to connect to one another directly in ad hoc WiFi mode, but the
reality is that ad hoc support has long been buggy
in Android (and has been disabled in recent versions), and ad hoc mode
interoperability between phone vendors is
poor enough to be unusable. Consequently, some devices must be run
in access point mode, while the others connect to them in client mode. Still,
Gardner-Stephen noted, in field trials the project has been able to
get phone-to-phone connections working over several hundred meters,
which is significantly more range than one sees in an urban
environment polluted by microwave oven interference (and by hundreds
of conference attendees simultaneously browsing the web).
Design
The Serval software consists of two layers, the servald
daemon and the applications that utilize it. The
project had freshly-updated Android packages available for
installation during the sessions (delivered from one of the speakers'
own phones running in access point mode). The main Serval package
installed both the servald daemon and Serval BatPhone, an
application that provides voice calling, text messaging, and file
sharing over the mesh. A second package installed a mapping
application intended to let teams keep track of individuals' locations
in the field.
The servald layer implements device identification and
discovery through a module named Distributed Numbering Architecture
(DNA). Voice calls, since they have stricter latency requirements than
other applications, are handled by a special module called VoMP. All
other services are implemented on top
of Rhizome, a file distribution module described as USENET-like because it
opportunistically stores-and-forwards every uploaded file to all
clients on the mesh. Rhizome can be used to propagate static files to
all clients. For example, field workers can take geotagged photos while on survey
trips; the photos are automatically propagated to other team members.
This protects against data loss, and it rapidly disseminates
information about points of interest or concern. But Rhizome can
also be used as a "datagram"-like protocol layer on which other
applications are built. Serval's text message analogue "MeshMS" is
implemented on top of Rhizome in just such a fashion.
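As a rough illustration of that store-and-forward behavior, each node simply keeps every bundle it has ever seen and re-offers it to any peer it meets. The Python sketch below captures the idea only; it is not Serval's actual wire protocol or API, and the class and method names are invented for the example.

```python
class Node:
    """A toy Rhizome-style node: every bundle it has seen is stored
    and exchanged with any peer it meets (store-and-forward)."""

    def __init__(self, name):
        self.name = name
        self.store = {}          # bundle id -> payload

    def insert(self, bundle_id, payload):
        """Add a new bundle (e.g. a geotagged photo) to the local store."""
        self.store[bundle_id] = payload

    def sync(self, peer):
        """Exchange bundles both ways, as two nodes in radio range would.
        Existing copies are never overwritten; the mesh only accumulates."""
        for bid, payload in peer.store.items():
            self.store.setdefault(bid, payload)
        for bid, payload in list(self.store.items()):
            peer.store.setdefault(bid, payload)

# A photo inserted on one phone reaches a third phone that was never
# in range of the first, carried by an intermediate node:
a, b, c = Node("a"), Node("b"), Node("c")
a.insert("photo-1", b"geotagged jpeg bytes")
a.sync(b)    # a meets b in the field
b.sync(c)    # later, b meets c
assert "photo-1" in c.store
```

The opportunistic flooding shown here is also what makes the "roomful of ducks" synchronization chatter described later a real scaling concern.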
Given the humanitarian relief scenarios that first prompted Serval,
it should come as no surprise that the network was designed to provide
users with security—disasters can put relief workers at risk of
harm from criminals or militants in addition to natural dangers. At
first run, each Serval client generates a 256-bit Elliptic Curve
public-private key pair. The public key is used as the client node's
Serval ID (SID); consequently, any client that knows the SID of the node
it wishes to exchange messages with already has enough information to
encrypt and/or cryptographically sign that message.
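The key architectural point is that a node's address is its key material. The toy Python sketch below illustrates that property only; Serval's real implementation generates an actual Curve25519 key pair (via the NaCl library), whereas here the "public key" is just a one-way hash standing in for the real thing, so no actual encryption is possible with it.

```python
import hashlib, secrets

def make_identity():
    # Toy stand-in for Serval identity creation. The real project
    # derives a Curve25519 public/private key pair; here the "public
    # key" is merely a one-way hash of a random secret. What matters
    # for the design is that the 256-bit public half doubles as the
    # node's network address (its SID).
    secret = secrets.token_bytes(32)
    public = hashlib.sha256(secret).digest()
    return secret, public

secret, public = make_identity()
sid = public.hex()                   # the SID *is* the public key

# Any node that has learned this SID already holds the key it needs
# to encrypt to, or verify messages from, this node -- no separate
# key-distribution step or certificate authority is required.
assert len(public) * 8 == 256
```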
Various applications are implemented on top of these core
services, including the aforementioned BatPhone and Serval Maps. For
the phone functions, users select their own "phone number" which is
propagated to other clients over the mesh. The phone number is simply
easier for users to dial than the SID public key itself, but VoMP and
MeshMS are encrypted. Project members described some other applications that they
have implemented in field tests with humanitarian relief
organizations, such as filing structured forms.
In the field
Gardner-Stephen described one of the Serval project's field tests
in detail; the project worked with the New Zealand Red Cross (NZRC) on
training exercises that placed NZRC teams in realistic disaster relief
scenarios. From the exercises, he said, the Serval project learned practical
lessons like the need to reduce the amount of chatter that clients
produce synchronizing with each other when there are many nodes in
close proximity. As he put it, Serval nodes are like ducks: they don't
quack much on their own, but when you get a roomful of them together
they won't shut up. The project also learned a number of
details about real-world relief work that are not directly software
driven, like the fact that NZRC has little interest in voice
communications, but being able to exchange short data messages is
vital.
The NZRC field tests allowed Serval developers to work with Iridium
satellite networking equipment as a backbone; in other instances they
have worked with OpenBTS to create a micro-cellular network. But
Gardner-Stephen said the project was also exploring the possibility of
building low-cost cellular phones with an Arduino-like microcontroller
"backpack" that could be filled with swappable radio modules. WiFi in
open-field conditions has far greater range than it does in crowded
cities (or conferences), but unlicensed spectrum like the ISM 900MHz band
offers the potential for greater range at lower power consumption.
Since the same bands are not available in every region, of course,
making the radio modules pluggable is seen as a critical feature.
At several points during the day, the question arose as to why the
project had invented its own protocol rather than using an existing
one, such as the mesh networking deployed by the One Laptop Per Child
(OLPC) project. The Serval developers indeed had rationales for each
case, although most of them boiled down to the same issue: making a
decentralized system that could work without any infrastructure. The
OLPC mesh network, for example, is based on 802.11s, but that
standard is used to build a mesh extension that branches out
from a fixed base station: without the base, the clients cannot
communicate among themselves. Similarly, early experiments with
voice calling revealed that the common SIP protocol found in most VoIP
clients was unsuitable for stand-alone mesh calling because it
ultimately relies on the certificate authority infrastructure of TLS.
The Serval project has built some impressive mesh networking
technologies, although it still has plenty of challenges ahead. The
hardware support issue is chief among them; the speakers observed that
WiFi antennas in most Android phones are low-priority because
phone-to-phone networking is not important to the device-maker. The
cellular antenna is positioned to work well, but the assumption is
that the WiFi antenna can be tucked in anyplace, since checking email
and browsing the web is less critical than making a voice call.
Whether Serval will overcome that hurdle by making its own hardware
remains to be seen, but it does demonstrate that cellular voice calls
are not always the most useful task a phone can perform. That is a
notion many smartphone users would likely agree with, but Serval shows
that humanitarian relief work, and entirely new networking topologies, are
among the possibilities.
Comments (none posted)
Page editor: Jonathan Corbet
Security
By Jake Edge
January 30, 2013
SCSI command filtering has been the source of a number of Linux kernel
problems over the
years. In order to allow unprivileged users to have access to the commands
needed for
playing and burning CDs/DVDs, for example, the privilege requirement for
sending SCSI commands was lowered. But that, in turn, caused
problems where those unprivileged users could issue commands that were rather
dangerous, including some that could destroy devices entirely. That led
to a SCSI command whitelist being added to
the 2.6.8 kernel, way back in 2004.
That whitelisting approach has itself proved problematic to the point
where it was proposed for removal in 2006;
that proposal failed due to
strong opposition from Linus Torvalds. A privilege escalation vulnerability that was
found in late 2011 is a more recent example where the filtering wasn't
strict enough. Another hole has recently been discovered; Paolo Bonzini
has posted a patch set to close the hole, while also
addressing some other deficiencies in the SCSI command filtering.
The hole is CVE-2012-4542,
which is caused by SCSI commands that overlap between device classes. The
existing filter is set up to distinguish between devices opened for
read-only and those opened for read-write. But in some cases the same command
opcode will write to one kind of device while it reads from
another. For example, the READ SUB-CHANNEL (0x42) command for an MMC
(CD or DVD) device is the same as the UNMAP command on a disk. So, using
the command to
request the sub-channel information for an audio CD would result in unmapping
logical blocks if sent to a disk.
There are other examples cited in the bug report and patches, but the basic
problem stems from the filtering not being aware of the destination device
class.
Without that information, it is not possible to be sure which opcodes
are actually read-only and which will write to the device. The first part
of Bonzini's patch set restructures the filter table to associate the device
class and direction (read or write) with each command. He also changes
blk_verify_command() to use the device class and new table.
Another chunk of the patch set adds more entries to the table both to add
"rare & obsolete device types" and more whitelisted
commands for existing device types.
The last piece of the set (beyond a minor cleanup) adds the ability to turn
off the whitelist on a
per-device basis. Currently, a process can be given the
CAP_SYS_RAWIO capability, which will allow it to send any SCSI
command to any device. But that makes for fairly coarse-grained control
because it allows access to all devices. In addition,
CAP_SYS_RAWIO may be used to
elevate privileges, which may argue against its use.
Bonzini adds a new sysfs file,
/sys/block/<device>/unpriv_sgio; if it is set to '1', the
command filter will be bypassed for any file descriptor that is not
read-only. This can be
used to pass suitable file descriptors to trusted processes, as described
in the patch:
This is useful for virtualization, where some trusted guests would like
to send commands such as persistent reservations, but still the virtual
machine monitor should run with restricted permissions.
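Assuming the patch is applied, the filter could be disabled for a single device with something like the following; the device name is purely illustrative, and the file exists only on patched kernels:

```shell
# Bypass the SCSI command filter for /dev/sdb only (hypothetical device);
# descriptors opened read-write on this device then pass commands unfiltered.
echo 1 > /sys/block/sdb/unpriv_sgio
```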
Other than some fairly minor quibbles from Tejun Heo, there have been no
comments on the patch set. Given that it fixes a CVE, it seems likely to
be picked up fairly soon (even if the CVE number in the patch subject may
get lost in translation to
Torvalds's Git tree). The other pieces of the patch set are perhaps less
important, but seem relatively uncontroversial.
Allowing non-root users to access hardware more or less directly is always
problematic from a security standpoint. There is always tension, though,
because users have strong ideas about how they want to use their systems.
The history of the SCSI command whitelist shows that it is rather difficult
to find the right balance between protecting the system and its hardware,
and making a system that is usable—at least for some definitions of "usable".
Comments (none posted)
Brief items
That's security in today's world. We have no choice but to trust
Microsoft. Microsoft has reasons to be trustworthy, but they also have
reasons to betray our trust in favor of other interests. And all we can do
is ask them nicely to tell us first.
-- Bruce Schneier on the Open Letter to Skype
That said, recently made security "improvements" to Java
SE 7 software don't prevent silent exploits at all. Users
that require Java content in the web browser need to rely
on a Click to Play technology implemented by several web
browser vendors in order to mitigate the risk of a silent
Java Plugin exploit.
-- Adam Gowdiak is unimpressed with recent Java security updates
Newegg refuses to settle in cases like this, even when it would be cheaper
to settle than to fight. They beat the hell out of Soverain, killed their
patent, and freed not just themselves, but all the firms that faced
potential extortion from them -- and all of us, who will pay higher prices
to keep these ticks nicely, comfortably bloated with their parasitic gains.
-- Cory Doctorow
We require that government agencies conducting criminal investigations use
a search warrant to compel us to provide a user's search query information
and private content stored in a Google Account—such as Gmail messages,
documents, photos and YouTube videos. We believe a warrant is required by
the Fourth Amendment to the U.S. Constitution, which prohibits unreasonable
search and seizure and overrides conflicting provisions in ECPA [Electronic
Communications Privacy Act].
-- Google
Comments (10 posted)
The Greatfire.org site has
a
detailed analysis of a man-in-the-middle attack apparently directed
against Chinese Github users. "It’s clear that a lot of software
developers in China rely on GitHub for their code sharing. Completely
cutting access affects big business. GitHub may just be too important to
block. That leaves the authorities in a real pickle. They can’t
selectively block content on GitHub nor monitor what users are doing
there. They also cannot block the website altogether lest they hurt
important Chinese companies. This is where man-in-the-middle attacks make
their entrance. By faking SSL certificates, the authorities can indeed
intercept and track traffic to encrypted websites."
Comments (21 posted)
New vulnerabilities
corosync: denial of service
Package(s): corosync
CVE #(s): (none)
Created: January 30, 2013
Updated: January 30, 2013
Description:
Corosync v2.3.0 fixes a potential denial of service, because HMAC was used without a key.
Comments (none posted)
cronie: file descriptor leak
Package(s): cronie
CVE #(s): CVE-2012-6097
Created: January 29, 2013
Updated: April 5, 2013
Description: From the openSUSE advisory:
cron: does not close file descriptors
before invocation of commands. See this bug report for more information.
Comments (none posted)
drupal: multiple vulnerabilities
Package(s): drupal7, drupal6
CVE #(s): (none)
Created: January 28, 2013
Updated: March 6, 2013
Description: From the Red Hat bugzilla:
Drupal upstream has released 6.28 and 7.19 versions to correct multiple security issues. See the Drupal advisory for SA-CORE-2013-001.
Comments (none posted)
glance: information leak
Package(s): glance
CVE #(s): CVE-2013-0212
Created: January 30, 2013
Updated: February 14, 2013
Description: From the Ubuntu advisory:
Dan Prince discovered an issue in Glance error reporting. An authenticated
attacker could exploit this to expose the Glance operator's Swift
credentials for a misconfigured or otherwise unusable Swift endpoint.
Comments (none posted)
inkscape: unintended file access
Package(s): inkscape
CVE #(s): CVE-2012-6076
Created: January 30, 2013
Updated: February 14, 2013
Description: From the Ubuntu advisory:
It was discovered that Inkscape attempted to open certain files from the
/tmp directory instead of the current directory. A local attacker could
trick a user into opening a different file than the one that was intended.
Comments (none posted)
ipa: authentication bypass
Package(s): ipa
CVE #(s): CVE-2012-5484
Created: January 24, 2013
Updated: February 25, 2013
Description:
From the Red Hat advisory:
A weakness was found in the way IPA clients communicated with IPA servers
when initially attempting to join IPA domains. As there was no secure way
to provide the IPA server's Certificate Authority (CA) certificate to the
client during a join, the IPA client enrollment process was susceptible to
man-in-the-middle attacks. This flaw could allow an attacker to obtain
access to the IPA server using the credentials provided by an IPA client,
including administrative access to the entire domain if the join was
performed using an administrator's credentials. (CVE-2012-5484)
Note: This weakness was only exposed during the initial client join to the
realm, because the IPA client did not yet have the CA certificate of the
server. Once an IPA client has joined the realm and has obtained the CA
certificate of the IPA server, all further communication is secure. If a
client were using the OTP (one-time password) method to join to the realm,
an attacker could only obtain unprivileged access to the server (enough to
only join the realm).
Comments (none posted)
ircd-ratbox: denial of service
Package(s): ircd-ratbox
CVE #(s): CVE-2012-6084
Created: January 25, 2013
Updated: February 11, 2013
Description: From the Debian advisory:
It was discovered that a bug in the server capability negotiation code of
ircd-ratbox could result in denial of service.
Comments (none posted)
libav: multiple vulnerabilities
Package(s): libav ffmpeg
CVE #(s): CVE-2012-2783, CVE-2012-2791, CVE-2012-2797, CVE-2012-2803, CVE-2012-2804
Created: January 28, 2013
Updated: February 18, 2013
Description:
From the CVE entries:
Unspecified vulnerability in libavcodec/vp56.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to "freeing the returned frame." (CVE-2012-2783)
Multiple unspecified vulnerabilities in the (1) decode_band_hdr function in indeo4.c and (2) ff_ivi_decode_blocks function in ivi_common.c in libavcodec/ in FFmpeg before 0.11 have unknown impact and attack vectors, related to the "transform size." (CVE-2012-2791)
Unspecified vulnerability in the decode_frame_mp3on4 function in libavcodec/mpegaudiodec.c in FFmpeg before 0.11 has unknown impact and attack vectors related to a calculation that prevents a frame from being "large enough." (CVE-2012-2797)
Double free vulnerability in the mpeg_decode_frame function in libavcodec/mpeg12.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to resetting the data size value. (CVE-2012-2803)
Unspecified vulnerability in libavcodec/indeo3.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to "reallocation code" and the luma height and width. (CVE-2012-2804)
Comments (none posted)
libssh: denial of service
Package(s): libssh
CVE #(s): CVE-2013-0176
Created: January 28, 2013
Updated: March 29, 2013
Description: From the Ubuntu advisory:
Yong Chuan Koh discovered that libssh incorrectly handled certain
negotiation requests. A remote attacker could use this to cause libssh to
crash, resulting in a denial of service.
Comments (none posted)
libvirt: code execution as root
Package(s): libvirt
CVE #(s): CVE-2013-0170
Created: January 29, 2013
Updated: February 22, 2013
Description: From the Red Hat advisory:
A flaw was found in the way libvirtd handled connection cleanup (when a
connection was being closed) under certain error conditions. A remote
attacker able to establish a read-only connection to libvirtd could use
this flaw to crash libvirtd or, potentially, execute arbitrary code with
the privileges of the root user.
Comments (none posted)
mingw-freetype: multiple vulnerabilities
Package(s): mingw-freetype
CVE #(s): CVE-2012-1126
CVE-2012-1127
CVE-2012-1128
CVE-2012-1130
CVE-2012-1131
CVE-2012-1132
CVE-2012-1133
CVE-2012-1134
CVE-2012-1135
CVE-2012-1136
CVE-2012-1137
CVE-2012-1138
CVE-2012-1139
CVE-2012-1140
CVE-2012-1141
CVE-2012-1142
CVE-2012-1143
CVE-2012-1144
Created: January 28, 2013
Updated: January 30, 2013
Description:
From the CVE entries:
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via crafted property data in a BDF font. (CVE-2012-1126)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via crafted glyph or bitmap data in a BDF font. (CVE-2012-1127)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (NULL pointer dereference and memory corruption) or possibly execute arbitrary code via a crafted TrueType font. (CVE-2012-1128)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via crafted property data in a PCF font. (CVE-2012-1130)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, on 64-bit platforms allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via vectors related to the cell table of a font. (CVE-2012-1131)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via crafted dictionary data in a Type 1 font. (CVE-2012-1132)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap write operation and memory corruption) or possibly execute arbitrary code via crafted glyph or bitmap data in a BDF font. (CVE-2012-1133)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap write operation and memory corruption) or possibly execute arbitrary code via crafted private-dictionary data in a Type 1 font. (CVE-2012-1134)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via vectors involving the NPUSHB and NPUSHW instructions in a TrueType font. (CVE-2012-1135)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap write operation and memory corruption) or possibly execute arbitrary code via crafted glyph or bitmap data in a BDF font that lacks an ENCODING field. (CVE-2012-1136)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via a crafted header in a BDF font. (CVE-2012-1137)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via vectors involving the MIRP instruction in a TrueType font. (CVE-2012-1138)
Array index error in FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid stack read operation and memory corruption) or possibly execute arbitrary code via crafted glyph data in a BDF font. (CVE-2012-1139)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via a crafted PostScript font object. (CVE-2012-1140)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap read operation and memory corruption) or possibly execute arbitrary code via a crafted ASCII string in a BDF font. (CVE-2012-1141)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap write operation and memory corruption) or possibly execute arbitrary code via crafted glyph-outline data in a font. (CVE-2012-1142)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (divide-by-zero error) via a crafted font. (CVE-2012-1143)
FreeType before 2.4.9, as used in Mozilla Firefox Mobile before 10.0.4 and other products, allows remote attackers to cause a denial of service (invalid heap write operation and memory corruption) or possibly execute arbitrary code via a crafted TrueType font. (CVE-2012-1144)
Comments (none posted)
moodle: man-in-the-middle attack
Package(s): moodle
CVE #(s): CVE-2012-6087
Created: January 28, 2013
Updated: April 3, 2013
Description: From the Red Hat bugzilla:
A security flaw was found in the way Moodle, a course management system (CMS), used (lib)cURL's CURLOPT_SSL_VERIFYHOST variable, when doing certificate validation (value of '1' meaning only check for the existence of a common name was used instead of value '2' - which also checks if the particular common name matches the requested hostname of the server). A rogue service could use this flaw to conduct man-in-the-middle (MiTM) attacks.
Comments (none posted)
nova: access controls bypass
Package(s): nova
CVE #(s): CVE-2013-0208
Created: January 30, 2013
Updated: February 10, 2013
Description: From the Ubuntu advisory:
Phil Day discovered that nova-volume did not validate access to volumes. An
authenticated attacker could exploit this to bypass intended access
controls and boot from arbitrary volumes.
Comments (none posted)
perl: code execution
Package(s): perl
CVE #(s): CVE-2012-6329
Created: January 25, 2013
Updated: February 19, 2013
Description:
From the Red Hat bugzilla entry:
A commit to the upstream perl git repository indicated that perl's Locale::Maketext was vulnerable to a flaw that could lead to arbitrary code execution if this function was executed on user-supplied input. Quoting the commit message:
Case 61251: This commit fixes a misparse of maketext strings that could
lead to arbitrary code execution. Basically, maketext was compiling
bracket notation into functions, but neglected to escape backslashes
inside the content or die on fully-qualified method names when
generating the code. This change escapes all such backslashes and dies
when a method name with a colon or apostrophe is specified.
Comments (none posted)
php-symfony2-Yaml: code execution
Package(s): php-symfony2-Yaml
CVE #(s): CVE-2013-1348, CVE-2013-1397
Created: January 28, 2013
Updated: February 4, 2013
Description:
From the Symfony advisory:
When parsing an input with Yaml::parse(), and if the input is a valid filename, the input is evaluated as a PHP file before being parsed as YAML. If the input comes from an untrusted source, malicious code might be executed.
Symfony applications are not vulnerable to this attack but if you are parsing YAML with the YAML component in your application, check that your code does not pass untrusted input to Yaml::parse(). Note that Yaml\Parser::parse() is not affected. (CVE-2013-1348)
The Symfony YAML component supports PHP objects parsing and dumping (via the !!php/object: XXX notation).
When parsing an untrusted input that contains a serialized PHP object, it will be unserialized by default, which can lead to malicious code being executed.
Symfony applications are not vulnerable to this attack but if you are parsing YAML in your application, check that your code does not pass untrusted input to Yaml::parse() or Yaml\Parser::parse(). (CVE-2013-1397)
Comments (none posted)
rubygem-activesupport: multiple vulnerabilities
Package(s): rubygem-activesupport
CVE #(s): CVE-2013-0333
Created: January 29, 2013
Updated: February 10, 2013
Description: From the Red Hat advisory:
A flaw was found in the way Active Support performed the parsing of JSON
requests by translating them to YAML. A remote attacker could use this flaw
to execute arbitrary code with the privileges of a Ruby on Rails
application, perform SQL injection attacks, or bypass the authentication
using a specially-created JSON request.
Comments (none posted)
rubygem-multi_xml: code execution
Package(s): rubygem-multi_xml
CVE #(s): CVE-2013-0175
Created: January 25, 2013
Updated: January 30, 2013
Description: From the Red Hat bugzilla entry:
A security flaw was found in the way multi_xml gem, a Ruby gem to provide swappable XML backends utilizing LibXML, Nokogiri, Ox, or REXML, performed Symbol and YAML parameters parsing. A remote attacker could use this flaw to execute arbitrary code with the privileges of the Ruby on Rails application using the multi_xml gem via specially-crafted HTTP POST request.
Comments (none posted)
rubygem-rack: multiple vulnerabilities
Package(s): rubygem-rack
CVE #(s): CVE-2012-6109, CVE-2013-0183, CVE-2013-0184
Created: January 28, 2013
Updated: March 15, 2013
Description:
From the Red Hat bugzilla [1], [2], [3]:
[1] Upstream released Rack 1.4.2, 1.3.7, 1.2.6, and 1.1.4 to fix a denial of service condition when Rack parses content with a certain Content-Disposition header as noted in the original report. (CVE-2012-6109)
[2] Upstream released [1] Rack 1.4.3 and 1.3.8 to fix a denial of service condition due to a malicious client sending excessively long lines that trigger an out-of-memory error in Rack. (CVE-2013-0183)
[3] A flaw that was fixed in 1.4.4, 1.3.9, 1.2.7, and 1.1.5 was also announced that creates a minor denial of service condition, this time in the Rack::Auth::AbstractRequest, where it symbolized arbitrary strings (apparently this has something to do with authentication, but there is no further information provided other than the fix itself, which is noted as "a breaking API change"). (CVE-2013-0184)
Comments (none posted)
samba4: privilege escalation
Package(s): samba4
CVE #(s): CVE-2013-0172
Created: January 25, 2013
Updated: February 5, 2013
Description:
From the Red Hat bugzilla entry:
Samba 4.0 as an AD DC may provide authenticated users with write access to LDAP directory objects.
In AD, Access Control Entries can be assigned based on the objectClass of the object. If a user or a group the user is a member of has any access based on the objectClass, then that user has write access to that object.
Additionally, if a user has write access to any attribute on the object, they may have access to write to all attributes.
Comments (none posted)
zabbix: LDAP authentication override
Package(s): zabbix
CVE #(s): CVE-2013-1364
Created: January 28, 2013
Updated: January 30, 2013
Description:
From the Red Hat bugzilla:
It was reported that the user.login method in Zabbix would accept a 'cnf' parameter containing the configuration parameters to use for LDAP authentication, which would override the configuration stored in the database. This can be used to authenticate to Zabbix using a completely different LDAP application (e.g. authenticate to Zabbix using some other LDAP directory the attacker has credentials for).
This has been corrected in upstream versions 2.1.0 r32446, 2.0.5rc1 r32444 and 1.8.16rc1 r32442.
Page editor: Jake Edge
Kernel development
Brief items
The current development kernel is 3.8-rc5, released on
January 25. The only announcement appears to be
this Google+
posting. Just over 250 fixes were merged since -rc4 came out; see
the short-form changelog for details.
Stable updates: 3.7.5,
3.4.28 and 3.0.61 were released on January 27.
Comments (none posted)
People really ought to be forced to read their code aloud over the
phone - that would rapidly improve the choice of identifiers
— Al Viro
Besides, wouldn't it be cool to see a group of rovers chasing each
other across Mars, jockeying for the best positioning to reduce
speed-of-light delays?
— Paul McKenney
The real problem is, Moore's Law just does not work for spinning
disks. Nobody really wants their disk spinning faster than [7200]
rpm, or they don't want to pay for it. But density goes up as the
square of feature size. So media transfer rate goes up linearly
while disk size goes up quadratically. Today, it takes a couple of
hours to read each terabyte of disk. Fsck is normally faster than
that, because it only reads a portion of the disk, but over time,
it breaks in the same way. The bottom line is, full fsck just isn't
a viable thing to do on your system as a standard, periodic
procedure. There is really not a lot of choice but to move on to
incremental and online fsck.
— Daniel Phillips
Comments (41 posted)
Kernel development news
By Jonathan Corbet
January 30, 2013
The kernel's block loop driver has a conceptually simple job: take a file
located in a filesystem somewhere and present it as a block device that can
contain a filesystem of its own. It can be used to manipulate filesystem
images; it is also useful for the management of filesystems for virtualized
guests. Despite having had some optimization effort applied to it, the
loop driver in current kernels is not as fast as some would like it to be.
But that situation may be about to change, thanks to an old patch set that
has been revived and prepared for merging in a near-future development
cycle.
As a block driver, the loop driver accepts I/O requests described by
struct bio (or "BIO")
structures; it then maps each request to a suitable block offset in the
file serving as backing store and issues I/O requests to perform the
desired operations on that file. Each loop device has its own thread,
which, at its core, runs a loop like this:
    while (1) {
        wait_for_work();
        bio = dequeue_a_request();
        execute_request(bio);
    }
(The actual code can be seen in drivers/block/loop.c.) This code
certainly works, but it has an important shortcoming: it performs I/O in a
synchronous, single-threaded manner. Block I/O is normally done
asynchronously when possible; write operations, in particular, can be done
in parallel with other work. In the loop above, though, a single, slow
read operation can hold up many other requests, and there is no
ability for the block layer or the I/O device itself to optimize the
ordering of requests. As a result, the performance of loop I/O traffic is
not what it could be.
In 2009, Zach Brown set out to fix this problem by changing the loop driver
to execute multiple, asynchronous requests at the same time. That
work fell by the wayside when other priorities took over Zach's time, so
his patches were never merged. More recently, Dave Kleikamp has
taken over this patch set, ported it to current kernels, and added support to
more filesystems. As a result, this patch set may be getting close to
being ready to go into the mainline.
At the highest level, the goal of this patch set is to use the kernel's
existing asynchronous I/O (AIO) mechanism in the loop driver. Getting
there takes a surprising amount of work, though; the AIO subsystem was
written to manage user-space requests and is not an easy fit for
kernel-generated operations. To make these subsystems work together, the
30-part patch set takes a bottom-up
approach to the problem.
The AIO code is based around a couple of structures, one of which is
struct iovec:
struct iovec {
void __user *iov_base;
__kernel_size_t iov_len;
};
This structure is used by user-space programs to describe a segment of an
I/O operation; it is part of the user-space API and cannot be changed.
Associated with this structure is the internal iov_iter structure:
struct iov_iter {
const struct iovec *iov;
unsigned long nr_segs;
size_t iov_offset;
size_t count;
};
This structure (defined in <linux/fs.h>) is used by the
kernel to track progress working through an
array of iovec structures.
Any kernel code needing to submit asynchronous I/O needs to express it in
terms of these structures. The problem, from the perspective of the loop
driver, is that struct iovec deals with user-space addresses. But
the BIO structures representing block I/O operations deal with physical
addresses in the form of struct page pointers. So there is an
impedance mismatch between the two subsystems that makes AIO unusable for
the loop driver.
Fixing that involves changing the way struct iov_iter works. The
iov pointer becomes a generic pointer called data that
can point to an array of iovec structures (as before) or, instead,
an array of kernel-supplied BIO structures. Direct access to structure
members by kernel code is discouraged in favor of a set of defined
accessor operations; the iov_iter structure itself gains a pointer
to an operations structure
that can be changed depending on whether iovec or bio
structures are in use. The
end result is an enhanced iov_iter structure and surrounding
support code that allows AIO operations to be expressed in either
user-space (struct iovec) or kernel-space (struct bio)
terms. Quite a bit of code using this structure must be adapted to use the
new accessor functions; at the higher levels, code that worked directly
with iovec structures is changed to work with the
iov_iter interface instead.
The next step is to make it possible to pass iov_iter structures
directly into filesystem code. That is done by adding two more functions
to the (already large) file_operations structure:
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *, loff_t);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *, loff_t);
These functions are meant to work much like the existing
aio_read() and aio_write() functions, except that they
work with iov_iter structures rather than with iovec
structures directly. A filesystem supporting the new operations must be
able to cope with I/O requests expressed directly in BIO structures —
usually just a matter of bypassing the page-locking and mapping operations
required for user-space addresses. If these new operations are provided,
the aio_*() functions will never be called and can be removed.
After that, the patch set adds a new interface to make it easy for kernel
code to submit asynchronous I/O operations. In short, it's a matter of
allocating an I/O control block with:
struct kiocb *aio_kernel_alloc(gfp_t gfp);
That block is filled in with the relevant information describing the
desired operation and a pointer to a completion callback, then handed off
to the AIO subsystem with:
int aio_kernel_submit(struct kiocb *iocb);
Once the operation is complete, the completion function is called to
inform the submitter of the final status.
A substantial portion of the patch set is dedicated to converting
filesystems to provide read_iter() and write_iter()
functions. In
most cases the patches are relatively small; most of the real work is done
in generic code, so it is mostly a matter of changing declared types and
making use of some of the new iov_iter accessor functions. See the ext4 patch for an example of what needs to
be done.
With all that infrastructural work done, actually speeding up the loop
driver becomes straightforward. If the backing store for a given loop
device implements the new operations, the loop driver will use
aio_kernel_submit() for each incoming I/O request. As a result,
requests can be run in parallel with, one hopes, a significant improvement
in performance.
The patch set has been through several rounds of review, and most of the
concerns raised would appear to have been addressed. Dave is now asking
that it be included in the linux-next tree, suggesting that he intends to
push it into the mainline during the 3.9 or 3.10 development cycle. Quite
a bit of kernel code will be changed in the process, but almost no
differences should be visible from user space — except that block loop
devices will run a lot faster than they used to.
Comments (7 posted)
By Jonathan Corbet
January 30, 2013
Contemporary compilers are capable of performing a wide variety of
optimizations on the code they produce. Quite a bit of effort goes into
these optimization passes, with different compiler projects competing to
produce the best results for common code patterns. But the nature of
current hardware is such that some optimizations can have surprising
results; that is doubly true when kernel code is involved, since kernel
code is often highly performance-sensitive and provides an upper bound on
the performance of the system as a whole. A recent discussion on the
best optimization approach for the kernel shows how complicated the
situation can be.
Compiler optimizations are often aimed at making frequently-executed code
(such as that found in inner loops)
run more quickly. As an artificially simple example, consider a loop like
the following:
for (i = 0; i < 4; i++)
    do_something_with(i);
Much of the computational cost of a loop like this may well be found in the
loop structure itself — incrementing the counter, comparing against the
maximum, and jumping back to the beginning. A compiler that performs loop
unrolling might try to reduce that cost by transforming the code into something like:
do_something_with(0);
do_something_with(1);
do_something_with(2);
do_something_with(3);
The loop overhead is now absent, so one would expect this code to execute
more quickly. But there is a cost: the generated code may well be larger
than it was before the optimization was applied. In many situations, the
performance improvement may outweigh the cost, but that may not always be
the case.
GCC provides an optimization option (-Os) with a different
objective: it instructs the compiler to produce more compact code, even if
there is some resulting performance cost. Such an option has obvious value
if one is compiling for a space-constrained environment like a small
device. But it turns out that, in some situations, optimizing for space
can also produce faster code. In a sense, we are all running
space-constrained systems, in that the performance of our CPUs depends
heavily on how well those CPUs are using their cache space.
Space-optimized code can make better use of scarce instruction cache space,
and, as a result, perform better overall. With this in mind, compilation
with -Os was made
generally available for the 2.6.15 kernel in 2005 and made
non-experimental for 2.6.26 in 2008.
Unfortunately, -Os has not always lived up to its promise in the real world.
The problem is not necessarily with the idea of
creating compact code; it has more to do with how GCC interprets the
-Os option. In the space-optimization mode, the compiler tends to
choose some painfully slow instructions, especially on older processors. It
also discards the branch prediction information provided by kernel
developers in the form of the likely() and unlikely()
macros. That, in turn, can cause rarely executed code to share cache space
with hot code, effectively wasting a portion of the cache and wiping out
the benefits that optimizing for space was meant to provide.
Because -Os did not produce the desired results, Linus disabled
it by default in 2011, effectively ending experimentation with this
option. Recently, though, Ling Ma posted some
results suggesting that the situation might have changed. Recent Intel
processors, it seems, have a new cache for decoded instructions, increasing
the benefit obtained by having code fit into the cache. The performance of
the repeated "move" instructions used by GCC for memory copies in
-Os mode has also been improved in newer processors. The posted
results claim a 4.8% performance improvement for the netperf benchmark and
2.7% for the volano benchmark when -Os is used on a newer CPU. Thus, it was
suggested, maybe it is time to reconsider -Os, at least for some
target processors.
Naturally, the situation is not quite that simple. Valdis Kletnieks complained that the benchmark results may not
be showing an actual increase in real-world performance. Distributors hate
shipping multiple kernels, so an optimization mode that only works for some
portion of a processor family is unlikely to be enabled in distributor
kernels. And there is
still the problem of the loss of branch prediction information which, as
Linus verified, still happens when
-Os is used.
What is really needed, it seems, is a kernel-specific optimization mode
that is more focused on instruction-cache performance than code size in its
own right. This mode would take some behaviors from -Os while
retaining others from the default -O2 mode. Peter Anvin noted that the GCC developers are receptive to
the idea of implementing such a mode, but there is nobody who has the time
and inclination to work on that project at the moment. It would be nice to
have a developer who is familiar with both the kernel and the compiler and
who could work to make GCC produce better code for the kernel environment.
Until somebody steps up to do that work, though, we will likely have to
stick with -O2, even knowing that the resulting code is not as
good as it could be.
Comments (37 posted)
By Michael Kerrisk
January 30, 2013
We are accustomed to thinking of a system call as
being a direct service request to the kernel. However, in reality, most
system call invocations are mediated by wrapper functions in the GNU C
library (glibc). These wrapper functions eliminate work that the programmer
would otherwise need to do in order to employ a system call. But it turns
out that glibc does not provide wrapper functions for all system calls,
including a few that see somewhat frequent use. The question of what (if
anything) to do about this situation has arisen a few times in the last few
months on the libc-alpha mailing list, and has recently surfaced once more.
A system call allows a program to request a service—for example,
open a file or create a new process—from the kernel. At the assembler
level, making a system call requires the caller to
assign the unique system call number and the argument values to particular
registers, and then execute a special instruction (e.g., SYSENTER on modern
x86 architectures) that switches the processor to kernel mode to execute
the system-call handling code. Upon return, the kernel places the system
call's result status into a particular register and executes a special
instruction (e.g., SYSEXIT on x86) that returns the processor to user
mode. The usual convention for the result status is that a non-negative
value means success, while a negative value means failure. A negative
result status is the negated error number (errno) that indicates
the cause of the failure.
All of the details of making a system call are normally hidden from the
user by the C library, which provides a corresponding wrapper function and
header file definitions for most system calls. The wrapper function accepts
the system call arguments as function arguments on the stack, initializes
registers using those arguments, and executes the assembler instruction
that switches to kernel mode. When the kernel returns control to user mode,
the wrapper function examines the result status, assigns the (negated)
error number to errno in the case of a negative result, and
returns either -1 to indicate an error or the non-negative result status as
the return value of the wrapper function. In many cases, the wrapper
function is quite simple, performing only the steps just described. (In
those cases, the wrapper is actually autogenerated from
syscalls.list files in the glibc source that tabulate the types
of each system call's return value and arguments.) However, in a few cases
the wrapper function may do some extra work such as repackaging arguments
or maintaining some state information inside the C library.
The C library thus acts as a kind of gatekeeper on the API that the kernel
presents to user space. Until the C library provides a wrapper function,
along with suitable header files that define the calling signature and any
constant and structure definitions used by the system call, users must
do some manual work to make a system call.
That manual work includes defining the structures and constants needed
by the system call and then invoking the syscall() library
function, which handles the details of making the system call—copying
arguments to registers, switching to kernel mode, and then setting
errno once the kernel returns control to user space. Any system
call can be invoked in this manner, including those for which the C library
already provides a wrapper. Thus for example, one can bypass the wrapper
function for read() and invoke the system call directly by
writing:
nread = syscall(SYS_read, fd, buf, len);
The first argument to syscall() is the number of the system
call to be invoked; SYS_read is a constant whose
definition is provided by including <unistd.h>.
The C library used by most Linux developers is of course the GNU C
library. Normally, glibc tracks kernel system call changes quite
closely, adding wrapper functions and suitable header file definitions to
the library as new system calls are added to the kernel. Thus, manually
coding system calls is normally only needed when trying to use the
latest system calls that have not yet appeared in the most recent iteration
of glibc's six-month release cycle or when using a recent kernel on a
system that has a significantly older version of glibc.
However, for some system calls, glibc support never appears. The
question of how the decision is made on whether to support a particular
system call in glibc has once again become a topic of discussion on the
libc-alpha mailing list. The most recent discussion started when Kees Cook,
the implementer of the recently added
finit_module() system call, submitted a rudimentary patch to add glibc
support for the system call. In response, Joseph Myers and Mike Frysinger
noted various pieces that were missing from the patch, with Joseph
adding that "in the
kexec_load discussion last May / June, doubts were expressed about whether
some existing module-related syscalls really should have had functions in
glibc."
The module-related system calls—init_module(),
delete_module(), and so on—are among those for which glibc
does not provide support. The situation is in fact slightly more complex
in the case of these system calls: glibc does not provide any header file
support for these system calls but does, through an accident of history,
export a wrapper function ABI for the calls.
The earlier discussion that Joseph referred to took place when
Maximilian Attems attempted to add a header file to glibc to provide
support for the kexec_load() system call, stating that his aim was "to axe the
syscall maze in kexec-tools itself and have this syscall supported in
glibc." One of the primary glibc maintainers, Roland McGrath, had a rather different take on the
necessity of such a change, stating "I'm not really convinced this
is worthwhile. Calling 'syscall' seems quite sufficient for such arcane
and rarely-used calls." In other words, adding support for these
system calls clutters the glibc ABI and requires (a small amount of) extra
code in order to satisfy the needs of a handful of users who could just use
the syscall() mechanism.
Andreas Jaeger, who had reviewed earlier versions of Maximilian's
patch, noted that
"linux/syscalls.list already [has] similar esoteric syscalls like
create_module without any header support. I wouldn't object to do this for
kexec_load as well". Roland agreed
that the kexec_load() system call is a similar case, but felt that
this point wasn't quite germane, since adding the module system calls to
the glibc ABI was a "dubious" historical step that can't be reversed for
compatibility reasons.
But in the recent discussion of finit_module(), Mike Frysinger
spoke in favor of adding full glibc support
for module-related system calls such as init_module(). Dave
Miller made a similar argument even more
succinctly:
It makes no sense for every tool that wants to support
doing things with kernel modules to do the syscall()
thing, propagating potential errors in argument signatures
into more than one location instead of getting it right in
one canonical place, libc.
In other words, employing syscall() can be error prone: there is
no checking of argument types nor even checking that sufficient arguments
have been passed.
Joseph Myers felt that the earlier
kexec_load() discussions hadn't fully settled the issue, and was
interested in having some concrete data on how many system calls don't have
glibc wrappers. Your editor subsequently donned his man-pages maintainer
hat and grepped the man pages in section 2 to determine which system calls
do not have full glibc support in the form of a wrapper function and header
files. The resulting list turns out to be
quite long, running to nearly 40 Linux system calls. However, the
story is not quite so simple, since some of those system calls are obsolete
(e.g., tkill(), sysctl(), and query_module())
and others are intended for use only by the kernel or glibc (e.g.,
restart_syscall()). Yet others have wrappers in the C library,
although the wrappers have significantly different names and provide some
piece of extra functionality on top of the system call (e.g.,
rt_sigqueueinfo() has a wrapper in the form of the sigqueue()
library function). Clearly, no wrapper is required for those system calls,
and once they are excluded there remain perhaps 15 to 20 system calls
that might be candidates to have glibc support added.
Motohiro Kosaki considered that the
remaining system calls could be separated into two categories: those used by
only one or a few applications and those that seemed to him to have
more widespread use. Motohiro was agnostic about whether
the former category (which includes the module-related system calls,
kcmp(), and kexec_load()) required a wrapper. However, in
his opinion the system calls in the latter category (which includes system
calls such as ioprio_set(), ioprio_get(), and
gettid()) clearly merited having full glibc support.
The lack of glibc support for gettid(), which returns the
caller's kernel thread ID, is an especially noteworthy case. A
long-standing glibc bug report
requesting that glibc add support for this system call gained little traction
with the previous glibc maintainer. However, excluding that system call is
rather anomalous, since it is quite frequently used and the kernel exposes
thread IDs via various /proc interfaces, and glibc exposes various
kernel APIs that can employ kernel thread IDs (for example,
sched_setaffinity(), fcntl(), and the
SIGEV_THREAD_ID notification mode for POSIX timers).
The discussion has petered out in the last few days, despite Mike
Frysinger's attempt to further push the debate along by reading and
summarizing the various pro and contra arguments in a single email. As noted by various
participants in the discussion, adding glibc wrappers for some currently
unsupported system calls would seem to have some worthwhile benefits. It
would also help to avoid the confusing situation where programmers
sometimes end up searching for a glibc wrapper function and header file
definitions that don't exist. It remains to be seen whether these arguments
will be sufficient to persuade Roland in the face of his concerns about
cluttering the glibc ABI and adding extra code to the library for the
benefit of what he believes is a relatively small number of users.
Comments (20 posted)
Patches and updates
Kernel trees
Build system
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Memory management
Networking
Architecture-specific
Security-related
Miscellaneous
Page editor: Jonathan Corbet
Distributions
By Jake Edge
January 30, 2013
Since Oracle's purchase of Sun Microsystems in 2009, several of Sun's
high-profile
free software projects have fallen by the wayside. OpenSolaris,
OpenOffice.org, Hudson, and others have all had their communities upended,
to some extent, due to Oracle's inattention or worse. MySQL has largely
avoided that fate, but that seems to be changing—at least
partly because MySQL is being developed in a more closed way.
The emergence of MariaDB as a drop-in
replacement for MySQL over the last few years makes it a viable
alternative. Beyond that, MySQL has changed some of its policies and
practices over the same period in ways that make it less attractive for Linux
distributions. Changes to the MySQL security reporting practices, the visibility
of bugs in the bug tracker, and the lack of a full regression test suite
are all cited as reasons to consider switching.
Several distributions are either considering switching to MariaDB as the
default, or have already done so. Mageia 2, which was released in
mid-2012, was the first to adopt MariaDB as the default instead of MySQL.
On January 25, openSUSE MySQL (and MariaDB) maintainer Michal Hrušecký announced
that openSUSE 12.3 would make the switch as well. Meanwhile, Fedora has proposed
replacing MySQL with MariaDB as a feature for Fedora 19, citing the
following reasons:
Recent changes made by Oracle indicate they are moving the MySQL project to
be more closed. They are no longer publishing any useful information about
security issues (CVEs), and they are not providing complete regression
tests any more, and a very large fraction of the mysql bug database is now
not public.
Because it is a drop-in replacement, though, MariaDB cannot (easily) be
installed in parallel with MySQL. Library and utility names conflict
between the two projects, so
users will need to make a choice—or allow the distribution to choose
for them.
From the discussions it is clear that the distributions recognize the
need to continue supporting MySQL where it is already installed, and
for users who have a preference; it is just new installs that would be
affected.
If, indeed, MariaDB just "drops in", that won't
pose much of a problem.
There are a large number of packages available that use MySQL or, more
correctly, the MySQL API. One of the concerns brought up after Hrušecký
proposed the switch on the opensuse-factory
mailing list was the compatibility question. Hrušecký expressed
confidence that there would be no compatibility issues, but pointed to the
continuing availability of MySQL as a fallback. The other main
complaint in the surprisingly short thread was that it was rather late in
the 12.3 development cycle to be making such a sweeping change. Even
though 12.3 is due in March, the switch will evidently be made.
Fedora has a bit more time. At the January 30 Fedora engineering steering
committee (FESCo) meeting, the feature was accepted for Fedora 19, which is
due at
the end of May (though it may well slip a bit after the long Fedora 18
delay). When Fedora program manager Jaroslav Reznik announced the feature proposal, it was
greeted largely with approval in a thread on the fedora-devel mailing list. The
only real opposition came from a somewhat surprising direction: Oracle, or at
least some of its employees.
Andrew Rist of Oracle suggested that Fedora
keep "MySQL, the leading edge and highest quality open source
database, as the
default choice in your Linux distribution". According to Rist,
MariaDB lags in features and doesn't have the large development and QA
staff that Oracle brings to the table. The posting refers to "we"
throughout—and in places reads something like a press
release—but does
offer to help with integration and packaging. His argument is that Fedora
should only be looking at the "merits and quality" of the code
in making its decision.
Rist encouraged Fedora to not allow the competition between Oracle and Red
Hat in
the Linux support business to be a factor in any decision to switch. As
Adam Williamson pointed out, though, the
complaints about MySQL are typically regarding "the freedom of the
development community,
a topic that was noticeably absent from your mail". Furthermore,
Rist's attempt to liken the Oracle security
disclosure practices to Red Hat's tarball kernel
releases was quickly shot down. As Rahul Sundaram put it:
Just to be clear, RHEL != Fedora. Red Hat policy for RHEL kernel is not
acceptable to Fedora and Fedora kernel continues to have the patches split
out. You cannot use that to defend MySQL policies here. You can do
whatever you want to do for the MySQL "enterprise edition" which is a
commercial product but the community project should have transparency and
openness in how it handles bugs, security issues, test cases etc.
Another Oracle employee, Norvald Ryeng, also posted in the thread. He
focused on trying to determine what problems Fedora has had in maintaining
MySQL, and to try to fix them: "Linux distributions are an important part of our community,
and we'd like very much to hear your feedback and help make package
maintainers' lives easier." Some of the specific complaints made by Michael Scherer
and others made by Remi Collet were addressed by
Ryeng. The dialog and Ryeng's efforts can only result in easier
packaging of MySQL for Fedora. But it may be too late.
There is already some discussion of eventually dropping MySQL entirely.
Mageia was able to do so as Olav Vitters reported, but it is a much smaller
distribution, which means it has fewer users to be affected. Honza Horak,
who co-owns the Fedora feature proposal with Tom Lane, suggested a "wait and see" approach on what to
do with MySQL in the future.
It is a little hard to imagine that larger distributions can completely
drop support for a package as popular as MySQL—at least in the next
few years. On the other hand, there
is a strong sense in the free software world that MySQL development is
slowly being pulled inside the Oracle mothership. As has been noted in the
thread, it is frustrating to get
security updates with little or no information about what is being fixed and
it is difficult to ensure that backports are working correctly without
a regression test suite. Those things used to be
more open and the
concern seems to be that more secrecy and obfuscation are on their way.
It is a bit interesting to see Oracle (or, more correctly, some folks
at Oracle) take such an interest in whether Fedora switches to a
MariaDB default. One would think that Fedora users are a fairly small piece of
the MySQL user pie, so it is likely that the real concern is for what
happens in Red Hat Enterprise Linux (RHEL) down the road. While Fedora is
partly
an upstream for RHEL, there's no reason to believe that Red Hat would
blindly follow Fedora's lead here (or on any other package choice for that
matter). These recent shifts away from MySQL might make Oracle
rethink some of its development strategies for the open source project. If
not, it is probably only a matter of time before MariaDB (or some other
openly developed MySQL fork) overtakes MySQL as the "most popular open
source database". Or perhaps PostgreSQL will supplant both. It's hard to
say, but we certainly won't lack for options whatever the outcome.
Comments (11 posted)
Brief items
They're not warnings, they're "we just broke your system, hope you
weren't doing anything tonight!" A boulder with WARNING: FALLING ROCKS
spray-painted on the bottom.
Better to spare the innocents, and for the people who set
I_KNOW_WHAT_I_AM_DOING=y in make.conf, we can create
RESOLVED:I_THOUGHT_YOU_KNEW.
--
Michael Orlitzky
Comments (none posted)
Newsletters and articles of interest
Comments (none posted)
Page editor: Rebecca Sobol
Development
January 30, 2013
This article was contributed by Andi Kleen
elision : the act or an instance of omitting something : omission
Contended locks are a common software scaling problem on modern multi-core
systems. As more cores are added the synchronization overhead increases,
potentially exceeding the speed-up provided by the extra
processors. Writing good, scalable, fine-grained
locking is difficult for programmers. An alternative to locking is to use
memory transactions.
A memory transaction buffers all of the memory side effects from a code region and
makes the memory changes visible in an atomic manner, but only if the transaction
commits. If, instead, there is a conflict — memory used by one thread in
the transaction is modified by another thread at the same time —
the transaction is aborted and rolled back, and its side effects are discarded.
If the transaction commits, all memory changes become visible to other
threads atomically.
Traditionally, memory transactions have been implemented in software and
were too slow for most practical uses. In order to make memory
transactions usable in real-world situations, Intel has announced "Intel
Transactional Synchronization Extension" (Intel TSX) for the upcoming
Haswell CPU core. TSX supports memory transactions in hardware to improve
their performance and make them practical. IBM has also announced similar
extensions.
Of course very few programs today are written to use memory transactions
directly. Hardware memory transactions are a best-effort model intended for fast
paths: there is no guarantee that any hardware transaction will be
successful (but most reasonable ones should be). Any transaction implemented
with TSX needs a fallback path. The simplest fallback path is to use a lock
when the transaction doesn't succeed. This technique is called "lock elision"; it
executes a lock-protected region speculatively as a transaction, and only falls back
to blocking on the lock if the transaction does not commit.
Lock elision uses the same programming model as normal locks, so it can be
directly applied to existing programs. The programmer keeps using locks,
but the locks are faster as they can use hardware transactional memory
internally for more parallelism. Lock elision uses memory transactions as a
fast path, while the slow path is still a normal lock. Deadlocks and other
classic locking problems are still possible, because the transactions may
fall back to a real lock at any time. Critical sections that do not conflict in
their memory accesses with other threads and are otherwise
transaction-friendly will execute in parallel, and not block each other
out. The lock
elision fast path makes the lock lock-less (non-blocking) if the conditions
are right.
Usually, getting good performance with locks requires a large programming
effort to do locking in a sufficiently fine-grained manner that individual locks are not
overly contended. This work complicates the program. With lock elision it is
possible to stay with coarser-grained locks and let the hardware discover
the underlying parallelism opportunities instead, saving programmer
time.
Intel TSX implements two instruction-set interfaces: HLE (Hardware Lock
Elision) and RTM (Restricted Transactional Memory). HLE can be used to add
transaction hints to existing code paths, while RTM allows control of the
transactions directly and the use of software abort handlers. HLE refers to the
specific instruction prefixes, while lock elision is a more general concept
that can be also implemented with RTM. RTM requires new code paths as new
instructions are used.
The lock library needs to be modified to enable elision. Obvious candidates
are the glibc POSIX thread (pthread) mutex and rwlock primitives. Existing binaries can
be dynamically linked to the new glibc pthreads library. Programs using
pthread locking can then directly elide their locks without modifications.
Programs that implement their own locking library need some changes to
their lock code paths to enable lock elision.
glibc pthreads elision implementation
An implementation of RTM lock elision for
glibc has been posted to the
libc-alpha mailing list. POSIX mutexes support different locking types and
attributes. Glibc already has a range of lock types, such as timed, adaptive, and
recursive; elided locks simply become a new lock attribute and type. The elision
can be controlled by the administrator or the programmer, or it can be automatically
controlled by the pthreads library. The current interface is documented in
the glibc manual draft. This is a preliminary draft that may change, as
the code reviewers are expressing concerns about the API design.
The pthread lock functions like pthread_mutex_lock() dispatch on the
different lock types. A new elision path has been added there for timed
and adaptive mutexes. The glibc implementation uses RTM because it is more
flexible.
Inside the locking code, elision is implemented as a wrapper around the
existing lock. The wrapper first tries to run in a transaction, and only if
it's not successful does it call the underlying lock for blocking as a
fallback path.
A simple RTM elision wrapper looks like this (the actual glibc
implementation is somewhat more complex, since it implements retries and
adaptation):
void elided_lock_wrapper(lock) {
    if (_xbegin() == _XBEGIN_STARTED) { /* Start transaction */
        if (lock is free)               /* Check lock and put into read-set */
            return;                     /* Execute lock region in transaction */
        _xabort(0xff);                  /* Abort transaction as lock is busy */
    }                                   /* Abort comes here */
    take fallback lock
}

void elided_unlock_wrapper(lock) {
    if (lock is free)
        _xend();                        /* Commit transaction */
    else
        unlock lock;
}
The _xbegin() call is a wrapper around a special instruction that
begins the transaction. It can be seen as being similar to
setjmp() in that it may return a second time if the transaction
is aborted partway through. Unlike the setjmp() situation,
though, a transaction abort will also unwind any other changes made since
the transaction began. If the transaction begins successfully,
_XBEGIN_STARTED will be returned. Should the transaction be
aborted, _xbegin() will return with an error code after the
transaction's side effects have been unwound; that will cause the code to
take the lock instead.
Either way, the lock itself will be put into the transaction's "read set";
a change to memory in the read set by another thread will cause the
transaction to abort immediately. Other memory locations accessed during
the transaction are treated in the same way; the processor will watch for
conflicting accesses and abort the transaction if need be. So, as soon as
one thread generates a conflicting memory access, one or more threads will
take the lock, causing all ongoing transactions tied to this lock to abort
and fall back to normal locking.
In its full form, the glibc elision wrapper uses an adaptive elision algorithm. When a lock
region frequently aborts its transactions, it may slow down the program due
to mis-speculation (executing transactions that do not commit). Not all
mis-speculation costs time, because aborts often happen during time the
thread would have spent blocked anyway. But transactions that run for a
long time before aborting can slow down the execution of the program as a
whole. The adaptive algorithm reacts to an aborted transaction by disabling
elision for that lock for some time before retrying. This avoids the need
to manually annotate every lock in a program for elision.
It is also possible for the programmer to manually opt in or out, per-lock,
using a per-lock annotation interface. That capability may be especially
useful for rwlocks, since adaptive elision is not currently implemented for
them. See the
manual for more details.
Writing to a cache line used in a transaction will cause that transaction
to abort; thus, code in critical sections protected by elided locks cannot write to the
cache lines (regions of memory stored as a unit in the CPU's cache)
containing those locks. One of the big
advantages of elision is that it can keep the lock cache line in the "shared"
state, avoiding the infamous "lock cache line bouncing" communication
overhead that often slows down programs with fine-grained locking. But this
requires not writing to the lock or any memory location in the same cache
line. Normal glibc mutexes have owner fields, used for debugging, that are
written on every lock operation. The elision code path skips these
owner-field updates to keep the mutex structure read-only in the locking
fast path. The only side effect is that pthread_mutex_destroy() will not
return an error when called on a locked lock, as the implementation can no
longer detect that case.
Code using RTM-elided locks cannot tell, inside the critical section, that
a lock is held (the lock appears to be free). POSIX pthreads locking does
not export an
operation to query a lock's state directly, but it can be queried
indirectly by calling pthread_mutex_trylock() recursively on a lock already
locked in the same thread. With RTM elision this call will succeed (HLE has
special hardware support to handle this case). Essentially, elided locks
behave like recursive locks. There were some concerns among code reviewers
that this behavior violates the POSIX standard ("shall fail" instead of "may
fail"). The latest version aborts nested trylocks by default, unless
elision is explicitly forced for the mutex. This prevents some elision
opportunities, especially for programs that use pthread_mutex_trylock() as
their main locking primitive; this is usually done in an attempt to
implement a homebrew spin-then-sleep lock instead of using glibc's
adaptive mutexes.
/* Assuming lock is not recursive */
pthread_mutex_lock(&lock);
if (pthread_mutex_trylock(&lock) == 0)
    /* lock with elision forced will come here */
else
    /* non-elided lock will come here, after aborting */
pthread_mutex_unlock() detects whether the current lock is executed
transactionally by checking if the lock is free. If it is free it commits
the transaction, otherwise the lock is unlocked normally. This implies that
if a broken program unlocks a free lock, it may attempt to commit outside a
transaction, an error which causes a fault in RTM. In POSIX, unlocking a free lock is
undefined (so any behavior, including starting World War 3 is acceptable). It is
possible to detect this situation by adding an additional check in the
unlock path. The current glibc implementation does not do this, but if this
programming mistake is common, the implementation may add this check in the
future.
There are some other minor semantic changes that are not expected to affect
programs. See the glibc manual for a detailed list.
Tuning and Profiling TSX
Existing programs that use pthread
locking can run unchanged with the eliding pthreads library. Many programs
have some unnecessary conflicts in common lock regions that can be fixed
relatively easily, with a corresponding improvement in performance with
elision. These changes
typically also improve performance without elision, because they eliminate
unnecessary bouncing of cache lines between CPU cores.
Memory transactions involve speculative execution, so generally a profiler
is needed to understand their performance implications. A patch kit for perf to
enable
profiling on an Intel Haswell CPU is currently pending on
linux-kernel. This patch kit is relatively large because it includes special
support for TSX profiling. A normal cycle profiler would cause additional
aborts with its profiling interrupts, so special TSX events need to be used
to profile aborts.
Basic profiling for TSX with the perf patch kit is done with:
perf stat -T ....
to measure basic transaction success.
When the abort rate is high, aborts can be profiled with:
perf record -g -e tx-aborts ....
perf report
This profiles RTM abort locations in the source (for HLE aborts,
"-e el-aborts" should be used instead). It is important to
keep in mind that the call graph displayed
with -g is after the abort, so it points to the lock library, not the abort
location inside the lock region. The code address reported at the beginning
of the call graph is inside the transaction though. Essentially, the
call graph is not continuous.
Additional information on the abort can be recorded with:
perf record -g -W --transaction -e tx-aborts ...
perf report --sort comm,vdso,symbol,weight,transaction
The transaction flag allows aborts to be categorized into different
classes: caused by the current thread (SYNC), caused by another thread
(CONFLICT), and others. The weight field represents the number of cycles
spent in the transaction before it aborted; that is, how costly the abort
was. Due to limitations in the perf infrastructure, and contrary to what
the option name suggests, these fields do not currently sort the output.
Conflict abort profiles show only the target of the conflict abort, not the
cause. There are some common program patterns that are vulnerable to
conflicts and can be relatively easily fixed when detected:
- Global statistic counters (disable or make per thread)
- False sharing
in data structures or variables
- Unnecessary changes to variables that are already the expected value
(add a test)
Lock elision in other projects
Other projects that implement their own locking would need their own
TSX-enabled lock library. The kernel has its own locking library, but, in a
sense,
lock elision is less important for the kernel because it already has
highly tuned fine-grained locks, unlike many user-level programs. So glibc
elision is more important than kernel elision. But even a highly tuned code
base like the kernel can benefit from locking improvements (and not all
locks in the kernel are highly tuned for every workload).
The kernel has a large number of locks. Benchmarking each lock individually
with different workloads and annotating all locks is impractical. One
approach is to enable lock elision for all locks and rely on automatic
adaption algorithms (and some manual overrides) to disable elision for
elision-unfriendly locks.
The kernel locking primitives (spinlocks, rw spinlocks, mutexes, rw
semaphores, bit spinlocks) can all be elided with a similar RTM wrapping
pattern as used for the glibc mutexes. The kernel has an "is-locked"
primitive, which can be handled with RTM locks only by aborting the
transaction. This can be avoided with some changes to the callers in the
kernel.
Some more details are available in this Linux plumbers
talk [PDF]. The kernel elision implementation is still in development.
Next steps
The next step is to pass glibc code review and integrate the lock elision
implementation into the glibc mainline. After that, testing with a wide
range of applications will be needed to evaluate performance and uncover
potential problems; that will require volunteers with programs that use
pthread locking and the willingness to test them. When no TSX-enabled
system is available, the SDE emulator can be used for functional testing.
Credits
The glibc elision code was implemented by Andi Kleen and Hongjiu Lu.
Author
Andi Kleen is a software engineer working on Linux in the Intel Open Source
Technology Center.
This article is the author's opinion and he is not speaking for Intel.
Comments (7 posted)
Brief items
Do you remember the GNOME 1.x => 2.x transition? Similarly to
how there are forks of GNOME now to 'keep the GNOME 2 candle
burning,' there were forks of GNOME 1.x to 'keep the GNOME 1 candle
burning.'
Do you remember what they were called? I didn't; I had to look 'em
up. Do you ever wonder what happened to them? Dead projects nobody
seems to remember. Do we really want to switch to a desktop that
history has shown is likely to become a dead project in a few
years?
—
Máirín Duffy
The only thing we're missing is a nice car analogy! So let me
provide one.
systemd's an Edsel with the trailer and aircraft-carrier-catapult
attachments, sysvinit is a Peel Trident. (I just want a Volvo.)
—
"nix"
Comments (57 posted)
Version 0.9.4 of the
Kdenlive video editor is out with a number of new features.
"
Kdenlive can now parse your clips to find the different scenes and
add markers or cut the clip accordingly. The process is currently very slow
but it's a start... Kdenlive can also now analyse
an object's motion, and the result of this can be used as keyframes for a
transition or an effect. For example, you can now have a title clip that
follows an object."
Comments (none posted)
Aaron Seigo
describes the
development plans for the Plasma framework. "
However, in Plasma
Active we've made two purposeful decisions: do not expose the hierarchical
file system (unless the use case dictates that as a requirement) and do not
expose details that are not relevant to the usage of the device (e.g. I
care that I can open that spreadsheet, it's less important at that moment
that the application is Calligra Sheets). Thus to open a spreadsheet one
opens the file manager and goes to Documents, or simply searches for the
document from the Launch area directly. No file system, no application
launchers."
Comments (107 posted)
Version 4.11.0 of the RPM package manager is out. It offers a number of
performance improvements, better conflict detection, a new
%license directive, and a new
%autosetup macro for better
automation of source unpacking and patch application.
Full Story (comments: none)
Newsletters and articles
On his blog, Jono Bacon
describes the Ubuntu Phone core apps development process while also soliciting volunteers to help on the design side. "
So, we have a good set of developers assigned for each app, but we would like to invite our community to contribute design ideas for each of these apps. We have already defined a set of user stories and functional requirements, and for each app we have also defined a set of the core screens and functionality that we will need design for. We would like to invite you wonderful designers out there to contribute your design ideas, and these ideas can provide food for thought for the developers." The "core" apps are things like calendar, clock, calculator, email client, document viewer, social media apps, and so on.
Comments (5 posted)
Animationsupplement.com has an
interview with Ajit Nair who is leading a team making the 100-minute animation feature
Naughty 5 using the
Blender open source 3D content creation suite. The film is about five naughty children and is expected to be released by mid-2013. "
Students should focus on enhancing their creativity and art rather than understanding software functions. What you want to achieve is primary and how you achieve it is secondary. For people planning to make a film should definitely give Blender a try, there will be some time spent training but it will definitely be worth it." (Thanks to Paul Wise.)
Comments (1 posted)
Matthias Clasen
previews
the changes in GNOME 3.8. "
Allowing you to focus on your task
and minimizing interruptions has been an important aspect of the GNOME 3
design from the start. So far, we just had a global switch to turn off
notifications. The new Notification panel expands on this and allows
fine-grained control over what applications get to annoy you, and how
much."
Comments (179 posted)
Lennart Poettering decided to
refute a few
systemd myths on his blog, where "a few" means "30". "
There's
certainly some truth in that. systemd's sources do not contain a single
line of code originating from original UNIX. However, we drive inspiration
from UNIX, and thus there's a ton of UNIX in systemd. For example, the UNIX
idea of 'everything is a file' finds reflection in that in systemd all
services are exposed at runtime in a kernel file system, the
cgroupfs. Then, one of the original features of UNIX was multi-seat
support, based on built-in terminal support. Text terminals are hardly the
state of the art how you interface with your computer these days
however. With systemd we brought native multi-seat support back, but this
time with full support for today's hardware, covering graphics, mice,
audio, webcams and more, and all that fully automatic, hotplug-capable and
without configuration."
Comments (465 posted)
Charles H. Schulz
looks
forward to the LibreOffice 4.0 release, currently planned for early
February. "
On a more abstract level, these changes also mark a more
radical departure from the OpenOffice.org codebase, and it is now becoming
quite difficult to just assume that because OpenOffice.org, Apache
OpenOffice behave in one specific way LibreOffice would do just the
same. Of course the API changes do not make the whole work themselves, but
the work we started with the 3.4 branch is paying off: LibreOffice 4.0 is
becoming a different animal, and that comes with its own distinct
advantages while clearly showing our ability as a community to innovate and
move forward."
Comments (16 posted)
Luis Villa
ponders
the "post open-source" movement, which rejects licensing altogether.
"
If some 'no license' sharing is a quiet rejection of the permission
culture, the lawyer’s solution (make everyone use a license, for their own
good!) starts to look bad. This is because once an author has used a
standard license, their immediate interests are protected – but the
political content of not choosing a license is lost. Or to put it another
way: if license authors get their wish, and everyone uses a license for all
content, then, to the casual observer, it looks like everyone accepts the
permission culture. This could make it harder to change that culture – to
change the defaults – in the long run."
Comments (72 posted)
Page editor: Nathan Willis
Announcements
Brief items
The Linux Foundation has announced that Axis Communications, D-Link,
O.S. Systems and Perforce have joined the organization.
Full Story (comments: none)
Articles of interest
The H
reports
that booting with UEFI can brick some Samsung laptop models; this can
happen regardless of whether secure boot is enabled. "
The Ubuntu
development team has held talks with Samsung staff, who have identified the
kernel's samsung-laptop driver as the prime suspect. This driver has
previously had issues – it had caused problems for other Samsung laptop
owners when booting Linux using UEFI. Also involved in analysing the
problem is Intel developer Matt Fleming, who posted two kernel changes for
discussion a week ago."
Comments (2 posted)
Over at opensource.com, Davis Miller
writes about an open source tabletop game (
available at Github) that teaches some of the ideas behind network security. "
While you seek these valuable digital assets, the network administrators respond by patching all compromised machines, raising an alarm, and sometimes changing the very topology to derail your movements. You and your team work together diligently, checking and raiding machines on the network, trying to not alert the network administrators of your presence. If the administrators feel threatened by any of the activity they see on a network, they'll take your stolen personal data and release it to the Internet. In other words, you'll get d0x3d!!"
Comments (1 posted)
Calls for Presentations
The 2013 Linux Plumbers Conference will be held September 18-20 in New
Orleans. The conference has just
announced
its call for participation. "
There are two ways for individuals to
participate in the conference: by submitting a refereed track presentation,
or proposing/running a microconference. Refereed track presentations are
traditional presentation-format sessions. A Microconference is a
collection of collaborative sessions focused on a particular area of the
kernel plumbing. Note that this year, refereed track presentations will be
shared with LinuxCon North America on a single overlapping day that will be
available to attendees at both conferences." See
the "participate"
page for details; submissions are due by June 17.
Comments (none posted)
Upcoming Events
Early
registration for
Linux Plumbers Conference is open until April 29. LPC takes place
September 18-20 in New Orleans, Louisiana.
Full Story (comments: none)
openSUSE Conference, oSC13,
has been
announced.
The conference will take place July 18-22 in Thessaloniki, Greece. "
The slogan of the conference this year is ‘Power to the Geeko’, as we would like to emphasize the bottom-up nature of our Free Software movement (an excellent fit with the country where early democracies developed). Thessaloniki provides many opportunities to Have a lot of Fun! The city features beautiful beaches and a lively night life as well as good food and drinks. We expect plenty of socializing between the technical sessions and code."
Comments (none posted)
The Southern California Linux Expo (February 22-24 in Los Angeles,
California) has announced a Poker Quiz for this year's event.
"
Building on last year's unique new conference game, the SCALE Team
has created an updated version of the game where Open Source meets
Jeopardy: SCALE Poker Quiz, an epic mix of scavenger hunt, trivia
quiz, and trading game."
Full Story (comments: none)
Events: January 31, 2013 to April 1, 2013
The following event listing is taken from the
LWN.net Calendar.
| Date(s) | Event | Location |
| January 28–February 2 | Linux.conf.au 2013 | Canberra, Australia |
| February 2–3 | Free and Open Source software Developers' European Meeting | Brussels, Belgium |
| February 15–17 | Linux Vacation / Eastern Europe 2013 Winter Edition | Minsk, Belarus |
| February 18–19 | Android Builders Summit | San Francisco, CA, USA |
| February 20–22 | Embedded Linux Conference | San Francisco, CA, USA |
| February 22–24 | Mini DebConf at FOSSMeet 2013 | Calicut, India |
| February 22–24 | FOSSMeet 2013 | Calicut, India |
| February 22–24 | Southern California Linux Expo | Los Angeles, CA, USA |
| February 23–24 | DevConf.cz 2013 | Brno, Czech Republic |
| February 25–March 1 | ConFoo | Montreal, Canada |
| February 26–28 | ApacheCon NA 2013 | Portland, Oregon, USA |
| February 26–28 | O'Reilly Strata Conference | Santa Clara, CA, USA |
| February 26–March 1 | GUUG Spring Conference 2013 | Frankfurt, Germany |
| March 4–8 | LCA13: Linaro Connect Asia | Hong Kong, China |
| March 6–8 | Magnolia Amplify 2013 | Miami, FL, USA |
| March 9–10 | Open Source Days 2013 | Copenhagen, Denmark |
| March 13–21 | PyCon 2013 | Santa Clara, CA, USA |
| March 15–16 | Open Source Conference | Szczecin, Poland |
| March 15–17 | German Perl Workshop | Berlin, Germany |
| March 16–17 | Chemnitzer Linux-Tage 2013 | Chemnitz, Germany |
| March 19–21 | FLOSS UK Large Installation Systems Administration | Newcastle-upon-Tyne, UK |
| March 20–22 | Open Source Think Tank | Calistoga, CA, USA |
| March 23 | Augsburger Linux-Infotag 2013 | Augsburg, Germany |
| March 23–24 | LibrePlanet 2013: Commit Change | Cambridge, MA, USA |
| March 25 | Ignite LocationTech Boston | Boston, MA, USA |
| March 30 | Emacsconf | London, UK |
| March 30 | NYC Open Tech Conference | Queens, NY, USA |
If your event does not appear here, please
tell us about it.
Page editor: Rebecca Sobol