By Nathan Willis
November 21, 2012
Adobe's proprietary and often annoying Flash format is dying, to be
replaced by a bagful of open technologies like HTML5, CSS3, SVG,
JavaScript, and royalty-free media codecs. Or so we are told. Of
course, we have been told this story often enough over the years that
it is difficult to muster genuine excitement at the news.
Nevertheless, the most recent combatant to enter the ring is Mozilla's
Shumway, which
constitutes a distinctly different life form than existing free
software projects like Lightspark and Gnash. Rather than implement Flash
support in a browser plugin, Shumway translates .swf content
into standard HTML and JavaScript, to be handled by the browser's main
rendering engine.
The sparking and gnashing of teeth
Gnash and Lightspark are both reverse-engineered implementations of
a Flash runtime (and, naturally, come with an accompanying Netscape
Plugin API (NP-API) browser plugin), but they cover different parts of the
specification. Gnash, the older of the two projects, implements
versions 1 and 2 of Flash's ActionScript language, and the
corresponding first generation of the ActionScript Virtual Machine
(AVM1). This provides solid coverage for .swf files up
through Flash 7, and partial support of Flash 8 and
Flash 9 (including a significant chunk of .flv video
found in the wild). Lightspark implements ActionScript 3 and the
AVM2 virtual machine, which equates to support for Flash 9 and
newer. Lightspark does have the ability to fall back on Gnash for
AVM1 content, though, which enables users to install both and enjoy
reasonably broad coverage without having to know the version
information of Flash content in advance.
As is typical of reverse engineering efforts, however, neither project
can claim full compatibility with the proprietary product. In
practice, development tends to focus on specific use-cases and popular
sites. Gnash, for example, was founded with the goal of supporting
Flash-based educational games, and previous releases have been pinned
to fixing support for popular video sites like YouTube. Lightspark
maintains a wiki
page detailing the status of support for common Flash-driven web
sites. But the sheer variety of Flash content makes it virtually
impossible to implement the full specification and offer any
meaningful guarantee that the plugins will render the content without
significant errors.
An even bigger problem, though, is one of time and funding. Gnash in
particular has struggled to raise the funds necessary for lead developer
Rob Savoye to devote significant time to the code. Gnash has been a
Free Software Foundation (FSF) high-priority project for years, and
Savoye was the 2010 recipient of the FSF's Award for the Advancement
of Free Software, but fundraising drives have nevertheless garnered
low returns — low enough that as recently as March 2012, Savoye
reported
that the hosting bills for the site were barely covered. The last
major release was version 0.8.10 in February 2012, which included
OpenVG-based vector
rendering along with touchscreen support. A student named
Joshua Beck proposed
a 2012 Google Summer of Code (GSoC) project to add OpenGL ES 2.0
support under Savoye's mentorship, but it was not accepted. Traffic
on the mailing
lists has slowed to a trickle, though there are still commits from
Savoye and a devoted cadre of others.
Lightspark has made more frequent releases in recent years, including
two milestone releases in 2012. In June, Version 0.6.0.1
introduced support for Adobe AIR applications and the BBC web site's
video player. Version 0.7.0
in October added support for LZMA-compressed Flash content and
experimental support for runtime bytecode optimization.
Both projects regularly make incremental additions to their suites of
supported Flash opcodes and ActionScript functions, but neither has
much in the way of headline-grabbing features in new releases. That
is a bigger problem for Gnash: since it targets only the older Flash
versions, it has none of Adobe's newer enhancements left to chase,
which is probably a key reason Gnash has had a hard time attracting
donations. Lightspark, by contrast, can still tackle a host of new
features with each update of Adobe Flash.
Of course, both projects' real competition has come from the easy
availability of a freely-downloadable official browser plugin for
Linux, but Adobe announced in February
2012 that Flash 11.2 would be the last release available as an NP-API
plugin for Linux. Subsequent Linux releases would only be made as
the built-in Flash plugin in Google's Chrome. The move has seemingly
not motivated Flash-using Linux fans to cough up support for Gnash and
Lightspark — but perhaps the next major update to Flash will.
I did it Shumway
Mozilla developer Jet Villegas wrote a blog
post introducing Shumway on November 12, but the code has been
available for several months. Shumway is described as an
"experimental web-native runtime implementation" of
the .swf format. Shumway essentially pushes handling of the
formerly-Flash content to the browser's rendering engine and
JavaScript interpreter. This protects against misbehaving plugins
that eat up too many resources or simply crash. Shumway is available
as a Firefox extension (an .xpi file), though it is only expected to work on the most recent Firefox
beta builds.
The recent Firefox build is required because Shumway parses Flash
content and translates it into standard web technologies:
<canvas> and <video> elements, WebGL
contexts, and good-old-fashioned embedded images. Shumway translates
ActionScript into JavaScript to handle interactivity. Both AVM1 and
AVM2 are supported, as are ActionScript versions 1, 2, and 3. The
extension supports the use of <object> and
<embed> tags to incorporate Flash into the page. As
for multimedia codecs, Shumway can automatically take advantage of
whatever codecs are available on the system.
At the moment there is not a definitive compatibility list, so
Shumway's support for any particular Flash file is a gamble. Villegas
did say in a comment
that the project is targeting Flash 10 and below, which he said
accounts for "the vast majority of existing content."
The idea of translating Flash content into HTML5 is not original to
Shumway, but its predecessors have been focused on Flash-based
advertising. Google offers a web service called Swiffy
that translates uploaded Flash files into JSON objects, targeted at
advertisers wanting to deploy animated ads. Smokescreen
was a JavaScript player designed to render Flash ads on iOS devices.
Slaying the Flash gorgon
Mozilla's goal with Shumway is to remove Flash from the equation
altogether, replacing it with "open web" technologies. By
demonstrating that HTML5 content is capable of reproducing anything
that can be done in Flash, the thinking goes, the browser-maker can
encourage more content creators to drop Flash from their workflows.
One might think it fair to ask whether supporting Flash in any sense
genuinely "promotes" the use of Flash alternatives. After all, in
December 2010, Mozilla's Chris Blizzard told
Joe Brockmeier that the organization was not interested in funding
Flash support, open source or otherwise:
Our strategy is to invest in the web. While Flash is used on the web,
it lacks an open process for development with open specifications and
multiple competing implementations. Creating an open source version of
Flash wouldn't change the fact that Flash's fate is determined by a
single entity.
Blizzard's comment was in response to a question about supporting
Gnash and Lightspark development. Sobhan Mohammadpour asked the same
thing on the Shumway blog post, to which Villegas replied:
Processing SWF content in C/C++ exposes the same security & device
compatibility problems as the Adobe Flash Player. It also doesn’t help
advance the Open Web stack (eg. faster javascript and canvas
rendering) with the research work.
Such a distinction might seem like splitting hairs to some. In
particular, Villegas suggests that Gnash and Lightspark are a greater
security risk than an .xpi browser extension. The Gnash team
might take offense at that, especially considering the work the project has done to enforce a
solid testing policy. But it is certainly true that massaging Flash
content into generic web content has the potential to bring
.swf and .flv support to a broader range of
platforms. Both Gnash and Lightspark are developed primarily for
Linux, with only intermittent working builds for Windows. On the
other hand, Gnash and Lightspark also offer stand-alone, offline Flash
players, which can be a resource-friendly way to work with Flash games
and applications.
History also teaches us that it would be unwise to embrace Shumway too
tightly, writing off Gnash and Lightspark as also-rans, for the simple
reason that Shumway is still an experimental Mozilla project. Sure,
some Mozilla experiments (such as Firefox Sync) move on to be fully
integrated features in future browsers — but far more are put
out to pasture and forgotten, with nary an explanation. Firefox Home,
Chromatabs, Mozilla Raindrop — the list goes on and on.
It is also not clear exactly what to make of Villegas's statement
about Flash 10 being the newest supported version. If that is
a long-term limitation, then Shumway may be of finite usefulness. True,
Flash may die out completely before there is ever a Flash 12, and
Flash 11 may never account for a significant percentage of the web's
.swf files. In that case, users everywhere will enjoy a
blissful HTML5-driven future with plugin-crashes a forgotten woe, and
free unicorns as far as the eye can see. But where have we heard that
one before?
Comments (47 posted)
By Nathan Willis
November 21, 2012
Selecting a software license can be tricky, considering all of the
effects that such a choice has for the future: library compatibility,
distribution, even membership in larger projects. But agreeing on a
license at the beginning is child's play compared to
trying to alter that decision later on. Case in point: the VLC media
player has recently been relicensed under LGPLv2.1+, an
undertaking that required project lead Jean-Baptiste Kempf to track down
more than 230 individual developers for their personal authorization
for the move.
VLC had been licensed under GPLv2+ since 2001; the development
team decided to undertake the relicensing task for a number of
reasons, including making VLC compatible with various gadget-vendor
application stores (e.g., Apple's). Making the core engine available
under LGPL terms would make it a more attractive target for
independent developers seeking to write non-GPL applications, the
argument goes, which benefits the project through added exposure, and
may even attract additional contributor talent.
The license migration was approved by team
vote in September 2011. The first big milestone was cleared the
following November, a relicensing of libVLC
and libVLCcore (which implement the external API and internal
plugin layer, respectively), plus the auxiliary libraries
libdvbpsi, libaacs, and libbluray. Kempf
described
the process involved on his blog. Because VLC contributors retain
the authors' rights to their contributions, no matter how small, Kempf
needed to locate and obtain permission from all of the roughly 150
developers who had written even a minor patch during the project's
long history.
To do so, he harvested names and email addresses from the project's
git repository and logs, and undertook a lengthy process of
sifting through the records (both to weed out false matches, and to
identify contributors who were credited in unofficial spots like in-line
comments). With the list in hand, Kempf set out to contact each of the
contributors to approve the licensing change. He was ultimately
successful, and the change was committed.
The commit notes that more than 99% of the developers consented to the
change, and those agreeing account for 99.99% of the code, which he
said is sufficient from a legal standpoint.
The modular community
But, as Kempf described
in a follow-up post, the same method was less successful when he set out
in 2012 to relicense the next major chunk of VLC code: the major
playback modules. Together, they constitute a much larger
codebase, with considerably more contributors (including some who are
not necessarily committed VLC team members). After emailing the
module authors, he said, he received responses from only 25% of them.
Two rounds of follow-up emails edged the number up closer to 50%, but
for the remainder he resorted to "finding and stalking"
the holdouts through other means. Those means included IRC, GitHub,
social networks, mutual friends, employers, and even whois
data on domain names.
In the end, he managed to get approval from the overwhelming majority
of the contributors, but there were some "no"s as well, plus a handful
of individuals who never replied at all. At that point, he had to
examine the unresolved contributions themselves and decide whether
to delete them, reimplement them, refactor them into separate files,
or drop the offending modules altogether. He made the license-changing
commit
on November 6, and listed about 25 modules that were not
included. They include the work of 13 developers who either
declined to give their approval or were unreachable, plus a set of modules
that were ports from other projects (such as Xine or MPlayer) and thus
not in the VLC team's purview.
By all accounts, the legwork required to hunt down and cajole more
than 230 developers was arduous: in the second blog post, Kempf noted
that it could get "really annoying" to contact people "over,
over, over and over, and over" to ask for an answer. That is
probably an understatement; in an email Kempf said at the outset that
no one thought it would even be doable.
He also elaborated on what comes next. Not every VLC module
was targeted for the relicensing work of the previous year, he said.
Out of the roughly 400 modules being developed, about 100 remain
non-LGPL. For those who rarely venture beyond VLC's desktop media
player functionality, it can be easy to forget how much else the
project provides; several of those other components will remain under
their existing licenses. In particular, VLC's media server, converter, and
proxy functionality will remain in GPL modules. Other modules,
including scripting and visualization, will remain GPL-licensed at
least for the time being, because they do not impede the ability of
third-party developers to write non-GPL playback applications, which
was the leading use-case motivating the change. VLC's user interface and
control modules will also remain GPL-licensed, in order to discourage
non-free forks.
Kempf also pointed out that the VideoLAN non-profit organization holds
the
trademarks to VLC, VideoLAN, and other names, and restricts their usage to
open source code. That reflects the project's concern that the move
away from the GPL will be misinterpreted by someone as a move away
from free-ness (in multiple senses of the word); in addition to the
trademark policy, both of the announcements about the relicensing
project have emphasized that despite the change, VLC will remain free
software.
Holdouts
But despite the consensus reached by the majority of core and module
developers, there is still the problem of those twenty-odd playback
modules that, for one reason or another, are not being relicensed.
Kempf explained that the main VLC application will still be able to
use all of the non-LGPL modules, and that only third-party libVLC
applications will encounter any difficulties with license
compatibility.
Authors of such applications may write their own modules for the
missing functionality, or simply migrate to another module —
given the modular nature of VLC, there are several modules
out there that duplicate functionality implemented elsewhere.
"The results might be slightly different, but I doubt many
people will notice. There are a few exceptions, (probably 2 or 3) that
will get rewritten, at some point, I think."
There are two modules Kempf predicted will never be reimplemented
in LGPL code — DVD playback and Teletext support — because
they rely on other GPL-licensed packages without viable non-GPL
alternatives. He still holds out hope for tracking down a few of the
still-unreached contributors, of course — only the authors of
the iOS, Dolby, Headphone, and Mono modules outright declined to
relicense their work.
It is not possible to predict exactly what effect the LGPL-relicensing
work will have on third-party developers targeting iOS or other "app
store" markets, thanks to the often opaque processes governing which
content gets in and which gets rejected. VLC was yanked from the
iOS App Store in January 2011, a decision believed
to be due to the GPL license, but because Apple does not provide
details about its decisions, the situation remains nebulous.
Nevertheless, hunting down several hundred individual developers from more than a
decade of development is an impressive feat of, shall we say, logistical
engineering. Relicensing a community project is rarely a simple task;
one is reminded of the multi-year process required to relicense the Heyu home automation engine, which
involved tracking down the estates of developers no longer with us.
Many large software projects have contemplated a license change at one
time or another, and typically the scope of tracking down and
persuading all of the former developers is cited as a reason that such
a change is unworkable. VLC's contributor pool is far smaller than
the kernel's, to be sure. But the fact that Kempf was able to
successfully chase down virtually the full set of both uncooperative
and unintentionally-AWOL contributors in such a short time frame is an
admirable achievement. Then again, the VLC team has long enjoyed
a reputation for admirable achievements.
Comments (21 posted)
By Jonathan Corbet
November 20, 2012
The kind folks at Google decided that your editor was in need of a present
for the holidays; soon thereafter, a box containing a Nexus 7 tablet
showed up on the doorstep. One might think that the resulting joy would be
somewhat mitigated by the fact that your editor has
been in possession of an N7 tablet since last
July, and one might be right. But the truth of the matter is that the gift
was well timed, and not just because it's nice to be able to install
ill-advised software distributions on a tablet without depriving oneself of
a useful device.
It was not that long ago that a leading-edge tablet device was a fairly big
deal. Family members would ask where the tablet was; the house
clearly wouldn't contain more than one of them. What followed, inevitably,
was an argument over who got to use the household tablet. But tablets are
quickly becoming both more powerful and less expensive — a pattern that a
few of us have seen in this industry before. We are quickly heading toward
a world where tablet devices litter the house like notepads, cheap pens, or
the teenager's dirty socks. Tablets are not really special anymore.
They are, however, increasingly useful. Your editor recently purchased a
stereo component that locates his music on the network (served by Samba),
plays said music through the sound system with a fidelity far exceeding
that available from portable music players, and relies on an application
running on a handy Android (or iOS) device for its user interface. Every
handset and tablet in the house, suddenly, is part of the music system;
this has led to a rediscovery of your editor's music collection — a
development not universally welcomed by your editor's offspring. Other
household devices, thermostats for example, are following the same path.
There is no need to attach big control surfaces to household gadgets; those
surfaces already exist on kitchen counters and in the residents' pockets.
So the addition of a tablet into a household already containing a few of
them is not an unwelcome event; it nicely replaces the one that will
eventually be found underneath the couch.
What's new in Android 4.2
About the time this tablet showed up, the Android 4.2 release came out as
an over-the-air update. Some of the features to be found there would seem
to have been developed with the ubiquitous tablet very much in mind. At
the top of the list, arguably, is the new multiuser support. A new "users"
settings screen allows the addition of new users to the device; each user
gets their own settings, apps, lock screen, etc. Switching between users
is just a matter of selecting one from the lock screen.
Android users are still not as strongly isolated as on a classic Linux
system. Apps are shared between them so that, for example, if one user
accepts an app update that adds permissions, it takes effect for everybody.
The initial user has a sort of mild superuser access; no other users can add or
delete users, for example, and the "factory reset" option is only available
to the initial account. There doesn't seem to be a way to parcel out
privileges to other accounts. The feature works well enough for a common
use case: a tablet that floats around the house and is used by multiple
family members. Perhaps someday the face unlock feature will recognize the
user of the tablet and automatically switch to the correct account.
A feature that is not yet present is the ability to clone one tablet
onto another. As we head toward the day when new tablets will arrive as
prizes in cereal boxes, we will lose our patience with the quaint process
of configuring the new tablet to work like the others do. Google has made
significant progress in this area; a lot of useful stuff just appears on a
new tablet once the connection to the Google account has been made. But
there is still work to do; the process of setting up the K9 mail client is
particularly tiresome, for example. And, naturally, storing even more
information on the
Google mothership is not without its concerns. Wouldn't it be nice to just
put the new tablet next to an existing one and say "be like that other
one"? The transfer could be effected with no central data storage at all,
and life would be much easier.
Much of the infrastructure for this kind of feature appears to already be
in place. The near-field communications (NFC) mechanism can be used to
"beam" photos, videos, and more between two devices just by touching them
together. The "wireless display" feature can be used to transmit screen
contents to a nearby television. It should not be hard to do a full
backup/restore to another device. Someday. Meanwhile, the "beaming"
feature is handy to move photos around without going through the tiresome
process of sending them through email.
Another significant new feature is the "swipe" gesture typing facility,
whereby one spells words by dragging a finger across the keyboard from one
letter to the next.
Gesture typing has been available via add-on apps for a while, but now it's
a part of the Android core. Using it feels a little silly at the outset;
it is like a return to finger painting in elementary-school art class. For
added fun, it will attempt to guess which word is coming next, allowing
the typing process to be skipped entirely — as long as the guesses turn out
to be accurate. In your editor's experience, gesture typing is no faster
than tap-typing; if anything, it is a little slower. But the resulting
text does seem to be less error-prone; whoever wrote the code doing the
gesture recognition did a good job.
One interesting change is that the notification bar at the top has been
split into two. The downward-swipe gesture on the left side gives the
usual list of notifications — though many of them have been enhanced with
actions selectable directly from the notification. On the right side,
instead, one gets various settings options. The new scheme takes a while
to get used to; it also seems like it takes a more determined effort to get
the selected screen to actually stay down rather than teasing the user and
popping right back up.
Various other new features exist. The "photo sphere camera" is evidently
an extension of the panorama feature found in earlier releases; alas, it
refuses to work on the N7's (poor) front-facing camera, so your editor was
unable to test it out. The camera also now evidently has high dynamic
range (HDR) processing functionality. On the Nexus 10 tablet, the
"Renderscript" mechanism can use the GPU for computational tasks; no other
device has the requisite hardware support at the moment.
There is a screen magnification feature that can be
used to zoom any screen regardless of whether the running app was written
with that in mind. And so on.
One other change in the 4.2 release is the replacement of the BlueZ-based
Bluetooth stack with a totally new stack (called "Bluedroid") from
Broadcom. This stack, according to the
release notes, "provides improved compatibility and
reliability." A message on the
android-platform list gives some additional reasons for the change, including
the ability to run Bluetooth processing in a separate process, elimination
of the D-Bus dependency, and more. The licensing of the new "Bluedroid"
stack has raised some
questions of its own that have not been clarified as of this writing.
Bluetooth stack questions aside,
the obvious conclusion is that the Android platform continues to advance
quickly. Each release improves the experience, adds features, and
generally cements Android's position as the Linux-based platform for mobile
devices. Your editor would still like to see an alternative platform,
preferably one that is closer to traditional Linux, but that seems
increasingly unlikely as the spread of Android continues unabated and
unchallenged. The good news is that Android continues to be (mostly) free
software and it continues to improve. This stage of the evolution of the
computing industry could easily have taken a highly proprietary turn;
thanks to Android, the worst of that has been avoided.
(Thanks to Karim Yaghmour for pointers to the Bluedroid discussion).
Comments (34 posted)
Page editor: Jonathan Corbet
Security
By Jake Edge
November 21, 2012
A recently discovered Linux rootkit has a number of interesting attributes
that make it worth a look. While it demonstrates the power that a rootkit
has (to perform its "job" as well as hide itself from detection) this
particular rootkit also has some fairly serious bugs—some that are
almost comical. What isn't known, at least yet, is how the system where it
was discovered became infected; there is no exploit used by the rootkit to
propagate itself.
The rootkit was reported to the
full-disclosure mailing list on November 13 by "stack trace". Customers
had noticed that they were being redirected to malicious sites by means of
an <iframe> in the HTTP responses from Stack Trace's site. Stack Trace
eventually found that the Nginx web server on the system was not
delivering the
<iframe> and tracked it to a loadable kernel module, which
was attached to the posting. Since
then, both CrowdStrike
and Kaspersky Lab's Threatpost
have analyzed the module's behavior.
The first step for a rootkit is to get itself running in the kernel. That
can be accomplished by means of a loadable kernel module. In this
case, the module is
targeted
at the most recent 64-bit
kernel version used by Debian 6.0 ("Squeeze"):
/lib/modules/2.6.32-5-amd64/kernel/sound/module_init.ko
The presence of that file would indicate infection, though a look at the
process list is required to determine if the rootkit is actually loaded.
Once loaded, the module has a number of different tasks to perform that
are described below. The CrowdStrike post has even more detail for those
interested.
The rootkit targets HTTP traffic, so that it can inject an
<iframe> containing an attack: either a
malicious URL or some kind of JavaScript-based attack. In order to do
that in a relatively undetectable way, it must impose itself into the
kernel's TCP send path. It does so by hooking tcp_sendmsg().
Of course, that function and other symbols that the rootkit wants to access
are not exported symbols that would be directly accessible to a kernel
module. So the rootkit uses /proc/kallsyms to get the addresses
it needs. Amusingly, there is code to fall back to looking for the proper
System.map to parse for the addresses, but it is never used due to
a bug. Even though the kernel version is hardcoded in several places in
the rootkit, the
System.map helper function actually uses uname -r to
get the version. The inability to fall back to checking
System.map, along with this version-getting
oddity, makes it seem like multiple people—with little or no
inter-communication—worked on the code. Other odd bugs in the
rootkit only add to that feeling.
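As an illustration of the symbol-resolution step (not the rootkit's actual code), the sketch below shows the kind of parsing involved: each kallsyms line has the form "address type name", and the module simply scans for the name it wants, such as tcp_sendmsg. The function and buffer here are hypothetical; the real module reads /proc/kallsyms from kernel context.

    #include <stdio.h>
    #include <string.h>

    /* Scan a buffer of kallsyms-formatted lines
     * ("ffffffff812c1a70 T tcp_sendmsg") for a symbol name and
     * return its address, or 0 if it is not found. */
    static unsigned long lookup_symbol(const char *kallsyms,
                                       const char *wanted)
    {
        unsigned long addr;
        char type, name[128];
        const char *line = kallsyms;

        while (line && *line) {
            if (sscanf(line, "%lx %c %127s", &addr, &type, name) == 3 &&
                strcmp(name, wanted) == 0)
                return addr;
            line = strchr(line, '\n');      /* advance to the next line */
            if (line)
                line++;
        }
        return 0;
    }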
For example, when hooking various functions, the rootkit carefully saves
away the five bytes it needs to overwrite with a jmp instruction, but then
proceeds to write 19 bytes at the start of the function. That obliterates
14 bytes of code, which eliminates any possibility of unhooking the
function. Beyond that, it can't call the unhooked version of the function
either, so the rootkit contains private copies of all the functions it hooks.
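For readers unfamiliar with the technique, here is a minimal sketch of what a correct five-byte inline hook looks like on x86-64; the names are illustrative and the write-protection handling a kernel would need is omitted. The rootkit gets the save step right, then spoils it by writing 19 bytes instead of five.

    #include <stdint.h>
    #include <string.h>

    #define HOOK_LEN 5    /* one "jmp rel32" instruction: 0xE9 + 4 bytes */

    struct hook {
        void *target;                     /* function being hooked         */
        void *replacement;                /* where the jmp should land     */
        unsigned char saved[HOOK_LEN];    /* original bytes, for unhooking */
    };

    static void install_hook(struct hook *h)
    {
        unsigned char jmp[HOOK_LEN];
        int32_t rel = (int32_t)((unsigned char *)h->replacement -
                                (unsigned char *)h->target - HOOK_LEN);

        /* Save exactly the bytes that will be overwritten... */
        memcpy(h->saved, h->target, HOOK_LEN);

        /* ...then overwrite exactly that many with the jump. Writing
         * any more than HOOK_LEN bytes, as the rootkit does, destroys
         * code that was never saved and makes unhooking impossible. */
        jmp[0] = 0xE9;
        memcpy(jmp + 1, &rel, sizeof(rel));
        memcpy(h->target, jmp, HOOK_LEN);
    }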
Beyond hooking tcp_sendmsg(), the rootkit also attempts to hide
its presence. There is code to hide the files that it installs, as well as
its threads. The file hiding works well enough by hooking
vfs_readdir() and using a list of directories and files that
should not be returned. Fortunately (or unfortunately, depending on one's
perspective), the thread hiding doesn't work at all. It uses the same
file-hiding code, but doesn't look in /proc nor convert the names
into PIDs, so ps and other tools show the threads. In the original
report, Stack Trace noted two threads named get_http_inj_fr and
write_startup_c; those names are fairly descriptive given the
behavior being seen. The presence of one or both of those names in the
process list would mean that the system has the rootkit loaded.
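The file-hiding approach is a classic one. A hedged sketch against the 2.6.32-era vfs_readdir()/filldir_t interface is shown below; hidden_name() and real_vfs_readdir() are illustrative stand-ins, not symbols from the actual module.

    #include <linux/fs.h>

    /* Hypothetical helpers: a blacklist check and a private copy of the
     * original vfs_readdir(). */
    extern int hidden_name(const char *name, int namelen);
    extern int real_vfs_readdir(struct file *file, filldir_t filler, void *buf);

    static filldir_t real_filldir;    /* the caller's original callback */

    /* Wrapper callback: silently drop any entry on the blacklist. */
    static int hook_filldir(void *buf, const char *name, int namelen,
                            loff_t offset, u64 ino, unsigned d_type)
    {
        if (hidden_name(name, namelen))
            return 0;                 /* pretend the entry does not exist */
        return real_filldir(buf, name, namelen, offset, ino, d_type);
    }

    /* Replacement for the hooked vfs_readdir(): substitute the wrapper,
     * then call the private copy of the original function. */
    static int hook_vfs_readdir(struct file *file, filldir_t filler, void *buf)
    {
        real_filldir = filler;
        return real_vfs_readdir(file, hook_filldir, buf);
    }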
The rootkit does successfully remove itself from the list of loaded
modules. It directly iterates down the kernel's module list and deletes
the entry for itself. That way lsmod will not list the module,
but it also means that it cannot be unloaded, obviating the "careful"
preparations in the hooked functions for that eventuality.
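The self-hiding step is small; something along these lines (a sketch, not the module's code) is all that is needed, and it is also exactly what makes a later unload impossible:

    #include <linux/module.h>
    #include <linux/list.h>

    static void hide_this_module(void)
    {
        /* Unlink the module from the kernel's global module list; lsmod
         * and /proc/modules will no longer show it, but the unload path
         * can no longer find it either. */
        list_del_init(&THIS_MODULE->list);
    }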
As with other malware (botnets in particular), the rootkit has a "command
and control" client. That client contacts a particular server (at a
hosting service in Germany) for information about what to inject in the web
pages. There is some simple, weak encryption used on the link for both
authentication and obfuscation of the message.
Beyond just missing a way to propagate to other systems, the rootkit is
also rather likely to fail to persist after a reboot. It has code to
continuously monitor and alter /etc/rc.local to add an
insmod for the rootkit module. It also hooks vfs_read()
to look for the exact insmod line and adjusts the buffer to hide
that line from anyone looking at the file. But it just appends the command
to rc.local, which means that on a default installation of
Debian Squeeze it
ends up just after an exit 0 line.
Like much of the rest of the rootkit, the HTTP injection handling shows an
odd mix of reasonably sensible choices along with some bugs. It looks at
the first buffer to be sent to the remote side, verifies that its source
port is 80 and that it is not being sent to the loopback address. It also
compares the destination IP address with a list of 1708 search engine IP
addresses, and does no further processing if it is on the list.
One of the bugs that allowed Stack Trace to diagnose the problem is the
handling of status codes. Instead of looking for the 200 HTTP success
code, the rootkit looks for three strings on a blacklist that correspond to HTTP
failures. That
list is not exhaustive, so Stack Trace was able to see the injection in a
400 HTTP error response. Beyond that, the rootkit cleverly handles chunked
Transfer-Encodings and gzip Content-Encodings, though the latter does an
in-kernel decompress-inject-compress cycle that could lead to
noticeable server performance problems.
None of the abilities of the rootkit are particularly novel, though it is
interesting to see them laid bare like this. As should be obvious, a rootkit
can do an awful lot in a Linux system, and has plenty of ways to hide its
tracks. While this rootkit only hid some of its tracks, some of that may
have happened after the initial development. The CrowdStrike conclusion is
instructive here: "Rather, it seems that this is contract work of an
intermediate programmer with no extensive kernel experience, later
customized beyond repair by the buyer."
The question of how the rootkit was installed to begin with is still open.
Given the overall code quality, CrowdStrike is skeptical that some
"custom privilege escalation exploit" was used. That implies
that some known but unpatched vulnerability (perhaps in a web application)
or some kind of credential leak (e.g. the root password or an SSH
key) was the culprit. Until and unless some mass exploit is used to
propagate an upgraded version of the rootkit, it is really only of academic
interest—except, of course, to anyone whose system is already infected.
Comments (16 posted)
Brief items
So far, in most of the drive-by download scenarios an automated injection mechanism is implemented as a simple PHP script. In the case described above, we are dealing with something far more sophisticated - a kernel-mode binary component that uses advanced hooking techniques to ensure that the injection process is more transparent and low-level than ever before. This rootkit, though it's still in the development stage, shows a new approach to the drive-by download schema and we can certainly expect more such malware in the future.
--
Marta Janus
I do stay awake at night worrying that people are tagging my photo on Facebook, which could allow the New York Police Dept to submit a photo of protesters to Facebook and get a list of names and addresses of the people in the photo. Or it could allow the police to track my movements via existing networks of surveillance cameras by matching my image to my name. Would that require a search warrant? How would that impact my trust in my government to know that my movements are being tracked? Or worse, to know they might be tracked but I'll never know if [they] are or aren't?
--
Jamie McClelland
The ITU is the wrong place to make decisions about the future of the Internet.
Only governments have a voice at the ITU. This includes governments that do not support a free and open Internet. Engineers, companies, and people that build and use the web have no vote.
The ITU is also secretive. The treaty conference and proposals are confidential.
--
Google is concerned about a closed-door International Telecommunication Union (ITU) meeting in December
Comments (2 posted)
The "main is usually a function" blog has
a
discussion on the use of "JIT spraying" techniques to attack the
kernel, even when features like supervisor-mode execution prevention are
turned on. "
JIT spraying is a viable tactic when we (the attacker)
control the input to a just-in-time compiler. The JIT will write into
executable memory on our behalf, and we have some control over what it
writes. Of course, a JIT compiling untrusted code will be careful with
what instructions it produces. The trick of JIT spraying is that seemingly
innocuous instructions can be trouble when looked at another way."
Comments (44 posted)
Threatpost
reports
the discovery of a rootkit that targets 64-bit Linux systems.
"
The Linux rootkit does not appear to be a modified version of any
known piece of malware and it first came to light last week when someone
posted a quick description and analysis of it on the Full Disclosure mailing list. That poster said that his site had been targeted by the malware and some of his customers had been redirected to malicious sites."
Comments (13 posted)
New vulnerabilities
java-1.5.0-ibm: two vulnerabilities
Package(s): java-1.5.0-ibm
CVE #(s): CVE-2012-4820, CVE-2012-4822
Created: November 16, 2012
Updated: November 23, 2012
Description: From the Red Hat advisory:
CVE-2012-4820 IBM JDK: java.lang.reflect.Method invoke() code execution
CVE-2012-4822 IBM JDK: java.lang.class code execution
Comments (none posted)
java-1.6.0-ibm: code execution
Package(s): java-1.6.0-ibm
CVE #(s): CVE-2012-4823
Created: November 16, 2012
Updated: November 21, 2012
Description: From the Red Hat advisory:
CVE-2012-4823 IBM JDK: java.lang.ClassLoader defineClass() code execution
Comments (none posted)
java-1.7.0-ibm: code execution
Package(s): java-1.7.0-ibm
CVE #(s): CVE-2012-4821
Created: November 16, 2012
Updated: November 21, 2012
Description: From the Red Hat advisory:
CVE-2012-4821 IBM JDK: getDeclaredMethods() and setAccessible() code execution
Comments (none posted)
kdelibs: multiple vulnerabilities
Package(s): kdelibs
CVE #(s): CVE-2012-4515, CVE-2012-4514
Created: November 16, 2012
Updated: February 18, 2013
Description: From the Fedora advisory:
Bug #865831 - CVE-2012-4515 kdelibs: Use-after-free when context menu being used whilst the document DOM is being changed from within JavaScript
Bug #869681 - CVE-2012-4514 kdelibs (khtml): NULL pointer dereference when trying to reuse a frame with null part
Comments (none posted)
libtiff: code execution
Package(s): tiff
CVE #(s): CVE-2012-4564
Created: November 15, 2012
Updated: December 31, 2012
Description: From the Ubuntu advisory:
Huzaifa S. Sidhpurwala discovered that the ppm2tiff tool incorrectly handled certain malformed PPM images. If a user or automated system were tricked into opening a specially crafted PPM image, a remote attacker could crash the application, leading to a denial of service, or possibly execute arbitrary code with user privileges. (CVE-2012-4564)
Comments (none posted)
libunity-webapps: code execution
Package(s): libunity-webapps
CVE #(s): CVE-2012-4551
Created: November 21, 2012
Updated: November 21, 2012
Description: From the Ubuntu advisory:
It was discovered that libunity-webapps improperly handled certain hash tables. A remote attacker could use this issue to cause libunity-webapps to crash, or possibly execute arbitrary code.
Comments (none posted)
mozilla: multiple vulnerabilities
Package(s): firefox, thunderbird
CVE #(s): CVE-2012-4201, CVE-2012-4202, CVE-2012-4207, CVE-2012-4209, CVE-2012-4210, CVE-2012-4214, CVE-2012-4215, CVE-2012-4216, CVE-2012-5829, CVE-2012-5830, CVE-2012-5833, CVE-2012-5835, CVE-2012-5839, CVE-2012-5840, CVE-2012-5841, CVE-2012-5842
Created: November 21, 2012
Updated: January 8, 2013
Description: From the Red Hat advisory:
Several flaws were found in the processing of malformed web content. A web page containing malicious content could cause Firefox to crash or, potentially, execute arbitrary code with the privileges of the user running Firefox. (CVE-2012-4214, CVE-2012-4215, CVE-2012-4216, CVE-2012-5829, CVE-2012-5830, CVE-2012-5833, CVE-2012-5835, CVE-2012-5839, CVE-2012-5840, CVE-2012-5842)
A buffer overflow flaw was found in the way Firefox handled GIF (Graphics Interchange Format) images. A web page containing a malicious GIF image could cause Firefox to crash or, possibly, execute arbitrary code with the privileges of the user running Firefox. (CVE-2012-4202)
A flaw was found in the way the Style Inspector tool in Firefox handled certain Cascading Style Sheets (CSS). Running the tool (Tools -> Web Developer -> Inspect) on malicious CSS could result in the execution of HTML and CSS content with chrome privileges. (CVE-2012-4210)
A flaw was found in the way Firefox decoded the HZ-GB-2312 character encoding. A web page containing malicious content could cause Firefox to run JavaScript code with the permissions of a different website. (CVE-2012-4207)
A flaw was found in the location object implementation in Firefox. Malicious content could possibly use this flaw to allow restricted content to be loaded by plug-ins. (CVE-2012-4209)
A flaw was found in the way cross-origin wrappers were implemented. Malicious content could use this flaw to perform cross-site scripting attacks. (CVE-2012-5841)
A flaw was found in the evalInSandbox implementation in Firefox. Malicious content could use this flaw to perform cross-site scripting attacks. (CVE-2012-4201)
Comments (none posted)
mysql: multiple unspecified vulnerabilities
Package(s): mysql
CVE #(s): CVE-2012-0540, CVE-2012-1689, CVE-2012-1734, CVE-2012-2749
Created: November 15, 2012
Updated: November 21, 2012
Description: From the Red Hat advisory:
833737 - CVE-2012-2749 mysql: crash caused by wrong calculation of key length for sort order index
841349 - CVE-2012-0540 mysql: unspecified vulnerability related to GIS extension DoS (CPU Jul 2012)
841351 - CVE-2012-1689 mysql: unspecified vulnerability related to Server Optimizer DoS (CPU Jul 2012)
841353 - CVE-2012-1734 mysql: unspecified vulnerability related to Server Optimizer DoS (CPU Jul 2012)
Comments (none posted)
phpmyadmin: cross-site scripting
Package(s): phpmyadmin
CVE #(s): CVE-2012-5339, CVE-2012-5368
Created: November 20, 2012
Updated: November 21, 2012
Description: From the CVE entries:
Multiple cross-site scripting (XSS) vulnerabilities in phpMyAdmin 3.5.x before 3.5.3 allow remote authenticated users to inject arbitrary web script or HTML via a crafted name of (1) an event, (2) a procedure, or (3) a trigger. (CVE-2012-5339)
phpMyAdmin 3.5.x before 3.5.3 uses JavaScript code that is obtained through an HTTP session to phpmyadmin.net without SSL, which allows man-in-the-middle attackers to conduct cross-site scripting (XSS) attacks by modifying this code. (CVE-2012-5368)
Comments (none posted)
python-keyring: weak cryptography
Package(s): python-keyring
CVE #(s): CVE-2012-4571
Created: November 21, 2012
Updated: November 21, 2012
Description: From the Ubuntu advisory:
Dwayne Litzenberger discovered that Python Keyring's CryptedFileKeyring file format used weak cryptography. A local attacker may use this issue to brute-force CryptedFileKeyring keyring files.
Comments (none posted)
ruby: denial of service
Package(s): ruby
CVE #(s): CVE-2012-5371
Created: November 19, 2012
Updated: December 7, 2012
Description: From the Red Hat bugzilla:
Ruby 1.9.3-p327 was released to correct a hash-flooding DoS vulnerability that only affects 1.9.x and the 2.0.0 preview [1].
As noted in the upstream report:
Carefully crafted sequence of strings can cause a denial of service attack on the service that parses the sequence to create a Hash object by using the strings as keys. For instance, this vulnerability affects web application that parses the JSON data sent from untrusted entity.
This vulnerability is similar to CVE-2011-4815 for ruby 1.8.7. ruby 1.9 versions were using modified MurmurHash function but it's reported that there is a way to create sequence of strings that collide their hash values each other. This fix changes the Hash function of String object from the MurmurHash to SipHash 2-4.
Ruby 1.8.x is not noted as being affected by this flaw.
Comments (none posted)
typo3-src: multiple vulnerabilities
Package(s): typo3-src
CVE #(s): (none)
Created: November 16, 2012
Updated: November 21, 2012
Description: From the Debian advisory:
Several vulnerabilities were discovered in TYPO3, a content management system. This update addresses cross-site scripting, SQL injection, and information disclosure vulnerabilities and corresponds to TYPO3-CORE-SA-2012-005.
Comments (none posted)
weechat: code execution
Package(s): weechat
CVE #(s): CVE-2012-5854
Created: November 19, 2012
Updated: November 28, 2012
Description: From the CVE entry:
Heap-based buffer overflow in WeeChat 0.3.6 through 0.3.9 allows remote attackers to cause a denial of service (crash or hang) and possibly execute arbitrary code via crafted IRC colors that are not properly decoded.
Comments (none posted)
xen: multiple vulnerabilities
Package(s): Xen
CVE #(s): CVE-2012-3497, CVE-2012-4535, CVE-2012-4536, CVE-2012-4537, CVE-2012-4538, CVE-2012-4539
Created: November 16, 2012
Updated: December 24, 2012
Description: From the SUSE advisory:
* CVE-2012-4535: xen: Timer overflow DoS vulnerability (XSA 20)
* CVE-2012-4536: xen: pirq range check DoS vulnerability (XSA 21)
* CVE-2012-4537: xen: Memory mapping failure DoS vulnerability (XSA 22)
* CVE-2012-4538: xen: Unhooking empty PAE entries DoS vulnerability (XSA 23)
* CVE-2012-4539: xen: Grant table hypercall infinite loop DoS vulnerability (XSA 24)
* CVE-2012-3497: xen: multiple TMEM hypercall vulnerabilities (XSA-15)
Comments (none posted)
Page editor: Jake Edge
Kernel development
Brief items
The current development kernel is 3.7-rc6,
released on November 16; things have been
slow since then as Linus has gone on vacation. "
I'll have a laptop
with me as I'm away, but if things calm down even further, I'll be
happy. I'll do an -rc7, but considering how calm things have been, I
suspect that's the last -rc. Unless something dramatic happens."
Stable updates: 3.0.52,
3.2.34,
3.4.19, and
3.6.7 were all released on November 17
with the usual set of important fixes.
Comments (none posted)
Sometimes it's scary how many latent bugs we have in the kernel and
how long many of them have been around. At other times, it's
comforting. I mean, there's a pretty good chance that other people
don't notice my screw ups, right?
—
Tejun Heo
End result: A given device only ever crashes exactly once on a given
Windows system.
—
Peter Stuge on why Linux may have to do
the same
I read that line several times and it just keeps sounding like some
chant done by the strikers in Greece over the austerity measures...
"Consolidate a bit! The context switch code!"
"Consolidate a bit! The context switch code!"
"Consolidate a bit! The context switch code!"
"Consolidate a bit! The context switch code!"
I guess because it just sounds Greek to me.
—
Steven Rostedt
After six and a half years of writing and maintaining KVM, it is
time to move to new things.
—
Avi Kivity hands off to Gleb Natapov
Comments (none posted)
Hans Verkuil has posted a report from the meeting of kernel-space media
developers recently held in Barcelona. Covered topics include a new
submaintainer organization, requirements for new V4L2 drivers, asynchronous
loading, and more. "
Basically the number of
patch submissions increased from 200 a month two years ago to 700 a month this
year. Mauro is unable to keep up with that flood and a solution needed to be
found."
Full Story (comments: none)
Kernel development news
By Michael Kerrisk
November 20, 2012
Checkpoint/restore refers to the ability to snapshot the state of an
application (which may consist of multiple processes) and then later
restore the application to a running state, possibly on a different
(virtual) system. Pavel Emelyanov's talk at LinuxCon Europe 2012 provided
an overview of the current status of the checkpoint/restore in user space
(CRIU) system that has been in development
for a couple of years now.
Uses of checkpoint/restore
There are various uses for checkpoint/restore functionality. For
example, Pavel's employer, Parallels, uses it for live migration, which
allows a running application to be moved between host machines without loss
of service. Parallels also uses it for so-called rebootless kernel updates,
whereby applications on a machine are checkpointed to persistent storage
while the kernel is updated and rebooted, after which the applications
are restored; the applications then continue to run, unaware that the
kernel has changed and the system has been restarted.
Another potential use of checkpoint/restore is to speed start-up of
applications that have a long initialization time. An application can be
started and checkpointed to persistent storage after the initialization is
completed. Later, the application can be quickly (re-)started from the
checkpointed snapshot. (This is analogous to the dump-emacs
feature that is used to speed up start times for emacs by creating a
preinitialized binary.)
Checkpoint/restore also has uses in high-performance computing. One
such use is for load balancing, which is essentially another application of
live migration. Another use is incremental snapshotting,
whereby an application's state is periodically checkpointed to persistent
storage, so that, in the event of an unplanned system outage, the
application can be restarted from a recent checkpoint rather than losing
days of calculation.
"You might ask, is it possible to already do all of these things
on Linux right now? The answer is that it's almost possible." Pavel
spent the remainder of the talk describing how the CRIU implementation
works, how close the implementation is to completion, and what work remains
to be done. He began with some history of the checkpoint/restore project.
History of checkpoint/restore
The origins of the CRIU implementation go back to work that started in
2005 as part of the OpenVZ
project. The project provided a set of out-of-mainline patches to the
Linux kernel that supported a kernel-space implementation of
checkpoint/restore.
In 2008, when the first efforts were made to upstream the
checkpoint/restore functionality, the OpenVZ project communicated with a
number of other parties who were interested in the functionality. At the
time, it seemed natural to employ an in-kernel implementation of
checkpoint/restore. A few years' work resulted in a set of more than 100 patches that
implemented almost all of the same functionality as OpenVZ's kernel-based
checkpoint/restore mechanism.
However, concerns from the upstream
kernel developers eventually led to the rejection of the kernel-based
approach. One concern related to the sheer scale of the patches and the
complexity they would add to the kernel: the patches amounted to tens of
thousands of lines and touched a very wide range of subsystems in the
kernel. There were also concerns about the difficulties of implementing
backward compatibility for checkpoint/restore, so that an application could
be checkpointed on one kernel version and then successfully restored on a
later kernel version.
Over the course of about a year, the OpenVZ project then turned its
efforts to developing an implementation of checkpoint/restore that was done
mainly in user space, with help from the kernel where it was needed. In
January 2012, that effort was repaid when Linus Torvalds merged
a first set of CRIU-related patches into the mainline kernel, albeit with an
amusingly skeptical covering note from Andrew Morton:
A note on this: this is a project by various mad Russians to perform
checkpoint/restore mainly from userspace, with various oddball helper code
added into the kernel where the need is demonstrated.
So rather than some large central lump of code, what we have is
little bits and pieces popping up in various places which either
expose something new or which permit something which is normally
kernel-private to be modified.
Since then, two versions of the corresponding user-space tools have
been released: CRIU v0.1 in July, and CRIU v0.2, which added support for Linux Containers (LXC), in
September.
Goal and concept
The ultimate goal of the CRIU project is to allow the entire state of
an application to be dumped (checkpointed) and then later restored. This
is a complex task, for several reasons. First of all, there are many
pieces of process state that must be saved, for example, information about
virtual memory mappings, open files, credentials, timers, process ID,
parent process ID, and so on. Furthermore, an application may consist of
multiple processes that share some resources. The CRIU facility must allow
all of these processes to be checkpointed and restored to the same state.
For each piece of state that the kernel records about a process, CRIU
needs two pieces of support from the kernel. The first piece is a mechanism
to interrogate the kernel about the value of the state, in preparation for
dumping the state during a checkpoint. The second piece is a mechanism to
pass that state back to the kernel when the process is restored. Pavel
illustrated this point using the example of open files. A process may open
an arbitrary set of files. Each open() call results in the
creation of a file descriptor that is a handle to some internal kernel
state describing the open file. In order to dump that state, CRIU needs a
mechanism to ask the kernel which files are opened by that process. To
restore the application, CRIU then re-opens those files using the same
descriptor numbers.
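To make the idea concrete, here is a minimal user-space sketch (not CRIU's actual code) of the restore side: re-creating a regular file descriptor at the number recorded at checkpoint time and seeking it back to its saved offset.

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Re-open `path` with `flags`, park it at descriptor number `fd`,
     * and restore the saved file offset. */
    static int restore_fd(int fd, const char *path, int flags, off_t offset)
    {
        int tmp = open(path, flags);

        if (tmp < 0)
            return -1;
        if (tmp != fd) {
            if (dup2(tmp, fd) < 0) {    /* move it to the saved number */
                close(tmp);
                return -1;
            }
            close(tmp);
        }
        return lseek(fd, offset, SEEK_SET) == (off_t)-1 ? -1 : 0;
    }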
The CRIU system makes use of various kernel APIs for retrieving and
restoring process state, including files in the /proc file system,
netlink sockets, and system calls. Files in /proc can be used to
retrieve a wide range of information about processes and their
interrelationships. Netlink sockets are used both to retrieve and to
restore various pieces of state information.
System calls provide a mechanism to both retrieve and restore various
pieces of state. System calls can be subdivided into two categories.
First, there are system calls that operate only on the process that calls
them. For example, getitimer() can be used to retrieve only
the caller's interval timer value. System calls in this category
can't easily be used to retrieve or restore the state of arbitrary
processes. However, later in his talk, Pavel described a technique that the
CRIU project came up with to employ these calls. The other category of
system calls can operate on arbitrary processes. The system calls
that set process scheduling attributes are an example:
sched_getscheduler() and sched_getparam() can be used to
retrieve the scheduling attributes of an arbitrary process and
sched_setscheduler() can be used to set the attributes of an
arbitrary process.
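A short example of the second category: saving and later restoring another process's scheduling attributes by PID, roughly the pattern a checkpointer can use for this piece of state (the write-to-image step is elided).

    #include <sched.h>
    #include <sys/types.h>

    /* Read a process's scheduling policy and priority, then (after the
     * values have been checkpointed somewhere) put them back. */
    static int checkpoint_and_restore_sched(pid_t pid)
    {
        struct sched_param param;
        int policy = sched_getscheduler(pid);   /* dump: read the policy */

        if (policy < 0 || sched_getparam(pid, &param) < 0)
            return -1;

        /* ... save `policy` and `param.sched_priority` to the image ... */

        return sched_setscheduler(pid, policy, &param);  /* restore */
    }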
CRIU requires kernel support for retrieving each piece of process
state. In some cases, the necessary support already existed. However, in
other cases, there is no kernel API that can be used to interrogate the
kernel about the state; for each such case, the CRIU project must add a
suitable kernel API. Pavel used the example of memory-mapped files to
illustrate this point. The /proc/PID/maps file provides the
pathnames of the files that a process has mapped. However, the file
pathname is not a reliable identifier for the mapped file. For example,
after the mapping was created, filesystem mount points may have been
rearranged or the pathname may have been unlinked. Therefore, in order to
obtain complete and accurate information about mappings, the CRIU
developers added a new kernel API:
/proc/PID/map_files.
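A sketch of what the new interface provides follows: each entry in /proc/PID/map_files is named after a mapping's address range and can be opened directly, yielding the real mapped file even if its pathname has since been unlinked or remounted elsewhere. (As noted below, access to this interface requires CAP_SYS_ADMIN.)

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void dump_mapped_files(pid_t pid)
    {
        char path[64];
        struct dirent *de;
        DIR *d;

        snprintf(path, sizeof(path), "/proc/%d/map_files", pid);
        d = opendir(path);
        if (!d)
            return;

        while ((de = readdir(d)) != NULL) {
            if (de->d_name[0] == '.')
                continue;
            /* d_name is "start-end"; opening it gives a descriptor for
             * the underlying mapped file, whatever its current path. */
            int fd = openat(dirfd(d), de->d_name, O_RDONLY);
            if (fd >= 0) {
                printf("mapping %s -> fd %d\n", de->d_name, fd);
                close(fd);
            }
        }
        closedir(d);
    }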
The situation when restoring process state is often a lot simpler: in
many cases the same API that was used to create the state in the first
place can be used to re-create the state during a restore. However, in
some cases, restoring process state is not so simple. For example,
getpid() can be used to retrieve a process's PID, but there is no
corresponding API to set a process's PID during a restore (the
fork() system call does not allow the caller to specify the PID of
the child process). To address this problem, the CRIU developers added an API that could be used to control the
PID that was chosen by the next fork() call. (In response to a
question at the end of the talk, Pavel noted that in cases where the new
kernel features added to support CRIU have security implications, access to
those features has been restricted by a requirement that the user must have
the CAP_SYS_ADMIN capability.)
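The mechanism merged for controlling the next PID is the /proc/sys/kernel/ns_last_pid sysctl; a minimal sketch of how a restorer can use it, assuming that interface, is shown below. Real code must serialize against other forks in the PID namespace (CRIU handles that race), and writing the file requires the capability mentioned above.

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Arrange for the next fork() in this PID namespace to return `want`
     * by telling the kernel that `want - 1` was the last PID handed out. */
    static pid_t fork_with_pid(pid_t want)
    {
        FILE *f = fopen("/proc/sys/kernel/ns_last_pid", "w");

        if (!f)
            return -1;
        fprintf(f, "%d", want - 1);
        fclose(f);
        return fork();
    }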
Kernel impact and new kernel features
The CRIU project has largely achieved its goal, Pavel said. Instead of
having a large mass of code inside the kernel that does checkpoint/restore,
there are instead many small extensions to the kernel that allow
checkpoint/restore to be done in user space. By now, just over 100
CRIU-related patches have been merged upstream or are sitting in "-next"
trees. Those patches added nine new features to the kernel, of which only
one was specific to checkpoint/restore; all of the others have
turned out to also have uses outside checkpoint/restore.
Approximately 15 further patches are currently being discussed on the
mailing lists; in most cases, the principles have been agreed on by the
stakeholders, but details are being resolved. These "in flight" patches
provide two additional kernel features.
Pavel detailed a few of the more interesting new features added to the
kernel for the CRIU project. One of these was parasite code injection, which was added by
Tejun Heo, "not specifically within the CRIU project, but with the
same intention". Using this feature, a process can be made to
execute an arbitrary piece of code. The CRIU framework employs parasite
code injection to use those system calls mentioned earlier that operate
only on the caller's state, obviating the need to add a range of new
APIs to retrieve and restore various pieces of state of arbitrary
processes. Examples of system calls used to obtain process state via
injected parasite code are getitimer() (to retrieve interval
timers) and sigaction() (to retrieve signal dispositions).
The kcmp() system call was
added as part of the CRIU project. It allows the comparison of various
kernel objects used by two processes. Using this system call, CRIU can
build a full picture of what resources two processes share inside the
kernel. Returning to the example of open files gives some idea of how
kcmp() is useful.
Information about an open file is available via /proc/PID/fd
and the files in /proc/PID/fdinfo. Together, these files reveal
the file descriptor number, pathname, file offset, and open file flags for
each file that a process has opened. This is almost enough information to
be able to re-open the file during a restore. However, one notable piece of
information is missing: sharing of open files. Sometimes, two open file
descriptors refer to the same file structure. That can happen, for
example, after a call to fork(), since the child inherits copies
of all of its parent's file descriptors. As a consequence of this type of
sharing, the file descriptors share file offset and open file flags.
This sort of sharing of open file descriptions can't be restored via
simple calls to open(). Instead, CRIU makes use of the
kcmp() system call to discover instances of file sharing when
performing the checkpoint, and then uses a combination of open()
and file descriptor passing via UNIX domain sockets to re-create the
necessary sharing during the restore. (However, this is far from the full
story for open files, since there are many other attributes associated with
specific kinds of open files that CRIU must handle. For example, inotify
file descriptors, sockets, pseudo-terminals, and pipes all require
additional work within CRIU.)
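A small, self-contained example (not taken from CRIU) of the kind of question
kcmp() answers: after a fork(), do a descriptor in the parent and a
descriptor in the child refer to the same open file description? KCMP_FILE
and the raw system call are real; the surrounding scaffolding is only for
illustration, and the call is subject to the usual ptrace access checks.

    #include <signal.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <linux/kcmp.h>    /* KCMP_FILE; needs 3.5+ kernel headers */

    /* 0 means the two descriptors share one open file description;
     * 1 or 2 indicate an ordering between distinct objects; -1 is an error. */
    static long kcmp_files(pid_t pid1, pid_t pid2, int fd1, int fd2)
    {
        return syscall(SYS_kcmp, pid1, pid2, KCMP_FILE,
                       (unsigned long)fd1, (unsigned long)fd2);
    }

    int main(void)
    {
        pid_t child = fork();

        if (child == 0) {
            pause();            /* keep the inherited descriptors open */
            _exit(0);
        }

        /* Standard output was inherited across fork(), so the comparison
         * should report that both processes share the same file structure. */
        printf("fd 1 shared with child: %s\n",
               kcmp_files(getpid(), child, 1, 1) == 0 ? "yes" : "no");

        kill(child, SIGTERM);
        waitpid(child, NULL, 0);
        return 0;
    }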
Another notable feature added to the kernel for CRIU is
sock_diag. This is a netlink-based subsystem that can be used to
obtain information about sockets. sock_diag is an example of how
a CRIU-inspired addition to the kernel has also benefited other projects.
Nowadays, the ss command, which displays information about sockets
on the system, also makes use of sock_diag. Previously,
ss used /proc files to obtain the information it
displayed. The advantage of employing sock_diag is that, by
comparison with the corresponding /proc files, it is much easier
to extend the interface to provide new information without breaking
existing applications. In addition, sock_diag provides some
information that was not available with the older interfaces. In
particular, before the advent of sock_diag, ss did not
have a way of discovering the connections between pairs of UNIX domain
sockets on a system.
Pavel briefly mentioned a few other kernel features added as part of
the CRIU work. TCP repair mode allows CRIU
to checkpoint and restore an active TCP connection, transparently to the
peer application. Virtualization of network device indices allows virtual
network devices to be restored in a network namespace; it also had the
side-benefit of a small improvement in the speed of network routing. As
noted earlier, the /proc/PID/map_files directory was added for CRIU.
CRIU has also implemented a technique for
peeking at the data in a socket queue, so that the contents of a socket
input queue can be dumped. Finally, CRIU added a number of options to the
getsockopt() system call, so that various options that were
formerly only settable via setsockopt() are now also retrievable.
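To give a flavor of TCP repair mode (the option values below are those merged
for CRIU in Linux 3.5; the helper itself is a sketch, not CRIU's dump code),
a process with CAP_NET_ADMIN can flip a socket into repair mode, read out the
send-queue sequence number, and drop back out without disturbing the peer:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    #ifndef TCP_REPAIR
    #define TCP_REPAIR       19
    #define TCP_REPAIR_QUEUE 20
    #define TCP_QUEUE_SEQ    21
    #endif
    #ifndef TCP_SEND_QUEUE
    #define TCP_SEND_QUEUE   2
    #endif

    /* Read the send-queue sequence number of a connected TCP socket 'fd'.
     * Returns 0 on success, -1 on error. */
    static int dump_send_seq(int fd, unsigned int *seq)
    {
        int on = 1, off = 0, q = TCP_SEND_QUEUE;
        socklen_t len = sizeof(*seq);

        /* Enter repair mode: internal state becomes visible (and settable)
         * without any segments going out on the wire. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)) < 0)
            return -1;

        /* Select the send queue, then fetch its sequence number. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof(q)) < 0 ||
            getsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, seq, &len) < 0) {
            setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &off, sizeof(off));
            return -1;
        }

        /* Leave repair mode; the connection carries on undisturbed. */
        return setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &off, sizeof(off));
    }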
Current status
Pavel then summarized the current state of the CRIU implementation,
looking at what is supported by the mainline 3.6 kernel. CRIU currently
supports (only) the x86-64 architecture. Asked at the end of the talk how
much work would be required to port CRIU to a new architecture, Pavel
estimated that the work should not be large. The main tasks are to
implement code that dumps architecture-specific state (mainly registers)
and reimplement a small piece of code that is currently written in x86
assembler.
Arbitrary process trees are supported: it is possible to dump a process
and all of its descendants. CRIU supports multithreaded applications,
memory mappings of all kinds, and terminals, process groups, and sessions.
Open files are supported, including shared open files, as
described above. Established TCP connections are supported, as are UNIX
domain sockets.
The CRIU user-space tools also support various kinds of non-POSIX
files, including inotify, epoll, and signalfd file descriptors, but the
required kernel-side support is not yet available. Patches for that
support are currently queued, and Pavel hopes that they will be merged for
kernel 3.8.
Testing
The CRIU project tests its work in a variety of ways. First, there is
the ZDTM (zero-down-time-migration) test suite. This test suite consists
of a large number of small tests. Each test program sets up a test before
a checkpoint, and then reports on the state of the tested feature after a
restore. Every new feature merged into the CRIU project adds a test to
this suite.
In addition, from time to time, the CRIU developers take some real
software and test whether it survives a checkpoint/restore. Among the
programs that they have successfully checkpointed and restored are the Apache
web server, MySQL, a parallel compilation of the kernel, tar, gzip, an SSH
daemon with connections, nginx, VNC with XScreenSaver and a client
connection, MongoDB, and tcpdump.
Plans for the near future
The CRIU developers have a number of plans for the near future. (The
CRIU wiki has a TODO list.) First
among these is to complete the coverage of resources supported by CRIU.
For example, CRIU does not currently support POSIX timers. The problem
here is that the kernel doesn't currently provide an API to detect whether
a process is using POSIX timers. Thus, if an application using POSIX
timers is checkpointed and restored, the timers will be lost. There
are some other similar examples. Fixing these sorts of problems will
require adding suitable APIs to the kernel to expose the required state
information.
Another outstanding task is to integrate the user-space crtools into
LXC and OpenVZ to permit live migration of containers. Pavel noted that
OpenVZ already supports live migration, but with its own out-of-tree kernel
modules.
The CRIU developers plan to improve the automation of live migration.
The issue here is that CRIU deals only with process state. However, there
are other pieces of state in a container. One such piece of state is the
filesystem. Currently, when checkpointing and restoring an application, it
is necessary to ensure that the filesystem state has not changed in the
interim (e.g., no files that are open in the checkpointed application have
been deleted). Some scripting around rsync could be implemented to automate
the copying of files from the source system to the destination system,
easing the task of live migration.
One further piece of planned work is to improve the handling of
user-space memory. Currently, around 90% of the time required to
checkpoint an application is taken up by reading user-space memory. For
many use cases, this is not a problem. However, for live migration and
incremental snapshotting, improvements are possible. For example, when
performing live migration, the whole application must first be frozen, and
then the entire memory is copied out to the destination system, after which
the application is restarted on the destination system. Copying out a huge
amount of memory may require several seconds; during that time the
application is unavailable. This situation could be alleviated by allowing
the application to continue to run at the same time as memory is copied to
the destination system, then freezing the application and asking the kernel
which pages of memory have changed since the checkpoint operation began.
Most likely, only a small amount of memory will have changed; those
modified pages can then be copied to the destination system. This could
result in a considerable shortening of the interval during which the
application is unavailable. The CRIU developers plan to talk with the
memory-management developers about how to add support for this
optimization.
Concluding remarks
Although many groups are interested in having checkpoint/restore
functionality, an implementation that works with the mainline kernel has
taken a long time in coming. When one looks into the details and realizes
how complex the task is, it is perhaps unsurprising that it has taken so
long. Along the way, one major effort to solve the
problem—checkpoint/restore in kernel space—was considered and
rejected. However, there are some promising signs that the mad Russians
led by Pavel may be on the verge of success with their alternative approach
of a user-space implementation.
Comments (11 posted)
By Jonathan Corbet
November 20, 2012
At any level of the system, from the hardware to high-level applications,
performance often depends on keeping frequently-used data in a place
where it can be accessed quickly. That is the principle behind hardware
caches, virtual memory, and web-browser image caches, for example. The
kernel already tries to
keep useful filesystem data in the page cache for quick access, but there
can also be advantages to keeping track of "hot" data at the filesystem
level and treating it specially. In 2010, a
data temperature tracking patch set
for the Btrfs filesystem was posted, but then faded from view. Now the
idea has returned as a more general solution.
The current form of the patch set, posted by Zhi Yong Wu, is called
hot-data tracking. It works at the virtual
filesystem (VFS) level, tracking accesses to data and making the resulting
information available to user space via a couple of mechanisms.
The first step is the instrumentation of the VFS to obtain the needed
information. To that
end, Zhi Yong's patch set adds hooks to a number of core VFS functions
(__blockdev_direct_IO(), readpage(),
read_pages(), and do_writepages()) to record specific
access operations. It is worth noting that hooking at this level means
that this subsystem is not tracking data accesses as such; instead, it is
tracking operations that cause actual file I/O. The two are not quite
the same thing: a frequently-read page that remains in the page cache will
generate no I/O; it could look quite cold to the hot-data tracking code.
The patch set uses these hooks to maintain a surprisingly complicated data
structure, involving a couple of red-black trees, that is hooked into a
filesystem's superblock structure. Zhi Yong used this bit of
impressive ASCII art to describe it in the documentation file included with the patch
set:
    [Diagram: hot_inode_tree holds one hot_comm_item per inode (frequency
    data plus a list_head threaded onto heat_inode_map); each per-inode item
    points to a hot_range_tree whose hot_comm_item entries track the hot
    ranges within that file and are threaded onto heat_range_map.]
In short, the idea is to track which inodes are seeing the most I/O
traffic, along with the hottest data ranges within those inodes. The
subsystem can produce a sorted list on demand. Unsurprisingly, this data
structure can end up using a lot of memory on a busy system, so Zhi Yong
has added a shrinker to clean things up when space gets tight. Specific
file information is also dropped after five minutes (by default) with no
activity.
There is a new ioctl() command (FS_IOC_GET_HEAT_INFO)
that can be used to obtain the relevant information for a specific file.
The structure it uses shows the information that is available:
    struct hot_heat_info {
        __u64 avg_delta_reads;
        __u64 avg_delta_writes;
        __u64 last_read_time;
        __u64 last_write_time;
        __u32 num_reads;
        __u32 num_writes;
        __u32 temp;
        __u8 live;
    };
The hot-data tracking subsystem monitors the number of read and write
operations, when the
last operations occurred, and the average period between operations. A
complicated calculation boils all that
information down to a single
temperature value, stored in temp. The live field is an
input parameter to the ioctl() call: if it is non-zero, the
temperature will be recalculated at the time of the call; otherwise a
cached, previously-calculated value will be returned.
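A hypothetical caller might look like the following; the request name and
structure come from the patch set, but since the code has not been merged
there is no standard header, so the include shown here is a guess and the
rest is purely illustrative:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/hot_tracking.h>  /* hypothetical header from the patch set */

    int main(int argc, char *argv[])
    {
        struct hot_heat_info info;
        int fd;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        memset(&info, 0, sizeof(info));
        info.live = 1;   /* ask for a freshly calculated temperature */

        if (ioctl(fd, FS_IOC_GET_HEAT_INFO, &info) < 0) {
            perror("FS_IOC_GET_HEAT_INFO");
            close(fd);
            return 1;
        }
        printf("%s: temp %u, %u reads, %u writes\n",
               argv[1], info.temp, info.num_reads, info.num_writes);
        close(fd);
        return 0;
    }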
The ioctl() call does not provide a way to query which parts of
the file are the hottest, or to get a list of the hottest files. Instead,
the debugfs interface must be used. Once debugfs is mounted, each
device or partition with a mounted filesystem will be represented by a
directory under hot_track/
containing two files. The most active files can be found by reading
rt_stats_inode, while the hottest file ranges can be read from
rt_stats_range. These are the interfaces that user-space
utilities are expected to use to make decisions about, for example, which
files (or portions of files) should be stored on a fast, solid-state drive.
Should a filesystem want to influence how the calculations are done, the
patch set provides a structure (called hot_func_ops) as a place for
filesystem-provided functions to calculate access frequencies,
temperatures, and when information should be aged out of the system. In
the posted patch set, though, only Btrfs uses the hot-data tracking
feature, and it does not override any of those operations, so it is not
entirely clear why they exist. The changelog states that support for ext4
and xfs has been implemented; perhaps one of those filesystems needed that
capability.
The patch set has been through several review cycles and a lot of changes
have been made in response to comments. The list of things still to be
done includes scalability
testing, a simpler temperature calculation function, and the ability to
save file temperature data across an unmount. If nothing else, some solid
performance information will be required before this patch set can be
merged into the core VFS code. So hot-data tracking is not 3.8 material,
but it may be ready for one of the subsequent development cycles.
Comments (1 posted)
By Jake Edge
November 21, 2012
Inserting a loadable module into the running kernel is a potential security
problem, so
some administrators want to be able to restrict which modules are allowed.
One way to do that is to cryptographically sign modules and have the kernel
verify that signature before loading the module.
Module signing isn't for everyone, and
those who aren't interested
probably don't want to pay much of a price for that new feature. Even
those who are interested will want to minimize that price. While
cryptographically signing kernel modules can provide a security benefit,
that boon comes with a cost: slower kernel builds. When that cost is
multiplied across a vast
number of kernel builds, it draws some attention.
David Miller complained
on Google+
about the cost of module signing in mid-October. Greg Kroah-Hartman agreed
in the
comments, noting that
an allmodconfig build took more than 10% longer between 3.6 and 3.7-rc1.
The problem is the addition of module signing to the build process.
Allmodconfig builds the kernel with as many modules as possible, which has
the effect of build-testing nearly all of the kernel. Maintainers like
Miller and Kroah-Hartman do that kind of build frequently, typically after
each patch they apply, in order to ensure that the kernel still builds.
Module signing can, of course, be turned off using
CONFIG_MODULE_SIG, but that adds a manual configuration step to
the build
process, which is annoying.
Linus Torvalds noted Miller's complaint and offered up a "*much*
simpler" solution: defer module
signing until install time. There is already a mechanism to strip
modules during the make modules_install step. Torvalds's
change adds module signing into that step, which means that you don't pay
the signing price until you actually install the modules. There are some
use cases that would not be supported by this change, but Torvalds
essentially dismissed them:
Sure, it means that if you want to load modules directly from your
kernel build tree (without installing them), you'd better be running a
kernel that doesn't need the signing (or you need to sign things
explicitly). But seriously, nobody cares. If you are building a module
after booting the kernel with the intention of loading that modified
module, you aren't going to be doing that whole module signing thing
*anyway*. Signed modules make sense when building the kernel and
module together, so signing them as we install the kernel and module
is just sensible.
One of the main proponents behind the module signing feature over
the years has been David Howells; his code was used as the basis for module
maintainer Rusty Russell's signature
infrastructure patch. But, Howells was not
particularly happy with Torvalds's changes. He would like to be able
to handle some of the use cases that
Torvalds dismissed, including loading modules from the kernel build
tree. He thinks that automatic
signing
should probably just be removed from the build process; a script could be
provided
to do signing manually.
Howells is looking at the signed modules problem from a distribution view.
Currently, the keys
used to sign modules can be auto-generated at build time, with the public key
getting built into the kernel and the private portion being used for
signing—and then likely deleted once the build finishes. That isn't
how distributions will do things, so
auto-generating keys concerns Howells:
It would also be nice to get rid of the key autogeneration stuff. I'm not
keen on the idea of unparameterised key autogeneration - anyone signing their
modules should really supply the appropriate address elements.
That may make sense for distributions or those who will be using long-lived
keys, but it makes little sense for a more basic use case. With
characteristic bluntness, Torvalds pointed that
out:
You seem to dismiss the "people
want to build their own kernel" people entirely.
One of the main sane use-cases for module signing is:
- CONFIG_CHECK_SIGNATURE=y
- randomly generated one-time key
- "make modules_install; make install"
- "make clean" to get rid of the keys.
- reboot.
and now you have a custom kernel that has the convenience of modules,
yet is basically as safe as a non-modular build. The above makes it
much harder for any kind of root-kit module to be loaded, and
basically entirely avoids one fundamental security scare of modules.
Kroah-Hartman agreed with the need to
support the use case Torvalds described,
though he noted that keys are not removed by make clean,
which he considered a bit worrisome. It
turns out that make clean is documented to leave the files
needed to build modules, so make distclean should be
used to get rid of the key files.
Russell, who has always been a bit skeptical of module signing, pointed out that Torvalds's use case could be
handled by just storing the hashes of the modules in the kernel—no
cryptography necessary. While that's true, Russell's scheme would disallow some
other use cases. Signing provides flexibility, Torvalds said, and is "technically the right
thing to do". Russell countered:
It's 52k of extra text to get that 'nice flexible'; 1% of my
kernel image. That's a lot of bug free code.
Russell's concerns
notwithstanding, it is clear that module signing is here to stay.
Torvalds's change was added for 3.7 (with some additions by Russell and
Howells). For
distributions, Josh Boyer has a patch that
will add a "modules_sign" target. It will operate on the modules
in their installed location (i.e. after a modules_install), and remove
the signature, which will allow the distribution packaging system (e.g. RPM) to
generate debuginfo for the modules before re-signing them. In that way,
distributions can use Torvalds's solution at the cost of signing modules
twice. Since that process should be far less frequent than developers
building kernels (or build farms building kernels or ...), the tradeoff is
likely to be well worth the small amount of pain.
Comments (none posted)
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Memory management
Networking
Architecture-specific
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet
Distributions
By Jonathan Corbet
November 21, 2012
The ability to fork a project is one of the fundamental freedoms that come
with free software. If a software project is heading in a direction that
is less than ideal for its users or developers, a competing version can be
created and managed in a more useful manner. Forking has been used to
great effect with projects like Emacs, GCC, OpenOffice.org, and XFree86.
The most successful forks have specific goals in mind and tend to attract
at least a significant subset of the original project's developers. Other
types of forks face a harder road. Arguably, a recently launched fork of
the
udev utility under the aegis of the Gentoo project is of the
latter variety.
On November 13, the Gentoo council met to discuss, among other things, how
to support systems that are configured with the /usr directory on
a separate partition. The meeting minutes
show that much of the discussion centered around a new udev fork that, it
was hoped, might help to solve this problem. The existence of a new udev
fork (now called eudev)
within the Gentoo project took some developers by surprise, especially when
the associated README file was observed to claim that it was a "Gentoo
sponsored" project. This surprise led Greg Kroah-Hartman (a longtime
Gentoo developer) to ask what the goals of
the fork were. Getting an answer to that question turned out to be harder
than one might have expected.
One of the developers behind eudev is Richard Yao; his response really needs to be read in its
original form:
If we were using the waterfall model, I could outline some very
nice long term goals for you, but we are doing AGILE development,
so long term goals have not been well defined. Some short term
goals have been defined, but I imagine that you are already
familiar with them.
After extensive discussion with a lengthy digression on copyright law (the
eudev developers removed some copyright notices in a way that drew
objections), some of the project's motivations came into a bit of focus.
Part of the problem is certainly the increased integration between udev and
systemd. Udev is still easily built as a standalone binary, but some
people worry that this situation might change in the future; mere
association with systemd seems to be enough to provoke a response in some
people.
That response carries over to the ongoing unhappiness over the deprecation of
/usr on a separate partition. The developers involved claim that
this configuration has not been supported for years and cannot be counted
on to work. In truth, though, it does work for a lot of people, and those
people are feeling like they are having a useful option taken away from
them. Whether a fork like eudev can address those concerns remains to be seen.
Beyond that, a recent switch to the "kmod" library for
the loading of kernel modules has created a certain amount of irritation;
backing
out that feature is one of the first changes made by the eudev
developers. Udev uses kmod for a reason: avoiding modprobe calls
speeds the bootstrap process considerably. But Gentoo developers like fine
control over their systems, and some of them want to use that control to
exclude kmod, which they see as unnecessary bloat or even a potential
security problem. If udev requires kmod, that desire is thwarted, so the
change has to come out.
There was also some discontent over the firmware loading difficulties caused by a udev
change earlier this year. That problem has since been fixed; indeed, it
has been fixed twice: once by loading firmware directly from the kernel,
and once in udev. But some developers have not forgotten the incident and
feel that the current udev maintainers cannot be trusted.
In truth, a bit of concern is understandable. The eudev developers point
to statements like this
one from Lennart Poettering:
Yes, udev on non-systemd systems is in our eyes a dead end, in case
you haven't noticed it yet. I am looking forward to the day when we
can drop that support entirely.
After reading that, it is natural to wonder if the current udev maintainers
can truly be trusted to look after the interests of users who do not plan
to switch to systemd. From there, it is not too hard to imagine maintaining
a fork of udev as an "insurance policy"
against misbehavior in the future.
That said, a better insurance policy might be to establish oneself as a
participating and respected member of the current systemd/udev development
community. The strong personalities found there notwithstanding, it is an
active community with developers representing a number of distributions. A
developer who can work within that group should be able to represent the
interests of a distribution like Gentoo nicely while avoiding the costs of
maintaining a fork of a crucial utility. And, should that strategy fail,
creating a fork of udev remains an option in the future.
But nobody can tell free software developers what they can or cannot work
on, and nobody is trying to tell the eudev developers that creating their
own udev fork is wrong. The situation becomes a bit less clear, though, if
eudev is destined to replace udev within Gentoo itself; then Gentoo users
may well find they have an opinion on the matter. For now, no such
replacement has happened. If it begins to look like that situation could
change, one can imagine that the resulting discussion in the Gentoo
community will be long and entertaining.
Comments (41 posted)
Brief items
The Linux Mint team has
released Linux Mint 14
"Nadia". "
For the first time since Linux Mint 11, the development team was able to capitalize on upstream technology which works and fits its goals. After 6 months of incremental development, Linux Mint 14 features an impressive list of improvements, increased stability and a refined desktop experience. We’re very proud of MATE, Cinnamon, MDM and all the components used in this release, and we’re very excited to show you how they all fit together in Linux Mint 14."
Comments (none posted)
Distribution News
Fedora
Fedora elections for seats on the advisory board, FESCo (Fedora Engineering
Steering Committee) and FAmSCo (Fedora Ambassadors Steering Committee) are
underway. The candidates' responses to questions from the community are
available.
Full Story (comments: none)
A release name for Fedora 19 has been selected. F19 will also be known as Schrödinger's Cat.
Full Story (comments: 3)
Newsletters and articles of interest
Comments (none posted)
Ars Technica has a
review
of Ubuntu 12.10, or Quantal Quetzal. "
One of the more intriguing desktop features for Ubuntu 12.10 is the inclusion of the Web Apps feature trialed in Ubuntu 12.04. Web apps are controls that support various popular Web tools, such as Gmail, Twitter, and Google Docs. Ubuntu includes two such Web apps out of the box: Amazon and Ubuntu One."
Comments (none posted)
Page editor: Rebecca Sobol
Development
By Michael Kerrisk
November 20, 2012
H. Peter Anvin has been involved with Linux for more than 20
years, and is currently one of the x86 architecture maintainers. During his
work on Linux, one of his areas of interest has been the generation and use
of random numbers, and his talk at LinuxCon Europe 2012 was designed to
address a lot of misunderstandings that he has encountered regarding random
numbers. The topic is complex, and so, for the benefit of experts in the
area, he noted that "I will be making some
simplifications". (In addition, your editor has attempted to fill
out some details that Peter mentioned only briefly, and so may have added
some "simplifications" of his own.)
Random numbers
Possibly the first use of random numbers in computer programs was for
games. Later, random numbers were used in Monte Carlo
simulations, where randomness can be used to mimic physical processes. More
recently, random numbers have become an essential component in security
protocols.
Randomness is a subtle property. To illustrate this, Peter displayed a
photograph of three icosahedral dice that he'd thrown at home, saying
"here, if you need a random number, you can use 846". Why
doesn't this work, he asked. First of all, a random number is only random
once. In addition, it is only random until we know what it is. These
facts are not the same thing. Peter noted that it is possible to misuse a
random number by reusing it; this can lead to
breaches in security protocols.
There are no tests for randomness. Indeed, there is a yet-to-be-proved
mathematical conjecture that there are no tractable tests of randomness.
On the other hand, there are tests for some kinds of
non-randomness. Peter noted that, for example, we can probably quickly
deduce that the bit stream 101010101010… is not random. However,
tests don't prove randomness: they simply show that we haven't detected any
non-randomness. Writing reliability tests for random numbers requires an
understanding of the random number source and the possible failure modes
for that source.
Most games that require some randomness will work fine even if the
source of randomness is poor. However, for some applications, the quality
of the randomness source "matters a lot".
Why does getting good randomness matter? If the random numbers used to
generate cryptographic keys are not really random, then the keys will be
weak and subject to easier discovery by an attacker. Peter noted a couple
of recent cases of poor random number handling in the Linux ecosystem. In
one of these cases, a well-intentioned
Debian developer reacted to a warning from a code analysis tool by removing
a fundamental part of the random number generation in OpenSSL. As a
result, Debian systems for a long time generated SSH keys drawn from a set
of only 32,767 possibilities. Enumerating that set of keys
is, of course, a trivial task. The resulting security bug went unnoticed
for more than a year. The problem is, of course, that unless you are
testing for this sort of failure in the randomness source, good random
numbers are hard to distinguish from bad random numbers. In another case,
certain embedded Linux devices have been known to generate a key before
they could generate good random numbers. A weakness along these lines
allowed the Sony PS3 root key to
be cracked [PDF] (see pages 122 to 130 of that presentation, and also
this video of the
presentation, starting at about 35'24").
Poor randomness can also be a problem for storage systems that depend
on some form of probabilistically unique identifiers. Universally unique
IDs (UUIDs) are the classic example. There is no theoretical guarantee
that UUIDs are unique. However, if they are properly generated from truly
random numbers, then, for all practical purposes, the chance of two UUIDs
being the same is virtually zero. But if the source of random numbers is
poor, this is no longer guaranteed.
Of course, computers are not random; hardware manufacturers go to great
lengths to ensure that computers behave reliably and deterministically. So,
applications need methods to generate random numbers. Peter then turned to
a discussion of two types of random number generators (RNGs): pseudo-random
number generators and hardware random number generators.
Pseudo-random number generators
The classical solution for generating random numbers is the so-called
pseudo-random number generator (PRNG):
A PRNG has two parts: a state, which is some set of bits determined by
a "seed" value, and a chaotic process that operates on the state and
updates it, at the same time producing an output sequence of numbers. In
early PRNGs, the size of the state was very small, and the chaotic process
simply multiplied the state by a constant and discarded the high bits.
Modern PRNGs have a larger state, and the chaotic process is usually a
cryptographic primitive. Because cryptographic primitives are usually
fairly slow, PRNGs using non-cryptographic primitives are still sometimes
employed in cases where speed is important.
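As a concrete illustration of that early style of generator (this is generic
textbook code, not something from the talk), a linear congruential PRNG keeps
a small amount of state, multiplies it by a constant on each step, and
returns only a portion of the state's bits:

    #include <stdio.h>

    static unsigned long lcg_state = 1;   /* the state, set by the "seed" */

    static void lcg_seed(unsigned long seed)
    {
        lcg_state = seed;
    }

    static int lcg_rand(void)
    {
        /* Multiply by a constant, add an increment, keep only part of
         * the result; the constants are the traditional rand() values. */
        lcg_state = lcg_state * 1103515245UL + 12345UL;
        return (int)((lcg_state / 65536) % 32768);
    }

    int main(void)
    {
        int i;

        lcg_seed(42);   /* the same seed always yields the same sequence */
        for (i = 0; i < 5; i++)
            printf("%d\n", lcg_rand());
        return 0;
    }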
The quality of PRNGs is evaluated on a number of criteria. For example,
without knowing the seed and algorithm of the PRNG, is it possible to
derive any statistical patterns in or make any predictions about the output
stream? One statistical property of particular interest in a PRNG is its
cycle length. The cycle length tells us how many numbers can be generated
before the state returns to its initial value; from that point on, the PRNG
will repeat its output sequence. Modern PRNGs generally have extremely
long cycle lengths. However, some applications still use
hardware-implemented PRNGs with short cycle lengths, because they don't
really need high-quality randomness. Another property of PRNGs that is of
importance in security applications is whether the PRNG algorithm is
resistant to analysis: given the output stream, is it possible to figure
out what the state is? If an attacker can do that, then it is possible to
predict future output of the PRNG.
Hardware random number generators
The output of a PRNG is only as good as its seed and algorithm, and
while it may pass all known tests for non-randomness, it is not truly
random. But, Peter noted, there is a source of true randomness in the
world: quantum mechanics. This fact can be used to build hardware "true"
random number generators.
A hardware random number generator (HWRNG) consists of a number of
components, as shown in the following diagram:
Entropy is a measure
of the disorder, or randomness, in a system. An entropy source is a device
that "harvests" quantum randomness from a physical system. The process of
harvesting the randomness may be simple or complex, but regardless of that,
Peter said, you should have a good argument as to why the harvested
information truly derives from a source of quantum randomness. Within a
HWRNG, the entropy source is necessarily a hardware component, but the
other components may be implemented in hardware or software. (In Peter's
diagrams, redrawn and slightly modified for this article, yellow indicated
a hardware component, and blue indicated a software component.)
Most entropy sources don't produce "good" random numbers. The source
may, for example, produce ones only 25% of the time. This doesn't negate
the value of the source. However, the "obvious" non-randomness must be
eliminated; that is the task of the conditioner.
The output of the conditioner is then fed into a cryptographically
secure pseudo-random number generator (CSPRNG). The reason for doing
this is that we can better reason about the output of a CSPRNG; by
contrast, it is difficult to reason about the output of the entropy source.
Thus, it is possible to say that the resulting device is at least as secure
as a CSPRNG, but, since we have a constant stream of new seeds, we can be
confident that it is actually a better source of random numbers than a
CSPRNG that is seeded less frequently.
The job of the integrity monitor is to detect failures in the entropy
source. It addresses the problem that entropy sources can fail silently.
For example, a circuit in the entropy source might pick up an induced
signal from a wire on the same chip, with the result that the source
outputs a predictable pattern. So, the job of the integrity monitor is to
look for the kinds of failure modes that are typical of this kind of
source; if failures are detected, the integrity monitor produces an error
indication.
There are various properties of a HWRNG that are important to users.
One of these is bandwidth—the rate at which output bits are produced.
HWRNGs vary widely in bandwidth. At one end, the physical drum-and-ball
systems used in some public lottery draws produce at most a few numbers per
minute. At the other end, some electronic hardware random number sources
can generate output at the rate of gigabits per second. Another important
property of HWRNGs is resistance to observation. An attacker should not be
able to look into the conditioner or CSPRNG and figure out the future
state.
Peter then briefly looked at a couple of examples of entropy sources.
One of these is the recent Bull
Mountain Technology digital random number generator produced by his
employer (Intel). This device contains a logic circuit that is forced into
an impossible state between 0 and 1, until the circuit is released by a CPU
clock cycle, at which point quantum thermal noise forces the circuit
randomly to zero or one. Another example of a hardware random number
source—one that has actually been
used—is a lava lamp. The motion of the liquids inside a lava
lamp is a random process driven by thermal noise. A digital camera can be
used to extract that randomness.
The Linux kernel random number generator
The Linux kernel RNG has the following basic structure:
The structure consists of a two-level cascaded sequence of pools
coupled with CSPRNGs. Each pool is a large group of bits which represents
the current state of the random number generator. The CSPRNGs are
currently based on SHA-1,
but the kernel developers are considering a switch to SHA-3.
The kernel RNG produces two user-space output streams. One of these
goes to /dev/urandom and also to the kernel itself; the latter
is useful because there are uses for random numbers within the kernel. The
other output stream goes to /dev/random. The difference between
the two is that /dev/random tries to estimate how much entropy is
coming into the system, and will throttle its output if there is
insufficient entropy. By contrast, the /dev/urandom stream does
not throttle output, and if users consume all of the available entropy, the
interface degrades to a pure CSPRNG.
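As a small aside (the file below was not part of the talk, but it illustrates
the estimate being described), the kernel exports its current entropy
estimate, in bits, via /proc/sys/kernel/random/entropy_avail; this is the
figure that governs when /dev/random throttles its output:

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
        int bits;

        if (f == NULL) {
            perror("entropy_avail");
            return 1;
        }
        if (fscanf(f, "%d", &bits) != 1) {
            fprintf(stderr, "unexpected format\n");
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("kernel entropy estimate: %d bits\n", bits);
        return 0;
    }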
Starting with Linux 3.6, if the system provides an architectural HWRNG,
then the kernel will XOR the output of the HWRNG with the output of each
CSPRNG. (An architectural HWRNG is a complete random number
generator designed into the hardware and guaranteed to be available in
future generations of the chip. Such a HWRNG makes its output stream
directly available to user space via dedicated assembler instructions.)
Consequently, Peter said, the output will be even more secure. A member of
the audience asked why the kernel couldn't just do away with the existing
system and use the HWRNG directly. Peter responded that some people had
been concerned that if the HWRNG turned out not to be good enough, then
this would result in a security regression in the kernel. (It is also worth
noting that some
people have wondered whether the design of HWRNGs may have been
influenced by certain large government agencies.)
The inputs for the kernel RNG are shown in this diagram:
In the absence of anything else, the kernel RNG will use the timings of
events such as hardware interrupts as inputs. In addition, certain fixed
values such as the network card's MAC address may be used to initialize the
RNG, in order to ensure that, in the absence of any other input, different
machines will at least be seeded with distinct values.
The rngd
program is a user-space daemon whose job is to feed inputs (normally from
the HWRNG driver, /dev/hwrng) to the kernel RNG input pool,
after first performing some tests for non-randomness in the input. If
there is a HWRNG with a kernel driver, then rngd will use it as its input
source. In a virtual machine, the driver is also capable of taking random
numbers from the host system via the virtio system. Starting with Linux
3.7, randomness can also be harvested from the HWRNG of the Trusted
Platform Module (TPM) if the system has one. (In kernels before 3.7,
rngd can access the TPM directly to obtain randomness, but doing
so means that the TPM can't be used for any other purpose.) If the system
has an architectural HWRNG, then rngd can harvest randomness from
it directly, rather than going through the HWRNG driver.
Administrator recommendations
"You really, really want to run rngd", Peter
said. It should be started as early as possible during system boot-up, so
that the applications have early access to the randomness that it provides.
One thing you should not do is the following:
rngd -r /dev/urandom
Peter noted that he had seen this command in several places on the web.
Its effect is to connect the output of the kernel's RNG back into itself,
fooling the kernel into believing it has an endless supply of entropy.
HAVEGE
(HArdware Volatile Entropy Gathering and Expansion) is a piece of
user-space software that claims to extract entropy from CPU nondeterminism.
Having read a number of papers about HAVEGE, Peter said he had been unable
to work out whether this was a "real thing". Most of the papers that he
has read run along the lines, "we took the output from HAVEGE, and
ran some tests on it and all of the tests passed". The problem with
this sort of reasoning is the point that Peter made earlier: there are no
tests for randomness, only for non-randomness.
One of Peter's colleagues replaced the random input source employed by
HAVEGE with a constant stream of ones. All of the same tests passed. In
other words, all that the test results are guaranteeing is that the HAVEGE
developers have built a very good PRNG. It is possible that HAVEGE does
generate some amount of randomness, Peter said. But the problem is that
the proposed source of randomness is simply too complex to analyze; thus it
is not possible to make a definitive statement about whether it is truly
producing randomness. (By contrast, the HWRNGs that Peter described earlier
have been analyzed to produce a quantum theoretical justification that they
are producing true randomness.) "So, while I can't really recommend it, I
can't not recommend it either." If you are going to run HAVEGE,
Peter strongly recommended running it together with rngd,
rather than as a replacement for it.
Guidelines for application writers
If you are writing applications that need to use a lot of randomness,
you really want to use a cryptographic library such as OpenSSL, Peter said.
Every cryptographic library has a component for dealing with random
numbers. If you need just a little bit of randomness, then just use
/dev/random or /dev/urandom. The difference between the
two is how they behave when entropy is in short supply. Reads from
/dev/random will block until further entropy is available. By
contrast, reads from /dev/urandom will always immediately return
data, but that data will degrade to a CSPRNG when entropy is exhausted.
So, if you would prefer that your application fail rather than get
randomness that is not of the highest quality, then use
/dev/random. On the other hand, if you want to always be able to
get a non-blocking stream of (possibly pseudo-) random data, then use
/dev/urandom.
"Please conserve randomness!", Peter said. If you are
running recent hardware that has a HWRNG, then there is a virtually
unlimited supply of randomness. But, the majority of existing hardware
does not have the benefit of a HWRNG. Don't use buffered I/O on
/dev/random or /dev/urandom. The effect of performing a
buffered read on one of these devices is to consume a large amount of the
possibly limited entropy. For example, the C library stdio
functions operate in buffered mode by default, and an initial read will
consume 8192 bytes as the library fills its input buffer. A well written
application should use non-buffered I/O and read only as much randomness as
it needs.
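Putting that advice together, a well-behaved consumer might look something
like this sketch: it reads from /dev/urandom through an unbuffered file
descriptor and takes exactly the sixteen bytes it needs (the "session key"
here is, of course, just an illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned char key[16];    /* read only as much as is needed */
        size_t got = 0;
        int fd = open("/dev/urandom", O_RDONLY);   /* no stdio buffering */

        if (fd < 0) {
            perror("open");
            return 1;
        }
        while (got < sizeof(key)) {
            ssize_t n = read(fd, key + got, sizeof(key) - got);
            if (n <= 0) {
                perror("read");
                close(fd);
                return 1;
            }
            got += (size_t)n;
        }
        close(fd);

        for (got = 0; got < sizeof(key); got++)
            printf("%02x", key[got]);
        putchar('\n');
        return 0;
    }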
Where possible, defer the extraction of randomness as late as possible
in an application. For example, in a network server that needs randomness,
it is preferable to defer extraction of that randomness until (say) the
first client connection, rather than extracting it when the server
starts. The problem is that most servers start early in the boot process,
and at that point, there may be little entropy available, and many
applications may be fighting to obtain some randomness. If the randomness
is being extracted from /dev/urandom, then the effect will be that
the randomness may degrade to that of a pure CSPRNG. If the randomness is being
extracted from /dev/random, then reads will block until the system
has generated enough entropy.
Future kernel work
Peter concluded his talk with a discussion of some upcoming kernel work.
One of these pieces of work is the implementation of a policy interface
that would allow the system administrator to configure certain aspects of
the operation of the kernel RNG. So, for example, if the administrator
trusts the chip vendor's HWRNG, then it should be possible to configure the
kernel RNG to take its input directly from the HWRNG. Conversely, if you
are a paranoid system administrator who doesn't trust the HWRNG, it should be
possible to disable use of the HWRNG. These ideas were discussed by some
of the kernel developers who were present at the 2012 Kernel Summit; the
interface is still being architected, and will probably be available
sometime next year.
Another piece of work is the completion of the virtio-rng system. This
feature is useful in a virtual-machine environment. Guest operating
systems typically have few sources of entropy. The virtio-rng system is a
mechanism for the host operating system to provide entropy to a guest
operating system. The guest side of this work was already completed in
2008. However, work on the host side (i.e., QEMU and KVM) got stalled for
various reasons; hopefully, patches to complete the host side will be
merged quite soon, Peter said.
A couple of questions at the end of the talk concerned the problem of
live cloning a virtual machine image, including its memory (i.e., the
virtual machine is not shut down and rebooted). In this case, the
randomness pool of the cloned kernel will be duplicated in each virtual
machine, which is a security problem. There is currently no way to
invalidate the randomness pool in the clones, by (say) setting the clone's
entropy count to zero. Peter thinks a (kernel) solution to this problem is
probably needed.
Concluding remarks
Interest in high-quality random numbers has increased in parallel
with the increasing demands to secure stored data and network
communications with high-quality cryptographic keys. However, as various
examples in Peter's talk demonstrated, there are many pitfalls to be wary
of when dealing with random numbers. As HWRNGs become more prevalent, the
task of obtaining high-quality randomness will become easier. But even if
such hardware becomes universally available, it will still be necessary,
for a long time to come, to deal with legacy systems that don't have a
HWRNG. Furthermore, even with a near-infinite supply of randomness it is
still possible to make subtle but dangerous errors when dealing with random
numbers.
Comments (82 posted)
Brief items
I think that it'd be cool to have our community be the
community of people who can go wild on the platform - "let a thousand
flowers bloom". That the core GNOME project is solid and useful, but
that we encourage experimentation, respins, freedom for our users. That
seems inconsistent with the current GNOME messaging.
—
Dave Neary
Our patent system is the envy of the world.
—
David Kappos, head of
the United States Patent and Trademark Office
Comments (9 posted)
Ubuntu's James Hunt announced the release of version 1.6 of the Upstart event-driven init system. This release adds support for initramfs-less booting, sbuild tests, and a stateful re-exec, which allows Upstart "to continue to supervise jobs after an upgrade of either itself, or any of its dependent libraries."
Full Story (comments: 1)
Mozilla has released Thunderbird 17. According to the release notes, this version includes layout changes for RSS feeds and for tabs on Windows, and drops support for Mac OS X 10.5, not to mention the usual bundle of bugfixes and minor enhancements.
Full Story (comments: none)
Firefox 17 has been
released. The
release
notes have all the details. Firefox 17 for Android has also been
released, with
separate
release notes.
Comments (17 posted)
At his blog, Allan Day outlines
the next phase of GNOME 3's user experience development, which focuses
on "content applications." The project is aiming to make
it "quicker and less laborious for people to find
content" and subsequently organize it. "To this end,
we’re aiming to build a suite of new GNOME content applications:
Music, Documents, Photos, Videos and Transfers. Each of these
applications aims to provide a quick and easy way to access content,
and will seamlessly integrate with the cloud." New mockups are
available on the GNOME wiki.
Comments (67 posted)
GNOME developer Matthias Clasen has announced that, with the upcoming
demise of "fallback mode," the project will support a set of official GNOME
Shell extensions to provide a more "classic" experience. "
And while
we certainly hope that many users will find the new ways comfortable and
refreshing after a short learning phase, we should not fault people who
prefer the old way. After all, these features were a selling point of
GNOME 2 for ten years!"
Full Story (comments: 204)
Newsletters and articles
Comments (none posted)
Linux color management developers met in Brno, Czech Republic over
the weekend of November 10, and the lead developers from two teams
have subsequently published their recaps of the event: Kai-Uwe
Behrmann of Oyranos, and Richard
Hughes of colord. Developers from GIMP, Taxi, and a host of
printing-related projects were also on hand.
Comments (none posted)
As we start to see more UEFI firmware become available, one would guess we'll find more exciting weirdness like what Matthew Garrett
found. For whatever reason, the firmware in a Lenovo Thinkcentre M92p only wants to boot Windows or Red Hat Enterprise Linux (and, no, it is not secure boot related): "
Every UEFI boot entry has a descriptive string. This is used by the firmware when it's presenting a menu to users - instead of "Hard drive 0" and "USB drive 3", the firmware can list "Windows Boot Manager" and "Fedora Linux". There's no reason at all for the firmware to be parsing these strings. But the evidence seemed pretty strong - given two identical boot entries, one saying "Windows Boot Manager" and one not, only the first would work."
Comments (39 posted)
Libre Graphics World takes
a look at Movit, a new C++ library for video processing on
GPUs. "Movit does all the processing on a GPU with GLSL fragment
shaders. On a GT9800 with Nouveau drivers I was able to get 10.9 fps
(92.1 ms/frame) for a 2971×1671px large image." The library
currently performs color space conversions plus an assortment of
general filters, and is probably best suited for non-linear video
editor projects.
Comments (none posted)
Page editor: Nathan Willis
Announcements
Brief items
The Free Software Foundation Europe looks at a white paper from the German
Ministry of the Interior about "Trusted Computing" and "Secure Boot".
"
The white paper says that "device owners must be in complete control
of (able to manage and monitor) all the trusted computing security systems
of their devices." This has been one of FSFE's key demands from the
beginning. The document continues that "delegating this control to third
parties requires conscious and informed consent by the device owner"."
Full Story (comments: 2)
The Mozilla Foundation has released its
2011
annual report in a "collection of boxes" format that reminds one of
a recent proprietary operating system release. "
In June 2012 we
released an update to Firefox for Android that we believe is the best
browser for Android available. We completely rebuilt and redesigned the
product in native UI, resulting in a snappy and dynamic upgrade to mobile
browsing that is significantly faster than the Android stock
browser."
Comments (1 posted)
Articles of interest
The November edition of the Linux Foundation Monthly Newsletter covers the
Automotive Grade Linux Workgroup, HP's platinum membership, open clouds,
and several other topics.
Full Story (comments: none)
James Bottomley's
UEFI
bootloader signing experience is worth a read...still a few glitches in
the system. "
Once the account is created, you still can’t upload
UEFI binaries for signature without first signing a paper contract. The
agreements are pretty onerous, include a ton of excluded licences
(including all GPL ones for drivers, but not bootloaders). The most
onerous part is that the agreements seem to reach beyond the actual UEFI
objects you sign. The Linux Foundation lawyers concluded it is mostly
harmless to the LF because we don’t ship any products, but it could be
nasty for other companies."
Comments (41 posted)
The New York Times describes the latest innovation from Apple: a page turning animation for e-readers. Not only is it astonishingly brilliant, it's
patented. "
Apple argued that its patented page turn was unique in that it had a special type of animation other page-turn applications had been unable to create.
[ ... ]
The patent comes with three illustrations to explain how the page-turn algorithm works. In Figure 1, the corner of a page can be seen folding over. In Figure 2, the page is turned a little more. I’ll let you guess what Figure 3 shows."
Comments (55 posted)
Andy Updegrove
covers
a press release from the Portuguese Open Source Business Association on
the government adoption of standard formats for documents. "
[T]he Portuguese government has opted for ODF, the OpenDocument Format, as well as PDF and a number of other formats and protocols, including XML, XMPP, IMAP, SMTP, CALDAV and LDAP. The announcement is in furtherance of a law passed by the Portuguese Parliament on June 21 of last year requiring compliance with open standards (as defined in the same legislation) in the procurement of government information systems and when exchanging documents at citizen-facing government Web sites."
Comments (1 posted)
New Books
No Starch Press has released "Python for Kids" by Jason R. Briggs.
Full Story (comments: none)
Calls for Presentations
There will be an Apache OpenOffice devroom at FOSDEM 2013, to be held
February 2. The call for talks is open until December 23. FOSDEM (Free
and Open Source software Developers' European Meeting) will take place
February 2-3, 2013 in Brussels, Belgium.
Full Story (comments: none)
Upcoming Events
linux.conf.au (LCA) has announced the first of four keynote speakers for
the 2013 conference. "
Andrew "bunnie" Huang is best known as the lead hardware developer of open-source gadget "Chumby"*, a device designed from the ground up as an open source gadget, complete with open source hardware, and whose designers encourage hackers to get into the device and make it their own. He is also the author of "Hacking the Xbox"^, a book about reverse engineering consumer products and the social and practical issues around doing so."
Full Story (comments: none)
Events: November 29, 2012 to January 28, 2013
The following event listing is taken from the
LWN.net Calendar.
Date(s) | Event | Location
November 29–November 30 | Lua Workshop 2012 | Reston, VA, USA
November 29–December 1 | FOSS.IN/2012 | Bangalore, India
November 30–December 2 | Open Hard- and Software Workshop 2012 | Garching bei München, Germany
November 30–December 2 | CloudStack Collaboration Conference | Las Vegas, NV, USA
December 1–December 2 | Konferensi BlankOn #4 | Bogor, Indonesia
December 2 | Foswiki Association General Assembly | online and Dublin, Ireland
December 5 | 4th UK Manycore Computing Conference | Bristol, UK
December 5–December 7 | Qt Developers Days 2012 North America | Santa Clara, CA, USA
December 5–December 7 | Open Source Developers Conference Sydney 2012 | Sydney, Australia
December 7–December 9 | CISSE 12 | Everywhere, Internet
December 9–December 14 | 26th Large Installation System Administration Conference | San Diego, CA, USA
December 27–December 29 | SciPy India 2012 | IIT Bombay, India
December 27–December 30 | 29th Chaos Communication Congress | Hamburg, Germany
December 28–December 30 | Exceptionally Hard & Soft Meeting 2012 | Berlin, Germany
January 18–January 19 | Columbus Python Workshop | Columbus, OH, USA
January 18–January 20 | FUDCon:Lawrence 2013 | Lawrence, Kansas, USA
January 20 | Berlin Open Source Meetup | Berlin, Germany
If your event does not appear here, please
tell us about it.
Page editor: Rebecca Sobol