Weekly Edition for January 2, 2014

LWN's unreliable 2014 predictions

By Jonathan Corbet
January 1, 2014
Welcome to the first Weekly Edition for 2014. Tradition says that this issue must carry a set of predictions that, beyond doubt, will look silly and embarrassing by the end of the year. Your editor is not one to go against tradition, especially this time of year when there is a relative scarcity of real news to report. So, without further ado, here are some thoughts on what the Linux and free software community may see in the coming year.

ARM-based server systems will hit the market, and those systems, naturally, will be running Linux. In the process, they will highlight no end of entertaining conflicts between the buttoned-down server space and the rather looser ARM world; the advent of ARM servers will also bring ARM developers closer to the core kernel, where they have had a relatively small presence thus far.

In 2013, we got a confirmation that some of the more paranoid people among us were correct: the Internet is indeed being used for widespread surveillance. In 2014, we will learn how bad the situation really is as ongoing revelations show the extent of the surveillance efforts — and the fact that this activity is not even remotely limited to one often-named US agency. The world is full of nosy agencies, both governmental and otherwise, and many (or most) of them have been taking advantage of technology in any way they could.

Awareness of free software as a tool against surveillance will increase, but it will also become clear that free software is not anywhere near enough. Free software provides a modicum of assurance that it is not operating contrary to its users' interests, but that assurance falls down if the software is not closely reviewed, and the sad truth is that we often have fewer eyeballs on our code than we would like to admit.

There is also the little problem that, increasingly, the hardware that our software is running on cannot be trusted. Contemporary hardware, at all levels down to that of simple memory chips, is running software that is invisible to us; that software can be subverted by any of a number of agencies. Even those who are in favor of NSA surveillance (and such people certainly exist) would do well to pause and think about just where much of that hardware and invisible firmware comes from.

Some possible good news is that progress may be made in the fight against patent trolls this year. The economic costs imposed by trolls have become so widespread and indiscriminate that cries for reform are being heard throughout the US, which is the primary venue in which these entities operate. The US Supreme Court will have an opportunity to restrict software patents this year. To believe that the problem will be solved in 2014 would be recklessly optimistic, but, with luck, the situation will be better at the end of the year than it is at the beginning.

The Debian project will resolve its init system debate early in 2014. Whatever conclusion the Technical Committee comes to will be contentious at best; this does not appear to be an issue around which a strong consensus can be formed. If the Committee can explain its reasoning well enough, the project will pick itself up and move on; most people realize that not every decision goes the way they would like. A poorly considered or overtly political decision would create far more strife, but, given the people involved, that outcome seems highly unlikely.

Predicting the actual decision is rather harder. Your editor suspects that systemd may be chosen in the end, perhaps with a decision to make some Debian-specific changes, but would be unwilling to bet more than a single beer on that outcome.

There will be significant challenges for Android in 2014. Alternative mobile platforms, including Sailfish OS, Tizen, Firefox OS, and Ubuntu will all ship during this year; at least one of those may well prove to be a viable alternative that acquires significant market share. The heavily funded Cyanogen Inc. also has the potential to shake things up in the coming year, should it ever decide what it intends to do. Meanwhile, Google's attempts to maintain control over the platform and concentrate functionality into the proprietary "Play Services" layer will sit poorly with manufacturers and some users. Android will remain a strong and successful platform at the end of the year, but it will be operating in a more competitive market.

ChromeOS will have a good year, building on the surprising success of Chromebook systems — 21% of all notebook systems sold — in 2013. The traditional desktop can be expected to continue to fade in importance as it is pushed aside by systems that, while seemingly being less capable, are able to provide useful functionality in a simple, mobile, and relatively secure manner. The down side of this trend, of course, is that it pushes users into company-controlled systems with central data storage — not necessarily a recipe for the greatest level of freedom. The creation of an alternative system that can achieve widespread adoption while more strongly protecting privacy is one of the biggest challenges our community faces.

The fundamental question of "what is a Linux distribution?" will continue to attract debate. Some people are unwilling to even see something like Android as a proper distribution, but the diversity of systems running on the Linux kernel will only increase. Lamentation over the twilight of "traditional Unix" will continue, but that will not stop the countless numbers of people who are doing ever more interesting things with Linux.

On the filesystem front, Btrfs will start seeing wider production use in 2014, finally, though users will learn to pick and choose between the various available features. XFS will see increasing use resulting from the growth in file and storage sizes, along with Red Hat's decision to use it by default in RHEL 7. 2014 may just be the year when the workhorse ext4 filesystem starts to look like a legacy system to be (eventually) left behind. Ext4 will still be the most widely deployed Linux filesystem at the end of the year, but it won't be the automatic default choice that it has been for so many years.

The kernel community has never really had to worry about attracting new developers, but new kernel developers may become harder to find in 2014. Kernel development is getting harder to get into, and the level of contributions from unpaid developers has been falling for years. An environment where an understanding of Documentation/memory-barriers.txt is increasingly necessary is going to be off-putting for a lot of potential developers, who may just decide to focus on JavaScript applications instead.

For years, LWN's annual predictions have included a statement to the effect that our community, along with the software it creates, would be stronger than ever come December. That prediction, at least, has always proved accurate. There is no reason to believe that things will be any different in 2014. We are looking forward to telling you all about 2014 as it happens; thanks, once again, to all of our readers who make another year of LWN possible.


Darktable 1.4

By Nathan Willis
January 1, 2014

Version 1.4 of the open source photo-manipulation tool Darktable was released on December 26, bringing with it several long-requested features. Some of these new additions (such as editing with masks) include new core functionality, while others might best be viewed as exploratory features. Darktable has always included a wide swath of image-manipulation and filtering options that lend themselves well to experimentation. But there are also usability improvements in 1.4, which are especially welcome to users who may find the toolbox-ful of experimental options tricky to navigate.

Zooming in

The 1.4 release is available for download as a source bundle and as an Ubuntu package through the project's personal package archive (PPA). Other distributions rely on community builds, so the corresponding packages may take some time to arrive; there are also Mac OS X installers provided, but no Windows builds. In keeping with the just-after-Christmas release date, the Darktable logo in the new release sports a Santa hat; no telling how well that gag will age.

The project made one other stable release (1.2) since we looked at Darktable 1.1 in December 2012. The key features in that update included a de-noising function that could be tailored to fit the profile of one's actual camera, the ability to apply several instances of each filter to an image (which enables far more complex image manipulation for most filters), and the ability to import images from Adobe Lightroom, preserving most—but not all—of the edit operations performed in Lightroom.

In a sense, the 1.4 release mirrors the 1.2 release: a handful of focused feature additions for image manipulation, an important application-compatibility feature, and a lengthy list of minor tweaks and fixes. But they all add up; Darktable has come a long way from its early days, in which it often felt like a tool with tremendous power that was impossible to decipher and use, thanks to unlabeled controls, cryptic icons and UI elements, and a penchant for hiding settings and dials in out-of-the-way corners of the interface. Darktable 1.4 is a lot closer to a standard, easy-to-discover image processor.

Edit this

Darktable offers four major modes of operation, although two of those are likely to get considerably less usage on average. The emphasis is primarily on the "lighttable" mode for browsing and organizing an image collection and the "darkroom" mode for image editing. The other two options are "tethering" mode for controlling a USB-attached camera through the application and "map" mode, which simply offers a world-map view of any georeferenced images in the collection.

Darkroom mode is where the fun happens, of course. In the early days, Darktable would have been described as a "raw converter" because it focused on manipulating raw image file formats from digital cameras. Today it is a bit more general, supporting essentially every image format available, thanks to import filters from GraphicsMagick. In darkroom mode, the interface shows an image histogram in the top right corner of the window, and users can activate any of fifty (in version 1.4) different filter and effect modules. The modules range from simple operations like cropping to special effects like "bloom" (which simulates the washed-out halo effect usually seen when directly photographing a light source like a bulb or streetlight).

In practice, the main challenge in using Darktable's filter modules is keeping them straight in one's mind. Each module has its own sliders and controls; the active modules in an image are stacked on top of each other in a column underneath the histogram, so that the user must scroll up and down through the column to see them all. There is barely enough vertical space in the interface to see more than a couple of modules at a time, and there is even less room if the user opens up the palette of modules to select another one.

This interface challenge is shared by virtually all raw photo editing applications, and no one has come up with a particularly good solution to it yet. Darktable 1.4 does make an attempt at improvement in this area, though: a new preference will keep only one module "expanded" at a time; clicking on another module opens the new module and collapses the others. It isn't perfect, but neither are the alternatives.


[Darktable 1.4]

The new showcase feature in 1.4 is support for masking off part of the image, so that a filter module only applies to part of the image. There are five mask types available: circles, ellipses, gradients, closed bezier paths, and arbitrary shapes drawn with a brush. All are resizable and allow a fade-out radius for smooth blending—in fact, the "gradient" mask is just a fade-out projected from a straight line. Users can create as many masks as they want, and name or rename them in the "mask manager" panel (which sits on the left side of the window, so it does not steal more room from the filter module list).

The easiest way to use a mask with a filter is to create the mask first. Then the filter needs to be switched on, at which point the user can change the filter's "blend" control to "drawn mask." That setting restricts the filter to the masked area. The mask can be inverted or blurred if further customization is required.
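As a rough illustration of what a feathered "drawn mask" does to a filter, here is a minimal Python sketch. It is not Darktable's code: the function names, the grayscale list-of-rows image format, and the linear fade-out are all assumptions made for the example.

```python
import math

def circle_mask(width, height, cx, cy, radius, feather):
    """Build a height x width grid of opacities in [0, 1].

    Pixels within `radius` of the centre get 1.0, pixels at or beyond
    `radius + feather` get 0.0, and opacity falls off linearly in between.
    """
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            dist = math.hypot(x - cx, y - cy)
            if dist <= radius:
                row.append(1.0)
            elif feather > 0 and dist < radius + feather:
                row.append(1.0 - (dist - radius) / feather)
            else:
                row.append(0.0)
        mask.append(row)
    return mask

def invert_mask(mask):
    """Swap the masked and unmasked regions."""
    return [[1.0 - m for m in row] for row in mask]

def apply_masked(image, mask, effect):
    """Blend effect(pixel) into the image, weighted by the mask opacity."""
    return [
        [(1.0 - m) * p + m * effect(p) for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

# Brighten only the centre of a flat gray 5x5 image.
image = [[100.0] * 5 for _ in range(5)]
mask = circle_mask(5, 5, cx=2, cy=2, radius=1, feather=2)
brightened = apply_masked(image, mask, lambda p: p * 2)
```

The fade-out zone is what keeps the edge of the masked region from showing up as a hard line in the result.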

Not every filter module can be masked, however: essentially, filters that operate on all of an image's pixels can be masked (including color, sharpness, and many of the special effects modules), but modules for other effects cannot. The un-maskable modules include controls for how the original image is imported (white balance, demosaicing algorithm, and so forth).

It is hard to overstate just how powerful masking filters are in the image-editing process. Without them, almost all effects are limited to operating on the entire picture; users who need to treat one part of the image differently from another would most likely perform part of the corrections in Darktable, then export the result for further work in another program. There are a few quirks in this implementation—for example, I could not get Darktable to allow me to rearrange existing nodes on a path mask, which means the mask needs to be drawn perfectly the first time through. But the feature is a considerable step up in Darktable's power.

Scripting, modules, and other interesting bits

Also powerful, at least in theory, is 1.4's support for scripting with Lua. At the moment, there are not many Lua scripts available for Darktable, but that should change over time. Scripting languages can be a divisive subject, of course; it seems plausible that the choice of Lua was made because Adobe Lightroom uses Lua for scripting, but there are several other open source image-processing applications that also support Lua scripts.

There are several other user-visible changes introduced in 1.4, although they are not as potentially important as masking and scripting. One of them is automatic focus detection in lighttable mode. When there are multiple shots of a particular subject, it is usually important to find the sharpest image; the focus detection feature will highlight sharp zones in an image in red, and blurry zones in blue, so the user can quickly narrow down the choice of which shots to work on.

There are also three new filter modules: "contrast/brightness/saturation," "color balance," and "color mapping." The first of those is a straightforward set of sliders for the image attributes in its name, while the second two require a bit more explanation. "Color balance" allows the user to manipulate the image with gamma and gain controls, while "color mapping" allows the user to swap colors in the image, with a considerable level of control.

Most color manipulations can be arrived at in multiple ways, of course; what makes any one filter module useful is whether it is intuitive to use or simplifies a specific operation. On that front, "color balance" and "color mapping" might take some getting used to. Fortunately, as is usually the case, Darktable provides several sensible defaults to help users get started.

A new feature that might require a bit more experimentation to get the hang of is Darktable's "waveform" mode for the image histogram. Usually, an image histogram graphs the number of pixels from an image on a dark-to-light axis, so that a balanced image will not show a graph that skews significantly to the dark end or the light end. The same is true for color histograms, which chart the amount of red, green, and blue: a balanced image shows all three colors in roughly equal proportions.
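For readers who have not looked at how a conventional histogram is built, a minimal Python sketch follows. The function name, the flat list of 8-bit brightness values, and the bin count are assumptions for illustration; Darktable's own implementation is not shown here.

```python
def brightness_histogram(pixels, bins=8, max_value=255):
    """Count how many pixels fall into each dark-to-light bin."""
    counts = [0] * bins
    for value in pixels:
        # Map 0..max_value onto bin indexes 0..bins-1.
        index = min(value * bins // (max_value + 1), bins - 1)
        counts[index] += 1
    return counts

# Two dark pixels, one mid-tone, two bright ones.
counts = brightness_histogram([0, 10, 128, 255, 250], bins=4)
```

A graph of `counts` that piles up in the first or last bins is the "skew" toward the dark or light end described above.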

[Darktable 1.4]

The new waveform histogram mode is a different beast entirely: its x-axis follows the horizontal position within the image itself, its y-axis shows brightness, and the intensity of each point on the graph indicates how many pixels in that column of the image have that brightness. Confused? Not to worry; you are not alone. This is certainly an experimental way to examine an image; with some playing around it is possible to get a feel for how the waveform plots change in response to different images and different filters. But it is hard to say so far how useful it will prove in practice.
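One way to make the waveform idea concrete is to compute one by hand. The Python sketch below is an assumption-laden illustration (grayscale image as a list of rows, a handful of brightness levels, a hypothetical `waveform` function), not a description of Darktable's code.

```python
def waveform(image, levels=4, max_value=255):
    """Return grid[level][x]: the number of pixels in image column x
    whose brightness falls into the given level (0 = darkest)."""
    width = len(image[0])
    grid = [[0] * width for _ in range(levels)]
    for row in image:
        for x, value in enumerate(row):
            level = min(value * levels // (max_value + 1), levels - 1)
            grid[level][x] += 1
    return grid

# A tiny image: dark on the left, bright on the right.
image = [
    [0, 255],
    [0, 128],
    [64, 255],
]
grid = waveform(image)
```

Each cell of `grid` would be drawn as one point, with a larger count drawn as a more intense point, so the plot keeps the left-to-right layout of the picture while still summarizing brightness.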

Last but certainly not least, Darktable 1.4 includes some new utility functions, such as the ability to analyze a sample image to create a "base curve" for a camera (i.e., to profile the specific camera unit's output), and the ability to query the operating system's color management configuration.

For casual photographers, Darktable is one of the best options for creative image adjustment because its stable of filter modules offers decent presets and defaults, and because it renders the results to screen rapidly. In contrast, an application like Rawstudio can perform many of the same effects, but the user needs to play around with individual controls to achieve them. For straightforward color correction, though, Darktable's lengthy list of modules may prove intimidating, and the interface, although it has made many positive strides in recent years, can still be confusing to navigate. For demanding photographers, however, Darktable 1.4 is sure to be a favorite on the strength of masking support alone. The rest is simply gravy.



By Nathan Willis
January 1, 2014

Having more data is almost always a plus—in theory. But out in the real world, data sets are messy, inconsistent, and incomplete. Correlating multiple data sources introduces a whole new set of problems: matching up fields and keys that are labeled differently, use different units, or have other incompatibilities. These are the issues that OpenRefine sets out to address. It is a tool for sanitizing and standardizing data sets, hopefully through a faster and easier process than a full manual clean up would require. But it also allows the user to explore the data along the way, potentially providing some insights that should be followed up with a more detailed data-mining or visualization program.

The bigger the data, the bigger the headaches

OpenRefine was initially a Google product called (naturally) Google Refine. The company stopped working on the code in October 2012, however, so it was re-branded and re-launched as a standalone open source project a short time later. The application is a self-contained Java program that runs on a local HTTP port through the Jetty web server. Users access the front end through their browser, but nothing ever leaves the local machine—a fact that the project emphasizes several times in the documentation and tutorials. One key reason for this concern is that Google Refine / OpenRefine was initially positioned for use in "data-driven journalism," which often involves downloading government or commercial data sets and then hunting through them for patterns. Keeping potentially sensitive projects private is obviously a recurring concern for that use case.

On one hand, the focus on data-driven journalism probably does not matter to most other users; the tool set works just as well for any type of data, from sensor readings to web server logs. On the other hand, the journalism angle does drive some of the application's most useful tools, which are designed to quickly cut through the problems commonly encountered in public data sets. This includes a lot of problems in text fields: inconsistent labels, spacing, the use of acronyms in one place but not in another, and so on.

If you are (for example) simply logging the temperature and electricity usage in your house to look for inefficiencies, you probably care far more about numeric fields, but the text manipulation tools could prove invaluable for other tasks, like exploring wordy log files.

I put OpenRefine to the test against several categories of data: sensor readings from a health tracker, a state government data set listing school district performance ratings, and the backend logs from a MythTV server. In short, the messier the original data and the larger the data set, the more valuable pre-processing it with OpenRefine will be. To get started, you can download compressed tarballs from the OpenRefine web site; at the moment two versions are available: Google Refine 2.5 (the last release branded as such), and a beta build of the upcoming OpenRefine 2.6 release. 2.6 does not introduce much in the way of new features; apart from the re-branding work the majority of the changes are bugfixes. In both cases, you unpack the archive and run the ./refine script inside to launch the program.

Data massaging 101

[OpenRefine parsing a data set]

The workflow consists of loading in a data set, transforming and sanitizing it as much as required, then exporting the result as another file. Along the way, one can do some exploration of the data itself, but generating publication-ready graphs, maps, or other output is out of scope. OpenRefine can import many different file types: comma-, tab-, or space-delimited text, spreadsheets (including ODF and Microsoft Excel), plus RDF, XML, JSON, and more. It can also retrieve files remotely by URL, which makes it useful for regularly-updated data publications.

It cannot, though, read from an actual database—the expectation, after all, is that the data needs massaging and correcting; a proper database with well-defined columns and records is hopefully unlikely to need such work, and if it does, working on it with proper database tools is probably better.

The sanitization process begins with the import itself; OpenRefine will do its best to auto-discover column delimiters, header text, and other structural information, but before it even completes the import, it shows the user a preview of the first few rows so that corrections can be made. The use of limited previews is a recurring theme; the operations OpenRefine can use to transform the data are executed on the full data set, but only a subset is ever shown on screen.

[OpenRefine with several facets on data set]

The primary data-cleaning process is done, after import, in the Facet/Filter screen. "Facets" are akin to smart "views" into a data column. You activate a facet on a column by clicking the column header. The facet type chosen (text, numeric, timeline, or scatterplot) triggers a quick analysis of the column for commonly-needed corrections. Activating a numeric facet brings up a bar chart showing the range of values seen and highlighting problematic entries (such as undefined "NaN"s or empty cells).

Activating a text facet brings up a list of values that can be further refined by automatically detecting what seem to be similar values: in my MythTV log file test, for example, there were quite a few lines that had extraneous spaces surrounding some fields, which OpenRefine could repair with a single click. In government data sets, a lot of the online documentation notes that inconsistent acronym usage, stray HTML character-code entries, and misspellings are common problems that OpenRefine facets can make short work of.
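The general idea behind grouping "similar" text values can be sketched in a few lines of Python. This is a simplified key-collision approach written for illustration; the `fingerprint` and `cluster` names and the sample labels are assumptions, and OpenRefine's own clustering code may differ in its details.

```python
import re
from collections import defaultdict

def fingerprint(value):
    """Reduce a label to a normalized key: trimmed, lowercased,
    punctuation removed, words de-duplicated and sorted."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def cluster(values):
    """Group values that share a fingerprint but are spelled differently."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]

labels = [
    "  Board of Education",
    "board of education",
    "Education, Board of",
    "Dept. of Health",
]
candidates = cluster(labels)
```

Each group in `candidates` is a set of cells that a person would probably want merged into one spelling; the final choice of which spelling wins is still left to the user.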

Apart from flagging fields for correction, facets can also be used to narrow down the portion of the data set examined. For example, sliding the handles on either side of a numeric facet's bar chart will winnow down the range of entries shown in the user interface. You could export a subset of the data quite easily at that point, but narrowing down to a sub-range can also reveal further problems that need correction. For example, a few outliers that are abnormally high numbers might simply be expressed in the wrong units (e.g., $1999 instead of $19.99); by zooming in with the facet, you can click on the "change" button and apply a transformation. In this example, a simple value/100 is all that is required for the fix, but there is actually a fairly comprehensive expression language defined with string, array, math, and date functions, control structures, and variables.
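The "narrow down, then transform" pattern is easy to express outside of OpenRefine as well. The Python sketch below mirrors the value/100 example; the `fix_outliers` helper, the range limits, and the sample prices are all made up for illustration.

```python
def fix_outliers(values, low, high, transform):
    """Apply transform only to values inside [low, high], which plays the
    role of the facet's selected range; everything else is left untouched."""
    return [transform(v) if low <= v <= high else v for v in values]

# Two prices were recorded in cents rather than dollars.
prices = [19.99, 24.50, 1999, 18.75, 2450]
fixed = fix_outliers(prices, 1000, 5000, lambda value: value / 100)
```

The important part is the range restriction: the same division applied to the whole column would damage the values that were already correct.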

[OpenRefine locating similar clusters]

"Filters" are the other main option for data massaging, although the line between a filter and a facet can be a bit hazy. As one might expect, cell contents can be filtered with regular expressions or through the built-in expression language. The difference is that activating a filter on a column does not analyze the data set for the common problems that a facet offers quick corrections for.

Facets and filters are both transient changes to the current on-screen view of the data; when activated they pop up as a box on the left-hand side of the window, and they can be closed to restore the unaltered view on the data. Their primary purpose is to isolate subsets of the whole data set, in order to fix specific problem cells. But more permanent and direct editing of the data set is possible as well; a variety of cell transformations come predefined (from simple letter-case changes to parsing HTML markup), and you can re-order, split, combine, and transpose cells, rows, and columns. The cell transformation functions are "more" permanent in the sense that they alter the table's contents, but one important feature of OpenRefine is infinite undo/redo for projects; the entire operation history is preserved and can be rolled back, even across sessions.

One of the more useful cell transformations is "Add column based on this column," which allows you to write an expression (in the built-in expression language) that evaluates the cells in one column to create data for another. This is where the conditionals and control structures prove valuable; the filter and facet tools can already alter cells based on mathematical expressions, but tossing if/then constructs and branching logic into the expression crosses over into "new data" territory. Deriving new columns from the existing ones is more powerful than simply transforming existing cell content—plus, it does not overwrite cells that could be useful for other purposes.
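To show what deriving a column with a conditional looks like in practice, here is a small Python sketch. It does not use OpenRefine's expression language; the `add_column` helper, the column names, and the threshold of 70 are assumptions chosen for the example.

```python
def add_column(rows, source, new_name, expression):
    """Return new rows with an extra field computed from an existing one.
    The original rows, including the source column, are left unchanged."""
    return [{**row, new_name: expression(row[source])} for row in rows]

districts = [
    {"district": "North", "score": 91},
    {"district": "South", "score": 58},
]
rated = add_column(
    districts,
    "score",
    "rating",
    lambda score: "meets standard" if score >= 70 else "needs review",
)
```

Because the result is a new column rather than an edit in place, the numeric scores remain available for any later numeric facet or export.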

If it seems like there are multiple ways to alter cell contents in OpenRefine (i.e., facets/filters and the transformation tools for entire columns), that is because a lot of work went into making the facet/filter functionality powerful, since it enables the user to find and repair a subset of the data. You can still transform an entire column all at once, but the ability to zero in on a subset consisting of bad or misformatted cells is a key part of the data sanitization process.

Data out

Admittedly, complex cell transformations are pushing the envelope of "data sanitization," and can easily cross over into territory that would be better handled by a data mining tool like Orange or a statistical program like R. It is a personal choice, of course, and there is nothing wrong with doing data mining in OpenRefine; most users will export their data sets (into one of the same file formats supported as input) for use in another application. But OpenRefine does not make very deep exploration easy—the statistical functions in the expression language are basic, and there is no real visualization tool (the "scatterplot" facet will do a basic X-Y graph of two columns, but that is about all).

Where OpenRefine shines is in pinpointing problems with the data set itself—problems that the set has as a document, so to speak. And, in reality, this is a huge concern. My own brief excursion to try and find a government data set to experiment with was 90% failure, thanks to everything from dead links, mislabeled and unlabeled files, "databases" published as PDFs, and just about every other flaw imaginable. Then when I found a working data set, almost every column was an unlabeled acronym filled in with single letters to which there was no key.

A tool like OpenRefine does not make repairing all of those problems trivial, but for a data set not selected at random one would hope that making sense of it would be easier. I did, for example, find it much easier to use for purging bad data from my sensor logs and for weeding out unnecessary MythTV error messages. In any case, the tools for making the repairs are easy to use, and the program does a good job of quickly analyzing and locating problem areas. It is hard to get too specific about all that OpenRefine can do without picking a real-world task and pursuing it in depth. Fortunately, there are a lot of published tutorials that do just that; they are recommended reading for anyone still unsure if OpenRefine meets their needs.

It is not clear at present just how active the OpenRefine project is; the 2.6 beta was released in August and there have been precious few commits since. The program does seem to do what most of its users want it to, but it also seems obvious that Google's withdrawal of support has left the effort with far fewer developers. That never bodes well for a project, of course, but there appear to be plenty of dedicated users (based on the mailing list and wiki traffic), and some of them are even extension writers, so the project may still find its footing as an independent community. Hopefully so, because an easy-to-use data exploration tool is a valuable thing to have at one's beck and call, regardless of the data being explored.


Page editor: Jonathan Corbet


A new Dual EC DRBG flaw

By Jake Edge
January 1, 2014

The dual elliptic curve deterministic random bit generator (Dual EC DRBG) cryptographic algorithm has a dubious history—it is believed to have been backdoored by the US National Security Agency (NSA)—but is mandated by the FIPS 140-2 US government cryptographic standard. That means that any cryptographic library project that is interested in getting FIPS 140-2 certified needs to implement the discredited random number algorithm. But, since certified libraries cannot change a single line—even to fix major, fatal bugs—having a non-working version of Dual EC DRBG may actually be the best defense against the backdoor. Interestingly, that is exactly where the OpenSSL project finds itself.

OpenSSL project manager Steve Marquess posted the tale to the openssl-announce mailing list on December 19. It is, he said, "an unusual bug report for an unusual situation". It turns out that the Dual EC DRBG implementation in OpenSSL is fatally flawed, to the point where using it at all will either crash or stall the program. Given that the FIPS-certified code cannot be changed without invalidating the certification, and that the bug has existed since the introduction of Dual EC DRBG into OpenSSL, it is clear that no one has actually used that algorithm from OpenSSL. It did, however, pass the testing required for the certification somehow.

It is also interesting to note that the unnamed financial sponsor of the Dual EC DRBG support asked for it after the algorithm was already known to be questionable. The feature was part of a request to implement all of SP 800-90A, which is a suite of four DRBGs that Marquess called "more or less mandatory" for FIPS certification. At the time, the project recognized the "dubious reputation" of Dual EC DRBG, but it also considers OpenSSL to be a comprehensive library and toolkit: "As such it implements many algorithms of varying strength and utility, from worthless to robust." Dual EC DRBG was not even enabled by default, but it was put into the library.

The bug was discovered by Stephen Checkoway and Matt Green of the Johns Hopkins University Information Security Institute, Marquess said. Though there is a one-line patch to fix the problem included with the bug report, there are no plans to apply it. Instead, OpenSSL will be removing the Dual EC DRBG code from its next FIPS-targeted version. The US National Institute of Standards and Technology (NIST), which oversees FIPS and other government cryptography standards, has recently recommended not using Dual EC DRBG [PDF]. Since that recommendation, Dual EC DRBG has been disabled in OpenSSL anyway. Because there is essentially the same amount of testing required for fixing or removing the algorithm (for FIPS recertification), removal seems like the right course.

The problem stems from a requirement in FIPS that each block of output random numbers not match the previous block. It is, effectively, a crude test that the algorithm is actually producing random-looking data (and not repeating blocks of zeroes, for example). When there is no previous block to compare against, OpenSSL generates one that should be discarded after the comparison. But the Dual EC DRBG implementation botched the discard operation by not updating the state correctly.
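
The continuous test described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not OpenSSL's code: the class name, the 16-byte block size, and the use of os.urandom() as the underlying generator are all assumptions made for the example.

```python
import os

BLOCK_SIZE = 16  # bytes per output block; an arbitrary choice for this sketch


class ContinuousTestRNG:
    """Wrap a block generator with a FIPS-style continuous test (sketch)."""

    def __init__(self, generate_block=None):
        # The underlying generator is a stand-in; os.urandom() is the default.
        self._generate = generate_block or (lambda: os.urandom(BLOCK_SIZE))
        self._previous = None

    def next_block(self):
        if self._previous is None:
            # No previous block exists yet: generate one purely for the
            # comparison. It is never handed to the caller.
            self._previous = self._generate()
        block = self._generate()
        if block == self._previous:
            raise RuntimeError("continuous RNG test failed: repeated block")
        # Remember this block so that the next call compares against it.
        self._previous = block
        return block
```

A generator that is stuck on a constant value (all zeroes, say) fails on the first call, while a working generator simply returns its blocks. The step that needs care is the bookkeeping around the throwaway first block, which is where, according to the report, the Dual EC DRBG code went wrong.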

Dual EC DRBG was under suspicion for other reasons even before it was adopted by NIST in 2006. In 2007, Bruce Schneier raised the alarm about an NSA backdoor in the algorithm. For one thing, Dual EC DRBG differs from the other three algorithms specified in SP 800-90A in that it is three orders of magnitude slower and that it was only added at the behest of the NSA. In addition, it was found that the elliptic curve constants chosen by NIST (with unspecified provenance) could be combined with another set of numbers (not generally known, except possibly by the NSA) to predict the output of the random number generator after observing 32 bytes of its output. Those secret numbers could have been generated at the same time the EC constants were, but it is unknown if they actually were.

The NIST standards were a bit unclear about whether the EC constants were required, but Marquess noted that the testing lab required using the constants (aka "points"):

SP800-90A allows implementers to either use a set of compromised points or to generate their own. What almost all commentators have missed is that hidden away in the small print (and subsequently confirmed by our specific query) is that if you want to be FIPS 140-2 compliant you MUST use the compromised points. Several official statements including the NIST recommendation don't mention this at all and give the impression that alternative uncompromised points can be generated and used.

So, what we have here is a likely backdoored algorithm that almost no one used (evidently unless they were paid $10 million) added to an open-source cryptography library funded by money from an unnamed third party. After "rigorous" testing, that code was certified as conforming to a US government cryptographic standard, but it never actually worked at all. According to Marquess: "Frankly the FIPS 140-2 validation testing isn't very useful for catching 'real world' problems."

It is almost comical (except to RSA's BSafe customers, anyway), but it does highlight some fundamental problems in the US (and probably other) government certification process. Not finding this bug is one thing, but not being able to fix it (or, more importantly, being unable to fix a problem in an actually useful cryptographic algorithm) without spending lots of time and money on recertification seems entirely broken. The ham-fisted way that the NSA went about putting the backdoor into the standard is also nearly amusing. If all its attempts were similarly obvious and noisy, we wouldn't have much to worry about—unfortunately that seems unlikely to be the case.

One other thing to possibly consider: did someone on the OpenSSL project "backdoor" the Dual EC DRBG implementation such that it could never work, but would pass the certification tests? Given what was known about the algorithm and how unlikely it was that it would ever be used by anyone with any cryptographic savvy, it may have seemed like a nice safeguard to effectively disable the backdoor. Perhaps that is far-fetched, but one can certainly imagine a developer being irritated by having to implement the NSA's broken random number generator—and doing something about it. Either way, we will probably never really know for sure.


Brief items

Security quotes of the week

I’m willing to believe you were tricked in 2004, RSA. I’m not willing to believe that you were the only people on the planet too dumb to avoid Dual EC after 2007. At some point, you figured it out.

If there are any other skeletons in the closet, it’s probably a good time to air them out before we find out there’s other things you repeatedly did not disclose. Look on the bright side: can it really be any worse than that time you had to replace every single freakin’ token in the world?

Melissa Elliott

I don’t really expect your multibillion dollar company or your multimillion dollar conference to suffer as a result of your deals with the NSA. In fact, I'm not expecting other conference speakers to cancel. Most of your speakers are American anyway – why would they care about surveillance that’s not targeted at them but at non-americans. Surveillance operations from the US intelligence agencies are targeted at foreigners. However I’m a foreigner. And I’m withdrawing my support from your event.
Mikko Hypponen withdraws from the RSA conference

The White House's review of the underwear bomb plot concluded that there was sufficient information known to the U.S. government to determine that AbdulMutallab was likely working for al Qaeda in Yemen and that the group was looking to expand its attacks beyond Yemen. Yet AbdulMutallab was allowed to board a plane bound for the United States without any question.

All of these serious terrorism cases argue not for the gathering of ever vaster troves of information but simply for a better understanding of the information the government has already collected and that are derived from conventional law enforcement and intelligence methods.

Peter Bergen

It also creates a problem for companies like Cisco and Juniper, who now face the same sort of scrutiny the US and others put Huawei under for its connections to the Chinese military. Even if Dell, HP, Cisco, and Juniper had no hand in creating the backdoors for their products, the Snowden documents will undoubtedly be used against them the next time they try to sell hardware to a foreign government.
Sean Gallagher in ars technica on more Snowden NSA revelations


GNUnet 0.10.0 released

The GNUnet secure peer-to-peer networking framework has released version 0.10.0. "This release represents a major overhaul of the cryptographic primitives used by the system. GNUnet used RSA 2048 since its inception in 2001, but as of GNUnet 0.10.0, we are "powered by Curve25519". Naturally, changing cryptographic primitives like this breaks backwards compatibility entirely. We have used this opportunity to implement protocol improvements all over the system." GNUnet provides four applications: anonymous, censorship-resistant file sharing; a virtual private network (VPN) service; the GNU name system (GNS), a fully decentralized and censorship-resistant replacement for DNS; and GNUnet Conversation, which allows voice calls to be made over GNUnet.


Huang: On Hacking MicroSD Cards

Worth a read: this posting by Andrew "bunnie" Huang on loading new firmware into a MicroSD card. "From the security perspective, our findings indicate that even though memory cards look inert, they run a body of code that can be modified to perform a class of MITM attacks that could be difficult to detect; there is no standard protocol or method to inspect and attest to the contents of the code running on the memory card’s microcontroller. Those in high-risk, high-sensitivity situations should assume that a 'secure-erase' of a card is insufficient to guarantee the complete erasure of sensitive data."


New vulnerabilities

aaa_base: incorrect /etc/shadow permissions

Package(s):aaa_base CVE #(s):CVE-2013-3713
Created:December 27, 2013 Updated:January 1, 2014

From the openSUSE advisory:

On systems installed via the Live Media that /etc/shadow file was readable by the "users" group, which was not intended. (bnc#843230, CVE-2013-3713)

Reason for this was that the user "root" was put into the "users" group.

openSUSE openSUSE-SU-2013:1955-1 aaa_base 2013-12-25


ack: code execution

Package(s):ack CVE #(s):CVE-2013-7069
Created:December 20, 2013 Updated:January 28, 2014

From the Red Hat bug report:

A flaw was found in the way ack, a tool similar to grep, processed .ackrc files. If a local user ran ack in an attacker-controlled directory, it would lead to arbitrary code execution with the privileges of the user running ack. This issue affects versions 2.00 to 2.10 (such as the version in Fedora 19), and should be fixed in version 2.12. It does not affect versions below 2.00 (such as those in EPEL).

openSUSE openSUSE-SU-2014:0142-1 ack 2014-01-28
Fedora FEDORA-2013-23206 ack 2013-12-20
Fedora FEDORA-2013-23197 ack 2013-12-20


asterisk: denial of service

Package(s):asterisk CVE #(s):CVE-2013-7100
Created:December 23, 2013 Updated:January 8, 2014
Description: From the Mageia advisory:

Buffer overflow in the unpacksms16 function in apps/app_sms.c in Asterisk Open Source 1.8.x before, 10.x before 10.12.4, and 11.x before 11.6.1; Asterisk with Digiumphones 10.x-digiumphones before 10.12.4-digiumphones; and Certified Asterisk 1.8.x before 1.8.15-cert4 and 11.x before 11.2-cert3 allows remote attackers to cause a denial of service (daemon crash) via a 16-bit SMS message.

Mageia MGASA-2014-0171 asterisk 2014-04-15
Gentoo 201401-15 asterisk 2014-01-21
Fedora FEDORA-2013-24142 asterisk 2014-01-08
Fedora FEDORA-2013-24119 asterisk 2014-01-08
Fedora FEDORA-2013-24108 asterisk 2014-01-08
Debian DSA-2835-1 asterisk 2014-01-05
Mandriva MDVSA-2013:300 asterisk 2013-12-23
Mageia MGASA-2013-0384 asterisk 2013-12-23


boinc-client: denial of service

Package(s):boinc-client CVE #(s):CVE-2013-2298
Created:December 27, 2013 Updated:January 1, 2014

From the Red Hat bugzilla entry:

Multiple stack overflow flaws were found in the way the XML parser of boinc-client, a Berkeley Open Infrastructure for Network Computing (BOINC) client for distributed computing, performed processing of certain XML files. A rogue BOINC server could provide a specially-crafted XML file that, when processed would lead to boinc-client executable crash.

Mageia MGASA-2014-0460 boinc-client 2014-11-21
Fedora FEDORA-2013-23720 boinc-client 2013-12-27
Fedora FEDORA-2013-23734 boinc-client 2013-12-27


denyhosts: denial of service

Package(s):denyhosts CVE #(s):CVE-2013-6890
Created:December 23, 2013 Updated:January 5, 2015
Description: From the Debian advisory:

Helmut Grohne discovered that denyhosts, a tool preventing SSH brute-force attacks, could be used to perform remote denial of service against the SSH daemon. Incorrectly specified regular expressions used to detect brute force attacks in authentication logs could be exploited by a malicious user to forge crafted login names in order to make denyhosts ban arbitrary IP addresses.
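
The general failure mode can be illustrated with a short Python sketch. The log line and both regular expressions below are hypothetical; they are not taken from denyhosts, and real sshd log lines carry more fields. The point is only that a loosely written pattern can be steered by an attacker-chosen login name.

```python
import re

# Hypothetical sshd-style log line: the attacker connected from 198.51.100.7
# but supplied the login name "x from 192.0.2.1".
LOG_LINE = "Invalid user x from 192.0.2.1 from 198.51.100.7"

# Loose pattern: the non-greedy user field stops at the first " from ".
LOOSE = re.compile(r"Invalid user (?P<user>.*?) from (?P<host>\S+)")

# Stricter pattern: anchored at the end of the line, so the host is always
# the final field, which the attacker does not control.
STRICT = re.compile(r"Invalid user (?P<user>.*) from (?P<host>\S+)$")


def offending_host(pattern, line):
    """Return the host a log scanner would blame for this line, or None."""
    match = pattern.search(line)
    return match.group("host") if match else None


print(offending_host(LOOSE, LOG_LINE))   # 192.0.2.1 (attacker-chosen)
print(offending_host(STRICT, LOG_LINE))  # 198.51.100.7 (real source)
```

With the loose pattern, a scanner would ban 192.0.2.1, an address that never connected at all.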

Fedora FEDORA-2014-17081 denyhosts 2015-01-05
Fedora FEDORA-2014-17067 denyhosts 2015-01-05
Gentoo 201406-23 denyhosts 2014-06-26
Debian DSA-2826-2 denyhosts 2014-01-23
Debian DSA-2826-1 denyhosts 2013-12-22
Mageia MGASA-2014-0080 denyhosts 2014-02-17


devscripts: command execution

Package(s):devscripts CVE #(s):CVE-2013-7050
Created:December 23, 2013 Updated:January 1, 2014
Description: From the CVE entry:

The get_main_source_dir function in scripts/ in devscripts before 2.13.8, when using USCAN_EXCLUSION, allows remote attackers to execute arbitrary commands via shell metacharacters in a directory name.
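
This class of bug, in which an untrusted name is pasted into a shell command line, can be shown with a small Python sketch. It is not the devscripts code (which is written in Perl); the directory name and the harmless echo commands are invented for the example, and a POSIX shell is assumed to be available.

```python
import shlex
import subprocess

# Hypothetical directory name taken from an untrusted source archive.
dirname = "pkg-1.0; echo INJECTED"

# Unsafe: the name becomes part of the command line, so the ";" ends the
# first command and starts a second one chosen by the attacker.
unsafe_cmd = "echo listing " + dirname

# Safer: quote the name so that the shell sees it as one literal word.
quoted_cmd = "echo listing " + shlex.quote(dirname)


def run(cmd):
    """Run cmd through the shell and return its standard output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout


print(run(unsafe_cmd))  # two lines: the second command ran
print(run(quoted_cmd))  # one line containing the whole name
```

Passing an argument list to subprocess.run() without shell=True avoids the shell, and therefore the quoting problem, altogether.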

Fedora FEDORA-2013-23192 devscripts 2013-12-21


eucalyptus: denial of service and information disclosure

Package(s):eucalyptus CVE #(s):CVE-2012-4067 CVE-2013-2296
Created:January 1, 2014 Updated:January 1, 2014
Description: Eucalyptus contains two vulnerabilities in the "Walrus" object store. An XML parsing problem (CVE-2012-4067, ESA-09) can enable unspecified denial of service attacks, while a missing authentication step (CVE-2013-2296, ESA-10) could allow unauthorized access to the internal bucket logs.

Fedora FEDORA-2013-6117 eucalyptus 2013-12-19


horizon: information disclosure

Package(s):horizon CVE #(s):CVE-2013-6858
Created:December 20, 2013 Updated:April 4, 2014

From the Ubuntu advisory:

Chris Chapman discovered cross-site scripting (XSS) vulnerabilities in Horizon via the Volumes and Network Topology pages. An authenticated attacker could exploit these to conduct stored cross-site scripting (XSS) attacks against users viewing these pages in order to modify the contents or steal confidential data within the same domain.

openSUSE openSUSE-SU-2015:0078-1 openstack-dashboard 2015-01-19
Red Hat RHSA-2014:0365-01 python-django-horizon 2014-04-03
Ubuntu USN-2062-1 horizon 2013-12-19


keystone: access control bypass

Package(s):keystone CVE #(s):CVE-2013-6391
Created:December 20, 2013 Updated:April 7, 2014

From the Ubuntu advisory:

Steven Hardy discovered that Keystone did not properly enforce trusts when using the ec2tokens API. An authenticated attacker could exploit this to retrieve a token not scoped to the trust and elevate privileges to the trustor's roles.

Fedora FEDORA-2014-4210 openstack-keystone 2014-04-05
Red Hat RHSA-2014:0368-01 openstack-keystone 2014-04-03
Red Hat RHSA-2014:0089-01 openstack-keystone 2014-01-22
Ubuntu USN-2061-1 keystone 2013-12-19


libgadu: missing ssl certificate validation

Package(s):libgadu CVE #(s):CVE-2013-4488
Created:December 30, 2013 Updated:September 24, 2014
Description: From the Red Hat bugzilla:

Libgadu, an open library for communicating using the protocol e-mail, was found to have missing the ssl certificate validation. The issue is that libgadu uses openSSL library for creating secure connections. A program using openSSL can perform SSL handshake by invoking the SSL_connect function. Some certificate validation errors are signaled through, the return values of the SSL_connect, while for the others errors SSL_connect returns OK but sets internal "verify result" flags. Application must call ssl_get_verify_result function to check if any such errors occurred. This check seems to be missing in libgadu. And thus a man-in-the-middle attack is possible failing all the SSL protection.

Upstream suggested that it was a conscious decision as libgadu is reverse-engineered implementation of a proprietary protocol, they had no control over the certificates used for SSL connections, so they would add a note to the documentation about this.
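
The difference between a client that checks certificates and one that does not can be shown with Python's standard ssl module. This is an analogue only: libgadu is written in C against the OpenSSL API, and the helper names here are invented for the example.

```python
import ssl


def verifying_context():
    """Client context that fails the handshake on an invalid certificate."""
    ctx = ssl.create_default_context()
    # These are already the defaults; they are spelled out for clarity.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def non_verifying_context():
    """Client context that accepts any certificate (open to MITM attacks)."""
    ctx = ssl.create_default_context()
    # check_hostname must be cleared before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

A connection made with the second context still encrypts its traffic, but it will complete the handshake with whoever answers, which is the exposure described in the bug report.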

Gentoo 201508-02 libgadu 2015-08-15
Mandriva MDVSA-2014:185 libgadu 2014-09-24
Mageia MGASA-2014-0375 libgadu 2014-09-15
Fedora FEDORA-2013-23260 libgadu 2013-12-28
Fedora FEDORA-2013-23517 libgadu 2013-12-28


libreswan: denial of service

Package(s):libreswan CVE #(s):CVE-2013-4564
Created:December 23, 2013 Updated:January 1, 2014
Description: From the Red Hat bugzilla:

As noted in bug #1031818, libreswan suffers from a problem with the new ike_pad= feature that was implemented in version 3.6:

During an effort to ignore IKEv2 minor version numbers as required for RFC-5996, complete parse errors of any IKE packets with version 2.1+ were mistakenly accepted for further processing. This causes a crash later on if the IKE packet is mangled (e.g. too short). Openswan turns out not to be vulnerable because it happens to abort on the mismatched IKE length versus packet length before it inspects the rest of the IKE header. And since reading an invalid IKE major aborts further parsing of the IKE header, the length remains at 0, and so it will always mismatch.

Fedora FEDORA-2013-23250 libreswan 2013-12-23
Fedora FEDORA-2013-23315 libreswan 2013-12-23
Fedora FEDORA-2013-23299 libreswan 2013-12-23


memcached: multiple vulnerabilities

Package(s):memcached CVE #(s):CVE-2013-7239 CVE-2013-0179
Created:January 1, 2014 Updated:February 3, 2014
Description: From the Debian advisory:

CVE-2011-4971: Stefan Bucur reported that memcached could be caused to crash by sending a specially crafted packet.

CVE-2013-7239: It was reported that SASL authentication could be bypassed due to a flaw related to the management of the SASL authentication state. With a specially crafted request, a remote attacker may be able to authenticate with invalid SASL credentials.

openSUSE openSUSE-SU-2014:0951-1 memcached 2014-07-30
openSUSE openSUSE-SU-2014:0867-1 memcached 2014-07-03
Gentoo 201406-13 memcached 2014-06-14
Mageia MGASA-2014-0018 memcached 2014-01-21
Mandriva MDVSA-2014:010 memcached 2014-01-17
Ubuntu USN-2080-1 memcached 2014-01-13
Debian DSA-2832-1 memcached 2014-01-01
Fedora FEDORA-2014-0934 memcached 2014-02-03
Fedora FEDORA-2014-0926 memcached 2014-02-03
Oracle ELSA-2016-2819 memcached 2016-11-22


openssl: multiple vulnerabilities

Package(s):openssl CVE #(s):CVE-2013-6450 CVE-2013-6449
Created:January 1, 2014 Updated:December 29, 2014
Description: From the Debian advisory:

Multiple security issues have been fixed in OpenSSL: The TLS 1.2 support was susceptible to denial of service and retransmission of DTLS messages was fixed. In addition this updates disables the insecure Dual_EC_DRBG algorithm (which was unused anyway, see for further information) and no longer uses the RdRand feature available on some Intel CPUs as a sole source of entropy unless explicitly requested.

Fedora FEDORA-2014-17587 mingw-openssl 2015-01-02
Gentoo 201412-39 openssl 2014-12-25
Oracle ELSA-2014-1652 openssl 2014-10-16
Fedora FEDORA-2014-1567 mingw-openssl 2014-01-28
Mandriva MDVSA-2014:007 openssl 2014-01-17
Mageia MGASA-2014-0012 openssl 2014-01-17
Fedora FEDORA-2014-0476 openssl 2014-01-10
Slackware SSA:2014-013-02 openssl 2014-01-13
openSUSE openSUSE-SU-2014:0049-1 openssl 2014-01-12
openSUSE openSUSE-SU-2014:0048-1 openssl 2014-01-11
Fedora FEDORA-2014-0474 openssl 2014-01-12
Ubuntu USN-2079-1 openssl 2014-01-09
Fedora FEDORA-2014-0456 openssl 2014-01-10
Scientific Linux SLSA-2014:0015-1 openssl 2014-01-09
Oracle ELSA-2014-0015 openssl 2014-01-08
CentOS CESA-2014:0015 openssl 2014-01-08
Red Hat RHSA-2014:0015-01 openssl 2014-01-08
Debian DSA-2833-1 openssl 2014-01-01
Fedora FEDORA-2014-1560 mingw-openssl 2014-02-04


openssl: denial of service

Package(s):openssl CVE #(s):CVE-2013-6449
Created:December 23, 2013 Updated:January 6, 2014
Description: From the Red Hat bugzilla:

A flaw was reported for OpenSSL 1.0.1e, that can cause application using OpenSSL to crash when using TLS version 1.2. Issue was reported via the following OpenSSL upstream ticket:

Fedora FEDORA-2014-17587 mingw-openssl 2015-01-02
Oracle ELSA-2014-1652 openssl 2014-10-16
Slackware SSA:2014-013-02 openssl 2014-01-13
openSUSE openSUSE-SU-2014:0048-1 openssl 2014-01-11
Ubuntu USN-2079-1 openssl 2014-01-09
Fedora FEDORA-2014-0456 openssl 2014-01-10
openSUSE openSUSE-SU-2014:0018-1 openssl 2014-01-03
openSUSE openSUSE-SU-2014:0015-1 openssl 2014-01-03
openSUSE openSUSE-SU-2014:0012-1 openssl 2014-01-03
Mageia MGASA-2014-0008 openssl 2014-01-06
Fedora FEDORA-2013-23794 openssl 2013-12-22
Fedora FEDORA-2013-23788 openssl 2013-12-22
Fedora FEDORA-2013-23768 openssl 2013-12-22


perl-Proc-Daemon: writes pidfile with mode 666

Package(s):perl-Proc-Daemon CVE #(s):CVE-2013-7135
Created:December 30, 2013 Updated:January 27, 2014
Description: From the Red Hat bugzilla:

It was reported that perl-Proc-Daemon, when instructed to write a pid file, does that with a umask set to 0, so the pid file ends up with mode 666. This might be a security issue.
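
The effect of the umask on a newly created file is easy to demonstrate. The following Python sketch (the module itself is Perl, and the function and file names here are made up) creates two files with a requested mode of 0666, once with a umask of 0 and once with the common 022.

```python
import os
import stat
import tempfile


def create_pidfile(path, umask):
    """Create path requesting mode 0666 under umask; return the final mode."""
    old = os.umask(umask)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
    finally:
        os.umask(old)  # always restore the caller's umask
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    loose = create_pidfile(os.path.join(tmp, "loose.pid"), 0o000)
    tight = create_pidfile(os.path.join(tmp, "tight.pid"), 0o022)
    print(oct(loose), oct(tight))  # 0o666 0o644
```

With a umask of 0, any local user can rewrite the pid file, and thus potentially redirect whatever later acts on the process ID stored there.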

Mandriva MDVSA-2014:021 perl-Proc-Daemon 2014-01-24
Mageia MGASA-2014-0025 perl-Proc-Daemon 2014-01-24
Fedora FEDORA-2013-23646 perl-Proc-Daemon 2013-12-28
Fedora FEDORA-2013-23635 perl-Proc-Daemon 2013-12-28
Fedora FEDORA-2013-23594 perl-Proc-Daemon 2013-12-28


puppet: insecure temporary files

Package(s):puppet CVE #(s):CVE-2013-4969
Created:January 1, 2014 Updated:February 20, 2014
Description: From the Debian advisory:

An unsafe use of temporary files was discovered in Puppet, a tool for centralized configuration management. An attacker can exploit this vulnerability and overwrite an arbitrary file in the system.

Mageia MGASA-2014-0084 puppet & puppet3 2014-02-19
Fedora FEDORA-2014-0850 puppet 2014-01-23
Fedora FEDORA-2014-0825 puppet 2014-01-23
Debian DSA-2831-2 puppet 2014-01-17
Ubuntu USN-2077-2 puppet 2014-01-09
Ubuntu USN-2077-1 puppet 2014-01-06
Debian DSA-2831-1 puppet 2013-12-31
Mandriva MDVSA-2014:040 puppet 2014-02-18


python-setuptools: code execution

Package(s):python-setuptools CVE #(s):CVE-2013-2215
Created:January 1, 2014 Updated:March 30, 2015
Description: From the Red Hat bugzilla:

A security flaw was found in the way Python Setuptools, a collection of enhancements to the Python distutils module, that allows more easily to build and distribute Python packages, performed integrity checks when loading external resources, previously extracted from zipped Python Egg archives(formerly if the timestamp and file size of a particular resource expanded from the archive matched the original values, the resource was successfully loaded). A local attacker, with write permission into the Python's EGG cache (directory) could use this flaw to provide a specially-crafted resource (in expanded form) that, when loaded in an application requiring that resource to (be able to) run, would lead to arbitrary code execution with the privileges of the user running the application.

Fedora FEDORA-2013-23141 python-setuptools 2014-01-01
Fedora FEDORA-2013-23140 python-setuptools 2014-01-01


rubygem-actionmailer: denial of service

Package(s):rubygem-actionmailer-3_2 CVE #(s):CVE-2013-4389
Created:December 23, 2013 Updated:March 27, 2014
Description: From the CVE entry:

Multiple format string vulnerabilities in log_subscriber.rb files in the log subscriber component in Action Mailer in Ruby on Rails 3.x before 3.2.15 allow remote attackers to cause a denial of service via a crafted e-mail address that is improperly handled during construction of a log message.

Debian DSA-2888-1 ruby-actionpack-3.2 2014-03-27
Debian DSA-2887-1 ruby-actionmailer-3.2 2014-03-27
Fedora FEDORA-2014-0970 rubygem-activesupport 2014-01-24
Fedora FEDORA-2014-0970 rubygem-actionpack 2014-01-24
Fedora FEDORA-2014-0970 rubygem-actionmailer 2014-01-24
openSUSE openSUSE-SU-2014:0009-1 rubygem-actionpack-3_2 2014-01-03
openSUSE openSUSE-SU-2013:1931-1 rubygem-activesupport-3_2 2013-12-23
openSUSE openSUSE-SU-2013:1928-1 rubygem-actionmailer-3_2 2013-12-23


rubygem-i18n: cross-site scripting

Package(s):rubygem-i18n CVE #(s):CVE-2013-4492
Created:December 23, 2013 Updated:January 21, 2014
Description: From the CVE entry:

Cross-site scripting (XSS) vulnerability in exceptions.rb in the i18n gem before 0.6.6 for Ruby allows remote attackers to inject arbitrary web script or HTML via a crafted call.

Mageia MGASA-2014-0017 ruby-i18n 2014-01-21
Fedora FEDORA-2013-23034 rubygem-i18n 2013-12-19
Fedora FEDORA-2013-23062 rubygem-i18n 2013-12-19
Debian DSA-2830-1 ruby-i18n 2013-12-30
openSUSE openSUSE-SU-2013:1930-1 rubygem-i18n 2013-12-23


wireshark: multiple vulnerabilities

Package(s):wireshark CVE #(s):CVE-2013-7113 CVE-2013-7114
Created:December 20, 2013 Updated:January 6, 2014

From the CVE entries:

CVE-2013-7113 - epan/dissectors/packet-bssgp.c in the BSSGP dissector in Wireshark 1.10.x before 1.10.4 incorrectly relies on a global variable, which allows remote attackers to cause a denial of service (application crash) via a crafted packet.

CVE-2013-7114 - Multiple buffer overflows in the create_ntlmssp_v2_key function in epan/dissectors/packet-ntlmssp.c in the NTLMSSP v2 dissector in Wireshark 1.8.x before 1.8.12 and 1.10.x before 1.10.4 allow remote attackers to cause a denial of service (application crash) via a long domain name in a packet.

Scientific Linux SLSA-2014:0342-1 wireshark 2014-03-31
Oracle ELSA-2014-0342 wireshark 2014-03-31
CentOS CESA-2014:0342 wireshark 2014-03-31
Red Hat RHSA-2014:0342-01 wireshark 2014-03-31
openSUSE openSUSE-SU-2014:0020-1 wireshark 2014-01-03
openSUSE openSUSE-SU-2014:0017-1 wireshark 2014-01-03
openSUSE openSUSE-SU-2014:0013-1 wireshark 2014-01-03
Mandriva MDVSA-2013:296 wireshark 2013-12-20
Mageia MGASA-2013-0380 wireshark 2013-12-19
Debian DSA-2825-1 wireshark 2013-12-20


xen: denial of service/privilege escalation

Package(s):xen CVE #(s):CVE-2013-6400
Created:December 23, 2013 Updated:January 1, 2014
Description: From the CVE entry:

Xen 4.2.x and 4.3.x, when using Intel VT-d and a PCI device has been assigned, does not clear the flag that suppresses IOMMU TLB flushes when unspecified errors occur, which causes the TLB entries to not be flushed and allows local guest administrators to cause a denial of service (host crash) or gain privileges via unspecified vectors.

Gentoo 201407-03 xen 2014-07-16
openSUSE openSUSE-SU-2014:0483-1 xen 2014-04-04
openSUSE openSUSE-SU-2014:0482-1 xen 2014-04-04
SUSE SUSE-SU-2014:0373-1 Xen 2014-03-14
Fedora FEDORA-2013-23466 xen 2013-12-25
Fedora FEDORA-2013-23457 xen 2013-12-25
Fedora FEDORA-2013-23251 xen 2013-12-21


Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 3.13-rc6, released on December 29. Previously, 3.13-rc5 was released on December 22. "This might also be a good time to say that even _if_ things continue to calm down, I think we'll be going to at least -rc8 regardless, since LCA is fairly early this year, and I won't be opening the merge window for 3.14 until after I'm back from those travels."

Stable updates: 3.12.6, 3.10.25, and 3.4.75 were released on December 20.


Quotes of the week

If you don't have full source to your firmware you don't have a system you can trust.
Alan Cox

It may be that our existing non-Linux ports are not very widely used, undermaintained, and/or not of production quality. However, I think it is important for us to keep those options open. Of course that provides a space for people to work on them and use them, directly, but more importantly it keeps Debian's options open for the future. And the competition helps keep Linux honest, which is important because Linux is effectively unforkable, has a poor history of responsiveness to concerns of some of its downstream userbases, and has a nearly-unuseable governance setup.
Ian Jackson

And even if we default it to off, someone is going to cry and tell all the distributions to turn it on in /etc/sysctl.conf, just like they did for rp_filter. And they will. I don't have the strength and time to fight every person who makes these decisions at all the major distributions to explain to each and every one of them how foolish it would be.

No end host should have rp_filter on. It unnecessarily makes our routing lookups much more expensive for zero gain on an end host. But people convinced the distributions that turning it on everywhere by default was a good idea and it stuck.

David Miller


No more .bz2 files from

The administrators have announced that they will no longer be adding bzip2-compressed files to the archives, though all existing files will remain available. Going forward, kernel patches and tarballs, along with non-kernel-related files, will be compressed with gzip or xz.


Kernel development news

Btrfs: Working with multiple devices

By Jonathan Corbet
December 30, 2013
LWN's guide to Btrfs
The previous installments of this series on the Btrfs filesystem have focused on the basics of using Btrfs like any other Linux filesystem. But Btrfs offers a number of features not supported by the alternatives; near the top of that list is support for multiple physical devices. Btrfs is not just a filesystem; it also has its own RAID mechanism built in. This article will delve into how this feature works and how to make use of it.

There are two fundamental reasons to want to spread a single filesystem across multiple physical devices: increased capacity and greater reliability. In some configurations, RAID can also offer improved throughput for certain types of workloads, though throughput tends to be a secondary consideration. RAID arrays can be arranged into a number of configurations ("levels") that offer varying trade-offs between these parameters. Btrfs does not support all of the available RAID levels, but it does have support for the levels that most people actually want to use.

RAID 0 ("striping") can be thought of as a way of concatenating multiple physical disks together into a single, larger virtual drive. A strict striping implementation distributes data across the drives in a well-defined set of "stripes"; as a result, all of the drives must be the same size, and the total capacity is simply the product of the number of drives and the capacity of any individual drive in the array. Btrfs can be a bit more flexible than this, though, supporting a concatenation mode (called "single") which can work with unequally sized drives. In theory, any number of drives can be combined into a RAID 0 or "single" array.

RAID 1 ("mirroring") trades off capacity for reliability; in a RAID 1 array, two drives (of the same size) store identical copies of all data. The failure of a single drive can kill an entire RAID 0 array, but a RAID 1 array will lose no data in that situation. RAID 1 arrays will be slower for write-heavy use, since all data must be written twice, but they can be faster for read-heavy workloads, since any given read can be satisfied by either drive in the array.

RAID 10 is a simple combination of RAID 0 and RAID 1; at least two pairs of drives are organized into independent RAID 1 mirrored arrays, then data is striped across those pairs.

RAID 2, RAID 3, and RAID 4 are not heavily used, and they are not supported by Btrfs. RAID 5 can be thought of as a collection of striped drives with a parity drive added on (in reality, the parity data is usually distributed across all drives). A RAID 5 array with N drives has the storage capacity of a striped array with N-1 drives, but it can also survive the failure of any single drive in the array. RAID 6 uses a second parity drive, increasing the amount of space lost to parity blocks but adding the ability to lose two drives simultaneously without losing any data. A RAID 5 array must have at least three drives to make sense, while RAID 6 needs four drives. Both RAID 5 and RAID 6 are supported by Btrfs.
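
The capacity arithmetic for the levels described above can be summarized in a short Python sketch. It follows the textbook definitions given in the text and assumes equal-sized drives; the function name is invented, and the numbers that Btrfs itself reports may differ, especially with unequal drives.

```python
# Minimum number of drives for each level, as described in the text.
MINIMUM_DRIVES = {"raid0": 2, "raid1": 2, "raid10": 4, "raid5": 3, "raid6": 4}


def usable_capacity(level, drives, drive_size):
    """Nominal usable capacity of an array of equal-sized drives."""
    if level not in MINIMUM_DRIVES:
        raise ValueError(f"unknown RAID level: {level}")
    if drives < MINIMUM_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MINIMUM_DRIVES[level]} drives")
    if level == "raid0":
        return drives * drive_size          # pure striping, no redundancy
    if level == "raid1":
        if drives != 2:
            raise ValueError("this sketch models a two-drive mirror only")
        return drive_size                   # every block is stored twice
    if level == "raid10":
        if drives % 2:
            raise ValueError("raid10 needs an even number of drives")
        return (drives // 2) * drive_size   # striping across mirrored pairs
    if level == "raid5":
        return (drives - 1) * drive_size    # one drive's worth of parity
    return (drives - 2) * drive_size        # raid6: two drives' worth of parity


print(usable_capacity("raid5", 4, 1000))   # 3000
print(usable_capacity("raid6", 4, 1000))   # 2000
print(usable_capacity("raid10", 4, 1000))  # 2000
```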

One other noteworthy point is that Btrfs goes out of its way to treat metadata differently from file data. A loss of metadata can threaten the entire filesystem, while the loss of file data affects only that one file, which is a lower-cost, if still highly undesirable, failure. Metadata is usually stored in duplicate form in Btrfs filesystems, even when a single drive is in use. But the administrator can explicitly configure how data and metadata are stored on any given array, and the two can be configured differently: data might be simply striped in a RAID 0 configuration, for example, while metadata is stored in RAID 5 mode in the same filesystem. And, for added fun, these parameters can be changed on the fly.

A striping example

Earlier in this series, we used mkfs.btrfs to create a simple Btrfs filesystem. A more complete version of this command for the creation of multiple-device arrays looks like this:

    mkfs.btrfs -d mode -m mode dev1 dev2 ...

This command will group the given devices together into a single array and build a filesystem on that array. The -d option describes how data will be stored on that array; it can be single, raid0, raid1, raid10, raid5, or raid6. The placement of metadata, instead, is controlled with -m; in addition to the modes available for -d, it supports dup (metadata is stored twice somewhere in the filesystem). The storage modes for data and metadata are not required to be the same.

So, for example, a simple striped array with two drives could be created with:

    mkfs.btrfs -d raid0 /dev/sdb1 /dev/sdc1

Here, we have specified striping for the data; the default for metadata will be dup. This filesystem is mounted with the mount command as usual. Either /dev/sdb1 or /dev/sdc1 can be specified as the drive containing the filesystem; Btrfs will find all other drives in the array automatically.

The df command will only list the first drive in the array. So, for example, a two-drive RAID 0 filesystem with a bit of data on it looks like this:

    # df -h /mnt
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1       274G   30G  241G  11% /mnt

More information can be had with the btrfs command:

    root@dt:~# btrfs filesystem show /mnt
    Label: none  uuid: 4714fca3-bfcb-4130-ad2f-f560f2e12f8e
	    Total devices 2 FS bytes used 27.75GiB
	    devid    1 size 136.72GiB used 17.03GiB path /dev/sdb1
	    devid    2 size 136.72GiB used 17.01GiB path /dev/sdc1

(Subcommands to btrfs can be abbreviated, so one could type "fi" instead of "filesystem", but full commands will be used here). This output shows the data split evenly across the two physical devices; the total space consumed (17GiB on each device) somewhat exceeds the size of the stored data. That shows a commonly encountered characteristic of Btrfs: the amount of free space shown by a command like df is almost certainly not the amount of data that can actually be stored on the drive. Here we are seeing the added cost of duplicated metadata, among other things; as we will see below, the discrepancy between the available space shown by df and reality is even greater for some of the other storage modes.
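
The gap between stored data and consumed space can be read straight out of the output shown above. The following Python sketch parses that sample output and compares the "FS bytes used" figure with the sum of the per-device "used" figures. The regular expressions are an assumption based only on the output format shown in this article (sizes in GiB); other versions of the tool may print something different.

```python
import re

SHOW_OUTPUT = """\
Label: none  uuid: 4714fca3-bfcb-4130-ad2f-f560f2e12f8e
    Total devices 2 FS bytes used 27.75GiB
    devid    1 size 136.72GiB used 17.03GiB path /dev/sdb1
    devid    2 size 136.72GiB used 17.01GiB path /dev/sdc1
"""

# The unit after "used" is optional because an empty device shows "used 0.00".
DEVICE_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)GiB\s+used\s+([\d.]+)(?:GiB)?\s+path\s+(\S+)"
)
FS_USED = re.compile(r"FS bytes used\s+([\d.]+)GiB")


def summarize(show_output):
    """Return (FS bytes used, space consumed across all devices) in GiB."""
    fs_used = float(FS_USED.search(show_output).group(1))
    consumed = sum(float(m.group(3)) for m in DEVICE_LINE.finditer(show_output))
    return fs_used, round(consumed, 2)


fs_used, consumed = summarize(SHOW_OUTPUT)
overhead = round(consumed - fs_used, 2)
```

For the two-drive array above, this reports 27.75GiB of stored data against 34.04GiB consumed on the devices.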

Device addition and removal

Naturally, no matter how large a particular filesystem is when the administrator sets it up, it will prove too small in the long run. That is simply one of the universal truths of system administration. Happily, Btrfs makes it easy to respond to a situation like that; adding another drive (call it "/dev/sdd1") to the array described above is a simple matter of:

    # btrfs device add /dev/sdd1 /mnt

Note that this addition can be done while the filesystem is live — no downtime required. Querying the state of the updated filesystem reveals:

    # df -h /mnt
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1       411G   30G  361G   8% /mnt

    # btrfs filesystem show /mnt
    Label: none  uuid: 4714fca3-bfcb-4130-ad2f-f560f2e12f8e
	    Total devices 3 FS bytes used 27.75GiB
	    devid    1 size 136.72GiB used 17.03GiB path /dev/sdb1
	    devid    2 size 136.72GiB used 17.01GiB path /dev/sdc1
	    devid    3 size 136.72GiB used 0.00 path /dev/sdd1

The filesystem has been expanded with the addition of the new space, but there is no space consumed on the new drive. It is, thus, not a truly striped filesystem at this point, though the difference can be hard to tell. New data copied into the filesystem will be striped across all three drives, so the amount of used space will remain unbalanced unless explicit action is taken. To balance out the filesystem, run:

    # btrfs balance start -d -m /mnt
    Done, had to relocate 23 out of 23 chunks

The flags say to balance both data and metadata across the array. A balance operation involves moving a lot of data between drives, so it can take some time to complete; it will also slow access to the filesystem. There are subcommands to pause, resume, and cancel the operation if need be. Once it is complete, the picture of the filesystem looks a little different:

    # btrfs filesystem show /mnt
    Label: none  uuid: 4714fca3-bfcb-4130-ad2f-f560f2e12f8e
	    Total devices 3 FS bytes used 27.78GiB
	    devid    1 size 136.72GiB used 10.03GiB path /dev/sdb1
	    devid    2 size 136.72GiB used 10.03GiB path /dev/sdc1
	    devid    3 size 136.72GiB used 11.00GiB path /dev/sdd1

The data has now been balanced (approximately) equally across the three drives in the array.

Devices can also be removed from an array with a command like:

    # btrfs device delete /dev/sdb1 /mnt

Before the device can actually be removed, it is, of course, necessary to relocate any data stored on that device. So this command, too, can take a long time to run; unlike the balance command, device delete offers no way to pause and restart the operation. Needless to say, the command will not succeed if there is not sufficient space on the remaining drives to hold the data from the outgoing drive. It will also fail if removing the device would cause the array to fall below the minimum number of drives for the RAID level of the filesystem; a RAID 0 filesystem cannot be left with a single drive, for example.
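
The two failure conditions described above can be expressed as a simple pre-flight check. This Python sketch is a deliberately simplified model with an invented function name: it looks only at total free space and the drive count, while the real allocator works with chunks and RAID profiles and may refuse in cases this check would allow.

```python
def can_remove(devices, target, min_devices):
    """Simplified check for removing `target` from an array.

    `devices` maps a device path to a (size, used) pair in GiB;
    `min_devices` is the minimum drive count for the array's RAID level.
    """
    if target not in devices:
        raise KeyError(target)
    remaining = {path: info for path, info in devices.items() if path != target}
    # Condition 1: the array must not fall below its minimum drive count.
    if len(remaining) < min_devices:
        return False
    # Condition 2: the remaining drives must have room for the outgoing data.
    free = sum(size - used for size, used in remaining.values())
    return free >= devices[target][1]


# The balanced three-drive RAID 0 array from the example above.
array = {
    "/dev/sdb1": (136.72, 10.03),
    "/dev/sdc1": (136.72, 10.03),
    "/dev/sdd1": (136.72, 11.00),
}
removable = can_remove(array, "/dev/sdb1", min_devices=2)
```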

Note that any drive can be removed from an array; there is no "primary" drive that must remain. So, for example, a series of add and delete operations could be used to move a Btrfs filesystem to an entirely new set of physical drives with no downtime.

Other RAID levels

The management of the other RAID levels is similar to RAID 0. To create a mirrored array, for example, one could run:

    mkfs.btrfs -d raid1 -m raid1 /dev/sdb1 /dev/sdc1

With this setup, both data and metadata will be mirrored across both drives. At least two drives are required for RAID 1 arrays; these arrays, once again, can look a little confusing to tools like df:

    # du -sh /mnt
    28G	    /mnt

    # df -h /mnt
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1       280G   56G  215G  21% /mnt

Here, df shows 56GB of space taken, while du swears that only half that much data is actually stored there. The listed size of the filesystem is also wrong, in that it shows the total space, not taking into account that every block will be stored twice; a user who attempts to store that much data in the array will be sorely disappointed. Once again, more detailed and correct information can be had with:

    # btrfs filesystem show /mnt
    Label: none  uuid: e7e9d7bd-5151-45ab-96c9-e748e2c3ee3b
	    Total devices 2 FS bytes used 27.76GiB
	    devid    1 size 136.72GiB used 30.03GiB path /dev/sdb1
	    devid    2 size 142.31GiB used 30.01GiB path /dev/sdc1

Here we see the full data (plus some overhead) stored on each drive.

A RAID 10 array can be created with the raid10 profile; this type of array requires an even number of drives, with four drives at a minimum. Drives can be added to — or removed from — an active RAID 10 array, but, again, only in pairs. RAID 5 arrays can be created from any number of drives with a minimum of three; RAID 6 needs a minimum of four drives. These arrays, too, can handle the addition and removal of drives while they are mounted.
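
The drive-count rules and space costs of the various profiles can be summarized in a few lines of code. The Python sketch below is an approximation under stated assumptions: it takes the minimum drive counts quoted in this article, assumes that all drives are the same size, and ignores metadata overhead entirely, so real filesystems will hold somewhat less. The function name is invented for the example.

```python
# Minimum drive counts as given in this article.
MIN_DRIVES = {"raid0": 2, "raid1": 2, "raid10": 4, "raid5": 3, "raid6": 4}


def usable_capacity(profile, drives, drive_size):
    """Estimate usable space for `drives` equal-sized drives."""
    if profile not in MIN_DRIVES:
        raise ValueError(f"unknown profile: {profile}")
    if drives < MIN_DRIVES[profile]:
        raise ValueError(f"{profile} needs at least {MIN_DRIVES[profile]} drives")
    if profile == "raid10" and drives % 2:
        raise ValueError("raid10 needs an even number of drives")

    if profile == "raid0":
        return drives * drive_size          # pure striping, nothing lost
    if profile in ("raid1", "raid10"):
        return drives * drive_size / 2      # every block is stored twice
    parity_drives = 1 if profile == "raid5" else 2
    return (drives - parity_drives) * drive_size


mirror = usable_capacity("raid1", 2, 140)
```

For the mirrored array shown earlier, the estimate is half of the 280G reported by df, which agrees with the observation that df overstates the usable size by a factor of two.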

Conversion and recovery

Imagine for a moment that a three-device RAID 0 array has been created and populated with a bit of data:

    # mkfs.btrfs -d raid0 -m raid0 /dev/sdb1 /dev/sdc1 /dev/sdd1
    # mount /dev/sdb1 /mnt
    # cp -r /random-data /mnt

At this point, the state of the array looks somewhat like this:

    # btrfs filesystem show /mnt
    Label: none  uuid: 6ca4e92a-566b-486c-a3ce-943700684bea
	    Total devices 3 FS bytes used 6.57GiB
	    devid    1 size 136.72GiB used 4.02GiB path /dev/sdb1
	    devid    2 size 136.72GiB used 4.00GiB path /dev/sdc1
	    devid    3 size 136.72GiB used 4.00GiB path /dev/sdd1

After suffering a routine disk disaster, the system administrator then comes to the conclusion that there is value in redundancy and that, thus, it would be much nicer if the above array used RAID 5 instead. It would be entirely possible to change the setup of this array by backing it up, creating a new filesystem in RAID 5 mode, and restoring the old contents into the new array. But the same task can be accomplished without downtime by converting the array on the fly:

    # btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt

(The balance filters page on the Btrfs wiki and this patch changelog have better information on the balance command than the btrfs man page). Once again, this operation can take a long time; it involves moving a lot of data between drives and generating checksums for everything. At the end, though, the administrator will have a nicely balanced RAID 5 array without ever having had to take the filesystem offline:

    # btrfs filesystem show /mnt
    Label: none  uuid: 6ca4e92a-566b-486c-a3ce-943700684bea
	    Total devices 3 FS bytes used 9.32GiB
	    devid    1 size 136.72GiB used 7.06GiB path /dev/sdb1
	    devid    2 size 136.72GiB used 7.06GiB path /dev/sdc1
	    devid    3 size 136.72GiB used 7.06GiB path /dev/sdd1

Total space consumption has increased, due to the addition of the parity blocks, but otherwise users should not notice the conversion to the RAID 5 organization.

A redundant configuration does not prevent disk disasters, of course, but it does enable those disasters to be handled with a minimum of pain. Let us imagine that /dev/sdc1 in the above array starts to show signs of failure. If the administrator has a spare drive (we'll call it /dev/sde1) available, it can be swapped into the array with a command like:

    btrfs replace start /dev/sdc1 /dev/sde1 /mnt

If needed, the -r flag will prevent the system from trying to read from the outgoing drive if possible. Replacement operations can be canceled, but they cannot be paused. Once the operation is complete, /dev/sdc1 will no longer be a part of the array and can be disposed of.

Should a drive fail outright, it may be necessary to mount the filesystem in the degraded mode (with the "-o degraded" flag). The dead drive can then be removed with:

    btrfs device delete missing /mnt

The word "missing" is recognized as meaning a drive that is expected to be part of the array, but which is not actually present. The replacement drive can then be added with btrfs device add, probably followed by a balance operation.
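
The recovery procedure for a dead drive is a fixed sequence of commands, which makes it a candidate for a checklist or script. The Python sketch below only assembles the commands in the order the text gives them and does not execute anything; the function name and the example device names are placeholders. Since device delete refuses to shrink an array below its minimum drive count, the add step may need to come first in some setups.

```python
def recovery_commands(surviving_device, replacement, mountpoint, rebalance=True):
    """Return the recovery commands described above, in order, without running them."""
    commands = [
        # Mount the array even though one of its drives is absent.
        ["mount", "-o", "degraded", surviving_device, mountpoint],
        # Drop the drive that is expected but not present.
        ["btrfs", "device", "delete", "missing", mountpoint],
        # Bring the replacement drive into the array.
        ["btrfs", "device", "add", replacement, mountpoint],
    ]
    if rebalance:
        commands.append(["btrfs", "balance", "start", mountpoint])
    return commands


plan = recovery_commands("/dev/sdb1", "/dev/sde1", "/mnt")
```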


The multiple-device features have been part of the Btrfs design from the early days, and, for the most part, this code has been in the mainline and relatively stable for some time. The biggest exception is the RAID 5 and RAID 6 support, which was merged for 3.9. Your editor has not seen huge numbers of problem reports for this functionality, but the fact remains that it is relatively new and there may well be a surprise or two there that users have not yet encountered.

Built-in support for RAID arrays is one of the key Btrfs features, but the list of advanced capabilities does not stop there. Another fundamental aspect of Btrfs is its support for subvolumes and snapshots; those will be discussed in the next installment in this series.


Understanding the Jailhouse hypervisor, part 1

January 1, 2014

This article was contributed by Valentine Sinitsyn

Jailhouse is a new hypervisor designed to cooperate with Linux and run bare-metal applications or modified guest operating systems. Despite this cooperation, Jailhouse is self-contained and uses Linux only to bootstrap and (later) manage itself. The hypervisor is free software released under GPLv2 by Siemens; the Jailhouse project was publicly announced in November 2013, and is in an early stage of development. Currently, Jailhouse supports 64-bit x86 systems only; ARM support is on the roadmap, though, and, given that the code is portable, we may see more architectures added to this list in the future.

Linux has many full-fledged hypervisors (including KVM and Xen), so why bother creating another one? Jailhouse is different. First of all, it is a partitioning hypervisor that is more concerned with isolation than virtualization. Jailhouse is lightweight and doesn't provide many features one traditionally expects from virtualization systems. For example, there is no support for overcommitment of resources, guests can't share a CPU because there is no scheduler, and Jailhouse can't emulate devices you don't have.

Instead, Jailhouse enables asymmetric multiprocessing (AMP) on top of an existing Linux setup and splits the system into isolated partitions called "cells." Each cell runs one guest and has a set of assigned resources (CPUs, memory regions, PCI devices) that it fully controls. The hypervisor's job is to manage cells and maintain their isolation from each other. This approach is most useful for virtualizing tasks that require full control over the CPU; examples include realtime control tasks and long-running number crunchers (high-performance computing). Besides these, it can be used for security applications: to create sandboxes, for example.

A running Jailhouse system has at least one cell known as the "Linux cell." It contains the Linux system used to initially launch the hypervisor and to control it afterward. This cell's role is somewhat similar to that of dom0 in Xen. However, the Linux cell doesn't assert full control over hardware resources as dom0 does; instead, when a new cell is created, the Linux cell cedes control over some of its CPU, device, and memory resources to that new cell. This process is called "shrinking".

Jailhouse relies on hardware-assisted virtualization features provided by the target architecture; for Intel processors (the only ones supported as of this writing) this means VT-x and VT-d support. These requirements make the hypervisor design clean, its code compact and relatively simple; the goal is to keep Jailhouse below 10,000 lines of code. Traditionally, hypervisors were either large and complex, or intentionally simple if built for the classroom. Jailhouse fits in between: it is a real product targeted at production use that is small enough to cover in a two-part article series.

The easiest way to play with Jailhouse now is to run it inside KVM with a simple bare-metal application, apic-demo.bin (provided with the Jailhouse source), as a guest. In this case, VT-d is not used since KVM doesn't emulate it (yet). The README file describes how to create this setup in detail; additional help can be found in the mailing list archives.

Running Jailhouse on real hardware is also possible, but is not very easy at this time. You will need to describe the resources available to Jailhouse (a process covered in the next section); a good starting point for this is the contents of /proc/iomem in your Linux system. This is an error-prone process, but hopefully this article will provide enough insight into how Jailhouse works internally to get it running on the hardware of your choice.

A good introduction to Jailhouse (including slides) can be found in the initial announcement. We won't reproduce it here but rather will dive straight into the hypervisor internals.

Data structures

Before it can be used to partition a real system, the Jailhouse system must be told how that system is put together. To that end, Jailhouse uses struct jailhouse_system (defined in cell-config.h) as a descriptor for the system it runs on. This structure contains three fields:

  • hypervisor_memory, which defines Jailhouse's location in memory;

  • config_memory, which points to the region where hardware configuration is stored (for x86, it's the ACPI tables); and

  • system, a cell descriptor which sets the initial configuration for the Linux cell.

A cell descriptor starts with struct jailhouse_cell_desc, defined in cell-config.h as well. This structure contains basic information like the cell's name, size of its CPU set, the number of memory regions, IRQ lines, and PCI devices. Associated with struct jailhouse_cell_desc are several variable-sized arrays which follow immediately after it in memory; these arrays are:

  • A bitmap which lists the cell's CPUs.

  • An array which stores the physical address, guest physical address (virt_start), size, and access flags for this cell's memory regions. There can be many of these regions, corresponding to the cell's RAM (currently it must be the first region), PCI, ACPI, or I/O APIC, etc. See config/qemu-vm.c for an example.

  • An array which describes the cell's IRQ lines. It's unused now and may disappear or change in the future.

  • The I/O bitmap, which controls I/O ports accessible from the cell (setting a bit indicates that the associated port is inaccessible). This is x86-only, since no other supported architecture has a separate I/O space.

  • An array which maps PCI devices to VT-d domains.
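
The inverted sense of the I/O bitmap (a set bit blocks the port) is easy to get wrong, so a small model may help. The Python sketch below is not Jailhouse code: the helper names are invented, and the layout (one bit per port, least-significant bit first within each byte) is an assumption chosen to match the usual x86 convention.

```python
IO_PORTS = 65536  # size of the x86 I/O port space


def new_io_bitmap():
    """Start with every port blocked (all bits set)."""
    return bytearray(b"\xff" * (IO_PORTS // 8))


def allow_ports(bitmap, first, count):
    """Clear the bits for `count` ports starting at `first`, making them accessible."""
    for port in range(first, first + count):
        bitmap[port // 8] &= ~(1 << (port % 8)) & 0xFF


def port_accessible(bitmap, port):
    """A cleared bit means the cell may access the port."""
    return not (bitmap[port // 8] & (1 << (port % 8)))


bitmap = new_io_bitmap()
allow_ports(bitmap, 0x3F8, 8)  # for example, the eight ports of a legacy serial port
```

Starting from an all-ones bitmap and clearing bits for specific ranges gives a deny-by-default policy, which suits a hypervisor whose main concern is isolation.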

Currently, Jailhouse has no human-readable configuration files. Instead, the C structures mentioned above are compiled with the "-O binary" objcopy flag to produce raw binaries rather than ELF objects, and the jailhouse user-space tool (see tools/jailhouse.c) loads them into memory in that form. Creating such descriptors is tedious work that requires extensive knowledge of the hardware architecture. There are no sanity checks for descriptors except basic validation, so you can easily create something unusable. Nothing prevents Jailhouse from using higher-level XML or similar text-based configuration files in the future; it is just not implemented yet.
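
To make the idea of a fixed header followed by variable-sized arrays more concrete, here is a Python sketch that packs and unpacks such a raw binary. The field order, widths, and names are hypothetical and chosen for illustration; the authoritative layout is the one in cell-config.h. Only the CPU bitmap and the memory-region array are modeled, and the IRQ, I/O bitmap, and PCI arrays are left out for brevity.

```python
import struct

# Hypothetical layouts; the real definitions live in cell-config.h.
# Header: name, CPU set size, and counts of memory regions, IRQ lines, PCI devices.
HEADER = struct.Struct("<32sIIII")
# Memory region: physical start, guest physical start, size, access flags.
MEM_REGION = struct.Struct("<QQQQ")


def pack_cell(name, cpu_bitmap, regions):
    """Serialize a header followed immediately by its trailing arrays."""
    header = HEADER.pack(name.encode(), len(cpu_bitmap), len(regions), 0, 0)
    body = bytes(cpu_bitmap) + b"".join(MEM_REGION.pack(*r) for r in regions)
    return header + body


def unpack_cell(blob):
    """Walk the blob using the sizes and counts stored in the header."""
    name, cpu_size, n_regions, _irqs, _pci = HEADER.unpack_from(blob, 0)
    offset = HEADER.size
    cpu_bitmap = blob[offset:offset + cpu_size]
    offset += cpu_size
    regions = [
        MEM_REGION.unpack_from(blob, offset + i * MEM_REGION.size)
        for i in range(n_regions)
    ]
    return name.rstrip(b"\0").decode(), cpu_bitmap, regions


# Made-up example: a cell named "demo" owning CPU 3 and one 1MB region.
blob = pack_cell("demo", b"\x08", [(0x3F000000, 0, 0x100000, 0x7)])
```

The reader has to trust the counts in the header to find where each array starts and ends, which is one reason a malformed descriptor can so easily produce something unusable.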

Another common data structure is struct per_cpu, which is architecture-specific and defined (for x86) in x86/include/asm/percpu.h. It describes a CPU that is assigned to a cell. Throughout this text, we will refer to it as cpu_data. There is one cpu_data structure for each processor Jailhouse manages, and it is stored in a per-CPU memory region called per_cpu. cpu_data contains information like the logical CPU identifier (cpu_id field), APIC identifier (apic_id), the hypervisor stack (stack[PAGE_SIZE]), a back reference to the cell this CPU belongs to (cell), a set of Linux registers (i.e. register values used when Linux moved to this CPU's cell), and the CPU mode (stopped, wait-for-SIPI, etc). It also holds the VMXON and VMCS regions required for VT-x.

Finally, there is struct jailhouse_header defined in header.h, which describes the hypervisor as a whole. It is located at the very beginning of the hypervisor binary image and contains information like the hypervisor entry point address, its memory size, page offset, and number of possible/online CPUs. Some fields in this structure have static values, while the loader initializes the others at Jailhouse startup.

Enabling Jailhouse

Jailhouse operates in a physically contiguous memory region. Currently, this region must be reserved at boot using the "memmap=" kernel command-line parameter; future versions may use the contiguous memory allocator (CMA) instead. When you enable Jailhouse, the loader linearly maps this memory into the kernel's virtual address space. Its offset from the memory region's base address is stored in the page_offset field of the header. This makes converting from host virtual to physical address (and the reverse) trivial.
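
With a linear mapping, address conversion is a single addition or subtraction. The Python sketch below assumes that page_offset is the difference between the virtual address at which the region is mapped and its physical base, which is how the description above reads; the helper names and the addresses in the example are made up.

```python
def make_converters(phys_base, virt_base):
    """Return (page_offset, phys_to_virt, virt_to_phys) for a linear mapping."""
    page_offset = virt_base - phys_base

    def phys_to_virt(phys):
        return phys + page_offset

    def virt_to_phys(virt):
        return virt - page_offset

    return page_offset, phys_to_virt, virt_to_phys


# Hypothetical addresses, for illustration only.
PHYS_BASE = 0x3C000000
VIRT_BASE = 0xFFFFC90000000000
page_offset, phys_to_virt, virt_to_phys = make_converters(PHYS_BASE, VIRT_BASE)
```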

To enable the hypervisor, Jailhouse needs to initialize its subsystems, create a Linux cell according to the system configuration, enable VT-x on each CPU, and, finally, migrate Linux into its cell to continue running in guest mode. From this point, the hypervisor asserts full control over the system's resources. As stated earlier, Jailhouse doesn't depend on Linux to provide services to guests. However, Linux is used to initialize the hypervisor and to control it later. For these tasks, the jailhouse user-space tool issues ioctl() commands to /dev/jailhouse. The jailhouse.ko module (the loader), compiled from main.c, registers this device node when it is loaded into the kernel.

To start the sequence of events described above, the jailhouse tool is used to issue a JAILHOUSE_ENABLE ioctl() which causes a call to jailhouse_enable(). It loads the hypervisor code into the reserved memory region via a request_firmware() call. Then jailhouse_enable() maps Jailhouse's reserved memory region into kernel space using ioremap() and marks its pages as executable. The hypervisor and a system configuration (struct jailhouse_system) copied from user space are laid out in the reserved region. Finally, jailhouse_enable() calls enter_hypervisor() on each CPU, passing it the header, and waits until all these calls return. After that, Jailhouse is considered enabled and the firmware is released.

enter_hypervisor() is really a thin wrapper that jumps to the entry point set in the header. The entry point is defined in hypervisor/setup.c as arch_entry, which is coded in assembler and resides in x86/entry.S. This code locates the per_cpu region for a given cpu_id, stores the Linux stack pointer and cpu_id in it, sets the Jailhouse stack, and calls the architecture-independent entry() function, passing it a pointer to cpu_data. When this function returns, the Linux stack pointer is restored.

The entry() function is what actually enables Jailhouse. It behaves slightly differently for the first CPU it initializes than for the rest of them. The first CPU is called "master"; it is responsible for system-wide initialization and checks. It sets up paging, maps config_memory if it is present in the system configuration, checks the memory regions defined in the Linux cell descriptor for alignment and access flags, initializes the APIC, creates Jailhouse's Interrupt Descriptor Table (IDT), configures x2APIC guest (VMX non-root) access (if available), and initializes the Linux cell. After that, VT-d is enabled and configured for the Linux cell. Non-master CPUs, instead, only initialize themselves.
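
The division of labor between the master and the other CPUs can be modeled in a few lines. The following Python sketch is a serial simulation with invented names; the real entry() runs concurrently on every CPU and needs synchronization that is not shown here.

```python
class HypervisorSetup:
    """Toy model of the master/non-master split in entry()."""

    def __init__(self, cpu_count):
        self.cpu_count = cpu_count
        self.master_cpu = None
        self.initialized = []
        self.log = []

    def entry(self, cpu_id):
        """Initialize one CPU; return True once every CPU has been initialized."""
        if self.master_cpu is None:
            # The first CPU to arrive becomes the master and does the
            # system-wide work exactly once.
            self.master_cpu = cpu_id
            self.log.append(f"cpu {cpu_id}: system-wide init")
        # Every CPU, master or not, initializes itself.
        self.log.append(f"cpu {cpu_id}: per-CPU init")
        self.initialized.append(cpu_id)
        return len(self.initialized) == self.cpu_count


setup = HypervisorSetup(cpu_count=2)
first_done = setup.entry(1)
second_done = setup.entry(0)
```

Note that the master is simply whichever CPU gets there first, not necessarily CPU 0.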

CPU initialization

CPU initialization is a lengthy process that begins in the cpu_init() function. For starters, the CPU is registered as a "Linux CPU": its ID is validated, and, if it is in the system CPU set, it is added to the Linux cell. The rest of the procedure is architecture-specific and continues in arch_cpu_init(). For x86, it saves the current register values in the cpu_data structure. These values will be restored on first VM entry. Then Jailhouse swaps the IDT (interrupt handlers), the Global Descriptor Table (GDT) that contains segment descriptors, and CR3 (page directory pointer) register with its own values.

Finally, arch_cpu_init() fills the cpu_data->apic_id field (see apic_cpu_init()) and configures Virtual Machine Extensions (VMX) for the CPU. This is done in vmx_cpu_init(), which first checks that the CPU provides all the required features. Then it prepares the Virtual Machine Control Structure (VMCS), which is located in cpu_data, and enables VMX on the CPU. The VMCS region is configured in vmcs_setup() so that on every VM entry or exit:

  • The host (Jailhouse) gets the appropriate control and segmentation register values. The corresponding VMCS fields are simply copied from the hardware registers set by arch_cpu_init(). The LMA and LME bits are raised in the host's IA32_EFER MSR, indicating that the processor is in 64-bit mode, and the stack pointer is set to the end of cpu_data->stack (remember that the stack grows down). The host's RIP (instruction pointer) is set to vm_exit(), defined in x86/entry.S, and interrupts are disabled in the host RFLAGS. vm_exit() calls the vmx_handle_exit() function and resumes VM execution with the VMRESUME instruction when it returns. This way, on each VM exit, interrupts are disabled and control is transferred to the dispatch function that analyzes the exit reason and acts appropriately. SYSENTER MSRs are cleared because Jailhouse has no user-space applications or system calls and its guests use a different means to switch to the hypervisor.

  • The guest gets its control and segmentation registers from cpu_data->linux_*. RSP and RIP are taken from the kernel stack frame created for the arch_entry() call. This way, on VM entry, Linux code will continue execution as if the entry() call in enter_hypervisor() has already completed; thus the kernel is transparently migrated to the cell. The guest's IA32_EFER MSR is also set to its Linux value so that 64-bit mode is enabled on VM entry. Cells besides the Linux cell will reset their CPUs just after initialization, overwriting the values defined here.
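
Two of the host-state details above, the IA32_EFER bits and the initial stack pointer, come down to simple arithmetic. The Python sketch below shows both; the bit positions are those documented for the x86-64 architecture, while the function names and the example addresses are invented for illustration and are not taken from the Jailhouse source.

```python
EFER_LME = 1 << 8    # Long Mode Enable
EFER_LMA = 1 << 10   # Long Mode Active
PAGE_SIZE = 4096     # the hypervisor stack is declared as stack[PAGE_SIZE]


def host_efer(current_efer):
    """Raise LME and LMA so that the host runs in 64-bit mode after a VM exit."""
    return current_efer | EFER_LME | EFER_LMA


def host_stack_pointer(stack_base, stack_size=PAGE_SIZE):
    """The stack grows down, so the initial pointer is the end of the array."""
    return stack_base + stack_size


efer = host_efer(0)
rsp = host_stack_pointer(0x1000)  # hypothetical address of cpu_data->stack
```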

When all CPUs are initialized, entry() calls arch_cpu_activate_vmm(). This is the point of no return: it sets the RAX register to zero, loads the remaining general-purpose registers, and issues a VMLAUNCH instruction to enter the guest. Due to the guest register setup described earlier and because RAX (which, by convention, stores function return values) is zero, Linux will consider the entry() call to be successful and move on as a guest.

This concludes Part 1 of the series. In Part 2, we will look at how Jailhouse handles interrupts, and at what needs to be done to create a cell and to disable the hypervisor.


Patches and updates

Kernel trees

  • Sebastian Andrzej Siewior: 3.12.6-rt9. (December 24, 2013)


Core kernel code

Development tools

Device drivers


Filesystems and block I/O

Virtualization and containers


  • Lucas De Marchi: kmod 16. (December 23, 2013)

Page editor: Jonathan Corbet


FedUp on Fedora 20

By Jake Edge
January 1, 2014

Distributions, like all software (and other) projects, have failures. One of the most important things that can come out of any kind of failure is to learn from it and try to prevent similar failures in the future. That is precisely the goal of Adam Williamson's post mortem on the FedUp bug that affected users trying to upgrade to Fedora 20. In it, he explained how and why things went bad, with an eye toward better testing to catch this kind of bug in the future. He also had some thoughts on how the current release process might be changed to help avoid bugs that arise because of the time crunch at the end of the cycle.

Williamson shepherds Fedora's quality assurance (QA) efforts and is thus well-placed to observe what went wrong and to suggest fixes going forward. QA didn't catch the bug before it got out into the wild and Williamson accepts his share of the blame for that. But blame is not really the purpose of the exercise. Finding the underlying problems and addressing them for the future are the goals.

When Fedora 20 was released, the FedUp version most Fedora 18 and 19 users had (fedup-0.7) would not properly upgrade those systems to F20. FedUp is the approved method for upgrading from one Fedora version to the next (or even for skipping a version and going straight from F18 to F20, for example). The solution was fairly simple, even for those who had tried and failed to upgrade with 0.7: get fedup-0.8 and use it. There was a bit more to it than that, particularly for F18 users, but that was the crux of the fix.

The bug was spotted quickly and fixed pretty quickly, but the upgrade process is one of the most high-profile places for a release-day bug. It would certainly have left a bad taste for any users who were bitten by it. The fact that the bug could easily be overcome helped, but it was something of a black eye for the distribution on a day intended to celebrate a new release.

So, how does Fedora avoid the problem in the future? The actual underlying cause of the bug has not been identified, according to Williamson, but it appears that the versions for FedUp and the fedup-dracut package must be kept the same, so that the initramfs created by fedup-dracut will work with the FedUp installed on the user's machine. Essentially, FedUp 0.7 was fetching an initramfs created by fedup-dracut-0.8, which would not work to reboot the system as part of the upgrade. Falling back to the F18 or F19 kernel and initramfs would still allow the system to boot, however.
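
A check of the kind that would have flagged the mismatch can be sketched in a few lines of Python. This is purely illustrative: the function names are invented, and the rule that the two version numbers must be equal is the working assumption described above, not a confirmed root cause.

```python
def major_minor(version):
    """Split a version string such as "0.8" into a (major, minor) tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def fedup_matches_initramfs(fedup_version, fedup_dracut_version):
    """Assume an upgrade initramfs only works with a FedUp of the same version."""
    return major_minor(fedup_version) == major_minor(fedup_dracut_version)


release_day = fedup_matches_initramfs("0.7", "0.8")
after_update = fedup_matches_initramfs("0.8", "0.8")
```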

Beyond the bug's proximate cause, Williamson noted several problems that led to the bug, including a lack of widespread knowledge about how FedUp works, inadequate test cases, and two problems that are endemic to Fedora's short stabilization phase: release candidates that are short-lived and large changes to fundamental packages made late in the cycle. The latter two tend to reduce the amount of time that QA has for testing, which can lead to more bugs slipping through the cracks. Large, late changes also mean that not all of the ramifications of a new feature are discovered pre-release, which is another source of surprises.

Adding better test cases is fairly straightforward. The existing tests were set up when FedUp was developing rapidly, so the test case grabbed the package from the updates-testing repository (rather than the stable or updates repositories). For Fedora 18 and 19, fedup-0.8 was in updates-testing, so QA never saw the bug. The tests have been changed to get the package from the other repositories.

The bug also probably led to a better understanding of at least some of the workings of FedUp within the Fedora development community. In tracking down the bug and fixes for it, some folks got a crash course in FedUp and how it operates. That may help address Williamson's concern about a lack of knowledge of the tool. Given its importance to the distribution, a tool like FedUp should be well understood by more than just a small handful of community members.

The other identified issues will be harder to address, at least in the short term. But, as Williamson noted, squeezing everything into the tail end of the release cycle is a known problem; this bug just helped highlight it again:

In wider terms, this issue is another indicator on top of several previous ones that we should redouble our efforts to get 'releaseable' RCs built days ahead of go/no-go, rather than hours. That's a whole story in itself, but this is something the parties involved are all aware of and working on. Of course, the whole release process may look somewhat different in a world, but as long as we have our current release schedule and freeze policies, this issue is likely to exist at least in essence.

It's also another good indicator that we should do whatever we can to try and land major changes much earlier in the release cycle. This is hardly a new observation, of course, nor an issue of which many relevant people were previously unaware, and there are always good reasons why we wind up landing the kitchen sink a week before release, but it's always good to have another reminder.

There are likely lessons for other projects and distributions here. While some of the issues were Fedora-specific, most were not. Williamson has done a nice service not only for Fedora here, but for the wider community. There are some real advantages to doing our work in the open—learning from other projects' successes and failures is just one of them.


Brief items

Distribution quotes of the week

Part of the printing packages on Gentoo are very similar to, err, well, a dumpster on a hot summer afternoon. Nobody really wants to lift the lid and look inside, and beware of poking around in it
-- Andreas K. Hüttel

If you try to outsmart your compiler, it will get it's revenge very soon and very hard.
-- Sven Eden

When it comes to technology choices, you win some and you lose some. If upstart wins, I will be happy. If systemd wins, I will also be happy, because it's long overdue that Debian *make a decision*; and for all that there are aspects of systemd that make me uncomfortable, it will still be far better than the status quo.
-- Steve Langasek


Positions forming in the Debian init system discussion

Some of the members of the Debian Technical Committee are starting to post their conclusions regarding which init system the distribution should use in the future. In particular, Ian Jackson has come out in favor of upstart: "Firstly, unlike the systemd maintainers, I think portability to non-Linux systems is important. It may be that our existing non-Linux ports are not very widely used, undermaintained, and/or not of production quality. However, I think it is important for us to keep those options open."

Russ Allbery, meanwhile, is in favor of systemd. "There are two separate conceptual areas in which I think systemd offers substantial advantages over upstart, each of which I would consider sufficient to choose systemd on its own. Together, they make a compelling case for systemd."

In both cases, the authors have extensively documented their reasons for their decisions; reading the full messages is recommended.

Comments (320 posted)

Red Hat Announces Red Hat Enterprise Linux OpenStack Platform 4.0

Red Hat has announced the general availability of Red Hat Enterprise Linux OpenStack Platform 4.0. "Engineered with Red Hat-hardened OpenStack Havana code, Red Hat Enterprise Linux 6.5, and the Red Hat Enterprise Virtualization Hypervisor built on KVM, Red Hat Enterprise Linux OpenStack Platform offers IT infrastructure teams, cloud application developers, and experienced cloud builders a clear path to the open hybrid cloud without compromising on availability, security, or performance."

Full Story (comments: none)

Distribution News

Debian GNU/Linux

Bits from the release team (freeze time line)

The Debian release team presents a timeline for the Jessie freeze and more. "PS: We will freeze on the 5th of November. Your packages will be RC bug free way before then, right?"

Full Story (comments: 1)

Debian Contributors

Enrico Zini and others have been working on making contributions to Debian more visible. "Debian is vast, and many people contribute to it, not just Debian Developers. We value and encourage all kinds of contributions, but we currently fail to make that work visible, and to credit it."

Full Story (comments: none)


Fedora 20 release day FedUp bug: post-mortem

When Fedora 20 was released, many people tried and failed to upgrade using FedUp, the approved upgrade mechanism. Adam Williamson presents a post-mortem of the FedUp problem, its solution, and lessons learned. "Prior to Fedora 20's release, the test cases for fedup recommended testing the latest version of fedup from updates-testing against the upgrade initramfs from the development/20 tree. This procedure was a holdover from the very early days of FedUp, when it was changing daily and testing anything older was uninteresting, and when procedures for the generation and publishing of the upgrade initramfs had not yet been clearly established (and TC/RC trees did not contain one). However, it is no longer appropriate for the more mature state of FedUp development at this point in time, and it should have been changed earlier. We in QA apologize to the project for this oversight."

Full Story (comments: none)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Santoku Linux (PC Magazine)

PC Magazine takes a look at Santoku Linux. "Santoku Linux puts the tools security professionals and hackers need to examine mobile malware, detect malicious apps, and forensically analyze data at their fingertips. If you're into mobile security and mobile forensics, Santoku Linux is worth a closer look."

Comments (none posted)

Page editor: Rebecca Sobol


Debating a "transitional" Python 2.8

By Jake Edge
January 1, 2014

Python 3 has been around for five years now, but adoption of the language still languishes—at least according to some. It is a common problem for projects that make non-backward-compatible changes, but it is even more difficult for programming languages. There is, typically, a huge body of installed code and, for Python, libraries that need to be upgraded as well. If the new features aren't compelling enough, and the previous version is already quite capable, it can lead to slow adoption of the new version—and frustration for some of its developers. Solving that problem for Python 3 is on the minds of some.

Recently, Alex Gaynor, who has worked on core Python and PyPy, expressed his concerns that Python 3 would never take off, partly because the benefits it brings are not significant enough to cause projects and users to switch over to Python 3. After five years, few are downloading packages for Python 3 and developers aren't moving to it either:

Looking at download statistics for the Python Package Index, we can see that Python 3 represents under 2% of package downloads. Worse still, almost no code is written for Python 3. As I said all of my new code supports Python 3, but I run it locally with Python 2, I test it locally with Python 2; Travis CI [continuous integration] runs it under Python 3 for me; certainly none of my code is Python 3 only. At companies with large Python code bases I talk to no one is writing Python 3 code, and basically none of them are thinking about migrating their codebases to Python 3.

That leads to language stagnation at some level, Gaynor said. Everyone is still using Python 2, so none of the features in 3.x are getting exercised. Since Python 2 is feature frozen, all of the new features go into versions that few are using. And it is mostly Python-language developers (or others closely related to the project) who are using the new features; the majority of "real" users are not, so the feedback on those features may be distorted:

The fact that Python 3 is being used exclusively by very early adopters means that what little feedback happens on new features comes from users who may not be totally representative of the broader community. And as we get farther and farther in the 3.X series it gets worse and worse. Now we're building features on top of other features and at no level have they been subjected to actual wide usage.

Gaynor concluded that the divergence of Python 2 and 3 has been bad for the community. He suggested releasing a Python 2.8 that backported all of the features from Python 3 and emitted warnings for constructs that would not work in Python 3. That, he said, would give users a path for moving to Python 3—and a way to start exercising its new features.

As might be guessed, not all were in favor of his plan, but the frustration over the Python 3 adoption rate seems fairly widespread—at least with the commenters on Gaynor's blog. So far, the conversation has not spilled over to python-ideas or some other mailing list.

Alan Franzoni agreed that some kind of transition path, rather than an abrupt jump, is needed.

We badly need smooth transitions, a version where features are deprecated and replacements exist at the same time, so we can progressively improve our libraries. Very few developers are willing to invest their time and resources in porting code to a not-so-used language version, and just one non-py3k-[compatible] library is enough for a project for not migrating to Python 3. It's a chicken-and-egg problem.

David A. Wheeler and others agreed with Franzoni, but "rwdim" felt that Gaynor did not go far enough:

The solution is clear: build a full compatibility layer, not a conversion tool, into 3.x that makes 2.x code run without change and ABRUPTLY stop development on 2.x. Keep the compats backward compatible for 2 releases and then deprecate items that need to die, forcing people to update.

Burn the ship and force people to move ashore or go away.

In addition, rwdim suggested that the Python package index (PyPI) stop serving packages that are not Python 3 compatible at some point. That last suggestion was not well received by PyPI maintainers and others, but it did attract other semi-belligerent comments. For example, "jgmitzen" likens users sticking with 2.x to terrorists (and to the Tea Party in another comment). While perhaps a bit overwrought, jgmitzen's point is that supporting 2.x in the Python ecosystem is taking time and energy away from 3.x—to the detriment of the language.

But "gsnedders" is not sure that a 2.8 really brings anything to the table. In the libraries that gsnedders maintains, things have gotten to the point where a single code base can support both >=2.6 and 3.x, and that should be true for most projects. The more recent feature additions for Python 3 are in the standard library, which means they are available for 2.x via PyPI.
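As a rough illustration of the single-code-base approach described above, the sketch below shows one common way to write a module that runs unchanged on Python 2.6+ and 3.x. It is not taken from gsnedders's libraries; the names PY3, text_type, to_text, and halve are hypothetical and chosen only for this example.

```python
# Hypothetical single-source module: the same file is meant to run
# unchanged on Python 2.6+ and on Python 3.x.
from __future__ import division, print_function, unicode_literals

import sys

# True on any 3.x interpreter, False on 2.x.
PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string on either major version."""
    if isinstance(value, bytes):
        # On Python 2, bytes is an alias for str, so this branch also
        # handles native 2.x byte strings.
        return value.decode(encoding)
    return text_type(value)


def halve(n):
    """True division on both versions, thanks to the __future__ import."""
    return n / 2


if __name__ == "__main__":
    print(to_text(b"caf\xc3\xa9") == "caf\u00e9")
    print(halve(3))
```

The __future__ imports make print, string literals, and division behave the Python 3 way on 2.x, which keeps the version-specific branching confined to the small text_type shim.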

Like rwdim, Sean Jensen-Grey would like to see an evolutionary approach so that a single interpreter can be used with both Python 2 and 3. In another comment, he referenced a March 2012 blog post from Aaron Swartz that outlines his vision of how the Python 3 transition should have worked. It followed the established pattern of adding new features to Python 2.x, which is clearly an evolutionary approach.

But Python 3 set out with a non-evolutionary approach. Python Enhancement Proposal (PEP) 3000 clearly specified a break with 2.x backward compatibility. The question seems to be: is it time to rethink that strategy in light of the slow adoption for Python 3?

It may simply be a matter of time, too. Linux distributions are starting to plan for shipping Python 3 as the default—some already have made the switch. Those kinds of changes can only help spur adoption, though it may still take a while.

In addition, some don't seem convinced that Python 3 adoption is lagging, or at least that it is lagging as badly as is sometimes portrayed. To start to answer that question, Dan Stromberg has put together a survey on Python 2.x/3.x use. Whatever the outcome of that, though, it seems likely that many core developers are less than completely pleased with where Python 3 uptake is today, and will be looking for ways to improve it.

Comments (13 posted)

Brief items

Quotes of the week[s]

Setting up an email server is a real pain in the ass, and if I didn't have working config files from the previous server, I'd have never got it working.

Email is something that everyone uses, every day. It's intrinsically federated.

We should really be working harder to make it easy for every family or individual to run an email server at home or on their own cloud server.

-- Evan Prodromou

I: 00check: Untarring chroot environment. This might take a minute or two.


-- Wouter Verhelst

Comments (10 posted)

GnuPG 1.4.16 released

Version 1.4.16 of the GNU Privacy Guard is out; it contains a fix for the recently disclosed acoustic cryptanalysis attack. "A possible scenario is that the attacker places a sensor (for example a standard smartphone) in the vicinity of the targeted machine. That machine is assumed to do unattended RSA decryption of received mails, for example by using a mail client which speeds up browsing by opportunistically decrypting mails expected to be read soon. While listening to the acoustic emanations of the targeted machine, the smartphone will send new encrypted messages to that machine and re-construct the private key bit by bit. A 4096 bit RSA key used on a laptop can be revealed within an hour." Note that GnuPG 2.x is not vulnerable to this particular attack.

Also worthy of note: the GnuPG developers have launched a crowdfunding campaign to help with GnuPG 2.1 development, update the project's infrastructure, and more.

Full Story (comments: 6)

Enlightenment DR 0.18.0 Release

There's a new major release of Enlightenment available, DR 0.18.0. This version includes Wayland client support, and much more. See the full release announcement for details.

Full Story (comments: 17)

GnuCash 2.6.0 released

Version 2.6.0 of the GnuCash accounting system has been released. New features include a reworked reports subsystem, the ability to attach external files (receipts, for example) to transactions, a number of new business features, a year-2038 fix, and a relicensing to GPLv2+. See the GnuCash 2.6.0 release tour page for more information.

Comments (none posted)

Darktable 1.4 released

Version 1.4 of the Darktable photo editor is out. New features include an embedded Lua engine for scripting, a number of new mask types, various performance enhancements, a new "waveform" histogram mode, and more.

Comments (3 posted)

Commotion 1.0 released

Commotion is a mesh networking system intended for resiliency in difficult situations; it claims a number of real-world deployments. The 1.0 release has just been announced. "The launch represents the first full iteration of the technology, which makes it possible for communities to build and own their communications infrastructure using 'mesh' networking. In mesh networks, users connect their devices to each other without having to route through traditional major infrastructure." Binary downloads (including a special OpenWRT image and an Android client) are available from this page; source is hosted on GitHub.

Comments (none posted)

notmuch 0.17 available

Version 0.17 of the notmuch email indexer has been released. This update fixes a major bug with SHA1 computation on big-endian machines; the fix makes this release incompatible with older releases in that regard, even though the old SHA1 computations were incorrect. "This meant that messages with overlong or missing message-ids were given different computed message-ids than on more common little endian architectures like i386 and amd64. If you use notmuch on a big endian architecture, you are strongly advised to make a backup of your tags using `notmuch dump` before this upgrade." Other changes include better handling of duplicate messages, many improvements to the Emacs front end, and a long list of assorted bugfixes and new options.

Full Story (comments: none)

GNU Octave 3.8.0 released

GNU Octave is a Matlab-like interpreted language for numerical computations; the 3.8.0 release has just been announced. "One of the biggest new features for Octave 3.8 is a graphical user interface. It is the one thing that users have requested most often over the last few years and now it is almost ready." Other new features include Matlab-compatible nested functions, named exceptions, and more; see the NEWS file for the full list.

Full Story (comments: none)

Newsletters and articles

Development newsletters from the past two weeks

Comments (none posted)

Gaynor: About Python 3

On his blog, Alex Gaynor laments the adoption rate of Python 3 and wonders if the split 2.x/3.x development model is to blame. "First, I think it's because of a lack of urgency. Many years ago, before I knew how to program, the decision to have Python 3 releases live in parallel to Python 2 releases was made. In retrospect this was a mistake, it resulted in a complete lack of urgency for the community to move, and the lack of urgency has given way to lethargy. Second, I think there's been little uptake because Python 3 is fundamentally unexciting. It doesn't have the super big ticket items people want, such as removal of the GIL or better performance (for which many are using PyPy). Instead it has many new libraries (whose need is largely filled by pip install), and small cleanups which many experienced Python developers just avoid by habit at this point."

Comments (98 posted)

Page editor: Nathan Willis


Brief items

More money for Cyanogen Inc.

Cyanogen Inc. has announced another round of venture funding, said to be on the order of $22 million. "What does this mean for you as a CM user? Not much yet, except that you’ll see more new things from us more often. We will continue to invest in the community by way of increased resources, sponsoring more events, and of course staying open. You’ll see new apps and features from us, new services, and also more devices which run CM out of the box."

Comments (7 posted)

Articles of interest

Free Software Supporter - Issue 69, December 2013

The December issue of the Free Software Foundation's newsletter covers fundraising, LibrePlanet, a Defective by Design visit to an Apple store, certification of the Gluglug X60 laptop, the end of life for Ututo, and several other topics.

Full Story (comments: none)

Taking stock of 2013's crowdfunded Linux devices (LinuxGizmos)

LinuxGizmos has a survey of crowdfunded Linux-based device projects launched in 2013. "Of the 19 such products listed below, five were never successfully crowdfunded. Of these unfunded devices, all but one appear to be moving forward with alternative funding. In fact, one — CrystalFontz America’s CFA10036 module — has already shipped. That leaves Canonical’s doomed, yet history making Ubuntu Edge smartphone as the only 'failure.'"

Comments (11 posted)

LinuxDevices content returns to the net

LinuxDevices founder Rick Lehrbaum has announced the return of a great deal of historical embedded Linux content to the web. "The LinuxDevices Archive is searchable and also available from a calendar interface, so you can click on any month of any year between 1999 and 2012 and see what pops up. Although some stories did not survive the various transitions between content management systems, the Archive includes over 14,000 LinuxDevices posts, most with images intact, including news, product showcases, and special articles and editorials. So far, just about everything we’ve searched for has emerged in good shape."

Comments (3 posted)

Calls for Presentations

MiniDebConf 2014 Barcelona Call for Proposals

Debian Women has announced a MiniDebConf that will take place March 15-16 in Barcelona, Spain. The call for proposals closes January 31. "The idea behind the conference is not to talk about women in free software, or women in Debian, but rather to make discussion about Debian subjects more inclusive for women. If you agree with this goal, spread the word. Forward this call for potential speakers and help us make this event a great success!"

Full Story (comments: none)

LSF/MM 2014 call for proposals

The 2014 Linux Storage, Filesystem, and Memory Management Summit will be held March 24-25 in Napa Valley, California. The call for proposals is out now for those who would like to participate; the deadline is January 31, but those who need visas to attend should get theirs in earlier. For those unfamiliar with this event, see LWN's coverage of the 2013 Summit for an overview of the type of discussion held there.

Full Story (comments: none)

CFP Deadlines: January 2, 2014 to March 3, 2014

The following listing of CFP deadlines is taken from the CFP Calendar.

Deadline      Event dates           Event                                         Location
January 7     March 15-16           Chemnitz Linux Days 2014                      Chemnitz, Germany
January 10    January 18-19         Paris Mini Debconf 2014                       Paris, France
January 15    February 28-March 2   FOSSASIA 2014                                 Phnom Penh, Cambodia
January 15    April 2-5             Libre Graphics Meeting 2014                   Leipzig, Germany
January 17    March 26-28           16. Deutscher Perl-Workshop 2014              Hannover, Germany
January 19    May 20-24             PGCon 2014                                    Ottawa, Canada
January 19    March 22              Linux Info Tag                                Augsburg, Germany
January 22    May 2-3               LOPSA-EAST 2014                               New Brunswick, NJ, USA
January 28    June 19-20            USENIX Annual Technical Conference            Philadelphia, PA, USA
January 30    July 20-24            OSCON 2014                                    Portland, OR, USA
January 31    March 29              Hong Kong Open Source Conference 2014         Hong Kong, Hong Kong
January 31    March 24-25           Linux Storage Filesystem & MM Summit          Napa Valley, CA, USA
January 31    March 15-16           Women MiniDebConf Barcelona 2014              Barcelona, Spain
January 31    May 15-16             ScilabTEC 2014                                Paris, France
February 1    April 29-May 1        Android Builders Summit                       San Jose, CA, USA
February 1    April 7-9             ApacheCon 2014                                Denver, CO, USA
February 1    March 26-28           Collaboration Summit                          Napa Valley, CA, USA
February 3    May 1-4               Linux Audio Conference 2014                   Karlsruhe, Germany
February 5    March 20              Nordic PostgreSQL Day 2014                    Stockholm, Sweden
February 8    February 14-16        Linux Vacation / Eastern Europe Winter 2014   Minsk, Belarus
February 9    July 21-27            EuroPython 2014                               Berlin, Germany
February 14   May 12-16             OpenStack Summit                              Atlanta, GA, USA
February 27   August 20-22          USENIX Security '14                           San Diego, CA, USA

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Events: January 2, 2014 to March 3, 2014

The following event listing is taken from the Calendar.

Date(s)               Event                                         Location
January 6             Sysadmin Miniconf at 2014                     Perth, Australia
January 6-10                                                        Perth, Australia
January 13-15         Real World Cryptography Workshop              NYC, NY, USA
January 17-18         QtDay Italy                                   Florence, Italy
January 18-19         Paris Mini Debconf 2014                       Paris, France
January 31            CentOS Dojo                                   Brussels, Belgium
February 1-2          FOSDEM 2014                                   Brussels, Belgium
February 3-4          Config Management Camp                        Gent, Belgium
February 4-5          Open Daylight Summit                          Santa Clara, CA, USA
February 7-9          Django Weekend Cardiff                        Cardiff, Wales, UK
February 7-9                                                        Brno, Czech Republic
February 14-16        Linux Vacation / Eastern Europe Winter 2014   Minsk, Belarus
February 21-23        2014                                          Gandhinagar, India
February 21-23        Southern California Linux Expo                Los Angeles, CA, USA
February 25           Open Source Software and Government           McLean, VA, USA
February 28-March 2   FOSSASIA 2014                                 Phnom Penh, Cambodia

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol

Copyright © 2014, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds