ARM-based server systems will hit the market, and those systems, naturally, will be running Linux. In the process, they will highlight no end of entertaining conflicts between the buttoned-down server space and the rather looser ARM world; the advent of ARM servers will also bring ARM developers closer to the core kernel, where they have had a relatively small presence thus far.
In 2013, we got a confirmation that some of the more paranoid people among us were correct: the Internet is indeed being used for widespread surveillance. In 2014, we will learn how bad the situation really is as ongoing revelations show the extent of the surveillance efforts — and the fact that this activity is not even remotely limited to one often-named US agency. The world is full of nosy agencies, both governmental and otherwise, and many (or most) of them have been taking advantage of technology in any way they could.
Awareness of free software as a tool against surveillance will increase, but it will also become clear that free software is not anywhere near enough. Free software provides a modicum of assurance that it is not operating contrary to its users' interests, but that assurance falls down if the software is not closely reviewed, and the sad truth is that we often have fewer eyeballs on our code than we would like to admit.
There is also the little problem that, increasingly, the hardware that our software is running on cannot be trusted. Contemporary hardware, at all levels down to that of simple memory chips, is running software that is invisible to us; that software can be subverted by any of a number of agencies. Even those who are in favor of NSA surveillance (and such people certainly exist) would do well to pause and think about just where much of that hardware and invisible firmware comes from.
Some possible good news is that progress may be made in the fight against patent trolls this year. The economic costs imposed by trolls have become so widespread and indiscriminate that cries for reform are being heard throughout the US, which is the primary venue in which these entities operate. The US Supreme Court will have an opportunity to restrict software patents this year. To believe that the problem will be solved in 2014 would be recklessly optimistic, but, with luck, the situation will be better at the end of the year than it is at the beginning.
The Debian project will resolve its init system debate early in 2014. Whatever conclusion the Technical Committee comes to will be contentious at best; this does not appear to be an issue around which a strong consensus can be formed. If the Committee can explain its reasoning well enough, the project will pick itself up and move on; most people realize that not every decision goes the way they would like. A poorly considered or overtly political decision would create far more strife, but, given the people involved, that outcome seems highly unlikely.
Predicting the actual decision is rather harder. Your editor suspects that systemd may be chosen in the end, perhaps with a decision to make some Debian-specific changes, but would be unwilling to bet more than a single beer on that outcome.
There will be significant challenges for Android in 2014. Alternative mobile platforms, including Sailfish OS, Tizen, Firefox OS, and Ubuntu, will all ship during this year; at least one of those may well prove to be a viable alternative that acquires significant market share. The heavily funded Cyanogen Inc. also has the potential to shake things up in the coming year, should it ever decide what it intends to do. Meanwhile, Google's attempts to maintain control over the platform and concentrate functionality into the proprietary "Play Services" layer will sit poorly with manufacturers and some users. Android will remain a strong and successful platform at the end of the year, but it will be operating in a more competitive market.
ChromeOS will have a good year, building on the surprising success of Chromebook systems — 21% of all notebook systems sold — in 2013. The traditional desktop can be expected to continue to fade in importance as it is pushed aside by systems that, while seemingly less capable, are able to provide useful functionality in a simple, mobile, and relatively secure manner. The downside of this trend, of course, is that it pushes users into company-controlled systems with central data storage — not necessarily a recipe for the greatest level of freedom. The creation of an alternative system that can achieve widespread adoption while more strongly protecting privacy is one of the biggest challenges our community faces.
The fundamental question of "what is a Linux distribution?" will continue to attract debate. Some people are unwilling to even see something like Android as a proper distribution, but the diversity of systems running on the Linux kernel will only increase. Lamentation over the twilight of "traditional Unix" will continue, but that will not stop the countless numbers of people who are doing ever more interesting things with Linux.
On the filesystem front, Btrfs will start seeing wider production use in 2014, finally, though users will learn to pick and choose between the various available features. XFS will see increasing use resulting from the growth in file and storage sizes, along with Red Hat's decision to use it by default in RHEL 7. 2014 may just be the year when the workhorse ext4 filesystem starts to look like a legacy system to be (eventually) left behind. Ext4 will still be the most widely deployed Linux filesystem at the end of the year, but it won't be the automatic default choice that it has been for so many years.
For years, LWN's annual predictions have included a statement to the effect that our community, along with the software it creates, would be stronger than ever come December. That prediction, at least, has always proved accurate. There is no reason to believe that things will be any different in 2014. We are looking forward to telling you all about 2014 as it happens; thanks, once again, to all of our readers who make another year of LWN possible.
Version 1.4 of the open source photo-manipulation tool Darktable was released on December 26, bringing with it several long-requested features. Some of these new additions (such as editing with masks) include new core functionality, while others might best be viewed as exploratory features. Darktable has always included a wide swath of image-manipulation and filtering options that lend themselves well to experimentation. But there are also usability improvements in 1.4, which are especially welcome to users who may find the toolbox-ful of experimental options tricky to navigate.
The 1.4 release is available for download as a source bundle and as an Ubuntu package through the project's personal package archive (PPA). Other distributions rely on community builds, so the corresponding packages may take some time to arrive; there are also Mac OS X installers provided, but no Windows builds. In keeping with the just-after-Christmas release date, the Darktable logo in the new release sports a Santa hat; no telling how well that gag will age.
The project made one other stable release (1.2) since we looked at Darktable 1.1 in December 2012. The key features in that update included a de-noising function that could be tailored to fit the profile of one's actual camera, the ability to apply several instances of each filter to an image (which enables far more complex image manipulation for most filters), and the ability to import images from Adobe Lightroom, preserving most—but not all—of the edit operations performed in Lightroom.
In a sense, the 1.4 release mirrors the 1.2 release: a handful of focused feature additions for image manipulation, an important application-compatibility feature, and a lengthy list of minor tweaks and fixes. But they all add up; Darktable has come a long way from its early days, in which it often felt like a tool with tremendous power that was impossible to decipher and use, thanks to unlabeled controls, cryptic icons and UI elements, and a penchant for hiding settings and dials in out-of-the-way corners of the interface. Darktable 1.4 is a lot closer to a standard, easy-to-discover image processor.
Darktable offers four major modes of operation, although two of those are likely to get considerably less usage on average. The emphasis is primarily on the "lighttable" mode for browsing and organizing an image collection and the "darkroom" mode for image editing. The other two options are "tethering" mode for controlling a USB-attached camera through the application and "map" mode, which simply offers a world-map view of any georeferenced images in the collection.
Darkroom mode is where the fun happens, of course. In the early days, Darktable would have been described as a "raw converter" because it focused on manipulating raw image file formats from digital cameras. Today it is a bit more general, supporting essentially every image format available, thanks to import filters from GraphicsMagick. In darkroom mode, the interface shows an image histogram in the top right corner of the window, and users can activate any of fifty (in version 1.4) different filter and effect modules. The modules range from simple operations like cropping to special effects like "bloom" (which simulates the washed-out halo effect usually seen when directly photographing a light source like a bulb or streetlight).
In practice, the main challenge in using Darktable's filter modules is keeping them straight in one's mind. Each module has its own sliders and controls; the active modules in an image are stacked on top of each other in a column underneath the histogram, so that the user must scroll up and down through the column to see them all. There is barely enough vertical space in the interface to see more than a couple of modules at a time, and there is even less room if the user opens up the palette of modules to select another one.
This interface challenge is shared by virtually all raw photo editing applications, and no one has come up with a particularly good solution to it yet. Darktable 1.4 does make an attempt at improvement in this area, though: a new preference will keep only one module "expanded" at a time; clicking on another module opens the new module and collapses the others. It isn't perfect, but neither are the alternatives.
The new showcase feature in 1.4 is support for masking off part of the image, so that a filter module only applies to part of the image. There are five mask types available: circles, ellipses, gradients, closed Bézier paths, and arbitrary shapes drawn with a brush. All are resizable and allow a fade-out radius for smooth blending—in fact, the "gradient" mask is just a fade-out projected from a straight line. Users can create as many masks as they want, and name or rename them in the "mask manager" panel (which sits on the left side of the window, so it does not steal more room from the filter module list).
The easiest way to use a mask with a filter is to create the mask first. Then the filter needs to be switched on, at which point the user can change the filter's "blend" control to "drawn mask." That setting restricts the filter to the masked area. The mask can be inverted or blurred if further customization is required.
Not every filter module can be masked, however: essentially, filters that operate on all of an image's pixels can be masked (including color, sharpness, and many of the special effects modules), but modules for other effects cannot. The un-maskable modules include controls for how the original image is imported (white balance, demosaicing algorithm, and so forth).
It is hard to overstate just how powerful masking filters are in the image-editing process. Without them, almost all effects are limited to operating on the entire picture; users who need to treat one part of the image differently from another would most likely perform part of the corrections in Darktable, then export the result for further work in another program. There are a few quirks in this implementation—for example, I could not get Darktable to allow me to rearrange existing nodes on a path mask, which means the mask needs to be drawn perfectly the first time through. But the feature is a considerable step up in Darktable's power.
Also powerful, at least in theory, is 1.4's support for scripting with Lua. At the moment, there are not many Lua scripts available for Darktable, but that should change over time. Scripting languages can be a divisive subject, of course; it seems plausible that the choice of Lua was made because Adobe Lightroom uses Lua for scripting, but there are several other open source image-processing applications that also support Lua scripts.
There are several other user-visible changes introduced in 1.4, although they are not as potentially important as masking and scripting. One of them is automatic focus detection in lighttable mode. When there are multiple shots of a particular subject, it is usually important to find the sharpest image; the focus detection feature will highlight sharp zones in an image in red, and blurry zones in blue, so the user can quickly narrow down the choice of which shots to work on.
There are also three new filter modules: "contrast/brightness/saturation," "color balance," and "color mapping." The first of those is a straightforward set of sliders for the image attributes in its name, while the second two require a bit more explanation. "Color balance" allows the user to manipulate the image with gamma and gain controls, while "color mapping" allows the user to swap colors in the image, with a considerable level of control.
Most color manipulations can be arrived at in multiple ways, of course; what makes any one filter module useful is whether it is intuitive to use or simplifies a specific operation. On that front, "color balance" and "color mapping" might take some getting used to. Fortunately, as is usually the case, Darktable provides several sensible defaults to help users get started.
A new feature that might require a bit more experimentation to get the hang of is Darktable's "waveform" mode for the image histogram. Usually, an image histogram graphs the number of pixels from an image on a dark-to-light axis, so that a balanced image will not show a graph that skews significantly to the dark end or the light end. The same is true for color histograms, which chart the amount of red, green, and blue: a balanced image shows all three colors in roughly equal proportions.
The new waveform histogram mode is a different beast entirely: its x-axis corresponds to horizontal position in the image itself, the y-axis shows brightness, and the intensity of each point on the graph indicates how many pixels at that horizontal position share that brightness value. Confused? Not to worry; you are not alone. This is certainly an experimental way to examine an image; with some playing around it is possible to get a feel for how the waveform plots change in response to different images and different filters. But it is hard to say so far how useful it will prove in practice.
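For readers who want a more concrete picture, the idea behind a waveform histogram can be sketched in a few lines. This is an illustration of the concept only, not Darktable's actual code; it uses NumPy and a synthetic grayscale image:

```python
import numpy as np

def waveform(image, levels=256):
    """Compute a waveform histogram for a 2-D grayscale image.

    The result has one column per image column; each column is a
    histogram of that column's pixel brightnesses.  Plotted with
    brightness on the vertical axis, a bright spot on the graph marks
    a brightness value shared by many pixels at that horizontal
    position in the image.
    """
    height, width = image.shape
    wf = np.zeros((levels, width), dtype=np.int32)
    for col in range(width):
        hist, _ = np.histogram(image[:, col], bins=levels, range=(0, levels))
        wf[:, col] = hist
    return wf

# Synthetic example: a 48x64 left-to-right gradient image.  Every pixel
# in a given column has the same brightness, so each waveform column
# contains a single spike of height 48.
img = np.tile(np.linspace(0, 255, 64), (48, 1)).astype(np.uint8)
wf = waveform(img)
```

A real photograph produces a far more interesting plot, of course, but the gradient case makes the mapping between image position and graph position easy to see.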
Last but certainly not least, Darktable 1.4 includes some new utility functions, such as the ability to analyze a sample image to create a "base curve" for a camera (i.e., to profile the specific camera unit's output), and the ability to query the operating system's color management configuration.
For casual photographers, Darktable is one of the best options for creative image adjustment because its stable of filter modules offers decent presets and defaults, and because it renders the results to screen rapidly. In contrast, an application like Rawstudio can perform many of the same effects, but the user needs to play around with individual controls to achieve them. For straightforward color correction, though, Darktable's lengthy list of modules may prove intimidating, and the interface—although it has made many positive strides in recent years—can still be confusing to navigate. For demanding photographers, however, Darktable 1.4 is sure to be a favorite on the strength of masking support alone. The rest is simply gravy.
Having more data is almost always a plus—in theory. But out in the real world, data sets are messy, inconsistent, and incomplete. Correlating multiple data sources introduces a whole new set of problems: matching up fields and keys that are labeled differently, use different units, or have other incompatibilities. These are the issues that OpenRefine sets out to address. It is a tool for sanitizing and standardizing data sets, hopefully through a faster and easier process than a full manual clean up would require. But it also allows the user to explore the data along the way, potentially providing some insights that should be followed up with a more detailed data-mining or visualization program.
OpenRefine was initially a Google product called (naturally) Google Refine. The company stopped working on the code in October 2012, however, so it was re-branded and re-launched as a standalone open source project a short time later. The application is a self-contained Java program that runs on a local HTTP port through the Jetty web server. Users access the front end through their browser, but nothing ever leaves the local machine—a fact that the project emphasizes several times in the documentation and tutorials. One key reason for this concern is that Google Refine / OpenRefine was initially positioned for use in "data-driven journalism," which often involves downloading government or commercial data sets and then hunting through them for patterns. Keeping potentially sensitive projects private is obviously a recurring concern for that use case.
On one hand, the focus on data-driven journalism probably does not matter to most other users; the tool set works just as well for any type of data, from sensor readings to web server logs. On the other hand, the journalism angle does drive some of the application's most useful tools, which are designed to quickly cut through the problems commonly encountered in public data sets. This includes a lot of problems in text fields: inconsistent labels, spacing, the use of acronyms in one place but not in another, and so on.
If you are (for example) simply logging the temperature and electricity usage in your house to look for inefficiencies, you probably care far more about numeric fields, but the text manipulation tools could prove invaluable for other tasks, like exploring wordy log files.
I put OpenRefine to the test against several categories of data: sensor readings from a health tracker, a state government data set listing school district performance ratings, and the backend logs from a MythTV server. In short, the messier the original data and the larger the data set, the more valuable pre-processing it with OpenRefine will be. To get started, you can download compressed tarballs from the OpenRefine web site; at the moment two versions are available: Google Refine 2.5 (the last release branded as such), and a beta build of the upcoming OpenRefine 2.6 release. 2.6 does not introduce much in the way of new features; apart from the re-branding work the majority of the changes are bugfixes. In both cases, you unpack the archive and run the ./refine script inside to launch the program.
The workflow consists of loading in a data set, transforming and sanitizing it as much as required, then exporting the result as another file. Along the way, one can do some exploration of the data itself, but generating publication-ready graphs, maps, or other output is out of scope. OpenRefine can import many different file types: comma-, tab-, or space-delimited text, spreadsheets (including ODF and Microsoft Excel), plus RDF, XML, JSON, and more. It can also retrieve files remotely by URL, which makes it useful for regularly-updated data publications.
It cannot, though, read from an actual database—the expectation, after all, is that the data needs massaging and correcting; a proper database with well-defined columns and records is hopefully unlikely to need such work, and if it does, working on it with proper database tools is probably better.
The sanitization process begins with the import itself; OpenRefine will do its best to auto-discover column delimiters, header text, and other structural information, but before it even completes the import, it shows the user a preview of the first few rows so that corrections can be made. The use of limited previews is a recurring theme; the operations OpenRefine can use to transform the data are executed on the full data set, but only a subset is ever shown on screen.
The primary data-cleaning process is done, after import, in the Facet/Filter screen. "Facets" are akin to smart "views" into a data column. You activate a facet on a column by clicking the column header. The facet type chosen (text, numeric, timeline, or scatterplot) triggers a quick analysis of the column for commonly-needed corrections. Activating a numeric facet brings up a bar chart showing the range of values seen and highlighting problematic entries (such as undefined "NaN"s or empty cells).
Activating a text facet brings up a list of values that can be further refined by automatically detecting what seem to be similar values: in my MythTV log file test, for example, there were quite a few lines that had extraneous spaces surrounding some fields, which OpenRefine could repair with a single click. In government data sets, a lot of the online documentation notes that inconsistent acronym usage, stray HTML character-code entries, and misspellings are common problems that OpenRefine facets can make short work of.
Apart from flagging fields for correction, facets can also be used to narrow down the portion of the data set examined. For example, sliding the handles on either side of a numeric facet's bar chart will winnow down the range of entries shown in the user interface. You could export a subset of the data quite easily at that point, but narrowing down to a sub-range can also reveal further problems that need correction. For example, a few outliers that are abnormally high numbers might simply be expressed in the wrong units (e.g., $1999 instead of $19.99); by zooming in with the facet, you can click on the "change" button and apply a transformation. In this example, a simple value/100 is all that is required for the fix, but there is actually a fairly comprehensive expression language defined with string, array, math, and date functions, control structures, and variables.
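The logic of that kind of fix is simple enough to express anywhere; in OpenRefine itself it is a one-line expression applied to the faceted rows. As a sketch in Python (the column values, threshold, and scale factor here are invented for illustration):

```python
def fix_units(value, threshold=100.0, scale=100.0):
    """Scale down outliers that appear to have been entered in the
    wrong units -- e.g., cents recorded in a dollars column."""
    return value / scale if value >= threshold else value

# A price column where two entries were typed in cents.
prices = [19.99, 24.50, 1999.0, 18.75]
cleaned = [fix_units(p) for p in prices]
```

The point of doing this inside OpenRefine rather than in a script is the faceting step: the facet isolates the suspicious rows first, so the transformation only ever touches the cells that actually look wrong.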
"Filters" are the other main option for data massaging, although the line between a filter and a facet can be a bit hazy. As one might expect, cell contents can be filtered with regular expressions or through the built-in expression language. The difference is that activating a filter on a column does not analyze the data set for the common problems that a facet offers quick corrections for.
Facets and filters are both transient changes to the current on-screen view of the data; when activated they pop up as a box on the left-hand side of the window, and they can be closed to restore the unaltered view on the data. Their primary purpose is to isolate subsets of the whole data set, in order to fix specific problem cells. But more permanent and direct editing of the data set is possible as well; a variety of cell transformations come predefined (from simple letter-case changes to parsing HTML markup), and you can re-order, split, combine, and transpose cells, rows, and columns. The cell transformation functions are "more" permanent in the sense that they alter the table's contents, but one important feature of OpenRefine is infinite undo/redo for projects; the entire operation history is preserved and can be rolled back, even across sessions.
One of the more useful cell transformations is "Add column based on this column," which allows you to write an expression (in the built-in expression language) that evaluates the cells in one column to create data for another. This is where the conditionals and control structures prove valuable; the filter and facet tools can already alter cells based on mathematical expressions, but tossing if/then constructs and branching logic into the expression crosses over into "new data" territory. Deriving new columns from the existing ones is more powerful than simply transforming existing cell content—plus, it does not overwrite cells that could be useful for other purposes.
If it seems like there are multiple ways to alter cell contents in OpenRefine (i.e., facets/filters and the transformation tools for entire columns), that is because a lot of work went into making the facet/filter functionality powerful, since it enables the user to find and repair a subset of the data. You can still transform an entire column all at once, but the ability to zero in on a subset consisting of bad or misformatted cells is a key part of the data sanitization process.
Admittedly, complex cell transformations are pushing the envelope of "data sanitization," and can easily cross over into territory that would be better handled by a data mining tool like Orange or a statistical program like R. It is a personal choice, of course, and there is nothing wrong with doing data mining in OpenRefine; most users will export their data sets (into one of the same file formats supported as input) for use in another application. But OpenRefine does not make very deep exploration easy—the statistical functions in the expression language are basic, and there is no real visualization tool (the "scatterplot" facet will do a basic X-Y graph of two columns, but that is about all).
Where OpenRefine shines is in pinpointing problems with the data set itself—problems that the set has as a document, so to speak. And, in reality, this is a huge concern. My own brief excursion to find a government data set to experiment with was 90% failure, thanks to dead links, mislabeled and unlabeled files, "databases" published as PDFs, and just about every other flaw imaginable. Then, when I found a working data set, almost every column was an unlabeled acronym filled in with single letters to which there was no key.
A tool like OpenRefine does not make repairing all of those problems trivial, but for a data set not selected at random one would hope that making sense of it would be easier. I did, for example, find it much easier to use for purging bad data from my sensor logs and for weeding out unnecessary MythTV error messages. In any case, the tools for making the repairs are easy to use, and the program does a good job of quickly analyzing and locating problem areas. It is hard to get too specific about all that OpenRefine can do without picking a real-world task and pursuing it in depth. Fortunately, there are a lot of published tutorials that do just that; they are recommended reading for anyone still unsure if OpenRefine meets their needs.
It is not clear at present just how active the OpenRefine project is; the 2.6 beta was released in August and there have been precious few commits since. The program does seem to do what most of its users want it to, but it also seems obvious that Google's withdrawal of support has left the effort with far fewer developers. That never bodes well for a project, of course, but there appear to be plenty of dedicated users (based on the mailing list and wiki traffic), and some of them are even extension writers, so the project may still find its footing as an independent community. Hopefully so, because an easy-to-use data exploration tool is a valuable thing to have at one's beck and call, regardless of the data being explored.
The dual elliptic curve deterministic random bit generator (Dual EC DRBG) cryptographic algorithm has a dubious history—it is believed to have been backdoored by the US National Security Agency (NSA)—but is mandated by the FIPS 140-2 US government cryptographic standard. That means that any cryptographic library project that is interested in getting FIPS 140-2 certified needs to implement the discredited random number algorithm. But, since certified libraries cannot change a single line—even to fix major, fatal bugs—having a non-working version of Dual EC DRBG may actually be the best defense against the backdoor. Interestingly, that is exactly where the OpenSSL project finds itself.
OpenSSL project manager Steve Marquess posted the tale to the openssl-announce mailing list on December 19. It is, he said, "an unusual bug report for an unusual situation". It turns out that the Dual EC DRBG implementation in OpenSSL is fatally flawed, to the point where using it at all will either crash or stall the program. Given that the FIPS-certified code cannot be changed without invalidating the certification, and that the bug has existed since the introduction of Dual EC DRBG into OpenSSL, it is clear that no one has actually used that algorithm from OpenSSL. It did, however, pass the testing required for the certification somehow.
It is also interesting to note that the unnamed financial sponsor of the feature adding support for Dual EC DRBG commissioned the work after the algorithm was already known to be questionable. It was part of a request to implement all of SP 800-90A, which is a suite of four DRBGs that Marquess called "more or less mandatory" for FIPS certification. At the time, the project recognized Dual EC DRBG's "dubious reputation," but it also considered OpenSSL to be a comprehensive library and toolkit: "As such it implements many algorithms of varying strength and utility, from worthless to robust." Dual EC DRBG was not even enabled by default, but it was put into the library.
The bug was discovered by Stephen Checkoway and Matt Green of the Johns Hopkins University Information Security Institute, Marquess said. Though there is a one-line patch to fix the problem included with the bug report, there are no plans to apply it. Instead, OpenSSL will be removing the Dual EC DRBG code from its next FIPS-targeted version. The US National Institute of Standards and Technology (NIST), which oversees FIPS and other government cryptography standards, has recently recommended not using Dual EC DRBG [PDF]. Since that recommendation, Dual EC DRBG has been disabled in OpenSSL anyway. Because there is essentially the same amount of testing required for fixing or removing the algorithm (for FIPS recertification), removal seems like the right course.
The problem stems from a requirement in FIPS that each block of output random numbers not match the previous block. It is, effectively, a crude test that the algorithm is actually producing random-looking data (and not repeating blocks of zeroes, for example). When there is no previous block to compare against, OpenSSL generates one that should be discarded after the comparison. But the Dual EC DRBG implementation botched the discard operation by not updating the state correctly.
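The shape of that test, and of the bug, can be sketched in a few lines of Python. This is a simplified illustration of the FIPS 140-2 continuous-test logic, not the actual OpenSSL code:

```python
import os

class ContinuousTestDRBG:
    """A random generator wrapped in the FIPS 140-2 continuous
    self-test: each output block must differ from the previous one."""

    def __init__(self, blocksize=16):
        self.blocksize = blocksize
        self.last = None  # no previous block yet

    def _raw_block(self):
        # Stand-in for the underlying DRBG's output function.
        return os.urandom(self.blocksize)

    def generate(self):
        if self.last is None:
            # First request: generate a throwaway block that exists
            # only to be compared against.  The OpenSSL Dual EC DRBG
            # bug was, in effect, here: the generator's internal state
            # was not updated correctly when producing the discarded
            # block, so every subsequent request failed or hung.
            self.last = self._raw_block()
        block = self._raw_block()
        if block == self.last:
            raise RuntimeError("continuous test failed: repeated block")
        self.last = block
        return block
```

With a correct state update, the comparison almost never trips; with a botched one, the very first real output can collide with the throwaway block, which is consistent with the "crash or stall on first use" behavior described above.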
Dual EC DRBG was under suspicion for other reasons even before it was adopted by NIST in 2006. In 2007, Bruce Schneier raised the alarm about an NSA backdoor in the algorithm. For one thing, Dual EC DRBG is different from the other three algorithms specified in SP 800-90A in that it is three orders of magnitude slower and was added only at the behest of the NSA. It was found that the elliptic curve constants chosen by NIST (with unspecified provenance) could be combined with another set of numbers—not generally known, except possibly by the NSA—to predict the output of the random number generator after observing 32 bytes of its output. Those secret numbers could have been generated at the same time the EC constants were, but it is unknown whether they actually were.
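The mechanism behind that prediction attack can be sketched abstractly. This follows the well-known 2007 presentation by Dan Shumow and Niels Ferguson, not anything in the OpenSSL code; P and Q are the standardized curve points and x(·) denotes the x-coordinate of a point:

```latex
% Dual EC DRBG, simplified: internal state s_i, fixed curve points P, Q.
\begin{align*}
  s_{i+1} &= x(s_i \cdot P)      && \text{(state update)} \\
  r_i     &= x(s_{i+1} \cdot Q)  && \text{(output; top 16 bits dropped)}
\end{align*}
% If the constants were chosen so that P = d \cdot Q for some secret d,
% an observer who sees r_i can reconstruct a point R with x(R) = r_i
% (trying the 2^16 possibilities for the dropped bits), so that
% R = s_{i+1} \cdot Q, and then compute
\[
  x(d \cdot R) \;=\; x(d \cdot s_{i+1} \cdot Q)
               \;=\; x(s_{i+1} \cdot P) \;=\; s_{i+2},
\]
% recovering the generator's entire future state -- and thus all of its
% future output -- from roughly 32 bytes of observed output.
```

Without knowledge of d, no such shortcut is available; the suspicion has always been about who, if anyone, holds that value.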
The NIST standards were a bit unclear about whether the EC constants were required, but Marquess noted that the testing lab required using the constants (aka "points").
So, what we have here is a likely backdoored algorithm that almost no one used (evidently unless they were paid $10 million) added to an open-source cryptography library funded by money from an unnamed third party. After "rigorous" testing, that code was certified as conforming to a US government cryptographic standard, but it never actually worked at all. According to Marquess: "Frankly the FIPS 140-2 validation testing isn't very useful for catching 'real world' problems."
It is almost comical (except to RSA's BSafe customers, anyway), but it does highlight some fundamental problems in the US (and probably other) government certification process. Not finding this bug is one thing, but not being able to fix it (or, more importantly, being unable to fix a problem in an actually useful cryptographic algorithm) without spending lots of time and money on recertification seems entirely broken. The ham-fisted way that the NSA went about putting the backdoor into the standard is also nearly amusing. If all its attempts were similarly obvious and noisy, we wouldn't have much to worry about—unfortunately that seems unlikely to be the case.
One other thing to possibly consider: did someone on the OpenSSL project "backdoor" the Dual EC DRBG implementation such that it could never work, but would pass the certification tests? Given what was known about the algorithm and how unlikely it was that it would ever be used by anyone with any cryptographic savvy, it may have seemed like a nice safeguard to effectively disable the backdoor. Perhaps that is far-fetched, but one can certainly imagine a developer being irritated by having to implement the NSA's broken random number generator—and doing something about it. Either way, we will probably never really know for sure.
If there are any other skeletons in the closet, it’s probably a good time to air them out before we find out there’s other things you repeatedly did not disclose. Look on the bright side: can it really be any worse than that time you had to replace every single freakin’ token in the world?
All of these serious terrorism cases argue not for the gathering of ever vaster troves of information but simply for a better understanding of the information the government has already collected and that are derived from conventional law enforcement and intelligence methods.
Created: December 27, 2013; Updated: January 1, 2014
From the openSUSE advisory:
On systems installed via the Live Media, the /etc/shadow file was readable by the "users" group, which was not intended. (bnc#843230, CVE-2013-3713)
The reason for this was that the user "root" was put into the "users" group.
Created: December 20, 2013; Updated: January 28, 2014
From the Red Hat bug report:
A flaw was found in the way ack, a tool similar to grep, processed .ackrc files. If a local user ran ack in an attacker-controlled directory, it would lead to arbitrary code execution with the privileges of the user running ack. This issue affects versions 2.00 to 2.10 (such as the version in Fedora 19), and should be fixed in version 2.12. It does not affect versions below 2.00 (such as those in EPEL).
Created: December 23, 2013; Updated: January 8, 2014
Description: From the Mageia advisory:
Buffer overflow in the unpacksms16 function in apps/app_sms.c in Asterisk Open Source 1.8.x before 188.8.131.52, 10.x before 10.12.4, and 11.x before 11.6.1; Asterisk with Digiumphones 10.x-digiumphones before 10.12.4-digiumphones; and Certified Asterisk 1.8.x before 1.8.15-cert4 and 11.x before 11.2-cert3 allows remote attackers to cause a denial of service (daemon crash) via a 16-bit SMS message.
Created: December 27, 2013; Updated: January 1, 2014
From the Red Hat bugzilla entry:
Multiple stack overflow flaws were found in the way the XML parser of boinc-client, a Berkeley Open Infrastructure for Network Computing (BOINC) client for distributed computing, processed certain XML files. A rogue BOINC server could provide a specially crafted XML file that, when processed, would cause the boinc-client executable to crash.
Created: December 23, 2013; Updated: January 5, 2015
Description: From the Debian advisory:
Helmut Grohne discovered that denyhosts, a tool preventing SSH brute-force attacks, could be used to perform remote denial of service against the SSH daemon. Incorrectly specified regular expressions used to detect brute force attacks in authentication logs could be exploited by a malicious user to forge crafted login names in order to make denyhosts ban arbitrary IP addresses.
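The general shape of this attack is easy to demonstrate with a naive log parser. The pattern below is a hypothetical simplification for illustration, not the actual denyhosts expression, but it shows how an attacker-controlled login name containing " from address" can shift which token the parser decides to ban:

```python
import re

# Hypothetical, simplified brute-force-detection pattern; the real
# denyhosts regular expressions are more involved.
PATTERN = re.compile(r"Invalid user (\S+) from (\S+)")

def banned_address(log_line: str):
    """Return the address a naive parser would decide to ban."""
    m = PATTERN.search(log_line)
    return m.group(2) if m else None

# An honest failed login is attributed to the real client address:
honest = "sshd[123]: Invalid user bob from 203.0.113.7"

# But the login name is attacker-controlled.  Attempting to log in as
# "bob from 198.51.100.1" forges an arbitrary address into the log:
forged = "sshd[123]: Invalid user bob from 198.51.100.1 from 203.0.113.7"
```

For the honest line the parser bans the attacker's real address, but for the forged line it bans 198.51.100.1, an address of the attacker's choosing.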
Created: December 23, 2013; Updated: January 1, 2014
Description: From the CVE entry:
The get_main_source_dir function in scripts/uscan.pl in devscripts before 2.13.8, when using USCAN_EXCLUSION, allows remote attackers to execute arbitrary commands via shell metacharacters in a directory name.
Package(s): eucalyptus; CVE #(s): CVE-2012-4067 CVE-2013-2296
Created: January 1, 2014; Updated: January 1, 2014
Description: Eucalyptus contains two vulnerabilities in the "Walrus" object store. An XML parsing problem (CVE-2012-4067, ESA-09) can enable unspecified denial of service attacks, while a missing authentication step (CVE-2013-2296, ESA-10) could allow unauthorized access to the internal bucket logs.
Created: December 20, 2013; Updated: April 4, 2014
From the Ubuntu advisory:
Chris Chapman discovered cross-site scripting (XSS) vulnerabilities in Horizon via the Volumes and Network Topology pages. An authenticated attacker could exploit these to conduct stored cross-site scripting (XSS) attacks against users viewing these pages in order to modify the contents or steal confidential data within the same domain.
Created: December 20, 2013; Updated: April 7, 2014
From the Ubuntu advisory:
Steven Hardy discovered that Keystone did not properly enforce trusts when using the ec2tokens API. An authenticated attacker could exploit this to retrieve a token not scoped to the trust and elevate privileges to the trustor's roles.
Created: December 30, 2013; Updated: September 24, 2014
Description: From the Red Hat bugzilla:
Libgadu, an open library for communicating over the Gadu-Gadu instant messaging protocol, was found to be missing SSL certificate validation. libgadu uses the OpenSSL library to create secure connections; a program using OpenSSL performs the SSL handshake by invoking the SSL_connect function. Some certificate validation errors are signaled through the return value of SSL_connect, while for other errors SSL_connect returns OK but sets internal "verify result" flags. The application must call the SSL_get_verify_result function to check whether any such errors occurred. This check appears to be missing in libgadu, so a man-in-the-middle attack is possible, defeating all of the SSL protection.
Upstream suggested that this was a conscious decision: since libgadu is a reverse-engineered implementation of a proprietary protocol, its developers had no control over the certificates used for SSL connections, so they would add a note to the documentation about this instead.
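The distinction between "handshake succeeded" and "certificate verified" also exists in Python's ssl module. As a rough analogy (this is not libgadu code), a client context can complete the TLS handshake with verification switched off entirely:

```python
import ssl

# Default client context: verifies the certificate chain and hostname,
# which is what callers of libgadu would have needed.
secure = ssl.create_default_context()
assert secure.verify_mode == ssl.CERT_REQUIRED
assert secure.check_hostname

# A context behaving like the libgadu code described above: the TLS
# handshake will succeed, but the peer certificate is never validated,
# so a man in the middle can present any certificate at all.
insecure = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
insecure.check_hostname = False   # must be disabled before verify_mode
insecure.verify_mode = ssl.CERT_NONE
```

The order of the last two assignments matters: Python refuses to set CERT_NONE while hostname checking is still enabled.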
Created: December 23, 2013; Updated: January 1, 2014
Description: From the Red Hat bugzilla:
As noted in bug #1031818, libreswan suffers from a problem with the new ike_pad= feature that was implemented in version 3.6:
During an effort to ignore IKEv2 minor version numbers as required for RFC-5996, complete parse errors of any IKE packets with version 2.1+ were mistakenly accepted for further processing. This causes a crash later on if the IKE packet is mangled (e.g. too short). Openswan turns out not to be vulnerable because it happens to abort on the mismatched IKE length versus packet length before it inspects the rest of the IKE header. And since reading an invalid IKE major aborts further parsing of the IKE header, the length remains at 0, and so it will always mismatch.
Package(s): memcached; CVE #(s): CVE-2013-7239 CVE-2013-0179
Created: January 1, 2014; Updated: February 3, 2014
Description: From the Debian advisory:
CVE-2011-4971: Stefan Bucur reported that memcached could be caused to crash by sending a specially crafted packet.
CVE-2013-7239: It was reported that SASL authentication could be bypassed due to a flaw related to the management of the SASL authentication state. With a specially crafted request, a remote attacker may be able to authenticate with invalid SASL credentials.
Package(s): openssl; CVE #(s): CVE-2013-6450 CVE-2013-6449
Created: January 1, 2014; Updated: December 29, 2014
Description: From the Debian advisory:
Multiple security issues have been fixed in OpenSSL: the TLS 1.2 support was susceptible to denial of service, and a flaw in the retransmission of DTLS messages was fixed. In addition, this update disables the insecure Dual_EC_DRBG algorithm (which was unused anyway; see http://marc.info/?l=openssl-announce&m=13874711982232... for further information) and no longer uses the RdRand feature available on some Intel CPUs as the sole source of entropy unless explicitly requested.
Created: December 23, 2013; Updated: January 6, 2014
Description: From the Red Hat bugzilla:
A flaw was reported in OpenSSL 1.0.1e that can cause applications using OpenSSL to crash when using TLS version 1.2. The issue was reported via an OpenSSL upstream ticket.
Created: December 30, 2013; Updated: January 27, 2014
Description: From the Red Hat bugzilla:
It was reported that perl-Proc-Daemon, when instructed to write a pid file, does that with a umask set to 0, so the pid file ends up with mode 666. This might be a security issue.
Created: January 1, 2014; Updated: February 20, 2014
Description: From the Debian advisory:
An unsafe use of temporary files was discovered in Puppet, a tool for centralized configuration management. An attacker can exploit this vulnerability and overwrite an arbitrary file in the system.
Created: January 1, 2014; Updated: March 30, 2015
Description: From the Red Hat bugzilla:
A security flaw was found in the way Python Setuptools, a collection of enhancements to the Python distutils module that makes it easier to build and distribute Python packages, performed integrity checks when loading external resources previously extracted from zipped Python Egg archives: if the timestamp and file size of a particular resource expanded from the archive matched the original values, the resource was loaded. A local attacker with write permission to Python's egg cache directory could use this flaw to provide a specially crafted resource (in expanded form) that, when loaded by an application requiring that resource, would lead to arbitrary code execution with the privileges of the user running the application.
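Why a size-plus-timestamp comparison is so weak is easy to demonstrate in isolation. The helper below is a generic sketch of that style of check, not the setuptools code; anyone who can write the file can make a tampered copy pass it:

```python
import os

def looks_unmodified(path: str, expected_size: int, expected_mtime: float) -> bool:
    """Weak integrity check of the kind described above: compare only
    the file's size and modification time against recorded values."""
    st = os.stat(path)
    return st.st_size == expected_size and int(st.st_mtime) == int(expected_mtime)
```

An attacker simply overwrites the file with same-length content and restores the timestamp with os.utime(); the check still reports the resource as unmodified.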
Created: December 23, 2013; Updated: March 27, 2014
Description: From the CVE entry:
Multiple format string vulnerabilities in log_subscriber.rb files in the log subscriber component in Action Mailer in Ruby on Rails 3.x before 3.2.15 allow remote attackers to cause a denial of service via a crafted e-mail address that is improperly handled during construction of a log message.
Created: December 23, 2013; Updated: January 21, 2014
Description: From the CVE entry:
Cross-site scripting (XSS) vulnerability in exceptions.rb in the i18n gem before 0.6.6 for Ruby allows remote attackers to inject arbitrary web script or HTML via a crafted I18n::MissingTranslationData.new call.
Package(s): wireshark; CVE #(s): CVE-2013-7113 CVE-2013-7114
Created: December 20, 2013; Updated: January 6, 2014
From the CVE entries:
CVE-2013-7113 - epan/dissectors/packet-bssgp.c in the BSSGP dissector in Wireshark 1.10.x before 1.10.4 incorrectly relies on a global variable, which allows remote attackers to cause a denial of service (application crash) via a crafted packet.
CVE-2013-7114 - Multiple buffer overflows in the create_ntlmssp_v2_key function in epan/dissectors/packet-ntlmssp.c in the NTLMSSP v2 dissector in Wireshark 1.8.x before 1.8.12 and 1.10.x before 1.10.4 allow remote attackers to cause a denial of service (application crash) via a long domain name in a packet.
Created: December 23, 2013; Updated: January 1, 2014
Description: From the CVE entry:
Xen 4.2.x and 4.3.x, when using Intel VT-d and a PCI device has been assigned, does not clear the flag that suppresses IOMMU TLB flushes when unspecified errors occur, which causes the TLB entries to not be flushed and allows local guest administrators to cause a denial of service (host crash) or gain privileges via unspecified vectors.
Page editor: Jake Edge
Brief items

3.13-rc6 was released on December 29. Previously, 3.13-rc5 was released on December 22. "This might also be a good time to say that even _if_ things continue to calm down, I think we'll be going to at least -rc8 regardless, since LCA is fairly early this year, and I won't be opening the merge window for 3.14 until after I'm back from those travels."
No end host should have rp_filter on. It unnecessarily makes our routing lookups much more expensive for zero gain on an end host. But people convinced the distributions that turning it on everywhere by default was a good idea and it stuck.
Kernel development news

Btrfs comes with its own RAID mechanism built in. This article will delve into how this feature works and how to make use of it.
There are two fundamental reasons to want to spread a single filesystem across multiple physical devices: increased capacity and greater reliability. In some configurations, RAID can also offer improved throughput for certain types of workloads, though throughput tends to be a secondary consideration. RAID arrays can be arranged into a number of configurations ("levels") that offer varying trade-offs between these parameters. Btrfs does not support all of the available RAID levels, but it does have support for the levels that most people actually want to use.
RAID 0 ("striping") can be thought of as a way of concatenating multiple physical disks together into a single, larger virtual drive. A strict striping implementation distributes data across the drives in a well-defined set of "stripes"; as a result, all of the drives must be the same size, and the total capacity is simply the product of the number of drives and the capacity of any individual drive in the array. Btrfs can be a bit more flexible than this, though, supporting a concatenation mode (called "single") which can work with unequally sized drives. In theory, any number of drives can be combined into a RAID 0 or "single" array.
RAID 1 ("mirroring") trades off capacity for reliability; in a RAID 1 array, two drives (of the same size) store identical copies of all data. The failure of a single drive can kill an entire RAID 0 array, but a RAID 1 array will lose no data in that situation. RAID 1 arrays will be slower for write-heavy use, since all data must be written twice, but they can be faster for read-heavy workloads, since any given read can be satisfied by either drive in the array.
RAID 10 is a simple combination of RAID 0 and RAID 1; at least two pairs of drives are organized into independent RAID 1 mirrored arrays, then data is striped across those pairs.
RAID 2, RAID 3, and RAID 4 are not heavily used, and they are not supported by Btrfs. RAID 5 can be thought of as a collection of striped drives with a parity drive added on (in reality, the parity data is usually distributed across all drives). A RAID 5 array with N drives has the storage capacity of a striped array with N-1 drives, but it can also survive the failure of any single drive in the array. RAID 6 uses a second parity drive, increasing the amount of space lost to parity blocks but adding the ability to lose two drives simultaneously without losing any data. A RAID 5 array must have at least three drives to make sense, while RAID 6 needs four drives. Both RAID 5 and RAID 6 are supported by Btrfs.
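The capacity trade-offs described above come down to simple arithmetic. The sketch below assumes idealized arrays of equal-size drives; real Btrfs allocation is chunk-based and will deviate somewhat from these figures:

```python
def usable_capacity(level: str, drives: int, size_gib: float) -> float:
    """Idealized usable capacity of an array of equally sized drives.
    A rough sketch of the classic RAID arithmetic, not a model of
    Btrfs's chunk-based allocator."""
    if level == "raid0":   # striping: all space is usable
        return drives * size_gib
    if level == "raid1":   # mirroring: two copies of everything
        if drives != 2:
            raise ValueError("RAID 1 needs exactly two drives")
        return size_gib
    if level == "raid10":  # striping across mirrored pairs
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number (>= 4) of drives")
        return drives // 2 * size_gib
    if level == "raid5":   # one drive's worth of parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * size_gib
    if level == "raid6":   # two drives' worth of parity
        if drives < 4:
            raise ValueError("RAID 6 needs at least four drives")
        return (drives - 2) * size_gib
    raise ValueError(f"unknown level {level!r}")
```

For example, three 100GiB drives yield 300GiB in RAID 0 but only 200GiB in RAID 5, with the difference going to parity blocks.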
One other noteworthy point is that Btrfs goes out of its way to treat metadata differently than file data. A loss of metadata can threaten the entire filesystem, while the loss of file data affects only that one file — a lower-cost, if still highly undesirable, failure. Metadata is usually stored in duplicate form in Btrfs filesystems, even when a single drive is in use. But the administrator can explicitly configure how data and metadata are stored on any given array, and the two can be configured differently: data might be simply striped in a RAID 0 configuration, for example, while metadata is stored in RAID 5 mode in the same filesystem. And, for added fun, these parameters can be changed on the fly.
Earlier in this series, we used mkfs.btrfs to create a simple Btrfs filesystem. A more complete version of this command for the creation of multiple-device arrays looks like this:
mkfs.btrfs -d mode -m mode dev1 dev2 ...
This command will group the given devices together into a single array and build a filesystem on that array. The -d option describes how data will be stored on that array; it can be single, raid0, raid1, raid10, raid5, or raid6. The placement of metadata, instead, is controlled with -m; in addition to the modes available for -d, it supports dup (metadata is stored twice somewhere in the filesystem). The storage modes for data and metadata are not required to be the same.
So, for example, a simple striped array with two drives could be created with:
mkfs.btrfs -d raid0 /dev/sdb1 /dev/sdc1
Here, we have specified striping for the data; the default for metadata will be dup. This filesystem is mounted with the mount command as usual. Either /dev/sdb1 or /dev/sdc1 can be specified as the drive containing the filesystem; Btrfs will find all other drives in the array automatically.
The df command will only list the first drive in the array. So, for example, a two-drive RAID 0 filesystem with a bit of data on it looks like this:
# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       274G   30G  241G  11% /mnt
More information can be had with the btrfs command:
root@dt:~# btrfs filesystem show /mnt
Label: none  uuid: 4714fca3-bfcb-4130-ad2f-f560f2e12f8e
	Total devices 2 FS bytes used 27.75GiB
	devid    1 size 136.72GiB used 17.03GiB path /dev/sdb1
	devid    2 size 136.72GiB used 17.01GiB path /dev/sdc1
(Subcommands to btrfs can be abbreviated, so one could type "fi" instead of "filesystem", but full commands will be used here). This output shows the data split evenly across the two physical devices; the total space consumed (17GiB on each device) somewhat exceeds the size of the stored data. That shows a commonly encountered characteristic of Btrfs: the amount of free space shown by a command like df is almost certainly not the amount of data that can actually be stored on the drive. Here we are seeing the added cost of duplicated metadata, among other things; as we will see below, the discrepancy between the available space shown by df and reality is even greater for some of the other storage modes.
Naturally, no matter how large a particular filesystem is when the administrator sets it up, it will prove too small in the long run. That is simply one of the universal truths of system administration. Happily, Btrfs makes it easy to respond to a situation like that; adding another drive (call it "/dev/sdd1") to the array described above is a simple matter of:
# btrfs device add /dev/sdd1 /mnt
Note that this addition can be done while the filesystem is live — no downtime required. Querying the state of the updated filesystem reveals:
# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       411G   30G  361G   8% /mnt
# btrfs filesystem show /mnt
Label: none  uuid: 4714fca3-bfcb-4130-ad2f-f560f2e12f8e
	Total devices 3 FS bytes used 27.75GiB
	devid    1 size 136.72GiB used 17.03GiB path /dev/sdb1
	devid    2 size 136.72GiB used 17.01GiB path /dev/sdc1
	devid    3 size 136.72GiB used 0.00 path /dev/sdd1
The filesystem has been expanded with the addition of the new space, but there is no space consumed on the new drive. It is, thus, not a truly striped filesystem at this point, though the difference can be hard to tell. New data copied into the filesystem will be striped across all three drives, so the amount of used space will remain unbalanced unless explicit action is taken. To balance out the filesystem, run:
# btrfs balance start -d -m /mnt
Done, had to relocate 23 out of 23 chunks
The flags say to balance both data and metadata across the array. A balance operation involves moving a lot of data between drives, so it can take some time to complete; it will also slow access to the filesystem. There are subcommands to pause, resume, and cancel the operation if need be. Once it is complete, the picture of the filesystem looks a little different:
# btrfs filesystem show /mnt
Label: none  uuid: 4714fca3-bfcb-4130-ad2f-f560f2e12f8e
	Total devices 3 FS bytes used 27.78GiB
	devid    1 size 136.72GiB used 10.03GiB path /dev/sdb1
	devid    2 size 136.72GiB used 10.03GiB path /dev/sdc1
	devid    3 size 136.72GiB used 11.00GiB path /dev/sdd1
The data has now been balanced (approximately) equally across the three drives in the array.
Devices can also be removed from an array with a command like:
# btrfs device delete /dev/sdb1 /mnt
Before the device can actually be removed, it is, of course, necessary to relocate any data stored on that device. So this command, too, can take a long time to run; unlike the balance command, device delete offers no way to pause and restart the operation. Needless to say, the command will not succeed if there is not sufficient space on the remaining drives to hold the data from the outgoing drive. It will also fail if removing the device would cause the array to fall below the minimum number of drives for the RAID level of the filesystem; a RAID 0 filesystem cannot be left with a single drive, for example.
Note that any drive can be removed from an array; there is no "primary" drive that must remain. So, for example, a series of add and delete operations could be used to move a Btrfs filesystem to an entirely new set of physical drives with no downtime.
The management of the other RAID levels is similar to RAID 0. To create a mirrored array, for example, one could run:
mkfs.btrfs -d raid1 -m raid1 /dev/sdb1 /dev/sdc1
With this setup, both data and metadata will be mirrored across both drives. Exactly two drives are required for RAID 1 arrays; these arrays, once again, can look a little confusing to tools like df:
# du -sh /mnt
28G	/mnt
# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       280G   56G  215G  21% /mnt
Here, df shows 56GB of space taken, while du swears that only half that much data is actually stored there. The listed size of the filesystem is also wrong, in that it shows the total space, not taking into account that every block will be stored twice; a user who attempts to store that much data in the array will be sorely disappointed. Once again, more detailed and correct information can be had with:
# btrfs filesystem show /mnt
Label: none  uuid: e7e9d7bd-5151-45ab-96c9-e748e2c3ee3b
	Total devices 2 FS bytes used 27.76GiB
	devid    1 size 136.72GiB used 30.03GiB path /dev/sdb1
	devid    2 size 142.31GiB used 30.01GiB path /dev/sdc1
Here we see the full data (plus some overhead) stored on each drive.
A RAID 10 array can be created with the raid10 profile; this type of array requires an even number of drives, with four drives at a minimum. Drives can be added to — or removed from — an active RAID 10 array, but, again, only in pairs. RAID 5 arrays can be created from any number of drives with a minimum of three; RAID 6 needs a minimum of four drives. These arrays, too, can handle the addition and removal of drives while they are mounted.
Imagine for a moment that a three-device RAID 0 array has been created and populated with a bit of data:
# mkfs.btrfs -d raid0 -m raid0 /dev/sdb1 /dev/sdc1 /dev/sdd1
# mount /dev/sdb1 /mnt
# cp -r /random-data /mnt
At this point, the state of the array looks somewhat like this:
# btrfs filesystem show /mnt
Label: none  uuid: 6ca4e92a-566b-486c-a3ce-943700684bea
	Total devices 3 FS bytes used 6.57GiB
	devid    1 size 136.72GiB used 4.02GiB path /dev/sdb1
	devid    2 size 136.72GiB used 4.00GiB path /dev/sdc1
	devid    3 size 136.72GiB used 4.00GiB path /dev/sdd1
After suffering a routine disk disaster, the system administrator then comes to the conclusion that there is value in redundancy and that, thus, it would be much nicer if the above array used RAID 5 instead. It would be entirely possible to change the setup of this array by backing it up, creating a new filesystem in RAID 5 mode, and restoring the old contents into the new array. But the same task can be accomplished without downtime by converting the array on the fly:
# btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt
(The balance filters page on the Btrfs wiki and this patch changelog have better information on the balance command than the btrfs man page). Once again, this operation can take a long time; it involves moving a lot of data between drives and generating checksums for everything. At the end, though, the administrator will have a nicely balanced RAID 5 array without ever having had to take the filesystem offline:
# btrfs filesystem show /mnt
Label: none  uuid: 6ca4e92a-566b-486c-a3ce-943700684bea
	Total devices 3 FS bytes used 9.32GiB
	devid    1 size 136.72GiB used 7.06GiB path /dev/sdb1
	devid    2 size 136.72GiB used 7.06GiB path /dev/sdc1
	devid    3 size 136.72GiB used 7.06GiB path /dev/sdd1
Total space consumption has increased, due to the addition of the parity blocks, but otherwise users should not notice the conversion to the RAID 5 organization.
A redundant configuration does not prevent disk disasters, of course, but it does enable those disasters to be handled with a minimum of pain. Let us imagine that /dev/sdc1 in the above array starts to show signs of failure. If the administrator has a spare drive (we'll call it /dev/sde1) available, it can be swapped into the array with a command like:
btrfs replace start /dev/sdc1 /dev/sde1 /mnt
If needed, the -r flag will prevent the system from trying to read from the outgoing drive if possible. Replacement operations can be canceled, but they cannot be paused. Once the operation is complete, /dev/sdc1 will no longer be a part of the array and can be disposed of.
Should a drive fail outright, it may be necessary to mount the filesystem in the degraded mode (with the "-o degraded" flag). The dead drive can then be removed with:
btrfs device delete missing /mnt
The word "missing" is recognized as meaning a drive that is expected to be part of the array, but which is not actually present. The replacement drive can then be added with btrfs device add, probably followed by a balance operation.
The multiple-device features have been part of the Btrfs design from the early days, and, for the most part, this code has been in the mainline and relatively stable for some time. The biggest exception is the RAID 5 and RAID 6 support, which was merged for 3.9. Your editor has not seen huge numbers of problem reports for this functionality, but the fact remains that it is relatively new and there may well be a surprise or two there that users have not yet encountered.
Built-in support for RAID arrays is one of the key Btrfs features, but the list of advanced capabilities does not stop there. Another fundamental aspect of Btrfs is its support for subvolumes and snapshots; those will be discussed in the next installment in this series.

Jailhouse is a new hypervisor designed to cooperate with Linux and run bare-metal applications or modified guest operating systems. Despite this cooperation, Jailhouse is self-contained and uses Linux only to bootstrap and (later) manage itself. The hypervisor is free software released under GPLv2 by Siemens; the Jailhouse project was publicly announced in November 2013, and is in an early stage of development. Currently, Jailhouse supports 64-bit x86 systems only; ARM support is on the roadmap, though, and, given that the code is portable, we may see more architectures added to this list in the future.
Linux has many full-fledged hypervisors (including KVM and Xen), so why bother creating another one? Jailhouse is different. First of all, it is a partitioning hypervisor that is more concerned with isolation than virtualization. Jailhouse is lightweight and doesn't provide many features one traditionally expects from virtualization systems. For example, there is no support for overcommitment of resources, guests can't share a CPU because there is no scheduler, and Jailhouse can't emulate devices you don't have.
Instead, Jailhouse enables asymmetric multiprocessing (AMP) on top of an existing Linux setup and splits the system into isolated partitions called "cells." Each cell runs one guest and has a set of assigned resources (CPUs, memory regions, PCI devices) that it fully controls. The hypervisor's job is to manage cells and maintain their isolation from each other. This approach is most useful for virtualizing tasks that require full control over the CPU; examples include realtime control tasks and long-running number crunchers (high-performance computing). Besides these, it can be used for security applications: to create sandboxes, for example.
A running Jailhouse system has at least one cell known as the "Linux cell." It contains the Linux system used to initially launch the hypervisor and to control it afterward. This cell's role is somewhat similar to that of dom0 in Xen. However, the Linux cell doesn't assert full control over hardware resources as dom0 does; instead, when a new cell is created, the Linux cell cedes control over some of its CPU, device, and memory resources to that new cell. This process is called "shrinking".
Jailhouse relies on hardware-assisted virtualization features provided by the target architecture; for Intel processors (the only ones supported as of this writing) this means VT-x and VT-d support. These requirements make the hypervisor design clean, its code compact and relatively simple; the goal is to keep Jailhouse below 10,000 lines of code. Traditionally, hypervisors were either large and complex, or intentionally simple if built for the classroom. Jailhouse fits in between: it is a real product targeted at production use that is small enough to cover in a two-part article series.
The easiest way to play with Jailhouse now is to run it inside KVM with a simple bare-metal application, apic-demo.bin (provided with the Jailhouse source), as a guest. In this case, VT-d is not used since KVM doesn't emulate it (yet). The README file describes how to create this setup in detail; additional help can be found in the mailing list archives.
Running Jailhouse on real hardware is also possible, but is not very easy at this time. You will need to describe the resources available to Jailhouse (a process covered in the next section); a good starting point for this is the contents of /proc/iomem in your Linux system. This is an error-prone process, but hopefully this article will provide enough insight into how Jailhouse works internally to get it running on the hardware of your choice.
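As a starting point for that process, the /proc/iomem format is simple enough to process mechanically. The helper below is a hypothetical aid, not part of Jailhouse, that turns iomem lines into numeric ranges one could transcribe into memory-region entries:

```python
import re

# /proc/iomem lines look like "start-end : name", possibly indented
# to show nesting; this captures the hex range and the region name.
IOMEM_LINE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")

def parse_iomem(text: str):
    """Parse /proc/iomem-style text into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16),
                            m.group(3)))
    return regions
```

Feeding it the contents of /proc/iomem yields the physical ranges that a cell configuration would need to describe; which regions actually belong in a given cell is still a manual decision.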
A good introduction to Jailhouse (including slides) can be found in the initial announcement. We won't reproduce it here but rather will dive straight into the hypervisor internals.
Before it can be used to partition a real system, the Jailhouse system must be told how that system is put together. To that end, Jailhouse uses struct jailhouse_system (defined in cell-config.h) as a descriptor for the system it runs on. This structure contains three fields:
A cell descriptor starts with struct jailhouse_cell_desc, defined in cell-config.h as well. This structure contains basic information like the cell's name, size of its CPU set, the number of memory regions, IRQ lines, and PCI devices. Associated with struct jailhouse_cell_desc are several variable-sized arrays which follow immediately after it in memory; these arrays are:
Currently, Jailhouse has no human-readable configuration files. Instead, the C structures mentioned above are compiled with the "-O binary" objcopy flag to produce raw binaries rather than ELF objects, and the jailhouse user-space tool (see tools/jailhouse.c) loads them into memory in that form. Creating such descriptors is tedious work that requires extensive knowledge of the hardware architecture. There are no sanity checks for descriptors except basic validation, so you can easily create something unusable. Nothing prevents Jailhouse from using a higher-level XML (or similar) text-based configuration format in the future — it is just not implemented yet.
Another common data structure is struct per_cpu, which is architecture-specific and defined (for x86) in x86/include/asm/percpu.h. It describes a CPU that is assigned to a cell. Throughout this text, we will refer to it as cpu_data. There is one cpu_data structure for each processor Jailhouse manages, and it is stored in a per-CPU memory region called per_cpu. cpu_data contains information like the logical CPU identifier (cpu_id field), APIC identifier (apic_id), the hypervisor stack (stack[PAGE_SIZE]), a back reference to the cell this CPU belongs to (cell), a set of Linux registers (i.e. register values used when Linux moved to this CPU's cell), and the CPU mode (stopped, wait-for-SIPI, etc). It also holds the VMXON and VMCS regions required for VT-x.
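The per-CPU state described above can be pictured roughly as follows; this is a simplified sketch with assumed field types and ordering, not the real struct per_cpu from x86/include/asm/percpu.h:

```c
#include <stdint.h>

#define PAGE_SIZE 4096

struct cell;			/* opaque here; defined elsewhere in Jailhouse */

enum cpu_mode { CPU_RUNNING, CPU_STOPPED, CPU_WAIT_FOR_SIPI };

/* Simplified sketch of the per-CPU data; the real layout differs. */
struct cpu_data_sketch {
	uint8_t stack[PAGE_SIZE];	/* hypervisor stack for this CPU */
	uint32_t cpu_id;		/* logical CPU identifier */
	uint32_t apic_id;		/* APIC identifier */
	struct cell *cell;		/* back reference to the owning cell */
	uint64_t linux_regs[16];	/* register values Linux entered with */
	enum cpu_mode mode;		/* stopped, wait-for-SIPI, ... */
	/* VMXON and VMCS regions required for VT-x would follow */
};
```

Keeping the stack, the saved Linux registers, and the VT-x control regions together in one per-CPU block is what lets the entry code find everything it needs from a single pointer.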
Finally, there is struct jailhouse_header defined in header.h, which describes the hypervisor as a whole. It is located at the very beginning of the hypervisor binary image and contains information like the hypervisor entry point address, its memory size, page offset, and number of possible/online CPUs. Some fields in this structure have static values, while the loader initializes the others at Jailhouse startup.
Jailhouse operates in a physically contiguous memory region. Currently, this region must be reserved at boot using the "memmap=" kernel command-line parameter; future versions may use the contiguous memory allocator (CMA) instead. When you enable Jailhouse, the loader linearly maps this memory into the kernel's virtual address space. Its offset from the memory region's base address is stored in the page_offset field of the header. This makes converting from host virtual to physical addresses (and the reverse) trivial.
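Given the linear mapping, the conversion amounts to a single addition or subtraction. A sketch, where page_offset stands in for the header field of the same name (these helper names are illustrative, not Jailhouse's):

```c
#include <stdint.h>

/* With Jailhouse's reserved region linearly mapped, converting between
 * host virtual and physical addresses is simple arithmetic. */
static inline uint64_t jh_phys_to_virt(uint64_t phys, uint64_t page_offset)
{
	return phys + page_offset;
}

static inline uint64_t jh_virt_to_phys(uint64_t virt, uint64_t page_offset)
{
	return virt - page_offset;
}
```

A round trip through the two helpers returns the original address, which is exactly the property that makes the linear mapping convenient.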
To enable the hypervisor, Jailhouse needs to initialize its subsystems, create a Linux cell according to the system configuration, enable VT-x on each CPU, and, finally, migrate Linux into its cell to continue running in guest mode. From this point, the hypervisor asserts full control over the system's resources. As stated earlier, Jailhouse doesn't depend on Linux to provide services to guests. However, Linux is used to initialize the hypervisor and to control it later. For these tasks, the jailhouse user-space tool issues ioctl() commands to /dev/jailhouse. The jailhouse.ko module (the loader), compiled from main.c, registers this device node when it is loaded into the kernel.
To start the sequence of events described above, the jailhouse tool is used to issue a JAILHOUSE_ENABLE ioctl() which causes a call to jailhouse_enable(). It loads the hypervisor code into the reserved memory region via a request_firmware() call. Then jailhouse_enable() maps Jailhouse's reserved memory region into kernel space using ioremap() and marks its pages as executable. The hypervisor and a system configuration (struct jailhouse_system) copied from user space are laid out in the reserved region. Finally, jailhouse_enable() calls enter_hypervisor() on each CPU, passing it the header, and waits until all these calls return. After that, Jailhouse is considered enabled and the firmware is released.
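From user space, the enable sequence boils down to an ioctl() on the loader's device node. The sketch below shows the shape of that call; the ioctl number and configuration layout are placeholders, and the real code lives in tools/jailhouse.c:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Placeholder; the real request number comes from the loader's headers. */
#define JAILHOUSE_ENABLE 0

/* Illustrative sketch: hand a system configuration to the loader module. */
int enable_jailhouse(const void *system_config)
{
	int fd = open("/dev/jailhouse", O_RDWR);
	if (fd < 0) {
		perror("open /dev/jailhouse");
		return -1;
	}
	/* The loader copies the configuration from user space, pulls in the
	 * hypervisor image via request_firmware(), and runs
	 * enter_hypervisor() on each CPU before this ioctl() returns. */
	int err = ioctl(fd, JAILHOUSE_ENABLE, system_config);
	if (err)
		perror("JAILHOUSE_ENABLE");
	close(fd);
	return err;
}
```

On a system without the jailhouse.ko module loaded, the open() fails and the function returns -1; the interesting work all happens inside the kernel once the ioctl() is issued.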
enter_hypervisor() is really a thin wrapper that jumps to the entry point set in the header. The entry point is defined in hypervisor/setup.c as arch_entry, which is coded in assembler and resides in x86/entry.S. This code locates the per_cpu region for a given cpu_id, stores the Linux stack pointer and cpu_id in it, sets the Jailhouse stack, and calls the architecture-independent entry() function, passing it a pointer to cpu_data. When this function returns, the Linux stack pointer is restored.
The entry() function is what actually enables Jailhouse. It behaves slightly differently for the first CPU it initializes than for the rest of them. The first CPU is called "master"; it is responsible for system-wide initialization and checks. It sets up paging, maps config_memory if it is present in the system configuration, checks the memory regions defined in the Linux cell descriptor for alignment and access flags, initializes the APIC, creates Jailhouse's Interrupt Descriptor Table (IDT), configures x2APIC guest (VMX non-root) access (if available), and initializes the Linux cell. After that, VT-d is enabled and configured for the Linux cell. Non-master CPUs, instead, only initialize themselves.
CPU initialization is a lengthy process that begins in the cpu_init() function. For starters, the CPU is registered as a "Linux CPU": its ID is validated, and, if it is on the system CPU set, it is added to the Linux cell. The rest of the procedure is architecture-specific and continues in arch_cpu_init(). For x86, it saves the current register values in the cpu_data structure. These values will be restored on first VM entry. Then Jailhouse swaps the IDT (interrupt handlers), the Global Descriptor Table (GDT) that contains segment descriptors, and CR3 (page directory pointer) register with its own values.
Finally, arch_cpu_init() fills the cpu_data->apic_id field (see apic_cpu_init()) and configures Virtual Machine Extensions (VMX) for the CPU. This is done in vmx_cpu_init(), which first checks that the CPU provides all the required features. Then it prepares the Virtual Machine Control Structure (VMCS), which is located in cpu_data, and enables VMX on the CPU. The VMCS region is configured in vmcs_setup() to control what state the processor saves and restores on every VM entry and exit.
When all CPUs are initialized, entry() calls arch_cpu_activate_vmm(). This is the point of no return: it sets the RAX register to zero, loads the remaining general-purpose registers, and issues a VMLAUNCH instruction to enter the guest. Due to the guest register setup described earlier, and because RAX (which, by convention, stores function return values) is zero, Linux will consider the entry() call to be successful and move on as a guest.
This concludes Part 1 of the series. In Part 2, we will look at how Jailhouse handles interrupts, and what needs to be done to create a cell and to disable the hypervisor.
Page editor: Jonathan Corbet
Distributions, like all software (and other) projects, have failures. One of the most important things that can come out of any kind of failure is to learn from it and try to prevent similar failures in the future. That is precisely the goal of Adam Williamson's post mortem on the FedUp bug that affected users trying to upgrade to Fedora 20. In it, he explained how and why things went bad, with an eye toward better testing to catch this kind of bug in the future. He also had some thoughts on how the current release process might be changed to help avoid bugs that arise because of the time crunch at the end of the cycle.
Williamson shepherds Fedora's quality assurance (QA) efforts and is thus well-placed to observe what went wrong and to suggest fixes going forward. QA didn't catch the bug before it got out into the wild and Williamson accepts his share of the blame for that. But blame is not really the purpose of the exercise. Finding the underlying problems and addressing them for the future are the goals.
When Fedora 20 was released, the FedUp version most Fedora 18 and 19 users had (fedup-0.7) would not properly upgrade those systems to F20. FedUp is the approved method for upgrading from one Fedora version to the next (or even for skipping a version and going straight from F18 to F20, for example). The solution was fairly simple, even for those who had tried and failed to upgrade with 0.7: get fedup-0.8 and use it. There was a bit more to it than that, particularly for F18 users, but that was the crux of the fix.
The bug was spotted quickly and fixed pretty quickly, but the upgrade process is one of the most high-profile places for a release-day bug. It would certainly have left a bad taste for any users who were bitten by it. The fact that the bug could easily be overcome helped, but it was something of a black eye for the distribution on a day intended to celebrate a new release.
So, how does Fedora avoid the problem in the future? The actual underlying cause of the bug has not been identified, according to Williamson, but it appears that the versions for FedUp and the fedup-dracut package must be kept the same, so that the initramfs created by fedup-dracut will work with the FedUp installed on the user's machine. Essentially, FedUp 0.7 was fetching an initramfs created by fedup-dracut-0.8, which would not work to reboot the system as part of the upgrade. Falling back to the F18 or F19 kernel and initramfs would still allow the system to boot, however.
Beyond the bug's proximate cause, Williamson noted several problems that led to the bug, including a lack of widespread knowledge about how FedUp works, inadequate test cases, and two problems that are endemic to Fedora's short stabilization phase: release candidates that are short-lived and large changes to fundamental packages made late in the cycle. The latter two tend to reduce the amount of time that QA has for testing, which can lead to more bugs slipping through the cracks. Large, late changes also mean that not all of the ramifications of a new feature are discovered pre-release, which is another source of surprises.
Adding better test cases is fairly straightforward. The existing tests were set up when FedUp was developing rapidly, so the test case grabbed the package from the updates-testing repository (rather than the stable or updates repositories). For Fedora 18 and 19, fedup-0.8 was in updates-testing, so QA never saw the bug. The tests have been changed to get the package from the other repositories.
The bug also probably led to a better understanding of at least some of the workings of FedUp within the Fedora development community. In tracking down the bug and fixes for it, some folks got a crash course in FedUp and how it operates. That may help address Williamson's concern about a lack of knowledge of the tool. Given its importance to the distribution, a tool like FedUp should be well understood by more than just a small handful of community members.
The other identified issues will be harder to address, at least in the short term. But, as Williamson noted, squeezing everything into the tail end of the release cycle is a known problem; this bug just helped highlight it again:
It's also another good indicator that we should do whatever we can to try and land major changes much earlier in the release cycle. This is hardly a new observation, of course, nor an issue of which many relevant people were previously unaware, and there are always good reasons why we wind up landing the kitchen sink a week before release, but it's always good to have another reminder.
There are likely lessons for other projects and distributions here. While some of the issues were Fedora-specific, most were not. Williamson has done a nice service not only for Fedora here, but for the wider community. There are some real advantages to doing our work in the open—learning from other projects' successes and failures is just one of them.
Russ Allbery, meanwhile, is in favor of systemd. "There are two separate conceptual areas in which I think systemd offers substantial advantages over upstart, each of which I would consider sufficient to choose systemd on its own. Together, they make a compelling case for systemd."
In both cases, the authors have extensively documented their reasons for their decisions; reading the full messages is recommended.

Red Hat has announced a release of its Red Hat Enterprise Linux OpenStack Platform: "Engineered with Red Hat-hardened OpenStack Havana code, Red Hat Enterprise Linux 6.5, and the Red Hat Enterprise Virtualization Hypervisor built on KVM, Red Hat Enterprise Linux OpenStack Platform offers IT infrastructure teams, cloud application developers, and experienced cloud builders a clear path to the open hybrid cloud without compromising on availability, security, or performance."
Debian GNU/Linux

"PS: We will freeze on the 5th of November. Your packages will be RC bug free way before then, right?"

"Debian is vast, and many people contribute to it, not just Debian Developers. We value and encourage all kinds of contributions, but we currently fail to make that work visible, and to credit it."
Fedora

"Prior to Fedora 20's release, the test cases for fedup recommended testing the latest version of fedup from updates-testing against the upgrade initramfs from the development/20 tree. This procedure was a holdover from the very early days of FedUp, when it was changing daily and testing anything older was uninteresting, and when procedures for the generation and publishing of the upgrade initramfs had not yet been clearly established (and TC/RC trees did not contain one). However, it is no longer appropriate for the more mature state of FedUp development at this point in time, and it should have been changed earlier. We in QA apologize to the project for this oversight."
Page editor: Rebecca Sobol
Python 3 has been around for five years now, but adoption of the language still languishes—at least according to some. It is a common problem for projects that make non-backward-compatible changes, but it is even more difficult for programming languages. There is, typically, a huge body of installed code and, for Python, libraries that need to be upgraded as well. If the new features aren't compelling enough, and the previous version is already quite capable, it can lead to slow adoption of the new version—and frustration for some of its developers. Solving that problem for Python 3 is on the minds of some.
Recently, Alex Gaynor, who has worked on core Python and PyPy, expressed his concerns that Python 3 would never take off, partly because the benefits it brings are not significant enough to cause projects and users to switch over to Python 3. After five years, few are downloading packages for Python 3 and developers aren't moving to it either:
That leads to language stagnation at some level, Gaynor said. Everyone is still using Python 2, so none of the features in 3.x are getting exercised. Since Python 2 is feature frozen, all of the new features go into versions that few are using. And it is mostly Python-language developers (or others closely related to the project) who are using the new features; the majority of "real" users are not, so the feedback on those features may be distorted.
Gaynor concluded that the divergence of Python 2 and 3 has been bad for the community. He suggested releasing a Python 2.8 that backported all of the features from Python 3 and emitted warnings for constructs that would not work in Python 3. That, he said, would give users a path for moving to Python 3—and a way to start exercising its new features.
As might be guessed, not all were in favor of his plan, but the frustration over the Python 3 adoption rate seems fairly widespread—at least with the commenters on Gaynor's blog. So far, the conversation has not spilled over to python-ideas or some other mailing list.
Alan Franzoni agreed that some kind of transition path, rather than an abrupt jump, is needed.
One commenter took a harder line: "Burn the ship and force people to move ashore or go away."
In addition, rwdim suggested that the Python package index (PyPI) stop serving packages that are not Python 3 compatible at some point. That last suggestion was not well received by PyPI maintainers and others, but it did attract other semi-belligerent comments. For example, "jgmitzen" likens users sticking with 2.x to terrorists (and to the Tea Party in another comment). While perhaps a bit overwrought, jgmitzen's point is that supporting 2.x in the Python ecosystem is taking time and energy away from 3.x—to the detriment of the language.
But "gsnedders" is not sure that a 2.8 really brings anything to the table. In the libraries that gsnedders maintains, things have gotten to the point where a single code base can support both >=2.6 and 3.x, and that should be true for most projects. The more recent feature additions for Python 3 are in the standard library, which means they are available for 2.x via PyPI.
Like rwdim, Sean Jensen-Grey would like to see an evolutionary approach so that a single interpreter can be used with both Python 2 and 3. In another comment, he referenced a March 2012 blog post from Aaron Swartz that outlines his vision of how the Python 3 transition should have worked. It followed the established pattern of adding new features to Python 2.x, which is clearly an evolutionary approach.
But Python 3 set out with a non-evolutionary approach. Python Enhancement Proposal (PEP) 3000 clearly specified a break with 2.x backward compatibility. The question seems to be: is it time to rethink that strategy in light of the slow adoption for Python 3?
It may simply be a matter of time, too. Linux distributions are starting to plan for shipping Python 3 as the default—some already have made the switch. Those kinds of changes can only help spur adoption, though it may still take a while.
In addition, some don't seem convinced that Python 3 adoption is lagging, or at least that it is lagging as badly as is sometimes portrayed. To start to answer that question, Dan Stromberg has put together a survey on Python 2.x/3.x use. Whatever the outcome of that, though, it seems likely that many core developers are less than completely pleased with where Python 3 uptake is today—and will be looking for ways to improve it.
Email is something that everyone uses, every day. It's intrinsically federated.
We should really be working harder to make it easy for every family or individual to run an email server at home or on their own cloud server.
Also worthy of note: the GnuPG developers have launched a crowdfunding campaign to help with GnuPG 2.1 development, update the project's infrastructure, and more.

Version 2.6.0 of the GnuCash accounting system has been released. New features include a reworked reports subsystem, the ability to attach external files (receipts, for example) to transactions, a number of new business features, a year-2038 fix, and a relicensing to GPLv2+. See the GnuCash 2.6.0 release tour page for more information.

Version 1.4 of the Darktable photo editor is out. New features include an embedded Lua engine for scripting, a number of new mask types, various performance enhancements, a new "waveform" histogram mode, and more.

Commotion is a mesh networking system intended for resiliency in difficult situations; it claims a number of real-world deployments. The 1.0 release has just been announced. "The launch represents the first full iteration of the technology, which makes it possible for communities to build and own their communications infrastructure using 'mesh' networking. In mesh networks, users connect their devices to each other without having to route through traditional major infrastructure." Binary downloads (including a special OpenWRT image and an Android client) are available from this page; source is hosted on github.
Version 0.17 of the notmuch email indexer has been released. This update fixes a major bug with SHA1 computation on big-endian machines, so it is thus incompatible with older releases in that regard, even though the old SHA1 computations were incorrect. "This meant that messages with overlong or missing message-ids were given different computed message-ids than on more common little endian architectures like i386 and amd64. If you use notmuch on a big endian architecture, you are strongly advised to make a backup of your tags using `notmuch dump` before this upgrade." Other changes include better handling of duplicate messages, many improvements to the Emacs front end, and a long list of assorted bugfixes and new options.

GNU Octave is a Matlab-like interpreted language for numerical computations; the 3.8.0 release has just been announced. "One of the biggest new features for Octave 3.8 is a graphical user interface. It is the one thing that users have requested most often over the last few years and now it is almost ready." Other new features include Matlab-compatible nested functions, named exceptions, and more; see the NEWS file for the full list.
Page editor: Nathan Willis
Brief items

The company behind CyanogenMod has announced another round of venture funding, said to be on the order of $22 million. "What does this mean for you as a CM user? Not much yet, except that you’ll see more new things from us more often. We will continue to invest in the community by way of increased resources, sponsoring more events, and of course staying open. You’ll see new apps and features from us, new services, and also more devices which run CM out of the box."
Articles of interest

A survey of crowdfunded Linux-based device projects launched in 2013 has been published. "Of the 19 such products listed below, five were never successfully crowdfunded. Of these unfunded devices, all but one appear to be moving forward with alternative funding. In fact, one — CrystalFontz America’s CFA10036 module — has already shipped. That leaves Canonical’s doomed, yet history making Ubuntu Edge smartphone as the only 'failure.'"

Also announced: the return of a great deal of historical embedded Linux content to the web. "The LinuxDevices Archive is searchable and also available from a calendar interface, so you can click on any month of any year between 1999 and 2012 and see what pops up. Although some stories did not survive the various transitions between content management systems, the Archive includes over 14,000 LinuxDevices posts, most with images intact, including news, product showcases, and special articles and editorials. So far, just about everything we’ve searched for has emerged in good shape."
Calls for Presentations

A call for speakers is out for the Women MiniDebConf Barcelona 2014: "The idea behind the conference is not to talk about women in free software, or women in Debian, but rather to make discussion about Debian subjects more inclusive for women. If you agree with this goal, spread the word. Forward this call for potential speakers and help us make this event a great success!"

Separately, see LWN's coverage of the 2013 Summit for an overview of the type of discussion held there.
|Deadline||Event dates||Event||Location|
|January 7||March 15||Chemnitz Linux Days 2014||Chemnitz, Germany|
|January 10||January 18||Paris Mini Debconf 2014||Paris, France|
|January 15||February 28||FOSSASIA 2014||Phnom Penh, Cambodia|
|January 15||April 2||Libre Graphics Meeting 2014||Leipzig, Germany|
|January 17||March 26||16. Deutscher Perl-Workshop 2014||Hannover, Germany|
|January 19||May 20||PGCon 2014||Ottawa, Canada|
|January 19||March 22||Linux Info Tag||Augsburg, Germany|
|January 22||May 2||LOPSA-EAST 2014||New Brunswick, NJ, USA|
|January 28||June 19||USENIX Annual Technical Conference||Philadelphia, PA, USA|
|January 30||July 20||OSCON 2014||Portland, OR, USA|
|January 31||March 29||Hong Kong Open Source Conference 2014||Hong Kong, Hong Kong|
|January 31||March 24||Linux Storage Filesystem & MM Summit||Napa Valley, CA, USA|
|January 31||March 15||Women MiniDebConf Barcelona 2014||Barcelona, Spain|
|January 31||May 15||ScilabTEC 2014||Paris, France|
|February 1||April 29||Android Builders Summit||San Jose, CA, USA|
|February 1||April 7||ApacheCon 2014||Denver, CO, USA|
|February 1||March 26||Collaboration Summit||Napa Valley, CA, USA|
|February 3||May 1||Linux Audio Conference 2014||Karlsruhe, Germany|
|February 5||March 20||Nordic PostgreSQL Day 2014||Stockholm, Sweden|
|February 8||February 14||Linux Vacation / Eastern Europe Winter 2014||Minsk, Belarus|
|February 9||July 21||EuroPython 2014||Berlin, Germany|
|February 14||May 12||OpenStack Summit||Atlanta, GA, USA|
|February 27||August 20||USENIX Security '14||San Diego, CA, USA|
If the CFP deadline for your event does not appear here, please tell us about it.
|January 6||Sysadmin Miniconf at Linux.conf.au 2014||Perth, Australia|
|Real World Cryptography Workshop||NYC, NY, USA|
|QtDay Italy||Florence, Italy|
|Paris Mini Debconf 2014||Paris, France|
|January 31||CentOS Dojo||Brussels, Belgium|
|FOSDEM 2014||Brussels, Belgium|
|Config Management Camp||Gent, Belgium|
|Open Daylight Summit||Santa Clara, CA, USA|
|Django Weekend Cardiff||Cardiff, Wales, UK|
|devconf.cz||Brno, Czech Republic|
|Linux Vacation / Eastern Europe Winter 2014||Minsk, Belarus|
|conf.kde.in 2014||Gandhinagar, India|
|Southern California Linux Expo||Los Angeles, CA, USA|
|February 25||Open Source Software and Government||McLean, VA, USA|
|FOSSASIA 2014||Phnom Penh, Cambodia|
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol
Copyright © 2014, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds