LWN.net Weekly Edition for January 12, 2012
The ColorHug adds a remote disable "feature"
The ColorHug is an open hardware and software colorimeter that can be used to calibrate monitor screens for color matching purposes. It is the brainchild of GNOME and Red Hat hacker Richard Hughes, who has put in a rather large investment of time and money to get the project off the ground. It was announced back in November and the first 50 units have rolled off the "assembly line", but Hughes is concerned that fraudsters may cause him to lose money by claiming they didn't receive ColorHugs that he shipped. To combat that, he turned to a technique that many may find surprising: the capability to remotely disable ColorHugs that were reported lost in shipping.
According to Hughes, it was his bank manager that alerted him to the problem of people who order things over the internet and then fraudulently claim that they never received them. Due to a UK "distance selling" law from 2000, Hughes's company is on the hook to refund the £48 selling price even if it has reason to believe that the device actually was delivered. Given that he is funding the company out of his own pocket (and sweat), Hughes wanted some way to deter would-be fraudsters.
What he came up with is a way to remotely disable ColorHugs. If the user runs the GUI firmware update application, it will send the serial number of the ColorHug to a server, which will check it against a blacklist of serial numbers for ColorHugs that were reported lost. If the serial number is on that list, no firmware update will be provided and the ColorHug device will be disabled by setting a flag in the firmware; it will become a free brick, rather than the free colorimeter the scammer thought they were getting.
One might guess that the number of scammers interested in free colorimeters is low, and Hughes essentially agrees, noting that he will likely never use the feature. But he does believe it will act as a deterrent that protects him. The bank painted a fairly stark picture that clearly has him worried:
But, the existence of a remote kill switch—even in the hands of a
longtime free software developer who can be trusted not to abuse
it—makes some people uncomfortable.
It's also unclear that it actually serves as much of a deterrent. It is fairly simple to avoid using the GUI tool, get a copy of the updated firmware from somewhere (like the ColorHug download page), and use the command-line tools to update the firmware. Even a "bricked" ColorHug can be restored by flashing a new bootloader (something any "moderately clever geek" could do, Hughes said). One could also set the serial number to a non-blacklisted value (unlike many other blacklists, the ColorHug blacklist is available for inspection).
One of the obvious choices that would seem to avoid the entire problem is to require ColorHug purchasers to pay for some form of tracked shipping (e.g. FedEx, UPS, or DHL), though even that may be insufficient. There are, evidently, folks out there who will sign for a package using someone else's name then claim the package never arrived. In addition, tracked shipping from Hughes's UK location to other countries can be expensive, on the order of £8-9, which represents a 20% surcharge on the device. It also means that all of the honest customers (presumably the overwhelmingly vast majority) have to pay more to protect against the unscrupulous minority.
For those reasons, Hughes added the remote disable. When he mentioned it on the ColorHug Google+ page, reactions were mixed, which seemed to take Hughes somewhat by surprise. Simo Sorce said "Remote deactivation is a really nasty feature, but beyond that is going to be a major headache to maintain." Kay Sievers was even more blunt:
Maybe you should just get a few beers and rethink what you are trying to accomplish.
Others were more understanding. Paweł T. Jochym points out that Hughes is the one with something to lose: "He is working in real world and had to invest his own coin. The risk is his not yours." The deterrence rests on the understanding that the device will be disabled if it is "lost" in the mail, in much the same way that anti-theft signs at houses work, John Tamplin said. He continued with some ideas for more active tracking, but did note the negatives:
Phoning home is not going to be a very popular feature with privacy-conscious users, as Tamplin notes. One might also guess that scammers who actually want to use the device will find ways around the "feature".
There is a real question whether the deterrence will truly be effective. It's not at all clear that casual scammers will even notice the disablement feature; anyone who truly wants a free colorimeter is likely to have the minimal technical skills required to circumvent the problem. In the final analysis, colorimeters are not likely to be ultra-popular, much-sought-after devices (we aren't talking about music players, tablets, or phone handsets, after all), and the resale market will be vanishingly small, so what's the business model for the scammer?
There is also the logistical overhead of tracking serial numbers, ensuring that only the right one(s) get on the blacklist, and so on. The remote disable is not completely risk-free either, and could lead to unhappy customers if something goes awry. Overall, it seems like a very large hammer for a fairly small problem. But, as Jochym noted, it is Hughes's money that is at risk, thus it is his decision to make.
Things like remote disable are generally considered to be "anti-features" that proprietary companies bake into their products, so it's not surprising that some open source proponents would find it to be less-than-welcome on an otherwise open device. But, since the schematics and code are available, someone suitably motivated could create different firmware without remote disable and/or build their own ColorHugs and even market those. Given that Hughes doesn't seem to have a huge profit motive behind this effort, he might just welcome someone else taking on the burden.
Plenty of other devices are sent from the UK without a remote disable feature; many are likely to be in more popular device categories where fraud is a bigger problem than it is in the colorimeter realm. Presumably, those companies are pricing their products with this fraud factor in mind, but Hughes is reluctant to do so because it puts the device "out of the reach of many students" and may push others toward the proprietary colorimeters due to the price.
While it may be tempting to take Hughes to task over this (and some are), it is hard to argue that he should take risks he is unwilling to take, even if those risks seem fairly minuscule from the outside. Those who would like a colorimeter, but are unhappy with remote disable, can either hack the firmware or the GUI tool, or decide not to buy one. The ColorHug itself looks like a very nice piece of hardware that fills a hole for free desktops that the proprietary alternatives can't. We plan to review it once we can get our hands on one; the first 50 flew off the "shelves" before we could do so. Given the overall openness of the device, and the ability to hack around the remote disable "problem" in various ways, it is really more of an annoyance than anything else, though one that many would argue could and should have been avoided.
Cinnamon and Razor-qt: A tale of alternative desktops
Despite how often we hear about "the post-PC era," the Linux desktop environment is certainly an active space, churning with more competing projects than ever. Two recent additions to that space are Cinnamon and Razor-qt, which could be described as "alternative DEs" breaking away from the more established GNOME and KDE offerings, respectively. The relationships between the projects are not quite that simple, however, as Cinnamon has plans for retaining GNOME compatibility, while Razor-qt is more interested in producing a lightweight, stripped-down environment.
Cinnamon
Cinnamon is the brainchild of Linux Mint's Clement Lefebvre. The Mint distribution is derived from Ubuntu, but it has set the goal for itself of supporting multiple DEs in parallel. In November 2011, Lefebvre blogged about the upcoming release of Mint 12 and outlined two alternatives that Mint users could expect to see in place of the GNOME Shell environment. One was MATE, a third-party revival of the GNOME 2.x desktop. MATE was started by an Arch Linux developer, but Lefebvre serves as the project's release manager.
The second alternative was the Mint GNOME Shell Extensions (MGSE), which as the name makes clear is a suite of extensions for GNOME 3.2's GNOME Shell. Individually the extensions implement several desktop conventions not available in vanilla GNOME Shell (such as a bottom panel, a window list, and an "applications menu"), and alter some individual pieces of GNOME Shell's behavior, such as window switching.
Although using MGSE restores multiple desktop components that GNOME Shell's critics said they missed from GNOME 2.x, it remains GNOME Shell underneath. In late December 2011, Lefebvre unveiled Cinnamon, which takes the MGSE concept further by replacing GNOME Shell outright. The latest release of Cinnamon is version 1.1.3, from January 2. At the moment the code is officially available as 32- or 64-bit Debian packages (in addition to source hosted at the Linux Mint GitHub site), while third-party RPM packages have been contributed for openSUSE and Fedora.
1.1.3 picks up where MGSE left off, not only providing the bottom panel, applications menu, and window list features, but beginning the process of removing GNOME Shell components that the project considers unsatisfactory. The GNOME Shell "Applications View," which is the search-based interface to a system's installed applications, is the first to go. The Applications View is one of the two overlay modes triggered by activating GNOME Shell's "Activities" screen; since Cinnamon provides menu-driven access to applications, the Applications View is redundant.
On the other hand, at present Cinnamon retains the Activities screen's other mode, the "Windows" switcher, although it makes several alterations. The tab key cycles between individual windows rather than applications (which only results in different behavior for multi-window applications), and each window is stamped with its application icon. Cinnamon also adds a "Themes" overlay mode to the Activities screen, which allows the user to switch between any Cinnamon themes stored in ~/.themes. However, the entire Activities screen can be disabled by editing a dconf key.
The bottom panel is similar to GNOME 2.x's, though it uses GNOME 3 technology to provide easier right-click editing of launchers (such as the "Add to panel" option found in GNOME Shell). The applications menu, though, is a departure from upstream GNOME 2.x, and reflects the customized, multi-column main menu offered by earlier Mint releases. There are also a number of smaller changes, such as tweaks to the placement and duration of notification messages, a smaller default font size, and miscellaneous changes to the behavior of some default panel applets. Cinnamon supports multi-monitor setups, but like GNOME Shell, it has some kinks to be worked out (as users are reporting on the Mint forum).
The rationale stated for developing Cinnamon includes a number of factors: making the computer "work for you", making things "not hidden away but easy to access", and making you "feel at home ... thus giving you the ability to change the way the desktop works, looks and behave." Of course, several of those factors are uniquely personal, so it is hard to quantify what they mean in a general sense. As a result, the other way most people describe Cinnamon is that it re-implements the GNOME 2.x desktop using GNOME 3 technology. When you consider the changes to the applications menu and panel applets, that is not quite true; in fact, what Cinnamon re-implements is Mint's customized version of GNOME 2.x. If you are an Ubuntu or Fedora user, you may find that it requires some getting used to.
A bigger problem is that since the release of GNOME 3.2, a lot of GNOME Shell extensions have started popping up, as has an "official" extensions web site. But as of today, Cinnamon is not compatible with other GNOME Shell extensions, which is probably not too surprising. A developer on the forum cites Lefebvre as saying he wants to make Cinnamon configurable without the need for extensions; nevertheless there are users who are clearly interested in using add-ons that originate elsewhere. That lack could hurt Cinnamon's adoption among non-Mint users. Another risk is what Jonathan Corbet described in his predictions for 2012: between MATE, MGSE, and Cinnamon, Linux Mint is taking on a lot of development work for a small distribution; it may fare well, but it may also hit the wall.
Razor-qt
While Cinnamon is only attempting to replace GNOME Shell (leaving other GNOME platform components untouched), Razor-qt is an attempt to build a new DE entirely, providing a lightweight computing environment based on the Qt framework. In that sense, Razor-qt is analogous to Xfce or LXDE, which build on GTK+ but do not use most GNOME platform technologies. Razor-qt provides a basic DE without depending on KDE libraries — most notably the Plasma engine on which KDE's desktop, panel, and widget system are based.
Razor-qt was started in 2010, but development picked up after the project migrated from SourceForge to GitHub in July 2011. The latest release is version 0.4, which dropped on December 12. In addition to the sources available through GitHub, there are package repositories provided for Ubuntu, Fedora, openSUSE, Arch Linux, and Agilia, along with ebuilds for Gentoo. The dependencies are few, including Qt4 (no more specific notes as to which version), X, libudev, and libmagick.
The Razor-qt environment is lightweight, perhaps even spartan, but slick. It provides a panel (configurable for either top or bottom placement), a system menu, and a collection of essential panel applets (clock, window list, system tray, etc.). It does not include a window manager, but it is capable of cooperating with "any modern WM from fwwm2 to KWin", although the project developers recommend Openbox. The project also recommends an assortment of Qt-based applications to flesh out the system, all of which also come without KDE dependencies.
At first launch, Razor-qt asks the user which window manager to use. Razor-qt relies on other components for things like drawing the desktop background and desktop icons, so some of the desktop look will depend on which window and file managers are being used. If GTK+ options are chosen, those tools may not quite match the Qt look-and-feel of the rest of the environment. The 0.4 release advertises its support for Freedesktop.org's "XDG" cross-desktop standards (which it implements in a reusable qtxdg library). It seems to observe the XDG base directory, menu, cursor theme, icon theme, and .desktop launcher specifications — through the window manager it also picks up support for other useful specifications, such as drag-and-drop.
The result is a desktop environment that is more-or-less usable for everyday tasks, although there are still missing and incomplete features. For example, there is a screensaver plug-in, but it only supports locking the screen by clicking on a panel button; there is no automatic time-out. Settings are divided up between two applications, one for the desktop and one for the session, but neither contains font preferences, widget themes, or several other categories one might expect. Still, the set of bundled utilities continues to grow — it now includes a clipboard, keyboard manager, and battery status applet, all of which are still listed as third-party add-ons on the applications wiki page.
Razor-qt works with multiple monitors. As it does not implement the "overlay" effect that GNOME Shell, Unity, and Cinnamon use to bring up the window switcher or application search functions, there are fewer bugs reported in relation to its multi-monitor support. Current support is basic, but there are feature requests asking for enhancements.
Razor-qt is also commendable for providing API documentation, even at such an early stage in its development. The documentation includes references for the DE's XDG implementations and for its original desktop environment and session classes. That will no doubt come in handy for attracting new contributors.
It is hard to go into much more detail about Razor-qt's environment, because it is (quite intentionally) so simple. The project's home page gives "simplicity, speed, and an intuitive interface" as its goals, along with the ability to run on "weak machines". It does the job without fanfare. The GTK+-based lightweight DEs offer a more complete package at the moment, but they have a considerable head start as well. The Qt framework encompasses more than GTK+ does, so as the project progresses, it is reasonable to expect it to catch up, without piling on too many additional dependencies or custom libraries.
Conclusion
There seems to be an implicit "should we switch?" question behind most of the blog reviews published about Cinnamon, Razor-qt, and other alternative desktop projects. The short answer is "no," of course: both are still very wet behind the ears, which means missing pieces and instabilities. The more interesting question is what each of the projects means for the Linux desktop ecosystem. Neither has a firm roadmap published, but there are potentially interesting implications to both projects.
First, with the arrival of Cinnamon, hopefully it is clear at this point that GNOME users' criticisms about GNOME Shell cannot all be dismissed solely as "fear of change." The extensions community and all-out forks like Cinnamon implement specific feature changes (application menus, window lists, relocation of the clock and notifications) unavailable when GNOME Shell debuted, and hopefully they will be accepted (even if not adopted as defaults) by the upstream developers. The usability concerns about "large screen" desktops versus "small screen" tablets are not quite as easily solved, but if no one tries then no progress is possible. Cinnamon's existence clarifies that someone can like GNOME 3 technology, but still make a valid argument that there is more than one way to implement a usable desktop with it.
Razor-qt, for its part, offers application developers the tantalizing possibility of clarifying the distinction between KDE and Qt, which are too often confused by users. Simply providing users with another choice — and in this case, a substantially different one than Xfce and LXDE — is empowering to those users, but it can also serve to push the Qt project ahead. The qtxdg library is one example, and probably will not be the last. On top of that, additional platforms equal more testing, and as Qt makes a play for embedded devices, the more diverse its ecosystem, the better it will do.
The Nook Tablet and the GPL
Recently, certain corners of the net have carried the claim that Barnes & Noble is refusing to release the source for the kernel shipped in its "Nook Tablet" book reader device. That, of course, would be a violation of the kernel's licensing. GPL violations are far from unheard of in the mobile electronics market, but B&N is a company with a high-enough profile to attract special attention. A look at what is going on suggests that there is less to the story than meets the eye - but it still merits a look.
The big fuss was made on the XDA developers forum, where many Android-related conversations are hosted. There, Adam Outler claimed:
He also claims that B&N has been deleting requests for the source from its own forum sites.
In truth, B&N has made the kernel source for its Nook devices available - though some prodding from Matthew Garrett was required to get that to happen. One can find it by scrolling down on the Nook "terms of service" page. Matthew believes that source distribution to be complete - or something very close to it. His position is that B&N appears to be living up to its obligations under the GPL at this time. Some XDA developers still disagree with this assessment for one simple reason: it is not actually possible to build a replacement kernel for the Nook (specifically, the "Nook Tablet" variant) using the source provided. Like many consumer electronics devices, the Nook Tablet is locked down and will refuse to boot a kernel that lacks a signature that it recognizes. What the XDA developers want is a signing key that will allow them to build kernels that are actually bootable on the device. Without that key, they say, the kernel sources are incomplete and useless.
It is certainly a reasonable thing for them to want; hackable devices are, after all, far more interesting and valuable than the locked-down variety. There is a difference, though, between wanting something and claiming that somebody is obligated to provide it to you. Whether B&N is obligated to provide that key is far from clear. The relevant GPLv2 language is this:
Many developers have, over the years, claimed that a signing key qualifies as one of the "scripts" referred to in §3 of the GPL. Even if the license does not explicitly say that it must be possible to build and install an executable that the hardware will actually deign to boot, that requirement is arguably within the intent of the license.
Version 3 of the GPL added language to make this expectation explicit; in most cases, it is not possible to use GPLv3-licensed code in a device if that code cannot be updated by the user. But the Linux kernel is not covered by GPLv3, and, more to the point, the discussion around GPLv3 made it clear that a significant portion of the kernel development community does not wish to limit the use of their code in this way. The "kernel developers' position on GPLv3" document posted in 2006 was explicit on that point. Linus Torvalds made his position clear on the issue well before the GPLv3 process even started:
Of course, the actual meaning of the language in the GPL is not determined by Linus or any specific group of kernel developers. That, in the end, can only be definitively done in a court, and, even then, clarity can be hard to come by. So it is conceivable that, someday, some developer could pursue a case against a company like B&N and prevail, forcing either the release of the signing key or the removal of the product from the market. Such an outcome, needless to say, would cause a number of manufacturers to reevaluate their use of Linux in their products.
That outcome seems unlikely, though, for one simple reason: Linus and a number of other high-profile developers have, through their statements and rejection of GPLv3, made it clear that they see locked-down systems as a permissible use of the kernel and in full compliance with its license. Such signals carry a lot of weight in arguments before a judge, who will be reluctant to rule that a vendor cannot do what the developers of the kernel explicitly said was allowed. Anybody seeking such a ruling will be fighting an uphill battle from the outset.
Saying that the license allows certain behavior is not a statement that such behavior is a good thing. But that is the nature of free software: it is not truly free if it cannot be used to do something the author disagrees with. Once code is released under a free license, it can be used for any number of distasteful things, including running criminal organ-harvesting rings, controlling land mines, tracking people the government doesn't like, or sending "join my social network" email. It can also be used to implement unpleasant DRM schemes on a locked-down ebook reader. That is simply part of the loss of control that comes with making software free.
In some parts of the market, it has become clear that an open platform is a competitive advantage; see HTC's policy of not locking down the boot loaders on its handsets, for example. Barnes & Noble is engaged in a difficult fight against companies like Amazon; as Charlie Stross so clearly stated last year, an insistence on using DRM is just making that fight harder. The push for DRM seemingly comes mainly from the publishers, but pushback from retailers like B&N could force some change there. So one could easily argue that B&N should stop locking down its readers, not because of licensing problems, but because it makes more commercial sense not to.
Security
Denial of service via hash collisions
Man pages (or perldoc pages in this case) are not the usual places to look for security holes, but that is apparently where two researchers found a whole class of problems for web application frameworks (and the languages underlying them). The problem concerns dictionaries (aka hashes, depending on the language) and was fixed in Perl in 2003, but the need for a fix evidently wasn't clear to other high-level/scripting language developers. So, at the recently held Chaos Communication Congress, Julian Wälde and Alexander Klink were able to report efficient denial-of-service attacks against PHP, Python, Java, Ruby, and the V8 JavaScript engine via some of the web frameworks that use those languages.
The "perlsec" perldoc page has a section on "Algorithmic Complexity Attacks" that explains the problem that was fixed in Perl and is (or was) present in other languages. Dictionaries are data structures offered by some languages to store key/value pairs, which are implemented using a hash table based on the key. Unlike a cryptographic hash, the hash functions that are used for hash tables are meant to be efficient in terms of CPU usage, but still offer collision resistance. But, if the same hash function is used for every installation and invocation of a given language, one can fairly easily calculate a set of dictionary keys that collide. That's where the problem starts.
A hash table needs to have some way of dealing with collisions, where two different key values hash to the same value. One common way is to have each hash "bucket" be a linked list of all the dictionary entries that hash to the bucket index. In most cases, that works just fine as the number of collisions is typically small, so the slight inefficiency of traversing the list for insertion, deletion, and searching is in the noise. But if an attacker can control the keys that are being inserted, they can arrange that all of them hash to the same bucket. So, that turns the usual case of an insertion requiring the traversal of a few collisions, at worst, to an insertion requiring traversal of all entries added so far.
If all keys hash to the same bucket, the entire linked list for the bucket must be traversed in order to insert a new key. During the traversal, each stored key must be compared with the insertion key to see if it is already present (which it wouldn't be in the attack scenario, but the hash table insertion code doesn't know that). As more and more keys get added, that traversal takes longer and longer, to a point where it can chew up some fairly significant CPU time. But how can an attacker control the key values that get stored?
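The cost described above can be made concrete with a small sketch. It is not taken from any of the affected languages: the multiply-by-33 string hash, the fixed bucket count, and the names `djbx33a`, `ChainedTable`, and `colliding_keys` are all assumptions chosen for illustration. The table counts how many key comparisons its insertions perform, once for ordinary form-field names and once for keys built to collide.

```python
from itertools import product


def djbx33a(key: str) -> int:
    """Unseeded multiply-by-33 string hash, truncated to 32 bits (illustrative)."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


class ChainedTable:
    """Fixed-size hash table; each bucket is a list of (key, value) pairs."""

    def __init__(self, buckets: int = 1024) -> None:
        self.buckets = [[] for _ in range(buckets)]
        self.comparisons = 0  # key comparisons performed by insert()

    def insert(self, key: str, value: object) -> None:
        chain = self.buckets[djbx33a(key) % len(self.buckets)]
        # Walk the whole chain to check whether the key is already stored.
        for i, (stored, _) in enumerate(chain):
            self.comparisons += 1
            if stored == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))


def colliding_keys(blocks: int) -> list:
    # "Ez" and "FY" hash identically (33*69 + 122 == 33*70 + 89), so every
    # string built from the same number of those blocks collides as well.
    return ["".join(p) for p in product(("Ez", "FY"), repeat=blocks)]


def count_comparisons(keys) -> int:
    table = ChainedTable()
    for key in keys:
        table.insert(key, None)
    return table.comparisons


if __name__ == "__main__":
    evil = colliding_keys(10)                             # 2**10 keys, one bucket
    benign = [f"field{i:04d}" for i in range(len(evil))]  # ordinary field names
    print("benign comparisons:", count_comparisons(benign))
    print("evil comparisons:  ", count_comparisons(evil))
```

With n keys that all land in one bucket, the insertions perform n(n-1)/2 comparisons in total, which is the quadratic behavior the attack relies on; a request carrying tens of thousands of such keys scales accordingly.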
That's where web frameworks come into play. Those frameworks helpfully collect up all of the POST form data that gets sent to a particular web page into a dictionary that gets handed off to the application for processing. The amount of data that the frameworks will allow is typically rather large (e.g. 1MB), so an attacker can create tens of thousands of form entries in a single HTTP request. Because they all collide, those entries can take tens of seconds to a few minutes to process on an i7 core according to the advisory [PDF] released by Wälde and Klink.
Because the languages do not randomize the hash function used in any way, the collisions that get calculated are usable for a wide variety of web applications. It is likely that collisions calculated for Python would work on most Python-based web applications for example. The presentation (slides [PDF]) mentions two different techniques to find collisions that are computationally feasible, but they are both "offline" calculations and cannot be used if the hash function changes between invocations and sites. Thus, the suggested fix for the problem is to choose a random seed that gets mixed into the hash so that collisions are not predictable. That seed would be chosen at language start-up time so that each invocation essentially had its own hash function.
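A minimal sketch of the seeding idea follows, under stated assumptions: it uses keyed BLAKE2 from Python's standard `hashlib` as a stand-in for a seeded table hash, which is far heavier than what a language runtime would actually use, and the names `SEED`, `fixed_hash`, `seeded_hash`, and `buckets_used` are hypothetical. The point is only that keys precomputed offline against a fixed hash stop landing in one bucket once a per-process secret is mixed in.

```python
import hashlib
import os
from itertools import product

# Chosen once at start-up, so each process effectively has its own hash function.
SEED = os.urandom(16)


def fixed_hash(key: str) -> int:
    """Unseeded multiply-by-33 hash: identical in every process (illustrative)."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def seeded_hash(key: str, seed: bytes = SEED) -> int:
    """Keyed hash standing in for a seeded table hash (an assumption, see above)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "big")


def buckets_used(hash_fn, keys, buckets: int = 1024) -> int:
    """Number of distinct buckets the keys fall into."""
    return len({hash_fn(k) % buckets for k in keys})


# Keys computed offline to collide under fixed_hash ("Ez" and "FY" hash alike).
precomputed = ["".join(p) for p in product(("Ez", "FY"), repeat=10)]

if __name__ == "__main__":
    print("fixed hash buckets: ", buckets_used(fixed_hash, precomputed))
    print("seeded hash buckets:", buckets_used(seeded_hash, precomputed))
```

How the seed is mixed in matters. For this particular multiply-by-33 function, merely replacing the initial value 5381 with a random number would leave the "Ez"/"FY" collisions intact, because the two blocks contribute the same amount to the result whatever the starting state is.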
There are workarounds for the problem as well. Web frameworks could (probably should) limit the amount of form data that can be processed and/or the number of form elements that are allowed in a given POST. Another potential workaround is to limit the amount of CPU time that a web application is allowed to use. If a given POST can only soak up ten seconds of a core, for example, it will take more of them (i.e. more bandwidth) to fully saturate a given server or set of servers.
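The first workaround might look something like the sketch below. The limits `MAX_BODY_BYTES` and `MAX_FIELDS`, the `FormTooLarge` exception, and the `parse_form` helper are hypothetical and not taken from any particular framework; the only library call used is `urllib.parse.parse_qsl` from the Python standard library. The checks run before any attacker-chosen key is inserted into a dictionary.

```python
from urllib.parse import parse_qsl

# Hypothetical limits; a real framework would make these configurable.
MAX_BODY_BYTES = 64 * 1024
MAX_FIELDS = 200


class FormTooLarge(ValueError):
    """Raised when a POST body exceeds the configured limits."""


def parse_form(body: bytes) -> dict:
    """Parse an application/x-www-form-urlencoded body with size limits."""
    if len(body) > MAX_BODY_BYTES:
        raise FormTooLarge(f"body is {len(body)} bytes, limit is {MAX_BODY_BYTES}")
    # parse_qsl returns a plain list of pairs, so nothing has been hashed yet.
    pairs = parse_qsl(body.decode("ascii"), keep_blank_values=True)
    if len(pairs) > MAX_FIELDS:
        raise FormTooLarge(f"{len(pairs)} fields, limit is {MAX_FIELDS}")
    return dict(pairs)


if __name__ == "__main__":
    print(parse_form(b"name=colorhug&qty=1"))
    try:
        parse_form(b"&".join(b"k%d=v" % i for i in range(10_000)))
    except FormTooLarge as exc:
        print("rejected:", exc)
```

A cap like this bounds the damage from a single request rather than removing the underlying weakness: a few hundred colliding keys still cost more than a few hundred ordinary ones, but far less than tens of thousands.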
While randomizing the hash function is suggested, it may not be a complete solution. Discussion on the python-dev mailing list and in Python bug 13703 indicates that more is needed to eradicate the problem. As Paul McMillan put it: "I've gotten confirmation from several other sources that the fix recommended by the presenters (just a random initialization seed) only prevents the most basic form of the attack." For long-running Python interpreters, it may be feasible for an attacker to brute force or otherwise determine the random seed that was used.
Determining the random seed will be easier to do with an "oracle function", which is some mechanism that an attacker can use to get information about the hash values of various strings. It does not require that the "raw" hash values be returned, necessarily, as other information (like JSON dictionary key ordering, for example) can be used to deduce the seed. Those are much harder attacks than the simple "POST with thousands of collisions" attack that is available today, however.
Other languages, including Perl and Ruby, have fixed the simpler problem, but Python is looking at going further. One way to do that would be to use a much larger seed, and choose pieces of the seed to add into the hash based on properties of the input string, which essentially increases the search space for brute force and other means of deducing the seed.
PHP has also started to consider fixes. According to the advisory, Oracle has determined that no fix will be applied to Java itself, but that the GlassFish application server will be updated to fix or work around the problem.
The 2003 research [PDF] by Scott A. Crosby and Dan S. Wallach clearly showed the flaw, but didn't apply it to any other language beyond Perl. That led to the Perl fix, but even a heads-up message to python-dev by Crosby was not enough to cause a fix in Python at that time. Part of the problem was the inability to point to an easy way for an attacker to control the keys put into a dictionary. Evidently, the web frameworks were not considered at that time.
According to McMillan, though, the web frameworks are just the tip of the iceberg in terms of the scope of the problem. Various Python developers had suggested that the problem should be fixed in the web frameworks (as Java is evidently doing) or in a few standard libraries (like urllib, cgi, and json), but McMillan states that the problem goes much further than that:
Work on testing and benchmarking changes for Python is ongoing, but the plan is to provide a security release for 2.6, 2.7, and 3.3, while also providing backported patches for earlier Pythons. PHP is discussing options, including placing limits on the length of hash collision chains and limiting the number of form variables that will be processed, while still planning to add randomization into the hashing function.
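One of the options mentioned for PHP, a limit on the length of collision chains, can be sketched as follows. This is not PHP's implementation; the `MAX_CHAIN` value, the `BoundedTable` class, and the `ChainLimitExceeded` exception are hypothetical, and the multiply-by-33 hash is used only to make collisions easy to construct.

```python
from itertools import product

MAX_CHAIN = 64  # hypothetical cap on entries per bucket


class ChainLimitExceeded(RuntimeError):
    """Raised instead of letting a single bucket grow without bound."""


class BoundedTable:
    """Chained hash table that refuses to extend an over-long chain."""

    def __init__(self, buckets: int = 1024) -> None:
        self._buckets = [[] for _ in range(buckets)]

    @staticmethod
    def _hash(key: str) -> int:
        h = 5381
        for ch in key:
            h = (h * 33 + ord(ch)) & 0xFFFFFFFF
        return h

    def insert(self, key: str, value: object) -> None:
        chain = self._buckets[self._hash(key) % len(self._buckets)]
        for i, (stored, _) in enumerate(chain):
            if stored == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        if len(chain) >= MAX_CHAIN:
            raise ChainLimitExceeded(f"bucket already holds {MAX_CHAIN} entries")
        chain.append((key, value))


if __name__ == "__main__":
    table = BoundedTable()
    # "Ez" and "FY" hash alike under multiply-by-33, so all of these collide.
    evil = ["".join(p) for p in product(("Ez", "FY"), repeat=10)]
    try:
        for n, key in enumerate(evil):
            table.insert(key, None)
    except ChainLimitExceeded as exc:
        print(f"stopped after {n} inserts: {exc}")
```

The trade-off is that a legitimate, if unlucky, set of keys could also trip the limit, so the caller has to decide whether to reject the request or fall back to some other handling.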
It is interesting to see a nearly ten-year-old flaw pop back up, though it seems likely that the flaw hasn't been exploited widely (or fixes would have come sooner). It does serve as a reminder that what may seem like theoretical vulnerabilities will, over time, often become actual vulnerabilities. It's sometimes hard to consider making changes for "theoretical" attacks, which may well be the right choice at the time, but it is worth pondering the likelihood that eventually the change will need to be made. Like most security questions—as with the rest of software development—it's always a matter of trade-offs.
Brief items
Security quotes of the week
New vulnerabilities
apache: multiple vulnerabilities
| Package(s): | apache | CVE #(s): | CVE-2011-3607 CVE-2011-4317 |
| Created: | January 10, 2012 | Updated: | January 11, 2012 |
| Description: | From the Mandriva advisory: Integer overflow in the ap_pregsub function in server/util.c in the Apache HTTP Server 2.0.x through 2.0.64 and 2.2.x through 2.2.21, when the mod_setenvif module is enabled, allows local users to gain privileges via a .htaccess file with a crafted SetEnvIf directive, in conjunction with a crafted HTTP request header, leading to a heap-based buffer overflow (CVE-2011-3607). The mod_proxy module in the Apache HTTP Server 1.3.x through 1.3.42, 2.0.x through 2.0.64, and 2.2.x through 2.2.21, when the Revision 1179239 patch is in place, does not properly interact with use of (1) RewriteRule and (2) ProxyPassMatch pattern matches for configuration of a reverse proxy, which allows remote attackers to send requests to intranet servers via a malformed URI containing an @ (at sign) character and a : (colon) character in invalid positions. NOTE: this vulnerability exists because of an incomplete fix for CVE-2011-3368 (CVE-2011-4317). |
cacti: command execution
| Package(s): | cacti | CVE #(s): | CVE-2011-4824 |
| Created: | January 9, 2012 | Updated: | January 23, 2012 |
| Description: | From the CVE entry: SQL injection vulnerability in auth_login.php in Cacti before 0.8.7h allows remote attackers to execute arbitrary SQL commands via the login_username parameter. |
chromium: multiple vulnerabilities
| Package(s): | chromium | CVE #(s): | CVE-2011-3903 CVE-2011-3904 CVE-2011-3906 CVE-2011-3907 CVE-2011-3908 CVE-2011-3909 CVE-2011-3910 CVE-2011-3912 CVE-2011-3913 CVE-2011-3914 CVE-2011-3917 CVE-2011-3921 CVE-2011-3922 |
| Created: | January 9, 2012 | Updated: | January 30, 2012 |
| Description: | From the Gentoo advisory: Multiple vulnerabilities have been discovered in Chromium and V8. A context-dependent attacker could entice a user to open a specially crafted web site or JavaScript program using Chromium or V8, possibly resulting in the execution of arbitrary code with the privileges of the process, or a Denial of Service condition. |
glibc: heap overflow
| Package(s): | glibc | CVE #(s): | CVE-2009-5029 |
| Created: | January 5, 2012 | Updated: | February 13, 2012 |
| Description: | From the openSUSE advisory: Specially crafted time zone files could cause a heap overflow in glibc. |
kernel: denial of service
| Package(s): | kernel | CVE #(s): | CVE-2011-4622 |
| Created: | January 9, 2012 | Updated: | March 6, 2012 |
| Description: | From the Red Hat bugzilla: User space may create the PIT and forget about setting up the irqchips. In that case, firing PIT IRQs will crash the host. |
kernel: denial of service
| Package(s): | kernel | CVE #(s): | CVE-2011-3637 CVE-2011-4324 CVE-2011-4325 CVE-2011-4348 |
| Created: | January 11, 2012 | Updated: | January 11, 2012 |
| Description: | The kernel suffers from four independent denial of service vulnerabilities in the SELinux subsystem (CVE-2011-4348), module subsystem (CVE-2011-3637), and NFS subsystem (CVE-2011-4324 CVE-2011-4325). |
libvirt: firewalled port exposure
| Package(s): | libvirt | CVE #(s): | CVE-2011-4600 |
| Created: | January 6, 2012 | Updated: | January 11, 2012 |
| Description: | From the Fedora advisory: This release of libvirt fixes a minor security problem with extraneous iptables rules being added when an externally managed network (new feature in 0.9.4) exists. More information can be found in the Red Hat bugzilla entry. |
nova: access control bypass
| Package(s): | nova | CVE #(s): | CVE-2012-0030 |
| Created: | January 11, 2012 | Updated: | January 20, 2012 |
| Description: | From the Ubuntu advisory: Nachi Ueno, Rohit Karajgi, and Venkatesan Ravikumar discovered that when Nova is configured to use the OpenStack API, it would not correctly enforce access controls on certain incoming requests. A remote authenticated attacker could exploit this to change resources of arbitrary tenants. |
openssl: multiple vulnerabilities
| Package(s): | openssl | CVE #(s): | CVE-2011-4108 CVE-2011-4576 CVE-2011-4577 CVE-2011-4619 CVE-2011-4109 CVE-2012-0027 |
| Created: | January 11, 2012 | Updated: | February 7, 2012 |
| Description: | OpenSSL prior to versions 1.0.0f and 0.9.8s suffers from a number of information disclosure and denial of service vulnerabilities; see this advisory for details. |
pdns: denial of service
| Package(s): | pdns | CVE #(s): | CVE-2012-0206 |
| Created: | January 10, 2012 | Updated: | February 23, 2012 |
| Description: | From the Debian advisory: Ray Morris discovered that the PowerDNS authoritative server responds to response packets. An attacker who can spoof the source address of IP packets can cause an endless packet loop between a PowerDNS authoritative server and another DNS server, leading to a denial of service. |
python-virtualenv: symlink attack
| Package(s): | python-virtualenv | CVE #(s): | CVE-2011-4617 |
| Created: | January 6, 2012 | Updated: | June 25, 2012 |
| Description: | From the CVE entry: virtualenv.py in virtualenv before 1.5 allows local users to overwrite arbitrary files via a symlink attack on a certain file in /tmp/. |
ruby: denial of service
| Package(s): | ruby | CVE #(s): | CVE-2011-4815 |
| Created: | January 11, 2012 | Updated: | February 28, 2012 |
| Description: | Ruby, like many scripting languages, enables the predictable creation of hash collisions. This "feature" can be exploited for a denial of service attack. |
super: buffer overflow
| Package(s): | super | CVE #(s): | CVE-2011-2776 |
| Created: | January 9, 2012 | Updated: | January 11, 2012 |
| Description: | From the Debian advisory: Robert Luberda discovered a buffer overflow in the syslog logging code of Super, a tool to execute scripts (or other commands) as if they were root. The default Debian configuration is not affected. |
zabbix: multiple cross-site scripting vulnerabilities
| Package(s): | zabbix | CVE #(s): | CVE-2011-4615 CVE-2011-5027 |
| Created: | January 9, 2012 | Updated: | January 11, 2012 |
| Description: | From the CVE entries: Multiple cross-site scripting (XSS) vulnerabilities in Zabbix before 1.8.10 allow remote attackers to inject arbitrary web script or HTML via the gname parameter (aka host groups name) to (1) hostgroups.php and (2) usergrps.php, the update action to (3) hosts.php and (4) scripts.php, and (5) maintenance.php. (CVE-2011-4615) Cross-site scripting (XSS) vulnerability in ZABBIX before 1.8.10 allows remote attackers to inject arbitrary web script or HTML via unspecified vectors related to the profiler. (CVE-2011-5027) |
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The 3.2 kernel was released on January 4, after 72 days of development. Among other things, this kernel adds the proportional rate reduction TCP algorithm, the extended verification module, the CPU scheduler bandwidth controller, the cross-memory attach IPC mechanism, the Hexagon DSP architecture, improved recovery of corrupted Btrfs filesystems, and the I/O-less dirty throttling code. See the Kernelnewbies 3.2 page for lots more information.
As of this writing, the 3.3 merge window is open; see below for details on what has been merged so far.
Stable updates: the 2.6.32.53, 3.0.16, and 3.1.8 stable kernel updates were released on January 6. Each contains the usual long list of important fixes (OK, 2.6.32.53 only has nine fixes, but the newer kernels have quite a few more).
The 2.6.32.54, 3.0.17, 3.1.9, and 3.2.1 stable updates are in the review process; they can be expected on or after January 12. 3.1.9 is likely to be the final update for the 3.1 kernel.
Quotes of the week
Code that specializes the kernel in weird ways is accepted into the kernel all the time, and I've tried to figure out why this particular bit of code is treated differently. Especially since this code is self-contained, configurable, and imposes no perceivable long-term maintenance burden.
Yes, you are special and unique, just like everyone else.
The next person who says the "embedded is different" phrase again, owes me a beer of my choice.
A long-term kernel support update
Greg Kroah-Hartman has posted an update on his plans for long-term kernel maintenance. As he announced before, the 3.1 series is almost at the end of its update period; he is also approaching the end of his maintenance for the long-lived 2.6.32 release. "It is approaching it's end-of-life, and I think I only have another month or so doing releases of this. After I am finished with it, it might be picked up by someone else, but I'm not going to promise anything." As it happens, Tim Gardner has stated that Ubuntu will support 2.6.32 through April 2015 - though whether that support will translate into kernel releases outside of the Ubuntu distribution is not clear. Ubuntu also plans to take on 3.2 as a long-term supported kernel.
No more system devices
Since the early days of the Linux device model, there has been a special device class for "system devices," typically those built into the platform itself. For almost as long, the driver core developers have felt that there was no real need for this device type, which looks weirdly different from every other type of device. For 3.3, they have actually done something about it; system devices are no more.
All in-tree system device drivers have been fixed up to use regular devices instead. The process is relatively simple; it can be seen in, for example, this commit updating kernel/time/clocksource.c. In short, the embedded struct sys_device becomes a simple struct device instead. Attributes defined with SYSDEV_ATTR() are switched to DEVICE_ATTR(). The sysdev_class structure is turned into a nearly empty bus_type structure instead. That is about all that is required.
These changes, naturally, cause a user-space ABI change; system devices had their own special area under /sys which goes away. That has the potential to break programs and scripts, which would not be a good thing. To avoid this problem, a special function has been added:
int subsys_system_register(struct bus_type *subsys,
const struct attribute_group **groups);
Registering a subsystem in this way will restore its old /sys/devices/system hierarchy. Needless to say, this function exists for backward compatibility purposes only; using it in new drivers is not likely to be received well.
Yet another new approach to seccomp
Over the years, we have seen a number of attempts to use the seccomp ("secure computing") mechanism to reduce the range of operations available to a given process. The hope is to use such a mechanism as part of a sandboxing solution that would allow (for example) a web browser to run third-party code in a safer manner. Thus far, all of these attempts have gone down in flames; see Seccomp filters: no clear path from last May for the most recent episode in this particular story.
Things have been quiet on the seccomp front recently - until now. Will Drewry, who has been behind the recent attempts to enhance seccomp, has come up with an interesting new approach to the problem. Whether this attempt will be more successful than its predecessors remains to be seen, but Will has managed to step around some of the traps that doomed his previous attempt.
In the last seccomp discussion, there was a fair amount of pressure to adapt the kernel's tracing infrastructure to this task; there was also resistance to using that infrastructure in that way. As explained in detail in the patch posting, Will has come to the conclusion that the tracing infrastructure is not really fit for the task anyway:
Will's new approach has a stroke of brilliance to it: rather than use the ftrace filter mechanism, he has repurposed the networking layer's packet filtering mechanism (BPF). The BPF code normally operates on packets; in the seccomp context, instead, it operates on the register set at the time of each system call. The registers will contain the system call number and its parameters, allowing the filter to make a wide range of decisions on what will (or will not) be allowed. BPF is also well-maintained and well-optimized code; it even has an in-kernel just-in-time compiler. Some of these advantages are lost because seccomp uses its own BPF interpreter; one assumes that a way could be found to merge the two implementations if the underlying idea looks like it will pass muster.
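The idea of running a packet-filter program over system call data can be illustrated with a toy model. This is not the kernel's BPF interpreter, nor the interface from Will's patch: the tuple encoding and the names run_filter() and allow_only() are hypothetical, though the three operations (load a word, compare-and-jump with relative offsets, return a verdict) mirror how classic BPF programs are structured. The "registers" are modeled as a plain list whose first element is the system call number; 0, 1, and 60 are the x86-64 numbers for read(), write(), and exit().

```python
ALLOW, DENY = 1, 0


def allow_only(syscall_numbers):
    """Build a filter program that allows only the listed system calls."""
    numbers = list(syscall_numbers)
    prog = [("ld", 0)]  # load word 0 (the system call number)
    for i, nr in enumerate(numbers):
        # On a match, skip the remaining comparisons and the DENY return.
        prog.append(("jeq", nr, len(numbers) - i, 0))
    prog.append(("ret", DENY))
    prog.append(("ret", ALLOW))
    return prog


def run_filter(prog, regs):
    """Interpret the program against one system call's register values."""
    acc = 0
    pc = 0
    while pc < len(prog):
        op = prog[pc]
        pc += 1
        if op[0] == "ld":
            acc = regs[op[1]]
        elif op[0] == "jeq":
            # Jump offsets are relative to the next instruction.
            pc += op[2] if acc == op[1] else op[3]
        elif op[0] == "ret":
            return op[1]
        else:
            raise ValueError("unknown instruction: %r" % (op,))
    raise ValueError("program ended without a return")


if __name__ == "__main__":
    prog = allow_only([0, 1, 60])
    for regs in ([1, 1, 0x1000, 12], [59, 0x2000, 0, 0]):
        verdict = "allowed" if run_filter(prog, regs) == ALLOW else "denied"
        print("syscall", regs[0], verdict)
```

Because the program can load any word of the register set, a real filter could also inspect arguments (for example, allowing write() only to a particular file descriptor) rather than just the system call number.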
As of this writing, there has not really been time for comments on the new patch. It will be interesting to see what the developers think. Meanwhile, those wanting more information should see the patch posting and the documentation file, which includes a sample program showing how to use the new facility.
Kernel development news
The first half of the 3.3 merge window
As of this writing, just over 5,700 non-merge changesets have been pulled into the mainline for the 3.3 development cycle. A fair amount of work remains to be pulled, so it looks like another fairly active cycle, though perhaps not quite up to the level of 3.2.
Some of the more significant, user-visible changes merged so far include:
- The "team" network driver - a lightweight mechanism for bonding
multiple interfaces together - has been merged. The libteam project has the
user-space code needed to operate this device.
- The network priority control group controller has been added. This
controller allows the administrator to specify the priority with which
members of each control group have access to the network interfaces
available on the system. See net_prio.txt from the documentation
directory for more information.
- Also added is the TCP buffer size
controller which can be used to place limits on the amount of
kernel memory used to hold TCP buffers.
- The byte queue limits
infrastructure has been added, enabling control over how much data can
be queued for transmission over a network interface at any time.
- The Open vSwitch virtual network
switch has been merged.
- The ARM architecture has gained support for the "large physical
address extension," allowing 32-bit processors to address more than
4GB of installed memory.
- The "adaptive RED" queue management algorithm is now supported by the
networking layer.
- The near-field communications (NFC) layer has gained support for the
logical link control protocol (LLCP).
- The beginnings of dynamic frequency
selection support have been added to the wireless networking
subsystem.
- For S390 users who find the current limit of 3.8TB of RAM to be
constraining: 3.3 will add support for four-level page tables and an
upper limit of 64TB (for now).
- Various Android drivers have returned to the staging tree; see this article for more information.
- The C6X architecture (described in this
article) has been merged.
- The ext4 filesystem has added support for online resizing via the
EXT4_IOC_RESIZE_FS ioctl() command. This operation
does not (yet) work with filesystems using the "bigalloc" or "meta_bg"
features.
- The /proc filesystem has a new subdirectory for each process
called map_files; it contains a symbolic link describing
every file-backed mapping used by the relevant process. This feature
is one of many needed to support the desired checkpoint/restart
feature.
- /proc also supports a couple of new mount options. When
mounted with hidepid=1, /proc will deny access to
any process directories not owned by the requesting process. With
hidepid=2, even the existence of other processes will be
hidden. The default (hidepid=0) behavior is unchanged. The
other new option (gid=N) provides an ID for a group that is
allowed to access information for all processes regardless of the
hidepid= setting.
- New drivers:
- Systems and processors:
AppliedMicro APM8018X PowerPC processors,
Numascale NumaChip systems,
IBM Currituck (476fpe) boards, and
NVIDIA Tegra30 processors.
- Input:
TI TCA8418 keypad decoders,
Wacom Intuos4 wireless tablets,
EETI eGalax multi-touch panels,
GPIO-connected tilt switches,
Sharp GP2AP002A00F I2C Proximity/Opto sensors, and
PIXCIR I2C touchscreens.
- Miscellaneous: P7IOC PowerPC I/O hubs,
Dialog Semiconductor DA9052/53 PMIC devices,
SiRF SoC Platform Serial ports,
Analog Devices AD5421, AD5764, AD5744, and AD5380 digital to
analog converters,
GE PIO2 VME Parallel I/O cards,
OMAP 2/3/4 displays,
OMAP "Tiling and Isometric Lightweight Engine for Rotation" devices,
Dialog DA9052/DA9053 regulators,
VIA hardware watchdog timers, and
TI TCA6507 I2C LED controllers.
- Network: Calxeda 1G/10G XGMAC Ethernet interfaces and
ISA-based CC770 CAN controllers.
- USB: Marvell USB OTG transceivers and
Marvell EHCI host controllers.
- Graduations: Microsoft's Hyper-V virtual network driver and the gma500 graphics driver have moved out of staging into the mainline.
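The hidepid= and gid= mount options for /proc described in the list above amount to a small decision table, which can be sketched as follows. This is a model of the behavior as described here, not kernel code: proc_visibility() and its return strings are hypothetical, "owned by the requesting process" is taken to mean "same user ID," and any special treatment of root or capabilities is deliberately left out.

```python
def proc_visibility(requester_uid, requester_gids, owner_uid, hidepid=0, gid=None):
    """Return what a requester can see of another process's /proc directory.

    "full"   - the directory is visible and accessible
    "listed" - the directory shows up, but access is denied (hidepid=1)
    "hidden" - the directory does not appear at all (hidepid=2)
    """
    if hidepid not in (0, 1, 2):
        raise ValueError("hidepid must be 0, 1, or 2")
    if hidepid == 0:
        return "full"  # default behavior, unchanged
    if requester_uid == owner_uid:
        return "full"  # a user's own processes are always accessible
    if gid is not None and gid in requester_gids:
        return "full"  # members of the gid= group are exempt
    return "listed" if hidepid == 1 else "hidden"


if __name__ == "__main__":
    for mode in (0, 1, 2):
        print("hidepid=%d:" % mode, proc_visibility(1000, {1000}, 1001, hidepid=mode))
```

On a real system the options would be given at mount or remount time (for example, in the options field of the /proc entry in /etc/fstab); the model above only captures the resulting access rules.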
Changes visible to kernel developers include:
- A reworked version of the DMA buffer sharing API has been merged; this
API has been described in a separate
article.
- The "memblock" low-level memory allocation API has been substantially
reworked.
- Quite a few VFS interfaces have been changed to use the
umode_t type for file mode bits.
- Also in the VFS: most of
the members of struct vfsmount have been moved elsewhere
(to a containing struct mount) and hidden from
filesystem code. A number of callbacks in struct
super_operations (specifically: show_stats(),
show_devname(), show_path() and
show_options()) now take a pointer to struct dentry
instead of struct vfsmount.
- The pin control subsystem has gained a
new configuration interface.
- Boolean module parameters have traditionally allowed the underlying
module variable to be of either bool or int type.
That tolerance is coming to an end with 3.3, where non-bool
types will generate a warning; the plan is apparently to change those
warnings to fatal compilation errors
in the 3.4 cycle. A lot of modules have seen type changes for their
parameters in preparation for the new regime.
- The "system device" type has been removed from the kernel; all instances have been converted to regular devices instead. See this article for more information.
The merge window can be expected to remain open through approximately January 18.
Rethinking power-aware scheduling
Sometimes it seems that there are few uncontroversial topics in kernel development, but saving power would normally be among them. Whether the concern is keeping a battery from running down too soon or keeping the planet from running down too soon, the ability to use less power per unit of computation is seen as a good thing. So when the kernel's scheduler maintainer threatened to rip out a bunch of power-saving code, it got some people's attention.
The main thing the scheduler can do to reduce power consumption is to allow as many CPUs as possible to stay in a deep sleep state for as long as possible. With contemporary hardware, a comatose CPU draws almost no power at all. If there is a lot of CPU-intensive work to do, there will be obvious limits on how much sleeping the CPUs can get away with. But, if the system is lightly loaded, the way the scheduler distributes running processes can have a significant effect on both performance and power use.
Since there is a bit of a performance tradeoff, the scheduler exports a couple of tuning knobs under /sys/devices/system/cpu. The first, called sched_mc_power_savings, has three possible settings:
- 0: The scheduler will not consider power usage when distributing tasks;
instead, tasks will be distributed across the system for maximum
performance. This is the default value.
- 1: One core will be filled with tasks before tasks will be moved to other
cores. The idea is to concentrate the running tasks on a relatively
small number of cores, allowing the others to remain idle.
- 2: Like (1), but with the additional tweak that newly awakened tasks will be directed toward "semi-idle" cores rather than started on an idle core.
There is another knob, sched_smt_power_savings, that takes the same set of values, but applies the results to the threads of symmetric multithreading (SMT) processors instead. These threads look a lot like independent processors, but, since they share most of the underlying hardware, they are not truly independent from each other.
Recently, Youquan Song noticed that sched_smt_power_savings did not actually work as advertised; a quick patch followed to fix the problem. Scheduler maintainer Peter Zijlstra objected to the fix, but he also made it clear that he objects to the power-saving machinery in general. Just to make that clear, he came back with a patch removing the whole thing and a threat to merge that patch unless somebody puts some effort into cleaning up the power-saving code.
Peter subsequently made it clear that he sees the value of power-aware scheduling; the real problem is in the implementation. And, within that, the real problem seems to be the control knobs. The two knobs provide similar behavioral controls at two levels of the scheduler domain hierarchy. But, with three possible values for each, the result is nine different modes that the scheduler can run in. That seems like too much complexity for a situation where the real choice comes down to "run as fast as possible," or "use as little power as possible."
In truth, it is not quite that simple. The performance cost of loading up every thread in an SMT processor is likely to be higher than that of concentrating tasks at higher levels. Those threads contend for the actual CPU hardware, so they will slow each other down. So one could conceive of situations where an administrator might want to enable different behavior at different levels, but such situations are likely to be quite rare. It is probably not worth the trouble of maintaining the infrastructure to support nine separate scheduler modes just in case somebody wants to do something special.
For added fun, early versions of the patch adding the "book" scheduling level (used only by the s390 architecture) included a sched_book_power_savings switch, though that switch went away before the patch was merged. There is also the looming possibility that somebody may want to do the same for scheduling at the NUMA node level. There comes a point where the number of possibilities becomes ridiculous. Some people - Peter, for example - think that point has already been reached.
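The growth in the number of scheduler modes is simple arithmetic: each three-valued knob multiplies the count by three. A short sketch makes the point; the first two knob names come from the text, while sched_book_power_savings never reached the mainline and the NUMA knob name is purely hypothetical.

```python
import itertools

SETTINGS = (0, 1, 2)  # the three values each power-savings knob accepts


def scheduler_modes(knobs):
    """Enumerate every combination of settings for the given knobs."""
    return [dict(zip(knobs, values))
            for values in itertools.product(SETTINGS, repeat=len(knobs))]


if __name__ == "__main__":
    knobs = ["sched_mc_power_savings", "sched_smt_power_savings"]
    # The two names below are not in the mainline; they are here for the count only.
    extras = ["sched_book_power_savings", "sched_numa_power_savings"]
    while True:
        print(len(knobs), "knobs ->", len(scheduler_modes(knobs)), "modes")
        if not extras:
            break
        knobs.append(extras.pop(0))
```

Two knobs give the nine modes mentioned above; adding book-level and NUMA-level knobs would give 27 and then 81.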
That conclusion leads naturally to talk of what should replace the current mechanism. One solution would be a simple knob with two settings: "performance" or "low power." It could, as Ingo Molnar suggested, default to performance for line-connected systems and low power for systems on battery. That seems like a straightforward solution, but there is also a completely different approach suggested by Indan Zupancic: move that decision making into the CPU governor instead. The governor is charged with deciding which power state a CPU should be in at any given (idle) time. It could be given the additional task of deciding when CPUs should be taken offline entirely; the scheduler could then just do its normal job of distributing tasks among the CPUs that are available to it. Moving this responsibility to the governor is an interesting thought, but one which does not currently have any code to back it up; until somebody rectifies that little problem, a governor-based approach probably will not receive a whole lot more consideration.
Somebody probably will come through with the single-knob approach, though; whether they will follow through and clean up the power-saving implementation within the scheduler is harder to say. But it should be enough to avert the threat of seeing that code removed altogether. And that is certainly a good thing; imagine the power that would be uselessly consumed in a flamewar over a regression in the kernel's power-aware scheduling ability.
DMA buffer sharing in 3.3
Back in August 2011, LWN looked at the DMA buffer sharing patch set posted by Marek Szyprowski. Since then, that patch has been picked up by Sumit Semwal, who modified it considerably in response to comments from a number of developers. The version of this patch that was merged for 3.3 differs enough from its predecessors that it merits another look here.
The core idea remains the same, though: this mechanism allows DMA buffers to be shared between drivers that might otherwise be unaware of each other. The initial target use is sharing buffers between producers and consumers of video streams; a camera device, for example, could acquire a stream of frames into a series of buffers that are shared with the graphics adapter, enabling the capture and display of the data with no copying in the kernel.
In the 3.3 sharing scheme, one driver will set itself up as an exporter of sharable buffers. That requires providing a set of callbacks to the buffer sharing code:
    struct dma_buf_ops {
        int (*attach)(struct dma_buf *buf, struct device *dev,
                      struct dma_buf_attachment *dma_attach);
        void (*detach)(struct dma_buf *buf, struct dma_buf_attachment *dma_attach);
        struct sg_table *(*map_dma_buf)(struct dma_buf_attachment *dma_attach,
                                        enum dma_data_direction dir);
        void (*unmap_dma_buf)(struct dma_buf_attachment *dma_attach, struct sg_table *sg);
        void (*release)(struct dma_buf *);
    };
Briefly, attach() and detach() inform the exporting driver when others take or release references to the buffer. The map_dma_buf() and unmap_dma_buf() callbacks, instead, cause the buffer to be prepared (or unprepared) for DMA and pass ownership between drivers. A call to release() will be made when the last reference to the buffer is released.
The exporting driver makes the buffer available with a call to:
struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
size_t size, int flags);
Note that the size of the buffer is specified here, but there is no pointer to the buffer itself. In fact, the current version of the interface never passes around CPU-accessible buffer pointers at all. One of the actions performed by dma_buf_export() is the creation of an anonymous file to represent the buffer; flags is used to set the mode bits on that file.
Since the file is anonymous, it is not visible to the rest of the kernel (or user space) in any useful way. Truly exporting the buffer, instead, requires obtaining a file descriptor for it and making that descriptor available to user space. The descriptor can be had with:
int dma_buf_fd(struct dma_buf *dmabuf);
There is no standardized mechanism for passing that file descriptor to user space, so it seems likely that any subsystem implementing this functionality will add its own special ioctl() operation to get a buffer's file descriptor. The same is true for the act of passing a file descriptor to drivers that will share this buffer; it is something that will happen outside of the buffer-sharing API.
A driver wishing to share a DMA buffer has to go through a series of calls after obtaining the corresponding file descriptor, the first of which is:
struct dma_buf *dma_buf_get(int fd);
This function obtains a reference to the buffer and returns a dma_buf structure pointer that can be used with the other API calls to refer to the buffer. When the driver is finished with the buffer, it should be returned with a call to dma_buf_put().
The next step is to "attach" to the buffer with:
struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
struct device *dev);
This function will allocate and fill in yet another structure:
    struct dma_buf_attachment {
        struct dma_buf *dmabuf;
        struct device *dev;
        struct list_head node;
        void *priv;
    };
That structure will then be passed to the exporting driver's attach() callback. There seem to be a couple of reasons for the existence of this step, the first of which is simply to let the exporting driver know about the consumers of the buffer. Beyond that, the device structure passed by the calling driver can contain a pointer (in its dma_parms field) to one of these structures:
    struct device_dma_parameters {
        unsigned int max_segment_size;
        unsigned long segment_boundary_mask;
    };
The exporting driver should look at these constraints and ensure that the buffer it is exporting can satisfy them; if not, the attach() call should fail. If multiple drivers attach to the buffer, the exporting driver will need to allocate the buffer in a way that satisfies all of their constraints.
The final step is to map the buffer for DMA:
struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
enum dma_data_direction direction);
This call turns into a call to the exporting driver's map_dma_buf() callback. If this call succeeds, the return value will be a scatterlist that can be used to program the DMA operation into the device. A successful return also means that the calling driver's device owns the buffer; it should not be touched by the CPU during this time.
Note that mapping a buffer is an operation that can block for a number of reasons; if the buffer is busy elsewhere, for example. Also worth noting is that, until this call is made, the buffer need not necessarily be allocated anywhere. The exporting driver can wait until others have attached to the buffer so that it can see their DMA constraints and allocate the buffer accordingly. Of course, if the buffer lives in device memory or is otherwise constrained on the exporting side, it can be allocated sooner.
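The deferred-allocation idea can be modeled in a few lines of user-space code. This is only a sketch of the ordering described above, not the kernel API: the ExportedBuffer and Attachment classes are hypothetical, only the max_segment_size constraint is considered, and "allocation" is reduced to choosing a list of segment sizes.

```python
class Attachment:
    """One consumer of the buffer, with its DMA constraint."""

    def __init__(self, device, max_segment_size):
        self.device = device
        self.max_segment_size = max_segment_size


class ExportedBuffer:
    """Toy exporter that postpones allocation until the first mapping."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.attachments = []
        self.segments = None  # None means "not allocated yet"

    def attach(self, device, max_segment_size):
        if max_segment_size <= 0:
            raise ValueError("max_segment_size must be positive")
        # Once allocated, a new consumer must be able to live with the layout.
        if self.segments is not None and max(self.segments) > max_segment_size:
            raise ValueError("existing allocation violates the new constraint")
        attachment = Attachment(device, max_segment_size)
        self.attachments.append(attachment)
        return attachment

    def map_attachment(self, attachment):
        if attachment not in self.attachments:
            raise ValueError("not attached to this buffer")
        if self.segments is None:
            # Allocate now, honoring the tightest constraint seen so far.
            limit = min(a.max_segment_size for a in self.attachments)
            full, rest = divmod(self.size, limit)
            self.segments = [limit] * full + ([rest] if rest else [])
        return list(self.segments)


if __name__ == "__main__":
    buf = ExportedBuffer(10000)
    camera = buf.attach("camera", 4096)
    gpu = buf.attach("gpu", 2048)
    print(buf.map_attachment(camera))
```

Here the buffer is laid out only when the camera maps it, by which time the exporter already knows about the GPU's tighter limit; a device attaching later with a constraint the layout cannot meet is refused, much as the attach() callback is expected to fail.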
After the DMA operation is completed, the sharing driver should unmap the buffer with:
void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
struct sg_table *sg_table);
That will, in turn, generate a call to the exporting driver's unmap_dma_buf() function. Detaching from the buffer (when it is no longer needed) can be done with:
void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach);
As might be expected, this function will call the exporting driver's detach() callback.
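The whole sequence, as seen from both sides, can be modeled in user space as follows. Nothing here is the kernel API: the model_buf types, the exp_* functions, and importer_do_dma() are invented for illustration, and only the four callback names come from the interface described above.

```c
struct model_buf;

/* The four exporter callbacks named in the text. */
struct model_buf_ops {
    int  (*attach)(struct model_buf *buf);
    int  (*map_dma_buf)(struct model_buf *buf);
    void (*unmap_dma_buf)(struct model_buf *buf);
    void (*detach)(struct model_buf *buf);
};

struct model_buf {
    const struct model_buf_ops *ops;
    int attached;   /* number of attached consumers */
    int allocated;  /* backing storage exists */
    int mapped;     /* a device currently owns the buffer */
};

/* Exporter side: allocation is deferred until the first mapping. */
static int exp_attach(struct model_buf *buf)
{
    buf->attached++;
    return 0;
}

static int exp_map(struct model_buf *buf)
{
    if (buf->mapped)
        return -1;      /* busy; a real exporter might block instead */
    buf->allocated = 1; /* every consumer's constraints are known by now */
    buf->mapped = 1;
    return 0;
}

static void exp_unmap(struct model_buf *buf)  { buf->mapped = 0; }
static void exp_detach(struct model_buf *buf) { buf->attached--; }

static const struct model_buf_ops exp_ops = {
    .attach = exp_attach,
    .map_dma_buf = exp_map,
    .unmap_dma_buf = exp_unmap,
    .detach = exp_detach,
};

/* Importer side: one complete DMA cycle; returns 0 on success. */
static int importer_do_dma(struct model_buf *buf)
{
    if (buf->ops->attach(buf))
        return -1;
    if (buf->ops->map_dma_buf(buf)) {
        buf->ops->detach(buf);
        return -1;
    }
    /* ... program the device; the CPU must not touch the buffer ... */
    buf->ops->unmap_dma_buf(buf);
    buf->ops->detach(buf);
    return 0;
}
```

The error path matters as much as the happy path: a failed mapping still needs a matching detach so that the exporter's view of its consumers stays accurate.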
As of 3.3, there are no users for this interface in the mainline kernel. There seems to be a fair amount of interest in using it, though, so Dave Airlie pushed it into the mainline with the idea that it would make the development of users easier. Some of those users can be seen (in an early form) in Dave's drm-prime repository and Rob Clark's OMAP4 tree.
Page editor: Jonathan Corbet
Distributions
CyanogenMod contemplates an app store
The big question that is on everyone's mind when it comes to CyanogenMod these days is "when might we see Ice Cream Sandwich on device xyzzy?" Work on the ICS-based CyanogenMod 9 (CM9) proceeds apace, but trying to get any kind of estimate would undoubtedly violate the first rule of CyanogenMod: "you don't ask for ETAs". There are a couple of different interesting things going on in CM-land right now, beyond all of the work on bringing ICS to the many supported devices.
Two things in CM developer Koushik Dutta's Google+ post from January 10 stand out: the idea of a CyanogenMod App Store is gaining some traction, and CM is approaching one million unique installs. While one million users is a pretty significant number, it's still a drop in the bucket compared to Android devices overall (which have 700,000 registrations per day at last count). Still, it is quite an achievement for the project.
Incidentally, the numbers gathered by CM come from the opt-in CMStats program that users are asked to enable at first-boot time. Undoubtedly, some users don't, so it's likely that CM has already surpassed the million-user mark. Since CMStats checks in on each boot, though, it is a reliable source of data for counting "unique, active, user installs", as Dutta put it.
The app store idea comes about for a couple of different reasons. Apps that require rooted devices have a tendency to get kicked out of the Android Market, presumably because the carriers don't like them. Other unpopular app types include emulators for older video game systems (without the ROMs that would clearly be a copyright violation), one-click rooting, and tethering apps. Dutta is thinking that an app store that is not under the thumb of Google and the carriers would provide one-stop shopping for those kinds of apps.
The "shopping" part is important. Dutta and project founder Steve Kondik see a CM app store as a way to generate revenue to help support the project's development. If a portion of the revenue from such a store went to the project, it could cover some of its increasing hardware and server costs. As Dutta mentions, there is no reason that it would be limited only to CM installs either, as any alternate Android ROM could include it (presumably with some kind of revenue sharing deal).
Based on the overwhelmingly positive reaction to the Google+ posting (it reached the 500 comment limit in less than a day) it seems like an idea with some legs. In fact, a follow-up posting would seem to indicate that Dutta has started working on code for the app store. Several commenters brought up issues that a CM app store would need to address, including ensuring app quality (and security/privacy), but overall, it would appear there are quite a few users interested in a single place to get "banned" apps.
While there have been no CM9 progress updates since one was posted to the blog on December 2, there are a lot of CM9 "KANG" (unofficial) builds floating around in threads on the XDA-developers site and elsewhere. No official builds have appeared yet, though, not even release candidates or nightlies. In the meantime, one can either be patient or go ahead and build CM9 from the source.
Brief items
Tizen releases some code
The Tizen project has announced the release of an initial set of source repositories and an alpha SDK. "Today we are posting a set of pre-release tools to give application developers an early look at Tizen. These tools, together with their corresponding documentation and source code, will provide developers with information required to become familiar with Tizen development."
Distribution News
Debian GNU/Linux
bits from the DPL for December 2011
Debian Project Leader Stefano 'Zack' Zacchiroli has a few short bits on his December activities. Topics include Wheezy artwork organization, Auditor work ramping up, and more. "I've gladly accepted an invite by representatives of other distributions to join a panel at FOSDEM 2012 about local user groups (e.g. Fedora ambassadors, LoCo groups for Ubuntu, etc)."
Ubuntu family
Long term support for Xubuntu, Kubuntu and Edubuntu
The upcoming Ubuntu 12.04 will be a Long Term Support (LTS) release. The Ubuntu technical board has approved applications from Xubuntu, Kubuntu and Edubuntu for these variants to also enjoy LTS.
Newsletters and articles of interest
Distribution newsletters
- Debian Project News (January 9)
- DistroWatch Weekly, Issue 438 (January 9)
- Fedora Weekly News Issue 289 (January 3)
- Maemo Weekly News (January 9)
- Ubuntu Weekly Newsletter, Issue 247 (January 8)
Is Mandriva Finished This Time? (OStatic)
OStatic reports that Mandriva may be about to shut down. "On the Mandriva Forum, Raphaël Jadot, a long-time contributor, wrote, 'everything was fine, but there is a big problem: a minor shareholder (Linlux) refuses the capital injection required for Mandriva to continue, even though the Russian investor had offered to bear it alone. Except turnaround Mandriva should cease activity Jan. 16.' No further details were made available there. But as news crept around the various forums more did emerge."
Ubuntu TV unveiled (PC Pro)
PC Pro has a brief report on the "Ubuntu TV" offering revealed by Canonical at the Consumer Electronics Show. "[Jane] Silber told us Canonical was in discussions with a number of television manufacturers, but couldn't confirm any signed deals. It will face stiff competition from Google - which only last week added LG to its roster of Google TV manufacturers - and Apple, which is widely tipped to be working on an internet television after making little impact with successive generations of its Apple TV hardware."
Page editor: Rebecca Sobol
Development
Hadoop rings in the new year with a 1.0 release
The Apache Software Foundation (ASF) has declared its Hadoop software framework ready to be called 1.0. Hadoop, a darling of the "Big Data" movement, is a framework for writing distributed applications that process vast quantities of data in parallel — where "vast" means petabyte-scale and larger, divided up across thousands of nodes. In one sense, the 1.0 release is an arbitrary declaration by the Hadoop team that the core of the framework (which has been in development for six years, and is in widespread use) has reached enterprise-level stability suitable for commercial adoption. But, coming from a top-level project at the ASF, the "1.0" also represents a commitment to long-term support from the community, and the release includes notable improvements in security, database functionality, and filesystem access.
A quick bit of background
Hadoop's core framework is an implementation of the MapReduce programming paradigm. The MapReduce approach involves dividing a large data set up among a cluster of compute nodes. The "map" step applies a single (presumably simple) function to the entire data set; that function is executed in parallel by each node on its own chunk. The "reduce" step then collects the chunks of output generated by the nodes and applies another function to combine them, as appropriate, into a result for the data set as a whole. The canonical example is performing a word frequency count on a large set of documents. The set of documents is first divided up among the nodes, and the map function splits each document into individual words, returning the words as a list. The reduce function ingests all of the word lists produced by the mappers, and increments a separate counter for each word as it progresses.
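The word-count example can be sketched in a few lines of C. This is a single-process illustration of the idea, not Hadoop code: each call to the hypothetical map_doc() stands in for one mapper node, reduce_pairs() stands in for the reducer, and the fixed MAX_WORDS and MAX_LEN limits are arbitrary simplifications. The sketch emits (word, 1) pairs rather than bare word lists, and it sorts the pairs so that equal words sit next to each other before they are counted.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 256
#define MAX_LEN   32

struct pair { char word[MAX_LEN]; int count; };

/* Map: split one document into words, appending a (word, 1) pair for each. */
static size_t map_doc(const char *doc, struct pair *out, size_t n)
{
    size_t len = 0;

    for (;; doc++) {
        if (isalpha((unsigned char)*doc)) {
            if (n < MAX_WORDS && len < MAX_LEN - 1)
                out[n].word[len++] = (char)tolower((unsigned char)*doc);
        } else {
            if (len > 0 && n < MAX_WORDS) {
                out[n].word[len] = '\0';
                out[n].count = 1;
                n++;
            }
            len = 0;
            if (*doc == '\0')
                break;
        }
    }
    return n;
}

static int by_word(const void *a, const void *b)
{
    return strcmp(((const struct pair *)a)->word,
                  ((const struct pair *)b)->word);
}

/* Reduce: sort by key, then sum the counts of adjacent pairs with equal keys. */
static size_t reduce_pairs(struct pair *pairs, size_t n)
{
    size_t out = 0;

    qsort(pairs, n, sizeof(*pairs), by_word);
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && strcmp(pairs[out - 1].word, pairs[i].word) == 0)
            pairs[out - 1].count += pairs[i].count;
        else
            pairs[out++] = pairs[i];
    }
    return out;
}
```

Calling map_doc() once per document and then reduce_pairs() on the combined array leaves one entry per distinct word, in alphabetical order, with its total count. In a real cluster the map calls would run in parallel on separate nodes.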
In practice, a MapReduce application is also responsible for the potentially trickier steps of deciding how best to partition the input data set among the available nodes, how to sort (or otherwise prepare) the output returned by the mappers, and how to read and write the massive data sets between the nodes and storage. Hadoop supports multiple options for most of these tasks, including several job allocation algorithms, and several different storage back-ends (including read-only HTTP/HTTPS servers and Amazon S3). The MapReduce framework is also responsible for managing the communication between the master node and the child mapper and reducer nodes. The approach can be multi-tiered, so that a mapper node can subdivide its chunk of the data and split it up among child mapper nodes. MapReduce is capable of working with heterogeneous compute clusters, so particularly fast or multi-processor nodes can get assigned a larger portion of the input data by the framework's job coordinator.
The concepts at the heart of MapReduce programming are not themselves exotic; similar ideas are well-known from multi-threading features in several existing languages. But MapReduce was popularized in a 2004-era USENIX paper [PDF] published by Google, which described how the search giant used MapReduce across large clusters on huge data sets; Google famously used its in-house MapReduce tools to rebuild its index of the entire web. Google's MapReduce was designed to operate on <key,value> pairs (a natural match for the string-centric computations of web search), and a second paper [PDF] described the "Google FS" (GFS) distributed filesystem that the company developed to support its MapReduce clusters.
Google's MapReduce implementation was also provided to users of its AppEngine service, but it is not free software. In 2004, Doug Cutting, co-creator of the Apache Lucene and Nutch search projects, started Hadoop as an open source implementation of the MapReduce concept while he was an employee at Yahoo. Yahoo has remained one of Hadoop's chief contributors and evangelists, and has been joined by other data-centric web companies such as Facebook and eBay. Hadoop is designed to run on commodity hardware, including heterogeneous nodes, and to scale up rapidly.
Extras
In addition to its central MapReduce framework, Hadoop ships with a number of infrastructure tools to support large MapReduce applications. The most notable is HDFS, a distributed filesystem designed to run on Hadoop clusters. In spite of the name, HDFS is not a filesystem in the Linux kernel sense of the word; it is a node-based storage system whose design mirrors that of a Hadoop cluster. A master node called the NameNode keeps track of all file metadata, and coordinates among the various DataNodes used for storage. Files are broken up into blocks and replicated among the DataNodes, based on parameters that are adjustable for optimum fault-tolerance or speed.
The new release adds several noteworthy features to the filesystem, starting with WebHDFS, a REST-like HTTP interface. This API exposes the complete filesystem interface over HTTP, via GET, PUT, POST, and DELETE, which makes it possible to manage an HDFS volume without writing custom Java or C client code. The filesystem can now also be protected against unauthorized access by requiring Kerberos authentication.
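As a rough illustration of what "REST-like" means here, the sketch below builds a WebHDFS request URL and picks the HTTP verb for a few operations. The helper names are hypothetical, the host and port in the usage note are placeholders, and the URL layout (a /webhdfs/v1 prefix followed by the file path and an op parameter) is stated from memory of the WebHDFS documentation, so it should be checked against the release in use.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a WebHDFS URL for an absolute HDFS path.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int webhdfs_url(char *buf, size_t len, const char *host, int port,
                       const char *path, const char *op)
{
    int n = snprintf(buf, len, "http://%s:%d/webhdfs/v1%s?op=%s",
                     host, port, path, op);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: HTTP verb for a handful of WebHDFS operations. */
static const char *webhdfs_method(const char *op)
{
    if (strcmp(op, "OPEN") == 0 || strcmp(op, "GETFILESTATUS") == 0 ||
        strcmp(op, "LISTSTATUS") == 0)
        return "GET";
    if (strcmp(op, "CREATE") == 0 || strcmp(op, "MKDIRS") == 0)
        return "PUT";
    if (strcmp(op, "APPEND") == 0)
        return "POST";
    if (strcmp(op, "DELETE") == 0)
        return "DELETE";
    return NULL;    /* operation not covered by this sketch */
}
```

For example, webhdfs_url(buf, sizeof(buf), "namenode.example.com", 50070, "/user/alice/data.txt", "OPEN") produces a URL that any HTTP client could fetch with the GET verb returned by webhdfs_method("OPEN").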
The MapReduce core also picks up a Kerberos authentication option, naturally. The Kerberos work is part of a larger security-hardening effort that was undertaken to prepare Hadoop for 1.0. The other security changes include stricter permissions on files and directories, enabling access control lists (ACLs) for task resources, and ensuring that an application's task processes run as non-privileged users.
Hadoop itself is written in Java and provides a full Java API for MapReduce, but there are several interfaces designed to help developers code in other languages as well. The best known are Yahoo's Pig (a high-level scripting language), Facebook's Hive (which overlays a database-like structure and offers an SQL-like query interface), and Hadoop Streaming, which provides a text-based interface exposed through stdin and stdout. Using Hadoop Streaming (which, unlike Pig and Hive, is developed within the Hadoop project itself), developers can call executables written in any language as their mapper and reducer functions, routing data through them as they would with Unix pipes.
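The pipe-like contract of Hadoop Streaming can be illustrated with a pair of C filters. This is a sketch under stated assumptions: the function names are made up, keys and values are separated by a tab, and the reducer relies on its input arriving sorted by key, so that all lines for one word are adjacent. The functions take FILE pointers so that they can be exercised without a cluster.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mapper: read text, write one "word<TAB>1" line per word. */
static void stream_map(FILE *in, FILE *out)
{
    int c, in_word = 0;

    while ((c = fgetc(in)) != EOF) {
        if (isalpha(c)) {
            fputc(tolower(c), out);
            in_word = 1;
        } else if (in_word) {
            fputs("\t1\n", out);
            in_word = 0;
        }
    }
    if (in_word)
        fputs("\t1\n", out);
}

/* Reducer: input lines arrive sorted by key; sum the counts of each run. */
static void stream_reduce(FILE *in, FILE *out)
{
    char line[256], current[256] = "";
    long total = 0;

    while (fgets(line, sizeof(line), in)) {
        char *tab = strchr(line, '\t');
        long value;

        if (!tab)
            continue;   /* skip malformed lines */
        *tab = '\0';
        value = strtol(tab + 1, NULL, 10);
        if (strcmp(line, current) != 0) {
            if (current[0])
                fprintf(out, "%s\t%ld\n", current, total);
            strcpy(current, line);
            total = 0;
        }
        total += value;
    }
    if (current[0])
        fprintf(out, "%s\t%ld\n", current, total);
}
```

Wrapping each function in a small main() that passes stdin and stdout yields two executables that behave like ordinary Unix filters; locally, the same flow can be imitated by piping the mapper's output through sort and into the reducer.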
HDFS is not designed to serve as a relational database, optimized as it is for streaming read/write performance. But the popularity of projects like Hive shows that many Hadoop users are interested in some level of database-like functionality for their MapReduce problems. Google created the proprietary BigTable to add a database layer to GFS; Hadoop created HBase to offer similar functionality for HDFS. HBase does not offer many of the features that other "NoSQL" RDBMS-replacement products advertise (such as typed columns, secondary indexes, and advanced queries). It does, however, offer a table structure and record lookups, and implements "strongly consistent" reads and writes, sharding, and some optimization techniques such as block caches and Bloom filters. Under the hood, though, HBase stores its data in HDFS files.
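For readers unfamiliar with Bloom filters, the following is a minimal generic sketch, not HBase's implementation. The filter size, the use of two hash values, and the seeded FNV-1a-style hash are all arbitrary choices made for illustration. The useful property is that a negative answer is definite, which lets a storage layer skip a lookup entirely, while a positive answer only means "possibly present".

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS 1024u

struct bloom { uint8_t bits[BLOOM_BITS / 8]; };

/* FNV-1a-style hash with a seed mixed in, used to derive two bit positions. */
static uint32_t fnv1a(const char *key, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;

    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 16777619u;
    }
    return h;
}

/* Set the two bits that correspond to key. */
static void bloom_add(struct bloom *b, const char *key)
{
    uint32_t h1 = fnv1a(key, 0) % BLOOM_BITS;
    uint32_t h2 = fnv1a(key, 0x9e3779b9u) % BLOOM_BITS;

    b->bits[h1 / 8] |= (uint8_t)(1u << (h1 % 8));
    b->bits[h2 / 8] |= (uint8_t)(1u << (h2 % 8));
}

/* False means "definitely absent"; true means "possibly present". */
static bool bloom_may_contain(const struct bloom *b, const char *key)
{
    uint32_t h1 = fnv1a(key, 0) % BLOOM_BITS;
    uint32_t h2 = fnv1a(key, 0x9e3779b9u) % BLOOM_BITS;

    return (b->bits[h1 / 8] & (1u << (h1 % 8))) &&
           (b->bits[h2 / 8] & (1u << (h2 % 8)));
}
```

A filter must start zeroed; every key added afterward is guaranteed to test as possibly present, and the false-positive rate grows as more keys share the same fixed number of bits.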
HBase is officially part of the 1.0 release, and is now a fully-supported storage option for MapReduce jobs. Like HDFS, it is accessible using either Java or REST APIs. Performance of HBase and BigTable has never been as fast as a traditional RDBMS, but Hadoop does say that the 1.0 release includes performance enhancements, particularly for access to HDFS files stored on the local disk.
The Big Data user community has already widely embraced Hadoop, with heavyweight service providers like IBM and Oracle offering Hadoop-based products, in addition to the smaller, cloud-oriented service companies and various start-ups run by Hadoop project members. In many ways, the 1.0 milestone (which according to the release notes is based on the project's 0.20-security branch) is recognition of the stability the project has already achieved.
Consequently, Hadoop add-ons may be the focus of the news-making future developments for the project. Google has reportedly moved further away from the original MapReduce itself in recent years, putting more work into the GFS and BigTable layers of the stack. The various Hadoop service providers and Big Data users (Yahoo and Facebook included) are similarly extending the Hadoop core, with projects like Pig and Hive. The list of Hadoop-derived projects includes a number of efforts to leverage Hadoop's demonstrably-stable core to take on wildly different classes of problem: database applications, data collection, machine learning, and even configuration management. As beneficial as an open source MapReduce implementation is on its own merits, this ripple effect will influence a far wider set of computing tasks in the future.
Brief items
gcc-python-plugin 0.8
Version 0.8 of gcc-python-plugin (a GCC plugin that allows the embedding of a Python interpreter into the compiler) has been released. New features include support for analyzing C++ code and improvements to the CPython API checker.
A mutt with notmuch bark
Karel Zak is a fan of the Notmuch mail indexer, but he would like a more integrated experience. Thus: "I have forked mutt to seriously integrate notmuch to this excellent e-mail client. I don't want to use symlinks or any other hacks to emulate virtual folders." Quite a few developers on the Notmuch list (which has become much more active in recent times) have expressed interest in this project; chances are it will progress quickly.
systemd v38 released
The systemd v38 release is out; this is the first release to contain "the journal." "The journal is quite complete at this time, but a small number of bigger features are still missing. Documentation is currently terse and will be extended in the coming versions. If you want to see the effect of the journal, try 'systemctl status' which is now hooked up with the journal and will show the most recent log output of a service." Note that v38 is a test release; it will be good for those wanting to see what the journal looks like, but probably should be kept far from production systems.
Newsletters and articles
Development newsletters from the last week
- Caml Weekly News (January 10)
- LibreOffice development summary (January 10)
- Perl Weekly (January 9)
- PostgreSQL Weekly News (January 8)
- Tahoe-LAFS Weekly News (January 8)
Katz: The Future of CouchDB
CouchDB creator Damien Katz appears to be forking the project with the creation of "Couchbase." "It's not that we think CouchDB isn't awesome. It's that we are creating the successor to it: Couchbase Server. A product and project with similar capabilities and goals, but more faster, more scalable, more customer and developer focused. And definitely not part of Apache."
Page editor: Jonathan Corbet
Announcements
Articles of interest
First FOSDEM 2012 Speaker Interviews
As in years past, the FOSDEM (Free and Open source Software Developers' European Meeting) team has been interviewing the main track speakers. Interviews available so far are:
- Carlos Sanchez (DevOps)
- David Chisnall (LLVM)
- Paolo Bonzini (KVM)
- Lars Wirzenius (distro development)
- Sasha Levin (native KVM tool)
- Sylvain Lebresne (Apache Cassandra)
New Books
The Linux Command Line--New from No Starch Press
No Starch Press has released "The Linux Command Line" by William E. Shotts, Jr.
Calls for Presentations
LLVM EU conference 2012 - Call for participation
The second European LLVM event will take place April 12-13, 2012 in London, UK. Proposals for keynote speakers, presentations and workshops will be accepted until February 10. "We invite academic, industrial and hobbyist speakers to present their work on developing or using LLVM and Clang. We invite abstracts for technical presentations, posters, workshops, demonstrations and BoFs relating to LLVM/Clang development and use. Material will be chosen to cover a broad spectrum of themes and topics at various depths, some technical deep-diving, some surface-scratching."
Upcoming Events
Events: January 12, 2012 to March 12, 2012
The following event listing is taken from the LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| January 12 - January 13 | Open Source World Conference 2012 | Granada, Spain |
| January 13 - January 15 | Fedora User and Developer Conference, North America | Blacksburg, VA, USA |
| January 16 - January 20 | linux.conf.au 2012 | Ballarat, Australia |
| January 20 - January 22 | Wikipedia & MediaWiki hackathon & workshops | San Francisco, CA, USA |
| January 20 - January 22 | SCALE 10x - Southern California Linux Expo | Los Angeles, CA, USA |
| January 27 - January 29 | DebianMed Meeting Southport2012 | Southport, UK |
| January 31 - February 2 | Ubuntu Developer Week | #ubuntu-classroom, irc.freenode.net |
| February 4 - February 5 | Free and Open Source Developers Meeting | Brussels, Belgium |
| February 6 - February 10 | Linux on ARM: Linaro Connect Q1.12 | San Francisco, CA, USA |
| February 7 - February 8 | Open Source Now 2012 | Geneva, Switzerland |
| February 10 - February 12 | Linux Vacation / Eastern Europe Winter session 2012 | Minsk, Belarus |
| February 10 - February 12 | Skolelinux/Debian Edu developer gathering | Oslo, Norway |
| February 13 - February 14 | Android Builder's Summit | Redwood Shores, CA, USA |
| February 15 - February 17 | 2012 Embedded Linux Conference | Redwood Shores, CA, USA |
| February 16 - February 17 | Embedded Technology Conference 2012 | San José, Costa Rica |
| February 17 - February 18 | Red Hat, Fedora, JBoss Developer Conference | Brno, Czech Republic |
| February 24 - February 25 | PHP UK Conference 2012 | London, UK |
| February 27 - March 2 | ConFoo Web Techno Conference 2012 | Montreal, Canada |
| February 28 | Israeli Perl Workshop 2012 | Ramat Gan, Israel |
| March 2 - March 4 | Debian BSP in Cambridge | Cambridge, UK |
| March 2 - March 4 | BSP2012 - Moenchengladbach | Mönchengladbach, Germany |
| March 5 - March 7 | 14. German Perl Workshop | Erlangen, Germany |
| March 6 - March 10 | CeBIT 2012 | Hannover, Germany |
| March 7 - March 15 | PyCon 2012 | Santa Clara, CA, USA |
| March 10 - March 11 | Open Source Days 2012 | Copenhagen, Denmark |
| March 10 - March 11 | Debian BSP in Perth | Perth, Australia |
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol
