
LWN.net Weekly Edition for December 9, 2010

Getting grubby with ZFS

By Jonathan Corbet
December 7, 2010
The GRUB bootloader is widely used to get Linux (and other) systems running. Its flexibility and configurability make it a logical choice for many types of computers, as does its "just works" factor: your editor cannot be the only one to smile when he realizes how long it has been since the last "I forgot to run LILO and my new kernel won't boot" episode. One of GRUB's nice features is its ability to understand filesystem structures and find bootable kernels on the fly. So the addition of support for another filesystem type would not normally be a noteworthy event. When that filesystem is ZFS, though, people will pay attention.

ZFS was developed by Sun Microsystems, and is now owned by Oracle. It offers some nice features that Linux does not (yet) have in a production-quality filesystem. ZFS, like the rest of Solaris, is licensed under the CDDL, which is not considered to be compatible with the GPLv3 license used by GRUB. Over the years, ZFS has also been the subject of a fair amount of dark murmuring with regard to a large pile of associated software patents. For these reasons, there has never been a serious push to get ZFS support into Linux.

One would think that these concerns would keep ZFS support out of GRUB as well. It turns out that one of those concerns - licensing - is not relevant for the simple reason that Sun saw fit to release some small bits of ZFS code under the GPL for the express purpose of compatibility with GRUB. The released code is not enough to run a ZFS filesystem; it's really just enough to locate and read files. Just enough, in other words, to bootstrap a ZFS-based system.

What about software patents? One would assume that Oracle would not go out of its way to sue GRUB users for using its built-in ZFS code to boot Solaris systems. Those people are, after all, Oracle's customers, and, for all the criticism of Oracle which has been heard recently, nobody has suggested that it has reached a point where it will take advice from the SCO playbook. Still, assumptions can lead to trouble; Oracle may yet hire Darl McBride once Larry Ellison retires to his yacht, it may sell the patents to somebody else, or any of a number of other things may happen. Depending on rational behavior from corporations over the long term is always a scary bet.

In this case, the GRUB maintainers (and, presumably, the Free Software Foundation, which owns the GRUB project) have decided that incorporating the code is safe. Their reasons are described in the announcement; it comes down to the fact that Oracle has distributed the code under the GPL:

Thanks to this, and due to the fact that Oracle is bound to the terms of the GNU GPL when it comes to GRUB, we believe this renders patents covering ZFS basically harmless to GRUB users. If the patents covering GRUB are held by Oracle, they can't use them against GRUB users, and if they're held by other parties, the GPL provisions will prevent Oracle from paying a tax only for themselves, so they will fight alongside the community instead of betraying it.

The announcement goes on to suggest that anybody who cares about the freedom of all their users should always release code under the latest version of the GPL.

There is an interesting implication here. The FSF is counting on Oracle being bound by the strengthened patent clauses found in GPLv3. But the code found in Solaris was never explicitly distributed under GPLv3; it is under a GPLv2+ license. The code only became explicitly GPLv3 when it was moved into the GNU-run Savannah repository. The FSF is saying that, thanks to the "or any later version" language in the copyright notice, users of the ZFS code can assume that Oracle is bound by the more explicit GPLv3 patent language even though GPLv3 did not exist when the code was released. They are probably right.

GPLv2 arguably contains an implicit patent grant. But it certainly does not have the Novell-inspired "you can't buy a license for your users only" language. Sun's lawyers may not have thought that they were giving the FSF the right to further bind Sun's actions with regard to patents through updated versions of the GPL. Using the "or any later version" language hands a powerful blank check to whoever controls later versions of the license.

The merging of the ZFS code raises eyebrows for another reason: neither Sun nor Oracle has assigned ownership of this code to the FSF. The Foundation's policy is clear: it needs to obtain assignment, or, failing that, a complete disclaimer of rights on the code; the ZFS code comes with neither. This exception to policy is justified this way:

The ZFS code that has been imported into GRUB derives from the OpenSolaris version of GRUB Legacy. On one hand, this code was released to the public under the terms of the GNU GPL. On the other, binary releases of Solaris included this modified GRUB, and as a result Oracle/Sun is bound by the GPL.

We believe that these two factors give us very strong reassurance that: a) Oracle owns the copyright to this code and b) Oracle is licensing it under GPL, and therefore it is completely safe to use this in GRUB.

The FSF has often claimed that copyright assignment is required in order to be able to prosecute infringement cases. Either the FSF no longer believes this, or it has decided that license enforcement will never be necessary for GRUB. It's hard to find any other possible explanations for this decision.

The FSF has also pronounced as "safe" a chunk of code which was never submitted for inclusion by its authors, and which is owned by a company which is known for its active legal department. This is the company which is currently suing Google over an alternative Java implementation, after all. Perhaps the FSF has a hush-hush agreement with Oracle regarding the merging of this code, but that seems unlikely. Merging the code is almost certainly safe without such an agreement, but it would be a stretch to say that it is safer than merging code from individual contributors who do not wish to assign their copyrights to the FSF. If this code can be safely accepted without copyright assignment, so can contributions from many others.

Might the FSF be slowly rethinking its position on copyright assignment? There have been few signs of any such deliberation, but the acceptance of the ZFS code sets an interesting precedent. Perhaps the FSF has an internal policy saying that unassigned code is acceptable if it comes from an Oracle-sized corporation? It would be nice to know what the FSF is really thinking.


Mozilla's open web app platform

December 8, 2010

This article was contributed by Nathan Willis

The Mozilla Labs project is rolling out a framework it calls Open Web Apps, intended to improve the "stickiness" and operating system integration of web-based applications. The framework uses HTML5 features like local storage and existing standards like OpenID to create an installation workflow that more closely mimics the process traditionally used with desktop applications. The project was officially announced in October, and the first bits of code have now started to appear on GitHub.

Based on the initial announcement in October and a foreshadowing post from May, the theory underlying Open Web Apps seems to be that, under the current paradigm, users have little more than their browser's bookmarking system to keep track of web applications that they frequently use. As a result, web applications (in spite of their growing popularity) remain segregated from the rest of the OS experience — they do not have a persistent presence, they all behave differently in regard to sign-on procedures, and so forth.

On top of that, the May post suggests that as more web developers build web applications disguised as mobile applications for consumer smartphones like Android or Apple's iPhone, they have grown to like the browsable, searchable, rate-able interface of the "mobile app store."

What it is

[Dashboard]

The Open Web Apps experiment attempts to solve both of these problems at once. On the web apps' side, it describes a JSON-based application "manifest" file that each application would serve up to describe basic metadata about itself — name, icon, creator, launch path, verification URL, and a set of basic capabilities. On the browser side, it lays out a standard for a web app "repository" (which could be implemented directly in browser code, as an extension, or via JavaScript) made up of a locally-stored collection of these manifests.
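As a rough illustration, a manifest carrying the metadata listed above might be handled as in the following Python sketch. The field names (name, icon, creator, launch_path, verification_url, capabilities) and the example.com URLs are assumptions made for this example; the actual manifest schema is not reproduced here and may well differ.

```python
import json

# Hypothetical field names; the real manifest schema may differ.
REQUIRED_FIELDS = ("name", "icon", "creator", "launch_path")


def parse_manifest(text):
    """Parse a JSON manifest and check that the basic metadata is present."""
    manifest = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    if missing:
        raise ValueError("manifest is missing: " + ", ".join(missing))
    # Treat the verification URL and capabilities as optional in this sketch.
    manifest.setdefault("verification_url", None)
    manifest.setdefault("capabilities", [])
    return manifest


EXAMPLE_MANIFEST = json.dumps({
    "name": "TaskTracker",
    "icon": "https://tasktracker.example.com/icon.png",
    "creator": "Example Developer",
    "launch_path": "/index.html",
    "verification_url": "https://store.example.com/verify",
    "capabilities": ["geolocation"],
})
```

Calling parse_manifest(EXAMPLE_MANIFEST) returns the metadata as a dictionary, while a manifest lacking any of the assumed required fields is rejected with a ValueError.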

The repository has two APIs: one that web sites can use to offer the user an "install this web app" option, and one that the browser can use to show the user his or her currently installed web apps. There is a JavaScript-based demo running at myapps.mozillalabs.com that uses this user-facing API to create a dashboard, showing a launcher for each app in the repository as well as an uninstall option.
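The split between the two APIs can be sketched in a few lines of Python. Everything here is hypothetical: the class and method names are invented for illustration, a plain dictionary stands in for HTML5 local storage, and the user_confirms callback stands in for whatever confirmation prompt a real repository would show.

```python
class Repository:
    """In-memory stand-in for the locally stored collection of manifests."""

    def __init__(self):
        self._apps = {}  # origin URL -> manifest dictionary

    # Site-facing API: a web site offers the user an "install this web app" option.
    def install(self, origin, manifest, user_confirms):
        if not user_confirms(manifest["name"], origin):
            return False  # the user declined, so nothing is stored
        self._apps[origin] = dict(manifest)
        return True

    # User-facing API: the dashboard lists and removes installed apps.
    def list_installed(self):
        return sorted(self._apps.items(), key=lambda item: item[0])

    def uninstall(self, origin):
        return self._apps.pop(origin, None) is not None
```

A dashboard like the demo would only need list_installed() to draw its launchers and uninstall() for the removal option; web sites would never see the list of other installed apps.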

[Store demo]

At apps.mozillalabs.com (be sure to note the absence of "my"), the project has several demonstration implementations of the server-side code that illustrate different possibilities. A simple app can "self-publish," meaning that it offers its own manifest file and "install me" button, but interested third parties could also build directories, cataloging manifests found in the wild and presenting them to site visitors in categories and with rankings. There is also a "store" demonstration that illustrates the optional verification scheme, which can be used to hand off login via OpenID or even to charge an online payment before returning a successful install.

At the moment, the feature set offered by the demos is a little thin. There are a half-dozen apps available, but the only one that uses the paid-verification architecture is a fake app called TaskTracker which does not actually charge any money ... but neither is it a real app. The dashboard demo has big, glossy icons, but it also does not offer any genuine functionality beyond the standard Firefox bookmarks the system is supposed to be replacing.

As a result, it is easy to imagine that the manifest system could be good for web app developers if the "app store" model does indeed take off (Mozilla makes it clear repeatedly in the documentation that it is not interested in running such a store or directory). The ranking and sorting could be beneficial, and the unified verification/payment method would simplify sign-up. But there is not as much to like from the end user's point of view. Launchers are just launchers, regardless of the size of the icon.

Extending the idea

Moving forward, however, there may still be some interesting offerings in future versions of the architecture. The capabilities field, for example, has yet to be fully explored, but exposing what an application can do in advance could help users search for the apps they want. The wiki lists a handful of proposed capabilities, including geolocation support, media capture, read/write file access, read access to contacts, and so on.

Apart from geolocation, few current web applications make use of capabilities that users might care to seek out or specifically avoid, but more are presumably on the way. Mozilla's own Rainbow project exposes desktop audio and video recording hardware to web applications, for example. The existing capabilities list comes from the W3C's Permissions working group. Elsewhere the documentation and blog posts mention 3-D rendering, which might also be a viable candidate.

A blog post from November introduces an enhancement to the original scheme that does offer clear benefits to the user: synchronization of repositories between multiple computers. Code for this feature is already available on GitHub, though interestingly enough, as a separate server. The functionality to synchronize client data between browsers is already present in Firefox Sync (formerly Weave), though, so app repository synchronization may make it there someday.

Some of the features described as possibilities for future client-side enhancement cannot be implemented in the JavaScript-based demo dashboard running at myapps.mozillalabs.com due to the need to access lower-level browser code. The project says that add-on-based implementations will follow, presumably for Firefox first and Firefox for Mobile, though possibly for other HTML5 browsers as well.

Another proposed enhancement to the architecture that has implications for app developers is support for cryptographically signed manifests, which would allow the browser to verify that a manifest has not been altered by an attacker. The manifest specification is still undergoing revision, including a discussion on how best to let an application delegate installation authority to a third party, that is, to let an app manifest specify which stores and directories are authorized to sell (or perhaps even list) it.
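The tamper-detection idea behind signed manifests can be sketched as follows. This is not the proposed scheme: a real design would presumably use public-key signatures, whereas this example swaps in an HMAC with a shared secret so that it stays within Python's standard library. It shows how an altered manifest is detected, not how trust would actually be established, and all function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(manifest):
    # Serialize deterministically so the same manifest always yields the same bytes.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest, key):
    """Return a hex HMAC-SHA256 tag over the canonical form of the manifest."""
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest, signature, key):
    """Check the tag using a constant-time comparison."""
    expected = sign_manifest(manifest, key)
    return hmac.compare_digest(expected, signature)
```

With this in place, changing any field of a stored manifest (the launch path, say) causes verify_manifest() to return False for the original signature.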

Further out, the project mentions several ideas for extending the repository and dashboard functionality to provide better OS integration, such as a notification framework, cross-application search methods, and possible support for cross-site user-experience schemes like OExchange, which could be used to link user content from several different apps into a single unified set of documents.

Security

Wherever cross-site functionality is concerned, security becomes an issue. The project has a dedicated page outlining all of the possible security and privacy concerns it knows of in the Open Web App architecture, and where possible, potential solutions.

Because the system is primarily used as a way to connect to third-party sites, most of the potential attack vectors are not direct exploits of the web app in question (such as stealing a user's Gmail password); those would be security holes in the service itself. Rather, the page describes attacks against the repository, the installation and verification functions, and the dashboard.

Some aspects of the system do not introduce any new attack vectors. Tampering with the repository itself or any installed app's manifest amounts to an attack on the browser's implementation of HTML5 local storage — though it should also be noted that the signed manifests proposal mentioned earlier is a safeguard against this. Likewise, intercepting application launch via a man-in-the-middle attack amounts to performing the same attack against the existing site's OpenID login implementation.

On the other hand, it would be possible to build a man-in-the-middle dashboard that intercepted installation or launch requests and delivered tainted goods to the client. This is only possible with a JavaScript-based, hosted dashboard, as opposed to a native browser dashboard implementation. The demo dashboard at myapps.mozillalabs.com, of course, is one such hosted dashboard. The project page suggests implementing dashboards only over HTTPS to provide a layer of protection against this attack. It also notes, however, that if browsers begin to implement the dashboard in local code, the attack vector disappears.

Finally, it would be possible to build a malicious "app store" that, through iframe defacement, tricks the user into installing a different application than the one they intend. The page notes that Firefox 4's Content Security Policy can protect against this vulnerability.

Appzilla returns

Strangely absent from the Open Web App project documentation is how the scheme could fit in with Mozilla's other web application / desktop integration product, Mozilla Prism. Prism is the renamed XULRunner browser, which can be used to launch sites in separate processes that behave more like a native application on a desktop system — living in the system tray, running at startup, and so forth. Some of the proposed extensions to the Open Web App architecture sound like they would be a good fit for Prism, but there is no indication that native repository functionality is headed in Prism's direction.

The major challenge facing Open Web App's growth, however, is not lack of browser support, but the effort that would be required to convince web developers to create browser-agnostic sites. Written all over the Open Web App documentation (starting with the name) is the notion that compliant apps should be based on free and open standards: HTML5, CSS, and JavaScript. But just saying that doesn't make it happen. Nothing in the system prevents developers from building IE-only or iPhone-only sites and slapping a compliant manifest file up on the server — it will just fail to work properly once installed in a different browser.

Still, that is a hurdle that can only be overcome with evangelism, not with specifications. The development community is at least aware of the difference. On Tuesday, Google unveiled a similarly-themed "app store" designed to function solely with its Chrome browser. During the press conference, a Twitter message from one reader was re-tweeted multiple times, asking "So why again are we building web apps 'for Chrome' instead of for the Web?" If Mozilla is correct about the growing desire of web application developers to have an "app store" model in which to hawk their wares to the public, it can only find that question encouraging — but it may still face a long slog uphill to make truly cross-browser applications the standard.


The 2010 Linux and free software timeline - Q2

Here is LWN's thirteenth annual timeline of significant events in the Linux and free software world for the year.

In what is becoming a fairly standard pattern, 2010 brought various patent lawsuits, company acquisitions, new initiatives, and new projects. It also brought new releases of the software that we use on a daily basis. There were licensing squabbles and development direction disagreements—all things that we have come to expect from the Linux and free software world over a year's time. Also as expected, though, were the improvements in the kernel, applications, distributions, and so on that make up that world. Linux and free software just keep chugging along, and we are very happy to be able to keep on reporting about it.

Like last year, we will be breaking this up into quarters, and this is our report on April-June 2010. Over the next month or so, we will be putting out timelines of the other quarters of the year.


This is version 0.8 of the 2010 timeline. There are almost certainly some errors or omissions; if you find any, please send them to timeline@lwn.net.

LWN subscribers have paid for the development of this timeline, along with previous timelines and the weekly editions. If you like what you see here, or elsewhere on the site, please consider subscribing to LWN.

For those with a nostalgic bent, our timeline index page has links to the previous twelve timelines and some other retrospective articles going all the way back to 1998.

April

Since Emacs is just an editor, not a god, it cannot do miracles.

-- Richard Stallman


Subversion puts out a proposed vision and roadmap for the version control system (VCS), which recognizes that it has "no future" as a distributed VCS (DVCS) (proposal).

The Embedded Linux Conference is held in San Francisco (LWN coverage: Android and the community, Embedded Linux status, and Using LTTng).

You can't modify Fedora under F/OSS principles and still call it Fedora, just like you can't modify Firefox under F/OSS principles and still call it Firefox. Both of us do this to protect the good name of the project. We'd be in an extremely glass house-y situation if we tried to 'call out' Mozilla over this. It'd be ridiculous.

-- Adam Williamson

The apache.org infrastructure is attacked in a direct, targeted fashion using cross-site scripting and password brute-forcing (report).

Perl 5.12.0 is released and the project moves to a time-based yearly release schedule (announcement).

Java inventor James Gosling leaves Oracle shortly after Oracle's acquisition of Sun (blog post).

The Linux Foundation Collaboration Summit is held in San Francisco (LWN coverage: Some notes and MeeGo).

Stefano Zacchiroli is elected as Debian Project Leader, succeeding Steve McIntyre (results).

GCC 4.5.0 is released (LWN coverage).

The Qubes security-oriented, virtualization-based open source OS is announced; it is built atop Xen and Linux (announcement, LWN coverage).

Ubuntu 10.04 LTS ("Lucid Lynx") is released (announcement).

May

Lennart Poettering announces "systemd" as a replacement for init, and it has gained traction in both Fedora and openSUSE though it has yet to be released in either distribution (announcement).

I resent being called an imaginary user. Being imaginary would seriously screw with my weekend plans.

-- Peter Hutterer

Red Hat and Novell fend off patent suit by IP Innovation, which, as its name might imply, is a patent troll. The suit was over some very broad patents that ended up being invalidated (LWN coverage of the suit, Groklaw coverage of the outcome).

All video codecs are covered by patents. A patent pool is being assembled to go after Theora and other "open source" codecs now. Unfortunately, just because something is open source, it doesn't mean or guarantee that it doesn't infringe on others patents. An open standard is different from being royalty free or open source.

-- Steve Jobs

Free Software Foundation Europe (FSFE) founder Georg Greve receives the German Cross of Merit (announcement).

The code for Ryzom, a massively multiplayer online role-playing game (MMORPG), is released as free software after several years of almost being freed (announcement, 2008 LWN coverage).

Mandriva looks for a buyer (news article (in French), Google translation).

Linux 2.6.34 is released (announcement, KernelNewbies summary).

The answers to your Security Questions are case sensitive and cannot contain special characters like an apostrophe, or the words "insert," "delete," "drop," "update," "null," or "select."

-- Novel SQL injection protection as reported on BoingBoing

Linux Mint 9 is released (announcement).

Google launches the WebM media format for the web, which includes the VP8 video codec acquired when it bought On2, the Vorbis audio codec, and the Matroska media container format (announcement, LWN coverage).

Fedora 13 is released (announcement).

The Diaspora project forms to develop a privacy-friendly alternative to Facebook and other social networking sites. Its request for $10,000 in funding results in more than 20x as much in donations (LWN coverage).

The Libre Graphics meeting is held in Brussels (LWN coverage).

MeeGo 1.0 is released (announcement, LWN review).

The Free Software Foundation asks Apple's App Store to comply with the GPL on an iPhone port of GNU Go, which leads to Apple removing the app from the store (FSF blog post and update, LWN coverage).

June

Thrilled to read that Intel finally did the right thing, and dropped the requirement for (C) assignment (of whatever form) to be able to contribute to clutter - making it a truly open project; nice! I feel a sudden urge to contribute, something, anything now it belongs to us all.

-- Michael Meeks


The Linaro consortium is announced, which seeks to simplify the ARM Linux landscape (announcement, LWN article).

Rockbox 3.6 is released, with many new features for the free music player firmware (announcement, LWN review).

LinuxTag is held in Berlin, Germany (LWN coverage: Mark Shuttleworth, Thomas Gleixner, and Stefano Zacchiroli).

Another, seemingly final, setback for SCO in SCO v. Novell (Groklaw report).

Most mixers are self-contained and not hackable, but Siciliano says many home automation systems tap into appliances such as blenders and coffee machines. These home networks are then open to attack in surprising ways: A hacker might turn on the blender from outside your home to distract you as he sneaks in a back window, he warns.

-- Fox News hypes "hacker" threats

SouthEast LinuxFest (SELF) is held in Spartanburg, South Carolina (USA) (LWN coverage).

GNOME finalizes speaker guidelines, which are meant to reduce friction and present a more welcoming face to newcomers (guidelines, LWN coverage).

The US Supreme Court rules in the Bilski case, which affirms the lower court's ruling against the Bilski patent, but does not make hoped-for changes to the patentability of software (LWN article).

File locking on Linux is just broken. The broken semantics of POSIX locking show that the designers of this API apparently never have tried to actually use it in real software. It smells a lot like an interface that kernel people thought makes sense but in reality doesn't when you try to use it from userspace.

-- Lennart Poettering

FFmpeg 0.6 is released with support for WebM and better HTML5 compatibility (announcement).

The Electronic Frontier Foundation (EFF) launches HTTPS Everywhere, which is a Firefox plugin to promote better web security (LWN article).

Jared Smith becomes the new Fedora Project Leader, succeeding Paul Frields (announcement).


Page editor: Jonathan Corbet

Security

Pathname-based hooks for SELinux?

By Jake Edge
December 8, 2010

A patch that would add the last path component as a parameter to the Linux security module (LSM) hooks for inode creation raised a few eyebrows. It looked to be an attempt to add pathname-based hooks for SELinux—after many SELinux developers took strong stands against those kinds of hooks when they were proposed for AppArmor and, later, TOMOYO. But, this change would not add pathname-based access controls to SELinux, and would, instead, allow it to make decisions about the label it applies to a new inode based on the filename being created. Still, there are questions about whether this is just an ad hoc change to the LSM API for SELinux, and whether there are other hooks that might benefit from similar treatment.

The patches, which were proposed by Eric Paris on the linux-security-module mailing list, are fairly straightforward. The first simply adds a struct qstr pointer to the inode_init_security() hook and changes all the calls to it that are made, mostly in various filesystems. A qstr is a "quick string" object from the directory entry cache, which contains the filename and some additional information (length and hash). The other patch in the set changes SELinux so that it can use that information in its policies:

Currently SELinux has rules which label new objects according to 3 criteria. The label of the process creating the object, the label of the parent directory, and the type of object (reg, dir, char, block, etc.) This patch adds a 4th criteria, the dentry name, thus we can distinguish between creating a file in an etc_t directory called shadow and one called motd.

There is no file globbing, regex parsing, or anything mystical. Either the policy exactly (strcmp) matches the dentry name of the object or it doesn't. This patch has no changes from today if policy does not implement the new rules.
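The labeling decision described in the patch can be modeled in a few lines of Python. This is a simplified sketch, not SELinux's policy engine: the rule table, the label names (such as shadow_t and admin_t), and the fallback to the parent directory's label are assumptions made for illustration.

```python
# Rules keyed on (process label, parent directory label, object class, name).
# A name of None marks an existing-style rule that uses only three criteria.
RULES = {
    ("admin_t", "etc_t", "file", "shadow"): "shadow_t",
    ("admin_t", "etc_t", "file", None): "etc_t",
}


def new_object_label(rules, process, parent, obj_class, name):
    """Pick the label for a newly created object."""
    # Fourth criterion: exact comparison on the name, with no globbing or regexes.
    named = rules.get((process, parent, obj_class, name))
    if named is not None:
        return named
    # Otherwise fall back to the three-criteria rule, if there is one.
    generic = rules.get((process, parent, obj_class, None))
    if generic is not None:
        return generic
    # Assumed default for this sketch: inherit the parent directory's label.
    return parent
```

Under these rules a file named shadow created in an etc_t directory gets shadow_t, while motd (or shadow.bak, since matching is exact) gets etc_t. A rule table without any named entries behaves exactly as before, mirroring the statement that nothing changes if policy does not implement the new rules.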

But the inclusion of path information was enough to get a rise out of Casey Schaufler: "I see. Pathname based controls. In SELinux." He went on to note that AppArmor and TOMOYO had made similar arguments to Paris's and that there are already pathname-based hooks that were added to support those two solutions. But Paris is quick to point out that he is not implementing pathname-based access controls (which is what AppArmor and TOMOYO implement), but is only adding additional information for decisions about labeling new filesystem objects:

The intention is to remove some particularly gross userspace hacks related to new object labeling (read udev/restorecond/anything to do with /var/run, etc). It simplifies userspace, removes numerous races, and does so with no reduction in security (and theoretically the possibility of a more secure system)

Schaufler does not completely buy that argument because of the way labels are typically maintained in an SELinux system, i.e. using user-space utilities like restorecond that are pathname-based: "Yes, the kernel component of SELinux relies strictly on the labels, but the reality is that SELinux is heavily dependent on the user space component to maintain the proper labels on files so that the specified policy is rational." Stephen Smalley agreed with that to some extent, but tried to clarify the role of pathnames in SELinux:

That fact that we are already using the parent directory context as an input in computing the security context of a new files means that our file labeling logic is already "path-based" in a certain sense. It isn't solely path-based (either before or after this change), but it is already taking into account the placement of the file when it is created. This just refines the granularity at which we can make such decisions.

Smalley also explains more about the kinds of race conditions that the patch is trying to avoid:

restorecond and udev relabeling of kernel-created dev nodes are inherently racy - the file is not created in the desired security context initially, and must be relabeled by some userspace component that notices that the file has been created. Kernel support for incorporating the last component name as an additional input enables us to label certain files correctly upon creation and thus avoids that problem entirely.

Furthermore, Smalley said, the pathname-based hooks that are currently available in the LSM API are not usable to solve this problem because they don't address the issue of assigning labels to new inodes. The existing hooks are "about enforcing access control upon file accesses based on the pathname used to reach the file". The SELinux community has reached a consensus that the proposed change is needed, Smalley said, and the only real question in his mind was whether the changes were acceptable to the Linux virtual filesystem (VFS) and various filesystem developers.

While Schaufler recognizes that the SELinux community is fully behind the change, he wonders if there are other hooks that could also benefit from the filename information:

One of the concerns that has traditionally been raised when new LSM hooks or changes to existing hooks are proposed is that of generality. I can think of a number of ways in which the final component of a pathname could be used to make access control decisions, but I would not expect to be using them myself. Who else might you expect to make use of this LSM "enhancement", or is this something that only SELinux is ever going to want? Is the component something the LSM should be providing in general, or is this the only case in which it makes sense?

He goes on to point out that the LSM API is inconsistent and arbitrary, so it would make sense to look at the "bigger picture" before hacking in a change specifically for SELinux. As an example, he posits a possible access control mechanism that uses file extensions to make decisions ("only files suffixed with '.exe' can be executed and only files suffixed with '.so' can be mmapped"). Smalley believes that kind of access control could be done with the existing pathname-based hooks, but Kyle Moffett came up with another place where the filename information might be useful, even for SELinux:

While you of course cannot (and should not) *change* the label of a file in a link() or rename() operation, it would potentially be useful to deny an operation based on the old label and the new name that is being passed in.

The example Moffett gives would deny a compromised web application the ability to rename or link to the .htaccess file in its directories.
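Moffett's suggestion amounts to a check on the pair of existing label and proposed name. A minimal Python sketch of such a check follows; the deny table, the function name, and the label web_content_t are hypothetical, and nothing like this exists in the posted patches.

```python
# (label of the existing file, proposed new final path component) pairs to refuse.
DENY_NEW_NAME = {
    ("web_content_t", ".htaccess"),
}


def may_link_or_rename(old_label, new_name, deny=DENY_NEW_NAME):
    """Allow the operation unless this label/name combination is denied.

    The file's label is never changed here; the decision only uses the
    old label and the new name being passed in.
    """
    return (old_label, new_name) not in deny
```

With this table, a web application could still rename its content files freely, but an attempt to rename or link one of them to .htaccess would be refused.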

So far, none of the VFS or filesystem hackers have spoken up one way or another, so it is unclear whether this change will be acceptable to them. The LSM API is something of a kernel outcast—or so it appears at times—as no one is particularly satisfied with it, yet it is an integral part of the kernel security landscape. Sometimes that means that various "hacks" get added for specific security solutions, without looking at the overall picture, which is rather unfortunate. It may well be that this change is adopted, as is, without considering other potential users or consistency in the API.

Comments (4 posted)

Brief items

Security quotes of the week

Within a couple days' time, the WikiLeaks web content has been spread across enough independent parts of the Internet's DNS and routing space that they are, for all intents and purposes, now immune to takedown by any single legal authority. If pressure were applied, one imagines that the geographic diversity would simply double, and double again.
-- James Cowie

Unfortunately, my government does not agree with my definition of winning. They think that living in fear and trying desperately to keep us all 100% safe while flying is the most effective way to fight terrorism. It reminds me of a boss that told me he liked it when people lived in fear of being fired, they worked harder. I told him being fired held no fear for me. When you live in fear, you do irrational things - like sending millions of people's shoes through an xray scanner every day.
-- Stormy Peters

Some of them call terrorism an "existential threat" against our nation. It's not. Even the events of 9/11, as horrific as they were, didn't make an existential dent in our nation. Automobile-related fatalities -- at 42,000 per year, more deaths each month, on average, than 9/11 -- aren't, either. It's our reaction to terrorism that threatens our nation, not terrorism itself. The empty monument would symbolize the empty rhetoric of those leaders who preach fear and then use that fear for their own political ends.
-- Bruce Schneier on closing the Washington Monument (worth reading in its entirety)

Because if a group of well-planned and well-funded terrorist plotters makes it to the airport, the chance is pretty low that those blue-shirted crotch-groping water-bottle-confiscating TSA agents are going to catch them. The agents are trying to do a good job, but the deck is so stacked against them that their job is impossible. Airport security is the last line of defense, and it's not a very good one.
-- Bruce Schneier (yet again)

Comments (none posted)

Back door in ProFTPD FTP server (The H)

The H has an article about a back door that was recently put into the ProFTPD server code. "The back door provides the attackers with complete access to systems on which the modified version of the server has been installed. On installation, the modified version informs the group behind the back door by contacting an IP address in the Saudi Arabia area. Entering the command 'HELP ACIDBITCHEZ' results in the modified server displaying a root shell. [...] Ironically, to place their back door, the attackers used a zero day vulnerability in ProFTPD itself, which the developers were using to make the source code available to users." (Thanks to Jan-Frode Myklebust who gave us a heads-up about this issue).

Comments (39 posted)

Interesting kernel exploit posted

Dan Rosenberg has posted a new local kernel compromise program to the full-disclosure list. It is interesting in that it requires the simultaneous exploitation of three different vulnerabilities to work. This particular program is not useful for anybody wanting to take over a system, but: "However, the important issue, CVE-2010-4258, affects everyone, and it would be trivial to find an unpatched DoS under KERNEL_DS and write a slightly more sophisticated version of this that doesn't have the roadblocks I put in to prevent abuse by script kiddies."

Full Story (comments: 15)

New vulnerabilities

acroread: code execution

Package(s):acroread CVE #(s):CVE-2010-4091
Created:December 2, 2010 Updated:March 7, 2011
Description:

From the Red Hat advisory:

A specially-crafted PDF file could cause Adobe Reader to crash or, potentially, execute arbitrary code as the user running Adobe Reader when opened. (CVE-2010-3654, CVE-2010-4091)

Alerts:
SUSE SUSE-SA:2011:011 2011-03-07
openSUSE openSUSE-SU-2011:0156-1 2011-03-07
Gentoo 201101-08 2011-01-21
SUSE SUSE-SA:2010:058 2010-12-08
openSUSE openSUSE-SU-2010:1030-1 2010-12-07
Red Hat RHSA-2010:0934-01 2010-12-01
Gentoo 201201-19 2012-01-30

Comments (none posted)

bareftp: privilege escalation

Package(s):bareftp CVE #(s):CVE-2010-3350
Created:December 8, 2010 Updated:December 8, 2010
Description: From the CVE entry:

bareFTP 0.3.4 places a zero-length directory name in the LD_LIBRARY_PATH, which allows local users to gain privileges via a Trojan horse shared library in the current working directory.

Alerts:
Fedora FEDORA-2010-18323 2010-11-29
Fedora FEDORA-2010-18310 2010-11-29

Comments (none posted)

bind: multiple vulnerabilities

Package(s):bind9 CVE #(s):CVE-2010-3613 CVE-2010-3614
Created:December 2, 2010 Updated:January 27, 2011
Description:

From the Ubuntu advisory:

It was discovered that Bind would incorrectly allow a ncache entry and a rrsig for the same type. A remote attacker could exploit this to cause Bind to crash, resulting in a denial of service. (CVE-2010-3613)

It was discovered that Bind would incorrectly mark zone data as insecure when the zone is undergoing a key algorithm rollover. (CVE-2010-3614)

Alerts:
CentOS CESA-2010:1000 2011-01-27
Red Hat RHSA-2010:1000-01 2010-12-20
Slackware SSA:2010-350-01 2010-12-17
Mandriva MDVSA-2010:253 2010-12-14
CentOS CESA-2010:0976 2010-12-14
Red Hat RHSA-2010:0976-01 2010-12-13
Red Hat RHSA-2010:0975-01 2010-12-13
Debian DSA-2130-1 2010-12-10
Fedora FEDORA-2010-18469 2010-12-02
openSUSE openSUSE-SU-2010:1031-1 2010-12-08
Fedora FEDORA-2010-18521 2010-12-03
Ubuntu USN-1025-1 2010-12-01
Gentoo 201206-01 2012-06-02

Comments (none posted)

clamav: multiple vulnerabilities

Package(s):clamav CVE #(s):CVE-2010-4260 CVE-2010-4479 CVE-2010-4261
Created:December 7, 2010 Updated:December 24, 2010
Description: From the Mandriva advisory:

Multiple unspecified vulnerabilities in pdf.c in libclamav in ClamAV before 0.96.5 allow remote attackers to cause a denial of service (application crash) or possibly execute arbitrary code via a crafted PDF document (CVE-2010-4260, CVE-2010-4479).

Off-by-one error in the icon_cb function in pe_icons.c in libclamav in ClamAV before 0.96.5 allows remote attackers to cause a denial of service (memory corruption and application crash) or possibly execute arbitrary code via unspecified vectors. NOTE: some of these details are obtained from third party information (CVE-2010-4261).

Alerts:
Gentoo 201110-20 2011-10-23
SUSE SUSE-SR:2010:024 2010-12-23
Fedora FEDORA-2010-18564 2010-12-05
Ubuntu USN-1031-1 2010-12-10
openSUSE openSUSE-SU-2010:1041-1 2010-12-10
Fedora FEDORA-2010-18568 2010-12-05
Mandriva MDVSA-2010:249 2010-12-07

Comments (none posted)

epiphany: arbitrary https web site spoofing

Package(s):epiphany CVE #(s):CVE-2010-3312
Created:December 7, 2010 Updated:January 25, 2011
Description: From the CVE entry:

Epiphany 2.28 and 2.29, when WebKit and LibSoup are used, unconditionally displays a closed-lock icon for any URL beginning with the https: substring, without any warning to the user, which allows man-in-the-middle attackers to spoof arbitrary https web sites via a crafted X.509 server certificate.

Alerts:
SUSE SUSE-SR:2011:002 2011-01-25
openSUSE openSUSE-SU-2011:0024-1 2011-01-12
SUSE SUSE-SR:2010:023 2010-12-08
openSUSE openSUSE-SU-2010:1027-1 2010-12-07

Comments (none posted)

gnupg: code execution

Package(s):gnupg CVE #(s):
Created:December 6, 2010 Updated:December 10, 2010
Description: From the rPath advisory:

A use-after-free vulnerability in kbx/keybox-blob.c in GPGSM in GnuPG could allow remote attackers to cause a denial of service (crash) and possibly execute arbitrary code by tricking a user into importing a certificate with a large number of Subject Alternate Names.

Alerts:
rPath rPSA-2010-0076-1 2010-12-06

Comments (1 posted)

imagemagick: privilege escalation

Package(s):imagemagick CVE #(s):CVE-2010-4167
Created:December 8, 2010 Updated:March 22, 2012
Description: From the CVE entry:

Untrusted search path vulnerability in configure.c in ImageMagick before 6.6.5-5, when MAGICKCORE_INSTALLED_SUPPORT is defined, allows local users to gain privileges via a Trojan horse configuration file in the current working directory.

Alerts:
Fedora FEDORA-2010-19056 2010-12-18
Fedora FEDORA-2010-19025 2010-12-17
Ubuntu USN-1028-1 2010-12-07
Red Hat RHSA-2012:0301-03 2012-02-21
Oracle ELSA-2012-0301 2012-03-07
Scientific Linux SL-Imag-20120321 2012-03-21
Red Hat RHSA-2012:0544-01 2012-05-07
CentOS CESA-2012:0544 2012-05-07
Scientific Linux SL-Imag-20120508 2012-05-08
Oracle ELSA-2012-0544 2012-05-08
Mandriva MDVSA-2012:077 2012-05-17

Comments (none posted)

kernel: multiple vulnerabilities

Package(s):kernel CVE #(s):CVE-2010-4075 CVE-2010-4077 CVE-2010-4248
Created:December 6, 2010 Updated:August 9, 2011
Description: From the CVE entries:

The uart_get_count function in drivers/serial/serial_core.c in the Linux kernel before 2.6.37-rc1 does not properly initialize a certain structure member, which allows local users to obtain potentially sensitive information from kernel stack memory via a TIOCGICOUNT ioctl call. (CVE-2010-4075)

The ntty_ioctl_tiocgicount function in drivers/char/nozomi.c in the Linux kernel 2.6.36.1 and earlier does not properly initialize a certain structure member, which allows local users to obtain potentially sensitive information from kernel stack memory via a TIOCGICOUNT ioctl call. (CVE-2010-4077)

Race condition in the __exit_signal function in kernel/exit.c in the Linux kernel before 2.6.37-rc2 allows local users to cause a denial of service via vectors related to multithreaded exec, the use of a thread group leader in kernel/posix-cpu-timers.c, and the selection of a new thread group leader in the de_thread function in fs/exec.c. (CVE-2010-4248)

Alerts:
Ubuntu USN-1218-1 2011-09-29
Ubuntu USN-1216-1 2011-09-26
Ubuntu USN-1208-1 2011-09-14
Ubuntu USN-1204-1 2011-09-13
Ubuntu USN-1203-1 2011-09-13
Ubuntu USN-1202-1 2011-09-13
Ubuntu USN-1187-1 2011-08-09
Ubuntu USN-1183-1 2011-08-03
Ubuntu USN-1170-1 2011-07-15
Ubuntu USN-1167-1 2011-07-13
Ubuntu USN-1164-1 2011-07-06
Debian DSA-2264-1 2011-06-18
SUSE SUSE-SA:2011:017 2011-04-18
openSUSE openSUSE-SU-2011:0346-1 2011-04-18
Ubuntu USN-1105-1 2011-04-05
Ubuntu USN-1093-1 2011-03-25
Ubuntu USN-1092-1 2011-03-25
SUSE SUSE-SA:2011:015 2011-03-24
Ubuntu USN-1090-1 2011-03-18
Ubuntu USN-1089-1 2011-03-18
Red Hat RHSA-2011:0330-01 2011-03-10
Ubuntu USN-1086-1 2011-03-08
SUSE SUSE-SA:2011:012 2011-03-08
Ubuntu USN-1080-2 2011-03-02
Ubuntu USN-1081-1 2011-03-02
Ubuntu USN-1080-1 2011-03-01
Ubuntu USN-1073-1 2011-02-25
Ubuntu USN-1072-1 2011-02-25
Mandriva MDVSA-2011:029 2011-02-17
openSUSE openSUSE-SU-2011:0399-1 2011-04-28
Debian DSA-2153-1 2011-01-30
CentOS CESA-2011:0162 2011-01-27
Red Hat RHSA-2011:0162-01 2011-01-18
Red Hat RHSA-2011:0007-01 2011-01-11
CentOS CESA-2011:0004 2011-01-06
Red Hat RHSA-2011:0004-01 2011-01-04
Red Hat RHSA-2010:0958-01 2010-12-08
Fedora FEDORA-2010-18506 2010-12-03
Fedora FEDORA-2010-18493 2010-12-03
Red Hat RHSA-2011:0017-01 2011-01-13

Comments (none posted)

mercurial: man-in-the-middle attack

Package(s):mercurial CVE #(s):CVE-2010-4237
Created:December 7, 2010 Updated:December 8, 2010
Description: From the Novell bugzilla:

a security flaw was found in the way Mercurial handled subject Common Name field of the provided certificate (the check if the commonName in the received certificate matches the requested hostname was not performed). An attacker, able to get a carefully-crafted certificate signed by a Certificate Authority could use the certificate during a man-in-the-middle attack and potentially confuse Mercurial into accepting it by mistake.

Alerts:
openSUSE openSUSE-SU-2010:1029-1 2010-12-07

Comments (none posted)

openssl: unintended cipher use

Package(s):openssl CVE #(s):CVE-2010-4180
Created:December 7, 2010 Updated:July 27, 2011
Description: From the Mandriva advisory:

OpenSSL before 0.9.8q, and 1.0.x before 1.0.0c, when SSL_OP_NETSCAPE_REUSE_CIPHER_CHANGE_BUG is enabled, does not properly prevent modification of the ciphersuite in the session cache, which allows remote attackers to force the use of an unintended cipher via vectors involving sniffing network traffic to discover a session identifier.

Alerts:
Gentoo 201110-01 2011-10-09
SUSE SUSE-SU-2011:0847-1 2011-07-27
openSUSE openSUSE-SU-2011:0845-1 2011-07-27
rPath rPSA-2011-0013-1 2011-04-11
SUSE SUSE-SR:2011:009 2011-05-17
CentOS CESA-2010:0977 2011-01-27
SUSE SUSE-SR:2011:001 2011-01-11
Debian DSA-2141-1 2011-01-06
openSUSE openSUSE-SU-2011:0014-1 2011-01-05
Fedora FEDORA-2010-18736 2010-12-09
CentOS CESA-2010:0978 2010-12-14
Red Hat RHSA-2010:0979-01 2010-12-13
Red Hat RHSA-2010:0978-01 2010-12-13
Red Hat RHSA-2010:0977-01 2010-12-13
Fedora FEDORA-2010-18765 2010-12-09
Ubuntu USN-1029-1 2010-12-08
Slackware SSA:2010-340-01 2010-12-07
Mandriva MDVSA-2010:248 2010-12-07

Comments (none posted)

openssl: unintended cipher use

Package(s):openssl CVE #(s):CVE-2008-7270
Created:December 8, 2010 Updated:January 27, 2011
Description: From the CVE entry:

OpenSSL before 0.9.8j, when SSL_OP_NETSCAPE_REUSE_CIPHER_CHANGE_BUG is enabled, does not prevent modification of the ciphersuite in the session cache, which allows remote attackers to force the use of a disabled cipher via vectors involving sniffing network traffic to discover a session identifier, a different vulnerability than CVE-2010-4180.

Alerts:
CentOS CESA-2010:0977 2011-01-27
CentOS CESA-2010:0978 2010-12-14
Red Hat RHSA-2010:0978-01 2010-12-13
Red Hat RHSA-2010:0977-01 2010-12-13
Ubuntu USN-1029-1 2010-12-08

Comments (none posted)

openssl: authentication bypass

Package(s):openssl CVE #(s):CVE-2010-4252
Created:December 7, 2010 Updated:December 8, 2010
Description: From the CVE entry:

OpenSSL before 1.0.0c, when J-PAKE is enabled, does not properly validate the public parameters in the J-PAKE protocol, which allows remote attackers to bypass the need for knowledge of the shared secret, and successfully authenticate, by sending crafted values in each round of the protocol.

Alerts:
Gentoo 201110-01 2011-10-09
Slackware SSA:2010-340-01 2010-12-07

Comments (none posted)

python-paste: cross-site scripting

Package(s):paste CVE #(s):CVE-2010-2477
Created:December 8, 2010 Updated:December 8, 2010
Description: From the CVE entry:

Multiple cross-site scripting (XSS) vulnerabilities in the paste.httpexceptions implementation in Paste before 1.7.4 allow remote attackers to inject arbitrary web script or HTML via vectors involving a 404 status code, related to (1) paste.urlparser.StaticURLParser, (2) paste.urlparser.PkgResourcesParser, (3) paste.urlmap.URLMap, and (4) HTTPNotFound.

Alerts:
Ubuntu USN-1026-1 2010-12-07

Comments (none posted)

tomboy: code execution

Package(s):tomboy CVE #(s):CVE-2010-4005
Created:December 2, 2010 Updated:June 16, 2011
Description:

From the Novell bugzilla entry:

The following files set LD_LIBRARY_PATH in a way that allows empty elements which means the current directory is included:

/usr/bin/tomboy (+: instead of :+:)
/usr/bin/tomboy-panel (+: instead of :+:)
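The "+: instead of :+:" note appears to refer to the two forms of shell parameter expansion: ${VAR+word} substitutes word whenever VAR is set, even to the empty string, while ${VAR:+word} substitutes it only when VAR is set and non-empty. The Python sketch below models the two forms to show how the first can leave a trailing empty element. The helper names and the /usr/lib/tomboy directory are assumptions made for illustration; the actual wrapper scripts are not reproduced here.

```python
def expand_plus(value, word):
    """Model shell ${VAR+word}: use word whenever VAR is set, even if empty."""
    return word if value is not None else ""


def expand_colon_plus(value, word):
    """Model shell ${VAR:+word}: use word only if VAR is set and non-empty."""
    return word if value else ""


def has_empty_element(search_path):
    """Report whether splitting the path on ':' yields an empty element."""
    return "" in search_path.split(":")


def build_path(lib_dir, old_value, expand):
    """Prepend lib_dir to an existing value (None stands for 'unset')."""
    word = ":" + (old_value or "")  # models the word ":$LD_LIBRARY_PATH"
    return lib_dir + expand(old_value, word)
```

When the variable is set but empty, the "+" form yields "/usr/lib/tomboy:" (with an empty last element) while the ":+" form yields just "/usr/lib/tomboy".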

Alerts:
Fedora FEDORA-2011-7997 2011-06-07
Fedora FEDORA-2011-7994 2011-06-07
Mandriva MDVSA-2011:035 2011-02-22
SUSE SUSE-SR:2010:023 2010-12-08
openSUSE openSUSE-SU-2010:1001-1 2010-12-02

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.37-rc5, released on December 6. "Well, no surprises this week. I think the bulk patch-wise are config patches (both ARM defconfig cleanups and some kconfig updates). And the rbd sysfs interface change stands out, but other than that it's mostly fairly small fixes all over." See the full changelog for all the details.

Linus thinks that the final 2.6.37 release will happen in early January. It might be possible to put it out a little sooner, but: "I don't really think anybody wants the merge window over the holidays."

Stable updates: there have been no stable updates in the last week. The 2.6.27.57, 2.6.32.27, and 2.6.36.2 (289 patches) updates are in the review process, with a release expected on or after December 9.

Comments (none posted)

Quotes of the week

Seriously. Nobody _ever_ does "nice make", unless they are seriously repressed beta-males (eg MIS people who get shouted at when they do system maintenance unless they hide in dark corners and don't get discovered). It just doesn't happen.
-- Linus Torvalds

If we had the Macro of Invinicibility (true stable events), then this would not be an issue for us. But unfortunately, the Macro of Invincibility is not here, and is probably buried somewhere with an old gay wizard.
-- Steven Rostedt (who is evidently reading (or watching) too much Harry Potter)

Comments (8 posted)

Some stable kernel process changes

Greg Kroah-Hartman has announced some minor changes in how the stable kernel updates are done. "So, it's 'back to our roots' time, and I'm now only going to be doing -stable releases for the last kernel released, with the usual one or two release overlap with the latest release from Linus to give people a chance to move over and have the new release stabilize a bit." That said, the long-term 2.6.27 and 2.6.32 maintenance will continue, but 2.6.27 will probably have a new maintainer soon. Also, Andi Kleen has stepped forward to maintain 2.6.35 for the indefinite future.

Comments (none posted)

Gettys: Whose house is of glasse, must not throw stones at another

Jim Gettys has been on the path of a number of network pathologies for some time; he has now summarized his findings. The problem: too much buffering in Internet routers. "The buffers are confusing TCP's RTT estimator; the delay caused by the buffers is many times the actual RTT on the path. Remember, TCP is a servo system, which is constantly trying to "fill" the pipe. So by not signalling congestion in a timely fashion, there is *no possible way* that TCP's algorithms can possibly determine the correct bandwidth it can send data at (it needs to compute the delay/bandwidth product, and the delay becomes hideously large). TCP increasingly sends data a bit faster (the usual slow start rules apply), reestimates the RTT from that, and sends data faster. Of course, this means that even in slow start, TCP ends up trying to run too fast. Therefore the buffers fill (and the latency rises). Note the actual RTT on the path of this trace is 10 milliseconds; TCP's RTT estimator is mislead by more than a factor of 100. It takes 10-20 seconds for TCP to get completely confused by the buffering in my modem; but there is no way back."
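To put numbers on the delay/bandwidth product mentioned in the quote, here is a small Python sketch. The 10 Mbit/s link speed is an assumed figure chosen only for illustration; the 10 millisecond RTT and the factor of 100 come from the quote.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bytes that must be in flight to keep a path of this speed and RTT full."""
    return bandwidth_bits_per_s * rtt_ms // 8000


ASSUMED_LINK_BPS = 10_000_000   # hypothetical 10 Mbit/s link
REAL_RTT_MS = 10                # actual path RTT from the quoted trace
INFLATED_RTT_MS = 100 * REAL_RTT_MS   # RTT as seen through full buffers

real_bdp = bandwidth_delay_product_bytes(ASSUMED_LINK_BPS, REAL_RTT_MS)
inflated_bdp = bandwidth_delay_product_bytes(ASSUMED_LINK_BPS, INFLATED_RTT_MS)
```

Under these assumptions the path needs about 12,500 bytes in flight, but an RTT estimate inflated by a factor of 100 suggests 1,250,000 bytes; the difference ends up sitting in the buffers.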

Comments (84 posted)

Kernel development news

Group scheduling and alternatives

By Jonathan Corbet
December 6, 2010
The TTY-based group scheduling patch set has received a lot of discussion on LWN and elsewhere; some distributors are rushing out kernels with this code added, despite the fact that it has not yet been merged into the mainline. That patch has evolved slightly since it was last discussed here. There have also been some interesting conversations about alternatives; this article will attempt to bring things up to date.

The main change to the TTY-based group scheduling patch set is that it is, in fact, no longer TTY-based. The identity of the controlling terminal was chosen as a heuristic which could be used to group together tasks which should compete with each other for CPU time, but other choices are possible. An obvious possibility is the session ID. This ID is used to identify distinct process groups; a process starts a new session with the setsid() system call. Since sessions are already used to group together related processes, it makes sense to use the session ID as the key when grouping processes for scheduling. More recent versions of the patch do exactly that. The session-based group scheduling mechanism appears to be stabilizing; chances are good that it will be merged in the 2.6.38 merge window.
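The session mechanism can be observed from user space. The following Python sketch (POSIX-only, using os.fork()) has a child process call setsid() and report its new session ID back over a pipe; the helper name is made up for this example, and the sketch says nothing about how the kernel patch itself looks up sessions.

```python
import os


def session_of_child_after_setsid():
    """Fork a child, have it call setsid(), and return (child_pid, child_sid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a freshly forked process is not a process group leader,
        # so setsid() is allowed to create a new session here.
        os.close(read_fd)
        try:
            os.setsid()
            os.write(write_fd, str(os.getsid(0)).encode())
        finally:
            os._exit(0)
    # Parent: read the session ID the child reported, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return pid, int(data)
```

The child becomes the leader of its new session, so the session ID it reports is its own process ID, which differs from the session of the parent. Tasks that share a session ID are the ones the newer patches place in the same scheduling group.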

Meanwhile, there have been a couple of discussions led by vocal proponents of other approaches to interactive scheduling. It is fair to say that neither is likely to find its way into the mainline. Both are worth a look, though, as examples of how people are thinking about the problem.

Colin Walters asked whether group scheduling could be tied into the "niceness" priorities which have been implemented by Unix and Linux schedulers for decades. People are used to nice, he said, but they would like it to work better. Creating groups for nice levels would help to make that happen. But Linus was not excited about this idea; he claims that almost nobody uses nice now and that this is unlikely to change.

More to the point, though: the semantics implemented by nice are very different from those offered by group scheduling. The former is entirely priority-based, making the promise that processes with a higher "niceness" will get less processor time than those with lower values. Group scheduling, instead, is about isolation - keeping groups of processes from interfering with each other. The concept of priorities is poorly handled by group scheduling now; that is just not how the mechanism works. Group scheduling will not cause one set of processes to run in favor of another; it just ensures that the division of CPU time between the groups is fair.
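The difference between per-task and per-group fairness can be shown with a toy calculation. The Python sketch below is not the CFS algorithm; it assumes every task is CPU-bound, all tasks run at the same nice level, and the group names and task counts are invented for the example.

```python
from fractions import Fraction


def per_task_shares(groups):
    """Every runnable task gets an equal slice, regardless of its group."""
    total_tasks = sum(groups.values())
    return {name: Fraction(1, total_tasks) for name in groups}


def per_group_shares(groups):
    """Each group gets an equal slice, split evenly among its own tasks."""
    return {
        name: Fraction(1, len(groups) * count)
        for name, count in groups.items()
    }


# Hypothetical workload: one interactive task and an eight-way parallel build.
workload = {"browser": 1, "make -j8": 8}
task_view = per_task_shares(workload)
group_view = per_group_shares(workload)
```

Both functions return the CPU share of a single task in each group. Without groups, the browser task competes with eight build tasks and gets 1/9 of the CPU; with one group per session it gets 1/2, while each build task drops to 1/16. Neither group is favored over the other, which is the isolation property described above.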

Colin went on to suggest that using groups would improve nice, giving the results that users really want. But changing something as fundamental as the effects of niceness would be, in a very real sense, an ABI change. There may not be many users of nice, but installations which depend on it would not appreciate a change in its semantics. So nice will stay the way it is, and group scheduling will be used to implement different (presumably better) semantics.

The group scheduling discussion also featured a rare appearance by Con Kolivas. Con's view is that the session-based group scheduling patch is another attempt to put interactivity heuristics into the kernel - an approach which has failed in the past:

You want to program more intelligence in to work around these regressions, you'll just get yourself deeper and deeper into the same quagmire. The 'quick fix' you seek now is not something you should be defending so vehemently. The "I have a solution now" just doesn't make sense in this light. I for one do not welcome our new heuristic overlords.

Con's alternative suggestion was to put control of interactivity more directly into the hands of user space. He would attach a parameter to every process describing its latency needs. Applications could then be coded to communicate their needs to the kernel; an audio processing application would request the lowest latency, while make would inform the kernel that latency matters little. Con would also add a global knob controlling whether low-latency processes would also get more CPU time. The result, he says, would be to explicitly favor "foreground" processes (assuming those processes are the ones which request lower latency). Distributors could set up defaults for these parameters; users could change them, if they wanted to.

All of that, Con said, would be a good way to "move away from the fragile heuristic tweaks and find a longer term robust solution." The suggestion has not been particularly well received, though. Group scheduling was defended against the "heuristics" label; it is simply an implementation of the scheduling preferences established by the user or system administrator. The session-based component is just a default for how the groups can be composed; it may well be a better default than "no groups," which is what most systems are using now. More to the point, changing that default is easily done. Lennart Poettering's systemd-driven groups are an example; they are managed entirely from user space. Group scheduling is, in fact, quite easy to manage for anybody who wants to set up a different scheme.

So we'll probably not see Con's knobs added anytime soon - even if somebody does actually create a patch to implement them. What we might see, though, is a variant on that approach where processes could specify exact latency and CPU requirements. A patch for that does exist - it's called the deadline scheduler. If clever group scheduling turns out not to solve everybody's problem (likely - somebody always has an intractable problem), we might see a new push to get the deadline scheduling patches merged.

Comments (16 posted)

The RCU API, 2010 Edition

December 8, 2010

This article was contributed by Paul McKenney

Introduction

Read-copy update (RCU) is a synchronization mechanism that was added to the Linux kernel in October of 2002. RCU is most frequently described as a replacement for reader-writer locking, but has also been used in a number of other ways. RCU is notable in that RCU readers do not directly synchronize with RCU updaters, which makes RCU read paths extremely fast, and also permits RCU readers to accomplish useful work even when running concurrently with RCU updaters.

Although the basic idea behind RCU has not changed in the decades following its introduction into DYNIX/ptx, the RCU API has evolved significantly even over the past three years, most recently due to software-engineering concerns. This evolution is documented by the following sections.

  1. Software-Engineering Enhancements
  2. RCU has a Family of Wait-to-Finish APIs
  3. RCU has List-Based Publish-Subscribe and Version-Maintenance APIs
  4. RCU has Pointer-Based Publish-Subscribe and Version-Maintenance APIs
  5. RCU has Debugging APIs
  6. Kernel Configuration Parameters
  7. What Next for the RCU API?

These sections are followed by answers to the Quick Quizzes.

Software-Engineering Enhancements

Pre-Linux experience with RCU-like primitives featured either a small number of uses, a small number of developers, or both. That experience gave no reason to believe that Linux would ever contain more than a few hundred RCU call sites which could easily be validated by inspection on each release of the Linux kernel, thus making computer-assisted validation unnecessary. Nevertheless, Linux developers quickly added checking in the form of “scheduling while atomic” that triggered if a thread blocked within an RCU read-side critical section in CONFIG_PREEMPT kernels. This provided as much checking as did any previous RCU implementation, so life was good, at least for a while.

Around 2005, the uptake of RCU in the Linux kernel increased substantially. Where it took about four years for the first thousand RCU call sites, the second thousand took only about 18 months, as did the third thousand. Even given only the three thousand RCU call sites that were in the kernel at the beginning of 2010, rechecking each instance during the week after each release's merge window would leave less than two minutes for each instance, assuming an 80-hour week. The odds of spotting subtle bugs on that kind of schedule are vanishingly small. This point was underscored by a patchset from Thomas Gleixner fixing a number of rcu_read_lock()-omission bugs; Tetsuo Handa followed up with a list of no fewer than 47 similar bugs.
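The "less than two minutes" figure follows directly from the numbers in the paragraph above; a short Python check of the arithmetic (the function name is just a label for this example):

```python
def minutes_per_call_site(call_sites, hours_available):
    """Average inspection time per call site, in minutes."""
    return hours_available * 60 / call_sites


budget = minutes_per_call_site(call_sites=3000, hours_available=80)
```

An 80-hour week is 4,800 minutes, so three thousand call sites leave 1.6 minutes apiece.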

I therefore started work on a patch that used lockdep to ensure that rcu_dereference() calls are properly protected, either by an RCU read-side critical section, by the update-side lock, or by being inaccessible to readers, as reported on LWN. I naively expected this to be the end of the story, but experience with problems reported by lockdep RCU resulted in a number of changes to the RCU API, which are documented below.

Mathieu Desnoyers took interest in another set of RCU-usage bugs in which the developer passes a given structure to call_rcu() twice within a given grace period, and submitted a patch that uses debug objects to detect this sort of abuse. Because the debug-objects facility does not track on-stack variables, Mathieu added an additional pair of RCU APIs to control tracking of on-stack rcu_head structures.

Arnd Bergmann noted that lockdep RCU cannot be expected to locate bugs where an RCU-protected pointer is accessed directly, without the benefit of rcu_dereference(). He therefore enlisted the aid of the sparse static analysis tool by annotating declarations of RCU-protected pointers with a new __rcu macro. This macro allows sparse to complain when an RCU-protected pointer is accessed without the assistance of one of the rcu_dereference() family of RCU APIs, which required some modifications to those APIs.

A few other RCU API changes resulted from usage, filling out the API as the need for additional RCU primitives arose. The next sections discuss aspects of the RCU API, highlighting recent changes.

RCU has a Family of Wait-to-Finish APIs

The most straightforward answer to “what is RCU” is that RCU is an API used in the Linux kernel, as summarized by the big API table and the following discussion. Or, more precisely, RCU is a four-member family of APIs as shown in the table, with each column corresponding to one of the family members.

If you are new to RCU, you might consider focusing on just one of the columns in the big RCU API table. For example, if you are primarily interested in understanding how RCU is most frequently used in the Linux kernel, “RCU” would be the place to start. On the other hand, if you want to understand RCU for its own sake, “SRCU” has the simplest API. You can always come back to the other columns later. If you are already familiar with RCU, this table can serve as a useful reference.

Quick Quiz 1: Why are some of the cells in the big table colored green?

Quick Quiz 2: Why are some of the cells in the big table colored blue?

The “RCU” column corresponds to the original RCU implementation, in which RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), which may be nested. The corresponding synchronous update-side primitives, synchronize_rcu(), along with its synonym synchronize_net(), wait for any currently executing RCU read-side critical sections to complete. The length of this wait is known as a “grace period”. If grace periods are too long for you, synchronize_rcu_expedited() speeds things up by about an order of magnitude, but at the expense of significant CPU overhead. The asynchronous update-side primitive, call_rcu(), invokes a specified function with a specified argument after a subsequent grace period. For example, call_rcu(p,f); will result in the “RCU callback” f(p) being invoked after a subsequent grace period. There are situations, such as when unloading a module that uses call_rcu(), when it is necessary to wait for all outstanding RCU callbacks to complete. The rcu_barrier() primitive does this job.
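The grace-period rule described above can be illustrated with a single-threaded toy model in Python. This is a sketch of the semantics only, not of any kernel implementation: the class and method names are invented, the reader token returned by read_lock() is a device of the model (the kernel's rcu_read_lock() returns nothing), and no real concurrency is involved.

```python
class ToyRCU:
    """Single-threaded toy model of RCU grace periods (not the kernel API)."""

    def __init__(self):
        self._next_token = 0
        self._active = set()   # tokens of readers inside a critical section
        self._pending = []     # (readers to wait for, callback, argument)

    def read_lock(self):
        """Enter a read-side critical section; the token is toy-only."""
        token = self._next_token
        self._next_token += 1
        self._active.add(token)
        return token

    def read_unlock(self, token):
        """Leave the critical section and run callbacks that became ready."""
        self._active.remove(token)
        self._run_ready()

    def call_rcu(self, arg, func):
        """Queue func(arg) to run once all currently active readers finish."""
        self._pending.append((frozenset(self._active), func, arg))
        self._run_ready()

    def _run_ready(self):
        # Swap the queue out first so callbacks may safely queue more work.
        pending, self._pending = self._pending, []
        for waiting_on, func, arg in pending:
            if waiting_on & self._active:
                self._pending.append((waiting_on, func, arg))
            else:
                func(arg)
```

A callback queued while reader A is active does not run until A unlocks, but a reader B that starts after the call_rcu() does not delay it; that is the sense in which a grace period waits only for pre-existing readers. One simplification to keep in mind: with no readers active, this toy invokes the callback immediately, whereas the kernel defers callback invocation until after a grace period has elapsed.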

In the “RCU BH” column, rcu_read_lock_bh() and rcu_read_unlock_bh() delimit RCU read-side critical sections, and call_rcu_bh() invokes the specified function and argument after a subsequent grace period. There are also synchronize_rcu_bh(), synchronize_rcu_bh_expedited(), and rcu_barrier_bh() primitives, which are analogous to their “RCU” counterparts.

In the “RCU Sched” column, anything that disables preemption acts as an RCU read-side critical section, and synchronize_sched() waits for the corresponding RCU grace period. This RCU API family was added in the 2.6.12 kernel, which split the old synchronize_kernel() API into the current synchronize_rcu() (for RCU) and synchronize_sched() (for RCU Sched). There are also call_rcu_sched(), synchronize_sched_expedited(), and rcu_barrier_sched() primitives, which are analogous to their “RCU” counterparts.

Quick Quiz 3: What happens if you mix and match RCU and RCU Sched?

The "SRCU" column displays a specialized RCU API that permits general sleeping in RCU read-side critical sections, as was described in the LWN article Sleepable RCU. Of course, use of synchronize_srcu() in an SRCU read-side critical section can result in self-deadlock, so should be avoided. SRCU differs from earlier RCU implementations in that the caller allocates an srcu_struct for each distinct SRCU usage. This approach prevents SRCU read-side critical sections from blocking unrelated synchronize_srcu() invocations. In addition, in this variant of RCU, srcu_read_lock() returns a value that must be passed into the corresponding srcu_read_unlock().

Quick Quiz 4: Why does SRCU lack an asynchronous call_srcu() interface?

Quick Quiz 5: Can synchronize_srcu() be safely used within an SRCU read-side critical section? If so, why? If not, why not?

The Linux kernel currently has a surprising number of RCU APIs and implementations. There is some hope of reducing this number, but careful inspection and analysis will be required before removing either an implementation or any API members, just as would be required before removing one of the many locking APIs in the Linux kernel.

RCU has List-Based Publish-Subscribe and Version-Maintenance APIs

Fortunately, most of RCU's list-based publish-subscribe and version-maintenance primitives shown in the following table apply to all of the variants of RCU discussed above. This commonality can in some cases allow more code to be shared, which certainly reduces the API proliferation that would otherwise occur. However, it is quite likely that software-engineering considerations will eventually result in variants of these list-handling primitives that are specialized for each given flavor of RCU.

Category Primitives Purpose
List traversal list_for_each_entry_rcu() Iterate over an RCU-protected list from the beginning.
list_for_each_entry_continue_rcu() Iterate over an RCU-protected list from the specified element.
list_entry_rcu() Given a pointer to a raw list_head in an RCU-protected list, return a pointer to the enclosing element.
list_first_entry_rcu() Return the first element of an RCU-protected list.
List update list_add_rcu() Add an element to the head of an RCU-protected list.
list_add_tail_rcu() Add an element to the tail of an RCU-protected list.
list_del_rcu() Delete the specified element from an RCU-protected list, poisoning the ->prev pointer but not the ->next pointer.
list_replace_rcu() Replace the specified element in an RCU-protected list with the specified element.
list_splice_init_rcu() Move all elements from an RCU-protected list to another RCU-protected list.
Hlist traversal hlist_for_each_entry_rcu() Iterate over an RCU-protected hlist from the beginning.
hlist_for_each_entry_rcu_bh() Iterate over an RCU-bh-protected hlist from the beginning.
hlist_for_each_entry_continue_rcu() Iterate over an RCU-protected hlist from the specified element.
hlist_for_each_entry_continue_rcu_bh() Iterate over an RCU-bh-protected hlist from the specified element.
Hlist update hlist_add_after_rcu() Add an element after the specified element in an RCU-protected hlist.
hlist_add_before_rcu() Add an element before the specified element in an RCU-protected hlist.
hlist_add_head_rcu() Add an element at the head of an RCU-protected hlist.
hlist_del_rcu() Delete the specified element from an RCU-protected hlist, poisoning the ->pprev pointer but not the ->next pointer.
hlist_del_init_rcu() Delete the specified element from an RCU-protected hlist, initializing the element's reverse pointer after deletion.
hlist_replace_rcu() Replace the specified element in an RCU-protected hlist with the specified element.
Hlist nulls traversal hlist_nulls_for_each_entry_rcu() Iterate over an RCU-protected hlist-nulls list from the beginning.
Hlist nulls update hlist_nulls_del_init_rcu() Delete the specified element from an RCU-protected hlist-nulls list, initializing the element after deletion.
hlist_nulls_del_rcu() Delete the specified element from an RCU-protected hlist-nulls list, poisoning the ->pprev pointer but not the ->next pointer.
hlist_nulls_add_head_rcu() Add an element to the head of an RCU-protected hlist-nulls list.

The first pair of categories operate on the Linux kernel's struct list_head lists, which are circular, doubly-linked lists. These primitives permit lists to be modified in the face of concurrent traversals by readers. The list-traversal primitives are implemented with simple instructions, so are extremely lightweight, though they also execute a memory barrier on DEC Alpha. The list-update primitives that add elements to a list incur memory-barrier overhead, while those that only remove elements from a list are implemented using simple instructions. The list_splice_init_rcu() primitive incurs not only memory-barrier overhead, but also grace-period latency, and is therefore the only blocking primitive shown in the table.
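The division of labor described above (ordering on insertion, plain stores on removal, and a ->next pointer that survives deletion) can be sketched in user space with C11 atomics. This is an illustrative analog only: the toy_ names and the poison value are hypothetical, the update-side lock is assumed to be held by the caller, and acquire loads stand in conservatively for the dependency ordering on which the kernel's traversal primitives rely.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_list_head {
    _Atomic(struct toy_list_head *) next; /* followed by readers */
    struct toy_list_head *prev;           /* used by updaters only */
};

/* Hypothetical poison value, chosen only for this sketch. */
#define TOY_LIST_POISON ((struct toy_list_head *)0x200)

struct item {
    int value;
    struct toy_list_head link;
};

static void toy_list_init(struct toy_list_head *head)
{
    atomic_init(&head->next, head);
    head->prev = head;
}

/* Insert new_node at the head of the list. */
static void toy_list_add(struct toy_list_head *new_node,
                         struct toy_list_head *head)
{
    struct toy_list_head *first =
        atomic_load_explicit(&head->next, memory_order_relaxed);

    atomic_store_explicit(&new_node->next, first, memory_order_relaxed);
    new_node->prev = head;
    /* Publish: the release store orders the initialization above
     * before the node becomes reachable by readers. */
    atomic_store_explicit(&head->next, new_node, memory_order_release);
    first->prev = new_node;
}

/* Unlink entry; readers already referencing it can still move on. */
static void toy_list_del(struct toy_list_head *entry)
{
    struct toy_list_head *next =
        atomic_load_explicit(&entry->next, memory_order_relaxed);

    next->prev = entry->prev;
    /* Removal publishes nothing new, so a relaxed store suffices. */
    atomic_store_explicit(&entry->prev->next, next, memory_order_relaxed);
    entry->prev = TOY_LIST_POISON; /* ->next is deliberately left intact */
}

/* Reader: add up the values of all items currently on the list. */
static int toy_list_sum(struct toy_list_head *head)
{
    struct toy_list_head *pos;
    int sum = 0;

    for (pos = atomic_load_explicit(&head->next, memory_order_acquire);
         pos != head;
         pos = atomic_load_explicit(&pos->next, memory_order_acquire)) {
        struct item *it =
            (struct item *)((char *)pos - offsetof(struct item, link));

        sum += it->value;
    }
    return sum;
}

/* Returns 0 if the list behaves as the comments claim. */
int toy_list_demo(void)
{
    struct toy_list_head head;
    struct item a = { .value = 1 }, b = { .value = 2 }, c = { .value = 4 };

    toy_list_init(&head);
    toy_list_add(&a.link, &head);
    toy_list_add(&b.link, &head);
    toy_list_add(&c.link, &head); /* list order is now c, b, a */
    if (toy_list_sum(&head) != 7)
        return -1;

    toy_list_del(&b.link);
    if (toy_list_sum(&head) != 5)
        return -1;
    /* A reader still sitting on b can continue on to a. */
    if (atomic_load_explicit(&b.link.next, memory_order_acquire) != &a.link)
        return -1;
    return b.link.prev == TOY_LIST_POISON ? 0 : -1;
}
```

In a real RCU user, the removed element could not be freed or reused until a grace period had elapsed. This sketch sidesteps that by keeping everything on the stack of a single thread.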

Quick Quiz 6: Why doesn't list_del_rcu() poison both the next and prev pointers?

The second pair of categories operate on the Linux kernel's struct hlist_head, which is a linear linked list. One advantage of struct hlist_head over struct list_head is that the former requires only a single-pointer list header, which can save significant memory in large hash tables. The struct hlist_head primitives in the table relate to their non-RCU counterparts in much the same way as do the struct list_head primitives. Their overheads are similar to those of their list counterparts in the first two categories in the table.

The third and final pair of categories operate on Linux-kernel hlist-nulls lists, which are made up of hlist_nulls_head and hlist_nulls_node structures. These lists have special multi-valued NULL pointers, which have the low-order bit set to 1 with the upper bits available to the programmer to distinguish different lists. There are hlist-nulls interfaces for non-RCU-protected lists as well.

Quick Quiz 7: Why would anyone need to distinguish lists based on their NULL pointers? Why not just remember which list you started searching???

A major advantage of hlist-nulls lists is that updaters can free elements to SLAB_DESTROY_BY_RCU slab caches without waiting for an RCU grace period to elapse. However, readers must be extremely careful when traversing such lists: Not only must they conduct their searches within a single RCU read-side critical section, but because any element might be freed and then reallocated at any time, readers must also validate each element that they encounter during their traversal.
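The multi-valued NULL pointers can be sketched as follows. This user-space toy is an assumption-laden illustration: the toy_ names are hypothetical, the lists are static and single-threaded, and only the end-of-list check is shown. A real reader of a SLAB_DESTROY_BY_RCU list must also validate each element it visits, which this sketch omits.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_nulls_node {
    struct toy_nulls_node *next; /* a node pointer or an encoded marker */
    int key;
};

/* Encode an end-of-list marker: low-order bit 1, list id in upper bits. */
static struct toy_nulls_node *toy_nulls_marker(uintptr_t list_id)
{
    return (struct toy_nulls_node *)((list_id << 1) | 1);
}

/* Real nodes are aligned, so a set low-order bit identifies a marker. */
static int toy_is_a_nulls(const struct toy_nulls_node *p)
{
    return ((uintptr_t)p & 1) != 0;
}

static uintptr_t toy_nulls_value(const struct toy_nulls_node *p)
{
    return (uintptr_t)p >> 1;
}

enum toy_result { TOY_FOUND, TOY_NOT_FOUND, TOY_MOVED };

/*
 * Search the chain starting at first for key.  list_id identifies the
 * list the reader believes it is searching.  Reaching a marker with a
 * different id means the reader was carried onto another list, so a
 * miss proves nothing and the caller should restart the search.
 */
enum toy_result toy_nulls_search(const struct toy_nulls_node *first,
                                 uintptr_t list_id, int key)
{
    const struct toy_nulls_node *pos;

    for (pos = first; !toy_is_a_nulls(pos); pos = pos->next)
        if (pos->key == key)
            return TOY_FOUND;
    return toy_nulls_value(pos) == list_id ? TOY_NOT_FOUND : TOY_MOVED;
}

/* Returns 0 if all three outcomes are reported as expected. */
int toy_nulls_demo(void)
{
    /* List 3 holds a then b; list 5 holds only c. */
    struct toy_nulls_node b = { toy_nulls_marker(3), 20 };
    struct toy_nulls_node a = { &b, 10 };
    struct toy_nulls_node c = { toy_nulls_marker(5), 30 };

    if (toy_nulls_search(&a, 3, 20) != TOY_FOUND)
        return -1;
    if (toy_nulls_search(&a, 3, 99) != TOY_NOT_FOUND)
        return -1;
    /* Simulate a reader of list 3 that ended up on list 5. */
    if (toy_nulls_search(&c, 3, 99) != TOY_MOVED)
        return -1;
    return 0;
}
```

A caller would typically retry the search from the head of the intended list whenever TOY_MOVED comes back, which is why a failed search can occasionally traverse a list more than once.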

Quick Quiz 8: Why is there no hlist_nulls_add_tail_rcu()?

RCU has Pointer-Based Publish-Subscribe and Version-Maintenance APIs

Although RCU's list-based APIs are quite useful, there are times when a hand-crafted data structure is required, for example, RCU-protected arrays and trees. RCU provides for these situations with the pointer-access and -update APIs in the following table:

Category Primitives Purpose
Pointer update rcu_assign_pointer() Assign to an RCU-protected pointer.
Pointer access rcu_dereference() Fetch an RCU-protected pointer, giving a lockdep-RCU error message if not in an RCU read-side critical section.
rcu_dereference_bh() Fetch an RCU-protected pointer, giving a lockdep-RCU error message if not in an RCU-bh read-side critical section.
rcu_dereference_sched() Fetch an RCU-protected pointer, giving a lockdep-RCU error message if not in an RCU-sched read-side critical section.
srcu_dereference() Fetch an RCU-protected pointer, giving a lockdep-RCU error message if not in the specified SRCU read-side critical section.
rcu_dereference_protected() Fetch an RCU-protected pointer with no protection against concurrent updates, giving a lockdep-RCU error message if the specified lockdep condition does not hold. This primitive is normally used when the update-side lock is held.
rcu_dereference_check() Fetch an RCU-protected pointer, giving a lockdep-RCU error message if (1) the specified lockdep condition does not hold and (2) not under the protection of rcu_read_lock().
rcu_dereference_bh_check() Fetch an RCU-bh-protected pointer, giving a lockdep-RCU error message if (1) the specified lockdep condition does not hold and (2) not under the protection of rcu_read_lock_bh() (2.6.37 or later).
rcu_dereference_sched_check() Fetch an RCU-sched-protected pointer, giving a lockdep-RCU error message if (1) the specified lockdep condition does not hold and (2) not under the protection of rcu_read_lock_sched() or friend (2.6.37 or later).
srcu_dereference_check() Fetch an SRCU-protected pointer, giving a lockdep-RCU error message if (1) the specified lockdep condition does not hold and (2) not under the protection of the specified srcu_read_lock() (2.6.37 or later).
rcu_dereference_index_check() Fetch an RCU-protected integral index, giving a lockdep-RCU error message if the specified lockdep condition does not hold.
rcu_access_pointer() Fetch an RCU-protected value (pointer or index), but with no protection against concurrent updates. This primitive is normally used to do pointer comparisons, for example, to check for a NULL pointer.
rcu_dereference_raw() Fetch an RCU-protected pointer with no lockdep-RCU checks. Use of this primitive is strongly discouraged. If you must use this primitive, add a comment stating why, just as you would with smp_mb().

The rcu_assign_pointer() primitive assigns a new value to an RCU-protected pointer while ensuring that any prior initialization of the pointed-to structure remains ordered before the assignment to the pointer, even on weakly ordered machines. You can think of rcu_assign_pointer() as publishing a new data structure to RCU readers.

Similarly, the rcu_dereference() primitive reads from an RCU-protected pointer while ensuring that subsequent code dereferencing that pointer will see the effects of initialization code prior to the corresponding rcu_assign_pointer(). It does so by prohibiting aggressively optimizing compilers (and the Alpha CPU) from reordering the dereferencing in a way that causes it to precede the rcu_dereference(). In addition, rcu_dereference() documents which pointer dereferences are protected by RCU. You can think of rcu_dereference() and similar primitives as subscribing to the current version of an RCU-protected pointer.
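The publish and subscribe operations have a rough user-space analog in C11 atomics, sketched here. The toy_ names are hypothetical, and the mapping is an assumption made for illustration: a release store plays the part of rcu_assign_pointer(), and an acquire load is a conservative substitute for rcu_dereference(), which in the kernel relies on the cheaper dependency ordering.

```c
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int a;
    int b;
};

/* The pointer that plays the role of an RCU-protected pointer. */
static _Atomic(struct config *) toy_gp;

/* Analog of rcu_assign_pointer(): initialization before this call is
 * ordered before the pointer becomes visible to readers. */
static void toy_publish(struct config *p)
{
    atomic_store_explicit(&toy_gp, p, memory_order_release);
}

/* Analog of rcu_dereference(): later accesses through the returned
 * pointer are ordered after the load of the pointer itself. */
static struct config *toy_subscribe(void)
{
    return atomic_load_explicit(&toy_gp, memory_order_acquire);
}

/* Reader: returns a + b, or -1 if nothing has been published yet. */
static int toy_reader(void)
{
    struct config *p = toy_subscribe();

    return p ? p->a + p->b : -1;
}

/* Returns 0 if the reader sees a fully initialized structure. */
int toy_publish_demo(void)
{
    static struct config c;

    toy_publish(NULL);
    if (toy_reader() != -1)
        return -1;

    c.a = 1; /* initialize first ... */
    c.b = 2;
    toy_publish(&c); /* ... then publish */
    return toy_reader() == 3 ? 0 : -1;
}
```

With only one thread the ordering is trivially satisfied, so the demo shows the shape of the pattern rather than proving anything about concurrency. The rule being illustrated is that all initialization happens before the single publishing store.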

Quick Quiz 9: Normally, any pointer subject to rcu_dereference() should always be updated using rcu_assign_pointer(). What is an exception to this rule?

Recent changes introduced the CONFIG_PROVE_RCU kernel configuration parameter, which causes rcu_dereference() to verify that it is being used under the protection of rcu_read_lock(), emitting a lockdep-RCU error message (also called “lockdep-RCU splat”) if not. Of course, there are multiple flavors of RCU, so that the Linux kernel now has corresponding flavors of rcu_dereference(), adding rcu_dereference_bh(), rcu_dereference_sched(), and srcu_dereference(). Use of these primitives allows you to make your code automatically check for proper use of RCU.

It is of course also legal to access RCU-protected pointers while holding the update-side lock, in which case the data structure cannot change, which means that the compiler constraints and (on Alpha) memory barriers can be omitted. The rcu_dereference_protected() primitive is designed for this situation. It takes the RCU-protected pointer as its first argument and a lockdep expression as its second argument, allowing the code to verify that the required locks really are held.

It is also legal to access RCU-protected pointers in functions that are invoked by both readers and updaters. In this case, protection against concurrent updates is still required, but the lockdep checks must allow for the possibility of RCU read-side critical sections as well as lock-based critical sections. The rcu_dereference_check() primitive is the tool for this job. It correctly handles concurrent updates, but also provides a second argument for a lockdep expression for update-side locks. If rcu_dereference_check() is called outside of the protection of rcu_read_lock() and of the specified locks, it will emit a lockdep-RCU splat. There are also rcu_dereference_bh_check(), rcu_dereference_sched_check(), and srcu_dereference_check() for RCU-bh, RCU-sched, and SRCU, respectively.
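The difference between the two checking rules can be modeled in a few lines of user-space C. This is a toy under stated assumptions: the toy_ names are hypothetical, a counter stands in for lockdep's knowledge of rcu_read_lock() nesting, a boolean stands in for the lockdep expression, and a "splat" is just an incremented counter. The real primitives are type-preserving macros rather than functions taking void pointers.

```c
#include <stdbool.h>
#include <stddef.h>

static int toy_read_nesting;   /* > 0 while inside toy_read_lock() */
static bool toy_update_locked; /* true while the update lock is "held" */
static int toy_splats;         /* number of complaints so far */

static void toy_read_lock(void)   { toy_read_nesting++; }
static void toy_read_unlock(void) { toy_read_nesting--; }

/* Model of rcu_dereference_check(p, c): acceptable inside a read-side
 * critical section OR when condition c holds. */
static void *toy_dereference_check(void *p, bool c)
{
    if (!(toy_read_nesting > 0 || c))
        toy_splats++;
    return p;
}

/* Model of rcu_dereference_protected(p, c): only condition c counts,
 * because the caller claims that updates are excluded. */
static void *toy_dereference_protected(void *p, bool c)
{
    if (!c)
        toy_splats++;
    return p;
}

/* Returns the number of splats produced by the sequence of accesses. */
int toy_check_demo(void)
{
    int x = 0;

    toy_splats = 0;
    toy_read_nesting = 0;
    toy_update_locked = false;

    toy_read_lock();
    toy_dereference_check(&x, toy_update_locked); /* fine: reader */
    toy_read_unlock();

    toy_update_locked = true;
    toy_dereference_check(&x, toy_update_locked);     /* fine: lock held */
    toy_dereference_protected(&x, toy_update_locked); /* fine: lock held */
    toy_update_locked = false;

    toy_dereference_check(&x, toy_update_locked); /* splat: neither holds */

    toy_read_lock();
    toy_dereference_protected(&x, toy_update_locked); /* splat: a reader
                                                         is not enough */
    toy_read_unlock();

    return toy_splats;
}
```

The sequence in toy_check_demo() produces two complaints. The second one is the instructive case: being inside a read-side critical section does not satisfy the protected variant, which is meant for update-side code only.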

One of the side-effects of Arnd's sparse-based checking for misuse of RCU-protected pointers is that all of the members of the rcu_dereference() family now require a real pointer of a type that really can be dereferenced. Further, this pointer must be a C lvalue rather than an rvalue, so that rcu_dereference(p) is legal, but rcu_dereference(i) (where i is an integer) and rcu_dereference(p+1) are not. Because there really is code in the Linux kernel with RCU-protected indexes (as opposed to pointers), the rcu_dereference_index_check() primitive handles the index case, and takes a lockdep expression identifying the locks and types of RCU that protect the access (see table below). Unfortunately, this also disables sparse-based checking, so it is possible that this primitive will be deprecated in the future in favor of pointers. (So if you know of an RCU-protected index that cannot be easily converted to an RCU-protected pointer, this would be a really good time to speak up!)

In some cases, only the value of the RCU-protected pointer is used without being dereferenced; for example, the RCU-protected pointer might simply be compared against NULL. There is no need to protect against concurrent updates, and there is also no need to be under the protection of rcu_read_lock() or friends. The rcu_access_pointer() primitive is designed for this situation.

Finally, RCU is used by some data structures such as radix trees, where any flavor of RCU might be used for read-side protection, or where locking might provide protection, depending on the usage. The rcu_dereference_raw() primitive is designed for this purpose. Note however, that this disables all lockdep-RCU checking, so please avoid using this where possible. Where it must be used, include a comment saying why its use is safe.

RCU has Debugging APIs

RCU has also gained a number of debugging APIs, shown in the following table:

Category Primitives Purpose
RCU pointer declaration __rcu Tell sparse that a given pointer is intended to be protected by RCU.
Callback tracking init_rcu_head_on_stack() Prepare on-stack callback for debug-objects (CONFIG_DEBUG_OBJECTS_RCU_HEAD).
destroy_rcu_head_on_stack() Prepare on-stack callback for exiting scope.
Critical section validation rcu_read_lock_held() Splat unless under rcu_read_lock().
rcu_read_lock_bh_held() Splat unless under rcu_read_lock_bh() or unless interrupts are disabled.
rcu_read_lock_sched_held() Splat unless under rcu_read_lock_sched() or friend.
srcu_read_lock_held() Splat unless under specified srcu_read_lock().

The first debugging API is used to mark an RCU-protected pointer, for example:

    struct foo __rcu *foo_p;

If a pointer is marked with __rcu, then all accesses to this pointer must be made via rcu_assign_pointer() or via one of the rcu_dereference() family of APIs, otherwise sparse will flag the error when the CONFIG_SPARSE_RCU_POINTER kernel configuration parameter is set. Note that __rcu may be used to tag arguments passed in to functions as well as global variables, struct fields, and the like.

Just as a double kfree() is a bug, so is a double invocation of call_rcu() using the same pointer. If the CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel configuration parameter is set, then debug objects will flag this sort of abuse. However, the debug-objects mechanism does not track objects on the stack by default. Therefore, when working with a stack-based rcu_head structure, use init_rcu_head_on_stack() before the first use and destroy_rcu_head_on_stack() before the corresponding stack frame is deallocated.

Finally, it is common practice to include a header comment on functions that require RCU protection. Although this is good practice, it is far better to enlist the computer's help in locating cases where RCU protection has been omitted. The rcu_read_lock_held(), rcu_read_lock_bh_held(), rcu_read_lock_sched_held(), and srcu_read_lock_held() primitives enable debug checks in the code itself. These primitives provide the most useful information when the CONFIG_PROVE_RCU kernel parameter is set.

These debugging APIs can be extremely helpful. It does take a little extra work to get them set up correctly, but much less work than is required to debug RCU usage problems the hard way!!!

Quick Quiz 10: Why isn't there a debugging API to check for mismatching read-side and grace-period primitives, for example, using rcu_read_lock() on the read side, but incorrectly using synchronize_sched() during updates?

Kernel Configuration Parameters

RCU's kernel-configuration parameters can be considered to be part of the RCU API, most especially from the viewpoint of someone building a kernel intended for a specialized device or workload. This section summarizes the RCU-related configuration parameters.

The first set of parameters controls the underlying behavior of the RCU implementation itself, and is defined in init/Kconfig.

  1. CONFIG_TREE_RCU: selects the non-preemptible tree-based RCU implementation that is appropriate for server-class SMP builds. It can accommodate a very large number of CPUs, but scales down sufficiently well for all but the most memory-constrained systems. The following kernel parameters may also be specified when CONFIG_TREE_RCU is specified:
    1. CONFIG_RCU_TRACE enables debugfs-based tracing.
    2. CONFIG_RCU_FANOUT controls the fanout of the tree. Lower fanout values reduce lock contention, but also consume more memory and increase the overhead of grace-period computations. To the best of my knowledge, the default values have always been sufficient.
    3. CONFIG_RCU_FANOUT_EXACT forces the tree to be as balanced as possible. Again, to the best of my knowledge, the default values have always been sufficient.
    4. CONFIG_RCU_FAST_NO_HZ causes the last non-dyntick-idle CPU to be more aggressive about entering dyntick-idle state. If you are not working with an SMP battery-powered device, you probably don't care about this parameter.
    In addition, the following parameters may be specified, either on the boot command line or via sysfs:
    1. blimit: This specifies the maximum number of RCU callbacks that may be processed consecutively, and defaults to 10.
    2. qhimark: If the number of callbacks waiting on a given CPU exceeds this number, which defaults to 10,000, then blimit is ignored and RCU callbacks will be processed indefinitely if need be.
    3. qlomark: If a given CPU is in indefinite-callback mode, then it will return to normal blimit-based callback processing once the number of outstanding RCU callbacks drops below qlomark, which defaults to 100.

  2. CONFIG_TREE_PREEMPT_RCU: selects the preemptible tree-based RCU implementation that is appropriate for real-time and low-latency SMP builds. It can also accommodate a very large number of CPUs, and also scales down sufficiently well for all but the most memory-constrained systems. The CONFIG_RCU_TRACE, CONFIG_RCU_FANOUT, CONFIG_RCU_FANOUT_EXACT, blimit, qhimark, and qlomark parameters are supported as noted above.

  3. CONFIG_TINY_RCU: selects the non-preemptible uniprocessor RCU implementation that is appropriate for non-real-time UP builds. It has the smallest memory footprint of any of the current in-kernel RCU implementations.

  4. CONFIG_TINY_PREEMPT_RCU: selects the preemptible uniprocessor RCU implementation that is appropriate for real-time UP builds. It also boasts a small memory footprint, though not quite so small as CONFIG_TINY_RCU. This implementation was produced as part of the Linaro effort.
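The interaction of the blimit, qhimark, and qlomark parameters listed under CONFIG_TREE_RCU can be pictured as a small state machine. The sketch below is a hypothetical user-space model built only from the descriptions above: the toy_ names are assumptions, and the kernel's actual bookkeeping (including exactly when the thresholds are consulted) may well differ.

```c
#include <limits.h>
#include <stdbool.h>

struct toy_cb_limits {
    long blimit;    /* normal maximum batch size */
    long qhimark;   /* above this queue length, ignore blimit */
    long qlomark;   /* below this queue length, honor blimit again */
    bool unlimited; /* currently in indefinite-callback mode? */
};

/*
 * Given the number of callbacks queued on a CPU, return how many may
 * be invoked consecutively.  LONG_MAX stands for "no limit".
 */
long toy_batch_limit(struct toy_cb_limits *l, long qlen)
{
    if (qlen > l->qhimark)
        l->unlimited = true;
    else if (l->unlimited && qlen < l->qlomark)
        l->unlimited = false;
    return l->unlimited ? LONG_MAX : l->blimit;
}

/* Returns 0 if the limits change as the text describes. */
int toy_limits_demo(void)
{
    /* Defaults quoted in the text: 10, 10,000, and 100. */
    struct toy_cb_limits l = { 10, 10000, 100, false };

    if (toy_batch_limit(&l, 50) != 10)
        return -1; /* normal operation */
    if (toy_batch_limit(&l, 10001) != LONG_MAX)
        return -1; /* backlog exceeded qhimark */
    if (toy_batch_limit(&l, 500) != LONG_MAX)
        return -1; /* still draining: not yet below qlomark */
    if (toy_batch_limit(&l, 99) != 10)
        return -1; /* back to blimit-based processing */
    return 0;
}
```

The gap between the two thresholds provides hysteresis: once a CPU has fallen far enough behind to lift the limit, it keeps working through the backlog until the queue is nearly empty instead of flipping between modes at every batch.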

The second set of kernel configuration parameters controls debugging options:

  1. CONFIG_SPARSE_RCU_POINTER enables sparse-based checks of proper use of RCU-protected pointers. Please note that this is a build-time check: Use “make C=1” to cause sparse to check source files that would have been rebuilt by “make”, and use “make C=2” to cause sparse to unconditionally check source files.

  2. CONFIG_DEBUG_OBJECTS_RCU_HEAD enables debug-objects checking of multiple invocations of call_rcu (and friends) on the same structure.

  3. CONFIG_PROVE_RCU enables lockdep-RCU checking. If CONFIG_PROVE_RCU_REPEATEDLY is also specified, then the lockdep-RCU checking can output multiple lockdep-RCU “splats”; otherwise, only a single lockdep-RCU splat will be emitted per boot.

  4. CONFIG_RCU_TORTURE_TEST enables RCU torture testing. This is a tri-state parameter, permitting rcutorture.c to be compiled into the kernel, built as a module, or omitted entirely. When rcutorture.c is built into the kernel (CONFIG_RCU_TORTURE_TEST=y), then CONFIG_RCU_TORTURE_TEST_RUNNABLE starts RCU torture testing during boot. Please don't try this on a production system!

  5. CONFIG_RCU_CPU_STALL_DETECTOR, which is enabled by default for CONFIG_TREE_RCU and CONFIG_TREE_PREEMPT_RCU kernels, provides RCU-based checking for CPU stalls, which occur when a CPU or task fails to find its way out of an RCU read-side critical section in a timely manner. CPU stalls can be caused by a number of bugs, as described in Documentation/RCU/stallwarn.txt. The stall-warning timeout is controlled by CONFIG_RCU_CPU_STALL_TIMEOUT, which defaults to 60 seconds. If CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE is set, which it is by default, then CPU stalls are checked for starting early in boot; otherwise, the rcu_cpu_stall_suppress module parameter must be manually specified (via either the boot-time command line or via sysfs) in order to start CPU stall checking. If CONFIG_RCU_CPU_STALL_VERBOSE is set, which it also is by default, then detailed per-task information is printed when a CPU stall is encountered.

If you are working with code that uses RCU, please do us all a favor and test that code with CONFIG_PROVE_RCU and CONFIG_DEBUG_OBJECTS_RCU_HEAD enabled. Please also consider running sparse with CONFIG_SPARSE_RCU_POINTER. If you are modifying the RCU implementation itself, you will need to run rcutorture, with multiple runs covering the relevant kernel configuration parameters. A one-hour rcutorture run on an 8-CPU machine qualifies as light rcutorture testing.

Yes, running extra tests can be a hassle, but I am here to tell you that extra testing is much easier than trying to track down bugs in your RCU code!!!

What Next for the RCU API?

The most honest answer is that I do not know. The next steps for the RCU API will be decided as they always have been, by the needs of RCU's users and by the limits of the technology at the time. That said, the following seem to be a few of the more likely directions:

  1. Complete implementation of RCU priority boosting (TINY_RCU submission slated for 2.6.38, TREE_RCU implementation in progress).

  2. Support for running certain types of user applications without scheduler ticks. There was a spirited discussion on this topic at Linux Plumbers Conference, but at this writing it appears that only small tweaks to RCU will be required.

  3. Merge the implementation of SRCU into TINY_RCU and TREE_RCU. A design for this is mostly in place. This effort is likely to result in call_srcu() and srcu_barrier(). If it does, please be very careful with these primitives!!!

  4. Make RCU_FAST_NO_HZ work for TREE_PREEMPT_RCU.

  5. Drive the choice between TINY_PREEMPT_RCU, TINY_RCU, TREE_PREEMPT_RCU, and TREE_RCU entirely off of SMP and PREEMPT. This would allow cutting code and test scenarios, but first TINY_PREEMPT_RCU must prove itself.

  6. It is possible that rcu_dereference_index_check() will be retired if it is reasonable to convert all current use of RCU-protected indexes into RCU-protected pointers.

  7. It is quite possible that large systems might encounter problems with synchronize_rcu_expedited() scalability.

  8. Make RCU be more aggressive about entering dyntick-idle state when running on lightly loaded systems with four or more CPUs.

  9. Numerous other items listed on the RCU to-do list.

But if the past is any guide, new use cases and workloads will place unanticipated demands on RCU.

Acknowledgments

We are all indebted to Andy Whitcroft, Jon Walpole, Gautham Shenoy, and LWN member jarkao2, whose review of early drafts of this document greatly improved it. A great many people helped find and address issues located by the software-engineering enhancements to RCU, including Andi Kleen, Andrew Morton, Avi Kivity, Ben Greear, Daniel J Blueman, David Howells, Dhaval Giani, Eric Dumazet, Eric Paris, Frederic Weisbecker, Greg Thelen, Heiko Carstens, Ilia Mirkin, Ingo Molnar, Jens Axboe, Jiri Slaby, Johannes Berg, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Lai Jiangshan, Marcelo Tosatti, Mathieu Desnoyers, Miles Lane, Minchan Kim, Oleg Nesterov, Paul Moore, Peter Zijlstra, Robert Olsson, Sergey Senozhatsky, Subrata Modak, Tetsuo Handa, Thomas Gleixner, Trond Myklebust, Valdis Kletnieks, Vegard Nossum, Vivek Goyal, and Zdenek Kabelac. I owe thanks to the members of the Relativistic Programming project and to members of PNW TEC for many valuable discussions. I am grateful to Dan Frye for his support of this effort.

This work represents the view of the author and does not necessarily represent the view of IBM.

Linux is a registered trademark of Linus Torvalds.

Other company, product, and service names may be trademarks or service marks of others.

Answers to Quick Quizzes

Quick Quiz 1: Why are some of the cells in the above table colored green?

Answer: The green API members (rcu_read_lock(), rcu_read_unlock(), and call_rcu()) were the only members of the RCU (then called “rclock”) API back in the mid-90s. During this timeframe, Paul was under the mistaken impression that he knew all that there is to know about RCU.

Back to Quick Quiz 1.

Quick Quiz 2: Why are some of the cells in the above table colored blue?

Answer: Because the corresponding API members are new since the 2008 version of RCU part 3: the RCU API.

Back to Quick Quiz 2.

Quick Quiz 3: What happens if you mix and match RCU and RCU Sched?

Answer: In a CONFIG_TREE_RCU or a CONFIG_TINY_RCU kernel, mixing these two works "by accident" because in those kernel builds, RCU and RCU Sched map to the same implementation. However, this mixture is fatal in CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU builds, due to the fact that RCU's read-side critical sections can then be preempted, which would permit synchronize_sched() to return before the RCU read-side critical section reached its rcu_read_unlock() call. This could in turn result in a data structure being freed before the read-side critical section was finished with it, which could in turn greatly increase the actuarial risk experienced by your kernel.

Even in CONFIG_TREE_RCU and CONFIG_TINY_RCU builds, such mixing and matching is of course very strongly discouraged. Mixing and matching other flavors of RCU is also a very bad idea.

Back to Quick Quiz 3.

Quick Quiz 4: Why does SRCU lack an asynchronous call_srcu() interface?

Answer: Given an asynchronous interface, a single task could register an arbitrarily large number of SRCU callbacks, thereby consuming an arbitrarily large quantity of memory. In contrast, given the current synchronous synchronize_srcu() interface, a given task must finish waiting for a given grace period before it can start waiting for the next one.

However, there is a good chance that a call_srcu() will become available in the near future. If it does, and if you decide to use it, please be careful!!!

Back to Quick Quiz 4.

Quick Quiz 5: Can synchronize_srcu() be safely used within an SRCU read-side critical section? If so, why? If not, why not?

Answer: In principle, you can use synchronize_srcu() with a given srcu_struct within an SRCU read-side critical section that uses some other srcu_struct. In practice, however, such use is almost certainly a bad idea. In particular, the following could still result in deadlock:

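    /* One code path: waits for an ssb grace period while reading under ssa. */
    idx = srcu_read_lock(&ssa);
    synchronize_srcu(&ssb);
    srcu_read_unlock(&ssa, idx);

    /* . . . */

    /* Another code path: waits for an ssa grace period while reading under ssb. */
    idx = srcu_read_lock(&ssb);
    synchronize_srcu(&ssa);
    srcu_read_unlock(&ssb, idx);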

The reason that this code fragment can result in deadlock is that we have a cycle. The ssa read-side critical sections can wait on an ssb grace period, which waits on ssb read-side critical sections, which contain a synchronize_srcu(), which in turn waits on ssa read-side critical sections.

So if you do include synchronize_srcu() in SRCU read-side critical sections, make sure to avoid cycles. Of course, the simplest way to avoid cycles is to avoid using synchronize_srcu() in SRCU read-side critical sections in the first place.

Back to Quick Quiz 5.

Quick Quiz 6: Why doesn't list_del_rcu() poison both the next and prev pointers?

Answer: Poisoning the next pointer would interfere with concurrent RCU readers, which must use this pointer. However, RCU readers are forbidden from using the prev pointer, so it may safely be poisoned.

Back to Quick Quiz 6.

Quick Quiz 7: Why would anyone need to distinguish lists based on their NULL pointers? Why not just remember which list you started searching???

Answer: Suppose that CPU 0 is traversing such a list within an RCU read-side critical section, where the elements are allocated from SLAB_DESTROY_BY_RCU slab cache. The elements could therefore be freed and reallocated at any time. If CPU 0 is referencing an element while CPU 1 is freeing that element, and if CPU 1 then quickly reallocates that same element and adds it to some other list, then CPU 0 will be transported to that new list along with the element. In this case, remembering the starting list would clearly be unhelpful.

To make matters worse, suppose that CPU 0 searches a list and fails to find the element that it was looking for. Was that because the element did not exist? Or because CPU 0 got transported to some other list in the meantime? Readers traversing SLAB_DESTROY_BY_RCU lists must carefully validate each element and check for being moved to another list. One way to check for being moved to another list is for each list to have its own value for the NULL pointer. These checks are subtle and easy to get wrong, so please be careful!

Back to Quick Quiz 7.

Quick Quiz 8: Why is there no hlist_nulls_add_tail_rcu()?

Answer: Suppose that CPU 0 is traversing an hlist-nulls list under RCU protection. Suppose that while CPU 0 is referencing list element A, CPU 1 frees it and reallocates it, adding it to another list, which CPU 0 unwittingly starts traversing. Suppose further that while CPU 0 is referencing an element B in the new list, CPU 2 frees and reallocates it, moving it back to the original list. When CPU 0 comes to the end of the original list, it sees that the NULL pointer has the proper value, so does not realize that it has been moved.

If CPU 2 had added element B at the tail of the list, CPU 0 would be within its rights to conclude that it had fully searched this list when in fact it had not. But given that it is only possible to add elements to the head of an hlist-nulls list, any CPU coming to the end of the same list it started traversing can be sure that it really did search the entire list. Possibly several times, if it was extremely unlucky.

Therefore, there is not and should never be a primitive to add to the middle or the end of an RCU-protected hlist-nulls list. Except maybe at initialization time, before any readers have access to the list.

Back to Quick Quiz 8.

Quick Quiz 9: Normally, any pointer subject to rcu_dereference() must always be updated using rcu_assign_pointer(). What is an exception to this rule?

Answer: One such exception is when a multi-element linked data structure is initialized as a unit while inaccessible to other CPUs, and then a single rcu_assign_pointer() is used to plant a global pointer to this data structure. The initialization-time pointer assignments need not use rcu_assign_pointer(), though any such assignments that happen after the structure is globally visible must use rcu_assign_pointer().

However, unless this initialization code is on an impressively hot code-path, it is probably wise to use rcu_assign_pointer() anyway, even though it is in theory unnecessary. It is all too easy for a "minor" change to invalidate your cherished assumptions about the initialization happening privately. Besides, Arnd's sparse-based checks will yell at you if you apply simple assignment to a pointer that has been marked as protected by RCU using __rcu.

Back to Quick Quiz 9.

Quick Quiz 10: Why isn't there a debugging API to check for mismatching read-side and grace-period primitives, for example, using rcu_read_lock() on the read side, but incorrectly using synchronize_sched() during updates?

Answer: Because we have not yet come up with a good way to do this. One challenge is that synchronize_sched() doesn't say which RCU pointer it is waiting on. Even if the pointer was known, the connection to the read-side primitives might well be in some other compilation unit. One approach would be to use separate pointer tags for each flavor of RCU rather than the current __rcu, but early attempts in that direction resulted in mind-numbing complexity — even from an RCU perspective.

If you have a good idea, give it a try and let us know how it goes!

Back to Quick Quiz 10.

Comments (2 posted)

Page editor: Jonathan Corbet

Distributions

openSUSE experimenting with a "rolling" release

December 8, 2010

This article was contributed by Joe 'Zonker' Brockmeier.

Since Novell's acquisition by Attachmate was announced, along with Attachmate's stated intent to continue supporting openSUSE, there seems to be some renewed enthusiasm within the project. Greg Kroah-Hartman has proposed a rolling release called "Tumbleweed", and there's renewed discussion of a long-term-support release as well. Ultimately, the long-term option doesn't seem very likely, but Tumbleweed looks very promising.

As Kroah-Hartman pointed out succinctly in a post to the opensuse-project list, there's been a lot of talk about a "rolling update" distribution, but nothing has happened: "So the time now is to stop talking about it, and actually trying to do it."

Note that openSUSE would not be the first distribution to offer a rolling release option. There are, of course, Gentoo and Arch Linux; Debian testing is also used as a rolling release by many users. Some might think of openSUSE Factory as a rolling release distribution, but Kroah-Hartman says that Tumbleweed would differ by only putting forward stable packages. For instance, openSUSE Factory tracks the development kernels (like 2.6.37-rc) whereas Tumbleweed would keep the 2.6.36 kernel until 2.6.37 is stable.

Another example might be Firefox or GNOME 3.0. Many major upstream releases are out of sync with the openSUSE release cycle (every eight months). Users and developers often want the most recent stable release of Firefox or GNOME, and do not want to wait seven months or so for the next openSUSE release when the upstream roadmap doesn't line up well with it. Some users compensate by adding the stable release from an openSUSE Build Service repository, but that can quickly become complicated for users who track multiple projects.

It might seem out of character for Kroah-Hartman to be directing attention to the openSUSE release cycle — since he spends most of his time focused on the kernel. "What, I'm not allowed to work on other things than the kernel? :)", he said when asked in email about his interest in Tumbleweed. He continued:

Anyway, like I stated in the project announcement, I'm tired of people talking about how openSUSE could or could not move to a rolling release process. It comes up internally at Novell every few months, and every time a number of us meet together we end up talking about it as well. It's time to stop arguing about it and actually try it to see if it is possible.

Yes, it's a big project, but I have a lot of other openSUSE developers on board with helping me out.

So what's the motivation, aside from moving the idea from the discussion phase to realization? Kroah-Hartman says that it might attract more developers to openSUSE as their primary distribution, but "that's not my primary goal here." Instead, he says that he wants to see if it's possible to have a rolling release distribution because "it's something I've been wanting a long time." He also notes that, thanks to the openSUSE Build Service, it's something many users already hack together for themselves.

Naturally, a number of concerns, objections, and "what ifs" were raised on the list after Kroah-Hartman's announcement. For example, Vincent Untz worried that some developers might focus on the rolling release rather than the stable releases. Guido Berhoerster expressed concern about the additional workload for packagers.

But Kroah-Hartman, rightly, pointed out that it's better to try and see what problems arise rather than just debating the idea indefinitely. In email, he said that he doesn't think it will be much, if any, additional work for the package maintainers. "It is merely taking the work they are already doing today with the Factory repository, and trailing it by a bit to only include stuff that is 'known to work.'"

It will be some time before Tumbleweed is a reality, though. Kroah-Hartman says he'll try it out with openSUSE 11.3, but doesn't guarantee that it will work correctly. He said that he wants to do it "for real" after 11.4 is out due to the amount of time it will take to get the workflow down properly.

openSUSE LTS

Another discussion that has come up again and again within the openSUSE community is that of a release with a longer lifecycle. openSUSE initially enjoyed a two-year lifecycle, but this was cut to 18 months in August of 2009. While many users accept it as reasonable to update a desktop system every year and a half, it is an unacceptably short lifecycle for those using openSUSE on the server. This prompted renewed discussion of ways to extend long-term support for a SUSE/openSUSE-based distribution without having to actually pay for SLES.

Several proposals have been floated over the years, ranging from a CentOS-style repackaging of SUSE Linux Enterprise to updating openSUSE after "official" support ends to keep a release alive longer. While repackaging efforts for enterprise distributions have been successful (e.g. CentOS), attempts to maintain a community distribution beyond its normal lifecycle (e.g. Fedora Legacy) have lagged unacceptably far behind and ultimately ceased entirely. Users express great interest in such efforts, but fewer bodies are available to actually do the work.

Discussions about a long-term release petered out after some heated discussion and planning on and off the openSUSE mailing lists in 2009. The topic was revived by Wolfgang Rosenauer on November 22. Rosenauer suggests that the community take over after Novell drops support at 18 months, and focus on "a subset of packages which were delivered with the original distribution but the focus might be on server services anyway."

Shortly after, openSUSE Board member Pascal Bleser took up the topic. Bleser concurs that it would be a good thing to have, but "it does bring some technical challenges which require a certain number of committed contributors working on the maintenance." Bleser also suggests that Novell would be opposed to a CentOS-style SLES, which Kroah-Hartman quickly rebutted:

Don't be so sure of this at all. I can't speak for anyone else here, and I am not speaking as Novell at all, but I can tell you that if you wish to [pursue] this option, I will be glad to personally help you if you run into resistance from Novell in any way for this project. It should _not_ be anything that Novell should be resistant to having happen.

The problem is that while many users and even openSUSE developers want longer-term support for openSUSE, few, if any, are trying to make it happen. The discussion on the openSUSE list about an LTS differs greatly from Kroah-Hartman's Tumbleweed proposal in that no one is stepping forward to do the work. Much of the discussion is of the "we would really like Novell to run with this idea" variety. Until a sufficiently large group of openSUSE enthusiasts steps forward, an LTS will likely remain a pleasant idea that goes nowhere.

Tumbleweed, on the other hand, seems to have the enthusiastic support of enough openSUSE contributors to get a full try. It should be interesting to see how the project progresses. If it's successful, openSUSE could become a very popular distribution with many developers and Linux enthusiasts who want to run the most recent stable software without waiting for the next openSUSE release or maintaining a laundry list of Zypper repositories.

Comments (1 posted)

Brief items

Distribution quotes of the week

Way back in the day, there was only the main tree. And we shipped it.

Then, there was the Desktop Live image, and it was asked:

"Hey rel-eng, can you build this? And proto-QA, can you test this?"

"Um, sure, OK."

Then the KDE SIG asked, "hey, we'd like a live image too. Hey, rel-eng, can you build this? And proto-QA, can you test this?"

"Sure, I guess so".

And then came someone wanting XFCE. And someone wanting LXDE. And someone saying "hey, I'd like to build an Electronics Lab". And then someone mentioned Sugar. And they all said, "Hey, rel-eng, can you build this? And proto-QA, can you test this?"

Then proto-QA and rel-eng both said "hey, wait a minute... no, not really."

And that's where the spins SIG was born

-- Bill Nottingham

"Ten years ago, Linux distros were cutting edge by coming with a firewall enabled. Now Fedora is going to cut the edge in a new way... no firewall wanted."

...

[And no I am not trying for 2 weeks of LWN quotes as tempting it will be. (alright alright I am .. it is just so addicting)]

-- Stephen John Smoogen (Thanks to Stu Tomlinson)

Comments (none posted)

CyanogenMod 6.1 released

Version 6.1 of the CyanogenMod Android distribution has been released. New features abound; they include a number of camera improvements, nicer notifications, "super duper unified flashlights," FM radio support on a number of handsets, and more; see the changelog for the details.

Comments (6 posted)

Debian Installer 6.0 Beta2 release

The Debian Installer team has announced the second beta release of the installer for Debian GNU/Linux Squeeze. Click below for a look at the improvements, behavioral changes, and known issues in this release.

Full Story (comments: none)

OpenEmbedded 2010.12 released

The OpenEmbedded Community has announced the release of OpenEmbedded 2010.12. "This release was [focused] on stabilizing the metadata and getting various distro/machine/image combinations to work. A lot of effort was put by community to make this happen and a table of tested combination is available at http://www.openembedded.org/index.php/Release-2010.12."

Comments (none posted)

Natty Alpha 1 Released

The first alpha release of Ubuntu's Natty Narwhal (11.04) is available for testing. The other members of the Natty family that are also available are: Ubuntu Server for UEC and EC2, Kubuntu, Xubuntu, Edubuntu DVD, and Ubuntu Studio.

Full Story (comments: none)

FOSDEM 2011 distribution miniconf

FOSDEM 2011 (Free and Open Source Software Developers' European Meeting) will be held in Brussels in early February. There will be a distribution miniconf and the call for talks is open. Wouter Verhelst notes that "Unfortunately, to date, the number of submissions (from Debian and other distributions alike) has been abysmally low." Henne Vogelsang also posted to the openSUSE-project list, "Please if you maintain/contribute to or know about a great technology that you think is underrated in the other distributions be a nice FOSS developer and present it. Or you have a nice tangible idea how distributions can work better together, speak up about it!"

Comments (none posted)

Distribution News

Fedora

Fedora Board Meeting Minutes - 2010-12-06

The minutes from the December 6 meeting of the Fedora Board are available. The board welcomed its new members, announced a public meeting at FUDCon Tempe, and discussed Spins and SIG Governance.

Comments (none posted)

Page editor: Rebecca Sobol

Development

Large file management with git-annex

By Jake Edge
December 8, 2010

As its introduction says, git-annex sounds like something of a paradox. It uses Git to manage files that are larger than Git can easily handle—without checking them into the repository. But git-annex provides ways to track those files using much of the same infrastructure as Git, so that moving or deleting those files can all be tracked in much the same way as committed files. In addition, git-annex allows for branches and distributed clones of its trees.

Developer Joey Hess lists two use cases for git-annex that will appeal to folks who juggle many large files on multiple storage devices, frequently move between different locations and computers, or some combination thereof. Because git-annex tracks the locations of the actual data files, which may not be locally present, it can act like a hierarchical storage manager. The filenames will be present in the repository, but their content may need to be fetched from elsewhere or from a currently offline disk. git-annex will fetch the data if it can find it in an online repository or ask that a particular repository be made available.

In addition, git-annex ensures that there is at least one copy—though it can be configured to keep more than one—of a file's contents available before dropping the file from a local repository. That way, the user can drop a large file (or files) from their laptop, say, while knowing that the contents are still available on some other repository that git-annex was able to contact. For "The Archivist", which is one of Hess's use cases, that is essential, so that they can reorganize their files at will, while knowing that they can't be accidentally deleted.

But those same attributes are useful to "The Nomad" (Hess's other use case):

When she has 1 bar on her cell, Alice queues up interesting files on her server for later. At a coffee shop, she has git-annex download them to her USB drive. High in the sky or in a remote cabin, she catches up on podcasts, videos, and games, first letting git-annex copy them from her USB drive to the netbook (this saves battery power).

When she's done, she tells git-annex which to keep and which to remove. They're all removed from her netbook to save space, and Alice [knows] that next time she syncs up to the net, her changes will be synced back to her server.

It does all this via a git-annex binary that is built from Haskell sources. That allows git-annex to integrate with Git, so using it is as simple as "git annex ...". Unlike many free software utilities, git-annex also comes with fairly extensive documentation, including a man page and a walk-through. As might be expected, the code is available via a Git repository—though Debian unstable users can apt-get install it.

When files are added to git-annex, their content is moved to a .git/annex/objects directory and a symbolic link is created using the original filename and pointing to the content. Those symbolic links are handled by Git directly, while git-annex arranges for the content to be present as requested. Creating a repository is pretty straightforward:

    $ mkdir ~/annextst
    $ cd ~/annextst
    $ git init
    $ git annex init "desktop repo"
The "git annex init" command gives the annex a name that can be used to identify the repository later on. One then adds files to the repository in a fairly obvious way:
    $ cp /tmp/big_file .
    $ git annex add .
    add big_file ok
    $ git commit -a -m "added big_file"
The last command may seem a bit surprising, but Git is what will track the symbolic link(s) that git annex add created. As the walk-through shows, that Git repository can be cloned elsewhere (on another machine or a removable USB device, for example) and then each of those repositories can be added as remote repositories (i.e. git remote) of each other. The only additional step for turning the clone into a git-annex repository is to do:
    $ git annex init "some other repo"
in the cloned directory.

Getting file content is as simple as doing:

    $ git annex get some_file
while removing files is done with:
    $ git annex drop some_file
though that may fail if git-annex cannot find another copy in the repositories it can currently contact (which can, of course, be overridden). Syncing between repositories is done with the usual "git pull" command. Another nice feature of git-annex is that it works seamlessly with files that are already present in the git repository, so handling a combination of giant and normal-sized files is easy.
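
The link-plus-content arrangement and the "refuse to drop the last copy" rule described above can be imitated with plain shell tools. What follows is only a sketch under stated assumptions: ordinary directories stand in for repositories, and the key name, the directory layout, and the copies_elsewhere and safe_drop helpers are all made up for illustration. None of it is git-annex's actual code or on-disk format.

```shell
# Sketch only: plain directories stand in for repositories. All names here
# (laptop, server, KEY-big_file, the helper functions) are hypothetical.
base=$(mktemp -d)
mkdir -p "$base/laptop/objects" "$base/server/objects"

# Store the content under a made-up key and leave a symlink at the file name.
key="KEY-big_file"
printf 'pretend this is huge\n' > "$base/laptop/objects/$key"
ln -s "objects/$key" "$base/laptop/big_file"

# Count the other "repositories" that currently hold the content for a key.
copies_elsewhere() {
    n=0
    for repo in "$base"/*; do
        [ "$repo" = "$base/laptop" ] && continue
        [ -e "$repo/objects/$1" ] && n=$((n + 1))
    done
    echo "$n"
}

# Remove the local content only if at least one other copy can be found.
safe_drop() {
    if [ "$(copies_elsewhere "$1")" -ge 1 ]; then
        rm "$base/laptop/objects/$1"
        echo "drop $1 ok"
    else
        echo "drop $1 failed: no other copy found"
        return 1
    fi
}

# First attempt: the server has no copy yet, so the drop is refused.
if safe_drop "$key"; then first=dropped; else first=refused; fi

# Copy the content to the server, then try again.
cp "$base/laptop/objects/$key" "$base/server/objects/$key"
if safe_drop "$key"; then second=dropped; else second=refused; fi
```

After the second attempt, big_file is still present on the "laptop" as a dangling symbolic link: the name remains tracked, but the content has to be fetched from the other copy before it can be read again.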

There are several types of storage backends that git-annex can use to store the key-value pairs that relate the filename to its contents. The default is WORM (write once, read many), which is also the least expensive because it assumes that file contents do not change once they have been stored. The SHA1 backend stores the file content object based on its SHA1 hash, which can be an expensive operation on very large files, but will track changes to the contents. There is also a URL backend that fetches the content from an external URL (as the name implies).
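
The trade-off between a cheap key and a content-hash key can be illustrated with a few lines of shell. This is a sketch, not git-annex's real key format: the cheap_key and sha1_key helpers and the "CHEAP-" and "SHA1--" prefixes are invented here, and the sha1sum utility from coreutils is assumed to be installed.

```shell
# Sketch only: key formats and helper names are hypothetical.

# Cheap key: built from the file's size and name without reading its content.
cheap_key() {
    size=$(wc -c < "$1")
    echo "CHEAP-s$((size))--$(basename "$1")"
}

# Content key: built from the SHA1 hash, which requires reading every byte.
sha1_key() {
    echo "SHA1--$(sha1sum < "$1" | cut -d' ' -f1)"
}

tmp=$(mktemp -d)
printf 'aaaa' > "$tmp/f"
cheap_before=$(cheap_key "$tmp/f")
sha1_before=$(sha1_key "$tmp/f")

# Rewrite the file with different bytes of the same length.
printf 'bbbb' > "$tmp/f"
cheap_after=$(cheap_key "$tmp/f")
sha1_after=$(sha1_key "$tmp/f")

[ "$cheap_before" = "$cheap_after" ] && echo "cheap key did not notice the change"
[ "$sha1_before" != "$sha1_after" ] && echo "SHA1 key changed with the content"
```

The cheap key costs almost nothing to compute but is only safe if stored content never changes; the hash-based key notices the modification at the price of reading the whole file.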

This only scratches the surface of git-annex and what it can do, so interested readers should take a wander through the documentation that Hess provides. In the announcement of git-annex, Hess also points to two other projects that he calls "software tools that use git in ways that were never intended". The first is mr, which treats a set of repositories in various repository formats (svn, git, cvs, hg, bzr, ...) as if they were one combined repository. The other, etckeeper, hooks into package managers like apt and yum to commit changes to files in /etc when they are changed by a package update. One of the advantages of free software is that it allows folks to do things that were unanticipated by the original developer; it would certainly seem that Hess has done just that.

Comments (1 posted)

Brief items

Quotes of the week

Also, anytime you are creating a new commit with the same changes as another commit, you are destroying `git blame`'s ability to tell you who to flog publicly. And as we all know, public floggings are the lifeblood of software development teams.
-- Paul Stadig

Many of the economic arguments in favor of releasing code as open source, and dedicating a significant fraction of an engineer's time to serve as a OSS project maintainer or kernel subsystem maintainer, are ones that make much more sense at a very large company like Google or IBM. That's not because startups are evil, or deficient in any way; just the economic realities that at a successful startup, everything has to be subordinated to the central goal of proving that they have a sustainable, scalable business model and that they have a good product/market fit. Everything else, and that includes participating in an open source community, is very likely a distraction from that central goal.
-- Ted Ts'o

The results over the last year have been really amazing. Between the two of us Andrew [Bartlett] and I have pushed over 2500 patches to the Samba master repository over a year of pair programming, which is more than twice what we managed in the previous year. I find it really interesting that despite only one of us typing at a time, we get much more done with pair programming than when we work separately. The results are even more notable when you take into account that in the last year Andrew has been rebuilding his house and looking after a new baby!

I think the reason it works so well is that it tends to minimise procrastination. When I code alone and I'm stuck on a bit of code, I often find myself drifting off to read slashdot or muck about with some new application that I've found. That happens a lot less when someone else is watching over your shoulder on VNC. We discuss how we're going to solve the problem and then we solve it, without the hours of procrastination in between.

-- Andrew Tridgell

Comments (none posted)

GRUB imports ZFS support

The GRUB project has announced its decision to add ZFS support to the GRUB bootloader, despite the facts that (1) Oracle has not assigned copyright to the FSF, and (2) ZFS is not thought to carry a GPL-compatible license. "The ZFS code that has been imported into GRUB derives from the OpenSolaris version of GRUB Legacy. On one hand, this code was released to the public under the terms of the GNU GPL. On the other, binary releases of Solaris included this modified GRUB, and as a result Oracle/Sun is bound by the GPL." (Thanks to Luis Rodriguez)

Full Story (comments: 9)

Second Beta of KDE SC 4.6 Out

KDE SC 4.6 Beta 2 has been released. "KDE SC 4.6 Beta2 is targeted at testers and those that would like to have an early look at what's coming to their desktops and netbooks this summer. KDE is now firmly in beta mode, meaning that the primary focus is on fixing bugs and preparing the stable release of the software compilation this summer. Since the release of the first beta two weeks ago 1318 bugs have been reported and 1176 bugs have been closed."

Full Story (comments: none)

KOffice becomes the Calligra Suite

KDE.News carries the news that KOffice has been rebranded as "the Calligra Suite" and given a wider focus. "The Calligra Suite introduces the Calligra Office Engine which makes it easy for developers to create new user experiences, target new platforms and create specialized versions for new kinds of users. Currently, there are two main user experiences: the desktop UI with the applications mentioned above, and FreOffice which is the only free mobile office suite in existence."

Comments (29 posted)

Psycopg 2.3.1 released

Psycopg, which is a PostgreSQL adapter for Python, has released version 2.3.1. It is simply a fix for a CentOS build bug in the unannounced 2.3.0 version. Major new features in 2.3.0 are:
  • dict to hstore adapter and hstore to dict typecaster, using both 9.0 and pre-9.0 syntax.
  • Two-phase commit protocol support as per DBAPI specification.
  • Support for payload in notifications received from the backend.
  • namedtuple-returning cursor.
  • Query execution cancel.

Full Story (comments: none)

PublicSQL announced

PublicSQL, which is a new way to handle SQL queries from within web applications by storing the data in tables in JavaScript code, has been announced. The tables are generated from the query and will be loaded automatically into the web page. This allows for web applications that don't require a database server, but can still provide SQL services.

Full Story (comments: none)

Newsletters and articles

Hudson Labs: Who's driving this thing?

It would seem that there is a brewing conflict between the development community for Hudson, which is an open source continuous integration server, and Oracle, which owns the trademark on the name, over where the code and development infrastructure will be hosted. Over at the Hudson Labs blog, R. Tyler Croy lays out a timeline of the disagreement, along with some of his opinions on what's going on. "The fundamental issue here is that the developers want to make a change in how they contribute to Hudson, and have made their voices heard to that end. From the users' perspective, such a change would have literally zero impact on them, which makes Oracle's conflation of the two sides of Hudson particularly frustrating." (Thanks to Croy and Christof Damian for bringing it to our attention).

Comments (18 posted)

Henrik Ingo: How to grow your open source project 10x

Henrik Ingo has posted the results of a study on project governance concluding that the key factor distinguishing large and successful projects is the existence of a nonprofit governing foundation. "There appears to be a glass ceiling for single vendor projects prohibiting their growth from the Large category upwards. To truly reach their fullest potential, open source projects are recommended to consider the proven governance model of a non-profit foundation around which participants collaborate."

Comments (25 posted)

Page editor: Jonathan Corbet

Announcements

Non-Commercial announcements

Gentoo becomes Open Invention Network licensee

The Gentoo Foundation has joined Open Invention Network as a licensee. "We believe that by becoming an Open Invention Network licensee, we encourage continued open source development and foster innovation in a technical community that benefits everyone. We recognize the importance of participating in a substantial community of Linux supporters and leveraging the Open Invention Network to further spur open source innovation."

Comments (3 posted)

FSFE: European Commission's software contract is a rough deal for Europe

The Free Software Foundation Europe urges the European Commission to stick to its own decisions and guidelines to use open and interoperable hardware and software. "The European Commission will spend EUR 189 million on proprietary software over the next six years, in direct contradiction to its own decisions and guidelines. The Commission last week announced a six-year framework contract to acquire a wide range of mostly proprietary software and related services."

Full Story (comments: none)

FSFLA's petition for Canaima GNU/Linux to be Free

The Free Software Foundation Latin America has petitioned the Venezuelan government to remove the non-free bits from the state sponsored Canaima GNU/Linux. "There are regulations that will require that computers purchased or produced by the Venezuelan state be capable of working with Canaima GNU/Linux. If Canaima GNU/Linux includes privative drivers, it will enable the purchase of far more hardware that demands Privative Software to work, preventing the achievement of our dreamed Technological Sovereignty."

Full Story (comments: none)

Commercial announcements

Android 2.3 and the Nexus S

Google has announced the availability of the Android 2.3 platform and software development kit. There's lots of new stuff in this release; source does not appear to be available yet, though.

Also announced is a new flagship phone: the Nexus S.

Comments (19 posted)

Mentor Graphics acquires CodeSourcery assets

Mentor Graphics has announced its intention to acquire certain assets of CodeSourcery, Inc., a provider of open source GNU-based toolchains and services. "'CodeSourcery and its industry-recognized toolchain services and products significantly increase the value of embedded solutions that Mentor Embedded can provide its customers, as well as contributions to the open source community,' said Glenn Perry, general manager, Mentor Graphics Embedded Software Division. 'We believe that the future of embedded development depends on the wide availability of open source software and tools.'" (Thanks to David Daney)

Comments (3 posted)

Linux Foundation Announces Certifications to Linux Standard Base 4.0 and Public Beta 4.1

The Linux Foundation has announced that all the leading commercial Linux companies are certified to Linux Standard Base 4.0 (LSB 4.0), "including Canonical, Kylin, Linpus, Mandriva, Neoshine, Novell, Oracle, Red Flag and Red Hat." There is also a beta of the LSB 4.1 available at https://www.linuxfoundation.org/en/LSB_4.1_Beta.

Comments (none posted)

Huawei Joins Linux Foundation

The Linux Foundation has announced that the Chinese company Huawei has joined LF. "Being recognized as one of the world's most innovative companies, Huawei is using Linux to develop network equipment and devices and sees its Linux Foundation membership as an opportunity to collaborate with a worldwide network of developers, users and vendors to advance that work."

Full Story (comments: none)

Legal Announcements

Red Hat Files Brief with U.S. Supreme Court

Red Hat has joined in a brief filed with the U.S. Supreme Court seeking correction of the standard for inducing patent infringement. "The "friend of the court," or amicus brief, submitted by Red Hat and others seeks reversal of a lower court decision that threatens to expand patent litigation. The brief argues that the law requires that only those who actually know of the specific patent at issue and know that it covers the alleged infringing activity can be found liable."

Comments (none posted)

Articles of interest

Asay: Leaving Canonical

On his blog, Matt Asay has announced that he is leaving Canonical for a mobile web application startup called Strobe. Asay started as Canonical's Chief Operating Officer (COO) in February. "It was the hardest decision of my career, even harder than my decision to leave Alfresco. I have never left an employer after such a short time, and everywhere I look within Canonical and the Ubuntu community I see massive opportunity. This is a leap of faith for me, but one that I feel sure is right for me, even as I continue to cheer on Canonical in its ambitious quest."

Comments (30 posted)

Student participation in open source projects (opensource.com)

Opensource.com is running an article from a university professor on student participation in development projects. "Clearly, there are also large differences in culture. But I think that collaboration between open source and academic realms can work, as there are also some strong commonalities between the groups. The open source and academic environments both share the desire to create something, to produce a product that people will use. Both groups have a love of learning and both groups are based on the idea that something (whether it is knowledge or software) should be accessible to everyone. Both groups have a desire to belong to a professional group, to be interacting as professionals and participating in ongoing professional activity. And interestingly, I think both groups share the desire to be self-directed and to have control over what they do."

Comments (7 posted)

Progress Report: LibreOffice Beta 3 (Linux Magazine)

Joe "Zonker" Brockmeier looks at LibreOffice Beta 3. "The progress made by the LibreOffice folks so far is impressive, at least when it comes to attracting contributors. The third beta was released on November 18th, and seems to have impressive momentum. The release notes list 118 contributors who've helped with the development just between beta 2 and beta 3. How's it looking so far? Don't expect miracles, but it's shaping up nicely."

Comments (3 posted)

About Those 882 Novell Patents: This is Where OIN Comes In (Groklaw)

Groklaw takes a look at the sale of Novell's patents and how the Open Invention Network (OIN) fits in. "Here's how it works. The patents of OIN members are licensed to each other royalty-free in perpetuity. Even on a sale, the license remains in force for all pre-existing members. If you are a member of OIN prior to the closing on the Novell deal, then, you are covered. The proposed closing date is January 23rd, so you still have time to join OIN and get the benefit of the license to those patents. Then, if Microsoft shows up at your door, you can say, "Thanks, but no thanks. I already have a license." So here's what it all adds up to, by my reading: if ever you were thinking of joining the Open Invention Network, this is the sensible time to do it, as long as you get it done before this sale closes and that door shuts with respect to the Novell patents."

Comments (26 posted)

Novell muddles through fiscal Q4 (The Register)

The Register looks at Novell's financial results. "Novell didn't do too badly in Q4, all things considered, with sales only down 4.2 per cent to $206.5m. Software license sales were $31.3m, down a smidgen from the year-ago quarter. Maintenance and subscription sales were off 4.2 per cent, mirroring license declines, at $153.3m, and services revenues fell by 9.7 per cent to just under $22m. In the quarter, Novell booked $308m in non-cash tax benefits related to "certain net deferred tax assets," which would have been interesting to explain."

Comments (none posted)

New Books

Pragmatic Guide to Subversion--New from Pragmatic Bookshelf

Pragmatic Bookshelf has released "Pragmatic Guide to Subversion", by Mike Mason.

Full Story (comments: none)

Pragmatic Guide to JavaScript--New from Pragmatic Bookshelf

Pragmatic Bookshelf has released "Pragmatic Guide to JavaScript", by Christophe Porteneuve.

Full Story (comments: none)

Upcoming Events

Linux Users' Group of Davis presentation December 20th

The Linux Users' Group of Davis (LUGOD) will be holding its next meeting on December 20, 2010 at the Explorit Nature Center in Davis, California. There will be a presentation on "3D Display, 3D Interaction, 3D Capture, and Off-Label Uses of Commodity Hardware (or: How to Become an Internet Celebrity in Three Easy Steps)".

Full Story (comments: none)

Events: December 16, 2010 to February 14, 2011

The following event listing is taken from the LWN.net Calendar.

Date(s)                      Event                                Location
December 13 - December 18    SciPy.in 2010                        Hyderabad, India
December 15 - December 17    FOSS.IN/2010                         Bangalore, India
January 16 - January 22      PyPy Leysin Winter Sprint            Leysin, Switzerland
January 22                   OrgCamp 2011                         Paris, France
January 24 - January 29      linux.conf.au 2011                   Brisbane, Australia
January 27                   Ubuntu Developer Day                 Bangalore, India
January 27 - January 28      Southwest Drupal Summit 2011         Houston, Texas, USA
January 29 - January 31      FUDCon Tempe 2011                    Tempe, Arizona, USA
February 2 - February 3      Cloud Expo Europe                    London, UK
February 5                   Open Source Conference Kagawa 2011   Takamatsu, Japan
February 5 - February 6      FOSDEM 2011                          Brussels, Belgium
February 7 - February 11     Global Ignite Week 2011              several, worldwide
February 11 - February 12    Red Hat Developer Conference 2011    Brno, Czech Republic

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds