Weekly Edition for August 2, 2012

GUADEC: open source and open "stuff"

By Nathan Willis
August 1, 2012

Developer conferences like GUADEC tend to be dominated by technical content, so talks with a different tenor stand out. Alex "Skud" Bayley's July 28 keynote "What's Next? From Open Source to Open Everything" was one such talk. Bayley has spent time in both the open source and open data movements, and offered a number of insights about the parallels between those and other online, grassroots-community movements, including what open source can teach newer communities from experience and what it can learn from them.

Bayley's talk was based loosely on a blog post she wrote in January 2011 while working at the Freebase "open data" project. After roughly a decade of working full-time on open source software, she decided it was no longer the "fringe" and cutting-edge movement it had been in the early years, and had consequently stopped being an interesting challenge. That assessment is not a criticism, however; as Bayley put it, the fact was that open source software had "won." "No one seriously says 'let's use ColdFusion running on Oracle' for their Web site anymore." She subsequently encountered the open data movement and found that it "freaked out enough people" that she was certain she was onto something interesting again.

[Bayley at GUADEC]

But the real insight was her discovery that the nascent open data movement was grappling with the same set of challenges that open source software had tackled roughly a decade earlier. For example, it was grappling with licensing (as open source had), and was still in the process of distilling out its core principles and how best to enshrine them in appropriate licenses. Similarly, she said, in the early days open source struggled to build its software support tools (such as build systems and version control), find working business models, and discover how to interact with governments and other entities that found the movement odd or suspicious.

Open data was repeating the same process, ten years later. Bayley relayed an anecdote about a New Zealand government project that attempted an open data release as a Zip archive. Nat Torkington reacted with a number of questions illustrating how a Zip release failed to make the grade: what if there is a bug, or an update, or a patch? Open source and open data are not the only movements, however: Creative Commons and Wikipedia have dealt with similar issues, as have open education, open healthcare, open government, open access (e.g., to academic research), and open hardware — and Bayley found the parallels interesting. In short, when asked what her current interest is, she now replies "open ... stuff."

An even broader circle encompasses not only the open technology movements, but other recent grassroots and peer-to-peer online communities, including "scanlation" groups that crowdsource translations, music or video remixing communities, unconferences, and even fan-fiction communities. Some of these groups might not seem to have any connection to open source, Bayley admitted, but the parallels are there: they are self-organizing, decentralized, non-hierarchical, and are based around the tenet of making things free. That makes them kindred spirits that open source could assist based on its own experiences, and it makes them worth learning from.

What open source can teach

The first area in which open source can offer assistance to other open movements is licensing, Bayley said. Licensing is "really, really important," but most online communities don't think about it in their early days. Open source has generally settled on a small set of licenses that cover most participants' needs, and it has done so largely because it started from the FSF's four freedoms. Newer communities could benefit from open source's work articulating its core principles, writing definitions, and figuring out the boundaries that determine "who's in and who's out."

She cited several examples in the open data movement where lack of licensing standards confuses the issue. One was a genealogy site advertising its "open data" at OSCON 2010 — data that was licensed CC-Attribution-NonCommercial-NoDerivatives-ShareAlike, a choice that breaks three of the FSF's four freedoms. Another was a "community mapping" project run by Google, which used participatory and community language in its marketing, but in which all contributions became the sole property of Google.

The second area where open source can assist other movements is tools. Open source has a complete toolchain all the way down the stack, but many other communities do not. Many creative- or publishing-centric communities have no concept of version control, she said. But telling ebook authors to "just use GitHub" is not the answer; they would balk at the suggestion, and rightfully so. Rather, the open source community needs to ask "what would DocHub look like?" and help the community build the tools it requires.

Finally, open source can teach other communities about the value of working and communicating completely in the open: open mailing lists, open documentation, and "release early, release often" workflows. The benefits may seem obvious to open source developers, but it is a scary prospect to those not "soaking in it" already, like the open government movement. But transparency has benefits for all open source communities, she said. It allows outsiders to see what the community is like and how it operates, so that they can put themselves into the situation with fewer surprises. It also means more accountability, which is particularly important in movements like open government.

What open source can learn

Open source software's relative maturity puts it in a position to offer experience-based advice to other online communities, Bayley said, but that fact does not mean the other communities have nothing to teach of their own. After all, she said, no one recruits the thousands of teenagers who write and share their own Harry Potter fan-fiction — they build and organize their own communities online. There are several potential lessons from these other groups, which she described in random order.

The first is the value of hands-on events. Hackerspaces often hold short, practical events where "people can walk in, learn something, and walk out." Open source rarely does this, expecting newcomers instead to sign up for online courses or figure out what to do on their own. But "no one says 'I spent all weekend reading documentation; it was awesome!'" Many minority or marginalized groups in particular require a slight push to get involved; a physical event after which they can walk away having learned something will provide this push in ways an online event cannot. It is easy, but fundamentally selfish, to tell others that they must learn it the hard way because you did, she said.

Many other open communities also have much more age diversity than open source. Environmental groups and music communities tend to be all-age, she said, but open source is not. Developer conferences like GUADEC tend to be dominated by attendees in their 20s and 30s, while system administration conferences skew much older. But as an example, she asked, why are there no children at GUADEC? Some might expect them to find the talk sessions dull, or to be disruptive, which is valid, but there could still be other program content designed to reach them. She told a story about a punk music venue in Berkeley, California that held only all-age concerts, and how she witnessed adults helping kids enjoy the mosh pit by lifting them onto their shoulders. "If you can run a public mosh pit for kids, you can probably solve the problem for"

Finally, many other open communities operate with a strong "nothing about us without us" ethic. The phrase comes from the disability rights community, and it means that the community tries not to embark on projects ostensibly for disabled people unless there are people with disabilities involved. Otherwise, the results can easily fail to meet the needs of the target community.

An example of failing to exercise this approach happened after the 2010 Haiti earthquake. After the quake, a number of open source developers volunteered to write software to support the relief effort, but did so without partnering with the relief workers on the ground. The developers felt good about themselves — at least at first — but were ultimately disappointed because their efforts were not of much practical help. In addition to producing better outcomes, she said, the "nothing about us without us" approach has the added benefit of empowering people to build things for themselves, rather than building things for them.

Bayley's talk encompassed such a wide view of online and "open something" communities that at first it was hard to see much that connected them. But in the end, she is right: even if the reason that the other community congregates has nothing to do with the motives that drive open source software, these days we have a lot in common with anyone who uses the Internet to collaborate and to build. In her first few years of open source involvement, Bayley said, she frequently told people to switch over to Linux and open source software in tactless ways that had little impact. She hopes that she is more tactful today than she was at 18, she said, because open source has lessons to teach about freedom and community. Those lessons are valuable even for communities that have no interest in technology.

[The author would like to thank the GNOME Foundation for travel assistance to A Coruña for GUADEC.]


GUADEC: Motion tracking with Skeltrack

By Nathan Willis
August 1, 2012

At the beginning of his GUADEC 2012 talk, developer Joaquim Rocha showed an image from Steven Spielberg's oft-cited 2002 film Minority Report. When the movie came out, it attracted considerable attention for its gesture-driven computing. But, Rocha said, we have already surpassed the film's technology, because the special gloves it depicted are no longer needed. Rocha's Skeltrack library can leverage Microsoft Kinect or similar depth-mapping hardware to find users and recognize their positions and movements. Skeltrack is not an all-in-one hands-free user interface, but it solves a critical problem in such an application stack.

[Rocha at GUADEC]

Rocha's presentation fell on Sunday, July 29, the last "talk" day of the week-long event. Although GUADEC is a GNOME project event, the Skeltrack library's primary dependency is GLib, so it should be useful on non-GNOME platforms as well. Rocha launched Skeltrack in March, and has released a few updates since. The current version is 0.1.4 from June, and is available on GitHub. For those who don't follow Microsoft hardware, the Kinect uses an infrared illuminator to project a dot pattern onto the scene in front of it, and an infrared sensor reads the distortion in the pattern to map out a "depth buffer" of the objects or people in the field of view.

How it works

As the name suggests, Skeltrack is a library for "skeleton tracking." It is built to take data from a depth buffer like the one provided by the Kinect device, locate the image of a (single) human being in the buffer, and identify the "joints." Currently Skeltrack picks out seven: one head, two shoulders, two elbows, and two hands. Those joints can then be used by the application, letting the user manipulate objects, or for further processing (such as gesture recognition). The Kinect is the primary hardware device used with Skeltrack, Rocha said (because of its low price point and simple, hackable USB interface), but the library is hardware independent. Skeltrack builds on the existing libfreenect library for device control, and includes GFreenect, a GObject wrapper library around libfreenect (because, as Rocha quipped, "we really like our APIs in GNOME").

One might be tempted to think that acquiring the 3D depth information is the tricky part of the process, and that picking a human being out of the image is not that complicated. But such is not the case. Libfreenect, Rocha said, cannot tell you whether the depth information depicts a human being, or a cow, or a monkey, much less identify joints and poses. There are three proprietary ways to get skeleton information out of libfreenect depth buffers: the commercial OpenNI framework, Microsoft's Kinect SDK, and Microsoft's Kinect For Windows. Despite its name, OpenNI includes many non-free components, the skeleton-tracking module included. The Kinect SDK is licensed for non-commercial use only, while Kinect for Windows is a commercial offering, and only works with the desktop version of the Kinect.

Moreover, the proprietary solutions generally rely on a database of "poses" against which the depth buffer is compared, in an attempt to match the image against known patterns. That approach is slow and has difficulty picking out people of different body shapes, so Rocha looked for another approach. He found Andreas Baak's paper A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera [PDF]. Baak's algorithm uses pattern matching, too, but it provided a valuable starting point: locating the mathematical extrema in the body shape detected, then proceeding to deduce the skeleton.

Heuristics are used to determine which three extrema are most likely to be the head and shoulders (with the head being in the middle), and which are hands. Subsequently, a graph is built connecting the points found, and analyzed to determine which shoulder each hand belongs to (based on proximity). Elbows are inferred as being roughly halfway along the path connecting each hand to its shoulder. The result is a skeleton detected without any "computer vision" techniques, and without any prior calibration steps. The down side of this approach is that for the moment it only works for upper-body recognition, although Rocha said full-body detection is yet to come.
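The last two steps described above (pairing hands with shoulders by proximity, then placing elbows roughly halfway between them) can be illustrated with a short Python sketch. This is not Skeltrack's code: the function names are hypothetical, the extrema are assumed to have been found and labeled already, and straight-line distance stands in for the path through the graph that the library actually analyzes.

```python
import math
from itertools import permutations


def pair_hands_with_shoulders(shoulders, hands):
    """Return {shoulder_name: hand_point} for the cheapest pairing.

    Assumption: straight-line distance approximates the graph distance
    that the real algorithm measures along the detected body shape.
    """
    best, best_cost = {}, math.inf
    for order in permutations(list(shoulders), len(hands)):
        cost = sum(math.dist(shoulders[name], hand)
                   for name, hand in zip(order, hands))
        if cost < best_cost:
            best, best_cost = dict(zip(order, hands)), cost
    return best


def estimate_elbow(shoulder, hand):
    """Midpoint of the shoulder-hand segment (a simplification of
    "roughly halfway along the path")."""
    return tuple((s + h) / 2 for s, h in zip(shoulder, hand))


def build_skeleton(head, shoulders, hands):
    """Assemble up to seven joints from already-labeled extrema."""
    skeleton = {"head": head}
    for name, point in shoulders.items():
        skeleton[f"{name}_shoulder"] = point
    for name, hand in pair_hands_with_shoulders(shoulders, hands).items():
        skeleton[f"{name}_hand"] = hand
        skeleton[f"{name}_elbow"] = estimate_elbow(shoulders[name], hand)
    return skeleton


if __name__ == "__main__":
    # Made-up coordinates in millimeters (x, y, z).
    joints = build_skeleton(
        head=(0, 1700, 2000),
        shoulders={"left": (-200, 1500, 2000), "right": (200, 1500, 2000)},
        hands=[(350, 1000, 1800), (-400, 1400, 1700)],
    )
    for name, point in sorted(joints.items()):
        print(name, point)
```

Because the elbows are inferred rather than detected, any error in the hand or shoulder positions carries straight through to them.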

How to use it

Skeltrack's SkeltrackSkeleton object has tweakable parameters for expected shoulder and hand distances, plus other measurements to modify the algorithm. One of the more important parameters is smoothing, which helps cope with the jitter often found in skeleton detection. For starters, Kinect depth data can be quite noisy, and on top of that, the heuristics used to find joints in the library result in rapid, tiny changes. Rocha showed a live demo of Skeltrack on stage, and with the smoothing function deactivated, the result is entertaining to watch, but would not be pleasant to use when interacting with one's computer. The down side is that running the smoothing formula costs CPU cycles; one can maximize smoothing, but the result is higher latency, which might hamper interactive applications.
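The trade-off between jitter and latency can be seen in a small Python sketch. The talk did not spell out Skeltrack's smoothing formula, so plain exponential smoothing is used here as a stand-in; the `smooth` function and its `factor` parameter are illustrative names, not part of the library's API.

```python
def smooth(samples, factor):
    """Exponentially smooth a sequence of joint coordinates.

    factor is in [0, 1): 0 passes the input through unchanged, and
    values near 1 smooth heavily but react slowly to real movement.
    """
    smoothed, previous = [], None
    for value in samples:
        if previous is None:
            previous = value
        else:
            previous = factor * previous + (1 - factor) * value
        smoothed.append(previous)
    return smoothed


if __name__ == "__main__":
    # A hand that rests near 0 mm with jitter, then moves to about 100 mm.
    raw = [0, 3, -2, 1, 100, 97, 102, 99, 101]
    for factor in (0.0, 0.5, 0.9):
        print(factor, [round(v, 1) for v in smooth(raw, factor)])
```

With a high factor the jitter mostly disappears, but the smoothed position needs many frames to catch up with the real movement, which is the latency cost mentioned above.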

Rocha also demonstrated a few poses that can confuse Skeltrack's algorithm. For example, when standing hands-on-hips, there are no "hand" extrema to be found, leading the algorithm to conclude that the elbows are hands. With one hand raised head-height and the corresponding elbow held at shoulder height (as one might do while waving), the algorithm cannot find the shoulder, and thus cannot figure out which of the extrema is the head and which is the hand. Nevertheless, Skeltrack is quite good at interpreting common motions. Rocha demonstrated it with a sample program that simply drew the skeleton on screen, and also with a GNOME 3 desktop control application. The desktop application is hardcoded to a handful (pun semi-intended) of actions, rather than a general gesture input framework. There was also a demo set up at the Igalia (Rocha's employer) expo booth.

Skeltrack provides both an asynchronous and a synchronous API, and it reports the locations of joints in both "real world" and screen coordinates — measured in millimeters in the original scene and pixels in the webcam image. Currently the code is limited to identifying one person in the buffer, but there are evidently ways to work around the limitation. Rocha said that a company in Greece was using OpenCV to recognize multiple people in the depth buffer, then running Skeltrack separately on each part of the frame that contained a person. However, the project in question was not doing the skeleton recognition in real-time.

Libfreenect (and thus Skeltrack) is not tied into the XInput input system, nor is Skeltrack itself bound to a multi-touch application framework. That is one possible direction for the code to head in the future; hooking Skeltrack into the same touch event and gesture recognition libraries as multi-touch pads and touch-screens would make Kinect-style hardware more accessible to application developers. But that cannot be the endpoint — depth buffers offer richer information than 2D touch devices; developers can and will find more (and more unusual) things to do with this new interface method. Skeltrack is ahead of the competition (libfreenect lacks skeleton tracking, but its developers recognize the need for it), and that is a win not just for GNOME, but for open source software in general.

[The author would like to thank the GNOME Foundation for travel assistance to A Coruña for GUADEC.]


The Nexus 7: Google ships a tablet

By Jonathan Corbet
July 31, 2012
When life presents challenges, one can always try to cope by buying a new toy. In this case, said new toy is the Nexus 7 tablet, the first "pure Android" tablet offered directly by Google; it is meant to showcase what Android can be on this type of device. The initial indications are that it is selling well, suggesting that the frantic effort to prepare Android for tablets is finally beginning to bear some fruit. What follows are your editor's impressions of this device and the associated "Jelly Bean" Android release.

The Nexus 7 (N7) is an intermediate-size tablet — larger than even the biggest phones, but smaller than, say, a Xoom or iPad device. It features a 7" 1280x800 display and weighs in at 340 grams. There's 1GB of RAM, and up to 16GB of storage; the CPU is a quad-core Tegra3 processor. The notion of a quad-core system that fits easily into a back pocket is amusing to us old-timers, but that's the age we live in now. The N7 features WiFi connectivity and Bluetooth, but there is no cellular connectivity; it has 802.11n support, but cannot access the 5GHz band where 802.11n networks often live. The only camera is a front-facing 1.2 megapixel device; the N7 does not even have the camera application installed by default.

The N7 runs Android 4.1.1, the "Jelly Bean" release. 4.1.1 offers a lot of enhancements over 4.0, but is, for the most part, similar in appearance and functionality. The first impression, once the setup formalities are done, can be a little disconcerting: the home screen is dominated by a large ad for Google's "Play Magazines" service. It makes one think that "pure Android" devices might be going the crapware route, but the ad widget is easily disposed of and never appears again.

As of this writing, there is no CyanogenMod build available for the N7. That is unsurprising, given the newness of the hardware and the fact that CyanogenMod has not yet moved to the Jelly Bean release. But the N7 is an unlocked (or, at least, easily unlockable) device, so one can expect that alternative distributions will become available for it in due time.

Using the N7

Android on tablets has matured considerably since the initial "Honeycomb" release featured on the Xoom. For the most part, things work nicely, at least as far as the standard Google applications are concerned. The ability of third-party applications to work well on larger screens is still highly variable. One bit of remaining confusion is the "menu" button, which appears in different places in different applications, or is absent altogether. Playing the "find the menu" game is a common part of learning any new application. One gets the sense that the Android developers would like to do away with menus altogether, but there are many practical difficulties in doing so.

Perhaps the most jarring change is the switch to Chrome as the built-in web browser. The standard Android browser wasn't perfect, but it had accumulated some nice features over the years. Chrome is capable and fully-featured, and it arguably makes sense for Google to focus on supporting a single browser. But your editor misses the "auto-fit pages" option and the "quick controls" provided by the Android browser. Getting around with Chrome just seems to be a slower process requiring more taps and gestures. Undoubtedly there is a way to get the Android browser onto the N7, but, so far, time has been short and a quick search came up empty.

The N7's front-facing camera is clearly not meant for any sort of photographic use, unless one is especially interested in self portraits. It is useful for the "face unlock" feature, naturally. It is also clearly meant for use with applications like Skype; the N7 should make a very nice video network phone. Unfortunately, video calls in Skype fail to work on your editor's device. Some searching indicates that it works for some people and fails for others; sometimes installing the camera application helps, but not in this case. At this time, the N7 does not appear to be ready for this kind of use.

One need not have an especially conspiracy-theoretical mindset to surmise that Skype's owner (a small company called "Microsoft") might just have an incentive to ensure that Skype works better on its own operating system than on Android. But the truth of the matter is probably more prosaic: by all accounts, the Skype application is just not an example of stellar software engineering. Unfortunately, it is an example of proprietary software, so there is no way for anybody but Skype to fix it. There should really be a place for a free-software video calling application that (1) actually works, and (2) can be verified to lack backdoors for government agencies and anybody else interested in listening in on conversations. But that application does not seem to exist at this time, alas.

Electronic books

Another obvious use case for a 7" tablet is as an electronic book reader. The N7 has some obvious disadvantages relative to the current crop of electronic-ink readers, though: it weighs about twice as much, has a fraction of the battery life, and has a backlit screen that is harder to stare at for hours. Still, it is worth considering for this role; its presence in the travel bag is more easily justified if it can displace another device.

The N7 hardware, in the end, puts in a credible, though not stellar, performance as a book reader. The extra weight is noticeable, but the tablet still weighs less than most books. The rated battery life for reading is about nine hours, possibly extendable by turning off the wireless interface. Nine hours will get one through an international travel experience of moderate length, but one misses the battery life of a proper reader device that can go for weeks at a time without a recharge. The lack of dedicated buttons for page-turning and the like (which are commonly present on dedicated readers) is not a huge problem. The backlit display can actually be advantageous in situations where turning on the lights is frowned upon — when the spouse is sleeping, or on some airplanes, for example.

[Nexus 7]

On the software side, there are a number of reading applications available, ranging from the ultra-proprietary Google Books and Kindle applications to the (nice) GPL-licensed FBReader program. Experience shows that the rendering of text does not always work as well in applications like FBReader or Aldiko, though; white space used to separate sections within chapters can disappear, for example, and block quotes can be smashed into the surrounding paragraphs. Readers like Kindle do better in this regard. Another annoyance is that the tablet uses the MTP protocol over the USB connection, meaning that it does not work easily with Calibre. One can, of course, move book files manually or use Calibre's built-in web server to get books onto the device, but it would be a lot nicer if Calibre could just manage the on-device library directly.

In summary, while the experience for users of walled-garden book services is probably pretty good, it remains a bit rough for those wanting to take charge of the books that they so foolishly think they, by virtue of having paid for them, actually own. Beyond that, for content that goes beyond pure text — anything with pictures, for example — a tablet can provide a nicer experience. And, of course, the tablet offers the full Internet and all the other Android applications; whether that is considered to be an advantage in a book reader is almost certainly in the eye of the user.

In the long term, it seems clear that general-purpose tablets will displace dedicated reader devices, but the N7, arguably, is not quite there yet.

In general, though, the N7 works nicely as a media consumption device. It plays videos nicely and is a pleasant device for wandering around on the web. For people who are fully hooked into the Google machine it naturally provides a nicely integrated interface into all of the related services. For the rest of us the experience is a bit more uneven; your editor still yearns for a better email client, for example. But, even with its limitations, the N7 fills in nicely where one does not want to deal with a laptop, but where a phone screen is simply too limiting. This new tablet from Google is a nice device overall; it is likely to remain in active use for some time.


Page editor: Jonathan Corbet


The leap second of doom

By Jake Edge
August 1, 2012

Since the last leap second caused a certain amount of havoc on Linux systems, it was probably only a matter of time before someone came up with the idea of "testing" for vulnerable systems again. Leap seconds are only supposed to occur at the end of June and December, with six months' notice, so administrators might well have been waiting to update their servers for the problem until another was nigh. But "rogue" (or buggy) network time protocol (NTP) servers can effectively cause a leap second at the end of any month, which seems to be what happened on July 31.

It is not uncommon for "black hats" to keep exploiting vulnerabilities well after updates to fix them have been released. This situation is a bit different, though. While updating systems to avoid known vulnerabilities is clearly a "best practice", sometimes system administrators choose to delay updates, especially those that require a reboot, based on their sense of the likelihood of an attack. Given that no real leap seconds were scheduled, and the subversion of NTP servers (or traffic) may have seemed relatively unlikely, some (perhaps large) percentage of Linux systems have not been updated. But not all "attacks" are caused by black hats; the original problem was caused by a bug, and this one may also turn out that way.

Marco Marongiu appears to have been the first to notice the problem:

This is just to warn you that there are now some NTP servers around the globe spreading a leap second announcement for tomorrow 00:00:00 UTC (so, basically, in a few hours now).

If you didn't take action before the leapocalypse last month, you better hurry now.

Given that the notice (to the NTP questions mailing list) came less than four hours before the second "leapocalypse", it's hard to imagine that many administrators saw it in time to take action.

The most interesting question, of course, is how this could have happened. It is tempting to see it as some kind of worldwide denial of service attack, but that is probably not the most likely cause. Further discussion in the thread with Marongiu's warning points to another possible cause.

It seems that the NTP protocol has a "leap" flag (aka LI or leap indicator), which is a two-bit field that indicates whether a second should be inserted or deleted at the end of the current month. Adding a leap second at the end of any month does not correspond with current practice (June and December leap seconds only), but depending on which standard you look at, it is reasonable to do so. RFC 5905, which governs NTP, definitely allows leap seconds at the end of any month, however, so compliant implementations should allow that.

But that still leaves the question of why the LI flag was set to 1 (i.e. add a second at the end of the month). In the thread, "demonccc" noted a server with the flag set. Furthermore, Martin Burnicki described a problem his customers saw after June's leap second in which certain older NTP servers did not reset the leap flag after the event. That could cause leap seconds at the end of every month until it gets fixed.
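For readers who want to check what a server is announcing, the leap indicator lives in the top two bits of the first byte of an NTP packet, followed by a three-bit version number and a three-bit mode, per the RFC 5905 header layout. The Python sketch below decodes that byte; the function name and the sample byte values are made up for illustration, and no network traffic is involved.

```python
# Leap indicator values as defined in the RFC 5905 packet header.
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds (insert a leap second)",
    2: "last minute of the day has 59 seconds (delete a leap second)",
    3: "unknown (clock unsynchronized)",
}


def decode_first_byte(packet: bytes):
    """Split the first header byte into (leap, version, mode)."""
    if not packet:
        raise ValueError("empty packet")
    first = packet[0]
    leap = first >> 6             # top two bits
    version = (first >> 3) & 0x7  # next three bits
    mode = first & 0x7            # low three bits
    return leap, version, mode


if __name__ == "__main__":
    # Hypothetical first bytes of NTPv4 server replies (mode 4).
    for sample in (b"\x24", b"\x64", b"\xa4"):
        leap, version, mode = decode_first_byte(sample)
        print(f"{sample.hex()}: version {version}, mode {mode}, "
              f"LI={leap} ({LEAP_INDICATOR[leap]})")
```

A server that keeps answering with LI set to 1 after the event has passed would show exactly the behavior Burnicki described.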

While there aren't widespread reports of Linux systems going into infinite loops and burning up excess power (unlike June), it does appear to have affected some systems out there. The MythTV users mailing list has a thread about the problem, for example. If it is an actual attack, it is a clever one, but there are enough signs pointing to NTP server bugs that it's pretty unlikely.

Even if it is "just" caused by a bug (or bugs), it is still a bit worrisome. NTP has not generally been seen as a vector for attacks, but this situation shows that it could be. Unpatched systems could be targeted by man-in-the-middle attacks toward the end of every month, for example. Both leap-second occurrences (real and fake) point to the problems that can lurk in code that only truly gets tested once in a great while. One wonders what might happen to systems (patched or not) that receive a "subtract a second" NTP message, since there has never been a real negative leap second.


Brief items

Security quotes of the week

Your silly post reminded me of something, while on vacation recently I bought a video game called "Assassin's Creed Revelations". I didn't have much of a chance to play it, but it seems fun so far. However, I noticed the installation procedure creates a browser plugin for it's accompanying uplay launcher, which grants unexpectedly (at least to me) wide access to websites.

I don't know if it's by design, but I thought I'd mention it here in case someone else wants to look into it (I'm not really interested in video game security, I air-gap the machine I use to play games).

-- Tavis Ormandy discovers a root kit disguised as DRM (more here)

You hereby grant Ninja Tel permission to listen to, read, view and/or record any and all communications sent via the network to which you are a party. [...] Before you get all upset about this, you already know full well that AT&T does this for the NSA. You understand that you have no reasonable expectation of privacy as to any on the Ninja Tel network. You grant Ninja Tel a worldwide, perpetual, assignable, royalty-free license to use any and all recorded or real-time communications sent via the Ninja Tel network to which you are a party. Don't worry, most of this is for the lulz.
-- Terms of service for Ninja Tel, Defcon's private cell network


Privilege escalation vulnerability in the NVidia binary driver

People running the proprietary NVidia graphics driver on systems with untrusted users may want to have a look at this exploit posted by Dave Airlie. "I was given this anonymously, it has been sent to nvidia over a month ago with no reply or advisory and the original author wishes to remain anonymous but would like to have the exploit published at this time."


This Cute Chat Site Could Save Your Life And Help Overthrow Your Government (Wired)

Wired writes about Cryptocat, which is an AGPL3-licensed browser-based AES-256-encrypted chat program. It was created by 21-year-old Nadim Kobeissi, who is originally from Beirut, Lebanon and now goes to college in Montréal, Canada. "But Kobeissi also knows that it’s equally important that Cryptocat be usable and pretty. Kobeissi wants Cryptocat to be something you want to use, not just need to. Encrypted chat tools have existed for years — but have largely stayed in the hands of geeks, who usually aren’t the ones most likely to need strong crypto. 'Security is not just good crypto. It’s very important to have good crypto, and audit it. Security is not possible without (that), but security is equally impossible without making it accessible.'"

Comments (26 posted)

Martin: Off the Record Messaging: A Tutorial

Ben Martin has a lengthy tutorial on Off the Record (OTR) messaging on his blog. OTR is useful for realtime encrypted communication (e.g. instant messaging, IRC) and Martin's post looks at both the protocol and using libotr to add OTR support to C++ programs. "In order to operate without a web of trust, libotr implements the Socialist Millionaires' Protocol (SMP). The SMP allows two parties to verify that they both know the same secret. The secret might be a passphrase or answer to a private joke that two people will easily know. The SMP operates fine in the presence of eaves droppers (who don't get to learn the secret). Active communications tampering is not a problem, though of course it might cause the protocol not to complete successfully."

Comments (18 posted)

New vulnerabilities

apache-mod_auth_openid: local session ID disclosure

Package(s):apache-mod_auth_openid CVE #(s):CVE-2012-2760
Created:July 26, 2012 Updated:August 1, 2012

From the Mandriva advisory:

mod_auth_openid before 0.7 for Apache uses world-readable permissions for /tmp/mod_auth_openid.db, which allows local users to obtain session ids (CVE-2012-2760).

Mandriva MDVSA-2012:114 apache-mod_auth_openid 2012-07-26

Comments (none posted)

bacula: symlink attack

Package(s):bacula CVE #(s):CVE-2008-5373
Created:July 30, 2012 Updated:August 27, 2012
Description: From the CVE entry:

mtx-changer.Adic-Scalar-24 in bacula-common 2.4.2 allows local users to overwrite arbitrary files via a symlink attack on a /tmp/mtx.##### temporary file, probably a related issue to CVE-2005-2995.

Mageia MGASA-2012-0321 bacula 2012-11-06
Fedora FEDORA-2012-11717 bacula 2012-08-27
Fedora FEDORA-2012-10929 bacula 2012-07-29

Comments (none posted)

bind9: denial of service

Package(s):bind9 CVE #(s):CVE-2012-3817
Created:July 26, 2012 Updated:September 10, 2012

From the Ubuntu advisory:

Einar Lonn discovered that Bind incorrectly initialized the failing-query cache. A remote attacker could use this flaw to cause Bind to crash, resulting in a denial of service.

Oracle ELSA-2014-1984 bind 2014-12-12
openSUSE openSUSE-SU-2013:0605-1 bind 2013-04-03
Slackware SSA:2012-341-01 bind 2012-12-06
Gentoo 201209-04 bind 2012-09-23
Mageia MGASA-2012-0258 bind 2012-09-07
Mageia MGASA-2012-0257 bind 2012-09-07
Fedora FEDORA-2012-11146 bind 2012-08-09
Fedora FEDORA-2012-11153 bind 2012-08-09
Oracle ELSA-2012-1122 bind97 2012-07-31
Oracle ELSA-2012-1123 bind 2012-07-31
Scientific Linux SL-bind-20120731 bind 2012-07-31
Scientific Linux SL-bind-20120731 bind97 2012-07-31
CentOS CESA-2012:1123 bind 2012-07-31
CentOS CESA-2012:1122 bind97 2012-07-31
Debian DSA-2517-1 bind9 2012-07-30
Red Hat RHSA-2012:1123-01 bind 2012-07-31
Red Hat RHSA-2012:1122-01 bind97 2012-07-31
Mandriva MDVSA-2012:119 bind 2012-07-29
Ubuntu USN-1518-1 bind9 2012-07-26
openSUSE openSUSE-SU-2012:0969-1 bind 2012-08-08
openSUSE openSUSE-SU-2012:0971-1 bind 2012-08-08

Comments (none posted)

ganglia: code execution

Package(s):ganglia CVE #(s):
Created:July 26, 2012 Updated:April 9, 2013

From the Ganglia advisory:

There is a security issue in Ganglia Web going back to at least 3.1.7 which can lead to arbitrary script being executed with web user privileges possibly leading to a machine compromise.

Mandriva MDVSA-2013:080 ganglia 2013-04-09
Mageia MGASA-2012-0277 ganglia 2012-09-30
Fedora FEDORA-2012-10699 ganglia 2012-07-26
Fedora FEDORA-2012-10727 ganglia 2012-07-26

Comments (none posted)

icedtea-web: code execution

Package(s):icedtea-web CVE #(s):CVE-2012-3422 CVE-2012-3423
Created:August 1, 2012 Updated:September 24, 2012
Description: From the Red Hat advisory:

An uninitialized pointer use flaw was found in the IcedTea-Web plug-in. Visiting a malicious web page could possibly cause a web browser using the IcedTea-Web plug-in to crash, disclose a portion of its memory, or execute arbitrary code. (CVE-2012-3422)

It was discovered that the IcedTea-Web plug-in incorrectly assumed all strings received from the browser were NUL terminated. When using the plug-in with a web browser that does not NUL terminate strings, visiting a web page containing a Java applet could possibly cause the browser to crash, disclose a portion of its memory, or execute arbitrary code. (CVE-2012-3423)

Gentoo 201406-32 icedtea-bin 2014-06-29
SUSE SUSE-SU-2013:1174-1 icedtea-web 2013-07-10
openSUSE openSUSE-SU-2013:0966-1 icedtea-web 2013-06-10
SUSE SUSE-SU-2013:0851-1 icedtea-web 2013-05-31
openSUSE openSUSE-SU-2013:0893-1 icedtea-web 2013-06-10
openSUSE openSUSE-SU-2013:0826-1 icedtea-web 2013-05-24
Fedora FEDORA-2012-14340 icedtea-web 2012-09-21
Fedora FEDORA-2012-14316 icedtea-web 2012-09-21
Mandriva MDVSA-2012:122 icedtea-web 2012-08-02
Ubuntu USN-1521-1 icedtea-web 2012-07-31
Scientific Linux SL-iced-20120801 icedtea-web 2012-08-01
Oracle ELSA-2012-1132 icedtea-web 2012-07-31
CentOS CESA-2012:1132 icedtea-web 2012-07-31
Red Hat RHSA-2012:1132-01 icedtea-web 2012-07-31
openSUSE openSUSE-SU-2012:0981-1 icedtea-web 2012-08-10
SUSE SUSE-SU-2012:0979-1 icedtea-web 2012-08-09
openSUSE openSUSE-SU-2012:0982-1 update 2012-08-13
Mageia MGASA-2012-0198 icedtea-web 2012-08-03

Comments (none posted)

isc-dhcp: multiple vulnerabilities

Package(s):isc-dhcp CVE #(s):CVE-2012-3571 CVE-2012-3954
Created:July 26, 2012 Updated:August 6, 2012

From the Debian advisory:

CVE-2012-3571: Markus Hietava of the Codenomicon CROSS project discovered that it is possible to force the server to enter an infinite loop via messages with malformed client identifiers.

CVE-2012-3954: Glen Eustace discovered that DHCP servers running in DHCPv6 mode and possibly DHCPv4 mode suffer from memory leaks while processing messages. An attacker can use this flaw to exhaust resources and perform denial of service attacks.

Oracle ELSA-2013-0504 dhcp 2013-02-25
Gentoo 201301-06 dhcp 2013-01-09
Mageia MGASA-2012-0256 dhcp 2012-09-07
Slackware SSA:2012-237-01 dhcp 2012-08-24
Scientific Linux SL-dhcp-20120803 dhcp 2012-08-03
Oracle ELSA-2012-1141 dhcp 2012-08-03
Oracle ELSA-2012-1140 dhcp 2012-08-03
CentOS CESA-2012:1141 dhcp 2012-08-03
CentOS CESA-2012:1140 dhcp 2012-08-03
Red Hat RHSA-2012:1141-01 dhcp 2012-08-03
Red Hat RHSA-2012:1140-01 dhcp 2012-08-03
Fedora FEDORA-2012-11079 dhcp 2012-08-01
Debian DSA-2519-1 isc-dhcp 2012-08-01
Ubuntu USN-1519-1 isc-dhcp 2012-07-26
Mandriva MDVSA-2012:115 dhcp 2012-07-26
Mandriva MDVSA-2012:116 dhcp 2012-07-26
Debian DSA-2516-1 isc-dhcp 2012-07-26
openSUSE openSUSE-SU-2012:1006-1 update 2012-08-20
Fedora FEDORA-2012-11110 dhcp 2012-08-06
Debian DSA-2519-2 isc-dhcp 2012-08-04

Comments (none posted)

krb5: denial of service

Package(s):krb5 CVE #(s):CVE-2012-1015
Created:August 1, 2012 Updated:August 6, 2012
Description: From the Red Hat advisory:

An uninitialized pointer use flaw was found in the way the MIT Kerberos KDC handled initial authentication requests (AS-REQ). A remote, unauthenticated attacker could use this flaw to crash the KDC via a specially-crafted AS-REQ request.

Gentoo 201312-12 mit-krb5 2013-12-16
Mandriva MDVSA-2013:042 krb5 2013-04-05
openSUSE openSUSE-SU-2012:0967-1 krb5 2012-08-08
Mandriva MDVSA-2012:111 krb5 2012-08-01
Ubuntu USN-1520-1 krb5 2012-07-31
Scientific Linux SL-krb5-20120801 krb5 2012-08-01
Oracle ELSA-2012-1131 krb5 2012-07-31
Debian DSA-2518-1 krb5 2012-07-31
CentOS CESA-2012:1131 krb5 2012-07-31
Red Hat RHSA-2012:1131-01 krb5 2012-07-31
Fedora FEDORA-2012-11388 krb5 2012-08-05
Fedora FEDORA-2012-11370 krb5 2012-08-09
Mageia MGASA-2012-0196 krb5 2012-08-03

Comments (none posted)

krb5: code execution

Package(s):krb5 CVE #(s):CVE-2012-1014
Created:August 1, 2012 Updated:March 18, 2013
Description: From the Debian advisory:

By sending specially crafted AS-REQ (Authentication Service Request) to a KDC (Key Distribution Center), an attacker could make it free an uninitialized pointer, corrupting the heap. This can lead to process crash or even arbitrary code execution.

Gentoo 201312-12 mit-krb5 2013-12-16
openSUSE openSUSE-SU-2012:0967-1 krb5 2012-08-08
Ubuntu USN-1520-1 krb5 2012-07-31
Debian DSA-2518-1 krb5 2012-07-31
Fedora FEDORA-2012-11388 krb5 2012-08-05

Comments (none posted)

krb5: information disclosure

Package(s):krb5 CVE #(s):CVE-2012-1012
Created:August 1, 2012 Updated:August 1, 2012
Description: From the Ubuntu advisory:

It was discovered that the kadmin protocol implementation in MIT krb5 did not properly restrict access to the SET_STRING and GET_STRINGS operations. A remote authenticated attacker could use this to expose or modify sensitive information. This issue only affected Ubuntu 12.04 LTS.

Ubuntu USN-1520-1 krb5 2012-07-31

Comments (none posted)

libjpeg-turbo: code execution

Package(s):libjpeg-turbo CVE #(s):CVE-2012-2806
Created:August 1, 2012 Updated:April 8, 2013
Description: From the Novell bugzilla:

A heap-based buffer overflow was found in the way libjpeg-turbo decompressed certain corrupt JPEG images in which the component count was erroneously set to a large value. An attacker could create a specially-crafted JPEG image that, when opened, could cause an application using libjpeg-turbo to crash or, possibly, execute arbitrary code with the privileges of the user running the application.

Mandriva MDVSA-2013:274 libjpeg 2013-11-21
Mandriva MDVSA-2013:044 libjpeg 2013-04-05
Gentoo 201209-13 libjpeg-turbo 2012-09-26
Fedora FEDORA-2012-10721 libjpeg-turbo 2012-08-09
Mageia MGASA-2012-0203 libjpeg 2012-08-06
Mandriva MDVSA-2012:121 libjpeg-turbo 2012-08-01
openSUSE openSUSE-SU-2012:0932-1 libjpeg-turbo 2012-08-01

Comments (none posted)

libpng14: denial of service

Package(s):libpng14 CVE #(s):CVE-2012-3425
Created:August 1, 2012 Updated:August 1, 2012
Description: libpng crashes when loading a corrupted image.
Ubuntu USN-2815-1 libpng 2015-11-19
Debian-LTS DLA-343-1 libpng 2015-11-17
openSUSE openSUSE-SU-2012:0934-1 libpng14 2012-08-01
Debian-LTS DLA-375-1 libpng 2015-12-27
Debian-LTS DLA-375-1 ia32-libs 2016-01-01

Comments (none posted)

puppet: IP address impersonation

Package(s):puppet CVE #(s):CVE-2012-3408
Created:July 30, 2012 Updated:August 1, 2012
Description: From the Red Hat bugzilla:

From puppet labs: Puppet agents with certnames of IP addresses can be impersonated

This affects Puppet 2.6.16 and 2.7.17

If an authenticated host with a certname of an IP address changes IP addresses, and a second host assumes the first host's former IP address, the second host will be treated by the puppet master as the first one, giving the second host access to the first host's catalog. Note: This will not be fixed in Puppet versions prior to the forthcoming 3.x. Instead, with this announcement IP-based authentication in Puppet < 3.x is deprecated.

Resolved in Puppet 2.6.17, 2.7.18

Fedora FEDORA-2012-10891 puppet 2012-07-28

Comments (none posted)

wireshark: remote denial of service

Package(s):wireshark CVE #(s):CVE-2012-4048 CVE-2012-4049
Created:August 1, 2012 Updated:December 26, 2012
Description: From the CVE entries:

The PPP dissector in Wireshark 1.4.x before 1.4.14, 1.6.x before 1.6.9, and 1.8.x before 1.8.1 allows remote attackers to cause a denial of service (invalid pointer dereference and application crash) via a crafted packet, as demonstrated by a usbmon dump. (CVE-2012-4048)

epan/dissectors/packet-nfs.c in the NFS dissector in Wireshark 1.4.x before 1.4.14, 1.6.x before 1.6.9, and 1.8.x before 1.8.1 allows remote attackers to cause a denial of service (loop and CPU consumption) via a crafted packet. (CVE-2012-4049)

Gentoo GLSA 201308-05:02 wireshark 2013-08-30
Gentoo 201308-05 wireshark 2013-08-28
Mandriva MDVSA-2013:055 wireshark 2013-04-05
Debian DSA-2590-1 wireshark 2012-12-26
openSUSE openSUSE-SU-2012:0930-1 wireshark 2012-08-01
Mageia MGASA-2012-0210 wireshark 2012-08-12
Mageia MGASA-2012-0206 wireshark 2012-08-12
Mandriva MDVSA-2012:125 wireshark 2012-08-06

Comments (none posted)

xen: denial of service

Package(s):xen CVE #(s):CVE-2012-2625
Created:August 1, 2012 Updated:September 14, 2012
Description: From the Red Hat advisory:

A flaw was found in the way the pyGrub boot loader handled compressed kernel images. A privileged guest user in a para-virtualized guest (a DomU) could use this flaw to create a crafted kernel image that, when attempting to boot it, could result in an out-of-memory condition in the privileged domain (the Dom0).

openSUSE openSUSE-SU-2012:1573-1 XEN 2012-11-26
openSUSE openSUSE-SU-2012:1572-1 XEN 2012-11-26
openSUSE openSUSE-SU-2012:1172-1 Xen 2012-09-14
openSUSE openSUSE-SU-2012:1174-1 Xen 2012-09-14
SUSE SUSE-SU-2012:1135-1 Xen 2012-09-07
SUSE SUSE-SU-2012:1044-1 Xen 2012-08-27
SUSE SUSE-SU-2012:1043-1 Xen and libvirt 2012-08-27
Scientific Linux SL-xen-20120801 xen 2012-08-01
Oracle ELSA-2012-1130 xen 2012-08-01
CentOS CESA-2012:1130 xen 2012-07-31
Red Hat RHSA-2012:1130-01 xen 2012-07-31

Comments (none posted)

xrdp: weak encryption

Package(s):xrdp CVE #(s):
Created:July 31, 2012 Updated:August 1, 2012
Description: From the SUSE advisory:

The XRDP service was changed so that the default crypto level in XRDP was changed from "low" to "high".

This switches from using a 40 bit encryption to a 128 bit two-way encryption.

SUSE SUSE-SU-2012:0927-1 xrdp 2012-07-31

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The 3.6 merge window remains open, so there is no current development kernel release. Changes continue to move into the mainline; see the separate article below for details.

Stable updates: 3.2.24 was released on July 26, 3.4.7 came out on July 30, and 3.0.39 was released on August 1. In addition to the usual fixes, 3.0.39 includes a significant set of backported memory management performance patches.

The 3.2.25 update is in the review process as of this writing; it can be expected on or after August 2.

Comments (none posted)

Quotes of the week

If someone has to read the code to find out what the driver is, your help text probably sucks.
Dave Jones

The number of underscores for the original rcu_dereference()'s local variable was the outcome of an argument about how obfuscated that variable's name should be in order to avoid possible collisions with names in the enclosing scope. Nine leading underscores might seem excessive, or even as you say, insane, but on the other hand no name collisions have ever come to my attention.
Paul McKenney

For cases like that, I will do the merge myself, but I'll actually double-check my merge against the maintainer merge. And it's happened more than once that my merge has differed, and _my_ merge is the correct one. The maintainer may know his code better, but I know my merging. I do a ton of them.
Linus Torvalds

Comments (none posted)

RIP Andre Hedrick (Register)

The Register has an article on the life and death of Andre Hedrick, the former kernel IDE maintainer who passed away on July 13. "Today, millions of people use digital restriction management systems that lock down books, songs and music - the Amazon Kindle, the BBC iPlayer and Spotify are examples - but consumers enter into the private commercial agreement knowingly. It isn't set by default in the factory, as it might have been. The PC remains open rather than becoming an appliance. Andre was never comfortable taking the credit he really deserved for this achievement." See also this weblog page where memories are being collected.

Comments (34 posted)

Garzik: An Andre To Remember

Jeff Garzik has shared his memories of Andre Hedrick on the linux-kernel mailing list; worth a read. "This is a time for grief and a time for celebration of Andre's accomplishments, but also it is a time to look around at our fellow geeks and offer our support, if similar behavioral signs appear."

Full Story (comments: 68)

Kernel development news

3.6 merge window part 2

By Jonathan Corbet
August 1, 2012
As of this writing, just over 8,200 non-merge changesets have been pulled into Linus's repository; that's nearly 4,000 since last week's summary. It seems that any hopes that 3.6 might be a relatively low-volume cycle are not meant to be fulfilled. That said, things seem to be going relatively smoothly, with only a small number of problems being reported so far.

User-visible changes merged since last week include:

  • The btrfs send/receive feature has been merged. Send/receive can calculate the differences between two btrfs subvolumes or snapshots and serialize the result; it can be used for, among other things, easy mirroring of volumes and incremental backups.

  • Btrfs has also gained the ability to apply disk quotas to subvolumes. According to btrfs maintainer Chris Mason, "This enables full tracking of how many blocks are allocated to each subvolume (and all snapshots) and you can set limits on a per-subvolume basis. You can also create quota groups and toss multiple subvolumes into a big group. It's everything you need to be a web hosting company and give each user their own subvolume."

  • The kernel has gained better EFI booting support. This should allow the removal of a lot of EFI setup code from various bootloaders, which now need only load the kernel and jump into it.

  • The new "coupled cpuidle" code enables better CPU power management on systems where CPUs cannot be powered down individually. See this commit for more information on how this feature works.

  • The LED code supports a new "oneshot" mode where applications can request a single LED blink via sysfs. See Documentation/leds/ledtrig-oneshot.txt for details.

  • A number of random number generator changes have been merged, hopefully leading to more secure random numbers, especially on embedded devices.

  • The VFIO subsystem, intended to be a safe mechanism for the creation of user-space device drivers, has been merged; see Documentation/vfio.txt for more information.

  • The swap-over-NFS patch set has been merged, making the placement of swap files on NFS-mounted filesystems a not entirely insane thing to do.

  • New hardware support includes:

    • Processors and systems: Loongson 1B CPUs.

    • Audio: Wolfson Micro "Arizona" audio controllers (WM5102 and WM5110 in particular).

    • Input: NXP LPC32XX key scanners, MELFAS MMS114 touchscreen controllers, and EDT ft5x06 based polytouch devices.

    • Miscellaneous: National Semiconductor/TI LM3533 ambient light sensors, Analog Devices AD9523 clock generators, Analog Devices ADF4350/ADF4351 wideband synthesizers, Analog Devices AD7265/AD7266 analog-to-digital converters, Analog Devices AD-FMCOMMS1-EBZ SPI-I2C-bridges, Microchip MCP4725 digital-to-analog converters, Maxim DS28E04-100 1-Wire EEPROMs, Vishay VCNL4000 ambient light/proximity sensors, Texas Instruments OMAP4+ temperature sensors, EXYNOS HW random number generators, Atmel AES, SHA1/SHA256, and TDES crypto accelerators, Blackfin CRC accelerators, AMD 8111 GPIO controllers, TI LM3556 and LP8788 LED controllers, BlinkM I2C RGB LED controllers, Calxeda Highbank memory controllers, Maxim Semiconductor MAX77686 PMICs, Marvell 88PM800 and 88PM805 PMICs, Lantiq Falcon SPI controllers, and Broadcom BCM63xx random number generators.

    • Networking: Cambridge Silicon Radio wireless controllers.

    • USB: Freescale i.MX ci13xxx USB controllers, Marvell PXA2128 USB 3.0 controllers, and Maxim MAX77693 MUIC USB port accessory detectors.

    • Video4Linux: Realtek RTL2832 DVB-T demodulators, Analog Devices ADV7393 encoders, Griffin radioSHARK and radioSHARK2 USB radio receivers, and IguanaWorks USB IR transceivers.

    • Staging graduations: IIO digital-to-analog converter drivers.

Changes visible to kernel developers include:

  • The pstore persistent storage mechanism has improved handling of console log messages. The Android RAM buffer console mechanism has been removed, since pstore is now able to provide all of the same functionality. Pstore has also gained function tracer support, allowing the recording of function calls prior to a panic.

  • The new PWM framework eases the writing of drivers for pulse-width modulation devices, including LEDs, fans, and more. See Documentation/pwm.txt for details.

  • There is a new utility function:

         size_t memweight(const void *ptr, size_t bytes);

    It returns the number of bits set in the given memory region.

  • The fault injection subsystem has a new module which can inject errors into notifier call chains.

  • There is a new "flexible proportions" library allowing the calculation of proportions over a variable period. See <linux/flex_proportions.h> for the interface.

  • The new __GFP_MEMALLOC flag allows memory allocations to dip into the emergency reserves.

  • The IRQF_SAMPLE_RANDOM interrupt flag no longer does anything; it has been removed from the kernel.
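As a rough illustration of what the memweight() helper listed above computes, here is a minimal user-space sketch. The name memweight_sketch() and the byte-at-a-time loop are assumptions made for this example; the in-kernel implementation may well be organized differently (for instance, by working on whole words).

```c
#include <stddef.h>

/*
 * User-space sketch only: count the bits set in the `bytes` bytes that
 * start at `ptr`.  The function name is hypothetical, not a kernel symbol.
 */
size_t memweight_sketch(const void *ptr, size_t bytes)
{
    const unsigned char *p = ptr;
    size_t weight = 0;

    while (bytes--) {
        unsigned char b = *p++;

        /* Each pass clears the lowest set bit of the current byte. */
        while (b) {
            b &= (unsigned char)(b - 1);
            weight++;
        }
    }
    return weight;
}
```

Called on the four bytes 0xff, 0x01, 0x00, 0xf0, the sketch adds up 8 + 1 + 0 + 4 set bits.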

Andrew Morton's big pile of patches was merged on August 1; that is usually a sign that the merge window is nearing its end. Expect a brief update after the 3.6 merge window closes, but, at this point, the feature set for this release can be expected to be nearly complete.

Comments (1 posted)


ACCESS_ONCE()

By Jonathan Corbet
August 1, 2012
Even a casual reader of the kernel source code is likely to run into invocations of the ACCESS_ONCE() macro eventually; there are well over 200 of them in the current source tree. Many such readers probably do not stop to understand just what that macro means; a recent discussion on the mailing list made it clear that even core kernel developers may not have a firm idea of what it does. Your editor was equally ignorant but decided to fix that; the result, hopefully, is a reasonable explanation of why ACCESS_ONCE() exists and when it must be used.

The functionality of this macro is actually well described by its name; its purpose is to ensure that the value passed as a parameter is accessed exactly once by the generated code. One might well wonder why that matters. It comes down to the fact that the C compiler will, if not given reasons to the contrary, assume that there is only one thread of execution in the address space of the program it is compiling. Concurrency is not built into the C language itself, so mechanisms for dealing with concurrent access must be built on top of the language; ACCESS_ONCE() is one such mechanism.

Consider, for example, the following code snippet from kernel/mutex.c:

    for (;;) {
        struct task_struct *owner;

        owner = ACCESS_ONCE(lock->owner);
        if (owner && !mutex_spin_on_owner(lock, owner))
        /* ... */

This is a small piece of the adaptive spinning code that hopes to quickly grab a mutex once the current owner drops it, without going to sleep. There is much more to this for loop than has been shown here, but this code is sufficient to show why ACCESS_ONCE() can be necessary.

Imagine for a second that the compiler in use is developed by fanatical developers who will optimize things in every way they can. This is not a purely hypothetical scenario; as Paul McKenney recently attested: "I have seen the glint in their eyes when they discuss optimization techniques that you would not want your children to know about!" These developers might create a compiler that concludes that, since the code in question does not actually modify lock->owner, it is not necessary to actually fetch its value each time through the loop. The compiler might then rearrange the code into something like:

    owner = lock->owner;    /* plain load, hoisted out of the loop */
    for (;;) {
        if (owner && !mutex_spin_on_owner(lock, owner))

What the compiler has missed is the fact that lock->owner is being changed by another thread of execution entirely. The result is code that will fail to notice any such changes as it executes the loop multiple times, leading to unpleasant results. The ACCESS_ONCE() call prevents this optimization from happening, with the result that the code (hopefully) executes as intended.

As it happens, an optimized-out access is not the only peril that this code could encounter. Some processor architectures (x86, for example) are not richly endowed with registers; on such systems, the compiler must make careful choices regarding which values to keep in registers if it is to generate the highest-performing code. Specific values may be pushed out of the register set, then pulled back in later. Should that happen to the mutex code above, the result could be multiple references to lock->owner. And that could cause trouble; if the value of lock->owner changed in the middle of the loop, the code, which is expecting the value of its local owner variable to remain constant, could become fatally confused. Once again, the ACCESS_ONCE() invocation tells the compiler not to do that, avoiding potential problems.

The actual implementation of ACCESS_ONCE(), found in <linux/compiler.h>, is fairly straightforward:

    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

In other words, it works by turning the relevant variable, temporarily, into a volatile type.
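The effect can be tried outside the kernel with a small sketch like the one below. It is only an illustration: the flag stop_requested and both functions are hypothetical names, and the macro is spelled with __typeof__ (the GCC and Clang alternative keyword) so that it also builds in strict ISO C modes. Note that a volatile access only constrains the compiler; by itself it provides no atomicity or ordering guarantees between CPUs.

```c
/* Same shape as the kernel macro, using the __typeof__ spelling. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical shared flag; imagine another thread or a signal handler sets it. */
static int stop_requested;

/*
 * Spin until stop_requested becomes nonzero or max_spins passes have been
 * made.  The volatile access forces a fresh load on every pass, so the
 * compiler may not hoist the read out of the loop.
 */
unsigned long spin_until_stopped(unsigned long max_spins)
{
    unsigned long spins = 0;

    while (!ACCESS_ONCE(stop_requested) && spins < max_spins)
        spins++;
    return spins;
}

/* The same macro on the left-hand side forces the store to be emitted. */
void request_stop(void)
{
    ACCESS_ONCE(stop_requested) = 1;
}
```

Without the macro, a compiler that can see that nothing in the loop writes stop_requested would be entitled to read it a single time before entering the loop.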

Given the kinds of hazards presented by optimizing compilers, one might well wonder why this kind of situation does not come up more often. The answer is that most concurrent access to data is (or certainly should be) protected by locks. Spinlocks and mutexes both function as optimization barriers, meaning that they prevent optimizations on one side of the barrier from carrying over to the other. If code only accesses a shared variable with the relevant lock held, and if that variable can only change when the lock is released (and held by a different thread), the compiler will not create subtle problems. It is only in places where shared data is accessed without locks (or explicit barriers) that a construct like ACCESS_ONCE() is required. Scalability pressures are causing the creation of more of this type of code, but most kernel developers still should not need to worry about ACCESS_ONCE() most of the time.

Comments (82 posted)

TCP Fast Open: expediting web services

By Michael Kerrisk
August 1, 2012

Much of today's Internet traffic takes the form of short TCP data flows that consist of just a few round trips exchanging data segments before the connection is terminated. The prototypical example of this kind of short TCP conversation is the transfer of web pages over the Hypertext Transfer Protocol (HTTP).

The speed of TCP data flows is dependent on two factors: transmission delay (the width of the data pipe) and propagation delay (the time that the data takes to travel from one end of the pipe to the other). Transmission delay is dependent on network bandwidth, which has increased steadily and substantially over the life of the Internet. On the other hand, propagation delay is a function of router latencies, which have not improved to the same extent as network bandwidth, and the speed of light, which has remained stubbornly constant. (At intercontinental distances, this physical limitation means that—leaving aside router latencies—transmission through the medium alone requires several milliseconds.) The relative change in the weighting of these two factors means that over time the propagation delay has become a steadily larger component in the overall latency of web services. (This is especially so for many web pages, where a browser often opens several connections to fetch multiple small objects that compose the page.)

Reducing the number of round trips required in a TCP conversation has thus become a subject of keen interest for companies that provide web services. It is therefore unsurprising that Google should be the originator of a series of patches to the Linux networking stack to implement the TCP Fast Open (TFO) feature, which allows the elimination of one round-trip time (RTT) from certain kinds of TCP conversations. According to the implementers (in "TCP Fast Open", CoNEXT 2011 [PDF]), TFO could result in speed improvements of between 4% and 41% in the page load times on popular web sites.

We first wrote about TFO back in September 2011, when the idea was still in the development stage. Now that the TFO implementation is starting to make its way into the kernel, it's time to visit it in more detail.

The TCP three-way handshake

To understand the optimization performed by TFO, we first need to note that each TCP conversation begins with a round trip in the form of the so-called three-way handshake. The three-way handshake is initiated when a client makes a connection request to a server. At the application level, this corresponds to a client performing a connect() system call to establish a connection with a server that has previously bound a socket to a well-known address, called listen() to mark it as a passive socket, and then called accept() to receive incoming connections. Figure 1 shows the details of the three-way handshake in diagrammatic form.

[TCP Three-Way Handshake]
Figure 1: TCP three-way handshake between a client and a server
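The application-level view of this exchange can be sketched with the standard sockets API. The function below is a hypothetical example, not code from the TFO patches: it plays both roles in one process over the loopback interface and relies on the kernel completing the handshake for a listening socket before accept() is called. Error handling is kept deliberately minimal.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Run one client/server exchange over loopback.  Returns the number of
 * bytes the server side received, or -1 on error.
 */
ssize_t loopback_handshake_demo(const char *msg)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[128];
    ssize_t n = -1;
    int lfd, cfd = -1, sfd = -1;

    /* Server side: bind to an ephemeral loopback port and listen. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel choose a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Client side: connect() triggers the SYN, SYN-ACK, ACK exchange. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* Only after connect() returns can application data be sent. */
    if (write(cfd, msg, strlen(msg)) < 0)
        goto out;

    /* Server side: accept() hands back the established connection. */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;
    n = read(sfd, buf, sizeof(buf));
out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return n;
}
```

The point of interest is the ordering on the client side: write() can only follow a successful connect(), so the request necessarily trails the handshake.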

During the three-way handshake, the two TCP end-points exchange SYN (synchronize) segments containing options that govern the subsequent TCP conversation—for example, the maximum segment size (MSS), which specifies the maximum number of data bytes that a TCP end-point can receive in a TCP segment. The SYN segments also contain the initial sequence numbers (ISNs) that each end-point selects for the conversation (labeled M and N in Figure 1).

The three-way handshake serves another purpose with respect to connection establishment: in the (unlikely) event that the initial SYN is duplicated (this may occur, for example, because underlying network protocols duplicate network packets), then the three-way handshake allows the duplication to be detected, so that only a single connection is created. If a connection was established before completion of the three-way handshake, then a duplicate SYN could cause a second connection to be created.

The problem with current TCP implementations is that data can only be exchanged on the connection after the initiator of the connection has received an ACK (acknowledge) segment from the peer TCP. In other words, data can be sent from the client to the server only in the third step of the three-way handshake (the ACK segment sent by the initiator). Thus, one full round trip time is lost before data is even exchanged between the peers. This lost RTT is a significant component of the latency of short web conversations.

Applications such as web browsers try to mitigate this problem using HTTP persistent connections, whereby the browser holds a connection open to the web server and reuses that connection for later HTTP requests. However, the effectiveness of this technique is decreased because idle connections may be closed before they are reused. For example, in order to limit resource usage, busy web servers often aggressively close idle HTTP connections. The result is that a high proportion of HTTP requests are cold, requiring a new TCP connection to be established to the web server.

Eliminating a round trip

Theoretically, the initial SYN segment could contain data sent by the initiator of the connection: RFC 793, the specification for TCP, does permit data to be included in a SYN segment. However, TCP is prohibited from delivering that data to the application until the three-way handshake completes. This is a necessary security measure to prevent various kinds of malicious attacks. For example, if a malicious client sent a SYN segment containing data and a spoofed source address, and the server TCP passed that segment to the server application before completion of the three-way handshake, then the segment would both cause resources to be consumed on the server and cause (possibly multiple) responses to be sent to the victim host whose address was spoofed.

The aim of TFO is to eliminate one round trip time from a TCP conversation by allowing data to be included as part of the SYN segment that initiates the connection. TFO is designed to do this in such a way that the security concerns described above are addressed. (T/TCP, a mechanism designed in the early 1990s, also tried to provide a way of short circuiting the three-way handshake, but fundamental security flaws in its design meant that it never gained wide use.)

On the other hand, the TFO mechanism does not detect duplicate SYN segments. (This was a deliberate choice made to simplify design of the protocol.) Consequently, servers employing TFO must be idempotent—they must tolerate the possibility of receiving duplicate initial SYN segments containing the same data and produce the same result regardless of whether one or multiple such SYN segments arrive. Many web services are idempotent, for example, web servers that serve static web pages in response to URL requests from browsers, or web services that manipulate internal state but have internal application logic to detect (and ignore) duplicate requests from the same client.
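The duplicate-detection logic mentioned above can be pictured as a small table of recently seen request identifiers. The following is only a sketch: the request-ID scheme, the `seen_table` structure, and the names `seen_before()` and `apply_once()` are assumptions made for illustration and are not part of TFO or of any particular server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_SLOTS 64   /* hypothetical table size */

/* Ring of recently seen (client address, request ID) pairs. */
struct seen_table {
    uint64_t keys[SEEN_SLOTS];
    size_t   count;   /* number of valid entries, at most SEEN_SLOTS */
    size_t   next;    /* slot to overwrite next */
};

/* Returns true if this pair was already recorded; otherwise records it. */
static bool seen_before(struct seen_table *t, uint32_t client_ip,
                        uint32_t request_id)
{
    uint64_t key = ((uint64_t) client_ip << 32) | request_id;

    for (size_t i = 0; i < t->count; i++)
        if (t->keys[i] == key)
            return true;

    t->keys[t->next] = key;
    t->next = (t->next + 1) % SEEN_SLOTS;
    if (t->count < SEEN_SLOTS)
        t->count++;
    return false;
}

/* Apply a state-changing request at most once, even if the SYN that
   carries it arrives more than once. Returns the current counter. */
static int apply_once(struct seen_table *t, uint32_t client_ip,
                      uint32_t request_id, int *counter)
{
    assert(t != NULL && counter != NULL);
    if (!seen_before(t, client_ip, request_id))
        (*counter)++;
    return *counter;
}
```

With a zero-initialized table, calling `apply_once()` twice with the same client address and request ID changes the counter only once, which is the property a TFO server needs from its application logic.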

In order to prevent the aforementioned malicious attacks, TFO employs security cookies (TFO cookies). The TFO cookie is generated once by the server TCP and returned to the client TCP for later reuse. The cookie is constructed by encrypting the client IP address in a fashion that is reproducible (by the server TCP) but is difficult for an attacker to guess. Request, generation, and exchange of the TFO cookie happen entirely transparently to the application layer.

At the protocol layer, the client requests a TFO cookie by sending a SYN segment to the server that includes a special TCP option asking for a TFO cookie. The SYN segment is otherwise "normal"; that is, there is no data in the segment and establishment of the connection still requires the normal three-way handshake. In response, the server generates a TFO cookie that is returned in the SYN-ACK segment that the server sends to the client. The client caches the TFO cookie for later use. The steps in the generation and caching of the TFO cookie are shown in Figure 2.

[Generating the TFO cookie]
Figure 2: Generating the TFO cookie

At this point, the client TCP now has a token that it can use to prove to the server TCP that an earlier three-way handshake to the client's IP address completed successfully.

For subsequent conversations with the server, the client can short circuit the three-way handshake as shown in Figure 3.

[Employing the TFO cookie]
Figure 3: Employing the TFO cookie

The steps shown in Figure 3 are as follows:

  1. The client TCP sends a SYN that contains both the TFO cookie (specified as a TCP option) and data from the client application.

  2. The server TCP validates the TFO cookie by duplicating the encryption process based on the source IP address of the new SYN. If the cookie proves to be valid, then the server TCP can be confident that this SYN comes from the address it claims to come from. This means that the server TCP can immediately pass the application data to the server application.

  3. From here on, the TCP conversation proceeds as normal: the server TCP sends a SYN-ACK segment to the client, which the client TCP then acknowledges, thus completing the three-way handshake. The server TCP can also send response data segments to the client TCP before it receives the client's ACK.

In the above steps, if the TFO cookie proves not to be valid, then the server TCP discards the data and sends a segment to the client TCP that acknowledges just the SYN. At this point, the TCP conversation falls back to the normal three-way handshake. If the client TCP is authentic (not malicious), then it will (transparently to the application) retransmit the data that it sent in the SYN segment.
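The generate-then-revalidate idea can be sketched in a few lines of C. This is a toy model under stated assumptions: `toy_keyed_mix()` is a simple bit-mixing function standing in for real encryption and is not cryptographically secure, the 64-bit cookie size is arbitrary, and all of the names are hypothetical. An actual implementation would use a proper cipher or keyed MAC with a secret server key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy keyed mixing function (NOT secure); stands in for real encryption. */
static uint64_t toy_keyed_mix(uint64_t key, uint32_t client_ip)
{
    uint64_t x = key ^ client_ip;

    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* Server side, first conversation: derive a cookie from the client address. */
static uint64_t tfo_cookie_generate(uint64_t server_key, uint32_t client_ip)
{
    return toy_keyed_mix(server_key, client_ip);
}

/* Server side, later conversations: repeat the computation using the
   source address of the new SYN and compare with the presented cookie. */
static bool tfo_cookie_valid(uint64_t server_key, uint32_t syn_source_ip,
                             uint64_t presented)
{
    return tfo_cookie_generate(server_key, syn_source_ip) == presented;
}
```

A cookie issued to one address fails validation when presented from a different source address, and changing `server_key` invalidates previously issued cookies; both properties follow from the server simply recomputing the value rather than storing it.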

Comparing Figure 1 and Figure 3, we can see that a complete RTT has been saved in the conversation between the client and server. (This assumes that the client's initial request is small enough to fit inside a single TCP segment. This is true for most requests, but not all. Whether it might be technically possible to handle larger requests—for example, by transmitting multiple segments from the client before receiving the server's ACK—remains an open question.)

There are various details of TFO cookie generation that we don't cover here. For example, the algorithm for generating a suitably secure TFO cookie is implementation-dependent, and should (and can) be designed to be computable with low processor effort, so as not to slow the processing of connection requests. Furthermore, the server should periodically change the encryption key used to generate the TFO cookies, so as to prevent attackers harvesting many cookies over time to use in a coordinated attack against the server.

There is one detail of the use of TFO cookies that we will revisit below. Because the TFO mechanism allows a client that submits a valid TFO cookie to trigger resource usage on the server before completion of the three-way handshake, the server can be the target of resource-exhaustion attacks. To prevent this possibility, the server imposes a limit on the number of pending TFO connections that have not yet completed the three-way handshake. When this limit is exceeded, the server ignores TFO cookies and falls back to the normal three-way handshake for subsequent client requests until the number of pending TFO connections falls below the limit; this allows the server to employ traditional measures against SYN-flood attacks.

The user-space API

As noted above, the generation and use of TFO cookies is transparent to the application level: the TFO cookie is automatically generated during the first TCP conversation between the client and server, and then automatically reused in subsequent conversations. Nevertheless, applications that wish to use TFO must notify the system using suitable API calls. Furthermore, certain system configuration knobs need to be turned in order to enable TFO.

The changes required to a server in order to support TFO are minimal, and are highlighted in the code template below.

    sfd = socket(AF_INET, SOCK_STREAM, 0);   // Create socket

    bind(sfd, ...);                          // Bind to well known address
    int qlen = 5;                            // Value to be chosen by application
    setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
    listen(sfd, ...);                        // Mark socket to receive connections

    cfd = accept(sfd, NULL, 0);              // Accept connection on new socket

    // read and write data on connected socket cfd


Setting the TCP_FASTOPEN socket option requests the kernel to use TFO for the server's socket. By implication, this is also a statement that the server can handle duplicated SYN segments in an idempotent fashion. The option value, qlen, specifies this server's limit on the size of the queue of TFO requests that have not yet completed the three-way handshake (see the remarks on prevention of resource-exhaustion attacks above).

The changes required to a client in order to support TFO are also minor, but a little more substantial than for a TFO server. A normal TCP client uses separate system calls to initiate a connection and transmit data: connect() to initiate the connection to a specified server address and (typically) write() or send() to transmit data. Since a TFO client combines connection initiation and data transmission in a single step, it needs to employ an API that allows both the server address and the data to be specified in a single operation. For this purpose, the client can use either of two repurposed system calls: sendto() and sendmsg().

The sendto() and sendmsg() system calls are normally used with datagram (e.g., UDP) sockets: since datagram sockets are connectionless, each outgoing datagram must include both the transmitted data and the destination address. Since this is the same information that is required to initiate a TFO connection, these system calls are recycled for the purpose, with the requirement that the new MSG_FASTOPEN flag must be specified in the flags argument of the system call. A TFO client thus has the following general form:

    sfd = socket(AF_INET, SOCK_STREAM, 0);
    sendto(sfd, data, data_len, MSG_FASTOPEN, 
                (struct sockaddr *) &server_addr, addr_len);
        // Replaces connect() + send()/write()
    // read and write further data on connected socket sfd


If this is the first TCP conversation between the client and server, then the above code will result in the scenario shown in Figure 2, with the result that a TFO cookie is returned to the client TCP, which then caches the cookie. If the client TCP has already obtained a TFO cookie from a previous TCP conversation, then the scenario is as shown in Figure 3, with client data being passed in the initial SYN segment and a round trip being saved.
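A client that must also run on systems without TFO may want to fall back to the traditional connect() plus send() sequence. The following is a minimal sketch for blocking IPv4 sockets. The helper name `tfo_send_ipv4()` is hypothetical, and the set of errno values treated as "TFO is unavailable" (EOPNOTSUPP, EPIPE, ENOTCONN) is an assumption about how kernels respond when the flag is disabled or not understood; it should be checked against the target system.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send the first chunk of data on an unconnected TCP socket, trying TFO
   first. Returns the number of bytes queued, or -1 with errno set. */
static ssize_t tfo_send_ipv4(int fd, uint32_t ip_host_order, uint16_t port,
                             const void *data, size_t len)
{
    struct sockaddr_in addr;

    assert(fd >= 0 && data != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(ip_host_order);
    addr.sin_port = htons(port);

#ifdef MSG_FASTOPEN
    /* MSG_NOSIGNAL avoids SIGPIPE if the kernel ignores MSG_FASTOPEN. */
    ssize_t n = sendto(fd, data, len, MSG_FASTOPEN | MSG_NOSIGNAL,
                       (struct sockaddr *) &addr, sizeof(addr));
    if (n >= 0)
        return n;
    /* Assumption: these errors mean TFO could not be used at all. */
    if (errno != EOPNOTSUPP && errno != EPIPE && errno != ENOTCONN)
        return -1;
#endif
    /* Traditional path: separate connection setup and data transfer. */
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1)
        return -1;
    return send(fd, data, len, 0);
}
```

After a successful return, the socket is connected and can be read from and written to as usual, whichever path was taken.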

In addition to the above APIs, there are various knobs—in the form of files in the /proc/sys/net/ipv4 directory—that control TFO on a system-wide basis:

  • The tcp_fastopen file can be used to view or set a value that enables the operation of different parts of the TFO functionality. Setting bit 0 (i.e., the value 1) in this value enables client TFO functionality, so that applications can request TFO cookies. Setting bit 1 (i.e., the value 2) enables server TFO functionality, so that server TCPs can generate TFO cookies in response to requests from clients. (Thus, the value 3 would enable both client and server TFO functionality on the host.)

  • The tcp_fastopen_cookies file can be used to view or set a system-wide limit on the number of pending TFO connections that have not yet completed the three-way handshake. While this limit is exceeded, all incoming TFO connection attempts fall back to the normal three-way handshake.
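A program can inspect the tcp_fastopen setting before deciding whether to attempt TFO. The sketch below decodes the two bits described above; the macro and function names are invented for this example, and the reader function simply returns -1 on systems where the file does not exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TFO_CLIENT_BIT 0x1   /* bit 0: client functionality */
#define TFO_SERVER_BIT 0x2   /* bit 1: server functionality */

static bool tfo_client_enabled(int value)
{
    return (value & TFO_CLIENT_BIT) != 0;
}

static bool tfo_server_enabled(int value)
{
    return (value & TFO_SERVER_BIT) != 0;
}

/* Read an integer knob such as /proc/sys/net/ipv4/tcp_fastopen.
   Returns -1 if the file is missing or cannot be parsed. */
static int read_int_knob(const char *path)
{
    FILE *f;
    int value = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A caller would typically pass the result of `read_int_knob("/proc/sys/net/ipv4/tcp_fastopen")` to the two predicates, after first checking that it is not -1.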

Current state of TCP fast open

Currently, TFO is an Internet Draft with the IETF. Linux is the first operating system to add support for TFO, but as yet that support remains incomplete in the mainline kernel. The client-side support has been merged for Linux 3.6. However, the server-side TFO support has not so far been merged, and from conversations with the developers it appears that this support won't be added in the current merge window. Thus, an operational TFO implementation is likely to become available only in Linux 3.7.

Once operating system support is fully available, a few further steps need to be completed to achieve wider deployment of TFO on the Internet. Among these is assignment by IANA of a dedicated TCP Option Number for TFO. (The current implementation employs the TCP Experimental Option Number facility as a placeholder for a real TCP Option Number.)

Then, of course, suitable changes must be made to both clients and servers along the lines described above. Although each client-server pair requires modification to employ TFO, it's worth noting that changes to just a small subset of applications—most notably, web servers and browsers—will likely yield most of the benefit visible to end users. During the deployment process, TFO-enabled clients may attempt connections with servers that don't understand TFO. This case is handled gracefully by the protocol: transparently to the application, the client and server will fall back to a normal three-way handshake.

There are other deployment hurdles that may be encountered. In their CoNEXT 2011 paper, the TFO developers note that a minority of middle-boxes and hosts drop TCP SYN segments containing unknown (i.e., new) TCP options or data. Such problems are likely to diminish as TFO is more widely deployed, but in the meantime a client TCP can (transparently) handle such problems by falling back to the normal three-way handshake on individual connections, or generally falling back for all connections to specific server IP addresses that show repeated failures for TFO.


TFO is promising technology that has the potential to make significant reductions in the latency of billions of web service transactions that take place each day. Barring any unforeseen security flaws (and the developers seem to have considered the matter quite carefully), TFO is likely to see rapid deployment in web browsers and servers, as well as in a number of other commonly used web applications.


Page editor: Jonathan Corbet


CeroWrt: Bufferbloat, IPv6, and more

By Jake Edge
August 1, 2012

The CeroWrt project is an effort aimed at helping to solve a number of different problems in current home router distributions, but its primary focus is on bufferbloat. The problem of excessive buffering of network packets is endemic on the Internet as a whole, but it is much easier to start addressing the problem at the home router end, especially considering the easy availability of Linux-based firmware distributions. Beyond bufferbloat, though, CeroWrt also enables experiments with two "next generation" Internet features, IPv6 and DNSSEC.

CeroWrt is built atop the OpenWrt project's router firmware. It uses the OpenWrt development version ("Attitude Adjustment") with extras added by the CeroWrt team. Unlike OpenWrt's extensive list of supported hardware, CeroWrt focuses on supporting just two router devices: the Netgear WNDR3700v2 and WNDR3800. Both are capable devices with free driver support for all of the hardware and, importantly, the wireless networking hardware.

The most recent release is 3.3.8-10 from July 9. There is a 3.3.8-11 version available, but project lead Dave Täht suggested that people steer clear until a problem with the 5GHz wireless AP is resolved. Installing CeroWrt is fairly straightforward, either through the web-based GUI by uploading the "sysupgrade" image, or via tftp using the "factory" image.

Once the device has been flashed, one can connect to it on its default address. CeroWrt specifically chose to avoid the other blocks of non-routable IP addresses so that it can be experimented with in existing networks: most home networks live in 192.168.x.y space and the 10.x.y.z addresses are often used by Internet backbones. The web UI is hosted on port 81 (and only available on the inside of the network, not via the WAN) so that users can use port 80 for their own router-based web site if they wish.

[CeroWrt status]

The web UI is very similar to that of the current OpenWrt "Backfire" (10.03.1) release that I run on my venerable Linksys WRT54GL. The UI is built using LuCI, a Lua-based tool for building web interfaces for embedded devices. LuCI is noticeably snappier on the WNDR3700v2 that I used for CeroWrt testing than it is on the WRT54GL—presumably due to a faster CPU. The interface provides a great deal of status information, as well as allowing users to change various configuration settings. Everything from updating the firmware and checking firewall rules to changing DNS settings and examining system logs is available through the interface. In addition, there are various realtime graphs of system load, network connections, bandwidth usage, and so on.

The first steps after connecting to the router are some predictable things like setting the root password and adding wireless passwords, but there is another important step: enabling and configuring Active Queue Management (AQM). Essentially, one must determine the download and upload speeds of the Internet link (using something like an online speed test) to plug into the web form and enable AQM. Testing bandwidth that way is static, so dynamic changes are not reflected; that is sub-optimal, and the project is looking at better tests and ways to set those values automatically. It should also be noted that in limited testing, no real difference was apparent (even when copying large files while doing something interactive) with AQM enabled or disabled; more study is clearly required.

[CeroWrt traffic graph]

The wireless networking setup is rather different than what OpenWrt (at least for Backfire) provides. There are four separate SSIDs for various kinds of WiFi access. CEROwrt and CEROwrt5 provide normal access for 2.4 and 5GHz respectively, while CEROwrt-guest and CEROwrt-guest5 are for guest access. By default, they all act as open access points and do not require a password, but enabling WPA2 for the non-guest SSIDs (at least) is suggested. There are also two babel SSIDs which are there to support mesh networking.

The guest SSIDs correspond to the guest zone in the firewall configuration. By default, guest traffic can only go to the Internet, so it does not have access to other devices on the local network. That allows one to give access to visitors (and neighbors) without risking unauthorized access to systems behind the firewall. The 172.30.42.x address space is broken up into separate sub-networks such that each SSID gets its own set of 30 IP addresses, as does each set of wired, mesh, and DMZ devices.
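The figure of 30 addresses per sub-network is what one would expect if the 172.30.42.x space were divided into /27 subnets, since each /27 holds 32 addresses, two of which are reserved for the network and broadcast addresses. The /27 prefix length is an inference from that number, not something stated by the project; the arithmetic can be sketched as follows, with hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Usable host addresses in an IPv4 subnet with the given prefix length. */
static uint32_t usable_hosts(unsigned prefix_len)
{
    assert(prefix_len >= 1 && prefix_len <= 30);
    /* Total addresses minus the network and broadcast addresses. */
    return (UINT32_C(1) << (32 - prefix_len)) - 2;
}

/* Number of subnets of length sub_prefix that fit in a block_prefix block. */
static uint32_t subnets_in_block(unsigned block_prefix, unsigned sub_prefix)
{
    assert(block_prefix <= sub_prefix && sub_prefix <= 32);
    assert(sub_prefix - block_prefix < 32);
    return UINT32_C(1) << (sub_prefix - block_prefix);
}
```

Under that assumption, a /24 block such as 172.30.42.x would hold eight such sub-networks.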

But the main focus of CeroWrt is to experiment with solutions to the bufferbloat problem. To that end, it uses the 3.3.8 kernel (the CeroWrt release numbering follows that of the underlying kernel) with the addition of the controlled delay (CoDel) AQM algorithm. CoDel requires the byte queue limits feature that was added in the 3.3 kernel.

But there are additional goals for the project, and IPv6 support ("make IPv6 networking in the home as simple as IPv4") is near the top of the list. While it isn't as "simple" as IPv4 (yet), the instructions are pretty easy to follow to have the router use a 6in4 tunnel, as well as to provide IPv6 on the local net. That makes CeroWrt a nice choice for experimenting with IPv6 as well, though some UI support to configure it would be welcome. There are other features to experiment with as well, including DNSSEC and the mesh networking, though I didn't try those out.

Overall, switching over to the CeroWrt-powered router went with very few hitches, other than a balky router "authentication" web application at my ISP. The addition of 5GHz WiFi is welcome (though my ISP is typically the bottleneck anyway), as is the availability of a guest zone. In fact, I haven't moved back to the old router, though I probably will at some point so that the WNDR3700v2 can be used for experiments without upending "Words with Friends" in the other room. The router is cheap enough that getting a second (or more likely a WNDR3800 at less than $150) to replace the WRT54GL is certainly a possibility, though messing around with mesh networking between them might still result in spousal complaints.

Täht's 3.3.8-10 release announcement outlined the way forward (or a way forward) for CeroWrt. There is lots of work to be done, but the bufferbloat projects, including CeroWrt, are not funded, currently. That is clearly making it difficult for Täht to continue working on CeroWrt—at least to the level he would like. While it appears that there are lots of volunteers and companies helping out, the overall project maintainer role is languishing to some extent.

But, as he points out, all of the CeroWrt work is being pushed upstream to OpenWrt (and CeroWrt frequently merges back as well). The two projects are focused in different areas, but there is clearly some synergy between them, which is likely to help both. It is a bit unclear when a "stable" CeroWrt release might be forthcoming, but it is pretty usable in its current form. What it most needs, perhaps, is some developer time and, possibly, some funding.


Brief items

Distribution quote of the week

There are a ton of reasons why Debian may have an older version of an upstream release. For example, and I hasten to point out that the following list is by no means exhaustive, and not all of the possibilities are common:

* The Debian package maintainer is dead, but nobody noticed it yet, and nobody has wanted an update badly enough to do an NMU or to adopt the package.

* The upstream release is actually a fake. It's a trojan, which was put there by the NSA in order to infiltrate the CIA mainframe. The Debian package maintainer noticed this and uploaded that version of the package to non-free instead of main, since the trojan code does not come with proper source.

* Upstream has moved the RSS feed for new releases without notifying the old feed of the move, so the Debian package maintainer missed that, and doesn't actually know about the new release. Due to a complicated series of happenstance involving rainbows, midget unicorns, and the ongoing rewrite of the Netsurf web browser, the Debian package maintainer is not able to find the new feed because it would require doing a web search and their browser doesn't have working form support now. No other browser is available on the Amiga they're using as their only computer, either.

* The new release is requested by insistent Hurd porters, and the Debian package maintainer absolutely loathes the Hurd, and will refuse to upload any packages that work on the Hurd.

* The Debian package maintainer suffers from mental problems cause by reading debian-devel too much, and now has a nervous breakdown every time they recognize a name as someone whom they've seen on the list.

* The Debian development process is being sabotaged by Microsoft sending people to the developers' houses pretending to be TV license checkers or Jehova's witnesses every time they detect, using the hardware wireless keylogger embedded in every PC, that the developer is trying to run any Debian packaging command.

* Apple is also sabotaging Debian by paying me to write snarky e-mails on Debian mailing lists to distract everyone from working on the actual release, so that we can get past the freeze and start uploading things again without having to worry that it breaks things in ways that makes the freeze longer.

-- Lars Wirzenius


Distribution News

Debian GNU/Linux

Debian's new draft trademark policy

The Debian project is attempting to rewrite its trademark policy to be "as free as possible" while still protecting the project's identity; project leader Stefano Zacchiroli has just announced a new draft for consideration. "The objective of this trademark policy is to encourage widespread use and adoption of the DEBIAN trademarks, styles, and logos (hereinafter ``trademarks'') while ensuring consistent usage which avoids any confusion in the mind of the users. The goal of this policy is to encourage use of the DEBIAN mark in commercial or non-commercial activity based around DEBIAN."


Bits from the nippy Release Team

The Debian release team reports that wheezy still has far too many RC bugs; "As mentioned in the freeze announcement [RT:FRZ], the number of RC bugs in wheezy is still significantly larger than would normally be expected at the start of a freeze. Please feel encouraged to fix a bug (or three) from the list [BTS:RC] to help get issues resolved in testing." They also tell us that Debian 8.0 (wheezy+1) will be known as "Jessie".



New features for Fedora 18

The minutes from the July 30 meeting of the Fedora Engineering Steering Committee show that an impressive list of new features has been approved for the Fedora 18 release. New goodies in F18 will include Samba4, the GNOME2-based MATE desktop, the Linux Trace Toolkit next generation (LTTng), OwnCloud, the Federated Filesystem, and more.


New Sponsor of Fedora Infrastructure

Colocation America is now a sponsor of Fedora Infrastructure with the donation of a server in their Los Angeles data center. "We have put this server to use as a proxy and application server, so if you are going to any sites and you are in North America you will likely be accessing us from there."


Newsletters and articles of interest


SUSE Linux powers 147,456-core German supercomputer (ars technica)

Ars technica has a brief look at the world's fastest x86-based supercomputer, which is also Europe's fastest supercomputer and the 4th most powerful in the world. The SuperMUC, which runs SUSE Linux, is located at the Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences. "A statement issued by SUSE says that the supercomputer has a unique cooling system inspired by human blood circulation that significantly reduces energy consumption. The supercomputer is reportedly designed so that some of the energy can be recaptured and used to heat buildings at the LRZ campus. The statement also says that the SuperMUC has 155,000 processor cores capable of delivering a total of 3 petaflops of processing power. A report on Slashdot indicates that the computer has 324 terabytes of memory."


Page editor: Rebecca Sobol


Toward generic atomic operations

August 1, 2012

This article was contributed by Jon Masters

Modern Linux distributions support a number of different computer architectures. Each of these architectures has its own quirks and implementation differences that are largely abstracted by a clever collaboration between the kernel and system libraries, such as the GNU C Library (glibc). However, there are still some ways in which core architecture differences are exposed to higher-level software. One example of this is in the implementation of atomic memory operations.

Atomic operations are necessary to ensure programming correctness in those situations where there are multiple threads of simultaneous execution. (Atomic operations are even necessary on uniprocessor systems, where interrupts and asynchronous scheduling of other threads provide the illusion of multithreading.) Atomicity means that a given operation (such as incrementing a counter variable) takes place in an indivisible fashion; its result is either visible to all CPUs in the system instantaneously, or the operation does not take place at all (it is similar in concept to, but less fashionable than, transactional memory).

Atomic operations are typically fairly small, hand-optimized assembly functions that provide for atomic increment and decrement of counters, acquisition and release of locks, and so on. Since these operations differ from one architecture to another, typically few developers on any given project understand the different implementations in their entirety, and even fewer care to vouch for the code being correct across all supported architectures. Although there are generic implementations available in libraries such as pthreads, not all projects can make use of them, for a variety of reasons, including a desire to be portable to non-Linux platforms; thus a number of projects within the average Linux distribution still contain their own custom implementations of atomic operations.

Atomic operations are particularly useful on modern systems with many CPUs running multi-threaded applications, but even a system with a single CPU (core) has a need for them. After all, the Linux kernel may interrupt a running task thread "A" to service an interrupt routine, and may then schedule a different task thread "B" before returning to the one that was originally interrupted. Without a means to ensure certain operations have taken place atomically, there would be no way to cope with potential interference between task thread B and task thread A (e.g., if both threads race for the same lock or operate on the same variable).

How do atomic operations work?

Atomic operations fundamentally require underlying hardware support. There are, broadly speaking, two popular mechanisms used by CPUs in implementing support for atomic operations in modern computer architectures. The older, more traditional approach involves directly manipulating memory locations, for example, a compare-and-swap (or compare-and-exchange) instruction such as CMPXCHG on Intel's x86 and Itanium architectures. These instructions compare the value of a given memory location with a value supplied as part of the instruction. If the two values are identical, then yet another supplied value is written back to the memory location, while the overall result is signaled in the form of a returned value (almost universally in a register). This whole sequence takes place as a single processor instruction, for example by locking the processor local bus, and disabling external interrupts for its duration.

A more modern alternative to directly acting upon memory locations is to implement a reservation engine within the processor. A reservation engine (as used by modern RISC architectures such as ARM, POWER, etc.) is typically implemented under the control of two special processor instructions. The first instruction, often called load-with-reservation, load-exclusive, or load-link, atomically loads the value of a given memory location into a register and marks that memory location as reserved.

The loaded value can then be manipulated arbitrarily before a second instruction, often called store-exclusive, or store-conditional, atomically stores an updated value from a register back to a given memory location, provided that no modification has been made to that memory location in the interim. The store-exclusive operation returns a value indicating whether the store operation completed successfully or not, which is important because there is an opportunity for external interference between the load, modification, and subsequent store. This means that higher-level atomic operations built using these instructions typically involve a loop, retrying the entire operation until the store is successful.

Reservation engines are slightly more complex to work with in software (requiring two instructions and a comparison in a loop block), but they come with multiple benefits. Although compare-and-exchange appears simpler because it is implemented in a single instruction, it in fact causes poor performance in the CPU pipeline because multiple additional sub-stages are required for the implied memory operations. By contrast, the reservation engine approach explicitly separates memory reads and stores into multiple operations. A reservation engine can be implemented separately from the bulk of the core CPU logic, and can be as complex as desired (including necessary logic to synchronize with other reservation engines).

Some reservation engine implementations handle only a single memory location at a time on a given processor, while others are more complex. In every case, outstanding reservations are invalidated upon a context switch between running tasks (often as a result of a specific invalidation in the context-switch code). The reservation approach can also handle the "ABA problem"—that is, it can detect any changes to the target memory location after the atomic load, even if the original value is written back prior to the store, because the reservation engine is aware of all memory modifications.
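The ABA problem is easiest to see with a value-only compare-and-swap, which cannot distinguish "never changed" from "changed and changed back". The sketch below simulates the interference in a single thread. The second half shows a common software workaround, a version counter packed alongside the value; that is a different technique from the hardware reservation mechanism described above, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulate another thread changing the value from a to b and back to a. */
static void interfere_aba(atomic_int *loc, int a, int b)
{
    assert(atomic_load(loc) == a);
    atomic_store(loc, b);
    atomic_store(loc, a);
}

/* Value-only CAS: succeeds whenever the current value equals `expected`. */
static bool cas_value_only(atomic_int *loc, int expected, int desired)
{
    return atomic_compare_exchange_strong(loc, &expected, desired);
}

/* Pack a 32-bit version counter and a 32-bit value into one word. */
static uint64_t pack(uint32_t version, uint32_t value)
{
    return ((uint64_t) version << 32) | value;
}

/* Store a new value and bump the version in a single atomic step. */
static void tagged_store(_Atomic uint64_t *loc, uint32_t value)
{
    uint64_t old = atomic_load(loc);

    while (!atomic_compare_exchange_weak(loc, &old,
                                         pack((uint32_t) (old >> 32) + 1, value)))
        ;   /* `old` is refreshed on each failure */
}

/* Succeeds only if neither value nor version changed since `snapshot`. */
static bool tagged_cas(_Atomic uint64_t *loc, uint64_t snapshot,
                       uint32_t desired)
{
    return atomic_compare_exchange_strong(
        loc, &snapshot, pack((uint32_t) (snapshot >> 32) + 1, desired));
}
```

After `interfere_aba()`, `cas_value_only()` still succeeds against a stale snapshot, whereas `tagged_cas()` fails after the equivalent pair of `tagged_store()` calls because the version has advanced.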

The story doesn't quite end there. Some architectures lack full support for certain atomic operations that are required by higher-level software, such as atomic 64-bit (multiple word) load and store operations. In this case, there are workarounds (e.g. the "kuser" helper, a VDSO-like helper on older ARM processors), but that is a topic best saved for another article.

How are atomic operations used?

Atomic-operations libraries typically provide a set of functions that include incrementing and decrementing a memory location, compare-and-swap of a memory location, and higher-level operations built using these functions, such as lock acquisition and release. All of these various operations are built using the fundamental architecture-specific processor instructions of the kind described above. As an example, the OpenMPI message-passing library includes the following inline assembly code to implement an atomic 32-bit addition operation on version 7 of the ARM Architecture:

           ldrex   r2, [r0]        @ exclusively load the value at the address in r0 into r2
           add     r2, r2, r1      @ increment the value of r2 with value in r1
           strex   r3, r2, [r0]    @ attempt to store the value of r2 at the address in r0
           cmp     r3, #0          @ r3 contains result from store exclusive, test if successful 
           bne     REFLSYM(13)     @ repeat entire operation if it was interrupted
           mov     r0, r2          @ return value that was written
           bx      lr

This atomic increment function works by using the special ldrex and strex instructions, which control the CPU's reservation engine, to gain exclusive access to a desired memory location. The example code first loads the contents of a given memory location into a general-purpose register, adds a value to the register, and then tests the result of attempting to exclusively store this change back to memory. If it is successful, the function returns. If it is not successful, the operation is repeated until it completes without interference.
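The same load, modify, attempt-to-store, retry structure can be expressed portably. What follows is a sketch only, assuming a compiler that provides the C11 <stdatomic.h> header discussed later in this article; add_and_return() is a hypothetical name and is not OpenMPI's code. The weak form of compare-and-exchange is allowed to fail spuriously, which is why it sits in a loop, much like the strex retry above.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add delta to *addr and return the
 * value that was written, mirroring the structure of the assembly. */
int add_and_return(atomic_int *addr, int delta)
{
    int old = atomic_load(addr);        /* like ldrex: read the current value */
    int desired;

    do {
        desired = old + delta;          /* like add: compute the new value */
        /* Like strex, cmp, and bne: attempt the store. On failure, old is
         * refreshed with the current contents and the loop repeats. */
    } while (!atomic_compare_exchange_weak(addr, &old, desired));

    return desired;
}
```

Calling add_and_return(&v, 5) on an atomic_int holding 42 returns 47 and leaves 47 in v.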

OpenMPI includes a custom atomic-operations library that implements support for 13 different base architectures. Some of those architectures have multiple ways to achieve the same thing, depending on which version is in use. For example, ARM processors have moved away from the deprecated SWP (swap) instruction in favor of a reservation-engine-based approach. Both approaches need to be supported if code is to run on newer and older ARM processors. It is unfortunate that projects such as OpenMPI have needed to implement their own atomic-operations libraries, which must be periodically updated for new processors and are hard to maintain because they require special knowledge of multiple underlying architectures.

The C11 memory model

The main culprit for this state of affairs is the venerable C programming language. Traditionally, C had no explicit internal notion of multi-threaded applications, and only a very weakly ordered memory model. That is, it was hard to guarantee that the compiler would not reorder memory operations on shared variables because the language lacked the built-in constructs that are necessary to inform the compiler of such hidden data dependencies. Over the years, independent platform-specific libraries have provided support for general threading abstractions, including atomic operations performed on their own defined types. This is all well and good, but not all projects can rely upon such platform libraries for atomic operations, especially those that want to remain highly portable to non-Linux systems.

This is where C11 comes in. C11 introduces a new memory model explicitly designed with support for threading and atomic operations. It introduces the new standard header stdatomic.h, atomic integer types such as atomic_int (constructed using the _Atomic type qualifier), and a new memory_order enumerated type that defines various levels of memory ordering from the weakest, memory_order_relaxed (no specific ordering requirement), through to the strongest, memory_order_seq_cst (sequentially consistent). Using the C11 memory model, the previous inline assembly can be reduced to defining an _Atomic typed variable and using one of the atomic fetch-and-modify generic functions, such as atomic_fetch_add().

Here is an example of using the C11 defined atomics:

    #include <stdatomic.h>

    _Atomic int magic_number = ATOMIC_VAR_INIT(42); // can also use _Atomic(int)
    atomic_fetch_add(&magic_number, 5);             // make Star Trek fans happy

This defines and initializes a new variable called magic_number to the value 42 (using ATOMIC_VAR_INIT()) before correcting that value for the true answer to the ultimate question of life, the universe, and everything, which, as everyone knows, Star Trek correctly defined to be 47. Using the new C11 extensions, projects such as OpenMPI do not need to implement their own atomic-operations library, because the underlying language now provides the necessary support, already optimized for each target architecture.
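The memory_order values mentioned earlier are passed through the _explicit variants of the atomic functions. As a minimal sketch of how a higher-level operation such as lock acquisition and release might be built this way, assuming a C11 compiler with <stdatomic.h>, here is a tiny spinlock; the demo_spin names are hypothetical and the code is illustrative rather than production-ready (it spins without backing off or yielding).

```c
#include <stdatomic.h>

/* Hypothetical minimal spinlock built on the C11 atomic_flag type. */
struct demo_spinlock {
    atomic_flag locked;
};

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if the lock was taken, 0 if it was already held. */
int demo_spin_trylock(struct demo_spinlock *l)
{
    /* test-and-set returns the previous state of the flag. */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin until the lock is taken. Acquire ordering keeps accesses in the
 * critical section from being moved before the lock is held. */
void demo_spin_lock(struct demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        ;  /* busy-wait */
}

/* Release ordering makes writes from the critical section visible to
 * the next task that acquires the lock. */
void demo_spin_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Acquire and release are weaker than the default sequentially consistent ordering, but they are the usual pairing for a lock because they order the critical section against the lock operations themselves.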

There is, however, at least one little problem with rushing to embrace C11. As of this writing, GCC and glibc do not yet have full support for the new atomic types. This is slated to be added in the GCC 4.8 time frame. (The glibc maintainers are aware of the topic, and plan to incorporate support once it is available in GCC.) Meanwhile, GCC 4.7 gained support for a new set of built-in functions to provide memory-model-aware atomic operations that were designed specifically to meet the requirements of the C11 memory model. The idea is that the higher-level C11 atomic primitives can be easily built using these built-ins in time for GCC 4.8. In the meantime, there are several alternative options. One of these is to use a third-party macro-based implementation of the C11 types (which already use the GCC built-ins), of which several exist.

Another option prior to broader C11 support being available is to use the new GCC built-ins directly. For common operations, such as atomic increment, the GCC 4.7 built-in atomic functions look very similar to those that form part of the broader C11 standard:

    __atomic_store_n(&v, 42, __ATOMIC_SEQ_CST);
    __atomic_add_fetch(&v, 5, __ATOMIC_SEQ_CST);

The OpenMPI atomic-add example code could thus be replaced with a single call to __atomic_add_fetch(), which will atomically fetch a value from a memory location, add a supplied value, and return the result, doing the right thing for every supported architecture. Compiling the example and disassembling it will (unsurprisingly) produce a sequence of operations that appears very similar to the inline assembly it replaces. Of course, one does need to be careful in using the GCC built-ins directly because they do not require the use of variables with an _Atomic type qualifier. This means that it is possible to mix the use of atomic functions with manipulations of regular variables without triggering any compiler warnings. Still, this is no different than existing code failing to use a call to a special inline assembly function, which is also incorrect.
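One detail worth illustrating is the difference between the two naming patterns among the GCC built-ins: __atomic_add_fetch() returns the value after the addition, while __atomic_fetch_add() returns the value before it (as the C11 atomic_fetch_add() does). The sketch below assumes GCC 4.7 or later, or a compatible compiler; the function names are hypothetical, and the variables are deliberately plain int objects to show that no _Atomic qualifier is required.

```c
/* No header is required: the __atomic functions are compiler built-ins. */

/* Hypothetical helper: store 42, add 5, return the NEW value. */
int demo_add_fetch(void)
{
    int v;  /* a plain int, not _Atomic */

    __atomic_store_n(&v, 42, __ATOMIC_SEQ_CST);
    return __atomic_add_fetch(&v, 5, __ATOMIC_SEQ_CST);
}

/* Hypothetical helper: store 42, add 5, return the OLD value. */
int demo_fetch_add(void)
{
    int v;  /* a plain int, not _Atomic */

    __atomic_store_n(&v, 42, __ATOMIC_SEQ_CST);
    return __atomic_fetch_add(&v, 5, __ATOMIC_SEQ_CST);
}
```

Here demo_add_fetch() returns 47 and demo_fetch_add() returns 42, even though the variable holds 47 in both cases once the addition has completed.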

In time, it is the hope of this author that most projects implementing custom inline assembly for atomic operations can move to a standard C11-based implementation using stdatomic.h. That would be both portable to many different platforms and easier for distributions and upstream projects to maintain, because specialist knowledge of the architecture specifics can be abstracted by the compiler. (Note, however, that projects may need to continue to support legacy approaches to atomic operations if they want to continue supporting old compilers.) You can read more about the new C11 atomic operations (including the details of the memory model and its orderings not covered in depth here) in section 7.17.7 of the final draft version of the C11 specification [PDF].
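Supporting both new and old compilers could be approached with compile-time selection. The following is one possible sketch, not any project's actual code: the demo_counter names are hypothetical, and it assumes that a compiler claiming C11 without defining __STDC_NO_ATOMICS__ really does ship stdatomic.h (a compiler that claims C11 but lacks the header would need a stricter first test). The last branch uses the older GCC __sync built-ins.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)

/* Preferred path: standard C11 atomics. */
#include <stdatomic.h>
typedef atomic_int demo_counter;

void demo_counter_init(demo_counter *c, int value) { atomic_init(c, value); }
int demo_counter_read(demo_counter *c) { return atomic_load(c); }
int demo_counter_add(demo_counter *c, int n)
{
    /* atomic_fetch_add() returns the old value, so add n again. */
    return atomic_fetch_add(c, n) + n;
}

#elif defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))

/* Fallback: memory-model-aware GCC built-ins (GCC 4.7 and later). */
typedef int demo_counter;

void demo_counter_init(demo_counter *c, int value)
{
    __atomic_store_n(c, value, __ATOMIC_SEQ_CST);
}
int demo_counter_read(demo_counter *c)
{
    return __atomic_load_n(c, __ATOMIC_SEQ_CST);
}
int demo_counter_add(demo_counter *c, int n)
{
    return __atomic_add_fetch(c, n, __ATOMIC_SEQ_CST);
}

#elif defined(__GNUC__)

/* Legacy fallback: the older __sync built-ins. */
typedef int demo_counter;

void demo_counter_init(demo_counter *c, int value)
{
    *c = value;            /* must happen before the counter is shared */
    __sync_synchronize();  /* full memory barrier */
}
int demo_counter_read(demo_counter *c) { return __sync_add_and_fetch(c, 0); }
int demo_counter_add(demo_counter *c, int n)
{
    return __sync_add_and_fetch(c, n);
}

#else
#error "No known atomic-operations support for this compiler"
#endif
```

In every branch demo_counter_add() returns the value after the addition, so callers do not need to know which implementation was selected.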

Brief items

Quotes of the week

I'm afraid I didn't read all of the changes in detail, I'm hoping there's no 'And Keith Packard promises to bake cookies to anyone using the software' phrase here.
Keith Packard

Raise your hand if you have used CVS before. Yeah ... so you know the pain.
— Federico Mena-Quintero (at GUADEC)

New Cygwin Package: python-3.2.3-1

Cygwin Python 3.2.3-1 is now available. This is the first official release of the package supporting Python 3.

Edda log visualizer released

The first release of Edda, a log visualizer for MongoDB, has been made available. "MongoDB servers generate some pretty substantial log files. These lengthy logs are one of the more important tools we have for diagnosing issues with MongoDB servers. However, correlating logs from multiple servers can be time-consuming. Enter Edda, a log visualizer for MongoDB." This release focuses on visualizations of replica sets, with more features planned for the future.

Newsletters and articles

Development newsletters from the last week

KDE Release 4.9 – in memory of Claire Lotion

KDE has released version 4.9, providing major updates to KDE Plasma Workspaces, KDE Applications, and the KDE Platform. "This release is dedicated to the memory of KDE contributor Claire Lotion. Claire's vibrant personality and enthusiasm were an inspiration to many in our community, and her pioneering work on the format, scope and frequency of our developer meetings changed the way we go about implementing our mission today. Through these and other activities she left a notable mark on the software we are able to release to you today, and we are grateful for and humbled by her efforts."

OpenStreetMap bot removes waypoints after licensing change (The H)

The H writes about changes in OpenStreetMap (OSM) data. The title is a little misleading as the licensing change hasn't actually happened yet, but OpenStreetMap is preparing for it by removing data from people that did not consent to the change. "The reason for the licensing change is that the current Creative Commons licence is largely inapplicable to collections of data such as the OpenStreetMap mapping database. The Open Database licence has been developed to resolve this problem. Like the Creative Commons licence, it is a share-alike licence, meaning users must return any improvements or changes to the data to the community." The removal is said to be "barely noticeable in many places" but there have been some complaints in the OSM community.

Otte: staring into the abyss

On his blog, Benjamin Otte has some observations and criticisms of the GNOME project. He outlines a number of problem areas including understaffing, a loss of market and mind-share, and a lack of clear goals. "In fact, these days GNOME describes itself as a “community that makes great software”, which is as nondescript as you can get for software development. The biggest problem with having no goals is that you can’t measure yourself. Nobody can say if GNOME 3 is better or worse than GNOME 2. There is no recognized metric anywhere. This also leads to frustration in lots of places."

Page editor: Nathan Willis


Brief items

FSFE wants to better protect free software licenses from bankruptcy

The Free Software Foundation Europe is working on an interesting problem: what happens to free software licenses when the rights holder goes bankrupt? The organization currently is pushing a change to German bankruptcy law in particular: "The clause ensures that Free Software licensing model would not be negatively affected by a bankruptcy of a licensing rights holder. It makes it clear that any offer to grant Free Software license made before the licensor's bankruptcy can be accepted by anyone even after the bankruptcy proceedings started."

Articles of interest

Free Software Supporter -- Issue 52, July 2012

The July edition of the Free Software Foundation's monthly newsletter covers the winner of the Restricted Boot webcomic contest, an update to the Guide to DRM-free Living, Compliance Lab, the solution to Posner's patent problem, a 5-part interview with Richard Stallman on Restricted Boot, and several other topics.

New Books

Think Like a Programmer--New from No Starch Press

No Starch Press has released "Think Like a Programmer" by V. Anton Spraul.

Learning Rails 3 -- New from O'Reilly Media

O'Reilly Media has released "Learning Rails 3" by Simon St. Laurent, Edd Dumbill and Eric J Gruber.

Calls for Presentations

PyCon UK Call for Papers

PyCon UK will take place September 28 - October 1, 2012 in Coventry, West Midlands, UK. The call for papers is open until August 14. "If you would like to share your expertise, tell us your horror stories or pimp your project, please consider giving a talk at PyConUK."

Call for Papers: PyHPC 2012

PyHPC will take place November 16 in Salt Lake City, Utah. The call for papers closes September 14. It will be held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis (SC12).

Upcoming Events

LPC microconference topics announced

The Linux Plumbers Conference (August 29-31, San Diego) has posted a detailed agenda showing the topics to be covered this year. "As you can see we have a wide range of issues to tackle and this year’s Linux Plumbers Conference is shaping up to be a great event." The early registration period is also about to come to an end.

LPI Forum in Warsaw, Poland

The Linux Professional Institute (LPI) and its affiliate, LPI-Central Europe, will host a forum for Linux professionals on September 28, 2012, in Warsaw, Poland. "The forum will feature speakers on a number of technology subjects including Free and Open Source Software solutions, professional skills development, IT innovation and entrepreneurship, lifelong learning, Linux certification and the workforce development of Linux and Open Source professionals."

Events: August 2, 2012 to October 1, 2012

The following event listing is taken from the Calendar.

August 3 - August 4: Texas Linux Fest (San Antonio, TX, USA)
August 8 - August 10: 21st USENIX Security Symposium (Bellevue, WA, USA)
August 18 - August 19: PyCon Australia 2012 (Hobart, Tasmania)
August 20 - August 21: Conference for Open Source Coders, Users and Promoters (Taipei, Taiwan)
August 20 - August 22: YAPC::Europe 2012 in Frankfurt am Main (Frankfurt/Main, Germany)
August 25: Debian Day 2012 Costa Rica (San José, Costa Rica)
August 27 - August 28: XenSummit North America 2012 (San Diego, CA, USA)
August 27 - August 28: GStreamer conference (San Diego, CA, USA)
August 27 - August 29: Kernel Summit (San Diego, CA, USA)
August 28 - August 30: Ubuntu Developer Week (IRC)
August 29 - August 31: 2012 Linux Plumbers Conference (San Diego, CA, USA)
August 29 - August 31: LinuxCon North America (San Diego, CA, USA)
August 30 - August 31: Linux Security Summit (San Diego, CA, USA)
August 31 - September 2: Electromagnetic Field (Milton Keynes, UK)
September 1 - September 2: Kiwi PyCon 2012 (Dunedin, New Zealand)
September 1 - September 2: VideoLAN Dev Days 2012 (Paris, France)
September 1: Panel Discussion Indonesia Linux Conference 2012 (Malang, Indonesia)
September 3 - September 4: Foundations of Open Media Standards and Software (Paris, France)
September 3 - September 8: DjangoCon US (Washington, DC, USA)
September 4 - September 5: Magnolia Conference 2012 (Basel, Switzerland)
September 8 - September 9: Hardening Server Indonesia Linux Conference 2012 (Malang, Indonesia)
September 10 - September 13: International Conference on Open Source Systems (Hammamet, Tunisia)
September 14 - September 21: Debian FTPMaster sprint (Fulda, Germany)
September 14 - September 16: Debian Bug Squashing Party (Berlin, Germany)
September 14 - September 16: KPLI Meeting Indonesia Linux Conference 2012 (Malang, Indonesia)
September 15 - September 16: PyTexas 2012 (College Station, TX, USA)
September 15 - September 16: Bitcoin Conference (London, UK)
September 17 - September 20: SNIA Storage Developers' Conference (Santa Clara, CA, USA)
September 17 - September 19: Postgres Open (Chicago, IL, USA)
September 18 - September 21: SUSECon (Orlando, Florida, US)
September 19 - September 21: 2012 X.Org Developer Conference (Nürnberg, Germany)
September 19 - September 20: Automotive Linux Summit 2012 (Gaydon/Warwickshire, UK)
September 21: Kernel Recipes (Paris, France)
September 21 - September 23: openSUSE Summit (Orlando, FL, USA)
September 24 - September 25: OpenCms Days (Cologne, Germany)
September 24 - September 27: GNU Radio Conference (Atlanta, USA)
September 27 - September 28: PuppetConf (San Francisco, US)
September 27 - September 29: YAPC::Asia (Tokyo, Japan)
September 28 - October 1: PyCon UK 2012 (Coventry, West Midlands, UK)
September 28 - September 30: Ohio LinuxFest 2012 (Columbus, OH, USA)
September 28 - September 30: PyCon India 2012 (Bengaluru, India)
September 28: LPI Forum (Warsaw, Poland)

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds