
LWN.net Weekly Edition for June 9, 2011

Oracle donates OpenOffice.org to Apache

By Jake Edge
June 8, 2011

The news that Oracle was proposing to donate the OpenOffice.org (OOo) code to the Apache Software Foundation (ASF) came as a surprise to many, though it probably shouldn't have. Many optimistically hoped that Oracle's plan to turn OOo over to an "organization focused on serving that broad constituency [the OOo community] on a non-commercial basis" meant that it would turn to The Document Foundation (TDF), which forked OOo into LibreOffice (LO) in September 2010, as the obvious repository for the code. But, for a number of reasons, that was probably never a very likely outcome; some discussions evidently took place between Oracle and the TDF, but there seems to be enough bad blood—along with licensing differences—that another home for OOo was sought.

Oracle's contribution

Oracle proposed OOo as an Apache Incubator project on June 1 with a post to Apache's incubator-general mailing list. The original posting from Oracle VP Luke Kowalski was done as an .odt file, which made it hard to comment on, so Greg Stein posted the text of the proposal. Shortly thereafter, it was turned into a wiki page which has been updated to reflect the discussions about the proposal.

Other than the proposal itself, and a press release with statements from Oracle, Apache, and IBM, there has been little said by Oracle about this move. IBM, on the other hand, has been quite vocal, with three separate, very favorable blog posts (Rob Weir, Ed Brill, and Bob Sutor) that came out more-or-less at the same time as the proposal. This seemingly coordinated response didn't necessarily sit well with some in the OOo/LO community, but TDF had enough notice to put out its own statement that was conciliatory, if disappointed.

Basically, Oracle has signed a license grant to the ASF covering a list of files that make up OOo. That allows the ASF to release the code under the Apache License. Oracle will also be transferring the OOo trademarks to the foundation, though there is a typo in the transfer document ("OpenOffice" rather than "OpenOffice.org") that is being addressed. There are some questions about whether the listed files are actually all of those needed to build OOo, but the belief is that Oracle will work with the ASF to address any deficiencies.

Apache incubation

The license and trademark grant is a "done deal", by and large, but where things go from there is still a bit up in the air. Apache has an "incubation" process that is meant to help new (to Apache) projects come up to speed on how Apache projects work and are governed. In addition, the incubation process is meant to allow time to handle any licensing issues with the code (as all Apache projects must be licensed under the Apache license), as well as to determine if the project has attracted enough of a community to be a viable project going forward.

As spelled out in the Incubation Policy, the project must have a "champion" who is an ASF member. For OOo, Sam Ruby will be the champion. In addition, there needs to be at least one "mentor" from the ASF for an incubator project. For OOo, there are eight mentors listed at the time of this writing. The role of the mentors is to assist the project through the process by providing guidance on Apache philosophy and policies. In order to get a sense for how much interest there is in the potential "podling" (as accepted incubator projects are called), a list of "initial committers" is being gathered in the proposal. "Committers" does not necessarily imply developers, as it is meant to cover anyone who plans to make any kind of contribution to the project. There are more than 60 people listed as initial committers at the time of this writing.

Once the proposal is firmed up, a vote will be taken to determine whether the podling is accepted into the incubator program. That vote will likely happen quite soon, almost certainly before the middle of June. Based on the discussions in the mailing list, it seems pretty likely that the proposal will be accepted. The consensus seems to be that, while there may be substantial barriers to overcome before the OOo project could become an Apache top-level project, the incubation process is meant to shake those problems out. If that doesn't happen, the project will eventually be terminated, but there is no reason not to see if the problems can be worked out.

As might be guessed, that consensus (if consensus it truly is) used up a lot of electrons to emerge. There are multiple 100+ message threads in the mailing list that are discussing various aspects of the proposal. It is not only ASF members who are participating either, as various TDF members, OOo and LO community members, and other interested parties are chiming in. For the most part, it has been a polite conversation, as various commenters have been careful to steer the discussion so as to keep it on-topic and congenial—asking that flames be taken elsewhere. But it's also clear that there are some strong emotional undercurrents, at least partly because the TDF/LO community feels somewhat slighted.

It's not surprising that they feel that way. TDF and its community have done a huge amount of work in the last eight months to create a meritocratic organization to foster LO. In addition, there has been a lot of technical work done to clean up what is, by many accounts, a codebase that has the potential to inflict eye cancer, as well as work to add new features, set up build farms, and so on. Much of that work may need to be redone by any Apache project, so it looks an awful lot like a waste of effort to the LO community.

Licensing

The bigger issue may be licensing, however. When TDF formed, largely due to what it saw as mismanagement of the project, first by Sun then by Oracle, it took the OOo code under the only license it could: LGPLv3. In order to try to attract companies like IBM into contributing to LO, the foundation asked that contributions be made with a dual LGPL/Mozilla Public License (MPL) license. The MPL is a weaker copyleft license which requires that changes to existing code be released, but allows extensions and the like to be kept closed. Dual-licensing with the MPL would still allow companies to release LO with proprietary extensions, provided they could get a separate license for the LGPL-covered, Oracle-owned core.

At least one company has such a license, and that's IBM for its Lotus Symphony products. Prior to Sun changing the license for OOo version 2, IBM released its proprietary OOo-based products using the earlier Sun Industry Standards Source License (SISSL), which did not require that code changes be released. After Sun dropped that license for version 2, IBM had to negotiate a license separate from the LGPL so that it could keep its code closed.

The only reason Sun could issue that license to IBM is because it always required OOo contributors to grant Sun a joint copyright on the contribution. That means that Sun, now Oracle, can do anything it wants with the code, including licensing it for proprietary use. This contributor license agreement (CLA), which essentially made for an uneven playing field because only Sun/Oracle had certain rights, was another problem that caused the LO fork. It should be noted, though, that the CLA is what allows Oracle to grant ASF the right to release the code under the Apache license. Without it, all contributors would have had to agree to the change—which might have been logistically, and perhaps ideologically, difficult.

Ideology comes into play because there are two very different philosophies here when it comes to free software: copyleft vs. non-copyleft. The Apache license is a non-copyleft license that, similar to the BSD license, allows anyone to do what they wish with the code. The GPL, LGPL, MPL, and others require that modifications be released under various circumstances. Copyleft licenses restrict the ability of companies to keep parts of the code private, while non-copyleft licenses have no requirements of that sort.

The belief is that companies are more likely to contribute when they can keep some of their "secret sauce" to themselves. The BSDs have had some success with that philosophy, though GPL-covered Linux is often held up as a counter-example. It is Apache, though, that has arguably had the most success with building communities of both companies and individuals around non-copyleft code.

The ASF is, quite reasonably, proud of its license and accomplishments, so the ability to gain an Apache-branded desktop office suite is rather attractive. That said, OOo is also not an obvious fit for the organization. ASF has largely targeted server applications and, as noted by several commenters in the mailing list, is accustomed to making source-only releases. Users of OOo are unlikely to expect to have to build their own binaries, so some kind of binary release will be needed. For Linux, this is less of an issue as there are distributions aplenty that will make binary releases for their users if they decide to ship OOo; in the Windows and Mac worlds—which make up the vast majority of OOo users—it's a more difficult problem.

It should be noted that, even in the Linux world, most major distributions have switched over to LO, or plan to, so some kind of a switch back to OOo would be required. Since many of the companies behind the larger distributions are also TDF supporters, that kind of a change is unlikely, at least in the near term.

Possible outcomes

One of the more optimistic conversations on the mailing list looks at ways that TDF and ASF could collaborate, without necessarily joining forces. Neither side looks likely to budge on its license choice, at least in the near term, so combining the two is simply not possible. There is something of an imbalance between the two, though, because TDF can adopt any of the Apache-licensed code (either Oracle's initial contribution or any further changes), while an Apache project cannot adopt the LGPL/MPL-licensed changes that TDF has made (or will make in the future). That one-way door is inherent in the nature of non-copyleft licenses; not only can the code be taken proprietary, it can be additionally licensed under a copyleft license.

Should the podling get accepted but fail to graduate to a top-level project, the Apache-licensed code will be available and presumably TDF will be the home of the community around the OOo/LO codebase. On the other hand, if Apache OOo takes off and the LO community largely moves over to the new project, one could imagine the LO code being re-licensed. The bulk of the LO changes were done by companies like Novell, Red Hat, Canonical, and others, so a change to the Apache license for those parts would just require the strokes of a few pens.

The other plausible outcome is that both projects thrive—or at least survive—presumably each smaller than the combination would be. The codebases would continue to diverge to a point where they would be completely different office suites that both natively supported Open Document Format (ODF). It would get harder and harder for LO to adopt OOo changes because of the divergence, so at some point, they would go their separate ways. That split is what worries many, because it would probably result in two less-capable suites. Others argue that competition between the projects may lead to both becoming better—it certainly wouldn't be the first such split in free software.

Looking at how the two projects can collaborate is an avenue toward avoiding the split, however. If the codebases could be kept in sync fairly closely, and perhaps some LO contributions also licensed under the Apache license, the divergence could be kept to a minimum. Whether the two communities can work together remains to be seen, but there are proposals for joint meetings and/or a summit of some kind. At least some cooperation in the near term seems likely, but there are some big hurdles for Apache OOo to clear.

Challenges

Numerous challenges for the likely podling have been mentioned in the threads, starting with the problem of creating binaries for end users—along with the bandwidth and server requirements to support those users. But there is more to it than that. While there are numerous initial committers listed for the project, from many different organizations as well as individual contributors, the bulk of the full-time, paid OOo staff will, at least initially, be coming from IBM. That worries some because IBM's priorities could change at any time, which might lead to a podling without enough of a contributor base.

There are also some questions about IBM's goals in pushing for an Apache OOo project. The company was never a large contributor to OOo, even after it joined the project with some fanfare in 2007. Many of its contributions have languished, and not been merged into the OOo mainline. On the other hand, IBM already has a license for the code that it needs so it's a bit unclear why it would go to the trouble of pushing Apache OOo if it didn't really have hopes of seeing a larger community grow up around it.

In addition, IBM doesn't have much of a track record in community-oriented free software projects. It has certainly contributed to various projects (notably the Linux kernel), but it lacks experience in leading a free software community—at least one that isn't directly under its control. Apache does have that experience, however, and has policies in place to ensure that its projects are governed well (starting with the incubator program itself).

There are also questions about external dependencies that may not be available under an Apache license, which might necessitate disabling some functionality or rewriting those pieces. Another missing piece from the list of files provided by Oracle is the translations that were done for OOo, which may just be an oversight. The ASF folks posting on the mailing list seem comfortable that these things can be worked out as part of the incubation process.

As a number of people have pointed out, there is a certain irony to this recent engagement between ASF, IBM, and Oracle. Apache certainly has reason to be relatively unhappy with IBM because of its abandonment of the Harmony project—something that has been cited several times as a cautionary tale regarding OOo—and Oracle because of its unwillingness to license the Java compatibility tests to Apache, which led to Apache resigning from the Java Community Process executive committee. It is a testament to the pragmatism and maturity of the ASF that it has seemingly not allowed those other problems to interfere with the current OOo contribution.

It will be interesting to watch this play out. It is unfortunate in many ways because an opportunity to fix the split in the OOo and LO development communities has been lost—or at least delayed further. It is tempting to speculate on what might have happened had Oracle made this move, say, ten months ago. But it didn't, and it owned the code, so it can make decisions that make the most sense for Oracle and its partners. At this point it seems like a face-saving move by Oracle, along with a poke in the eye to TDF, but it may be that Oracle has contracts with IBM or others that require moving the code to an organization with a non-copyleft outlook.

The decisions made by the podling going forward will likely give us a view of how interested IBM and the OOo community are in working with LO. There are presumably lots of cleanups that LO has done that could be adopted by OOo (it's hard to imagine that code and comment removals, for example, are covered by a license). That would make it easier for code to move between the two projects, though it takes more than just compatible licenses to make that work; some LO contributors would also have to be willing to contribute their changes under the Apache license.

There seems to be a belief that some part, perhaps a large part, of the OOo community was left behind when TDF forked. Clearly Oracle employees were left out (presumably by Oracle fiat), but that doesn't really change as Oracle appears to have no interest in the project once the transition is complete. Perhaps there are constituencies that are not served well by TDF and will be by an Apache OOo project, but the progress made by LO vs. OOo since the fork doesn't seem to indicate that. We'll all just have to watch and see where things go from here.

Comments (30 posted)

Webian: A Mozilla-based web desktop

June 8, 2011

This article was contributed by Nathan Willis

At first glance it looks like a typo, but Webian is in fact an experimental, open source desktop environment using Mozilla and Gecko as its core. Developer Ben Francis made the first public release last week, spawning immediate comparisons to Google's ChromeOS project, and even inciting concern that Mozilla would draw the search giant's wrath for wandering into its corporate territory. But Webian is Francis' personal project, not underwritten or sponsored by Mozilla. It does, however, show off some key concepts that the browser-maker is interested in promoting — namely Mozilla's belief that HTML5 and open web standards are the development platform of the future.

Essentially, Webian is a small set of applications written on top of Mozilla's Chromeless toolkit. Note that the name "Chromeless" refers to Mozilla's long-standing habit of calling its existing user interface layer "chrome," and is not taking a swipe at the Google browser. It strips away the entire browser interface (including the XUL and XPCOM used by Firefox) and replaces it with a layer written in HTML, CSS, and JavaScript itself. Chromeless is an evolution of Mozilla's Prism project from years past, and the current version has the same run-time features as Firefox 4, with JavaScript APIs for calling browser functionality. Mozilla Labs has demonstrated other Chromeless-based projects in the past, including basic browsers and a code editor derived from the SkyWriter (formerly Bespin) collaborative editor.

0.1 release

Last week, Mozilla Labs featured a guest blog post by Francis about the 0.1 release of Webian Shell, the basic "desktop" for Webian. The Shell is an integrated web browser that needs no other window manager or desktop trappings. It runs in full-screen mode, providing a "home screen," a row of tabs across the bottom, and a URL-and-title-bar across the top. Francis has been actively developing Webian since 2009, based on design concepts he cooked up while in college. The design concept document explains the scope and architecture choices in more detail than the project's wiki: the browser is the only application, there are stripped-down tools to access other functions (such as hardware settings), and only minimal borrowing from "thick client" desktop metaphors of ages past.

The 0.1 release is available in binary form for Linux (32- and 64-bit), Windows, and Mac OS X, as well as source. Not everything described in the design document is implemented yet; Webian Shell works as a full-screen browser, but the home screen sports only a clock and a "shut down" button. Also missing is Francis' concept of "stripe" UI elements: notifications and queries that slip into place horizontally and stack on top of each other, pushing browser window contents down rather than layering on top of them. The goal, according to the design document, is to "remove the concept of 2.5 dimensions where possible and treat 2D as 2D."

It is important to note that Webian Shell by design runs on top of your existing operating system. While it does run full-screen (for Linux, that means it runs over X, and does not interact with the window manager), it is not a full OS stack like ChromeOS is. Thus, at the moment, you can use it as a browser and a UI demo for the eventual desktop system, but not as a replacement for your complete environment. As a browser, it runs quite fast — faster, it seemed to me, than did Firefox 4 itself. Partly that is due to not loading Firefox extensions, but simply shedding the heft of Firefox's own interface seems to give a noticeable speedup (installed plug-ins, it should be noted, do run in Chromeless and Webian Shell).

Webian inherits its security model from Chromeless. The latest version takes advantage of out-of-process plugins and out-of-process tabs (which debuted in Firefox 4), so one page crashing should not crash the entire app (or, in Webian's case, shell). However, Flash or Java hiccups in one page may force the user to re-load other pages in order to re-start a crashed plug-in. Privacy controls are another matter; Chromeless and Webian can technically maintain separate user profiles just like Firefox, but the interface for managing profiles is not yet implemented in Webian Shell.

Aggravatingly, at the moment several of the high-profile Google web services do not run in Webian Shell, due to an upstream bug in Chromeless that hits pages incorporating the X-Frame-Options HTTP header. Nevertheless, there are still plenty of functional web applications available in the wild, so you can easily test out Webian Shell for long browsing sessions.

[Applications]

The long-term plan involves separate applications beyond the Shell, however. The project is working on desktop widgets for the home screen based on the W3C widget specification, and a photo management application that implements a local interface for tagging and organizing content, but still connects to remote, web-based services for publishing and storage. That choice is initially hard to get used to: the ChromeOS-like approach would be to write an entirely server-delivered photo organizer, then deliver it through the browser.

Chrome OS and other competition

Despite Webian's limited scope as a browser and (potentially) desktop environment, the comparisons to ChromeOS are inevitable. Francis' post on the Mozilla Labs blog led some online media outlets to describe Webian as a Mozilla project — an error Francis is quick to correct. He has received help and input from the Chromeless team (including work on the X-Frame-Options bug), but the project is not affiliated with Mozilla nor is he an employee.

Still, the project does line up quite closely with several of Mozilla's goals. It serves as an independent showcase for Chromeless, which is poised to take on a more prominent role now that Mozilla has announced the end of support for Gecko embedding APIs.

Mozilla is also pushing forward on the "installable web app" front, through apps.mozillalabs.com. While Firefox support for these apps is provided by an extension, native support built into a thin-client desktop like Webian would arguably be a better demonstration of their value. Google's Chrome team has independently developed its own installable web app framework, with a similar but not-quite-compatible manifest format for the browser to consume. Francis said he would like to support both, although he would prefer that both parties come to an agreement on a common format.

Another feature discussed in the Webian design documentation is a command-line interface implemented in the Shell URL bar / text-entry widget: supporting search queries is the first order of business, but arithmetic, app launching, and natural-language questions have been discussed on the mailing list and discussion forum. As several people in the Webian community pointed out, this type of functionality is already available in Mozilla Ubiquity, so here again cross-pollination with another Mozilla-based project seems natural.

But to really stage a direct challenge to ChromeOS, Webian would have to be bundled with an underlying OS. At the moment, Chromeless does not have the access to low-level system hardware that it would need to provide the hardware control described in the Webian Shell plan (although Mozilla's Rainbow does show signs of life in that area). Thus, to develop into a full OS replacement, Webian would almost certainly have to leave cross-platform compatibility behind, and pick a single stack to build upon.

Linux is the obvious first choice, and Francis alluded to that future direction in an email. "How much of Webian's functionality will be cross-platform is an unknown at the moment. The priority will be to build a Linux-based version but if some level of cross platform support can be maintained that would be great."

Of course, whether or not a full Webian-based OS offering would be successful competing against ChromeOS is a different question entirely. Francis and the other contributors are nowhere near the point where they could push Webian as a commercial offering like Google is doing for ChromeOS. Several contributors make the argument on the mailing list that Webian's non-commercial approach makes it more trustworthy for end users, who may be concerned over Google's user-tracking activities, or simply unhappy with ChromeOS's lack of transparent and meritocratic development processes.

No doubt, that stance makes sense to many free software advocates, but it does not do the work of bundling Webian with a Linux kernel and providing it as an installable image. The other argument common to the list is that Webian (like Mozilla) is dedicated to full support of open standards. Consequently, for example, it will feature HTML5 video playback of royalty-free formats only, rather than supporting royalty-bearing formats as well.

Frankly, that is a highly speculative line of thinking anyway, and threatens to overshadow what Webian Shell offers here and now. At the moment, it is a showcase for Mozilla Chromeless — an idea that the browser vendor has been advocating for years without a visible product to demonstrate it. The notion that desktops are dead and the web is the new delivery platform gets considerable airplay in the press, often including the refrain that the open source community is behind the times as its desktop wars rage on. But up until now, ChromeOS was the only end-user-targeted attempt to build a "web desktop" at all, and it was intimately entwined with the proprietary web services offered by Google.

Thus it is good to take a close look at Webian, if for no other reason than to put the notion of the "web desktop" to the test in a Google-free environment. Personally, I certainly hope it continues to push forward, and to accelerate development of some of Mozilla's "lab experiment" projects at the same time. It could serve as a valuable motivator for the free software community on the web services front as well. But if nothing else, it goes to show how lean and fast a Mozilla-based browser can be, once all of that chrome is stripped away.

Comments (10 posted)

Android, forking, and control

By Jonathan Corbet
June 6, 2011
Many words have been said about the relationship between the Android project and the mainline kernel development community. At LinuxCon Japan, James Bottomley took the stage to say a few more. There are, he said, some interesting lessons to be learned from that disconnect. If the development community pays attention to what has been going on, we may be better placed to deal well with such situations in the future.

James started with the statement that Android is, hands down, the most successful Linux distribution ever produced. Its adoption dwarfs that of Linux on the desktop - and on the server too. Android's success is spectacular, but it was achieved by:

  • Forking the kernel,
  • Rewriting the toolchain and C library,
  • Developing a custom Java-based application framework, and
  • Working from an extreme dislike of the GPL

In other words, James said, Android is a poster child for how one should not work in the open source community. They did everything we told them not to, and won big. While we would like the Android developers to change and do some things differently, their success suggests that, perhaps, Android is not the only group in need of change. Maybe the community needs to reevaluate how it weighs code quality against market success; do we, he asked, need a more commercially-oriented metric?

One of the big assumptions brought into this debate is that forking is a bad thing. Android started by forking the kernel and writing its own user space mostly from scratch, and the community has duly condemned these moves. But it is worth understanding what the Android developers were trying to do; Android started by finding adopters first; only then did they get around to actually implementing their system. At that point, the time pressures were severe; they had to have something ready as soon as possible. There is a lot to be said for the development community's patch review and acceptance processes, but they do tend to be somewhat open-ended. Google couldn't wait for that process to run its course before it shipped Android, so there was little choice other than forking the kernel.

Was forking the kernel wrong? In a sense, James said, it cannot be wrong: the GPL guarantees that right, after all. The right is guaranteed because forking is sometimes necessary, and rights are meaningless if they are not exercised. In this specific case, without a fork, the Android project would have had a hard time achieving its goals (with regard to power management and more) in a commercially useful time. The result would have been a delayed Android release which would have led to less success in the market or, perhaps, missing the market window entirely and failing to take off. Forks, in other words, can be good things - they can enable groups to get things done more quickly than going through the community process.

Is forking equal to fragmentation, he asked? It is an important question; fragmentation killed the Unix market back in the 1990s. James claimed that forks which fail do not fragment the community; they simply disappear. Forks which are merged back into their parent project also do not represent fragmentation; they bring their code and their developers back to the original project. The forks which are harmful are those which achieve some success, carrying part of the community with them, and which do not return to the parent project. From that, James said, it follows that it is important for the community to help forks merge back.

The Android developers, beyond forking the kernel, also took the position that the GPL is bad for business. The project's original goal was to avoid GPL-licensed code altogether; the plan was to write a new kernel as well. In the end, a certain amount of reason prevailed, and the (GPL-licensed) Linux kernel was adopted; there are a few other GPL-licensed components as well. So, James said, we can thank Andy Rubin - from whom the dislike of the GPL originates - for conclusively demonstrating that a handset containing GPL-licensed code can be successful in the market. It turns out that downstream vendors really don't care about the licensing of the code in their devices; they only care that it's clear and compliant.

What about Android's special application framework? James said that the Java-based framework is one of the most innovative things about Android; it abstracts away platform details and moves the application layer as far away from the kernel as possible. The framework restricts the API available to applications, giving more control over what those applications do. Given the structure of the system, it seems that rewriting the C library was entirely unnecessary; nobody above the framework makes any sort of direct use of it anyway.

So maybe Android didn't do everything wrong. But there were some mistakes made; the biggest, from James's point of view, was the lack of a calendar which can handle SyncML. That made Android handsets relatively useless for business users. One of the keys to the Blackberry's success was its nice calendaring. Motorola had seen this problem and implemented its own proprietary SyncML calendaring application for the Droid; that actually made things worse, as business users would get an Android handset with the idea that it would work with their calendars. If they ended up with something other than the Droid, they would be disappointed and, eventually, just buy an iPhone instead. Android had no SyncML support until 2.1, when a new, written-from-scratch implementation was added. The cost of this mistake was one year of poor corporate uptake.

The other problem with Android, of course, is its "walled garden" approach to development. Android may be an open-source project, but Google maintains total control over the base release; nobody else even sees the code until Google throws it over the wall. No changes from partners get in, so there is no community around the code, no shared innovation. As an example, Android could have solved its calendar problem much sooner had it been willing to accept help from outside. Google's total control over Android was needed to give the project its market focus. It was a necessary precondition for market dominance, but it is bad for community and has forced Google to reinvent a lot of wheels.

Another big mistake was being sued by Oracle. That suit is based on Android's rewrite of Java which, in turn, was entirely motivated by fear of the GPL. Had Android been built on Oracle's GPL-licensed Java code base, there would have been no suit; Google would have been protected by the GPL's implied patent license. If Oracle wins, rewriting Java will turn out to be a hugely expensive exercise in license avoidance. And the sad fact is that the license is entirely irrelevant: the Java runtime's API constitutes a "bright line" isolating applications from the GPL.

Lessons learned

So what can be learned from all of this? James reiterated that forking can be a good thing, but only if the results are merged back. The Android fork has not been merged back despite a great deal of effort; it's also not clear that the Android developers have bought into the solutions that the kernel community has come up with. Maybe, he said, we need to come up with a way to make merging easier. The community should have a better way of handling this process, which currently tends to get bogged down in review, especially if the fork is large.

Projects which create forks also need to think about their processes. Forks tend to create not-invented-here mentalities which, in turn, lead to a reluctance to show the resulting code. It's no fun to post code that you know is going to be panned by the community. The longer a fork goes, the worse the situation gets; fixing of fundamental design mistakes (which is what wakelocks are in the community's view) gets harder. Preventing this problem requires forks to be more inclusive, post their code more often, and ask the community's advice - even if they do not plan to take that advice. It's important to open the wall and let ideas pass through in both directions.

James talked a bit about "licensing fears," stating that the GPL is our particular version of FUD. The discussions we have in the community about licensing tend to look like scary problems to people in industry; less heat from the community on this subject would do a lot of good. The fear of the GPL is driven by outside interests, but we tend to make it easy for them. The community should be more proactive on this front to allay fears; pointing to Android as an example of how GPL-licensed code can work is one possibility. The Linux Foundation does some of this work, but James thinks that the community needs to help. The GPL, he said, is far easier to comply with than most commercial licensing arrangements; that's a point we need to be making much more clearly.

We should also design more "bright line" systems which make the question of GPL compliance clear. The kernel's user-space ABI is one such system; developers know that user-space code is not considered to be derived from the kernel. Making the boundary easy to understand helps to make the GPL less scary.
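
As a concrete, if trivial, sketch of that boundary (an illustration, not an example James presented), a user-space program interacts with the kernel only through the system-call ABI:

    /*
     * User-space code crosses into the kernel only through the
     * system-call ABI; by the convention described above, that makes
     * this program not a derived work of the kernel, whatever license
     * the program itself carries.
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        long pid = syscall(SYS_getpid);   /* the ABI boundary */

        printf("my pid is %ld\n", pid);
        return 0;
    }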

The community should do better at fostering and embracing diversity, encouraging forks (which can create significant progress) and helping them to merge back. Currently, James said, the kernel gets a "C - must do better" grade at best here. We only take code from people who look like us; as a result, the Android merge attempt was more painful than it needed to be.

Companies, in turn, should aim for "control by acclamation" rather than control by total ownership. Linus Torvalds was given as an example; he has a lot of control, but only because the community trusts him to do the right thing. In general, if the community trusts you, it will happily hand over a lot of control; that's why the benevolent dictator model is as common as it is. On the other hand, companies which try to assert control through walled garden development or by demanding copyright assignment from contributors have a much harder time with the community.

In summary, James said, Android was a fiasco for everybody involved; we all need to figure out how to do better. We need to find better ways of encouraging and managing forks and allaying licensing fears. Projects which create forks should be thinking about merging back from the outset. Then projects which (like Android) are a commercial success can also be a community success.

[Your editor would like to thank the Linux Foundation for funding his travel to Japan to attend this event.]

Comments (92 posted)

Page editor: Jonathan Corbet

Security

Phantom: Decentralized anonymous networking

By Jake Edge
June 8, 2011

Anonymity on the internet is an interesting problem, for which several different solutions have been implemented (e.g. Tor, Freenet). Creating such a network is an interesting exercise in its own right, but using one is also highly useful for avoiding various kinds of internet activity monitoring. While people in relatively free countries may find it useful to avoid their ISP's and government's monitoring, activists and others living under more repressive regimes may find it to be much more than that—in some cases, it could make a life-or-death difference. Phantom is another mechanism for providing internet anonymity that has a number of interesting properties.

The Phantom protocol was introduced at DEFCON 16 in 2008 by Magnus Bråding (slides [PPT]) and is designed to provide decentralized anonymity. The idea is that there is no central weak point that can be shut down or attacked to stop the use of Phantom. It also requires end-to-end encryption, unlike Tor and others, so that there is no "exit node" problem, where a compromised or malicious participant can eavesdrop on the communication. In addition, Phantom is designed for higher performance so that large data volumes can be transferred through the network.

One of the most interesting aspects of Phantom is that it requires no changes to existing internet applications. From the perspective of a web browser or other application, it is just using standard-looking IP addresses. In reality, those addresses are Anonymous Protocol (AP) addresses that are handled by the Phantom software. One of the assumptions that Phantom makes is that IP addresses can be mapped to real-life identities (a very sensible assumption), so one of the major goals is to ensure that those cannot leak.

While the internet is used to carry all of the Phantom traffic, that traffic is virtually partitioned from the rest of the internet. Service providers that want to enable anonymous access to their services (e.g. a web server) have to register that service within the Phantom network. Obviously, that registry could be a problem from a decentralization standpoint, but Phantom uses a distributed hash table (DHT) to contain the information. Various large-scale implementations of DHTs, like the Kademlia protocol used by the eMule peer-to-peer system, already exist.

The DHT is known as the "network database" and contains two separate tables. One lists the IP addresses, ports, and properties of the currently connected nodes in the network, while the other has the AP addresses and properties of connected and registered nodes. The two tables are, obviously, not directly correlated as that would defeat the whole purpose. In order to get a "copy" of the DHT, a new node just needs to contact one existing node and join into the distributed database. Lists of valid IP addresses to connect to could come via nearly any mechanism: web sites, email, or even distributed on pieces of paper. If even one of the listed nodes is still valid, a new node can use it to join in.
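
As a rough conceptual sketch (in C, with invented names; this is not Phantom's actual code), the two tables might look something like the following, with nothing in either one tying an AP address to the IP address of the machine that actually hosts a service:

    #include <stddef.h>
    #include <stdint.h>

    /* Table 1: currently connected nodes, keyed by real IP address. */
    struct node_entry {
        uint32_t ip;              /* real IP address of the node */
        uint16_t port;
        uint32_t flags;           /* node properties (bandwidth, uptime, ...) */
    };

    /* Table 2: registered nodes, keyed by AP address.  Resolving an AP
     * address yields only the entry node of the service's routing path,
     * never the address of the machine hosting the service. */
    struct ap_entry {
        uint32_t ap_addr;         /* AP address (looks like an ordinary IP) */
        uint32_t entry_node_ip;   /* real IP of the entry node only */
        uint32_t flags;
    };

    static const struct ap_entry *
    ap_lookup(const struct ap_entry *table, size_t n, uint32_t ap)
    {
        for (size_t i = 0; i < n; i++)
            if (table[i].ap_addr == ap)
                return &table[i];
        return NULL;              /* unknown AP address */
    }

A node resolving an AP address thus learns only where to connect to reach the service's entry node.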

A client that wants to communicate on the network must set up its own exit node. It does so by choosing a number of other nodes in the network with which to establish a routing path, the last one of which is the exit node. Unlike Tor, there isn't an established set of exit nodes as any system participating in the network can potentially be an exit node. Also unlike Tor, it is the endpoint that chooses its routing path, rather than the network making those decisions. There is a detailed description of the protocol for establishing a routing path in the Phantom design white paper [PDF]. Each step along the path is encrypted using SSL and the paper shows the details of the complicated process of creating the exit node.

Similarly, any services on the network need to create a routing path to an "entry node". In some cases, where the service itself does not require anonymity but wants to provide access for anonymous clients, the entry node may be the server itself. In any case, services register their AP-to-IP address mapping in the DHT using the IP address of the entry node. For services that do wish to remain anonymous, they will still be hidden behind the routing path from that entry node.

Furthermore, nodes create routing tunnels between themselves and their exit or entry node. These tunnels are under the control of the endpoints, not the network or any intermediary (including entry/exit) nodes. Making a connection is then a process of connecting the two routing tunnels together with the exit node of the client connecting to the entry node of the server. These tunnels are bi-directional, and encrypted in such a way that the intermediaries cannot decrypt the traffic, nor can a man-in-the-middle interfere with the communication without detection.

One of the important properties of the system is that nodes do not know whether they are talking to an endpoint or just another node in a routing path. The routing paths themselves can be arbitrarily long, and could even be chained together to provide further isolation as desired.

While the whole scheme seems fiendishly complex, it has been implemented [PDF] by Johannes Schlumberger as part of his master's degree work. Performance is, perhaps surprisingly, said to be reasonable: "maxing out a 100 Mb/s network connection for data transfers over multi-hop Phantom routing tunnels, so the crypto overhead does not seem to be significant at all". The code is available under the Hacktivismo Enhanced-Source Software License Agreement (HESSLA), which seems to be a GPL-inspired license with some additional "political" objectives. Based on the README, the implementation uses a tun virtual network device and may be fairly complicated to set up.

Overall, Phantom looks very interesting. Like Tor and others, though, it requires a fairly large number of participating nodes in order to truly be of use. One of the biggest barriers for Tor has been that exit nodes get blamed for the behavior of the traffic that emanates from them. Since that traffic can't be traced further back than the exit nodes (at least hopefully), any criminal or malicious traffic is associated with whoever runs the Tor node. Because services will have to specifically enable anonymous access for Phantom, that may be less of a problem. It may also make Phantom adoption less likely.

It's a bit difficult to see widespread adoption of Phantom (or any of the other anonymous network protocols), though the Electronic Frontier Foundation has been pushing Tor adoption recently. Some kind of solution is clearly needed but, so far, the logistical and legal hurdles seem to be too large for many to overcome. Unfortunately, anonymous networks may fall into the category of "things that are not set up until it's too late". But it is good to see that people are still thinking about, and working on, this problem.

Comments (18 posted)

Brief items

Security quotes of the week

The possible impact on elections using optical scan ballots is more mixed. One positive use is to detect ballot box stuffing---our methods could help identify whether someone replaced a subset of the legitimate ballots with a set of fraudulent ballots completed by herself. On the other hand, our approach could help an adversary with access to the physical ballots or scans of them to undermine ballot secrecy. Suppose an unscrupulous employer uses a bubble form employment application. That employer could test the markings against ballots from an employee's jurisdiction to locate the employee's ballot.
-- Will Clarkson reports on research showing that "bubble forms" may not provide their presumed anonymity

RSA Security Chairman Art Coviello said that the reason RSA had not disclosed the full extent of the vulnerability was because doing so would have revealed to the hackers how to perform further attacks. RSA's customers might question this reasoning; the Lockheed Martin incident suggests that the RSA hackers knew what to do anyway—failing to properly disclose the true nature of the attack served only to mislead RSA's customers about the risks they faced.
-- Peter Bright in ars technica about the cracking of RSA's SecurID

Comments (none posted)

New vulnerabilities

couchdb: cross site scripting

Package(s):couchdb CVE #(s):CVE-2010-3854
Created:June 7, 2011 Updated:June 8, 2011
Description: From the CVE entry:

Multiple cross-site scripting (XSS) vulnerabilities in the web administration interface (aka Futon) in Apache CouchDB 0.8.0 through 1.0.1 allow remote attackers to inject arbitrary web script or HTML via unspecified vectors.

Alerts:
Fedora FEDORA-2011-7232 couchdb 2011-05-19

Comments (none posted)

drupal: multiple vulnerabilities

Package(s):drupal CVE #(s):
Created:June 3, 2011 Updated:June 8, 2011
Description: From the Fedora advisory:

Multiple vulnerabilities and weaknesses were discovered in Drupal.

.... Reflected cross site scripting vulnerability in error handler

A reflected cross site scripting vulnerability was discovered in Drupal's error handler. Drupal displays PHP errors in the messages area, and a specially crafted URL can cause malicious scripts to be injected into the message. The issue can be mitigated by disabling on-screen error display at admin/settings/error-reporting. This is the recommended setting for production sites.

This issue affects Drupal 6.x only.

.... Cross site scripting vulnerability in Color module

When using re-colorable themes, color inputs are not sanitized. Malicious color values can be used to insert arbitrary CSS and script code. Successful exploitation requires the "Administer themes" permission.

This issue affects Drupal 6.x and 7.x.

.... Access bypass in File module

When using private files in combination with a node access module, the File module allows unrestricted access to private files.

This issue affects Drupal 7.x only.

Alerts:
Fedora FEDORA-2011-7546 drupal 2011-05-26
Fedora FEDORA-2011-7578 drupal 2011-05-26
Fedora FEDORA-2011-7588 drupal7 2011-05-26
Fedora FEDORA-2011-7559 drupal6 2011-05-26
Fedora FEDORA-2011-7575 drupal7 2011-05-26

Comments (none posted)

fetchmail: denial of service

Package(s):fetchmail CVE #(s):CVE-2011-1947
Created:June 7, 2011 Updated:June 21, 2011
Description: From the Mandriva advisory:

fetchmail 5.9.9 through 6.3.19 does not properly limit the wait time after issuing a (1) STARTTLS or (2) STLS request, which allows remote servers to cause a denial of service (application hang) by acknowledging the request but not sending additional packets.

Alerts:
Fedora FEDORA-2011-8021 fetchmail 2011-06-08
Fedora FEDORA-2011-8059 fetchmail 2011-06-08
Fedora FEDORA-2011-8011 fetchmail 2011-06-08
Slackware SSA:2011-171-01 fetchmail 2011-06-21
Mandriva MDVSA-2011:107 fetchmail 2011-06-07

Comments (none posted)

flash-plugin: cross-site scripting

Package(s):flash-plugin CVE #(s):CVE-2011-2107
Created:June 6, 2011 Updated:June 13, 2011
Description: From the Red Hat advisory:

This update fixes one vulnerability in Adobe Flash Player. This vulnerability is detailed on the Adobe security page APSB11-13, listed in the References section.

Alerts:
Gentoo 201110-11 adobe-flash 2011-10-13
SUSE SUSE-SU-2011:0614-1 flash-player 2011-06-13
openSUSE openSUSE-SU-2011:0612-1 flash-player 2011-06-08
Red Hat RHSA-2011:0850-01 flash-plugin 2011-06-06

Comments (none posted)

java: multiple vulnerabilities

Package(s):java-1.6.0-sun java-1.6.0-openjdk CVE #(s):CVE-2011-0802 CVE-2011-0814 CVE-2011-0862 CVE-2011-0863 CVE-2011-0864 CVE-2011-0865 CVE-2011-0867 CVE-2011-0868 CVE-2011-0869 CVE-2011-0871 CVE-2011-0873
Created:June 8, 2011 Updated:September 28, 2011
Description: Java implementations suffer from a number of vulnerabilities:

  • CVE-2011-0802, CVE-2011-0814, CVE-2011-0863, CVE-2011-0873: "unspecified vulnerabilities"
  • CVE-2011-0862: integer overflows
  • CVE-2011-0864: JVM memory corruption by specific byte code
  • CVE-2011-0865: SignedObjects can be made mutable
  • CVE-2011-0867: information leak
  • CVE-2011-0868: "incorrect numeric type conversion"
  • CVE-2011-0869: unprivileged proxy settings
  • CVE-2011-0871: excessively privileged objects created
Alerts:
Gentoo 201406-32 icedtea-bin 2014-06-29
Debian DSA-2358-1 openjdk-6 2011-12-05
Gentoo 201111-02 sun-jdk 2011-11-05
Debian DSA-2311-1 openjdk-6 2011-09-27
SUSE SUSE-SU-2011:0966-1 IBM Java 2011-08-30
SUSE SUSE-SA:2011:036 java-1_4_2-ibm 2011-08-29
Red Hat RHSA-2011:1159-01 java-1.4.2-ibm 2011-08-15
Mandriva MDVSA-2011:126 java-1.6.0-openjdk 2011-08-15
SUSE SUSE-SU-2011:0863-2 IBM Java5 JRE and SDK 2011-08-05
SUSE SUSE-SU-2011:0863-1 java 2011-08-02
Red Hat RHSA-2011:1087-01 java-1.5.0-ibm 2011-07-22
SUSE SUSE-SU-2011:0807-1 java 2011-07-19
SUSE SUSE-SA:2011:030 java-1_6_0-ibm 2011-07-18
Red Hat RHSA-2011:0938-01 java-1.6.0-ibm 2011-07-15
Pardus 2011-95 sun-jdk sun-jre 2011-07-11
openSUSE openSUSE-SU-2011:0706-1 java-1_6_0-openjdk 2011-06-28
Scientific Linux SL-java-20110621 java 2011-06-21
Ubuntu USN-1154-1 openjdk-6, openjdk-6b18 2011-06-17
Fedora FEDORA-2011-8028 java-1.6.0-openjdk 2011-06-08
SUSE SUSE-SA:2011:032 java-1_5_0-ibm,IBMJava5 2011-08-04
Fedora FEDORA-2011-8020 java-1.6.0-openjdk 2011-06-08
SUSE SUSE-SU-2011:0632-1 java 2011-06-14
SUSE openSUSE-SU-2011:0633-1 java 2011-06-14
CentOS CESA-2011:0857 java-1.6.0-openjdk 2011-06-13
Fedora FEDORA-2011-8003 java-1.6.0-openjdk 2011-06-08
Scientific Linux SL-java-20110608 java-1.6.0-openjdk 2011-06-08
Scientific Linux SL-java-20110608 java-1.6.0-openjdk 2011-06-08
Red Hat RHSA-2011:0860-01 java-1.6.0-sun 2011-06-08
Red Hat RHSA-2011:0856-01 java-1.6.0-openjdk 2011-06-08
Red Hat RHSA-2011:0857-01 java-1.6.0-openjdk 2011-06-08

Comments (none posted)

libxml2: arbitrary code execution

Package(s):libxml2 CVE #(s):
Created:June 6, 2011 Updated:June 8, 2011
Description: From the Debian advisory:

Chris Evans discovered that libxml was vulnerable to buffer overflows, which allowed a crafted XML input file to potentially execute arbitrary code.

Alerts:
Debian DSA-2255-1 libxml2 2011-06-06

Comments (none posted)

oprofile: command injection/privilege escalation

Package(s):oprofile CVE #(s):CVE-2011-1760
Created:June 6, 2011 Updated:July 26, 2011
Description: From the Debian advisory:

OProfile is a performance profiling tool which is configurable by opcontrol, its control utility. Stephane Chauveau reported several ways to inject arbitrary commands in the arguments of this utility. If a local unprivileged user is authorized by sudoers file to run opcontrol as root, this user could use the flaw to escalate his privileges.

Alerts:
Gentoo 201412-09 racer-bin, fmod, PEAR-Mail, lvm2, gnucash, xine-lib, lastfmplayer, webkit-gtk, shadow, PEAR-PEAR, unixODBC, resource-agents, mrouted, rsync, xmlsec, xrdb, vino, oprofile, syslog-ng, sflowtool, gdm, libsoup, ca-certificates, gitolite, qt-creator 2014-12-11
Fedora FEDORA-2011-8087 oprofile 2011-06-10
Fedora FEDORA-2011-8076 oprofile 2011-06-10
Debian DSA-2254-2 oprofile 2011-07-11
Ubuntu USN-1166-1 oprofile 2011-07-11
Debian DSA-2254-1 oprofile 2011-06-03

Comments (none posted)

phpMyAdmin: multiple vulnerabilities

Package(s):phpMyAdmin CVE #(s):
Created:June 6, 2011 Updated:June 15, 2011
Description:

From the phpMyAdmin advisories [1, 2]:

PMASA-2011-3: XSS vulnerability on Tracking page - It was possible to create a crafted table name that leads to XSS.

PMASA-2011-4: URL redirection to untrusted site - It was possible to redirect to an arbitrary, untrusted site, leading to a possible phishing attack.

Alerts:
Fedora FEDORA-2011-7702 phpMyAdmin 2011-05-30
Fedora FEDORA-2011-7703 phpMyAdmin 2011-05-30
Fedora FEDORA-2011-7684 phpMyAdmin 2011-05-30

Comments (none posted)

tor: denial of service

Package(s):tor CVE #(s):
Created:June 7, 2011 Updated:June 8, 2011
Description: From the Red Hat bugzilla:

A vulnerability in Tor was reported that could allow a malicious remote attacker to cause a denial of service. This vulnerability is due to a boundary error within the policy_summarize() function in src/or/policies.c which can be exploited to crash a Tor directory authority.

Alerts:
Fedora FEDORA-2011-7972 tor 2011-06-07

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 3.0-rc2, released on June 6. "It's been reasonably quiet, although the btrfs update is bigger than I was hoping for. Other than that, it's mostly driver fixes, some ubifs updates too, and a few reverts for the early regressions." The short changelog is in the announcement, or see the full changelog for the details.

Stable updates: no stable updates have been released in the last week, and none are in the review process as of this writing.

Comments (none posted)

Quotes of the week

Bugs are like mushrooms - found one, look around for more...
-- Al Viro

Maximizing security is hard: whether a bug has security implications is highly usecase and bug dependent, and the true security impact of bugs is not discovered in the majority of cases. I estimate that in *ANY* OS there's probably at least 10 times more bugs with some potential security impact than ever get a CVE number...

So putting CVEs into the changelog is harmful, pointless, misleading and would just create a fake "scare users" and "gain attention" industry (coupled with a "delay bug fixes for a long time" aspect, if paid well enough) that operates based on issuing CVEs and 'solving' them - which disincentivises the *real* bugfixes and the non-self-selected bug fixers.

I'd like to strengthen the natural 'bug fixing' industry, not the security circus industry.

-- Ingo Molnar

Comments (47 posted)

Ext4 snapshot concerns

By Jonathan Corbet
June 8, 2011
The next3 filesystem patch, which added snapshots to the ext3 filesystem, appeared just over one year ago; LWN's discussion of the patch at the time concluded that it needed to move forward to ext4 before it could possibly be merged. That change has been made, and recent ext4 snapshot patches are starting to look close to being ready for merging into the mainline. That has inspired the airing of new concerns which may slow the process somewhat.

One complaint came from Josef Bacik:

I probably should have brought this up before, but why put all this effort into shoehorning in such a big an invasive feature to ext4 when btrfs does this all already? Why not put your efforts into helping btrfs become stable and ready and then use that, instead of having to come up with a bunch of hacks to get around the myriad of weird feature combinations you can get with ext4?

Snapshot developer Amir Goldstein's response is that his employer (CTERA Networks) wanted the feature in ext4. The feature is shipping in products now, and btrfs is still not seen as stable enough to use in that environment.

There are general concerns about merging another big feature into a filesystem which is supposed to be stable and ready for production use. Nobody wants to see the addition of serious bugs to ext4 at this time. Beyond that, the snapshot feature does not currently work with all variants of the ext4 on-disk format. There are a number of ext4 features which do not currently play well together, leading Eric Sandeen to worry about where the filesystem is going:

If ext4 matches the lifespan of ext3, in 10 years I fear that it will look more like a collection of various individuals' pet projects, rather than any kind of well-designed, cohesive project. How long can we really keep adding features which are semi- or wholly- incompatible with other features?

Consider this a cry in the wilderness for less rushed feature introduction, and a more holistic approach to ext4 design...

Ext4 maintainer Ted Ts'o has responded with a rare (for the kernel community) admission that technical concerns are not the sole driver of feature-merging decisions:

It's something I do worry about; and I do share your concern. At the same time, the reality is that we are a little like the Old Dutch Masters, who had take into account the preference of their patrons (i.e., in our case, those who pay our paychecks :-).

In this case, he thinks that there are a lot of people who are interested in the snapshot feature. He worries that companies like CTERA could move away from ext4 if it cannot be made to meet their needs. So his plan is to merge snapshots once (1) the patches are good enough and (2) it looks like there is a plan to address the remaining issues.

Comments (4 posted)

Kernel development news

On vsyscalls and the vDSO

By Jonathan Corbet
June 8, 2011
The "vsyscall" and "vDSO" segments are two mechanisms used to accelerate certain system calls in Linux. While their basic function (provide fast access to functionality which does not need to run in kernel mode) is the same, there are some distinct differences between them. Recently vsyscall has come to be seen as an enabler of security attacks, so some patches have been put together to phase it out. The discussion of those patches shows that the disagreement over how security issues are handled by the community remains as strong as ever.

The vsyscall area is the older of these two mechanisms. It was added as a way to execute specific system calls which do not need any real level of privilege to run. The classic example is gettimeofday(); all it needs to do is to read the kernel's idea of the current time. There are applications out there that call gettimeofday() frequently, to the point that they care about even a little bit of overhead. To address that concern, the kernel allows the page containing the current time to be mapped read-only into user space; that page also contains a fast gettimeofday() implementation. Using this virtual system call, the C library can provide a fast gettimeofday() which never actually has to change into kernel mode.
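As a rough illustration (this example is not from the discussion itself), a time-sensitive application might look like the minimal sketch below; when the C library backs gettimeofday() with the vsyscall page or the vDSO, none of these calls needs to enter the kernel:

    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval tv;

        /* A hot path that repeatedly asks for the current time; with a
         * vsyscall/vDSO-backed gettimeofday(), no mode switch is needed. */
        for (int i = 0; i < 1000000; i++)
            gettimeofday(&tv, NULL);

        printf("last reading: %ld.%06ld\n",
               (long)tv.tv_sec, (long)tv.tv_usec);
        return 0;
    }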

Vsyscall has some limitations; among other things, there is only space for a handful of virtual system calls. As those limitations were hit, the kernel developers introduced the more flexible vDSO implementation. A quick look on a contemporary system will show that both are still in use:

    $ cat /proc/self/maps
    ...
    7fffcbcb7000-7fffcbcb8000 r-xp 00000000 00:00 0            [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0    [vsyscall]

The key to the current discussion can be seen by typing the same command again and comparing the output:

    7fff379ff000-7fff37a00000 r-xp 00000000 00:00 0             [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0     [vsyscall]

Note that the vDSO area has moved, while the vsyscall page remains at the same location. The location of the vsyscall page is nailed down in the kernel ABI, but the vDSO area - like most other areas in the user-space memory layout - has its location randomized every time it is mapped.

Address-space layout randomization is a form of defense against security holes. An attacker who is able to overrun the stack can often arrange for a function in the target process to "return" to an arbitrary address. Depending on what instructions are found at that address, this return can cause almost anything to happen. Returning into the system() function in the C library is an obvious example; it can be used to execute arbitrary commands. If the location of the C library in memory is not known, though, then it becomes difficult or impossible for an exploit to jump into a useful place.

There is no system() function in the vsyscall page, but there are several machine instructions that invoke system calls. With just a bit of setup, these instructions might be usable in a stack overrun attack to invoke an arbitrary system call with attacker-defined parameters - not a desirable outcome. So it would be nice to get rid of - or at least randomize the location of - the vsyscall page to thwart this type of attack. Unfortunately, applications depend on the existence and exact address of that page, so nothing can be done.

Except that Andrew Lutomirski found something that could be done: remove all of the useful instructions from the vsyscall page. One was associated with the vsyscall64 sysctl knob, which is really only useful for user-mode Linux (and does not work properly even there); it was simply deleted. Others weren't actually system call instructions as such: the system time, if jumped into (and, thus, executed as if it were code) when it held just the right value, looks like a system call instruction. To address that problem, variables have been moved into a separate page with execute permission turned off.

The remaining code in the vsyscall page has simply been removed and replaced by a special trap instruction. An application trying to call into the vsyscall page will trap into the kernel, which will then emulate the desired virtual system call in kernel space. The result is a kernel system call emulating a virtual system call which was put there to avoid the kernel system call in the first place. The end result is a "vsyscall" which takes a fraction of a microsecond longer to execute but, crucially, does not break the existing ABI. In any case, the slowdown will only be seen if the application is trying to use the vsyscall page instead of the vDSO.

Contemporary applications should not be doing that most of the time, except for one little problem: glibc still uses the vsyscall version of time(). That has been fixed in the glibc repository, but the fix may not find its way out to users for a while; meanwhile, time() calls will be a little slower than they were before. That should not really be an issue, but one never knows, so Andy put in a configuration option to preserve the old way of doing things. Anybody worried about the overhead of an emulated vsyscall page can set CONFIG_UNSAFE_VSYSCALLS to get the old behavior.

Nobody really objected to the patch series as a whole, but Linus hated the name of the configuration option; he asked that it be called CONFIG_LEGACY_VSYSCALLS instead. Or, even better, the change could just be done unconditionally. That led to a fairly predictable response from the PaX developer on how the kernel community likes to hide security problems, to which Linus said:

Calling the old vdso "UNSAFE" as a config option is just plain stupid. It's a politicized name, with no good reason except for your political agenda. And when I call it out as such, you just spout the same tired old security nonsense.

Suffice to say that the conversation went downhill from there; interested parties can follow the thread links in the messages cited above.

One useful point from that discussion is that the static vsyscall page is not, in fact, a security vulnerability; it's simply a resource which can make it easier for an attacker to exploit a vulnerability elsewhere in the system. Whether that aspect makes that page "unsafe" or merely "legacy" is left as an exercise for the reader. Either way, removing it is seen as a good idea even though that removal might, arguably, cause real security bugs to remain unfixed in the kernel; the argument is all about naming.

Final versions of the patches have not been posted as of this writing, but the shape they will take is fairly clear. The static vsyscall page will not continue to exist in its current form, and applications which still use it will continue to work but will get a little bit slower. The configuration option controlling this behavior may or may not exist, but any distribution shipping a kernel containing this change (presumably 3.1 or later) will also have a C library which no longer tries to use the vsyscall page. And, with luck, exploiting vulnerabilities will get just a little bit harder.

Comments (9 posted)

Memory power management

By Jonathan Corbet
June 7, 2011
Efforts to reduce power consumption on Linux systems have often focused on the CPU; that emphasis does make sense, since the CPU draws a significant portion of the power used on most systems. After years of effort to improve the kernel's power behavior, add instrumentation to track wakeups, and fix misbehaving applications, Linux does quite well when it comes to CPU power. So now attention is moving to other parts of the system in search of power savings to be had; one of those places is main memory. Contemporary DRAM memory requires power for its self-refresh cycles even if it is not being used; might there be a way to reduce its consumption?

One technology which is finding its way into some systems is called "partial array self refresh" or PASR. On a PASR-enabled system, memory is divided into banks, each of which can be powered down independently. If (say) half of memory is not needed, that memory (and its self-refresh mechanism) can be turned off; the result is a reduction in power use, but also the loss of any data stored in the affected banks. The amount of power actually saved is a bit unclear; estimates seem to run in the range of 5-15% of the total power used by the memory subsystem.

The key to powering down a bank of memory, naturally, is to be sure that there is no important data stored therein first. That means that the system must either evacuate a bank to be powered down, or it must take care not to allocate memory there in the first place. So the memory management subsystem will have to become aware of the power topology of main memory and take that information into account when satisfying allocation requests. It will also have to understand the desired power management policy and make decisions to power banks up or down depending on the current level of memory pressure. This is going to be fun: memory management is already a complicated set of heuristics which attempt to provide reasonable results for any workload; adding power management into the mix can only complicate things further.

A recent patch set from Ankita Garg does not attempt to solve the whole problem; instead, it creates an initial infrastructure which can be used for future power management decisions. Before looking at that patch, though, a bit of background will be helpful.

The memory management subsystem already splits available memory at two different levels. On non-uniform memory access (NUMA) systems, memory which is local to a specific processor will be faster to access than memory on a different processor. The kernel's memory management code takes NUMA nodes into account to implement specific allocation policies. In many cases, the system will try to keep a process and all of its memory on the same NUMA node in the hope of maximizing the number of local accesses; other times, it is better to spread allocations evenly across the system. The point is that the NUMA node must be taken into account for all allocation and reclaim decisions.

The other important concept is that of a "zone"; zones are present on all systems. The primary use of zones is to categorize memory by accessibility; 32-bit systems, for example, will have "low memory" and "high memory" zones to contain memory which can and cannot (respectively) be directly accessed by the kernel. Systems may have a zone for memory accessible with a 32-bit address; many devices can only perform DMA to such addresses. Zones are also used to separate memory which can readily be relocated (user-space pages accessed through page tables, for example) from memory which is hard to move (kernel memory for which there may be an arbitrary number of pointers). Every NUMA node has a full set of zones.
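For reference, these zone categories are captured by an enumeration in include/linux/mmzone.h; an abridged sketch (omitting some configuration-dependent entries such as ZONE_DMA32) looks roughly like this:

    enum zone_type {
        ZONE_DMA,       /* memory reachable by devices with addressing limits */
        ZONE_NORMAL,    /* memory directly mapped by the kernel */
        ZONE_HIGHMEM,   /* 32-bit only: memory the kernel cannot map directly */
        ZONE_MOVABLE,   /* allocations which can be migrated or reclaimed */
        __MAX_NR_ZONES
    };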

PASR has been on the horizon for a little while, so a few people have been thinking about how to support it; one of the early works would appear to be this paper by Henrik Kjellberg, though that work didn't result in code submitted upstream. Henrik pointed out that the kernel already has a couple of mechanisms which could be used to support PASR. One of those is memory hotplug, wherein memory can be physically removed from the system. Turning off a bank of memory can be thought of as being something close to removing that memory, so it makes sense to consider hotplug. Hotplug is a heavyweight operation, though; it is not well suited to power management, where decisions to power banks of memory up or down may be made fairly often.

Another approach would be to use zones; the system could set up a separate zone for each memory bank which could be powered down independently. Powering down a bank would then be a matter of moving needed data out of the associated zone and marking that zone so that no further allocations would be made from it. The problem with this approach is that a number of important memory management operations happen at the zone level; in particular, each zone has a set of limits on how many free pages must exist. Adding more zones would increase memory management overhead and create balancing problems which don't need to exist.

That is the approach that Ankita has taken, though; the patch adds another level of description called "regions" interposed between nodes and zones, essentially creating not just one new zone for each bank of memory, but a complete set of zones for each. The page allocator will always try to obtain pages from the lowest-numbered region it can in the hope that the higher regions will remain vacant. Over time, of course, this simple approach will not work and it will become necessary to migrate pages out of regions before they can be powered down. The initial patch does not address that issue, though - or any of the associated policy issues that come up.

Your editor is not a memory management hacker, but ignorance has never kept him from having an opinion on things. To a naive point of view, it would almost seem like this design has been done backward - that regions should really be contained within zones. That would avoid multiplying the number of zones in the system and the associated balancing costs. Also, importantly, it would allow regions to be controlled by the policy of a single enclosing zone. In particular, regions inside a zone used for movable allocations would be vacated with relative ease, allowing them to be powered down when memory pressure is light. Placing multiple zones within each region, instead, would make clearing a region harder.

The patch set has not gotten a lot of review attention; the people who know what they are talking about in this area have mostly kept silent. There are numerous memory management patches circulating at the moment, so time for review is probably scarce. Andrew Morton did ask about the overhead of this work on machines which lack the PASR capability and about how much power might actually be saved; answers to those questions don't seem to be available at the moment. So one might conclude that this patch set, while demonstrating an approach to memory power management, will not be ready for mainline inclusion in the near future. But, then, adding power management to such a tricky subsystem was never going to be done in a hurry.

Comments (17 posted)

Object-oriented design patterns in the kernel, part 2

June 7, 2011

This article was contributed by Neil Brown

In the first part of this analysis we looked at how the polymorphic side of object-oriented programming was implemented in the Linux kernel using regular C constructs. In particular we examined method dispatch, looked at the different forms that vtables could take, and the circumstances where separate vtables were eschewed in preference for storing function pointers directly in objects. In this conclusion we will explore a second important aspect of object-oriented programming - inheritance, and in particular data inheritance.

Data inheritance

Inheritance is a core concept of object-oriented programming, though it comes in many forms, whether prototype inheritance, mixin inheritance, subtype inheritance, interface inheritance etc., some of which overlap. The form that is of interest when exploring the Linux kernel is most like subtype inheritance, where a concrete or "final" type inherits some data fields from a "virtual" parent type. We will call this "data inheritance" to emphasize the fact that it is the data rather than the behavior that is being inherited.

Put another way, a number of different implementations of a particular interface share, and separately extend, a common data structure. They can be said to inherit from that data structure. There are three different approaches to this sharing and extending that can be found in the Linux kernel, and all can be seen by exploring the struct inode structure and its history, though they are widely used elsewhere.

Extension through unions

The first approach, which is probably the most obvious but also the least flexible, is to declare a union as one element of the common structure and, for each implementation, to declare an entry in that union with extra fields that the particular implementation needs. This approach was introduced to struct inode in Linux-0.97.2 (August 1992) when

       union {
               struct minix_inode_info minix_i;
               struct ext_inode_info ext_i;
               struct msdos_inode_info msdos_i;
       } u;

was added to struct inode. Each of these structures remained empty until 0.97.5 when i_data was moved from struct inode to struct ext_inode_info. Over the years several more "inode_info" fields were added for different filesystems, peaking at 28 different "inode_info" structures in 2.4.14.2 when ext3 was added.

This approach to data inheritance is simple and straightforward, but is also somewhat clumsy. There are two obvious problems. Firstly, every new filesystem implementation needs to add an extra field to the union "u". With 3 fields this may not seem like a problem; with 28 it was well past "ugly". Requiring every filesystem to update this one structure is an unnecessary barrier to adding new filesystems. Secondly, every inode allocated will be the same size and will be large enough to store the data for any filesystem. So a filesystem that wants lots of space in its "inode_info" structure will impose that space cost on every other filesystem.

The first of these issues is not an impenetrable barrier as we will see shortly. The second is a real problem and the general ugliness of the design encouraged change. Early in the 2.5 development series this change began; it was completed by 2.5.7 when there were no "inode_info" structures left in union u (though the union itself remained until 2.6.19).

Embedded structures

The change that happened to inodes in early 2.5 was effectively an inversion. The change which removed ext3_i from struct inode.u also added a struct inode, called vfs_inode, to struct ext3_inode_info. So instead of the private structure being embedded in the common data structure, the common data structure is now embedded in the private one. This neatly avoids the two problems with unions; now each filesystem only needs to allocate memory for its own structure, without knowing anything about what other filesystems might require. Of course nothing ever comes for free and this change brought with it other issues that needed to be solved, but the solutions were not costly.

The first difficulty is the fact that when the common filesystem code - the VFS layer - calls into a specific filesystem it passes a pointer to the common data structure, the struct inode. Using this pointer, the filesystem needs to find a pointer to its own private data structure. An obvious approach is to always place the struct inode at the top of the private inode structure and simply cast a pointer to one into a pointer to the other. While this can work, it lacks any semblance of type safety and makes it harder to arrange fields in the inode to get optimal performance - as some kernel developers are wont to do.

The solution was to use the list_entry() macro to perform the necessary pointer arithmetic, subtracting from the address of the struct inode its offset in the private data structure and then casting this appropriately. The macro for this was called list_entry() simply because the "list.h lists" implementation was the first to use this pattern of data structure embedding. The list_entry() macro did exactly what was needed and so it was used despite the strange name. This practice lasted until 2.5.28 when a new container_of() macro was added which implemented the same functionality as list_entry(), though with slightly more type safety and a more meaningful name. With container_of() it is a simple matter to map from an embedded data structure to the structure in which it is embedded.
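To make the pattern concrete, here is a simplified sketch of how ext3 embeds the common structure and maps back to its private one; the field layout is abbreviated, but the EXT3_I() helper mirrors what the kernel actually does:

    #include <linux/fs.h>
    #include <linux/kernel.h>       /* container_of() */

    struct ext3_inode_info {
        __le32          i_data[15];     /* ext3-private fields (abbreviated) */
        /* ... */
        struct inode    vfs_inode;      /* the common structure, embedded */
    };

    /* Map from the common struct inode back to the private structure by
     * subtracting the offset of vfs_inode within ext3_inode_info. */
    static inline struct ext3_inode_info *EXT3_I(struct inode *inode)
    {
        return container_of(inode, struct ext3_inode_info, vfs_inode);
    }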

The second difficulty was that the filesystem had to be responsible for allocating the inode - it could no longer be allocated by common code as the common code did not have enough information to allocate the correct amount of space. This simply involved adding alloc_inode() and destroy_inode() methods to the super_operations structure and calling them as appropriate.
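In outline, those methods look something like the sketch below, building on the structure shown above (simplified; ext3_inode_cachep is assumed to be a kmem_cache created at module initialization):

    #include <linux/fs.h>
    #include <linux/slab.h>

    static struct kmem_cache *ext3_inode_cachep;

    static struct inode *ext3_alloc_inode(struct super_block *sb)
    {
        struct ext3_inode_info *ei;

        ei = kmem_cache_alloc(ext3_inode_cachep, GFP_NOFS);
        if (!ei)
            return NULL;
        /* ... initialize ext3-private fields ... */
        return &ei->vfs_inode;          /* hand back the embedded common structure */
    }

    static void ext3_destroy_inode(struct inode *inode)
    {
        kmem_cache_free(ext3_inode_cachep, EXT3_I(inode));
    }

    static const struct super_operations ext3_sops = {
        .alloc_inode    = ext3_alloc_inode,
        .destroy_inode  = ext3_destroy_inode,
        /* ... */
    };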

Void pointers

As noted earlier, the union pattern was not an impenetrable barrier to adding new filesystems independently. This is because the union u had one more field that was not an "inode_info" structure. A generic pointer field called generic_ip was added in Linux-1.0.5, but it was not used until 1.3.7. Any file system that does not own a structure in struct inode itself could define and allocate a separate structure and link it to the inode through u.generic_ip. This approach addressed both of the problems with unions as no changes are needed to shared declarations and each filesystem only uses the space that it needs. However it again introduced new problems of its own.

Using generic_ip, each filesystem required two allocations for each inode instead of one and this could lead to more wastage depending on how the structure size was rounded up for allocation; it also required writing more error-handling code. Also there was memory used for the generic_ip pointer and often for a back pointer from the private structure to the common struct inode. Both of these are wasted space compared with the union approach or the embedding approach.

Worse than this though, an extra memory dereference was needed to access the private structure from the common structure; such dereferences are best avoided. Filesystem code will often need to access both the common and the private structures. This either requires lots of extra memory dereferences, or it requires holding the address of the private structure in a register which increases register pressure. It was largely these concerns that stopped struct inode from ever migrating to broad use of the generic_ip pointer. It was certainly used, but not by the major, high-performance filesystems.

Though this pattern has problems it is still in wide use. struct super_block has an s_fs_info pointer which serves the same purpose as u.generic_ip (which was renamed to i_private when the u union was finally removed - why it was not completely removed is left as an exercise for the reader). This is the only way to store filesystem-private data in a super_block. A simple search in the Linux include files shows quite a collection of fields which are void pointers named "private" or something similar. Many of these are examples of the pattern of extending a data type by using a pointer to a private extension, and most of these could be converted to using the embedded-structure pattern.
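A minimal sketch of this void-pointer style, using s_fs_info (the foo_* names are invented for illustration):

    #include <linux/fs.h>
    #include <linux/slab.h>

    struct foo_sb_info {
        unsigned long   block_count;
        /* ... other filesystem-private, per-superblock data ... */
    };

    static int foo_fill_super(struct super_block *sb, void *data, int silent)
    {
        struct foo_sb_info *sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);

        if (!sbi)
            return -ENOMEM;
        sb->s_fs_info = sbi;    /* extend the common structure via a void pointer */
        /* ... read the on-disk superblock and fill in sbi ... */
        return 0;
    }

    /* Reaching the private data later costs an extra dereference: */
    static unsigned long foo_block_count(struct super_block *sb)
    {
        struct foo_sb_info *sbi = sb->s_fs_info;

        return sbi->block_count;
    }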

Beyond inodes

While inodes serve as an effective vehicle to introduce these three patterns they do not display the full scope of any of them so it is useful to look further afield and see what else we can learn.

A survey of the use of unions elsewhere in the kernel shows that they are widely used though in very different circumstances than in struct inode. The particular aspect of inodes that is missing elsewhere is that a wide range of different modules (different filesystems) each wanted to extend an inode in different ways. In most places where unions are used there are a small fixed number of subtypes of the base type and there is little expectation of more being added. A simple example of this is struct nfs_fattr which stores file attribute information decoded out of an NFS reply. The details of these attributes are slightly different for NFSv2 and NFSv3 so there are effectively two subtypes of this structure with the difference encoded in a union. As NFSv4 uses the same information as NFSv3 this is very unlikely to ever be extended further.

A very common pattern in other uses of unions in Linux is for encoding messages that are passed around, typically between the kernel and user-space. struct siginfo is used to convey extra information with a signal delivery. Each signal type has a different type of ancillary information, so struct siginfo has a union to encode six different subtypes. union inputArgs appears to be the largest current union with 22 different subtypes. It is used by the "coda" network file system to pass requests between the kernel module and a user-space daemon which handles the network communication.

It is not clear whether these examples should be considered as the same pattern as the original struct inode. Do they really represent different subtypes of a base type, or is it just one type with internal variants? The Eiffel object-oriented programming language does not support variant types at all except through subtype inheritance so there is clearly a school of thought that would want to treat all usages of union as a form of subtyping. Many other languages, such as C++, provide both inheritance and unions allowing the programmer to make a choice. So the answer is not clear.

For our purposes it doesn't really matter what we call it as long as we know where to use each pattern. The examples in the kernel fairly clearly show that when all of the variants are understood by a single module, then a union is a very appropriate mechanism for variant structures, whether you want to refer to them as using data inheritance or not. When different subtypes are managed by different modules, or at least widely separate pieces of code, then one of the other mechanisms is preferred. The use of unions for this case has almost completely disappeared with only struct cycx_device remaining as an example of a deprecated pattern.

Problems with void pointers

Void pointers are not quite so easy to classify. It would probably be fair to say that void pointers are the modern equivalent of "goto" statements. They can be very useful but they can also lead to very convoluted designs. A particular problem is that when you look at a void pointer, like looking at a goto, you don't really know what it is pointing at. A void pointer called private is even worse - it is like a "goto destination" command - almost meaningless without reading lots of context.

Examining all the different uses that void pointers can be put to would be well beyond the scope of this article. Instead we will restrict our attention to just one new usage which relates to data inheritance and illustrates how the untamed nature of void pointers makes it hard to recognize their use in data inheritance. The example we will use to explain this usage is struct seq_file used by the seq_file library which makes it easy to synthesize simple text files like some of those in /proc. The "seq" part of seq_file simply indicates that the file contains a sequence of lines corresponding to a sequence of items of information in the kernel, so /proc/mounts is a seq_file which walks through the mount table reporting each mount on a single line.

When seq_open() is used to create a new seq_file it allocates a struct seq_file and assigns it to the private_data field of the struct file which is being opened. This is a straightforward example of void pointer based data inheritance where the struct file is the base type and the struct seq_file is a simple extension to that type. It is a structure that never exists by itself but is always the private_data for some file. struct seq_file itself has a private field which is a void pointer and it can be used by clients of seq_file to add extra state to the file. For example md_seq_open() allocates a struct mdstat_info structure and attaches it via this private field, using it to meet md's internal needs. Again, this is simple data inheritance following the described pattern.
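The md case follows roughly this shape (a simplified sketch; md_seq_ops is assumed to be a struct seq_operations defined elsewhere):

    #include <linux/fs.h>
    #include <linux/seq_file.h>
    #include <linux/slab.h>

    struct mdstat_info {
        int     event;                  /* md-private state for this open file */
    };

    static int md_seq_open(struct inode *inode, struct file *file)
    {
        struct seq_file *seq;
        struct mdstat_info *mi = kmalloc(sizeof(*mi), GFP_KERNEL);
        int error;

        if (!mi)
            return -ENOMEM;
        error = seq_open(file, &md_seq_ops);    /* attaches a seq_file to file->private_data */
        if (error) {
            kfree(mi);
            return error;
        }
        seq = file->private_data;       /* the seq_file extends struct file ... */
        seq->private = mi;              /* ... and mdstat_info extends the seq_file */
        return 0;
    }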

However the private field of struct seq_file is used by svc_pool_stats_open() in a subtly but importantly different way. In this case the extra data needed is just a single pointer. So rather than allocating a local data structure to refer to from the private field, svc_pool_stats_open simply stores that pointer directly in the private field itself. This certainly seems like a sensible optimization - performing an allocation to store a single pointer would be a waste - but it highlights exactly the source of confusion that was suggested earlier: that when you look at a void pointer you don't really know what it is pointing at, or why.

To make it a bit clearer what is happening here, it is helpful to imagine "void *private" as being like a union of every different possible pointer type. If the value that needs to be stored is a pointer, it can be stored in this union following the "unions for data inheritance" pattern. If the value is not a single pointer, then it gets stored in allocated space following the "void pointers for data inheritance" pattern. Thus when we see a void pointer being used it may not be obvious whether it is being used to point to an extension structure for data inheritance, or being used as an extension for data inheritance (or being used as something else altogether).

To highlight this issue from a slightly different perspective it is instructive to examine struct v4l2_subdev which represents a sub-device in a video4linux device, such as a sensor or camera controller within a webcam. According to the (rather helpful) documentation it is expected that this structure will normally be embedded in a larger structure which contains extra state. However this structure still has not just one but two void pointers, both with names suggesting that they are for private use by subtypes:

	/* pointer to private data */
	void *dev_priv;
	void *host_priv;

It is common that a v4l sub-device (a sensor, usually) will be realized by, for example, an I2C device (much as a block device which stores your filesystem might be realized by an ATA or SCSI device). To allow for this common occurrence, struct v4l2_subdev provides a void pointer (dev_priv), so that the driver itself doesn't need to define a more specific pointer in the larger structure which struct v4l2_subdev would be embedded in. host_priv is intended to point back to a "parent" device such as a controller which acquires video data from the sensor. Of the three drivers which use this field, one appears to follow that intention while the other two use it to point to an allocated extension structure. So both of these pointers are intended to be used following the "unions for data inheritance" pattern, where a void pointer is playing the role of a union of many other pointer types, but they are not always used that way.

It is not immediately clear that defining this void pointer in case it is useful is actually a valuable service to provide given that the device driver could easily enough define its own (type safe) pointer in its extension structure. What is clear is that an apparently "private" void pointer can be intended for various qualitatively different uses and, as we have seen in two different circumstances, they may not be used exactly as expected.

In short, recognizing the "data inheritance through void pointers" pattern is not easy. A fairly deep examination of the code is needed to determine the exact purpose and usage of void pointers.

A diversion into struct page

Before we leave unions and void pointers behind, a look at struct page may be interesting. This structure uses both of these patterns, though they are hidden somewhat due to historical baggage. This example is particularly instructive because it is one case where struct embedding simply is not an option.

In Linux memory is divided into pages, and these pages are put to a variety of different uses. Some are in the "page cache" used to store the contents of files. Some are "anonymous pages" holding data used by applications. Some are used as "slabs" and divided into pieces to answer kmalloc() requests. Others are simply part of a multi-page allocation or maybe are on a free list waiting to be used. Each of these different use cases could be seen as a subtype of the general class of "page", and in most cases need some dedicated fields in struct page, such as a struct address_space pointer and index when used in the page cache, or struct kmem_cache and freelist pointers when used as a slab.

Each page always has the same struct page describing it, so if the effective type of the page is to change - as it must as the demands for different uses of memory change over time - the type of the struct page must change within the lifetime of that structure. While many type systems are designed assuming that the type of an object is immutable, we find here that the kernel has a very real need for type mutability. Both unions and void pointers allow types to change and as noted, struct page uses both.

At the first level of subtyping there are only a small number of different subtypes as listed above; these are all known to the core memory management code, so a union would be ideal here. Unfortunately struct page has three unions with fields for some subtypes spread over all three, thus hiding the real structure somewhat.

When the primary subtype in use has the page being used in the page cache, the particular address_space that it belongs to may want to extend the data structure further. For this purpose there is a private field that can be used. However it is not a void pointer but is an unsigned long. Many places in the kernel assume an unsigned long and a void * are the same size and this is one of them. Most users of this field actually store a pointer here and have to cast it back and forth. The "buffer_head" library provides macros attach_page_buffers and page_buffers to set and get this field.
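Those helpers amount to little more than casts back and forth; a simplified rendering (close to, but not verbatim from, the kernel headers, which also take a page reference in attach_page_buffers) looks like:

    #define page_private(page)              ((page)->private)
    #define set_page_private(page, v)       ((page)->private = (v))

    static inline void attach_page_buffers(struct page *page,
                                           struct buffer_head *head)
    {
        /* the unsigned long "private" field is really holding a pointer */
        set_page_private(page, (unsigned long)head);
        SetPagePrivate(page);
    }

    #define page_buffers(page)      ((struct buffer_head *)page_private(page))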

So while struct page is not the most elegant example, it is an informative example of a case where unions and void pointers are the only option for providing data inheritance.

The details of structure embedding

Where structure embedding can be used, and where the list of possible subtypes is not known in advance, it seems to be increasingly the preferred choice. To gain a full understanding of it we will again need to explore a little bit further than inodes and contrast data inheritance with other uses of structure embedding.

There are essentially three uses for structure embedding - three reasons for including a structure within another structure. Sometimes there is nothing particularly interesting going on. Data items are collected together into structures and structures within structures simply to highlight the closeness of the relationships between the different items. In this case the address of the embedded structure is rarely taken, and it is never mapped back to the containing structure using container_of().

The second use is the data inheritance embedding that we have already discussed. The third is like it but importantly different. This third use is typified by struct list_head and other structs used as an embedded anchor when creating abstract data types.

The use of an embedded anchor like struct list_head can be seen as a style of inheritance as the structure containing it "is-a" member of a list by virtue of inheriting from struct list_head. However it is not a strict subtype as a single object can have several struct list_heads embedded - struct inode has six (if we include the similar hlist_node). So it is probably best to think of this sort of embedding more like a "mixin" style of inheritance. The struct list_head provides a service - that of being included in a list - that can be mixed-in to other objects, an arbitrary number of times.

A key aspect of data inheritance structure embedding that differentiates it from each of the other two is the existence of a reference counter in the inner-most structure. This is an observation that is tied directly to the fact that the Linux kernel uses reference counting as the primary means of lifetime management and so would not be shared by systems that used, for example, garbage collection to manage lifetimes.

In Linux, every object with an independent existence will have a reference counter, sometimes a simple atomic_t or even an int, though often a more explicit struct kref. When an object is created using several levels of inheritance the reference counter could be buried quite deeply. For example a struct usb_device embeds a struct device which embeds struct kobject which has a struct kref. So usb_device (which might in turn be embedded in a structure for some specific device) does have a reference counter, but it is contained several levels down in the nest of structure embedding. This contrasts quite nicely with a list_head and similar structures. These have no reference counter, have no independent existence and simply provide a service to other data structures.
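The nesting described above looks, in abbreviated form, like the following sketch; the my_usb_get() helper is invented for illustration (the real equivalent is usb_get_dev()), but the structure layering matches the kernel's:

    struct kobject {
        struct kref     kref;           /* the lifetime reference counter */
        /* ... */
    };

    struct device {
        struct kobject  kobj;
        /* ... */
    };

    struct usb_device {
        struct device   dev;
        /* ... */
    };

    /* Taking a reference on a usb_device ends up, several layers down,
     * incrementing the kref embedded in the kobject. */
    static struct usb_device *my_usb_get(struct usb_device *udev)
    {
        if (udev)
            get_device(&udev->dev);     /* kobject_get() -> kref_get() underneath */
        return udev;
    }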

Though it seems obvious when put this way, it is useful to remember that a single object cannot have two reference counters - at least not two lifetime reference counters (it is fine to have two counters, like s_active and s_count in struct super_block, which count different things). This means that multiple inheritance in the "data inheritance" style is not possible. The only form of multiple inheritance that can work is the mixin style used by list_head as mentioned above.

It also means that, when designing a data structure, it is important to think about lifetime issues and whether this data structure should have its own reference counter or whether it should depend on something else for its lifetime management. That is, whether it is an object in its own right, or simply a service provided to other objects. These issues are not really new and apply equally to void pointer inheritance. However an important difference with void pointers is that it is relatively easy to change your mind later and switch an extension structure to be a fully independent object. Structure embedding requires the discipline of thinking clearly about the problem up front and making the right decision early - a discipline that is worth encouraging.

The other key telltale for data inheritance structure embedding is the set of rules for allocating and initializing new instances of a structure, as has already been hinted at. When union or void pointer inheritance is used the main structure is usually allocated and initialized by common code (the mid-layer) and then a device specific open() or create() function is called which can optionally allocate and initialize any extension object. By contrast when structure embedding is used the structure needs to be allocated by the lowest level device driver which then initializes its own fields and calls in to common code to initialize the common fields.

Continuing the struct inode example from above which has an alloc_inode() method in the super_block to request allocation, we find that initialization is provided for with inode_init_once() and inode_init_always() support functions. The first of these is used when the previous use of a piece of memory is unknown, the second is sufficient by itself when we know that the memory was previously used for some other inode. We see this same pattern of an initializer function separate from allocation in kobject_init(), kref_init(), and device_initialize().

So apart from the obvious embedding of structures, the pattern of "data inheritance through structure embedding" can be recognized by the presence of a reference counter in the innermost structure, by the delegation of structure allocation to the final user of the structure, and by the provision of initializing functions which initialize a previously allocated structure.

Conclusion

In exploring the use of method dispatch (last week) and data inheritance (this week) in the Linux kernel we find that while some patterns seem to dominate they are by no means universal. While almost all data inheritance could be implemented using structure embedding, unions provide real value in a few specific cases. Similarly while simple vtables are common, mixin vtables are very important and the ability to delegate methods to a related object can be valuable.

We also find that there are patterns in use with little to recommend them. Using void pointers for inheritance may have an initial simplicity, but causes longer term wastage, can cause confusion, and could nearly always be replaced by embedded inheritance. Using NULL pointers to indicate default behavior is similarly a poor choice - when the default is important there are better ways to provide for it.

But maybe the most valuable lesson is that the Linux kernel is not only a useful program to run, it is also a useful document to study. Such study can find elegant practical solutions to real problems, and some less elegant solutions. The willing student can pursue the former to help improve their mind, and pursue the latter to help improve the kernel itself. With that in mind, the following exercises might be of interest to some.

Exercises

  1. As inodes now use structure embedding for inheritance, void pointers should not be necessary. Examine the consequences and wisdom of removing "i_private" from "struct inode".

  2. Rearrange the three unions in struct page to just one union so that the enumeration of different subtypes is more explicit.

  3. As was noted in the text, struct seq_file can be extended both through "void pointer" and a limited form of "union" data inheritance. Explain how seq_open_private() allows this structure to also be extended through "embedded structure" data inheritance and give an example by converting one usage in the kernel from "void pointer" to "embedded structure". Consider submitting a patch if this appears to be an improvement. Contrast this implementation of embedded structure inheritance with the mechanism used for inodes.

  4. Though subtyping is widely used in the kernel, it is not uncommon for an object to contain fields that not all users are interested in. This can indicate that more fine-grained subtyping is possible. As very many completely different things can be represented by a "file descriptor", it is likely that struct file could be a candidate for further subtyping.

    Identify the smallest set of fields that could serve as a generic struct file and explore the implications of embedding that in different structures to implement regular files, socket files, event files, and other file types. Exploring more general use of the proposed open() method for inodes might help here.

  5. Identify an "object-oriented" language which has an object model that would meet all the needs of the Linux kernel as identified in these two articles.

Comments (29 posted)

Patches and updates

Kernel trees

Linus Torvalds Linux 3.0-rc2 ?
Con Kolivas 2.6.39-ck2 ?

Architecture-specific

Core kernel code

Development tools

Device drivers

Documentation

Filesystems and block I/O

Memory management

Networking

Lauro Ramos Venancio NFC subsystem ?

Security-related

Virtualization and containers

Miscellaneous

Page editor: Jonathan Corbet

Distributions

Send in the clone: Scientific Linux 6.1 approaches

June 8, 2011

This article was contributed by Joe 'Zonker' Brockmeier.

When Red Hat discontinued the free Red Hat Linux and introduced Red Hat Enterprise Linux (RHEL), the demand for a free alternative spawned a slew of clones based on RHEL sources. Many of the projects — White Box Enterprise Linux, Tao Linux, and Lineox — have since gone offline or simply gone silent. Only CentOS and Scientific Linux have survived the long haul, and only Scientific Linux has managed to put out a release based on RHEL 6.0. It is also keeping pace with 6.1: the Scientific Linux 6.1 alpha was released less than two weeks after RHEL 6.1.

[SL 6.1 desktop]

Scientific Linux is a distribution pulled together from the source of RHEL. It started life as High Energy Physics Linux (HEPL), developed by Connie Sieh. After Sieh solicited input from other labs and universities, two things were clear — there was definitely interest in a lab-focused distribution from RHEL sources, and the name wasn't quite right for labs and universities not working with high energy physics.

The name was changed to Scientific Linux and the first release (3.0.1) came out in May of 2004. Since then, the project has followed RHEL releases fairly closely — though there was a significant delay between the release of RHEL 6.0 (November 2010) and Scientific Linux 6.0 (March 2011). With the 6.1 release, Scientific Linux is closing the gap — RHEL 6.1 was released in mid-May, with the first alpha for Scientific Linux out on June 1st.

Scientific Linux 6.1 carries the same updates as the upstream release, as well as a couple of minor tweaks. Specifically, 6.1 has a new graphical theme called "Edge of Space," and has moved some of SL's repositories (testing and fastbugs) to an optional package rather than enabling them by default.

Differences Between Scientific Linux and RHEL

Unlike other RHEL clones, Scientific Linux is not an attempt to produce an exact duplicate of RHEL, minus branding. While Scientific Linux does try to be as close as possible, changes are made. However, the delta between Scientific and RHEL is decreasing greatly over time.

The Scientific Linux customizations page leads to information on the packages added to each Scientific Linux release relative to its upstream counterpart. According to core contributor Troy Dawson, the 6x series has the fewest changes from upstream. In the default install, only yum-autoupdate (which does what the name suggests) is added to upstream's package selection. Users can also choose to install IceWM, the OpenAFS distributed filesystem, a handful of yum repositories, and Scientific Linux's tools for creating "sites."

Sites, also known as spins, are custom configurations of Scientific Linux. Dawson says that the ability to create spins has required installer tweaks in SL3 through SL5, but it has required fewer changes from release to release — and no changes in SL6 (modulo branding, of course). A few spins for other labs are listed on the Scientific Linux Website, but none are based on SL6 yet.

According to Dawson, the reduction in changes and additional packages comes at the request of the community. "This was due to requests from the HEP community that we quit adding our own packages and start utilizing the other community based repositories, such as EPEL and RPMForge." In some cases, changes between upstream and Scientific are available as "tweak RPMs," which Dawson says "change something after the regular RPM is installed." Dawson says most of the tweak RPMs are not installed by default.

[Revisor]

If users want to further customize Scientific Linux, they can use Revisor to create a site, assuming they have the full distribution mirrored. The project already has the documentation for creating sites for SL6 including RPMs that require special attention during builds. Sieh says that Scientific Linux is using Koji to do its building, and then Revisor to create the ISO images and network tree — and Scientific Linux does provide the config files necessary to replicate the build. In most cases, though, Sieh says that if a user simply wants minor changes the easiest way is to build the packages manually and use Revisor to create a custom distribution rather than trying to replicate an entire Koji build environment.

Scientific Linux seems to be gaining in popularity recently, no doubt in part due to being first to deliver a release based on RHEL 6.0. According to its statistics page (which only measures downloads from the main sites — not mirrors) downloads of 6.0 have had a bigger spike than previous SL releases, which seems to indicate that non-lab use of Scientific Linux is growing. For prior releases of Scientific, Dawson says that "I'm pretty sure about half our users were labs, universities, and other scientific and educational places" where Scientific Linux was primarily used on compute nodes. With the SL6 release Dawson says "I really don't know" who the average user is, but he does think that the profile has changed.

One thing that is attractive about Scientific Linux, aside from the obvious, is that the Scientific Linux team has been proactive in sending out status updates, and notifying its community of unavoidable delays.

Dawson says that the project does try to have security updates "pushed out and announced within 24 hours of the upstream vendor announcing" the updates. For non-security errata, updates are pushed out once per week into the "fastbugs" repository — with a few exceptions. According to Dawson, one exception is if the Scientific team has problems building an RPM for one reason or another. He also says that after a major release from Red Hat, updates are slowed while the team works on that. Finally, Dawson says that they hold onto kernel changes for a few days "just to make sure we don't see any yelling" on the Red Hat mailing list about the new kernels. "It generally takes a couple weeks to get their latest security errata built and tested to our satisfaction. An example would be when RHEL 6.1 came out. We didn't release any security updates for SL 6.0 until this week. So it took a week and a half to get the security errata out."

The reason for the delay is that Scientific Linux is backporting security fixes to 6.0. Dawson adds that he still has three packages that are waiting on updates: sssd, qemu-kvm, and libguestfs. The sssd package pulls in several dependencies from 6.1, so it's getting extra testing. The qemu-kvm package hasn't passed the QA tests, but Dawson says he's hoping to have it ready next week for SL6.1 as well as SL6.0 security errata. Finally, libguestfs is delayed because it depends on qemu-kvm.

In other words, the updates generally appear within a day for security releases and within a week for other errata. Users should be aware that this is not an iron-clad guarantee that security updates or errata will be available in as timely a fashion as they might like, but Scientific Linux has a fairly good track record. What's the secret to providing consistent updates over the long haul? It doesn't hurt that Scientific Linux has folks like Dawson and Sieh who are paid to work on Scientific Linux (at least in part) by Fermilab — and the entire development team is not allowed to go on vacation at the same time, so at least one developer is always available.

Dawson says that Fermilab has reorganized recently, and added two more developers to the team working on Scientific Linux (Jason Harrington and Tyler Parsons), in addition to Dawson and Sieh. CERN also contributes "a person here and there, usually Jaroslaw Polok" and Stephan Wiesand from Deutsches Elektronen-Synchrotron (DESY) worked on the OpenAFS packages and a few others. But nobody is paid to work full-time on Scientific Linux:

The Fermilab group is paid to work on Scientific Linux, although it's only one part of our job description. If any of us are not working on Scientific Linux, we would still have more work than we can do in a day. Both Connie any myself put in as many after work hours as is needed during critical times. I believe for all of the developers outside Fermilab it is the same. The labs they work at allows them to work on their respective projects during work, but they also do a lot of SL work after normal work hours.

But the project is open to outside contribution as well. Dawson says that the best way to be involved is "find something you think you can do and do it. Many people point out 'you should do this' but very few actually do it." He noted that Urs Beyerle "just started making live CDs", and Shawn Thompson "thought he could do better on the artwork" so he redid the artwork for SL5 and SL6. The forum was the brainchild of John Outlan, who set it up and started moderating it for Scientific Linux. Dawson says "there are limits, but many people have found ways to contribute."

With the 6.1 release, Scientific Linux is looking very healthy and ready for users who want a RHEL clone with reliable and timely updates. It's not a perfect clone of RHEL, but for many purposes it is close enough to get the job done and offer a suitable substitute.

[ Editor's note: We have added Scientific Linux to the list of security updates that we follow, so you should see SL advisories in our daily updates soon. ]

Comments (3 posted)

Brief items

Distribution quote of the week

And so, dear readers: I'm going to invoke the 8th F of Fedora: FIXIT. (Other F's include, of course, Freedom, Friends, Features and First, and the lesser known gems such as Fun, Fail, Fail Faster, Finance Friday, etc.) Rather than lament on how things could be better, I think we should fix the feature process, or at least take a good assessment to see if it's still fitting our needs, and if not, do something.
-- Robyn Bergeron

Comments (none posted)

MeeGo 1.1 update 5

The fifth update for MeeGo 1.1 is available. This includes the Core Software Platform, Netbook UX, and In-Vehicle Infotainment (IVI) UX.

Full Story (comments: none)

Oneiric Ocelot Alpha 1 Released

The first alpha for Ubuntu 11.10 (Oneiric Ocelot) is available for testing. This alpha includes images for Ubuntu Desktop, Server and Cloud, as well as Kubuntu, Xubuntu, and Edubuntu editions.

Full Story (comments: none)

Distribution News

openSUSE

Time to vote on the openSUSE Strategy!

Members of the openSUSE project are encouraged to vote on the openSUSE strategy proposal. "We're not asking everyone if they think it is a perfect fit for themselves — we're a diverse community and we'll never fully agree on anything. But this proposal has seen lots of thought, discussion, revision, input — it is arguably the best we could come up with to describe who we are and where we want to go. So the question is — do we agree this describes us? Is it good enough for us to support? Can we move on with this?"

Comments (5 posted)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Hameleers: What's cooking?

Slackware volunteer Eric Hameleers looks at some of his packaging projects. He's hit some bumps with the KDE 4.7.x series which could have major impact.

The new series 4.7.x proves to be a bigger challenge for Slackware. We saw that the 4.6. series moved away from HAL and instead requires udisks/upower (which was the reason for sticking with 4.5.5 in Slackware 13.37). The KDE developers have now finalized their move from CVS to GIT as the source control and version management system. The result is less than optimally arranged for packagers. The old "monolithic" source tarballs are now being split into many additional tarballs for individual applications. This means we have rewrite our scripts and possibly add a lot of packages. While this may be advantageous for some other distros with dedicated packaging teams, for us Slackware people it is a time for decisions.

After talking to Pat Volkerding, I announced on the KDE packager mailing list that we are considering the same solution as was chosen for GNOME in the past: remove KDE from Slackware if it proves to become a maintenance burden. I can not yet say anything final about this. For the time being, I have decided not to create Slackware packages for the KDE Software Compilation 4.7.x.

Comments (2 posted)

Page editor: Rebecca Sobol

Development

Building RPMs using Mock's SCM integration

June 8, 2011

This article was contributed by Marko Myllynen

Building a single RPM once is a relatively simple task, which is accomplished by tweaking the spec file and running it through rpmbuild on the local system. However, when more than one or two people are involved and several RPMs for several distributions are being maintained over a long period of time, reproducibility and change tracking become essential. A new feature in Mock, which builds RPMs from source, will make it easier for developers and projects to handle this task.

Koji offers a centralized build system for distributions like Fedora with hundreds of developers and a dedicated infrastructure team. Unfortunately, setting up Koji is not the most straightforward task. Few will enjoy investigating miscommunications between Koji components when an RPM build fails, for example, or figuring out how to adjust repository settings in a database instead of in a simple yum-format configuration file. In some cases Koji's centralized nature is also a limitation, because using it might require setting up a VPN connection to an organization's network.

Mock is the tool used by Koji to create chroots and build RPMs in them. Until recently, Mock, which runs on the local system, required source RPMs when used directly. That is not really optimal, since building source RPMs requires extra steps. It is also worth remembering that the RPM version shipped in RHEL 4/5 is unable to handle source RPMs created with the default options on RHEL 6 or recent Fedora releases, due to RPM package format changes.
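
When an older-compatible source RPM is needed anyway, rpm's file-digest and payload macros can usually be overridden when creating it. This is a sketch, not something taken from the article; the macro names below are the ones used by recent rpm releases, so double-check them against the rpm version in use:

    # create a source RPM that the older rpm in RHEL 4/5 can read:
    # use MD5 file digests and gzip payload compression instead of the
    # SHA-256/XZ defaults of RHEL 6 and recent Fedora releases
    rpmbuild -bs \
        --define "_source_filedigest_algorithm md5" \
        --define "_binary_filedigest_algorithm md5" \
        --define "_source_payload w9.gzdio" \
        --define "_binary_payload w9.gzdio" \
        testpkg.spec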

The recent release of Mock 1.1.10 introduces an interesting feature that seamlessly integrates RPM building with a CVS/Git/SVN workflow. After the initial Mock installation and configuration (i.e., just adding user(s) to the mock group), all that's needed is to define the default source code management (SCM) method and its checkout command in Mock's site-defaults.cfg main configuration file. After that, Mock can build RPMs directly from the SCM repository, provided that a spec file suitable for rpmbuild is found inside the repository. Mock first builds the source RPM in the selected chroot, then builds the binary RPMs from it in the same chroot.
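
As a rough sketch, the relevant part of /etc/mock/site-defaults.cfg might look like the following for a Git-based setup. The option names follow the commented examples shipped with Mock 1.1.10, but the repository URL, placeholder substitution, and spec file name are assumptions here, so check the comments in the installed site-defaults.cfg:

    # SCM integration defaults; SCM_PKG and SCM_BRN are placeholders that
    # Mock fills in from the package and branch given on the command line
    config_opts['scm'] = False          # leave off by default, enable per run
    config_opts['scm_opts']['method'] = 'git'
    config_opts['scm_opts']['git_get'] = 'git clone SCM_BRN git://scm.example.com/SCM_PKG.git SCM_PKG'
    config_opts['scm_opts']['spec'] = 'SCM_PKG.spec'
    config_opts['scm_opts']['write_tar'] = True  # generate the tarball from the checkout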

Building an RPM directly from the default SCM in a Mock chroot is as easy as:

    mock -r fedora-14-i386 --scm-enable --scm-option package=testpkg -v

In this example, the target is specified first (-r fedora-14-i386), then SCM integration is enabled (--scm-enable), and the package to be built is set (--scm-option package=testpkg); a modest level of tracing is also enabled (-v). Otherwise, defaults from the Mock configuration file are used, including the default SCM method (one of CVS/Git/SVN) and the location of the SCM repository.

This SCM integration allows two ways of setting up repositories: the repository may contain all of the source and configuration files needed for a package (the tarball needed during the build process is generated on the fly), or it may contain only the spec file, any patches, and local configuration files, with the tarball fetched from, for example, an NFS-mounted directory. Mock running on a 64-bit RHEL 6 system, for instance, can then be used to build RPMs from an SCM for 32- and 64-bit RHEL 4/5/6, Fedora releases, and other RPM-based distributions. In many cases a spec file doesn't need any extra directives for different distributions; if needed, distribution- or version-specific if/else definitions allow the same spec file to be used for all targets (see the sketch below). As long as the package repositories for the targets are publicly available, no access to the organization's network is needed when building.
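
To illustrate both points, here is a hypothetical spec fragment using such a conditional, followed by a loop that builds the same package for several targets. The target names must match mock configuration files installed on the build host, and the %rhel macro is assumed to be defined in the EPEL build roots, as it normally is:

    # testpkg.spec fragment: pick the right uuid development package
    %if 0%{?rhel} && 0%{?rhel} <= 5
    BuildRequires: e2fsprogs-devel
    %else
    BuildRequires: libuuid-devel
    %endif

    # build the package from the SCM for several targets in turn
    for cfg in epel-5-i386 epel-6-x86_64 fedora-14-i386; do
        mock -r $cfg --scm-enable --scm-option package=testpkg
    done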

Using Mock's SCM integration allows an organization's package maintainers to combine the ease of plain rpmbuild with the build reproducibility and change tracking offered by Mock and SCMs, but without the overhead of Koji. Trivial configuration, the ability to trace the build process locally if needed, and being able to use the build system while roaming are also noteworthy features. In the future, integrating all this into higher-level tools like Emacs or Eclipse would open interesting possibilities for developers: generating RPMs for several targeted distributions directly from an SCM, in a reproducible manner, with a single click in the GUI.

Comments (none posted)

Brief items

Qtractor 0.4.9

Qtractor is a MIDI sequencer; the 0.4.9 release is now out. New features include audio latency compensation, "MIDI scale-quantize and snap-to-scale tools," "MIDI controller invert value and connects access," and "Audio peak/waveform generation pipeline."

Full Story (comments: none)

Synfig Studio 0.63.0

Version 0.63.0 of the Synfig animation tool is out. New features include better outline control, improved bline editing, a port to cairo, canvas guides, and more. (Thanks to Paul Wise).

Comments (none posted)

Newsletters and articles

Development newsletters from the last week

Comments (none posted)

Aslett: The trend towards permissive licensing

Matthew Aslett argues that the popularity of copyleft licenses is in decline. "This last chart illustrates something significant about the previous dominance of strong copyleft licenses: that it was achieved and maintained to a significant degree due to the vendor-led open source projects, rather than community-led projects. One of the main findings of our Control and Community report was the ongoing shift away from projects controlled by a single vendor and back toward community and collaboration. While some might expect that to mean increased adoption of strong copyleft licenses - given that they are associated with collaborative development projects such as GNU and the Linux kernel - the charts above indicate a shift towards non copyleft."

Comments (18 posted)

Meeks: LibreOffice progress to 3.4.0

Michael Meeks digs in to the changes that went into LibreOffice 3.4, including better translation support, merging changes from OpenOffice.org (part of which was a "multi-million-line" OO.o cleanup patch), adding more build bots, and more. One major area of work was in doing some cleanup to reduce the size of LibreOffice: "First - ridding ourself of sillies - there is lots of good work in this area, eg. big cleanups of dead, and unreachable code, dropping export support from our (deprecated for a decade) binary filters and more. I'd like to highlight one invisible area: icons. Lots of volunteers worked on this, at least: Joseph Powers, Ace Dent, Joachim Tremouroux and Matus Kukan. The problem is that previously OO.o had simply tons of duplication, of icons everywhere: it had around one hundred and fifty (duplicate) 'missing icon' icons as an example. It also duplicated each icon for a 'high contrast' version in each theme (in place of a simple, separate high contrast icon theme), and it also propagated this effective bool highcontrast all across the code bloating things. All of that nonsense is now gone, and we have a great framework for handling eg. low-contrast disabilities consistently."

Comments (14 posted)

Linux Photo Tools (The H)

Here's a lengthy survey of photo processing tools (some proprietary) on The H. "If you are prepared to deal with multiple user interfaces and software handling concepts, you will be able to produce professional looking results by using clever combinations of appropriate tools. This is a tried and trusted approach for many Linux users, who are familiar with using collections of individual tools that each perform one specific task particularly well. The choice of efficient photo workflow tools for Linux is not as wide as it is for Windows, but there is nevertheless a good selection of powerful programs available for importing, viewing and geotagging your images, as well as for performing a multitude of other tasks."

Comments (12 posted)

Walter: The Go Programming Language, or: Why all C-like languages except one suck

Jörg Walter has written a detailed and positive review of the Go programming language. "And as a final note, I have seen a fair amount of criticism of Go on the internet, which I cannot ignore, so here it goes: Most of these people didn't actually look at it. Go is different, even though it still looks kinda-C. It isn't. It's not C++, nor Objective C, and it doesn't try to be! So stop saying 'Who needs Go when we have C++/Objective C?' already. Check out how Go tries to solve the same problems in a radically different way."

Comments (110 posted)

Page editor: Jonathan Corbet

Announcements

Brief items

OIN gets a set of web scripting patents

The Open Invention Network has announced that it has named Fred DuFresne as a "distinguished inventor" and acquired his patents related to WebMate Foundation, "a server-side scripting software that predates ASP from Microsoft, JSP from Sun/Oracle, and PHP from the open source community". One could imagine that patents reading on ASP and JSP might be useful in future disputes with the companies involved.

Comments (9 posted)

LAC 2011 videos are available

The proceedings (including video recordings, slides and other material) from the recent Linux Audio Conference are available. "We'd like to thank all speakers and everyone who volunteered to make this an enjoyable event; in particular Frank Neumann, John Lato, Victor Lazzarini and special thanks to Jörn Nettingsmeier."

Full Story (comments: 1)

Articles of interest

FSFE Newsletter - June 2011

Click below to see the June edition of the Free Software Foundation Europe newsletter.

Full Story (comments: none)

Samsung Delivers Galaxy S II to CyanogenMod Dev (Phandroid)

Phandroid has a brief article noting that Samsung has given a Galaxy S II phone to a CyanogenMod developer with the explicit purpose of facilitating a port. Donating a handset is an easy thing for a manufacturer to do, but the value of actively encouraging hacking on the device is great.

Comments (15 posted)

Calls for Presentations

Call For Papers - PG-Day Denver 2011

PG-Day Denver will be held on September 17, 2011 in Denver, Colorado. "Each session will last 45 minutes, and may be on any topic related to PostgreSQL." The submission deadline is July 31.

Full Story (comments: none)

Upcoming Events

Linux Beer Hike

This year's annual Linux Beer Hike (LinuxBierWanderung aka LBW) will be held July 30 - August 6, 2011 in the village of Lanersbach in the Tux valley in Austria. "There is much to discover: glacier caves, waterfalls, marmots, pleasant pastures and even a summer ski resort for those who like to take to the pistes all year round. Tux also has a lot to offer in terms of food and drink: many delicious beers and hearty cuisine will fortify the participants and keep them all happy. Talks about Linux and Free and Open Source Software complete the programme."

Full Story (comments: none)

Events: June 16, 2011 to August 15, 2011

The following event listing is taken from the LWN.net Calendar.

Date(s)                  Event                                                             Location
June 15 - June 17        2011 USENIX Annual Technical Conference                           Portland, OR, USA
June 20 - June 26        EuroPython 2011                                                   Florence, Italy
June 21 - June 24        Open Source Bridge                                                Portland, OR, USA
June 27 - June 29        YAPC::NA                                                          Asheville, NC, USA
June 29 - July 2         12º Fórum Internacional Software Livre                            Porto Alegre, Brazil
June 29                  Scilab conference 2011                                            Palaiseau, France
July 9 - July 14         Libre Software Meeting / Rencontres mondiales du logiciel libre   Strasbourg, France
July 11 - July 16        SciPy 2011                                                        Austin, TX, USA
July 11 - July 12        PostgreSQL Clustering, High Availability and Replication          Cambridge, UK
July 11 - July 15        Ubuntu Developer Week                                             online event
July 15 - July 17        State of the Map Europe 2011                                      Wien, Austria
July 17 - July 23        DebCamp                                                           Banja Luka, Bosnia
July 19                  Getting Started with C++ Unit Testing in Linux
July 24 - July 30        DebConf11                                                         Banja Luka, Bosnia
July 25 - July 29        OSCON 2011                                                        Portland, OR, USA
July 30 - July 31        PyOhio 2011                                                       Columbus, OH, USA
July 30 - August 6       Linux Beer Hike (LinuxBierWanderung)                              Lanersbach, Tux, Austria
August 4 - August 7      Wikimania 2011                                                    Haifa, Israel
August 6 - August 12     Desktop Summit                                                    Berlin, Germany
August 10 - August 12    USENIX Security ’11: 20th USENIX Security Symposium               San Francisco, CA, USA
August 10 - August 14    Chaos Communication Camp 2011                                     Finowfurt, Germany
August 13 - August 14    OggCamp 11                                                        Farnham, UK

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds