Recently, I had the opportunity to attend GUADEC in The Hague, Netherlands and was quite impressed by the conference and the GNOME project itself. There were many more sessions than one could possibly attend, and I sat in on too many to do a full write-up on each. Rather than let the notes languish on the laptop hard drive, though, a brief look at some of the other sessions I attended seems warranted.
The conference venue, De Haagse Hogeschool—College (or University) of The Hague—was an excellent facility for GUADEC, with plenty of room for the sessions as well as a nice atrium in the center for the always important "hallway track". The city was quite nice as well, with easy walking to most things, and ever-present trams for places that were further away. While there was a fair amount of the expected rain during GUADEC, there were some very nice weather days as well. I took the opportunity to do a little wandering around the city center, where the conference hotel was located. My only regret is that I never made it to the Escher Museum; another trip there is clearly called for.
Luis Villa was not the only one who thought that GNOME should become more web-focused; there were several other presentations that looked at various aspects of how to make that happen. Xan Lopez of Igalia and Gustavo Noronha of Collabora nearly short-circuited their presentation by stating their agreement with Villa and John Palmieri—who has also pushed "GNOME web"—followed by the traditional "Questions?" slide. After the laughter died down, they pushed on to look at the history of desktops and the web, as well as how they saw GNOME fitting in.
Lopez and Noronha noted that the basics of the desktop were invented by Alan Kay in the 1970s and have been incrementally improved since then. "Apple made it [the desktop UI] popular, Microsoft made it really popular, we are trying to make it free." Web applications are rapidly catching up to the desktop in functionality, though, and the perception is that the desktop is "losing momentum".
They looked at the evolution of web support in GNOME, starting with gtkhtml, which was "not feature complete", then Gecko and gtkmozembed, which was problematic because it focused on the browser rather than on those who wanted to embed an HTML renderer. The most recent effort, WebKitGTK, has a number of advantages, they said. WebKit was designed "from the ground up" to be embedded. It is also easier to have a voice in WebKit development because multiple vendors use it, unlike Gecko, which is focused on Mozilla's needs.
In addition, WebKit uses existing libraries that GNOME already uses. For example, Gecko uses its own Necko library for networking, while WebKit uses libsoup. WebKitGTK is "much better suited for us", they said. They also listed the many GNOME applications that use the toolkit for rendering: Banshee and Rhythmbox embed browser windows in their interfaces, several different browsers are built on it, the Liferea RSS reader uses it for display, and even Gwibber and Empathy use it for "sexier" message display as well as more flexible layout and theming.
The "web does not exist in a vacuum" and GNOME has lots of components to bring to the table, they said. Things like Cairo for vector graphics or GStreamer for multimedia are used by WebKit, so the two projects have much in common. In the mobile space, they noted that Litl and Palm's WebOS both treat all of their applications as web applications, but use GNOME underneath. Lopez and Noronha clearly see WebKitGTK as the way forward for GNOME.
MeeGo hacker Rob Bradford of Intel gave a presentation on a concrete example of integrating web services into GNOME using LibSocialWeb and librest. The basic idea is to keep interested applications informed of updates to the user's social web applications, like Facebook, Flickr, Twitter, and others. Applications can use a D-Bus interface to LibSocialWeb to register their interest in various kinds of events and then to receive them asynchronously.
Backends are responsible for talking with each of the web services, and each has its own D-Bus service. Currently there are backends available for the most popular services and, depending on the API provided by the service, they can also update the service (i.e. sending Facebook status updates or a photo to Flickr) in addition to being a passive listener. The backends periodically connect to the service, parse the returned XML, and notice things that have been added or changed. There is a common core, which is shared by most of the backends to do the parsing and noticing.
For handling the communication tasks, librest is used. It targets RESTful web applications, and includes a simple XML parser—as a wrapper around the more powerful libxml2—to parse data returned from web applications. Traditional XML parsing is "overkill for the simple data returned from most web services", he said.
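As a rough conceptual sketch of a backend's poll cycle, the following parses a service's XML response and reports only the items that were not seen on a previous poll. The feed format, tag names, and the `poll()` function are all invented for illustration; the real backends use librest's simple wrapper around libxml2 rather than Python's standard library parser.

```python
# Hypothetical sketch: parse a web service's XML response and
# "notice" only the entries added since the last poll.
import xml.etree.ElementTree as ET

FEED = """
<statuses>
  <status id="101">First post</status>
  <status id="102">Second post</status>
</statuses>
"""

seen = {"101"}                 # ids noticed on a previous poll


def poll(xml_text, seen):
    """Return newly added statuses, updating the seen set."""
    new = []
    for status in ET.fromstring(xml_text).findall("status"):
        sid = status.get("id")
        if sid not in seen:
            seen.add(sid)
            new.append((sid, status.text))
    return new


new_items = poll(FEED, seen)
print(new_items)               # only the status added since last poll
```

A real backend would then emit the new items over its D-Bus service so that interested applications receive them asynchronously.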
The goal is to get LibSocialWeb added as an external dependency for GNOME 3, so that GNOME applications can take advantage of it. There is still lots to do before that can happen, Bradford said, including reworking the client-side API as there is "just enough" now to be able to demonstrate the functionality.
In addition to interacting with the "standard" social web services, he also discussed other uses for LibSocialWeb. Using libchamplain to display maps that include location information retrieved from the web (or by doing an IP address to location lookup using GeoClue) is one. He also described a small application that he wrote in 20 minutes to search compfight.com for Creative-Commons-licensed images that could be used as a screen background, which could be integrated into the GNOME control center.
All told, LibSocialWeb and librest seem like a way forward for GNOME applications that want to be more "webby". They will presumably get a good workout in MeeGo, which should shake loose many of the bugs and limitations.
With Seif Lotfy acting as a "master of ceremonies" of sorts, several members of the Zeitgeist team gave short presentations about aspects of the desktop event collector and advances made since it was introduced at last year's desktop summit. The Zeitgeist engine is a means to capture events, like file access, application use, or communication actions, as the user does them, and then allow applications to query the events. The idea is that various user interfaces, like the GNOME activity journal, Nautilus, or Docky, can then present that information in different ways to help users keep track of what they were doing, when, where, and so on.
Thorsten Prante described the deployment of Zeitgeist in different applications and provided use cases of how the data gathered might be used. The activity journal gives a timeline-based look at what the user was doing on different days or at different points within a day. It can then answer questions like "when did I do X?", and "what else did I do then?". But it goes further than that as correlations can be made with location or other applications or documents used at the same time.
This gives users a "time-referential access" to their activities, which will help them "go back to where [they] left off", he said. Correlating that information with chat and email history "can show the big picture of what you've been doing". Adding application-specific information like browser history can help give a better context for which related tasks the user was performing—leading to better workflow.
Former Google Summer of Code participant—now core maintainer—Siegfried Gevatter talked about the Zeitgeist framework. Over the last year, a new, better-performing database structure has been adopted, along with a more powerful D-Bus API. Applications can push events into Zeitgeist from Python, C, or Vala, or directly over the D-Bus interface.
The framework is "intended to be enhanced with plugins", he said. Those plugins are written in Python, reside in the Zeitgeist process, and "manipulate events in and out". They can provide their own D-Bus API and handle tasks like blacklisting, geolocation, adding file content and full text search events, etc. At the end of his mini-presentation, Gevatter demonstrated an application that placed various activities on a map (from OpenStreetMap naturally) so that a user could see where, geographically, they were when they performed those tasks—all in "200 lines of Python".
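The plugin idea can be sketched as a simple event filter. Everything below (the event shape, the class, the URI patterns) is invented for illustration; real Zeitgeist plugins live inside the engine process, are written in Python, and can hook into the engine's D-Bus API rather than being called directly like this.

```python
# Toy model of a Zeitgeist-style blacklist plugin: events flow
# through the engine, and the plugin drops those whose URI matches
# a user-configured pattern before they are logged.
from fnmatch import fnmatch


class BlacklistPlugin:
    def __init__(self, patterns):
        self.patterns = patterns

    def filter(self, events):
        """Return only the events that match no blacklist pattern."""
        return [e for e in events
                if not any(fnmatch(e["uri"], p) for p in self.patterns)]


events = [
    {"uri": "file:///home/user/report.odt", "action": "opened"},
    {"uri": "file:///home/user/private/diary.txt", "action": "opened"},
]

plugin = BlacklistPlugin(["file:///home/user/private/*"])
logged = plugin.filter(events)
print([e["uri"] for e in logged])   # the private file is never logged
```

The same manipulate-events-in-and-out shape would apply to the other plugin tasks mentioned, such as geolocation tagging or full-text indexing.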
After that, Youness Alaoui presented the newest part of Zeitgeist: the Teamgeist framework. Teamgeist was motivated by a "lack of real-time collaboration tools", he said. Sharing desktop events between team members is the idea behind the framework, so that others can see what you were doing and pick up where you left off. Teamgeist started with a prototype last year and, since then, Collabora sponsored work on a "full-fledged" implementation.
The criteria for sharing events are under the control of each user, but Alaoui presented multiple use cases, including sharing of online research progress, files edited, documents created, and so on. Teamgeist currently uses Telepathy "tubes" to exchange events, but other transports, such as email, could be added. The vision for Teamgeist is that teams could be fully aware of what the other members are doing, sharing files and documents through version control repositories or via some kind of cloud storage.
The most eye (and headline) catching result from Dave Neary's GNOME census report was the less-than-stellar ranking of Canonical among corporate contributors, but that was certainly not the thrust of his presentation. He set out to examine the makeup of GNOME committers because he "thought it would be an interesting thing to know". But he also pointed out that partners and potential partners, the press, big institutional clients, vertical application developers, and headhunters all have expressed interest in that kind of data at times.
Neary measured changesets, rather than lines of code (LoC), because LoC is "not a good measure", though he admitted that changeset counting is not a perfect measure either. He looked at the commits from the GNOME 2.30 release in March 2010 and before, omitting some external dependencies, deprecated modules, and GNOME mobile.
He used various tools to gather his statistics, gitdm and CVSAnaly primarily along with a number of shell scripts. He put the data into MySQL for easy access and querying and used Gnumeric for his charts. One of the biggest difficulties was to try to disambiguate multiple email addresses that corresponded to the same developer and to properly attribute developer's contributions to their employer—or to "none" as appropriate.
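The disambiguation step can be sketched as follows; the names, addresses, and alias table are invented, but the idea is to fold each committer's various email addresses into one canonical identity before counting changesets, so that one developer does not show up as three entries in the ranking.

```python
# Hypothetical sketch of identity merging for commit statistics.
from collections import Counter

# Hand-maintained alias table: variant address -> canonical address.
ALIASES = {
    "jane@laptop.example": "jane@example.org",
    "jane.doe@corp.example": "jane@example.org",
}


def canonical(email):
    return ALIASES.get(email, email)


# (author email, changeset id) pairs, as they might come out of
# a tool like gitdm before cleanup.
commits = [
    ("jane@example.org", "a1"),
    ("jane@laptop.example", "a2"),
    ("jane.doe@corp.example", "a3"),
    ("bob@example.net", "b1"),
]

counts = Counter(canonical(email) for email, _ in commits)
print(counts.most_common())   # jane's three addresses collapse into one entry
```

Mapping canonical identities to employers would be a second, similar table, with "none" as the fallback.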
The slides [slideshare] from the talk show some interesting trends. The "Rhythm" graph shows the number of commits over time, which clearly shows the post-GUADEC flurry of work as well as the steep dropoff in commits at each release point. There is, unsurprisingly, a long tail in the number of commits based on each committer's rank: there are some 3500 committers, with the top 200 accounting for the overwhelming majority of commits—number 1000 in the ranking has only 2 or 3 commits, he said.
There is an interesting mix between two kinds of prolific developers, he said, as they either have "thousands of commits in a few modules or hundreds of commits in many modules". That reflects a split between specialists and generalists among GNOME hackers.
He also looked at the company contributions to GNOME, noting that Red Hat had 16% of the overall commits and "11 of the top 20 contributors" were either former or current Red Hat employees. Red Hat tends to spread its contributions throughout GNOME, while Novell (10%) seems to focus on particular applications. Collabora, third on the list of companies at 5%, is tightly focused on GStreamer and Telepathy.
While he did point out that Canonical came in quite low (1%), that was partly because it was doing a lot of work that it had not yet submitted upstream. "It would be a better strategy to work directly with GNOME", he said. He also noted that there may be a worry because of Nokia's shift to Qt as it had pushed a number of startups to make significant GNOME contributions. If much of that work was funded by Nokia, "what happens going forward?"
His other concern was for the territory that various companies have staked out. Should GNOME be worried for GStreamer and Telepathy if Collabora was to go out of business, he asked. He was clear that he wasn't worried about Collabora's future but about the more general question of GNOME modules that are dominated by a single company—one that could change strategies with little or no warning.
Thorsten Sick, an anti-virus developer at AVIRA, gave a nice overview of the desktop malware landscape, with an eye toward helping GNOME avoid some of the mistakes other desktops have made. He would like to prevent "the Windows malware situation" from developing on the Linux desktop. In his spare time, Sick also contributes to the GNOME Seahorse project, which is an encryption key (e.g. SSH or PGP) manager for the desktop.
Malware has moved from "cool hacker experiments", through "script kiddie stuff", to where it is now, which is a thriving malware economy. Today's attacks are largely focused on extracting money from their victims somehow. But that shift makes for one way to combat these attackers: reducing their return on investment (ROI) will make them turn to easier targets.
The malware scene has gotten more sophisticated over time as well; today's attacks will try to hide from scanners and will adjust to anti-virus detection within hours. Malware is sold with support contracts and guarantees of evading detection. Making it more difficult to attack systems, thus raising the price of the malware, is one way to reduce the attackers' ROI. Others include increasing the chance of getting caught, ratcheting up the legal penalties for malware distribution, or reducing the prices for the valuables that can be gained. He noted that a glut of stolen credit card numbers available at one point drastically reduced prices, which probably, temporarily anyway, reduced attacks that targeted credit card numbers.
To the attackers, "Linux is not interesting at all right now because Ubuntu bug #1 is not solved", he said. But that may change, as Linux users typically "feel safe" and tend not to use any anti-virus programs on their systems. This makes for fertile ground for attackers.
He pointed out that many in the Linux community focus on root exploits, but "malware does not need to be root". Today's attacks are focused on user data, which is completely accessible without root access. On the other hand, Linux distributions have some advantages over other systems, including easy updating of the system for security problems and various security technologies (SELinux, AppArmor, etc.) that are turned on by some distributions.
His main point was education, and he wants Linux and GNOME to "be prepared" for the attackers turning their eyes to that platform. "Everyone can do a small piece of the puzzle to improve Linux desktop security", he said.
I agree with Brad Kuhn's assessment that if you rate conferences by "inspiration value", this year's GUADEC ranks very highly indeed. Like Kuhn, I also found myself wondering where I might be able to contribute to GNOME, which is a bit amusing given that I generally run KDE—though I am not terribly religious about it. It was a very high-energy conference that clearly indicated a strong and engaged project.
The conference also had two nice parties, one at a club in the city center that was sponsored by Canonical and a beach barbecue that Collabora put on. There were lots of interesting folks to talk to—and play Go with—to complement the wide array of interesting presentations. The only downside for me was a self-inflicted Rawhide upgrade that left me only able to suspend my laptop once per boot—next time testing suspend several times before braving a trans-Atlantic trip seems indicated.
The cow-themed wooden shoe slippers (at right), which were given to me as a speaker's gift, were quite the hit with my wife after I swapped them to a smaller size. I almost regret that switch as I must admit that Lennart Poettering looked rather sharp in the orange version of the slippers during one of his presentations.
In the closing session, Berlin was announced as the location for the next GUADEC, which will be a combined conference with KDE's Akademy making for the second desktop summit. I certainly have high hopes of attending.
[I would like to thank the GNOME foundation for its assistance with travel costs for GUADEC. LWN depends on sponsors for our overseas (and sometimes domestic) travel, and we truly appreciate that help.]
Oracle vice president Wim Coekaerts started off the conference with a keynote talk on how much Oracle likes Linux. The Oracle database was first ported to Linux in 1998, just barely winning a race with Informix to be the first commercial database on Linux. The big push started in 2002; now some 20% of Oracle's installed base runs on Linux (as opposed to 27% on Solaris). Surprisingly enough, Wim's talk did not cover Oracle's lawsuit which was just about to land on Google and its Android Linux distribution.
Oracle, it seems, has a list of things it would like to see improved with Linux. Wim pointed out diagnosis tools (tracing and such) as a weak point; he asked the community to recognize that non-hacker users need to be able to support Linux in production situations and could benefit from better tools. Testing was also high on the list; Wim said that nobody is testing mainline kernels - a claim that was disputed during the kernel panel later the same day. Oracle runs some kernel tests of its own, but would like to see more testing done elsewhere. It would also be nice, he said, if more tests could be added to the kernel repository itself, and if distributors could stay closer to the mainline so that testing results would be more relevant to the kernels they ship.
Oracle also wants to see more testing of the full stack; there is a test kit available to help in this regard.
Wim talked up Oracle's contributions, including work with NFS over IPv6, the contribution of the reliable datagram protocol implementation, support for the T10DIF data integrity standard (making Linux the first platform with that feature), improvements to Xen, and, of course, btrfs. It was a convincing discussion of how much Oracle likes Linux, but Oracle's subsequent actions have ensured that any associated warm fuzzy feelings did not last long.
"Harmony" seems to be a popular choice for controversial projects; your editor first encountered the name associated with an ill-fated attempt to replace the (then) non-free Qt toolkit. The latest Project Harmony touches on another difficult issue: contributor agreements for free software projects. This project is headed up by Canonical counsel Amanda Brock, who ran a BOF session about it at LinuxCon.
The core idea behind this Harmony project is that contributor agreements are a pain. They are seen as a waste of time, they are often unclear and complicated, and it's not always clear who should be signing them. Those who do sign these agreements do not always understand what they are agreeing to. Project Harmony is trying to make life easier for everybody involved by creating a set of standardized agreements that everybody understands. These agreements, we were told, are to be drafted by the Software Freedom Law Center, so we can hope that the end result will not be too contrary to the needs of developers.
There will never be a single, one-size-fits-all agreement, of course, so the standardized version will have a number of options which can be chosen. The especially controversial issue of copyright assignment will be one of those options. Others will include the license to be applied to contributions, indemnification, etc. The idea is to try to cover the wishes of most projects in a standard way.
It seems that quite a few of the people involved with this project are opposed to the idea of contributor agreements (or at least certain types of agreements) in general. They are involved because they realize that these agreements are not going away and they want to keep an eye on the process. One reason that the list of participants has not been made public is that a number of these people do not want to be publicly associated with the concept of contributor agreements.
Given that, it's not entirely surprising that Project Harmony seems to be treading cautiously and trying not to step on anybody's toes. The end result will not advocate any particular choices, and will avoid calling into doubt any agreements that specific projects may be using now.
Efforts are being made to make the project more transparent; it seems like it's mostly a matter of placating nervous participants. Stay tuned.
Karen Sandler has been a lawyer at the Software Freedom Law Center for some years now. She is also, unfortunately, afflicted with a heart condition which carries the risk of sudden death; one need not be a lawyer to want to try to mitigate a risk like that. To that end, she now has an implanted device which works to ensure that her heart continues to function in a way which keeps the rest of her body happy and healthy. She is, she says, "Karen the cyborg."
Being a free-software-minded cyborg, Karen started to wonder about the software which was about to be implanted into her body. So she went to the various manufacturers of the type of device she needed, asking about the software and whether she could see the source. These manufacturers were somewhat surprised by the request, but wasted no time in turning it down. Karen would really like to take a look at the software which is attached to her heart, but she eventually had to give in and accept the implantation of a closed-source device.
In the process, though, she wrote a paper on software in medical devices for the SFLC. There is, she says, a real issue here: some 350,000 pacemakers are implanted in the US every year, and nobody knows anything about the software on them. Or, it seems, almost nobody knows: some others have already figured out ways to hack these devices. It seems that a number of them use no encryption or security in their communications with the world and can conceivably be made to do unfortunate things.
In general, when the US Food and Drug Administration is considering medical devices for approval, it does not look at the software at all. The agency just does not have the time to do that level of research. But the wider community could look at that code, if it were to be made available. There should be little harm to the manufacturer in releasing its code - if the code is good; patients do not choose pacemakers based on which has the flashiest proprietary code. Like most medical system reforms, this one looks like an uphill battle, but many of our lives may well depend on its outcome.
Stormy Peters is the executive director of the GNOME Foundation, which is concerned with the creation of a free desktop system. Increasingly, though, she has been looking at issues beyond the desktop, and issues surrounding web-based services in particular. Unless we're careful, she says, our use of such services risks giving away much of the freedom that we have worked so hard to build for ourselves.
A lot of people have made a lot of sacrifices over the years, she says, to create our free desktops. Many of them did that work because they believe in freedom. Others, though, worked in this area because they were tired of the blue screen of death and wanted something a little more reliable. The providers of web services have successfully taken away the pain of the BSOD, and, as a result, a lot of us have gotten lazy. We have, Stormy says, forgotten about freedom. As a result, many users are becoming trapped by systems which compromise their private information, entrap their data, and may lock them out at any time.
That said, people in the community are working on some good initiatives. She mentioned Firefox sync as one example: there are two passwords involved and all data is encrypted so that Mozilla cannot look at (or disclose) it. Also mentioned were identi.ca and the Tomboy online effort.
There are things we should bear in mind when evaluating an online service. One is data: how do you delete it, and what happens to it? Then there's the issue of backups: users should always have a data backup under their control in case they get shut out of the service. We should, Stormy says, create services which make the creation of backups easy. Lock-in is another issue: how easy is it to move to a competing service? And, finally, there is licensing; Stormy is a fan of the Affero GPL, which requires that the source for the service be available.
As free software developers, we should make sure that our software integrates well with online services, and we should be working toward the creation of truly free services. We also need to solve the problem of hosting for these services; she mentioned the Gobby collaborative editor, which, she says, is a great tool with no hosting available. We need better APIs for service integration; Grilo and libgdata were mentioned in this context. And, of course, we need web-aware desktop applications.
All told, it's a tall order, but it's one we have to face up to if we care about our freedom.
The patents all cover various aspects of the implementation of Java-based systems. Some of them seem rather trivial; others are quite broad. One of them, for example, would appear to cover the concept of a just-in-time compiler. Those wanting details can see the complaint itself, which lists the patents in question, and this page on the End Software Patents wiki for a look at each patent and the beginning of an attempt to collect prior art. The short summary, though, is that we're just dealing with another set of obnoxious software patents; these are not the silliest ones your editor has ever seen. The patents used for Apple's attack on Android cover much more fundamental concepts.
The patents may or may not stand up to a determined prior-art defense, but chances are that it will not come to that. Prior art is a hard way to go when defending against patents, which enter the courtroom under the halo of presumed validity. What we may see, instead, is an attempt to push the inadequate Bilski decision to get the whole mess invalidated as a set of unpatentable abstract ideas. That would be a risky course which would take years to play out, but there is the potential, at least, of dealing a severe blow to software patents in general. One can always dream.
Meanwhile, there are many outstanding questions about whether Oracle (or, more precisely, Sun before Oracle) has licensed these patents to the world, either implicitly through the GPLv2 code release, or explicitly via patent grants. Only a court will be able to provide a definitive answer to that sort of question, but it is not obvious that such a license exists. The explicit patent grants are generally tied to exact implementations of the language and library specifications, with neither subsets nor supersets allowed. Android's Dalvik is not such an implementation. There may be an implicit patent grant with Sun's GPL-licensed code, but Android does not use that code. Dalvik is not governed by Sun's license, so it may be hard to claim protection under the patent grant which is (implicitly) found in that license.
But, then, your editor is not a lawyer and his opinions on any subject are known to have a tenuous grip on reality; just ask your editor's children.
The complaint also alleges copyright infringement, but no specifics are available at this time. There is some speculation that Oracle sees an "unauthorized" implementation of the Java specification as an infringement on that specification's copyright. For now, though, we must wait to see what Oracle is really claiming.
This is not an attack on free software in general, despite the fact that Google would like to see the community view it that way. It is an attack on a specific platform (much of which is free software) by a rapacious company which has just bought an expensive asset and wants to squeeze some revenue from it. It seems quite likely that this suit would have happened in the same way if Dalvik were proprietary. Even if Oracle gets everything it wants, the damage to the wider free software community will be limited. We were strong before the advent of Android, and would remain strong if it were to be removed from the scene.
That said, we are certainly stronger with a free Android than without, and we surely do not want to see a thriving free software platform taken down (or taxed) by a patent troll.
What is going on here is that the mobile market is seen as a gold mine, and everybody is trying to grab a piece of it in one way or another. Some companies are most interested in gaining their slice through the creation of mobile platforms that people actually want to buy and use; others are more inclined toward getting theirs through the courts. And some companies are doing both. As a result, anybody trying to work in this market is currently embroiled in lawsuits; see this diagram in the New York Times for a summary of where things stood back in March. It will be most interesting to see if this whole mess can be resolved. In the past, such situations have led to the creation of patent pools - not a free-software-friendly solution.
Despite this suit, and despite the withdrawal of OpenSolaris, Oracle seems to be determined to continue to work with the community on other fronts. The company claims to contribute to a long list of projects, and it employs a number of well-respected developers. One assumes that those projects will not start rejecting contributions from those developers. But neither will those projects deal with Oracle in the future without wondering, if just for a moment, what the company's motives and goals really are. It may not be an attack on free software in general, but this lawsuit has shown that Oracle is willing to use software patents to attack a specific free software project that it disagrees with. This move will kill a lot of the trust between Oracle and the development community; now one cannot help but wonder what might happen if, say, an OpenSolaris or MySQL fork starts to overshadow the original.
Non-free platforms should be avoided. Sun released much of the Java code under the GPL - eventually - but it never made Java truly free. The company went out of its way to retain control over the language and of any implementations of it; control over the specifications, copyright licensing policies forcing control over the code, and software patents held in reserve do not add up to a platform one can trust. Sun seemingly feared forks above all else, and so went out of its way to eliminate the freedom to fork whenever possible. The result was a non-free and hazardous platform; Oracle now seems to be saying that it cannot even be implemented independently without infringing both patents and copyrights. This kind of suit would not have happened had Google decided to make its own version of, say, Python.
There is no absolute security in this world. But there is relative security, and, by now, it should be clear that the relative security of a platform owned and controlled by a single corporation is quite low. Corporations, by their nature, are not trustworthy beasts; even the most well-intentioned company is only one bad quarter (or one takeover) away from becoming an aggressive troll. Sun was unlikely to sue over a not-really-Java virtual machine, but Sun has been replaced by a company with a rather different mindset. That company now has control over a platform that many people have based their businesses on, and, as we can see, it will react strongly when it sees a potential threat to that control.
How all this will turn out is anybody's guess. Perhaps Google will pay the troll to have some peace to continue to pursue the goal of total Android world domination. Perhaps some parts of Android will become more closed. Or perhaps Google will fight to the end while simultaneously executing an emergency backup plan which involves shifting the whole platform to the Ruby language. One thing that can be said is that, as long as software patents remain a threat, we will continue to see cases like this.
A longstanding bug in the Linux kernel—quite possibly since the first 2.6 release in 2003—has been fixed by a recent patch, but the nearly two-month delay between the report and the fix is raising some eyebrows. It is a local privilege escalation flaw that can be triggered by malicious X clients forcing the server to overrun its stack.
The problem was discovered by Rafal Wojtczuk of Invisible Things Lab (ITL) while working on Qubes OS, ITL's virtualization-based, security-focused operating system. ITL's CEO Joanna Rutkowska describes the flaw on the company's blog and Wojtczuk released a paper [PDF] on August 17 with lots more details. In that paper, he notes that he reported the problem to the X.org security team on June 17, and by June 20 the team had determined that it should be fixed in the kernel. But it took until August 13 before that actually happened.
In addition, the description in the patch isn't terribly forthcoming about the security implications of the bug. That is in keeping with Linus Torvalds's policy of disclosing security bugs via code, but not in the commit message, because he feels that may help "script kiddies" easily exploit the flaw. There have been endless arguments about that policy on linux-kernel, here at LWN, and elsewhere, but Torvalds is quite adamant about his stance. While some are calling it a "silent" security fix—and to some extent it is—it really should not come as much of a surprise.
The bug is not in the X server, though the fact that it runs as root on most distributions makes the privilege escalation possible. Because Linux does not separate process stack and heap pages, overrunning a stack page into an adjacent heap page is possible. That means that a sufficiently deep stack (from a recursive call for example) could end up using memory in the heap. A program that can write to that heap page (e.g. an X client) could then manipulate the return address of one of the calls to jump to a place of its choosing. That means that the client can cause the server to run code of its choosing—arbitrary code execution—which can be leveraged to gain root privileges.
Evidently, this kind of exploit has been known for five years or more as Wojtczuk's paper points to a presentation [PDF] by Gaël Delalleau at CanSecWest in 2005 describing the problem, and pointing out that Linux was vulnerable to it. Unfortunately it would seem that the information didn't reach the kernel security team until it was rediscovered recently.
The X server has some other attributes that make it an ideal candidate for exploiting the kernel vulnerability. Most servers run with the MIT shared memory extension (MIT-SHM), which allows clients to share memory with the server to exchange image data. An attacker can nearly exhaust the X server's address space by creating many shared memory segments for the server to attach. On 64-bit systems, the attacker must first shrink the address space further by causing the server to allocate roughly 36,000 32Kx32K pixmaps before creating the shared memory segments. One of the shared memory segments will then get attached by the server in the "proper" position with respect to the server's stack.
Once that is done, the client then causes the X server to make a recursive function call. By looking through the shared memory segments for non-zero data, the client can figure out which of the segments is located adjacent to the stack. At that point, it spawns another process that continuously overwrites that segment with the attack payload and triggers the recursion again. When the recursion unwinds, it will hit the exploit code and jump off to do the attacker's bidding—as root.
It is possible that other root processes or setuid programs are vulnerable to the kernel flaw, and X servers with MIT-SHM disabled may be as well. All of those cases are, as yet, hypothetical, and are likely to be much harder to exploit.
X.org hacker Keith Packard described how the fix progressed within the X team. He said that they tried several fixes in the X server, including using resource limits to reduce the address space allowed to the server and limiting recursion depth while ensuring adequate stack depth. None of those were deemed complete fixes for the problem, though.
Andrea Arcangeli and Nick Piggin worked on a fix on the kernel side, but it was not accepted by Torvalds because it "violated some internal VM rules", Packard said. As the deadline for disclosure neared—after being extended from its original August 1 date—Torvalds implemented his own solution which fixed the problem. Overall, Packard was pleased with the response.
It should also be noted that Torvalds's original fix had a bug, which he has since fixed. The new patch, along with a fix for a user-space-visible change to the /proc/<pid>/maps file, is out for stable kernel review at the time of this writing. So, a full correct fix for the problem is not yet available except for those running development kernels or patching the fix in on their own.
All of the "fancy security mechanisms" in Linux were not able to stop this particular exploit, Rutkowska said. She also pointed out that the "sandbox -X" SELinux compartmentalization would not stop this exploit. While it isn't a direct remote exploit, it only takes one vulnerable X client (web browser, PDF viewer, etc.) to turn it into something that is remotely exploitable. Given the number of vulnerable kernels out there, it could certainly be a bigger problem in the future.
The most unfortunate aspect of the bug is the length of time it took to fix. Not just the two months between its discovery and fix, but also the five years since Delalleau's presentation. We need to get better at paying attention to publicly accessible security reports and fixing the problems they describe. One has to wonder how many attackers took note of the CanSecWest presentation and have been using that knowledge for ill. There have been no reports of widespread exploitation—that would likely have been noticed—but smaller, targeted attacks may well have taken advantage of the flaw.
Created: August 13, 2010. Updated: September 28, 2010.
From the Pardus advisory:
The MS-ZIP decompressor in cabextract before 1.3 allows remote attackers to cause a denial of service (infinite loop) via a malformed MSZIP archive in a .cab file during a (1) test or (2) extract action, related to the libmspack library.
Created: August 16, 2010. Updated: August 18, 2010.
Description: From the Fedora advisory:
Multiple vulnerabilities and weaknesses were discovered in Drupal.
Package(s): flash-plugin. CVE #(s): CVE-2010-0209 CVE-2010-2213 CVE-2010-2214 CVE-2010-2215 CVE-2010-2216
Created: August 12, 2010. Updated: January 21, 2011.
From the Red Hat advisory:
Multiple security flaws were found in the way flash-plugin displayed certain SWF content. An attacker could use these flaws to create a specially-crafted SWF file that would cause flash-plugin to crash or, potentially, execute arbitrary code when the victim loaded a page containing the specially-crafted SWF content. (CVE-2010-0209, CVE-2010-2213, CVE-2010-2214, CVE-2010-2216)
A clickjacking flaw was discovered in flash-plugin. A specially-crafted SWF file could trick a user into unintentionally or mistakenly clicking a link or a dialog. (CVE-2010-2215)
Package(s): freetype. CVE #(s): CVE-2010-2805 CVE-2010-2806 CVE-2010-2807 CVE-2010-2808
Created: August 13, 2010. Updated: January 20, 2011.
From the Pardus advisory:
CVE-2010-2805, CVE-2010-2806, CVE-2010-2807, CVE-2010-2808: Memory corruption flaws were found in the way FreeType font rendering engine processed certain Adobe Type 1 Mac Font File (LWFN) fonts. An attacker could use this flaw to create a specially-crafted font file that, when opened, would cause an application linked against libfreetype to crash, or, possibly execute arbitrary code.
Created: August 16, 2010. Updated: September 6, 2011.
Description: From the CVE entry:
The (1) mod_cache and (2) mod_dav modules in the Apache HTTP Server 2.2.x before 2.2.16 allow remote attackers to cause a denial of service (process crash) via a request that lacks a path.
Created: August 16, 2010. Updated: August 18, 2010.
Description: Multiple vulnerabilities have been fixed in icedtea6-1.8.1. The Fedora advisory does not clearly indicate which of the fixes are security related, however, nor are there any CVE numbers listed. The only clear security mention is:
Fix security flaw in NetX that allows arbitrary unsigned apps to set any java property.
Package(s): kernel kernel-pae. CVE #(s): CVE-2010-2226 CVE-2010-2537 CVE-2010-2538 CVE-2010-2798
Created: August 13, 2010. Updated: March 3, 2011.
From the Pardus advisory:
CVE-2010-2226: A flaw was found in the handling of the SWAPEXT IOCTL in the Linux kernel XFS file system implementation. A local user could use this flaw to read write-only files, that they do not own, on an XFS file system. This could lead to unintended information disclosure.
CVE-2010-2537: The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it.
CVE-2010-2538: The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around).
CVE-2010-2798: The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. A local, unprivileged user on a gfs2 mounted directory can trigger this issue, resulting in a NULL pointer dereference.
Created: August 17, 2010. Updated: March 21, 2011.
Description: From the Red Hat advisory:
when an application has a stack overflow, the stack could silently overwrite another memory mapped area instead of a segmentation fault occurring, which could lead to local privilege escalation on 64-bit systems. This issue is fixed with an implementation of a stack guard feature.
Created: August 16, 2010. Updated: January 20, 2011.
Description: From the CVE entry:
loaders/load_it.c in libmikmod, possibly 3.1.12, does not properly account for the larger size of name##env relative to name##tick and name##node, which allows remote attackers to trigger a buffer over-read and possibly have unspecified other impact via a crafted Impulse Tracker file, a related issue to CVE-2010-2546. NOTE: this issue exists because of an incomplete fix for CVE-2009-3995.
Created: August 16, 2010. Updated: July 29, 2011.
Description: From the Mandriva advisory:
The (1) htk_read_header, (2) alaw_init, (3) ulaw_init, (4) pcm_init, (5) float32_init, and (6) sds_read_header functions in libsndfile 1.0.20 allow context-dependent attackers to cause a denial of service (divide-by-zero error and application crash) via a crafted audio file.
Package(s): lxr-cvs. CVE #(s): CVE-2010-1625 CVE-2010-1738 CVE-2010-1448 CVE-2009-4497
Created: August 18, 2010. Updated: August 18, 2010.
Description: The lxr-cvs tool fails to properly sanitize user input in a number of places, leading to several cross-site scripting vulnerabilities.
Package(s): mipv6-daemon. CVE #(s): CVE-2010-2522 CVE-2010-2523
Created: August 17, 2010. Updated: October 25, 2010.
Description: From the Fedora advisory:
This update fixes two security problems in mipv6-daemon: I) CVE-2010-2522: The origin of netlink messages sent to mipv6-daemon was not verified, allowing for local users to spoof netlink messages and thus influence the behaviour of mipv6-daemon. II) CVE-2010-2523: A specially crafted ND_OPT_PREFIX_INFORMATION or ND_OPT_HOME_AGENT_INFO packet could be used to exploit a buffer overflow in mipv6-daemon.
Package(s): openjdk-6. CVE #(s): CVE-2010-2548 CVE-2010-2783
Created: August 16, 2010. Updated: August 26, 2010.
Description: From the Ubuntu advisory:
It was discovered that the IcedTea plugin did not correctly check certain accesses. If a user or automated system were tricked into running a specially crafted Java applet, a remote attacker could read arbitrary files with user privileges, leading to a loss of privacy.
Created: August 13, 2010. Updated: September 3, 2010.
From the CVE entry:
Multiple cross-site scripting (XSS) vulnerabilities in rekonq 0.5 and earlier allow remote attackers to inject arbitrary web script or HTML via (1) a URL associated with a nonexistent domain name, related to webpage.cpp, aka a "universal XSS" issue; (2) unspecified vectors related to webview.cpp; and the about: views for (3) favorites, (4) bookmarks, (5) closed tabs, and (6) history.
Created: August 12, 2010. Updated: January 14, 2013.
From the SquirrelMail advisory:
A bug has been identified in SquirrelMail that poses a denial of service risk. The problem exists in SquirrelMail versions up through 1.4.20 wherein an attacker can submit random login attempts with 8-bit characters in the password. This will cause SquirrelMail to temporarily accept the login (further actions will all fail; user is never *actually* logged in) and create a preferences file (if one does not already exist) for the given username. An attacker could continue to use random usernames with the same password until enough preference files are created that the server runs out of hard disk space. We consider this a relatively low-risk problem, but it nevertheless has been fixed in SquirrelMail version 1.4.21.
Created: August 16, 2010. Updated: August 18, 2010.
Description: From the Red Hat advisory:
A deficiency was found in the way ssmtp removed the trailing '\n' sequence when processing lines beginning with a leading dot. A local user could send a specially-crafted e-mail message via the ssmtp send-only sendmail emulator, leading to a denial of service in the ssmtp executable (exit with: ssmtp: standardise() -- Buffer overflow). Different vulnerability than CVE-2008-3962.
Created: August 12, 2010. Updated: April 19, 2011.
From the Red Hat advisory:
Multiple buffer overflow flaws were found in the Wireshark SigComp Universal Decompressor Virtual Machine (UDVM) dissector. If Wireshark read a malformed packet off a network or opened a malicious dump file, it could crash or, possibly, execute arbitrary code as the user running Wireshark. (CVE-2010-2287, CVE-2010-2995)
Package(s): znc. CVE #(s): CVE-2010-2812 CVE-2010-2934
Created: August 12, 2010. Updated: August 18, 2010.
From the Red Hat bugzilla entry:
An out-of-range flaw was found in znc where if it received a "PING" from a client without an argument, std::string would throw a std::out_of_range exception which killed znc.
Some unsafe substr() calls were fixed as well. These are of lesser impact because a valid login is required in order to cause a std::out_of_range exception.
Page editor: Jake Edge
Brief items: the 2.6.36 merge window has closed since last week's merge window summary; see below for the most significant of the changes merged. Overall, the headline additions to 2.6.36 look to be the AppArmor security module, a new suspend mechanism which might - or might not - address the needs of the Android project, the LIRC infrared controller driver suite, a new out-of-memory killer, and the fanotify hooks for anti-malware applications. The full changelog is available for those who want all the details.
A handful of patches have been merged since 2.6.36-rc1; they include parts of the VFS scalability patch set by Nick Piggin. We'll take a closer look at those patches for next week's edition.
Stable updates: updates for the 2.6.27, 2.6.32, 2.6.34, and 2.6.35 kernels were released on August 13. Greg notes that the 2.6.34 updates are coming to an end, with only one more planned. There is another 2.6.27 update in the review process as of this writing; the future of 2.6.27 updates appears to be short as well, given that Greg can no longer boot such old kernels on any hardware in his possession.
1) base and suffixes choose the possible types.
2) order of types is always the same: int -> unsigned -> long -> unsigned long -> long long -> unsigned long long
3) we always choose the first type the value would fit into
4) L in suffix == "at least long"
5) LL in suffix == "at least long long"
6) U in suffix == "unsigned"
7) without U in suffix, base 10 == "signed"
Developers, understandably, want their code to be used, but turning new features on by default is often thought to be taking things a bit too far. Herbert Xu and other kernel crypto subsystem developers recently ran afoul of this policy when a new option controlling the boot-time self-testing of the crypto drivers was set to "yes" by default. They undoubtedly thought that this feature was important—bad cryptography can lead to system or data corruption—but Linux has a longstanding policy that features should default to "off". When David Howells ran into a problem caused by a bug when loading the cryptomgr module, Linus Torvalds was quick to sharply remind Xu of that policy.
The proximate cause of Howells's problem was that the cryptomgr was returning a value that made it appear as if it was not loaded. That caused a cascade of problems early in the boot sequence when the module loader was trying to write an error message to /dev/console, which had not yet been initialized. Xu sent out a patch to fix that problem, but Howells's bisection pointed to a commit that added a way to disable boot-time crypto self-tests—defaulted to running the tests.
Torvalds was characteristically blunt: "People always think that their magical code is so important. I tell you up-front that [it] absolutely is not. Just remove the crap entirely, please." He was unhappy that, at least by default, everyone would be running these self-tests every time they boot. But Xu was worried about data corruption and potentially flaky crypto hardware:
The last thing you want is to upgrade your kernel with a new hardware crypto driver that detects that you have a piece of rarely-used crypto [hardware], decides to use it and ends up making your data toast.
But Torvalds was unconvinced: "The _developer_ had better test the thing. That is absolutely _zero_ excuse for then forcing every boot for every poor user to re-do the test over and over again." Others were not so sure, however. Kyle Moffett noted that he had been personally bitten by new hardware crypto drivers that failed the self-tests—thus falling back to the software implementation—so he would like to see more testing.
Basically Torvalds's point was that making every user pay the cost to run the self-tests at boot time was too high. The drivers should be reliable or they shouldn't be in the kernel. He continued: "And if you worry about alpha-particles, you should run a RAM test on every boot. But don't ask _me_ to run one."
Though Xu posted a patch to default the self-tests to "off", it has not yet made its way into the mainline. Given Torvalds's statements, though, that will probably happen relatively soon. If distributions disagree with his assessment, they can, and presumably will, enable the tests for their kernels.
Kernel development news: the 2.6.36 merge window was still open at the time of last week's update; this article will cover the significant additions since then, starting with the user-visible changes:
Changes visible to kernel developers include:
kparam_block_sysfs_read(name);
kparam_unblock_sysfs_read(name);
kparam_block_sysfs_write(name);
kparam_unblock_sysfs_write(name);
Here, name is the name of the parameter as supplied to module_param() in the same source file. They are implemented with a mutex.
All told, some 7,770 changes were incorporated during this merge window. There were not a whole lot of changes pushed back this time around. The biggest feature which was not merged, perhaps, was transparent hugepages, but that omission is most likely due to the lack of a proper pull request from the maintainer.
Now the stabilization period begins. Linus has suggested that he plans to repeat his attempt to hold a hard line against any post-rc1 changes which are not clearly important fixes; we will see how that works out in practice.
One's first thought, when faced with the prospect of handling one billion files, might be to look for workarounds. Rather than shoveling all of those files into a single filesystem, why not spread them out across a series of smaller filesystems? The problems with that approach are that (1) it limits the kernel's ability to optimize head seeks and such, reducing performance, and (2) it forces developers (or administrators) to deal with the hassles involved in actually distributing the files. Inevitably things will get out of balance, forcing things to be redistributed in the future.
Another possibility is to use a database rather than the filesystem. But filesystems are familiar to developers and users, and they come with the operating system from the outset. Filesystems also are better at handling partial failure; databases, instead, tend to be all-or-nothing affairs.
If one wanted to experiment with a billion-file filesystem, how would one come up with hardware which is up to the task? The most obvious way at the moment is with external disk arrays. These boxes feature non-volatile caching and a hierarchy of storage technologies. They are often quite fast at streaming data, but random access may be fast or slow, depending on where the data of interest is stored. They cost $20,000 and up.
With regard to solid-state storage, Ric noted only that 1TB still costs a good $1000. So rotating media is likely to be with us for a while.
What if you wanted to put together a 100TB array on your own? They did it at Red Hat; the system involved four expansion shelves holding 64 2TB drives. It cost over $30,000, and was, Ric said, a generally bad idea. Anybody wanting a big storage array will be well advised to just go out and buy one.
The filesystem life cycle, according to Ric, starts with a mkfs operation. The filesystem is filled, iterated over in various ways, and an occasional fsck run is required. At some point in the future, the files are removed. Ric put up a series of plots showing how ext3, ext4, XFS, and btrfs performed on each of those operations with a one-million-file filesystem. The results varied, with about the only consistent factor being that ext4 generally performs better than ext3. Ext3/4 are much slower than the others at creating filesystems, due to the need to create the static inode tables. On the other hand, the worst performers when creating 1 million files were ext3 and XFS. Everybody except ext3 performs reasonably well when running fsck - though btrfs shows room for some optimization. The big loser when it comes to removing those million files is XFS.
To see the actual plots, have a look at Ric's slides [PDF].
It's one thing to put one million files into a filesystem, but what about one billion? Ric did this experiment on ext4, using the homebrew array described above. Creating the filesystem in the first place was not an exercise for the impatient; it took about four hours to run. Actually creating those one billion files, instead, took a full four days. Surprisingly, running fsck on this filesystem only took 2.5 hours - a real walk in the park. So, in other words, Linux can handle one billion files now.
That said, there are some lessons that came out of this experience; they indicate where some of the problems are going to be. The first of these is that running fsck on an ext4 filesystem takes a lot of memory: on a 70TB filesystem with one billion files, 10GB of RAM was needed. That number goes up to 30GB when XFS is used, though, so things can get worse. The short conclusion: you can put a huge amount of storage onto a small server, but you'll not be able to run the filesystem checker on it. That is a good limitation to know about ahead of time.
Next lesson: XFS, for all of its strengths, struggles when faced with metadata-intensive workloads. There is work in progress to improve things in this area, but, for now, it will not perform as well as ext3 in such situations.
According to Ric, running ls on a huge filesystem is "a bad idea"; iterating over that many files can generate a lot of I/O activity. When trying to look at that many files, you need to avoid running stat() on every one of them or trying to sort the whole list. Some filesystems can return the file type with the name in readdir() calls, eliminating the need to call stat() in many situations; that can help a lot in this case.
In general, enumeration of files tends to be slow; we can do, at best, a few thousand files per second. That may seem like a lot of files, but, if the target is one billion files, it will take a very long time to get through the whole list. A related problem is backup and/or replication. That, too, will take a very long time, and it can badly affect the performance of other things running at the same time. That can be a problem because, given that a backup can take days, it really needs to be run on an operating, production system. Control groups and the I/O bandwidth controller can maybe help to preserve system performance in such situations.
Finally, application developers must bear in mind that processes which run this long will invariably experience failures, sooner or later. So they will need to be designed with some sort of checkpoint and restart capability. We also need to do better about moving on quickly when I/O operations fail; lengthy retry operations can take a slow process and turn it into an interminable one.
In other words, as things get bigger we will run into some scalability problems. There is nothing new in that revelation. We've always overcome those problems in the past, and should certainly be able to do so in the future. It's always better to think about these things before they become urgent problems, though, so talks like Ric's provide a valuable service to the community.

One of the decisions made at the Linux Storage and Filesystem summit was to get rid of support for barriers in the Linux kernel block subsystem. This was a popular decision, but also somewhat misunderstood (arguably, by your editor above all). Now, a new patch series from Tejun Heo shows how request ordering will likely be handled between filesystems and the block layer in the future.
The block layer must be able to reorder disk I/O operations if it is to obtain the sort of performance that users expect from their systems. On rotating media, there is much to be gained by minimizing head seeks, and that goal is best achieved by executing all nearby requests together, regardless of the order in which those requests were issued. Even with flash-based devices, there is some benefit to be had by grouping adjacent requests, especially when small requests can be coalesced into larger operations. Proper dispatch of requests to the low-level device driver is normally the I/O scheduler's job; the scheduler will freely reorder requests, blissfully ignorant of the higher-level decisions which created those requests in the first place.
Note that this reordering also usually happens within the storage device itself; requests will be cached in (possibly volatile) memory and writes will be executed at a time which the hardware deems to be convenient. This reordering is typically invisible to the operating system.
The problem, of course, is that it is not always safe to reorder I/O requests in arbitrary ways. The classic example is that of a journaling filesystem, which operates in roughly this way: (1) write the changes for the current transaction to the journal; (2) wait for those journal writes to make it to the media; (3) write a commit record which completes the transaction. Only then can the journaled changes be written to their final locations in the filesystem.
If the system were to crash before step 3 completes, everything written to the journal would be lost, but the integrity of the filesystem would be unharmed. If the system crashes after step 3, but before the changes are written to the filesystem, those changes will be replayed at the next mount, preserving both the metadata and the filesystem's integrity. Thus, journaling makes a filesystem relatively crash-proof.
But imagine what can happen if requests are reordered. If the commit record is written before all of the other changes have been written to the journal, then, after a crash, an incomplete journal would be replayed, corrupting the filesystem. Or, if a transaction frees some disk blocks which are subsequently reused elsewhere in the filesystem, and the reused blocks are written before the transaction which freed them is committed, a crash at the wrong time would, once again, corrupt things. So, clearly, the filesystem must be able to impose some ordering on how requests are executed; otherwise, its attempts to guarantee filesystem integrity in all situations may well be for nothing.
For some years, the answer has been barrier requests. When the filesystem issues a request to the block layer, it can mark that request as a barrier, indicating that the block layer should execute all requests issued before the barrier prior to doing any requests issued afterward. Barriers should, thus, ensure that operations make it to the media in the right order while not overly constraining the block layer's ability to reorder requests between the barriers.
In practice, barriers have an unpleasant reputation for killing block I/O performance, to the point that administrators are often tempted to turn them off and take their risks. While the tagged queue operations provided by contemporary hardware should implement barriers reasonably well, attempts to make use of those features have generally run into difficulties. So, in the real world, barriers are implemented by simply draining the I/O request queue prior to issuing the barrier operation, with some flush operations thrown in to get the hardware to actually commit the data to persistent media. Queue-drain operations will stall the device and kill the parallelism needed for full performance; it's not surprising that the use of barriers can be painful.
In their discussions of this problem, the storage and filesystem developers have realized that the ordering semantics provided by block-layer barriers are much stronger than necessary. Filesystems need to ensure that certain requests are executed in a specific order, and they need to ensure that specific requests have made it to the physical media before starting others. Beyond that, though, filesystems need not concern themselves with the ordering for most other requests, so the use of barriers constrains the block layer more than is required. In general, it was concluded, filesystems should concern themselves with ordering, since that's where the information is, and not dump that problem into the block layer.
To implement this reasoning, Tejun's patch gets rid of hard-barrier operations in the block layer; any filesystem trying to use them will get a cheery EOPNOTSUPP error for its pains. A filesystem which wants operations to happen in a specific order will simply need to issue them in the proper order, waiting for completion when necessary. The block layer can then reorder requests at will.
What the block layer cannot do, though, is evade the responsibility for getting important requests to the physical media when the filesystem requires it. So, while barrier requests are going away, "flush requests" will replace them. On suitably-capable devices, a flush request can have two separate requirements: (1) the write cache must be flushed before beginning the operation, and (2) the data associated with the flush request itself must be committed to persistent media by the time the request completes. The second part is often called a "force unit access" (or FUA) request.
In this world, a journaling filesystem can issue all of the journal writes for a given transaction, then wait for them to complete. At that point, it knows that the writes have made it to the device, but the device might have cached those requests internally. The write of the commit record can then follow, with both the "flush" and "FUA" bits set; that will ensure that all of the journal data makes it to physical media before the commit record does, and that the commit record itself is written by the time the request completes. Meanwhile, all other I/O operations - playing through previous transactions or those with no transaction at all - can be in flight at the same time, avoiding the queue stall which characterizes the barrier operations implemented by current kernels.
The patch set has been well received, but there is still work to be done, especially with regard to converting filesystems to the new way of doing things. Christoph Hellwig has posted a set of patches to that end. A lot of testing will be required as well; there is little desire to introduce bugs in this area, since the consequences of failure are so high. But the development cycle has just begun, leaving a fair amount of time to shake down this work before the 2.6.37 merge window opens.
Patches and updates
Core kernel code
Filesystems and block I/O
Page editor: Jonathan Corbet
In an effort to beef up the quality assurance (QA) process for Fedora, the project has launched a new QA-focused team called Proven Testers. The Proven Testers are a select group of QA volunteers who are responsible for stress-testing and approving updates to what Fedora calls its "critical path" packages. The project hopes this new approach will increase the quality of Fedora releases; it also hopes to attract more core developers and packagers to the QA process itself.
The Fedora QA process is already fairly systematic, with organized teams for bug triage, organized test plans and test days, and drafted release criteria. Prior to the Proven Testers subproject, however, testing release milestones and updates was not a strictly-organized affair. The Fedora test list acts as a coordination point for individual volunteers, who test both packages hitting the development repository Rawhide and packages hitting the updates-testing repository for maintained releases.
Adam Miller proposed the Proven Testers program in March, with the goal of providing increased attention to packages that affect the critical path — essentially, the core system functionality (installation, boot, mounting filesystems, graphics, login, network, fetching package updates, etc) without which the system is unusable.
Proven Testers are asked to perform a full system update from updates-testing at least once per day, updating individual packages more frequently when there is an urgent need. They then test for basic stability and provide feedback against the update. Major bugs are to be reported in the project's Bugzilla, and positive or negative "karma" votes are given using Bodhi. Updates must receive positive votes from members of the Proven Testers group in order to be promoted through the system for release.
The initial set of Proven Testers numbered just twelve, drawn from the existing QA group members, to attempt a trial run. Subsequently, the QA project has opened up the Proven Testers group to others, although Miller says it is intended for "members who have a 'proven' track record of having good testing habits, file meaningful feedback (bugs or karma to Bodhi), being familiar with the over all Fedora QA processes/guidelines, etc."
This does not exclude newcomers, he explained; rather, new testers who wish to join are paired with a more experienced mentor to guide them through the process. Given the critical nature of critical path updates, he added, there is an informal process that could be used to remove a Proven Tester in the case that one consistently gives positive feedback to broken updates, but it has never been used, and it is hoped never to be needed.
As of now, there are 33 Proven Testers, responsible for testing 579 critical path packages. Following this past spring's trial run, Fedora 13 (released in May) is the first release to enjoy the support of the Proven Testers program from day one.
Testing processes among the other community-driven Linux distributions vary considerably in terms of formality. Ubuntu's Testing Team performs organized daily smoke tests and pre-release ISO testing, sponsors testing days aimed at particular features and applications, and maintains distribution- and application-test cases. It, too, maintains a repository for proposed updates to stable releases, and has a public stable release update (SRU) verification team. The SRU verification team, however, does not have to sign off on updates for packages to be approved for release, and there is no formal membership application-and-approval process.
OpenSUSE has a volunteer Core Testing Team responsible for ensuring basic functionality in development releases, and maintains separate sub-teams for specific core areas such as KDE, GNOME, installation, wireless, and LAMP servers. OpenSUSE has recently overhauled and relaunched its public wiki, which makes finding current documentation of the team's processes a challenge, but according to the mailing list the emphasis is placed on ISO testing as a part of the regular release process. A separate Maintenance Team also exists for packaging updates for maintained releases, although it does not appear to encompass testing. A similar function, though, seems to be provided by Novell's QA team for the company's SUSE Linux Enterprise products.
Debian's testing process, of course, is entirely different, as is its release process. Individual package updates progress from the unstable distribution into testing based on the amount of time each has been available, the build status for all of the supported architectures, and whether the update has no more release-critical bugs than its predecessor. Historically, stable releases of the distribution are made at the discretion of the release manager, and updates are generally limited to security fixes.
Fedora's use of voting through Bodhi already distinguishes it from the other distributions, where bug reports are the determining factor in an update's acceptance. Bodhi solves the problem that the mere absence of a negative (a bug) does not prove an update is ready. However, relying on a positive (a vote in Bodhi) clearly makes "false positives" a potential pitfall in addition to the "false negative" of an undiscovered bug.
The Proven Testers project is an effort to correct for this, at least for the most critical packages. In addition, Miller hopes it will attract more individual developers and packagers to Fedora's QA team; by and large, the QA and testing communities have historically seen more participation from non-developers. The hope is that bringing developers, packagers, bug reporters, and testers closer together in the release process will improve the stability of the distribution and the overall communication within the community.
Miller is pleased with the Proven Testers project thus far, although he notes it is not perfect. "A perfect example of that is how PackageKit worked like a champ if you were in Gnome or XFCE but had an issue where it would no longer alert on updates if you were running KDE. So again, not a perfect process but we do try." In the long run, though, he is more excited about other developments in the Fedora QA process, such as the AutoQA automated test system.
Several of the other distributions are developing automated test tools. While certainly helpful, they will never supplant human testers as the last line of defense. All large free software projects are concerned about QA and testing. No individual distribution can hope to amass the sheer volume of testers attracted by the Linux kernel itself, so the more systematic approach being taken by Fedora is a welcome development — perhaps attracting additional participants, but more practically, allowing them to focus their energies towards a measurable test process. If it works, it may be a valuable case study for other large projects for whom release stability is a major concern, from the various desktop environments, to development frameworks, to X.org.
New Releases

"This release includes updated server, desktop, and alternate installation CDs for the i386 and amd64 architectures."
Fedora

"Today we held our readiness meeting for the Alpha release of Fedora 14. As you may know, this is a meeting with representatives from the Development, Release Engineering, and Quality Assurance teams. In these meetings, we evaluate the list of blocker bugs and give a "go" or "no go" signal on the state of the Fedora release. [...] You can read the minutes of the meeting here, but in short the decision was made that the release has not passed its release criteria." Click below for the full announcement.

Also available is a summary of the August 13 meeting of the Fedora Board. This was a public meeting on IRC. Topics include the schedule, a vision statement for Fedora, code maturity for inclusion, meeting protocol, Fedora Board composition, updates, and other board business.
Ubuntu family

Canonical has announced the release of uTouch 1.0, a multitouch/gesture stack which will be shipped with the upcoming 10.10 release. "With Ubuntu 10.10 (the Maverick Meerkat), users and developers will have an end-to-end touch-screen framework from the kernel all the way through to applications. Our multi-touch team has worked closely with the Linux kernel and X.org communities to improve drivers, add support for missing features, and participate in the touch advances being made in open source world. To complete the stack, we've created an open source gesture recognition engine and defined a gesture API that provides a means for applications to obtain and use gesture events from the uTouch gesture engine."

Mark Shuttleworth introduces the mascot for Ubuntu 11.04, "Natty Narwhal". "The Narwhal, as an Arctic (and somewhat endangered) animal, is a fitting reminder of the fact that we have only one spaceship that can host all of humanity (trust me, a Soyuz won't do for the long haul to Alpha Centauri). And Ubuntu is all about bringing the generosity of all contributors in this functional commons of code to the widest possible audience, it's about treating one another with respect, and it's about being aware of the complexity and diversity of the ecosystems which feed us, clothe us and keep us healthy. Being a natty narwhal, of course, means we have some obligation to put our best foot forward. First impressions count, lasting impressions count more, so let's make both and make them favourable."
Newsletters and articles of interest
Page editor: Rebecca Sobol
Recently I bought a shiny new disk for my Fedora 10-based MythTV system. I had to copy some 700GiB of video files from the old disk to the new one. I am used to rsync for this type of job as the rsync command and its accompanying options flow right from my fingers to the keyboard. However, I was not happy with what I saw, as the performance was nothing to write home about: the files were copied at about 37MiB/s. Both disks can handle about three times that speed — at least on the outer cylinders. That makes a lot of difference: an expected wait of just over two hours changed into a six hour ordeal. Note that both SATA disks were local to the system and no network was involved.
Wanting to know what happened, I created a small test to see what was going on: copying a 10GiB file from one disk to the other. I made sure that the ext4 file systems involved were completely fresh so fragmentation could not play a part (a new mkfs after each test). I also made sure that the test file systems were created on the outermost (and fastest) cylinders of the disks. Simply reading the source file could be done at 106MiB/s and writing a 10GiB file to the destination file system could be done at 134MiB/s.
The copy programs under test were rsync, cpio, cp, and cat. Of course I took care that the cache could not interfere by flushing the cache before each test, and waiting for the dirty buffers to be flushed to the destination disk after the test command completes. For example, when the SRC and DEST are variables holding the name of the source file in the current directory and the name of the destination directory:
sync                                # flush dirty buffers to disk
echo 3 > /proc/sys/vm/drop_caches   # discard caches
time sh -c "cp $SRC $DEST; sync"    # measure cp and sync time
The echo command to /proc/sys/vm/drop_caches forces the invalidation of all non-dirty buffers in the page cache. To also force dirty pages to be flushed, we first use the sync command. The copy command will copy the 10GiB file, but it will actually finish before the last blocks have been flushed to disk. That is why we time the combination of the cp command and the sync command, which forces flushing the dirty blocks to disk.
The four commands tested were:
rsync $SRC $DEST
echo $SRC | cpio -p $DEST
cp $SRC $DEST
cat $SRC > $DEST/$SRC
The results for rsync, cpio, cp, and cat were:
   user     sys  elapsed  hog   MiB/s  test
   5.24   77.92   101.86  81%  100.53  cpio
   0.85   53.77   101.12  54%  101.27  cp
   1.73   59.47   100.84  60%  101.55  cat
 139.69   93.50   280.40  83%   36.52  rsync
The observation that rsync was slow was indeed substantiated. Looking at the hog factor (the amount of CPU time used relative to the elapsed time), we can conclude that rsync is not so much disk-bound (as would be expected), but CPU-bound. That required some more scrutiny. The atop program showed that rsync uses three processes: one that does only disk reads, one that does only disk writes, and one (I assume) control process that uses little CPU time and does no disk I/O.
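The hog factor and throughput columns follow directly from the measured times; a quick sanity check of the arithmetic (10GiB = 10240MiB; the helper names below are mine):

```python
SIZE_MIB = 10 * 1024  # the 10GiB test file, in MiB

# (user, system, elapsed) seconds, copied from the table above
measurements = {
    "cpio":  (5.24, 77.92, 101.86),
    "cp":    (0.85, 53.77, 101.12),
    "cat":   (1.73, 59.47, 100.84),
    "rsync": (139.69, 93.50, 280.40),
}

def hog(user, system, elapsed):
    """CPU time consumed, as a truncated percentage of elapsed time."""
    return int(100 * (user + system) / elapsed)

def throughput(elapsed):
    """Effective copy rate in MiB/s for the 10GiB file."""
    return round(SIZE_MIB / elapsed, 2)

for name, (user, system, elapsed) in measurements.items():
    print(name, hog(user, system, elapsed), throughput(elapsed))
```

For rsync this reproduces the 83% hog factor and the 36.52MiB/s figure from the table.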
Using strace, it can be shown that cp only uses read() and write() system calls in a tight loop, while rsync uses two processes that talk to each other using reads and writes through a socket, sprinkled with loads of select() system calls. To simulate the multiple processes, I then used multiple cat processes strung together using pipes. That test does not show the bad performance that rsync demonstrates. To test the influence of using a socket, I also created a TCP service using xinetd that just starts cat with its output redirected to a file to simulate the "network traffic." The client side:
cat $SRC | nc localhost myservice

And the server side:
cat > $DEST

Even this setup outperforms rsync. It achieves the same disk bandwidth as cp with a far lower CPU load than rsync.
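For the curious, the socket test can be approximated in a single self-contained script; the sketch below uses a local socket pair and fork() instead of a real TCP service through xinetd, but exercises the same read-from-socket, write-to-file pattern as the cat server above. File names and the buffer size are arbitrary choices.

```python
import os
import socket
import tempfile

def copy_through_socket(src_path, dest_path, bufsize=64 * 1024):
    """Push a file through a socket to a child process that writes it
    out, mimicking the cat | nc ... cat > file pipeline."""
    parent_sock, child_sock = socket.socketpair()
    if os.fork() == 0:
        # Child: the "server" side -- read from the socket, write the file.
        parent_sock.close()
        with open(dest_path, "wb") as out:
            while True:
                chunk = child_sock.recv(bufsize)
                if not chunk:
                    break
                out.write(chunk)
        os._exit(0)
    # Parent: the "client" side -- read the file, write the socket.
    child_sock.close()
    with open(src_path, "rb") as src:
        while True:
            chunk = src.read(bufsize)
            if not chunk:
                break
            parent_sock.sendall(chunk)
    parent_sock.close()  # EOF on the socket ends the child's loop
    os.wait()

tmp = tempfile.mkdtemp()
src, dest = os.path.join(tmp, "src"), os.path.join(tmp, "dest")
with open(src, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1MiB of test data
copy_through_socket(src, dest)
```

Like the nc test, this is a plain read/write loop with no select() churn, which is the point of the comparison.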
The three rsync processes can be pinned to separate CPUs with taskset (1111, 1112, and 1113 being their process IDs):

taskset -pc 0 1111   # force on CPU0
taskset -pc 1 1112   # force on CPU1
taskset -pc 2 1113   # force on CPU2
By using taskset right after rsync was started, the throughput of rsync went up from 36.5MiB/s to 40MiB/s. Though a 10% improvement, it was still nowhere near cat's performance. When forcing the three rsync processes to run on the same CPU, performance went down to 32MiB/s.
rsync needs quite a lot of CPU power (both user and system time). Despite that, the on-demand frequency governor does not scale up the CPU frequency. We can force all cores to run at the highest frequency with:
for i in 0 1 2 3; do
    echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
done
If the CPU frequency is forced to the highest setting (2.6GHz), the result for three rsyncs on a single core goes up to 62MiB/s. Combining this with the "spread the load" tactic using taskset, we even get up to 85MiB/s. Still 15% less than other copy programs, but more than a two-fold performance increase compared to the default situation.
The conclusion is that in the default situation, using cp over rsync will give you almost threefold better performance. However, a little tinkering with the scheduler (using taskset) and the cpufreq governor can get you a twofold performance improvement with rsync, but still only two-thirds that of cp.
Summarizing the results of the test with rsync:
Throughput   CPUs   Core frequency
  22MiB/s    1-3    0.8GHz
  23MiB/s    1      0.8GHz
  34MiB/s    1      ondemand
  37MiB/s    1-3    ondemand   << default
  39MiB/s    3      0.8GHz
  40MiB/s    3      ondemand
  62MiB/s    1      2.6GHz
  62MiB/s    1-3    2.6GHz
  85MiB/s    3      2.6GHz
In this table, the second column shows how the rsyncs were distributed over the cores. "1" means the three rsyncs were forced onto a single CPU, "1-3" means the scheduler could do what it saw fit, and "3" means the three rsyncs were each forced onto their own CPU.
It is clear that the default settings are not the worst possible, but they are close to it.
The bad behavior can be seen using cpufreq_stats. After loading the module:
modprobe cpufreq_stats

it is possible to see how much time each core has spent at each frequency. If we look at the results after the rsync run, we see for CPU 2:
$ cat /sys/devices/system/cpu/cpu2/cpufreq/stats/time_in_state
2600000 423293
1900000 363
1400000 534
800000 6645805

The frequency (in 1000Hz units) is the first column; the time (in 10ms units) is the second. Since the module was loaded, CPU2 has spent most of its time at the lowest frequency, despite the fact that rsync really is quite CPU-intensive.
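Parsing that output makes the imbalance explicit. A small helper (the names are mine), fed the numbers shown above, shows the 800MHz state accounting for roughly 94% of the time:

```python
# Output of time_in_state for CPU2, as shown above
time_in_state = """\
2600000 423293
1900000 363
1400000 534
800000 6645805
"""

def frequency_shares(raw):
    """Map frequency (kHz) -> percentage of time spent at it."""
    pairs = [tuple(map(int, line.split())) for line in raw.splitlines()]
    total = sum(ticks for _, ticks in pairs)
    return {freq: round(100 * ticks / total, 1) for freq, ticks in pairs}

shares = frequency_shares(time_in_state)
print(shares)
```

So the CPU sat at 800MHz about 94% of the time and at the full 2.6GHz only about 6% of it, during a workload that is demonstrably CPU-bound.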
After all these results, I decided to give Arjan's patches a try. I compiled kernel version 2.6.35-rc3, which has the patches incorporated, and used that instead of the kernel Fedora 10 was running when the original problem popped up. For comparison, I also ran the tests with a more recent kernel that does not incorporate Arjan's patches: 2.6.34.
I could immediately see (in atop) that the three rsync processes were on separate processors most of the time. The newer kernels apparently are better at spreading the load. However, this is not a great help:
 FC10   2.6.34   2.6.35-rc3   CPUs   Frequency
MiB/s    MiB/s        MiB/s
23.12    28.85        28.07   1      0.8GHz
22.19    44.23        45.25   1-3    0.8GHz
38.62    43.39        43.75   3      0.8GHz
34.01    55.48        57.37   1      ondemand
36.52    44.85        45.08   1-3    ondemand   << default
39.73    43.65        44.30   3      ondemand
62.37    66.67        68.52   1      2.6GHz
62.15    92.34        91.84   1-3    2.6GHz
85.47    89.79        89.42   3      2.6GHz
The newer kernels are better at spreading the processes over the cores. However, this is hindering Arjan van de Ven's patch from doing its work. The patch does indeed work when all rsync processes run on a single CPU. But because the new kernel does a better job of spreading the processes over CPUs, Arjan's frequency increase does not occur. Arjan is working on an entirely new governor that may be better at raising the CPU's frequency when doing a lot of disk I/O.
Newsletters and articles
Page editor: Jonathan Corbet
Non-Commercial announcements

The GNOME project has announced a new policy on modules which require copyright assignment. "The very short summary is that the inclusion of a new module in GNOME that requires copyright assignment has to be explicitly approved on a case-by-case basis by both the Release Team and the GNOME Foundation Board." There is a set of detailed guidelines on when modules with such policies might be accepted.
Articles of interest

Ars technica reports on Oracle's lawsuit against Google over Android. There aren't too many details available, but according to the Oracle press release, it is about both copyright and patent infringement. "In the complaint, a copy of which was posted on VentureBeat, Oracle claims that Android, the Android SDK, and Dalvik all infringe on seven patents owned by the database giant. Oracle also accuses Google of "knowingly, willingly, and unlawfully" copying, preparing, publishing, and [distributing] its IP."

Also worth a look is a report from Eben Moglen's LinuxCon keynote. Moglen spoke about the software patent problem, particularly for free software, now that the Bilski ruling did not go as far as some had hoped. "One of the main themes of Moglen's keynote is the power of collaboration between commercial business interests and the free software community. He argues that the economy of sharing that the free culture movement seeks to elevate is not mutually hostile with the mostly taken-for-granted and longstanding economy of ownership. In fact, he thinks that the sharing and ownership economies are mutually reinforcing."

Tiller writes about the case for software patents, or at least the case that is being put forth by proponents. He is responding to a recent blog post by attorney Gene Quinn that, among other weak arguments, calls those who oppose software patents "ideological buffoons". Tiller says: "Not everyone views it this way, of course. Those who are profiting from the existing system generally think that it works rather well. And they have some appealing-sounding arguments. For instance, they argue that patents encourage innovation by allowing lone inventors to pursue their ground-breaking dreams in the face of powerful corporations. This sort of story tends to excite emotions and hinder rational analysis. It ignores the rarity of inventors who work without significant collaboration, of inventions that are ground-breaking, and of patents that ever recover even the cost of the patent application. Dreams of getting a hugely successful patent are about as realistic as dreams of winning the lottery. Still, it's a nice, and understandable, dream."

Finally, there is a report from the Education Mini-Summit that was held just before LinuxCon. Seven different talks from the mini-summit are summarized, with topics ranging from open source adoption to open data for education, along with several others. "Karlie [Robinson] also talked about some other initiatives happening at RIT [Rochester Institute of Technology] involving open source. Currently they are working on improving video chat on OLPCs in the hopes of providing enough fidelity for sign language video chat, with the help of funding from the National Institute for the Deaf. The project is leading to better video drivers and software for the OLPC. Typically for college computer science students, their buddy working on a project with them or in the same class is right down the hall, and they can chat in-person and help each other out and work on things. In FLOSS, the help you need is in an IRC room, not a dorm room down the hall, and working within that social space takes a bit of adjustment. For example, Karlie gave the example of a student struggling to find an answer to a question he had on the software he was working on. He told Karlie he couldn't find the answer. "Who did you ask?" asked Karlie. "Oh, it was late" the student replied. Karlie responded, "It's never late! People all over the world come online all the time who can help you." She's had to explain to the students that time doesn't exist in the FLOSS world and if a chat room is silent, it's time to hit the mailing list — just don't give up."
Resources

PET: Python entre todos is "a magazine in Spanish, by pythonistas, for pythonistas, about Python made using python". The first issue has been released.
European DrupalCon (Copenhagen, Denmark)
August 28: PyTexas 2010 (Waco, TX, USA)
OOoCon 2010 (Budapest, Hungary)
LinuxCon Brazil 2010 (São Paulo, Brazil)
Free and Open Source Software for Geospatial Conference (Barcelona, Spain)
DjangoCon US 2010 (Portland, OR, USA)
CouchCamp: CouchDB summer camp (Petaluma, CA, USA)
Ohio Linux Fest (Columbus, Ohio, USA)
September 11: Open Tech 2010 (London, UK)
Open Source Singapore Pacific-Asia Conference (Sydney, Australia)
X Developers' Summit (Toulouse, France)
3rd International Conference FOSS Sea 2010 (Odessa, Ukraine)
Italian Debian/Ubuntu Community Conference 2010 (Perugia, Italy)
WordCamp Portland (Portland, OR, USA)
September 18: Software Freedom Day 2010 (Everywhere)
September 23: Open Hardware Summit (New York, NY, USA)
BruCON Security Conference 2010 (Brussels, Belgium)
PyCon India 2010 (Bangalore, India)
Japan Linux Symposium (Tokyo, Japan)
Workshop on Self-sustaining Systems (Tokyo, Japan)
September 29: 3rd Firebird Conference - Moscow (Moscow, Russia)
Open World Forum (Paris, France)
Open Video Conference (New York, NY, USA)
October 1: Firebird Day Paris - La Cinémathèque Française (Paris, France)
Foundations of Open Media Software 2010 (New York, NY, USA)
IRILL days - where FOSS developers, researchers, and communities meet (Paris, France)
Utah Open Source Conference (Salt Lake City, UT, USA)
Free Culture Research Conference (Berlin, Germany)
17th Annual Tcl/Tk Conference (Chicago/Oakbrook Terrace, IL, USA)
Linux Foundation End User Summit (Jersey City, NJ, USA)
October 12: Eclipse Government Day (Reston, VA, USA)
October 16: FLOSS UK Unconference Autumn 2010 (Birmingham, UK)
October 16: Central PA Open Source Conference (Harrisburg, PA, USA)
7th Netfilter Workshop (Seville, Spain)
Pacific Northwest Software Quality Conference (Portland, OR, USA)
Open Source in Mobile World (London, UK)
openSUSE Conference 2010 (Nuremberg, Germany)
OLPC Community Summit (San Francisco, CA, USA)
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol
Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds