LWN.net Weekly Edition for April 16, 2009

PostgreSQL 8.4 Beta: "We've got momentum"

April 15, 2009

This article was contributed by Bruce Byfield

The PostgreSQL project has released the beta of its 8.4 version, with the final release expected in late June or July. Like previous PostgreSQL releases, 8.4 features dozens of enhancements throughout the code, but Bruce Momjian, a member of PostgreSQL's core team since the project was founded in 1996, described it as a "more surgical release" with improvements tending to cluster in select areas, most notably administrative features. However, almost as interesting as the features in the beta are the ones that failed to make the cut this time, and the project's struggle to control the release process as it undergoes major growth.

Momjian explained the release's more targeted approach as the result of the project's advanced state. "You see a real consolidation in this release," he said, "and I think that's because you're seeing a much more complete feature set. You're seeing a real maturity of the code base, which is kind of surprising, because, if you looked a few years ago, you'd see changes all over the map."

At the same time, Momjian characterized the PostgreSQL code as being constantly revised, so the project is unlikely to need, any time soon, the kind of major revision that KDE underwent last year.

That's always been a fear of ours, but we've actually never had to do it. And I think the reason is that, in general, we're always restructuring our source code. So we're always having to re-engineer things and clean them up.

For example, by the end of the Windows port [in 2003], the code was cleaner than when we started. You'd think that after adding a Windows port, the code would be just, you know, spaghetti code central, right? But we end up abstracting a lot of the Unix-specific behavior into a portability library. So now, a lot of the assumptions that you make about Unix are now codified in a separate place. And then you put the Windows pieces in there, and it works really well, too.

Nor is the project hesitant about altering behavior or deprecating legacy code in the name of what Momjian called "high standards and the promise of reliability."

New features

PostgreSQL releases tend to be one to two years apart, with far too many features to mention in any detail. Many of these features are highlighted in a PDF presentation by Momjian entitled "Upcoming PostgreSQL Performance Features", including Column Level Permissions and Per-Database Locales. However, if you ask active contributors what the major enhancements are in the 8.4 release, their answers emphasize automated administrative features.

Selena Deckelmann, user groups leader in the project, emphasized changes to the Free Space Map, which maps unused space in a database. In previous releases, PostgreSQL could only detect newly freed space when an administrator manually ran the Vacuum utility. By contrast, in the 8.4 release, freed space is re-mapped automatically, saving both time and effort.

Deckelmann also called attention to a new feature called Visibility Map. While in previous releases, Vacuum had to re-map all rows in a table, regardless of whether they had changed, Visibility Map improves performance by allowing the utility to skip rows that have not changed.
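The skipping behavior described above can be sketched with a toy model. This is only an illustration of the idea, not PostgreSQL's implementation: the real Visibility Map lives inside the server and tracks whole pages rather than individual rows, and the names ToyTable, delete_row, and vacuum are hypothetical.

```python
class ToyTable:
    """Toy model: a table is a list of pages, each page a list of rows."""

    def __init__(self, num_pages, rows_per_page):
        # Each row carries a 'dead' flag marking space that can be reclaimed.
        self.pages = [
            [{"dead": False} for _ in range(rows_per_page)]
            for _ in range(num_pages)
        ]
        # One flag per page: True means nothing changed since the last vacuum.
        self.all_visible = [False] * num_pages

    def delete_row(self, page_no, row_no):
        self.pages[page_no][row_no]["dead"] = True
        # Any modification clears the flag so the next vacuum revisits the page.
        self.all_visible[page_no] = False

    def vacuum(self):
        """Reclaim dead rows, skipping pages flagged as unchanged.

        Returns a tuple (pages_scanned, rows_reclaimed).
        """
        scanned = 0
        reclaimed = 0
        for page_no, page in enumerate(self.pages):
            if self.all_visible[page_no]:
                continue  # unchanged since the last vacuum: skip it
            scanned += 1
            live = [row for row in page if not row["dead"]]
            reclaimed += len(page) - len(live)
            self.pages[page_no] = live
            self.all_visible[page_no] = True
        return scanned, reclaimed
```

In this model, a vacuum that follows a single delete_row() call scans one page instead of the whole table, which is the saving the feature is after.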

Yet another automated feature mentioned by Deckelmann is Auto Explain, which captures the explain plan for a query (that is, how the query finds results) and saves it. That information can be used to reduce system load, improve the efficiency of queries, and, ultimately, reduce the costs of a database operation. "It's something that a lot of admins end up doing anyway," Deckelmann observed. "They write a script that looks for long-running queries, and then they go in and manually figure with each one what's going on. It's kind of a neat feature that came from the Japanese PostgreSQL team."
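A minimal sketch of the kind of script Deckelmann describes might look like the following. It is not the Auto Explain module itself: it uses Python's built-in sqlite3 module as a stand-in database, SQLite's EXPLAIN QUERY PLAN in place of PostgreSQL's EXPLAIN, and a hypothetical helper named run_with_auto_explain.

```python
import sqlite3
import time


def run_with_auto_explain(conn, sql, params=(), min_duration=0.0, log=None):
    """Run a query; if it takes at least min_duration seconds, record its plan.

    Returns (rows, log), where log is a list of dicts describing slow queries.
    """
    if log is None:
        log = []
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed >= min_duration:
        # SQLite's EXPLAIN QUERY PLAN stands in for PostgreSQL's EXPLAIN here.
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
        log.append({"sql": sql, "seconds": elapsed, "plan": plan})
    return rows, log
```

The point of building this into the server, rather than a wrapper like the one above, is that the plan is captured for the query as it actually ran, without the administrator having to find and re-run it later.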

Josh Berkus, another core team member, noted that Oracle's Statspack, a set of tools to generate statistics useful to database administrators, inspired pgstat. As Berkus explained the situation, users migrating from Oracle expect to see comparable tools in PostgreSQL, and Statspack:

...allows you to see exactly what's going on with your server internally in terms of how much memory it's using, what queries it is running, and all those other things that you need to know so that, when the load on the server starts going up, you know what to do about it. We've had some hackish tools for a long time, and have had some sophisticated activity logging, but activity logging is not very interactive. So we're adding some new interactive tools. We're trying to achieve an analog of what Statspack provides.

Berkus noted, though, that pgstat will not be stable enough to be installed by default in 8.4. Instead, it will be shipped in postgresql-contrib, the repository for tools that, for one reason or the other, are not part of the regular installation. Some tools in contrib may be too specialized for most installations, or illegal to ship under American restrictions on the exporting of cryptographic tools, while others, like pgstat, are still in development.

According to Deckelmann, statistics in PostgreSQL receive another boost in the 8.4 release in the form of increased control over them. Statistics are no longer automatically collected and written to a file, since doing so regularly can impose significant system overhead. Instead, statistics can now be configured to run from a RAM disk to improve performance.

However, if the latest release has a single outstanding feature, it is parallel restore. As the name suggests, parallel restore allows admins to restore a database with multiple processes, rather than a single thread. Berkus, who runs his own PostgreSQL consulting business, said, "[Parallel restore] is the feature I've been making the most use of. I've been using it in beta already, pretty heavily. Because if you have 300 gigabyte databases, upgrading them single-threaded is lethally slow. It's a real issue."

He went on to describe parallel restore as a particularly difficult feature to implement, all the more so because it had to accommodate past changes in the PostgreSQL file format. While employed as a supervisor at Sun Microsystems, Berkus said, he had two employees working on a similar feature for a year and a half "without coming up with more than a rough prototype." By contrast, PostgreSQL developed its version in three weeks, followed by three months of debugging.
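To give a feel for why the feature is harder than it sounds, here is a rough sketch of dependency-aware parallel work scheduling. It is an illustration under stated assumptions, not how PostgreSQL implements it: the function parallel_restore, its arguments, and the item names in the usage note are all hypothetical, and threads stand in for the separate processes the article mentions.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def parallel_restore(items, deps, restore_one, jobs=4):
    """Restore items with up to `jobs` workers, honoring dependencies.

    items:       list of item names
    deps:        dict mapping an item to the set of items it must wait for
    restore_one: callable(item) that does the actual work
    Returns the items in the order in which they finished.
    """
    remaining = {item: set(deps.get(item, ())) for item in items}
    finished = []
    running = {}  # future -> item
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while remaining or running:
            # Launch every item whose prerequisites have all been restored.
            ready = [item for item, need in remaining.items() if not need]
            for item in ready:
                del remaining[item]
                running[pool.submit(restore_one, item)] = item
            if not running:
                raise ValueError("dependency cycle or unknown prerequisite")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                item = running.pop(future)
                future.result()  # re-raise any error from the worker
                finished.append(item)
                for need in remaining.values():
                    need.discard(item)
    return finished
```

For example, with deps = {"index_a": {"table_a"}}, the index is only submitted once the table's data has been loaded, while unrelated tables proceed in parallel. In the released tool, the degree of parallelism is chosen with pg_restore's --jobs option.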

Features for the next release

As mentioned earlier on LWN, PostgreSQL's core team had hopes of other major features being in the 8.4 release, particularly Hot Standby and SE-PostgreSQL.

Hot Standby is a major step toward improving PostgreSQL's replication. Replication is an area in which PostgreSQL lags behind MySQL; it has been identified by the core team as one of the major priorities for the project, according to Berkus. PostgreSQL does have some replication, Berkus said, but it is needlessly complicated and "not for someone who can't make a large time investment in finding out how it works. And that's a real problem for the sort of average-case web developer that has two servers and just wants to make sure that PostgreSQL is mirrored."

Hot Standby is an important improvement in replication that allows administrators to run queries on a database that is being recovered from an archive. The module allows replication of the database logs in order to create read-only duplicates of the database. Unfortunately, funding for development of the feature only came through in August 2008, leaving only a few months before the November code freeze. Nor could Hot Standby be finalized in the extended testing period that followed.

SE-PostgreSQL is an even greater innovation, which will add the SE Linux security model, making PostgreSQL the first database to use the same security model as many distributions, such as Fedora. The problem is, Berkus explained:

Because we are the trendsetters, it is very hard to advance in this area. There's no standard defined syntax, and all the papers on the topic are highly academic and speculative, so it means that we really have to spend a lot of time implementing things down to the API level and spending long periods of time arguing over details of which security features should be implemented in any context.

Having failed to be ready for the 8.4 release, both Hot Standby and SE-PostgreSQL are high on the priority list for the next release. In fact, according to Momjian, 8.4 is partly designed to ease their later implementation. Momjian suggested that, given the maturity of PostgreSQL's code, these might be simply the first of many new features that are too large to be implemented in a single release. Meanwhile, he planned on promoting SE-PostgreSQL in the release notes in the hopes of encouraging interest in it and perhaps attracting a few developers with SE Linux experience.

"To continue to grow and continue to have the sort of reputation we have, you have to make some hard decisions," Momjian said. Like Berkus and Deckelmann, Momjian clearly regretted the necessity for omitting these features. However, having faced long delays because of the refusal to jettison unfinished features before — notably in the implementation of the Windows port in version 7.4 and more generally in the sheer number of features in version 8.3 — the PostgreSQL core team has learned the hard way not to insist that a release include all the features they hoped it would have.

Lessons from the release

Besides the features, the 8.4 release has been important to the PostgreSQL team in its effort to regain control of its development processes. The project has always had an extensive review process, in which veteran developers were counted on for the final review of contributions. However, ever since the 8.3 release, Momjian explained:

We're getting really major, large, complex patches almost every week. And obviously that's very hard for the veterans to digest. One of the challenges is that we've not grown our veterans' group as quickly as we have our submitters. It's like a snake swallowing a mouse — it takes a while to go through. You have this bulge of activity, and it's really struggling just to digest so many complex patches.

To make matters worse, the review process can be daunting, especially to new contributors. Consequently, many were working in private and submitting their code only at the last moment before a freeze. To reduce these problems, PostgreSQL implemented a series of what it calls CommitFests — alternate months in which no contributions are allowed, and project members concentrate on reviewing existing contributions. This solution helped reduce the problems, but did not eliminate them.

In addition, Berkus said that CommitFests did not address complaints he had heard about people who would review a patch, only to find that a veteran had reviewed and submitted the patch before they had finished. Another challenge was how to enlist the aid of those who said that they were willing to help review out of public-spiritedness, but did not want the trouble of picking out patches to review by themselves. Solving such issues seemed central to increasing the efficiency of reviewing and in keeping CommitFests from extending beyond their originally allotted time.

Realizing that greater efforts were needed, the core team is now using a wiki to coordinate each CommitFest. In addition, over the last half year or so, Berkus created a team of what he calls "Round-Robin Reviewers": those who wanted their reviews assigned. He also arranged for less experienced contributors to review routine matters such as patch structure. The core team and veterans still have to do the majority of the work, but the process has reduced the workload, and added 25 reviewers to the previous pool of 15.

The changes can still make for a slow process, but Berkus apparently judges them a qualified success, noting that the 8.4 release will likely take several months less than the 8.3 release.

Conclusion

Overall, Momjian seems satisfied with the general direction of PostgreSQL. He even suggested that the project is starting to gain increased recognition.

In the past year or so, I've been seeing PostgreSQL put up as, I won't say a model open source project, but one that's getting the kind of respect for office disruption that OpenOffice is getting. It is starting to be seen as a valid competitor to Oracle. I've heard people say, 'What is the business case for buying Oracle when 80% of its functionality can be found in PostgreSQL, and in some cases it's easier to use and easier to administer, and, in most cases, cheaper?'

Momjian speculated that the recession may be driving this increased respect.

Berkus pointed, too, to the increased sponsorship of the program, which now means that more developers are being paid to work on the project full-time, and to the increased number of PostgreSQL conferences and user groups worldwide. "Compared to other open source databases, we are still developing very quickly," Berkus said. "We still have developer momentum." As you look over the feature list for 8.4 as well as the features that were dropped, it is hard to disagree. While stopping short of being revolutionary, the new release suggests sustained, steady progress in development.

ELC/LFCS2009: A tale of two panels

By Jake Edge
April 15, 2009

Two kernel panel sessions were held last week in San Francisco, one for each of two conferences sharing facilities—and participants. In both cases, the kernel developers making up the panel were asked about various kernel features and developments, both from a historical and future perspective, but each had its own focus as well. The Embedded Linux Conference (ELC) panel was, unsurprisingly, focused on topics of most interest to the embedded community, while the Linux Foundation Collaboration Summit (LFCS) panel looked at more mainstream kernel concerns.

ELC: Embedded Linux Kernel Features and Developments

Besides the venue, the panel sessions also had another thing in common: LWN Executive Editor Jonathan Corbet, who moderated the LFCS panel and was a member of the ELC version. The ELC panel was moderated by CE Linux Forum (CELF) architecture group chair Tim Bird, while embedded maintainer David Woodhouse and Matt Mackall, developer of the SLOB memory allocator (along with various other kernel tasks), rounded out the panel. Bird asked most of the questions, but the audience also got the opportunity to ask a few too.

One of the themes of the discussion—as well as Woodhouse's earlier keynote—was the convergence of features between so-called "big iron" (servers and mainframes) and embedded devices. Corbet was amused to see "highmem" support recently added for ARM processors, noting that it was a controversial feature at the time it was added for servers; supporting a full 1GB of RAM on a 32-bit processor was once a "big iron" problem. Mackall also pointed to SMP and NUMA support moving into the embedded architectures. But things are not only moving in that direction, Mackall said, there is recognition from the big iron developers that there is value for their systems in some of the embedded features too.

Bird asked the panel about the proliferation of embedded distributions and whether that was a good or bad thing. Woodhouse said "fragmentation doesn't have to be bad"; it's only bad when a distribution doesn't work well with the various upstreams and goes off and does its own "weird things". Multiple distributions are one of the "great strengths of Linux", Corbet said, as it provides playgrounds where folks can experiment with different approaches.

Mackall pointed to a lack of community involvement in the various embedded Linux distributions, noting that the most successful desktop distributions were those with a strong community. In the mobile space, the distributions are "coming from the top down", he said; for any of them to be successful, they need to get community feedback.

The impact and usefulness of new "social networking" sites for Linux developers—like MontaVista's meld and the LF's relaunch of the Linux.com community—was another question Bird put to the panel. Woodhouse didn't really see the need, but "communication is always good". Mackall was concerned that these other services not become a "substitute for talking to the Linux kernel community through its normal channels". Corbet noted that there is value in "small town environments", but there is a risk that they can become inbred. "It rarely leads to good things" when a small community gets headed off in their own direction, he said.

One of the more interesting exchanges centered around the question of what a developer who just has a small amount of time can do to assist the larger community. The discussion spread out from there, though. Woodhouse stated that every developer needed to make sure that what they are working on can go upstream even if their managers "need to be whipped to allow you to do that". But Mackall wanted to "back up a step" and ensure that developers are running Linux on their desktop.

Mackall said that developers should be running Linux at home as well; if they are going to work with Linux, they should "live it". Making it work on a laptop is a good exercise; if it doesn't work, figure out why and fix it. He has seen too many embedded Linux developers with Windows desktops who don't understand Linux well enough to properly develop on it. "They don't have the Linux mindset", he said.

Those thoughts were echoed by Woodhouse as he related an anecdote about some embedded developers who would FTP a file to the Windows box, edit it using Notepad, then FTP it back to the Linux machine. It is not efficient to do things that way, he said. Doing the development on Linux will lead to a better result, Mackall said, and doing everything else on a Linux desktop helps too: "you should read your mail on it too".

Towards the end of the hour-long session, Bird asked "have we won?": is embedded Linux unstoppable, or "is it possible to lose?" Mackall and Corbet had similar thoughts, worrying about the proliferation of devices running Linux that could not be modified by their users. "We haven't won until I can put my code on my phone", Mackall said. Corbet echoed that: "If we end up populating the world with locked-down Linux systems, then we've lost".

In closing, Bird noted that embedded Linux has made an "awful lot of progress". This is the fifth year for ELC and he has been working on embedded Linux for 17 years; over that time, "things have gotten way better", he said.

LFCS: The Linux Kernel: What's Next

Corbet opened the panel by having the participants introduce themselves to an audience of around 400 people. The panel consisted of X.org project lead Keith Packard of Intel; Andrew Morton, the kernel "odd job man" from Google; USB maintainer Greg Kroah-Hartman of Novell; and Ted Ts'o of IBM, who is currently on loan to the LF as its CTO. After that, Corbet got started by asking Kroah-Hartman about the -staging tree.

Approximately one-third of the code that was merged as part of the 2.6.30 merge window came in via the -staging tree, which Kroah-Hartman maintains. Corbet said there was a lot of confusion about the tree and asked for an explanation. Basically, it is a collection of drivers that used to live outside of the tree, Kroah-Hartman said, consisting of bad code with bad API usage and other major problems barring their acceptance into the mainline; in other words, "crap". But there is a lot of hardware in use that requires those drivers, and the code was not being improved outside the tree, so moving the drivers in gives a centralized location where people can get them and, hopefully, improve them.

Kroah-Hartman said that there were several drivers that had graduated from -staging and into the mainline, so the process seemed to be working. "If you want to get involved in the kernel, that's a good place to start", he said. He noted that there are TODO files in each driver's directory listing the kinds of changes needed before the driver will be accepted into the mainline.

Corbet mentioned that he had been going to conferences for years hearing about all the great things that were going to be done in the Linux graphics area, but that we had now reached a point where much of it had actually been done. He asked Packard to fill the audience in on what had been done and "why it's cool". Packard described how X.org had "turned the graphics stack upside down" by moving the device configuration out of user space and into the kernel.

By doing that, X becomes just an API for existing applications, and other APIs such as OpenGL or Wayland can be considered, he said. Support for Intel graphics is good, and there is lots of work going on for Radeon (ATI) chipsets, but NVIDIA is "not helping at all". He pointed out that Fedora 11 will be shipping with the Nouveau driver for NVIDIA hardware because it has surpassed the free 'nv' driver in capabilities. He also noted that moving the configuration and initialization into the kernel allows people to experiment with graphics acceleration without spending an inordinate amount of time figuring out how to initialize the hardware.

Next, Corbet asked Ts'o about the status of the ext4 filesystem. Ts'o reported that Fedora and Ubuntu would be shipping it in their next releases, which are due within a few months. He said that the user community was growing and "to be brutally honest, that will sometimes find bugs". He said one goal is to get it into the next round of enterprise distributions. He also noted that ext4 is a temporary solution, based on BSD FFS, which is technology from the 70s. Btrfs, nilfs, and others are where the interesting filesystem development is happening. All of those make it an "exciting time" to be a filesystem developer.

Morton responded to a query about the linux-next tree by saying that it is working out well, overall, as a place for integration and testing. But, he said that he was "a bit disappointed with the uptake it has", especially from a testing perspective. Fewer maintainers are taking advantage of the opportunity to integrate and test using linux-next than he would like to see. It is often the case that when a problem shows up in Linus Torvalds's tree, it is because the code never made it into linux-next.

From the audience, ftrace developer Steven Rostedt observed that there is pressure to merge new code upstream into the mainline, yet major resistance to merging certain things; he mentioned SystemTap and utrace. He wanted to know what can be done to resolve that. Morton responded that for device drivers or supporting new architectures, the path is easier, but that the two examples Rostedt gave touch core kernel code. Morton likened the utrace battle to an "incestuous family struggle", but noted that the code needed improvement before it could go in.

One of the reasons that utrace didn't make it into the kernel was a lack of an in-kernel user of the code, Rostedt noted. Morton responded that having an in-kernel user for a feature is a "nice checkbox", because it gives the kernel community a means to test the code. But, Kroah-Hartman pointed out that "changing core kernel code is hard, and it should be". Ts'o also pointed out that several core kernel developers are helping out with utrace, which should significantly smooth its path into the mainline.

That discussion led Corbet to ask about tracing, noting that there were several tracing solutions that were still out of the tree, but that ftrace got new tracers added for each kernel release. Morton would like to "see evidence that people are using them and getting good results". Both he and Ts'o pointed at the lack of documentation for various tracers, saying that adding that and making the tracing more usable would help get more of that code into the mainline kernel.

The recently proposed nftables packet filtering subsystem was raised by Corbet as an example of a place where a user-space interface (the existing iptables) might be supplanted. He asked how that transition could be accomplished. Morton called it a "pretty traumatic transition" that would require a compatible set of tools, with several years of warning along with buy-in from the distributions. That takes three to four years according to Kroah-Hartman. Ts'o called the packet filtering interface more of an administrative interface that doesn't have to be kept as stable as others, though the iptables command itself does need to remain stable.

All of that led Packard to complain about the difficulties of keeping the current user-space interface for X servers while moving modesetting into the kernel. According to Packard, there are exactly two users of the interface, both of which are under his control, so why does he need to provide backward compatibility? Ts'o said that the problem would be for distribution users who wanted to upgrade their kernel. Because the distribution might use an old X server, that interface—which Packard described as "open /dev/mem"—needs to be maintained. Kernel hackers want as much testing of new kernels as they can get, so any barrier to that testing is problematic.

At the end of the session, LF Executive Director Jim Zemlin announced the first ever LF "Unsung Hero" award, which he then presented to Morton. He explained that Morton is an avid car racer, so the LF arranged for him to have a day at the track as a reward. It was no surprise that there was much applause for Morton—one of the few people actually able to follow the linux-kernel mailing list. He also reviews an incredible amount of the code that ends up in the kernel.

These sessions provide an interesting view into the thinking of the members of the panel, one not easily derived from just keeping up with the technical side of Linux development via the LWN Kernel page or even by sifting through linux-kernel. They also give attendees a look at what's coming in the future, which can be hard to discern, though Corbet's Linux Weather Forecast is helpful there. In the final analysis, though, the biggest benefit may just be putting kernel developers and users together in a fairly informal setting so that both sides get a better feel for the other. Faces and personalities don't necessarily jump out of the normal communication channels, so panel sessions like those that went on in San Francisco are useful well beyond their technical content.

Book Review: Pragmatic Version Control Using Git

By Jake Edge
April 13, 2009

Given the ubiquity of Git as a version control system throughout the free software community, one would expect there to be more books about it. So far, that is not the case—though there are indications that is changing—so Travis Swicegood's Pragmatic Version Control Using Git is welcome for those trying to come up to speed on Git. Overall, the book provides a nice starting point, though there are some rough spots.

Like any book covering a free software package, this one begins with some important basics: where to get and how to install the tool. For Linux users, this guide is probably unnecessary as Git is packaged for most distributions these days—Mac OS X or Windows users may find more of interest. The discussion of Git configuration, along with the reminder to set the user.name and user.email parameters before doing any commits, something that I regularly forget when setting up a new machine, is quite useful for all.
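For readers who, like the reviewer, tend to forget this step, the identity setup can be scripted. The sketch below simply wraps the standard git config invocations; the helper names identity_commands and set_identity are hypothetical, and the name and email in the usage note are placeholders.

```python
import subprocess


def identity_commands(name, email, scope="--global"):
    """Build the `git config` invocations that set the committer identity."""
    return [
        ["git", "config", scope, "user.name", name],
        ["git", "config", scope, "user.email", email],
    ]


def set_identity(name, email, scope="--global"):
    """Run the commands; raises CalledProcessError if git reports a failure."""
    for argv in identity_commands(name, email, scope):
        subprocess.run(argv, check=True)
```

Calling set_identity("Ada Example", "ada@example.org") on a new machine is equivalent to typing the two git config commands by hand.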

Unlike some other authors, though, Swicegood takes the time to give a bit of the flavor of Git through a discussion of its concepts—along with some indication of why one might want to use it—before descending into the much more boring installation guide. He takes a "30,000 foot" view of the tool and, with no command syntax or specific usage details, spells out what Git can do.

One of the primary problems that any text on a version control system (VCS) must overcome is the need to give "real-world" examples while still keeping the book to a reasonable size. Swicegood does a good job here, by following one example repository throughout the text. One could quibble with the scope of some of his examples, but, by and large, they give a good idea of how things work. In some ways, the simplicity of those examples appears to encourage curious readers to do some experimentation. That is, after all, a pretty good way to learn how to use a tool.

The book is broken up into three main sections (plus an Appendix with a reference and some pointers to more information), but the meat of the text is in Section II, "Everyday Git". For whatever reason, the last chapter of the first section covers setting up local repositories as well as cloning remote repositories. That might make sense, but it is rather puzzling that it starts talking about things like git rebase, branches, and doing releases here. Much of that is covered in further detail later and it doesn't seem to belong.

In Section II, the book does an excellent job of covering how to use Git on a day-to-day basis. I have found myself referring to it several times since reading it to remind myself of the syntax of a command, or the name of the command itself. The sequence is logical, starting with adding and committing files, moving through branch creation and management as well as examining and working with history in Git, and completing the core with a look at remote repositories. Two additional chapters cover somewhat more advanced (or just less often used) features, such as organizing the repository and working with multiple remote projects, as well as things like compacting a repository and working with the reflog.

Swicegood uses the term "staging" for what is commonly referred to, at least in other Git documentation, as the "index". Some readers, especially if they are already well-versed in Git, may find this a bit confusing, but I found that it made sense and, in some ways, simplified the concept. In any case, it seems clear that is how Swicegood envisions the Git index, so passing it along to his readers is a nice touch.

There is no specific mention of the Git version covered by the book—though some early examples mention 1.6.0.2—which is a rather large oversight. Git development moves rapidly, so some of what Swicegood talks about could well be out-of-date. New Git features, such as the unmentioned git stash, were left out, but it isn't clear whether that was done on purpose or because they were added after the book was completed. Most of what is covered should be unaffected, though, as the basic operation of the tool is fairly stable.

The third, thinnest and weakest section is "Administration", which covers migrating to Git and running Git servers. Both chapters seem to suffer from a lack of breadth. In the migration chapter, only CVS and Subversion are considered, and tools like tailor are not even mentioned.

Two things about Swicegood's choices of Git features stood out in a negative way. He seems overly enamored of git rebase, which certainly has its place, but it has some drawbacks that he doesn't fully caution against. His solution for how to create a repository for others to use was somewhat unsatisfying; Git itself can be configured to support such things. Instead, Swicegood reaches for Gitosis, a Python tool for managing remote Git repositories. The project seems to have no web page (other than a gitweb page) and one must install it by cloning its repository. Given that there is no mention of how to "manually" set up a Git server, it all seems a bit strange.

There are a handful of less-substantive complaints I could make as well: a throwaway George Santayana quote on the history chapter was a bit annoying, an embarrassing "EMCAScript" typo in one of the examples stood out, as did a few other minor flaws. Swicegood complains frequently about having to truncate or otherwise modify the output of commands to fit on the page, which seems a bit silly. Either fix the problem somehow in the production process or ignore it to the extent possible; involving the reader in the pain of the typographic process seems unnecessary. But these are nits.

While I had some complaints—it is a rare book indeed where I don't—Pragmatic Version Control Using Git has certainly found a spot for itself on my shelf. It especially shines as a quick reference to commands needed daily or nearly so. It will also provide a good starting point for those who wish to learn Git from scratch. Once other Git books come out, it will be interesting to see which end up on my shelf and which are shuffled off to long-term storage. In the end, that is the best test for a good book.


Another Linux capabilities hole found

By Jake Edge
April 15, 2009

A recent patch posted to the linux-kernel mailing list fixes a long-standing flaw in the Linux capabilities implementation. The problem has existed since capabilities were added to the kernel during the 2.1 development series—more than ten years ago. One of the obvious questions is how a bug of that sort could have escaped notice for so long.

The problem was reported in March by Igor Zhbanov, who provided an excellent analysis of the flaw and how it can be exploited. The basic problem lives in the VFS and NFS code, which tries to drop privileges, by way of capabilities, before performing operations. The mask of capability bits used for that purpose does not include CAP_MKNOD (the ability to make a device node entry) or CAP_LINUX_IMMUTABLE (which allows changing the S_APPEND and S_IMMUTABLE file attributes). That means that those capability bits are not removed before the file operation is performed.
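To make the mask problem concrete, here is a minimal sketch in Python rather than kernel C. The capability numbers are those defined in <linux/capability.h>; the names OLD_FS_MASK, FIXED_FS_MASK, drop_fs_caps(), and has_cap() are hypothetical, and the mask contents follow the description above rather than the actual kernel source.

```python
# Capability numbers as defined in <linux/capability.h>.
CAP_CHOWN = 0
CAP_DAC_OVERRIDE = 1
CAP_DAC_READ_SEARCH = 2
CAP_FOWNER = 3
CAP_FSETID = 4
CAP_LINUX_IMMUTABLE = 9
CAP_MKNOD = 27


def mask_of(*caps):
    """Build a bitmask with one bit set per capability number."""
    mask = 0
    for cap in caps:
        mask |= 1 << cap
    return mask


# Hypothetical stand-ins for the filesystem mask before and after the fix,
# following the article's description rather than the kernel source.
OLD_FS_MASK = mask_of(CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
                      CAP_FOWNER, CAP_FSETID)
FIXED_FS_MASK = OLD_FS_MASK | mask_of(CAP_MKNOD, CAP_LINUX_IMMUTABLE)


def drop_fs_caps(effective, fs_mask):
    """Clear every capability named in fs_mask from the effective set."""
    return effective & ~fs_mask


def has_cap(effective, cap):
    """Report whether a single capability bit is set."""
    return bool(effective & (1 << cap))


# A user who was handed CAP_MKNOD alongside an ordinary filesystem capability.
effective = mask_of(CAP_CHOWN, CAP_MKNOD)

after_old = drop_fs_caps(effective, OLD_FS_MASK)
after_fixed = drop_fs_caps(effective, FIXED_FS_MASK)

print(has_cap(after_old, CAP_MKNOD))    # True: the bit survives the drop
print(has_cap(after_fixed, CAP_MKNOD))  # False: the fixed mask clears it
```

Any bit missing from the mask simply passes through the drop untouched, which is why an incomplete mask can go unnoticed: nothing fails, the privilege just remains.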

Zhbanov shows that on a compromised client machine, the root user could give another user CAP_MKNOD, which would allow that user to run the mknod command and create a device entry owned by them. If this were done on an NFS-mounted filesystem, that entry would be created on the server, still owned by the user. This works even if the root_squash option (which essentially maps root users on client machines to "nobody" on the server machine) was used on the export.

If the user on the compromised machine can execute code on the server or any other client, they can directly access the device that underlies the device node entry. They will not require any special permissions on the other machines because the device node is owned by them. For example, creating the equivalent of /dev/hda on the server's filesystem might allow direct access to the hard disk block device on any system that had the NFS filesystem mounted. Uglier exploits can certainly be imagined.
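The reason the node works on any machine is that a device node stores no data of its own, only a file type and a major/minor number pair that the opening machine's kernel resolves against its own drivers. The sketch below computes those values on a Unix-like system without creating anything; describe_block_node() is a hypothetical helper, and the major 3, minor 0 numbering for /dev/hda is the traditional Linux assignment for the first IDE disk, stated here as an assumption.

```python
import os
import stat

# Assumption: /dev/hda is block device major 3, minor 0, the traditional
# Linux numbering for the first IDE disk.
HDA_MAJOR, HDA_MINOR = 3, 0


def describe_block_node(major, minor, perms=0o600):
    """Return the (mode, device number) pair that fully defines a block node."""
    mode = stat.S_IFBLK | perms
    device = os.makedev(major, minor)
    return mode, device


mode, device = describe_block_node(HDA_MAJOR, HDA_MINOR)

# The node records a type and two numbers; nothing ties it to one machine's disk.
print(stat.S_ISBLK(mode))                  # True
print(os.major(device), os.minor(device))  # 3 0
```

Because ownership is stored with the node on the server, every client that mounts the export sees the same owner, and the numbers then select whatever local device happens to match.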

This is clearly a nasty problem. Linus Torvalds merged the fix for the recently released 2.6.30-rc2 kernel. One would guess the -stable tree folks won't be too far behind. Serge Hallyn also provided patches for 2.4 and 2.2 kernels, though the latter has become completely unsupported.

The patch was greeted with a question from Valdis Kletnieks: "Wow. How did this manage to stay un-noticed for this long?" Torvalds had a characteristically blunt answer: "Because nobody uses capabilities?" While that might explain how the bug went undetected for so long, it doesn't help alleviate the problem. Whether folks are using capabilities or not is irrelevant; the kernel itself certainly is.

This is not the first time capabilities have been the source of a nasty, exploitable hole. The unfortunately-named "sendmail-capabilities bug" provided a way to gain root privileges by exploiting the way sendmail dropped its privileges. The solution, when this bug was found in 2000, was to "cripple" capabilities in the kernel by disabling capability inheritance. That functionality was not restored until relatively recently.

If distributions and other users were doing more with capabilities, it does seem likely that this particular problem would have been seen sometime in the last decade. But, by and large, Torvalds is right. For one thing, capabilities are a Linux-specific feature, so anyone writing portable code is likely to avoid using them. In addition, they are fairly difficult to wrap your head around; that complexity tends to lead folks to ignore capabilities.

There have been some efforts at using capabilities in distributions more, but one has to wonder how many more exploits still lurk in that code. It is hard to imagine removing capabilities at this late date—it is a user-space interface from the kernel after all—but some must be wondering if the feature is worth all the trouble it has caused.


clamav: denial of service

Package(s):	clamav
CVE #(s):	
Created:	April 14, 2009
Updated:	April 15, 2009
Description:	From the Ubuntu advisory: It was discovered that ClamAV did not properly verify buffers when processing Upack files. A remote attacker could send a crafted file and cause a denial of service via application crash.
Alerts:	Ubuntu	USN-756-1	clamav	2009-04-13

Alternative Database Language Support
* means language support is pending
Cassandra:	C++, C#, Java, Perl, Python, PHP, Erlang, Ruby
Memcached:	C/C++, C#, Java, Perl, Python, PHP, Ruby, Lua, OCaml, Common LISP
Tokyo Cabinet:	C/C++, Java, Perl, Ruby, Lua
Redis:	C/C++, Java*, Perl, Python, PHP, Erlang, Ruby, Lua, Tcl
CouchDB:	C#, Java, Perl, Python, PHP, Erlang, Ruby, Haskell, JavaScript, Common LISP
MongoDB:	C++, Java, Python, PHP, Erlang*, Ruby

Date(s)	Event	Location
April 20 April 24	samba eXPerience 2009	Göttingen, Germany
April 20 April 23	MySQL Conference and Expo	Santa Clara, CA, USA
April 20 April 24	Perl Bootcamp at the Big Nerd Ranch	Atlanta, GA, USA
April 20 April 24	Cloud Slam '09	Online, Online
April 22 April 25	ACCU 2009	Oxford, United Kingdom
April 23 April 26	Liwoli 2009	Linz, Austria
April 23	Linuxwochen Austria - Linz	Linz, Austria
April 23 April 24	European Licensing and Legal Workshop for Free Software	Amsterdam, The Netherlands
April 25 May 1	Ruby & Ruby on Rails Bootcamp	Atlanta, Georgia, USA
April 25 April 26	LinuxFest Northwest 2009 10th Anniversary	Bellingham, Washington, USA
April 25	Linuxwochen Austria - Graz	Graz, Austria
April 25	Festival Latinoamericano instalación de Software libre	All Latin America, All Latin America
April 25	Grazer Linux Tage 2009	Graz, Austria
April 27	OSDM 2009	Bangkok, Thailand
May 4 May 8	JavaScript/Ajax Bootcamp at the Big Nerd Ranch	Atlanta, Georgia, USA
May 4 May 7	RailsConf 2009	Las Vegas, NV, USA
May 4 May 6	EuroDjangoCon 2009	Prague, Czech Republic
May 4 May 6	SYSTOR 2009---The Israeli Experimental Systems Conference	Haifa, Israel
May 5	Linuxwochen Austria - Salzburg	Salzburg, Austria
May 6 May 9	Libre Graphics Meeting 2009	Montreal, Quebec, Canada
May 6 May 8	Embedded Linux training	Maynard, USA
May 7	NLUUG spring conference	Ede, The Netherlands
May 8 May 10	PyCon Italy 2009	Florence, Italy
May 8 May 9	Linuxwochen Austria - Eisenstadt	Eisenstadt, Austria
May 8 May 9	Erlanger Firebird Conference 2009	Erlangen-Nürnberg, Germany
May 11	The Free! Summit	San Mateo, CA, USA
May 13 May 15	FOSSLC Summercamp 2009	Ottawa, Ontario, Canada
May 15 May 16	CONFidence 2009	Krakow, Poland
May 15	Firebird Developers Day - Brazil	Piracicaba, Brazil
May 16 May 17	YAPC::Russia 2009	Moscow, Russia
May 18 May 19	Cloud Summit 2009	Las Vegas, NV, USA
May 19 May 22	PGCon PostgreSQL Conference	Ottawa, Canada
May 19	Workshop on Software Engineering for Secure Systems	Vancouver, Canada
May 19 May 22	php\|tek 2009	Chicago, IL, USA
May 19 May 21	Where 2.0 Conference	San Jose, CA, USA
May 19 May 22	SEaCURE.it	Villasimius, Italy
May 21	7th WhyFLOSS Conference Madrid 09	Madrid, Spain
May 22 May 23	eLiberatica - The Benefits of Open Source and Free Technologies	Bucharest, Romania
May 23 May 24	LayerOne Security Conference	Anaheim, CA, USA
May 25 May 29	Ubuntu Developers Summit - Karmic Koala	Barcelona, Spain
May 27 May 28	EUSecWest 2009	London, UK
May 28	Canberra LUG Monthly meeting - May 2009	Canberra, Australia
May 29 May 31	Mozilla Maemo Mer Danish Weekend	Copenhagen, Denmark
May 31 June 3	Techno Security 2009	Myrtle Beach, SC, USA
June 1 June 5	Python Bootcamp with Dave Beazley	Atlanta, GA, USA
June 2 June 4	SOA in Healthcare Conference	Chicago, IL, USA
June 3 June 5	LinuxDays 2009	Geneva, Switzerland
June 3 June 4	Nordic Meet on Nagios 2009	Stockholm, Sweden
June 6	PgDay Junín 2009	Buenos Aires, Argentina
June 8 June 12	Ruby on Rails Bootcamp with Charles B. Quinn	Atlanta, GA, USA
June 10 June 11	FreedomHEC Taipei	Taipei, Taiwan
June 11 June 12	ShakaCon Security Conference	Honolulu, HI, USA
June 12 June 13	III Conferenza Italiana sul Software Libero	Bologna, Italy
June 12 June 14	Writing Open Source: The Conference	Owen Sound, Canada
June 13	SouthEast LinuxFest	Clemson, SC, USA
June 14 June 19	2009 USENIX Annual Technical Conference	San Diego, USA
June 17 June 19	Open Source Bridge	Portland, OR, USA
June 17 June 19	Conference on Cyber Warfare	Tallinn, Estonia
June 20 June 26	Beginning iPhone for Commuters	New York, USA
