
LWN.net Weekly Edition for April 24, 2008

The Grumpy Editor encounters the Hardy Heron

By Jonathan Corbet
April 23, 2008
Your editor is not always known for making life easy for himself. Perhaps one of the clearest examples of masochistic behavior would be a certain preference for running development distributions on mission-critical systems. That said, your editor has stuck with a stable distribution on his laptop through a round of intensive travel earlier this year. But that was too easy, so, shortly before heading off to the Linux Foundation's Collaboration Summit, the laptop got moved to the Ubuntu "Hardy Heron" distribution. Needless to say, there have been some interesting ups and downs (literally) since then.

There is always a certain thrill that comes with upgrading a system and finding that important features no longer work. In this case, the problem was suspend and resume, which your editor uses heavily. In fact, the system would suspend just fine - as long as one failed to notice that, behind the cleverly darkened screen, the laptop's backlight had been left on. Obviously, this new behavior is not helpful if one's goal is to save power while the system is suspended, but it gets worse than that. Your editor discovered this nice surprise after carrying the computer in a backpack for a few hours; by the time it came out, it was almost too hot to hold. Happily, no permanent damage appears to have been done.

Or, perhaps, unhappily. Your editor has been looking for an excuse to get a new laptop for a while.

The problem turned out to be a HAL configuration error combined with a strange internal model number which makes your editor's Thinkpad X31 different from, seemingly, every other X31 on the planet. Once your editor found the bug report and attached a "me too" comment, the solution was quick in coming. On the net, one can find complaints that Ubuntu is unresponsive to bug reports, but that was certainly not the experience here.

As an aside, it seems worth noting that life has gotten more complicated, with a lot more code wrapped around the kernel than there once was. The problematic configuration file was /usr/share/hal/fdi/information/10freedesktop/20-video-quirk-pm-ibm.fdi - not a place where your editor, who is not a HAL expert, would have thought to look. That, it seems, is the price of more capable hardware and software, but sometimes your editor pines for the days when it seemed possible to carry a full understanding of the system within a single brain.

GNOME developers are (perhaps unjustly in recent years) known for taking a minimal approach to configuration options. That can be irritating, but just as annoying is their tendency to reset the options they do provide over major updates. Once suspend and resume work, your editor demands something else of a laptop when traveling: absolute silence. So the return of beeps to gnome-terminal was not appreciated. Those were easily silenced, but the GNOME developers also saw fit to bring back the blinking cursor - and they took away the configuration option which abolishes that intolerable feature.

Your editor first ran into the unstoppable blink with Rawhide; a query to the developers there turned up a quick answer. It seems that the GNOME developers have decided to create a single, system-wide parameter to control blinking cursors. Now, your editor approves of the concept of being able to turn off that behavior everywhere with a single switch - but only as long as that switch isn't hidden where nobody will ever find it. In this case, the GNOME developers have taken this feature, wrapped it in old newspapers, and stashed it behind the furnace in the basement; then they put a trunk on top of it. It is a rare user who will find it unassisted. In the hopes that it may save one or two readers from some time spent with a search engine, your editor will now divulge the top-secret incantation which turns blinking cursors off:

    gconftool-2 --type bool --set /desktop/gnome/interface/cursor_blink false 

Naturally, a terminal window is required to run this command. It would have been nice if the developers who packaged this code for Hardy Heron had found a way to smooth over this change, but no such luck; as far as your editor can tell, no distributor has made that effort.

Another bit of fun is that your editor is no longer able to set the desktop background; the relevant configuration windows are ineffective. In this case, it would appear that the task of implementing the user's background choices has been moved to nautilus - just the place your editor would have thought to look for it. As it happens, your editor has no use for file managers and does not run nautilus - and is punished with an immutable Ubuntu-brown background for that sin. Happily, your editor still knows how to run xsetroot.

All of the above is a set of relatively minor grumbles, all of which were rectified in relatively short order. Once those details have been taken care of, the Hardy Heron release works quite well. One of the biggest aggravations from previous upgrades - having OpenOffice.org reformat the slides in all of your editor's presentations - was not present this time around. Hopefully we are moving into an era where "it didn't mangle my documents" is not something considered worthy of mention.

There was one very nice surprise as well. Your editor's laptop previously required almost 12 watts of power when running unplugged. This laptop is not at the bleeding edge of current technology, so the amount of time it was able to run without a recharge has been dropping for a while. With the Hardy release, steady-state power consumption has dropped to just over 9 watts - a big improvement. The credit for this change belongs to developers at all levels: kernel, applications, distributors, etc. The end result is a system which runs much more efficiently, and that is a good thing.

All told, your editor is reasonably content; this distribution looks like one which might just be worth keeping around. That's a good thing, since Ubuntu plans to maintain it as a "long-term support" release. Not that your editor intends to make much use of that long-term support; there should be a new development series starting soon, after all. One of the nice things about development distributions is that support never ends as long as one stays on the treadmill and the project itself remains alive.

Comments (161 posted)

ELC: A taste of the conference

By Jake Edge
April 23, 2008

Technical conferences generally provide a wealth of choices, to the point where participants have to make tough decisions at times to pick the session to sit in on. This year's Embedded Linux Conference was no exception; there were multiple slots where the author wished he could be in more than one place at a time. But he did manage to take notes in some of the sessions he attended; hopefully some of the conference flavor can come through in the following report.

Power management

MontaVista's Kevin Hilman presented an approach for handling power management on embedded devices that focused on changes that can be made to the kernel, but noted that there is much that can be done by applications too. Because of the time and money budgets available for embedded projects, many do not have the resources to do a complete job of tuning the kernel to get the best possible power performance. There is also no "one size fits all" solution for power management; there are too many device-specific issues to allow that.

Hilman's approach is to target specific "building blocks" that embedded developers can incorporate into their projects. Each block will provide some savings, so a project can stop when the desired performance is reached, or when it is time to ship the device. One of the easier steps is to customize the idle loop in the kernel, putting the processor to sleep when there is no work to be done. There are different kinds of sleep, though, generally trading off power savings against wakeup latency. The cpuidle subsystem provides a means to specify those values in an architecture-independent way, which, along with a platform-independent "governor", can put the processor into various sleep modes. The only platform-dependent pieces are the hooks to enter each of the different sleep states.
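The trade-off a cpuidle governor makes can be illustrated with a short C sketch. It is a simplified model, not the kernel's interface: the structure, its field names, and pick_sleep_state() are hypothetical, and the predicted idle time and the latency limit are simply taken as inputs.

```c
#include <stddef.h>

/* Hypothetical description of one sleep state; the field names are
 * illustrative and do not match the kernel's own structures. */
struct sleep_state {
    const char *name;
    unsigned int exit_latency_us;     /* time needed to wake back up */
    unsigned int target_residency_us; /* minimum idle time for a net saving */
};

/* Pick the deepest state that wakes up quickly enough and is expected to
 * be occupied long enough to save power.  States are assumed to be sorted
 * from shallowest to deepest, and state 0 is assumed to always be usable. */
size_t pick_sleep_state(const struct sleep_state *states, size_t count,
                        unsigned int predicted_idle_us,
                        unsigned int latency_limit_us)
{
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (states[i].exit_latency_us > latency_limit_us)
            break; /* too slow to wake from */
        if (states[i].target_residency_us > predicted_idle_us)
            break; /* not worth entering for so short an idle period */
        best = i;
    }
    return best;
}
```

Deeper states save more power but cost more to leave, so the loop stops at the first state that is either too slow to wake from or not worth entering. Estimating the idle time is the hard part in practice, and this sketch does not attempt it.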

A similar approach is taken by the CPUfreq subsystem, which lowers the CPU's clock frequency to reduce power consumption, using the Dynamic Voltage and Frequency Scaling (DVFS) feature of some processors. "Operating points" (OPs), which are voltage and frequency tuples, are defined for the hardware. Various generic CPUfreq governors can then be used to determine when to change OPs and which one to change to. The governor will invoke a platform-specific driver to effect that change. In addition, power management "quality of service" is currently being discussed as a way to allow applications to request a certain level of performance that may override some of the lower-level sleep or frequency decisions.
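Choosing an operating point can be sketched in the same way. The table layout and pick_op() below are hypothetical, with made-up frequencies and voltages; a real CPUfreq governor would hand the chosen point to the platform driver rather than simply return an index.

```c
#include <stddef.h>

/* Hypothetical operating point: a (frequency, voltage) pair.  The table is
 * assumed to be sorted by ascending frequency and to hold at least one entry. */
struct op_point {
    unsigned int freq_khz;
    unsigned int voltage_mv;
};

/* Simplified governor decision: estimate the frequency the workload really
 * needs (load percentage of the current frequency), then choose the slowest
 * operating point that still covers it.  Falls back to the fastest point. */
size_t pick_op(const struct op_point *ops, size_t count,
               unsigned int cur_freq_khz, unsigned int load_pct)
{
    unsigned long long needed =
        (unsigned long long)cur_freq_khz * load_pct / 100;
    for (size_t i = 0; i < count; i++) {
        if (ops[i].freq_khz >= needed)
            return i;
    }
    return count - 1;
}
```

Pairing voltage with frequency is what makes DVFS worthwhile: dynamic power in CMOS logic scales roughly with the square of the voltage times the frequency, so running slower at a lower voltage saves more than running slower alone. A practical governor would also leave some headroom above the measured load to avoid bouncing between points.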

Embedded SELinux

SELinux has a well-earned reputation for being able to restrict processes to only those resources that have been specifically allowed by policy, but it is rather resource-intensive. Yuichi Nakamura presented Hitachi's research into bringing SELinux into more resource-constrained embedded environments. One of the first problems they encountered was the need for flash filesystems that support extended attributes (xattrs), which is where SELinux stores labels for files. Among flash filesystems, only jffs2 currently supports xattrs, so that is the one they used.
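On Linux, a file's SELinux label lives in the security.selinux extended attribute, so a filesystem without xattr support simply has nowhere to keep it. A minimal sketch using getxattr(2); the function name read_selinux_label() is illustrative.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Read a file's SELinux label from the "security.selinux" extended
 * attribute.  Returns the label length, or -1 with errno set: ENOTSUP when
 * the filesystem lacks xattr support, ENODATA when the file has no label. */
ssize_t read_selinux_label(const char *path, char *buf, size_t bufsize)
{
    if (bufsize == 0) {
        errno = ERANGE;
        return -1;
    }
    /* Leave room for a terminating NUL so the label can be printed. */
    ssize_t len = getxattr(path, "security.selinux", buf, bufsize - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';
    return len;
}
```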

The next big hurdle was getting a set of policies stripped down to the needs of an embedded platform. Nakamura started with the SELinux reference policy (refpolicy) and began removing rules. The sheer number of rules and policies that needed to be removed was daunting, as was the need to understand what was being removed. He also ran into strange dependencies: removing a sendmail policy caused a problem in the apache rules. The solution was to create a simplified policy language and policy editor that reduced the problem to something more tractable for the embedded world. In the process, the size of the policy files shrank from 4.6MB to 60KB.

Performance and size, common embedded woes, were another problem. Through some hand optimization of the read/write path, along with the removal of some unused permission checks, they were able to increase performance by a factor of ten on their SuperH reference platform. By changing some static buffers in SELinux to dynamic allocations, they also saved 250KB of runtime memory. Much of that work was merged into 2.6.24. There is still work to be done, but with these changes SELinux is viable for embedded platforms.

GCC and kernel hacking

Two sessions provided various tips and tricks for embedded development, with Gene Sally of Timesys focused on GCC, while IBM's Hugh Blemings shared some of the things he has learned from the kernel hackers he works with.

Sally discussed the different ways that developers can get a GCC toolchain for their target processor. One of the bigger hurdles an embedded developer faces is getting a cross-compiler toolchain: one that runs on the development workstation but generates code for the target platform. The toolchain can be obtained as a tarball for popular development/target combinations, by using helper tools like crosstool or buildroot (from the uClibc project), or by building it from source directly.

Building from source is the most difficult, of course, but allows the most customization and flexibility. Sally went on to describe a handful of useful GCC command-line options for helping to debug cross-compilers or just to better understand what GCC is doing:

  • gcc -### - show what GCC would have executed
  • gcc -v - show what GCC is executing
  • gcc -g x.c -o x; objdump -S x - show the C and generated assembly code
  • gcc -E -dM - </dev/null - show all predefined GCC macros
  • gcc -C -E - show pre-processor output, but leave comments intact
  • gcc -M - show all include file dependencies (for use in Makefiles)
  • gcc -MM - like above, but ignore system include files
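The predefined macros listed by gcc -E -dM are also how portable code detects its target. The sketch below (function names are illustrative) returns a different string depending on which compiler builds it, which makes it a quick sanity check that a cross-compiler really targets the intended architecture.

```c
#include <limits.h>

/* Report the target architecture using macros that GCC predefines.
 * "gcc -E -dM - </dev/null" lists the full set for a given compiler. */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__aarch64__)
    return "arm64";
#elif defined(__arm__)
    return "arm";
#elif defined(__powerpc__)
    return "powerpc";
#elif defined(__sh__)
    return "superh";
#elif defined(__mips__)
    return "mips";
#else
    return "unknown";
#endif
}

/* Pointer width often differs between host and target, a common source
 * of cross-compilation surprises. */
unsigned int pointer_bits(void)
{
    return (unsigned int)(sizeof(void *) * CHAR_BIT);
}
```

Running gcc -E on this file with both the native compiler and the cross-compiler shows which branch each one selects.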

Blemings concentrated on the development infrastructure by describing the lab that he used to port the kernel to a Taishan PowerPC-based evaluation board. When undertaking a project like that, "get to know your hardware team" because they will have lots of important information and shortcuts that can be used as part of the board "bringup". At IBM in Canberra, where Blemings is based, they have gotten to the point where they can bring up Linux on any board where they can "access memory and point the PC [program counter] at it"; his tips have come out of that environment.

One of the most important things is to realize that you will be building kernels over and over again, so optimizing your environment for that will save lots of time. His suggestion was to start with a "honkin'" compile box; he described an IBM multi-processor box as an excellent choice but noted that the cost was so high he couldn't get one. It would, however, do "3k/sec", that is, compile three kernels per second. In the absence of something like that, he suggested borrowing cycles by using ccache to avoid repeating compilations and distcc to spread the remaining work across machines. Even adding relatively modest machines into the distcc pool can significantly reduce the time spent waiting for a new kernel.

Ubuntu mobile and embedded (UME) and Maemo

One of the hottest areas in embedded Linux these days is the mobile internet device (MID) market. There were two talks on MID-focused distributions, with Canonical's David Mandala giving an overview of Ubuntu Mobile and Embedded (UME) and Nokia's Kate Alhola talking about the status and future directions of Maemo Mobile Linux. UME is a relatively recent addition to the mobile device space—they are anxiously awaiting hardware to run on—whereas Maemo has been around for a while, powering the Nokia N770, N800, and N810 internet tablets.

UME is an effort to apply the Ubuntu distribution and philosophy to touchscreen devices. Mandala explained that they are taking existing Linux applications and adapting them for small screens that use fingers, rather than keyboard and mouse, as the input device. The resolution of the displays is typically something approaching that of low-end desktops, but the physical space they take up is far smaller (i.e., the dots per inch, or DPI, is high), which makes it difficult to do development without actual hardware.
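A worked example of the DPI arithmetic, using illustrative screen sizes that are assumptions rather than figures from the talk: DPI is the diagonal length in pixels, sqrt(width² + height²), divided by the diagonal in inches. A square root by Newton's method keeps the sketch free of libm.

```c
/* Square root by Newton's method, to keep the example free of libm. */
static double newton_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0; /* start at or above the root */
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Dots per inch from a pixel resolution and a diagonal size in inches. */
double dpi(unsigned int width_px, unsigned int height_px, double diagonal_in)
{
    double w = width_px, h = height_px;
    return newton_sqrt(w * w + h * h) / diagonal_in;
}
```

With these hypothetical inputs, an 800x480 panel with a 4.1-inch diagonal comes out at roughly 228 DPI, while a 1024x768 panel with a 15-inch diagonal is about 85 DPI. The same pixel counts are packed more than twice as densely, which is why a desktop monitor is a poor stand-in for the device.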

The UME project is working with Intel's Moblin.org project to target systems based on the Atom processor. It uses the Hildon application framework atop GNOME Mobile, running on an Ubuntu 8.04 (Hardy Heron) distribution. Mandala stressed that Linux should be "invisible" on these devices, as users just want applications that work to browse the web, use email, and the like.

The main focus of UME has, so far, been on the user interface, though power consumption, memory footprint, and speeding up boot times are all on their radar. Canonical is very interested in fostering a community around UME, but that has been "a bit of a challenge", mostly due to a lack of hardware to run on. Mandala expects a few different hardware devices to be available "soon" and that will make it easier to attract a development community.

Unsurprisingly, given Nokia's purchase of Trolltech early this year, Alhola announced that Maemo would be supporting both GTK and Qt in the near future. This is part of Nokia's belief that there is "no single truth", so Maemo supports multiple paths to development on the platform. Maemo directly supports C, C++, and Python, while the community has added support for Java, Objective-C, Vala, and Mono.

Nokia makes a very clear distinction in its product line between phones, which are largely closed platforms, and tablets, which are open. Open source software is an essential part of their strategy as they want to build an application ecosystem around their products. "We are taking open source to the consumer mainstream," Alhola explained.

One of the interesting tools that Nokia is working on as part of Maemo is Scratchbox, which is a toolkit geared towards making cross-compilation easier. It does this by making the development environment look and act like the execution environment, using QEMU to simulate the target hardware. Scratchbox supports both ARM and x86 targets, with experimental support for additional architectures. It uses standard toolchains and distributions where possible and is released under the GPL.

LogFS

LogFS is a flash filesystem that is targeted at the larger flash devices that are becoming more widespread. Unlike some filesystems currently in use, most notably jffs2, LogFS is specifically designed to avoid some of the performance and scalability problems that come with larger devices. Jörn Engel is the developer of LogFS, with some support from the Consumer Electronics Linux Forum (sponsor of ELC), so he gave an update on the status of the project.

Engel used an unconventional scale (the sucks/rules meter) to measure the progress that had been made in the last year. The scale runs from -10 to 10 and measures the "suckiness" of particular features of the filesystem. Taking a page from This Is Spinal Tap, the score for the mount speed of LogFS was measured at 11 both last year and this. It is clearly the feature that Engel is most proud of as it takes 10-60ms to mount a filesystem; a similarly sized jffs2 takes on the order of one second.

Engel looked at around ten separate attributes of the filesystem, first rating them on where LogFS was a year ago, then re-rating them based on where it is today. The conclusion is that the average score has moved from -2.75 to -0.55, so that "on average, it hardly sucks". He says he is getting confident enough to submit it to Andrew Morton for inclusion in his tree, hopefully on its way into the mainline. Engel is clearly somewhat frustrated with people who are waiting until LogFS is "done" before using it (though some fairly serious usability problems would tend to limit testers), proclaiming: "LogFS is finished, try it now, today!"

In conclusion

There were more talks, of course, as well as an active "hallway track" for the roughly 175 participants. ELC is a well-run and very interesting conference that is worth consideration for anyone who uses, or plans to use, Linux as an embedded operating system. This year's venue, the Computer History Museum, was a nice facility for a conference of this size. It also had some great exhibits that will bring back memories for anyone who has been using computers, calculators, or game systems over the past 50 years or so, and it is well worth a visit when one is in Silicon Valley.

Comments (1 posted)

OLPC at a turning point

By Jonathan Corbet
April 23, 2008
It looks like hard times for the One Laptop Per Child project. Quite a few key developers have left, including Mary Lou Jepsen, Ivan Krstić, Andres Salomon, and Walter Bender. Laptop deployments are far below the several million that the project had hoped for by this time, and many of the goals for the system's software have not been achieved. There is persistent talk of supporting Windows, with suggestions that Linux could be dropped altogether. An ongoing thread on the project's development mailing list shows that quite a few participants are concerned about where things are going. To many, it seems, OLPC is about to go down as a noble failure.

These rumors may be just a bit premature, though. When considering what may really come of OLPC, it's worth keeping a few things in mind.

One of those is the fact that the project has just completed a major push to its first mass-production system. Your editor has watched the project closely enough to see that, as with many such efforts, the people involved have been putting in lots of long hours to get the job done. When this kind of pressure is lifted, it is natural to take a break, catch up on the housework, and, perhaps, find a new job. So the departure of some key staff at this stage is not entirely surprising.

A look at the state of OLPC's software suggests that the project had set an overly ambitious set of goals for its first release. When that happens, one must jettison some objectives; the later that this is done, the more likely it is that the wrong objectives will be tossed overboard. There are signs that OLPC tried to do too much for too long, with an end result which is not as stable, as fast, or as fully-featured as one would like. As many people close to the project have noted, the laptop's software remains immature. But, as former president Walter Bender put it:

While [we] have heard a lot of noise about performance in the media and from some members of the development community, it has not, in my experience been a major road-block in the school trials and deployments. There are lots of bugs and lots of things that could be improved upon, and these should certainly be addressed, but the characterizations being made in this thread do not reflect the realities of the OLPC deployments--the children and teachers are using the laptops and are learning.

Finally, the number of laptops delivered to children is far below the level the project had planned upon. Fewer deployments mean less impact for the project; they also undermine the economies of scale the project had counted on to push the cost down. There have also been some embarrassing failures along the way, including the misplacing of a large number of "Give one get one" orders until it was too late to include them in the manufacturing run.

All of the above points to a need to make some changes in how the project is run. Changes always create uncertainty, so it would be surprising if OLPC participants were not a little nervous at the moment.

What happens in the next few months will likely determine OLPC's fate. The project's leadership has famously said in the past that OLPC is an education project, not a laptop project. Some people have recently expressed concerns that, in fact, OLPC is turning into a laptop project, with deployment numbers being the main goal. Nicholas Negroponte doesn't help when he allows himself to be quoted as being "mainly concerned with putting as many laptops as possible in children's hands." If OLPC becomes primarily a low-cost laptop vendor, and especially if it goes to proprietary operating systems as a means toward that end, it will lose much of the community that has grown up around the project.

And that would be a shame. There is great beauty in the idea of putting a well-designed learning tool into the hands of children and empowering those children by providing a system which is completely open and hackable. A large and motivated community of highly-capable people came together behind that vision and did their best to rethink how this technology should work and create something better. Deployment groups in a number of countries have gotten the resulting systems into the hands of thousands of children, and many of them are reporting good results. A lot of good things have happened here, and it doesn't have to end now.

But it might end soon. To pull things together, the project will have to communicate a clearer vision of where it plans to go with its software at all levels; Mr. Negroponte's statement of continued support for Sugar appears to be an attempt to start this process. The operational side of the project needs to get its act together. Some transparency on, for example, what is being done with donation money and what agreements have been made with outside corporations, would be most helpful. And, most of all, the group of volunteers working with this project have to be convinced anew that they are not wasting their time. If the project's leadership can manage all of that, there may well be great things coming from OLPC in the future.

Comments (8 posted)

Page editor: Jonathan Corbet

Security

Image handling vulnerabilities

By Jake Edge
April 23, 2008

Bugs that linger for eight years without a fix are probably annoying to whoever reported them, and perhaps to others as well. When those bugs have possible security implications, it is hard to see how they can remain unfixed for even eight months, let alone years, but that appears to be the case with some GTK image handling bugs. Code to handle image formats has been the source of numerous vulnerabilities along the way, which makes it even harder to see why these have languished so long.

A call for ideas for a hackfest on the GNOME Foundation mailing list seems like a strange place to find information about vulnerabilities, but in the ensuing thread, Michael Chudobiak brought up some bugs that he would like to see addressed, perhaps as part of a hackfest:

I'd like to suggest one possible topic: The pixbuf loaders. They're slow and memory intensive, and this drags down anything that needs thumbnails (Nautilus, etc). There is a lot of opportunity to improve the responsiveness of the desktop here.

The bugs he listed were from 2002 (80925), 2004 (142428), and 2008 (522803), but Alan Cox mentioned that he had reported one of them as a GNOME security bug "about eight years ago". In his opinion, all of the bugs were of the "well known, never fixed" variety. Because the code in question lives in GTK, which is used by many GNOME applications, "quite a few gnome apps fed small compressed images explode".

The basic problem is that the routines handling images create the full-resolution image in memory regardless of the size requested. In addition, various memory-intensive techniques are used to scale the image to the requested size. This impacts Nautilus and other GNOME programs that create thumbnails of large images.
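The arithmetic behind "small compressed images explode" is easy to sketch. Assuming 4 bytes per pixel (RGBA), a 10000x10000 image needs 400,000,000 bytes once decoded, however small the compressed file is, while a 128-pixel thumbnail of a 2:1 image needs only 32,768 bytes. The helper names below are hypothetical and are not gdk-pixbuf API.

```c
#include <stdint.h>

/* Bytes needed to hold a decoded image at 4 bytes per pixel (RGBA). */
uint64_t decoded_bytes(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 4;
}

/* Scale (width, height) to fit a max_side x max_side box, keeping the
 * aspect ratio, as a thumbnailer would want. */
void thumbnail_size(uint32_t width, uint32_t height, uint32_t max_side,
                    uint32_t *out_w, uint32_t *out_h)
{
    if (width <= max_side && height <= max_side) {
        *out_w = width; /* already small enough */
        *out_h = height;
    } else if (width >= height) {
        *out_w = max_side;
        *out_h = (uint32_t)((uint64_t)height * max_side / width);
    } else {
        *out_h = max_side;
        *out_w = (uint32_t)((uint64_t)width * max_side / height);
    }
    /* Never round a dimension down to zero. */
    if (*out_w == 0)
        *out_w = 1;
    if (*out_h == 0)
        *out_h = 1;
}
```

A loader that decoded directly at the thumbnail size would only ever allocate the small buffer; decoding at full resolution first and then scaling means paying for both.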

Presumably, a denial of service, at a minimum, can result from these operations, though there may be other ways to exploit any program crashes that result. Cox has a plan to see them get fixed:

Unfortunately they are well known but nobody seems to care. I'll forward your message to the vendor security list and we'll see what happens. Probably the bug just needs to be made *very* public to incentivise people to fix it 8)

The vendor security list, often abbreviated vendor-sec, is a closed mailing list for distribution security teams to exchange information about vulnerabilities in various programs. It is closed so that bugs that are not publicly known can be freely discussed. Whether Cox's posting to that list spurs any action remains to be seen.

It is a rare week in which LWN does not report some kind of image handling botch as a new vulnerability. This week, a cups vulnerability in handling PNG files could lead to a denial of service; last week we reported an Opera vulnerability in handling images in HTML canvas elements that could possibly lead to arbitrary code execution. Image handling is an area where all bugs need to be scrutinized carefully for potential security issues.
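Integer overflows like the one reported in CUPS typically come from computing width times height times bytes per pixel for a buffer size. The generic defensive pattern is to check each multiplication before performing it; the function below is a hypothetical sketch of that pattern, not the CUPS fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute width * height * bytes_per_pixel without wrapping around.
 * Returns 0 and stores the size in *out on success; returns -1 if any
 * factor is zero or the product does not fit in a size_t. */
int image_buffer_size(size_t width, size_t height, size_t bytes_per_pixel,
                      size_t *out)
{
    if (width == 0 || height == 0 || bytes_per_pixel == 0)
        return -1;
    /* Check each multiplication before performing it. */
    if (width > SIZE_MAX / height)
        return -1;
    size_t pixels = width * height;
    if (pixels > SIZE_MAX / bytes_per_pixel)
        return -1;
    *out = pixels * bytes_per_pixel;
    return 0;
}
```

Without such checks, a crafted header can make the product wrap to a small value, so the allocation succeeds but later writes run past its end.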

Possibly part of the problem is that the GNOME hackers did not realize the security implications of the bugs. Even so, there has been ample complaint about performance problems over the last six or eight years, which might have been expected to prompt some kind of action. This is a set of related bugs that have seemingly been overlooked for a long time. Perhaps that time is now coming to an end.

Comments (2 posted)

New vulnerabilities

clamav: buffer overflows

Package(s): clamav    CVE #(s): CVE-2008-0314 CVE-2008-1100
Created: April 18, 2008    Updated: May 15, 2008
Description: Several remote vulnerabilities have been discovered in the Clam anti-virus toolkit.
Alerts:
Debian DSA-1549-1 2008-04-17
Mandriva MDVSA-2008:088 2008-04-17
SuSE SUSE-SA:2008:024 2008-04-24
Fedora FEDORA-2008-3420 2008-04-29
Fedora FEDORA-2008-3358 2008-04-29
Fedora FEDORA-2008-3900 2008-05-14

Comments (none posted)

clamav: multiple vulnerabilities

Package(s): clamav    CVE #(s): CVE-2008-1387 CVE-2008-1833 CVE-2008-1835 CVE-2008-1836 CVE-2008-1837
Created: April 18, 2008    Updated: May 15, 2008
Description: From the CVE entries:

ClamAV before 0.93 allows remote attackers to cause a denial of service (CPU consumption) via a crafted ARJ archive, as demonstrated by the PROTOS GENOME test suite for Archive Formats. (CVE-2008-1387)

Heap-based buffer overflow in libclamav in ClamAV 0.92.1 allows remote attackers to execute arbitrary code via a crafted WWPack compressed PE binary. (CVE-2008-1833)

ClamAV before 0.93 allows remote attackers to bypass the scanning engine via a RAR file with an invalid version number, which cannot be parsed by ClamAV but can be extracted by Winrar. (CVE-2008-1835)

The rfc2231 function in message.c in libclamav in ClamAV before 0.93 allows remote attackers to cause a denial of service (crash) via a crafted message that produces a string that is not null terminated, which triggers a buffer over-read. (CVE-2008-1836)

libclamunrar in ClamAV before 0.93 allows remote attackers to cause a denial of service (crash) via crafted RAR files that trigger "memory problems," as demonstrated by the PROTOS GENOME test suite for Archive Formats. (CVE-2008-1837)

Alerts:
Mandriva MDVSA-2008:088 2008-04-17
SuSE SUSE-SA:2008:024 2008-04-24
Fedora FEDORA-2008-3420 2008-04-29
Fedora FEDORA-2008-3358 2008-04-29
Fedora FEDORA-2008-3900 2008-05-14

Comments (none posted)
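CVE-2008-1836 above describes a string that is not null-terminated being read past its end. A common defensive pattern, sketched here with a hypothetical copy_field() helper rather than ClamAV's actual code, is to copy length-delimited input into a buffer that is always terminated.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most src_len bytes of a length-delimited field (which need not
 * be NUL-terminated) into dst, always terminating the result so later
 * string functions cannot run past the end.  Returns the number of bytes
 * copied; returns 0 and writes nothing if dst_size is 0. */
size_t copy_field(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    if (dst_size == 0)
        return 0;
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```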

cups: arbitrary code execution

Package(s): cups    CVE #(s): CVE-2008-1722
Created: April 21, 2008    Updated: May 13, 2008
Description:

From the Gentoo advisory:

Thomas Pollet reported a possible integer overflow vulnerability in the PNG image handling in the file filter/image-png.c.

A malicious user might be able to execute arbitrary code with the privileges of the user running CUPS (usually lp), or cause a Denial of Service by sending a specially crafted PNG image to the print server. The vulnerability is exploitable via the network if CUPS is sharing printers remotely.

Alerts:
Gentoo 200804-23 2008-04-18
Ubuntu USN-606-1 2008-05-05
Fedora FEDORA-2008-3449 2008-05-09
Fedora FEDORA-2008-3586 2008-05-09
Fedora FEDORA-2008-3756 2008-05-13

Comments (none posted)

dbmail: authentication bypass

Package(s): dbmail    CVE #(s): CVE-2007-6714
Created: April 21, 2008    Updated: April 29, 2008
Description:

From the Gentoo advisory:

A vulnerability in DBMail's authldap module when used in conjunction with an Active Directory server has been reported by vugluskr. When passing a zero length password to the module, it tries to bind anonymously to the LDAP server. If the LDAP server allows anonymous binds, this bind succeeds and results in a successful authentication to DBMail.

By passing an empty password string to the server, an attacker could be able to log in to any account.

Alerts:
Gentoo 200804-24 2008-04-18
Fedora FEDORA-2008-3333 2008-04-29
Fedora FEDORA-2008-3371 2008-04-29

Comments (none posted)
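The dbmail problem above comes from passing an empty password through to LDAP. LDAP treats a simple bind that supplies a name but an empty password as an unauthenticated bind, which servers may accept, so the application has to reject empty passwords itself. A minimal sketch of that check; credentials_acceptable() is a hypothetical name, not dbmail code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check run before attempting an LDAP simple bind.
 * An empty password would turn the bind into an unauthenticated one,
 * so it must be refused here rather than passed to the server. */
bool credentials_acceptable(const char *user, const char *password)
{
    if (user == NULL || user[0] == '\0')
        return false;
    if (password == NULL || password[0] == '\0')
        return false;
    return true;
}
```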

fedora-ds-admin: privilege escalation and arbitrary command execution

Package(s): fedora-ds-admin    CVE #(s): CVE-2008-0892 CVE-2008-0893
Created: April 22, 2008    Updated: April 23, 2008
Description: From the CVE entries:
The replication monitor CGI script (repl-monitor-cgi.pl) in Red Hat Administration Server, as used by Red Hat Directory Server 8.0 EL4 and EL5, allows remote attackers to execute arbitrary commands.

Red Hat Administration Server, as used by Red Hat Directory Server 8.0 EL4 and EL5, does not properly restrict access to CGI scripts, which allows remote attackers to perform administrative actions.

Alerts:
Fedora FEDORA-2008-3214 2008-04-21
Fedora FEDORA-2008-3220 2008-04-21

Comments (none posted)

feh: shell command injection

Package(s): feh    CVE #(s):
Created: April 17, 2008    Updated: April 23, 2008
Description: feh has a vulnerability involving shell command injection using specially crafted file names.
Alerts:
Fedora FEDORA-2008-3068 2008-04-17
Fedora FEDORA-2008-3064 2008-04-17

Comments (none posted)

firefox: denial of service

Package(s):firefox CVE #(s):CVE-2008-1380
Created:April 17, 2008 Updated:May 12, 2008
Description: From the Red Hat alert: A flaw was found in the processing of malformed JavaScript content. A web page containing such malicious content could cause Firefox to crash or, potentially, execute arbitrary code as the user running Firefox.
Alerts:
Red Hat RHSA-2008:0222-02 2008-04-16
Red Hat RHSA-2008:0223-02 2008-04-16
Slackware SSA:2008-108-01 2008-04-18
Ubuntu USN-602-1 2008-04-22
Fedora FEDORA-2008-3283 2008-04-22
Debian DSA-1555-1 2008-04-23
Fedora FEDORA-2008-3231 2008-04-22
Fedora FEDORA-2008-3249 2008-04-22
Fedora FEDORA-2008-3264 2008-04-22
Debian DSA-1558-1 2008-04-24
Debian DSA-1562-1 2008-04-28
Red Hat RHSA-2008:0224-01 2008-04-30
CentOS CESA-2008:0224 2008-05-08
Foresight FLEA-2008-0008-1 2008-05-08
SuSE SUSE-SR:2008:011 2008-05-09
Fedora FEDORA-2008-3519 2008-05-09
Fedora FEDORA-2008-3557 2008-05-09

Comments (none posted)

ikiwiki: cross-site request forgery

Package(s):ikiwiki CVE #(s):CVE-2008-0165
Created:April 21, 2008 Updated:April 23, 2008
Description:

From the Debian advisory:

It has been discovered that ikiwiki, a Wiki implementation, does not guard password and content changes against cross-site request forgery (CSRF) attacks.

Alerts:
Debian DSA-1553-1 2008-04-20

Comments (none posted)

mplayer: arbitrary code execution

Package(s):mplayer CVE #(s):CVE-2008-1558
Created:April 21, 2008 Updated:April 23, 2008
Description:

From the Debian advisory:

It was discovered that the MPlayer movie player performs insufficient input sanitising on SDP session data, leading to potential execution of arbitrary code through a malformed multimedia stream.

Alerts:
Debian DSA-1552-1 2008-04-19

Comments (none posted)

mt-daapd: integer overflow

Package(s):mt-daapd CVE #(s):CVE-2008-1771
Created:April 23, 2008 Updated:April 23, 2008
Description: The mt-daapd music server suffers from an integer overflow enabling remote denial of service attacks and possibly code execution.
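The advisory does not say where the overflow lives, so the following is only a generic illustration of this bug class. When a count or length taken from the network is multiplied without a check, the product can wrap to a small value. The resulting buffer is then too small, and later writes run off its end. Below is a minimal sketch of the usual guard. alloc_array() is a hypothetical helper, not mt-daapd code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate count * size bytes, refusing any request whose byte count
 * would wrap past SIZE_MAX instead of silently returning a short buffer. */
void *alloc_array(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                      /* product would overflow */
    return malloc(count * size);
}
```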
Alerts:
Fedora FEDORA-2008-3250 2008-04-22

Comments (none posted)

openfire: denial of service

Package(s):openfire CVE #(s):CVE-2008-1728
Created:April 23, 2008 Updated:April 23, 2008
Description: The openfire (formerly wildfire) Jabber server cannot cope with clients which fail to read messages, leading to a denial of service vulnerability.
Alerts:
Gentoo 200804-26 2008-04-23

Comments (none posted)

openoffice.org: multiple vulnerabilities

Package(s):openoffice.org CVE #(s):CVE-2007-5745 CVE-2007-5746 CVE-2007-5747 CVE-2008-0320
Created:April 17, 2008 Updated:May 15, 2008
Description: From the Debian alert:

CVE-2007-5745, CVE-2007-5747: Several bugs have been discovered in the way OpenOffice.org parses Quattro Pro files that may lead to an overflow in the heap, potentially leading to the execution of arbitrary code.

CVE-2007-5746: Specially crafted EMF files can trigger a buffer overflow in the heap that may lead to the execution of arbitrary code.

CVE-2008-0320: A bug has been discovered in the processing of OLE files that can cause a buffer overflow in the heap potentially leading to the execution of arbitrary code.

Alerts:
Debian DSA-1547-1 2008-04-17
Red Hat RHSA-2008:0175-01 2008-04-17
SuSE SUSE-SA:2008:023 2008-04-18
Mandriva MDVSA-2008:090 2008-04-20
Fedora FEDORA-2008-3251 2008-04-22
Mandriva MDVSA-2008:095 2008-05-02
Ubuntu USN-609-1 2008-05-06
Gentoo 200805-16 2008-05-14

Comments (none posted)

php-toolkit: denial of service

Package(s):php-toolkit CVE #(s):CVE-2008-1734
Created:April 18, 2008 Updated:April 23, 2008
Description: From the Gentoo advisory: Toni Arnold, David Sveningsson, Michal Bartoszkiewicz, and Joseph reported that php-select does not quote parameters passed to the "tr" command, which could convert the "-D PHP5" argument in the "APACHE2_OPTS" setting in the file /etc/conf.d/apache2 to lower case.
Alerts:
Gentoo 200804-19 2008-04-17

Comments (none posted)

poppler: arbitrary code execution

Package(s):poppler CVE #(s):CVE-2008-1693
Created:April 17, 2008 Updated:May 9, 2008
Description: From the Gentoo alert: Poppler does not handle fonts inside PDF files safely, allowing for execution of arbitrary code.
Alerts:
Gentoo 200804-18:02 2008-04-17
Debian DSA-1548-1 2008-04-17
Red Hat RHSA-2008:0238-01 2008-04-17
Red Hat RHSA-2008:0239-01 2008-04-17
Red Hat RHSA-2008:0240-01 2008-04-17
Ubuntu USN-603-1 2008-04-17
Ubuntu USN-603-2 2008-04-17
Mandriva MDVSA-2008:089 2008-04-17
Fedora FEDORA-2008-3312 2008-04-29
Red Hat RHSA-2008:0262-01 2008-05-08
CentOS CESA-2008:0262 2008-05-08
SuSE SUSE-SR:2008:011 2008-05-09

Comments (none posted)

python2.4: arbitrary code execution

Package(s):python2.4 CVE #(s):CVE-2008-1887
Created:April 21, 2008 Updated:April 23, 2008
Description:

From the Debian advisory:

CVE-2008-1887: Justin Ferguson discovered that insufficient input validation in PyString_FromStringAndSize() may lead to the execution of arbitrary code.

Alerts:
Debian DSA-1551-1 2008-04-19

Comments (none posted)

speex: insufficient boundary checks

Package(s):speex CVE #(s):CVE-2008-1686
Created:April 17, 2008 Updated:May 9, 2008
Description: The speex speech codec has insufficient boundary checking in speex_packet_to_header().
Alerts:
Fedora FEDORA-2008-3191 2008-04-17
Fedora FEDORA-2008-3103 2008-04-17
Fedora FEDORA-2008-3059 2008-04-17
Gentoo 200804-17 2008-04-17
Red Hat RHSA-2008:0235-01 2008-04-16
Slackware SSA:2008-111-01 2008-04-22
Mandriva MDVSA-2008:093 2008-04-29
Mandriva MDVSA-2008:092 2008-04-29
Mandriva MDVSA-2008:094 2008-04-29
Ubuntu USN-611-1 2008-05-08
Ubuntu USN-611-2 2008-05-08
Ubuntu USN-611-3 2008-05-08

Comments (none posted)

sun java: multiple vulnerabilities

Package(s):sun-jre, sun-jdk CVE #(s):CVE-2007-5689 CVE-2007-5237 CVE-2008-0628
Created:April 18, 2008 Updated:April 28, 2008
Description: From the CVE entries:

The Java Virtual Machine (JVM) in Sun Java Runtime Environment (JRE) in SDK and JRE 1.3.x through 1.3.1_20 and 1.4.x through 1.4.2_15, and JDK and JRE 5.x through 5.0 Update 12 and 6.x through 6 Update 2, allows remote attackers to execute arbitrary programs, or read or modify arbitrary files, via applets that grant privileges to themselves. (CVE-2007-5689)

Java Web Start in Sun JDK and JRE 6 Update 2 and earlier does not properly enforce access restrictions for untrusted applications, which allows user-assisted remote attackers to read and modify local files via an untrusted application, aka "two vulnerabilities." (CVE-2007-5237)

The XML parsing code in Sun Java Runtime Environment JDK and JRE 6 Update 3 and earlier processes external entity references even when the "external general entities" property is false, which allows remote attackers to conduct XML external entity (XXE) attacks and cause a denial of service or access restricted resources. (CVE-2008-0628)

Alerts:
Gentoo 200804-20 2008-04-17
Red Hat RHSA-2008:0245-01 2008-04-28

Comments (none posted)

suphp: privilege escalation

Package(s):suphp CVE #(s):CVE-2008-1614
Created:April 18, 2008 Updated:April 23, 2008
Description: suPHP before 0.6.3 allows local users to gain privileges via (1) a race condition that involves multiple symlink changes to point to a file owned by a different user, or (2) a symlink to the directory of a different user, which is used to determine privileges.
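Both suPHP problems come from deciding privileges by examining a path and then acting on that same path, which leaves a window in which a symlink can be swapped in. Below is a minimal sketch of the usual mitigation for the final path component on a Linux/POSIX system. It opens the file with O_NOFOLLOW and takes the owner from fstat() on the resulting descriptor. open_script_owner() is a hypothetical helper, not suPHP's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a script and report its owner, deciding from the opened file
 * descriptor rather than from a separate stat() of the path. O_NOFOLLOW
 * makes open() fail if the last path component is a symlink, and fstat()
 * describes exactly the file that was opened, so changing the path after
 * the check no longer changes the answer.
 * Returns the open fd (caller closes it) or -1 on error. */
int open_script_owner(const char *path, uid_t *owner)
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    *owner = st.st_uid;
    return fd;
}
```

O_NOFOLLOW only refuses a symlink in the last component. Symlinked directories earlier in the path (the second case in the description) need separate handling, for instance by walking the path with openat() one component at a time.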
Alerts:
Debian DSA-1550-1 2008-04-17

Comments (none posted)

vlc: multiple vulnerabilities

Package(s):vlc CVE #(s):CVE-2008-1881 CVE-2008-1489 CVE-2008-1768 CVE-2008-1769
Created:April 23, 2008 Updated:April 23, 2008
Description: The latest set of vulnerabilities in vlc includes a stack-based buffer overflow in the subtitle code (CVE-2008-1881), an integer overflow vulnerability in the MP4 code leading to a heap overflow (CVE-2008-1489), more integer overflows (CVE-2008-1768), and a "boundary error" in Cinepak leading to memory corruption (CVE-2008-1769).
Alerts:
Gentoo 200804-25 2008-04-23

Comments (none posted)

WebKit: cross-site scripting and code execution

Package(s):WebKit CVE #(s):CVE-2008-1010 CVE-2008-1011
Created:April 23, 2008 Updated:April 29, 2008
Description: The WebKit browser engine suffers from a buffer overflow leading to arbitrary code execution and a cross-site scripting vulnerability; some more information is available from this summary.
Alerts:
Fedora FEDORA-2008-3229 2008-04-22
Fedora FEDORA-2008-3415 2008-04-29

Comments (none posted)

Page editor: Jake Edge

Kernel development

Release status

Kernel release status

The 2.6.26 merge window is open, so there is no current 2.6 development release. See the article below for a summary of the patches merged for 2.6.26 so far.

The current -mm tree is 2.6.25-mm1. Recent changes to -mm include some read-copy-update enhancements and the OLPC architecture support code, but mostly -mm is just getting ready for the big flow of patches into the mainline. See the -mm merge plans document for Andrew's plans for 2.6.26.

The current stable 2.6 kernel is 2.6.25, released on April 16. After nearly three months of development and the merging of over 12,000 patches from almost 1200 developers, this kernel is now considered ready for wider use. Highlights of this release include the ath5k (Atheros wireless) driver, a bunch of realtime work including realtime group scheduling, preemptable RCU, LatencyTop support, a number of new ext4 filesystem features, support for the controller area network protocol, more network namespace work, the return of the timerfd() system call, the page map patches (providing much better information on system memory use), the SMACK security module, better kernel support for Intel and ATI R500 graphics chipsets, the memory use controller, ACPI thermal regulation support, MN10300 architecture support, and much more. See the KernelNewbies 2.6.25 page for lots of details, or the full changelog for unbelievable amounts of detail.
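In its revived form, the timerfd interface is a set of three calls: timerfd_create(), timerfd_settime(), and timerfd_gettime(). They deliver timer expirations through a file descriptor, so a timer can be waited on with read(), poll(), or select() alongside other descriptors. Below is a minimal sketch. It assumes a C library that provides the <sys/timerfd.h> wrappers (glibc 2.8 or later), and wait_for_timer_ms() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Arm a one-shot timer that fires after `ms` milliseconds (ms > 0) and
 * block until it expires. Returns the expiration count read from the
 * descriptor (1 for a one-shot timer), or -1 on error. */
long long wait_for_timer_ms(long ms)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    struct itimerspec spec = {
        .it_interval = { 0, 0 },                          /* no repeat */
        .it_value = { ms / 1000, (ms % 1000) * 1000000L },
    };
    if (timerfd_settime(fd, 0, &spec, NULL) < 0) {
        close(fd);
        return -1;
    }

    uint64_t expirations;        /* read() blocks until the timer fires */
    ssize_t n = read(fd, &expirations, sizeof expirations);
    close(fd);
    return n == (ssize_t)sizeof expirations ? (long long)expirations : -1;
}
```

Each read() returns an 8-byte count of expirations since the previous read. A periodic timer that fired several times while nobody was reading therefore reports all of those expirations at once.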

2.6.24.5 was released on April 18. It contains a relatively long list of fixes for significant 2.6.24 problems.

For older kernels: 2.4.36.3 was released on April 19. "Nothing outstanding here, I've just decided to release pending fixes. Those already running 2.4.36.2 have no particular reason to upgrade, unless they already experience troubles in the fixed areas."

Comments (1 posted)

Kernel development news

Quotes of the week

In any case, we'll continue to use the fact that mprotect is also broken to get our WC mapping working (using mprotect PROT_NONE followed by mprotect PROT_READ|PROT_WRITE causes the CD and WT bits to get cleared). We're fortunate in this case that we've found a bug to exploit that gives us the desired behaviour.
-- Keith Packard

Nice-looking code - kgdb has improved rather a lot. I'm glad we finally got it in. Maybe one day I'll get to use it again.
-- Andrew Morton

/me duly notes this request to break Andrew's systems even more frequently ;-)
-- Ingo Molnar

Comments (none posted)

The 2.6.26 merge window opens

By Jonathan Corbet
April 23, 2008
That shiny new 2.6.25 kernel which was released on April 16 is now ancient history; some 3500 changesets have been merged into the mainline git repository since then. Some of the most significant user-visible changes include:

  • New drivers for Korina (IDT rc32434) Ethernet MACs, SuperH MX-G and SH-MobileR2 CPUs, Solution Engine SH7721 boards, ARM YL9200, Kwikbyte KB9260, Olimex SAM9-L9260, and emQbit ECB_AT91 boards, Digi ns921x processors, the Nias Digital SMX crypto engines, AMCC PPC460EX evaluation boards, Emerson KSI8560 boards, Wind River SBC8641D boards, Logitech Rumblepad 2 force-feedback devices, Renesas SH7760 I2C controllers, and SuperH Mobile I2C controllers.

  • The PCI subsystem now supports PCI Express Active State Power Management, which can yield significant power savings on suitably equipped hardware.

  • There is a new security= boot parameter which allows the specification of which security module to use if more than one is available.

  • Network address translation (NAT) is now supported for the SCTP, DCCP, and UDP-Lite protocols. There is also netfilter connection tracking support for DCCP.

  • The network stack can now negotiate selective acknowledgments and window scaling even when syncookies are in use.

  • Another long series of network namespace patches has been merged, continuing the long process of making all networking code namespace-aware.

  • Mesh networking support has been added to the mac80211 layer. It is currently marked "broken," though, until various outstanding issues are fixed.

  • 4K stacks are now the default for the x86 architecture. This change is controversial and could be reversed by the time the final release happens.

  • SELinux now supports "permissive types" which allow specific domains to run as if SELinux were not present in the system at all.

  • A number of enhancements have been made to the realtime group scheduler, including multi-level groups, the ability to mix processes and groups (and have them compete against each other for CPU time), better SMP balancing, and more.

  • Support for the running of SunOS and Solaris binaries has been removed; it has long been unmaintained and did not work well.

  • The kernel now has support for read-only bind mounts, which provide a read-only view into an otherwise writable filesystem. This feature (the implementation of which was more involved than one might think) is intended for use in containers and other situations where even processes running as root should not be able to modify certain filesystems.

Changes visible to kernel developers include:

  • At long last, support for the KGDB interactive debugger has been added to the x86 architecture. There is a DocBook document in the Documentation directory which provides an overview of how to use this new facility.

  • Page attribute table (PAT) support is also (again, at long last) available for the x86 architecture. PATs allow for fine-grained control of memory caching behavior with more flexibility than the older MTRR feature. See Documentation/x86/pat.txt for more information.

  • Two new functions (inode_getsecid() and ipc_getsecid()), added to support security modules and the audit code, provide general access to security IDs associated with inodes and IPC objects. A number of superblock-related LSM callbacks now take a struct path pointer instead of struct nameidata. There is also a new set of hooks providing generic audit support in the security module framework.

  • The now-unused ieee80211 software MAC layer has been removed; all of the drivers which needed it have been converted to mac80211. Also removed are the sk98lin network driver (in favor of skge) and bcm43xx (replaced by b43 and b43legacy).

  • The generic semaphores patch has been merged. The semaphore code also has new down_killable() and down_timeout() functions.

  • The ata_port_operations structure used by libata drivers now supports a simple sort of operation inheritance, making it easier to write drivers which are "almost like" existing code, but with small differences.

  • A new function (ns_to_ktime()) converts a time value in nanoseconds to ktime_t.

  • The final users of struct class_device have been converted to use struct device instead. If all goes well, the class_device structure will be removed later in the 2.6.26 cycle.

  • Greg Kroah-Hartman is no longer the PCI subsystem maintainer, having passed that responsibility on to Jesse Barnes.

  • The seq_file code now accepts a return value of SEQ_SKIP from the show() callback; that value causes any accumulated output from that call to be discarded.
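The SEQ_SKIP behavior is easy to model outside the kernel. The natural implementation, and the one modeled here, records how much output is buffered before each show() call and rolls back to that point if show() returns SEQ_SKIP. What follows is a toy userspace sketch of that idea, not the kernel's code. struct seq_buf, seq_buf_printf(), show_item(), and seq_run() are hypothetical, and the SEQ_SKIP value is only a stand-in.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define SEQ_SKIP 1            /* stand-in value for this model */

/* Toy version of a seq_file output buffer. */
struct seq_buf {
    char buf[256];
    size_t count;
};

static void seq_buf_printf(struct seq_buf *m, const char *fmt, ...)
{
    size_t room = sizeof m->buf - m->count;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(m->buf + m->count, room, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    m->count += ((size_t)n < room) ? (size_t)n : room - 1;  /* clamp */
}

/* Example show() callback: writes every record, then asks for the
 * odd-numbered ones to be discarded. */
static int show_item(struct seq_buf *m, int item)
{
    seq_buf_printf(m, "item %d\n", item);
    return (item % 2) ? SEQ_SKIP : 0;
}

/* Driver loop: remember the fill level before each show() call and
 * roll back to it when show() returns SEQ_SKIP. */
void seq_run(struct seq_buf *m, int nitems)
{
    m->count = 0;
    for (int i = 0; i < nitems; i++) {
        size_t before = m->count;
        if (show_item(m, i) == SEQ_SKIP)
            m->count = before;            /* discard this call's output */
    }
    m->buf[m->count] = '\0';
}
```

Running four items through this loop leaves only "item 0" and "item 2" in the buffer.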

Needless to say, this development series is still young and, as of this writing, the merge window has over a week to run. So there will be a lot more code going into the mainline before the shape of 2.6.26 becomes clear.

Comments (1 posted)

4K stacks by default?

By Jake Edge
April 23, 2008

The kernel stack is a rather important chunk of memory in any Linux system. The unpleasant kernel memory corruption that results from overflowing it is something that is to be avoided at all costs. But the stack is allocated for each process and thread in the system, so those who are looking to reduce memory usage target the 8K stack used by default on x86. In addition, an 8K stack requires two physically contiguous pages (an "order 1" allocation) which can be difficult to satisfy on a running system due to fragmentation.
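A short calculation makes the arithmetic concrete. Assuming 4096-byte pages, as on x86, an allocation's order is the smallest n for which 2^n pages cover the request. The sketch below computes that value along with the stack memory used by a given number of threads. alloc_order() is a hypothetical helper that mirrors what the kernel's get_order() computes.

```c
#include <stddef.h>

/* Smallest order n such that 2^n pages of page_size bytes cover size. */
unsigned int alloc_order(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t covered = page_size;
    while (covered < size) {
        covered <<= 1;
        order++;
    }
    return order;
}

/* Bytes of kernel stack consumed by nthreads threads. */
size_t stack_bytes(size_t nthreads, size_t stack_size)
{
    return nthreads * stack_size;
}
```

With these numbers, an 8K stack needs an order-1 allocation, while a 4K stack needs only order 0. A hypothetical system running 1,000 threads would save 4,096,000 bytes (roughly 4MB) of kernel stack by making the switch.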

Linux has had optional support for 4K stacks for nearly four years now, with Fedora and RHEL enabling it on the kernels they ship, but a recent patch to make it the default for x86 has raised some eyebrows. Andrew Morton sees it as bypassing the normal patch submission process:

This patch will cause kernels to crash.

It has no changelog which explains or justifies the alteration.

afaict the patch was not posted to the mailing list and was not discussed or reviewed.

It is not surprising that patch author Ingo Molnar sees things a little differently:

what mainline kernels crash and how will they crash? Fedora and other distros have had 4K stacks enabled for years [ ... ] and we've conducted tens of thousands of bootup tests with all sorts of drivers and kernel options enabled and have yet to see a single crash due to 4K stacks. So basically the kernel default just follows the common distro default now. (distros and users can still disable it)

As described in an earlier LWN article, the main concerns about only providing 4K for the kernel stack are for complicated storage configurations or for people using NDISwrapper. There is fairly high disdain for the latter case—as it is done to load proprietary Windows drivers into the kernel—but it could lead to a pretty hideous failure in the former. Data corruption certainly seems like a possibility, but, regardless, a kernel crash is definitely not what an administrator wants to have to deal with.

Arjan van de Ven summarized the current state, noting that NDISwrapper really requires 12K stacks, so having 8K only makes it less likely those kernels will crash. The stacking of multiple storage drivers (network filesystems, device mapper, RAID, etc.) is a bigger issue:

we need to know which they are, and then solve them, because even on x86-64 with 8k stacks they can be a problem (just because the stack frames are bigger, although not quite double, there).

Proponents of default 4K stacks seem to be puzzled why there is objection to the change since there have been no problems with Red Hat kernels. But Andi Kleen notes:

One way they do that is by marking significant parts of the kernel unsupported. I don't think that's an option for mainline.

The xfs filesystem, which is not supported in RHEL or Fedora, can potentially use a great deal of stack. This leads some kernel hackers to worry that a complicated configuration that uses it, an "nfs+xfs+md+scsi writeback" configuration as Eric Sandeen puts it, could overflow. Work is already proceeding to reduce the xfs stack usage, but it clearly is a problem that xfs hackers have seen. David Chinner responds to a question about stack overflows:

We see them regularly enough on x86 to know that the first question to any strange crash is "are you using 4k stacks?". In comparison, I have never heard of a single stack overflow on x86_64....

It would seem premature to make 4K stacks the default. There is good reason to believe that folks using xfs could run into problems. But there is a larger issue, one that Morton brought up in his initial message, then reiterated later in the thread:

Anyway. We should be having this sort of discussion _before_ a patch gets merged, no?

The memory savings can be significant, especially in the embedded world. Coupled with the elimination of order-1 allocations each time a process gets created, there is good reason to keep working toward 4K stacks by default. As of this writing, 4K stacks remain the default in Linus's tree, but that could change before long.

Comments (25 posted)

ELC: Morton and Saxena on working with the kernel community

By Jake Edge
April 21, 2008

In many ways, Andrew Morton's keynote set the tone for this year's Embedded Linux Conference (ELC) by describing the ways that embedded companies and developers can work with the kernel community in a way that will be "mutually beneficial". Morton provided reasons, from a purely economic standpoint, why it makes sense for companies to get their code into the mainline kernel. He also provided concrete suggestions on how to make that happen. The theme of the conference seemed to be "working with the community" and Morton's speech provided an excellent example of how and why to do just that.

Conference organizer Tim Bird introduced the keynote as "the main event" for ELC, noting that he often thought of Morton as "kind of like the adult in the room" on linux-kernel. Readers of that mailing list tend to get the impression that there's more than one of him around because of all that he does. He also noted that it was surprising to some that Morton has an embedded Linux background—from his work at Digeo.

Morton believes that embedded development is underrepresented in kernel.org work relative to its economic importance. This is caused by a number of factors, not least the financial constraints under which much embedded development is done. The exception is the chip and board manufacturers, who have a big interest in seeing Linux run well on their hardware so that they can attract more customers. But even they do not contribute as much to kernel development as he would like to see.

An effect of this underrepresentation is a risk that it will tilt kernel development more toward the server and desktop. The kernel team is already accused of being server-centric, and there is some truth to that, "but not as much as one might think". Kernel hackers do care about the desktop as well as embedded devices, but without an advocate for embedded concerns, sometimes things get missed.

Something Morton would like to see is a single full-time "embedded maintainer". That person would serve as the advocate for embedded concerns, ensuring that they didn't get overlooked in the process. An embedded maintainer could make a significant impact for embedded development.

Not all kernel contributions need to be code, he said. There is a need just to hear the problems that are being faced by the embedded community along with lists of things that are missing. "Senior, sophisticated people" are needed to help prioritize the features that are being considered as well. Morton often finds out things he didn't know at conferences, things that he should have known about much earlier: "That's bad!"

Morton is trying to incite the embedded community to interact with the kernel hackers more on linux-kernel. He said that a great way to get the attention of the team is to come onto the mailing list and make them look bad. Unfavorable comparisons to other systems or earlier kernels, for example, especially when backed up with numbers, are noticed quickly. He said that it is important to remember that the person who makes the most noise gets the most attention.

One of the areas that he is most concerned about is the practice of "patch hoarding": holding on to kernel changes as patches without submitting them upstream to the kernel hackers. Hopefully this is only due to a lack of resources, but he has heard that some are doing it to try to gain a competitive advantage. This is simply wrong, he said; companies have a "moral if not legal obligation" to submit those patches.

There are many good reasons for getting code merged upstream that Morton outlined. The code will be better because of the review done by the kernel hackers; once it is done, the maintenance cost falls to near zero as well. He also touted the competitive advantage, noting that getting your code merged means that you have won—competing proposals won't get in. Being the first to merge a feature can make it easier on yourself and harder on your competition.

There are downsides to getting your code upstream as well. Most of those stem from not getting code out there early enough for review. The kernel developers can ask for significant changes to the code especially in the area of user space interfaces. If a company already has lots of code using the new feature and/or interface, it could be very disruptive; "sorry, there's no real fix for that except getting your code out early enough".

Another downside that companies may run into is with competitors being brought into the process. Morton and other kernel hackers will try to find others who might have a stake in a new feature to get them involved so that everybody's needs are taken into account. This can blunt the "win" of getting your feature merged. Some are also concerned that competitors will get access to the code once it has been submitted; "tough luck" Morton said, everything in the kernel is GPL.

Morton had specific suggestions for choosing a kernel version to use for an embedded project. 2.6.24 is not a lot better than 2.4.18 for embedded use, but it has one important feature: the kernel team will be interested in bugs in the current kernel. He suggests starting with the current kernel, upgrading it while development proceeds, freezing it only when it is time to ship the product.

He also suggests that a company create an internal kernel team with one or two people who are the interface to linux-kernel. This will help with name recognition on the mailing list, which will in turn get the company's submitted patches more attention. Over time, by participating and reviewing others' code, the interface people will build up "brownie points" that will allow them to call in favors to get their code reviewed, or to help smooth the path for inclusion.

The kernel.org developers appear to give free support, generally very good support, Morton said, but it is not truly free. Kernel hackers do it as a "mutually beneficial transaction"; they don't do it to make more money for your company, they do it to make the kernel better. Morton is definitely a big part of that, inviting people to email him, especially if "five minutes of my time can save months of yours".

The decision about when to merge a new feature is hard for some to understand. Many consider Linux a dictatorship, which is incorrect; it is instead "a democracy that doesn't vote". The merge decision is made on the model of the "rule of law" with kernel hackers playing the role of judges. Unfortunately, there are few written rules.

Some of the factors that go into his decision about a particular feature are its maintainability, whether there will be an ongoing maintenance team, as well as the general usefulness of the feature. Depending on the size of the feature, an ongoing maintenance team can be the deciding factor. It is not so important for a driver, but a new architecture, for example, needs ongoing maintenance that can only be done by people with knowledge of and access to the hardware.

MontaVista kernel hacker Deepak Saxena gave a presentation entitled "Appropriate Community Practices: Social and Technical Advice" later in the conference that mirrored many of Morton's points. He showed some examples of hardware vendors making bad decisions that got shot down by the kernel developers, mostly because they didn't "release early and release often". There is a dangerous attitude that "it's Linux, it's open source, I can do anything I want" which is true, but won't get you far with the community.

Saxena has high regard for the benefits of working with the system: if your competitor is active in the community, they are getting an advantage that you aren't. Like Morton, he believes that some members of the development team need to get involved in kernel.org activities. "The community is an extension of your team, your team is an extension of the community."

He also has specific advice for hardware vendors: avoid abstraction layers, recognize that your hardware is not unique, and think beyond the reference board implementation. Generalizing your code so that others can use it will make it much more acceptable, as will talking with the developers responsible for the subsystems you are touching. Abstraction layers may be helpful for hardware vendors trying to support multiple operating systems, but they make it difficult for the kernel hackers to understand and maintain the code. The kernel.org folks are not interested in finding and fixing bugs in an abstraction layer.

He also points out additional benefits of getting code merged. Once it is in the kernel, the company's team will no longer have to keep up with kernel releases, updating their patches to follow the latest changes. The code will still need to be maintained, but day-to-day changes will be handled by the kernel.org folks. An additional benefit is that the code will be enhanced by various efforts to automatically find bugs in mainline kernel code with tools like lockdep.

It is clear that the kernel hackers are making a big effort to not only get code from the embedded folks, but also some of their expertise. There are various outreach efforts to try and get more people involved in the Linux development process; these two talks are certainly a part of that. By making it clear that there are benefits to both parties, they hope to make an argument that will reach up from engineering to management resulting in a better kernel for all.

Comments (27 posted)

Integrating and Validating dynticks and Preemptable RCU

April 22, 2008

This article was contributed by Paul McKenney

Introduction

Read-copy update (RCU) is a synchronization mechanism that was added to the Linux kernel in October of 2002. RCU is most frequently described as a replacement for reader-writer locking, but it has also been used in a number of other ways. RCU is notable in that RCU readers do not directly synchronize with RCU updaters, which makes RCU read paths extremely fast, and also permits RCU readers to accomplish useful work even when running concurrently with RCU updaters.
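The reader/updater split can be sketched in userspace with C11 atomics. Readers simply load a shared pointer. An updater copies the current version, modifies the copy, and publishes it with a single pointer store. This is an illustrative sketch under stated assumptions, not the kernel API (which uses rcu_dereference() and rcu_assign_pointer()). It assumes a single updater; concurrent updaters would need a lock among themselves. It also deliberately omits grace-period detection, that is, deciding when the old version may safely be freed. That is the hard part, and it is the subject of the rest of this article.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct config {
    int timeout;
    int retries;
};

/* Shared pointer that readers follow without taking any lock. */
static _Atomic(struct config *) current_config = NULL;

/* Reader: one acquire load, no lock, no writes to shared memory. */
int read_timeout(void)
{
    struct config *c = atomic_load_explicit(&current_config,
                                            memory_order_acquire);
    return c ? c->timeout : -1;
}

/* Updater: copy, update the copy, then publish it with a release store
 * so that readers always see a fully initialized structure. The old
 * version is handed back through *old_out; in real RCU it must not be
 * freed until a grace period has elapsed. Returns 0, or -1 on failure. */
int update_timeout(int timeout, struct config **old_out)
{
    struct config *old = atomic_load_explicit(&current_config,
                                              memory_order_acquire);
    struct config *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    *copy = old ? *old : (struct config){ .timeout = 0, .retries = 0 };
    copy->timeout = timeout;                                  /* update */
    atomic_store_explicit(&current_config, copy, memory_order_release);
    *old_out = old;
    return 0;
}
```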

In early 2008, a preemptable variant of RCU was accepted into mainline Linux in support of real-time workloads, a variant similar to the RCU implementations in the -rt patchset since August 2005. Preemptable RCU is needed for real-time workloads because older RCU implementations disable preemption across RCU read-side critical sections, resulting in excessive real-time latencies.

However, one disadvantage of the -rt implementation was that each grace period required work to be done on each CPU, even if that CPU was in a low-power “dynticks-idle” state, and thus incapable of executing RCU read-side critical sections. The idea behind the dynticks-idle state is that idle CPUs should be physically powered down in order to conserve energy. In short, preemptable RCU can disable a valuable energy-conservation feature of recent Linux kernels. Although Josh Triplett and Paul McKenney had discussed some approaches for allowing CPUs to remain in low-power state throughout an RCU grace period (thus preserving the Linux kernel's ability to conserve energy), matters did not come to a head until Steve Rostedt integrated a new dyntick implementation with preemptable RCU in the -rt patchset.

This combination caused one of Steve's systems to hang on boot, so in October, Paul coded up a dynticks-friendly modification to preemptable RCU's grace-period processing. Steve coded up rcu_irq_enter() and rcu_irq_exit() interfaces called from the irq_enter() and irq_exit() interrupt entry/exit functions. These rcu_irq_enter() and rcu_irq_exit() functions are needed to allow RCU to reliably handle situations where a dynticks-idle CPU is momentarily powered up for an interrupt handler containing RCU read-side critical sections. With these changes in place, Steve's system booted reliably, but Paul continued inspecting the code periodically on the assumption that we could not possibly have gotten the code right on the first try.

Paul reviewed the code repeatedly from October 2007 to February 2008, and almost always found at least one bug. In one case, Paul even coded and tested a fix before realizing that the bug was illusory, and in fact, in every case, the “bug” turned out to be illusory.

Near the end of February, Paul grew tired of this game. He therefore decided to enlist the aid of Promela and Spin, as described in the LWN article Using Promela and Spin to verify parallel algorithms. This article presents a series of seven increasingly realistic Promela models, the last of which passes, consuming about 40GB of main memory for the state space.

Quick Quiz 1: Yeah, that's great!!! Now, just what am I supposed to do if I don't happen to have a machine with 40GB of main memory???

More important, Promela and Spin did find a very subtle bug for me!!!

This article is organized as follows:

  1. Introduction to Preemptable RCU and dynticks
    1. Task Interface
    2. Interrupt Interface
    3. Grace-Period Interface
  2. Validating Preemptable RCU and dynticks
    1. Basic Model
    2. Validating Safety
    3. Validating Liveness
    4. Interrupts
    5. Validating Interrupt Handlers
    6. Validating Nested Interrupt Handlers
    7. Validating NMI Handlers

These sections are followed by conclusions and answers to the Quick Quizzes.

Introduction to Preemptable RCU and dynticks

The per-CPU dynticks_progress_counter variable is central to the interface between dynticks and preemptable RCU. This variable has an even value whenever the corresponding CPU is in dynticks-idle mode, and an odd value otherwise. A CPU exits dynticks-idle mode for the following three reasons:

  1. to start running a task,
  2. when entering the outermost of a possibly nested set of interrupt handlers, and
  3. when entering an NMI handler.

Preemptable RCU's grace-period machinery samples the value of the dynticks_progress_counter variable in order to determine when a dynticks-idle CPU may safely be ignored.

The following three sections give an overview of the task interface, the interrupt/NMI interface, and the use of the dynticks_progress_counter variable by the grace-period machinery.

Task Interface

When a given CPU enters dynticks-idle mode because it has no more tasks to run, it invokes rcu_enter_nohz():

  1 static inline void rcu_enter_nohz(void)
  2 {
  3   mb();
  4   __get_cpu_var(dynticks_progress_counter)++;
  5   WARN_ON(__get_cpu_var(dynticks_progress_counter) & 0x1);
  6 }

This function simply increments dynticks_progress_counter and checks that the result is even, but first executes a memory barrier to ensure that any other CPU that sees the new value of dynticks_progress_counter will also see the completion of any prior RCU read-side critical sections.

Similarly, when a CPU that is in dynticks-idle mode prepares to start executing a newly runnable task, it invokes rcu_exit_nohz():

  1 static inline void rcu_exit_nohz(void)
  2 {
  3   __get_cpu_var(dynticks_progress_counter)++;
  4   mb();
  5   WARN_ON(!(__get_cpu_var(dynticks_progress_counter) & 0x1));
  6 }

This function again increments dynticks_progress_counter, but follows it with a memory barrier to ensure that if any other CPU sees the result of any subsequent RCU read-side critical section, then that other CPU will also see the incremented value of dynticks_progress_counter. Finally, rcu_exit_nohz() checks that the result of the increment is an odd value.

The rcu_enter_nohz() and rcu_exit_nohz() functions handle the case where a CPU enters and exits dynticks-idle mode due to task execution, but do not handle interrupts, which are covered in the following section.
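To make the even/odd convention concrete, here is a minimal user-space sketch of the two functions above for a single simulated CPU. It assumes that a plain global can stand in for the per-CPU variable, that C11's atomic_thread_fence(memory_order_seq_cst) can stand in for mb(), and that assert() can stand in for WARN_ON(). The sim_ prefix marks the functions as hypothetical illustrations, not kernel code, and starting the counter at zero (dynticks-idle) is likewise an assumption.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical single-CPU stand-in for the per-CPU variable.
 * Even value: dynticks-idle.  Odd value: not idle.
 * Assumed to start out idle.
 */
long dynticks_progress_counter = 0;

/* Sketch of rcu_enter_nohz(): CPU has no more tasks and goes idle. */
void sim_rcu_enter_nohz(void)
{
	/* Stand-in for mb(): order prior read-side critical sections
	 * before the increment becomes visible. */
	atomic_thread_fence(memory_order_seq_cst);
	dynticks_progress_counter++;
	/* Stand-in for WARN_ON(): we must now be idle (even). */
	assert((dynticks_progress_counter & 0x1) == 0);
}

/* Sketch of rcu_exit_nohz(): idle CPU starts running a task. */
void sim_rcu_exit_nohz(void)
{
	dynticks_progress_counter++;
	/* Stand-in for mb(): order the increment before any later
	 * read-side critical sections. */
	atomic_thread_fence(memory_order_seq_cst);
	/* Stand-in for WARN_ON(): we must now be non-idle (odd). */
	assert((dynticks_progress_counter & 0x1) == 1);
}
```

Calling sim_rcu_exit_nohz() and then sim_rcu_enter_nohz() should leave the counter at 2, even again. Calling either function twice in a row trips the corresponding assertion, much as it would trip the WARN_ON() in the kernel code.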

Interrupt Interface

The rcu_irq_enter() and rcu_irq_exit() functions handle interrupt/NMI entry and exit, respectively. Of course, nested interrupts must also be properly accounted for. The possibility of nested interrupts is handled by a second per-CPU variable, rcu_update_flag, which is incremented upon entry to an interrupt or NMI handler (in rcu_irq_enter()) and is decremented upon exit (in rcu_irq_exit()). In addition, the pre-existing in_interrupt() primitive is used to distinguish between an outermost and a nested interrupt/NMI.

Interrupt entry is handled by rcu_irq_enter(), shown below:

  1 void rcu_irq_enter(void)
  2 {
  3   int cpu = smp_processor_id();
  4 
  5   if (per_cpu(rcu_update_flag, cpu))
  6     per_cpu(rcu_update_flag, cpu)++;
  7   if (!in_interrupt() &&
  8       (per_cpu(dynticks_progress_counter, cpu) & 0x1) == 0) {
  9     per_cpu(dynticks_progress_counter, cpu)++;
 10     smp_mb();
 11     per_cpu(rcu_update_flag, cpu)++;
 12   }
 13 }

Quick Quiz 2: Why not simply increment rcu_update_flag, and then only increment dynticks_progress_counter if the old value of rcu_update_flag was zero???

Quick Quiz 3: But if line 7 finds that we are the outermost interrupt, wouldn't we always need to increment dynticks_progress_counter?

Line 3 fetches the current CPU's number, while lines 5 and 6 increment the rcu_update_flag nesting counter if it is already non-zero. Lines 7 and 8 check to see whether we are the outermost level of interrupt, and, if so, whether dynticks_progress_counter needs to be incremented. If so, line 9 increments dynticks_progress_counter, line 10 executes a memory barrier, and line 11 increments rcu_update_flag. As with rcu_exit_nohz(), the memory barrier ensures that any other CPU that sees the effects of an RCU read-side critical section in the interrupt handler (following the rcu_irq_enter() invocation) will also see the increment of dynticks_progress_counter.

Interrupt exit is handled similarly by rcu_irq_exit():

  1 void rcu_irq_exit(void)
  2 {
  3   int cpu = smp_processor_id();
  4 
  5   if (per_cpu(rcu_update_flag, cpu)) {
  6     if (--per_cpu(rcu_update_flag, cpu))
  7       return;
  8     WARN_ON(in_interrupt());
  9     smp_mb();
 10     per_cpu(dynticks_progress_counter, cpu)++;
 11     WARN_ON(per_cpu(dynticks_progress_counter, cpu) & 0x1);
 12   }
 13 }

Line 3 fetches the current CPU's number, as before. Line 5 checks to see if rcu_update_flag is non-zero, returning immediately (via falling off the end of the function) if not. Otherwise, lines 6 through 11 come into play. Line 6 decrements rcu_update_flag, returning if the result is not zero. Line 8 verifies that we are indeed leaving the outermost level of nested interrupts, line 9 executes a memory barrier, line 10 increments dynticks_progress_counter, and line 11 verifies that this variable is now even. As with rcu_enter_nohz(), the memory barrier ensures that any other CPU that sees the increment of dynticks_progress_counter will also see the effects of an RCU read-side critical section in the interrupt handler (preceding the rcu_irq_exit() invocation).
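The nesting logic is easiest to follow by tracing it. The following user-space sketch models one CPU: plain globals replace the per-CPU variables, a hypothetical irq_depth counter stands in for the hardirq count that in_interrupt() tests, and hypothetical sim_irq_enter()/sim_irq_exit() wrappers call the RCU hooks before raising and after lowering that depth. That ordering is an assumption, chosen to be consistent with the in_interrupt() checks in the two functions above; the real irq_enter() and irq_exit() do considerably more.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical single-CPU stand-ins for the per-CPU variables. */
long dynticks_progress_counter = 0; /* even: CPU starts dynticks-idle */
int rcu_update_flag = 0;            /* interrupt nesting seen by RCU */
int irq_depth = 0;                  /* stand-in for the hardirq count */

static int sim_in_interrupt(void)
{
	return irq_depth > 0;
}

/* Transcription of rcu_irq_enter() for one CPU. */
void sim_rcu_irq_enter(void)
{
	if (rcu_update_flag)
		rcu_update_flag++;           /* nested: just count it */
	if (!sim_in_interrupt() &&
	    (dynticks_progress_counter & 0x1) == 0) {
		dynticks_progress_counter++; /* leave dynticks-idle: now odd */
		atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
		rcu_update_flag++;
	}
}

/* Transcription of rcu_irq_exit() for one CPU. */
void sim_rcu_irq_exit(void)
{
	if (rcu_update_flag) {
		if (--rcu_update_flag)
			return;              /* still inside an outer handler */
		assert(!sim_in_interrupt());
		atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
		dynticks_progress_counter++; /* back to dynticks-idle: even */
		assert((dynticks_progress_counter & 0x1) == 0);
	}
}

/* Assumed ordering: RCU hook runs before the depth is raised... */
void sim_irq_enter(void)
{
	sim_rcu_irq_enter();
	irq_depth++;
}

/* ...and after the depth is lowered. */
void sim_irq_exit(void)
{
	irq_depth--;
	sim_rcu_irq_exit();
}
```

Starting from dynticks-idle (counter 0), an outer interrupt moves the counter to 1 and rcu_update_flag to 1, a nested interrupt only bumps rcu_update_flag to 2, and the counter returns to an even value (2) only when the outermost handler exits. If the counter is already odd when an interrupt arrives, neither variable changes.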

These two sections have described how the dynticks_progress_counter variable is maintained during entry to and exit from dynticks-idle mode, both by tasks and by interrupts and NMIs. The following section describes how this variable is used by preemptable RCU's grace-period machinery.

Grace-Period Interface

Of the four preemptable RCU grace-period states shown below (taken from The Design of Preemptable Read-Copy Update), only the rcu_try_flip_waitack_state() and rcu_try_flip_waitmb_state() states need to wait for other CPUs to respond.

[Figure: Preemptable RCU State Diagram]

Of course, if a given CPU is in dynticks-idle state, we shouldn't wait for it. Therefore, just before entering one of these two states, the preceding state takes a snapshot of each CPU's dynticks_progress_counter variable, placing the snapshot in another per-CPU variable, rcu_dyntick_snapshot. This is accomplished by invoking dyntick_save_progress_counter(), shown below:

  1 static void dyntick_save_progress_counter(int cpu)
  2 {
  3   per_cpu(rcu_dyntick_snapshot, cpu) =
  4     per_cpu(dynticks_progress_counter, cpu);
  5 }

The rcu_try_flip_waitack_state() state invokes rcu_try_flip_waitack_needed(), shown below:

  1 static inline int
  2 rcu_try_flip_waitack_needed(int cpu)
  3 {
  4   long curr;
  5   long snap;
  6 
  7   curr = per_cpu(dynticks_progress_counter, cpu);
  8   snap = per_cpu(rcu_dyntick_snapshot, cpu);
  9   smp_mb(); /* force ordering with cpu entering/leaving dynticks. */
 10   if ((curr == snap) && ((curr & 0x1) == 0))
 11     return 0;
 12   if ((curr - snap) > 2 || (snap & 0x1) == 0)
 13     return 0;
 14   return 1;
 15 }

Lines 7 and 8 pick up the current and snapshot versions of dynticks_progress_counter, respectively. The memory barrier on line 9 ensures that the counter checks in the later rcu_try_flip_waitzero_state follow the fetches of these counters. Lines 10 and 11 return zero (meaning no communication with the specified CPU is required) if that CPU has remained in dynticks-idle state since the time that the snapshot was taken. Similarly, lines 12 and 13 return zero if that CPU was initially in dynticks-idle state or if it has completely passed through a dynticks-idle state. In both these cases, there is no way that the CPU could have retained the old value of the grace-period counter. If neither of these conditions holds, line 14 returns one, meaning that the CPU needs to explicitly respond.
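Once the two loads are done, rcu_try_flip_waitack_needed() is a pure decision on two counter values, so it can be exercised directly. This sketch drops the per-CPU accesses and the smp_mb(), a simplification that only makes sense for single-threaded illustration, and takes curr and snap as arguments. The name waitack_needed() is hypothetical, and counter wraparound is ignored.

```c
#include <assert.h>

/*
 * Decision logic of rcu_try_flip_waitack_needed(), with the counter
 * values passed in.  Returns 1 if the CPU must acknowledge the
 * counter flip, 0 if it can safely be ignored.
 */
int waitack_needed(long curr, long snap)
{
	/* The counter only ever increases (wraparound ignored here). */
	assert(curr >= snap);

	/* Still in the same dynticks-idle period as at snapshot time. */
	if ((curr == snap) && ((curr & 0x1) == 0))
		return 0;

	/* Idle at snapshot time, or has since completely passed
	 * through a dynticks-idle period. */
	if ((curr - snap) > 2 || (snap & 0x1) == 0)
		return 0;

	/* Otherwise the CPU must respond explicitly. */
	return 1;
}
```

For example, waitack_needed(4, 4) returns 0 because the CPU has stayed idle, waitack_needed(7, 4) returns 0 because the snapshot was even, waitack_needed(8, 5) returns 0 because the counter advanced by more than two, and waitack_needed(5, 5) returns 1 because the CPU has been running the whole time.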

For its part, the rcu_try_flip_waitmb_state state invokes rcu_try_flip_waitmb_needed(), shown below:

  1 static inline int
  2 rcu_try_flip_waitmb_needed(int cpu)
  3 {
  4   long curr;
  5   long snap;
  6 
  7   curr = per_cpu(dynticks_progress_counter, cpu);
  8   snap = per_cpu(rcu_dyntick_snapshot, cpu);
  9   smp_mb(); /* force ordering with cpu entering/leaving dynticks. */
 10   if ((curr == snap) && ((curr & 0x1) == 0))
 11     return 0;
 12   if (curr != snap)
 13     return 0;
 14   return 1;
 15 }

This is quite similar to rcu_try_flip_waitack_needed(), the difference being in lines 12 and 13: any transition either to or from dynticks-idle state executes the memory barrier needed by the rcu_try_flip_waitmb_state() state, so any change at all in dynticks_progress_counter since the snapshot means that no explicit response is needed.
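The same treatment works for rcu_try_flip_waitmb_needed(). As before, this is a hypothetical pure-function sketch (waitmb_needed() is not a kernel name) with the per-CPU loads and smp_mb() replaced by arguments and wraparound ignored.

```c
#include <assert.h>

/*
 * Decision logic of rcu_try_flip_waitmb_needed(), with the counter
 * values passed in.  Returns 1 if the CPU must execute a memory
 * barrier on its own, 0 if it can safely be ignored.
 */
int waitmb_needed(long curr, long snap)
{
	/* The counter only ever increases (wraparound ignored here). */
	assert(curr >= snap);

	/* Idle for the whole interval: no barrier needed. */
	if ((curr == snap) && ((curr & 0x1) == 0))
		return 0;

	/* Any transition to or from dynticks-idle executed a barrier. */
	if (curr != snap)
		return 0;

	/* Running the whole time: the CPU must respond itself. */
	return 1;
}
```

Here waitmb_needed(6, 5) and waitmb_needed(5, 4) both return 0 because the counter changed, implying at least one transition and thus a memory barrier; waitmb_needed(4, 4) returns 0 because the CPU stayed idle; and only waitmb_needed(5, 5) returns 1.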

Quick Quiz 4: Can you spot any bugs in any of the code in this section?

We now have seen all the code involved in the interface between RCU and the dynticks-idle state. The next section builds up the Promela model used to validate this code.

Validating Preemptable RCU and dynticks

This section develops a Promela model for the interface between dynticks and RCU step by step, with each of the following sections illustrating one step, starting with the process-level code, adding assertions, interrupts, and finally NMIs.

Basic Model

This section translates the process-level dynticks entry/exit code and the grace-period processing into Promela. We start with rcu_exit_nohz() and rcu_enter_nohz() from the 2.6.25-rc4 kernel, placing these in a single Promela process that models exiting and entering dynticks-idle mode in a loop as follows:

  1 proctype dyntick_nohz()
  2 {
  3   byte tmp;
  4   byte i = 0;
  5 
  6   do
  7   :: i >= MAX_DYNTICK_LOOP_NOHZ -> break;
  8   :: i < MAX_DYNTICK_LOOP_NOHZ ->
  9     tmp = dynticks_progress_counter;
 10     atomic {
 11       dynticks_progress_counter = tmp + 1;
 12       assert((dynticks_progress_counter & 1) == 1);
 13     }
 14     tmp = dynticks_progress_counter;
 15     atomic {
 16       dynticks_progress_counter = tmp + 1;
 17       assert((dynticks_progress_counter & 1) == 0);
 18     }
 19     i++;
 20   od;
 21 }

Lines 6 and 20 define a loop. Line 7 exits the loop once the loop counter i has reached the limit MAX_DYNTICK_LOOP_NOHZ. Line 8 tells the loop construct to execute lines 9-19 for each pass through the loop. Because the conditionals on lines 7 and 8 are mutually exclusive, Promela's normal random selection among true conditions never comes into play. Lines 9 and 11 model rcu_exit_nohz()'s non-atomic increment of dynticks_progress_counter, while line 12 models the WARN_ON(). The atomic construct simply reduces the Promela state space, given that the WARN_ON() is not, strictly speaking, part of the algorithm. Lines 14-18 similarly model the increment and WARN_ON() for rcu_enter_nohz(). Finally, line 19 increments the loop counter.

Quick Quiz 5: Why isn't the memory barrier in rcu_exit_nohz() and rcu_enter_nohz() modeled in Promela?

Quick Quiz 6: Isn't it a bit strange to model rcu_exit_nohz() followed by rcu_enter_nohz()? Wouldn't it be more natural to instead model entry before exit?

Each pass through the loop therefore models a CPU exiting dynticks-idle mode (for example, starting to execute a task), then re-entering dynticks-idle mode (for example, that same task blocking).
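Readers more at home in C than in Promela may find it helpful to see dyntick_nohz() rendered as sequential C. This sketch runs the process on its own, so it shows only what the loop computes, not the interleavings Spin explores. The bound of 3 for MAX_DYNTICK_LOOP_NOHZ and the name dyntick_nohz_sequential() are assumptions for illustration.

```c
#include <assert.h>

/* Assumed loop bound for illustration; the model leaves it as a macro. */
#define MAX_DYNTICK_LOOP_NOHZ 3

long dynticks_progress_counter = 0;

/*
 * Sequential C analogue of the dyntick_nohz() Promela process.  The
 * separate load into tmp and store of tmp + 1 mirror the model's split
 * of each increment into two steps, between which Spin is free to
 * schedule another process.  Here nothing else runs.
 */
void dyntick_nohz_sequential(void)
{
	long tmp;
	int i;

	for (i = 0; i < MAX_DYNTICK_LOOP_NOHZ; i++) {
		/* rcu_exit_nohz(): non-atomic increment, then WARN_ON(). */
		tmp = dynticks_progress_counter;
		dynticks_progress_counter = tmp + 1;
		assert((dynticks_progress_counter & 1) == 1);

		/* rcu_enter_nohz(): non-atomic increment, then WARN_ON(). */
		tmp = dynticks_progress_counter;
		dynticks_progress_counter = tmp + 1;
		assert((dynticks_progress_counter & 1) == 0);
	}
}
```

After one call, the counter should equal 2 * MAX_DYNTICK_LOOP_NOHZ, that is, 6 with the bound assumed here, and every intermediate value has the parity the assertions demand.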

The next step is to model the interface to RCU's grace-period processing. For this, we need to model dyntick_save_progress_counter(), rcu_try_flip_waitack_needed(), rcu_try_flip_waitmb_needed(), as well as portions of rcu_try_flip_waitack() and rcu_try_flip_waitmb(), all from the 2.6.25-rc4 kernel. The following grace_period() Promela process models these functions as they would be invoked during a single pass through preemptable RCU's grace-period processing.

  1 proctype grace_period()
  2 {
  3   byte curr;
  4   byte snap;
  5 
  6   atomic {
  7     printf("MAX_DYNTICK_LOOP_NOHZ = %d\n", MAX_DYNTICK_LOOP_NOHZ);
  8     snap = dynticks_progress_counter;
  9   }
 10   do
 11   :: 1 ->
 12     atomic {
 13       curr = dynticks_progress_counter;
 14       if
 15       :: (curr == snap) && ((curr & 1) == 0) ->
 16         break;
 17       :: (curr - snap) > 2 || (snap & 1) == 0 ->
 18         break;
 19       :: 1 -> skip;
 20       fi;
 21     }
 22   od;
 23   snap = dynticks_progress_counter;
 24   do
 25   :: 1 ->
 26     atomic {
 27       curr = dynticks_progress_counter;
 28       if
 29       :: (curr == snap) && ((curr & 1) == 0) ->
 30         break;
 31       :: (curr != snap) ->
 32         break;
 33       :: 1 -> skip;
 34       fi;
 35     }
 36   od;
 37 }

Lines 6-9 print out the loop limit (but only into the .trail file in case of error) and model a line of code from rcu_try_flip_idle() and its call to dyntick_save_progress_counter(), which takes a snapshot of the current CPU's dynticks_progress_counter variable. These two lines are executed atomically to reduce state space.

Lines 10-22 model the relevant code in rcu_try_flip_waitack() and its call to rcu_try_flip_waitack_needed(). This loop is modeling the grace-period state machine waiting for a counter-flip acknowledgment from each CPU, but only that part that interacts with dynticks-idle CPUs.

Line 23 models a line from rcu_try_flip_waitzero() and its call to dyntick_save_progress_counter(), again taking a snapshot of the CPU's dynticks_progress_counter variable.

Finally, lines 24-36 model the relevant code in rcu_try_flip_waitmb() and its call to rcu_try_flip_waitmb_needed(). This loop is modeling the grace-period state machine waiting for each CPU to execute a memory barrier, but again only that part that interacts with dynticks-idle CPUs.

Quick Quiz 7: Wait a minute! In the Linux kernel, both dynticks_progress_counter and rcu_dyntick_snapshot are per-CPU variables. So why are they instead being modeled as single global variables?

The resulting model, when run with the runspin.sh script, generates 691 states and passes without errors, which is not at all surprising given that it completely lacks the assertions that could find failures. The next section therefore adds safety assertions.

Validating Safety

A safe RCU implementation must never permit a grace period to complete before the completion of any RCU readers that started before the start of the grace period. This is modeled by a grace_period_state variable that can take on three states as follows:

  1 #define GP_IDLE    0
  2 #define GP_WAITING  1
  3 #define GP_DONE    2
  4 byte grace_period_state = GP_DONE;

The grace_period() process sets this variable as it progresses through the grace-period phases, as shown below:

  1 proctype grace_period()
  2 {
  3   byte curr;
  4   byte snap;
  5 
  6   grace_period_state = GP_IDLE;
  7   atomic {
  8     printf("MAX_DYNTICK_LOOP_NOHZ = %d\n", MAX_DYNTICK_LOOP_NOHZ);
  9     snap = dynticks_progress_counter;
 10     grace_period_state = GP_WAITING;
 11   }
 12   do
 13   :: 1 ->
 14     atomic {
 15       curr = dynticks_progress_counter;
 16       if
 17       :: (curr == snap) && ((curr & 1) == 0) ->
 18         break;
 19       :: (curr - snap) > 2 || (snap & 1) == 0 ->
 20         break;
 21       :: 1 -> skip;
 22       fi;
 23     }
 24   od;
 25   grace_period_state = GP_DONE;
 26   grace_period_state = GP_IDLE;
 27   atomic {
 28     snap = dynticks_progress_counter;
 29     grace_period_state = GP_WAITING;
 30   }
 31   do
 32   :: 1 ->
 33     atomic {
 34       curr = dynticks_progress_counter;
 35       if
 36       :: (curr == snap) && ((curr & 1) == 0) ->
 37         break;
 38       :: (curr != snap) ->
 39         break;
 40       :: 1 -> skip;
 41       fi;
 42     }
 43   od;
 44   grace_period_state = GP_DONE;
 45 }

Quick Quiz 8: Given there are a pair of back-to-back changes to grace_period_state on lines 25 and 26, how can we be sure that line 25's changes won't be lost?

Lines 6, 10, 25, 26, 29, and 44 update this variable (combining atomically with algorithmic operations where feasible) to allow the dyntick_nohz() process to validate the basic RCU safety property. The form of this validation is to assert that the value of the grace_period_state variable cannot jump from GP_IDLE to GP_DONE during a time period over which RCU readers could plausibly persist.
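The safety property above reduces to a simple predicate over the value of grace_period_state seen at the start and at the end of a period during which RCU readers could persist. The sketch below states it as a C function; interval_is_safe() is a hypothetical name, and the inputs in the usage note are synthetic examples, not traces produced by Spin.

```c
#include <assert.h>

#define GP_IDLE    0
#define GP_WAITING 1
#define GP_DONE    2

/*
 * Returns 1 unless grace_period_state was GP_IDLE when the interval
 * began and GP_DONE when it ended, that is, unless a grace period
 * that had not yet started could have completed while readers
 * persisted.
 */
int interval_is_safe(int gp_state_at_start, int gp_state_at_end)
{
	int old_gp_idle;

	assert(gp_state_at_start >= GP_IDLE && gp_state_at_start <= GP_DONE);
	assert(gp_state_at_end >= GP_IDLE && gp_state_at_end <= GP_DONE);

	old_gp_idle = (gp_state_at_start == GP_IDLE);
	return !old_gp_idle || gp_state_at_end != GP_DONE;
}
```

With this predicate, interval_is_safe(GP_IDLE, GP_WAITING) is 1, since a grace period that started during the interval has not yet finished; interval_is_safe(GP_IDLE, GP_DONE) is 0, the jump the model must never permit; and any interval that begins outside GP_IDLE is accepted.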

The dyntick_nohz() Promela process implements this validation as shown below:

  1 proctype dyntick_nohz()
  2 {
  3   byte tmp;
  4   byte i = 0;
  5   bit old_gp_idle;
  6 
  7   do
  8   :: i >= MAX_DYNTICK_LOOP_NOHZ -> break;
  9   :: i < MAX_DYNTICK_LOOP_NOHZ ->
 10     tmp = dynticks_progress_counter;
 11     atomic {
 12       dynticks_progress_counter = tmp + 1;
 13       old_gp_idle = (grace_period_state == GP_IDLE);
 14       assert((dynticks_progress_counter & 1) == 1);
 15     }
 16     atomic {
 17       tmp = dynticks_progress_counter;
 18       assert(!old_gp_idle || grace_period_state != GP_DONE);
 19     }
 20     atomic {
 21       dynticks_progress_counter = tmp + 1;
 22       assert((dynticks_progress_counter & 1) == 0);
 23     }
 24     i++;
 25   od;
 26 }

Line 13 sets a new old_gp_idle flag if the value of the grace_period_state variable is GP_IDLE at the beginning of task execution, and the assertion at line 18 fires if the grace_period_state variable has advanced to GP_DONE during task execution, which wo