
Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.35-rc6, released on July 22. Linus says:

I actually hope/think that this is going to be the last -rc. Things have been pretty quiet, and while this -rc has more commits than -rc5 had, it's not by a large amount, nor does it look scary to me. So there doesn't seem to be any point in dragging out the release any more, unless we find something new that calls for it.

It contains mostly fixes, but also a rename of the logical memory block (LMB) subsystem to "memblock." See the announcement for the short-form changelog, or the full changelog for all the details.

There have been no stable updates over the last week.


Quotes of the week

One of the primary reasons why I started the kernel summit ten years ago was because I've found that people work better after they have had a chance to meet each other face to face. If you only know someone via e-mail, it's lot easier to get into flame wars. But after you've met someone, broken bread and drunk beer with them, it's easier to work with them as a colleague and fellow developer. While the Linux Kernel development community has grown significantly since March, 2001, this principle still holds true.
-- Ted Ts'o

This is one reason why I wrote the ARM Linux kernel booting document some 8 years ago, which specifies the _minimum_ of information that a boot loader needs to supply the kernel needs to be able to boot. Fat lot of good that did - as far as I'm concerned, writing documentation is a total and utter waste of my time and resources. It just gets ignored.

So I now just don't bother with any documentation _at_ _all_.

-- Russell King

My gut reaction to this sort of thing is "run away in terror". It encourages kernel developers to operate like lackadaisical userspace developers and to assume that underlying code can perform heroic and immortal feats. But it can't. This is the kernel and the kernel is a tough and hostile place and callers should be careful and defensive and take great efforts to minimise the strain they put upon other systems.
-- Andrew Morton


Kernel Summit 2010 planning process begins

The 2010 Kernel Summit will be held in Cambridge, Massachusetts, on November 1 and 2, immediately prior to the Linux Plumbers Conference. The planning process for this year's summit has begun, and the program committee is looking for ideas on what should be discussed and who should be there. "The kernel summit is organized by a program committee, but it could just as easily said that it is organized by the whole Linux Kernel development community. Which is to say, its goals are to make Linux kernel development flow more smoothly, and what we talk about is driven by the work that is going on in the development community at large. So to that end, we need your help!"


Kernel development news

File creation times

By Jonathan Corbet
July 26, 2010
Linux systems, like the Unix systems that came before, maintain three different timestamps for each file. The semantics of those timestamps are often surprising to users, though, and they don't provide the information that users often want to know. The possible addition of a new system call is giving kernel developers the opportunity to make some changes in this area, but there is not, yet, a consensus on how that should be done.

The Unix file timestamps, as long-since enshrined by POSIX, are called "atime," "ctime," and "mtime." The atime stamp is meant to record the last time that the file was accessed. This information is almost never used, though, and can be quite expensive to maintain; Ingo Molnar once called atime "perhaps the most stupid Unix design idea of all times". So atime is often disabled on contemporary systems or, at least, rolled back to the infrequently-updated "relatime" mode. Mtime, instead, makes a certain amount of sense; it tells the user when the file was last modified. Modification requires writing to the file anyway, so updating this time is often free, and the information is often useful.

That leaves ctime, which is a bit of a strange beast. Users who do not look deeply are likely to interpret ctime as "creation time," but that is not what is stored there; ctime, instead, is updated whenever a file's metadata is changed. The main consumer of this information, apparently, is the venerable dump utility, which likes to know that a file's metadata has changed (so that information must be saved in an incremental backup), but the file data itself has not and need not be saved again. The number of dump users has certainly fallen over the years, to the point that the biggest role played by ctime is, arguably, confusing users who really just want a file's creation time.

So where do users find the creation time? They don't: Linux systems do not store that time and provide no interface for applications to access it.

That situation could change, though. Some newer filesystems (Btrfs and ext4, for example) have been designed with space for file creation times. Other operating systems also provide this information, and some network filesystem protocols expect to have access to it. So it would be nice if Linux properly supported file creation times; the proposed addition of the xstat() system call would be the ideal time to make that change.

Current xstat() implementations do, in fact, add a st_btime field to struct xstat; the "b" stands for "birth," which is a convention established in the BSD camp. There has been a fair amount of discussion about that addition, though, based on naming and semantics.

The naming issue, one would think, would be relatively straightforward. It was pointed out, though, that other names have been used in the kernel. JFS and Btrfs use "otime," for some reason, while ext4 uses "crtime." And BSD, it turns out, uses "birthtime" instead of "btime." That discussion inspired Linus to exclaim:

Oh wow. And all of this just convinces me that we should _not_ do any of this, since clearly it's all totally useless and people can't even agree on a name.

After that, though, Linus looked a bit more deeply at the problem, which he saw primarily as providing a Windows-style creation time that Samba could use. It turns out that Windows allows the creation time to be modified, so Linus saw it as a sort of variation on the Unix ctime notion. That led to a suggestion to change the semantics of ctime to better suit the Windows case. After all, almost nobody uses ctime anyway, and it would be a trivial change to make ctime look like the Windows creation time. This behavior could be specified either as a per-process flag or a mount-time option; then there would be no need to add a new time field.

This idea was not wildly popular, though; Jeremy Allison said it would lead to "more horrible confusion". If ctime could mean different things in different situations, even fewer people would really understand it, and tools like Samba could not count on its semantics. Jeremy would rather just see the new field added; that seems like the way things will probably go.

There is one last interesting question, though: should the kernel allow the creation time to be modified? Windows does allow modification, and some applications evidently depend on that feature. Windows also apparently has a hack which, if a file is deleted and replaced by another with the same name, will reuse the older file's creation time. BSD systems, instead, do not allow the creation time to be changed. When Samba is serving files from a BSD system, it stores the "Windows creation time" in an extended attribute so that the usual Windows semantics can be provided.

If the current xstat() patch is merged, Linux will disallow changes to the creation time by default - there will be no system call which can make that change. Providing that capability would require an extended version of utimes() which can accept the additional information. Allowing the time to be changed would make it less reliable, but it would also be useful for backup/restore programs which want to restore the original creation time. That is a discussion which has not happened yet, though; for now, creation times cannot be changed.


zcache: a compressed page cache

By Jonathan Corbet
July 27, 2010
Last year, Nitin Gupta was pushing the compcache patch, which implemented a sort of swap device which stored pages in main memory, compressing them on the way. Over time, compcache became "ramzswap" and found its way into the staging tree. It's not clear that ramzswap can ever graduate to the mainline kernel, so Nitin is trying again with a development called zcache. But zcache, too, currently lacks a clear path into the mainline.

Like its predecessors, zcache lives to store compressed copies of pages in memory. It no longer looks like a swap device, though; instead, it is set up as a backing store provider for the Cleancache framework. Cleancache uses a set of hooks into the page cache and filesystem code; when a page is evicted from the cache, it is passed to Cleancache, which might (or might not) save a copy somewhere. When pages are needed again, Cleancache gets a chance to restore them before the kernel reads them from disk. If Cleancache (and its backing store) is able to quickly save and restore pages, the potential exists for a real improvement in system performance.

Zcache uses LZO to compress pages passed to it by Cleancache; only pages which compress to less than half their original size are stored. There is also a special test for pages containing only zeros; those compress exceptionally well, requiring no storage space at all. There is not, at this point, any other attempt at the unification of pages with duplicated contents (as is done by KSM), though.

There are a couple of obvious tradeoffs to using a mechanism like zcache: memory usage and CPU time. With regard to memory, Nitin says:

While compression reduces disk I/O, it also reduces the space available for normal (uncompressed) page cache. This can result in more frequent page cache reclaim and thus higher CPU overhead. Thus, it's important to maintain good hit rate for compressed cache or increased CPU overhead can nullify any other benefits. This requires adaptive (compressed) cache resizing and page replacement policies that can maintain optimal cache size and quickly reclaim unused compressed chunks. This work is yet to be done.

The current patch does allow the system administrator to manually adjust the size of the zcache area, which is a start. It will be a rare admin, though, who wants to watch cache hit rates and tweak low-level memory management parameters in an attempt to sustain optimal behavior over time. So zcache will almost certainly have to grow some sort of adaptive self-tweaking before it can make it into the mainline.

The other tradeoff is CPU time: it takes processor time to compress and decompress pages of memory. The cost is made worse by any pages which fail to compress down to less than 50% of their original size - the time spent compressing them is a total waste. But, as Nitin points out: "with multi-cores becoming common, benefits of reduced disk I/O should easily outweigh the problem of increased CPU usage". People have often wondered what we are going to do with the increasing number of cores on contemporary processors; perhaps zcache is part of the answer.

One other issue remains to be resolved, though: zcache depends on Cleancache, which is not currently in the mainline. There is some opposition to merging Cleancache, mostly because that patch, which makes changes to individual filesystems, is seen as being overly intrusive. It's also not clear that everybody is, yet, sold on the value of Cleancache, despite the fact that SUSE has been shipping it for a little while now. Until the fate of Cleancache is resolved, add-on patches like zcache will be stuck outside of the mainline.


Realtime Linux: academia v. reality

July 26, 2010

This article was contributed by Thomas Gleixner

The 22nd Euromicro Conference on Real-Time Systems (ECRTS 2010) was held in Brussels, Belgium from July 6-9, along with a series of satellite workshops which took place on July 6. One of those satellite workshops was OSPERT 2010 - the Sixth International Workshop on Operating Systems Platforms for Embedded Real-Time Applications, which was co-chaired by kernel developer Peter Zijlstra and Stefan M. Petters from the Polytechnic Institute of Porto, Portugal. Peter and Stefan invited researchers and practitioners from both industry and the Linux kernel developer community. I participated for the second year and tried, with Peter, to nurse the discussion between the academic and real worlds which started last year at OSPERT in Dublin.

Much to my surprise, I was also invited to give the opening keynote at the main conference, which I titled "The realtime preemption patch: pragmatic ignorance or a chance to collaborate?". Much to the surprise of the audience I gave my talk without slides, as I couldn't come up with useful ones no matter how much I twisted my brain. The organizers of ECRTS asked whether they could publish my writeup, but all I had to offer were the scribbled notes which outlined what I wanted to talk about. So I agreed to produce a transcript from my notes and memory, without any guarantee that it's verbatim. Peter has at least confirmed that it roughly matches the real talk.

An introduction

First of all I want to thank Jim Anderson for the invitation to give this keynote at ECRTS and his adventurous offer to let me talk about whatever I want. Such offers can be dangerous, but I'll try my best not to disappoint him too much.

The Linux Kernel community has a proven track record of being in disagreement with - and disconnected from - the academic operating system research community from the very beginning. The famous Torvalds/Tanenbaum debate about the obsolescence of monolithic kernels is just the starting point of a long series of debates about various aspects of Linux kernel design choices.

One of the most controversial topics is the question how to add realtime extensions to the Linux kernel. In the late 1990's, various research realtime extensions emerged from universities. These include KURT (Kansas University), RTAI (University of Milano), RTLinux (NMT, Socorro, New Mexico), Linux/RK (Carnegie Mellon University), QLinux (University of Massachusetts), and DROPS (University of Dresden - based on L4), just to name a few. There have been more, but many of them have only left hard-to-track traces in the net.

The various projects can be divided into two categories:

  1. Running Linux on top of a micro/nano kernel
  2. Improving the realtime behavior of the kernel itself

I participated in and watched several discussions about these approaches over the years; the discussion which is burned into my memory forever happened in the summer of 2004. In the course of a heated debate one of the participants stated: "It's impossible to turn a General Purpose Operating System into a Real-Time Operating System. Period." I was smiling then, as I had already proven, together with Doug Niehaus from Kansas University, that it can be done, even if it violates all - or at least most - of the rules of the academic OS research universe.

But those discussions were not restricted to the academic world. The Linux kernel mailing list archives provide a huge choice of technical discussions (as well as flame wars) about preemptability, latency, priority inheritance and approaches to realtime support. It was fun to read back and watch how influential developers changed their minds over time. Especially Linus himself provides quite a few interesting quotes. In May 2002 he stated:

With RTLinux, you have to split the app up into the "hard realtime" part (which ends up being in kernel space) and the "rest".

Which is, in my opinion, the only sane way to handle hard realtime. No confusion about priority inversions, no crap. Clear borders between what is "has to happen _now_" and "this can do with the regular soft realtime".

Four years later he said in a discussion about merging the realtime preemption patch during the Kernel Summit 2006:

Controlling a laser with Linux is crazy, but everyone in this room is crazy in his own way. So if you want to use Linux to control an industrial welding laser, I have no problem with your using PREEMPT_RT.

Equally interesting is his statement about priority inheritance in a huge discussion about realtime approaches in December 2005:

Friends don't let friends use priority inheritance. Just don't do it. If you really need it, your system is broken anyway.

Linus's clear statement that he wouldn't merge any PI code ever was rendered ad absurdum when he merged the PI support for pthread_mutexes without a single comment only half a year later.

Both are pretty good examples of the pragmatic approach of the Linux kernel development community and its key figures. Linus especially has always silently followed the famous words of the former German chancellor Konrad Adenauer: "Why should I care about my chatter from yesterday? Nothing prevents me from becoming wiser."

Adding realtime response to the kernel

But back to the micro/nano-kernel versus in-kernel approaches which emerged in the late 1990s. Both camps produced commercial products and more-or-less-active open source communities, but none of those efforts was commercially sustainable or ever got close to being merged into the official mainline kernel code base, for various reasons. Let me look at some of those reasons:

  • Intrusiveness and maintainability: Most of those approaches lacked - and still lack - proper abstractions and smooth integration into the Linux kernel code base. #ifdef's sprinkled all over the place are neither an incentive for kernel developers to delve into the code nor are they suitable for long-term maintenance.

  • Complexity of usage: Dual-kernel approaches tend to be hard to understand for application programmers, who often have a hard time coping with a single API. Add a second API and the often backwards-implemented IPC mechanisms between the domains and failure is predictable.

    I'm not saying that it can't be done, it's just not suitable for the average programmer.

  • Incompleteness: Some of those research approaches solve only parts of the problem, as this was their particular area of interest. But that prevents them from becoming useful in practice.

  • Lack of interest: Some of the projects never made any attempt to approach the Linux kernel community, so the question of inclusion, or even partial merging of infrastructure, never came up.

In October 2004, the real time topic got new vigor on the Linux kernel mailing list. MontaVista had integrated the results of research at the University of the German Federal Armed Forces at Munich into the kernel, replacing spinlocks with priority-inheritance-enabled mutexes. This posting resulted in one of the lengthiest discussions about realtime on the Linux kernel mailing list as almost everyone involved in efforts to solve the realtime problem surfaced and praised the superiority of their own approach. Interestingly enough, nobody from the academic camp participated in this heated argument.

A few days after the flame fest started, the discussion was driven to a new level by kernel developer Ingo Molnar who, instead of spending time on rhetoric, had implemented a different patch which, despite being clumsy and incomplete, became the starting point for the current realtime preemption patch. In no time quite a few developers interested in realtime joined Ingo's effort and brought the patch to a point which allowed real-world deployment within two years. During that time a huge number of interesting problems had to be solved: efficient priority inheritance, removal of per-CPU assumptions, preemptible RCU, high-resolution timers, interrupt threading, etc. - and, as a further burden, the fallout from sloppily-implemented locking schemes in all areas of the kernel.

Help from academia?

Those two years were mostly spent with grunt work and twisting our brains around hard-to-understand and hard-to-solve locking and preemption problems. No time was left for theory and research. When the dust settled a bit and we started to feed parts of the realtime patch to the mainline, we actually spent some time reading papers and trying to leverage the academic research results.

Let me pick out priority inheritance and have a look at how the code evolved and why we ended up with the current implementation. The first version which was in Ingo's patchset was a rather simple approach with long-held locks, deep lock nesting and other ugliness. While it was correct and helped us to go forward it was clear that the code had to be replaced at some point.

A first starting point for getting a better implementation was, of course, reading through academic papers. At first I was overwhelmed by the sheer amount of material and puzzled by the various interesting approaches to avoiding priority inversion. But the more papers I read, the more frustrated I got. Lots of theory, proof-of-concept implementations written in Ada, micro improvements on previous papers - you all know the academic drill. I'm not at all saying that it was a waste of time, as it gave me a pretty good impression of the pitfalls and limitations to be expected in a non-priority-based scheduling environment, but I have to admit that it didn't help me solve my real-world problem either.

The code was rewritten by Ingo Molnar, Esben Nielsen, Steven Rostedt and myself several times until we settled on the current version. The way led from the classic lock-chain walk with instant priority boosting through a scheduler-driven approach, then back to the lock-chain walk as it turned out to be the most robust, scalable and efficient way to solve the problem. My favorite implementation, though, would have been based on proxy execution, which already existed in Doug Niehaus's Kansas University Real Time project at that time, but unfortunately it lacked SMP support. Interestingly enough, we are looking into it again as non-priority-based scheduling algorithms are knocking at the kernel's door. But in hindsight I really regret that nobody—including myself—ever thought about documenting the various algorithms we tried, the up- and down-sides, the test results and related material.

So it seems that there is the reverse problem on the real world developer side: we are solving problems, comparing and contrasting approaches and implementations, but we are either too lazy or too busy to sit down and write a proper paper about it. And of course we believe that it is all documented in the different patch versions and in the maze of the Linux kernel mailing list archives which are freely available for the interested reader.

Indeed it might be a worthwhile exercise to go back and extract the information and document it, but in my case this probably has to wait until I go into retirement, and even then I fear that I have more favorable items on my ever growing list of things which I want to investigate. On the other hand, it might be an interesting student project to do a proper analysis and documentation on which further research could be based.

On the value of academic research

I do not consider myself in any way to be representative of the kernel developer community, so I asked around to learn who was actually influenced by research results when working on the realtime preemption patch. Sorry, folks: the bad news is that most developers do not consider reading research results a helpful and worthwhile exercise for getting real work done. The question arises: why? Is academic OS research useless in general? Not at all. It's just incredibly hard to leverage. There are various reasons for this, and I'm going to pick out some of them.

First of all—and I have complained about this before—it's often hard to get access to papers because they are hidden away behind IEEE's paywall. While dealing with IEEE is a fact of life for the academic world, I personally consider it a modern form of robber barony where taxpayers have to pay for work which was funded by tax money in the first place. There is another problem I have with the IEEE monopoly: universities' rankings are influenced by the number of papers written by their members and accepted at IEEE conferences, which I consider to be one of the most idiotic quality-measurement rules on the planet. And it's not only my personal opinion; it's also provable.

I actually took the time to spend a day at a university where I could gain access to IEEE papers without wasting my private money. I picked out twenty recent realtime related papers and did a quick survey. Twelve of the papers were a rehash of well-known and well-researched topics, and at least half of them were badly written as well. From the remaining eight papers, six were micro improvements based on previous papers where I had a hard time figuring out why the papers had been written at all. One of those was merely describing the effects of converting a constant which influences resource partitioning into a runtime configurable variable. So that left two papers which seemed actually worthwhile to read in detail. Funny enough, I had already read one of those papers as it was publicly accessible in a slightly modified form.

That survey really convinced me to stay away from IEEE forever and to consider the university ranking system even more suspicious.

There are plenty of other sources where research papers can be accessed, but unfortunately the signal-to-noise ratio there is not significantly better. I have no idea how researchers filter that, but on the other hand most people wonder how kernel developers filter out the interesting stuff from the Linux kernel mailing list flood.

One interesting thing I noticed while skimming through paper titles and abstracts is that the Linux kernel seems to have become the most popular research vehicle. On one site I found roughly 600 Linux-based realtime and scheduling papers which were written in the last 18 months. About 10% of them utilized the realtime preemption patch as their baseline operating system. Unfortunately almost none of the results ever trickled through to the kernel development community, not to mention actually working code being submitted to the Linux kernel mailing list.

As a side note: one paper even mentioned a hard-to-trigger longstanding bug in the kernel which the authors fixed during their research. It took me some time to map the bug to the kernel code, but I found out that it got fixed in the mainline about three months after the paper was published—which is a full kernel release cycle. The fix was not related to this research work in any way, it just happened that some unrelated changes made the race window wider and therefore made the bug surface. I was a bit grumpy when I discovered this, but all I can ask for is: please send out at least a description of a bug you trip over in your research work to the kernel community.

Another reason why it's hard for us to leverage research results is that academic operating system research has, as probably any other academic research area, a few interesting properties:

  • Base concepts in research are often several decades old, but they don't show up in the real world, even though they would be helpful for solving problems which have been worked around for at least as many decades.

    We discussed the sporadic server model yesterday at OSPERT, but it has been around for 27 years. I assume that hundreds of papers have been written about it, hundreds of researchers and students have improved the details, created variations, but there is almost no operating system providing support for it. As far as I know Apple's OSX is the only operating system which has a scheduling policy which is not based on priorities but, as I learned, it's well hidden away from the application programmer.

  • Research often happens on narrow aspects of an already narrow problem space. That's understandable as you often need to verify and contrast algorithms on their own merit without looking at other factors. But that leaves the interested reader like me with a large amount of puzzle pieces to chase and fit together, which often enough made me give up.

  • Research often happens on artificial application scenarios. While again understandable from the research point of view, it makes it extremely hard, most of the time, to expand the research results into generalized application scenarios without shooting yourself in the foot and without either spending endless time or giving up. I know that it's our fault that we do not provide real application scenarios to the researchers, but in our defense I have to say that in most of the cases we don't know what downstream users are actually doing. We only get a faint idea of it when they complain about the kernel not doing what they expect.

  • Research often tries to solve yesterday's problems over and over, while the reality of hardware and requirements has already moved to the next level of complexity. I can understand that there are still interesting problems to solve, but seeing the gazillionth paper about priority ceilings on uniprocessor systems is not really helpful when we are struggling with schedulability, lock scaling and other challenges on 64-core (and larger) machines.

  • Comparing and contrasting research results is almost impossible. Even if a lot of research happens on Linux there is no way to compare and contrast the results as researchers, most of the time, base their work on completely different base kernel versions. We talked about this last year and I have to admit that neither Peter nor myself found enough spare time to come up with an approach to create a framework on which the various research groups could base their scheduler of the day. We haven't forgotten about this, but while researchers have to write papers, we get our time occupied by other duties.

  • Research and education seem to happen in different universes. It seems that operating system and realtime research have little influence on the education of Joe Average Programmer. I'm always dumbstruck when talking to application programmers who have not the faintest idea of resources and their limitations. It seems that the resource problems on their side are all solvable by visiting the hardware shop across the street and buying the next-generation machine. That approach also manifests itself pretty well in the "enterprise realtime" space where people send us test cases which refuse to even start on anything smaller than a machine equipped with 32GB of RAM and at least 16 cores.

    If you have any chance to influence that, then please help to plant at least some clue on the folks who are going to use the systems you and we create.

    A related observation is the inability of hardware and software engineers to talk to each other when a system is designed. While I observe that disconnect mainly on the industry side, I have the feeling that it is largely true in the universities as well. No idea how to address this issue, but it's going to be more important the more the complexity of systems increases.

I'll stop bashing on you folks now, but I think that there are valid questions and we need to figure out answers to them if we want to get out of the historically grown state of affairs someday.

In conclusion

We are happy that you use Linux and its extensions for your research, but we would be even happier if we could deal with the outcome of your work in an easier way. In the last couple of years we have started to close the gap between researchers and the Linux kernel community at OSPERT and at the Realtime Linux Workshop, and I want to say thanks to Stefan Petters, Jim Anderson, Gerhard Fohler, Peter Zijlstra and everyone else involved. It's really worthwhile to discuss the problems we face with the research community, and we hope that you now have some insight into those problems and the requirements behind our pragmatic approach to solving them.

And of course we appreciate that some code which came straight out of the research laboratory (the EDF scheduler from ReTiS, Pisa) actually got cleaned up and published on the Linux kernel mailing list for public discussion; I really hope that we are going to see more like this in the foreseeable future. Problem complexity is increasing, unfortunately, and we need all the collective brain power we can get to address next year's challenges. We have already started the discussion, and the first interesting patches have shown up, so I really hope we can continue down that road and get the best out of it for all of us.

Thanks for your attention.

Feedback

I got quite a bit of feedback after the talk. Let me answer some of the questions.

Q: Is there any place outside LKML where discussion between academic folks and the kernel community can take place?

A: Björn Brandenburg suggested setting up a mailing list for research-related questions, so that academics are not forced to wade through the LKML noise. If a topic needs a broader audience, we can always move it to LKML. I'm already working on that. It's going to be low traffic, so you should not be swamped with mail.

Q: Where can I get more information about the realtime preemption patch?

A: General information can be found on the realtime Linux wiki, this LWN article, and this Linux Symposium paper [PDF].

Q: Which technologies in the mainline Linux kernel emerged from the realtime preemption patch?

A: The list includes:

  • The generic interrupt handling framework. See: Linux/Documentation/DocBook/genericirq and this LWN article.

  • Threaded interrupt handlers, described in LWN and again in LWN.

  • The mutex infrastructure. See: Linux/Documentation/mutex-design.txt

  • High-resolution timers, including NOHZ idle support. See: Linux/Documentation/timers/highres.txt and these presentation slides.

  • Priority inheritance support for user space pthread_mutexes. See: Linux/Documentation/pi-futex.txt, Linux/Documentation/rt-mutex.txt, Linux/Documentation/rt-mutex-design.txt, this LWN article, and this Realtime Linux Workshop paper [PDF].

  • Robustness support for user-space pthread_mutexes. See: Linux/Documentation/robust-futexes.txt and this LWN article.

  • The lock dependency validator, described in LWN.

  • The kernel tracing infrastructure, as described in a series of LWN articles: 1, 2, 3, and 4.

  • Preemptible and hierarchical RCU, also documented in LWN: 1, 2, 3, and 4.

Q: Where do I get information about the Realtime Linux Workshop?

A: The 2010 Realtime Linux Workshop (RTLWS) will be held in Nairobi, Kenya, October 25-27. The 2011 RTLWS is planned to be at Kansas University (not yet confirmed). Further information can be found on the RTLWS web page. General information about the organisation behind RTLWS can be found on the OSADL page, and information about its academic members is on this page.

Conference impressions

I stayed for the main conference, so let me share my impressions. First off, the conference was well organized and, in general, the atmosphere was not really different from that of an open source conference. The realtime researchers seem to be a well-connected and open-minded community. While they take their research seriously, most of them freely admit that the ivory tower they live in can be a completely different universe. That was quite observable in various talks, where the number of assumptions and the perfectly working abstract hardware models made it hard for me to figure out how the results could be applied to reality.

The really outstanding talks were the keynotes on day two and three.

On Thursday, Norbert Wehn from the Technical University of Kaiserslautern gave an interesting talk titled Hardware modeling: A critical assessment with case studies [PDF]. Norbert works on hardware modeling and low-level software for embedded devices, so he is not the typical speaker you would expect at a realtime-focused conference, but it seems the program committee tried to bring some reality into the picture. He gave an impressive overview of the evolution of hardware and of the reasons why we have to deal with multi-core systems and face the fact that today's hardware is not designed for predictability and reliability. Realtime folks thus need to rethink their abstract models and take more complex aspects of the overall system into account.

One of the interesting aspects was his view on energy-efficient computing: a cloud of 1.7 million AMD Opteron cores consumes 179MW, while a cloud of 10 million Xtensa cores provides the same computing power at 3MW. Another aspect of power-aware computing is the increasing role of heterogeneous systems. Dedicated hardware for video decoding is about 100 times more power-efficient than a software-based solution on a general-purpose CPU; even specialized DSPs consume about 10 times more power for the same task than the optimized hardware solution.

But power-optimized hardware comes with a tradeoff: the loss of the flexibility that software provides. The mobile space has already arrived in this heterogeneous world, though, and researchers need to become aware of the increased complexity of analyzing such hybrid constructs and develop new models that allow these systems to be verified in the hardware design phase. Workarounds for hardware design failures in application-specific systems are orders of magnitude more complex than on general-purpose hardware. All in all, he gave his colleagues from the operating system and realtime research communities quite a list of homework assignments and brought them back down to earth.

The Friday morning keynote was a surprising reality check as well. Sanjoy Baruah from the University of North Carolina at Chapel Hill titled his talk "Why realtime scheduling theory still matters". Given the title, one might assume that the talk would focus on justifying the existence of the ivory tower, but Sanjoy was very clear that realtime and scheduling research has focused for too long on uniprocessor systems and lacks answers to the challenges of the multi-core era, which has already arrived. He gave pretty clear guidelines about which areas research should focus on to prove that it still matters.

In addition to the classic problem space of verifiable safety-critical systems, he called for research which is relevant to the problem space and built on proper abstractions, with a clear focus on multi-core systems. Multi-core systems bring new, and mostly unresearched, challenges like mixed criticalities, which means that safety-critical, mission-critical, and non-critical applications run on the same system. All of them have different requirements with regard to meeting their deadlines, resource constraints, and so on, and therefore bring a new dimension into the verification problem space. Other areas which need attention, according to Sanjoy, are component-based designs and power awareness.

It was good to hear that, despite our usual perception of the ivory tower, these folks have a strong sense of reality, though it seems they need a more or less gentle reminder from time to time. ECRTS was a really worthwhile conference, and I can only encourage developers to attend such research-focused events and keep the communication and discussion between our perceived reality and the not-so-disconnected other universe alive.

Comments (84 posted)

Patches and updates

Kernel trees

Linus Torvalds Linux 2.6.35-rc6 ?

Architecture-specific

Build system

Core kernel code

Srikar Dronamraju Uprobes Patches ?

Development tools

Device drivers

Filesystems and block I/O

Memory management

Security-related

Mimi Zohar EVM ?
John Johansen AppArmor security module ?

Virtualization and containers

Benchmarks and bugs

Page editor: Jonathan Corbet
Next page: Distributions>>


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds