Approaches to realtime Linux

[Posted October 12, 2004 by corbet]

Using Linux systems for realtime tasks has long been an area of interest. In the last couple of weeks, a number of projects working to implement realtime response have posted their work. This article looks at the patches posted recently to get a sense for where the realtime projects are headed.

The realtime LSM

A relatively simple contribution is the realtime security module by Torben Hohn and Jack O'Quin. This module does not actually add any new realtime features to the kernel; instead, it uses the LSM hooks to let users belonging to a specific group use more of the system's resources. In particular, it adds the CAP_SYS_NICE, CAP_IPC_LOCK, and CAP_SYS_RESOURCE capabilities to the selected group. These capabilities allow the affected processes to raise their priority, lock memory into RAM, and generally to exceed resource limits. Granting capabilities in this way goes somewhat beyond the usual "restrictive hooks only" practice for security modules, but there have not been any complaints on that score.

MontaVista's patch

The event which really stirred up the discussion, however, was the posting of the realtime kernel patch set by MontaVista's Sven-Thorsten Dietrich. This highly intrusive patch attempts to minimize system response latency by taking the preemptible kernel approach to its limit. In comparison, the current preemption approach, which is considered to be too risky to use by most distributors, is a half measure at best.

MontaVista's patch begins by adopting the "IRQ threads" patch posted by Ingo Molnar. This patch moves the running of most interrupt handlers into a separate kernel thread which competes with the others for processor time. Once that is done, interrupt handlers become preemptible and are far less likely to stall the system for long periods of time.

The biggest source of latency in the kernel then becomes critical sections protected by spinlocks. So why not make those sections preemptible as well? To that end, the PMutex patch has been adapted to the 2.6 kernel. This patch implements blocking mutexes, similar to the existing kernel semaphores. The PMutex version, however, has a simple priority inheritance mechanism; processes holding a mutex can have their priority bumped up temporarily so that they get their work done and release the mutex as quickly as possible. Among other things, this approach helps to minimize priority inversion problems.

The biggest change is replacing of most spinlocks in the system with the new mutexes; the patch uses a set of preprocessor macros to turn spinlock_t, and the operations on spinlocks, into their mutex equivalents. In one step, most critical sections become preemptible and no longer are part of the latency problem. As an added bonus, the moving of interrupt handlers to their own thread means that interrupt handlers can no longer deadlock with non-interrupt code when contending for the same lock; that means that it is no longer necessary to disable interrupts when taking a lock which might also be used by an interrupt handler.

There are, of course, a few nagging little problems to deal with. Some code in the system really shouldn't be preempted while holding a lock. In particular, code which might be in the middle of programming hardware registers, the page table handling code, and the scheduler itself need to be allowed to do their job in peace. It is hard, after all, to imagine a scenario where preempting the scheduler will lead to good things. So a number of places in the kernel cannot be switched from spinlocks to the new mutexes.

The realtime patch attempts to handle these cases by creating a new _spinlock_t type, which is just the old spinlock_t under a newer, uglier name. The spinlock primitives have been renamed in the same way (e.g. _spin_lock()). Code which truly needs an old-style spinlock is then hacked up to use the new names, and it functions as before. Except for some files, where the developers were able to include <linux/spin_undefs.h>, which restores the old functionality under the old names. The header file rightly describes this technique as "a dirty, dirty hack." But it does make the patch smaller.

Needless to say, the task of sifting through every lock in the kernel to figure out which ones cannot be changed to mutexes is a long and error-prone process. In fact, the job is nowhere near complete, and the MontaVista patch is, by its authors' admission, marginally stable on uniprocessor systems, unstable on SMP systems, and unrunnable on hyperthreaded systems. But you have to start somewhere.

Ingo's fully preemptible kernel

Ingo Molnar liked that start, but had some issues with it. So he went off for two days and created a better version, which has been folded into his "voluntary preemption" series of patches. Ingo takes the same basic approach used by the MontaVista patch, but with some changes:

The PMutex patch is not used; instead, Ingo uses the existing kernel semaphore implementation. His argument is that semaphores work on all architectures, while PMutexes currently only work on x86. It would be better to hack priority inheritance into the existing semaphores, and thus make it available to all of the current semaphore users as well as those converted over from spinlocks. Ingo's patch does not currently implement priority inheritance, however.
Through some preprocessor trickery, Ingo was able to avoid changing all of the spinlock calls. Preserving "old style" spinlock behavior is simply a matter of changing the type of the lock to raw_spinlock_t and, perhaps, changing the initialization of the lock. The actual spin_lock() and related calls do the right thing with either a "raw" spinlock or a new semaphore-based mutex. Think of it as a sort of poor man's polymorphic lock type.
Ingo found a much larger set of core locks which must use the true spinlock type. This was done partly through a set of checks built into the kernel which complain when the wrong type of lock is being used. With Ingo's patch, some 90 spinlocks remain in the kernel (in comparison, MontaVista preserved about 30 of them). Even so, thanks to the reworked locking primitives, Ingo's patch is much smaller than the MontaVista patch.

Ingo would like to reduce the number of remaining spinlocks, but he warns that a number of "core infrastructure" changes will be required first. In particular, code using read-copy-update must continue to use spinlocks for now; allowing code which holds a reference to an RCU-protected structure to be preempted would break one of the core RCU assumptions. MontaVista has apparently taken a stab at the RCU issue, but does not yet have a patch which they are ready to circulate.

Ingo continues to post patches at a furious rate; things are evolving quickly on this front.

RTAI/Fusion

Meanwhile, the real realtime people point out that none of this work provides deterministic, quantifiable latencies. It does help to reduce latency, but it cannot provide guarantees. A "realtime" system without latency guarantees may be suitable for a number of tasks, but it still isn't up to the challenge of running a nuclear power plant, an airliner's flight management system, or an extra-fast IRC spambot. If it absolutely, positively must respond within a few microseconds, you need a real realtime system.

There are two longstanding Linux projects which are intended to provide this sort of deterministic response: RTLinux and RTAI. There is the obligatory bad blood between the two, complicated by a software patent held by the RTLinux camp.

The RTLinux approach (and the subject of the patent) is to put the hardware under the control of a small, hard realtime system, and to run the whole of Linux as a single, low-priority task under the realtime system. Access to the realtime mode is obtained by writing a kernel module which uses a highly restricted set of primitives. Channels have been provided for communicating between the realtime module and the normal Linux user space. Since the realtime side of the system controls the hardware and gets first claim on its resources, it is possible to guarantee a maximum response time.

RTAI initially used that approach, but has since shifted to running under the Adeos kernel. Adeos is essentially a "hyperviser" system which runs both Linux and a real-time system as subsidiary tasks, and allows the two to communicate. It allows a pecking order to be established between the secondary operating systems so that the realtime component can respond first to hardware events. This approach is said to be more flexible and also to avoid the RTLinux patent. Working with RTAI still requires writing kernel-mode code to handle the hard realtime part of the task.

In response to the current discussion, Philippe Gerum surfaced with an introduction to the RTAI/Fusion project. This project, which is "a branch" of the RTAI effort, is looking for a middle ground between the low-latency efforts and the full RTAI mode of operation; its goal is to allow code to be written for the Linux user space, with access to regular Linux facilities, but still being able to provide deterministic, bounded response times. To this end, RTAI/Fusion provides two operating modes for realtime tasks:

The "hardened" mode offers strict latency guarantees, but programs must restrict themselves to the services provided by RTAI. A subset of Linux system calls are available as RTAI services, but most of them are not.
When a task invokes a system call which cannot be implemented in the hardened mode, it is shifted over to the secondary ("shielded") scheduling mode. This mode is similar to the realtime modes implemented by MontaVista and Ingo Molnar; all Linux services are available, but the maximum latency may be higher. The RTAI/Fusion shielded mode defers most interrupt processing while the realtime task is running, which is said to improve latency somewhat.

Processes may move between the two modes at will.

The end result is a blurring of the line between regular Linux processes and the hard realtime variety. Developers can select the mode which best suits their needs while running under the same system, and they can use different modes for different phases of a program's execution. RTAI/Fusion might yet succeed in the task of combining a general-purpose operating system with hard realtime operation.

In conclusion...

Whether any of the work described here will make it into the mainline kernel is another question. The preemptible kernel patch, which was far less ambitious, has still not been accepted by many developers. Removing most spinlocks and making the kernel fully preemptible will certainly be an even harder sell. It is an intrusive change which could take some time to stabilize fully. If a fully-preemptible, closer-to-realtime kernel does pass muster with the kernel developers, it may well be the sort of development that finally forces the creation of a 2.7 branch.

Another challenge will be building a consensus around the idea that the mainline kernel should even try to be suitable for hard realtime tasks. The kernel developers are, as a rule, opposed to changes which benefit a tiny minority of users, but which impose costs on all users. Merging intrusive patches for the sake of realtime response looks like that sort of change to many. Before mainline Linux can truly claim to be a realtime system, the relevant patches will have to prove themselves to be highly stable and without penalty for "regular" users.

Index entries for this article
Kernel	Interrupts
Kernel	Latency
Kernel	Preemption
Kernel	Realtime
Kernel	Voluntary preemption

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 19:01 UTC (Tue) by sbergman27 (guest, #10767) [Link] (21 responses)

The nuclear power plant thing is a pet peeve of mine.

Could someone please explain exactly which systems in a nuclear power plant require something to happen within a few microseconds? I only ask because I would think that in any good design, the necessity of that level of response would not be necessary. I don't know about you, but I would be hard pressed to replace a piece of hardware in the event of hardware failure in a few microseconds. It would take me at *least* several milliseconds, if not more.

Are we really sitting on the edge of armageddon, awaiting,in silent terror, the time that some system fails to respond (for any reason) within a microsecond or two?

I sincerely hope that the above is a reducto al absurdum.

Otherwise, I would say that "The End Is Nigh".

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 19:37 UTC (Tue) by arget (guest, #5929) [Link] (7 responses)

A nuclear power plant operates with that wonderful oxymoron, a controlled fission chain reaction. A highly energetic neutron hits a Uranium (or Plutonium) atom, and splits it into two smaller atoms, with some heat energy and a neutron or two left over, that in turn can go on to split more Uranium atoms. It's a balancing act, too many neutrons, and the reaction goes "super-critical" and releases exponentially more energy, potentially doubling in sub-second time frames (periods). A bomb is designed to go super-critical very, very quickly. A normally functioning reactor will operate in "critical" with a period of infinity, right on the razor's edge between super-critical and sub-critical (where there are not enough neutrons to sustain a chain reaction). Now, because of some inherent randomness, the reactor is generally a hair one side or the other of critical. Modern reactors are designed so that the geometry is such that things don't get too "hot" (or too "cold") too quickly, and you have some time to adjust as your period drops into positive or negative numbers from infinity. The razor's edge is more like a broad ridge. Even so, you want to be able to respond quickly. You can't wait for a computer to reboot. Is it ever on the order of micro or even milliseconds in a (modern, Western) reactor? Nah, but it could get within minutes, or tens of seconds. Really, space travel is probably a better example of something that needs to be controlled within microseconds.

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 19:58 UTC (Tue) by euvitudo (guest, #98) [Link]

I like your description. There has been a bit of discussion about real-time linux in my workplace. A group is writing software that receives a stream of bits from a set of CCDs used for astronomical observations. They chose linux as the platform, but found out that they ended up losing a row of data every so-often (during each readout) due to the kernel going out to make sure its shirt was properly tucked in.

The obvious need here is to not lose track of the stream (in this case, flood) of bits coming from the hardware. I can imagine (though this may not actually be the case) that if a nuclear reactor has been streaming bits to it's warning systems, you certainly do not want to find out that the kernel was taking a short bathroom break. For my needs, I do not require a real-time system; if the kernel pauses for a brief moment to do some catch-up work, I don't care.

OT: safer nuclear reactors

Posted Oct 13, 2004 12:41 UTC (Wed) by jvotaw (subscriber, #3678) [Link] (5 responses)

[ Note: this is definitely not my field; apologies if I get this wrong. ]

For what it's worth, there are some designs of nuclear reactors that are fairly safe. Yes, they're operating in "critical", but it's unlikely that they will go super-critical quickly.

The two broadest, relevant questions about reactor designs include: how stable is the speed of the nuclear reaction? and, if it becomes unstable, does the speed tend to increase or decrease?

Chernobyl uses a fairly unstable design that tends to get hotter if it gets out or control. A counter-example are the CANDU reactors, which are pretty stable and safe.

There are even better designs which have not yet been implemented, such as CAESAR. As I understand it, this design uses depleted, non-radioactive Uranium as fuel. Steam moderates the speed of neutrons to the precise speed where they will cause depleted Uranium to split. If the reactor overheats or underheats, the density of the steam changes, neutrons are no longer moving at the speed necessary to sustain the reaction, and the reaction stops. The advantages of using depleted Uranium as fuel include the ability to have Uranium rods which are 100% fuel, instead of around 5% in traditional reactors, which means ~40 years of power without replacing the fuel rods. Also, the fuel rods are not usable for nuclear weapons either before or after they are used; we'd have the option of building these reactors in unstable countries without increasing nuclear proliferation.

Again, this is definitely not my field, so please forgive me (and correct me) if I'm wrong.

-Joel

OT: safer nuclear reactors

Posted Oct 14, 2004 9:58 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

`non-radioactive Uranium'? An interesting substance: a shame it doesn't exist.

OT: safer nuclear reactors

Posted Oct 14, 2004 13:34 UTC (Thu) by jvotaw (subscriber, #3678) [Link]

I stand corrected. Even pure U-238 is (minimally) radioactive, it seems.

The larger point remains: this is a substance that is widely considered safe enough to be used in ceramic glazing, sailboat keels, race cars, oil drills, etc. (Although, admittedly, not safe enough that you'd want to turn it in to a powder and disperse it into the air or water.)

Thanks, Wikipedia.

-Joel

OT: safer nuclear reactors

Posted Oct 15, 2004 20:07 UTC (Fri) by Baylink (guest, #755) [Link] (2 responses)

I believe the substance in question is "depleted uranium", as used in weapons systems, among other things.

A better analogy, IMHO, for when hard realtime response is necessary, would be industrial robotics: if a 400lb swingarm is about to crush a human, guaranteed millisecond response is in fact essential.

But Linus and I had an exchange about this, a few years back, carboned to this very venue, and he convinced me that if what you need is that hard realtime, then you should probably not be doing anything else with that computer.

http://lwn.net/2000/0713/backpage.php3

OT: safer nuclear reactors

Posted Oct 21, 2004 14:15 UTC (Thu) by alext (guest, #7589) [Link] (1 responses)

Generally true with respect to ordinary OS tasks. Often though you want to respond to specific events within a fixed time limit or always do X at interval Y. Neither things using all the CPU resource, leaving gaps to fill. What you do the rest of the time is low priority things that don't matter them not happening bang on interval Yn to within nanoseconds.

That is my experience from automotive engine controllers. On those we do lots of low priority things. The issue that comes in to play is testing and validation. If you are running other tasks on a controller with safety critical tasks generally you want to test everything to the higher standard if you are mixing on a shared host.

Related to running something like Linux as a low priority task under a hard real time system gives the argued (I have my doubts) ability to sandbox the none safety critical tasks so that they can't do things to interfere with the safety critical portion.

OT: safer nuclear reactors

Posted Oct 21, 2004 17:07 UTC (Thu) by Baylink (guest, #755) [Link]

This is, as always, a tradeoff.

Response latency can usefull be characterized as "M% of the time, the system will successfully respond within N ms." The more important it is to you, the closer to 100 M must be.

But the underlying point is that for values of M less than 100.0, it's often possible to combine soft-real-time techniques with throw-hardware-at-it, and get a useful result. And Linus' assertion, with which I agree now, is that if you really need 100.0%, because people may be hurt or killed, or the value of things which may be destroyed is sufficiently high, that at *best* you should indeed be running Linux as a task under a small, tight, HRT kernel.

LinuxRT and RTAI may be good enough; they may not.

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 19:50 UTC (Tue) by jens.richter (subscriber, #20650) [Link] (1 responses)

Modern digital I&C safety systems require a hard realtime OS, yes it's true!

The timer tick of a system I know is 1 ms.

The response time of the I&C safety system to start the shutdown of the reactor in case of emergency is typically in the range below 1s.

You need a realtime OS, but the timig requirements are less dramatic than we think.

The perenial "Nuclear Power Plant" example

Posted Oct 13, 2004 9:52 UTC (Wed) by pwaechtler (guest, #5075) [Link]

>Modern digital I&C safety systems require a hard realtime OS, yes it's true!
>The timer tick of a system I know is 1 ms.

The event is not triggered by the timer - it's interrupt driven.
Speak about interrupt latency in the range of 3-10 us and
scheduling latency in the range of 5-100 us.

http://www.qnx.com/developers/docs/qnx_4.25_docs/qnx4/sys...

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 20:20 UTC (Tue) by darthscsi (guest, #8111) [Link] (3 responses)

Hard realtime has nothing to do with *fast*, just deturministic response. If I can write a system that always has bounded latency of 1 hour, then I am in the hard realtime realm (though not useful really). If I have a system that has average latency of .00000001 nanoseconds, but on some pathalogical cases cannot be anylized, then we are out of the deturministic (hard realtime) realm, no matter how much faster this second system is.

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 21:08 UTC (Tue) by hppnq (guest, #14462) [Link] (2 responses)

But the deterministic response follows from the speed at which operations can take place.

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 23:22 UTC (Tue) by gilb (subscriber, #11728) [Link] (1 responses)

Nope, it isn't speed, it the ability to specify an upper bound to the response time that is required for deterministic operation. You need to be able to complete your desired task as well (which relates to speed), but the deterministic requirement simply states that you will always have to the opportunity to do your task every N time intervals.

For example, in a modern plane like the B2 or JSF, you may need to adjust the control surfaces every 10 ms in order to guarantee stable flight. You know that this will work because you ran the simulations that showed that 10 ms will work. If the response time exceeds this, the plane may be stable or it may not be, but you don't want to find out while it is flying.

The perenial "Nuclear Power Plant" example

Posted Oct 13, 2004 10:20 UTC (Wed) by hppnq (guest, #14462) [Link]

You need to be able to complete your desired task as well (which relates to speed)

That's what I mean.

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 21:19 UTC (Tue) by Stephen_Beynon (guest, #4090) [Link]

I don't know about about US nuke plants, but in the UK the design aim is
that when something goes wrong the operator should have 30 mins to read
his manual before anything significant needs to be done. That includes
the control computer crashing.

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 21:51 UTC (Tue) by simlo (guest, #10866) [Link] (2 responses)

It could be that they have to switch out the generator from the electrical grid within a few ms. Typically, if the grid outside the plant is short-circuited. You can make a simple hardware solution but then you can't switch the power plant back on again. With a software solution you can switch in resistor banks and burn off the energy for a few ms until the grid is ok again.

That said: Having such systems running somekind of real-time Linux would be insane. Linux caries out too much code which can contain bugs. I would make it from scratch with either no OS at all or only the bare bones (i.e. basicly just a scheduler). Then I would make a simple protocol between this safety critical subsystem and systems running Linux to supervise it.

A realtime Linux is mostly usefull for cheap systems, where you both Linux's server/client capabilities and have a none-safety critical subsystem you need to service as well and you don't wont the extra cost of an extra CPU.

The question is how hard these real time requirements are. Very often it wont be that "hard" in the sense that the application can somewhat survive a missed deadline once in a while but it might be anoying to the user - like Xmms. For many, many applications it is like that: The models say that you have to do things of such and such rate; but in practise you can skip sample points once in a while with no critical problem.

The perenial "Nuclear Power Plant" example

Posted Oct 15, 2004 16:34 UTC (Fri) by iabervon (subscriber, #722) [Link]

Linux doesn't have all that much code if you disable everything. Of course, I'd personally design a nuclear reactor with a microcontroller to handle all the really fast hard realtime stuff, set up a watchdog to make sure the operator's computer is responding within a couple of minutes, and run Linux on the operator's computer. Linux does have far more code than a microcontroller program, no matter what you do. For that matter, with microcontrollers, you could probably set up a set of redundant ones with voting schemes just to make sure that failures don't cause problems. It's not like you're going to blow your power plant budget on microcontrollers.

The perenial "Nuclear Power Plant" example

Posted Oct 21, 2004 14:23 UTC (Thu) by alext (guest, #7589) [Link]

In national power grid systems my experience is of hardware processing that feeds an event to the software which is event driven with no scheduler involved, it just sits and spins waiting to take action. That then triggers nice, very rapid hardware breakers because of the arcing problem switching that kind of voltage (K and M sizes).

The perenial "Nuclear Power Plant" example

Posted Oct 12, 2004 23:27 UTC (Tue) by smoogen (subscriber, #97) [Link]

Most of the controls I know of in a Nuclear Power plant that need Real Time control are also in any 'steam-powered' plant. Let that coal fire plant get out of control and it will blow up.. it just isnt as scary because people dont consider tons of acid rain, mercury, and other heavy metal contaminants as 'sexy' as 3 eyed fish and giant tarantulas.

The perennial "Nuclear Power Plant" example

Posted Oct 13, 2004 3:23 UTC (Wed) by ncm (guest, #165) [Link]

The "Nuclear Power Plant Control System" example is really just a euphemism for the "Nuclear Weapon Control System" example. Really the differences between the two largely amount to how long the control system is expected to continue running (indeed, existing, in solid state) after the chain reaction begins.

The perenial "Nuclear Power Plant" example

Posted Oct 13, 2004 15:56 UTC (Wed) by AJWM (guest, #15888) [Link]

Nothing in a nuclear power plant is dependant on microsecond timing. Ultimately, nuke plants are mechanically controlled -- control rods slide in and out, coolant pumps and valves actuated, etc. and mechanical devices (at least, those big enough to see) just don't react that precisely.

Nuclear plants were operating years before we had solid-state computers, let alone ones with even microsecond cycle times.

The timing in a nuclear bomb is that critical, actually more so, to ensure that the compression wave from the triggering explosives is precisely shaped so as to uniformly squeeze the fissionable material -- an asymmetrical push will let material spew out before the reaction builds to a peak and you get some kind of fizzle yield (worst case the fissionable just melts itself). The reaction is sensitive to surface area to volume ratios (too much area and too many neutrons escape rather than hitting other U or Pu nuclei).

Approaches to realtime Linux

Posted Oct 12, 2004 19:29 UTC (Tue) by karim (subscriber, #114) [Link] (8 responses)

There are a few things that need to be pointed out:

1- I'm not sure what is meant by "kernel-mode" code, but hard-rt deterministic tasks in RTAI DO NOT need be written as kernel modules. In fact, such tasks can be written as normal shell applications that use a special RTAI syscall vector to access RTAI services, including morphing from normal Linux processes into RTAI-scheduled tasks, and hence obtaining deterministic scheduling.

2- To put Philippe's words in laymen's terms, the aim of RTAI/fusion is to allow normal Linux processes to be serviced by RTAI transparently without requiring the use of any special API. To this end, normal Linux application calls are transparently "redirected" to RTAI using the Adeos nanokernel. It must be said that while there are a few system calls already successfully diverted in this way, nanosleep() being an example, this is still a work in progress. The ultimate goal being to allow those tasks that use time-sensitive calls to obtain the performance they would obtain had they been running on a real hard-rt RTOS. Of course there are calls that cannot be "hardened". Needless to say that an open() or a read() on a file located in an ext3 partition is unlikely to be deterministic any time soon.

Personally, I believe that this approach to real-time is much more sane than threading the interrupt handlers and introducing yet another level of locks. The fact of the matter is that if we have hard-rt like that, we don't need the threading of int handlers and the likes. But if we have threaded int handlers and co., we still need hard-rt because reducing latency doesn't provide deterministic response times.

NOTE: deterministic hard-rt is not about speed, it's about determinism. While Ingo's work is great at reducing latency, it cannot guarantee response times regardless of the load, kernel configuration, and driver set. RTAI/fusion, and the Adeos interrupt pipeline on a smaller scale, can provide such guarantees.

Karim Yaghmour

Approaches to realtime Linux

Posted Oct 12, 2004 21:11 UTC (Tue) by icculus_98 (guest, #8535) [Link]

This is a module of RTAI called LXRT (or NEWLXRT), and its use is
encouraged over writing kernel modules (unless you need kernel
functionality). It allows hard and soft realtime response in user space.

Approaches to realtime Linux

Posted Oct 12, 2004 22:08 UTC (Tue) by simlo (guest, #10866) [Link] (4 responses)

Now due to this splindid article here at LWN I downloaded the latests version of RTAI and started to look at it. If it lives up to it's promises it is really great!

However, isn't Adeos/RTAI adding just an extra level of locking just as MontaVista's patch? And on top of that an extra scheduler is added!?

RTAI does sound like a more expensive solution to me. If the goal was to seperate the real time threads and the none real time threads having two different schedulers is a splindid idea. But RTAI want to make it look the same towards the programmer. I am afraid the coder will make the mitake of calling Linux system calls and break the real time behaviour of the system.

Making all these system calls unavailable from real time part would make more sense - and could justify having two scheduleres on the system. But if you want to make the systems look the same it sounds more like a temporarely solution.

Another problem is device drivers: You can't use the device drivers from normal Linux in you real time subsystem (I briefly looked at the RT-net project who have made seperate device drivers). With the idea of making Linux itself real time you can - but you might have to rewrite some to make them perform better wrt. latencies and prioritising access.

Approaches to realtime Linux

Posted Oct 14, 2004 5:02 UTC (Thu) by karim (subscriber, #114) [Link] (3 responses)

Where the line is drawn between what is "visible" and what isn't for hard-rt processes can be configurable. That's not a problem.

What is a problem is introducing subtilities in the kernel's behavior that are so convoluted as to be too complex for the majority of driver and applications writers as it is. For the past five years I have been the maintainer of the Linux Trace Toolkit. For having done that work, I can tell you that only a marginal number of programmers actually really understand how the kernel operates, and how its operation is infuenced by or influences that of user-space applications and drivers. Just last July I was speaking with Jim Gettys at the OLS and he told me how he'd love to see something as LTT integrated into the kernel because most developers out there simply have no idea what they are doing. Not because they're careless or because they don't want to know, but because their expertise is elsewhere, and they shouldn't be expected to know that much about the kernel's behavior.

This is very relevant to the current debate. The fact of the matter is that the RTAI/fusion development model is much easier to work with because the traditional developers do not need to be exposed to an API that is unlikely to be of use to them (and if the API is there, they will use it; not because they are irresponsible, but because as carefull programmers they will try to give the best out of the kernel for their application), and because those who need it get all they need from a very targeted set of services. Again, as I said earlier, making Linux respond faster does NOT solve the problem of providing deterministic hard-rt, but providing deterministic hard-rt does allow Linux to respond faster.

As for drivers, then yes real-time drivers are different from normal non-rt drivers. There is absolutely no way that all Linux drivers will become suited for hard-rt system just by redefining a few macros here and there. Hard-rt drivers require a hard-rt mindset.

If all this is about making "multi-media" respond better in Linux, then the argument can easily be made that such critical components of multi-media system ought to be deterministic hard-rt anyway. Such multi-media applications can successfully use the services of Adeos and RTAI/fusion, real hard-rt applications can't use the "better latency" schemes. Why settle for less?

Approaches to realtime Linux

Posted Oct 14, 2004 9:54 UTC (Thu) by simlo (guest, #10866) [Link] (2 responses)

(I have written some more comments under the recent article about MontaVistas's patch http://lwn.net/Articles/106011/.)

I know a bit about real-time programing on VxWorks and only a little about the Linux internals.

For performance on a normal time-sharing system it isn't a good idea to replace spinlocks with mutexes. A spinlock will perform much better than a mutex but will effectively raise the locking thread to maximum priority. Similarly, a system will perform better if interrupt-handlers are executed in interrupt context right away instead of being deferred to tasks but again interrupt context is the highest priority. I thus think it should be configureable wether a subsystem uses spinlocks or mutexes and wether the drivers you have included should be run in interrupt or deferred.

A way I could see it done is to make a macro system such that the average driver developer should do something like "GENERIC_LOCK_TYPE(CONFIG_MY_SUBSYSTEM_LOCKTYPE) lock;" instead of "spinlock_t lock;" In the configurator there should be an advanced section where you can change the new macroes away from the default . Many of them should be set such the type will become a spinlock. The configuarator should ofcourse also check for dependencies: If a lock can be taken from interrupt you have to use spinlock_t.

Similarly when you install an interrupt: It should send configureable parameter saying the wether it should run in interrupt or if not, at what priority. Again the configurator should make sure that if interrupt context is choosen the eventual lock-types must be spinlock.

This would in fact make driver development easier: You just pick that you always defer your interrupt handler to a thread and you always lock your subsystem with a mutex. Then you don't have to worry about what you can or not can do in interrupt context and while holding a spinlock. For the average coder this is the easiest approach.

All these extra parameters should be hidden for the average build-your-own kernel user but the real-time developers have to make sure that these parameters are set correctly for the specific system. I.e. all locks which can be hold for more than the accepted latency time must be set to be mutexes, but locks hold for times shorter than the accepted latency are better off being spinlocks.

I suggest the following seperation: Linux should be coded with these macroes instead of having everything always being spinlocks. In all places where spinlocks are known to be unavoidable - I guess that is really only in the very core part of the system - the spinlock must not be held for more than a few, bound number instructions. Linus's official tree should not be tested for other than the default settings. (Some of the drivers you find in Linus' tree haven't been tested either so there is nothing fundamentally new in such a policy.)

It is up to companies like MontaVista to test how you can change the various parameters and they can earn their living by selling that knowledge. It is also their job to check that the various subsystems behave nicely wrt. locking. Forinstance, if somebody wants to make a small real-time application using say a CAN device, MontaVista can then help them to verify that the specific CAN driver in question is "real-time", i.e. can't hold it's lock and thus block the real-time application for a non-deterministic amount of time. The real-time application can't ofcourse call directly into the IP stack or the filesystem or even allocate memory runtime, but will have to defer such operations to other threads. As long as all these subsystems don't spinlock for "too long" but can safely be configured to use mutexes you are in the clear.

It also MontaVista's job to tell their customer which drivers and subsystems are cleared with respect to holding spinlocks for "too long" and thus safely can be included in the kernel. The patches fixing such systems so they can be configured to use mutexes instead should be accepted into the main tree. Also patches to making execution times deterministic in various subsystem should be accepted such that these subsystems can be used in real-time applications.

So basicly I think Igno Molnar's approach is good. He just have to make it configureable. There is still way to go before you can make any real-time application at all but I don't think the path is blocked unless the main kernel developers is talked into blocking such a development. Making various subsystems be directly useable from real-time application will take a very long time making them stop interfering with real-time threads is doable with realtively non-intrusive patches.

Approaches to realtime Linux

Posted Oct 15, 2004 1:35 UTC (Fri) by karim (subscriber, #114) [Link] (1 responses)

The main reason I don't like RTAI/Fusion is that you have to make special drivers for it. If the real-time is included in the kernel you would "only" have to review the drivers and subsystems you want to call directly from you real-time threads and check their behaviour wrt. real-time. The obligation of the rest of the system is that it only holds spinlocks for a very short time and otherwise use mutexes as locks.

The point is as I made it before: for many developers, including driver developers, the kernel's behavior is not entirely clear. Add a new API and people will use it, and it will find its way into "normal" Linux drivers. And once it's everywhere we'll still be at square one in regards to finding who's influencing the latency ... The solution to this problem is to provide a very basic API that provides hard-rt while not being as simple to use as just yet another locking scheme. I believe the Adeos interrupt pipeline does this quite well for the reasons I have enumerated elsewhere, and it changes nothing to kernel's current behavior.

It is up to companies like MontaVista to test how you can change the various parameters and they can earn their living by selling that knowledge. It is also their job to check that the various subsystems behave nicely wrt. locking. Forinstance, if somebod...

Sorry, the Linux community is not about depending on any distro. The fact of the matter is that whatever new feature finds itself in the kernel out to be accessible to anyone out there who cares about that type of functionality, regardless of whether he uses Debian, MV, or if he/she builds his own from scratch. Notice that, as Jonathan points out, the existing minimal preemption functionality that's already there has not yet been adopted by all kernel developers. Certainly trying to sell this new preemption on steriods by making the case that distros will audit the kernel for their clients is likely to be received coldly.

Approaches to realtime Linux

Posted Oct 15, 2004 9:04 UTC (Fri) by simlo (guest, #10866) [Link]

Add a new API and people will use it, and it will find its way into "normal" Linux drivers.

What new API? To be honest I think there are already too many lock-APIs in the kernel and weird kind of rules of what kind of locks should be used where. What is needed is actually a cleanup such the developer only sees one API and a generic method for locking. The specific kind of lock should be set at configuration time. The developer has to worry about is to avoid deadlocks no matter what kind of locks is used, but nothing more really. By making interrupt handlers run in threads by default it will make it a easier for the driver developer as he don't have to worry about the special rules of interrupt context.

Another of my points is that there is no "one size fits all" solution. To make a real-time system you have to configure stuff for your specific application. Thus the kernel developer should not be giving a lot of new APIs and options. He shouldn't forinstance pick at what priority his interrupt has to be performed at. He should make it such that it will work no matter what the priority is - and maybe also such it works even if it runs in interrupt context still but that could be too high a demand on him.

Sorry, the Linux community is not about depending on any distro. The fact of the matter is that whatever new feature finds itself in the kernel out to be accessible to anyone out there who cares about that type of functionality, regardless of whether he uses Debian, MV, or if he/she builds his own from scratch. Notice that, as Jonathan points out, the existing minimal preemption functionality that's already there has not yet been adopted by all kernel developers. Certainly trying to sell this new preemption on steriods by making the case that distros will audit the kernel for their clients is likely to be received coldly.

There is something called experimental drivers. I tried out the ArcNet driver on 2.6.8.1. It called a NULL'ed function pointers. Not testet at all. Somebody has to test stuff - and Linus can't verify that everything have been tested in every configuration. One could say: "remove the ArcNet driver", but that would only make it much harder to get anyone to fix it. And even after I fixed it it still didn't work with SMP and preemption. I.e. you have configurations in the default kernel which simply does not work. You need someone to veryfy your configuration. That somebody can be yourself or you can buy help from a company.

Another examble is the PPC board sitting next to me on the table. Can I make Linux run it? Yes, I can, but I need an expert for it. MontaVista and other companies offers to sell be that expertice. I can buy that or I can spend approximately 2 weeks to figure it out myself.

All I say to the kernel community is: Make these things configurable, but allow the default kernel to have clauses with "If you pick these options don't expect your kernel to be stable". Let companies like MontaVista make a living of pushing into these areas. The most important thing is to avoid code forks which will make it a mess for everybody in the long run. On the other hand make the changes to the actual kernel you build from main tree minimal by make things configurable compile time.

Approaches to realtime Linux

Posted Oct 13, 2004 22:45 UTC (Wed) by bluefoxicy (guest, #25366) [Link] (1 responses)

NOTE: deterministic hard-rt is not about speed, it's about determinism. While Ingo's work is great at reducing latency, it cannot guarantee response times regardless of the load, kernel configuration, and driver set.

-- Karim Yaghmour

Purpose of the Project
The purpose of this effort is to to further reduce interrupt latency and to dramatically reduce task preemption latency in the 2.6 kernel series. Our broad objective is to achieve preemption latency bounded by the worst case IRQ disable.

-- http://source.mvista.com/linux_2_6_RT.html

Does this qualify? It's "bounded" yes? The bounding would qualify as true real-time, and the reduction of latency beyond that bound would just be a happy responsiveness bonus. Am I right? If not, please show err.

Approaches to realtime Linux

Posted Oct 14, 2004 5:31 UTC (Thu) by karim (subscriber, #114) [Link]

Have you kept a count of the number of "reduce" and "broad objective" in that phrase?

Deterministic hard-rt is not about broad objectives or reducing latencies, it's about making guarantees. As it stands, MV's PR relies on slides that show graphs with maximum interrupt disable times, whereupon they can tell crowds: "Here, we have a hard-rt system, it's maximum latency is such as such and measured by our tools." That's just crap because no matter how large a sample is used for measurement (or how long the measurement session lasts) determinism is not about spikes in a graph, it's about mathemathically/algorithmically-demonstrable time-bound operation regardless of load and driver set, and reducing the latency by threading interrupts and introducing new locking primitives does not change the problem: The Linux kernel was never architectured to be a hard-rt deterministic kernel and the drivers shipped and the applications that run on it have never been meant to provide such behavior.

I've said this elsewhere, there is no path of incremental patches that can be applied to the kernel that will make it magically become deterministic. The kernel is meant to provide a best-case scenario for all the software it interacts with: drivers and applications alike. Deterministic hard-rt is all about making guarantees, both in terms of time and in terms of resources.

The greater question that beholds kernel developers is: can the Linux kernel be made to exhibit deterministic hard-real-time bevahior while keeping it fit for the development of mainstream drivers and applications?

Don't get me wrong, reducing latency is an extremely worthy goal, and I encourage any effort in that direction. However, as much as I trust the Linux kernel development community's inventivness and adaptibility to constraints, as much as I believe that providing the type of services required for applications with extreme time-dependencies is a goal that is not reconciliable with making the Linux kernel an inviting platform for driver and application developers.

Personally, I believe that the preemptability feature, which in reality is not yet actually used by most users out there for many reasons including stability, should be dropped altogether in favor of a simple infrastructure that allows time-sensitive applications to get what they need in a Linux environment: deterministic access to outside events. I believe the Adeos interrupt pipeline is the least intrusive and the most effective way of achieving this. It is a very small patch, it provides deterministic hard-rt, it can be built upon to provide a wide-range of services (RTAI/fusion being an example), and from the API useability point of view, it clearly stands out from the rest of the kernel API as being targeted for extreme outside-event-timing-responsiveness-sensitivity and is therefore much less likely to be used by accident by a driver or application developer.

Approaches to realtime Linux

Posted Oct 12, 2004 21:26 UTC (Tue) by Quazatron (guest, #4368) [Link] (3 responses)

As a non-programmer, I'd very much like to see LWN run a simple article explaining the differences between semaphores, mutexes, spinlocks, etc.
I really like to read about kernel programming, but those structures are a complete mistery to me.

Quick and dirty guide to locking primitives

Posted Oct 12, 2004 23:29 UTC (Tue) by aya (guest, #19767) [Link] (1 responses)

Locking is all about not letting multiple processes do the same thing at the same time. For
example, say there's some code where every time it gets executed, it increments a counter
somewhere. Let's also say that to increment the counter, first you have to read its value,
then you have to add one to it, then you have to write it back. So, it's a three step process.
That means anyone incrementing the counter could be interrupted (preempted) in the
middle. If two processes are trying to increment the counter at the same time, something
like this could happen (in theory):

* process A reads the counter (counter = 1) and is interrupted
* process B reads the counter (counter = 1), adds one to it, and stores it back, then is
interrupted (counter now = 2)
* process A adds one to the value of the counter that IT read and stores it back (counter
now = 2 instead of 3)

Since process A had read the counter before process B stored the new value, it writes back a
wrong value. You can protect operations like this by using locks. in this case, to increment
the counter, you'd have to have control over a lock. You take the lock, do all the counter
incrementing, then release it. Since only one process is allowed to have control over the
lock, and our rules say you have to have control over the lock to increment the counter, the
above situation would look like this:

* process A takes the lock and reads the value of counter (counter = 1), and is interrupted
* process B tries to take the lock, but fails; it gets interrupted while waiting for the lock to
be released by process A
* process A finishes incrementing the counter (counter = 2), releases the lock, and gets
interrupted a little while later
* process B tries to take the lock again, succeeds, and increments the counter to 3, releases
the lock, then gets interrupted

Since the lock ensures that counter access is mutually exclusive among processes - that is,
only one process can do it at once - locks are often called mutexes. Also, any piece of code
that requires mutually exclusive access is called a critical section. In this case, incrementing
the counter is our critical section.

There are two basic kinds of lock in Linux: spinlocks and semaphores. The major difference
is how they handle waiting for locks to be released. Spinlocks sit in a loop, continually
checking the value of the lock to see if another process has taken it. Once another process
releases the lock, it can continue. This is simple, but wastes CPU time - if processes only
hold locks for a very short time, spinlocks are okay. Semaphores, on the other hand, put a
process to sleep if it tries to take a lock held by another process, and wake up the waiting
process when the lock gets released.

You may also want to read Rusty's Unreliable Guide to Locking. It's fairly old, but the basic
concepts are valid.
http://www.kernel.org/pub/linux/kernel/people/rusty/kerne...

Quick and dirty guide to locking primitives

Posted Oct 13, 2004 9:24 UTC (Wed) by Quazatron (guest, #4368) [Link]

That was very helpful, thank you!

The Parable of the Dining Programmers

Posted Oct 13, 2004 4:50 UTC (Wed) by maney (subscriber, #12630) [Link]

Five programmers sit around a circular table. Each programmer spends his life alternatively hacking and eating. In the center of the table is an assortment of sushi and accompanying dishes. By the rules of the house style guide, a programmer needs two chopsticks to eat a piece of sushi. Unfortunately, as programming is not so well paid as managing, the programmers can only afford five chopsticks. One chopstick is placed between each pair of programmers, and they must each use only the sticks to his immediate right and left.

For the rest of the details in slightly altered form, google(dining philosophers).

Use a coprocessor

Posted Oct 14, 2004 6:43 UTC (Thu) by BrucePerens (guest, #2510) [Link] (1 responses)

Aw, com'on, guys. An AVR with USB 2.0 interface costs $6 and runs a number of small real-time kernels. See its information here. The developer kit is about $120 from Digi-Key. Plug this into your Linux system and let it handle the real-time tasks.

Bruce

Use a coprocessor

Posted Oct 21, 2004 14:53 UTC (Thu) by renox (guest, #23785) [Link]

Why are you suggesting that the real-time processing task is small?

MontaVista's market is telecom where the processing is very heavy and needs real-time behaviour too..

A better example

Posted Oct 14, 2004 15:58 UTC (Thu) by ssavitzky (subscriber, #2855) [Link]

Although the nuclear power plant is a common example of a hard real-time application, there's a better one close at hand: audio.

You have a stream of 8- (phone), 16- (CD), or 24- (pro recording) bit samples moving through the system at anywhere from 6k to 100k per second. Drop one and you might not notice it, but the hardware cleverly batches them up into blocks. If you're not done with one block before the next one comes in, you will notice it.

This is why audio-oriented distros like DeMuDi and Planet CCRMA use kernels with low-latency patches installed.

Approaches to realtime Linux

Posted Oct 14, 2004 22:57 UTC (Thu) by brianomahoney (guest, #6206) [Link]

Once again the main point on HARD REAL-TIME is being completely missed;
the reality of the situation is that there is a lot of FUD/ingorance
here, and very little understanding of balanced Hard/soft system approaches.

Three cases arise, there is a VITAL HARD-REAL-TIME crisis time; use hardware,
maybe a $2 dedicated MPU, which may spend 99.99% idle but you can prove it
will meet the need

Then there is the we really need to service this event within nnn u-seconds
but if it takes that + 50% the sky wont fall or we need to do 8000 of these
per second; here latency improvement helps and if it dosnt introduce serious bugs. This is all good stuff.

It is a nice to have and of value dealing with PHBs, see above.