LWN: Comments on "The realtime preemption mini-summit"

Lock naming

efexis — Sun, 11 Oct 2009 16:34:35 +0000

Sleepy sounds like they might run a bit slow and probably need to sleep... if the locks may end up sleeping due to external conditions then it should be a narcolocksy :-)

The realtime preemption mini-summit

efexis — Sun, 11 Oct 2009 16:02:21 +0000

Priority inheritance may often (usually, even) not be the best thing to rely on by design, sure, but it is always better to have support for it, to avoid inversion just in case, than to lack it and have a Pathfinder-style incident on your hands :-)

The realtime preemption mini-summit

jnareb — Sun, 11 Oct 2009 09:53:26 +0000

> So the realtime tests might remain on their own, but it would be nice, at least, to standardize test options and output formats to help with the automation of testing. XML output from test programs is favored by some, but it is fair to say that XML is not universally loved in this crowd.

Why not use Test Anything Protocol (TAP), originally developed for unit testing of the Perl interpreter? See http://www.testanything.org
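For reference, TAP output is just plain text on stdout: a plan line `1..N` followed by one `ok N - description` or `not ok N - description` line per test. Below is a minimal C sketch of a test program emitting it. The helpers `tap_format` and `tap_report` are hypothetical names and are not part of any existing rt test suite.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one TAP result line into buf, e.g. "ok 1 - latency under bound".
 * Returns what snprintf returns (the untruncated length). */
int tap_format(char *buf, size_t len, int num, int passed, const char *desc)
{
    return snprintf(buf, len, "%s %d - %s", passed ? "ok" : "not ok", num, desc);
}

/* Print the plan line, then one result line per test (numbered from 1). */
void tap_report(int count, const int *passed, const char *const *descs)
{
    char line[256];

    printf("1..%d\n", count);
    for (int i = 0; i < count; i++) {
        tap_format(line, sizeof line, i + 1, passed[i], descs[i]);
        puts(line);
    }
}
```

Called as `tap_report(2, (const int[]){1, 0}, (const char *const[]){"max latency < 50us", "no missed deadlines"})`, it prints `1..2`, then `ok 1 - max latency < 50us` and `not ok 2 - no missed deadlines`. Any TAP consumer can then aggregate results across test programs.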

The realtime preemption mini-summit

simlo — Tue, 06 Oct 2009 15:37:23 +0000

As I said: you could leave rwlocks half PI-aware: let readers boost the writer, but not the other way around. This will most likely work in many cases. It means an RT task can't write-lock an rwlock and must defer the operation to another task. Config options would be needed....

The realtime preemption mini-summit

bdonlan — Mon, 05 Oct 2009 19:10:59 +0000

Buggy instructions can also be fixed in the kernel, however, and at least then you know about them. While this may be a bit infeasible for Windows, there should be some kind of switch Linux can use to disable the SMI handling and just pass things to the normal #UD handler. If you then include hooks for any operations needing emulation at the same time as loading new microcode to disable the hardware support, there is no problem.

Typical geek party

dvhart — Sun, 04 Oct 2009 14:03:59 +0000

That, or when the camera came out we all tried to find something else to focus on. ;-) Actually, there was discussion 100% of the time. Most of the time, about 1/3 of the room was participating while the others focused on something else. I think that's probably typical (or even pretty good) given the diversity of topics and expertise in the room.

About 2.6.29.5-rt22-tirqonly patch and the exact test scenario.

dvhart — Sun, 04 Oct 2009 13:59:42 +0000

That isn't a patch, it's just a .config setting. Grab the 2.6.29-rt22 patches (see Download on rt.wiki.kernel.org), set CONFIG_PREEMPT (not CONFIG_PREEMPT_RT), and set the hard and soft threaded IRQ options to yes. As for the exact test scenario, I don't have the details, but running dbench is fairly straightforward and will easily reproduce these results. Ingo did so with a simple 10-second run during discussions at the rt-summit.

The realtime preemption mini-summit

dvhart — Sun, 04 Oct 2009 13:56:11 +0000

WRT rwlocks: we actually cap the reader count at 1 in PREEMPT_RT for that very reason. This is unfortunate, and one of the causes of performance degradation on -rt for certain workloads. There was some discussion during the rt-summit in Dresden about making kernel rwlocks non-PI-aware for this reason. Some more investigation is needed before we make a decision there.

The realtime preemption mini-summit

dvhart — Sun, 04 Oct 2009 13:51:33 +0000

There are other, less known uses for SMIs that are an unfortunate reality of our world. Fixing hardware bugs is one: a buggy instruction, for instance, can be emulated under an SMI. It would be wonderful if those things never existed, but in practice that just isn't the case.

Typical geek party

nevets — Thu, 01 Oct 2009 06:31:40 +0000

If you look closely at the picture of everyone, you will notice that they are (mostly) all concentrating on their laptops. This probably shows that the room was silent most of the time, and everyone was communicating over IRC!

The realtime preemption mini-summit

jcm — Wed, 30 Sep 2009 19:20:45 +0000

Sounds like you guys had fun :)

Lock naming

doogie — Wed, 30 Sep 2009 17:32:25 +0000

You mean baby bear.

POSIX shmem in PostgreSQL

nix — Wed, 30 Sep 2009 15:14:13 +0000

Oh, curses. We need a POSIX syscall that does what lsof/fuser do, really.

Lock naming

nevets — Wed, 30 Sep 2009 12:03:50 +0000

goldilocks was indeed mentioned. But the sleepy locks were not. I'll have to have Jon add that one to the list of possibilities. :-)

The realtime preemption mini-summit

mjthayer — Wed, 30 Sep 2009 08:05:48 +0000

> Or you could say that writers don't boost the readers but readers can boost the single writer. That way you can't use rwlocks in real time tasks and that would not be a problem in most cases.
So to return to my previous question, this would simply mean not trying to get it "right" for this API and clearly writing that on the box.

> But the kernel would need a lot of review to be sure and therefore I fully understand the current solution in the preempt-RT patch.
Of course I was naively thinking that the API user would be aware of what locking they are using, but that won't hold if they are doing the locking implicitly through other APIs.

The realtime preemption mini-summit

simlo — Wed, 30 Sep 2009 07:49:27 +0000

Well, as one of those who actually pushed for and implemented a little bit of the priority inheritance in the beginning, I must say that he is just making excuses for not doing it in RTLinux, because doing it right _is_ indeed very hard. From experience I know it is done wrong even in VxWorks!

But it can be done, and it is done in the current rtmutex in Linux.

The cases he is talking about show me that he has not understood how to use the system at all. He makes the usual mistake of not distinguishing between a mutex and a semaphore used as a condition (i.e. waiting for some external event to happen).

Yes, making an RT application work with priority inheritance mutexes requires some programming rules: you can't block on a semaphore, socket, etc. while holding a lock. But heck, you should always try to avoid that in any multi-threaded program, to avoid the whole program eventually locking up because some message didn't arrive over TCP as expected.
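To make the mutex/condition distinction concrete, here is a sketch of waiting for an external event with POSIX threads, using a hypothetical `struct event` helper. The mutex only guards a flag for a few instructions. The potentially unbounded wait happens inside `pthread_cond_wait()`, which releases the mutex while the thread sleeps, so no lock is held while waiting for the event.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical auto-reset "event": the mutex protects only the flag. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
};

void event_init(struct event *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signaled = false;
}

/* Called by whoever observes the external event (e.g. a thread that
 * just received a message from a socket). */
void event_signal(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* pthread_cond_wait() atomically releases ev->lock while sleeping and
 * re-acquires it on wakeup, so the mutex is held only briefly. */
void event_wait(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->signaled)           /* loop guards against spurious wakeups */
        pthread_cond_wait(&ev->cond, &ev->lock);
    ev->signaled = false;           /* consume the event */
    pthread_mutex_unlock(&ev->lock);
}
```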

In general, locks should only be used "inside a module" to protect its state. The user of the module should not be aware of them. The modules should be ordered such that a low-level module never calls a high-level module with an internal lock taken, or you can create a deadlock. Or an even simpler rule: never make a call to another module with a lock held. In an RT environment with priority inheritance, a module can use this to guarantee the timing of all calls into it, because all the modules "lower" in the chain have known timing, and you therefore know the maximum time any internal lock can be held by any thread.
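Below is one way the "never call another module with a lock held" rule might look in C. It uses a hypothetical counter module with a private lock and a callback into some higher-level module. State is updated and copied under the lock, and the callback runs only after the lock has been released.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical module state: the lock is private to this file. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter_value;
static void (*counter_listener)(int new_value);   /* higher-level callback */

void counter_set_listener(void (*fn)(int))
{
    pthread_mutex_lock(&counter_lock);
    counter_listener = fn;
    pthread_mutex_unlock(&counter_lock);
}

int counter_get(void)
{
    int v;

    pthread_mutex_lock(&counter_lock);
    v = counter_value;
    pthread_mutex_unlock(&counter_lock);
    return v;
}

void counter_increment(void)
{
    void (*notify)(int);
    int snapshot;

    pthread_mutex_lock(&counter_lock);
    snapshot = ++counter_value;    /* update state under the lock */
    notify = counter_listener;     /* copy what the call-out needs */
    pthread_mutex_unlock(&counter_lock);

    /* Call out only after dropping the lock: the hold time stays bounded
     * and a higher-level module cannot form a lock-order cycle with us. */
    if (notify)
        notify(snapshot);
}
```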

And yes, priority inheritance costs a lot of performance. But in general you should try to avoid congestion and design your program so that the locks are not contended. The locks should only be considered "safeguards" against contention, which should not happen very often.

If you know how to use locks, and can avoid the pitfalls, priority inheritance will work for you - provided they are properly implemented by the OS. As is done in Linux.

Wrt. rwlocks: if a high priority, realtime writer wants the lock, it doesn't make sense to boost the readers, as you don't know how many there are. What you could do is limit the number of readers to a specific number. Or you could say that writers don't boost the readers but readers can boost the single writer. That way you can't use rwlocks in real time tasks, and that would not be a problem in most cases. But the kernel would need a lot of review to be sure, and therefore I fully understand the current solution in the preempt-RT patch.
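Below is a sketch of the "limit the number of readers" option, built as a hypothetical `capped_rwlock` on a pthread mutex and condition variable. With `max_readers` set to 1 it behaves like the reader cap described for PREEMPT_RT elsewhere in this thread. It does not implement priority inheritance itself and does not prefer writers over readers. It only illustrates how the cap bounds the number of owners a writer can ever be waiting on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct capped_rwlock {
    pthread_mutex_t lock;   /* protects the fields below */
    pthread_cond_t cond;    /* broadcast whenever the lock is released */
    int readers;            /* current number of read holders */
    int max_readers;        /* cap; 1 makes this behave like a mutex */
    bool writer;            /* true while a writer holds the lock */
};

void crw_init(struct capped_rwlock *l, int max_readers)
{
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->readers = 0;
    l->max_readers = max_readers;
    l->writer = false;
}

void crw_read_lock(struct capped_rwlock *l)
{
    pthread_mutex_lock(&l->lock);
    while (l->writer || l->readers >= l->max_readers)
        pthread_cond_wait(&l->cond, &l->lock);
    l->readers++;
    pthread_mutex_unlock(&l->lock);
}

/* Non-blocking variant: fails if a writer holds the lock or the cap is reached. */
bool crw_try_read(struct capped_rwlock *l)
{
    bool ok;

    pthread_mutex_lock(&l->lock);
    ok = !l->writer && l->readers < l->max_readers;
    if (ok)
        l->readers++;
    pthread_mutex_unlock(&l->lock);
    return ok;
}

void crw_read_unlock(struct capped_rwlock *l)
{
    pthread_mutex_lock(&l->lock);
    l->readers--;
    pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->lock);
}

void crw_write_lock(struct capped_rwlock *l)
{
    pthread_mutex_lock(&l->lock);
    while (l->writer || l->readers > 0)
        pthread_cond_wait(&l->cond, &l->lock);
    l->writer = true;
    pthread_mutex_unlock(&l->lock);
}

void crw_write_unlock(struct capped_rwlock *l)
{
    pthread_mutex_lock(&l->lock);
    l->writer = false;
    pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->lock);
}
```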

About 2.6.29.5-rt22-tirqonly patch and the exact test scenario.

jstultz — Tue, 29 Sep 2009 23:20:30 +0000

Yea, sorry, the chart wasn't originally intended to be distributed as far as it has, so I wasn't as rigorous with the data as I should have been.

2.6.29.5-rt22-tirqonly is the same as 2.6.29.5-rt22 with CONFIG_PREEMPT_RT disabled (CONFIG_PREEMPT is used instead).

I booted with maxcpus=$NP for each cpu point, and with dbench-3.04, ran:
./dbench $NP -t 7000 -D . -c client.txt

POSIX shmem in PostgreSQL

alvherre — Tue, 29 Sep 2009 22:47:39 +0000

BTW, mmap does not work either.

Lock naming

rahulsundaram — Tue, 29 Sep 2009 21:08:34 +0000

Call them sleepy locks then.

Lock naming

niv — Tue, 29 Sep 2009 20:42:21 +0000

"Well, some locks are to heavy, some are too lightweight. Since these are Just Right, they are obviously goldilocks."

Just have to applaud :).

Humor aside, we really do have to get the naming right; there's enough confusion as it is, as Jon points out. Lock names really need to be self-explanatory, or at the very least imply behavior that's somewhat in the ballpark of the actual behavior. Spinlocks that can sleep should ideally have big, flashing red neon warning signs, or some equivalent thereof, in their name.

What do they mean by "Realtime"?

niv — Tue, 29 Sep 2009 20:15:09 +0000

Determinism is what's really important to real-time.

It's often confused with low latency, but the two are separate criteria and often conflicting goals requiring a trade-off, made complicated by the fact that most applications typically want BOTH - determinism AND low latency.

Determinism is more easily understood as the ability to say "this task will take AT MOST n ms". That is, bounded maximum latency.

In the strictest case, this would mean the following:

it is preferable for all 5000 iterations of a task execution
to take 49us (less than 50us) than it is for 4950 to take
35us and 50 iterations to take 69us, when your application
requires a maximum latency of 50us.
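The arithmetic behind this example is easy to check with a small C sketch, using a hypothetical `summarize` helper. The second distribution has the better average: (4950 * 35 + 50 * 69) / 5000 = 176700 / 5000 = 35.34us, against 49us for the first. Yet it exceeds the 50us bound 50 times, while the first never does.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of a set of latency samples, in microseconds. */
struct latency_stats {
    double mean_us;
    unsigned max_us;
    size_t misses;      /* samples that exceeded the deadline */
};

struct latency_stats summarize(const unsigned *samples, size_t n,
                               unsigned deadline_us)
{
    struct latency_stats s = {0.0, 0, 0};
    unsigned long long sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] > s.max_us)
            s.max_us = samples[i];
        if (samples[i] > deadline_us)
            s.misses++;
    }
    s.mean_us = n ? (double)sum / n : 0.0;
    return s;
}
```

For a real-time requirement, `max_us` and `misses` are the numbers that matter. `mean_us` is what a throughput-oriented system optimizes.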

For most enterprise applications, the max latency is not a MUST_FINISH_BY with severe consequences for failure, but a REALLY_GOOD_TO_FINISH_WITHIN, with the average low latency being also important. Some applications can tolerate some outliers (maximum latency bound exceeded) as they usually need average low latency as well.

Most OSs are optimized for throughput-driven applications (where average latency is minimized).

Real-time Linux is optimized to offer greater determinism than the stock kernel. Hence the need for greater preemption, including the ability to preempt critical kernel tasks should a higher priority application become runnable.

And remember, you can only guarantee/meet real-time requirements for as many threads as you can run concurrently on your system - on an N-core system, you can at most guarantee that N SCHED_FIFO tasks at the same highest priority P will meet their real time guarantees (depending on a lot of things, handwave, handwave, but you get the general idea). So a lot depends on what the system is running, the overall application solution and top-down configuration of the entire system.
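Below is a sketch of the "N tasks on N cores" setup with POSIX thread attributes, assuming one SCHED_FIFO thread per CPU. The hypothetical helper `make_fifo_attr` requests an explicit SCHED_FIFO policy and priority, and pins the thread to a single CPU with the GNU extension `pthread_attr_setaffinity_np()`. Actually creating such a thread normally requires CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, but preparing the attributes does not.

```c
#define _GNU_SOURCE            /* for CPU_SET and pthread_attr_setaffinity_np */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Fill *attr for a SCHED_FIFO thread at `priority`, pinned to `cpu`.
 * Returns 0 or an error number; the caller destroys *attr when done. */
int make_fifo_attr(pthread_attr_t *attr, int priority, int cpu)
{
    struct sched_param param = { .sched_priority = priority };
    cpu_set_t cpus;
    int err;

    if ((err = pthread_attr_init(attr)) != 0)
        return err;
    /* Without EXPLICIT_SCHED the new thread would inherit the creator's policy.
     * The policy is set before the priority so the priority is validated
     * against SCHED_FIFO's range. */
    if ((err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0 ||
        (err = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0 ||
        (err = pthread_attr_setschedparam(attr, &param)) != 0)
        return err;

    CPU_ZERO(&cpus);
    CPU_SET(cpu, &cpus);
    return pthread_attr_setaffinity_np(attr, sizeof(cpus), &cpus);
}
```

The idea is to call it once per CPU (0 to N-1) and pass each attribute object to `pthread_create()`, so that each realtime thread gets a core of its own. Whether deadlines are then met still depends on everything else the system is running.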

The realtime preemption mini-summit

mjthayer — Tue, 29 Sep 2009 20:09:40 +0000

> I mean, assume you have code like
>
> int get_foo(struct foo* f)
> {
> lock_mutex(&mutex);
> memcpy(f, &source, sizeof(struct foo));
> unlock_mutex(&mutex);
> }
That particular example could be solved by RCU, although I don't want to start a showdown here, as I'm sure you would win it :) I was thinking more along the lines of avoiding contention in the critical path as much as possible, though.

POSIX shmem in PostgreSQL

alvherre — Tue, 29 Sep 2009 19:50:51 +0000

It's not just history. In fact, a patch was posted to add support for POSIX shmem, but as it turns out, the POSIX API is not complete enough for PostgreSQL's purposes. See here, for instance: http://archives.postgresql.org/pgsql-patches/2007-02/msg0...
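One thing SysV shared memory offers that POSIX `shm_open()` does not is a kernel-maintained attach count, `shm_nattch`. A server can use it to notice that processes from an earlier run are still attached to an old segment. This is presumably the kind of gap meant here, but the linked thread is the authority. The helper below is a hypothetical, minimal demonstration of reading that count.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private SysV segment, attach it once, and return the kernel's
 * attach count (shm_nattch). The segment is removed before returning.
 * Returns -1 on error. */
long sysv_attach_count_demo(size_t size)
{
    struct shmid_ds ds;
    long nattch = -1;
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);

    if (id < 0)
        return -1;

    void *addr = shmat(id, NULL, 0);
    if (addr != (void *)-1) {
        if (shmctl(id, IPC_STAT, &ds) == 0)
            nattch = (long)ds.shm_nattch;    /* 1: only this process */
        shmdt(addr);
    }
    shmctl(id, IPC_RMID, NULL);              /* destroy once detached */
    return nattch;
}
```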

What do they mean by "Realtime"?

dlang — Tue, 29 Sep 2009 18:56:59 +0000

by realtime they don't mean responding in the minimum time

they are looking for a response in a _predictable_max_ time

how short that predictable time is determines how suitable it is for a particular application, but the key thing is to make it predictable.

right now linux is not predictable, that is what they are working on fixing.

What do they mean by "Realtime"?

clameter — Tue, 29 Sep 2009 18:20:46 +0000

It seems that the realtime folks are fuzzy on what they are trying to accomplish. I thought realtime was ensuring that the kernel always responds to an event within a minimum time interval, but I don't see any discussion of what the minimum time interval is.

From the article it seems that there are numerous features in the kernel that are currently not "Realtime". That probably means that the potential latencies are beyond any assumable time interval. This includes such basic things as locking.

What is meant by "Realtime" then? What set of functionality of the kernel can be used so that a response is guaranteed within the time interval?

The realtime preemption mini-summit

aleXXX — Tue, 29 Sep 2009 18:14:53 +0000

Yes, it's messy.

Still it's a practical tool and works in general.
Another problem is that most RTOSes don't have the full priority
inheritance implemented, but a simplified version.
E.g. eCos (and I think also vxworks) raise the priorities as expected,
but lower them again when all mutexes in the system are released again.
This can be very late.

The poster before said:
"that they should either not wait for potentially low priority processes
in critical paths,"

This is not easy.
I mean, assume you have code like

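#include <pthread.h>
#include <string.h>
struct foo { int a, b; };   /* placeholder contents */
static struct foo source;   /* the shared variable */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void get_foo(struct foo *f)
{
    pthread_mutex_lock(&mutex);
    memcpy(f, &source, sizeof(struct foo));
    pthread_mutex_unlock(&mutex);
}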

i.e. you just protect access to the variable "source". You may need this
information in a low priority thread. The code looks innocent: there are
no loops, nothing nondeterministic, and it will take less than 10
microseconds.
So why not use the same mutex in all the other threads?
The issue is when a medium priority thread comes into play, suddenly the
code above can block a higher priority thread for a time determined by
the medium priority thread (which does not use that mutex at all).

Also, "make sure that the processes waited for already have the right
priority" is basically saying that all threads using the same mutex
should have the same priority ?
Doesn't work.

So, this is a hard issue, and there's no easy solution.
Maybe, try not to use too many shared variables, let your threads
communicate via messages/tokens/etc.
This helps, but everything gets asynchronous, which doesn't make things
necessarily easier.
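Below is one illustration of the message-passing style: a single-producer/single-consumer ring buffer using C11 atomics. Neither side ever waits on a lock the other holds, so no lock ownership is involved at all, although the consumer still has to decide how to wait when the ring is empty. The names and the size of 8 slots are arbitrary, and the sketch is only correct with exactly one producer thread and one consumer thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8   /* must be a power of two */

/* Only the producer writes head, only the consumer writes tail. Both are
 * free-running counters; unsigned wrap-around is harmless because
 * RING_SIZE divides SIZE_MAX + 1. */
struct ring {
    int slots[RING_SIZE];
    atomic_size_t head;   /* next slot the producer will fill */
    atomic_size_t tail;   /* next slot the consumer will read */
};

void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false instead of blocking when the ring is full. */
bool ring_push(struct ring *r, int msg)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;
    r->slots[head % RING_SIZE] = msg;
    /* release: the slot write becomes visible before the new head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false instead of blocking when the ring is empty. */
bool ring_pop(struct ring *r, int *msg)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *msg = r->slots[tail % RING_SIZE];
    /* release: the slot read completes before the producer may reuse it */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```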

Alex

The realtime preemption mini-summit

flewellyn — Tue, 29 Sep 2009 14:58:21 +0000

Well, the SysV stuff for a lot of things is unpleasant to work with. Still, it does work.

Lock naming

nettings — Tue, 29 Sep 2009 14:44:52 +0000

> There was some talk of the best names for "atomic spinlocks"; they could be "core locks," "little kernel locks," or "dread locks."

Well, some locks are too heavy, some are too lightweight. Since these are Just Right, they are obviously goldilocks.

The realtime preemption mini-summit

abacus — Tue, 29 Sep 2009 10:59:41 +0000

Priority inheritance is indeed messy. The following paper contains interesting background information: Victor Yodaiken, Against Priority Inheritance, July 2002.

About 2.6.29.5-rt22-tirqonly patch and the exact test scenario.

leemgs — Tue, 29 Sep 2009 09:28:33 +0000

I think that this is good information for realtime developers.
Can I get the 2.6.29.5-rt22-tirqonly patch and the exact test scenario
behind this comparison of 2.6.29.5, 2.6.29.5-rt22 and
2.6.29.5-rt22-tirqonly by Darren Hart and John Stultz?

I can only find the result file at
http://dvhart.com/darren/rtlws/elm3c160-dbench-vanilla-vs... without the scenario
used to run dbench (http://samba.org/ftp/tridge/dbench/).

The realtime preemption mini-summit

nix — Tue, 29 Sep 2009 09:23:18 +0000

It's a bit unfortunate that the ability to have the OS actually control the machine is relegated to a "premium real-time mode" on "select platforms". *Everything* should work like this.

The only tolerable use for SMIs IMNSHO is emergency thermal control, i.e. keeping the hardware safe...

The realtime preemption mini-summit

nix — Tue, 29 Sep 2009 09:21:47 +0000

Yes. I don't really know why databases use SysV SHM: the API for all the SysV stuff is so cracksmoking and unpleasant and non-Unixlike. I suppose POSIX shared memory functions didn't exist when PostgreSQL was young, and mmap() of /dev/zero wasn't very portable at that point... so maybe it's just history.
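For comparison, here is the POSIX route: `shm_open()` plus `ftruncate()` and `mmap()`. This is a minimal sketch with a hypothetical helper name and an illustrative object name. Older glibc versions need `-lrt` for `shm_open()`. Related processes can skip the name entirely with `mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0)` before forking.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create (or open) a named POSIX shared memory object of `size` bytes and
 * map it read/write. The name must start with '/'. Returns MAP_FAILED on
 * error. */
void *posix_shm_map(const char *name, size_t size)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);               /* the mapping stays valid after close */
    return p;
}
```

Nothing in this API reports how many processes currently have the object mapped. There is no counterpart to the `shm_nattch` field of a SysV segment.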

The realtime preemption mini-summit

nix — Tue, 29 Sep 2009 09:20:24 +0000

Oh right. My ignorance is showing, so I'll go and correct that before saying anything else (and add a comment here once I know one way or the other).

The realtime preemption mini-summit

mjthayer — Tue, 29 Sep 2009 08:28:38 +0000

This is probably a very silly question, but priority inheritance seems to be such a messy thing to do right - wouldn't it be better to tell API users directly that it is not guaranteed and that they should either not wait for potentially low priority processes in critical paths, or make sure that the processes waited for already have the right priority? I understand that it is preferable to solve things in a generic way where that is feasible, but one has to be careful that the solution doesn't end up being worse than the problem.

-rt tree

dvhart — Tue, 29 Sep 2009 07:15:38 +0000

The rt wiki http://rt.wiki.kernel.org is a good source of information as well, including links to the download site.

-rt tree

corbet — Tue, 29 Sep 2009 05:38:34 +0000

See, for example, the 2.6.31-rt11 announcement.

The realtime preemption mini-summit

mcgrof — Tue, 29 Sep 2009 05:26:14 +0000

Ah got it -- thanks. And what is the tree where I can pull all this from to test?

The realtime preemption mini-summit

josh — Tue, 29 Sep 2009 03:15:32 +0000

No, nobody has a patch which converts all ISRs individually to threaded interrupt handling. Mainline allows selectively making individual interrupts threaded. The -rt patchset allows making *all* interrupts automatically threaded, with no individual changes to each one.

The realtime preemption mini-summit

mcgrof — Tue, 29 Sep 2009 02:02:00 +0000

Does someone really have a patch which converts all drivers to use threaded ISRs? If so I'd like to see the wireless parts :)

The realtime preemption mini-summit

niv — Mon, 28 Sep 2009 23:15:05 +0000

Jon, thanks for the excellent write-up as usual.

"Some hardware allows SMIs to be disabled, but it's never clear what the consequences of doing so might be; if the CPU melts into a puddle of silicon, the resulting latencies will be even worse than before".

Just to elaborate on the above, and to clear up any doubt on the issue, as we (IBM) have actually done work to remediate SMIs - we are pretty confident that our CPUs will not melt into a puddle, and we officially support on select platforms IBM premium real-time mode which allows us to do this safely (don't try this at home :)).

In all seriousness, though, we have open-sourced the work that Keith Mannthey has done (he talked about this at LPC this past week, and we'll have his slides up on the LPC website shortly) and I imagine it would be of interest to others.