Revisiting the kernel's preemption models (part 1)

Posted Sep 22, 2023 12:34 UTC (Fri) by kazer (subscriber, #134462)
In reply to: Revisiting the kernel's preemption models (part 1) by mtodorov
Parent article: Revisiting the kernel's preemption models (part 1)

The full RT is aimed at cases where safety (hazard) analysis determines that it is a critical task that must be made reliable to nth degree (what would be worst case if it failed, is it car ABS, heart monitor or CD-player). If it were part of, say, fly-by-wire system there would not be need to be in any other mode than full RT always. Less critical systems could have whatever mode suitable for the task.

The RT mode is therefore determined by the task before code is ever deployed, not as a mode of operation "on the fly". RT is normally due to safety reasons and the worst case is what matters. If it safety isn't of consideration (only throughoutput) preemption isn't needed.

Revisiting the kernel's preemption models (part 1)

Posted Sep 22, 2023 12:43 UTC (Fri) by kazer (subscriber, #134462) [Link]

Adding to previous..

The main feature of full RT is that it is *deterministic* - things happen at precisely when predicted and expected. Latency or performance is not important and they can be sacrificed for the sake of correctness.

Servers can have a long latency for the sake of throughoutput and desktop users can prefer low latencies. While these deal with different pre-emption modes, they are not about real-timedness. So even in these cases boot-time selection is quite enough.

Revisiting the kernel's preemption models (part 1)

Posted Sep 22, 2023 13:56 UTC (Fri) by farnz (subscriber, #17727) [Link]

RT isn't only of importance in safety-critical systems; in audio systems, 3 ms of latency is roughly equivalent to 1 m distance between the output device and the listener. For simple listening cases, this is a non-issue; you just have a big enough buffer that any plausible latency is covered by the buffer (e.g. a 3,000 ms buffer will cover most likely stalls). But for telephony, live effects and other such cases where there's feedback between the microphone and the output, you need to bound latency to a much smaller value.

The usual distinction in the RT world is between hard RT and soft RT; a system is soft RT if it recovers from a missed deadline as soon as it starts meeting deadlines again, and hard RT if further action has to be taken to recover or if recovery from missed deadlines is not possible. In this setup, audio is soft RT - if you miss a deadline, the audio system is in a fault condition, but as soon as you start hitting deadlines again, it recovers automatically; a fly-by-wire system is hard RT, since a failure to meet deadlines can result in the plane needing maintenance work.

Revisiting the kernel's preemption models (part 1)

Posted Sep 23, 2023 6:17 UTC (Sat) by donald.buczek (subscriber, #112892) [Link] (3 responses)

> If it were part of, say, fly-by-wire system there would not be need to be in any other mode than full RT always.

Good example. You system doesn't need to be in RT mode if the plane is parked on the ground. It may, for example, be in some low power mode. When you are ready to go, you switch to fly mode which implies RT on critical systems. It's not critical, how long that takes. What you need, though, is an indication that the switch is completed.

Revisiting the kernel's preemption models (part 1)

Posted Sep 23, 2023 7:23 UTC (Sat) by mb (subscriber, #50428) [Link] (1 responses)

But you can do that now with an existing RT kernel.
If your RT application decides to go non-RT, then it can do that now. Just switch scheduling modes.
And if you decide to put your system into sleep, you can do that now.
Switching the kernel preemption mode at runtime is absolutely not necessary to do all that.

Revisiting the kernel's preemption models (part 1)

Posted Sep 23, 2023 8:27 UTC (Sat) by mtodorov (guest, #158788) [Link]

Yet for a number of reasons, the RT preemption model isn't compiled by default ...

Just as a curiosity, about a year ago the kernel 6.0 compiled with KASAN made a number of RCU stall warnings.

It was blamed on the KASAN slow down, but I felt something odd if some RCU lock was held for more than 20 milliseconds, enough to trigger the NMI watchdog.

After a year, I get the notion that it was due to the coarse granularity of the locks held while doing some tasks. In the worst case, an unlucky spinlocks contention can degrade your SMP system's performance in the worst case to a multicore 6502 ... Did somebody make a note on real-time telephony and heart monitor systems?

Don't take this as I am ungrateful for the kernel - I am just not fond of the spinlocks and I wish I had a new parallel programming paradigm and a better grasp on the lockless algorithms and the RCU ...

Revisiting the kernel's preemption models (part 1)

Posted Sep 25, 2023 10:23 UTC (Mon) by farnz (subscriber, #17727) [Link]

You need to separate out PREEMPT_RT from real-time mode.

PREEMPT_RT configured kernels have slightly higher overheads than normal kernels, especially under full load, in return for a hard guarantee on the latencies that can be measured by processes in real time scheduling classes (at the time of this comment, that's SCHED_FIFO, SCHED_RR and SCHED_DEADLINE). If you don't use real time scheduling classes, then the overhead of PREEMPT_RT is entirely wasted, because online (SCHED_OTHER) and batch (SCHED_BATCH, SCHED_IDLE) scheduling classes do not get lower latencies in a PREEMPT_RT kernel.

Separately, a system is "in real-time mode" if it has one or more processes in runnable state in a real-time scheduling class. You can "switch" in and out of "real-time mode" by either blocking all processes in real-time scheduling classes on a kernel primitive such as a futex, or by removing all processes from real-time scheduling classes.

You can be "in real-time mode" with a PREEMPT_NONE or PREEMPT_VOLUNTARY kernel; the issue with this is that the kernel does not provide any guarantee that your process will be scheduled in time to meet its deadlines in this situation, but it's useful for (e.g.) providing musicians in a studio with live monitoring of the recording their DAW is making (complete with applied effects), where the worst consequence of missing a deadline is that you'll need to do another take. You'd prefer not to have to do another take, hence using real time scheduling to ensure that only the kernel can cause you to miss a deadline.

You can also use a PREEMPT_RT kernel with no real-time tasks. In this case, the kernel's latency guarantees are worthless, because you have no tasks that get guaranteed latencies, but you pay a small amount of overhead to allow the kernel to reschedule bits of itself to keep the latency guarantees.

The intended case for PREEMPT_RT kernels is to have real-time tasks, at least some of the time, where the system has been analysed to guarantee that the real-time tasks will always meet their deadlines.

You don't need to change the kernel configuration to switch out of "real-time mode" and enter low power states that are incompatible with your deadlines; you just change your scheduling. You can't switch codepaths between PREEMPT and PREEMPT_RT at runtime, because you need to have the system go completely idle in order to change locking, which is also the pre-requisite for kexec. If you do want to switch kernels, you can kexec between kernels at roughly the same time penalty you'd face if you tried to switch at runtime.