Revisiting the kernel's preemption models (part 1)
Posted Sep 21, 2023 17:58 UTC (Thu) by mtodorov (guest, #158788)
In reply to: Revisiting the kernel's preemption models (part 1) by mb
Parent article: Revisiting the kernel's preemption models (part 1)
However, I can see the need for that, e.g. in changing a battleship's computer from a "normal operation" mode, in which performance and computing efficiency are preferred, to a "battle mode" or "condition red", where latency would be paramount for the ship's survival.
Posted Sep 21, 2023 20:49 UTC (Thu)
by geofft (subscriber, #59789)
[Link] (1 responses)
So you can't really convert this in a running system, because locks can be held at any time. You'd have to acquire the spinlock (which can block), then turn it into a sleeping lock, then figure out who's been spinning on the spinlock while you held it and somehow convert them. For a use case like going into battle, even if this were technically possible, having the mode switch require acquiring every lock in the system isn't acceptable because it could take arbitrarily long. (And even if it weren't for this, there would be an ongoing performance cost in both modes to support checking which mode you're in and so forth, meaning that performance would never be as good as a pure real-time or pure non-real-time kernel.)
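The spin-vs-sleep distinction geofft describes can be sketched in userspace (a toy illustration only, not kernel code: real kernel spinlocks also disable preemption, and `threading.Lock` here stands in for an atomic test-and-set):

```python
import threading

class ToySpinlock:
    """Toy userspace spinlock: waiters busy-wait instead of sleeping."""
    def __init__(self):
        # The inner Lock is used only as an atomic test-and-set primitive.
        self._flag = threading.Lock()

    def acquire(self):
        # Spin until the non-blocking test-and-set succeeds; while spinning,
        # this CPU does no useful work -- which is exactly why converting
        # waiters to a sleeping lock mid-flight is so awkward.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

counter = 0
lock = ToySpinlock()

def worker():
    global counter
    for _ in range(10_000):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: the spinlock serializes the increments
```

A sleeping lock would instead park each waiter in the scheduler (`self._flag.acquire(blocking=True)`); the waiters' state lives in different places in the two designs, which is why one cannot simply be swapped for the other while held.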
Posted Sep 21, 2023 21:26 UTC (Thu)
by mtodorov (guest, #158788)
[Link]
The locks which are currently held in non-preemptible mode will obviously remain held, but the idea behind locking is that locks are held only briefly, for performance reasons anyway.
I recall kernel NMI watchdog warnings about e.g. RCU locks being held for 20 seconds, but IMHO that is a sign of something deeply wrong in the kernel code ...
However, I have given up the idea that I will ever understand fully how +32M lines of code work and inter-operate.
Spinlocks are a busy-waiting idea dating back to the 6502 and Z80, and I think they're evil.
Posted Sep 22, 2023 3:37 UTC (Fri)
by josh (subscriber, #17465)
[Link]
Switching modes is a complex and error-prone operation that affects the operation of all other software, so the net effect of doing this on the verge of a critical situation would be to switch into a less-tested mode for all software. Such a mode switch would need to be regularly tested, but even such testing would not be as good as simply running in that mode all the time.
Posted Sep 22, 2023 10:38 UTC (Fri)
by cloehle (subscriber, #128160)
[Link] (3 responses)
Either your application is real-time critical or it's not. Even in your case, "switching to battle mode" would be time-critical, wouldn't it?
Posted Sep 22, 2023 12:17 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (2 responses)
> Either your application is real-time critical or it's not. Even in your case, "switching to battle mode" would be time-critical, wouldn't it?
Actually if you really mean real-time, no it wouldn't.
The term "real-time" is, sadly, horribly abused and often confused with "online".
We basically have three - conflicting - targets we may want to optimise for.
(1) Batch: Do as much work as possible, as efficiently as possible, and if some jobs take forever, well they do.
(2) Online: Don't keep the user waiting. If that means background tasks get held up because we're wasting time checking to see if the user wants anything, well so be it.
(3) Real time: We have deadlines. They MUST be met. We might have a job that only takes 20 minutes, but it needs to be completed AT midnight tomorrow. Or say we're making jam. Finish too early, and the jam will run away off the toast. Finish too late and we won't be able to get it out of the pan and into the jars. Basically, for real-time it doesn't matter when the deadline is, it matters that you can't move it.
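Wol's three targets map loosely onto Linux scheduling classes, which Python's `os` module exposes on Linux (the mapping below is my reading of the thread, not an official taxonomy; SCHED_DEADLINE also belongs in the real-time row but is not exposed by `os`):

```python
import os

# Wol's three optimisation targets, mapped (loosely) onto the Linux
# scheduling classes that serve them.
targets = {
    "batch":     [os.SCHED_BATCH, os.SCHED_IDLE],  # throughput; jobs may take forever
    "online":    [os.SCHED_OTHER],                 # keep the user responsive
    "real-time": [os.SCHED_FIFO, os.SCHED_RR],     # deadlines that cannot move
}

for target, classes in targets.items():
    print(target, classes)
```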
So take your battleship. Real-time doesn't mean the transition will happen quickly. Real-time means you know that if you engage real-time mode, the cascade of actions to change OS state will take, say, pretty much exactly five minutes (or whatever that time is).
Cheers,
Wol
Posted Sep 22, 2023 18:11 UTC (Fri)
by cloehle (subscriber, #128160)
[Link] (1 responses)
The critical path for the RT reaction just increased from signal -> RT reaction (with a defined latency upper bound for RT systems) to signal -> switch to RT-mode -> RT reaction (no upper bound anymore).
Or how does the system know when to be in RT-mode?
Posted Sep 22, 2023 18:41 UTC (Fri)
by mb (subscriber, #50428)
[Link]
There is no such thing as a "system RT mode".
The kernel just provides mechanisms for tasks to be scheduled with various RT properties (e.g. RT-Fifo, RT-RR, deadline sched, locking latency guarantees, etc..).
A task has to opt-in to being RT. (And selecting a RT sched class is only part of that. There are many many many more things to consider.)
And that's why it doesn't make sense at all to switch between full-preempt and RT-preempt during runtime. That would just add even more overhead with no added benefit. If your task doesn't need RT, then don't schedule it in a RT class.
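The opt-in mb describes can be sketched with Python's wrappers around the Linux `sched_setscheduler()` syscall (a minimal sketch, Linux-only; promoting yourself to SCHED_FIFO needs CAP_SYS_NICE or an RLIMIT_RTPRIO grant, so an unprivileged run takes the except branch):

```python
import os

# One task opts in to RT scheduling for itself; nothing system-wide changes.
param = os.sched_param(os.sched_get_priority_min(os.SCHED_FIFO))
try:
    os.sched_setscheduler(0, os.SCHED_FIFO, param)  # pid 0 == this process
    print("now SCHED_FIFO")
except PermissionError:
    print("no RT privileges; policy is still", os.sched_getscheduler(0))
```

Selecting the class is, as mb notes, only part of the job: a real RT task must also deal with memory locking, priority inversion, and so on.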
Posted Sep 22, 2023 12:34 UTC (Fri)
by kazer (subscriber, #134462)
[Link] (6 responses)
The RT mode is therefore determined for the task before the code is ever deployed, not as a mode of operation switched "on the fly". RT is normally needed for safety reasons, where the worst case is what matters. If safety isn't a consideration (only throughput), preemption isn't needed.
Posted Sep 22, 2023 12:43 UTC (Fri)
by kazer (subscriber, #134462)
[Link]
The main feature of full RT is that it is *deterministic*: things happen precisely when predicted and expected. Average latency or raw performance is not what matters, and both can be sacrificed for the sake of correctness.
Servers can tolerate long latencies for the sake of throughput, and desktop users can prefer low latencies. While these call for different preemption modes, they are not about real-time behaviour. So even in these cases boot-time selection is quite enough.
Posted Sep 22, 2023 13:56 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
RT isn't only of importance in safety-critical systems; in audio systems, 3 ms of latency is roughly equivalent to 1 m distance between the output device and the listener. For simple listening cases, this is a non-issue; you just have a big enough buffer that any plausible latency is covered by the buffer (e.g. a 3,000 ms buffer will cover most likely stalls). But for telephony, live effects and other such cases where there's feedback between the microphone and the output, you need to bound latency to a much smaller value.
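farnz's numbers are straightforward arithmetic: at a common pro-audio sample rate of 48 kHz (my assumption; the comment doesn't name a rate), a 3 ms latency budget is 144 frames of buffer, and the "3 ms ≈ 1 m" equivalence comes from the roughly 343 m/s speed of sound in air:

```python
SAMPLE_RATE = 48_000    # Hz, a common pro-audio rate (assumption)
SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def frames_for_budget(latency_s, rate=SAMPLE_RATE):
    """Largest whole audio buffer (in frames) that fits a latency budget."""
    return int(latency_s * rate)

print(frames_for_budget(0.003))          # 144 frames for a 3 ms budget
print(round(0.003 * SPEED_OF_SOUND, 2))  # ~1.03 m: why 3 ms is roughly 1 m
```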
The usual distinction in the RT world is between hard RT and soft RT; a system is soft RT if it recovers from a missed deadline as soon as it starts meeting deadlines again, and hard RT if further action has to be taken to recover or if recovery from missed deadlines is not possible. In this setup, audio is soft RT - if you miss a deadline, the audio system is in a fault condition, but as soon as you start hitting deadlines again, it recovers automatically; a fly-by-wire system is hard RT, since a failure to meet deadlines can result in the plane needing maintenance work.
Posted Sep 23, 2023 6:17 UTC (Sat)
by donald.buczek (subscriber, #112892)
[Link] (3 responses)
Good example. Your system doesn't need to be in RT mode if the plane is parked on the ground. It may, for example, be in some low-power mode. When you are ready to go, you switch to fly mode, which implies RT on critical systems. It's not critical how long that takes. What you need, though, is an indication that the switch is completed.
Posted Sep 23, 2023 7:23 UTC (Sat)
by mb (subscriber, #50428)
[Link] (1 responses)
But that does *not* mean that every task suddenly becomes a RT task. In fact, in a RT system almost all tasks usually still run in normal scheduling mode.
Normal tasks on a RT kernel basically just behave like tasks on a full-preempt kernel. Just with a little bit more overhead here and there. (I'm simplifying for the sake of briefness)
If your RT application decides to go non-RT, then it can do that now. Just switch scheduling modes.
And if you decide to put your system into sleep, you can do that now.
Switching the kernel preemption mode at runtime is absolutely not necessary to do all that.
Posted Sep 23, 2023 8:27 UTC (Sat)
by mtodorov (guest, #158788)
[Link]
Just as a curiosity: about a year ago, kernel 6.0 compiled with KASAN produced a number of RCU stall warnings.
It was blamed on the KASAN slowdown, but I felt something was odd if some RCU lock was being held for more than 20 seconds, long enough to trigger the NMI watchdog.
After a year, I get the notion that it was due to the coarse granularity of the locks held while doing some tasks. In the worst case, unlucky spinlock contention can degrade your SMP system's performance to that of a multicore 6502 ... Did somebody make a note of that for real-time telephony and heart-monitor systems?
Don't take this as if I am ungrateful for the kernel - I am just not fond of spinlocks, and I wish I had a new parallel programming paradigm and a better grasp of lockless algorithms and RCU ...
Posted Sep 25, 2023 10:23 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
You need to separate out PREEMPT_RT from real-time mode.
PREEMPT_RT configured kernels have slightly higher overheads than normal kernels, especially under full load, in return for a hard guarantee on the latencies that can be measured by processes in real time scheduling classes (at the time of this comment, that's SCHED_FIFO, SCHED_RR and SCHED_DEADLINE). If you don't use real time scheduling classes, then the overhead of PREEMPT_RT is entirely wasted, because online (SCHED_OTHER) and batch (SCHED_BATCH, SCHED_IDLE) scheduling classes do not get lower latencies in a PREEMPT_RT kernel.
Separately, a system is "in real-time mode" if it has one or more processes in runnable state in a real-time scheduling class. You can "switch" in and out of "real-time mode" by either blocking all processes in real-time scheduling classes on a kernel primitive such as a futex, or by removing all processes from real-time scheduling classes.
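The "switching out" farnz describes is just a scheduling change, which can be sketched as follows (Linux-only; demoting yourself to SCHED_OTHER is always permitted, so this runs unprivileged and is a no-op if the process is already SCHED_OTHER):

```python
import os

# "Leaving real-time mode" is just demoting your tasks back to a normal
# scheduling class: no kernel reconfiguration, let alone a reboot.
os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
print(os.sched_getscheduler(0) == os.SCHED_OTHER)  # True
```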
You can be "in real-time mode" with a PREEMPT_NONE or PREEMPT_VOLUNTARY kernel; the issue with this is that the kernel does not provide any guarantee that your process will be scheduled in time to meet its deadlines in this situation, but it's useful for (e.g.) providing musicians in a studio with live monitoring of the recording their DAW is making (complete with applied effects), where the worst consequence of missing a deadline is that you'll need to do another take. You'd prefer not to have to do another take, hence using real time scheduling to ensure that only the kernel can cause you to miss a deadline.
You can also use a PREEMPT_RT kernel with no real-time tasks. In this case, the kernel's latency guarantees are worthless, because you have no tasks that get guaranteed latencies, but you pay a small amount of overhead to allow the kernel to reschedule bits of itself to keep the latency guarantees.
The intended case for PREEMPT_RT kernels is to have real-time tasks, at least some of the time, where the system has been analysed to guarantee that the real-time tasks will always meet their deadlines.
You don't need to change the kernel configuration to switch out of "real-time mode" and enter low power states that are incompatible with your deadlines; you just change your scheduling. You can't switch codepaths between PREEMPT and PREEMPT_RT at runtime, because you need to have the system go completely idle in order to change locking, which is also the pre-requisite for kexec. If you do want to switch kernels, you can kexec between kernels at roughly the same time penalty you'd face if you tried to switch at runtime.