
Controlling realtime priorities in kernel threads

By Jonathan Corbet
April 23, 2020
The realtime scheduler classes are intended to allow a developer to state which tasks have the highest priorities with the assurance that, at any given time, the highest-priority task will have unimpeded access to the CPU. The kernel itself carries out a number of tasks that have tight time constraints, so it is natural to want to assign realtime priorities to kernel threads carrying out those tasks. But, as Peter Zijlstra argues in a new patch set, it makes little sense for the kernel to be assigning such priorities; to put an end to that practice, he is proposing to take away most of the kernel's ability to prioritize its own threads.

In the classic realtime model, there are two scheduling classes: SCHED_FIFO and SCHED_RR. Processes in either class have a simple integer priority. SCHED_FIFO processes run until they voluntarily give up the CPU or are preempted by a higher-priority realtime process, with the highest-priority process going first. SCHED_RR, instead, rotates through all runnable processes at the highest priority level, giving each a fixed time slice. In either class, processes with a lower realtime priority will be completely blocked until all higher-priority processes are blocked, and processes in either class will, regardless of priority level, run ahead of normal, non-realtime work in the SCHED_NORMAL class.
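
As an illustration of the priority model described above, here is a minimal user-space sketch in C that asks the system which priority range each realtime class accepts. The helper names rt_priority_range() and print_rt_ranges() are made up for this example; sched_get_priority_min() and sched_get_priority_max() are the standard POSIX calls. On Linux the range for both classes is 1 to 99, but the code does not assume that.

```c
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: fetch the valid priority range for a scheduling
 * policy. Returns 0 on success, -1 if the system rejects the policy.
 */
static int rt_priority_range(int policy, int *min, int *max)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo < 0 || hi < 0)
        return -1;
    *min = lo;
    *max = hi;
    return 0;
}

/* Hypothetical helper: print the ranges for the two realtime classes. */
static void print_rt_ranges(void)
{
    int min, max;

    if (rt_priority_range(SCHED_FIFO, &min, &max) == 0)
        printf("SCHED_FIFO priorities: %d..%d\n", min, max);
    if (rt_priority_range(SCHED_RR, &min, &max) == 0)
        printf("SCHED_RR priorities:   %d..%d\n", min, max);
}
```

Calling print_rt_ranges() from a small main() is enough to see the ranges on a given system; no special privileges are needed just to query them.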

The kernel pushes a large (and increasing) amount of work out into kernel threads, which are special processes running within the kernel's address space. This is done to allow that work to happen independently of any other thread of execution, under the control of the system scheduler. Most kernel threads run in the SCHED_NORMAL class and must contend with ordinary user-space processes for CPU time. Others, though, are deemed special enough that they should run ahead of user-space work; one way to make that happen is to put those threads into the SCHED_FIFO class.

But then a question arises: which priority should any given thread have? Answering that question requires judging the importance of a given thread relative to all of the other threads running at realtime priority — and relative to any user-space realtime work as well. That is going to be a difficult question to answer, even if the answer turns out to be the same for every system and workload, which seems unlikely. In general, kernel developers don't even try; they just pick something.

Zijlstra believes that this exercise is pointless: "the kernel has no clue what actual priority it should use for various things, so it is useless (or worse, counter productive) to even try". So he has changed the kernel's internal interfaces to take away the ability to run at a specific SCHED_FIFO priority. What remains is a set of three functions:

    void sched_set_fifo(struct task_struct *p);
    void sched_set_fifo_low(struct task_struct *p);
    void sched_set_normal(struct task_struct *p, int nice);

For loadable modules, these become the only functions available for manipulating a thread's scheduling information. All three functions are exported only to modules with GPL-compatible licenses. A call to sched_set_fifo() puts the given process into the SCHED_FIFO class at priority 50 — halfway between the minimum and maximum values. For threads with less pressing requirements, sched_set_fifo_low() sets the priority to the lowest value (one) instead. Calling sched_set_normal() returns the thread to the SCHED_NORMAL class with the given nice value.
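
The effect of the three functions can be modeled in a few lines of ordinary C. This is a user-space sketch of the behavior described above, not the kernel implementation: the model_* names and struct model_sched are invented for illustration, MODEL_MAX_RT_PRIO is assumed to mirror the kernel's MAX_RT_PRIO value of 100, and SCHED_OTHER is the user-space name for the class the kernel calls SCHED_NORMAL.

```c
#include <sched.h>   /* SCHED_FIFO, SCHED_OTHER */

/* Assumption: mirrors the kernel's MAX_RT_PRIO; realtime priorities are 1..99. */
#define MODEL_MAX_RT_PRIO 100

/* Hypothetical record of the scheduling state a helper would leave behind. */
struct model_sched {
    int policy;       /* SCHED_FIFO or SCHED_OTHER */
    int rt_priority;  /* realtime priority, 0 for non-realtime tasks */
    int nice;         /* nice value, only meaningful for SCHED_OTHER */
};

/* Models sched_set_fifo(): SCHED_FIFO at the midpoint priority (50). */
static struct model_sched model_sched_set_fifo(void)
{
    struct model_sched s = { SCHED_FIFO, MODEL_MAX_RT_PRIO / 2, 0 };
    return s;
}

/* Models sched_set_fifo_low(): SCHED_FIFO at the lowest priority (1). */
static struct model_sched model_sched_set_fifo_low(void)
{
    struct model_sched s = { SCHED_FIFO, 1, 0 };
    return s;
}

/* Models sched_set_normal(): back to the normal class with a nice value. */
static struct model_sched model_sched_set_normal(int nice)
{
    struct model_sched s = { SCHED_OTHER, 0, nice };
    return s;
}
```

The point of the model is that a caller never supplies a realtime priority; the only number a caller can still choose is the nice value for non-realtime threads.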

The bulk of the patch set consists of changes to specific subsystems to make them use the new API; it gives a picture of how current kernels are handling SCHED_FIFO threads. Here's what turns up:

Subsystem        Priority  Description
Arm bL switcher  1         The Arm big.LITTLE switcher thread
crypto           50        Crypto engine worker thread
ACPI             1         ACPI processor aggregator driver
drbd             2         Distributed, replicated block device request handling
PSCI checker     99        PSCI firmware hotplug/suspend functionality checker
msm              16        MSM GPU driver
DRM              1         Direct rendering request scheduler
ivtv             99        Conexant cx23416/cx23415 MPEG encoder/decoder driver
mmc              1         MultiMediaCard drivers
cros_ec_spi      50        ChromeOS embedded controller SPI driver
powercap         50        "Powercap" idle-injection driver
powerclamp       50        Intel powerclamp thermal management subsystem
sc16is7xx        50        NXP SC16IS7xx serial port driver
watchdog         99        Watchdog timer driver subsystem
irq              50        Threaded interrupt handling
locktorture      99        Locking torture-testing module
rcuperf          1         Read-copy-update performance tester
rcutorture       1         Read-copy-update torture tester
sched/psi        1         Pressure-stall information data gathering

As one can see, there is indeed a fair amount of variety in the priority values chosen by kernel developers for their threads. Additionally, the drbd driver was using the SCHED_RR class for reasons that weren't entirely clear. After Zijlstra's patch set is applied, all of the subsystems using a priority of one have been converted to use sched_set_fifo_low(), while the rest use sched_set_fifo(), giving them all a priority of 50.
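
The conversion rule just described is simple enough to check mechanically. The sketch below is plain user-space C, not code from the patch set; it copies the priorities from the table above and applies the rule "one stays one, everything else becomes 50". The names struct kthread_prio, converted_priority(), and count_converted_to() are hypothetical.

```c
#include <stddef.h>

/* One row of the article's table: subsystem name and its old priority. */
struct kthread_prio {
    const char *subsystem;
    int old_priority;
};

static const struct kthread_prio table[] = {
    { "Arm bL switcher", 1 },  { "crypto", 50 },      { "ACPI", 1 },
    { "drbd", 2 },             { "PSCI checker", 99 }, { "msm", 16 },
    { "DRM", 1 },              { "ivtv", 99 },        { "mmc", 1 },
    { "cros_ec_spi", 50 },     { "powercap", 50 },    { "powerclamp", 50 },
    { "sc16is7xx", 50 },       { "watchdog", 99 },    { "irq", 50 },
    { "locktorture", 99 },     { "rcuperf", 1 },      { "rcutorture", 1 },
    { "sched/psi", 1 },
};

/* Rule from the article: priority 1 maps to sched_set_fifo_low() (1),
 * every other value maps to sched_set_fifo() (50). */
static int converted_priority(int old_priority)
{
    return old_priority == 1 ? 1 : 50;
}

/* Count how many table entries end up at the given priority. */
static size_t count_converted_to(int target)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (converted_priority(table[i].old_priority) == target)
            n++;
    return n;
}
```

Applied to the table, seven subsystems stay at priority one and the other twelve land on 50, including those that previously asked for 2, 16, or 99.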

There have been responses to a number of the patches thus far, mostly offering Reviewed-by tags or similar. It seems that few, if any, kernel developers are strongly attached to the SCHED_FIFO priority values that they chose when they had to come up with a number to put into that structure field. It is thus unlikely that there is going to be any sort of serious opposition to this patch set going in.

The end result is not limited to a rationalization of SCHED_FIFO values inside the kernel, though. One of the objections Zijlstra raises about SCHED_FIFO in general is that, even if a developer is able to choose perfect priority values for their workload, all that work goes by the wayside if that workload has to be combined with another, which will have its own set of priority values. The chances of those two sets of values combining into a coherent whole are relatively small.

In current kernels, every realtime workload using SCHED_FIFO faces this problem, since the priority choices made for that workload have to be combined with the choices made for kernel threads — choices that have not really been thought through and which are not documented anywhere. Making the kernel's configuration for SCHED_FIFO priorities predictable should make life easier for realtime system designers, who are unlikely to mind having fewer variables to worry about.

Index entries for this article
Kernel: Modules/Exported symbols
Kernel: Realtime
Kernel: Scheduler/Realtime



Controlling realtime priorities in kernel threads

Posted Apr 26, 2020 20:16 UTC (Sun) by pfmoldau (subscriber, #124842) [Link] (2 responses)

From a realtime system developer's point of view, I would favour an interface where I (as an embedded system developer) could control the selected priorities via some kind of runtime option.
I.e. currently all IRQ threads are at prio 50, but I'd like to create a rule set that changes *one specific* interrupt to prio 60 to make it more important than the others...

Controlling realtime priorities in kernel threads

Posted Apr 28, 2020 7:37 UTC (Tue) by Villemoes (subscriber, #91911) [Link]

Indeed, and that was what I was hoping was coming when I read the title of the article, so I was a bit disappointed when it was more "standardize on one or two specific values".

I sent an RFC about a year ago in the hope that someone would say "no, that's not how the API should be, _this_ is how it should look". https://lore.kernel.org/lkml/20190516144937.20101-1-linux... in case anyone is interested.

Controlling realtime priorities in kernel threads

Posted Apr 29, 2020 14:25 UTC (Wed) by torbenh (guest, #76968) [Link]

You can change the priorities at runtime using chrt(1).

Have a look at https://github.com/rncbc/rtirq for an init script.
It's already packaged for Debian, at least. https://packages.debian.org/buster/rtirq-init
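
For readers who would rather do this from a program than from chrt(1), the following is a rough C sketch of the same operation, roughly what "chrt -f -p 60 <pid>" does. The function name set_fifo_priority() is hypothetical; sched_setscheduler() is the standard call, and changing another task's realtime priority normally requires root or CAP_SYS_NICE. The range check up front is an addition for this example, so that bad input is rejected before the system call is attempted.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: move the task identified by pid (0 means the
 * caller) into SCHED_FIFO at the given priority.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_fifo_priority(pid_t pid, int priority)
{
    struct sched_param param = { .sched_priority = priority };

    /* Reject values outside the range the system reports for SCHED_FIFO. */
    if (priority < sched_get_priority_min(SCHED_FIFO) ||
        priority > sched_get_priority_max(SCHED_FIFO))
        return -EINVAL;

    if (sched_setscheduler(pid, SCHED_FIFO, &param) == -1)
        return -errno;  /* typically EPERM without sufficient privilege */

    return 0;
}
```

Given the PID of a threaded interrupt handler, set_fifo_priority(pid, 60) would raise that one handler above the others sitting at 50, which is the kind of per-interrupt tuning asked for earlier in this thread.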


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds