
Migration disable for the mainline

By Jonathan Corbet
November 9, 2020
The realtime developers have been working for many years to create a kernel where the highest-priority task is always able to run without delay. That has meant a long process of finding and fixing situations where high-priority tasks might be blocked from running; one of the persistent problems in this regard has been kernel code that disables preemption. One tool that the realtime developers have reached for is disabling migration (moving a process from one CPU to another) rather than preemption; this approach has not been entirely popular among scheduler developers, though. Even so, the solution would appear to be this migration-disable patch set from scheduler developer Peter Zijlstra.

One of the key scalability techniques used in the kernel is per-CPU data. System-wide locking is an effective way of protecting shared data, but it can kill performance in a number of ways, even if a given lock is itself not heavily contested. Any data structure that is only accessed by a single CPU does not need to be protected by system-wide locks, avoiding this problem. Thus, for example, the memory allocators maintain per-CPU lists of available memory that can be handed out without interference from the other CPUs on the system. But kernel code can only safely manipulate per-CPU data if it has exclusive access to the CPU; if some other process is able to jump in, it could find (or create) inconsistent per-CPU data structures. The normal way to prevent this from happening is to disable preemption when necessary; it is a cheap operation (setting a flag, essentially) that ensures that a given task will not be interrupted until its work is done.
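
To make the hazard concrete, here is a toy user-space model in Python. It is not kernel code, and the names `increment_steps`, `run_interleaved`, and `run_exclusive` are invented for this sketch. Each task performs a read-modify-write on its CPU's counter in two steps. Letting a second task jump in between the steps loses an update, while running each task to completion (which is what disabling preemption guarantees) does not.

```python
"""Toy model of a per-CPU counter update; not kernel code."""


def increment_steps(counters, cpu):
    """A read-modify-write on counters[cpu], split into two steps.

    The generator yields between the read and the write, marking the
    point at which the task could be preempted.
    """
    value = counters[cpu]       # step 1: read the per-CPU value
    yield                       # possible preemption point
    counters[cpu] = value + 1   # step 2: write back the new value


def run_interleaved(counters, cpu):
    """Two tasks on the same CPU, with preemption between the steps."""
    a = increment_steps(counters, cpu)
    b = increment_steps(counters, cpu)
    next(a)                     # task A reads, then is preempted
    next(b)                     # task B reads the same stale value
    for task in (a, b):
        for _ in task:          # let each task finish its write
            pass


def run_exclusive(counters, cpu):
    """Two tasks on the same CPU, each running to completion."""
    for _ in range(2):
        for _ in increment_steps(counters, cpu):
            pass
```

Starting from a counter of zero on CPU 0, `run_interleaved()` leaves it at 1 (one increment is lost) while `run_exclusive()` leaves it at 2. Pinning a task to its CPU keeps it pointed at the right counter, but does not by itself rule out the interleaving, which is where the CPU-local locking mentioned later in the article comes in.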

Disabling preemption runs afoul of the goals of the realtime developers, who have put so much work into ensuring that any given task can be interrupted if a higher-priority task needs the CPU. As they have worked to remove preemption-disabled regions, they have observed that, often, all that is really needed is to keep tasks from being moved between CPUs while they are accessing per-CPU data, with perhaps some (normally CPU-local) locking as well. See, for example, the kmap_local() work. Disabling migration still allows a process to be preempted, so it does not interfere with the goals of the realtime project — or so those developers hope.

Disabling migration brings problems of its own, though. The kernel's CPU scheduler is tasked with making the best use of all of the CPUs in the system. If there are N CPUs available, they should be running the N highest-priority tasks at any given time. That goal cannot be achieved without occasionally moving tasks between CPUs; it would be nice if tasks just happened to land on the right processors every time, but the real world is not like that. Depriving the scheduler of the ability to migrate tasks, even for brief periods, thus takes away a tool that is crucial for the overall behavior and throughput of the system.

As a simple example of what can happen, consider a system with two CPUs and two tasks, of which only the lower-priority task is runnable. That task enters a migration-disabled section at the same time that the high-priority task becomes runnable on the same CPU. The low-priority task will be duly preempted so that the high-priority task can run. That low-priority task still needs CPU time, though, and meanwhile the other CPU is sitting idle. Normally the scheduler would just migrate the low-priority task over to the idle CPU and allow it to continue but, since that task has disabled migration, it remains stuck and unable to run. Migration disable thus differs from preemption disable, which does not risk creating stuck processes in this way.
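
The two-CPU scenario can be played out in a few lines of Python. This is a deliberately simplified model, not the kernel's scheduler: the `Task` class, the `schedule()` function, and the convention that a larger `prio` means higher priority are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prio: int                         # larger number = higher priority (sketch convention)
    cpu: int                          # CPU whose run queue holds the task
    migration_disabled: bool = False


def schedule(tasks, nr_cpus):
    """Return {cpu: name of the running task, or None if idle}.

    Each CPU runs the highest-priority task on its queue. A preempted
    task is moved to an idle CPU only if it has not disabled migration.
    """
    queues = {cpu: [] for cpu in range(nr_cpus)}
    for task in tasks:
        queues[task.cpu].append(task)
    for queue in queues.values():
        queue.sort(key=lambda t: t.prio, reverse=True)

    # Idle CPUs try to pull a waiting (preempted) task from another queue.
    for cpu, queue in queues.items():
        if queue:
            continue
        for other in queues.values():
            movable = [t for t in other[1:] if not t.migration_disabled]
            if movable:
                other.remove(movable[0])
                movable[0].cpu = cpu
                queue.append(movable[0])
                break

    return {cpu: (q[0].name if q else None) for cpu, q in queues.items()}
```

With a high-priority task and a migration-disabled low-priority task both on CPU 0, `schedule()` reports CPU 1 as idle even though the low-priority task is waiting; clear the flag and the same call puts that task on CPU 1.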

So it is not entirely surprising that the migration-disable capability has not been greeted with open arms by mainline scheduler developers. Those same developers, though (and Zijlstra in particular) understand what is driving this work. So, when Thomas Gleixner posted a migration-disable patch set in September, Zijlstra declined to apply it, but he also went to work to create an alternative that would be acceptable from a scheduling point of view — on realtime kernels, at least.

The patch adding the core machinery makes it clear in a leading comment that the migration disable feature is "(strongly) undesired". It goes on:

This is a 'temporary' work-around at best. The correct solution is getting rid of the above assumptions and reworking the code to employ explicit per-cpu locking or short preempt-disable regions.

The end goal must be to get rid of migrate_disable(), alternatively we need a schedulability theory that does not depend on arbitrary migration.

There are a couple of particularly tricky areas when it comes to making migration disable work properly. One of those, naturally, is CPU hotplug, which has already shown itself to be a difficult area in the past. If a CPU is to be removed from the system, one should first migrate all running processes elsewhere to avoid the even trickier problem of irate users. But if some of those processes have disabled migration, that cannot be immediately done. So the hotplug mechanism had to gain a count of how many tasks in each run queue have disabled migration, and to wait until that number drops to zero.
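
The bookkeeping described here amounts to a per-run-queue counter plus a way to wait for it to reach zero. The following Python sketch models that idea in user space with a condition variable; the `RunQueue` class and its method names are hypothetical, and the kernel's actual hotplug code is considerably more involved.

```python
import threading


class RunQueue:
    """Toy run queue that counts tasks with migration disabled."""

    def __init__(self):
        self._cond = threading.Condition()
        self.nr_pinned = 0            # tasks that currently cannot be migrated

    def migrate_disable(self):
        with self._cond:
            self.nr_pinned += 1

    def migrate_enable(self):
        with self._cond:
            if self.nr_pinned == 0:
                raise RuntimeError("unbalanced migrate_enable()")
            self.nr_pinned -= 1
            if self.nr_pinned == 0:
                # Wake anybody waiting to take this CPU offline.
                self._cond.notify_all()

    def wait_for_unpinned(self, timeout=None):
        """Block until no task is pinned; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self.nr_pinned == 0, timeout)
```

In this model, the code taking a CPU offline would call `wait_for_unpinned()` before moving the remaining tasks elsewhere; the call returns only once every task that disabled migration on that queue has enabled it again.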

Then, there is the issue of blocked tasks described above: there may be a CPU available to run a lower-priority task that has been preempted, but the disabling of migration prevents the task from moving to that available CPU. In a truly pathological situation, several preempted tasks could end up stacked on a CPU and unable to migrate while most of the system remains idle. This sort of violation of work conservation does not improve the mood of scheduler developers — and they already have a reputation for grumpiness.

The approach taken to this problem is not a perfect solution (which may not exist), but hopefully it helps. If a CPU's run queue contains a task that is runnable, but which has been preempted by a higher-priority task, the normal response would be to try to migrate the preempted task elsewhere. If migration has been disabled, that cannot happen, obviously. So the scheduler will try, instead, to migrate the running, higher-priority task to get it out of the way. That is not ideal; migration has its costs, including the potential loss of cache locality, that will now be paid by the higher-priority task. Or, as Zijlstra put it: "This adds migration interference to the higher priority task, but restores bandwidth to system that would otherwise be irrevocably lost".
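
The decision described above can be sketched as a small function. This is an illustrative Python model only, assuming a busy queue sorted by priority whose first entry is the running task; `Task`, `rebalance()`, and the queue representation are invented for the example and do not mirror the kernel's data structures.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prio: int                         # larger number = higher priority (sketch convention)
    migration_disabled: bool = False


def rebalance(busy_queue, idle_queue):
    """Move one task from busy_queue to an empty idle_queue.

    busy_queue[0] is the running task; the rest have been preempted.
    Returns the name of the task that was moved, or None.
    """
    if idle_queue or len(busy_queue) < 2:
        return None                   # nothing waiting, or the target is not idle

    running, waiting = busy_queue[0], busy_queue[1:]

    # Normal response: migrate a preempted task to the idle CPU.
    for task in waiting:
        if not task.migration_disabled:
            busy_queue.remove(task)
            idle_queue.append(task)
            return task.name

    # Every preempted task is pinned: push the running task away instead,
    # so that the pinned task can have the CPU it is stuck on.
    if not running.migration_disabled:
        busy_queue.pop(0)
        idle_queue.append(running)
        return running.name

    return None
```

When the preempted task has disabled migration, `rebalance()` moves the running, higher-priority task to the idle queue; that task pays the migration cost, but both CPUs end up doing work.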

Finally, it's worth pointing out that migration disable will be limited to kernels configured for realtime operation. On everything else, a call to migrate_disable() will disable preemption, as is done now. So behavior for most users will not change, at least not directly. But this is another important step toward getting the realtime preemption patches fully migrated into the mainline after all these years.

Index entries for this article
Kernel: Realtime
Kernel: Scheduler



Migration disable for the mainline

Posted Nov 9, 2020 22:33 UTC (Mon) by tglx (subscriber, #31301) [Link] (2 responses)

> Finally, it's worth pointing out that migration disable will be limited to kernels configured for realtime operation.

That's not entirely correct. Since the scheduler people solved the problem (at least to the extent it is solvable today) there is no real reason anymore to make this an RT only functionality.

The proposed kmap_local() facility depends on the general availability of migrate_disable(). See: https://lwn.net/Articles/836144/ and especially the patch in this series which lifts that restriction: https://lore.kernel.org/r/20201103095858.928160966@linutr...

There are other valid reasons to expose the ability to disable migration without disabling preemption independent of RT.

Thanks,

tglx

Migration disable for the mainline

Posted Nov 9, 2020 22:36 UTC (Mon) by corbet (editor, #1) [Link] (1 responses)

True...I got buried in the current patch set and didn't think beyond it when I wrote that text...even after having linked to the article that contradicted it. I blame election distraction, I think that can properly excuse almost any mistake made in the last week.

Election distraction (was: Migration disable for the mainline)

Posted Nov 10, 2020 9:13 UTC (Tue) by tekNico (subscriber, #22) [Link]

> I blame election distraction, I think that can properly excuse almost any mistake made in the last week.

Nice one. :-)

Migration disable for the mainline

Posted Nov 9, 2020 23:31 UTC (Mon) by dxin (guest, #136611) [Link] (7 responses)

I'm surprised that scheduler people actually care about throughput in the RT case. I wonder if RT people themselves care nearly as much.

BTW what's the corporate force driving RT now, Android graphics stack?

Migration disable for the mainline

Posted Nov 10, 2020 4:40 UTC (Tue) by alison (subscriber, #63752) [Link]

> what's the corporate force driving RT now

Wall St. and robots (two different items!).

Migration disable for the mainline

Posted Nov 10, 2020 14:53 UTC (Tue) by Paf (subscriber, #91811) [Link]

I think most everyone cares about throughput, eventually. If your throughput isn't high enough, you will eventually slip your real-time commitments by not being able to get the work done, regardless of how you schedule it. And if that's not a problem in your setting, then you could probably use cheaper hardware (i.e., you've got hardware capacity to waste) until, oops, it's a problem again.

Migration disable for the mainline

Posted Nov 10, 2020 15:10 UTC (Tue) by pbonzini (subscriber, #60935) [Link] (3 responses)

Throughput is important if this functionality ever ends up being used also by non-RT kernels.

Migration disable for the mainline

Posted Nov 10, 2020 18:21 UTC (Tue) by Wol (subscriber, #4433) [Link] (2 responses)

I would have thought this would IMPROVE throughput, by locking a thread to a CPU it gets rid of all the switching overhead ...

(Yes I know that's a simplistic view, but often simplistic is good enough ... until it isn't :-)

Cheers,
Wol

Migration disable for the mainline

Posted Nov 10, 2020 21:15 UTC (Tue) by edeloget (subscriber, #88392) [Link] (1 responses)

> I would have thought this would IMPROVE throughput, by locking a thread to a CPU it gets rid of all the switching overhead ...

The goal of the RT kernel is to improve latency, and this might be detrimental to throughput. You want your tasks to run on time with good enough throughput, which is not the same as running with the highest possible throughput while being late from time to time.

You already have the ability to lock a thread on a particular logical CPU through the use of pthread_setaffinity_np(3).

Migration disable for the mainline

Posted Nov 12, 2020 16:34 UTC (Thu) by Wol (subscriber, #4433) [Link]

As jschrod said, the goal of RT is *deterministic* latency, not improved latency. If I wrote a patch that halved average latency, but doubled the maximum latency, the RT guys would laugh it out of court.

If, however, I wrote a patch that made sure I could set a maximum latency of six months, and be confident it would be honoured, they'd be quite happy with that!

Cheers,
Wol

Migration disable for the mainline

Posted Nov 11, 2020 10:15 UTC (Wed) by squeed (subscriber, #87316) [Link]

> BTW what's the corporate force driving RT now?

Telcos as well. 5G brings some pretty strict timing requirements. And, like everyone else, they are trying to get off of special-purpose hardware.

Migration disable for the mainline

Posted Nov 10, 2020 20:38 UTC (Tue) by tnemeth (guest, #37648) [Link] (2 responses)

> If a CPU's run queue contains a task that is runnable, but which has been preempted by a higher-priority task, the normal response would be to try to migrate the preempted task elsewhere. If migration has been disabled, that cannot happen, obviously. So the scheduler will try, instead, to migrate the running, higher-priority task to get it out of the way.

If, when a higher-priority task is being woken up, it will preempt a per-CPU data holding task, wouldn't it be faster to wake it up directly on another CPU rather than on the same and trying to move it afterward ?

Migration disable for the mainline

Posted Nov 10, 2020 23:51 UTC (Tue) by willy (subscriber, #9762) [Link]

Not necessarily; you usually want to wake up a task on the CPU it last ran on, because it may have data still in the CPU cache.

Migration disable for the mainline

Posted Nov 11, 2020 18:11 UTC (Wed) by matthias (subscriber, #94967) [Link]

> If, when a higher-priority task is being woken up, it will preempt a per-CPU data holding task, wouldn't it be faster to wake it up directly on another CPU rather than on the same and trying to move it afterward ?

The other CPU might only become available while the higher-priority task is already running and already has preempted the per-CPU data holding tasks.

Migration disable for the mainline

Posted Nov 11, 2020 20:36 UTC (Wed) by RogerOdle (guest, #60791) [Link] (4 responses)

Why isn't shielding a CPU core from the scheduler and putting the RT process on that core sufficient? High-speed, latency-sensitive processes are usually very tight loops that are deliberately made simple. It is unusual to break these processes into multiple threads, as that affects deterministic processing. They may run as a polling loop with no scheduler at all in order to eliminate the latency hit from interrupts.

This seems to be a solution for some middle-ground problem where low(er) latency is desired but the highest possible performance isn't. I can't think of a particular use of this that would suit my needs better than assigning cores does now.

Migration disable for the mainline

Posted Nov 12, 2020 15:54 UTC (Thu) by jschrod (subscriber, #1646) [Link]

The goal of RT is deterministic latency, not lower latency.

Enabling preemption often yields lower latency, but that's a side effect.

Migration disable for the mainline

Posted Nov 24, 2020 10:49 UTC (Tue) by roblucid (guest, #48964) [Link] (2 responses)

There was a deadline scheduler. It would be a fair guarantee to run such latency-sensitive RT tasks on a deadline core once they specify their requirements; the CPU-local code can run in unallocated time, so long as these tasks can only demand a fractional utilisation.
But I suspect that many of these RT systems are also embedded and they want to cheap out when it comes to cores, rather than pay for the deterministic behaviour with spare compute.
That's the reason they want absolute priority over even critical OS tasks, attempting to export their problem for solution in other people's code.

Migration disable for the mainline

Posted Nov 24, 2020 18:47 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

> But I suspect that many of these RT systems are also embedded and they want to cheap out

Hang on a cotton-pickin' minute ... !!!

The WHOLE POINT of an embedded system is to run the APPLICATION. And if the OS can't or won't get out of the way, then it's the OS's fault!

Why should a manufacturer (and hence the customer) have to pay extra for a super-duper system to run software they don't even care about!

Cheers,
Wol

Migration disable for the mainline

Posted Jan 12, 2021 19:23 UTC (Tue) by immibis (subscriber, #105511) [Link]

They don't, and yet they chose to...


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds