LWN: Comments on "Completing the EEVDF scheduler" https://lwn.net/Articles/969062/ This is a special feed containing comments posted to the individual LWN article titled "Completing the EEVDF scheduler". en-us Sat, 11 Oct 2025 01:57:29 +0000 Sat, 11 Oct 2025 01:57:29 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Memory or heat lag? https://lwn.net/Articles/977642/ https://lwn.net/Articles/977642/ fest3er <div class="FormattedComment"> Agreed. Inadequate heat rejection destroys lots of computers and other electronic equipment.<br> <p> It's a desktop with generally adequate cooling (an Antec case I bought for the Phenom-II I had years ago, and standard, non-exotic cooling).<br> <p> I might try a Linux build on my Zenbook Ryzen 9 6900H laptop for comparison. And the same build on my desktop. Limit each to 12 concurrent tasks (-j 12). They should be about the same (desktop: better cooling but slower NVMe; laptop: faster CPU/RAM/NVMe but maybe more CPU heat throttling), and both happy to run at much higher temps than the Vishera liked.<br> </div> Sat, 08 Jun 2024 05:14:46 +0000 Memory or heat lag? https://lwn.net/Articles/971619/ https://lwn.net/Articles/971619/ raven667 <div class="FormattedComment"> Was that on a laptop or desktop with better cooling? The raw CPU/GPU speed on portable computers is often wasted because they have no way to reject enough heat in that volume to actually run at full speed for anything other than _very_ short bursts. I've seen my i9/64GB laptop throttle down to ~700MHz if it gets too busy/hot, which introduces a ton of latency to everything.<br> </div> Tue, 30 Apr 2024 04:02:41 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/971614/ https://lwn.net/Articles/971614/ fest3er <div class="FormattedComment"> I had a Vishera 8350 (IIRC) with 16GiB RAM. The entire build of Smoothwall Express used about 13GiB RAM (cache).
Back when parallel compile worked, the Linux build would use all eight (semi-)cores and finish the build in about five minutes. I presently have a Ryzen 9 3950X 16-core CPU with 32GiB RAM. Even building in a QEMU VM on the Ryzen was as fast as running on the host. While I wasn't using the UI extensively during compiles, I never noticed any lag in response time. In fact, going back even further (to, oh, 2003 or so), once I switched to multi-core CPUs, UI and user response lag all but disappeared. (Except when I was using early-ish EXT4; flushing dirty pages to disk did cause the UI and computer to 'freeze' periodically.) In addition, I found that the runtime difference between using a pre-loaded RAM disk (with cache emptied) and letting Linux cache the disk as needed was essentially the time needed to initially read the data from the HD. Hard drives were pretty much Hitachi/Toshiba 1TB SATA drives. Even the Phenom-II-965 I had before the Vishera was well behaved.<br> <p> Is it possible that memory pressure is the source of that UI lag (such as parts of the UI being swapped/paged out)?<br> </div> Tue, 30 Apr 2024 03:20:44 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/970632/ https://lwn.net/Articles/970632/ iq-0 <div class="FormattedComment"> So next we're gonna have a workload monitoring job that analyses the runtime behavior of running threads and adjusts their timeslices accordingly? Possibly by running intermittently for short periods to sample the then-active tasks.<br> </div> Sun, 21 Apr 2024 10:13:23 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969784/ https://lwn.net/Articles/969784/ shironeko <div class="FormattedComment"> I'm not sure I understood what you are describing, are you describing a way to maintain the "net lag=0" invariant?
All I said was that invariant seems to be important for the algorithm.<br> </div> Sun, 14 Apr 2024 20:12:44 +0000 The [19]Earliest Virtual Deadline First (EEVDF) scheduler https://lwn.net/Articles/969741/ https://lwn.net/Articles/969741/ mirabilos <div class="FormattedComment"> WTF says EEVDF stands for earliest eligible virtual deadline first, FWIW.<br> </div> Sat, 13 Apr 2024 21:37:40 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969708/ https://lwn.net/Articles/969708/ heftig <div class="FormattedComment"> I have an Intel laptop with an i7-1260P (4P + 8E cores) and the UI becomes very laggy when a compilation utilizes all available CPU time. This is with the compositor, apps and the compilation all in different CPU cgroups.<br> <p> The slowdown may be coming from the thermal limit slowing down the entire processor, plus desktop apps getting scheduled to E cores.<br> <p> I also have an AMD laptop (7840U, 8 cores) with almost the same software where compilation doesn't affect the UI nearly as much, even when the system hits its thermal limit.<br> </div> Sat, 13 Apr 2024 15:11:04 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969625/ https://lwn.net/Articles/969625/ flussence <div class="FormattedComment"> DOI:10.1145/2901318.2901326 is the most well-known one... that paper provoked *some* fixes, but CFS was still exhibiting the same symptoms up until last year.<br> </div> Fri, 12 Apr 2024 14:48:43 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969547/ https://lwn.net/Articles/969547/ willy <div class="FormattedComment"> I suspect if you actually delve into what's going on with profiles, you'll find that you're trying to swap something in, so you're actually waiting on I/O. Having an SSD will be your best improvement, but an I/O scheduler might be able to help you.
A CPU scheduler can't do much to help you.<br> </div> Fri, 12 Apr 2024 02:24:33 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969549/ https://lwn.net/Articles/969549/ Wol <div class="FormattedComment"> If the UI needs to read from disk? That's a likely cause of the slowdown.<br> <p> And on a stack like mine (dm-integrity, raid, lvm) disk i/o can also chew up cpu ...<br> <p> Cheers,<br> Wol<br> </div> Fri, 12 Apr 2024 01:22:44 +0000 Lag inheritance? https://lwn.net/Articles/969544/ https://lwn.net/Articles/969544/ glenn <div class="FormattedComment"> EEVDF resembles plain old EDF (SCHED_DEADLINE) pretty well. I wonder if "lag inheritance" (analogous to priority/deadline inheritance) would improve the responsiveness of blocked higher-priority EEVDF threads in a meaningful way.<br> <p> I have certainly run into priority-inversion-like problems when SCHED_IDLE and SCHED_OTHER threads contend for a mutex. A SCHED_IDLE thread grabs a mutex, and then SCHED_OTHER threads blocked on the mutex experience a classic priority inversion problem: The mutex-holding SCHED_IDLE thread keeps getting preempted because of its SCHED_IDLE policy. The scheduler doesn't know that it's impeding the progress of blocked SCHED_OTHER threads.<br> </div> Fri, 12 Apr 2024 01:04:19 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969540/ https://lwn.net/Articles/969540/ shuhao <div class="FormattedComment"> Do you have something that I can read to learn more about the problems?<br> </div> Thu, 11 Apr 2024 23:50:04 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969539/ https://lwn.net/Articles/969539/ atai <div class="FormattedComment"> The Linux kernel has a behavior familiar to developers: when the machine is busy compiling/building, even if the number of cores used is below the number of cores of the CPU, the machine UI may temporarily freeze when there is heavy IO going on.
Can any scheduler keep the UI responsive in this condition, regardless of the heavy load from a running build/compile? <br> </div> Thu, 11 Apr 2024 23:19:13 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969537/ https://lwn.net/Articles/969537/ Wol <div class="FormattedComment"> Dunno. See my other comment.<br> <p> As each task is rescheduled (be it completed or timeslice expired) you sum lag across all tasks that are not "sleeping with +ve lag", and if that is -ve, you cancel that negativity by creating and sharing the matching positivity across those tasks ...<br> <p> So you can't get into an "all jobs have -ve lag" state.<br> <p> The only problem is it's possible you'd have all tasks with matching 0 lag, but that is rather unlikely ...<br> <p> Cheers,<br> Wol<br> </div> Thu, 11 Apr 2024 22:19:45 +0000 Zero total lag https://lwn.net/Articles/969535/ https://lwn.net/Articles/969535/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; The EEVDF paper describes some elaborate schemes for distributing that lag across the remaining tasks, but I'm not sure the Linux scheduler does that. </span><br> <p> Surely the easy way to do that is, as each task exits its timeslice and is allocated -ve lag, that same (+ve) lag is shared out amongst the other tasks. Any sleeping tasks with -ve lag less than their share simply get the -ve lag wiped into it and the share is recalculated.
And if the running task exits with +ve lag, that lag is suspended and then merged with the -ve lag accumulated by the next task to run before being shared.<br> <p> So basically, every time a timeslice expires, that's when everything adds up to 0.<br> <p> Cheers,<br> Wol<br> </div> Thu, 11 Apr 2024 22:10:21 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969514/ https://lwn.net/Articles/969514/ flussence <div class="FormattedComment"> I've spent almost two decades whining about the Linux scheduler to anyone who'd listen, so take this in that context:<br> <p> EEVDF has been great. MPI/thread-heavy workloads just work and I've felt no temptation to go back to third-party patchsets.<br> </div> Thu, 11 Apr 2024 16:37:03 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969508/ https://lwn.net/Articles/969508/ shironeko <div class="FormattedComment"> I think it is a critical condition for the scheme to work well; otherwise too many or too few jobs are marked eligible, depending on whether the net is negative or positive. In the worst case all jobs can have a negative lag and be ineligible, and that is going to be a bad time.<br> </div> Thu, 11 Apr 2024 15:44:59 +0000 Zero total lag https://lwn.net/Articles/969506/ https://lwn.net/Articles/969506/ corbet Yes, I believe the "sum of all lag is zero" property is intentional. The changes described in the article would tend to conserve that property. That said, there are complications when, for example, a task with a non-zero lag simply exits. The EEVDF paper describes some elaborate schemes for distributing that lag across the remaining tasks, but I'm not sure the Linux scheduler does that.
Thu, 11 Apr 2024 15:26:14 +0000 Completing the EEVDF scheduler https://lwn.net/Articles/969504/ https://lwn.net/Articles/969504/ brchrisman <div class="FormattedComment"> "One property of the EEVDF scheduler that can be seen in the above tables is that the sum of all the lag values in the system is always zero."<br> <p> This is mentioned a little offhand, but is this intentional? And is it preserved with respect to the handling of processes with perhaps long sleep times mentioned later in the article? It sounds like the chosen strategy of letting sleeping processes accumulate lag until it's positive would maintain conservation of lag, whereas the decay-type possibilities discussed earlier in the article would potentially violate it without specific work to maintain that conservation.<br> </div> Thu, 11 Apr 2024 15:00:19 +0000
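The "sum of all lag is zero" bookkeeping debated in these comments can be illustrated with a toy model. This is only a sketch, not the kernel's actual implementation: the task names, weights, and slice length are made up. The idea it demonstrates is the one from the article: over any interval, every runnable task is credited its weighted fair share of that interval, while the task that actually ran is debited the full slice, so the lags always sum to zero and a task that ran ahead of its share becomes ineligible.

```python
# Toy model of EEVDF-style lag accounting (illustrative only).
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    weight: int
    lag: float = 0.0  # +ve: owed CPU time; -ve: ran ahead of its share

def run_slice(tasks, running, slice_ns):
    """Charge slice_ns of CPU time to `running`; credit every task
    its weighted fair share of that same interval."""
    total_weight = sum(t.weight for t in tasks)
    for t in tasks:
        t.lag += slice_ns * t.weight / total_weight  # fair-share credit
    running.lag -= slice_ns                          # actual-usage debit

def eligible(tasks):
    # Only tasks that have not over-consumed their share may be picked.
    return [t for t in tasks if t.lag >= 0]

tasks = [Task("ui", 1), Task("build", 1), Task("daemon", 1)]
run_slice(tasks, tasks[1], 3_000_000)  # "build" runs a 3ms slice
run_slice(tasks, tasks[1], 3_000_000)  # ... and another

# The invariant holds after every scheduling decision:
assert abs(sum(t.lag for t in tasks)) < 1e-6
print([t.name for t in eligible(tasks)])  # ['ui', 'daemon']
```

Note how "build" ends up with negative lag and drops out of the eligible set, which is the mechanism shironeko's comment worries about: if the lags did not sum to zero, the eligible set could become too large, too small, or (in the worst case) empty.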