An EEVDF CPU scheduler for Linux
Posted Mar 9, 2023 19:04 UTC (Thu)
by geofft (subscriber, #59789)
In reply to: An EEVDF CPU scheduler for Linux by dullfire
Parent article: An EEVDF CPU scheduler for Linux
Unfortunately, the CFS quota mechanism tends to result in a lot of weird runtime behavior. The high-level problem, I think, is that you can use a lot of CPU right after being scheduled without the scheduler stopping you, especially in a multi-threaded process. Then, once you get descheduled, the scheduler realizes that you're so deeply in the red on quota that it won't reschedule you for seconds or even minutes afterwards, which isn't really what users - or the TCP services they talk to - expect. So even though Kubernetes turns it on by default, lots of operators turn it off in practice. It's gotten better recently (see https://lwn.net/Articles/844976/, which also does a better job of explaining the overall mechanism than I'm doing), but I haven't gotten a chance to try it yet.
I'd be curious to know whether EEVDF can implement the quota concept in a way that is less jittery.
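The burst-then-throttle pattern described above can be sketched as a toy model. The period, quota, and overshoot numbers here are illustrative choices of mine, not CFS defaults, and the debt-carryover is a simplification of the behavior the comment describes:

```python
# Toy model of bandwidth throttling: a group bursts well past its
# per-period CPU budget (e.g. many threads running at once before the
# scheduler reacts), then sits throttled while the accumulated debt is
# paid off period by period.

period_ms = 100.0       # bandwidth period
quota_ms = 20.0         # CPU budget per period (20% of one CPU)
overshoot_ms = 200.0    # CPU consumed in the burst before descheduling

debt = overshoot_ms - quota_ms  # runtime owed after the burst
throttled_periods = 0
while debt > 0:
    debt -= quota_ms            # each period's refill pays down debt
    throttled_periods += 1

# A 200 ms burst against a 20 ms/period budget costs 9 throttled
# periods here, i.e. roughly 0.9 s of latency for anything waiting
# on this group - the jitter the comment is complaining about.
assert throttled_periods == 9
```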
Posted Mar 9, 2023 22:33 UTC (Thu)
by Wol (subscriber, #4433)
I'd be surprised if CPU were actually allocated as "so much each second"; I'd allocate it per cycle. So effectively you allocate a bunch of CPU to all your processes, and when it runs out you go round the cycle and allocate again. That way your CPU is not idle if you have processes waiting.
Of course, there's plenty of other things to take into account - what happens if a process fork-bombs or something? And this idea of smaller chunks increasing your likelihood of scheduling might not be so easy to implement this way.
Actually, it's given me an idea for a fork-bomb-proof scheduler :-) Instead of forked processes each getting a fresh new slice of CPU, you set a limit on how much is available each cycle. Let's say for example that we want a cycle to last a maximum of 2 secs, and the default allocation is 100ms. That gives us a maximum of 20 processes getting a full timeslice allocation. So when process 20 forks, the parent loses half its allocation to its child, giving them 50ms each. Effectively, as it forks, the "nice" value goes up. Then as processes die, their slice gets given to other processes.
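A minimal sketch of this split-on-fork idea (the class, numbers, and halving rule are illustrative of the comment's proposal, not any real scheduler):

```python
# Sketch of the split-on-fork timeslice idea: a child is funded out of
# its parent's allocation, so the total cycle budget never grows.

class Task:
    def __init__(self, slice_ms):
        self.slice_ms = slice_ms

    def fork(self):
        """Halve the parent's slice and give the other half to the child."""
        self.slice_ms /= 2
        return Task(self.slice_ms)

# 20 tasks * 100 ms fill the 2-second cycle from the example.
tasks = [Task(100.0) for _ in range(20)]

# A fork bomb in the last task only dilutes that task's own allocation:
bomber = tasks[-1]
children = [bomber.fork() for _ in range(10)]

assert tasks[0].slice_ms == 100.0          # bystanders keep their full slice
assert bomber.slice_ms == 100.0 / 2**10    # the bomber starves itself
```

Note that the bomber plus all its children still sum to the original 100ms, which is what keeps already-running processes (like that terminal session) responsive.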
So a fork bomb can't DoS processes that are already running. And if say you had a terminal session already running it at least stands a chance of getting going and letting you kill the fork-bomb ...
Cheers,
Wol
Posted Mar 11, 2023 19:30 UTC (Sat)
by NYKevin (subscriber, #129325)
I imagine this is resolved by the "virtual time" that corbet mentioned in another comment. If you have too many processes, your virtual clock runs fast, so now there are more than 1000 (virtual) ms in a (real) second, and everybody gets scheduled as allocated. It's just that the 100 (virtual) ms that they get is significantly less than 100 real milliseconds. This is mathematically equivalent to multiplying everyone's quota by 10/11, but you don't have to actually go around doing all those multiplies and divides, nor do you have to deal with the rounding errors failing to line up with each other.
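The equivalence can be checked with a few lines of arithmetic (11 tasks and a 100ms quota are just the numbers implied by the 10/11 example above):

```python
# Virtual-time scaling: with 11 tasks each promised 100 (virtual) ms per
# second, the virtual clock runs at 1100 virtual ms per real second, so
# each task's 100 virtual ms is worth 100 * (1000/1100) real ms.

n_tasks = 11
quota_virtual_ms = 100.0
virtual_ms_per_real_s = n_tasks * quota_virtual_ms  # 1100

# Real CPU time each task receives per real second:
real_ms = quota_virtual_ms * (1000.0 / virtual_ms_per_real_s)

# Mathematically the same as multiplying every quota by 10/11 ...
assert abs(real_ms - quota_virtual_ms * 10 / 11) < 1e-9
# ... and the CPU is still fully allocated, with no rounding slop:
assert abs(n_tasks * real_ms - 1000.0) < 1e-9
```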
Of course, that wouldn't work for realtime scheduling (where you have actually given each process a contractual guarantee of 100 ms/s, and the process likely expects that to be 100 *real* milliseconds), but we're not talking about that. If you try to configure SCHED_DEADLINE in such a way, it will simply refuse the request as impossible to fulfill.