(Nearly) full tickless operation in 3.10
Linux has had a partial solution to the timer tick problem for years in the form of the CONFIG_NO_HZ configuration option. If that option is set, the timer tick will be turned off, but only when the CPU is idle. This mode improves the situation considerably; it allows idle CPUs to stay in deep sleep states, reducing power use. Systems with virtualized guests also benefit, since, otherwise, each guest would be servicing timer interrupts when it should be doing nothing. In short: disabling the timer tick when the processor is idle makes enough sense that most distributions do it by default.
Indeed, given that letting sleeping CPUs lie is generally a good policy, one might wonder why this behavior is optional at all. The answer is that it increases the cost of moving into and out of the idle state, (very) slightly increasing the time it takes to get an idle CPU back to work. That cost may be considered excessive in highly latency-sensitive environments. For everybody else, disabling the timer tick for idle CPUs is almost certainly the right thing to do; for battery-powered systems that is doubly true.
The next step, disabling the tick for non-idle processors, is a lot more work with a smaller reward, so it is not surprising that it has taken a while to come about. Frederic Weisbecker finally took up the challenge in 2010; after a lot of changes and help from others (Paul McKenney made some significant RCU changes, for example), this work has been pulled into the 3.10 kernel.
In 3.10, the CONFIG_NO_HZ option has been replaced by a three-way choice:
- CONFIG_HZ_PERIODIC is the old-style mode wherein the timer tick runs at all times.
- CONFIG_NO_HZ_IDLE (the default setting) will cause the tick to be disabled at idle, the way setting CONFIG_NO_HZ did in earlier kernels.
- CONFIG_NO_HZ_FULL will enable the "full" tickless mode.
The build-system code has been set up so that "make oldconfig" on 3.10 should yield a configuration that matches the previous setting of CONFIG_NO_HZ with no intervention required. Full tickless mode defaults to off; selecting that mode will enable tasks to run without the timer tick, but there are a number of things to be aware of.
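The oldconfig behavior described above can be restated as a small mapping. What follows is a toy Python model of the *outcome* only, not of how kbuild actually implements it; the function names `parse_config()` and `migrate_tick_config()` are hypothetical.

```python
# Toy model (not kbuild code) of how "make oldconfig" on 3.10 is described
# as resolving the new three-way tick choice from a pre-3.10 .config.


def parse_config(text):
    """Parse NAME=value lines of a .config file; comment lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # this also skips "# CONFIG_FOO is not set"
        name, _, value = line.partition("=")
        options[name] = value
    return options


def migrate_tick_config(old_options):
    """Return the 3.10 tick option matching an old CONFIG_NO_HZ setting."""
    if old_options.get("CONFIG_NO_HZ") == "y":
        # Idle-only dynticks, which is what CONFIG_NO_HZ used to mean.
        return "CONFIG_NO_HZ_IDLE"
    # No dynticks before, so the tick keeps running at all times.
    # Full tickless mode is never selected automatically.
    return "CONFIG_HZ_PERIODIC"
```

For example, `migrate_tick_config(parse_config("CONFIG_NO_HZ=y"))` yields `"CONFIG_NO_HZ_IDLE"`, while a config containing `# CONFIG_NO_HZ is not set` yields `"CONFIG_HZ_PERIODIC"`.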
Among those is the requirement that the CPUs available for running without a timer tick must be designated at boot time using the nohz_full= command-line parameter. The boot CPU cannot run in this mode; at least one CPU needs to continue to receive interrupts and do the necessary housekeeping. The CONFIG_NO_HZ_FULL_ALL configuration option causes all CPUs (other than the boot CPU) to run in the full tickless mode by default; it can still be overridden with nohz_full=, though. The set of full tickless CPUs cannot be changed after boot; the amount of work required to make that possible would be large, and there does not seem to be a pressing need for this ability.
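The boot-time rules above (a CPU list from nohz_full=, CONFIG_NO_HZ_FULL_ALL as a default that the parameter overrides, the boot CPU always excluded, and a set fixed after boot) can be sketched in a few lines. This is a hypothetical model, not kernel code: it assumes the boot CPU is CPU 0 and that nohz_full= takes a list such as `1-3,6`, and it simply drops the boot CPU and out-of-range CPUs, which may not match exactly what the kernel does with such input.

```python
BOOT_CPU = 0  # assumption: the boot CPU is CPU 0


def parse_cpulist(spec):
    """Parse a CPU list such as "1-3,6" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad CPU range: {part}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus


def full_tickless_cpus(nr_cpus, nohz_full=None, no_hz_full_all=False):
    """Hypothetical model of the full tickless CPU set chosen at boot."""
    if nohz_full is not None:
        cpus = parse_cpulist(nohz_full)   # command line overrides the config
    elif no_hz_full_all:
        cpus = set(range(nr_cpus))        # CONFIG_NO_HZ_FULL_ALL default
    else:
        cpus = set()
    cpus = {c for c in cpus if 0 <= c < nr_cpus}
    cpus.discard(BOOT_CPU)  # the boot CPU keeps its tick for housekeeping
    # A frozenset models "cannot be changed after boot".
    return frozenset(cpus)
```

On an eight-CPU machine built with CONFIG_NO_HZ_FULL_ALL, `full_tickless_cpus(8, no_hz_full_all=True)` gives CPUs 1-7, and booting with `nohz_full=2,4-5` narrows that to `{2, 4, 5}`.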
Even then, a running CPU will only disable the timer tick if there is a single runnable process on that CPU. As soon as a second process appears, the tick is needed so that the scheduler can make the necessary time-slice decisions. And even with a single runnable process, it is not technically tickless, since the timer tick still needs to happen at least once per second to keep the scheduler happy. But dropping from as high as 1000Hz to 1Hz is obviously a significant improvement. Response-time jitter due to timer interrupts will be nearly eliminated, and, according to Ingo Molnar, as much as 1% of the CPU's time will be saved.
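The single-runnable-process rule and the residual once-per-second tick can be summarized as a rough model of one busy CPU. This sketch assumes a busy CPU (at least one runnable task, so idle dynticks are out of scope) and treats the leftover tick as exactly 1Hz; the function name `tick_rate_hz()` is hypothetical.

```python
RESIDUAL_TICK_HZ = 1  # 3.10 still needs roughly one tick per second


def tick_rate_hz(hz, nr_running, cpu_is_nohz_full):
    """Approximate tick frequency on a busy CPU in 3.10 (hypothetical model)."""
    if nr_running < 1:
        raise ValueError("model covers busy CPUs only (nr_running >= 1)")
    if not cpu_is_nohz_full or nr_running > 1:
        # With competing tasks the scheduler needs the tick for
        # time-slice decisions, so it runs at the full configured rate.
        return hz
    # Exactly one runnable task on a full tickless CPU.
    return RESIDUAL_TICK_HZ
```

With HZ=1000, a full tickless CPU running one task drops from 1000 ticks per second to 1, avoiding 999 timer interrupts each second; a second runnable task brings the full rate back.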
There are workloads out there that will benefit significantly from those improvements. High-performance computing (HPC) and realtime are obvious candidates; in both cases, dedicating a CPU to a single task is a fairly common tactic already. But, in an era where even phones have quad-core processors, having a single runnable process on a given CPU is not an uncommon situation.
There are a lot more details to making full tickless operation work properly; setting up a system to use this feature requires a fair amount of fiddling at this time. At a minimum, the administrator should make extensive use of CPU affinities to keep unwanted processes (including kernel threads) off the relevant processors. Some RCU configuration is required as well; see Documentation/timers/NO_HZ.txt for lots of details on the various options.
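Keeping user-space processes off the tickless CPUs is ordinary CPU-affinity work. Below is a minimal sketch using Python's `os.sched_setaffinity()` (Linux only). It assumes the tickless set is already known (for example, the list given to nohz_full=), treats every CPU reported by `os.cpu_count()` as online, and uses hypothetical helper names. Some per-CPU kernel threads are bound to their CPU and may refuse such a request, and the RCU setup mentioned above is not covered.

```python
import os


def housekeeping_cpus(online, nohz_full):
    """CPUs left over for everything that should avoid the tickless CPUs."""
    rest = set(online) - set(nohz_full)
    if not rest:
        raise ValueError("at least one housekeeping CPU is required")
    return rest


def confine_to_housekeeping(pid, nohz_full):
    """Restrict process `pid` (0 means the caller) to housekeeping CPUs."""
    # Assumption: CPUs 0..cpu_count()-1 are all online.
    online = set(range(os.cpu_count() or 1))
    target = housekeeping_cpus(online, nohz_full)
    os.sched_setaffinity(pid, target)  # raises OSError on failure
    return target
```

On a four-CPU system booted with `nohz_full=1-3`, `confine_to_housekeeping(pid, {1, 2, 3})` would leave the process only CPU 0, so the tickless CPUs remain free for the single task each is meant to run.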
Full tickless operation, as seen in 3.10, is clearly a significant step forward, but, equally clearly, this project is not yet complete. There is a fair amount of detail work to be done, including making the feature work on 32-bit processors (a patch exists), getting rid of that final once-per-second tick, mitigating some unfortunate side effects on the scheduler's statistics and load balancing, and fixing the inevitable bugs. This is a large and invasive change to how the core kernel works; there will almost certainly be some surprising behaviors that emerge once the tickless mode starts to get wider testing.
The biggest item on the "to do" list, though, must surely be getting rid of the single-runnable-process requirement. Just in case the developers involved did not already feel that way, Linus made his opinion on the matter clear:
So, chances are, this limitation will be removed from the tickless implementation in some future development cycle, along with the various other rough edges. In the meantime, the 3.10 kernel will contain a significant step forward in the evolution of the core Linux kernel: the partial removal of a source of latency and overhead that has been there since the very first kernel release. Not even the big kernel lock endured anywhere near that long.
Posted May 8, 2013 16:12 UTC (Wed)
by busterb (subscriber, #560)
Requiring the user to explicitly schedule with 'sched_yield' to produce a cooperative multitasking scenario might work well. I've worked on enough realtime systems to know that this can yield good performance if done correctly (since your performance-critical sections never get interrupted without you saying so), though it occasionally bites you if you forget to yield somewhere, e.g. waiting for a lock. I know I would have liked to have had something like this in past system designs. Instead, 'yielding' was done with coroutines or tasklets/threads.
What about having the boot cpu do all the scheduling for all the other CPUs? Basically treat the other CPUs like a thread pool and distribute processes. Is the cost of IPIs too much for this to be practical?
Posted May 8, 2013 16:22 UTC (Wed)
by busterb (subscriber, #560)
Posted May 8, 2013 16:24 UTC (Wed)
by simlo (guest, #10866)
Posted May 8, 2013 18:50 UTC (Wed)
by blitzkrieg3 (guest, #57873)
Posted May 8, 2013 19:48 UTC (Wed)
by intgr (subscriber, #39733)
> Even then, a running CPU will only disable the timer tick if there is a single runnable process on that CPU. As soon as a second process appears, the tick is needed so that the scheduler can make the necessary time-slice decisions
It's misleading to call it "the tick" if it's not fixed to HZ any more; it seems more like a preemption timer.
Posted May 8, 2013 19:52 UTC (Wed)
by corbet (editor, #1)
Posted May 8, 2013 20:23 UTC (Wed)
by intgr (subscriber, #39733)
AFAICT the scheduler doesn't switch tasks at every timer tick, even when there is contention for a CPU -- it has its own concept of timeslice length that changes with load. So why does a contended CPU need to run the timer tick if it's not going to switch tasks?
And grandparent wrote something that seemed to match that line of thinking:
> After a little RTFC, I found that a HR-timer was used to calculate the next preemption point. I.e. instead of preempting on 100 Hz clock, it preempts exactly when the timeslot of the current process ends.
Posted May 9, 2013 2:16 UTC (Thu)
by nevets (subscriber, #11875)
The hrtick is used to denote exact time slices for the CFS scheduler to create more fairness. It really doesn't do much more than that. But this does not replace the scheduler_tick, which, among other things, keeps track of the SCHED_RR time slices, manages load balancing, and updates task timings.
But I'm sure in the future the hrtick may be used more to get rid of the periodic tick.
Posted May 13, 2013 11:57 UTC (Mon)
by dps (guest, #5725)
I know actually implementing this would not be trivial and I am not volunteering to do this myself.
Posted May 14, 2013 12:18 UTC (Tue)
by Tobu (subscriber, #24111)
Posted May 8, 2013 16:30 UTC (Wed)
by linuxjacques (subscriber, #45768)
The 64-bit-only constraint makes me wonder how much this has been tested on non-x86 archs.
I'm interested in it for 32-bit ppc.
Posted May 9, 2013 20:58 UTC (Thu)
by bagder (guest, #38414)
Posted May 8, 2013 18:02 UTC (Wed)
by gby (guest, #23264)
Posted May 8, 2013 18:29 UTC (Wed)
by corbet (editor, #1)
Posted May 9, 2013 0:18 UTC (Thu)
by simlo (guest, #10866)
Posted May 8, 2013 22:56 UTC (Wed)
by bgmarete (guest, #47484)
Also, how does this relate to CPU hyperthreads? Can a hyperthread be in tickless mode while a sibling hyperthread is not? Or must the entire core be woken up (or not) by the timer ticks?
Posted May 8, 2013 23:05 UTC (Wed)
by corbet (editor, #1)
Posted May 9, 2013 1:15 UTC (Thu)
by bgmarete (guest, #47484)
I would still like to know if each hyperthread can be independently put into full tickless mode (independent, that is, from its sibling), with the attendant power savings (if any). (Assume that I am interested only in saving battery power).
Posted May 9, 2013 13:14 UTC (Thu)
by sheepdestroyer (guest, #54968)
Posted May 9, 2013 15:59 UTC (Thu)
by drago01 (subscriber, #50715)
Posted May 10, 2013 8:43 UTC (Fri)
by akeane (guest, #85436)
Really, "cooperative" multi-tasking?
Really?
Really...
Posted May 10, 2013 18:59 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
If there is only one CPU-bound task runnable on a given CPU, there is no point in any scheduling decisions.
If there are multiple tasks runnable on a given CPU, and if the currently running task is CPU-bound, then there is no point in any scheduling decisions until the next timeslice.
Of course, things might change in the meantime, but in that case, this CPU will receive an interrupt and can therefore adjust as appropriate at that point in time.
Posted May 13, 2013 21:46 UTC (Mon)
by chloe_zen (guest, #8258)
Posted May 20, 2013 23:13 UTC (Mon)
by marcH (subscriber, #57642)
Posted May 21, 2013 7:49 UTC (Tue)
by cladisch (✭ supporter ✭, #50193)
> ticks to look some day as outdated as polling?
Polling is regularly checking the status, just because something that needs handling might have happened.
Ticks are polling.
Posted May 22, 2013 7:45 UTC (Wed)
by chenlb206 (guest, #86317)
Posted May 24, 2013 7:56 UTC (Fri)
by ajaycavium (guest, #91111)
Posted May 28, 2013 8:29 UTC (Tue)
by Ralf (guest, #40688)
Posted May 29, 2013 9:46 UTC (Wed)
by Ralf (guest, #40688)
Please send test reports to linux-mips@linux-mips.org. Thanks!
Posted Jul 1, 2013 12:28 UTC (Mon)
by methanol (guest, #91650)
Posted Jul 16, 2013 21:31 UTC (Tue)
by ParadoxUncreated (guest, #87037)
http://ovekarlsen.com/Blog/turning-ubuntu-12-04-into-a-pr...
Choosing components for less jitter, reducing timer interrupts to 90Hz, enabling low-latency behaviour, etc., plus renicing X, made Doom 3 run perfectly on a Core 2 Duo with a GTX 280. At QuakeCon 2012, Carmack said that the game is still taxing on some configurations. It actually requires an Intel E5 workstation with low-jitter hardware to run as well on a tweaked Windows XP.
The game does 3 passes to OpenGL per frame. That seems to make it very jitter-sensitive, but with a low-jitter configured kernel, it runs perfectly. That seems to apply to things like Wine also, though Wine games such as Half-Life 2 still had some jitter, just much less.
So low-jitter matters, if you want smooth gameplay on advanced game-engines. And a lot of data is pushed there. Generally the system seems very nice and responsive also. Much more like an optimal desktop system indeed.
Peace Be With You.
Posted Jul 17, 2013 2:56 UTC (Wed)
by hummassa (subscriber, #307)
Posted Aug 2, 2013 6:37 UTC (Fri)
by YAK (guest, #91961)
Posted Oct 9, 2018 1:44 UTC (Tue)
by henryc (guest, #127733)
I thought it wasn't possible to affine migration and kworker processes to other cores as they are per-core processes that bind to the specific cores. RCU related processes can be moved as there is a kernel parameter for that.
The article suggests that in order for a core to be in full nohz mode, "a running CPU will only disable the timer tick if there is a single runnable process on that CPU". It sounds like it is necessary to move kernel processes such as migration and kworker out of the core. Or kernel processes don't count? I am a bit confused.
If kernel processes count, then how to move them to other cores? Can they be moved to another physical CPU?
Thanks.
Confused. If two processes are running, the periodic tick will run just like it does now. What am I missing? What is misleading?
Part of the work will be to adjust the various accumulators, statistics, governors and feedback mechanisms to work when the timeslice they measure isn't constant any more.
See this commit. One tick per second is still needed.
One-second tick
If you want tickless for latency reasons, the last thing you're going to do is turn on hyperthreading.
Hyperthreading
ticks vs. polling
Does tickless support Octeon-2 and ARM as well?
kernel thread/process and full tickless mode?