(Nearly) full tickless operation in 3.10

By Jonathan Corbet
May 8, 2013
On a typical Linux system, each running CPU will be diverted between 100 and 1000 times each second by the periodic timer interrupt. That interrupt is the CPU's cue to reconsider which process should be running, catch up with read-copy-update (RCU) callbacks, and generally handle any necessary housekeeping. This periodic "tick" can be reasonably compared to the infamous big kernel lock (BKL): it is convenient to have around, but it also has an effect on performance that makes developers wish to abolish it. The key difference might be that getting rid of the timer tick has taken rather longer than was required to eliminate the BKL. The 3.10 kernel will take an important step in that direction, though, with the addition of the "full NOHZ" mode — but a lot of limitations still apply.

Linux has had a partial solution to the timer tick problem for years in the form of the CONFIG_NO_HZ configuration option. If that option is set, the timer tick will be turned off, but only when the CPU is idle. This mode improves the situation considerably; it allows idle CPUs to stay in deep sleep states, reducing power use. Systems with virtualized guests also benefit, since, otherwise, each guest would be servicing timer interrupts when it should be doing nothing. In short: disabling the timer tick when the processor is idle makes enough sense that most distributions do it by default.

Indeed, given that letting sleeping CPUs lie is generally a good policy, one might wonder why this behavior is optional at all. The answer is that it increases the cost of moving into and out of the idle state, (very) slightly increasing the time it takes to get an idle CPU back to work. That cost may be considered excessive in highly latency-sensitive environments. For everybody else, disabling the timer tick for idle CPUs is almost certainly the right thing to do; for battery-powered systems that is doubly true.

The next step — disabling the tick for non-idle processors — is a lot more work with a smaller reward, so it is not surprising that it has taken a while to come about. Frederic Weisbecker finally took up the challenge in 2010; after a lot of changes and help by others (Paul McKenney made some significant RCU changes, for example), this work has been pulled into the 3.10 kernel.

In 3.10, the CONFIG_NO_HZ option has been replaced by a three-way choice:

  • CONFIG_HZ_PERIODIC is the old-style mode wherein the timer tick runs at all times.
  • CONFIG_NO_HZ_IDLE (the default setting) will cause the tick to be disabled at idle, the way setting CONFIG_NO_HZ did in earlier kernels.
  • CONFIG_NO_HZ_FULL will enable the "full" tickless mode.

The build-system code has been set up so that "make oldconfig" on 3.10 should yield a configuration that matches the previous setting of CONFIG_NO_HZ with no intervention required. Full tickless mode defaults to off; selecting that mode will enable tasks to run without the timer tick, but there are a number of things to be aware of.
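
As a rough illustration of the three-way choice, the sketch below reports which tick mode a kernel configuration selects. The helper name tick_mode and the sample configuration text are assumptions made for this example; only the three CONFIG_ symbols come from the list above.

```python
# Hypothetical helper: report which of the three 3.10 tick modes a
# kernel .config selects. Only the CONFIG_ names come from the article.
TICK_MODES = {
    "CONFIG_HZ_PERIODIC": "periodic",
    "CONFIG_NO_HZ_IDLE": "idle",
    "CONFIG_NO_HZ_FULL": "full",
}


def tick_mode(config_text):
    """Return 'periodic', 'idle', 'full', or None if no mode is set."""
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and are skipped.
        if line.startswith("#") or "=" not in line:
            continue
        symbol, value = line.split("=", 1)
        if symbol in TICK_MODES and value == "y":
            return TICK_MODES[symbol]
    return None


# Made-up sample in the usual .config syntax.
sample = """\
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_HZ=1000
"""
print(tick_mode(sample))
```

Since the three options form a single choice, at most one of them should be set in a real configuration, so returning the first match is enough for this sketch.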

Among those is the requirement that the CPUs available for running without a timer tick must be designated at boot time using the nohz_full= command-line parameter. The boot CPU cannot run in this mode; at least one CPU needs to continue to receive interrupts and do the necessary housekeeping. The CONFIG_NO_HZ_FULL_ALL configuration option causes all CPUs (other than the boot CPU) to run in the full tickless mode by default; it can still be overridden with nohz_full=, though. The set of full tickless CPUs cannot be changed after boot; the amount of work required to make that possible would be large, and there does not seem to be a pressing need for this ability.
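
To make the nohz_full= parameter a little more concrete, here is a minimal sketch that pulls the CPU list out of a kernel command line and expands it. The function names are hypothetical, the sketch assumes the boot CPU is CPU 0, and it only handles plain numbers and simple ranges such as "1-3,5". Raising an error when the boot CPU is listed is this example's choice, not a description of what the kernel does.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '1-3,5' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nohz_full_cpus(cmdline, boot_cpu=0):
    """Return the CPUs named by nohz_full= on a kernel command line.

    boot_cpu=0 is an assumption made for this sketch.
    """
    prefix = "nohz_full="
    for token in cmdline.split():
        if token.startswith(prefix):
            cpus = parse_cpu_list(token[len(prefix):])
            if boot_cpu in cpus:
                raise ValueError("the boot CPU cannot run in full tickless mode")
            return cpus
    return set()


# Made-up command line for demonstration.
print(sorted(nohz_full_cpus("root=/dev/sda1 ro nohz_full=1-3,5")))
```

On a running system, the command line to feed into such a helper can be read from /proc/cmdline.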

Even then, a running CPU will only disable the timer tick if there is a single runnable process on that CPU. As soon as a second process appears, the tick is needed so that the scheduler can make the necessary time-slice decisions. And even with a single runnable process, it is not technically tickless, since the timer tick still needs to happen at least once per second to keep the scheduler happy. But dropping from as high as 1000Hz to 1Hz is obviously a significant improvement. Response-time jitter due to timer interrupts will be nearly eliminated, and, according to Ingo Molnar, as much as 1% of the CPU's time will be saved.
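
The arithmetic behind that kind of saving is easy to sketch. The per-tick cost used below (10 microseconds) is purely an illustrative assumption, not a measured figure, and the function name is made up; the point is only to show how the overhead scales with the tick frequency.

```python
def tick_overhead(hz, cost_us):
    """Fraction of each CPU-second spent servicing `hz` ticks per second,
    assuming every tick costs `cost_us` microseconds."""
    return hz * cost_us / 1_000_000


ASSUMED_COST_US = 10  # illustrative guess, not a measurement

for hz in (1000, 100, 1):
    share = tick_overhead(hz, ASSUMED_COST_US)
    print(f"{hz:>4} Hz: {share:.4%} of CPU time")
```

Under that assumed cost, a 1000Hz tick comes to 1% of the CPU's time, while a once-per-second tick is a thousand times cheaper. The real cost per tick varies with the hardware and the work each tick has to do.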

There are workloads out there that will benefit significantly from those improvements. High-performance computing (HPC) and realtime are obvious candidates; in both cases, dedicating a CPU to a single task is a fairly common tactic already. But, in an era where even phones have quad-core processors, having a single runnable process on a given CPU is not an uncommon situation.

There are a lot more details to making full tickless operation work properly; setting up a system to use this feature requires a fair amount of fiddling at this time. At a minimum, the administrator should make extensive use of CPU affinities to keep unwanted processes (including kernel threads) off the relevant processors. Some RCU configuration is required as well; see Documentation/timers/NO_HZ.txt for lots of details on the various options.
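
For user-space tasks, the affinity part can be sketched with the Linux-only scheduling calls in Python's os module. The helper name pin_to_cpu is hypothetical. The sketch only covers pinning an ordinary process to one CPU; keeping kernel threads off a CPU and the RCU settings mentioned above are outside its scope.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict `pid` (0 means the calling process) to a single CPU.

    Returns the resulting affinity set as reported by the kernel.
    """
    allowed = os.sched_getaffinity(pid)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Pick the highest-numbered CPU available to this process,
    # purely for demonstration.
    target = max(os.sched_getaffinity(0))
    print(pin_to_cpu(target))
```

The same effect is available from the shell with taskset(1). Either way, pinning a task onto a CPU does nothing by itself to keep other tasks off that CPU.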

Full tickless operation, as seen in 3.10, is clearly a significant step forward, but, equally clearly, this project is not yet complete. There is a fair amount of detail work to be done, including making the feature work on 32-bit processors (a patch exists), getting rid of that final once-per-second tick, mitigating some unfortunate side effects on the scheduler's statistics and load balancing, and fixing the inevitable bugs. This is a large and invasive change to how the core kernel works; there will almost certainly be some surprising behaviors that emerge once the tickless mode starts to get wider testing.

The biggest item on the "to do" list, though, must surely be getting rid of the single-runnable-process requirement. Just in case the developers involved did not already feel that way, Linus made his opinion on the matter clear:

So as long as the NOHZ is for HPC-style loads, then quite frankly, I don't feel it is worth it. The _only_ thing that makes it worth it is that "future plans" part where it would actually help real loads.

So, chances are, this limitation will be removed from the tickless implementation in some future development cycle, along with the other various rough edges. In the meantime, the 3.10 kernel will contain a significant step forward in the evolution of the core Linux kernel: the partial removal of a source of latency and overhead that has been there since the very first kernel release. Not even the big kernel lock endured anywhere near that long.




(Nearly) full tickless operation in 3.10

Posted May 8, 2013 16:12 UTC (Wed) by busterb (subscriber, #560) [Link] (9 responses)

So, what's the solution for running >1 process per CPU with NO_HZ_FULL? Use syscalls as the hook for an implicit schedule 'pump'?

Requiring the user to explicitly schedule with 'sched_yield' to produce a cooperative multitasking scenario might work well. I've worked on enough realtime systems to know that this can yield good performance if done correctly (since your performance-critical sections never get interrupted without you saying so), though it occasionally bites you if you forget to yield somewhere, e.g. waiting for a lock. I know I would have liked to have had something like this in past system designs. Instead, 'yielding' was done with coroutines or tasklets/threads.

What about having the boot cpu do all the scheduling for all the other CPUs? Basically treat the other CPUs like a thread pool and distribute processes. Is the cost of IPIs too much for this to be practical?

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 16:22 UTC (Wed) by busterb (subscriber, #560) [Link]

On a related note, it seems like quite a few multi-core SoC vendors have implemented their own patches for this behavior as well, e.g. Tilera's 'zero overhead' Linux. I would be glad to see this become mainstream; maybe they'll all adopt just one way to do it.

http://www.6windblog.com/linux-based-fast-path/

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 16:24 UTC (Wed) by simlo (guest, #10866) [Link] (5 responses)

After a little RTFC, I found that a HR-timer was used to calculate the next preemption point. I.e. instead of preempting on 100 Hz clock, it preempts exactly when the timeslot of the current process ends.

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 18:50 UTC (Wed) by blitzkrieg3 (guest, #57873) [Link]

Okay, this makes absolutely perfect sense, but because of the way operating systems have been designed for the past 40 years, that solution completely eluded me.

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 19:48 UTC (Wed) by intgr (subscriber, #39733) [Link] (3 responses)

I got the same impression as you. I think the wording in the article is unfortunate.

> Even then, a running CPU will only disable the timer tick if there is a single runnable process on that CPU. As soon as a second process appears, the tick is needed so that the scheduler can make the necessary time-slice decisions

It's misleading to call it "the tick" if it's not fixed to the HZ any more, seems more like a preemption timer.

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 19:52 UTC (Wed) by corbet (editor, #1) [Link] (2 responses)

Confused. If two processes are running, the periodic tick will run just like it does now. What am I missing? What is misleading?

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 20:23 UTC (Wed) by intgr (subscriber, #39733) [Link] (1 responses)

Sorry, it's much more likely that I am missing something.

AFAICT the scheduler doesn't switch tasks at every timer tick, even when there is contention for a CPU -- it has its own concept of timeslice length that changes with load. So why does a contended CPU need to run the timer tick if it's not going to switch tasks?

And grandparent wrote something that seemed to match that line of thinking:

> After a little RTFC, I found that a HR-timer was used to calculate the next preemption point. I.e. instead of preempting on 100 Hz clock, it preempts exactly when the timeslot of the current process ends.

(Nearly) full tickless operation in 3.10

Posted May 9, 2013 2:16 UTC (Thu) by nevets (subscriber, #11875) [Link]

It's confusing because there's two things at play here. There's the hrtick and the scheduler_tick.

The hrtick is used to denote exact time slices for the CFS scheduler to create more fairness. It really doesn't do much more than that. But this does not replace the scheduler_tick, which, among other things, keeps track of the SCHED_RR time slices, manages load balancing, and updates task timings.

But I'm sure in the future the hrtick may be used more to get rid of the periodic tick.

(Nearly) full tickless operation in 3.10

Posted May 13, 2013 11:57 UTC (Mon) by dps (guest, #5725) [Link] (1 responses)

I believe that real tickless operation with preemption is possible. Instead of recalculating that the same process keeps the CPU every n microseconds, until it does not, one could compute when another process would win the CPU and only schedule an interrupt at that time. Any periods in which the process would keep the CPU do not need interruption.

I know actually implementing this would not be trivial and I am not volunteering to do this myself.

(Nearly) full tickless operation in 3.10

Posted May 14, 2013 12:18 UTC (Tue) by Tobu (subscriber, #24111) [Link]

Part of the work will be to adjust the various accumulators, statistics, governors and feedback mechanisms to work when the timeslice they measure isn't constant any more.

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 16:30 UTC (Wed) by linuxjacques (subscriber, #45768) [Link] (1 responses)

The 64-bit only constraint makes me wonder how much this has been tested on non-x86 archs.

I'm interested in it for 32-bit ppc.

(Nearly) full tickless operation in 3.10

Posted May 9, 2013 20:58 UTC (Thu) by bagder (guest, #38414) [Link]

There's also code for (32-bit) ARM that works. I'm not sure exactly which tree/state it is in, though.

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 18:02 UTC (Wed) by gby (guest, #23264) [Link] (2 responses)

There is no inherent one second timer tick except on a single CPU on the system. I think you got this wrong.

One-second tick

Posted May 8, 2013 18:29 UTC (Wed) by corbet (editor, #1) [Link] (1 responses)

See this commit. One tick per second is still needed.

One-second tick

Posted May 9, 2013 0:18 UTC (Thu) by simlo (guest, #10866) [Link]

Hmm, wouldn't it be more appropriate just to make sure the timer is set at a maximum of 1 second in the future after the last time the scheduler ran, instead of being run every second?

(Nearly) full tickless operation in 3.10

Posted May 8, 2013 22:56 UTC (Wed) by bgmarete (guest, #47484) [Link] (9 responses)

With regard to the necessity of setting CPU affinities so that processors marked for tickless operation do not acquire more than one process, wouldn't something like systemd ease the related administrative work? After all, I don't want to be messing around with taskset(1) on my laptop.

Also, how does this relate to CPU hyperthreads? Can a hyperthread be in tickless mode while a sibling hyperthread is not? Or must the entire core be woken up (or not) by the timer ticks?

Hyperthreading

Posted May 8, 2013 23:05 UTC (Wed) by corbet (editor, #1) [Link] (8 responses)

If you want tickless for latency reasons, the last thing you're going to do is turn on hyperthreading.

Hyperthreading

Posted May 9, 2013 1:15 UTC (Thu) by bgmarete (guest, #47484) [Link] (7 responses)

Thanks for the reply Corbet. I have just finished reading a long series of articles, including some at Intel, which make your point clear.

I would still like to know if each hyperthread can be independently put into full tickless mode (independent, that is, from its sibling), with the attendant power savings (if any). (Assume that I am interested only in saving battery power).

Hyperthreading

Posted May 9, 2013 13:14 UTC (Thu) by sheepdestroyer (guest, #54968) [Link] (6 responses)

Also interested in that particular issue. I have a Sandy Bridge ultra-portable, so 2 "real" cores + 2 HT, and would like to know how much power saving, if any, I should expect.

Hyperthreading

Posted May 9, 2013 15:59 UTC (Thu) by drago01 (subscriber, #50715) [Link] (5 responses)

I doubt this will get you any noticeable power savings, if any at all. The most interesting part for power saving is the idle part, which is already handled by CONFIG_NO_HZ.

Hyperthreading

Posted May 10, 2013 8:43 UTC (Fri) by akeane (guest, #85436) [Link] (4 responses)

The config option should be called: CONFIG_WINDOWS_311

Really, "cooperative" multi-tasking?

Really?

Really...

Hyperthreading

Posted May 10, 2013 18:59 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Cute, but inaccurate. ;-)

If there is only one CPU-bound task runnable on a given CPU, there is no point in any scheduling decisions.

If there are multiple tasks runnable on a given CPU, and if the currently running task is CPU-bound, then there is no point in any scheduling decisions until the next timeslice.

Of course, things might change in the meantime, but in that case, this CPU will receive an interrupt and can therefore adjust as appropriate at that point in time.

Hyperthreading

Posted May 13, 2013 21:46 UTC (Mon) by chloe_zen (guest, #8258) [Link] (2 responses)

I think you haven't been keeping up with current events (no pun intended); event-driven loops in a single OS thread are the New Way of getting the most I/O through a CPU. This is a realistically useful optimization.

Hyperthreading

Posted May 20, 2013 23:13 UTC (Mon) by marcH (subscriber, #57642) [Link] (1 responses)

So, ticks to look some day as outdated as polling?

ticks vs. polling

Posted May 21, 2013 7:49 UTC (Tue) by cladisch (✭ supporter ✭, #50193) [Link]

> ticks to look some day as outdated as polling?

Polling is regularly checking the status, just because something that needs handling might have happened.

Ticks are polling.

(Nearly) full tickless operation in 3.10

Posted May 22, 2013 7:45 UTC (Wed) by chenlb206 (guest, #86317) [Link]

Thanks, open source guys. You are great!

Is tickless supported on Octeon-2 and ARM as well?

Posted May 24, 2013 7:56 UTC (Fri) by ajaycavium (guest, #91111) [Link] (2 responses)

Can anybody please tell me whether this feature is supported on Octeon-2 and ARM as well, or whether it is for x86 only?

Is tickless supported on Octeon-2 and ARM as well?

Posted May 28, 2013 8:29 UTC (Tue) by Ralf (guest, #40688) [Link] (1 responses)

Octeon 2 supports CONFIG_NO_HZ_IDLE but not yet CONFIG_NO_HZ_FULL. Working on that.

Is tickless supported on Octeon-2 and ARM as well?

Posted May 29, 2013 9:46 UTC (Wed) by Ralf (guest, #40688) [Link]

MIPS support (tested on a 12 core Octeon+) is now available in http://git.linux-mips.org/?p=ralf/linux.git;a=commit;h=07...

Please send test reports to linux-mips@linux-mips.org. Thanks!

(Nearly) full tickless operation in 3.10

Posted Jul 1, 2013 12:28 UTC (Mon) by methanol (guest, #91650) [Link]

Is there a way to show what the current "tick-mode" is? And can I force only one process to be on a particular cpu core?

(Nearly) full tickless operation in 3.10

Posted Jul 16, 2013 21:31 UTC (Tue) by ParadoxUncreated (guest, #87037) [Link] (1 responses)

It's way more than 1% more CPU, at least in terms of visual performance. I did a low-jitter config on Linux some time ago, which I have now perfected:

http://ovekarlsen.com/Blog/turning-ubuntu-12-04-into-a-pr...

Choosing components for less jitter, reducing timer interrupts to 90 Hz, enabling low-latency behaviour, and renicing X made Doom 3 run perfectly on a Core 2 Duo with a GTX 280. At QuakeCon 2012, Carmack said that the game is still taxing on some configurations. It actually requires an Intel E5 workstation with low-jitter hardware to run as well in a tweaked Windows XP.

The game does 3 passes to OpenGL per frame. That seems to make it very jitter-sensitive. But with a low-jitter configured kernel, it runs perfectly. That seems to apply to things like Wine also. Wine games such as Half-Life 2 still had some jitter, but much less.

So low-jitter matters, if you want smooth gameplay on advanced game-engines. And a lot of data is pushed there. Generally the system seems very nice and responsive also. Much more like an optimal desktop system indeed.

Peace Be With You.

(Nearly) full tickless operation in 3.10

Posted Jul 17, 2013 2:56 UTC (Wed) by hummassa (subscriber, #307) [Link]

Your link says "maintenance mode".

(Nearly) full tickless operation in 3.10

Posted Aug 2, 2013 6:37 UTC (Fri) by YAK (guest, #91961) [Link]

In the file kernel/time/Kconfig, the option "NO_HZ_FULL" depends on 64BIT, so is it only supported on 64-bit architectures? I wanted to try it out on a 32-bit ARM Cortex-A9, but I guess it's not possible?

kernel thread/process and full tickless mode?

Posted Oct 9, 2018 1:44 UTC (Tue) by henryc (guest, #127733) [Link]

The article states, "the administrator should make extensive use of CPU affinities to keep unwanted processes (including kernel threads) off the relevant processors".

I thought it wasn't possible to affine migration and kworker processes to other cores as they are per-core processes that bind to the specific cores. RCU related processes can be moved as there is a kernel parameter for that.

The article suggests that in order for a core to be in full nohz mode, "a running CPU will only disable the timer tick if there is a single runnable process on that CPU". It sounds like it is necessary to move kernel processes such as migration and kworker out of the core. Or kernel processes don't count? I am a bit confused.

If kernel processes count, then how to move them to other cores? Can they be moved to another physical CPU?

Thanks.


Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds