NoHZ tasks

By Jonathan Corbet
December 20, 2010
Operating system kernels, at their best, should not be noticed by user space at all; in particular, the resource cost of the kernel should be as small as possible. The Linux kernel has been written with that idea in mind, but, for some people, anything is still too much. High-performance computing users want all of the CPU time for themselves, while some latency-sensitive users want their code to never have to wait for the processor. These users have been asking for a way to run processes on at least one CPU with no kernel interference at all - no timer ticks, no interrupts, etc. Thus far, no satisfactory solution has been found; a new patch set by Frederic Weisbecker is not such a solution yet, but it shows another way of attacking the problem.

The idea behind Frederic's patch set is to enable a process to disable the timer interrupt while it is running. If a set of conditions can be met, this will allow the process to run without regular interference from the timer tick. If other sources of interrupts are directed away from the CPU as well, this process should be able to run uninterrupted for some time. There are a few complications, though.

Actually going into the tickless mode is relatively easy; the process need only write a nonzero value to /proc/self/nohz. The patch imposes a couple of conditions on these processes: (1) the process must be bound to the CPU it is running on, and (2) no other process can be running in the tickless mode on that CPU. If those conditions hold, the write to /proc/self/nohz will succeed and the kernel will try to disable the timer tick while that process runs.
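
As a rough illustration, a process might make that request as sketched below. This is only a sketch under stated assumptions: /proc/self/nohz exists only on a kernel carrying this patch set, so on any other kernel the open fails; the helper names (pin_to_current_cpu(), write_flag(), request_nohz()) are invented here and are not part of the patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>

/* Bind the calling thread to the CPU it is running on right now.
 * Returns the CPU number, or -1 on failure. */
int pin_to_current_cpu(void)
{
    cpu_set_t set;
    int cpu = sched_getcpu();

    if (cpu < 0)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return cpu;
}

/* Write a short string to a control file. Returns 0 on success,
 * -1 on failure with errno describing the problem. */
int write_flag(const char *path, const char *value)
{
    size_t len = strlen(value);
    ssize_t n;
    int saved;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, value, len);
    saved = errno;          /* keep the write error across close() */
    close(fd);
    errno = saved;
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Hypothetical helper: satisfy the binding condition, then ask for
 * tickless mode. The path is the one named in the article and is only
 * present with the patch set applied. */
int request_nohz(void)
{
    if (pin_to_current_cpu() < 0)
        return -1;
    return write_flag("/proc/self/nohz", "1");
}
```

A caller would invoke request_nohz() once before entering its CPU-bound loop and treat a failure as "carry on with the tick enabled"; even a successful write only means the kernel will try to stop the tick.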

The key word here is "try"; there are a number of things which can keep the disabling of the tick from happening. The first of those is any sort of contention for the CPU. If any other processes are trying to run on the same CPU, the scheduler tick must happen as usual so that decisions on preemption can be made. Since a process can be made runnable from anywhere in the system, Frederic's patch performs a potentially expensive inter-processor interrupt whenever a second process is made runnable on any CPU, regardless of whether that CPU is currently running in the no-tick mode or not.

Another thing that can gum up the works is read-copy-update (RCU). If there are any RCU callbacks which need to be processed on the CPU, that CPU will not go into the no-tick mode. RCU also needs to be notified whenever the CPU goes into a "quiescent state," so that it can know when it is safe to invoke RCU callbacks on other CPUs. If RCU has indicated an interest in knowing when the target CPU goes quiescent, once again, no-tick mode cannot be entered. The CPU can also be forced out of the no-tick mode if RCU develops a curiosity about quiescent states anywhere in the system.

Given that RCU is heavily used in contemporary kernels, one would think that its needs would prevent no-tick mode most of the time. Another part of the patch set tries to mitigate that problem with the realization that, if a process is running in user space with the timer tick disabled, the associated CPU is necessarily quiescent. When a CPU is running in this mode, it will enter an "extended quiescent state" which eliminates the need for notification to the rest of the system. The extended quiescent state will probably increase the amount of no-tick time on a processor considerably, but at a small cost: the architecture-level code must add hooks to notify the no-tick code on every kernel entry and exit.
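
The conditions described so far can be collected into a toy decision function. This is an illustrative model of the article's description, not code from the patch set: the structure, its field names, and tick_can_stop() are all hypothetical, and the treatment of the extended quiescent state (user-space execution satisfies RCU's interest in a quiescent state) is one reading of the text above.

```c
#include <stdbool.h>

/* Hypothetical snapshot of one CPU; none of these names come from
 * the patch set. */
struct cpu_snapshot {
    bool task_requested_nohz;       /* task wrote nonzero to /proc/self/nohz */
    unsigned int nr_runnable;       /* runnable tasks, including the current one */
    bool rcu_callbacks_pending;     /* RCU callbacks queued on this CPU */
    bool rcu_wants_quiescent_state; /* RCU is waiting on this CPU */
    bool in_user_space;             /* current task is executing user code */
};

/* Return true if, in this model, the timer tick could be stopped. */
bool tick_can_stop(const struct cpu_snapshot *cpu)
{
    if (!cpu->task_requested_nohz)
        return false;   /* the process never opted in */
    if (cpu->nr_runnable > 1)
        return false;   /* contention: the scheduler needs its tick */
    if (cpu->rcu_callbacks_pending)
        return false;   /* callbacks must be processed on this CPU */
    if (cpu->rcu_wants_quiescent_state && !cpu->in_user_space)
        return false;   /* RCU still needs an explicit report */
    return true;        /* user space counts as an extended quiescent state */
}
```

The last check is where the extended quiescent state pays off in this model: without it, any RCU interest in the CPU would keep the tick running, even while the process never leaves user space.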

Reviews of the code, so far, have focused on various details which need to be managed differently, but there has not been a lot of criticism of the concept. It's early-stage code, so it doesn't take care of everything that normally happens during the timer tick, a fact which reviewers have pointed out. The biggest gripe, perhaps, has to do with the conditions mentioned at the beginning of the article: the process must be bound to a single CPU, and there can only be one no-tick process running on that CPU. Peter Zijlstra said:

Well yes, this interface of explicitly marking a task and cpu as task_no_hz is kinda restrictive and useless. When I run 4 cpu-bound tasks on a quad-core I shouldn't have to do anything to benefit from this.

Frederic has indicated that the code can be changed to lift those restrictions, but at the cost of some added complexity. Once the restrictions are gone, it may make sense to just enable the no-tick mode whenever the workload is right for it, regardless of a request (or the lack thereof) from any specific process. That would make the no-tick mode more generally useful; it would also reduce the role of the timer tick just a little more. The kernel would still be far from a fully tickless system, but every step in that direction helps.



Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds