Hierarchical RCU - unification suggestion
Posted Nov 8, 2008 1:03 UTC (Sat) by PaulMcKenney
In reply to: Hierarchical RCU - unification suggestion
Parent article: Hierarchical RCU
Looks like I should have had yet another Quick Quiz on unifying dyntick-idle and offline detection.
First, it might well turn out to be the right thing to do. However, the challenges include the following:
- As you say, although CPUs are not allowed to go into dyntick-idle state while they have RCU callbacks pending, CPUs -are- allowed to go offline in this state. This means that the code to move their callbacks is still required. We cannot simply let them silently go offline.
- Present-day systems often run with NR_CPUS much larger than the actual number of CPUs, so the unified approach could waste time scanning CPUs that never will exist. (There are workarounds for this.)
- CPUs get onlined one at a time, so RCU needs to handle onlining.
- A busy system with offlined CPUs would always take three ticks to figure out that the offlined CPUs were never going to respond.
- Switching into and out of dyntick-idle state can happen extremely frequently, so we cannot treat a dyntick-idle CPU as if it were offline: offlining a CPU has far too much overhead for that.
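To make the NR_CPUS point concrete, here is a rough userspace sketch of the scanning cost and of one conceivable workaround, namely bounding the scan by the number of CPUs that could ever exist. This is not necessarily the workaround I have in mind above, and every name in it (SCAN_NR_CPUS, scan_all_slots(), scan_possible(), nr_possible) is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SCAN_NR_CPUS 64u  /* hypothetical compile-time maximum, one bit per CPU */

/* Naive scan: visits every slot up to SCAN_NR_CPUS, present or not. */
static unsigned int scan_all_slots(uint64_t online, unsigned int *visited)
{
	unsigned int cpu, found = 0;

	*visited = 0;
	for (cpu = 0; cpu < SCAN_NR_CPUS; cpu++) {
		(*visited)++;
		if (online & (UINT64_C(1) << cpu))
			found++;
	}
	return found;
}

/*
 * Bounded scan: stops at nr_possible, an assumed count of the CPUs that
 * could ever exist on this particular machine.
 */
static unsigned int scan_possible(uint64_t online, unsigned int nr_possible,
				  unsigned int *visited)
{
	unsigned int cpu, found = 0;

	assert(nr_possible <= SCAN_NR_CPUS);
	*visited = 0;
	for (cpu = 0; cpu < nr_possible; cpu++) {
		(*visited)++;
		if (online & (UINT64_C(1) << cpu))
			found++;
	}
	return found;
}
```

On a four-CPU machine built with a 64-CPU limit, both functions find the same four online CPUs, but the naive scan touches all 64 slots while the bounded scan touches only four.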
Under normal circumstances, offline CPUs are not included in the bitmasks indicating which CPUs need to be waited on, so that normally RCU never waits on offline CPUs.
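As a minimal sketch of that bitmask idea, assuming a single flat 64-bit mask instead of the hierarchy in the patch: the grace period snapshots the set of CPUs believed to be online, so an offline CPU never appears in the set being waited on. The struct and the sketch_*() functions are made up for illustration and are not the kernel's data structures.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64u  /* hypothetical limit: one bit per CPU */

/* Hypothetical, heavily simplified state. Not the kernel's rcu_state. */
struct sketch_rcu {
	uint64_t online_mask;  /* CPUs RCU currently believes are online */
	uint64_t wait_mask;    /* CPUs the current grace period still waits on */
};

static void sketch_cpu_online(struct sketch_rcu *rs, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	rs->online_mask |= UINT64_C(1) << cpu;
}

static void sketch_cpu_offline(struct sketch_rcu *rs, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	rs->online_mask &= ~(UINT64_C(1) << cpu);
	rs->wait_mask &= ~(UINT64_C(1) << cpu);  /* stop waiting on it, too */
}

/* A new grace period waits only on CPUs believed to be online. */
static void sketch_start_gp(struct sketch_rcu *rs)
{
	rs->wait_mask = rs->online_mask;
}

/* A CPU reports a quiescent state, so its bit is cleared. */
static void sketch_report_qs(struct sketch_rcu *rs, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	rs->wait_mask &= ~(UINT64_C(1) << cpu);
}

/* The grace period ends once nobody is left to wait on. */
static int sketch_gp_done(const struct sketch_rcu *rs)
{
	return rs->wait_mask == 0;
}
```

With CPUs 0, 1, and 3 onlined and CPU 1 then offlined, sketch_start_gp() leaves only bits 0 and 3 set, and the grace period completes as soon as those two CPUs report in.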
However, there are race conditions that can occur in the online and offline processes that can result in RCU believing that a given CPU is online when it is in fact offline.
Therefore, if RCU sees that the grace period is extending longer than expected (a matter of jiffies, not seconds), it will check to see if some of the CPUs that it is waiting on are offline.
This situation corrects itself after a few grace periods: RCU will get back in sync with which CPUs really are offline.
So the offline-CPU checks invoked from force_quiescent_state() are only there to handle rare race conditions.
Again, under normal circumstances, RCU never waits on offline CPUs.
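The slow-path check can be sketched roughly as follows. This is a userspace illustration under stated assumptions, not the code in the patch: the three-jiffy threshold is just a placeholder, the really_online argument stands in for whatever ground truth the real code consults, and fqs_sketch_check() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FQS_DELAY_JIFFIES 3UL  /* hypothetical threshold, for illustration only */

/* Hypothetical grace-period state. Not the kernel's data structures. */
struct fqs_sketch {
	uint64_t wait_mask;      /* CPUs RCU believes it must still wait on */
	unsigned long gp_start;  /* jiffies value when the grace period began */
};

/*
 * If the grace period is overdue, drop any CPUs that are being waited on
 * but are in fact offline. Returns the number of stale CPUs removed.
 */
static unsigned int fqs_sketch_check(struct fqs_sketch *st,
				     uint64_t really_online,
				     unsigned long now)
{
	uint64_t stale;
	unsigned int removed = 0;

	assert(st != NULL);

	/* Common case: not overdue, so do no scanning at all. */
	if (now - st->gp_start < FQS_DELAY_JIFFIES)
		return 0;

	/* Rare case: some CPU went offline after RCU decided to wait on it. */
	stale = st->wait_mask & ~really_online;
	st->wait_mask &= really_online;

	/* Count the stale bits that were just cleared. */
	while (stale) {
		stale &= stale - 1;
		removed++;
	}
	return removed;
}
```

The point of the early return is that the offline check costs nothing unless the grace period has already run long, which matches its role as a backstop for rare races rather than the normal way of excluding offline CPUs.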
At this point, when the code is still just a patch, and therefore subject to change, the only way I can see to keep up is to ask questions. Which you are doing. :-)