KS2011: Preemption disable and verifiable APIs
The specific problem was the this_cpu_*() series of functions which are intended to make access to per-CPU data faster. These functions were also discussed at the just-concluded Realtime Summit; see that report for a more detailed summary of the problems. In short: these functions work without disabling preemption, making the concept of "this CPU" a slippery one at best. Some of them, like this_cpu_write(), simply cannot be used in a correct manner.
Thomas wants, at a minimum, to add some debugging infrastructure that would make it clear what data is being operated on and when. Beyond that, though, he said that a lot of calls to preempt_disable() are popping up throughout the kernel. Disabling preemption can make access to certain types of data safe, but it is, once again, not really clear what is being protected. According to Thomas, preempt_disable() is the new big kernel lock. Peter added that it is a sort of big per-CPU data lock, but there is no desire or need to take a big lock. It is far better to lock the specific data at hand so that multiple users of different data structures can be interleaved in the scheduler if need be.
Linus responded that he has no problem with a per-CPU data lock that disappears in mainline kernels and allows verification of locking with lockdep. But, he said, calling it a new big kernel lock is a bit unfair; it isn't quite that bad. And, in any case, the big kernel lock worked for the kernel for many years. Andi Kleen worried that using lockdep with per-CPU data could lead to spurious warnings in places where the "lock" ordering does not really matter.
Ted Ts'o asked about the impact of the bugs that had been found. Thomas responded that most of them were in statistical functions and, thus, did not matter much. But there was one in the filesystem layer in 2.6.38 or 2.6.39 that would store a pointer on the wrong CPU and hose a disk, and another one that made the SLUB allocator explode. If, he said, even the people who created this interface (the this_cpu_*() functions were initially added for use with SLUB) cannot get its use right, something is really wrong. That is scary since use of these functions is exploding throughout the kernel.
Some changes are in the works. They may involve renaming the functions and will almost certainly involve the removal of the most dangerous ones and the addition of some debugging infrastructure.
Next: Scheduler testing
Index entries for this article | |
---|---|
Kernel | Development model/Kernel quality |
Posted Oct 25, 2011 15:56 UTC (Tue)
by valyala (guest, #41196)
[Link] (7 responses)
Posted Oct 26, 2011 8:39 UTC (Wed)
by etienne (guest, #25256)
[Link] (1 responses)
Posted Oct 26, 2011 16:56 UTC (Wed)
by valyala (guest, #41196)
[Link]
Posted Oct 26, 2011 22:10 UTC (Wed)
by BenHutchings (subscriber, #37955)
[Link]
Posted Oct 27, 2011 0:10 UTC (Thu)
by rusty (guest, #26)
[Link] (1 responses)
Posted Oct 27, 2011 9:33 UTC (Thu)
by valyala (guest, #41196)
[Link]
Posted Nov 3, 2011 1:00 UTC (Thu)
by slashdot (guest, #22014)
[Link]
Instead of disabling migration or adding potentially contended locks, just making per-cpu data per-thread should work naturally.
Posted Nov 14, 2011 2:32 UTC (Mon)
by mfedyk (guest, #55303)
[Link]
Posted Nov 8, 2011 21:25 UTC (Tue)
by clameter (subscriber, #17005)
[Link] (1 responses)
1. The this_cpu apis were primarily created for statistics and not for SLUB.
2. this_cpu operations are independent of preemption and are intended to be used when it does not matter which processors instance is accessed. The typical use case is the increment of a per cpu counter. It does not matter which one is incremented since a sum of all per cpu counters is used to determine the number of events. It just matters that one of the per cpu counters is incremented.
3. People can of course abuse these operations. this_cpu_read and this_cpu_write (which are operations that are rarely used, these are not typical for code using this_cpu operations) can be easily confusing if one expects the processor to remain the same. The this_cpu operations are intended to be used for context in which it does not matter which processor the operation occurs on. Expectations of being on the same cpu later are just contrary to the design of these operations.
4. this_cpu_write() can be used in a correct manner for example when it is sufficient to set a value on any per cpu instance.
The problem with this_cpu ops seems to exist mainly in some peoples heads and the clueless saw two instances where people had similarly wrong expectation of this_cpu ops and generalized from there not bothering to investigate further and not willing to listen to those explaining things to them.
I am all for verification of APIS but the this_cpu operations are designed to do their job regardless of preemption being enabled or disabled. So they are out of scope for preemption verification (__this_cpu_xx operations may be different but those were not mentioned. Those may require some form of serialization which may be provided by disabling preemption).
Posted Nov 9, 2011 12:35 UTC (Wed)
by PaXTeam (guest, #24616)
[Link]
KS2011: Preemption disable and verifiable APIs
KS2011: Preemption disable and verifiable APIs
Most of the time the processor will not be changed, so you get the speed, but anyway the code is always right, just in some rare cases slower.
KS2011: Preemption disable and verifiable APIs
This won't work if you are going to modify the per-cpu structure after the CPU has been changed. So you'll end up with either locking or atomic modifications for the structure. But since most of the time the structure is modified via native CPU, both locking and atomic modifications should be fast, because there is no cache line bouncing.
KS2011: Preemption disable and verifiable APIs
KS2011: Preemption disable and verifiable APIs
KS2011: Preemption disable and verifiable APIs
KS2011: Preemption disable and verifiable APIs
KS2011: Preemption disable and verifiable APIs
KS2011: Preemption disable and verifiable APIs
KS2011: Preemption disable and verifiable APIs