KS2011: Preemption disable and verifiable APIs

By Jonathan Corbet
October 24, 2011

2011 Kernel Summit coverage

Thomas Gleixner is grumpy about some of the interfaces that have been added to the kernel recently, and he used a session at the 2011 Kernel Summit to talk about it. When we are not careful, he said, we add code to the kernel which destabilizes it and makes problems harder to find.

The specific problem was the this_cpu_*() series of functions which are intended to make access to per-CPU data faster. These functions were also discussed at the just-concluded Realtime Summit; see that report for a more detailed summary of the problems. In short: these functions work without disabling preemption, making the concept of "this CPU" a slippery one at best. Some of them, like this_cpu_write(), simply cannot be used in a correct manner.

Thomas wants, at a minimum, to add some debugging infrastructure that would make it clear what data is being operated on and when. Beyond that, though, he said that a lot of calls to preempt_disable() are popping up throughout the kernel. Disabling preemption can make access to certain types of data safe, but it is, once again, not really clear what is being protected. According to Thomas, preempt_disable() is the new big kernel lock. Peter added that it is a sort of big per-CPU data lock, but there is no desire or need to take a big lock. It is far better to lock the specific data at hand so that multiple users of different data structures can be interleaved in the scheduler if need be.

Linus responded that he has no problem with a per-CPU data lock that disappears in mainline kernels and allows verification of locking with lockdep. But, he said, calling it a new big kernel lock is a bit unfair; it isn't quite that bad. And, in any case, the big kernel lock worked for the kernel for many years. Andi Kleen worried that using lockdep with per-CPU data could lead to spurious warnings in places where the "lock" ordering does not really matter.

Ted Ts'o asked about the impact of the bugs that had been found. Thomas responded that most of them were in statistical functions and, thus, did not matter much. But there was one in the filesystem layer in 2.6.38 or 2.6.39 that would store a pointer on the wrong CPU and hose a disk, and another one that made the SLUB allocator explode. If, he said, even the people who created this interface (the this_cpu_*() functions were initially added for use with SLUB) cannot get its use right, something is really wrong. That is scary since use of these functions is exploding throughout the kernel.

Some changes are in the works. They may involve renaming the functions and will almost certainly involve the removal of the most dangerous ones and the addition of some debugging infrastructure.

Next: Scheduler testing

Index entries for this article
Kernel	Development model/Kernel quality

KS2011: Preemption disable and verifiable APIs

Posted Oct 25, 2011 15:56 UTC (Tue) by valyala (guest, #41196) [Link] (7 responses)

Why not splitting per-CPU data further into per-thread data? At least this_thread_*() functions will be safe in thread contexts without any locking. Access to per-thread data is guaranteed to be fast unless the thread is recently migrated to another CPU.

KS2011: Preemption disable and verifiable APIs

Posted Oct 26, 2011 8:39 UTC (Wed) by etienne (guest, #25256) [Link] (1 responses)

Or maybe get a pointer to a per-cpu structure at the beginning of your treatment and continue to use this pointer anyway, even if the processor has been changed under your foot.
Most of the time the processor will not be changed, so you get the speed, but anyway the code is always right, just in some rare cases slower.

KS2011: Preemption disable and verifiable APIs

Posted Oct 26, 2011 16:56 UTC (Wed) by valyala (guest, #41196) [Link]

>> Or maybe get a pointer to a per-cpu structure at the beginning of your treatment and continue to use this pointer anyway, even if the processor has been changed under your foot.
This won't work if you are going to modify the per-cpu structure after the CPU has been changed. So you'll end up with either locking or atomic modifications for the structure. But since most of the time the structure is modified via native CPU, both locking and atomic modifications should be fast, because there is no cache line bouncing.

KS2011: Preemption disable and verifiable APIs

Posted Oct 26, 2011 22:10 UTC (Wed) by BenHutchings (subscriber, #37955) [Link]

I think this would make thread creation a lot more expensive.

KS2011: Preemption disable and verifiable APIs

Posted Oct 27, 2011 0:10 UTC (Thu) by rusty (guest, #26) [Link] (1 responses)

Generally num-cpus is small, num-threads is huge.

KS2011: Preemption disable and verifiable APIs

Posted Oct 27, 2011 9:33 UTC (Thu) by valyala (guest, #41196) [Link]

I suppose the number of threads, which usually access per-CPU data, is comparable to the number of CPUs. So, per-thread data can be created lazily in order to avoid performance and memory hit for each created thread.

KS2011: Preemption disable and verifiable APIs

Posted Nov 3, 2011 1:00 UTC (Thu) by slashdot (guest, #22014) [Link]

Yeah, I wonder whether this could indeed be the right solution for preempt-rt kernels.

Instead of disabling migration or adding potentially contended locks, just making per-cpu data per-thread should work naturally.

KS2011: Preemption disable and verifiable APIs

Posted Nov 14, 2011 2:32 UTC (Mon) by mfedyk (guest, #55303) [Link]

You are assuming that you have process context, which in the kernel is there less often than many realize.

KS2011: Preemption disable and verifiable APIs

Posted Nov 8, 2011 21:25 UTC (Tue) by clameter (subscriber, #17005) [Link] (1 responses)

As I have explained before this is factually incorrect in many ways. I have talked before with the presenters and they were mostly aware of the following (see the discussion on lkml in the weeks before):

1. The this_cpu apis were primarily created for statistics and not for SLUB.

2. this_cpu operations are independent of preemption and are intended to be used when it does not matter which processors instance is accessed. The typical use case is the increment of a per cpu counter. It does not matter which one is incremented since a sum of all per cpu counters is used to determine the number of events. It just matters that one of the per cpu counters is incremented.

3. People can of course abuse these operations. this_cpu_read and this_cpu_write (which are operations that are rarely used, these are not typical for code using this_cpu operations) can be easily confusing if one expects the processor to remain the same. The this_cpu operations are intended to be used for context in which it does not matter which processor the operation occurs on. Expectations of being on the same cpu later are just contrary to the design of these operations.

4. this_cpu_write() can be used in a correct manner for example when it is sufficient to set a value on any per cpu instance.

The problem with this_cpu ops seems to exist mainly in some peoples heads and the clueless saw two instances where people had similarly wrong expectation of this_cpu ops and generalized from there not bothering to investigate further and not willing to listen to those explaining things to them.

I am all for verification of APIS but the this_cpu operations are designed to do their job regardless of preemption being enabled or disabled. So they are out of scope for preemption verification (__this_cpu_xx operations may be different but those were not mentioned. Those may require some form of serialization which may be provided by disabling preemption).

KS2011: Preemption disable and verifiable APIs

Posted Nov 9, 2011 12:35 UTC (Wed) by PaXTeam (guest, #24616) [Link]

maybe this_cpu_* should be renamed to any_cpu_*? that would surely instill not sense of guaranteed locality unlike the current 'this' does.