Why doesn't Linux support a preempt-off primitive in user space?
It could be made cheap: each thread has a counter that is readable and writable from userspace. Just as in the kernel, "preempt off" is counter++ and "preempt on" is counter--. In the rare case where the kernel wants to preempt the thread, it checks whether counter > 0. If so, it does not preempt the userspace task but sets up a timer to force a preemption later. It also sets a flag on the task, which userspace has to check when --counter == 0 and, if the flag is set, call a voluntary schedule(), just as is done in the kernel right now.
That would effectively provide atomic operations in userspace for uniprocessor systems on ARMv5 and other architectures with no atomic instructions.
Maybe it would even be cheaper on other UP systems with native atomic instructions.
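The protocol described above can be sketched as a toy Python model. This is only an illustration of the bookkeeping, not kernel code: the names `Task`, `preempt_off`, `preempt_on`, `kernel_wants_preempt`, `timer_tick` and the `GRACE_TICKS` value are all hypothetical, and "time" is just an integer tick count.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_TICKS = 3  # hypothetical grace period before preemption is forced


@dataclass
class Task:
    preempt_count: int = 0          # written by "userspace"
    need_resched: bool = False      # set by the "kernel", checked by "userspace"
    deadline: Optional[int] = None  # tick at which the kernel stops waiting
    yields: int = 0                 # number of voluntary schedule() calls
    forced: bool = False            # True once the timer forced a preemption


def preempt_off(task: Task) -> None:
    """Userspace side: counter++."""
    task.preempt_count += 1


def preempt_on(task: Task) -> None:
    """Userspace side: counter--, then yield if the kernel asked for it."""
    task.preempt_count -= 1
    if task.preempt_count == 0 and task.need_resched:
        task.need_resched = False
        task.deadline = None        # the kernel would cancel its timer here
        task.yields += 1            # stands in for a voluntary schedule()


def kernel_wants_preempt(task: Task, now: int) -> bool:
    """Kernel side: return True if the task is switched out right away."""
    if task.preempt_count == 0:
        return True
    if task.deadline is None:       # defer, but only for a bounded time
        task.need_resched = True
        task.deadline = now + GRACE_TICKS
    return False


def timer_tick(task: Task, now: int) -> None:
    """Kernel side: force the preemption once the grace period has expired."""
    if task.deadline is not None and now >= task.deadline:
        task.forced = True
        task.deadline = None
```

In this model a task that calls `preempt_on` before the deadline yields voluntarily, and one that does not is preempted by `timer_tick` regardless of its counter.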
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 9, 2009 10:24 UTC (Fri) by ncm (subscriber, #165)
It seems simpler, safer, and faster for the scheduler to look at the instruction at the return address on the stack of the process being pre-empted, and if it's in the middle of what ought to be an atomic sequence (e.g. a naïve implementation of compare-and-swap) just complete it and update the stacked status word and return address.
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 9, 2009 12:37 UTC (Fri) by simlo (subscriber, #10866)
"It seems simpler, safer, and faster"
I can't see that any of the three is right:
simpler: How do you determine if this is the case?
safer: What if you were wrong and missed one? What if you were wrong and some attacker could abuse this to get something run in kernel mode?
faster: How can it be faster to make a complicated search into the binary code than to check the content of one address?
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 9, 2009 20:22 UTC (Fri) by ncm (subscriber, #165)
Time spent in the kernel scheduler doesn't count, because it happens so rarely. What counts is time spent in user code that isn't pre-empted; incrementing and decrementing a memory word (never mind fooling with a timer!) are expensive. By contrast, code in the kernel to compare a couple of memory words against a pattern is simple, and updating the process state if they match is also simple. So, you just need an instruction sequence that does the right thing very cheaply when it doesn't get pre-empted, and is easy to recognize and patch when it does.
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 9, 2009 22:21 UTC (Fri) by jgg (guest, #55211)
It is actually a pretty good idea if you combine it with kernel-controlled code in the VDSO. Clearly, operating on arbitrary user code is not a good idea.
Basically, the kernel exports useful atomic_* functions in its VDSO. If a preemption ever occurs, then during the context switch back to the suspended task the kernel checks whether the PC is within the critical portion of the VDSO (two compares). If so, it inspects the situation in more detail and adjusts the PC to return to the start of the atomic op function, which then tries again.
Similar precautions can be taken for signals and other cases. The basic structure of the atomic functions in the VDSO would be of the load locked/store conditional type with the kernel providing a loop back to start if the 'reservation' is lost.
User space simply links to these calls like normal VDSO linking. The VDSO is readonly to user space and has a unique physical and virtual address.
If generalized properly it could be efficiency-neutral for userspace by being optimized for a specific CPU model. Certainly on ARM it has got to be a win if you need to support all CPU types. On PPC and others with ll/sc it might be neutral, if the VDSO functions are thick enough to replace what would have been a function call anyhow. Probably not useful on x86.
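The restart-on-preemption idea in this comment can be illustrated with a small Python model. It is a sketch under stated assumptions: the "program counter" values, the `CasTask` class and `kernel_resume_fixup` are hypothetical names, and a compare-and-swap is broken into three explicit steps so that a preemption can land between them.

```python
# Hypothetical "program counter" values for a three-step compare-and-swap.
LOAD, COMPARE, STORE, DONE = 0, 1, 2, 3
CRIT_START, CRIT_END = LOAD, DONE  # restartable range is [CRIT_START, CRIT_END)


class CasTask:
    """One in-flight cmpxchg on mem[key], executed one step at a time."""

    def __init__(self, mem, key, old, new):
        self.mem, self.key, self.old, self.new = mem, key, old, new
        self.pc = LOAD
        self.loaded = None
        self.result = None

    def step(self):
        if self.pc == LOAD:
            self.loaded = self.mem[self.key]
            self.pc = COMPARE
        elif self.pc == COMPARE:
            if self.loaded == self.old:
                self.pc = STORE
            else:
                self.result = False
                self.pc = DONE
        elif self.pc == STORE:
            self.mem[self.key] = self.new
            self.result = True
            self.pc = DONE


def kernel_resume_fixup(task):
    """On switching back to a preempted task: two compares, then rewind."""
    if CRIT_START <= task.pc < CRIT_END:
        task.pc = CRIT_START


def run(task):
    """Run the task until its compare-and-swap has finished."""
    while task.pc != DONE:
        task.step()
    return task.result
```

For example, if a task has loaded and compared a value of 0, is preempted, and another task changes the word to 5, then calling `kernel_resume_fixup` before resuming makes the sequence reload the word and fail the comparison instead of overwriting the 5.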
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 9, 2009 13:52 UTC (Fri) by tialaramex (subscriber, #21167)
Because this is an unspeakable horror?
Your approach grants one of the following:
It's OK to give unprivileged userspace processes an unlimited quantum. Who needs an operating system anyway?
When the timer fires the atomic operation will be interrupted anyway, thus adding a complex feature which doesn't solve the problem.
The Linux kernel team presumably don't think either of these options is sane.
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 9, 2009 15:17 UTC (Fri) by simlo (subscriber, #10866)
As you apparently understood from the idea, the timer should prevent the user from getting unlimited CPU. What should happen if he overruns it anyway? Send him a SIGILL and let him deal with it.
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 9, 2009 15:45 UTC (Fri) by tialaramex (subscriber, #21167)
So, the user must absolutely avoid racing your timer, or he'll be punished with SIGILL. Thus the "feature" of disabling pre-emption becomes a thin wrapper around simple atomic ops only.
But it incurs a full system call for every atomic op to let the kernel know that pre-emption can be re-enabled (if the user tries to avoid the system call, he's racing the timer and will get SIGILL).
So now your solution has /worse/ performance than the existing solutions, and it's needlessly more complicated.
You might want to check the link nearby (about 0xffff0fc0) in this thread for what Linux actually provides to ARM users, and consider if your solution doesn't look a bit silly by comparison.