User space atomic ops on ARMv5 and earlier
Posted Jan 12, 2009 19:58 UTC (Mon) by npitre (subscriber, #5680)
In reply to: ARM SoC launched with Linux support (Linux Devices) by robert_s
Parent article: ARM SoC launched with Linux support (Linux Devices)
> This sums up ARMv5 well in this respect:
>
> http://0pointer.de/blog/projects/atomic-rt.html
This page is not completely accurate.
I'm the author of the __kernel_cmpxchg facility. I simply disagree with
the claim that true and efficient atomic operations are not possible on
ARMv5.
The trick is very simple: you do your cmpxchg operation in user mode
using standard instructions, without any lock, syscall, exception trap,
etc. However, this must be a controlled set of instructions at a fixed
location. The kernel provides those instructions for this purpose and
makes them read-only to user space.
So, on pre-ARMv6 systems, what you have at 0xffff0fc0 is this:
1: ldr r3, [r2] @ load current val
subs r3, r3, r0 @ compare with oldval
2: streq r1, [r2] @ store newval if eq
rsbs r0, r3, #0 @ set return val and C flag
bx lr @ or "mov pc, lr" if no thumb support
This is about the fastest you can get, even compared to ARMv6 with its
ll/sc (ldrex/strex) instructions.
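For readers less fluent in ARM assembly, here is a sketch in C of what that sequence computes, with r0 = oldval, r1 = newval and r2 = ptr. The function name kuser_cmpxchg_model() is hypothetical, and plain C like this is not atomic by itself: the atomicity comes from the kernel-side fixup described next. The ARM-only typedef follows the declaration given in the kernel's user-helper documentation as I recall it, so treat its exact form as an assumption to check against your kernel tree.

```c
#include <assert.h>

#if defined(__arm__) && defined(__linux__)
/* On ARM Linux, C code calls the real helper through a pointer to its
 * fixed address (declaration recalled from the kernel docs; verify). */
typedef int (__kernel_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *)0xffff0fc0)
#endif

/* Hypothetical plain-C model of the pre-ARMv6 sequence. NOT atomic:
 * it only shows the data flow and the return convention. */
static int kuser_cmpxchg_model(int oldval, int newval, volatile int *ptr)
{
    int cur = *ptr;                /* 1: ldr r3, [r2]                    */
    int equal = (cur == oldval);   /* subs r3, r3, r0 sets the Z flag    */
    if (equal)
        *ptr = newval;             /* 2: streq r1, [r2]                  */
    /* rsbs r0, r3, #0: zero (and C set) on success, non-zero on failure.
     * The real helper returns oldval - *ptr; callers only test for zero,
     * so this model simply returns 1 on failure. */
    return equal ? 0 : 1;
}
```

Because the return value is only ever compared against zero (or, in assembly, the C flag is tested), a caller simply loops until the helper reports success.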
Then, on any kernel entry that could lead to another thread being
scheduled, it suffices to perform a simple test on the saved user space
pc value. If it is above 0xc0000000, execution was possibly interrupted
while in that code, and that can only be due to an interrupt or to a
page fault when dereferencing the provided pointer. So this simple test
is added to those exception handlers:
cmp r2, #TASK_SIZE @ saved user space pc value
blhs kuser_cmpxchg_fixup
The out-of-line kuser_cmpxchg_fixup code determines whether pc actually
falls between the 1: and 2: labels above, in which case atomicity cannot
be guaranteed. If so, the saved user space pc value is simply rewound to
1: so that the operation restarts from scratch the next time this thread
is scheduled. Needless to say, this is extremely unlikely to happen, so
the overhead is next to zero, but when it does happen full atomicity is
preserved.
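The exception-entry check and the fixup can be modelled in a few lines of C. This is a hypothetical sketch, not kernel code: resume_pc() and the MODEL_* constants are illustrative names, MODEL_TASK_SIZE is an assumed value (the real TASK_SIZE depends on the kernel configuration), and the label addresses assume the layout shown above, with 1: at 0xffff0fc0 and 2: two instructions later at 0xffff0fc8. The range includes 2: because a saved pc equal to 2: means the conditional store has not executed yet.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CMPXCHG_START 0xffff0fc0u /* label 1: (ldr)                  */
#define MODEL_CMPXCHG_STORE 0xffff0fc8u /* label 2: (streq), 2 insns later */
#define MODEL_TASK_SIZE     0xbf000000u /* assumed; config dependent       */

/* Hypothetical model: given the user pc saved on exception entry,
 * return the pc at which the thread should resume. */
static uint32_t resume_pc(uint32_t saved_pc)
{
    /* cmp r2, #TASK_SIZE ; blhs kuser_cmpxchg_fixup */
    if (saved_pc < MODEL_TASK_SIZE)
        return saved_pc;        /* ordinary user code: fast path */

    /* Slow path (the fixup): interrupted inside the critical window,
     * after or at the load but before the conditional store? */
    if (saved_pc >= MODEL_CMPXCHG_START && saved_pc <= MODEL_CMPXCHG_STORE)
        return MODEL_CMPXCHG_START; /* rewind: redo load and compare */

    return saved_pc;            /* store already done (or not needed) */
}
```

Rewinding when pc sits exactly at 1: is a harmless no-op; the cases that matter are a pc at the subs or at the streq, where a comparison made before the thread was preempted could otherwise be committed against a value another thread has since changed.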
This works on non-SMP systems only, of course. But none of the existing
ARMv5 implementations are SMP anyway. On SMP-capable ARM systems, the
kernel replaces the above code with an SMP-safe version that uses the
ARMv6 ldrex/strex instructions, so this interface remains portable.
All this to say that truly atomic operations are possible, and even fast,
on ARMv5 and earlier. This works even for RT tasks and is signal safe.
And while this trick is not currently implemented on uClinux, there is
no inherent limitation preventing it from being used there as well.
This interface may look awkward for user space programs, but the purpose
of standard libraries is precisely to encapsulate and hide such details.
Here, for example, is an optimized atomic_add() implementation based on
the above:
#define atomic_add(ptr, val) \
({ register unsigned int *__ptr asm("r2") = (ptr); \
register unsigned int __result asm("r1"); \
asm volatile ( \
"1: @ atomic_add\n\t" \
"ldr r0, [r2]\n\t"      /* r0 = oldval = *ptr */ \
"mov r3, #0xffff0fff\n\t" \
"add lr, pc, #4\n\t"    /* lr = address of the bcc (pc reads 8 ahead) */ \
"add r1, r0, %2\n\t"    /* r1 = newval = oldval + val */ \
"add pc, r3, #(0xffff0fc0 - 0xffff0fff)\n\t" /* call helper at 0xffff0fc0 */ \
"bcc 1b"                /* C clear: *ptr changed under us, retry */ \
: "=&r" (__result) \
: "r" (__ptr), "rIL" (val) \
: "r0","r3","ip","lr","cc","memory" ); \
__result; })
And so on.
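The macro's control flow is the classic compare-and-swap retry loop: load, compute, try to publish, and retry if another thread got there first. The portable sketch below shows the same loop in C11, as an illustration only: cmpxchg_standin() substitutes C11 atomic_compare_exchange_strong() for the ARM kernel helper while keeping the helper's return convention (zero on success), and atomic_add_sketch() is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the kernel helper (assumption): C11 atomics instead of
 * the code at 0xffff0fc0, same convention: returns 0 if *ptr held
 * oldval and now holds newval, non-zero otherwise. */
static int cmpxchg_standin(unsigned int oldval, unsigned int newval,
                           _Atomic unsigned int *ptr)
{
    unsigned int expected = oldval;
    return !atomic_compare_exchange_strong(ptr, &expected, newval);
}

/* Same retry structure as the atomic_add() macro; returns the new value. */
static unsigned int atomic_add_sketch(_Atomic unsigned int *ptr,
                                      unsigned int val)
{
    unsigned int oldval, newval;
    do {
        oldval = atomic_load_explicit(ptr, memory_order_relaxed); /* ldr r0, [r2] */
        newval = oldval + val;                  /* add r1, r0, %2 (wraps mod 2^32) */
    } while (cmpxchg_standin(oldval, newval, ptr) != 0);         /* bcc 1b */
    return newval;
}
```

Unsigned arithmetic is used, as in the macro, so that overflow wraps instead of being undefined behaviour.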