User space atomic ops on ARMv5 and earlier

Posted Jan 12, 2009 19:58 UTC (Mon) by npitre (subscriber, #5680)
In reply to: ARM SoC launched with Linux support (Linux Devices) by robert_s
Parent article: ARM SoC launched with Linux support (Linux Devices)

> This sums up ARMv5 well in this respect:
>
> http://0pointer.de/blog/projects/atomic-rt.html

This page is not completely accurate.

I'm the author of the __kernel_cmpxchg facility. I simply disagree with the claim that true and efficient atomic operations are not possible on ARMv5.

The trick is very simple: you do your cmpxchg operation in user mode using standard instructions, without any lock, syscall, exception trap, etc. However, this must be a controlled set of instructions at a fixed location. Those instructions are provided by the kernel for this purpose and made read-only to user space.

So what you have on pre-ARMv6 at 0xffff0fc0 is this:

1:      ldr     r3, [r2]        @ load current val
        subs    r3, r3, r0      @ compare with oldval
2:      streq   r1, [r2]        @ store newval if eq
        rsbs    r0, r3, #0      @ set return val and C flag
        bx      lr              @ or "mov pc, lr" if no thumb support

This is about the fastest you can get, even when comparing this to ARMv6 with its ll-sc instructions.
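
To make the calling convention of those five instructions easier to follow, here is a plain C model of what they compute. This is only a sketch: the names model_kuser_cmpxchg and struct cmpxchg_result are made up for illustration, and the model is not atomic by itself (on the real system, atomicity comes from the kernel restarting the sequence when it gets interrupted).

```c
#include <stdint.h>

/* What the modelled helper leaves behind: r0 and the C (carry) flag. */
struct cmpxchg_result {
    uint32_t r0;   /* 0 if *ptr was updated, non-zero otherwise */
    int carry;     /* 1 if *ptr was updated, 0 otherwise */
};

/*
 * Hypothetical C model of the listing above, one statement per
 * instruction. Arguments mirror the registers: r0 = oldval,
 * r1 = newval, r2 = ptr.
 */
static struct cmpxchg_result model_kuser_cmpxchg(uint32_t oldval,
                                                 uint32_t newval,
                                                 uint32_t *ptr)
{
    struct cmpxchg_result res;
    uint32_t r3 = *ptr;        /* ldr   r3, [r2]   : load current val */
    r3 = r3 - oldval;          /* subs  r3, r3, r0 : zero means "equal" */
    if (r3 == 0)               /* streq r1, [r2]   : store only if equal */
        *ptr = newval;
    res.r0 = 0u - r3;          /* rsbs  r0, r3, #0 : 0 on success */
    res.carry = (r3 == 0);     /* 0 - r3 borrows unless r3 == 0 */
    return res;
}
```

The caller can test either the return value or the carry flag, which is what lets a retry loop end with a single conditional branch.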

Then, upon entry into the kernel on any path that has the potential to schedule another thread, it suffices to perform a simple test on the saved user space pc value. If it is above 0xc0000000, then execution was possibly interrupted while executing that code, which can only be due to an interrupt or to a page fault when attempting to dereference the provided pointer. So in those exception handlers, this simple test is added:

        cmp     r2, #TASK_SIZE  @ saved user space pc value
        blhs    kuser_cmpxchg_fixup

The out-of-line kuser_cmpxchg_fixup code determines whether pc actually corresponds to the code located between the 1: and 2: labels above, in which case atomicity can no longer be guaranteed. If so, the saved user space pc value is simply rewound to 1: so that the operation restarts entirely the next time this thread is scheduled. This has an extremely low probability of happening and therefore next to zero overhead, but when it does happen, full "atomicity" is preserved.
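
The decision made by the test and the fixup can be sketched in C as follows. This is an illustration, not the kernel code: fixup_user_pc is a made-up name, the boundary is the 0xc0000000 value quoted above, and the label addresses are assumptions derived from the listing (1: is the first instruction at 0xffff0fc0, 2: is two 4-byte instructions later). Treating both labels as inclusive bounds is also an assumption.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define TASK_SIZE_ASSUMED  0xc0000000u  /* boundary quoted in the text */
#define KUSER_CMPXCHG_1    0xffff0fc0u  /* label 1: (the ldr) */
#define KUSER_CMPXCHG_2    0xffff0fc8u  /* label 2: (the streq) */

/*
 * Hypothetical model: given the saved user space pc, return the pc
 * the thread should resume at.
 */
static uint32_t fixup_user_pc(uint32_t saved_pc)
{
    /* Inline fast path: ordinary user code is below the boundary. */
    if (saved_pc < TASK_SIZE_ASSUMED)
        return saved_pc;

    /*
     * Out-of-line part: if the store has not executed yet, the value
     * loaded earlier may be stale, so restart from the load.
     */
    if (saved_pc >= KUSER_CMPXCHG_1 && saved_pc <= KUSER_CMPXCHG_2)
        return KUSER_CMPXCHG_1;

    /* Past the store: the operation already completed. */
    return saved_pc;
}
```

Only the first comparison is paid on every exception entry; the range check runs in the rare case where the interrupted pc is in the helper area at all.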

This works on non-SMP systems only, of course. But none of the existing ARMv5 implementations out there are SMP anyway. And on SMP-capable ARM systems, the kernel replaces the above code with another version which is SMP safe, using the ARMv6 ldrex/strex instructions, which makes this interface portable.

All this to say that perfect atomic operations are possible and even fast on ARMv5 and earlier with no problem at all. This works even for RT tasks and is signal safe, and although this trick is currently not implemented on uClinux, there is no inherent limitation preventing it from being usable there as well.

This interface may look awkward for user space programs, but the purpose of standard libraries is precisely to encapsulate and hide those things. Here is, for example, an optimized atomic_add() implementation based on the above:

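#define atomic_add(ptr, val) \
     ({ register unsigned int *__ptr asm("r2") = (ptr); /* helper takes ptr in r2 */ \
        register unsigned int __result asm("r1");       /* helper takes newval in r1 */ \
        asm volatile ( \
            "1: @ atomic_add\n\t" \
            "ldr     r0, [r2]\n\t"           /* r0 = oldval */ \
            "mov     r3, #0xffff0fff\n\t"    /* base for the helper address */ \
            "add     lr, pc, #4\n\t"         /* return address = the bcc below */ \
            "add     r1, r0, %2\n\t"         /* r1 = newval = oldval + val */ \
            "add     pc, r3, #(0xffff0fc0 - 0xffff0fff)\n\t" /* jump to the helper */ \
            "bcc     1b"                     /* C clear: no exchange, retry */ \
            : "=&r" (__result) \
            : "r" (__ptr), "rIL" (val) \
            : "r0","r3","ip","lr","cc","memory" ); \
        __result; })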

And so on.


User space atomic ops on ARMv5 and earlier

Posted Jan 13, 2009 16:44 UTC (Tue) by endecotp (guest, #36428)

It would be great if gcc could make use of this when you use its __sync_* atomic builtins, or if glibc used it in its pthread_* implementations. I'm pretty sure that neither does at present, though I'd love to be corrected if they do!

User space atomic ops on ARMv5 and earlier

Posted Jan 13, 2009 19:39 UTC (Tue) by npitre (subscriber, #5680)

NPTL support for ARM in glibc certainly does. I developed the kernel part in collaboration with the person who did the glibc part.

As for gcc, I don't see any specific ARM support for the __sync_* primitives, not even for ARMv6+, which has native instructions that could be used for that purpose. However, this should be easy to implement following the PA model. Incidentally, the file gcc/config/pa/linux-atomic.c contains this note:

/* Linux-specific atomic operations for PA Linux.
   Copyright (C) 2008 Free Software Foundation, Inc.
   Based on code contributed by CodeSourcery for ARM EABI Linux.
   Modifications for PA Linux by Helge Deller <deller@gmx.de>
[...]

So maybe that support does exist somewhere already?

User space atomic ops on ARMv5 and earlier

Posted Jan 13, 2009 22:18 UTC (Tue) by endecotp (guest, #36428)

I've just done a quick test and it looks like gcc emits a function call when you invoke __sync_* for ARM. So presumably it would be possible to write a small library of things with the right names that calls the kernel helper code. Then near-optimal portable code would be possible.
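
As a rough sketch of what such a library could look like, the following wraps the kernel helper in two functions. The names sketch_fetch_and_add_4 and sketch_bool_compare_and_swap_4 are placeholders: a real library would have to export whatever symbols gcc actually calls, which I have not verified here. The non-ARM branch is a non-atomic stand-in that exists only so the sketch builds on other machines.

```c
/* Signature of the kernel helper: returns 0 if *ptr was changed. */
typedef int (kuser_cmpxchg_fn)(int oldval, int newval, volatile int *ptr);

#if defined(__arm__) && defined(__linux__)
/* Helper provided by the kernel at the fixed address discussed above. */
#define kuser_cmpxchg (*(kuser_cmpxchg_fn *)0xffff0fc0)
#else
/* Stand-in for other platforms; NOT atomic, for illustration only. */
static int kuser_cmpxchg(int oldval, int newval, volatile int *ptr)
{
    if (*ptr != oldval)
        return 1;
    *ptr = newval;
    return 0;
}
#endif

/* Hypothetical name. Adds val to *ptr and returns the previous value. */
static int sketch_fetch_and_add_4(volatile int *ptr, int val)
{
    int old;
    do {
        old = *ptr;
        /* Unsigned arithmetic avoids signed overflow; retry on failure. */
    } while (kuser_cmpxchg(old, (int)((unsigned)old + (unsigned)val), ptr) != 0);
    return old;
}

/* Hypothetical name. Returns 1 if *ptr was oldval and is now newval. */
static int sketch_bool_compare_and_swap_4(volatile int *ptr,
                                          int oldval, int newval)
{
    return kuser_cmpxchg(oldval, newval, ptr) == 0;
}
```

The other operations (sub, and, or, and so on) would follow the same load, compute, cmpxchg, retry pattern.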

Maybe someone has already done this....

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds