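Since Linux 3.1 (or Linux 2.6.30 on ARMv6K or later) and GCC 4.7, ARM supports the same 8-byte atomic operations as x86 does, so the hardware doesn't really get in your way. On cores older than ARMv6K, libgcc implements these operations by calling the kernel's `__kernel_cmpxchg64` helper. That helper is not a syscall: it is a kernel user helper that the kernel maps at a fixed address in the vector page. ARMv6K and later CPUs are much faster here because they have native 64-bit exclusive load/store instructions (`LDREXD`/`STREXD`) and don't need to call into the kernel at all.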
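Here is a minimal sketch of a 64-bit atomic add built on a compare-and-swap loop. It uses GCC's legacy `__sync_val_compare_and_swap` builtin on a `uint64_t`. The function name `atomic_add_u64` is just for illustration. My understanding is that when you build for ARMv6K or later, GCC typically inlines this as an `LDREXD`/`STREXD` loop, and on older ARM Linux targets it becomes a call to libgcc's `__sync_val_compare_and_swap_8`, which relies on `__kernel_cmpxchg64`. Treat that as an assumption and check the generated assembly for your own toolchain.

```c
#include <stdint.h>

/* Hypothetical helper: atomically adds `delta` to *p and returns the new value.
 * Uses only a 64-bit CAS, so it works anywhere GCC provides 8-byte __sync builtins
 * (inline LDREXD/STREXD on ARMv6K+, or, assumed here, a libgcc call that uses
 * the kernel's __kernel_cmpxchg64 helper on older ARM Linux targets). */
static uint64_t atomic_add_u64(uint64_t *p, uint64_t delta)
{
    /* Start with a guess for the current value. A failed CAS returns the real
     * value atomically, so we never need a separate (possibly torn) 64-bit read. */
    uint64_t expected = 0;

    for (;;) {
        /* Full-barrier CAS: store expected + delta only if *p still equals expected. */
        uint64_t seen = __sync_val_compare_and_swap(p, expected, expected + delta);
        if (seen == expected)
            return expected + delta;  /* our update won */
        expected = seen;              /* lost the race or guessed wrong: retry */
    }
}
```

For a plain add, `__sync_add_and_fetch` does the same job in one builtin. The CAS loop is shown because compare-and-swap is the primitive that `__kernel_cmpxchg64` actually provides.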