Yes, that's a good page - but do read it all, the important bit is down in the comments where Andrew Haley mentions the "magic" code in high memory that does it all automagically (and fairly efficiently).
In my experience, with some thought you can often get quite reasonable code by trying to implement the algorithm directly in terms of atomic swap (i.e. the one atomic operation that ARMv5 has), rather than by writing the code using e.g. atomic add and then trying to implement atomic add on top of swap. I think I had futex-based mutex and condition code that did this.
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 10, 2009 9:59 UTC (Sat) by Ze (subscriber, #54182)
[Link]
>>In my experience, with some thought you can often get quite reasonable code by trying to implement the algorithm directly in terms of atomic swap (i.e. the one atomic operation that ARMv5 has), rather than by writing the code using e.g. atomic add and then trying to implement atomic add on top of swap. I think I had futex-based mutex and condition code that did this.
I'm going by memory for this but I seem to remember you are limited to a fixed number of atomic locks at compile time. Whether you use Peterson's algorithm or Lamport's Bakery algorithm.
Thinking about it now I don't see why you can't do a binary semaphore using atomic swap , something like the following pseudo code. I'd be happy if somebody told me I was wrong though since I could be a bit rusty on my atomic primitives.
binary_semaphore_lock()
{
thread_local = 1;
do
{
atomic_swap(thread_local,lock);
}
while(thread_local==1);
//If we get to here then local = 0 , and thus lock must've been equal to 0 before we swapped 1 into it.
}
binary_semaphore_unlock()
{
lock = 0;
}
If you used Peterson's or Lamport's it would depend upon the particular workload and environment but I wouldn't be surprised if the patching method in the kernel worked quicker in most cases. It'd depend upon number of locks and the ratio of lock/unlocks to kernel interruptions/pre-emptions though.
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 13, 2009 9:41 UTC (Tue) by endecotp (guest, #36428)
[Link]
> Peterson's algorithm or Lamport's Bakery algorithm.
Those are the algorithms that you have to resort to when you don't even have atomic swap, as I recall. With ARMv5 you can use code like your example. I have had success with something like this:
bool locked;
void lock() {
while (atomic_swap(locked,true)) {
sched_yield();
}
}
void unlock() {
locked = false;
}
This works well IFF:
- The probability of contention is low.
- You have tens, not thousands, of threads.
- You care only about average delays, not worst case.
- You don't care about RT threads, priority inversion etc.
The really great thing about this code is that the fast (uncontended) path is just a couple of instructions as long as the compiler inlines it, making it maybe an order of magnitude faster than something like pthread_mutex_lock().
I was trying to recall how to extend this to use a futex() call instead of sched_yield(). It gets tricky....
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 13, 2009 21:08 UTC (Tue) by npitre (subscriber, #5680)
[Link]
> In my experience, with some thought you can often get quite reasonable
> code by trying to implement the algorithm directly in terms of atomic
> swap (i.e. the one atomic operation that ARMv5 has), rather than by
> writing the code using e.g. atomic add and then trying to implement
> atomic add on top of swap. I think I had futex-based mutex and
> condition code that did this.
That certainly works for things like semaphores where you're expected
to go to sleep on contention. However, for simple atomic primitives such
as atomic adds and the like, relying on futexes for arbitration is rather
heavy weight, whereas the technique currently implemented in ARM Linux
for those is extremely light and has no contention issue at all.
Nevertheless, a simple swap instruction, along with futexes, allows
for pretty efficient semaphore support. For example, see my implementation
here (if I'm not mistaken, you are the one I replied to in that message).
That implementation, unfortunately, is not POSIX compliant as it is not
safe wrt concurrent usage from signal handlers.
ARM SoC launched with Linux support (Linux Devices)
Posted Jan 13, 2009 22:09 UTC (Tue) by endecotp (guest, #36428)
[Link]
Yes that's me! It's a bit embarrassing that that exchange was not much more than a year ago yet I seem to have forgotten most of the details...