Transactional Memory

Posted Jun 4, 2009 4:06 UTC (Thu) by wahern (subscriber, #37304)
Parent article: A look at two new languages: Vala and Clojure

Everybody talks about how STM removes locking. This is bollocks. I read all the relevant theoretical papers, and poured over the code of existing implementations. I tried to implement a simple STM mechanism, and it's not possible for any amount of data more than 16-bytes (the largest atomic xchgcmp on the ubiquitous x86). This is because pure STM requires the LL/SC (load-linked/store-conditional) processor capability. But NO EXISTING CPU DOES FULL LL/SC, and x86 doesn't even have LL/SC, period (cmpxchg was _proven_ insufficient as a universal primitive for STM 20 years ago). LL/SC requires complicated and non-performant chip circuitry to guarantee the LL/SC requirements. Basically, _any_ write--no matter the op--to an LL address needs to set a flag so a following SC can fail and the code can restart the copy operation; in practice this costs too much. Chips which nominally support LL/SC only make the guarantee for the width of a cache line, IIRC.

Every existing STM library actually uses locks internally. Period. Please, stop the cargo cult hyperbole. The only real STM implementations are on paper, or maybe in a lab w/ a custom ASIC.

Transactional Memory

Posted Jun 4, 2009 5:18 UTC (Thu) by jamesh (guest, #1159) [Link]

Well Clojure isn't running on a real CPU, so the capabilities of real CPUs are not directly relevant.

The language's home page seems to be saying that STM is the concurrency model provided to programs written in the language (and that those semantics are limited to a single type of variable).

I am sure you are correct that it is implemented via locks internally -- it'd use the primitives provided by the JVM.

Transactional Memory

Posted Jun 4, 2009 9:44 UTC (Thu) by farnz (subscriber, #17727) [Link]

For those who want to be accurate about it, STM moves responsibility for locking from the application developer to the STM implementor; the contract is that STM applications cannot deadlock, and as a quality of implementation issue, should not livelock.

As with many things in programming, the hope is to trade off the possibility of more performance (given a high enough standard of programmer), for reliability now.

Transactional Memory

Posted Jun 4, 2009 13:15 UTC (Thu) by zdzichu (subscriber, #17118) [Link] (2 responses)

What about Sun's Rock CPU?

Transactional Memory

Posted Jun 4, 2009 18:37 UTC (Thu) by wahern (subscriber, #37304) [Link] (1 responses)

From the Wikipedia entry I think this is closer to hardware transactional memory. STM is a way to get transactional semantics by the use of a simple primitive and fancy algorithms where parallel threads do a complicated dance to arrive at the completion of a [compound] transaction.

The Rock CPU ops would seem to mostly obviate the need for the dance.

Transactional Memory

Posted Jun 11, 2009 14:10 UTC (Thu) by tvld (guest, #59052) [Link]

Note that the Rock CPU's TM is a best-effort implementation. Even if there are no data conflicts, it does not guarantee that all transactions run without being aborted for some reason. The guarantee they seem to give is something close to an DCAS.

Transactional Memory

Posted Jun 4, 2009 14:38 UTC (Thu) by mitchskin (subscriber, #32405) [Link] (3 responses)

The only real STM implementations are on paper, or maybe in a lab w/ a custom ASIC.

It sounds like you're thinking about hardware transactional memory. Software transactional memory (STM) may indeed be implemented using locks, but it does provide an interface that means client code can avoid using locks directly. You're right that there still may be locking somewhere, but that can still be a win since it isolates locking to one widely shared piece of code.

Transactional Memory

Posted Jun 4, 2009 17:51 UTC (Thu) by wahern (subscriber, #37304) [Link] (2 responses)

Actually, Software Transactional Memory is the correct term. To wit:

"Software Transactional Memory", Nir Shavit and Dan Toutiou, 1997.

The confusion is part of what I'm raging about ;) You can't simply say, "well, by 'software' I mean whatever I want it to mean, and not what I'm implying".

Fact is, real STM does not exist. It doesn't exist in the JVM, it doesn't exist anywhere in production (notwithstanding the Rock CPU mention). It can only exist w/ a universal primitive like LL/SC, as proven by, I believe, Herlihy.

Everything else can deadlock, and can livelock (if you're providing STM semantics).

Now, as pointed out elsethread, yes, STM can provide conceptual amelioration regarding the semantics of a language. But when that new language is running on your desktop, there is no way to "fake" it.

Existing STM is analagous to Perl5/6's "quantum" operators. Conceptually useful, but it's not literally doing it.

But don't take my word for it. Here's a small list of relevant material. These are the papers I still have in my research directory:

"Software Transactional Memory", Shavit and Toutiou, 1997.

"A Methodology for Implementing Highly Concurrent Data Objects", Herlihy, 1993.

"A Methodology for Implementing Highly Concurrent Data Structures", Herlihy, 1990.

"A Practical Multi-Word Compare-and-Swap Operation", Harris, Fraser, Pratt, 2002.

"Impossibility and Universality Results for Wait-Free Synchronization", Herlihy, 1988.

"Practical Lock-Free and Wait-Free LL:SC:VL Implementations Using 64-bit CAS", Michael, 2004.

"Wait-Free Synchronization", Herlihy, 1991.

Transactional Memory

Posted Jun 4, 2009 20:08 UTC (Thu) by Frej (guest, #4165) [Link]

I think your angle on (software) transactional memory is lockfree/nonblocking/wait-free stuff (the terms are somewhat vaguely defined imo).

But it's not the only point of STM (today). It's about safety and ease of concurrent programming on shared memory architectures.

Secondly, you missed the paper that inspired Software TM:
Transactional Memory: Architectural Support for Lock-Free Data Structures
Herlihy & Moss (1993)

Note that transactional hardware as an idea goes further back. But the above paper is pure hardware. Shavit and Toutiou later introduced Software Transactional Memory in the paper you posted.

So it is correct to distinguish between Transactional Memory and Software (assisted) Transactional Memory. Also, it does not make it pen and paper because the implementation uses locks. So surely STM exists, there are many implementations - The best are compiler/language assisted.

Transactional Memory

Posted Jun 11, 2009 14:06 UTC (Thu) by tvld (guest, #59052) [Link]

First, compare-and-set (CAS, cmpxchg,...) is universal, as is LL/SC.
Therefore, you can build nonblocking STMs, and people actually do (see, for example, the paper describing DSTM (one of the first STMs)).

Assuming that your memory transactions don't block by waiting for external events to happen, no sane (S)TM will deadlock. Some might live-lock, but that is just a question of contention management, and not a principal limitation (remember that CAS is universal...).

Please actually do read the relevant literature.

Transactional Memory

Posted Jun 4, 2009 18:01 UTC (Thu) by ekmett (guest, #58940) [Link] (4 responses)

<start lock-free STM cargo cult hyperbole>
STM _can_ remove locking on modern hardware, but you're using the wrong primitives.

To implement a fully lock-free STM you use an atomic n-CAS. You are correct that LL/SC or DCAS isn't available on a modern CPU directly. However, you CAN build an n-CAS out of that very same x86 cmpxchg16b-style CAS operation. The derivation isn't very straight forward, so I'm not surprised that you didn't come up with it by yourself, but a construction exists. You wind up comparing and swapping the values of n 'slots' atomically, rather than swapping n machine integers directly.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1...

Once you have that, the derivation is straightforward.

The reason almost all transactional memory implementations fall back on ordered locks is because in practice the n-CAS STM is slower than the lock-based STM, not because no such construction exists.

Transactional Memory

Posted Jun 5, 2009 17:29 UTC (Fri) by wahern (subscriber, #37304) [Link] (3 responses)

I am/was aware of that paper.

There are other DCAS algorithms, too, including (IIRC) one which is probabilistic (but strong). One problem I encountered is that early Opterons don't implement a 128-bit CAS.

Also, my particular task was implementing POSIX signals for pthreads-win32, which ruled out dynamic allocation (requiring locking) of context structures--required by many (all?) of these CAS algorithms.

Transactional Memory

Posted Jun 6, 2009 1:34 UTC (Sat) by ekmett (guest, #58940) [Link] (2 responses)

You can get by with the 8 byte version of cmpxchg by using an array of slots
rather than raw pointers, it is just less elegant.

But even so thats a far cry from no implementation being possible. ;)

Transactional Memory

Posted Jun 8, 2009 19:52 UTC (Mon) by wahern (subscriber, #37304) [Link] (1 responses)

How many slots?

Should I just tell pthread-win32 users, "POSIX signals work without any trouble... just don't use more than N threads"?

That's not STM per the theory, is it? That's something very close to STM, but still requires workarounds and caveats. And w/ that attitude, Intel or AMD will never give us the hardware support that's needed.

Unless STM is simple and straightforward (which, even if the algorithm is complicated, it's _universal_, and so you won't have to roll your own everytime), then people will still mostly just _talk_ about STM rather than actually _using_ real STM, w/ the concomitant _realized_ benefits.

Transactional Memory

Posted Jun 11, 2009 12:33 UTC (Thu) by ekmett (guest, #58940) [Link]

Well, ~4 billion slots seems pretty flexible. ;)

Transactional Memory

Posted Jun 11, 2009 8:24 UTC (Thu) by ketilmalde (guest, #18719) [Link]

> Everybody talks about how STM removes locking. This is bollocks.

Really? I think there's a lot of talk about how STM is bollocks. While I haven't read the literature (as) extensively (as you seem to have), I've yet to see any rationale for this. More specific pointers than just a ream of research paper titles would be great.

> This is because pure STM requires the LL/SC

I'm sorry, but isn't this SOFTWARE transactional memory we're talking about? What's stopping the run-time system from abstracting away the underlying hardware?

> Every existing STM library actually uses locks internally. Period.
> Please, stop the cargo cult hyperbole. The only real STM implementations
> are on paper, or maybe in a lab w/ a custom ASIC.

While I've only made toy implementations using it, Haskell's STM library seems awfully real to me. I'm not aware of any locking, but I could be misunderstanding how it works, of course.

Further down, you go on to claim:

> Fact is, real STM does not exist.

and

> Everything else can deadlock, and can livelock

While I love unsubstianted claims as much as the next guy, your opinions would be more enlightening if you provide actual example code that demonstrates this.