
McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Paul McKenney looks at the state of hardware transactional memory with an eye toward how it might be useful for current software. "Even with forward-progress guarantees, HTM is subject to aborts and rollbacks, which (aside from wasting energy) are failure paths. Failure code paths are in my experience difficult to work with. The possibility of failure is not handled particularly well by human brain cells, which are programmed for optimism. Failure code paths also pose difficulties for validations, particularly in cases where the probability of failure is low or in cases where multiple failures are required to reach a given code path."

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 10, 2012 23:43 UTC (Sat) by wahern (subscriber, #37304) [Link]

Wow. Clearly he's heavily invested in RCU and is paranoid that everyone is going to jump on the HTM bandwagon and leave him holding the bag.

Well... he's paranoid for good reason. Of course everyone will jump on the HTM bandwagon and try to solve every problem under the sun with it, even where it's a poor solution. No amount of shouting is going to prevent that. He needs to calm down or take some meds or something before he has a stroke.

He should just post an I-told-you-so screed on his blog, postdate it to 2015, and go take a long vacation.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 0:17 UTC (Sun) by cmccabe (guest, #60281) [Link]

It will be interesting to see how much mindshare transactional memory actually gets among programmers.

The general trend in new programming languages has been to trade off increased hardware resources for reduced programmer effort. Transactional memory is a step in the opposite direction because it forces you to think about thorny questions like what operations are idempotent, whether your data structure will fit in the cache, and so forth. It seems doomed to relative obscurity for the same reasons assembly language programming is.

Has anyone come up with an API that exposes TM to the Linux kernel? I think such an API would have to have an emulation layer for all the non-TM-enabled chips in order to get accepted. There is this: http://www.sosp2007.org/papers/sosp056-rossbach.pdf but it's not clear to me how upstreamable their work is.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 10:43 UTC (Sun) by slashdot (guest, #22014) [Link]

HTM support can be transparently added to mutexes and r/w mutexes, so there's not much added programmer effort, and the emulation layer is already there.
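(A rough sketch of what that transparent support could look like, using the RTM intrinsics from Intel's published TSX spec; the elided_lock()/elided_unlock() wrappers and the lock_is_taken flag are made-up names, and a real implementation would add a retry policy:)

    /* HTM lock elision sketch; compile with gcc -mrtm. */
    #include <immintrin.h>
    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int lock_is_taken;   /* set only while holding the mutex */

    static void elided_lock(void)
    {
        if (_xbegin() == _XBEGIN_STARTED) {
            /* Reading the lock word puts it in our read set, so a
             * thread that really takes the mutex aborts us. */
            if (!lock_is_taken)
                return;         /* run the critical section transactionally */
            _xabort(0xff);      /* lock is held: don't race its holder */
        }
        /* Slow path: the plain old mutex. */
        pthread_mutex_lock(&lock);
        lock_is_taken = 1;
    }

    static void elided_unlock(void)
    {
        if (!lock_is_taken) {
            _xend();            /* commit the transaction */
        } else {
            lock_is_taken = 0;
            pthread_mutex_unlock(&lock);
        }
    }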

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 11:48 UTC (Sun) by salimma (subscriber, #34460) [Link]

One would think database programmers are more conditioned to think in terms of transactions than in terms of locks?

As applications get increasingly multi-threaded, transactional memory (whether software-only or with hardware support) should actually be a higher-level, easier-to-reason-about concept than low-level locks.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 15:58 UTC (Sun) by arjan (subscriber, #36785) [Link]

The part for non-HTM-enabled hardware is relatively easy; HTM basically has an optimistic fast path and a pessimistic slow path.

On non-HTM hardware you just always take the slow path....
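(For the runtime side of that choice, the CPU's feature bits can pick the path; a minimal x86 sketch, assuming a GCC recent enough to provide __get_cpuid_count() in <cpuid.h>:)

    /* RTM support is advertised in CPUID leaf 7, subleaf 0, EBX bit 11. */
    #include <cpuid.h>
    #include <stdbool.h>

    static bool cpu_has_rtm(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
            return false;
        return (ebx >> 11) & 1;
    }

Check this once at startup and route lock acquisitions either to the transactional fast path or straight to the mutex.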

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 12, 2012 17:23 UTC (Mon) by iabervon (subscriber, #722) [Link]

There's been a simultaneous trend toward compilers and tool chains generating lower-level code that the programmer wouldn't have been able to write. I could imagine programmers just saying "this block is atomic" and having a compiler arrange to ensure it is the case (with limits on what you can do in the block). The compiler may have a better chance of generating correct failure paths, simply because it will do it systematically and know the actual attributes of things when generating the code. (A human would probably do the wrong thing if a patch changed the declaration of a variable used in an atomic block to be volatile, but the compiler would see that the situation is untenable, even though it was okay when the code was written.)

And, of course, there's a good chance for things like JVMs to use HTM without the author of the virtualized program knowing. The system could identify that it can speculatively run some critical sections while waiting for the lock, and have them seem to run instantaneously upon acquiring the lock when it turns out that the speculative run wouldn't be distinguishable from a run with the lock.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 15, 2012 16:43 UTC (Thu) by tvld (guest, #59052) [Link]

> I could imagine programmers just saying "this block is atomic" and having a compiler arrange to ensure it is the case (with limits on what you can do in the block).

Indeed: http://gcc.gnu.org/wiki/TransactionalMemory
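(Concretely, with GCC 4.7's -fgnu-tm it looks like the following; the compiler rejects "transaction-unsafe" operations such as I/O inside the block, which is exactly the "limits on what you can do" mentioned above:)

    /* Built with: gcc -fgnu-tm; libitm supplies the runtime machinery. */
    static long counter;

    void deposit(long amount)
    {
        __transaction_atomic {
            counter += amount;
        }
    }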

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 16:04 UTC (Sun) by arjan (subscriber, #36785) [Link]

HTM is solving a different problem space than RCU is.
RCU solves the case where the code author knows exactly what he is doing, has a relatively fine-grained lock, and wants to squeeze the last bit of scalability out of the code.

HTM is primarily aimed at places where locks are NOT fine-grained; e.g. the sweet spot for HTM tends to be where you'd take a lock, but would have no data conflicts at all even under contention. For highly tuned OS kernels, such opportunities exist in only some places. For middleware applications, or even apps using runtimes/JITs, the locking is often much more coarse-grained (think "Javascript DOM lock"), and there HTM can shine.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 18:31 UTC (Sun) by cmccabe (guest, #60281) [Link]

Well, at least for the hardware that is now shipping, you can't use hardware transactional memory unless your transaction accesses no more than a fixed, architecture-specific number of cache lines. So coarse-grained transactions seem like a no-go because they would tend to access too much data.

Just to seize on the one example you gave-- I have to ask the question of whether it's even possible for the Javascript VM to determine ahead of time how many cache lines a transaction will access. Isn't Javascript one of those languages that can be monkeypatched at runtime? Perhaps some compiler wizard who knows Javascript better than I can answer that one.

It will be interesting to see how well Intel or someone else succeeds in integrating the HTM concept with actual programmer tools. As far as I'm concerned, the jury is still out on the concept until they can show some actual code.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 18:46 UTC (Sun) by arjan (subscriber, #36785) [Link]

It's not the transaction, but the lock, that's coarse-grained.
Think "one lock for a whole bunch of separate data", similar to how the BKL was used. It wasn't that the BKL was used for large regions (although it also was), but more that it was used for many unrelated things, creating false contention.

The JS DOM lock is similar: the lock is not so much accessible from JS directly as it is used (behind/in the JIT) to protect the DOM, which is a collection of mostly independent things...

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 23:15 UTC (Sun) by cmccabe (guest, #60281) [Link]

Javascript in general tends to run in a single-threaded context. There are a few exceptions-- including the shiny new "web worker" concept that got standardized recently. If the DOM is protected by a giant monolithic mutex, it's probably because nobody cared enough to do something fancier.

I do agree that using TM to modify a tree structure would be conceptually easier than using fine-grained locking or atomic data structures. It will be interesting to see what people come up with.

A lot of people seem to be hoping that hardware TM will be as easy to use as SQL, but my guess is that it will be more like RCU in terms of difficulty. But in the absence of code examples, I'm just guessing and could be wrong.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 23:57 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, multithreaded DOM access is just way too complex. It's yet another incarnation of the multithreaded GUI toolkit problem: http://weblogs.java.net/blog/kgh/archive/2004/10/multithr...

HTM limits

Posted Mar 12, 2012 15:50 UTC (Mon) by PaulMcKenney (subscriber, #9624) [Link]

“Well, at least for the hardware that is now shipping, you can't use hardware transactional memory unless your transaction accesses no more than a fixed, architecture-specific number of cache lines. So coarse-grained transactions seem like a no-go because they would tend to access too much data.”

Actually, you really can use HTM for moderately large transactions. Yes, you must provide non-HTM fallback code; however, if you have arranged for a low conflict rate, then there is a good probability that the transactions will complete, and also a decent probability of improved performance and scalability. (Yes, I need to compute the probability. I have the math worked out, just need to code/debug.)
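(A back-of-the-envelope illustration, not the math Paul refers to: if a transaction touches n cache lines and each independently conflicts with probability p per attempt, the chance that a given attempt commits is roughly (1 - p)^n. With p = 0.001, even an n = 100 transaction commits on its first try about 90% of the time, so moderately large transactions stay viable as long as conflicts are rare.)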

What you don't get with HTM is guarantees of forward progress.

As always, use the right tool for the job! ;-)

HTM limits

Posted Mar 12, 2012 18:20 UTC (Mon) by slashdot (guest, #22014) [Link]

There is speculation that the whole per-core L2 cache (256 KB on Sandy Bridge) is usable for transactions, although this doesn't seem to have been officially confirmed.

HTM limits

Posted Mar 12, 2012 20:40 UTC (Mon) by cmccabe (guest, #60281) [Link]

Thanks for the great article, PaulMcKenney.

I guess there are really two questions that everyone seems to have about hardware transactional memory. One is whether it will prove to be an easier way of doing things than other methods. The other is whether it will provide increased performance on CPUs that support it.

I actually have no doubt that HTM will provide increased performance on certain CPUs. What I'm still on the fence about is point #1. Is this an easier way to do things? The requirement to write fallback code seems to point to no-- writing things twice over doesn't sound like fun. On the other hand, when you compare it to other solutions like fine-grained locking, atomic data structures, or RCU, perhaps the transactional programming model will start to look more attractive.

P.S. One last question. Is there some kind of requirement that the pages used for HTM be locked into memory using mlock or similar? It seems like having the kernel page out this memory would be bad news. Or is that going to be handled transparently by the fallback code?

HTM limits

Posted Mar 12, 2012 23:56 UTC (Mon) by PaulMcKenney (subscriber, #9624) [Link]

Actual user experience will tell whether HTM provides significant ease of use. My guess is that it is another tool in the toolbox, with its own set of advantages and disadvantages.

My understanding is that HTM does not require that pages be locked into memory. If a page fault occurs, the transaction will be aborted, but presumably the kernel will see the page fault. If the transaction retries, the page will with high probability still be there, so if there are no more page faults, the retry should succeed.

HTM limits

Posted Mar 15, 2012 16:50 UTC (Thu) by tvld (guest, #59052) [Link]

> My understanding is that HTM does not require that pages be locked into memory. If a page fault occurs, the transaction will be aborted, but presumably the kernel will see the page fault. If the transaction retries, the page will with high probability still be there, so if there are no more page faults, the retry should succeed.

This depends on the HTM. For example, ASF made the fault visible; AFAIU, Haswell does not. The transaction abort code will typically reveal that something happened that could be a page fault, but I don't think there is enough data available to replay the fault. Thus, on a fault, running the fallback code might be the best solution.
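(With Intel's published RTM interface, that policy falls out of the abort status word returned by _xbegin(); a minimal sketch, where the function-pointer parameters are made-up names:)

    /* Retry only when the hardware hints a retry may succeed; a page
     * fault typically surfaces as an abort without the retry hint, so
     * this drops to the fallback path. Compile with gcc -mrtm. */
    #include <immintrin.h>

    #define MAX_TRIES 3

    void run_critical_section(void (*txn_body)(void), void (*fallback)(void))
    {
        for (int tries = 0; tries < MAX_TRIES; tries++) {
            unsigned int status = _xbegin();

            if (status == _XBEGIN_STARTED) {
                txn_body();
                _xend();
                return;
            }
            if (!(status & _XABORT_RETRY))
                break;      /* retrying is unlikely to help */
        }
        fallback();         /* e.g. take the real lock */
    }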

HTM limits

Posted Mar 15, 2012 17:25 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

Good to know, thank you!

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 15, 2012 17:02 UTC (Thu) by tvld (guest, #59052) [Link]

> It will be interesting to see how well Intel or someone else succeeds in integrating the HTM concept with actual programmer tools.

HTM is an implementation mechanism, but not necessarily a good programming abstraction. Transactional language constructs for C/C++, for example, are a proper abstraction (https://sites.google.com/site/tmforcplusplus/); Intel is one of the authors of this specification, and it is implemented by GCC.

McKenney: Transactional Memory Everywhere: 2012 Update for HTM

Posted Mar 11, 2012 19:37 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

BTW, there's an interesting study of retrofitting existing code to use STM: http://research.microsoft.com/en-us/um/people/tharris/pap...

They've modified the classic Quake engine to use Software Transactional Memory. It turned out to be doable, though not exactly easy. Gains from parallelism were modest, but they were present.

