User: Password:
|
|
Subscribe / Log in / New account

Transactional memory for GCC 4.7.0

Following a last-minute request, the "transactional memory" GCC branch has been merged into the trunk for the 4.7.0 release. Transactional memory is specified in a draft standard [PDF] for C and C++; the idea is to provide a relatively simple way for developers to execute code with atomic "all or nothing" semantics.

A transaction looks something like this:

    __transaction {
	/* Stuff done here is atomic */
    }

Anything done within the __transaction block will either be visible to other threads in its entirety or not at all. Most exits from a transaction (return, goto, or break, for example) will cause it to close normally. There is a __transaction_cancel statement that can abort (and roll back) a transaction.

There are some constraints, naturally. If changes to a specific variable are protected by a transaction in one place, all accesses to that variable must be done within transactions. Transactions can only be rolled back if they consist of exclusively "safe" statements; performing I/O is an obvious way to lose transactional semantics. Exception handling gets more complicated. All this leads to a certain amount of complexity, with developers needing to mark functions as being "safe," add Java-like declarations of exceptions that a function may raise, and so on.

Details on the specific implementation are scarce; it appears that, in the current patch set, transactions will be implemented using a global lock. GCC developers debated for a bit over whether this code was ready for merging or not. In the end, though, the possibility of being the first to support an interesting new feature seemed to win out. Current plans are to release 4.7.0 sometime around next April.


(Log in to post comments)

Transactional memory for GCC 4.7.0

Posted Nov 10, 2011 2:49 UTC (Thu) by josh (subscriber, #17465) [Link]

This seems like the kind of thing better implemented as a GCC plugin, specific to whatever transactional system the user wants to use.

Transactional memory for GCC 4.7.0

Posted Nov 10, 2011 15:53 UTC (Thu) by tvld (guest, #59052) [Link]

I don't think so. Having several sets of language-level constructs for transactions does not seem to be useful. If you want to just change to a different implementation of these transactions, you can already do so by using another library instead of libitm.

If you're thinking about interfacing with other transactional systems (so, for other resources besides the application's address space), then you should look at the transaction_wrap function attribute, with which you can declare transactional wrappers for functions (e.g., library functions). However, this isn't yet part of any specification.

Transactional memory for GCC 4.7.0

Posted Nov 10, 2011 4:31 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

While transactional memory is a neat feature, the particular model being proposed here could use some simplification. Consider this paragraph:
Note: A cancel-outer statement cancels only outer atomic transactions; the restrictions above imply that a cancel-outer statement cannot be executed when the outermost atomic transaction is not an outer atomic transaction. In contrast, an unannotated cancel statement can cancel an outer atomic transaction if it is the immediately enclosing atomic transaction
Why do we need separate outermost and atomic transactions? (The outer-most transaction isn't even necessarily an outermost transaction.) Why does "throw" commit a transaction by default, necessitating the creation of a separate cancel-and-throw operator? Why is there no retry provision built into the transaction? Why do we mix lexical and dynamic scope?

IMHO, it'd be better to simplify the whole thing and allow only simple operations and inline functions over simple operations inside transactions. Aren't most transactions, in practice, going to be of this variety anyway?

Transactional memory for GCC 4.7.0

Posted Nov 10, 2011 16:15 UTC (Thu) by tvld (guest, #59052) [Link]

Features for failure atomicity (cancel, cancel-outer, cancel-and-throw) require atomic transactions. A transaction marked as "outer" therefore must always be an atomic transaction. Nonetheless, failure atomicity is like an add-on, you don't have to use it or care about it if you don't need it (see the section at the end of the specification that explains the dependencies between features).

The reason why we restrict normal cancel to lexical scope and cancel-outer to specially marked transactions is that we do not want programmers guessing whether a call hidden somewhere in a larger transaction could cancel the transaction or not. Normal transactions give you exactly-once semantics, whereas transactions that can be canceled are at-most-once (while still being atomic of course).

If throw would cancel a transaction by default, we would change the semantics of existing code. So no, we don't want that.

"Retry provision": I'm not exactly sure which semantics you are referring to here (like what's been proposed by Harris et al.?). Note that you cannot busy-wait for some condition to change in a transaction, simply because it's supposed to be atomic and isolated.

Finally, we just don't know how people will use transactions. Sure, you often want to keep synchronizing regions small, but that will not be a good choice for some programmers or programs. So, from a specification perspective, "simplifying the whole thing" would mean to assume a given narrow use case and prevent other uses.

The current specification does not prevent you from using transactions in the simple form you mentioned, so if you want that, you can have it. Here's a simple recipe: Only use __transaction_atomic, and never call any functions in a different compilation unit. And that's it, no further annotations necessary (the specification allows compilers to automatically infer whether a function is transaction_safe, and this is what GCC does too).

Transactional memory for GCC 4.7.0

Posted Nov 10, 2011 15:48 UTC (Thu) by tvld (guest, #59052) [Link]

Some more information about the C/C++ constructs for TM can be found here (including links to lists where one can discuss the specification):
http://transform.t-labs.tu-berlin.de/tw11/torvalds.pdf

GCC implements two transaction keywords (__transaction_atomic and __transaction_relaxed instead of just __transaction), and this is also what the next version of the specification will require. Furthermore, there is only static checking of atomic transactions, exception specifications have been replaced with (optional) noexcept specifications as in C++11, and cancel-and-throw is not implemented by GCC currently.

Exception handling does not get more complicated because of the default commit-on-throw semantics, meaning that even if you add transactions, the behavior of code that throws exceptions will not change.

A certain variable can be accessed with and without transactions, as long as the program ensures using synchronization that there will be no concurrent transactional and nontransactional accesses of which at least one is a write (i.e., data-race-free code is required, and "concurrent" is as defined by the C++11 memory model extended by transactions). For example, you can use a transaction to remove an element from a shared data structure, and after the transaction has finished, access this element like any other private data (both privatization and publication are safe, provided that there are no data races). So, this is in line with the rest of the C++11 memory model.

We have been planning to write a TM tutorial that explains the specification in a more easily accessible way, but this just hasn't happened yet.

It is true that the current implementation uses TM algorithms that synchronize using a single global lock, but this is due to just not having enough time yet to include other algorithms. However, GCC's TM runtime library (libitm) has all the code in place to easily add further algorithms and switch between them at runtime. As a side note, using a single lock is not necessarily bad but depends on the workload at runtime; if transactions are executed rather infrequently, are often read-only, or have frequent data conflicts anway, this performs better than other approaches.

Intel TM ABI requires a lot of function calls

Posted Nov 10, 2011 17:21 UTC (Thu) by scottt (subscriber, #5028) [Link]

I was really surprised that the Intel TM ABI requires all load and store operations in transactions be translated into function calls that can't be inlined. Since the STM library, libitm, would like be built as a shared object on Linux this introduces extra jumps and register moves for each load/store operation.

tvld, do you think most pthread applications could be converted to this TM ABI without performance lost?

Intel TM ABI requires a lot of function calls

Posted Nov 11, 2011 11:57 UTC (Fri) by tvld (guest, #59052) [Link]

Transactions have to work across all code in a process, so we have to make sure that a mix of code compiled by different compilers works. Otherwise, you would have to have a single TM runtime library implementation. However, even though this is the default, it should still be possible to statically link an application and use link-time optimizations to inline everything. Alternatively, we have been discussing generating additional code paths for each transaction that run a particular TM algorithm/implementation but this hasn't been implemented yet.

Regarding load/store runtime overheads, the function calls are part of it but synchronization overheads can be still higher, depending on TM algorithm and workload (e.g., if you get a cache miss when reading data modified previously on another core).

You can use TM with applications that handle threading via pthreads, so I guess your question was about performance of TM vs. locking via pthread mutexes. There is no general answer to that because it really depends on what kind of locking scheme you have, and what kind of synchronization workload your application encounters at runtime. Example 1:, well-tuned fine-granular locking can often perform better than generic SW-only implementations of TM (STM) -- unless you have a lot of read-sharing and pay for contention or cache ping-pong on your readlocks (in this case, the "invisible reads" used in many STMs are usually better). Example 2: Course-granular locking probably often has worse performance than a typical STM -- unless the thing you're protecting with the lock really is a sequential bottleneck, and there is no parallelism in there (in this case, better execute it as fast as possible). Overall, this might sound tricky, but you're facing the same issues as well when just using locks (should you use fine-granular or coarse-granular locking? should you use RW locks?).

What TM tries to do is to give the programmer a simple way to synchronize between nontrivial operations, with best-effort performance. Given that optimizing synchronization is tricky and time-consuming, TM might even give you better performance than implementing something custom when having a normal (aka small) development-effort budget for synchronization. With TM, you declare what synchronizes, but not how this is implemented. In turn, you can benefit from improvements in the generic TM implementations without changing anything in your program. So, TM is supposed to give you better options in the trade-off between development effort and performance. And if you would get HW support for TM (e.g., like on IBM BlueGene), then performance differences between locking and TM will become even smaller and development effort might be the significant factor.

Transactional memory for GCC 4.7.0

Posted Mar 22, 2012 16:12 UTC (Thu) by slashdot (guest, #22014) [Link]

Wouldn't it be better to have the __transaction statement also specify a mutex, so that if HTM is not available the code can fallback to taking the mutex instead?

With the current proposal, the code contains no information on which transactions can conflict with each other, meaning that if HTM is not available, either a full STM implementation or a global mutex would have to be used, and that's probably a serious performance issue either way.

Of course, one may then skip the new syntax entirely, and just add a mutex attribute to enable lock elision...


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds