Everybody talks about how STM removes locking. This is bollocks. I read all the relevant theoretical papers and pored over the code of existing implementations. I tried to implement a simple STM mechanism, and it's not possible for any amount of data larger than 16 bytes (the widest atomic compare-and-exchange, cmpxchg16b, on the ubiquitous x86). This is because pure STM requires the LL/SC (load-linked/store-conditional) processor capability. But NO EXISTING CPU DOES FULL LL/SC, and x86 doesn't have LL/SC at all (cmpxchg was _proven_ insufficient as a universal primitive for STM 20 years ago). Guaranteeing the LL/SC requirements takes complicated, slow chip circuitry. Basically, _any_ write to an LL address, no matter the op, needs to set a flag so that a following SC fails and the code can restart the copy operation; in practice this costs too much. Chips which nominally support LL/SC only make the guarantee for the width of a cache line, IIRC.
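To make the "any write sets a flag" point concrete, here's a toy, single-threaded Python model of the difference between compare-and-swap and idealized LL/SC. It's a sketch, not real hardware or any real library: the `Cell` class, its method names, and the `interfere` helper are all made up for illustration, and the "other thread" is simulated by an ordinary function call.

```python
class Cell:
    """Toy model of one memory word with an LL/SC reservation flag (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self._reserved = False  # set by load_linked, cleared by any write

    def write(self, value):
        # Any store, even one that writes back the same value, kills the reservation.
        self.value = value
        self._reserved = False

    def load_linked(self):
        self._reserved = True
        return self.value

    def store_conditional(self, value):
        # Succeeds only if nothing wrote to the cell since load_linked.
        if not self._reserved:
            return False
        self.value = value
        self._reserved = False
        return True

    def compare_and_swap(self, expected, new):
        # Models cmpxchg: it looks only at the current value, not at the history.
        if self.value != expected:
            return False
        self.write(new)
        return True


def interfere(cell):
    # Simulated "other thread": changes A -> B -> A between our read and our write.
    original = cell.value
    cell.write("B")
    cell.write(original)


cas_cell = Cell("A")
seen = cas_cell.value
interfere(cas_cell)
cas_ok = cas_cell.compare_and_swap(seen, "C")  # value is "A" again, so the CAS succeeds

llsc_cell = Cell("A")
seen = llsc_cell.load_linked()
interfere(llsc_cell)
sc_ok = llsc_cell.store_conditional("C")  # reservation was cleared, so the SC fails

print(cas_ok, sc_ok)  # True False
```

The CAS never notices the intervening writes because the value ended up the same, while the modeled SC fails and forces a retry. Tracking that flag for every write is the part I'm saying is expensive in silicon.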
Every existing STM library actually uses locks internally. Period. Please, stop the cargo cult hyperbole. The only real STM implementations are on paper, or maybe in a lab w/ a custom ASIC.