<start lock-free STM cargo cult hyperbole>
STM _can_ remove locking on modern hardware, but you're using the wrong primitives.
To implement a fully lock-free STM you use an atomic n-CAS. You are correct that LL/SC or DCAS isn't available on a modern CPU directly. However, you CAN build an n-CAS out of that very same x86 cmpxchg16b-style CAS operation. The derivation isn't very straight forward, so I'm not surprised that you didn't come up with it by yourself, but a construction exists. You wind up comparing and swapping the values of n 'slots' atomically, rather than swapping n machine integers directly.