One real-world example of lock ping-pong with futexes that I ran into recently was rsyslog (this should be reduced in recent versions).
It has threads that receive messages and add them to a lock-protected queue, while other threads retrieve messages from the queue to output them.
With a simple UDP input and file output, a high enough input rate could push it into lock contention, at which point throughput plummeted. I can't say for sure that this is SMP cache line bouncing, but there's a good chance that it is.
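
For anyone who wants to picture the structure, here is a rough sketch of that kind of pipeline: input threads pushing onto one lock-protected queue and output threads pulling from it. This is not rsyslog's code. The names (`run_pipeline`, `input_worker`, `output_worker`, `SENTINEL`) are made up for illustration, a plain list stands in for the output file, and since it is Python (with CPython's GIL) it only shows where every thread has to take the same lock; it won't reproduce the cache line behaviour itself.

```python
import threading
from collections import deque

SENTINEL = None  # hypothetical shutdown marker, one per output thread


def run_pipeline(n_inputs=2, n_outputs=2, msgs_per_input=1000):
    queue = deque()
    cond = threading.Condition()     # wraps the single lock all threads share
    written = []                     # stands in for the output file
    written_lock = threading.Lock()  # protects the stand-in "file"

    def input_worker(worker_id):
        for seq in range(msgs_per_input):
            msg = (worker_id, seq)
            with cond:               # every enqueue takes the shared lock
                queue.append(msg)
                cond.notify()

    def output_worker():
        while True:
            with cond:               # every dequeue takes the same lock
                while not queue:
                    cond.wait()
                msg = queue.popleft()
            if msg is SENTINEL:
                return
            with written_lock:
                written.append(msg)

    outputs = [threading.Thread(target=output_worker) for _ in range(n_outputs)]
    inputs = [threading.Thread(target=input_worker, args=(i,))
              for i in range(n_inputs)]
    for t in outputs + inputs:
        t.start()
    for t in inputs:
        t.join()
    # All real messages are queued by now; tell each output thread to stop.
    with cond:
        for _ in outputs:
            queue.append(SENTINEL)
        cond.notify_all()
    for t in outputs:
        t.join()
    return written
```

Calling `run_pipeline(2, 2, 500)` returns all 1000 messages, in whatever order the output threads happened to drain them. The point is just that each message costs two acquisitions of the same lock, one on the way in and one on the way out, so at a high enough input rate that lock is where the threads pile up.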