Having considered this topic for a bit, I wonder about the following. A ticket spinlock involves two atomic updates: a fetch-and-add on next-ticket (which either takes the lock immediately or queues the caller to wait) and an increment of now-serving on unlock. That is as many writes as a regular locked/unlocked spinlock needs, so the problem seems to be the increased _non-local_ write traffic on the data structure under contention rather than the number of atomic operations. This suggests a solution where spinlocks protecting objects smaller than a cacheline are moved out of the objects themselves and into, say, a hashed table of locks keyed the same way as the objects' parent data structure, trading some false contention (unrelated objects that hash to the same lock) for space.
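A minimal C11 sketch of the hashed-lock idea, under a few assumptions of my own: a 64-byte cacheline, a fixed table of 64 ticket locks, and hashing each object's address (rather than anything about its parent structure) to pick a slot. The names `ticket_lock`, `lock_table`, `lock_index`, `obj_lock` and `obj_unlock` are hypothetical. Each lock is padded to its own cacheline, so taking a ticket writes only to the lock's line and never to the object's.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* A minimal ticket spinlock: each arriving thread takes a ticket with one
 * atomic fetch-and-add on next_ticket; unlock bumps now_serving. */
typedef struct {
    atomic_uint next_ticket;
    atomic_uint now_serving;
} ticket_lock;

static void ticket_lock_acquire(ticket_lock *l)
{
    /* Take a ticket; the returned value is our place in line. */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1, memory_order_relaxed);
    /* Spin, only reading, until our ticket is being served. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ;
}

static void ticket_lock_release(ticket_lock *l)
{
    /* Only the holder writes now_serving, so load + store is sufficient. */
    unsigned next = atomic_load_explicit(&l->now_serving, memory_order_relaxed) + 1;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}

/* Hypothetical hashed lock table living outside the protected objects. */
#define LOCK_TABLE_BITS 6
#define LOCK_TABLE_SIZE (1u << LOCK_TABLE_BITS)
#define CACHELINE 64 /* assumed cacheline size */

/* Pad each lock to a full cacheline so neighbouring locks don't share one. */
typedef struct {
    _Alignas(CACHELINE) ticket_lock lock;
} padded_lock;

/* Static storage is zero-initialized, which is a valid unlocked state. */
static padded_lock lock_table[LOCK_TABLE_SIZE];

/* Map an object's address to a slot with Fibonacci hashing; the low bits
 * are dropped first since they mostly reflect alignment. */
static size_t lock_index(const void *obj)
{
    uint64_t p = (uint64_t)(uintptr_t)obj >> 4;
    return (size_t)((p * UINT64_C(11400714819323198485)) >> (64 - LOCK_TABLE_BITS));
}

static void obj_lock(const void *obj)
{
    ticket_lock_acquire(&lock_table[lock_index(obj)].lock);
}

static void obj_unlock(const void *obj)
{
    ticket_lock_release(&lock_table[lock_index(obj)].lock);
}
```

Two objects whose addresses hash to the same slot will serialize against each other; that is the false contention mentioned above, and a larger `LOCK_TABLE_BITS` trades more memory for less of it.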
That would protect the cachelines holding the data not only from write-bouncing caused by ticket acquisition, but also from any spinlock-related "oops, had to flush this exclusive cache line to RAM in the meantime" cases caused by read traffic. I'm guessing that an operation to acquire locks on N objects without lock-ordering deadlocks could be built on top of that as well.
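One way the N-object operation might look, again as a hypothetical C11 sketch with the same assumed table layout (64 padded ticket locks, address hashing): map each object to its slot, sort the slots, drop duplicates, and acquire them in ascending order. If every caller takes slots in the same global order, no two threads can each hold a lock the other is waiting for. Deduplication matters because ticket locks are not recursive; two objects hashing to the same slot would otherwise leave a thread waiting on its own ticket forever. The names `obj_lock_many` and `obj_unlock_many` are made up for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same hypothetical layout as a single-lock table: one ticket lock per slot. */
typedef struct {
    atomic_uint next_ticket;
    atomic_uint now_serving;
} ticket_lock;

#define LOCK_TABLE_BITS 6
#define LOCK_TABLE_SIZE (1u << LOCK_TABLE_BITS)
#define CACHELINE 64 /* assumed cacheline size */

typedef struct {
    _Alignas(CACHELINE) ticket_lock lock;
} padded_lock;

static padded_lock lock_table[LOCK_TABLE_SIZE];

static void ticket_lock_acquire(ticket_lock *l)
{
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1, memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ; /* spin until our ticket is served */
}

static void ticket_lock_release(ticket_lock *l)
{
    unsigned next = atomic_load_explicit(&l->now_serving, memory_order_relaxed) + 1;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}

/* Hash an object's address to a slot (Fibonacci hashing). */
static size_t lock_index(const void *obj)
{
    uint64_t p = (uint64_t)(uintptr_t)obj >> 4;
    return (size_t)((p * UINT64_C(11400714819323198485)) >> (64 - LOCK_TABLE_BITS));
}

static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Lock every slot covering objs[0..n). `slots` must have room for n
 * entries; on return it holds the m distinct slots taken, in ascending
 * order. Returns m. */
static size_t obj_lock_many(const void *const *objs, size_t n, size_t *slots)
{
    for (size_t i = 0; i < n; i++)
        slots[i] = lock_index(objs[i]);
    qsort(slots, n, sizeof *slots, cmp_size);

    size_t m = 0; /* compact out duplicates: ticket locks are not recursive */
    for (size_t i = 0; i < n; i++)
        if (m == 0 || slots[i] != slots[m - 1])
            slots[m++] = slots[i];

    /* Same ascending order in every caller => no lock-ordering deadlock. */
    for (size_t i = 0; i < m; i++)
        ticket_lock_acquire(&lock_table[slots[i]].lock);
    return m;
}

static void obj_unlock_many(const size_t *slots, size_t m)
{
    for (size_t i = m; i-- > 0;) /* release in reverse order */
        ticket_lock_release(&lock_table[slots[i]].lock);
}
```

The ordering guarantee only holds among code paths that take these locks through the sorted path; a caller that holds one slot and then asks for more outside that order can still deadlock.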