And in this situation the lock will be in the L1 data cache, while the data *and* instructions needed for switching contexts will be in L2 at best, and most likely in main memory. Spinning within a shared cache is good for performance (setting aside energy, where many other factors come into play). Short hold times, where the lock sits in a shared cache with access times in the tens of cycles, call for spinning.
A problem with adaptive spinlocks occurs when you spin on a line that bounces between memory, caches, and processors. By the time you finally block, you've eaten a large latency cost and generated a lot of coherence traffic. I suspect the kernel is in a better position to know what's where, with less cache overhead, than user-space adaptive spinlocks, so this is sounding potentially great.