Ticket spinlocks

Posted Feb 11, 2008 9:00 UTC (Mon) by Nick (guest, #15060)
In reply to: Ticket spinlocks by PaulMcKenney
Parent article: Ticket spinlocks

That's a good point. And yes I guess if that happens, then that will cause all other waiters
to spin longer as well.

Although I'm not sure that it is a problem in practice. The theory is that most of the time we
have no contention, and then FIFO locks are no worse than unfair locks. Even with up to one
process spinning on the lock, the NMI/IRQ delay behaviour shouldn't be any different from that
of unfair locks. It is only when you start getting more contention that it could start to be a
problem...

But the heavily contended case often means performance has tanked anyway, and we are actually
willing to sacrifice some performance to ensure fairness and forward progress.

In short -- I'm hoping that the situation you ask about should not be a real problem.

Unfortunately, as with a lot of these tricky interactions and uncommon workloads, it may be a
release or two before we hear anything about it. But ticket locks are definitely something for
kernel devs to keep in mind when investigating an unusual performance regression in the future ;)



Ticket spinlocks

Posted Feb 11, 2008 15:27 UTC (Mon) by PaulMcKenney (subscriber, #9624) [Link]

Indeed, you do have to have two tasks spinning on the lock before the NMI/IRQ delay could
possibly cause a problem.  So I also hope that this effect is not a problem in practice, but
as you say, testing and experience will be required.

And ticket locks do have nice real-time properties, as they get rid of the possibility of
starvation!
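
(For readers who haven't seen the mechanism being discussed, here is a minimal ticket-lock
sketch using C11 atomics. It is an illustration, not the kernel's actual implementation, which
packs both counters into a single lock word and uses arch-specific instructions; but the strict
FIFO hand-off that eliminates starvation is the same.)

    #include <stdatomic.h>

    struct ticket_lock {
        atomic_uint next;   /* next ticket number to hand out */
        atomic_uint owner;  /* ticket number currently being served */
    };

    static void ticket_lock_acquire(struct ticket_lock *l)
    {
        /* Atomically take a ticket; each contender gets a unique number. */
        unsigned int ticket = atomic_fetch_add(&l->next, 1);

        /* Spin until our number comes up: strict FIFO, so no waiter starves. */
        while (atomic_load_explicit(&l->owner, memory_order_acquire) != ticket)
            ;   /* a cpu_relax()/pause hint would go here in real code */
    }

    static void ticket_lock_release(struct ticket_lock *l)
    {
        /* Hand the lock to the next ticket holder. */
        unsigned int owner = atomic_load_explicit(&l->owner, memory_order_relaxed);
        atomic_store_explicit(&l->owner, owner + 1, memory_order_release);
    }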

Ticket spinlocks

Posted Feb 12, 2008 17:45 UTC (Tue) by bgamsa (guest, #33057) [Link]

Has anyone looked at MCS locks recently in the context of Linux and modern hardware? I believe they have more requirements in terms of hardware primitives, but they do provide queueing and local spinning.
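
(To make "queueing and local spinning" concrete, here is a minimal MCS-lock sketch in C11
atomics. The names are illustrative and not any kernel API; note that the caller must supply a
queue node, which is the API mismatch with spin_lock() mentioned in the reply below. Each
waiter spins only on a flag in its own node rather than on the shared lock word.)

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mcs_node {
        struct mcs_node *_Atomic next;
        atomic_bool locked;
    };

    struct mcs_lock {
        struct mcs_node *_Atomic tail;
    };

    static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *self)
    {
        atomic_store(&self->next, (struct mcs_node *)NULL);
        atomic_store(&self->locked, true);

        /* Append ourselves to the queue; this is the only global atomic. */
        struct mcs_node *prev = atomic_exchange(&lock->tail, self);
        if (prev == NULL)
            return;                          /* lock was free; we own it */

        atomic_store(&prev->next, self);     /* link in behind the previous waiter */

        /* Spin only on our own node -- no cache-line bouncing between waiters. */
        while (atomic_load_explicit(&self->locked, memory_order_acquire))
            ;   /* cpu_relax()/pause hint in real code */
    }

    static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *self)
    {
        struct mcs_node *succ = atomic_load(&self->next);

        if (succ == NULL) {
            /* No visible successor: try to swing the tail back to empty. */
            struct mcs_node *expected = self;
            if (atomic_compare_exchange_strong(&lock->tail, &expected,
                                               (struct mcs_node *)NULL))
                return;
            /* A successor is mid-enqueue; wait for it to link itself in. */
            while ((succ = atomic_load(&self->next)) == NULL)
                ;
        }

        /* Wake the next waiter by clearing the flag it is spinning on. */
        atomic_store_explicit(&succ->locked, false, memory_order_release);
    }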

Ticket spinlocks

Posted Feb 12, 2008 18:08 UTC (Tue) by PaulMcKenney (subscriber, #9624) [Link]

We did take a look a few years ago.  Of course, pure MCS locks have an incompatible API, but
the K42 folks at IBM Research came up with a variant of MCS locks that matched the
Linux-kernel spin_lock() primitives.  If I remember correctly, MCS locks outperformed the
various NUMA-aware locking primitives, but were outperformed by simple spinlocks.

I would expect that ticket locks would out-perform MCS locks at low contention levels (the
common case in modern Linux kernels).  I am not sure how MCS locks and ticket locks would
compare at high levels of contention -- the ticket locks avoid the "thundering herd" problem,
so might do quite well.

Any benchmark results out there?

Ticket spinlocks

Posted Jul 28, 2008 22:35 UTC (Mon) by okeydoke (guest, #46751) [Link]

Ticket spinlocks perform terribly in both the uncontested and the highly contested cases. The
test cases on LKML do not show any of this (a very trivial test case was used, one that does
not measure lock *and* unlock cost).

Here are the relevant numbers on a 16-core (8x2) AMD64 machine:
Highly contested, MCS has 30724 a/s (average) and ticket locks have 6315 a/s (average). In the
uncontested case, MCS has 836744 a/s (average) and ticket locks have 688817 a/s (average).
Both implement FIFO fairness anyway, so I do not expect any relevance WRT the "thundering
herd" problem.

MCS will also perform better than the Anderson locks implemented for Linux under the names
"Fairlocks" and "J-Locks" (I assume these are the NUMA-aware locking primitives you are
talking about?).

There are other implementations that perform better than both of these. A ticket lock with
proportional back-off helps alleviate the performance issues in the highly contested case.
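
(A sketch of the proportional back-off idea, assuming a simple C11 ticket lock like the one
shown earlier in this thread; the constant and the delay loop are illustrative, not taken from
any real implementation. A waiter that is N tickets from the head delays roughly N times the
base spin period before re-reading the lock word, so distant waiters stop hammering the shared
cache line.)

    #include <stdatomic.h>

    #define BACKOFF_BASE 64   /* illustrative tuning constant: spins per ticket of distance */

    struct ticket_lock {
        atomic_uint next;
        atomic_uint owner;
    };

    static void ticket_lock_backoff(struct ticket_lock *l)
    {
        unsigned int ticket = atomic_fetch_add(&l->next, 1);

        for (;;) {
            unsigned int owner = atomic_load_explicit(&l->owner, memory_order_acquire);
            if (owner == ticket)
                return;

            /* Delay in proportion to our distance from the head of the queue. */
            unsigned int distance = ticket - owner;
            for (volatile unsigned int i = 0; i < distance * BACKOFF_BASE; i++)
                ;   /* cpu_relax()/pause hint in real code */
        }
    }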

Ticket spinlocks

Posted Jul 28, 2008 22:37 UTC (Mon) by okeydoke (guest, #46751) [Link]

These results are for my own MCS implementation and a pipeline-optimized Linux ticket lock
implementation. They are not for the K42-variant MCS locks originally provided to Linux.

Ticket spinlocks

Posted Dec 12, 2009 9:44 UTC (Sat) by bankkus (guest, #62476) [Link]

>> And ticket locks do have nice real-time properties, as they get rid of
>> the possibility of starvation!

Is that a correct statement? I'm not an x86 guru, but is whatever instruction is used to atomically increment the 16-bit number guaranteed to complete in hardware? In other words, consider a 16-way processor: if all threads increment simultaneously, is there a guarantee that all 16 threads increment (in whatever order), or is it possible that on some of them the thread can be context-switched out before the increment executes? If the latter is possible, then freedom from starvation is not guaranteed.
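
(For reference, the increment in question is an atomic read-modify-write; in C11 terms it is a
fetch-and-add, which compilers on x86 typically emit as a single "lock xadd" instruction.
Interrupts can arrive only between instructions, not in the middle of one, so the increment
itself cannot be split and every CPU that reaches it gets a unique ticket; whether the thread
is descheduled afterwards is a separate question, as the virtualization discussion below shows.
The helper here is purely illustrative.)

    #include <stdatomic.h>

    /* Hypothetical helper showing the kind of atomic increment being discussed.
     * atomic_fetch_add() is one indivisible read-modify-write; on x86 it
     * typically compiles to a single "lock xadd" instruction. */
    static inline unsigned short take_ticket(atomic_ushort *next_ticket)
    {
        /* Returns the old value and increments it as one atomic operation,
         * so concurrent callers each receive a distinct ticket number. */
        return atomic_fetch_add(next_ticket, 1);
    }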

Ticket spinlocks and MP-guest kernels

Posted Feb 12, 2008 19:51 UTC (Tue) by eSk (guest, #45221) [Link]

There is also the case for multi-vCPU guests running in a VM.  For regular spinlocks you have
the problem that a guest vCPU may be preempted while holding a spinlock, thereby potentially
causing large delays for other vCPUs (i.e., think spinlock wait times on the order of hundreds
of milliseconds rather than nanoseconds).  A guest vCPU0 is preempted while holding a
spinlock.  Guest vCPU1 is scheduled and spends its whole timeslice waiting for the usually
briefly held spinlock to be released, not knowing that the holder of the spinlock is not
really running at the time.

One might argue that it's stupid to run an MP guest system on a single physical processor, but
the above scenario is also present for MP guests running on multiple physical processors.  You
only get around the problem if you perform strict gang-scheduling of all the vCPUs.  The other
option is to add heuristics that avoid preempting guest vCPUs holding spinlocks.

With ticket spinlocks, the problem of preempting lock holders gets worse.  Not only do you
have the problem that preempting a lock holder is bad for performance, but since all the
waiters must be granted the lock in FIFO order, one had better make sure that the different
lock contenders are scheduled in the proper order.  Failure to do so will massively increase
the lock waiting time.  With regular spinlocks there is a chance that any one of the waiting
vCPUs can grab the lock and be done with it quickly.  With ticket spinlocks you don't have
this option.

I would expect that anyone trying to run an MP guest kernel will run into this performance
problem rather quickly (subject to the number of vCPUs and the kernel workload, of course).

Ticket spinlocks and MP-guest kernels

Posted Feb 14, 2008 7:53 UTC (Thu) by goaty (guest, #17783) [Link]

You could get a lovely cascade going. VCPU0 grabs the spinlock, then is pre-empted. VCPU1
spends its whole time slice waiting for the spinlock. VCPU2 is scheduled, and starts queuing
for the spinlock. VCPU0 is scheduled, releases the spinlock, goes to sleep. The host scheduler
looks for another thread to schedule. VCPU1 and VCPU2 have just been busy-waiting so they get
penalised. VCPU3, VCPU4, VCPU5, etc., each get scheduled in turn, each run until they hit the
spinlock in question, and start busy waiting. The cascade can continue until we run out of
virtual CPUs. If the rate at which CPUs manage to acquire the lock is slower than the rate at
which CPUs attempt to acquire the lock, the cascade can continue forever!

Although on a general-purpose system the likelihood that all the CPUs would keep trying to
acquire the same spinlock is pretty small, virtual guests are often used to run specific
workloads. Since all the virtual processors are doing the same kind of work, it is quite
likely that they would all keep contending the same spinlock. Which is exactly the situation
that ticket spinlocks are supposed to help with!

Probably anyone who's running MP guest kernels has already pinned each virtual CPU to a
different real CPU, or is running with realtime scheduling, precisely to avoid this kind of
problem. If not, then ticket spinlocks should provide them with all the motivation they need!
:-)

Ticket spinlocks and MP-guest kernels

Posted Feb 14, 2008 16:38 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

Gang scheduling within the hypervisor would be one way to avoid this issue, and without the
need to lock the VCPUs to real CPUs.

