Having this write-up, to say nothing of an online copy of Massalin's dissertation, would have
saved me much time back when I was doing my own dissertation. ;-) Good to see both now
available!!!
One of the key strengths of Massalin's work is the focus on determining what techniques work
well in given situations, and then matching up techniques with the corresponding situations.
"Use the right tool for the job!"
Although lock-free techniques can be quite valuable for situations requiring real-time
response, as can other non-blocking-synchronization (NBS) techniques, these techniques are not
a panacea. NBS algorithms rely heavily on hardware arbitration, which is usually unaware of
process priorities. This can result in priority-inversion-like effects when the hardware
gives the contended cache line to the low-priority process.
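To make the hardware-arbitration point concrete, here is a minimal C11 sketch of a lock-free counter built on a compare-and-swap retry loop. The names `shared_counter`, `lockfree_increment()`, and `run_contended()` are hypothetical, chosen only for illustration. Whichever CPU gets the contended cache line first sees its CAS succeed. The others fail and retry, and nothing in the loop consults thread priority. The failure counts are only a rough indicator, because the weak CAS form is also allowed to fail spuriously.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

/* Shared counter updated only through compare-and-swap (no lock). */
static _Atomic long shared_counter = 0;

/*
 * Lock-free increment: read the current value, then try to install
 * value + 1 with compare-and-swap. If another CPU changed the counter
 * in between (i.e., won the hardware arbitration for the cache line),
 * the CAS fails, `expected` is reloaded with the current value, and we
 * retry. Returns the number of failed attempts (including spurious
 * failures permitted by the weak form).
 */
static long lockfree_increment(_Atomic long *p)
{
    long failures = 0;
    long expected = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(p, &expected, expected + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
        failures++;  /* `expected` now holds the value another thread wrote */
    return failures;
}

struct worker_arg {
    long iterations;
    long failures;
};

/* Each worker hammers the same counter; priorities play no role here. */
static void *worker(void *arg)
{
    struct worker_arg *w = arg;
    for (long i = 0; i < w->iterations; i++)
        w->failures += lockfree_increment(&shared_counter);
    return NULL;
}

/*
 * Run `nthreads` threads (clamped to 16), each doing `iterations`
 * increments; per-thread CAS failure counts go into failures_out[].
 * Returns the final counter value. Error checks omitted for brevity.
 */
static long run_contended(int nthreads, long iterations, long failures_out[])
{
    pthread_t tids[16];
    struct worker_arg args[16];

    if (nthreads > 16)
        nthreads = 16;
    atomic_store(&shared_counter, 0);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct worker_arg){ iterations, 0 };
        pthread_create(&tids[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        failures_out[i] = args[i].failures;
    }
    return atomic_load(&shared_counter);
}
```

Calling `run_contended(4, 100000, failures)` should leave the counter at 400000 regardless of interleaving. How the failures are spread across the threads is decided by the hardware, which is exactly the part that software priorities cannot control.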
Posted Mar 20, 2008 19:24 UTC (Thu) by olecom (guest, #42886)
> NBS algorithms rely heavily on hardware arbitration, which is usually
> unaware of process priorities.
There are no priorities, and the scheduler is I/O-rate-based and PLL-managed.
On PDF page 139 (dissertation page 123), find some ideas for hardware, and open your eyes
(PDF page 142, dissertation page 126).
> This can result in priority-inversion-like effects when the hardware
> gives the contended cache line to the low-priority process.
Oh, read the whole dissertation, please.
_____
Great writeup!
Posted Apr 3, 2008 19:50 UTC (Thu) by PaulMcKenney (subscriber, #9624)
Let's see, page 139:123...
Fully coherent instruction caches do exist on some systems. Replicated registers for fast
interrupt handling were present in some 8080-based CPUs back in the late 70s and early 80s,
but have since fallen out of favor. One might get the same effect on modern multithreaded
hardware by reserving one hardware thread to process interrupts on behalf of the others,
though I am not sure that this would be worthwhile. Double-compare-and-swap is indeed useful,
but it doesn't exist on the systems I have access to, despite its appearance in the 68K line
quite some time ago.
Large caches, large cacheline sizes, and very high memory bandwidths have reduced the urgency
of partial-register saving, and again, in some cases you could assign processes to hardware
threads -- but perhaps this will become more compelling in the future. Some CPUs have vector
operations that handle some of the multi-byte operations (e.g., SSE, 3DNow!, AltiVec, ...),
and perhaps compilers will become more aggressive about using them. Some of the fancier
bitwise operations are indeed useful in some contexts, so who knows.
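Since DCAS came up, its semantics (as in the 68K CAS2 instruction) can be sketched in C. This illustrates only what the operation does, not how to get it. The hypothetical `dcas()` below fakes the atomicity with a global pthread mutex, which of course gives up the non-blocking property that makes real hardware DCAS attractive.

```c
#include <pthread.h>
#include <stdbool.h>

/* Illustration only: a global lock standing in for hardware atomicity. */
static pthread_mutex_t dcas_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Double-compare-and-swap: if *a1 == old1 AND *a2 == old2, store new1
 * into *a1 and new2 into *a2 as one indivisible step and return true;
 * otherwise leave both locations unchanged and return false.
 * The two locations are independent (they need not be adjacent).
 */
static bool dcas(long *a1, long *a2, long old1, long old2,
                 long new1, long new2)
{
    bool ok;

    pthread_mutex_lock(&dcas_lock);
    ok = (*a1 == old1 && *a2 == old2);
    if (ok) {
        *a1 = new1;
        *a2 = new2;
    }
    pthread_mutex_unlock(&dcas_lock);
    return ok;
}
```

The key property is that the two locations are independent. A double-width CAS such as x86-64's `cmpxchg16b` covers only one contiguous, 16-byte-aligned pair of words, which is why it is not a substitute for DCAS in general.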
But I do apologize for my extreme bigotry in favor of hardware features that I can use today
across a wide variety of CPU types. ;-)
The operating systems I work with -do- have priorities, so I don't get to ignore them.
Perhaps more sophisticated scheduling policies will come into vogue. There are certainly
people working on this. In any case, stock Linux running on commodity microprocessors does
context switches orders of magnitude faster than the one-millisecond figure called out on page
142:126 -- old-style Moore's law at work.
Hardware cache-coherence protocols coupled with high software memory contention really can
result in priority-inversion-like effects. I have seen this, logic analyzer and everything.
Would you care to outline your objections in more detail?