Let's see, page 139:123...
Fully coherent instruction caches do exist on some systems. Replicated registers for fast
interrupt handling were present in some 8080-based CPUs back in the late 70s and early 80s,
but have fallen out of favor. One might get the same effect on modern multithreaded hardware
by reserving one thread to process interrupts on behalf of the other, though I am not sure
that this would be worthwhile. Double-compare-and-swap is indeed useful, but doesn't exist on
the systems I have access to, despite its appearing in the 68K line quite some time ago.
Large caches, large cacheline sizes, and very high memory bandwidths have reduced the urgency
of partial-register saving, and again, in some cases you could assign processes to hardware
threads -- but perhaps partial-register saving will become more compelling in the future.
Some CPUs have vector operations that do some of the multi-byte operations (e.g., SSE,
3DNow!, AltiVec, ...), and perhaps compilers will become more aggressive about using them.
Some of the fancier
bitwise operations are indeed useful in some contexts, so who knows.
But I do apologize for my extreme bigotry in favor of hardware features that I can use today
across a wide variety of CPU types. ;-)
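For anyone who has not run into double-compare-and-swap, here is a minimal sketch of its
semantics, emulated in C with a lock because the instruction is not generally available.
The name dcas_emulated() and the single global lock are made up for illustration. This is
not how any particular CPU or kernel does it, and it is atomic only with respect to other
callers of the same function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global lock standing in for the atomicity the hardware would provide. */
static pthread_mutex_t dcas_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Emulated double-compare-and-swap (illustrative name, not a real API).
 * If *a == old_a and *b == old_b, store new_a and new_b and return true.
 * Otherwise leave both locations untouched and return false.
 * Only atomic against other callers of dcas_emulated().
 */
static bool dcas_emulated(uintptr_t *a, uintptr_t *b,
                          uintptr_t old_a, uintptr_t old_b,
                          uintptr_t new_a, uintptr_t new_b)
{
    bool ok = false;

    assert(a != NULL && b != NULL);
    pthread_mutex_lock(&dcas_lock);
    if (*a == old_a && *b == old_b) {
        *a = new_a;   /* both comparisons matched: update both words */
        *b = new_b;
        ok = true;
    }
    pthread_mutex_unlock(&dcas_lock);
    return ok;
}
```

A caller would typically read both words, compute the new values, and retry if
dcas_emulated() returns false. The point of the real instruction is that both locations
are checked and updated as one indivisible step, which a pair of single-word
compare-and-swap operations cannot provide.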
The operating systems I work with -do- have priorities, so I don't get to ignore them.
Perhaps more sophisticated scheduling policies will come into vogue. There are certainly
people working on this. In any case, stock Linux running on commodity microprocessors does
context switches orders of magnitude faster than the one-millisecond figure called out on page
142:126 -- old-style Moore's law at work.
Hardware cache-coherence protocols coupled with high software memory contention really can
result in priority-inversion-like effects. I have seen this, logic analyzer and everything.
Would you care to outline your objections in more detail?