> I never knew atomic increments could be so expensive!
It would indeed be nice to remove the restriction that memory can only be read or written by the processor, and instead "post" operations to a more intelligent memory unit (memory + cache + simple logic). For example, the main processor could have assembly instructions such as "post increment memory address" or "post xor 0x03 to memory address".
The fact that only the processor(s) can modify memory seems very limiting, IMHO.
Posted Jan 5, 2011 18:52 UTC (Wed) by PaulMcKenney (subscriber, #9624)
One can argue that NUMA architectures associate CPUs with memory. Probably not exactly what you had in mind, but careful allocation of memory and binding to CPUs can allow you to get much of the benefit that you are after.
Paul McKenney's parallel programming book
Posted Jan 6, 2011 17:26 UTC (Thu) by etienne (subscriber, #25256)
NUMA architectures are better suited to running totally different tasks on each processor.
IMHO there is a need to update a word in another processor's cache without all the complex code you are using for RCU or for counting high-frequency events, and without even exchanging the cache line (i.e. in hardware).
Hopefully, in a next-generation CPU, the assembly instruction "post inc eth_counter" will not update the local processor flags, and will send a request to every cache to update this address; if no cache holds the line, it will behave like "inc eth_counter". Likewise, "post or 0x100,rcu_period" could be used to finish a grace period, irrespective of where the rcu_period variable is.
It seems to me that if the solution for incrementing a variable is too complex, it is because the problem is being handled at the wrong level, i.e. a hammer is being used to change the light bulb.
I wrote the initial comment before reading "Quick Quiz 4.10", but I still think some millions of the processor's transistors could be used to reduce the number of values a single variable can hold at the same time...
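For readers wondering what the software approach to counting high-frequency events can look like, here is a minimal sketch of a split (per-thread) counter in C11. It is only an illustration under stated assumptions, not the code from the book: the names thread_counter, count_event, read_count and run_demo are hypothetical, as are the thread count and the 64-byte cache-line size. Each thread increments only its own cache-line-aligned slot, so the fast path needs no locked read-modify-write; a reader sums all slots and gets a value that is only approximate while the writers are still running.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS   4    /* assumed number of counting threads */
#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* One slot per thread, aligned so that no two slots share a cache line. */
struct thread_counter {
    _Alignas(CACHE_LINE) atomic_ulong value;
};

static struct thread_counter counters[NTHREADS];

/* Fast path: only the owning thread writes its slot, so a relaxed
 * load followed by a relaxed store is enough. */
static void count_event(int self)
{
    unsigned long v = atomic_load_explicit(&counters[self].value,
                                           memory_order_relaxed);
    atomic_store_explicit(&counters[self].value, v + 1,
                          memory_order_relaxed);
}

/* Slow path: sum all slots. The result is exact only once the
 * writers have stopped (here: after pthread_join). */
static unsigned long read_count(void)
{
    unsigned long sum = 0;

    for (int i = 0; i < NTHREADS; i++)
        sum += atomic_load_explicit(&counters[i].value,
                                    memory_order_relaxed);
    return sum;
}

struct worker_arg {
    int self;
    unsigned long events;
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;

    for (unsigned long i = 0; i < arg->events; i++)
        count_event(arg->self);
    return NULL;
}

/* Reset the slots, run NTHREADS workers that each count `events`
 * events, wait for them, and return the summed total. */
static unsigned long run_demo(unsigned long events)
{
    pthread_t tid[NTHREADS];
    struct worker_arg args[NTHREADS];
    int rc;

    for (int i = 0; i < NTHREADS; i++)
        atomic_store_explicit(&counters[i].value, 0, memory_order_relaxed);

    for (int i = 0; i < NTHREADS; i++) {
        args[i].self = i;
        args[i].events = events;
        rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return read_count();
}
```

Calling run_demo(100000) from a main() should return NTHREADS * 100000 once all workers have been joined. The trade-off is the one discussed in this thread: the logical counter is spread over several memory locations at once, and a reader has to touch all of them.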
Posted Jan 6, 2011 23:52 UTC (Thu) by cmccabe (guest, #60281)
Well, another way to make things easier for programmers is to implement things like RCU in libraries, which Paul and others have done. The complexity may still be there, but it's hidden from most RCU users.
I'm not sure whether an in-hardware lazy update cache would be more or less difficult for developers to understand. In general, application programmers are horrified by the concept of eventual consistency. The problem from their perspective is not the lack of direct hardware support, but the concept itself. Things like SQL were developed so that they wouldn't have to think about these kinds of ordering issues.
As long as it's just us systems programmers dealing with this stuff, I doubt that Intel or AMD is going to be very motivated to make it "simpler."
Posted Jan 6, 2011 22:44 UTC (Thu) by BenHutchings (subscriber, #37955)