Bear in mind Intel processors are arse-about-face (otherwise known as little-endian). On a big-endian processor, there is a clear "top" and "bottom", so we can define forwards and backwards.
But on Intel, let's say I want to write the number 1,234,567,890, and my processor has a 3-digit word size. It physically exists in memory as 890,567,234,1! So where's the top, bottom, front or back?
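To make the 3-digit-word analogy concrete, here is a minimal C sketch. The helper name split_low_word_first is made up for illustration, and it only models the decimal analogy (lowest chunk at the lowest index); it does not inspect real memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split `value` into chunks of size `word`
   (1000 for 3-digit words), least significant chunk first, the way a
   little-endian machine orders the pieces of a multi-word number.
   Returns the number of chunks written to out[]. */
static size_t split_low_word_first(uint32_t value, uint32_t word,
                                   uint32_t out[], size_t cap)
{
    size_t n = 0;
    assert(word > 1 && cap > 0);
    do {
        assert(n < cap);
        out[n++] = value % word;   /* lowest chunk goes to the lowest index */
        value /= word;
    } while (value != 0);
    return n;
}
```

Calling it with 1234567890 and a word of 1000 fills out[] with 890, 567, 234, 1, in that order.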
The other question, of course, is whether the address register increments or decrements faster. There's no reason why those two operations should cost the same (there's no reason why they shouldn't, either :-). And if they differ, the result will be a difference in speed between copying forwards and copying backwards.
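For reference, the two directions look roughly like this in C. This is a byte-at-a-time sketch with made-up function names, not what glibc does, and it says nothing about which one is faster on any given CPU. It does show why direction matters once the buffers overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Incrementing-address copy: lowest byte first. */
static void copy_forward(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Decrementing-address copy: highest byte first. */
static void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n > 0) {
        n--;
        dst[n] = src[n];
    }
}
```

With the buffer "abcdefgh", copying 4 bytes from offset 0 to offset 2 gives "ababcdgh" going backward (the source is read before it is overwritten) but "abababgh" going forward (bytes 2 and 3 are clobbered before they are read).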
Posted Nov 15, 2010 9:47 UTC (Mon) by mpr22 (subscriber, #60784)
I must confess to being utterly boggled by the notion of a backwards block copy (decrementing address) being faster than the forward (incrementing address) version. I mean, doesn't backward copying break the memory controller prefetch?
Glibc change exposing bugs
Posted Nov 15, 2010 11:50 UTC (Mon) by cladisch (✭ supporter ✭, #50193)
Posted Nov 15, 2010 12:15 UTC (Mon) by slashdot (guest, #22014)
That rationale seems a bit dubious.
In particular, won't just doing all reads before all writes ensure no aliasing regardless of CPU operation?
I think there are enough callee-clobbered registers on x86-64 to allow that.
That is, do this:
movq (%rsi), %rax
movq 8(%rsi), %rdx
movq %rax, (%rdi)
movq %rdx, 8(%rdi)
Also, their backward copy obviously aliases if rsi is 0xf00c instead of 0xf004. I'm not sure why either of these cases should be intrinsically more frequent.
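The same "all reads before all writes" idea as the four instructions above, as a rough C sketch. The function name copy16_loads_first is invented here, and the compiler decides what instructions actually come out; the point is only that once both words sit in locals, the result no longer depends on how source and destination overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte copy: load both 8-byte words into locals first,
   then store them. The locals never overlap src or dst, so each small
   memcpy below is well defined even when src and dst overlap. */
static void copy16_loads_first(void *dst, const void *src)
{
    uint64_t lo, hi;
    assert(dst != NULL && src != NULL);
    memcpy(&lo, src, sizeof lo);                            /* read 0..7  */
    memcpy(&hi, (const unsigned char *)src + 8, sizeof hi); /* read 8..15 */
    memcpy(dst, &lo, sizeof lo);                            /* write 0..7  */
    memcpy((unsigned char *)dst + 8, &hi, sizeof hi);       /* write 8..15 */
}
```

Whether dst sits 4 bytes above or 4 bytes below src, the 16 destination bytes end up holding the original 16 source bytes.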