If the data is less than 128 bytes, it can all be read into SSE2 registers first and then written out, which handles overlap fine.
Otherwise, you can just check (size_t)(src - dst) >= (size_t)length, which shouldn't be that expensive compared to the copy itself.
But anyway, why is a backward copy supposed to be faster? It would seem pretty silly to design a CPU such that copies are better done backwards.
Perhaps just converting the new algorithm to a forward copy would give the same improvements?
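The unsigned-difference check mentioned above can be sketched in C roughly as follows. This is only an illustration, not glibc's code: the names backward_copy_is_safe and copy_either_direction are made up for this example, and the byte-at-a-time loops stand in for whatever wide copy a real implementation would use. When the check holds, a backward copy cannot clobber unread source bytes; when it fails, the source starts at or above the destination, so a forward copy is the safe direction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not glibc code): returns nonzero when a backward
 * (high-address-first) copy of `length` bytes cannot overwrite source
 * bytes that have not been read yet. */
static int backward_copy_is_safe(const void *dst, const void *src, size_t length)
{
    /* Compare addresses as integers; subtracting unrelated pointers
     * directly is undefined behaviour in C. */
    uintptr_t distance = (uintptr_t)src - (uintptr_t)dst;

    /* If src is below dst the subtraction wraps to a huge value, so one
     * unsigned compare covers both "src below dst" and "src at least
     * length bytes above dst". */
    return distance >= length;
}

/* Hypothetical dispatcher: picks a direction that is safe for the given
 * pointers, so overlapping buffers end up as memmove would leave them. */
static void copy_either_direction(void *dst, const void *src, size_t length)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (backward_copy_is_safe(dst, src, length)) {
        while (length--)                      /* last byte first */
            d[length] = s[length];
    } else {
        for (size_t i = 0; i < length; i++)   /* first byte first */
            d[i] = s[i];
    }
}
```

The single compare is cheap next to the copy itself, which is the point made above; the price is that memcpy then behaves like memmove, which the C standard does not require of it.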
Glibc change exposing bugs
Posted Nov 11, 2010 11:03 UTC (Thu) by slashdot (guest, #22014)
Posted Nov 11, 2010 12:25 UTC (Thu) by NikLi (guest, #66938)
There is also a big advantage to doing that: in some cases gcc can hopefully detect the alignment of the pointers at compile time and use even faster variants, which is even more important.
At least we hope that the gcc devs will remain sane (inclusion of "go" frontend is scary knowing that google tends to withdraw services and software without much thought (wave, etc))...
Posted Nov 11, 2010 14:49 UTC (Thu) by nix (subscriber, #2304)
Go is not just 'google': Go (in GCC) is Ian Lance Taylor, who is a very-long-standing GCC hacker who doesn't have a record for abandonware (hell, he put out a new release of Taylor UUCP not too long ago, and how old is *that*?)
Posted Nov 12, 2010 14:32 UTC (Fri) by Wol (guest, #4433)
Bear in mind Intel processors are arse-about-face (otherwise known as little-endian). Running on a big-endian processor, there is a clear "top" and "bottom". So we can define forwards and backwards.
But on Intel, let's say I want to write the number 1,234,567,890. And my processor has a 3-digit word size. It actually physically exists in the system as 890,567,234,1 ! So where's the top, bottom, front or back?
The other question, of course, is whether the address register increments or decrements faster. There's no reason why those two operations should be equal cost (there's no reason why they shouldn't be, either :-) And if they're different, the result will be a difference in speed going forwards or backwards.
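For anyone who wants to see the byte order from the 1,234,567,890 example on real hardware, here is a small C sketch. The helper names store_le32 and host_is_little_endian are invented for illustration. The first writes a 32-bit value least significant byte first, which is the layout x86 uses in memory; the second probes whatever machine the code runs on, so its result depends on the host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes `value` into out[0..3] least significant
 * byte first, i.e. the little-endian layout x86 uses in memory. */
static void store_le32(uint32_t value, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(value >> (8 * i));
}

/* Hypothetical helper: returns 1 if the machine running this code keeps
 * the least significant byte at the lowest address, 0 otherwise. */
static int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* read the byte at the lowest address */
    return first == 1;
}
```

1,234,567,890 is 0x499602D2, so store_le32 produces the bytes D2 02 96 49 in increasing address order. Note that byte order within a word is a separate matter from the direction in which a copy loop walks through memory.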
Posted Nov 15, 2010 9:47 UTC (Mon) by mpr22 (subscriber, #60784)
Posted Nov 15, 2010 11:50 UTC (Mon) by cladisch (✭ supporter ✭, #50193)
Posted Nov 15, 2010 12:15 UTC (Mon) by slashdot (guest, #22014)
In particular, won't just doing all reads before all writes ensure no aliasing regardless of CPU operation?
I think there are enough callee-clobbered registers on x86-64 to allow that.
That is, do this:
movq (%rsi), %rax
movq 8(%rsi), %rdx
movq %rax, (%rdi)
movq %rdx, 8(%rdi)
Also, their backward copy obviously aliases if rsi is 0xf00c instead of 0xf004. I'm not sure why either of these cases should be intrinsically more frequent.
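The four movq instructions above, with both loads ahead of both stores, can be expressed in C along these lines. This is a sketch under assumptions: copy16_loads_first is a made-up name, the size is fixed at 16 bytes to match the assembly, and the actual instructions emitted are up to the compiler. What C does guarantee is the result: both 8-byte halves are captured in locals before anything is written, so overlapping ranges come out as memmove would leave them.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies exactly 16 bytes, performing both 8-byte
 * loads before either store, so the result matches memmove even when
 * the two ranges overlap. */
static void copy16_loads_first(void *dst, const void *src)
{
    uint64_t lo, hi;

    /* A fixed-size memcpy into a local is the portable way to express
     * an unaligned 8-byte load; the locals never overlap src or dst. */
    memcpy(&lo, src, 8);
    memcpy(&hi, (const unsigned char *)src + 8, 8);

    /* Only now write; nothing is read from src after this point. */
    memcpy(dst, &lo, 8);
    memcpy((unsigned char *)dst + 8, &hi, 8);
}
```

The same pattern extends to larger blocks only as far as there are registers to hold the data, which is why it suits small fixed sizes rather than arbitrary lengths.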
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds