LWN.net Logo

On breaking things

On breaking things

Posted Nov 25, 2010 3:47 UTC (Thu) by iabervon (subscriber, #722)
Parent article: On breaking things

On the memcpy thing, I have to wonder whether it wouldn't be pretty much optimal on modern processors to implement both memcpy and memmove as the same, using a pair of implementations, each of which works for a different direction of overlap, and either just comparing the pointers to see which way the potential overlap is. If one direction is really substantially faster, you could also check whether there's any overlap and always use that one if there isn't.

That is, I expect that libraries are full of calls to memmove where the library doesn't know that the buffers don't overlap, and so the performance of memmove on non-overlapping buffers matters; and probably a few tests on things in registers aren't going to be noticeable next to a loop and memory accesses.


(Log in to post comments)

On breaking things

Posted Nov 25, 2010 20:59 UTC (Thu) by kleptog (subscriber, #1183) [Link]

Doing memory copies on buffers that overlap is exceedingly rare. As a library it's something you really don't need to worry about. If you're writing code and can't tell if two pointers could overlap, you're doing something wrong.

Another way to look at it is that programs consist of objects. Objects don't overlap because you allocate them that way (they may nest). About the only situation where you do a memory copy on overlapping buffers in when you're shifting objects within an array, and there it's *obvious* that's what you're doing.

It's really not a situation you get into by accident. Deoptimising the common case to save the 0.00000001% (which is an estimate on the high side) of memcpys that are broken is just silly. Valgrind detects this for you, this is really a non-issue.

On breaking things

Posted Nov 26, 2010 15:19 UTC (Fri) by zlynx (subscriber, #2285) [Link]

Compare and branch is an expensive operation. In a lot of cases the memcpy would be finished by the time memmove has figured out which direction to copy in.

On breaking things

Posted Nov 26, 2010 20:21 UTC (Fri) by iabervon (subscriber, #722) [Link]

Is there some architecture where there's a way to do a loop without compare and branch and the cost of the compare and branch doesn't go away when followed by a load with offset? For in-order architectures I've seen, the test is no move expensive than the overhead of one loop iteration; for out-of-order architectures, the processor should know which path it's actually running shortly before it would know which address it's loading anyway. But I confess when I was last looking at out-of-order architecture design was when the Pentium 4 was being designed, so I may be expecting the processor to analyze way too much.

On breaking things

Posted Nov 26, 2010 20:25 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

I also question the expense of the test, all the values needed are already in registers, I would expect that the fact that the CPU is so much faster than the ram would let the test happen without noticably affecting the overall memory copy time (since the memory copy time needs to at least wait for the cache, if not for the main memory)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds