On breaking things

Posted Dec 4, 2010 23:11 UTC (Sat) by nix (subscriber, #2304)
In reply to: On breaking things by RogerOdle
Parent article: On breaking things

Implementation defined means that the property is undefined.
They are distinct and always have been: implementation-defined means that an implementation can choose what to do at that point, undefined means that the program is no longer C and that anything might happen, even before the code invoking undefined behaviour is reached. Quite different.
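To make the distinction concrete, here is a minimal sketch with one example of each category. Right-shifting a negative signed value is implementation-defined, so the implementation must pick a result and document it; signed overflow is undefined, so the only safe course is to avoid it. The names `shift_is_arithmetic` and `checked_add` are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Implementation-defined: the value of a negative signed integer shifted
   right. Each implementation chooses a result and documents it, so this
   function is valid C but may return different answers on different
   platforms. */
bool shift_is_arithmetic(void)
{
    return (-1 >> 1) == -1;
}

/* Undefined: signed integer overflow. There is no result to rely on, so
   the overflow has to be ruled out before the addition is performed.
   Returns false and leaves *sum untouched if a + b would overflow. */
bool checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

Code may branch on `shift_is_arithmetic()` and stay valid C everywhere; there is no equivalent way to ask what an overflowing `a + b` would have done.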

Overlapping memcpy() is undefined. It is not implementation defined: implementations need do nothing sensible when it is executed, and if they do do something sensible, they are setting up a portability trap for everyone who implements anything for the first time on that platform. (Solaris is infamous for this, with a malloc() that allows double-free() and use-after-free() and God only knows what else without a murmur. Take code developed on your Solaris platform and put it on a Linux platform, or a FreeBSD platform, or a Windows platform and *boom*, and everyone blames the Linux/FreeBSD/Windows system: but it is the Solaris system that was at fault, for being tolerant of bad code past sanity.)
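One way to catch this class of bug on a tolerant platform before it bites on a strict one is a debug wrapper that refuses overlapping arguments. This is only a sketch: `regions_overlap` and `checked_memcpy` are hypothetical names, and the address comparison assumes a flat address space where converting pointers to `uintptr_t` preserves their ordering (that conversion is itself implementation-defined).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the n-byte regions starting at a and b share at least one byte.
   The pointers are compared as integers because a relational comparison
   of pointers into different objects is itself undefined. */
bool regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    if (n == 0)
        return false;
    return pa < pb ? (pb - pa) < n : (pa - pb) < n;
}

/* memcpy() that aborts in debug builds instead of silently doing
   whatever the local libc happens to do with overlapping regions. */
void *checked_memcpy(void *dst, const void *src, size_t n)
{
    assert(!regions_overlap(dst, src, n));
    return memcpy(dst, src, n);
}
```

With NDEBUG defined the check compiles away, so the wrapper costs nothing in release builds.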

Might I suggest that next time you learn something about the C standard before you try to argue about it? You only need to read the first dozen pages to get this distinction. This is a radical suggestion on the net, I know, but I think you'll find it worthwhile...


On breaking things

Posted Dec 4, 2010 23:48 UTC (Sat) by RogerOdle (subscriber, #60791)

Please do not lecture me on standards. I am an engineer and I live by them. I have written many specifications and they come back to bite you when you make a mistake so I take extra care not to make them in the first place. When you leave a hole in your specification like saying that it is implementation specific then two people are going to use it differently and one is going to blow up. I do know something about the C standard and in this case the standard leaves one wanting something better.

You cannot deny that the insufficiency of the memcpy standard has caused problems. It is easy to say that programmers should be mindful about copying overlapping memory, but the reality is that they are not, and this is the root of the problem. You can stay in your ivory tower where everything is perfect if you want, but I want things to be easier. I want to get rid of stumbling blocks like memcpy. Even if programmers know about memcpy, they are still going to forget and this issue is going to come up again. It has before and it will again.

I would argue that since memcpy is a core C function, it should work all of the time no matter how the addresses overlap. If someone wants an optimized function then they should look elsewhere. No function should go into the core whose behaviour is "implementation defined". I should be able to use the core functions anywhere and get the same results everywhere. I mentioned before that things would be better if the C standard had memcpy alternatives for copying to lower addresses and for copying to higher addresses. Each of these could be optimized and memcpy could be modified to pick which one to use based on its arguments. If someone wants to optimize performance then they can call these functions directly. But memcpy would always work! If the C standard did this reasonable thing then we wouldn't be having this argument.
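The scheme proposed here can be sketched in a few lines. All three function names below are hypothetical, not part of any standard or libc, and the dispatch assumes that converting pointers to `uintptr_t` preserves address order. Note that the dispatching function ends up with the semantics that standard C already provides as memmove().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies low byte first. Safe when dst is at or below src, even if the
   regions overlap, because each source byte is read before it is
   overwritten. */
void *copy_ascending(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Copies high byte first. Safe when dst is above src. */
void *copy_descending(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        d[n] = s[n];
    return dst;
}

/* Picks a direction from the argument addresses so that overlapping
   regions are always copied correctly. */
void *copy_any(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if ((uintptr_t)dst <= (uintptr_t)src)
        return copy_ascending(dst, src, n);
    return copy_descending(dst, src, n);
}
```

The extra comparison on every call is the cost that a memcpy() with no overlap guarantee is allowed to skip.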

My point all along has been that memcpy has once again given us a black eye. If it isn't fixed this time then next year we will be having this same argument all over again. How about GCC just does us all a favor and throws out a warning whenever memcpy is used?

On breaking things

Posted Dec 5, 2010 3:51 UTC (Sun) by magila (subscriber, #49627)

Implementation-defined and undefined behavior exist for a good reason: without them C could not be as versatile or performant as it is across multiple platforms. Sure, it can be a pain to deal with, but this is a trade-off which lies at the core of C's design philosophy.

If you don't want to deal with ensuring that memory regions don't overlap then you are certainly welcome to use memmove. Although even then, if you are inadvertently copying between overlapping regions there's a chance you'll still have a bug because you really didn't want to clobber the source region.
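As a small illustration of a deliberate overlapping copy, here is a sketch that opens a gap in an array with memmove(). The helper name `insert_at` is hypothetical, and the caller is assumed to have room for one more element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inserts value at index pos of an array currently holding count ints.
   The array must have capacity for at least count + 1 elements. The
   source and destination of the shift overlap, which memmove() is
   specified to handle. */
void insert_at(int *arr, size_t count, size_t pos, int value)
{
    assert(arr != NULL && pos <= count);
    memmove(&arr[pos + 1], &arr[pos], (count - pos) * sizeof arr[0]);
    arr[pos] = value;
}
```

Here overwriting part of the source region is exactly the point. When the overlap is an accident, memmove() copies the bytes correctly but the source data is still gone afterwards.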

Frankly, it sounds like what you want isn't C at all. There are plenty of high level languages which will (attempt to) give you safe and consistent behavior across all supported platforms. Of course, to do this those languages accept a different set of trade-offs from C, but if you really don't want to have to deal with implementation defined or undefined behavior I expect you will find them a better match for your needs.

On breaking things

Posted Dec 5, 2010 20:57 UTC (Sun) by anselm (subscriber, #2796)

I would argue that since memcpy is a core C function, that it should work all of the time no matter how the addresses overlap.

This is wishful thinking. The fact that memcpy() is not guaranteed to work when the source and target ranges overlap has been documented, in an ISO standard no less, for 20 years now. C programmers ignore this at their own peril.

There is a lot to be said for the observation that memcpy() should never have been standardised that way, but that observation ought to have been made before ISO 9899-1990 was finalised. Even if the GNU libc programmers changed their version of memcpy() back to suit your preferences, the fact that ISO C disallows overlapping copies is only going to bite you again on the next C implementation you're going to port your code to.

On breaking things

Posted Dec 13, 2010 20:30 UTC (Mon) by nix (subscriber, #2304)

Exactly. The C Standard will never change in this respect, and even if it *did* it would be many decades before everyone could rely on it and a source of portability nightmares until that day.

Language standards for major languages do not change that easily. (Look up the history of the && versus & precedence rules, and why & is wrong. That was back when there were only a few C installations, and they *still* held off changing it.)
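For anyone who has not run into the precedence trap, a minimal sketch follows. In C, `==` binds more tightly than `&`, so a bit test written without parentheses does not mean what it appears to mean. The function names are made up for illustration, and the second function spells out the surprising parse with explicit parentheses so that it compiles without warnings.

```c
#include <assert.h>
#include <stdbool.h>

/* What the programmer means: are all bits in mask clear in flags? */
bool bits_clear(unsigned flags, unsigned mask)
{
    return (flags & mask) == 0;
}

/* What `flags & mask == 0` actually means: the comparison is evaluated
   first and its 0-or-1 result is then ANDed with flags. */
unsigned bits_clear_as_parsed(unsigned flags, unsigned mask)
{
    return flags & (mask == 0);
}

/* With flags = 4 and mask = 1 the two readings disagree. */
void precedence_demo(void)
{
    assert(bits_clear(4u, 1u));
    assert(bits_clear_as_parsed(4u, 1u) == 0);
}
```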

On breaking things

Posted Dec 5, 2010 12:54 UTC (Sun) by paulj (subscriber, #341)

And GNU libc checks whether the argument passed to printf for a %s placeholder is NULL, even though the C standard says passing NULL there is not allowed. So a lot of code that runs fine on Linux would blow up on Solaris. I think eventually the Sun engineers relented and held their nose and made Solaris libc similarly check.
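Portable code can avoid depending on either libc's tolerance by never handing a null pointer to %s in the first place. A minimal sketch, with hypothetical helper names `str_or` and `format_user`:

```c
#include <assert.h>
#include <stdio.h>

/* Returns s, or fallback when s is NULL, so that the %s conversion
   never receives a null pointer. */
const char *str_or(const char *s, const char *fallback)
{
    assert(fallback != NULL);
    return s != NULL ? s : fallback;
}

/* Example caller: formats a possibly missing name into buf and returns
   whatever snprintf() returns. */
int format_user(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "user=%s", str_or(name, "(null)"));
}
```

The fallback text is chosen here to match what glibc prints, but any string would do.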

So I don't think anyone has full claim to being pure as the snow...

On breaking things

Posted Dec 13, 2010 20:35 UTC (Mon) by nix (subscriber, #2304)

They did, and your statement is true enough (though, as usual, general logging functions still have to guard against unintended NULLs, because where you can get unintended NULLs you can also get wild pointers, and those will crash anything). But as a general principle, glibc is more paranoid than Solaris libc. (Everyone at work moans about this except for me. I celebrate it. It's caught a good few bugs, although less than the saviour of all tricky bugs, valgrind :) )

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds