LWN.net Logo

On breaking things

On breaking things

Posted Dec 1, 2010 5:19 UTC (Wed) by RogerOdle (subscriber, #60791)
Parent article: On breaking things

The documentation for memcpy says that memory regions should not overlap but who reads the documentation until something breaks? What should be done now? Considering how wide spread memcpy use is, this is not trivial. Who should determine what the memcpy functionality should be? At some point, enough people may use a function that inertia takes control away from the original writers. Should the writers change the functionality just to make it faster? What else could be done?

Solutions:

1. Go ahead and make the change. The problem is that memcpy works differently on different architectures. This is a bad thing. It results in portability problems.

2. Revert memcpy to they way it worked before and introduce new memcpy alternatives that are optimized. I would suggest one version for copying from higher-memory-to-lower (memcpyhl) and another for lower-memory-to-higher (memcpylh). The original memcpy may be modified so that it is safe for overlapped memory by selecting which optimised version to branch to. This would be compatible with existing applications and provide the faster algorithms for future applications.

3. Change the memcpy spec so that it says explicitly what way it will copy so that it can be made to work the same way on all architectures. This will not fix the current problem but, for the future, memcpy will always work the same way on every platform.

We have seen the effects the change has had on highly visible projects, it has surely effected many more projects. It is not enough to say that the writers had the right to make the change. They also have a responsibility to the people who trust that the behaviour of the library will be consistent. You have to be sensitive to the fact that the change will cost your users time and money. It is not a way to endear friends.


(Log in to post comments)

On breaking things

Posted Dec 1, 2010 14:01 UTC (Wed) by mpr22 (subscriber, #60784) [Link]

The documentation for memcpy says that memory regions should not overlap but who reads the documentation until something breaks?

People who've learned the hard way that reading the manual before you use something is a really good idea. This is, given the way the human brain works, not a lesson that can be reliably taught the easy way, though you may be able to learn it the medium-hard way (from someone else's mistakes, rather than your own).

Who should determine what the memcpy functionality should be?

Well, the accepted answer is not "random idiot programmers", but rather "the ISO and IEC technical committees responsible for maintaining ISO/IEC 9899, the international standard for the C programming language".

ISO/IEC 9899:1990 and its replacement ISO/IEC 9899:1999 both declare some things to be implementation-defined (meaning that the implementation must define and document the behaviour) and some things to be undefined (anything can happen - if your hardware supports nasal demons, triggering undefined behaviour is permitted to summon demons from your nose).

The behaviour of overlapping memcpy is formally undefined in the C standard; this allows creators of conforming implementations to implement it in the way that makes most sense for their platform.

The behaviour of overlapping memmove is clearly defined in the standard. After a memmove, the destination area contains a perfect copy of the source area's original contents, and any part of the source area overlapped by the destination area will be overwritten.

On breaking things

Posted Dec 1, 2010 16:41 UTC (Wed) by RogerOdle (subscriber, #60791) [Link]

Implementation defined means that the property is undefined. Should something other than memcpy have been used. Sure but it wasn't. Now we have a problem and we have to pick a solution to get us out of the current situation. You can argue about what should have been done but one problem you can not deny is that this issue has given the community a black eye. I am sure that some people get some thrill telling big companies to toe the line but we should not let egos dictate our actions. The fact is that the behaviour of a core routine was modified without due consideration or concern for the consequences. There was no warning and no plan in place to deal with the consequences. This left people scrambling and worrying. Would it have been so difficult for the glibc people to produce a glibc version that worked the old way.

I think that this would be a acceptable way to change the behaviour of a function. If a function in libfoo-1.0.so is changed then release it with libfoo-1.0.t.so (t for transition) at the same time. It should be understood that the particular change will only be present for this release. That should give developers time to adjust to the changes while providing them with a solution that allows software in the field to continue to work. If it is simple enough that an environment variable can be set to select the library before an application is run then that is a simple solution for the end user. The present solution for Firefox/Flash is just too technical for most people.

I personally do not like functions like memcpy that blow up like this. memcpy should always have worked regardless of whether the memory regions overlap or not. This issue is not new. It has been a perpetual stumbling block since the beginning. People think that when something has a simple name and a simple task then it should just work. Putting use conditions on something as simple as "memory copy" is just awkward. Even if a developer doesn't violate memcpy requirements, someone else may use a library that uses memcpy which is just further complication.

On breaking things

Posted Dec 1, 2010 19:59 UTC (Wed) by mpr22 (subscriber, #60784) [Link]

It doesn't take long, observing humans, to realize that they will routinely and repeatedly put off doing anything about something that is not yet a clear, present, and immediate problem for them, especially if they can come up with a good reason why someone else should do it instead, and even people who are aware of this phenomenon are guilty of it.

Tell them "your program will break when version x.(y+1) comes out and removes x.y's transitional compatibility with x.(y-1)" and they'll say "ok, I'll fix it when x.(y+1) comes out", then go into a blind panic when x.(y+1) really does come out and really does break their program.

On breaking things

Posted Dec 1, 2010 20:20 UTC (Wed) by RogerOdle (subscriber, #60791) [Link]

That is very true but some people are forward looking though. It doesn't help that their source of income can be changed because a core component works differently after an upgrade. Upgrades are important for security and stability. They can not be ignored. Planning for a change like memcpy is better than hitting people cold.

On breaking things

Posted Dec 2, 2010 9:04 UTC (Thu) by mpr22 (subscriber, #60784) [Link]

People responsible enough to be forward-looking about these things should be expected to have read the documentation for the APIs they use, and to accept that if they used an API contrary to the established international standard, it's 100% their fault and their responsibility when their program breaks.

On breaking things

Posted Dec 2, 2010 9:10 UTC (Thu) by linuxrocks123 (guest, #34648) [Link]

No. Just no. This is like saying, "well, this particular case of a use after free worked before, so changing the implementation of malloc to break this program without warning is irresponsible." No. It's not. There is a clear delineation of responsibilities. That delineation is documented in the relevant standards. That delineation says that if you use memcpy with overlapping regions, you are doing it wrong. This has been the case SINCE THE DAWN OF C. Anyone using memcpy with overlapping regions wrote a buggy program. Anyone whose program is broken now has to fix it. Their programs are buggy; they need to fix the problem.

If you don't hold people accountable, you encourage bad behavior. I'm glad glibc has a maintainer willing to hold people accountable for their bugs.

That being said, if Fedora wants to roll back to the old version until the bug is fixed, that is their business, but I wouldn't recommend holding every other program back for Adobe Flash. Just ship a workaround library for Flash and be done with it.

---linuxrocks123

On breaking things

Posted Dec 4, 2010 23:11 UTC (Sat) by nix (subscriber, #2304) [Link]

Implementation defined means that the property is undefined.
They are distinct and always have been: implementation-defined means that an implementation can choose what to do at that point, undefined means that the program is no longer C and that anything might happen, even before the code invoking undefined behaviour is reached. Quite different.

Overlapping memcpy() is undefined. It is not implementation defined: implementations need do nothing sensible when it is executed, and if they do do something sensible, they are setting up a portability trap for everyone who implements anything for the first time on that platform. (Solaris is infamous for this, with a malloc() that allows double-free() and use-after-free() and God only knows what else without a murmur. Take your Solaris platform and put it on a Linux platform, or a FreeBSD platform, or a Windows platform and *boom*, and everyone blames the Linux/FreeBSD/Windows system: but it is the Solaris system that was at fault, for being tolerant of bad code past sanity.)

Might I suggest that next time you learn something about the C standard before you try to argue about it? You only need to read the first dozen pages to get this distinction. This is a radical suggestion on the net, I know, but I think you'll find it worthwhile...

On breaking things

Posted Dec 4, 2010 23:48 UTC (Sat) by RogerOdle (subscriber, #60791) [Link]

Please do not lecture me on standards. I am an engineer and I live by them. I have written many specifications and they come back to bite you when you make a mistake so I take extra care not to make them in the first place. When you leave a hole in your specification like saying that it is implementation specific then two people are going to use it differently and one is going to blow up. I do know something about the C standard and in this case the standard leaves one wanting something better.

You can not deny that the insufficiency of the memcpy standard has caused problems. It is easy to say that programmers should be mindful about copying overlapping memory but the reality is that they are not and this is the root of the problem. You can stay in you ivory tower where everything is perfect if you want but I want thing to be easier. I want to get rid of the stumbling blocks like memcpy. Even if programmers know about memcpy, they are still going to forget and this issue is going to come up again. It has before and it will again.

I would argue that since memcpy is a core C function, that it should work all of the time no matter how the addresses overlap. If someone wants an optimized function then they should look elsewhere. No function should go into the core whose behaviour is "implementation defined". I should be able to use the core functions anywhere and get the same results everywhere. I mentioned before that things would be better if the C standard had memcpy alternatives for copying from to lower addresses and for copying to higher addresses. Each of these could be optimized and memcpy could be modified to pick which one to use based on its arguments. If someone wants to optimize performance then they can call these functions directly. But memcpy would always work! It the C standard did this reasonable thing then we wouldn't be having this argument.

My point all along has been that memcpy has once again given us a black eye. If isn't fixed this time then next year we will be having this same argument all over again. How about it GCC just does us all a favor and throws out a warning whenever memcpy is used?

On breaking things

Posted Dec 5, 2010 3:51 UTC (Sun) by magila (subscriber, #49627) [Link]

Implementation defined and undefined behavior exists for a good reason, without them C could not be as versatile or performant as it is across multiple platforms. Sure it can be a pain to deal with, but this is trade-off which lies at the core of C's design philosophy.

If you don't want to deal with ensuring that memory regions don't overlap then you are certainly welcome to use memmove. Although even then, if you are inadvertently copying between overlapping regions there's a chance you'll still have a bug because you really didn't want to clobber the source region.

Frankly, it sounds like what you want isn't C at all. There are plenty of high level languages which will (attempt to) give you safe and consistent behavior across all supported platforms. Of course, to do this those languages accept a different set of trade-offs from C, but if you really don't want to have to deal with implementation defined or undefined behavior I expect you will find them a better match for your needs.

On breaking things

Posted Dec 5, 2010 20:57 UTC (Sun) by anselm (subscriber, #2796) [Link]

I would argue that since memcpy is a core C function, that it should work all of the time no matter how the addresses overlap.

This is wishful thinking. The fact that memcpy() is not guaranteed to work when the source and target ranges overlap has been documented, in an ISO standard no less, for 20 years now. C programmers ignore this at their own peril.

There is a lot to be said for the observation that memcpy() should never have been standardised that way, but that observation ought to have been made before ISO 9899-1990 was finalised. Even if the GNU libc programmers changed their version of memcpy() back to suit your preferences, the fact that ISO C disallows overlapping copies is only going to bite you again on the next C implementation you're going to port your code to.

On breaking things

Posted Dec 13, 2010 20:30 UTC (Mon) by nix (subscriber, #2304) [Link]

Exactly. The C Standard will never change in this respect, and even if it *did* it would be many decades before everyone could rely on it and a source of portability nightmares until this day.

Language standards for major languages do not change that easily. (Look up the history of the && versus & precedence rules, and why & is wrong. That was back when there were only a few C installations, and they *still* held off changing it.)

On breaking things

Posted Dec 5, 2010 12:54 UTC (Sun) by paulj (subscriber, #341) [Link]

And GNU libc checks strings passed to printf for a %s placeholder for NULL, when the C standard says this is not allowed. So a lot of code that runs fine on Linux would blow up on Solaris. I think eventually the Sun engineers relented and held their nose and made Solaris libc similarly check.

So I don't think anyone has full claim to being pure as the snow...

On breaking things

Posted Dec 13, 2010 20:35 UTC (Mon) by nix (subscriber, #2304) [Link]

They did, and your statement is true enough (though, as usual, general logging functions still have to guard against unintended NULLs, because where you can get unintended NULLs you can also get wild pointers, and those will crash anything). But as a general principle, glibc is more paranoid than Solaris libc. (Everyone at work moans about this except for me. I celebrate it. It's caught a good few bugs, although less than the saviour of all tricky bugs, valgrind :) )

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds