This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 15:24 UTC (Thu) by dwheeler (guest, #1216)
Parent article: Glibc change exposing bugs

This isn't an either/or situation. The glibc folks have a great point that it's absurd to presume that a call preserve some functionality, when it has never guaranteed it and the various documentation available SPECIFICALLY says to not depend on it. But Torvalds also has a point that functionality not officially guaranteed, but depended on by real programs, shouldn't be lightly disregarded.

I think the solution for stuff like this is to phase in major changes, in a slower way. First, clearly document that "it used to work this way in practice, but soon it won't". Implement the new semantics in a "testing" library so that people can test it out before it goes "live", but don't ram it down production systems at first. Document *how* to run the testing situations clearly and obviously; libc_g and friends are essentially impossible to find, even if you know they exist. Then, after some time, switch. Yes, even all this somebody will be caught off guard, but the list of impacts will be a lot shorter (and thus more manageable). Also, if you've warned people, many people will be looking for that kind of problem, making it much easier to identify and fix the stragglers.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 15:36 UTC (Thu) by jwb (guest, #15467) [Link] (1 responses)

I assure you that even if all those pointless hoops had been jumped through, Flash would still have been broken when the switch finally happened. You would simply have been punishing the users and free software developers for nothing.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 19:40 UTC (Thu) by sgros (guest, #36440) [Link]

This is very interesting... people don't read specs, at least not carefully and tend to blindly generalize.

There are so many broken programs because someone tested something in a specific environment and it happened to work in that particular case and that test finishes with the broad conclusion it will always work.

Network is another example. I heard people, writing networking code that directly accesses Ethernet, claim that frames smaller than 46 octets are perfectly OK. Yes, they are, until some user starts using that code in different environment that is strict with respect to specs.

In the end, I'm not for helping bad programs and lazy programmers (lazy in negative, not positive sense!)

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 19:25 UTC (Thu) by xilun (guest, #50638) [Link] (13 responses)

In this case, the semantic has _not_ changed. You just can't pretend from the user side, that nasal deamons are doing predictable things... Semantic has never included observed behavior of random faulty programs, and never will. In C, when you write a faulty construct that has undefined behavior, there is no way to predict what really will happen on a particular system. Even when you know both the exact compiler version and the exact glibc version: any unrelated change in the same file could trigger other optimisation and could change how you program will fail, or even mask completely the failure.

Standard preconditions user have to observe _are_ parts of the semantic, and neither those nor the correct behavior of the glibc has changed when you do observe them, so in no way this particular memcpy optimization is a major change. Actual preconditions are often relaxed in a given implementation, but unless it's documented in an additional standard the way they are relaxed will never be the same between two implementations or two version of the same, so nobody can pretend to reliably take advantage of undocumented relaxed preconditions.

Would that particular memcpy change be considered as a major change, _every_ glibc change would need to be considered as a major change.

In other words, when a language do define from the beginning of time that trying some operations would result in undefined behaviors (and has since always be consistent about this definition), and when a system does not provide further guarantees, then it does not matter what the observed behavior is with version X of the compiler, Y of the libC, and processor Z with die revision T -- changing any of X, Y, Z or T, or even seemingly unrelated parts of the faulty program can result in it to violently explode, and will eventually result in that because of Murphy's law. It will still result in that even if you blame glibc developers for your own mistakes.

Every C programmer should know the distinction between implementation-defined behavior, undefined behavior, and unspecified behavior -- otherwise he should rather program in an other language... You'd also better have some notions about how compilers, sometimes in a way related to associated libraries, can take advantages of explicitly undefined and unspecified behaviors to do some optimization. Stopping to do that would be hugely ridiculous, on a level as ridiculous as stopping to simplify boolean equations by taking advantage of "don't care" outputs, or even stopping to automatically factorize redundant computations.

If you know the difference, but just don't like that C has undefined behaviors, or that C compilers and other associated system stacks are targeting efficient code sometimes by taking advantages of explicitly undefined behaviors, well it's not going to change anytime. So in this case you also don't have any choice: use another language.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 22:34 UTC (Thu) by dafid_b (guest, #67424) [Link] (12 responses)

This thread is quite strange.

On one side there are arguments that a change that is made by free software purists that happens to break pre-existing programs is good - because FLASH is one of the broken programs...

On the other side are arguments that users systems are exposed to corruptions due to changes in the behaviour of a library call made for a marginal optimisation of a utility function.

To put it in perspective: I do not want the software I rely on to have ONE randomly inserted bug activated for a 200% improvement of its overall performance.

That bug could be the one a hacker uses to observe my credit-card details when paying for LWN subscription.

The proposed benefit is 20% of 2%, or 0.4%. The possible cost is my bank account.

I hope that the packagers of the distributions do the sensible thing.

That is: pull that change out and shoot it.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 22:57 UTC (Thu) by dgm (subscriber, #49227) [Link] (6 responses)

It's quite simple: ask Adobe to fix the Flash player, or to open source it, so we can fix it ourselves.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 23:09 UTC (Thu) by dafid_b (guest, #67424) [Link] (5 responses)

did you see this early comment in the thread...

"Actually I think we may have first seen this with squashfs. Problems showed up right before the F14 alpha. Phillip found the cause of the problem was using memcpy instead of memmove."

So there are at least two bugs exposed by this change in Glibc.

There may be more. There are vast number of applications out there still waiting to be tested.

It is just impolite to cause users to do the testing when you don't have to.

Dave

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 4:31 UTC (Fri) by mrshiny (guest, #4266) [Link] (4 responses)

Exactly. And given that Glibc has symbol versioning the performance can be had already for anyone who recompiles their code. Remember: this bug exists for all users of glibc even if they compiled their apps a long time ago but recently updated glibc. The fact that Flash is proprietary is slightly irrelevant to the discussion.

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 15:14 UTC (Wed) by meuh (guest, #22042) [Link] (3 responses)

Remember: this bug exists for all users of glibc even if they compiled their apps a long time ago but recently updated glibc.

It's not a bug. And it doesn't affect all users.

Hopefully, legitimate uses (regarding to specification) of memcpy() are not affected by the optimisation in newer glibc.

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 15:52 UTC (Wed) by mrshiny (guest, #4266) [Link] (2 responses)

Yes, it is a bug. Sure, the application is responsible for using APIs properly. But here we have a situation where a library has worked one way for years, and then suddenly works a different way. There was no way for the apps in question to detect the bugs because the code worked perfectly before. Now, due to a library upgrade, those apps don't work. In some cases there is data corruption. The corruption might happen silently. There is no way to be sure that this change is not quietly damaging untold amounts of data without auditing every use of memcpy everywhere to ensure that it is doing the right thing.

And this means that not only do you have to fix all source code which is wrong and issue new binaries, but you shouldn't upgrade to this version of Glibc because you might have an app somewhere that wasn't fixed, or isn't fixed in the version you have installed.

Glibc is a critical library in the system. Almost every program uses it. As such, it is their responsibility to treat ABI changes very carefully. Sure, this is not a change in the specification, it is an unintended consequence and it's due to those stupid lazy programmers who didn't read the spec or didn't care or whatever. Or inadvertently introduced errors when their code was changed. Or changed something without realizing that this change would result, somewhere, in a call to overlapping memcpy. Given that the bug was hard to identify (at least for some cases), and given that Glibc has symbol versioning, maybe they should use it?

Your last sentence sums up the problem: "Hopefully legitimate uses are not affected". I think we should expect stronger guarantees from glibc than "hopefully".

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 16:09 UTC (Wed) by meuh (guest, #22042) [Link]

Hopefully, every programmer run their programs under valgrind once in a while.

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 16:23 UTC (Wed) by xilun (guest, #50638) [Link]

> There is no way to be sure that this change is not quietly damaging untold amounts of data without auditing every use of memcpy everywhere to ensure that it is doing the right thing.

There is also no way to be sure that this change is not _fixing_ untold amounts of data corruption when the memcpy is done backward without auditing every use of memcpy :)
Anyway, C being what it is, this is a little ridiculous to do a fixation on that particular change, because some other changes exposing bugs are done every day, hundred at a time. So really, you have no way after ANY upgrade to be sure that memory corruption won't mysteriously happens when they previously did not. If that's a problem for you, don't ever update anything => problem magically solved.

> given that Glibc has symbol versioning, maybe they should use it?

Nope. Symbol versioning is for ABI changes, and symbol versioning does not even pretend to automatically solve every problem ABI changes has been shown to cause. The memcpy implementation change is not even an ABI change.

> Your last sentence sums up the problem: "Hopefully legitimate uses are not affected". I think we should expect stronger guarantees from glibc than "hopefully".

There was a problem only in the sentence. The "hopefully" is not needed. Legitimate users of memcpy will not be affected.

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 0:08 UTC (Fri) by xilun (guest, #50638) [Link] (4 responses)

> On one side there are arguments that a change that is made by free software purists that happens to break pre-existing programs is good - because FLASH is one of the broken programs...

I fail to see how my previous post, which you replied to, is in any way related to free software purist happy to break Flash.

Indeed I think you did not even read it.

So I'll make an executive summary (but with new elements, for those who follow): in http://www.coding-guidelines.com/cbook/cbook1_2.pdf ; read, starting at pdf page 183, 3.4 behavior, 3.4.1 implementation-defined behavior, 3.4.3 undefined behavior, and 3.4.4 unspecified behavior. You will then hopefully understand why it would indeed be *dangerous* for security (not even talking about performance) in the long term if a widely used implementation starts giving guarantees defining "undefined behaviors", or if the maintainers of such implementation start acting like there seems to be some guarantees. (Think about other compliant implementations.)

If you don't like implementation-defined, undefined and unspecified behaviors in programming languages, use Java. I'm indeed starting to wonder if Linus does not secretly dream about writing operating systems in Java -- look at: some of his responses during the NULL-page mapping debacle, GCC adding optimizations taking advantages of undefined behaviors on integers, and his position on this memcpy implementation.

> On the other side are arguments that users systems are exposed to corruptions due to changes in the behaviour of a library call made for a marginal optimisation of a utility function.

Users systems are exposed to corruptions because they wrote code having undefined behavior in the first place, and there should be neither surprise nor scandal when code containing faulty constructs having undefined behavior starts to behave in an undefined way, because that's precisely the definition of what "undefined behavior" means.

Undefined behavior could has well change observable behavior depending on your power supply, the phase of the moon, and the fact Linus has been personally annoyed by a random bug (the last cause being the most probable in those examples, which is a little weird from an economical perspective, but oh well). Blame glibc maintainers all that you want, but you'll soon have multiple targets when the next advance in GCC expose other bugs caused by other undefined behaviors.

> To put it in perspective: I do not want the software I rely on to have ONE randomly inserted bug activated for a 200% improvement of its overall performance.

Under which perspective bugs are not "randomly" inserted? (given a non malicious intent in the first place). Would you be OK with "ONE randomly inserted bug activated" because of a change for support of a new hardware, or a new feature. Do you realize that even a bug fix can activate other bugs? Do you realize that you can easily avoid all that kind of trouble by NOT upgrading your system library ever, if you really want to? Do you understand that optimizations made at system level follow a different economic than optimizations made at application level? Do you understand that compiler/library evolution have participated in the moore law, and that your computer would be maybe 4x slower or produce 4x more heat if we still were in the naive compiler era and if low level layers had not been updated to be efficient with modern processor architecture?

> The proposed benefit is 20% of 2%, or 0.4%. The possible cost is my bank account.

Given the nature of the memory corruption, very unlikely to have that kind of security impact (but not 100 % impossible).

What is really funny is that even without the incriminated patch, the memcpy was NOT a memmove (fortunately). This particular Flash call resulted in corrupted data when copying memory in reverse order because the pointers was in a specific order, and the area overlapping. Calling memcpy with the previous GLibc implementation, and probably 99% of implementations existing on earth, will still result in data corruption if memory area are overlapping in the other order.

So I suggest you run your whole system LD_PRELOAD'ing all processes with a library that calls memmove instead of memcpy, if you are worried too much about that.
I also suggest that you immediately start looking for other bugs more susceptible to have big security impacts than this class, and that you also workaround them in weird way instead of fixing them correctly in the first place.
Maybe it would indeed be easier that you take a really old distribution, with a compiler that does very few optimization, and a very simple libc, and stick to it forever. (Well, you'll still have to do the memcpy/memmove replacement trick, but you'll have very few optimizations, so I guess that will make you happy.)

And oh, I forget to tell you: randomly defining "undefined behaviors" without auditing every components involved in both the system and its construction can sometime expose bugs with an high security impact. See the NULL page mapping debacle.

> I hope that the packagers of the distributions do the sensible thing.
> That is: pull that change out and shoot it.

Yeah, they all are reading LWN comments, waiting for your enlightenments.

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 1:10 UTC (Fri) by dafid_b (guest, #67424) [Link] (3 responses)

Firstly, I am sorry, I wrote as if I was posting a new post at the top level but posted to it as a reply to your post.. and then corrected by reposting clarified thoughts at the top level - not the right way to do things.

However, I think that the conversation we are having in this thread is a bit disjointed because when I say 'user' I mean a 3rd party to this conversation - not the Flash developer, not the Glibc developer, nor the crushfs developer.. but a simple user:). Whereas I think you read my user as 'developer'.

In the later post I liken (the knowing continued) delivery of this change to Glibc to mugging the person (user) who is near (uses the software written by) a jay-walker(developer who used undefined behaviour that used to work in the past).

That does not seem very fair to the user. It is sure to convince most users to stop being Linux users if the change does cause a security issue to happen - and they find out that it was a deliberate choice.

I think a better policy would be to mug the developer (send the crash reports, mocking messages in the trade press, or whatever).

This could be done by putting an intercept layer between Glibc in system tests that any user could load - at a known performance cost - that logs such violations of API requirements.

I would be happy, ecstatic even, to take part in such a mugging, when I am not doing my banking on the system.

Thanks for pointing out my other mistake - I should have said 'randomly activated bugs' rather than 'randomly inserted bugs' - as from both the end-user and developer perspective that is what is happening.

On your other points, I agree. However I think that the problem the points address is developer behaviour, and the person you mug is the user.

end user should not be punished, depending of distro target

Posted Nov 12, 2010 2:28 UTC (Fri) by xilun (guest, #50638) [Link] (2 responses)

Even according to my position that the glibc has 0 responsability in the way Flash misbehaves, I agree with you that the end user should, when possible, not be punished. I think this is clearly the job of distributions, that can identify, given their target and if they have the occasion and resources, which interaction exposing a bug in some piece of software is acceptable, and which is not. Maybe some distributions will indeed make the choice of temporarily reverting this optimisation, but I hope Adobe will simply have enough time to fix their stuff before lot of distro are released, and if the timing is OK maybe the distributions could pressure Adobe by sending them a clear message that they will ship with the best technology anyway, so that it's up to Adobe to cleanup their mess if they want that flash continue working correctly under distro X. If the timing is bad, I would clearly prefer an automatised LD_PRELOAD style work around (if done transparently for the end user) to a whole system disabling of an optimisation.

But I would not even be angry against a distribution that makes the choice to not care at all about Flash. I perfectly understand that some can absolutely not care about Flash, in which case an angry user should just do a workaround himself or switch to an other distro, if he indeed is not in the target of the one he used.

end user should not be punished, depending of distro target

Posted Nov 12, 2010 5:03 UTC (Fri) by dafid_b (guest, #67424) [Link]

we are in agreement mostly.

end user should not be punished, depending of distro target

Posted Nov 12, 2010 19:28 UTC (Fri) by charlieb (guest, #23340) [Link]

> Maybe some distributions will indeed make the choice of temporarily
> reverting this optimisation

And wouldn't it be nice if Fedora 14 were to do this :-)

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 23:04 UTC (Thu) by dgm (subscriber, #49227) [Link] (1 responses)

How much should the changes be delayed? A year? two?
By that time maybe the targeted machines will no longer exists. Remember those processors are not exactly new (Core2 and Atom).

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 18:26 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]

And they are certainly not old, either. This optimization will be relevant for 3-5 years, at least. And the answer to your question is 6 months. ;)

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 0:10 UTC (Fri) by bojan (subscriber, #14302) [Link]

> Implement the new semantics in a "testing" library so that people can test it out before it goes "live", but don't ram it down production systems at first.

We are talking about brand new Fedora release, with yet unreleased version of glibc 2.12.90. At some point software has to get shipped in order to get tested by real users.

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 11:06 UTC (Fri) by marcH (subscriber, #57642) [Link]

> I think the solution for stuff like this is to phase in major changes, in a slower way.

This would be a way too much reasonable and professional approach. It would have a very high risk of less flamewars.