Glibc change exposing bugs
Glibc change exposing bugs
Posted Nov 10, 2010 21:17 UTC (Wed) by lmb (subscriber, #39048)In reply to: Glibc change exposing bugs by jwb
Parent article: Glibc change exposing bugs
Posted Nov 10, 2010 21:52 UTC (Wed)
by nybble41 (subscriber, #55106)
[Link] (33 responses)
The restrictions on memcpy() are hardly unique; *most* APIs do not tolerate overlapping memory regions. The memmove() routine is an exception. If you want a nice "safe" way to copy some data between buffers which may or may not overlap, and don't care so much about performance, just use memmove() everywhere.
While forward compatibility is a good thing in general, it is unreasonable for API developers to feel bound to support obvious *misuse* of their APIs which directly contradicts explicit API documentation, which is exactly what is happening here. Given that any broken applications can be trivially patched with a simple LD_PRELOAD, I see no reason not to permit this change to the internal implementation of memcpy() in glibc.
Posted Nov 10, 2010 22:18 UTC (Wed)
by lmb (subscriber, #39048)
[Link] (32 responses)
But the code worked so far - when users upgrade their glibc, suddenly their programs break, or possibly corrupt the user's data. How's that good?
Yes, it pushes users to complain to the developer (if they still can, and their e-mail/internet bits aren't affected), but it leaves them with a bitter aftertaste for the platform/ecosystem that forces developers to fix bugs at their users's expense.
The code should start with emitting a warning to the logs (once per program run, otherwise it becomes a DoS). The compiler could start warning if it detects the possibility of this happening (or coverity/valgrind etc all can). Possibly taunt developers publicly if you spot those messages in your logs.
But breaking underneath an unsuspecting user? Horribly, horribly wrong.
Posted Nov 10, 2010 22:34 UTC (Wed)
by jwb (guest, #15467)
[Link] (29 responses)
Posted Nov 10, 2010 23:59 UTC (Wed)
by lmb (subscriber, #39048)
[Link] (24 responses)
Knowingly introducing a change with consequences that aren't just mere crashes, but data corruption, for end users - if you cannot see how that is wrong, I have no idea how to help explain it.
Yes, of course, the performance achievement is worth having. However, not at this cost. Not without first adding some audit logging. Not without giving developers time to fix that. It is an incompatible change in the ABI.
I can read the man page as well as you can. Sure, the applications are buggy; that does not give one the right to corrupt user data. Such changes need to be phased in carefully; not in a "I am more righteous than you" style. It is bad enough when it happens by accident; intentionally, it is malpractice.
Posted Nov 11, 2010 0:05 UTC (Thu)
by bojan (subscriber, #14302)
[Link] (2 responses)
Posted Nov 11, 2010 16:35 UTC (Thu)
by jedbrown (subscriber, #49919)
[Link] (1 responses)
Posted Nov 12, 2010 0:02 UTC (Fri)
by bojan (subscriber, #14302)
[Link]
glibc-2.12.90-18.x86_64
This is an unreleased version of glibc. Fedora does this from time to time - ship early cuts of new glibc (this will be 2.13 one day).
Posted Nov 11, 2010 0:31 UTC (Thu)
by jwb (guest, #15467)
[Link] (14 responses)
You are proposing that it would be wise to have a test and branch at every entry into memcpy? That is madness.
Posted Nov 11, 2010 1:52 UTC (Thu)
by gus3 (guest, #61103)
[Link] (13 responses)
if ((p1 + length <= p2) || (p2 + length <= p1)) {
It is not a sophisticated test, and the more noise it makes about buggy parameters, the sooner the calling code will get fixed.
Posted Nov 11, 2010 2:05 UTC (Thu)
by gus3 (guest, #61103)
[Link] (10 responses)
if ((p1 + length >= p2) || (p2 + length >= p1)) {
But goofing the test, doesn't mean the test isn't simple.
Posted Nov 11, 2010 2:12 UTC (Thu)
by gus3 (guest, #61103)
[Link] (9 responses)
I see the actual test in my head, but I can't code it right now due to fatigue. But even with all necessary calculations, being integer math, it'll take no more than a few tens of cycles. Even on a register-starved x86, putting a couple temporary variables on the stack will only pollute the cache, before over-writing the temps anyway. It shouldn't take more than a microsecond to check for overlap.
Posted Nov 11, 2010 7:24 UTC (Thu)
by nix (subscriber, #2304)
[Link] (6 responses)
There is absolutely no chance that the glibc devs would ever accept this except in the -lc_g version of the library (which nobody ever uses).
Posted Nov 12, 2010 23:04 UTC (Fri)
by gus3 (guest, #61103)
[Link] (5 responses)
Posted Nov 14, 2010 22:29 UTC (Sun)
by nix (subscriber, #2304)
[Link] (4 responses)
This isn't an obscure or hard-to-interpret part of the Standard. Undefined, bang, that's it. Perhaps you are operating under the misapprehension that the linux manpages project, a descriptive effort, not a prescriptive one, is in some way binding on glibc? It isn't. It really isn't. It isn't binding on anything.
Posted Nov 15, 2010 0:00 UTC (Mon)
by promotion-account (guest, #70778)
[Link] (3 responses)
What on earth? The relevant documentation for memcpy() is ISO C, incorporated by reference into all versions of POSIX.1.
Indeed.
Linux man-pages are only authoritative for the kernel system-calls (more precisely, their glibc thin layer). The rest of the APIs are only included for convenience: they are a secondary source to the primary source references residing in the 'CONFORMING TO' section.
Posted Nov 15, 2010 0:31 UTC (Mon)
by nix (subscriber, #2304)
[Link] (2 responses)
Posted Nov 15, 2010 1:27 UTC (Mon)
by promotion-account (guest, #70778)
[Link] (1 responses)
For good or bad, these manpages are the 'most primary' sources available for such topics, only beside the code.
But unfortunately these man-pages do not always exist. I once had to carefully study the bluez userspace code to know how to best interface with the kernel Bluetooth API (undocumented AF_BLUETOOTH sockets, undocumented netlink interfaces, etc).
Posted Nov 15, 2010 10:38 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Nov 11, 2010 18:52 UTC (Thu)
by oak (guest, #2786)
[Link] (1 responses)
Posted Nov 15, 2010 0:14 UTC (Mon)
by promotion-account (guest, #70778)
[Link]
memmove() has this check you're clamoring for... And if the given areas don't overlap, it calls memcpy().
Sometimes even if the areas do overlap, it calls memcpy(). This happens if the library has an internal knowledge about memcpy()'s copying direction.
A common example is having src > dst, copying is forward, and the CPU block transfer unit is smaller than or equal to (src - dst). x86-64 CPUs support copying up-to 8-byte blocks in one opcode (movsq), assuming no floating-point ops in use, which is usually the case with kernel code.
Posted Nov 12, 2010 6:34 UTC (Fri)
by jmm82 (guest, #59425)
[Link] (1 responses)
Posted Nov 12, 2010 22:56 UTC (Fri)
by gus3 (guest, #61103)
[Link]
Posted Nov 11, 2010 11:13 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (5 responses)
No it is not, unless "defined-undefined behaviour" has now become part of Interfaces.
By using the wrong name you are trying to sidestep all the nuances of this problem. Unfair tactics lowering your credibility.
Posted Nov 11, 2010 12:52 UTC (Thu)
by nye (subscriber, #51576)
[Link] (4 responses)
Deterministic observed behaviour, like it or not, will always be considered a part of the ABI.
This is why the kernel goes out of its way to preserve observe but undocumented behaviour, and one of the reasons Windows is wildly successful despite its numerous design flaws is that Microsoft agrees.
If a change breaks existing software, then it's a regression. Hand-wringing, finger-pointing, and bitter recriminations about 'proprietary crapware' are all irrelevant. Something worked. Now it doesn't.
From the comments on this it sounds like symbol versioning could be used to avoid this problem altogether, while still getting the benefit for newly built applications. Developers don't want to do this because they feel that it will benefit only proprietary software[0]. Of course the only people harmed by this attitude are end users.
This is just yet another case where open source software chooses politics over technical excellence, which is sad but entirely unsurprising.
[0] Disregarding the idea that one might want to use some open source software with a similar bug that hasn't yet been fixed - most developers seem to always want to run the latest bleeding edge version of everything, and don't understand that the rest of the world isn't like that and expects existing software not to break unexpectedly.
Posted Nov 11, 2010 18:27 UTC (Thu)
by donwaugaman (subscriber, #4214)
[Link]
Oddly enough, I would consider "technical excellence" to mean fixing bugs in software that has them, in this case the Adobe Flash player, whereas "politics" means allowing poorly-written precedent to trump (and in this case penalize) better performance for programs written with an eye to the standard.
It's a shame there's no way to get Adobe to do an 's/memcpy/memmove/' on their codebase. But the fact that they won't let others do it has more to do with their politics (and opposition to software freedom) than about technical excellence.
Posted Nov 11, 2010 18:32 UTC (Thu)
by xilun (guest, #50638)
[Link] (2 responses)
Nonsence. The definition of the ABI is NOT any random characteristic that would please you by making others responsible for your own errors.
First if you want to program in a language (and its associated standard library) that is not full of undefined behaviors, then you don't program in C.
If you do program in C, *YOU* are responsible for respecting preconditions. The system will *NOT* magically fix your bugs for you. Glibc developpers are not responsible for the Flash software package, and this is not a free software or not free software problem; they are also not responsible for other random piece of free software, even crappy ones.
You are inversing roles, given the bad one to the developpers of the very high quality, standard compliant, piece of code the is glibc, and the good one to the constant notable piece of crap that is the flash player.
And even if you could be a dictator for the glibc project, please explain us what is the politic you would then impose to *concretely* solve the generalized unattended interactions problem between software components. Nobody has ever solved that. Even Microsoft, which you seems to cherish so much, does for *years* (maybe even decades now) ask third party developers to ship the MS libc the third party developer test his application with. So of course most of the time you don't have this problem of a suddenly changing libc under MS Windows, because each program has its own libc. Now what happens in case some version contains a security exploitable flaw? Security effort are duplicated in such an environment. (And this is just an example.)
Imagine a random application depends on a BUG of a particular version of the glibc. Because of that, you are asking for this bug to remain forever? Nonsense. This is what you call "technical excellence"? What a joke. It would mean freezing libraries forever, because that's the only way you can guarantee in the shared library model that the behavior of any random piece of crappy software won't change too much.
If you're a third party developer who only cares about making your proprietary app sort of working even when you write highly faulty unacceptable quality code, please: 1/ ship it with frozen version of the library it needs, like you do under windows anyway 2/ leave the developers of libraries that are trying to improve them alone, and especially do not report YOUR mistake on them.
Posted Nov 12, 2010 2:50 UTC (Fri)
by gmatht (guest, #58961)
[Link] (1 responses)
This seems almost a strawman. Criticize Microsoft's policy if you want, but all nye suggested was using symbol versioning for this particular known-to-be-dangerous change. This wouldn't cause any security issues (and may even avoid some).
I agree that it isn't possible to avoid every regression. For example, newer software often has performance regressions on old hardware. However, this seems like a particularly serious regression, so if there was an easy way to stop old versions of software silently corrupting data it may be worth taking.
Posted Nov 12, 2010 11:13 UTC (Fri)
by xilun (guest, #50638)
[Link]
Posted Nov 11, 2010 0:00 UTC (Thu)
by nix (subscriber, #2304)
[Link] (3 responses)
The same will happen here.
Posted Nov 11, 2010 0:08 UTC (Thu)
by lmb (subscriber, #39048)
[Link] (2 responses)
Of course it will cause the code to be fixed. But that the maintainers of the core system library place "I am right" above users's data is a worrying insight.
Posted Nov 11, 2010 0:26 UTC (Thu)
by bojan (subscriber, #14302)
[Link]
Now you're making glibc maintainers responsible for other people's bugs. They are not.
If this same buggy program was linked against some other library that implements memcpy() similarly to the way latest glibc does, the data would be just as corrupt.
In essence, it is the program that is corrupting the data, not glibc. And it's doing so by clear misuse of a function.
> But that the maintainers of the core system library place "I am right" above users's data is a worrying insight.
I think that's a bit overly dramatic. Fedora 14 is a fresh release, currently carrying a non-released version of glibc. As such, users of it (which includes me) sometimes encounter things that are surprising at first. But the audience is limited and the impact is not earth shattering.
Posted Nov 11, 2010 10:53 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Nov 10, 2010 22:55 UTC (Wed)
by clugstj (subscriber, #4020)
[Link]
Posted Nov 10, 2010 23:56 UTC (Wed)
by nix (subscriber, #2304)
[Link]
Emitting a warning to the logs is far too expensive: this stuff is so often called that the compiler sometimes open-codes it! Adding a conditional in there that isn't absolutely needed would have horrible effects on performance.
(And as for detecting it at compile time, well, sure! It requires whole-program optimization of every single program and all its shared libraries, and even then detecting it reliably reduces to solving the halting problem. This seems to be rather harder than just valgrinding the bloody thing and learning elementary C before you write it.)
(Sure, there will be actual bugs teased out by this: code that didn't expect to receive overlapping regions when it was written, but that now is. But, guess what? I bet those overlapping copies were causing other bugs, because it is surely rare for code to just memcpy() from region A to *unexpectedly*-overlapping region B and then never do anything with A again.)
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
glibc-2.12.90-18.i686
Glibc change exposing bugs
Glibc change exposing bugs
crash_and_burn();
}
Glibc change exposing bugs
crash_and_burn();
}
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
The GNU people should own up to having violated the documentation on their code.
What on earth? The relevant documentation for memcpy() is ISO C, incorporated by reference into all versions of POSIX.1. This clearly states "If copying takes place between objects that overlap, the behavior is undefined."
Glibc change exposing bugs
Glibc change exposing bugs
Linux man-pages are only authoritative for the kernel system-calls (more precisely, their glibc thin layer).
No, even those are descriptive. Perhaps the glibc texinfo documentation would be authoritative for that, if it was ever maintained by anyone. As it is, I think only Ulrich and Roland's brains are authoritative for glibc.
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
I'm not sure nye meant all regressions need to be avoided.
I'm not sure nye meant all regressions need to be avoided.
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs
Glibc change exposing bugs