zlib: buffer overflow
Posted Jul 26, 2005 19:29 UTC (Tue) by roelofs (guest, #2599)
Posted Oct 6, 2005 4:22 UTC (Thu) by JoeBuck (subscriber, #2330)
zlib simply isn't that complex, and it is pervasive. It should be possible to analyze the code and prove that there are no remaining buffer overflows (rewriting any parts that are necessary to obtain the proof). Same goes for other pervasive libraries, like JPEG, GIF, and PNG, so people can have rock-solid confidence that viewing attachments isn't a malware vector any more.
Too bad summer's over, it would have been nice to ask Google to sponsor something like that.
Posted Oct 7, 2005 3:38 UTC (Fri) by zblaxell (subscriber, #26385)
13 years ago, in the bad old days of DOS image viewers, you could expect a mangled image file to routinely take down your whole machine. Now we expect non-mangled image files to take over a small part of the machine and do something nefarious with it, without you noticing. Real progress, that. ;-)
Posted Nov 5, 2005 1:46 UTC (Sat) by roelofs (guest, #2599)
I would suggest that it's a tad more complex than you think--and libpng and libjpeg are an order of magnitude worse than zlib. That doesn't mean a complete security audit wouldn't be useful or doable, but such things aren't very fun. You can probably see where this is going. ;-)
Forget security attacks--these bugs can cause application crashes with plain old randomly corrupted data.
But if you want stable/secure, why are you using 1.2.x? 1.1.4 is the stable and well-tested branch. 1.2 is a fairly significant rewrite and has been publicly released for less than two years, if I recall correctly. Those who have switched have done so either because it's also significantly faster than 1.1 or because they don't see the risks as particularly significant (or both). IOW, they did their cost/benefit analyses, either explicitly or implicitly, and their use of zlib 1.2.x is the outcome. But none of that stops you from using, e.g., Gentoo, downgrading to 1.1.4, and recompiling everything to your own tastes.
That's just wrong for open source code that is 13 years old and counting. :-(
Er...that's slightly exaggerated. zlib was born shortly after PNG was, just over 10 years ago.
Posted Nov 6, 2005 0:43 UTC (Sun) by zblaxell (subscriber, #26385)
14 years ago, I was implementing compression algorithms from technical papers or by manual translation from other programming languages. At the time I had little experience with formal software design, algorithm proving, information security, or anything like that--but I had a single computer running one or two dozen processes with no MMU, so bugs would crash not just the application in question, but the entire OS. A slow CPU ensured that I'd have several minutes to think about my last bug while the machine rebooted. I would stare in envious wonder at Unix machines, and dream of having the luxury of a machine that could just say "segmentation fault" and keep running as if nothing happened.
I *assumed* I would screw up the implementation from the start, so I carefully checked pointer validity against things like buffer overruns and uninitialized or invalid accesses. It's really not difficult to prevent buffer overruns if you really want to prevent them--it's really just a matter of picking good loop invariants and then ensuring they're implemented correctly (array bounds checking macros help too). I would think that people who implement compression algorithms would do a better job at this than average coders, but apparently I'm wrong.
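As a rough illustration of the loop-invariant approach described above (this is not code from zlib, nor from the poster), here is a minimal C sketch of a bounds-checked LZ77-style match copy. The names `CHECK_BOUNDS` and `copy_match` are hypothetical, invented for this example; the invariant is that the write position never exceeds the buffer capacity and the read position always lies behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-checking macro: return an error code instead of
 * touching memory outside [0, limit). */
#define CHECK_BOUNDS(index, limit) \
    do { if ((index) >= (limit)) return -1; } while (0)

/*
 * Copy `length` bytes starting `distance` bytes behind the current
 * output position, as an LZ77-style decoder does for a match.
 * Loop invariant: 0 < distance <= pos <= out_cap.
 * Returns 0 on success, -1 if the (untrusted) input would overrun.
 */
int copy_match(unsigned char *out, size_t out_cap, size_t *out_pos,
               size_t distance, size_t length)
{
    size_t pos = *out_pos;

    /* Establish the invariant before entering the loop. */
    if (pos > out_cap || distance == 0 || distance > pos)
        return -1;

    for (size_t i = 0; i < length; i++) {
        CHECK_BOUNDS(pos, out_cap);      /* the write stays inside the buffer */
        out[pos] = out[pos - distance];  /* the read is always behind pos     */
        pos++;
    }

    assert(pos <= out_cap);              /* invariant still holds on exit */
    *out_pos = pos;
    return 0;
}
```

In this sketch a failed call leaves `*out_pos` unchanged, so a caller can treat any nonzero return as "corrupt input" and stop decoding.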
Compression libraries are sometimes very sensitive to performance issues, and can't afford to do assertions everywhere if they are going to be fast on slow CPUs. Sometimes it's possible to optimize the assertion checks. Sometimes optimizing the assertions by hand makes them buggy. On the other hand, sometimes the input is strictly controlled (e.g. database-like applications which never consume any compressed data that they didn't produce themselves) so checking for evil data isn't necessary.
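One way to picture "optimizing the assertion checks" is to hoist a per-byte bounds test out of the inner loop and perform it once up front. The following C sketch is only an assumption about what such an optimization might look like, not zlib's actual code; `copy_match_hoisted` is a hypothetical name. It also shows how easily the hand-optimized form can go wrong: the check is written as a subtraction because the more obvious `pos + length > out_cap` could wrap around for a huge `length`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * LZ77-style match copy with a single up-front bounds check instead of
 * one check per byte. Returns 0 on success, -1 if the copy would overrun.
 */
int copy_match_hoisted(unsigned char *out, size_t out_cap, size_t *out_pos,
                       size_t distance, size_t length)
{
    size_t pos = *out_pos;

    /* The source of the copy must lie inside the data already written. */
    if (pos > out_cap || distance == 0 || distance > pos)
        return -1;

    /* Hoisted check: compare against the remaining space. Subtracting
     * is safe here because pos <= out_cap was verified above. */
    if (length > out_cap - pos)
        return -1;

    /* Unchecked inner loop. It copies byte by byte because source and
     * destination overlap whenever distance < length. */
    for (size_t i = 0; i < length; i++)
        out[pos + i] = out[pos + i - distance];

    *out_pos = pos + length;
    assert(*out_pos <= out_cap);
    return 0;
}
```

Because the check runs before any byte is written, this version rejects an oversized match without modifying the buffer at all.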
On single-user desktops with parallel execution in their CPUs and fast L1 caches, the extra assertion checks necessary to prevent buffer overruns disappear in the cache miss noise. Maybe the solution is to have two parallel zlib implementations: one that is safe, and one that is fast?
Posted Nov 10, 2005 7:55 UTC (Thu) by roelofs (guest, #2599)
I assure you, there's no great need to lecture me on the history of this software. ;-)
The code in gzip and zlib is certainly related, but zlib marked the first major rewrite of it, and as such, it hardly counts as the same code. By your argument, Windows XP (or whatever's current) should be the pinnacle of GUI perfection since its earliest copyrights date to Windows 1.0 in 1985 or whenever. Clearly that's a fallacy. Rewrites are just what they sound like: new code, tempered by lessons learned in earlier implementations, but by no means perfect or even comparably bug-free.
> It's really not difficult to prevent buffer overruns if you really want to prevent them--it's really just a matter of picking good loop invariants and then ensuring they're implemented correctly (array bounds checking macros help too).
Obviously it depends on your goals; you value robustness over all else, but the primary aim of the zlib authors was performance, both in terms of compression efficiency and in terms of speed. Generally speaking, bounds-checking is a significant performance hit--if it weren't, you wouldn't see separate debugging malloc and STL libraries, because that capability would already be built into the standard versions.
> Maybe the solution is to have two parallel zlib implementations: one that is safe, and one that is fast?
Loosely speaking, we already do: zlib 1.1.4 and zlib 1.2.x. Of course, you would presumably argue that even 1.1.4 doesn't do enough bounds-checking and input-validation to be considered truly "safe," and you'd be right. But it's a start.
I thought there was an Ada port of zlib somewhere, but I don't seem to have the link handy, so maybe I'm thinking of something else. But the point is that there are a number of inflate and deflate implementations in various languages, so don't feel unnecessarily constrained to use a library you don't trust.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds