Mayhem finds 1200 bugs

By Jake Edge
July 3, 2013

The reporting of 1200 bugs, some of which may have security implications, is sure to overwhelm any distribution's bug handling abilities. So it was rather helpful that Alexandre Rebert started out by posting to the debian-devel mailing list rather than just flooding the bug tracker. Beyond the sheer number of bugs, though, there is a question of dealing with so many potential security issues, which are generally handled differently than regular bugs. Rebert and other security researchers at Carnegie Mellon University (CMU) found the bugs in binaries from the Debian repositories using an automated bug finder called Mayhem [PDF].

Mayhem is a closed-source research project at CMU CyLab that uses symbolic execution on binary programs to find exploitable bugs in the code. It does its job by looking for load and store instructions that can be influenced by the inputs to the program. It examines the paths through the program using a "hybrid symbolic execution" mechanism that combines normal execution of the program with symbolic execution of an intermediate language representation that is created whenever a tainted (i.e. dependent on user input) branch condition is detected. The symbolic execution looks for ways to exploit the tainted code and builds an exploit if it can. The Mayhem paper goes into a lot more detail, perhaps enough for others to reproduce the technique.
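As a rough illustration of the first step described above (spotting stores whose address can be influenced by program input), here is a minimal Python sketch of taint tracking over a toy instruction set. It is not Mayhem's code, which has not been published; the instruction format, the `Value` class, and the `find_tainted_stores` function are all invented for this example, and the symbolic execution and exploit generation stages are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A concrete integer plus a flag recording whether input influenced it."""
    concrete: int
    tainted: bool = False

    def __add__(self, other: "Value") -> "Value":
        # Taint propagates: the sum is tainted if either operand is tainted.
        return Value(self.concrete + other.concrete,
                     self.tainted or other.tainted)


def find_tainted_stores(program, user_input):
    """Run a toy program; return the indexes of stores with tainted addresses.

    Hypothetical instruction format (an assumption of this sketch):
      ("const", dst, n)        dst = n
      ("input", dst, i)        dst = user_input[i], marked tainted
      ("add", dst, a, b)       dst = a + b
      ("store", addr, val)     memory[addr] = val
    """
    regs = {}
    memory = {}
    flagged = []
    for pc, insn in enumerate(program):
        op = insn[0]
        if op == "const":
            _, dst, n = insn
            regs[dst] = Value(n)
        elif op == "input":
            _, dst, i = insn
            regs[dst] = Value(user_input[i], tainted=True)
        elif op == "add":
            _, dst, a, b = insn
            regs[dst] = regs[a] + regs[b]
        elif op == "store":
            _, addr, val = insn
            memory[regs[addr].concrete] = regs[val].concrete
            if regs[addr].tainted:
                # The destination address depends on user input, so this
                # store is a candidate for closer (symbolic) analysis.
                flagged.append(pc)
        else:
            raise ValueError(f"unknown op: {op}")
    return flagged
```

For a program that adds an input byte to a base pointer and then stores through the result, only that store is flagged; a store through the untouched base pointer is not.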

The bugs are "exploitable" in the sense that each crash can execute arbitrary code. While code execution bugs are serious, the programs in question are typically run by regular users from the shell, so being able to get a shell (which is the usual proof of concept used by demonstration exploits as well as by Mayhem) is not a huge accomplishment. But being able to get a shell means that an exploit could do anything the user could do, including exposing or deleting files, participating in a botnet, sending spam, and so on. The exploits require specially crafted arguments and/or input files to trigger the bugs, so users would have to be tricked into running the programs that way.

Of course, any setuid programs or those accessible via the web or other internet services are a much larger concern. That's not to downplay what the Mayhem team has done in any way, but fuzzing has shown us that arbitrary inputs to programs often lead to crashes—the trick is finding a way to get users to provide crafted inputs that lead to an interesting (to the attacker) result. Regardless, the bugs do need to be fixed, and the Mayhem team has provided a wealth of information to do just that.

Each bug report comes with a tar file (an example for gcov was provided with Rebert's message) that contains a script to reproduce the problem, files containing the arguments and input that cause the crash, the core dump, and more. Reports for each of the bugs were sent to the appropriate Debian package maintainers, though some of those addresses were actually mailing lists, as Paul Wise pointed out. That allows us to see some of the reports, including one for the nfsidmap binary in the nfs-common package. Rebert's message also linked to a text file that lists all of the affected packages and their maintainers.

There are almost certainly more bugs out there for Mayhem to find, as the team limited the search space of the tool, allowing just five minutes of run time per binary. They also limited the bugs reported to one per binary and five per package. There are likely to be plenty of duplicate bugs on the list as well; bugs in libraries may well appear for multiple binaries. And, of course, the bugs aren't limited to Debian, as many of the packages will be in the repositories of lots of different distributions; few, if any, of the bugs will be Debian-specific.
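The reporting caps mentioned above (one bug per binary, five per package) are simple to express in code. The following Python sketch shows one way such a filter might work; the `select_reports` function, the tuple layout, and the choice to keep the first crash seen are assumptions made for illustration, since the team has not said how it picked which crashes to report.

```python
from collections import defaultdict


def select_reports(crashes, per_binary=1, per_package=5):
    """Pick the crashes to report, subject to per-binary and per-package caps.

    `crashes` is an iterable of (package, binary, crash_id) tuples in
    discovery order; the first crashes seen win (an assumption of this
    sketch, not something stated by the Mayhem team).
    """
    by_binary = defaultdict(int)
    by_package = defaultdict(int)
    selected = []
    for package, binary, crash_id in crashes:
        if by_binary[(package, binary)] >= per_binary:
            continue  # this binary already has its quota of reports
        if by_package[package] >= per_package:
            continue  # this package already has its quota of reports
        by_binary[(package, binary)] += 1
        by_package[package] += 1
        selected.append((package, binary, crash_id))
    return selected
```

Note that a filter like this does nothing about duplicates caused by a shared library; removing those would require comparing the crashes themselves, for example by the faulting function.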

Unfortunately, there is no automated way to extract addresses for the upstream developers or mailing lists from the Debian packages. The bug reports may ultimately need to make their way upstream, but the Mayhem team couldn't find a way to do that, so they started with the Debian maintainers. As Andreas Tille noted, some packages may have implemented the machine-readable debian/copyright file, which might provide an upstream contact and email address. But, for security reports, even that may not be the right place to send the message.

But, in fact, Rebert has recognized that the security tag on most of the proposed bug reports was probably not accurate. "It looks like a majority of the crashes have little security implications", he said, so that tag will be removed before the actual bug reports get submitted. It isn't clear that a security contact would be needed in the majority of cases but, since Mayhem sets out to find exploitable bugs, "responsible disclosure" might still indicate that a security list or email should be used to report the problems.

The problem is, in some ways, similar to the question of where bugs should be filed that we reported on last week. Which bug tracker (distribution or upstream) to use is contentious enough when looking at single bugs reported by users; 1200 bugs increases the scale of the problem significantly. The clear indication is that Mayhem could find many more if it were given free rein, though the duplicates need to be eliminated or substantially reduced or the team risks overwhelming distributions and upstreams.

The "huge pile of bugs" problem is a consequence of the closed-source nature of Mayhem. If the tool were available to be used by various projects' developers as part of their testing, the bugs could be found and fixed in the normal course of development. Rebert mentioned the possibility of creating some kind of Mayhem web service, but it would be far more useful if the tool were free software (even "free as in beer" would be better than the existing situation). Since public funds were used to develop the tool, one might hope the public would get a bit more out of that spending. The Mayhem paper mentions that the US Defense Advanced Research Projects Agency (DARPA) helped fund some of the work, but, alas, that funding doesn't seem to come with a mandate to publish the source.

It's clear that running Mayhem on the 23,000 or so binaries found in the Debian "Wheezy" repository has found real bugs, some of which are "exploitable" in limited scenarios. Some are probably worse than that, however, and as the tool gets improved, it may be able to narrow in on more dangerous bugs. One might guess that CMU and the Mayhem developers plan to commercialize Mayhem. That is, of course, their prerogative, but it is unfortunate that tools like Mayhem and the Coverity static analyzer (which came out of Stanford University) are not free software tools. One suspects they would see much more use—and, possibly, improvement—if they were.



Mayhem finds 1200 bugs

Posted Jul 4, 2013 2:01 UTC (Thu) by luto (subscriber, #39314) [Link]

If this is useful enough, maybe someone (or multiple groups) could offer enough additional funding to the authors to convince them to open-source it.

Mayhem and free software

Posted Jul 4, 2013 13:21 UTC (Thu) by davecb (subscriber, #1574) [Link]

The authors may wish to offer a commercial service to closed-source companies, and so will wish to distinguish between free software and non-free. Coverity does something of the sort, while supporting Linux and Samba.

Were I them, I'd accept recommendations from the community as to what to test and whom to notify, then set up a cron job (:-))

--dave

Mayhem and free software

Posted Jul 9, 2013 14:17 UTC (Tue) by drag (subscriber, #31333) [Link]

I would expect that, like most universities, CMU is looking for a way to patent and then license the technology that students develop in its labs.

This, I understand, is very normal and is why most universities support the software patent system.

Mayhem finds 1200 bugs

Posted Jul 4, 2013 15:19 UTC (Thu) by mstone (subscriber, #58824) [Link]

Libre versions of related technology like http://klee.llvm.org + http://dslab.epfl.ch/proj/s2e have actually been available, if largely ignored, since 2008-2009.

(I have personally used klee to find previously unknown out-of-bounds memory access bugs in production network code at $WORK and to test proposed fixes.)

Mayhem finds 1200 bugs

Posted Jul 5, 2013 21:26 UTC (Fri) by robert_s (subscriber, #42402) [Link]

I suspect the reluctance to publish the source involves a fear of black hats using the tool to find countless vulnerabilities for exploitation or sale.

Of course, I'm sure DARPA managed to make sure certain Three Letter Agencies got a copy of the source for their own nefarious uses.

Mayhem finds 1200 bugs

Posted Jul 5, 2013 22:50 UTC (Fri) by gmaxwell (subscriber, #30048) [Link]

If so, it may be a misplaced motivation. Tools like KLEE do the same sort of analysis and have been publicly available for some time.

In the meantime I have software I'd like to run their tool on and cannot.

Mayhem finds 1200 bugs

Posted Jul 11, 2013 22:43 UTC (Thu) by robert_s (subscriber, #42402) [Link]

>If so, it may be a misplaced motivation.

I'm not necessarily disagreeing here.

>Tools like KLEE do the same sort of analysis and have been publicly available for some time.

As exciting as KLEE looks I don't (yet) see any lists of 1200 bugs they've found.

And KLEE only works with LLVM bytecode, from what I can tell, and LLVM's support for disassembling from machine code (to LLVM bytecode) only seems to be in its infant stages, so it would be from-source-only analysis.

Mayhem finds 1200 bugs

Posted Jul 11, 2013 22:59 UTC (Thu) by gmaxwell (subscriber, #30048) [Link]

> As exciting as KLEE looks I don't (yet) see any lists of 1200 bugs they've found.

And perhaps this is a better argument for not distributing the tool: being able to gather up the results and take credit for them. I've fixed bugs in my own software with KLEE, but you wouldn't find them on some large trophy list.

> so it would be from-source-only analysis.

Sure, and in the context of running tools against free software (as was done here), this isn't that much of a limitation.

Mayhem finds 1200 bugs

Posted Jul 12, 2013 9:13 UTC (Fri) by robert_s (subscriber, #42402) [Link]

>Sure, and in the context of running tools against free software (as was done here), this isn't that much of a limitation.

Right, but that's not quite the context - we're talking about why people might be hesitant to publish the source, and that involves worry about the wider landscape, including proprietary binary-only-land.

I fear we're just descending into pedantry here.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds