
Fuzz testing

September 20, 2006

This article was contributed by Jake Edge.

Providing random or semi-random data to a program to see what happens is an excellent black-box testing technique known as fuzzing. Programs that generate this data are, unsurprisingly, called fuzzers and are a potent tool for folks doing penetration or other kinds of testing. Some interesting presentations at this summer's Black Hat Briefings made this seem like a good time for an overview of fuzzing, along with some pointers to tools, techniques, and research.

Generating bad input for programs is a time-honored tradition for test engineers, but human-generated test cases tend to contain fewer tests than a fuzzer can produce. In addition, test engineers may make implicit assumptions about the kind of data that can or will be fed into a program, whereas an automated, brainless fuzzer will just try anything. The simplest fuzzer will just send random bytes of data to a program and see what, if anything, happens. It might also vary the length of the data that it sends to explore buffer-length issues and the like.
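
As a rough illustration, here is a minimal sketch (in Python) of such a brainless fuzzer; the "./target" command and the choice of feeding data on standard input are stand-ins for whatever program and input channel are actually under test:

    import random
    import subprocess

    def random_bytes(max_len=4096):
        # Random length as well as random content, to probe buffer handling.
        return bytes(random.randrange(256) for _ in range(random.randint(0, max_len)))

    def fuzz_once(cmd):
        data = random_bytes()
        proc = subprocess.run(cmd, input=data,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        # A negative return code means the child died on a signal (SIGSEGV and friends).
        if proc.returncode < 0:
            print("crash: signal %d, input length %d" % (-proc.returncode, len(data)))
            return data

    for _ in range(1000):
        fuzz_once(["./target"])   # "./target" is a placeholder for the program under test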

More sophisticated fuzzers extend those simple techniques with more domain-specific data. A fuzzer targeted at web applications might generate GET and POST queries using (and abusing) the variables that the form or page submits, as well as adding in some random variables and values. A fuzzer targeting a web browser might generate random input that conforms to HTML syntax, with random tags and attributes, as well as abusing the defined tags. This domain-specific approach tends to yield better results by limiting the search space, but it can lead to some of the same implicit-assumption problems that are prevalent in human-generated tests. A combination of both simple and complex fuzzing is likely the best approach.
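
To make the web-application case concrete, a sketch along these lines might look like the following; the URL, field names, and list of hostile values are purely illustrative:

    import random, string
    import urllib.parse, urllib.request

    # Values chosen to trip common validation and parsing bugs.
    NASTY = ["", "A" * 10000, "' OR '1'='1", "<script>alert(1)</script>",
             "%00", "../../../../etc/passwd", "-1", "9" * 40]

    def fuzz_form(url, fields):
        # Each defined field gets either its normal value or something hostile...
        data = {name: random.choice(NASTY + [normal]) for name, normal in fields.items()}
        # ...and sometimes a variable the form never defined is thrown in too.
        if random.random() < 0.3:
            data["".join(random.choices(string.ascii_lowercase, k=8))] = random.choice(NASTY)
        body = urllib.parse.urlencode(data).encode()
        try:
            return urllib.request.urlopen(url, body, timeout=10).getcode(), data
        except Exception as exc:   # 500s, resets, and timeouts are the interesting results
            return exc, data

    print(fuzz_form("http://localhost/login", {"user": "alice", "password": "secret"}))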

Open source tools for fuzzing various applications and protocols are available; Jack Koziol provides a nice, but not exhaustive, list. While it is not specifically a fuzzer, one must mention Metasploit, the Swiss army knife of penetration testing, which provides a framework for all kinds of exploit testing. It would appear that the Ruby language is gaining some traction for penetration testing, as Metasploit has been rewritten in Ruby for its next version and RFuzz provides a nice library for web-application fuzzing. Most other popular languages (C, Perl, Python, Java) are represented as well.

Researchers at the University of Central Florida are trying to take fuzzing a step further by using information about what portions of the code were exercised by various inputs and whether they led to program crashes to drive a genetic algorithm that 'optimizes' for inputs that are likely to cause crashes. Obviously, this is no longer black-box testing, but it could be a fairly useful technique for projects that are looking for vulnerabilities in their own code. Slides from the Black Hat presentation are available here (PDF).
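
The general shape of such an approach can be sketched as follows. This is not the researchers' actual implementation; the run_with_coverage() callback is a hypothetical placeholder for whatever instrumentation reports which basic blocks an input reached and whether the program crashed:

    import random

    def mutate(data):
        # Flip a few random bytes -- a stand-in for richer genetic operators
        # such as crossover between two parent inputs.
        out = bytearray(data)
        for _ in range(random.randint(1, 8)):
            if out:
                out[random.randrange(len(out))] = random.randrange(256)
        return bytes(out)

    def evolve(seeds, run_with_coverage, generations=100, population=50):
        pool, seen = list(seeds), set()
        for _ in range(generations):
            scored = []
            for data in pool:
                covered, crashed = run_with_coverage(data)
                new_blocks = len(covered - seen)
                seen |= covered
                # Fitness rewards inputs that reach unexplored code or crash outright.
                scored.append((new_blocks + (1000 if crashed else 0), data))
            scored.sort(key=lambda pair: pair[0], reverse=True)
            parents = [d for _, d in scored[:population // 2]]
            pool = parents + [mutate(random.choice(parents)) for _ in range(population // 2)]
        return pool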

An input source that is often overlooked is data files. Because these files are often generated by a program, it is easy to write code that blindly believes what a data file says; this mistake has led to many exploits. Dan Kaminsky briefly talked about data format fuzzing in his "Black Ops 2006" presentation. He presented some ideas from his research into automated recognition of formats for the purposes of fuzzing them. Simply feeding a random stream of bytes into a program meant to read a specific format is relatively unlikely to make it fail. With some rudimentary understanding of the format, and by fuzzing within that framework, much more interesting program failures can be provoked. Dan's slides are available here, unfortunately in PowerPoint format, but readable by OpenOffice.org.
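
A very small step in that direction is to start from a known-good sample file and corrupt only parts of it, leaving the magic number or header intact so the parser gets past its first sanity checks. A sketch, where the header length, mutation count, and file names are arbitrary placeholders:

    import random

    def mutate_file(sample_path, out_path, keep_header=8, flips=16):
        data = bytearray(open(sample_path, "rb").read())
        # Leave the first keep_header bytes (magic number, version fields) alone
        # so the target actually tries to parse the rest of the file.
        for _ in range(flips):
            if len(data) > keep_header:
                pos = random.randrange(keep_header, len(data))
                data[pos] = random.randrange(256)
        with open(out_path, "wb") as out:
            out.write(data)

    mutate_file("sample.png", "fuzzed.png")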

Internationalization (i18n) is another potentially exploitable area for many applications. Scott Stender presented some ideas on fuzzing i18n data at Black Hat, in particular using Unicode representations to get bad data past validators when different levels of the application handle character encodings differently. He gave some explicit examples of input that might validate within a web application, but be interpreted differently by a database leading to various kinds of misbehavior. His slides are here (PDF).
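
One simple illustration of the idea: replace characters a validator blocks with "fullwidth" compatibility lookalikes, which a later layer may normalize (NFKC) back into the dangerous originals. The particular payload and lookalike table below are just examples, not taken from the presentation:

    import unicodedata

    # Blocked ASCII characters and fullwidth variants that NFKC folds back into them.
    LOOKALIKES = {"'": "\uFF07", '"': "\uFF02", "<": "\uFF1C", ">": "\uFF1E"}

    def disguise(payload):
        return "".join(LOOKALIKES.get(c, c) for c in payload)

    evil = disguise("' OR '1'='1")
    # Passes a naive filter that only looks for ASCII quotes, yet a component
    # that normalizes to NFKC sees the original SQL-injection payload again.
    assert unicodedata.normalize("NFKC", evil) == "' OR '1'='1"
    print(evil)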

Fuzzing can be used to find all kinds of security issues with a program: buffer overflows, SQL injection, cross-site scripting, denial of service, etc. It is, of course, no silver bullet. It is just a powerful technique to help a developer or tester pinpoint areas where input validation and filtering are not working and to give some level of confidence that validation is working in other areas.



Fuzz testing

Posted Sep 21, 2006 9:48 UTC (Thu) by nix (subscriber, #2304) [Link]

Of course fuzzing is useful elsewhere as well. In general it's useful whenever you've written something with an interface complex enough that a total-coverage test is impractical: in some cases there are so many edge cases that you can't even test all of those: but you surely can fuzz them.

I'd say that 30--40% of the testcases I write for code I've written are fuzz tests of some description, but almost none of them are looking for potential security holes per se: just plain old bugs.

OT: PowerPoint

Posted Sep 21, 2006 12:32 UTC (Thu) by walles (guest, #954) [Link]

I think the correct description for that file format is "OpenOffice's "PowerPoint" file format" :-).

Article quality management on LWN

Posted Sep 28, 2006 11:36 UTC (Thu) by tmk (guest, #40799) [Link]

So I'm not a subscriber so I have no right to complain, but still... Please, please, Stop publishing these trivial fluff "articles" from Jake Edge who obviously has only a very shallow overview on misguided attempts at false security. Well engineered software needs no "fuzzing", it's provably correct. Fuzzing and pen-testing are just techniques of the incompetent (but criminal) underground of a bygone era.

Article quality management on LWN

Posted Sep 29, 2006 10:50 UTC (Fri) by robbe (subscriber, #16131) [Link]

> Stop publishing these trivial fluff "articles" from Jake Edge

IMO the piece was a decent introduction to fuzzing for those who have only heard the term, but never looked into it further.

> Well engineered software needs no "fuzzing", it's provably correct.

The "market" for well-engineered software of your kind is miniscule. How much of the systems you use (HW+SW) has been proven correct?

Article quality management on LWN

Posted Sep 29, 2006 14:42 UTC (Fri) by dmag (subscriber, #17775) [Link]

> Well engineered software needs no "fuzzing", it's provably correct.

Even if you prove your software is 100% correct, fuzzing is still useful until you prove your hardware and OS are correct too.

Proving your OS is "correct" is easy, if you strip your OS down to 5 lines of code. But on a real-world (useful) OS, it's just not possible yet.

So, tmk, What percentage of the software *you* use is "proven correct?" (Remember to include in the list all the software involved in posting your reply: your OS, the code in your keyboard, mouse, monitor, BIOS and hard drive, your web browser, all routers on the path, any web caches, web proxy/load balance servers, web servers, etc..)

> Fuzzing and pen-testing are just techniques of the incompetent (but criminal) underground of a bygone era.

Ha ha. Just to pick a random example, I might agree that Microsoft is "incompetent" and "criminal", but the dream of "underground" and "bygone" has not happened yet..

P.S: I liked the original article. But I'm worried about downloading a PPT presentation from a guy looking for obscure holes in file formats... :)

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds