Fuzz testing
Providing random or semi-random data to a program to see what happens is an excellent black-box testing technique known as fuzzing. Programs that generate this data are, unsurprisingly, called fuzzers and are a potent tool for folks doing penetration or other kinds of testing. Some interesting presentations at this summer's Black Hat Briefings make this a good opportunity for an overview of fuzzing and some pointers to tools, techniques and research.
Generating bad input for programs is a time-honored tradition for test engineers, but a human-written test suite tends to contain fewer cases than a fuzzer can produce. In addition, test engineers may make implicit assumptions about the kind of data that can or will be fed into a program, whereas an automated, brainless fuzzer will just try anything. The simplest fuzzer will just send random bytes of data to a program and see what, if anything, happens. It might also vary the length of the data that it sends to explore buffer length issues and the like.
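The simplest kind of fuzzer described above can be sketched in a few lines of Python. This is only an illustration, not any particular tool: the names random_input(), run_command() and fuzz() are made up for this example, and treating a negative return code as a crash assumes a POSIX system, where it means the process was killed by a signal.

```python
import random
import subprocess


def random_input(rng, max_len=4096):
    """Return random bytes; the length is random too, to probe buffer handling."""
    length = rng.randint(0, max_len)
    return bytes(rng.getrandbits(8) for _ in range(length))


def run_command(command, data, timeout=5):
    """Feed data to command on stdin; describe the failure, or return None."""
    try:
        result = subprocess.run(
            command,
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    # Assumption: POSIX semantics, where a negative code means death by signal.
    if result.returncode < 0:
        return f"killed by signal {-result.returncode}"
    return None


def fuzz(target, iterations=100, seed=0, max_len=4096):
    """Call target(data) with random inputs; collect (failure, data) pairs."""
    rng = random.Random(seed)  # seeded so that a failing run can be repeated
    failures = []
    for _ in range(iterations):
        data = random_input(rng, max_len)
        outcome = target(data)
        if outcome is not None:
            failures.append((outcome, data))
    return failures
```

Pointing it at a program is then a matter of something like fuzz(lambda data: run_command(["./target"], data)), where ./target stands in for whatever is being tested. Keeping the failing inputs matters as much as finding them, since a crash that cannot be reproduced is hard to fix.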
More sophisticated fuzzers extend those simple techniques with more domain-specific data. A fuzzer targeted at web applications might generate GET and POST queries using (and abusing) the variables that the form or page submits, as well as adding in some random variables and values. A fuzzer targeting a web browser might generate random input that conforms to HTML syntax, with random tags and attributes as well as abuse of the defined tags. This domain-specific approach tends to yield better results by limiting the search space, but it can lead to some of the same implicit-assumption problems that are prevalent in human-generated tests. A combination of both simple and complex fuzzing is likely the best approach.
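As a rough sketch of the web application case, the following Python generates query strings that abuse a form's known variables and mix in a few random ones. The field names passed in, the short NASTY_VALUES list and the 10% chance of dropping a field are all assumptions made for illustration; real tools carry far larger collections of hostile values.

```python
import random
import string
from urllib.parse import urlencode

# Illustrative hostile values: empty, oversized, out of range, a NUL byte,
# and classic SQL injection, script injection and path traversal probes.
NASTY_VALUES = [
    "",
    "A" * 5000,
    "-1",
    "99999999999999999999",
    "\x00",
    "' OR '1'='1",
    "<script>alert(1)</script>",
    "../../etc/passwd",
]


def random_name(rng, length=8):
    """Invent a parameter name that the application never asked for."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def fuzz_params(rng, known_fields, extra=2):
    """Abuse the known form fields and add up to `extra` random ones."""
    params = {}
    for field in known_fields:
        if rng.random() < 0.1:
            continue  # occasionally leave out a field the page expects
        params[field] = rng.choice(NASTY_VALUES)
    for _ in range(rng.randint(0, extra)):
        params[random_name(rng)] = rng.choice(NASTY_VALUES)
    return params


def fuzz_query(rng, known_fields):
    """Return a URL-encoded string usable as a GET query or a POST body."""
    return urlencode(fuzz_params(rng, known_fields))
```

The result of fuzz_query(random.Random(0), ["user", "id"]) can be appended to a URL after a "?" or sent as a form-encoded POST body. Sending the requests and judging the responses (server errors, reflected input, long delays) is left out here.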
Open source tools for fuzzing various applications and protocols are available; Jack Koziol provides a nice, but not exhaustive, list. While it is not specifically a fuzzer, one must mention Metasploit, the Swiss Army knife of penetration testing, which provides a framework for all kinds of exploit testing. It would appear that the Ruby language is gaining some traction for penetration testing, as Metasploit has been rewritten in Ruby for its next version and RFuzz provides a nice library for web application fuzzing. Most other popular languages (C, Perl, Python, Java) are represented as well.
Researchers at the University of Central Florida are trying to take fuzzing a step further. They record which portions of the code each input exercises and whether it leads to a program crash, then use that information to drive a genetic algorithm that 'optimizes' for inputs that are likely to cause crashes. Obviously, this is no longer black-box testing, but it could be a fairly useful technique for projects that are looking for vulnerabilities in their own code. Slides from the Black Hat presentation are available here (PDF).
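The general idea of letting coverage feedback steer a genetic algorithm can be shown with a toy. The sketch below is not the researchers' algorithm; it is a generic evolutionary loop, and toy_target() is a made-up stand-in for an instrumented program that reports which branches an input reached and "crashes" only when the input starts with the bytes FUZZ.

```python
import random


def toy_target(data):
    """Hypothetical instrumented target: return (branches reached, crashed)."""
    covered = {0}
    for i, expected in enumerate(b"FUZZ"):
        if len(data) <= i or data[i] != expected:
            return covered, False
        covered.add(i + 1)
    return covered, True  # every check passed: simulated crash


def fitness(data):
    """More branches covered is better; a crash is best of all."""
    covered, crashed = toy_target(data)
    return len(covered) + (100 if crashed else 0)


def mutate(rng, data):
    """Replace one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def crossover(rng, a, b):
    """Splice the front of one parent onto the back of another."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(seed=0, size=4, population=30, generations=5000):
    """Return (crashing input, generation found), or (None, generations)."""
    rng = random.Random(seed)
    pool = [bytes(rng.randrange(256) for _ in range(size))
            for _ in range(population)]
    for generation in range(generations):
        pool.sort(key=fitness, reverse=True)
        if toy_target(pool[0])[1]:
            return pool[0], generation
        parents = pool[: population // 2]  # the fitter half survives
        children = [
            mutate(rng, crossover(rng, rng.choice(parents), rng.choice(parents)))
            for _ in range(population - len(parents))
        ]
        pool = parents + children
    return None, generations
```

Purely random four-byte inputs would hit the crash about once in four billion tries. With coverage as the fitness signal, each correctly guessed byte is rewarded and preserved, so the search proceeds one byte at a time instead.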
An input source that is often overlooked is data files. Because these files are often generated by a program, it is easy to write code that blindly believes what a data file says; this mistake has led to many exploits. Dan Kaminsky briefly talked about data format fuzzing in his "Black Ops 2006" presentation. He presented some ideas from his research into automated recognition of formats for the purposes of fuzzing them. Just feeding a random stream of bytes into a program meant to read a specific format is less likely to make it fail than input that mostly follows the format. With some rudimentary understanding of the format and fuzzing within that framework, much more interesting program failures can be provoked. Dan's slides are available here, unfortunately in PowerPoint format, but readable by OpenOffice.org.
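To make "fuzzing within the framework of the format" concrete, here is a small sketch for an invented file layout: a four-byte signature, a four-byte big-endian length, then the payload. The layout, the DEMO signature and the list of boundary values are assumptions for this example only, and the format knowledge is written in by hand instead of being recognized automatically as in Kaminsky's research.

```python
import random
import struct

MAGIC = b"DEMO"  # hypothetical signature of the made-up format

# Boundary values that tend to upset code which trusts a length field.
INTERESTING_LENGTHS = [0, 1, 0x7F, 0x80, 0xFF, 0xFFFF,
                       0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]


def build(payload, declared_length=None):
    """Serialize a file; the declared length may lie about the payload."""
    if declared_length is None:
        declared_length = len(payload)
    return MAGIC + struct.pack(">I", declared_length) + payload


def fuzz_file(rng, payload):
    """Return one damaged file that still carries a valid signature."""
    choice = rng.choice(["length", "payload", "truncate"])
    if choice == "length":
        # Keep the payload, but claim a different size in the header.
        return build(payload, rng.choice(INTERESTING_LENGTHS))
    if choice == "payload":
        # Flip a few bits in the payload; the header stays consistent.
        buf = bytearray(payload)
        if buf:
            for _ in range(rng.randint(1, 4)):
                buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
        return build(bytes(buf))
    # Otherwise cut the file short somewhere after the signature.
    whole = build(payload)
    return whole[: rng.randrange(len(MAGIC), len(whole))]
```

Every file produced this way gets past a signature check, so the damage lands in the code that parses the header and payload. That is where misplaced trust in a data file tends to live.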
Internationalization (i18n) is another potentially exploitable area for many applications. Scott Stender presented some ideas on fuzzing i18n data at Black Hat, in particular using Unicode representations to get bad data past validators when different levels of the application handle character encodings differently. He gave some explicit examples of input that might validate within a web application but be interpreted differently by a database, leading to various kinds of misbehavior. His slides are here (PDF).
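One way such a mismatch can arise is sketched below. It is not taken from Stender's slides; it is an illustration that assumes a front end which rejects ASCII quote characters and a later layer which applies Unicode NFKC normalization. Both functions are hypothetical. Fullwidth forms such as U+FF07 pass the first check and are folded into their ASCII equivalents by the second.

```python
import unicodedata

# Fullwidth lookalikes that NFKC normalization folds back to ASCII.
LOOKALIKES = {
    "'": "\uff07",  # FULLWIDTH APOSTROPHE
    '"': "\uff02",  # FULLWIDTH QUOTATION MARK
    "<": "\uff1c",  # FULLWIDTH LESS-THAN SIGN
    ">": "\uff1e",  # FULLWIDTH GREATER-THAN SIGN
}


def naive_validator(value):
    """Hypothetical front-end check: accept input with no ASCII quotes."""
    return "'" not in value and '"' not in value


def backend_normalize(value):
    """Hypothetical later layer that normalizes text with NFKC."""
    return unicodedata.normalize("NFKC", value)


def disguise(value):
    """Swap dangerous ASCII characters for their fullwidth lookalikes."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in value)
```

With these definitions, disguise("x' OR '1'='1") is accepted by naive_validator(), yet backend_normalize() turns it back into the original string, quotes and all. A fuzzer for i18n handling can generate such variants systematically and watch for the ones that come out the far side changed.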
Fuzzing can be used to find all kinds of security issues with a program: buffer overflows, SQL injection, cross-site scripting, denial of service, etc. It is, of course, no silver bullet. It is just a powerful technique to help a developer or tester pinpoint areas where input validation and filtering are not working and to give some level of confidence that validation is working in other areas.
| Index entries for this article | |
|---|---|
| GuestArticles | Edge, Jake |
