September 20, 2006
This article was contributed by Jake Edge.
Providing random or semi-random data to a program to see what happens is
an excellent black-box testing technique known as
fuzzing. Programs that
generate this data are, unsurprisingly, called fuzzers and are a potent
tool for folks doing penetration or other kinds of testing. Some
interesting presentations at this summer's Black Hat Briefings make this a
good opportunity for an overview of fuzzing and some pointers to tools,
techniques and research.
Generating bad input for programs is a time-honored tradition for test
engineers, but human-generated test cases tend to contain fewer tests
than a fuzzer can produce. In addition, test engineers may make
implicit assumptions about the kind of data that can or will be fed into
a program, whereas an automated, brainless fuzzer will just try anything.
The simplest fuzzer will just send random bytes of data to a
program and see what, if anything, happens. It might also vary the length
of the data that it sends to explore buffer length issues and the like.
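As a minimal sketch of that simplest kind of fuzzer, the Python loop below sends random bytes of random length to a target and records any input that produces an unexpected exception. The target, `parse_record()`, is a hypothetical toy parser with a deliberate length-handling bug; a clean `ValueError` is treated as the program rejecting bad input, and anything else counts as a "crash." A real fuzzer would more likely run the program as a separate process and watch for signals or hangs rather than Python exceptions.

```python
import random

def parse_record(data: bytes) -> int:
    """Hypothetical target: a toy parser that trusts a length byte."""
    if len(data) < 2:
        raise ValueError("too short")     # graceful rejection
    length = data[0]
    payload = data[1:1 + length]
    # Bug: indexes using the claimed length, not the real payload size.
    return payload[length - 1]

def fuzz(target, iterations: int = 1000, max_len: int = 64, seed: int = 0) -> list:
    """Feed random byte strings of random length to `target`."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        # Vary the length as well as the content to probe length handling.
        length = rng.randint(0, max_len)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass                          # the target rejected the input cleanly
        except Exception as exc:          # anything else counts as a "crash"
            crashes.append((data, exc))
    return crashes

crashes = fuzz(parse_record)
print(f"{len(crashes)} crashing inputs found")
```

Fixing the seed makes a run reproducible, which matters when a crashing input has to be replayed later.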
More sophisticated fuzzers extend those simple techniques with more
domain-specific data. A fuzzer targeted at web applications might
generate GET and POST queries using (and abusing) the variables that
the form or page submits, as well as adding some random variables and
values. A fuzzer targeting a web browser might generate random input that
conforms to HTML syntax, with random tags and attributes, while also abusing
the defined tags. This domain-specific approach tends to yield better
results by limiting the search space, but it can lead to some of the same
implicit-assumption problems that are prevalent in human-generated
tests. A combination of both simple and complex fuzzing is likely the
best approach.
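To make the web application case concrete, here is a small Python sketch that builds GET query strings from a form's own variables, abuses one of them with a hostile value, and sometimes adds a variable the form never sends. The field names, the `/form` path, and the list of nasty values are all hypothetical, and this does not use RFuzz or any other tool's API; actually sending the requests and watching the responses is left out.

```python
import random
from urllib.parse import urlencode

# Hypothetical variables that the form under test normally submits.
FORM_FIELDS = {"user": "alice", "age": "30"}

# Values that abuse the expected type, length, or character set.
NASTY_VALUES = ["", "A" * 5000, "-1", "99999999999999999999",
                "'", "<script>", "\x00", "../../etc/passwd"]

def fuzz_queries(fields: dict, count: int = 10, seed: int = 0):
    """Yield GET query strings built from (and abusing) the form's variables."""
    rng = random.Random(seed)
    for _ in range(count):
        params = dict(fields)
        victim = rng.choice(sorted(params))          # abuse one real variable
        params[victim] = rng.choice(NASTY_VALUES)
        if rng.random() < 0.5:                       # sometimes add a bogus one
            params["x" + str(rng.randrange(1000))] = rng.choice(NASTY_VALUES)
        yield "/form?" + urlencode(params)

for url in fuzz_queries(FORM_FIELDS, count=3):
    print(url[:80])
```

Because every query keeps the real variable names, the application is more likely to get past its routing and reach the code that handles each value, which is the point of the domain-specific approach.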
Open source tools for fuzzing various applications and protocols are
available; Jack Koziol provides a nice, but not exhaustive,
list.
While it is not specifically a fuzzer, one must mention
Metasploit, the Swiss Army knife of
penetration testing, which provides a framework for all kinds of exploit
testing. The Ruby language appears to be gaining some traction
for penetration testing: Metasploit has been rewritten in Ruby for its
next version, and
RFuzz provides a nice library
for web application fuzzing. Most other popular languages (C, Perl, Python,
Java) are represented as well.
Researchers at the University of Central Florida are trying to take fuzzing
a step further by using information about what portions of the code
were exercised by various inputs and whether they led to program crashes
to drive a
genetic
algorithm that 'optimizes' for inputs that are likely to cause
crashes. Obviously, this is no longer black-box testing, but it could be
a fairly useful technique for projects that are looking for vulnerabilities
in their own code. Slides from the Black Hat presentation are available
here
(PDF).
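The following is a toy illustration of coverage-guided evolution in Python, not the UCF researchers' tool. The instrumented `target()` is hypothetical: it records which branches an input reaches and crashes only on one deeply nested path. `fitness()` rewards inputs by the coverage they produce and gives crashes the top score, and `evolve()` keeps the fittest 20% each generation and breeds the rest by single-point crossover and single-byte mutation. The population size, input length, and selection ratio are arbitrary assumptions.

```python
import random

def target(data: bytes, coverage: set) -> None:
    """Hypothetical instrumented target: records each branch it reaches."""
    coverage.add("entry")
    if len(data) >= 4:
        coverage.add("len>=4")
        if data[0] == ord("F"):
            coverage.add("F")
            if data[1] == ord("U"):
                coverage.add("U")
                if data[2] == ord("Z"):
                    coverage.add("Z")
                    if data[3] == ord("Z"):
                        raise RuntimeError("crash")  # deeply nested bug

def fitness(data: bytes) -> int:
    """Score an input by the code it exercises; a crash scores highest."""
    coverage = set()
    try:
        target(data, coverage)
    except RuntimeError:
        return 100
    return len(coverage)

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    """Single-point crossover of two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(generations: int = 1000, pop_size: int = 50,
           length: int = 4, seed: int = 0) -> bytes:
    rng = random.Random(seed)
    population = [bytes(rng.randrange(256) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 100:
            break                                   # found a crashing input
        parents = population[: pop_size // 5]       # keep the fittest 20%
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

The coverage signal is what makes this work: a byte-by-byte match with the crashing input is rewarded at each step, whereas a purely crash-driven score would give the algorithm nothing to climb toward until it hit the whole input by chance.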
An input source that is often overlooked is data files. Because these files
are often generated by a program, it is easy to write code that
blindly believes what a data file says; this mistake has led
to many exploits. Dan Kaminsky briefly talked about data format fuzzing in
his "Black Ops 2006" presentation. He presented some ideas from his research
into automated recognition of formats for the purposes of fuzzing them.
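Just feeding a random stream of bytes into a program meant to read a specific
format is less likely to cause it to fail, since most such input is rejected by
the program's earliest checks, such as a magic number at the start of the file.
With some rudimentary understanding of the format, and fuzzing within that
framework, much more interesting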
program failures can be provoked. Dan's slides are available
here,
unfortunately in PowerPoint format, but readable by OpenOffice.org.
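A small Python sketch shows the difference. It assumes a hypothetical file format (a 4-byte `TOY1` magic, a big-endian 32-bit length, then the payload) and a hypothetical `naive_reader()` that blindly believes the length field. Random bytes almost never get past the magic check, while the format-aware generator keeps the magic valid, flips bits in the payload, and makes the header lie with boundary values. This is only an illustration of the idea, not Kaminsky's format-recognition technique.

```python
import random
import struct
from typing import Optional

MAGIC = b"TOY1"  # hypothetical format: 4-byte magic, big-endian uint32 length, payload

def build(payload: bytes, length: Optional[int] = None) -> bytes:
    """Assemble a file; `length` lets the fuzzer make the header lie."""
    if length is None:
        length = len(payload)
    return MAGIC + struct.pack(">I", length) + payload

def naive_reader(blob: bytes) -> bytes:
    """Hypothetical reader that blindly believes the header."""
    if blob[:4] != MAGIC:
        raise ValueError("bad magic")             # random bytes stop here
    (length,) = struct.unpack(">I", blob[4:8])
    payload = blob[8:8 + length]
    # Bug: assumes the file really holds `length` bytes of payload.
    return bytes(payload[i] for i in range(length))

BOUNDARY_LENGTHS = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]

def format_aware_cases(sample: bytes, count: int = 50, seed: int = 0):
    """Keep the magic valid; flip payload bits and lie in the length field."""
    rng = random.Random(seed)
    for _ in range(count):
        payload = bytearray(sample)
        if payload and rng.random() < 0.5:
            payload[rng.randrange(len(payload))] ^= 1 << rng.randrange(8)
        length = rng.choice(BOUNDARY_LENGTHS + [len(payload) + rng.randint(1, 16)])
        yield build(bytes(payload), length)

def run(reader, cases) -> list:
    """Return (input, exception) pairs for every unexpected failure."""
    failures = []
    for blob in cases:
        try:
            reader(blob)
        except ValueError:
            pass                                  # clean rejection
        except Exception as exc:
            failures.append((blob, exc))
    return failures

blob_rng = random.Random(1)
random_blobs = [bytes(blob_rng.randrange(256) for _ in range(32)) for _ in range(50)]
print("random bytes:", len(run(naive_reader, random_blobs)), "failures")
print("format-aware:", len(run(naive_reader, format_aware_cases(b"hello world"))), "failures")
```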
Internationalization (i18n) is another potentially exploitable area for many
applications. Scott Stender presented some ideas on fuzzing i18n data
at Black Hat, in particular using Unicode representations to get bad data
past validators when different levels of the application handle character
encodings differently. He gave some explicit examples of input that might
pass validation within a web application but be interpreted differently by a
database, leading to various kinds of misbehavior. His slides are
here
(PDF).
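The examples from Stender's slides are not reproduced here; as one hedged illustration of the general mismatch, the Python sketch below assumes a hypothetical front end that rejects raw apostrophes and a hypothetical lower layer that applies Unicode NFKC normalization before using the string. `nfkc_lookalikes()` searches the Basic Multilingual Plane for characters that normalize into a chosen dangerous character, producing fuzzing inputs that slip past the first check and turn into the forbidden character later. U+FF07 (FULLWIDTH APOSTROPHE) is one such character.

```python
import unicodedata

def nfkc_lookalikes(target: str, limit: int = 0x10000) -> list:
    """Characters other than `target` that NFKC normalization turns into `target`."""
    found = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue                     # skip lone surrogates
        ch = chr(cp)
        if ch != target and unicodedata.normalize("NFKC", ch) == target:
            found.append(ch)
    return found

def validator(s: str) -> bool:
    """Hypothetical front-end check: reject anything containing an apostrophe."""
    return "'" not in s

def backend(s: str) -> str:
    """Hypothetical lower layer that normalizes before building a query."""
    return unicodedata.normalize("NFKC", s)

for ch in nfkc_lookalikes("'"):
    probe = "O" + ch + "Brien"
    if validator(probe) and "'" in backend(probe):
        print(f"U+{ord(ch):04X} passes validation but becomes an apostrophe")
```

The same search can be repeated for other characters a validator cares about, such as `<` or `/`, to seed an i18n-aware fuzzer.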
Fuzzing can be used to find all kinds of security issues with a program:
buffer overflows, SQL injection, cross-site scripting, denial of service,
etc. It is, of course, no silver bullet. It is just a powerful
technique to help a developer or tester pinpoint areas where input
validation and filtering are not working and to give some level of confidence
that validation is working in other areas.