LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 23, 2017 19:58 UTC (Tue) by welinder (guest, #4699)
Parent article: LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

I wonder how well this fuzzing is working for LO. If, in fact, it really is working.

LO's native file format is XML-inside-ZIP. If you fuzz a ZIP file directly, you are going to break either the checksum or the compressed stream in the ZIP layer. In other words, you are fuzzing the ZIP library and very little of LO, and no finite amount of coverage-based mutation is going to change that. I tried.
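
A minimal Python sketch of that effect, using the standard `zipfile` module as a stand-in for whatever ZIP code LibreOffice actually uses (an assumption; the member name `content.xml` and the toy XML are illustrative too). It deflates one XML member, flips each byte of the compressed payload in turn, and counts how many mutants the ZIP reader even accepts:

```python
import io
import struct
import zipfile
import zlib

# Toy stand-in for an ODF package: one XML member inside a ZIP archive.
# The member name and the XML itself are illustrative assumptions.
XML = (b'<?xml version="1.0"?>\n<table>'
       + b"".join(b'<cell ref="A%d" value="%d"/>' % (i, i * i) for i in range(1, 50))
       + b"</table>")

def make_zip(payload: bytes) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", payload)
    return buf.getvalue()

def payload_span(archive: bytes) -> tuple[int, int]:
    """Return the [start, end) byte range of the member's compressed data."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.infolist()[0]
    # Local file header: 30 fixed bytes, then the file name and extra field,
    # whose lengths are little-endian u16 values at offsets 26 and 28.
    name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    return start, start + info.compress_size

def survives_zip_layer(archive: bytes) -> bool:
    """True if the ZIP reader returns the member without raising."""
    try:
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            zf.read("content.xml")
        return True
    except (zipfile.BadZipFile, zlib.error):
        return False

archive = make_zip(XML)
start, end = payload_span(archive)
survivors = 0
for pos in range(start, end):
    mutated = bytearray(archive)
    mutated[pos] ^= 0xFF               # flip all eight bits of one byte
    survivors += survives_zip_layer(bytes(mutated))
print(f"{survivors} of {end - start} single-byte mutants got past the ZIP layer")
```

With this reader nearly every mutant dies with `BadZipFile` (CRC-32 mismatch) or `zlib.error` before any XML is looked at, which is the "you are fuzzing the ZIP library" effect.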

If you fuzz the XML and stuff it inside a well-formed ZIP container, you will get further, but you will mostly be testing the XML library. If the XML library does a full validation first, you may be testing only the XML library, because almost any mutation will produce a malformed XML file. Little fuzzed data will make it into the guts of the program.

Contrast this with file formats that are basically sequences of binary records: think of the outdated .xls format, or an image format. With those, you really will manage to get fuzzed data into the guts of the program.
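
By contrast, a fixed-layout binary record leaves little for a parser to reject. A sketch with a hypothetical record layout and type code (not the real .xls/BIFF definition): only the 4-byte header is checked, so every payload mutation parses, while every header mutation is caught:

```python
import struct

# Toy binary record in the spirit of record-based formats such as .xls:
# 2-byte type, 2-byte payload length, then fixed-width fields
# (row, column, format index, IEEE-754 double). Layout and type code are hypothetical.
RECORD = struct.Struct("<HHHHHd")
CELL_TYPE = 0x0001
PAYLOAD_LEN = RECORD.size - 4          # 14 bytes after the 4-byte header

def parse(record: bytes) -> tuple[int, int, int, float]:
    rtype, length, row, col, fmt, value = RECORD.unpack(record)
    # The only consistency checks live in the header.
    if rtype != CELL_TYPE or length != PAYLOAD_LEN:
        raise ValueError("bad record header")
    return row, col, fmt, value

seed = RECORD.pack(CELL_TYPE, PAYLOAD_LEN, 7, 2, 0, 3.25)

def mutant(pos: int) -> bytes:
    out = bytearray(seed)
    out[pos] ^= 0xFF
    return bytes(out)

payload_ok = 0
for pos in range(4, RECORD.size):
    parse(mutant(pos))                 # never raises: any bit pattern is a legal value
    payload_ok += 1

header_rejected = 0
for pos in range(4):
    try:
        parse(mutant(pos))
    except ValueError:
        header_rejected += 1

print(f"payload mutants parsed: {payload_ok}/{PAYLOAD_LEN}; "
      f"header mutants rejected: {header_rejected}/4")
# prints "payload mutants parsed: 14/14; header mutants rejected: 4/4"
```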

Perhaps someone in the know could explain how they get around these obstacles.

(And, boy, is that article full of PR speak.)

Posted May 23, 2017 20:36 UTC (Tue) by khim (subscriber, #9252)

You could read the tutorial. Yes, fuzzing is an art: you need some clever ideas about what to fuzz and at what level.

I don't think they started with XML or, even worse, ZIP files. I hope they targeted functions that sit behind the protection offered by the "this must be a valid ZIP" and "this should be valid XML" layers. That said, I'm not sure XML fuzzing is as hopeless as you describe: if you start with valid XML and alter it slightly, chances are great that the result will still be an XML file, but with some internal logic broken (references to object IDs that don't exist, etc.). "Great" here means: "do a thousand tries and you'll get one candidate that triggers a new code path".
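
That idea can be sketched with a structure-aware mutator (a different technique from AFL-style byte flipping): parse the seed, rewrite one attribute value, and serialize again, so the output is always well-formed while references can dangle. The `<drawing>`/`<shape>`/`<link>` schema and the function names are hypothetical:

```python
import random
import string
import xml.etree.ElementTree as ET

# Hypothetical document: shapes with ids, and links that must point at them.
SEED = """<drawing>
  <shape id="s1"/><shape id="s2"/><shape id="s3"/>
  <link target="s1"/><link target="s3"/>
</drawing>"""

def mutate_attribute(xml: str, rng: random.Random) -> str:
    """Structure-aware mutation: rewrite one attribute value, keep the tree intact."""
    root = ET.fromstring(xml)
    candidates = [(el, name) for el in root.iter() for name in el.attrib]
    el, name = rng.choice(candidates)
    el.set(name, "".join(rng.choices(string.ascii_lowercase + string.digits, k=2)))
    return ET.tostring(root, encoding="unicode")   # serializer escapes as needed

def dangling_links(xml: str) -> list[str]:
    """Application-level check that the XML parser knows nothing about."""
    root = ET.fromstring(xml)
    ids = {el.get("id") for el in root.iter("shape")}
    return [el.get("target") for el in root.iter("link") if el.get("target") not in ids]

rng = random.Random(0)
mutants = [mutate_attribute(SEED, rng) for _ in range(1000)]
broken = sum(1 for m in mutants if dangling_links(m))
print(f"all {len(mutants)} mutants are well-formed; {broken} have dangling links")
```

Every mutant passes the XML layer; only the application-level check notices anything wrong, which is where new code paths would be found.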

Posted May 24, 2017 13:31 UTC (Wed) by welinder (guest, #4699)

It's a fine tutorial saying, in essence, that fuzz testing is done by repeatedly sending you a block of data with which you do something interesting. That doesn't shed a whole lot of light on what might be done with LO.

I have done the XML fuzzing experiment, with Gnumeric rather than LO. I have done it both with "binary" fuzzers like American Fuzzy Lop and with specialized fuzzers that know a good deal about XML. While the latter get you further, faster, the coverage is still depressingly shallow. The only good news here is that they will find precisely the same outer-layer bugs that anyone else running an automated scan would find.

Fuzzing is a numbers game. You really cannot afford to have all but one in a million trials fail early due to consistency tests. If some XML attribute needs to contain a reference to a spreadsheet cell, random mutation is unlikely to produce a different, still-valid reference. Contrast that with .xls, where just about any replacement bit pattern will do.
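
The arithmetic behind "a numbers game", assuming each mutant independently passes every consistency check with probability p (so the number of trials N until the first one does is geometrically distributed), using the one-in-a-million figure above and an assumed throughput of r = 1000 executions per second:

```latex
\mathbb{E}[N] = \frac{1}{p} = 10^{6}\ \text{trials},
\qquad
\mathbb{E}[T] = \frac{1}{p\,r}
             = \frac{1}{10^{-6}\cdot 10^{3}\ \text{s}^{-1}}
             = 10^{3}\ \text{s} \approx 17\ \text{min}
```

That is one input reaching the deeper code roughly every 17 minutes, before the fuzzer has learned anything from it; E[T] shrinks only in proportion to p r.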

Note that the XML used by LO and Gnumeric compresses so well that compression is built into the file formats. It compresses so well precisely because there are so many syntactic rules the files must satisfy, both at the XML level (tags must nest properly) and at the application level (some attribute must be an integer).
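
A quick way to see the redundancy argument, under assumptions: generate spreadsheet-like XML loosely in the style of ODF table markup (element names used only as text, with no namespace declarations) and compare its `zlib` compression ratio with that of random bytes of the same length:

```python
import random
import zlib

# Hypothetical spreadsheet-like markup; not a complete or valid ODF document.
rows = "".join(
    f'<table:table-row><table:table-cell office:value-type="float" '
    f'office:value="{i}"><text:p>{i}</text:p></table:table-cell></table:table-row>'
    for i in range(2000)
).encode()

noise = random.Random(0).randbytes(len(rows))   # same length, no structure

def ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher means more redundant)."""
    return len(data) / len(zlib.compress(data, 9))

print(f"XML:   {ratio(rows):.1f}x")
print(f"noise: {ratio(noise):.2f}x")
```

The structured text compresses many times over while the noise does not compress at all; that redundancy is the same property that makes a random byte change unlikely to land on another valid file.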

Posted May 24, 2017 14:57 UTC (Wed) by epa (subscriber, #39769)

It's OK if 999,999 out of your million trials fail due to XML syntax or consistency errors, as long as they do so *fast*. Perhaps the fuzzer needs an initial checker to weed out the obvious failures: it could link in a common XML parsing library and check for well-formedness before passing test cases on to the program being fuzzed.
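
A sketch of that pre-filter, assuming Python's `xml.parsers.expat` as the "common XML parsing library" and a hypothetical `fuzz_one` harness step, with a stand-in `target` in place of the real importer:

```python
import xml.parsers.expat
from typing import Callable

def is_well_formed(data: bytes) -> bool:
    """Cheap gate: a non-validating expat parse; no tree is built."""
    parser = xml.parsers.expat.ParserCreate()   # a fresh parser per input
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True

def fuzz_one(data: bytes, target: Callable[[bytes], None], stats: dict) -> None:
    """Hypothetical harness step: only well-formed inputs reach the real importer."""
    if not is_well_formed(data):
        stats["filtered"] = stats.get("filtered", 0) + 1
        return
    stats["passed"] = stats.get("passed", 0) + 1
    target(data)

calls: list[bytes] = []
stats: dict[str, int] = {}
for sample in [b"<doc><p>ok</p></doc>", b"<doc><p>broken</doc>", b"not xml at all"]:
    fuzz_one(sample, calls.append, stats)
print(stats, len(calls))
```

Whether this gate saves time depends on how cheaply the real importer rejects the same inputs, which the thread doesn't say.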

Posted May 23, 2017 20:40 UTC (Tue) by xtifr (guest, #143)

They handle many formats other than their native ones, if you'll remember, including a fair number of binary formats, most notably the classic .doc format. There are also various graphics formats that they definitely fuzz: I see files named things like "pngfuzzer.cxx", "bmpfuzzer.cxx", and "epsfuzzer.cxx". They also have an extensive API/ABI, which covers not only their native BASIC but also plugins offering things like Python or JavaScript.

Posted May 24, 2017 9:31 UTC (Wed) by epa (subscriber, #39769)

Can you start off with an uncompressed zipfile (zip -0)?
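
A sketch of what `zip -0` does and doesn't buy, using Python's `zipfile` (`ZIP_STORED` is its no-compression mode) as a stand-in for LibreOffice's ZIP reader, which is an assumption: the XML appears verbatim in the archive, but the ZIP format still records a CRC-32 per member, so a reader that verifies it rejects in-place edits unless the harness re-packs:

```python
import io
import zipfile

XML = b'<?xml version="1.0"?>\n<doc><p value="42">hello</p></doc>'

def make_zip(payload: bytes) -> bytes:
    buf = io.BytesIO()
    # ZIP_STORED means no compression, like `zip -0`.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("content.xml", payload)
    return buf.getvalue()

def read_member(archive: bytes) -> bytes:
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("content.xml")

archive = make_zip(XML)
start = archive.index(XML)          # stored data: the XML appears verbatim

# In-place mutation of one XML byte inside the archive ("42" -> "72")...
mutated = bytearray(archive)
mutated[start + XML.index(b"42")] = ord("7")
try:
    read_member(bytes(mutated))
    crc_caught_it = False
except zipfile.BadZipFile:          # ...trips the per-member CRC-32 check.
    crc_caught_it = True

# Mutating the XML first and then re-packing keeps the CRC consistent.
repacked = make_zip(XML.replace(b"42", b"72"))
print(crc_caught_it, read_member(repacked))
```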

Posted May 25, 2017 7:39 UTC (Thu) by shiftee (subscriber, #110711)

LibreOffice has a save option called .fodt (Flat XML), which does not compress the output. It's useful for scripting.

Posted May 25, 2017 20:12 UTC (Thu) by spaetz (guest, #32870)

I also find .fodt nice for git repos, as it actually allows sensible file diffs in many cases.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds