I was just wondering if you had tried monkey testing. Dumb monkey tests only find crash/abort-type bugs, but that is certainly a good start. I wrote a script to perform monkey tests and related tasks: spamming millions of keypresses, detecting crashes, finding minimal recipes to reproduce a crash, automatically finding the changeset that introduced the crash, and so on. At the moment I've mostly used it on LyX (in which it has found about 60 bugs), but I am making it more general, and it has already found a bug in AbiWord.
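
To give a rough idea of what "spamming keypresses" and "finding minimal recipes" can look like, here is a small Python sketch. It is not the script mentioned above, only an illustration under assumptions: the names `KEYS`, `random_keys` and `minimize` are made up for this example, and `crashes` is a hypothetical callback that would replay a key sequence against the application and report whether it died. The reduction is a simple greedy chunk removal, not necessarily what any real tool does.

```python
import random

# Hypothetical key names; a real tool would use whatever its input layer expects.
KEYS = list("abcdefghijklmnopqrstuvwxyz") + ["Return", "BackSpace", "Tab", "Escape"]


def random_keys(n, seed=None):
    """Return n randomly chosen key names (reproducible when a seed is given)."""
    rng = random.Random(seed)
    return [rng.choice(KEYS) for _ in range(n)]


def minimize(keys, crashes):
    """Greedily drop chunks of keys while the crash still reproduces.

    `crashes(keys)` must return True if replaying `keys` triggers the crash.
    """
    assert crashes(keys), "the starting recipe must reproduce the crash"
    chunk = max(len(keys) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(keys):
            candidate = keys[:i] + keys[i + chunk:]
            if candidate and crashes(candidate):
                keys = candidate  # chunk was not needed; retry at the same index
            else:
                i += chunk        # chunk is needed (or nothing left); move on
        chunk //= 2               # halve the chunk size; the loop ends after size 1
    return keys
```

With a stand-in `crashes` function that "crashes" whenever `Escape` is later followed by `Return`, `minimize` reduces a long random recipe to those two keys. In real use each call to `crashes` means restarting and driving the application, so the number of replays is the main cost.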