Posted Sep 1, 2012 5:12 UTC (Sat) by pjm (subscriber, #2080)
The last time I wanted an image of exactly how Gecko renders a web page (as distinct from any other rendering engine), I ran Firefox inside a virtual X session (Xnest, Xvnc or the like), with a virtual screen large enough to fit the whole document. That approach only gave me a bitmap, though.
Since then, the cairo library (which Firefox uses) has added a debugging facility that might be useful for reproducing text as text, though I haven't tried it. It might also help to know that Inkscape can import and export PDF (including from the command line); this allows mechanical editing of the intermediate SVG with sed/perl/python, which I find easier than editing PDF directly with command-line tools.
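As a rough illustration of the "mechanical editing on SVG" step, here is a minimal Python sketch that replaces a string inside the text elements of an SVG document. It assumes the PDF has already been converted to SVG by Inkscape and that the text survived the import as real <text>/<tspan> elements rather than paths. The function name replace_svg_text and the file names in the comments are made up for the example, and the Inkscape command-line options differ between versions, so check `inkscape --help` rather than trusting the commented invocations.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep the SVG namespace as the default one so the output has no "ns0:" prefixes.
ET.register_namespace("", SVG_NS)


def replace_svg_text(svg_source, old, new):
    """Return svg_source with `old` replaced by `new` inside <text> elements.

    Hypothetical helper for illustration; only character data inside
    <text> (including nested <tspan> children) is touched.
    """
    root = ET.fromstring(svg_source)
    for text_el in root.iter("{%s}text" % SVG_NS):
        for node in text_el.iter():
            if node.text:
                node.text = node.text.replace(old, new)
            # A child's tail is still character data of the enclosing <text>;
            # the tail of <text> itself belongs to its parent, so skip it.
            if node is not text_el and node.tail:
                node.tail = node.tail.replace(old, new)
    return ET.tostring(root, encoding="unicode")


# Assumed surrounding workflow (file names are placeholders, flags vary by
# Inkscape version):
#   older releases:  inkscape --export-plain-svg=page.svg page.pdf
#   newer releases:  inkscape page.pdf --export-filename=page.svg
# then run replace_svg_text() on the contents of page.svg and export the
# result back to PDF with Inkscape.
```

For simple substitutions sed does the same job in one line; parsing the XML is mainly useful when the change should be confined to text elements and leave ids, metadata and path data alone.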
For those who don't specifically need Gecko's rendering, a few options for rendering PDF from HTML were posted in response to a different LWN article a couple of months ago.