HTML/CSS to PDF software

Posted Jul 31, 2012 0:45 UTC (Tue) by pjm (subscriber, #2080)
In reply to: Command-line publishing with Easybook by gidoca
Parent article: Command-line publishing with Easybook

Some other HTML-to-pdf software I know of that might be useful for e-book content:

  • WeasyPrint. It was started with the intent of being easy to hack on and of using css3-page features for page styling. The "easy to hack on" goal seems to have paid off fairly well: its developers have made rapid progress over the last year, and it now supports more of css3-page (including pagination control) than I think any of the other available free-software alternatives do.

    Given the different priorities of WeasyPrint compared to wkhtmltopdf, you'd expect wkhtmltopdf to be much faster, but WeasyPrint is still pretty respectable at almost 5 pages a second on an i5 laptop, which is a lot faster than your printer. (wkhtmltopdf is indeed faster, at 30–50 pages a second, for those who need it; most other programs would be somewhere between the two.)

    It's a young product that still does things like truncate a float when it occurs near a page boundary, but if you have problems with it now, the progress it's made over the past few months makes me confident that it'll continue to get better over the next year or two. The hackability is a feature in itself: if you do find something missing, you're more likely to be able to add it yourself with this software than with any other that I'm familiar with.

  • Some more comments on wkhtmltopdf, beyond what's said above: wkhtmltopdf has become a much more viable option in the last year or so, especially if using the forked webkit engine with support for more page-control stuff. Older wkhtmltopdf/webkit was infamous for clipping text lines, but I haven't seen that happen in the current stuff (using the forked webkit). Last I saw, it still wasn't honouring 'widows' & 'orphans' and 'page-break-before/after: avoid' (so you'll sometimes see a heading at the very end of a page), but I imagine that that sort of thing is being worked on, and might already be fixed for all I know.

  • If HTML is only an intermediate format, then consider using a different intermediate format for creating PDF: for example, Apache FOP, one of the various DocBook-based toolchains, or reStructuredText as mentioned in another comment. Many of these produce really good output, better than any of the current free-software HTML/CSS options.

    A disadvantage of some of these is that you might not find it as easy to change the style of the output as the CSS-based approaches.

  • I hope that in the future that last niche might be filled by some software that I'm working on, to be called Morp. It uses CSS for styling, and already makes better pagination and line-breaking decisions than the other HTML/CSS-based PDF renderers I know of. This can already be seen in the Morp rendering of http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/5.6_Release_Notes/index.html, which uses a custom stylesheet for print-related things (using SVG images where available, and changing page headings, page margins and page numbering to match the corresponding Apache FOP output, which is currently produced using a completely separate XSL-based stylesheet).

    (Apache FOP doesn't do font substitution, so it doesn't produce usable PDF output for Red Hat's Indic-language documents; I therefore maintain a directory of Morp PDF output of the affected documents, along with the corresponding English translation. I should also mention that Apache FOP doesn't like having a keep-together (page-break-inside: avoid) region longer than a page, which is why that directory also has the HTTP load-balancing documents even though they aren't Indic-language.)

    There's still work to do before it's a half-way usable product, but Morp might later become a good option for creating PDF e-books using CSS styling.
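
For anyone who wants to compare how these renderers handle the pagination properties mentioned above ('widows', 'orphans', 'page-break-after: avoid', and css3-page's @page rule), here is a minimal sketch of a test-document generator. It uses only the Python standard library; the function name `build_book_html`, the chapter layout, the page size and the file name `book.html` are my own assumptions for illustration, not part of any of the tools discussed. How much of the stylesheet a given renderer honours will vary, as described above.

```python
from html import escape

# Print-oriented CSS using the properties discussed in the comment:
# an @page rule (css3-page), widows/orphans, and page-break-* hints.
# The specific values (A5, 2cm, 2 lines) are arbitrary choices.
PRINT_CSS = """\
@page {
  size: A5;
  margin: 2cm;
  @bottom-center { content: counter(page); }
}
p { widows: 2; orphans: 2; }
h1, h2 { page-break-after: avoid; }
pre, table { page-break-inside: avoid; }
"""


def build_book_html(title, chapters):
    """Return one standalone HTML page with the print CSS embedded.

    chapters is a list of (heading, paragraphs) pairs; this layout is
    a hypothetical convention chosen for the example.
    """
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(title)}</title>",
        f"<style>\n{PRINT_CSS}</style>",
        "</head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for heading, paragraphs in chapters:
        # Escape all text so stray '<' or '&' cannot break the markup.
        parts.append(f"<h2>{escape(heading)}</h2>")
        parts.extend(f"<p>{escape(text)}</p>" for text in paragraphs)
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    html = build_book_html(
        "Sample Book",
        [("One", ["First paragraph."]), ("Two", ["Second paragraph."])],
    )
    with open("book.html", "w", encoding="utf-8") as f:
        f.write(html)
```

The resulting file can then be handed to whichever renderer you want to try, for example `weasyprint book.html book.pdf` or `wkhtmltopdf book.html book.pdf`. A heading stranded at the bottom of a page in the output is a quick sign that 'page-break-after: avoid' isn't being honoured.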


HTML/CSS to PDF software

Posted Aug 7, 2012 13:36 UTC (Tue) by philomath (guest, #84172)

Nice roundup. There is also htmldoc.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds