> It is also a major strike against Easybook that its PDF export functionality comes from a proprietary library. There are certainly free software alternatives
I looked for FLOSS software that converts HTML to PDF back when I was developing a web application that needed to generate some PDFs, but I didn't find anything that satisfied my needs. My requirements weren't especially demanding: reasonably bug-free CSS support, the ability to force or prevent a page break at a specific location, vector graphics inclusion (for the company logo), easy integration into Rails, and no dependency on X11. In the end I settled on using prawn instead of generating the PDFs from HTML, even though it is a nightmare for complex layouts.
Posted Jul 26, 2012 15:13 UTC (Thu) by anselm (subscriber, #2796)
I have similar requirements and I ended up using Flying Saucer. It doesn't do vector graphics inclusion but the HTML-to-PDF conversion works reasonably well.
Command-line publishing with Easybook
Posted Jul 26, 2012 20:00 UTC (Thu) by jimparis (subscriber, #38647)
I've been pretty happy with the WebKit-based wkhtmltopdf. It does require an X11 server, but that doesn't mean you need a real display; I use Debian's xvfb-run wrapper to run it automatically inside Xvfb. You can also build wkhtmltopdf against a custom-patched version of Qt, which adds a bunch of features, including a fully X11-free workflow. There are some issues in areas like page breaks (which can be somewhat controlled with CSS), but overall it's decent. I use it to generate PDF invoices for a Ruby on Rails-based web store.
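For anyone who wants to script the xvfb-run approach, here is a minimal Python sketch. It assumes `xvfb-run` and `wkhtmltopdf` are installed and on `PATH`; the function names are hypothetical, and only xvfb-run's standard `-a` (`--auto-servernum`) option and wkhtmltopdf's basic `input output` invocation are used.

```python
import shutil
import subprocess
from typing import List


def build_command(html_path: str, pdf_path: str) -> List[str]:
    """Build the argument list for running wkhtmltopdf under a virtual X server.

    xvfb-run's -a (--auto-servernum) picks a free display number, so several
    conversions can run at the same time without clashing.
    """
    return ["xvfb-run", "-a", "wkhtmltopdf", html_path, pdf_path]


def html_to_pdf(html_path: str, pdf_path: str) -> None:
    """Convert an HTML file to PDF (hypothetical helper).

    Raises FileNotFoundError if a tool is missing, and CalledProcessError
    if the conversion exits with a non-zero status.
    """
    for tool in ("xvfb-run", "wkhtmltopdf"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    # check=True turns a non-zero exit status into an exception.
    subprocess.run(build_command(html_path, pdf_path), check=True)
```

Calling `html_to_pdf("invoice.html", "invoice.pdf")` would run `xvfb-run -a wkhtmltopdf invoice.html invoice.pdf`. With a wkhtmltopdf built against the patched Qt, the `xvfb-run -a` prefix could presumably be dropped.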
Command-line publishing with Easybook
Posted Jul 26, 2012 20:06 UTC (Thu) by sciurus (subscriber, #58832)
I expect that both the easiest and most correct way to build such a program is to reuse a browser's rendering engine. http://code.google.com/p/wkhtmltopdf/ uses webkit, for example.
HTML/CSS to PDF software
Posted Jul 31, 2012 0:45 UTC (Tue) by pjm (subscriber, #2080)
Some other HTML-to-pdf software I know of that might be useful for e-book content:
WeasyPrint. It was started with the intent of being easy to hack on and of using css3-page features for page styling. The "easy to hack on" goal seems to have paid off fairly well: they've made rapid progress over the last year, and it now supports more of css3-page (including pagination control) than, I think, any of the other available free-software alternatives.
Given the different priorities of WeasyPrint compared to wkhtmltopdf, you'd expect wkhtmltopdf to be much faster, but WeasyPrint is still pretty respectable at almost 5 pages a second on an i5 laptop, which is a lot faster than your printer. (wkhtmltopdf is indeed faster, at 30–50 pages a second, for those who need it; most other programs would fall somewhere between the two.)
It's a young product that still does things like truncate a float when it occurs near a page boundary, but if you have problems with it now, the progress it's made over the past few months makes me confident that it'll continue to improve over the next year or two. The hackability is a feature in itself: if you do find something missing, you're more likely to be able to add it yourself with this software than with any other that I'm familiar with.
Some more comments on wkhtmltopdf, beyond what's said above: wkhtmltopdf has become a much more viable option in the last year or so, especially when using the forked WebKit engine with support for more page-control features. Older wkhtmltopdf/WebKit was infamous for clipping lines of text, but I haven't seen that happen with the current version (using the forked WebKit). Last I saw, it still wasn't honouring 'widows' & 'orphans' or 'page-break-before/after: avoid' (so you'll sometimes see a heading at the very end of a page), but I imagine that sort of thing is being worked on, and it might already be fixed for all I know.
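To make 'widows' and 'orphans' concrete: `orphans` is the minimum number of a paragraph's lines that may be left at the bottom of a page, and `widows` the minimum that may be carried over to the top of the next page (both default to 2 in CSS). The following is a minimal, hypothetical sketch of how a renderer could choose a split point under those two rules; it is not how WebKit or wkhtmltopdf actually does it.

```python
def lines_on_current_page(n_lines: int, available: int,
                          orphans: int = 2, widows: int = 2) -> int:
    """Return how many of a paragraph's lines to place on the current page.

    n_lines   -- total lines in the paragraph
    available -- lines of space left on the current page
    orphans   -- minimum lines left at the bottom of a page (CSS 'orphans')
    widows    -- minimum lines carried to the next page (CSS 'widows')

    A return value of 0 means the whole paragraph moves to the next page.
    """
    if n_lines <= available:
        return n_lines  # fits entirely, no break needed
    # Leave at least `widows` lines for the next page...
    k = min(available, n_lines - widows)
    # ...and never strand fewer than `orphans` lines at the bottom.
    if k < orphans:
        return 0
    return k
```

For example, a five-line paragraph with four lines of space left gets three lines on the current page, so that two lines carry over instead of a single widow; with only one line of space left, the whole paragraph moves to the next page.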
If HTML is only an intermediate format, then consider using a different intermediate format for creating PDF: for example, Apache FOP, one of the various DocBook-based tools, or reStructuredText as mentioned in another comment. Many of these produce really good output, better than any of the current free-software HTML/CSS options.
A disadvantage of some of these is that changing the style of the output may not be as easy as with the CSS-based approaches.
I hope that in the future, that last niche might be filled by some software that I'm working on, to be called Morp. It uses CSS for styling and already makes better pagination and line-breaking decisions than the other HTML/CSS-based PDF renderers I know of. This can be seen in Morp's rendering of http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/5.6_Release_Notes/index.html using a custom stylesheet for print-related things (using SVG images where available, and changing page headings, page margins and page numbering to match the corresponding Apache FOP output, which is currently produced using a completely separate XSL-based stylesheet).
(Apache FOP doesn't do font substitution, so it doesn't produce usable PDF output for Red Hat's Indic-language documents; for that reason I maintain a directory of Morp PDF output of the affected documents (along with the corresponding English translations). I should also mention that Apache FOP doesn't like having a keep-together (page-break-inside: avoid) region longer than a page, which is why that directory also has the HTTP load-balancing documents even though they aren't in an Indic language.)
There's still work to do before it's a halfway usable product, but Morp might later become a good option for creating PDF e-books using CSS styling.
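The difficulty with a keep-together (page-break-inside: avoid) region longer than a page can be illustrated with a toy greedy paginator. This is a hypothetical sketch, not Apache FOP's or Morp's algorithm: each block has a height and a flag standing in for `page-break-inside: avoid`. A kept-together block that doesn't fit in the remaining space is moved to a fresh page, but one taller than a whole page can never satisfy the constraint, so the sketch falls back to splitting it like an ordinary block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    name: str
    height: int                  # assumed positive, in arbitrary units
    keep_together: bool = False  # stands in for 'page-break-inside: avoid'


def paginate(blocks: List[Block], page_height: int) -> List[List[str]]:
    """Greedily assign blocks (or pieces of blocks) to pages.

    Returns one list of block names per page; a block split across
    pages appears on each page it touches.
    """
    pages: List[List[str]] = [[]]
    remaining = page_height
    for block in blocks:
        height = block.height
        fits_on_a_page = height <= page_height
        if block.keep_together and fits_on_a_page and height > remaining:
            # Honour keep-together by starting a new page.
            pages.append([])
            remaining = page_height
        # Place the block, splitting it if it is too tall for the space left
        # (this includes keep-together blocks longer than a whole page).
        while height > 0:
            if remaining == 0:
                pages.append([])
                remaining = page_height
            used = min(height, remaining)
            pages[-1].append(block.name)
            height -= used
            remaining -= used
    return pages
```

With a page height of 10, `paginate([Block("A", 6), Block("B", 5, True)], 10)` moves B to a new page and returns `[["A"], ["B"]]`, while an ordinary B would be split as `[["A", "B"], ["B"]]`. A kept-together block of height 25 has no valid placement, so it is simply spread over the pages it needs.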
HTML/CSS to PDF software
Posted Aug 7, 2012 13:36 UTC (Tue) by philomath (guest, #84172)
Nice roundup. There is also htmldoc.
Command-line publishing with Easybook
Posted Aug 9, 2012 11:49 UTC (Thu) by gidoca (subscriber, #62438)
Thank you to everybody for your suggestions. I will check them out sometime.