Command-line publishing with Easybook

By Nathan Willis
July 25, 2012

The web and e-books were both supposed to kill off paper-based publishing, but the reality is that authors and publishers often need to produce editions for all three formats instead. Easybook is one of several open source tools for doing just that. It is a PHP program used to write book content just once, then export it for output in a variety of formats — currently EPUB, HTML, and PDF. Easybook certainly does make the actual rendering of content into a simple affair, but there are other issues to consider, including the ease of editing content, and Easybook's reliance on proprietary software under the hood.

Easybook itself is a script written in PHP, but designed to be called from the command line. The actual book content is written in Markdown format, stored one-chapter-per-file, with a separate YAML file holding the book's configuration settings and a description of its structure. The configuration settings allow you to define several "editions" of each book, which incorporate output templates plus variations in the content. For example, you might want to include a "list of figures" in the PDF edition of a book, but omit it from the HTML version where such things are uncommon. Whenever you are happy with the content, you can export it to one of the defined editions with the Easybook script. The script generates HTML and EPUB (which of course is derived from HTML) directly, and calls PrinceXML to do the HTML-to-PDF conversion for PDF output.

As is the case in any typesetting program, it is the quality of the templates and styles that makes or breaks the final output. The "editions" that you define in an Easybook project's configuration file are themes that incorporate page settings, typography, CSS styles, and structure. Easybook defines four themes by default: EPUB, PDF, and two varieties of HTML (single-page HTML and "HTML chunked," which splits the book into separate pages for each chapter). Even four sizes do not fit all, however, so you can (and indeed should) edit and extend them for your own work.

You can download Easybook as a Zip archive or check it out via Git. The current release is version 4.4. PHP 5.3.2 is required, and the download bundles in most of the required PHP packages, such as the Symfony component framework and Twig templating system. Easybook itself is available under the MIT license and most of the other components are open source as well. PrinceXML is proprietary, however: you must either install the free-for-noncommercial edition, which watermarks the first page of each PDF, or buy a license.

Book creation 101

Starting a new book project with Easybook is as simple as executing the ./book new "My Title" command. It creates a skeleton directory structure for the new book beneath your Easybook installation directory (wherever that may be), converting the book title into a more Unix-like lowercase-and-hyphens form for the subdirectory name. The command also populates the directory with a default configuration file and some blank chapter files. The directory structure looks like:

    ./my-title/
               config.yml
               Contents/
                        chapter1.md
                        chapter2.md
                        images/
               Output/
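
As a rough illustration of the title-to-directory conversion, the Python sketch below turns a title into a lowercase, hyphen-separated name. It is an assumption about the general approach rather than Easybook's own (PHP) code, which may treat accented letters and punctuation differently; the function name slugify is hypothetical.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Approximate a title-to-directory-name conversion (hypothetical helper)."""
    # Split accented letters into base letter + combining mark, then drop
    # everything that is not plain ASCII (so "é" becomes "e").
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse each run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_title)
    # Trim hyphens left at either end and lowercase the result.
    return slug.strip("-").lower()


print(slugify("My Title"))  # my-title
```

With this sketch, "My Title" becomes my-title, matching the directory shown above.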

As mentioned above, book text is written in Markdown format (hence the .md file extensions). I am not a huge fan of Markdown; in my experience its not-quite-HTML syntax requires just as much mental effort as HTML, yet it still requires you to process the file before you can read the result. But it does have its supporters. In any case, you can write the meat of your text in any editor or combination of editors you choose. The chapter1.md and chapter2.md file names are there merely to guide you; you can name your files anything you wish, because you must edit the config.yml file to tell Easybook what your book consists of and how it will look.

The config.yml file contains a header stanza that includes general-purpose information like title, author, and publication date. Watch out for the edition option, though: in the header, this refers to the publication edition, which is what will enable rare-book collectors decades from now to recognize your valuable first editions and pay more to own them. Further down, the editions (note the plural) option is where you list and describe the Easybook "editions" mentioned above. The default file created by new includes the four basic theme types, although under different names: "ebook" means EPUB, "web" means single-file HTML, "website" means HTML chunked, and "print" means PDF.

The name of the edition you want is the argument you pass to Easybook when generating output. So, for instance,

    ./book publish my-title print

generates the "print" edition PDF and places it in the Output/ subdirectory.

The next major section of the config.yml file is taken up by the contents stanza, in which you list the elements comprising your book and the files in which its data is contained. Every element that goes into the book has its own element: option in this section. Easybook understands about twenty different elements at present. Some of them can be included simply by listing the element, such as a table of contents:

        - { element: toc   }

Because the table of contents is generated automatically at export time, it requires no other configuration. Chapters, however, need to point to the correct file:

        - { element: chapter, number: 1, content: chapter1.md }
        - { element: chapter, number: 2, content: blahblahblahblah.md }
        - { element: chapter, number: 3, content: thebutlerdidit.md }

Since you specify the filename, it can be anything you want, and it is simple to rearrange the chapters. Some of the other elements, such as introduction and epilogue, work the same way as chapter. There are two advantages to using a separate element for these components, though: you can style them differently for output, and you can include or omit them in the different editions of your book. Other elements, such as the list of tables (lot), are generated automatically. You can also group your text into higher-level divisions with the part element.
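
Because config.yml is edited by hand, a mistyped file name is an easy mistake to make. The Python sketch below, offered only as an illustration, models the contents stanza as the ordered list of mappings a YAML parser would produce and reports which referenced files are missing from a Contents/ directory. The helper name missing_content_files is hypothetical; it is not part of Easybook.

```python
from pathlib import Path

# The contents stanza as a YAML parser would return it: an ordered list of
# mappings. Entries without a "content" key (toc, lot) need no file.
CONTENTS = [
    {"element": "toc"},
    {"element": "chapter", "number": 1, "content": "chapter1.md"},
    {"element": "chapter", "number": 2, "content": "blahblahblahblah.md"},
    {"element": "chapter", "number": 3, "content": "thebutlerdidit.md"},
    {"element": "lot"},
]


def missing_content_files(contents, contents_dir):
    """Return content files named in the stanza that are absent from contents_dir."""
    base = Path(contents_dir)
    return [
        item["content"]
        for item in contents
        if "content" in item and not (base / item["content"]).is_file()
    ]


# Hypothetical usage, run from the directory that holds my-title/:
print(missing_content_files(CONTENTS, "my-title/Contents"))
```

Reordering chapters in this model is just reordering list entries, which mirrors why the article calls rearranging chapters simple.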

Finally, down in the editions stanza you will see each of the editions defined for the book. The four defaults mentioned earlier each have an indented list of options, and you can add further directives to adjust their output or to alter the way they interpret individual book elements. For instance, the toc element can take a deep directive telling it how many heading levels to index. It takes a value from 1 to 6, corresponding to HTML's <h1> through <h6> headings. By default toc searches only chapter, part, and appendix elements to build its index, but you can add others by listing them after an elements: directive underneath toc. For example,

    toc:
        deep:       4
        elements:   ["appendix", "chapter", "preface", "afterword", "conclusion"]

Because toc's deep directive lives in the editions stanza rather than up in the contents stanza, you can give your print, web, and ebook editions different table-of-contents depths. Other directives are unique to a specific output format; for example, a PDF edition can specify all four margin widths, choose between single- and double-sided pages, and include an ISBN.
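
The filtering that deep and elements imply can be sketched in a few lines of Python. This is a hypothetical model, not Easybook's implementation: it assumes each heading is tagged with the book element it came from and its level (1 for <h1> through 6 for <h6>), and keeps only headings that are shallow enough and belong to an indexed element type. The names Heading and toc_entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    element: str  # book element the heading came from, e.g. "chapter"
    level: int    # 1 for <h1> through 6 for <h6>
    title: str


# Element types the article says toc indexes unless told otherwise.
DEFAULT_TOC_ELEMENTS = ("chapter", "part", "appendix")


def toc_entries(headings, deep, elements=DEFAULT_TOC_ELEMENTS):
    """Keep headings no deeper than `deep` that belong to an indexed element type."""
    if not 1 <= deep <= 6:
        raise ValueError("deep must be between 1 and 6")
    return [h for h in headings if h.level <= deep and h.element in elements]


headings = [
    Heading("preface", 1, "Preface"),
    Heading("chapter", 1, "Getting started"),
    Heading("chapter", 2, "Installing"),
    Heading("chapter", 5, "A very minor point"),
]
print([h.title for h in toc_entries(headings, deep=4)])
# ['Getting started', 'Installing']
```

Passing an elements list that includes "preface", as in the configuration example above, would bring the preface heading into the index as well.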

Themes with variation

The simplest way to customize Easybook's output is to create your own editions. You can add them to the editions stanza and specify every option, or extend the basic set with different options. For example, you could either create a new edition called booklet with different font and page sizes from print, or you could put the extends: print directive in your new edition and automatically inherit all of print's settings.
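
One plausible way to picture extends is as a merge in which the child edition's settings override the parent's. The Python sketch below assumes a shallow merge and uses hypothetical setting names (format, page_size, font_size, two_sided); Easybook's real option names and merge rules may differ, for instance in how nested options such as toc are combined. The function name resolve_edition is likewise invented.

```python
def resolve_edition(editions, name, _seen=None):
    """Return an edition's settings with its `extends` parent merged in first."""
    seen = set() if _seen is None else _seen
    if name in seen:
        raise ValueError(f"circular extends involving {name!r}")
    seen.add(name)
    own = dict(editions[name])  # copy so the caller's mapping is not modified
    parent = own.pop("extends", None)
    if parent is None:
        return own
    merged = resolve_edition(editions, parent, seen)
    merged.update(own)  # shallow merge: the child's settings win
    return merged


# Hypothetical settings; Easybook's real option names may differ.
EDITIONS = {
    "print": {"format": "pdf", "page_size": "A4", "font_size": "11pt", "two_sided": True},
    "booklet": {"extends": "print", "page_size": "A5", "font_size": "9pt"},
}

print(resolve_edition(EDITIONS, "booklet"))
# {'format': 'pdf', 'page_size': 'A5', 'font_size': '9pt', 'two_sided': True}
```

The cycle check guards against an edition that, directly or indirectly, extends itself.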

A more complicated option is to override the way the default theme handles specific content types. You will recall that in the contents stanza, the chapter elements pointed to a file, but most other elements required no further attributes. In those cases, the default theme already has a Twig template defined that tells the Easybook renderer how to handle the element. For example, the license element has a boilerplate "All rights reserved" license buried within Easybook's theme directory. But you can point to your own as well, such as:

        - { element: license, content: GNU-FDL.md }

The default behavior of the theme for each element type is stored in a Contents subdirectory of the edition type, which is itself a subdirectory of the theme, all of which lives beneath your personal Easybook installation location. Throw in the twenty-odd element types, and that adds up to a large set of files. For instance, the license element for the default PDF type is found in $your_easybook_install_dir/app/Resources/Themes/Base/Pdf/Contents/license.md.twig.
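
The lookup described above, where an element uses the book's own file if one is named and otherwise falls back to the theme's default template, can be modeled as follows. This Python sketch is an assumption built from the single path quoted above: the Base theme and Pdf edition-type directory come from that path, while the function name content_source and the /opt/easybook install location are hypothetical. Easybook may search additional locations.

```python
from pathlib import PurePosixPath


def content_source(item, install_dir, edition_type, theme="Base"):
    """Return where an element's text would come from (hypothetical lookup).

    A book-supplied file wins; otherwise fall back to the theme's default
    Twig template, following the single path quoted in the article.
    """
    if "content" in item:
        # The book names its own Markdown file, relative to the book directory.
        return PurePosixPath("Contents") / item["content"]
    # No file given: use the theme's template for this element type.
    return (
        PurePosixPath(install_dir)
        / "app" / "Resources" / "Themes" / theme
        / edition_type / "Contents" / f"{item['element']}.md.twig"
    )


print(content_source({"element": "license"}, "/opt/easybook", "Pdf"))
# /opt/easybook/app/Resources/Themes/Base/Pdf/Contents/license.md.twig
print(content_source({"element": "license", "content": "GNU-FDL.md"}, "/opt/easybook", "Pdf"))
# Contents/GNU-FDL.md
```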

You are (clearly) not meant to find and modify these files by hand, but the online documentation lacks a complete reference for the default themes and what they produce. In some cases, this is a cosmetic issue, but in others it is significant. If you put a cover element in your book, Easybook will generate it based on the book title, author, and edition given in the header. If you want it to include the publication date, too, you have to find and modify the Twig file.

Because themes are a collection of Twig template files, you can also create your own. The Easybook documentation has a separate chapter on the process, which is good because there are a considerable number of pieces to assemble. Each theme requires Twig templates for every content element type, plus templates for tables, figures, source code listings, and the book as a whole. HTML themes require extra templates for basic layout, and EPUB themes require other templates to handle creating EPUB's metadata files. In addition, you must create a CSS stylesheet that defines the styles referenced in the Twig templates.

Admittedly, this is advanced stuff, and Easybook attempts to provide you with simpler methods to modify your output merely by adding attributes and options in the config.yml file. I suspect that if you found the time required to develop an Easybook theme from scratch, you could just as easily use LaTeX or another typesetting system.

Easybook's real competition is other lightweight (in terms of user interface) formatting systems like Sourcefabric's Booktype (which we covered in February). Between the two, Booktype edges out Easybook for ease of editing. It provides a web-based WYSIWYG editor rather than requiring Markdown, and it supports distributed teams of authors out of the box. While Easybook's locally installed command-line tool is easy to use, the fact that it stores the book's contents and configuration file in a single location does not lend itself well to working with others. Despite the romanticized notion of authors toiling away in remote cabins in the woods, few if any book projects are single-user affairs.

It is also a major strike against Easybook that its PDF export functionality comes from a proprietary library. There are certainly free software alternatives; perhaps the Easybook project was unimpressed with their output, but given that the free version of PrinceXML watermarks its output, it is hardly a viable option anyway. That said, what Easybook does do is provide a straightforward way to maintain format-independent text works and rapidly generate output suitable for consumption. For a lot of armchair publishers, that may be enough.


Command-line publishing with Easybook

Posted Jul 26, 2012 13:01 UTC (Thu) by gidoca (subscriber, #62438)

> It is also a major strike against Easybook that its PDF export functionality comes from a proprietary library. There are certainly free software alternatives

I looked for FLOSS software that converts HTML to PDF back when I was developing a web application that needed to generate some PDFs. However, I didn't find anything that satisfied my needs. I didn't have very special requirements: just not-too-buggy CSS support, the ability to insert or prevent a page break at a specific location, vector graphics inclusion (for the company logo), the ability to integrate it into Rails, and no dependency on X11. In the end I settled on using prawn instead of generating from HTML, despite the fact that it is a nightmare for complex layouts.

Command-line publishing with Easybook

Posted Jul 26, 2012 15:13 UTC (Thu) by anselm (subscriber, #2796)

I have similar requirements and I ended up using Flying Saucer. It doesn't do vector graphics inclusion but the HTML-to-PDF conversion works reasonably well.

Command-line publishing with Easybook

Posted Jul 26, 2012 20:00 UTC (Thu) by jimparis (subscriber, #38647)

I've been pretty happy with the webkit-based wkhtmltopdf. It does require an X11 server, but that doesn't mean you need a real display; I use Debian's xvfb-run wrapper to automatically run it inside xvfb. You can also build wkhtmltopdf against a custom-patched version of Qt, which adds a bunch of features, including a fully X11-free workflow. There are some issues in areas like page breaks (which can be somewhat controlled with CSS), but overall it's decent. I use it to generate PDF invoices for a Ruby on Rails based webstore.

Command-line publishing with Easybook

Posted Jul 26, 2012 20:06 UTC (Thu) by sciurus (subscriber, #58832)

I expect that both the easiest and most correct way to build such a program is to reuse a browser's rendering engine. http://code.google.com/p/wkhtmltopdf/ uses webkit, for example.

HTML/CSS to PDF software

Posted Jul 31, 2012 0:45 UTC (Tue) by pjm (subscriber, #2080)

Some other HTML-to-pdf software I know of that might be useful for e-book content:

  • WeasyPrint. It was started with the intent of being easy to hack on and of using css3-page features for page styling. The "easy to hack on" goal seems to have paid off fairly well: they've made rapid progress over the last year, and it supports more of css3-page (including pagination control) than, I think, any of the available free-software alternatives.

    Given the different priorities of WeasyPrint compared to wkhtmltopdf, you'd expect wkhtmltopdf to be much faster, but WeasyPrint is still pretty respectable at almost 5 pages a second on an i5 laptop — that's a lot faster than your printer. (wkhtmltopdf is indeed faster, at 30–50 pages a second, for those that need it; most other programs would be somewhere between the two.)

    It's a young product that still does things like truncate a float when it occurs near a page boundary, but if you have problems with it now then the progress it's made over the past few months makes me confident that it'll continue to get better over the next year or two. The hackability is a feature in itself: if you do find something missing, then you'll be more likely to be able to add it yourself with this software than any others that I'm familiar with.

  • Some more comments on wkhtmltopdf, beyond what's said above: wkhtmltopdf has become a much more viable option in the last year or so, especially if using the forked webkit engine with support for more page-control stuff. Older wkhtmltopdf/webkit was infamous for clipping text lines, but I haven't seen that happen in the current stuff (using the forked webkit). Last I saw, it still wasn't honouring 'widows' & 'orphans' and 'page-break-before/after: avoid' (so you'll sometimes see a heading at the very end of a page), but I imagine that that sort of thing is being worked on, and might already be fixed for all I know.

  • If HTML is only an intermediate format, then consider using a different intermediate format for creating PDF: for example, Apache FOP, one of the various DocBook-based tools, or reStructuredText as mentioned in another comment. Many of these produce really good output, better than any of the current free-software HTML/CSS options.

    A disadvantage of some of these is that you might not find it as easy to change the style of the output as the CSS-based approaches.

  • I hope that in the future, that last niche might be filled by some software that I'm working on, to be called Morp. It uses CSS for styling, and already makes better pagination and line-breaking decisions than the other HTML/CSS-based pdf renderers I know of, as can already be seen in Morp rendering of http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/5.6_Release_Notes/index.html using a custom stylesheet for print-related things (using SVG images where available, and changing page headings, page margins and page numbering to match the corresponding Apache FOP output, which is currently produced using a completely separate XSL-based stylesheet).

    (Apache FOP doesn't do font substitution, so it doesn't produce usable PDF output for Red Hat's Indic-language documents, so I maintain a directory of Morp PDF output of the affected documents (along with the corresponding English translation). I should also mention that Apache FOP doesn't like having a keep-together (page-break-inside:avoid) region longer than a page, which is why that directory also has the HTTP load-balancing documents even though they aren't Indic-language.)

    There's still work to do before it's a half-way usable product, but Morp might later become a good option for creating PDF e-books using CSS styling.

HTML/CSS to PDF software

Posted Aug 7, 2012 13:36 UTC (Tue) by philomath (guest, #84172)

Nice roundup. There is also htmldoc.

Command-line publishing with Easybook

Posted Aug 9, 2012 11:49 UTC (Thu) by gidoca (subscriber, #62438)

Thank you to everybody for your suggestions. I will check them out sometime.

Command-line publishing with Easybook

Posted Jul 27, 2012 8:18 UTC (Fri) by cmccabe (guest, #60281)

It's interesting to know about another option. The PrinceXML dependency sounds like a real pain, though. I'm also not crazy about having to install a bundled version of PHP.

I think if I were writing a book, I'd use TeX. There is a learning curve, but it's very powerful and stable. I know that TeX is used a lot by mathematicians and computer scientists, but I'm not too familiar with how widely used it is in the mainstream publishing industry. Does MS Word still prevail there?

Command-line publishing with Easybook

Posted Jul 27, 2012 12:22 UTC (Fri) by nix (subscriber, #2304)

In the genre fiction publishing industry, at least, the entire workflow is based on Word 97–2003 format files (and features, such as change tracking).

Command-line publishing with Easybook

Posted Jul 28, 2012 18:22 UTC (Sat) by dag- (subscriber, #30207)

I would use AsciiDoc for writing the content as it supports the Simple DocBook functionality. AsciiDoc converts natively to HTML, and converts to EPUB and PDF through its a2x helper.

For PDF output there are two options, either use DocBook+FOP, or asciidoc-odf to produce PDF output styled through LibreOffice.

Since AsciiDoc is used nowadays for various O'Reilly books it is up to par with the requirements for printed material and digital formats.

Command-line publishing with Easybook

Posted Jul 30, 2012 7:34 UTC (Mon) by valhalla (subscriber, #56634)

Personally I'm currently writing almost everything (article-length stuff and posts for my website, mostly) in reStructuredText and then using one of the many tools available to transform it.

The best results for PDF generation are still those that go through LaTeX, and they may require some knowledge of it for customisation (which works for me, since I come from that :) ), but people who don't know LaTeX can still ask somebody else for that bit of customisation and concentrate on writing the text.

There are of course also tools to generate HTML and ePub, and also ODT, which would then allow generating .doc files for submission to traditional publishers.

Most of those tools are written in python (a good number comes straight from docutils) and of course they don't have dependencies on proprietary software.

I've also been looking at a toolchain based on pandoc, which is able to generate even more formats, but its support for reStructuredText is still somewhat limited. On the other hand, it is an alternative for people who would like to use Markdown (its "native" source format), and of course it is still totally free.


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds