Ebook editing with Sigil

August 31, 2011

This article was contributed by Nathan Willis

Although most software and hardware ebook readers can cope with standard PDF documents, text-based formats like HTML and EPUB make life easier for the author and reader due to their flexibility and editability. They can re-flow text to fit varying screen sizes and orientations, for example, and better cope with missing fonts. But historically ebook authors and editors have not had a good open source option, which meant anyone wishing to pitch in at Project Gutenberg or start an original book had to stick with a generic text editor or word processor. The creators of Sigil are attempting to change that by building a free book editor tailored to the recurring tasks in preparing a text for ebook publication.

Sigil has been in development since late 2009, and the current release is version 0.4.1. Releases are made simultaneously for Linux, Mac OS X, and Windows, with 32-bit and 64-bit versions for Linux and Windows. John Schember assumed maintainership of the project in July 2011 after founder Strahinja Marković decided to step down.

Schember pushed out the 0.4 release that had been in beta-and-RC limbo since Marković finished graduate school and work began consuming his time. There is also a degree of overlap (including Schember himself) between the Sigil development community and the Calibre ebook-manager project, which is good, considering their shared and complementary concerns. The 0.4.1 release dropped on August 26. As Schember explained it on the development blog, in his numbering scheme an N.0 release is unstable and the N.1 release indicates that the application is now ready for public use.

Sigil uses Qt as its application framework and embeds WebKit to render HTML book contents. However, the Linux downloads are InstallJammer installers rather than standard RPM or Debian packages, and they bundle their own copy of the necessary .so files, placing everything in /opt/sigil/. Presumably this installation choice is the result of structuring the code to produce cross-platform packages for those OSes that cannot simply fetch the necessary dependencies through their package managers. As a result of that choice, though, the installer weighs in at 20.5MB and you end up with some duplicate libraries. Hopefully distributions will eventually start packaging Sigil themselves to avoid that duplication.

The installer does create .desktop launcher files on the Desktop and in the system's "Applications -> Office" menu, however. On the down side, it also registers itself as the default file-helper for EPUB downloads in Firefox, which may not be the desired behavior.

Getting around the ebook

[Editor]

Sigil's interface resembles a cross between a lightweight HTML editor and an ebook reader — the main content is in the center pane of the window, and you can toggle between rendered, source, and split views on the text, while navigation tools sit in side panes to either side.

On the left is a "book browser." The EPUB format consists of a ZIP file with nested folders containing XHTML text, images, CSS stylesheets, embedded fonts, and two metadata files. The .opf file includes publication metadata (author, copyright, etc.) and a manifest of the book's other content files, and the .ncx file contains an XML version of the table of contents (TOC). Sigil can edit any of the text-based files; double-clicking on one in the book browser opens it in a new tab. But you also use the book browser to add, remove, and manipulate extra media, such as images and fonts. On the right is a table-of-contents list, which automatically loads the TOC from an EPUB's .ncx file. You can jump between TOC entries by clicking on them in the list.
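To make that layout concrete, here is a minimal Python sketch that opens an EPUB as a ZIP archive, locates the .opf file, and lists its manifest. It relies on one detail not described above: the EPUB container format stores a META-INF/container.xml file that points at the .opf. The function name describe_epub is purely illustrative, and this is not how Sigil itself reads a book.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and the OPF package file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def describe_epub(path):
    """Return the .opf path and the manifest entries of an EPUB.

    `path` may be a filename or a binary file-like object.
    (Illustrative helper; the name is an assumption, not a Sigil API.)
    """
    with zipfile.ZipFile(path) as book:
        # META-INF/container.xml names the .opf package file.
        container = ET.fromstring(book.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")

        # The .opf manifest lists every other content file in the book.
        package = ET.fromstring(book.read(opf_path))
        manifest = [
            (item.get("id"), item.get("href"), item.get("media-type"))
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        ]
    return opf_path, manifest
```

The .ncx file normally appears as one of the manifest items, with the media type application/x-dtbncx+xml.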

Basic text editing is self-explanatory. Sigil provides a word-processor-like toolbar for cutting, pasting, and basic formatting, plus shortcuts to validate the HTML and insert chapter break markers. The edit menus allow you to jump to specific lines by number, and the find-and-replace function is beefy enough to accept standard wildcard characters and Perl-like regular expressions.

When it comes to formatting, however, the need for the source code and split views becomes clearer. Basic HTML tags are good enough for text formatting, but the structure of EPUB files depends on its own set of conventions. For example, I tested several Project Gutenberg EPUBs in Sigil, and noticed that it showed numerous erroneous chapter breaks. Investigating the source, I found that the .ncx file flagged every headline HTML tag as a chapter break.

According to the documentation, Sigil evidently expects headline tags only to be used to mark structural divisions in the text, and also interprets them as a nested tree: H2 tags are subdivisions of H1 tags, and so on. But Project Gutenberg uses headline tags to mark other elements, such as the author's byline on the title page, and occasionally to mark up text itself, such as centered text or "inscriptions" that are separated from the main copy. When brought through to Sigil, these tags are inserted into the TOC. If the text merely uses different levels of headline tag for different display styles, they get nested into a hierarchical TOC anyway.
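The effect is easy to reproduce outside of Sigil. The following Python sketch builds an indented outline from nothing but the heading levels in a page; it illustrates the general heading-to-tree idea and is not Sigil's actual TOC generator. The class and function names are invented for the example.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect a (level, text) pair for every h1..h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def toc_outline(html):
    """Return one indented line per heading, nested by heading level."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return ["  " * (level - 1) + text for level, text in parser.headings]
```

Fed a title page where the byline is marked up as an H2 and an inscription as an H3, the outline dutifully lists both as nested TOC entries, which is the behavior described above.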

Editing — and grappling — with EPUB

Correcting this sort of problem requires re-formatting the HTML, perhaps even munging about in the CSS — such as using <DIV> or <SPAN> tags to apply styles within the text. While paragraph indentation and text weight are simple enough for WYSIWYG editing, both of those more complex tasks are better done in a proper code editor. Fortunately Sigil's code view includes syntax highlighting, parenthesis matching, and cleanup courtesy of HTML Tidy.

Changes you make to the CSS files are instantly picked up in the rendered HTML view, which is essential. What seems to be missing, however, is an interface to work with the other structural elements, starting with element ID attributes. Sigil's TOC generator creates these automatically when creating the TOC and inserts them into the text. Element IDs are the preferred method for linking to content within a page (replacing older anchor tags), but Sigil assigns them automatically, using a mnemonically unfriendly number scheme. It would be helpful to allow some control over this process, particularly for those working on very long texts.
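As a rough idea of what a friendlier scheme might look like, this Python sketch derives an ID from the heading text and adds a numeric suffix only when a collision occurs. It is a hypothetical alternative, not a description of what Sigil does, and the function name readable_id is made up for the example.

```python
import re


def readable_id(title, used):
    """Derive a unique, readable element ID from a heading title.

    `used` is a set of IDs already taken; the new ID is added to it.
    (Hypothetical scheme for illustration only.)
    """
    # Lowercase, and collapse every run of other characters into a hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "section"
    if slug[0].isdigit():
        slug = "id-" + slug  # XML IDs may not begin with a digit

    # Append -2, -3, ... until the ID no longer collides.
    candidate, n = slug, 2
    while candidate in used:
        candidate = f"{slug}-{n}"
        n += 1
    used.add(candidate)
    return candidate
```

With this approach a heading such as "Chapter 1: Loomings" would get the ID chapter-1-loomings, which is easier to recognize in a long source file than an opaque number.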

Sigil even actively discourages you from directly editing the .ncx and .opf files in the editor, popping up a scary warning dialog that you must dismiss to continue. Perhaps that is wise, and without it too many users would foul up their TOC. But considering that Sigil can re-generate a new TOC with one button click, it seems like an unnecessary red flag.

It is also possible to break your text up into multiple XHTML files, and use multiple CSS files, but the Sigil editor offers little in the way of managing them. The convention is to use a separate XHTML file for each chapter in a long work, and while you can right-click in the book browser and create a new file, the only management options are "merge with previous" and an "Add Semantics" sub-menu where you can flag special pages such as introductions or glossaries.

[Metadata editor]

Beyond the text itself, Sigil allows you to edit an EPUB's metadata (which is stored inside the book's .opf file). You do this with the "Tools -> Meta Editor" window, which provides a structured editor for key:value metadata pairs. The list of supported metadata properties is long; the editor breaks it into "basic" and "advanced" groupings. ISBN, Publisher, Author, Subject and so forth are "basic," while "advanced" contains several hundred possibilities, from "Actor" and "Adapter" all the way to "Woodcutter" and "Writer of accompanying material."
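For readers curious about what those pairs look like on disk, the sketch below reads the Dublin Core elements out of an .opf document with Python's standard library. It assumes the EPUB 2 convention in which contributor roles are recorded as an opf:role attribute; the sample document and the function name read_metadata are made up for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# A tiny, hand-written .opf fragment used only as example input.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Moby Dick</dc:title>
    <dc:creator opf:role="aut">Herman Melville</dc:creator>
    <meta name="cover" content="cover-image"/>
  </metadata>
</package>"""


def read_metadata(opf_xml):
    """Return (property, value, role) triples for each Dublin Core element."""
    package = ET.fromstring(opf_xml)
    metadata = package.find(f"{{{OPF}}}metadata")
    entries = []
    for element in metadata:
        # Skip non-Dublin-Core children such as <meta name="cover" .../>.
        if element.tag.startswith(f"{{{DC}}}"):
            name = element.tag.split("}", 1)[1]
            role = element.get(f"{{{OPF}}}role")  # None when no role is set
            entries.append((name, (element.text or "").strip(), role))
    return entries


if __name__ == "__main__":
    for entry in read_metadata(SAMPLE_OPF):
        print(entry)
```

Here "aut" is the role code for an author; the more exotic contributor roles in the editor's list are stored the same way, with a different code.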

The tool is simple enough to use, particularly since each metadata property comes with a brief explanation. Otherwise I might not have guessed my way to the correct definitions of obscure properties like "Respondent-appellee" or "Electrotyper." Scanning through the list, it is clear that the EPUB community has far more than generic novels in mind for its document format: everything from screenplays to academic dissertations to legal filings is supported. In my experience, though, most ebooks take little advantage of the rich metadata options available, and free ebooks even less so than commercial ones.

[Adding semantics]

You can also add images and fonts to the ebook by right-clicking in the book browser, although doing so only adds the files to the browser. You still must manually insert images into the text editor, and reference the fonts in the CSS in order to take advantage of them. Whenever your editing session is complete, Sigil's "Validate Epub" tool will look for errors using a side-project validator called FlightCrew (although which errors it looks for are not documented), and "File -> Save As" will generate an EPUB output file.

Old books and new books

At the moment EPUB is the only output format supported, although Sigil can also import HTML text and PDFs. The use cases I looked at first were editing existing ebooks — such as fixing formatting problems in Project Gutenberg books. Ideally, Sigil will someday be useful for other tasks, such as converting an HTML or MOBI-formatted work into a proper EPUB, or writing a new ebook from scratch.

Calibre can perform format conversions, although it does so automatically. In order to edit the result, you must convert it first and then open the result in Sigil. That is not too time-consuming, although it would help matters if Calibre's conversion function could be called from within Sigil at import-time — as it stands, Calibre's main focus is indexing one's ebook library, which makes converting a single file tedious.
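One way to script the conversion step is Calibre's command-line ebook-convert tool, which a reader points out in the comments below. The Python sketch that follows wraps it with subprocess; it assumes only the basic "ebook-convert input output" form, where the output format is inferred from the file extension, and the helper names are invented for the example.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(source, target_format="epub"):
    """Build the ebook-convert command line for one input file."""
    source = Path(source)
    # ebook-convert picks the output format from the output file's extension.
    output = source.with_suffix("." + target_format)
    return ["ebook-convert", str(source), str(output)]


def convert(source, target_format="epub"):
    """Run ebook-convert and return the path of the converted file."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    command = convert_command(source, target_format)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command[2]
```

Calling convert("book.mobi") would write book.epub alongside the original, ready to open in Sigil.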

More importantly, Sigil allows you to create a "new" ebook project, and will automatically generate empty .ncx and .opf skeleton files. But here the book-management shortcomings mentioned earlier come right to the forefront. Working on any kind of multi-file document is awkward, as is changing CSS formatting to perfect the layout and look. Hopefully future revisions of the application will build more tools for authors, taking care of some of the predictable but still manual-only tasks. For example, if you add a font to an EPUB package, it would be nice to be able to highlight text in the editor and choose that font from the Format menu, or even to set it as the default for the entire book.

Colophon

Perhaps dedicated EPUB editors are not the wave of the future, and in a few years managing an ebook text will be a simple task in normal word processors, or simply a "print-to-file" style option in every application. For now, however, if you want e-reader output, you need an application that understands the formats explicitly. In that regard, Sigil is vastly superior to using an HTML editor, or generating PDFs and hoping for the best.

Now that a new maintainer is driving development, the pace of work should pick up, and with it progress on the weak points mentioned above. There are other nitpicks in 0.4.1, such as the number of windows that require re-sizing in order for the user to read their contents, and the fact that opening a file from the File menu closes the current editing session (although it warns about this first with a dialog box). I was greatly relieved to find that Sigil uses standard system icons and widgets throughout its UI, unlike its cousin Calibre, which tends towards the garish.

I spent some time reading about the EPUB format itself, and although it is free, a major new revision is due shortly (for some value of "shortly"). The Sigil team is following the specification process, which is good, although there does not seem to be much interest in supporting other ebook formats, such as FictionBook, Plucker, or the Kindle format.

No doubt EPUB has plenty of life left in it, but as electronic publishing consumes more and more of the publishing industry, other formats are going to become just as important (if not more so). EPUB is not well suited for documents that rely on controlled layout, such as magazines or textbooks, nor does it handle math. Closed and proprietary formats are going to make a play for those documents; with luck Sigil — and other open source tools — will be ready. If you need to edit an EPUB today, Sigil is just what you want. If you are writing from scratch, however, you might find it easier to crank out your tome in another editor, and turn to Sigil for the final formatting.



epub shortcomings, mathematics

Posted Sep 1, 2011 6:30 UTC (Thu) by pjm (subscriber, #2080) [Link]

> nor does it handle math

The "major new revision ... due shortly" linked page mentions the addition of support for MathML. Does that adequately address the "doesn't handle mathematics" concern (once the new version is implemented and in use)?

epub shortcomings, mathematics

Posted Sep 1, 2011 13:32 UTC (Thu) by n8willis (editor, #43041) [Link]

If you're asking for my honest opinion, I'd say "we'll have to wait and see." MathML has its share of difficulties, but the biggest is that it is designed as a presentation markup language. That's great for the e-textbook use case, where authors would presumably write up formulas and proofs in some other system (and by that I mean another application, e.g., Mathematica) and have it converted, but editing MathML in GUI form remains a largely unsolved (or unoptimized, at any rate) problem. FireMath is decent enough, but can you imagine trying to integrate something like it into Sigil? That would be a lot of work.

Nate

epub shortcomings, mathematics

Posted Sep 2, 2011 9:00 UTC (Fri) by pjm (subscriber, #2080) [Link]

I think you'd agree that "[MathML] is designed as a presentation markup language" at least needs qualifying, given the content/semantics part of MathML and Mathematica's authors' substantial involvement in its development.

To anyone looking at adding MathML support to an editor like Sigil, I'd first suggest having a look at http://www.w3.org/Math/Software/ to see what the options are. There are a few approaches one could take for an initial implementation, from using human-editable ASCII representations à la TeX for simple uses (mentioning x² and the like in body text) to launching an external application for more complex uses. The external application might even be something that can only export to MathML rather than editing the MathML directly (so long as Sigil keeps track of what the input file is). Granted, these simple approaches are nothing if not "unoptimized"; but they're a start, and using TeX-like input is no worse than what academics have been doing for a few decades.

epub shortcomings, mathematics

Posted Sep 2, 2011 14:29 UTC (Fri) by n8willis (editor, #43041) [Link]

Yes, I mean Presentation MathML (I don't think Content MathML is applicable in the ebook context, since it's designed as a medium for human reading). We did a piece about the whole can of beans back in April: http://lwn.net/Articles/440313/ .... Unfortunately (though probably predictably) the comment thread eventually devolved into an argument over the abstract virtues of different approaches.

If you ask me, ultimately writing mathematics is always going to involve ambiguity -- just like writing language -- thanks to context and the brain, so debates about devising the "perfect" form of content markup are similarly moot.

Another question entirely is whether a text-centric app like Sigil is the right application to do something as layout-driven as a scientific work. I'd probably use Scribus for that; it already supports "render frames" for content in other formats (TeX included, as are gnuplot, LilyPond, and a few others).

Nate

Ebook editing with Sigil

Posted Sep 4, 2011 17:42 UTC (Sun) by mst@redhat.com (subscriber, #60682) [Link]

> Calibre can perform format conversions, although it does so automatically. In order to edit the result, you must convert it first and then open the result in Sigil. That is not too time-consuming, although it would help matters if Calibre's conversion function could be called from within Sigil at import-time — as it stands, Calibre's main focus is indexing one's ebook library, which makes converting a single file tedious.

Do you know about the ebook-convert tool which is packaged with calibre?
http://manual.calibre-ebook.com/cli/ebook-convert.html

Converting a single file is just a function of invoking that, no need for the GUI.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds