August 31, 2011
This article was contributed by Nathan Willis
Although most software and hardware ebook readers can cope with standard
PDF documents, text-based formats like HTML and EPUB make life easier for
the author and reader due to their flexibility and editability. They can
re-flow text to fit varying screen sizes and orientations, for example, and
better cope with missing fonts. But historically ebook authors and editors
have not had a good open source option, which meant anyone wishing to pitch
in at Project Gutenberg or start an
original book had to stick with a generic text editor or word processor.
The creators of Sigil are
attempting to change that by building a free book editor tailored to the recurring tasks in preparing a text for ebook publication.
Sigil has been in development since late 2009, and the current release is version 0.4.1. Releases are made simultaneously for Linux, Mac OS X, and Windows, with 32-bit and 64-bit versions for Linux and Windows. John Schember assumed maintainership of the project in July 2011 after founder Strahinja Marković decided to step down.
Schember pushed out the 0.4 release that had been in beta-and-RC limbo since Marković finished graduate school and work began consuming his time. There is also a degree of overlap (including Schember himself) between the Sigil development community and the Calibre ebook-manager project, which is good, considering their shared and complementary concerns. The 0.4.1 release dropped on August 26. As Schember explained it on the development blog, in his numbering scheme an N.0 release is unstable and the N.1 release indicates that the application is now ready for public use.
Sigil uses Qt as its application framework and embeds WebKit to render
HTML book contents. However, the Linux downloads are InstallJammer installers rather
than standard RPM or Debian packages, and they bundle their own copy of the
necessary .so files, placing everything in /opt/sigil/.
Presumably this installation choice is the result of structuring the code
to produce cross-platform packages for those OSes that cannot simply fetch
the necessary dependencies through their package managers. As a result of
that choice, though, the installer weighs in at 20.5MB and you end up with some
duplicate libraries. Hopefully distributions will eventually start
packaging Sigil themselves to avoid that duplication.
The installer does create .desktop launcher files on the
Desktop and in the system's "Applications -> Office" menu, however. On the
downside, it also registers itself as the default file-helper for EPUB
downloads in Firefox, which may not be the desired behavior.
Getting around the ebook
Sigil's interface resembles a cross between a lightweight HTML editor and an ebook reader — the main content is in the center pane of the window, and you can toggle between rendered, source, and split views on the text, while navigation tools sit in side panes to either side.
On the left is a "book browser." The EPUB format consists of a ZIP file with nested folders containing XHTML text, images, CSS stylesheets, embedded fonts, and two metadata files. The .opf file includes publication metadata (author, copyright, etc.) and a manifest of the book's other content files, and the .ncx file contains an XML version of the table of contents (TOC). Sigil can edit any of the text-based files; double-clicking on one in the book browser opens it in a new tab. But you also use the book browser to add, remove, and manipulate extra media, such as images and fonts. On the right is a table-of-contents list, which automatically loads the TOC from an EPUB's .ncx file. You can jump between TOC entries by clicking on them in the list.
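The container layout described above can be inspected with a few lines of Python. What follows is a minimal sketch, not anything Sigil itself does: the function names summarize_epub and build_sample_epub, the role labels, and the sample file names (OEBPS/content.opf and so on) are all illustrative assumptions. It builds a tiny stand-in archive in memory and groups its members by the kinds of files the book browser exposes.

```python
import io
import posixpath
import zipfile
from collections import defaultdict

# Map file extensions to the roles described in the text. This grouping is
# an illustrative assumption, not an official EPUB classification.
ROLES = {
    ".xhtml": "text", ".html": "text", ".htm": "text",
    ".css": "stylesheet",
    ".opf": "package metadata and manifest",
    ".ncx": "table of contents",
    ".ttf": "font", ".otf": "font",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".gif": "image",
}


def summarize_epub(source):
    """Group the members of an EPUB (a path or file-like object) by role."""
    groups = defaultdict(list)
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            extension = posixpath.splitext(name)[1].lower()
            groups[ROLES.get(extension, "other")].append(name)
    return dict(groups)


def build_sample_epub():
    """Create a toy EPUB-like ZIP in memory (all file names are made up)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/toc.ncx", "<ncx/>")
        archive.writestr("OEBPS/Text/chapter1.xhtml", "<html/>")
        archive.writestr("OEBPS/Styles/main.css", "p { margin: 0; }")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    summary = summarize_epub(build_sample_epub())
    for role, names in sorted(summary.items()):
        print(f"{role}: {', '.join(names)}")
```

Passing the path of a real .epub file to summarize_epub instead of the in-memory sample should work the same way, since zipfile.ZipFile accepts either a path or a file-like object.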
Basic text editing is self-explanatory. Sigil provides a word-processor-like toolbar for cutting, pasting, and basic formatting, plus shortcuts to validate the HTML and insert chapter break markers. The edit menus allow you to jump to specific lines by number, and the find-and-replace function is beefy enough to accept standard wildcard characters and Perl-like regular expressions.
When it comes to formatting, however, the need for the source code and split views becomes more clear. Basic HTML tags are good enough for text formatting, but the structure of EPUB files depends on its own set of conventions. For example, I tested several Project Gutenberg EPUBs in Sigil, and noticed that it showed numerous erroneous chapter breaks. Investigating the source, I found that the .ncx file flagged every headline HTML tag as a chapter break.
According to the documentation, Sigil evidently expects
headline tags only to be used to mark structural divisions in the text, and
also interprets them as a nested tree: H2 tags are subdivisions of H1 tags,
and so on. But Project Gutenberg uses headline tags to mark other
elements, such as the author's byline on the title page, and occasionally to
mark up text itself, such as centered text or "inscriptions" that are
separated from the main copy. When brought through to Sigil, these tags are inserted into the TOC. If the text merely uses different levels of headline tag for different display styles, they get nested into a hierarchical TOC anyway.
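To see why stray headline tags end up in the TOC, here is a minimal sketch of the nesting rule described above. It is not Sigil's actual generator: HeadingCollector and build_toc are hypothetical names, and the sample markup only imitates the pattern of using an H2 for a byline and an H3 for an inscription.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element, in order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def build_toc(html):
    """Nest headings by level: each h2 goes under the preceding h1, etc."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()

    root = {"title": None, "children": []}
    stack = [(0, root)]  # (heading level, node); level 0 is the book itself
    for level, title in collector.headings:
        node = {"title": title, "children": []}
        # Pop back to the nearest ancestor with a strictly smaller level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


SAMPLE = (
    "<h1>A Sample Title</h1><h2>by A. N. Author</h2>"
    "<p>Front matter.</p>"
    "<h1>Chapter I</h1><h3>An inscription</h3><p>Body text.</p>"
)

if __name__ == "__main__":
    for entry in build_toc(SAMPLE):
        print(entry["title"], [child["title"] for child in entry["children"]])
```

In this sample the byline and the inscription both become TOC children, even though neither marks a structural division. Restyling them as ordinary paragraphs or DIVs with CSS classes would keep them out of the tree.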
Editing — and grappling — with EPUB
Correcting this sort of problem requires re-formatting the HTML, perhaps even munging about in the CSS — such as using <DIV> or <SPAN> tags to apply styles within the text. While paragraph indentation and text weight are simple enough for WYSIWYG editing, both of those more complex tasks are better done in a proper code editor. Fortunately Sigil's code view includes syntax highlighting, parenthesis matching, and cleanup courtesy of HTML Tidy.
Changes you make to the CSS files are instantly picked up in the
rendered HTML view, which is essential. What seems to be missing, however,
is an interface to work with the other structural elements, starting with
element ID attributes. Sigil's TOC generator creates these when it builds
the TOC and inserts them into the text. Element IDs
are the preferred method for linking to content within a page
(replacing older anchor tags), but Sigil numbers them itself, using
a mnemonically unfriendly scheme. It would be helpful to allow some control over this process, particularly for those working on very long texts.
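As an illustration of what friendlier IDs might look like, the sketch below derives a readable, unique ID from each heading's text. This is purely hypothetical and does not describe anything Sigil offers: make_id and the "sec" fallback prefix are invented for the example.

```python
import re


def make_id(title, used):
    """Return a readable ID for `title` that is not already in `used`.

    `used` is a set of IDs handed out so far; it is updated in place.
    """
    # Lowercase, then collapse every run of non-alphanumerics to one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # XML IDs may not start with a digit, so prefix those (and empty slugs).
    if not slug:
        slug = "sec"
    elif not slug[0].isalpha():
        slug = "sec-" + slug

    candidate = slug
    suffix = 2
    while candidate in used:
        candidate = f"{slug}-{suffix}"
        suffix += 1
    used.add(candidate)
    return candidate


if __name__ == "__main__":
    seen = set()
    for heading in ["Chapter I: The Storm", "Chapter I: The Storm", "1984"]:
        print(make_id(heading, seen))
```

Repeated headings get a numeric suffix, so links stay unambiguous while still hinting at where in a long text they point.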
Sigil even actively discourages you from directly editing the .ncx and .opf files in the editor, popping up a scary warning dialog that you must dismiss to continue. Perhaps that is wise, and without it too many users would foul up their TOC. But considering that Sigil can re-generate a new TOC with one button click, it seems like an unnecessary red flag.
It is also possible to break your text up into multiple XHTML files, and use multiple CSS files, but the Sigil editor offers little in the way of managing them. The convention is to use a separate XHTML file for each chapter in a long work, and while you can right-click in the book browser and create a new file, the only management options are "merge with previous" and an "Add Semantics" sub-menu where you can flag special pages such as introductions or glossaries.
Beyond the text itself, Sigil allows you to edit an EPUB's metadata (which is stored inside the book's .opf file). You do this with the "Tools -> Meta Editor" window, which provides a structured editor for key:value metadata pairs. The list of supported metadata properties is long; the editor breaks it into "basic" and "advanced" groupings. ISBN, Publisher, Author, Subject and so forth are "basic," while "advanced" contains several hundred possibilities, from "Actor" and "Adapter" all the way to "Woodcutter" and "Writer of accompanying material."
The tool is simple enough to use, particularly since each metadata
property comes with a brief explanation. Otherwise I might not have
guessed my way to the correct definitions of obscure properties like
"Respondent-appellee" or "Electrotyper." Scanning through the list, it is
clear that the EPUB community has far more than generic novels in mind for
its document format: everything from screenplays to academic dissertations
to legal filings is supported. In my experience, though, most ebooks take little advantage of the rich metadata options available, and free ebooks even less so than commercial ones.
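For a sense of what the Meta Editor is writing, the key:value pairs can be read back out of an .opf file with Python's standard XML parser. This is a minimal sketch under stated assumptions: the sample package document and the read_metadata name are invented, and the sample assumes the EPUB 2 convention of Dublin Core elements carrying an opf:role attribute with a short relator code (such as "aut" for an author).

```python
import xml.etree.ElementTree as ET

# Namespace URIs for Dublin Core elements and OPF attributes.
DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# A made-up, stripped-down package document for demonstration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>An Example Book</dc:title>
    <dc:creator opf:role="aut">A. N. Author</dc:creator>
    <dc:contributor opf:role="ill">I. Llustrator</dc:contributor>
    <dc:language>en</dc:language>
  </metadata>
  <manifest/>
  <spine/>
</package>
"""


def read_metadata(opf_text):
    """Return (key, value) pairs for every Dublin Core element found."""
    root = ET.fromstring(opf_text)
    prefix = "{" + DC + "}"
    pairs = []
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith(prefix):
            key = element.tag[len(prefix):]
            role = element.get("{" + OPF + "}role")
            if role:
                key = f"{key} ({role})"  # e.g. "creator (aut)"
            pairs.append((key, (element.text or "").strip()))
    return pairs


if __name__ == "__main__":
    for key, value in read_metadata(SAMPLE_OPF):
        print(f"{key}: {value}")
```

Running the same function over the .opf files from a handful of ebooks is a quick way to check how sparsely the available properties tend to be used.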
You can also add images and fonts to the ebook by right-clicking in the
book browser, although doing so only adds the files to the browser. You
still must manually insert images into the text editor, and reference the
fonts in the CSS in order to take advantage of them. Whenever your editing
session is complete, Sigil's "Validate Epub" tool will look for errors
using a side-project validator called FlightCrew (although which
errors it looks for are not documented), and "File -> Save As" will generate an EPUB output file.
Old books and new books
At the moment EPUB is the only output format supported, although Sigil can also import HTML text and PDFs. The use cases I looked at first were editing existing ebooks — such as fixing formatting problems in Project Gutenberg books. Ideally, Sigil will someday be useful for other tasks, such as converting an HTML or MOBI-formatted work into a proper EPUB, or writing a new ebook from scratch.
Calibre can perform format conversions, although it does so automatically. In order to edit the result, you must convert it first and then open the result in Sigil. That is not too time-consuming, although it would help matters if Calibre's conversion function could be called from within Sigil at import-time — as it stands, Calibre's main focus is indexing one's ebook library, which makes converting a single file tedious.
More importantly, Sigil allows you to create a "new" ebook project, and will automatically generate empty .ncx and .opf skeleton files. But here the book-management shortcomings mentioned earlier come right to the forefront. Working on any kind of multi-file document is awkward, as is changing CSS formatting to perfect the layout and look. Hopefully future revisions of the application will build more tools for authors, taking care of some of the predictable but still manual-only tasks. For example, if you add a font to an EPUB package, it would be nice to be able to highlight text in the editor and choose that font from the Format menu, or even to set it as the default for the entire book.
Colophon
Perhaps dedicated EPUB editors are not the wave of the future, and in a few years managing an ebook text will be a simple task in normal word processors, or simply a "print-to-file" style option in every application. For now, however, if you want e-reader output, you need an application that understands the formats explicitly. In that regard, Sigil is vastly superior to using an HTML editor, or generating PDFs and hoping for the best.
Now that a new maintainer is driving development, the pace should pick up, and with it progress on the weak points mentioned above. There are other nitpicks in 0.4.1, such as the number of windows that require re-sizing in order for the user to read their contents, and the fact that opening a file from the File menu closes the current editing session (although it warns about this first with a dialog box). I was greatly relieved to find that Sigil uses standard system icons and widgets throughout its UI, unlike its cousin Calibre, which tends towards the garish.
I spent some time reading about the EPUB format itself, and although it is free, a major new revision is due shortly (for some value of "shortly"). The Sigil team is following the specification process, which is good, although there does not seem to be much interest in supporting other ebook formats, such as FictionBook, Plucker, or the Kindle format.
No doubt EPUB has plenty of life left in it, but as electronic publishing consumes more and more of the publishing industry, other formats are going to become just as important (if not more so). EPUB is not well suited for documents that rely on controlled layout, such as magazines or textbooks, nor does it handle math. Closed and proprietary formats are going to make a play for those documents; with luck Sigil — and other open source tools — will be ready. If you need to edit an EPUB today, Sigil is just what you want. If you are writing from scratch, however, you might find it easier to crank out your tome in another editor, and turn to Sigil for the final formatting.