Although most software and hardware ebook readers can cope with standard
PDF documents, text-based formats like HTML and EPUB make life easier for
the author and reader due to their flexibility and editability. They can
re-flow text to fit varying screen sizes and orientations, for example, and
better cope with missing fonts. But historically ebook authors and editors
have not had a good open source option, which meant anyone wishing to pitch
in at Project Gutenberg or start an
original book had to stick with a generic text editor or word processor.
The creators of Sigil are
attempting to change that by building a free book editor tailored to the recurring tasks in preparing a text for ebook publication.
Sigil has been in development since late 2009, and the current release is version 0.4.1. Releases are made simultaneously for Linux, Mac OS X, and Windows, with 32-bit and 64-bit versions for Linux and Windows. John Schember assumed maintainership of the project in July 2011 after founder Strahinja Marković decided to step down.
Schember pushed out the 0.4 release that had been in beta-and-RC limbo since Marković finished graduate school and work began consuming his time. There is also a degree of overlap (including Schember himself) between the Sigil development community and the Calibre ebook-manager project, which is good considering their shared and complementary concerns. The 0.4.1 release dropped on August 26. As Schember explained it on the development blog, in his numbering scheme an N.0 release is unstable and the N.1 release indicates that the application is now ready for public use.
Sigil uses Qt as its application framework and embeds WebKit to render
HTML book contents. However, the Linux downloads are InstallJammer installers rather
than standard RPM or Debian packages, and they bundle their own copy of the
necessary .so files, placing everything in /opt/sigil/.
Presumably this installation choice is the result of structuring the code
to produce cross-platform packages for those OSes that cannot simply fetch
the necessary dependencies through their package managers. As a result of
that choice, though, the installer weighs in at 20.5MB and you end up with some
duplicate libraries. Hopefully distributions will eventually start
packaging Sigil themselves to avoid that duplication.
The installer does create .desktop launcher files on the
Desktop and in the system's "Applications -> Office" menu, however. On the
down side, it also registers itself as the default file-helper for EPUB
downloads in Firefox, which may not be the desired behavior.
Getting around the ebook
Sigil's interface resembles a cross between a lightweight HTML editor and an ebook reader — the main content is in the center pane of the window, and you can toggle between rendered, source, and split views on the text, while navigation tools sit in side panes to either side.
On the left is a "book browser." The EPUB format consists of a ZIP file with nested folders containing XHTML text, images, CSS stylesheets, embedded fonts, and two metadata files. The .opf file includes publication metadata (author, copyright, etc.) and a manifest of the book's other content files, and the .ncx file contains an XML version of the table of contents (TOC). Sigil can edit any of the text-based files; double-clicking on one in the book browser opens it in a new tab. But you also use the book browser to add, remove, and manipulate extra media, such as images and fonts. On the right is a table-of-contents list, which automatically loads the TOC from an EPUB's .ncx file. You can jump between TOC entries by clicking on them in the list.
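To make the container structure concrete, here is a minimal Python sketch that opens an EPUB as an ordinary ZIP archive and groups its members by extension, roughly what the book browser presents. The file layout built by build_sample_epub() is a hypothetical example, not taken from any particular book, and both function names are invented for illustration.

```python
import io
import zipfile
from collections import defaultdict


def summarize_epub(source):
    """Group the member names of an EPUB (a ZIP archive) by file extension."""
    groups = defaultdict(list)
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            groups[ext].append(name)
    return dict(groups)


def build_sample_epub():
    """Create a tiny in-memory archive with a hypothetical EPUB-like layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/toc.ncx", "<ncx/>")
        archive.writestr("OEBPS/Text/chapter1.xhtml", "<html/>")
        archive.writestr("OEBPS/Styles/style.css", "p { margin: 0; }")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for ext, names in sorted(summarize_epub(build_sample_epub()).items()):
        print(ext or "(none)", names)
```

Passing a path to a real .epub file instead of the in-memory buffer should work the same way, since zipfile.ZipFile accepts either.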
Basic text editing is self-explanatory. Sigil provides a word-processor-like toolbar for cutting, pasting, and basic formatting, plus shortcuts to validate the HTML and insert chapter break markers. The edit menus allow you to jump to specific lines by number, and the find-and-replace function is beefy enough to accept standard wildcard characters and Perl-like regular expressions.
When it comes to formatting, however, the need for the source code and split views becomes more clear. Basic HTML tags are good enough for text formatting, but the structure of EPUB files depends on its own set of conventions. For example, I tested several Project Gutenberg EPUBs in Sigil, and noticed that it showed numerous erroneous chapter breaks. Investigating the source, I found that the .ncx file flagged every headline HTML tag as a chapter break.
According to the documentation, Sigil evidently expects
headline tags only to be used to mark structural divisions in the text, and
also interprets them as a nested tree: H2 tags are subdivisions of H1 tags,
and so on. But Project Gutenberg uses headline tags to mark other
elements, such as the author's byline on the title page, and occasionally to
mark up text itself, such as centered text or "inscriptions" that are
separated from the main copy. When brought through to Sigil, these tags are inserted into the TOC. If the text merely uses different levels of headline tag for different display styles, they get nested into a hierarchical TOC anyway.
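The effect is easy to reproduce outside of Sigil. The following Python sketch is not Sigil's actual TOC generator; it only assumes the behavior described above, namely that every headline tag becomes a TOC entry nested by its level. The sample markup and the names HeadingCollector and toc_outline are invented for illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1..h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def toc_outline(html):
    """Return one indented line per heading, two spaces per nesting level."""
    parser = HeadingCollector()
    parser.feed(html)
    return ["  " * (level - 1) + text for level, text in parser.headings]


# Hypothetical title page: the byline and an inscription use headline tags.
SAMPLE = """
<h1>A Sample Book</h1>
<h2>by A. N. Author</h2>
<h1>Chapter I</h1>
<p>Body text.</p>
<h3>An inscription</h3>
"""

if __name__ == "__main__":
    print("\n".join(toc_outline(SAMPLE)))
```

With this input, the byline and the inscription both show up in the outline as if they were sections of the book, which is the kind of spurious entry seen in the Project Gutenberg files.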
Editing — and grappling — with EPUB
Correcting this sort of problem requires re-formatting the HTML, perhaps even munging about in the CSS — such as using <DIV> or <SPAN> tags to apply styles within the text. While paragraph indentation and text weight are simple enough for WYSIWYG editing, both of those more complex tasks are better done in a proper code editor. Fortunately Sigil's code view includes syntax highlighting, parenthesis matching, and cleanup courtesy of HTML Tidy.
Changes you make to the CSS files are instantly picked up in the
rendered HTML view, which is essential. What seems to be missing, however,
is an interface to work with the other structural elements, starting with
element ID attributes. Sigil's TOC generator creates these automatically
when creating the TOC and inserts them into the text. Element IDs
are the preferred method for linking to content within a page
(replacing older anchor tags), but Sigil assigns them automatically, using
a mnemonically unfriendly number scheme. It would be helpful to allow some control over this process, particularly for those working on very long texts.
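As an illustration of what friendlier IDs could look like, the Python sketch below derives an ID from each heading's text and adds a numeric suffix to duplicates. This is a hypothetical scheme, not something Sigil offers; slugify and assign_ids are names made up for the example.

```python
import re


def slugify(text):
    """Turn heading text into a lowercase, hyphen-separated ID candidate."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    if not slug:
        return "section"
    if slug[0].isdigit():
        slug = "id-" + slug  # XML IDs may not begin with a digit
    return slug


def assign_ids(headings):
    """Return one ID per heading; repeated headings get -2, -3, ... suffixes.

    Note: a suffixed ID could still collide with a heading whose own text
    slugifies to the same string (for example "Notes 2"); a real tool would
    need to check for that.
    """
    seen = {}
    ids = []
    for text in headings:
        base = slugify(text)
        count = seen.get(base, 0) + 1
        seen[base] = count
        ids.append(base if count == 1 else f"{base}-{count}")
    return ids


if __name__ == "__main__":
    print(assign_ids(["Chapter I", "Notes", "Notes", "1984"]))
```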
Sigil even actively discourages you from directly editing the .ncx and .opf files in the editor, popping up a scary warning dialog that you must dismiss to continue. Perhaps that is wise, and without it too many users would foul up their TOC. But considering that Sigil can re-generate a new TOC with one button click, it seems like an unnecessary red flag.
It is also possible to break your text up into multiple XHTML files, and use multiple CSS files, but the Sigil editor offers little in the way of managing them. The convention is to use a separate XHTML file for each chapter in a long work, and while you can right-click in the book browser and create a new file, the only management options are "merge with previous" and an "Add Semantics" sub-menu where you can check special pages such as introductions or glossaries.
Beyond the text itself, Sigil allows you to edit an EPUB's metadata (which is stored inside the book's .opf file). You do this with the "Tools -> Meta Editor" window, which provides a structured editor for key:value metadata pairs. The list of supported metadata properties is long; the editor breaks it into "basic" and "advanced" groupings. ISBN, Publisher, Author, Subject and so forth are "basic," while "advanced" contains several hundred possibilities, from "Actor" and "Adapter" all the way to "Woodcutter" and "Writer of accompanying material."
The tool is simple enough to use, particularly since each metadata
property comes with a brief explanation. Otherwise I might not have
guessed my way to the correct definitions of obscure properties like
"Respondent-appellee" or "Electrotyper." Scanning through the list, it is
clear that the EPUB community has far more than generic novels in mind for
its document format: everything from screenplays to academic dissertations
to legal filings is supported. In my experience, though, most ebooks take little advantage of the rich metadata options available, and free ebooks even less so than commercial ones.
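For readers curious about what those key:value pairs look like on disk, this Python sketch pulls the Dublin Core entries out of an .opf document. It assumes the OPF 2.0 convention of dc: elements with an optional opf:role attribute; the sample document, its values, and the read_metadata name are invented for the example and do not come from Sigil.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical .opf content; titles, names, and role codes are illustrative.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>A Sample Book</dc:title>
    <dc:creator opf:role="aut">A. N. Author</dc:creator>
    <dc:contributor opf:role="ill">I. L. Lustrator</dc:contributor>
    <dc:language>en</dc:language>
  </metadata>
</package>
"""


def read_metadata(opf_text):
    """Return (property, value, role) tuples for each Dublin Core element."""
    root = ET.fromstring(opf_text)
    pairs = []
    for element in root.iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            key = element.tag.split("}", 1)[1]  # strip the namespace prefix
            role = element.get("{" + OPF_NS + "}role")  # None when absent
            pairs.append((key, (element.text or "").strip(), role))
    return pairs


if __name__ == "__main__":
    for key, value, role in read_metadata(SAMPLE_OPF):
        print(key, value, role or "")
```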
You can also add images and fonts to the ebook by right-clicking in the
book browser, although doing so only adds the files to the browser. You
still must manually insert images into the text editor, and reference the
fonts in the CSS in order to take advantage of them. Whenever your editing
session is complete, Sigil's "Validate Epub" tool will look for errors
using a side-project validator called FlightCrew (although which
errors it looks for are not documented), and "File -> Save As" will generate an EPUB output file.
Old books and new books
At the moment EPUB is the only output format supported, although Sigil can also import HTML text and PDFs. The use cases I looked at first were editing existing ebooks — such as fixing formatting problems in Project Gutenberg books. Ideally, Sigil will someday be useful for other tasks, such as converting an HTML or MOBI-formatted work into a proper EPUB, or writing a new ebook from scratch.
Calibre can perform format conversions, although it does so automatically. In order to edit the result, you must convert it first and then open the result in Sigil. That is not too time-consuming, although it would help matters if Calibre's conversion function could be called from within Sigil at import-time — as it stands, Calibre's main focus is indexing one's ebook library, which makes converting a single file tedious.
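One possible workaround is to script the conversion step. Calibre ships a command-line converter, ebook-convert, which takes an input file and an output file and infers the formats from the extensions. The Python sketch below wraps that call; the helper names and the choice of output directory are assumptions made for this example, and no conversion options are passed.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_command(source, target_dir="."):
    """Build the ebook-convert command line for turning `source` into EPUB."""
    source = Path(source)
    target = Path(target_dir) / (source.stem + ".epub")
    return ["ebook-convert", str(source), str(target)]


def convert_to_epub(source, target_dir="."):
    """Run the conversion and return the path of the EPUB that was requested."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    command = conversion_command(source, target_dir)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command[-1]


if __name__ == "__main__":
    # Only print the command here; call convert_to_epub() to actually run it.
    print(" ".join(conversion_command("novel.mobi", "converted")))
```

The resulting file can then be opened in Sigil for editing.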
More importantly, Sigil allows you to create a "new" ebook project, and will automatically generate empty .ncx and .opf skeleton files. But here the book-management shortcomings mentioned earlier come right to the forefront. Working on any kind of multi-file document is awkward, as is changing CSS formatting to perfect the layout and look. Hopefully future revisions of the application will build more tools for authors, taking care of some of the predictable but still manual-only tasks. For example, if you add a font to an EPUB package, it would be nice to be able to highlight text in the editor and choose that font from the Format menu, or even to set it as the default for the entire book.
Perhaps dedicated EPUB editors are not the wave of the future, and in a few years managing an ebook text will be a simple task in normal word processors, or simply a "print-to-file" style option in every application. For now, however, if you want e-reader output, you need an application that understands the formats explicitly. In that regard, Sigil is vastly superior to using an HTML editor, or generating PDFs and hoping for the best.
Now that a new maintainer is driving development, the pace of progress should pick up, and with luck that will include work on the weak points mentioned above. There are other nitpicks in 0.4.1, such as the number of windows that require re-sizing in order for the user to read their contents, and the fact that opening a file from the File menu closes the current editing session (although it warns about this first with a dialog box). I was greatly relieved to find that Sigil uses standard system icons and widgets throughout its UI, unlike its cousin Calibre, which tends towards the garish.
I spent some time reading about the EPUB format itself, and although it is free, a major new revision is due shortly (for some value of "shortly"). The Sigil team is following the specification process, which is good, although there does not seem to be much interest in supporting other ebook formats, such as FictionBook, Plucker, or the Kindle format.
No doubt EPUB has plenty of life left in it, but as electronic publishing consumes more and more of the publishing industry, other formats are going to become just as important (if not more so). EPUB is not well suited for documents that rely on controlled layout, such as magazines or textbooks, nor does it handle math. Closed and proprietary formats are going to make a play for those documents; with luck Sigil — and other open source tools — will be ready. If you need to edit an EPUB today, Sigil is just what you want. If you are writing from scratch, however, you might find it easier to crank out your tome in another editor, and turn to Sigil for the final formatting.
In hindsight however, I think the complexity of Swig has exceeded
anyone's ability to fully understand it (including my own). For
example, to even make sense of what's happening, you have to have a
pretty solid grasp of the C/C++ type system (easier said than
done). Couple that with all sorts of crazy pattern matching,
low-level code fragments, and a ton of macro definitions, your head
will literally explode if you try to figure out what's happening.
So far as I know, recent versions of Swig have even combined all of
this type-pattern matching with regular expressions. I can't even
-- David Beazley
But if you want to be taken seriously as a researcher, you should
publish your code! Without publication of your *code* research in
your area cannot be reproduced by others, so it is not science.
-- Guido van Rossum
It's downright absurd for there to be a known and understood
crasher bug, affecting all users, in such a critical component for
so long without any acknowledgment or response by upstream or the
Fedora maintainers. This and the Flash audio corruption mess make
it fairly clear that glibc maintenance is not what it should be for
such a crucial package. Given that, the only sensible approach
seems to be to go ahead and Just Fix It.
-- Adam Williamson
Version 2.4.0 of the bzr version control system is out. "This is a bugfix and polish release over the 2.3 series, with a large number
of bugs fixed (>150 for the 2.4 series alone), and some performance
improvements. Support for python 2.4 and 2.5 has been dropped, many large
working tree operations have been optimized as well as some stacked branches
A new version of GNOME Shell is available. "While there are many
substantial features in this release, it's particular worth pointing out
the changes contributed by our Summer of Code Students: Nohemi Fernandez's
onscreen keyboard, Morten Mjelva's contact search, and Neha Doijode's work
on getting cover art and other images to display in notifications.
The sparse C parser has long been used to run certain types of static analysis checks
on the kernel source. It has been a slow-moving project for some time.
Now, however, Pekka Enberg and Jeff Garzik have announced a project to
couple sparse and the LLVM backend to produce a working C compiler. The
eventual goal is to compile the kernel; for now, they seem to be reasonably
happy with a working "hello world" example.
The Opa language project
has announced its
. "Opa is a new member in the family of languages
aiming to make web programming transparent by automatically generating
is written in OCaml. A hierarchical database and web server are integrated
with the language. The distribution model is based on a notion of a
session, a construct roughly comparable to process definitions in the
join-calculus or to concurrent objects in a number of formalisms.
for lots of information about Opa.
Newsletters and articles
Over at opensource.com, Red Hat's Mike McLean looks at the history of the Koji build system, starting from when it was an internal tool through the freeing of the code (at the Fedora Core 6 to Fedora 7 transition) to the present. "Of course, this newly unified Fedora would need a build system and it quickly became apparent that Koji was the right tool for the job. While Fedora Extras was already using Plague, it did not satisfy the requirements for building the entire distribution. So, after much discussion, Red Hat released Koji under an open source license. Koji became a key part of Fedora's new end-to-end, free and open infrastructure.
Michael Reed looks
at GIMP 2.7.3 which now has Single Window Mode. "GIMP 2.7.3 has added one of the most
requested features in the program's history: a single window mode. Version
2.7 is part of the development branch, so unfortunately, the feature wont
hit most distro repositories for a while. If you want to have a sneak peek
at the new development features, you'll probably have to compile from
NetworkManager hacker Dan Williams has an overview of the new features in NetworkManager 0.9
on his blog. Among them: "When connected to a large unified WiFi network, like a workplace, university, or hotel, NetworkManager 0.9 enhances roaming behavior as you move between locations. By using the background scanning and nl80211 features in wpa_supplicant 0.7 and later, you'll notice fewer drops in connectivity and better signal quality in large networks. Most kernel drivers will now provide automatic updates of new access points and enhanced connection quality reporting, allowing wpa_supplicant to quickly roam to the best access point when the current access point's quality degrades and not before.
Page editor: Jonathan Corbet