September 12, 2006
This article was contributed by John L. Clark
Introduction
The OpenDocument Format, developed under OASIS
(Organization for the Advancement of Structured Information Standards),
has been getting quite a bit of attention lately.
ODF is an Open Standard, and it serves as an important vehicle for the Free Software community and the information that community produces;
the Software Freedom Law Center recently
confirmed
that ODF is safe from patent claims from its OASIS Technical Committee
members. Version 1.0 of the format
was ratified in May of 2005 by this TC, and ODF
recently arrived at one of the last stages in its process towards
ISO/IEC adoption as ISO/IEC 26300.
The state of Massachusetts underwent a grueling and
well-scrutinized process last year in which
it decided to use ODF for its official documents; at least one vendor
strongly opposed this decision, but even this vendor has recently
announced work
on interoperability with ODF.
All this attention is well deserved, because ODF aims to structure the documents in which many users store their information:
"office" documents. The basic purpose of a format for office documents
is to encode the presentation of information. Most commonly, office
documents encode how to present page-based sequential documents in
print, spreadsheets in various media, and slides in interactive display
and various other media.
One alternative approach to authoring content focuses on the semantics
of the information; this approach requires more discipline but can
provide some advantages, particularly when it comes to reusing the
information.
In addition to ODF, OASIS also oversees the development of DocBook,
which takes this alternative approach. Several significant events in
DocBook development warrant some attention in that direction.
DocBook was originally developed as
an SGML application and has been modernized to simultaneously support
SGML and XML; it focuses on the semantics of software and hardware
documentation. DocBook also provides a clear and rich representation of
the semantics of general-purpose documentation, including detailed
structures for bibliographic information, glossaries, and a variety of
contextual devices such as footnotes. Many free software projects make
use of DocBook (or a variant), including
KDE,
GNOME, and
OpenDarwin.
Not surprisingly, The Linux Documentation Project
makes heavy use of DocBook.
What can you do to read a DocBook file if you (unexpectedly) receive
one? Perhaps the easiest approach is to use the DocBook XSL stylesheets
to format the file as HTML, then view it with your favorite web browser.
The xsltproc utility provides XML translation
functionality, and it is easy to install if your distribution does not
already provide it. Using xsltproc, you can translate a
DocBook file to HTML with the command:
    xsltproc http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl file.docbook > file.html
Other translation tools and
stylesheets exist, and perhaps the best solution is to use a native
reader or editor of DocBook, such as
Vex or
Conglomerate,
to view and interact with the file directly.
The DocBook language: present and future
The DocBook 4 development line currently produces the stable version
of DocBook: DocBook 4.4. The current "OASIS Standard" version of DocBook,
however, is DocBook 4.1, which is why you often see projects using DocBook
4.1.2—the latest bug-fix version of DocBook 4.1.
DocBook 4.5 is nearly completed, and has also been submitted for approval
as an OASIS Standard.
Release Candidate 3 (released in June) will likely become the newest
stable version; RC2 was itself almost accepted as an OASIS Standard until
a small bug in the specification forced the version bump.
As a matter of DocBook project policy, individual DocBook minor versions
within a major version are backwards compatible with previous minor
versions in the same major version. For example, all documents written
in DocBook 4.1.2 are valid DocBook 4.4 documents and all DocBook 4.4
documents will be valid DocBook 4.5 documents when that version is
available.
Successive minor versions of DocBook 4 have subtly extended its
expressiveness and have also added completely new elements, such as
user-requested markup for describing tasks.
A new major version of DocBook, version 5, is rapidly approaching.
DocBook 5 explicitly breaks backwards compatibility in order to move in
some new directions, which largely have to do with aspects of the
underlying technology. The naming and semantics of markup in DocBook 5, on
the other hand, strongly reflect DocBook 4. DocBook 5 makes a break from
its SGML roots, moving to aspects of XML technology that are not
represented in the SGML model.
The most prominent of the architectural
changes is that DocBook 5 now uses an
XML namespace
for its element set. This
namespace will be used by the stable version when it is released,
so users will not need to migrate to a different namespace once DocBook 5
stabilizes. The use of an XML namespace allows DocBook to more cleanly
take advantage of other XML dialects such as SVG and MathML; it also
allows other languages to more easily integrate DocBook, or subsets of
DocBook, in places where they want to express prose documentation.
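As a rough sketch of what the namespace means in practice, the Python snippet below parses a tiny DocBook 5 article with the standard library and collects the elements that belong to the DocBook namespace. The namespace URI http://docbook.org/ns/docbook is the one DocBook 5 uses; the sample document and the helper name local_names_in_docbook_ns are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by DocBook 5 elements.
DOCBOOK_NS = "http://docbook.org/ns/docbook"

# A small, hypothetical DocBook 5 document, for illustration only.
SAMPLE = """\
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Sample article</title>
  <para>Hello, DocBook 5.</para>
</article>
"""

def local_names_in_docbook_ns(xml_text):
    """Return the local names of all elements in the DocBook namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DOCBOOK_NS + "}"
    # ElementTree stores qualified names as "{namespace-uri}localname".
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]

names = local_names_in_docbook_ns(SAMPLE)
print(names)
```

Because every element name is qualified this way, a tool can tell DocBook markup apart from embedded SVG or MathML simply by looking at the namespace part of each name.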
Validation and new features
Document validation is an important tool for supporting document
interoperability. Through version 4, DocBook has primarily provided a
Document Type Definition (DTD) for assessing document validity.
DTDs are well supported and built into the core XML specification, but
they are not able to deal with XML Namespaces and they are not as
expressive as more modern tools.
For these and other reasons, DocBook 5 (like ODF) provides a
RELAX NG
schema as its basis for validation. RELAX NG is more context-aware,
which means that in several places certain DocBook constructs have been
simplified or merged, and a number of previously unenforceable constraints
are now enforced.
The DocBook 5 schema in RELAX NG is also highly modular,
which means that anyone interested in modifying the language can easily
pick and choose from small components to build their custom language. If
needed, users can also use less accurate, monolithic DTDs or W3C XML
Schemas that are generated from the RELAX NG schema. In addition to RELAX
NG, the DocBook 5 schema uses a set of optional Schematron assertions to help
validate those hard-to-reach places.
DocBook 5 also sports new and improved facilities for expressing
content. Instead of native hypertext markup, it uses XLink for hypertext references.
Interestingly, in DocBook 5 almost every element can serve as a hyperlink:
if xlink is bound to the XLink namespace, then simply set
xlink:href="target" on an element to have that element point
at the target. In XLink, these types of links are called Simple Links;
DocBook 5 also adds support for XLink Extended Links using the new,
imaginatively named extendedlink element.
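To make the simple-link idea concrete, here is a minimal Python sketch that finds every element carrying an xlink:href attribute in a small DocBook 5 fragment. The XLink namespace URI is the standard one; the sample document, the link targets, and the helper name simple_links are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace URI; the "xlink" prefix below is bound to it.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical DocBook 5 fragment: two ordinary inline elements act as links.
SAMPLE_WITH_LINKS = """\
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
  <title>Linking example</title>
  <para>See the <emphasis xlink:href="http://docbook.org/">DocBook site</emphasis>
  or run <command xlink:href="http://example.org/xsltproc">xsltproc</command>.</para>
</article>
"""

def simple_links(xml_text):
    """List (local element name, target) for each element with xlink:href."""
    root = ET.fromstring(xml_text)
    href = "{" + XLINK_NS + "}href"
    links = []
    for el in root.iter():
        if href in el.attrib:
            # Drop the "{namespace-uri}" part to get the local element name.
            local = el.tag.split("}", 1)[-1]
            links.append((local, el.attrib[href]))
    return links

links = simple_links(SAMPLE_WITH_LINKS)
for name, target in links:
    print(name, "->", target)
```

Note that neither emphasis nor command is a dedicated link element; the xlink:href attribute alone turns each into a link.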
DocBook 5 continues
to use XInclude to support transclusion. In addition to many fixes, the
removal of several obsolete components, and a number of small adjustments,
it also introduces elements designed to support new features, such as a
general mechanism for annotating content and a structure for noting the
correspondence between a term and its definition.
Practical considerations
DocBook 5 will likely have a stable release soon. Norman Walsh, the
main hacker, er, lead architect of DocBook 5, published his first
experiments with the new language in May of 2003 and the first official
beta of DocBook 5 was published in October of 2005. It is currently
at beta 7, and there will be several release candidates before the
Technical Committee applies the official DocBook 5.0 seal of approval.
Many of the tools for processing DocBook have gained DocBook 5 support
as DocBook 5 has developed. Many users take advantage of the (previously
mentioned) DocBook XSL stylesheets for
converting DocBook to other formats for publication, such as HTML and
XSL-FO (an intermediate step toward producing PDF). The stable version of
the DocBook XSL stylesheets is 1.70.1, and it includes support for DocBook
5.0; the next testing version of these stylesheets, version 1.71.0, was
released recently. Work has also begun on a rewrite of the DocBook XSL
stylesheets using XSLT 2; these are unsurprisingly called the DocBook XSL
2 stylesheets. Developers of some DocBook editors and other tools
have worked to integrate support for DocBook 5.
Jirka Kosek, card-carrying member of the DocBook illuminati, has
written and currently maintains
DocBook V5.0: The Transition Guide,
which covers the above DocBook 5
issues in more detail and which will be very useful to anyone interested
in migrating from DocBook 4 to DocBook 5.
DocBook offers authors a powerful level of expressiveness, and both the
stable version 4 and the new version 5 will soon reach important
milestones. DocBook 5 is a refactoring, intended to better integrate with
XML technologies and to be easier to use by authors and users who need to
customize the language itself.
It is written with the intention of avoiding major disruptions of
patterns of authoring that exist with DocBook 4.
New versions of both DocBook 4 and DocBook 5 continue to offer
enhancements that allow authors to better express their thoughts and
convey information.