A survey of the DocBook landscape
Introduction
The OpenDocument Format, developed under OASIS (Organization for the Advancement of Structured Information Standards), has been getting quite a bit of attention lately. ODF is an Open Standard and it serves as an important vehicle for the Free Software community and this community's information; the Software Freedom Law Center recently confirmed that ODF is safe from patent claims from its OASIS Technical Committee members. Version 1.0 of the format was ratified in May of 2005 by this TC, and ODF recently arrived at one of the last stages in its process towards ISO/IEC adoption as ISO/IEC 26300. The state of Massachusetts underwent a grueling and well-scrutinized process last year in which it decided to use ODF for its official documents; at least one vendor strongly opposed this decision, but even this vendor has recently announced work on interoperability with ODF.
All this attention is well deserved, for ODF intends to provide the structure for many of the documents that store many users' information: "office" documents. The basic purpose of a format for office documents is to encode the presentation of information. Most commonly, office documents encode how to present page-based sequential documents in print, spreadsheets in various media, and slides in interactive display and various other media. One alternative approach to authoring content focuses on the semantics of the information; this approach requires more discipline but can provide some advantages, particularly when it comes to reusing the information. In addition to ODF, OASIS also oversees the development of DocBook, which takes this alternative approach. Several significant events in DocBook development warrant some attention in that direction.
DocBook was originally developed as an SGML application and has been modernized to simultaneously support SGML and XML; it focuses on the semantics of software and hardware documentation. DocBook also provides a clear and rich representation of the semantics of general-purpose documentation, including detailed structures for bibliographic information, glossaries, and a variety of contextual devices such as footnotes. Many free software projects make use of DocBook (or a variant), including KDE, GNOME, and OpenDarwin. Not surprisingly, The Linux Documentation Project makes heavy use of DocBook.
What can you do to read a DocBook file if you (unexpectedly) receive one? Perhaps the easiest approach is to use the DocBook XSL stylesheets to format the file as HTML, then view it with your favorite web browser. The xsltproc utility provides XML translation functionality, and it is easy to install if your distribution does not already provide it. Using xsltproc, you can translate a DocBook file to HTML with the command:
    xsltproc http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl file.docbook > file.html
Other translation tools and stylesheets exist, and perhaps the best solution is to use a native reader or editor of DocBook, such as Vex or Conglomerate, to view and interact with the file directly.
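If you need to convert more than one file, the same xsltproc invocation can be wrapped in a small script. The following is only a sketch under a few assumptions: the function names are invented for illustration, xsltproc must be on your PATH, and the stylesheet URL (taken from the command above) must be reachable or resolved through a local XML catalog.

```python
import shutil
import subprocess

# Stylesheet URL as given in the article's xsltproc command.
HTML_STYLESHEET = "http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"


def build_xsltproc_command(source, stylesheet=HTML_STYLESHEET):
    """Return the argument list for: xsltproc <stylesheet> <source>.

    Hypothetical helper; it only assembles the command, it does not run it.
    """
    return ["xsltproc", stylesheet, source]


def docbook_to_html(source, target):
    """Run xsltproc on `source` and write its standard output to `target`.

    Returns False without running anything if xsltproc is not installed.
    """
    if shutil.which("xsltproc") is None:
        return False
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if xsltproc reports a failure.
        subprocess.run(build_xsltproc_command(source), stdout=out, check=True)
    return True


# Show the command that would be executed for a file named file.docbook.
print(" ".join(build_xsltproc_command("file.docbook")))
```

Calling docbook_to_html("file.docbook", "file.html") would then be equivalent to the shell redirection shown earlier.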
The DocBook language: present and future
The DocBook 4 development line currently produces the stable version of DocBook: DocBook 4.4. The current "OASIS Standard" version of DocBook, however, is DocBook 4.1, which is why you often see projects using DocBook 4.1.2—the latest bug-fix version of DocBook 4.1. DocBook 4.5 is nearly completed, and has also been submitted for approval as an OASIS Standard. Release Candidate 3 (released in June) will likely become the newest stable version; RC2 was itself almost accepted as an OASIS Standard until a small bug in the specification forced the version bump.
As a matter of DocBook project policy, individual DocBook minor versions within a major version are backwards compatible with previous minor versions in the same major version. For example, all documents written in DocBook 4.1.2 are valid DocBook 4.4 documents and all DocBook 4.4 documents will be valid DocBook 4.5 documents when that version is available. These minor versions of DocBook 4 have subtly added to its expressiveness in addition to adding completely new elements, such as user-requested markup for describing tasks.
A new major version of DocBook, version 5, is rapidly approaching. DocBook 5 explicitly breaks backwards compatibility in order to move in some new directions, which largely have to do with aspects of the underlying technology. The naming and semantics of markup in DocBook 5, on the other hand, strongly reflect DocBook 4. DocBook 5 makes a break from its SGML roots, moving to aspects of XML technology that are not represented in the SGML model.
The most prominent of the architectural changes is that DocBook 5 now uses an XML namespace for its element set. This namespace will be used by the stable version when it is released so users will not need to migrate to a different namespace once DocBook 5 stabilizes. The use of an XML namespace allows DocBook to more cleanly take advantage of other XML dialects such as SVG and MathML; it also allows other languages to more easily integrate DocBook, or subsets of DocBook, in places where they want to express prose documentation.
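To see what the namespace buys you, consider a small DocBook 5 document that embeds a bit of SVG. The sketch below uses Python's standard xml.etree.ElementTree module; the sample document is hypothetical and exists only to show that a namespace-aware tool can tell a DocBook title from an SVG title even though both share the same local name.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample: a DocBook 5 article with an inline SVG image.
SAMPLE = """\
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Namespace demo</title>
  <para>A paragraph in the DocBook namespace.</para>
  <mediaobject>
    <imageobject>
      <imagedata>
        <svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
          <title>An SVG title, not a DocBook one</title>
        </svg>
      </imagedata>
    </imageobject>
  </mediaobject>
</article>
"""

root = ET.fromstring(SAMPLE)
prefixes = {"db": DOCBOOK_NS, "svg": SVG_NS}

# The same local name, title, is selected separately per namespace.
docbook_titles = [t.text for t in root.iterfind(".//db:title", prefixes)]
svg_titles = [t.text for t in root.iterfind(".//svg:title", prefixes)]

print(root.tag)
print(docbook_titles)
print(svg_titles)
```

Without namespaces, a processor would have no reliable way to decide which vocabulary a given title element belongs to.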
Validation and new features
Document validation is an important tool for supporting document interoperability. Through version 4, DocBook has primarily provided a Document Type Definition (DTD) for assessing document validity. DTDs are well supported and built into the core XML specification, but they are not able to deal with XML Namespaces and they are not as expressive as more modern tools. For these and other reasons, DocBook 5 (like ODF) provides a RELAX NG schema as its basis for validation. RELAX NG is more context-aware, which means that in several places certain DocBook constructs have been simplified or merged, and a number of previously unenforceable constraints are now enforced.
The DocBook 5 schema in RELAX NG is also highly modular, which means that anyone interested in modifying the language can easily pick and choose from small components to build their custom language. If needed, users can also use less accurate, monolithic DTDs or W3C XML Schemas that are generated from the RELAX NG schema. In addition to RELAX NG, the DocBook 5 schema uses a set of optional Schematron assertions to help validate those hard-to-reach places.
DocBook 5 also sports new and improved facilities for expressing content. Instead of native hypertext markup, it uses XLink for hypertext references. Interestingly, in DocBook 5 almost every element can serve as a hyperlink: if xlink is bound to the XLink namespace, then simply set xlink:href="target" on an element to have that element point at the target. In XLink, these types of links are called Simple Links; DocBook 5 also adds support for XLink Extended Links using the new, imaginatively named extendedlink element.
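As an illustration of a Simple Link, the sketch below builds a DocBook 5 paragraph in which a command element doubles as a hyperlink. It uses Python's standard xml.etree.ElementTree module; the link target is a made-up example.org address, and the choice of the command element is arbitrary.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Ask the serializer to use the default namespace for DocBook
# and the conventional xlink prefix for XLink attributes.
ET.register_namespace("", DOCBOOK_NS)
ET.register_namespace("xlink", XLINK_NS)

para = ET.Element(f"{{{DOCBOOK_NS}}}para")
para.text = "Processed with "

# An ordinary inline element becomes a link once xlink:href is set on it.
cmd = ET.SubElement(para, f"{{{DOCBOOK_NS}}}command")
cmd.text = "xsltproc"
cmd.set(f"{{{XLINK_NS}}}href", "http://example.org/xsltproc")  # hypothetical target
cmd.tail = "."

serialized = ET.tostring(para, encoding="unicode")
print(serialized)
```

No dedicated link wrapper is needed; whether and how the link is rendered is up to the stylesheets or tool that processes the document.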
DocBook 5 continues to use XInclude to support transclusion. In addition to many fixes, the removal of several obsolete components, and a number of small adjustments, it also introduces elements designed to support new features, such as a general mechanism for annotating content and a structure for noting the correspondence between a term and its definition.
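Transclusion with XInclude can be tried out with nothing more than Python's standard library. The following sketch keeps a hypothetical chapter in memory rather than on disk and supplies a custom loader (the names FILES and memory_loader are invented here) so that the xi:include element in the book is replaced by the chapter it points to.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Hypothetical book that pulls in one chapter by reference.
BOOK = """\
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
  <title>Assembled book</title>
  <xi:include href="chapter1.xml"/>
</book>
"""

# Stand-in for files on disk, so the example is self-contained.
FILES = {
    "chapter1.xml": """\
<chapter xmlns="http://docbook.org/ns/docbook">
  <title>First chapter</title>
  <para>Included content.</para>
</chapter>
""",
}


def memory_loader(href, parse, encoding=None):
    """Resolve xi:include targets from FILES instead of the filesystem."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


book = ET.fromstring(BOOK)
# Replaces each xi:include element in place with the loaded content.
ElementInclude.include(book, loader=memory_loader)

chapter_titles = [
    t.text for t in book.iterfind("db:chapter/db:title", {"db": DOCBOOK_NS})
]
print(chapter_titles)
```

In real use the default loader reads the referenced files from disk, and XInclude processing is normally done before the assembled document is validated or transformed.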
Practical considerations
DocBook 5 will likely have a stable release soon. Norman Walsh, the main hacker, er, lead architect of DocBook 5, published his first experiments with the new language in May of 2003 and the first official beta of DocBook 5 was published in October of 2005. It is currently at beta 7, and there will be several release candidates before the Technical Committee applies the official DocBook 5.0 seal of approval.
Many of the tools for processing DocBook have gained DocBook 5 support as DocBook 5 has developed. Many users take advantage of the (previously mentioned) DocBook XSL stylesheets for converting DocBook to other formats for publication, such as HTML and XSL-FO (an intermediate step toward producing PDF). The stable version of the DocBook XSL stylesheets is 1.70.1, and it includes support for DocBook 5.0; the next testing version of these stylesheets, version 1.71.0, was released recently. Work has also begun on a rewrite of the DocBook XSL stylesheets using XSLT 2; these are unsurprisingly called the DocBook XSL 2 stylesheets. Developers of some DocBook editors and other tools have worked to integrate support for DocBook 5.
Jirka Kosek, card-carrying member of the DocBook illuminati, has written and currently maintains DocBook V5.0: The Transition Guide, which covers the above DocBook 5 issues in more detail and which will be very useful to anyone interested in migrating from DocBook 4 to DocBook 5.
DocBook offers authors a powerful level of expressiveness, and both the stable version 4 and the new version 5 will soon reach important milestones. DocBook 5 is a refactoring, intended to better integrate with XML technologies and to be easier to use by authors and users who need to customize the language itself. It is written with the intention of avoiding major disruptions of patterns of authoring that exist with DocBook 4. New versions of both DocBook 4 and DocBook 5 continue to offer enhancements that allow authors to better express their thoughts and convey information.
Index entries for this article | |
---|---|
GuestArticles | Clark, John L. |
So far, so missing i18n
Posted Sep 17, 2006 20:14 UTC (Sun) by kreutzm (guest, #4700)
I really like docbook, but what is severly missing is i18n. I wrote several man pages in german, but the transformation to nroff insists on using english terms and conventions. Writing to the upstream authors or opening bugs (with patches!) in the bug tracker of my distribution did not cause any response. I hope future versions of docbook deal with this deficiency.

Missing where?
Posted Sep 18, 2006 17:43 UTC (Mon) by JLCdjinn (guest, #1905)
There are issues here at a variety of levels that usually include DocBook (the language) as well as tools that process DocBook. First, you (in the generic sense) may want to write the same text in several different languages in the same document. This is called profiling, and DocBook has facilities for profiling such as the lang and os attributes. DocBook XSL: The Complete Guide has a chapter called "Profiling (conditional text)" that talks about these issues both in DocBook as well as support for these features in the DocBook XSL stylesheets.
It sounds like you may also be interested in i18n (internationalization) support with respect to automatically generated portions of an output document. Clearly, such support is the responsibility of tools that process DocBook. Again, DocBook XSL: The Complete Guide provides a thorough treatment of the way in which the DocBook XSL stylesheets allow for language-specific generated content in its section titled "Language support".
One criticism that you might level against DocBook (and many XML dialects) is the choice of language (usually English) for their tag names. Clearly element names like article and section only have meaning to those who understand English. A problem like this must be solved at the XML level; one possible solution is the DSRL (Document Schema Renaming Language) approach. See the DSRL documents on the DSDL (Document Schema Definition Languages) web site for details.
It would be interesting to learn what tools you are using in order to try to fix the problem with those tools.

Missing where?
Posted Sep 18, 2006 19:42 UTC (Mon) by kreutzm (guest, #4700)
I use docbook2man to create man pages. You can read the entire problem including my patch and attempts to reach the author or Debian maintainer at my bug report regarding this issue.
The first of your items sounds interesting, but from a practical point of view (i.e. for applications I envisage) having all languages in one file does not look optimal. The second paragraph is exactly the one describing my problem. If there are better tools (not ideas) for transforming a docbook file to a man page I will certainly have a look. The third paragraph describes an issue I have not thought about. First, of course, as I speak english but secondly because I treat docbook like a programming language - you'll have to know the tags/keywords to get your work done. It's very interesting, though, that there are solutions for this, even. Hopefully it'll work even if documents are interchanged.

Missing where?
Posted Sep 18, 2006 21:56 UTC (Mon) by JLCdjinn (guest, #1905)
> I use docbook2man to create man pages. If there are better tools (not ideas) for transforming a docbook file to a man page I will certainly have a look.
Some casual poking around the Internet leads me to believe that the DocBook XSL stylesheets seem to be a better-supported solution than docbook2man. For example, I created a simple DocBook-formatted man page with the lang="ru" attribute set on the root element. I then ran xsltproc http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl manpage.docbook, and the resulting troff source contains Russian section headings (although I cannot speak at all to their accuracy, not being able to speak Russian and all). Any incompleteness or inaccuracy in a particular l10n should be fixed through the DocBook XSL stylesheets project.
Granted, after generating the UTF-8 troff source, I was unable to actually view the Russian headings in a formatted man page, but that's a different story...

DocBook 5.0 finalized
Posted Feb 8, 2008 0:42 UTC (Fri) by JLCdjinn (guest, #1905)
> DocBook 5 will likely have a stable release soon.
Clearly, by "soon" I meant 17 months. Ahem. My prognostication abilities leave something to be desired. In any case, DocBook 5.0 has been finalized (that is, approved as a committee draft) as of 2008-02-06! And there was much rejoicing!
When I wrote this article, DocBook 5 was at beta 7; after two more betas, it also went through 7 release candidates before arriving at 5.0 final. There were a number of bug fixes and minor enhancements, but the overall DocBook architecture is largely the same as described in this article. For all the gory details, check out the change log.

OASIS Standard
Posted Nov 10, 2009 20:36 UTC (Tue) by JLCdjinn (guest, #1905)
And over three years later (as of 2009-11-02), DocBook 5 is now an OASIS Standard.