Gutenberg 2.0: the birth of open content

March 29, 2006

This article was contributed by Glyn Moody

A previous LWN.net feature examined the parallels between open source and open access, which strives for the free online availability of the academic knowledge distilled into research papers. Although it has some particular characteristics of its own, open access can be considered part of a wider move to gain free online access to general digital content.

The roots of this open content movement, as it came to be called, go back to a time before the Internet existed, when even computers were relatively rare beasts. In 1971, the year Richard Stallman joined the MIT AI Lab, Michael Hart was given an operator's account on a Xerox Sigma V mainframe at the University of Illinois. Since he estimated this computer time had a nominal worth of $100 million, he felt he had an obligation to repay this generosity by using it to create something of comparable and lasting value.

His solution was to type in the US Declaration of Independence, roughly 5K of ASCII, and to attempt to send it to everyone on ARPANET (fortunately, this trailblazing attempt at spam failed). His insight was that once turned from analogue to digital form, a book could be reproduced endlessly for almost zero additional cost - what Hart termed "Replicator Technology". By converting printed texts into etexts, he was able to create something whose potential aggregate value far exceeded even the heady figure he put on the computing time he used to generate it.

Hart chose the name "Project Gutenberg" for this body of etexts, making a bold claim that they represented the start of something as epoch-making as the original Gutenberg revolution. Indeed, he goes further: he sees the original Gutenberg as the well-spring of the Industrial Revolution, and his own project as the precursor of the next Industrial Revolution, where Replicator Technology will be applied not just to digital entities - as with Project Gutenberg - but to analogue ones too.

The Replicator idea is similar to one of the key defining characteristics of free software: that it can be copied endlessly, at almost no marginal cost. Hart's motivation for this move - the creation of a huge permanent store of human knowledge - is very different from Stallman's reason for starting the GNU project, which is powered by his commitment to spreading freedom. But on the Project Gutenberg site, there is a discussion about the ambiguity of the word "free" that could come straight from Stallman: "The word free in the English language does not distinguish between free of charge and freedom. .... Fortunately almost all Project Gutenberg ebooks are free of charge and free as in freedom."

There are other interesting parallels between the two men. After they had their respective epiphanies, both labored almost entirely alone to begin with - Hart entering page after page of books into a computer, and Stallman coding the first few programs of the GNU project. Even 20 years after Project Gutenberg had begun, Hart had only created 10 ebooks (today, the figure is 17,000). Given the dedication required, it is no surprise that both are driven men, sustained by their sense of moral duty and of the unparalleled possibilities for changing the world that the digital realm offers.

Both, too, were aided enormously as the Internet grew and spread, since it allowed the two projects to adopt a distributed approach for their work. In the case of Project Gutenberg, this was formalized with the foundation of the Distributed Proofreaders team in October 2000; since then - and thanks in part to a Slashdotting in November 2002 - hundreds of books have been turned into ebooks every month.

Moreover, just as free software paid back the debt by creating programs that pushed Internet adoption to even higher levels, so Project Gutenberg returned the compliment by making key early titles like "Zen and the Art of the Internet" (June 1992) and "The Hitchhikers Guide to the Internet" (September 1992) available to help new Internet users find their way around.

The Internet was also the perfect low-cost distribution medium for the digital creations of Hart and Stallman. After starting out at the University of Illinois, Project Gutenberg was mirrored at the University of North Carolina, under the auspices of Paul Jones, one of the pioneers in facilitating free access to all kinds of digital files. In 1992, SunSITE was launched there, designed as "a central repository for a collection of public-domain software, shareware and other electronic material such as research articles and electronic images" according to the press release of the time. SunSITE became iBiblio.org in 2000 (after briefly turning into MetaLab in 1998), and received a $4 million grant from the Center for the Public Domain, set up by Red Hat co-founders Bob Young and Marc Ewing. Over time, iBiblio became Project Gutenberg's official host and primary distribution site.

To the collection of open content at SunSITE was soon added an early GNU/Linux archive, managed successively by Jonathan Magid, Erik Troan, and Eric Raymond. Given this close association between SunSITE and GNU/Linux, it was only natural that it became the host for the Linux Documentation Project (LDP) when it was founded in 1992 by Matt Welsh, and this soon grew into another important early collection of free content. The LDP began with the Linux FAQ, and expanded to include a kernel hackers' guide and a system administrators' guide when Michael K. Johnson and Lars Wirzenius joined the project. These texts were originally created in LaTeX, but documentation later appeared in the then-new HTML. Around the same time, in April 1993, there were discussions between people like Tim Berners-Lee, Guido van Rossum and Nathan Torkington about the idea of working with Project Gutenberg to distribute HTML versions of its etexts, in part, presumably, to use the well-established Project Gutenberg to help promote the fledgling Web format.

An early concern about the LDP materials was that they might be published commercially without permission. To avoid this, a fairly restrictive license was employed, which allowed reproduction in electronic or printed form, but only non-commercially, and without modifications. This was later relaxed, and the current license allows derivative works. This issue of whether to allow changes has been a vexed one from the earliest days of online content: what were probably the first digital documents available on a network, the RFCs (which first appeared in 1969, even before ARPANET), had also forbidden modifications.

Since Project Gutenberg's materials are almost exclusively drawn from the public domain (a few copyrighted works have been included with the author's permission), it might be expected that the license would allow any kind of use, including modifications. However, it imposes a number of conditions on those who wish to use the name Project Gutenberg in the ebooks they distribute; in this case, only verbatim copies are permitted, and commercial distributors must pay royalties. If all references to the Project are stripped out, leaving the bare text, the latter can be used in any way.

One other condition for etexts distributed under the Project Gutenberg name is worth noting. The license stipulates:

if you provide access to or distribute copies of a Project Gutenberg work in a format other than "Plain Vanilla ASCII" or other format used in the official version posted on the official Project Gutenberg-tm web site (www.gutenberg.net), you must, at no additional cost, fee or expense to the user, provide a copy, a means of exporting a copy, or a means of obtaining a copy upon request, of the work in its original "Plain Vanilla ASCII" or other form.

Just as the GPL does for software, the Project Gutenberg license insists that the "source code" of etexts distributed in non-ASCII formats be freely available.

In fact, an explicit connection between Project Gutenberg and free software is to be found at the top of every page on the Project Gutenberg Web site, which offers thanks to those who wrote the programs which the site employs - GNU/Linux, Apache, PostgreSQL, PHP, Perl and Python - and a link to the Free Software Foundation.

Licensing proved to be the crucial issue for freely-available materials, and it was only when it was fully resolved that open content really began to take off. The next feature in this series will look at how that happened, and what some of the immediate consequences were.

Glyn Moody writes about open source and open content at opendotdotdot.

Posted Mar 30, 2006 19:01 UTC (Thu) by hppnq (guest, #14462)

Thanks Glyn, nice article. :-)

Posted Mar 31, 2006 23:20 UTC (Fri) by grouch (guest, #27289)

Yes, it is. I always pick up new (to me) information from a Moody article.