March 29, 2006
This article was contributed by Glyn Moody
A previous LWN.net
feature examined the
parallels between open source and open access, which strives for the free
online availability of the academic knowledge distilled into research
papers. Although it has some particular characteristics of its own, open
access can be considered part of a wider move to gain free online access to
general digital content.
The roots of this open content movement, as it came to be called, go back to
before the Internet existed, and when even computers were relatively rare
beasts. In 1971, the year Richard
Stallman joined the MIT AI Lab, Michael Hart
was given an operator's account on a Xerox
Sigma V mainframe at the
University of Illinois. Since he estimated this computer time had a nominal
worth of $100 million, he felt he had an obligation to repay this generosity
by using it to create something of comparable and lasting value.
His solution was to type in the US Declaration of Independence, roughly 5K
of ASCII, and to attempt to send it to everyone on ARPANET (fortunately,
this trailblazing attempt at spam failed). His insight was that once turned
from analogue to digital form, a book could be reproduced endlessly for
almost zero additional cost: what Hart termed "Replicator Technology". By
converting printed texts into etexts, he was able to create something whose
potential aggregate value far exceeded even the heady figure he put on the
computing time he used to generate it.
Hart chose the name "Project Gutenberg" for this body of etexts, making a
bold claim that they represented the start of something as epoch-making as
the original Gutenberg revolution. Indeed, he goes further: he sees the
original Gutenberg as the well-spring of the Industrial Revolution, and his
own project as the precursor of the next Industrial Revolution, where
Replicator Technology will be applied not just to digital entities, as with
Project Gutenberg, but to analogue ones too.
The Replicator idea is similar to one of the key defining characteristics of
free software: that it can be copied endlessly, at almost no marginal cost.
Hart's motivation for this move (the creation of a huge permanent store of
human knowledge) is very different from Stallman's reason for starting the
GNU project, which is powered by his commitment to spreading freedom. But on
the Project Gutenberg site, there
is a
discussion about the ambiguity of the
word "free" that could come straight from Stallman: "The word free in the
English language does not distinguish between free of charge and freedom.
.... Fortunately almost all Project Gutenberg ebooks are free of charge and
free as in freedom."
There are other interesting parallels between the two men. After they had
their respective epiphanies, both labored almost entirely alone to begin
with: Hart entering page after page of books into a computer, and Stallman
coding the first few programs of the GNU project. Even 20 years after
Project Gutenberg had begun, Hart had only created 10 ebooks (today, the
figure is 17,000). Given the dedication required, it is no surprise that
both are driven men, sustained by their sense of moral duty and of the
unparalleled possibilities for changing the world that the digital realm
offers.
Both, too, were aided enormously as the Internet grew and spread, since it
allowed the two projects to adopt a distributed approach for their work. In
the case of Project Gutenberg, this was formalized with the foundation of
the Distributed Proofreaders team in
October 2000; since then, thanks in part to a Slashdotting in November
2002, hundreds of books have been turned into ebooks every month.
Moreover, just as free software paid back the debt by creating programs that
pushed Internet adoption to even higher levels, so Project Gutenberg
returned the compliment by making key early titles like "Zen and the Art of
the Internet" (June 1992) and "The
Hitchhikers Guide to the Internet"
(September 1992) available to help new Internet users find their way around.
The Internet was also the perfect low-cost distribution medium for the
digital creations of Hart and Stallman. After starting out at the University
of Illinois, Project Gutenberg was mirrored at the University of North
Carolina, under the auspices of Paul Jones,
one of the pioneers in facilitating free access to all kinds of digital
files. In 1992, SunSITE was launched there, designed as "a central
repository for a collection of public-domain software, shareware and other
electronic material such as research articles and electronic images"
according to the press release of the time. SunSITE became
iBiblio.org in 2000 (after briefly turning
into MetaLab in 1998), and received a $4
million grant from the Center for the Public Domain, set up by Red Hat
co-founders Bob Young and Marc Ewing. Over time, iBiblio became Project
Gutenberg's official host and primary distribution site.
To the collection of open content at SunSITE was soon added an early
GNU/Linux archive, managed
successively by Jonathan Magid, Erik Troan, and Eric Raymond. Given this
close association between SunSITE and GNU/Linux, it was only natural that it
became the host for the Linux Documentation Project (LDP)
when it was founded in 1992 by Matt Welsh, and this soon grew into another
important early collection of free content. The LDP began with the Linux
FAQ, and expanded to include a kernel hackers' guide and system administrator's
guide when Michael K. Johnson and Lars Wirzenius joined the project. These
texts were originally created in LaTeX, but documentation later appeared in
the then-new HTML. Around the same time, in April 1993, there were
discussions between people like Tim Berners-Lee, Guido van Rossum and Nathan
Torkington about the idea of working with Project Gutenberg to distribute
HTML versions of its etexts, in part, presumably, to use the
well-established Project Gutenberg to help promote the fledgling Web format.
An early concern about the LDP materials was that they might be published
commercially without permission. To avoid this, a fairly restrictive license
was employed, which allowed reproduction in electronic or printed form, but
only non-commercially, and without modifications. This was later relaxed,
and the current license allows derivative
works. This issue of whether to allow changes has been a vexed one from the
earliest days of online content: what were probably the first digital
documents available on a network, the RFCs (which first appeared in 1969,
even before ARPANET), had also forbidden modifications.
Since Project Gutenberg's materials are almost exclusively drawn from the
public domain (a few copyrighted works have been included with the author's
permission), it might be expected that the
license would allow any kind of
use, including modifications. However, it imposes a
number of conditions on those who wish to use the name Project Gutenberg in
the ebooks they distribute; in this case, only verbatim copies are
permitted, and commercial distributors must pay royalties. If all
references to the Project are stripped out, leaving the bare text, the
latter can be used in any way.
One other condition for etexts distributed under the Project Gutenberg name
is worth noting. The license stipulates:
if you provide access to or distribute copies of a Project
Gutenberg work in a format other than "Plain Vanilla ASCII" or
other format used in the official version posted on the official
Project Gutenberg-tm web site (www.gutenberg.net), you must, at no
additional cost, fee or expense to the user, provide a copy, a means
of exporting a copy, or a means of obtaining a copy upon request, of
the work in its original "Plain Vanilla ASCII" or other form.
Just as the GPL does for software, the Project Gutenberg license insists
that the "source code" of etexts distributed in non-ASCII formats be freely
available.
In fact, an explicit connection between Project Gutenberg and free software
is to be found at the top of every page on the Project Gutenberg Web site, which
offers thanks to those who wrote the programs which the site employs
(GNU/Linux, Apache, PostgreSQL, PHP, Perl and Python) and a link to the Free
Software Foundation.
Licensing proved to be the crucial issue for freely-available materials, and
it was only when it was fully resolved that open content really began to
take off. The next feature in this series will look at how that happened,
and what some of the immediate consequences were.
Glyn Moody writes about open source and open content at
opendotdotdot.