Nina Paley has certainly stirred things up with her recent "rantifesto" on
free culture and free software. It has spawned numerous responses on
various blogs, both from supporters and those who disagree with her
contention that the Free Software Foundation (FSF) is being hypocritical
in its licensing of its web pages and other non-software works. For some
people it is
a bit galling to see an organization that is set up to ensure the right
to create and distribute derivative works (subject to some conditions, of
course) of software, be so steadfast in its refusal to apply those same
freedoms to text and other works.
Paley's main example is quite cogent. In her essay, she restates the
"four freedoms" from the FSF's free software
definition and applies them to free culture. In doing so, she has
arguably created a derivative work of the FSF's definition, which is not
compatible with the "verbatim
copying license" that governs
text on the GNU web site. Though the FSF web
site (unlike the GNU web site) is covered by the Creative
Commons Attribution-NoDerivs (CC BY-ND) license, and Paley confuses the
two a bit, either of those two licenses would restrict derivative
works. It's a little hard to argue, however, that what Paley has done is
not in keeping with the spirit of free software even though it has been applied
to text that is specifically licensed to restrict that kind of activity.
In the unlikely event it ever becomes an issue, an argument could certainly be made that Paley exercised her "fair use"
rights when creating the four freedoms of free culture.
But fair use is a weak and uncertain defense—at best—for
anyone wishing to make use of a restrictively-licensed work.
Fair use is jurisdiction-dependent and, even in
places like the US where it is an established precedent, judges can
interpret it in wildly different ways. There is also the small matter of
the cost to defend
against a claim of copyright violation even if it seems to be fair use.
In some ways, it's reminiscent of the uphill battle faced by a software
company accused of a patent violation for a patent with "obvious" prior
art—it's extremely costly to defend against the suit without any real
assurance of getting a sensible ruling. A license that explicitly allows
derivative works provides much more certainty.
The argument that Paley is making is not that all works should be licensed
freely—though, of course, that is the argument that the FSF
makes for software—but that the champion of the copyleft
movement should more liberally license its non-software works. The FSF
has already run
afoul of other free software advocates (e.g. Debian) for documentation
licensed under its GNU Free
Documentation License—at least for documentation that has "invariant"
sections which are required to
be carried along with any derivative work. Cynical observers have pointed
out that the main reason that the invariant sections exist is so that the
GNU Manifesto can be more widely spread. It is difficult to
argue that the invariant sections make the documentation more free,
however, and they certainly make it difficult to create derivative works in
the same spirit as is done with software.
As Paley points out, the creator of a work (be it software, text,
photographs, video, fine art, etc.) cannot know the kinds of things that a
user might create with a suitably licensed work. This is an argument the free
software community should be very familiar with, and would seem to be at
the heart of what free software is. All Paley is trying to do is to
broaden that freedom to other works in a free culture movement that seeks
to remove the restrictions on at least some of the cultural works that are
created by our society. Much like the FSF takes projects and other
organizations to task over their "anti-freedom" moves with respect to
software, Paley is essentially doing the same to the FSF. She is asking the
FSF (and the much larger FOSS
community) to join forces in helping to foster free culture.
Make no mistake, free culture is clearly under serious attack from the
large "content" industries. Fair use is well-nigh impossible to actually
exercise with organizations like the RIAA and MPAA along with media giants
like Disney trying to maximize copyright in all dimensions. Without a
major sea change, nothing that is under copyright today will ever come out
from under it and fall into the public domain. Legislators will keep
extending copyright terms
so that Disney—whose success has largely been based on remixing
public domain works—never loses the copyright on its iconic mouse.
Without armies of expensive lawyers and lobbyists, the copyright situation
is unlikely to
change, but individuals certainly can participate in free culture
to create a separate commons that is available to all.
Are there differences between software and other works? Of course there
are, but they aren't such huge differences that the same principles cannot
apply to all. In fact, a perfect example is copyright itself, which
applies "equally" to a wide variety of different forms of expression.
Another example is Paley's restatement of the four freedoms—it could
be adopted by the FSF for software without any real loss. There is no huge
chasm between technical and cultural works as
some have claimed, and both kinds of works embody the opinion of their
creator in one form or another.
Another part of Paley's argument should seem rather familiar to our
community as well. She bemoans the dilution—perhaps
distortion—of the "free culture" term by including things that are
licensed in ways that aren't truly free. We have struggled with the same
basic problem, most recently in the "open source" vs. "open core" debates,
but, more generally, in trying to agree on what constitutes "free" (or
"open") in the context of software.
The proliferation of the non-commercial (NC) versions of CC licenses on
supposedly free culture works is one of the problems that Paley
highlights. As she rightly points out, these are essentially "field of
use" restrictions that wouldn't be accepted for free software or open
source licenses. In addition, though Paley doesn't specifically mention
it, NC restrictions are a murky quagmire that
just make it difficult for potential users to know what's acceptable and what
isn't. Can you use an NC-licensed photo on your blog if you also run
Google ads to try to offset the hosting costs? Or on a commercial blog
service that runs its own ads? Those are questions for lawyers, which is
reason enough to make folks leery of NC whether they are
inclined toward free culture or not—it's simpler to just use regular
copyright and decide on a case-by-case basis whether the use is suitable.
The no-derivatives (ND) variants of CC licenses have their own set of
problems as well. A strict interpretation would not allow a photo to be
cropped, resized, or have text placed on it, for example. An ND text
couldn't have typos fixed or an introduction added either, which seriously
reduces the ability to use it in any reasonable way. NC and/or ND
restrictions may be just what the creator intends, but they don't really
contribute to free culture in any sensible way.
In the end, there are going to be plenty of non-free works, both software
and otherwise. Whoever creates a work gets to choose the license it's
available under, and no one has argued otherwise. Paley is just trying to
make a fairly reasonable argument
that free software and free culture should be allies, and that it's
disappointing to see the FSF make fairly arbitrary distinctions between
types of expression. The free culture movement is still in its infancy,
more or less where free software was 20 years ago or so. If free culture
can make similar inroads against the content behemoths that free software
has made in the software world, we will all be better off for it. And
that, in a nutshell, is what Paley is advocating.
The recent discussion of applying software-style freedoms to other creative
works has focused on, among other things, the possibility of confusion if a
derivative work is made to say something that the original author did not
intend or even actively disagrees with. But that concern is not felt only
by creators working outside of the software realm; freedom can be used (or
abused) to do unpleasant things in the software world too.
VLC is a well-respected media
player known for its multi-platform support and its ability to play almost
anything the user can come up with. If one searches for VLC in Google, the
project's site comes up at the top of the list. But it is likely to be
accompanied by one or more paid ads from sites offering free downloads of
VLC binaries. This might strike one as an interesting situation; the
people behind these sites are willing to pay money for ads so that they can
offer up their bandwidth for free downloads. Either their enthusiasm for
VLC is so extreme that they are willing to put considerable resources into
encouraging its distribution, or something else is going on.
Unsurprisingly, it seems that something else is going on. A recent blog
posting by VLC developer Ludovic Fauvet names a number of these sites
and complains about the business they are in:
What bothers us the most is that many of them are bundling VLC with
various crapware to monetize it in ways that mislead our users by
thinking they're downloading an original version. This is not
acceptable. The result is a poor product that doesn't work as
intended, that can't be uninstalled and that clearly abuses its
users and their privacy. Not to mention that it also discredits our
work as volunteers and that it's time-consuming, time that is not
invested in the development.
In the best case, these distributors commercialize the software for their
own objectives. In progressively worse cases, they break the program, add
antifeatures, or turn it into overt spyware or malware. It is not
surprising that the VLC developers are not pleased by this kind of
activity. One user saying "I downloaded VLC and it infected my system" can
be enough to deter many others from trying the real thing.
VLC is free software, released under the GPL. As long as these
redistributors comply with the requirements of the GPL, they are within
the rights that the VLC project gave to them - even if they may or may not
be violating various laws regarding the distribution of malicious
software. As it happens, readers will no doubt be shocked to learn that
many of these companies fail to take the GPL's source availability
requirements seriously. It's almost as if they actively didn't want others
to see what they were doing to the program. Failure to comply with the GPL
gives the VLC project one tool which can be used to shut some of these
operators down; the project has evidently made use of this power at times.
What happens if a distributor corrupts VLC in some way, but complies
properly with the licensing requirements? The result can easily be unhappy
users and damage to the VLC project's "brand." One could say that the VLC
code base reflects the developers' opinions on how a media player should
work; making user-hostile changes to that code can cause those developers
to be blamed for opinions which they would never have thought to express.
They would like to prevent that from happening, but, it seems, the
inability to restrict modified versions ties their hands.
This problem is exactly why the Mozilla project maintains such firm and
uncompromising control over the use of the "Firefox" trademark. A malware
version of Firefox could do untold damage to its users and, consequently,
to the world's view of Firefox in general; it is not hard to imagine
Firefox developers lying awake at night worrying about this scenario. A
fiercely-defended trademark with a tight policy on acceptable uses gives
Mozilla a means to quickly shut down Fakefoxes which behave in undesirable
ways.
The trademark approach has its own problems; among other things, it makes
it harder for distributors to support Firefox, especially after official
support for a given release ends. Strong trademark policies often seem to
run counter to the spirit of free software and free expression as well; you
cannot, for example, set up a community site at FedoraFans.org (say)
without encountering the Fedora
trademark rules. Despite these worries, the intersection of trademarks
and free software has worked reasonably well most of the time.
The VLC project is evidently working on its trademark policies, but VLC has
a problem common to many development projects: it is not endowed with the
sort of legal budget that Mozilla has. Enforcing trademarks in any
non-trivial way requires lawyers; trademarks which are not enforced tend to
go away. Organizations like the Software Freedom Law Center can help in
the defense of trademarks, but resources for pro-bono work will be
limited. So trademarks, even if handled well, are not a complete solution
to this problem either.
That leaves the bulk of free software projects without much in the way of
defense against those who would misuse their code. But, somehow, as a
community, we have managed to muddle along reasonably well anyway. We have
some advantages in that area: we have well established trusted distribution
channels for software and a natural disinclination to run binaries from
suspicious-looking third-party web sites. We also, for better or worse,
have relatively few big-name programs which are sought out by users of more
frequently targeted operating systems. As free software continues to grow
in popularity, though, we may find ourselves confronted with unpleasant
actions by sleazy people more often. Somehow we'll find a way to deal with
them without compromising the freedoms that make free software what it is.
Interactive Knowledge Stack
(IKS) is an open source project focused on building an open and flexible
technology platform for semantically enhanced Content Management Systems
(CMS). It is a collaboration between academia, industry, and open source
developers, co-funded with €6.58 million by the European Union. The
goal is to enrich content management systems with semantic content in order
to let the users benefit from more intelligent extraction and linking of
their content. This could solve part of the chicken-and-egg problem for the
semantic web that arises because end users don't have easy-to-use semantic web tools.
At the recent IKS
workshop in Paris, one of the keynote speakers was Mark Greaves, who
spoke about the possibilities of the semantic web in the wiki setting. His speech
looked at the limits of traditional wikis, the promise of semantic wikis, and the birth of Semantic MediaWiki (SMW), an extension to MediaWiki, the wiki software that powers Wikipedia.
Wikis have become a powerful instrument for crowdsourcing, but they're
not the only types of content management systems that tap into the
potential of the crowd. Greaves, who is working as Director of Knowledge
Systems at Paul Allen's asset management company Vulcan, emphasized that bulletin boards,
forums, and newsgroups are the antecedents of wikis and even the beginnings
of social networks. Now we have many websites that crowdsource their content from their users:
Users write reviews on Amazon, give recommendations on Amazon and Netflix, add tags to content on Flickr and YouTube, and so on. So we see in a lot of cases that users can help building your website, not only the content but even the structure. For instance, the system administrators of WikiMedia only have to run the servers; all the rest is done by volunteers.
A critical property of wikis is consensus, which comes thanks to
collaboration and custom policies. For instance, one of the core content
policies of the Wikipedia encyclopedia is that each article should be
written from a neutral
point of view (NPOV). This forces authors to not write from their own
point of view and that helps lead them to consensus with authors that have
another point of view about the topic. The MediaWiki software also has
software support to facilitate reaching consensus, such as the talk pages
and change tracking.
But traditional wikis have their limits, as most knowledge is locked inside text and cannot be queried in a smart way. Wikipedia has an answer for this with thousands of lists, for instance lists of countries (which is itself a list of lists). But these are all manually maintained, each of them ordered by another property, like birth rate, literacy rate, population, income equality, and so on. So Greaves asked the logical question: "Why don't we give Wikipedia authors a way to add structure to their content?"
That's where semantic wikis come in, and according to Greaves they hold a lot of promise:
Semantic wikis augment traditional wikis with database capabilities, but with one crucial difference: traditional databases are schema-first, while semantic wikis are schema-last: the database schema is developed and maintained in the wiki by the authors themselves. So with semantic wikis we get a lot more flexibility and we have the means to reach social consensus over data. Unfortunately, consensus over data is very hard, and one of the prime reasons is that data modeling is a highly specialized skill.
One project working to add semantics to wiki systems is Semantic
MediaWiki (SMW), a GPL licensed extension to MediaWiki that allows annotating
semantic data within wiki pages. This means that a MediaWiki wiki that
incorporates the extension is turned into a semantic wiki: content that has
been enriched with semantic information can be used in specialized
searches, used for aggregation of pages, displayed in alternate formats
like maps, calendars, or graphs, and exported to formats like RDF (Resource Description Framework) and
CSV (Comma-Separated Values).
How does this work?
Some examples will make it clear what SMW adds. For instance, on the normal Wikipedia page of France, there's a link to its capital city, Paris:
... the capital city is [[Paris]] ...
The [[Paris]] code is a link to a wiki page about Paris, but
there's no information encoded about the specific relationship between
France and Paris.
In contrast to this classical approach, the semantic web is all about interlinking data in a machine-readable way. The core technology under the hood is RDF, which is used to describe entities and their relationships. Each RDF statement comes in the form of a triple: subject - predicate - object. Each subject and predicate is identified by a URI, while an object can be represented by a URI or be a literal value such as a string or a number. So, a Semantic MediaWiki version of the sentence about Paris could be:
... the capital city is [[Has capital::Paris]] ...
The [[Has capital::Paris]] code not only adds a link to a wiki page about Paris, it also specifies the nature of the relationship between France and Paris: France has Paris as its capital. Or to translate it into an RDF triple: "France" (which is implicit as it is the topic of the current page) is the subject, "has capital" is the predicate, and "Paris" is the object.
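The same triple can be written out in RDF's Turtle notation. The URIs below are purely illustrative (a real deployment would use the wiki's own namespace), but the subject-predicate-object shape is the same:

```
@prefix ex: <http://example.org/> .

ex:France ex:hasCapital ex:Paris .
```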
This is an example where the object can be represented by a URI, but there are also other examples where the object is represented by a literal value such as a number:
... its population is [[Has population::65,821,885]] ...
When this code is on the page about France, it represents an RDF triple with "France" as its subject, "has population" as its predicate, and "65,821,885" as its object. These typed links (with the predicate as the type) give SMW an out-of-the-box mechanism to automatically generate lists. With SMW's inline queries feature, it's easy to re-use this structured information to generate lists and tables which are automatically updated and cached. For instance, users can easily generate a page with a list of all countries ordered by their population, or a list of all countries with a population greater than 20 million, or a table of all countries with their capitals, and so on.
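As a sketch of what such an inline query might look like (the exact syntax depends on the SMW version, and the property names here assume the annotations shown above), a list of large countries with their capitals could be requested with SMW's `#ask` parser function:

```
{{#ask: [[Has population::>20,000,000]]
 |?Has population
 |?Has capital
 |sort=Has population
 |order=descending
}}
```

The first clause selects pages by a condition on a property, the `?`-prefixed lines choose which properties to print as table columns, and the remaining parameters control ordering. The result is rendered as a table that updates as the underlying annotations change.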
Automatically-generated lists are not the only possibility when you
start adding semantic links. You can also display the information in
various formats, you can have different language versions of a wiki using
the same data, you can integrate and mash-up your wiki's data and export it
for external re-use, and more.
The development of Semantic MediaWiki was initially funded by the EU
project SEKT (Semantically
Enabled Knowledge Technologies), and after this supported in part by the
University of Karlsruhe in Germany. The first release was version 0.1 in 2005. In 2007, Vulcan started sponsoring the German company Ontoprise to develop a commercial version of the extension, Semantic MediaWiki+ (SMW+).
According to Greaves, there are 50 open source MediaWiki extensions that use the semantic information provided by SMW. For example, there's Halo, funded by Vulcan, that facilitates creation, retrieval, navigation and organization of semantic data with some intuitive graphical user interfaces, Semantic Drilldown that provides a faceted browser interface for viewing semantic data by filtering, and Semantic Result Formats that provides a large number of display formats, including maps, calendars, graphs, and charts.
If you want to install SMW on your own wiki, there's an extensive administrator
manual with installation
instructions and a list of the configuration
options. For users who will be entering the semantic markup, the
project also has a user manual.
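For the curious, enabling the extension amounts to a couple of lines in MediaWiki's LocalSettings.php. This is an illustrative fragment only; file names and settings vary between SMW releases, so the administrator manual is the authoritative source:

```php
# Illustrative LocalSettings.php fragment -- exact paths and
# settings vary between SMW releases; see the admin manual.
include_once "$IP/extensions/SemanticMediaWiki/SemanticMediaWiki.php";
enableSemantics( 'smw.example.org' );
```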
Some semantic wikis
Semantic MediaWiki is already used by over 300 public active wikis around
the world. Greaves called these semantic wiki applications "the
icing on the cake", as they really show the flexibility of adding
semantics to a wiki. Some notable examples are Open Energy Information, a
crowdsourced wiki with information about energy resources, including
real-time data and visualizations, SKYbrary, a wiki
created by several European aviation organizations to create a comprehensive source of aviation safety information, Familypedia, a wiki on family history and genealogy, SNPedia, a wiki investigating human genetics, Oh Internet, a wiki to track internet memes, and Ultrapedia, a search engine for OCR'd books.
Many organizations also use SMW internally, including Pfizer, Johnson & Johnson Pharmaceutical Research and Development, and the U.S. Department of Defense. Greaves added that Vulcan is eating its own dog food:
We have created a lightweight project management tool with Semantic MediaWiki+. Our developers even change the ontology of this Scrum wiki [Scrum is an iterative, incremental framework for project management] once a month, which proves that the added flexibility of a schema-last database is welcome.
Towards a semantic Wikipedia
Some academics have already proposed using SMW on Wikipedia to tackle the problem of the many lists that have to be created manually, but according to Wikimedia Foundation Deputy Director Erik Möller it's still unclear whether SMW is up to the task of supporting a web site on the scale of Wikipedia. So while Semantic MediaWiki already powers a lot of web sites and is quite user-friendly, it remains to be seen whether it will eventually bring semantics to the ultimate wiki, Wikipedia.
The SMW project has a fairly detailed roadmap. Some of the
interesting tasks are an improvement of the usability of the semantic
search features (part of Google Summer of Code 2011), a light version of
SMW without query capabilities, improvements for the Semantic Drilldown
extension, and so on. It's already quite usable, as many of the active SMW
wikis show, but to really reach the vision of the semantic web and be able
to link various semantic wikis and other content management systems,
Semantic MediaWiki needs to become as easy to use as Wikipedia.
Page editor: Jonathan Corbet