Leading items

Copyright, copyleft, and culture

By Jake Edge
July 13, 2011

Nina Paley has certainly stirred things up with her recent "rantifesto" on free culture and free software. It has spawned numerous responses on various blogs, both from supporters and from those who disagree with her contention that the Free Software Foundation (FSF) is being hypocritical in its licensing of its web pages and other non-software works. For some people, it is a bit galling to see an organization set up to ensure the right to create and distribute derivative works of software (subject to some conditions, of course) be so steadfast in its refusal to apply those same freedoms to text and other works.

Paley's main example is quite cogent. In her essay, she restates the famous "four freedoms" from the FSF's free software definition and applies them to free culture. In doing so, she has arguably created a derivative work of the FSF's definition, which is not compatible with the "verbatim copying license" that governs text on the GNU web site. The FSF web site (unlike the GNU web site) is covered by the Creative Commons Attribution-NoDerivs (CC BY-ND) license, and Paley confuses the two a bit, but either of those two licenses would restrict derivative works. It's a little hard to argue, however, that what Paley has done is not in keeping with the spirit of free software, even though it has been applied to text that is specifically licensed to restrict that kind of activity.

In the unlikely event it ever becomes an issue, an argument could certainly be made that Paley exercised her "fair use" rights when creating the four freedoms of free culture. But fair use is a weak and uncertain defense—at best—for anyone wishing to make use of a restrictively-licensed work. Fair use is jurisdiction-dependent and, even in places like the US where it is an established doctrine, judges can interpret it in wildly different ways. There is also the small matter of the cost of defending against a claim of copyright infringement even if the use seems fair. In some ways, it's reminiscent of the uphill battle faced by a software company accused of infringing a patent with "obvious" prior art—it's extremely costly to defend against the suit without any real assurance of getting a sensible ruling. A license that explicitly allows derivative works provides much more certainty.

The argument that Paley is making is not that all works should be licensed freely—though, of course, that is the argument that the FSF makes for software—but that the champion of the copyleft movement should more liberally license its non-software works. The FSF has already run afoul of other free software advocates (e.g. Debian) for documentation licensed under its GNU Free Documentation License—at least for documentation that has "invariant" sections which are required to be carried along with any derivative work. Cynical observers have pointed out that the main reason that the invariant sections exist is so that the GNU Manifesto can be more widely spread. It is difficult to argue that the invariant sections make the documentation more free, however, and they certainly make it difficult to create derivative works in the same spirit as is done with software.

As Paley points out, the creator of a work (be it software, text, photographs, video, fine art, etc.) cannot know the kinds of things that a downstream user might create with a suitably licensed work. This is an argument the free software community should be very familiar with, and would seem to be at the heart of what free software is. All Paley is trying to do is to broaden that freedom to other works in a free culture movement that seeks to remove the restrictions on at least some of the cultural works that are created by our society. Much like the FSF takes projects and other organizations to task over their "anti-freedom" moves with respect to software, Paley is essentially doing the same to the FSF. She is asking the FSF (and the much larger FOSS community) to join forces in helping to foster free culture.

Make no mistake, free culture is clearly under serious attack from the large "content" industries. Fair use is well-nigh impossible to actually exercise with organizations like the RIAA and MPAA along with media giants like Disney trying to maximize copyright in all dimensions. Without a major sea change, nothing that is under copyright today will ever come out from under it and fall into the public domain. Legislators will keep extending copyright terms so that Disney—whose success has largely been based on remixing public domain works—never loses the copyright on its iconic mouse. Without armies of expensive lawyers and lobbyists, the copyright situation is unlikely to change, but individuals certainly can participate in free culture to create a separate commons that is available to all.

Are there differences between software and other works? Of course there are, but they aren't such huge differences that the same principles cannot apply to all. In fact, a perfect example is copyright itself, which applies "equally" to a wide variety of different forms of expression. Another example is Paley's restatement of the four freedoms—it could be adopted by the FSF for software without any real loss. There is no huge chasm between technical and cultural works as some have claimed, and both kinds of works embody the opinion of their creator in one form or another.

Another part of Paley's argument should seem rather familiar to our community as well. She bemoans the dilution—perhaps distortion—of the "free culture" term by including things that are licensed in ways that aren't truly free. We have struggled with the same basic problem, most recently in the "open source" vs. "open core" debates, but, more generally, in trying to agree on what constitutes "free" (or "open") in the context of software.

The proliferation of the non-commercial (NC) versions of CC licenses on supposedly free culture works is one of the problems that Paley highlights. As she rightly points out, these are essentially "field of use" restrictions that wouldn't be accepted in free software or open source licenses. In addition, though Paley doesn't specifically mention it, NC restrictions are a murky quagmire that just makes it difficult for potential users to know what's acceptable and what isn't. Can you use an NC-licensed photo on your blog if you also run Google ads to try to offset the hosting costs? Or on a commercial blog service that runs its own ads? Those are questions for lawyers, which is reason enough to make folks leery of NC whether they are inclined toward free culture or not—it's simpler to just use regular copyright and decide on a case-by-case basis whether a use is suitably non-commercial.

The no-derivatives (ND) variants of CC licenses have their own set of problems as well. A strict interpretation would not allow a photo to be cropped, resized, or have text placed on it, for example. An ND text couldn't have typos fixed or an introduction added either, which seriously reduces the ability to use it in any reasonable way. NC and/or ND restrictions may be just what the creator intends, but they don't really contribute to free culture in any sensible way.

In the end, there are going to be plenty of non-free works, both software and otherwise. Whoever creates a work gets to choose the license it's available under, and no one has argued otherwise. Paley is just trying to make a fairly reasonable argument that free software and free culture should be allies, and that it's disappointing to see the FSF make fairly arbitrary distinctions between types of expression. The free culture movement is still in its infancy, more or less where free software was 20 years ago or so. If free culture can make similar inroads against the content behemoths that free software has made in the software world, we will all be better off for it. And that, in a nutshell, is what Paley is advocating.


VLC and unwelcome redistributors

By Jonathan Corbet
July 13, 2011

The recent discussion of applying software-style freedoms to other creative works has focused on, among other things, the possibility of confusion if a derivative work is made to say something that the original author did not intend or even actively disagrees with. But that concern is not felt only by creators working outside of the software realm; freedom can be used (or abused) to do unpleasant things in the software world too.

VLC is a well-respected media player known for its multi-platform support and its ability to play almost anything the user can come up with. If one searches for VLC in Google, the project's site comes up at the top of the list. But it is likely to be accompanied by one or more paid ads from sites offering free downloads of VLC binaries. This might strike one as an interesting situation; the people behind these sites are willing to pay money for ads so that they can offer up their bandwidth for free downloads. Either their enthusiasm for VLC is so extreme that they are willing to put considerable resources into encouraging its distribution, or something else is going on.

Unsurprisingly, it seems that something else is going on. A recent blog posting by VLC developer Ludovic Fauvet names a number of these sites and complains about the business they are in:

What bothers us the most is that many of them are bundling VLC with various crapware to monetize it in ways that mislead our users by thinking they're downloading an original version. This is not acceptable. The result is a poor product that doesn't work as intended, that can't be uninstalled and that clearly abuses its users and their privacy. Not to mention that it also discredits our work as volunteers and that it's time-consuming, time that is not invested in the development.

In the best case, these distributors commercialize the software for their own objectives. In progressively worse cases, they break the program, add antifeatures, or turn it into overt spyware or malware. It is not surprising that the VLC developers are not pleased by this kind of activity. One user saying "I downloaded VLC and it infected my system" can be enough to deter many others from trying the real thing.

VLC is free software, released under the GPL. As long as these redistributors comply with the requirements of the GPL, they are within the rights that the VLC project gave to them - even though they may be violating various laws regarding the distribution of malicious software. As it happens, readers will no doubt be shocked to learn that many of these companies fail to take the GPL's source-availability requirements seriously. It's almost as if they actively didn't want others to see what they were doing to the program. Failure to comply with the GPL gives the VLC project one tool that can be used to shut some of these operators down; the project has evidently made use of this power at times.

What happens if a distributor corrupts VLC in some way, but complies properly with the licensing requirements? The result can easily be unhappy users and damage to the VLC project's "brand." One could say that the VLC code base reflects the developers' opinions on how a media player should work; making user-hostile changes to that code can cause those developers to be blamed for opinions which they would never have thought to express. They would like to prevent that from happening, but, it seems, the inability to restrict modified versions ties their hands.

This problem is exactly why the Mozilla project maintains such firm and uncompromising control over the use of the "Firefox" trademark. A malware version of Firefox could do untold damage to its users and, consequently, to the world's view of Firefox in general; it is not hard to imagine Firefox developers lying awake at night worrying about this scenario. A fiercely-defended trademark with a tight policy on acceptable uses gives Mozilla a means to quickly shut down Fakefoxes which behave in undesirable ways.

The trademark approach has its own problems; among other things, it makes it harder for distributors to support Firefox, especially after official support for a given release ends. Strong trademark policies often seem to run counter to the spirit of free software and free expression as well; you cannot, for example, set up a community site at FedoraFans.org (say) without encountering the Fedora trademark rules. Despite these worries, the intersection of trademarks and free software has worked reasonably well most of the time.

The VLC project is evidently working on its trademark policies, but VLC has a problem common to many development projects: it is not endowed with the sort of legal budget that Mozilla has. Enforcing trademarks in any non-trivial way requires lawyers; trademarks which are not enforced tend to go away. Organizations like the Software Freedom Law Center can help in the defense of trademarks, but resources for pro-bono work will be limited. So trademarks, even if handled well, are not a complete solution to this problem either.

That leaves the bulk of free software projects without much in the way of defense against those who would misuse their code. But, somehow, as a community, we have managed to muddle along reasonably well anyway. We have some advantages in that area: we have well established trusted distribution channels for software and a natural disinclination to run binaries from suspicious-looking third-party web sites. We also, for better or worse, have relatively few big-name programs which are sought out by users of more frequently targeted operating systems. As free software continues to grow in popularity, though, we may find ourselves confronted with unpleasant actions by sleazy people more often. Somehow we'll find a way to deal with them without compromising the freedoms that make free software what it is.


Semantic MediaWiki: Toward smarter wikis

July 13, 2011

This article was contributed by Koen Vervloesem

Interactive Knowledge Stack (IKS) is an open source project focused on building an open and flexible technology platform for semantically enhanced Content Management Systems (CMS). It is a collaboration between academia, industry, and open source developers, co-funded with €6.58 million by the European Union. The goal is to enrich content management systems with semantic content in order to let users benefit from more intelligent extraction and linking of their content. This could solve part of the chicken-and-egg problem for the semantic web, which arises because end users don't have easy-to-use semantic web tools.

At the recent IKS workshop in Paris, one of the keynote speakers was Mark Greaves, who spoke about the possibilities of the semantic web in the wiki setting. His speech looked at the limits of traditional wikis, the promise of semantic wikis, and the birth of Semantic MediaWiki (SMW), an extension to MediaWiki, the wiki software that powers Wikipedia.

Wikis have become a powerful instrument for crowdsourcing, but they are not the only type of content management system that taps into the potential of the crowd. Greaves, who works as Director of Knowledge Systems at Paul Allen's asset management company Vulcan, emphasized that bulletin boards, forums, and newsgroups are the antecedents of wikis and even the beginnings of social networks. Now we have many websites that crowdsource their content from their users:

Users write reviews on Amazon, give recommendations on Amazon and Netflix, add tags to content on Flickr and YouTube, and so on. So we see in a lot of cases that users can help building your website, not only the content but even the structure. For instance, the system administrators of WikiMedia only have to run the servers; all the rest is done by volunteers.

A critical property of wikis is consensus, which comes about through collaboration and custom policies. For instance, one of the core content policies of the Wikipedia encyclopedia is that each article should be written from a neutral point of view (NPOV). This forces authors not to write from their own point of view, which helps lead them to consensus with authors who hold other points of view on the topic. MediaWiki also has features that facilitate reaching consensus, such as the talk pages and change tracking.

But traditional wikis have their limits: most knowledge is locked inside text and cannot be queried in a smart way. Wikipedia has an answer for this in its thousands of lists, for instance its lists of countries (itself a list of lists). But these are all manually maintained, each ordered by a different property, like birth rate, literacy rate, population, or income equality. So Greaves asked the logical question: "Why don't we give Wikipedia authors a way to add structure to their content?"

Semantic MediaWiki

That's where semantic wikis come in, and according to Greaves they hold a lot of promise:

Semantic wikis augment traditional wikis with database capabilities, but with one crucial difference: traditional databases are schema-first, while semantic wikis are schema-last: the database schema is developed and maintained in the wiki by the authors themselves. So with semantic wikis we get a lot more flexibility and we have the means to reach social consensus over data. Unfortunately, consensus over data is very hard, and one of the prime reasons is that data modeling is a highly specialized skill.

One project working to add semantics to wiki systems is Semantic MediaWiki (SMW), a GPL-licensed extension to MediaWiki that allows semantic data to be annotated within wiki pages. A MediaWiki wiki that incorporates the extension becomes a semantic wiki: content that has been enriched with semantic information can be used in specialized searches, used for aggregating pages, displayed in alternate formats like maps, calendars, or graphs, and exported to formats like RDF (Resource Description Framework) and CSV (comma-separated values).

How does this work?

Some examples will make it clear what SMW adds. For instance, on the normal Wikipedia page of France, there's a link to its capital city, Paris:

    ... the capital city is [[Paris]] ...

The [[Paris]] code is a link to a wiki page about Paris, but there's no information encoded about the specific relationship between France and Paris.

In contrast to this classical approach, the semantic web is all about interlinking data in a machine-readable way. The core technology under the hood is RDF, which is used to describe entities and their relationships. Each RDF statement comes in the form of a triple: subject - predicate - object. Each subject and predicate is identified by a URI, while an object can be represented by a URI or be a literal value such as a string or a number. So, a Semantic MediaWiki version of the sentence about Paris could be:

    ... the capital city is [[Has capital::Paris]] ...

The [[Has capital::Paris]] code not only adds a link to a wiki page about Paris, it also specifies the nature of the relationship between France and Paris: France has Paris as its capital. Or to translate it into an RDF triple: "France" (which is implicit as it is the topic of the current page) is the subject, "has capital" is the predicate, and "Paris" is the object.
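To make the triple concrete, here is a sketch of how such a statement could be written in RDF's Turtle notation; the prefixes and URIs below are hypothetical placeholders, not the identifiers SMW actually exports:

    # The France/Paris statement as a Turtle triple.
    # The prefixes and URIs are illustrative, not SMW's real ones.
    @prefix wiki:     <http://example.org/wiki/> .
    @prefix property: <http://example.org/wiki/Property/> .

    wiki:France  property:Has_capital  wiki:Paris .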

In this example the object is represented by a URI, but the object can also be a literal value, such as a number:

    ... its population is [[Has population::65,821,885]] ...

When this code is on the page about France, it represents an RDF triple with "France" as its subject, "has population" as its predicate, and "65,821,885" as its object. These typed links (with the predicate as the type) give SMW an out-of-the-box mechanism to automatically generate lists. With SMW's inline queries feature, it's easy to re-use this structured information to generate lists and tables which are automatically updated and cached. For instance, users can easily generate a page with a list of all countries ordered by their population, or a list of all countries with a population greater than 20 million, or a table of all countries with their capitals, and so on.
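As a sketch of what such an inline query might look like (assuming the country pages are collected in a hypothetical Category:Country and annotated with the properties from the examples above), a wiki page could embed something like:

    {{#ask: [[Category:Country]] [[Has population::>20,000,000]]
     |?Has capital
     |?Has population
     |sort=Has population
     |order=descending
    }}

SMW would replace the query with a table of the matching countries, their capitals, and their populations, kept up to date as the underlying pages change.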

Automatically-generated lists are not the only possibility once you start adding semantic links. You can also display the information in various formats, have different language versions of a wiki use the same data, integrate and mash up your wiki's data and export it for external re-use, and more.

Ecosystem

The development of Semantic MediaWiki was initially funded by the EU project SEKT (Semantically Enabled Knowledge Technologies), and afterward supported in part by the University of Karlsruhe in Germany. The first release was version 0.1 in 2005. In 2007, Vulcan started sponsoring the German company Ontoprise to develop a commercial version of the extension, Semantic MediaWiki+ (SMW+).

According to Greaves, there are 50 open source MediaWiki extensions that use the semantic information provided by SMW. For example, there is Halo, funded by Vulcan, which facilitates the creation, retrieval, navigation, and organization of semantic data with intuitive graphical user interfaces; Semantic Drilldown, which provides a faceted browsing interface for viewing semantic data by filtering; and Semantic Result Formats, which provides a large number of display formats, including maps, calendars, graphs, and charts.

If you want to install SMW on your own wiki, there's an extensive administrator manual with installation instructions and a list of the configuration options. For users who will be entering the semantic markup, the project also has a user manual.
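As a rough sketch of what the installation boils down to (assuming the extension has been unpacked into MediaWiki's standard extensions/ directory; the administrator manual is the authoritative reference), enabling SMW means adding a couple of lines to LocalSettings.php:

    # Load Semantic MediaWiki and enable it for this wiki's domain.
    # The path and domain here are examples, not requirements.
    include_once( "$IP/extensions/SemanticMediaWiki/SemanticMediaWiki.php" );
    enableSemantics( 'example.org' );

After that, SMW's database tables still need to be initialized, which the administrator manual also covers.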

Some semantic wikis

Semantic MediaWiki is already used by over 300 active public wikis around the world. Greaves called these semantic wiki applications "the icing on the cake", as they really show the flexibility of adding semantics to a wiki. Some notable examples are Open Energy Information, a crowdsourced wiki with information about energy resources, including real-time data and visualizations; SKYbrary, a wiki created by several European aviation organizations to be a comprehensive source of aviation safety information; Familypedia, a wiki on family history and genealogy; SNPedia, a wiki investigating human genetics; Oh Internet, a wiki that tracks internet memes; and Ultrapedia, a search engine for OCR'd books.

Many organizations also use SMW internally, including Pfizer, Johnson & Johnson Pharmaceutical Research and Development, and the U.S. Department of Defense. Greaves added that Vulcan is eating its own dog food:

We have created a lightweight project management tool with Semantic MediaWiki+. Our developers even change the ontology of this Scrum wiki [Scrum is an iterative, incremental framework for project management] once a month, which proves that the added flexibility of a schema-last database is welcome.

Towards a semantic Wikipedia

Some academics have already proposed using SMW on Wikipedia to tackle the problem of the many lists that have to be created manually, but according to Wikimedia Foundation Deputy Director Erik Möller it's still unclear whether SMW is up to the task of supporting a web site on the scale of Wikipedia. So while Semantic MediaWiki already powers a lot of web sites and is quite user-friendly, it remains to be seen whether it will eventually bring semantics to the ultimate wiki, Wikipedia.

The SMW project has a fairly detailed roadmap. Some of the interesting tasks are an improvement of the usability of the semantic search features (part of Google Summer of Code 2011), a light version of SMW without query capabilities, improvements for the Semantic Drilldown extension, and so on. It's already quite usable, as many of the active SMW wikis show, but to really reach the vision of the semantic web and be able to link various semantic wikis and other content management systems, Semantic MediaWiki needs to become as easy to use as Wikipedia.

