July 13, 2011
This article was contributed by Koen Vervloesem
Interactive Knowledge Stack
(IKS) is an open source project focused on building an open and flexible
technology platform for semantically enhanced Content Management Systems
(CMS). It is a collaboration between academia, industry, and open source
developers, co-funded with €6.58 million by the European Union. The
goal is to enrich content management systems with semantic content in order
to let the users benefit from more intelligent extraction and linking of
their content. This could solve part of the chicken-and-egg problem for the
semantic web that arises because end users don't have easy-to-use semantic web tools.
At the recent IKS
workshop in Paris, one of the keynote speakers was Mark Greaves, who
spoke about the possibilities of the semantic web in the wiki setting. His speech
looked at the limits of traditional wikis, the promise of semantic wikis, and the birth of Semantic MediaWiki (SMW), an extension to MediaWiki, the wiki software that powers Wikipedia.
Wikis have become a powerful instrument for crowdsourcing, but they're
not the only types of content management systems that tap into the
potential of the crowd. Greaves, who is working as Director of Knowledge
Systems at Paul Allen's asset management company Vulcan, emphasized that bulletin boards,
forums, and newsgroups are the antecedents of wikis and even the beginnings
of social networks. Now we have many websites that crowdsource their content from their users:
Users write reviews on Amazon, give recommendations on Amazon and Netflix, add tags to content on Flickr and YouTube, and so on. So we see in a lot of cases that users can help build your website, not only the content but even the structure. For instance, the system administrators of Wikimedia only have to run the servers; all the rest is done by volunteers.
A critical property of wikis is consensus, which comes thanks to
collaboration and custom policies. For instance, one of the core content
policies of the Wikipedia encyclopedia is that each article should be
written from a neutral
point of view (NPOV). This forces authors not to write from their own
point of view, which helps them reach consensus with authors who hold
a different view of the topic. The MediaWiki software also has
features that facilitate reaching consensus, such as talk pages
and change tracking.
But traditional wikis have their limits, as most knowledge is locked inside text and cannot be queried in a smart way. Wikipedia has an answer for this with thousands of lists, for instance its lists of countries (a page that is itself a list of lists). But these are all manually maintained, each of them ordered by a different property, like birth rate, literacy rate, population, income equality, and so on. So Greaves asked the logical question: "Why don't we give Wikipedia authors a way to add structure to their content?"
Semantic MediaWiki
That's where semantic wikis come in, and according to Greaves they hold a lot of promise:
Semantic wikis augment traditional wikis with database capabilities, but with one crucial difference: traditional databases are schema-first, while semantic wikis are schema-last: the database schema is developed and maintained in the wiki by the authors themselves. So with semantic wikis we get a lot more flexibility and we have the means to reach social consensus over data. Unfortunately, consensus over data is very hard, and one of the prime reasons is that data modeling is a highly specialized skill.
One project working to add semantics to wiki systems is Semantic
MediaWiki (SMW), a GPL-licensed extension to MediaWiki that allows annotating
semantic data within wiki pages. This means that a MediaWiki wiki that
incorporates the extension is turned into a semantic wiki: content that has
been enriched with semantic information can be used in specialized
searches, used for aggregation of pages, displayed in alternate formats
like maps, calendars, or graphs, and exported to formats like RDF (Resource Description Framework) and
CSV (Comma-Separated Values).
How does this work?
Some examples will make it clear what SMW adds. For instance, on the normal Wikipedia page of France, there's a link to its capital city, Paris:
... the capital city is [[Paris]] ...
The [[Paris]] code is a link to a wiki page about Paris, but
there's no information encoded about the specific relationship between
France and Paris.
In contrast to this classical approach, the semantic web is all about interlinking data in a machine-readable way. The core technology under the hood is RDF, which is used to describe entities and their relationships. Each RDF statement comes in the form of a triple: subject - predicate - object. Each subject and predicate is identified by a URI, while an object can be represented by a URI or be a literal value such as a string or a number. So, a Semantic MediaWiki version of the sentence about Paris could be:
... the capital city is [[Has capital::Paris]] ...
The [[Has capital::Paris]] code not only adds a link to a wiki page about Paris, it also specifies the nature of the relationship between France and Paris: France has Paris as its capital. Or to translate it into an RDF triple: "France" (which is implicit as it is the topic of the current page) is the subject, "has capital" is the predicate, and "Paris" is the object.
This is an example where the object can be represented by a URI, but there are also other examples where the object is represented by a literal value such as a number:
... its population is [[Has population::65,821,885]] ...
When this code is on the page about France, it represents an RDF triple with "France" as its subject, "has population" as its predicate, and "65,821,885" as its object. These typed links (with the predicate as the type) give SMW an out-of-the-box mechanism to automatically generate lists. With SMW's inline queries feature, it's easy to re-use this structured information to generate lists and tables which are automatically updated and cached. For instance, users can easily generate a page with a list of all countries ordered by their population, or a list of all countries with a population greater than 20 million, or a table of all countries with their capitals, and so on.
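To make the idea of typed links driving automatic lists more concrete, here is a minimal Python sketch. It is not how SMW is implemented and it does not use SMW's query syntax; it only mimics the principle. The function names (extract_triples, countries_over) and the pages "Bigland" and "Smallland" with their figures are invented placeholders; only the France values come from the examples above.

```python
import re

# Matches annotations of the form [[Property::Value]]; plain links such as
# [[Paris]] are ignored because they contain no "::" separator.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")


def extract_triples(page_title, wikitext):
    """Return (subject, predicate, object) tuples for one page.

    The page title is the implicit subject of every annotation on it.
    """
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


def to_number(text):
    """Parse a number written with thousands separators, e.g. '65,821,885'."""
    return int(text.replace(",", ""))


def countries_over(triples, threshold):
    """List (country, population) pairs above threshold, largest first."""
    rows = [
        (subject, to_number(obj))
        for subject, predicate, obj in triples
        if predicate == "Has population"
    ]
    selected = [row for row in rows if row[1] > threshold]
    return sorted(selected, key=lambda row: row[1], reverse=True)


# Hypothetical wiki pages. "Bigland" and "Smallland" are made-up
# placeholders; the France values are the ones used in the article.
pages = {
    "France": "... the capital city is [[Has capital::Paris]] ... "
              "its population is [[Has population::65,821,885]] ...",
    "Bigland": "[[Has capital::Bigtown]] [[Has population::90,000,000]]",
    "Smallland": "[[Has capital::Smallville]] [[Has population::1,500,000]]",
}

all_triples = []
for title, text in pages.items():
    all_triples.extend(extract_triples(title, text))

# The "population greater than 20 million" list, regenerated from the data.
large = countries_over(all_triples, 20_000_000)
```

Because the list is computed from the annotations, editing the population on one page changes every list that depends on it, which is the property that the manually maintained Wikipedia lists lack.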
Automatically-generated lists are not the only possibility when you
start adding semantic links. You can also display the information in
various formats, you can have different language versions of a wiki using
the same data, you can integrate and mash-up your wiki's data and export it
for external re-use, and more.
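Exporting the data for external re-use can be pictured as writing each subject, predicate, object statement out as one row. The following Python sketch uses only the standard csv module; the function name triples_to_csv and the column names are assumptions made for illustration and do not reflect SMW's actual export format.

```python
import csv
import io


def triples_to_csv(triples):
    """Serialize (subject, predicate, object) tuples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row; these column names are an illustrative choice.
    writer.writerow(["subject", "predicate", "object"])
    writer.writerows(triples)
    return buffer.getvalue()


# Statements taken from the France examples in the article.
triples = [
    ("France", "Has capital", "Paris"),
    ("France", "Has population", "65,821,885"),
]

csv_text = triples_to_csv(triples)
```

Values that contain commas, such as the population figure, are quoted by the csv writer, so a standard CSV reader recovers them intact.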
Ecosystem
The development of Semantic MediaWiki was initially funded by the EU
project SEKT (Semantically
Enabled Knowledge Technologies), and after this supported in part by the
University of Karlsruhe in Germany. The first release was version 0.1 in 2005. In 2007, Vulcan started sponsoring the German company Ontoprise to develop a commercial version of the extension, Semantic MediaWiki+ (SMW+).
According to Greaves, there are 50 open source MediaWiki extensions that use the semantic information provided by SMW. Examples include Halo, funded by Vulcan, which facilitates creation, retrieval, navigation, and organization of semantic data with intuitive graphical user interfaces; Semantic Drilldown, which provides a faceted browser interface for viewing semantic data by filtering; and Semantic Result Formats, which provides a large number of display formats, including maps, calendars, graphs, and charts.
If you want to install SMW on your own wiki, there's an extensive administrator
manual with installation
instructions and a list of the configuration
options. For users who will be entering the semantic markup, the
project also has a user
manual.
Some semantic wikis
Semantic MediaWiki is already used by over 300 public active wikis around
the world. Greaves called these semantic wiki applications "the
icing on the cake", as they really show the flexibility of adding
semantics to a wiki. Some notable examples are Open Energy Information, a
crowdsourced wiki with information about energy resources, including
real-time data and visualizations; SKYbrary, a wiki
created by several European aviation organizations to build a comprehensive source of aviation safety information; Familypedia, a wiki on family history and genealogy; SNPedia, a wiki investigating human genetics; Oh Internet, a wiki to track internet memes; and Ultrapedia, a search engine for OCR'd books.
Many organizations also use SMW internally, including Pfizer, Johnson & Johnson Pharmaceutical Research and Development, and the U.S. Department of Defense. Greaves added that Vulcan is eating its own dog food:
We have created a lightweight project management tool with Semantic MediaWiki+. Our developers even change the ontology of this Scrum wiki [Scrum is an iterative, incremental framework for project management] once a month, which proves that the added flexibility of a schema-last database is welcome.
Towards a semantic Wikipedia
Some academics have already proposed using SMW on Wikipedia to tackle the problem of the many lists that have to be created manually, but according to Wikimedia Foundation Deputy Director Erik Möller it's still unclear whether SMW is up to the task of supporting a web site on the scale of Wikipedia. So while Semantic MediaWiki already powers a lot of web sites and is quite user-friendly, it remains to be seen whether it will eventually bring semantics to the ultimate wiki, Wikipedia.
The SMW project has a fairly detailed roadmap. Some of the
interesting tasks are an improvement of the usability of the semantic
search features (part of Google Summer of Code 2011), a light version of
SMW without query capabilities, improvements for the Semantic Drilldown
extension, and so on. SMW is already quite usable, as many of the active SMW
wikis show, but to really reach the vision of the semantic web and be able
to link various semantic wikis and other content management systems,
Semantic MediaWiki needs to become as easy to use as Wikipedia.