November 11, 2009
This article was contributed by Ben Martin
The Nepomuk project has the
potential to unlock the data from its originating application so that
it can be used by other applications on the desktop. If Nepomuk becomes
pervasive, history logs, bookmarks, file metadata, email,
instant messages, photo tags, or other metadata will be shared between
various desktop applications. Why should music metadata like track length
or artist and song title be locked away in an index created and used
explicitly by a music playing application?
Consider a download assistant such as kget. The subversion branch of
kget recently got the ability to store its download history using
Nepomuk. kget could already save the transfer history in XML or
SQLite. The advantage of using Nepomuk is that other desktop tools can
easily see where a file was downloaded from and when; the information
is unlocked from just kget. With Nepomuk, other applications don't need to know
where the SQLite file is, or find and parse an XML file. All of a
sudden, the file manager can let you know where this file came from so
you can easily return for newer versions, or a desktop search
can reveal all the files downloaded last year from http://example.com.
To allow data to be stored, exchanged, and understood by many
applications, Nepomuk uses the same underlying technology that
the Semantic
Web is designed around. The Semantic Web tries to separate the
data from the presentation in a way that allows for both humans and
computers to inspect and digest the data. At the base of the Semantic
Web is
Resource Description Framework (RDF)
which aims to allow metadata to be exchanged in an unambiguous,
machine-processing-friendly format.
There are many who dismiss the Semantic Web as an ivory tower pipe
dream. Various concerns are cited as reasons that RDF will not be
adopted: it takes extra time to generate RDF data, it allows for
automated comparisons which will make companies uncomfortable, and
there will be no agreement between companies on which schemas to use.
Nepomuk and RDF have a huge potential on the Free and Open
Source Software (FOSS) desktop because application developers have no
vested interest in locking their data away, and due to the nature of
free software, anyone can patch RDF or Nepomuk support into a project. The
last of those concerns, that projects will each design their own schema, is still
present for FOSS, but, luckily, schema
mismatches in and of themselves are not a show stopper for RDF
adoption. By definition, once data is in RDF it can be processed
automatically by a computer, so the machine, rather than the human, can
always work
around schema differences.
RDF tries to
capture information in the form of triples.
The classic examples are relationships and ownerships, for
example: "Mary knows Mark" and "dog has tail". To avoid name clashes for
things described in RDF, longer URL style identifiers are used for the
three pieces of information. To get back to
smaller text strings for these URLs, prefixes are used in the style of XML
namespaces. For example, foaf:name could be used for a human name
which expands to the URL
http://xmlns.com/foaf/0.1/name. This way,
individual things can still be described concisely, but they should also
have globally
understood meaning. A foaf:name is a person's name, whereas a
toolshed:name might name a screwdriver.
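The prefix mechanism can be illustrated with a few lines of standard C++. This is only a sketch of the idea, not code from Soprano or Nepomuk: the expandName() helper and the toolshed namespace URL are made up for the example, while the foaf namespace is the one given above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Maps a short prefix (the part before the colon) to its namespace URL.
using PrefixMap = std::map<std::string, std::string>;

// Expand "prefix:local" into a full URL stored in 'out'. Returns false,
// leaving 'out' untouched, when the name has no colon or the prefix is
// not in the table.
bool expandName(const PrefixMap &prefixes, const std::string &name,
                std::string &out)
{
    const std::string::size_type colon = name.find(':');
    if (colon == std::string::npos)
        return false;

    const auto it = prefixes.find(name.substr(0, colon));
    if (it == prefixes.end())
        return false;

    out = it->second + name.substr(colon + 1);
    return true;
}

// Prefix table for the example; the toolshed namespace is hypothetical.
PrefixMap examplePrefixes()
{
    return {
        {"foaf", "http://xmlns.com/foaf/0.1/"},
        {"toolshed", "http://example.com/toolshed/"},
    };
}

// Usage example: two names with the same local part expand to different URLs.
void example()
{
    std::string person, tool;
    assert(expandName(examplePrefixes(), "foaf:name", person));
    assert(expandName(examplePrefixes(), "toolshed:name", tool));
    assert(person == "http://xmlns.com/foaf/0.1/name");
    assert(person != tool);
}
```

Because the full URL, rather than the short string, is what gets compared, the two "name" properties never collide.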
Below is an example of using sopranocmd, the command-line tool from the Soprano
library that Nepomuk builds on, to create an RDF store and list its contents:
$ sopranocmd --backend redland add \
"<http://onto.libferris.com/things/1234>" "<http://onto.libferris.com/price>" "30"
$ sopranocmd --backend redland add \
"<http://onto.libferris.com/things/1234>" "<http://onto.libferris.com/title>" \
"super crazy magical item"
$ sopranocmd --backend redland list
<http://onto.libferris.com/things/1234> <http://onto.libferris.com/price> \
"30"^^<http://www.w3.org/2001/XMLSchema#int> (empty)
<http://onto.libferris.com/things/1234> <http://onto.libferris.com/title> \
"super crazy magical item"^^<http://www.w3.org/2001/XMLSchema#string> (empty)
Total results: 2
Execution time: 00:00:00.1
While an RDF repository can be used to just store, update, and query
these triples, a schema can also be imposed so that
applications know what to expect, for example that foaf:homepage
is a link to a web site
subject to certain
constraints. Examples of constraints include the type of data
stored (integer, date, etc.), how many times a property can appear
(only one homepage), and so on.
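As a rough illustration of what enforcing such a constraint involves, the sketch below checks a "no more than N values per subject" rule over a list of triples. The Triple and CardinalityRule types and the findViolations() function are invented for this example; a real RDF store would express the rule in a schema language instead.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Hypothetical constraint: a property may appear at most maxCount times
// on any one subject.
struct CardinalityRule {
    std::string predicate;
    std::size_t maxCount;
};

// Return the subjects that carry the rule's property more often than allowed.
std::vector<std::string> findViolations(const std::vector<Triple> &triples,
                                        const CardinalityRule &rule)
{
    // Count how many times each subject uses the constrained property.
    std::map<std::string, std::size_t> counts;
    for (const Triple &t : triples) {
        if (t.predicate == rule.predicate)
            ++counts[t.subject];
    }

    std::vector<std::string> violations;
    for (const auto &entry : counts) {
        if (entry.second > rule.maxCount)
            violations.push_back(entry.first);
    }
    return violations;
}

// Usage example: Mary has two homepages, which breaks "only one homepage".
void example()
{
    const std::vector<Triple> triples = {
        {"ex:mary", "foaf:homepage", "http://example.com/mary"},
        {"ex:mary", "foaf:homepage", "http://example.org/mary"},
        {"ex:mark", "foaf:homepage", "http://example.com/mark"},
    };
    const std::vector<std::string> bad =
        findViolations(triples, {"foaf:homepage", 1});
    assert(bad.size() == 1);
    assert(bad[0] == "ex:mary");
}
```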
The
SPARQL query
language can be used to join together the triples and select the
information of interest. While SPARQL borrows familiar SQL keywords such as
SELECT, WHERE, ORDER BY, and LIMIT, joining
triples is a bit different than in SQL. For example, the query below grabs
the price and title for "something". We don't particularly care what
the something is, as long as the same something has a title and a price
of less than 30.5.
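# Prefix declarations added so the query is complete; ns: is a placeholder
# namespace, substitute whichever one the data actually uses.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?title ?price
WHERE { ?x ns:price ?price .
        FILTER (?price < 30.5)
        ?x dc:title ?title . }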
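For readers who think more easily in code than in query patterns, the join that this query performs can be written out by hand. The sketch below is not how a SPARQL engine works internally; it is just the same logic over an in-memory list, with an invented titlesCheaperThan() function, and it assumes every price object is a string that parses as a number.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

struct Match {
    std::string title;
    double price;
};

// Return a (title, price) pair for every subject that has both a title and
// a price below maxPrice. Price objects are assumed to be numeric strings;
// std::stod throws if one is not.
std::vector<Match> titlesCheaperThan(const std::vector<Triple> &triples,
                                     const std::string &pricePredicate,
                                     const std::string &titlePredicate,
                                     double maxPrice)
{
    // First pattern plus FILTER: remember each subject whose price passes.
    std::multimap<std::string, double> cheap;
    for (const Triple &t : triples) {
        if (t.predicate != pricePredicate)
            continue;
        const double price = std::stod(t.object);
        if (price < maxPrice)
            cheap.emplace(t.subject, price);
    }

    // Second pattern: keep titles whose subject was bound by the first one.
    // The shared subject plays the role of ?x in the query.
    std::vector<Match> results;
    for (const Triple &t : triples) {
        if (t.predicate != titlePredicate)
            continue;
        const auto range = cheap.equal_range(t.subject);
        for (auto it = range.first; it != range.second; ++it)
            results.push_back({t.object, it->second});
    }
    return results;
}

// Usage example with the item from the sopranocmd listing plus a made-up,
// more expensive one that the filter rejects.
void example()
{
    const std::string price = "http://onto.libferris.com/price";
    const std::string title = "http://onto.libferris.com/title";
    const std::vector<Triple> triples = {
        {"http://onto.libferris.com/things/1234", price, "30"},
        {"http://onto.libferris.com/things/1234", title, "super crazy magical item"},
        {"http://onto.libferris.com/things/5678", price, "45"},
        {"http://onto.libferris.com/things/5678", title, "pricey item"},
    };
    const std::vector<Match> found =
        titlesCheaperThan(triples, price, title, 30.5);
    assert(found.size() == 1);
    assert(found[0].title == "super crazy magical item");
    assert(found[0].price == 30.0);
}
```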
With all this talk of RDF, triples, and ivory towers, one might
think that using Nepomuk and RDF will be painful and have an extremely long
learning curve. Below are a few examples of using Nepomuk in a KDE
application to quell those fears. Nepomuk makes using RDF simple
because it provides a code generator that makes native C++ classes to
allow interaction with the RDF store:
Nepomuk::File f( "/home/foo/bar.txt" );
f.setAnnotation( "This is just a test file that contains nothing of interest." );
The above is much neater than thinking in terms of the triples shown below, which might
be stored to represent it. In this case X will really be a persistent unique
identifier used to identify the file, similar to the device number and
inode in the kernel. The type, file, etc. will of course be longer URIs in the
real RDF store.
X type file
X url "/home/foo/bar.txt"
X annotation "This is just a..."
The above example which uses setAnnotation() takes advantage of a
schema for annotating and tagging files which comes with Nepomuk
itself. The kget program mentioned earlier in the article is a good
example of not using a standard schema. In the sources of kget,
the transferhistorystore.cpp
file manages the XML, SQLite, and Nepomuk representations of download
history. At the end of the transferhistorystore.cpp file, there is the
following code:
void NepomukStore::saveItem(const TransferHistoryItem &item)
{
    Nepomuk::HistoryItem historyItem(item.source());
    historyItem.setDestination(item.dest());
    historyItem.setSource(item.source());
    historyItem.setState(item.state());
    historyItem.setSize(item.size());
    historyItem.setDateTime(item.dateTime());
}
void NepomukStore::deleteItem(const TransferHistoryItem &item)
{
    Nepomuk::HistoryItem historyItem(item.source());
    historyItem.remove();
...
The HistoryItem class is generated by Nepomuk using the custom schema
file kget_history.trig,
part of which is repeated below. While the schema language that
kget_history.trig is using may be unfamiliar, it should
still be clear that there is a ndho:HistoryItem which has
properties of various types with various restrictions on them, such as
a destination property which can appear zero or one times and is a
string. Given the below schema file, Nepomuk can generate the C++ class
Nepomuk::HistoryItem needed to allow the above C++ code to compile.
<http://nepomuk.kde.org/ontologies/2008/10/06/ndho> {
    ndho:HistoryItem
        a rdfs:Class ;
        rdfs:comment "A kget history item." ;
        rdfs:label "application" ;
        rdfs:subClassOf rdfs:Resource .
    ndho:destination
        a rdf:Property ;
        rdfs:comment "Destination of the download." ;
        rdfs:domain ndho:HistoryItem ;
        rdfs:label "source" ;
        rdfs:range xsd:string ;
        nrl:maxCardinality "1" .
...
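To give a feel for what a wrapper class derived from a schema like this has to do, here is a hand-written stand-in. It is emphatically not the code that Nepomuk generates: TripleStore, HistoryItemSketch, and setSingle() are names invented for this sketch, and the triples use short prefixed strings rather than full URIs. The point is only that a setter for a property with a maximum cardinality of one boils down to "replace the existing triple, do not add a second one".

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Toy in-memory store standing in for the real RDF repository.
class TripleStore {
public:
    // Set a single-valued property: overwrite the existing triple for this
    // subject and predicate if there is one, otherwise append a new triple.
    void setSingle(const std::string &subject, const std::string &predicate,
                   const std::string &object)
    {
        for (Triple &t : triples_) {
            if (t.subject == subject && t.predicate == predicate) {
                t.object = object;
                return;
            }
        }
        triples_.push_back({subject, predicate, object});
    }

    const std::vector<Triple> &triples() const { return triples_; }

private:
    std::vector<Triple> triples_;
};

// Hypothetical stand-in for a schema-derived wrapper class.
class HistoryItemSketch {
public:
    HistoryItemSketch(TripleStore &store, const std::string &uri)
        : store_(store), uri_(uri)
    {
        store_.setSingle(uri_, "rdf:type", "ndho:HistoryItem");
    }

    // maxCardinality "1" in the schema becomes "replace, do not append".
    void setDestination(const std::string &dest)
    {
        store_.setSingle(uri_, "ndho:destination", dest);
    }

private:
    TripleStore &store_;
    std::string uri_;
};

// Usage example: setting the destination twice leaves a single triple.
void example()
{
    TripleStore store;
    HistoryItemSketch item(store, "http://example.com/file.tar.gz");
    item.setDestination("/tmp/file.tar.gz");
    item.setDestination("/home/foo/file.tar.gz");
    assert(store.triples().size() == 2);
    assert(store.triples()[1].object == "/home/foo/file.tar.gz");
}
```

The application code only ever calls setDestination(); the triples are an implementation detail of the wrapper.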
At the base of the Nepomuk project is the Soprano library and command
line tools which depend only on QtCore, making them a useful RDF
library for use on both desktop and mobile platforms. The Nepomuk
libraries build on Soprano to make writing KDE applications using RDF
simple. One of the great things about the design of Soprano is that
there are multiple backends which can store and query RDF. So there can
be a memory mapped implementation for a mobile device, or a full-blown
database server for a LAN, and applications still use the same
API.
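The general shape of that design, one interface with interchangeable storage behind it, can be sketched in plain C++. The Backend, VectorBackend, and SetBackend classes below are invented for illustration and do not mirror Soprano's actual classes; they merely show why application code does not have to change when the storage does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Hypothetical storage interface; Soprano's real classes differ.
class Backend {
public:
    virtual ~Backend() = default;
    virtual void add(const Triple &t) = 0;
    virtual std::vector<Triple> list() const = 0;
};

// Keeps every statement in insertion order.
class VectorBackend : public Backend {
public:
    void add(const Triple &t) override { triples_.push_back(t); }
    std::vector<Triple> list() const override { return triples_; }

private:
    std::vector<Triple> triples_;
};

// Keeps statements sorted and silently drops exact duplicates.
class SetBackend : public Backend {
public:
    void add(const Triple &t) override
    {
        triples_.insert(std::make_tuple(t.subject, t.predicate, t.object));
    }
    std::vector<Triple> list() const override
    {
        std::vector<Triple> out;
        for (const auto &k : triples_)
            out.push_back({std::get<0>(k), std::get<1>(k), std::get<2>(k)});
        return out;
    }

private:
    std::set<std::tuple<std::string, std::string, std::string>> triples_;
};

// Application code sees only the interface, never the concrete backend.
void fill(Backend &backend)
{
    backend.add({"urn:x", "title", "a"});
    backend.add({"urn:x", "price", "30"});
    backend.add({"urn:y", "title", "b"});
}

std::size_t countAbout(const Backend &backend, const std::string &subject)
{
    std::size_t n = 0;
    for (const Triple &t : backend.list()) {
        if (t.subject == subject)
            ++n;
    }
    return n;
}

// Usage example: the same calls work against either backend.
void example()
{
    VectorBackend inOrder;
    SetBackend deduplicating;
    fill(inOrder);
    fill(deduplicating);
    assert(countAbout(inOrder, "urn:x") == 2);
    assert(countAbout(deduplicating, "urn:x") == 2);
}
```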
For a long time Soprano has had two main
backends: Redland and Sesame2. The
former is a C library for RDF and the latter a Java implementation.
Although Sesame2 is written in Java, it can
deliver
better query performance than Redland. This left KDE4 in the
predicament that it required Java to achieve good RDF performance. To
solve this issue, the
new Virtuoso backend was
created, and it is now reaching the point of stability.
As
I discovered
recently, the main impediment to developing a backend for
Soprano is implementing SPARQL.
Adoption still remains the major hurdle for Nepomuk and Soprano.
With the host of persistence options available, the first things that
come to an application developer's mind for storing and retrieving data
might be flat files, MySQL, SQLite, Berkeley DB, or some
generic relational database library.
However, when storing data that might be of interest to
other applications, using Nepomuk or Soprano has the potential to
unlock an application's data. As can be seen above,
the main thing to learn is a bit about the schema language and
then native C++ objects can be used to interact with Nepomuk
from an application.