Just in Time For KDE SC 4.4: Virtuoso 6.1.0

Finally all testing and bugfixing is finished. OpenLink has done an outstanding job with this new release of Virtuoso. Again my thanks go out to the Virtuoso development team and Patrick van Kleef who was my contact to smooth out the issues which prevented us to use Virtuoso 6 with Nepomuk.

So now is the time for distributions to package Virtuoso 6.1.0 and for you to update it on your own. But wait, there is one little detail: the database format changed significantly between Virtuoso 5 and 6. That is why I wrote a little conversion tool called Virtuosoconverter which takes care of this problem (Caution: the build system will download the Virtuoso 5.0.12 sources which are roughly 60MB). Usage is simple:

  1. Shut down Nepomuk
  2. Install Virtuoso 6.1.0
  3. Run the Converter
  4. Restart Nepomuk

Virtuoso 6 offers a wide range of features which are yet to be exposed through Nepomuk. The fun is only just starting!

Hints for Distributors:

  • You might want to run the converter in auto mode before starting Nepomuk.
  • If you do not like the build system downloading the Virtuoso 5 sources simply put them in the source tree. The build system will pick them up and use them instead of downloading.

Updates:

  • If you have old Virtuoso V5 data and do not run the converter after updating to Virtuoso V6 Nepomuk will not start.
  • The converter is the only way to convert the data to the new database format (except if you run some sql commands on the server manually)

Dangling Meta Data Graphs (Caution: Very Technical)

Nepomuk in KDE uses NRL – the Nepomuk Representation Language – especially the named graphs that it defines. Each triple that is stored in the Nepomuk database is stored in a named graph. We use this graph to attach meta data to the triples themselves. So far this is restricted to the creation date but in the future there will be more like the creator (for shared meta data) and the modification date (this is a bit tricky since technically triples are never modified, only added and deleted. But from a user’s point of view changing a rating means changing a triple.).

What did I say? “We attach meta data to the triples”? Well, to be exact we attach it to the graph which contains the triples. And since everything is triples (or quadruples since there is the named graph) the meta data is, too. And like every triple these also need to be put in a dedicated named graph – the meta data graph. Thus, each triple is contained in one graph and each graph has exactly one meta data graph.

So far so good. But what happens if we delete all triples in a graph? Well, the graph ceases to exist since graphs like resources in an RDF database do only exist due to the triples in which they occur.

And that is when it happens: dangling meta data graphs, i.e. meta data graphs that describe a graph which does no longer exist.

In theory Nepomuk could delete these automatically but I decided against that for performance reasons. It would have to check for dangling meta data graphs after each removal operation. So for now (until I come up with some database maintenance service) these graphs are just waste hanging around not bothering anyone (they are small).

In case you want to check how many of them you have in your database use the following command (see the Nepomuk Tips and Tricks for nepomukcmd):

nepomukcmd query 'select count(?mg) where { ?mg nrl:coreGraphMetadataFor ?g . OPTIONAL { graph ?g { ?s ?p ?o . } . } . FILTER(!BOUND(?s)) . }'

And to simply delete them use a bit of shell magic (the –foo is important since it removes any human-readability gimmicks from the otuput):

for a in `nepomukcmd --foo query 'select ?mg where { ?mg nrl:coreGraphMetadataFor ?g . OPTIONAL { graph ?g { ?s ?p ?o . } . } . FILTER(!BOUND(?s)) . }'`; do nepomukcmd rmgraph "$a"; done

Files on Removable Media – Step 1

So far Nepomuk only handled annotations for local files and did not care about mount points and the like. With KDE SC 4.4 that is about to change. (What I present here today is the first step in supporting removable media like USB keys or external hard drives. The next step will have to wait until 4.5.)

As always my blog will have two parts: the user visible one and the technical one which discusses implementation details.

Imagine you had a USB key with a file on it which you annotated, say with a rating of 6:

As long as the key is mounted searching for files with a rating of 6 will always return our image the way we know it:

But now we unmount the key. Thus, the file will not be accessible anymore. But the file still shows up in the search:

However, there is a slight change in the name: now it contains a hint to the USB key (this hint did not make it into 4.4). Opening the file still works since the key is automatically mounted.

But what happens if we remove the key completely and thus, auto-mounting is not an option anymore? Well, the search does still return the file but trying to open it gives us an error. Well, Gwenview does not handle KIO errors properly, thus we do not get an error message. But in theory we would get the message “Please insert the removable medium ‘1,9 GiB Removable Media’ to access this file.”. Just to show that I am not lying let me present the Okular error dialog (which could use some improvement, too):

Here you see the whole ugly Nepomuk query URL and all at the bottom the unformatted error message. Hopefully in KDE SC 4.5 we will have worked out these problems.

So if we would see this error message we would know where to find the file if we want to access it (well, giving proper names to USB keys would help, too).

This is already pretty nice. Step 2 will then be to export the annotations to the removable storage and sync them again as soon as the medium is mounted (remember how I blogged about my experiments with that already? Well, as you can see I did not get that finished, yet.)

The Technical Part

OK, now we know how it looks to the user (or how it should look). Let us have a look under the hood. Basically three players are involved in this process:

The removable storage service

The removable storage service (code in kdebase) uses the great power of Solid to act on newly inserted, mounted, and unmounted removable storage devices. The simple part is that it tells the Strigi service to index the files on the device on mount.

More interesting, however, is what happens on unmount. The service converts all absolute URLs of files on the unmounted device to relative ones. These relative URLs use the filex:/ scheme I made up and consist of two part: the UUID of the storage device and the relative path. In our example above the URL is filex://fc30-3da9/thepic.JPG. In addition a new nfo:Filesystem resource is created storing the description and the UUID of the unmounted device. The files are then related to this new nfo:Filesystem resource via nie:isPartOf.

libnepomuk

Nepomuk::Resource can now transparently handle relative filex:/ URLs. Thus, annotating the file in the example after remounting will store the annotations with the same resource although that uses a filex:/ URL.

The nepomuk:/ KIO slave

The nepomuk:/ KIO slave (code in kdebase) does the rest of the work. The nepomuksearch:/ KIO slave creates the virtual query folders but uses the nepomuk:/ KIO slave to stat all resources (at least the ones with a nepomuk:/ scheme URI).

So as soon as a relative filex:/ URL is encountered it is converted to a local URL if possible:

Solid::StorageAccess* storageFromUUID( const QString& uuid ) {
    QString solidQuery = QString::fromLatin1( "[ StorageVolume.usage=='FileSystem' AND StorageVolume.uuid=='%1' ]" ).arg( uuid.toLower() );
    QList<Solid::Device> devices = Solid::Device::listFromQuery( solidQuery );
    if ( !devices.isEmpty() )
        return devices.first().as<Solid::StorageAccess>();
    else
        return 0;
}

KUrl convertRemovableMediaFileUrl( const KUrl& url, bool evenMountIfNecessary = false ) {
    Solid::StorageAccess* storage = storageFromUUID( url.host() );
    if ( storage &&
         ( storage->isAccessible() ||
           ( evenMountIfNecessary && mountAndWait( storage ) ) ) ) {
        return storage->filePath() + QLatin1String( "/" ) + url.path();
    }
    else {
        return KUrl();
    }
}

And here you can already see the auto-mounting code being called. (I do not show it here since this is enough to read already. If you are interested have a look at the full source code.) The converted URL is then simply passed to KIO::ForwardingSlaveBase which handles the rest. In case the URL cannot be converted (the medium is not mounted and auto-mounting is not used) all information is read from the Nepomuk database to create a proper KIO::UDSEntry.

What We Did Last Summer (And the Rest of 2009) – A Look Back Onto the Nepomuk Development Year With an Obscenely Long Title

2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out – if only for my archive. So here goes.

Virtuoso

Let’s start with the major topic of 2009 (and also the beginning of 2010): The new Nepomuk database backend: Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend which depends on Java and steals all of your memory or you were stuck with Redland which had the worst performance and missed some SPARQL features making important parts of Nepomuk  like queries unusable. So more than a year ago I had the idea to use the one GPL’ed database server out there that supported RDF in a professional manner: OpenLink’s Virtuoso. It has all the features we need, has a very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn’t they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable for the desktop but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later I could finally announce the Virtuoso integration as usable.

Then end of last year I dropped the support for sesame2 and redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest – not only in terms of performance – and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full text index which makes the previously used CLucene index unnecessary. Thus, we can finally combine full text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of  search results since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss later.

So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :)

The Nepomuk Query API

Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements.

With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won’t go into much detail here since I did that before.

All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application – have a look at libnepomukquery. It is very likely that you can avoid the hassle of debugging a query by using the query API.

The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings.

Dolphin Search Panel in KDE SC 4.4

Shared Desktop Ontologies

An important part of the Nepomuk research project was the creation of a set of ontologies for describing desktop resources and their metadata. After the Xesam project under the umbrella of freedesktop.org had been convinced to use RDF for describing file metadata they developed their own ontology. Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the Xesam ontology and the Nepomuk Information Elements Ontology were already very close in design. Thus, it was relatively easy to merge the two and be left with only one ontology to support. Since then not only KDE but also Strigi and Tracker are using the Nepomuk ontologies.

At the Gran Canaria Desktop Summit I met some of the guys from Tracker and we tried to come up with a plan to create a joint project to maintain the ontologies. This got off to a rough start as nobody really felt responsible. So I simply took the initiative and released the shared-desktop-ontologies version 0.1 in November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking KDE build. But in the end it was worth it. Now the package is established and other projects can start to pick it up to create data compatible to the Nepomuk system and Tracker.

Today the ontologies (and the shared-desktop-ontologies package) are maintained in the Oscaf project at Sourceforge. The situation is far from perfect but it is a good start. If you need specific properties in the ontologies or are thinking about creating one for your own application – come and join us in the bug tracker

Timeline KIO Slave

It was at the Akonadi meeting that Will Stephenson and myself got into talking about mimicking some Zeitgeist functionality through Nepomuk. Basically it meant gathering some data when opening and when saving files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and allowed us to track when a file was modified and by which application. This little experiment did not leave that state though (it will, however, this year) but another one did: Zeitgeist also provides a fuse filesystem which allows to browse the files by modification dates. Well, whatever fuse can do, KIO can do as well. Introducing the timeline:/ KIO slave which gives a calendar view onto your files.

Tips And Tricks

Well, I thought I would mention the Tips And Tricks section I wrote for the techbase. It might not be a big deal but I think it contains some valuable information in case you are using Nepomuk as a developer.

Google Summer Of Code 2009

This time around I had the privilege to mentor two students in the Google Summer of Code. Alessandro Sivieri and Adam Kidder did outstanding work on Improved Virtual Folders and the Smart File Dialog.

Adam’s work lead me to some heavy improvements in the Nepomuk KIO slaves myself which I only finished this week (more details on that coming up). Alessandro continued his work on faceted file browsing in KDE and created:

Sembrowser

Alessandro is following up on his work to make faceted file browsing a reality in 2010 (and KDE SC 4.5). Since it was too late to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file browser which will be the grounds for experiments until the code is merged into Dolphin.

Faceted Browsing in KDE with Sembrowser

Nepomuk Workshops

In 2009 I organized the first Nepomuk workshop in Freiburg, Germany. And also the second one. While I reported properly on the first one I still owe a summary for the second one. I will get around to that – sooner or later. ;)

CMake Magic

Soprano gives us a nice command line tool to create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds another tool named nepomuk-rcgen. Both were a bit clumsy to use before. Now we have nice cmake macros which make it very simple to use both.

See the techbase article on how to use the new macros.

Bangarang

Without my knowledge (imagine that!) Andrew Lake created an amazing new media player named Bangaranga Jamaican word for noise, chaos or disorder. This player is Nepomuk-enabled in the sense that it has a media library which lets you browse your media files based on the Nepomuk data. It remembers the number of times a song or a video has been played and when it was played last. It allows to add detail such as the TV series name, season, episode number, or actors that are in the video – all through Nepomuk (I hope we will soon get tvdb integration).

Edit metadata directly in Bangarang

Dolphin showing TV episode metadata created by Bangarang

And of course searching for it works, too...

And it is pretty, too...

I am especially excited about this since finally applications not written or mentored by me start contributing Nepomuk data.

Gran Canaria Desktop Summit

2009 was also the year of the first Gnome-KDE joint-conference. Let me make a bulletin for completeness and refer to my previous blog post reporting on my experiences on the island.

Well, that was by far not all I did in 2009 but I think I covered most of the important topics. And after all it is “just a blog entry” – there is no need for completeness. Thanks for reading.

Convenient Querying in libnepomuk

It has been a while since I blogged and a lot has happened. Not only did we have the second Nepomuk workshop on the “Open Social Semantic Desktop” which I did not report on yet, there is also a lot of stuff going on for KDE 4.4. And since I like blogging about technical stuff so much I will start with that. So here goes:

Queries in KDE 4.4 will be so much simpler (for the developer that is) since we now have the Nepomuk Query API!

Do you remember the days when you tried to write your own SPARQL queries and code looked like this:

Nepomuk::Tag myTag = getOurFancyTag();
QString query
   = QString("select distinct ?r where { ?r %1 %2 . }")
     .arg( Soprano::Node::resourceToN3(Soprano::Vocabulary::NAO::hasTag()) )
     .arg( Soprano::Node::resourceToN3(myTag.resourceUri()) );

Well, using the query API this looks a lot nicer:

Nepomuk::Query::ResourceTerm tagTerm(myTag);
Nepomuk::Query::ComparisonTerm term(Soprano::Vocabulary::NAO::hasTag(), tagTerm);
Nepomuk::Query::Query query(term);
QString queryString = query.toSparqlString();

As you can see you do not need to know any SPARQL anymore. But it gets better. The Query class is integrated with the Nepomuk Query Service via the QueryServiceClient class. This allows to simply let the query service do the querying and receive result change updates – in other words use live searches:

Nepomuk::Query::QueryServiceClient client;
connect(&client, SIGNAL(newEntries(QList<Nepomuk::Query::Result>)),
            this, SLOT(slotNewEntries(QList<Nepomuk::Query::Result>)));
client.query(query);

And now just handle the incoming results.

Maybe even nicer is the integration with KIO, i.e. the possibility to list search results as a virtual folder:

KDirModel* model = getFancyDirModel();
Nepomuk::Query::Query query = buildFancyQuery();
KUrl searchUrl = query.toSearchUrl();
model->dirLister()->open( searchUrl );

As you can see it is really simple to list results via KIO (BTW: this is what Dolphin does).

For more examples check the Nepomuk Query API documentation.

Just another way of browsing your files

The Zeitgeist guys created a fuse file system called zeitgeistfs. It is basically a calendar containing the files accessed at that specific date. So at the Akonadi meeting last weekend, having two hours to kill, I thought that should be doable with KIO. So two hours later (most of that time was spent twiddling with UDS entries) the timeline:/ KIO slave was up and running:

The code that actually does something is minimal: a bit of UDS entry creation for dates and a simple SPARQL query to forward to the nepomuksearch KIO slave. Yes, it is as easy as that since we can simply set the UDS_URL property of an item to a nepomuksearch URL and KIO will take care of the rest. Smooth. Thanks a lot David Faure. Once again you paved the way.

OK then. Try it if you like. This is just another example of what can be done with Nepomuk (no real semantics here though). The code is in the playground as always and is based on the current kdebase trunk. So with KDE 4.3 this baby won’t work. And of course as this is based on file meta data, the Nepomuk Strigi integration has to be enabled.

SPARQL is weird…

It really is. The following query is the only way I found to exclude folders when looking for files:

select ?r where {
   ?r a nfo:FileDataObject .
   OPTIONAL { ?r2 a nfo:Folder . FILTER(?r = ?r2) . } .
   FILTER( !BOUND(?r2) ) .
}

Now to me this looks really weird. And maybe I am simply not seeing the wood for all the trees…

Virtuoso – Once More With Feeling

The Virtuoso backend for Soprano and, thus, Nepomuk can be seen as rather stable now. So now the big tests can begin as the goal is to make it the standard in KDE 4.4. Let me summarize the important information again:

Step 1

Get Virtuoso 5.0.12 from the Sourceforge download page. Virtuoso 6 is NOT supported. (not yet anyway)

Step 2

Hints for packagers: Soprano only needs two files: the virtuoso-t binary and the virtodbc_r(.so) ODBC driver. Everything else is optional. (For the self-compiling folks out there: –disable-all-vads is your friend.)

Step 3

Install libiodbc which is what the Soprano build will look for (Virtuoso is simply a run-time dependency.)

Step 4

Rebuild Soprano from current svn trunk (Remember: Redland is still mandatory. Its memory storage is used all over Nepomuk!)

Step 5

Edit ${KDEHOME}/share/config/nepomukserverrc with your favorite editor. In the “[Basic Settings]“ section add “Soprano Backend=virtuosobackend”. Do not touch the main repository settings!

Step 6

Restart Nepomuk. I propose the following procedure to gather debugging information in case something goes wrong:
Shutdown Nepomuk completely:

 # qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.quit

Restart it by piping the output into a temporary file (bash syntax):

 # nepomukserver 2> /tmp/nepomuk.stderr

Step 7

Wait for Nepomuk to convert your data. If you are running KDE trunk you even get a nice progress bar in the notification area (BTW: does anyone know why it won’t show the title?)

And Now?

That is already it. Now you can enjoy the new Virtuoso backend.

The development has taken a long time. But I want to thank OpenLink and especially Patrick van Kleef who helped a lot by fixing the last little tidbits in Virtuoso 5 for my unit tests to pass. Next step is Virtuoso 6.

And Yet Another Post About Virtuoso

Today nearly all problems are solved. OpenLink provided a patch that makes inserting very large literals (more than 1 metabyte in size) lightning fast, even with a very low buffer count. Also I worked around the issue of URI encoding. Now the Soprano Virtuoso backend simply percent-encodes all non-unreserved characters and all reserved characters that are not used in their special meaning in URIs used in queries. Man, that is a mouth full. Well, it seems to work fine although I can always use more testing with weird file URLs (weird means containing weird characters like brackets and the likes). I also fixed some error handling bugs.

So what is left? Well, there are a few hacks in the Virtuoso backend which are rather ugly. One example is the detection of query result types. To determine if the result is boolean, bindings, or a graph it actually checks the name and number of result columns. Urgh! It would be nicer to check for the type of the result. Seems like graph results are BLOBs.

Anyway, enough for tonight. I am tired. Here is the patch to make Virtuoso not hang when Strigi adds nie:PlainTextContent literals of big files:

Index: sqlrcomp.c
===================================================================
RCS file: virtuoso-opensource/libsrc/Wi/sqlrcomp.c,v
retrieving revision 1.9
diff -u -r1.9 sqlrcomp.c
--- sqlrcomp.c  20 Aug 2009 17:47:22 -0000      1.9
+++ sqlrcomp.c  13 Oct 2009 16:11:49 -0000
@@ -65,7 +65,7 @@
 {
 va_list list;
 char temp[2000];
-  int ret;
+  int ret, rest_sz, copybytes;
 va_start (list, string);
 ret = vsnprintf (temp, sizeof (temp), string, list);
 #ifndef NDEBUG
@@ -75,11 +75,16 @@
 va_end (list);
 #ifndef NDEBUG
 if (*fill + strlen (temp) > len - 1)
-    GPF_T1 ("overflow in strncpy");
+    GPF_T1 ("overflow in memcpy");
 #endif
-  strncpy (&text[*fill], temp, len - *fill - 1);
+  rest_sz = (len - fill[0]);
+  if (ret >= rest_sz)
+    copybytes = ((rest_sz > 0) ? rest_sz : 0);
+  else
+    copybytes = ret+1;
+  memcpy (text+fill[0], temp, copybytes);
 text[len - 1] = 0;
-  *fill += (int) strlen (temp);
+  fill[0] += ret;
 }

Virtuoso – for real!

used-bckend-virtuoso

Soprano 2.3.63 – that is the magic version number you need to look out for.

And then once you have updated your kdebase copy to the latest trunk you run your favorite text editor on ~/.kde/share/config/nepomukserverrc. In there you set Soprano Backend=virtuosobackend in the [Basic Settings] section. After that you simply restart Nepomuk as described in the corresponding howto. You can also logout and log back in again but then you won’t be able to provide as nice bug reports.

Once done Nepomuk will convert your database. This can take a loooong time if strigi is enabled. But it will finish. :)

BTW: You need a recent snapshot of Virtuoso 5.0.12 for this to work.