“Are We There Yet?” – The Long Road To a Stable Soprano Virtuoso Backend


The first time I blogged about the Virtuoso backend for Soprano, testing it was still a bit bumpy: one had to manually start a server locally and then connect to it. Now the situation has changed. I just committed the changes that allow the Soprano Virtuoso backend to spawn a local instance of the Virtuoso server. Thus, using the Virtuoso backend in Nepomuk is now as simple as specifying it in the Nepomuk server configuration file ~/.kde/share/config/nepomukserverrc, as mentioned before:

[Basic Settings]
Configured repositories=main
Soprano Backend=virtuoso
Start Nepomuk=true

(Notice that the backend name is now “virtuoso” without the “backend” suffix. Actually, Soprano can now handle both for convenience.)
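
To illustrate the note about backend names, here is a minimal Python sketch of how a tool might read this file and accept both spellings. The helper name read_backend and the old spelling “virtuosobackend” are assumptions made for illustration; this is not how Soprano or the Nepomuk server actually parse the setting.

```python
import configparser

# Sample contents of nepomukserverrc, copied from the settings shown above.
SAMPLE = """\
[Basic Settings]
Configured repositories=main
Soprano Backend=virtuoso
Start Nepomuk=true
"""


def read_backend(config_text: str) -> str:
    """Hypothetical helper: return the backend name without a 'backend' suffix."""
    # Only '=' separates keys from values; interpolation is off so that
    # values containing '%' are returned unchanged.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep the case of keys such as "Soprano Backend"
    parser.read_string(config_text)
    name = parser.get("Basic Settings", "Soprano Backend", fallback="").strip()
    # Assumption: the older spelling was the same name plus "backend".
    if name.endswith("backend"):
        name = name[: -len("backend")]
    return name


if __name__ == "__main__":
    print(read_backend(SAMPLE))
```

With the sample above, both “virtuoso” and the assumed older “virtuosobackend” resolve to the same name.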

The backend is still not 100% stable. Some queries still fail due to encoding problems, and fixing them may involve converting the data.

Apart from spawning a local instance, the backend can now handle indices (for improved query speed) and the state of the full text index. Both are controlled through backend options, as explained in the Soprano Virtuoso backend documentation.

Virtuoso Packaging

Now that that is established, let’s turn to a few packaging questions. There were arguments that Virtuoso uses too much disk space. Well, the fact is that the Virtuoso server as used by Soprano only needs a very small portion of the whole Virtuoso installation: the “virtuoso-t” binary and the “virtodbc_r.so” ODBC driver. Combined, they come to roughly 9 MB. I think that is no big problem (for comparison, the whole Virtuoso installation is about 150 MB). :)

Packagers should split the Virtuoso package into small parts. I recommend providing one package for the server binary, one for the ODBC drivers, one for the VAD files, and so on. That way, hard disk usage will be minimal.

10 thoughts on ““Are We There Yet?” – The Long Road To a Stable Soprano Virtuoso Backend”

  1. Hi Sebastian,

    I am a big fan of your work on the semantic desktop and always smiling when I see a new post by you in Akregator. :)

    I have a question regarding encoding: Some languages use a lot of accentuation (é, ä, etc.) and adhering to utf-8 is of course the way to go. But Google search does something really helpful: searches in ASCII characters return results with higher complexity (“wutend” finds “wütend,” which is to say that “a” produces “ā”, “å”, “ä”, etc.), while a non-ASCII character “ä” does not return a result containing “a”.

    I think it would be great if there was a character-replacement table for nepomuk searches as well, possibly even user-editable. I would certainly appreciate this as I work with foreign languages and use a lot of funny characters for transliterations of non-Roman scripts.

    Thanks for all your work!

      • The main problem will be to keep the transliteration consistent. Or you will end up using an || in your sparql statement all the time.

        e.g.

        ?text bif:contains “Bærum” || ?text bif:contains “Baerum”

        We transliterate all data when it is inserted into a Lucene index and replace the characters when they are found in a query. The same approach would be valid for Virtuoso. However, you will have to do the replacement at the level of Soprano, as neither SPARQL nor the Virtuoso bif functions support this (as far as I know).

        And I have no idea how that would work out with remote shared SPARQL backends.

        Other use cases will be ™ for trademark. And perhaps, for the mathematically inclined, alpha to \u03B1 etc… But that might be a bad option for Greeks ;)
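
The approach described in this reply (transliterate on insert, apply the same replacement to the query) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names EXTRA, fold, build_index and search are hypothetical, the replacement table is a tiny sample, and a plain dictionary stands in for the real full text index. Nothing here is Soprano, Lucene, or Virtuoso API.

```python
import unicodedata

# Illustrative replacement table for letters that Unicode does not decompose.
# The entries are assumptions; a real table could be user-editable.
EXTRA = {"æ": "ae", "Æ": "Ae", "ø": "o", "Ø": "O", "ß": "ss"}


def fold(text: str) -> str:
    """Transliterate text to a plainer form, e.g. 'Bærum' -> 'Baerum'."""
    replaced = "".join(EXTRA.get(ch, ch) for ch in text)
    # NFKD splits 'ü' into 'u' plus a combining mark, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs: dict) -> dict:
    """Store the folded, lower-cased form of every document at insert time."""
    return {doc_id: fold(text).lower() for doc_id, text in docs.items()}


def search(index: dict, query: str) -> list:
    """Fold the query the same way, so one lookup covers both spellings."""
    needle = fold(query).lower()
    return [doc_id for doc_id, text in index.items() if needle in text]


if __name__ == "__main__":
    index = build_index({1: "Bærum kommune", 2: "Ich bin wütend"})
    print(search(index, "Baerum"), search(index, "Bærum"))
    print(search(index, "wutend"))
```

Because both sides go through the same fold step, the query no longer needs an || between the two spellings. Note that this makes matching symmetric: a search for “ä” would also find “a”, unlike the Google behaviour described in the first comment.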

  2. Question from a database noob (yes, I read the previous blogs): Virtuoso is a database, right? What can it do that MySQL cannot? Can it be used for the same purposes as MySQL? I.e., is the functionality a superset of MySQL?

    Alex

    • Yes, Virtuoso’s features are a superset of MySQL’s. Most important in my case is its support for RDF, including full text search, inference, and extended query features (such as embedded SPARQL queries and aggregate functions).
