
“Are We There Yet?” – The Long Road To a Stable Soprano Virtuoso Backend

February 27, 2009

The first time I blogged about the Virtuoso backend for Soprano, testing it was still a bit bumpy: one had to start a server manually on the local machine and then connect to it. That has changed. I just committed the changes that allow the Soprano Virtuoso backend to spawn a local instance of the Virtuoso server. Thus, using the Virtuoso backend in Nepomuk is now as simple as specifying it in the Nepomuk server configuration file ~/.kde/share/config/nepomukserverrc, as mentioned before:

[Basic Settings]
Configured repositories=main
Soprano Backend=virtuoso
Start Nepomuk=true
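If you prefer to switch the backend from a script instead of editing the file by hand, the following is a minimal sketch using Python's standard configparser. It assumes the rc file is plain INI that configparser can parse; KDE-specific syntax (such as immutability markers) is not handled, and configparser drops comments when it rewrites a file. The function name set_soprano_backend is hypothetical.

```python
import configparser
from pathlib import Path


def set_soprano_backend(rc_path: Path, backend: str = "virtuoso") -> None:
    """Set 'Soprano Backend' in the [Basic Settings] group of an INI-style rc file.

    Assumption: the file is plain INI; KDE-only syntax is not supported and
    comments in the file are lost when it is rewritten.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep key case, e.g. "Soprano Backend"
    config.read(rc_path, encoding="utf-8")  # a missing file is silently skipped
    if not config.has_section("Basic Settings"):
        config.add_section("Basic Settings")
    section = config["Basic Settings"]
    section["Soprano Backend"] = backend
    # Only fill in the other keys from the example if they are not set yet.
    section.setdefault("Configured repositories", "main")
    section.setdefault("Start Nepomuk", "true")
    with open(rc_path, "w", encoding="utf-8") as f:
        # Write "key=value" without spaces, matching the original file's style.
        config.write(f, space_around_delimiters=False)
```

For example, set_soprano_backend(Path.home() / ".kde/share/config/nepomukserverrc") would switch an existing configuration to Virtuoso while leaving other groups in the file in place.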

(Notice that the backend name is now “virtuoso” without the “backend” suffix. For convenience, Soprano actually accepts both.)
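Spawning a local server boils down to starting the server process and waiting until it accepts connections before the backend tries to connect. The following is a minimal Python sketch of that pattern, not the Soprano implementation (which is C++). The virtuoso-t flags used here (+foreground, +configfile) are assumptions about its command line, and the port is assumed to be the one configured in the server's ini file; the function names are hypothetical.

```python
import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening; the connection is closed again
        except OSError:
            time.sleep(interval)  # refused or timed out: try again shortly
    return False


def spawn_local_server(config_file: str, port: int, timeout: float = 30.0) -> subprocess.Popen:
    """Start a local server process and wait until it listens on `port`.

    Assumption: "+foreground" and "+configfile" are accepted by virtuoso-t;
    check the binary's own help output before relying on them.
    """
    proc = subprocess.Popen(["virtuoso-t", "+foreground", "+configfile", config_file])
    if not wait_for_port("localhost", port, timeout):
        # Do not leave a half-started server behind.
        proc.terminate()
        proc.wait()
        raise RuntimeError(f"server did not start listening on port {port}")
    return proc
```

Polling the port rather than sleeping for a fixed time keeps startup fast on quick machines while still tolerating slow ones.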

The backend is still not 100% stable. Some queries still fail due to encoding problems, and fixing them may require converting existing data.

Apart from spawning a local instance, the backend can now handle indices (for improved query speed) and the state of the full text index. Both are controlled through backend options, as explained in the Soprano Virtuoso backend documentation.
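Why do extra indices speed up queries? As a rough conceptual illustration only (Virtuoso's actual index layout is internal to the server, and the orderings below are not Soprano option values), here is a toy quad store that keeps one sorted list per field ordering, such as SPOG or POGS. A query pattern whose bound fields form a prefix of some ordering becomes a binary-search range scan instead of a full scan. The class name QuadStore and its methods are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)
POS = {"S": 0, "P": 1, "O": 2, "G": 3}


class QuadStore:
    """Toy quad store keeping one sorted list of reordered quads per index ordering."""

    def __init__(self, orderings: List[str]) -> None:
        self.indexes: Dict[str, List[tuple]] = {o: [] for o in orderings}

    @staticmethod
    def _key(ordering: str, quad: Quad) -> tuple:
        # Reorder the quad's fields according to the ordering, e.g. "POGS".
        return tuple(quad[POS[c]] for c in ordering)

    def add(self, quad: Quad) -> None:
        for ordering, rows in self.indexes.items():
            key = self._key(ordering, quad)
            i = bisect_left(rows, key)
            if i == len(rows) or rows[i] != key:  # skip duplicates
                rows.insert(i, key)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None, g: Optional[str] = None) -> List[Quad]:
        bound = {"S": s, "P": p, "O": o, "G": g}

        def prefix_len(ordering: str) -> int:
            # Number of leading fields of this ordering that the pattern binds.
            n = 0
            for c in ordering:
                if bound[c] is None:
                    break
                n += 1
            return n

        # Use the index whose sort order covers the most bound fields.
        ordering = max(self.indexes, key=prefix_len)
        prefix = tuple(bound[c] for c in ordering[: prefix_len(ordering)])
        rows = self.indexes[ordering]
        result: List[Quad] = []
        i = bisect_left(rows, prefix)  # first row >= prefix
        while i < len(rows) and rows[i][: len(prefix)] == prefix:
            key = rows[i]
            quad = tuple(key[ordering.index(c)] for c in "SPOG")
            # Bound fields outside the prefix still have to be checked row by row.
            if all(bound[c] is None or quad[POS[c]] == bound[c] for c in "SPOG"):
                result.append(quad)
            i += 1
        return result
```

With an index such as POGS available, a query like “all resources of type X” touches only the matching range; without it, every quad would have to be scanned.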

Virtuoso Packaging

With that established, let’s turn to a few packaging questions. Some have argued that Virtuoso uses too much disk space. Well, the fact is that the Virtuoso server as used by Soprano only needs a very small portion of the whole Virtuoso installation: the “virtuoso-t” binary and the “virtodbc_r.so” ODBC driver. Combined, they come to roughly 9 MB. I think that is no big problem (for comparison, the whole Virtuoso installation is about 150 MB). :)
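With the numbers above, the part Soprano needs is about 9 / 150 = 6% of a full installation. To check this against a concrete build, a small sketch like the following sums the sizes of the two required files and of the whole install tree. The install prefix you pass in is up to you (any path used with it here is hypothetical), and symlinks are skipped so files are not counted twice.

```python
from pathlib import Path
from typing import Iterable, Tuple


def footprint(root: Path,
              needed: Iterable[str] = ("virtuoso-t", "virtodbc_r.so")) -> Tuple[int, int]:
    """Return (bytes used by the needed files, bytes used by the whole tree) under root."""
    names = set(needed)
    used = 0
    total = 0
    for path in root.rglob("*"):
        # Count regular files only; symlinks would double-count their targets.
        if path.is_file() and not path.is_symlink():
            size = path.stat().st_size
            total += size
            if path.name in names:
                used += size
    return used, total
```

For example, used, total = footprint(Path("/opt/virtuoso")) followed by a division of used by total gives the fraction a split “server only” package would need.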

Packagers should split the Virtuoso package into smaller parts. I recommend providing one package for the server binary, one for the ODBC drivers, one for the VAD files, and so on. That way, disk usage stays minimal.

  1. mutlu
    February 27, 2009 at 15:54 | #1

    Hi Sebastian,

    I am a big fan of your work on the semantic desktop and always smile when I see a new post by you in Akregator. :)

    I have a question regarding encoding: some languages use a lot of accented characters (é, ä, etc.), and adhering to UTF-8 is of course the way to go. But Google search does something really helpful: searches in plain ASCII characters also match accented variants (“wutend” finds “wütend”; in other words, “a” also matches “ā”, “å”, “ä”, etc.), while a non-ASCII character such as “ä” does not return results containing “a”.

    I think it would be great if there were a character-replacement table for Nepomuk searches as well, possibly even a user-editable one. I would certainly appreciate this, as I work with foreign languages and use a lot of funny characters for transliterations of non-Roman scripts.

    Thanks for all your work!

    • March 3, 2009 at 07:58 | #2

      Although not related to the blog entry, your comment is a delight to read. :)
      The idea is very good. I would have to see how that can be done… not sure.

      • jerven
        March 4, 2009 at 16:29 | #3

        The main problem will be keeping the transliteration consistent. Otherwise you will end up using || in your SPARQL statements all the time.

        e.g.

        ?text bif:contains "Bærum" || ?text bif:contains "Baerum"

        We transliterate all data when inserting it into a Lucene index and replace the characters when they are found in a query. The same approach would be valid for Virtuoso. However, you will have to do the replacement at the Soprano level, as neither SPARQL nor the Virtuoso bif functions support this (as far as I know).

        And I have no idea how that would work out with remote shared SPARQL backends.

        Other use cases would be ™ for “trademark”, and perhaps, for the mathematically inclined, “alpha” to \u03B1, etc… But that might be a bad option for Greeks ;)
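The approach described in this thread (normalize text once when it is indexed, and apply the same normalization to every query) can be sketched as follows. This is an illustrative application-side sketch, not something Soprano or Virtuoso provides; the names DEFAULT_TABLE, fold, build_index, and search are hypothetical. It applies a user-editable replacement table for letters that Unicode does not decompose (such as “æ” or “ø”), then strips combining accents via NFKD normalization and casefolds. Note that this folding is symmetric: unlike the Google behavior described above, a query for “wütend” also matches “wutend”. Keeping the asymmetric behavior would require indexing both the folded and the original forms.

```python
import re
import unicodedata
from typing import Dict, Optional, Set

# Hypothetical, user-editable table for characters NFKD does not decompose.
DEFAULT_TABLE: Dict[str, str] = {
    "æ": "ae", "Æ": "AE",
    "ø": "o", "Ø": "O",
    "ß": "ss",
}


def fold(text: str, table: Optional[Dict[str, str]] = None) -> str:
    """Map text to a search key: apply the table, drop combining accents, casefold."""
    table = DEFAULT_TABLE if table is None else table
    text = "".join(table.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)  # "ü" -> "u" + combining diaeresis
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_index(docs: Dict[str, str], table: Optional[Dict[str, str]] = None) -> Dict[str, Set[str]]:
    """Index time: store every folded word of every document."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", fold(text, table)):
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str,
           table: Optional[Dict[str, str]] = None) -> Set[str]:
    """Query time: fold the query the same way and intersect the postings of all words."""
    words = re.findall(r"\w+", fold(query, table))
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because both sides go through the same fold function, “Bærum” and “Baerum” end up as the same key, so no || alternatives are needed in the query.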

  2. February 27, 2009 at 16:36 | #4

    I am adding this piece of info as a reference for the Debian community…
    I have linked your packaging suggestions to Debian’s BTS, where packaging Virtuoso is discussed: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508048

  3. February 28, 2009 at 09:21 | #5

    Question from a database noob (yes, I read the previous blog posts): Virtuoso is a database, right? What can it do that MySQL can’t? Can it be used for the same purposes as MySQL? I.e., is its functionality a superset of MySQL’s?

    Alex

    • March 1, 2009 at 10:29 | #6

      Yes, Virtuoso’s features are a superset of MySQL’s. Most important in my case is its support for RDF, including full text search, inference, and extended query features (such as embedded SPARQL queries and aggregate functions).

  4. Olivier Berger
    March 5, 2009 at 18:30 | #7

    Which version (SVN revision) should we use if we want to test the new Soprano?

    • March 5, 2009 at 19:16 | #8

      Try Soprano trunk. I always try to keep the trunk as stable as possible.

      • Olivier Berger
        March 5, 2009 at 19:23 | #9

        Alright, thanks.

        I’m trying to recompile it for Debian ATM, trying to see if I’ll be able to (recompile and) use swimpomuk somehow ;)
