Just in Time For KDE SC 4.4: Virtuoso 6.1.0

Finally all testing and bugfixing is finished. OpenLink has done an outstanding job with this new release of Virtuoso. Again my thanks go out to the Virtuoso development team and Patrick van Kleef who was my contact to smooth out the issues which prevented us to use Virtuoso 6 with Nepomuk.

So now is the time for distributions to package Virtuoso 6.1.0 and for you to update it on your own. But wait, there is one little detail: the database format changed significantly between Virtuoso 5 and 6. That is why I wrote a little conversion tool called Virtuosoconverter which takes care of this problem (Caution: the build system will download the Virtuoso 5.0.12 sources which are roughly 60MB). Usage is simple:

  1. Shut down Nepomuk
  2. Install Virtuoso 6.1.0
  3. Run the Converter
  4. Restart Nepomuk

Virtuoso 6 offers a wide range of features which are yet to be exposed through Nepomuk. The fun is only just starting!

Hints for Distributors:

  • You might want to run the converter in auto mode before starting Nepomuk.
  • If you do not like the build system downloading the Virtuoso 5 sources simply put them in the source tree. The build system will pick them up and use them instead of downloading.

Updates:

  • If you have old Virtuoso V5 data and do not run the converter after updating to Virtuoso V6 Nepomuk will not start.
  • The converter is the only way to convert the data to the new database format (except if you run some sql commands on the server manually)

Virtuoso – Once More With Feeling

The Virtuoso backend for Soprano and, thus, Nepomuk can be seen as rather stable now. So now the big tests can begin as the goal is to make it the standard in KDE 4.4. Let me summarize the important information again:

Step 1

Get Virtuoso 5.0.12 from the Sourceforge download page. Virtuoso 6 is NOT supported. (not yet anyway)

Step 2

Hints for packagers: Soprano only needs two files: the virtuoso-t binary and the virtodbc_r(.so) ODBC driver. Everything else is optional. (For the self-compiling folks out there: –disable-all-vads is your friend.)

Step 3

Install libiodbc which is what the Soprano build will look for (Virtuoso is simply a run-time dependency.)

Step 4

Rebuild Soprano from current svn trunk (Remember: Redland is still mandatory. Its memory storage is used all over Nepomuk!)

Step 5

Edit ${KDEHOME}/share/config/nepomukserverrc with your favorite editor. In the “[Basic Settings]“ section add “Soprano Backend=virtuosobackend”. Do not touch the main repository settings!

Step 6

Restart Nepomuk. I propose the following procedure to gather debugging information in case something goes wrong:
Shutdown Nepomuk completely:

 # qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.quit

Restart it by piping the output into a temporary file (bash syntax):

 # nepomukserver 2> /tmp/nepomuk.stderr

Step 7

Wait for Nepomuk to convert your data. If you are running KDE trunk you even get a nice progress bar in the notification area (BTW: does anyone know why it won’t show the title?)

And Now?

That is already it. Now you can enjoy the new Virtuoso backend.

The development has taken a long time. But I want to thank OpenLink and especially Patrick van Kleef who helped a lot by fixing the last little tidbits in Virtuoso 5 for my unit tests to pass. Next step is Virtuoso 6.

And Yet Another Post About Virtuoso

Today nearly all problems are solved. OpenLink provided a patch that makes inserting very large literals (more than 1 metabyte in size) lightning fast, even with a very low buffer count. Also I worked around the issue of URI encoding. Now the Soprano Virtuoso backend simply percent-encodes all non-unreserved characters and all reserved characters that are not used in their special meaning in URIs used in queries. Man, that is a mouth full. Well, it seems to work fine although I can always use more testing with weird file URLs (weird means containing weird characters like brackets and the likes). I also fixed some error handling bugs.

So what is left? Well, there are a few hacks in the Virtuoso backend which are rather ugly. One example is the detection of query result types. To determine if the result is boolean, bindings, or a graph it actually checks the name and number of result columns. Urgh! It would be nicer to check for the type of the result. Seems like graph results are BLOBs.

Anyway, enough for tonight. I am tired. Here is the patch to make Virtuoso not hang when Strigi adds nie:PlainTextContent literals of big files:

Index: sqlrcomp.c
===================================================================
RCS file: virtuoso-opensource/libsrc/Wi/sqlrcomp.c,v
retrieving revision 1.9
diff -u -r1.9 sqlrcomp.c
--- sqlrcomp.c  20 Aug 2009 17:47:22 -0000      1.9
+++ sqlrcomp.c  13 Oct 2009 16:11:49 -0000
@@ -65,7 +65,7 @@
 {
 va_list list;
 char temp[2000];
-  int ret;
+  int ret, rest_sz, copybytes;
 va_start (list, string);
 ret = vsnprintf (temp, sizeof (temp), string, list);
 #ifndef NDEBUG
@@ -75,11 +75,16 @@
 va_end (list);
 #ifndef NDEBUG
 if (*fill + strlen (temp) > len - 1)
-    GPF_T1 ("overflow in strncpy");
+    GPF_T1 ("overflow in memcpy");
 #endif
-  strncpy (&text[*fill], temp, len - *fill - 1);
+  rest_sz = (len - fill[0]);
+  if (ret >= rest_sz)
+    copybytes = ((rest_sz > 0) ? rest_sz : 0);
+  else
+    copybytes = ret+1;
+  memcpy (text+fill[0], temp, copybytes);
 text[len - 1] = 0;
-  *fill += (int) strlen (temp);
+  fill[0] += ret;
 }

Virtuoso – for real!

used-bckend-virtuoso

Soprano 2.3.63 – that is the magic version number you need to look out for.

And then once you have updated your kdebase copy to the latest trunk you run your favorite text editor on ~/.kde/share/config/nepomukserverrc. In there you set Soprano Backend=virtuosobackend in the [Basic Settings] section. After that you simply restart Nepomuk as described in the corresponding howto. You can also logout and log back in again but then you won’t be able to provide as nice bug reports.

Once done Nepomuk will convert your database. This can take a loooong time if strigi is enabled. But it will finish. :)

BTW: You need a recent snapshot of Virtuoso 5.0.12 for this to work.

“Are We There Yet?” – The Long Road To a Stable Soprano Virtuoso Backend

The first time I blogged about the Virtuoso backend for Soprano testing it was still a bit bumpy: one had to manually start a server locally and then connect to it. Now the situation has changed. I just commited the changes that allow the Soprano Virtuoso backend to spawn a local instance of the Virtuoso server. Thus, using the Virtuoso backend in Nepomuk is now as simple as specifying it in the Nepomuk server configuration file ~/.kde/share/config/nepomukserverrc as mentioned before:

[Basic Settings]
Configured repositories=main
Soprano Backend=virtuoso
Start Nepomuk=true

(Notice that the backend name is now “virtuoso” without the “backend” suffix. Actually Soprano can now handle both for convinience.)

The backend is still not 100% stable. Some queries still fail due to encoding problems. which may include conversion of the data.

Apart from the spawning of a local instance the backend can now handle indices (for improved query speed) and the state of the full text index. Both are controlled through backend options as explained in the Soprano Virtuoso backend documentation.

Virtuoso Packaging

Now that is established let’s go to a few packaging questions. There were arguments that Virtuoso uses too much disk space. Well, fact is that the Virtuoso server as used by Soprano only needs a very small portion of the whole Virtuoso installation: the “virtuoso-t” binary and the “virtodbc_r.so” ODBC driver. Combined they come to a size of roughly 9 MB. I think that is no big problem (as a comparison the whole Virtuoso installation is about 150 MB). :)

Packagers should split the Virtuoso package into small parts. I recommend to provide a package for the server binary, one for the ODBC drivers, one for the VAD files, and so on. Thus, the harddisk usage will be minimal.

Akonepomuk, Neponadi – Friendly Takeover or Real Love Marriage

Meeting with developers is always great. Not only does one get challanged and dragged into interesting discussions, often the result is an actual plan or a concrete idea. In this case my trip to Oslo and the vistit of the Trolltech (ah, sorry, Nokia) offices resulted in at least three results: 1. a lively discussion about the possibilities that the Semantic Desktop could offer (including a promise from a bunch of developers to ad wishes describing these ideas into the KDE bug database); 2. the plan for a Nepomuk developer sprint in the next months (more about that in the next days), 3. a very very fruitful discussion with Tobias König (currently doing an internship at Nokia) about the integration of Nepomuk into Akonadi or vice versa. Here I will present the results of the latter discussion.

The current situation

Currently (meaning: in KDE 4.2) both Akonadi and Nepomuk use their own databases: Akonadi starts its own mysql server and Nepomuk runs a java based RDF backend. Many people have problems with both mysql and java. I personally can understand the latter. In any case this separation of data means that at least parts of Akonadi data needs to be mirrored in Nepomuk. Otherwise we cannot search for PIM data, we cannot link to PIM items, and we cannot create relations between PIM items.

This mirroring of data is an ugly thing since it means that we convert the data in Akonadi (stored as VCards, ICal, or other formats) into RDF graphs. This is currently done by two Akonadi agents which can be found in the kdepim package. Whenever data in Akonadi changes, the data in Nepomuk has to be synced. I personally can see no advantage of this situation.

A possible solution worth discussing

The possible solution Tobias and myself came up with looks as follows:

Akonadi and Nepomuk will share one database – a Virtuoso SQL/RDF server. (For a discussion of the new Virtuoso support in Soprano see my last blog entry.) This is achieved as follows: the current database schema of Akonadi will be kept except for the parts tables which at the moment store the actual mime data of the items. Instead the item table will get a new column which points to an RDF resource in Virtuoso’s own RDF tables. The RDF resource will them represent the item in a real object-oriented form: contacts are encoded using the NCO Ontology, emails are encoded using the NMO Ontology, and so on. Akonadi will then use an RDF encoding (we propose turtle) for all data serialization. The advantages are numerous:

  1. All PIM data does exist as nicely encoded semantic data in Nepomuk which can directly be used to relate to and from.
  2. No syncing is necessary.
  3. Serialization plugins for Akonadi can be created automatically from an ontology using a code generator. Thus, an application developer who wants to store data in Akonadi only needs to create their own ontology. A task that is necessary for Nepomuk support anyway. And let’s face it: this is what each application should include in the future ;).
  4. Only one database server running: Virtuoso.
  5. Simpler mapping from database objects to convenience objects in C++: the data in Nepomuk is already represented using an object-oriented approach.

The Virtuoso server could be started using the same “process-singleton” approach libakonadi is currently using to start Akonadi if it is not running. This would keep both Nepomuk and Akonadi free of dependancies on each other.

Possible Problems

The one possible problem remaining might be performance: It stays to be tested if reading a vcard from SQL and parsing it is much faster than querying an NCO contact resource. Volker, we need your help. Plus: I see you guys at the KDE-PIM meeting in Berlin. Lots to do. ;)