Virtuoso 6.1.6 and KDE 4.9

Shortly after KDE 4.9 hit the net, Virtuoso 6.1.6 follows. It comes with a ton of fixes, improvements, and optimizations, and updating is highly recommended for the best Nepomuk experience.

Virtuoso 6.1.6 has been tested by the Nepomuk team in cooperation with OpenLink Software before its release. It is the recommended release for Nepomuk. This is true not only for KDE 4.9 but also for any earlier version.

Get the sources while they are hot and build your packages.

Nepomuk Tasks: KActivityManager Crash

After a little silence during which I was occupied with Easter and OpenLink-related work I bring you news about the second Nepomuk task: the KActivityManager crash.

Ivan Cukic already “fixed” the bug by simply not using Nepomuk but an SQLite backend (at least that is how I understood it, correct me if I am wrong). However, I wanted to fix the root of the original problem.

Soprano provides the communication channel between Nepomuk and its clients. It is based on a very simple custom protocol going through a local socket. So far QLocalSocket, i.e. Qt’s implementation, was used. The problem with QLocalSocket is that it is a QObject. Thus, it cannot live in two threads at the same time. The hacky solution was to maintain one socket per thread. Sadly that resulted in complicated maintenance code which was impossible to get right. Hence crashes like #269573 or #283451 (basically any crash involving Soprano::ClientConnection) were never fixed.

A few days ago I finally gave up and decided to get rid of QLocalSocket and replace it with my own implementation. The only problem is that in order to keep Windows compatibility I had to keep the old implementation around by adding quite a lot of #ifdefs.

And now I could use some testers for a Soprano client library that creates only a single connection to the server instead of one per thread. I already pushed the new code into Soprano’s git master. So all you need to do is run KDE on top of that.
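To illustrate the idea of one connection shared by all threads, here is a minimal Python sketch. It is not Soprano's code (which is C++) and the post does not say how the new implementation synchronizes access. The lock, the length-prefixed framing, the toy server and all names below are assumptions made for the example.

```python
import socket
import threading


def recv_exact(sock, n):
    """Read exactly n bytes from sock or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


class SharedConnection:
    """One socket for all threads; a lock serializes each request/reply pair."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def request(self, payload):
        with self._lock:
            # Hypothetical framing: 4-byte big-endian length, then the payload.
            self._sock.sendall(len(payload).to_bytes(4, "big") + payload)
            size = int.from_bytes(recv_exact(self._sock, 4), "big")
            return recv_exact(self._sock, size)


def toy_server(sock, count):
    """Stand-in for the real server: upper-cases `count` requests."""
    for _ in range(count):
        size = int.from_bytes(recv_exact(sock, 4), "big")
        reply = recv_exact(sock, size).upper()
        sock.sendall(len(reply).to_bytes(4, "big") + reply)


client_sock, server_sock = socket.socketpair()
server = threading.Thread(target=toy_server, args=(server_sock, 4))
server.start()

conn = SharedConnection(client_sock)
results = {}


def worker(i):
    results[i] = conn.request(f"query {i}".encode())


workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
server.join()
client_sock.close()
server_sock.close()
```

The point of the design is that there is no per-thread socket bookkeeping left to get wrong. The price is that requests from different threads are handled one after the other.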

Oh, and while at it I finally fixed the problem with re-connecting of clients. So now a restart of Nepomuk will no longer leave the clients with dangling connections, unable to perform queries. That fix, however, is in kdelibs.

Well, the day was long, I am tired, and this blog post feels a little boring. So before it also gets too long I will stop.

Nepomuk Tasks: Let The Virtuoso Inferencing Begin

Only four days ago I started the experiment to fund specific Nepomuk tasks through donations. Like with last year’s fundraiser I was uncertain if it was a good idea. That, however, changed when only a few hours later two tasks had already reached their donation goal. Again it became obvious that the work done here is appreciated and that the “open” in Open-Source is understood for what it actually is.

So despite my wife not being overly happy about it I used the weekend to work on one of the tasks: Virtuoso inferencing.

Inference?

As a quick reminder: the inferencer automatically infers information from the data in the database. While Virtuoso can handle pretty much any inference rule you throw at it we stick to the basics for now: if resource R1 is of type B and B derives from A, then R1 is also of type A. And: if R1 has property P1 with value “foobar” and P1 is derived from P2, then R1 also has property P2 with value “foobar”.
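The two rules can be spelled out in a few lines of Python. This is only a sketch of the logic, not of how Virtuoso evaluates it. The ex: names and the function names are made up for the example, and the sub-property relations are the ones this post mentions for rdfs:label.

```python
def closure(start, parents):
    """Return start plus everything reachable through the parents mapping."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(parents.get(node, ()))
    return seen


def infer(triples, sub_class_of, sub_property_of):
    """Return the given triples plus everything the two rules entail."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            # R1 is of type B and B derives from A => R1 is of type A
            for cls in closure(o, sub_class_of):
                inferred.add((s, "rdf:type", cls))
        # R1 has P1 with value v and P1 derives from P2 => R1 has P2 with v
        for prop in closure(p, sub_property_of):
            inferred.add((s, prop, o))
    return inferred


# Hypothetical example data
sub_class_of = {"ex:B": ["ex:A"]}
sub_property_of = {"nie:title": ["rdfs:label"], "nao:prefLabel": ["rdfs:label"]}
triples = {
    ("ex:R1", "rdf:type", "ex:B"),
    ("ex:R1", "nie:title", "foobar"),
}

result = infer(triples, sub_class_of, sub_property_of)
```

With this data, result additionally contains ("ex:R1", "rdf:type", "ex:A") and ("ex:R1", "rdfs:label", "foobar").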

Crappy Inference

This is already very useful and even mandatory in many cases. Until now we used what we called “crappy inferencing 1 & 2”. Crappy Inferencer 1 was based on work done in the original Nepomuk project and it simply inserted triples for all sub-class and sub-property relations. That way we could simulate real inference by querying for something like

select * where {
  ?r ?p "foobar" . 
  ?p rdfs:subPropertyOf rdfs:label .
}

and catch all sub-properties of rdfs:label like nao:prefLabel or nie:title. While this works it means bad performance, additional storage and additional maintenance.

Crappy Inferencer 2 was even worse. It inserted rdf:type triples for all super-classes. This means that it would look at every added and removed triple to check if it was an rdf:type triple. If so it would add or remove the appropriate rdf:type triples for the super-types. That way we could do fast type queries without relying on Crappy Inferencer 1, which relies on the rdfs:subClassOf method. But this meant even more maintenance and even more storage space wasted.
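To get a feeling for the bookkeeping this kind of write-time materialization needs, here is a small Python sketch. It is not the Nepomuk code. The class, the store layout and the ex: names are hypothetical, and the removal strategy (recomputing which super-types are still justified) is just one possible way to do it.

```python
def with_super_types(cls, sub_class_of):
    """Return cls together with all direct and indirect super-classes."""
    seen, todo = set(), [cls]
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(sub_class_of.get(node, ()))
    return seen


class MaterializingStore:
    def __init__(self, sub_class_of):
        self.sub_class_of = sub_class_of
        self.asserted = set()  # (resource, class) pairs the client stated
        self.stored = set()    # (resource, class) pairs actually kept

    def add_type(self, resource, cls):
        self.asserted.add((resource, cls))
        for c in with_super_types(cls, self.sub_class_of):
            self.stored.add((resource, c))

    def remove_type(self, resource, cls):
        self.asserted.discard((resource, cls))
        # A super-type may still be justified by another asserted type.
        justified = set()
        for r, c in self.asserted:
            if r == resource:
                justified |= with_super_types(c, self.sub_class_of)
        for c in with_super_types(cls, self.sub_class_of):
            if c not in justified:
                self.stored.discard((resource, c))


store = MaterializingStore({"ex:B": ["ex:A"], "ex:C": ["ex:A"]})
store.add_type("ex:R1", "ex:B")
store.add_type("ex:R1", "ex:C")
# Entries that exist only to make type queries fast
extra = len(store.stored) - len(store.asserted)
store.remove_type("ex:R1", "ex:B")
```

Every write has to go through this extra logic, and the stored set is always at least as large as what the client asserted. That is the maintenance and storage cost described above.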

Introducing: Virtuoso Inference

So now we simply rely on Virtuoso to do all that and it does such a wonderful job. Thanks to Virtuoso graph groups we can keep our clean ontology separation (each ontology has its own graph) and still stick to a very simple extension of the queries:

DEFINE input:inference <nepomuk:/ontographgroup>
select * where {
  ?r rdfs:label "foobar" .
}

Brilliant. Of course there are still situations in which you do not want to use the inferencer. Imagine for example the listing of resource properties in the UI. This is what it would look like with inference:

We do not want that. Inference is intended for the machine, not for the human, at least not like this. So since back in the day I did not think of adding query flags to Soprano I simply introduced a new virtual query language: SparqlNoInference.

Resource Visibility

While at it I also improved the resource visibility support by simplifying it. We do not need any additional processing anymore. This again means less work on startup and with every triple manipulation command. Again we save space and increase performance. But this also means that resource visibility filtering will no longer work as before. Nepoogle for example will need adjustment to the new way of filtering. Instead of

?r nao:userVisible 1 .

we now need

FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean ] }

Testing

The implementation is done. All that remains are the tests. I am already running all the patches but I still need to adjust some unit tests and maybe write new ones.

You can also test it. The code changes are, as always, spread over Soprano, kdelibs and kde-runtime. Both kdelibs and kde-runtime now contain a branch “nepomuk/virtuosoInference”. For Soprano you need git master.

Look for regressions of any kind so we can merge this as soon as possible. The goal is KDE 4.9.

Akonadi, Nepomuk, and A Lot Of CPU

One bug has been driving people crazy. This is more than understandable seeing that the bug was an endless high CPU usage by Virtuoso, the database used in Nepomuk. Kolab Systems, the Free Software groupware company behind Kolab, a driving force behind Akonadi, sponsored me to look into that issue.

Finding the issue turned out to be a bit harder than I thought, coming up with a fix even more so. In the process I ended up improving the Akonadi Nepomuk Email indexer/feeder in several places. This, however useful and worthwhile, turned out to be unrelated to the high CPU usage. Virtuoso was not to blame either. In the end the real issue was solved by a little SPARQL query optimization.

Application developers working against Akonadi and Nepomuk might want to keep that in mind: the way you build your queries will have a dramatic impact on the performance of the whole system. So this is also where optimizations are likely to have a lot of impact in case people want to help improve things further. Discussing query design with the Nepomuk team or on the Virtuoso mailing list can go a long way here.

So thanks to the support from Kolab Systems, Virtuoso is no longer chewing so much CPU, and Akonadi Email indexing will work a lot smoother with KDE 4.8.2.

Nepomuk Tasks – Sponsor a Bug or Feature

Thanks to a very successful fundraiser in 2011 I was able to continue working on Nepomuk and searching for new enterprise sponsoring. Sadly that search was not fruitful and in 2012 Nepomuk has become a hobby. Several people proposed to start another fundraiser or try to raise money on a monthly basis. I, however, will try to get sponsoring for specific bugs or features. Depending on their size the sponsoring goal will differ. This would allow me to keep working on Nepomuk as more than a hobby.

The Nepomuk Tasks page lists the current tasks that can be sponsored. Of course you can propose new tasks but I will try to keep the list of current tasks small. Donate to the tasks you would like to see finished, ignore the ones you do not deem important. I will simply remove tasks if there is no activity within a certain period of time. So please have a look at the Nepomuk Tasks page.

Nepomuk Gives Back Your CPU Cycles…

…at least partially. With the introduction of the Data Management Service we got a very powerful way to put your data into Nepomuk: storeResources (thank you Vishesh). I will not go into the details here but the important fact is that the file indexer, the Akonadi indexer, the TV namer, the movie integration, and maybe some more rely on this method.

Thus, it is obvious that improving its performance means improving the performance of the overall system in general.

Now like I said it is a very powerful method. Sadly this results in very complex code that is not easy to wrap your head around. So optimizing it is not trivial. I tried anyway and tackled one of the main parts of it: the resource identification. After playing around a little, trying different ways to break up the main query I got to a point where I am satisfied. And here is why:

Total time spent in the resource identification code for 196692 indexed Akonadi items.

Average time spent in the resource identification code for 196692 indexed Akonadi items.

Finally some nice bar graphs! So what does this tell us? Well, the most important bit is that with my patch we save roughly 3 hours when indexing the 196692 Akonadi items I used for testing. But maybe more importantly: if the identification is faster the whole indexing is smoother and eats less CPU since it is throttled and, thus, has shorter phases of high CPU usage.
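To put the total into perspective, a quick back-of-the-envelope calculation. It only uses the two numbers from the text, and "roughly 3 hours" is taken as exactly 3 hours, so treat the result as an approximation.

```python
items = 196692               # indexed Akonadi items from the test run
saved_seconds = 3 * 60 * 60  # "roughly 3 hours", taken as exactly 3 hours

# Average saving per item in milliseconds
saved_ms_per_item = saved_seconds * 1000 / items
print(f"about {saved_ms_per_item:.0f} ms saved per item")
```

So on average each item spends roughly 55 ms less in the resource identification code.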

I took the Akonadi indexing as an example since the difference here is impressive. The file indexing is also slightly improved but not nearly as much as the Akonadi indexing. I suppose this is due to the very simple identification that is required during file indexing (some very basic nco:Contact merging is all that is required most of the time).

So, this is what you will get with KDE 4.8.2. One of several going-away presents…

Rehashing Download Metadata

Recently Martin Klapatek created the nice Firefox add-on Firemuk which stores download meta-data in Nepomuk. Vishesh Handa wrote the corresponding Nepomuk stub which does the actual work of pushing the data to Nepomuk. This is really great. Of course I already installed it.

But it made me think of some obviously forgotten Nepomuk API which I introduced in August of 2010: Nepomuk::Utils::createCopyEvent. It essentially does the exact same thing as Vishesh’s NADM, only as a library call.

Konqueror has had support for download meta-data for a while now. If you want to know more about it read my blog entry Remembering downloads via Nepomuk.

Finally – for sentimental reasons – an image…

Season Posters Anyone?

I simply cannot stop playing around with TV Shows and Nepomuk. We already had posters for the series but not for the seasons. Well, we do now:

Fun, isn’t it? But sadly it required an improved libtvdb and additions to SDO. The former is already in git master. But the SDO changes are a bit experimental. That is why I put them into branch nmm/banners. Also, since neither of these requirements is that easy to install, I put the improved season handling of nepomuktvnamer into branch seasonResources.

So in order to try this yourself you need to get libtvdb git master and the mentioned branches of SDO and nepomuktvnamer. But do not worry, I am pretty sure that I will get the SDO changes merged soon. Then this will become easier.