Akonadi, Nepomuk, and A Lot Of CPU

One Bug has been driving people crazy. This is more than understandable seeing that the bug was an endless high CPU usage by Virtuoso, the database used in Nepomuk. Kolab Systems, the Free Software groupware company behind Kolab, a driving force behind Akonadi, sponsored me to look into that issue.

Finding the issue turned out to be a bit harder than I thought, coming up with a fix even more so. In the process I ended up improving the Akonadi Nepomuk Email indexer/feeder in several places. This, however useful and worthwhile, turned out to be unrelated to the high CPU usage. Virtuoso was not to blame either. In the end the real issue was solved by a little SPARQL query optimization.

Application developers against Akonadi and Nepomuk might want to keep that in mind: The way you build your queries will have dramatic impact on the performance of the whole system. So this is also where opimizations are likely to have a lot of impact in case people want to help improve things further. Discussing query design with the Nepomuk team or on the Virtuoso mailing list can go a long way here.

So thanks to the support from Kolab Systems, Virtuoso is no longer chewing so much CPU, and Akonadi Email indexing will work a lot smoother with KDE 4.8.2.

Update on Bugs And Stuff

There was not that much activity last week. That is simply because I took a few days off to spend time with my family on a farm – mostly so that my daughter could ride a pony every day.

Still, there are a few words I can say regarding Nepomuk bugs: 4 crashes have been reported for KDE 4.7.3. One of them I already fixed, one has a patch which is awaiting testing, one is very confusing and made me contact the mighty Dario for help, and the last one is the akonadi feeder which still has a memory leak. Apart from that blogging about my inotify usage had a very nice side-effect: Marcel Wiesweg from Digikam wants to use KInotfy and stumbled upon a serious bug which for some stupid reason I never caught. That bug means that files with umlauts and friends in their paths never keep their meta-data when moved. Urgh!

So KDE 4.7.4 will again contain a bunch of fixes. The hunt is not over yet. At least now all bugs are nicely categorized which makes triaging much easier.

And in other news: thank you so much for the amazing fundraiser. Only about 800€ to the goal which I feared was reaching for the stars. This is given me the energy to go on and try to find the required funding. No one stepped up so far but I am still hopeful as some are still “in discussion”. Keep your fingers crossed (and send ideas my way).

Click here to lend your support to: Nepomuk - The semantic desktop on KDE and make a donation at www.pledgie.com !

Aggregating Nepomuk

Recently there have been some posts on Nepomuk in KDE. Tobias König blogged about how to Pimp my Nepomuk. He explains how for many users redland is still the default backend and how to change that. He gives the most important pointers on how to enable the java-based sesame2 Soprano backend. Thomas McGuire gives a very good introduction into what Soprano, Nepomuk, Strigi, and Akonadi are and how they relate. This was a much needed post. Thank you for that, Thomas! And finally mat69 gives his ideas on how to improve the desktop search experience with Nepomuk. He has some good ideas that should really be implemented.

Can somebody please tell me how to get 40 hours out of the work-day? That would really help! ;)

Akonepomuk, Neponadi – Friendly Takeover or Real Love Marriage

Meeting with developers is always great. Not only does one get challanged and dragged into interesting discussions, often the result is an actual plan or a concrete idea. In this case my trip to Oslo and the vistit of the Trolltech (ah, sorry, Nokia) offices resulted in at least three results: 1. a lively discussion about the possibilities that the Semantic Desktop could offer (including a promise from a bunch of developers to ad wishes describing these ideas into the KDE bug database); 2. the plan for a Nepomuk developer sprint in the next months (more about that in the next days), 3. a very very fruitful discussion with Tobias König (currently doing an internship at Nokia) about the integration of Nepomuk into Akonadi or vice versa. Here I will present the results of the latter discussion.

The current situation

Currently (meaning: in KDE 4.2) both Akonadi and Nepomuk use their own databases: Akonadi starts its own mysql server and Nepomuk runs a java based RDF backend. Many people have problems with both mysql and java. I personally can understand the latter. In any case this separation of data means that at least parts of Akonadi data needs to be mirrored in Nepomuk. Otherwise we cannot search for PIM data, we cannot link to PIM items, and we cannot create relations between PIM items.

This mirroring of data is an ugly thing since it means that we convert the data in Akonadi (stored as VCards, ICal, or other formats) into RDF graphs. This is currently done by two Akonadi agents which can be found in the kdepim package. Whenever data in Akonadi changes, the data in Nepomuk has to be synced. I personally can see no advantage of this situation.

A possible solution worth discussing

The possible solution Tobias and myself came up with looks as follows:

Akonadi and Nepomuk will share one database – a Virtuoso SQL/RDF server. (For a discussion of the new Virtuoso support in Soprano see my last blog entry.) This is achieved as follows: the current database schema of Akonadi will be kept except for the parts tables which at the moment store the actual mime data of the items. Instead the item table will get a new column which points to an RDF resource in Virtuoso’s own RDF tables. The RDF resource will them represent the item in a real object-oriented form: contacts are encoded using the NCO Ontology, emails are encoded using the NMO Ontology, and so on. Akonadi will then use an RDF encoding (we propose turtle) for all data serialization. The advantages are numerous:

  1. All PIM data does exist as nicely encoded semantic data in Nepomuk which can directly be used to relate to and from.
  2. No syncing is necessary.
  3. Serialization plugins for Akonadi can be created automatically from an ontology using a code generator. Thus, an application developer who wants to store data in Akonadi only needs to create their own ontology. A task that is necessary for Nepomuk support anyway. And let’s face it: this is what each application should include in the future ;).
  4. Only one database server running: Virtuoso.
  5. Simpler mapping from database objects to convenience objects in C++: the data in Nepomuk is already represented using an object-oriented approach.

The Virtuoso server could be started using the same “process-singleton” approach libakonadi is currently using to start Akonadi if it is not running. This would keep both Nepomuk and Akonadi free of dependancies on each other.

Possible Problems

The one possible problem remaining might be performance: It stays to be tested if reading a vcard from SQL and parsing it is much faster than querying an NCO contact resource. Volker, we need your help. Plus: I see you guys at the KDE-PIM meeting in Berlin. Lots to do. ;)