Akonepomuk, Neponadi – Friendly Takeover or Real Love Marriage

Meeting with developers is always great. Not only does one get challenged and dragged into interesting discussions, often the result is an actual plan or a concrete idea. In this case my trip to Oslo and the visit to the Trolltech (ah, sorry, Nokia) offices produced at least three results: 1. a lively discussion about the possibilities that the Semantic Desktop could offer (including a promise from a bunch of developers to add wishes describing these ideas to the KDE bug database); 2. the plan for a Nepomuk developer sprint in the next months (more about that in the next days); 3. a very fruitful discussion with Tobias König (currently doing an internship at Nokia) about the integration of Nepomuk into Akonadi or vice versa. Here I will present the results of that last discussion.

The current situation

Currently (meaning: in KDE 4.2) both Akonadi and Nepomuk use their own databases: Akonadi starts its own MySQL server and Nepomuk runs a Java-based RDF backend. Many people have problems with both MySQL and Java. I personally can understand the latter. In any case this separation of data means that at least parts of the Akonadi data need to be mirrored in Nepomuk. Otherwise we cannot search for PIM data, we cannot link to PIM items, and we cannot create relations between PIM items.

This mirroring of data is an ugly thing since it means that we convert the data in Akonadi (stored as vCard, iCalendar, or other formats) into RDF graphs. This is currently done by two Akonadi agents which can be found in the kdepim package. Whenever data in Akonadi changes, the copy in Nepomuk has to be synced. I personally can see no advantage in this situation.

A possible solution worth discussing

The possible solution Tobias and I came up with looks as follows:

Akonadi and Nepomuk will share one database – a Virtuoso SQL/RDF server. (For a discussion of the new Virtuoso support in Soprano see my last blog entry.) This is achieved as follows: the current database schema of Akonadi will be kept except for the parts tables which at the moment store the actual mime data of the items. Instead the item table will get a new column which points to an RDF resource in Virtuoso’s own RDF tables. The RDF resource will then represent the item in a real object-oriented form: contacts are encoded using the NCO Ontology, emails are encoded using the NMO Ontology, and so on. Akonadi will then use an RDF encoding (we propose Turtle) for all data serialization. The advantages are numerous:

  1. All PIM data exists as nicely encoded semantic data in Nepomuk, which can be linked to and from directly.
  2. No syncing is necessary.
  3. Serialization plugins for Akonadi can be created automatically from an ontology using a code generator. Thus, an application developer who wants to store data in Akonadi only needs to create their own ontology. A task that is necessary for Nepomuk support anyway. And let’s face it: this is what each application should include in the future ;).
  4. Only one database server running: Virtuoso.
  5. Simpler mapping from database objects to convenience objects in C++: the data in Nepomuk is already represented using an object-oriented approach.
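To make the idea a bit more tangible, here is a minimal sketch of the proposed layout. It is only an illustration under loud assumptions: an in-memory SQLite database stands in for Virtuoso, the table and column names (`item`, `rdf_resource`, `triple`) are made up for this example and are not the real Akonadi schema, and only `nco:PersonContact` and `nco:fullname` are taken from NCO. The item row carries no payload, just a pointer to the RDF resource, and the Turtle output is produced directly from the triples.

```python
import sqlite3

NCO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def create_store():
    """Create an in-memory SQLite database standing in for the shared server."""
    db = sqlite3.connect(":memory:")
    # Hypothetical, simplified item table: no payload column, only a
    # pointer to the RDF resource that represents the item.
    db.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, mimetype TEXT, rdf_resource TEXT)"
    )
    # Toy triple table standing in for the server's own RDF storage.
    db.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT, o_is_uri INTEGER)")
    return db


def add_contact(db, uri, fullname):
    """Store a contact as triples and register it as an item; return the item id."""
    db.executemany(
        "INSERT INTO triple VALUES (?, ?, ?, ?)",
        [
            (uri, RDF_TYPE, NCO + "PersonContact", 1),
            (uri, NCO + "fullname", fullname, 0),
        ],
    )
    cur = db.execute(
        "INSERT INTO item (mimetype, rdf_resource) VALUES (?, ?)",
        ("text/directory", uri),  # illustrative mimetype only
    )
    return cur.lastrowid


def item_as_turtle(db, item_id):
    """Serialize the RDF resource behind an item as Turtle, one triple per line."""
    (uri,) = db.execute(
        "SELECT rdf_resource FROM item WHERE id = ?", (item_id,)
    ).fetchone()
    lines = []
    rows = db.execute(
        "SELECT p, o, o_is_uri FROM triple WHERE s = ? ORDER BY rowid", (uri,)
    )
    for p, o, o_is_uri in rows:
        if o_is_uri:
            obj = "<" + o + ">"
        else:
            # Only backslashes and double quotes are escaped in this sketch.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"'
        lines.append("<" + uri + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


db = create_store()
item_id = add_contact(db, "urn:example:contact:1", "Tobias König")
print(item_as_turtle(db, item_id))
```

Note that nothing is converted or synced here: the same triples that a Nepomuk query would see are the ones the serialization is built from, which is the whole point of advantages 1 and 2.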

The Virtuoso server could be started using the same “process-singleton” approach libakonadi is currently using to start Akonadi if it is not running. This would keep both Nepomuk and Akonadi free of dependencies on each other.
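Roughly, "start it if it is not running" could look like the sketch below. This is not how libakonadi does it; it is a generic illustration that probes a TCP port and only launches the server when nothing answers. The function names, the host and port, and the command passed in are all placeholders for this example.

```python
import socket
import subprocess
import time


def is_listening(host, port, timeout=0.2):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_server(host, port, command, wait=5.0):
    """Start `command` only if nothing is listening on host:port.

    Returns the Popen handle if this call started the server, or None if a
    server was already running. Two clients calling this at the same moment
    could still both start a server; a real implementation would need a
    lock or a registered service name to close that race.
    """
    if is_listening(host, port):
        return None
    proc = subprocess.Popen(command)
    deadline = time.monotonic() + wait
    while time.monotonic() < deadline:
        if is_listening(host, port):
            return proc
        if proc.poll() is not None:
            break  # the process exited before it started listening
        time.sleep(0.05)
    proc.terminate()
    raise RuntimeError("server did not come up on %s:%d" % (host, port))
```

Whichever client comes first, Akonadi or Nepomuk, would call something like `ensure_server(...)` and then simply connect, so neither side has to know about the other.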

Possible Problems

The one possible problem remaining might be performance: it remains to be tested whether reading a vCard from SQL and parsing it is much faster than querying an NCO contact resource. Volker, we need your help. Plus: I see you guys at the KDE-PIM meeting in Berlin. Lots to do. ;)


34 thoughts on “Akonepomuk, Neponadi – Friendly Takeover or Real Love Marriage”

  1. “Akonadi and Nepomuk will share one database – a Virtuoso SQL/RDF server. (For a discussion of the new Virtuoso support in Soprano see my last blog entry.) ”

    The extra integration and reduced redundancy is great, but at 80MB overhead for Virtuoso (from your last post), won’t this penalise people who want to run Akonadi (which in the future will equate to “people who want to use pretty much any KDE PIM application”) but not have the overhead of Virtuoso? The existing MySQL server doesn’t use anywhere near this much RAM.

    Also, isn’t Akonadi targeting the n810, with its scant 128MB RAM?

    Anyway, hope this doesn’t sound like too much of a downer – I’m excited by the possibilities of Akonadi and Nepomuk :)

    • Well, this is just the beginning. Who knows how small we can get the memory footprint…
      Plus: I just checked and even dbus-daemon uses more memory than that. Don’t get me started on Firefox and Konqueror. ;)

  2. Similarly, while doing this redesign, it would make sense to consider the needs of Amarok, Digikam and others. At some point those programs should probably start using the Nepomuk store for their metadata. Even if that does not happen right now, it would be prudent to at least think about what, if any, requirements that would place on Nepomuk, and what overlap there might be with the PIM case. The Amarok/Digikam case should be much simpler than the PIM case, since it is more similar to what Nepomuk was designed to do. Identical? Maybe, maybe not.

  3. “Well, this is just the beginning. Who knows how small we can get the memory footprint…”

    Ok, that’s cause for optimism :)

    “Plus: I just checked and even dbus-daemon uses more memory than that. Don’t get me started on Firefox and Konqueror. ;)”

    Erm … can I ask how you’re measuring the memory usage, here? That seems extremely high :)

  4. Not to discourage you but I’m wondering if all this Akonadi Nepomuk Soprano Virtuoso mumbo jumbo is of any real use to the average user. I’m sure hundreds if not thousands of KDE users would much rather have a working port of K3B for KDE4… *sigh*

  5. dbus-daemon using more than 80MB? That seems quite high….

    On my laptop (4.2, Kubuntu) dbus-daemon is running twice, as root and as my user. Combined they are only using 2.5MB of ram. (And they might share code, this is from KDE’s system activity alt-esc window).

  6. “Plus: I just checked and even dbus-daemon uses more memory than that. Don’t get me started on Firefox and Konqueror. ;)” ???? really ?????

    Re: DigiKam and Amarok, I think both of those projects have said that their reason for not using Nepomuk is speed. I’m interested to hear whether Virtuoso is sufficient to remedy this and whether these apps are likely to migrate if Virtuoso becomes the default for Nepomuk.

  7. “Grendel: no, you may not ask. It will only make me look stupid. I can feel it! ;) Well, I used htop. It is very well possible that I am reading the wrong values.”

    Hehe :) VIRT is more or less useless; RES is probably the most accurate.


    “Not to discourage you but I’m wondering if all this Akonadi Nepomuk Soprano Virtuoso mumbo jumbo is of any real use to the average user. I’m sure hundreds if not thousands of KDE user would much rather want a working port of K3B for KDE4… *sigh*”

    I’m very much looking forward to seeing Nepomuk and Akonadi integrated ubiquitously – and besides, Nepomuk is what Sebastian is paid to work on. If it wasn’t for Nepomuk, it’s possible that we would see much less output from him, which would be a shame.

  8. I certainly like the idea of removing the data duplication, which indeed is a pain if you have multiple gigabytes of mail.

    Looks like this will also solve the Akonadi search problem, as it would bring semantic search support to Akonadi, as Akonadi can just query Virtuoso. This is great.

    But, with my mail hat on, I have one concern: For mail clients, it is important that the mails they put into Akonadi come back exactly the same, i.e. no changes in the message structure, like subtle encoding changes or changes in the MIME structure.

    Currently, NMO looks like it will not do that at all, it says things like “Plain text representation of the body of the message. For multipart messages, all parts are concatenated into the value of this property.”
    Looks like NMO only knows about HTML and plain text parts, and that it doesn’t store the proper MIME structure of the mail, which is essential. Also, think about signed messages or message attachments.

    So how would we deal with that? Define a new ontology that is much more fine grained than the current one and fully supports MIME? That sounds like an error-prone thing to me, but nevertheless possible.
    Or did I miss something here?

  9. @ fish: PIM is one of the main things people use computers for nowadays, and so having the most advanced and integrated PIM system in the world will be a key selling point for KDE (particularly to businesses, I would expect), especially if it integrates nicely with their existing systems (like Exchange). Having a strong desktop search system integrated into KDE is important for competing in today’s computer world, with Windows, Mac, Gnome, and Google all having it, and being able to search your PIM programs with it is a key feature. Therefore, making them work well together and increasing their overall performance is also very important.

  10. My 10 minutes impression on virtuoso is so far that it is a fragile piece of shit, and I’m still at the build system.

    I don’t think it is a good idea as it is now. Maybe in a year, when Virtuoso has had time to grow up.

  11. Hmmmm…I don’t know much about SPARQL or RDF ontologies but…wouldn’t it be better to use one MySQL server and share it between Amarok, Akonadi, Nepomuk and so on?
    Is it impossible to map that data to SQL?
    Why not write an abstraction layer to query and store the Nepomuk stuff in a MySQL db?
    As mentioned before I don’t have the complete idea of what you will do with Nepomuk in the future, but MySQL is really fast when it comes to big relational queries like tag models or variable data models with 1:n field relations.
    Perhaps it’s possible to use one layer of Virtuoso and put it on top of MySQL?

  12. I’m worried about two things here:

    1. That you seem to be skirting around the issue of memory footprint when KDE is supposed to be portable to many platforms (even n810’s and cellphones). Memory footprint is paramount. Don’t keep bringing up other projects that suck at memory management (firefox, konqueror), just make sure that Akonadi/Nepomuk doesn’t.

    2. Virtuoso is largely untested. What about performance/speed? It needs to fly. And for it to be tested properly, it needs to be managing a huge amount of diverse data. MySQL is tried and tested. When using huge databases, MySQL starts to shine – does Virtuoso?

    I’m sure you’re already thinking of this stuff. But please, please make sure you put a lot of emphasis on these two things.

  13. When I read about the first sketches for Akonadi I didn’t really understand why they were doing all that at all – when at the same time there was Nepomuk building a universal metadata system already. Why didn’t they consider using that right from the start?
    So yeah, I’m all for the integration of the two!

    I’m currently using Virtuoso as an RDF store for a project myself and I really like its list of features. Data integration is their main goal after all.
    Btw, it lets you create RDF views on every table, so live conversion of fixed schema data into RDF and querying it is really simple.

  14. “Only one database server running: Virtuoso.”
    If you’re not running any other database server of course… In relation to the choice of MySQL for Akonadi I read some comments about the choice for this server which is forced upon the user. Wouldn’t it be an idea to implement an ORM which other KDE apps could use as well?? Just an idea…

  15. Just in addition to my last comment:

    I believe 80Mb is unacceptable. If you’re going to be marrying Akonadi and Nepomuk then that means that if you have a footprint of anything over 20Mb then you’ll be destroying our chance to bring things like PIM to small form-factor devices. And that would be a crying shame!

  16. What’s wrong with SQLite databases for everything? It has nearly no memory overhead and is quite fast nowadays.

    Also, Virtuoso is licensed GPL, which would taint kdelibs’ LGPL.

  17. @ fish: I have a feeling that all the Nepomuk/Akonadi/Soprano/Virtuoso mumbo jumbo will be an amazingly useful thing once it is all complete and working together. Until all the applications are taking advantage of these technologies, no, they don’t mean much to the average user, but when they are all doing their thing in the background (without even being noticed by the average user) it will become a lot more obvious what the benefits are.

    Having said that a KDE4 port of K3B would be great too – i wonder who will step up and do it? (maybe someone already has, i haven’t checked for a while). I guess some people are too busy working on other things…

  18. That sounds amazazazaing! Sebastian, keep up the great work.

    I have to say I highly believe in the merits of Nepomuk. From a user’s POV, the one point where Nepomuk is lacking most currently (4.2, that is) is integration, integration, integration, I think. So it is amazing to see you are working on this and collaborating extensively with other parts of the infrastructure.

    I think the better Nepomuk is integrated into applications and other data sources (like KDEPIM and Amarok), the more data it can gather and the more useful it will get. The other thing lacking currently is actually presenting the gathered data to the user, but there is also work going on there, e.g. the plasmoid Sebas is working on, and hopefully we’ll also get better integration into e.g. dolphin.

    I think a ‘turning point’ will be reached when applications start using nepomuk to present their own data: E.g. digikam sorting and filtering based on nepomuk ratings and categories, or kmail filtering by tags set via nepomuk. That would provide true integration between applications, the ultimate death to app-lock-in! Yay.

  19. Thomas: as for the ontology: NMO needs improvement and work on that is under way. The MIME problem is already being discussed. I also hope to get more input from you KDE-PIM guys on that. Discussions take place in the Nepomuk ontology bug tracker:

    Michael: writing a layer for MySQL – be my guest. That is A LOT of work.

    Socceroos: I agree that the arguments are bad. The only good argument is that using Virtuoso instead of sesame2 brings down the mem usage a lot. So it is a step in the right direction. Might not be the final perfect solution though.
    Virtuoso is not largely untested. It is not something that just popped up. It is a mature DB system that is used in many commercial environments and many Semantic Web systems such as DBPedia for example.

    Simon: sounds like you already have some experience. Maybe you could assist in the integration tests?

    Lucien: GPL is no problem as Virtuoso is started in another process and accessed via a plugin to Soprano. So 2 layers of no-licensing-problem.
    As for SQLite: way too slow, does not scale enough, and again it would need an RDF layer implemented on top, which is way too much work ATM.

    Frando: it is really nice noticing that there are people that understand the goal of Nepomuk, even though I tend to give bad explanations. :)

  20. Elias: yes, thanks a lot. Interesting and necessary read. However, it sadly does not change the fact that Virtuoso in lite mode (at least the beta I have running) uses ~ 80M. But I did not do any optimizations in settings yet. So there is hope…

  21. I’m just a KDE user, but I think what has to be considered (at least) is a data access API/layer that is able to work with multiple data backends (possibly even simultaneously), and provide this API to any application willing to use it. Even if just one storage backend is implemented at the beginning there could still be an option to implement more later without changes to the user applications.

    So, in an ideal world, if I (OEM, advanced user, etc.) want to run apps XYZ on a low-powered phone with a very small dataset, I could use sqlite backend; if I wanted to set up an akonadi/nepomuk/amarok server for 1000s of users, I could use Oracle, and so on.

  22. Sune:
    You seem to base your impression of Virtuoso entirely on the build system. Consider that it started life as a commercial product (which is now mature and somewhat widely used), and commercial products are usually distributed in binary form only. Only the original developers have had to deal with the build system most of the time, so they had no need to polish it: no value for customers.
    I think that this is likely to change, if only because open source communities will be able to contribute fixes.

  23. please get rid of the mysql dep! i tried upgrading my kubuntu install to 4.2, and it broke my specially configured (manually installed) mysql, which i need for work.

  24. Virtuoso rocks my socks, I finally managed to package it and make it work. On my box, it never grows over 35.7MB in RAM (version 5.0.11) and CPU usage while indexing stays under 25%. And shit, it’s fast. Way to go at once, even if the buildsystem is friggin’ scary.
