Portable Meta-Information Yet Again (Only this time there is code!)

There has been quite a lot of discussion about portable meta data lately. David Nolden blogged about using sidecar files, Jos van den Oever replied, and many people commented. So I felt compelled to give my 2 cents as well. The only difference is: I wrote some code. To be precise, I wrote a Nepomuk service.

But first things first: what is my opinion on the matter? Obviously, I do not think sidecar files are the solution. For one, they only provide a means to store simple key/value pairs (which would throw us back into the meta data stone age) and no way to link between files. Secondly, Nepomuk aims to store far more than just file meta data. It also stores meta data about emails, persons and projects, and it even stores resources that do not exist anywhere else (yes, the idea is that you create entities in Nepomuk and only there; an example would be the city of Berlin, which links to its Wikipedia page). Last but not least, we want to query the vast graph of meta data, which is impossible using sidecar files (but Jos and others have already mentioned that).

Thus, we need a different solution. We need to keep the central Nepomuk store, but it alone does not seem to solve the problem. So I came up with an idea (only to be pointed to the "Metadata on Removable Devices" idea on live.gnome.org a few hours later; it looks very similar indeed).

Basically, instead of having sidecar files all over the place, each removable storage device has a single meta data file (stored in some obscure location like .cache/metadata/nepomuk.turtle). This file contains all meta data related to the files stored on the device (excluding the information that can be extracted by Strigi) as well as very basic information about the resources those files are related to (if, for example, a file is related to a project, we only store the project’s type and its label). File paths are stored relative to the storage root. Example:

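@prefix nao:  <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#> .
@prefix nfo:  <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
@prefix pimo: <http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#> .

<file:/Pictures/IMG_0012.jpg>
    a nfo:FileDataObject ;
    nfo:fileUrl <file:/Pictures/IMG_0012.jpg> ;
    nao:hasTag <nepomuk:/tags/Summer08> ;
    nao:relatedTo <nepomuk:/KDE> .

<nepomuk:/KDE>
    a pimo:Project ;
    nao:prefLabel "KDE" .

<nepomuk:/tags/Summer08>
    a nao:Tag ;
    nao:prefLabel "Summer 08" .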
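To make the selection rule a bit more concrete, here is a minimal sketch in Python. It is not the service's actual code: it assumes the store is available as a plain list of (subject, predicate, object) tuples with prefixed names as strings, and the names `Literal` and `select_for_cache`, as well as the example predicates in `EXTRACTED_PREDICATES`, are hypothetical stand-ins for whatever Strigi can really re-extract.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A literal value, kept distinct from URIs (plain strings)."""
    value: str


# (subject, predicate, object): the object is a URI string or a Literal.
Triple = Tuple[str, str, Union[str, Literal]]

RDF_TYPE = "rdf:type"
PREF_LABEL = "nao:prefLabel"

# Hypothetical example set: properties Strigi can re-extract from the file
# itself, so there is no point in storing them on the device.
EXTRACTED_PREDICATES = {"nie:plainTextContent", "nfo:width", "nfo:height"}


def select_for_cache(triples: List[Triple], mount_url: str) -> List[Triple]:
    """Pick the triples that should go into a device's meta data cache file."""
    prefix = mount_url.rstrip("/") + "/"

    def on_device(node: Union[str, Literal]) -> bool:
        return isinstance(node, str) and node.startswith(prefix)

    selected: List[Triple] = []
    related: Set[str] = set()

    # 1. All non-extractable meta data about files stored on the device.
    for s, p, o in triples:
        if on_device(s) and p not in EXTRACTED_PREDICATES:
            selected.append((s, p, o))
            # Collect linked resources (tags, projects, ...), but not
            # classes (objects of rdf:type) or other files on the device.
            if p != RDF_TYPE and isinstance(o, str) and not on_device(o):
                related.add(o)

    # 2. For each linked resource keep only its type and its label.
    for s, p, o in triples:
        if s in related and p in (RDF_TYPE, PREF_LABEL):
            selected.append((s, p, o))

    return selected
```

The result still contains absolute URLs; those would have to be made relative to the storage root before the file is written.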

Once the device is mounted, this information is imported into the local Nepomuk store (using a temporary graph) and the relative URLs are replaced with absolute ones according to the mount point. This allows these files to be searched like any other local files. If the meta data changes, it needs to be written back to the cache file, again converting the absolute URLs back to relative ones. This way the data can simply be reused on any Nepomuk-enabled system.
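The URL rewriting in both directions boils down to swapping the mount point for the storage root. A minimal sketch, assuming POSIX paths and URLs without percent-encoded characters; `cache_file_path`, `to_absolute` and `to_relative` are hypothetical helper names, not part of libnepomuk:

```python
from pathlib import PurePosixPath

# Location of the cache file relative to the storage root (as described above).
CACHE_FILE = ".cache/metadata/nepomuk.turtle"


def cache_file_path(mount_point: str) -> str:
    """Where the meta data file lives for a device mounted at mount_point."""
    return str(PurePosixPath(mount_point) / CACHE_FILE)


def to_absolute(uri: str, mount_point: str) -> str:
    """Import: file:/Pictures/a.jpg -> file:///media/stick/Pictures/a.jpg."""
    # Leave non-file resources (nepomuk:/KDE) and already absolute
    # file:/// URLs untouched.
    if not uri.startswith("file:/") or uri.startswith("file://"):
        return uri
    relative = uri[len("file:/"):]  # e.g. "Pictures/a.jpg"
    return "file://" + str(PurePosixPath(mount_point) / relative)


def to_relative(uri: str, mount_point: str) -> str:
    """Write back: file:///media/stick/Pictures/a.jpg -> file:/Pictures/a.jpg."""
    prefix = "file://" + str(PurePosixPath(mount_point)) + "/"
    if not uri.startswith(prefix):
        return uri  # not a file on this device
    return "file:/" + uri[len(prefix):]
```

Applying `to_relative` after `to_absolute` with the same mount point gives back the original URL, which is what makes the cache file reusable on a machine that mounts the device somewhere else.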

The service I implemented (which can currently be found in the Nepomuk playground) does the importing automatically. However, writing the data back has to be triggered manually. This is where integration with KIO would be necessary.

IMHO this is already a nice start. However, a few things are still not solved:

  • As mentioned, data is not written back automatically. KIO should somehow trigger that.
  • libnepomuk is not aware of removable devices yet, and thus the data will be stored twice: once in the local store and once in the cache file. I see two solutions: 1. delete all traces of the meta data locally and only keep the cache file, or 2. use relative URLs locally, too, and link the file resources to a volume resource that describes the removable storage. The latter would also make it possible to find files on storage devices that are not currently mounted.
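The second option could be modelled roughly like this. This is a hypothetical sketch in plain Python rather than Nepomuk's ontology or API: `Volume`, `FileResource` and `find_by_tag` are made-up names, and identifying the storage by a filesystem UUID is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Volume:
    """Hypothetical resource describing one removable storage device."""
    uuid: str                          # assumed stable ID, e.g. filesystem UUID
    label: str                         # human-readable name
    mount_point: Optional[str] = None  # None while the device is unplugged


@dataclass
class FileResource:
    """A file known to the local store, located via its volume."""
    relative_path: str                 # relative to the storage root
    volume: Volume
    tags: Set[str] = field(default_factory=set)

    def absolute_path(self) -> Optional[str]:
        # Only resolvable while the volume is mounted.
        if self.volume.mount_point is None:
            return None
        return self.volume.mount_point.rstrip("/") + self.relative_path


def find_by_tag(files: List[FileResource], tag: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (volume label, relative path, absolute path or None) per match."""
    return [
        (f.volume.label, f.relative_path, f.absolute_path())
        for f in files
        if tag in f.tags
    ]
```

Since every file only points at its volume, the files stay searchable while the device is unplugged, and a new mount point only has to be set in one place.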

Well, that is it for now, I think. The service is there, it works, and it at least gives an idea of a possible solution. If anyone is up to the task of perfecting it, please step up. :)