It is a sad truth: when it comes to creating data for the Nepomuk semantic desktop there are a million ways to do it wrong and basically only one way to get it right. Typically people will choose from the first set of ways. While that is of course bad they are not to blame. Who wants to read page after page of documentation and reference guide? Who wants to dive into the depth of RDF and all that ontology stuff when they just need to store a note (yes, this blog was inspired by a real problem). Nobody – that’s who! Thus, the Nepomuk API should do most of the work. Sadly it does not. It basically allows you to do everything. Resource::setProperty will happily use classes or even invalid URLs as properties without giving any feedback to the developer. Why is that? Well, I suppose there are at least three reasons: 1. Back in the day I figured people would almost always use the resource-generator to create their own convenience classes which handle the types and properties properly, 2. The Resource class is probably the oldest part of the whole Nepomuk stack, and 3. basic lack of time, drive and development power.
So what can we do about this situation? Vishesh and me have been discussing the idea of a central DBus API for Nepomuk data management a million times (as you can see today “a million” is my goto expression when I want to say “a lot”). So far, however, we could not come up with a good API that solves all problems, is future-proof (to a certain extend), and performs well. That did not change. I still do not know the solution. But I have some ideas as to what the API should do for the user in terms of data integrity.
- Ensure that only valid existing properties are used and provide a good error message in case a class or an invalid URL or something non-existing is used instead. This would also mean that one could only use ontologies that have been imported into Nepomuk. But since the ontology loader already supports fetching ontologies from the internet this should not be a big problem.
- Ensure that the ranges of the properties are honoured. This is pretty straight-forward for literal ranges. In that case we could also do some fancy auto-conversion to simplify the usage but in essence it is easy. The case of a non-literal ranges is a bit more tricky. Do we want to force proper types or do we assume that the object resource has the required type? I suppose flags would be of use:
- ClosedWorld – It is required that the object resource has the type of the range. If it has not the call fails.
- OpenWorld – The object resource will simply get the range type. This is not problem since resources can be of several types.
This would also mean that each property needs to have a properly defined range. AFAIK this is currently not the case for all NIE ontologies. I think it is time for ontology unit tests!
- Automatically handle pimo:Things to a certain extend: Here I could imagine that trying to add a PIMO property on a resource would automatically add it to the related pimo:Thing instead.
Moving this from the client library into a service would have other benefits, too.
- The service could be used from other languages than C++ or even from applications not using KDE.
- The service could perform optimizations when it comes to storing triples, updating resources, caching, you name it.
- The service could provide change notifications which are much more useful than the Soprano::Model signals which are pretty useless.
- The service could perform any number of integrity tests before executing the actual commands on the database, thus improving the quality of the data in Nepomuk altogether.
This blog entry is not about presenting the one solution to solve all our problems. It is merely a brain-dump, trying to share some of the random thoughts that go through my head when taking a walk in the woods. Nonetheless this is an issue that needs tackling at one point or another. In any case my ideas are saved for the ages. :)