Nepomuk 2.0 and the Data Management Service

During the development of Nepomuk in the last years we have gathered a lot of knowledge and ideas about how to integrate semantics and more specifically RDF into the desktop. This knowledge was spread over several components like services, libraries, and applications. Some of it was only found as convention, some of it documented, some only in our brains. But I guess this can be seen as normal in a project which treats on new territory, tries to invent new ways of handling information on our computers.

Thus, in January of this year Vishesh and myself finally set out to define a new API that would gather all this knowledge, all the ideas in one service. This service was intended to enforce our idea of how data in the semantic desktop should be formed while at the same time providing clean and powerful methods to manipulate this data. On April 8th, after 155 commits in a separate repository, I finally merged the new data management service into kde-runtime. And after hinting at it several times it is high time that I explain its ideas and the way we implemented it.

DMS and named graphs

Before I go into details about the DMS API I would like to explain the way information is stored in Nepomuk. More specifically, which ontology entities are used (until now by convention) to describe resources and meta-data.

The following example shows the information created by the file indexer. It nicely shows how we encode the information in Nepomuk. It is encoded using the Trig serialization which is also used by the Shared-Desktop-Ontologies.

<nepomuk:/ctx/graph1> {
    nao:created “2011-05-20T11:23:45Z”^^xsd:dateTime ;
    nao:lastModified “2011-05-20T11:23:45Z”^^xsd:dateTime ;

    nie:contentSize "1286"^^xsd:int ;
    nie:isPartOf <nepomuk:/res/80b4187c-9c40-4e98-9322-9ebcc10bd0bd> ;
    nie:lastModified "2010-12-14T14:49:49Z"^^xsd:dateTime ;
    nie:mimeType "text/plain"^^xsd:string ;
    nie:plainTextContent "[...]"^^xsd:string ;
    nie:url <file:///home/nepomuk/helloworld.txt> ;
    nfo:characterCount "1249"^^xsd:int ;
    nfo:fileName "helloworld.txt"^^xsd:string ;
    nfo:lineCount "37"^^xsd:int ;
    nfo:wordCount "126"^^xsd:int ;
    a nfo:PlainTextDocument, nfo:FileDataObject .
<nepomuk:/ctx/metadatagraph1> {
    a nrl:GraphMetadata ;
    nrl:coreGraphMetadataFor <nepomuk:/ctx/graph1> .

    a nrl:DiscardableInstanceBase ;
    nao:created "2011-05-04T09:46:11.724Z"^^xsd:dateTime ;
    nao:maintainedBy <nepomuk:/res/someapp> .

There is essentially three types of information in this example. If we look at the first graph nepomuk:/ctx/graph1 we see that it contains information about one resource nepomuk:/res/file1. This information is split into two blocks for visualization purposes. The first block contains two properties nao:created and nao:lastModified. We call this information the resource-meta-data. It refers to the Nepomuk resource in the database and states when it was created and modified. This information can only be changed by the data management service. In contrast to that the second block contains the data in the resource. In this case we see file meta-data which in Nepomuk terms is just data. Here it is important to see the difference between nao:lastModified and nie:lastModified. The latter refers to the file on disk itself while the former refers to the Nepomuk resource representing the file on disk.

The second graph nepomuk:/ctx/metadatagraph1 only contains what we call graph-meta-data. A named graph (or context in Soprano terms) is a resource like anything else in Nepomuk. Thus, it has a type. Graphs containing graph-meta-data are always of type nrl:GraphMetadata and belong to exactly one non-graph-meta-data graph. The meta-data graph contains the type, the creation date, and the creating application of the actual graph. In Nepomuk we make the distinction between four types of graphs:

  • nrl:InstanceBase graphs contain normal data that has been created by applications or the user.
  • nrl:DiscardableInstanceBase graphs contain normal data that has been extracted from somewhere and can easily be recreated. This includes file or email indexing. Data in this type of graph does not need to be included in backups.
  • nrl:GraphMetadata graphs contain meta-data about other graphs.
  • nrl:Ontology graphs contain class and property definitions. Nepomuk does import all installed ontologies in its database. This is required for query parsing, data inference, and so on.

Whenever the information is changed or new information is added new graphs are created accordingly. Thus, Nepomuk always knows when which information was created by which application.

The Data Management API

Now that the data format is defined we can continue with the DMS API. The DMS provides one central multi-threaded DBus API and a KDE client library for convenience (this library currently lives in kde-runtime until it is stabilized to be moved to kdelibs or a dedicated repository once kdelibs are split). The API consists of two parts: 1. the simple API which is useful for scripts and simple applications and 2. the advanced API which holds most of the power required for systems like the file indexer or data sharing and syncing. Here I will show the client API as it uses proper Qt/KDE types.

The Simple API

The simple API contains the methods that directly spring to mind when thinking about such a service:

KJob* addProperty(const QList<QUrl>& resources,
                  const QUrl& property,
                  const QVariantList& values,
                  const KComponentData& component = KGlobal::mainComponent());
KJob* setProperty(const QList<QUrl>& resources,
                  const QUrl& property,
                  const QVariantList& values,
                  const KComponentData& component = KGlobal::mainComponent());
KJob* removeProperty(const QList<QUrl>& resources,
                     const QUrl& property,
                     const QVariantList& values,
                     const KComponentData& component = KGlobal::mainComponent());
KJob* removeProperties(const QList<QUrl>& resources,
                       const QList<QUrl>& properties,
                       const KComponentData& component = KGlobal::mainComponent());
CreateResourceJob* createResource(const QList<QUrl>& types,
                                  const QString& label,
                                  const QString& description,
                                  const KComponentData& component = KGlobal::mainComponent());
KJob* removeResources(const QList<QUrl>& resources,
                      RemovalFlags flags = NoRemovalFlags,
                      const KComponentData& component = KGlobal::mainComponent());

It has methods to set, add, and remove properties and to create and remove resources. Each method has an addition parameter to set the application that performs the modification. By default the main component of a KDE app is used. The removeResources method has a flags parameter which so far only has one flag: RemoveSubResources. I will get into sub resources another time though.

This part of the API is pretty straight-forward and rather self-explanatory. Every method will check property domains and ranges, enforce cardinalities, and reject any data that is not well-formed.

The Advanced API

The more interesting part is the advanced API.

KJob* removeDataByApplication(const QList<QUrl>& resources,
                              RemovalFlags flags = NoRemovalFlags,
                              const KComponentData& component = KGlobal::mainComponent());
KJob* removeDataByApplication(RemovalFlags flags = NoRemovalFlags,
                              const KComponentData& component = KGlobal::mainComponent());
KJob* mergeResources(const QUrl& resource1,
                     const QUrl& resource2,
                     const KComponentData& component = KGlobal::mainComponent());
KJob* storeResources(const SimpleResourceGraph& resources,
                     const QHash<QUrl, QVariant>& additionalMetadata = QHash<QUrl, QVariant>(),
                     const KComponentData& component = KGlobal::mainComponent());
KJob* importResources(const KUrl& url,
                      Soprano::RdfSerialization serialization,
                      const QString& userSerialization = QString(),
                      const QHash<QUrl, QVariant>& additionalMetadata = QHash<QUrl, QVariant>(),
                      const KComponentData& component = KGlobal::mainComponent());
DescribeResourcesJob* describeResources(const QList<QUrl>& resources,
                                        bool includeSubResources);

The first two methods allow an application to only remove the information it created itself. This is for example very important for tools like the file indexer that need to update data without touching anything added by the user like tags or comments or relations to other resources.

The method mergeResources allows to actually merge two resources. The result is that the properties and relations of the second resource will be moved to the first one, after which the second resource is deleted.

The most powerful method is without a doubt storeResources. It allows to store entire sets of resources in Nepomuk letting it sync them with existing ones automatically. The SimpleResourceGraph which is used as input is basically just a set of resources which in turn consist of a set of properties and an optional URI. The DMS will look for already existing matching resources before storing them in the database. This also means that simple resources like emails and contacts are merged automatically. Clients like Akonadi do not need to perform their own resource resolution anymore. Another variant of the same method is importResources which is probably most useful for scripts as it allows to read the resources from a file rather than a C++ struct.

Last but not least DMS has one single read-only method: describeResources. It returns all relevant information about the resources provided. This method will be used for meta-data sharing and syncing. While currently it only allows to filter sub-resources it will be extended to also allow to filter by application and permissions.

Well, that is it for the DMS for now. The plan is to port all existing tools and apps to this new API. But do not fear. In most cases this does not mean any work for you as the existing Nepomuk API can be used as always. It will then internally perform calls to DMS. The plan is to make the old Soprano-based interface read-only by KDE 4.8.

About these ads

15 thoughts on “Nepomuk 2.0 and the Data Management Service

  1. Nice, nice!
    The thing is that a real work should be done with the indexer… may be tracker can be use to do the job since strigi isn’t it good shape lately…

  2. Sounds quite nice, I think this might finally fix the tag setting on multiple files in dolphin. :)
    What about the speed compared to the current solutions like MassUpdateJob?

  3. Perhaps my question is a little bit OT but: when I add a tag to a file, this tag is stored in the file too or is stored only in the nepomuk DB?

    • No, it is not. Most file formats do not support tags anyway. A summer of code student is, however working on a writeback service which will write as much of the information as possible back to the files.

  4. Pingback: KDE 一周提交摘要(2011/7/3 & 2011/7/10) | I, KDE

  5. Pingback: Nepomuk – What Comes Next | Trueg's Blog

  6. Pingback: Un proyecto de KDE que necesita ayuda. « El Blog de Ragadast

  7. Pingback: Finding Duplicate Images Made Easy | Trueg's Blog

  8. Pingback: Something Dry: Change Notifications | Trueg's Blog

  9. Pingback: Just For The Fun Of It: Browsing Music With Nepomuk | Trueg's Blog

  10. Pingback: Nepomuk Gives Back Your CPU Cycles… | Trueg's Blog

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s