Conditional Sharing – Virtuoso ACL Groups Revisited

Previously we saw how ACLs can be used in Virtuoso to protect different types of resources. Today we will look into conditional groups, which allow us to share resources or grant permissions to a dynamic group of individuals. This means that we do not maintain a list of group members; instead we define a set of conditions which an individual needs to fulfill in order to be part of the group in question.

That does sound very dry. Let’s just jump to an example:

@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
[] a oplacl:ConditionalGroup ;
  foaf:name "People I know" ;
  oplacl:hasCondition [
    a oplacl:QueryCondition ;
    oplacl:hasQuery """ask where { graph <urn:my> { <urn:me> foaf:knows ^{uri}^ } }"""
  ] .

This group is based on a single condition: a simple SPARQL ASK query. The query contains the placeholder ^{uri}^, which the ACL engine replaces with the URI of the authenticated user. The group thus contains anyone who is in a foaf:knows relationship with urn:me in the named graph urn:my. (Ideally the latter graph should be write-protected using ACLs as described before.)
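
To make this concrete: when, for example, my Facebook identity (which we will add to the group further below) authenticates, the ACL engine effectively evaluates something along these lines (a sketch; the substitution happens internally):

ask where {
  graph <urn:my> {
    <urn:me> foaf:knows <http://www.facebook.com/sebastian.trug>
  }
}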

To use this group in ACL rules we first need to create it:

$ curl -X POST \
    --data-binary @group.ttl \
    -H"Content-Type: text/turtle" \
    -u dba:dba \
    http://localhost:8890/acl/groups

As a result we get a description of the newly created group which also contains its URI. Let’s imagine this URI is http://localhost:8890/acl/groups/1.

To mix things up we will use the group to grant access to a service instead of to files or named graphs. Like many Virtuoso-hosted services, the URI Shortener is ACL-controlled, so we can restrict access to it with ACL rules.

As always the URI Shortener has its own ACL scope which we need to enable for the ACL system to kick in:

sparql
prefix oplacl: <http://www.openlinksw.com/ontology/acl#>
with <urn:virtuoso:val:config>
delete {
  oplacl:DefaultRealm oplacl:hasDisabledAclScope <urn:virtuoso:val:scopes:curi> .
}
insert {
  oplacl:DefaultRealm oplacl:hasEnabledAclScope <urn:virtuoso:val:scopes:curi> .
};

Now we can go ahead and create our new ACL rule which allows anyone in our conditional group to shorten URLs:

@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
[] a acl:Authorization ;
  oplacl:hasAccessMode oplacl:Write ;
  acl:accessTo <http://localhost:8890/c> ;
  acl:agent <http://localhost:8890/acl/groups/1> ;
  oplacl:hasScope <urn:virtuoso:val:scopes:curi> ;
  oplacl:hasRealm oplacl:DefaultRealm .

Finally we add one URI to the conditional group as follows:

sparql
insert into <urn:my> {
  <urn:me> foaf:knows <http://www.facebook.com/sebastian.trug> .
};

As a result my Facebook account has access to the URI Shortener:

[Screenshot: Virtuoso URI Shortener]
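
Revoking the access later is just as simple: removing the triple again takes the identity out of the conditional group. A sketch using standard SPARQL 1.1 Update syntax, which Virtuoso supports as well:

sparql
delete data {
  graph <urn:my> {
    <urn:me> foaf:knows <http://www.facebook.com/sebastian.trug> .
  }
};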

The example we saw here uses a simple query to determine the members of the conditional group. The queries can get much more complex, and multiple query conditions can be combined. In addition Virtuoso supports a set of non-query conditions (see also oplacl:GenericCondition). The most basic one is the following, which matches any authenticated person:

@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
[] a oplacl:ConditionalGroup ;
  foaf:name "Valid Identifiers" ;
  oplacl:hasCondition [
    a oplacl:GroupCondition, oplacl:GenericCondition ;
    oplacl:hasCriteria oplacl:NetID ;
    oplacl:hasComparator oplacl:IsNotNull ;
    oplacl:hasValue 1
  ] .

This shall be enough on conditional groups for today. There will be more playing around with ACLs in the future…

Protecting And Sharing Linked Data With Virtuoso

Disclaimer: Many of the features presented here are rather new and cannot be found in the open-source version of Virtuoso.

Last time we saw how to share files and folders stored in the Virtuoso DAV system. Today we will protect and share data stored in Virtuoso’s Triple Store – we will share RDF data.

Virtuoso is actually a quadruple store, which means each triple lives in a named graph. In Virtuoso named graphs can be public or private (in reality it is a bit more complex than that, but this view is sufficient for our purposes). Public graphs are readable and writable by anyone who has permission to read or write in general; private graphs are only readable and writable by administrators and by those who have been granted named-graph permissions. The latter case is what interests us today.

We will start by inserting some triples into a named graph as dba – the master of the Virtuoso universe:

[Screenshots: inserting triples via the Virtuoso SPARQL endpoint and the resulting query output]
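
In essence the insert is a statement along the following lines, executed as dba in the SPARQL endpoint (the triples are placeholders; any data will do):

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
insert into <urn:trueg:demo> {
  <urn:trueg:demo:thing1> rdfs:label "Some private demo data" .
}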

This graph is now public and can be queried by anyone. Since we want to make it private, we quickly switch to a SQL session; this step is typically performed by an application rather than manually:

$ isql-v localhost:1112 dba dba
Connected to OpenLink Virtuoso
Driver: 07.10.3211 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'urn:trueg:demo');

Done. -- 2 msec.

Now our new named graph urn:trueg:demo is private and its contents cannot be seen by anyone. We can easily test this by logging out and trying to query the graph:

[Screenshots: SPARQL query against the private graph and its empty result]
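
Logged out, a simple query like the following comes back empty (or with a security error, depending on the configuration), confirming that the graph is now private:

select * where {
  graph <urn:trueg:demo> { ?s ?p ?o }
}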

But now we want to share the contents of this named graph with someone. Like before we will use my LinkedIn account. This time, however, we will not use a UI but Virtuoso’s RESTful ACL API to create the necessary rules for sharing the named graph. The API uses Turtle as its main input format. Thus, we will describe the ACL rule used to share the contents of the named graph as follows.

@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<#rule> a acl:Authorization ;
  rdfs:label "Share Demo Graph with trueg's LinkedIn account" ;
  acl:agent <http://www.linkedin.com/in/trueg> ;
  acl:accessTo <urn:trueg:demo> ;
  oplacl:hasAccessMode oplacl:Read ;
  oplacl:hasScope oplacl:PrivateGraphs .

Virtuoso makes use of the ACL ontology proposed by the W3C and extends it with several custom classes and properties in the OpenLink ACL Ontology. Most of this little Turtle snippet should be obvious: we create an Authorization resource which grants Read access to urn:trueg:demo for the agent http://www.linkedin.com/in/trueg. The only tricky part is the scope. Virtuoso has the concept of ACL scopes, which group rules by the type of resource they protect. In this case the scope is private graphs; another typical scope would be DAV resources.

Given that the file rule.ttl contains the above resource, we can post the rule via the RESTful ACL API:

$ curl -X POST --data-binary @rule.ttl -H"Content-Type: text/turtle" -u dba:dba http://localhost:8890/acl/rules

As a result we get the full rule resource including additional properties added by the API.

Finally we will login using my LinkedIn identity and are granted read access to the graph:

[Screenshots: SPARQL endpoint login via WebID and the query results showing the private graph's triples]

We see all the original triples in the private graph. And as before with DAV resources, no local account is necessary to get access to named graphs. Of course we can also grant write access, use groups, etc. But those are topics for another day.

Technical Footnote

Using ACLs with named graphs as described in this article requires some basic configuration. The ACL system is disabled by default. In order to enable it for the default application realm (another topic for another day) the following SPARQL statement needs to be executed as administrator:

sparql
prefix oplacl: <http://www.openlinksw.com/ontology/acl#>
with <urn:virtuoso:val:config>
delete {
  oplacl:DefaultRealm oplacl:hasDisabledAclScope oplacl:Query , oplacl:PrivateGraphs .
}
insert {
  oplacl:DefaultRealm oplacl:hasEnabledAclScope oplacl:Query , oplacl:PrivateGraphs .
};

This will enable ACLs for named graphs and SPARQL in general. Finally the LinkedIn account from the example requires generic SPARQL read permissions. The simplest approach is to just allow anyone to SPARQL read:

@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#rule> a acl:Authorization ;
  rdfs:label "Allow Anyone to SPARQL Read" ;
  acl:agentClass foaf:Agent ;
  acl:accessTo <urn:virtuoso:access:sparql> ;
  oplacl:hasAccessMode oplacl:Read ;
  oplacl:hasScope oplacl:Query .

I will explain these technical concepts in more detail in another article.

Virtuoso Open-Source Moved to GitHub

Ever since 2006 OpenLink Software has provided its Open-Source version of Virtuoso (VOS), the high-performance SQL server with a powerful RDF/SPARQL data management layer on top.

So far the sources have been developed in an internal CVS repository which was published through the Virtuoso SourceForge pages.

As of March 21, OpenLink took the next step towards Open Development by moving to git as its version management system. The sources are now hosted in the VOS GitHub repository.

As mentioned on the VOS git usage pages, OpenLink now accepts GitHub pull requests and patches. Be sure to read the notes on the git branching policy in VOS, which is based on the git-flow approach by Vincent Driessen – which, by the way, is an interesting read independent of VOS.

Most importantly it is now a lot simpler to follow the development of Virtuoso Open-Source. Simply clone the git repository and switch to the appropriate develop branch:

$ git clone git://github.com/openlink/virtuoso-opensource.git
$ cd virtuoso-opensource
$ git checkout -t remotes/origin/develop/6

For details on the used branches see the already mentioned VOS git usage guide.

Refer to the VOS building instructions if the following is not enough for you:

$ ./autogen.sh
$ ./configure --prefix=/usr/local --with-layout=<LAYOUT>
$ make
$ make install

where <LAYOUT> is one of Gnu, Debian, Gentoo, Redhat, Freebsd, opt, Openlink. The latter two force the prefix.

You do not need to know RDF or FOAF to use WebID

After going into way too much detail about FOAF and how to create your own WebID manually the last time, today I will show you how easy it is when you have a system that supports WebID properly.

The system I am talking about is ODS – The OpenLink Data Spaces. Please do not be scared away by the UI which does not offer all the fanciness of today’s Web interfaces. The point with ODS is its backend (which BTW is entirely based on Virtuoso, including the serving of the web pages).

Getting your own WebID through ODS consists of only two steps: 1. create an ODS account, and 2. let ODS create the X.509 certificate for you. As mentioned before, you need to trust OpenLink with your private key in this case. If you are not willing to do that you can set up your own instance of ODS – more about that another time.

Creating an ODS account

So start by navigating to the public instance of ODS in order to create an account. In the upper right corner you will find a little “Sign Up” button which will get you here:

As you can see there are several ways to sign up for an ODS account, one of which is of course WebID. Since you want to use ODS to create your WebID you need to use another means: plain old username+password or something OAuth-powered like LinkedIn.

Generating your X.509 certificate

Once you have created the account and are logged into ODS, navigate to the profile manager via the little “edit” button right next to “Profile” in the upper left. You are now presented with lots of tabs allowing you to change all details of your ODS account. In this case you are interested in the “Security->Certificate Generator” section.

Normally all required fields should already be filled in. Now you only need to click the “submit certificate request” button and let ODS and your browser do the rest. If the certificate generation was successful you get a notice that it has been imported into your browser’s key chain. This is what it looks like in Chrome:

To verify that your new certificate has been registered successfully check the “X.509 Certificates” tab which lists all certificates installed in ODS (the ones which include the private key):

Testing your shiny new WebID

Finally you can try your new WebID by logging out (“Logout” in the upper right corner) and signing in again, this time with your WebID:

Notice how your browser takes care of the login by itself as it has the certificate installed in its own key chain.

That is already it. You now have a fully functional and valid WebID without ever worrying about RDF or FOAF or anything else besides clicking some buttons.

WebID – A Guide For The Clueless

Clueless – that is what I was a while ago regarding WebID. Since then I learned a lot. One of the things I learned was that apparently there is no easy hands-on guide to get started with WebID. This is what I will try to remedy here. So let us start with the basics:

What is WebID?

WebID is essentially two things: 1. a way to identify yourself and others in the semantic web of things, and 2. a unified password-less alternative to classical login credentials.

A WebID is essentially a URL pointing to a description of yourself (this is typically a FOAF file) combined with a self-signed X.509 certificate. X.509 certificates are those things used to verify the identity of web servers via SSL. Typically they are signed by big brother authorities like Verisign whose root certificates are hard-coded into all web browsers.

How Does WebID Work?

As mentioned before, the WebID is a URL which points to a FOAF file. Now if you want to log into some site you simply provide your WebID, which means selecting a certificate from a list in your web browser. The server will then fetch the FOAF file, extract the certificate’s public key from it (more about that later), and then ask you to prove your identity. Since you are the only one who has the private key of the certificate, that is easily done. And that’s already it. From a high-level point of view it is very simple.
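
In RDF terms the server-side check boils down to asking the profile whether it lists the public key from the presented certificate. A rough sketch of such a check against the example profile built later in this post (real implementations differ in detail, and the modulus is shortened here):

prefix cert: <http://www.w3.org/ns/auth/cert#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>
ask {
  <http://www.trueg.de/people/sebastian#me> cert:key [
    cert:modulus "a2425f..."^^xsd:hexBinary ;  # taken from the presented certificate
    cert:exponent 65537
  ] .
}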

The WebID certificate selection dialog on Linux is ugly and shows way too many pointless details – better integration does exist. However, the point stands: it is easy to select the WebID to login with.

Of course this does not differ much from other private/public key systems yet. However, it gets really interesting when you use WebIDs to share information. Imagine your social platform allows you to set up fine-grained ACLs based on WebIDs. This person can read that photo, this person can write to that document, and so on. These people do not even need to have accounts on the service in question. Using their WebIDs they will have access to exactly that information.

Is It Safe?

In order to ensure security only two things need to be taken care of: 1. the FOAF file your WebID points to should be under your control or that of a trustworthy entity, and 2. as it always has been: make sure nobody steals your private key.

And even if you lose your private key, disabling the WebID is as easy as removing the public key from your FOAF profile (more details following later). Even replacing the certificate with a new one will never invalidate your WebID since it stays an identifier for yourself in the semantic web, independent of the certificate.

Where Can I Learn More?

If you like to read specs check out the latest WebID specs by the W3C WebID community group. Join the mailing list, chat on IRC #foaf, and watch the video showing how simple WebID is for the end user.

How Can I Try Myself?

I will present two ways to play with WebID. The first way is as simple as creating an account at id.myopenlink.net and clicking a few buttons. However, I will leave that for the next blog entry. The second way is the one that leads to a better understanding of WebID and is the result of my struggle with the matter. Be aware, however, that the following howto does show how we can do manually what tools will eventually do for us.

Step 1: Create Your Very Own FOAF Profile

FOAF – Friend Of A Friend – is essentially an RDF vocabulary which allows you to describe your social web. Your WebID will eventually resolve to a foaf:Person representing yourself. So the first thing to do is to decide what your WebID looks like. For simplicity I will use my own as an example: http://www.trueg.de/people/sebastian#me. While Turtle would be a much more readable representation of your FOAF profile I will use RDF+XML instead to increase the probability of server support. Let us start with the basics: the foaf:Person.

<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://www.trueg.de/people/sebastian#me">
    <foaf:name>Sebastian Trueg</foaf:name>
  </foaf:Person>
</rdf:RDF>

This is a simple RDF document describing a resource of type foaf:Person with one property: foaf:name. It should be fairly self-explanatory. To this you may add all sorts of information like blogs, email addresses, nick names, personal details, whatever you want. Just be sure to remember that all of this will be public.
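
For comparison, this is what the same statement looks like in Turtle; it is purely illustrative, the RDF+XML above is what we will actually publish:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://www.trueg.de/people/sebastian#me> a foaf:Person ;
  foaf:name "Sebastian Trueg" .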

There is one very important distinction I want to stress since I missed it for a long time: the document is not the person, i.e. the URL of the document needs to differ from the WebID URL (hence the ‘#me’ fragment). To stress this fact let’s use some more FOAF:

<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <foaf:PersonalProfileDocument rdf:about="http://www.trueg.de/people/sebastian">
    <dc:title>Sebastian Trueg's FOAF Profile</dc:title>
    <foaf:maker rdf:resource="http://www.trueg.de/people/sebastian#me" />
    <foaf:primaryTopic rdf:resource="http://www.trueg.de/people/sebastian#me" />
  </foaf:PersonalProfileDocument>

  <foaf:Person rdf:about="http://www.trueg.de/people/sebastian#me">
    <foaf:name>Sebastian Trueg</foaf:name>
  </foaf:Person>
</rdf:RDF>

As you can see, the document describes the person, but it is not the same resource. Failing to make this distinction will result in an invalid WebID.

Create The X.509 Certificate

Once your FOAF file is done you need to get your certificate. This could be done via OpenSSL manually but it is tedious and error-prone. Thus, we fire up our web-browser and let it do the work for us by relying on a certificate generator like webid.fcns.eu/certgen.php and the browser’s own key generator.

All you need is the WebID and your name. The rest is optional. The nice thing here is that the web service will generate the certificate but your browser will generate the key locally and never send the private key to the service. It is safe.

Once the certificate has been generated it is saved to the browser’s own certificate storage thingy.

Copy The Certificate Public Key Into Your FOAF Profile

Now that your certificate has been created you can look at it in the browser’s preferences. In Firefox it can be found via Advanced->View Certificates->Your Certificates. Find the “Subject’s Public Key” in the details and copy it into your FOAF profile (remove any whitespace in the process):

<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:cert="http://www.w3.org/ns/auth/cert#">
[...]
  <foaf:Person rdf:about="http://www.trueg.de/people/sebastian#me">
    <foaf:name>Sebastian Trueg</foaf:name>
    <cert:key>
      <cert:RSAPublicKey>
        <cert:modulus rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary">a2425fd56d265a45690a36524db6ae290d347a1905429918c7eed70c5bfbf3ce07316563173628d6dfe7b98e1f054446cab7d878953d85d1b8d41b9ffbc983cda6a1daa951207e920205f7172c6f850a3c5d191d314624d984208b365412331d8c260c81813c54ae3b7f3eac6b5f3e152f2ffb6ac951bc0fb3e629171e5c3ded9fd8dcc6ca7e2313bb59186a78af44ee20c9fd4f70c4f443efcecfd75c7c7c19a54c2c749f804cff45cb78e811a6f0993d5da13ba67c426b028d204d908ea9e11794db80bbed569cc99676830db03df98a7462e089fe0e9d5a786ee4eb1ce227e2918a5bf071b4e5a2325b0c67e8b80096e23b58afe3144e5e6b76c9d2fb8e41</cert:modulus>
        <cert:exponent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">65537</cert:exponent>
      </cert:RSAPublicKey>
    </cert:key>
  </foaf:Person>
</rdf:RDF>

Having the public key in the FOAF profile is essential since that is where the servers you want to identify yourself to will read it from.

Upload The FOAF File to Your Server

Finally you need to upload the FOAF file to your web server and make it accessible. Since we are using a fancy WebID this requires some very basic Apaching through .htaccess in the ‘people’ folder which sets up some redirects:

# Turn off MultiViews
Options -MultiViews

# Directive to ensure *.rdf files served as appropriate content type,
# if not present in main apache config
AddType application/rdf+xml .rdf

# Rewrite engine setup
RewriteEngine On
RewriteBase /people

# Rewrite rule to serve HTML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} !application/rdf\+xml.*(text/html|application/xhtml\+xml)
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^([^/]+)$ $1/index.html [R=303]

# Rewrite rule to serve RDF/XML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^([^/]+)$ $1/foaf.rdf [R=303]

# Rewrite rule to serve the RDF+XML content from the vocabulary URI by default
RewriteRule ^([^/]+)$ $1/foaf.rdf [R=303]
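
A quick way to check that the redirects behave as intended is to request the two content types and look at the Location header (assuming the profile lives at the example URL used above):

$ curl -sI -H "Accept: application/rdf+xml" http://www.trueg.de/people/sebastian | grep -i "^location"
$ curl -sI -H "Accept: text/html" http://www.trueg.de/people/sebastian | grep -i "^location"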

Now your shiny new WebID is accessible by web services. Go ahead and verify your WebID on my-profile.eu and use it to sign into id.myopenlink.net.

Next up: more on id.myopenlink.net and possible usage of WebID for social capabilities in Nepomuk.

The Different Places Something Can Go Wrong

This is just a little blog entry about the impact that the ontologies can have on functionality.

The ontologies are a set of vocabularies describing the types of resources stored in Nepomuk, the possible relations between these types, and the possible annotations. We have for example a type for local files, one for an address book entry, one for a person, one for music content, and so on. We also have relations that describe that some person is the author of some piece of content, and so on.

These ontologies are maintained in the Shared-Desktop-Ontologies project – to my knowledge the only real open-source project developing RDF ontologies.

Now to the actual topic. There once was a bug. Like so many other bugs it talked about file indexing in Nepomuk, and like so many other bugs it said that some file could not be indexed. First it was Nepomuk’s fault, then it was the fault of libstreamanalyzer, but in the end I realized: there was a bug in the ontologies. More specifically in NMM – the Nepomuk MultiMedia ontology. (Granted, this was not really the source of the hang the bug talks about, but it was the reason the file could not be indexed.)

The problem was the domain of the nmm:setSize property. Each property has a domain and a range – the domain defines on which type of resource the property can be set, the range defines the type of the value. In other words they define the types of the subject and object of the triple. The domain is always a resource type (rdfs:Class), the range a resource or a literal type (typically one defined in XML Schema). In this case the domain of nmm:setSize was set to nmm:MusicPiece whereas it should have been nmm:MusicAlbum. Thus, Nepomuk rejected the data generated by libstreamanalyzer as invalid due to the violated domain. (Update: Nepomuk treats RDF data in a closed-world fashion. In comparison to the open-world approach which is typical for RDF/S, resource types are not inferred from their relations. In an open-world situation the resource would simply end up being both a nmm:MusicPiece and a nmm:MusicAlbum.)
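
In ontology terms the fix boils down to correcting the property's domain, roughly like this (a Turtle sketch; the actual NMM definition carries more detail and the range shown here is only illustrative):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix nmm:  <http://www.semanticdesktop.org/ontologies/2009/02/19/nmm#> .

nmm:setSize a rdf:Property ;
  rdfs:domain nmm:MusicAlbum ;  # previously, and incorrectly, nmm:MusicPiece
  rdfs:range  xsd:integer .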

The solution is shared-desktop-ontologies 0.8.1 with the fixed domain. Installing it will make Nepomuk re-parse the changed ontology and indexing the mp3 files in question will finally work.

Well, this was pretty verbose for a rather small issue. Still it gave a little introduction into how the ontologies are used in Nepomuk. One more thing to take care of in the “Nepomuk universe”.

And as always:

Click here to lend your support to: Nepomuk - The semantic desktop on KDE and make a donation at www.pledgie.com !

About Strigi, Soprano, Virtuoso, CLucene, and Libstreamanalyzer

There seems to be a lot of confusion about the parts that make up the Nepomuk infrastructure. Let me shed some light.

Soprano is the RDF data storage and parsing library used in Nepomuk. Soprano provides a plugin for Virtuoso which is mandatory and requires libiodbc. It does NOT work with unixODBC (It compiles but simply does not work due to some extensions in libiodbc required for RDF data handling). In addition to the Virtuoso plugin Nepomuk requires the Raptor parser plugin and the Redland storage plugin for ontology import.

CLucene is not required in Nepomuk anymore. It has been used for full-text indexing in early versions of KDE but is superseded by the full-text indexing functionality of Virtuoso. Consequently the Soprano CLucene module is not required anymore and development has effectively been stopped. It will most likely not be part of Soprano 3 (unless someone interested steps up and does the required work).

Virtuoso is a full-blown SQL server with a powerful RDF layer on top. OpenLink, the company developing Virtuoso, maintains an open channel of communication to the Nepomuk developers and introduced a “lite” mode for us (please no comments on how it still is not “lite”). Virtuoso 6.1.3 is the current version. It has a unicode bug which can be fixed by applying the patch attached to KDE bug 271664. Virtuoso 6.1.4 will be released soon and contains several fixes to bugs reported by me. An update is highly recommended.

Libstreamanalyzer and libstreams are libraries which are part of the Strigi project. In addition the Strigi project contains strigidaemon, an alternative scheduler for indexing files which is based on CLucene and not used by Nepomuk. I asked the maintainer of Strigi once to split libstreams and libstreamanalyzer into their own independently released packages. He refused which is understandable seeing as he has little time for Strigi as it is. As a consequence I advise packagers to either use libstreamanalyzer from git master or the latest tag instead of using released tarballs.

I think that is all. If I missed something please comment and I will update the post.

Nepomuk 2.0 and the Data Management Service

During the development of Nepomuk over the last years we have gathered a lot of knowledge and ideas about how to integrate semantics, and more specifically RDF, into the desktop. This knowledge was spread over several components like services, libraries, and applications. Some of it existed only as convention, some of it was documented, some lived only in our brains. But I guess this can be seen as normal in a project which treads new territory and tries to invent new ways of handling information on our computers.

Thus, in January of this year Vishesh and I finally set out to define a new API that would gather all this knowledge and all these ideas in one service. This service was intended to enforce our idea of how data in the semantic desktop should be formed while at the same time providing clean and powerful methods to manipulate this data. On April 8th, after 155 commits in a separate repository, I finally merged the new data management service into kde-runtime. And after hinting at it several times it is high time that I explain its ideas and the way we implemented it.

DMS and named graphs

Before I go into details about the DMS API I would like to explain the way information is stored in Nepomuk. More specifically, which ontology entities are used (until now by convention) to describe resources and meta-data.

The following example shows the information created by the file indexer and nicely illustrates how we encode information in Nepomuk. It is written in the TriG serialization which is also used by the Shared-Desktop-Ontologies.

<nepomuk:/ctx/graph1> {
  <nepomuk:/res/file1>
    nao:created "2011-05-20T11:23:45Z"^^xsd:dateTime ;
    nao:lastModified "2011-05-20T11:23:45Z"^^xsd:dateTime ;

    nie:contentSize "1286"^^xsd:int ;
    nie:isPartOf <nepomuk:/res/80b4187c-9c40-4e98-9322-9ebcc10bd0bd> ;
    nie:lastModified "2010-12-14T14:49:49Z"^^xsd:dateTime ;
    nie:mimeType "text/plain"^^xsd:string ;
    nie:plainTextContent "[...]"^^xsd:string ;
    nie:url <file:///home/nepomuk/helloworld.txt> ;
    nfo:characterCount "1249"^^xsd:int ;
    nfo:fileName "helloworld.txt"^^xsd:string ;
    nfo:lineCount "37"^^xsd:int ;
    nfo:wordCount "126"^^xsd:int ;
    a nfo:PlainTextDocument, nfo:FileDataObject .
}
<nepomuk:/ctx/metadatagraph1> {
  <nepomuk:/ctx/metadatagraph1>
    a nrl:GraphMetadata ;
    nrl:coreGraphMetadataFor <nepomuk:/ctx/graph1> .

  <nepomuk:/ctx/graph1>
    a nrl:DiscardableInstanceBase ;
    nao:created "2011-05-04T09:46:11.724Z"^^xsd:dateTime ;
    nao:maintainedBy <nepomuk:/res/someapp> .
}

There are essentially three types of information in this example. If we look at the first graph nepomuk:/ctx/graph1 we see that it contains information about one resource nepomuk:/res/file1. This information is split into two blocks for visualization purposes. The first block contains two properties, nao:created and nao:lastModified. We call this information the resource-meta-data. It refers to the Nepomuk resource in the database and states when it was created and modified. This information can only be changed by the data management service. In contrast to that, the second block contains the data in the resource. In this case we see file meta-data, which in Nepomuk terms is just data. Here it is important to see the difference between nao:lastModified and nie:lastModified. The latter refers to the file on disk itself while the former refers to the Nepomuk resource representing the file on disk.

The second graph nepomuk:/ctx/metadatagraph1 only contains what we call graph-meta-data. A named graph (or context in Soprano terms) is a resource like anything else in Nepomuk. Thus, it has a type. Graphs containing graph-meta-data are always of type nrl:GraphMetadata and belong to exactly one non-graph-meta-data graph. The meta-data graph contains the type, the creation date, and the creating application of the actual graph. In Nepomuk we make the distinction between four types of graphs:

  • nrl:InstanceBase graphs contain normal data that has been created by applications or the user.
  • nrl:DiscardableInstanceBase graphs contain normal data that has been extracted from somewhere and can easily be recreated. This includes file or email indexing. Data in this type of graph does not need to be included in backups.
  • nrl:GraphMetadata graphs contain meta-data about other graphs.
  • nrl:Ontology graphs contain class and property definitions. Nepomuk imports all installed ontologies into its database. This is required for query parsing, data inference, and so on.

Whenever information is changed or new information is added, new graphs are created accordingly. Thus, Nepomuk always knows when a piece of information was created and by which application.

The Data Management API

Now that the data format is defined we can continue with the DMS API. The DMS provides one central multi-threaded DBus API and a KDE client library for convenience (this library currently lives in kde-runtime until it is stabilized to be moved to kdelibs or a dedicated repository once kdelibs are split). The API consists of two parts: 1. the simple API which is useful for scripts and simple applications and 2. the advanced API which holds most of the power required for systems like the file indexer or data sharing and syncing. Here I will show the client API as it uses proper Qt/KDE types.

The Simple API

The simple API contains the methods that directly spring to mind when thinking about such a service:

KJob* addProperty(const QList<QUrl>& resources,
                  const QUrl& property,
                  const QVariantList& values,
                  const KComponentData& component = KGlobal::mainComponent());
KJob* setProperty(const QList<QUrl>& resources,
                  const QUrl& property,
                  const QVariantList& values,
                  const KComponentData& component = KGlobal::mainComponent());
KJob* removeProperty(const QList<QUrl>& resources,
                     const QUrl& property,
                     const QVariantList& values,
                     const KComponentData& component = KGlobal::mainComponent());
KJob* removeProperties(const QList<QUrl>& resources,
                       const QList<QUrl>& properties,
                       const KComponentData& component = KGlobal::mainComponent());
CreateResourceJob* createResource(const QList<QUrl>& types,
                                  const QString& label,
                                  const QString& description,
                                  const KComponentData& component = KGlobal::mainComponent());
KJob* removeResources(const QList<QUrl>& resources,
                      RemovalFlags flags = NoRemovalFlags,
                      const KComponentData& component = KGlobal::mainComponent());

It has methods to set, add, and remove properties and to create and remove resources. Each method has an additional parameter to set the application that performs the modification. By default the main component of a KDE app is used. The removeResources method has a flags parameter which so far only has one flag: RemoveSubResources. I will get into sub-resources another time though.

This part of the API is pretty straight-forward and rather self-explanatory. Every method will check property domains and ranges, enforce cardinalities, and reject any data that is not well-formed.

The Advanced API

The more interesting part is the advanced API.

KJob* removeDataByApplication(const QList<QUrl>& resources,
                              RemovalFlags flags = NoRemovalFlags,
                              const KComponentData& component = KGlobal::mainComponent());
KJob* removeDataByApplication(RemovalFlags flags = NoRemovalFlags,
                              const KComponentData& component = KGlobal::mainComponent());
KJob* mergeResources(const QUrl& resource1,
                     const QUrl& resource2,
                     const KComponentData& component = KGlobal::mainComponent());
KJob* storeResources(const SimpleResourceGraph& resources,
                     const QHash<QUrl, QVariant>& additionalMetadata = QHash<QUrl, QVariant>(),
                     const KComponentData& component = KGlobal::mainComponent());
KJob* importResources(const KUrl& url,
                      Soprano::RdfSerialization serialization,
                      const QString& userSerialization = QString(),
                      const QHash<QUrl, QVariant>& additionalMetadata = QHash<QUrl, QVariant>(),
                      const KComponentData& component = KGlobal::mainComponent());
DescribeResourcesJob* describeResources(const QList<QUrl>& resources,
                                        bool includeSubResources);

The first two methods allow an application to only remove the information it created itself. This is for example very important for tools like the file indexer that need to update data without touching anything added by the user like tags or comments or relations to other resources.

The method mergeResources actually merges two resources. The result is that the properties and relations of the second resource are moved to the first one, after which the second resource is deleted.

The most powerful method is without a doubt storeResources. It stores entire sets of resources in Nepomuk, letting it sync them with existing ones automatically. The SimpleResourceGraph which is used as input is basically just a set of resources which in turn consist of a set of properties and an optional URI. The DMS will look for already existing matching resources before storing them in the database. This also means that simple resources like emails and contacts are merged automatically. Clients like Akonadi do not need to perform their own resource resolution anymore. Another variant of the same method is importResources, which is probably most useful for scripts as it reads the resources from a file rather than a C++ struct.

Last but not least DMS has one single read-only method: describeResources. It returns all relevant information about the resources provided. This method will be used for meta-data sharing and syncing. While it currently only allows filtering of sub-resources, it will be extended to also allow filtering by application and permissions.

Well, that is it for the DMS for now. The plan is to port all existing tools and apps to this new API. But do not fear. In most cases this does not mean any work for you as the existing Nepomuk API can be used as always. It will then internally perform calls to DMS. The plan is to make the old Soprano-based interface read-only by KDE 4.8.

Randa and Ontologies and whatnot…

So I am back from Randa, the small town in the Alps where roughly 60 KDE hackers (and 2 Gnome people) met to achieve great things. Before I start raving on about Nepomuk and ontologies and all that jazz, let me thank Mario Fux and his family for the great organization and the fabulous food.

Now that the pink fluffy bunny part of the blog post is done I can get to the real thing: Nepomuk. The guys (Daniele E. Domenichelli from Telepathy-KDE, GSoC students Smit Shah and Martin Klapetek, Ivan Cukic, and Seif Lotfy and Trever Fischer from Zeitgeist) and I met in Africa (for the unaware: the rooms in the house in Randa are named after continents). I gave a few short tutorials on Nepomuk, its design, the basics like RDF and how it is used in Nepomuk, and the new Data Management Service (which I still did not blog about yet – but it will come, give it time). Then I answered questions and helped with the individual projects.

Apart from the discussions on KDEPIM and Telepathy integration two topics stand out: 1. The new and improved Documentation for the Shared-Desktop-Ontologies and 2. The integration between the KDE Activity Manager, Nepomuk, and Zeitgeist.

New and Improved SDO Documentation

One thing I wanted to do while in Randa was improve the documentation for Nepomuk. Well, we did not really look much into Nepomuk’s own documentation, but Daniele E. Domenichelli (Telepathy-KDE hacker and former GSoC student) and I finished what I started a while back. We finished the script which extracts all ontology entities from the sources and generates nice docbook references. We split the manually created docbook documentation into subfiles which are combined into one docbook chapter per ontology. Then the chapters are merged into one big file which is transformed into HTML via XSLT. All of this is streamlined through the cmake build system’s “docs” target. Thus, everyone can easily create their own ontology documentation at home. If you do not want to do that but want to check it out anyway without waiting for an update on semanticdesktop.org, just use the oscaf project’s page which now contains this documentation.

Integration Between KDE Activity Manager, Nepomuk and Zeitgeist

Zeitgeist is an interesting project which in its topic is closely related to Nepomuk: Zeitgeist tracks events on the desktop and stores them in a database which can then be queried by clients. This allows you, for example, to see which files you touched in the last week and to track the history of a single file (here “history” refers to the modification events, not the content). Lately the Zeitgeist developers have moved closer towards KDE and shown a lot of interest in working with us. At Randa we finally came up with a good plan to do so.

The problem is that KDE already has a database to store all kinds of information including events: Nepomuk. And KDE has the KDE activity manager whose API is supposed to be used by KDE applications to inform about opened/modified/closed files. Thus, we had three things to bring together: Zeitgeist and its application plugins which inform Zeitgeist about events, the KDE activity manager which does the same thing for KDE applications, and Nepomuk as the one semantic database in KDE. So Seif Lotfy (Zeitgeist founder), Trever Fischer (GSoC student doing a Zeitgeist GUI for KDE), Ivan Cukic (author of the KDE activity manager), and I discussed the different issues heatedly and actually came to a conclusion.

The basic design will be like this: KDE applications will use the KDE Activity Manager Daemon (KAMD) API to inform about opened, modified, and closed files. The KAMD will then send these events to Zeitgeist which will apply all its magic (blacklisting, geo location attachment, and so on). Finally Zeitgeist will store the events in Nepomuk. If Zeitgeist is not available KAMD will push the events to Nepomuk itself.

Integration between KAMD, Zeitgeist, and Nepomuk

This way we benefit from Zeitgeist’s additional event processing and its integration into applications like OpenOffice or even vim.

Update: Ivan told me that he has a much nicer diagram. And in fact he has. His looks like a mutated bunny with one Gnome and one KDE ear:

Ivan is the master of simple diagrams (mine are always too complex)

We also had to merge ontologies. Zeitgeist has its own event ontology which is built upon the Nepomuk ontologies. We tried to merge as much of it into SDO as made sense. The result can be seen as a separate branch in the SDO git repository. We decided to store usage events as follows:

Usage Events in Nepomuk

Looking at the example of a file being opened, NUAO models one main event which stretches from the time the file is opened to the time it is closed. This is described via nuao:UsageEvent and the properties nuao:start and nuao:end. During the time span of this event the file can be in the focus of the user or not. This is modelled via nuao:FocusEvent instances that describe when the file was in focus.

Since the action of modifying a resource is a very foggy concept and hard to fit into an event with start and end time it was decided to instead only store a timestamp for each modification of the resource. This is described via nie:modified and nie:contentModified. For file modifications one would typically use nie:modified.

Storing all focus events in Nepomuk will most likely result in an ugly amount of data which does not provide any really useful information besides the total time a resource was in focus. Thus, we introduced nuao:totalFocusDuration which allows Zeitgeist to compress all focus events and attach this resulting duration to the enclosing nuao:UsageEvent.
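
Put together, a usage event for a file might end up looking roughly like this (a hedged Turtle sketch; linking properties such as nuao:involves, the datatype of the focus duration, and the namespace URI may differ in the final ontology):

@prefix nuao: <http://www.semanticdesktop.org/ontologies/2010/01/25/nuao#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

[] a nuao:UsageEvent ;
  nuao:involves <nepomuk:/res/file1> ;
  nuao:start "2011-06-05T10:00:00Z"^^xsd:dateTime ;
  nuao:end   "2011-06-05T10:12:00Z"^^xsd:dateTime ;
  nuao:totalFocusDuration "420"^^xsd:integer .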

Well, that is it for today. Aaron urged me to write a dot article about Randa so I will probably try to do that (he says vaguely).

Nice Things To Do With Nepomuk – Part Two

Yesterday I presented how to extract web links from resources (mostly files) in Nepomuk and store them as proper resources themselves.

Let us now take a look at the data we created. For this we will fire up NepSak aka Nepomukshell and use a bit of SPARQL for testing and debugging purposes (remember: when implementing stuff, stick to the query API instead of coding your own SPARQL queries). We start by listing all the nfo:Website resources there are:
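
A query along these lines does the job (namespace declarations as in the Shared-Desktop-Ontologies):

prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
select ?site where {
  ?site a nfo:Website .
}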

There are a lot of websites there. This is not very helpful yet. So let’s throw the magic creation time into the mix:
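
Using nao:created (the Nepomuk creation date), the query becomes something like:

prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>
select ?site ?created where {
  ?site a nfo:Website ;
        nao:created ?created .
}
order by desc(?created)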

This is already better. We now see all web links sorted by creation date. But what about the files we extracted them from? Let’s modify the query one last time to see those, too:
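
Assuming the linking property used in yesterday's code is nie:links (hypothetical here; use whatever property part one actually sets), the extended query could look like this:

prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>
prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
select ?file ?site ?created where {
  ?site a nfo:Website ;
        nao:created ?created .
  ?file nie:links ?site .  # linking property assumed from part one
}
order by desc(?created)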

Now we see all the data the code from yesterday created: the files we extracted the web links from and the websites themselves. This information could now be used in some fancy GUI. Since I have not created a fancy GUI yet, I will instead show what already works out of the box. We open Dolphin and hover over one of the files we extracted a website from:

The links are already properly displayed and even clickable. (But when we click a link we see a possibility for improvement: it does not open the link but rather a query looking for other files linking to this website.)

Well, this is all I will present today as I was sidetracked by my system breaking down on me again.