Conditional Sharing – Virtuoso ACL Groups Revisited

Previously we saw how ACLs can be used in Virtuoso to protect different types of resources. Today we will look into conditional groups, which allow us to share resources or grant permissions to a dynamic group of individuals. This means that we do not maintain a list of group members but instead define a set of conditions which an individual needs to fulfill in order to be part of the group in question.

That does sound very dry. Let’s just jump to an example:

@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
[] a oplacl:ConditionalGroup ;
  foaf:name "People I know" ;
  oplacl:hasCondition [
    a oplacl:QueryCondition ;
    oplacl:hasQuery """ask where { graph <urn:my> { <urn:me> foaf:knows ^{uri}^ } }"""
  ] .

This group is based on a single condition which uses a simple SPARQL ASK query. The ASK query contains a variable ^{uri}^ which the ACL engine will replace with the URI of the authenticated user. The group contains anyone whom urn:me knows (via foaf:knows) in named graph urn:my. (Ideally the latter graph should be write-protected using ACLs as described before.)
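
To make the substitution step concrete, here is a minimal Python sketch of how such a placeholder could be expanded. This is not Virtuoso's actual implementation: the helper name expand_condition_query, the character check, and the wrapping of the URI in angle brackets are assumptions made purely for illustration.

```python
def expand_condition_query(template: str, agent_uri: str) -> str:
    """Replace the ^{uri}^ placeholder with the agent's URI as an IRI reference.

    Hypothetical helper; the real ACL engine performs this step internally.
    """
    # Reject characters that may not appear inside a SPARQL IRI reference,
    # so a crafted URI cannot break out of the <...> brackets.
    if any(ch in agent_uri for ch in '<>"{}|^`\\ \n\t'):
        raise ValueError("not a valid IRI: %r" % agent_uri)
    return template.replace("^{uri}^", "<%s>" % agent_uri)


template = "ask where { graph <urn:my> { <urn:me> foaf:knows ^{uri}^ } }"
query = expand_condition_query(template, "http://www.facebook.com/sebastian.trug")
print(query)
```

The authenticated user is a member of the group exactly when the expanded ASK query evaluates to true.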

Now we use this group in ACL rules. That means we first create it:

$ curl -X POST \
    --data-binary @group.ttl \
    -H"Content-Type: text/turtle" \
    -u dba:dba \
    http://localhost:8890/acl/groups

As a result we get a description of the newly created group which also contains its URI. Let’s imagine this URI is http://localhost:8890/acl/groups/1.

To mix things up we will use the group for sharing permission to access a service instead of files or named graphs. Like many of the Virtuoso-hosted services the URI Shortener is ACL controlled. We can restrict access to it using ACLs.

As always the URI Shortener has its own ACL scope which we need to enable for the ACL system to kick in:

sparql
prefix oplacl: <http://www.openlinksw.com/ontology/acl#>
with <urn:virtuoso:val:config>
delete {
  oplacl:DefaultRealm oplacl:hasDisabledAclScope <urn:virtuoso:val:scopes:curi> .
}
insert {
  oplacl:DefaultRealm oplacl:hasEnabledAclScope <urn:virtuoso:val:scopes:curi> .
};

Now we can go ahead and create our new ACL rule which allows anyone in our conditional group to shorten URLs:

[] a acl:Authorization ;
  oplacl:hasAccessMode oplacl:Write ;
  acl:accessTo <http://localhost:8890/c> ;
  acl:agent <http://localhost:8890/acl/groups/1> ;
  oplacl:hasScope <urn:virtuoso:val:scopes:curi> ;
  oplacl:hasRealm oplacl:DefaultRealm .

Finally we add one URI to the conditional group as follows:

sparql
insert into <urn:my> {
  <urn:me> foaf:knows <http://www.facebook.com/sebastian.trug> .
};

As a result my Facebook account has access to the URI Shortener:

[Screenshot: Virtuoso URI Shortener]

The example we saw here uses a simple query to determine the members of the conditional group. These queries could get much more complex and multiple query conditions could be combined. In addition Virtuoso handles a set of non-query conditions (see also oplacl:GenericCondition). The most basic one is the following, which matches any authenticated person:

[] a oplacl:ConditionalGroup ;
  foaf:name "Valid Identifiers" ;
  oplacl:hasCondition [
    a oplacl:GroupCondition, oplacl:GenericCondition ;
    oplacl:hasCriteria oplacl:NetID ;
    oplacl:hasComparator oplacl:IsNotNull ;
    oplacl:hasValue 1
  ] .

This shall be enough on conditional groups for today. There will be more playing around with ACLs in the future…

Protecting And Sharing Linked Data With Virtuoso

Disclaimer: Many of the features presented here are rather new and cannot be found in the open-source version of Virtuoso.

Last time we saw how to share files and folders stored in the Virtuoso DAV system. Today we will protect and share data stored in Virtuoso’s Triple Store – we will share RDF data.

Virtuoso is actually a quadruple-store, which means each triple lives in a named graph. In Virtuoso named graphs can be public or private (in reality it is a bit more complex than that but this view on things is sufficient for our purposes). Public graphs are readable and writable by anyone who has permission to read or write in general; private graphs are only readable and writable by administrators and those to whom named graph permissions have been granted. The latter case is what interests us today.

We will start by inserting some triples into a named graph as dba – the master of the Virtuoso universe:

[Screenshot: Virtuoso SPARQL endpoint]

[Screenshot: SPARQL result]

This graph is now public and can be queried by anyone. Since we want to make it private we quickly need to change into a SQL session since this part is typically performed by an application rather than manually:

$ isql-v localhost:1112 dba dba
Connected to OpenLink Virtuoso
Driver: 07.10.3211 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> DB.DBA.RDF_GRAPH_GROUP_INS ('http://www.openlinksw.com/schemas/virtrdf#PrivateGraphs', 'urn:trueg:demo');

Done. -- 2 msec.

Now our new named graph urn:trueg:demo is private and its contents cannot be seen by anyone. We can easily test this by logging out and trying to query the graph:

[Screenshot: SPARQL query]

[Screenshot: SPARQL query result]

But now we want to share the contents of this named graph with someone. Like before we will use my LinkedIn account. This time, however, we will not use a UI but Virtuoso’s RESTful ACL API to create the necessary rules for sharing the named graph. The API uses Turtle as its main input format. Thus, we will describe the ACL rule used to share the contents of the named graph as follows.

@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<#rule> a acl:Authorization ;
  rdfs:label "Share Demo Graph with trueg's LinkedIn account" ;
  acl:agent <http://www.linkedin.com/in/trueg> ;
  acl:accessTo <urn:trueg:demo> ;
  oplacl:hasAccessMode oplacl:Read ;
  oplacl:hasScope oplacl:PrivateGraphs .

Virtuoso makes use of the ACL ontology proposed by the W3C and extends on it with several custom classes and properties in the OpenLink ACL Ontology. Most of this little Turtle snippet should be obvious: we create an Authorization resource which grants Read access to urn:trueg:demo for agent http://www.linkedin.com/in/trueg. The only tricky part is the scope. Virtuoso has the concept of ACL scopes which group rules by their resource type. In this case the scope is private graphs, another typical scope would be DAV resources.

Given that file rule.ttl contains the above resource we can post the rule via the RESTful ACL API:

$ curl -X POST --data-binary @rule.ttl -H"Content-Type: text/turtle" -u dba:dba http://localhost:8890/acl/rules

As a result we get the full rule resource including additional properties added by the API.
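
The same request can also be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper name build_rule_request is made up for illustration, and the sketch assumes that HTTP Basic authentication (what curl -u sends by default) is what the endpoint expects. It only builds the request and does not contact a server.

```python
import base64
import urllib.request


def build_rule_request(turtle: bytes, user: str, password: str,
                       url: str = "http://localhost:8890/acl/rules"):
    """Build (but do not send) the POST request that the curl call issues."""
    credentials = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = urllib.request.Request(url, data=turtle, method="POST")
    request.add_header("Content-Type", "text/turtle")
    # curl -u sends HTTP Basic authentication by default.
    request.add_header("Authorization", "Basic " + credentials.decode("ascii"))
    return request


# Placeholder payload; in practice this would be the bytes of rule.ttl.
request = build_rule_request(b"# contents of rule.ttl go here\n", "dba", "dba")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```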

Finally we log in using my LinkedIn identity and are granted read access to the graph:

[Screenshots: SPARQL endpoint login and query result]

We see all the original triples in the private graph. And as before with DAV resources no local account is necessary to get access to named graphs. Of course we can also grant write access, use groups, etc. But those are topics for another day.

Technical Footnote

Using ACLs with named graphs as described in this article requires some basic configuration. The ACL system is disabled by default. In order to enable it for the default application realm (another topic for another day) the following SPARQL statement needs to be executed as administrator:

sparql
prefix oplacl: <http://www.openlinksw.com/ontology/acl#>
with <urn:virtuoso:val:config>
delete {
  oplacl:DefaultRealm oplacl:hasDisabledAclScope oplacl:Query , oplacl:PrivateGraphs .
}
insert {
  oplacl:DefaultRealm oplacl:hasEnabledAclScope oplacl:Query , oplacl:PrivateGraphs .
};

This will enable ACLs for named graphs and SPARQL in general. Finally the LinkedIn account from the example requires generic SPARQL read permissions. The simplest approach is to just allow anyone to SPARQL read:

@prefix acl: <http://www.w3.org/ns/auth/acl#> .
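@prefix oplacl: <http://www.openlinksw.com/ontology/acl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .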
<#rule> a acl:Authorization ;
  rdfs:label "Allow Anyone to SPARQL Read" ;
  acl:agentClass foaf:Agent ;
  acl:accessTo <urn:virtuoso:access:sparql> ;
  oplacl:hasAccessMode oplacl:Read ;
  oplacl:hasScope oplacl:Query .

I will explain these technical concepts in more detail in another article.

Virtuoso Open-Source Moved to GitHub

Ever since 2006 OpenLink Software has provided its Open-Source version of Virtuoso (VOS), the high-performance SQL server with a powerful RDF/SPARQL data management layer on top.

So far the sources have been developed in an internal CVS repository which was published through the Virtuoso SourceForge pages.

As of March 21, OpenLink took the next step towards Open Development by moving to git as its version management system. The sources are now hosted in the VOS GitHub repository.

As mentioned on the VOS git usage pages OpenLink now accepts GitHub pull requests and patches. Be sure to read the notes on git branching policy in VOS which are based on the git-flow approach by Vincent Driessen – which by the way is an interesting read independent of VOS.

Most importantly it is now a lot simpler to follow the development of Virtuoso Open-Source. Simply clone the git repository and switch to the appropriate develop branch:

$ git clone https://github.com/openlink/virtuoso-opensource.git
$ cd virtuoso-opensource
$ git checkout -t remotes/origin/develop/6

For details on the branches used see the already mentioned VOS git usage guide.

Refer to the VOS building instructions if the following is not enough for you:

$ ./autogen.sh
$ ./configure --prefix=/usr/local --with-layout=<LAYOUT>
$ make
$ make install

where <LAYOUT> is one of Gnu, Debian, Gentoo, Redhat, Freebsd, opt, Openlink. The latter two force the prefix.

You do not need to know RDF or FOAF to use WebID

After going into way too much detail about FOAF and how to create your own WebID manually the last time, today I will show you how easy it is when you have a system that supports WebID properly.

The system I am talking about is ODS – The OpenLink Data Spaces. Please do not be scared away by the UI which does not offer all the fanciness of today’s Web interfaces. The point with ODS is its backend (which BTW is entirely based on Virtuoso, including the serving of the web pages).

Getting your own WebID through ODS consists of only two steps: 1. create an ODS account, and 2. let ODS create the X.509 certificate for you. Now as mentioned before you need to trust OpenLink with your private key in this case. If you are not willing to do that you can set up your own instance of ODS – more about that another time.

Creating an ODS account

So start by navigating to the public instance of ODS in order to create an account. In the upper right corner you will find a little “Sign Up” button which will get you here:

As you can see there are several ways to sign up for an ODS account, one of which is of course WebID. Since you want to use ODS to create your WebID you need to use another means: plain old username+password or something OAuth-powered like LinkedIn.

Generating your X.509 certificate

Once you have created the account and are logged into ODS, navigate to the profile manager via the little “edit” button right next to “Profile” in the upper left. You are now presented with lots of tabs allowing you to change all details of your ODS account. In this case you are interested in the “Security->Certificate Generator” section.

Normally all required fields should already be filled. Now you only need to click the “submit certificate request” button and let ODS and your browser do the rest. If the certificate generation was successful you get a notice that it has been imported into your browser’s key chain. This is what it looks like in Chrome:

To verify that your new certificate has been registered successfully check the “X.509 Certificates” tab which lists all certificates installed in ODS (the ones which include the private key):

Testing your shiny new WebID

Finally you can try your new WebID by logging out (“Logout” in the upper right corner) and signing in again, this time with your WebID:

Notice how your browser takes care of the login by itself as it has the certificate installed in its own key chain.

That is already it. You now have a fully functional and valid WebID without ever worrying about RDF or FOAF or anything else besides clicking some buttons.

WebID – A Guide For The Clueless

Clueless – that is what I was a while ago regarding WebID. Since then I learned a lot. One of the things I learned was that apparently there is no easy hands-on guide to get started with WebID. This is what I will try to remedy here. So let us start with the basics:

What is WebID?

WebID is essentially two things: 1. a way to identify yourself and others in the semantic web of things, and 2. a unified password-less alternative to classical login credentials.

A WebID is essentially a URL pointing to a description of yourself (this is typically a FOAF file) combined with a self-signed X.509 certificate. X.509 certificates are those things used to verify the identity of web servers via SSL. Typically they are signed by big brother authorities like Verisign whose root certificates are hard-coded into all web browsers.

How Does WebID Work?

As mentioned before the WebID is a URL which points to a FOAF file. Now if you want to log into some site you simply provide the WebID, which means selecting a certificate from a list in your web browser. The server will then fetch the FOAF file, extract the certificate’s public key from it (more about that later), and then ask you to prove your identity. Since you are the only one having the private key of the certificate that is easily done. And that’s already it. From a high level point of view it is very simple.

The WebID certificate selection dialog on Linux is ugly and shows way too many pointless details – better integration does exist. However, the point stands: it is easy to select the WebID to login with.

Of course this does not differ much from other private/public key systems yet. However, it gets really interesting when you use WebIDs to share information. Imagine your social platform allows you to set up fine-grained ACLs based on WebIDs. This person can read that photo, this person can write to that document, and so on. These people do not even need to have accounts on the service in question. Using their WebIDs they will have access to exactly that information.

Is It Safe?

In order to ensure security only two things are required: 1. The FOAF file your WebID is pointing to should be under your control or that of a trustworthy entity, and 2. as it always has been: make sure nobody steals your private key.

And even if you lose your private key, disabling the WebID is as easy as removing the public key from your FOAF profile (more details following later). Even replacing the certificate with a new one will never invalidate your WebID since it stays an identifier for yourself in the semantic web, independent of the certificate.

Where Can I Learn More?

If you like to read specs check out the latest WebID specs by the W3C WebID community group. Join the mailing list, chat on IRC #foaf, and watch the video showing how simple WebID is for the end user.

How Can I Try Myself?

I will present two ways to play with WebID. The first way is as simple as creating an account at id.myopenlink.net and clicking a few buttons. However, I will leave that for the next blog entry. The second way is the one that leads to a better understanding of WebID and is the result of my struggle with the matter. Be aware, however, that the following howto does show how we can do manually what tools will eventually do for us.

Step 1: Create Your Very Own FOAF Profile

FOAF – Friend Of A Friend – is essentially an RDF vocabulary which allows you to describe your social web. Your WebID will eventually resolve to a foaf:Person representing yourself. So the first thing to do is to decide what your WebID looks like. For simplicity I will use my own as an example: http://www.trueg.de/people/sebastian#me. While Turtle would be a much more readable representation of your FOAF profile I will use RDF+XML instead to increase the probability of server support. Let us start with the basics: the foaf:Person.

<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://www.trueg.de/people/sebastian#me">
    <foaf:name>Sebastian Trueg</foaf:name>
  </foaf:Person>
</rdf:RDF>

This is a simple RDF document describing a resource of type foaf:Person with one property: foaf:name. It should be fairly self-explanatory. To this you may add all sorts of information like blogs, email addresses, nick names, personal details, whatever you want. Just be sure to remember that all of this will be public.

There is one very important distinction I want to stress since I missed that for a long time: the document is not the person, i.e. the URL of the document needs to differ from the WebID URL (hence the ‘#me’ fragment). To stress this fact let’s use some more FOAF:

<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:PersonalProfileDocument rdf:about="http://www.trueg.de/people/sebastian">
    <dc:title>Sebastian Trueg's FOAF Profile</dc:title>
    <foaf:maker rdf:resource="http://www.trueg.de/people/sebastian#me" />
    <foaf:primaryTopic rdf:resource="http://www.trueg.de/people/sebastian#me" />
  </foaf:PersonalProfileDocument>

  <foaf:Person rdf:about="http://www.trueg.de/people/sebastian#me">
    <foaf:name>Sebastian Trueg</foaf:name>
  </foaf:Person>
</rdf:RDF>

As you can see the document itself only describes the person but it is not the same resource. Failing to make this distinction will result in an invalid WebID.
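
The relationship between the two URLs can be checked in a few lines of Python. This is only an illustration using the standard library: a client that dereferences a WebID requests the URL without its fragment, so stripping the fragment yields the profile document URL.

```python
from urllib.parse import urldefrag

webid = "http://www.trueg.de/people/sebastian#me"

# The fragment never reaches the server; only the document URL is fetched.
document_url, fragment = urldefrag(webid)
print(document_url)  # the profile document (foaf:PersonalProfileDocument)
print(fragment)      # identifies the foaf:Person inside that document
```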

Create The X.509 Certificate

Once your FOAF file is done you need to get your certificate. This could be done via OpenSSL manually but it is tedious and error-prone. Thus, we fire up our web browser and let it do the work for us by relying on a certificate generator like webid.fcns.eu/certgen.php and the browser’s own key generator.

All you need is the WebID and your name. The rest is optional. The nice thing here is that the web service will generate the certificate but your browser will generate the key locally and never send the private key to the service. It is safe.

Once the certificate has been generated it is saved to the browser’s own certificate storage thingy.

Copy The Certificate Public Key Into Your FOAF Profile

Now that your certificate has been created you can look at it in the browser’s preferences. In Firefox it can be found via Advanced->View Certificates->Your Certificates. Find the “Subject’s Public Key” in the details and copy it into your FOAF profile (remove any whitespace in the process):

<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:cert="http://www.w3.org/ns/auth/cert#">
[...]
  <foaf:Person rdf:about="http://www.trueg.de/people/sebastian#me">
    <foaf:name>Sebastian Trueg</foaf:name>
    <cert:key>
      <cert:RSAPublicKey>
        <cert:modulus rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary">a2425fd56d265a45690a36524db6ae290d347a1905429918c7eed70c5bfbf3ce07316563173628d6dfe7b98e1f054446cab7d878953d85d1b8d41b9ffbc983cda6a1daa951207e920205f7172c6f850a3c5d191d314624d984208b365412331d8c260c81813c54ae3b7f3eac6b5f3e152f2ffb6ac951bc0fb3e629171e5c3ded9fd8dcc6ca7e2313bb59186a78af44ee20c9fd4f70c4f443efcecfd75c7c7c19a54c2c749f804cff45cb78e811a6f0993d5da13ba67c426b028d204d908ea9e11794db80bbed569cc99676830db03df98a7462e089fe0e9d5a786ee4eb1ce227e2918a5bf071b4e5a2325b0c67e8b80096e23b58afe3144e5e6b76c9d2fb8e41</cert:modulus>
        <cert:exponent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">65537</cert:exponent>
      </cert:RSAPublicKey>
    </cert:key>
  </foaf:Person>
</rdf:RDF>

Having the public key in the FOAF profile is essential since that is where the servers you want to identify yourself to will read it from.
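
To illustrate what a server does with this information, here is a rough Python sketch of the lookup. It is not taken from any real WebID implementation: the function names are made up, the modulus 00c0ffee is a toy placeholder rather than a real key, and the sketch only understands the exact RDF/XML layout shown above. A real verifier would use a proper RDF parser and would also check that the client actually holds the private key.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "cert": "http://www.w3.org/ns/auth/cert#",
}

WEBID = "http://www.trueg.de/people/sebastian#me"

# Shortened profile with a toy modulus (placeholder, not a real key).
PROFILE = """<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:cert="http://www.w3.org/ns/auth/cert#">
  <foaf:Person rdf:about="http://www.trueg.de/people/sebastian#me">
    <cert:key>
      <cert:RSAPublicKey>
        <cert:modulus rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary">00c0ffee</cert:modulus>
        <cert:exponent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">65537</cert:exponent>
      </cert:RSAPublicKey>
    </cert:key>
  </foaf:Person>
</rdf:RDF>
"""


def profile_keys(rdf_xml, webid):
    """Return the set of (modulus, exponent) pairs the profile lists for webid."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]
    keys = set()
    for person in root.findall("foaf:Person", NS):
        if person.get(about) != webid:
            continue
        for key in person.findall("cert:key/cert:RSAPublicKey", NS):
            modulus = int(key.findtext("cert:modulus", namespaces=NS).strip(), 16)
            exponent = int(key.findtext("cert:exponent", namespaces=NS).strip())
            keys.add((modulus, exponent))
    return keys


def key_matches(rdf_xml, webid, cert_modulus, cert_exponent):
    """True if the key presented in the client certificate is in the profile."""
    return (cert_modulus, cert_exponent) in profile_keys(rdf_xml, webid)


print(key_matches(PROFILE, WEBID, 0xC0FFEE, 65537))
```

Removing the cert:key element from the profile makes this lookup fail, which is why deleting the public key from the FOAF file is enough to disable a lost certificate.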

Upload The FOAF File to Your Server

Finally you need to upload the FOAF file to your web server and make it accessible. Since we are using a fancy WebID this requires some very basic Apaching through .htaccess in the ‘people’ folder which sets up some redirects:

# Turn off MultiViews
Options -MultiViews

# Directive to ensure *.rdf files served as appropriate content type,
# if not present in main apache config
AddType application/rdf+xml .rdf

# Rewrite engine setup
RewriteEngine On
RewriteBase /people

# Rewrite rule to serve HTML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} !application/rdf\+xml.*(text/html|application/xhtml\+xml)
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^([^/]+)$ $1/index.html [R=303]

# Rewrite rule to serve RDF/XML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^([^/]+)$ $1/foaf.rdf [R=303]

# Rewrite rule to serve the RDF+XML content from the vocabulary URI by default
RewriteRule ^([^/]+)$ $1/foaf.rdf [R=303]
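
For readers who find mod_rewrite hard to follow, the decision these rules make can be sketched in Python. This is a simplified model, not a replacement for the Apache configuration: the function name choose_target is made up, it assumes the requested name contains no slash (as the rule pattern requires), and it assumes the relative substitution ends up under the RewriteBase /people.

```python
import re


def choose_target(name, accept, user_agent=""):
    """Return the path that the 303 redirect points to for /people/<name>."""
    # First condition (negated): RDF/XML must not be listed ahead of an HTML type.
    rdf_before_html = re.search(
        r"application/rdf\+xml.*(text/html|application/xhtml\+xml)", accept)
    # Remaining conditions are chained with [OR].
    wants_html = (
        re.search(r"text/html", accept)
        or re.search(r"application/xhtml\+xml", accept)
        or re.search(r"^Mozilla/.*", user_agent)
    )
    if not rdf_before_html and wants_html:
        return "/people/" + name + "/index.html"
    # Both remaining rules redirect to the RDF/XML file.
    return "/people/" + name + "/foaf.rdf"


print(choose_target("sebastian", "application/rdf+xml"))
print(choose_target("sebastian", "text/html"))
```

In short: browsers are sent to the HTML page, while everything else, including clients that do not state a preference, receives the FOAF file.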

Now your shiny new WebID is accessible by web services. Go ahead and verify your WebID on my-profile.eu and use it to sign into id.myopenlink.net.

Next up: more on id.myopenlink.net and possible usage of WebID for social capabilities in Nepomuk.

The Different Places Something Can Go Wrong

This is just a little blog entry about the impact that the ontologies can have on functionality.

The ontologies are a set of vocabularies describing the types of resources stored in Nepomuk, the possible relations between these types, and the possible annotations. We have for example a type for local files, one for an address book entry, one for a person, one for music content and so on. We also have relations that describe that some person is the author of some piece of content and so on.

These ontologies are maintained in the Shared-Desktop-Ontologies project – to my knowledge the only real open-source project developing RDF ontologies.

Now to the actual topic. There once was a bug. Like so many other bugs it talked about file indexing in Nepomuk and like so many other bugs it said that some file could not be indexed. First it was Nepomuk’s fault, then it was the fault of libstreamanalyzer, but in the end I realized: there was a bug in the ontologies. More specifically in NMM – the Nepomuk MultiMedia ontology. (Granted this was not really the source of the hang the bug talks about but it was the reason the file could not be indexed.)

The problem was the domain of the nmm:setSize property. Each property has a domain and a range – the domain defines on which type of resource the property can be set, the range defines the type of the value. In other words they are defining the subject and object type of the triple. The domain is always a resource type (rdfs:Class), the range a resource or a literal type (typically one defined in the XML schema). In this case the domain of nmm:setSize was set as nmm:MusicPiece whereas it should have been nmm:MusicAlbum. Thus, Nepomuk rejected the data generated by libstreamanalyzer as being invalid due to using an invalid domain. (Update: Nepomuk treats RDF data in a closed-world fashion. In comparison to the open-world approach which is typical for RDF/S, resource types are not inferred from their relations. In an open-world situation the resource would simply end up being both a nmm:MusicPiece and a nmm:MusicAlbum.)
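
The difference between the two readings can be shown with a toy model in Python. This is not Nepomuk code: the dictionaries and function names are invented for illustration, and subclass relations are ignored to keep the sketch short.

```python
# Tiny stand-in for the ontology: property -> rdfs:domain.
DOMAINS_BUGGY = {"nmm:setSize": "nmm:MusicPiece"}
DOMAINS_FIXED = {"nmm:setSize": "nmm:MusicAlbum"}


def check_domain(domains, subject_types, prop):
    """Closed-world check: the subject must already have the domain type."""
    return domains[prop] in subject_types


def infer_types(domains, subject_types, prop):
    """Open-world reading: using the property adds the domain type."""
    return set(subject_types) | {domains[prop]}


album_types = {"nmm:MusicAlbum"}

# Closed world: the data is rejected with the buggy ontology, accepted with the fix.
print(check_domain(DOMAINS_BUGGY, album_types, "nmm:setSize"))
print(check_domain(DOMAINS_FIXED, album_types, "nmm:setSize"))

# Open world: the album would silently become a music piece as well.
print(sorted(infer_types(DOMAINS_BUGGY, album_types, "nmm:setSize")))
```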

The solution is shared-desktop-ontologies 0.8.1 with the fixed domain. Installing it will make Nepomuk re-parse the changed ontology and indexing the mp3 files in question will finally work.

Well, this was pretty verbose for a rather small issue. Still it gave a little introduction into how the ontologies are used in Nepomuk. One more thing to take care of in the “Nepomuk universe”.


About Strigi, Soprano, Virtuoso, CLucene, and Libstreamanalyzer

There seems to be a lot of confusion about the parts that make up the Nepomuk infrastructure. Let me shed some light.

Soprano is the RDF data storage and parsing library used in Nepomuk. Soprano provides a plugin for Virtuoso which is mandatory and requires libiodbc. It does NOT work with unixODBC (It compiles but simply does not work due to some extensions in libiodbc required for RDF data handling). In addition to the Virtuoso plugin Nepomuk requires the Raptor parser plugin and the Redland storage plugin for ontology import.

CLucene is not required in Nepomuk anymore. It was used for full-text indexing in early versions of KDE but has been superseded by the full-text indexing functionality of Virtuoso. Consequently the Soprano clucene module is not required anymore and development has effectively been stopped. It will most likely not be part of Soprano 3 (unless someone interested steps up and does the required work).

Virtuoso is a full-blown SQL server with a powerful RDF layer on top. OpenLink, the company developing Virtuoso, maintains an open channel of communication to the Nepomuk developers and introduced a “lite” mode for us (please no comments on how it still is not “lite”). Virtuoso 6.1.3 is the current version. It has a unicode bug which can be fixed by applying the patch attached to KDE bug 271664. Virtuoso 6.1.4 will be released soon and contains several fixes to bugs reported by me. An update is highly recommended.

Libstreamanalyzer and libstreams are libraries which are part of the Strigi project. In addition the Strigi project contains strigidaemon, an alternative scheduler for indexing files which is based on CLucene and not used by Nepomuk. I asked the maintainer of Strigi once to split libstreams and libstreamanalyzer into their own independently released packages. He refused which is understandable seeing as he has little time for Strigi as it is. As a consequence I advise packagers to either use libstreamanalyzer from git master or the latest tag instead of using released tarballs.

I think that is all. If I missed something please comment and I will update the post.