Small Things are Happening…


I have to admit: I am a sucker for nice APIs. And, yes, I am sort of in love with some of my own creations. Well, at least until I find the flaws and cannot remove them due to binary compatibility issues (see Soprano). This may sound a bit egomaniac but let’s face it: we almost never get credit for good API or good API documentation. So we need to congratulate ourselves.

My new pet is the Nepomuk Query API. As its name says it can be used to query Nepomuk resources and sets out to replace as many hard coded SPARQL queries as possible. It started out rather simple: matching a set of resources with different types of terms. But then Dario Freddi and his relatively complex telepathy queries came along. So the challenge began. I tried to bend the existing API as much as possible to fit in the features he requested. One thing let to another, I suddenly found myself in need of optional terms and a few days later things were not as simple as they started out any more.

ComparisonTerm already was the most complex term in the query API. But that did not mean much. Basically you could set a property and a sub term and that was it. On Monday it became a bit more beastly. Now you can invert it, force the name of the variable used, give it a sort weight, change the sort order, and even set an aggregate function. And all that on only one type of term. At least I implemented optional terms separately.

To explain what all this is good for I will try to illustrate it with a few examples:

Say you want to find all tags a specifc file has (and ignore the fact that there is a Nepomuk::Resource::tags method). This was not possible before inverted comparison terms came along. Now all you have to do is:

Nepomuk::Query::ComparisonTerm tagTerm(
    Soprano::Vocabulary::NAO::hasTag(),
    Nepomuk::Query::ResourceTerm(myFile)
);
tagTerm.setInverted(true);
Nepomuk::Query::Query(tagTerm);

What happens is that subject and object change places in the ComparisonTerm, thus, the resulting SPARQL query looks something like the following:

select ?r where { <myFile> nao:hasTag ?r . }

Simple but effective and confusing. It gets better. Previously we only had the clumsy Query::addRequestProperty to get additional bindings from the query. It is very restricted as it only allows to query direct relations from the results. With ComparisonTerm::setVariableName we now have the generic counterpart. By setting a variable name this variable is included in the bindings of the final query and can be retrieved via Result::additionalBinding. This allows to retrieve any detail from any part of the query. Again we use the most simple example to illustrate:

Nepomuk::Query::ComparisonTerm labelTerm(
    Soprano::Vocabulary::NAO::prefLabel(),
    Nepomuk::Query::Term() );
labelTerm.setVariableName( "label" );
Nepomuk::Query::ComparisonTerm tagTerm(
    Soprano::Vocabulary::NAO::hasTag(),
    labelTerm );
tagTerm.setInverted(true);
Nepomuk::Query::Query query( tagTerm );

This query lists all tags including their labels. Again the resulting SPARQL query would look something like the following:

select ?r ?label where { <myFile> nao:hasTag ?v1 . ?v1 nao:prefLabel ?label . }

And silently I used another little gimmick that I introduced: ComparisonTerm can now handle invalid properties and invalid sub terms which will simply act as wildcards (or be represented by a variable in SPARQL terms).

Now on to the next feature: sort weights. The idea is simple: you can sort the search results using any value matched by a ComparisonTerm. So let’s extend the above query by sorting the tags according to their labels.

labelTerm.setSortWeight( 1 );

And the resulting SPARQL query will reflect the sorting:

select ?r ?label where { <myFile> nao:hasTag ?v1 . ?v1 nao:prefLabel ?label . } order by ?label

Here I used a sort weight of 1 since I only have the one term that includes sorting. But in theory you can include any number of ComparisonTerms in the sorting. The higher the weight the more important the sort order.

We are nearly done. Only one feature is left: aggregate functions. The Virtuoso SPARQL extensions (and SPARQL 1.1, too) support aggregate functions like count or max. These are now supported in ComparisonTerm. They are only useful in combination with a forced variable name in which case they will be included in the additional bindings or with a sort weight. If we go back to our tags we could for example count the number of tags each file has attached:

Nepomuk::Query::ComparisonTerm tagTerm(
    Soprano::Vocabulary::NAO::hasTag(),
    Nepomuk::Query::ResourceTypeTerm(Soprano::Vocabulary::NAO::Tag())
);
tagTerm.setAggregateFunction(
    Nepomuk::Query::ComparisonTerm::Count
);
tagTerm.setVariableName("cnt");
Nepomuk::Query::Query(tagTerm);

And the resulting SPARQL query will be along the lines of:

select ?r count(?v1) as ?cnt where { ?r nao:hasTag ?v1 . ?v1 a nao:Tag . }

Now we can of course sort by number of tags:

tagTerm.setSortWeight(1, Qt::DescendingOrder);

And we get:

select ?r count(?v1) as ?cnt where { ?r nao:hasTag ?v1 . ?v1 a nao:Tag . } order by desc ?cnt

And just because it is fun, let us make the tagging optional so we also get files with zero tags (be aware that normally one should have at least one non-optional term in the query to get useful results. In this case we are on the safe side since we are using a FileQuery):

Nepomuk::Query::ComparisonTerm tagTerm(
    Soprano::Vocabulary::NAO::hasTag(),
    Nepomuk::Query::ResourceTypeTerm(Soprano::Vocabulary::NAO::Tag())
);
tagTerm.setAggregateFunction(
    Nepomuk::Query::ComparisonTerm::Count
);
tagTerm.setVariableName("cnt");
tagTerm.setSortWeight(1, Qt::DescendingOrder);
Nepomuk::Query::FileQuery(
    Nepomuk::Query::OptionalTerm::optionalizeTerm(tagTerm)
);

And with the SPARQL result of this beauty I finish my little session of self-congratulations:

select ?r count(?v1) as ?cnt where { ?r a nfo:FileDataObject . OPTIONAL { ?r nao:hasTag ?v1 . ?v1 a nao:Tag . } . } order by desc ?cnt
About these ads

11 thoughts on “Small Things are Happening…

  1. I already told you how much I liked the Query API (especially after all those string queries last summer ;) but this is getting more interesting at each update!
    Nice work!

  2. It always seemed to me that Query API is too cumbersome.
    Maybe it would me much better to make use of string-template system (read Grantlee – http://dot.kde.org/2010/04/12/grantlee-version-010-out) instead?
    i.e if the main task of this API is to hide and centralize backend-dependant SPARQL code – then it would be much clearer to process string-templated SPARQL intsead of myriads of “setVariableName” and “setInverted”

    • Actually the main task is to get rid of SPARQL strings in application code. Thus, using a template system seems like a step back as developer suddenly have to think of variables and all that again (with the last sentence you have to ignore the setVariableName method)

    • I agree with Sebastian here: to me, this query system seems clearer than having strings all over the place, with string joins, string substitutions and all the rest… frankly. having queries created with strings is very difficult to handle, especially when the queries themselves are built incrementally.

      • > I agree with Sebastian here: to me, this query system seems clearer than having strings all over the place, with string joins, string substitutions and all the rest…
        Decent string template system + thoroughly worked-out standard can greatly simplify all the string joins\substitutions.
        Maybe i’m pointless here until i can present proof-of-concept with comlex cases considered, but this is something for me to work on ;)

  3. Yes, it’s very useful. It seems to offer a level of abstraction over SPARQL queries (almost) the same way an ORM does for SQL.

  4. > I agree with Sebastian here: to me, this query system seems clearer than having strings all over the place, with string joins, string substitutions and all the rest…Decent string template system + thoroughly worked-out standard can greatly simplify all the string joins\substitutions.Maybe i’m pointless here until i can present proof-of-concept with comlex cases considered, but this is something for me to work on ;)
    +1

  5. The URL for the nepomuk query api gives a 404 error. I’d love to use it but it is somewhat difficult when there’s no documentation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s