Archive

Posts Tagged ‘SPARQL’

Just another way of browsing your files

October 26, 2009 15 comments

The Zeitgeist guys created a fuse file system called zeitgeistfs. It is basically a calendar containing the files accessed at that specific date. So at the Akonadi meeting last weekend, having two hours to kill, I thought that should be doable with KIO. So two hours later (most of that time was spent twiddling with UDS entries) the timeline:/ KIO slave was up and running:

The code that actually does something is minimal: a bit of UDS entry creation for dates and a simple SPARQL query to forward to the nepomuksearch KIO slave. Yes, it is as easy as that since we can simply set the UDS_URL property of an item to a nepomuksearch URL and KIO will take care of the rest. Smooth. Thanks a lot David Faure. Once again you paved the way.

OK then. Try it if you like. This is just another example of what can be done with Nepomuk (no real semantics here though). The code is in the playground as always and is based on the current kdebase trunk. So with KDE 4.3 this baby won’t work. And of course as this is based on file meta data, the Nepomuk Strigi integration has to be enabled.

SPARQL is weird…

October 26, 2009 9 comments

It really is. The following query is the only way I found to exclude folders when looking for files:

select ?r where {
   ?r a nfo:FileDataObject .
   OPTIONAL { ?r2 a nfo:Folder . FILTER(?r = ?r2) . } .
   FILTER( !BOUND(?r2) ) .
}

Now to me this looks really weird. And maybe I am simply not seeing the wood for all the trees…

And Yet Another Post About Virtuoso

October 14, 2009 7 comments

Today nearly all problems are solved. OpenLink provided a patch that makes inserting very large literals (more than 1 metabyte in size) lightning fast, even with a very low buffer count. Also I worked around the issue of URI encoding. Now the Soprano Virtuoso backend simply percent-encodes all non-unreserved characters and all reserved characters that are not used in their special meaning in URIs used in queries. Man, that is a mouth full. Well, it seems to work fine although I can always use more testing with weird file URLs (weird means containing weird characters like brackets and the likes). I also fixed some error handling bugs.

So what is left? Well, there are a few hacks in the Virtuoso backend which are rather ugly. One example is the detection of query result types. To determine if the result is boolean, bindings, or a graph it actually checks the name and number of result columns. Urgh! It would be nicer to check for the type of the result. Seems like graph results are BLOBs.

Anyway, enough for tonight. I am tired. Here is the patch to make Virtuoso not hang when Strigi adds nie:PlainTextContent literals of big files:

Index: sqlrcomp.c
===================================================================
RCS file: virtuoso-opensource/libsrc/Wi/sqlrcomp.c,v
retrieving revision 1.9
diff -u -r1.9 sqlrcomp.c
--- sqlrcomp.c  20 Aug 2009 17:47:22 -0000      1.9
+++ sqlrcomp.c  13 Oct 2009 16:11:49 -0000
@@ -65,7 +65,7 @@
 {
 va_list list;
 char temp[2000];
-  int ret;
+  int ret, rest_sz, copybytes;
 va_start (list, string);
 ret = vsnprintf (temp, sizeof (temp), string, list);
 #ifndef NDEBUG
@@ -75,11 +75,16 @@
 va_end (list);
 #ifndef NDEBUG
 if (*fill + strlen (temp) > len - 1)
-    GPF_T1 ("overflow in strncpy");
+    GPF_T1 ("overflow in memcpy");
 #endif
-  strncpy (&text[*fill], temp, len - *fill - 1);
+  rest_sz = (len - fill[0]);
+  if (ret >= rest_sz)
+    copybytes = ((rest_sz > 0) ? rest_sz : 0);
+  else
+    copybytes = ret+1;
+  memcpy (text+fill[0], temp, copybytes);
 text[len - 1] = 0;
-  *fill += (int) strlen (temp);
+  fill[0] += ret;
 }

A Bit Of Nepomuk Goodness On Your Developer Fingertips

September 22, 2009 Leave a comment

And yet another technical blog entry. This time it concerns the latest improvements in sopranocmd (Soprano >= 2.3.61). For starters there is an improved NRLModel which provides automatic query prefix expansion. What does that mean? Well, it means that debugging Nepomuk data is simpler now as you can simply use all ontologies stored in the Nepomuk database without defining their prefixes. With sopranocmd (I am using nepomukcmd, a little alias I introduce in the Nepomuk Tips and Tricks) this feature is enabled via the –nrl parameter. Thus, querying all tags becomes:

nepomukcmd --nrl query "select ?r where { ?r a nao:Tag . }"

The second new thing is an improved import command. Again enabled with the –nrl parameter it creates a new named graph of type nrl:KnowledgeBase and puts all new statements (which are not in a graph yet) into it. As described in Nepomuk Data Layout it also adds a metadata graph and the creation date. It actually makes use of NRLModel::createGraph.

The reason I did this was to be able to migrate the tmo:Task instances I had created on the laptop to my desktop machine. Just as an example I will show the procedure here:

First I export all the tasks and their properties on the laptop:

nepomukcmd --nrl export "describe ?r where { ?r a tmo:Task . }" \
     /tmp/task-dump.n4

And then on the desktop I simply import them into Nepomuk:

nepomukcmd --nrl import /tmp/task-dump.n4

Update: You can also use query prefixes for statement listing now. Thus the following is now possible:

nepomukcmd --nrl list "" a tmo:Task

(Even the “a” keyword now maps to rdf:type.)

GSoC Wrap-up Part 1

August 25, 2009 18 comments

This year’s Google Summer of Code has ended. And it was a great success!

This year I had the pleasure to mentor two outstanding students: Adam Kidder and Alessandro Sivieri. Working with them was fun and rewarding. Both quickly understood what Nepomuk was all about and provided high quality work. I am very happy about that. Even more so since both of them plan to continue working on KDE and Nepomuk. Thus, I can only repeat myself: a great success.

Enough of the euphoria. Let us dive into the good stuff and start with Adam’s project:

Improved Virtual Folders

We have had the virtual folder KIO slave in KDE for quite some time now. But it was one big hack I threw together and always had its hickups, not to mention the lack of features. Adam took the project of improving the situation by making it more stable, introducing new features such as negated terms and relative dates, and providing a GUI for query creation. I can assure you that this was no easy task. Diving into the messy code I produced both for the Nepomuk query service and the search KIO slave Adam needed nerves of steel. But he proved himself by understanding and sorting out the mess and introducing a bunch of nice features.

Relative Dates

One of the nicest thing Adam implemented is the support for relative date in queries. By relative dates I mean for example yesterday as you can see in the following screenshot:

Virtual Folder using a relative date

Virtual Folder using a relative date

Another possible relative date is “a week ago” which can of course also be combined with other query terms:

gsoc-virtfolders-last-weekApart from relative dates Adam implemented

Negated Query Terms

Using a minus sign as the negation prefix we can exclude certain query terms:

Querying for one tag

Querying for one tag

Excluding another tag

Excluding another tag

Very useful and mandatory for any search engine.

One thing I personally find very important is the possibility to use

Sparql Queries in the KIO slave

This allows to use the KIO slave to list arbitrary query results (as long as its only resources) and list them in Dolphin or even use a KDirModel to list resources in any application.

Listing Nepomuk Tasks via the Search KIO slave

Listing Nepomuk Tasks via the Search KIO slave

Now let us have a look at the

GUI

Due to the complexity of Adam’s project’s code he did not get as far with the GUI as he would have liked. But as mentioned already he will continue to work on it and integrate it into Dolphin nicely. Anyway, so far we have a small query creator which allows to save queries that are then displayed in the nepomuksearch:/ main folder.

Editing a query in the simple query editor

Editing a query in the simple query editor

Try it

If you want to test Adam’s new features before they are merged into trunk you need to install his work branch which replaces a few files installed by kdebase-runtime. The query editor is still part of the Nepomuk playground module. It is not enabled in the build system of the whole module, it needs to be built independantly.

That’s it for now. Next up: Alessandro’s smart file dialog.

Reblog this post [with Zemanta]

Nepomuk Data Layout

August 21, 2009 11 comments

I am happy to announce that I just finished writing an article about the data layout in Nepomuk. I think it is another must-read if you want to work with Nepomuk data, either querying it or creating data yourself. The article is another one in the series of techbase tutorials about Nepomuk. Well, this was short and fairly dry…

Reblog this post [with Zemanta]

Introduction to RDF and SPARQL by the Other Guy

July 21, 2009 Leave a comment

As you might know already Tracker is now using RDF and SPARQL and also the Nepomuk ontologies. One of their most active developers is Phillip van Hoof. He just wrote two blogs on this topic: Introduction to RDF and SPARQL and More introduction to RDF and SPARQL. He gives a very nice overview of semantic data on the desktop. Of course, his examples are a bit Tracker specific but most can also be performed using sopranocmd.

Reblog this post [with Zemanta]

What Nepomuk Can do and How You Should Use it (as a Developer)

July 16, 2009 26 comments

Nepomuk has been around for quite a while but the functionality exposed in KDE 4.3 is still not that impressive. This does not mean that there does not exist cool stuff. It only means that there is not enough developer power to get it all stable and integrated perfectly. Let me give you an overview of what already exists in playground and how it can be used (and how you should use it).

The Basics

For starters there is the Nepomuk API in kdelibs which you should get familiar with.Most importantly (we will use it quite a lot later on) there is Nepomuk::Resource which gives access to arbitrary resources in Nepomuk.

Nepomuk::Resource file( myFilePath );
file.addTag( Nepomuk::Tag( “Fancy stuff” ) );
QString desc = file.description();
QList<Nepomuk::Tag> allTags = Nepomuk::Tag::allTags();

Resource allows simple manipulation of data in Nepomuk. Using some fancy cmake magic through the new NepomukAddOntologyClasses macro in kdelibs data manipulation gets even simpler. The second basic thing you should get familliar with is Soprano and SPARQL. As a quickstart the following code shows how I typically create queries using Soprano:

using namespace Soprano;

Model* model = Nepomuk::ResourceManager::instance()->mainModel();
QString query = QString( “prefix nao:%1 “
                         “select ?x where { “
                         “%2 nao:hasTag ?t . “
                         “?r nao:hasTag ?t . }” )
        .arg(Node::resourceToN3(Vocabulary::NAO::naoNamespace()))
        .arg(Node::resourceToN3(file.resourceUri()));
QueryResultIterator it
        = model->executeQuery( query, Query::QueryLanguageSparql );

As you can see there is always a lot of QString::arg involved to prevent hard-coding of URIs (again Soprano provides some cmake magic for generating Vocabulary namespaces).

These are the basics. Without these basics you cannot use Nepomuk.

Debugging Nepomuk Data

Now before we dive into the unstable, experimental, and really cool stuff let me mention sopranocmd.

sopranocmd is a command line tool that comes with Soprano and allows to perform virtually any operation possible on the Nepomuk RDF database. It has an exhaustive help output and you should use it to debug your data, test your queries and the like (if anyone is interested in creating a graphical version, please step up).

The Nepomuk database (hosting only a single Soprano model called “main”) can be accessed though D-Bus as follows:

sopranocmd --dbus org.kde.NepomukStorage --model main \
      query "select ?r where { ?r ?p ?o . }"

The Good Stuff

There is quite a lot of experimental stuff in the playground but I want to focus on the annotation framework and Scribo.

The central idea of the annotation framework is the annotation suggestion which is encapsulated in the Annotation class (Hint: run “make apidox” in the annotationplugin folder). Instead of the user manually annotating resources (adding tags or relating things to other things) the system proposes annotations which the user then simply acknowledges or discards. These Annotation instances are normally created by AnnotationPlugin instances (although it is perfectly possible to create them some other way) which are trigged through an AnnotationRequest.

Before I continue a short piece of code for the impatient:

Resource res = getResource();

AnnotationPluginWrapper* wrapper = new AnnotationPluginWrapper();
wrapper.setPlugins( AnnotationPluginFactory::instance()
   ->getPluginsSupportingAnnotationForResource( res.resourceUri() ) );
connect( wrapper, SIGNAL(newAnnotation(Nepomuk::Annotation*)),
         this, SLOT(addNewAnnotation(Nepomuk::Annotation*)) );
connect( wrapper, SIGNAL(finished()),
         this, SLOT(slotFinished()) );

AnnotationRequest req;
req.setResource( res );
req.setFilter( filter );
wrapper->getPossibleAnnotations( req );

The AnnotationPluginWrapper is just a convenience class which prevents us from connecting to each plugin separately. It reproduces the same signals the plugins emit.

The interesting part is the AnnotationRequest. At the moment (the framework is under development. This also means that your ideas, patches, and even refactoring actions are very welcome) it has three parameters, all of which are optional:

  1. A resource – The resource for which the annotation should be created. This parameter is a bit tricky as the Annotation::create method allows to create an annotation on an arbitrary resource but in some cases it makes perfect sense to only create annotation suggestions for only one resource.
  2. A filter string – A filter is supposed to be a short string entered by the user which triggers an auto-completion via annotations. Plugins should also take the resource into account if it is set.
  3. A text – An arbitrary long text which is to be analyzed by plugins. Plugins would typically extract keywords or concepts from it. Plugins should also take resource and filter into account if possible. This is where the Scribo system comes in (more later).

Plugins that I already created include very simple ones like the tag plugin which matches the filter to existing tag names and also excludes tags already set on the resource. Way more interesting are other plugins like the pimotype plugin which matches the filter to pimo types and proposed to use that type or the pimo relation plugin which allows to create relations via a very simple syntax: “author:trueg“. The latter will match author to existing properties and trueg to a value based on the property range. One step further goes the geonames annotation plugin which matches the filter or the resource label to cities or countries using the geonames web service. It will then propose to set a location or (in case the resource label was matched) to convert the resource into a city or country linking to the geonames resource.

A picture says more than a thousand words. Thus, here goes:

annotations-english

What do we see here? The user entered the text Paris in the AnnotationWidget (a class available in the framework) and the framework then created a set of suggested annotations. The most likely one is Paris, the city in France as sugested by the geonames plugin. The latter also proposes a few not so likely places. The pimotype plugin proposes to create a new type named Paris and the tag plugin proposes to create a new tag named Paris. Here I see room for improvement: if we can relate to the city Paris there is no need for the tag. Thus, some more sophisticated rating and comparision may be in order.

Now let us bring Scribo into play. Scribo is another framework in the playground which provides an API for text analysis and keyword extraction. It is tied into the annotation framework through a dedicated plugin which uses the TextAnnotation class to create annotations on specific text positions. The TextAnnotation class is supposed to be used to annotate text documents. It will create a new nfo:TextDocument and make it a nie:isPartOf the main document. Then the new resource is annotated according to the implementation.

The Scribo framework will extract keywords and entities from the text (specified via the AnnotationRequest text field) via plugins which will then be used to create annotation suggestions. There currently exist three plugins for Scribo: the datetime plugin extracts dates and times, the pimo plugin matches words in the text to things in the Nepomuk database, and the OpenCalais plugin will use the OpenCalais webservice to extract entities from the text.

You can try the Scribo framework by using the scriboshell which can be found in the playground, too:

scriboshell3

Paste the text to analyze in the left view and press the “Start” button. The right panel will then show all found entities and keywords including the text position and relevance.

The other possibility is to directly use the resourceeditor which is part of the annotation framework and bundles all gui elements the latter has to offer in one widget. Call it on a text file and you will get a window similar to the following:

resourceeditor

At the top you have the typical things: editable label and description, the rating, and the tags. Below that you have the exisiting properties and annotations. In the picture these are only properties extracted by Strigi. Then comes the interesting part: the suggestions. Here you can see three different Scribo plugins in action. First the pimo plugin matched the word “Brein” to an event I already had in my Nepomuk database. Then there is the OpenCalais plugin which extracted the “Commission of European Communities” (so far the plugin ignores the additional semantic information provided by OpenCalais) and proposes to tag the text with it.

The last suggested annotation that we can see is “Create Event“. This is a very interesting hack I did. The Scribo plugin detected the mentioning of a project, a date, and persons and thus, proposes to create an event which has as its topic the project and takes place at the extracted time. Since it is a hack created specifically for a demo its results will not be very great in many situations. But it shows the direction which I would like to take.

Below the suggestions you can see the AnnotationWidget again which allows to manually annotate the file.

How to Write an AnnotationPlugin

This is a Howto in three sentences: Derive from AnnotationPlugin and implement doGetPossibleAnnotations. In that method trigger the creation of annotations. Your annotations can be instances of SimpleAnnotation or be based on Annotation and implement at least doCreate, exists, and equals .

class MyAnnotationPlugin : pubic Nepomuk::AnnotationPlugin
{
public:
    MyAnnotationPlugin(QObject* parent, const QVariantList&);
protected:
    void doGetPossibleAnnotations(const Nepomuk::AnnotationRequest&);
};

void MyAnnotationPlugin::doGetPossibleAnnotations(
      const Nepomuk::AnnotationRequest& request
)
{
    // MyFancyAnnotation can do all sorts of crazy things like creating
    // whole graphs of data or even openeing another GUI
    addNewAnnotation(new MyFancyAnnotation(request));

    // SimpleAnnotation can be used to create simple key/value pairs
    Nepomuk::Types::Property property(Soprano::Vocabulary::NAO::prefLabel());
    Nepomuk::SimpleAnnotation* anno = new Nepomuk::SimpleAnnotation();
    anno->setProperty(property);
    anno->setValue("Hello World");
    // currently only the comment is used in the existing GUIs
    anno->setComment("Set label to 'Hello World'");
    addNewAnnotation(anno);

    // tell the framework that we are done. All this could also
    // be async
    emitFinsihed();
}

And Now?

At the Nepomuk workshop Tom Albers already experimented with integrating the annotation suggestions into Mailody. It is rather simple to do that but the framework still needs polishing. More importantly, however, the created data needs to be presented to the user in a more appealing way. In short: I need help with all this!

Integrate it into your applications, improve it, come up with new ways of presenting the information, write new plugins. Jump on board of the semantic desktop train.

Thanks for reading.