Archive

Archive for the ‘Scribo’ Category

GSoC Wrap-up Part 2

August 28, 2009 54 comments

Last time I presented the work Adam Kidder did on Nepomuk virtual folders in the GSoC. Today the story continues with the work by Alessandro Sivieri, my second GSoC student.

Whenever we handle files on the computer we need to bother with folder structures and file names. We need to come up with good naming schemes which allow us to find our files. We need to decide several times a day in which folder a file should go – should it go into folder A or B or should I create a subfolder? In the end there is always a little bit of chaos, even with the most structured minds. Alessandro tried a different approach in his project: save and load documents based on meta data and annotations rather than file and folder names.

This is not an easy task but I dare say that he succeeded. Alessandro created two new dialogs for saving and loading documents (we do not talk about files anymore – way too technical). The saving dialog allows to create arbitrary annotations for the document using the Nepomuk annotation plugin system which also brings in Scribo text analysis features. The loading dialog on the other hand uses a fancy filter system to narrow down the list of documents to open.

Saving Documents

We start by looking at the document saving dialog. Our example is KWord from which we want to save a fancy little text document. (No, it is not a test document, I really wrote this, this is real data, I assure you! … Yeah, OK, I admit it, just random words…) Hitting the save button opens up the new smart save dialog as can be seen in the screenshot below.

Smart Saving of a KWord document

Smart Saving of a KWord document

The first thing we notice is that there is no filename and no folder selection. Name and folder are selected by Nepomuk. However, we get to give the document a name (it makes things much easier for us later on) and a description (in a future version applications will be able to prefill these fields with some meaningful defaults). But the interesting part is the meta data. The dialog suggests certain possible annotations which we can approve). Below the recently used annotations we have the possiblity to add any annotation we want through the existing Nepomuk annotation system. Last but not least we can give the document a type. This type does not identify the document on a mime-type level but much more real-life oriented. The idea is that users either define their own types based on pimo:Document or use ontologies that provide them. Typical examples include invoices or letters or project descriptions. This way documents are saved on a much higher abstraction level than with the classical file chooser: instead of a text file we save an invoice.

Once we specified the meta data we want to apply to the new document and hit the save button the smart save dialog generates a folder and file name and saves the document. We do not need to care about the location.

(Hint: there are certainly situations in which we want to use the classic file chooser. That is why the smart save dialog allows to switch over to the old ways by the simple click of a button.)

Loading Documents

But if documents are saved in some random folder which we do not know, how do we find them again? Well, that is the real beauty of the new approach. The idea is that you tell the open dialog what you want to open by specifying some details that you remember.

Let us have a look at the smart open dialog as it opens from within Okular.

smartopen-okular1

We see two main views: on the left hand side we see a list of filters and on the right hand side we see a long list of files/documents. This might look overwhelming in the beginning but wait until we specify the first detail about the document we want to open: we tell the dialog that the document has mime type image/png (Yes, in the future this will look less technical) and the file view changes only showing png images.

smartopen-okular2

These are still way too many to search for the one we need, so we give more detail. We remember that we accessed the document sometime this week:

smartopen-okular3

Again the list of files is changed and now after only choosing two filters we are down to seven documents to choose from. Although this would be enough we do one better just to show that the filter system obviously also includes manual annotations such as tags:

smartopen-okular4

And after activating the tag filter we are down to a single document. Nice, isn’t it?

A Few Technical Details

There are a few technical aspects worth mentioning about Alessandro’s work.

First of all: he makes direct use of Adam’s work on the virtual folders. The file list on the right is a simple KDirModel listing a nepomuksearch:/?sparql=… query. I find this very nice as my two students shared knowledge and discussed their work to find good solutions to their problems.

The second thing I find important is the creation of the filter list. The list of filters is created dynamically based on the existing annotations of the files in the current selection. In essence the idea is to only show filters that would actually change the list of available files (as you can see in the last screenshot this does not work 100% yet but we are close).

The GUI is obviously a prototype and we hope that you will give feedback and ideas to improve its usability. As Adam, Alessandro will continue working on KDE and Nepomuk and the smart file dialog will evolve until KDE 4.4.

Try It

To test the smart file dialog you need three things:

  1. My kdelibs patch which makes the KFileDialog pluggable. This is actually a very simple one as the file dialog already loads the backend from a separate lib. While you are on it, please review the patch so it can get into KDE 4.4.
  2. The Nepomuk-KDE playground module which also contains the smart save dialog. I recommend installing the whole module as the smart save dialog makes use of pretty much every Nepomuk lib available.
  3. Tell KFileDialog to load the smartfilemodule instead of the default by adding “file module=smartfilemodule” into the “KFileDialog Settings” group of kdeglobals.

Obviously nepomuk needs to be enabled for it to work. Have fun.

A Nepomuk Vision

August 11, 2009 6 comments

It is always a pleasure being able to blog about someone else writing about Nepomuk. In this case it is Danté Ashton giving his vision of the Nepomuk and Scribo enabled semantic desktop. He has some very nice ideas and I hope that his words will encourage some more people to join the Nepomuk development team (we could really use the help).

Reblog this post [with Zemanta]

What Nepomuk Can do and How You Should Use it (as a Developer)

July 16, 2009 26 comments

Nepomuk has been around for quite a while but the functionality exposed in KDE 4.3 is still not that impressive. This does not mean that there does not exist cool stuff. It only means that there is not enough developer power to get it all stable and integrated perfectly. Let me give you an overview of what already exists in playground and how it can be used (and how you should use it).

The Basics

For starters there is the Nepomuk API in kdelibs which you should get familiar with.Most importantly (we will use it quite a lot later on) there is Nepomuk::Resource which gives access to arbitrary resources in Nepomuk.

Nepomuk::Resource file( myFilePath );
file.addTag( Nepomuk::Tag( “Fancy stuff” ) );
QString desc = file.description();
QList<Nepomuk::Tag> allTags = Nepomuk::Tag::allTags();

Resource allows simple manipulation of data in Nepomuk. Using some fancy cmake magic through the new NepomukAddOntologyClasses macro in kdelibs data manipulation gets even simpler. The second basic thing you should get familliar with is Soprano and SPARQL. As a quickstart the following code shows how I typically create queries using Soprano:

using namespace Soprano;

Model* model = Nepomuk::ResourceManager::instance()->mainModel();
QString query = QString( “prefix nao:%1 “
                         “select ?x where { “
                         “%2 nao:hasTag ?t . “
                         “?r nao:hasTag ?t . }” )
        .arg(Node::resourceToN3(Vocabulary::NAO::naoNamespace()))
        .arg(Node::resourceToN3(file.resourceUri()));
QueryResultIterator it
        = model->executeQuery( query, Query::QueryLanguageSparql );

As you can see there is always a lot of QString::arg involved to prevent hard-coding of URIs (again Soprano provides some cmake magic for generating Vocabulary namespaces).

These are the basics. Without these basics you cannot use Nepomuk.

Debugging Nepomuk Data

Now before we dive into the unstable, experimental, and really cool stuff let me mention sopranocmd.

sopranocmd is a command line tool that comes with Soprano and allows to perform virtually any operation possible on the Nepomuk RDF database. It has an exhaustive help output and you should use it to debug your data, test your queries and the like (if anyone is interested in creating a graphical version, please step up).

The Nepomuk database (hosting only a single Soprano model called “main”) can be accessed though D-Bus as follows:

sopranocmd --dbus org.kde.NepomukStorage --model main \
      query "select ?r where { ?r ?p ?o . }"

The Good Stuff

There is quite a lot of experimental stuff in the playground but I want to focus on the annotation framework and Scribo.

The central idea of the annotation framework is the annotation suggestion which is encapsulated in the Annotation class (Hint: run “make apidox” in the annotationplugin folder). Instead of the user manually annotating resources (adding tags or relating things to other things) the system proposes annotations which the user then simply acknowledges or discards. These Annotation instances are normally created by AnnotationPlugin instances (although it is perfectly possible to create them some other way) which are trigged through an AnnotationRequest.

Before I continue a short piece of code for the impatient:

Resource res = getResource();

AnnotationPluginWrapper* wrapper = new AnnotationPluginWrapper();
wrapper.setPlugins( AnnotationPluginFactory::instance()
   ->getPluginsSupportingAnnotationForResource( res.resourceUri() ) );
connect( wrapper, SIGNAL(newAnnotation(Nepomuk::Annotation*)),
         this, SLOT(addNewAnnotation(Nepomuk::Annotation*)) );
connect( wrapper, SIGNAL(finished()),
         this, SLOT(slotFinished()) );

AnnotationRequest req;
req.setResource( res );
req.setFilter( filter );
wrapper->getPossibleAnnotations( req );

The AnnotationPluginWrapper is just a convenience class which prevents us from connecting to each plugin separately. It reproduces the same signals the plugins emit.

The interesting part is the AnnotationRequest. At the moment (the framework is under development. This also means that your ideas, patches, and even refactoring actions are very welcome) it has three parameters, all of which are optional:

  1. A resource – The resource for which the annotation should be created. This parameter is a bit tricky as the Annotation::create method allows to create an annotation on an arbitrary resource but in some cases it makes perfect sense to only create annotation suggestions for only one resource.
  2. A filter string – A filter is supposed to be a short string entered by the user which triggers an auto-completion via annotations. Plugins should also take the resource into account if it is set.
  3. A text – An arbitrary long text which is to be analyzed by plugins. Plugins would typically extract keywords or concepts from it. Plugins should also take resource and filter into account if possible. This is where the Scribo system comes in (more later).

Plugins that I already created include very simple ones like the tag plugin which matches the filter to existing tag names and also excludes tags already set on the resource. Way more interesting are other plugins like the pimotype plugin which matches the filter to pimo types and proposed to use that type or the pimo relation plugin which allows to create relations via a very simple syntax: “author:trueg“. The latter will match author to existing properties and trueg to a value based on the property range. One step further goes the geonames annotation plugin which matches the filter or the resource label to cities or countries using the geonames web service. It will then propose to set a location or (in case the resource label was matched) to convert the resource into a city or country linking to the geonames resource.

A picture says more than a thousand words. Thus, here goes:

annotations-english

What do we see here? The user entered the text Paris in the AnnotationWidget (a class available in the framework) and the framework then created a set of suggested annotations. The most likely one is Paris, the city in France as sugested by the geonames plugin. The latter also proposes a few not so likely places. The pimotype plugin proposes to create a new type named Paris and the tag plugin proposes to create a new tag named Paris. Here I see room for improvement: if we can relate to the city Paris there is no need for the tag. Thus, some more sophisticated rating and comparision may be in order.

Now let us bring Scribo into play. Scribo is another framework in the playground which provides an API for text analysis and keyword extraction. It is tied into the annotation framework through a dedicated plugin which uses the TextAnnotation class to create annotations on specific text positions. The TextAnnotation class is supposed to be used to annotate text documents. It will create a new nfo:TextDocument and make it a nie:isPartOf the main document. Then the new resource is annotated according to the implementation.

The Scribo framework will extract keywords and entities from the text (specified via the AnnotationRequest text field) via plugins which will then be used to create annotation suggestions. There currently exist three plugins for Scribo: the datetime plugin extracts dates and times, the pimo plugin matches words in the text to things in the Nepomuk database, and the OpenCalais plugin will use the OpenCalais webservice to extract entities from the text.

You can try the Scribo framework by using the scriboshell which can be found in the playground, too:

scriboshell3

Paste the text to analyze in the left view and press the “Start” button. The right panel will then show all found entities and keywords including the text position and relevance.

The other possibility is to directly use the resourceeditor which is part of the annotation framework and bundles all gui elements the latter has to offer in one widget. Call it on a text file and you will get a window similar to the following:

resourceeditor

At the top you have the typical things: editable label and description, the rating, and the tags. Below that you have the exisiting properties and annotations. In the picture these are only properties extracted by Strigi. Then comes the interesting part: the suggestions. Here you can see three different Scribo plugins in action. First the pimo plugin matched the word “Brein” to an event I already had in my Nepomuk database. Then there is the OpenCalais plugin which extracted the “Commission of European Communities” (so far the plugin ignores the additional semantic information provided by OpenCalais) and proposes to tag the text with it.

The last suggested annotation that we can see is “Create Event“. This is a very interesting hack I did. The Scribo plugin detected the mentioning of a project, a date, and persons and thus, proposes to create an event which has as its topic the project and takes place at the extracted time. Since it is a hack created specifically for a demo its results will not be very great in many situations. But it shows the direction which I would like to take.

Below the suggestions you can see the AnnotationWidget again which allows to manually annotate the file.

How to Write an AnnotationPlugin

This is a Howto in three sentences: Derive from AnnotationPlugin and implement doGetPossibleAnnotations. In that method trigger the creation of annotations. Your annotations can be instances of SimpleAnnotation or be based on Annotation and implement at least doCreate, exists, and equals .

class MyAnnotationPlugin : pubic Nepomuk::AnnotationPlugin
{
public:
    MyAnnotationPlugin(QObject* parent, const QVariantList&);
protected:
    void doGetPossibleAnnotations(const Nepomuk::AnnotationRequest&);
};

void MyAnnotationPlugin::doGetPossibleAnnotations(
      const Nepomuk::AnnotationRequest& request
)
{
    // MyFancyAnnotation can do all sorts of crazy things like creating
    // whole graphs of data or even openeing another GUI
    addNewAnnotation(new MyFancyAnnotation(request));

    // SimpleAnnotation can be used to create simple key/value pairs
    Nepomuk::Types::Property property(Soprano::Vocabulary::NAO::prefLabel());
    Nepomuk::SimpleAnnotation* anno = new Nepomuk::SimpleAnnotation();
    anno->setProperty(property);
    anno->setValue("Hello World");
    // currently only the comment is used in the existing GUIs
    anno->setComment("Set label to 'Hello World'");
    addNewAnnotation(anno);

    // tell the framework that we are done. All this could also
    // be async
    emitFinsihed();
}

And Now?

At the Nepomuk workshop Tom Albers already experimented with integrating the annotation suggestions into Mailody. It is rather simple to do that but the framework still needs polishing. More importantly, however, the created data needs to be presented to the user in a more appealing way. In short: I need help with all this!

Integrate it into your applications, improve it, come up with new ways of presenting the information, write new plugins. Jump on board of the semantic desktop train.

Thanks for reading.

Scribo – Getting Natural Language Into The Mix

May 14, 2009 14 comments

So the Nepomuk project is over (as in Nepomuk – the big research project, not Nepomuk in KDE) but thanks to Mandriva and the new french research project Scribo I will continue working on the semantic desktop in KDE. Scribo is all about brining natural language processing (NLP) to the desktop. And Mandriva will again (as done with Nepomuk) bring it to the KDE desktop. This is exiting, especially since it integrates very well with the ideas of the semantic desktop: we can analyze emails, text documents or web pages and extract machine readable information from it which we can then use within Nepomuk.

As a first glance at what can be done I created a little plugin system (not unlike my annotation system for Nepomuk which is still in playground, trying to mature) for text analysis.

Scribo analyzed a block of text

Scribo analyzed a block of text

So far I implemented two plugins which analyze a text and propose certain entities or statements. The first one uses the keyword extraction developed by DERI Galway. This provides a list of keywords with corresponding relevance values which could then be used as tags or be mapped to resources in the Nepomuk store (imagine projects or persons or whatever). The second one uses the OpenCalais web service by Reuters which uses a huge database and some fancy algorithms to extract entities and statements from the text. An example can be seen in the following screenshot – “Linux” has been detected as a Technology and can be found twice in the text as such.  Like the keyword extraction OpenCalais also provides a relevance. But in addition we get the position(s) in the text and the type of entity (based on an ontology created by Reuters).

Showing one detected entity

Showing one detected entity

I think this is already quite nice. Now it is time to use this stuff in something more than a test app, to propose annotations for files and emails for example (BTW: I already implemented a plugin for my annotation framework based on the Scribo framework – ah, isn’t it nice if it all fits together)

Anyway, this is what Scribo will bring us: NLP. Enjoy, discuss, flame, praise. :)

Scribo Configuration

Scribo Configuration