So the Nepomuk project is over (as in Nepomuk the big research project, not Nepomuk in KDE), but thanks to Mandriva and the new French research project Scribo I will continue working on the semantic desktop in KDE. Scribo is all about bringing natural language processing (NLP) to the desktop. And Mandriva will again (as it did with Nepomuk) bring it to the KDE desktop. This is exciting, especially since it integrates very well with the ideas of the semantic desktop: we can analyze emails, text documents or web pages and extract machine-readable information from them, which we can then use within Nepomuk.
As a first glance at what can be done, I created a little plugin system for text analysis (not unlike my annotation system for Nepomuk, which is still in playground, trying to mature).
So far I have implemented two plugins which analyze a text and propose certain entities or statements. The first one uses the keyword extraction developed by DERI Galway. It provides a list of keywords with corresponding relevance values, which could then be used as tags or be mapped to resources in the Nepomuk store (imagine projects or persons or whatever). The second one uses the OpenCalais web service by Reuters, which uses a huge database and some fancy algorithms to extract entities and statements from the text. An example can be seen in the following screenshot: “Linux” has been detected as a Technology and is found twice in the text as such. Like the keyword extraction, OpenCalais also provides a relevance value. In addition we get the position(s) in the text and the type of the entity (based on an ontology created by Reuters).
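To make the idea a bit more concrete, here is a minimal sketch in Python of what such a plugin system could look like. This is not the actual Scribo or Nepomuk API: the names `Entity`, `TextAnalysisPlugin`, `WordListPlugin` and `run_plugins` are all made up for illustration, and the toy plugin just looks up known terms instead of calling the DERI keyword extraction or OpenCalais. It only shows the shape of the results described above: a label, a relevance value, and optionally a type and the positions in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Entity:
    """One proposal made by a plugin (hypothetical structure)."""
    label: str
    relevance: float                 # 0.0 .. 1.0, higher means more relevant
    entity_type: str = ""            # e.g. "Technology"; empty for plain keywords
    positions: List[int] = field(default_factory=list)  # character offsets


class TextAnalysisPlugin(Protocol):
    """Anything with an analyze() method counts as a plugin in this sketch."""
    def analyze(self, text: str) -> List[Entity]: ...


class WordListPlugin:
    """Toy stand-in for a real extractor: finds known terms in the text."""

    def __init__(self, known_terms: Dict[str, str]):
        self.known_terms = known_terms  # maps term -> entity type

    def analyze(self, text: str) -> List[Entity]:
        results = []
        lowered = text.lower()
        total_words = max(len(text.split()), 1)
        for term, entity_type in self.known_terms.items():
            needle = term.lower()
            positions = []
            start = lowered.find(needle)
            while start != -1:
                positions.append(start)
                start = lowered.find(needle, start + 1)
            if positions:
                # Crude relevance: share of words that are this term.
                relevance = min(1.0, len(positions) / total_words)
                results.append(Entity(term, relevance, entity_type, positions))
        return results


def run_plugins(plugins: List[TextAnalysisPlugin], text: str) -> List[Entity]:
    """Collect the proposals of all plugins, most relevant first."""
    found: List[Entity] = []
    for plugin in plugins:
        found.extend(plugin.analyze(text))
    return sorted(found, key=lambda e: e.relevance, reverse=True)
```

A caller would then do something like `run_plugins([WordListPlugin({"Linux": "Technology"})], some_text)` and turn the highest-ranked proposals into tags or annotation suggestions. A real plugin would of course do the expensive work asynchronously, which this sketch leaves out.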
I think this is already quite nice. Now it is time to use this stuff in something more than a test app, for example to propose annotations for files and emails. (BTW: I already implemented a plugin for my annotation framework based on the Scribo framework. Ah, isn’t it nice when it all fits together?)
Anyway, this is what Scribo will bring us: NLP. Enjoy, discuss, flame, praise. :)