A Summer 2010 Full of Nepomuk Code


Another year, another Google Summer of Code, another 4 (yes, four!) semantic desktop projects. It is amazing. After two very successful projects in 2009 we now take it one step further with three Nepomuk projects and one Strigi project. Without further ado I give you the Nepomuk Google Summer of Code 2010 projects:

Metadata Backup, Sync and Sharing by Vishesh Handa

Ever since we started to create meta data on the desktop (by this I mean tags, ratings, and relations between resources that cannot be recreated easily) we have also needed a way to back up and sync this data. So far this area is lacking in Nepomuk. Vishesh sets out to change this situation and develop ways to sync meta data between different clients (imagine syncing your laptop with the desktop computer or the phone) or simply to back it up. This does not simply mean coding some backup GUI; it actually includes changes at the ontology (that is, the data) level. When syncing data between two clients (or between a client and a backup; the principle is the same) the two most complicated matters are: 1. identifying the resources which need to be merged on both ends and 2. deciding which data needs to be removed and which needs to be added.
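To make these two steps a bit more concrete, here is a minimal sketch in Python. It is not Vishesh's design and it does not use any Nepomuk API: meta data is modelled as plain sets of (resource, property, value) statements, and I simply assume that every resource carries some stable identifying key (a content hash in this example). The resource URIs, the keys, and the helper names `identify` and `diff_statements` are all made up for illustration.

```python
def identify(source_keys, target_keys):
    """Step 1: map source resource URIs to target URIs with the same identifying key."""
    target_by_key = {key: uri for uri, key in target_keys.items()}
    return {
        uri: target_by_key[key]
        for uri, key in source_keys.items()
        if key in target_by_key
    }


def diff_statements(source, target, mapping):
    """Step 2: return (to_add, to_remove) that would make target match source."""
    # Rewrite source statements so they talk about the target's resource URIs.
    translated = {(mapping.get(s, s), p, o) for s, p, o in source}
    return translated - target, target - translated


# Hypothetical data: the same file is known under different URIs on two machines.
laptop_keys = {"res:laptop/1": "sha1:abc"}
desktop_keys = {"res:desktop/7": "sha1:abc"}

laptop = {("res:laptop/1", "tag", "holiday"), ("res:laptop/1", "rating", 8)}
desktop = {("res:desktop/7", "tag", "holiday"), ("res:desktop/7", "rating", 5)}

mapping = identify(laptop_keys, desktop_keys)
to_add, to_remove = diff_statements(laptop, desktop, mapping)
```

This is a one-way sync that treats the laptop as authoritative. A real two-way sync also has to decide which side wins when both changed, which is exactly where things get complicated.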

Well, suffice it to say that Vishesh has an ambitious project ahead of him. But looking at his enthusiasm and his early involvement in KDE (he is already committing one patch after the other) I am very confident that he will succeed.

Web Metadata Extractor Framework and Service by Artem Serebriyskiy

In Nepomuk we use the Strigi system to extract meta data from files and store it in the Nepomuk database, allowing the user to search files based on their meta data. This is very useful. However, there are certain types of files that provide little or no meta data at all. Typical examples are video files. It would be very interesting to be able to search for video files by title, actors, directors, or release year. All this information is available on the Internet. So why not make use of it?

This is exactly what Artem’s project is about: extracting meta data from the web and associating it with local files. Of course he will implement this as a Nepomuk service that provides a plugin system allowing for different types of extractors and that handles uncertainties and duplicate information as smoothly as possible. Look out for more cool information at your fingertips.
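Just to illustrate the idea of pluggable extractors that cope with uncertainty and duplicates, here is a rough Python sketch. It is not Artem's architecture: the `Candidate` type, the `extractor` registry, the confidence numbers, and both example plugins are assumptions of mine, and the "web lookup" only returns canned data instead of querying anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    prop: str          # which piece of meta data, e.g. "title"
    value: str
    confidence: float  # 0.0 to 1.0, as estimated by the plugin itself


EXTRACTORS = []


def extractor(func):
    """Register a function as an extractor plugin."""
    EXTRACTORS.append(func)
    return func


@extractor
def from_filename(filename):
    # Guessing the title from the file name is cheap but unreliable.
    title = filename.rsplit(".", 1)[0].replace("_", " ")
    return [Candidate("title", title, 0.4)]


@extractor
def from_canned_web_lookup(filename):
    # Stand-in for a real web query; the values are hard-coded for the example.
    return [
        Candidate("title", "Big Buck Bunny", 0.9),
        Candidate("release_year", "2008", 0.8),
    ]


def extract(filename):
    """Run all plugins and keep the most confident copy of each duplicate."""
    best = {}
    for plugin in EXTRACTORS:
        for cand in plugin(filename):
            key = (cand.prop, cand.value.lower())
            if key not in best or cand.confidence > best[key].confidence:
                best[key] = cand
    return sorted(best.values(), key=lambda c: (c.prop, -c.confidence))


results = extract("big_buck_bunny.avi")
```

Here both plugins propose the same title (ignoring case), so only the more confident candidate survives, next to the release year.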

Nepomuk Dedicated Desktop Search GUI by Oszkar Ambrus

Let’s face it: today desktop search is still the number one use case for Nepomuk (although it was not the original motivation, but that is another story). So having a good and convenient user interface is essential for the success of the system. We have several interfaces in KDE including the search bar in Dolphin and the search runner. But all are lacking in at least two main areas: 1. the query building: so far one has to know a lot about the underlying data structures to write powerful queries; and 2. the presentation of the search results: currently the results are presented like any other folder, leaving out interesting information like a hit score or details on why the result was returned. (Actually there is a number three which I hope Oszkar will have the time to attack: since we have more than file results we need a good way to open and present these resources.)

Oszkar sets out to improve this situation and create reusable components to let the user create powerful queries without much knowledge of the data and to present the results in a convenient way. An important project that will undoubtedly yield great results.

Strigi: Stream Analyzer based on Data Structure Descriptions

Jos was kind enough to write a paragraph on the Strigi project:

Yet another project has been granted. Yulia Medvedeva will work on a new type of file analyzer for Strigi. The goal of the project is to write the structure of files down in a grammar file and generate code from the grammar or parse the grammar at runtime. Writing analyzers usually involves quite a bit of repetitive, error-prone code. It also requires knowledge of C++. By writing the format in a grammar language, coding errors are avoided. In addition to that, the independence from the programming language allows the grammars to be shared with other projects.
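To give a flavour of the "parse the description at runtime" variant, here is a small Python sketch. It has nothing to do with the actual grammar language planned for Strigi: the description is just a list of (field name, `struct` format code) pairs, and one generic `analyze` function (a name I made up) interprets it. The example describes the start of a PNG file, that is, the signature followed by the first fields of the IHDR chunk.

```python
import struct

# Declarative description: field name plus a struct format code (big-endian).
PNG_HEADER = [
    ("signature", "8s"),
    ("ihdr_length", "I"),
    ("ihdr_type", "4s"),
    ("width", "I"),
    ("height", "I"),
    ("bit_depth", "B"),
    ("color_type", "B"),
]


def analyze(description, data):
    """Generic analyzer: read the fields named in the description from data."""
    fmt = ">" + "".join(code for _, code in description)
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError("data too short for this description")
    values = struct.unpack(fmt, data[:size])
    return dict(zip((name for name, _ in description), values))


# Hand-built sample bytes standing in for the beginning of a real file.
sample = b"\x89PNG\r\n\x1a\n" + struct.pack(">I4sIIBB", 13, b"IHDR", 640, 480, 8, 2)
fields = analyze(PNG_HEADER, sample)
```

Supporting another fixed-layout header then means writing another list, not another hand-written parser. Real file formats need more than fixed-size fields, of course, which is why a proper grammar is the interesting part of the project.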

Well, that is it for the four projects that should give Nepomuk a good push forward. I am very happy about the selection and have to say thank you to Google and the rest of the KDE mentor team for giving us this much support. It will be legendary!

10 thoughts on “A Summer 2010 Full of Nepomuk Code”

  1. It is always great to hear about new cool stuff in Nepomuk. However, what happened to last year’s projects? Were they merged into trunk?

    • Yes and no: The improved query folders that Adam worked on were merged quite a while ago and are in KDE 4.4. Alessandro’s work, however, was more advanced and experimental and is still in playground. It is, however, far from forgotten. Mandriva even ships it and enables it by default. The plan is still to get it into kdebase at some point. But it needs some love.
      Alessandro also built upon his work and created SemBrowser which will soon be merged into Dolphin.

  2. In creating a new search interface it will be very important to think about the overall desktop user experience. Currently there is a search bar in Dolphin, a Nepomuk runner in KRunner, and KFind. These things and the new interface need to somehow be tied together in a way that makes sense to the user, so that it is easy to understand what is going on and which tool is the correct one to use. (Oh, and faceted browsing is, I suppose, almost searching, so that’s another interface.)

    Windows 7 warns the user if a directory is “not indexed” when searching it but uses the same search GUI. Of course the situation in KDE is more complex, as (1) Nepomuk contains data and features that are not available when it is turned off (i.e. it doesn’t just provide indexing), (2) Nepomuk is turned on globally, and (3) Strigi indexing per folder is also part of the picture.

    Nevertheless I think the idea of helping the user understand what the different searches are doing is a good one.

