The Hunt For Nepomuk Bugs Continues

Let me open with a few stats just to brag:

  • Top bug killer on the last commit digest
  • Number of Nepomuk crash reports now below 100
  • Overall number of Nepomuk bugs down to 163 (this is actually not much, have a look at the related statistics)
  • I closed some serious bugs this week (details below)

If you want to track the progress you can use the following links to check from time to time:

Finally I want to present two fixes I made this past week, just to show what kind of work needs to be done in order to fix problems in Nepomuk:

1. Bug 281136 – Nepomuk queries containing unicode characters fail

The problem presented itself as follows: whenever the user executed a query containing extended characters such as German umlauts, French accents, or any Russian character, the query would not return any results.

After some testing I realized that the queries simply failed when being delivered to Virtuoso because of Nepomuk’s automatic search excerpt extraction. It turned out that Virtuoso’s bif:search_excerpt method cannot handle wide characters, which is exactly what it got. So I turned to the Virtuoso team for help and got a workaround which essentially means that we convert the wide characters to UTF-8. However, this results in stripped search excerpts, so the story does not end here: I am waiting for a better solution from the Virtuoso guys.

2. Nepomuk deletes annotations of files on removable media

This was a very interesting bug – to me at least. The problem was that Nepomuk would delete the manually added information like tags, ratings, relations to other files, and so on from files that are stored on an external hard disk.

Now to understand this problem better I have to explain a bit how Nepomuk handles external media: Nepomuk uses Soprano’s API to access RDF data. This is done through a whole stack of what we call models, each of which performs some operations on the data that passes through. One of these models handles external media. It converts each URL of a file on an external medium into a new URL which is independent of the medium’s mount point.

Imagine for example that the external hard disk with UUID “foobar” is mounted at /media/hd. Then a URL like file:///media/hd/myfile.txt is converted to filex://foobar/myfile.txt. That way Nepomuk will find the file again even when the disk is mounted at another path. This conversion happens transparently for all clients, meaning they only work with the local file:// URLs. A nice side-effect is that when the disk is not mounted, any code that performs clean-up, like removing data for non-existing files, will ignore those entries since they have no relation to the mount point.
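To make the conversion a bit more concrete, here is a minimal sketch in Python. It is not Nepomuk's actual code (that is C++ on top of Soprano); the function names to_filex and to_file and the MOUNTS table are made up for illustration, and the sketch assumes URLs arrive as already-encoded strings.

```python
# Hypothetical mount table for illustration: device UUID -> current mount point.
MOUNTS = {"foobar": "/media/hd"}


def to_filex(file_url, mounts=MOUNTS):
    """Map file://<mount>/<path> to filex://<uuid>/<path>.

    URLs that are not below a known mount point are returned unchanged.
    """
    prefix = "file://"
    if not file_url.startswith(prefix):
        return file_url
    path = file_url[len(prefix):]
    for uuid, mount in mounts.items():
        if path == mount or path.startswith(mount + "/"):
            return "filex://" + uuid + path[len(mount):]
    return file_url


def to_file(filex_url, mounts=MOUNTS):
    """Map filex://<uuid>/<path> back to a local file:// URL.

    Returns None when the medium is not mounted, so callers can skip
    the entry instead of treating the file as deleted.
    """
    prefix = "filex://"
    if not filex_url.startswith(prefix):
        return filex_url
    uuid, sep, rest = filex_url[len(prefix):].partition("/")
    mount = mounts.get(uuid)
    if mount is None:
        return None
    return "file://" + mount + sep + rest
```

With the disk mounted at /media/hd, to_filex("file:///media/hd/myfile.txt") gives filex://foobar/myfile.txt, and passing a different mount table to to_file yields the local URL below the new mount point.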

On to the bug. Thankfully Ignacio Serantes realized that he only lost the information from files that had spaces in their names. That already pointed to a URL encoding problem. When we convert URIs from and to strings we use percent encoding. If all goes well this works fine. However, if we have a bug we might end up percent-encoding the already percent-encoded URI. This was the case in the removable media handling of Nepomuk. When converting the internal filex:// URL back to its file:// counterpart, the percent encoding got borked. As a result the clean-up code would check for the existence of the wrong local URL and remove the related data. The fix involved some trickery with QUrl and KUrl and reminded me that unit tests involving URIs should always check for possible percent-encoding problems.
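The double-encoding effect is easy to reproduce outside of Qt. The following is only an illustration using Python's urllib.parse rather than QUrl or KUrl, and the helper name survives_roundtrip is made up; it shows why a file name with a space ends up pointing at a different, non-existing path once it is encoded twice.

```python
from urllib.parse import quote, unquote


def survives_roundtrip(path):
    """Return True if encoding once and decoding once gives back the path."""
    return unquote(quote(path)) == path


path = "/media/hd/my file.txt"

encoded_once = quote(path)            # the space becomes %20
encoded_twice = quote(encoded_once)   # the '%' itself becomes %25

# Decoding the doubly encoded string once does not restore the original
# path, so an existence check on the result looks at the wrong file name.
decoded_wrong = unquote(encoded_twice)
```

Here encoded_once is /media/hd/my%20file.txt, encoded_twice is /media/hd/my%2520file.txt, and decoded_wrong is a path that literally contains "%20". A unit test along the lines of survives_roundtrip, fed with names containing spaces, percent signs and non-ASCII characters, is the kind of check that would catch this class of bug.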

Well, the hunt for bugs is going on. In the meantime I am also still hunting for Nepomuk funding.



11 thoughts on “The Hunt For Nepomuk Bugs Continues”

  1. Awesome.

    I see you have done at least some changes on but do you have any plans to change the design to one used on phonon- and

  2. Thanks, Sebastian, for isolating and killing the bug 281136 I just reported.

    However, I couldn’t isolate and report it properly without Nepoogle and its already workarounded NEPOMUK search methods. I wish you luck with all NEPOMUK bugs, and luck also on getting funding!

  3. I checked it yesterday in my trunk build – flawless. In the beginning the email indexing used up 50% CPU for a long while (~a night), now it only has a few flares once in a while. One complaint however: Nepomuk is sinking fast from the statistics list – Soon I will need to select the top 50 of programs, because top 30 does not suffice anymore!

    Thanks a lot!

  4. Great work, Sebastian! On behalf of all of us who can see a revolutionary future for an intelligent, semantic operating system beyond anything available today… I salute you! Thanks for all your work, listening to the community, and generally kicking ass. May you (and the semantic desktop) get the funding, so richly deserved!

  5. I got told it may be a Nepomuk bug, not a Digikam bug: I once started Digikam (with Nepomuk integration activated) without Nepomuk running and its tag list got swarmed with internal Nepomuk tags up to the point of being basically useless now. Unfortunately, while I do have backups of my photos and the Digikam database, they’re too new to return to an earlier state. My question is how to correct that, without days of manually deleting those tags.

  6. I have a laptop and a home server that holds all the files and shares them through nfsv4. So I’m not always connected to the server. Strigi indexes all the files fine but if I reboot without the server connected and then connect it later on I have to index everything again which renders it useless. Is this kind of behaviour known? I use KDE 4.7.2.

  7. Pingback: A Word (or Two) on Removable Storage Media Handling in Nepomuk | Trueg's Blog

  8. I’m using kubuntu 11.10 and I get lots of akonadi/nepomuk problems. Just this morning akonadi_nepomuk_email_feeder went through my 4GB of RAM and 4.7GB of swap, crashed and did it again … and again … (I’ve never run out of memory before). I also get nepomuk stub crashes frequently. The problem is that I can’t report any of these crashes because I can’t get backtraces with symbols. I’ve installed every debug package I can find. The crash report “install debug” packages doesn’t install anything. No one tells me how to solve the problem so I can report bugs. Is there something wrong with the kubuntu packaging? Meanwhile in my experience nepomuk/strigi/kmail are as unusable as they were two years ago.
