Let me open with a few stats just to brag:
- Top bug killer on the last commit digest
- Number of Nepomuk crash reports now below 100
- Overall number of Nepomuk bugs down to 163 (this is actually not much, have a look at the related statistics)
- I closed some serious bugs this week (details below)
If you want to track the progress you can use the following links to check from time to time:
- Open Nepomuk crash reports
- Open Nepomuk bug reports excluding crashes
- Open Nepomuk wishes
- Closed Nepomuk bugs since this whole thing started
Finally I want to present two fixes I did this last week just to show what kind of work needs to be done in order to fix problems in Nepomuk:
1. Bug 281136 – Nepomuk queries containing unicode characters fail
The problem presented itself as follows: whenever the user would execute a query containing extended characters such as german umlauts, french accents, or for example any russian character the query would not return any results.
After some testing I realized that the queries simply failed when being delivered to Virtuoso because of Nepomuk’s automatic search excerpt extraction. It turned out that Virtuoso’s bif:search_excerpt method cannot handle wide characters which is exactly what it got. So I turned to the Virtuoso team for help and got a workaround which essentially means that we convert the wide characters to UTF8. However, this results in stripped search excerpts so the story does not end yet – I am waiting for a better solution from the Virtuoso guys.
2. Nepomuk deletes annotations of files on removable media
This was a very interesting bug – to me at least. The problem was that Nepomuk would delete the manually added information like tags, ratings, relations to other files, and so on from files that are stored on an external hard disk.
Now to understand this problem better I have to explain a bit how Nepomuk handles external media: Nepomuk uses Soprano’s Api to access RDF data. This is done through a whole stack of what we call models, each of which performs some operations on the data that passes through. One of these models handles external media. It converts each URL of a file from an external media into a new URL which is independent of the media’s mount point.
Imagine for example that the external hard disk with UUID “foobar” is mounted at /media/hd. Then a URL like file:///media/hd/myfile.txt is converted to filex://foobar/myfile.txt. That way Nepomuk will find the file again even when the disk is mounted at another path. This conversion happens transparently for all clients, meaning they only work with the local file:/ URLs. A nice side-effect is that when the disk is not mounted any code that performs clean-up like removing data for non-existing files will ignore those entries since they have no relation to the mount point.
On to the bug. Thankfully Ignacio Serantes realized that he only lost the information from files that had spaces in their names. That already pointed to a URL encoding problem. When we convert URIs from and to strings we use percent encoding. If all goes well this works fine. However, if we have a bug we might end up percent-encoding the percent-encoded URI. This was the case in the removable media handling of Nepomuk. When converting the internal filex:/ URL back to its file:// counterpart the percent encoding got borked. As a result the clean-up code would check for the existence of the wrong local URL and remove the related data. The fix involved some trickery with QUrl and KUrl and reminded me that unit tests involving URIs should always check for possible percent-encoding problems.
Well, the hunt for bugs is going on. In the meantime I am also still hunting for Nepomuk funding.