KDE 4.7.3 – The (First) Nepomuk Stability Release

Now that KDE 4.7.3 has been released, let me look back at the work that I put into it over the last few weeks.

  • I fixed four actual crashes in 4.7.3. At first glance this might not sound like much, but these four crash fixes account for 38 duplicate reports.
  • I finally managed to close the memory leak in the file watcher service.
  • I significantly improved the file indexer:
    • Exclusion filters are now also correctly taken into account for folders.
    • .xsession-errors is now always excluded from indexing.
    • Rapidly changing files are only indexed once closed. This results in a lot less IO.
    • As a consequence of the previous change, torrent downloads are only indexed once they have finished.
    • Files that are written over and over (like IRC logs for example) are only re-indexed once every five seconds.
    • Nepomuk now always extracts the plain text from PDF files via pdftotext. This is a hack to make sure that we can at least search all PDFs by content. The next step will be to extract meta-data like title and author via poppler. (This is required since the PDF analyzer in libstreamanalyzer/Strigi is not powerful enough yet.)
    • Symbolic links now have the correct mime type which means better search results.
    • In case the indexer gets stuck (runs forever on one file) it is killed after a period of time.
    • Jos van den Oever fixed a bunch of issues in libstreamanalyzer (Strigi), which results in fewer crashes and fewer cases of endless PDF indexing. Stay tuned for Strigi 0.7.7.
  • With Soprano 2.7.3 Nepomuk will now restart the storage if Virtuoso goes down due to a crash or a third-party kill.
  • A running Virtuoso instance which was not shut down due to a crashed or killed Nepomuk will now gracefully be shut down before starting a new instance. This solves some startup issues.
  • A small query performance improvement from removing a pointless UNION.
  • Smit Shah backported his patch which gets rid of the flickering Nepomuk indexer icon in the system tray. It now only becomes active if the indexer has been working for a certain period of time.
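
The two indexing rules from the list above (index a file only once it has been closed, and re-index a repeatedly written file at most once every five seconds) can be sketched roughly as follows. This is an illustration in Python, not Nepomuk's actual code: the class `ReindexThrottle` and its method names are hypothetical, and the sketch assumes the file watcher reports one "closed after write" event per file.

```python
class ReindexThrottle:
    """Hypothetical sketch: decide when a changed file may be (re-)indexed."""

    def __init__(self, interval=5.0):
        self.interval = interval      # minimum seconds between two indexings of one file
        self._last_indexed = {}       # path -> time of the last indexing
        self._pending = set()         # paths closed after a write, not yet indexed

    def on_file_closed(self, path):
        """Remember a 'closed after write' event. Plain modify events are
        deliberately not handled, so a file that is still open is never queued."""
        self._pending.add(path)

    def take_due(self, now):
        """Return (sorted) the pending paths that may be indexed at time `now`
        and mark them as indexed. Paths indexed less than `interval` seconds
        ago stay pending until a later call."""
        due = sorted(
            path for path in self._pending
            if now - self._last_indexed.get(path, float("-inf")) >= self.interval
        )
        for path in due:
            self._pending.discard(path)
            self._last_indexed[path] = now
        return due
```

A caller would feed `on_file_closed` from its file watcher and poll `take_due` with a monotonic clock. Keeping throttled paths pending, rather than dropping them, is a design choice of this sketch so that the last write to a busy log file is still picked up eventually.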
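
Killing an indexer that runs forever on one file, as described in the list above, comes down to running the extraction in a separate process with a time limit. Here is a minimal sketch in Python, again not the real implementation: `run_with_timeout` is a hypothetical helper, and the actual time limit used by Nepomuk is not stated here.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` as a child process and return its standard output as text.

    If the child is still running after `timeout_s` seconds, subprocess.run
    kills it and raises TimeoutExpired; in that case None is returned so the
    caller can skip the file and move on.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

Assuming the `pdftotext` command line tool is installed, plain text could be pulled from a PDF with `run_with_timeout(["pdftotext", "paper.pdf", "-"], 60)`, where `paper.pdf` is a placeholder, the `-` tells pdftotext to write to standard output, and the 60 seconds are an arbitrary value chosen for the example.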

All in all 15 bugs are marked with “FIXED-IN: 4.7.3”. This does not include the fixes and improvements I made which did not have matching reports.

Today the next round of Nepomuk stability and performance begins. If all goes well KDE 4.7.4 should be rock-solid when it comes to Nepomuk. Thanks a lot for your continued support. I am still hopeful that I will find a more permanent solution soon:

Click here to lend your support to: Nepomuk - The semantic desktop on KDE and make a donation at www.pledgie.com !
Click here to donate to Nepomuk via Moneybookers

30 thoughts on “KDE 4.7.3 – The (First) Nepomuk Stability Release”

  1. Superb!

    Good things are being said about all your continued bug fixing/enhancement work in the KDE related forums I visit, and well deserved they are.

    A lot of other development blogs could take a leaf out of your book and do regular reports on what work is being carried out.

    Thank you so much for all this work.

  2. Hello Sebastian!

    >Rapidly changing files are only indexed once closed. This results in a lot less IO.

    >Files that are written over and over (like IRC logs for example) are only re-indexed once every five seconds.

    Didn’t you say that Nepomuk only reindexes a file *after* it gets the modified event AND the closed event? The first quote confirms that, while the second quote seems to contradict it. An IRC log is open while it changes, so according to what you were saying, it should not be reindexed at all? How does Nepomuk differentiate a torrent from an IRC log? (Or maybe IRC clients open, write into and close the log file every 3 seconds? That’d be crazy to say the least.)

    Other than that, I’d say that your work definitely deserves the funding it gets, contrary to some GSoC projects that get 5 000 dollars to write 10 plasmoids in QML (a week-long intern task at a real software company at best, and not the two months of a qualified dev’s work that 5 000 dollars represent.)

    • Not that I’m comparing newbies with Trueg, but GSoC has its own place. Just like, as a KDE contributor, I have no issue helping out a student for 30 mins rather than fixing a bug, it’s absolutely OK for a newbie to produce less output than a seasoned dev as long as (s)he puts in effort. Please do not underrate people’s efforts (and do not demotivate new contributors in the process; not everyone can be a dfaure).

  3. I don’t think a difference is made between irc logs, torrents, etc.

    What is meant is that when a file’s (inotify) modify/close events are received, Nepomuk waits another 5 seconds just to be sure. Just as it should be. Great work!

    Can’t wait to start using Nepomuk in earnest, for all my scientific papers in PDF format. I believe there are some patches out there for better PDF indexing (including metadata); patches for Strigi, that is, I think.

  4. Good work!
    These fixes are very important and in my opinion they should be better advertised. The release announcement or at least the changelog should list them.

  5. In two months there have been amazing strides in Nepomuk. But I’m waiting anxiously for the next two months: optimizations, and… you getting a stable income.

    It’s sad that our Nepomuk happiness had to happen because of your tragedy, Sebastian. But at least you’ll have € 10,000 to support you. Thank you very much for all your effort. Hope you find a job soon.

      • I also appreciate the efforts and the results. Really.

        But a simple configuration switch would have been enough. IIRC KDE 4 was out in 2008 and only now are these issues being fixed (I know, new tech and all that stuff). I suppose that meant losing many users and testers that were simply fed up with all the promises.

        With the switch I’ve mentioned and a port of kicker, KDE 4 would have been a simple evolution from KDE3. All this semantic stuff, and plasma could have been developed just the same in parallel…

  6. Finally someone is polishing that semantic bloat!! Don’t get me wrong, I love KDE, but KDE should provide a lighter experience (e.g. you don’t even need to know what Akonadi is, you can use Thunderbird for emails and no kdepim, but this fucker will eat 100MB of RAM all the time just because “display calendar events” is enabled by default). Some things need to be rethought (I was shocked a couple of days ago: Amarok is not KDE per se, but is very KDE’ish software, and playing one mp3 eats 300MB of RAM, that’s insane!!)

  7. First of all, thanks a lot for your work.

    Maybe I’m a little bit OT, but I’m too curious:
    1) Is it technically possible to use “pieces” of Tracker instead of libstreamanalyzer?
    2) Some time ago, I read “somewhere” about the use of Nepomuk in GNOME: is there something official or are they just “rumors”?

    • No. They’ve forked the Shared Desktop Ontologies and made incompatible changes, plus there is a lot of tracker specific stuff in their extractors. If they’re willing to make a nice library, and fix the ontologies, it should be possible.

  8. Hi! I have one question about the p2p functionality of nepomuk.

    I read that Nepomuk implements SON* functionality to share semantic data on a GridVine based network. Unfortunately I haven’t found much more, all the documentation seems to stop around 2009, when the GridVine project was over.

    Do you have any more information? I’d be curious to know where are you at with that. Does it work already? Is it working well?

      • I’m really sorry to hear that. Do you know if there is anyone specifically I can talk to about that? Since I’m working on something potentially similar, I would like to know what exactly the problems involved were, at least to avoid repeating them.

  9. A little off topic: I saw this video and thought… “Hey, if we can communicate that Nepomuk is like the failed WinFS, only better, working, and FREE, Nepomuk will be more understood”.

    If you look at WinFS docs, you’ll quickly notice the similarities between WinFS and Nepomuk. Microsoft talks about ontologies, talks about metadata, but, unlike Nepomuk, WinFS is at the filesystem level. A video.

    If we can do all the (file navigation: images, music, etc.) things happening there with Nepomuk, we would claim a small but important victory over Microsoft.

  10. Just one question.

    Is it normal that Nepomuk starts, fully indexes my home folder and goes “idle”, and then, after I log out of my session and log in again, starts indexing my home folder all over again? And that after it finishes and I log out and log in once more, the same thing happens again?

    If you ask me, there’s no reason for that, because there are no changes in my home folder.

  11. Hi Sebastian,

    great to see Nepomuk becoming more stable.

    When I read “Files that are written over and over (like IRC logs for example) are only re-indexed once every five seconds.” I wondered whether this needs further tweaking. Why scan a log every five seconds? I would suggest every 60 seconds or so.

    Or how about making this configurable? How about some Nepomuk “profiles” like “Aggressive/Immediately up-to-date”, “Normal” and “Save resources”?

    Keep up the good work!
