Taking a Break From Crash Fixing For Usability

Fixing bugs is actually more fun than I thought. It is rewarding and seeing the bug count go down feels great. But after a few weeks of mostly hunting crashes I needed to do something different for a change.

Thus, I went after the file indexer for optimizations. As discussed in the comments of this very blog, downloads and rapidly changing files in general have always been a problem – they are indexed way too often. This is a clear waste of resources. Out of one of those discussions came the very good idea of introducing a delay before a changed file is re-indexed. This is what I actually did. However, while doing that I found that it could be improved even further:

In addition to file modification events the file system can tell us when a file is closed after having been opened for writing. Thus, Nepomuk now uses that event instead. For downloads that means they will be indexed only a single time: when they are done.

So now Nepomuk will only re-index files that have actually been modified (modification event) and that have been closed (close after write event). And in addition the re-indexing is delayed for 5 seconds to ensure that we do not re-index rapidly changing files all the time.
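
The rule described above can be sketched in a few lines. This is only an illustration in Python, not Nepomuk's actual code: the class name `ReindexScheduler` and its methods are hypothetical, timestamps are passed in explicitly instead of using real timers, and cancelling a pending re-index when a new write arrives is an assumption on my part. On Linux the two events would correspond to inotify's `IN_MODIFY` and `IN_CLOSE_WRITE`, although the post does not name the mechanism.

```python
class ReindexScheduler:
    """Hypothetical sketch: re-index a file only after it was modified
    AND closed after writing, and only once a delay has passed."""

    def __init__(self, delay=5.0):
        self.delay = delay      # seconds to wait after the close event
        self._dirty = set()     # paths that saw a modification event
        self._due = {}          # path -> time at which to re-index

    def on_modified(self, path):
        # Remember the change, but do not index yet.
        self._dirty.add(path)
        # Assumption: a new write cancels a re-index that is still pending.
        self._due.pop(path, None)

    def on_closed_after_write(self, path, now):
        # A close without a preceding modification is ignored.
        if path not in self._dirty:
            return
        self._dirty.discard(path)
        self._due[path] = now + self.delay

    def pop_due(self, now):
        # Return (and forget) every path whose delay has expired.
        ready = sorted(p for p, due in self._due.items() if due <= now)
        for p in ready:
            del self._due[p]
        return ready
```

With this shape a download that receives thousands of modification events ends up in the queue exactly once, namely when the downloading application closes the file.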

All in all this is a great improvement for IO in Nepomuk. Thanks a lot for pointing it out and getting me on the right track. (I even backported this to KDE 4.7.3.)

In other news, my former GSoC student Smit Shah added another delay: the nepomukcontroller icon in your system tray will now stop flickering in and out of activity whenever a small file is indexed. Instead it will wait a short while to ensure that some longer indexing operation is in progress. Another nice usability improvement that he will also backport to 4.7.3.
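
The idea behind the tray icon change can be sketched like this. Again this is a hypothetical Python illustration and not the nepomukcontroller code: the class name `ActivityIndicator`, its methods, and the 2 second grace period are all placeholders, since the post only says "a short while".

```python
class ActivityIndicator:
    """Hypothetical sketch: show activity only if indexing has been
    running for at least `grace` seconds."""

    def __init__(self, grace=2.0):
        self.grace = grace          # placeholder value, not from the post
        self._busy_since = None     # start time of the current indexing run

    def indexing_started(self, now):
        # Keep the original start time if indexing is already running.
        if self._busy_since is None:
            self._busy_since = now

    def indexing_stopped(self):
        self._busy_since = None

    def is_visible(self, now):
        # Short runs end before the grace period and never show up.
        if self._busy_since is None:
            return False
        return now - self._busy_since >= self.grace
```

A small file that is indexed in half a second never makes the icon appear, while a long indexing run shows up once the grace period has passed.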

And now it is back to crash fixing for me. :)

In the meantime let me mention again that I am still looking for Nepomuk funding. So far no company has given a positive answer (funnily enough I have not received a negative one yet either). I am still interested in your proposals and as always your support (which has been amazing – thank you so much):

Click here to lend your support to: Nepomuk - The semantic desktop on KDE and make a donation at www.pledgie.com!
Click here to donate to Nepomuk via Moneybookers

31 thoughts on “Taking a Break From Crash Fixing For Usability”

  1. Nice post. I always wanted to write a few bug reports about the indexer being too aggressive.

    Since you got me started, here are a few things:
    – I have files going back to 1999 with a total size of around 10GB, not to mention 200GB of movies and pictures, so it takes a tremendous amount of time and resources to go through all of them at LOGIN time. Isn’t there a way to speed up this process? Do you parse all files all over again at every login, or do you look at the date/size to see if there are changes since the last time Nepomuk looked at them? That would be similar to rsync (first phase).
    – I have files which are not available all the time; they are on SSHFS mounts. As far as I can see they are lost from the index when the mount point is not there. Maybe this should not happen.
    – I see that Nepomuk is reniced. For Linux at least, an even better way is chrt combined with ionice. Usually when I start a compilation while I am working, I prefix the command with chrt -i 0 ionice -c 3. This uses the IDLE scheduling policy (set through chrt, the tool for real-time attributes) and the idle class for IO operations.

    Overall, I am very stubborn in continuing to use Nepomuk although it fails every now and then at finding files or indexing. In the past I used docsearcher, which never failed. The only disadvantage was that it needed a manual or timed re-index. But the re-index speed was FAST compared to Nepomuk: ~10 minutes instead of 1-2 hours.

    Last but not least… no words can express the appreciation I have for people like you (and your student:-) contributing to the success of GPL software.

    • – In the initial scan only mtimes are compared. If something is re-indexed it is a bug.
      – sshfs is actually not supported yet. That is true. It should be fairly simple to add though – basically all I need is support in Solid. Support for nfs and samba was already added in Solid for exactly this purpose.
      – Nepomuk already uses ionice. :)
      – Nepomuk is much slower when indexing because our database is not optimized for file indexing. It is a general purpose data store which is the whole point of Nepomuk: relate everything to everything. Thus, Nepomuk will most likely always be a bit slower than solutions that are optimized for one specific task.

  2. > And in addition the re-indexing is delayed for 5 seconds to ensure that we do not re-index rapidly changing files all the time.

    What will happen if I modify a file and shut down the PC in less than five seconds? Will Plasma use a “flush-like” command on Nepomuk? Will the file be re-indexed at the next startup? Or will the file not be re-indexed until the next routine check?

    • If you actually manage to do that (I really doubt it since from the click on “shutdown” to the actual shutdown Nepomuk has more than 5 seconds to finish the work) the file will be re-indexed on the next startup.

      • Tunnel vision :P People who use the power button to shut down are able to do so. I press the button and less than 5 seconds later KDE is closed.

        But I think we are talking about something that rarely happens so you should not use your time for it (now).

  3. Now this is what a focus group gives you. (OK, you’re not a group ;) that’s irrelevant anyway) I see a very nice trend emerging in KDE, namely what I’d call “focused development of pain areas with the help of full-time developers funded this or that way”. This is how Krita did it, this is how it should go. Not bashing around blaming KDE, the world and capitalism, while waiting for some lib / api / project change, but taking an active, I dare say aggressive approach and fixing things fast. Go Sebastian go! And KDE as well.

    P.S. It’s really nice to see the most problematic area of KDE getting fixed, and getting fixed fast. Too bad it didn’t happen earlier, say 4.3 or 4.4, but hey, there’s no “would” in real life and late is better than never.

  4. This is very much appreciated! All of my Nepomuk-related crashes at the moment happen when “compiling” LaTeX files, where several files are always generated, i.e. overwritten. I guess that this change solves my problem? Thanks a lot

    • Ironically, I think this “break from crash fixing for usability” will render a lot of crash bugs untriggerable, fixing them for all practical purposes, since a lot of the crashes seem to have something to do with files being reindexed too quickly.

  5. Nepomuk is being fixed at an astonishing rate. Thanks, Sebastian, not only for fixing Nepomuk, but for publishing fixes immediately. Nepomuk in KDE 4.7.3 will be 10x better than Nepomuk in KDE 4.7.0.

    I’ll try to stabilize my own financial situation, and when I do that, I’ll donate for sure.

  6. I just want to add my thanks to you for taking such a user focused and pragmatic approach of giving us massive value upfront (bug fixing, optimizations, the stuff users have been asking for) and asking for funding along side. Many would have asked for full funding *before* undertaking the work. To me this is a huge show of good faith, that no-matter-what benefits the whole community.

    In my mind, Nepomuk was already awesome from around 4.5 onwards and now with your recent efforts it’s just plain rocking. Everyone else: this is one dev / project *worth* supporting with whatever bucks you can spare. This is work that benefits us now (file search), in future innovation (the everything-is-connected semantic desktop) and even in future computing platforms (Plasma Active et al). Sebastian is one of the free desktop’s (yes, Gnome’s going to use Nepomuk too) treasures. Whatever you can do to help him out (cash, kind words, publicising his funding drive), please just take a minute to do it NOW! You’ll be glad you did.

  7. Just a question: will that fix the problem with KTorrent? Nepomuk always indexes a running torrent again and again until it is finished or Nepomuk crashes. During that time it uses 100% of my old single-core CPU :s

  8. Great! I’m looking forward to seeing all this on my system, but even more so: To everyone routinely having stuff switched on. Which would mean that more applications can start to use a nepomuk backend (KBibtex and Amarok come to mind), and the semantic desktop dream may finally come to blossom.

    Looking at the bug graph, I was slightly amazed to see that all your bug-fixing efforts ‘only’ negated the bug boom of the past half year. In a perfect world of stabilizing software, I’d expect the bug count to go up over an entire release cycle only if: *) the user base increases or *) new features are added. Which one was it here? It could be the first one, given KDE-PIM being released lately. On the other hand, PIM is only going to hit the bulk of the users now, with the upcoming releases of Kubuntu, Fedora, and openSUSE.

    So I hope the challenge of keeping bugs.kde.org functioning (i.e. a controllable amount of bugs) can be met in the foreseeable future, at least for nepomuk!

  9. Actually, after reading the post closely, I realised I haven’t quite understood it. :)

    Does Nepomuk use only the close after writing event? Or (modified) && (closed after writing)? Or simply closed after writing? Also, what’s the difference between modified and closed after writing, does “modified” mean that the file is still open, but something has been written into it?

    • Or, does this mean that for some files, that are always open (like torrents, where small chunks of data are constantly added), you will use the modified event, while for the files that are open, written into once and then closed (like a Word doc), you will use the closed after writing event? And did I get it right that both events are used for any file, just the closed-after-modified is done once in the end, and the modified is done constantly?

      • It means that Nepomuk will only index a file after getting the close event. But only if it got at least one modification or create event before that. All files are handled the same, downloads are only indexed once done. In the case of torrents once they are finished downloading or once you stop a torrent.

  10. Oh wow! I really like this news! However it’s kinda sad though that the general mindset for nepomuk (and everything related to it) is : “Oh nepomuk.. gotta turn that off”

    I certainly hope that your efforts can help turn that mindset from negative to positive, since the technology seems promising. Just one thing I kinda dislike now is the requirement of MySQL (for Akonadi, if I’m right). I would rather have something lighter like a NoSQL store (MongoDB) or SQLite. It would save tons of memory and be a lot faster as well.

    Good job so far!

  11. Would it be possible to exclude certain directories from getting indexed? E.g. I build many packages (where I can exclude the whole absolute path) but also heavily use Maven & SBT so I would like to generally exclude the “target” folder regardless of where it is on the disk?

    Besides that thanks a lot for your hard work :)

  12. This is really great news! I hope distributions realize this!

    And the funny thing is, while you did this as a break from bugfixing, I think this actually fixes a large number of bugs! Big thanks to you!

  13. Pingback: A Word (or Two) on Removable Storage Media Handling in Nepomuk | Trueg's Blog

  14. Pingback: KDE 4.7.3 – The (First) Nepomuk Stability Release | Trueg's Blog

  15. Pingback: Update on Bugs And Stuff | Trueg's Blog
