Now that KDE 4.7.3 has been released, let me look back on the work that I put into it over the last weeks.
- I fixed four actual crashes in 4.7.3. At first glance this might not sound like much, but these four crash fixes account for 38 duplicate reports.
- I finally managed to close the memory leak in the file watcher service.
- I significantly improved the file indexer:
  - Exclusion filters are now also correctly taken into account for folders.
  - .xsession-errors is now always excluded from indexing.
  - Rapidly changing files are only indexed once closed. This results in a lot less IO.
  - The previous change also means that torrent downloads are only indexed after they have finished.
  - Files that are written over and over (like IRC logs, for example) are only re-indexed once every five seconds.
  - Nepomuk now always extracts the plain text from PDF files via pdftotext. This is a hack to make sure that we can at least search all PDFs by content. The next step will be to extract meta-data like title and author via poppler. (This is required since the PDF analyzer in libstreamanalyzer/Strigi is not powerful enough yet.)
  - Symbolic links now have the correct mime type, which means better search results.
  - In case the indexer gets stuck (runs forever on one file) it is killed after a period of time.
- Jos van den Oever fixed a bunch of issues in libstreamanalyzer (Strigi), which results in fewer crashes and fewer cases of endless PDF indexing. Stay tuned for Strigi 0.7.7.
- With Soprano 2.7.3 Nepomuk will now restart the storage if Virtuoso goes down due to a crash or a third-party kill.
- A running Virtuoso instance which was not shut down due to a crashed or killed Nepomuk will now gracefully be shut down before starting a new instance. This solves some startup issues.
- A small query performance improvement from removing a pointless UNION.
- Smit Shah backported his patch which gets rid of the flickering Nepomuk indexer icon in the system tray. It now only becomes active if the indexer has been working for a certain period of time.
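The watchdog for a stuck indexer mentioned in the list above could look roughly like the following. This is only an illustrative Python sketch, not the actual Nepomuk code: the function name `index_with_timeout`, the command being run, and the timeout values are all assumptions made for the example.

```python
import subprocess
import sys


def index_with_timeout(command, timeout_seconds):
    """Run a (hypothetical) per-file indexer command; kill it if it hangs.

    Returns True if the command finished in time with exit code 0.
    """
    try:
        # subprocess.run kills the child process when the timeout expires
        # and then raises TimeoutExpired.
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for an indexer that never finishes on one file.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print("finished in time:", index_with_timeout(stuck, 1))
```

The point of running the indexer per file in a separate process is that a hang or crash on one file can be contained without taking the whole service down.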
All in all 15 bugs are marked with “FIXED-IN: 4.7.3”. This does not include the fixes and improvements I made which did not have matching reports.
Today the next round of Nepomuk stability and performance work begins. If all goes well, KDE 4.7.4 should be rock-solid when it comes to Nepomuk. Thanks a lot for your continued support. I am still hopeful that I will find a more permanent solution soon.
Superb!
Good things are being said about all your continued bug fixing/enhancement work in the KDE related forums I visit, and well deserved they are.
A lot of other development blogs could take a leaf out of your book and do regular reports on what work is being carried out.
Thank you so much for all this work.
Thank you for your work! Nepomuk gets better and better with every release!
Hello Sebastian!
>Rapidly changing files are only indexed once closed. This results in a lot less IO.
>Files that are written over and over (like IRC logs for example) are only re-indexed once every five seconds.
Didn’t you say that Nepomuk only reindexes a file *after* it gets the modified event AND the closed event? The first quote confirms that, while the second quote seems to contradict it. An IRC log is open while it changes, so according to what you were saying, it should not be reindexed at all? How does Nepomuk differentiate a torrent from an IRC log? (Or maybe IRC clients open, write into and close the log file every 3 seconds? That’d be crazy to say the least.)
Other than that, I’d say that your work definitely deserves the funding it gets, contrary to some GSoC projects that get 5,000 dollars to write 10 plasmoids in QML (a week-long intern task at a real software company at best, and not the two months of a qualified dev’s work which 5,000 dollars represent).
Simple: sometimes an IRC log is opened, closed, and opened again more than once in 5 seconds.
Not that I’m comparing newbies with Trueg, but GSoC has its own place. Just as, being a KDE contributor, I have no issue helping out a student for 30 minutes rather than fixing a bug, it’s absolutely OK for a newbie to produce less output than a seasoned dev as long as (s)he puts in effort. Please do not underrate people’s efforts (and do not demotivate new contributors in the process; not everyone can be a dfaure).
I don’t think a difference is made between irc logs, torrents, etc.
What is meant is that when a file’s (inotify) modify/close events are received, Nepomuk waits another 5 seconds just to be sure. Just as it should be. Great work!
Can’t wait to start using Nepomuk in earnest for all my scientific papers in PDF format. I believe there are some patches out there for better PDF indexing (including metadata); patches for Strigi, that is, I think.
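One way to read the behaviour discussed in this thread is as a per-file rate limit applied to close events. The Python sketch below illustrates that reading only; it is not Nepomuk's implementation, and the class name `ReindexThrottle`, its method, and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class ReindexThrottle:
    """Decide whether a file that was just closed should be re-indexed now.

    Assumption: a file is only considered when it is closed after writing,
    and each path is re-indexed at most once per `min_interval` seconds.
    """

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_indexed = {}  # path -> time of the last re-index

    def on_file_closed(self, path):
        """Return True if `path` should be re-indexed now."""
        now = self.clock()
        last = self._last_indexed.get(path)
        if last is not None and now - last < self.min_interval:
            return False  # indexed too recently, skip this close event
        self._last_indexed[path] = now
        return True
```

Under this reading a torrent and an IRC log need no special treatment: a download is closed once when it finishes, while a log that is opened and closed several times within five seconds only triggers one re-index. A real service would also have to schedule a deferred re-index for skipped events so that the final write is not missed; the sketch leaves that out.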
I fear the strigi patches still take some time. That is why I made PDF a special case in Nepomuk now. Full text is already extracted. 4.7.4 will include meta-data extraction, too.
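For illustration, extracting the full text of a PDF with the pdftotext command-line tool might look like the Python sketch below. It is not the code used in Nepomuk; the function names and the default timeout are made up for the example, and it assumes pdftotext is installed and on the PATH.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the argument list for writing a PDF's plain text to stdout."""
    # "-" as the output file makes pdftotext write the text to stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_pdf_text(pdf_path, timeout_seconds=60):
    """Return the plain text of a PDF, or None if extraction fails."""
    try:
        result = subprocess.run(
            pdftotext_command(pdf_path),
            capture_output=True,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # pdftotext is missing, not executable, or got stuck
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")
```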
Good work!
These fixes are very important and in my opinion they should be better advertised. The release announcement or at least the changelog should list them.
In two months there have been amazing strides in Nepomuk. But I’m waiting anxiously for the next two months: optimizations, and… you getting a stable income.
It’s sad that our Nepomuk happiness had to happen because of your tragedy, Sebastian. But at least you’ll have € 10,000 to support you. Thank you very much for all your effort. Hope you find a job soon.
I appreciate the strides, but again: why was it released AND TURNED ON BY DEFAULT if it needed this much fixing?
I also appreciate the efforts and the results. Really.
But a simple configuration switch would have been enough. IIRC KDE 4 was out in 2008 and only now are these issues being fixed (I know, new tech and all that stuff). I suppose that meant losing many users and testers who were simply fed up with all the promises.
With the switch I’ve mentioned and a port of kicker, KDE 4 would have been a simple evolution from KDE3. All this semantic stuff, and plasma could have been developed just the same in parallel…
Finally someone is polishing that semantic bloat!! Don’t get me wrong, I love KDE, but KDE should provide a lighter experience (e.g. you don’t even need to know what Akonadi is, you can use Thunderbird for emails and no kdepim, but this fucker will eat 100MB of RAM all the time just because “display calendar events” is enabled by default). Some things need to be rethought (I was shocked a couple of days ago: Amarok is not KDE per se, but is very KDE-ish software, and playing one mp3 eats 300MB of RAM, that’s insane!!)
Try Bangarang, and guess what powers its library ;)
First of all, thanks a lot for your work.
Maybe I’m a little bit OT, but I’m too curious:
1) Is it technically possible to use “pieces” of Tracker instead of libstreamanalyzer?
2) Some time ago, I read “somewhere” about the use of Nepomuk in GNOME: is there anything official, or are they just “rumors”?
No. They’ve forked the Shared Desktop Ontologies and made incompatible changes, plus there is a lot of tracker specific stuff in their extractors. If they’re willing to make a nice library, and fix the ontologies, it should be possible.
Vishesh, thanks for your reply! :)
Do you know why they forked the SDO?
Excellent work :) Upgrading and about to retry KDE.
Hi! I have one question about the p2p functionality of nepomuk.
I read that Nepomuk implements SON* functionality to share semantic data on a GridVine based network. Unfortunately I haven’t found much more, all the documentation seems to stop around 2009, when the GridVine project was over.
Do you have any more information? I’d be curious to know where you are at with that. Does it work already? Is it working well?
GridVine was a part of the Nepomuk research project. Its implementation was done in Java and it never reached a usable state.
I’m really sorry to hear that. Do you know if there is anyone specifically I can talk to about that? Since I’m working on something potentially similar, I would like to know what exactly the problems involved were, at least to avoid repeating them.
A little off topic: I saw this video and thought… “Hey, if we can communicate that Nepomuk is like the failed WinFS, only better, working, and FREE, Nepomuk will be more understood”.
If you look at WinFS docs, you’ll quickly notice the similarities between WinFS and Nepomuk. Microsoft talks about ontologies, talks about metadata, but, unlike Nepomuk, WinFS is at the filesystem level. A video.
If we can do all the (file navigation: images, music, etc.) things happening there with Nepomuk, we would claim a small but important victory over Microsoft.
All these videos are Flash mockups of Longhorn, but they illustrate what is possible with Nepomuk (because of the similarities between Nepomuk and WinFS)
Just one question.
Is it normal that Nepomuk starts, fully indexes my home folder and goes “idle”, and then, after I log out of my session and log in again, starts to index my home folder all over again? And that once it finishes, logging out and in triggers the same thing again?
If you ask me, there’s no reason for that, because there are no changes in my home folder.
Currently there is no way to detect if there were any changes in between sessions other than scanning each folder for changed mtimes, i.e. looking at each file. Hopefully the kernel will soon provide something better. There was a patch by SuSE at some point which would propagate mtimes up in the folder hierarchy, but it never got into the kernel, I think.
So, if a folder’s mtime hasn’t changed, that folder shouldn’t be entered for reindexing, right?
With the kernel patch – right.
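The scan described above can be sketched in Python as follows. This is an illustration only, not Nepomuk's code: the function name `find_changed_files` and the `known_mtimes` dictionary (standing in for whatever the indexer stored during the previous session) are assumptions.

```python
import os


def find_changed_files(root, known_mtimes):
    """Walk `root` and return paths whose mtime differs from `known_mtimes`.

    `known_mtimes` maps path -> mtime recorded during the previous session
    (a hypothetical store; a real indexer would keep this in its database).
    Files that are not in the mapping at all count as changed.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable, skip it
            if known_mtimes.get(path) != mtime:
                changed.append(path)
    return changed
```

Every file has to be looked at because a folder's own mtime only changes when entries are added, removed or renamed in it, not when an existing file inside it is rewritten in place. Without mtimes being propagated up the hierarchy, an unchanged folder mtime therefore does not prove that nothing below it changed.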
Do you know if the patch has been accepted into the upstream kernel (kernel.org)?
Hi Sebastian,
great to see Nepomuk becoming more stable.
When I read “Files that are written over and over (like IRC logs for example) are only re-indexed once every five seconds.” I wondered whether this needs further tweaking. Why scan a log every five seconds? I would suggest every 60 seconds or so.
Or how about making this configurable? How about some Nepomuk “profiles” like “Aggressive/Immediately up-to-date”, “Normal” and “Save resources”?
Keep up the good work!
Thanks for your hard work!