Nepomuk – What Comes Next


After a very generous start to my fundraiser (thank you so much for your support) it is time I get into more detail about what you are actually supporting. Originally I wanted to do that by updating nepomuk.kde.org. I will still do that but it will take a little more time than anticipated. Thus, I will simply start with another blog post.

Well then, apart from cleaning out the bug database at bugs.kde.org (this will be a hard one), continuing to support app developers with Nepomuk integration, and maintaining the whole Nepomuk stack, Soprano, the Shared-desktop-ontologies, and some smaller Nepomuk-based applications, there are some very specific tasks I want to work on in the near future (in this case the near future roughly spans the next half year).

Semantic Saving and Loading of Documents

Pretty much forever we have managed documents in a very nerdy manner: the way they are stored on the local file system. We navigate physical folders, create complex hierarchies, get lost in them, recreate parts of them, never find our files again, and still keep on doing it.

The vision I have is that we do not think about folders at all any more, since for me they are a restriction of the 3-dimensional world that has no place in a computer. A document in the real world can only be archived in a single folder. On the computer there is no such restriction. We can do whatever we want. Thus, the idea is to organize documents closer to the way our brain organizes information: based on context and familiar topics and relations.

This vision, however, is not feasible in the near future. There is simply too much legacy data and too many applications relying on the classical folder structure. Thus, the idea would be a hybrid approach which combines classical local folders with advanced semantic relations and meta-data. This is a development which I already started with fantastic input from the community.
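To make the hybrid idea a bit more concrete, here is a minimal Python sketch. It is only an illustration and not Nepomuk code: the `HybridIndex` class, its methods, and the example paths are all made up. Each file keeps its single physical path but can be related to any number of contexts.

```python
from collections import defaultdict


class HybridIndex:
    """Hypothetical index: one physical path per file, many contexts."""

    def __init__(self):
        # context name -> set of file paths related to that context
        self._contexts = defaultdict(set)

    def relate(self, path, *contexts):
        """Relate a file to one or more contexts without moving it."""
        for context in contexts:
            self._contexts[context].add(path)

    def files_in(self, *contexts):
        """Return the files related to ALL given contexts, sorted by path."""
        if not contexts:
            return []
        sets = [self._contexts.get(context, set()) for context in contexts]
        return sorted(set.intersection(*sets))


index = HybridIndex()
index.relate("/home/me/music/theme.ogg", "music collection", "project X")
index.relate("/home/me/projects/x/plan.odt", "project X")

print(index.files_in("project X"))
print(index.files_in("music collection", "project X"))
```

The file stays in its classical folder, so legacy applications keep working, while the lookup by context no longer depends on where the file was saved.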

The next steps include finishing the prototype and creating its counterpart, the file open dialog. This will be a very tough one, for which I will ask for your support again since that worked out so well with the save dialog.

Excerpts

A typical use case is bookmarking pages or copying specific parts of a document into some collage of snippets. However, as always, we lose the relation to the source. This is where Nepomuk will shine: instead of copying the part of the document, we simply define the excerpt as a resource in Nepomuk which we can annotate like any other resource. The excerpt is the portion the user is interested in: it can be a marked section, a specific position in the document ranging up to its end, or part of an image. This means that we can relate it to topics, people, projects, files, other snippets, and web pages, comment on it, and so on, all the while keeping the relation to the original document.

This allows for nice things like automatic collages (think of selecting all snippets which mention a certain topic or relate to a certain project and were created before some date and merging them all into one view), simpler quoting of things you read before (since the relation to the original document is intact you have easy access to the details required for the quote – very interesting for academic workers), and a simple listing of all interesting quotes from documents by some person you like (an example query).
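The automatic collage could be sketched roughly as follows. This is a plain Python illustration under my own assumptions, not the Nepomuk data model: the `Excerpt` fields, the `collage` function, and the example URIs are hypothetical. The point is only that an excerpt stores a reference to its source instead of a copy, and can be selected by its annotations.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Excerpt:
    source: str               # URI of the original document (kept, not copied)
    start: int                # first character offset of the excerpt
    end: Optional[int]        # None means "up to the end of the document"
    created: date
    topics: set = field(default_factory=set)
    projects: set = field(default_factory=set)


def collage(excerpts, topic=None, project=None, before=None):
    """Select excerpts matching every given criterion, oldest first."""
    selected = []
    for excerpt in excerpts:
        if topic is not None and topic not in excerpt.topics:
            continue
        if project is not None and project not in excerpt.projects:
            continue
        if before is not None and not excerpt.created < before:
            continue
        selected.append(excerpt)
    return sorted(selected, key=lambda excerpt: excerpt.created)


notes = [
    Excerpt("file:///home/me/papers/rdf.pdf", 120, 480, date(2011, 3, 1),
            topics={"RDF"}, projects={"thesis"}),
    Excerpt("http://example.org/article", 0, None, date(2011, 8, 15),
            topics={"RDF"}),
    Excerpt("file:///home/me/papers/kde.pdf", 10, 90, date(2011, 5, 2),
            topics={"KDE"}, projects={"thesis"}),
]

for excerpt in collage(notes, topic="RDF", before=date(2011, 6, 1)):
    print(excerpt.source, excerpt.start, excerpt.end)
```

Because every excerpt still points at its source, the details needed for a proper quote can be looked up from the original document at any time.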

Sharing Nepomuk Data – Step 1

Whenever we create information we want to share it with others. Vishesh Handa already started a very ambitious project to support several types of data sharing through a plugin system. What I want to do first is much less but nonetheless interesting: sharing bits of Nepomuk data manually.

This means that you define the information you want to share and then simply export it into a file which you can then send to someone else. They in turn can import this information into their own Nepomuk system. For starters there will be no tracking of the origin of the data or anything like keeping two ratings at the same time. That is for later.

This is a very simple first step to sharing which should be fairly easy to implement, the GUI being the only actually hard part. The Data Management Service already takes care of export and import for us.
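As a rough illustration of this manual export and import, here is a minimal Python sketch. It does not use the Data Management Service. The in-memory set of (subject, property, value) statements, the JSON file format, and both function names are assumptions made for this example only.

```python
import json
import os
import tempfile


def export_statements(store, resources, path):
    """Write every statement whose subject is in `resources` to a JSON file."""
    selected = sorted(s for s in store if s[0] in resources)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(selected, handle, indent=2)
    return len(selected)


def import_statements(store, path):
    """Merge the statements from the file into `store`; return how many were new."""
    with open(path, encoding="utf-8") as handle:
        incoming = {tuple(statement) for statement in json.load(handle)}
    added = incoming - store
    store |= incoming  # in-place update, no origin tracking
    return len(added)


# Sender: three statements, but only those about a.pdf are shared.
mine = {
    ("file:///a.pdf", "nao:hasTag", "semantic desktop"),
    ("file:///a.pdf", "nao:numericRating", "8"),
    ("file:///b.odt", "nao:hasTag", "private"),
}
# Receiver: starts with unrelated data of their own.
theirs = {("file:///c.txt", "nao:hasTag", "todo")}

with tempfile.TemporaryDirectory() as tmp:
    shared_file = os.path.join(tmp, "shared.json")
    exported = export_statements(mine, {"file:///a.pdf"}, shared_file)
    imported = import_statements(theirs, shared_file)

print(exported, imported, len(theirs))
```

In line with the first step described above, the sketch records nothing about where a statement came from and does nothing to reconcile conflicting values.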

Once this works, adding the same to email sending or Telepathy communications will be very simple. In fact the Telepathy-KDE guys (namely Daniele E. Domenichell aka Dr. Danz) have been interested in that for a long time. (I wish I were with you guys at Cambridge now!)

To this end I will probably finally get to work on Ginkgo, the generic Nepomuk resource management tool developed by Mandriva’s Stephane Lauriere.

For App Developers: Resource Watcher

For the longest time the only way of getting notified of changes in the Nepomuk database was the very generic Soprano signals Model::statementsAdded and Model::statementsRemoved. Checking for specific changes meant checking each statement which was added or removed, or doing a pull each time one of those signals was emitted. An ugly and not very fast solution.

With the introduction of the Data Management Service this will finally change. We already have a draft API for the Nepomuk::ResourceWatcher which allows opting in for change notifications of different kinds: changes on specific resources, new resources of specific types, changes to specific properties.

The initial API is there and partially integrated with the Data Management Service already. However, I would like to add some more nice features like only watching for non-indexed data or excluding changes done by a specific application (useful for an app which makes changes itself and does not want to be bothered with that data). Also, integration into the DMS needs to be finished as not all features exposed in the API are supported yet.
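The opt-in filtering idea can be sketched in a few lines of Python. To be clear, this is not the Nepomuk::ResourceWatcher API: the `Change`, `Watcher`, and `ChangeHub` names and all their parameters are hypothetical and only illustrate filtering by resource, type, and property, plus excluding changes made by one application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    resource: str   # URI of the changed resource
    rdf_type: str   # type of that resource
    prop: str       # property that was changed
    app: str        # application that made the change


class Watcher:
    """Hypothetical watcher: empty filters mean 'do not restrict'."""

    def __init__(self, callback, resources=(), types=(), properties=(),
                 exclude_app=None):
        self.callback = callback
        self.resources = set(resources)
        self.types = set(types)
        self.properties = set(properties)
        self.exclude_app = exclude_app

    def matches(self, change):
        if self.exclude_app is not None and change.app == self.exclude_app:
            return False
        if self.resources and change.resource not in self.resources:
            return False
        if self.types and change.rdf_type not in self.types:
            return False
        if self.properties and change.prop not in self.properties:
            return False
        return True


class ChangeHub:
    """Delivers each change only to the watchers that opted in for it."""

    def __init__(self):
        self._watchers = []

    def add(self, watcher):
        self._watchers.append(watcher)

    def publish(self, change):
        for watcher in self._watchers:
            if watcher.matches(change):
                watcher.callback(change)


seen = []
hub = ChangeHub()
hub.add(Watcher(seen.append, types={"nfo:Document"},
                properties={"nao:hasTag"}, exclude_app="my-editor"))

hub.publish(Change("file:///home/me/a.odt", "nfo:Document", "nao:hasTag", "dolphin"))
hub.publish(Change("file:///home/me/a.odt", "nfo:Document", "nao:hasTag", "my-editor"))
hub.publish(Change("nepomuk:/res/contact1", "nco:Contact", "nao:hasTag", "kaddressbook"))

print(len(seen))
```

Compared with inspecting every added or removed statement, the filtering happens once on the publishing side, so a client only ever sees the changes it asked for.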

The technical aspect: KDE frameworks

With KDE 5.0 kdelibs and kde-runtime will be split into smaller parts to make it simpler for application developers to depend on our powerful technologies. This also means a split for Nepomuk. I already started this split but a lot more work needs to be done to make Nepomuk an independent part in the KDE frameworks family.

Part of this also involves getting rid of deprecated legacy API and improving API where we were previously restricted by binary-compatibility issues.

So this is it for now. Reading over it again I get the feeling that it might be too much already – especially since I am fairly certain that new things will pop up all over the place. Nonetheless I will try to stay the course for once. ;)

Thanks again for your support.


49 thoughts on “Nepomuk – What Comes Next”

  1. Hi Sebastian,
    thanks for your terrific work on Nepomuk! Seeing the skeptics and critics, perhaps you should extend this list with an extensive profiling and analysis phase? It’s sad to see so many people disable Nepomuk completely, and surely there must be a way to improve this. Maybe it’s just packaging issues on the distributions, so checking the situation in OpenSUSE, Mandriva and Kubuntu would be advisable (in a pragmatic not academic sense, what is the real world status quo?). Or is it Virtuoso that is the problem? Or Strigi? Maybe if you blog about the difficulties other people would step up and help as well (performance interests _a lot_ of people).
    Now this list already looks like enough work for a few months to come, so the amount of money to achieve with the pledge is surely stretched already, but I’m sure such an activity would be well received as well (maybe even help the pledge?).

    Greetings
    André

      • Yeah, I’m absolutely sure of that as well. But nevertheless, I can’t really relate your answer to my question. Do you mean that the topics you describe are independent of the pledge? Otherwise I don’t understand your answer, sorry.

  2. No love for the file indexer? No love for getting to know Virtuoso better? All those things you’ve planned sound wonderful, but the file indexer is still extremely slow compared to Tracker or Spotlight.

    • Well, sure. You have no idea how much time I already put into those things when originally file indexing is just a side product of Nepomuk. Thus, now I want to spend some time on real Nepomuk features. Because otherwise it is not clear why we use RDF and Virtuoso instead of a simple sqlite DB like Tracker.

      • Still, if you want proper adoption I guess that those pain points need to go away.

        Some things are fishy anyway regarding IO, I turned off only the file indexing (nepomuk is still enabled) and at startup I still get a large nepomukservice and virtuoso hit. It clearly does something with my files and folders at session startup but no idea why since it’s supposedly not indexing.

        Although I can live with such a behavior on my workstation it unfortunately means I had to disable it completely on my laptop.

        Which reminds me, any pointers on finding out what a nepomuk agent is doing at startup, etc? Any equivalent to akonadiconsole which is very convenient for developers or power users to find out what’s going on?

        I’d love to provide more feedback, but without this kind of tooling it’s unlikely I’d have the time to investigate further.

        PS: Regarding the indexing itself, it was reindexing all my home at each login, and taking the whole day at that. That’s mainly why I turned it off (upgrading to 4.7.1 and starting over with the soprano db and nepomuk config unfortunately didn’t show progress, although I kind of assumed this was fixed).

        • 1. Install a patched Virtuoso. Chakra, Arch, Fedora and OpenSuSE deliver it.
          2. Change your distro if necessary to get 1.
          3. Make sure you have Strigi 0.7.5 and try to use libstreams and libstreamanalyzer from git.
          4. Your file indexer will work as intended. Your Home directory will index in about half an hour, the first time. The second time you will get no weird reindexings.

      • There is a general problem with Nepomuk being only a technology and not an application: The only thing that’s really prominent about Nepomuk is its memory usage. I’ve heard again and again that many parts of the KDE workspace won’t work without Nepomuk, but I’d like to have a look what exactly is stored in there.

        Is there some kind of database viewer for Nepomuk? Something that tells me, “The Activity Manager has stored this and that data, which needs X MB on your disk.”

          • Okay. I’ve tried all four apps.

            Ginkgo showed nothing, until I noticed that Nepomuk was not running. Enabled it, restarted Ginkgo, still nothing. To ensure that the app actually works, I opened Dolphin, added a tag and a rating to a random file, and it really showed up. So the application works, but there’s no true data being displayed by it. However, it seems that Ginkgo’s limited to some distinct types of data only.

            NExp seems promising in that it really showed some useful data in a quite well-presented manner. Problem is, it’s only really good for monitoring, because then I can just click on the objects involved to query for additional information. Queries on stored data require me to write SPARQL, which I can’t.

            KSoprano looks totally worthless without SPARQL knowledge. I could not get more than error messages out of it.

            NepSaK comes closest to my needs in the sense that it seems to actually display data, albeit only “resources”, which seem to be like data types or something. “Browse Resources” will only show general types like “Resource” or “Event” or “FileHash”. There’s also “Query Resources” but that requires SPARQL again.

            So of four tools, one (Ginkgo) actually does the job I want it to do, but it’s limited to some types of data. The three others look like they would be able to do what I want *if* I knew SPARQL. And again: I don’t.

            What I want for Nepomuk/Soprano/Virtuoso is something like what phpMyAdmin is for MySQL. A straight-forward query tool that allows me to answer the question “what data is in that database and why?”.

            • Normally Nepomukshell should list all existing types in the default view. That way you can simply browse all resources in Nepomuk. Be aware though that changing from “Thing” as the root class to “All Resources” will block the GUI for quite some time. This is something that needs to be changed to async operation (and better queries actually).

            • Ginkgo is working fine for me, probably because Nepomuk is running on my system. I don’t use sections but the search tool in the upper right corner.

        • Nepomuk uses libstreams, not strigi. libstreams has been developed as part of strigi initially, but strigi being a Qt only thing lacked many of the features we needed for workspace integration. Nowadays, Nepomuk runs its own indexer, which is just a small app around libstreams. Tracker is not a replacement for libstreams. I think at some point, tracker developers thought about adopting libstreams, not sure what came out of it.

      • I don’t think Tracker is a ‘simple sqlite DB’. It uses RDF, SPARQL and the Nepomuk ontologies. Certainly it isn’t as scalable as Virtuoso, and it doesn’t have full support for advanced features such as named graphs. But your description above is simply misleading and unhelpful. We need all the Nepomuk implementations we can get to make the concept viable and to achieve a critical mass of developers who are familiar with the concepts.

    • Manolin: before whining about file indexing, tell us what your strigi-libs version is. There are still so many distros sticking to 0.7.2, and that will give you a USELESS file indexer with Nepomuk and KDE 4.7.1. 0.7.5 is a lot better, and, if you can, use libstreams and libstreamanalyzer from trunk, as I do. If you run libstreams and libstreamanalyzer from trunk, Nepomuk file indexing kicks the Tracker ass one thousand times.

      Also, use a different distro. Debian is not a good choice. If you really want good support, you should stick to OpenSuSE, Kubuntu (although I don’t like some things about it), Arch Linux or its spin, Chakra Linux (a solid recommendation, despite the fact it needs some tweaking after installing it).

      • Let me add another distro to your list of decent KDE support/updated versions: Lunar-Linux. It is a source based distro, so some folks may not care for it, but it does try to be as up-to-date on the KDE dependencies as possible, including strigi-0.7.5.

        • Unfortunately, I’ll have to remove Kubuntu from that list, after I filed a bug in Kubuntu to support strigi 0.7.5. Basically any distro based on Debian, including Debian (Sid) itself, is stuck on strigi 0.7.2. Also Fedora 15 and before are stuck (for a Fedora solution, see the other “revised” post by Sebastian; I’ve monopolized comments there, sorry about that :P)

          Please, let’s compile a list of working distros (with Strigi-libs 0.7.5 or greater, like Mandriva) and a list of broken distros (with Strigi 0.7.2). We’ll kill a lot of the unfair trolling we are seeing about Nepomuk that way.

          Let’s support Nepomuk!

  3. For sharing data on my external HDs I developed a small tool called Neposidekick

    http://kde-apps.org/content/show.php/Neposidekick+service+menu?content=137233

    and currently it covers all my needs: comments, ratings and tags; but a more integrated service would be great.

    About sharing: I don’t know what Vishesh Handa has in mind, but I have thought about this a bit, and adding a web server to Nepomuk as a presentation layer seems like a good approach to me. Something like the nepomuk:/ protocol but with remote access. It’s natural to RDF/SPARQL, so any RDF/SPARQL system could access your local data and you don’t need to worry about synchronization. Obviously this method is slow but accurate, and I prefer accurate data to quick data. Another advantage is the ability to query your data from a remote location. I actually do remote queries via ssh with a Python script, but a web service would be more user friendly :).

  4. I just recently disabled Nepomuk altogether because it gets on my nerves with the constant Dr. Konqui popping up and the Semantic Search finds every file but the ones I am actually looking for.
    But on Monday I’ll get paid; maybe donating a bit of money will help resolve those issues finally …

    • Don’t forget to check your Strigi version in your package manager! If you are using Strigi 0.7.2, you WILL see lots of crashes! And if you run Debian or Ubuntu, you are using Strigi 0.7.2.

      I filed bugs in Fedora and Ubuntu, so Sid got the updated packages yesterday. They should roll into Debian Testing soon, and they will, hopefully, be included in both Kubuntu Backports and Oneiric Ocelot. For the time being, if you are trying the Ocelot, you need to add this PPA.

      https://launchpad.net/~fboudra/+archive/kde

      Add it with the usual procedure:

      sudo add-apt-repository ppa:fboudra/kde

      And you’ll have a Strigi MUCH better than what you may have, if you are on Ubuntu.

    • You do realise that this is a bit like asking Einstein to paint you a masterpiece. Yes, the guy is smart, but that’s *not* his area of expertise. As long as you’re going to ask people whose experience lies elsewhere to fix Plasma, why not fix it yourself?

  5. I can’t wait to see the day when Nepomuk is used seamlessly everywhere in KDE SC. The day when it would be possible to completely ignore the classical filesystem hierarchy with automatic tag handling, unified file indexes, simple user interfaces and so on. It’s also awesome to see Nepomuk used with more abstract data like contacts and whatnot; hopefully we will see more of that in the future.

    Simply, thank you for your work. Unfortunately, as a student I’m not much of a financial help.

    • I second this. I’m desperate for semantic save. I can never remember whether I saved a piece of music for my project in my music folder or in my projects folder. Same goes with documents, images, everything! Being able to say that this file is both part of my music collection *and* part of project X (and part of… etc) will be invaluable for me, especially when I can search for the most important (i.e. highest rated) project files. I for one would happily do away with the traditional folder approach forever, right now, if semantic save was available and stable.

      The only other missing piece would be the ability to copy files to another computer and still have the same semantic data (ratings, tags etc) available.

      Anyway, thank you so much for actually outlining what your goals would be with precision (even if they end up needing to be trimmed down a bit). This gives people a real idea of what funding is going towards.

      • There is a Python script called Neposidekick that I’m using to share external HD ratings, tags and comments manually. Maybe it could be useful to you until a better solution is available.

  6. The one thing that I (personally) would like to see first is actually something you’ve blogged about a year ago: Excerpts.
    https://trueg.wordpress.com/2010/09/10/someone-requested-excerpts-for-query-results/
    That one, and locating PDFs at the right spot if appropriate. Fedora is giving me KDE 4.6, so all this might actually be in already. But if it is not: Such a thing would make me start using Nepomuk a lot more. Once it is in my workflow, I’m sure I will start using the more complex stuff as well. The same happened to me with Activities and KDevelop.

    And yes, I know it can do many more cooler things, but I guess the grouchy conservative users around here need something obvious and now to be convinced :)

  7. “Sharing Nepomuk Data”

    Will this include being able to store your nepomuk data for removable devices on the removable device? I was really looking forward to this when it was announced, but I haven’t seen any mention of it in a long time. I work a lot with flash drives, so I can’t make much use of nepomuk because my data is stuck on one computer and thus useless to me much of the time.

  8. I think Nepomuk is going in the wrong direction. Instead of loading yet another database server into the environment, effort should be put in creating a new, or extending an existing file system to take care of the metadata. The metadata associated with some files, sometimes is much more important than the data itself. I trust the file system much more than I trust KDE applications and don’t want this precious information to be buried somewhere deep inside the .kde4 directory, which I have to re-create with every major update of KDE and that any subtle bug may accidentally wipe. Moreover, it doesn’t offer me any direct control over it. I want to be able to use the standard command line utilities to copy/move files together with their metadata, that’s what would be really useful. Developing Nepomuk is like buying the remote control first, in the hope that someone will give you the TV later on.

    With that being said, I’m still grateful to people like Sebastian for their incredible dedication.

    • That is a typical question that arises every now and then. The answer is simple: Nepomuk is not about files. Nepomuk is about resources in general. This includes files but also emails, people, projects, events, locations, ideas, notes, and so on. You cannot put those into a file system since they are not files. And if you do you essentially convert your file system into a generic resource storage which allows the creation of relations between the resources. What you get then is something like an RDF database which brings us back to Virtuoso.

      • Sebastian, I support Nepomuk with force, but I’ll whine a bit this time. akonadi_nepomuk_email_feeder has a weird behaviour. It only indexes when it wants to do it, and it stops indexing mails when I press keys on my keyboard. Can you kill the idle detection code and replace it with something more rational (e.g. using the same criteria Strigi uses to suspend file indexing, to suspend mail indexing)?

        Other than that, I need this. And Nepomuk has some nice side effects here (Bangarang gives me all the features I need of a heavy media player, with a slim memory footprint and no loading time, among other things).

    • that is already there today – reiserfs 4. Politics from aholes at redhat ( who for obvious reasons want to keep pushing the $hitty ext) made sure it doesn’t get included in the kernel.

      This is what we face in so called “open source”.

  9. Pingback: Links 19/9/2011: Linux Mint Debian 201109, Knoppix 6.7.1 | Techrights

  10. :)

    Thanks for the quick reply and for clearing up this one. I’ll wait anxiously for 4.8 then. Please, don’t drop Nepomuk, and tell Jörg Ehrichs to blog about Conquirere, a fantastic idea to couple research projects with Nepomuk. I’m following that!

  11. Pingback: Nepomuk – What Comes Next – Revised | Trueg's Blog

  12. Pingback: Los planes de futuro de Nepomuk : KDE Blog
