Nepomuk and the Google Summer of Code – second try: I want your ideas!

Comments on my last blog post about Nepomuk and the GSoC were rare. Since I am not sure whether that was due to a misleading title or to an actual shortage of ideas, I am trying again.
Please post your ideas as comments to this blog entry. See the KDE GSoC ideas page to check which projects I have already proposed. Go nuts!

47 thoughts on “Nepomuk and the Google Summer of Code – second try: I want your ideas!”

    • Oh, forgot:
      – Nepomuk and Akonadi, but I heard there is something underway.

      Nepomuk to world domination.

  1. Let’s use the power of the Nepomuk/Strigi index to build a (K)backup tool.

    What about a special window where you can directly see your favorite movies, sounds, etc.? This special window could be reachable with a single click in Dolphin, so you could easily switch to your favorites instead of having to search for your things.

  2. IANAC[1], IAAL[2], but one ideal I imagined NEPOMUK could reach is metadata that can be linked together: e.g. I could search for the document that a friend of my brother (whose name I do not know) sent me via Jabber.

    This means that NEPOMUK would not only need stored metadata telling it that the file was sent to me by +person+ via +protocol+, but would also have to link +person+ to +another_person+ as “friend” and +another_person+ to me as my “brother”.

    FOAF does something similar, and I think such a feature would be nice to have when your address book gets huge and you no longer remember who is connected to whom and how; it would build up groups and relations/links between different people.
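    A minimal sketch of such a linked query, assuming metadata is stored as plain (subject, predicate, object) triples. The predicate names (`sentBy`, `sentVia`, `friendOf`, `brotherOf`) and all resources are hypothetical placeholders, not actual Nepomuk ontology terms:

```python
# Hypothetical triples: (subject, predicate, object). Not real Nepomuk ontology terms.
TRIPLES = {
    ("report.odt", "sentBy", "dave"),
    ("report.odt", "sentVia", "jabber"),
    ("notes.txt", "sentBy", "erin"),
    ("notes.txt", "sentVia", "email"),
    ("dave", "friendOf", "carl"),
    ("carl", "brotherOf", "me"),
}

def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a stored triple."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}

def docs_from_brothers_friends(me, protocol):
    """Documents sent via `protocol` by any friend of any of my brothers."""
    brothers = subjects("brotherOf", me)
    friends = {f for b in brothers for f in subjects("friendOf", b)}
    return {d for f in friends for d in subjects("sentBy", f)
            if (d, "sentVia", protocol) in TRIPLES}

print(docs_from_brothers_friends("me", "jabber"))  # {'report.odt'}
```

    The query walks the links backwards from "me" (brothers, then their friends, then what those friends sent), which is the kind of chaining such a search would need, whatever the real storage and query language turn out to be.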

    Another idea I had while discovering what NEPOMUK, the Semantic Desktop, FOAF, RDF, SIOC etc. are is that we could finally take advantage of sharing information via the internet (maybe XMPP, but in any case something encrypted).

    Here are two user scenarios:
    1) Bob meets a girl at a party and they get along fine. He promises to send her photos from the party, but later notices that she forgot to give him his address. He knows this girl happens to be Alice’s friend and Charlie’s cousin. So Bob starts up KAddressBook, enables “search via FOAF”[3] and searches for “friend:Alice AND cousin:Charlie”. NEPOMUK then searches through the parts of address books that Bob’s contacts have marked as “public” or “share with friends”, finds the match using those two criteria and gives him the contact information. Now Bob can send her the e-mail with the pictures.

    The next step would be for Bob to simply tell KMail to send an e-mail to a person described as above.

    2) Now that Sarah (as Alice’s friend is named) and Bob have gotten back in touch and sparks have started to fly, she wants to surprise Bob after his lectures by waiting for him in front of his faculty and taking him to lunch. So she opens KOrganizer and enables FOAF to reach the calendar entries that her FOAF contacts have made available. Because Bob has made the entries for his lectures readable by people he has marked as “close friends” and has put Sarah in that group, she can see when his lectures finish and even in which room. (On the other hand, she cannot see that he planned to go buy her two hours earlier, because he has marked that entry as “private”.)
    This would also make it easier to plan going out with a group of friends, because you could just check their schedules to see when they are available.

    I wouldn’t be a member of the IAAL club if I didn’t think about abuse here ;)
    So what keeps an unknown burglar from accessing the data to find out when you are going to be away from home? Primarily, your data would be “private” by default, and you could set permissions for different types of data (e.g. “schoolmates” can see lecture info; “business” only business contacts, but not private ones; etc.). Secondarily, the data sent to and from your contacts would be encrypted. Maybe it would also be sensible to encrypt data that isn’t your own on your disk using GPG keys (yours and those of the contact you got the data from).
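    A minimal sketch of the “private by default” rule, assuming each entry carries the set of contact groups allowed to read it; the `Entry` type, group names and contacts are illustrative only, and encryption is left out entirely:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    title: str
    # Groups allowed to read this entry; an empty set means "private" (the default).
    shared_with: set = field(default_factory=set)

# Hypothetical assignment of the owner's contacts to groups.
GROUPS = {
    "close friends": {"sarah"},
    "schoolmates": {"sarah", "tom"},
}

def visible_to(contact, entries):
    """Return only the entries the given contact is allowed to read."""
    member_of = {g for g, members in GROUPS.items() if contact in members}
    return [e for e in entries if e.shared_with & member_of]

calendar = [
    Entry("Lecture, room 101", {"close friends"}),
    Entry("Buy flowers"),  # no groups given, so it stays private
]
print([e.title for e in visible_to("sarah", calendar)])  # ['Lecture, room 101']
```

    Because an entry with no groups matches nobody, forgetting to set a permission fails safe: the data simply stays private.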

    And if I extend this idea even further (probably too far for 2009), this could be the basis of a FOSS F2F (friend-to-friend) cloud computing system integrated directly into the desktop! :D

    I may be a dreamer, but I think KDE and NEPOMUK can get this done… if not tomorrow, then at least the day after ;)

    [1] IANAC — I am Not a Coder
    [2] IAAL — I am a Lawyer
    [3] For lack of a better word, I’m using “FOAF” here, as that protocol would theoretically be usable for such a purpose. However, as it is not completely compatible with NEPOMUK, either a compatibility layer would have to be written or a NEPOMUK-native protocol similar to FOAF would have to be invented.

    • Errata:
      * Of course, Alice forgot to give him *her* address (not “his”)
      * and Bob wanted to buy Alice *flowers* (not to participate in prostitution) ;)

      Regarding “cloud computing”, “Web 2.0” etc., I think a lot of popular solutions out there (including Google*, Facebook, LinkedIn and many more!) are neither secure nor open enough for sensible use. We should offer users FOSS solutions that implement open formats and standards, promote appropriate content licenses, and give users power over their content and over what they share with whom, while making all of this easier. Some content should be available and some not, and the user should have the right to choose!

      That being said, the so-called “Web 2.0”, the “upcoming Web 3.0” a.k.a. the semantic web, and “cloud computing” are all concepts that favor the middleman (i.e. the service provider) and take away a lot of their users’ freedom and power over the content they entrust to the provider. This is a major flaw that we, the free and open source community in particular, should criticize and try to bypass.

      And even if we ignore the privacy and security aspects of “cloud computing” and “Web >1.x”, in a world of wireless access, ever more available laptops, netbooks and smartphones, those solutions are not very practical at all. The semantic web and cloud computing would have made sense back when computers were still stationary and the internet was the best option for users to “take their data with them”.

      IMHO the *real* future is an F2F (and maybe even P2P) semantic desktop. Instead of browsing websites for content and entrusting that data/content to a third-party company (in this case: 1st party = you, 2nd party = the person who is accessing your content), we should cut out the middleman (unless we *need* him) and share data and content directly from one (trusted) person to another, directly using the programs we use to handle that data/content natively in our everyday lives!

      (The only objection I can think of is that my laptop is not online 24/7, so I would not always be reachable… but we could still have servers that handle or cache such data, encrypted and available only to those whom the owner/author of the data/content has allowed.)

      • Great ideas. However, I think they are too much for a SoC project. I will add them to the upcoming Nepomuk TODO/Ideas page, though.
        Social features were one of the things I wanted to target in 2009.

  3. I would like Nepomuk/Strigi to search Gmail, Google Docs, GTalk logs & Google Calendar for results as well as local documents, please. We’re all Web 2.0 now, you know!

    • All Web 2.0, hmm!
      I really don’t agree. What would be really nice is for Strigi to be able to index my 4.5 GB of 14-year-old mailboxes and my 120 GB of documentation files (txt, odf, sxw, pdf) without spending two days indexing at 100% CPU, then crashing and being unable to give me back my indexes.

      The second point, which is more critical: how to back up metadata and restore it if I change my system or need to rebuild my /home.

  4. Some integration with KFind/Dolphin/Konqueror, like in Vista, so you can search from Konqueror/Dolphin, or see an option to use Nepomuk when you use KFind.

    The ability to auto-attach labels from the apps I’m using. Example: take a screenshot using KSnapshot and it gets the label “ksnapshot”; a file created using KWord gets “kword” the same way. If you later edit the picture taken with KSnapshot in Krita, the file has two tags, and you can search by the program that created or edited it.
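    A minimal sketch of this auto-labelling, assuming applications call a hook whenever they save a file. The hook name and the `created-by:`/`edited-by:` label prefixes are hypothetical; keeping creator and editor apart is what makes searching by “program created” or “program edited” possible:

```python
from collections import defaultdict

# file path -> set of labels such as "created-by:ksnapshot" (illustrative only)
labels = defaultdict(set)

def on_file_saved(path, app, created):
    """Hypothetical hook an application would call whenever it saves a file."""
    kind = "created-by" if created else "edited-by"
    labels[path].add(f"{kind}:{app}")

def files_touched_by(app):
    """Files that were either created or edited with the given program."""
    return sorted(p for p, tags in labels.items()
                  if f"created-by:{app}" in tags or f"edited-by:{app}" in tags)

on_file_saved("shot.png", "ksnapshot", created=True)
on_file_saved("letter.kwd", "kword", created=True)
on_file_saved("shot.png", "krita", created=False)

print(sorted(labels["shot.png"]))  # ['created-by:ksnapshot', 'edited-by:krita']
print(files_touched_by("krita"))   # ['shot.png']
```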

  5. Probably some UI library for creating queries at different levels of complexity.

    Applications dealing with potentially large quantities of data most likely already have UI for their most common types of queries, whether they are using some external search service or implementing it internally.

    However, enabling a user to search based on semantics isn’t covered yet and is probably better done in a common library, to get consistent interfaces and avoid mistakes.

    Probably a bit like visual programming, i.e. connecting “functional blocks” on a workspace widget.

    Or maybe by letting the user select a sample of items that should be matched and then checking their relations to see what they have in common.
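    The second approach (query by example) could work roughly like this: intersect the property/value pairs of the selected items and use the shared pairs as the query. A minimal sketch over made-up metadata, not tied to any Nepomuk API:

```python
# Each item is described by property/value pairs (hypothetical metadata).
items = {
    "a.jpg": {"type": "image", "tag": "holiday", "camera": "cam1"},
    "b.jpg": {"type": "image", "tag": "holiday", "camera": "cam2"},
    "c.pdf": {"type": "document", "tag": "holiday"},
    "d.jpg": {"type": "image", "tag": "work", "camera": "cam1"},
}

def common_properties(sample):
    """Property/value pairs shared by every item the user selected."""
    pairs = [set(items[name].items()) for name in sample]
    return set.intersection(*pairs)

def matching(query):
    """All items that carry every property/value pair of the derived query."""
    return sorted(n for n, props in items.items() if query <= set(props.items()))

query = common_properties(["a.jpg", "b.jpg"])
print(sorted(query))    # [('tag', 'holiday'), ('type', 'image')]
print(matching(query))  # ['a.jpg', 'b.jpg']
```

    Differing values (here the camera) drop out automatically, so the user never has to say which properties matter.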

  6. Automatically integrating intangible discussions on the design of software into documentation.

    As someone who maintains software that was created without a stringent requirements process, I am often left with questions: why was this feature implemented, and why in this way? This kind of information is rarely preserved in comments, but much of it might already be documented, especially in open collaborative projects.
    For example: e-mail discussions on a new feature, an issue in a bug tracker, review board logs, an IRC discussion, or relevant academic papers. Matching this kind of data to the code that was written might help future collaborators gain insight into design choices made in the past.

    As an example, today I was working on an issue that had been opened recently. This issue was already known two years ago when the software was implemented. However, it was not in the documentation attached to the code or in the issue tracker. Yet it had been discussed, and e-mails had been sent back and forth about it. Thankfully, a colleague remembered this issue and had a copy of the relevant e-mail discussion, preventing a rerun of the discussion of two years ago. This kind of institutional knowledge is not always available. A tab in the editor showing possibly related e-mails and other documents for a piece of code might help everyone in the future.

    This example talks about code, but it might be equally valid for engineering. At some point a choice is made for a type of screw in an assembly. The reasons for selecting this screw type might have been trivial (cheapest) or very specific: the material does not rust in the expected humidity conditions and won’t become brittle until -50 degrees centigrade. Recording the discussion and selection procedure might avoid major problems years later. Secondly, if a part fails at some future time due to a mistake in the engineering process, the manufacturer can search not only for similar parts but also for parts where a similar decision process was followed.

    Reading this back, it just seems to be a specific implementation of your context sidebar idea.

    • Very good idea. I think the most important part here is how this information is generated, i.e. how applications support the user in relating an e-mail discussion to a piece of code, a project or whatever.

  7. I would advise concentrating on the *first* step to make Nepomuk work, and that is automatically tagging files as they come into your system…

    (copy/mail/digicam pictures/movies and so on)

    Then provide services around those tags (playlists, my bla, my blub, my documents, …).

  8. This might already be taken care of with the addition of the Soprano Virtuoso backend, but the ability to handle federated searches would be nice. E.g. I use Nepomuk on my desktop machine and my laptop; when I search for metadata on a file on an NFS share (or something kept in sync with Unison between the laptop and the server), I should be able to pull/push/search metadata to the laptop’s Nepomuk service from the desktop machine.

    • While this is an important issue, I am not sure it could be handled as a SoC project; it seems too complex and complicated to me. However, it is very important as far as general Nepomuk development goes (I need more time! ;)

  9. I had an idea for a search client, something really different from the others. I plan to code it myself :-)
    Basically it looks like a set theory diagram. In the middle there is a main circle, presented as a bubble (dirigible). You can drag a tag from a sidebar into it, and it will search, like KRunner, showing the matches in a result area. From another sidebar you can drag “AND” bubbles and “OR” bubbles into the main bubble, and you can drag more tags into those bubbles to query nested sets of criteria, hiding the brackets from casual users.
    I had this idea while working on a model abstraction in PHP, and I will code it in JS to get a user-friendly visual “SQL Query Builder”.
    But if I think of Nepomuk/Strigi, you could do much more with it.
    There are two possible ways I’m thinking of to collect the possible criteria. Since tags are not the only thing to search for, this application has to get the possible properties from somewhere to present them to the user.
    One way is to retrieve them from Nepomuk, like Gwenview retrieves the tags. (I wish there were a nepomuktags:// kio…, we already have digikamtags://)
    You could have an entity like “contact”. All contacts are listed in a sidebar (represented by icons of their faces :-P), and you can drag a face into a bubble to search for things connected to that person. Other criteria, like last-modified or created, are added by dragging a clock into it; while dragging, the app asks you for the time.
    Then there is a “simple mode” for building these single criteria, which basically means the criteria are built against “anycontactfield=draggedContact”, where anycontactfield is ( (fileretrievedby=draggedcontact) OR (fileowneris=draggedcontact) OR (composer=draggedcontact)….), and an “expert mode” where the sidebar is connected to a single property, so that it only searches interpret=draggedContact.
    You can expand on this idea to save queries as virtual folders, allow users to build their own criteria with the keyboard :-), have bookmarks and so on. These things would also be the basic idea behind .desktop files for criteria. Just as there are virtual search folders in Nepomuk, you could have a contact .desktop file that you put into any directory to perform a Nepomuk search, or to assign a directory to a user.
    I know that these things are easier to handle with a sidebar, but I would like to have a file which represents any type of entity :-)
    Okay… I hope I’ll find time to code this.
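    The bubble model maps naturally onto a small query tree. A minimal sketch, assuming files are plain dicts; `Match`, `And`, `Or` and the property names (taken from the comment) are illustrative, not an existing Nepomuk API. Simple mode expands a dragged contact into an OR over every contact-valued property, while expert mode would use a single `Match("interpret", contact)`:

```python
from dataclasses import dataclass

@dataclass
class Match:
    prop: str
    value: str
    def test(self, item):
        return item.get(self.prop) == self.value

@dataclass
class And:  # an "AND" bubble: every child must match
    children: list
    def test(self, item):
        return all(c.test(item) for c in self.children)

@dataclass
class Or:   # an "OR" bubble: at least one child must match
    children: list
    def test(self, item):
        return any(c.test(item) for c in self.children)

# Hypothetical contact-valued properties searched in "simple mode".
CONTACT_PROPS = ["fileretrievedby", "fileowneris", "composer"]

def any_contact_field(contact):
    """Simple mode: a dragged contact matches any contact-valued property."""
    return Or([Match(p, contact) for p in CONTACT_PROPS])

files = [
    {"name": "song.ogg", "composer": "jenny", "tag": "music"},
    {"name": "notes.txt", "fileowneris": "jenny", "tag": "work"},
    {"name": "photo.jpg", "fileretrievedby": "bob", "tag": "music"},
]

# Main bubble = AND of a dragged contact and a dragged tag.
query = And([any_contact_field("jenny"), Match("tag", "music")])
print([f["name"] for f in files if query.test(f)])  # ['song.ogg']
```

    Because bubbles nest just like the tree nodes, the brackets stay hidden from the user while the evaluation order remains unambiguous.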

  10. I haven’t looked into the internals of Nepomuk integration and how it is used yet. It is on my map, I promise…

    1) What I would like to see is some dead simple way to store arbitrary information within Nepomuk and use the power of semantics to retrieve it again later.

    I need to manage tasks. Each task has a priority and other special attributes, a person is assigned to the task… you get the drift.

    – So, I just design an ontology (RDF) and link it to existing concepts already defined in FOAF and other core concepts. A data model is built.
    – Running some “generator”, a basic UI for managing the data (editing, adding, deleting) is generated. No direct coding is required; all data handling is fully transparent.
    – Even better, the interface can be customized using drag and drop (TopBraid Composer does this nicely).
    – For more advanced requirements, the UI can be customized. An API is given to work with the ontology data. Scripting languages are supported.

    And – yes – the data should be usable from within any semantic aware KDE application :)

    2) Migration aspects and multiple computers:
    RDF / OWL define a graph. Therefore, determining the instances that belong e. g. to a certain computer is difficult.

    Core questions:
    – What happens if a certain file is moved to an external storage medium and then to another machine? How is the meta-information synchronized/migrated?
    – What happens if a certain mail is forwarded to another mail address? Is there an option to also send the metadata along? Does this make sense regarding privacy and usability?
    – IMAP: How do we synchronize the metadata of different computers for an IMAP account? The objective should be to make the relevant information available on all machines that access this account.

    3) Privacy: The more data is stored as semantic data, the more complex the data model becomes, and privacy might become an important concern. How can we support the user in making sound decisions when marking a certain object as private? How do we present the implications such a decision has? What kind of security model would be required?
    Lots of thinking, little code ATM, I think

    4) How about custom rules or triggers that a user can define? Such a rule or trigger defines a certain condition in the data model. If this condition (or more than one) holds, the action part of the rule is executed.

    Example (not the best):
    – When a mail is received, it is forwarded to my phone via SMS if a) the sender is on my list of business contacts, b) the sender is referenced from a task or project I am currently working on, and c) whatever
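    A minimal sketch of such a condition/action rule, assuming an incoming mail is a plain dict and the contact and task lists are already known; all names are hypothetical, and the SMS gateway is simulated by appending to a list:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    conditions: list   # predicates over an event; all of them must hold
    action: Callable   # executed only when every condition is satisfied

    def fire(self, event):
        if all(cond(event) for cond in self.conditions):
            self.action(event)
            return True
        return False

# Hypothetical data standing in for what the semantic store would know.
business_contacts = {"boss@example.com"}
active_task_refs = {"boss@example.com"}   # senders referenced by current tasks
sent_sms = []                             # stand-in for a real SMS gateway

forward_to_sms = Rule(
    conditions=[
        lambda mail: mail["from"] in business_contacts,  # a)
        lambda mail: mail["from"] in active_task_refs,   # b)
    ],
    action=lambda mail: sent_sms.append(mail["subject"]),
)

forward_to_sms.fire({"from": "boss@example.com", "subject": "Deadline moved"})
forward_to_sms.fire({"from": "friend@example.com", "subject": "Party!"})
print(sent_sms)  # ['Deadline moved']
```

    Condition c) would simply be one more predicate in the list.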

    Well, lots of blue-sky thinking… Over time, we may reach the sky and actually find out what works and what does not.

    For me, nepomuk is the most innovative approach in KDE4. Thanks for the vision and work!

    • Again a good idea, but I think it is too much for a SoC project. It will, however, make it onto the TODO list.

      As for sharing data: Nepomuk makes use of named graphs to attach metadata to metadata. This includes creation date and creator. Combined with an improved URI schema (Jabber IDs are the current idea), this allows sharing data while preserving its origin and other information.

  11. I am currently working for a company where, unfortunately, resources are distributed across all of the employees’ computers. When I want to find something, I use Google Desktop Search, but that searches only my desktop, missing information that exists on someone else’s PC and could greatly help.

    So, “distributed” Strigi and Nepomuk functionality could save me and a lot of colleagues a great deal of work, since we would not need to organize everything centrally from the beginning, which can be a time-consuming, complex task and, of course, one very prone to errors…

  12. Would creating a ‘Smart Location Bar’ (like in Firefox) qualify as a good idea?
    This would be used by Konqueror and made available as a standalone widget (unrealistic idea?)
    Of course, it would have to use Nepomuk capabilities…

    But now that I think of it… it might be smarter to re-use krunner capabilities for this.

  13. Associate devices with contacts

    Imagine you have a digital camera. Now you could associate it with yourself, so that every downloaded picture would have…

    * the camera as sourceDevice()
    * your name as creator
    * your contact as sourceOwner()

    The contact could also be a group of contacts like “Family” consisting of several other contacts.

    That way, just by downloading pictures from your camera, they are associated with you and thus easier to search for, especially if you do not tag them.

    Another use case would be MP3 players. Imagine your friend has free music on their MP3 player. Now you copy it, and as the MP3 player was associated with your friend, you easily know whom you got some MP3s from.

    In all these cases (I guess) the implementation would need to use Solid for getting the information on the hardware.
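    A minimal sketch of the import step, assuming the device has already been identified by some string ID (the comment suggests Solid would provide the hardware information; no Solid calls are shown here). The property names come from the comment; the ID-to-owner table and function are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class ImportedFile:
    path: str
    meta: dict = field(default_factory=dict)

# Hypothetical mapping from a device identifier to the contact who owns it.
DEVICE_OWNERS = {"CAM-1234": "me", "MP3-42": "Anna"}

def import_from_device(device_id, paths, owner_is_creator=True):
    """Annotate files copied from a device with the device and its owner.

    For your own camera the owner is also the creator; for a friend's
    MP3 player the owner is only the source, so pass owner_is_creator=False.
    """
    owner = DEVICE_OWNERS.get(device_id)
    imported = []
    for p in paths:
        meta = {"sourceDevice": device_id}
        if owner is not None:
            meta["sourceOwner"] = owner
            if owner_is_creator:
                meta["creator"] = owner
        imported.append(ImportedFile(p, meta))
    return imported

pics = import_from_device("CAM-1234", ["img_001.jpg"])
print(pics[0].meta)  # {'sourceDevice': 'CAM-1234', 'sourceOwner': 'me', 'creator': 'me'}
```

    Unknown devices still get `sourceDevice`, so files can be linked to an owner later once the device is associated with a contact.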

  14. Dolphin — Sort by Tags and Show in Groups

    IMO there should be a “View/Sort By/Tags” entry, though this would only make sense if “Show in Groups” was also activated. In the options, or maybe a sidebar, you should be able to define which kinds of tags are (or are not) used for sorting, though that should only be needed in rare cases.

    A small example:
    * garfield.png –> Tag: “Comic”, Subject: “Garfield”, creator: “Jim Davis”, created: “2009-02-28”
    * xkcd.png –> Tag: “Comic”, Subject: “XKCD”, title: “Westley’s a Dick”, description: “Inigo/Buttercup 4eva”
    * how_to_create_comics.pdf –> Tag: “Comic”, Tag: “How To”, Title: “Create a Webcomic”, creator: “XYZ” …

    In this case you’d have two (four) groups:
    1. Comic
    2. How To
    3. (Dates –> special case, see next post)
    4. (Filetypes)

    The reason for that is that all other metadata is only associated with one tag.
    * All ‘subject: “Garfield”‘ files are also tagged comic –> if other files would also have ‘subject: “Garfield”‘ there would be another group called Garfield.
    * All images are also comics
    * how_to_create_comics.pdf is shown in “Comic” and “How To” as it is an intersection of both.
    * Filetypes get less priority than other metadata, that is why no additional “PDF”-Groups and “Image”-Groups are shown in this case, though that would change if X groups have Images in them

    Group name:
    * Hovering over the group name shows a button to collapse it
    * Hovering over the group name shows additional metadata you could use to filter e.g.: Hovering over Comic would show “Garfield, XKCD” in something like a frame and “Image” and “PDF” also in a frame. Clicking on one of the additional metadata creates a new group like “Comic:Garfield” or “Comic:Image”, the original “Comic” is automatically collapsed.
    * Clicking on the group name would mark all of its entries

    This could make it easy to browse (virtual) folders with a lot of files. I’m not sure, though, whether it would be of much use.
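    A minimal sketch of the grouping rule, using the example files above: every file is placed in one group per tag, so a file with two tags appears in both groups, and hovering over a group could offer sub-groups by another property. Dates and the file-type priority rules are left out, and the data layout is an assumption:

```python
from collections import defaultdict

files = {
    "garfield.png": {"tags": ["Comic"], "type": "Image"},
    "xkcd.png": {"tags": ["Comic"], "type": "Image"},
    "how_to_create_comics.pdf": {"tags": ["Comic", "How To"], "type": "PDF"},
}

def group_by_tags(files):
    """Place each file in one group per tag; a file with two tags shows up twice."""
    groups = defaultdict(list)
    for name, meta in files.items():
        for tag in meta["tags"]:
            groups[tag].append(name)
    return dict(groups)

def refine(files, group, prop):
    """Sub-groups offered when hovering over a group name, e.g. "Comic:Image"."""
    sub = defaultdict(list)
    for name in group_by_tags(files)[group]:
        sub[f"{group}:{files[name][prop]}"].append(name)
    return dict(sub)

print(group_by_tags(files))
# {'Comic': ['garfield.png', 'xkcd.png', 'how_to_create_comics.pdf'], 'How To': ['how_to_create_comics.pdf']}
print(refine(files, "Comic", "type"))
# {'Comic:Image': ['garfield.png', 'xkcd.png'], 'Comic:PDF': ['how_to_create_comics.pdf']}
```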

  15. Timeline

    A new sidebar could be introduced into Dolphin having a timeline.
    This would make it easy to show only files/folders in a certain time range.

    By default, all date-related metadata (created, modified, downloaded, etc.) would be used, but that would be configurable.

    Additionally, one could activate colors. The idea is that there are two colors, e.g. white and red, where white stands for the oldest date in the specified time range and red for the newest. The background of each file would then be a blend of white and red, depending on how old it is.

    This allows you to have even more information at one glance.

    Again this feature would be especially useful for (virtual) folders with a lot of files.
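    A minimal sketch of the color blending, assuming a linear interpolation between the two colors over the selected time range (the comment does not say how the blend is computed); dates outside the range are clamped:

```python
from datetime import date

WHITE = (255, 255, 255)
RED = (255, 0, 0)

def age_color(d, start, end, old=WHITE, new=RED):
    """Blend linearly from `old` (at the start of the range) to `new` (at the end)."""
    span = (end - start).days
    t = 0.0 if span == 0 else (d - start).days / span
    t = min(max(t, 0.0), 1.0)  # clamp dates outside the selected range
    return tuple(round(o + (n - o) * t) for o, n in zip(old, new))

start, end = date(2009, 1, 1), date(2009, 1, 11)
print(age_color(date(2009, 1, 1), start, end))   # (255, 255, 255)
print(age_color(date(2009, 1, 6), start, end))   # (255, 128, 128)
print(age_color(date(2009, 1, 11), start, end))  # (255, 0, 0)
```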

    • It’s interesting that you mention a timeline in Dolphin. I’m working on a simple backup system for KDE (similar in concept to Time Machine), and I have a kioslave for browsing “back in time”. My idea was to add a digiKam-esque slider to the bottom of TimeVault (as a toolbox, removable of course) to be able to move back in time.

      It sounds like you want the timeline widget from digikam just as it is now, with the ability to select multiple “time ranges”.

  16. Probably the most necessary (if not the most sexy) thing to develop at the moment is the UI.

    A graphical tool for building compound queries. The basic design of OSX Finder is pretty good but would need to be extended for semantic data.

    Something for simply browsing your RDF information graphically might be nice, particularly if it allowed you to fix problems, for example if you had tagged a load of photographs and then moved the root directory of your photo album, leaving the associations of all those tags broken.

    P.S. Is it possible with Nepomuk to associate numbers with tags in a reasonably efficient way, e.g. “Jenny is in this photo at position (x,y)”?

    • Yes, but that would not be a tag as used in Nepomuk at the moment. One would create a relation between the person “Jenny” and the picture, and then attach the coordinates to this relation, too. Actually, there is a tool in playground which does exactly that; it is only of alpha quality, though, and especially the ontology it uses is not perfect yet.

  17. – Nepomuk in networks: how to use a common service for workgroups

    – Nepomuk as a CRM system: a special Kontact module that collects and presents customer data from all the PIM data of a workgroup via Akonadi, like: all members of company A, which files were sent to member 1, the last appointment with a member of company C, etc.



  18. The 5-star file rating system is interesting, but rather useless as it is. Synchronization with FOSS projects like Amarok, RockBox, and Ampache which also use 5-star rating systems for media files would be amazing, and is pretty much the point of a project like this, right?

  19. Two related tasks:

    1. Metadata export tool

    I send a file to a friend. The Nepomuk export tool lets me export the relevant metadata for this file as a small .nepomuk file that I can send along with the actual data file. The recipient can then simply open this file, and the metadata is imported into her Nepomuk store. I can set preferences for what data to export, to preserve my privacy, and I will also be offered a human-readable preview of the data that is about to be sent. Once this is working, it can be integrated into KMail, so that you get asked whether you want to include the metadata whenever you send a file, and similarly whether you want to import the metadata when you receive one.

    2. Metadata migration (and sync) tool

    Allows me to migrate and sync all of my Nepomuk data, or subsets thereof, between different machines.
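    A minimal sketch of the export/import round trip, using JSON as a stand-in for the proposed `.nepomuk` file (the comment does not specify a format, and a real tool would more likely serialize RDF). The store layout and property names are hypothetical:

```python
import json

# Hypothetical sender-side store: file path -> metadata properties.
store = {
    "party.jpg": {
        "tags": ["party"],
        "rating": 4,
        "comment": "taken at the party",
        "private_note": "not for sharing",
    },
}

def export_metadata(path, allowed):
    """Serialize only the properties the user agreed to share."""
    shared = {k: v for k, v in store[path].items() if k in allowed}
    return json.dumps({"file": path, "metadata": shared}, indent=2, sort_keys=True)

def import_metadata(text, target_store):
    """Merge exported metadata into the recipient's store."""
    doc = json.loads(text)
    target_store.setdefault(doc["file"], {}).update(doc["metadata"])

payload = export_metadata("party.jpg", allowed={"tags", "rating", "comment"})
print(payload)  # doubles as the human-readable preview shown before sending

recipient_store = {}
import_metadata(payload, recipient_store)
```

    Filtering happens before serialization, so anything not explicitly allowed never leaves the sender’s machine, and the same payload serves as the preview.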

  20. I’d love to see virtual folders for bookmarks, as mentioned in the comments on another blog post. That would be sooo handy!

  21. I would like to see a plasmoid that I can ask questions like:
    – How many apps crashed yesterday?
    – Which files did Susan send me?
    – How many mp3 files do I have on my PC?


    Openbrain looks like a possible plasmoid but it’s not there yet.

  22. Collecting data:
    e.g. Kontact, Thunderbird extension,

    e.g. Amarok, RockBox


    Remember when, and from which IP address, a file was saved.

    When extracting an archive, the extracted files should inherit the metadata of the archive.

    Collect browsing information: maybe Strigi could even parse all(?) websites one visits, so you could do a kind of full-text history search afterwards?
    Download information: not only HTTP downloads, but all kio_slave download/copy processes.

    Display data:
    Maybe some of the data is already collected(?), but it’s very hard to find out. Only Dolphin displays any of this data in an obvious way.

    A comprehensive search application would be very nice, including the possibility to save search results as virtual folders.

    display metadata for a selected contact (chat history, files from this user, …)

    display metadata for a selected day (timespan)

  23. It would be great to have a multi-criteria search program.
    For instance, I’d like to search for my files that have a tag containing “land” (such as landscape, island, …), that are image files, and that have a rating higher than 3 stars. It would also be interesting to be able to store this search as a virtual folder in Dolphin.
    Today we can tag files, rate them and comment on them, but we are not able to use this data to search our files effectively.

    Nepomuk should also share some data with other programs, such as Amarok for ratings on music files, digiKam for tags and ratings, and Kontact for the sender, date and description of a file (e.g. the title of the mail as the description of the file)…
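    A minimal sketch of the multi-criteria search described above, assuming each file’s tags, MIME type and 0 to 5 star rating are already available as plain values; the data is made up, and in Nepomuk this would be a query against the store rather than a Python list:

```python
files = [
    {"name": "beach.jpg", "mime": "image/jpeg", "tags": ["island"], "rating": 5},
    {"name": "hills.png", "mime": "image/png", "tags": ["landscape"], "rating": 3},
    {"name": "map.pdf", "mime": "application/pdf", "tags": ["landscape"], "rating": 5},
    {"name": "alps.jpg", "mime": "image/jpeg", "tags": ["landscape", "snow"], "rating": 4},
]

def search(files, tag_part, mime_prefix, min_rating):
    """All criteria must hold: a tag containing tag_part, the given file type,
    and a rating strictly higher than min_rating."""
    return [f["name"] for f in files
            if any(tag_part in t for t in f["tags"])
            and f["mime"].startswith(mime_prefix)
            and f["rating"] > min_rating]

print(search(files, "land", "image/", 3))  # ['beach.jpg', 'alps.jpg']
```

    Saving such a search as a virtual folder would then amount to storing the three parameters and re-running the query whenever the folder is opened.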

    • The nepomuksearch:/ KIO slave does allow multi-criteria search. I added a SoC project to improve it.

      Data sharing will be achieved once applications store their data in Nepomuk. That is the idea. Contacts, e-mails and PIM data in general are on the way (Akonadi integration); digiKam and others will hopefully show up at the workshop I announced today.

      • I’m not sure that a graphical interface for making requests is a good idea, because not everybody will understand that they have to create requests and use them in Dolphin to be able to search their files. It should be very easy, so that everyone could benefit from the power of Nepomuk, not only the geeks…

  24. A feature I would expect from a desktop search engine is the ability to search the contents of all the web pages my _bookmarks_ point to.

    Since I read a lot on the web, I extensively bookmark all the interesting stuff I eventually want to come back to.
    But browser bookmarks alone are just inconvenient.
    And organizing bookmarks in folders or tagging them according to some elaborate system is just tedious and of little value.

    Thus, indexing the web pages given by browser bookmarks would be a great way to explore one’s personal knowledge space.
    Maybe implementing this feature is too small for a SoC project, but it could be much more valuable than indexing local disk content.
