Semantic Saving – Next Next Try


After my last post on Semantic Saving I got a lot of responses, most of them very positive and productive – thanks a lot for that. Today I tried to incorporate most of what you suggested into a new mockup. The new mockup comes in two versions. The default small one only shows a file name, a document type (I still think this is very important), and the folder selection (I fear we still need that, although I hope that need will go away in the future):

A “More” button expands the dialog into the bigger version. I did not change that much in there: the title has been replaced with a name field, since the title of a file is essentially its name and there is no need to confuse users with that distinction. The comment has been moved below the annotations. I agree that it is less important, especially when saving files that contain text.

Maybe we can continue the discussion on this revised mockup before I start implementing something.

49 thoughts on “Semantic Saving – Next Next Try”

  1. In your mockup the more-section is loaded in between the default elements while the button is under the default elements. From a user's perspective I would say: apparently it is less important (hence it is under the More button), so it should go under the ‘important’ elements.

    However: having a small default dialog is a major step forward :D

  2. While I love the fact that semantics are finally kicking off, the location of the file on disk is still important, especially when synchronizing with other, non-KDE desktops and operating systems. I for one want to save my files in a certain directory (at best depending on the main category or a combination of tags), no matter how good or complete semantic integration is. Say I have a letter to my bank with tags: bankname, year, topic. I want it to be saved in the folder where all my bank-stuff is. Or a letter to the tax authorities, tags: taxes, year, topic. I would want it to be automatically saved in the folder ~/Documents/Taxes//.

    With the right filters this should be doable, just saying: if these tags are present, set the directory to that, else use the default. If something is ambiguous, ask for user input.

    • Sounds very reasonable indeed. The only complication is how to define these filters. We need some GUI for that and a definition format.
      Or as an alternative the system could propose to re-use the same folder structure used thus far – although that is much harder than doing it manually.

      • I would put it in a KCM. Then you can add simple AND/OR combinations of tags. For example, the path ~/Documents/Finances should be used when the tags “bank, money” (logical AND) or “taxes” are present. I would say this is something for users who know what they are doing, but some things could be predefined. E.g. with downloads: put them in ~/Download and tag them accordingly automatically; put odt, pdf etc. in Documents; put mp3, ogg etc. in Music, and so on. That would of course mean that we would have to distinguish by mime type as well, but the same could be done with the predefined document types.
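The AND/OR rule idea from this comment could be sketched roughly as follows. This is only an illustration in Python, not an existing KCM or Nepomuk API: the names `FolderRule`, `RULES`, `MIME_DEFAULTS` and `pick_folder` are made up for the example, and the rule and mime-type tables just restate the examples given in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderRule:
    # Each inner set is an AND group; the rule matches if ANY group
    # is completely contained in the file's tags (logical OR).
    any_of: tuple
    folder: str


# Hypothetical rule table: ("bank" AND "money") OR "taxes" -> Finances.
RULES = [
    FolderRule(
        any_of=(frozenset({"bank", "money"}), frozenset({"taxes"})),
        folder="~/Documents/Finances",
    ),
]

# Hypothetical predefined fallbacks by mime type.
MIME_DEFAULTS = {
    "application/pdf": "~/Documents",
    "application/vnd.oasis.opendocument.text": "~/Documents",
    "audio/mpeg": "~/Music",
    "audio/ogg": "~/Music",
}


def pick_folder(tags, mime_type, fallback="~/Documents"):
    """Return the folder of the first matching tag rule, else a mime default."""
    tagset = {tag.lower() for tag in tags}
    for rule in RULES:
        if any(group <= tagset for group in rule.any_of):
            return rule.folder
    return MIME_DEFAULTS.get(mime_type, fallback)


print(pick_folder(["bank", "money", "2011"], "application/pdf"))
print(pick_folder(["holiday"], "audio/ogg"))
```

Tag rules are checked before the mime-type defaults, so an explicit rule always wins over the generic "music goes to Music" behaviour. Whether that is the right priority is a design choice the thread leaves open.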

      • Regarding this I would propose the following mechanism:
        I have document with some tags e.g. school,math,2009
        1. Is there another file with these tags?
        Yes: Put it in the same folder | No: Go ahead
        2. Is there a folder path containing the tags (e.g. documents/school/subjects/math/2009-10)
        Yes: Put it in there | No: Go ahead
        3. Is there a pattern in file-location in files with similar tags (e.g. documents/school/subjects//) you can apply to the given tags
        Yes: suggest the location to the user | No: Ask the user to set the Location manually (and/or use a standard-folder)
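The three steps above can be sketched as a small function. This is a simplified illustration under stated assumptions: `index` (a mapping from file path to tags) and `folders` (a list of known folder paths) are hypothetical stand-ins for whatever the metadata store would provide, and step 3 is reduced to "the most common parent folder among files sharing at least one tag" rather than real pattern detection.

```python
from collections import Counter
from pathlib import PurePosixPath


def suggest_location(tags, index, folders):
    """Return (folder, needs_confirmation); folder is None if nothing fits."""
    wanted = {tag.lower() for tag in tags}
    if not wanted:
        return None, True

    # Step 1: is there another file with exactly these tags?
    for path, file_tags in index.items():
        if {tag.lower() for tag in file_tags} == wanted:
            return str(PurePosixPath(path).parent), False

    # Step 2: is there a folder path whose components contain all tags?
    for folder in folders:
        parts = [part.lower() for part in PurePosixPath(folder).parts]
        if all(any(tag in part for part in parts) for tag in wanted):
            return folder, False

    # Step 3 (simplified): the most common folder among files with
    # overlapping tags is only a suggestion the user has to confirm.
    votes = Counter(
        str(PurePosixPath(path).parent)
        for path, file_tags in index.items()
        if wanted & {tag.lower() for tag in file_tags}
    )
    if votes:
        return votes.most_common(1)[0][0], True
    return None, True


index = {
    "documents/school/math/a.odt": {"school", "math", "2009"},
    "documents/school/math/b.odt": {"school", "math"},
    "documents/school/physics/c.odt": {"school", "physics"},
}
folders = ["documents/school/math", "documents/school/physics"]
print(suggest_location(["school", "math"], index, folders))
print(suggest_location(["school", "chemistry"], index, folders))
```

The boolean in the result separates the "just put it there" cases (steps 1 and 2) from the "suggest and ask" case (step 3), matching the Yes/No branches in the list.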

        Before this is effectively usable, all files need to have sensible tags. This could easily be achieved by taking the path from the nearest place onwards, like the Dolphin breadcrumbs do (e.g. Documents/school instead of /home/whoever/Documents/school), and using every folder name as a tag (Documents, school). This should easily create the tag index needed to make the above work.
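Deriving tags from the folder path, as described above, might look like this. A minimal sketch: the function name `tags_from_path` and the `place_root` parameter are assumptions for the example, and the "place" is simply passed in rather than looked up from the KDE places list.

```python
from pathlib import PurePosixPath


def tags_from_path(file_path, place_root):
    """Use every folder name from the place onwards as a tag."""
    folder = PurePosixPath(file_path).parent
    try:
        # Keep the place's own name (e.g. "Documents") as the first tag.
        relative = folder.relative_to(PurePosixPath(place_root).parent)
    except ValueError:
        # The file is not below the place: fall back to the full path.
        relative = folder
    return [part for part in relative.parts if part != "/"]


print(tags_from_path(
    "/home/whoever/Documents/school/essay.odt",
    "/home/whoever/Documents",
))
```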

  3. 1) I prefer the bigger version; maybe just make the more important fields visually prominent, so that the user knows what to care about.

    I think instead of “folder” it could use “place”, with options like:
    Documents (and other user defined locations)
    Pendrive (and all remote devices)
    Digikam Album -> when selected, a list of Digikam albums appears
    Remote location -> list of known remote locations
    Custom -> a control to select a file appears
    I don’t know if it is too much; maybe some could be removed or moved to a suboption.

    What do you think about adding optional simple file versioning like in digikam? It could be handled by Nepomuk.

    • Actually I do not know the simple file versioning in Digikam. How does that work?
      As for the places: it sounds like a good idea as it gives less priority to the folders again while defining places on a higher level. I suppose for starters we could just go with the places that are already defined in KDE – the ones that Dolphin for example shows in the places panel.

      And how would you propose to visually expose the important fields?

  4. I like this, usability is better.
    One question:
    I have a (real) group of 10 ordinary users. They produce thousands of documents per year, mainly odt/ods. As they are lazy, I doubt they will fill in all the fields properly or regularly. What will happen if they don’t use folders anymore? Will they be able to open the documents they want?

    • Well, there is always the automatically created metadata like creation date, (soon usage dates), types, file names, file content. All that can be used to find the files again. IMHO mime type, creation and modification dates are often enough to find stuff.
      It is imperative that this system also works if users do not annotate their files like professional knowledge workers.

      • Well, sorry for arguing ;-) but I’m sure that mime type, creation and modification dates will not work:
        Some users (and they are real users, using KDE at work for years) don’t understand document types. And that is normal: now that Okular launches automatically when you receive a pdf attached to an email, why should one know that the document type is pdf? And if you remove folders, don’t you think that it makes sense to remove the distinction between types of documents (simpler user experience)?
        Regarding dates: users, especially groups of users, can’t remember exactly when someone has done something. Naming a document (or telling someone that the document contains some words) is the preferred way to retrieve something.

        But I don’t think it is a big problem: I can see that my users never use the Open dialog, they always browse directories. So they will likely use searches to retrieve something. The only important thing is how to search and how to present results. But file name, file content and doc type are easy to search with.

        How can document types be shared between users without boring them with an enormous list (each user works on his own type of docs)? Say one user is working on a press release and has created the “press release” doc type. Another one is in charge of the project and is asked to complete the press release document, and a third one is the supervisor who wants to check the work. The danger is that they could share the doc by email instead of retrieving it from the hard drive (they already do that sometimes because they have too many folders/sub folders).

        To complete my point: I’m not sure that document type is the right term. While it’s correct regarding semantic data, it might be strange for users. Maybe “document concern” or “doc. objective” or …? I’m trying to find something more related to actions.

        I don’t know if these are pertinent questions, but I’m very concerned with usability as I can see that users never make progress using computers.

        • About file sharing via email: the idea is to share the metadata with the file. We are currently working on that actually and the only open question (it is a complicated one though) is security.
          As for document type: you are right, the term “type” is very technical and should be changed to something from real life. But that…?

          • About file sharing by email: I didn’t like it. I was just thinking of users using it instead of retrieving the document on the hard drive if they feel it’s too difficult to find it.
            Instead of document type: “classifying, or classify, document”? (in french: classement, ou classer, le document) However classifying could be used for the whole dialog instead of save as.

        • About sharing by email,

          A combination of TinyURL and ownCloud could be used.

          It means the user sends a link that points to the file. Authentication could be added to resolve security issues, but more importantly, JavaScript on the file’s page could determine if the user is a KDE user, and if so, open the file directly using KIO://, otherwise offer to download it as a stand-alone file.

          If users are sharing files on the network, browsing large archives is a pain in the … So sending a “file” by email could do the job, by just linking the right file in the right place.

      • Oh! Did I see this or dream this or? But, dates! Dates are brilliant! Would it be doable to make a slider where one slides an arrow along a line of dates, and below it one can see (previews of?) documents created on that date? (Is my “This must have been done by Apple” feeling correct here?)

  5. I also think it is an excellent idea to have a document type attribute, the problem is the number of document types. Having a scrollable list just won’t be user-friendly I think.

    Here are a few document types from just one domain (procurement); think of the size of that list when other domains are added:
    – requirement/requisition
    – request for quotation
    – contract
    – purchase order
    – pick list
    – goods issue
    – certificate
    – customs declaration form
    – goods receipt
    – invoice
    – credit note
    – nota fiscal

    So, either the list would have to be a hierarchy, or some other solution should be found.

    Another issue is that for some – or for some time – it may not be the document type that is the most important attribute, but the customer with which it is associated. In terms of storage, the document type should not be different from any of the other attributes I suppose, it’s just another tag. And to get a better search the tag values should be associated with attributes, so that there is a difference between searching for “Customer: Berlin” and “Location: Berlin” for those who use the search options optimally.
    Would it be possible to implement this (tags associated with attributes) in Nepomuk and let the user select which attribute(s) to display in addition to the name in the default dialog? I know it is too much to ask, so I’ll just ask if it is possible :-)
    From a user interface perspective I would in the latter case move the folder to the right of the name, and have the attribute(s) listed below.

    Having worked with a DMS (SAP DMS) it has dawned upon me that document metadata are a complex matter and hard to get right the first time…. Heck, it is hard to get right even the third time around. The most important aspect of a solution seems to be that it is flexible and can handle the ever-changing requirements – and has tools which allow you to mass-update existing documents so that you will be able to find them according to your new attribute definitions.
    Basically that’s what you are doing here, you are creating a document management system where metadata are in focus and can be used to find your document again. It’s the best way when the number of documents gets large, and if I recall correctly someone once told me that BeOS had this built in. You never selected where to store a file, just the associated tags. When you searched for it you could find it by any of those tags.
    Digikam has parts of this already implemented – with people tags, location tags, event tags etc. If this could be available throughout KDE and of course integrated with centralized storage it would be one of the real killer features for organizations such as the local government (commune), where they currently spend loads of money on document storage systems.

    Then again, for those users good tools for managing access to documents would of course also have to be able to integrate with the solution.

    OK; back to work – I think I got a bit carried away.

    Summary: I applaud your effort, and a request from me would be to avoid locking the solution to having document type as a special attribute but rather treat it as one of many. That way the solution is open to future improvements.

    • Let me first comment on the attributes: Nepomuk is based on RDF. As such all properties are typed and have typed ranges. That means that for example the document type is not only a keyword but it is a fully qualified entity known to the system and internally just another property. The only reason I expose it so prominently is that I feel it is easier to understand than the rest while being very powerful. The same is true with other annotations like people, projects, and so on. Thus, stating that some person is the customer a certain invoice relates to is exactly what Nepomuk is intended for. The generic term “relation” is just a basis for any kind of specialization you can think of. Thus, you will indeed be able to list all documents which have customer “Foobar” but not author “Foobar” although they are “related” to both. All in all the short answer is: yes. :)
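The difference between a plain keyword and a typed property can be illustrated with a tiny in-memory triple store. This is only a sketch of the idea, not Nepomuk's actual API or ontology: the property names `customer`, `author` and `relatedTo`, the file names, and the helper `subjects_with` are all invented for the example.

```python
# (subject, property, object) statements; all names are hypothetical.
TRIPLES = {
    ("invoice-42.pdf", "type", "Invoice"),
    ("invoice-42.pdf", "customer", "Foobar"),
    ("report.odt", "type", "Report"),
    ("report.odt", "author", "Foobar"),
}

# "customer" and "author" are specializations of the generic "relatedTo".
SUBPROPERTY_OF = {"customer": "relatedTo", "author": "relatedTo"}


def subjects_with(prop, value, store=TRIPLES):
    """Subjects whose `prop` (or any specialization of it) equals `value`."""

    def is_a(candidate):
        # Walk up the sub-property chain until we hit `prop` or run out.
        while candidate is not None:
            if candidate == prop:
                return True
            candidate = SUBPROPERTY_OF.get(candidate)
        return False

    return sorted(s for s, p, o in store if o == value and is_a(p))


print(subjects_with("customer", "Foobar"))   # documents with customer Foobar
print(subjects_with("relatedTo", "Foobar"))  # anything related to Foobar
```

Querying the specialized property returns only the invoice, while querying the generic relation returns both documents, which is the distinction described above.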

      As for the types: you are right – a plain list is no solution, at least not in the long run. The first implementation will probably use one. But since I am no fan of long lists anyway I will quickly try to replace that. Here the first things to try are 1. only show most frequently used N types and 2. try to filter the types by context. The latter can include the activity, the mime type (an image is not likely to be an invoice for example), and maybe the saving application.
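The two ideas for shortening the type list (most frequently used N types, filtered by context) could be combined roughly as below. This is a sketch under assumptions: `PLAUSIBLE_TYPES` is a made-up table of which document types make sense for which mime types, and the usage history is just a list of previously chosen types.

```python
from collections import Counter

# Hypothetical context filter: which document types are plausible
# for a given mime type prefix.
PLAUSIBLE_TYPES = {
    "image/": {"Photo", "Scan", "Diagram"},
    "application/pdf": {"Invoice", "Contract", "Letter", "Scan"},
}


def propose_types(usage_history, mime_type, n=5):
    """Return up to n document types, most frequently used first."""
    allowed = None
    for prefix, types in PLAUSIBLE_TYPES.items():
        if mime_type.startswith(prefix):
            allowed = types
            break
    counts = Counter(usage_history)
    ranked = [
        doc_type
        for doc_type, _ in counts.most_common()
        if allowed is None or doc_type in allowed
    ]
    return ranked[:n]


history = ["Invoice", "Letter", "Invoice", "Photo", "Photo", "Photo", "Contract"]
print(propose_types(history, "image/png", n=2))
print(propose_types(history, "application/pdf", n=2))
```

Unknown mime types fall back to the unfiltered frequency ranking, so the list never ends up empty just because the context table has no entry.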

  6. The whole idea of semantic saving is great.
    Anyway, why is there a “.pdf” in the name field? That doesn’t make any sense.
    The usual mime type could be a tag

    • I thought about that – stripping away the extension before saving. But I think we cannot do this just yet. Users will look for the extension or even add it. One possibility would be to only make the name editable but not the extension.

      • I would not like that. I know some people use the extension to choose the type of file. Like when editing an image in GIMP: you plan to save it, and instead of scrolling to a type other than XCF, you just type filename.jpg and it gets saved as JPG instead of XCF.

        But I can see the problem *.pdf causes: when the user clicks the file name, all of it is selected, and typing then deletes the extension as well.

        • Maybe make it use case dependent: when downloading a file from the web you will not change the extension. But in an app like GIMP you actually do (I do the same thing). So maybe the extension would only be read-only if the app already provides a fixed mimetype or a prefilled file name.

          • sometimes you do, when the webpage is php, firefox often wants to save it as a .php, while what firefox actually sees is a .html

  7. Non-KDE software will have to be encouraged to support the KDE file chooser. OOo and Firefox currently support the KDE file chooser, but not very well. Other software such as the Gimp does not at all.

  8. I like where this is going, but as you’ve already noted the filename part is clunky legacy.

    An alternative might be to have two different dialogs, one for (semantic) Save and one for Export. Export would follow the traditional Save As… style (although you may want to add something like an “Associate this file with the document” checkbox to distinguish between export-as-copy and traditional save).

    For the (semantic) Save dialog leave out the filename, instead use a genuine name (as in dc:title) along with the other fields you describe. As well as the name/title there’d be the date, media type and whatever categories/tags you drop in available for disambiguation.

    • In the first mockup I used a title indeed. But then in one comment it was pointed out that a file already had its file name which is a sort of title. So why make the distinction? I think it only confuses the user and if we do not care about the file name anymore anyway we can just sync it to the title when saving.

  9. I like the slim default version. Easy enough: just type the name and choose location and type.
    And I agree that in the extended view the “Folder” box should stay where it is in the slim view – under the name field. And then bring comment and annotations below it.

    Do I have the right idea of what we can expect to see in the future: that with “annotations” and “document type” we could actually forget the directory where the file is saved but still find it later easily with the file manager?

    In some way I hope we could make “smart folders” that we create in Dolphin or in the save dialog. We could add meta info to that specific folder, and then files that are annotated and linked with a “document type” would get saved there automatically?

    Example: I get a new project at work that I know will take a few weeks/months. So I create a new folder in Dolphin in my ~/Projects/ and name it “Project X”. Then, from the Dolphin side panel, I assign some metadata to that folder.

    Then when I start creating documents that belong to it, I get lots of different types: images, spreadsheets, text documents, music and video files. And every time I create a new file, I can give a file name and document type and annotate just the project, so I don’t need to care about the folder as the system knows where it should be saved?

    Could I then later add new folders with the wanted metadata in that project folder, and when I annotate again, the files get saved there? Or if I add people as metadata to those folders and I get emails with attachments (or the emails themselves), they get saved to the correct folders?

    I cannot put my finger on it, but I have the feeling that your vision (or the general idea of the semantic system that has been around in IT for decades) is that when using production applications, the user does not need to do file management. Instead the file management is done with the file manager application itself.
    Like with Dolphin: the user can first prepare the working environment, add folders, create links, add metadata etc., and then later manage all the files that have been created with production applications.

    And then with just the production applications (Words, Stage, Krita etc.) I could annotate different infos (people, email addresses etc.), and the files get saved to the correct places?
    And because files get more metadata, we don’t need to rely on directories and filenames anymore and work with them, but can get it mostly automated?

    As far as I can see from your mockup, when I annotate “Software project nepomuk”, a folder named “nepomuk” will be created in the ~/Projects folder, and all files that I have annotated with project nepomuk will be stored there.

    Or am I totally mistaken?

    • Well, actually I did not think about creating specific folders at all since I envision file browsing to always be virtual, ie. never use an actual folder on disk. Thus, I would actually skip the overhead (as in implementation) of sorting files into folders on disk based on their metadata and simply never show physical folders to the user.

  10. I have to say something in favour of folders. And how they relate to tags.

    One of the big problems with folders is that they form a single hierarchy. But I think it is important to understand that hierarchy as a set of TAGS. Rather than trying to convince the user he doesn’t need to know where his files are, use the files as a basis for tagging.

    For example, in the folder administration/accounts I should have the tags ledger, bill, important, and perhaps a list of tasks.

    In the code folder, I should have the tags project a, project b etc.

    Use tags to enrich the filesystem hierarchy, not to obscure it. Because we will always need backup and remote access.

    • Both backup and remote access can perfectly be done on top of a virtual folder system. IMHO it is no argument for physical folders. (This is of course only true if all applications and systems understand the new way of organizing files – and that is the vision after all.)

      • I think my points were not clear.
        – Tagging ought to be hierarchical (field – project – part) as well as contextual (when – who – why)
        – Folders (for those of us who organise-ish our stuff) already provide the hierarchical part. Physical folders. That is what they are for. If you cannot get people to construct hierarchies of folders, how do you expect them to tag files properly?
        – A big problem with tagging is that it is time-consuming. When using the folders as a base for tagging, you gain quite a bit (notably nepomuk doesn’t use the folder path when searching, as it should) [1].
        – It will take many many years before all applications understand the virtual filesystem. Anything which doesn’t play along with the physical filesystem is broken. For example, you are a coder. If tomorrow, kate (vim is not really concerned with semantic filesaving, I guess) stopped showing the physical location of your files, how would that work for you? And this is an important use-case: clearly, the proportion of geeks and coders on linux is high.

        Bottom line: folder hierarchies are already tags in a sense. Make users “tag their files to a directory” and let them add context in the form of additional tags. Old-schoolers are extra happy, newcomers wowed.

        [1] an example of this is music organised in Artist/album hierarchies. Of course alternative hierarchies are possible or even desirable at times. But the point is that the folders already reflect the tags — and using tags to create hierarchies is useful. Amarok does this.

        • I fully agree. It does make sense to combine folders and tags, i.e. assigning a tag could automatically fill in the folder field by using the path assigned to that tag. But dreaming of no folders is going to fail simply because the real world works differently.

          E.g. if I save a picture it gets the picture tag and thus is saved somewhere in /home/user/Pictures the rest of the path is either defined by hand or by another tag which could e.g. be 2011 or Family.

          Getting rid of folders will fail IMHO, combining them and using them as kind of tags would improve things.

          • I already stated some of those ideas in my comment at the top. I think that, even though the goal is to remove the need for physical folders completely, it would be very smart to still use them as a fallback by using them for the hierarchical part. As stated above, I would like to see some tags automatically translated to corresponding directories. Mime types could also be handled accordingly. That way files are (kind of) organized when viewing the disk from an unsupported operating system/desktop environment, and tags can also be predefined based on the type of document saved, i.e. music, document, video. Maybe even generate tags automatically by parsing the filename and suggesting stuff. For example: when I transcode videos from my Mythbox I put certain infos in the filename; maybe parse that info (the user can provide the regex) and set tags accordingly, in my case: languages available, codecs, genre.
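Generating tags from a user-provided regex, as suggested in this comment, could look like the following. A sketch only: the naming scheme `Title.lang+lang.codec.genre.mkv`, the example file name and the pattern are invented for illustration, since the comment does not say how the file names are actually built.

```python
import re

# Hypothetical user-supplied pattern with one named group per tag field.
PATTERN = (
    r"(?P<title>.+?)"
    r"\.(?P<languages>[a-z]{2}(?:\+[a-z]{2})*)"
    r"\.(?P<codec>[A-Za-z0-9]+)"
    r"\.(?P<genre>[a-z]+)"
    r"\.mkv$"
)


def tags_from_filename(filename, pattern=PATTERN):
    """Return {field: value} for every named group that matched."""
    match = re.search(pattern, filename)
    if match is None:
        return {}  # nothing to suggest; the user tags manually
    return {key: value for key, value in match.groupdict().items() if value}


print(tags_from_filename("Some_Show.en+de.h264.drama.mkv"))
```

Returning an empty result for non-matching names keeps this a suggestion mechanism: files that do not follow the scheme are simply left for manual tagging.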

          • This is where IMHO you are wrong: your brain does NOT work like physical folders. And that is exactly the point. The “real world” you are talking about is the very restricted way of using classifiers and drawers which is imposed onto us by the restrictions of the 3D world. Our brain, however, works completely differently, it does relate things by context and properties. Of course the brain works much more fuzzy than what we could ever achieve with Nepomuk but the direction is somewhat similar.

  11. Maybe I miss something but I do not see a speed-bar, i.e. some widget to easily jump/pick a folder for e.g. images/documents/external device etc.

    Other thoughts:

    IMHO the destination/folder is most important and should be picked first which has a few advantages.

    a) If the destination is already correct it does not do any harm
    b) If the user wants to save it to a location other than the currently selected one, selecting this first gives nepomuk further info on the potential context of the file. This means that e.g. the annotations and document types of files in the selected destination can be considered and offered with priority. Otherwise the document drop-down and annotations space can become cluttered and very scroll-intensive because of the many items in it.

    From my point of view saving a file most of the times does not start with the file name, simply because most files we save do already have a file name, i.e. are downloaded from some other location. So either file type or destination are the first things to select which in turn provide the context for the rest of the dialogue. One can even combine them.

    So if I save a picture I would like to easily pick “my pictures folder” or the device I want to save to, e.g. some usb stick, i.e. nepomuk would have to have a look at the mime and offer me:

    [General pictures folder] [Last folder a picture was saved to] [Devices/Places (when clicked on showing all mounted media/home/root etc.)] as buttons to simply click on and move on from there.

    Most of the times the user is now already done, i.e. he saves an existing file to a location.

    If I save a document it’s the same.

    The next thing might be additional info, i.e. doc-type (which could be filled-in automatically based on the mime type and/or the already existing files in the picked destination).

    And last if at all the file name which in most cases won’t be changed – unless one created a new document.

    • Actually since this is a file save dialog selecting a name is important since it will be used whenever you save an office document. Saving files from the web is also a use case, but I actually think that the name is as important then. If I download an image or some pdf or whatever the file name is often too generic for my taste. Thus, I need to change it (BTW something I hate about Firefox – it does not allow me to change the name).

      Plus, the whole purpose of this project is to get rid of physical folders. Thus, making them most prominent is exactly what I do not want.

      • IMO this is not realistic simply because there are too many actions and apps that will always rely on folders or categories, i.e. removing whole folders, copying whole folders, non-KDE apps, backups etc.

        And of course external devices will always be there and need a quick way to access, bluetooth devices etc. One folder to store it all is messy and people always categorise into folders simply because nobody mixes everything in one box in the real world either (except messies). You keep pictures separate from music and documents.

        So as hard as it might sound, my guess is that anything that will not take into account destinations/folders will fail. Even on mobile devices files are separated according to their type. But who knows, maybe I’m wrong and soon nobody needs folders anymore.

        Just make sure you do not force this on the user because as good as it might be in theory that will turn as many users against it that it will hardly ever recover.

  12. Places are a good idea but they need some kind of filtering, e.g. Filesystem Root is usually in places but is not a good place for storing files.

    I need to find an explanation of how it works in digikam. It’s called “non destructive editing”. Brief explanation: you have two options, save and save as new version. Versions are separate files (with different name suffixes); the changes made in every version are written down (I need to check how that is stored).

    BTW, maybe separate saving into three options (save, save as new version, export)

    Export allows to choose place and file format (wizard?)
    Save just save file in default format in default location.
    Save as new version as above but in new file (digikam allows to change file format in saving in new version)

    I think file dialog in that case could only have filename and annotations. Comment could be just another type of annotation. (list entry that could be drop down and edited)

    Default file storage could be set per activity.

    • Sorry, this was an answer to the post:

      “Actually I do not know the simple file versioning in Digikam. How does that work?
      As for the places: it sounds like a good idea as it gives less priority to the folders again while defining places on a higher level. I suppose for starters we could just go with the places that are already defined in KDE – the ones that Dolphin for example shows in the places panel.

      And how would you propose to visually expose the important fields?”

  13. One small thing I noticed in the mockup is that the file extension is included in the name entry, which I don’t think is good. Naming a file usually doesn’t include the file extension. E.g. the name of the file is “filename”, not “filename.pdf”. The file extension should be chosen in a drop-down menu or similar and just automatically added when saving.

    Also, just an idea that popped up about getting rid of physical folders. Wouldn’t it be nicer if Nepomuk sorted files into folders automatically based on tags and categories? The old schoolers probably still want control over it, but new users don’t care where files are saved, and with Nepomuk sorting them into folders they should still be able to find the files in the file tree. Just a thought.

  14. A contradictory message: the big version puts the ‘Add Information’ field above the ‘Folder’ field, which suggests that it’s more important, but the ‘slim default’ hides the ‘Add Information’ field and keeps the ‘Folder’ field, suggesting the opposite.

    I think that you have to choose what is more important and be coherent.

  15. There is a problem with a tagging-only system: there are no categories. The problem can be seen with an example.

    This would be the path with folders:
    -Projects
      -Project 1
        -Data
          -Data foo
            -30 files of which I want a few

    With tags:
    -Search Project 1 tag
    -Be presented with 1000 files, try to remember (or look at all 1000 files) what kinds of files there were (did I have some documents? or do I only have pictures about project 1? was there any “Data foo”?)
    -After remembering you have to type/choose from a list what kind of data you need.
    -After inputting “data” or “document” you get the files you want.

    The main advantage of the first is that at each step, you get an overview of what’s in the folder, with tags you are first presented with A LOT of choices, and have to get smaller until you get to your files.
    I’m very sure I don’t remember what’s in every folder in my system, and I’m moderately tidy.
    The only solution would be making tags “top level” or “second level”, etc. And presenting them in the same gui of folders, or a gui that works just as well. And if that work is already being done, making them correspond to real folders is not much work.
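One way to get a folder-like overview at each step of a tag search is to show, next to the matching files, how many of them carry each remaining tag. The sketch below illustrates that idea only; the `refine` function and the sample files are made up for the example, and it is not something the mockup or Nepomuk is stated to do.

```python
from collections import Counter


def refine(files, selected):
    """Return (matching file names, counts of the tags still available).

    files: dict mapping file name -> set of tags
    selected: tags chosen so far (all must be present on a file)
    """
    selected = set(selected)
    matching = {name: tags for name, tags in files.items() if selected <= tags}
    overview = Counter(
        tag for tags in matching.values() for tag in tags - selected
    )
    return sorted(matching), overview


files = {
    "a.csv": {"project 1", "data", "data foo"},
    "b.csv": {"project 1", "data"},
    "c.png": {"project 1", "picture"},
    "d.odt": {"project 2", "document"},
}

names, overview = refine(files, {"project 1"})
print(names)
print(overview.most_common())
```

After choosing “project 1” the user sees which further tags exist below it and how many files each one covers, which is roughly what opening a folder and looking at its subfolders provides.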

  16. Pingback: Semantic Save – Mockups are Easy… | Trueg's Blog

  17. I do believe semantic file management should be an EXTENSION to physical file system, not replacement.

    Any folder could have meta info attached, and it could be used to display additional content. E.g. ~/Projects/ProjectA is tagged/labeled with CompanyB, PersonC and PlaceD. Once the user opens the physical folder, Nepomuk also tries to find all related files and collections, and merges the result with the existing content. The big question is how to group the files so that, if a project has 1000+ files, the right one can be found.

    But in this case virtual folders (one for each attached tag/label) like semantic://Projects/ProjectA/PersonC could be used to navigate within the project.

    The key of a folder-like structure: folders are a CLICK – CLICK – CLICK – done kind of thing.
    Using only semantic data requires typing. In many cases I (and probably many users) have no idea what the document was called – was it rabbit, was it bunny, was it hare? Browsing solves this kind of issue. Also browsing is usually faster than typing :)
