About Strigi, Soprano, Virtuoso, CLucene, and Libstreamanalyzer

There seems to be a lot of confusion about the parts that make up the Nepomuk infrastructure. Let me shed some light.

Soprano is the RDF data storage and parsing library used in Nepomuk. Soprano provides a plugin for Virtuoso, which is mandatory and requires libiodbc. It does NOT work with unixODBC: it compiles, but it simply does not work because RDF data handling requires some extensions found in libiodbc. In addition to the Virtuoso plugin, Nepomuk requires the Raptor parser plugin and the Redland storage plugin for ontology import.

CLucene is no longer required in Nepomuk. It was used for full-text indexing in early versions of KDE but has been superseded by the full-text indexing functionality of Virtuoso. Consequently, the Soprano clucene module is no longer required and its development has effectively stopped. It will most likely not be part of Soprano 3 (unless someone interested steps up and does the required work).

Virtuoso is a full-blown SQL server with a powerful RDF layer on top. OpenLink, the company developing Virtuoso, maintains an open channel of communication with the Nepomuk developers and introduced a “lite” mode for us (please no comments on how it still is not “lite”). Virtuoso 6.1.3 is the current version. It has a Unicode bug which can be fixed by applying the patch attached to KDE bug 271664. Virtuoso 6.1.4 will be released soon and contains several fixes for bugs I reported. An update is highly recommended.

Libstreamanalyzer and libstreams are libraries which are part of the Strigi project. In addition, the Strigi project contains strigidaemon, an alternative scheduler for indexing files which is based on CLucene and not used by Nepomuk. I once asked the maintainer of Strigi to split libstreams and libstreamanalyzer into their own independently released packages. He refused, which is understandable seeing as he has little time for Strigi as it is. As a consequence, I advise packagers to use libstreamanalyzer from either git master or the latest tag instead of released tarballs.
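For packagers who prefer a checklist, the description above can be condensed into a small lookup table. This is only a rough sketch: the table layout and the helper name `needed_by_nepomuk` are made up for illustration, and marking libstreams and libstreamanalyzer as needed is what the text above implies rather than states outright.

```python
# Illustrative summary of the components described in this post.
# The table layout and the helper name are assumptions made for this sketch.
# Each entry maps a component to (needed by Nepomuk?, short note).
COMPONENTS = {
    "Virtuoso": (True, "stores the data; 6.1.3 needs the patch from KDE bug 271664"),
    "libiodbc": (True, "required by the Soprano Virtuoso plugin; unixODBC does not work"),
    "Soprano Virtuoso plugin": (True, "mandatory"),
    "Soprano Raptor parser plugin": (True, "ontology import"),
    "Soprano Redland storage plugin": (True, "ontology import"),
    # The post implies, but does not state outright, that these two are needed.
    "libstreams": (True, "use git master or the latest tag"),
    "libstreamanalyzer": (True, "use git master or the latest tag"),
    "CLucene": (False, "superseded by Virtuoso full-text indexing"),
    "Soprano clucene module": (False, "development effectively stopped"),
    "strigidaemon": (False, "based on CLucene, not used by Nepomuk"),
}


def needed_by_nepomuk(components=COMPONENTS):
    """Return the sorted names of all components flagged as needed."""
    return sorted(name for name, (needed, _note) in components.items() if needed)


if __name__ == "__main__":
    for name in needed_by_nepomuk():
        print(f"{name}: {COMPONENTS[name][1]}")
```

Running the script prints one line per needed component with its note; everything flagged `False` can be left out of a Nepomuk-only install.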

I think that is all. If I missed something please comment and I will update the post.

20 thoughts on “About Strigi, Soprano, Virtuoso, CLucene, and Libstreamanalyzer”

  1. Just to verify my assumption: when using libstreamanalyzer and libstreams from projects.kde.org, the current strigi from strigi.sf.net isn’t needed anymore by nepomuk? Is that correct?

  2. Thanks, Sebastian. We all sorely needed this post. However, one thing is missing in this grand map: Soprano needs to be compiled against Raptor 2.0.4 and the latest Redland release; if you compile Soprano against Raptor 0.9.x, you will face gargantuan memory leaks in the nepomukstorage service.

    Now I’ll file bugs in my favourite distros to ask them to package libstreamanalyzer-git from today, instead of doing it by myself. I now know that this is the supported procedure.

    Thank you so much.

  3. As a packager, using libstreamanalyzer from git master isn’t advice I’ll buy. Tracking git head, uploading a random snapshot, no thanks. Using the latest tag is much better advice to me.
    It would be nice though to communicate the latest tag. For example, 0.7.6 hasn’t been announced (not even on kde-packager) and a tarball isn’t available. You have to track the git repository to notice a new tag. It is worth mentioning that 0.7.6 has PDF parsing broken and an extra patch is needed. As for independently released libraries, it should be easier now: the code moved to git and each library is split out individually. It seems to me a matter of fixing the release process.

      • Would it be impossible to tag/release tarballs from kde-copies?

        What’s the worst that would happen? Imho, it’s better to visually go against a project, than to just complain about them.

        Also, I’ve donated – good luck.

          • Any news on this?
            I mean, in the end it’s just a fork living on projects.kde.org where releases could be done completely independent from Jos.

            Also: as libstreamanalyzer/libstreams are AFAIK simply libraries, why not just start with 0.1 releases of them completely independent from the Strigi releases? As far as I understood, Strigi from sf.net just adds the Qt4 GUI on top of libstreamanalyzer/libstreams and that’s all.

            • Right after posting this, I realised one possible issue when starting versioning from 0.1: the so-names are already at 0.7.6 which would possibly cause way too much breakage and packager headaches.

              So releases on projects.kde.org would have to continue with 0.7.7.

  4. Thanks. It has been a very interesting read. But, not being a developer, I still have some holes in the overall picture.

    I assume Virtuoso, being the database, is the bottom layer, and when you say that “has a powerful RDF layer on top”, it means that you communicate with Virtuoso with RDF instead of SQL, and you use the abstraction provided by soprano to do it. Am I right?

    Raptor and Redland roles are less clear to me. I thought Redland was abandoned some time ago due to being too slow…

    Perhaps a link pointing to an example and/or a graphic would help a lot, but thanks anyway.

    • Virtuoso stores the data. It provides extensions to SQL through which Soprano can perform SPARQL queries via ODBC. Soprano then provides a Qt’ish API which is used by Nepomuk. Redland is abstracted by Soprano in the same way Virtuoso is. But it is only used for temporary memory RDF storage which is required when importing ontologies into Nepomuk.

  5. Hi Sebastian

    If Soprano is an RDF data storage (storing on Virtuoso through libiodbc) and Redland is a storage plugin, why do we need both? Can’t we use a single storage thing?

    Also, considering Akonadi also needs a SQL storage, can’t we somehow share the SQL database and access framework?

    • Redland is only required for temporary in-mem storage during ontology import. With some effort it could probably be removed as a dependency but to be honest I do not see the need. Redland is very small and pure C.
      As for Akonadi: we discussed the sharing of the database a long while ago. Sadly it apparently requires a lot of work on the Akonadi side which made it impossible back then. I am not sure about the situation today though.

  6. Sebastian, what about s-d-o? Which version is recommended for 4.6 and 4.7? Should we use 0.8 or 0.9, or are you also suggesting tracking git master?

    • With SDO it is simple: I am the maintainer. Thus, as with Soprano, always use the latest stable release with stable KDE/Nepomuk releases and track git master if you are also tracking Nepomuk git master.
