A New Blog and The Possible End to the Java Dependency in Nepomuk-KDE


I changed the blog system again. Why? It is rather simple: since kdedevelopers.org updated its blogging system, it has become virtually unusable. All the nice features are gone, and I am told one needs an account to comment. I needed something spiffy that works nicely, and WordPress was recommended to me.

Now back to the real content: a new Soprano backend that could finally let us drop the sesame2 backend and, with it, the need for a JVM.

Today I committed the new Virtuoso Soprano backend. Virtuoso is a powerful SQL/RDF DB server created by OpenLink Software. OpenLink provides an open-source version of their database server released under the GPL. The official description from their homepage reads:

At core, Virtuoso is a high-performance object-relational SQL database. As a database, it provides transactions, a smart SQL compiler, powerful stored-procedure language with optional Java and .Net server-side hosting, hot backup, SQL-99 support and more. It has all major data-access interfaces, such as ODBC, JDBC, ADO .Net and OLE/DB.
[…]
OpenLink Virtuoso supports SPARQL embedded into SQL for querying RDF data stored in Virtuoso’s database. SPARQL benefits from low-level support in the engine itself, such as SPARQL-aware type-casting rules and a dedicated IRI data type. This is the newest and fastest developing area in Virtuoso.

Virtuoso not only frees us from the shackles of the Java dependency, it also provides a bunch of features that Sesame2 does not. Most importantly, Virtuoso supports full-text indexing that can be used directly within SPARQL queries:

select ?foo where { ?foo rdfs:label ?label . ?label bif:contains "bar" . }
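When the search term comes from user input, it has to be embedded safely in the query string. A minimal sketch in Python, assuming the query shape shown above and that the server knows the rdfs: prefix (as the query above already relies on); the helper names are hypothetical. It only escapes the term as a SPARQL string literal; Virtuoso's own free-text syntax (phrases, wildcards, boolean operators) is passed through untouched.

```python
def escape_sparql_string(value: str) -> str:
    """Escape a string for use inside a double-quoted SPARQL literal."""
    replacements = {
        "\\": "\\\\",
        '"': '\\"',
        "\n": "\\n",
        "\r": "\\r",
        "\t": "\\t",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def fulltext_label_query(term: str) -> str:
    """Build the query above: resources whose rdfs:label matches `term`.

    Hypothetical helper. Multi-word phrases may need Virtuoso's own
    free-text quoting on top of this; that is not handled here.
    """
    return (
        "select ?foo where { "
        "?foo rdfs:label ?label . "
        f'?label bif:contains "{escape_sparql_string(term)}" . '
        "}"
    )
```

For the term bar this yields exactly the query shown above; a term containing a double quote can no longer terminate the literal early.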

At the moment (using the sesame2 backend) we need to use the CLucene-based full-text indexing layer which I implemented for Soprano. While this works very well, it forces us to split each query into a full-text part and a graph part and then merge the results, which tends to be slow.
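To illustrate why the split approach costs time, here is a rough sketch that uses plain Python sets as hypothetical stand-ins for the graph store and the separate full-text index (the ex: names and the data are made up). Both halves are evaluated completely and only then intersected, so the work grows with the size of each partial result even when the final answer is tiny. With bif:contains, both conditions presumably end up in a single query the engine can plan as a whole.

```python
from typing import Dict, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_fulltext_index(triples: Sequence[Triple], predicate: str) -> Dict[str, Set[str]]:
    """Stand-in for the separate full-text index: word -> subjects whose
    `predicate` literal contains that word. Built once, outside the graph store."""
    index: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            for word in o.lower().split():
                index.setdefault(word, set()).add(s)
    return index


def graph_part(triples: Sequence[Triple], predicate: str, obj: str) -> Set[str]:
    """Graph half of the split query: every subject with `predicate obj`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def split_and_merge(triples: Sequence[Triple], index: Dict[str, Set[str]],
                    rdf_type: str, word: str) -> Set[str]:
    """Resources of type `rdf_type` whose label contains `word`, answered the split way."""
    graph_hits = graph_part(triples, "rdf:type", rdf_type)  # complete result set 1
    text_hits = index.get(word.lower(), set())               # complete result set 2
    return graph_hits & text_hits                            # merged only afterwards


triples = [
    ("ex:a", "rdf:type", "ex:Document"),
    ("ex:a", "rdfs:label", "Foo bar"),
    ("ex:b", "rdf:type", "ex:Image"),
    ("ex:b", "rdfs:label", "bar chart"),
    ("ex:c", "rdf:type", "ex:Document"),
    ("ex:c", "rdfs:label", "baz"),
]
index = build_fulltext_index(triples, "rdfs:label")
```

Here split_and_merge(triples, index, "ex:Document", "bar") computes two sets of two resources each just to return one.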

Apart from that, Virtuoso also allows quite a lot of SPARQL magic with nested queries or even embedded SQL. Plus, it supports SPARUL, the SPARQL Update Language, which will make a lot of code simpler and faster.
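As an example of what SPARUL buys: changing a resource's label no longer needs a client-side round trip (look up the old label, remove it, add the new one); a single request can do it on the server. A sketch that builds such a request, assuming the MODIFY ... DELETE ... INSERT ... WHERE form from the SPARUL member submission; Virtuoso's accepted dialect may differ slightly, and the function names are hypothetical.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for a double-quoted SPARQL literal."""
    return (value.replace("\\", "\\\\")   # must come first
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def replace_label_update(graph: str, resource: str, new_label: str) -> str:
    """One SPARUL request that swaps <resource>'s rdfs:label inside <graph>.

    Caveat: if the resource has no label yet, WHERE matches nothing and
    the INSERT part is not executed either.
    """
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        f"MODIFY <{graph}> "
        f"DELETE {{ <{resource}> rdfs:label ?old }} "
        f'INSERT {{ <{resource}> rdfs:label "{escape_literal(new_label)}" }} '
        f"WHERE {{ <{resource}> rdfs:label ?old }}"
    )
```

The WHERE-driven design keeps delete and insert atomic within one request, but it is exactly why the "no label yet" case needs a plain insert instead.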

The nice guys at OpenLink were very open to the idea of using their server in KDE. With the release of Virtuoso 5.0.10 they introduced a new lite mode which trims the server down to our needs. Memory usage goes down to a minimum: roughly 80 MB, which IMHO is acceptable (especially since the JVM easily goes up to 10 times that value!). Combined with disabling pretty much all features except SPARQL, this gives us a decent desktop DB solution.

Over the next weeks I will try to get the new backend into shape to replace the sesame2 backend. Hopefully by then distributions will have created packages for Virtuoso. If you want to give it a spin yourself without waiting, or even help me ;), this is what you need to do:

  • Get Virtuoso 5.0.10 from the SourceForge download page
  • Install Virtuoso (obviously)
  • Start Virtuoso with a config file similar to the one below
  • Change the Nepomuk server config file (~/.kde4/share/config/nepomukserverrc, or ~/.kde/share/config/nepomukserverrc on some distributions): add "Soprano Backend=virtuosobackend" to the [Basic Settings] section.
  • Restart Nepomuk
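The config-file change from the list can also be scripted. A minimal sketch, assuming Python's standard configparser copes with nepomukserverrc; it is not a KConfig implementation, so it drops comments when rewriting and rejects keys that appear before the first [group] header. The function names are hypothetical.

```python
import configparser
import io
from pathlib import Path


def with_soprano_backend(rc_text: str, backend: str = "virtuosobackend") -> str:
    """Return rc_text with 'Soprano Backend=<backend>' set in [Basic Settings]."""
    # Only "=" separates keys from values here; no "%" interpolation.
    config = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    config.optionxform = str  # keep keys like "Soprano Backend" as written (default lowercases)
    config.read_string(rc_text)
    if not config.has_section("Basic Settings"):
        config.add_section("Basic Settings")
    config.set("Basic Settings", "Soprano Backend", backend)  # adds or overwrites
    out = io.StringIO()
    config.write(out, space_around_delimiters=False)  # writes "key=value"
    return out.getvalue()


def set_soprano_backend(rc_path: Path, backend: str = "virtuosobackend") -> None:
    """Rewrite the rc file at rc_path in place, creating it if it is missing."""
    old = rc_path.read_text(encoding="utf-8") if rc_path.exists() else ""
    rc_path.write_text(with_soprano_backend(old, backend), encoding="utf-8")
```

For example, set_soprano_backend(Path.home() / ".kde4" / "share" / "config" / "nepomukserverrc"), with ".kde" instead of ".kde4" where your distribution uses that directory; keep a backup of the file and restart Nepomuk afterwards.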

Nepomuk should now convert all existing data to the new backend. This process can take a while, and there might even be errors. But it allows you to test new features such as the integrated full-text indexing (which still has to be enabled manually).

And now the virtuoso.ini file:

[Database]
DatabaseFile = soprano-virtuoso.db
ErrorLogFile = soprano-virtuoso.log
TransactionFile = soprano-virtuoso.trx
xa_persistent_file = soprano-virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 100
MaxCheckpointRemap = 1000
Striping = 0
TempStorage = TempDatabase

[TempDatabase]
DatabaseFile = soprano-virtuoso-temp.db
TransactionFile = soprano-virtuoso-temp.trx
MaxCheckpointRemap = 1000
Striping = 0

[Parameters]
LiteMode = 1
ServerPort = 1111
DisableUnixSocket = 0
O_DIRECT = 0
CaseMode = 1
CheckpointAuditTrail = 0
AllowOSCalls = 0
DirsAllowed = .
PrefixResultNames = 0
ServerThreads = 5 ; down from 10
CheckpointInterval = 10 ; down from 60
MaxDirtyBuffers = 50 ; down from 1200
SchedulerInterval = 5 ; down from 10
FreeTextBatchSize = 1000

[HTTPServer]
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
Charset = UTF-8
ServerThreads = 2 ; down from 5
KeepAliveTimeout = 5 ; down from 10
HTTPThreadSize = 10000 ; down from 280000

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS = 10
SQL_PREFETCH_BYTES = 4096
SQL_QUERY_TIMEOUT = 0
SQL_TXN_TIMEOUT = 0

[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerEnable = 0
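Starting the server with this file can be wrapped in a few lines. A sketch, assuming +foreground and +configfile are virtuoso-t's long option names (short forms -f and -c; check the usage text of your build) and that relative file names in the ini, such as soprano-virtuoso.db and DirsAllowed = ., are resolved against the working directory. The helper names are hypothetical; the binary argument also accepts a full path such as /usr/local/virtuoso-opensource/bin/virtuoso-t.

```python
import subprocess
from pathlib import Path
from typing import List


def virtuoso_command(ini_file: Path, binary: str = "virtuoso-t") -> List[str]:
    """argv for running Virtuoso in the foreground with the given ini file."""
    return [binary, "+foreground", "+configfile", str(ini_file)]


def start_virtuoso(db_dir: Path, ini_name: str = "virtuoso.ini",
                   binary: str = "virtuoso-t") -> subprocess.Popen:
    """Start Virtuoso with db_dir as working directory, so the ini's
    relative database, log and transaction files end up there."""
    return subprocess.Popen(virtuoso_command(Path(ini_name), binary), cwd=db_dir)
```

Running in the foreground keeps the server attached to the returned Popen handle, which can later be used to stop it again, instead of leaving a detached daemon behind.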

30 thoughts on “A New Blog and The Possible End to the Java Dependency in Nepomuk-KDE”

  1. Hi Sebastien,

    Just wanted to say “thanks” for switching your blog – adding comments to kdedevelopers.org is such a chore that I just don’t even bother :)

    Great news about new, non-Java backend (many thanks, OpenLink :)), although I have to say I’m rather disturbed by the recent trend of KDE stating that “X amount of memory is acceptable”, where X is a number greater than that taken up by an entire running KDE3 environment, but since Nepomuk-KDE is purely optional, I guess this isn’t so bad. With 2GB of RAM in my laptop, I personally am not going to be losing any sleep, but it does nothing for the “KDE is bloated!1” meme that’s been going around for years :)

  2. Great news. I think Soprano is really doing things the correct way. I’m not sure about Tracker writing its own SPARQL database…

    Of course if Tracker is awesome then Soprano could use it. So either way Soprano wins. :)

  3. Woo hoo! I think this is very exciting news for KDE indeed. I think it is very significant that Open Link have re-packaged their database especially for KDE and are keen to partner with us.

    I also think that the combination of Open Link and Soprano (with Ruby bindings for me) is perfect for web applications too. I would be really keen on having a KDE SPARQL endpoint hosted on an Open Link Virtuoso server. That would be the great basis for the Social Desktop project’s data store.

  4. Great to know that the sesame2 dependency is gone. But the first thing that came to my head was ‘now there are 2 dependencies on SQL servers’, since Amarok uses MySQL for its implementation. It would be great if Amarok used the same one as Nepomuk, or better yet, used Nepomuk directly.

    ( thanks for moving your blog, it’s better here ;D )

  5. Hi! It’s cool that we can get rid of Java. I don’t know if I should say this because I haven’t studied the case at all, but 80 MB of RAM usage seems quite high to me. And even more, I haven’t seen any JVM using “easily 10 times” that number: 80 × 10 = 800 MB, that’s wayy too much. Can you please elaborate a bit more on this?

  6. Excellent news! Thanks for your work in this direction; in my opinion this should solve one of the two main issues with Nepomuk! 80 MB is probably acceptable on desktop systems that have at least 1 GB of RAM, and it sounds OK to focus on those, as systems with little RAM probably don’t want to run anything like Nepomuk anyway.

    The other big issue in my opinion, is that tags should survive when files are moved/copied with “mv” and “cp”, but as we discussed, this requires a very deep change in the whole stack far below KDE and Nepomuk (inotify being insufficient), and at least it’s great that Nepomuk is highlighting the current shortcoming of the stack!

  7. @Tomaz Canabrava Amarok doesn’t use RDF.

    Switching to RDF and Soprano is an interesting idea though. I’m not sure how feasible it is to use Nepomuk as the default given that we’re a cross-desktop application.

    Given that we have a hard enough time finding SQL experts to help us out, SPARQL might be a bit much though!

  8. TheBlankCat: this is trunk (aka KDE 4.3) only.

    Tomaz: yes, replacing Amarok’s mysql would be great. Daniel Winter will probably soon try that. At least I hope so. ;)

    Eduardo: well, 80M is less than firefox or kontact are using, so, yes, I think it is quite good. As far as java goes: I don’t know why the mem usage at times is that high. It might be due to mem leaks in sesame2 or due to my misuse of JNI. But I would not know how to do it better since there is no documentation on using JNI to call java-code from C, only the other way around.

    jos: you got me. ;)

  9. I’m a little disturbed too: with Nepomuk, Akonadi and Amarok running we will have 3 different RDBMSs running: Virtuoso, embedded MySQL and a standalone MySQL server. Even making Amarok use Nepomuk (which is a good idea anyway) will not eliminate MySQL. BTW: how does Nepomuk make it impossible for Amarok to be multiplatform? It uses other KDE stuff like Plasma anyway.
    You mentioned that Virtuoso implements SQL too – maybe Akonadi could use it instead of MySQL?

  10. Not only Amarok, but Akonadi also uses MySQL. This is all unneeded overhead. Why can’t you guys just work together and agree on a single database? After all, you are within the same project (KDE).

  11. Stay tuned for my next blog post, where I discuss exactly that problem and show a possible solution which I came up with together with Tobias Koenig.

  12. Pingback: Akonepomuk, Neponadi - Friendly Takeover or Real Love Marriage « Trueg’s Blog

  13. Pingback: Nuova spinta a Nepomuk con un migliore «backend» « pollycoke :)

  14. Pingback: » Nuova spinta a Nepomuk con un migliore «backend»

  15. Nice, but how do I get soprano to recognize virtuoso in order to build the backend?
    From your instructions I can’t get it to work.

  16. i just want to say: thank you! getting rid of this huge dependency can only be good. if the cool kde community keeps moving at this speed, every possible cause for a rant will be gone by 4.4 :p

  17. Looking at virtuoso-opensource 5.0.10, it has a dependency on a Java VM

    So we’ll end up needing both, a JVM and Virtuoso, lol.

  18. great move to WP ! love it.

    i am about to follow your suggestions, but i am a little bit hesitating…
    i just read these lines on virtuoso build instructions:
    “…At least 800 MB of free space should be available in the build file system.
    When running `make install’, the target file system should have about 460 MB free…”
    should we go into that ? i already have mysql installed for several other applications and frameworks that i already use on my system.
    is it possible to use mysql instead ?
    (maybe i do get it, please be patient ;-)

  19. Pingback: “Are We There Yet?” - The Long Road To a Stable Soprano Virtuoso Backend « Trueg’s Blog

  20. I’d like to add two refinements here. First, thank you for your work on this, it’s really nice to skip the Java dependency; it’s caused me (and everybody else who uses Nepomuk?) a few headaches.

    Initially, I tried following the instructions in the virtuoso readme for how to start the server once it was installed. You’re supposed to call “virtuoso-t -f &” to start it up. I got a “virtuoso-t: command not found.” As far as I can see, the workaround is to use the complete path to the command, even if you’re in the same folder. This means: cd to /usr/local/virtuoso-opensource/var/lib/virtuoso/db and issue the command “/usr/local/virtuoso-opensource/bin/virtuoso-t -f &” It worked for me, I hope it works for you.

    I use Kubuntu, and I don’t know if it’s different on other distros, but the nepomukserverrc file is in ~/.kde/share/config on my system.

    • Actually a lot changed since that blog and I should really update it (will, too). In fact, virtuoso is now spawned automatically. However, the current version of Virtuoso (5.0.11) has a bug which prevents it from working with Nepomuk. A fix is on its way.
