A New Blog and The Possible End to the Java Dependency in Nepomuk-KDE

February 19, 2009 / Sebastian Trüg

I changed the blog system again. Why? It is rather simple: since the update of the blogging system of kdedevelopers.org it is virtually unusable. All the nice features are gone, and I am told that one needs an account to comment. I need something spiffy that works nicely. WordPress was recommended to me.

Now back to the real content: A new backend for Soprano which could finally let us drop the sesame2 backend which needs a JVM.

Today I committed the new Virtuoso Soprano backend. Virtuoso is a powerful SQL/RDF DB server created by OpenLink Software. OpenLink provides an open-source version of their database server released under the GPL. The official description from their homepage reads:

At core, Virtuoso is a high-performance object-relational SQL database. As a database, it provides transactions, a smart SQL compiler, powerful stored-procedure language with optional Java and .Net server-side hosting, hot backup, SQL-99 support and more. It has all major data-access interfaces, such as ODBC, JDBC, ADO .Net and OLE/DB.
[…]
OpenLink Virtuoso supports SPARQL embedded into SQL for querying RDF data stored in Virtuoso’s database. SPARQL benefits from low-level support in the engine itself, such as SPARQL-aware type-casting rules and a dedicated IRI data type. This is the newest and fastest developing area in Virtuoso.

Virtuoso not only frees us from the shackles of the Java dependency. It also provides a bunch of features that Sesame2 does not. Most importantly, Virtuoso features full-text indexing which can be used within SPARQL queries:

select ?foo where { ?foo rdfs:label ?label . ?label bif:contains "bar" . }

At the moment (using the sesame2 backend) we need to make use of the CLucene-based full-text indexing layer which I implemented for Soprano. While this works very well, it forces us to split each query into a full-text part and a graph part and then merge the results. That tends to be slow.
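
To illustrate why the split approach costs extra work, here is a minimal Python sketch of the merge step. It is not Soprano's actual code: the function name, the variable names and the sample data are all made up for illustration, and it assumes the full-text index returns matching literals while the graph query returns one row per binding.

```python
def merge_fulltext_and_graph(fulltext_hits, graph_bindings, label_var="label"):
    """Keep only the graph rows whose label literal was matched by the full-text index."""
    hits = set(fulltext_hits)  # set lookup keeps the merge linear in the number of rows
    return [row for row in graph_bindings if row[label_var] in hits]


# Hypothetical result of the full-text part: literals containing "bar".
fulltext_hits = ["foo bar", "bar baz"]

# Hypothetical result of the graph part: ?foo rdfs:label ?label
graph_bindings = [
    {"foo": "urn:res:1", "label": "foo bar"},
    {"foo": "urn:res:2", "label": "something else"},
    {"foo": "urn:res:3", "label": "bar baz"},
]

merged = merge_fulltext_and_graph(fulltext_hits, graph_bindings)
print([row["foo"] for row in merged])  # ['urn:res:1', 'urn:res:3']
```

Note that the graph part has to return every labelled resource before the filter can run, which is the overhead a single query with bif:contains avoids.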

Apart from that, Virtuoso also allows quite a lot of SPARQL magic with nested queries or even embedded SQL. Plus it supports SPARUL, the SPARQL Update Language, which will make a lot of code simpler and faster.

The nice guys at OpenLink were very open to the idea of using their server in KDE. With the release of Virtuoso 5.0.10 they introduced a new lite mode which trims the server down to our needs. Memory usage goes down to a minimum: IMHO roughly 80 MB is acceptable. (Especially since the JVM easily goes up to 10 times that value!) That, in combination with disabling pretty much all features except for SPARQL, gives us a decent desktop DB solution.

Over the next weeks I will try to get the new backend into shape to replace the sesame2 backend. Hopefully by then distributions will have created packages for Virtuoso. If you want to give it a spin yourself without waiting or even to help me ;) this is what you need to do:

Get Virtuoso 5.0.10 from the Sourceforge download page
Install Virtuoso (obviously)
Start Virtuoso with a config file similar to the one below
Change the Nepomuk server config file (~/.kde4/share/config/nepomukserverrc): add "Soprano Backend=virtuosobackend" to the "Basic Settings" section.
Restart Nepomuk
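
The config change in the fourth step can also be scripted. The following is only a sketch using Python's standard configparser, under the assumption that nepomukserverrc parses as a plain INI file; the helper name and the sample content are made up. Since configparser drops comments and may not cope with every KDE config entry, back up the real file first.

```python
import configparser
import io


def set_soprano_backend(rc_text, backend="virtuosobackend"):
    """Return rc_text with 'Soprano Backend' set in the [Basic Settings] section."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep key case; the default would lowercase the key
    parser.read_string(rc_text)
    if not parser.has_section("Basic Settings"):
        parser.add_section("Basic Settings")
    parser.set("Basic Settings", "Soprano Backend", backend)
    out = io.StringIO()
    # Write "key=value" without spaces around the delimiter.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Made-up sample content, not a real nepomukserverrc.
example = "[Basic Settings]\nSome Other Key=true\n"
updated = set_soprano_backend(example)
print(updated)
```

To apply it for real, read the file at the path given above into a string, pass it through the function and write the result back.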

Now Nepomuk should convert all existing data to the new backend. This process can take a while. There might even be errors. But it lets you test the new features such as the integrated full-text indexing (which still has to be enabled manually).

And now the virtuoso.ini file:
[Database]
DatabaseFile = soprano-virtuoso.db
ErrorLogFile = soprano-virtuoso.log
TransactionFile = soprano-virtuoso.trx
xa_persistent_file = soprano-virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 100
MaxCheckpointRemap = 1000
Striping = 0
TempStorage = TempDatabase

[TempDatabase]
DatabaseFile = soprano-virtuoso-temp.db
TransactionFile = soprano-virtuoso-temp.trx
MaxCheckpointRemap = 1000
Striping = 0

[Parameters]
LiteMode = 1
ServerPort = 1111
DisableUnixSocket = 0
O_DIRECT = 0
CaseMode = 1
CheckpointAuditTrail = 0
AllowOSCalls = 0
DirsAllowed = .
PrefixResultNames = 0
ServerThreads = 5 ; down from 10
CheckpointInterval = 10 ; down from 60
MaxDirtyBuffers = 50 ; down from 1200
SchedulerInterval = 5 ; down from 10
FreeTextBatchSize = 1000

[HTTPServer]
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
Charset = UTF-8
ServerThreads = 2 ; down from 5
KeepAliveTimeout = 5 ; down from 10
HTTPThreadSize = 10000 ; down from 280000

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS = 10
SQL_PREFETCH_BYTES = 4096
SQL_QUERY_TIMEOUT = 0
SQL_TXN_TIMEOUT = 0

[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerEnable = 0
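
Before starting the server, it can be handy to check that a hand-edited virtuoso.ini still says what you think it says. The sketch below reads a shortened copy of the file with Python's standard configparser; it assumes that ";" starts an inline comment, as the "down from" annotations above suggest, and the helper name is made up. Virtuoso's own parser may be stricter or more lenient than this.

```python
import configparser


def read_virtuoso_ini(ini_text):
    """Parse virtuoso.ini-style text, stripping '; ...' inline comments."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # treat "5 ; down from 10" as the value "5"
        interpolation=None,
    )
    parser.optionxform = str  # keep key case such as "LiteMode"
    parser.read_string(ini_text)
    return parser


# Shortened sample taken from the settings listed above.
sample = """
[Parameters]
LiteMode = 1
ServerPort = 1111
ServerThreads = 5 ; down from 10

[HTTPServer]
ServerThreads = 2 ; down from 5
"""

cfg = read_virtuoso_ini(sample)
lite = cfg.getboolean("Parameters", "LiteMode")
port = cfg.getint("Parameters", "ServerPort")
threads = cfg.getint("Parameters", "ServerThreads")
print(lite, port, threads)  # True 1111 5
```

The same key name, ServerThreads, appears in two sections with different values, so always look settings up by section.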

30 thoughts on “A New Blog and The Possible End to the Java Dependency in Nepomuk-KDE”

Anon

February 19, 2009 at 15:08

Hi Sebastian,

Just wanted to say “thanks” for switching your blog – adding comments to kdedevelopers.org is such a chore that I just don’t even bother :)

Great news about new, non-Java backend (many thanks, OpenLink :)), although I have to say I’m rather disturbed by the recent trend of KDE stating that “X amount of memory is acceptable”, where X is a number greater than that taken up by an entire running KDE3 environment, but since Nepomuk-KDE is purely optional, I guess this isn’t so bad. With 2GB of RAM in my laptop, I personally am not going to be losing any sleep, but it does nothing for the “KDE is bloated!1” meme that’s been going around for years :)

TheBlackCat

February 19, 2009 at 15:48

Nice! Is this change only for trunk or is it being included in 4.2.1?

Ian Monroe

February 19, 2009 at 15:49

Great news. I think Soprano is really doing things the correct way. I’m not sure about Tracker writing its own SPARQL database…

Of course if Tracker is awesome then Soprano could use it. So either way Soprano wins. :)

Diego.

February 19, 2009 at 16:01

Together with the reprise of the work on K3B this is great news!

Richard Dale

February 19, 2009 at 16:09

Woo hoo! I think this is very exciting news for KDE indeed. I think it is very significant that OpenLink have re-packaged their database especially for KDE and are keen to partner with us.

I also think that the combination of OpenLink and Soprano (with Ruby bindings for me) is perfect for web applications too. I would be really keen on having a KDE SPARQL endpoint hosted on an OpenLink Virtuoso server. That would be a great basis for the Social Desktop project’s data store.

Tomaz Canabrava

February 19, 2009 at 16:17

It’s great to know that the sesame2 dependency is gone, but the first thing that came to my head was ‘now there are 2 dependencies on SQL servers’, since Amarok uses MySQL for its implementation. It would be great if Amarok used the same one as Nepomuk, or better yet, used Nepomuk directly.

( thanks for moving your blog, it’s better here ;D )

Eduardo Robles Elvira

February 19, 2009 at 16:26

Hi! It’s cool that we can get rid of Java. I don’t know if I should say this because I haven’t studied the case at all, but 80 MB of RAM usage seems quite high to me. What’s more, I haven’t seen any JVM using “easily 10 times” that number: 80 × 10 = 800 MB, which is way too much. Can you please elaborate on this matter?

Benoit Jacob

February 19, 2009 at 16:34

Excellent news! Thanks for your work in this direction; in my opinion this should solve one of the two main issues with Nepomuk! 80 MB is probably acceptable on desktop systems that have at least 1 GB of RAM, and it sounds OK to focus on that, since systems with little RAM probably don’t want to run anything like Nepomuk anyway.

The other big issue in my opinion, is that tags should survive when files are moved/copied with “mv” and “cp”, but as we discussed, this requires a very deep change in the whole stack far below KDE and Nepomuk (inotify being insufficient), and at least it’s great that Nepomuk is highlighting the current shortcoming of the stack!

jospoortvliet

February 19, 2009 at 16:42

You only posted the virtuoso.ini file to make sure your post was sufficiently geeky, right? ;-)

Ian Monroe

February 19, 2009 at 17:07

@Tomaz Canabrava Amarok doesn’t use RDF.

Switching to RDF and Soprano is an interesting idea though. I’m not sure how feasible it is to use Nepomuk as the default given that we’re a cross-desktop application.

Given that we have a hard enough time finding SQL experts to help us out, SPARQL might be a bit much though!

trueg

February 19, 2009 at 17:19

TheBlackCat: this is trunk (aka KDE 4.3) only.

Tomaz: yes, replacing Amarok’s MySQL would be great. Daniel Winter will probably soon try that. At least I hope so. ;)

Eduardo: well, 80 MB is less than Firefox or Kontact are using, so, yes, I think it is quite good. As far as Java goes: I don’t know why the memory usage is that high at times. It might be due to memory leaks in sesame2 or due to my misuse of JNI. But I would not know how to do it better since there is no documentation on using JNI to call Java code from C, only the other way around.

jos: you got me. ;)

Jakub

February 19, 2009 at 17:37

I’m a little disturbed too: with Nepomuk, Akonadi and Amarok running we will have 3 different RDBMSs running: Virtuoso, MySQL Embedded and a standalone MySQL server. Even making Amarok use Nepomuk (which is a good idea anyway) will not eliminate MySQL. BTW: how does Nepomuk make it impossible for Amarok to be multiplatform? It uses other KDE stuff like Plasma anyway.
You mentioned that Virtuoso implements SQL too; maybe Akonadi could use it instead of MySQL?

Markus

February 19, 2009 at 17:42

Not only Amarok, but Akonadi also uses MySQL. This is all unneeded overhead. Why can’t you guys just work together and agree on a single database? After all, you are within the same project (KDE).

trueg

February 19, 2009 at 18:01

Stay tuned for my next blog post, where I discuss exactly that problem and show a possible solution which I came up with together with Tobias Koenig.

Pingback: Akonepomuk, Neponadi - Friendly Takeover or Real Love Marriage « Trueg’s Blog
Pingback: Nuova spinta a Nepomuk con un migliore «backend» « pollycoke :)
dcrabs

February 20, 2009 at 12:51

Nice, but how do I get soprano to recognize virtuoso in order to build the backend?
From your instructions I can’t get it to work.

- Sebastian Trüg
  
  February 20, 2009 at 13:07
  
  dcrabs: install libiodbc. Then the Soprano Virtuoso backend should be compiled.
  
  - Dareus
    
    June 3, 2009 at 19:42
    
    it doesn’t seem enough on my machine…
    
mario

February 20, 2009 at 23:44

i just want to say: thank you! getting rid of this huge dependency can only be good. if the cool kde community moves along at this speed every possible cause for a rant will be gone by 4.4 :p

Tom

February 22, 2009 at 13:07

Looking at virtuoso-opensource 5.0.10, it has a dependency on a Java VM

So we’ll end up needing both, a JVM and Virtuoso, lol.

Tom

February 22, 2009 at 13:16

Ah nevermind, it can be built without Java just fine

nadavkav

February 25, 2009 at 22:42

great move to WP ! love it.

i am about to follow your suggestions, but i am a little bit hesitating…
i just read these lines on virtuoso build instructions:
“…At least 800 MB of free space should be available in the build file system.
When running `make install’, the target file system should have about 460 MB free…”
should we go into that ? i already have mysql installed for several other applications and frameworks that i already use on my system.
is it possible to use mysql instead ?
(maybe i do get it, please be patient ;-)

- Sebastian Trüg
  
  February 26, 2009 at 09:26
  
  My installation uses 155 MB. This still is a lot but I think the install contains a lot of files that we don’t need. Thus, an optimized installation should be possible. I am trying to get this done with the help of the OpenLink people.
  
  - nadavkav
    
    February 26, 2009 at 12:18
    
    great news :-)
    
    are you familiar with:
    http://ppa.launchpad.net/wdaniels/ppa/ubuntu/pool/main/v/virtuoso-opensource/
    
    i use debian and i saw some effort to package it for debian too.
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508048
    
    i have a lot of space on my system. i am just thinking from a system-wide perspective and trying to see if we are not splitting into too many DB backends, in the KDE project at least.
    ( of course, if there is a real need then so be it )
    i am referring to : http://pusling.com/blog/?p=85
    
Pingback: “Are We There Yet?” - The Long Road To a Stable Soprano Virtuoso Backend « Trueg’s Blog
E.T. Anderson

September 8, 2009 at 23:48

I’d like to add two refinements here. First, Thank you for your work on this, it’s really nice to skip the java dependency – It’s caused me (and everybody else who uses Nepomuk?) a few headaches.

Initially, I tried following the instructions in the virtuoso readme for how to start the server once it was installed. You’re supposed to call “virtuoso-t -f &” to start it up. I got a “virtuoso-t: command not found.” As far as I can see, the workaround is to use the complete path to the command, even if you’re in the same folder. This means: cd to /usr/local/virtuoso-opensource/var/lib/virtuoso/db and issue the command “/usr/local/virtuoso-opensource/bin/virtuoso-t -f &” It worked for me, I hope it works for you.

I use Kubuntu, and I don’t know if it’s different on other distros, but the nepomukserverrc file is in ~/.kde/share/config on my system.

- Sebastian Trüg
  
  September 14, 2009 at 11:15
  
  Actually a lot has changed since that blog post and I should really update it (will, too). In fact, Virtuoso is now spawned automatically. However, the current version of Virtuoso (5.0.11) has a bug which prevents it from working with Nepomuk. A fix is on its way.
  
ZAREMA

March 19, 2010 at 17:43

Thanks to the author for the article. The main thing is not to forget about users, and to continue in the same spirit.
