Intermission: Why I Needed To Fork QProcess For K3b

Yes, I know. This is supposed to be the Nepomuk blog. And yes, I have a blog over at k3b.org. But this one is so much nicer. Also: the blog’s name is “trueg”, not “nepomuk”. And anyway: I write what I damn well please! ;) … OK, enough mindless exaggeration, I doubt that anyone really has a problem with me blogging about K3b a bit. So here goes.

K3b for KDE 4. Where is it? Well, it is on its way, as it has been for the last few years. It started with Laurent porting the core of the build system and running all those nice KDE3->KDE4 scripts. He then invited me to Paris for a prolonged K3b-porting-session. We made good progress but K3b was far from fully ported. Then time passed again and I only worked on Nepomuk. Then there was the week without Internet. Again some K3b porting. But still no end in sight. Finally Mandriva really wanted K3b done for 2009 Spring and put two developers on it (part time and under my supervision). This was a nice but also wise move. Everybody who ever worked on K3b or sent patches knows how protective I am of its code. So by letting two developers work on K3b Mandriva kind of forced me to work on it, too. Not the bad kind of being forced, not at all. I am finally enjoying hacking on K3b again.

Right, nice story, but what does that actually mean for the KDE 4 port? When will it be done? Well, if all goes well K3b 2 will be released with Mandriva 2009 Spring. But that is not certain. K3b is big and the port from KDE3 to KDE4 is not trivial at all. This brings us to the actual topic of this blog entry: QProcess and why I had to fork it.

A little background: in KDE 3 KProcess was not cross-platform and used pipes to communicate. These pipes were even accessible through the API. This allowed me to implement K3bProcess which featured raw modes, i.e. blocking reading and writing. This was used all over libk3b to pipe data from one process to the other or between processes and some image-creating thread. The advantage is obvious: the highest data throughput, no handling of signals and no unnecessary async behavior. Life was good.

Then came KDE 4 and with it KProcess became a convenience wrapper around the cross-platform QProcess. Sounds like a good concept and actually is one, too. However, the internals of QProcess are hidden and its API is way too high-level for K3b's needs. Why? Well, in K3b I used to pump data between processes and threads in a separate thread. QProcess has some methods for usage without an event loop which I thought would solve my problems. I thought I could just block using QProcess::waitForReadyRead and QProcess::waitForBytesWritten in my pumping thread. However, this is not possible, simply because the processes are started in the GUI thread and thus have an event loop. QProcess::waitForReadyRead then clashes badly with the internal QSocketNotifiers. OK, so that was not an option.
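
To make the problem a bit more concrete, here is a minimal sketch (not actual K3b code, names made up) of what I tried: a pumping thread blocking on waitForReadyRead while the QProcess itself lives in the GUI thread, where its internal QSocketNotifiers are driven by the event loop.

#include <QProcess>
#include <QThread>

class PumpThread : public QThread
{
public:
    explicit PumpThread( QProcess* p ) : m_process( p ) {}

protected:
    virtual void run() {
        char buffer[4096];
        // The process was started in the GUI thread, so its socket notifiers
        // belong to the GUI event loop - this blocking wait in a second
        // thread fights with them over the same pipe.
        while ( m_process->waitForReadyRead( -1 ) ) {
            qint64 n = m_process->read( buffer, sizeof( buffer ) );
            if ( n <= 0 )
                break;
            // ... hand the data on to the next process or an image thread ...
        }
    }

private:
    QProcess* m_process;
};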

My next step was to change the multi-threaded design of K3b to an async one. I did not really like the idea since that would mean that data pumping (and thus the stream of data to cdrecord and others) would have to share event cycles with GUI updates and user interaction. Anyway, I went for it, ported everything to use signals and slots and to only act on QProcess::readyRead and QProcess::bytesWritten. Looked OK. Well, it was not. QProcess is fully buffered, double-buffered even, since QIODevice already has its own buffer. This makes sense for its async design (although I fail to understand the need for two buffers). But here it really messed things up. The read buffer has no maximum and can grow to an arbitrary size. That is exactly what happened. mkisofs was happily creating a 4 GB ISO image while cdrecord was still initializing things. In the end, K3b used 4 GB of memory. Unacceptable. And yet another useless coding round.
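
For the curious, the async variant looked roughly like this (heavily simplified, class and slot names made up); the comments point out where the unbounded buffering bites:

#include <QObject>
#include <QProcess>

class DataPump : public QObject
{
    Q_OBJECT
public:
    DataPump( QProcess* producer, QProcess* consumer, QObject* parent = 0 )
        : QObject( parent ), m_producer( producer ), m_consumer( consumer ) {
        connect( m_producer, SIGNAL(readyRead()), this, SLOT(slotReadyRead()) );
    }

private Q_SLOTS:
    void slotReadyRead() {
        if ( m_consumer->state() != QProcess::Running ) {
            // cdrecord is still initializing. QProcess keeps draining
            // mkisofs' pipe into its internal read buffer anyway, and that
            // buffer has no size limit - this is where the 4 GB went.
            return;
        }
        m_consumer->write( m_producer->readAll() );
    }

private:
    QProcess* m_producer;
    QProcess* m_consumer;
};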

So finally I realized that I really needed a fork of QProcess (and with it KProcess since that is what K3bProcess is based upon). And that is what I did. K3b now has classes with weird names like K3bQProcess and K3bKProcess. I did not change much in QProcess though. After a detour implementing an Unbuffered mode (I actually posted the crude patch as a Qt task) I ended up introducing the flags RawStdin and RawStdout which do exactly what I want: reading and writing become blocking and fully synchronous. Now K3b can pump its data as before and it works swimmingly. All jobs use QIODevice for communication, and all of those devices need to be unbuffered and synchronous.
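
The result is used roughly like in the following sketch. The flag names RawStdin and RawStdout are the real ones from the fork; the setter and the surrounding helper are made up for illustration only.

#include <k3bqprocess.h>

// Hypothetical helper: pump an image from mkisofs into cdrecord in a
// dedicated thread, relying on the raw (blocking, unbuffered) modes.
void pumpImage( K3bQProcess* mkisofs, K3bQProcess* cdrecord )
{
    // Assumed setter name - the point is simply that the flags switch
    // reading and writing to blocking, fully synchronous I/O.
    mkisofs->setProcessFlags( K3bQProcess::RawStdout );
    cdrecord->setProcessFlags( K3bQProcess::RawStdin );

    char buffer[64 * 1024];
    while ( true ) {
        qint64 n = mkisofs->read( buffer, sizeof( buffer ) );   // blocks
        if ( n <= 0 )
            break;
        cdrecord->write( buffer, n );                           // blocks
    }
}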

This whole development took forever. But in the end I think it is clean and fast again. And once Qt Software includes the unbuffered mode in Qt, all communication becomes cross-platform, too (I only patched the Unix portion).

Well, this is it for now. Just a little tale of my adventures with QProcess and forking Qt classes.

Sadly I just read that Qt will not implement a blocking API. But the unbuffered mode only makes sense for me if it is blocking. Otherwise I am in multi-threading hell again, with QSocketNotifiers and waiting methods killing each other! Thus, I will have to maintain the fork :( Too bad, it could have been so easy…

A New Blog and The Possible End to the Java Dependency in Nepomuk-KDE

I changed the blog system again. Why? It is rather simple: since the update of the blogging system of kdedevelopers.org it is virtually unusable. All the nice features are gone and I am told one needs an account to comment. I need something spiffy that works nicely. WordPress was recommended to me.

Now back to the real content: A new backend for Soprano which could finally let us drop the sesame2 backend which needs a JVM.

Today I committed the new Virtuoso Soprano backend. Virtuoso is a powerful SQL/RDF DB server created by OpenLink Software. OpenLink provides an open-source version of their database server released under the GPL. The official description from their homepage reads:

At core, Virtuoso is a high-performance object-relational SQL database. As a database, it provides transactions, a smart SQL compiler, powerful stored-procedure language with optional Java and .Net server-side hosting, hot backup, SQL-99 support and more. It has all major data-access interfaces, such as ODBC, JDBC, ADO .Net and OLE/DB.
[…]
OpenLink Virtuoso supports SPARQL embedded into SQL for querying RDF data stored in Virtuoso’s database. SPARQL benefits from low-level support in the engine itself, such as SPARQL-aware type-casting rules and a dedicated IRI data type. This is the newest and fastest developing area in Virtuoso.

Virtuoso not only frees us from the shackles of the Java dependency. It also provides a bunch of features that Sesame2 does not. Most importantly, Virtuoso features full-text indexing which can be used within SPARQL queries:

select ?foo where { ?foo rdfs:label ?label . ?label bif:contains "bar" . }

At the moment (using the sesame2 backend) we need to make use of the CLucene-based full-text-indexing layer which I implemented for Soprano. While this works very well, it forces us to split queries into a full-text and a graph part and then merge the results. That tends to be slow.
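
From the application side nothing changes: a query like the one above goes through the normal Soprano API in one piece. A quick sketch (the model setup is omitted and the function is made up):

#include <Soprano/Model>
#include <Soprano/QueryResultIterator>
#include <QtCore/QDebug>

void listThingsLabeledBar( Soprano::Model* model )
{
    // A single query covers both the graph pattern and the full-text
    // match - no separate CLucene pass, no merging of two result sets.
    Soprano::QueryResultIterator it = model->executeQuery(
        QString( "select ?foo where { ?foo rdfs:label ?label . "
                 "?label bif:contains \"bar\" . }" ),
        Soprano::Query::QueryLanguageSparql );
    while ( it.next() )
        qDebug() << it.binding( "foo" ).toString();
}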

Apart from that, Virtuoso also allows us to do quite a lot of SPARQL magic with nested queries or even embedded SQL. Plus it supports SPARUL, the SPARQL Update Language, which will make a lot of code simpler and faster.
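
To give an idea of what SPARUL looks like, a hypothetical statement (the graph and resource names are made up) could be as simple as:

insert into graph <nepomuk:/data> { <nepomuk:/res/foo> rdfs:label "bar" . }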

The nice guys at OpenLink were very open to the idea of using their server in KDE. With the release of Virtuoso 5.0.10 they introduced a new lite mode which trims the server down to our needs. The memory usage goes down to a minimum: IMHO roughly 80M is acceptable (especially since the JVM easily goes up to 10 times that value!). That, in combination with disabling pretty much all features except SPARQL, gives us a decent desktop DB solution.

Over the next weeks I will try to get the new backend into shape to replace the sesame2 backend. Hopefully by then distributions will have created packages for Virtuoso. If you want to give it a spin yourself without waiting or even to help me ;) this is what you need to do:

  • Get Virtuoso 5.0.10 from the Sourceforge download page
  • Install Virtuoso (obviously)
  • Start Virtuoso with a config file similar to the one below
  • Change the Nepomuk server config file (~/.kde4/share/config/nepomukserverrc) – add "Soprano Backend=virtuosobackend" to the Basic Settings section (see the snippet below the list).
  • Restart Nepomuk
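
The relevant part of nepomukserverrc would then look like this (any other entries in the section are left alone):

[Basic Settings]
Soprano Backend=virtuosobackend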

Now Nepomuk should convert all existing data to the new backend. This process can take a while. There might even be errors. But it allows you to test the new features such as the integrated full-text indexing (which still has to be enabled manually).

And now the virtuoso.ini file:

[Database]
DatabaseFile = soprano-virtuoso.db
ErrorLogFile = soprano-virtuoso.log
TransactionFile = soprano-virtuoso.trx
xa_persistent_file = soprano-virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 100
MaxCheckpointRemap = 1000
Striping = 0
TempStorage = TempDatabase

[TempDatabase]
DatabaseFile = soprano-virtuoso-temp.db
TransactionFile = soprano-virtuoso-temp.trx
MaxCheckpointRemap = 1000
Striping = 0

[Parameters]
LiteMode = 1
ServerPort = 1111
DisableUnixSocket = 0
O_DIRECT = 0
CaseMode = 1
CheckpointAuditTrail = 0
AllowOSCalls = 0
DirsAllowed = .
PrefixResultNames = 0
ServerThreads = 5 ; down from 10
CheckpointInterval = 10 ; down from 60
MaxDirtyBuffers = 50 ; down from 1200
SchedulerInterval = 5 ; down from 10
FreeTextBatchSize = 1000

[HTTPServer]
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
Charset = UTF-8
ServerThreads = 2 ; down from 5
KeepAliveTimeout = 5 ; down from 10
HTTPThreadSize = 10000 ; down from 280000

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS = 10
SQL_PREFETCH_BYTES = 4096
SQL_QUERY_TIMEOUT = 0
SQL_TXN_TIMEOUT = 0

[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerEnable = 0