A Fun Release: Nepomuk TV Namer 0.2

As requested I prepared a release of the TV Show managing thingi I implemented. You can download it from download.kde.org mirrors at unstable/nepomuk/nepomuktvnamer-0.2.0.tar.bz2.

The nepomuktvnamer 0.2.0 is a little more polished than the original version and comes with a nice service menu extension allowing to manually start the fetching of TV Show information on folders or video files. This is important since the service does only react on new videos. So you need to start the initial information fetching manually on your TV Show folder.

The tvnamer has two requirements in addition to the typical KDE ones:

  • LibTVDb – LibTvdb is a Qt-based library which provides asynchronous access to TV series information from thetvdb.com via a very simple interface. Its use in the Nepomuk TV namer should be obvious.
  • Shared-Desktop-Ontologies 0.9.0 – The recently released new version of SDO provides the required nfo:depiction property used by the tvnamer to store banners.

I also recommend to apply the kdelibs patch I mentioned earlier to actually see the TV Show banners. Have fun with it – maybe someone will even package it.

Just For The Fun Of It: Browsing Music With Nepomuk

Since implementing the TV Show KIO slave was that easy I decided I could do the same for music – just to show how simple it can be. There a a few more lines but that is only because I added browsing by album, artist, and genre. So there are a lot of if/else constructs. Anyway, here goes:

Browsing music by artist is easy. As you can see I also implemented a preview generator plugin the same way I did for the TV Shows. The only problem is that there is no tool yet that automatically fetches those images. Thus, I had to do it manually for one example which looks somewhat like this:

qdbus org.kde.NepomukStorage 
  /datamanagement
  org.kde.nepomuk.DataManagement.addProperty
  "nepomuk:/res/0152825f-5c49-4ca8-aa0a-23fc9a1305f1"
  "nfo:depiction"
  "/home/trueg/atb2.jpg"
  "shell"

This is part of the fancy Data management API which allows me to add the file atb2.jg as a nfo:depiction of the nco:Contact resource identifying the artist ATB.

Anyway, entering the artist themselves and what lies beyond:

(Again I had to fetch the cover art manually. I did not want to implement my own cover art retrieval tool and I found the Amarok code not to be very reusable. Again maybe someone wants to take up this task?)

Finally we end up in the album tracks. Sadly dragging an album to a media player playlist does not work yet. I am not quite sure how to fix that.

Last but not least a quick look at browsing by genre:

This was fun. But before I go to bed let me share with you the very simple code which is responsible for the nice previews (abbreviated of course):

bool MusicThumbCreator::create(const QString &path,
                               int w, int h,
                               QImage &img)
{
  KUrl url(path);
  QStringList pathTokens
      = url.path().split('/', QString::SkipEmptyParts);
  if(pathTokens.count() < 2) {
    return false;
  }

  // there are only two cases for us: artists and albums
  if(pathTokens[pathTokens.count()-2] == QLatin1String("artists") ||
     pathTokens[pathTokens.count()-2] == QLatin1String("albums")) {
      const QUrl uri = recoverUriFromUrlToken(pathTokens.last());
    // we just query the first depiction there is
    Soprano::QueryResultIterator it
       = Nepomuk::ResourceManager::instance()->mainModel()
         ->executeQuery(
              QString::fromLatin1("select ?u where { "
                                  "%1 nfo:depiction [ nie:url ?u ] . "
                                  "} LIMIT 1")
              .arg(Soprano::Node::resourceToN3(uri)),
              Soprano::Query::QueryLanguageSparql);
    if(it.next()) {
      img.load(it["u"].uri().toLocalFile());
      return true;
    }
  }

  return false;
}

The rest of the code can be found in the nepomuk-audio-kio-slave scratch repository. Maybe at some point I could just throw all of those things into some “Nepomuk KIO extensions” package… oh, well, off to bed now…

More Fun With TV Shows

After fetching all the details about TV Shows from thetvdb.com I went back to my favorite way of browsing things: KIO slaves. So without further ado let me introduce the tvshow:/ KIO slave:

So the root folder lists all TV Series. As you can see the previews are messed up aspect-ratio-wise. If anyone has an idea of how to improve that without patching KIcon or KIO or caching my own thumbnails in some tmp folder please tell me.

Entering the season listing…

And finally the episodes. And just because it is fun here is one more:

Why do this? Well, nepomuksearch cannot create sub-folders (yet) and this only has about 120 relevant lines of code, most of which is used up by the three queries it creates.

To try it simply update your git clone of the nepomuktvnamer and have fun.

A Little Bit Of Query Optimization

Every once in a while I add another piece of query optimization code to the Nepomuk Query API. This time it was a direct result of my earlier TV Show handling. I simply thought that a query like “downton season=2 episode=2” takes too long to complete.

Now in order to understand this you need to know that there is a rather simple QueryParser class which converts a query string like the above into a Nepomuk::Query::Query which is simply a collection of Nepomuk::Query::Term instances. A Query instance is then converted into a SPARQL query which can be handled by Virtuoso. This SPARQL query already contains a set of optimizations, some specific to Virtuoso, some specific to Nepomuk. Of course there is always room for improvement.

So let us get back to our query “downton season=2 episode=2” and look at the resulting SPARQL query string (I simplified the query a bit for readability. The important parts are still there):

select distinct ?r where {
  ?r nmm:season "2"^^xsd:int .
  {
    ?r nmm:hasEpisode ?v2 .
    ?v2 ?v3 "2"^^xsd:int .
    ?v3 rdfs:subPropertyOf rdfs:label .
  } UNION {
    ?r nmm:episodeNumber "2"^^xsd:int .
  } .
  {
    ?r ?v4 ?v6 .
    FILTER(bif:contains(?v6, "'downton'")) .
  } UNION {
    ?r ?v4 ?v7 .
    ?v7 ?v5 ?v6 .
    ?v5 rdfs:subPropertyOf rdfs:label .
    FILTER(bif:contains(?v6, "'downton'")) .
  } .
}

Like the user query the SPARQL query has three main parts: the graph pattern checking the nmm:season, the graph pattern checking the episode and the graph pattern checking the full text search term “downton“. The latter we can safely ignore in this case. It is always a UNION so full text searches also include relations to tags and the like.

The interesting bit is the second UNION. The query parser matched the term “episode” to properties nmm:hasEpisode and nmm:episodeNumber. On first glance this is fine since both contain the term “episode“. However, the property nmm:season which is used in the first non-optional graph-pattern has a domain of nmm:TVShow. nmm:hasEpisode on the other hand has a domain of nmm:TVSeries. That means that the first pattern in the UNION can never match in combination with the first graph pattern since the domains are different.

The obvious optimization is to remove the first part of the UNION which yields a much simpler and way faster query:

select distinct ?r where {
  ?r nmm:season "2"^^xsd:int .
  ?r nmm:episodeNumber "2"^^xsd:int .
  {
    ?r ?v4 ?v6 .
    FILTER(bif:contains(?v6, "'downton'")) .
  } UNION {
    ?r ?v4 ?v7 .
    ?v7 ?v5 ?v6 .
    ?v5 rdfs:subPropertyOf rdfs:label .
    FILTER(bif:contains(?v6, "'downton'")) .
  } .
}

Well, sadly this is not generically true since resources can be double/triple/whateveriple-typed, meaning that in theory an nmm:TVShow could also have type nmm:TVSeries. In this case it is obviously not likely but there are many cases in which it in fact does apply. Thus, this optimization cannot be applied to all queries. I will, however, include it in the parser where it is very likely that the user does not take double-typing into account.

If you have good examples that show why this optimization should not be included in the query parser by default please tell me so I can re-consider.

Now after having written this and proof-reading the SPARQL query I realize that this particular query could have been optimized in a much simpler way: the value “2” is obviously an integer value, thus it can never match to a non-literal like required for the nmm:hasEpisode property…

A Little Drier But Not That Dry: Extracting Websites From Nepomuk Resources

After writing about my TV Show Namer I want to get out some more ideas and examples before I will be retiring as a full-time KDE developer in a few weeks.

The original idea for what I am about to present came a long while ago when I remembered that Vishesh gave me a link on IRC but I could not remember when exactly. So I figured that it would be nice to extract web links from Nepomuk resources to be able to query and browse them.

As always what I figured would be a quick thing lead me to a few bugs which I needed to fix before moving on. So all in all it took much longer than I had hoped. Anyway, the result is another small application called nepomukwebsiteextractor. It is a small tool without a UI which will extract websites from the given resource or file. If called without arguments it will query for resources which do not have any related websites and and extract websites from them. Since it tries to fetch a title for each website this is a very slow procedure.

As before the storing to Nepomuk is the easy part. Getting the information is way harder:

using namespace Nepomuk;
using namespace Nepomuk::Vocabulary;

// create the main Website resource
NFO::Website website(url);
website.addType(NFO::WebDataObject());
QString title = fetchHtmlPageTitle(url);
if(!title.isEmpty()) {
  website.setTitle(title);
}

// create the domain website resource
KUrl domainUrl = extractDomain(url);
NFO::Website domainWebPage(domainUrl);
domainWebPage.addType(NFO::WebDataObject());
domainWebPage.addPart(website.uri());
title = fetchHtmlPageTitle(domainUrl);
if(!title.isEmpty()) {
  domainWebPage.setTitle(title);
}

// relate the two via the nie:isPartOf relation
website.addProperty(NIE::isPartOf(), domainUrl);
domainWebPage.addProperty(NIE::hasPart(), website.uri());

// funnily enough the domain is a sub-resource of the website
// this is so removing the website will also remove the domain
// as it is the one which triggered the domain resource's creation
website.addSubResource(domainUrl);

// save it all to Nepomuk
Nepomuk::storeResources(SimpleResourceGraph() << website << domainWebPage);

Once done you will have thousands of nfo:Website resources in your Nepomuk database, each of which are related to their respective domain via nie:isPartOf (I am not entirely sure if this is perfectly sound but it is convenient as far as graph traversal goes). We can of course query those resources with nepomukshell (this is trivial but allows me to pimp up this blog post with a screenshot):

And of course Dolphin shows the extracted links in its meta-data panel:

I am not entirely sure how to usefully show this information to the user yet but it is already quite nice to navigate the sub-graph which has been created here.

Of course we could query all the resources which mention a link with domain http://www.kde.org:

select ?r where {
  ?r nie:links ?w .
  ?w a nfo:Website .
  ?w nie:isPartOf ?p .
  ?p nie:url <http://www.kde.org> .
}

Or the Nepomuk API version of the same:

using namespace Nepomuk::Query;
using namespace Nepomuk::Vocabulary;

Query query =
  ComparisonTerm(NIE::links(),
    ResourceTypeTerm(NFO::Website()) &&
    ComparisonTerm(NIE::isPartOf(),
      ResourceTerm(QUrl("http://www.kde.org"))
    )
  );

It gets even more interesting when combined with the nfo:Websites created by KParts when downloading files.

Well, now I provided screenshots, code examples, and a link to a repository – I think it is all there – have fun.

Update: In the spirit of promoting the previously mentioned ResourceWatcher here is how the website extractor would monitor for new stuff to be extracted:

Nepomuk::ResourceWatcher* watcher = new Nepomuk::ResourceWatcher(this);
watcher->addProperty(NIE::plainTextContent());
connect(watcher, 
        SIGNAL(propertyAdded(Nepomuk::Resource,
                             Nepomuk::Types::Property,
                             QVariant)),
        this,
        SLOT(slotPropertyAdded(Nepomuk::Resource,
                               Nepomuk::Types::Property,
                               QVariant)));
watcher->start();

[...]

void slotPropertyAdded(const Nepomuk::Resource& res,
                       const Nepomuk::Types::Property&,
                       const QVariant& value) {
  if(!hasOneOfThoseXmlOrRdfMimeTypes(res)) {
    const QString text = value.toString();
    extractWebsites(res, text);
  }
}

Something Way Less Dry: TV Shows

After my rather boring blog about change notifications I will now to write about something that I wanted every since I started developing Nepomuk. But only now has Nepomuk reached a point where it provides all the necessary pieces. I am talking about TV Show management – obviously I mean the rips from the DVD boxes I own.

So what about it? Well, I wrote a little tool called nepomuktvnamer (inspired by the great python tool tvnamer) which works a bit like our nepomukindexer except that it does not extract meta-data from the file but tries to fetch information about TV Shows from thetvdb.com. You can run the tool on a single file or recursively on a whole directory. It will then use a set of regular expressions (based on the ones from tvnamer)  to analyze the file names and extract the show title, season and episode numbers.

The nepomuktvnamer will ask the user in case multiple matches have been found and cannot be filtered according to season and episode numbers

It will then save that information into Nepomuk through our powerful Data Management API. The code looks a bit as follows ignoring code to store actors, banners and the like.

const Tvdb::Series series = getSeriesForName(name);
Nepomuk::NMM::TVSeries seriesRes;
seriesRes.setTitle(series.name());
seriesRes.addDescription(series.overview());

Nepomuk::NMM::TVShow episodeRes(url);
episodeRes.setEpisodeNumber(episode);
episodeRes.setSeason(season);
episodeRes.setTitle(series[season][episode].name());
episodeRes.setSynopsis(series[season][episode].overview());
episodeRes.setReleaseDate(QDateTime(series[season][episode].firstAired(), QTime(), Qt::UTC));
episodeRes.setGenres(series.genres());

seriesRes.addEpisode(episodeRes.uri());
episodeRes.setSeries(seriesRes.uri());

Nepomuk::SimpleResourceGraph graph;
graph << episodeRes << seriesRes;
Nepomuk::storeResources(graph, Nepomuk::IdentifyNew, Nepomuk::OverwriteProperties)

(This code uses my very own LibTvdb which is essentially a Qt’ish wrapper around the thetvdb.org API.)

The result of this can be seen in Dolphin:

Here we see the actors, the series, the synopsis and so on. Clicking on an actor will bring up all they played in, clicking on the series will bring up all the episodes from that series, and so on.

Now let us have a look at the series itself using my beefed up version of the Nepomuk KIO slave:

As we can see the nepomuktvnamer also fetched a banner which is stored as nie:depiction. (A reason why to compile nepomuktvnamer you need the git master version of shared-desktop-ontologies. Oh, and also nepomuktvnamer is linked against libnepomukcore from nepomuk-core instead of libnepomuk. So you either have to install nepomuk-core which cab be a bit tricky or quickly change the CMakeLists.txt to link to libnepomuk instead.)

We can of course also query the newly created information. Simple queries in Dolphin could be “series:Sherlock” or “sherlock season=1″. Well, things to play with.

I also created the smallest Nepomuk service to date: the nepomuktvnamerservice uses the ResourceWatcher to listen for newly created nfo:Video resources and simply calls the nepomuktvnamer on the related file.

Last but not least the git repository contains a python script which checks for each existing series if a new episode has been aired. The output looks a bit like this:

White Collar - New episode "Withdrawal" (02x01) first aired 13 July 2010.
Freaks and Geeks - No new episode found.
The Mentalist - Upcoming episode "Red is the New Black" (04x13) will air 02 February 2012.

Now obviously this is more a task for a Plasma applet. So if anyone out there is interested in doing that – please go ahead. I think it could be a cool thing. One basically only has to update whenever a new nmm:TVShow is created or when the new day dawns.

And the cherry on top is of course Bangarang: