All My Eye

This week, I've been playing with SquirrelRDF. It allows you to query RDBs with SPARQL. Magic.

Why? Well, there are lots of RDBs full of interesting metadata, lurking under rocks here at Ingenta. It's my job to worry about integrating metadata, and for us, the future is RDF.

I could tip ALL the data into my nice new triplestore. All your databases will be ASSSIMILATED. Resistance is futile, etc etc. And that's probably the way to go with most of it.

But in some cases, this approach just isn't appropriate. What if the owner guards it jealously and I'm waiting for them to go on leave before we snaffle it? What if the data is itself a bit freaky, and I just don't want it polluting my store? What if I haven't written the loader yet. (Writing loaders is a lot of work!) What if there's just butt-loads of it, so the load is going to take 6 weeks of processing? I want that data NOW!

I decided to experiment with SquirrelRDF (Although D2RQ would be another option.) It implements this spec, and works by treating each row as a resource, where the column names are its properties, and the column values are the objects.

Eg:

I have a database called POND, and in it, a table called FROGS which goes


 ID | name | colour | legs
----------------------------
 1 | Fred  | green  | 4
 2 | Felix | yellow | 4 
 3 | Frank | green  | 3

This gets translated to triples:

_x ns:FROGS_ID 1
_x ns:FROGS_name "Fred"
_x ns:FROGS_colour "green"
_x ns:FROGS_legs 4
_y ns:FROGS_ID 2
...etc..

Where "ns" is some URI I configure - eg http://www.ingenta.com/POND/

One nice advantage is I don't have to define a new vocabulary for Frogs - that's all automatic, based on the db schema.

Overall, it was a good out-of-the box experience. I downloaded the jars, and ran a command line utility which generated this maping file. It seemed to handle the datatypes OK, and made the automatic vocabulary. Next, I was able to run a SPARQL query using another command line utility. Finally, I installed the little webapp, and did the same thing over HTTP; I entered this query in the box, and got these results.

Of course, it won't be able to offer the flexibility of a real RDF store - it will still be a faff to add a new property etc. But, at least I should be able to do SPARQL on it, and start joining it up with other data from the triplestore, or from under other rocks.

How does a scruffy little RDB get into a Triplestores-Only Golf Club?

Contributors