Wherein the IngentaConnect Product Management, Engineering, and Sales Teams
ramble, rant, and generally sound off on topics of the day
 

Does your "Boy Scout Handbook" look as though it has been read by a grizzly bear?

Wednesday, March 15, 2006

"Does your 'Boy Scout Handbook' look as though it has been read by a grizzly bear? ... It's easy to repair paperback books using Japanese bookbinding techniques..." - according to Bind It Fast.

It's less easy if they are electronic though...

I've spent the last several days wrestling with a grim data munging task: loading Books into the MetaStore.

Previously, we've modelled Books as a funny kind of Journal. In the new triplestore, Books will be...well, BOOKS. This means we need to un-journal them.

I think I may need to backtrack at this point...:

Journals tend to follow a regular structural pattern: Journal->Issues->Articles. In the triplestore, we've got RDFS classes for Journal, Issue, and Article, and loading usually involves creating a bunch of these (along with associated extras like Authors) and using simple but sturdy prism:isPartOfs to hook them together.

Here's an example Article:

<rdf:RDF xml:base="http://metastore.ingenta.com/content/" xmlns:branding="http://metastore.ingenta.com/ns/branding/" .. bla, bla, namespace:bla ">
<struct:Article rdf:about="http://metastore.ingenta.com/content/articles/6094">
<dc:title><xhtml:span xml:lang="en">Skeletal Muscle Hypoperfusion During Recovery from Maximal Supine Bicycle Exercise in Patients With Heart Failure</xhtml:span>
</dc:title>
<ident:doi>10.1016/S0002-9149(96)00421-3</ident:doi>
<ident:sectionNumber>23</ident:sectionNumber>
<branding:tocSubHeading>Short Communications</branding:tocSubHeading>
<prism:endingPage>844</prism:endingPage>
<prism:startingPage>841</prism:startingPage>
<dc:description>(the abstract)</dc:description>
<dc:identifier rdf:resource="http://metastore.ingenta.com/content/00029149/v78n7/p841"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/15832"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/10786"/>
<prism:isPartOf>
<struct:Issue rdf:about="http://metastore.ingenta.com/content/parts/396">
<prism:coverDate>1996-10-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime</prism:coverDate>
<prism:number>7</prism:number>
<prism:volume>78</prism:volume>
<prism:isPartOf>
<struct:Journal rdf:about="http://metastore.ingenta.com/content/titles/1885">
<prism:issn>00029149</prism:issn>
(etc.. journal properties)
<prism:isPartOf rdf:resource="http://metastore.ingenta.com/content/pubs/84"/>
</struct:Journal>
</prism:isPartOf>
</struct:Issue>
</prism:isPartOf>
</struct:Article>
</rdf:RDF>


Note the isPartOfs? Articles are part of Issues; Issues are part of Journals. Article->Issue->Journal. That's the general structure of things. Great. Loading program sorted. I'm off for a long lunch.
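
For anyone who prefers to see the chain walked rather than described, here is a minimal sketch. It is not the real loader: the list of tuples is a made-up stand-in for the triplestore, the helper names (value_of, ancestors) are hypothetical, and only the URIs and class names are taken from the example Article above.

```python
# Hypothetical in-memory stand-in for the triplestore:
# each entry is a (subject, predicate, object) tuple.
BASE = "http://metastore.ingenta.com/content/"

TRIPLES = [
    (BASE + "articles/6094", "rdf:type", "struct:Article"),
    (BASE + "articles/6094", "prism:isPartOf", BASE + "parts/396"),
    (BASE + "parts/396", "rdf:type", "struct:Issue"),
    (BASE + "parts/396", "prism:isPartOf", BASE + "titles/1885"),
    (BASE + "titles/1885", "rdf:type", "struct:Journal"),
    # The example does not show a class for pubs/84, so none is recorded here.
    (BASE + "titles/1885", "prism:isPartOf", BASE + "pubs/84"),
]


def value_of(subject, predicate):
    """Return the first object stored for (subject, predicate), or None."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None


def ancestors(subject):
    """Follow prism:isPartOf upwards and return a list of (uri, type) pairs."""
    chain = []
    seen = {subject}  # guards against an accidental isPartOf cycle
    parent = value_of(subject, "prism:isPartOf")
    while parent is not None and parent not in seen:
        seen.add(parent)
        chain.append((parent, value_of(parent, "rdf:type")))
        parent = value_of(parent, "prism:isPartOf")
    return chain


chain = ancestors(BASE + "articles/6094")
for uri, rdf_type in chain:
    print(rdf_type, uri)
```

Starting from the Article, the walk passes through the Issue and the Journal and stops at pubs/84, which has no isPartOf of its own in this sample.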

...except, what with all my claims about flexibility, people are now expecting me to put *other* stuff like books in the store - the cheek!

The problem with books is that they refuse to follow the journal hierarchy model. The Grapes of Wrath doesn't have "Issues" or "Articles" - it's a book, damnit! It does have Chapters - but not in a very interesting way, metadata-wise. Something like New Essays on the Rationalists has Chapters in a stronger sense - each has its own authors...maybe this kind of Chapter is an Article?...hooray, things are looking up... but damn, where are the "Issues"?

So, the final model is complex. A book can have parts (chapters), or no parts, or even (say it's an encyclopedia with volumes) parts which themselves have parts. Here are some examples:

1. A monolithic "noparts" book:
<struct:Book rdf:about="http://metastore.ingenta.com/content/titles/12980">
<struct:partHierarchy rdf:resource="http://metastore.ingenta.com/content/parthierarchy/nopart"/>
<ident:isbn>9789290795346</ident:isbn>
<ident:shortCode>rers</ident:shortCode>
<branding:tocSubHeading>Rethinking the EU Regulatory Strategy for the Internal Energy Market</branding:tocSubHeading>
<prism:coverDate>2004-12-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime</prism:coverDate>
<prism:isPartOf rdf:resource="http://metastore.ingenta.com/content/pubs/283"/>
<dc:description>(the abstract)</dc:description>
<dc:format>application/pdf</dc:format>
<dc:title><xhtml:span xml:lang="en">Rethinking the EU Regulatory Strategy for the Internal Energy Market</xhtml:span>
</dc:title>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919970"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919969"/>
</struct:Book>


2. A "onepart" book, with chapters:
<struct:Chapter rdf:about="http://metastore.ingenta.com/content/articles/3562204">
<ident:doi>10.1093/0195144260.003.0002</ident:doi>
<ident:sectionNumber>3</ident:sectionNumber>
<prism:endingPage>48</prism:endingPage>
<prism:startingPage>31</prism:startingPage>
<dc:description>(the abstract)</dc:description>
<dc:title><xhtml:span xml:lang="en">1. The Colonial Legacy</xhtml:span></dc:title>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6920869"/>
<prism:isPartOf>
<struct:Book rdf:about="http://metastore.ingenta.com/content/titles/13834">
<struct:partHierarchy rdf:resource="http://metastore.ingenta.com/content/parthierarchy/onepart"/>
<ident:isbn>9780195144260</ident:isbn>
<ident:shortCode>341157</ident:shortCode>
(etc)
</struct:Book>
</prism:isPartOf>
</struct:Chapter>

3. A "twopart" book, with parts (here, a volume) that contain chapters:
<struct:Chapter rdf:about="http://metastore.ingenta.com/content/articles/3559033">
<branding:pdfSize>1803154</branding:pdfSize>
<ident:doi>10.1166/000000004323030339</ident:doi>
<ident:sectionNumber>25</ident:sectionNumber>
<prism:endingPage>640</prism:endingPage>
<dc:title><xhtml:span xml:lang="en">Self-Organization of Colloidal Nanoparticles</xhtml:span></dc:title>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919913"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919912"/>
<prism:isPartOf>
<struct:BookPart rdf:about="http://metastore.ingenta.com/content/parts/259254">
<prism:volume>9</prism:volume>
<prism:coverDate>2004-03-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime</prism:coverDate>
<prism:isPartOf>
<struct:Book rdf:about="http://metastore.ingenta.com/content/titles/12957">
<ident:isbn>9781588830012</ident:isbn>
<dc:title><xhtml:span xml:lang="en">Encyclopedia of Nanoscience and Nanotechnology</xhtml:span></dc:title>
<struct:partHierarchy rdf:resource="http://metastore.ingenta.com/content/parthierarchy/twopart"/>
</struct:Book>
</prism:isPartOf>
</struct:BookPart>
</prism:isPartOf>
</struct:Chapter>
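
To make the three shapes concrete, here is a rough sketch that climbs from any item to its Book and counts the hops. The dictionary is a hand-made summary of the three examples above, not the store's real API, and the mapping from nopart/onepart/twopart to a depth of 0/1/2 is my reading of the names, so treat it as an assumption. The helper names (book_for, leaf_depth_matches) are hypothetical too.

```python
# Hand-made summary of the three examples: each key is a URI relative to
# http://metastore.ingenta.com/content/ and "part_of" mirrors prism:isPartOf.
# The Book -> publisher link is left out so the climb stops at the Book.
RESOURCES = {
    "titles/12980": {"type": "Book", "hierarchy": "nopart"},
    "articles/3562204": {"type": "Chapter", "part_of": "titles/13834"},
    "titles/13834": {"type": "Book", "hierarchy": "onepart"},
    "articles/3559033": {"type": "Chapter", "part_of": "parts/259254"},
    "parts/259254": {"type": "BookPart", "part_of": "titles/12957"},
    "titles/12957": {"type": "Book", "hierarchy": "twopart"},
}

# Assumed meaning of the partHierarchy values: number of isPartOf hops
# between the lowest-level item and its Book.
EXPECTED_DEPTH = {"nopart": 0, "onepart": 1, "twopart": 2}


def book_for(key):
    """Climb part_of links until a Book is reached; return (book_key, hops)."""
    hops = 0
    while RESOURCES[key]["type"] != "Book":
        key = RESOURCES[key]["part_of"]
        hops += 1
    return key, hops


def leaf_depth_matches(leaf_key):
    """Check that a lowest-level item sits at the depth its Book declares."""
    book, hops = book_for(leaf_key)
    return EXPECTED_DEPTH[RESOURCES[book]["hierarchy"]] == hops


for leaf in ("titles/12980", "articles/3562204", "articles/3559033"):
    print(leaf, book_for(leaf), leaf_depth_matches(leaf))
```

The point of the sketch is that code consuming the store no longer needs a fixed Article->Issue->Journal ladder: it climbs until it hits a Book, and the partHierarchy value says how many rungs to expect.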


The book repair job has been grim and munggy. Still, a few more shoehorns have been binned.

I feel cleansed.

posted by Katie Portwin at 10:19 am

 
