Does your "Boy Scout Handbook" look as though it has been read by a grizzly bear?
Wednesday, March 15, 2006
"Does your 'Boy Scout Handbook' look as though it has been read by a grizzly bear? ... It's easy to repair paperback books using Japanese bookbinding techniques..." - according to: Bind It Fast.
It's less easy if they are electronic though...
I've spent the last several days wrestling with a grim data munging task: loading Books into the MetaStore.
Previously, we've modelled Books as a funny kind of Journal. In the new triplestore, Books will be...well, BOOKS. This means we need to un-journal them.
I think I may need to backtrack at this point...:
Journals tend to follow a regular structural pattern: Journal->Issues->Articles. In the triplestore, we've got RDFS classes for Journal, Issue and Article. Loading usually involves creating a bunch of these (along with associated extras like Authors) and using simple but sturdy prism:isPartOfs to hook them together.
Here's an example Article:
<rdf:RDF xml:base="http://metastore.ingenta.com/content/" xmlns:branding="http://metastore.ingenta.com/ns/branding/" .. bla, bla, namespace:bla ">
<struct:Article rdf:about="http://metastore.ingenta.com/content/articles/6094">
<dc:title><xhtml:span xml:lang="en">Skeletal Muscle Hypoperfusion During Recovery from Maximal Supine Bicycle Exercise in Patients With Heart Failure</xhtml:span>
</dc:title>
<ident:doi>10.1016/S0002-9149(96)00421-3</ident:doi>
<ident:sectionNumber>23</ident:sectionNumber>
<branding:tocSubHeading>Short Communications</branding:tocSubHeading>
<prism:endingPage>844</prism:endingPage>
<prism:startingPage>841</prism:startingPage>
<dc:description>(the abstract)</dc:description>
<dc:identifier rdf:resource="http://metastore.ingenta.com/content/00029149/v78n7/p841"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/15832"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/10786"/>
<prism:isPartOf>
<struct:Issue rdf:about="http://metastore.ingenta.com/content/parts/396">
<prism:coverDate>1996-10-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime</prism:coverDate>
<prism:number>7</prism:number>
<prism:volume>78</prism:volume>
<prism:isPartOf>
<struct:Journal rdf:about="http://metastore.ingenta.com/content/titles/1885">
<prism:issn>00029149</prism:issn>
(etc.. journal properties)
<prism:isPartOf rdf:resource="http://metastore.ingenta.com/content/pubs/84"/>
</struct:Journal>
</prism:isPartOf>
</struct:Issue>
</prism:isPartOf>
</struct:Article>
</rdf:RDF>
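For anyone who prefers code to angle brackets, here is a minimal sketch of how that chain of prism:isPartOf links could be walked. It is not the real loader: the dict is a hypothetical in-memory stand-in for the triplestore, `ancestors` is a made-up helper name, and the only things taken from the example are the four URIs.

```python
BASE = "http://metastore.ingenta.com/content/"

# Hypothetical stand-in for the triplestore: child URI -> parent URI,
# one entry per prism:isPartOf link in the example Article.
IS_PART_OF = {
    BASE + "articles/6094": BASE + "parts/396",   # Article -> Issue
    BASE + "parts/396": BASE + "titles/1885",     # Issue -> Journal
    BASE + "titles/1885": BASE + "pubs/84",       # Journal -> pubs/84 (the post doesn't say what this is)
}


def ancestors(uri, is_part_of=IS_PART_OF):
    """Follow isPartOf links upwards and return each container, nearest first."""
    chain = []
    seen = {uri}
    while uri in is_part_of:
        uri = is_part_of[uri]
        if uri in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError("isPartOf cycle at " + uri)
        seen.add(uri)
        chain.append(uri)
    return chain


if __name__ == "__main__":
    for container in ancestors(BASE + "articles/6094"):
        print(container)
```

Starting from the Article, this yields the Issue, then the Journal, then whatever sits above the Journal.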
Note the isPartOfs? Articles are part of Issues; Issues are part of Journals. Article->Issue->Journal. That's the general structure of things. Great. Loading program sorted. I'm off for a long lunch.
...except, what with all my claims about flexibility, people are now expecting me to put *other* stuff like books in the store - the cheek!
The problem with books is that they refuse to follow the journal hierarchy model. The Grapes of Wrath doesn't have "Issues" or "Articles" - it's a book, damnit! It does have Chapters - but not in a very interesting way, metadata-wise. Something like New Essays on the Rationalists has Chapters in a stronger sense - each has its own authors...maybe this kind of Chapter is an Article?...hooray, things are looking up... but damn, where are the "Issues"?
So, the final model is complex. A book can have no parts, or parts (chapters), or even (say it's an encyclopedia with volumes) parts which themselves have parts. Here are some examples:
1. A monolithic "noparts" book:
<struct:Book rdf:about="http://metastore.ingenta.com/content/titles/12980">
<struct:partHierarchy rdf:resource="http://metastore.ingenta.com/content/parthierarchy/nopart"/>
<ident:isbn>9789290795346</ident:isbn>
<ident:shortCode>rers</ident:shortCode>
<branding:tocSubHeading>Rethinking the EU Regulatory Strategy for the Internal Energy Market</branding:tocSubHeading>
<prism:coverDate>2004-12-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime</prism:coverDate>
<prism:isPartOf rdf:resource="http://metastore.ingenta.com/content/pubs/283"/>
<dc:description>(the abstract)</dc:description>
<dc:format>application/pdf</dc:format>
<dc:title><xhtml:span xml:lang="en">Rethinking the EU Regulatory Strategy for the Internal Energy Market</xhtml:span>
</dc:title>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919970"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919969"/>
</struct:Book>
2. A "onepart" book, with chapters:
<struct:Chapter rdf:about="http://metastore.ingenta.com/content/articles/3562204">
<ident:doi>10.1093/0195144260.003.0002</ident:doi>
<ident:sectionNumber>3</ident:sectionNumber>
<prism:endingPage>48</prism:endingPage>
<prism:startingPage>31</prism:startingPage>
<dc:description>(the abstract)</dc:description>
<dc:title><xhtml:span xml:lang="en">1. The Colonial Legacy</xhtml:span></dc:title>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6920869"/>
<prism:isPartOf>
<struct:Book rdf:about="http://metastore.ingenta.com/content/titles/13834">
<struct:partHierarchy rdf:resource="http://metastore.ingenta.com/content/parthierarchy/onepart"/>
<ident:isbn>9780195144260</ident:isbn>
<ident:shortCode>341157</ident:shortCode>
(etc)
</struct:Book>
</prism:isPartOf>
</struct:Chapter>
3. A "twopart" book:
<struct:Chapter rdf:about="http://metastore.ingenta.com/content/articles/3559033">
<branding:pdfSize>1803154</branding:pdfSize>
<ident:doi>10.1166/000000004323030339</ident:doi>
<ident:sectionNumber>25</ident:sectionNumber>
<prism:endingPage>640</prism:endingPage>
<dc:title><xhtml:span xml:lang="en">Self-Organization of Colloidal Nanoparticles</xhtml:span></dc:title>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919913"/>
<foaf:maker rdf:resource="http://metastore.ingenta.com/content/authors/6919912"/>
<prism:isPartOf>
<struct:BookPart rdf:about="http://metastore.ingenta.com/content/parts/259254">
<prism:volume>9</prism:volume>
<prism:coverDate>2004-03-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime</prism:coverDate>
<prism:isPartOf>
<struct:Book rdf:about="http://metastore.ingenta.com/content/titles/12957">
<ident:isbn>9781588830012</ident:isbn>
<dc:title><xhtml:span xml:lang="en">Encyclopedia of Nanoscience and Nanotechnology</xhtml:span></dc:title>
<struct:partHierarchy rdf:resource="http://metastore.ingenta.com/content/parthierarchy/twopart"/>
</struct:Book>
</prism:isPartOf>
</struct:BookPart>
</prism:isPartOf>
</struct:Chapter>
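The three shapes above differ only in how many isPartOf hops sit between the deepest part and the Book. Here is a rough sketch of deriving the partHierarchy label from that depth. It is an assumption about how such a label could be computed, not a description of the actual loader: the dict stands in for the triplestore, and `levels_below` and `part_hierarchy` are hypothetical names.

```python
BASE = "http://metastore.ingenta.com/content/"

# Hypothetical stand-in for the triplestore: child URI -> parent URI,
# taken from the prism:isPartOf links in the three book examples
# (links from Book up to pubs/... are left out).
IS_PART_OF = {
    BASE + "articles/3562204": BASE + "titles/13834",   # Chapter -> Book
    BASE + "articles/3559033": BASE + "parts/259254",   # Chapter -> BookPart
    BASE + "parts/259254": BASE + "titles/12957",       # BookPart -> Book
}

# Only the three labels that appear in the post; deeper nesting is not covered.
HIERARCHY_NAMES = {0: "nopart", 1: "onepart", 2: "twopart"}


def levels_below(book, is_part_of=IS_PART_OF):
    """Return the largest number of isPartOf hops from any node up to `book`."""
    deepest = 0
    for node in is_part_of:
        steps = 0
        current = node
        # The step cap stops a malformed, cyclic graph from looping forever.
        while current in is_part_of and steps < len(is_part_of):
            current = is_part_of[current]
            steps += 1
            if current == book:
                deepest = max(deepest, steps)
                break
    return deepest


def part_hierarchy(book, is_part_of=IS_PART_OF):
    """Map the nesting depth under `book` to a partHierarchy label."""
    return HIERARCHY_NAMES[levels_below(book, is_part_of)]


if __name__ == "__main__":
    for title in ("titles/12980", "titles/13834", "titles/12957"):
        print(title, part_hierarchy(BASE + title))
```

A book with nothing hanging off it comes out as "nopart", and anything nested more than two levels deep raises a KeyError, since the post only describes three shapes.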
The book repair job has been grim and munggy. Still, a few more shoehorns have been binned.
I feel cleansed.
posted by Katie Portwin at 10:19 am