Wherein the IngentaConnect Product Management, Engineering, and Sales Teams
ramble, rant, and generally sound off on topics of the day

Slides from ALPSP DRM Tech Update

Thursday, March 29, 2007

I just noticed that ALPSP have posted the slides from their recent Technology Update on DRM.

My talk, DRM: A Skeptic's View, was, as you can guess from the title, largely anti-DRM. I attempted to measure DRM against two yardsticks: its ability to enable new business models, and its ability to stop unlicensed usage. My feeling is that DRM fails on both counts. I feel strongly that publishers should be looking at ways to innovate and create new models for selling and licensing content, but that DRM isn't a necessary step to achieving that. I pointed to a few examples of what I think is innovative behaviour, e.g. Safari, Beta books, etc.

I found the talk from the British Library, outlining how they had deployed DRM for their Secure Electronic Delivery (SED) service, particularly compelling. Mat Pfleger clearly laid out some of the hidden costs of DRM that they encountered, particularly the increase in user support and the time it took to actually get clients up and running with the required versions of Acrobat. If you're thinking about implementing DRM, you should look through this presentation. MIT Libraries' recent cancellation of a service because of its use of DRM is also an interesting data point in the ongoing DRM debate.

I'm told that eventually there will be a podcast of the ALPSP event available so you'll be able to listen to all the talks and get the context and discussion not available from the slides alone.


posted by Leigh Dodds at 11:08 am


Publishing 2.0 Conference 25th April

XML UK are holding an event next month called "Publishing 2.0". (See also the Upcoming.org entry)

This one day conference lines-up a number of acknowledged experts drawn from the publishing and XML standards worlds, and will focus on how some publishers are deriving benefits from innovation, as well as what up-coming technologies and standards are likely to be making an impact in the near future.

I'm going to be giving a talk on RDF: how we're using it to model academic publications, and why we think it's going to deliver advantages. The full line-up is pretty impressive, and includes a keynote from Sean McGrath, who is an excellent speaker.

I'm very pleased to have been invited to speak at the event. And as an unreformed geek, how could I turn down the opportunity to speak at Bletchley Park?!

Check the event page for registration details.


posted by Leigh Dodds at 10:48 am


Science Blogging, Blog Carnivals and Secondary Publishing

Wednesday, March 28, 2007

A few years ago I used to write the XML Deviant column for XML.com. Each week I had to submit an article summarizing the key issues, news and debates in the XML community. This meant tracking (amongst other sources) all the major mailing lists, reading nearly every discussion with a view to finding an interesting angle to write-up.

The experience was not only an education in writing to a schedule, but also a great introduction to a wide range of XML technologies and some pretty arcane markup lore. The process was also largely an editorial one; over time I think I got pretty good at picking out key contributors and relating topics across different forums. It was also pretty intensive, as there was a lot of information to absorb and summarize.

Having gained first-hand experience of taking technical topics and opening them up to a wider audience, I've been interested in Blog Carnivals for some time. Blog Carnivals attempt to provide a weekly or monthly summary of key postings in a particular blogging community or topic. The source media is different (blogs versus mailing lists) but the editorial process and end results are essentially the same: a regular digest of important scholarly or technical discussions. Carole Anne Meyer has described Blog Carnivals as secondary publishing reinvented.

There's a growing number of high quality Blog Carnivals produced by scholarly communities. See for example this list of science carnivals which includes Bio::Blogs and Tangled Bank.

Ben Vershbow has described these scholarly carnivals as a "looser, less formal peer review" process. Vershbow went on to explain that "the idea of the carnival, refined and sharpened by academics and lifelong learners, might in fact have broader application for electronic publishing. It happily incorporates the de-centralized nature of the web, thriving through collaborative labor, and yet it retains the primacy of individual voices and editorial sensibilities".

So I was intrigued to read in Nautilus this week that the Bio::Blogs carnival has started producing a monthly compilation of the best bioinformatics articles in PDF format, using Box.net's free online storage to host the content. This is another step closer to a traditional secondary publishing model; Carnivals are becoming more and more like traditional publications all the time.

Some publishers may feel threatened by this, but academic blogging is a useful complement to scholarly journals, not a replacement. They're different kinds of media, often with a different audience, and certainly with a different "voice".

If you're interested in learning more about science blogging and other new forms of interaction, there's an opportunity to hear presentations from Ben Vershbow and Sandra Porter (science blogger and Tangled Bank contributor) at the International Scholarly Communications Conference, which I'm chairing on the 13th April.


posted by Leigh Dodds at 2:15 pm


Persistent Links in Bookmarks

A few weeks ago I blogged an idea for incorporating a "preferred bookmark link" into web pages to improve the stability of links submitted to social bookmarking sites. The comments were favourable, so I think it's worth pushing ahead with the idea.

However, I've since decided that my proposed implementation is wrong! Originally I suggested embedding the link in a META tag in the HTML. But it's dawned on me that the LINK tag is obviously a better alternative. The LINK tag is intended to be used to convey relationships between documents, and that's essentially what we're trying to achieve. There's even a predefined link type for indicating bookmark links.

This mechanism is already in use on many blogs to identify the "permalink" for a specific article, e.g:

<a href="...some...url" rel="bookmark" title="Permalink">Permalink</a>.

So I'm going to revise my proposal so that persistent links to academic articles, e.g. DOIs, are embedded into web pages by adding a LINK tag into the HEAD of the document as follows:

<link rel="bookmark" title="DOI" href="http://dx.doi.org/10.1000/1"/>

The system is extensible as we can agree a convention, similar to RSS auto-discovery, that the combination of the rel and title attributes conveys information about the type of link. For example, to include both a stable DOI link and a direct link to the current publisher's website, we could use the following:

<link rel="bookmark" title="DOI" href="http://dx.doi.org/10.1000/1"/>
<link rel="bookmark" title="Publisher" href="http://www.doi.org/index.html"/>

User agents (e.g. bookmarklets and other tools) and social bookmarking sites can then offer the user a choice of which link to use (to avoid security issues) or simply store both.

Labelling DOIs like this also enables them to be more easily extracted for other purposes. We're already including DOIs, expressed as info: URIs, in our embedded Dublin Core metadata, but the actual web link is just as useful (if not more so!).
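To illustrate how a consumer might use this convention, here's a minimal sketch in Python. The page markup is a hand-made sample; any real tool would of course fetch the live page:

```python
from html.parser import HTMLParser

class BookmarkLinkParser(HTMLParser):
    """Collects <link rel="bookmark"> elements from a page's HEAD."""
    def __init__(self):
        super().__init__()
        self.bookmarks = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "bookmark":
            # Per the proposed convention, the title attribute labels the link type
            self.bookmarks[a.get("title", "")] = a.get("href")

page = """<html><head>
<link rel="bookmark" title="DOI" href="http://dx.doi.org/10.1000/1"/>
<link rel="bookmark" title="Publisher" href="http://www.doi.org/index.html"/>
</head><body></body></html>"""

parser = BookmarkLinkParser()
parser.feed(page)
print(parser.bookmarks["DOI"])  # http://dx.doi.org/10.1000/1
```

A bookmarklet or bookmarking site could then offer the user whichever of the labelled links they prefer, or store them all.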



posted by Leigh Dodds at 11:37 am



Identifiers are a hot topic in academic publishing. We have article identifiers, but we need stable identifiers for authors and institutions too. More on author identifiers in another posting. But there's a need for identifiers for lots of other kinds of resources too. If we have an identifier for a resource then we can link to it and share metadata about it.

bioGUID is an attempt to provide "resolvable URIs for biological objects, such as publications, taxonomic names, nucleotide sequences, and specimens".
The system currently supports DOIs, PubMed identifiers, Handles, and GenBank sequences, amongst other identifier schemes. Under the hood all the information is available as RDF. Linking to an organism, for example, we can find related links, bookmark the organism on del.icio.us, and view its location on a map.

Collections of identifiers like bioGUID will become key jumping-off points in the growing web of data. These "linking hubs" will provide navigational aids to both humans and machines; tying distributed data sets and collections into a larger hypertext system will have numerous benefits, not least of which will be making it easier to find stuff.

Web 2.0 is really the collective realization that it's the humble link that is the powerhouse of the internet.


posted by Leigh Dodds at 10:51 am


The Machine is Us/ing Us

You may have seen the following video already, as it's been doing the rounds for a few weeks now. But if you haven't, then I strongly recommend you watch it. It's only 4.5 minutes, so go grab a coffee.

The video is by Michael Wesch, assistant professor of cultural anthropology at Kansas State University. It's a great visual introduction to some Web 2.0 and markup concepts. You can read more about the video here. Wesch has explained that his aim was to "show people how digital technology has evolved and give them a sense of where it might be going and to give some momentum to the all-important conversation about the consequences of that on our global society."


posted by Leigh Dodds at 9:22 am


eye/to/eye: our latest publisher newsletter just mailed

A quick note to let you know that our latest newsletter for publishers just mailed on Monday. It can be accessed at http://eyetoeye.ingenta.com/publisher/ - this issue includes:
Past issues of the newsletter are also available, at http://eyetoeye.ingenta.com/publisher/archive.htm.


posted by Charlie Rapple at 9:21 am


Academic Journals as a Virtual File System

Tuesday, March 20, 2007

I've been taking a look at WebDAV recently as a means for enhancing some of our content management features.

WebDAV is an extension to HTTP that provides facilities for distributed authoring and versioning of documents. The protocol is natively supported on both the Windows and Mac desktops thereby allowing direct access, publishing and authoring of content held in remote repositories in a seamless way. To the end user the repository looks exactly like a network drive and they can use normal file management options to manage documents.

After reading a blog posting from Jon Udell yesterday and in particular the notion of WebDAV proxies and virtual documents, I got to wondering about where else WebDAV could be applied.

Central to WebDAV is the notion of a hierarchy of documents, each of which can have its own metadata. That sounds very much like the typical browse path for academic journals to me: titles, issues, articles.

So what if we were to expose journals as a "virtual file system" using WebDAV? A user could then integrate a journal (or collection of journals) directly into their file-system. Accessing content would be as simple as browsing through directories (i.e. an issue) to find the relevant content (e.g. a PDF file, HTML file, etc).
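To make the browsing step concrete: a WebDAV client lists a collection by issuing a PROPFIND request with a Depth: 1 header and parsing the 207 Multi-Status response it gets back. Here's a sketch of that parsing in Python; the journal paths in the sample response are invented for illustration:

```python
import xml.etree.ElementTree as ET

# An abridged 207 Multi-Status body of the kind a WebDAV server returns
# for PROPFIND with Depth: 1 on an issue-level collection.
multistatus = """<?xml version="1.0"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/journals/jexample/2007/00000012/00000003/</D:href>
    <D:propstat><D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop></D:propstat>
  </D:response>
  <D:response>
    <D:href>/journals/jexample/2007/00000012/00000003/art00001.pdf</D:href>
    <D:propstat><D:prop><D:resourcetype/></D:prop></D:propstat>
  </D:response>
</D:multistatus>"""

ns = {"D": "DAV:"}
tree = ET.fromstring(multistatus)
entries = [r.find("D:href", ns).text for r in tree.findall("D:response", ns)]
# The first entry is the collection (the issue) itself; the rest are its members
print(entries[1])
```

A desktop client does essentially this under the hood when it presents the issue as a folder of files.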

Obviously access control is an issue here. Arbitrary users wouldn't be able to edit documents, so the file system would be read-only. And, apart from Open Access titles, not everyone would necessarily be authorized to access the full-text of all content within a journal. But that's OK too; a WebDAV server doesn't have to expose a real file-system. What's exposed can be a virtual collection of content, limited to just those journals, issues, and articles to which the user has access. In a similar fashion the WebDAV proxy could also ensure that usage statistics are accurately recorded.

There are other options too, e.g. for publishing and sharing related research data, supplementary information, etc. Authorized users (the publisher, author, editors) could add the related information directly into the WebDAV file-system.

It seems to me that this could be a pretty powerful technique. It would provide a simple, familiar metaphor for accessing journal content. And users wouldn't need to accumulate and organize local copies of PDFs (and they do!). Instead they would be able to mount the content from any networked computer.

There are some obvious downsides. There's no search option, no sophisticated browsing, etc. Viewing journals as a virtual file system of content is certainly not right for all users or usage patterns. But one thing that Web 2.0 has taught us is that supporting a flexible range of access options is an important criterion.

The idea speaks directly to a wider debate about how academic publishers could or should disseminate content in the future. Assuming the correct access controls are in place and the content metadata is easily and widely available, do publishers really need to do more than expose their content in these kinds of ways, leaving end users and intermediaries to add value?


posted by Leigh Dodds at 9:59 am


ASA event : “Policies, Pricing and Purchasing”

Wednesday, March 14, 2007

IngentaConnect recently made it down to the Association of Subscription Agents and Intermediaries’ event “Policies, Pricing and Purchasing” in London. Following is a summary of the most interesting moments from our point of view (the full presentations are available for download from the ASA’s site).


Does the journal have a price any longer?
John Cox, Managing Director, John Cox Associates

Pricing the package
Nick Evans, Members Services Manager, ALPSP and Tamsyn Honour, Swets Information Services UK

Business models that work
Ian Snowley, Director, Academic Services, University of London Research Library Services, and President-Elect the Chartered Institute of Library and Information Professionals (CILIP)

1700 libraries were surveyed:


Intermediation in the new user environment
Chris Beckett, Director, Scholarly Information Strategies Ltd

Discovery - empowering the user
Kate Stanfield, Head of Knowledge Management, CMS Cameron McKenna LLP

Where are consortia heading? 1. The UK model
Paul Harwood, Director, Content Complete

posted by EdMcLean at 11:18 am


Top CiteULike tags for IngentaConnect articles

Friday, March 09, 2007

After reading about the BioMed Central tag cloud earlier this week, I mailed the CiteULike discussion list and asked about the possibility of a similar feature for other sites.

Richard Cameron has obligingly created this for us, and so now you can view a tag cloud for IngentaConnect articles tagged on CiteULike. Thanks to Richard for providing this so quickly.

The cloud makes for interesting reading, and it'd be interesting to compare (or amalgamate) the data with that of different services, e.g. Connotea. It's also going to be interesting to compare these figures with our usage statistics; some content on the site gets much more activity than others. I suspect there may be audience differences between, e.g., general users of IngentaConnect and those that also use social bookmarking tools.


posted by Leigh Dodds at 12:51 pm



Thursday, March 08, 2007

Rob Cornelius recently wrote a nice example of how to use Yahoo Pipes with IngentaConnect RSS feeds.

Today's example is going to use Grazr, a service that, amongst its various features, allows you to build a little customized RSS viewer for linking to or embedding in your own applications.

It's pretty easy to do:

Firstly, visit the Grazr Create a Widget page. You'll be prompted with a simple three step process:

As I've just written a posting about OPML exports, let's create a widget for exploring the table of contents data for our Medical titles.

This is the URL we need. Size limits on the widget restrict the numbers of titles it'll import in one go, so that link only contains the first 200 titles. Copy and paste that into the box provided in Step 1, and click "Update". After a few seconds your widget should be showing a list of titles.

In Step 2 we choose a colour scheme, fonts, and sizing. I decided to try the "sateen_green" theme. Here's how it looks:

Finally, we decide to publish. For the purposes of this article, let's use the little Grazr icon:

Open Grazr

And we're done. Click on the icon, and you can view the widget for yourself. Selecting a title will pull in the latest list of articles, and you can then click through to read the abstract (or download the full text) on IngentaConnect.


posted by Leigh Dodds at 4:13 pm


OPML Exports from IngentaConnect

One feature we've had on IngentaConnect for a while now (since December 2004 in fact) is OPML exports of the RSS feeds we generate, organized by subject category. The feature is something I threw in one day and has largely existed as an "easter egg", although I did blog about it once.

The feeds are embedded in the subject category pages using an "auto-discovery" link similar to that used for RSS feeds, e.g.:

<link rel="outline" type="text/x-opml" title="OPML" href="http://api.ingentaconnect.com/content/subject?j_pagesize=-1&format=opml&j_subject=142" />

The URLs could be tidier, but they do the job. I thought I'd publish a list of the feeds here:

Using those links should get you a listing of the main IngentaConnect RSS feeds for our electronic collection (i.e. the latest title feed for each journal), grouped by our subject categories. Useful if you're interested in subscribing to a bunch of feeds in one go, or grabbing lists of RSS feeds to create a directory.
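As a sketch of what consuming an export looks like, a few lines of Python will turn an OPML outline into a list of feeds. The outline below is a hand-made sample, not real IngentaConnect output:

```python
import xml.etree.ElementTree as ET

# A minimal OPML body of the kind a subject-category export produces;
# journal titles and feed URLs here are invented for illustration.
opml = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>IngentaConnect: Medicine</title></head>
  <body>
    <outline text="Journal of Example Medicine" type="rss"
             xmlUrl="http://api.ingentaconnect.com/content/example/jem/latest?format=rss"/>
    <outline text="Example Clinical Reviews" type="rss"
             xmlUrl="http://api.ingentaconnect.com/content/example/ecr/latest?format=rss"/>
  </body>
</opml>"""

# Map each journal title to its feed URL
feeds = {o.get("text"): o.get("xmlUrl")
         for o in ET.fromstring(opml).iter("outline")
         if o.get("type") == "rss"}
print(len(feeds))  # 2
```

From there it's a one-liner to hand the whole list to a feed aggregator.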

Only interested in the feeds for titles that your institution has a subscription to? We support that too. At present it relies on IP authentication to identify you and your institution, so it's only useful if you're accessing the link from on campus or via a proxy server, and obviously it requires that your institution has subscriptions activated on IngentaConnect.

To subset the list, add the following parameter to any of the URLs:


That's it.

Any problems or suggestions for improvements, please let me know.


posted by Leigh Dodds at 2:42 pm


Table of Contents by Really Simple Syndication

The TOCRoSS project was a JISC project carried out last year in conjunction with Talis, Emerald and the University of Derby. The main goals of the project were to look at how RSS feeds could be extended to include sufficient metadata for populating library OPAC systems, thereby allowing users to search and then access more content from within the library environment.

The project write-up goes into more details and provides links to some sample RSS feeds and a (yet to be populated) Sourceforge project which will be home to some of the code developed during the prototype.

While the experiment was no doubt worthwhile, I think the project is something of a missed opportunity, as it failed to take into account any of the existing work that's been done on enriching publication feeds with detailed metadata.

For example Nature, Ingenta, and other publishers have been producing RSS feeds containing rich metadata for several years now. Collectively we've converged on RSS 1.0, supplemented with Dublin Core and PRISM metadata, to carry precisely the information that TOCRoSS adds using RSS 2.0 and ONIX. You can compare and contrast examples.
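For illustration, an item in this RSS 1.0 style might look something like the following. The article details are invented, so see the real feeds for authoritative examples:

```xml
<item rdf:about="http://dx.doi.org/10.1000/1"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <title>An Example Article</title>
  <link>http://dx.doi.org/10.1000/1</link>
  <dc:creator>A. N. Author</dc:creator>
  <dc:identifier>info:doi/10.1000/1</dc:identifier>
  <prism:publicationName>Journal of Examples</prism:publicationName>
  <prism:volume>12</prism:volume>
  <prism:number>3</prism:number>
  <prism:startingPage>201</prism:startingPage>
</item>
```

That's enough bibliographic detail for an OPAC (or a service like CiteULike) to build a full citation without ever touching the article page.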

The existing formats have already been successfully used to drive other services, e.g. CiteULike, so there's demonstrable implementation experience that suggests that they meet a number of use cases.

While there's certainly plenty of room for debate about which format(s) and vocabularies are better -- perhaps Atom should replace all of them? -- it's a shame that the project didn't focus on drawing out the existing lessons learnt and areas for further work.

Also, had the project adopted existing formats then the code that will ultimately be published would also have been much more useful: it would immediately work with thousands of existing feeds, as opposed to requiring publishers to support another variant format.


posted by Leigh Dodds at 2:00 pm


Task Force on “Development of OECD statistical products”

Wednesday, March 07, 2007

Last month I was invited to visit the OECD to speak to its task force on the development of OECD statistical products, about data publishing and Web 2.0.

The documentation for that meeting, including my presentation, is now available online.

The meeting included representatives from the OECD and from a number of national statistics organizations. It was interesting to learn about their challenges in publishing statistics and discuss their plans to reach a mixture of audiences both expert and otherwise. For my part I provided an overview of Web 2.0 and discussed the increasingly diverse range of lightweight publishing options that are available. I showed some demos of tools like Exhibit, Google Spreadsheets, Many Eyes, Swivel, and other data visualization examples using tools like Google Maps and Google Earth. We also discussed the potential for Semantic Web technologies in this space.

I suggested that the arrival of a "YouTube for data" isn't far away. Indeed, tools like Swivel are aiming to deliver just that. Data publishing and visualization is a topic that's of increasing interest to a wide range of publishers.

Declan Butler has a short piece in Nature that reviews this latest trend. Armin Grossenbacher, another OECD task force attendee, maintains a blog about dissemination of official statistics which also has some useful pointers and commentary.

posted by Leigh Dodds at 5:10 pm


Persistent linking, web crawlers and social bookmarking

Typically a web crawler, unless configured to use a separate index or crawling algorithm, will use the URL from which it retrieves some content as the entry in its search index. This means that anyone clicking on a search result will be taken to this URL.

Where a site has access controlled content and the full-text resides at a different location, this presents a problem. The site owner or publisher would like users to go to one page, e.g. the abstract, but will want the crawler to get the full-text. Making this work involves some dialogue between site owner and the search engine. For example the web crawler needs to use an alternate index or additional metadata to make the connection between the index entry link and the full-text retrieval link.

Some site operators, with approval, use a technique known as "cloaking" to achieve this. This involves serving different content to a web crawler, e.g. a PDF, than would be served to an end user, e.g. an abstract. Most search engines disapprove of this approach, but Google Scholar, for example, has allowed it. This has caused some debate.

On IngentaConnect we use cloaking to serve content to some crawlers. But we no longer do this for Google Scholar. The reason for this is that Google were interested in obtaining the richer metadata that we include (as embedded Dublin Core) in abstract pages. This metadata, supplemented with the full-text, improves the quality of Scholar search indexes.

I thought I'd explain the fairly simple solution I concocted to achieve this, and point out where the same technique could be used to address another problem: persistent linking in social bookmarking services.

When the Googlebot requests an abstract page from IngentaConnect, it gets fed some additional metadata that looks like this:

<link rel="schema.CRAWLER" href="http://labs.ingenta.com/2006/06/16/crawler"/>
<meta name="CRAWLER.fullTextLink" content=""/>
<meta name="CRAWLER.indexEntryLink" content=""/>

The embedded metadata provides two properties. The first, CRAWLER.fullTextLink, indicates to the crawler where it can retrieve the full-text that corresponds to this article.

The second link, CRAWLER.indexEntryLink, indicates to the crawler the URL that it should use in its indexes. I.e. the URL to which users should be sent.
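A sketch of the serving side in Python (the user-agent check and URLs are hypothetical, illustrating the approach rather than our production code):

```python
# Only approved crawlers should receive the extra metadata; everyone else
# gets the ordinary abstract page unchanged.
APPROVED_CRAWLERS = ("Googlebot",)

def crawler_meta(user_agent, full_text_url, index_entry_url):
    """Return extra HEAD markup for approved crawlers, or an empty string."""
    if not any(bot in user_agent for bot in APPROVED_CRAWLERS):
        return ""
    return (
        f'<meta name="CRAWLER.fullTextLink" content="{full_text_url}"/>\n'
        f'<meta name="CRAWLER.indexEntryLink" content="{index_entry_url}"/>'
    )

print(crawler_meta("Googlebot/2.1 (+http://www.google.com/bot.html)",
                   "http://example.org/article.pdf",
                   "http://example.org/abstract"))
```

The crawler then fetches the full-text from the first link, while storing the second as the result it sends users to.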

The technique is fairly simple and uses existing extensibility in HTML to good effect. It occurred to me recently that the same technique could be used to address a related problem.

When I use del.icio.us, CiteULike, Connotea, or another social bookmarking service, I end up bookmarking the URL of the site I'm currently using. It's this specific URL that goes into the service's database and is associated with user-assigned tags, etc.

However, as we all know, in an academic publishing environment content may be available on multiple platforms. Content also frequently moves between platforms. The industry solution to this has been to use the DOI as a stable linking syntax. Some sites like CiteULike make attempts to extract DOIs from bookmarked pages, or resolve DOIs via CrossRef. But the metadata they collect is still typically associated with the primary URL and not the stable identifier. This presents something of a problem if, say, one wants to collate tagging information across services, or ensure that links I make now will still work in the future.

A more generally applicable approach to addressing this issue, one that is not specific to academic publishing, would be to include, in each article page, embedded metadata that indicates the preferred bookmark link. The DOI could again be pressed into service as the preferred bookmarking link. E.g.

<link rel="schema.BOOKMARK" href="http://labs.ingenta.com/2007/03/7/bookmark"/>
<meta name="BOOKMARK.bookmarkLink" content="http://dx.doi.org/10.1000/1"/>

This is simple to deploy. It'd also be simple to extend existing bookmarking tools to support this without requiring specific updates from the owners of social bookmarking sites. If the tool found this embedded link it could use it, at the option of the user, instead of the current URL.
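A sketch of that tool-side behaviour (hypothetical Python, with an invented page; a real tool would fetch the page the user is bookmarking):

```python
from html.parser import HTMLParser

class BookmarkMetaParser(HTMLParser):
    """Looks for the proposed BOOKMARK.bookmarkLink embedded metadata."""
    def __init__(self):
        super().__init__()
        self.bookmark_link = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "BOOKMARK.bookmarkLink":
            self.bookmark_link = a.get("content")

def links_to_offer(page_html, page_url):
    """Return the candidate links a bookmarking tool could offer the user:
    the stable link first (if the page declares one), then the current URL."""
    p = BookmarkMetaParser()
    p.feed(page_html)
    return [link for link in (p.bookmark_link, page_url) if link]

page = '''<html><head>
<meta name="BOOKMARK.bookmarkLink" content="http://dx.doi.org/10.1000/1"/>
</head><body></body></html>'''
print(links_to_offer(page, "http://www.ingentaconnect.com/content/example"))
```

Pages without the metadata simply fall back to the current URL, so nothing breaks for sites that don't participate.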

The only downside I can see to this is the potential for abuse: it could be used to substitute links to an entirely different site and/or content for that which the user actually wants to bookmark. This is why I think users ought to be given the option to use the link, rather than having it silently substituted. If the owners of sites like CiteULike or Connotea decided to support this crude "microformat", they could easily deploy a simple trust metric, e.g. only accepting this metadata from known and approved sites.

I'd be interested in feedback on this, as it's something that we'll likely deploy on IngentaConnect in the next few weeks.


posted by Leigh Dodds at 3:41 pm



There have been quite a few interesting sites launched recently, each of which attempts to make certain kinds of publishing much easier. For example SlideShare allows users to publish presentations, while Swivel is for publishing data sets, e.g. Excel spreadsheets. The sites each allow users to upload content which is then wrapped in the usual set of Web 2.0 functionality: tagging, rating, blog integration, etc.

A recent addition to these micro-publishing sites is Scribd which describes itself as "a free online library where anyone can upload". The Scribd FAQ suggests that Scribd could be used for publishing "serious academic articles".

The site certainly has some features of interest to an academic publishing environment, although the content itself is currently pretty variable in quality! As such, it's another useful data point in the ongoing discussion of how academic publishing could evolve.

Amongst the features that I found particularly interesting are its use of FlashPaper to embed a PDF style viewer directly into web pages; the ability to download content in multiple formats, including MP3; the text and traffic analysis; and the integration with Print(fu), a separate service that allows purchasing of printed copies of PDFs.


posted by Leigh Dodds at 3:02 pm

