All My Eye

Here's the unedited version of a paper I wrote recently for Research Information (Oct/Nov 2006 issue):

There's been no hotter topic in 2006 than Web 2.0. Much has been made of its community engagement: putting research back into the hands of the researchers, fulfilling the web's true potential for user interaction. And like any new Big Thing, it has its sceptics, not least those who simply recoil from buzzword worship. But there's no doubting the widespread nature of the attitudinal shift. Where do publishers fit into this brave new user-centric world?

Attempts to define the term "Web 2.0" engender heated debate amongst the technical community. It is broadly associated with several concepts of the moment -- the semantic web, the long tail, social software -- and its use, whilst contentious, is pervasive. Web 2.0 philosophy may be viewed as a threat to traditional publishers -- in that, amongst other things, it provides researchers with alternative channels for content dissemination -- but those embracing its technologies are well-positioned to expand and thus protect their role in the information chain.

Positive engagement will involve facing, and overcoming, some fears. Key to the concept of the semantic web is the sharing of structured data by providing an interface (API) with which other web applications can interact. For publishers, whose revenue models often rely upon restricting access to their data, uptake is held back by issues of strategy rather than technology. And it's not just the publishers who are cautious; witness those librarians who warn against the perils of social tagging, by which users of social software such as del.icio.us and Flickr tag data according to self-created taxonomies (known as folksonomies). Equally, many researchers are unconvinced of the accuracy and thus the value of user-created resources such as Wikipedia.

But there's more than one way to keep abreast of a new wave. Think of the movement simply as a way to engage users -- both in terms of enabling them to interact, and in terms of helping them to access and manipulate the data they need. It becomes easy to rewrite your own history and consider yourself 2.0. Hey, weren't specialist publishers "monetizing the long tail" way before eBay -- and haven't we been taking advantage of the e-journal revolution to reach out further to niche markets? Rather than trying comprehensively to embrace the new order, publishers should look to build on those areas where it overlaps with their existing methodology; there's no sense throwing the business model out with the bath water purely to bandy about a buzzword. You'll be surprised at the levels on which you can engage without undermining -- nay, while even supporting -- current revenue streams or strategies. For example, publishers have long been encouraging third-party use of open data to drive traffic to access-controlled content, by making structured metadata openly available (to abstract & indexing databases, or via the Open Archives Initiative), and supporting predictable linking syntaxes.

Or take remixing. More progressive implementations of RSS, such as "recent content" or "most viewed articles" feeds, are semantic web-friendly in that they can be retrieved and 'remixed' by another site (such as a library OPAC). One could further argue that our industry was an early adopter of remixing in its development of, and support for, federated searching, which espouses the seamless spirit of Web 2.0 by providing a single interface to multiple data sets. Elsewhere, early adopters have created blogs (such as Ingenta's All My Eye) to complement or replace the role of traditional newsletters in publicising service developments and product announcements; the format lends itself well to syndication and thus increased use of the content. Blogs can also be tied in with specific journals as an extension to the discussion fora of the '90s; enabling comments on postings can drum up debate and encourage usage of the journal articles to which they relate. This capitalises on the pre-existing status of a given journal as the centre of its community, and the freely available content can serve to draw users in to the paid-for papers.
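To make the remixing idea concrete, here is a minimal sketch of how another site (a library OPAC, say) might take a "recent content" feed and turn it into a list of links for its own pages. Everything specific here is an assumption for illustration: the feed content, the journal name and the publisher.example.org addresses are invented, and the function names are hypothetical. A real implementation would fetch the feed over HTTP rather than read it from a string.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical "recent content" feed; the journal name and URLs are invented.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Journal of Examples: recent content</title>
    <link>http://publisher.example.org/joe</link>
    <description>Latest articles</description>
    <item>
      <title>A first article</title>
      <link>http://publisher.example.org/joe/12/3/45</link>
    </item>
    <item>
      <title>A second article</title>
      <link>http://publisher.example.org/joe/12/3/67</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return a list of (title, link) pairs, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def remix_as_html(entries, limit=5):
    """Render the first `limit` entries as an HTML list for the remixing site."""
    rows = [
        '<li><a href="{}">{}</a></li>'.format(escape(link), escape(title))
        for title, link in entries[:limit]
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Calling remix_as_html(parse_feed(SAMPLE_FEED)) gives the host site a ready-made block of links back to the publisher's articles, which is the traffic-driving effect described above.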

Then there's the long tail. How about promoting less mainstream content within your site by adding "more like this" links from popular articles, or enabling users to vote for articles as is possible on sites such as Digg -- or even, if you're brave enough, posting a "least read" list to catch users' attention? Of course, there's no better way to maximise visibility and use of all your content than enabling it to be indexed by Google. The technology giant is also held responsible for the rise of another Web 2.0 phenomenon: the mashup, whereby publicly available data sets are combined to provide dynamic new services. The launch of the Google Maps API in June 2005 encouraged a plethora of programmers to create Ajax applications that draw on Google Maps' data -- such as Map My Run, which also brings in data from the US Geological Survey to provide elevations of plotted routes. At Ingenta we're piloting some projects which utilise a variety of data sets to enrich the full text we host, and both OCLC and Talis have recently announced prizes for the best mashups. Google is not the only search player to board the Web 2.0 train; other providers such as Yahoo! are developing social search tools which filter results based on folksonomies and user preferences, whilst new search engine Rollyo allows you to create a "searchroll" to restrict results to sites you trust -- a user-defined extension to the concept of Elsevier's Scirus.

In spite of this technolust, we should remember that we're a long way from critical mass: non-sophisticated users still make up the majority, and aren't interested in the more collaborative aspects of Web 2.0. What they do want from it is an information-rich user experience: more data to supplement the published literature, such as the additional details necessary to reproduce an experiment; or a means to feed back responses to authors and thus engender discussion which could further the research. Informal communication media (technical presentations, conference papers, pre-prints, even email and phone discussions) can be harnessed to strengthen the message of formal communication channels and to counteract the length of the formal process.

A range of technologies can be employed to support this, from community "trust" mechanisms, which can verify the expertise of participants (as eBay feedback attests to transactor reliability), to sophisticated data storage. Ingenta's award-winning new Metastore is a component-based data repository which allows us to store and deliver raw research data alongside the associated journal article. The technology behind it, RDF, is popular amongst Web 2.0 advocates because its common data model makes it readily extensible and remixable. Its flexibility allows us to extend the formats in which research results can be communicated, and to embrace the informal media which more traditional online publishing technologies preclude. Longer term we anticipate that authors themselves could add supplementary data directly to Metastore; whilst author self-archiving of papers is currently sluggish, espousal of collaborative enterprises such as Nature's Omics or Signaling Gateways suggests it is not unreasonable to expect stronger support longer term.
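For readers unfamiliar with RDF, the "common data model" boils down to statements of the form subject, predicate, object. The sketch below imitates that model with plain Python tuples to show why it is easy to extend: adding supplementary data is simply a matter of adding more statements about the same article. This is an illustration only, not Metastore's actual schema or code. The article and data set addresses are invented, the example.org vocabulary is hypothetical, and a production system would use a proper RDF library and store rather than a Python set.

```python
# Dublin Core element namespace (a real, widely used metadata vocabulary).
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical vocabulary, invented for this illustration.
EX = "http://example.org/terms/"

# Invented identifiers for an article and its raw data set.
ARTICLE = "http://publisher.example.org/joe/12/3/45"
DATASET = "http://publisher.example.org/data/45-raw"

# Each statement is a (subject, predicate, object) triple.
article_metadata = {
    (ARTICLE, DC + "title", "A first article"),
    (ARTICLE, DC + "creator", "A. N. Author"),
}

# Supplementary data arrives later, perhaps from the author.
supplementary = {
    (ARTICLE, EX + "hasDataset", DATASET),
    (DATASET, DC + "format", "text/csv"),
}

# Because both sets share one model, "remixing" them is a plain set union.
graph = article_metadata | supplementary


def objects(triples, subject, predicate):
    """Return, sorted, every object stated for the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)
```

Asking objects(graph, ARTICLE, EX + "hasDataset") returns the data set's address, even though the original article record knew nothing about data sets; no existing statement had to change to accommodate the new one.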

Key to the availability of such data is the business model by which it can be accessed. Whilst Web 2.0 is lauded for going hand-in-hand with open source, such generosity is not compulsory -- but a flexible e-commerce framework is advantageous to encourage maximum usage. Of equal value is granular addressability of content, whereby URLs are clean, compact, assigned at the lowest possible level and, preferably, predictable. Interoperability is clearly critical to the collaborative environment and, as elsewhere, work towards standardisation in this area will pave the way for further uptake.
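What "predictable" means in practice is that a third party who knows a few facts about an article can construct its address without looking anything up. The sketch below assumes one possible scheme, /content/ISSN/volume/issue/first page. That scheme, the host name and the ISSN used are all invented for illustration and do not describe any real publisher's URLs.

```python
import re

# Hypothetical host, invented for this illustration.
BASE = "http://publisher.example.org"

# Assumed URL scheme: /content/<issn>/<volume>/<issue>/<first page>
_ARTICLE_PATH = re.compile(r"^/content/(\d{4}-\d{3}[\dX])/(\d+)/(\d+)/(\d+)$")


def issue_url(issn, volume, issue):
    """Address of a whole issue: one level up from the article."""
    return "{}/content/{}/{}/{}".format(BASE, issn, volume, issue)


def article_url(issn, volume, issue, first_page):
    """Address of a single article, the lowest level in this scheme."""
    return "{}/{}".format(issue_url(issn, volume, issue), first_page)


def parse_article_path(path):
    """Recover the citation details from an article path, or None if it doesn't fit."""
    match = _ARTICLE_PATH.match(path)
    if match is None:
        return None
    issn, volume, issue, first_page = match.groups()
    return {
        "issn": issn,
        "volume": int(volume),
        "issue": int(issue),
        "first_page": int(first_page),
    }
```

Because the article address is built by extending the issue address, trimming the last segment takes a user up a level, which is one sense in which such URLs are "clean".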

In summary? Whilst early adopters are thriving on the additional functionality that Web 2.0-styled services can supply, the majority of researchers continue to have relatively simple requirements. Some publishers still need to focus on successfully delivering the basics before expending resource to deal in the bells and whistles -- and even then it is critical to serve our communities appropriately, with data and tools that will add genuine value to their workflow. Given the frenzied debate around the term Web 2.0, it seems inevitable that its usage will decline as providers try to dissociate from the media hype. Many in the industry are predicting a dotcom-style bust, as the bubble bursts for operations trading heavily on their Web-2.0-ability. If it does, those who survive -- like their dotcom-era predecessors -- will be those who have taken steps to provide user-focussed services whilst maintaining a strategy with substance, not hype, at its core.

Automatic for the people: publishing for a user-centric generation
