Archive for September, 2007

accelerating ORE

For three days in August I was at Cornell University in Ithaca, New York, attending a meeting of the Object Reuse and Exchange acceleration project (ORE). This meeting followed on from the two-day technical committee meeting in New York City in May and means that I have now spent five full days discussing this ORE stuff face-to-face, alongside a bunch of hours reading, discussing by phone and trying to explain the project to other people. The main ORE project has been running since October last year and I don’t think it would be unkind to say that the technical committee has been struggling with defining the scope of the project and answering the question “just what is it that you are trying to do here?” The acceleration project, funded by Microsoft and coming at the project’s mid-way point, is probably best described as an injection of steroids to kick-start the real technical work that will occupy the next 12 months. Its aim is to produce an alpha set of technical documentation by the end of September which will then be used. The focussed effort of these three days, following on from the two technical meetings beforehand, has been successful in coming up with some concrete use cases, requirements, agreements about the ORE abstract model and ideas about how to serialise this model. Over the next few weeks, a small group will be working very hard to put the results of this meeting into a set of documents that will be ratified by the technical committee.

My take on what ORE is trying to do is that it is creating a bridge between the document-centric scholarly digital library world and the data-centric semantic web community. Its primary aim from the start has been to provide a mechanism for describing compound information objects, in particular to provide a mechanism for describing their boundaries and identifying the types and relationships of their components on the Web, in a way that can facilitate re-use and exchange in a machine-to-machine way. After much discussion ORE is coming towards consensus on a layered approach that will allow a range of implementers with different requirements and different levels of technical skill to describe and share their resources in more meaningful ways. These range from creating a simple manifest listing of information components, to a rich description that expresses the types of components, relationships between them and other resources, alongside essential metadata to aid discovery and re-use. Examples of use cases include a digitized book with a hierarchy of chapters, sections and individual pages made available in different document formats; a scholarly paper with a range of different versions available; and a YouTube page containing an embedded video clip, user comments and references to related resources. What was gratifying about the meeting was that the discussions and disagreements ultimately helped all of us reach a point of tentative consensus, understanding and agreement. Some of the main agreements, which I think can usefully be summarised: as a minimum a resource map should list the components of a compound object in a simple manifest; HTTP URIs should be used to identify the resource map and the compound object; where ‘good’ URIs already exist these can and should be used, but where new URIs are needed these should be created; the resource map may provide richer information about resource types, relationships and other metadata.
One thing I was particularly pleased about was the movement away from re-invention, towards using existing standards and formats as much as possible, such as ATOM, RDF/XML, XHTML and Dublin Core.
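
To make the layered idea a bit more concrete, here is a rough Python sketch of how I picture a resource map: a simple manifest at the bottom, with optional types, relationships and metadata layered on top. This is only my own illustration, not anything taken from the ORE documents, which are still being written. The class names, the predicate strings ("describes", "hasComponent", "type", "isFormatOf") and the example.org URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# All names below are illustrative assumptions, not ORE vocabulary.


@dataclass
class Component:
    """One part of a compound object, identified by an HTTP URI."""
    uri: str
    resource_type: Optional[str] = None  # optional richer layer
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (predicate, target URI)


@dataclass
class ResourceMap:
    """Describes a compound object; both have their own HTTP URI."""
    uri: str
    compound_object_uri: str
    components: List[Component] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def manifest(self) -> List[str]:
        """Minimal layer: just list the component URIs."""
        return [c.uri for c in self.components]

    def triples(self) -> List[Tuple[str, str, str]]:
        """Richer layer: subject/predicate/object statements."""
        out = [(self.uri, "describes", self.compound_object_uri)]
        for key, value in self.metadata.items():
            out.append((self.compound_object_uri, key, value))
        for c in self.components:
            out.append((self.compound_object_uri, "hasComponent", c.uri))
            if c.resource_type is not None:
                out.append((c.uri, "type", c.resource_type))
            for predicate, target in c.relations:
                out.append((c.uri, predicate, target))
        return out


# A digitized book chapter available in two formats (made-up URIs).
pdf = "http://example.org/book-1/chapter-1.pdf"
html = "http://example.org/book-1/chapter-1.html"
book = ResourceMap(
    uri="http://example.org/rem/book-1",
    compound_object_uri="http://example.org/book-1",
    components=[
        Component(pdf),
        Component(html, resource_type="chapter", relations=[("isFormatOf", pdf)]),
    ],
    metadata={"title": "An Example Book"},
)

for component_uri in book.manifest():
    print(component_uri)
```

An implementer who only needs the boundary of the object can stop at manifest(); one who wants the richer description can walk triples() and serialise the statements in whatever existing format suits them.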

After so much time spent involved in the detail of this project I find that I’m more excited about what it can offer than ever. I’ve heard various questions (what is it going to do for scholarly communication, really? is it offering anything new? isn’t this just RDF?) but I think what is key for ORE is more about the focus it brings to the ‘compound objects’ (or whatever the chosen terminology is in the end!) issue than what it will ‘produce’. Of particular importance, for me, is the need for making resources available via the Web using pre-existing standards and formats, for encouraging adoption of standard mechanisms for sharing, describing and re-using objects, and doing this in a way that allows traditional ‘documents’ and semantic ‘data’ to co-exist and benefit each other. This fits well with the work we’re doing with SWORD and with the wide use of RSS/ATOM. Once the lightweight bootstrap ORE specification is available, it will be up to repositories and other scholarly communications tools and systems to enrich and profile ORE for their own communities.
