Augmenting Interoperability across Scholarly Repositories

Slides: 53
Provided by: herbe67
Learn more at: https://www.cs.odu.edu

1
The Open Archives Initiative Object Re-Use and
Exchange (ORE) Project
  • Michael L. Nelson (1)
  • Herbert Van de Sompel (2)
  • Carl Lagoze (3)
  • (1) Computer Science, Old Dominion University
  • (2) Research Library, Los Alamos National
    Laboratory
  • (3) Information Science, Cornell University

ORE is supported by the Andrew W. Mellon
Foundation with additional support of the
National Science Foundation
2
General information about OAI-ORE
3
OAI Object Re-Use and Exchange
  • OAI-ORE is a new effort conducted under the
    umbrella of the OAI
  • Supported by the Andrew W. Mellon Foundation,
    with additional support from the National Science
    Foundation
  • International effort: October 2006 - September
    2008
  • http://www.openarchives.org/ore/

4
Meeting in NYC, April 20-21, 2006
  • Supported by Microsoft, Mellon Foundation,
    Coalition for Networked Information, Digital
    Library Federation, JISC
  • Representatives from institutional repository
    projects, scholarly content repositories,
    registry projects, and various projects that touch on
    interoperability
  • See http://msc.mellon.org/Meetings/Interop/ for
    Agenda, Participants, Topics and Goals,
    Terminology, Presentations, Prototype
    demonstration, Meeting Report.

5
OAI Object Re-Use and Exchange
  • OAI-ORE project organization
  • Coordinators: Carl Lagoze, Herbert Van de Sompel
  • ORE Advisory Committee
  • ORE Technical Committee
  • ORE Liaison Group

6
ORE Technical Committee
  • Les Carr - University of Southampton (UK)
  • Leigh Dodds - Ingenta (UK)
  • Tim DiLauro - Johns Hopkins University
  • Dave Fulker - University Corporation for
    Atmospheric Research
  • Tony Hammond - Nature Publishing Group (UK)
  • Richard Jones - Imperial College (UK)
  • Peter Murray - OhioLINK
  • Michael Nelson - Old Dominion University
  • Ray Plante - National Center for Supercomputing
    Applications
  • Pete Johnston - Eduserv Foundation (UK)
  • Rob Sanderson - University of Liverpool (UK)
  • Simeon Warner - Cornell University
  • Jeff Young - OCLC

7
ORE Liaison Group
  • Leonardo Candela - EC DRIVER
  • Tim Cole - UIUC for DLF Aquifer
  • Julie Allinson - UKOLN for the JISC Digital
    Repository support effort (substituting for
    Rachel Heery)
  • Jane Hunter - University of Queensland for
    Australian Department of Education, Science and
    Technology
  • Savas Parastatidis - Microsoft
  • Thomas Place - University of Tilburg for DARE
    (soon to be renamed SurfShare)
  • Andy Powell - Eduserv for the DC community
  • Rob Tansley - Google for Google and DSpace

8
ORE Advisory Committee
  • Sayeed Choudhury - Johns Hopkins University
  • Gregory Crane - Tufts University
  • Lorcan Dempsey - OCLC
  • Mark Doyle - The American Physical Society
  • John Erickson - Hewlett-Packard Laboratories
  • Steve Griffin - National Science Foundation
  • Robert Hanisch - Space Telescope Science
    Institute
  • Jane Hunter - The University of Queensland
  • Clifford Lynch (chair) - Coalition for Networked
    Information
  • Liz Lyon - UKOLN
  • Peter Murray-Rust - University of Cambridge
  • Jim Ostell - National Center for Biotechnology
    Information
  • Sandy Payette - Cornell University
  • Robby Robson - Eduworks
  • MacKenzie Smith - MIT Libraries
  • Leo Waaijers - SURF Platform ICT and Research

9
Context of OAI-ORE: Standards and Protocols
10
OAI: It's Not Just for Metadata Harvesting Anymore
OAI-PMH                  OAI-ORE
Repository structure     Object structure
Metadata centric         Resource centric
Metadata harvesting      Object re-use (obtain, harvest, register)
  • OAI-PMH and OAI-ORE are complementary
  • you can do one without the other
  • you can do them together

11
An Early Formulation of the Problem
  • First noticed in how people would populate their
    Dublin Core records
  • people need the HTML splash page
  • crawlers need the PDF file
  • Ad-hoc conventions and methods are used to expose
    the repository's knowledge about the structure of
    the object
  • Next three slides taken from "Resource Harvesting
    Within the OAI-PMH Framework"
  • http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html

12
Dublin Core Encoding Type 1
<oai_dc:dc>
  <dc:title>A Simple Parallel-Plate Resonator Technique for Microwave Characterization of Thin Resistive Films</dc:title>
  <dc:creator>Vorobiev, A.</dc:creator>
  <dc:subject>ING-INF/01 Elettronica</dc:subject>
  <dc:description>A parallel-plate resonator method is proposed for non-destructive characterisation of resistive films used in microwave integrated circuits. A slot made in one ... </dc:description>
  <dc:publisher>Microwave engineering Europe</dc:publisher>
  <dc:date>2002</dc:date>
  <dc:type>Documento relativo ad una Conferenza o altro Evento</dc:type>
  <dc:type>PeerReviewed</dc:type>
  <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
  <dc:format>pdf http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf</dc:format>
</oai_dc:dc>
13
Dublin Core Encoding Type 2
<dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
<dc:relation>http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf</dc:relation>
14
Dublin Core Encoding Type 3
<dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
<dc:relation>http://resolver.unibo.it/00000014/</dc:relation>
<dc:relation>http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf</dc:relation>
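The three encodings put the same full-text link in three different places, so a harvester that wants the PDF has to guess. A minimal Python sketch of such a guess, assuming a harvester that scans dc:identifier, dc:relation and dc:format for a token ending in .pdf; the function name guess_fulltext and the abridged records are illustrative, not part of OAI-PMH or any repository software.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
SPLASH = "http://amsacta.cib.unibo.it/archive/00000014/"
PDF = SPLASH + "01/GaAs_1_Vorobiev.pdf"

def record(body: str) -> str:
    """Wrap DC elements in an oai_dc:dc container with both namespaces declared."""
    return f'<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">{body}</oai_dc:dc>'

# Abridged versions of the three encodings shown on the slides.
TYPE_1 = record(f"<dc:identifier>{SPLASH}</dc:identifier>"
                f"<dc:format>pdf {PDF}</dc:format>")
TYPE_2 = record(f"<dc:identifier>{SPLASH}</dc:identifier>"
                f"<dc:relation>{PDF}</dc:relation>")
TYPE_3 = record(f"<dc:identifier>{SPLASH}</dc:identifier>"
                "<dc:relation>http://resolver.unibo.it/00000014/</dc:relation>"
                f"<dc:relation>{PDF}</dc:relation>")

def guess_fulltext(record_xml: str) -> Optional[str]:
    """Hypothetical harvester heuristic: look in the DC elements where links
    have been seen and return the first whitespace-separated token ending in '.pdf'."""
    root = ET.fromstring(record_xml)
    for name in ("identifier", "relation", "format"):
        for element in root.iter(f"{{{DC}}}{name}"):
            for token in (element.text or "").split():
                if token.lower().endswith(".pdf"):
                    return token
    return None

for encoding in (TYPE_1, TYPE_2, TYPE_3):
    print(guess_fulltext(encoding))
```

The heuristic works here only because this file name happens to end in .pdf; a PostScript version, an extension-less download URL or a second PDF (say, supplementary material) would defeat it, which is the ad-hoc convention problem these slides describe.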
15
And more recently
  • Are repositories successfully exposing the
    full-text of articles (the PDF file or whatever)
    to Google rather than (or as well as) the
    abstract page?
  • Are we consistent in the way we create hypertext
    links between research papers in repositories?
  • (from Andy Powell's eFoundations blog)

16
As the objects get more complex, things get
worse. Rather than continue down that path,
let's back up and restart
17
Compound Information Objects
  • Units of scholarly communication are compound
    information objects
  • Identified, bounded aggregations of related
    information units that form a logical whole.
  • Components of a compound object may vary according
    to:
  • Semantic type: book, article, moving image,
    dataset, ...
  • Media type: PDF, HTML, JPEG, MP3, ...
  • Internal relationships: parts, views, ...
  • External relationships

18
Access Repositories
  • Compound objects are made accessible by a variety
    of scholarly repositories
  • Institutional repositories
  • Discipline-oriented repositories
  • Publisher repositories
  • Dataset repositories
  • Cultural heritage repositories
  • Learning object repositories
  • Digitized book and manuscript collections
  • Research-group and managed personal (ePortfolio)
    repositories

19
Access Repositories
  • Repositories expose compound objects in ways
    specific to the repository architecture:
  • Interfaces (API and user-oriented)
  • Identification schemes
  • Representation of compound objects
  • Mapping of compound objects and their
    components to the Web

20
Their Structure is Obfuscated When Mapped to the
Web
21
Structure Can Be Even Harder to Infer When
Server/Domain Boundaries are Crossed
  • http://foo.edu/repo1/object12/index.html
  • http://foo.edu/repo1/object12/object12.pdf
  • http://foo.edu/repo1/object12/metadata.dc
  • http://foo.edu/repo1/object12/errata.html

(Figure labels: http://foo.edu/repo1/object12/index.html alongside resources on other servers)
  • http://blurple.org/service?citing-author=Nelson
  • http://blurple.org/service?citing-paper=object12
  • http://bar.edu/mln/jcdl-2007.pdf
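A crawler that tries to infer an object's boundary from URLs alone might group resources by host and directory. The Python sketch below is a hypothetical heuristic (same_object is an illustrative name, not part of any crawler), applied to the URLs on this slide.

```python
from urllib.parse import urlsplit

# Resources co-located under one directory on foo.edu.
LOCAL = [
    "http://foo.edu/repo1/object12/index.html",
    "http://foo.edu/repo1/object12/object12.pdf",
    "http://foo.edu/repo1/object12/metadata.dc",
    "http://foo.edu/repo1/object12/errata.html",
]
# Related resources that live on other servers.
LINKED = [
    "http://blurple.org/service?citing-author=Nelson",
    "http://blurple.org/service?citing-paper=object12",
    "http://bar.edu/mln/jcdl-2007.pdf",
]

def same_object(seed: str, candidate: str) -> bool:
    """Hypothetical heuristic: a URL belongs to the seed's object only if it
    has the same host and sits under the seed's directory path."""
    s, c = urlsplit(seed), urlsplit(candidate)
    seed_dir = s.path.rsplit("/", 1)[0] + "/"
    return s.netloc == c.netloc and c.path.startswith(seed_dir)

seed = LOCAL[0]
grouped = [u for u in LOCAL + LINKED if same_object(seed, u)]
print(grouped)
```

It recovers the four files under object12/, but it can only classify the blurple.org and bar.edu resources as outside the object. Nothing in the URLs says whether a citing-paper service or a paper on another server is part of the object, or whether errata.html really belongs to it.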
22
Fun CDO (Compound Digital Object) Example: Flickr
(Figure labels: peers; public and private tags; service links)
Web href to http://www.flickr.com/photos/73977402@N00/162521629/, but img src to
http://farm1.static.flickr.com/62/162521629_f988d1e5fa.jpg
23
Scholarly CDO Example: CiteSeer
(Figure labels: original, remote version; representations; peers; representation)
http://citeseer.ist.psu.edu/lagoze01open.html (with semantics)
http://citeseer.ist.psu.edu/500650.html (without)
24
Scholarly CDO Examples: arXiv
(Figure labels: representations; service links; remotely held version; locally held versions)
http://arxiv.org/abs/astro-ph/0611775
25
More Scholarly Compound Digital Object
Possibilities
  • An issue of an overlay journal built from
    distributed ePrints
  • eScience resource combining text, data,
    simulations
  • eHumanities resource combining primary and
    derived content

26
Systems that manage digital objects:
  • Institutional repositories
  • Discipline-oriented repositories
  • Publisher repositories
  • Dataset repositories
  • Cultural heritage repositories
  • Learning object repositories
  • Digitized book and manuscript collections
  • Image repositories

Systems that leverage managed digital objects:
  • All of the repositories listed above
  • Search engines
  • Authoring tools
  • Citation management tools
  • Collaborative environments
  • Social network applications
  • Graph analysis tools
  • Preservation services
  • Workflow tools

27
OAI Object Re-Use and Exchange
  • Develop, identify, and profile extensible
    standards and protocols to allow repositories,
    agents, and services to interoperate in the
    context of use and reuse of compound digital
    objects beyond the boundaries of the holding
    repositories.
  • Aim for more effective and consistent ways:
  • to facilitate discovery of these objects,
  • to reference (link to) these objects (and parts
    thereof),
  • to obtain a variety of disseminations of these
    objects,
  • to aggregate and disaggregate these objects,
  • to enable processing by automated agents.

28
Taking the Web perspective
29
Working with the web architecture
  • Whatever we do must be congruent with the web
    architecture
  • Use existing capabilities where they are
    appropriate
  • Cleanly layer capabilities meeting the needs of
    our problem space
  • Provide the infrastructure for web-based
    information systems that exploit/enhance and
    therefore overlay on the existing web.

30
ORE: An Interoperability Layer
  • A projection of private object structure into the
    public web, using the web architecture:
  • URIs that identify
  • resources, which are items of interest, that,
  • when accessed through standard protocols such as
    HTTP, return
  • representations of current resource state
  • and which are linked via URI references
  • thus forming the graph that is the Web.

31
W3C Web Architecture
(Figure label: identifies)
32
W3C Web Architecture: more details
  • Aggregation: no standard way to describe a finite
    set of resources and relationships
  • Resource: first-class object; linkable
  • Relationship: usually untyped; link type
    ontologies not standardized
  • Representation: second-class object (identified
    only in the context of a resource); not linkable;
    many representations per resource
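The resource/representation split can be illustrated with a small Python model. This is a sketch under simplifying assumptions: a Resource holds its representations keyed by media type, and negotiate is a hypothetical stand-in for HTTP content negotiation that ignores q-values and wildcards. The URI http://example.org/object12 is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Resource:
    """First-class: it has a URI, so it can be the target of a link."""
    uri: str
    # Representations are keyed by media type and have no URI of their own,
    # so they can only be reached through the resource (second-class).
    representations: Dict[str, bytes] = field(default_factory=dict)

def negotiate(resource: Resource, accept: List[str]) -> Optional[str]:
    """Hypothetical, simplified content negotiation: return the first media type
    in the client's preference order that the resource can serve, else None.
    Real HTTP negotiation also weighs q-values and wildcards; ignored here."""
    for media_type in accept:
        if media_type in resource.representations:
            return media_type
    return None

paper = Resource(
    uri="http://example.org/object12",  # placeholder URI
    representations={
        "text/html": b"<html>abstract page</html>",
        "application/pdf": b"%PDF-1.4 ...",
        "application/postscript": b"%!PS-Adobe-3.0 ...",
    },
)

print(negotiate(paper, ["application/pdf", "text/html"]))  # PDF-capable client
print(negotiate(paper, ["image/png"]))                      # nothing acceptable
```

Because the PDF representation has no URI of its own in this model, a link can target the resource but not specifically its PDF form, which is the same citation problem as referencing the PDF version of a paper and not the PS version.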

33
Compound Object
astro-ph/0611775
Multiple Views, diverging in media-type, format,
and content-type
34
More complexity
boundary, logical unit
astro-ph/0611775
local, remote
lineage, version, citation, etc.
35
Compound Object
astro-ph/0611775
Let's publish it to the Web
36
(No Transcript)
37
Compound Digital Object mapped to the Web
  • Are repositories successfully exposing the
    full-text of articles (the PDF file or whatever)
    to Google rather than (or as well as) the
    abstract page?
  • Discovery: How does Google find all these
    resources that originate from the same digital
    object?
  • Boundary: How does Google know these resources
    originate in the same digital object?

38
Compound Digital Object mapped to the Web
  • Are we consistent in the way we create hypertext
    links between research papers in repositories?
  • Citation: Which resource to link to?
  • Citation: How to reference the PDF version (and
    not the PS version)?

39
Thoughts about a possible approach
40
Observation 1: Components of a compound object must
be published as resources in order to be
referenceable
41
Observation 2: The object as such (boundary,
structure, relationships) is invisible to Web
applications
42
Observation 2 bis: How about publishing a
resource that makes available a Resource Map
that formally expresses the boundaries of the object?
43
Observation 3: And now facilitate discovery of the
Resource Map (and hence of the compound object)
by Web applications
44
Observation 4 bis: Through the Resource Map, the
Web application sees the compound object
45
Observation 5: This approach reveals compound
objects in the Web graph
46
Resource Map available from ORE resource
  • Expresses an aggregation of resources and
    relationships in a machine-readable manner.
  • Describes a graph:
  • a finite set of resources and relationships among
    the resources
  • relationships between resources that are members of
    the aggregation and resources that are external to
    the aggregation
  • Can be used to express:
  • our scholarly compound objects
  • any aggregation of resources and
    relationships
  • Having a standardized format for Resource Maps
    opens the door to graph publishing (cf. the
    Semantic Web notion).
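A Resource Map as described above can be sketched as a finite graph with an explicit boundary. The Python model below is hypothetical: ResourceMap, relate, external and the relationship names are illustrative, not an ORE vocabulary, and the rule that every relationship must touch at least one member is an assumption drawn from the two kinds of relationships listed on this slide.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject URI, relationship, object URI)

@dataclass
class ResourceMap:
    """Hypothetical model of a Resource Map: its own URI plus a finite graph.
    The names are illustrative and do not come from any ORE specification."""
    uri: str
    members: Set[str] = field(default_factory=set)  # resources inside the boundary
    relationships: Set[Triple] = field(default_factory=set)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        """Add a relationship. Assumption: at least one end must be a member,
        so each edge is either internal or crosses the aggregation boundary."""
        if subject not in self.members and obj not in self.members:
            raise ValueError("relationship does not touch the aggregation")
        self.relationships.add((subject, predicate, obj))

    def external(self) -> Set[str]:
        """Resources the map relates to that lie outside the boundary."""
        ends = {s for s, _, _ in self.relationships} | {o for _, _, o in self.relationships}
        return ends - self.members

# Usage with the foo.edu example; the map URI and relationship names are made up.
base = "http://foo.edu/repo1/object12/"
rmap = ResourceMap(uri=base + "rem")
rmap.members |= {base + "index.html", base + "object12.pdf", base + "metadata.dc"}
rmap.relate(base + "index.html", "hasFormat", base + "object12.pdf")             # internal
rmap.relate(base + "object12.pdf", "cites", "http://bar.edu/mln/jcdl-2007.pdf")  # crosses boundary
print(sorted(rmap.external()))
```

Making the member set explicit is what lets an agent tell internal relationships from links that leave the object, something a crawler cannot reliably infer from URLs alone.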

47
Use and Re-Use enabled by the ORE resource
  • The ORE resource has a URI: HTTPORE
  • let's call that ORE resource a Resource Map
  • HTTPORE identifies a graph (cf. the Semantic Web
    notion of a Named Graph)
  • The Resource Map is available via HTTP GET on
    HTTPORE
  • HTTPORE can become the key for object re-use:
    Obtain, Harvest, Register (cf. Web 2.0 mash-ups)
  • The Resource Map is not the Resource
    (apologies to Alfred Korzybski)
  • Crawlers, agents will initially transact with the
    Resource Map, not the components of the resource
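The point that HTTPORE identifies a graph can be made concrete with N-Quads, a W3C line-based RDF syntax whose optional fourth term names the graph each statement belongs to. In this Python sketch the Resource Map URI stands in for HTTPORE and http://example.org/terms/ is a placeholder predicate namespace; the slides do not specify any serialization.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

def to_nquads(graph_uri: str, triples: Iterable[Triple]) -> List[str]:
    """Serialize triples as N-Quads lines whose fourth term is graph_uri.
    Assumes every term is an absolute IRI without spaces or '>' (no literals,
    no blank nodes), so no escaping is performed."""
    return [f"<{s}> <{p}> <{o}> <{graph_uri}> ." for s, p, o in sorted(triples)]

HTTPORE = "http://foo.edu/repo1/object12/rem"  # hypothetical Resource Map URI
EX = "http://example.org/terms/"                 # placeholder predicate namespace
OBJ = "http://foo.edu/repo1/object12/"

quads = to_nquads(HTTPORE, [
    (OBJ, EX + "hasPart", OBJ + "object12.pdf"),
    (OBJ, EX + "hasPart", OBJ + "index.html"),
])
for line in quads:
    print(line)
```

Because the graph name is the same URI an agent would dereference with HTTP GET, statements merged from many maps still record which Resource Map asserted them.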

48
More About Resource Map Discovery
  • Two general approaches:
  • create new resources that describe the boundary
    relationships that make up the CDO:
  • web crawling (cf. sitemaps)
  • a new metadataPrefix in OAI-PMH repositories
  • Atom feeds
  • instrument existing resources to point to the
    resources:
  • HTTP content negotiation
  • HTTP headers
  • HTML microformats
  • Selective discovery:
  • you should never get a Resource Map unless you
    really asked for it; existing harvesters and
    crawlers will not break
  • Resource Maps are for machines, not humans
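Two of the "instrument existing resources" options, HTTP headers and HTML, can be sketched in Python. The sketch assumes a hypothetical link relation named resourcemap (the slides name the mechanisms but no relation value), and the Link header parser is deliberately minimal rather than a complete implementation of the header syntax.

```python
import re
from html.parser import HTMLParser
from typing import List

REL = "resourcemap"  # hypothetical link relation; not defined by the slides

class LinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes REL."""
    def __init__(self) -> None:
        super().__init__()
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and REL in rels and attributes.get("href"):
            self.found.append(attributes["href"])

def from_html(page: str) -> List[str]:
    finder = LinkFinder()
    finder.feed(page)
    return finder.found

def from_link_header(header: str) -> List[str]:
    """Minimal parser for a Link header such as
    '<http://a/rem>; rel="resourcemap", <http://a/x>; rel="next"'.
    Assumes no commas inside URIs or parameter values."""
    targets = []
    for uri, params in re.findall(r'<([^>]*)>\s*((?:;[^,]*)*)', header):
        match = re.search(r'rel="?([^";]+)"?', params)
        if match and REL in match.group(1).lower().split():
            targets.append(uri)
    return targets

page = ('<html><head><link rel="stylesheet" href="/s.css">'
        '<link rel="resourcemap" href="http://foo.edu/repo1/object12/rem"></head></html>')
header = ('<http://foo.edu/repo1/object12/rem>; rel="resourcemap", '
          '<http://foo.edu/repo1/object12/errata.html>; rel="alternate"')
print(from_html(page))
print(from_link_header(header))
```

A general-purpose crawler has little reason to dereference a <link> element or Link header carrying a relation it does not recognize, so a map advertised this way is mostly fetched by agents that ask for it, in line with the selective discovery point. The OAI-PMH option would instead expose Resource Maps as records under a new metadataPrefix.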

49
So, where does ORE stand?
50
OAI-ORE Current Status
  • Ongoing definition of the ORE framework:
  • Reach joint problem statement
  • Issues regarding identification
  • Model for ORE resource
  • Publishing ORE resources to the Web
  • Discovering ORE resources
  • Review of appropriate technologies for ORE Model
    and Resource Map:
  • Atom
  • DID/DIDL, IMS/CP, METS, Ramlet
  • RDF, RDF/XML
  • Dublin Core Abstract Model

51
OAI-ORE Current Status
  • Explore demonstrators using these concepts in
    preparation for the May 2007 ORE Technical Committee
    meeting
  • After the May 2007 meeting:
  • Hopefully work towards alpha specs for ORE
    resource, Resource Map, discovery of ORE resource
  • Experimentation with alpha specs

52
OAI-ORE Afterwards
  • Look into core services (Obtain, Harvest,
    Register) in terms of ORE resource and Resource
    Map.

53
Questions
Further information: http://www.openarchives.org/ore/