Digital Library Interoperability Architecture - PowerPoint PPT Presentation

1
Digital Library Interoperability Architecture
  • CS 502 20030305
  • Carl Lagoze Cornell University

2
Interoperability is multidimensional
  • Syntax
      • XML
  • Semantics
      • RDF/RDFS/OWL
  • Vocabularies/Ontologies
      • Dublin Core/ABC/CIDOC-CRM
  • Search and discovery
      • Z39.50
      • SDLIP
      • ZING
  • Document models
      • METS
      • FEDORA

3
Contrast to Distributed Systems
  • Distributed systems
  • Collections of components at different sites that
    are carefully designed to work with each other
  • Heterogeneous or federated systems
  • Cooperating systems in which individual
    components are designed or operated autonomously

4
Measuring success of interoperability solutions
  • Degree of component autonomy
  • Cost of infrastructure
  • Ease of contributing components
  • Ease of using components
  • Breadth of task complexity supported by the
    solution
  • Scalability in the number of components

5
Families of interoperability solutions
6
Interoperability Trade-offs
Metadata Harvesting
Dienst
7
Dienst
  • is a protocol and reference implementation of a
    distributed digital library service
  • where a network of services provide
  • World Wide Web browser access,
  • uniform search over distributed indexes,
  • and access to structured documents.

8
Why a service based protocol?
  • Expose the operational semantics of the services
    through an API,
  • to permit flexible integration of the services,
  • and use of the services by other
    clients/consumers/services.

9
Defining the services
  • Repository: deposit, storage, and access to
    structured documents
  • Index: process queries on documents and return
    handles
  • Query Mediator: route queries to appropriate
    indexes
  • Collection: define services and content in
    logical collections
  • User Interface: human-oriented front end for
    services
  • Name Server: resolve URNs (handles) to
    document location(s)

10
Dienst Services
WWW browser
User Interface
11
Defining the protocol
  • Structured messages
  • Service
  • Version
  • Verb
  • Arguments
  • Template
  • /Dienst/<service>/<version>/<verb>?/<arguments>
  • Example
  • /Dienst/Repository/4.0/Formats/ncstrl.cornell/TR94-1418
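
A minimal sketch of how a client might assemble a request path from the message parts listed above. The function name `dienst_path` is hypothetical, and treating the arguments as slash-separated path segments is an assumption read off the example path, not taken from the Dienst specification.

```python
def dienst_path(service, version, verb, *args):
    """Join the message parts into a /Dienst/... request path.

    Assumption: arguments follow the verb as extra path segments,
    as in the example on the slide.
    """
    parts = ["Dienst", service, version, verb, *args]
    return "/" + "/".join(parts)


# Rebuild the example from the slide.
path = dienst_path("Repository", "4.0", "Formats", "ncstrl.cornell", "TR94-1418")
print(path)
```

Keeping service, version, and verb as separate parameters mirrors the structured-message idea: each part can be validated or versioned on its own before the path is built.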

12
Why a Document Model?
  • Documents in current web are both
  • Unstructured (GET)
  • Chaotic (CGI)
  • Different views and pieces of contents are needed
    for
  • Bandwidth reduction
  • Rights management
  • Usability

13
Dienst Document Model
  • Metadata: support for multiple descriptive
    formats
  • Views: alternative expression or structural
    representation of the content encapsulated in the
    digital object
  • Divs: hierarchically nested structure contained
    in a view
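
The three parts of the document model can be pictured as a small data structure. This is only an illustrative sketch: the class names, field names, and the sample view and div names (`body`, `chapter1`, and so on) are hypothetical and do not come from the Dienst implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Div:
    """A structural division; divs nest hierarchically."""
    name: str
    children: list["Div"] = field(default_factory=list)


@dataclass
class View:
    """One alternative expression of the content, holding top-level divs."""
    name: str
    divs: list[Div] = field(default_factory=list)


@dataclass
class DigitalObject:
    handle: str
    metadata_formats: list[str] = field(default_factory=list)
    views: list[View] = field(default_factory=list)


def div_paths(div, prefix=""):
    """Yield a slash-separated path for a div and each nested div."""
    path = f"{prefix}/{div.name}"
    yield path
    for child in div.children:
        yield from div_paths(child, path)


body = View("body", [Div("chapter1", [Div("section1")]), Div("chapter2")])
report = DigitalObject("cul.cs/TR90-1160", ["dc"], [body])
for view in report.views:
    for div in view.divs:
        for path in div_paths(div, "/" + view.name):
            print(path)
```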

14
Expressing the document model in the protocol
  • Structure: expose the views and structure for
    the digital object
  • Disseminate: select the structural component
    (and packaging of it) to disseminate
  • List-Meta-Formats: list available descriptive
    formats

15
Protocol Demonstration
  • http://techreports.library.cornell.edu:8081/Dienst/Repository/4.0/List-Contents?file-after=2003-01-01
  • http://techreports.library.cornell.edu:8081/Dienst/Repository/1.0/Disseminate/cul.cs/TR90-1160/23oams/xml
  • http://techreports.library.cornell.edu:8081/Dienst/Repository/2.0/Structure/cul.cs/TR90-1160
  • http://techreports.library.cornell.edu:8081/Dienst/Repository/4.0/Formats/cul.cs/TR90-1160?part=body
  • http://techreports.library.cornell.edu:8081/Dienst/Repository/1.0/Disseminate/cul.cs/TR90-1160/body/inline?pageimage=3

16
Collection Service
  • Periodically polled by each user interface server
    for
  • elements of the collection
  • index servers for the collection

User Interface Servers
Index Servers
17
Deploying Collection Globally
  • Internet connectivity varies considerably
  • Good connectivity between nodes often does not
    correspond to geographic proximity
  • Connectivity Region - a group of nodes on the
    network that among them have good connectivity,
    relative to nodes outside of the region.

18
Connectivity Regions
  • When possible, route queries within the region
  • In case of failure, use an alternate either
    within the region or in a nearby region
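
The routing rule on this slide can be sketched as a simple preference order: try servers in the client's own region first, then fall back to nearby regions. Everything named here (`choose_index_server`, the region names, the `is_up` health check) is hypothetical and only illustrates the idea; it is not the Dienst implementation.

```python
def choose_index_server(client_region, servers, is_up, nearby):
    """Pick an index server, preferring the client's own region.

    servers: dict mapping region name -> list of server names
    is_up:   callable(server) -> bool, a hypothetical health check
    nearby:  dict mapping region name -> regions ordered by closeness
    """
    for region in [client_region, *nearby.get(client_region, [])]:
        for server in servers.get(region, []):
            if is_up(server):
                return server
    return None  # nothing reachable in this region or any nearby one


servers = {"north-america": ["na-1", "na-2"], "europe": ["eu-1"]}
nearby = {"north-america": ["europe"]}
down = {"na-1"}
print(choose_index_server("north-america", servers, lambda s: s not in down, nearby))
```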

19
Origins of the OAI
  • Increasing interest in alternative scholarly
    publishing solutions, e.g., LANL arXiv
  • Increasing impact through federation
  • UPS meeting, Santa Fe, October 1999
  • Representatives of various ePrint, library, and
    publishing communities
  • Goal: definition of an interoperability framework
    among ePrint providers
  • Reality: rich interoperability protocols like
    Dienst are too complicated for widespread
    deployment
  • Result: Santa Fe Convention, interoperability
    through metadata harvesting

20
The World According to OAI
Service Providers
Discovery
Current Awareness
Preservation
Data Providers
21
Yes, it's about resource discovery over
distributed collections
metadata
Author, Title, Abstract, Identifier
22
Facilitating/Monitoring Longevity of Distributed
Content
Preservation Service
23
Personalization of Content
24
Cross-Repository Reference Linking
Linkage Service
25
OAI Technical Infrastructure: Key technical
features
  • Deploy-now technology: 80/20 rule
  • Two-party model: providers (data providers) and
    consumers (service providers)
  • Simple HTTP encoding
  • XML schema for some degree of protocol
    conformance
  • Extensibility
  • Multiple item-level metadata
  • Collection level metadata

26
Content and Metadata
Item (metadata)
repository
resource
record
010010
27
http://www.openarchives.org/OAI/openarchivesprotocol.html
28
record
<record>
  <header>
    <identifier>oaieg001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>
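
As a rough illustration of how a harvester might read such a record, the sketch below pulls the header fields and the title out with Python's standard `xml.etree.ElementTree`. The helper name `summarize_record` is hypothetical, the `about` section is omitted for brevity, and the identifier is copied exactly as it appears in this transcript.

```python
import xml.etree.ElementTree as ET

RECORD_XML = """<record>
  <header>
    <identifier>oaieg001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
</record>"""


def summarize_record(xml_text):
    """Return the identifier, datestamp, and title of one record."""
    root = ET.fromstring(xml_text)
    # The dc element declares a default namespace, so its children
    # must be looked up with a namespace prefix mapping.
    ns = {"dc": "http://purl.org/dc"}
    return {
        "identifier": root.findtext("header/identifier"),
        "datestamp": root.findtext("header/datestamp"),
        "title": root.findtext("metadata/dc:dc/dc:title", namespaces=ns),
    }


print(summarize_record(RECORD_XML))
```

The header carries what the harvesting protocol itself needs (identifier and datestamp), while the metadata part is format-specific, which is why only the latter requires namespace handling here.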
29
selective harvesting - datestamps
30
selective harvesting - sets
S2
31
set specifics
  • repositories define hierarchical organization
  • each item in a repository may be organized in one
    set, several sets, or no sets at all
  • meaning of sets or of set hierarchy is not
    defined in protocol
  • individual communities may formulate common set
    configurations
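
A small sketch of set-based selection. Because the protocol leaves the meaning of the hierarchy undefined, the colon-separated naming used here (for example `S1:physics` as a child of `S1`) is an assumption for illustration, as are the item names and the helper `in_set`.

```python
def in_set(item_sets, requested):
    """True if any of the item's sets is `requested` or nested below it."""
    return any(s == requested or s.startswith(requested + ":") for s in item_sets)


# An item may sit in one set, several sets, or no set at all.
items = {
    "item-1": ["S1"],
    "item-2": ["S1:physics", "S2"],
    "item-3": [],
}

selected = [name for name, sets in items.items() if in_set(sets, "S1")]
print(selected)
```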

32
HTTP encoding - requests
BASE-URL -----------> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1
GET:
    http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
POST:
    POST http://an.oa.org/OAI-script HTTP/1.0
    Content-Length: 78
    Content-Type: application/x-www-form-urlencoded

    verb=ListIdentifiers&set=S1
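
The two encodings can be sketched with Python's standard `urllib.parse.urlencode`, which produces the same `key=value&key=value` string for both the GET query and the form-encoded POST body. The helper names `get_url` and `post_body` are hypothetical; the base URL is the placeholder from the slide, so nothing here contacts a real server.

```python
from urllib.parse import urlencode

BASE_URL = "http://an.oa.org/OAI-script"  # placeholder host from the slide


def get_url(base_url, **arguments):
    """Encode the keyword arguments into the query string of a GET URL."""
    return base_url + "?" + urlencode(arguments)


def post_body(**arguments):
    """Encode the same arguments as an x-www-form-urlencoded POST body."""
    return urlencode(arguments).encode("ascii")


print(get_url(BASE_URL, verb="ListIdentifiers", set="S1"))
# Reserved characters such as ":" are percent-escaped automatically.
print(post_body(verb="GetRecord", identifier="oai:arXiv:0001"))
```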
33
HTTP encoding - responses
<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri"
    xmlns:xsi="http://w3.namespace.uri"
    xsi:schemaLocation="http://oai.namespace.uri http://oai.schemaURL">
  <responseDate>2000-19-01T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record> record contents </record>
  additional records
</GetRecord>
34
metadata prefix and schema
  • support for harvesting multiple metadata formats
  • metadata schema: each format must have a
    validating XML schema at a publicly accessible
    URL (communities may define shared formats and
    schemas)
  • metadata prefix: each repository maps a prefix to
    the schema it supports, which is used in protocol
    requests
  • support for unqualified Dublin Core is mandatory
  • DC OAI record syntax that builds on base DCMI
    schema
  • reserved prefix: oai_dc

35
flow control
36
flow control specifics
  • applies to all protocol requests that return
    lists: ListRecords, ListIdentifiers, ListSets
  • resumptionToken is opaque
  • semantics of partitioning of responses within
    resumption requests is undefined
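
A harvester-side sketch of flow control: keep re-issuing the request with the returned token until no token comes back. The `fetch_page` callable and the in-memory `fake_repository` are hypothetical stand-ins for real HTTP requests, and resending the verb together with the token is an assumption of this sketch.

```python
def harvest(fetch_page, verb, **arguments):
    """Yield every item of a list request, following resumption tokens.

    fetch_page is a hypothetical callable that takes the request
    arguments and returns (items, resumption_token_or_None).
    """
    items, token = fetch_page(verb=verb, **arguments)
    yield from items
    while token:
        # The token is opaque to the harvester: send it back unchanged.
        items, token = fetch_page(verb=verb, resumptionToken=token)
        yield from items


def fake_repository(verb, resumptionToken=None, **arguments):
    """Stand-in repository that answers in pages of two identifiers."""
    data = ["id-1", "id-2", "id-3", "id-4", "id-5"]
    # Only the repository interprets its own token (here, an offset).
    start = int(resumptionToken) if resumptionToken else 0
    end = start + 2
    next_token = str(end) if end < len(data) else None
    return data[start:end], next_token


print(list(harvest(fake_repository, "ListIdentifiers")))
```

Since the partitioning is undefined, the harvester makes no assumption about page size or page count; it simply stops when a response carries no token.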

37
Extensibility Feature Summary
  • Multiple metadata formats
  • Collection level metadata
  • Identify about container
  • Record data
  • Terms and conditions
  • Provenance
  • Set structure
  • Pre-configured queries

38
OAI Protocol
service provider
data provider
  • Supporting protocol requests
  • Identify
  • ListMetadataFormats
  • ListSets
  • Harvesting protocol requests
  • ListRecords
  • ListIdentifiers
  • GetRecord

39
Challenges and Questions
  • Utility of lowest common denominator metadata
    such as DC
  • Quality of metadata from non-professional
    contributors
  • Machine processing to reduce and complement
    human effort
  • Functionality of service structure