Open Archives Initiative - PowerPoint PPT Presentation

About This Presentation
Title:

Open Archives Initiative

Description:

Open Archives Initiative Where we are, Where we are going Carl Lagoze 4th OAF Workshop September, 2003 Where we are now De facto standard for Internet information ... – PowerPoint PPT presentation

Transcript and Presenter's Notes

Title: Open Archives Initiative


1
Open Archives Initiative
  • Where we are,
  • Where we are going

Carl Lagoze, 4th OAF Workshop, September 2003
2
Where we are now
  • De facto standard for Internet information
    exchange
  • Deployed extensively and internationally
  • (digital) libraries
  • Museums
  • Eprint repositories
  • Research projects

3
Protocol Stability
  • OAI-PMH has been stable since release
  • No functional changes, just typographic edits
  • Validation of leadership/participation model
  • No plans for a 3.0 release
  • Core protocol will not be extended
  • Minor 2.x release could occur (more later)
  • Additional implementation guidelines (more later)

4
NSDL and OAI-PMH
5
The NSDL Context
  • National STEM (Science, Technology, Engineering,
    Mathematics, Medicine) Digital Library
  • Major National Science Foundation project
    targeted at the application of web and Internet
    to (STEM) education
  • $25M over six years to over 100 projects
  • Collections
  • Services
  • Targeted Research
  • Core Integration

6
NSDL technical guidelines
  • Aggregation rather than collection
  • Core integration team will not manage any
    collections
  • Spectrum of interoperability
  • Accommodate diversity of participation models
  • Open interfaces and standards permitting plug in
    of array of value-added services
  • One library, many portals
  • Accommodate multiple quality and selection
    metrics
  • Tailor presentation of content and nature of
    services to audience needs

7
Spectrum of interoperability
Level: agreements (with example)
  • Federation: strict use of standards (syntax,
    semantic, and business). Example: AACR, MARC,
    Z39.50
  • Harvesting: digital libraries expose metadata
    through a simple protocol and registry. Example:
    Open Archives metadata harvesting
  • Gathering: digital libraries do not cooperate, so
    services must seek out information. Example: Web
    crawlers and search engines
8
Translating to initial goals
  • This is a big task that no one has done before!
  • Work on the priorities
  • Focus on one point on spectrum of
    interoperability
  • Metadata harvesting
  • Incorporate NSF funded collections and selected
    other collections
  • Leverage existing (or at least emerging)
    technologies and protocols
  • OAI, uPortal, Shibboleth, SDLIP, InQuery
  • Provide reliable base level services
  • Search and Discovery, Access Management, User
    Profiles, Exemplary Portals, Persistence
  • Plant some seeds for the future
  • Machine-assisted metadata generation
  • Automated collection aggregation
  • Web gathering strategies

9
Metadata Repository
  • Central storage of all metadata about all
    resources in the NSDL
  • Defines the extent of NSDL collection
  • Metadata includes collections, items,
    annotations, etc.
  • MR main functions
  • Aggregation
  • Normalization
  • Redistribution
  • Ingest of metadata by various means
  • Harvesting, manual, automatic, cross-walking
  • Open access to MR contents for service builders
    via OAI-PMH
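
As a rough illustration of what harvesting from the MR could look like for a service builder, here is a minimal sketch that builds a ListRecords request and reads record headers out of a response. The base URL and the sample response are made up for the example; only the verb, the argument names, and the OAI-PMH 2.0 namespace come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_headers(response_xml):
    """Return ((identifier, datestamp) pairs, resumption token or None)."""
    root = ET.fromstring(response_xml)
    headers = [
        (h.findtext("oai:identifier", namespaces=OAI_NS),
         h.findtext("oai:datestamp", namespaces=OAI_NS))
        for h in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return headers, token or None


# Hypothetical response fragment, trimmed to the parts the sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2003-09-01</datestamp>
      </header>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://mr.example.org/oai", from_date="2003-01-01")
headers, token = parse_headers(SAMPLE)
```

A real harvester would fetch `url`, then keep re-requesting with the returned resumption token until none comes back; that loop is left out here to keep the sketch offline.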

10
Metadata Strategy
  • Collect and redistribute any native (XML)
    metadata format
  • Provide crosswalks to Dublin Core from standard
    formats
  • DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD
  • Concentrate on collection-level metadata
  • Use automatic generation to augment item-level
    metadata
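
To make the crosswalk idea concrete, here is a small sketch of mapping a native record onto unqualified Dublin Core. The mapping table is illustrative only (a few commonly cited MARC-to-DC correspondences) and is not the NSDL crosswalk; the record layout (field tag to list of values) is likewise an assumption.

```python
# Illustrative mapping only; a production crosswalk covers many more
# fields and subfields.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
}


def crosswalk(native_record, mapping=MARC_TO_DC):
    """Map a native record (field tag -> list of values) to unqualified DC."""
    dc = {}
    for tag, values in native_record.items():
        element = mapping.get(tag)
        if element is None:
            continue  # unmapped fields are dropped: the lossy part of a crosswalk
        dc.setdefault(element, []).extend(values)
    return dc


record = {"245": ["A Title"], "100": ["Doe, J."], "999": ["local note"]}
dc_record = crosswalk(record)
```

The dropped `999` field shows why the native format is kept and redistributed alongside the DC version: the crosswalk loses information.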

11
Importing metadata into the MR
12
Exporting metadata from the MR
13
NSDL and OAI-PMH: Two years later
  • Concepts are good, practice is hard
  • Issues
  • Metadata is hard
  • http://www.well.com/~doctorow/metacrap.htm
  • XML is hard
  • Protocols are hard
  • Static repositories (more later)
  • IP is relevant (more later)

14
Some Essential Metadata Questions
  • Review original (DC) metadata assumptions
  • Metadata is essential for good resource discovery
  • Joe Sixpack could create metadata
  • Account for current realities
  • 2003 is not 1994
  • Google, etc. keeps getting better

15
Metadata Space
16
Metadata Triage
17
Reconsidering the Dublin Core Requirement
  • Questions about utility of unqualified DC
  • The conundrum.
  • Specification too loose to serve intended
    interoperability goal
  • But more complex metadata may be too hard
  • Limited energy for interoperability
  • Data providers implement required DC at expense
    of better metadata
  • Use of protocol for purposes other than resource
    discovery

18
Rethinking record-oriented model
Implications for record-oriented harvesting?
19
Topology Evolution
Simple Data Provider, Service Provider Topology
20
Topology Evolution (cont.)
Metadata Aggregator
21
Topology Evolution (cont.)
OAI-PMH p2p network
22
OAI-P2pMH Issues
  • Document (metadata) location
  • Exploit unique identifiers, use efficient
    key-based location mechanisms (distributed hash
    tables)
  • Provenance-based queries
  • Metadata records may go through refinement and/or
    translation phases as they move through
    value-added aggregators.
  • Exploit provenance guidelines
  • Network harvesting
  • Broadcast query (Gnutella) inefficient
  • Exploit techniques for efficient routing of
    queries (P-trees)
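
The key-based location idea can be sketched with a toy consistent-hash ring: each record's unique identifier hashes to a key, and the peer responsible for that key is the one to ask. This is a centralized stand-in for illustration, not a real distributed hash table (which would spread the routing state across peers); peer names and identifiers are hypothetical.

```python
import bisect
import hashlib


def _key(text):
    """Hash a string onto a fixed integer key space."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: an identifier is owned by the first peer
    whose key is >= the identifier's key, wrapping around at the end."""

    def __init__(self, peers):
        self._ring = sorted((_key(p), p) for p in peers)
        self._keys = [k for k, _ in self._ring]

    def locate(self, identifier):
        """Return the peer responsible for the given identifier."""
        i = bisect.bisect_left(self._keys, _key(identifier)) % len(self._ring)
        return self._ring[i][1]


peers = ["peer-a", "peer-b", "peer-c"]
ring = Ring(peers)
owner = ring.locate("oai:example.org:42")
```

Because the owner depends only on the hashes, any peer holding the same peer list computes the same answer without broadcasting a query.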

23
OAI-PMH and Intellectual Property
  • Protocol exists in a context where information
    providers have concerns about use of intellectual
    property
  • OAI-PMH is nominally about metadata, but:
  • Rich metadata is an intellectual product
  • The protocol can be used to transmit anything
    (e.g. content) that can be encoded in XML
  • Generally, metadata leads to content, so ...

24
OAI-rights effort
  • Goal is to investigate and develop means of
    expressing rights about metadata and resources in
    the OAI framework.
  • The result will be an addition to the OAI
    implementation guidelines that specifies
    mechanisms for rights expressions within OAI-PMH.
  • No changes to core protocol

25
OAI-rights Effort (cont.)
  • Extensible, providing a general framework for
    expressing rights statements within OAI-PMH.
  • Not an effort to develop a new rights expression
    language
  • Use Creative Commons licenses as a motivating and
    deployable example.
  • Release of specification by 2nd quarter 2004
  • Invited OAI-rights group
  • Standard OAI development model

26
Dimensions of OAI-PMH and rights: Entity Association
  • Metadata: concern in NSDL for (re)use of rich
    metadata
  • Content: predominant application of the protocol
    to resource discovery and ultimate access makes
    this important

27
Dimensions of OAI-PMH and rights: Aggregation Association
  • OAI-PMH aggregations
  • Repository
  • Set
  • Item
  • Rights association with an aggregation may
    provide shortcut (e.g., the rights for all
    resources in a repository/set)
  • Cost of shortcut is pseudo-statefulness, possibly
    complex overriding rules
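
One possible overriding rule, sketched under the assumption that the most specific statement wins (item over set over repository). The rule, the function name, and the rights strings are all hypothetical; the slides do not say which rule the OAI-rights effort would adopt.

```python
def resolve_rights(item_id, set_specs, repository_rights=None,
                   set_rights=None, item_rights=None):
    """Return the rights statement that applies to an item.

    Assumed precedence: item-level, then the first listed set that has a
    statement, then the repository-wide default.
    """
    item_rights = item_rights or {}
    set_rights = set_rights or {}
    if item_id in item_rights:
        return item_rights[item_id]
    for spec in set_specs:
        if spec in set_rights:
            return set_rights[spec]
    return repository_rights


rights = resolve_rights(
    "oai:example.org:7",
    set_specs=["physics", "preprints"],
    repository_rights="all-rights-reserved",
    set_rights={"preprints": "cc-by"},
)
```

Even this small sketch exposes the cost the slide mentions: a harvester has to remember repository- and set-level statements across requests, and an item in two sets with different statements needs a tie-break (here, simply list order).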

28
Dimensions of OAI-PMH and rights: Binding
  • Choices
  • exploit mechanisms in metadata formats, e.g.,
    DC-rights
  • restrict the rights statements to some more
    specific protocol mechanism
  • allow some mixture of these methods.
  • DC-rights problems
  • Semantics is restricted to rights about resource
  • Can't embed XML in DC value
  • What if DC is not required?
  • Burden on harvesters if rights embedding is not
    explicit but scattered across several locations

29
OAI-PMH Static Repositories
  • Provide a lightweight mechanism for data provider
    participation
  • Intended for relatively small and static
    collections
  • Two components
  • Static Repository XML format
  • Semantically equivalent to Identify and
    ListRecords
  • Invisible to harvester
  • Static Repository Gateway
  • Virtual data provider for static repository data
  • Unique baseURL for each contained static
    repository
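
A minimal sketch of the gateway's bookkeeping: it registers static repository files and hands each one its own baseURL, so a harvester sees an ordinary data provider. The URL-derivation scheme below (gateway URL followed by the static file's host and path) and all names are assumptions for illustration; serving the actual OAI-PMH verbs from the file is left out.

```python
from urllib.parse import urlsplit


class Gateway:
    """Toy registry mapping per-repository baseURLs to static XML files."""

    def __init__(self, gateway_url):
        self.gateway_url = gateway_url.rstrip("/")
        self._repos = {}

    def register(self, static_url):
        """Register a static repository file and return its unique baseURL."""
        parts = urlsplit(static_url)
        base_url = f"{self.gateway_url}/{parts.netloc}{parts.path}"
        self._repos[base_url] = static_url
        return base_url

    def source_for(self, base_url):
        """Look up which static file backs a given baseURL."""
        return self._repos[base_url]


gateway = Gateway("http://gateway.example.org/oai/")
base = gateway.register("http://lib.example.edu/mini.xml")
```

Deriving the baseURL from the file's own URL keeps it unique per contained repository without any extra naming step.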

30
Static Repositories and Static Repository Gateway
31
Static Repositories: Open Issue
  • Relationship to RSS?

32
Conclusions
  • Interoperability and lowest common denominator
  • Rapid advances in automated methods
  • Moore's law
  • Smart algorithms
  • Benefits of issues of scale
  • Combining human effort and automated methods
  • Extracting order from chaos
  • Learning from order
  • Move beyond resource discovery

33
Typical Values
  • repository
  • collection of publications
  • resource
  • scholarly publication
  • item
  • all metadata (DC, MARC)
  • record
  • a single metadata format
  • datestamp
  • last update / addition of a record
  • metadata format
  • bibliographic metadata format
  • set
  • originating institution or subject categories

34
Repositories
  • Stretching the idea of a repository a bit
  • contextually sensitive repositories
  • personalization for harvesters
  • communication between strangers, or communication
    between friends?
  • OAI-PMH for individual complex objects?
  • OAI-PMH without MySQL?!
  • Fedora, Multi-valent documents, buckets
  • tar, jar, zip, etc. files

35
Resource
  • What if resource were
  • computer system status
  • uptime, who, w, df, ps, etc.
  • or generalized system status
  • e.g., sports league standings
  • people
  • personnel databases
  • authority files for authors

36
Item
  • What if item were
  • software
  • union of versions and formats
  • all forms of metadata
  • administrative and structural
  • citations, annotations, reviews, etc.
  • data
  • e.g., newsfeeds and other XML expressible content
  • metadataPrefixes or sets could be defined to be
    different versions

37
Record
  • What if record were
  • specific software instantiations / updates
  • access / retrieval logs for DLs (or computer
    systems)
  • push / pull model inversion
  • put a harvester on the client behind a firewall,
    the client contacts a DP and receives
    instructions on how to submit the desired
    document (e.g., send email to a specified address)

38
Datestamp
  • semantics of datestamp are strongly influenced by
    the choice of resource / item / record /
    metadataPrefix, but it could be used to:
  • signify change of set membership (e.g., workflow
    item moves from submitted to approved)
  • change datestamp to reflect access to the DP
  • e.g., in conjunction with metadataPrefixes of
    accessed or mirrored
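
The set-membership idea can be sketched as follows: when a workflow item moves from submitted to approved, the data provider bumps its datestamp, so an incremental harvest picks the change up. The record layout and function names are hypothetical; the inclusive from/until range follows OAI-PMH selective harvesting.

```python
def selective_harvest(records, from_=None, until=None):
    """Return records whose datestamp lies in the inclusive [from_, until] range.

    Datestamps are ISO 8601 day strings, so plain string comparison orders
    them correctly as long as all use the same granularity.
    """
    def in_range(stamp):
        return (from_ is None or stamp >= from_) and (until is None or stamp <= until)

    return [r for r in records if in_range(r["datestamp"])]


def move_to_set(record, new_set, today):
    """Change set membership and bump the datestamp so harvesters notice."""
    return {**record, "set": new_set, "datestamp": today}


records = [
    {"id": "a", "set": "submitted", "datestamp": "2003-08-01"},
    {"id": "b", "set": "submitted", "datestamp": "2003-08-15"},
]
records[0] = move_to_set(records[0], "approved", "2003-09-10")
changed = selective_harvest(records, from_="2003-09-01")
```

Here a harvester asking for everything since 2003-09-01 sees only item "a", now in the approved set, even though its metadata did not change.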

39
metadataPrefix
  • what if metadataPrefix were
  • instructions for extracting / archiving /
    scraping the resource
  • verb=ListRecords&metadataPrefix=extract_TIFFs
  • code fragments to run locally
  • (harvested from a trusted source!)
  • XSLT for other metadataPrefixes
  • branding container is at the repository-level,
    this could be record- or item-level

40
Set
  • sets are already used for tunneling OAI-PMH
    extensions (see Suleman & Fox, D-Lib 7(12))
  • other uses
  • in aggregators, automatically create 1 set per
    baseURL
  • have hidden sets (or metadataPrefix) that have
    administrative or community-specific values (or
    triggers)
  • set=accessed>1000&from=2001-01-01
  • set=harvestMeWithTheseARGS&until=2002-05-05&metadataPrefix=oai_marc