OAI Metadata Harvesting with Theses and Dissertations

1
OAI Metadata Harvesting with Theses and
Dissertations
  • Thomas Hickey, OCLC
  • Access 2001

2
Background
  • OCLC Office of Research
    • http://www.oclc.org/research/
  • NDLTD
    • Networked Digital Library of Theses and
      Dissertations
    • http://www.ndltd.org/
  • OAI
    • Open Archives Initiative
    • http://www.openarchives.org/

3
ALCME Project
  • Exploring
    • Open source platforms
    • Web-accessible services
    • OAI Metadata Harvesting
    • Using metadata for theses and dissertations

4
NDLTD
  • Networked Digital Library of Theses and
    Dissertations
  • Run out of Virginia Tech
  • Some 100 members interested in improving access
  • OAI servers seem a natural direction

5
NDLTD (cont.)
  • Released a metadata standard
  • ETDMS
  • Dublin Core-based
  • Additional elements from new namespaces
  • Currently XML based
  • RDF a possibility
  • Incorporates linking to a distributed name
    authority database
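
A minimal sketch of what a Dublin Core-based record with an extra namespace might look like when built as XML. The second namespace URI, the `degree` element, and the `resource` attribute used for the authority link are placeholders chosen for illustration; they are assumptions, not the element set the NDLTD standard defines.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"      # Dublin Core element namespace
EXTRA_NS = "http://example.org/thesis-terms/"   # hypothetical extension namespace


def build_record(title, creator, degree, authority_url):
    """Build one thesis record: Dublin Core elements plus one extension element."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("th", EXTRA_NS)
    root = ET.Element("record")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    creator_el = ET.SubElement(root, f"{{{DC_NS}}}creator")
    creator_el.text = creator
    # Assumed convention: the link to the name authority entry rides on an attribute.
    creator_el.set("resource", authority_url)
    # Element from the additional (placeholder) namespace.
    ET.SubElement(root, f"{{{EXTRA_NS}}}degree").text = degree
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_record("An Example Thesis", "Doe, Jane", "PhD",
                       "http://example.org/names/doe-jane"))
```

Keeping the extension elements in their own namespace means a consumer that only understands Dublin Core can still read the title and creator and skip the rest.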

6
OAIMH protocol
  • Open Archives Initiative Metadata Harvesting
    protocol
    (http://www.openarchives.org/OAI/openarchivesprotocol.htm)
  • Allows efficient harvesting of metadata

7
The Vision
  • Make information widely available
  • Allow layering of systems

Diagram layers, top to bottom:
  • Peer review, selection, indexing services
  • OAI Protocol
  • Harvesting, publication service
  • OAI Protocol
  • Digital Content (images, papers, etc.)

8
OAI Protocol
  • Uses HTTP
  • Fairly simple URLs to
    • Identify the server
    • List formats, record sets
    • Get records (can specify date modified)
  • Has flow control so that large sets can be managed
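
The requests listed above can be sketched as plain URLs, with flow control handled by following a resumption token from one response to the next. This is a rough illustration, not OCLC's harvester: the server address is a placeholder, and `oai_url`, `find_token`, `harvest`, and `http_fetch` are hypothetical helper names. The verb and argument names (`Identify`, `ListRecords`, `metadataPrefix`, `from`, `resumptionToken`) are the ones the OAI protocol defines.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/oai"  # hypothetical server address


def oai_url(base, verb, **params):
    """Build a request URL: the base address plus a verb and its arguments."""
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass 'from_' instead.
            query[key.rstrip("_")] = value
    return base + "?" + urlencode(query)


def find_token(xml_text):
    """Return the resumptionToken text in a response, or None if absent or empty."""
    for elem in ET.fromstring(xml_text).iter():
        # Compare local names only, so no namespace URI has to be assumed.
        if elem.tag.rsplit("}", 1)[-1] == "resumptionToken":
            return (elem.text or "").strip() or None
    return None


def harvest(base, fetch, metadata_prefix="oai_dc", from_=None):
    """Yield each response page, following resumption tokens until none remain."""
    url = oai_url(base, "ListRecords", metadataPrefix=metadata_prefix, from_=from_)
    while url:
        page = fetch(url)
        yield page
        token = find_token(page)
        # A follow-up request carries only the verb and the token.
        url = oai_url(base, "ListRecords", resumptionToken=token) if token else None


def http_fetch(url):
    """Fetch one URL over HTTP and return the body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

Passing `fetch` in as an argument keeps the paging logic testable without a network connection; a real run would call `harvest(BASE_URL, http_fetch, from_="2001-09-01")` against an actual server address.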

9
OAI Protocol continued
  • XML version of Dublin Core is required
  • Other metadata formats possible (wrapped in XML)
  • Typical uses
    • Publish metadata for a special collection
    • Use to keep two catalogs synchronized
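
As an illustration of the required Dublin Core format, the sketch below pulls Dublin Core elements out of one harvested record. The sample record is invented and simplified (a real response wraps records in the protocol's own envelope), and `dublin_core_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Invented, simplified record used only for illustration.
SAMPLE = """<record>
  <metadata>
    <dc xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Thesis</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:type>Thesis</dc:type>
    </dc>
  </metadata>
</record>"""


def dublin_core_fields(record_xml):
    """Collect Dublin Core elements from one record as {element name: [values]}."""
    fields = {}
    prefix = "{" + DC_NS + "}"
    for elem in ET.fromstring(record_xml).iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            # Dublin Core elements are repeatable, so keep a list per name.
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


if __name__ == "__main__":
    print(dublin_core_fields(SAMPLE))
```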

10
OAI at OCLC
  • Currently in Office of Research
  • Publishing
  • Harvesting
  • Building services
  • Open source

11
Publishing
  • Catalog of WorldCat theses and dissertations
    records
  • Currently have 100,000 available
  • Plan to have all 3,000,000 up
  • Starting to embed services in the records
    • Authority links
    • OCLC Open Access links

12
Harvesting
  • Harvesting a variety of OAI servers
  • Making them available in a single catalog
  • Most theses are already in WorldCat
  • May be able to get more foreign theses
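
One way to picture a single catalog fed by several servers is a store keyed by record identifier that keeps the newest copy of each record. This is only a sketch under assumptions: `UnionCatalog` and its fields are hypothetical, and keying on identifier with datestamp comparison is an illustrative choice, not a description of how the OCLC catalog merges records.

```python
from dataclasses import dataclass, field


@dataclass
class UnionCatalog:
    """In-memory union catalog: one stored record per identifier."""
    records: dict = field(default_factory=dict)  # identifier -> record dict

    def add(self, source, identifier, datestamp, metadata):
        """Store a record unless an equally new or newer copy exists.

        Datestamps are ISO dates (YYYY-MM-DD), so string comparison orders them.
        Returns True when the record was stored.
        """
        current = self.records.get(identifier)
        if current is not None and current["datestamp"] >= datestamp:
            return False
        self.records[identifier] = {
            "source": source,
            "datestamp": datestamp,
            "metadata": metadata,
        }
        return True


if __name__ == "__main__":
    catalog = UnionCatalog()
    catalog.add("server1", "oai:example:1", "2001-08-01", {"title": ["First"]})
    catalog.add("server2", "oai:example:1", "2001-09-01", {"title": ["First, revised"]})
    print(len(catalog.records), catalog.records["oai:example:1"]["source"])
```

Because theses may already be present in the catalog, a production merge would need real duplicate detection across identifiers; the identifier match here stands in for that step.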

13
Harvesting Example
Diagram components and labels:
  • OAI Server 1: DC, EAD
  • OAI Server 2: DC, MARC
  • OAI Server 3: DC, VRA
  • OAI Harvester
  • Union Catalog: DC/MARC
  • Web Browser: HTML
  • Connector labels: DC, MARC, DC

14
Services
  • Links into WorldCat information (planned)
    • Associated searches
    • Holdings
    • ILL?
  • Name authority links

15
Name Authorities on OAI
Diagram components:
  • Central Metadata Harvester
  • Local Metadata Server
  • Central MD Server
  • Central Name Server
  • Local Name Harvester
  • Local Name Server
  • Central Name Harvester

16
Name Authorities (cont.)
  • Layered on top of OAI
  • URL associated with each entry
  • Mechanisms for synchronization, publication
  • http://alcme.oclc.org/ndltd/AuthLink.html

17
Open Source
  • Gwen/Pears
    • Database and search engine
  • Scorpion
  • RDF toolkits (EOR, Perl based)
  • http://www.oclc.org/research/software/

18
Future Work
  • Loading more records
  • Supporting NDLTD ETDMS
  • Adding test services to records
  • Incorporating Authority files into ACE
  • Analysis of harvested metadata

19
ACE
  • New research project
  • Advanced Collection Environment
  • ASP model for managing collections
  • Starting with personal collections
  • Simplifies
  • Allows more experimentation
  • Expect much to cross over to libraries

20
ACE (cont.)
  • Dublin Core based
  • First few hundred records are free
  • Tries to be a complete service
  • Targeted for the serious collector
  • Emphasis on management, not commerce
  • Expect testing within OCLC Research in October,
    testing outside late this year