Metadata: Cross-collection Repository - PowerPoint PPT Presentation

1
Metadata: Cross-collection Repository
  • Thomas G. Habing
  • Grainger Engineering Library Information Center
  • University of Illinois at Urbana-Champaign
  • http://dli.grainger.uiuc.edu
    thabing@uiuc.edu

2
Who is in the repository
  • UIUC DeLIver collection of scientific/technical
    journal articles from AIP, APS, ASCE, and IEE.
  • D-Lib Magazine is a monthly magazine about
    innovation and research in digital libraries.
  • Netlib is a collection of mathematical software,
    papers, and databases.
  • The UC Berkeley Digital Library Project is a
    collection of documents and photographs
    containing environmental and biological data
    mostly about the state of California.
  • Plans to add other collections

3
What is in the repository
  • Metadata from the various collections
      • Normalized into Dublin Core (DC) categories
      • Some substructure beyond DC has been retained
Where is the repository
  • Grainger Engineering Library Server Room
  • Windows NT Server running MS SQL Server
  • Windows NT Server running MS IIS Server
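Normalizing into DC categories while keeping some substructure might look roughly like the following minimal Python sketch. The slides do not describe the actual mapping rules. The source field names in `CROSSWALK`, the qualifier values, and the `normalize` function are all hypothetical.

```python
from collections import defaultdict

# The 15 elements of Dublin Core version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk for one source collection:
# source field name -> (DC element, qualifier or None).
# The qualifier keeps substructure that plain DC would flatten away.
CROSSWALK = {
    "article-title": ("title", None),
    "author": ("creator", None),
    "pub-date": ("date", "issued"),
    "doi": ("identifier", "doi"),
    "journal": ("source", "journal"),
}


def normalize(source_fields, crosswalk):
    """Map (field, value) pairs from one source record onto DC elements.

    Returns ({dc_element: [(value, qualifier), ...]}, unmapped_pairs) so that
    fields with no DC counterpart are reported rather than silently dropped.
    """
    dc = defaultdict(list)
    unmapped = []
    for field, value in source_fields:
        target = crosswalk.get(field)
        if target is None:
            unmapped.append((field, value))
            continue
        element, qualifier = target
        if element not in DC_ELEMENTS:
            raise ValueError(f"crosswalk targets a non-DC element: {element}")
        dc[element].append((value, qualifier))
    return dict(dc), unmapped
```

Repeated fields such as several authors become several values under one DC element. This is the same "0..n values per element" shape the database tables need.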

4
Why build a repository
  • Investigates issues fundamental to the D-Lib Test
    Suite Project
      • How to aggregate metadata
      • Appropriateness of various metadata schemes, such
        as DC, qualified DC (DCQ), etc.
      • Search interfaces
      • How best to support both least-common-denominator
        searches and detailed searches
  • CNRI is looking at architecture options for the next
    generation
      • To be discussed tomorrow

5
Why build a central repository
  • Instead of a distributed repository (harvest)
      • Facilitates standardization of data
      • Simplified architecture
      • Still allows us to investigate many of the issues
        from the previous slide
  • However, distributed repositories do have some
    advantages (agent)
      • No data synchronization problems
      • Better able to take advantage of the best
        features of each repository
      • Potentially more scalable
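One reading of the slide's "harvest" and "agent" labels is that they name the two architectures. Under that reading, the trade-off can be sketched in Python. `CentralRepository`, `agent_search`, and the string-only records are hypothetical simplifications. Real collections would be remote services, not in-memory lists.

```python
from typing import Callable, Dict, List

# In the agent model, a remote collection is reduced here to a search callback.
SearchFn = Callable[[str], List[str]]


def matches(record: str, term: str) -> bool:
    return term.lower() in record.lower()


class CentralRepository:
    """Harvest model: copy every collection's records into one local store."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def harvest(self, collections: Dict[str, List[str]]) -> None:
        # Take a snapshot; later changes at a source stay invisible
        # until the next harvest (the synchronization problem).
        self.records = [r for records in collections.values() for r in records]

    def search(self, term: str) -> List[str]:
        return [r for r in self.records if matches(r, term)]


def agent_search(sources: Dict[str, SearchFn], term: str) -> List[str]:
    """Agent model: forward the query to every source and merge the answers."""
    results: List[str] = []
    for search in sources.values():
        results.extend(search(term))
    return results
```

Suppose a record is added at a source after a harvest. `agent_search` returns it immediately. `CentralRepository.search` does not return it until the next `harvest()`. The central copy, however, can be normalized once, up front.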

6
How was the repository built
  • Database of 16 tables
      • One top-level table with a row for every item
        being described
      • 15 tables, one for each DC element

[Diagram: each Item Record (1) has 0..n rows in each DC table, e.g. DCCreator, DCRights, DCTitle, ...]
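The 16-table layout can be sketched with Python's built-in `sqlite3` module standing in for SQL Server. Table names follow the diagram's `DC<Element>` pattern. The following are assumptions made for illustration:

- the `ItemRecord` table name
- the `item_id`, `value`, `qualifier`, and `collection` columns
- the helper functions

```python
import sqlite3

# DC 1.1 element names; table names follow the DCCreator/DCRights/DCTitle pattern.
DC_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create one ItemRecord table plus one table per DC element (16 in total)."""
    conn.execute(
        "CREATE TABLE ItemRecord ("
        " item_id INTEGER PRIMARY KEY,"
        " collection TEXT NOT NULL)"
    )
    for element in DC_ELEMENTS:
        # Each item owns 0..n rows here, e.g. one row per author in DCCreator.
        conn.execute(
            f"CREATE TABLE DC{element} ("
            " item_id INTEGER NOT NULL REFERENCES ItemRecord(item_id),"
            " value TEXT NOT NULL,"
            " qualifier TEXT)"
        )


def add_item(conn: sqlite3.Connection, collection: str, fields: dict) -> int:
    """Insert one item; `fields` maps DC element names to lists of values."""
    cur = conn.execute(
        "INSERT INTO ItemRecord (collection) VALUES (?)", (collection,)
    )
    item_id = cur.lastrowid
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            # Element names are interpolated into SQL, so only allow known ones.
            raise ValueError(f"not a DC element: {element}")
        for value in values:
            conn.execute(
                f"INSERT INTO DC{element} (item_id, value) VALUES (?, ?)",
                (item_id, value),
            )
    return item_id
```

With one table per element, a multi-valued field such as several creators needs no repeated columns. An item with no rights statement simply has no rows in `DCRights`.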
7
How was the repository built (cont.)
  • Custom Visual Basic (VB) parser for each collection
      • UIUC records were already in DC and XML, so they
        were loaded into a DOM and inserted into the DB
      • D-Lib Magazine records were already in XML, so
        they were loaded into a DOM, translated to DC,
        and inserted into the DB
      • Netlib used a custom flat-file format, so we wrote
        a web crawler with a custom parser, translated the
        records to DC, and inserted them into the DB
      • Berkeley used RFC 1807 records or flat-file ASCII
        database dumps, so we wrote a custom parser,
        translated to DC, and inserted into the DB
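The original parsers were written in VB. The following is a hedged Python sketch of two of the approaches: the DOM-based path used for the UIUC and D-Lib XML, and a parser for RFC 1807-style records. These details are assumptions, not details from the slides:

- the tag maps
- the chosen subset of RFC 1807 field tags
- the rule that an untagged line continues the previous field

```python
import re
from xml.dom import minidom


def text_of(node) -> str:
    """Concatenate the direct text children of a DOM element."""
    return "".join(
        c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
    ).strip()


def dc_from_xml(xml_text: str, tag_map: dict) -> list:
    """Load an XML record into a DOM and return (DC element, value) pairs.

    tag_map maps source tag names to DC element names. For records already
    in DC it is just DC tag -> element; for other XML it is a crosswalk.
    """
    doc = minidom.parseString(xml_text)
    pairs = []
    for tag, element in tag_map.items():
        for node in doc.getElementsByTagName(tag):
            value = text_of(node)
            if value:
                pairs.append((element, value))
    return pairs


# Hypothetical mapping from a few RFC 1807 field tags to DC elements.
RFC1807_TO_DC = {
    "ID": "Identifier",
    "TITLE": "Title",
    "AUTHOR": "Creator",
    "ORGANIZATION": "Publisher",
    "DATE": "Date",
    "ABSTRACT": "Description",
}

# A field line looks like "TITLE:: Some title".
FIELD_LINE = re.compile(r"^([A-Z][A-Z0-9-]*)::\s*(.*)$")


def dc_from_rfc1807(text: str) -> list:
    """Parse 'TAG:: value' lines into (DC element, value) pairs.

    Assumes a line with no tag continues the value of the previous field.
    """
    fields = []  # [tag, value] pairs in record order
    for line in text.splitlines():
        m = FIELD_LINE.match(line)
        if m:
            fields.append([m.group(1), m.group(2).strip()])
        elif fields and line.strip():
            fields[-1][1] = (fields[-1][1] + " " + line.strip()).strip()
    return [
        (RFC1807_TO_DC[tag], value)
        for tag, value in fields
        if tag in RFC1807_TO_DC and value
    ]
```

Both functions return the same `(element, value)` shape. Whatever inserts rows into the DB therefore does not need to know which collection a record came from. Fields without a DC mapping, such as `BIB-VERSION` or `END`, are dropped.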

8
How was the repository built (cont.)
  • HTML Search Form
  • Active Server Page (ASP) Backend
  • SQL Server Stored Procedures
  • SQL Server Full Text Catalogs
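The search backend can be sketched in Python with `sqlite3`. The sketch shows the two kinds of search raised earlier: a least-common-denominator query over every DC element, and a detailed query restricted to one element. It makes these substitutions:

- plain `LIKE` matching stands in for SQL Server full-text catalogs
- parameterized queries stand in for the stored procedures
- the HTML/ASP front end is not modeled

The table subset, column names, and function names are hypothetical.

```python
import sqlite3

# A subset of the per-element DC tables, enough to show the two search styles.
ELEMENTS = ["Title", "Creator", "Subject"]


def create_tables(conn: sqlite3.Connection) -> None:
    for e in ELEMENTS:
        conn.execute(f"CREATE TABLE DC{e} (item_id INTEGER, value TEXT)")


def search_any(conn: sqlite3.Connection, term: str) -> list:
    """Least-common-denominator search: the term may match any DC element."""
    sql = " UNION ".join(
        f"SELECT item_id FROM DC{e} WHERE value LIKE ?" for e in ELEMENTS
    )
    rows = conn.execute(sql, [f"%{term}%"] * len(ELEMENTS)).fetchall()
    return sorted(item_id for (item_id,) in rows)


def search_field(conn: sqlite3.Connection, element: str, term: str) -> list:
    """Detailed search: the term must match within one named DC element."""
    if element not in ELEMENTS:
        raise ValueError(f"unknown DC element: {element}")
    rows = conn.execute(
        f"SELECT DISTINCT item_id FROM DC{element} WHERE value LIKE ?",
        (f"%{term}%",),
    ).fetchall()
    return sorted(item_id for (item_id,) in rows)
```

`UNION` removes duplicate item IDs, so an item that matches in both its title and its subject is returned once.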

9
Demo
  • http://dli.grainger.uiuc.edu/dlibmeta/searchform.asp