OAIster: What's with the Weird Name?


1
OAIster: What's with the Weird Name?
  • Kat Hagedorn
  • UM Library Information Technology
  • November 28, 2005

2
What is OAIster?
  • Is/was a means for UM to test the OAI protocol
    (hence the name)
  • A method for sharing metadata among institutions
    and groups of people
  • A means of developing a search service for
    end-users worldwide

3
Basics of OAI
4
What does OAIster collect?
  • Harvests all metadata from all OAI data providers
    (within reason)
  • Only keeps metadata that points to digital
    objects, e.g., articles, photographs, datasets,
    etc. in digitized form
  • All available via search service
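
As a rough illustration of the harvest-then-filter idea above, the sketch below builds an OAI-PMH ListRecords request URL and keeps only records whose dc:identifier looks like a web address. The base URL, the sample XML, the function names, and the "identifier starts with http" rule are all assumptions made for illustration; the slides do not describe OAIster's actual filtering logic.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for simple Dublin Core (oai_dc)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def digital_object_records(xml_text):
    """Return (OAI identifier, URL) pairs for records that link to an object.

    Assumed rule (not from the slides): a record points to a digital object
    if at least one dc:identifier starts with http:// or https://.
    """
    root = ET.fromstring(xml_text)
    kept = []
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        urls = [
            el.text.strip()
            for el in record.iterfind(".//dc:identifier", NS)
            if el.text
            and el.text.strip().lower().startswith(("http://", "https://"))
        ]
        if urls:
            kept.append((oai_id, urls[0]))
    return kept


# Hypothetical two-record response: one digitized item, one catalog-only entry.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Digitized photograph</dc:title>
          <dc:identifier>http://example.org/photo/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Catalog entry only</dc:title>
          <dc:identifier>QA76.9 .D3</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(digital_object_records(SAMPLE))
```

A real harvester would fetch each URL, follow resumption tokens until the provider stops returning one, and handle provider errors; none of that is shown here.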

5
Searching OAIster
  • Time to show off OAIster
  • http://www.oaister.org/

6
A little history
  • Service is now 3.5 years old
  • Started with 66 data providers and a little over
    200K records
  • Now have 572 data providers and a little over 6
    million records
  • 37% US, 63% international

7
Visibility of OAI
  • Surprising who hasn't made their metadata
    shareable through OAI
    • Harvard, Yale, Stanford: the big ones
  • Initially perplexing, but now clearer
    • always done at the end of projects
    • only recently thought of at initiation of
      projects
    • truthfully, many institutions not collaborative

8
Examples of data providers
  • Many data providers are huge, e.g.,
    • arXiv: physics preprint and postprint articles
    • pubmed: medical articles, although restricted
    • pictureaustralia: images from government and
      academic institutions in Australia
    • lcoa: Library of Congress digital archives
    • usc: University of Southern California census
      data

9
Examples of data providers
  • Most are small, though
    • Many around 100 records
  • Value of making their records available
    • increased visibility
    • inclusion in bigger search service than theirs
    • incorporation in Yahoo! Search

10
Yahoo! Search
  • Two years ago, collaborated with team at Yahoo!
    Search to send our metadata to them for indexing
  • e.g., "gardens at albury" in Yahoo! Search
    • know it's not static HTML roboting
    • <dc:relation>IspartOf Victorian Railways
      collection.</dc:relation>
  • Many, many more hits
  • Also send metadata to Google

11
System design
[Diagram. Components shown: OAI-enabled DC records, non-OAI-enabled DC records, UM harvester, XSLT transformation tool, XSL stylesheets (per source type), record storage, BibClass indexes, search interface (XPAT)]

12
Transformation of metadata
  • Most metadata needs to be brushed off
    • adding an http:// to the front of URLs
  • Or raked
    • removing instances of <![CDATA[
  • Or wrung out
    • instead of Where's Waldo, it's Where's the
      incorrect UTF-8 character?
  • And should be normalized
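
The three kinds of cleanup named above could look roughly like the sketch below. The function names are hypothetical, and the real pipeline described in this talk uses XSL stylesheets rather than Python; this only shows the idea of each step under simple assumptions.

```python
import re


def add_scheme(url):
    """'Brush off': prefix http:// when a URL has no scheme at all."""
    url = url.strip()
    if re.match(r"^[A-Za-z][A-Za-z0-9+.\-]*://", url):
        return url
    return "http://" + url


def strip_cdata(text):
    """'Rake': drop CDATA wrappers but keep the text inside them."""
    return re.sub(r"<!\[CDATA\[(.*?)\]\]>", r"\1", text, flags=re.DOTALL)


def find_bad_utf8(raw):
    """'Wring out': return byte offsets where raw is not valid UTF-8."""
    offsets = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the input decodes cleanly
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add the slice offset.
            offsets.append(pos + err.start)
            pos += err.start + 1
    return offsets


if __name__ == "__main__":
    print(add_scheme("www.oaister.org/"))
    print(strip_cdata("<title><![CDATA[Victorian Railways]]></title>"))
    # 0xE9 is Latin-1 "é", which is not valid on its own in UTF-8.
    print(find_bad_utf8(b"caf\xe9 au lait"))
```

Locating the bad byte is the easy part; deciding what character the provider meant still takes a human or a per-provider rule.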

13
Why normalize?
  • Sample date values
    • <date>2-12-01</date>
    • <date>2002-01-01</date>
    • <date>0000-00-00</date>
    • <date>1822</date>
    • <date>between 1827 and 1833</date>
    • <date>18--?</date>
    • <date>November 13, 1947</date>
    • <date>SEP 1958</date>
    • <date>235 bce</date>
    • <date>Summer, 1948</date>
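
To show why values like these need normalizing before they can be searched or sorted, here is a minimal sketch that reduces each sample to a single year. The function name and the rules are assumptions, not OAIster's actual normalization: it takes the first four-digit year it finds, treats BCE years as negative, and gives up (returns None) on placeholders and ambiguous values.

```python
import re


def normalize_year(value):
    """Return a year as an int, or None if no usable year is present."""
    text = value.strip().lower()

    # "235 bce" or "235 bc": represent as a negative year.
    bce = re.search(r"\b(\d{1,4})\s*(?:bce|bc)\b", text)
    if bce:
        return -int(bce.group(1))

    # First run of exactly four digits, e.g. "1947" in "november 13, 1947".
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", text)
    if not match:
        return None  # e.g. "2-12-01" or "18--?": too ambiguous to guess
    year = int(match.group(1))
    return year if year != 0 else None  # "0000-00-00" is a placeholder


if __name__ == "__main__":
    samples = [
        "2-12-01", "2002-01-01", "0000-00-00", "1822",
        "between 1827 and 1833", "18--?", "November 13, 1947",
        "SEP 1958", "235 bce", "Summer, 1948",
    ]
    for sample in samples:
        print(repr(sample), "->", normalize_year(sample))
```

Collapsing a range such as "between 1827 and 1833" to its first year loses information; a fuller normalizer would probably keep both ends of the range.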

14
Why use a CV?
  • Sample subject values
    • <subject>30,51,52</subject>
    • <subject>1852, Apr. 22. Everitt Judson, letter
      to Philuta Judson.</subject>
    • <subject>Slavery--United States--Controversial
      literature</subject>
    • <subject>view of interior with John Henry
      sculpture</subject>
    • <subject>Particles (Nuclear physics) --
      Research.</subject>

15
Best practices
  • Fixing more than half of the data providers is
    cumbersome
  • Individuals at OAI-enabled institutions started a
    Best Practices group to inform data providers
    what they ought to do
  • http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents

16
2nd phase OAI
  • Best Practices group sponsored by the Digital
    Library Federation, which also
  • Sponsors our latest grant
    • Better and more easily calculated statistics
    • Search interface improvements
    • Clustering / classification techniques
    • Using richer metadata

17
Clustering / classification
  • Using automated means to take a selection of
    metadata and determine what it's about
  • Working with Emory University (one of our grant
    partners) to test their tool
  • Results will be integrated into search so users
    can search within a smaller group of OAIster
    records

18
Using richer metadata
  • Data providers must use simple Dublin Core
  • Very sparse schema for describing objects
    • dc:title must contain main title, sorted title
      and alternative titles
    • dc:subject doesn't distinguish between
      geographical, hierarchical, temporal

19
Using richer metadata
  • Encouraging use of richer metadata, especially
    MODS (Metadata Object Description Schema) from
    LOC
  • Developed testbed for grant deliverables
    • currently only shows MODS work
    • http://www.hti.umich.edu/m/mods/
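
The difference between simple Dublin Core and MODS for titles can be sketched as follows. The sample titles and function names are invented for illustration. In simple DC every title lands in an identical dc:title element, while MODS can mark a title as alternative through the type attribute on titleInfo.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
MODS = "http://www.loc.gov/mods/v3"

# Invented sample data: a main title and an alternative title.
TITLES = {"main": "Victorian Railways collection", "alternative": "VR collection"}


def simple_dc(titles):
    """Simple DC: each title becomes an indistinguishable dc:title."""
    root = ET.Element("record")
    for text in titles.values():
        ET.SubElement(root, f"{{{DC}}}title").text = text
    return root


def mods(titles):
    """MODS: non-main titles carry a type attribute on titleInfo."""
    root = ET.Element(f"{{{MODS}}}mods")
    for kind, text in titles.items():
        info = ET.SubElement(root, f"{{{MODS}}}titleInfo")
        if kind != "main":
            info.set("type", kind)
        ET.SubElement(info, f"{{{MODS}}}title").text = text
    return root


if __name__ == "__main__":
    ET.register_namespace("dc", DC)
    ET.register_namespace("mods", MODS)
    print(ET.tostring(simple_dc(TITLES), encoding="unicode"))
    print(ET.tostring(mods(TITLES), encoding="unicode"))
```

A search service reading the DC version cannot tell which title is the main one; with the MODS version it can index or display them differently.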

20
Other stuff
  • Well, make it smaller somehow
  • Clean up Boolean interface
    • squinch fields together
    • include more normalization
  • Make it available through federated search
  • Proselytize sharing metadata
  • Test, test, test

21
Contact me
  • Kat Hagedorn
  • UM Library Information Technology
  • khage@umich.edu
  • www.oaister.org