1
OAIster: A "No Dead Ends" Digital Object Service
  • Kat Hagedorn
  • OAIster Librarian
  • University of Michigan Libraries
  • October 3, 2003

2
background
  • One-year Mellon grant project to test the
    feasibility of making OAI-enabled metadata for
    digital objects accessible to the public
  • Digital Library Production Service at University
    of Michigan Libraries began work in December 2001
  • Publicized as OAIster in February 2002
  • Launched in June 2002

3
highlights
  • Any audience
  • Any subject matter
  • Any format
  • Freely accessible
  • No dead ends
  • One-stop shopping
  • retrieving the hidden web

4
the protocol
  • OAI: Open Archives Initiative
  • OAI-PMH: Open Archives Initiative Protocol for
    Metadata Harvesting
  • Designed to make it easy to exchange metadata
    among interested parties
  • Consists of 6 HTTP requests to identify
    repositories / metadata and perform harvesting
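
As a rough illustration of the six requests: each is an ordinary HTTP GET whose "verb" parameter names the request type. The sketch below only builds the URLs. The repository address and the build_request helper are assumptions made for this example, not part of OAIster.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)

def build_request(base_url, verb, **params):
    """Return the GET URL for one OAI-PMH request."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"

# Hypothetical repository address, for illustration only.
BASE_URL = "http://repository.example.org/oai"

print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Identify, ListMetadataFormats, and ListSets describe the repository. ListIdentifiers, ListRecords, and GetRecord do the actual harvesting.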

5
tool we borrowed
  • University of Illinois Urbana-Champaign
    open-source OAI protocol harvester
  • Java edition for our Unix environment
  • Worked collaboratively to iron out kinks:
  • resumptionToken / retryAfter
  • inexplicable kill
  • bogus records in MySQL table
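
To show where resumptionToken and Retry-After come into play, here is a minimal harvesting loop. It is not the UIUC harvester's code. The fetch callable, the retry limit, and the assumption that Retry-After is given in seconds are all choices made for this sketch.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, fetch, sleep=time.sleep, max_retries=3):
    """Collect all <record> elements from a ListRecords harvest.

    `fetch(url)` is a caller-supplied function (an assumption of this
    sketch) returning (status_code, headers_dict, body_text).
    """
    records = []
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    retries = 0
    while True:
        status, headers, body = fetch(f"{base_url}?{urlencode(params)}")
        if status == 503:
            # Repository asks us to wait; this sketch assumes the
            # Retry-After header holds a number of seconds.
            retries += 1
            if retries > max_retries:
                raise RuntimeError("repository kept answering 503")
            sleep(int(headers.get("Retry-After", "60")))
            continue
        retries = 0
        root = ET.fromstring(body)
        records.extend(root.iter(f"{OAI_NS}record"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return records  # no token, or an empty one: list is complete
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing fetch and sleep in as arguments keeps the loop testable without a live repository.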

6
development environment
  • Digital Library Extension Service (DLXS)
  • Develop open-source middleware and license XPAT
    search engine for building and mounting digital
    libraries
  • Middleware consists of document classes, i.e.,
    Text, Image, Bib, FindAid
  • Originally designed to make SGML encoded texts
    available online

7
tool we developed
  • Runs in DLXS environment using BibClass
  • Current BibClass web templates modified
  • Additional Java-based transformation tool, with these steps:
  • DC metadata records concatenated
  • No-digital-object records filtered out
  • Records counted
  • Conversion from UTF-8 to ISO-8859-1
  • XSLT used to transform DC records into BibClass
    records
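
The filter, count, and encoding steps can be sketched as below. The actual tool was Java-based, so this is an illustration rather than its implementation. In particular, treating "has a dc:identifier that is an http(s) URL" as the test for a digital object is an assumption of this sketch, and the XSLT step is left out.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def has_digital_object(record):
    """Assumed test: some dc:identifier holds an http(s) URL."""
    return any(
        (el.text or "").strip().lower().startswith(("http://", "https://"))
        for el in record.iter(f"{DC_NS}identifier")
    )

def prepare(records):
    """Filter, count, and concatenate records; return (count, latin1_bytes)."""
    kept = [r for r in records if has_digital_object(r)]
    combined = ET.Element("records")
    combined.extend(kept)
    text = ET.tostring(combined, encoding="unicode")
    # Characters outside ISO-8859-1 become numeric character references
    # instead of raising an error.
    return len(kept), text.encode("iso-8859-1", errors="xmlcharrefreplace")
```

Using character references for anything outside ISO-8859-1 keeps the output valid XML without silently dropping text.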

8
system design
  • [Diagram; components shown: UIUC harvester,
    OAI-enabled DC records, non-OAI-enabled DC records,
    record storage, XSLT transformation tool, XSL
    stylesheets (per source type), BibClass indexes,
    search interface (XPAT)]

9
result
  • One place to look for digital objects
  • Big
  • 1,723,003 metadata records
  • 203 institutions (as of September 03)
  • Popular
  • Averages 3300 search sessions / month
  • Picked up in March 03, averaging 3500 now
  • 43,894 searches total (through July 03)

10
www.oaister.org: search

11
www.oaister.org: limiters

12
www.oaister.org: sort

13
www.oaister.org: results

14
www.oaister.org: repositories

15
repositories: e.g.,
  • Online Archive of California: manuscripts,
    photographs, and works of art held in
    institutions across California
  • arXiv Eprint Archive: math and physics pre- and
    post-prints
  • Sammelpunkt, Elektronisch Archivierte Theorie:
    archive of philosophical publications
  • British Women Romantic Poets Project: collection
    of poems written by British women between 1789
    and 1832

16
repositories: stats
  • As of July 03, out of 191 repositories:
  • U.S. and foreign
  • U.S.: 49% (94)
  • Foreign: 51% (97)
  • By subject
  • Humanities: 26% (50)
  • Science: 30% (58)
  • Mixed: 43% (83)
  • E-prints and pre-prints
  • Using eprints.org software: 41% (78)
  • Not using eprints.org software: 58% (110)
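
The figures on this slide read as a rounded percentage of the 191 repositories followed by the raw count in parentheses. A quick check of that reading, using only the counts given above:

```python
TOTAL = 191  # repositories as of July 03, per the slide

# Raw counts taken from the parenthesized figures on the slide.
counts = {
    "U.S.": 94,
    "Foreign": 97,
    "Humanities": 50,
    "Science": 58,
    "Mixed": 83,
    "Using eprints.org software": 78,
    "Not using eprints.org software": 110,
}

# Percentage of all repositories, rounded to the nearest whole number.
percentages = {label: round(100 * n / TOTAL) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {percentages[label]}% ({n})")
```

The two eprints.org counts sum to 188 rather than 191; the slide does not explain the difference.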

17
major issues encountered
  • Metadata variation
  • Records not leading to digital objects
  • Access restrictions on digital objects described
    in records
  • Duplicate records for a single digital object

18
issue: metadata variation
  • With more records, users need more restrictions
  • Consistent metadata needed to facilitate these
    restrictions
  • One option: normalization of data

19
issue: metadata variation
  • Type: the obvious quick win
  • 240 metadata values mapped to four generic values
    (text, image, audio, video)
  • e.g.,
  • audio, sound → audio
  • motion, animation, newsreels, etc. → video
  • watercolour, watercolor, slides, etc. → image
  • article, articles, booklet, diss, story, etc. →
    text
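
A normalization like this comes down to a lookup table. The sketch below holds only the sample values named on the slide, not the full set of 240. The normalize_type helper and its case-insensitive matching are assumptions for illustration.

```python
# Only the sample values listed on the slide; the full table of
# roughly 240 values is not reproduced here.
TYPE_MAP = {
    "audio": "audio", "sound": "audio",
    "motion": "video", "animation": "video", "newsreels": "video",
    "watercolour": "image", "watercolor": "image", "slides": "image",
    "article": "text", "articles": "text", "booklet": "text",
    "diss": "text", "story": "text",
}

def normalize_type(value):
    """Return the generic type for a raw dc:type value, or None if unmapped."""
    return TYPE_MAP.get(value.strip().lower())

print(normalize_type("Watercolour"))
print(normalize_type("newsreels"))
print(normalize_type("map"))
```

Returning None for unmapped values leaves it to the caller to decide whether to drop, flag, or pass through the original value.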

20
issue: metadata variation
  • Date: where to begin?
  • Most records with at least one date
  • Some records include up to seven dates
  • No consistent style of date
  • Subject: out of context, what meaning?
  • Many records with at least one subject element
  • But over 100 records with more than 50 subjects
  • And one record with 1000!

21
issue: metadata variation
  • Sample date values
  • <date>2-12-01</date>
  • <date>2002-01-01</date>
  • <date>0000-00-00</date>
  • <date>1822</date>
  • <date>between 1827 and 1833</date>
  • <date>18--?</date>
  • <date>November 13, 1947</date>
  • <date>SEP 1958</date>
  • <date>235 bce</date>
  • <date>Summer, 1948</date>
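
To see why dates are harder to normalize than types, try the simplest plausible rule on the sample values: pull out any four-digit year. The rule and the 1000 to 2099 range are assumptions of this sketch, not something OAIster is stated to do.

```python
import re

# Four-digit years from 1000 to 2099; a deliberately narrow assumption.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")

def extract_years(value):
    """Return every four-digit year found in a raw date string."""
    return [int(y) for y in YEAR.findall(value)]

# The sample date values from the slide.
SAMPLES = [
    "2-12-01", "2002-01-01", "0000-00-00", "1822",
    "between 1827 and 1833", "18--?", "November 13, 1947",
    "SEP 1958", "235 bce", "Summer, 1948",
]

for sample in SAMPLES:
    print(f"{sample!r}: {extract_years(sample)}")
```

Four of the ten samples ("2-12-01", "0000-00-00", "18--?", "235 bce") yield no year at all under this rule, and "between 1827 and 1833" yields two, which matches the slide's point that there is no consistent style to build on.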

22
issue: metadata variation
  • Sample subject values
  • <subject>30,51,52</subject>
  • <subject>1852, Apr. 22. Everitt Judson, letter
    to Philuta Judson.</subject>
  • <subject>Slavery--United States--Controversial
    literature</subject>
  • <subject>view of interior with John Henry
    sculpture</subject>
  • <subject>Particles (Nuclear physics) --
    Research.</subject>

23
issue: no digital objects
  • Some records contain links to further description
    of digital object
  • But not the digital object itself
  • Culling difficult
  • One option: add explanatory text to site

24
issue: access restrictions
  • No records where metadata itself is restricted in
    use (as far as we know!)
  • Definitely some records where objects are
    restricted to licensed users
  • One option: add explanatory text to site

25
issue: access restrictions
  • DC Rights element: often not enough info about
    viewing restrictions
  • Currently no protocol method for indicating
    restricted digital objects (i.e., yes/no toggle
    element)
  • Need to assess whether users feel informed or
    frustrated when encountering restricted objects

26
issue: duplicate records
  • Two records harvested, different identifiers,
    same object described and pointed to
  • Acquired in two ways
  • Harvesting of original repository and aggregator
  • Receiving static DC records provided by content
    creator and harvesting aggregator
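
One way such duplicates could be spotted is by grouping records on the object URL they point to. OAIster does not do this (the issue is described later as not yet addressed), so the sketch below is purely illustrative. The light URL normalization and the example identifiers are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def object_key(url):
    """Normalize a URL lightly so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)

def find_duplicates(records):
    """records: iterable of (oai_identifier, object_url) pairs.

    Returns groups of identifiers that point to the same object.
    """
    groups = defaultdict(list)
    for identifier, url in records:
        groups[object_key(url)].append(identifier)
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records: the same object harvested from the original
# repository and from an aggregator, plus one unrelated record.
records = [
    ("oai:original.example.org:123", "http://original.example.org/papers/123"),
    ("oai:aggregator.example.org:abc", "http://Original.example.org/papers/123/"),
    ("oai:original.example.org:124", "http://original.example.org/papers/124"),
]
print(find_duplicates(records))
```

This only catches records that share a URL. Two records that describe the same object at different addresses would need a comparison of the metadata itself.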

27
issue: duplicate records
  • Aggregators can contain records not currently
    available through OAI channels
  • Aggregators do not always contain all the records
    of a particular original repository
  • So, need to harvest both aggregator and original
    repositories

28
issue: duplicate records
  • Harvest records from aggregator
  • Also receive from original content creator, but
    as snapshot
  • e.g., MEO and cogprints
  • Snapshot before aggregator
  • Creator unsure all records would be aggregated

29
issue: duplicate records
  • Were duplicates to be identified, how to deal
    with the issue?
  • Suppress?
  • Group?
  • Flag?
  • So far, not addressed in OAIster

30
assessment
  • Large survey (over 400 respondents)
  • 2 rounds of face-to-face and remote user testing
  • Conducted before design and after phase one
    rollout

31
assessment: survey
  • Online journals and reference materials wanted
    over other digital objects
  • Difficult to search for information: every
    service is different, so where to start?
  • Number of respondents (5) indicated they were
    generally successful in finding resources online

32
assessment: user testing
  • No short and long record formats: one size fits
    all
  • Want clearly defined and labeled AND/OR searching
    options
  • Results clear and easy to understand
  • Want to sort by title, date, institution,
    resource format... you name it!
  • Use OAIster for academic, trustworthy, authentic
    materials

33
service providers comparison
  • Focus on high usability
  • Focus on all content available
  • Some service providers have increased
    functionality (e.g., de-duplication, integration
    of thesauri)
  • [Chart: UIUC, Emory, etc.; OAIster; DP-9; and ad
    hoc providers plotted by usability (low to high)
    against content (some to all)]

34
future of OAIster
  • Make it faster
  • Advanced searching
  • Grouping to aid browsing
  • Saving/emailing/downloading records
  • Further normalization of data
  • Handling duplicate records
  • Collaboration with other services: search,
    instructional

35
current state of protocol
  • Popular
  • As Peter Suber says:
  • "no other single idea or technology in the
    open-source movement has enjoyed this density of
    endorsement and adoption in a six month period."
  • Data providers over one year
  • June 02: 56 repositories / 274,062 records
  • June 03: 187 repositories / 1,246,953 records
  • Over three-fold increase for repositories
  • Over four-fold increase for records
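
The fold increases follow directly from the June 02 and June 03 figures above:

```python
# Figures from the slide: (repositories, records) at each date.
june_02 = (56, 274_062)
june_03 = (187, 1_246_953)

# Growth factor = later value divided by earlier value.
repo_growth = june_03[0] / june_02[0]
record_growth = june_03[1] / june_02[1]

print(f"repositories: {repo_growth:.1f}x")
print(f"records: {record_growth:.1f}x")
```

Both ratios sit above the whole-number thresholds quoted on the slide (three-fold for repositories, four-fold for records).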

36
future of protocol
  • Branching out
  • HTTP vs. SOAP
  • DC required vs. highly recommended
  • Use of OAI in closed environments
  • Static repository protocol
  • OAI-rights committee
  • Need for add-on applications
  • OAI evangelism

37
how can you be in OAIster?
  • OAI-enable your data
  • Easiest for DLXS customers
  • Make sure data is UTF-8 / Unicode compliant
  • Provide as much metadata as you can
  • Use standard element tags
  • Develop sets for service providers
  • Let us know you're ready to be harvested
  • Keep us informed about changes to the harvesting
    URL, new data and deleted data, change in contact
    info
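
For the UTF-8 point, a data provider could run a quick check like the one below before exposing records. The function name and the idea of reporting the first bad byte offset are assumptions of this sketch. It tests encoding validity only, not whether the XML is otherwise well formed.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

# A correctly encoded string passes; the same text saved as
# ISO-8859-1 fails at the accented character.
print(first_invalid_utf8("café".encode("utf-8")))
print(first_invalid_utf8("café".encode("iso-8859-1")))
```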

38
how can you use OAIster?
  • Just about anywhere
  • Reference desks
  • Tool for researchers and faculty
  • Inclusion into list of electronic resources
    and/or subject guides
  • It is:
  • freely available
  • regularly updated
  • simple to use

39
contact info
  • Kat Hagedorn
  • University of Michigan Libraries, Digital Library
    Production Service
  • khage_at_umich.edu
  • http://www.oaister.org/