Metadata Interoperability and CONTENTdm


1
Metadata Interoperability and CONTENTdm
  • Midwest CONTENTdm Users Group
  • April 30, 2008
  • IUPUI
  • Indianapolis, IN
  • Amy Jackson, amyjacks_at_uiuc.edu
  • Myung-ja Han, mhan3_at_uiuc.edu
  • University of Illinois at Urbana-Champaign

2
University of Illinois at Urbana-Champaign
  • Producer/consumer of metadata in CONTENTdm
  • Currently use CONTENTdm to provide public access
    to 12 collections
  • Various projects harvest metadata from 11
    CONTENTdm repositories around the nation

3
Metadata Interoperability and CONTENTdm
  • Metadata and CONTENTdm
  • Service provider
  • Data provider
  • Longitudinal analysis of harvested metadata
  • Qualitative results
  • Quantitative results
  • Service provider view (Amy)
  • Data provider view (MJ)

4
IMLS Digital Collections and Content
  • Project began December 2002 as an IMLS National
    Leadership Grant
  • Carole Palmer, Principal Investigator, 2007-2010
  • Tim Cole, Principal Investigator, 2002-2007
  • Amy Jackson, Project Coordinator
  • Collaboration between UIUC Library and Graduate
    School of Library and Information Science
  • http://imlsdcc.grainger.uiuc.edu/

5
IMLS Digital Collections and Content
  • Project Objectives
  • Implement a collection registry of digital
    collections created or developed with funding
    from IMLS NLG program
  • Use OAI-PMH to implement an item-level metadata
    repository for items contained in NLG collections
  • Carry out associated research related to:
  • Utility and usability of the Registry and Repository
  • Current metadata practices of IMLS NLG grantees
  • Implications for interoperability (Framework of
    Guidance for Building Good Digital Collections)

6
Item-level repository
  • Item-level Repository
  • Harvesting 71 of 195 Collections (36%)
  • 37 Repositories (some multiple institutions)
  • 10 CONTENTdm repositories
  • 310,448 records
  • Item Records (self-identified types)
  • 86% images
  • 14% text

7
Item-level repository
[Chart: Number of harvested collections using each DC field]
8
Item-level repository
Top item-level subjects: Archaeology, Buildings, Photographers, Mountains,
Men, Archaeological site, Insect, Bodies of water
9
OAI-PMH
  • All 37 repositories export metadata in simple
    Dublin Core
  • Five export in schemas other than simple or
    Qualified Dublin Core
  • MARC21
  • MODS
  • OLAC
  • ETDMS

10
Metadata harvesting
  • OAI-PMH
  • Harvested approach rather than federated approach
  • Data providers create and expose metadata
  • Service providers harvest and aggregate
    metadata
  • Based on HTTP and XML
  • Requires use of Dublin Core
  • Encourages and supports other formats

11
How OAI Works (Technically)
  • 6 distinct verbs (request types)
  • OAI requests are sent via HTTP
  • Responses are sent in valid XML

[Diagram: the service provider's OAI harvester sends an HTTP request (OAI verb) to the OAI data provider on the data provider's digital management system, which returns an HTTP response (valid XML); the harvester stores the results as aggregated metadata.]
12
OAI-PMH in CONTENTdm
  • Enable oai.txt file
  • CONTENTdm base URL followed by /cgi-bin/oai.exe
  • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe
  • OAI verbs
  • ?verb=Identify
  • Returns general information about the archive and
    its policies (e.g., datestamp granularity)
  • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=Identify
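
The request pattern on this slide (base URL, then `?verb=...`, then any further arguments) can be sketched in Python as below. This is a minimal illustration, not CONTENTdm's own code: the helper name `build_oai_url` is hypothetical, and the base URL is the example from this slide written out in full.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_oai_url(base_url, verb, params=None):
    """Return an OAI-PMH request URL of the form base_url?verb=...&key=value.

    `build_oai_url` is a hypothetical helper name used only for this sketch.
    `params` is a dict because some argument names (e.g. "from") are
    reserved words in Python and cannot be passed as keyword arguments.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Skip arguments the caller left unset.
    query.update({k: v for k, v in (params or {}).items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Example base URL taken from the slide.
BASE_URL = "http://images.library.uiuc.edu:8081/cgi-bin/oai.exe"

if __name__ == "__main__":
    print(build_oai_url(BASE_URL, "Identify"))
    print(build_oai_url(BASE_URL, "ListRecords", {"metadataPrefix": "oai_dc"}))
```

Validating the verb up front means a typo is caught locally rather than being reported by the repository as a `badVerb` error.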

14
OAI-PMH verbs
  • Identify
  • ListMetadataFormats
  • ListSets
  • ListIdentifiers
  • ListRecords
  • GetRecord

15
OAI-PMH
  • ListSets
  • Purpose
  • Provide a listing of sets in which records may be
    organized (may be hierarchical, overlapping, or
    flat)
  • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListSets

17
OAI-PMH
  • ListRecords
  • Purpose
  • Retrieves metadata records for multiple items
  • Parameters
  • from: start date
  • until: end date
  • set: set to harvest from
  • resumptionToken: flow control mechanism
  • metadataPrefix: metadata format
  • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc
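
A harvester typically issues ListRecords repeatedly, following each resumptionToken until the repository stops returning one. The sketch below shows that loop using only the Python standard library. The function names and the dict-of-lists record shape are assumptions made for illustration, and error handling (OAI error responses, retries) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespaces used by OAI-PMH 2.0 responses and simple Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response.

    Each record is a dict mapping a DC element name to a list of values.
    Records without metadata (e.g. deleted records) come back as {}.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        fields = {}
        # Keep every simple DC element; DC elements are repeatable.
        for el in rec.iter():
            if el.tag.startswith(DC) and el.text and el.text.strip():
                fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        records.append(fields)
    token_el = root.find(f".//{OAI}resumptionToken")
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    return records, (token_el.text.strip() if has_token else None)


def _http_get(url):
    """Default fetcher: plain HTTP GET returning the response body."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", set_spec=None, fetch=_http_get):
    """Yield records page by page until no resumptionToken is returned."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        page = fetch(f"{base_url}?{urlencode(params)}")
        records, token = parse_list_records(page)
        yield from records
        if token is None:
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

The `fetch` parameter exists so the loop can be exercised against saved responses without contacting a live repository.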

19
OAI-PMH
  • Barriers to sharing metadata through OAI-PMH
  • Technical Infrastructure
  • Metadata
  • Institution/Project
  • CONTENTdm
  • Compliant with OAI-PMH
  • Metadata is mapped to DC

20
Harvested Metadata
  • How has use of Dublin Core changed over time?
  • Records harvested between January 1, 2001 and
    December 31, 2006.
  • Quantitative analysis
  • What measurable changes can we see in the
    metadata?
  • Qualitative analysis
  • How has use of fields changed over time?

21
Quantitative analysis
  • Quantitative analysis
  • Repetition of elements
  • Length of fields
  • Use of core fields (Shreeves et al. (2005))

22
Quantitative analysis
  • Repetition of fields
  • Stable
  • Length of fields
  • Stable
  • Use of all 8 core fields
  • Declining

23
Quantitative Analysis
24
Quantitative Analysis
  • Of these eight elements, the two elements most
    often missing are creator (used in 39% of
    records) and rights (52%).
  • Identifier, title, and subject were each used in
    over 96% of all records.
  • Format and description fields have shown the most
    significant decline in use since 2003.
  • Decreased repetition and length of the
    description field, and an overall increase in use
    of the relation field.
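
Percentages like these can be computed with a short script along the following lines. This is a sketch under stated assumptions, not the project's actual analysis code: records are assumed to be dicts mapping DC element names to lists of values, and the core-field list is partly a guess. The slides name seven fields (creator, rights, identifier, title, subject, format, description); treating `date` as the eighth is an assumption.

```python
from collections import Counter

# Seven of these are named on the slides; "date" as the eighth is an assumption.
CORE_FIELDS = (
    "title", "creator", "subject", "description",
    "date", "format", "identifier", "rights",
)


def _is_used(record, field):
    """True if the record has at least one non-blank value for the field."""
    return any(value.strip() for value in record.get(field, []))


def field_usage(records, fields=CORE_FIELDS):
    """Return {field: percent of records with a non-blank value}."""
    counts = Counter()
    for record in records:
        for field in fields:
            if _is_used(record, field):
                counts[field] += 1
    total = len(records)
    return {f: (100.0 * counts[f] / total if total else 0.0) for f in fields}


def has_all_core_fields(record, fields=CORE_FIELDS):
    """True if every core field is used at least once in the record."""
    return all(_is_used(record, field) for field in fields)


if __name__ == "__main__":
    sample = [
        {"title": ["Frankie"], "creator": ["Neil Sedaka"], "identifier": ["1"]},
        {"title": ["Untitled photograph"], "identifier": ["2"]},
    ]
    for field, percent in field_usage(sample).items():
        print(f"{field:12s} {percent:5.1f}%")
```

Running the same function over records grouped by harvest date would give the kind of year-by-year comparison the longitudinal analysis describes.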

25
Percent of Records Containing each DC field
26
Conclusions
  • Recommendations
  • Publish local metadata practices
  • Publish crosswalking information
  • Expose native metadata in addition to Dublin Core

27
  • Amy Jackson
  • Project Coordinator
  • IMLS Digital Collections and Content
  • University of Illinois at Urbana-Champaign
  • amyjacks_at_uiuc.edu

28
Data provider view
29
  • What does exporting mean?
  • Qualitative analysis
  • - Changes over time
  • - Unpacking MARC
  • - Incorrect mapping
  • - Misuse and confusion of DC elements
  • - What to export and what not
  • - Lost in harvesting
  • What we have learned
  • Recommendations

30
What does exporting mean?
  • Exporting
  • Makes collection metadata available for
    service providers to harvest.
  • CONTENTdm has a turnkey option to make this
    possible.
  • Has DC mapping to provide Dublin Core records
    to service providers.

32
Why export metadata?
  • Increases exposure of collections
  • Broadens user base
  • We can no longer assume that users will come
    through the front door; sharing metadata gets us
    in the flow (Lorcan Dempsey)

  • - Metadata for You & Me

33
Qualitative analysis
  • 225 records from 6 repositories (time
    increments)
  • - Document changes in practice over time
  • - Compare original record vs. harvested record
    in service provider's environment
  • 600 randomly selected records
  • 95 records from 11 repositories and 19
    collections harvested from CONTENTdm

34
Any Changes over time?
  • Only 1 change observed over time
  • Early records: <title>Frankie / Music by Neil
    Sedaka words by Howard Greenfield</title>
  • Later records: <title>Frankie</title>
    <creator>Music by Neil Sedaka words by Howard
    Greenfield</creator>

35
Other findings
  • Unpacking MARC
  • Incorrect mapping
  • Misuse and confusion of Dublin Core elements
  • What to export and what not
  • And
  • Lost in harvesting

36
Unpacking MARC
  • Object Description: Photograph b&w 6 1/8 x 8 in.
  • <type>Photograph b&w 6 1/8 x 8 in.</type>
  • Publication Information: Lancaster, Pa.?
    Johann Albrecht und Comp.?, 1790?
  • <publisher>Lancaster, Pa.? Johann Albrecht und
    Comp.?, 1790?</publisher>

37
Unpacking MARC
  • a. MARC 245 could be mapped to:
  • Subfield 'a' => <title>
  • Subfield 'b' => <title> or <alternative>
  • Subfield 'c' => <creator> or <contributor>
  • Subfield 'f' => <date>
  • Subfield 'g' => <date>
  • Subfield 'h' => <format>
  • Subfield 'k' => <type>
  • Subfield 'n' => <description> or <title>
  • Subfield 'p' => <description> or <title>
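
The subfield table above can be written down as a lookup and applied to a 245 field, as in this sketch. The function name `unpack_245`, the `(code, value)` input shape, and the rule of defaulting to the first listed option are all assumptions made for illustration; the slide leaves the choice between alternatives to local practice.

```python
# MARC 245 subfield code -> candidate DC elements, in the order listed on
# the slide. "alternative" is a Qualified Dublin Core refinement of title.
MARC_245_TO_DC = {
    "a": ("title",),
    "b": ("title", "alternative"),
    "c": ("creator", "contributor"),
    "f": ("date",),
    "g": ("date",),
    "h": ("format",),
    "k": ("type",),
    "n": ("description", "title"),
    "p": ("description", "title"),
}


def unpack_245(subfields, prefer=None):
    """Map MARC 245 subfields to DC elements.

    subfields: iterable of (code, value) pairs, e.g. [("a", "Frankie /")].
    prefer:    optional {code: element} overriding the default choice,
               which is simply the first option listed for that subfield.
    """
    prefer = prefer or {}
    record = {}
    for code, value in subfields:
        candidates = MARC_245_TO_DC.get(code)
        if candidates is None:
            continue  # subfield not covered by the table
        element = prefer.get(code, candidates[0])
        if element not in candidates:
            raise ValueError(f"{element!r} is not a listed option for ${code}")
        # Drop ISBD punctuation that trails (or leads) the subfield value.
        record.setdefault(element, []).append(value.strip(" /;:,"))
    return record


if __name__ == "__main__":
    field_245 = [
        ("a", "Frankie /"),
        ("c", "Music by Neil Sedaka words by Howard Greenfield"),
    ]
    print(unpack_245(field_245))
```

Unpacking subfields this way produces separate title and creator values instead of one long title string that carries the statement of responsibility.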

38
Unpacking MARC
  • b. MARC 260
  • <publisher>
  • <date>
  • c. MARC 6xx
  • <subject>
  • <coverage-temporal>
  • <coverage-spatial>
  • <type>

39
Incorrect Mapping
  • a. Digital Reproduction Information: Scanned as a
    3000 pixel TIFF image in 8-bit grayscale, resized
    to 640 pixels in the longest dimension and
    compressed into JPEG format using Photoshop 6.0
    and its JPEG quality measurement 3.
  • Where do you map this?
  • <format> Scanned as a 3000 pixel TIFF image in
    8-bit grayscale, resized to 640 pixels in the
    longest dimension and compressed into JPEG format
    using Photoshop 6.0 and its JPEG quality
    measurement 3.

40
Incorrect Mapping
  • b. Repository: University of Prominent Libraries.
    Special Collections Division.
  • Repository Collection: Prominent Photograph
    Collection. PH Coll 282
  • Where do you map these?
  • <source> University of Prominent Libraries.
    Special Collections Division.
  • <source> Prominent Photograph Collection. PH
    Coll 282

41
Incorrect Mapping
  • c. Physical description: 9 in. x 6 in.
  • Where do you map this?
  • <description> 9 in. x 6 in.

42
Misuse of Dublin Core elements
  • a. <date> and <coverage>
  • - Item about the nineteenth century, published
    in 2007.
  • Metadata should be?
  • <date>1800-1899
  • OR
  • <date>2007
  • <coverage>1800-1899

43
Misuse of Dublin Core elements
  • b. <source> and <relation>
  • Repository: PSMHS Collection is located at the
    Museum of History & Industry, Seattle
  • Repository Collection: Joe Williamson Collection
  • Both of them mapped to <source>
  • <source>
  • A related resource from which the described
    resource is derived.
  • <relation>
  • A related resource. - Dublin Core Metadata
    Element Set, Version 1.1

44
Misuse of Dublin Core elements
  • c. <type>, <format>, and <description>
  • <type>Photograph b&w 6 1/8 x 8 in.</type>
  • <format>1 tool wood</format>
  • <description>9 in. x 6 in.</description>
  • <description>Material Whale Bone</description>
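
One way to catch values like these before export is to look at the value string itself rather than the field label. The sketch below flags type, format, and description values that contain a physical dimension. The regular expression and the function name are illustrative assumptions, and where a flagged value should be re-mapped (for example to a Qualified Dublin Core `extent` refinement) is a local decision the slides do not prescribe.

```python
import re

# Heuristic for dimensions such as "6 1/8 x 8 in." or "9 in. x 6 in.".
# An illustrative pattern only; real collections will need local tuning.
DIMENSION_RE = re.compile(
    r"\d+(?:\s+\d+/\d+)?\s*(?:in\.|cm\.?)?\s*x\s*"
    r"\d+(?:\s+\d+/\d+)?\s*(?:in\.|cm\.?)",
    re.IGNORECASE,
)


def flag_dimension_values(record, fields=("type", "format", "description")):
    """Return (field, value, matched_dimension) for values holding a size.

    record: dict mapping DC element names to lists of string values.
    """
    flagged = []
    for field in fields:
        for value in record.get(field, []):
            match = DIMENSION_RE.search(value)
            if match:
                flagged.append((field, value, match.group(0)))
    return flagged


if __name__ == "__main__":
    record = {
        "type": ["Photograph b&w 6 1/8 x 8 in."],
        "format": ["1 tool wood"],
        "description": ["9 in. x 6 in.", "Material Whale Bone"],
    }
    for field, value, dimension in flag_dimension_values(record):
        print(f"<{field}> holds a dimension: {dimension!r} in {value!r}")
```

A report like this only points a cataloger at suspect values; it does not decide the correct element on its own.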

45
After re-mapping the records
  • DC element usage (118 records)

46
After re-mapping the records
  • Number of records with 8 DC fields

47
What to export and what not
  • a. Information about scanning?
  • <format>Three-dimensional objects, oversized
    prints and posters photographed with a Nikon D1X
    digital camera at resolution of 1312 x 2000
    pixels, eight bits per RGB channel in TIF format.
    Images downloaded onto CD-R's, then copied using
    a Dell Optiplex GX150 and stored in Network Area
    Storage for non-display archival purposes.
    Additional copy created for further processing.
    If necessary, color correction performed using
    Levels in Photoshop. Resized at 720 dpi vertical,
    then compressed using Photoshop setting of 80
    into JPG format for Web display.</format>

48
What to export and what not
  • b. Information about shelf, box, and folder
    number of item?
  • <dc:source>99</dc:source>
  • <dc:source>1</dc:source>
  • <dc:source>14</dc:source>
  • <dc:source>5</dc:source>

49
What to export and what not
  • c. Two publishers, which to export?
  • Digital Publisher: Electronically reproduced by
    the Digital Services unit of the University of
    Central Florida Libraries, Orlando, 2005.
  • Publisher: Students of Rollins College.
  • <publisher>Students of Rollins College.</publisher>
  • The Digital Publisher information is not mapped
    to export.

52
Lost in harvesting
55
What we have learned
  • Native metadata records are rich in meaning in
    their own environment, but lose richness in the
    aggregated environment due to mapping errors and
    misunderstanding and misuse of Dublin Core
    elements.
  • Mapping is often based on semantic meanings of
    metadata fields rather than value strings.
  • Correct mapping could improve metadata quality
    significantly.

56
CONTENTdm Collections
  • Could be exposed via service providers in DC
    format
  • Could be exposed via WorldCat in MARC format
  • How can we provide good records to users in
    service providers' environments?

57
Recommendations
  • Create project-based best practices and a
    content standard
  • Consider using field names that can be useful
    globally
  • Ensure that metadata creators receive proper
    training
  • But first of all,

58
Use Qualified Dublin Core elements
59
Questions and comments
  • Myung-ja (mj) Han
  • Metadata Librarian
  • University of Illinois at Urbana-Champaign
  • mhan3_at_uiuc.edu