Title: Metadata Interoperability and CONTENTdm
1 Metadata Interoperability and CONTENTdm
- Midwest CONTENTdm Users Group
- April 30, 2008
- IUPUI
- Indianapolis, IN
- Amy Jackson, amyjacks_at_uiuc.edu
- Myung-ja Han, mhan3_at_uiuc.edu
- University of Illinois at Urbana-Champaign
2 University of Illinois at Urbana-Champaign
- Producer and consumer of metadata in CONTENTdm
- Currently uses CONTENTdm to provide public access to 12 collections
- Various projects harvest metadata from 11 CONTENTdm repositories around the nation
3 Metadata Interoperability and CONTENTdm
- Metadata and CONTENTdm
- Service provider
- Data provider
- Longitudinal analysis of harvested metadata
- Qualitative results
- Quantitative results
- Service provider view (Amy)
- Data provider view (MJ)
4 IMLS Digital Collections and Content
- Project began December 2002 as an IMLS National Leadership Grant
- Carole Palmer, Principal Investigator, 2007-2010
- Tim Cole, Principal Investigator, 2002-2007
- Amy Jackson, Project Coordinator
- Collaboration between UIUC Library and Graduate School of Library and Information Science
- http://imlsdcc.grainger.uiuc.edu/
5 IMLS Digital Collections and Content
- Project Objectives
- Implement a collection registry of digital collections created or developed with funding from the IMLS NLG program
- Use OAI-PMH to implement an item-level metadata repository for items contained in NLG collections
- Carry out associated research related to:
- Utility and usability of the Registry & Repository
- Current metadata practices of IMLS NLG grantees
- Implications for interoperability (Framework of Guidance for Building Good Digital Collections)
6 Item-level repository
- Item-level Repository
- Harvesting 71 of 195 collections (36%)
- 37 repositories (some multiple institutions)
- 10 CONTENTdm repositories
- 310,448 records
- Item records (self-identified types)
- 86% images
- 14% text
7 Item-level repository
[Chart: Number of harvested collections using each DC field]
8 Item-level repository
Top item-level subjects: Archaeology, Buildings, Photographers, Mountains, Men, Archaeological site, Insect, Bodies of water
9 OAI-PMH
- All 37 repositories export metadata in simple Dublin Core
- Five export in schemas other than simple or Qualified Dublin Core:
- MARC21
- MODS
- OLAC
- ETDMS
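A harvester finds out which of these formats a repository offers with the ListMetadataFormats verb. The minimal sketch below, assuming Python 3.9+ and only the standard library, pulls the metadataPrefix values out of such a response; the sample response (a hypothetical repository offering oai_dc and MODS) is illustrative, not captured from any of the 37 repositories.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_metadata_prefixes(response_xml: str) -> list[str]:
    """Return every metadataPrefix advertised in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    return [
        el.text.strip()
        for el in root.iterfind(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
        if el.text
    ]

# Illustrative response from a hypothetical repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mods</metadataPrefix>
      <schema>http://www.loc.gov/standards/mods/v3/mods-3-3.xsd</schema>
      <metadataNamespace>http://www.loc.gov/mods/v3</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

prefixes = list_metadata_prefixes(SAMPLE_RESPONSE)
print(prefixes)  # ['oai_dc', 'mods']
```

Because oai_dc must always be present, a service provider can rely on it while checking this list for a richer format such as MODS or MARC21.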
10 Metadata harvesting
- OAI-PMH
- Harvesting approach rather than federated approach
- Data providers create and expose metadata
- Service providers harvest and aggregate metadata
- Based on HTTP and XML
- Requires support for simple (unqualified) Dublin Core
- Encourages and supports other formats
11 How OAI Works (Technically)
- 6 distinct verbs (request types)
- OAI requests are sent via HTTP
- Responses are sent in valid XML
[Diagram: the service provider's OAI harvester sends an HTTP request (an OAI verb) to the OAI data provider on the data provider's digital management system, which returns an HTTP response in valid XML; the harvester collects the responses as aggregated metadata.]
12 OAI-PMH in CONTENTdm
- Enable the oai.txt file
- CONTENTdm base URL followed by /cgi-bin/oai.exe
- http://images.library.uiuc.edu:8081/cgi-bin/oai.exe
- OAI verbs
- ?verb=Identify
- Returns general information about the archive and its policies (e.g., datestamp granularity)
- http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=Identify
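Every OAI-PMH request is an ordinary HTTP GET: the base URL, a "?", the verb, and any further arguments. The minimal sketch below builds such URLs with the standard library. The base URL is the one on this slide (host images.library.uiuc.edu, port 8081); that server dates from 2008 and may no longer answer, so the sketch only builds URLs and never fetches them.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL from the slide; another CONTENTdm server would use its own host and port.
BASE_URL = "http://images.library.uiuc.edu:8081/cgi-bin/oai.exe"

def build_oai_url(base_url: str, verb: str, params: Optional[dict] = None) -> str:
    """Build an OAI-PMH GET URL: base URL, then verb=... and any other arguments."""
    query = {"verb": verb, **(params or {})}
    return f"{base_url}?{urlencode(query)}"

identify_url = build_oai_url(BASE_URL, "Identify")
records_url = build_oai_url(BASE_URL, "ListRecords",
                            {"metadataPrefix": "oai_dc", "from": "2006-01-01"})
print(identify_url)
# http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=Identify
print(records_url)
# http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc&from=2006-01-01
```

Arguments are passed as a dict rather than keyword arguments because `from`, one of the standard OAI-PMH arguments, is a reserved word in Python.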
14 OAI-PMH verbs
- Identify
- ListMetadataFormats
- ListSets
- ListIdentifiers
- ListRecords
- GetRecord
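The six verbs differ in the arguments they accept. The sketch below encodes the argument rules of the OAI-PMH 2.0 specification and checks a request before it is sent, reporting problems with the protocol's own error names (badVerb, badArgument). It is a client-side convenience written for illustration, not part of CONTENTdm.

```python
# Arguments each verb accepts under OAI-PMH 2.0 (resumptionToken handled separately).
VERB_ARGS = {
    "Identify":            (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets":            (set(), set()),
    "ListIdentifiers":     ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords":         ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord":           ({"identifier", "metadataPrefix"}, set()),
}  # verb -> (required arguments, optional arguments)

# Verbs that may be resumed; resumptionToken must then be the only argument.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}

def check_request(verb: str, args: dict) -> list:
    """Return OAI-PMH-style error strings for a request; [] means it looks valid."""
    if verb not in VERB_ARGS:
        return [f"badVerb: {verb}"]
    names = set(args)
    if "resumptionToken" in names:
        if verb not in RESUMABLE:
            return [f"badArgument: {verb} does not accept resumptionToken"]
        if names != {"resumptionToken"}:
            return ["badArgument: resumptionToken is an exclusive argument"]
        return []
    required, optional = VERB_ARGS[verb]
    errors = [f"badArgument: missing {a}" for a in sorted(required - names)]
    errors += [f"badArgument: illegal {a}" for a in sorted(names - required - optional)]
    return errors

print(check_request("ListRecords", {"metadataPrefix": "oai_dc", "set": "photos"}))  # []
print(check_request("GetRecord", {"identifier": "oai:example:1"}))
# ['badArgument: missing metadataPrefix']
```

The set name "photos" and the identifier "oai:example:1" are hypothetical placeholders.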
15 OAI-PMH
- ListSets
- Purpose
- Provide a listing of sets in which records may be organized (sets may be hierarchical, overlapping, or flat)
- http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListSets
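A ListSets response pairs each machine-readable setSpec with a human-readable setName, and hierarchy is expressed by colon-separated setSpecs. The sketch below, using only the standard library, parses such a response and recovers a set's parent. The sample response and its set names are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def parse_list_sets(response_xml: str) -> dict:
    """Map each setSpec in a ListSets response to its human-readable setName."""
    root = ET.fromstring(response_xml)
    sets = {}
    for s in root.iterfind(".//oai:ListSets/oai:set", OAI_NS):
        spec = s.findtext("oai:setSpec", default="", namespaces=OAI_NS).strip()
        sets[spec] = s.findtext("oai:setName", default="", namespaces=OAI_NS).strip()
    return sets

def parent_set(spec: str):
    """Hierarchical setSpecs use ':' as a separator; return the parent, or None."""
    return spec.rsplit(":", 1)[0] if ":" in spec else None

# Hypothetical response with one hierarchical set.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>maps</setSpec><setName>Historical Maps</setName></set>
    <set><setSpec>maps:illinois</setSpec><setName>Illinois Maps</setName></set>
  </ListSets>
</OAI-PMH>"""

sets = parse_list_sets(SAMPLE)
for spec, name in sets.items():
    print(spec, "|", name, "| parent:", parent_set(spec))
```

A setSpec found this way is what a harvester passes as the set argument of ListRecords.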
17 OAI-PMH
- ListRecords
- Purpose
- Retrieves metadata records for multiple items
- Parameters
- from: start date
- until: end date
- set: set to harvest from
- resumptionToken: flow-control token for retrieving the next batch of an incomplete list
- metadataPrefix: metadata format
- http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc
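The resumptionToken parameter is what makes large harvests possible: a repository returns records in batches and ends each incomplete batch with a token, and an empty or absent token marks the end of the list. The sketch below follows that loop. To stay runnable offline it takes a `fetch` function (URL in, XML text out) and uses two hypothetical pages keyed by URL; in practice `fetch` might wrap `urllib.request.urlopen`. The base URL `http://example.org/oai` is a placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def list_records(base_url: str, fetch: Callable[[str], str],
                 metadata_prefix: str = "oai_dc", set_spec: Optional[str] = None,
                 from_date: Optional[str] = None,
                 until_date: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield every <record>, re-requesting with resumptionToken until the list ends."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    for key, value in (("set", set_spec), ("from", from_date), ("until", until_date)):
        if value:
            params[key] = value
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI}record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one: the list is complete
        # resumptionToken is exclusive, so the follow-up request carries nothing else.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Offline stand-in for an HTTP client: two hypothetical response pages.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:example:1</identifier></header></record>
<record><header><identifier>oai:example:2</identifier></header></record>
<resumptionToken>batch2</resumptionToken></ListRecords></OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:example:3</identifier></header></record>
<resumptionToken/></ListRecords></OAI-PMH>"""
PAGES = {
    "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc": PAGE_1,
    "http://example.org/oai?verb=ListRecords&resumptionToken=batch2": PAGE_2,
}

harvested = [r.findtext(f"{OAI}header/{OAI}identifier")
             for r in list_records("http://example.org/oai", PAGES.__getitem__)]
print(harvested)  # ['oai:example:1', 'oai:example:2', 'oai:example:3']
```

Using a generator lets a harvester process records batch by batch instead of holding an entire repository in memory.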
19 OAI-PMH
- Barriers to sharing metadata through OAI-PMH
- Technical Infrastructure
- Metadata
- Institution/Project
- CONTENTdm
- Compliant with OAI-PMH
- Metadata is mapped to DC
20 Harvested Metadata
- How has use of Dublin Core changed over time?
- Records harvested between January 1, 2001 and December 31, 2006
- Quantitative analysis
- What measurable changes can we see in the metadata?
- Qualitative analysis
- How has use of fields changed over time?
21 Quantitative analysis
- Quantitative analysis
- Repetition of elements
- Length of fields
- Use of core fields (Shreeves et al., 2005)
22 Quantitative analysis
- Repetition of fields
- Stable
- Length of fields
- Stable
- Use of all 8 core fields
- Declining
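The three measures on this slide can be computed directly from harvested records. The sketch below is one plausible operationalization, assuming each record is a dict from DC element name to a list of values; it is not the study's actual method. The eight core fields of Shreeves et al. (2005) are not listed on these slides, so the core-field set is a parameter and the three fields used in the example are illustrative only, as are the sample records.

```python
from statistics import mean

# Each record maps a DC element name to its list of values (elements may repeat).
SAMPLE_RECORDS = [
    {"title": ["Frankie"], "creator": ["Neil Sedaka", "Howard Greenfield"],
     "subject": ["Popular music"]},
    {"title": ["Mountain view"], "subject": ["Mountains", "Bodies of water"],
     "rights": ["Public domain"]},
]

def mean_repetition(records: list) -> float:
    """Average number of values per element, counting only elements a record uses."""
    return mean(len(vals) for rec in records for vals in rec.values() if vals)

def mean_length(records: list, element: str) -> float:
    """Average character length of one element's values across all records."""
    lengths = [len(v) for rec in records for v in rec.get(element, [])]
    return mean(lengths) if lengths else 0.0

def share_using_all(records: list, core_fields) -> float:
    """Fraction of records that contain a value for every field in core_fields."""
    return sum(all(rec.get(f) for f in core_fields) for rec in records) / len(records)

print(round(mean_repetition(SAMPLE_RECORDS), 2))   # 1.33
print(mean_length(SAMPLE_RECORDS, "title"))         # 10
print(share_using_all(SAMPLE_RECORDS, ("title", "creator", "subject")))  # 0.5
```

Running these per harvest year is one way to see the pattern on this slide: repetition and length stay flat while the share of records using all core fields drops.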
23 Quantitative Analysis
24 Quantitative Analysis
- Of these eight elements, the two most often missing are creator (used in 39% of records) and rights (52%).
- Identifier, title, and subject were each used in over 96% of all records.
- Format and description fields have shown the most significant decline in use since 2003.
- Repetition and length of the description field have decreased, and use of the relation field has increased overall.
25 Percent of Records Containing Each DC Field
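Percentages like those on the previous slide (creator in 39% of records, rights in 52%) come from counting, for each element, how many records carry at least one value. The sketch below does that over the 15 elements of the Dublin Core Metadata Element Set 1.1, with records as dicts from element name to a list of values; the three sample records are hypothetical.

```python
from collections import Counter

# The 15 elements of the Dublin Core Metadata Element Set, Version 1.1.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")

def percent_containing(records: list) -> dict:
    """Percent of records with at least one value for each DC element."""
    counts = Counter(el for rec in records for el in DC_ELEMENTS if rec.get(el))
    return {el: round(100 * counts[el] / len(records), 1) for el in DC_ELEMENTS}

# Hypothetical records: element name -> list of values.
records = [
    {"title": ["A"], "identifier": ["1"], "subject": ["Men"]},
    {"title": ["B"], "identifier": ["2"], "creator": ["C. Smith"]},
    {"title": ["C"], "identifier": ["3"], "rights": ["Public domain"]},
]
pct = percent_containing(records)
print(pct["title"], pct["creator"], pct["coverage"])  # 100.0 33.3 0.0
```

Each element is counted once per record, so a record with three subject values still contributes a single hit for subject.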
26 Conclusions
- Recommendations
- Publish local metadata practices
- Publish crosswalking information
- Expose native metadata in addition to Dublin Core
27 Amy Jackson
- Project Coordinator
- IMLS Digital Collections and Content
- University of Illinois at Urbana-Champaign
- amyjacks_at_uiuc.edu
28 Data provider view
29
- What does exporting mean?
- Qualitative analysis
  - Changes over time
  - Unpacking MARC
  - Incorrect mapping
  - Misuse and confusion of DC elements
  - What to expose and what not
  - Lost in harvesting
- What we have learned
- Recommendations
30 What does exporting mean?
- Exporting
- Makes collection metadata available for service providers to harvest
- CONTENTdm has a turnkey option to make this possible
- Has a DC mapping to provide Dublin Core records to service providers
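Exporting through a DC mapping amounts to applying a crosswalk from local field names to DC elements. The sketch below uses a hypothetical crosswalk dict as a stand-in for CONTENTdm's mapping (the local field names are illustrative) and, besides the DC record, returns the local fields that were not exported, which is the information the "publish crosswalking information" recommendation asks data providers to share. The sample values come from the "Two publishers" and "Physical description" slides; the title is a placeholder.

```python
# Hypothetical crosswalk from local CONTENTdm field names to DC elements.
# A value of None records a deliberate decision not to export that field.
CROSSWALK = {
    "Title": "title",
    "Publisher": "publisher",
    "Digital Publisher": None,
    "Physical Description": "format",
}

def to_dc(local_record: dict, crosswalk: dict):
    """Apply a crosswalk; return (DC record, local fields that were not exported)."""
    dc, not_exported = {}, []
    for field, value in local_record.items():
        element = crosswalk.get(field)
        if element is None:           # unmapped, or explicitly excluded
            not_exported.append(field)
        else:
            dc.setdefault(element, []).append(value)
    return dc, not_exported

local = {
    "Title": "Example title",
    "Publisher": "Students of Rollins College.",
    "Digital Publisher": "Electronically reproduced by the Digital Services unit "
                         "of the University of Central Florida Libraries, Orlando, 2005.",
    "Physical Description": "9 in. x 6 in.",
}
dc, skipped = to_dc(local, CROSSWALK)
print(dc)
print(skipped)  # ['Digital Publisher']
```

Here dimensions go to format, following the DCMES 1.1 definition of Format ("file format, physical medium, or dimensions").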
32 Why export metadata?
- Increases exposure of collections
- Broadens user base
- We can no longer assume that users will come through the front door; sharing metadata gets us in the flow (Lorcan Dempsey)
- Metadata for you & me
33 Qualitative analysis
- 225 records from 6 repositories (time increments)
  - Document changes in practice over time
  - Compare original record vs. harvested record in the service provider's environment
- 600 randomly selected records
- 95 records from 11 repositories and 19 collections harvested from CONTENTdm
34 Any changes over time?
- Only one change was observed over time
- Early records: <title>Frankie / Music by Neil Sedaka words by Howard Greenfield</title>
- Later records: <title>Frankie</title><creator>Music by Neil Sedaka words by Howard Greenfield</creator>
35 Other findings
- Unpacking MARC
- Incorrect mapping
- Misuse and confusion of Dublin Core elements
- What to export and what not
- And
- Lost in harvesting
36 Unpacking MARC
- Object Description: Photograph b&w 6 1/8x8 in.
- <type>Photograph b&amp;w 6 1/8 x 8 in.</type>
- Publication Information: Lancaster, Pa.? Johann Albrecht und Comp.?, 1790?
- <publisher>Lancaster, Pa.? Johann Albrecht und Comp.?, 1790?</publisher>
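The exported values above end up inside an oai_dc record. The minimal sketch below serializes a dict of DC values into an oai_dc:dc element with the standard library (the xsi:schemaLocation attribute a real OAI response carries is omitted for brevity). Note that the XML serializer escapes "&" as "&amp;", which is why the exported <type> reads b&amp;w.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

def to_oai_dc(record: dict) -> str:
    """Serialize {element: [values]} as an oai_dc:dc block (schemaLocation omitted)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        for value in values:
            # ElementTree escapes reserved characters, so "&" is written as "&amp;".
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

out = to_oai_dc({
    "type": ["Photograph b&w 6 1/8 x 8 in."],
    "publisher": ["Lancaster, Pa.? Johann Albrecht und Comp.?, 1790?"],
})
print(out)
```

Serializing this way keeps the whole string in one element; unpacking it into separate elements has to happen before serialization, at the mapping step.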
37 Unpacking MARC
- a. MARC 245 could be mapped to:
- Subfield 'a' → <title>
- Subfield 'b' → <title> or <alternative>
- Subfield 'c' → <creator> or <contributor>
- Subfield 'f' → <date>
- Subfield 'g' → <date>
- Subfield 'h' → <format>
- Subfield 'k' → <type>
- Subfield 'n' → <description> or <title>
- Subfield 'p' → <description> or <title>
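The 245 table above can be applied mechanically. In the sketch below, a 245 field is a list of (subfield code, value) pairs; where the table offers two targets, the sketch arbitrarily takes the first, which is an assumption rather than a recommendation. The Frankie record from the earlier "changes over time" slide, read here as a 245 with $a and $c, splits into the title and creator seen in the later records.

```python
# The slide's MARC 245 subfield table. Where the slide lists two targets
# ('b', 'c', 'n', 'p'), this sketch arbitrarily takes the first one.
MARC245_TO_DC = {
    "a": "title", "b": "title", "c": "creator", "f": "date", "g": "date",
    "h": "format", "k": "type", "n": "description", "p": "description",
}

def unpack_245(subfields: list) -> dict:
    """Split one 245 field, given as (code, value) pairs, into DC elements."""
    dc = {}
    for code, value in subfields:
        element = MARC245_TO_DC.get(code)
        if element is None:
            continue  # subfields not in the table (e.g. '6', '8') are skipped
        # Trim trailing ISBD punctuation such as " /" or " :" left by cataloging.
        dc.setdefault(element, []).append(value.strip(" /:;,="))
    return dc

frankie = [("a", "Frankie /"), ("c", "Music by Neil Sedaka words by Howard Greenfield")]
print(unpack_245(frankie))
# {'title': ['Frankie'], 'creator': ['Music by Neil Sedaka words by Howard Greenfield']}
```

A production crosswalk would also decide whether $b belongs in the same title string as $a; this sketch keeps them as separate values.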
38 Unpacking MARC
- b. MARC 260
- <publisher>
- <date>
- c. MARC 6xx
- <subject>
- <coverage-temporal>
- <coverage-spatial>
- <type>
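The same unpacking applies to 260 and 6xx. The sketch below is one reasonable reading of this slide, not a standard crosswalk: 260 $a (place) and $b (publisher) go to publisher and $c to date; in a 6xx field the heading ($a plus any $x general subdivision) goes to subject, while the $v form, $y chronological, and $z geographic subdivisions go to type, coverage-temporal, and coverage-spatial. The imprint values come from the "Unpacking MARC" example; the subject heading is hypothetical.

```python
# Hypothetical unpacking of MARC 260 and 6xx fields into the DC elements on the slide.
SUBDIVISION_TO_DC = {
    "v": "type",               # form subdivision
    "y": "coverage-temporal",  # chronological subdivision
    "z": "coverage-spatial",   # geographic subdivision
}

def _clean(value: str) -> str:
    return value.strip(" :;,/")  # trim ISBD-style separators

def unpack_260(subfields: list) -> dict:
    """$a (place) and $b (publisher) -> publisher; $c (date) -> date."""
    dc = {}
    publisher = " ".join(_clean(v) for c, v in subfields if c in ("a", "b"))
    if publisher:
        dc["publisher"] = [publisher]
    dates = [_clean(v) for c, v in subfields if c == "c"]
    if dates:
        dc["date"] = dates
    return dc

def unpack_6xx(subfields: list) -> dict:
    """Heading ($a plus $x) -> subject; $v, $y, $z subdivisions -> their own elements."""
    dc = {}
    heading = " -- ".join(_clean(v) for c, v in subfields if c in ("a", "x"))
    if heading:
        dc["subject"] = [heading]
    for c, v in subfields:
        if c in SUBDIVISION_TO_DC:
            dc.setdefault(SUBDIVISION_TO_DC[c], []).append(_clean(v))
    return dc

imprint = [("a", "Lancaster, Pa.?"), ("b", "Johann Albrecht und Comp.?"), ("c", "1790?")]
subject_field = [("a", "Mountains"), ("z", "Illinois"), ("y", "19th century"),
                 ("v", "Photographs")]
print(unpack_260(imprint))
print(unpack_6xx(subject_field))
```

Unlike the single <publisher> string in the earlier example, the date now lands in its own element.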
39 Incorrect Mapping
- a. Digital Reproduction Information: Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3.
- Where do you map this?
- <format>Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3.</format>
40 Incorrect Mapping
- b. Repository: University of Prominent Libraries. Special Collections Division.
- Repository Collection: Prominent Photograph Collection. PH Coll 282
- Where do you map these?
- <source>University of Prominent Libraries. Special Collections Division.</source>
- <source>Prominent Photograph Collection. PH Coll 282</source>
41 Incorrect Mapping
- c. Physical description: 9 in. x 6 in.
- Where do you map this?
- <description>9 in. x 6 in.</description>
42 Misuse of Dublin Core elements
- a. <date> and <coverage>
- Item about the nineteenth century, published in 2007.
- What should the metadata be?
- <date>1800-1899</date>
- OR
- <date>2007</date>
- <coverage>1800-1899</coverage>
43 Misuse of Dublin Core elements
- b. <source> and <relation>
- Repository: PSMHS Collection is located at the Museum of History & Industry, Seattle
- Repository Collection: Joe Williamson Collection
- Both of them are mapped to <source>
- <source>: A related resource from which the described resource is derived.
- <relation>: A related resource.
- (Dublin Core Metadata Element Set, Version 1.1)
44 Misuse of Dublin Core elements
- c. <type>, <format>, and <description>
- <type>Photograph b&amp;w 6 1/8 x 8 in.</type>
- <format>1 tool wood</format>
- <description>9 in. x 6 in.</description>
- <description>Material Whale Bone</description>
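Some of this confusion can be caught automatically before export. The sketch below is a heuristic linter, not a validator: it flags <type> values outside the DCMI Type Vocabulary (the controlled vocabulary DC recommends for Type) and flags dimension-like strings in <type> or <description>, since DCMES 1.1 places dimensions under Format. The dimension pattern is a rough assumption and will miss or mis-flag some values.

```python
import re

# DCMI Type Vocabulary terms, the controlled vocabulary DC recommends for <type>.
DCMI_TYPES = {"Collection", "Dataset", "Event", "Image", "InteractiveResource",
              "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
              "StillImage", "Text"}

# Rough heuristic for dimensions such as "9 in. x 6 in." or "6 1/8 x 8 in.".
DIMENSIONS = re.compile(r"\d[\d/ .]*\s*(?:in\.?|cm|mm)?\s*x\s*\d", re.IGNORECASE)

def lint_record(record: dict) -> list:
    """Flag likely type/format/description confusion in a {element: [values]} record."""
    warnings = []
    for value in record.get("type", []):
        if value not in DCMI_TYPES:
            warnings.append(f"type: {value!r} is not a DCMI Type Vocabulary term")
    for element in ("type", "description"):
        for value in record.get(element, []):
            if DIMENSIONS.search(value):
                warnings.append(f"{element}: {value!r} looks like dimensions (format?)")
    return warnings

# Values from the slide above, gathered into one hypothetical record.
record = {
    "type": ["Photograph b&w 6 1/8 x 8 in."],
    "format": ["1 tool wood"],
    "description": ["9 in. x 6 in.", "Material Whale Bone"],
}
for w in lint_record(record):
    print(w)
```

The linter does not catch "1 tool wood" in <format> or "Material Whale Bone" in <description>; judging those still takes a person who knows the collection.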
45 After re-mapping the records
- DC element usage (118 records)
46 After re-mapping the records
- Number of records with 8 DC fields
47 What to export and what not
- a. Information about scanning?
- <format>Three-dimensional objects, oversized prints and posters photographed with a Nikon D1X digital camera at resolution of 1312 x 2000 pixels, eight bits per RGB channel in TIF format. Images downloaded onto CD-R's, then copied using a Dell Optiplex GX150 and stored in Network Area Storage for non-display archival purposes. Additional copy created for further processing. If necessary, color correction performed using Levels in Photoshop. Resized at 720 dpi vertical, then compressed using Photoshop setting of 80 into JPG format for Web display.</format>
48 What to export and what not
- b. Information about shelf, box, and folder number of item?
- <dc:source>99</dc:source>
- <dc:source>1</dc:source>
- <dc:source>14</dc:source>
- <dc:source>5</dc:source>
49 What to export and what not
- c. Two publishers: which to export?
- Digital Publisher: Electronically reproduced by the Digital Services unit of the University of Central Florida Libraries, Orlando, 2005.
- Publisher: Students of Rollins College.
- <publisher>Students of Rollins College.</publisher>
- The Digital Publisher information is not mapped for export.
52 Lost in harvesting
55 What we have learned
- Native metadata records are rich in meaning in their own environment, but lose richness in the aggregated environment because of mapping errors and the misunderstanding and misuse of Dublin Core elements.
- Mapping is often based on the semantic meaning of metadata field names rather than on the value strings they actually contain.
- Correct mapping could improve metadata quality significantly.
56 CONTENTdm Collections
- Could be exposed via service providers in DC format
- Could be exposed via WorldCat in MARC format
- How can we provide good records to users in service provider environments?
57 Recommendations
- Create project-based best practices and a content standard
- Consider using field names that are useful globally
- Ensure that metadata creators receive proper training
- But first of all...
58 Use Qualified Dublin Core elements
59 Questions and comments
- Myung-ja (MJ) Han
- Metadata Librarian
- University of Illinois at Urbana-Champaign
- mhan3_at_uiuc.edu