Title: Metadata Modularization Concepts and Tools
1 Metadata Modularization: Concepts and Tools
- Carl Lagoze
- CS502
- 2001-03-14
2 Metadata
Structured data about data.
3 Why is Metadata important?
- Key to organizing, managing, preserving, and locating content and services in digital libraries
4 Why is Metadata difficult?
- Cost
- Interoperability
- Syntax
- Semantics
- Customizability
- Extensibility
- Distribution
- Integrity, Authenticity, Quality
- Human and Machine Factors
- Naming
5 Metadata Thoughts
- Metadata takes a variety of forms
  - descriptive cataloging
  - specialized
  - terms and conditions
  - administrative
  - content ratings
  - provenance
  - linkage
6 More Metadata Thoughts
- New metadata sets will continually evolve
- Many metadata sets are community-specific
- administration
- use
- Human and machine use
7 Dublin Core
- Metadata Set for Simple Resource Discovery
- 15 elements allowing simple descriptive sentences about document-like objects
  - Document has title Hamlet
  - Document has creator William Shakespeare
  - Document has subject love and anguish
8 The Dublin Core 15
- Title
- Creator
- Subject/Keywords
- Description
- Publisher
- Other Contributor
- Date
- Resource Type
- Format
- Resource Identifier
- Source
- Language
- Relation
- Coverage
- Rights Management
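The "simple descriptive sentences" idea can be sketched in Python, assuming a record is just a mapping from element names to lists of values. The lowercase identifier spellings (for example `contributor`, `type`, `identifier`, `rights`) and the helper names `DC_ELEMENTS`, `unknown_elements`, and `sentences` are illustrative assumptions, not part of the slides.

```python
# The 15 Dublin Core elements as short lowercase identifiers.
# These spellings are an assumption made for this sketch.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record):
    """Return the names in `record` that are not among the 15 elements."""
    return sorted(set(record) - DC_ELEMENTS)


def sentences(record):
    """Render each element/value pair as a simple descriptive sentence."""
    return [
        f"Document has {element} {value}"
        for element, values in record.items()
        for value in values
    ]


# The example statements from the Dublin Core slide.
hamlet = {
    "title": ["Hamlet"],
    "creator": ["William Shakespeare"],
    "subject": ["love and anguish"],
}

for line in sentences(hamlet):
    print(line)
```

Each value is kept in a list because an element may be repeated; a stricter record type would be a reasonable alternative.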
9 A Scope for the Dublin Core
- Increase or decrease number of elements?
- Structured or Unstructured value syntax?
- Accommodate community extensions?
10 Warwick Framework
- Provide context for Dublin Core effort
- Integrate multiple sets of metadata addressing issues of
  - individual integrity
  - distinct audiences
  - separate realms of responsibility and management
11 Warwick Framework Design
- Containers for aggregating
- Packages of typed metadata sets
- General principles
  - information hiding
  - only operation defined at container level returns sequence of contained packages
  - packages are opaque at the container level
  - access to package contents subject to terms and conditions
12 Package Types
- Simple metadata set
  - segregating distinct metadata into separate packages
- Recursive container
  - nesting semantically related metadata sets
- Indirect reference
  - allowing distribution and sharing of metadata sets
13 Metadata Container
Container
  Package: Dublin Core
  Package: MARC record
  Package: Indirect Reference (URI)
  Package: Terms and Conditions
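The container and package design can be sketched in Python as follows. This is one possible reading, not the Warwick Framework's defined interface: the class names `MetadataSet`, `IndirectReference`, and `Container`, the `add` helper used to build the example, the `describe` function, the URI `http://example.org/metadata/1`, and the nested "Content ratings" set are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class MetadataSet:
    """Simple package: one typed metadata set, opaque to the container."""
    type_name: str
    content: bytes  # never inspected at the container level


@dataclass
class IndirectReference:
    """Package that points to metadata held elsewhere."""
    uri: str


@dataclass
class Container:
    """Aggregates packages; a container can itself be a package."""
    _packages: List[object] = field(default_factory=list)

    def add(self, package: object) -> None:
        # Construction helper for this sketch only.
        self._packages.append(package)

    def packages(self) -> Iterator[object]:
        # The single read operation: the sequence of contained packages.
        return iter(self._packages)


def describe(package, depth=0):
    """List package kinds without looking inside any metadata set."""
    indent = "  " * depth
    if isinstance(package, Container):
        lines = [indent + "Container"]
        for inner in package.packages():
            lines.extend(describe(inner, depth + 1))
        return lines
    if isinstance(package, IndirectReference):
        return [f"{indent}Indirect reference -> {package.uri}"]
    return [f"{indent}Package: {package.type_name}"]


# Hypothetical data loosely mirroring the container diagram.
nested = Container()
nested.add(MetadataSet("Content ratings", b"opaque bytes"))

container = Container()
container.add(MetadataSet("Dublin Core", b"opaque bytes"))
container.add(MetadataSet("MARC record", b"opaque bytes"))
container.add(IndirectReference("http://example.org/metadata/1"))
container.add(MetadataSet("Terms and Conditions", b"opaque bytes"))
container.add(nested)

print("\n".join(describe(container)))
```

Keeping `content` as raw bytes is a deliberate choice here: it makes the information-hiding principle visible, since nothing at the container level can interpret a package's contents.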
14 Open Implementation Issues
- Data encoding
- Semantic interaction of overlapping sets
  - between semantically-related packages
  - between semantically distinct packages
- Type registry
15 Modeling and Encoding Metadata Components: XML Namespaces
- Prevent term clash
  - record?, creator?
- Establish concept spaces through URIs
xmlns:dc="http://purl.org/dc"
xmlns:abc="http://ilrt.ac.uk/abc"
<dc:creator>Herbert Van de Sompel</dc:creator>
<abc:organization>Cornell University</abc:organization>
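A short Python sketch shows how namespace-qualified names keep terms from different vocabularies apart. The namespace URIs and element names are taken from the slide's fragment as best they can be read; the `<record>` wrapper element is an assumption added only so the fragment parses as a complete document.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc"
ABC = "http://ilrt.ac.uk/abc"

# The <record> wrapper is hypothetical; the two child elements and the
# namespace declarations come from the slide.
DOC = f"""
<record xmlns:dc="{DC}" xmlns:abc="{ABC}">
  <dc:creator>Herbert Van de Sompel</dc:creator>
  <abc:organization>Cornell University</abc:organization>
</record>
""".strip()

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its URI, so a tag is identified by
# (namespace URI, local name) rather than by the bare word "creator".
qualified = {child.tag: child.text for child in root}
for tag, text in qualified.items():
    print(tag, "=", text)

# Lookups name the namespace explicitly through a prefix map.
creator = root.find("dc:creator", {"dc": DC})
print(creator.text)
```

Because the prefix is only shorthand for the URI, two communities can both define `creator` without clashing as long as their namespace URIs differ.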
16 Modeling and Encoding Metadata Components: RDF
- RDF (Resource Description Framework)
- The instantiation of the Warwick Framework on the Web
- Provides enabling technology for richly-structured metadata
- Rich data model supporting notions of distinct entities and properties
- Syntax expressed in XML
17 RDF Components
- Formal data model
- Syntax for interchange of data
- Schema Type system (schema model)
18 RDF Data Model
- Directed labeled graphs
- Model elements
  - Resource
  - Property
  - Value
  - Statement
  - Containers
19 RDF Model Primitives
Resource
Property
Value
20 RDF Syntax Example
[Figure: resource URI:R with property Title = "CIMI Presentation" and property Creator = "Eric Miller"]
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>
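The link between the XML syntax and the resource/property/value model can be sketched in Python. This is a toy reader for the flat pattern shown on the slide, not an RDF parser; the namespace URIs are as read from the slide, and the function name `statements` is an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax"
DC_NS = "http://purl.org/dc/elements/1.0/"

DOC = f"""
<RDF xmlns="{RDF_NS}" xmlns:dc="{DC_NS}">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>
""".strip()


def statements(xml_text):
    """Yield (resource, property, value) triples from flat Description blocks."""
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get("about")
        for prop in description:
            # prop.tag is the property's namespace-qualified name.
            yield (resource, prop.tag, prop.text.strip())


for triple in statements(DOC):
    print(triple)
```

Each child element of a `Description` becomes one statement about the resource named in `about`, which is the directed labeled arc in the graph view.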
21 RDF Model Example 2
[Figure: resource URI:R with property Title = "CIMI Presentation" and property Creator = "Eric Miller"]
22 RDF Syntax Example 2
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator>
      <Description>
        <bib:Name> Eric Miller </bib:Name>
        <bib:Email> emiller_at_oclc.org </bib:Email>
        <bib:Aff resource="http://www.oclc.org" />
      </Description>
    </dc:Creator>
  </Description>
</RDF>
23 RDF Containers
- Permit the aggregation of several values for a property
- Express multiple aggregation semantics
  - unordered
  - sequential or priority order
  - alternative
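The three aggregation semantics (named Bag, Seq, and Alt in RDF) can be illustrated in Python. This sketch only models how each kind of container is compared or consumed; the function names and sample values are hypothetical, and treating the first alternative as the fallback is an assumption of the sketch.

```python
from collections import Counter


def same_bag(a, b):
    """Unordered: two value lists match if they hold the same members."""
    return Counter(a) == Counter(b)


def same_seq(a, b):
    """Sequential: order is significant."""
    return list(a) == list(b)


def pick_alt(alternatives, acceptable):
    """Alternative: take the first listed value the consumer can use."""
    for value in alternatives:
        if value in acceptable:
            return value
    return alternatives[0]  # fall back to the first listed alternative


# Hypothetical values for one multi-valued property.
authors_a = ["Author A", "Author B"]
authors_b = ["Author B", "Author A"]

print(same_bag(authors_a, authors_b))   # same members, order ignored
print(same_seq(authors_a, authors_b))   # order matters
print(pick_alt(["en", "fr", "de"], {"fr", "de"}))
```

The same list of values therefore carries different meaning depending on which container type wraps it, which is why the type has to travel with the data.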
24 RDF Schemas
- Declaration of vocabularies
  - properties defined by a particular community
  - characteristics of properties and/or constraints on corresponding values
- Schema Type System: Basic Types
  - Property, Class, SubClassOf, Domain, Range
  - Minimal (but extensible) at this time
  - minimize significant clashes with typing system designed for XML Schema WG
- Expressible in the RDF model and syntax
25 Relationships among vocabularies
dc:Creator
marc:100
ms:director
bib:Author
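One way such relationships can be used is sketched below in Python. It assumes, since the slide shows only the four labels, that the three community-specific properties are declared as refinements of the Dublin Core creator property, in the manner of RDF Schema's subPropertyOf. The table `SUB_PROPERTY_OF`, the function names, and the sample statements are all hypothetical.

```python
# Assumed refinement relationships: specific property -> more general one.
SUB_PROPERTY_OF = {
    "marc:100": "dc:Creator",
    "ms:director": "dc:Creator",
    "bib:Author": "dc:Creator",
}


def refines(specific, general):
    """True if `specific` is `general` or refines it, directly or indirectly."""
    seen = set()
    current = specific
    while current is not None and current not in seen:
        if current == general:
            return True
        seen.add(current)
        current = SUB_PROPERTY_OF.get(current)
    return False


def values(statements, prop):
    """Collect values of `prop` and of every property that refines it."""
    return [value for (_, p, value) in statements if refines(p, prop)]


# Hypothetical statements drawn from three different vocabularies.
STATEMENTS = [
    ("doc:1", "marc:100", "Shakespeare, William"),
    ("doc:2", "ms:director", "Director X"),
    ("doc:3", "dc:Title", "Hamlet"),
]

print(values(STATEMENTS, "dc:Creator"))
```

A query phrased against the general property then finds records written in any of the refining vocabularies, while a query against a specific property stays narrow.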
26 Bringing it together
- RDF Data Model
  - Support consistent encoding, exchange and processing of metadata; critical when aggregating data from multiple sources
- RDF Schema
  - Declare, define, reuse vocabularies
- RDF Metadata transmission
  - XML encoding
27 Interoperability among Metadata Vocabularies
- projections to application-specific metadata vocabularies
core classes
28 Attribute/Value approaches to metadata...
The playwright of Hamlet was Shakespeare
Hamlet has a creator Shakespeare
29 ...run into problems for richer descriptions...
The playwright of Hamlet was Shakespeare, who was born in Stratford
Hamlet has a creator Shakespeare
30 ...because of their failure to model entity distinctions
[Figure: graph with resource nodes R1 and R2, values Hamlet, Shakespeare, and Stratford, and arcs labeled title, creator, name, and birthplace]
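The value of modeling distinct entities can be sketched in Python. The arrangement of the graph is an assumption inferred from the labels: R1 is taken to be the work (with a title and a creator) and R2 the person (with a name and a birthplace). The helper names are hypothetical.

```python
# Assumed reading of the figure: R1 is the work, R2 is the person.
TRIPLES = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject, prop):
    """Values of `prop` for `subject`."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == prop]


def subjects(prop, obj):
    """Resources whose `prop` has the value `obj`."""
    return [s for (s, p, o) in TRIPLES if p == prop and o == obj]


def creator_birthplaces(title):
    """Follow title -> work -> creator -> birthplace."""
    places = []
    for work in subjects("title", title):
        for person in objects(work, "creator"):
            places.extend(objects(person, "birthplace"))
    return places


print(creator_birthplaces("Hamlet"))
```

With a flat attribute/value record there is no node for the person, so "birthplace" would have nowhere to attach except the document itself; giving the creator its own identity removes that ambiguity.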
31 Understanding Metadata based on Query Capabilities
- Simple boolean tags?
- Agent, time, place questions?
- Who was responsible for what and when
32 Applying a Model-Centric Approach
- Formally define common entities and relationships underlying multiple metadata vocabularies
- Describe them (and their inter-relationships) in a simple logical model
- Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
33 Conceptual Basis: Evolution of Content over Time
IFLA Entity Model
From Bearman et al., D-Lib Magazine, January 1999.
34 Events are key to understanding metadata relationships?
- Recognizing inherent lifecycle aspects of digital content: transformation of input resources to output resources and of their descriptions (e.g., IFLA model)
- Modeling implied events as first-class objects provides attachment points for common entities, e.g., agents, contexts (times, places), roles.
- Clarifying attachment points facilitates mapping across common entities in different vocabularies.
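Treating events as first-class objects can be sketched in Python. This is an illustrative structure, not a model defined in the slides: the `Event` fields, the function names, and every sample value (resource names, agents, dates, place) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """A transformation of input resources into output resources."""
    kind: str
    inputs: List[str]
    outputs: List[str]
    agents: Dict[str, str] = field(default_factory=dict)  # role -> agent
    time: str = ""
    place: str = ""


def responsibility(events, resource):
    """Who was responsible for what, and when, in producing `resource`."""
    return [
        (agent, role, event.kind, event.time)
        for event in events
        if resource in event.outputs
        for role, agent in event.agents.items()
    ]


def lineage(events, resource):
    """Trace back through events to every resource this one derives from."""
    found = []
    for event in events:
        if resource in event.outputs:
            for source in event.inputs:
                found.append(source)
                found.extend(lineage(events, source))
    return found


# Hypothetical lifecycle of one piece of content.
EVENTS = [
    Event("creation", [], ["manuscript"], {"author": "Agent A"}, "1999-01"),
    Event("translation", ["manuscript"], ["translated text"],
          {"translator": "Agent B"}, "2000-06"),
    Event("digitization", ["translated text"], ["scanned image"],
          {"operator": "Agent C"}, "2001-03", "Ithaca"),
]

print(responsibility(EVENTS, "scanned image"))
print(lineage(EVENTS, "scanned image"))
```

Agents, roles, times, and places hang off the event rather than the resource, so vocabularies that name these things differently can be mapped at one well-defined attachment point.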
35 Content, Events, Descriptions
36 Museum Data