Title: IFLA/DELOS/NSF Workshop: Standards and Metadata
1IFLA/DELOS/NSF Workshop: Standards and Metadata
- EVA 2000 Moscow, November 2, 2000
2Introductions
- Thomas Baker
- GMD Library, Bonn, Germany
- Dublin Core Executive Committee
- EU DELOS Network of Excellence
- Carl Lagoze
- Digital Library Research Group, Faculty of Computing and Information, Cornell University, Ithaca, NY, USA
- Dublin Core Advisory Committee
- NSF Digital Library Initiative
3Workshop Roadmap
- Introduction to Metadata (30 min.)
- Dublin Core Metadata Initiative (60 min.)
- Break
- Simplicity and Complexity (45 min.)
- Metadata Infrastructure (45 min.)
- Lunch
- Deploying and Using Metadata (90 min.)
- Metadata Landscape (30 min.)
4Introduction to Metadata
5Haven't we done metadata already?
6What's wrong with this model?
- Expensive
- Complex (even for its original goal?)
- Professional intervention (assumes single community of expertise)
- Monolithic
- One size fits all approach
- Reflects its centralized system origins
- Bias towards physical artifacts
- Fixed resources
- Incomplete handling of resource evolution and
other resource relationships
7Internet Commons includes Multiple Communities
8Web Challenge to Traditional Cataloging
- Scale
- Permanence
- Authenticity
- Organizational Context
- Variety
9State of the Web as an Information System
- Search systems are motivated by advertising
- Index coverage is unpredictable and limited (1/3)
- Too much recall, too little precision
- Index spam abounds
- Resources (and their names) are volatile
- What about versions, editions, back issues?
- Archiving is presently unsolved
- Authority and quality of service are spotty
- Managing Intellectual Property Rights is hard
10Metadata Part of a Solution
- Structured data about data
- helps to impose order on chaos
- enables automated discovery/manipulation
- Variety across several dimensions
- specialization
- decentralization
- democratization
11Metadata Takes Many Forms
12Metadata Challenges
- Accommodate multiple varieties of metadata
- Tension between functionality and simplicity
- Tension between extensibility and interoperability
- Human and machine creation and use
- Community-specific functionality, creation,
administration, access
13Warwick Framework: Containing Chaos
- Conceptual architecture for metadata from the Warwick Metadata Workshop (DC-2)
- Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata
- Provide context for metadata efforts (including Dublin Core)
- avoids the black hole of comprehensive element sets
- focuses interoperability issues at package level
14Modularization Allows Distributed Management
- Communities of expertise (not software vendors) are responsible for
- Semantics
- Registration
- Administration
- Access management
- Authority of data
- Sharing and Distribution
15Interoperability requires conventions about
- Semantics
- The meaning of the elements
- Structure
- human-readable
- machine-parseable
- Syntax
- grammars to convey semantics and structure
16Dublin Core Metadata Initiative
17History of the Dublin Core
- 1994: "Do we have a simple set of tags for ordinary people to describe their Web pages?"
- 1995: The Dublin Core: 13 elements, later 15
- 1996: The Dublin Core is but one of many vocabularies needed ("Warwick Framework")
- 1997: "WF needs formal expression in a Resource Description Framework (RDF)"
- 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core
18A pidgin for digital tourists
- Metadata is language.
- Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains.
- Speakers of different languages naturally "pidginize" to communicate
- E.g., tourists using simple phrases to order beer ("zwei Bier bitte", "dva pivo", "biru o san bai"...)
- We are all "tourists" on the global Internet.
19A grammar of Dublin Core
- http://www.dlib.org/dlib/october00/baker/10baker.html
- By design not as subtle as mother tongues, but easy to learn and extremely useful in practice
- Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)
- Simple grammars: sentences (statements) follow a simple fixed pattern...
20Example Dublin Core statements
- Resource has Title 'Grammar of Dublin Core'.
- Resource has Creator 'Tom Baker'.
- Resource has Subject 'Metadata'.
- Resource has Relation http://foo.org/file.htm.
21[Diagram: the fixed pattern of a Dublin Core statement]
Resource (implied subject) has (implied verb) property (one of 15 properties: DC:Creator, DC:Title, DC:Subject, DC:Date...) X (property value, an appropriate literal)
qualifiers (adjectives): two optional qualifiers
22The fifteen special nouns (properties)
23Resource has Subject "Languages -- Grammar" (qualifier LCSH)
Resource has Date "2000-06-13" (qualifiers Revised, ISO8601)
24Dumb-Down Principle for qualifiers
- The fifteen elements should be usable and understandable with or without the qualifiers
- Like saying that nouns can stand on their own without adjectives
- If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
25To test whether qualifiers are "good", cover them with your hand and ask -- Does the statement still make sense? -- Is it still correct?
Resource has Subject "Languages -- Grammar" (qualifier LCSH)
Resource has Date "2000-06-13" (qualifiers Revised, ISO8601)
26Element Refinements
- Make the meaning of an element narrower or more specific.
- a Date Created versus a Date Modified
- an IsReplacedBy Relation versus a Replaces Relation
- If your software does not understand the qualifier, you can safely ignore it.
27Value Encoding Schemes
- Says that the value is
- a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
- a string formatted in a standard way (e.g., "2000-05-03" means May 3, not March 5)
- Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
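The Dumb-Down Principle and the two kinds of qualifiers (element refinements and value encoding schemes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Statement` class, the `dumb_down` function, and the idea of passing in a set of known qualifiers are hypothetical names chosen here, not part of any DCMI specification.

```python
from dataclasses import dataclass
from typing import Optional

# The fifteen Dublin Core elements (properties).
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}

@dataclass(frozen=True)
class Statement:
    element: str                      # one of the fifteen elements
    value: str                        # an "appropriate" literal value
    refinement: Optional[str] = None  # element refinement, e.g. "Revised"
    scheme: Optional[str] = None      # value encoding scheme, e.g. "LCSH"

def dumb_down(stmt: Statement, known_qualifiers: set) -> Statement:
    """Drop qualifiers the software does not recognize; keep element and value."""
    if stmt.element not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {stmt.element}")
    return Statement(
        element=stmt.element,
        value=stmt.value,
        refinement=stmt.refinement if stmt.refinement in known_qualifiers else None,
        scheme=stmt.scheme if stmt.scheme in known_qualifiers else None,
    )

qualified = Statement("Date", "2000-06-13", refinement="Revised", scheme="ISO8601")
# This hypothetical application knows ISO8601 but not the "Revised" refinement.
simple = dumb_down(qualified, known_qualifiers={"ISO8601"})
print(simple)
```

The statement that survives is still a correct, if less specific, statement about the resource, which is the test the "cover them with your hand" rule describes.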
28Peer review of proposals for new terms
- DCMI Usage Committee reviews proposals for new qualifiers (and perhaps elements)
- Evaluates proposals in light of grammatical principles (are the qualifiers ignorable?)
- Tiered model of approval status (tentative): proposed, conforming, recommended, obsolete
- First qualifiers "recommended" in July 2000
- http://purl.org/DC/documents/rec/dcmes-qualifiers-20000711.htm
29A not-so-good example
Resource has Creator
"Last.name Smith
First.name John
Type Person
Affiliation IBM"
30Open questions in Dublin Core
- What are "appropriate values" for the fifteen properties? How can they be used for cross-domain searching?
- How can DCMI control the evolution of Dublin Core as it is adapted in practice?
- How can an application use DC as a pidgin while describing resources with more complex metadata?
- Can we keep the Core simple?
31Search buckets versus description
- Think of DC elements as fuzzy search buckets
- Different types of data appropriate for different buckets: URLs, date strings, word strings, names
- Separate books about Sigmund Freud versus books by Sigmund Freud into different buckets
- Search bucket for discovering resources
- But general, fuzzy categories may not be sufficient for describing resources
- After searching, display more detailed descriptions on screen
32DCMI broadens its mission (Oct 2000)
- The mission of the DCMI is to make it easier to find resources using the Internet through the following activities
- Developing metadata standards for discovery across domains (example: the Dublin Core)
- Defining frameworks for the interoperation of metadata sets
- Facilitating the development of community or disciplinary specific metadata sets that are consistent with items 1 and 2
33A context for the Core
- If "the Dublin Core" is the core of DCMI, what is the surrounding context?
- If "the Dublin Core" is the simple pidgin, what is the broader landscape of metadata language?
- How do pidgins relate to more complex models or "application profiles"?
- Do we need pidgins for describing other things, such as "people" and "events"?
34Using DC with other vocabularies
- Specialized application profiles (government information, education, mathematics) may need to
- Use general-purpose Dublin Core elements
- Use elements from another, more domain-specific standard
- Narrow standard definitions of DC elements for specific local uses
- Invent local elements outside the scope of existing standards
35Namespaces versus Profiles
- Namespaces declare terms and definitions
- Dublin Core namespace: the Dublin Core standard
- Application profiles (only) re-use terms from namespaces
- May package terms from multiple namespaces
- May adapt definitions to local purposes
- All terms must be defined in namespaces
36Adapting standard definitions to local uses
- Dublin Core Namespace
- DC:Title - machine-readable name of an element
- "Title: A name given to the resource" -- human-readable name and definition
- Collection Description Profile (UKOLN)
- DC:Title - name reused from the DC namespace
- "Title: A name given to the collection"
- Definition is modified for the application context
37Example: adapting DC:Title to local uses
- As defined in the official Dublin Core "namespace"
- "Title: A name given to the resource"
- As defined in a UK "application profile"
- "Title: A name given to the collection"
- Definition is narrower
38Profiles may model multiple entities
- "Resource" (a thing) as an entity with its own
- Title (dc:title)
- Date created (dc:date dcq:created)
- Identifier (dc:identifier)
- "Agent" (a person) with its own
- Name (vcard:fn)
- Date of birth (vcard:bday)
- Identifier (dc:identifier)
39Namespaces in translation
- Dublin Core has been translated into 26 languages
- machine-readable tokens are shared by all
- human-readable labels are defined in different languages
- translations are distributed, maintained in many countries
40One token - labels in many languages
dc:creator
Server in Germany
DCMI Server
Server in Jakarta
41RDF -- a more powerful sentence pattern
- Dublin Core statements
- Resource has Creator "Tom Baker".
- Resource has Identifier http://foo.org/bar.html.
- Resource Description Framework "triples" - a more powerful way to say the same thing
- http://foo.org/bar.html has Creator "Tom Baker".
42[Diagram: the fixed pattern of a Dublin Core statement]
Resource (implied subject) has (implied verb) property (one of 15 properties: DC:Creator, DC:Title, DC:Subject, DC:Date...) X (property value, an appropriate literal)
qualifiers (adjectives): two optional qualifiers
43[Diagram: the pattern of an RDF statement]
Resource (explicit subject) has (implied verb) Property (predicate, from any vocabulary) "X" (object, also known as "property value": a literal or another resource)
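The step from Dublin Core statements to RDF triples amounts to making the implied subject explicit. A minimal Python sketch of that idea follows; the `Triple` type, the `to_triples` helper, and the `dc:` prefixed property names are illustrative assumptions, not an RDF library API.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # explicit subject: the resource being described
    predicate: str  # property, from any vocabulary
    obj: str        # property value: a literal or another resource

def to_triples(resource_uri, statements):
    """Make the implied subject of each Dublin Core statement explicit."""
    return [Triple(resource_uri, prop, value) for prop, value in statements]

# Statements with an implied subject ("Resource has Creator ...").
dc_statements = [
    ("dc:creator", "Tom Baker"),
    ("dc:title", "Grammar of Dublin Core"),
]
triples = to_triples("http://foo.org/bar.html", dc_statements)
for t in triples:
    print(f'{t.subject} has {t.predicate} "{t.obj}".')
```

Because the object slot can hold another resource's URI as well as a literal, the same three-part pattern can chain statements together, which the flat Dublin Core pattern cannot.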
44DCMI Re-organization
- Expanded mission
- Core metadata elements for Agents (or Events)?
- Frameworks for integrating multiple standards
- Re-organization model
- Membership organization like W3C or Unicode Consortium?
- Retain open consensus model
- International perspective
- Better training, documentation, outreach
45DCMI Open Metadata Registry
- Managing vocabularies defined by the DCMI
- Languages
- Versioning
- Controlled vocabularies
- Foundation for modular, incremental integration and evolution
- Collaboration with European SCHEMAS Project and ULIS in Tsukuba, Japan
- http://wip.dublincore.org/registry/
46Official recognition of the Dublin Core
- CEN Workshop Agreement
- endorse Dublin Core elements as CWA 13874
- provide usage guidelines for European industry
- NISO Z39.85
- National Information Standards Organization, an ANSI affiliate
- Balloting concluded in August 2000
47DCMI Activities
- Standards development and maintenance
- Metadata registry
- Technical working groups and periodic workshops
- Tutorial materials and user guides
- Education and training
- Access to software
- Liaisons with other standards or user communities
48DC-9 Workshop in Tokyo, 2001
- DC-8 Workshop was at the National Library of Canada (Ottawa)
- emphasis on application profiles, longer-term organizational mission, and domain-specific adaptations of Dublin Core
- DC-9 in Tokyo: well-defined tracks
- implementation reports and research papers
- ongoing technical working group meetings
- general introduction and tutorials for non-experts
49Simplicity and Complexity
50Warwick Framework
- Container/Package approach to metadata
- Rejection of universal ontology
- Recognition of individual community needs
- Provide scope for metadata efforts
51Warwick Framework Design
Container
Package Dublin Core
- Containers for aggregating Packages of typed
metadata sets
Package MARC Metadata
URI
Package Terms and Conditions
Package Indirect Reference
52Warwick Framework: Implementation and Research
- Packaging, linking, storing, and transmitting component/package framework
- Semantic interactions and interoperability among multiple metadata packages/vocabularies
53Interoperability among Metadata Vocabularies
- projections to application-specific metadata
vocabularies
abc core classes
54Harmony Project
- Project Investigators
- Dan Brickley - ILRT, Bristol (U.K.)
- Jane Hunter - DSTC, Brisbane (Australia)
- Carl Lagoze - Computer Science, Cornell (U.S.)
- More Information
- http://www.ilrt.bris.ac.uk/discovery/harmony/
55Attribute/Value approaches to metadata
The playwright of Hamlet was Shakespeare
Hamlet has a creator: Shakespeare
56run into problems for richer descriptions
The playwright of Hamlet was Shakespeare, who was born in Stratford
Hamlet has a creator: Shakespeare
57because of their failure to model entity distinctions
[Diagram: two resources, R1 and R2, joined by a creator link; the other arcs are title "Hamlet", name "Shakespeare", and birthplace "Stratford"]
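The difference between a flat attribute/value record and a description that models distinct entities can be shown with plain Python dictionaries. This is a sketch only: it assumes R1 stands for the work and R2 for the person, and the `birthplace_of_creator` helper is a hypothetical name.

```python
# Flat attribute/value record: the value of "creator" is just a string,
# so there is nowhere to attach a fact about Shakespeare himself.
flat = {"title": "Hamlet", "creator": "Shakespeare"}

# Entity-based description: R1 and R2 are distinct resources,
# and "creator" links one to the other.
graph = {
    "R1": {"title": "Hamlet", "creator": "R2"},
    "R2": {"name": "Shakespeare", "birthplace": "Stratford"},
}

def birthplace_of_creator(graph, work_id):
    """Follow the creator link from a work to the agent, then read birthplace."""
    creator_id = graph[work_id]["creator"]
    return graph[creator_id].get("birthplace")

print(birthplace_of_creator(graph, "R1"))
```

In the flat record, adding a `birthplace` key would wrongly describe the play rather than its author; in the entity model the property has an unambiguous home.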
58Applying a Model-Centric Approach
- Formally define common entities and relationships underlying multiple metadata vocabularies
- Describe them (and their inter-relationships) in a simple logical model
- Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
59Applications of the ABC Model
- Guidance for communities developing vocabularies
- Foundation for understanding existing vocabularies
- Basis for mappings among vocabularies using formalisms such as RDF
60Harmony/ABC Workshop
- January 27-28, 2000, CNI, Washington
- Representatives from
- Dublin Core, INDECS, MPEG-7, IFLA
- Archives, Museums, Libraries, Audiovisual
- Result: importance of processes, events, and states in understanding and describing resources
61Conceptual Basis: Evolution of Content over Time
IFLA Entity Model
From Bearman et al., D-Lib Magazine, January 1999.
62Events help metadata relationships?
- Recognizing inherent lifecycle aspects of digital content: transformation of input resources to output resources and of their descriptions (e.g., IFLA model).
- Modeling implied events as first-class objects provides attachment points for common entities, e.g., agents, contexts (times, places), roles.
- Clarifying attachment points facilitates mapping across common entities in different vocabularies.
63Content, Events, Descriptions
64ABC Event Model
65A Simple Example: Live At Lincoln Performance
- Performance at The Lincoln Center for the Performing Arts
- On April 7, 1998 at 8pm Eastern time
- Orchestra is New York Philharmonic
- Musical score: Concerto for Violin
- 130 minute MP3 audio recording
- Rights held by Lincoln Center
66Example in ABC Model
67<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context>
      <Date>7/4/98</Date>
      <Time>2000</Time>
      <Place>Lincoln Centre</Place>
    </Context>
    <Act id="Act1">
      <Agent>New York Philharmonic</Agent>
      <Role>Orchestra</Role>
    </Act>
    <Input id="comp523"/>
    <Output id="audio8215"/>
    <Rights>Lincoln Center for Performing Arts</Rights>
  </Event>
</ABC>
68Derivation of Multiple Views
Dublin Core in XML/RDF
ABC Description in XML
ID3 tags embedded in MP3
MPEG-7 description in DDL
69Step 1: Structural Mapping
Event-aware model
Resource-centric model
70Structural Mapping Rules
- Event attributes transferred to output
- Context/Date, /Time, /Place -> Date.Performance, Time.Performance, Place.Performance
- Act/Role -> Agent.Role, e.g. Orchestra
- Event Type -> Relation between input and output
- e.g. Performance -> Relation.isPerformanceOf
- Output Description generated from event Type and input Title, e.g. Performance of Concerto for Violin
71<Resource id="audio8215">
  <Title>Live At Lincoln Center</Title>
  <Date.Performance>1998-04-07</Date.Performance>
  <Time.Performance>2000</Time.Performance>
  <Place.Performance>Lincoln Centre</Place.Performance>
  <Agent.Orchestra>New York Philharmonic</Agent.Orchestra>
  <Relation.isPerformanceOf>comp523</Relation.isPerformanceOf>
  <Description>Performance of 'Concerto for Violin'</Description>
  <Rights>Lincoln Center for Performing Arts</Rights>
  <Type>audio</Type>
  <Format>MP3</Format>
  <Length units="mins">130</Length>
</Resource>
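The structural mapping rules can be sketched as a small Python function that reads the event-aware ABC description and produces a flat, resource-centric record for the event's output. This is an illustration, not the Harmony project's implementation (which the slides describe in terms of XSLT). The function name `event_to_resource` is hypothetical, and the sketch assumes the source date 7/4/98 is written day/month/year, matching the April 7, 1998 performance.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ABC_XML = """
<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context>
      <Date>7/4/98</Date>
      <Time>2000</Time>
      <Place>Lincoln Centre</Place>
    </Context>
    <Act id="Act1">
      <Agent>New York Philharmonic</Agent>
      <Role>Orchestra</Role>
    </Act>
    <Input id="comp523"/>
    <Output id="audio8215"/>
    <Rights>Lincoln Center for Performing Arts</Rights>
  </Event>
</ABC>
"""

def event_to_resource(abc_xml, input_title):
    """Transfer event attributes onto the event's output resource."""
    event = ET.fromstring(abc_xml).find("Event")
    kind = event.get("Type")                      # e.g. "Performance"
    ctx = event.find("Context")
    # Assumption: the source date is day/month/two-digit year.
    date = datetime.strptime(ctx.findtext("Date"), "%d/%m/%y").date()
    record = {
        "id": event.find("Output").get("id"),
        "Title": event.findtext("Title"),
        f"Date.{kind}": date.isoformat(),
        f"Time.{kind}": ctx.findtext("Time"),
        f"Place.{kind}": ctx.findtext("Place"),
        f"Relation.is{kind}Of": event.find("Input").get("id"),
        "Description": f"{kind} of '{input_title}'",
        "Rights": event.findtext("Rights"),
    }
    for act in event.findall("Act"):              # Act/Role -> Agent.Role
        record[f"Agent.{act.findtext('Role')}"] = act.findtext("Agent")
    return record

# The input title comes from the description of comp523, supplied by hand here.
resource = event_to_resource(ABC_XML, "Concerto for Violin")
for key, value in resource.items():
    print(f"{key}: {value}")
```

Type, Format, and Length are properties of the output resource itself rather than of the event, so this sketch leaves them out.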
72Step 2: Semantic Mapping
73XSLT for Transformations
- Works well for structural and syntactic mapping between metadata descriptions
- Semantic mappings need to be hardcoded
- Unsuitable for loosely constrained or variable input
74A More General Solution
- Flexible semantic mappings require additional knowledge
- Metadata term ontology: MetaNet
- Methods for using that context knowledge for mapping
- Some combination of procedural language (Java) and XSLT
- Investigating more general mapping rule language (analogies to compiler technology)
75Planned Experimental Context
- CIMI Experiments
- Dublin Core for basic resource descriptions
- Richer descriptions derived from ABC model
- Mapping among descriptions
- Understanding relationship between ABC and CIDOC CRM
- Connecting with Recordkeeping Metadata Issue - SPIRT Project
76Metadata Infrastructure
77Metadata is language
- Metadata schemas are languages for making statements about resources
- Book has Title "Gone with the Wind".
- Web page has Publisher "Springer Verlag".
- Vocabulary terms (elements) are defined in standards like Dublin Core
- Metadata grammars constrain the statements and data models one can form
78But languages evolve with use
- Inevitably, languages resist stability
- People stretch official definitions
- Implementers misunderstand the intended meaning or use of elements
- Implementers coin local terms and extensions
- If the application does not fit the standard, the standard is often "customized" to fit the application
79Metadata languages are "multilingual"
- Metadata is not a spoken language
- The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages
- Standards may have dozens of translations
- Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?
80What metadata languages lack
- Comprehensive dictionaries
- Where can one get an overview of vocabulary terms used in metadata languages?
- A publication context for implementers
- Where can you see how they are using metadata?
- Standard grammars
- How do we understand the principles of metadata?
81Can we manage this evolution?
- How can we (scalably) monitor the usage of a language that is
- Never spoken?
- Rarely published in a way that can be harvested?
- How can dictionary editors help a metadata language evolve and grow in response to usage?
- How can this evolution occur across (human) languages?
82RDF Schemas (RDFS) -- W3C standard
- A dictionary format for metadata terms
- Simple XML format for terms and definitions
- Example "Title" (Dublin Core)
- Human-readable label and definition
- Title: A name given to the resource.
- Unique, machine-readable identifiers
- dc:title
- Support for cross-references
- between terms in related standards
- between local adaptations and related standards
83Print world versus the Web
- Traditional print world
- Standards are currently defined and published as paper documents or Web pages in HTML
- Metadata implementors rarely publish their local extensions and adaptations
- RDF Schemas (RDFS)
- Web-based publication format
- Explicit cross references from implementation schemas to the standards on which they are based
84EOR -- an RDF Schema Browser
- Harvests RDF Schemas
- Schemas distributed on multiple Web servers
- Creates huge database of schemas for searching
- Web interface functions as a "metadata browser"
- Click on cross-references between linked terms
- Downloadable as open source software
- http://eor.dublincore.org/index.html
- Authors Eric Miller (OCLC, RDF Working Group,
DCMI) and Tod Matola
85Hyperlink Metadata Terms over the Web
- Index of metadata terms searchable as one huge database
- Click on cross-references to follow term-to-term links between vocabularies
- Point-to-point, like the Web itself
- In 1992, Gopher located the right file within directory trees (but not points within the file)
- HTML enabled point-to-point links between documents
86"Editor" -- a MARC relator -- refines
"Contributor"
87Follow the link to MARC Relator Terms
88...the source of which looks like this
89...or to Contributor here, in English, French,
German
90Or view the schema of MyRDF itself...
91...itself an RDF schema like the others
92Registries can function as dictionaries
- Historically, dictionaries of English, French, etc. recorded variants, prescribed forms, and helped standardize (national) languages
- Metadata dictionaries can help metadata vocabularies evolve more like other human languages
- Not just top-down, like traditional standards
- Also bottom-up, in response to usage
93Dictionaries prescribe and describe
- Prescribe definitions and recommend usage
- Describe how terms are actually used
- Monitor usage through collecting examples
- Editors and usage boards must strike a balance
between prescription and description.
94SCHEMAS Project -- a Thin Registry
- http://www.schemas-forum.org, an EU Project
- Pointers to resources elsewhere (a "thin" registry or portal)
- Short descriptions of metadata standards activities
- Critical commentaries by domain experts
- Promote the publication of schemas (in RDF)
- Goal: help implementors discover how others (e.g. EU Projects) are using standards in order to harmonize usage
95DCMI -- a Thick Registry
- A thick registry stores official metadata element definitions in a central database or repository
- Managing a namespace (as a standards agency): publish qualifiers as available, with version control
- Managing translations of the standard in multiple languages
- Eventually
- User guide interface
- Support for standardisation processes (peer review)
- Downloadable input to software tools for generating, editing, validating DC metadata
96Dictionaries as a tool for harmonization
- Knowledge of how other projects are using standards will avoid "reinventing the wheel"
- To help information providers harmonize their schemas for improved access within domains
- Between countries (Nordic Metadata Project)
- Preprint repositories (Open Archives Initiative)
- Subject gateways (Renardus)
- Theses and dissertations (NDLTD)
- Mathematics and physics (MathNet, PhysNet)
97A global registry infrastructure?
- Analogously to HTML for text, RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web
- Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries
- Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?
98The scope of registries
- Anything "semantic" (terms and definitions) is potentially an RDF schema
- controlled vocabularies
- namespaces, application profiles, annotations
- the "schema" of the registry itself
- Application constraints can be modelled in XML Schemas
- "title is mandatory", "date must be after 1980"
- Will XML and RDF Schemas merge?
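The two example application constraints are a different kind of thing from term definitions: they are checks on records, not meanings of terms. The sketch below expresses them as a plain Python validator rather than as an XML Schema, purely to show what such constraints do. The `validate` function is a hypothetical name, and "after 1980" is read here as a year greater than 1980.

```python
def validate(record):
    """Return a list of constraint violations for one metadata record."""
    errors = []
    if not record.get("title"):
        errors.append("title is mandatory")
    date = record.get("date")
    # ISO 8601 dates begin with a four-digit year.
    if date is not None and int(date[:4]) <= 1980:
        errors.append("date must be after 1980")
    return errors

print(validate({"title": "Gone with the Wind", "date": "2000-10-23"}))
print(validate({"date": "1975-01-01"}))
```

A schema language states the same rules declaratively, so that any conforming validator can enforce them without application-specific code.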
99Deploying and Using Metadata
100Syntax Alternatives: HTML
- Advantages
- Simple mechanism: META tags embedded in content
- Widely deployed tools and knowledge
- Disadvantages
- Limited structural richness (won't support hierarchical, tree-structured data or entity distinctions).
- Limited formalisms (parsing and schema definition)
101Dublin Core in HTML
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
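Generating such META tags from a list of element/value pairs is straightforward, which is part of the appeal of the HTML syntax. The following Python sketch shows one way to do it; the `dc_meta_tags` function and its `schemes` parameter are hypothetical names, and values are escaped with the standard library's `html.escape` so that quotes in a title cannot break the attribute.

```python
from html import escape

def dc_meta_tags(fields, schemes=None):
    """Render (element, value) pairs as HTML META tags with a DC. prefix."""
    schemes = schemes or {}
    lines = ['<link rel="schema.DC" href="http://purl.org/dc">']
    for element, value in fields:
        scheme = schemes.get(element)
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        lines.append(
            f'<meta name="DC.{escape(element)}"{scheme_attr} '
            f'content="{escape(value)}">'
        )
    return "\n".join(lines)

markup = dc_meta_tags(
    [("Creator", "Carl Lagoze"), ("Date", "2000-10-23"), ("Format", "text/html")],
    schemes={"Date": "W3CDTF"},
)
print(markup)
```

The flat name/content pairs also show the limitation noted on the previous slide: there is no place in this syntax to nest a description of the creator inside the Creator value.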
102Syntax Alternatives: XML
- The standard for networked text and data
- Widespread tool support
- Parsers (DOM and SAX)
- Extensibility (namespaces)
- Type definition (XML Schema)
- Transformation and Rendering (XSLT)
- Rich linking semantics (XLink)
103XML Schema
- Rich XML-based language for expressing type semantics
- Replaces arcane and limited DTD (origin in SGML)
- Facilities
- Data typing (both complex and primitive)
- Constraints
- Defaults
104Dublin Core in XML
<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
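A record of this shape can be built and read back with a namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `build_record` helper is a hypothetical name, and the namespace URI is simply the one written on the slide, used here for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written on the slide; treat it as illustrative.
DC_NS = "http://www.openarchives.org/OAI/dc.xsd"
ET.register_namespace("dc", DC_NS)

def build_record(fields):
    """Build a <metadata> element with one dc: child per (name, value) pair."""
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root

record = build_record([
    ("creator", "Carl Lagoze"),
    ("title", "Accommodating Simplicity and Complexity in Metadata"),
    ("date", "2000-07-01"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Reading it back: look elements up by their namespace-qualified name.
parsed = ET.fromstring(xml_text)
print(parsed.findtext(f"{{{DC_NS}}}title"))
```

Looking elements up by namespace URI rather than by the `dc` prefix is the design point: the prefix is a local abbreviation, while the URI is what identifies the vocabulary.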
105Syntax Alternatives: RDF
- RDF (Resource Description Framework)
- The instantiation of the Warwick Framework on the Web
- Provides enabling technology for richly-structured metadata
- Rich data model supporting notions of distinct entities and properties
- Syntax expressed in XML
106RDF Components
- Formal data model
- Syntax for interchange of data
- Schema Type system (schema model)
107RDF Data Model
- Directed labeled graphs
- Model elements
- Resource
- Property
- Value
- Statement
- Containers
108RDF Model Primitives
Resource
Property
Value
109RDF Syntax Example
[Diagram: resource URI:R has Title "CIMI Presentation" and Creator "Eric Miller"]
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <dc:Creator>Eric Miller</dc:Creator>
  </Description>
</RDF>
110RDF Model Example 2
URIR
Title
CIMI Presentation
Creator
Eric Miller
111RDF Syntax Example 2
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <dc:Creator>
      <Description>
        <bib:Name>Eric Miller</bib:Name>
        <bib:Email>emiller_at_oclc.org</bib:Email>
        <bib:Aff resource="http://www.oclc.org"/>
      </Description>
    </dc:Creator>
  </Description>
</RDF>
112RDF Containers
- Permit the aggregation of several values for a property
- Express multiple aggregation semantics
- unordered
- sequential or priority order
- alternative
113RDF Schemas
- Declaration of vocabularies
- properties defined by a particular community
- characteristics of properties and/or constraints on corresponding values
- Schema Type System - Basic Types
- Property, Class, SubClassOf, Domain, Range
- Minimal (but extensible) at this time
- minimize significant clashes with typing system designed for XML Schema WG
- Expressible in the RDF model and syntax
114Relationships among vocabularies
dc:Creator
marc:100
ms:director
bib:Author
115Bringing it together
- RDF Metadata transmission
- Embedded (e.g. <META>), Transmitted with resource (HTTP), Trusted 3rd Party (HTTP GET)
- RDF Data Model
- Support consistent encoding, exchange and processing of metadata; critical when aggregating data from multiple sources
- RDF Schema
- Declare, define, reuse vocabularies
116Open Archives Initiative
http://www.openarchives.org
117History
- Increasing interest in alternative scholarly publishing solutions, e.g., LANL arXiv
- Facilitation through federation
- UPS Mtg., Santa Fe, October 1999
- Representatives of various ePrint, library, publishing communities
- Goal: definition of an interoperability framework among ePrint providers
118What is Interoperability?
- Naming?
- Handles
- Purls
- Metadata?
- MARC
- Dublin Core
- Document models?
- WebDAV
- Federated searching?
- Z39.50?
- DASL?
- Services and Protocols?
- Dienst
119Partitioning Interoperability
Mediator Services: Linking, Searching, Summarizing
Metadata Harvesting
Document Models
120The World According to OAI
Service Providers
Searching
Current Awareness
Summarization
harvesting
Data Providers
121UPS Meeting Results
- Establishment of Open Archives Initiative
- Loose coalition to experiment with interoperability solutions
- Santa Fe Convention
- Organizational and technical framework to support
metadata harvesting for ePrint archives
122Metadata Harvesting is not New
- Harvest Project (1992-1995)
- DARPA-funded
- Mike Schwartz (U. Colorado), Mic Bowman (Penn
State), Udi Manber (U. Arizona)
123Open Archives
- Political Agenda?
- Author self-archiving of E-Prints
- Mission to reformulate scholarly publishing framework
- Technical?
- Infrastructure to facilitate interoperability
across multiple domains
124Other communities of interest
- Cambridge digital library federation meetings
- research library community has many materials for which they'd like to expose metadata
- San Antonio OAI workshop
- librarians, publishers (some), others
125Technical Umbrella for Practical Interoperability
Metadata Harvesting
E-Print Archives
Reference Libraries
Publishers
that can be exploited by different communities
126Acting mission statement
Supply and promote an application-independent technical framework, a supportive infrastructure that empowers different scholarly communities to pursue their own interests in interoperability in the technical, legal, business, and organizational contexts that are appropriate to them.
-- Dan Greenstein, Director, DLF
127What does this REALLY Mean?
- Keep the bar low enough to make widespread adoption possible
- Provide enough back-doors to make true disruption possible (e.g., ePrint community)
- refine record notion to mandate full-content connection
- refine metadata to mandate linkage to full-content
128Organizational Stability
- Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation)
- Formation of steering committee
- first steps towards international involvement
129Framework for Partitioning Tasks
- Steering Committee
- policy guidance
- Technical Committee
- technical specifications
- Workshops
- public dissemination, feedback, community-building
130Ithaca Technical Meeting
- Input
- experiences gained with implementing and discussing the current SFc specs
- emerging interest for the application of SFc-concepts as a general interoperability framework in a scholarly environment
131Ithaca technical meeting
- Output
- guidelines for an in-depth revised technical spec to be issued early 2001
- stable for experimentation, not definitive
- minimize risk for early adopters
- maximize chances for future interoperability
across communities
132Components of OAI Model
underlying concepts
abstract principles
concrete implementation of principles
133OAI Underlying Concepts
managed archives (data providers)
records in an archive
open interface to archives
service providers
134Building on Underlying Concepts
abstract principles
implementation of principle
OAI harvesting protocol
identifiers
URIs (community schemes)
DC XML container (parallel sets)
acceptable use
Flow Control (usage restrictions)
(community specific)
135What is a record?
A record in an archive is a metadata record. The metadata record describes, and can contain an entry point to, full content.
136Metadata Interoperability Extensibility
We recognize that archives will use specific
metadata sets and formats that suit the needs of
their communities and the types of data they
handle. However, interoperability depends on a
shared format for exchanging metadata and
therefore archives should implement the basic
Open Archives Metadata Set.
137Metadata Solutions
- Adoption of unqualified Dublin Core Element Set as required metadata.
- Support for parallel metadata sets maintained
- EPMS (e-print community)
- Others
- Research library community
- Museum community
138Metadata XML Container
<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="http://...">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>
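A minimal sketch of how a service provider might read such a record container, using only the Python standard library. The namespace URI is an assumption: the slide elides it, so the standard Dublin Core element namespace is substituted here. The function name `parse_record` is hypothetical and not part of any OAI specification.

```python
import xml.etree.ElementTree as ET

# Assumption: the slide truncates the namespace URI; this is the standard
# Dublin Core element namespace, used here only for illustration.
DC_NS = "http://purl.org/dc/elements/1.1/"

RECORD_XML = f"""
<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="{DC_NS}">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>
"""


def parse_record(xml_text):
    """Return (header fields, Dublin Core fields) for one record container."""
    root = ET.fromstring(xml_text.strip())
    header = {child.tag: (child.text or "").strip() for child in root.find("header")}
    dc = {}
    for child in root.find("metadata"):
        # ElementTree expands prefixed names to "{namespace}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            local = child.tag.split("}", 1)[1]
            # Dublin Core elements are repeatable, so values are kept in lists.
            dc.setdefault(local, []).append((child.text or "").strip())
    return header, dc


header, dc = parse_record(RECORD_XML)
```

Keeping the header (identifier, datestamp) separate from the metadata payload mirrors the container layout on the slide: the header belongs to the archive, while the payload could be any of the parallel metadata sets.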
139Identifier Issues
- Basic identifier constraints based on URI specifications
- A key for requesting a record from a repository
- Key and metadata format ID uniquely identify a record
- Individual communities may develop URN registration schemes
140Identifier Solutions
full-identifier = oai:archive-identifier:record-identifier
example: oai:ncstrl:ncstrl.cornellcs/TR94-1418
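The three-part identifier can be sketched in code as follows. This assumes the parts are joined by colons (the separators were lost in the slide text) and that the record identifier is whatever follows the second colon. The names `FullIdentifier` and `parse_full_identifier` are hypothetical.

```python
from typing import NamedTuple


class FullIdentifier(NamedTuple):
    archive_id: str
    record_id: str

    def __str__(self):
        return f"oai:{self.archive_id}:{self.record_id}"


def parse_full_identifier(text):
    """Split 'oai:archive-identifier:record-identifier' into its parts."""
    # Split on the first two colons only, so that a record identifier
    # containing further colons is left intact.
    parts = text.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not a full identifier: {text!r}")
    return FullIdentifier(archive_id=parts[1], record_id=parts[2])


ident = parse_full_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418")
```

The archive identifier is the part a registration service would keep unique; the record identifier only needs to be unique within its archive.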
141Repositories, Identifiers, and Records
142Selective harvesting
- Recognized need for light-weight facility for selective harvesting
- By Date
- Sets
- A low-cost means of selective harvesting
- NOT a general tool for defining global categories
- Attribution of meanings to sets can be done within communities and in bilateral fashion
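As an illustration of the two selection criteria, here is a minimal sketch of date-based and set-based filtering on the archive side. The `Record` class, the `select` function, and all but the first identifier are hypothetical; set names such as "S1" carry no meaning beyond what a community agrees on.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    identifier: str
    datestamp: date
    sets: frozenset = frozenset()


def select(records, from_date=None, until_date=None, set_name=None):
    """Return the records that satisfy every criterion that was supplied."""
    chosen = []
    for rec in records:
        if from_date is not None and rec.datestamp < from_date:
            continue
        if until_date is not None and rec.datestamp > until_date:
            continue
        if set_name is not None and set_name not in rec.sets:
            continue
        chosen.append(rec)
    return chosen


# Illustrative data only; the second and third identifiers are made up.
ARCHIVE = [
    Record("oai:arXiv:hep/001001", date(1999, 12, 25), frozenset({"S1"})),
    Record("oai:arXiv:hep/001002", date(2000, 6, 1), frozenset({"S1", "S2"})),
    Record("oai:arXiv:hep/001003", date(2000, 9, 15), frozenset({"S2"})),
]

recent_s1 = select(ARCHIVE, from_date=date(2000, 1, 1), set_name="S1")
```

Both filters are cheap for an archive to support, which fits the slide's point that sets are a low-cost harvesting aid rather than a global classification scheme.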
143Protocol Solutions
- Normalized and Enhanced Verb Set
- GetRecord
- Identify
- ListIdentifiers
- ListMetadataFormats
- ListRecords
- ListSets
144Protocol Solutions
- CGI-script friendly syntax
- baseurl?verb=verbname&argname=argval...
- verbname is the name of the verb
- argname is the name of the attribute
- argval is the value of the attribute
- Example
- http://foo/blaz?verb=ListRecords&set=S1
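A minimal sketch of building such a request URL with the Python standard library. It assumes the usual query-string separators ("=" and "&"), and it spells the third verb "Identify" as in the published OAI protocol. The function name `build_request` is hypothetical.

```python
from urllib.parse import urlencode

# Verb set from the previous slide; "Identify" is spelled as in the
# published OAI protocol (an assumption relative to the slide text).
VERBS = {
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def build_request(base_url, verb, **args):
    """Return base_url?verb=...&argname=argval for a known verb."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    # The verb goes first, then the arguments in a stable (sorted) order.
    # urlencode also percent-escapes any characters that need it.
    query = urlencode([("verb", verb)] + sorted(args.items()))
    return f"{base_url}?{query}"


url = build_request("http://foo/blaz", "ListRecords", set="S1")
# url == "http://foo/blaz?verb=ListRecords&set=S1"
```

Because every request is a plain HTTP GET with key-value arguments, an archive can answer it with a simple CGI script, which is the design goal the slide names.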
145Registration Solutions
- Automation through
- On-line registration of
- Archive identifier (uniqueness enforcement)
- Base URL of the archive's OAI protocol implementation
- Identify verb that exposes archive characteristics
- Use of protocol for registration of metadata formats and validity checking
- Registration of service providers is still an open issue
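The uniqueness enforcement mentioned above can be sketched as a small in-memory registry. This is an illustration under stated assumptions, not a description of any actual OAI registration service: the class `ArchiveRegistry` and its methods are hypothetical, and the base URL is the placeholder used in the protocol example.

```python
class ArchiveRegistry:
    """Maps each archive identifier to the base URL of its protocol endpoint."""

    def __init__(self):
        self._base_urls = {}

    def register(self, archive_id, base_url):
        # Uniqueness enforcement: an identifier can be claimed only once.
        if archive_id in self._base_urls:
            raise ValueError(f"archive identifier already registered: {archive_id}")
        self._base_urls[archive_id] = base_url

    def base_url(self, archive_id):
        return self._base_urls[archive_id]


registry = ArchiveRegistry()
registry.register("arXiv", "http://foo/blaz")  # placeholder base URL
```

A real service would presumably also call the archive's identifying verb at the registered base URL before accepting the entry, which is the automation the slide has in mind.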
146Release Schedule
- October 15: normalized meeting notes distributed to meeting group
- November 1: beta specification to steering committee and limited distribution
- Early January: stabilization of specification and public meeting
147Metadata Landscape
148Conferences
- ACM Digital Libraries 2001, San Antonio, June 2001, http://www.dl00.org/
- European Conference on Digital Libraries, Darmstadt, Sep 2001, http://www.ecdl2001.org
- Asian Digital Library Conference, Seoul, December 2000, http://ADL2000.kaist.ac.kr
- Tenth International WWW Conference, Hong Kong, May 2001, http://www10.org
149NSF Digital Library Initiative
- Phase I (1994-1998): six large-scale testbeds involving research universities, industrial partners, and next-generation technologies
- Phase II (1999): expanded scope, smaller projects as well as large testbeds, emphasis on making accessible new types of content
150Distributed National Electronic Resource (UK)
- A managed environment for Internet access to scholarly journals and other materials relevant to higher education in the UK
- Uses international standards (e.g., Dublin Core)
- National purchase and licensing agreements for best value to UK education community
- eLib research funding since mid-1990s emphasized incremental improvement of standards and services
151Global Info (Germany)
- "The German Digital Library Project"
- Since 1996, integrating access to scientific information among libraries, publishers, learned societies, and individual scientists
- Emphasis on open standards (e.g., Dublin Core) and open-standard formats (e.g., XML, RDF, MPEG)
152European Union
- Fifth Framework Programme, 1998-2002
- several dozen projects with several countries each
- Digital Heritage, Cultural Content
- Interactive Electronic Publishing
- Multimedia Content and Tools
- DELOS Network of Excellence
- http://www.ercim.org/delos/
- Communication within European digital library research community and international networking
153MathNet
- German Mathematical Societies index math pre-prints and home pages of mathematicians
- Encourages use of Dublin-Core-based metadata by distributing a free metadata editor and by displaying hits "with metadata" separately from hits "without metadata"
- International Mathematical Union (IMU) planning international Web service based on German MathNet model
- Seeking international agreement on simple metadata profiles for types of math materials
154IMS Global Learning Consortium, Inc.
- Teachers seeking appropriate classroom materials on the Web may want to know:
- for which age-group?
- has it already been used successfully in classrooms?
- will it work on my equipment?
- IMS: rich descriptions of learning resources in a standard record format
155Federal Geographic Data Committee
- (US) FGDC Content Standard for Digital Geospatial Metadata: integrate access to resources about a particular area found in diverse repositories
- Government, education, and business needs
- Emergency management
- Integrated databases and comprehensive maps
- City planning
- Environmental control
156Visual Resources Association
- VRA Core Categories in a two-level model for describing objects such as paintings and buildings
- "Works" described separately from "images" of those works (One-to-One Principle)
- Conceptual clarity of One-to-One Principle implies more complex work-flow and processing for catalogers and software
157Nordic Metadata Project
- Cooperation between Scandinavian countries (since circa 1996)
- Pioneered idea of metadata-based distributed index across national boundaries
- NetLab (Lund University) maintains SAFARI, which harvests Dublin-Core-based metadata embedded in documents on Web servers
158Renardus Project (EU)
- http://www.konbib.nl/coop/reynard
- National libraries (Netherlands coordinates)
- NDR National Digital Resource in UK
- Die Deutsche Bibliothek
- Goal: integrated access to subject gateways in Europe
- High-level agreement on simple, Dublin-Core-based schema as common denominator
159Networked Digital Library of Theses and Dissertations (NDLTD)
- http://www.ndltd.org
- International consortium of projects putting dissertations online
- Difficult to agree on single unified metadata schema -- national, legal, and disciplinary requirements differ significantly
- NDLTD agreement on a small Dublin-Core-based set of metadata elements?
160CIDOC
- International Council of Museums object-oriented model (CIDOC) designed for describing multiple entities that may be:
- physical (e.g., museum objects)
- conceptual (e.g., works)
- temporal (e.g., historical periods)
- spatial (e.g., places)
- Implies an integrated information space of "encyclopedic" scope
161Rich Site Summary (RSS)
- Metadata for content syndication (news feeds)
- Used in developing media content portals
- Built on established vocabularies (DC), uses RDF syntax
- Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.
162Moving Picture Experts Group (MPEG)
- MPEG 4: encoding and interacting with audio-visual objects
- MPEG 7: multimedia content description interface for such objects
- MPEG 21: ambitious "umbrella" framework describing the infrastructure for delivering and consuming multimedia content
163More...
- INDECS - Uses an event-based model to describe intellectual property rights for commercial transactions
- DOI - Uses the INDECS framework with a Digital Object Identifier for content description and management of references between scientific, technical, and medical journals
- BSR - Basic Semantic Registry as a universal interlingua of concepts
- GILS - Government Information Locator Service
164...and more...
- PDS - Planetary Data System
- IEEE Learning Object Metadata - an elaborate, hierarchical scheme for describing multiple facets of educational material
- MARC 21 - Machine Readable Cataloging format and related vocabularies for libraries
- EPICS Data Dictionary, a subset of which -- ONIX -- describes books in a specific XML format (pushed by Amazon.com)
165For further information....
- "Metadata Watch Reports" of SCHEMAS Project, http://www.schemas-forum.org
- Critical overview (with expert commentary) on the metadata landscape as it evolves
- Related database of individual activity reports
- D-Lib Magazine, http://www.dlib.org/dlib/
- Ariadne, http://www.ariadne.ac.uk
166Why the Web won
- Tim Berners-Lee's original model was very simple, and it was easy to implement
- Real-world experience with simple HTML led iteratively to better understanding of priorities
- As with bicycles and airplanes, there was no "theory" for design -- design was perfected iteratively, starting simple
- Complex standards impose significant costs, especially if legacy data must be converted
167Learning from experience
- People are only human: even the most perfect language is always subject to interpretation
- By design, metadata languages must allow for innovation and evolution
- Physics and art history, Chinese and Finnish -- different languages will continue in real life
- Likewise, a diversity of metadata languages is inevitable
- Interoperability over "everything" can only be via a simple and general pidgin
168thomas.baker_at_gmd.de