
1
IFLA/DELOS/NSF Workshop: Standards and Metadata
  • EVA 2000 Moscow, November 2, 2000

2
Introductions
  • Thomas Baker
  • GMD Library, Bonn, Germany
  • Dublin Core Executive Committee
  • EU DELOS Network of Excellence
  • Carl Lagoze
  • Digital Library Research Group, Faculty of
    Computing and Information, Cornell University,
    Ithaca, NY, USA
  • Dublin Core Advisory Committee
  • NSF Digital Library Initiative

3
Workshop Roadmap
  • Introduction to Metadata (30 min.)
  • Dublin Core Metadata Initiative (60 min.)
  • Break
  • Simplicity and Complexity (45 min.)
  • Metadata Infrastructure (45 min.)
  • Lunch
  • Deploying and Using Metadata (90 min.)
  • Metadata Landscape (30 min.)

4
Introduction to Metadata
5
Haven't we done metadata already?
6
What's wrong with this model?
  • Expensive
  • Complex (even for its original goal?)
  • Professional intervention (assumes single
    community of expertise)
  • Monolithic
  • One size fits all approach
  • Reflects its centralized system origins
  • Bias towards physical artifacts
  • Fixed resources
  • Incomplete handling of resource evolution and
    other resource relationships

7
Internet Commons includes Multiple Communities
8
Web Challenge to Traditional Cataloging
  • Scale
  • Permanence
  • Authenticity
  • Organizational Context
  • Variety

9
State of the Web as an Information System
  • Search systems are motivated by advertising
  • Index coverage is unpredictable and limited (1/3)
  • Too much recall, too little precision
  • Index spam abounds
  • Resources (and their names) are volatile
  • What about versions, editions, back issues?
  • Archiving is presently unsolved
  • Authority and quality of service are spotty
  • Managing Intellectual Property Rights is hard

10
Metadata Part of a Solution
  • Structured data about data
  • helps to impose order on chaos
  • enables automated discovery/manipulation
  • Variety across various dimensions
  • specialization
  • decentralization
  • democratization

11
Metadata Takes Many Forms
12
Metadata Challenges
  • Accommodate multiple varieties of metadata
  • Tension between functionality and simplicity
  • Tension between extensibility and interoperability
  • Human and machine creation and use
  • Community-specific functionality, creation,
    administration, access

13
Warwick Framework: Containing Chaos
  • Conceptual Architecture for metadata from the
    Warwick Metadata Workshop (DC-2)
  • Conceptual architecture to support the
    specification, collection, encoding, and exchange
    of modular metadata
  • Provide context for metadata efforts (including
    Dublin Core)
  • avoids the black-hole of comprehensive element
    sets
  • focuses interoperability issues at package level

14
Modularization Allows Distributed Management
  • Communities of expertise (not software vendors)
    are responsible for
  • Semantics
  • Registration
  • Administration
  • Access management
  • Authority of data
  • Sharing and Distribution

15
Interoperability requires conventions about
  • Semantics
  • The meaning of the elements
  • Structure
  • human-readable
  • machine-parseable
  • Syntax
  • grammars to convey semantics and structure

16
Dublin Core Metadata Initiative
17
History of the Dublin Core
  • 1994: "Do we have a simple set of tags for
    ordinary people to describe their Web pages?"
  • 1995: The Dublin Core: 13 elements, later 15
  • 1996: The Dublin Core is but one of many
    vocabularies needed ("Warwick Framework")
  • 1997: "WF needs formal expression in a Resource
    Description Framework (RDF)"
  • 2000: Dublin Core Metadata Initiative recommends
    qualifiers, broadens its organizational scope
    beyond the Core

18
A pidgin for digital tourists
  • Metadata is language.
  • Dublin Core is a small and simple language -- a
    pidgin -- for finding resources across domains.
  • Speakers of different languages naturally
    "pidginize" to communicate
  • E.g., tourists using simple phrases to order beer
    ("zwei Bier bitte" "dva pivo" "biru o san
    bai"...)
  • We are all "tourists" on the global Internet.

19
A grammar of Dublin Core
  • http://www.dlib.org/dlib/october00/baker/10baker.html
  • By design not as subtle as mother tongues, but
    easy to learn and extremely useful in practice
  • Pidgins: small vocabularies (Dublin Core: fifteen
    special nouns and lots of optional adjectives)
  • Simple grammars: sentences (statements) follow a
    simple fixed pattern...

20
Example Dublin Core statements
  • Resource has Title 'Grammar of Dublin Core'.
  • Resource has Creator 'Tom Baker'.
  • Resource has Subject 'Metadata'.
  • Resource has Relation http://foo.org/file.htm.

21
Diagram: Resource has property X
  • Implied subject: Resource
  • Implied verb: has
  • Property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...)
  • Property value: X (an appropriate literal)
  • Qualifiers (adjectives): optional qualifier, optional qualifier
22
The fifteen special nouns (properties)
23
  • Resource has Subject "Languages -- Grammar" (qualifier: LCSH)
  • Resource has Date "2000-06-13" (qualifiers: Revised, ISO8601)
24
Dumb-Down Principle for qualifiers
  • The fifteen elements should be usable and
    understandable with or without the qualifiers
  • Like saying that nouns can stand on their own
    without adjectives
  • If your software encounters an unfamiliar
    qualifier, look it up -- or just ignore it!
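
The Dumb-Down Principle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Statement` class, its field names, and the `dumb_down` function are hypothetical and are not part of any Dublin Core specification or tool.

```python
from dataclasses import dataclass
from typing import Optional

# The fifteen Dublin Core elements, written here as lowercase tokens.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


@dataclass(frozen=True)
class Statement:
    """Hypothetical model of 'Resource has <element> <value>'."""
    element: str                      # one of the fifteen elements
    value: str                        # the property value (a literal)
    refinement: Optional[str] = None  # optional qualifier, e.g. "revised"
    scheme: Optional[str] = None      # optional qualifier, e.g. "ISO8601"


def dumb_down(stmt: Statement) -> Statement:
    """Ignore both qualifiers; element and value must still make sense."""
    if stmt.element not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {stmt.element}")
    return Statement(stmt.element, stmt.value)


qualified = Statement("date", "2000-06-13", refinement="revised", scheme="ISO8601")
simple = dumb_down(qualified)
print(simple.element, simple.value)
```

Software that does not recognize "revised" or "ISO8601" can keep working with the unqualified statement, which is the point of the principle.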

25
To test whether qualifiers are "good", cover
them with your hand and ask: Does the
statement still make sense? Is it still
correct?
  • Resource has Subject "Languages -- Grammar" (qualifier: LCSH)
  • Resource has Date "2000-06-13" (qualifiers: Revised, ISO8601)
26
Element Refinements
  • Make the meaning of an element narrower or more
    specific.
  • a Date Created versus a Date Modified
  • an IsReplacedBy Relation versus a Replaces
    Relation
  • If your software does not understand the
    qualifier, you can safely ignore it.

27
Value Encoding Schemes
  • Says that the value is
  • a term from a controlled vocabulary (e.g.,
    Library of Congress Subject Headings)
  • a string formatted in a standard way (e.g.,
    "2000-05-03" means May 3, not March 5)
  • Even if a scheme is not known by software, the
    value should be "appropriate" and usable for
    resource discovery.
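
As a rough sketch of how software might treat an encoding scheme, the Python below parses a value when it knows the scheme and otherwise keeps the plain string. The `interpret` function and the scheme labels used as keys are assumptions made for illustration only.

```python
from datetime import date


def interpret(value, scheme=None):
    """Return a typed value for a known scheme, else the raw string.

    Only "ISO8601" calendar dates are handled in this sketch; every other
    scheme (for example "LCSH") falls through and the value stays usable
    as an ordinary string.
    """
    if scheme == "ISO8601":
        return date.fromisoformat(value)  # year-month-day order
    return value


parsed = interpret("2000-05-03", "ISO8601")
print(parsed.month, parsed.day)  # month first, then day
print(interpret("Languages -- Grammar", "LCSH"))
```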

28
Peer review of proposals for new terms
  • DCMI Usage Committee reviews proposals for new
    qualifiers (and perhaps elements)
  • Evaluates proposals in light of grammatical
    principles (are the qualifiers ignorable?)
  • Tiered model of approval status (tentative):
    proposed, conforming, recommended, obsolete
  • First qualifiers "recommended" in July 2000
  • http://purl.org/DC/documents/rec/dcmes-qualifiers-20000711.htm

29
A not-so-good example
  • Resource has Creator "Last.name Smith, First.name John,
    Type Person, Affiliation IBM"
30
Open questions in Dublin Core
  • What are "appropriate values" for the fifteen
    properties? How can they be used for
    cross-domain searching?
  • How can DCMI control the evolution of Dublin Core
    as it is adapted in practice?
  • How can an application use DC as a pidgin while
    describing resources with more complex metadata?
  • Can we keep the Core simple?

31
Search buckets versus description
  • Think of DC elements as fuzzy search buckets
  • Different types of data appropriate for different
    buckets: URLs, date strings, word strings, names
  • Separate books about Sigmund Freud versus books
    by Sigmund Freud into different buckets
  • Search bucket for discovering resources
  • But general, fuzzy categories may not be
    sufficient for describing resources
  • After searching, display more detailed
    descriptions on screen

32
DCMI broadens its mission (Oct 2000)
  • The mission of the DCMI is to make it easier to
    find resources using the Internet through the
    following activities:
  • Developing metadata standards for discovery
    across domains (example: the Dublin Core)
  • Defining frameworks for the interoperation of
    metadata sets
  • Facilitating the development of community- or
    discipline-specific metadata sets that are
    consistent with the first two activities

33
A context for the Core
  • If "the Dublin Core" is the core of DCMI, what is
    the surrounding context?
  • If "the Dublin Core" is the simple pidgin, what
    is the broader landscape of metadata language?
  • How do pidgins relate to more complex models or
    "application profiles"?
  • Do we need pidgins for describing other things,
    such as "people" and "events"?

34
Using DC with other vocabularies
  • Specialized application profiles (government
    information, education, mathematics) may need to
  • Use general-purpose Dublin Core elements
  • Use elements from another, more domain-specific
    standard
  • Narrow standard definitions of DC elements for
    specific local uses
  • Invent local elements outside the scope of
    existing standards

35
Namespaces versus Profiles
  • Namespaces declare terms and definitions
  • Dublin Core namespace: the Dublin Core standard
  • Application profiles (only) re-use terms from
    namespaces
  • May package terms from multiple namespaces
  • May adapt definitions to local purposes
  • All terms must be defined in namespaces
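
The distinction above can be sketched in Python. The data structures and function names are hypothetical, the `local:shelfmark` term is invented purely to show a violation, and only the two "Title" definitions come from these slides.

```python
# Namespaces declare terms and their definitions.
NAMESPACES = {
    "dc": {"title": "A name given to the resource"},
    "vcard": {"fn": "(definition omitted in this sketch)"},
}

# A profile only re-uses terms; a value of None means "definition unchanged".
PROFILE = {
    "dc:title": "A name given to the collection",  # adapted to local purposes
    "vcard:fn": None,                              # reused as declared
    "local:shelfmark": "invented term, declared in no namespace",
}


def undeclared_terms(profile, namespaces):
    """List profile terms that are not defined in any namespace."""
    missing = []
    for qname in profile:
        prefix, _, term = qname.partition(":")
        if term not in namespaces.get(prefix, {}):
            missing.append(qname)
    return missing


def definition(qname, profile, namespaces):
    """Prefer the profile's adapted definition, else the namespace's."""
    prefix, _, term = qname.partition(":")
    return profile.get(qname) or namespaces[prefix][term]


print(undeclared_terms(PROFILE, NAMESPACES))
print(definition("dc:title", PROFILE, NAMESPACES))
```

A profile that passes the first check packages terms from several namespaces without declaring any of its own.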

36
Adapting standard definitions to local uses
  • Dublin Core Namespace
  • DC:Title - machine-readable name of an element
  • "Title: A name given to the resource" --
    human-readable name and definition
  • Collection Description Profile (UKOLN)
  • DC:Title - name reused from the DC namespace
  • "Title: A name given to the collection"
  • Definition is modified for the application context

37
Example: adapting DC:Title to local uses
  • As defined in the official Dublin Core
    "namespace"
  • "Title: A name given to the resource"
  • As defined in a UK "application profile"
  • "Title: A name given to the collection"
  • Definition is narrower

38
Profiles may model multiple entities
  • "Resource" (a thing) as an entity with its own
  • Title (dc:title)
  • Date created (dc:date dcq:created)
  • Identifier (dc:identifier)
  • "Agent" (a person) with its own
  • Name (vcard:fn)
  • Date of birth (vcard:bday)
  • Identifier (dc:identifier)

39
Namespaces in translation
  • Dublin Core has been translated into 26 languages
  • machine-readable tokens are shared by all
  • human-readable labels are defined in different
    languages
  • translations are distributed, maintained in many
    countries

40
One token - labels in many languages
  • Shared token: dc:creator
  • Labels served by: a server in Germany, the DCMI server,
    a server in Jakarta
41
RDF -- a more powerful sentence pattern
  • Dublin Core statements
  • Resource has Creator "Tom Baker".
  • Resource has Identifier http://foo.org/bar.html.
  • Resource Description Framework "triples" - a more
    powerful way to say the same thing
  • http://foo.org/bar.html has Creator "Tom Baker".
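
A minimal Python sketch of the step from Dublin Core statements to triples follows. The `Triple` type and the `to_triples` function are hypothetical names, and the sketch assumes that the Identifier value is what becomes the explicit subject.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # explicit subject (a resource)
    predicate: str  # property, from any vocabulary
    obj: str        # property value: a literal or another resource


def to_triples(statements):
    """Turn 'Resource has P V' statements into explicit-subject triples.

    Assumes exactly one Identifier statement, whose value names the resource.
    """
    subject = dict(statements)["Identifier"]
    return [Triple(subject, p, v) for p, v in statements if p != "Identifier"]


dc_statements = [
    ("Creator", "Tom Baker"),
    ("Identifier", "http://foo.org/bar.html"),
]
for triple in to_triples(dc_statements):
    print(triple.subject, "has", triple.predicate, repr(triple.obj))
```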

42
Diagram: Resource has property X
  • Implied subject: Resource
  • Implied verb: has
  • Property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...)
  • Property value: X (an appropriate literal)
  • Qualifiers (adjectives): optional qualifier, optional qualifier
43
Diagram: Resource has Property "X"
  • Explicit subject: Resource
  • Predicate: implied verb "has" plus a property (from any vocabulary)
  • Object, also known as "property value": "X" (a literal
    -- or another resource)
44
DCMI Re-organization
  • Expanded mission
  • Core metadata elements for Agents (or Events)?
  • Frameworks for integrating multiple standards
  • Re-organization model
  • Membership organization like W3C or Unicode
    Consortium?
  • Retain open consensus model
  • International perspective
  • Better training, documentation, outreach

45
DCMI Open Metadata Registry
  • Managing vocabularies defined by the DCMI
  • Languages
  • Versioning
  • Controlled vocabularies
  • Foundation for modular, incremental integration
    and evolution
  • Collaboration with European SCHEMAS Project and
    ULIS in Tsukuba, Japan
  • http://wip.dublincore.org/registry/

46
Official recognition of the Dublin Core
  • CEN Workshop Agreement
  • endorse Dublin Core elements as CWA13874
  • provide usage guidelines for European industry
  • NISO Z39.85
  • National Information Standards Organization, an
    ANSI affiliate
  • Balloting concluded in August 2000

47
DCMI Activities
  • Standards development and maintenance
  • Metadata registry
  • Technical working groups and periodic workshops
  • Tutorial materials and user guides
  • Education and training
  • Access to software
  • Liaisons with other standards or user communities

48
DC-9 Workshop in Tokyo, 2001
  • DC-8 Workshop was at the National Library of Canada
    (Ottawa)
  • emphasis on application profiles, longer-term
    organizational mission, and domain-specific
    adaptations of Dublin Core
  • DC-9 in Tokyo: well-defined tracks
  • implementation reports and research papers
  • ongoing technical working group meetings
  • general introduction and tutorials for non-experts

49
Simplicity and Complexity
50
Warwick Framework
  • Container/Package approach to metadata
  • Rejection of universal ontology
  • Recognition of individual community needs
  • Provide scope for metadata efforts

51
Warwick Framework Design
  • Containers for aggregating Packages of typed
    metadata sets
  • Container
  • Package: Dublin Core
  • Package: MARC Metadata
  • Package: Terms and Conditions
  • Package: Indirect Reference (URI)
52
Warwick Framework: Implementation and Research
  • Packaging, linking, storing, and transmitting
    component/package framework
  • Semantic interactions and interoperability among
    multiple metadata packages/vocabularies

53
Interoperability among Metadata Vocabularies
  • ABC core classes
  • Projections to application-specific metadata
    vocabularies
54
Harmony Project
  • Project Investigators
  • Dan Brickley - ILRT, Bristol (U.K.)
  • Jane Hunter - DSTC, Brisbane (Australia)
  • Carl Lagoze - Computer Science, Cornell (U.S.)
  • More Information
  • http://www.ilrt.bris.ac.uk/discovery/harmony/

55
Attribute/Value approaches to metadata
  • "The playwright of Hamlet was Shakespeare"
  • Hamlet has a creator: Shakespeare
56
run into problems for richer descriptions
  • "The playwright of Hamlet was Shakespeare, who was
    born in Stratford"
  • Hamlet has a creator: Shakespeare
57
because of their failure to model entity distinctions
  • R1 has title "Hamlet"
  • R1 has creator R2
  • R2 has name "Shakespeare"
  • R2 has birthplace "Stratford"
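
The contrast between the flat attribute/value record and the entity model can be sketched in Python. The assignment of R1 to the play and R2 to the person is an assumption about the diagram, and the helper names are hypothetical.

```python
# Flat attribute/value record: every attribute describes the play itself,
# so a birthplace would wrongly become a property of "Hamlet".
flat = {"title": "Hamlet", "creator": "Shakespeare"}

# Entity model: R1 (the play) and R2 (the person) are distinct nodes.
graph = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def values(graph, subject, prop):
    """All values of one property on one entity."""
    return [o for s, p, o in graph if s == subject and p == prop]


def creator_birthplaces(graph, work):
    """Follow creator links, then read birthplace from each creator."""
    places = []
    for creator in values(graph, work, "creator"):
        places.extend(values(graph, creator, "birthplace"))
    return places


print(creator_birthplaces(graph, "R1"))
print(values(graph, "R1", "birthplace"))  # the play itself has no birthplace
```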
58
Applying a Model-Centric Approach
  • Formally define common entities and relationships
    underlying multiple metadata vocabularies
  • Describe them (and their inter-relationships) in
    a simple logical model
  • Provide the framework for extending these common
    semantics to domain and application-specific
    metadata vocabularies.

59
Applications of the ABC Model
  • Guidance for communities developing vocabularies
  • Foundation for understanding existing
    vocabularies
  • Basis for mappings among vocabularies using
    formalisms such as RDF

60
Harmony/ABC Workshop
  • January 27-28, 2000, CNI, Washington
  • Representatives from
  • Dublin Core, INDECS, MPEG-7, IFLA
  • Archives, Museums, Libraries, Audiovisual
  • Result: importance of processes, events, and
    states in understanding and describing resources

61
Conceptual Basis: Evolution of Content over Time
IFLA Entity Model
From Bearman et al., D-Lib Magazine, January 1999.
62
Events help metadata relationships?
  • Recognizing inherent lifecycle aspects of digital
    content - transformation of input resources to
    output resources and of their descriptions.
    (e.g., IFLA model)
  • Modeling implied events as first-class objects
    provides attachment points for common entities
    e.g., agents, contexts (times, places), roles.
  • Clarifying attachment points facilitates mapping
    across common entities in different vocabularies.

63
Content, Events, Descriptions
64
ABC Event Model
65
A Simple Example: Live At Lincoln Performance
  • Performance at The Lincoln Center for the
    Performing Arts
  • On April 7, 1998 at 8pm Eastern time
  • Orchestra is New York Philharmonic
  • Musical score: Concerto for Violin
  • 130 minute MP3 audio recording
  • Rights held by Lincoln Center

66
Example in ABC Model
67
<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context>
      <Date>7/4/98</Date>
      <Time>2000</Time>
      <Place>Lincoln Centre</Place>
    </Context>
    <Act id="Act1">
      <Agent>New York Philharmonic</Agent>
      <Role>Orchestra</Role>
    </Act>
    <Input id="comp523"/>
    <Output id="audio8215"/>
    <Rights>Lincoln Center for Performing Arts</Rights>
  </Event>
</ABC>
68
Derivation of Multiple Views
  • Dublin Core in XML/RDF
  • ABC Description in XML
  • ID3 tags embedded in MP3
  • MPEG-7 description in DDL
  • CIDOC CRM Model

69
Step 1: Structural Mapping
Event-aware model
Resource-centric model
70
Structural Mapping Rules
  • Event attributes transferred to output
  • Context/Date, /Time, /Place -> Date.Performance,
    Time.Performance, Place.Performance
  • Act/Role -> Agent.Role, e.g. Orchestra
  • Event Type -> Relation between input and output
  • e.g. Performance -> Relation.isPerformanceOf
  • Output Description generated from event Type and
    input Title, e.g. Performance of Concerto for
    Violin
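
The mapping rules above can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration, not the project's actual transformation (the slides discuss XSLT for that): the function name, the dictionary output, and the decision to copy date and time values verbatim without normalizing them are all assumptions.

```python
import xml.etree.ElementTree as ET

ABC_XML = """
<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context>
      <Date>7/4/98</Date>
      <Time>2000</Time>
      <Place>Lincoln Centre</Place>
    </Context>
    <Act id="Act1">
      <Agent>New York Philharmonic</Agent>
      <Role>Orchestra</Role>
    </Act>
    <Input id="comp523"/>
    <Output id="audio8215"/>
    <Rights>Lincoln Center for Performing Arts</Rights>
  </Event>
</ABC>
"""


def event_to_resource(abc_xml, input_title):
    """Flatten an event-aware description into a resource-centric record."""
    event = ET.fromstring(abc_xml.strip()).find("Event")
    etype = event.get("Type")
    # Event attributes are transferred to the output resource.
    record = {"id": event.find("Output").get("id")}
    record["Title"] = event.findtext("Title")
    # Context/Date, /Time, /Place -> Date.<Type>, Time.<Type>, Place.<Type>
    for child in event.find("Context"):
        record[f"{child.tag}.{etype}"] = child.text
    # Act/Role -> Agent.<Role>
    for act in event.findall("Act"):
        record[f"Agent.{act.findtext('Role')}"] = act.findtext("Agent")
    # Event Type -> relation between input and output
    record[f"Relation.is{etype}Of"] = event.find("Input").get("id")
    # Description generated from event Type and input Title
    record["Description"] = f"{etype} of '{input_title}'"
    record["Rights"] = event.findtext("Rights")
    return record


record = event_to_resource(ABC_XML, "Concerto for Violin")
for key, value in record.items():
    print(f"{key}: {value}")
```

The input title is passed in separately because the event description only carries the input's identifier, not its title.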

71
<Resource id="audio8215">
  <Title>Live At Lincoln Center</Title>
  <Date.Performance>1998-04-07</Date.Performance>
  <Time.Performance>20:00</Time.Performance>
  <Place.Performance>Lincoln Centre</Place.Performance>
  <Agent.Orchestra>New York Philharmonic</Agent.Orchestra>
  <Relation.isPerformanceOf>comp523</Relation.isPerformanceOf>
  <Description>Performance of 'Concerto for Violin'</Description>
  <Rights>Lincoln Center for Performing Arts</Rights>
  <Type>audio</Type>
  <Format>MP3</Format>
  <Length units="mins">130</Length>
</Resource>

72
Step 2: Semantic Mapping
73
XSLT for Transformations
  • Works well for structural and syntactic mapping
    between metadata descriptions
  • Semantic mappings need to be hardcoded
  • Unsuitable for loosely constrained or variable
    input

74
A More General Solution
  • Flexible semantic mappings require additional
    knowledge
  • Metadata Term Ontology: MetaNet
  • Methods for using that context knowledge for
    mapping
  • Some combination of procedural language (Java)
    and XSLT
  • Investigating more general mapping rule language
    (analogies to compiler technology)

75
Planned Experimental Context
  • CIMI Experiments
  • Dublin Core for basic resource descriptions
  • Richer descriptions derived from ABC model
  • Mapping among descriptions
  • Understanding relationship between ABC and CIDOC
    CRM
  • Connecting with Recordkeeping Metadata Issue -
    SPIRT Project

76
Metadata Infrastructure
77
Metadata is language
  • Metadata schemas are languages for making
    statements about resources
  • Book has Title "Gone with the Wind".
  • Web page has Publisher "Springer Verlag".
  • Vocabulary terms (elements) are defined in
    standards like Dublin Core
  • Metadata grammars constrain the statements and
    data models one can form

78
But languages evolve with use
  • Inevitably, languages resist stability
  • People stretch official definitions
  • Implementers misunderstand the intended meaning
    or use of elements
  • Implementors coin local terms and extensions
  • If the application does not fit the standard, the
    standard is often "customized" to fit the
    application

79
Metadata languages are "multilingual"
  • Metadata is not a spoken language
  • The words of metadata -- "elements" -- are
    symbols that stand for concepts expressible in
    multiple natural languages
  • Standards may have dozens of translations
  • Are concepts like "title", "author", or "subject"
    used the same way in English, Finnish, and Korean?

80
What metadata languages lack
  • Comprehensive dictionaries
  • Where can one get an overview of vocabulary terms
    used in metadata languages?
  • A publication context for implementers
  • Where can you see how they are using metadata?
  • Standard grammars
  • How do we understand the principles of metadata?

81
Can we manage this evolution?
  • How can we (scalably) monitor the usage of a
    language that is
  • Never spoken?
  • Rarely published in a way that can be harvested?
  • How can dictionary editors help a metadata
    language evolve and grow in response to usage?
  • How can this evolution occur across (human)
    languages?

82
RDF Schemas (RDFS) -- W3C standard
  • A dictionary format for metadata terms
  • Simple XML format for terms and definitions
  • Example: "Title" (Dublin Core)
  • Human-readable label and definition
  • Title: A name given to the resource.
  • Unique, machine-readable identifiers
  • dc:title
  • Support for cross-references
  • between terms in related standards
  • between local adaptations and related standards

83
Print world versus the Web
  • Traditional print world
  • Standards are currently defined and published as
    paper documents or Web pages in HTML
  • Metadata implementors rarely publish their local
    extensions and adaptations
  • RDF Schemas (RDFS)
  • Web-based publication format
  • Explicit cross references from implementation
    schemas to the standards on which they are based

84
EOR -- an RDF Schema Browser
  • Harvests RDF Schemas
  • Schemas distributed on multiple Web servers
  • Creates huge database of schemas for searching
  • Web interface functions as a "metadata browser"
  • Click on cross-references between linked terms
  • Downloadable as open source software
  • http://eor.dublincore.org/index.html
  • Authors Eric Miller (OCLC, RDF Working Group,
    DCMI) and Tod Matola

85
Hyperlink Metadata Terms over the Web
  • Index of metadata terms searchable as one huge
    database
  • Click on cross-references to follow term-to-term
    links between vocabularies
  • Point-to-point, like the Web itself
  • In 1992, Gopher located the right file within
    directory trees (but not points within the file)
  • HTML enabled point-to-point links between
    documents

86
"Editor" -- a MARC relator -- refines
"Contributor"
87
Follow the link to MARC Relator Terms
88
...the source of which looks like this
89
...or to Contributor here, in English, French,
German
90
Or view the schema of MyRDF itself...
91
...itself an RDF schema like the others
92
Registries can function as dictionaries
  • Historically, dictionaries of English, French,
    etc recorded variants, prescribed forms, and
    helped standardize (national) languages
  • Metadata dictionaries can help metadata
    vocabularies evolve more like other human
    languages
  • Not just top-down, like traditional standards
  • Also bottom-up, in response to usage

93
Dictionaries prescribe and describe
  • Prescribe definitions and recommend usage
  • Describe how terms are actually used
  • Monitor usage through collecting examples
  • Editors and usage boards must strike a balance
    between prescription and description.

94
SCHEMAS Project -- a Thin Registry
  • http://www.schemas-forum.org, an EU Project
  • Pointers to resources elsewhere (a "thin"
    registry or portal)
  • Short descriptions of metadata standards
    activities
  • Critical commentaries by domain experts
  • Promote the publication of schemas (in RDF)
  • Goal: help implementors discover how others (e.g.
    EU Projects) are using standards in order to
    harmonize usage

95
DCMI -- a Thick Registry
  • A thick registry stores official metadata
    element definitions in a central database or
    repository
  • Managing a namespace (as a standards agency):
    publish qualifiers as available, with version
    control
  • Managing translations of the standard in multiple
    languages
  • Eventually
  • User guide interface
  • Support for standardisation processes (peer
    review)
  • Downloadable input to software tools for
    generating, editing, validating DC metadata

96
Dictionaries as a tool for harmonization
  • Knowledge of how other projects are using
    standards will avoid "reinventing the wheel"
  • To help information providers harmonize their
    schemas for improved access within domains
  • Between countries (Nordic Metadata Project)
  • Preprint repositories (Open Archives Initiative)
  • Subject gateways (Renardus)
  • Theses and dissertations (NDLTD)
  • Mathematics and physics (MathNet, PhysNet)

97
A global registry infrastructure?
  • Analogously to HTML for text, RDF Schema format
    suggests a scalable ecology of metadata
    vocabularies on the Web
  • Sharing machine-readable elements translated into
    many languages suggests a global (multilingual)
    metadata language for digital libraries
  • Can a well-managed registry infrastructure allow
    this language to evolve -- with flexible
    innovation in usage alongside more stable
    standards?

98
The scope of registries
  • Anything "semantic" (terms and definitions) is
    potentially an RDF schema
  • controlled vocabularies
  • namespaces, application profiles, annotations
  • the "schema" of the registry itself
  • Application constraints can be modelled in XML
    Schemas
  • "title is mandatory", "date must be after 1980"
  • Will XML and RDF Schemas merge?
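
The two example constraints can be illustrated with a small Python check. This is a stand-in written for clarity, not XML Schema: the `check_record` function and the record layout are hypothetical, and the date is assumed to be an ISO 8601 calendar date.

```python
from datetime import date


def check_record(record):
    """Return the list of violated application constraints."""
    errors = []
    # Constraint 1: title is mandatory.
    if not record.get("title"):
        errors.append("title is mandatory")
    # Constraint 2: date (if present) must be after 1980.
    raw = record.get("date")
    if raw is not None and date.fromisoformat(raw).year <= 1980:
        errors.append("date must be after 1980")
    return errors


good = {"title": "Gone with the Wind", "date": "2000-10-23"}
bad = {"date": "1975-01-01"}
print(check_record(good))
print(check_record(bad))
```

Constraints like these say nothing about what "title" means; the meaning is the part that belongs in a schema of terms and definitions.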

99
Deploying and Using Metadata
100
Syntax Alternatives: HTML
  • Advantages
  • Simple mechanism: META tags embedded in content
  • Widely deployed tools and knowledge
  • Disadvantages
  • Limited structural richness (won't support
    hierarchical, tree-structured data or entity
    distinctions).
  • Limited formalisms (parsing and schema definition)

101
Dublin Core in HTML
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
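
Tags of this shape are simple to generate. The Python sketch below assumes a hypothetical `dc_meta_tags` helper that takes (element, content, scheme) tuples; it escapes attribute values with the standard `html.escape` function and follows the attribute order used on this slide.

```python
from html import escape


def dc_meta_tags(fields):
    """Render (element, content, scheme-or-None) tuples as META tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc">']
    for name, content, scheme in fields:
        # The scheme attribute is optional, like any qualifier.
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        lines.append(
            f'<meta name="DC.{escape(name)}"{scheme_attr} content="{escape(content)}">'
        )
    return "\n".join(lines)


tags = dc_meta_tags([
    ("Title", "Business Unusual", None),
    ("Creator", "Carl Lagoze", None),
    ("Date", "2000-10-23", "W3CDTF"),
])
print(tags)
```

The flat name/content pairs also show the limitation listed among the disadvantages: there is no place to nest a structured creator description.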
102
Syntax Alternatives: XML
  • The standard for networked text and data
  • Wide-spread tool support
  • Parsers (DOM and SAX)
  • Extensibility (namespaces)
  • Type definition (XML Schema)
  • Transformation and Rendering (XSLT)
  • Rich linking semantics (XLINK)

103
XML Schema
  • Rich XML-based language for expressing type
    semantics
  • Replaces arcane and limited DTD (origin in SGML)
  • Facilities
  • Data typing (both complex and primitive)
  • Constraints
  • Defaults

104
Dublin Core in XML
<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
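
A record like this can be read with an ordinary namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `dc_fields` helper is a hypothetical name, and the namespace URI is simply the one shown on this slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://www.openarchives.org/OAI/dc.xsd"  # URI as shown on the slide

RECORD = f"""
<metadata xmlns:dc="{DC_NS}">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
"""


def dc_fields(xml_text):
    """Map local element names to text for elements in the DC namespace."""
    root = ET.fromstring(xml_text.strip())
    prefix = "{" + DC_NS + "}"  # ElementTree writes tags as {namespace}local
    return {
        el.tag[len(prefix):]: el.text
        for el in root
        if el.tag.startswith(prefix)
    }


fields = dc_fields(RECORD)
print(fields["creator"], "/", fields["date"])
```

Elements from other namespaces are skipped rather than rejected, which fits the extensibility argument made for XML on the preceding slide.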
105
Syntax Alternatives: RDF
  • RDF (Resource Description Framework)
  • The instantiation of the Warwick Framework on the
    Web
  • Provides enabling technology for
    richly-structured metadata
  • Rich data model supporting notions of distinct
    entities and properties
  • Syntax expressed in XML

106
RDF Components
  • Formal data model
  • Syntax for interchange of data
  • Schema Type system (schema model)

107
RDF Data Model
  • Directed labeled graphs
  • Model elements
  • Resource
  • Property
  • Value
  • Statement
  • Containers

108
RDF Model Primitives
Resource
Property
Value
109
RDF Syntax Example
  • URI:R has Title "CIMI Presentation"
  • URI:R has Creator "Eric Miller"
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <dc:Creator>Eric Miller</dc:Creator>
  </Description>
</RDF>
110
RDF Model Example 2
  • URI:R has Title "CIMI Presentation"
  • URI:R has Creator "Eric Miller"
111
RDF Syntax Example 2
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name>Eric Miller</bib:Name>
        <bib:Email>emiller_at_oclc.org</bib:Email>
        <bib:Aff resource="http://www.oclc.org"/>
      </Description>
    </oa:Creator>
  </Description>
</RDF>
112
RDF Containers
  • Permit the aggregation of several values for a
    property
  • Express multiple aggregation semantics
  • unordered
  • sequential or priority order
  • alternative

113
RDF Schemas
  • Declaration of vocabularies
  • properties defined by a particular community
  • characteristics of properties and/or constraints
    on corresponding values
  • Schema Type System - Basic Types
  • Property, Class, SubClassOf, Domain, Range
  • Minimal (but extensible) at this time
  • minimize significant clashes with typing system
    designed for XML Schema WG
  • Expressible in the RDF model and syntax

114
Relationships among vocabularies
  • dc:Creator
  • marc:100
  • ms:director
  • bib:Author
115
Bringing it together
  • RDF Metadata transmission
  • Embedded (e.g. <META>), Transmitted with resource
    (HTTP), Trusted 3rd Party (HTTP GET)
  • RDF Data Model
  • Support consistent encoding, exchange and
    processing of metadata; critical when aggregating
    data from multiple sources
  • RDF Schema
  • Declare, define, reuse vocabularies

116
Open Archives Initiative
http://www.openarchives.org
117
History
  • Increasing interest in alternative scholarly
    publishing solutions, e.g., LANL arXiv
  • Facilitation through federation
  • UPS Meeting, Santa Fe, October 1999
  • Representatives of various ePrint, library, and
    publishing communities
  • Goal: definition of an interoperability framework
    among ePrint providers

118
What is Interoperability?
  • Naming?
  • Handles
  • Purls
  • Metadata?
  • MARC
  • Dublin Core
  • Document models?
  • WebDAV
  • Federated searching?
  • Z39.50?
  • DASL?
  • Services and Protocols?
  • Dienst

119
Partitioning Interoperability
  • Mediator Services: Linking, Searching, Summarizing
  • Metadata Harvesting
  • Document Models
120
The World According to OAI
  • Service Providers: Searching, Current Awareness,
    Summarization
  • Harvesting
  • Data Providers
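
The division of labor between data providers and service providers can be sketched in Python. This is a toy in-memory model and not the OAI harvesting protocol: every class, method, and identifier below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    identifier: str  # names the record (hypothetical identifier scheme)
    metadata: dict   # e.g. Dublin Core fields


class DataProvider:
    """A managed archive that exposes its metadata records."""

    def __init__(self, records):
        self._records = list(records)

    def list_records(self):
        return list(self._records)


class ServiceProvider:
    """Harvests metadata from many archives and builds a service on it."""

    def __init__(self):
        self.index = {}

    def harvest(self, provider):
        for record in provider.list_records():
            self.index[record.identifier] = record

    def search(self, word):
        word = word.lower()
        return sorted(
            ident for ident, rec in self.index.items()
            if word in rec.metadata.get("title", "").lower()
        )


archive_a = DataProvider([Record("example:a/1", {"title": "Metadata harvesting"})])
archive_b = DataProvider([
    Record("example:b/7", {"title": "Scholarly publishing"}),
    Record("example:b/8", {"title": "Harvesting e-prints"}),
])
service = ServiceProvider()
for provider in (archive_a, archive_b):
    service.harvest(provider)
hits = service.search("harvesting")
print(hits)
```

Searching runs against the harvested copies, so the archives only need to expose records, not run a search service of their own.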
121
UPS Meeting Results
  • Establishment of Open Archives Initiative
  • Loose coalition to experiment with
    interoperability solutions
  • Santa Fe Convention
  • Organizational and technical framework to support
    metadata harvesting for ePrint archives

122
Metadata Harvesting is not New
  • Harvest Project (1992-1995)
  • DARPA-funded
  • Mike Schwartz (U. Colorado), Mic Bowman (Penn
    State), Udi Manber (U. Arizona)

123
Open Archives
  • Political Agenda?
  • Author self-archiving of E-Prints
  • Mission to reformulate scholarly publishing
    framework
  • Technical?
  • Infrastructure to facilitate interoperability
    across multiple domains

124
Other communities of interest
  • Cambridge digital library federation meetings
  • research library community has many materials for
    which they'd like to expose metadata
  • San Antonio OAI workshop
  • librarians, publishers (some), others

125
Technical Umbrella for Practical Interoperability
  • Metadata Harvesting
  • E-Print Archives
  • Reference Libraries
  • Publishers
...that can be exploited by different communities
126
Acting mission statement
Supply and promote an application-independent
technical framework, a supportive infrastructure
that empowers different scholarly communities to
pursue their own interests in interoperability in
the technical, legal, business, and
organizational contexts that are appropriate to
them.
-- Dan Greenstein, Director, DLF
127
What does this REALLY Mean?
  • Keep the bar low enough to make widespread
    adoption possible
  • Provide enough back-doors to make true
    disruption possible (e.g., ePrint community)
  • refine record notion to mandate full-content
    connection
  • refine metadata to mandate linkage to full-content

128
Organizational Stability
  • Institutional backing of CNI (Coalition for
    Networked Information) and DLF (Digital Library
    Federation)
  • Formation of steering committee
  • first steps towards international involvement

129
Framework for Partitioning Tasks
  • Steering Committee
  • policy guidance
  • Technical Committee
  • technical specifications
  • Workshops
  • public dissemination, feedback, community-building

130
Ithaca Technical Meeting
  • Input
  • experiences gained with implementing and discussing
    the current SFc specs
  • emerging interest for the application of
    SFc-concepts as a general interoperability
    framework in a scholarly environment

131
Ithaca technical meeting
  • Output
  • guidelines for an in-depth revised technical spec
    to be issued early 2001
  • stable for experimentation, not definitive
  • minimize risk for early adopters
  • maximize chances for future interoperability
    across communities

132
Components of OAI Model
underlying concepts
abstract principles
concrete implementation of principles
133
OAI Underlying Concepts
managed archives (data providers)
records in an archive
open interface to archives
service providers
134
Building on Underlying Concepts
abstract principles
implementation of principle
OAI harvesting protocol
identifiers
URIs (community schemes)
DC XML container (parallel sets)
acceptable use
Flow Control (usage restrictions)
(community specific)
135
What is a record?
A record in an archive is a metadata record. The
metadata record describes, and can contain an
entry point to, full content.
136
Metadata Interoperability and Extensibility
We recognize that archives will use specific
metadata sets and formats that suit the needs of
their communities and the types of data they
handle. However, interoperability depends on a
shared format for exchanging metadata and
therefore archives should implement the basic
Open Archives Metadata Set.
137
Metadata Solutions
  • Adoption of unqualified Dublin Core Element Set
    as required metadata.
  • Support for parallel metadata sets maintained
  • EPMS (e-print community)
  • Others
  • Research library community
  • Museum community

138
Metadata XML Container
<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="http...">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>
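A minimal sketch of how a service provider might read such a container, using Python's standard library. The namespace URI below is the standard Dublin Core element namespace and is an assumption here, since the URL on the slide is truncated; the function name `parse_record` and the shape of the returned dictionary are illustrative only, not part of any OAI specification.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard Dublin Core element namespace
# (the namespace URL on the slide is cut off after "http").
DC_NS = "http://purl.org/dc/elements/1.1/"

RECORD_XML = f"""
<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="{DC_NS}">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>
"""


def parse_record(xml_text):
    """Split a record into its header fields and its Dublin Core elements."""
    root = ET.fromstring(xml_text.strip())
    header = root.find("header")
    metadata = root.find("metadata")

    dc_elements = {}
    prefix = "{" + DC_NS + "}"
    for element in metadata:
        # ElementTree exposes qualified tags as "{namespace}localname".
        if element.tag.startswith(prefix):
            local_name = element.tag[len(prefix):]
            value = (element.text or "").strip()
            # Dublin Core elements are repeatable, so collect values in lists.
            dc_elements.setdefault(local_name, []).append(value)

    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        "dc": dc_elements,
    }


if __name__ == "__main__":
    print(parse_record(RECORD_XML))
```

Keeping the header (identifier, datestamp) apart from the metadata payload mirrors the container layout on the slide: the header is what harvesting operates on, while the payload may be any of the parallel metadata sets.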
139
Identifier Issues
  • Basic identifier constraints based on URI
    specifications
  • A key for requesting a record from a repository
  • Key and metadata format ID uniquely identify a
    record
  • Individual communities may develop URN
    registration schemes

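One way to picture the point that the key plus the metadata format ID together identify a record is a store indexed by that pair. This is only a sketch; the format labels "dc" and "epms" and the helper names are hypothetical.

```python
def put_record(store, identifier, metadata_format, body):
    """File one metadata record under (identifier, metadata format)."""
    store[(identifier, metadata_format)] = body


def get_record(store, identifier, metadata_format):
    """Return the record for this key and format, or None if absent."""
    return store.get((identifier, metadata_format))


def list_formats(store, identifier):
    """List the metadata formats available for one identifier."""
    return sorted(fmt for (ident, fmt) in store if ident == identifier)


if __name__ == "__main__":
    store = {}
    # Hypothetical format labels: "dc" for Dublin Core, "epms" for the e-print set.
    put_record(store, "oai:arXiv:hep/001001", "dc", {"title": "Investigations of Radioactivity"})
    put_record(store, "oai:arXiv:hep/001001", "epms", {"title": "Investigations of Radioactivity"})
    print(list_formats(store, "oai:arXiv:hep/001001"))
```

The same identifier can therefore resolve to several records, one per parallel metadata set the repository supports.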
140
Identifier Solutions
full-identifier = oai:archive-identifier:record-identifier
example: oai:ncstrl:ncstrl.cornellcs/TR94-1418
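As a sketch of the three-part identifier scheme (the prefix "oai", an archive identifier, and a record identifier, joined by colons), the function below splits a full identifier into its parts. The function name is hypothetical, and the assumption that the record identifier may itself contain further colons is ours, not the slide's.

```python
def parse_full_identifier(full_identifier):
    """Split 'oai:archive-identifier:record-identifier' into its two variable parts."""
    # Split on the first two colons only, so the record identifier
    # is kept whole even if it contains colons of its own (an assumption).
    parts = full_identifier.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated parts: {full_identifier!r}")
    scheme, archive_id, record_id = parts
    if scheme != "oai" or not archive_id or not record_id:
        raise ValueError(f"not a well-formed oai identifier: {full_identifier!r}")
    return archive_id, record_id


if __name__ == "__main__":
    print(parse_full_identifier("oai:arXiv:hep/001001"))
```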
141
Repositories, Identifiers, and Records
142
Selective harvesting
  • Recognized need for light-weight facility for
    selective harvesting
  • By Date
  • Sets
  • A low-cost means of selective harvesting
  • NOT a general tool for defining global categories
  • Attribution of meanings to sets can be done
    within communities and in bilateral fashion

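The two selection criteria above (by date and by set) can be sketched as a simple filter on the repository side. The record layout, the sample identifiers other than the arXiv one, and the parameter names `from_date`, `until_date` and `set_name` are assumptions for illustration; the slide only says that selection is by date and by set.

```python
from datetime import date

# Hypothetical repository contents: each record carries a datestamp
# and the names of the sets it belongs to.
RECORDS = [
    {"identifier": "oai:arXiv:hep/001001", "datestamp": date(1999, 12, 25), "sets": {"S1"}},
    {"identifier": "oai:example:rec-2", "datestamp": date(2000, 6, 1), "sets": {"S1", "S2"}},
    {"identifier": "oai:example:rec-3", "datestamp": date(2000, 10, 15), "sets": {"S2"}},
]


def select_records(records, from_date=None, until_date=None, set_name=None):
    """Return identifiers of records matching the optional date range and set."""
    selected = []
    for record in records:
        if from_date is not None and record["datestamp"] < from_date:
            continue
        if until_date is not None and record["datestamp"] > until_date:
            continue
        # Set names are opaque labels here: the repository attaches no
        # global meaning to them, it only tests membership.
        if set_name is not None and set_name not in record["sets"]:
            continue
        selected.append(record["identifier"])
    return selected


if __name__ == "__main__":
    print(select_records(RECORDS, from_date=date(2000, 1, 1), set_name="S2"))
```

Treating set names as opaque labels keeps the facility low-cost, in line with the point that the meaning of a set is agreed within a community or bilaterally rather than defined globally.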
143
Protocol Solutions
  • Normalized and Enhanced Verb Set
  • GetRecord
  • Identity
  • ListIdentifiers
  • ListMetadataFormats
  • ListRecords
  • ListSets

144
Protocol Solutions
  • CGI-script friendly syntax
  • baseurl?verb=verbname&argname=argval...
  • verbname is the name of the verb
  • argname is the name of the attribute
  • argval is the value of the attribute
  • Example:
  • http://foo/blaz?verb=ListRecords&set=S1

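A short sketch of how a harvester might assemble a request in this CGI-style syntax, using Python's standard `urllib.parse.urlencode`. The verb names are the six listed on the previous slide; the function name `build_request` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Verb names as listed on the "Protocol Solutions" slide.
VERBS = {
    "GetRecord",
    "Identity",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def build_request(base_url, verb, **arguments):
    """Return base_url?verb=verbname&argname=argval... for a known verb."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    # Put the verb first, then the remaining arguments in a stable order.
    pairs = [("verb", verb)] + sorted(arguments.items())
    return f"{base_url}?{urlencode(pairs)}"


if __name__ == "__main__":
    print(build_request("http://foo/blaz", "ListRecords", set="S1"))
```

Because every request is a plain URL with name/value pairs, a repository can implement the protocol with an ordinary CGI script, which is the low barrier to adoption the slide is pointing at.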
145
Registration Solutions
  • Automation through
  • On-line registration of
  • Archive identifier (uniqueness enforcement)
  • base-url of archive's OAI protocol implementation
  • Identity verb that exposes archive
    characteristics
  • Use of protocol for registration of metadata
    formats and validity checking
  • Registration of service providers is still an
    open issue

146
Release Schedule
  • October 15: normalized meeting notes distributed
    to meeting group
  • November 1: beta specification to steering
    committee and limited distribution
  • Early January: stabilization of specification
    and public meeting

147
Metadata Landscape
148
Conferences
  • ACM Digital Libraries 2001, San Antonio, June
    2001, http://www.dl00.org/
  • European Conference on Digital Libraries,
    Darmstadt, Sep 2001, http://www.ecdl2001.org
  • Asian Digital Library Conference, Seoul, December
    2000, http://ADL2000.kaist.ac.kr
  • Tenth International WWW Conference, Hong Kong,
    May 2001, http://www10.org

149
NSF Digital Library Initiative
  • Phase I (1994-1998): six large-scale testbeds
    involving research universities, industrial
    partners, and next-generation technologies
  • Phase II (1999): expanded scope, smaller
    projects as well as large testbeds, emphasis on
    making accessible new types of content

150
Distributed National Electronic Resource (UK)
  • A managed environment for Internet access to
    scholarly journals and other materials relevant
    to higher education in the UK
  • Uses international standards (e.g., Dublin Core)
  • National purchase and licensing agreements for
    best value to UK education community
  • eLib research funding since mid-1990s emphasized
    incremental improvement of standards and services

151
Global Info (Germany)
  • "The German Digital Library Project"
  • Since 1996, integrating access to scientific
    information among libraries, publishers, learned
    societies, and individual scientists
  • Emphasis on open standards (e.g., Dublin Core)
    and open-standard formats (e.g., XML, RDF, MPEG)

152
European Union
  • Fifth Framework Programme, 1998-2002
  • several dozen projects with several countries
    each
  • Digital Heritage, Cultural Content
  • Interactive Electronic Publishing
  • Multimedia Content and Tools
  • DELOS Network of Excellence
  • http://www.ercim.org/delos/
  • Communication within European digital library
    research community and international networking

153
MathNet
  • German Mathematical Societies index math
    pre-prints and home pages of mathematicians
  • Encourages use of Dublin-Core-based metadata by
    distributing a free metadata editor; displays hits
    "with metadata" separately from hits "without
    metadata"
  • International Mathematical Union (IMU) planning
    international Web service based on German MathNet
    model
  • Seeking international agreement on simple
    metadata profiles for types of math materials

154
IMS Global Learning Consortium, Inc.
  • Teachers seeking appropriate classroom materials
    on Web may want to know
  • for which age-group?
  • has it already been used successfully in
    classrooms?
  • will it work on my equipment?
  • IMS: rich descriptions of learning resources in a
    standard record format

155
Federal Geographic Data Committee
  • (US) FGDC Content Standard for Digital Geospatial
    Metadata: integrate access to resources about a
    particular area found in diverse repositories
  • Government, education, and business needs
  • Emergency management
  • Integrated databases and comprehensive maps
  • City planning
  • Environmental control

156
Visual Resources Association
  • VRA Core Categories in a two-level model for
    describing objects such as paintings and
    buildings
  • "Works" described separately from "images" of
    those works (One-to-One Principle)
  • Conceptual clarity of One-to-One Principle
    implies more complex work-flow and processing for
    catalogers and software

157
Nordic Metadata Project
  • Cooperation between Scandinavian countries (since
    circa 1996)
  • Pioneered idea of metadata-based distributed
    index across national boundaries
  • NetLab (Lund University) maintains SAFARI, which
    harvests Dublin-Core-based metadata embedded in
    documents on Web servers

158
Renardus Project (EU)
  • http://www.konbib.nl/coop/reynard
  • National libraries (Netherlands coordinates)
  • NDR National Digital Resource in UK
  • Die Deutsche Bibliothek
  • Goal: integrated access to subject gateways in
    Europe
  • High-level agreement on simple, Dublin-Core-based
    schema as common denominator

159
Networked Digital Library of Theses and
Dissertations (NDLTD)
  • http://www.ndltd.org
  • International consortium of projects putting
    dissertations online
  • Difficult to agree on single unified metadata
    schema -- national, legal, and disciplinary
    requirements differ significantly
  • NDLTD agreement on a small Dublin-Core-based set
    of metadata elements?

160
CIDOC
  • International Council of Museums object-oriented
    model (CIDOC) designed for describing multiple
    entities that may be:
  • physical (e.g., museum objects)
  • conceptual (e.g., works)
  • temporal (e.g., historical periods)
  • spatial (e.g., places)
  • Implies an integrated information space of
    "encyclopedic" scope

161
Rich Site Summary (RSS)
  • Metadata for content syndication (news feeds)
  • Used in developing media content portals
  • Built on established vocabularies (DC), uses RDF
    syntax
  • Layers of application-specific semantics:
    syndication vocabularies, annotation
    vocabularies, etc.

162
Moving Picture Experts Group (MPEG)
  • MPEG 4: encoding and interacting with
    audio-visual objects
  • MPEG 7: multimedia content description interface
    for such objects
  • MPEG 21: ambitious "umbrella" framework
    describing the infrastructure for delivering and
    consuming multimedia content

163
More...
  • INDECS - Uses an event-based model to describe
    intellectual property rights for commercial
    transactions
  • DOI - Uses the INDECS framework with a Digital
    Object Identifier for content description and
    management of references between scientific,
    technical, and medical journals
  • BSR - Basic Semantic Registry as a universal
    interlingua of concepts
  • GILS - Government Information Locator Service

164
...and more...
  • PDS - Planetary Data System
  • IEEE Learning Object Metadata - an elaborate,
    hierarchical scheme for describing multiple
    facets of educational material
  • MARC 21 - Machine Readable Cataloging format and
    related vocabularies for libraries
  • EPICS Data Dictionary, a subset of which -- ONIX
    -- describes books in a specific XML format
    (pushed by Amazon.com)

165
For further information....
  • "Metadata Watch Reports" of SCHEMAS Project,
    http://www.schemas-forum.org
  • Critical overview (with expert commentary) on the
    metadata landscape as it evolves
  • Related database of individual activity reports
  • D-Lib Magazine, http://www.dlib.org/dlib/
  • Ariadne, http://www.ariadne.ac.uk

166
Why the Web won
  • Tim Berners-Lee's original model was very simple,
    and it was easy to implement
  • Real-world experience with simple HTML led
    iteratively to better understanding of priorities
  • As with bicycles and airplanes, there was no
    "theory" for design -- design was perfected
    iteratively, starting simple
  • Complex standards impose significant costs,
    especially if legacy data must be converted

167
Learning from experience
  • People are only human: even the most perfect language
    is always subject to interpretation
  • By design, metadata languages must allow for
    innovation and evolution
  • Physics and art history, Chinese and Finnish --
    different languages will continue in real life
  • Likewise, a diversity of metadata languages is
    inevitable
  • Interoperability over "everything" can only be
    via a simple and general pidgin

168
thomas.baker_at_gmd.de