Title: Making sense of schemas
1 Making sense of schemas
- Thomas Baker, Fraunhofer-Gesellschaft
- Fourth SCHEMAS Workshop
- 30 November 2001, The Hague
2 What are schemas?
- Declare, like a dictionary, vocabulary terms and definitions (semantics)
- Confusingly, the generic term implies both
  - Describing a set of concepts and the relationships between those concepts, and
  - Encoding its machine-processable representation
- People associate the latter with W3C specifications
  - RDF Schema
  - XML Schema
- SCHEMAS-specific uses of "schema" and "profile"
- Caveat: the collective jargon is not yet stable!
3 Motivation for SCHEMAS
- Help implementors understand the evolving landscape of metadata schemas
  - Overviews of standards development
  - Lists and reviews of schema-creating activities
- Standards-based registry (in RDF)
  - Harvest metadata schemas from maintainers
  - Provide integrated access to this distributed corpus of metadata vocabulary terms
- Encourage use of existing and emerging standards
4 Where we stand
- Providing integrated access to schemas evokes a broader problem
- Explicitly or not, the schemas of the world follow a diversity of incompatible data models
- Merging diverse data models requires translation into a common grammar
- This talk: creating coherence ("making sense") among diverse schemas can entail an imperfect process of translation, even simplification
5 An XML schema example
- A music catalogue using an XML schema specifies a particular nested tag structure
- Applications that share this schema can be searched in a consistent manner
- User query: find identifiers of all tracks with creator "Don Van Vliet"
- Program action: find values of dc:identifier for track elements which have a dc:creator child element with content "Don Van Vliet"
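The program action above can be sketched with Python's standard-library `xml.etree.ElementTree`. This is only an illustration: the `catalogue`/`album`/`track` layout, the identifiers and the second creator are assumptions, since the slide names only `track`, `dc:creator` and `dc:identifier`.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; the catalogue structure below is hypothetical.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

CATALOGUE = """
<catalogue xmlns:dc="http://purl.org/dc/elements/1.1/">
  <album>
    <track>
      <dc:identifier>track-001</dc:identifier>
      <dc:creator>Don Van Vliet</dc:creator>
    </track>
    <track>
      <dc:identifier>track-002</dc:identifier>
      <dc:creator>Someone Else</dc:creator>
    </track>
  </album>
</catalogue>
"""

def track_ids_by_creator(xml_text, creator):
    """Return dc:identifier values of track elements with a matching dc:creator child."""
    root = ET.fromstring(xml_text)
    ids = []
    for track in root.iter("track"):
        creators = [c.text for c in track.findall("dc:creator", NS)]
        if creator in creators:
            ids.extend(i.text for i in track.findall("dc:identifier", NS))
    return ids

print(track_ids_by_creator(CATALOGUE, "Don Van Vliet"))
```

The query works only because the program already knows that `dc:creator` and `dc:identifier` are direct children of `track`.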
7 Each schema a language unto itself
- Each XML schema defines a particular model for nesting tags
- Without knowing this model, a program cannot safely guess
  - Who is the creator?
  - What is the relationship between a creator and a track?
- Web crawlers reading this metadata without its schema cannot make sense of it
8 Too many ways to do it!
- "XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean" (Tim Berners-Lee)
- Different XML schemas have different structures, all "good" (and valid)
- Humans may be able to interpret, but machines need prior knowledge of parent-child element relations (schemas and DTDs)
- Not scalable in an open Web, where machines are always encountering unknown schemas
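To illustrate why prior knowledge of the nesting matters, the sketch below encodes the same fact in two hypothetical layouts and runs one fixed query against both. The element names and values are invented for illustration and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# Layout A: the creator is a child element of the track.
LAYOUT_A = (
    "<catalogue><track><id>t1</id>"
    "<creator>Don Van Vliet</creator></track></catalogue>"
)

# Layout B: equally valid XML, but the track is nested under the creator.
LAYOUT_B = (
    "<catalogue><creator><name>Don Van Vliet</name>"
    "<track><id>t1</id></track></creator></catalogue>"
)

def query_written_for_layout_a(xml_text, creator):
    """Find ids of tracks that have a <creator> child with the given text."""
    root = ET.fromstring(xml_text)
    return [
        track.findtext("id")
        for track in root.iter("track")
        if track.findtext("creator") == creator
    ]

print(query_written_for_layout_a(LAYOUT_A, "Don Van Vliet"))  # finds the track
print(query_written_for_layout_a(LAYOUT_B, "Don Van Vliet"))  # finds nothing
```

Both documents are well-formed, yet the query written for one layout silently returns nothing on the other.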
9 W3C Semantic Web
- Simple linked data model
  - Create webs of information about related things using explicit statements
  - Statements follow a common model and use machine-processable vocabularies
- URIs: unique Web addresses for resources
  - URIs tie metadata vocabulary terms to unique definitions that everyone can find on the Web
- XML namespaces: unique addresses for metadata vocabulary terms
- XML: universal file format
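As a rough sketch of how a namespace prefix ties a short term name to a unique Web address, the helper below expands `prefix:name` into a full URI. The `expand` function and the prefix table are hypothetical; the `uc` address is the one used in the RDF examples of this talk, and `dc` is the Dublin Core element namespace.

```python
# Prefix table: each prefix stands for one namespace URI.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "uc": "http://www.ukoln.ac.uk/core/",
}

def expand(qname, namespaces=NAMESPACES):
    """Expand a qualified name such as 'dc:creator' into a full term URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or not local:
        raise ValueError(f"not a qualified name: {qname!r}")
    if prefix not in namespaces:
        raise KeyError(f"unknown namespace prefix: {prefix!r}")
    return namespaces[prefix] + local

print(expand("dc:creator"))
print(expand("uc:author"))
```

Two vocabularies can both define a term called `author` without clashing, because the expanded URIs differ.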
10 Semantic Web hypothesis
- A shared grammar is needed to ensure that humans and software will interpret metadata consistently
- A grammatical framework for the description of resources (Resource Description Framework)
- Clusters of simple Subject-Predicate-Object statements can describe most of the data processed by machines
- More complex grammars will not interoperate in the diverse Web environment
11 The RDF model
- A resource has some property whose value is either (i) a simple string value (literal)
[Diagram: http://pj.org/doc/1 --author--> "Pete"]
- Subject: the resource identified by the URI http://pj.org/doc/1
- Predicate: has property "author"
- Object: the value of the property is "Pete"
12 The RDF model
- ... or (ii) another resource
[Diagram: http://pj.org/doc/1 --author--> (unnamed resource), which has name "Pete" and email "pete_at_pj.org"]
- The object of (i) is now another resource
- That resource is in turn the subject of two further statements: it has name "Pete" and it has email "pete_at_pj.org"
13 The RDF model
- ... which may itself have a URI
[Diagram: http://pj.org/doc/1 --author--> http://pj.org/person/pete, which has name "Pete" and email "pete_at_pj.org"]
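The three cases above can be sketched as plain subject-predicate-object tuples. This is a simplification for illustration: property names are bare strings rather than full URIs, and the `Triple` type and helper functions are hypothetical, not part of any RDF toolkit.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

DOC = "http://pj.org/doc/1"
PETE = "http://pj.org/person/pete"

# Case (i): the property value is a literal string.
literal_case = [Triple(DOC, "author", "Pete")]

# Cases (ii) and (iii): the value is another resource, here with its own URI,
# which is itself the subject of further statements.
resource_case = [
    Triple(DOC, "author", PETE),
    Triple(PETE, "name", "Pete"),
    Triple(PETE, "email", "pete_at_pj.org"),
]

def values(triples, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]

def author_names(triples, doc):
    """Follow author links from a document and collect each author's name."""
    names = []
    for author in values(triples, doc, "author"):
        names.extend(values(triples, author, "name"))
    return names

print(values(literal_case, DOC, "author"))
print(author_names(resource_case, DOC))
```

Every statement has the same three-part shape, whether its object is a literal or another resource.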
14 URIs as anchors for merging data
- URIs are fixed points on the global Web for
  - Identifying resources to be described
  - Identifying precisely the metadata vocabulary used to describe those resources
- These points can be used to superimpose graphs, merging statements
- Creates a market for aggregation, data merging, annotation, and filtering services
15 First source
[Diagram: http://pj.org/doc/1 --author--> http://pj.org/person/pete, which has name "Pete" and email "pete_at_pj.org"]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description rdf:about="http://pj.org/doc/1">
    <uc:author>
      <rdf:Description rdf:about="http://pj.org/person/pete">
        <uc:name>Pete</uc:name>
        <uc:email>pete_at_pj.org</uc:email>
      </rdf:Description>
    </uc:author>
  </rdf:Description>
</rdf:RDF>
16 Second source
[Diagram: http://pj.org/doc/1 --subject--> "XML"]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description rdf:about="http://pj.org/doc/1">
    <uc:subject>XML</uc:subject>
  </rdf:Description>
</rdf:RDF>
17 Third source
[Diagram: http://pj.org/person/pete --organisation--> "UKOLN"]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description rdf:about="http://pj.org/person/pete">
    <uc:organisation>UKOLN</uc:organisation>
  </rdf:Description>
</rdf:RDF>
18 Three descriptions merged
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:uc="http://www.ukoln.ac.uk/core/">
  <rdf:Description rdf:about="http://pj.org/doc/1">
    <uc:author>
      <rdf:Description rdf:about="http://pj.org/person/pete">
        <uc:name>Pete</uc:name>
        <uc:email>pete_at_pj.org</uc:email>
        <uc:organisation>UKOLN</uc:organisation>
      </rdf:Description>
    </uc:author>
    <uc:subject>XML</uc:subject>
  </rdf:Description>
</rdf:RDF>
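Because all three sources use the same URIs for the document and the person, merging them can be sketched as a plain set union of statements. The tuple representation and the `describe` helper are assumptions made for illustration; the URIs and values are those of the three sources.

```python
UC = "http://www.ukoln.ac.uk/core/"
DOC = "http://pj.org/doc/1"
PETE = "http://pj.org/person/pete"

# Each source is a set of (subject, predicate, object) statements.
first = {
    (DOC, UC + "author", PETE),
    (PETE, UC + "name", "Pete"),
    (PETE, UC + "email", "pete_at_pj.org"),
}
second = {(DOC, UC + "subject", "XML")}
third = {(PETE, UC + "organisation", "UKOLN")}

# Merging is set union: shared URIs make the graphs line up.
merged = first | second | third

def describe(graph, subject):
    """Return sorted (property, value) pairs for one subject.

    Assumes every predicate in the graph starts with the UC namespace.
    """
    return sorted((p[len(UC):], o) for s, p, o in graph if s == subject)

print(describe(merged, DOC))
print(describe(merged, PETE))
```

No source needed to know about the others; the shared URIs alone are enough to superimpose the graphs.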
19 Three descriptions merged
20 Partial understanding
- To share data between programs and resources designed independently
  - Essential trait of a massively distributed Web
  - Incorporate and re-purpose data for unanticipated uses
- Communication among diverse communities on the basis of partial, imperfect understanding
- Assumption: tolerate inconsistency and errors!
  - Ignore the ones you don't understand
  - On the Web: "Error 404: File not found", yet unchecked exponential growth
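One way to picture "ignore the ones you don't understand" is a consumer that keeps the statements whose property it recognises and skips the rest instead of failing. The property names, sample values and the `partition` helper below are hypothetical.

```python
# Properties this hypothetical application understands.
KNOWN_PROPERTIES = frozenset({"creator", "title"})

# Sample statements; the values are invented for illustration.
STATEMENTS = [
    ("http://pj.org/doc/1", "title", "An example title"),
    ("http://pj.org/doc/1", "creator", "Pete"),
    ("http://pj.org/doc/1", "x-colour", "blue"),  # unknown property
]

def partition(statements, known=KNOWN_PROPERTIES):
    """Split statements into those understood and those ignored."""
    understood = [s for s in statements if s[1] in known]
    ignored = [s for s in statements if s[1] not in known]
    return understood, ignored

understood, ignored = partition(STATEMENTS)
print(len(understood), "understood;", len(ignored), "ignored")
```

The unknown statement costs nothing: the application still uses the two it understands.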
21 Pidgin metadata
- Tourists use simplified speech ("pidginisation"): "Zwei Bier bitte"
- We are all "tourists" on a global Web with linguistically diverse metadata
- Core vocabulary of terms generally useful for description
  - Dublin Core: Creator, Title, Subject, Date...
- Simple Metadata Hypothesis: simple metadata plus powerful search engines is cost-effective
22 Making schemas comparable
- SCHEMAS: expressing metadata vocabularies in a common grammar
- Instead of asking machines to understand people's language, ask people to make the extra effort (Tim Berners-Lee)
- As with natural language, translation into a shared grammar may involve simplification
- Allows construction of...
23 Registries
- Web-based dictionaries of metadata terms using machine-processable schemas
- URIs ensure that vocabulary terms (elements) are defined at unique locations on the Web
- Namespace schemas declare definitions for metadata terms ("standards")
- Application profiles mix and match standards for specialised needs
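A registry can be pictured, very roughly, as a dictionary keyed by term URI, and an application profile as a list of term URIs drawn from several namespaces. Everything below is a sketch under assumptions: the definitions are paraphrased placeholders rather than official texts, and `uc:shoeSize` is an invented term used to show an unregistered entry.

```python
DC = "http://purl.org/dc/elements/1.1/"
UC = "http://www.ukoln.ac.uk/core/"

# Namespace schemas: each term URI maps to exactly one definition.
# Definitions are paraphrased placeholders, not official texts.
REGISTRY = {
    DC + "creator": "Entity primarily responsible for making the resource",
    DC + "title": "Name given to the resource",
    UC + "organisation": "Organisation a person belongs to (hypothetical)",
}

# An application profile mixes and matches terms from several namespaces.
PROFILE = [DC + "title", DC + "creator", UC + "organisation", UC + "shoeSize"]

def resolve(profile, registry):
    """Split a profile into defined terms (with definitions) and unknown URIs."""
    defined = {term: registry[term] for term in profile if term in registry}
    unknown = [term for term in profile if term not in registry]
    return defined, unknown

defined, unknown = resolve(PROFILE, REGISTRY)
print(sorted(defined))
print(unknown)
```

Because each term is identified by a URI, the profile can reuse definitions from different namespaces without copying them.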
24 Like dictionaries
- As with natural languages
  - Prescribing definitions and guidelines
  - Describing actual metadata usage
  - Translated definitions in French and Farsi
- Tracking evolutionary change
  - Making metadata language visible, helping it evolve bottom-up
- Documentary and social purpose
  - Define semantic coherence across applications
  - Support schema design and harmonisation
25 but also unlike dictionaries
- Natural-language dictionaries
  - Compiled centrally by editorial boards
- Metadata dictionaries
  - Thousands of metadata schemas, dynamically changing
  - Can only scale if vocabularies are harvested over the Web directly from their maintainers
  - Implies shared conventions and grammars for publishing schemas on the Web
26 Life on the bleeding edge
- SCHEMAS has made progress on defining and articulating conventions for publishing metadata vocabularies: much work and consensus-building
- But is RDF ready for prime time?
  - RDF specifications continue to evolve
  - RDF tools are cutting-edge
  - SCHEMAS registry prototype demonstrated proof of concept, but is not production-ready
27 Keep it simple
- Speaking practically, broad-brush interoperability entails
  - Partial understanding (Semantic Web)
  - Shared grammar for simple statements (RDF)
  - Core vocabularies, pidgin metadata (e.g. Dublin Core)
  - Broadly understood conventions for publishing metadata vocabularies
28 Thomas.Baker_at_gmd.de