Title: Making sense of schemas


1
Making sense of schemas
  • Thomas Baker, Fraunhofer-Gesellschaft
  • Fourth SCHEMAS Workshop
  • 30 November 2001, The Hague

2
What are schemas?
  • Declare, like a dictionary, vocabulary terms and
    definitions (semantics)
  • Confusingly, the generic term implies both:
      • describing a set of concepts and relationships
        between those concepts, and
      • encoding its machine-processable representation.
  • People associate the latter with W3C specifications:
      • RDF Schema
      • XML Schema
  • SCHEMAS: specific uses of "schema" and "profile"
  • Caveat: the collective jargon is not yet stable!

3
Motivation for SCHEMAS
  • Help implementors understand the evolving
    landscape of metadata schemas
  • Overviews of standards development
  • Lists and reviews of schema-creating activities
  • Standards-based registry (in RDF)
  • Harvest metadata schemas from maintainers
  • Provide integrated access to this distributed
    corpus of metadata vocabulary terms
  • Encourage use of existing and emerging standards

4
Where we stand
  • Providing integrated access to schemas evokes a
    broader problem
  • Explicitly or not, the schemas of the world
    follow a diversity of incompatible data models
  • Merging diverse data models requires translation
    into a common grammar
  • This talk: creating coherence ("making sense")
    among diverse schemas can entail an imperfect
    process of translation, even simplification.

5
An XML schema example
  • Music catalogue using an XML schema specifies a
    particular nested tag structure
  • Applications that share this schema can be
    searched in a consistent manner
  • User query: find identifiers of all tracks with
    creator "Don Van Vliet"
  • Program action: find values of dc:identifier for
    track elements which have a dc:creator child
    element with content "Don Van Vliet"
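
The program action above can be sketched in a few lines of Python. This is only an illustration: the catalogue layout (catalogue, album, track) and the identifier values are assumptions, since the actual schema from the slide is not part of this transcript. Only dc:identifier and dc:creator come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; the nesting and the identifiers are assumed for illustration.
CATALOGUE = """
<catalogue xmlns:dc="http://purl.org/dc/elements/1.1/">
  <album>
    <track>
      <dc:identifier>track-001</dc:identifier>
      <dc:creator>Don Van Vliet</dc:creator>
    </track>
    <track>
      <dc:identifier>track-002</dc:identifier>
      <dc:creator>Someone Else</dc:creator>
    </track>
  </album>
</catalogue>
"""

# ElementTree spells namespaced tags as {namespace-uri}localname.
DC = "{http://purl.org/dc/elements/1.1/}"


def track_ids_by_creator(xml_text, creator):
    """Return dc:identifier values of track elements with a matching dc:creator child."""
    root = ET.fromstring(xml_text)
    ids = []
    for track in root.iter("track"):
        creators = [c.text for c in track.findall(DC + "creator")]
        if creator in creators:
            ids.extend(i.text for i in track.findall(DC + "identifier"))
    return ids


print(track_ids_by_creator(CATALOGUE, "Don Van Vliet"))  # ['track-001']
```

The query works only because the program already knows that dc:creator is a direct child of track, which is exactly the prior knowledge the schema supplies.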

6
(No Transcript)
7
Each schema a language unto itself
  • Each XML schema defines a particular model for
    nesting tags
  • Without knowing this model, one cannot safely guess:
      • Who is the creator?
      • What is the relationship between a creator and a
        track?
  • Web crawlers reading this metadata without its
    schema cannot make sense of it

8
Too many ways to do it!
  • "XML allows users to add arbitrary structure to
    their documents but says nothing about what the
    structures mean" (Tim Berners-Lee)
  • Different XML schemas have different structures,
    all "good" (and valid)
  • Humans may be able to interpret, but machines
    need prior knowledge of parent-child element
    relations (schemas and DTDs)
  • Not scalable in an open Web, where machines are
    always encountering unknown schemas
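
A small sketch of why this does not scale: two documents that are both well-formed and both "good", but nest the same information differently. The two layouts and the function name are hypothetical, chosen only to illustrate the point; a query written against one layout silently finds nothing in the other.

```python
import xml.etree.ElementTree as ET

# Layout A (assumed): the creator is a child of the track.
LAYOUT_A = "<track><creator>Don Van Vliet</creator><title>Example</title></track>"

# Layout B (assumed): the track is a child of the creator.
LAYOUT_B = ("<creator><name>Don Van Vliet</name>"
            "<track><title>Example</title></track></creator>")


def creator_of_track(xml_text):
    """Query written with layout A in mind: look for a creator child of track."""
    root = ET.fromstring(xml_text)
    track = root if root.tag == "track" else root.find(".//track")
    if track is None:
        return None
    node = track.find("creator")
    return None if node is None else node.text


print(creator_of_track(LAYOUT_A))  # Don Van Vliet
print(creator_of_track(LAYOUT_B))  # None
```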

9
W3C Semantic Web
  • Simple linked data model
  • Create webs of information about related things
    using explicit statements
  • Statements follow a common model and use
    machine-processable vocabularies
  • URIs: unique Web addresses for resources
  • URIs tie metadata vocabulary terms to unique
    definitions that everyone can find on the Web
  • XML namespaces: unique addresses for metadata
    vocabulary terms
  • XML: universal file format
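
As a rough illustration of how a namespace turns a short term name into a globally unique address, the sketch below expands a prefixed name into a full URI. The helper name `expand` is an assumption for this example; the "uc" namespace URI is the one used later in this talk, and "dc" is the Dublin Core element set namespace.

```python
# Prefix -> namespace URI. "uc" is taken from the RDF examples in this talk.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "uc": "http://www.ukoln.ac.uk/core/",
}


def expand(qname, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'dc:creator' into a full URI."""
    prefix, local = qname.split(":", 1)
    return namespaces[prefix] + local


print(expand("dc:creator"))  # http://purl.org/dc/elements/1.1/creator
print(expand("uc:author"))   # http://www.ukoln.ac.uk/core/author
```

Two vocabularies may both define a term called "author", but once expanded the two terms have different URIs and cannot be confused.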

10
Semantic Web hypothesis
  • A shared grammar is needed to ensure that humans
    and software will interpret metadata consistently
  • A grammatical framework for the description of
    resources (Resource Description Framework)
  • Clusters of simple Subject-Predicate-Object
    statements can describe most of the data
    processed by machines
  • More complex grammars will not interoperate in
    the diverse Web environment

11
The RDF model
  • A resource has some property whose value is
    either (i) a simple string value (literal)

    [http://pj.org/doc/1] --author--> "Pete"

  • Subject: the resource identified by the URI
    http://pj.org/doc/1
  • Predicate: has the property "author"
  • Object: the value of the property is "Pete"

12
The RDF model
  • ...or (ii) another resource...

    [http://pj.org/doc/1] --author--> [ ]
    [ ] --name--> "Pete"
    [ ] --email--> "pete@pj.org"

  • The object of (i) is another resource (shown as [ ])
  • That resource is the subject of the statement
    "has name Pete" and of the statement
    "has email pete@pj.org"

13
The RDF model
  • ...which may itself have a URI

    [http://pj.org/doc/1] --author--> [http://pj.org/person/pete]
    [http://pj.org/person/pete] --name--> "Pete"
    [http://pj.org/person/pete] --email--> "pete@pj.org"

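
The Subject-Predicate-Object model on these slides can be sketched as plain tuples. This is a deliberately minimal illustration, not an RDF library: the `Triple` type and the `objects` helper are assumptions, and the sketch does not distinguish literals from resources the way real RDF does.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


UC = "http://www.ukoln.ac.uk/core/"
DOC = "http://pj.org/doc/1"
PETE = "http://pj.org/person/pete"

# The three statements from the slide.
graph = {
    Triple(DOC, UC + "author", PETE),
    Triple(PETE, UC + "name", "Pete"),
    Triple(PETE, UC + "email", "pete@pj.org"),
}


def objects(graph, subject, predicate):
    """All values of one property of one subject."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


# Follow two statements in a row: the document's author, then that author's name.
authors = objects(graph, DOC, UC + "author")
names = {name for a in authors for name in objects(graph, a, UC + "name")}
print(names)  # {'Pete'}
```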
14
URIs as anchors for merging data
  • URIs are fixed points on the global Web for:
      • identifying resources to be described
      • identifying precisely the metadata vocabulary
        used to describe those resources
  • These points can be used to superimpose graphs,
    merging statements
  • Creates a market for aggregation, data merging,
    annotation, and filtering services

15
First source
    [http://pj.org/doc/1] --author--> [http://pj.org/person/pete]
    [http://pj.org/person/pete] --name--> "Pete"
    [http://pj.org/person/pete] --email--> "pete@pj.org"

    <rdf:RDF xmlns:uc="http://www.ukoln.ac.uk/core/">
      <rdf:Description about="http://pj.org/doc/1">
        <uc:author>
          <rdf:Description about="http://pj.org/person/pete">
            <uc:name>Pete</uc:name>
            <uc:email>pete@pj.org</uc:email>
          </rdf:Description>
        </uc:author>
      </rdf:Description>
    </rdf:RDF>

16
Second source
    [http://pj.org/doc/1] --subject--> "XML"

    <rdf:RDF xmlns:uc="http://www.ukoln.ac.uk/core/">
      <rdf:Description about="http://pj.org/doc/1">
        <uc:subject>XML</uc:subject>
      </rdf:Description>
    </rdf:RDF>

17
Third source
    [http://pj.org/person/pete] --organisation--> "UKOLN"

    <rdf:RDF xmlns:uc="http://www.ukoln.ac.uk/core/">
      <rdf:Description about="http://pj.org/person/pete">
        <uc:organisation>UKOLN</uc:organisation>
      </rdf:Description>
    </rdf:RDF>

18
Three descriptions merged
    <rdf:RDF xmlns:uc="http://www.ukoln.ac.uk/core/">
      <rdf:Description about="http://pj.org/doc/1">
        <uc:author>
          <rdf:Description about="http://pj.org/person/pete">
            <uc:name>Pete</uc:name>
            <uc:email>pete@pj.org</uc:email>
            <uc:organisation>UKOLN</uc:organisation>
          </rdf:Description>
        </uc:author>
        <uc:subject>XML</uc:subject>
      </rdf:Description>
    </rdf:RDF>

19
Three descriptions merged
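
Because every statement is anchored on URIs, merging the three sources amounts to taking the union of their statements. The sketch below shows this with plain Python sets of (subject, predicate, object) tuples; it is an illustration of the idea under that simplified representation, not how any particular RDF toolkit does it, and the `describe` helper is a name invented here.

```python
UC = "http://www.ukoln.ac.uk/core/"
DOC = "http://pj.org/doc/1"
PETE = "http://pj.org/person/pete"

# Each source as a set of (subject, predicate, object) statements.
first = {
    (DOC, UC + "author", PETE),
    (PETE, UC + "name", "Pete"),
    (PETE, UC + "email", "pete@pj.org"),
}
second = {(DOC, UC + "subject", "XML")}
third = {(PETE, UC + "organisation", "UKOLN")}

# Shared URIs make the graphs line up, so merging is a set union.
merged = first | second | third


def describe(graph, subject):
    """Sorted (predicate, object) pairs known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, value in describe(merged, PETE):
    print(predicate, "->", value)
```

After the merge, the description of Pete contains name, email and organisation, even though no single source stated all three.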
20
Partial understanding
  • To share data between programs and resources
    designed independently
  • Essential trait of a massively distributed Web
  • Incorporate and re-purpose data for unanticipated
    uses
  • Communication among diverse communities on basis
    of partial, imperfect understanding
  • Assumption: tolerate inconsistency and errors!
  • Ignore the ones you don't understand
  • On the Web: "Error 404: File not found", and yet
    unchecked exponential growth
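
"Ignore the ones you don't understand" can be sketched as a consumer that keeps the statements whose property it recognises and sets the rest aside instead of failing. The set of known properties and the unfamiliar example.org term are assumptions made up for this illustration.

```python
UC = "http://www.ukoln.ac.uk/core/"
PETE = "http://pj.org/person/pete"

# Properties this particular application understands (an assumption).
KNOWN = {UC + "name", UC + "email"}

statements = [
    (PETE, UC + "name", "Pete"),
    (PETE, UC + "email", "pete@pj.org"),
    # A term from a vocabulary this application has never seen (hypothetical).
    (PETE, "http://example.org/terms/shoeSize", "44"),
]


def partial_understanding(statements, known=KNOWN):
    """Split statements into those we can interpret and those we skip."""
    kept, ignored = [], []
    for statement in statements:
        predicate = statement[1]
        (kept if predicate in known else ignored).append(statement)
    return kept, ignored


kept, ignored = partial_understanding(statements)
print(len(kept), "understood;", len(ignored), "ignored")  # 2 understood; 1 ignored
```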

21
Pidgin metadata
  • Tourists use simplified speech ("pidginisation")
    "Zwei Bier bitte".
  • We are all "tourists" on a global Web with
    linguistically diverse metadata
  • Core vocabulary of terms generally useful for
    description
  • Dublin Core: Creator, Title, Subject, Date...
  • Simple Metadata Hypothesis: simple metadata
    plus powerful search engines is cost-effective
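
One way to picture pidgin metadata is as a lossy mapping from a specialised vocabulary onto a handful of core terms. In this sketch the local element names, the mapping and the sample record are all invented for illustration; only the target names (creator, title, date) are Dublin Core elements.

```python
# Hypothetical mapping from a specialised music vocabulary to core terms.
TO_CORE = {
    "composer": "creator",
    "lyricist": "creator",
    "trackTitle": "title",
    "releaseDate": "date",
}


def pidginise(record, mapping=TO_CORE):
    """Reduce a rich record to core terms; unmapped detail is dropped."""
    core = {}
    for term, value in record.items():
        target = mapping.get(term)
        if target is None:
            continue  # no core equivalent: this detail is lost
        core.setdefault(target, []).append(value)
    return core


# Made-up sample record.
record = {
    "composer": "Don Van Vliet",
    "trackTitle": "Example Track",
    "releaseDate": "1969",
    "matrixNumber": "X-123",
}
print(pidginise(record))
# {'creator': ['Don Van Vliet'], 'title': ['Example Track'], 'date': ['1969']}
```

The simplified record no longer says that the creator was specifically the composer, and the matrix number is gone, but any application that knows the core terms can still search it.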

22
Making schemas comparable
  • SCHEMAS: expressing metadata vocabularies in a
    common grammar
  • Instead of asking machines to understand people's
    language, ask people to make the extra effort
    (Tim Berners-Lee)
  • As with natural language, translation into a
    shared grammar may involve simplification
  • Allows construction of...

23
Registries
  • Web-based dictionaries of metadata terms using
    machine-processable schemas
  • URIs ensure that vocabulary terms (elements) are
    defined at unique locations on the Web
  • Namespace schemas declare definitions for
    metadata terms ("standards")
  • Application profiles mix and match standards
    for specialised needs
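
A registry in this sense can be pictured as a dictionary keyed by term URI, with an application profile as a list of term URIs drawn from several namespaces. The sketch below is an assumption-laden toy: the definitions are placeholders rather than the official texts, and uc:shoeSize is a made-up term used to show what happens when a profile cites something the registry does not hold.

```python
DC = "http://purl.org/dc/elements/1.1/"
UC = "http://www.ukoln.ac.uk/core/"

# Term URI -> definition. The definition texts are placeholders.
REGISTRY = {
    DC + "creator": "placeholder definition for Creator",
    DC + "title": "placeholder definition for Title",
    UC + "organisation": "placeholder definition for organisation",
}


def resolve_profile(profile, registry=REGISTRY):
    """Look up each term of an application profile in the registry."""
    resolved = {uri: registry[uri] for uri in profile if uri in registry}
    missing = [uri for uri in profile if uri not in registry]
    return resolved, missing


# A profile that mixes terms from two namespaces (the last one is hypothetical).
PROFILE = [DC + "creator", DC + "title", UC + "organisation", UC + "shoeSize"]

resolved, missing = resolve_profile(PROFILE)
print(len(resolved), "terms resolved; missing:", missing)
```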

24
Like dictionaries
  • As with natural languages
  • Prescribing definitions and guidelines
  • Describing actual metadata usage
  • Translated definitions in French and Farsi
  • Tracking evolutionary change
  • Making metadata language visible, helping it
    evolve bottom-up
  • Documentary and social purpose
  • Define semantic coherence across applications
  • Support schema design and harmonisation

25
...but also unlike dictionaries
  • Natural-language dictionaries:
      • Compiled centrally by editorial boards
  • Metadata dictionaries:
      • Thousands of metadata schemas, dynamically
        changing
      • Can only scale if vocabularies are harvested over
        the Web directly from their maintainers
  • Implies shared conventions and grammars for
    publishing schemas on the Web
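
Harvesting from maintainers can be sketched as a loop that fetches each published schema and folds its terms into one registry, tolerating sources that are unreachable. Everything here is hypothetical: the example.org URLs, the term definitions, and the `fetch` callable, which stands in for whatever retrieval and parsing a real harvester would do.

```python
def harvest(schema_urls, fetch):
    """Build a registry from many maintainers; skip sources that fail."""
    registry, failed = {}, []
    for url in schema_urls:
        try:
            terms = fetch(url)  # expected to return {term URI: definition}
        except OSError:
            failed.append(url)
            continue
        registry.update(terms)
    return registry, failed


# Stand-in for schemas published on the Web (made-up URLs and definitions).
PUBLISHED = {
    "http://example.org/schema/a": {
        "http://example.org/schema/a#title": "A name for the resource",
    },
    "http://example.org/schema/b": {
        "http://example.org/schema/b#maker": "Who made the resource",
    },
}


def fake_fetch(url):
    """Pretend network fetch; unknown URLs behave like an unreachable server."""
    try:
        return PUBLISHED[url]
    except KeyError:
        raise OSError("schema not reachable: " + url)


sources = list(PUBLISHED) + ["http://example.org/schema/gone"]
registry, failed = harvest(sources, fake_fetch)
print(len(registry), failed)  # 2 ['http://example.org/schema/gone']
```

The loop only works if every maintainer publishes terms in a form the same `fetch` step can read, which is the shared convention the slide calls for.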

26
Life on the bleeding edge
  • SCHEMAS has made progress on defining and
    articulating conventions for publishing metadata
    vocabularies
  • much work and consensus-building
  • But is RDF ready for prime-time?
  • RDF specifications continue to evolve
  • RDF tools are cutting-edge
  • SCHEMAS registry prototype demonstrated
    proof-of-concept, but not production-ready

27
Keep it simple
  • Speaking practically, broad-brush
    interoperability entails:
      • Partial understanding (Semantic Web)
      • Shared grammar for simple statements (RDF)
      • Core vocabularies, pidgin metadata (e.g. Dublin
        Core)
      • Broadly understood conventions for publishing
        metadata vocabularies

28
Thomas.Baker@gmd.de