Metadata for the Web: From Discovery to Description

About This Presentation
Title:

Metadata for the Web: From Discovery to Description

Description:

Carl Lagoze Cornell University. Cornell CS 502. The fifteen Dublin Core Elements ... A cross-domain switchboard for interoperable metadata? Switchboard ... – PowerPoint PPT presentation

Slides: 36
Provided by: carll8

Transcript and Presenter's Notes


1
Metadata for the Web: From Discovery to Description
  • CS 502 20020224
  • Carl Lagoze Cornell University

2
The fifteen Dublin Core Elements
http://dublincore.org/usage/terms/dc/current-elements/
http://dublincore.org
3
A Pidgin for Digital Tourists
  • Metadata is language
  • Dublin Core is a small and simple language -- a
    pidgin -- for finding resources across domains.
  • Speakers of different languages naturally
    "pidginize" to communicate
  • E.g., tourists using simple phrases to order beer
    ("zwei Bier bitte" "dva pivo" "biru o san
    bai"...)
  • We are all "tourists" on the global Internet.

4
What is the Dublin Core (1)?
  • A simple set of properties to support resource
    discovery on the web (fuzzy search buckets)?

5
What is the Dublin Core (2)?
  • An extensible ontology for resource description?

Greater Functionality Cost
6
What is the Dublin Core (3)?
  • A cross-domain switchboard for interoperable
    metadata?

- projections to application-specific metadata vocabularies
Switchboard
7
Dublin Core Qualifiers
  • From fuzzy buckets to more specific description
  • Model of graceful degradation
  • Support both simplicity and specificity
  • Intra-domain and inter-domain semantics

8
Varieties of Qualifiers: Element Refinements
  • Make the meaning of an element narrower or more
    specific.
  • Narrowing implies an "is a" relationship
      • a "date created" is a "date"
      • an "is part of" relation is a "relation"
  • If your software does not understand the
    qualifier, you can safely ignore it.

9
Varieties of Qualifiers: Value Encoding Schemes
  • Says that the value is
      • a term from a controlled vocabulary (e.g.,
        Library of Congress Subject Headings)
      • a string formatted in a standard way (e.g.,
        "2001-05-02" means May 2, not February 5)
  • Even if a scheme is not known by software, the
    value should be "appropriate" and usable for
    resource discovery.
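
A minimal sketch of how software might treat a value encoding scheme, assuming a small lookup table of parsers (the names SCHEME_PARSERS and interpret_value are hypothetical, not part of Dublin Core). When the scheme is recognized the value is parsed; when it is not, the raw string is passed through unchanged so it stays usable for discovery.

```python
from datetime import date

# Hypothetical registry mapping a scheme name to a parser function.
# Only ISO 8601 calendar dates (YYYY-MM-DD) are handled in this sketch.
SCHEME_PARSERS = {
    "ISO8601": date.fromisoformat,
}

def interpret_value(value, scheme=None):
    """Parse the value if the scheme is known; otherwise return the string as-is."""
    parser = SCHEME_PARSERS.get(scheme)
    if parser is None:
        # Unknown or missing scheme: fall back to the plain string.
        return value
    return parser(value)

parsed = interpret_value("2001-05-02", scheme="ISO8601")
print(parsed.month, parsed.day)  # month first, then day: 5 2

# A scheme this sketch does not know (LCSH) still yields a usable string.
fallback = interpret_value("Languages -- Grammar", scheme="LCSH")
print(fallback)
```

The fallback branch is the point of the design: an unknown scheme degrades to free text rather than to an error.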

10
A Grammar of Dublin Core
  • http://www.dlib.org/dlib/october00/baker/10baker.html
  • By design not as subtle as mother tongues, but
    easy to learn and extremely useful in practice
  • Pidgins: small vocabularies (Dublin Core: fifteen
    special nouns and lots of optional adjectives)
  • Simple grammars: sentences (statements) follow a
    simple fixed pattern...

11
Example Dublin Core statements
  • Resource has Title 'Grammar of Dublin Core'.
  • Resource has Creator 'Tom Baker'.
  • Resource has Subject 'Metadata'.
  • Resource has Relation http://foo.org/file.htm.
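
The fixed sentence pattern can be sketched as a small record type. This is an illustration only: the Statement class, its field names, and the resource identifier "http://example.org/doc" are assumptions, while the property/value pairs are the ones listed above.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """One Dublin Core statement: <resource> has <property> <value>."""
    resource: str   # the implied subject (identifier is hypothetical here)
    prop: str       # one of the fifteen elements
    value: str      # an appropriate literal or URL

    def sentence(self) -> str:
        # Render the statement in the fixed "Resource has ..." pattern.
        return f"Resource has {self.prop} '{self.value}'."

DOC = "http://example.org/doc"  # hypothetical resource identifier

statements = [
    Statement(DOC, "Title", "Grammar of Dublin Core"),
    Statement(DOC, "Creator", "Tom Baker"),
    Statement(DOC, "Subject", "Metadata"),
    Statement(DOC, "Relation", "http://foo.org/file.htm"),
]

for s in statements:
    print(s.sentence())
```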

12
Resource has property X
  • Resource: implied subject
  • has: implied verb
  • property: one of 15 properties (DC:Creator,
    DC:Title, DC:Subject, DC:Date...)
  • X: property value (an appropriate literal)
13
Resource has property X
  • Resource: implied subject
  • has: implied verb
  • property: one of 15 properties (DC:Creator,
    DC:Title, DC:Subject, DC:Date...)
  • X: property value (an appropriate literal)
  • qualifiers (adjectives): optional qualifier,
    optional qualifier
14
  • Resource has Subject "Languages -- Grammar"
    (qualifier: LCSH)
  • Resource has Date "2000-06-13" (qualifiers:
    Revised, ISO8601)
15
Dumb-Down Principle for Qualifiers
  • The fifteen elements should be usable and
    understandable with or without the qualifiers
  • Qualifiers refine meaning (but may be harder to
    understand)
  • Nouns can stand on their own without adjectives
  • If your software encounters an unfamiliar
    qualifier, look it up -- or just ignore it!
  • "has a" relations break the model
      • E.g., a creator has a hair color
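
The dumb-down principle can be sketched as a function that discards qualifiers and keeps the element and its value. The QualifiedStatement class and dumb_down function are hypothetical names chosen for this illustration; the example values come from the Date statement shown earlier in the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class QualifiedStatement:
    element: str                      # one of the fifteen elements (the noun)
    value: str                        # the property value
    refinement: Optional[str] = None  # element refinement (an adjective)
    scheme: Optional[str] = None      # value encoding scheme (an adjective)

def dumb_down(stmt: QualifiedStatement) -> QualifiedStatement:
    """Drop all qualifiers; the remaining statement should still be correct."""
    return QualifiedStatement(element=stmt.element, value=stmt.value)

revised = QualifiedStatement("Date", "2000-06-13",
                             refinement="Revised", scheme="ISO8601")
simple = dumb_down(revised)
print(simple)
```

A "date revised" is still a "date", so the simplified statement remains true. A qualifier such as hair color on a creator would fail this test, because the value would no longer describe the resource.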

16
Test for good qualifiers: cover the qualifier and ask
  • Does the statement still make sense?
  • Is it still correct?
  • Resource has Subject "Languages -- Grammar"
    (qualifier: LCSH)
  • Resource has Date "2000-06-13" (qualifiers:
    Revised, ISO8601)
17
Incorrect Qualification
  • Resource has creator Cornell University
    (qualifier: affiliation)
  • Resource has subject pre-schoolers (qualifier:
    audience)
18
Open questions in this model
  • Are uncontrolled and unconstrained values really
    useful for discovery?
  • Is it possible for an organization (DCMI) to
    control the evolution of a language?
  • How can "simple discovery metadata" be combined
    with complex descriptions? Is there a notion of
    graceful degradation?
  • Can DC serve as a lingua franca (mapping
    template) among more complex models?

19
Models for Deploying Metadata
  • Embedded in the resource
      • Low deployment threshold
      • Limited flexibility, limited model
  • Linked to from resource
      • Using XLink
      • Is there only one source of metadata?
  • Independent resource referencing resource
      • Model of accessing the object through its
        surrogate
      • Resource doesn't have metadata, metadata is
        just one resource annotating another

20
Syntax Alternatives: HTML
  • Advantages
      • Simple mechanism: META tags embedded in content
      • Widely deployed tools and knowledge
  • Disadvantages
      • Limited structural richness (won't support
        hierarchical, tree-structured data or entity
        distinctions).

21
Dublin Core in HTML
  • http://www.dublincore.org/documents/2000/08/15/dcq-html/
  • HTML constructs
      • <link> to establish pseudo-namespace
      • <meta> for metadata statements
      • name attribute for DC element (DC.element.ER)
      • content attribute for element value
      • scheme attribute for encoding scheme or
        controlled vocabulary
      • lang attribute for language of element value

22
Dublin Core in HTML: example
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
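
A harvester could read such tags with Python's standard html.parser module. The sketch below is one possible approach, not a reference implementation: the class name DublinCoreMetaParser and the dictionary keys are hypothetical, and it assumes names follow the DC.element or DC.element.refinement pattern used in the example.

```python
from html.parser import HTMLParser

class DublinCoreMetaParser(HTMLParser):
    """Collect Dublin Core statements from <meta name="DC. ..."> tags."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if not name.startswith("DC."):
            return  # not a Dublin Core statement
        parts = name.split(".")
        self.statements.append({
            "element": parts[1],
            # A third dotted component is treated as an element refinement.
            "refinement": parts[2] if len(parts) > 2 else None,
            "scheme": attributes.get("scheme"),
            "lang": attributes.get("lang"),
            "value": attributes.get("content"),
        })

SAMPLE = """
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
"""

parser = DublinCoreMetaParser()
parser.feed(SAMPLE)
for statement in parser.statements:
    print(statement)
```

Software that does not understand a refinement or scheme can simply ignore those two keys and still use the element and value.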
23
Unqualified Dublin Core in XML
http://dublincore.org/documents/2002/09/09/dc-xml-guidelines/
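
As a rough illustration of unqualified Dublin Core in XML, the sketch below writes each element as a child in the Dublin Core element namespace using Python's xml.etree.ElementTree. The container element name "metadata" and the function to_dc_xml are assumptions made for this example; the slide itself shows no XML, so consult the guidelines for the actual recommendations.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def to_dc_xml(pairs):
    """Serialize (element, value) pairs as dc:* children of a container."""
    root = ET.Element("metadata")  # hypothetical container element
    for element, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

xml_text = to_dc_xml([
    ("title", "Business Unusual"),
    ("creator", "Carl Lagoze"),
])
print(xml_text)
```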
24
Multi-entity nature of object description
25
Attribute/Value approaches to metadata
  • "The playwright of Hamlet was Shakespeare"
  • Hamlet has a creator: Shakespeare
26
run into problems for richer descriptions
  • "The playwright of Hamlet was Shakespeare, who was
    born in Stratford"
  • Hamlet has a creator Stratford (qualifier:
    birthplace)
27
because of their failure to model entity
distinctions
  • R1 has title Hamlet
  • R1 has creator R2
  • R2 has name Shakespeare
  • R2 has birthplace Stratford
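
The entity distinction can be sketched with plain (subject, property, object) triples. This assumes R1 stands for the work and R2 for the person, which is a reading of the diagram labels rather than something the slide states; the helper names objects and birthplace_of_creator are hypothetical.

```python
# Each triple is (subject, property, object). R1 and R2 are separate entities,
# so properties of the person never get attached to the work.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]

def objects(subject, prop):
    """Return all objects of the given property for one subject."""
    return [o for s, p, o in triples if s == subject and p == prop]

def birthplace_of_creator(resource):
    """Follow creator links, then read the birthplace of each creator."""
    return [
        place
        for creator in objects(resource, "creator")
        for place in objects(creator, "birthplace")
    ]

print(birthplace_of_creator("R1"))
print(objects("R1", "birthplace"))  # the work itself has no birthplace
```

In a flat attribute/value record the birthplace would have to hang off the work, which is exactly the incorrect statement the previous slide illustrates.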
28
Applying a Model-Centric Approach
  • Formally define common entities and relationships
    underlying multiple metadata vocabularies
  • Describe them (and their inter-relationships) in
    a simple logical model
  • Provide the framework for extending these common
    semantics to domain and application-specific
    metadata vocabularies.

29
Events are key to understanding resource
complexity?
  • Events are implicit in most metadata formats
    (e.g., date published, translator)
  • Modeling implied events as first-class objects
    provides attachment points for common entities,
    e.g., agents, contexts (times, places), roles.
  • Clarifying attachment points facilitates
    understanding and querying "who was responsible
    for what, when".
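
One way to picture events as first-class objects is a record that carries the agent, role, and time, so the "who, what, when" question becomes a simple filter. Everything in this sketch is hypothetical, including the Event class, its fields, and the sample data; it is not the ABC/Harmony model itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # what happened, e.g. "publication" or "translation"
    resource: str  # the resource the event concerns
    agent: str     # who was responsible
    role: str      # the role the agent played in the event
    when: str      # the time context of the event

# Invented sample data for illustration only.
events = [
    Event("publication", "doc-1", "Example Press", "publisher", "2001"),
    Event("translation", "doc-1", "A. Translator", "translator", "2002"),
    Event("publication", "doc-2", "Example Press", "publisher", "2003"),
]

def who_did_what_when(resource):
    """Answer 'who was responsible for what, when' for one resource."""
    return [(e.agent, e.kind, e.when) for e in events if e.resource == resource]

print(who_did_what_when("doc-1"))
```

A flat record with "date published" and "translator" fields holds the same facts, but the link between each agent, role, and time is only implied there.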

30
ABC/Harmony Event-aware metadata ontology
  • Recognizing inherent lifecycle aspects of
    description (esp. of digital content)
  • Modeling incorporates time (events and
    situations) as first-class objects
  • Supplies clear attachment points for agents,
    roles, existential properties
  • Resource description as a story-telling activity

31
Resource-centric Metadata
33
Breaking the metadata bottleneck: Human vs.
machine generation
  • Simple text scraping
  • HTML tags as hint
  • Other structural methods
  • Natural language methods and machine learning
  • Contextual methods
  • Google (text and image search)

34
Putting metadata in its place
35
Query engine architecture space