Title: Metadata for the Web: Issues and Simple Answers
1Metadata for the Web: Issues and Simple Answers
- CS 502, 2003-02-19
- Carl Lagoze, Cornell University
2Metadata is data about data
3Metadata is semi-structured data conforming to
commonly agreed-upon models, providing
operational interoperability in a heterogeneous
environment
4Some untested hypotheses
- Metadata is useful for
- People
- Machines
- More metadata is better
- (semi) automated digital libraries and simple
metadata
5Some known facts
- Number and variety of metadata vocabularies will
continue to increase
- The Tower of Babel is a franchise
- There is not one common view of reality
- "The one thing I know about metadata is that it
is expensive" (Bill Arms)
- "I hate metadata projects because they make every
other digital library project more expensive"
(Michael Lesk)
6Are metadata and data distinguishable?
- Objectivity?
- Intellectual property?
- Structure?
- Aboutness?
7The fiction of classification
"there is no classification of the universe that
is not fictional and conjectural" (Jorge Luis
Borges)
8Lenses and Views
- All classification does and should provide a
biased lens or view of reality
- Each view emphasizes certain characteristics and
hides others
9Reality is Complex
Relationship?
Created by: George Castaldo. Created on: 1994
10Objects are Related
11Entities, Events, and Agents
12Haven't we done metadata already?
13What's wrong with this model?
- Expensive
- Complex (even for its original goal?)
- Professional intervention (assumes single
community of expertise)
- Monolithic
- One size fits all approach
- Reflects its centralized system origins
- Bias towards physical artifacts
- Fixed resources
- Incomplete handling of resource evolution and
other resource relationships
- Anglo-centric
14Web Challenge to Traditional Cataloging
- Scale
- Permanence
- Authenticity
- Organizational Context
- Custodial Control
- Variety
15Internet Commons includes Multiple Communities
16Metadata Takes Many Forms
17Metadata Challenges
- Accommodate multiple varieties of metadata
- community-specific functionality, creation,
administration, access
- Tensions
- functionality and simplicity
- extensibility and interoperability
- human and machine creation and use
18Interoperability has many facets
- Semantics
- Meaning/classification/ontology
- Models/Structure
- Entities and relationships
- Syntax
- grammars to convey semantics and structure
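The three facets can be separated in a small sketch. The example below is illustrative only: the record, the one-line element glosses, and the choice of JSON and XML as the two syntaxes are assumptions made for the example, not part of the lecture. It shows one set of semantics and one structure carried by two different syntaxes, and checks that both decode back to the same record.

```python
import json
import xml.etree.ElementTree as ET

# Semantics: what each element name means (hypothetical one-line glosses).
SEMANTICS = {
    "title": "a name given to the resource",
    "creator": "an entity primarily responsible for making the resource",
}

# Structure: one resource described by a flat set of element/value pairs.
record = {"title": "Metadata for the Web", "creator": "Carl Lagoze"}


def to_json(rec):
    """Syntax 1: serialize the record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def from_json(text):
    return json.loads(text)


def to_xml(rec):
    """Syntax 2: serialize the same record as flat XML child elements."""
    root = ET.Element("record")
    for name in sorted(rec):
        ET.SubElement(root, name).text = rec[name]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    return {child.tag: child.text for child in ET.fromstring(text)}


# Both syntaxes round-trip to the same structure, and every element
# used in the record has an agreed meaning.
same_structure = from_json(to_json(record)) == from_xml(to_xml(record))
all_defined = all(name in SEMANTICS for name in record)
```

Agreement on syntax alone would not help two systems that disagree about what "creator" means, which is why the facets are listed separately.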
19Warwick Framework: Containing Chaos
- Conceptual architecture for metadata from the
Warwick Metadata Workshop (DC-2)
- Conceptual architecture to support the
specification, collection, encoding, and exchange
of modular metadata
- Provide context for metadata efforts (including
Dublin Core)
- avoids the black-hole of comprehensive element
sets
- focuses interoperability issues at package level
20Metadata Container
Container
Package Dublin Core
Package MARC record
Package Indirect Reference
Package Terms and Conditions
URI
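The container and package idea can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the Warwick Framework specification: the class names, field names, sample values, and the example.org URI are all hypothetical. The point it illustrates is that the container groups typed packages without interpreting their contents, and that a package may point to metadata held elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class MetadataPackage:
    """A typed package carrying one community's metadata inline."""
    scheme: str              # e.g. "Dublin Core", "MARC record"
    content: Dict[str, str]  # opaque to the container


@dataclass
class IndirectReference:
    """A package that points at metadata stored elsewhere."""
    scheme: str
    uri: str  # hypothetical location of the external metadata


Package = Union[MetadataPackage, IndirectReference]


@dataclass
class Container:
    """Groups packages without interpreting what is inside them."""
    packages: List[Package] = field(default_factory=list)

    def add(self, package: Package) -> None:
        self.packages.append(package)

    def schemes(self) -> List[str]:
        return [p.scheme for p in self.packages]

    def find(self, scheme: str) -> List[Package]:
        return [p for p in self.packages if p.scheme == scheme]


# Sample values below are placeholders for illustration.
container = Container()
container.add(MetadataPackage("Dublin Core", {"title": "Metadata for the Web"}))
container.add(MetadataPackage("MARC record", {"245": "Metadata for the Web"}))
container.add(IndirectReference("Indirect Reference", "http://example.org/metadata/1"))
container.add(MetadataPackage("Terms and Conditions", {"access": "open"}))
```

A client that understands only Dublin Core can call `container.find("Dublin Core")` and ignore every other package, which is how the design keeps interoperability questions at the package level.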
21Modularization Allows Distributed Management
- Communities of expertise (not software vendors)
are responsible for
- Semantics
- Registration
- Administration
- Access management
- Authority of data
- Sharing and Distribution
22Realities of Web search and discovery
- Search systems are motivated by advertising
- Index coverage is unpredictable and limited
- Too much recall, too little precision
- Index spam abounds
- Resources (and their names) are volatile
23Metadata: Part of a Solution
- Structured data about data
- helps to impose order on chaos
- enables automated discovery/manipulation
- Variety across various dimensions
- specialization
- decentralization
- democratization
24Web Metadata Models: Drill-Down Searching Paradigm
- Moving along a specificity spectrum
- Inter-domain vs. intra-domain terms, models,
query mechanisms
- One size doesn't fit all
- Cognitive models of searching and browsing
25Drill-down search paradigm
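One way to picture drill-down searching is a two-step query: a broad inter-domain search over a few shared elements, followed by an intra-domain filter that uses one community's own fields. The sketch below assumes two hypothetical collections, hypothetical field names, and hand-written crosswalks to shared "title" and "creator" elements; none of these come from the lecture.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical collections, each described in its own vocabulary.
COLLECTIONS: Dict[str, List[Record]] = {
    "museum": [
        {"object_name": "Bronze horse", "maker": "Artist A", "material": "bronze"},
        {"object_name": "Horse sketch", "maker": "Artist B", "material": "graphite"},
    ],
    "library": [
        {"245a": "The horse in art", "100a": "Author C"},
    ],
}

# Crosswalks from each domain vocabulary to shared, simple elements.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "museum": {"object_name": "title", "maker": "creator"},
    "library": {"245a": "title", "100a": "creator"},
}


def to_common(domain: str, record: Record) -> Record:
    """Map a domain record onto the shared elements, dropping the rest."""
    mapping = CROSSWALKS[domain]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def broad_search(term: str) -> List[Tuple[str, Record]]:
    """Inter-domain step: match the shared 'title' across all collections."""
    hits = []
    for domain, records in COLLECTIONS.items():
        for record in records:
            title = to_common(domain, record).get("title", "")
            if term.lower() in title.lower():
                hits.append((domain, record))
    return hits


def drill_down(hits, domain: str, field_name: str, value: str) -> List[Record]:
    """Intra-domain step: filter one domain's hits on a native field."""
    return [r for d, r in hits if d == domain and r.get(field_name) == value]


hits = broad_search("horse")
bronze = drill_down(hits, "museum", "material", "bronze")
```

The broad step loses detail (the museum's "material" field has no shared equivalent), and the drill-down step recovers it only inside the domain that defines it.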
26Metadata: Part of the problem
[Figure: AACR2/MARC, Dublin Core, and Google positioned by cost and functionality]
27Why hasn't metadata worked on the Web?
- It's all about trust
- People are lazy
- Metadata is hard
- No perceived benefit
- Reverse tragedy of the commons
- No agreement on one way to describe things
- Metacrap - http://www.well.com/~doctorow/metacrap.htm