Title: Metadata: Promise and Practice
1Metadata: Promise and Practice
- Jeffrey Beall
- Nebraska Library Association
- Technical Services Round Table
- Spring Meeting, April 25, 2008
2Outline
- Introduction
- 8 theses of my talk
- About me
- Metadata and high-quality information retrieval
- Value of browse displays
- Four types of searching in libraries
- The weaknesses of full-text searching
- The future of cataloging and the debate
- Next-generation library interfaces
3(MARC subject authority record)
000 01005cz a2200217n 450
001 6940590
005 20061110002734.0
008 060822 anannbabn a ana c
035 __ a (DLC)6940590
035 __ a (DLC)sh2006006354
035 __ a (DLC)351667
906 __ t 0645 u te04 v 0
010 __ a sh2006006354
040 __ a CoU-DA b eng c DLC
150 __ a Carhenge (Box Butte County, Neb.)
550 __ w g a Monuments z Nebraska
670 __ a Work cat. Carhenge, genius or junk? VR 2005.
670 __ a GNIS, Aug. 21, 2006 b (Carhenge, locale, Box Butte County, Nebraska, 42°09′40″N 102°51′32″W)
670 __ a Wikipedia, Aug. 21, 2006 b (Carhenge is a replica of England's Stonehenge located near the town of Alliance, Nebraska on the High Plains. Instead of being made from stones, Carhenge is constructed of vintage American automobiles, all covered with gray spray paint. Built by Jim Reinders, it was dedicated at summer solstice in June of 1987.)
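A minimal sketch of how a program might read the record above, assuming a hand-typed layout in which each field is a tag plus a dictionary of subfield codes. The names `RECORD`, `authorized_heading`, and `broader_terms` are hypothetical. The one MARC convention relied on is that a 550 tracing whose control subfield w begins with "g" points to a broader term.

```python
# Three fields retyped by hand from the record above. The subfield
# delimiter is missing from the transcript, so this layout is an assumption.
RECORD = [
    ("010", {"a": "sh2006006354"}),
    ("150", {"a": "Carhenge (Box Butte County, Neb.)"}),
    ("550", {"w": "g", "a": "Monuments", "z": "Nebraska"}),
]


def authorized_heading(record):
    """Return subfield a of the first 150 (topical heading) field, if any."""
    for tag, subfields in record:
        if tag == "150":
            return subfields["a"]
    return None


def broader_terms(record):
    """Return 550 tracings flagged as broader terms (subfield w starts with 'g')."""
    terms = []
    for tag, subfields in record:
        if tag == "550" and subfields.get("w", "").startswith("g"):
            parts = [subfields["a"]]
            if "z" in subfields:  # geographic subdivision
                parts.append(subfields["z"])
            terms.append("--".join(parts))
    return terms


print(authorized_heading(RECORD))  # Carhenge (Box Butte County, Neb.)
print(broader_terms(RECORD))       # ['Monuments--Nebraska']
```

Joining the subdivision with "--" is a display choice made for this sketch, not something the record itself dictates.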
4Favorite funny subject headings
- Golf and war
- Electric donkeys
- Infants--Congresses
- World Wide Web--Early works to 1800
- Automobile driving--Religious aspects
- Dance--France
- Women, Kukukuku (Changed to Women, Hamtai)
- Ugly contests
- Host-fungus relationships
5Favorite funny subject headings
- Weapons of mass destruction--Safety measures
- Pomegranate seeds in literature
- Infants--Books and reading
- Eskimos--Hunting
- Headache patients' writings
- Bird surveys
- Violin--Methods (Fiddling)
- Global warming--Fiction
- Body, Human--Catalogs
- Mentally ill parents
- Appalachian Region--Intellectual life
6Favorite funny subject headings
- Tax exemption--Taxation
- Dinosaurs as pets
- Labor disputes--Poetry
- Crappie fishing
- Reality--Fiction
- Historic buildings--Design and construction
- Public toilets in motion pictures
- Domestic asses
- Hurling managers
- Uranus probes
- 110 10 a United States. b Office of Solid Waste
7Theses
- Libraries should provide high-quality information
discovery and information retrieval.
- The best way to achieve this is with systems that
sufficiently exploit rich, standard, and
comprehensive metadata.
- Rich, standard, and comprehensive metadata
requires controlled vocabularies for subject
metadata, name disambiguation, granularity of
description, and collocation.
8Theses (continued)
- Full-text searching, while not devoid of value,
is a low-quality IR/ID system for the type of
searching done in libraries, especially serious
research and scholarship.
- At this time, computers, which do not understand
the nuances of human language, are not able to
create metadata that is of sufficient quality for
use in library IR systems.
9Theses (continued)
- Information discovery often requires mediation.
IR systems don't have to be dumbed-down and made
simple. Many things in the world are complicated,
so it's natural that the organization of
information will reflect that. It's okay to have
to learn to use a library catalog or other IR
system.
10Theses (continued)
- Library IR systems should not abandon
alphabetical browse displays in favor of
relevance ranking.
- The creation, maintenance, and sharing of
metadata for intellectual resources should not be
made so complicated that it reduces the amount or
quality of metadata being created.
11About me
Auraria Campus
12The value of metadata
- Elements of metadata
- The value of rich metadata
- The library technology graveyard: analyses of
low-quality, emerging library technologies
- Defining quality in library IR systems
13Left-anchored subject browse display
14The value of left-anchored browse displays
- Simplicity
- Structure
- Parsing advantage
- References
- Truncation
- Concept consolidation
- Collocation of inverted terms
- Typographical errors
- Classification display
- Completeness
- Skill transference
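A minimal sketch of a left-anchored browse, assuming the headings are kept in one sorted list and the display opens at the point where the typed string would file. The sample headings are loosely based on ones quoted in this talk; the "--" subdivision separator and the names `HEADINGS` and `browse` are assumptions of the sketch.

```python
import bisect

# Hypothetical heading list, kept in filing (alphabetical) order.
HEADINGS = sorted(
    [
        "Bird surveys",
        "Crappie fishing",
        "Dinosaurs as pets",
        "Infants--Books and reading",
        "Infants--Congresses",
        "Women, Hamtai",
    ],
    key=str.casefold,
)


def browse(headings, query, window=3):
    """Return up to `window` headings, starting where `query` would file."""
    keys = [h.casefold() for h in headings]
    start = bisect.bisect_left(keys, query.casefold())
    return headings[start:start + window]


# Truncation: a partial string still lands on the right part of the list,
# and every subdivision of "Infants" is displayed together.
print(browse(HEADINGS, "Inf", window=2))
# ['Infants--Books and reading', 'Infants--Congresses']
```

Because the display is a position in a list and not a match test, a query with no exact match still shows its alphabetical neighbors.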
15The four categories of searching in libraries
- Deterministic searching
- Full-text searching
- Metatext searching
- Metadata-enhanced stochastic searching
16Deterministic searching
- An author, title, subject, or number search in an
online library catalog
- Only searches metadata; results sorted
alphanumerically
- Can use cross-references
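A minimal sketch of a deterministic subject search, assuming a catalog keyed by authorized heading and a table of see-references from variant forms. The titles are invented, and treating the older heading "Women, Kukukuku" as a see-reference to "Women, Hamtai" is an assumption made for illustration.

```python
# Hypothetical catalog: authorized heading -> titles filed under it.
CATALOG = {
    "Women, Hamtai": ["Hypothetical title B", "Hypothetical title A"],
    "Ugly contests": ["Hypothetical title C"],
}

# Hypothetical see-references: variant heading -> authorized heading.
SEE_REFERENCES = {"Women, Kukukuku": "Women, Hamtai"}


def deterministic_search(heading):
    """Follow a cross-reference if one exists, then return the sorted titles."""
    authorized = SEE_REFERENCES.get(heading, heading)
    return authorized, sorted(CATALOG.get(authorized, []))


print(deterministic_search("Women, Kukukuku"))
# ('Women, Hamtai', ['Hypothetical title A', 'Hypothetical title B'])
```

Only the metadata is consulted, and the same query always returns the same alphanumerically sorted result.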
17Full-text searching
- Matches words in a search with words in documents
- Advantages: free, good for rare terms, good for
casual information seeking
- Also called stochastic searching, probabilistic
searching
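A minimal sketch of the word-matching step, assuming a small inverted index and a Boolean AND over the query words. Real search engines also rank results by probability of relevance, which this sketch leaves out. The document texts and the names `build_index` and `full_text_search` are hypothetical.

```python
import re
from collections import defaultdict

# Made-up sample documents.
DOCS = {
    "doc-1": "Carhenge is constructed of vintage American automobiles.",
    "doc-2": "Stonehenge is a monument made from stones.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def full_text_search(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(DOCS)
print(full_text_search(INDEX, "vintage automobiles"))  # {'doc-1'}
```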
18Metatext searching
- Is a full-text search but only of metadata
- A keyword search in a library catalog is an
example
- Advantages: good for rare words; good for novice
searchers
- Disadvantages: may miss abbreviated terms; is a
full-text search, but not of the full text itself
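A minimal sketch of a metatext (keyword) search, assuming only the heading text of each record is indexed. The first heading is the Carhenge heading shown earlier in this talk; the second record and the name `keyword_search` are hypothetical. It illustrates the abbreviation problem: the word "Nebraska" never appears in a heading that says "Neb."

```python
import re

RECORDS = [
    {"id": "sh2006006354", "heading": "Carhenge (Box Butte County, Neb.)"},
    {"id": "example-2", "heading": "Monuments--Nebraska"},  # hypothetical record
]


def words_of(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(records, query):
    """Return ids of records whose heading contains every query word."""
    terms = words_of(query)
    return [r["id"] for r in records if terms <= words_of(r["heading"])]


print(keyword_search(RECORDS, "carhenge"))  # ['sh2006006354']
print(keyword_search(RECORDS, "nebraska"))  # ['example-2']; the Carhenge record is missed
```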
19Metadata-enhanced stochastic searching
- Is a full-text search but also uses metadata to
limit results
- Google advanced search is an example
- Google staff mode: how do they encode metadata?
What's their metadata scheme?
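A minimal sketch of the idea, assuming each document carries a few metadata fields next to its text. The word match runs over the text and the metadata only narrows the hits. The field names `format` and `year`, the sample documents, and the function `search` are hypothetical and say nothing about how any real search engine stores metadata.

```python
# Made-up documents: free text plus a little descriptive metadata.
DOCS = [
    {"text": "carhenge genius or junk", "format": "video", "year": 2005},
    {"text": "carhenge replica of stonehenge", "format": "web page", "year": 2006},
]


def search(docs, query, **limits):
    """Match every query word in the text, then keep hits whose metadata equals the limits."""
    words = query.lower().split()
    hits = []
    for doc in docs:
        tokens = doc["text"].split()
        text_matches = all(word in tokens for word in words)
        within_limits = all(doc.get(field) == value for field, value in limits.items())
        if text_matches and within_limits:
            hits.append(doc)
    return hits


print(len(search(DOCS, "carhenge")))                  # 2
print(len(search(DOCS, "carhenge", format="video")))  # 1
```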
21The weaknesses of full-text searching
- The synonym problem
- The homonym problem
- Inability to search by facets
- Spamming
- The "aboutness" problem
- Figurative language
- Word lists
- Abstract topics
22The weaknesses of full-text searching (continued)
- The incognito problem
- Difficult-to-search paired topics
- Search engine variability
- The opaque web
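A minimal sketch of the synonym and homonym problems, assuming a plain word match over three made-up documents. The name `matches` and the texts are hypothetical.

```python
# Made-up documents, written without punctuation to keep the matching trivial.
DOCS = {
    "a": "carhenge is built from vintage automobiles",
    "b": "anglers on the lake catch crappie and bass",
    "c": "the double bass is the largest string instrument",
}


def matches(query, docs):
    """Return ids of documents whose text contains the query word."""
    word = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if word in text.split())


# Synonym problem: the document about automobiles never uses the word "cars".
print(matches("cars", DOCS))         # []
print(matches("automobiles", DOCS))  # ['a']

# Homonym problem: "bass" retrieves both the fish and the instrument.
print(matches("bass", DOCS))         # ['b', 'c']
```

A controlled vocabulary addresses both cases by assigning one authorized heading per concept and by keeping the fish and the instrument under distinct headings.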
23Search fatigue
24Miscellaneous
- What computers still cannot do
- Gresham's Law
- Still need metadata surrogates
- The debate about the future of cataloging
- My strategy
- "Next-generation" library catalogs
25WorldCat.org: Example of a next-generation,
FRBRized search engine
- Facets
- Metatext search
- Hope for catalogers
- Can also be sorted by author, title, date
26jeffrey.beall_at_ucdenver.edu
Discussion
Scarlet