Title: Labels
1Labels
- the practice and science of classification
2Classification
- Classification provides a system for organizing knowledge.
3Classification Types
- Controlled Vocabulary
- Classification
- Taxonomy
- Thesaurus
- Ontology
4Commonalities of Controlled Vocabulary,
Classification, Taxonomy, Thesaurus, Ontology
- Standardized naming conventions
- Highly structured
- Generally hierarchical
- Highly specific application
- Highly specific audience
- Prescriptive, rather than descriptive
- Low adaptability
- Enables more efficient indexing and searching
5Controlled Vocabulary
- List of terms that have been enumerated explicitly.
- Controlled by, and available from, a controlled vocabulary registration authority.
- All terms in a controlled vocabulary should have an unambiguous, non-redundant definition.
6Purpose of Controlled Vocabulary
- Translation: Provide a means for converting the natural language of authors, indexers, and users into a vocabulary that can be used for indexing and retrieval.
- Consistency: Promote uniformity in term format and in the assignment of terms.
- Indication of relationships: Indicate semantic relationships among terms.
- Label and browse: Provide consistent and clear hierarchies in a navigation system to help users locate desired content objects.
- Retrieval: Serve as a searching aid in locating content objects.
7Control Rules
- Synonyms (two words with the same meaning, like jeans and dungarees)
- Homonyms (words that sound the same but have different meanings, like bank the financial institution and bank the side of a stream or river)
- Common misspellings
8Control Rules
- Changes in content (e.g., countries that change their name or have multiple spellings)
- Identifying Best Bets connecting a woman's married name to her maiden name
- Connecting abbreviations to the full word (e.g., NY and New York, the chemical symbol Si with the element Silicon)
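The control rules above can be sketched as a lookup from variant terms to a preferred term. This is a minimal illustration only: the table names, the misspelling "jeens", and the parenthetical qualifiers for "bank" are assumptions made up for the example, not part of any published vocabulary.

```python
# Hypothetical entry-term table: each variant points at one preferred term.
ENTRY_TERMS = {
    "dungarees": "jeans",   # synonym
    "jeens": "jeans",       # common misspelling (invented for illustration)
    "ny": "New York",       # abbreviation
    "si": "Silicon",        # chemical symbol
}

# Homonyms cannot be settled by lookup alone, so each sense gets a qualifier.
HOMONYMS = {
    "bank": ["bank (financial institution)", "bank (side of a river)"],
}


def resolve(term: str) -> list[str]:
    """Return the preferred term(s) for a user-supplied term."""
    key = term.strip().lower()
    if key in HOMONYMS:
        # Ambiguous: hand back every sense and let the user choose.
        return HOMONYMS[key]
    # Unknown terms pass through unchanged (whitespace trimmed).
    return [ENTRY_TERMS.get(key, term.strip())]
```

A call such as `resolve("Dungarees")` yields the single preferred term, while `resolve("bank")` returns both qualified senses so an interface can ask which one was meant.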
10Classification Types
- Controlled Vocabulary
- Classification
- Taxonomy
- Thesaurus
- Ontology
11Classifying a work with the DDC
- Requires determination of
- The subject
- The disciplinary focus
- The approach or form (if applicable)
12Dewey Decimal Classification - 1876
- 000 Computers, information & general reference
- 100 Philosophy & psychology
- 200 Religion
- 300 Social sciences
- 400 Language
- 500 Science
- 600 Technology
- 700 Arts & recreation
- 800 Literature
- 900 History & geography
13Notational hierarchy
600 Technology (Applied sciences)
  630 Agriculture and related technologies
    636 Animal husbandry
      636.7 Dogs
      636.8 Cats
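As a rough sketch of the notational hierarchy shown here, the broader classes of a number can be read off by shortening it one digit at a time. The function name `ddc_ancestors` is hypothetical, and the sketch assumes the notation alone expresses the hierarchy, which may not hold for every DDC number.

```python
def ddc_ancestors(notation: str) -> list[str]:
    """List the chain of classes from main class down to the given number."""
    whole, _, fraction = notation.partition(".")
    whole = whole.zfill(3)
    # Main class (x00), division (xy0), section (xyz).
    chain = [whole[0] + "00", whole[:2] + "0", whole]
    # Each further decimal digit adds one level of specificity.
    chain += [f"{whole}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    # Drop repeats such as 600 -> 600 -> 600 while keeping the order.
    return list(dict.fromkeys(chain))
```

For the example on this slide, `ddc_ancestors("636.7")` walks from 600 through 630 and 636 down to 636.7.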
14Table of Last Resort (tie breaker)
- Kinds of things
- Parts of things
- Materials from which things, kinds, or parts are made
- Properties of things, kinds, parts, or materials
- Processes within things, kinds, parts, or materials
- Operations upon things, kinds, parts, or materials
- Instrumentalities for performing such operations
15Book Industry Study Group Subject Headings
http://www.bisg.org/standards/bisac_subject/major_subjects.html
16Taxonomy
- Word comes from the Greek τάξις, taxis, 'order' + νόμος, nomos, 'law' or 'science'.
- Collection of controlled vocabulary terms organized into a hierarchical structure
17Taxonomy Classification
- Biological classification has two basic objectives:
- To serve as a basis for generalization in comparative studies.
- To serve as an information storage system.
18Linnaean System of classification
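A taxonomy of this kind can be sketched as a child-to-parent table. The example below uses a small slice of the Linnaean ranks for dogs and cats. The names `PARENT`, `lineage`, and `common_ancestor` are assumptions for illustration, and the table is deliberately incomplete.

```python
# Child -> parent links for a small slice of the Linnaean hierarchy.
PARENT = {
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Carnivora": "Mammalia",
    "Canidae": "Carnivora",
    "Felidae": "Carnivora",
    "Canis": "Canidae",
    "Felis": "Felidae",
}


def lineage(taxon: str) -> list[str]:
    """Return the path from the root of the taxonomy down to the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def common_ancestor(a: str, b: str):
    """Return the most specific taxon shared by both lineages, if any."""
    seen = set(lineage(a))
    for taxon in reversed(lineage(b)):
        if taxon in seen:
            return taxon
    return None
```

The two objectives on the previous slide map onto the two functions: `lineage` treats the hierarchy as an information store, and `common_ancestor` supports comparison, since whatever holds for Carnivora can be generalized to both Canis and Felis.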
19Search Dilemma
Peter Morville
20Example
21Classification Types
- Controlled Vocabulary
- Classification
- Taxonomy
- Thesaurus
- Ontology
22Thesaurus
- Networked collection of controlled vocabulary terms.
- Uses associative relationships in addition to parent-child relationships.
23Thesaurus Provides Variant Terms
24http://www.visualthesaurus.com/
25Classification Types
- Controlled Vocabulary
- Classification
- Taxonomy
- Thesaurus
- Ontology
26Z39.50
- Client-server protocol for searching and retrieving information from remote computer databases.
- Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies (ANSI/NISO Z39.19-2005): http://www.niso.org/standards/resources/Z39-19-2005.pdf
27Extend taxonomies to be more descriptive
- Thesaurus: BT/NT, USE/UF, SN and RT
- ISO standard 2788
- Properties:
- BT: Broader term, one level up in the hierarchy
- NT: Narrower term, the inverse of BT
- SN: Scope note (explanation of the meaning of the term)
- RT: Related term (not a synonym or BT/NT; "see also")
- USE: Another term is preferred (synonym); the inverse of UF
- To provide a much richer vocabulary for describing the terms than taxonomies do.
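Because NT is the inverse of BT and UF is the inverse of USE, only one direction of each pair needs to be recorded. The sketch below derives the other direction. All terms and names here are hypothetical sample data, not drawn from a real thesaurus.

```python
from collections import defaultdict

# Recorded relationships (sample data, invented for illustration).
BT = {"dogs": "animal husbandry", "cats": "animal husbandry"}  # term -> broader term
USE = {"canines": "dogs", "felines": "cats"}                   # entry term -> preferred term
RT = {"dogs": ["veterinary medicine"]}                         # associative "see also" links


def inverse(mapping: dict) -> dict:
    """Turn a many-to-one mapping into a sorted one-to-many mapping."""
    result = defaultdict(list)
    for source, target in mapping.items():
        result[target].append(source)
    return {target: sorted(sources) for target, sources in result.items()}


NT = inverse(BT)   # broader term -> narrower terms
UF = inverse(USE)  # preferred term -> terms it is "used for"
```

Deriving `NT` and `UF` rather than typing them by hand keeps the reciprocal references consistent, which is one of the things a thesaurus adds over a plain hierarchy.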
28Z39.50
- It is covered by ANSI/NISO standard Z39.50 and ISO standard 23950.
- (ANSI) American National Standards Institute
- (ISO) International Organization for Standardization
- (NISO) National Information Standards Organization
- The standard's maintenance agency is the Library of Congress.
295.2.2 Content Objects
- There are two classes of content objects, primary and secondary, although this distinction is rarely made.
- A primary content object is the item itself, whether it exists in physical form (e.g. print, audiotape, DVD, movie) or exists solely in electronic form.
- A secondary content object is the metadata that describes the primary content object.
- Many data stores combine the primary content object and its metadata into a single, hybrid content object.
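One way to picture the primary, secondary, and hybrid distinction is as three record types. This is only a sketch: the class names and fields are assumptions chosen for the example, not definitions from the standard.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryObject:
    """The item itself, physical or electronic."""
    location: str   # e.g. a shelf mark or a file path
    physical: bool


@dataclass
class SecondaryObject:
    """Metadata describing a primary content object."""
    title: str
    subject_terms: list[str] = field(default_factory=list)


@dataclass
class HybridObject:
    """A data-store record that bundles the item with its metadata."""
    primary: PrimaryObject
    metadata: SecondaryObject


record = HybridObject(
    primary=PrimaryObject(location="shelf 636.7", physical=True),
    metadata=SecondaryObject(title="Dog care", subject_terms=["dogs"]),
)
```

Indexing and retrieval with a controlled vocabulary operate on the `subject_terms` of the secondary object, even when a store keeps both parts in one hybrid record.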
319.3.1 Alphabetical Displays
- An alphabetical listing is the most basic type of
vocabulary display. It may contain both preferred
terms and entry terms with their respective USE
and USED FOR references.
329.3.1.2 Flat Format Displays
- Most commonly used controlled vocabulary display format.
- All terms arranged in alphabetical order, including their term details, and one level of BT/NT hierarchy.
- (BT/NT) Broader Term / Narrower Term
- (SN) Scope Note
- (UF) Used For
- (RT) Related Term
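A flat format display can be sketched as an alphabetical loop over terms that prints each term's details with one level of BT/NT. The sample terms, the `FIELD_ORDER` sequence, and the two-space indent are assumptions for illustration and may differ from the layout a given standard or tool prescribes.

```python
# Sample thesaurus entries (invented for illustration).
THESAURUS = {
    "cats": {"UF": ["felines"], "BT": ["animal husbandry"]},
    "animal husbandry": {"SN": ["Raising of domestic animals"], "NT": ["cats", "dogs"]},
    "dogs": {"BT": ["animal husbandry"], "RT": ["veterinary medicine"]},
}

# Assumed order in which term details are listed under each term.
FIELD_ORDER = ["SN", "UF", "BT", "NT", "RT"]


def flat_display(thesaurus: dict) -> str:
    """Render every term alphabetically with its details indented below it."""
    lines = []
    for term in sorted(thesaurus):
        lines.append(term)
        for tag in FIELD_ORDER:
            for value in thesaurus[term].get(tag, []):
                lines.append(f"  {tag} {value}")
    return "\n".join(lines)
```

Only the immediate broader and narrower terms appear under each entry, so a reader has to jump from entry to entry to reconstruct a deeper hierarchy.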
339.3.4.1.1 Multilevel Broader and Narrower Terms
Hierarchical Displays
- In a multilevel hierarchical display format, all levels of the broader and narrower terms related to a given term are immediately visible.
- This is in contrast with the flat format described above, in which only one level of broader or narrower terms is displayed and the user is required to navigate from term to term one level at a time to discover the full hierarchy.
34Multilevel Hierarchy using Tree Structure
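A multilevel display can be sketched as a recursive walk over narrower-term links, indenting one step per level. The `NARROWER` table reuses the Dewey example from earlier in the deck with shortened labels. The names and the two-space indent are assumptions for illustration.

```python
# Term -> narrower terms (labels shortened from the earlier Dewey example).
NARROWER = {
    "Technology": ["Agriculture"],
    "Agriculture": ["Animal husbandry"],
    "Animal husbandry": ["Dogs", "Cats"],
}


def tree_display(term: str, depth: int = 0) -> list[str]:
    """Return the term and all of its descendants as indented lines."""
    lines = ["  " * depth + term]
    for child in sorted(NARROWER.get(term, [])):
        lines.extend(tree_display(child, depth + 1))
    return lines
```

Printing `"\n".join(tree_display("Technology"))` shows every level at once, which is the difference from the flat format.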
35Ontology
- An ontology is a controlled vocabulary expressed in an ontology representation language.
- Includes a grammar for using vocabulary terms within a domain of interest.
- The grammar contains formal constraints.
- Rules define what constitutes a:
- well-formed statement
- assertion
- query
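The idea of a grammar with formal constraints can be illustrated with a toy check on subject-predicate-object statements. This is not any real ontology language. The classes, the `owns` property, and its domain and range are all invented for the example.

```python
# Which class each individual belongs to (invented sample data).
INSTANCE_OF = {"Alice": "Person", "Rex": "Dog"}

# Each predicate constrains the class of its subject (domain)
# and the class of its object (range).
PROPERTIES = {"owns": ("Person", "Dog")}


def is_well_formed(subject: str, predicate: str, obj: str) -> bool:
    """Check a statement against the vocabulary and its constraints."""
    if predicate not in PROPERTIES:
        return False  # predicate is not part of the vocabulary
    domain, range_ = PROPERTIES[predicate]
    return INSTANCE_OF.get(subject) == domain and INSTANCE_OF.get(obj) == range_
```

Under these rules "Alice owns Rex" is well formed, while "Rex owns Alice" uses known terms but violates the constraints, and "Alice likes Rex" fails because `likes` is not in the vocabulary.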
36Commonalities of Controlled Vocabulary,
Classification, Taxonomy, Thesaurus, Ontology
- Standardized naming conventions
- Highly structured
- Generally hierarchical
- Highly specific application
- Highly specific audience
- Prescriptive, rather than descriptive
- Low adaptability
- Enables more efficient indexing and searching
37http://magia3e.wordpress.com/2007/02/04/information-classification-part-ii-taxonomy/
38Creating a Controlled Vocabulary
- What is this content object?
- How can I describe it?
- What distinguishes it from other content objects?
- How can I make it findable?
39Creating a Top-Down Classification
- Define the site's target audience.
- List all topics, actions, concepts, theories, functions, roles, and other aspects of the site's projected content.
- Categorize the items according to functional-topical splits: verb or noun, action or actor, entity or relationship.
- Analyze the categories from a user perspective and build a tentative structure.
- Develop a unifying metaphor to incorporate into the design, expressing the relationships in a holistic, symbolic way.
- User test the site structure and labeling.
Louise Gruenberg
40Consideration Specificity
http://www.zappos.com/welcome.zhtml
41Consideration Stability
1998, 2000, 2002, 2004, 2006, 2007
42Faceted System Bottom-up Approach
- Focuses on the important, essential or persistent characteristics of content objects.
- Useful for fine-grained, rapidly changing repositories.
- Easy to add a new facet at any time.
- Example: a facet map file, http://facetmap.com/conf/wine.txt
43Other Considerations
- Implicit information?
- Classification of objects over fuzzy boundaries.
44Faceted Classification
- Bottom-Up approach
- Central concept: How do I describe this?
- Oct 2003: 69% of sites made at least some use of faceted classification.
45Faceted Classification is not new!
- S. R. Ranganathan (1892-1972)
- Facets are clearly defined, mutually exclusive, and collectively exhaustive aspects, properties or characteristics of a class or specific subject.
- Describing documents from various perspectives
- Prolegomena to Library Classification (1967): definitive resource on faceted classification
46Potential Facets
- Topic
- Product
- Document Type Format?
- technical report
- white paper
- news article
- Source Creator?
- Intended Audience
- Geography
- Price
Peter Morville
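Facets like these can be sketched as independent attributes on each content object, with search expressed as a combination of facet values. The items, field names, and values below are invented for illustration.

```python
from collections import Counter

# Sample content objects described by independent facets (invented data).
ITEMS = [
    {"title": "Cellar basics", "document_type": "white paper", "geography": "France", "audience": "retailers"},
    {"title": "Harvest report", "document_type": "news article", "geography": "France", "audience": "consumers"},
    {"title": "Cork testing", "document_type": "technical report", "geography": "Portugal", "audience": "retailers"},
]


def filter_by_facets(items: list[dict], **selected: str) -> list[dict]:
    """Keep the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items: list[dict], facet: str) -> Counter:
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)
```

Since each facet is just another key, a new one (say, price) can be added to some items without restructuring the rest, and `facet_counts` can feed the counts an interface shows beside each facet value.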
47Usability is Key
- Don't have to predict what facets the users will find most intuitive.
- MUST create an intuitive interface, so that the user, with a minimum amount of effort, can use the facets to search the site.
Kathryn La Barre
48Principles for Choice of Facets
- Principle of Differentiation
- Principle of Relevance
- Principle of Ascertainability
- Principle of Permanence
- Principle of Homogeneity
- Principle of Mutual Exclusivity
- Principle of Fundamental Categories
Louise Spiteri
49Creating a Faceted Classification
- Content analysis
- Gather a representative sample of the site's content.
- Adopt a Noah's Ark approach: try to capture a couple of each type of animal.
Peter Morville
50Resources
- Introduction to Dewey Decimal Classification system from OCLC
- BISAC Subject Headings 2006 Edition, September 2006
- Everything is Miscellaneous blog by David Weinberger
- What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model? by Woody Pidcock
- Building a Synonymous Search Index by Peter Morville
- Creating a Controlled Vocabulary by Leise, Fast and Steckel
- Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies Z39.50 Standard Document (pdf) from NISO
- A Simplified Model for Facet Analysis by Dr. Louise Spiteri