Title: Standards for Controlled Vocabularies
1 Standards for Controlled Vocabularies
- I. IFLA Guidelines - 2005
- II. U.S. Standard (NISO Z39.19 - 2005)
- III. British Standards (BS 8723 - 2005)
Marcia Lei Zeng for IFLA 2006 Classification and
Indexing Section program, Seoul, Korea
2 I. IFLA Guidelines for Multilingual Thesauri
- IFLA Classification and Indexing Section
- April 2005: released for comments
  http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf
3 IFLA Classification and Indexing Section WG on
Guidelines for Multilingual Thesauri
- Chair: Gerhard J.A. Riesthuis (Netherlands)
- Members
- Lois Mai Chan (USA),
- Patrice Landry (Switzerland),
- Pia Leth (Sweden),
- Ia McIlwaine (United Kingdom),
- Martin Kunz (Germany),
- Dorothy McGarry (USA),
- Max Naudi (France),
- Marcia Lei Zeng (USA)
4 Three approaches in the development of multilingual thesauri
- building a new thesaurus from the bottom up
  - starting with one language and adding another language or languages
  - starting with more than one language simultaneously
- combining existing thesauri
  - merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
  - linking existing thesauri and subject heading languages to each other, using the existing thesauri and/or subject heading languages both in indexing and retrieval
- translating a thesaurus into one or more other languages
5 Semantic structure of multilingual thesauri (1)
- symmetrical
  - all different language versions of a multilingual thesaurus have to be identical
  - each descriptor must have one and only one equivalent in every language and be related in the same way to other descriptors in the given language
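The symmetry requirement can be read as a consistency check over the language versions. The following is a minimal sketch, not taken from the IFLA guidelines: the concept identifiers, the `labels` and `broader` tables, and the function `is_symmetrical` are hypothetical names chosen for illustration.

```python
# Hypothetical data: each concept has one descriptor per language,
# and each language version records its own broader-term links.
LANGUAGES = ["en", "fr"]

labels = {
    "c1": {"en": "Cereals", "fr": "Céréales"},
    "c2": {"en": "Rice", "fr": "Riz"},
}

# broader[lang] maps a concept id to the id of its broader concept.
broader = {
    "en": {"c2": "c1"},
    "fr": {"c2": "c1"},
}


def is_symmetrical(labels, broader, languages):
    """Return True if every concept has exactly one descriptor in every
    language and every language version has the same hierarchy."""
    for by_lang in labels.values():
        if set(by_lang) != set(languages):
            return False  # a descriptor is missing (or extra) in some language
    # Within one language, no descriptor may stand for two concepts.
    for lang in languages:
        terms = [by_lang[lang] for by_lang in labels.values()]
        if len(terms) != len(set(terms)):
            return False
    # The relations must be the same in every language version.
    first = broader[languages[0]]
    return all(broader[lang] == first for lang in languages)


print(is_symmetrical(labels, broader, LANGUAGES))
```

Removing the French broader-term link, or leaving a concept without a French descriptor, would make the check fail, which is the situation a non-symmetrical thesaurus allows.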
6 Example: a symmetrical thesaurus (latest version's interface)
http://www.fao.org/agrovoc/
7 Semantic structure of multilingual thesauri (2)
- non-identical and non-symmetrical structure
  - the number of descriptors in each language is not necessarily the same
  - the way descriptors are related to each other can be different for the different languages
8 Example: a non-symmetrical thesaurus
HEREIN thesaurus interlingua
http://www.european-heritage.net/sdx/herein/thesaurus/introduction.xsp
10 Each exists in its own language and structure.
(PDF version)
12 Semantic problems
- Semantic problems pertain to equivalence relations between terms used as preferred and non-preferred terms in information retrieval languages.
- Equivalence relations exist not only within each separate language involved, but also between the languages (intra-language equivalence and inter-language equivalence).
- Intra-language homonymy and inter-language homonymy are also considered semantic questions.
- Additional problems pertaining to semantics involve the scope, form and choice of thesaurus terms.
13 Examples of homographs in multiple languages
That "cranes" is a homograph in English does not necessarily mean that the equivalent terms in other languages are also homographs. The Dutch term "kranen" is a homograph too, but with the meanings cranes (lifting equipment) and taps.
14 Structural problems
- Structural problems involve hierarchical and associative relations between the terms.
- An important question in this respect is whether the structure should be the same or different for each language.
- In most, if not all, cases of linking, the structure will most likely not be the same in all the information retrieval languages involved.
15 Contents covered by the guidelines
- Building multilingual thesauri starting from scratch
  - Structure
  - Morphology and Semantics
- Starting from existing thesauri
  - Merging
  - Linking
- Glossary
- Appendix
  - An example of a non-symmetrical thesaurus
16 II. U.S. Standard for Controlled Vocabularies: NISO Z39.19
- NISO Z39.19-2005: Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
- Some of the slides are based on
  - Emily Fayen's June 2004 SLA presentation, Margie Hlava's talk at the 2005 Data Harmony User Group meeting, and Marcia Zeng's talk at the NKOS Meeting in Denver, 2005
17 A little bit of history
- ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri, 1993
  - The most frequently requested NISO Standard
  - In spite of its age the Standard is still relevant
- 1999: NISO Workshop on Electronic Thesauri
  http://www.niso.org/news/events_workshop/thes99rpt.html
- 2002: NISO initiates revision of Z39.19
- 2004: Z39.19-1993 reaffirmed
- 2005: New standard Z39.19-2005 published
18 Scope
- Expand beyond thesaurus
- Make more user-friendly
  - Explain important concepts
  - Explain principles of vocabulary control
- Include electronic information environment
- Include additional user search methods
  - Browse
  - Navigate
  - Keyword searching
- Expand beyond A&I (abstracting and indexing) services
- Include Web applications
19 The Team
- Emily Gallup Fayen, Project Leader -- MuseGlobal, Inc.
- Vivian Bliss -- Microsoft
- Carol Brent -- ProQuest
- John Dickert -- DTIC
- Lynn El-Hoshy -- Library of Congress
- Marjorie Hlava -- Access Innovations
- Stephen Hearn -- ALA
- Sabine Kuhn -- Chemical Abstracts Service
- Pat Kuhr -- H.W. Wilson Company
- Diane McKerlie -- DMA Consulting
- Peter Morville -- Semantic Studios
- Stuart Nelson -- National Library of Medicine
- Allan Savage -- National Library of Medicine
- Diane Vizine-Goetz -- OCLC
- Marcia Lei Zeng -- Special Libraries Association
20 Z39.19 Chapters
- Content
  - 1 Introduction
  - 2 Scope
  - 3 Referenced Standards
  - 4 Definitions, Abbreviations, and Acronyms
  - 5 Controlled Vocabularies: Purpose, Concepts, Principles, and Structure
  - 6 Term Choice, Scope, and Form
  - 7 Compound Terms
  - 8 Relationships
  - 9 Displaying Controlled Vocabularies
  - 10 Interoperability
  - 11 Construction, Testing, Maintenance, and Management Systems
21 What's new?
1993 edition
- Coverage
  - documents
- Types of vocabularies
  - Thesauri
- Post-coordinated
- Printed formats
- Monolingual vocabularies
Added
- Coverage
  - Content objects
- Types of vocabularies
  - lists, synonym rings, taxonomy
- Pre-coordinated
- Web format
- Multilingual vocabularies (general)
- Interoperability
- Facet analysis
22 Types of vocabulary control -- based on the
important principles
23 Lists
- A list is a simple group of terms
- Example
  - Alabama
  - Alaska
  - Arkansas
  - California
  - Colorado
  - . . . .
- Frequently used in Web site pick lists and pull-down menus
24 Source: The J. Paul Getty Museum's implementation
of The Museum System software by Gallery Systems
25 Synonym Rings
- A synonym ring is a list of synonyms or near
synonyms that are used interchangeably for
retrieval purposes
26 Synonym Rings -- Examples
- Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms.
- e.g., cholesterol
  - Cholesterol
  - Blood Cholesterol
  - Serum Cholesterol
  - Good Cholesterol
  - Bad Cholesterol
  - LDL
  - . . . .
- Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled
27 An example from International SEMATECH: a search for "Silicon" would look like this
Your search was submitted as: SILICON or SI
28 Synonym Rings are used --
- to expand queries for content objects.
- in systems where the underlying content objects are left in their unstructured natural language format.
- in conjunction with search engines and provide a minimal amount of control of the diversity of the language found in the texts of the underlying documents.
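Query expansion with a synonym ring can be sketched in a few lines. This is an illustrative sketch only, not the SEMATECH implementation: the ring contents, `SYNONYM_RINGS`, `expand_query` and `to_boolean_query` are hypothetical names, and the cholesterol ring is abbreviated.

```python
# Hypothetical synonym rings; each ring is a set of interchangeable terms.
SYNONYM_RINGS = [
    {"silicon", "si"},
    {"cholesterol", "blood cholesterol", "serum cholesterol"},
]


def expand_query(term, rings=SYNONYM_RINGS):
    """Return the search term together with every other member of its ring."""
    key = term.strip().lower()
    for ring in rings:
        if key in ring:
            # The user's term comes first, then the other members alphabetically.
            return [key] + sorted(ring - {key})
    return [key]  # no ring found: search the term as entered


def to_boolean_query(term):
    """Render the expanded terms as an OR query, upper-cased for display."""
    return " or ".join(t.upper() for t in expand_query(term))


print(to_boolean_query("Silicon"))  # SILICON or SI
```

No term in a ring is preferred over the others, so the lookup works the same way whichever member the user types.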
29 Taxonomies
- A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy
- Example
  - Chemistry
    - Organic chemistry
      - Polymer chemistry
        - Nylon
- Frequently used in web navigation systems
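A taxonomy of this kind can be stored as parent links, which is enough to build the breadcrumb trail a web navigation system would show. This is a minimal sketch assuming each term in the example is the parent of the next; `PARENT`, `breadcrumb` and `narrower` are hypothetical names.

```python
# Parent links for the example, assuming each term is the parent of the next.
PARENT = {
    "Organic chemistry": "Chemistry",
    "Polymer chemistry": "Organic chemistry",
    "Nylon": "Polymer chemistry",
}


def breadcrumb(term, parent=PARENT):
    """Return the path from the top term down to `term`."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


def narrower(term, parent=PARENT):
    """Return the direct children of `term`, sorted alphabetically."""
    return sorted(child for child, p in parent.items() if p == term)


print(" > ".join(breadcrumb("Nylon")))
```

A single parent per term models a strict hierarchy only; a polyhierarchy would need a list of parents for each term and would yield more than one path.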
30 Thesauri
- A thesaurus is a controlled vocabulary with multiple types of relationships
- Example
  - Rice
    - UF paddy
    - BT Cereals
    - BT Plant products
    - NT Brown rice
    - RT Rice straw
31 Thesauri (cont.)
- Relationship types
  - Equivalence (Use/Used For)
    - indicates preferred term in a synonym relationship
  - Hierarchy (NT/BT)
    - indicates broader and narrower terms
  - Associative (RT/RT)
    - almost unlimited types of relationships may be used - related
- It is the most complex format for controlled vocabularies and is widely used.
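The three relationship types are reciprocal: a BT on one term implies an NT on the other, RT works in both directions, and UF pairs with USE. The sketch below stores the Rice record that way. It is an illustration under stated assumptions, not a structure prescribed by Z39.19; the `Thesaurus` class and its method names are hypothetical.

```python
from collections import defaultdict

# Reciprocal of each relationship code: BT <-> NT, RT <-> RT, UF <-> USE.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


class Thesaurus:
    def __init__(self):
        # term -> relationship code -> set of target terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, term, code, target):
        """Record a relationship and its reciprocal in one step."""
        self.relations[term][code].add(target)
        self.relations[target][RECIPROCAL[code]].add(term)

    def preferred(self, term):
        """Follow a USE reference from a non-preferred term, if there is one."""
        use = self.relations[term]["USE"]
        return next(iter(use)) if use else term

    def display(self, term):
        """Return the term's record as lines, one relationship per line."""
        lines = [term]
        for code in ("UF", "BT", "NT", "RT"):
            for target in sorted(self.relations[term][code]):
                lines.append(f"  {code} {target}")
        return lines


t = Thesaurus()
t.add("Rice", "UF", "paddy")
t.add("Rice", "BT", "Cereals")
t.add("Rice", "BT", "Plant products")
t.add("Rice", "NT", "Brown rice")
t.add("Rice", "RT", "Rice straw")

print("\n".join(t.display("Rice")))
print(t.preferred("paddy"))  # Rice
```

Writing both directions in `add` keeps the record for Cereals (NT Rice) consistent with the record for Rice (BT Cereals) without a separate validation pass.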
32 Interoperability
- One of the most important issues from the 1999 workshop
- Question: How to
  - compare indexes?
  - perform searches?
  - merge databases that have been developed using different controlled vocabularies?
33 Interoperability (cont.)
- Factors Affecting Interoperability
- Multilingual Controlled Vocabularies
- Searching
- Indexing
- Merging Databases
- Merging Controlled Vocabularies
- Achieving Interoperability
- Storage and Maintenance of Relationships among
Terms in Multiple Controlled Vocabularies
34 III. The British Standard
- BS 8723: Structured Vocabularies for Information Retrieval - Guide
- Slides based on the presentation by
  - Stella G Dextre Clarke, Alan Gilchrist, Leonard Will
  - In ISKO 2004, London
35 Existing BSI/ISO thesaurus standards
- ISO 2788-1986 Guidelines for the establishment and development of monolingual thesauri
  - BS 5723:1987
- ISO 5964-1985 Guidelines for the establishment and development of multilingual thesauri
  - BS 6723:1985
36 What needs updating?
- Printed versus electronic application
- Guidance on management software
- Interoperability
  - Mapping between thesauri and other types of vocabulary
  - Formats/protocols for data exchange with downstream applications
- Applicability to end-user applications, not just to those for information professionals
37 BS 8723: Structured Vocabularies for Information Retrieval - Guide
- Part 1 - Definitions, symbols and abbreviations
- Part 2 - Thesauri
  Parts 1 and 2 correspond to ISO 2788 and they supersede BS 5723.
- Part 3 - Vocabularies other than thesauri
- Part 4 - Interoperability between vocabularies
  Part 4 will supersede BS 6723, which is equivalent to ISO 5964.
- Part 5 - Interoperation between vocabularies and other components of information storage and retrieval systems
38
- Part 1: Definitions, symbols and abbreviations
  - provides definitions, symbols, abbreviations and other conventions applying to all the parts.
- Part 2: Thesauri
  - is designed for situations in which human indexers analyse documents and express their subjects using thesaurus terms, before searchers retrieve the documents with the same vocabulary.
39 Part 3: Vocabularies other than thesauri
- Now in writing
  - Classification schemes
  - Business classification schemes for records management
  - Taxonomies
  - Subject heading schemes
  - Ontologies
- Planned
  - Classification schemes
  - Taxonomies
  - Subject heading lists
  - Ontologies
  - Semantic nets
  - Search thesauri
40 Part 4: Interoperability between vocabularies
- Huge demand for accessing information that has been indexed with another language and/or vocabulary. The buzzword is "Mapping".
- Part 4 includes multilingual thesauri as a special case of mapping between vocabularies.
- Part 4 applies to situations in which more than one language or vocabulary is in use, but access to all resources is needed through the one vocabulary chosen by the user.
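The situation Part 4 addresses, one vocabulary chosen by the user and several collections indexed with other vocabularies, can be sketched as a lookup through a mapping table. Everything here is hypothetical and for illustration only: the vocabulary names, the document identifiers, `MAPPINGS`, `COLLECTIONS` and `search_all` are not taken from BS 8723.

```python
# Hypothetical mapping table: a term in the user's vocabulary mapped to its
# equivalents in each of the other vocabularies.
MAPPINGS = {
    "Rice": {
        "vocab_fr": ["Riz"],
        "vocab_subject_headings": ["Rice", "Paddy rice"],
    },
}

# Hypothetical collections, each indexed with its own vocabulary:
# vocabulary name -> index term -> document identifiers.
COLLECTIONS = {
    "vocab_fr": {"Riz": ["doc-fr-1"]},
    "vocab_subject_headings": {
        "Rice": ["doc-sh-1"],
        "Paddy rice": ["doc-sh-2"],
    },
}


def search_all(term):
    """Search every collection using the user's own vocabulary term."""
    hits = []
    for vocab, index in COLLECTIONS.items():
        # Translate the user's term into this collection's vocabulary.
        for mapped in MAPPINGS.get(term, {}).get(vocab, []):
            hits.extend(index.get(mapped, []))
    return hits


print(search_all("Rice"))
```

A multilingual thesaurus fits the same table as a special case: the "other vocabulary" is simply another language version, as in the `vocab_fr` entry.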
41 Part 4 (cont.)
- BS 8723 part 4 has a wider scope than BS 6723, which was concerned only with multilingual thesauri.
- It covers all of the previous ground and extends the scope to
  - thesauri in different dialects of one language
  - different thesauri in a single language
  - situations where a thesaurus interoperates with one or more different types of structured vocabulary, such as classification schemes
  - situations where not all the interoperating vocabularies have the same status and/or function.
42 Part 5: Interoperability with applications
- Vocabularies must work with
  - Search engines
  - Content Management Systems
  - Web publishing software, etc.
- Part 5 sets out the protocols and formats needed for the exchange of vocabulary data.
43 Standards are available at
- IFLA Guidelines for Multilingual Thesauri
  - http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf
- US Standard -- NISO Z39.19
  - http://www.niso.org/standards/standard_detail.cfm?std_id=814
  - Tutorial: http://www.slis.kent.edu/mzeng/Z3919/index.html
- British Standard -- BS 8723
  - These documents may be ordered from BSI Customer Services
  - tel: +44 (0)208-996-9001 or
  - email: orders@bsi-global.com
44 http://www.slis.kent.edu/mzeng/Z3919/index.html