Standards for Controlled Vocabularies - PowerPoint PPT Presentation

Provided by: Emily122
1
Standards for Controlled Vocabularies
  • I. IFLA Guidelines - 2005
  • II. U.S. Standard (NISO Z39.19 - 2005)
  • III. British Standards (BS 8723: 2005)

Marcia Lei Zeng, for the IFLA 2006 Classification and
Indexing Section program, Seoul, Korea
2
I. IFLA Guidelines for Multilingual Thesauri
  • IFLA Classification and Indexing Section
  • April 2005: released for comments
    http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf

3
IFLA Classification and Indexing Section WG on
Guidelines for Multilingual Thesauri
  • Chair: Gerhard J.A. Riesthuis (Netherlands)
  • Members:
    • Lois Mai Chan (USA)
    • Patrice Landry (Switzerland)
    • Pia Leth (Sweden)
    • Ia McIlwaine (United Kingdom)
    • Martin Kunz (Germany)
    • Dorothy McGarry (USA)
    • Max Naudi (France)
    • Marcia Lei Zeng (USA)

4
Three approaches in the development of
multilingual thesauri
  • building a new thesaurus from the bottom up
    • starting with one language and adding another
      language or languages
    • starting with more than one language
      simultaneously
  • combining existing thesauri
    • merging two or more existing thesauri into one
      new (multilingual) information retrieval language
      to be used in indexing and retrieval
    • linking existing thesauri and subject heading
      languages to each other, using the existing
      thesauri and/or subject heading languages both in
      indexing and retrieval
  • translating a thesaurus into one or more other
    languages

5
Semantic structure of multilingual thesauri (1)
  • symmetrical
    • all different language versions of a multilingual
      thesaurus have to be identical
    • each descriptor must have one and only one
      equivalent in every language and be related in
      the same way to other descriptors in the given
      language
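
The symmetry requirement can be illustrated with a small check. This is only a sketch under assumptions: the concept IDs, the labels, and the function name `is_symmetrical` are hypothetical and do not come from the IFLA guidelines.

```python
# Hypothetical toy data: each concept ID carries one label per language,
# and broader-term (BT) links are stored separately for each language.
labels = {
    "c1": {"en": "cereals", "fr": "céréales"},
    "c2": {"en": "rice", "fr": "riz"},
}
broader = {
    "en": {"c2": {"c1"}},
    "fr": {"c2": {"c1"}},
}


def is_symmetrical(labels, broader, languages):
    """Return True if every concept has a label in every language and
    the broader-term links are identical across all languages."""
    for by_language in labels.values():
        if set(by_language) != set(languages):
            return False  # a descriptor lacks an equivalent somewhere
    reference = broader[languages[0]]
    return all(broader[lang] == reference for lang in languages[1:])


print(is_symmetrical(labels, broader, ["en", "fr"]))
```

A non-symmetrical thesaurus would fail this check, either because a concept has no label in one language or because the hierarchies differ.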

6
Example: a symmetrical thesaurus (last version's
interface)
http://www.fao.org/agrovoc/
7
Semantic structure of multilingual thesauri (2)
  • non-identical and non-symmetrical structure
    • the number of descriptors in each language is not
      necessarily the same
    • the way descriptors are related to each other
      can be different for the different languages
8
Example: a non-symmetrical thesaurus
HEREIN thesaurus interlingua
http://www.european-heritage.net/sdx/herein/thesaurus/introduction.xsp
10
Each exists in its own language and structure.
(PDF version)
11
Each exists in its own language and structure.
(PDF version)
12
Semantic problems
  • Semantic problems pertain to equivalence
    relations between terms used as preferred and
    non-preferred terms in information retrieval
    languages.
  • Equivalence relations exist not only within each
    separate language involved, but also between the
    languages (intra-language equivalence and
    inter-language equivalence).
  • Intra-language homonymy and inter-language
    homonymy are also considered semantic questions.
  • Additional problems pertaining to semantics
    involve the scope, form and choice of thesaurus
    terms.

13
Examples of homographs in multiple languages
The fact that cranes is a homograph in English (birds and
lifting equipment) does not necessarily mean that the
equivalent terms in other languages are also homographs.
The Dutch term kranen is a homograph too, but with the
meanings cranes (lifting equipment) and taps.
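
The cranes/kranen case can be modelled by recording the senses of each term and intersecting them. This is a minimal sketch: the `senses` table and both helper functions are hypothetical illustrations, not part of the guidelines.

```python
# Senses of each (language, term) pair; more than one sense = homograph.
senses = {
    ("en", "cranes"): {"birds", "lifting equipment"},
    ("nl", "kranen"): {"lifting equipment", "taps"},
}


def is_homograph(language, term):
    """A term is a homograph if it carries more than one sense."""
    return len(senses[(language, term)]) > 1


def shared_senses(a, b):
    """Senses for which two terms are inter-language equivalents."""
    return senses[a] & senses[b]


print(shared_senses(("en", "cranes"), ("nl", "kranen")))
```

Both terms are homographs, yet they are equivalent in only one sense, so each sense needs its own descriptor (for example with a qualifier) before equivalences are established.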
14
Structural problems
  • Structural problems involve hierarchical and
    associative relations between the terms.
  • An important question in this respect is whether
    the structure should be the same or different for
    each language.
  • In most, if not all, cases of linking, the
    structure will most likely not be the same in all
    the information retrieval languages involved.

15
Contents covered by the guidelines
  • Building multilingual thesauri starting from
    scratch
    • Structure
    • Morphology and Semantics
  • Starting from existing thesauri
    • Merging
    • Linking
  • Glossary
  • Appendix
    • An example of a non-symmetrical thesaurus

16
II. U.S. Standard for Controlled Vocabularies
NISO Z39.19
  • NISO Z39.19-2005: Guidelines for the Construction,
    Format, and Management of Monolingual Controlled
    Vocabularies
  • Some of the slides are based on
    • Emily Fayen's June 2004 SLA presentation, Margie
      Hlava's talk at the 2005 Data Harmony User Group
      meeting, and Marcia Zeng's presentation at the
      NKOS Meeting in Denver, 2005

17
A little bit of history
  • ANSI/NISO Z39.19, Guidelines for the Construction,
    Format, and Management of Monolingual Thesauri,
    1993
    • The most frequently requested NISO Standard
    • In spite of its age the Standard is still
      relevant
  • 1999: NISO Workshop on Electronic Thesauri
    http://www.niso.org/news/events_workshop/thes99rpt.html
  • 2002: NISO initiates revision of Z39.19
  • 2004: Z39.19-1993 reaffirmed
  • 2005: New standard Z39.19-2005 published

18
Scope
  • Expand beyond thesaurus
  • Make more user-friendly
  • Explain important concepts
  • Explain principles of vocabulary control
  • Include electronic information environment
  • Include additional user search methods
    • Browse
    • Navigate
    • Keyword searching
  • Expand beyond A&I services
  • Include Web applications

19
The Team
  • Emily Gallup Fayen, Project Leader -- MuseGlobal,
    Inc.
  • Vivian Bliss -- Microsoft
  • Carol Brent -- ProQuest
  • John Dickert -- DTIC
  • Lynn El-Hoshy -- Library of Congress
  • Marjorie Hlava -- Access Innovations
  • Stephen Hearn -- ALA
  • Sabine Kuhn -- Chemical Abstracts Service
  • Pat Kuhr -- H.W. Wilson Company
  • Diane McKerlie -- DMA Consulting
  • Peter Morville -- Semantic Studios
  • Stuart Nelson -- National Library of Medicine
  • Allan Savage -- National Library of Medicine
  • Diane Vizine-Goetz -- OCLC
  • Marcia Lei Zeng -- Special Libraries Association

20
Z39.19 Chapters
  • Content
    • 1 Introduction
    • 2 Scope
    • 3 Referenced Standards
    • 4 Definitions, Abbreviations, and Acronyms
    • 5 Controlled Vocabularies: Purpose, Concepts,
      Principles, and Structure
    • 6 Term Choice, Scope, and Form
    • 7 Compound Terms
    • 8 Relationships
    • 9 Displaying Controlled Vocabularies
    • 10 Interoperability
    • 11 Construction, Testing, Maintenance, and
      Management Systems

21
What's new?
Previously covered
  • Coverage
    • documents
  • Types of vocabularies
    • Thesauri
  • Post-coordinated
  • Printed formats
  • Monolingual vocabularies
Added
  • Coverage
    • Content objects
  • Types of vocabularies
    • lists, synonym rings, taxonomy
  • Pre-coordinated
  • Web format
  • Multilingual vocabularies (general)
  • Interoperability
  • Facet analysis

22
Types of vocabulary control -- based on the
important principles
23
Lists
  • A list is a simple group of terms
  • Example
    • Alabama
    • Alaska
    • Arkansas
    • California
    • Colorado
    • . . . .
  • Frequently used in Web site pick lists and
    pull-down menus
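
A pick list of this kind amounts to validating input against a fixed set of terms. The sketch below assumes a hypothetical `validate` helper and uses only the terms shown on the slide.

```python
# The controlled list, truncated exactly as on the slide.
STATES = ["Alabama", "Alaska", "Arkansas", "California", "Colorado"]


def validate(value, allowed=STATES):
    """Return the canonical list term matching the input
    (ignoring case and surrounding spaces), or None if absent."""
    lookup = {term.casefold(): term for term in allowed}
    return lookup.get(value.strip().casefold())


print(validate(" alaska "))
print(validate("Texas"))
```

Anything not on the list is rejected, which is all the control a simple list provides: no synonyms, no hierarchy.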

24
Source: The J. Paul Getty Museum's implementation
of The Museum System software by Gallery Systems
25
Synonym Rings
  • A synonym ring is a list of synonyms or near
    synonyms that are used interchangeably for
    retrieval purposes

26
Synonym Rings -- Examples
  • Synonym rings are usually found as sets of lists
    that allow users to access all content containing
    any of the terms.
  • e.g., cholesterol
    • Cholesterol
    • Blood Cholesterol
    • Serum Cholesterol
    • Good Cholesterol
    • Bad Cholesterol
    • LDL
    • . . . .
  • Frequently used in systems where the content
    is not indexed or the indexing vocabulary is not
    controlled
27
An example from International SEMATECH: a
search for Silicon would look like this:
Your search was submitted as: SILICON or SI
28
Synonym Rings are used--
  • to expand queries for content objects.
  • in systems where the underlying content objects
    are left in their unstructured natural language
    format.
  • in conjunction with search engines, providing a
    minimal amount of control over the diversity of
    the language found in the texts of the underlying
    documents.
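
Query expansion with a synonym ring can be sketched as follows. The ring contents are taken from the slides, but the `expand` function and its output format are assumptions modelled on the SEMATECH message, not that system's actual code.

```python
# Each ring is a set of interchangeable terms, stored case-folded.
RINGS = [
    {"silicon", "si"},
    {"cholesterol", "blood cholesterol", "serum cholesterol", "ldl"},
]


def expand(term, rings=RINGS):
    """Rewrite a one-term query as an OR over its whole ring,
    listing the term the user typed first."""
    key = term.casefold()
    for ring in rings:
        if key in ring:
            others = sorted(ring - {key})
            return " or ".join(t.upper() for t in [key, *others])
    return key.upper()  # no ring found: search the term as typed


print(expand("Silicon"))
```

No term in a ring is preferred over the others, which is what distinguishes a synonym ring from the equivalence relationship in a thesaurus.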

29
Taxonomies
  • A taxonomy is a set of preferred terms, all
    connected by a hierarchy or polyhierarchy
  • Example
    • Chemistry
      • Organic chemistry
        • Polymer chemistry
          • Nylon
  • Frequently used in web navigation systems
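
The chemistry example can be held as a child-to-parent map, from which a navigation path (such as a breadcrumb trail) follows directly. This sketch assumes a strict hierarchy with one parent per term; a polyhierarchy would need a set of parents per term. The name `path_to_root` is hypothetical.

```python
# Child -> parent links for the example hierarchy on the slide.
PARENT = {
    "Organic chemistry": "Chemistry",
    "Polymer chemistry": "Organic chemistry",
    "Nylon": "Polymer chemistry",
}


def path_to_root(term, parent=PARENT):
    """Return the terms from the top of the hierarchy down to `term`."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


print(" > ".join(path_to_root("Nylon")))
```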

30
Thesauri
  • A thesaurus is a controlled vocabulary with
    multiple types of relationships
  • Example
    • Rice
      • UF paddy
      • BT Cereals
      • BT Plant products
      • NT Brown rice
      • RT Rice straw

31
Thesauri (cont.)
  • Relationship types
    • Equivalence (Use/Used For)
      • indicates preferred term in a synonym
        relationship
    • Hierarchy (NT/BT)
      • indicates broader and narrower terms
    • Associative (RT/RT)
      • almost unlimited types of relationships may be
        used - related
  • It is the most complex format for controlled
    vocabularies and is widely used.
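
The three relationship types are reciprocal: every BT implies an NT in the other direction, UF pairs with USE, and RT is its own inverse. The sketch below builds the Rice record from the earlier slide and derives the reciprocal entries. The triple format and the `build` function are assumptions for illustration, not a format prescribed by Z39.19.

```python
from collections import defaultdict

# Each relationship tag and the tag recorded on the target term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


def build(entries):
    """Build a thesaurus from (term, relation, target) triples,
    adding the reciprocal relationship for every entry."""
    thesaurus = defaultdict(lambda: defaultdict(set))
    for term, relation, target in entries:
        thesaurus[term][relation].add(target)
        thesaurus[target][RECIPROCAL[relation]].add(term)
    return thesaurus


rice_entries = [
    ("Rice", "UF", "paddy"),
    ("Rice", "BT", "Cereals"),
    ("Rice", "BT", "Plant products"),
    ("Rice", "NT", "Brown rice"),
    ("Rice", "RT", "Rice straw"),
]
thesaurus = build(rice_entries)
print(dict(thesaurus["paddy"]))  # the non-preferred term points to Rice
```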

32
Interoperability
  • One of the most important issues from the 1999
    workshop
  • Question: How to
    • compare indexes?
    • perform searches?
    • merge databases that have been developed using
      different controlled vocabularies?

33
Interoperability (cont.)
  • Factors Affecting Interoperability
  • Multilingual Controlled Vocabularies
  • Searching
  • Indexing
  • Merging Databases
  • Merging Controlled Vocabularies
  • Achieving Interoperability
  • Storage and Maintenance of Relationships among
    Terms in Multiple Controlled Vocabularies

34
III. The British Standard
  • BS 8723: Structured Vocabularies for Information
    Retrieval - Guide
  • Slides based on the presentation by
    • Stella G Dextre Clarke, Alan Gilchrist, Leonard
      Will
    • at ISKO 2004, London

35
Existing BSI/ISO thesaurus standards
  • ISO 2788-1986: Guidelines for the establishment
    and development of monolingual thesauri
    • BS 5723:1987
  • ISO 5964-1985: Guidelines for the establishment
    and development of multilingual thesauri
    • BS 6723:1985

36
What needs updating?
  • Printed versus electronic application
  • Guidance on management software
  • Interoperability
    • Mapping between thesauri and other types of
      vocabulary
    • Formats/protocols for data exchange with
      downstream applications
  • Applicability to end-user applications, not just
    to those for information professionals

37
BS 8723: Structured Vocabularies for
Information Retrieval - Guide
  • Part 1 - Definitions, symbols and abbreviations
  • Part 2 - Thesauri
    • Parts 1 and 2 correspond to ISO 2788 and
      supersede BS 5723.
  • Part 3 - Vocabularies other than thesauri
  • Part 4 - Interoperability between vocabularies
    • Part 4 will supersede BS 6723, which is
      equivalent to ISO 5964.
  • Part 5 - Interoperation between vocabularies and
    other components of information storage and
    retrieval systems

38
  • Part 1 - Definitions, symbols and abbreviations
    • provides definitions, symbols, abbreviations and
      other conventions applying to all the parts.
  • Part 2 - Thesauri
    • is designed for situations in which human
      indexers analyse documents and express their
      subjects using thesaurus terms, before searchers
      retrieve the documents with the same vocabulary.

39
Part 3 - Vocabularies other than thesauri
  • Now being written
    • Classification schemes
    • Business classification schemes for records
      management
    • Taxonomies
    • Subject heading schemes
    • Ontologies
  • Planned
    • Classification schemes
    • Taxonomies
    • Subject heading lists
    • Ontologies
    • Semantic nets
    • Search thesauri

40
Part 4 - Interoperability between vocabularies
  • Huge demand for accessing information that has
    been indexed with another language and/or
    vocabulary. The buzzword is Mapping.
  • Part 4 includes multilingual thesauri as a
    special case of mapping between vocabularies.
  • Part 4 applies to situations in which more than
    one language or vocabulary is in use, but access
    to all resources is needed through the one
    vocabulary chosen by the user.
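
The situation Part 4 describes, searching everything through the one vocabulary the user has chosen, can be sketched with a mapping table. All terms, document IDs, and names here are hypothetical, and BS 8723 does not prescribe this data layout.

```python
# Documents indexed with the user's vocabulary (A) and another one (B).
INDEX_A = {"taps": {"doc4"}}
INDEX_B = {"faucets": {"doc1"}, "hoisting machines": {"doc2", "doc3"}}

# Mapping from terms in vocabulary A to equivalent terms in vocabulary B.
MAPPING_A_TO_B = {
    "taps": {"faucets"},
    "cranes (lifting equipment)": {"hoisting machines"},
}


def search(term):
    """Search both collections using a term from vocabulary A only."""
    hits = set(INDEX_A.get(term, set()))
    for mapped in MAPPING_A_TO_B.get(term, set()):
        hits |= INDEX_B.get(mapped, set())
    return hits


print(sorted(search("taps")))
```

The user never sees vocabulary B: the mapping translates the query behind the scenes, and the same table shape covers a second language, a second dialect, or a different type of vocabulary.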

41
Part 4 (cont.)
  • BS 8723 part 4 has a wider scope than BS 6723,
    which was concerned only with multilingual
    thesauri.
  • It covers all of the previous ground and extends
    the scope to
    • thesauri in different dialects of one language
    • different thesauri in a single language
    • situations where a thesaurus interoperates with
      one or more different types of structured
      vocabulary, such as classification schemes
    • situations where not all the interoperating
      vocabularies have the same status and/or function.

42
Part 5 - Interoperability with applications
  • Vocabularies must work with
    • Search engines
    • Content Management Systems
    • Web publishing software, etc.
  • Part 5 sets out the protocols and formats needed
    for the exchange of vocabulary data.

43
Standards are available at
  • IFLA Guidelines for Multilingual Thesauri
    • http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf
  • US Standard -- NISO Z39.19
    • http://www.niso.org/standards/standard_detail.cfm?std_id=814
    • Tutorial: http://www.slis.kent.edu/mzeng/Z3919/index.html
  • British Standard -- BS 8723
    • These documents may be ordered from BSI Customer
      Services
    • tel: +44 (0)208-996-9001 or
    • email: orders@bsi-global.com

44
http://www.slis.kent.edu/mzeng/Z3919/index.html