Workshop 3 DAY 3 - PowerPoint PPT Presentation

About This Presentation
Title:

Workshop 3 DAY 3

Description:

P. Calvert. 1. June 2001. Philip Calvert. School of ...

Slides: 41
Provided by: philipc3

1
Workshop 3 DAY 3
  • Philip Calvert
  • School of Information Management
  • Victoria University of Wellington
  • New Zealand

2
What is a Thesaurus?
  • The vocabulary of a controlled indexing language,
    formally organised so that relationships between
    concepts are made explicit
  • It is used in an information retrieval system,
    such as a card catalogue, printed index, or online
    database

3
  • The primary purpose of a thesaurus is information
    retrieval
  • Secondary purposes
  • definition of terms
  • semantic map of subject area

4
Classic use of thesaurus
  • Thesaurus is used for indexing documents to
    create a database
  • The same thesaurus is used for searching the
    database for document retrieval
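
The classic use can be sketched in a few lines of Python. This is a minimal illustration, not a real system: the three-term vocabulary, the document ids, and the function names are all assumptions. The point it shows is that one vocabulary checks both the indexer's terms and the searcher's term.

```python
# Hypothetical controlled vocabulary; a real thesaurus would be far larger.
THESAURUS = {"Earthquakes", "Buildings", "Nigeria"}

def index_document(database, doc_id, terms):
    """Add doc_id under each term, rejecting terms that are not in the thesaurus."""
    for term in terms:
        if term not in THESAURUS:
            raise ValueError(f"not a thesaurus term: {term}")
        database.setdefault(term, set()).add(doc_id)

def search(database, term):
    """Return the ids of documents indexed under a thesaurus term."""
    if term not in THESAURUS:
        raise ValueError(f"not a thesaurus term: {term}")
    return database.get(term, set())

# Made-up documents, indexed with the same vocabulary that is used for searching.
database = {}
index_document(database, "doc1", ["Earthquakes", "Buildings"])
index_document(database, "doc2", ["Buildings", "Nigeria"])
```

Here `search(database, "Buildings")` returns both documents, while an uncontrolled term such as "Quakes" is rejected at indexing time and at search time alike.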

5
Planning for a thesaurus
  • Do you need a thesaurus?
  • Controlled language versus natural language
  • Indexing in the full-text era

6
Controlled and Natural Language
  • Advantages of controlled language
  • Eases searching by:
  • controlling synonyms
  • leading to the nearest preferred term for starting
    a search
  • showing relationships between terms
  • using agreed definitions of terms
  • Avoids over-exhaustivity: no minor terms
  • Useful in multilingual systems

7
Weaknesses of controlled language
  • lack of specificity
  • lack of exhaustivity
  • not always up-to-date
  • indexer can misunderstand the author's intent
  • searcher must use unnatural language

8
The thesaurus in the full-text era
  • If the system searches the full text, do you need
    a thesaurus?
  • In some systems, the thesaurus enhances the
    performance of full-text retrieval. It retrieves
    extra references that do not contain the exact
    words in the search term.
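
One way a thesaurus can help full-text retrieval is query expansion. The sketch below is an assumption about how such a system might work, not a description of any particular product; the equivalence sets, sample texts, and function names are invented for illustration.

```python
import re

# Hypothetical equivalence sets that would normally be read from the thesaurus.
EQUIVALENTS = {
    "railways": {"railways", "railroads"},
    "teleworking": {"teleworking", "distance working"},
}

def expand(query):
    """Return the query plus any equivalent terms recorded for it."""
    key = query.lower()
    return EQUIVALENTS.get(key, {key})

def full_text_search(documents, query):
    """Return ids of documents whose text contains the query or an equivalent."""
    variants = expand(query)
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        # Match whole words or phrases only.
        if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
            hits.append(doc_id)
    return hits

# Made-up document texts.
documents = {
    "a": "A history of the railways of New Zealand.",
    "b": "Railroads and the American West.",
    "c": "Road vehicles and driving.",
}
```

A search for "Railways" also finds document "b", which never contains the word the searcher typed.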

9
Vocabulary control
  • Preferred and non-preferred index terms
  • General categories
  • Form of index term
  • Choice of index terms
  • Restriction and clarification of meaning

10
Preferred and non-preferred index terms
  • An index term is a representation of a concept
  • In a thesaurus, a term must be either a Preferred
    Term or a Non-Preferred Term
  • The Preferred Term is the one chosen to represent
    the concept in indexing

11
General categories
  • Concepts represented by index terms fall into
    three categories
  • Concrete, e.g. Primates, Buildings
  • Abstract, e.g. Evolution, Efficiency
  • Individual, e.g. Nigeria, Unesco

12
Form of index term
  • Grammatical phrases
  • Singular and plural forms
  • Punctuation, capitalisation
  • Abbreviations, initialisms, acronyms

13
1 Grammatical phrases
  • Nouns and noun phrases
  • Women workers
  • Philosophy of education
  • Heads of state
  • Adjectives can be used, e.g. Portable
  • Verbs are expressed as nouns or verbal nouns,
    e.g. Communication, Walking
  • Keep an initial article only when the sense
    requires it, e.g. use Theatre, but The Narrows

14
2 Singular and plural forms

15
3 Punctuation, capitalisation
  • Avoid punctuation if possible, as it affects
    filing order
  • Use brackets only as a qualifier
  • Coating (process)
  • Retain apostrophes
  • Children's hospitals
  • Keep hyphens to retain meaning
  • X-Ray
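
The effect of punctuation on filing order can be seen with a small Python sketch. The comparison is an illustration only: "Xenon" and "Xylophones" are made-up neighbours for X-Ray, and the punctuation-ignoring key is one possible filing rule, not a standard.

```python
import re

def machine_order(terms):
    """Sort by raw character codes, where a hyphen files before any letter."""
    return sorted(terms)

def filing_key(term):
    """Letter-by-letter key: lower-case the term and drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", term.lower())

def letter_by_letter(terms):
    """Sort while ignoring punctuation, spaces, and case."""
    return sorted(terms, key=filing_key)

TERMS = ["Xenon", "X-Ray", "Xylophones"]
```

With these three terms, `machine_order` puts X-Ray first because of its hyphen, while `letter_by_letter` files it between Xenon and Xylophones.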

16
4 Abbreviations, initialisms, acronyms
  • Use initials if the term or organisation is
    well-known by its initials
  • Unesco
  • AIDS

17
Choice of index terms
  • Use loan words, e.g. Sub judice, glasnost
  • Neologisms are acceptable, e.g. money laundering
  • Avoid trade names, e.g. Ovaltine (TM)
  • Use popular or scientific names according to
    the audience, e.g. Rubella or German Measles
  • The same applies to place names, e.g. Burma or Myanmar

18
Restriction and clarification of meaning
  • Homographs - must clarify meaning
  • Cells (biology)
  • Cells (electric)
  • Reading (process)
  • Reading (place)
  • Use scope notes and definitions to clarify
    meaning of terms
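
Bracketed qualifiers are easy to process mechanically. The following Python sketch, with hypothetical function names, separates a term from its qualifier and lists the words that occur with more than one qualifier.

```python
import re
from collections import defaultdict

# A term, optional space, then a qualifier in brackets at the end of the string.
QUALIFIED = re.compile(r"^(?P<word>.+?)\s*\((?P<qualifier>[^()]+)\)$")

def split_qualifier(term):
    """Return (word, qualifier); qualifier is None for an unqualified term."""
    match = QUALIFIED.match(term)
    if match:
        return match.group("word"), match.group("qualifier")
    return term, None

def homographs(terms):
    """Map each word that appears with several qualifiers to those qualifiers."""
    groups = defaultdict(list)
    for term in terms:
        word, qualifier = split_qualifier(term)
        groups[word].append(qualifier)
    return {word: quals for word, quals in groups.items() if len(quals) > 1}

TERMS = [
    "Cells (biology)",
    "Cells (electric)",
    "Reading (process)",
    "Reading (place)",
    "Rubella",
]
```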

19
Specificity and Compound Terms
  • Vocabulary specificity
  • Compound terms: levels of pre-coordination
  • A thesaurus with mostly single terms is called
    post-coordinate. A thesaurus with many compound
    terms is pre-coordinate

20
1 Vocabulary specificity

21
2 Compound Terms
  • It is a general rule that terms in a thesaurus
    should represent single or unitary concepts as
    far as possible, and compound terms should be
    broken into single elements, except when this
    will lower the user's understanding
  • avoid: Workload of dentists in Scotland
  • prefer: Workload, Dentists, Scotland (three separate terms)
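
When compound terms are split into single terms, the combination happens at search time (post-coordination). A minimal Python sketch, assuming a made-up index of document ids, combines the single terms with a Boolean AND:

```python
def post_coordinate_search(index, terms):
    """Return the documents indexed under every one of the given single terms."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical index: term -> ids of documents indexed with that term.
INDEX = {
    "Workload": {"d1", "d2"},
    "Dentists": {"d1", "d3"},
    "Scotland": {"d1", "d2", "d3"},
}
```

Only document "d1" carries all three terms, so it is the only one retrieved for the workload of dentists in Scotland.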

22
(No Transcript)
23
  • Use natural word order,
  • e.g. Dried vegetables
  • not Vegetables, Dried
  • If indirect form is used, there must always be a
    reference to natural word order
  • Libraries, University
  • see
  • University libraries
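
The indirect form and its reference can be generated from the natural word order. This Python sketch is an illustration with hypothetical function names, and it only handles two-word terms of the form "modifier noun".

```python
def inverted_form(term):
    """Turn 'University libraries' into 'Libraries, University' (two-word terms only)."""
    words = term.split()
    if len(words) != 2:
        raise ValueError("this sketch handles two-word terms only")
    modifier, head = words
    return f"{head.capitalize()}, {modifier.capitalize()}"

def see_reference(term):
    """Build the reference line that leads from the indirect form to the natural one."""
    return f"{inverted_form(term)} see {term}"
```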

24
(No Transcript)
25
Keep a compound term to retain the meaning that
would be lost if words were split
  • Group discussion
  • Discussion group
  • Markov Process
  • White elephants (which aren't elephants at all)
  • First aid (would "First" make a useful search
    term?)

26
Structure and relationships
  • A very important feature of a thesaurus is the
    display of structural relationships between the
    terms it contains.
  • Micro level: semantic links between terms
  • Macro level: how the terms relate to the
    overall structure of the subject field

27
Basic relationship types
  • Equivalence relationships
  • Hierarchical relationships
  • Associative relationships

28
1. Equivalence relationships I
  • A relationship between preferred and
    non-preferred terms where two or more terms are
    regarded as referring to the same concept
  • Use preferred term
  • UF (use for) non-preferred term
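
The USE and UF references are two views of the same data, so one can be derived from the other. In the Python sketch below, which term in each pair is preferred is an assumption made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Non-preferred term -> preferred term (the USE references).
# The choice of preferred term in each pair is assumed, not prescribed.
USE = {
    "Railroads": "Railways",
    "Distance working": "Teleworking",
    "Arachnida": "Spiders",
}

def used_for(use_refs):
    """Derive the reciprocal UF lists: preferred term -> non-preferred terms."""
    uf = defaultdict(list)
    for non_preferred, preferred in use_refs.items():
        uf[preferred].append(non_preferred)
    return {preferred: sorted(terms) for preferred, terms in uf.items()}

def resolve(term, use_refs):
    """Follow a USE reference; a preferred term resolves to itself."""
    return use_refs.get(term, term)
```

Storing only the USE direction and generating UF avoids the two getting out of step.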

29
1. Equivalence relationships II
  • Popular / scientific names
  • Spiders / Arachnida
  • Different words for the same concept
  • Teleworking / Distance working
  • Railways / Railroads
  • Words from different languages
  • Aliens / Foreigners
  • Terms that change over time
  • Negroes / Black Americans / African-Americans

30
1. Equivalence relationships III
  • Different spelling
  • Gipsies / Gypsies
  • Mouse / Mice
  • Direct and indirect forms
  • Dried vegetables / Vegetables, Dried
  • Abbreviations and full form
  • TQM / Total Quality Management
  • Near synonyms, including opposites on the same scale
  • Dryness / Wetness
  • Literacy / Illiteracy

31
2. Hierarchical relationships I
  • The relationship between levels of
    superordination and subordination. It shows
    broader and narrower concepts.
  • It is an important feature of a thesaurus and
    helps with searching
  • BT Broader Term
  • NT Narrower Term

32
2. Hierarchical relationships IIa
  • Generic
  • BT Vertebrates
  • NT Mammals
  • Reptiles

33
2. Hierarchical relationships IIb
  • Whole-part relationships
  • BT Ear
  • NT External ear
  • Inner ear

34
2. Hierarchical relationships IIc
  • Instances
  • BT Seas
  • NT Baltic Sea
  • China Sea
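
The BT/NT examples above can be held as a single mapping from each narrower term to its broader term, with the NT lists generated from it. This Python sketch is one possible representation, not a standard one; the "Primates" entry is added only to show a second level, and the function names are hypothetical.

```python
from collections import defaultdict

# Narrower term -> broader term.
BT = {
    "Mammals": "Vertebrates",
    "Reptiles": "Vertebrates",
    "External ear": "Ear",
    "Inner ear": "Ear",
    "Baltic Sea": "Seas",
    "China Sea": "Seas",
    "Primates": "Mammals",  # extra level added for illustration
}

def narrower_terms(bt):
    """Derive the reciprocal NT lists: broader term -> narrower terms."""
    nt = defaultdict(list)
    for narrow, broad in bt.items():
        nt[broad].append(narrow)
    return dict(nt)

def all_narrower(term, nt):
    """Collect every term below `term` at any depth (assumes the hierarchy has no cycles)."""
    found = []
    for child in nt.get(term, []):
        found.append(child)
        found.extend(all_narrower(child, nt))
    return found

NT = narrower_terms(BT)
```

Walking down the hierarchy like this is what lets a search on a broad term also pick up documents indexed under its narrower terms.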

35
3. Associative relationships
  • Closely related concepts that are not
    hierarchical and not equivalent. The indexer
    often makes a judgement before adding this sort
    of relationship to the thesaurus
  • RT Related Term

36
3. Associative relationships - examples
  • Earthquakes / seismology
  • Accountancy / accountants
  • Driving / Road vehicles
  • Women / Feminism
  • Friction / Wear
  • Pests / Pesticides
  • Single people / Married people
  • Communication / Communication skills
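
RT links are normally reciprocal: if A lists B as a related term, B should list A. A short Python check, using a made-up table with one deliberately missing link and a hypothetical function name, can find the gaps.

```python
def missing_reciprocals(rt):
    """List (a, b) pairs where a has RT b but b does not have RT a."""
    return sorted(
        (a, b)
        for a, related in rt.items()
        for b in related
        if a not in rt.get(b, set())
    )

# Term -> set of related terms. The Pesticides entry is left incomplete on purpose.
RT = {
    "Earthquakes": {"Seismology"},
    "Seismology": {"Earthquakes"},
    "Pests": {"Pesticides"},
    "Pesticides": set(),
}
```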

37
Thesaurus Displays
  • Can be categorised as
  • Alphabetical displays showing scope notes and
    hierarchical relationships
  • Hierarchical displays generated from A-Z display
  • Systematic displays showing total structure
  • Graphic displays
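
An alphabetical display can be generated from the term records. The Python sketch below is a minimal illustration: the record layout, the field order, and the sample entries are assumptions, and real displays follow the layout chosen for the particular thesaurus.

```python
# Hypothetical term records: term -> {relationship code: [values]}.
RECORDS = {
    "Inner ear": {"BT": ["Ear"]},
    "Railways": {"UF": ["Railroads"]},
    "Ear": {"NT": ["External ear", "Inner ear"]},
    "External ear": {"BT": ["Ear"]},
}

# Assumed order of the codes under each entry.
FIELD_ORDER = ["SN", "UF", "BT", "NT", "RT"]

def alphabetical_display(records):
    """Return an A-Z display with each term followed by its coded relationships."""
    lines = []
    for term in sorted(records, key=str.lower):
        lines.append(term)
        for field in FIELD_ORDER:
            for value in records[term].get(field, []):
                lines.append(f"  {field} {value}")
    return "\n".join(lines)
```

Calling `print(alphabetical_display(RECORDS))` lists Ear first with its two NT lines, then External ear, Inner ear, and Railways.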

38
Construction Techniques 1
  • Initial steps
  • Define the subject
  • Choose thesaurus characteristics and layout
  • Select the terms
  • other thesauri and classification schemes
  • literature searching
  • experts' knowledge
  • Record the terms

39
Construction Techniques 2
  • Creating the structure
  • organise terms into broad categories
  • identify
  • key categories
  • peripheral categories
  • general concepts

40
Construction Techniques 3
  • Step A - Choose key facets / groups
  • Step B - Make the hierarchies; code with NT/BT
  • Step C - Add SN (scope notes), UF, then BT/NT/RT from
    terms in other subject fields
  • Step D - Make the alphabetical display