Title: Workshop 3 DAY 3
1 Workshop 3 DAY 3
- Philip Calvert
- School of Information Management
- Victoria University of Wellington
- New Zealand
2 What is a Thesaurus?
- A vocabulary of controlled indexing language,
formally organised, so that relationships between
concepts are made explicit
- To be used in an information retrieval system,
such as a card catalogue, printed index, or online
database
3
- The primary purpose of a thesaurus is information
retrieval
- Secondary purposes
  - definition of terms
  - semantic map of subject area
4 Classic use of thesaurus
- Thesaurus is used for indexing documents to
create a database
- The same thesaurus is used for searching the
database for document retrieval
5 Planning for a thesaurus
- Do you need a thesaurus?
- Controlled language versus natural language
- Indexing in the full-text era
6 Controlled and Natural Language
- Advantages of controlled language
  - Eases searching by
    - controlling synonyms
    - leading to the nearest preferred term for starting a search
    - showing relationships between terms
    - using agreed definitions of terms
  - Avoids over-exhaustivity (no minor terms)
  - Useful in multilingual systems
7 Weaknesses of controlled language
- lack of specificity
- lack of exhaustivity
- not always up-to-date
- indexer can misunderstand the author's intent
- searcher must use unnatural language
8 The thesaurus in the full-text era
- If the system searches the full text, do you need
a thesaurus?
- In some systems, the thesaurus enhances the
performance of full-text retrieval. It retrieves
extra references that do not contain the exact
words in the search term.
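A minimal sketch of how a thesaurus can widen a full-text search. It assumes a small hypothetical equivalence table (`USED_FOR`) and three invented document texts; `expand` and `search` are illustrative names, not part of any real retrieval system.

```python
# Hypothetical equivalence table: preferred term -> non-preferred terms.
# The term pairs come from the workshop examples; which one is
# "preferred" is an assumption made for this sketch.
USED_FOR = {
    "Railways": ["Railroads"],
    "Teleworking": ["Distance working"],
}

# Invented full-text documents, for illustration only.
DOCUMENTS = {
    1: "New railways planned for the northern region",
    2: "Railroads and the growth of towns",
    3: "Distance working in small firms",
}


def expand(term):
    """Return the term together with every equivalent term in USED_FOR."""
    for preferred, variants in USED_FOR.items():
        group = [preferred] + variants
        if term.lower() in (t.lower() for t in group):
            return group
    return [term]


def search(term, use_thesaurus=True):
    """Return ids of documents whose text contains any of the search words."""
    words = expand(term) if use_thesaurus else [term]
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(w.lower() in text.lower() for w in words)
    )
```

Without the thesaurus, a search for Railways finds only the document that uses that exact word; with it, the document that says Railroads is retrieved as well.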
9 Vocabulary control
- Preferred and non-preferred index terms
- General categories
- Form of index term
- Choice of index terms
- Restriction and clarification of meaning
10 Preferred and non-preferred index terms
- An index term is a representation of a concept
- In a thesaurus, a term must be a Preferred Term
or a Non-Preferred Term
- The Preferred Term is the chosen term
11 General categories
- Concepts represented by index terms fall into three
categories
  - Concrete, e.g. Primates, Buildings
  - Abstract, e.g. Evolution, Efficiency
  - Individual, e.g. Nigeria, Unesco
12 Form of index term
- Grammatical phrases
- Singular and plural forms
- Punctuation, capitalisation
- Abbreviations, initialisms, acronyms
13 1 Grammatical phrases
- Nouns and noun phrases
  - Women workers
  - Philosophy of education
  - Heads of state
- Adjectives can be used, e.g. Portable
- Verbs are expressed as nouns or gerunds, e.g. Communication, Walking
- Keep an initial article only where the sense needs it, e.g.
use Theatre, but The Narrows
14 2 Singular and plural forms
15 3 Punctuation, capitalisation
- Avoid punctuation if possible, as it affects
filing order
- Use brackets only as a qualifier
  - Coating (process)
- Retain apostrophes
  - Children's hospitals
- Keep hyphens to retain meaning
  - X-Ray
16 4 Abbreviations, initialisms, acronyms
- Use initials if the term or organisation is
well-known by its initials
  - Unesco
  - AIDS
17 Choice of index terms
- Use loan words, e.g. Sub judice, glasnost
- Neologisms are acceptable, e.g. money laundering
- Avoid trade names, e.g. Ovaltine (TM)
- Use popular names or scientific names according to
the audience, e.g. Rubella or German Measles
- Same with place names, e.g. Burma or Myanmar
18 Restriction and clarification of meaning
- Homographs: the meaning must be clarified with a qualifier
  - Cells (biology)
  - Cells (electric)
  - Reading (process)
  - Reading (place)
- Use scope notes and definitions to clarify
meaning of terms
19 Specificity and Compound Terms
- Vocabulary specificity
- Compound terms: levels of pre-coordination
- A thesaurus with mostly single terms is called
post-coordinate. A thesaurus with many compound
terms is pre-coordinate
20 1 Vocabulary specificity
21 2 Compound Terms
- It is a general rule that terms in a thesaurus
should represent single or unitary concepts as
far as possible, and compound terms should be
broken into single elements, except when this
will lower the user's understanding
- avoid: Workload of dentists in Scotland
- prefer: Workload Dentists Scotland (three separate terms)
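A minimal sketch of post-coordination, where documents are indexed with single terms and the searcher combines them at search time. The document ids and term assignments are invented for illustration, and `coordinate` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical index: each document is assigned single-concept terms.
INDEXED = {
    "doc1": {"Workload", "Dentists", "Scotland"},
    "doc2": {"Workload", "Nurses", "Scotland"},
    "doc3": {"Dentists", "Wales"},
}

# Build an inverted file: term -> set of documents indexed with that term.
postings = defaultdict(set)
for doc, terms in INDEXED.items():
    for term in terms:
        postings[term].add(doc)


def coordinate(*terms):
    """Combine single terms at search time (Boolean AND)."""
    sets = [postings.get(t, set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

Searching on Workload, Dentists and Scotland together retrieves only the document that carries all three terms, which is the result the compound term "Workload of dentists in Scotland" would have given, without adding that compound to the vocabulary.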
23
- Use natural word order
  - e.g. Dried vegetables
  - not Vegetables, Dried
- If indirect form is used, there must always be a
reference to natural word order
  - Libraries, University see University libraries
25 Keep a compound term to retain the meaning that
would be lost if words were split
- Group discussion
- Discussion group
- Markov Process
- White elephants (which aren't elephants at all)
- First aid (would First make a useful search
term?)
26 Structure and relationships
- A very important feature of a thesaurus is the
display of structural relationships between the
terms it contains.
  - Micro level: semantic links between terms
  - Macro level: how the terms relate to the overall structure of the subject field
27 Basic relationship types
- Equivalence relationships
- Hierarchical relationships
- Associative relationships
28 1. Equivalence relationships I
- A relationship between preferred and
non-preferred terms where two or more terms are
regarded as referring to the same concept
- USE preferred term
- UF (used for) non-preferred term
29 1. Equivalence relationships II
- Popular / scientific names
  - Spiders / Arachnida
- Different words for the same concept
  - Teleworking / Distance working
  - Railways / Railroads
- Words from different languages
  - Aliens / Foreigners
- Terms that change over time
  - Negroes / Black Americans / African-Americans
30 1. Equivalence relationships III
- Different spelling
  - Gipsies / Gypsies
  - Mouse / Mice
- Direct and indirect forms
  - Dried vegetables / Vegetables, Dried
- Abbreviations and full form
  - TQM / Total Quality Management
- Near synonyms (including opposite ends of the same scale)
  - Dryness / Wetness
  - Literacy / Illiteracy
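A minimal sketch of how USE and UF references mirror each other. The `USE` table is hypothetical: the pairs are taken from the examples above, but which member of each pair is treated as preferred is an assumption made only for this illustration (apart from Dried vegetables, which follows the natural word order rule).

```python
# Hypothetical USE references: non-preferred term -> preferred term.
# The direction of each pair is an assumption for illustration.
USE = {
    "Railroads": "Railways",
    "Distance working": "Teleworking",
    "Vegetables, Dried": "Dried vegetables",
    "TQM": "Total Quality Management",
}


def preferred(term):
    """Follow a USE reference; a preferred term maps to itself."""
    return USE.get(term, term)


def used_for(pref):
    """Derive the reciprocal UF list for a preferred term."""
    return sorted(non_pref for non_pref, p in USE.items() if p == pref)
```

Storing only the USE direction and deriving UF from it keeps the two halves of each reference consistent, so a non-preferred term can never point to a preferred term that fails to point back.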
31 2. Hierarchical relationships I
- The relationship between levels of
superordination and subordination. It shows
broader and narrower concepts.
- It is an important feature of a thesaurus and
helps with searching
- BT Broader Term
- NT Narrower Term
32 2. Hierarchical relationships IIa
- Generic
  - BT Vertebrates
  - NT Mammals
  - NT Reptiles
33 2. Hierarchical relationships IIb
- Whole - part relationships
  - BT Ear
  - NT External ear
  - NT Inner ear
34 2. Hierarchical relationships IIc
- Instances
  - BT Seas
  - NT Baltic Sea
  - NT China Sea
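A minimal sketch of BT/NT as reciprocal links, using the three example hierarchies above. The extra level placing Primates under Mammals is an assumption added to show more than one level; `NT`, `BT` and `all_narrower` are illustrative names.

```python
# Hypothetical NT records: broader term -> its narrower terms.
NT = {
    "Vertebrates": ["Mammals", "Reptiles"],   # generic
    "Ear": ["External ear", "Inner ear"],     # whole - part
    "Seas": ["Baltic Sea", "China Sea"],      # instances
    "Mammals": ["Primates"],                  # assumed extra level
}

# Derive the reciprocal BT link for every narrower term.
BT = {narrow: broad for broad, narrows in NT.items() for narrow in narrows}


def all_narrower(term):
    """Return every term below `term`, at any depth, in depth-first order."""
    found = []
    for child in NT.get(term, []):
        found.append(child)
        found.extend(all_narrower(child))
    return found
```

A searcher who starts at Vertebrates can use `all_narrower` to widen the search to every term beneath it, which is one way the hierarchy helps with searching.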
35 3. Associative relationships
- Closely related concepts that are not
hierarchical and not equivalent. The indexer
often makes a judgement before adding this sort
of relationship to the thesaurus
- RT Related Term
36 3. Associative relationships - examples
- Earthquakes / Seismology
- Accountancy / Accountants
- Driving / Road vehicles
- Women / Feminism
- Friction / Wear
- Pests / Pesticides
- Single people / Married people
- Communication / Communication skills
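A minimal sketch of RT as a symmetric link, assuming each related pair is recorded once and the link is then stored in both directions. `PAIRS`, `RT` and `related` are hypothetical names; the pairs are taken from the examples above.

```python
from collections import defaultdict

# A few of the example pairs, each recorded once.
PAIRS = [
    ("Earthquakes", "Seismology"),
    ("Friction", "Wear"),
    ("Pests", "Pesticides"),
    ("Women", "Feminism"),
]

# Store each link in both directions so RT is always reciprocal.
RT = defaultdict(set)
for a, b in PAIRS:
    RT[a].add(b)
    RT[b].add(a)


def related(term):
    """Return the related terms recorded for `term`, in alphabetical order."""
    return sorted(RT.get(term, set()))
```

Unlike BT/NT, the two ends of an RT link carry the same tag, so looking up either Friction or Wear leads to the other.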
37 Thesaurus Displays
- Can be categorised as
  - Alphabetical displays showing scope notes and hierarchical relationships
  - Hierarchical displays generated from the A-Z display
  - Systematic displays showing total structure
  - Graphic displays
38 Construction Techniques 1
- Initial steps
  - Define the subject
  - Choose thesaurus characteristics and layout
  - Select the terms
    - other thesauri and classification schemes
    - literature searching
    - experts' knowledge
  - Record the terms
39 Construction Techniques 2
- Creating the structure
  - organise terms into broad categories
  - identify
    - key categories
    - peripheral categories
    - general concepts
40 Construction Techniques 3
- Step A - Choose key facets / groups
- Step B - Make the hierarchies and code them with NT/BT
- Step C - Add SN, UF, then BT/NT/RT from terms in
other subject fields
- Step D - Make the alphabetical display
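A minimal sketch of Step D, assuming the terms have already been recorded with their SN/UF/BT/NT/RT fields. The `RECORDS` structure, the field order in `ORDER`, and the two-space indent are assumptions for illustration, not a prescribed layout.

```python
# Hypothetical term records produced by Steps A to C.
RECORDS = {
    "Vertebrates": {"NT": ["Reptiles", "Mammals"]},
    "Mammals": {"BT": ["Vertebrates"]},
    "Dried vegetables": {"UF": ["Vegetables, Dried"]},
}

# Assumed order in which fields are listed under each term.
ORDER = ["SN", "UF", "BT", "NT", "RT"]


def alphabetical_display(records):
    """Return an A-Z display, one entry per preferred or non-preferred term."""
    entries = {}
    for term, fields in records.items():
        body = []
        for tag in ORDER:
            for value in sorted(fields.get(tag, [])):
                body.append(f"  {tag} {value}")
        entries[term] = body
        # Every UF needs a reciprocal USE entry for the non-preferred term.
        for variant in fields.get("UF", []):
            entries[variant] = [f"  USE {term}"]
    lines = []
    for term in sorted(entries, key=str.lower):
        lines.append(term)
        lines.extend(entries[term])
    return "\n".join(lines)
```

Calling `print(alphabetical_display(RECORDS))` lists the preferred and non-preferred terms in one alphabetical sequence, with the entry for Vegetables, Dried pointing the user to Dried vegetables.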