Indexing - PowerPoint PPT Presentation

Provided by: olip

1
Indexing
  • Controlled vs. free vocabularies
  • Indexing languages

2
Outline
  • Subject indexing process
  • Analysis of a document
  • Indexing exercise
  • Performance of an information system
  • Pre-coordinate and post-coordinate systems
  • Controlled vocabularies
  • Indexing failures

3
The Subject Indexing Process
  • Indexing: the process whereby indexes and
    associated tools for the organization of
    knowledge are created
  • Effective, efficient indexing involves skill
    and judgment in the assignment of terms
  • There are three stages

4
Stage 1: Familiarization
  • the indexer becomes conversant with the subject
    content of the document to be indexed
  • the indexer attempts to identify the concepts
    that are represented by the words in the document
  • the indexer must examine the document's content,
    concentrating particularly on the clues offered
    by the title, the contents page, chapter headings
    and any abstracts, introductions, prefaces, etc.

5
Stage 2: Analysis
  • the identification of the concepts within a
    document which are worthy of indexing
  • usually it is possible to identify a central
    theme
  • to what extent should access be provided to
    secondary topics considered within a document?
    (TGM example)
  • traditional approaches have sought to find an
    indexing term which is co-extensive with the
    content of the document (i.e. the scope of the
    term and the scope of the document match)

6
Stage 2: Analysis (cont'd)
  • 3 questions the indexer must ask:
      • What is the document about? (Ideal: read the
        entire item and pick the central theme)
      • Why has it been added to our collection?
      • What aspects will interest our users?
  • Key: there is no single set of correct terms; it
    depends on audience and collection
  • The more specialized the clientele, the more
    likely it is that the index can be tailored to
    their needs (i.e. highly specific)

8
Stage 2: Analysis (cont'd)
  • Exhaustive indexing: how many themes will be
    included?
  • Specificity: always index at the level of
    specificity of the document
  • E.g. an article on cultivating oranges is indexed
    under oranges, not citrus fruit (and not both:
    only assign one term for each concept)
  • What if the controlled vocabulary doesn't include
    the term oranges? Use the most specific term
    available (citrus fruit, not fruit)
  • In practice, specificity may be achieved by
    using term combinations (e.g. Canadian Libraries
    = Canada + Libraries)
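
The fallback rule on this slide (use the most specific term the vocabulary offers) can be sketched in Python. This is only an illustration: the broader-term table, the vocabulary contents, and the function name are hypothetical and not taken from any real thesaurus.

```python
# Hypothetical broader-term (BT) links: each concept points to its parent.
BROADER = {
    "oranges": "citrus fruit",
    "citrus fruit": "fruit",
    "fruit": None,
}


def most_specific_term(concept, vocabulary, broader=BROADER):
    """Return the concept if the vocabulary contains it; otherwise the
    nearest broader term the vocabulary does contain, or None."""
    current = concept
    while current is not None:
        if current in vocabulary:
            return current
        current = broader.get(current)  # climb one level up the hierarchy
    return None


# Vocabulary that includes "oranges": index at the document's own level.
print(most_specific_term("oranges", {"oranges", "citrus fruit", "fruit"}))
# Vocabulary without "oranges": climb only as far as necessary.
print(most_specific_term("oranges", {"citrus fruit", "fruit"}))
```

Climbing one level at a time ensures the indexer stops at citrus fruit rather than jumping straight to fruit.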

9
Indexing Exercise
  • Read the article, focusing on one paragraph at a
    time
  • What is the writer saying?
  • What are the concepts, or are there any?
  • How does the paragraph reward the reader: are
    there important ideas here?
  • Write down words/phrases for the concepts that
    come to mind
  • Index the document using natural language

10
Stage 3: Translation
  • having identified the central theme of a
    document, this theme must be described in terms
    present in the indexing language
  • in controlled language indexing, this involves
    using the thesaurus to assign terms to the
    document
  • Key: select terms and relationships that are
    consistent with the typical user's perspective
    on the subject (i.e. user warrant, so that the
    indexing system is tailored to the needs of the
    users of the index)

11
Factors Affecting Information System Performance
  • Indexing accuracy
      • Indexers have control over accuracy
  • Indexing policy
      • Outside the indexer's control
      • Major policy decision: exhaustivity. The terms
        assigned may represent the subject matter of
        the document completely, or they may be
        selective
      • E.g. most items will be indexed with 8 to 15
        terms

12
Precision and Recall
  • Precision: ratio of useful items to total
    retrieved
      = (# of relevant records retrieved) /
        (# of records retrieved)
  • Recall: extent to which all useful items are
    found, from the total in the database
      = (# of relevant records retrieved) /
        (# of relevant records in the database)
  • E.g.
      • 100 relevant records in the database
      • 80 records retrieved that are relevant
      • 200 records retrieved in total
      • Recall = 80/100 = 80%
      • Precision = 80/200 = 40% (lots of junk)
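
The two ratios can be checked with a few lines of Python using the figures from the worked example on this slide. The function and variable names are illustrative choices, not part of the presentation.

```python
def precision(relevant_retrieved, total_retrieved):
    """Share of the retrieved records that are relevant."""
    if total_retrieved == 0:
        return 0.0  # nothing retrieved; treated as 0 here by assumption
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, relevant_in_database):
    """Share of the relevant records in the database that were retrieved."""
    if relevant_in_database == 0:
        return 0.0  # no relevant records exist; treated as 0 by assumption
    return relevant_retrieved / relevant_in_database


# Figures from the slide's example.
relevant_in_database = 100
relevant_retrieved = 80
total_retrieved = 200

print(f"Recall:    {recall(relevant_retrieved, relevant_in_database):.0%}")  # 80%
print(f"Precision: {precision(relevant_retrieved, total_retrieved):.0%}")    # 40%
```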

13
F.W. Lancaster's Rules to Guide Indexing Practice
  • Include all topics known to be of interest to the
    users of the information service that are treated
    substantively in the document
  • Ask yourself: how much information is given on
    the topic in the article? How much interest will
    users have in the topic? How much information
    already exists on the topic?
  • Index each of these as specifically as the
    vocabulary of the system allows and the needs or
    interests of the users warrant

14
Postcoordinate Systems
  • Allow a searcher to combine terms in any way
  • the multidimensionality of the relationships
    among terms is retained
  • every term assigned to a document has equal
    weight: one is no more important than another
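
A minimal sketch of post-coordinate retrieval, assuming a tiny made-up collection: each document carries an unordered set of terms, and the searcher combines terms only at search time. The document IDs, terms, and function names are hypothetical.

```python
# Hypothetical collection: every document gets a flat, unordered set of terms.
DOCUMENTS = {
    "doc1": {"Canada", "Libraries"},
    "doc2": {"Canada", "Museums"},
    "doc3": {"Libraries", "Funding"},
}


def build_postings(documents):
    """Invert the collection: map each term to the set of documents using it."""
    postings = {}
    for doc_id, terms in documents.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    return postings


def search_all(postings, *terms):
    """Boolean AND: documents indexed with every one of the given terms."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_any(postings, *terms):
    """Boolean OR: documents indexed with at least one of the given terms."""
    return set().union(*(postings.get(term, set()) for term in terms))


postings = build_postings(DOCUMENTS)
print(sorted(search_all(postings, "Canada", "Libraries")))  # ['doc1']
print(sorted(search_all(postings, "Libraries", "Canada")))  # ['doc1']
print(sorted(search_any(postings, "Museums", "Funding")))   # ['doc2', 'doc3']
```

Because terms are stored as sets, the order in which the searcher names them makes no difference, which reflects the equal weight of terms described above.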

15
Precoordinate Systems
  • Are not as flexible as postcoordinate systems
  • The multidimensionality of the relationships
    among terms is difficult to depict
  • Terms can only be listed in a particular
    sequence, which implies that the first term is
    more important than the others
  • It is not easy to combine terms at the time a
    search is performed
  • E.g. LCSH:
    Mozambique -- Economic Relations -- South Africa
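
The effect of a fixed term sequence can be illustrated with a short Python sketch. It assumes headings are stored as strings whose parts are separated by " -- "; the second heading and the function names are hypothetical additions for the example.

```python
# Pre-coordinated headings: terms are combined, in a fixed order, at
# indexing time. The first heading is the slide's example; the second is
# a hypothetical extra entry.
HEADINGS = [
    "Mozambique -- Economic Relations -- South Africa",
    "South Africa -- History",
]


def browse_by_lead_term(headings, term):
    """Find headings filed under `term`, as in an alphabetical listing
    where only the first element determines the filing position."""
    return [h for h in headings if h.split(" -- ")[0] == term]


def scan_all_elements(headings, term):
    """Find headings containing `term` in any position (needs a full scan)."""
    return [h for h in headings if term in h.split(" -- ")]


# Browsing under "South Africa" misses the Mozambique heading, although
# that heading also concerns South Africa.
print(browse_by_lead_term(HEADINGS, "South Africa"))
# Only a scan of every element of every heading finds both.
print(scan_all_elements(HEADINGS, "South Africa"))
```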

16
Controlled Vocabulary
  • an authority list: indexers can only assign to a
    document terms that appear on the list approved
    by the organization for which they work
  • Moves the responsibility from the user (i.e.
    through free-text searching) to the indexer
    (through the creation/use of controlled
    vocabularies)

17
Controlled Vocabulary
  • controlled access to each concept (i.e.
    consistent representation of the term)
  • the creation of hierarchies (broader, narrower
    terms) shows relationships between terms
  • major and minor descriptors are used to represent
    the document at hand
  • controlled access for plurals, acronyms, etc.
  • homonyms are controlled: when the same word
    relates to different concepts (e.g. pitch in
    music vs. baseball), the vocabulary can
    distinguish between them
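
One way to picture these controls is an authority file that maps variant entry terms to a single preferred term. The sketch below is a simplified illustration: the entries, the parenthetical-qualifier convention for homonyms, and the function name are all assumptions, not drawn from a real thesaurus.

```python
# Hypothetical authority file: lower-cased entry terms -> preferred term.
USE = {
    # singular/plural variants share one preferred form
    "library": "Libraries",
    "libraries": "Libraries",
    # an acronym and its spelled-out form share one preferred form
    "isi": "Institute for Scientific Information",
    "institute for scientific information": "Institute for Scientific Information",
    # homonyms are only admitted with a qualifier
    "pitch (music)": "Pitch (Music)",
    "pitch (baseball)": "Pitch (Baseball)",
}


def preferred_term(entry):
    """Return the preferred term for an entry term, or None when the
    entry is not on the authority list."""
    return USE.get(entry.strip().lower())


print(preferred_term("Library"))        # Libraries
print(preferred_term("ISI"))            # Institute for Scientific Information
print(preferred_term("pitch"))          # None: ambiguous without a qualifier
print(preferred_term("Pitch (music)"))  # Pitch (Music)
```

Rejecting the bare word "pitch" forces the indexer to choose a sense, which is how the list keeps homonyms apart.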

18
Citation Indexes
  • The Institute for Scientific Information (ISI)
    publishes three citation indexes:
      • the Science Citation Index
      • the Social Sciences Citation Index
      • the Arts and Humanities Citation Index

21
Weighted indexing
  • Most subject indexing is a binary decision: a
    term is either assigned to a document or it is
    not
  • Some indexes provide for a weighting of terms on
    a numeric scale, or use major and minor
    descriptors
  • Example: Psychological research of
    computer-mediated communication in Russia

22
Indexing failures
  • Conceptual analysis failures
      • Failure to recognize a topic that is of
        potential interest to the user group served
      • Misinterpretation of what some aspect of the
        document really deals with, leading to an
        assignment of terms that are inappropriate
  • Translation failures
      • Failure to use the most specific term
        available to represent some subject
      • Use of a term that is inappropriate to the
        subject matter because of lack of subject
        knowledge or due to carelessness

23
In general, bad indexing occurs when:
  • The indexer contravenes policy
  • The indexer fails to use the vocabulary elements
    in the way they should be used
  • The indexer fails to use a term at the correct
    level of specificity
  • The indexer uses an obviously incorrect term,
    perhaps through a lack of subject knowledge
  • The indexer omits an important term