Module 5b: Subject Analysis and Indexing

1
Module 5b: Subject Analysis and Indexing
  • IMT530 Organization of Information Resources
  • Winter 2007
  • Michael Crandall

2
Recap
  • Descriptive metadata elements can be used for
    access or selection
  • For access, it is important to have good
    authority control to enable users to:
      • Find known items from the information they have
        available
      • Gather all the items of a similar nature together
      • Choose the right one from among retrieved items
  • Authority control takes time and effort, but pays
    off in better results for users
  • Need to balance cost against benefits and make a
    decision on your approach for each project
  • Don't do it halfway, because it's not worth it

3
Module 5b Outline
  • Subject analysis
      • Definition
      • Why do this?
      • Mai's domain-centered analysis
      • Consistency
  • Subject indexing
      • Definition and purpose of subject indexing
      • Types of subject indexing
      • Indexing non-text objects
      • Types of terms used in subject indexing
      • The subject indexing process

4
Some Questions
  • Library catalogs often lump fiction into one
    subject heading. Why?
  • Would you describe the subject of The
    Organization of Information to your mother the
    same way you would to a classmate?
  • Would you use the same subjects to describe
    Chapter 9 in Taylor that you would to describe
    the whole book?
  • If you wanted to assign a subject to your kitchen
    or garage, what would it be?
  • What if you had to describe snow to a Papua New
    Guinea native? What words would you use? Would
    they be the same for an Inuit?
  • How do you describe the subject of a picture or
    film?

5
Subject Analysis - Definition
  • The process of determining the subject and other
    content-related attributes of an object
  • The purpose of subject analysis is to come to an
    understanding of or judgment regarding:
      • what an object is about, in the context of how it
        might be used
      • what an object exemplifies
      • what discipline (or other aspect, including
        community) an object reflects (for classification)

6
Why Subject Analysis?
  • One of the primary means of access to information
    is through subjects
  • In order for a computer to access those subjects,
    there has to be some way to get to them: an index
    of some kind
  • Remember Soergel's model, and the necessity for a
    means to match user requests to information
    objects
  • Automatic indexing works for some situations, but
    not all
  • As we'll see, subject concepts are not
    necessarily contained in words (especially not in
    images!!)
  • A specific audience may dictate specific analysis

7
Wilson on Subjects
  • One of the main purposes of Wilson's chapter on
    subjects is to analyze the subject analysis
    process: to take it apart
  • Starts with the words, then the sentences, then
    the work itself, and asks how you can elicit
    descriptions of aboutness
  • Wilson suggests four different ways to approach
    this:
      • Purposive: why did the author write the work?
      • Figure-ground: what stands out among all the
        possible subjects?
      • Objective: count what is most frequently
        mentioned
      • Appeal to unity and completeness: what questions
        are answered within the work?
  • Ultimately, he concludes that any extraction will
    miss some part of the work, and not satisfy some
    user
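Of Wilson's four approaches, the objective one is the easiest to mechanize. The Python sketch below is illustrative only: the stopword list and the sample sentence are made up for this example, and `most_frequent_terms` is a hypothetical helper, not a published method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "that", "is", "for"}

def most_frequent_terms(text: str, k: int = 3) -> list[tuple[str, int]]:
    """Rank the non-stopword words of a text by how often they occur."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)

sample = ("Commuter marriages place strain on couples. Couples in commuter "
          "marriages report that trust and communication sustain the marriage.")
print(most_frequent_terms(sample))
```

On this sample the top three terms are commuter, marriages and couples, with two mentions each. Notice what counting misses: "marriage" and "marriages" are tallied separately, and the finding that trust sustains these marriages ranks no higher than any other single mention. That is Wilson's point that any extraction leaves something out.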

8
Subject Analysis in Context
  • Subject analysis should always be done in context
  • Context considerations include:
      • user (children, medical practitioners, etc.)
      • uses (developing egg substitutes, learning how to
        cook)
      • the document itself (the text of a document,
        intended audience, uses, etc.)
      • institution (public library, corporate intranet)
      • administrative and information systems context

9
Mai's Domain-Centered Approach
10
Relevance
  • Taylor's stages in the development of an information
    need:
      • The visceral need
      • The conscious need
      • The formalized need
      • The compromised need
  • Relevance is usually measured against the last of
    these, while ignoring the more complex
    situational aspects that affect the other stages
  • Mai concludes that evaluation should be less
    mechanistic (focused on terminology matches) and
    more humanistic (focused on the visceral needs)
  • This requires contextual analysis and qualitative
    research rather than just precision/recall
    measures

11
Consistency
  • Taylor points out the difficulty of getting
    people to assign similar subjects to objects
  • But when controlled vocabularies and rules for
    selecting subject terms from those vocabularies
    are used, consistency is much better
      • This assumes trained subject indexers
      • Not likely to be the case in most settings other
        than libraries
  • Again points out the need to determine what your
    objectives in building a taxonomy are before you
    make the investment
  • So how do you go about subject indexing?
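One common way to put a number on inter-indexer consistency is to divide the terms two indexers share by all the distinct terms either of them assigned (a common-over-union ratio, often attributed to Hooper; other measures exist). A minimal sketch with made-up term sets; `indexing_consistency` is a hypothetical name:

```python
def indexing_consistency(terms_a: set[str], terms_b: set[str]) -> float:
    """Terms both indexers assigned, divided by all distinct terms either assigned."""
    union = terms_a | terms_b
    if not union:
        return 1.0  # assumption: two empty indexings count as fully consistent
    return len(terms_a & terms_b) / len(union)

indexer_1 = {"Dual-Career-Family", "Marriage", "Job-Satisfaction"}
indexer_2 = {"Dual-Career-Family", "Marriage", "Commuting"}
print(indexing_consistency(indexer_1, indexer_2))  # 2 shared / 4 distinct = 0.5
```

Under this measure, a free-text variant such as "Marriages" would count as a disagreement with "Marriage", which is one reason controlled vocabularies raise consistency.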

12
Definition and Purpose of Subject Indexing
  • Subject indexing is the process or technique of
    identifying and selecting terms (words, phrases,
    sentences, taxonomic categories, notation) used
    in a domain of information to indicate the
    subject content of a resource for users and to
    provide subject access
  • Purposes of subject indexing may be seen in light
    of Cutter's objects of the catalog:
      • To facilitate finding a particular object on the
        basis of its subject content (finding function)
      • To display to a user all of the objects that
        exhibit particular subject content (collocating
        function)
      • To aid a user in the selection of a particular
        object (choice function)
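A subject index is essentially an inverted index from subject terms to objects. The sketch below assumes a hypothetical three-document collection and a hypothetical `build_subject_index` helper, and shows how one structure can serve all three of Cutter's functions:

```python
from collections import defaultdict

# Hypothetical mini-collection: object id -> assigned subject terms.
collection = {
    "doc1": {"Marriage", "Commuting"},
    "doc2": {"Marriage", "Counseling-Techniques"},
    "doc3": {"Job-Satisfaction"},
}

def build_subject_index(records: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert object -> subjects into subject -> objects."""
    index: dict[str, list[str]] = defaultdict(list)
    for obj_id, subjects in records.items():
        for subject in subjects:
            index[subject].append(obj_id)
    return dict(index)

index = build_subject_index(collection)

# Finding function: locate objects via a subject the user already knows.
print(index.get("Job-Satisfaction", []))  # ['doc3']

# Collocating function: gather everything on one subject.
print(index.get("Marriage", []))          # ['doc1', 'doc2']

# Choice function: show each candidate's other subjects to help the user choose.
for obj_id in index.get("Marriage", []):
    print(obj_id, sorted(collection[obj_id]))
```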

13
Rowley Article
  • Trade-off between precision and recall
  • Four eras in indexing:
      • Era 1, pre-computer access: title indexing
      • Era 2, the online age: Cranfield and other retrieval
        studies showed free indexing worked as well as
        controlled indexing in abstract databases
      • Era 3, full-text vs. subject indexing: shown to
        complement each other (Taylor also points out the
        trade-off between summarization for document
        retrieval and depth indexing for information
        retrieval)
      • Era 4, tests with real users instead of
        controlled experiments: difficulty in using
        search interfaces because of complex and varied
        systems
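Precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved. A small sketch with made-up document IDs illustrates the trade-off Rowley describes; `precision_recall` is a hypothetical helper:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = len(retrieved & relevant)
    # Assumption: report 0.0 when a denominator is empty (strictly, undefined).
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"d1", "d2", "d3", "d4"}
narrow = {"d1", "d2"}                          # specific query: few results, all relevant
broad = {"d1", "d2", "d3", "d5", "d6", "d7"}   # broader query: more found, more noise
print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 0.75)
```

Broadening the query raised recall from 0.5 to 0.75 but halved precision, which is the trade-off in miniature.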

14
Types of Subject Indexing: Derived Indexing
  • Derived indexing: in derived indexing, the terms
    used for indexing are limited to those that
    actually appear in the document or resource.
  • Derived indexing may be done manually or
    automatically
  • Search engine indexes are examples of automatic
    derived indexing
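A minimal sketch of automatic derived indexing, assuming a simple lowercase word tokenizer and a tiny hypothetical stopword list: every index term must literally appear in the text. The sample phrase is adapted from the ERIC abstract used later in this module.

```python
import re

def derived_index_terms(text: str, stopwords: set[str]) -> set[str]:
    """Derived indexing: keep only words that actually occur in the text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords}

stop = {"the", "of", "and", "that", "to"}  # hypothetical stopword list
terms = derived_index_terms(
    "Couples that maintain two homes to attain career satisfaction", stop)
print(sorted(terms))
# ['attain', 'career', 'couples', 'homes', 'maintain', 'satisfaction', 'two']
```

The passage is about marriage, yet "marriage" cannot be an index term here, because the word never occurs in it; this is the limitation assigned indexing addresses.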

15
Assigned Indexing
  • Assigned indexing: in assigned indexing, the terms
    used for indexing are not limited to those in the
    object, but may come from the object, the mind of
    the indexer, or a controlled vocabulary
  • There are two types of assigned indexing: free
    indexing and indexing from controlled
    vocabularies

16
Free Indexing
  • In free indexing, the indexer or indexing program
    is free to assign terms from anywhere inside or
    outside the object
      • The indexer may take terms from the object, or
        use any terms that occur to them
  • In some free indexing settings, very detailed
    instructions guide indexers in their selection of
    terms
  • Other settings are much looser: users can pick
    any terms that mean something to them or others
      • Pictures (http://flickr.com)
      • Folksonomies (http://del.icio.us)
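Free indexing by many users, as in a folksonomy, can be summarized by counting how many people applied each tag. This sketch uses made-up tags and does not call any Flickr or del.icio.us API; `aggregate_tags` is hypothetical, and merging case variants is an assumption about how tags should be normalized:

```python
from collections import Counter

# Hypothetical tags that three users assigned to the same photo.
user_tags = [
    ["snow", "winter", "mountains"],
    ["Snow", "skiing"],
    ["snow", "Winter", "holiday"],
]

def aggregate_tags(tag_lists: list[list[str]]) -> Counter:
    """Count how many users applied each tag, case-folded, one vote per user."""
    counts: Counter = Counter()
    for tags in tag_lists:
        counts.update({t.lower() for t in tags})  # a set: one vote per user per tag
    return counts

print(aggregate_tags(user_tags).most_common(2))  # [('snow', 3), ('winter', 2)]
```

Consensus tags rise to the top without any controlled vocabulary, while idiosyncratic tags ("holiday", "skiing") remain as a long tail.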

17
Controlled Vocabulary Indexing
  • In indexing from controlled vocabularies,
    indexers are constrained by the terms that are
    available in lists of terms called controlled
    vocabularies; they must assign one or more
    terms from the controlled vocabulary.
  • Controlled vocabulary indexing is much like
    choosing terms from a very large drop-down menu.
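The "drop-down menu" idea can be sketched as a lookup that accepts only preferred terms and redirects entry terms ("use for" references) to their preferred form. The preferred terms echo descriptors in the sample ERIC record in this module, but the entry-term mappings and the `assign_cv_terms` helper are hypothetical, not taken from the actual ERIC thesaurus:

```python
# Hypothetical controlled vocabulary: preferred terms plus entry ("use for") terms.
PREFERRED = {"Marriage", "Dual-Career-Family", "Job-Satisfaction"}
USE_FOR = {
    "matrimony": "Marriage",
    "married couples": "Marriage",
    "two-career couples": "Dual-Career-Family",
    "work satisfaction": "Job-Satisfaction",
}

def assign_cv_terms(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Map candidate terms onto the CV; return (assigned, rejected)."""
    preferred_by_key = {p.lower(): p for p in PREFERRED}
    assigned: list[str] = []
    rejected: list[str] = []
    for term in candidates:
        key = term.strip().lower()
        if key in preferred_by_key:
            preferred = preferred_by_key[key]
        elif key in USE_FOR:
            preferred = USE_FOR[key]
        else:
            rejected.append(term)  # not on the "menu": cannot be assigned
            continue
        if preferred not in assigned:
            assigned.append(preferred)
    return assigned, rejected

print(assign_cv_terms(["Matrimony", "married couples", "two-career couples", "commuting"]))
# (['Marriage', 'Dual-Career-Family'], ['commuting'])
```

Two different wordings for marriage collapse to one heading, while "commuting" is rejected because the vocabulary has no place for it.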

18
Automatic Indexing
  • In automatic indexing, it is common for indexing
    software applications to use derived indexing
    techniques only, enhanced with word stemming and
    spelling algorithms to improve matching
  • However, more advanced programs are being
    developed that mimic free indexing (e.g., text
    summarization programs)
  • Some advanced automatic indexing programs
    (particularly those in medicine) are making use
    of controlled vocabularies in term selection and
    identification.
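Stemming lets a query word match variant forms in the text. The stemmer below is a deliberately crude, hypothetical suffix stripper for illustration only; real systems use established algorithms such as the Porter stemmer.

```python
import re

def crude_stem(word: str) -> str:
    """Hypothetical stemmer: strip one suffix ('ing' or 's'), then a trailing 'e'."""
    for suffix in ("ing", "s"):
        # Require a stem of at least 3 letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def index_terms(text: str, stem: bool) -> set[str]:
    """Derived index terms, optionally reduced to crude stems."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words} if stem else set(words)

document = "Marriages and commuting couples"
query = "marriage commute couple"
print(index_terms(document, stem=False) & index_terms(query, stem=False))  # set()
print(sorted(index_terms(document, stem=True) & index_terms(query, stem=True)))
# ['commut', 'coupl', 'marriag']
```

Exact matching finds nothing in common; after stemming, all three query words match.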

19
Mai's Conceptions of Indexing
  • Simplistic conception of indexing
      • automatic extraction (derived indexing)
  • Document-oriented indexing
      • focus on the document and document parts
  • Content-oriented indexing
      • focus on the content in the document (still
        document oriented)
  • User-oriented indexing
      • focus on the user and possible uses of the document
  • Requirement-oriented indexing
      • relies on in-depth knowledge of users' uses of
        documents and complete knowledge of context

20
Types of Terms Used in Subject Indexing
  • Words or short phrases
      • descriptors, identifiers, subject headings, or
        keywords
  • Sentences: derived indexing may use whole
    sentences, but this is rarely done; sentences are
    used in some web documents and for derived abstracts
      • abstracts, summaries, or annotations
  • Taxonomic categories (such as the type used in
    the Yahoo directory)
  • Notation (such as the type used in the Dewey
    Decimal Classification)

21
Sample ERIC Indexing Record
  • PERSONAL AUTHOR: Magnuson,-Sandy; Norem,-Ken
  • TITLE: Challenges for Higher Education Couples in
    Commuter Marriages: Insights for Couples and
    Counselors Who Work with Them.
  • PUBLICATION YEAR: 1999
  • SOURCE (JOURNAL CITATION):
    Family-Journal-Counseling-and-Therapy-for-Couples-and-Families;
    v7 n2 p125-34 Apr 1999
  • DOCUMENT TYPE: Journal-Articles (080);
    Reports-Research (143)
  • LANGUAGE: English
  • MAJOR DESCRIPTORS: Counseling-Techniques;
    Dual-Career-Family; Job-Satisfaction;
    Marital-Satisfaction; Marriage-
  • MINOR DESCRIPTORS: Trust-Psychology
  • MAJOR IDENTIFIERS: Career-Commitment
  • MINOR IDENTIFIERS: Quality-Time
  • ABSTRACT: Focuses on the experiences of
    dual-career couples that maintain two homes to
    attain career satisfaction. Findings include
    support for the potential strength and
    satisfaction of commuting relationships. Trust,
    commitment, regular communication, and quality
    shared time were endorsed as factors contributing
    to successful distance marriages. (Author/GCP)
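A record like this is structured data. A minimal sketch, assuming a hypothetical `IndexRecord` class that holds a subset of the fields above (title shortened, some descriptors omitted), shows how the record's own major/minor distinction could let a search favor the central subjects:

```python
from dataclasses import dataclass, field

@dataclass
class IndexRecord:
    """Hypothetical container for a few fields of an indexing record."""
    title: str
    year: int
    major_descriptors: list[str] = field(default_factory=list)
    minor_descriptors: list[str] = field(default_factory=list)
    major_identifiers: list[str] = field(default_factory=list)
    minor_identifiers: list[str] = field(default_factory=list)

    def subject_terms(self, major_only: bool = False) -> set[str]:
        """Descriptors plus identifiers; optionally only the major (central) ones."""
        terms = set(self.major_descriptors) | set(self.major_identifiers)
        if not major_only:
            terms |= set(self.minor_descriptors) | set(self.minor_identifiers)
        return terms

record = IndexRecord(
    title="Challenges for Higher Education Couples in Commuter Marriages",
    year=1999,
    # A subset of the record's major descriptors.
    major_descriptors=["Counseling-Techniques", "Dual-Career-Family",
                       "Job-Satisfaction", "Marital-Satisfaction"],
    minor_descriptors=["Trust-Psychology"],
    major_identifiers=["Career-Commitment"],
    minor_identifiers=["Quality-Time"],
)
print("Trust-Psychology" in record.subject_terms())                 # True
print("Trust-Psychology" in record.subject_terms(major_only=True))  # False
```

Restricting to major terms trades recall for precision: a search on Trust-Psychology would miss this article.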

22
Indexing Non-text Objects
  • Layne discusses the indexing of images and points
    out some useful distinctions
  • Defines four general types of attributes:
      • Biographical
      • Subject
      • Exemplified
      • Relationship
  • While she discusses them in the context of images,
    these attributes can prove useful when indexing
    almost any object

23
Identification of Concepts
  • Taylor lists several concepts that can be helpful
    in teasing out subject terms:
      • Topics
      • Names
          • Persons, corporations, geographic, other
      • Time periods
      • Form (genre)
  • http://isotropic.org/papers/chicken.pdf
  • See the appendix in Taylor for an example and
    checklist

24
Indexing Policies
  • Many indexers are guided by indexing policies
    that determine the types of terms that are
    finally used in indexing
  • Three characteristics of indexing upon which
    indexing policies may be built:
      • Exhaustivity
      • Specific entry (sometimes called specificity,
        but incorrectly)
      • Coextensivity
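Two of these policies can be sketched together. The hierarchy, vocabulary and concept list below are hypothetical. `specific_entry` walks up from a concept to the most specific vocabulary term that covers it, and `max_concepts` stands in for an exhaustivity policy (how many of a document's concepts get indexed at all):

```python
from typing import Optional

# Hypothetical hierarchy: each term maps to its broader term (assumed acyclic).
BROADER = {"Chickens": "Poultry", "Poultry": "Birds", "Birds": "Animals"}

def specific_entry(concept: str, vocabulary: set[str]) -> Optional[str]:
    """Return the most specific vocabulary term covering the concept, if any."""
    term: Optional[str] = concept
    while term is not None:
        if term in vocabulary:
            return term
        term = BROADER.get(term)
    return None

def index_document(concepts: list[str], vocabulary: set[str], max_concepts: int) -> list[str]:
    """Exhaustivity: index only the first max_concepts concepts (assumed ranked)."""
    terms: list[str] = []
    for concept in concepts[:max_concepts]:
        term = specific_entry(concept, vocabulary)
        if term is not None and term not in terms:
            terms.append(term)
    return terms

vocab = {"Animals", "Birds", "Eggs", "Cooking"}
concepts = ["Chickens", "Eggs", "Cooking"]
print(index_document(concepts, vocab, max_concepts=1))  # ['Birds']
print(index_document(concepts, vocab, max_concepts=3))  # ['Birds', 'Eggs', 'Cooking']
```

Here "Birds" is the specific entry for a work about chickens, but it is not coextensive with that subject; a coextensive heading would require "Chickens" itself to be in the vocabulary.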

25
ISO 5963
  • Despite Wilson's assertion that subject analysis
    is impossible, a variety of standards exist
    prescribing how it should be done; the British
    Standard ISO 5963 in your readings this week is
    one of them
  • Viewed from Wilson's or Mai's perspective (and
    your own), what are the problems with this
    standard?

26
(No Transcript)
27
Steps in Free and Assigned Indexing
  1. Identify subject content
  2. Identify disciplinary context or domain (for
     classifications or taxonomies)
  3. Express or describe content (steps 1-3 describe
     the subject analysis process)
  4. Select or create terms and add them to the
     document representation
  5. If working with a controlled vocabulary (CV),
     update and maintain the CV based on the indexing
     experience
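The last step, maintaining the CV from indexing experience, can be partly supported by software. A minimal sketch, assuming indexers log the candidate terms the CV could not accommodate for each document; `propose_new_terms` and the threshold value are hypothetical, and proposals would still need human review:

```python
from collections import Counter

def propose_new_terms(rejected_per_doc: list[list[str]], threshold: int) -> list[str]:
    """Suggest candidate terms that recur across at least `threshold` documents."""
    counts: Counter = Counter()
    for terms in rejected_per_doc:
        counts.update({t.strip().lower() for t in terms})  # one vote per document
    return sorted(t for t, n in counts.items() if n >= threshold)

# Hypothetical log: terms indexers wanted but the CV lacked, per document.
rejected = [["commuting"], ["Commuting", "telework"], ["commuting"]]
print(propose_new_terms(rejected, threshold=2))  # ['commuting']
```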

28
Questions?
  • If not, take a break!!!

29
Exercise 5
  • Purpose is to try different methods of extracting
    concepts from an article, so you can see the
    impact on users
  • Spend the rest of class working through the
    questions in Exercise 5
  • We'll discuss it before the end of class

30
Differences
  • Hopefully, this exercise gave you a chance to see
    a couple of things:
      • How difficult it can be to actually determine
        what something is about
      • How different methods of assigning terms would
        result in very different access for users
  • We didn't throw in Mai's perspective on domain
    indexing in this exercise, which makes it even
    more difficult
  • This is obviously not a simple thing to do well
  • But you are now aware of the issues, and can keep
    them in mind when working in this area

31
Next Week
  • We'll start looking in more detail at controlled
    vocabularies and discuss how they might interact
    with emergent social tagging systems
  • Remember to read assignments BEFORE class
  • Important: your mid-term assignments are due at
    the start of class next Thursday!!