SIMS 296a-3: Current Topics in Information Access - PowerPoint PPT Presentation



1
SIMS 296a-3: Current Topics in Information Access
  • Marti Hearst
  • Fall 98

2
Today
  • Introductions
  • Goals and Course Requirements
  • Administrivia
  • Topics
  • What is Information Access
  • Current Topics (an outline)
  • Intro to IA

3
Goals
  • Become expert on the state-of-the-art in timely
    topics related to information access
  • Begin getting research results.

4
Course Requirements
  • To get S/U credit for the class
  • Lead two discussions
  • Do the readings
  • Attend the meetings

5
Course Requirements
  • To get a grade in the class
  • Do the above
  • Do one of the following (optionally with the help
    of a faculty member and/or another student)
  • Write a publishable survey paper on an emerging
    area of information access.
  • Do research that should lead to a publishable
    research paper on a new idea, method, analysis,
    or vision statement for an emerging area of
    information access.
  • Implement and/or evaluate code to further an
    information access research project.

6
Administrivia
  • Sign up sheet
  • Readings
  • Other questions?

7
Outline
  • What is Information Access?
  • Goals, Tasks, Types of data
  • Standard Information Retrieval
  • Assumptions, Techniques, Evaluation
  • Current Topics
  • Candidate topics

8
What is Information Access?
  • Information Access
  • The process by which users use information
    technology to seek, organize, and understand
    information.
  • Focus: information expressed as text.

9
Information Retrieval
  • Task Statement
  • Build a system that retrieves documents that
    users are likely to find relevant to their
    queries.
  • This set of assumptions underlies the field of
    Information Retrieval.

10
Information Retrieval Assumptions
  • The system has available only pre-existing,
    canned text passages.
  • Its response is limited to selecting from these
    passages and presenting them to the user.
  • It must select, say, 10 or 20 passages out of
    millions or billions!
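
The selection step described above can be sketched in a few lines. This is a minimal illustration, not a real retrieval engine: the scoring rule (count of query-word occurrences), the function names, and the three sample passages are all assumptions made for the example.

```python
import heapq
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_k_passages(query, passages, k=10):
    """Return up to k (score, passage) pairs with the highest scores.

    The score is an assumption made for illustration: the number of
    occurrences of query words in the passage.
    """
    query_terms = set(tokenize(query))
    scored = []
    for passage in passages:
        counts = Counter(tokenize(passage))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, passage))
    # The system can only select from the canned passages; it never writes new text.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

# Hypothetical canned passages.
passages = [
    "Tide tables for Maui and the other Hawaiian islands.",
    "A recipe for potato latkes with applesauce.",
    "Potato farming in Idaho: a history.",
]
results = top_k_passages("potato latkes recipe", passages, k=2)
```

Here the latkes recipe ranks first and the farming passage second; the tide-table passage shares no words with the query and is never returned.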

11
Top 10 Research Issues for IR: What do people want
from IR?
  • By Bruce Croft, D-Lib Magazine, Nov 95
  • Based on observations from work on
    public-domain systems, including
  • THOMAS
  • American Memory Project (Library of Congress)
  • The order of importance does not correspond to
    many IR researchers' priorities.
  • The same can be said for AI researchers.

12
Top 10 Research Issues for IR
  • Bruce Croft, D-Lib Magazine, Nov 95. In
    descending order of importance.
  • Integrated Solutions
  • Distributed IR
  • Efficient, Flexible Indexing and Retrieval
  • Magic (Effective Vocabulary Expansion)
  • Interfaces and Browsing
  • Routing and Filtering
  • Effective Retrieval
  • Multimedia Retrieval
  • Information Extraction
  • Relevance Feedback

13
Other Issues
  • Mundane issues are important
  • Spelling Correction
  • Fast display of initial results
  • Less important but more interesting from many
    researchers' points of view (Bruce Croft, D-Lib
    Magazine, Nov 95)
  • Multilingual IR
  • Data Mining (in text databases)
  • Text Categorization

14
Matching Tasks, Collections, and Search Systems
  • Typical WWW search is not the whole picture.
  • Different information needs require
  • different collections
  • different search systems and strategies
  • Compare
  • general WWW
  • newswire and magazines
  • medical journal articles

15
Match Task and Search Type
  • WWW Tasks (from
    www.cnet.com/Content/Reviews/Compare/Seach/ss1a.html)
  • Find how-to pages for Doom.
  • Purchase plane tickets and hotel for a trip to
    Java.
  • Find the top five all-time scoring leaders in the
    National Hockey League.
  • Find a recipe for potato latkes.
  • Find the tide tables for Maui.
  • Characteristics
  • Timely and specific; before the WWW, found via help
    from human agents and in well-known resources.

16
Match Task and Search Type
  • Newswire and Magazine Tasks (from the TREC
    collection)
  • Find articles on research into cures for
    osteoporosis.
  • Find articles on the effects of recycling of
    tires on the environment.
  • Find information on jail and prison overcrowding
    and how inmates are forced to cope with those
    conditions.
  • Find discussion of an existing or proposed
    insurance plan (governmental, commercial or
    individual) and the coverage it provides for long
    term care confinements in an institution.
  • Characteristics
  • Complex combinations of topics.
  • Research-oriented
  • Either timely or retrospective

17
Match Task and Search Type
  • MEDLINE Tasks (from OHSUMED,
    medir.ohsu.edu/pub/ohsumed)
  • Are there adverse effects on lipids when
    progesterone is given with estrogen replacement
    therapy?
  • Pathophysiology and treatment of disseminated
    intravascular coagulation.
  • Reviews on subdurals in the elderly.
  • Effectiveness of etidronate in treating
    hypercalcemia of malignancy.
  • Characteristics
  • Research-oriented
  • Technical
  • Cause and Effect, Implications

18
The Problem of Information Access
  • Main problem
  • Computers can't understand natural language.
  • Therefore
  • Information access systems must guide users to
    information of interest by approximate methods.
  • General common methods
  • word match
  • topic directories
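
As a rough illustration of word match as an approximate method, the sketch below builds a small inverted index and answers a query with Boolean AND. The function names and the three sample documents are hypothetical, chosen only to show where exact word matching falls short.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def word_match(index, query):
    """Return ids of documents that contain every query word (Boolean AND)."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical documents for illustration.
docs = {
    1: "Effects of recycling tires on the environment",
    2: "Tire recycling plants and air quality",
    3: "Osteoporosis research news",
}
index = build_index(docs)
hits = word_match(index, "recycling tires")
```

Only document 1 is returned. Document 2 is on the same topic but says "tire" rather than "tires", so exact word match misses it.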

19
Why Text is Tough
  • Abstract concepts difficult to represent
  • (AI-Complete)
  • Countless combinations of subtle, abstract
    relationships among concepts
  • Many ways to represent similar concepts
  • space ship, flying saucer, UFO, figment of
    imagination
  • Concepts are difficult to visualize
  • High dimensionality
  • Tens or hundreds of thousands of features
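
A small sketch can make two of these points concrete, under the common assumption that each distinct word is treated as one feature (a bag-of-words representation; the function names are hypothetical). Four short phrases for a similar concept already need eight features, and no two of them share a single one.

```python
import re

def build_vocabulary(docs):
    """Assign each distinct lowercased word a feature index."""
    vocab = {}
    for text in docs:
        for word in re.findall(r"[a-z]+", text.lower()):
            vocab.setdefault(word, len(vocab))
    return vocab

def to_vector(text, vocab):
    """Represent text as a list of word counts, one slot per vocabulary word."""
    vector = [0] * len(vocab)
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in vocab:
            vector[vocab[word]] += 1
    return vector

def dot(u, v):
    """Dot product of two equal-length count vectors."""
    return sum(a * b for a, b in zip(u, v))

docs = ["space ship", "flying saucer", "UFO", "figment of imagination"]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
overlap = dot(vectors[0], vectors[1])  # "space ship" vs. "flying saucer"
```

The overlap is zero even though the phrases are near-synonyms. On a real collection the vocabulary, and so the vector length, grows to tens or hundreds of thousands of entries.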

20
Why Text is Tough
  • I saw Pathfinder on Mars with a telescope.
  • Pathfinder photographed Mars.
  • The Pathfinder photograph mars our perception of
    a lifeless planet.
  • The Pathfinder photograph from Ford has arrived.
  • The Pathfinder forded the river without marring
    its paint job.
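
The five sentences above can be run through a deliberately crude word matcher to see the problem. The sketch assumes case folding plus truncation of each word to a four-letter prefix as a stand-in for stemming; that choice and the function names are illustrative only.

```python
import re

SENTENCES = [
    "I saw Pathfinder on Mars with a telescope.",
    "Pathfinder photographed Mars.",
    "The Pathfinder photograph mars our perception of a lifeless planet.",
    "The Pathfinder photograph from Ford has arrived.",
    "The Pathfinder forded the river without marring its paint job.",
]

def crude_terms(sentence, prefix_len=4):
    """Lowercase each word and truncate it to a fixed-length prefix.

    Prefix truncation is a deliberately crude stand-in for stemming.
    """
    return {w[:prefix_len] for w in re.findall(r"[a-z]+", sentence.lower())}

def matches(query, sentence):
    """True if every crude query term also appears in the sentence."""
    return crude_terms(query) <= crude_terms(sentence)

hits_mars = [s for s in SENTENCES if matches("Pathfinder Mars", s)]
hits_ford = [s for s in SENTENCES if matches("Pathfinder Ford", s)]
```

The query "Pathfinder Mars" matches the first three sentences, including the one where "mars" is a verb. The query "Pathfinder Ford" matches the last two, one about the car maker and one about crossing a river. The matcher has no way to tell these senses apart.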

21
Outline
  • What is Information Access?
  • Goals, Tasks, Types of data
  • Standard Information Retrieval
  • Assumptions, Techniques, Evaluation
  • Current Topics
  • Candidate topics
  • User Interfaces
  • Quality Assessment
  • Text Data Mining
  • Student suggestions

22
Tools for Information Access
  • User Interfaces
  • (information visualization)
  • Information Access
  • (information retrieval)
  • Language and Task Analysis
  • Content Analysis

23
Current Topics
  • User Interfaces
  • Incorporating personal information
  • Automated Agents vs. User Initiated Steps
  • Support for the dynamic process of information
    access
  • How to organize large search results
  • Categories, clusters, combinations of these
  • Question Answering
  • Others?

24
Current Topics
  • Quality Assessment
  • Issues
  • How to define quality
  • Rating methods
  • Different fields (medicine, business)
  • Techniques
  • Visitation patterns and times
  • Social techniques
  • Link structure (co-citation patterns)
  • Link structure content
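
One way to read "co-citation patterns" is to count how often two pages are linked from the same source page. The sketch below does that for a tiny hypothetical link graph; the page names and the function name are made up for the example, and real quality measures would need far more than raw counts.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(links):
    """Count how often each pair of pages is cited together.

    `links` maps a citing page to the set of pages it links to.
    Two pages are co-cited once for every page that links to both.
    """
    counts = Counter()
    for targets in links.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(targets), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical link graph.
links = {
    "page1": {"A", "B", "C"},
    "page2": {"A", "B"},
    "page3": {"B", "C"},
}
counts = cocitation_counts(links)
```

In this graph A and B are co-cited twice, B and C twice, and A and C once. Pairs with high counts are candidates for being related in topic or standing.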

25
Current Topics
  • Text Data Mining
  • Visualizing the contents of large text
    collections
  • Automatically discovering associations within
    text collections
  • Discovering useful patterns
  • Spotting anomalies
  • Finding chains of associated information
  • I have a proposal for this

26
Current Topics
  • Cognitive modeling/AI techniques
  • Your idea goes here

27
For Next Time
  • Do background reading
  • Think about which topics to pursue
  • I will present more background information