1
Query expansion techniques
  • Mark Sanderson
  • Porto, 2000

2
Aims
  • To cover query modification techniques, both
    automatic and manual.

3
Objectives
  • At the end of this lecture you will be able to:
  • Select good terms from a set of relevant
    documents
  • Describe how pseudo-relevance feedback works

4
Why?
  • These approaches are found in common research
    and commercial systems.
  • To show how the techniques build on each other

5
Relevance feedback
  • User types in query
  • the possibility of extra-terrestrial life
  • Gets back documents
  • Tells system some of them are relevant
  • Modify the query to match the user's wishes
  • How?
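The modification step sketched on this slide is classically done with Rocchio's formula (not named on the slide, but the standard relevance-feedback method): move the query vector toward the judged-relevant documents and away from the non-relevant ones. A minimal sketch, with illustrative default weights:

```python
# Minimal sketch of Rocchio relevance feedback. The weights
# alpha/beta/gamma are illustrative defaults, not values from the slides.
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """All inputs map terms to weights; documents are dicts in two lists."""
    terms = set(query)
    for d in relevant + non_relevant:
        terms |= set(d)
    new_query = {}
    for t in terms:
        # Average weight of the term in relevant / non-relevant documents.
        rel = sum(d.get(t, 0.0) for d in relevant) / max(len(relevant), 1)
        non = sum(d.get(t, 0.0) for d in non_relevant) / max(len(non_relevant), 1)
        w = alpha * query.get(t, 0.0) + beta * rel - gamma * non
        if w > 0:           # keep only positively weighted terms
            new_query[t] = w
    return new_query
```

Terms that occur in the relevant documents but not in the original query (like "seti" for the query above) enter the query with weight `beta * average`, which is how expansion happens.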

6
Which terms do you pick?
  • From the newspaper articles, the user selects 4
    as relevant

the, seti, planet, radio, ...
the, seti, radio, galaxy, ...
seti, the, planet, ...
the, universe, alien, seti, ...
7
IDF differences
  • Compute term IDF in the non-relevant documents
  • Approximate this with the main collection
  • Compute IDF in the relevant set
  • Rank terms by the difference
  • Add the top n terms to the query
  • It really works
  • Harman, D. (1992) Relevance feedback revisited,
    in Proceedings of the 15th Annual International
    ACM SIGIR conference on Research and development
    in information retrieval 1-10
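The ranking step above can be sketched as follows. This is only an illustration of the idea of scoring candidate terms by how much more informative they are in the relevant set than in the collection overall; the exact weighting in Harman (1992) differs.

```python
import math

# Hedged sketch of the "IDF difference" idea: score each candidate term by
# its IDF in the whole collection minus its IDF in the relevant documents.
# A term that is common among relevant documents but rare overall scores high.
def expansion_terms(relevant_docs, all_docs, n=5):
    """Each document is a set of terms; returns the top-n candidate terms."""
    def idf(term, docs):
        df = sum(1 for d in docs if term in d)
        return math.log((len(docs) + 1) / (df + 1))
    candidates = {t for d in relevant_docs for t in d}
    scored = {t: idf(t, all_docs) - idf(t, relevant_docs)
              for t in candidates}
    return sorted(scored, key=scored.get, reverse=True)[:n]
```

In practice the collection-wide IDF stands in for the non-relevant set, exactly as the slide suggests, because almost all documents are non-relevant.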

8
Expansion before retrieval?
  • Query expansion is so good, can we use it at
    other times?
  • Global analysis
  • Qiu, Y., Frei, H.P. (1993) Concept based query
    expansion, in Proceedings of the 16th annual
    international ACM SIGIR conference on Research
    and Development in Information Retrieval, ACM
    Press 160-170
  • Local analysis
  • (Pseudo/local) relevance feedback
  • Local Context Analysis (LCA)

9
Pseudo-relevance feedback
  • Assume top ranked documents relevant
  • Automatically mark as relevant
  • Maybe others as non-relevant
  • Expand query
  • Do another retrieval
  • Use top ranked passages
  • Xu, J., Croft, W.B. (1996) Query Expansion Using
    Local and Global Document Analysis, in
    Proceedings of the 19th annual international ACM
    SIGIR conference on Research and development in
    information retrieval 4-11
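The loop described on this slide can be sketched as below. The `search` and `expand` arguments are hypothetical helpers standing in for a real retrieval engine and a term-selection method (for example, the IDF-difference ranking from slide 7):

```python
# Hedged sketch of the pseudo-relevance feedback loop.
def pseudo_relevance_feedback(query, search, expand, k=10, n_terms=5):
    # 1. Initial retrieval with the original query.
    ranked = search(query)
    # 2. Assume the top-k documents are relevant -- the key assumption.
    pseudo_relevant = ranked[:k]
    # 3. Select expansion terms from those documents and add them.
    expanded_query = query + expand(pseudo_relevant, n_terms)
    # 4. Retrieve again with the expanded query.
    return search(expanded_query)
```

Note that nothing here consults the user: if the top-k documents are not in fact relevant, the expansion terms pull the query off-topic, which is the "query drift" risk raised on slide 11.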

10
Example
  • Reporting on possibility of and search for
    extra-terrestrial life/intelligence.
  • extraterrestrials, planetary society, universe,
    civilization, planet, radio signal, seti, sagan,
    search, earth, extraterrestrial intelligence,
    alien, astronomer, star, radio receiver, nasa,
    earthlings, e.t., galaxy, life, intelligence,
    meta receiver, radio search, discovery, northern
    hemisphere, national aeronautics, jet propulsion
    laboratory, soup, space, radio frequency, radio
    wave, klein, receiver, comet, steven spielberg,
    telescope, scientist, signal, mars, moises
    bermudez, extra terrestrial, harvard university,
    water hole, space administration, message,
    creature, astronomer carl sagan, intelligent
    life, meta ii, radioastronomy, meta project,
    cosmos, argentina, trillions, raul colomb, ufos,
    meta, evidence, ames research center, california
    institute, history, hydrogen atom, columbus
    discovery, hypothesis, third kind, institute,
    mop, chance, film, signs

11
Does it work?
  • It works (mostly)
  • Remember the assumption that the top-ranked
    documents are relevant
  • Risk of query drift?

12
Other methods?
  • Latent Semantic Indexing
  • Rough overview

13
Toy collection
14
Words are not enough?
  • Query (or document) cosmonaut
  • Document astronaut, moon
  • Should they match?
  • In a conventional system, no
  • Maybe they should?
  • How?

15
Latent Semantic Indexing
  • Find terms that commonly co-occur in the
    collection
  • Merge them together
  • How?
  • Good explanation in
  • Manning, C.D., Schütze, H., Foundations of
    Statistical Natural Language Processing, MIT
    Press, 1999

16
Collapse the dimensions
  • Toy collection
  • 5D space
  • 6 document vectors
  • Reduce the number of dimensions
  • Standard ways of doing this
  • Singular Value Decomposition (SVD)
  • used in LSI
  • Principal Components Analysis
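The reduction can be sketched with NumPy's SVD. The toy 5-term by 6-document matrix below is modeled on the cosmonaut/astronaut example in the Manning and Schütze chapter cited above; treat it as an illustration, not the exact matrix from the slides.

```python
import numpy as np

# Term-document matrix: rows are terms, columns are documents d1..d6.
A = np.array([
    [1, 0, 1, 0, 0, 0],   # cosmonaut
    [0, 1, 0, 0, 0, 0],   # astronaut
    [1, 1, 0, 0, 0, 0],   # moon
    [1, 0, 0, 1, 1, 0],   # car
    [0, 0, 0, 1, 0, 1],   # truck
], dtype=float)

# Singular Value Decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular values to collapse the 5D term space.
k = 2
docs_2d = np.diag(s[:k]) @ Vt[:k, :]   # each column is a document in 2D
print(docs_2d.shape)   # (2, 6)
```

After the reduction, documents sharing co-occurring terms (cosmonaut/astronaut via moon) end up close together even when they share no terms directly.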

17
Collapse down to 2D
(Figure: the six document vectors D1-D6 plotted in the reduced 2D space)
18
Problems with LSI?
  • Computing the reduced dimensions is very
    computationally intensive
  • Pseudo-relevance feedback does a similar job
  • and is much less computationally intensive
  • Why?

19
Conclusion
  • You can now
  • Select good terms from a set of relevant
    documents
  • Describe how pseudo-relevance feedback works