Relevance Feedback and other Query Modification Techniques

Provided by: chinshe


1
Relevance Feedback and other Query Modification
Techniques
  • d9142801
  • d9142803

2
Introduction
  • Precision vs. recall
  • When a high recall ratio is critical to users,
    they need to retrieve more of the relevant
    documents.
  • Methods to retrieve more relevant documents:
      • Expand the search by broadening a narrow
        Boolean query or by looking further down a
        ranked list of retrieved documents.
      • Modify the original query.
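To make the trade-off concrete, here is a minimal Python sketch that measures precision and recall at several cut-offs k of a ranked list. The document IDs and relevance judgments are invented for illustration. Looking further down the list raises recall, but a relevant document indexed under different terms (the hypothetical d8) is never reached, which is what motivates modifying the query.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Return (precision, recall) for the top-k documents of a ranked list."""
    top_k = ranking[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output and relevance judgments.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d3", "d1", "d2", "d8"}  # d8 is never retrieved

for k in (2, 4, 6):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Here recall grows from 0.25 (k=2) to 0.75 (k=6) while precision stays at 0.50, and recall cannot reach 1.0 no matter how far down the list the user looks.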

3
Introduction (contd)
  • Word mismatch problem
      • Some of the unretrieved relevant documents are
        indexed by a different set of terms than those in
        the query or in most of the other relevant
        documents.
  • Approaches for improving the initial query
      • Relevance feedback
      • Automatic query modification

4
Conceptual Model of Relevance Feedback
  • Query → Result Set → User Relevance Feedback →
    New Query Based on Result Set → Result Set (the
    cycle can be repeated)

5
Basic Ideas about Relevance Feedback
  • Two components of relevance feedback
      • Reweighting of query terms based on the
        distribution of these terms in the relevant and
        nonrelevant documents retrieved in response to
        the query
      • Changing the actual terms in the query

6
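Basic Ideas about Relevance Feedback (contd)
  • Evaluation of relevance feedback
      • Comparing the results after one iteration of
        feedback against those using no feedback
        generally shows spectacular improvement, partly
        because the relevant documents the user has
        already seen are moved to the top of the ranking.
      • An alternative evaluation compares only the
        residual collections, i.e., the collection with
        the documents already judged by the user removed.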
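The residual-collection idea can be sketched in a few lines of Python. The rankings and judgments below are hypothetical: the user judges the top three documents of the initial run, and the feedback run is simply assumed. Removing the judged documents from both runs before comparing keeps already-seen relevant documents from inflating the gain.

```python
def residual(ranking, judged):
    """Remove documents the user has already seen and judged."""
    return [doc for doc in ranking if doc not in judged]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


relevant = {"d1", "d2", "d5", "d8"}
initial = ["d1", "d4", "d2", "d6", "d7", "d5", "d3", "d8"]
judged = set(initial[:3])  # the user judged the top 3: d1, d4, d2
# Hypothetical ranking produced by the feedback query.
feedback = ["d1", "d2", "d5", "d8", "d4", "d6", "d7", "d3"]

# Full-collection comparison: the already-judged d1 and d2 count again.
print(precision_at_k(initial, relevant, 3), precision_at_k(feedback, relevant, 3))
# Residual-collection comparison: only unseen documents count.
print(precision_at_k(residual(initial, judged), relevant, 3),
      precision_at_k(residual(feedback, judged), relevant, 3))
```

On the full collection, precision at 3 rises from 2/3 to 1.0; on the residual collection it rises from 1/3 to 2/3, a more honest measure of what feedback found that the user had not already seen.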

7
Basic approach to Relevance Feedback
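  • Rocchio's approach uses the vector space model to
    rank documents: the query vector is moved toward
    the centroid of the relevant document vectors and
    away from the centroid of the nonrelevant ones.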
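A minimal sketch of Rocchio-style query modification, assuming sparse term-weight vectors stored as Python dicts. It uses the commonly cited form Q1 = Q0 + (β/n1)·ΣR − (γ/n2)·ΣS, where R and S are the relevant and nonrelevant document vectors and n1, n2 their counts; with β = γ = 1 this is Rocchio's original normalized formula. Dropping terms whose weight falls to zero or below is a common practical convention and an assumption here.

```python
from collections import defaultdict

Vector = dict[str, float]  # sparse vector: term -> weight


def rocchio(q0: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            beta: float = 1.0, gamma: float = 1.0) -> Vector:
    """Move q0 toward the relevant centroid and away from the nonrelevant one."""
    q1 = defaultdict(float, q0)
    for doc in relevant:
        for term, w in doc.items():
            q1[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            q1[term] -= gamma * w / len(nonrelevant)
    # Assumption: terms whose weight drops to zero or below are removed.
    return {term: w for term, w in q1.items() if w > 0}


q0 = {"jaguar": 1.0, "speed": 1.0}
relevant = [{"jaguar": 0.5, "cat": 1.0}, {"jaguar": 1.0, "cat": 0.5, "habitat": 0.5}]
nonrelevant = [{"jaguar": 0.5, "car": 1.0}]
print(rocchio(q0, relevant, nonrelevant))
# {'jaguar': 1.25, 'speed': 1.0, 'cat': 0.75, 'habitat': 0.25}
```

The new terms "cat" and "habitat" enter the query (expansion), existing weights change (reweighting), and "car" from the nonrelevant document ends up negative and is dropped.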

8
  • Ide developed three particular strategies
    extending Rocchio's approach:
      • The basic Rocchio formula, minus the
        normalization for the number of relevant and
        nonrelevant documents
      • Feedback allowed only from relevant documents
      • Limited negative feedback allowed only from the
        highest-ranked nonrelevant document
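The three Ide strategies listed on this slide differ only in which document vectors are added to or subtracted from the query, without dividing by the number of documents. A minimal Python sketch, again assuming dict-based sparse vectors and clipping of non-positive weights (an assumption); the function names, including the hypothetical `ide_positive_only`, are illustrative.

```python
from collections import defaultdict

Vector = dict[str, float]  # sparse vector: term -> weight


def _modify(q0: Vector, add: list[Vector], subtract: list[Vector]) -> Vector:
    """Add/subtract raw document vectors (no normalization by set size)."""
    q1 = defaultdict(float, q0)
    for doc in add:
        for term, w in doc.items():
            q1[term] += w
    for doc in subtract:
        for term, w in doc.items():
            q1[term] -= w
    return {term: w for term, w in q1.items() if w > 0}  # assumed clipping


def ide_regular(q0, relevant, nonrelevant):
    # Rocchio without normalization: Q0 + sum(R) - sum(S)
    return _modify(q0, relevant, nonrelevant)


def ide_positive_only(q0, relevant):
    # Feedback from relevant documents only: Q0 + sum(R)
    return _modify(q0, relevant, [])


def ide_dec_hi(q0, relevant, nonrelevant_ranked):
    # Only the highest-ranked nonrelevant document is subtracted.
    return _modify(q0, relevant, nonrelevant_ranked[:1])


q0 = {"jaguar": 1.0, "speed": 1.0}
relevant = [{"jaguar": 0.5, "cat": 1.0}, {"jaguar": 1.0, "cat": 0.5, "habitat": 0.5}]
# Nonrelevant documents in the order they were ranked.
nonrelevant_ranked = [{"jaguar": 0.5, "car": 1.0}, {"speed": 0.5, "engine": 1.0}]
print(ide_regular(q0, relevant, nonrelevant_ranked))
print(ide_dec_hi(q0, relevant, nonrelevant_ranked))
```

In this toy data, Ide regular lowers "speed" to 0.5 because the second nonrelevant document also contains it, while dec-hi leaves "speed" at 1.0 because only the top-ranked nonrelevant document is used.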

9
Term reweighting without Query Expansion
  • A probabilistic model proposed by Robertson and
    Sparck Jones (1976)

$$W_{ij} = \log \frac{r / (R - r)}{(n - r) / (N - n - R + r)}$$

where
  W_ij = the term weight for term i in query j
  r = the number of relevant documents for query j having term i
  R = the total number of relevant documents for query j
  n = the number of documents in the collection having term i
  N = the number of documents in the collection

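A minimal sketch of the Robertson–Sparck Jones relevance weight, W = log[(r/(R − r)) / ((n − r)/(N − n − R + r))], for a single query term. The optional additive `correction` (0.5 is a common choice in the literature) is an addition not shown on the slide; it avoids division by zero or log(0) when, for example, r = 0 or r = R.

```python
import math


def rsj_weight(r: int, R: int, n: int, N: int, correction: float = 0.0) -> float:
    """Robertson-Sparck Jones relevance weight for one query term.

    r: relevant documents containing the term
    R: relevant documents for the query
    n: documents in the collection containing the term
    N: documents in the collection
    correction: additive smoothing; 0.0 gives the plain formula.
    """
    c = correction
    relevant_odds = (r + c) / (R - r + c)
    nonrelevant_odds = (n - r + c) / (N - n - R + r + c)
    return math.log(relevant_odds / nonrelevant_odds)


# A term in 6 of 10 relevant documents and 50 of 1000 documents overall.
print(rsj_weight(6, 10, 50, 1000))  # log(32.25), about 3.47
# A term absent from every relevant document gets a negative weight.
print(rsj_weight(0, 10, 50, 1000, correction=0.5))
```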
10
Term reweighting without Query Expansion (contd)
  • Croft (1983) extended this weighting scheme to
    take within-document term frequency into account,
    with separate weights for:
      • the initial search
      • feedback

  W_ijk = the term weight for term i in query j and document k
  IDF_i = the IDF weight for term i in the entire collection
  P_ij = the probability that term i is assigned within the set of relevant documents for query j
  Q_ij = the probability that term i is assigned within the set of nonrelevant documents for query j
  F_ik = K + (1 - K)(freq_ik / maxfreq_k)
  freq_ik = the frequency of term i in document k
  maxfreq_k = the maximum frequency of any term in document k

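A minimal Python sketch of Croft's scheme, assuming the forms in which it is usually quoted: initial search W_ijk = (C + IDF_i)·F_ik and feedback W_ijk = (C + log[P_ij(1 − Q_ij) / (Q_ij(1 − P_ij))])·F_ik, with F_ik = K + (1 − K)·freq_ik/maxfreq_k. C and K are tuning constants; the values used in the example are arbitrary illustrations, and how P_ij and Q_ij are estimated is left to the caller.

```python
import math


def within_doc_factor(freq_ik: int, maxfreq_k: int, K: float) -> float:
    """F_ik = K + (1 - K) * freq_ik / maxfreq_k (normalized term frequency)."""
    return K + (1 - K) * freq_ik / maxfreq_k


def initial_weight(idf_i: float, freq_ik: int, maxfreq_k: int,
                   C: float, K: float) -> float:
    """Initial search: (C + IDF_i) * F_ik."""
    return (C + idf_i) * within_doc_factor(freq_ik, maxfreq_k, K)


def feedback_weight(p_ij: float, q_ij: float, freq_ik: int, maxfreq_k: int,
                    C: float, K: float) -> float:
    """Feedback: (C + log(P_ij(1 - Q_ij) / (Q_ij(1 - P_ij)))) * F_ik.

    p_ij and q_ij must lie strictly between 0 and 1.
    """
    log_odds = math.log(p_ij * (1 - q_ij) / (q_ij * (1 - p_ij)))
    return (C + log_odds) * within_doc_factor(freq_ik, maxfreq_k, K)


# Term occurs twice in a document whose most frequent term occurs 4 times.
print(initial_weight(2.0, 2, 4, C=0.0, K=0.5))             # 1.5
print(feedback_weight(0.6, 0.1, 2, 4, C=0.0, K=0.5))       # 0.75 * log(13.5)
```

K controls how much within-document frequency matters: K = 1 ignores it entirely, while K = 0 makes the weight proportional to the normalized frequency.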
11
Query Expansion
  • The query could be expanded by
      • offering users a selection of the terms most
        closely related to the initial query terms
        (e.g., from a thesaurus), or
      • presenting users with a sorted list of terms from
        the relevant documents or from all retrieved
        documents

12
Query Expansion (contd)
  • Terms from the relevant/nonrelevant documents are
    ranked and proposed as a list; then either
      • the user selects from the top N terms, or
      • the top terms are automatically added to the
        query.
  • The early SMART experiments both expanded the
    query and reweighted the query terms by combining
    the query with the vectors of the relevant and
    nonrelevant documents (as in Rocchio's formula).
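As one illustration of ranking candidate expansion terms, the Python sketch below scores each non-query term found in the user-marked relevant documents by the number of relevant documents containing it times its IDF, log(N/n). This particular score is an assumption chosen for simplicity, not the ranking method of any specific system, and the documents and frequencies are invented. The top N terms would then be shown to the user or added automatically.

```python
import math
from collections import Counter


def rank_expansion_terms(relevant_docs: list[list[str]],
                         collection_df: dict[str, int], N: int,
                         query_terms: set[str], top_n: int = 5) -> list[str]:
    """Rank terms from relevant documents as expansion candidates."""
    rel_df = Counter()  # number of relevant documents containing each term
    for doc in relevant_docs:
        rel_df.update(set(doc))
    scores = {}
    for term, r in rel_df.items():
        if term in query_terms:
            continue  # already in the query
        scores[term] = r * math.log(N / collection_df[term])
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


relevant_docs = [["jaguar", "cat", "habitat", "rainforest"],
                 ["jaguar", "cat", "prey"],
                 ["jaguar", "rainforest", "cat"]]
collection_df = {"jaguar": 20, "cat": 50, "habitat": 40, "rainforest": 10, "prey": 30}
print(rank_expansion_terms(relevant_docs, collection_df, 1000, {"jaguar"}, top_n=2))
# ['rainforest', 'cat']
```

"cat" appears in more relevant documents, but "rainforest" is rarer in the collection, so its higher IDF puts it first under this score.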

13
Query Expansion (contd)
  • Modifying the query with terms from
    relevant/nonrelevant documents
      • Any relevant document(s) can be used as a new
        query (Noreault, 1979).
      • If no relevant documents are indicated, the term
        list shown to the user is the list of related
        terms based on those previously sorted in the
        inverted file.

14
Query Expansion with Term Reweighting
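  • Most relevance feedback and query expansion
    research has combined query expansion with term
    reweighting.
  • Three of the most widely used feedback methods:
      • Ide regular
        $$Q_1 = Q_0 + \sum_{j=1}^{n_1} R_j - \sum_{j=1}^{n_2} S_j$$

15
Query Expansion with Term Reweighting (contd)
      • Ide dec-hi
        $$Q_1 = Q_0 + \sum_{j=1}^{n_1} R_j - S_i$$
      • Standard Rocchio
        $$Q_1 = Q_0 + \beta \sum_{j=1}^{n_1} \frac{R_j}{n_1} - \gamma \sum_{j=1}^{n_2} \frac{S_j}{n_2}$$

where
  Q_0, Q_1 = the initial and the modified query vectors
  R_j = the vector of the j-th relevant document
  S_j = the vector of the j-th nonrelevant document
  n_1, n_2 = the numbers of relevant and nonrelevant documents
  β, γ = constants weighting the positive and negative feedback
  S_i = the top-ranked nonrelevant document (Ide dec-hi)
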
16
Automatic Query Modification
  • The major disadvantage of relevance feedback is
    that it increases the burden on the users [X97].
  • Approaches for automatic query modification
      • Local feedback
      • Automatic query expansion
          • Dictionary-based
          • Global analysis
          • Local analysis

17
Local Feedback
  • Local feedback is similar to relevance feedback.
  • Difference: the top-ranked documents are assumed
    to be relevant, without human judgment.
  • It saves the cost of relevance judgments, but it
    can result in poor retrieval if the top-ranked
    documents are nonrelevant.
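A minimal Python sketch of local (pseudo-relevance) feedback: rank with the original query, assume the top k documents are relevant, add their centroid (scaled by β) to the query, and rank again. The inner-product scoring, raw term frequencies, k = 2 and β = 0.5 are all assumptions for illustration, and the toy documents are invented.

```python
from collections import defaultdict

Doc = dict[str, int]  # term -> frequency


def score(query: dict[str, float], doc: Doc) -> float:
    """Inner-product similarity between a query and a document."""
    return sum(w * doc.get(term, 0) for term, w in query.items())


def rank(query: dict[str, float], docs: dict[str, Doc]) -> list[str]:
    """Document IDs sorted by decreasing score (ties broken by ID)."""
    return sorted(docs, key=lambda d: (-score(query, docs[d]), d))


def local_feedback(query: dict[str, float], docs: dict[str, Doc],
                   k: int = 2, beta: float = 0.5) -> dict[str, float]:
    """Treat the top-k documents as relevant and add their centroid to the query."""
    top = rank(query, docs)[:k]  # assumed relevant, no human judgment
    new_query = defaultdict(float, query)
    for d in top:
        for term, f in docs[d].items():
            new_query[term] += beta * f / k
    return dict(new_query)


docs = {
    "d1": {"jaguar": 2, "cat": 1, "rainforest": 1},
    "d2": {"jaguar": 1, "cat": 2},
    "d3": {"cat": 2, "rainforest": 2},           # about the animal, no "jaguar"
    "d4": {"jaguar": 1, "car": 2, "engine": 1},  # about the car
}
query = {"jaguar": 1.0}
print(rank(query, docs))                         # ['d1', 'd2', 'd4', 'd3']
print(rank(local_feedback(query, docs), docs))   # ['d1', 'd2', 'd3', 'd4']
```

Terms borrowed from d1 and d2 lift d3 above d4 without any user judgment. Had d4 been ranked in the top k, "car" and "engine" would have been added instead, which is exactly the failure mode noted on the slide.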

18
Automatic Query Expansion
  • Basic idea
      • A user query is expanded by adding semantically
        similar and/or statistically associated terms,
        each with a corresponding weight.
      • Thesauri are needed for similarity judgments.
  • Two approaches to thesaurus construction
      • Manual thesauri
      • Automatic thesauri

19
Dictionary-based Query Expansion
  • Based on manual thesauri (e.g., WordNet [M95]).
  • In the expansion process, synonyms of the initial
    query terms (or words in other semantic relations
    to them) are selected, and each is assigned a
    weight.
  • Disadvantages
      • Constructing a manual thesaurus requires a lot
        of human labor.
      • A general manual thesaurus does not consistently
        improve retrieval performance.
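A minimal Python sketch of dictionary-based expansion. To stay self-contained it uses a tiny hypothetical hand-built thesaurus with WordNet-style relations instead of WordNet itself, and the per-relation weights (0.8 for synonyms, 0.5 for hypernyms) are arbitrary assumptions.

```python
# Hypothetical hand-built thesaurus in the spirit of WordNet relations.
THESAURUS: dict[str, list[tuple[str, str]]] = {
    "car": [("automobile", "synonym"), ("motor vehicle", "hypernym")],
    "fast": [("quick", "synonym"), ("rapid", "synonym")],
}
# Arbitrary illustrative weights per semantic relation.
RELATION_WEIGHT = {"synonym": 0.8, "hypernym": 0.5}


def expand_query(query_terms: list[str]) -> dict[str, float]:
    """Original terms keep weight 1.0; related terms get a relation-based weight."""
    expanded = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for related, relation in THESAURUS.get(term, []):
            weight = RELATION_WEIGHT.get(relation, 0.0)
            # Keep the strongest weight if a term is reached more than once.
            if weight > expanded.get(related, 0.0):
                expanded[related] = weight
    return expanded


print(expand_query(["fast", "car"]))
# {'fast': 1.0, 'car': 1.0, 'quick': 0.8, 'rapid': 0.8,
#  'automobile': 0.8, 'motor vehicle': 0.5}
```

Because the related words are fixed in advance and ignore the rest of the query, a general thesaurus can add senses the user did not mean, which is one reason the slide notes it does not consistently improve retrieval.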

20
Example - WordNet
21
Automatic Thesauri Construction Approach
  • Thesauri are constructed from the whole data
    corpus (or a part of it).
  • Basic idea of automatic thesaurus construction
      • Term co-occurrence
  • Methods of automatic thesaurus construction
      • Traditional TF×IDF [Y02]
      • A variant of TF×IDF (i.e., the similarity
        thesaurus [QF93])
      • Association rule mining approach [WBO00]

22
Example of Thesaurus Construction
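  • Each term t_i is associated with a vector over
    the N documents of the collection:
    $$\vec{t}_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,N})$$
    where w_{i,k} is the weight of term t_i in
    document d_k.
  • The relationship between two terms t_u and t_v is
    their correlation factor:
    $$c_{u,v} = \vec{t}_u \cdot \vec{t}_v = \sum_{k=1}^{N} w_{u,k} \, w_{v,k}$$
  • Formulation according to [QF93]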
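A minimal Python sketch of a similarity thesaurus in the style of [QF93], assuming the term-in-document weight as it is presented in [BR99]: w_{i,k} = (0.5 + 0.5·f_{i,k}/max f_i)·itf_k, normalized to unit length over the documents, where itf_k = log(t/t_k), t is the vocabulary size, t_k is the number of distinct terms in document k, and max f_i is the highest frequency of term i in any document. Treating the weight as zero in documents that do not contain the term is an assumption. The correlation c_{u,v} is the dot product of two term vectors. The toy documents are invented.

```python
import math
from collections import defaultdict

TermVectors = dict[str, dict[str, float]]  # term -> {doc_id: weight}


def build_term_vectors(docs: dict[str, dict[str, int]]) -> TermVectors:
    """Represent each term as a unit-length vector over the documents."""
    vocab = {term for tf in docs.values() for term in tf}
    t = len(vocab)
    # Inverse term frequency of each document: log(t / distinct terms in doc).
    itf = {d: math.log(t / len(tf)) for d, tf in docs.items()}
    max_f = defaultdict(int)  # highest frequency of each term in any document
    for tf in docs.values():
        for term, f in tf.items():
            max_f[term] = max(max_f[term], f)
    vectors = {}
    for term in vocab:
        raw = {d: (0.5 + 0.5 * tf[term] / max_f[term]) * itf[d]
               for d, tf in docs.items() if term in tf}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors[term] = {d: w / norm for d, w in raw.items()} if norm else {}
    return vectors


def correlation(vectors: TermVectors, u: str, v: str) -> float:
    """c_uv: dot product of the two term vectors."""
    return sum(w * vectors[v].get(d, 0.0) for d, w in vectors[u].items())


def related_terms(vectors: TermVectors, term: str, k: int) -> list[str]:
    """The k terms most correlated with `term` (ties broken alphabetically)."""
    others = [v for v in vectors if v != term]
    return sorted(others, key=lambda v: (-correlation(vectors, term, v), v))[:k]


docs = {
    "d1": {"jaguar": 2, "cat": 1, "rainforest": 1},
    "d2": {"jaguar": 1, "cat": 2},
    "d3": {"cat": 1, "rainforest": 2},
    "d4": {"car": 2, "engine": 1},
    "d5": {"engine": 1, "fuel": 1, "car": 1},
}
vectors = build_term_vectors(docs)
print(related_terms(vectors, "jaguar", 1))  # ['cat']
print(related_terms(vectors, "car", 1))     # ['engine']
```

Because the thesaurus is built once from the whole collection, this is a global-analysis technique: the neighbours of a term are the same for every query.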

23
Example of Thesaurus Construction (contd)
24
Global Analysis
  • The whole collection of documents is used for
    thesaurus creation.
  • Approaches
      • Similarity thesaurus [QF93]
      • Statistical thesaurus [CY92]

25
Global Analysis (contd)
26
Local Analysis
  • Unlike global analysis, local analysis uses only
    the top-ranked documents to construct the
    thesaurus.
  • Approaches
      • Local clustering [AF77]
      • Local context analysis [X97], [XC96], [XC00]
  • According to [XC96], [X97], and [XC00], local
    analysis is more effective than global analysis.
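Local clustering [AF77] is commonly presented (for example in [BR99]) through association clusters built from the top-ranked documents only. A minimal Python sketch, assuming raw term frequencies: c_{u,v} = Σ_j f_{u,j}·f_{v,j} over the local document set, normalized as s_{u,v} = c_{u,v} / (c_{u,u} + c_{v,v} − c_{u,v}); each query term is expanded with its n most strongly associated terms. Local context analysis [XC96] uses a different, passage-based scoring and is not shown. The local documents below are invented.

```python
def association_expansion(local_docs: list[dict[str, int]],
                          query_terms: list[str], n: int = 2) -> dict[str, list[str]]:
    """Expand each query term with its n most associated terms in local_docs."""
    vocab = {term for tf in local_docs for term in tf}

    def c(u: str, v: str) -> int:
        # Unnormalized association: sum over local docs of f_u * f_v.
        return sum(tf.get(u, 0) * tf.get(v, 0) for tf in local_docs)

    expansion = {}
    for u in query_terms:
        scores = {}
        for v in vocab - {u}:
            c_uv = c(u, v)
            denom = c(u, u) + c(v, v) - c_uv
            scores[v] = c_uv / denom if denom else 0.0  # normalized s_uv
        expansion[u] = sorted(scores, key=lambda v: (-scores[v], v))[:n]
    return expansion


# Term frequencies in the top-ranked documents for the query "jaguar".
local_docs = [{"jaguar": 2, "cat": 1, "rainforest": 1},
              {"jaguar": 1, "cat": 2},
              {"jaguar": 1, "car": 1}]
print(association_expansion(local_docs, ["jaguar"]))
# {'jaguar': ['cat', 'rainforest']}
```

Here s(jaguar, cat) = 4/7, s(jaguar, rainforest) = 2/5 and s(jaguar, car) = 1/6. Because only the retrieved documents are used, the associations reflect the sense of "jaguar" that dominates this query's results rather than the whole collection.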

27
Local Analysis (contd)
28
(No Transcript)
29
References
  • [AF77] Attar, R. and Fraenkel, A. S., "Local
    Feedback in Full-Text Retrieval Systems," Journal
    of the ACM, Vol. 24, No. 3, 1977, pp. 397-417.
  • [BR99] Baeza-Yates, R. and Ribeiro-Neto, B.,
    Modern Information Retrieval, Addison Wesley/ACM
    Press, Harlow, England, 1999.
  • [CY92] Crouch, C. J. and Yang, B., "Experiments in
    Automatic Statistical Thesaurus Construction,"
    Proceedings of the 15th Annual International ACM
    SIGIR Conference on Research and Development in
    Information Retrieval, 1992, pp. 77-88.
  • [M95] Miller, G. A., "WordNet: A Lexical Database
    for English," Communications of the ACM, Vol. 38,
    No. 11, November 1995, pp. 39-41.
  • [QF93] Qiu, Y. and Frei, H. P., "Concept Based
    Query Expansion," Proceedings of the 16th Annual
    International ACM SIGIR Conference on Research
    and Development in Information Retrieval, 1993,
    pp. 160-169.
  • [WBO00] Wei, J., Bressan, S., and Ooi, B. C.,
    "Mining Term Association Rules for Automatic
    Global Query Expansion: Methodology and
    Preliminary Results," Proceedings of the First
    International Conference on Web Information
    Systems Engineering, Vol. 1, 2000, pp. 366-373.

30
References (contd)
  • [X97] Xu, J., "Solving the Word Mismatch Problem
    Through Automatic Text Analysis," PhD Thesis,
    University of Massachusetts at Amherst, 1997.
  • [XC96] Xu, J. and Croft, W. B., "Query Expansion
    Using Local and Global Document Analysis,"
    Proceedings of the 19th Annual International ACM
    SIGIR Conference on Research and Development in
    Information Retrieval, 1996, pp. 4-11.
  • [XC00] Xu, J. and Croft, W. B., "Improving the
    Effectiveness of Information Retrieval with Local
    Context Analysis," ACM Transactions on
    Information Systems, Vol. 18, No. 1, 2000,
    pp. 79-112.
  • [Y02] Yang, C., "Investigation of Term Expansion
    on Text Mining Techniques," Master's Thesis,
    National Sun Yat-sen University, Taiwan, 2002.