Relevance Feedback and other Query Modification Techniques - PowerPoint PPT Presentation

Slides: 31

Provided by: chinshe

1
Relevance Feedback and other Query Modification
Techniques

Presenters: d9142801, d9142803

2
Introduction

Precision vs. Recall
When a high recall ratio is critical to users, they have to retrieve more of the relevant documents.
Methods to retrieve more relevant documents:
Expand the search by broadening a narrow Boolean query or by looking further down a ranked list of retrieved documents.
Modify the original query.
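A small sketch of why looking further down a ranked list raises recall at the cost of precision. The ranking, the relevance judgments, and the helper name `precision_recall_at_k` are hypothetical; `d4` plays the role of a relevant document indexed by different terms, so no cutoff ever retrieves it.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall over the top-k documents of a ranked list.

    ranking  -- document ids in ranked order, best first
    relevant -- set of ids judged relevant for the query
    """
    retrieved = ranking[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: d1..d4 are relevant, x1..x5 are not.
ranking = ["d1", "x1", "d2", "x2", "x3", "d3", "x4", "x5"]
relevant = {"d1", "d2", "d3", "d4"}  # d4 never appears in the ranking

for k in (2, 4, 8):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Recall climbs from 0.25 to 0.75 as k grows, but it cannot reach 1.0 without modifying the query so that documents like `d4` are retrieved at all.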

3
Introduction (contd)

Word Mismatch problem
Some of the unretrieved relevant documents are
indexed by a different set of terms than those in
the query or in most of the other relevant
documents.
Approaches for improving the initial query
Relevance Feedback
Automatic Query Modification

4
Conceptual Model of Relevance Feedback
[Diagram: Query -> Result Set -> User Relevance Feedback -> New Query Based on Result Set -> Result Set (repeat)]
5
Basic Ideas about Relevance Feedback

Two components of relevance feedback
Reweighting of query terms based on the distribution of these terms in the relevant and nonrelevant documents retrieved in response to the query
Changing the actual terms in the query (query expansion)

6
Basic Ideas about Relevance Feedback (contd)

Evaluation of Relevance Feedback
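Comparing the results after one iteration of feedback against those using no feedback generally shows spectacular improvement, partly because the relevant documents the user has already seen are ranked at the top again.
Another evaluation compares only the residual collections, i.e., the collection with the documents already judged by the user removed.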
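A minimal sketch of residual-collection evaluation, assuming the documents the user judged during feedback are removed from both the initial and the feedback rankings before scoring. The rankings, judgments, and helper names are hypothetical, and precision at a fixed cutoff stands in for whatever measure is actually used.

```python
def residual_ranking(ranking, judged):
    """Remove documents the user already judged, keeping the order of the rest."""
    return [doc for doc in ranking if doc not in judged]


def precision_at_k(ranking, relevant, k):
    top = ranking[:k]
    return sum(doc in relevant for doc in top) / len(top) if top else 0.0


relevant = {"d1", "d2", "d3", "d4"}

# Initial search; the user judges the top 3 documents and they feed the new query.
initial = ["d1", "x1", "d2", "x2", "d3", "x3", "d4"]
judged = set(initial[:3])

# Hypothetical ranking after one feedback iteration: d1 and d2 rise to the top.
feedback = ["d1", "d2", "d3", "x1", "d4", "x2", "x3"]

# Full-collection comparison: feedback is credited for re-ranking d1 and d2.
print(precision_at_k(initial, relevant, 3), precision_at_k(feedback, relevant, 3))

# Residual comparison: both runs are scored without the judged documents.
print(precision_at_k(residual_ranking(initial, judged), relevant, 3),
      precision_at_k(residual_ranking(feedback, judged), relevant, 3))
```

On the full collection, feedback looks better largely because it re-ranks `d1` and `d2`, which the user has already seen; the residual comparison (1/3 vs. 2/3 here) credits only relevant documents the user has not yet seen.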

7
Basic approach to Relevance Feedback

Rocchio's approach used the vector space model to rank documents.

Ide developed three particular strategies extending Rocchio's approach:
Basic Rocchio formula, minus the normalization for the number of relevant and nonrelevant documents
Allowed only feedback from relevant documents
Allowed limited negative feedback from only the highest-ranked nonrelevant document
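The sketch below contrasts standard Rocchio with Ide's three strategies on sparse term-weight vectors stored as dicts. The function names, the default weights alpha = beta = gamma = 1, and the choice to drop terms whose weight falls to zero or below are assumptions for illustration, not details taken from the slides.

```python
from collections import defaultdict

# Sparse vectors are dicts mapping term -> weight.


def _combine(query, rel, nonrel, alpha, beta, gamma, normalize):
    """alpha*Q + beta*sum(rel) - gamma*sum(nonrel), optionally averaging each sum."""
    new = defaultdict(float)
    for term, w in query.items():
        new[term] += alpha * w
    b = beta / len(rel) if (normalize and rel) else beta
    g = gamma / len(nonrel) if (normalize and nonrel) else gamma
    for doc in rel:
        for term, w in doc.items():
            new[term] += b * w
    for doc in nonrel:
        for term, w in doc.items():
            new[term] -= g * w
    # Assumption: terms whose weight drops to zero or below are removed.
    return {t: w for t, w in new.items() if w > 0}


def rocchio(query, rel, nonrel, alpha=1.0, beta=1.0, gamma=1.0):
    # Standard Rocchio: relevant and nonrelevant sums divided by their counts.
    return _combine(query, rel, nonrel, alpha, beta, gamma, normalize=True)


def ide_regular(query, rel, nonrel):
    # Ide regular: Rocchio without normalizing by the number of documents.
    return _combine(query, rel, nonrel, 1.0, 1.0, 1.0, normalize=False)


def ide_positive(query, rel):
    # Ide, relevant only: feedback from relevant documents alone.
    return _combine(query, rel, [], 1.0, 1.0, 0.0, normalize=False)


def ide_dec_hi(query, rel, nonrel_ranked):
    # Ide dec-hi: negative feedback from only the highest-ranked nonrelevant document.
    return _combine(query, rel, nonrel_ranked[:1], 1.0, 1.0, 1.0, normalize=False)


q = {"car": 1.0}
rel = [{"car": 1.0, "auto": 1.0}, {"auto": 1.0, "engine": 1.0}]
nonrel = [{"car": 0.5, "rail": 1.0}, {"engine": 1.0}]  # in rank order

for fn in (rocchio, ide_regular, ide_dec_hi):
    print(fn.__name__, fn(q, rel, nonrel))
print("ide_positive", ide_positive(q, rel))
```

All four variants add the new term `auto`; they differ in how strongly the nonrelevant documents pull weights down (here, whether `engine` survives).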

9
Term reweighting without Query Expansion

A probabilistic model proposed by Robertson and
Sparck Jones (1976)

W_ij: the term weight for term i in query j
r: the number of relevant documents for query j having term i
R: the total number of relevant documents for query j
n: the number of documents in the collection having term i
N: the number of documents in the collection
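A sketch of the Robertson and Sparck Jones relevance weight in its commonly cited form, W_ij = log[(r / (R - r)) / ((n - r) / (N - n - R + r))], using the variables defined above. Adding 0.5 to each count (parameter `k`) is an assumption here: it is a common adjustment that keeps the ratio finite when, for example, r = 0 or r = R.

```python
import math


def rsj_weight(r, R, n, N, k=0.5):
    """Robertson/Sparck Jones relevance weight W_ij of term i for query j.

    r: relevant documents containing term i
    R: relevant documents for the query
    n: documents in the collection containing term i
    N: documents in the collection
    k: additive correction (assumed 0.5) that keeps every ratio finite
    """
    odds_rel = (r + k) / (R - r + k)                    # term odds among relevant docs
    odds_nonrel = (n - r + k) / (N - n - R + r + k)     # term odds among nonrelevant docs
    return math.log(odds_rel / odds_nonrel)


# Hypothetical counts: 1000 documents, 10 relevant, the term occurs in 50 documents.
print(round(rsj_weight(8, 10, 50, 1000), 2))  # term concentrated in relevant docs
print(round(rsj_weight(1, 10, 50, 1000), 2))  # term rarely in relevant docs
```

A term whose rate in the relevant set matches its rate in the rest of the collection gets a weight near zero, so reweighting alone can promote or demote existing query terms without adding new ones.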
10
Term reweighting without Query Expansion (contd)

Croft (1983) extended this weighting scheme, with one weight for the initial search and another for feedback.

W_ijk: the term weight for term i in query j and document k
IDF_i: the IDF weight for term i in the entire collection
P_ij: the probability that term i is assigned within the set of relevant documents for query j
Q_ij: the probability that term i is assigned within the set of nonrelevant documents for query j
F_ik = K + (1 - K)(freq_ik / maxfreq_k)
freq_ik: the frequency of term i in document k
maxfreq_k: the maximum frequency of any term in document k
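A minimal sketch of Croft's two weights, assuming the commonly cited form W_ijk = (C + IDF_i) F_ik for the initial search and W_ijk = (C + log[P_ij(1 - Q_ij) / (Q_ij(1 - P_ij))]) F_ik after feedback. C and K are tuning constants; the defaults below (C = 1.0, K = 0.3) are placeholders, not values from the slides.

```python
import math


def normalized_tf(freq, maxfreq, K=0.3):
    """F_ik = K + (1 - K) * freq_ik / maxfreq_k."""
    return K + (1 - K) * freq / maxfreq


def croft_initial(idf, freq, maxfreq, C=1.0, K=0.3):
    # Initial search: no relevance judgments yet, so IDF_i measures term importance.
    return (C + idf) * normalized_tf(freq, maxfreq, K)


def croft_feedback(p, q, freq, maxfreq, C=1.0, K=0.3):
    # Feedback: IDF_i is replaced by the log-odds of term i in relevant (p = P_ij)
    # versus nonrelevant (q = Q_ij) documents; p and q must lie strictly in (0, 1).
    return (C + math.log(p * (1 - q) / (q * (1 - p)))) * normalized_tf(freq, maxfreq, K)


# Hypothetical term: IDF 2.0, occurs twice in a document whose most frequent term occurs 4 times.
print(croft_initial(2.0, 2, 4))
# After feedback: estimated in 60% of relevant and 5% of nonrelevant documents.
print(croft_feedback(0.6, 0.05, 2, 4))
```

Both weights share the within-document factor F_ik; feedback only changes the collection-level part.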
11
Query Expansion

The query could be expanded by
offering users a selection of the terms most closely related to the initial query terms (thesaurus)
presenting users with a sorted list of terms from the relevant documents or from all retrieved documents

12
Query Expansion (contd)

Terms from the relevant/nonrelevant documents are ranked to produce a proposed list; the terms are then
selected by the user from the top N terms, or
added to the query automatically.
The early SMART experiments both expanded the query and reweighted the query terms by combining the query vector with the vectors of the relevant and nonrelevant documents (relevant vectors added, nonrelevant vectors subtracted).
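A sketch of the two expansion modes described above. The scoring rule (+1 per relevant document containing a term, -1 per nonrelevant one) is a hypothetical stand-in for the ranking methods the slide refers to, as are the function names and the toy documents.

```python
from collections import Counter


def rank_expansion_terms(query_terms, rel_docs, nonrel_docs):
    """Rank candidate expansion terms (hypothetical scoring rule).

    +1 for each relevant document containing the term, -1 for each
    nonrelevant one; terms already in the query are excluded.
    """
    score = Counter()
    for doc in rel_docs:
        score.update(set(doc))
    for doc in nonrel_docs:
        score.subtract(set(doc))
    candidates = [(t, s) for t, s in score.items() if s > 0 and t not in query_terms]
    # Highest score first; ties broken alphabetically so the order is deterministic.
    candidates.sort(key=lambda ts: (-ts[1], ts[0]))
    return [t for t, _ in candidates]


def expand(query_terms, ranked_terms, top_n=3, choose=None):
    """Add the top-N terms automatically, or only those the user picks from the top N."""
    offered = ranked_terms[:top_n]
    picked = offered if choose is None else [t for t in offered if t in choose]
    return list(query_terms) + picked


query = ["jaguar"]
rel_docs = [["jaguar", "car", "engine"], ["jaguar", "car", "speed"]]
nonrel_docs = [["jaguar", "cat", "jungle"]]

ranked = rank_expansion_terms(query, rel_docs, nonrel_docs)
print(ranked)
print(expand(query, ranked, top_n=2))                     # automatic addition
print(expand(query, ranked, top_n=3, choose={"speed"}))   # user selection
```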

13
Query Expansion (contd)

Modification of terms in relevant/nonrelevant
documents
Using any relevant document(s) as a new query (Noreault, 1979)
If no relevant documents are indicated, the term list shown to the user is a list of related terms, based on those previously sorted in the inverted file

14
Query Expansion with Term Reweighting

Most relevance feedback and query expansion research has used both query expansion and term reweighting.
Three of the most widely used feedback methods:
Ide Regular

15
Query Expansion with Term Reweighting (contd)

Ide dec-hi
Standard Rocchio

S_i: the top-ranked nonrelevant document
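In one common vector notation, with Q the current query, Q' the modified query, R_k the n_1 relevant and S_k the n_2 nonrelevant document vectors, S_i the top-ranked nonrelevant document, and alpha, beta, gamma tuning constants, the three methods can be written as:

```latex
\begin{align*}
\text{Ide Regular:}\quad & Q' = Q + \sum_{k=1}^{n_1} R_k - \sum_{k=1}^{n_2} S_k \\
\text{Ide dec-hi:}\quad & Q' = Q + \sum_{k=1}^{n_1} R_k - S_i \\
\text{Standard Rocchio:}\quad & Q' = \alpha Q + \frac{\beta}{n_1}\sum_{k=1}^{n_1} R_k - \frac{\gamma}{n_2}\sum_{k=1}^{n_2} S_k
\end{align*}
```

The Ide variants drop the division by n_1 and n_2; dec-hi further limits negative feedback to the single document S_i.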
16
Automatic Query Modification

The major disadvantage of relevance feedback is that it increases the burden on users [X97].
Approaches for automatic query modification
Local feedback
Automatic query expansion
Dictionary-based
Global analysis
Local analysis

17
Local Feedback

Local feedback is similar to relevance feedback.
Difference: the top-ranked documents are assumed to be relevant, without human judgment.
It saves the cost of relevance judgments, but it can result in poor retrieval if the top-ranked documents are nonrelevant.
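A minimal sketch of local (blind) feedback: retrieve, treat the top k documents as relevant, and add their centroid to the query, with no user in the loop. Inner-product scoring, `k = 2`, `beta = 0.5`, and the toy collection are assumptions chosen for illustration.

```python
from collections import defaultdict


def score(query, doc):
    # Inner product of two sparse term-weight vectors (dicts).
    return sum(w * doc.get(t, 0.0) for t, w in query.items())


def search(query, collection):
    # Document ids by descending score; ties broken by id for determinism.
    return sorted(collection, key=lambda d: (-score(query, collection[d]), d))


def local_feedback(query, collection, k=2, beta=0.5):
    """Treat the top-k documents as relevant (no user judgment) and add their centroid."""
    top = search(query, collection)[:k]
    new = defaultdict(float, query)
    for d in top:
        for t, w in collection[d].items():
            new[t] += beta * w / k
    return dict(new)


docs = {
    "a": {"jaguar": 1.0, "car": 1.0},
    "b": {"jaguar": 1.0, "car": 1.0, "engine": 1.0},
    "c": {"engine": 1.0, "car": 1.0},
    "d": {"jaguar": 1.0, "cat": 1.0},
}

query = {"jaguar": 1.0}
print(search(query, docs))
new_query = local_feedback(query, docs)
print(new_query)
print(search(new_query, docs))
```

Document `c`, which shares no term with the original query, now receives a nonzero score. Had the top documents been about the animal instead, the expanded query would drift toward them, which is the failure mode noted above.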

18
Automatic Query Expansion

Basic idea
Expand a user query by adding semantically similar and/or statistically associated terms, each with a corresponding weight.
Thesauri are needed for similarity judgment.
Two approaches to thesaurus construction
Manual thesauri
Automatic thesauri

19
Dictionary-based Query Expansion

Based on manual thesauri (e.g., WordNet [M95]).
In the expansion process, synonyms of the initial query terms (or words with other semantic relations to them) are selected, and each is assigned a weight.
Disadvantages
Constructing a manual thesaurus requires a lot of human labor.
A general manual thesaurus does not consistently improve retrieval performance.
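A sketch of dictionary-based expansion with weights. The tiny thesaurus below is a hypothetical stand-in for a real resource such as WordNet, and the per-relation weights (0.8 for synonyms, 0.5 for broader terms) are illustrative values, not figures from the cited work.

```python
# Hypothetical hand-built thesaurus: term -> list of (related term, relation).
THESAURUS = {
    "car": [("automobile", "synonym"), ("vehicle", "hypernym")],
    "fast": [("quick", "synonym"), ("rapid", "synonym")],
}

# Assumed weight for each relation type; original query terms keep weight 1.0.
RELATION_WEIGHT = {"synonym": 0.8, "hypernym": 0.5}


def expand_with_thesaurus(query_terms, thesaurus=THESAURUS, rel_weight=RELATION_WEIGHT):
    """Return term -> weight for the original terms plus their thesaurus neighbors."""
    expanded = {t: 1.0 for t in query_terms}
    for term in query_terms:
        for related, relation in thesaurus.get(term, []):
            w = rel_weight.get(relation, 0.0)
            # Keep the larger weight if a term is reached more than once.
            if w > expanded.get(related, 0.0):
                expanded[related] = w
    return expanded


print(expand_with_thesaurus(["fast", "car"]))
```

Terms missing from the thesaurus pass through unchanged, which is one reason a general-purpose thesaurus helps some queries and not others.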

20
Example - WordNet
21
Automatic Thesauri Construction Approach

Thesauri are constructed from the whole data corpus (or a part of it).
Basic idea of automatic thesaurus construction
Term co-occurrence
Methods of automatic thesaurus construction
Traditional TFxIDF [Y02]
Variant of TFxIDF (i.e., similarity thesaurus [QF93])
Mining association rules [WBO00]

22
Example of Thesaurus Construction

Each term t_i is associated with a vector of weights, one per document in the collection.
The relationship between two terms t_u and t_v is computed from their vectors, according to [QF93].
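A sketch of the similarity thesaurus, following the formulation of [QF93] as it is commonly presented (e.g., in [BR99]): term t_i gets weight w_{i,j} = (0.5 + 0.5 f_{i,j} / max_j f_{i,j}) itf_j in each document d_j containing it, normalized to unit length, where itf_j = log(t / t_j), t is the number of distinct terms in the collection, and t_j the number of distinct terms in d_j. The relationship is c_{u,v} = sum over j of w_{u,j} w_{v,j}. Giving absent terms weight 0 and the toy corpus are assumptions of this sketch.

```python
import math
from collections import Counter


def similarity_thesaurus(docs):
    """Term vectors over documents and a term-term similarity c(u, v).

    docs: list of token lists.  Returns (weights, sim) where weights[t]
    maps document index j -> w_{t,j}.
    """
    tf = [Counter(d) for d in docs]
    vocab = sorted({t for c in tf for t in c})
    t_total = len(vocab)
    # itf_j = log(t / t_j): documents with fewer distinct terms are more specific.
    itf = [math.log(t_total / len(c)) if c else 0.0 for c in tf]
    weights = {}
    for term in vocab:
        max_f = max(c[term] for c in tf)  # largest frequency of this term in any document
        raw = {j: (0.5 + 0.5 * c[term] / max_f) * itf[j]
               for j, c in enumerate(tf) if c[term] > 0}
        norm = math.sqrt(sum(x * x for x in raw.values()))
        weights[term] = {j: x / norm for j, x in raw.items()} if norm > 0 else {}

    def sim(u, v):
        # c_{u,v}: inner product of the two unit-length term vectors.
        wu, wv = weights.get(u, {}), weights.get(v, {})
        return sum(x * wv.get(j, 0.0) for j, x in wu.items())

    return weights, sim


docs = [
    ["car", "auto", "engine"],
    ["car", "auto", "wheel"],
    ["apple", "fruit", "juice"],
    ["apple", "juice", "tree"],
]
weights, sim = similarity_thesaurus(docs)
print(round(sim("car", "auto"), 3), round(sim("car", "engine"), 3), sim("car", "apple"))
```

Because terms are compared through the documents they share, `car` and `auto` come out as near-equivalent here even though no dictionary relates them.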

23
Example of Thesaurus Construction (contd)
24
Global Analysis

The whole collection of documents is used for
thesaurus creation.
Approaches
Similarity Thesaurus [QF93]
Statistical Thesaurus [CY92]

25
Global Analysis (contd)
26
Local Analysis

Unlike global analysis, local analysis uses only the top-ranked documents to construct the thesaurus.
Approaches
Local Clustering [AF77]
Local Context Analysis [X97, XC96, XC00]
According to [XC96, X97, XC00], local analysis is more effective than global analysis.
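A sketch of local clustering in the association-cluster form commonly attributed to [AF77]: over the top-ranked documents only, c_{u,v} = sum over j of f_{u,j} f_{v,j}, normalized as s_{u,v} = c_{u,v} / (c_{u,u} + c_{v,v} - c_{u,v}); each query term is then linked to its n strongest neighbors. The toy documents, `n = 2`, and the alphabetical tie-breaking are assumptions.

```python
from collections import Counter


def association_clusters(local_docs, query_terms, n=2):
    """Normalized association clusters built from the top-ranked (local) documents."""
    tf = [Counter(d) for d in local_docs]
    vocab = sorted({t for c in tf for t in c})

    def c(u, v):
        # Unnormalized association: sum over local documents of f_{u,j} * f_{v,j}.
        return sum(doc[u] * doc[v] for doc in tf)

    clusters = {}
    for u in query_terms:
        scored = []
        for v in vocab:
            if v == u:
                continue
            cuv = c(u, v)
            denom = c(u, u) + c(v, v) - cuv
            if cuv > 0 and denom > 0:
                scored.append((cuv / denom, v))
        # Strongest association first; ties broken alphabetically.
        scored.sort(key=lambda sv: (-sv[0], sv[1]))
        clusters[u] = [v for _, v in scored[:n]]
    return clusters


# Hypothetical top-ranked documents for the query "jaguar".
top_docs = [
    ["jaguar", "car", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat"],
]
print(association_clusters(top_docs, ["jaguar"], n=2))
```

Here `car` (s = 0.6) is the strongest neighbor of `jaguar`; `cat`, `engine`, and `speed` tie behind it, so a single off-topic document near the top of the ranking already competes for a place in the expansion.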

27
Local Analysis (contd)
28
29
References

[AF77] Attar, R. and Fraenkel, A. S., "Local Feedback in Full-Text Retrieval Systems," Journal of the ACM, Volume 24, Issue 3, 1977, pp. 397-417.
[BR99] Baeza-Yates, R. and Ribeiro-Neto, B., Modern Information Retrieval, Addison Wesley/ACM Press, Harlow, England, 1999.
[CY92] Crouch, C. J. and Yang, B., "Experiments in Automatic Statistical Thesaurus Construction," Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992, pp. 77-88.
[M95] Miller, G. A., "WordNet: A Lexical Database for English," Communications of the ACM, Volume 38, Issue 11, November 1995, pp. 39-41.
[QF93] Qiu, Y. and Frei, H. P., "Concept Based Query Expansion," Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1993, pp. 160-169.
[WBO00] Wei, J., Bressan, S., and Ooi, B. C., "Mining Term Association Rules for Automatic Global Query Expansion: Methodology and Preliminary Results," Proceedings of the First International Conference on Web Information Systems Engineering, Volume 1, 2000, pp. 366-373.

30
References (contd)

[X97] Xu, J., "Solving the Word Mismatch Problem Through Automatic Text Analysis," PhD Thesis, University of Massachusetts at Amherst, 1997.
[XC96] Xu, J. and Croft, W. B., "Query Expansion Using Local and Global Document Analysis," Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996, pp. 4-11.
[XC00] Xu, J. and Croft, W. B., "Improving the Effectiveness of Information Retrieval with Local Context Analysis," ACM Transactions on Information Systems, Volume 18, Issue 1, 2000, pp. 79-112.
[Y02] Yang, C., "Investigation of Term Expansion on Text Mining Techniques," Master's Thesis, National Sun Yat-sen University, Taiwan, 2002.