Title: Relevance Feedback and other Query Modification Techniques
1Relevance Feedback and other Query Modification
Techniques
- ???? ?????????
- ???? ??? ??
- ??? ?? ??? (d9142801)
- ?? ??? (d9142803)
2Introduction
- Precision vs. Recall
- When a high recall ratio is critical, users have
to retrieve more of the relevant documents.
- Methods to retrieve more relevant documents
- Expand the search by broadening a narrow
Boolean query or by looking further down a ranked
list of retrieved documents.
- Modify the original query.
3Introduction (contd)
- Word Mismatch problem
- Some of the unretrieved relevant documents are
indexed by a different set of terms than those in
the query or in most of the other relevant
documents.
- Approaches for improving the initial query
- Relevance Feedback
- Automatic Query Modification
4Conceptual Model of Relevance Feedback
[Figure: feedback loop linking Query, Result Set,
User Relevance Feedback, and New Query Based on
Result Set]
5Basic Ideas about Relevance Feedback
- Two components of relevance feedback
- Reweighting of query terms based on the
distribution of these terms in the relevant and
nonrelevant documents retrieved in response to
those queries
- Changing the actual terms in the query
6Basic Ideas about Relevance Feedback (contd)
- Evaluation of Relevance Feedback
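- Comparing the results after one iteration of
feedback against those using no feedback generally
shows spectacular improvement, partly because
relevant documents the user has already seen are
simply ranked higher
- Another evaluation is to compare only the
residual collections, that is, the collections
with the documents already shown to the user
removed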
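A minimal sketch of residual-collection evaluation, assuming the residual collection is the original collection minus the documents the user judged in the first pass. The function names, document ids, and judgments below are hypothetical, and precision at a fixed cutoff is just one possible measure.

```python
def residual_ranking(ranking, judged_ids):
    """Drop documents the user already judged, keeping rank order."""
    judged = set(judged_ids)
    return [doc_id for doc_id in ranking if doc_id not in judged]


def precision_at_k(ranking, relevant_ids, k):
    """Fraction of the top-k documents that are relevant."""
    top = ranking[:k]
    if not top:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)


# Hypothetical toy data: ids and relevance judgments are made up.
initial_run = ["d1", "d2", "d3", "d4", "d5", "d6"]   # no feedback
feedback_run = ["d1", "d3", "d5", "d2", "d6", "d4"]  # after one iteration
judged = ["d1", "d2", "d3"]                          # shown to the user
relevant = {"d1", "d3", "d5"}

# Only relevant documents the user has not seen count in the residual.
residual_relevant = relevant - set(judged)

before = precision_at_k(residual_ranking(initial_run, judged),
                        residual_relevant, 1)
after = precision_at_k(residual_ranking(feedback_run, judged),
                       residual_relevant, 1)
```

Scoring both runs on the same residual collection keeps documents that were merely reranked from inflating the comparison.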
7Basic approach to Relevance Feedback
- Rocchio's approach used the vector space model to
rank documents
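The formula itself is not preserved in this transcript. As a reference point, the form usually attributed to Rocchio is given below; the symbols are assumptions chosen for illustration: Q_0 is the initial query vector, Q_1 the modified query vector, R_i the vector of relevant document i, S_i the vector of nonrelevant document i, and n_1 and n_2 the numbers of relevant and nonrelevant documents.

```latex
Q_1 = Q_0 + \frac{1}{n_1} \sum_{i=1}^{n_1} R_i
          - \frac{1}{n_2} \sum_{i=1}^{n_2} S_i
```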
8Basic approach to Relevance Feedback (contd)
- Ide developed three particular strategies
extending Rocchio's approach
- Basic Rocchio formula, minus the normalization
for the number of relevant and nonrelevant
documents
- Allowed only feedback from relevant documents
- Allowed limited negative feedback from only the
highest-ranked nonrelevant document
9Term reweighting without Query Expansion
- A probabilistic model proposed by Robertson and
Sparck Jones (1976)
W_ij = the term weight for term i in query j
r = the number of relevant documents for query j
having term i
R = the total number of relevant documents for
query j
n = the number of documents in the collection
having term i
N = the number of documents in the collection
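The weighting formula is missing from this transcript. The commonly cited Robertson-Sparck Jones relevance weight, in the symbols defined above, is W_ij = log[(r / (R - r)) / ((n - r) / (N - n - R + r))]. The sketch below computes it; the function name and the `smoothing` parameter (a small constant added to each count to avoid division by zero) are assumptions for illustration, not part of the slide.

```python
import math


def rsj_weight(r, R, n, N, smoothing=0.5):
    """Relevance weight for one query term.

    r: relevant documents for the query that contain the term
    R: relevant documents for the query
    n: documents in the collection that contain the term
    N: documents in the collection
    smoothing: assumed correction added to every count so that no
        numerator or denominator is zero; pass 0 for the raw formula
    """
    if not (0 <= r <= R <= N and r <= n <= N and n - r <= N - R):
        raise ValueError("counts are inconsistent")
    # Odds of the term occurring in relevant documents.
    relevant_odds = (r + smoothing) / (R - r + smoothing)
    # Odds of the term occurring in nonrelevant documents.
    nonrelevant_odds = (n - r + smoothing) / (N - n - R + r + smoothing)
    return math.log(relevant_odds / nonrelevant_odds)


# Hypothetical counts: a term concentrated in the relevant set gets a
# positive weight, a term absent from it gets a negative weight.
good_term = rsj_weight(r=8, R=10, n=20, N=1000)
poor_term = rsj_weight(r=0, R=10, n=500, N=1000)
```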
10Term reweighting without Query Expansion (contd)
- Croft (1983) extended this weighting scheme as
follows:
- Initial search
- Feedback
W_ijk = the term weight for term i in query j and
document k
IDF_i = the IDF weight for term i in the entire
collection
P_ij = the probability that term i is assigned
within the set of relevant documents for query j
Q_ij = the probability that term i is assigned
within the set of nonrelevant documents for
query j
F_ik = K + (1 - K)(freq_ik / maxfreq_k)
freq_ik = the frequency of term i in document k
maxfreq_k = the maximum frequency of any term in
document k
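The two weighting formulas are not preserved in this transcript. A hedged reconstruction from the usual presentation of Croft's scheme, using the symbols defined above, is sketched below. The constant C is an assumption not named on the slide; it is normally described as a tuning constant chosen per collection, as is K.

```latex
% Initial search
W_{ijk} = \left( C + IDF_i \right) \cdot F_{ik}

% Feedback
W_{ijk} = \left( C + \log \frac{P_{ij}\,(1 - Q_{ij})}{(1 - P_{ij})\,Q_{ij}} \right) \cdot F_{ik}

% Normalized within-document frequency
F_{ik} = K + (1 - K)\,\frac{freq_{ik}}{maxfreq_k}
```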
11Query Expansion
- The query could be expanded by
- offering users a selection of terms that are the
terms most closely related to the initial query
terms (thesaurus)
- presenting users with a sorted list of terms from
the relevant documents or all retrieved documents
12Query Expansion (contd)
- A proposed list of terms from relevant/nonrelevant
documents using ranking methods
- User selection from the top N terms
- Automatically added to the query
- The early SMART experiments both expanded the
query and reweighted the query terms by adding
the vectors of the relevant and nonrelevant
documents.
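A minimal sketch of producing a sorted list of candidate expansion terms from the relevant documents. The slides do not say which ranking method is used, so this sketch assumes a simple one: rank each non-query term by the number of relevant documents that contain it. The function name and toy documents are hypothetical.

```python
from collections import Counter


def rank_expansion_terms(query_terms, relevant_docs, top_n=5):
    """Return up to top_n candidate terms, most frequent first.

    relevant_docs is a list of token lists. A term is counted once
    per document; ties are broken alphabetically.
    """
    query = set(query_terms)
    doc_freq = Counter()
    for doc in relevant_docs:
        doc_freq.update(set(doc) - query)
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical relevant documents, already tokenized.
docs = [
    ["relevance", "feedback", "query"],
    ["query", "expansion", "feedback"],
    ["query", "thesaurus"],
]
candidates = rank_expansion_terms(["feedback"], docs, top_n=2)
```

The resulting list could either be shown to the user for selection or added to the query automatically, as the bullets above describe.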
13Query Expansion (contd)
- Modification of terms in relevant/nonrelevant
documents
- Any relevant document(s) as a new query
(Noreault, 1979)
- If no relevant documents are indicated, the term
list shown to the user is the list of related
terms based on those previously sorted in the
inverted file
14Query Expansion with Term Reweighting
- The vast majority of relevance feedback and query
expansion research has been done using both query
expansion and term reweighting.
- Three of the most used feedback methods
- Ide Regular
15Query Expansion with Term Reweighting(contd)
- Ide dec-hi
- Standard Rocchio
S_i = the top-ranked nonrelevant document
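The formulas for the three methods are not preserved in this transcript. The sketch below assumes their commonly cited forms, with Q0 the initial query vector, R_i the relevant document vectors, and S_i the nonrelevant ones: Ide regular is Q1 = Q0 + sum(R_i) - sum(S_i), Ide dec-hi subtracts only the top-ranked nonrelevant document, and standard Rocchio is Q1 = Q0 + beta * sum(R_i) / n1 - gamma * sum(S_i) / n2. The function names and the default beta and gamma values are assumptions for illustration.

```python
def _add(a, b, scale=1.0):
    """Return a + scale * b for two equal-length vectors."""
    return [x + scale * y for x, y in zip(a, b)]


def _sum(vectors, dim):
    """Component-wise sum of a list of vectors of length dim."""
    total = [0.0] * dim
    for vector in vectors:
        total = _add(total, vector)
    return total


def ide_regular(q0, relevant, nonrelevant):
    """Add all relevant vectors, subtract all nonrelevant ones."""
    q1 = _add(q0, _sum(relevant, len(q0)))
    return _add(q1, _sum(nonrelevant, len(q0)), scale=-1.0)


def ide_dec_hi(q0, relevant, nonrelevant_ranked):
    """Add all relevant vectors, subtract only the top nonrelevant one."""
    q1 = _add(q0, _sum(relevant, len(q0)))
    if nonrelevant_ranked:
        q1 = _add(q1, nonrelevant_ranked[0], scale=-1.0)
    return q1


def standard_rocchio(q0, relevant, nonrelevant, beta=0.75, gamma=0.25):
    """Add the scaled mean relevant vector, subtract the nonrelevant one."""
    q1 = list(q0)
    if relevant:
        q1 = _add(q1, _sum(relevant, len(q0)), scale=beta / len(relevant))
    if nonrelevant:
        q1 = _add(q1, _sum(nonrelevant, len(q0)),
                  scale=-gamma / len(nonrelevant))
    return q1


# Hypothetical three-term vocabulary; nonrelevant list is in rank order.
q0 = [1.0, 0.0, 1.0]
relevant = [[1, 1, 0], [1, 0, 0]]
nonrelevant = [[0, 0, 1], [0, 1, 1]]
```

Terms with zero weight in Q0 can become nonzero in Q1, so each method both expands the query and reweights its terms. Negative weights are left as they are in this sketch.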
16Automatic Query Modification
- The major disadvantage of relevance feedback is
that it increases the burden on the users [X97].
- Approaches for automatic query modification
- Local feedback
- Automatic query expansion
- Dictionary-based
- Global analysis
- Local analysis
17Local Feedback
- Local feedback is similar to relevance feedback.
- Difference: it assumes the top-ranked documents
are relevant, without human judgment.
- It saves the cost of relevance judgments, but it
can result in poor retrieval if the top-ranked
documents are nonrelevant.
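A minimal sketch of local feedback, assuming sparse term-weight vectors stored as dicts, inner-product ranking, and a Rocchio-style update that uses only positive feedback from the top k documents. The function names, the values of k and beta, and the toy collection are all hypothetical.

```python
def score(query, doc):
    """Inner product of two sparse term-weight vectors (dicts)."""
    return sum(weight * doc.get(term, 0.0) for term, weight in query.items())


def rank(query, docs):
    """Document ids by decreasing score; ties are broken by id."""
    return sorted(docs, key=lambda doc_id: (-score(query, docs[doc_id]), doc_id))


def local_feedback(query, docs, k=2, beta=0.75):
    """Treat the top-k documents as relevant and fold them into the query."""
    top = rank(query, docs)[:k]
    new_query = dict(query)
    for doc_id in top:
        for term, weight in docs[doc_id].items():
            # Add the beta-scaled mean of the assumed-relevant vectors.
            new_query[term] = new_query.get(term, 0.0) + beta * weight / len(top)
    return new_query


# Hypothetical toy collection.
docs = {
    "d1": {"feedback": 1.0, "relevance": 1.0},
    "d2": {"feedback": 1.0, "rocchio": 1.0},
    "d3": {"thesaurus": 1.0, "wordnet": 1.0},
    "d4": {"rocchio": 1.0, "vector": 1.0},
}
query = {"feedback": 1.0}
expanded = local_feedback(query, docs, k=2)
```

In this toy collection the expanded query pulls d4 above d3 because d4 shares a term with a top-ranked document. If d1 and d2 had been nonrelevant, the same step would have pushed the query in the wrong direction, which is the risk noted above.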
18Automatic Query Expansion
- Basic idea
- Expand a user query by adding semantically
similar and/or statistically associated terms,
with corresponding weights.
- Thesauri are needed for similarity judgment.
- Two approaches to thesauri construction
- Manual thesauri
- Automatic thesauri
19Dictionary-based Query Expansion
- Based on manual thesauri (e.g., WordNet [M95]).
- In the expansion process, synonyms of the initial
query terms (or words in other semantic relations
to them) are selected, and each is assigned a
weight.
- Disadvantages
- Construction of a manual thesaurus requires a lot
of human labor.
- A general manual thesaurus does not consistently
improve retrieval performance.
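A minimal sketch of dictionary-based expansion. It uses a small hand-built dictionary as a stand-in for a manual thesaurus; it does not call WordNet, and the entries, the function name, and the fixed synonym weight of 0.5 are assumptions for illustration.

```python
# Hypothetical hand-built thesaurus; a real system might draw these
# entries from a manual resource such as WordNet.
TOY_THESAURUS = {
    "car": ["automobile", "auto"],
    "film": ["movie"],
}


def expand_query(query_terms, thesaurus, synonym_weight=0.5):
    """Return a term-to-weight mapping for the expanded query.

    Original terms get weight 1.0; each synonym found in the
    thesaurus gets synonym_weight.
    """
    weights = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for synonym in thesaurus.get(term, []):
            # Never lower the weight of a term the user typed.
            weights[synonym] = max(weights.get(synonym, 0.0), synonym_weight)
    return weights


expanded = expand_query(["car", "rental"], TOY_THESAURUS)
```

Weighting added terms below the original ones is one way to limit the damage when a general-purpose thesaurus supplies a synonym in the wrong sense.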
20Example - WordNet
21Automatic Thesauri Construction Approach
- Thesauri are constructed from the whole data
corpus (or a part of it).
- Basic idea of automatic thesauri construction
- Term co-occurrence
- Methods of automatic thesauri construction
- Traditional TFxIDF [Y02]
- Variant of TFxIDF (i.e., similarity thesaurus
[QF93])
- Mining Association Rule Approach [WBO00]
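A minimal sketch of the term co-occurrence idea, assuming the simplest possible association measure: the number of documents in which two terms appear together. It is not the TFxIDF, similarity-thesaurus, or association-rule method listed above; the function names and toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered term pair, the documents containing both."""
    counts = Counter()
    for doc in docs:
        # Sorting gives every pair one canonical (a, b) order with a < b.
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts


def related_terms(term, counts, top_n=3):
    """Terms that co-occur with `term`, most frequent first."""
    scored = []
    for (a, b), count in counts.items():
        if term == a:
            scored.append((b, count))
        elif term == b:
            scored.append((a, count))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in scored[:top_n]]


# Hypothetical tokenized corpus.
corpus = [
    ["query", "expansion", "thesaurus"],
    ["query", "expansion"],
    ["query", "feedback"],
]
counts = cooccurrence_counts(corpus)
neighbors = related_terms("query", counts)
```

Raw counts favor frequent terms, which is one reason the methods above weight occurrences rather than count them.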
22Example of Thesaurus Construction
- To each term t_i is associated a vector
- Where
- The relationship between two terms t_u and t_v
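The formulas for this example are not preserved in the transcript. A hedged reconstruction, assuming the slide follows the similarity-thesaurus formulation of [QF93] as presented in [BR99], is sketched below. The symbols are assumptions: N is the number of documents, t the number of terms in the collection, t_j the number of distinct terms in document d_j, f_{i,j} the frequency of term t_i in d_j, and itf_j the inverse term frequency of d_j.

```latex
% Vector associated with term t_i
\vec{t}_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,N})

% Where
w_{i,j} = \frac{\left(0.5 + 0.5\,\frac{f_{i,j}}{\max_j f_{i,j}}\right) itf_j}
               {\sqrt{\sum_{l=1}^{N} \left(0.5 + 0.5\,\frac{f_{i,l}}{\max_l f_{i,l}}\right)^2 itf_l^2}},
\qquad itf_j = \log \frac{t}{t_j}

% Relationship between two terms t_u and t_v
c_{u,v} = \vec{t}_u \cdot \vec{t}_v = \sum_{j=1}^{N} w_{u,j}\, w_{v,j}
```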
23Example of Thesaurus Construction (contd)
24Global Analysis
- The whole collection of documents is used for
thesaurus creation.
- Approaches
- Similarity Thesaurus [QF93]
- Statistical Thesaurus [CY92]
25Global Analysis (contd)
26Local Analysis
- Unlike global analysis, only the top-ranked
documents are used for constructing the thesaurus.
- Approaches
- Local Clustering [AF77]
- Local Context Analysis [X97, XC96, XC00]
- According to [XC96, X97, XC00], local analysis is
more effective than global analysis.
27Local Analysis (contd)
29References
- [AF77] Attar, R. and Fraenkel, A. S., "Local
Feedback in Full-Text Retrieval Systems," Journal
of the ACM, Volume 24, Issue 3, 1977, pp. 397-417.
- [BR99] Baeza-Yates, R. and Ribeiro-Neto, B.,
Modern Information Retrieval, Addison Wesley/ACM
Press, Harlow, England, 1999.
- [CY92] Crouch, C. J. and Yang, B., "Experiments in
Automatic Statistical Thesaurus Construction,"
Proceedings of the 15th Annual International ACM
SIGIR Conference on Research and Development in
Information Retrieval, 1992, pp. 77-88.
- [M95] Miller, G. A., "WordNet: A Lexical Database
for English," Communications of the ACM, Vol. 38,
No. 11, November 1995, pp. 39-41.
- [QF93] Qiu, Y. and Frei, H. P., "Concept Based
Query Expansion," Proceedings of the 16th Annual
International ACM SIGIR Conference on Research
and Development in Information Retrieval, 1993,
pp. 160-169.
- [WBO00] Wei, J., Bressan, S., and Ooi, B. C.,
"Mining Term Association Rules for Automatic
Global Query Expansion: Methodology and
Preliminary Results," Proceedings of the First
International Conference on Web Information
Systems Engineering, Volume 1, 2000, pp. 366-373.
30References (contd)
- [X97] Xu, J., "Solving the Word Mismatch Problem
Through Automatic Text Analysis," PhD Thesis,
University of Massachusetts at Amherst, 1997.
- [XC96] Xu, J. and Croft, W. B., "Query Expansion
Using Local and Global Document Analysis,"
Proceedings of the 19th Annual International ACM
SIGIR Conference on Research and Development in
Information Retrieval, 1996, pp. 4-11.
- [XC00] Xu, J. and Croft, W. B., "Improving the
Effectiveness of Information Retrieval with Local
Context Analysis," ACM Transactions on
Information Systems, Volume 18, Issue 1, 2000,
pp. 79-112.
- [Y02] Yang, C., "Investigation of Term Expansion
on Text Mining Techniques," Master's Thesis,
National Sun Yat-sen University, Taiwan, 2002.