Title: Using OWA Fuzzy Operator to Merge Retrieval System Results
1Using OWA Fuzzy Operator to Merge Retrieval
System Results
Tehran University
- Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud Rahgozar
- School of Electrical and Computer Engineering, University of Tehran
- Farhad Oroumchian
- University of Wollongong in Dubai
2Outline
- The Persian Language
- Used Methods
- Vector Space Model
- Language Modeling
- OWA Operator
- The test collections
- Experiment results
- Conclusion
4The Persian Language
- It is spoken in countries such as Iran, Tajikistan, and Afghanistan
- It has an Arabic-like script with 32 characters, written continuously from right to left
- Its morphological analyzers need to deal with many word forms that are not actually Farsi
- Example
- The word ???? has two plural forms in Farsi
- Farsi form: ???? ??
- Arabic form: ?????
6Vector Space Model
List of weights that produced the best results
We used the Lnu.ltu and Lnc.btc weighting schemes
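As an illustration of the Lnu.ltu scheme named on this slide, here is a minimal sketch based on the usual SMART definitions with pivoted unique normalization. The function names, the choice of pivot (commonly the average number of unique terms per document), and the natural logarithm are assumptions made for this example, not details taken from the slides.

```python
import math
from collections import Counter


def lnu_weights(doc_terms, pivot, slope=0.25):
    """Lnu document weights: log-average tf, no idf, pivoted unique normalization."""
    tf = Counter(doc_terms)
    if not tf:
        return {}
    avg_tf = sum(tf.values()) / len(tf)
    # Pivoted unique normalization: len(tf) is the number of unique terms.
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {
        t: ((1.0 + math.log(f)) / (1.0 + math.log(avg_tf))) / norm
        for t, f in tf.items()
    }


def ltu_weights(query_terms, doc_freq, n_docs, pivot, slope=0.25):
    """ltu query weights: log tf times idf, pivoted unique normalization."""
    tf = Counter(query_terms)
    if not tf:
        return {}
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {
        t: (1.0 + math.log(f)) * math.log(n_docs / doc_freq[t]) / norm
        for t, f in tf.items()
        if doc_freq.get(t, 0) > 0  # skip terms that never occur in the collection
    }


def dot_score(doc_w, query_w):
    """Inner product of query and document weight vectors."""
    return sum(w * doc_w.get(t, 0.0) for t, w in query_w.items())
```

The slope of 0.25 used as the default matches the setting reported as best in the experiment slides; the pivot would have to be estimated from the collection.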
8Language Modeling
- Four ways to specify the rank of document d against query q
- P(D=d) is considered as the prior probability of relevance of the document d to the query q
- Lambda (λ) is a smoothing parameter and is equal for every query term
- If there is no previous relevance information available for a query, each query term will be considered equally important
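The ranking formulas themselves did not survive on this slide, so the following is only a generic sketch of a language-model score with a document prior and a single λ shared by all query terms. It uses linear interpolation between the document model and the collection model; placing λ on the document model, the names `lm_score`, `collection_tf`, and `collection_len`, and the default values are all assumptions for illustration and should not be read as any of the four variants in the talk.

```python
import math
from collections import Counter


def lm_score(query_terms, doc_terms, collection_tf, collection_len,
             lam=0.5, doc_prior=1.0):
    """Log of P(D=d) * prod_t [lam * P(t|d) + (1 - lam) * P(t|collection)]."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = math.log(doc_prior)
    for t in query_terms:
        p_doc = tf[t] / doc_len if doc_len else 0.0
        p_col = collection_tf.get(t, 0) / collection_len
        p = lam * p_doc + (1.0 - lam) * p_col  # same lam for every query term
        if p == 0.0:
            return float("-inf")  # term unseen in both document and collection
        score += math.log(p)
    return score
```

With a uniform prior (`doc_prior=1.0`) the prior has no effect on the ranking; a non-uniform prior would shift documents up or down independently of the query.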
9Language Modeling- Cont.
11OWA Operator- Cont.
- We used the OWA operator as the merge operator.
- The OWA weight of each document d is defined as $\mathrm{OWA}(x_1, \ldots, x_n) = W^T B = \sum_{j=1}^{n} w_j b_j$
- Each score $x_i$ is assigned by the ith search engine to document d. If d is not present in the ith list, then $x_i = 0$.
12OWA Operator- Weighting Method
- Quantifier Based Weighting
- Degree of Importance Based Weighting
13Quantifier Based Weighting
- We used the linguistic quantifiers All, Most, Few, and At-Least-One as the weighting schemes
- All: considers only documents appearing in all retrieval engines' lists. This quantifier is suitable when the user is looking for a precise answer
- Most: a fuzzy majority operator that assumes retrieval by most of the engines to be sufficient for inclusion in the fused list
- Few: a weaker weighting scheme in which it is enough for a document to be retrieved by a few retrieval engines
- At-Least-One: the weakest weighting scheme, in which it is enough for a document to appear in only one retrieval engine's list to be included in the fused list
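To make the quantifiers concrete, the sketch below applies an OWA operator to the scores one document receives from several engines, with missing lists contributing a score of 0. The slides do not give the weight vectors behind each quantifier, so `at_least_k_weights` is a hypothetical crisp reading ("retrieved by at least k engines") in which All and At-Least-One fall out as the two extremes. Scores are sorted in ascending order, following the deck's definition of b_j as the jth smallest score.

```python
def owa(scores, weights):
    """OWA aggregation: weights applied to the scores sorted in ascending order."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores)  # b_j is the j-th smallest score
    return sum(w * b for w, b in zip(weights, ordered))


def at_least_k_weights(n, k):
    """Hypothetical crisp quantifier: all weight on the k-th largest score.

    The result is positive only if at least k engines gave a positive score.
    k = n behaves like All (minimum); k = 1 like At-Least-One (maximum).
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    weights = [0.0] * n
    weights[n - k] = 1.0  # k-th largest sits at index n - k in ascending order
    return weights


# One document scored by six engines; it is missing from two lists.
scores = [0.9, 0.0, 0.7, 0.0, 0.8, 0.6]
fused_all = owa(scores, at_least_k_weights(6, 6))           # 0.0
fused_at_least_3 = owa(scores, at_least_k_weights(6, 3))    # 0.7
fused_at_least_one = owa(scores, at_least_k_weights(6, 1))  # 0.9
```

Under this reading, a document found by four of six engines is dropped by All, kept by the "at least 3" and "at least 4" settings, and kept with its highest score by At-Least-One. A smoother fuzzy quantifier would spread the weight over several positions instead of one.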
14Degree of Importance Based Weighting
- As the second weighting scheme, we use the position of the documents in the retrieved lists
- The weight of each document d in $L_{i,q}$ is defined as a function of $N_i$ and $POS_i$
- $N_i$ is the number of elements in the ith list, $L_{i,q}$, and $POS_i$ is the position of document d in $L_{i,q}$.
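The exact position-based formula is not recoverable from this slide. As a purely illustrative stand-in, the sketch below uses a linear decay, (N_i - POS_i + 1) / N_i, so that the top document in a list gets weight 1 and the last one gets 1 / N_i. This formula and the function names are assumptions, not the authors' definition.

```python
def position_weight(pos, n):
    """Assumed linear position weight: 1.0 at rank 1, 1/n at rank n."""
    if not 1 <= pos <= n:
        raise ValueError("pos must be between 1 and n")
    return (n - pos + 1) / n


def position_weights(ranked_list):
    """Map each document in one engine's ranked list to its position weight."""
    n = len(ranked_list)
    return {doc: position_weight(i, n) for i, doc in enumerate(ranked_list, start=1)}


weights = position_weights(["d1", "d2", "d3", "d4"])
# {'d1': 1.0, 'd2': 0.75, 'd3': 0.5, 'd4': 0.25}
```

Any monotonically decreasing function of the position would serve the same purpose; the linear form is chosen here only because it depends on nothing beyond N_i and POS_i.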
16Test Collections
- Qvanin Collection
- Documents: Iranian Law Collection
- 177089 passages
- 41 queries and relevance judgments
- Hamshahri Collection
- Documents: 600 MB of news from the Hamshahri newspaper
- 160000 news articles
- 60 queries and relevance judgments
- BijanKhan Tagged Collection
- Documents: 100 MB from different sources
- A tag set of 41 tags
- 2590000 tagged words
17Hamshahri Collection
- We used HAMSHAHRI, a test collection for Persian text prepared and distributed by the DBRG (IR team) of the University of Tehran
- The 3rd version contains about 160000 distinct textual news articles in Farsi
- 60 queries and relevance judgments for the top 20 relevant documents for each query
19Experiment results
The precision of the six retrieval engines at different document cut-offs. LM4 and Lnu.ltu with a slope of 0.25 perform better than the other systems.
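For reference, precision at a document cut-off k is the fraction of the top k retrieved documents that are relevant. A minimal sketch follows; the function name and the sample data are made up for illustration.

```python
def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_docs[:k]
    # Divide by k even if fewer than k documents were retrieved.
    return sum(1 for doc in top_k if doc in relevant) / k


ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d9"}
p_at_2 = precision_at_k(ranked, relevant, 2)  # 0.5
p_at_5 = precision_at_k(ranked, relevant, 5)  # 0.4
```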
20Quantifier Based OWA Weighting
The parameter n (in Most and Few quantifiers)
indicates the minimum number of retrieval lists
sufficient for inclusion in the merge process
21Experiment results
The precision of the fusion methods at different document cut-offs. The best are Most3 and Most4.
22Experiment results
- Comparing LM4 and Lnu.ltu methods with the best
OWA results
23Statistical significance tests
- Wilcoxon Signed Rank
- T-Test
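Both tests compare two systems on the same set of queries, so they operate on per-query differences. The sketch below computes only the test statistics, using the standard library; the function names and the sample numbers are illustrative, and turning the statistics into p-values would need distribution tables or a statistics package (SciPy, for example, provides `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic: mean difference over its standard error."""
    d = [x - y for x, y in zip(a, b)]
    # stdev is the sample standard deviation; it is zero if all differences match.
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def wilcoxon_signed_rank(a, b):
    """Return (W+, W-): rank sums of positive and negative differences."""
    d = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return w_plus, w_minus


# Made-up per-query scores for two systems, four queries.
system_a = [5, 7, 6, 9]
system_b = [4, 4, 8, 5]
w_plus, w_minus = wilcoxon_signed_rank(system_a, system_b)  # (8.0, 2.0)
t_stat = paired_t_statistic(system_a, system_b)
```

The t-test assumes roughly normal differences, while the Wilcoxon test uses only their ranks, which is one reason the two tests can disagree on borderline cases.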
24Statistical significance tests
- Based on the T-Test, both the Most3 and Most4 methods are significantly better than the LM4 method, which confirms the result of the Wilcoxon Signed Rank test.
- However, with the T-Test we cannot confirm the significance of the Most3 and Most4 methods over Lnu.ltu with a slope of 0.25.
25Conclusion
- We used two weighting methods: quantifier-based and degree-of-importance-based weighting
- The experimental results show that the best OWA operators, Most3 and Most4 (quantifier-based OWA operators), only marginally improve over the best retrieval method on Persian text, the LM4 method.
- However, they seem to produce better rankings, since they push the relevant documents to higher ranks.
- The significance tests we conducted seem to confirm that Most3 and Most4 are significantly better than all other methods except Lnu.ltu with a slope of 0.25.
- However, the superiority over Lnu.ltu with a slope of 0.25 was not confirmed by the T-Test.
26Thanks, Questions
http://ece.ut.ac.ir/dbrg
27OWA Operator- Cont.
- The OWA weight of each document is computed by this equation: $\mathrm{OWA}(x_1, \ldots, x_n) = W^T B = \sum_{j=1}^{n} w_j b_j$
- $W^T$ is the transpose of the vector $W$, which defines the semantics associated with the OWA operator
- $B = [b_1, b_2, \ldots, b_n]$ is the vector $X = [x_1, x_2, \ldots, x_n]$ reordered so that $b_j = \mathrm{Min}_j(x_1, x_2, \ldots, x_n)$, that is, the jth smallest element of all the $x_1, x_2, \ldots, x_n$.
- We used a simple function to bring the scores ($x_i$, $i = 1, \ldots, n$) onto the same scale
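The slide does not say which scaling function was used. One common choice for bringing scores from different engines onto a shared scale is min-max normalization per result list, sketched below as an assumption; the function name and the handling of constant lists are choices made for this example.

```python
def min_max_normalize(scores):
    """Rescale one engine's scores linearly to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


normalized = min_max_normalize({"d1": 2.0, "d2": 4.0, "d3": 6.0})
# {'d1': 0.0, 'd2': 0.5, 'd3': 1.0}
```

One caveat of this particular choice is that the lowest-ranked retrieved document ends up with a score of 0, the same value assigned to documents missing from a list, so a variant with a small positive floor may be preferable when that distinction matters.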