Using OWA Fuzzy Operator to Merge Retrieval System Results

1
Using OWA Fuzzy Operator to Merge Retrieval
System Results
Tehran University

Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud
Rahgozar
School of Electrical and Computer Engineering
University of Tehran
Farhad Oroumchian
University of Wollongong in Dubai

2
Outline

The Persian Language
Used Methods
Vector Space Model
Language Modeling
OWA Operator
The test collections
Experiment results
Conclusion

4
The Persian Language

It is spoken in countries such as Iran, Tajikistan, and Afghanistan
It has an Arabic-like script with 32 characters, written continuously from right to left
Its morphological analyzers need to deal with many word forms that are not actually Farsi
Example: the word ???? has two plural forms in Farsi
Farsi form: ???? ??
Arabic form: ?????

6
Vector Space Model
[Table: list of weights that produced the best results]
We used the Lnu.ltu and Lnc.btc weighting schemes
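The slide names the Lnu.ltu scheme without spelling it out. The following is a minimal sketch of how Lnu document weights and ltu query weights are commonly defined in SMART notation with pivoted unique normalization. The function names, the natural logarithm, and the choice of pivot are assumptions for illustration, not details taken from the slides; only the slope of 0.25 is mentioned elsewhere in the presentation.

```python
import math
from collections import Counter


def lnu_weights(doc_terms, pivot, slope=0.25):
    """Lnu document weights (assumed SMART definition).

    L: (1 + log tf) / (1 + log average tf), n: no idf,
    u: pivoted unique normalization.
    """
    tf = Counter(doc_terms)
    if not tf:
        return {}
    avg_tf = sum(tf.values()) / len(tf)
    # Pivoted unique normalization uses the number of distinct terms.
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {
        term: ((1.0 + math.log(freq)) / (1.0 + math.log(avg_tf))) / norm
        for term, freq in tf.items()
    }


def ltu_weights(query_terms, doc_freq, num_docs, pivot, slope=0.25):
    """ltu query weights (assumed SMART definition).

    l: 1 + log tf, t: idf = log(N / df), u: pivoted unique normalization.
    Terms that never occur in the collection are skipped.
    """
    tf = Counter(query_terms)
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {
        term: (1.0 + math.log(freq)) * math.log(num_docs / doc_freq[term]) / norm
        for term, freq in tf.items()
        if doc_freq.get(term, 0) > 0
    }


def inner_product(doc_w, query_w):
    """Similarity of a document and a query as the dot product of their weights."""
    return sum(w * doc_w.get(term, 0.0) for term, w in query_w.items())
```

The pivot is often set to the average number of distinct terms per document in the collection; that choice is likewise an assumption here.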
8
Language Modeling

Four ways to specify the rank of document d against query q
P(D = d) is the prior probability of relevance of document d to query q
Lambda (λ) is a smoothing parameter and is the same for every query term
If no previous relevance information is available for a query, each query term is considered equally important
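The ranking equations themselves did not survive in this transcript. As a rough illustration of the ingredients listed above (a document prior, and one smoothing parameter λ shared by all query terms), here is a minimal sketch of a query-likelihood score with linear interpolation smoothing. The function name, the default λ of 0.5, and the convention that λ weights the document model are assumptions, and this is not necessarily any of the four formulations on the slide.

```python
import math
from collections import Counter


def lm_score(query_terms, doc_terms, collection_tf, collection_len,
             lam=0.5, doc_prior=1.0):
    """Log of: prior(d) * product over query terms of
    (lam * P(t | d) + (1 - lam) * P(t | collection)).

    lam is the same for every query term, i.e. all terms are
    treated as equally important.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = math.log(doc_prior)
    for term in query_terms:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = collection_tf.get(term, 0) / collection_len
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy two-document collection, invented for illustration only.
d1 = ["owa", "fuzzy", "owa"]
d2 = ["persian", "text", "fuzzy"]
collection_tf = Counter(d1 + d2)
collection_len = len(d1) + len(d2)
score_d1 = lm_score(["owa"], d1, collection_tf, collection_len)
score_d2 = lm_score(["owa"], d2, collection_tf, collection_len)
```

With these toy documents, d1 scores higher than d2 for the query "owa", because d2 only receives the collection-level probability of the term.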

9
Language Modeling- Cont.
11
OWA Operator- Cont.

We used the OWA operator as the merge operator.
The OWA weight of each document d is computed from the scores $x_1, \dots, x_n$.
Each score $x_i$ is assigned by the $i$th search engine to document d. If d is not present in the $i$th list, then $x_i = 0$.

12
OWA Operator- Weighting Method

Quantifier Based Weighting
Degree of Importance Based Weighting

13
Quantifier Based Weighting

The linguistic quantifiers All, Most, Few, and At-Least-One serve as the weighting schemes:
All: considers only documents appearing in the lists of all retrieval engines. This quantifier is suitable when the user is looking for a precise answer.
Most: a fuzzy majority operator that treats retrieval by most of the engines as sufficient for inclusion in the fused list.
Few: a weaker weighting scheme in which it is enough for a document to be retrieved by a few retrieval engines.
At-Least-One: the weakest weighting scheme, in which it is enough for a document to appear in only one retrieval engine's list to be included in the fused list.
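To make the two extreme quantifiers concrete, here is a minimal sketch that expresses All and At-Least-One as OWA weight vectors over scores sorted in ascending order. The crisp "at least k of n" quantifier in between is only a stand-in, since the slides do not give the fuzzy weight vectors actually used for Most and Few. All function names and the example scores are hypothetical.

```python
def at_least_k_weights(n, k):
    """Crisp 'at least k of n' weights for scores sorted in ascending order.

    All the weight goes on the k-th largest score, which is non-zero
    only if at least k engines retrieved the document.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    weights = [0.0] * n
    weights[n - k] = 1.0  # index of the k-th largest in ascending order
    return weights


def all_weights(n):
    """'All': the fused score is the minimum engine score."""
    return at_least_k_weights(n, n)


def at_least_one_weights(n):
    """'At-Least-One': the fused score is the maximum engine score."""
    return at_least_k_weights(n, 1)


def owa(scores, weights):
    """Weighted sum of the scores after sorting them in ascending order."""
    ordered = sorted(scores)
    return sum(w * b for w, b in zip(weights, ordered))


# Hypothetical normalized scores of one document from six engines;
# 0.0 means the engine did not retrieve the document.
scores = [0.0, 0.4, 0.9, 0.0, 0.7, 0.0]
fused_all = owa(scores, all_weights(6))            # 0.0, not in every list
fused_any = owa(scores, at_least_one_weights(6))   # 0.9, best single score
fused_3 = owa(scores, at_least_k_weights(6, 3))    # 0.4, third-largest score
fused_4 = owa(scores, at_least_k_weights(6, 4))    # 0.0, only three lists have it
```

A fuzzy quantifier would spread the weight over several positions instead of placing it on a single one, which softens the hard cut-off visible between `fused_3` and `fused_4`.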

14
Degree of Importance Based Weighting

As the second weighting scheme, we use the position of the documents in the retrieved lists
The weight of each document d in list $L_{i,q}$ is defined in terms of $N_i$ and $POS_i$
$N_i$ is the number of elements in the $i$th list, $L_{i,q}$, and $POS_i$ is the position of document d in $L_{i,q}$

16
Test Collections

Qvanin Collection
  Documents: Iranian law collection
  177089 passages
  41 queries and relevance judgments
Hamshahri Collection
  Documents: 600 MB of news from the Hamshahri newspaper
  160000 news articles
  60 queries and relevance judgments
BijanKhan Tagged Collection
  Documents: 100 MB from different sources
  A tag set of 41 tags
  2590000 tagged words

17
Hamshahri Collection

We used HAMSHAHRI, a test collection for Persian text prepared and distributed by DBRG (IR team) of the University of Tehran
The 3rd version contains about 160000 distinct textual news articles in Farsi
It includes 60 queries and relevance judgments for the top 20 relevant documents of each query

19
Experiment results
Precision of the six retrieval engines at different document cut-offs. LM4 and Lnu.ltu with slope 0.25 perform better than the other systems.
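Precision at a document cut-off is the fraction of the top k retrieved documents that are relevant. A minimal sketch, with a hypothetical function name and made-up document identifiers:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not by len(top_k)) penalizes lists shorter than k.
    return hits / k


# Made-up ranking and judgments, for illustration only.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
p_at_1 = precision_at_k(ranking, relevant, 1)
p_at_2 = precision_at_k(ranking, relevant, 2)
p_at_5 = precision_at_k(ranking, relevant, 5)
```

Averaging this value over all queries at each cut-off gives one precision curve per retrieval engine.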
20
Quantifier Based OWA Weighting
The parameter n (in Most and Few quantifiers)
indicates the minimum number of retrieval lists
sufficient for inclusion in the merge process
21
Experiment results
Precision of the fusion methods at different document cut-offs. The best are Most3 and Most4.
22
Experiment results

Comparing LM4 and Lnu.ltu methods with the best
OWA results

23
Statistical significance tests

Wilcoxon Signed Rank
T-Test

24
Statistical significance tests

Based on the T-Test, both Most3 and Most4 are significantly better than LM4, which confirms the result of the Wilcoxon Signed Rank test.
However, with the T-Test we cannot confirm the significance of Most3 and Most4 over Lnu.ltu with slope 0.25.
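Both tests compare two systems on per-query differences. The sketch below computes only the test statistics, using the standard library; the function names are hypothetical and the per-query values are invented for illustration, not results from this work. Turning a statistic into a p-value additionally needs the t distribution or signed-rank tables.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus): rank sums of positive and negative differences.

    Zero differences are dropped; tied absolute differences share
    the average of the ranks they occupy.
    """
    diffs = sorted((x - y for x, y in zip(a, b) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[pos] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Invented per-query scores of two systems (integers keep ties exact).
system_a = [5, 7, 6, 9]
system_b = [4, 4, 7, 5]
t_stat = paired_t_statistic(system_a, system_b)
w_plus, w_minus = wilcoxon_signed_rank(system_a, system_b)
```

In practice a library routine such as SciPy's `scipy.stats.ttest_rel` or `scipy.stats.wilcoxon` would usually be used, since it also reports the p-value.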

25
Conclusion

We used two weighting methods, namely quantifier-based and degree-of-importance-based weighting.
The experimental results show that the best OWA operators, Most3 and Most4 (quantifier-based OWA operators), only marginally improve over the best retrieval method on Persian text, the LM4 method.
However, they seem to produce a better ranking, since they push the relevant documents to higher ranks.
The significance tests we conducted seem to confirm that Most3 and Most4 are significantly better than all other methods except Lnu.ltu with slope 0.25.
The superiority over Lnu.ltu with slope 0.25 was not confirmed by the T-Test.

26
Thanks. Questions?

http://ece.ut.ac.ir/dbrg
27
OWA Operator- Cont.

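The OWA weight of each document is computed by this equation:
$\mathrm{OWA}(x_1, x_2, \dots, x_n) = W^T B = \sum_{j=1}^{n} w_j b_j$
$W^T$ is the transpose of the weight vector $W = [w_1, w_2, \dots, w_n]$, which defines the semantics associated with the OWA operator
$B = [b_1, b_2, \dots, b_n]$ is the vector $X = [x_1, x_2, \dots, x_n]$ reordered so that $b_j = \mathrm{Min}_j(x_1, x_2, \dots, x_n)$, that is, the $j$th smallest element of $x_1, x_2, \dots, x_n$
We used a simple function to bring the scores ($x_i$, $i = 1, \dots, n$) onto the same scale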
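The merge step described above can be sketched end to end as follows. The slide does not say which scaling function was used, so min-max normalization is an assumption here, as are the function names and the toy scores. Scores are sorted in ascending order to match the "jth smallest" definition of B.

```python
def min_max_normalize(scores):
    """Rescale one engine's {doc_id: score} map to [0, 1] (assumed scaling)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def owa(scores, weights):
    """W^T B, where B holds the scores sorted in ascending order."""
    if len(scores) != len(weights):
        raise ValueError("one weight per score is required")
    ordered = sorted(scores)  # ordered[j] is the (j+1)-th smallest score
    return sum(w * b for w, b in zip(weights, ordered))


def fuse(engine_results, weights):
    """Merge per-engine score maps into one list ranked by OWA weight."""
    normalized = [min_max_normalize(result) for result in engine_results]
    doc_ids = set().union(*normalized)
    fused = {
        # A document missing from an engine's list gets score 0 there.
        doc_id: owa([result.get(doc_id, 0.0) for result in normalized], weights)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy scores from two engines on different scales, for illustration only.
engine_1 = {"a": 10.0, "b": 5.0, "c": 0.0}
engine_2 = {"a": 2.0, "c": 1.0}
ranked = fuse([engine_1, engine_2], weights=[0.5, 0.5])
```

One side effect of min-max scaling is that the lowest-scored document in a list ends up with 0, the same value as a document the engine did not retrieve at all; a different scaling function would avoid that.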