Using OWA Fuzzy Operator to Merge Retrieval System Results - PowerPoint PPT Presentation

Description:

Tehran University. Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud Rahgozar ... University of Tehran - Database Research Group. The Persian Language ... – PowerPoint PPT presentation


Transcript and Presenter's Notes



1
Using OWA Fuzzy Operator to Merge Retrieval
System Results
Tehran University
  • Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud
    Rahgozar
  • School of Electrical and Computer Engineering
  • University of Tehran
  • Farhad Oroumchian
  • University of Wollongong in Dubai

2
Outline
  • The Persian Language
  • Used Methods
  • Vector Space Model
  • Language Modeling
  • OWA Operator
  • The test collections
  • Experiment results
  • Conclusion

3
Outline
  • The Persian Language
  • Used Methods
  • Vector Space Model
  • Language Modeling
  • OWA Operator
  • The test collections
  • Experiment results
  • Conclusion

4
The Persian Language
  • It is spoken in countries such as Iran, Tajikistan
    and Afghanistan
  • It uses an Arabic-like script of 32 characters,
    written cursively from right to left
  • Its morphological analyzers must deal with
    many word forms that are not actually Farsi
  • Example
      • The word ???? has two plural forms in
        Farsi
      • Farsi form: ???? ??
      • Arabic form: ?????

5
Outline
  • The Persian Language
  • Used Methods
  • Vector Space Model
  • Language Modeling
  • OWA Operator
  • The test collections
  • Experimental results
  • Conclusion

6
Vector Space Model
List of weighting schemes that produced the best results
We used the Lnu.ltu and Lnc.btc weighting schemes
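A minimal sketch of Lnu.ltu scoring, assuming the usual SMART definitions with pivoted unique normalization: on the document side L = (1 + ln tf) / (1 + ln average tf), no idf, and division by (1 - s) * pivot + s * (number of unique terms); on the query side l = 1 + ln tf and t = ln(N / df), with the same normalization. The pivot (typically the average number of unique terms per document), the default slope 0.25 (the slope named in the results) and all function names are assumptions, not the authors' code.

```python
import math
from collections import Counter

def pivoted_unique_norm(num_unique, pivot, slope):
    # 'u' normalization: (1 - slope) * pivot + slope * number_of_unique_terms
    return (1.0 - slope) * pivot + slope * num_unique

def lnu_weights(doc_terms, pivot, slope=0.25):
    """Document side of Lnu.ltu: L tf factor, no idf (n), pivoted unique norm (u)."""
    tf = Counter(doc_terms)
    if not tf:
        return {}
    avg_tf = sum(tf.values()) / len(tf)
    norm = pivoted_unique_norm(len(tf), pivot, slope)
    return {t: (1 + math.log(c)) / (1 + math.log(avg_tf)) / norm
            for t, c in tf.items()}

def ltu_weights(query_terms, df, num_docs, pivot, slope=0.25):
    """Query side: log tf (l), idf ln(N/df) (t), pivoted unique norm (u).
    Terms that occur in no document are skipped."""
    tf = Counter(query_terms)
    norm = pivoted_unique_norm(len(tf), pivot, slope)
    return {t: (1 + math.log(c)) * math.log(num_docs / df[t]) / norm
            for t, c in tf.items() if df.get(t, 0) > 0}

def lnu_ltu_score(doc_w, query_w):
    # Inner product over the terms shared by query and document
    return sum(w * doc_w.get(t, 0.0) for t, w in query_w.items())
```

The query-side normalization is the same constant for every document of a given query, so it rescales scores without changing that query's ranking.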
7
Outline
  • The Persian Language
  • Used Methods
  • Vector Space Model
  • Language Modeling
  • OWA Operator
  • The test collections
  • Experimental results
  • Conclusion

8
Language Modeling
  • Four ways are used to rank document d
    against query q
  • P(D=d) is the prior probability of
    relevance of document d to query q
  • Lambda (λ) is a smoothing parameter and is equal
    for every query term
  • If no previous relevance information is
    available for a query, each query term is
    considered equally important
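The four ranking formulas themselves are not preserved here. One common form that uses exactly the quantities named above (a document prior P(D=d) and a single λ shared by all query terms) is linear-interpolation smoothing: $score(d, q) = P(D=d) \prod_i \big((1 - \lambda) P(t_i) + \lambda P(t_i \mid d)\big)$. The sketch below computes it in log space. It is not claimed to be any specific variant such as LM4; maximum-likelihood estimates from term frequencies, a uniform prior and λ = 0.15 are placeholder assumptions.

```python
import math
from collections import Counter

def lm_score(query_terms, doc_terms, coll_tf, coll_len, lam=0.15, prior=1.0):
    """Log of P(D=d) * prod_i ((1 - lam) * P(t_i) + lam * P(t_i | d)).

    coll_tf: term -> frequency in the whole collection; coll_len: total tokens.
    lam and prior are illustrative placeholders, not the authors' settings."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    log_score = math.log(prior)
    for t in query_terms:
        p_coll = coll_tf.get(t, 0) / coll_len   # background model P(t)
        p_doc = tf[t] / dlen if dlen else 0.0    # document model P(t | d)
        mix = (1 - lam) * p_coll + lam * p_doc
        if mix == 0.0:
            # Term unseen in the whole collection: it is zero for every
            # document alike, so skipping it does not change the ranking.
            continue
        log_score += math.log(mix)
    return log_score
```

With one λ for every term, each query term is weighted equally, matching the no-relevance-information case described on the slide.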

9
Language Modeling- Cont.
10
Outline
  • The Persian Language
  • Used Methods
  • Vector Space Model
  • Language Modeling
  • OWA Operator
  • The test collections
  • Experimental results
  • Conclusion

11
OWA Operator- Cont.
  • We used the OWA operator as the merge operator.
  • The OWA weight of each document d is defined as
    $OWA(x_1, \dots, x_n) = W^T B = \sum_{j=1}^{n} w_j b_j$
  • Each score $x_i$ is assigned to document d by the
    i-th search engine. If d is not present in the
    i-th list, then $x_i = 0$.
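A minimal sketch of the OWA merge as defined in these slides: each engine's (already scaled) score for d forms X, a document missing from engine i gets $x_i = 0$, the scores are sorted in ascending order to form B (b_j is the j-th smallest, following the definition on the last slide), and the fused score is $W^T B$. The dictionary input format and the function names are assumptions for illustration.

```python
def owa(scores, weights):
    """OWA aggregation W^T B, with B = scores sorted ascending (b_j = j-th smallest)."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per retrieval engine")
    b = sorted(scores)
    return sum(w * x for w, x in zip(weights, b))

def fuse(engine_scores, weights):
    """engine_scores: one dict per engine, mapping doc id -> scaled score.

    Returns document ids ordered by fused OWA score, best first.
    A document absent from engine i contributes x_i = 0."""
    docs = set().union(*engine_scores)
    fused = {d: owa([s.get(d, 0.0) for s in engine_scores], weights) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Example: three engines, all weight on the middle (2nd smallest) score
engines = [{"d1": 0.9, "d2": 0.5}, {"d1": 0.7, "d3": 0.8}, {"d2": 0.6, "d1": 0.4}]
print(fuse(engines, [0, 1, 0]))
```

Weights are normally chosen to sum to 1, so the fused score stays on the same scale as the inputs.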

12
OWA Operator- Weighting Method
  • Quantifier Based Weighting
  • Degree of Importance Based Weighting

13
Quantifier Based Weighting
  • We use the linguistic quantifiers All, Most, Few, and
    At-Least-One as weighting schemes:
  • All: considers documents appearing in all
    retrieval engines' lists. This quantifier is
    suitable when the user is looking for precise
    answers.
  • Most: a fuzzy majority operator that considers
    retrieval by most of the engines to be
    sufficient for inclusion in the fused list.
  • Few: a weaker weighting scheme, in which it is
    enough for a document to be retrieved by a few
    retrieval engines.
  • At-Least-One: the weakest weighting scheme, in
    which it is enough for a document to appear in
    only one retrieval engine's list to be included
    in the fused list.
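The exact quantifier functions are not preserved in the slides. One simple crisp reading, assumed here, treats each quantifier as "d must appear in at least k of the n lists": k = n for All, k = 1 for At-Least-One, and k = 3 or 4 for Most3 and Most4 (the later slide says this parameter is the minimum number of lists). Missing documents score 0 and B is sorted ascending, so putting all the weight on the k-th largest score yields exactly that behavior. The original fuzzy Most and Few operators may well use graded weights instead; the function names are hypothetical.

```python
def at_least_k_weights(n, k):
    """Crisp OWA weights for 'appears in at least k of n lists' (an assumption).

    With B sorted ascending, the k-th largest score is at 0-based index n - k,
    so that position receives all of the weight."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    w = [0.0] * n
    w[n - k] = 1.0
    return w

def owa(scores, weights):
    b = sorted(scores)  # ascending: b[j] is the (j+1)-th smallest score
    return sum(w * x for w, x in zip(weights, b))

n = 6  # six retrieval engines in the experiments
QUANTIFIERS = {
    "All": at_least_k_weights(n, n),           # reduces to the minimum score
    "Most3": at_least_k_weights(n, 3),
    "Most4": at_least_k_weights(n, 4),
    "At-Least-One": at_least_k_weights(n, 1),  # reduces to the maximum score
}

# A document retrieved by only 2 of the 6 engines
x = [0.9, 0.8, 0.0, 0.0, 0.0, 0.0]
print({name: owa(x, w) for name, w in QUANTIFIERS.items()})
```

Under this reading a Few quantifier would simply use a smaller k; the document above survives At-Least-One and "at least 2" but scores 0 under Most3, Most4 and All.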

14
Degree of Importance Based Weighting
  • As the second weighting scheme, we use the
    position of the documents in the retrieved lists
  • The weight of each document d in $L_{i,q}$ is
    defined by
  • $N_i$ is the number of elements in the i-th list,
    $L_{i,q}$, and $POS_i$ is the position of document d in
    $L_{i,q}$.
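The position-based formula itself is not preserved in this transcript. Purely as an illustration, the sketch below assumes a linear decay built only from the two quantities the slide names, $N_i$ and $POS_i$: $weight = (N_i - POS_i + 1) / N_i$, so the top document gets 1 and the last gets $1 / N_i$. Both the formula and the function names are hypothetical.

```python
def position_weight(pos, n_items):
    """HYPOTHETICAL linear decay: position 1 -> 1.0, position N_i -> 1/N_i."""
    if not 1 <= pos <= n_items:
        raise ValueError("position must lie in 1..N_i")
    return (n_items - pos + 1) / n_items

def list_weights(ranked_docs):
    """Weights for every document in one ranked list L_{i,q} (best first)."""
    n = len(ranked_docs)
    return {d: position_weight(i, n) for i, d in enumerate(ranked_docs, start=1)}

print(list_weights(["a", "b", "c", "d"]))
```

A decreasing function of position is the essential property; how the authors combined these weights with OWA is not recoverable from the slides.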

15
Outline
  • The Persian Language
  • Used Methods
  • Vector Space Model
  • Language Modeling
  • OWA Operator
  • The test collections
  • Experimental results
  • Conclusion

16
Test Collections
  • Qavanin Collection
      • Documents: Iranian law collection
      • 177089 passages
      • 41 queries and relevance judgments
  • Hamshahri Collection
      • Documents: 600 MB of news from the Hamshahri newspaper
      • 160000 news articles
      • 60 queries and relevance judgments
  • BijanKhan Tagged Collection
      • Documents: 100 MB from different sources
      • A tag set of 41 tags
      • 2590000 tagged words

17
Hamshahri Collection
  • We used Hamshahri, a test collection for Persian
    text prepared and distributed by the DBRG (IR team)
    of the University of Tehran
  • We used its 3rd version
  • It contains about 160000 distinct news
    articles in Farsi
  • 60 queries, with relevance judgments for the top 20
    relevant documents of each query

18
Outline
  • The Persian Language
  • Used Methods
  • Vector Space Model
  • Language Modeling
  • OWA Operator
  • The test collections
  • Experimental results
  • Conclusion

19
Experiment results
Precision of the six retrieval engines at
different document cut-offs. LM4 and
Lnu.ltu with slope 0.25 perform better than
the other systems
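Precision at a document cut-off k (P@k) is the fraction of the top k retrieved documents that are relevant. A minimal sketch, assuming binary relevance judgments given as a set of document ids; the cut-off values in the helper are placeholders, since the exact cut-offs plotted are not preserved here.

```python
def precision_at(ranked_docs, relevant, k):
    """P@k: relevant documents among the top k, divided by k.

    If fewer than k documents were retrieved, the empty slots count as
    non-relevant (the usual convention of dividing by k, not by list length)."""
    if k <= 0:
        raise ValueError("cut-off must be positive")
    return sum(1 for d in ranked_docs[:k] if d in relevant) / k

def precision_at_cutoffs(ranked_docs, relevant, cutoffs=(5, 10, 20)):
    # Placeholder cut-offs; pass the ones actually used in an evaluation.
    return {k: precision_at(ranked_docs, relevant, k) for k in cutoffs}

run = ["d1", "d2", "d3", "d4"]
print(precision_at_cutoffs(run, {"d1", "d3"}, cutoffs=(2, 4, 5)))
```

Averaging these values over all queries gives the per-engine curves compared at each cut-off.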
20
Quantifier Based OWA Weighting
The parameter n (in Most and Few quantifiers)
indicates the minimum number of retrieval lists
sufficient for inclusion in the merge process
21
Experiment results
Precision of the fusion methods at different
document cut-offs. The best are Most3 and Most4.
22
Experiment results
  • Comparing LM4 and Lnu.ltu methods with the best
    OWA results

23
Statistical significance tests
  • Wilcoxon Signed Rank
  • T-Test

24
Statistical significance tests
  • Based on the t-test, both Most3 and Most4 are
    significantly better than LM4, which
    confirms the Wilcoxon signed-rank test result.
  • However, the t-test cannot confirm that
    Most3 and Most4 are significantly better than
    Lnu.ltu with slope 0.25.
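Both tests compare two systems on the same queries through per-query score differences. The sketch below computes only the test statistics with the standard library: the paired t statistic, and the Wilcoxon signed-rank sums W+ and W- (zero differences dropped, tied magnitudes given their average rank). Turning these into p-values needs the t distribution or signed-rank tables; if SciPy is available, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` do this directly. Which per-query measure was compared is not stated, so the inputs are assumed to be any per-query effectiveness scores; the function names are hypothetical.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) for per-query differences d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    if len(d) < 2:
        raise ValueError("need at least two paired observations")
    sd = statistics.stdev(d)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(d) / (sd / math.sqrt(len(d)))

def wilcoxon_rank_sums(a, b):
    """Return (W_plus, W_minus) for the Wilcoxon signed-rank test."""
    d = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    order = sorted(range(len(d)), key=lambda idx: abs(d[idx]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return w_plus, w_minus
```

The t-test assumes roughly normal differences while the signed-rank test uses only their ranks, which is one reason the two can disagree, as they do for the comparison with Lnu.ltu.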

25
Conclusion
  • We used two weighting methods, namely quantifier-based
    and degree-of-importance-based weighting
  • The experimental results show that the best OWA
    operators, Most3 and Most4 (quantifier-based OWA
    operators), only marginally improve over the best
    retrieval method on Persian text, LM4.
  • However, they seem to produce better rankings, since
    they push relevant documents to higher ranks.
  • The significance tests we conducted seem to
    confirm that Most3 and Most4 are significantly
    better than all other methods but Lnu.ltu with
    slope 0.25.
  • However, the superiority over Lnu.ltu with
    slope 0.25 was not confirmed by the t-test.

26
Thanks. Questions?

http://ece.ut.ac.ir/dbrg
27
OWA Operator- Cont.
  • The OWA weight of each document is computed by
    $OWA(x_1, \dots, x_n) = W^T B$
  • $W^T$ is the transpose of the weight vector W, which
    defines the semantics associated with the
    OWA operator
  • $B = (b_1, b_2, \dots, b_n)$ is the vector
    $X = (x_1, x_2, \dots, x_n)$ reordered so that $b_j = \mathrm{Min}_j(x_1, x_2, \dots, x_n)$,
    that is, the j-th smallest element of $x_1, x_2, \dots, x_n$.
  • We used a simple function to bring the scores
    ($x_i$, $i = 1, \dots, n$) onto the same scale
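The slides do not say which "simple function" was used. Min-max scaling of each engine's scores into [0, 1] is one common choice and is assumed in the sketch below; the function name is hypothetical. One caveat with this choice: the lowest-scored retrieved document of each list is mapped to 0, the same value a missing document receives ($x_i = 0$), so a variant mapping into a strictly positive range may be preferable when scores feed an "at least k lists" style of weighting.

```python
def min_max_normalize(scores):
    """ASSUMED scaling: map one engine's {doc: raw score} into [0, 1].

    Works for any real-valued scores, e.g. negative log-likelihoods from a
    language model or inner products from a vector space model."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as top-scored
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

print(min_max_normalize({"a": 10, "b": 5, "c": 0}))
```

Applying it separately to each engine's list puts LM and vector space scores on a common scale before the OWA merge.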