FuFaIR: a Fuzzy Farsi Information Retrieval System

1
  • FuFaIR: a Fuzzy Farsi Information Retrieval
    System
  • Amir Nayyeri
  • School of Electrical and Computer Engineering
  • University of Tehran
  • Farhad Oroumchian
  • University of Wollongong in Dubai

2
Overview
  • Persian Language
  • Related Work
  • Fuzzy IR
  • Farsi IR
  • FuFaIR Explanation
  • Experimental Results
  • Conclusion and Future Work

3
Persian Language
  • Spoken in several countries (Iran, Afghanistan,
    Tajikistan)
  • The language has evolved over the years and has
    been influenced by many other languages
  • Contains foreign words from many languages such
    as Arabic, Turkish, French and English
  • In some cases these words still follow the
    grammatical rules of their original languages,
    for example
  • Maktab مکتب (singular) → MAKATEB مکاتب
    (plural)
  • In other cases these words can follow the
    grammatical rules of both languages, e.g.
  • Khabar خبر (singular) →
  • AKHBAR اخبار (Arabic plural)
  • KHABAR-HA خبرها (Persian plural)
  • Morphological analyzers for this language need to
    deal with many forms of words
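As a minimal sketch of what handling "many forms" can mean for the plurals above, the snippet conflates both plural forms with their singular. The lookup table, the function name `normalize_plural` and the naive stripping of the Persian plural suffix ها (-ha) are illustrative assumptions, not the analyzer used in this work. A real analyzer needs a much larger broken-plural lexicon and must avoid stripping words that merely end in ها.

```python
# Hypothetical sketch: conflating Arabic "broken" plurals and Persian -ha plurals.

# Illustrative lookup table of Arabic broken plurals -> singular (assumption).
BROKEN_PLURALS = {
    "مکاتب": "مکتب",  # MAKATEB -> Maktab
    "اخبار": "خبر",   # AKHBAR -> Khabar
}

PERSIAN_PLURAL_SUFFIX = "ها"  # the Persian plural suffix -ha
ZWNJ = "\u200c"               # zero-width non-joiner, often written before -ha


def normalize_plural(word: str) -> str:
    """Map a plural form to its singular (naive, for illustration only)."""
    # Broken plurals change the internal pattern of the word,
    # so they can only be resolved with a lexicon.
    if word in BROKEN_PLURALS:
        return BROKEN_PLURALS[word]
    # Persian plurals append -ha; strip it only if a plausible stem remains.
    if word.endswith(PERSIAN_PLURAL_SUFFIX) and len(word) > len(PERSIAN_PLURAL_SUFFIX) + 1:
        return word[: -len(PERSIAN_PLURAL_SUFFIX)].rstrip(ZWNJ)
    return word


for form in ["اخبار", "خبرها", "خبر\u200cها", "مکاتب"]:
    print(form, "->", normalize_plural(form))
```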

4
Information Retrieval and Natural Language
Processing for Persian (Farsi)
  • The Faculty of Engineering of the University of
    Tehran started working on Persian language
    processing about 7 years ago.
  • For the last 3 years, this work has been a joint
    effort between the University of Tehran (UT) and
    the University of Wollongong in Dubai (UOWD).
  • Since then, several thousand experiments on the
    processing and retrieval of Persian text have
    been performed.

5
Test Collections
  • Qavanin Collection
  • Documents: Iranian law collection
  • 177,089 passages
  • 41 queries with relevance judgments
  • Hamshahri Collection
  • Documents: 300 MB of news from the Hamshahri
    newspaper
  • Part-of-Speech Tagging Collection
  • A tag set of 40 tags
  • 2,590,000 tagged words

6
Natural Language Processing
  • Investigating automatic part-of-speech tagging
    based on machine learning approaches:
  • Probabilistic (Hidden Markov Model)
  • Rule based
  • Entropy based
  • Neural networks
  • The best tagger so far has reached 96% accuracy.

7
Information Retrieval Experiments
  • All major retrieval models used for English text
    retrieval, and combinations of them, have been
    tested, e.g.
  • Fuzzy logic
  • MMM, Paice
  • Vector space
  • Probabilistic
  • BM25
  • N-grams
  • N2, N3, N4
  • Combinational
  • Each with many different term weighting schemes.
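Assuming N2, N3 and N4 denote overlapping character n-grams of length 2, 3 and 4 (the slide does not say whether character or word n-grams are meant), index terms for such runs could be generated as in the sketch below. The function name `char_ngrams` and the choice to keep words shorter than n whole are illustrative assumptions.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams of every whitespace-separated word.

    Words of length <= n are kept whole (an assumption of this sketch).
    """
    grams: list[str] = []
    for word in text.split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# N2, N3 and N4 index terms for a transliterated example word.
for n in (2, 3, 4):
    print(f"N{n}:", char_ngrams("khabar", n))
```

The same function works unchanged on Persian script, since Python slices strings by code point.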

8
List of Weights that produced the best results
9
10
The context of the current work
  • Improving the quality of Persian retrieval
  • Improving IR systems that used Fuzzy Logic as
    their retrieval model

11
Related Work Fuzzy IR
  • Fuzzy logic has been used in IR since the early
    days.
  • But only a few fuzzy approaches have shown
    superiority over classical approaches such as the
    vector space model.
  • This has also been confirmed for the Persian
    language.
  • The current work has been mostly inspired by one
    of them:
  • D.E. Losada, F.D. Hermida, A. Bugarin, S. Barro.
    Experiments on using fuzzy quantified sentences
    in adhoc retrieval. ACM Symposium on Applied
    Computing, 2004.

12
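Mixed Min and Max (MMM)
Calculates the degree of membership of a document in the fuzzy set of the query terms as follows.
OR query, e.g. (Guardian OR Godparent):
$$Q_{or} = (A_1 \lor A_2 \lor A_3 \lor \ldots)$$
$$\mathrm{SIM}(Q_{or}, D) = C_{or1} \max(d_{A_1}, d_{A_2}, \ldots) + C_{or2} \min(d_{A_1}, d_{A_2}, \ldots)$$
AND query, e.g. (Registration AND Properties):
$$Q_{and} = (A_1 \land A_2 \land A_3 \land \ldots)$$
$$\mathrm{SIM}(Q_{and}, D) = C_{and1} \min(d_{A_1}, d_{A_2}, \ldots) + C_{and2} \max(d_{A_1}, d_{A_2}, \ldots)$$
where $d_{A_i}$ is the weight of term $A_i$ in document $D$, and $C_{and}$, $C_{or}$ are softness coefficients with
$$C_{and1} \in [0.5, 0.8], \quad C_{and2} = 1 - C_{and1}, \quad C_{or1} > 0.2, \quad C_{or2} = 1 - C_{or1}$$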
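The two MMM formulas translate directly into code. In this sketch the term weights $d_{A_i}$ are assumed to be precomputed and to lie in [0, 1]; the default coefficients (C_or1 = 0.7, C_and1 = 0.6) are arbitrary picks inside the ranges given above, and the function names are hypothetical.

```python
def mmm_or(weights: list[float], c_or1: float = 0.7) -> float:
    """SIM(Q_or, D) = C_or1 * max(d_Ai) + C_or2 * min(d_Ai), with C_or2 = 1 - C_or1."""
    if not 0.2 < c_or1 <= 1.0:
        raise ValueError("C_or1 must lie in (0.2, 1]")
    return c_or1 * max(weights) + (1.0 - c_or1) * min(weights)


def mmm_and(weights: list[float], c_and1: float = 0.6) -> float:
    """SIM(Q_and, D) = C_and1 * min(d_Ai) + C_and2 * max(d_Ai), with C_and2 = 1 - C_and1."""
    if not 0.5 <= c_and1 <= 0.8:
        raise ValueError("C_and1 must lie in [0.5, 0.8]")
    return c_and1 * min(weights) + (1.0 - c_and1) * max(weights)


# A document matching one query term strongly (0.9) and the other weakly (0.1).
d = [0.9, 0.1]
print(mmm_or(d))   # about 0.66
print(mmm_and(d))  # about 0.42
```

Unlike a strict fuzzy AND (plain min, which would give 0.1 here), MMM lets the strongly matching term still contribute to the AND score.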
13
Paice Model
Calculates the degree of membership of a document in the fuzzy set of the query terms as follows.
AND query, e.g. (Registration AND Properties): $Q_{and} = (A_1 \land A_2 \land A_3 \land \ldots)$
OR query, e.g. (Guardian OR Godparent): $Q_{or} = (A_1 \lor A_2 \lor A_3 \lor \ldots)$
$$\mathrm{SIM}(Q, D) = \frac{\sum_{i=1}^{n} r^{i-1} d_i}{\sum_{i=1}^{n} r^{i-1}}$$
where $d_i$ is the weight of the $i$-th query term in $D$ and $n$ is the number of query terms; $r = 1.0$ for AND queries ($d_i$ in ascending order) and $r = 0.7$ for OR queries ($d_i$ in descending order).
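A minimal sketch of the Paice similarity with r = 1.0 for AND queries and r = 0.7 for OR queries, as stated above. Term weights are again assumed precomputed, and the function name `paice_sim` is hypothetical. With r = 1.0 the formula reduces to the plain average of the weights, so the sort order only matters for OR queries.

```python
def paice_sim(weights: list[float], is_and: bool,
              r_and: float = 1.0, r_or: float = 0.7) -> float:
    """SIM(Q, D) = sum_i r^(i-1) * d_i / sum_i r^(i-1).

    d_i is sorted ascending for AND queries and descending for OR queries.
    """
    r = r_and if is_and else r_or
    ordered = sorted(weights, reverse=not is_and)
    # enumerate starts at 0, which plays the role of the exponent i - 1 for i = 1.
    numerator = sum(r ** i * d for i, d in enumerate(ordered))
    denominator = sum(r ** i for i in range(len(ordered)))
    return numerator / denominator


d = [0.1, 0.9]
print(paice_sim(d, is_and=True))   # 0.5, the plain average
print(paice_sim(d, is_and=False))  # (0.9 + 0.7 * 0.1) / (1 + 0.7), about 0.571
```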
14
Comparison of Fuzzy Systems
Experiments on Qavanin Collection
15
Probabilistic Systems (BM25)
Experiments on Qavanin Collection
16
Comparison of Vector Space Systems With BM25
Experiments on Qavanin Collection
17
Comparison of Best Vector Space With Best N-grams
Experiments on Qavanin Collection
18
FuFaIR
  • The query is considered as a fuzzy set of
    relevant documents in the database
  • The documents are sent to the client sorted by
    their degree of membership in the query's fuzzy
    set
  • The larger the value of μ_i (the membership
    degree of document i), the more relevant the
    document is to the query

19
FuFaIR (Cont.)
  • Each term is assigned a membership degree to a
    document based on the importance of that term for
    representing the document's content.
  • The membership degree can be computed with
    classical IR parameters such as tf and idf
  • The input query is considered as an algebraic
    expression whose elements are:
  • Terms
  • Fuzzy operators such as AND, OR, and NOT
  • Applying the operators to the terms yields the
    final fuzzy set
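To illustrate the evaluation scheme described above, the sketch scores each document against a query tree with the standard fuzzy operators (AND = min, OR = max, NOT = 1 - x) and ranks documents by the resulting membership degree μ_i. The membership function here (f_{t,d} · idf(t), scaled by the largest such weight in the same document) is only an assumption standing in for the FuFaIR definition; the query representation and all function names are hypothetical, and fuzzy quantifiers are not modelled.

```python
import math

# Hypothetical representation: a query is a term (str) or a tuple
# (operator, operand, ...), e.g. ("AND", "music", ("NOT", "cancer")).
Docs = dict[str, dict[str, int]]  # doc id -> {term: frequency}


def idf(term: str, docs: Docs) -> float:
    """log(N / df); 0 for terms that occur in no document."""
    df = sum(1 for tf in docs.values() if term in tf)
    return math.log(len(docs) / df) if df else 0.0


def membership(term: str, doc_tf: dict[str, int], docs: Docs) -> float:
    """ASSUMED membership degree in [0, 1]: f_{t,d} * idf(t),
    divided by the largest such weight in the same document."""
    weights = {t: f * idf(t, docs) for t, f in doc_tf.items()}
    top = max(weights.values(), default=0.0)
    return weights.get(term, 0.0) / top if top > 0 else 0.0


def evaluate(query, doc_tf: dict[str, int], docs: Docs) -> float:
    """Apply fuzzy AND / OR / NOT recursively to the term memberships."""
    if isinstance(query, str):
        return membership(query, doc_tf, docs)
    op, *operands = query
    values = [evaluate(q, doc_tf, docs) for q in operands]
    if op == "AND":
        return min(values)
    if op == "OR":
        return max(values)
    if op == "NOT":
        return 1.0 - values[0]
    raise ValueError(f"unknown operator: {op!r}")


def rank(query, docs: Docs) -> list[tuple[str, float]]:
    """Documents sorted by their degree of membership to the query's fuzzy set."""
    scores = {doc_id: evaluate(query, tf, docs) for doc_id, tf in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": {"music": 3, "iran": 1, "concert": 1},
    "d2": {"music": 1, "cancer": 2},
    "d3": {"cancer": 4, "treatment": 2},
}
print(rank(("AND", "music", ("NOT", "cancer")), docs))
```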

20
FuFaIR (Cont.)
  • The membership degree of a document to an
    individual term is defined as follows in our
    method

where f_{t,d} is the frequency of term t in document d, and idf(t) is the inverse document frequency of term t.
21
Overview
  • Persian Language
  • Related Work
  • Fuzzy IR
  • Farsi IR
  • Fuzzy Logic Overview
  • FuFaIR Explanation
  • Experimental Results
  • Conclusion and Future Work

22
Experimental Results
  • Parameters
  • The Hamshahri corpus has been used
  • Total size of the collection: 300 MB
  • Indexing has been performed after stop word
    elimination
  • No stemming has been applied
  • 30 queries have been used for these experiments
  • Precision has been computed for top 20 retrieved
    documents.
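Precision at a cut-off k is the fraction of the top k retrieved documents that are relevant. A minimal sketch, assuming each run is given as a ranked list of document ids plus the set of ids judged relevant for that query (hypothetical data layout and function names). Following the usual convention, the divisor is k even when fewer than k documents were retrieved.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at(runs: list[tuple[list[str], set[str]]],
                      cutoffs: tuple[int, ...] = (20,)) -> dict[int, float]:
    """For each cut-off k, P@k averaged over all queries."""
    return {
        k: sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
        for k in cutoffs
    }


runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}),  # query 1
    (["d7", "d8"], {"d8"}),                           # query 2
]
print(mean_precision_at(runs, cutoffs=(2, 5)))
```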

23
Experimental Results (Cont.)
  • Some Sample Queries

24
Experimental Results (Cont.)
  • As a baseline, the best Persian retrieval model
    so far was selected: the vector space model with
    the Lnu-ltu weighting scheme.
  • The pivot and slope parameters were set to 13.36
    and 0.75, respectively.
  • The effectiveness of these values was shown in
    previous work (see the paper).
  • To measure the performance of each run,
    precision at the 5, 10, 15 and 20 document
    cut-offs was computed and averaged over all 30
    queries.
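In the usual SMART definition, the Lnu document weight divides a log-tf factor by the pivoted unique normalization (1 - slope) · pivot + slope · (number of unique terms in the document). A minimal sketch using the pivot and slope values above; the function name is hypothetical, natural logarithms are assumed, and the query-side ltu weights are not shown.

```python
import math
from collections import Counter


def lnu_weights(tokens: list[str], pivot: float = 13.36,
                slope: float = 0.75) -> dict[str, float]:
    """Lnu weights for one non-empty document (pivoted unique normalization).

    L: (1 + ln tf) / (1 + ln average_tf)
    n: no idf factor on the document side
    u: divide by (1 - slope) * pivot + slope * number_of_unique_terms
    """
    tf = Counter(tokens)
    average_tf = sum(tf.values()) / len(tf)
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {
        term: (1.0 + math.log(f)) / (1.0 + math.log(average_tf)) / norm
        for term, f in tf.items()
    }


print(lnu_weights(["music", "music", "concert"]))
```

Documents with more unique terms than the pivot get a larger normalizer and therefore smaller weights, which is the length correction the slope controls.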

25
Experimental Results (Cont.)
  • Comparison Results

26
Conclusion and Future Work
  • Conclusion
  • Main contribution of this paper
  • Design, implementation and testing of FuFaIR, a
    fuzzy retrieval system for the Persian language
  • Fuzzy quantifiers are also added to the original
    model to provide more flexibility
  • FuFaIR performs significantly better than the
    vector space model
  • Future Work
  • Testing different interpretations of the fuzzy
    operators on the Persian corpora
  • Examining the true value and contribution of a
    Persian stemmer in retrieval

27
  • Questions ?

28
Conception of Fuzzy Logic
  • Many decision-making and problem-solving tasks
    are too complex to be defined precisely
  • However, people succeed by using imprecise
    knowledge
  • Fuzzy logic resembles human reasoning in its use
    of approximate information and uncertainty to
    generate decisions.

29
Natural Language
  • Consider
  • Joe is tall -- what is tall?
  • Joe is very tall -- how does this differ from
    tall?
  • Natural language (like most other activities in
    life and indeed the universe) is not easily
    translated into the absolute terms of 0 and 1.

30
Fuzzy Logic
  • An approach to uncertainty that combines real
    values in [0, 1] with logic operations
  • Fuzzy logic is based on the ideas of fuzzy set
    theory and fuzzy set membership often found in
    natural (e.g., spoken) language.

31
Example Young
  • Example
  • Ann is 28, 0.8 in set Young
  • Bob is 35, 0.1 in set Young
  • Charlie is 23, 1.0 in set Young
  • Unlike statistics and probabilities, the degree
    does not describe the probability that the item
    is in the set; instead, it describes to what
    extent the item is in the set.

32
Membership functions in fuzzy logic
[Figure "Fuzzy values": degree of membership (DOM, 0 to 1) versus age for the fuzzy sets Young, Middle and Old, with age ticks at 25, 40 and 55]
Fuzzy values have associated degrees of membership in the set.
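The three curves can be approximated by piecewise-linear membership functions. In this sketch the breakpoints 25, 40 and 55 are taken from the figure's age ticks, while the linear shapes and function names are assumptions. With these shapes Ann (28) gets 0.8 in Young and Charlie (23) gets 1.0, matching the earlier example, but Bob (35) would get about 0.33 rather than 0.1, so the slide's curve for Young is evidently not exactly this one.

```python
def young(age: float) -> float:
    """1 up to 25, falling linearly to 0 at 40 (assumed shape)."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15


def middle(age: float) -> float:
    """Triangle: 0 at 25, rising to 1 at 40, back to 0 at 55 (assumed shape)."""
    if age <= 25 or age >= 55:
        return 0.0
    if age <= 40:
        return (age - 25) / 15
    return (55 - age) / 15


def old(age: float) -> float:
    """0 up to 40, rising linearly to 1 at 55 (assumed shape)."""
    if age <= 40:
        return 0.0
    if age >= 55:
        return 1.0
    return (age - 40) / 15


for age in (23, 28, 35, 47, 60):
    print(age, round(young(age), 2), round(middle(age), 2), round(old(age), 2))
```

Because neighbouring curves overlap, an age such as 30 belongs partly to Young (about 0.67) and partly to Middle (about 0.33), which is the gradual switch described next.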
33
Benefits of fuzzy logic
  • You want the value to switch gradually as Young
    becomes Middle and Middle becomes Old. This is
    the idea of fuzzy logic.

34
Fuzzy Set Operations
  • Fuzzy OR (∪): the union of two fuzzy sets is the
    element-wise maximum (MAX) of the two sets.
  • E.g.
  • A = {1.0, 0.20, 0.75}
  • B = {0.2, 0.45, 0.50}
  • A ∪ B = {MAX(1.0, 0.2), MAX(0.20, 0.45),
    MAX(0.75, 0.50)}
  • = {1.0, 0.45, 0.75}

35
Fuzzy Set Operations
  • Fuzzy AND (∩): the intersection of two fuzzy sets
    is just the element-wise MIN of the two sets.
  • E.g.
  • A ∩ B = {MIN(1.0, 0.2), MIN(0.20, 0.45),
    MIN(0.75, 0.50)} = {0.2, 0.20, 0.50}

36
Fuzzy Set Operations
  • The complement of a fuzzy variable with DOM x is
    (1 - x).
  • Complement: the complement of a fuzzy set is
    composed of the complements of all its elements.
  • Example:
  • A^c = {1 - 1.0, 1 - 0.2, 1 - 0.75} = {0.0, 0.8,
    0.25}
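The three fuzzy set operations (MAX for union, MIN for intersection, 1 - x for complement) work element-wise over sets represented as equal-length lists of membership degrees. A minimal sketch; the list representation and function names are assumptions.

```python
def _check_same_length(a: list[float], b: list[float]) -> None:
    if len(a) != len(b):
        raise ValueError("fuzzy sets must be defined over the same elements")


def fuzzy_union(a: list[float], b: list[float]) -> list[float]:
    """Fuzzy OR: element-wise MAX."""
    _check_same_length(a, b)
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_intersection(a: list[float], b: list[float]) -> list[float]:
    """Fuzzy AND: element-wise MIN."""
    _check_same_length(a, b)
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_complement(a: list[float]) -> list[float]:
    """Fuzzy NOT: 1 - x for every degree of membership x."""
    return [1.0 - x for x in a]


A = [1.0, 0.20, 0.75]
B = [0.2, 0.45, 0.50]
print(fuzzy_union(A, B))         # [1.0, 0.45, 0.75]
print(fuzzy_intersection(A, B))  # [0.2, 0.2, 0.5]
print(fuzzy_complement(A))
```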