Title: FuFaIR: a Fuzzy Farsi Information Retrieval System
1- FuFaIR: a Fuzzy Farsi Information Retrieval System
- Amir Nayyeri
- School of Electrical and Computer Engineering
- University of Tehran
- Farhad Oroumchian
- University of Wollongong in Dubai
2Overview
- Persian Language
- Related Work
- Fuzzy IR
- Farsi IR
- FuFaIR Explanation
- Experimental Results
- Conclusion and Future Work
3Persian Language
- Spoken in several countries (Iran, Afghanistan, Tajikistan)
- The language has evolved over the years and has been influenced by many languages
- Contains foreign words from many languages, such as Arabic, Turkish, French, and English
- In some cases these words still follow the grammatical rules of their original languages, for example:
- Maktab مکتب (singular) → MAKATEB مکاتب (plural)
- In some cases these words can take the grammatical rules of both languages, e.g.:
- Khabar خبر (singular) →
- AKHBAR اخبار (Arabic plural)
- KHABAR-HA خبرها (Persian plural)
- Morphological analyzers for this language need to deal with many forms of words
4Information Retrieval and Natural Language
Processing for Persian (Farsi)
- The Faculty of Engineering of the University of Tehran (UT) started working on the processing of Persian about 7 years ago.
- For the past 3 years, this has been a joint effort between UT and the University of Wollongong in Dubai (UOWD).
- Since then, several thousand experiments on the processing and retrieval of Persian text have been performed.
5Test Collections
- Qavanin Collection
- Documents: Iranian law collection
- 177,089 passages
- 41 queries and relevance judgments
- Hamshahri Collection
- Documents: 300 MB of news from the Hamshahri newspaper
- Part-of-Speech Tagging Collection
- A tag set of 40 tags
- 2,590,000 tagged words
6Natural Language Processing
- Investigating automatic part-of-speech tagging based on machine learning approaches
- Probabilistic (Hidden Markov Model)
- Rule based
- Entropy based
- Neural networks
- The best so far has reached 96% accuracy.
7Information Retrieval Experiments
- All major retrieval models for English text retrieval, and their combinations, have been tested:
- Fuzzy Logic
- MMM, Paice
- Vector Space
- Probabilistic
- BM25
- N-Grams
- N = 2, N = 3, N = 4
- Combinational
- With many different term weighting schemes.
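As a rough illustration of the N-gram entry in the list above, the sketch below splits a word into overlapping n-grams for N = 2, 3, and 4. It assumes character-level n-grams, which the slide does not state explicitly; the function name `char_ngrams` and the transliterated sample word are hypothetical.

```python
from typing import List


def char_ngrams(word: str, n: int) -> List[str]:
    """Return the overlapping character n-grams of a word (assumed character-level)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A word shorter than n yields no n-grams.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Index the same word at the three sizes listed on the slide.
for n in (2, 3, 4):
    print(n, char_ngrams("khabar", n))
```

Indexing on such fragments is one way to match related word forms without a stemmer, which may be relevant given the many word forms described on the Persian Language slide.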
8List of Weights that produced the best results
Best
9Best
10The context of the current work
- Improving the quality of Persian retrieval
- Improving IR systems that use fuzzy logic as their retrieval model
11Related Work Fuzzy IR
- Fuzzy logic has been used in IR since its early days.
- But only a few fuzzy systems have shown superiority over classical approaches such as vector space.
- This has been confirmed for the Persian language as well.
- The current work was mostly inspired by one of them:
- D.E. Losada, F.D. Hermida, A. Bugarin, S. Barro. Experiments on using fuzzy quantified sentences in adhoc retrieval. ACM Symposium on Applied Computing, 2004.
12Mixed Min Max MMM
Calculates the degree of membership of a document to the fuzzy set of the terms in the query, as below.
OR query, e.g. (Guardian OR Godparent):
$Q_{or} = (A_1 \text{ OR } A_2 \text{ OR } A_3 \text{ OR } \dots)$
$SIM(Q_{or}, D) = C_{or1} \cdot \max(d_{A_1}, d_{A_2}, \dots) + C_{or2} \cdot \min(d_{A_1}, d_{A_2}, \dots)$
AND query, e.g. (Registration AND Properties):
$Q_{and} = (A_1 \text{ AND } A_2 \text{ AND } A_3 \text{ AND } \dots)$
$SIM(Q_{and}, D) = C_{and1} \cdot \min(d_{A_1}, d_{A_2}, \dots) + C_{and2} \cdot \max(d_{A_1}, d_{A_2}, \dots)$
$C_{and}$, $C_{or}$: softness coefficients
$C_{and1} \in [0.5, 0.8]$, $C_{and2} = 1 - C_{and1}$
$C_{or1} > 0.2$, $C_{or2} = 1 - C_{or1}$
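The MMM similarity can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: the function names, the default coefficients (0.6 and 0.7, chosen only to fall inside the ranges given on the slide), and the sample weights are all assumptions.

```python
from typing import Sequence


def mmm_or(weights: Sequence[float], c_or1: float = 0.6) -> float:
    """MMM similarity for an OR query: C_or1 * max + C_or2 * min."""
    c_or2 = 1.0 - c_or1  # the two softness coefficients sum to 1
    return c_or1 * max(weights) + c_or2 * min(weights)


def mmm_and(weights: Sequence[float], c_and1: float = 0.7) -> float:
    """MMM similarity for an AND query: C_and1 * min + C_and2 * max."""
    c_and2 = 1.0 - c_and1
    return c_and1 * min(weights) + c_and2 * max(weights)


# Hypothetical membership degrees d_A1, d_A2, d_A3 of one document.
doc_weights = [0.9, 0.4, 0.1]
print("OR :", mmm_or(doc_weights))
print("AND:", mmm_and(doc_weights))
```

With a coefficient of 1 the two functions reduce to the plain fuzzy max and min; smaller values soften the operators by letting the other extreme contribute.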
13Paice Model
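Calculates the degree of membership of a document to the fuzzy set of terms in the query, as below.
AND query, e.g. (Registration AND Properties): $Q_{and} = (A_1 \text{ and } A_2 \text{ and } A_3 \text{ and } \dots)$
OR query, e.g. (Guardian OR Godparent): $Q_{or} = (A_1 \text{ or } A_2 \text{ or } A_3 \text{ or } \dots)$
$SIM(Q, D) = \frac{\sum_{i=1}^{n} r^{i-1} \, td_i}{\sum_{i=1}^{n} r^{i-1}}$
where $td_i$ is the document's weight for the $i$-th query term and $n$ is the number of query terms.
$r = 1.0$ for AND queries ($td_i$ in ascending order)
$r = 0.7$ for OR queries ($td_i$ in descending order)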
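A minimal sketch of the Paice similarity, using the r values and sort orders given on the slide. The function names and the sample weights are hypothetical; this is an illustration of the formula, not the code used in the experiments.

```python
from typing import Sequence


def paice_similarity(weights: Sequence[float], r: float, descending: bool) -> float:
    """Weighted average of sorted term weights with coefficients r**(i-1)."""
    if not weights:
        raise ValueError("at least one term weight is required")
    ordered = sorted(weights, reverse=descending)
    coeffs = [r ** i for i in range(len(ordered))]  # r^0, r^1, r^2, ...
    return sum(c * w for c, w in zip(coeffs, ordered)) / sum(coeffs)


def paice_and(weights: Sequence[float], r: float = 1.0) -> float:
    """AND query: weights taken in ascending order."""
    return paice_similarity(weights, r, descending=False)


def paice_or(weights: Sequence[float], r: float = 0.7) -> float:
    """OR query: weights taken in descending order."""
    return paice_similarity(weights, r, descending=True)


doc_weights = [0.9, 0.4, 0.1]  # hypothetical td_i values
print("AND:", paice_and(doc_weights))
print("OR :", paice_or(doc_weights))
```

Note that with r = 1.0 every coefficient equals 1, so the AND score is simply the mean of the term weights and the sort order has no effect.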
14Comparison of Fuzzy Systems
Experiments on Qavanin Collection
15Probabilistic Systems (BM25)
Experiments on Qavanin Collection
16Comparison of Vector Space Systems With BM25
Experiments on Qavanin Collection
17Comparison of Best Vector Space With Best N-grams
Experiments on Qavanin Collection
18FuFaIR
- The query is considered as a fuzzy set of relevant documents in the database
- The documents are sent to the client sorted by their degree of membership in the query's fuzzy set
- The larger the value of µi, the more relevant the document is to the query
19FuFaIR (Cont.)
- Each term is assigned a membership degree to a document based on the importance of that term for representing the document's content.
- The membership degree can be computed with classical IR parameters such as tf/idf
- The input query is considered as an algebraic sentence whose elements are:
- Terms
- Fuzzy operators such as AND, OR, and NOT
- Applying the operators to the terms yields the final fuzzy set
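The idea of treating a query as an algebraic sentence over terms and fuzzy operators can be sketched as follows. This assumes the classical interpretation of the operators (AND = min, OR = max, NOT = 1 - x); the slides do not spell out FuFaIR's own operator definitions, and the nested-tuple query format, function names, and membership values are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple, Union

Query = Union[str, tuple]  # a term, or (operator, operand, operand, ...)


def evaluate(query: Query, memberships: Mapping[str, float]) -> float:
    """Degree of membership of one document in the query's fuzzy set."""
    if isinstance(query, str):
        return memberships.get(query, 0.0)  # absent term: membership 0
    op, *operands = query
    values = [evaluate(q, memberships) for q in operands]
    if op == "AND":
        return min(values)
    if op == "OR":
        return max(values)
    if op == "NOT" and len(values) == 1:
        return 1.0 - values[0]
    raise ValueError(f"unsupported operator or arity: {op!r}")


def rank(query: Query, docs: Mapping[str, Mapping[str, float]]) -> List[Tuple[str, float]]:
    """Sort documents by decreasing membership in the query's fuzzy set."""
    scores: Dict[str, float] = {d: evaluate(query, m) for d, m in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical term memberships for three documents.
docs = {
    "d1": {"registration": 0.8, "properties": 0.3},
    "d2": {"registration": 0.5, "properties": 0.6},
    "d3": {"registration": 0.9},
}
print(rank(("AND", "registration", "properties"), docs))
```

Swapping in softer operators (such as the MMM or Paice forms) only requires changing the three operator branches in `evaluate`.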
20FuFaIR (Cont.)
- The membership degree of a document to an individual term is defined in our method from the following quantities:
- $f_{t,d}$: frequency of term t in document d
- idf(t): inverse document frequency of term t
21Overview
- Persian Language
- Related Work
- Fuzzy IR
- Farsi IR
- Fuzzy Logic Overview
- FuFaIR Explanation
- Experimental Results
- Conclusion and Future Work
22Experimental Results
- Parameters
- The Hamshahri corpus has been used
- Total size of the collection: 300 MB
- Indexing has been performed after stop word elimination
- No stemming has been applied
- 30 queries have been used for these experiments
- Precision has been computed for the top 20 retrieved documents.
23Experimental Results (Cont.)
24Experimental Results (Cont.)
- As a benchmark, the best Persian retrieval model so far has been selected: the Vector Space model with the Lnu-ltu weighting scheme.
- The pivot and slope parameters have been set to 13.36 and 0.75, respectively
- The effectiveness of these values has been shown by previous work (see paper).
- To measure the performance of each run, the precision at the 5, 10, 15, and 20 document cut-offs has been calculated and averaged over all 30 queries.
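The evaluation measure described above can be sketched as follows. This assumes that the precision at each cut-off is averaged across queries separately; the function names, document identifiers, and relevance sets are hypothetical, and only two queries are shown instead of 30.

```python
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_cutoffs(
    runs: List[Tuple[Sequence[str], Set[str]]],
    cutoffs: Sequence[int] = (5, 10, 15, 20),
) -> Dict[int, float]:
    """Average precision@k over all queries, for each cut-off k."""
    return {
        k: sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
        for k in cutoffs
    }


# Two hypothetical queries, each with a ranked list of 20 documents.
ranking = [f"d{i}" for i in range(1, 21)]
runs = [(ranking, {"d1", "d3", "d6"}), (ranking, set())]
print(mean_precision_at_cutoffs(runs))
```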
25Experimental Results (Cont.)
26Conclusion and Future Work
- Conclusion
- Main contribution of this paper:
- Design, implementation, and testing of FuFaIR, a fuzzy retrieval system for the Persian language.
- Fuzzy quantifiers are also added to the original model to provide more flexibility
- In comparison with Vector Space, FuFaIR shows significantly better performance
- Future Work
- Testing different interpretations of the fuzzy operators on the Persian corpora
- Examining the true value and contribution of a Persian stemmer in retrieval.
27 28Conception of Fuzzy Logic
- Many decision-making and problem-solving tasks are too complex to be defined precisely
- However, people succeed by using imprecise knowledge
- Fuzzy logic resembles human reasoning in its use of approximate information and uncertainty to generate decisions.
29Natural Language
- Consider:
- Joe is tall -- what is tall?
- Joe is very tall -- how does this differ from tall?
- Natural language (like most other activities in life and indeed the universe) is not easily translated into the absolute terms of 0 and 1.
30Fuzzy Logic
- An approach to uncertainty that combines real values in [0, 1] with logic operations
- Fuzzy logic is based on the ideas of fuzzy set theory and fuzzy set membership often found in natural (e.g., spoken) language.
31Example Young
- Example
- Ann is 28, 0.8 in set Young
- Bob is 35, 0.1 in set Young
- Charlie is 23, 1.0 in set Young
- Unlike statistics and probabilities, the degree does not describe the probability that the item is in the set, but instead describes to what extent the item is in the set.
32Membership function of fuzzy logic
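[Figure: membership functions of the fuzzy sets Young, Middle, and Old. Vertical axis: DOM (degree of membership), marked at 0, 0.5, and 1; horizontal axis: Age, marked at 25, 40, and 55.]
Fuzzy values have associated degrees of membership in the set.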
33Benefits of fuzzy logic
- You want the value to switch gradually as Young
becomes Middle and Middle becomes Old. This is
the idea of fuzzy logic.
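The gradual switch between Young, Middle, and Old can be sketched with piecewise-linear membership functions. The breakpoints 25, 40, and 55 come from the age axis of the membership-function slide; the linear shape between them is an assumption, since the exact curves are not recoverable from the transcript, and the function names are hypothetical.

```python
def young(age: float) -> float:
    """Fully Young up to 25, then falling linearly to 0 at 40 (assumed shape)."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15


def middle(age: float) -> float:
    """Rising from 25 to a peak at 40, then falling to 0 at 55 (assumed shape)."""
    if age <= 25 or age >= 55:
        return 0.0
    if age <= 40:
        return (age - 25) / 15
    return (55 - age) / 15


def old(age: float) -> float:
    """0 up to 40, then rising linearly to fully Old at 55 (assumed shape)."""
    if age <= 40:
        return 0.0
    if age >= 55:
        return 1.0
    return (age - 40) / 15


# Membership changes smoothly with age instead of jumping at a threshold.
for age in (20, 30, 40, 50, 60):
    print(age, young(age), middle(age), old(age))
```

Under these assumed shapes a 30-year-old is partly Young and partly Middle at the same time, which is the gradual transition the slide describes.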
34Fuzzy Set Operations
- Fuzzy OR (∪): the union of two fuzzy sets is the maximum (MAX) of each pair of corresponding elements from the two sets.
- E.g.
- A = {1.0, 0.20, 0.75}
- B = {0.2, 0.45, 0.50}
- A ∪ B = {MAX(1.0, 0.2), MAX(0.20, 0.45), MAX(0.75, 0.50)} = {1.0, 0.45, 0.75}
35Fuzzy Set Operations
- Fuzzy AND (∩): the intersection of two fuzzy sets is the minimum (MIN) of each pair of corresponding elements from the two sets.
- E.g.
- A ∩ B = {MIN(1.0, 0.2), MIN(0.20, 0.45), MIN(0.75, 0.50)} = {0.2, 0.20, 0.50}
36Fuzzy Set Operations
- The complement of a fuzzy variable with DOM x is (1 - x).
- Complement: the complement of a fuzzy set is composed of the complements of all its elements.
- Example:
- A^c = {1 - 1.0, 1 - 0.2, 1 - 0.75} = {0.0, 0.8, 0.25}
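The three fuzzy set operations can be checked with a short sketch that reuses the sets A and B from the slides. The function names are hypothetical; fuzzy sets are represented simply as lists of membership degrees in a fixed element order.

```python
from typing import List


def fuzzy_or(a: List[float], b: List[float]) -> List[float]:
    """Union: element-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_and(a: List[float], b: List[float]) -> List[float]:
    """Intersection: element-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_not(a: List[float]) -> List[float]:
    """Complement: 1 - x for every element."""
    return [1.0 - x for x in a]


A = [1.0, 0.20, 0.75]
B = [0.2, 0.45, 0.50]

print("A OR B :", fuzzy_or(A, B))
print("A AND B:", fuzzy_and(A, B))
print("NOT A  :", fuzzy_not(A))
```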