Title: N-Grams Conflation Approach for Arabic Text
1. N-Grams Conflation Approach for Arabic Text
- Farag Ahmed and Andreas Nürnberger
- Otto-von-Guericke-University of Magdeburg
- http://irgroup.cs.uni-magdeburg.de
- fahmed_at_iws.cs.uni-magdeburg.de
2. N-gram conflation technique
- Goal: Study a conflation method based on the n-gram approach, with some enhancements, and evaluate its performance on Arabic text.
- What is a conflation method? Matching non-identical words that refer to the same principal concept.
- Why is it important? It avoids a strong dependence on the exact wording of the user's query.
- The problem
- Document Information need
- Query Problem
Word form variations whose English translation contains the word "student" or "students", grouped into feminine and masculine forms (the Arabic word forms are not legible in this copy).
- Document (Arabic original not legible in this copy): Samsung released its new mobile xx-xx, which supports office and multimedia applications.
- Query (Arabic original not legible in this copy): What is the new mobile that Samsung released?
- Terms (given in Arabic): Phone, Mobile, Samsung
The query and the document have no term in common, so a traditional IR system will not
consider this document relevant.
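The mismatch can be illustrated with a minimal Python sketch of exact term matching. The English term lists are hypothetical stand-ins for the Arabic terms, chosen so that query and document express the same concepts with different surface forms; the function name `shared_terms` is also an illustrative assumption.

```python
def shared_terms(query_terms, document_terms):
    """Return the terms that occur verbatim in both the query and the document."""
    return set(query_terms) & set(document_terms)


# Hypothetical stand-ins: same concepts, different surface forms.
query_terms = ["phone", "application"]
document_terms = ["phones", "applications", "multimedia"]

overlap = shared_terms(query_terms, document_terms)
print(overlap)  # set()
```

Because the intersection is empty, a system that relies on exact term matches has no evidence that the document answers the query.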
3. Revised n-gram
Computing similarity scores based on n-grams: equation (1)
Computing similarity scores based on revised n-grams: equation (2)
The revised approach improves the similarity score measured between words such as
the Arabic words for "the Alliances" and "the Conqueror".
Figure 1. The similarity score using pure bigrams is 85.72.
Figure 2. The similarity score using revised bigrams is 28.57.
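As a rough illustration of bigram-based scoring, the sketch below computes a Dice coefficient over distinct character bigrams and uses it to conflate word variants. This is an assumption: the Dice coefficient is a common choice for n-gram similarity, but the equations from the slides are not available here, so the code may not match the authors' formula, and it does not implement their revised variant. The names `char_ngrams`, `dice_similarity`, and `conflate`, the 0.7 threshold, and the English sample words are all illustrative.

```python
def char_ngrams(word, n=2):
    """Return the set of distinct character n-grams in a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over distinct n-grams: 2 * |A & B| / (|A| + |B|)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = len(grams_a) + len(grams_b)
    if total == 0:
        # Both words are shorter than n, so there is nothing to compare.
        return 0.0
    return 2 * len(grams_a & grams_b) / total


def conflate(query, vocabulary, threshold=0.7):
    """Return vocabulary words whose similarity to the query meets the threshold."""
    return [w for w in vocabulary if dice_similarity(query, w) >= threshold]


print(round(dice_similarity("student", "students"), 3))       # 0.923
print(conflate("student", ["students", "study", "teacher"]))  # ['students']
```

The threshold is the main design choice in such a scheme: a lower value conflates more variants but also admits unrelated words that happen to share bigrams.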