Resolving Translation Ambiguity using Ontological Chains for Cross Language Information Retrieval

1
Resolving Translation
Ambiguity using Ontological Chains for Cross
Language Information Retrieval
  • Institute of Computer and Information Science
  • National Chiao Tung University
  • Student: Je-Wei Liang
  • Advisors: Dr. Hao-Ren Ke and Dr. Wei-Pang Yang

2
Outline
  • Introduction
  • Related Work
    • Query Translation
    • Word Sense Disambiguation
  • Ontology-based CLIR
    • Query Translation
    • Resolving Translation Ambiguity
    • Monolingual IR
  • Evaluation
    • Document Set
    • Topics
    • Relevance Assessment
    • Results
  • Conclusion
  • References

3
Introduction: What Is Cross-Language Information
Retrieval?
  • Enables users to query in one language and
    retrieve relevant documents in other languages.
  • Users need not know the exact translations of
    their queries.

4
Introduction: Why CLIR?
  • Chinese has more native speakers than any other
    language, but most web pages are in English.
  • With CLIR systems, Chinese speakers can retrieve
    English documents.

5
Introduction: Ambiguity
  • Ambiguity arises during segmentation, translation,
    and indexing in CLIR systems.
  • Our research focuses on resolving translation
    ambiguity.

6
Motivation and Objectives
  • Motivation
    • Queries are usually short.
      • e.g. ?
    • A word may have many possible translations.
      • e.g. ?? can be translated as "man" or
        "gentleman".
    • Lack of domain knowledge
      • e.g. "galleries" is related to "library".
  • Objectives
    • Design an ontology-based CLIR system.
    • Expand queries using WordNet.
    • Resolve translation ambiguity using ontological
      chains.

7
Related Work
8
Related Work: Improved Use of Contextual
Information in CLIR [Fung98]
  • ?? and "flu" occur in similar contexts.
  • News stories about ??? in Hong Kong.

9
Related Work: Improved Use of Contextual
Information in CLIR [Fung98]
10
Related Work: WSD Using Lexical Chains [Barzilay97]
  • Word meanings are represented by synonym sets
    (synsets).
  • Relations defined in WordNet:
    • Synonym / Antonym
    • Hypernym / Hyponym (the "is a kind of" relation)

11
Related Work: WSD Using Lexical Chains [Barzilay97]
  • A lexical chain is constructed in three steps:
    • Select a set of candidate words (nouns).
    • For each candidate word, find an appropriate
      chain, relying on a relatedness criterion among
      the members of the chains.
    • If such a chain is found, insert the word into
      it and update the chain accordingly.
  • Three kinds of relations are defined:
    • Extra-strong: synonym, 10 points
    • Strong: holonym, 7 points
    • Medium-strong: hypernym, 4 points
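The three-step chaining procedure above can be sketched as a greedy chainer. This is a simplified illustration under stated assumptions, not Barzilay and Elhadad's implementation: `RELATIONS` is a hypothetical hand-written table standing in for WordNet lookups, relations are treated as symmetric, and each word joins the single chain it scores highest against, using the slide's 10/7/4 points.

```python
from typing import Dict, FrozenSet, List, Tuple

# Relation scores from the slide: extra-strong 10, strong 7, medium-strong 4.
SCORES: Dict[str, int] = {"synonym": 10, "holonym": 7, "hypernym": 4}

# Hypothetical toy relation table standing in for WordNet (unordered pairs).
RELATIONS: Dict[FrozenSet[str], str] = {
    frozenset({"boxer", "fighter"}): "synonym",
    frozenset({"car", "engine"}): "holonym",
    frozenset({"fighter", "person"}): "hypernym",
}

Chain = Tuple[List[str], int]  # (member words, accumulated score)


def relation_score(a: str, b: str) -> int:
    """Points for the relation between two words, or 0 if they are unrelated."""
    rel = RELATIONS.get(frozenset({a, b}))
    return SCORES[rel] if rel else 0


def build_chains(candidates: List[str]) -> List[Chain]:
    """Insert each candidate noun into the chain it relates to most strongly."""
    chains: List[Chain] = []
    for word in candidates:
        best_idx, best_gain = -1, 0
        for idx, (members, _) in enumerate(chains):
            # Relatedness criterion: total points against the chain's members.
            gain = sum(relation_score(word, m) for m in members)
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx >= 0:
            members, score = chains[best_idx]
            chains[best_idx] = (members + [word], score + best_gain)
        else:
            chains.append(([word], 0))  # unrelated to every chain: start a new one
    return chains


print(build_chains(["boxer", "car", "fighter", "engine", "person"]))
# [(['boxer', 'fighter', 'person'], 14), (['car', 'engine'], 7)]
```

The original method keeps alternative interpretations and scores each of them; the greedy version omits that for brevity.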

12
WSD Using Lexical Chains [Barzilay97]
13
WSD Using Lexical Chains [Barzilay97]
  • Machine, in the sense "an efficient person"
    • e.g. "the boxer was a magnificent fighting
      machine"

14
WSD Using Lexical Chains [Barzilay97]
(Figure: lexical chain scores of 11 and 30)
15
Related Work: Building a Chinese-English WordNet
[Chen02]
16
Related Work: Building a Chinese-English WordNet
[Chen02]
17
Related Work: Building a Chinese-English WordNet
[Chen02]
18
Advantages and Disadvantages of Related Work
19
System workflow
20
Query Translation
21
Query Translation
  • For each query term, add all its synonyms,
    hypernyms and hyponyms to the query.
  • Original query terms are more important than
    newly added terms, so term weights must be
    adjusted.
  • Term weight in the query can be defined
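A minimal sketch of the expansion and reweighting above, assuming fixed hypothetical weights (1.0 for original terms, 0.5 for synonyms, 0.3 for hypernyms and hyponyms) and a hand-written `EXPANSIONS` table in place of real WordNet lookups; the slide's actual weighting may differ.

```python
from typing import Dict, List

# Hypothetical weights: original terms dominate newly added terms.
ORIGINAL_WEIGHT = 1.0
RELATION_WEIGHT = {"synonyms": 0.5, "hypernyms": 0.3, "hyponyms": 0.3}

# Hypothetical toy expansion table; a real system would query WordNet.
EXPANSIONS: Dict[str, Dict[str, List[str]]] = {
    "fish": {
        "synonyms": [],
        "hypernyms": ["aquatic vertebrate"],
        "hyponyms": ["herring", "salmon"],
    },
}


def expand_query(terms: List[str]) -> Dict[str, float]:
    """Weighted query: original terms plus their synonyms, hypernyms, hyponyms."""
    weights: Dict[str, float] = {t: ORIGINAL_WEIGHT for t in terms}
    for term in terms:
        for relation, related in EXPANSIONS.get(term, {}).items():
            for r in related:
                # Keep the highest weight if a term is reached more than once.
                weights[r] = max(weights.get(r, 0.0), RELATION_WEIGHT[relation])
    return weights


print(expand_query(["fish"]))
# {'fish': 1.0, 'aquatic vertebrate': 0.3, 'herring': 0.3, 'salmon': 0.3}
```

Taking the maximum means a term that is both typed by the user and reached by expansion keeps its full original weight.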

22
Query Translation: An Example
  • The query ? is translated to "fish" and, after
    expansion through WordNet, retrieves the
    following documents.

23
Query Translation: Translation Ambiguity
24
Resolving Translation Ambiguity
25
Ontology Construction
  • Each ImageCLEF2004 document belongs to one or
    more categories.
  • There are 946 distinct categories.
  • Human experts group related categories into a
    hierarchy.
  • e.g. "fish processing" and "fisherman" are
    assigned to the parent node ??.

26
Keyword Extraction
  • Ontology node representation
    • Each node is represented by a term-to-concept
      vector <W1, W2, W3, ..., Wn>.
    • Each weight is the product of term frequency
      and inverse concept frequency.
  • Pick the k most important terms as keywords.
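The tf × icf keyword selection can be sketched as follows. Assumptions: icf(t) = log(C / cf(t)), where C is the number of ontology nodes and cf(t) is the number of nodes whose text contains t (by analogy with idf), and the node texts in the example are hypothetical token lists.

```python
import math
from collections import Counter
from typing import Dict, List


def node_vectors(nodes: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Term-to-concept vector per node, weighted by tf x icf (icf form assumed)."""
    num_concepts = len(nodes)
    # Concept frequency: how many nodes contain each term at least once.
    cf = Counter(t for terms in nodes.values() for t in set(terms))
    return {
        node: {t: f * math.log(num_concepts / cf[t]) for t, f in Counter(terms).items()}
        for node, terms in nodes.items()
    }


def top_keywords(nodes: Dict[str, List[str]], k: int) -> Dict[str, List[str]]:
    """Pick the k highest-weighted terms of each node (ties broken alphabetically)."""
    return {
        node: sorted(vec, key=lambda t: (-vec[t], t))[:k]
        for node, vec in node_vectors(nodes).items()
    }


nodes = {  # hypothetical node texts
    "fish processing": ["fish", "curing", "fish", "salting", "harbour"],
    "fisherman": ["fish", "boat", "net", "boat", "harbour"],
    "university libraries": ["library", "books", "reading", "library"],
}
print(top_keywords(nodes, 2))
# {'fish processing': ['curing', 'salting'], 'fisherman': ['boat', 'net'], 'university libraries': ['library', 'books']}
```

Terms shared by many nodes (here "fish" and "harbour") get a small icf, so distinctive terms rise to the top even when they occur less often.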

27
Keyword Extraction: Example
  • Keywords relevant to universities and
    university libraries

28
Building Ontological Chains
  • Build an ontological chain for each query:
    • Step 1: Find the N ontology leaf nodes most
      similar to the query.
    • Step 2: Measure the pairwise semantic distance
      among the selected leaf nodes to obtain a
      semantic graph.
    • Step 3: Find the connected components of the
      graph and pick the strongest component as the
      ontological chain.
    • Step 4: Add the sibling nodes of each node in
      the chain.
    • Step 5: For each English query term, calculate
      the mutual information (MI) with respect to
      the chain.
    • Step 6: Pick the terms with MI > T, where T is
      a threshold.

29
Building Ontological Chains
  • Step 1
    • The similarity of query Q and ontology leaf node
      Li is defined in terms of:
    • tij, the number of distinct Chinese query terms
      in document j belonging to Li
    • N, the number of documents belonging to Li
    • e.g. ????????? is similar to "fish processing"
      and "fishwives".
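The sketch below computes tij as defined above and combines it with an assumed aggregation, the mean of tij over the N documents of leaf Li; the actual similarity formula may weight terms differently. English glosses stand in for the Chinese query terms, the leaf contents are hypothetical, and `top_n` is the number of leaves selected in Step 1 (renamed to avoid clashing with N).

```python
from typing import Dict, List, Set


def t_ij(query_terms: Set[str], doc_terms: List[str]) -> int:
    """Number of distinct query terms that occur in document j."""
    return len(query_terms & set(doc_terms))


def leaf_similarity(query_terms: Set[str], docs: List[List[str]]) -> float:
    """ASSUMED aggregation: mean of t_ij over the N documents of leaf L_i."""
    if not docs:
        return 0.0
    return sum(t_ij(query_terms, d) for d in docs) / len(docs)


def most_similar_leaves(query_terms: Set[str],
                        leaves: Dict[str, List[List[str]]],
                        top_n: int) -> List[str]:
    """Step 1: the top_n leaf nodes ranked by similarity to the query."""
    scores = {leaf: leaf_similarity(query_terms, docs) for leaf, docs in leaves.items()}
    return sorted(scores, key=lambda leaf: (-scores[leaf], leaf))[:top_n]


# Hypothetical leaves; English glosses stand in for the Chinese query terms.
leaves = {
    "fish processing": [["woman", "fish", "curing"], ["man", "woman", "fish"]],
    "fishwives": [["woman", "fish", "basket"]],
    "golf": [["man", "golf", "course"], ["links"]],
}
print(most_similar_leaves({"man", "woman", "fish"}, leaves, top_n=2))
# ['fish processing', 'fishwives']
```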

30
Building Ontological Chains
  • Step 2
    • Define the semantic distance between two ontology
      leaf nodes as Dist(Li, Lj) = K / D,
    • where K is a constant and D is the path length
      between the two nodes.
    • e.g. the distance between "herring" and "fish
      processing" is K/3.
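A sketch of Step 2 on a hypothetical toy hierarchy (not the actual ImageCLEF ontology), computing the path length D through the lowest common ancestor. Note that K/D grows as two nodes get closer, so it behaves like a closeness weight.

```python
from typing import Dict, List

# Hypothetical toy hierarchy (child -> parent).
PARENT: Dict[str, str] = {
    "fishing": "root",
    "fish species": "fishing",
    "herring": "fish species",
    "fish processing": "fishing",
}


def ancestors(node: str) -> List[str]:
    """The node followed by its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_length(a: str, b: str) -> int:
    """Number of edges between a and b through their lowest common ancestor."""
    depth_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for i, n in enumerate(ancestors(a)):
        if n in depth_from_b:
            return i + depth_from_b[n]
    raise ValueError(f"{a!r} and {b!r} are not in the same hierarchy")


def semantic_distance(a: str, b: str, k: float = 1.0) -> float:
    """Dist(Li, Lj) = K / D for two distinct leaf nodes."""
    d = path_length(a, b)
    if d == 0:
        raise ValueError("the two leaf nodes must be distinct")
    return k / d


print(path_length("herring", "fish processing"))  # 3, so the distance is K/3
```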

31
Building Ontological Chains
  • Calculate the pairwise semantic distances to
    obtain a semantic graph.

32
Building Ontological Chains
Step 3: Employ the union-find algorithm to find
connected components, and choose the one with the
maximum weight.
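Step 3 can be sketched with a disjoint-set (union-find) structure. Two details are assumptions, since the slides do not state them: only edges whose weight K/D reaches a hypothetical threshold `min_weight` are kept (every pair of leaves has a finite K/D, so keeping all edges would yield a single component), and a component's weight is the sum of its edge weights.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (leaf a, leaf b, weight K/D)


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, items: List[str]) -> None:
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x: str) -> str:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point every visited node at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def strongest_component(nodes: List[str], edges: List[Edge],
                        min_weight: float) -> List[str]:
    """Connected components over the kept edges; return the heaviest one."""
    uf = UnionFind(nodes)
    kept = [e for e in edges if e[2] >= min_weight]
    for a, b, _ in kept:
        uf.union(a, b)
    members: Dict[str, List[str]] = {}
    for n in nodes:
        members.setdefault(uf.find(n), []).append(n)
    weight: Dict[str, float] = {}
    for a, _, w in kept:
        root = uf.find(a)
        weight[root] = weight.get(root, 0.0) + w
    best = max(members, key=lambda r: weight.get(r, 0.0))
    return sorted(members[best])


nodes = ["herring", "fish processing", "fishwives", "golf", "golf courses"]
edges = [  # hypothetical weights with K = 1
    ("herring", "fish processing", 1 / 3),
    ("fish processing", "fishwives", 1 / 2),
    ("herring", "fishwives", 1 / 3),
    ("golf", "golf courses", 1 / 2),
    ("fish processing", "golf", 1 / 6),
]
print(strongest_component(nodes, edges, min_weight=0.25))
# ['fish processing', 'fishwives', 'herring']
```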
33
Building Ontological Chains
  • Step 4: add "fish markets" and "fisherman".
  • Steps 5 and 6: for the term ??, pick "man"
    instead of "gentleman".
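Steps 4 to 6 can be sketched as follows. The slides do not give the MI formula, so this sketch assumes pointwise mutual information over documents, log(P(t, C) / (P(t) P(C))), where C is the event that a document belongs to a node of the chain; the hierarchy, documents, and threshold T = 0 are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Doc = Tuple[Set[str], Set[str]]  # (terms in the document, its ontology leaf nodes)


def add_siblings(chain: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Step 4: extend the chain with every sibling of its nodes."""
    extended = set(chain)
    for node in chain:
        p = parent.get(node)
        if p is not None:
            extended |= {child for child, cp in parent.items() if cp == p}
    return extended


def pmi(term: str, chain: Set[str], docs: List[Doc]) -> float:
    """Step 5 (ASSUMED form): pointwise MI between a term and the chain."""
    n = len(docs)
    has_t = [term in terms for terms, _ in docs]
    in_c = [bool(leaves & chain) for _, leaves in docs]
    p_t, p_c = sum(has_t) / n, sum(in_c) / n
    p_tc = sum(a and b for a, b in zip(has_t, in_c)) / n
    if p_tc == 0:
        return float("-inf")  # the term never co-occurs with the chain
    return math.log(p_tc / (p_t * p_c))


def select_translations(candidates: List[str], chain: Set[str],
                        docs: List[Doc], threshold: float) -> List[str]:
    """Step 6: keep the translation candidates with MI > T."""
    return [c for c in candidates if pmi(c, chain, docs) > threshold]


parent = {"fish processing": "fishing", "fish markets": "fishing",
          "fisherman": "fishing", "golf": "sport"}
chain = add_siblings({"fish processing"}, parent)
docs: List[Doc] = [
    ({"man", "fish", "curing"}, {"fish processing"}),
    ({"man", "boat"}, {"fisherman"}),
    ({"woman", "fish", "stall"}, {"fish markets"}),
    ({"gentleman", "golf"}, {"golf"}),
    ({"gentleman", "club"}, {"golf"}),
    ({"man", "golf"}, {"golf"}),
]
print(sorted(chain))  # ['fish markets', 'fish processing', 'fisherman']
print(select_translations(["man", "gentleman"], chain, docs, threshold=0.0))  # ['man']
```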

34
Monolingual Information Retrieval System
35
Document Vector Representation
  • Each document has three kinds of features:
    • Terms: Wi,j is defined as tf × idf.
    • Categories: Wci,j is defined by a Boolean
      weighting scheme.
    • Temporal feature: Wti,j is defined by a Boolean
      weighting scheme.
  • We use the cosine measure as our similarity
    function.
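A sketch of this document representation and the cosine measure, using sparse dictionaries keyed by feature group. Assumptions: idf = log(num_docs / df), the temporal feature is a Boolean per-year indicator, and the document-frequency table in the example is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set

Vector = Dict[str, float]


def doc_vector(terms: List[str], categories: Set[str], year: int,
               df: Dict[str, int], num_docs: int) -> Vector:
    """tf x idf term weights plus Boolean category and temporal features."""
    vec: Vector = {}
    for t, f in Counter(terms).items():
        vec["term:" + t] = f * math.log(num_docs / df[t])
    for c in categories:
        vec["cat:" + c] = 1.0  # Boolean: the document belongs to category c
    vec["year:" + str(year)] = 1.0  # Boolean temporal feature (assumed per year)
    return vec


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


df = {"fish": 2, "harbour": 1, "golf": 1}  # hypothetical document frequencies
d1 = doc_vector(["fish", "fish", "harbour"], {"fish processing"}, 1898, df, 4)
d2 = doc_vector(["golf"], {"golf"}, 1901, df, 4)
print(round(cosine(d1, d1), 6), cosine(d1, d2))  # 1.0 0.0
```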

36
Query Vector Representation
  • Each feature is multiplied by a weighting factor.
  • We define three temporal operations: before, in,
    and after.
  • e.g. D1 was published in 1898 and D2 in 1901; if
    Q contains the operation "before 1900", only D1
    satisfies it.
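The temporal operations and per-group weighting factors can be sketched as below. Assumptions: "in" means the same publication year, query feature keys carry a group prefix such as `term:` or `cat:`, and the factor values are hypothetical.

```python
from typing import Callable, Dict

# The three temporal operations from the slide; "in" is assumed to mean same year.
TEMPORAL_OPS: Dict[str, Callable[[int, int], bool]] = {
    "before": lambda doc_year, q_year: doc_year < q_year,
    "in": lambda doc_year, q_year: doc_year == q_year,
    "after": lambda doc_year, q_year: doc_year > q_year,
}


def temporal_match(op: str, q_year: int, doc_year: int) -> bool:
    """Whether a document's publication year satisfies the query's operation."""
    return TEMPORAL_OPS[op](doc_year, q_year)


def weight_query(query: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Multiply each query feature by the factor of its group (prefix before ':')."""
    return {key: w * factors[key.split(":", 1)[0]] for key, w in query.items()}


docs = {"D1": 1898, "D2": 1901}
print([d for d, year in docs.items() if temporal_match("before", 1900, year)])  # ['D1']
print(weight_query({"term:fish": 1.0, "cat:fish processing": 1.0},
                   {"term": 1.0, "cat": 0.5, "time": 2.0}))
# {'term:fish': 1.0, 'cat:fish processing': 0.5}
```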

37
Evaluation: Dataset Description
  • ImageCLEF2004 bilingual ad hoc task
  • St Andrews University Library photographic
    collection
  • Photos are primarily historic in nature, from
    areas in and around Scotland.
  • Dataset overview
    • 28,133 SGML documents consisting of text and
      images
    • 946 categories

38
Evaluation: Dataset Description
39
Evaluation: Topics
  • Topics are based on real search requests,
    including query logs and requests from patrons.

40
Evaluation Metrics: Mean Average Precision
  • Average precision
    • The average of the precision values at each
      relevant document retrieved.
    • e.g. the average precision of the query
      ?????????.
  • Mean average precision
    • The mean of the individual average precision
      scores over all queries.
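Average precision and MAP as described above, in a short sketch. It divides by the total number of relevant documents (the usual TREC convention), which matches the slide's definition whenever every relevant document is retrieved; the rankings and judgments are hypothetical.

```python
from typing import List, Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at the rank k of each relevant document.

    Dividing by len(relevant) means relevant documents that were never
    retrieved contribute a precision of 0.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(rankings: List[Sequence[str]],
                           judgments: List[Set[str]]) -> float:
    """Mean of the per-query average precision scores."""
    scores = [average_precision(r, j) for r, j in zip(rankings, judgments)]
    return sum(scores) / len(scores)


ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})  # (1/1 + 2/3) / 2
m = mean_average_precision([["d1", "d2", "d3", "d4"], ["d5", "d6"]],
                           [{"d1", "d3"}, {"d6"}])  # (5/6 + 1/2) / 2
print(round(ap, 4), round(m, 4))
```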

41
System Demonstration: Retrieval
42
System Demonstration: Ontology
43
System Demonstration: Monolingual
44
System Demonstration: Dictionary-based
45
System Demonstration: Ontology-based
46
Evaluation: Precision/Recall at Top 100
  • Ontology-based CLIR performs better than
    dictionary-lookup CLIR.
  • The ontology-based CLIR system reaches 85% of the
    performance of the monolingual IR system.
  • Without the ontology, CLIR reaches only 42% of the
    performance of the monolingual IR system.

47
Evaluation: 11-point Precision/Recall
  • The ontology-based CLIR system performs better
    than dictionary-lookup CLIR.
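11-point interpolated precision can be computed as in this sketch, which uses the common interpolation rule (precision at recall level r is the maximum precision at any rank whose recall is at least r); the example ranking is hypothetical. A system-level curve averages these 11 values over all queries.

```python
from typing import List, Sequence, Set


def eleven_point_precision(ranking: Sequence[str], relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return [0.0] * 11
    points = []  # (recall, precision) after each retrieved document
    hits = 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    result = []
    for i in range(11):
        level = i / 10
        # Interpolation: best precision at any rank whose recall reaches this level.
        reached = [p for r, p in points if r >= level]
        result.append(max(reached) if reached else 0.0)
    return result


curve = eleven_point_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print([round(p, 3) for p in curve])
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.667, 0.667, 0.667, 0.667, 0.667]
```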

48
Evaluation: Mean Average Precision
  • The ontology-based CLIR system performs better
    than dictionary-lookup CLIR.
  • The ontology-based CLIR system reaches 92% of the
    performance of the monolingual IR system.
  • Without the ontology, CLIR reaches only 81% of the
    performance of the monolingual IR system.

49
System Demonstration: Feedback
50
System Demonstration: Feedback
51
System Demonstration: Feedback
52
Evaluation: Relevance Feedback
  • Pick M retrieved relevant documents as positive
    examples and N retrieved non-relevant documents
    as negative examples.
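One standard way to use such positive and negative examples is Rocchio's method [Rocchio71], which appears in the reference list; the slides do not say this is the exact update used. The sketch assumes the commonly used parameters alpha = 1.0, beta = 0.75, gamma = 0.15 and drops features whose weight falls to zero or below; the vectors in the example are hypothetical.

```python
from typing import Dict, List

Vector = Dict[str, float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors (empty input gives an empty vector)."""
    out: Vector = {}
    for v in vectors:
        for key, w in v.items():
            out[key] = out.get(key, 0.0) + w / len(vectors)
    return out


def rocchio(query: Vector, positives: List[Vector], negatives: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """q' = alpha*q + beta*centroid(positives) - gamma*centroid(negatives)."""
    pos, neg = centroid(positives), centroid(negatives)
    keys = dict.fromkeys([*query, *pos, *neg])  # ordered union of feature keys
    updated: Vector = {}
    for key in keys:
        w = (alpha * query.get(key, 0.0) + beta * pos.get(key, 0.0)
             - gamma * neg.get(key, 0.0))
        if w > 0:  # drop features pushed to zero or below
            updated[key] = w
    return updated


expanded = rocchio({"fish": 1.0},
                   positives=[{"fish": 1.0, "curing": 1.0}, {"fish": 1.0, "salting": 1.0}],
                   negatives=[{"golf": 1.0}])
print(expanded)  # {'fish': 1.75, 'curing': 0.375, 'salting': 0.375}
```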

53
Discussion
  • With ontological chains, CLIR can perform better
    than monolingual IR.
    • Without semantic features, documents are
      retrieved only if they share terms with the
      query.
    • "catch", "fisherman", and "salting" are related
      to the query "man and woman processing fish" in
      the ontology.
    • Ontological chains use semantic features and
      perform better than keyword matching.
  • Our similarity function performs better than the
    cosine measure.
    • Terms are independent in the vector space model.
    • Our similarity function has the same effect as
      an AND operator.

54
Discussion: When the Query Is Specific
  • Our approach performs slightly worse than keyword
    matching.
    • e.g. 1908??????????
    • Our approach may expand too many ontology nodes.

55
Discussion: When the Query Is General
  • Our approach performs much better than keyword
    matching.
    • e.g. ?????????
    • "man", "woman", and "processing" are general
      terms that appear in many documents.

56
Conclusion
  • An ontology can be employed to represent domain
    knowledge in a CLIR system.
  • The proposed ontological chain approach can be
    used to resolve translation ambiguity.
  • The proposed ontological chain approach achieves
    higher precision than the other approaches,
    especially when the number of translation
    candidates is large.

57
Future Work
  • Semantic indexing can be used to resolve
    polysemous words.

58
Reference
  • [Ballesteros98] L. Ballesteros and W.B. Croft,
    "Resolving ambiguity for cross-language
    retrieval," Proc. 21st Annual International ACM
    SIGIR Conference on Research and Development in
    Information Retrieval, pp. 64-71, 1998.
  • [Barzilay97] R. Barzilay and M. Elhadad, "Using
    Lexical Chains for Text Summarization," ACL/EACL
    Workshop on Intelligent Scalable Text
    Summarization, 1997.
  • [Carbonell97] J. Carbonell, Y. Yang, R.
    Frederking, R.D. Brown, Y. Geng, and D. Lee,
    "Translingual Information Retrieval: A
    Comparative Evaluation," Proc. Fifteenth
    International Joint Conference on Artificial
    Intelligence, Vol. 1, pp. 708-715, 1997.
  • [Chen02] H.H. Chen, C.C. Lin, and W.C. Lin,
    "Building a Chinese-English WordNet for
    translingual applications," ACM Transactions on
    Asian Language Information Processing, Vol. 1,
    Issue 2, pp. 103-122, 2002.
  • [CLEF04] Cross Language Evaluation Forum,
    available at http://clef.iei.pi.cnr.it:2002/2004.html
  • [Frakes92] W.B. Frakes and R. Baeza-Yates,
    Information Retrieval: Data Structures and
    Algorithms, Prentice Hall, 1992.
  • [Fung98] P. Fung and L.Y. Yee, "An IR Approach for
    Translating New Words from Nonparallel,
    Comparable Texts," Proc. 36th Annual Conference
    of the Association for Computational Linguistics,
    pp. 414-420, 1998.

59
Reference
  • [Gruber93] T.R. Gruber, "A translation approach
    to portable ontology specifications," Knowledge
    Acquisition, pp. 199-220, 1993.
  • [ImageCLEF04] Cross Language Evaluation Forum,
    available at http://ir.shef.ac.uk/imageclef2004/
  • [Kipfer01] B.A. Kipfer and R.L. Chapman, Roget's
    International Thesaurus, HarperResource, 2001.
  • [Larkey03] L.S. Larkey and M.E. Connell,
    "Structured Queries, Language Modeling, and
    Relevance Modeling in Cross-Language Information
    Retrieval," Information Processing and
    Management, Special Issue on Cross Language
    Information Retrieval, 2003.
  • [Littman98] M.L. Littman, S.T. Dumais, and T.K.
    Landauer, "Automatic cross-language information
    retrieval using latent semantic indexing,"
    Cross-Language Information Retrieval,
    pp. 51-62, 1998.
  • [Lu02] W.H. Lu, L.F. Chien, and H.L. Lee,
    "Translation of web queries using anchor text
    mining," ACM Transactions on Asian Language
    Information Processing, Vol. 1, Issue 2,
    pp. 159-172, 2002.
  • [Miller95] G. Miller, "WordNet: A Lexical
    Database for English," Communications of the
    ACM, 1995.

60
Reference
  • [Miller99] D.R.H. Miller, T. Leek, and R.M.
    Schwartz, "A hidden Markov model information
    retrieval system," Proc. 22nd Annual
    International ACM SIGIR Conference on Research
    and Development in Information Retrieval,
    pp. 214-221, 1999.
  • [Nie99] J.Y. Nie, M. Simard, P. Isabelle, and R.
    Durand, "Cross-Language Information Retrieval
    Based on Parallel Texts and Automatic Mining of
    Parallel Texts from the Web," Proc. 22nd Annual
    International ACM SIGIR Conference on Research
    and Development in Information Retrieval,
    pp. 74-81, 1999.
  • [Porter80] M.F. Porter, "An algorithm for suffix
    stripping," Program, Vol. 14, No. 3, pp. 130-137,
    1980.
  • [Rocchio71] J. Rocchio, "Relevance Feedback in
    Information Retrieval," Prentice-Hall, Inc.,
    1971.
  • [Salton83] G. Salton and M.J. McGill,
    Introduction to Modern Information Retrieval,
    McGraw-Hill, 1983.
  • [Savoy03] J. Savoy, "Cross-language information
    retrieval: experiments based on CLEF 2000
    corpora," Information Processing & Management,
    Vol. 39, Issue 1, pp. 75-115, 2003.
  • [Tarjan75] R.E. Tarjan, "Efficiency of a Good But
    Not Linear Set Union Algorithm," Journal of the
    ACM, Vol. 22, Issue 2, pp. 215-225, 1975.

61
Reference
  • [Xu01] J. Xu, R. Weischedel, and C. Nguyen,
    "Evaluating a probabilistic model for
    cross-lingual information retrieval," Proc. 24th
    Annual International ACM SIGIR Conference on
    Research and Development in Information
    Retrieval, pp. 105-110, 2001.
  • [Zhang02] Y. Zhang and P. Vines, "Improved use
    of Contextual Information in Cross-language
    Information Retrieval," ACDS, 2002.