Provided by: xia107
Title: Cross-lingual Event Tracking


1
Cross-lingual Event Tracking

2
Outline
  • Task
  • Two Alternative Approaches
  • Our Methods
  • Experimental results
  • Observations
  • Future work

3
Task
  • Tracking Task
  • More Difficult than Information Filtering (TREC)
  • No human relevance feedback
  • Cross-lingual Event Tracking (CLET)
  • Using source-language topics to track
    target-language stories
  • Difficulty: bridging the language gap

4
Approaches
  • Translating test documents (common approach)
  • Translating the multilingual test documents to
    preferred language, and treating the problem as a
    monolingual tracking task.
  • Translating sampled training docs (true CLET)
  • Translating some sampled training stories into
    the same language as the test documents

6
Goal
  • Reduce the gap between these two approaches

7
Main Ideas
  • Apply Cross-lingual Information Retrieval (CLIR)
    techniques to CLET
  • Query Expansion
  • Bi-gram Based Segmentation (for Chinese)
  • Adaptation
  • LIMSI weighted adaptation (LWAdapt)
  • CMU normalized and weighted adaptation (NWAdapt)
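
The slides name the adaptation variants but do not give their formulas. As a rough illustration of unsupervised adaptation in general (no human relevance feedback), the sketch below folds a test story into the topic vector only when its score clears a threshold, weighting the story by that score. The function names, `adapt_threshold`, and `rate` are hypothetical, and this is not the LWAdapt or NWAdapt formula.

```python
def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sum(w * w for w in a.values()) ** 0.5
    norm_b = sum(w * w for w in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adapt(topic, story, score, adapt_threshold=0.5, rate=0.2):
    """Return an updated topic vector (hypothetical score-weighted update).

    The story is added only if the system's own score is high enough,
    because no human judgment is available during tracking.
    """
    if score < adapt_threshold:
        return topic
    weight = rate * score  # more confident stories contribute more
    updated = dict(topic)
    for term, w in story.items():
        updated[term] = updated.get(term, 0.0) + weight * w
    return updated


topic = {"quake": 1.0}
story = {"quake": 1.0, "rescue": 1.0}
score = cosine(topic, story)
updated = adapt(topic, story, score)
```

A story that scores below the threshold leaves the topic vector untouched, which limits drift from false alarms.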

8
CMU Event Tracking System
  • Rocchio Model
  • The centroid of a category, constructed from a
    set of positive training examples and a set of
    negative training examples of that class
  • Fixed weighted adaptation (FWAdapt)
  • LIMSI's adaptation
  • CMU's adaptation
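
A minimal sketch of a Rocchio-style prototype as described above: the centroid of the positive examples minus a weighted centroid of the negative examples. Documents are assumed to be sparse term-weight dicts; the weight `gamma` and the choice to drop non-positive terms are assumptions for illustration, not settings taken from the CMU system.

```python
def centroid(docs):
    """Average a list of sparse term-weight dicts."""
    if not docs:
        return {}
    total = {}
    for doc in docs:
        for term, w in doc.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(docs) for term, w in total.items()}


def rocchio_prototype(positives, negatives, gamma=0.25):
    """Positive centroid minus gamma times the negative centroid."""
    pos = centroid(positives)
    neg = centroid(negatives)
    terms = set(pos) | set(neg)
    proto = {t: pos.get(t, 0.0) - gamma * neg.get(t, 0.0) for t in terms}
    # Assumed variant: keep only terms with positive weight.
    return {t: w for t, w in proto.items() if w > 0}


positives = [{"a": 1.0, "b": 1.0}, {"a": 1.0}]
negatives = [{"b": 4.0}]
prototype = rocchio_prototype(positives, negatives)
```

In this toy input the term `b` appears more strongly in the negative example, so it is pushed out of the prototype and only `a` remains.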


9
Cross-Lingual Components
  • Topic Expansion
  • Sample translations
  • Using the bilingual dictionary
  • Using the CL-PRF technique
  • Segmentation (for Chinese)
  • Phrase-based
  • Bigram-based
  • Adaptation
  • LIMSI's adaptation
  • CMU's adaptation
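
Bigram-based segmentation for Chinese can be sketched as overlapping character pairs, which avoids the need for a word segmenter. This is a minimal illustration, assuming whitespace is simply dropped and punctuation is not handled specially; the function name is hypothetical.

```python
def char_bigrams(text):
    """Split a string into overlapping character bigrams."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A single character is kept as its own token.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


tokens = char_bigrams("北京大学")
```

Here `tokens` is `["北京", "京大", "大学"]`: the true words 北京 and 大学 are both recovered, at the price of the spurious pair 京大.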

10
Experiment Design
  • Mixed language event tracking
  • Demonstrate that our system is comparable to the
    best teams in recent TDT benchmark evaluations
  • CLET based on test document translation
  • Give a comparable baseline for our approach
  • CLET based on translating sampled training data
  • Our true cross-lingual event tracking approach

11
Experiments: English-Chinese Data
  • Mixed language event tracking
  • TDT 2001 evaluation data
  • CLET based on test document translation
  • TDT 1999 evaluation data
  • Translation: SYSTRAN MT system (released by NIST)
  • CLET based on translating sampled training data
  • TDT 1999 evaluation data
  • Translation: LDC dictionary
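
Dictionary-based translation of a training topic can be sketched as below. The toy dictionary, the function name, and the rule of splitting a term's weight evenly across its translations are assumptions for illustration; the slides do not say how the LDC dictionary entries were weighted.

```python
def translate_topic(topic, dictionary):
    """Map source-language term weights onto target-language terms.

    topic: dict of source term -> weight
    dictionary: dict of source term -> list of target translations
    """
    translated = {}
    for term, weight in topic.items():
        targets = dictionary.get(term, [])
        if not targets:
            continue  # out-of-vocabulary terms are dropped
        share = weight / len(targets)  # assumed: split weight evenly
        for target in targets:
            translated[target] = translated.get(target, 0.0) + share
    return translated


toy_dictionary = {"earthquake": ["地震", "震灾"]}  # hypothetical entries
topic = {"earthquake": 2.0, "xyz": 1.0}
translated = translate_topic(topic, toy_dictionary)
```

Splitting the weight keeps an ambiguous term from dominating the translated topic just because the dictionary lists many translations for it.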

12
Mixed language event tracking
  • LIMSI result in TDT2001: Cost 0.1332

Adaptation Method     Normalized Min Cost   Cost Reduction Ratio (%)
Without adaptation    0.1225                --
LWAdapt               0.1183                3.4
NWAdapt               0.1133                7.5

13
Two Baselines: Translating Test Documents
  • Using SYSTRAN MT system
    -- Released by NIST
    -- Cost 0.1336
  • Using LDC dictionary
    -- Translated by CMU
    -- Cost 0.1899

15
Experimental Results--The Effects of Topic
Expansion (TE) and Segmentation

Condition        English-Chinese Cost   Cost Reduction Ratio (%)
Phrase (DICT)    0.5039                 --
PhraseTE         0.2974                 41
Bigram           0.3848                 26.3
BigramTE         0.2522                 50

16
Experimental Results--The effects of different
adaptation approaches

Condition          English-Chinese Cost   Cost Reduction Ratio (%)
Phrase             0.5039                 --
PhraseLWAdapt      0.4258                 15.5
PhraseNWAdapt      0.4197                 16.7
PhraseTE           0.2974                 41
PhraseTELWAdapt    0.2660                 47.2
PhraseTENWAdapt    0.2617                 48
Bigram             0.3848                 26
BigramTE           0.2522                 50
BigramTELWAdapt    0.2467                 51
BigramTENWAdapt    0.2413                 52.6

17
Experimental Results--Translating Test Documents
vs. Sampled Training Documents (1)

18
Experimental Results--Translating Test Documents
vs. Sampled Training Documents (Using the StatMT
dictionary generated by the IBM system)

19
Observations
  • CMU's adaptation (NWAdapt) performs better than
    LIMSI's adaptation (LWAdapt)
  • Topic expansion improved performance (cost 0.5039
    to 0.2974 with phrase-based segmentation)
  • Bigram-based segmentation performs better than
    phrase-based segmentation in the Mandarin
    tracking task
  • CLIR techniques are an effective way of bridging
    the language gap in true CLET

20
Future Work
  • Apply MT to cross-lingual tracking
  • Introduce named entities into the CLET task to
    further improve system performance

21
References
  • Improving text categorization methods for event
    tracking. Yiming Yang, Tom Ault, et al.
  • Learning Approaches for Detecting and Tracking
    News Events. Yiming Yang, Jaime Carbonell, et al.
  • The BBN Crosslingual Topic Detection and Tracking
    System. Tim Leek, Hubert Jin, et al.
  • The LIMSI topic tracking system for TDT2002.
    Yuen-Yee Lo and Jean-Luc Gauvain.

22
Evaluation
  • DET Curve
  • Cost reduction ratio
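
The two measures can be sketched as follows. The cost reduction ratio is taken here as the relative drop in cost against a baseline, in percent, which reproduces the reported 41 for PhraseTE against Phrase. The normalized detection cost assumes the standard TDT evaluation parameters (C_miss = 1, C_fa = 0.1, P_target = 0.02); the slides do not restate them, so treat those defaults as an assumption.

```python
def cost_reduction_ratio(baseline_cost, cost):
    """Relative cost reduction against a baseline, in percent."""
    return 100.0 * (baseline_cost - cost) / baseline_cost


def normalized_detection_cost(p_miss, p_fa,
                              c_miss=1.0, c_fa=0.1, p_target=0.02):
    """TDT-style detection cost, normalized by the better trivial system.

    Default parameter values are assumed, not taken from the slides.
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return cost / norm


phrase_te = cost_reduction_ratio(0.5039, 0.2974)   # Phrase -> PhraseTE
lw_adapt = cost_reduction_ratio(0.1225, 0.1183)    # mixed-language run
```

A normalized cost of 1.0 corresponds to a system that rejects every story, so values well below 1.0 indicate useful tracking.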


23
CLET using sampled training data

Segmentation   Topic Expansion   Adaptation   Label
Phrase         No                No           Phrase
Phrase         No                FWAdapt      PhraseFWAdapt
Phrase         No                LWAdapt      PhraseLWAdapt
Phrase         No                NWAdapt      PhraseNWAdapt
Phrase         Yes               No           PhraseTE
Phrase         Yes               FWAdapt      PhraseTEFWAdapt
Phrase         Yes               LWAdapt      PhraseTELWAdapt
Phrase         Yes               NWAdapt      PhraseTENWAdapt
Bigram         No                No           Bigram
Bigram         No                FWAdapt      BigramFWAdapt
Bigram         No                LWAdapt      BigramLWAdapt
Bigram         No                NWAdapt      BigramNWAdapt
Bigram         Yes               No           BigramTE
Bigram         Yes               FWAdapt      BigramTEFWAdapt
Bigram         Yes               LWAdapt      BigramTELWAdapt
Bigram         Yes               NWAdapt      BigramTENWAdapt