Cross-lingual Event Tracking - PowerPoint PPT Presentation

Slides: 24

Provided by: xia107

Transcript and Presenter's Notes

Title: Cross-lingual Event Tracking

1
Cross-lingual Event Tracking

2
Outline

Task
Two Alternative Approaches
Our Methods
Experimental results
Observations
Future work

3
Task

Tracking Task
More difficult than Information Filtering (TREC)
No human relevance feedback
Cross-lingual Event Tracking (CLET)
Using source-language topics to track target-language stories
Difficulty: bridging the language gap

4
Approaches

Translating test documents (common approach)
Translate the multilingual test documents into a preferred language and treat the problem as a monolingual tracking task.
Translating sampled training docs (true CLET)
Translate some sampled training stories into the same language as the test documents.

6
Goal

Reduce the gap between these two approaches

7
Main Ideas

Apply Cross-lingual Information Retrieval (CLIR) techniques to CLET
Query Expansion
Bigram-Based Segmentation (for Chinese)
Adaptation
LIMSI weighted adaptation -- LWAdapt
CMU normalized and weighted adaptation -- NWAdapt
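
As an illustration of bigram-based segmentation, here is a minimal Python sketch that splits Chinese text into overlapping character bigrams, so that no word segmenter is needed. The function name `char_bigrams` and the handling of whitespace and single-character input are assumptions made for this example; the slides do not describe the system's actual tokenizer.

```python
def char_bigrams(text):
    """Split text into overlapping character bigrams.

    Hypothetical helper for illustration only: whitespace is dropped,
    and a single remaining character is returned as its own token.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


# A four-character string yields three overlapping bigrams.
tokens = char_bigrams("北京大学")
print(tokens)
```

Every adjacent character pair becomes an index term, which trades a larger vocabulary for independence from segmentation errors.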

8
CMU Event Tracking System

Rocchio Model
The prototype of a category is a centroid constructed from a set of positive training examples and a set of negative training examples of that class.
Fixed Weighted Adaptation
LIMSI's Adaptation
CMU's Adaptation
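
The Rocchio model above can be sketched roughly as follows. This is a minimal illustration, not the CMU system itself: the raw term-frequency weighting, the `gamma` value of 0.25, the clipping of negative weights, and all function names are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def centroid(docs):
    """Average term-frequency vector over a list of tokenized documents."""
    total = Counter()
    for tokens in docs:
        total.update(tokens)
    n = max(len(docs), 1)
    return {term: count / n for term, count in total.items()}


def rocchio_prototype(positives, negatives, gamma=0.25):
    """Positive centroid minus gamma times the negative centroid.

    gamma=0.25 is an assumed value; terms whose weight is not
    positive are dropped from the prototype.
    """
    pos = centroid(positives)
    neg = centroid(negatives)
    proto = {}
    for term in set(pos) | set(neg):
        weight = pos.get(term, 0.0) - gamma * neg.get(term, 0.0)
        if weight > 0:
            proto[term] = weight
    return proto


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data: two on-topic stories and one off-topic story.
positives = [["quake", "rescue"], ["quake", "aid"]]
negatives = [["election", "vote"]]
proto = rocchio_prototype(positives, negatives)

on_topic = cosine(proto, {"quake": 1.0, "aid": 1.0})
off_topic = cosine(proto, {"election": 1.0})
print(on_topic, off_topic)
```

A tracking decision would then compare the similarity score of each incoming story against a threshold; the threshold itself is not given in the slides.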

9
Cross-Lingual Components

Topic Expansion
Sample translations
Using the bilingual dictionary
Using the CL-PRF technique
Segmentation (for Chinese)
Phrase-based
Bigram-based
Adaptation
LIMSI's adaptation
CMU's adaptation
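
One way to picture dictionary-based topic expansion is to map each weighted source-language topic term to its target-language translations. The sketch below is only an assumption about how such a step might look: the toy dictionary, the even split of a term's weight across its translations, and the name `translate_topic` are all hypothetical, and the CL-PRF step is not shown.

```python
from collections import Counter


def translate_topic(term_weights, dictionary):
    """Map weighted source-language terms to target-language terms.

    Assumption for illustration: each term's weight is split evenly
    among its dictionary translations, and terms with no dictionary
    entry are skipped.
    """
    translated = Counter()
    for term, weight in term_weights.items():
        translations = dictionary.get(term, [])
        for target_term in translations:
            translated[target_term] += weight / len(translations)
    return dict(translated)


# Toy English-Chinese dictionary (hypothetical entries).
toy_dictionary = {
    "earthquake": ["地震"],
    "rescue": ["救援", "营救"],
}
topic = {"earthquake": 2.0, "rescue": 1.0, "untranslatable": 1.0}
chinese_topic = translate_topic(topic, toy_dictionary)
print(chinese_topic)
```

The translated topic vector can then be matched directly against segmented target-language stories.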

10
Experiment Design

Mixed language event tracking
Demonstrate that our system is comparable to the
best teams in recent TDT benchmark evaluations
CLET based on test document translation
Give a comparable baseline for our approach
CLET based on translating sampled training data
Our true cross-lingual event tracking approach

11
Experiments: English-Chinese Data

Mixed language event tracking
TDT 2001 evaluation data
CLET based on test document translation
TDT 1999 evaluation data
Translation: SYSTRAN MT system (released by NIST)
CLET based on translating sampled training data
TDT 1999 evaluation data
Translation: LDC dictionary

12
Mixed language event tracking

LIMSI result in TDT 2001: Cost 0.1332

Adaptation Method     Normalized Min Cost    Cost Reduction Ratio (%)
Without adaptation    0.1225                 --
LMAdapt               0.1183                 3.4
NMAdapt               0.1133                 7.5

13
Two Baselines: Translating Test Documents

Using SYSTRAN MT system
-- Released by NIST
-- Cost: 0.1336
Using LDC dictionary
-- Translated by CMU
-- Cost: 0.1899

14
Experimental Results -- The Effects of Topic Expansion (TE) and Segmentation

15
Experimental Results -- The Effects of Topic Expansion (TE) and Segmentation

Condition        English-Chinese Cost    Cost Reduction Ratio (%)
Phrase (DICT)    0.5039                  --
PhraseTE         0.2974                  41
Bigram           0.3848                  26.3
BigramTE         0.2522                  50

16
Experimental Results -- The Effects of Different Adaptation Approaches

Condition          English-Chinese Cost    Cost Reduction Ratio (%)
Phrase             0.5039                  --
PhraseLWAdapt      0.4258                  15.5
PhraseNWAdapt      0.4197                  16.7
PhraseTE           0.2974                  41
PhraseTELWAdapt    0.2660                  47.2
PhraseTENWAdapt    0.2617                  48
Bigram             0.3848                  26
BigramTE           0.2522                  50
BigramTELWAdapt    0.2467                  51
BigramTENWAdapt    0.2413                  52.6

17
Experimental Results -- Translating Test Documents vs. Sampled Training Documents (1)

18
Experimental Results -- Translating Test Documents vs. Sampled Training Documents (Using StatMT dictionary generated by IBM system)

19
Observations

CMU's adaptation performs better than LIMSI's adaptation
Topic expansion improves performance
Bigram-based segmentation performs better than phrase-based segmentation in the Mandarin tracking task
CLIR techniques are an effective way of bridging the language gap in true CLET

20
Future Work

Apply MT to cross-lingual tracking
Introduce named entities into the CLET task to further improve system performance

21
Reference

Yiming Yang, Tom Ault, et al. Improving text categorization methods for event tracking.
Yiming Yang, Jaime Carbonell, et al. Learning Approaches for Detecting and Tracking News Events.
Tim Leek, Hubert Jin, et al. The BBN Crosslingual Topic Detection and Tracking System.
Yuen-Yee Lo and Jean-Luc Gauvain. The LIMSI topic tracking system for TDT2002.

22
Evaluation

DET Curve
Cost Reduction Ratio
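
The slides do not spell out the cost formulas, so the following Python sketch rests on assumptions. `normalized_cost` follows the usual TDT detection cost with assumed constants (C_miss = 1, C_fa = 0.1, P_target = 0.02), and `cost_reduction_ratio` assumes the ratio is the percentage drop in cost relative to a baseline, which agrees with the PhraseTE row reported in the results tables (0.5039 to 0.2974, 41).

```python
def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized detection cost at one operating point.

    The default constants are assumed standard TDT settings,
    not values taken from the slides.
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Normalize by the cost of the better trivial system
    # (always answer "no" or always answer "yes").
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))


def cost_reduction_ratio(baseline_cost, new_cost):
    """Percentage reduction in cost relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost


# Missing every on-topic story with no false alarms gives a normalized cost of 1.
always_no = normalized_cost(p_miss=1.0, p_fa=0.0)

# Phrase baseline vs. PhraseTE, using the costs reported in the slides.
phrase_te_reduction = cost_reduction_ratio(0.5039, 0.2974)
print(always_no, round(phrase_te_reduction, 1))
```

The minimum normalized cost reported in the tables would correspond to the lowest value of `normalized_cost` over all decision thresholds, that is, the best operating point on the DET curve.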

23
CLET using sampled training data

Segmentation    Expand    Adaptation    Label
Phrase          No        No            Phrase
Phrase          No        FWAdapt       PhraseFWAdapt
Phrase          No        LWAdapt       PhraseLWAdapt
Phrase          No        NWAdapt       PhraseNWAdapt
Phrase          Yes       No            PhraseTE
Phrase          Yes       FWAdapt       PhraseTEFWAdapt
Phrase          Yes       LWAdapt       PhraseTELWAdapt
Phrase          Yes       NWAdapt       PhraseTENWAdapt
Bigram          No        No            Bigram
Bigram          No        FWAdapt       BigramFWAdapt
Bigram          No        LWAdapt       BigramLWAdapt
Bigram          No        NWAdapt       BigramNWAdapt
Bigram          Yes       No            BigramTE
Bigram          Yes       FWAdapt       BigramTEFWAdapt
Bigram          Yes       LWAdapt       BigramTELWAdapt
Bigram          Yes       NWAdapt       BigramTENWAdapt
