Unsupervised Constraint Driven Learning for Transliteration Discovery

Transcript and Presenter's Notes

1
Unsupervised Constraint Driven Learning for
Transliteration Discovery
  • M. Chang, D. Goldwasser, D. Roth, and Y. Tu

2
What I am going to do today
  • Goal 1: Present the transliteration work
  • Get feedback!
  • Goal 2: Think about this work in terms of CCM
  • A tutorial? :-)
  • I will try to present this work in a slightly
    different way
  • Some of this is my personal commentary
  • Different from our discussion yesterday
  • Please give us comments on this
  • Make this work more general (not only
    transliteration)

3
Wait a sec! What is CCM?
  • I have gotten this question 100 times already!
  • Informal answer
  • Everything that uses constraints is CCM! :-)
  • Formal answer
  • No constraints? Then it is not a CCM
  • We do not define the training method
  • Definition: a CCM makes predictions with constraints!
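
For reference, the standard CCM inference objective from [Chang et al., ACL 2007], in my notation (not copied from the slides):

  \hat{y} = \arg\max_{y} \; \mathbf{w} \cdot \phi(x, y) \;-\; \sum_{k} \rho_k \, d\big(y, \mathbf{1}_{C_k(x)}\big)

Here \phi(x, y) is the feature vector, d measures how far y is from satisfying constraint C_k, and a hard constraint corresponds to \rho_k = \infty.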

4
Constraint Driven Learning
  • Why constraints?
  • The Goal: Building a good system easily
  • We have prior knowledge at hand
  • Why not inject knowledge directly?
  • How useful are constraints?
  • Useful for supervised learning [Yih and Roth 04]
    and many others
  • Useful for semi-supervised learning [Chang et al.,
    ACL 2007]
  • Sometimes more efficient than labeling data
    directly

5
Unsupervised Constraint Driven Learning
  • In this work
  • We do not use any labeled instances
  • We achieve good performance, competitive with
    several supervised models
  • Compared to [Chang et al., ACL 2007]
  • In ACL 07, they use a small amount of labeled data
    (5-20 examples)
  • Reason: bad models cannot benefit from
    constraints!
  • For some applications, we have very good resources
  • We do not need labeled instances at all!

6
In a nutshell
  • Traditional semi-supervised learning.
  • Model can drift from the correct one.

[Slide diagram: the traditional semi-supervised loop. The model makes
predictions to label unlabeled data, and the feedback step learns from
the newly labeled data.]
7
In a nutshell
CODL improves a simple model using expressive
constraints.
CODL uses constraints to generate better training
samples in unsupervised learning.

[Slide diagram: the CoDL loop. The model makes predictions on unlabeled
data; constraints turn them into a more accurate labeling; the feedback
step learns a better model from these labels.]
8
Outline
  • Constraint Driven Learning (CoDL)
  • Transliteration Discovery
  • Algorithm
  • Experimental Results

9
Transliteration Generation (Not our focus)
  • Given a source word, what is the target
    transliteration?
  • Bush
  • → 布什
  • Sushi
  • → 寿司
  • Issues
  • Ambiguity
  • For the same source word, there are many different
    transliterations
  • Think about Chinese
  • What we want: find the most widely used
    transliteration

10
Transliteration Discovery (Our focus)
  • Problem Settings
  • Given two lists of words, map them!
  • Advantages
  • A relatively easy problem
  • Can find the most widely used transliteration
  • Assumptions
  • Source: English
  • Each source entity has a transliteration in the
    target candidates
  • Target candidates might not be named entities

11
Outline
  • Constraint Driven Learning (CoDL)
  • Transliteration Discovery
  • Algorithm
  • Experimental Results

12
Algorithm Outline
  • Prediction Model
  • How to use existing resources to construct the
    model?
  • Constraints?
  • Learning Algorithm
13
The Prediction Model
  • How do we make predictions?
  • Given a source word, how do we predict the best
    target?
  • Model 1: (Vs, Vt) → Yes or No
  • Issue: not many obvious constraints can be added
  • Not a structured prediction problem
  • Model 2: (Vs, Vt) → hidden variables F → Yes or No
  • Predicting F is a structured prediction problem
  • We can add constraints more easily

14
The Prediction Model
  • Score for a pair
  • A CCM formulation
  • A slightly different scoring function

More on this point in the next few slides
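
A rough sketch of what the scoring function could look like (my reconstruction under the CCM view, not the exact formula from the slides): score a pair by its best constrained hidden alignment F, normalized by the alignment size:

  score(v_s, v_t) = \max_{F \in C(v_s, v_t)} \frac{1}{|F|} \sum_{f \in F} \mathbf{w} \cdot f,
  \qquad
  \hat{v}_t = \arg\max_{v_t} \, score(v_s, v_t)

where C(v_s, v_t) is the set of feature representations satisfying the constraints; the 1/|F| normalization is my guess at the "slightly different scoring function".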
15
Prediction Model: Another View
  • The scoring function looks like weights times
    features!
  • If there is a bad feature, score → −∞
  • Our hidden variables (feature vectors)
  • Character mappings

16
[Slide figure: an example feature representation for a word pair, listing
all ("everything") character mappings, e.g. (a,a), (o,O), (w,_), ...]
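
To make the hidden variables concrete, a minimal Python sketch of enumerating such mappings (the positional window restriction is my illustrative assumption, not necessarily the authors' exact feature space):

  # Enumerate candidate character-mapping features for a word pair.
  # Each feature is (source_char, target_char); "_" marks an unmapped char.
  def candidate_mappings(source, target, window=2):
      feats = set()
      for i, s in enumerate(source):
          feats.add((s, "_"))               # s may map to nothing
          for j, t in enumerate(target):
              if abs(i - j) <= window:      # only roughly co-located chars
                  feats.add((s, t))
      return feats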

17
Algorithm Outline
  • Prediction Model
  • How to use existing resources to construct the
    model?
  • Constraints?
  • Learning Algorithm

18
Resource: Romanization Table
  • Hebrew, Russian
  • How can you type Hebrew or Russian?
  • Use an English keyboard: a character C maps to
  • a similar character C or S in Hebrew or
    Russian
  • Very easy to get
  • Ambiguous
  • Special case: Chinese (Pin-Yin)
  • 寿司 → shòu sī (low ambiguity)
  • Map Pin-Yin to English (sushi)
  • Romanization Table: ā → a

19
Initialize the Table
  • Every character pair in the Romanization Table:
  • Weight 0
  • Everything else: −1
  • There could be better ways to do the initialization
  • Note: without constraints, all pairs (v_s, v_t) get
    score zero
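
A minimal Python sketch of this initialization (the toy table fragment is made up for illustration, not the real resource):

  # Initialize character-mapping weights from a romanization table:
  # licensed pairs start at 0, everything else at -1.
  def init_weights(source_alphabet, target_alphabet, table):
      return {(s, t): 0.0 if (s, t) in table else -1.0
              for s in source_alphabet
              for t in target_alphabet}

  toy_table = {("b", "b"), ("u", "u"), ("s", "s"), ("h", "h")}  # made up
  weights = init_weights("bush", "bush", toy_table)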

20
Algorithm Outline
  • Prediction Model
  • How to use existing resources to construct the
    model?
  • Constraints?
  • Learning Algorithm

21
Constraints
  • General Constraints
  • Coverage: all characters need to be mapped at
    least once
  • No crossing: character mappings cannot cross
    each other (see the sketch after this list)
  • Language-Specific Constraints
  • General Restricted Mapping
  • Initial Restricted Mapping
  • Length Restriction
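
A minimal Python sketch of checking the two general constraints, where an alignment is a list of (i, j) index pairs (my own encoding, for illustration only):

  # Check coverage and no-crossing on a candidate alignment.
  # An alignment is a list of (i, j): source index i maps to target index j.
  def satisfies_general_constraints(alignment, src_len, tgt_len):
      # Coverage: every character is mapped at least once.
      if {i for i, _ in alignment} != set(range(src_len)):
          return False
      if {j for _, j in alignment} != set(range(tgt_len)):
          return False
      # No crossing: sorted by source index, target indices never decrease.
      targets = [j for _, j in sorted(alignment)]
      return all(a <= b for a, b in zip(targets, targets[1:]))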

22
Constraints
Many other works use similar information as well!
  • Pin-Yin to English

23
Algorithm Outline
  • Prediction Model
  • How to use existing resources to construct the
    model?
  • Constraints?
  • Learning Algorithm

24
High-Level Overview
  • Model ← Resource
  • While not converged:
  • Use Model + Constraints to get labels (for both
    F and y)
  • Update Model with the newly labeled F and y (without
    constraints) (details in the next slide)
  • Similar to ACL 07
  • Update the model without constraints
  • Difference from ACL 07
  • We get feedback from the labels of both the hidden
    variables and the output (a Python sketch follows)
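
A rough Python sketch of this loop (infer and learn are placeholders for the constrained inference and the model update; this is my paraphrase, not the authors' code):

  # Unsupervised CoDL: no labeled data; the model starts from the resource.
  def unsupervised_codl(initial_weights, infer, learn, unlabeled, n_iters=10):
      weights = initial_weights              # Model <- Resource
      for _ in range(n_iters):
          # Model + constraints label each pair with hidden alignment F and y.
          labeled = [(x, *infer(x, weights)) for x in unlabeled]
          weights = learn(labeled)           # update without constraints
      return weights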

25
Training
[Slide equations: predict the hidden variables and the labels under
constraints, then run the update algorithm on the newly labeled data.]
26
Outline
  • Constraint Driven Learning (CoDL)
  • Transliteration Discovery
  • Algorithm
  • Experimental Results

27
Experimental Setting
  • Evaluation
  • ACC: the top candidate is (one of) the right answers
  • Learning Algorithm
  • Linear SVM with C = 0.5
  • Dataset (source / target words)
  • English-Hebrew: 300 / 300
  • English-Chinese: 581 / 681
  • English-Russian: 727 / 50,648 (target includes all
    words)

28
Results - Hebrew
29
Results - Russian
30
Analysis
  • A small Russian subset was used here
1) Without constraints (on features), the
Romanization Table is useless!
2) General constraints are more important!
3) Learning has a great impact here! But
constraints are very important, too!
4) Better constraints lead to better final results
31
Related Work (needs more work here)
  • Learning the score for edit distance
  • Previous transliteration work
  • Machine translation?

32
Conclusion
  • ML: an unsupervised constraint-driven algorithm
  • Use hidden variables to find more constraints
    (e.g., co-ref)
  • Use constraints to find a cleaner feature
    representation
  • Transliteration
  • Use of the Romanization Table as the starting
    point
  • We can get good results without training data
  • The right constraints (modeling) are the key
  • Future Work
  • Transliteration Model: better model, quicker
    inference
  • CoDL: other applications for unsupervised CoDL

33
Constraint-Driven Learning (CODL)
Any supervised learning algorithm, parametrized by θ.

  θ ← learn(Tr)
  For N iterations do:
    T ← ∅
    For each x in the unlabeled dataset:
      y ← Inference(x, C, θ)
      T ← T ∪ {(x, y)}
    θ ← γθ + (1 − γ) · learn(T)

Augmenting the training set (feedback); any inference
algorithm (with constraints): Inference(x, C, θ).
Learn from the new training data; γ weights the supervised
and the unsupervised models (Nigam 2000).
34
Unsupervised Constraint - Driven Learning
Construct the model with Resources
?Construct(Resource) For N iterations do T?
For each x in unlabeled dataset y
?Inference(x, ?) TT ? (x, y) ? ?
?(1-? )learn(T)
Augmenting the training set (feedback). Any
inference algorithm (with constraints).
Inference(x,C, ?)
Learn from new training data. ? 0 in this work