Title: Unsupervised Constraint Driven Learning for Transliteration Discovery
1 Unsupervised Constraint Driven Learning for Transliteration Discovery
- M. Chang, D. Goldwasser, D. Roth, and Y. Tu
2 What I am going to do today
- Goal 1: present the transliteration work
- Get feedback!
- Goal 2: think about this work in terms of CCM
- Tutorial?
- I will try to present this work in a slightly different way
- Some of these are my personal comments
- Different from our discussion yesterday
- Please give us comments on this
- Make this work more general (not only transliteration)
3 Wait a sec! What is CCM?
- I have gotten this question 100 times already!
- Informal answer
- Everything that uses constraints is CCM!
- Formal answer
- No constraints
- CCM
- We do not define the training method
- Definition: CCM makes predictions with constraints!
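As a hedged illustration of "prediction with constraints", the sketch below uses the penalty form common in the CCM literature, y* = argmax_y w·φ(x, y) - Σ_k ρ_k d_k(y), where d_k measures how much y violates constraint k. The function `ccm_predict`, the exhaustive search, and the toy scores and constraint are illustrative assumptions, not the authors' code.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Label = Tuple[int, ...]
# A constraint is (rho_k, d_k): a penalty weight and a violation measure d_k(y) >= 0.
Constraint = Tuple[float, Callable[[Label], float]]


def ccm_predict(
    n: int,
    labels: Sequence[int],
    score: Callable[[Label], float],
    constraints: List[Constraint],
) -> Optional[Label]:
    """Return argmax_y score(y) - sum_k rho_k * d_k(y) by exhaustive search.

    score(y) stands in for the model score w . phi(x, y). Exhaustive search is
    only feasible for tiny output spaces (len(labels) ** n candidates).
    Ties are broken in favor of the first candidate enumerated.
    """
    best: Optional[Label] = None
    best_val = float("-inf")
    for y in product(labels, repeat=n):
        val = score(y) - sum(rho * d(y) for rho, d in constraints)
        if val > best_val:
            best, best_val = y, val
    return best


# Toy example: local[i][label] is the score of giving position i that label.
local = [[0.0, 1.0], [0.0, 0.8], [0.5, 0.0]]


def toy_score(y: Label) -> float:
    return sum(local[i][v] for i, v in enumerate(y))


# Hypothetical constraint: at most one position may take label 1.
at_most_one = (10.0, lambda y: max(0, sum(y) - 1))

print(ccm_predict(3, [0, 1], toy_score, []))             # (1, 1, 0)
print(ccm_predict(3, [0, 1], toy_score, [at_most_one]))  # (1, 0, 0)
```

Only the prediction step sees the constraints; how `score` is trained is left open, matching the point that CCM does not define the training method.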
4 Constraint Driven Learning
- Why constraints?
- The goal: building a good system easily
- We have prior knowledge at hand
- Why not inject that knowledge directly?
- How useful are constraints?
- Useful for supervised learning (Yih and Roth 04, many others)
- Useful for semi-supervised learning (Chang et al., ACL 2007)
- Sometimes more efficient than labeling data directly
5 Unsupervised Constraint Driven Learning
- In this work
- We do not use any labeled instances
- We achieve good performance, competitive with several supervised models
- Compared to Chang et al., ACL 2007
- In ACL 07, they used a small amount of labeled data (5-20 examples)
- Reason: bad models cannot benefit from constraints!
- For some applications, we have very good resources
- We do not need labeled instances at all!
6 In a nutshell
- Traditional semi-supervised learning
- The model can drift from the correct one
Figure: unsupervised learning loop (Model, Resource?, Unlabeled Data; Prediction: label unlabeled data; Feedback: learn from labeled data)
7 In a nutshell
CoDL improves a simple model using expressive constraints.
CoDL uses constraints to generate better training samples in unsupervised learning.
Figure: CoDL loop (Model → Better Model; Prediction + Constraints: more accurate labeling of Unlabeled Data; Feedback)
8 Outline
- Constraint Driven Learning (CoDL)
- Transliteration Discovery
- Algorithm
- Experimental Results
9 Transliteration Generation (not our focus)
- Given a source word, what is the target transliteration?
- Bush → ??
- Sushi → 寿司
- Issues
- Ambiguity: the same source word can have many different transliterations (think about Chinese)
- What we want: find the most widely used transliteration
10 Transliteration Discovery (our focus)
- Problem setting
- Given two lists of words, map them to each other!
- Advantages
- A relatively easy problem
- Can find the most widely used transliteration
- Assumptions
- Source language: English
- Each source entity has a transliteration among the target candidates
- Target candidates might not be named entities
11 Outline
- Constraint Driven Learning (CoDL)
- Transliteration Discovery
- Algorithm
- Experimental Results
12 Algorithm Outline
- Prediction Model
- How to use existing resources to construct the model?
- Constraints?
- Learning Algorithm
13 The Prediction Model
- How do we make predictions?
- Given a source word, how do we predict the best target?
- Model 1: (v_s, v_t) → Yes or No
- Issue: not many obvious constraints can be added
- Not a structured prediction problem
- Model 2: (v_s, v_t) → hidden variables → Yes or No
- Predicting F (the hidden variables) is a structured prediction problem
- We can add constraints more easily
14 The Prediction Model
- Score for a pair
- A CCM formulation
- A slightly different scoring function (more on this in the next few slides)
15 Prediction Model: Another View
- The scoring function looks like weights times features!
- If there is a bad feature, score → -∞
- Our hidden variables (feature vectors)
- Character mapping
16
- Everything
- (a,a), (o,O), (w,_), ...
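A minimal sketch of scoring one (v_s, v_t) pair as "weights times features", maximized over the hidden character mapping F. It assumes F is monotone (no crossing) and one-to-one, with a character allowed to map to the empty symbol `_` as in the pair (w,_); under those assumptions the maximization is an edit-distance-style dynamic program. The DP, the name `pair_score`, and the toy weights are our own illustration: the slides do not say how inference over F is carried out, and the "slightly different scoring function" is not modeled here.

```python
import math
from typing import Dict, Tuple

EMPTY = "_"  # "mapped to nothing", as in the pair (w, _)
Weights = Dict[Tuple[str, str], float]


def pair_score(src: str, tgt: str, w: Weights, default: float = -math.inf) -> float:
    """Max over monotone one-to-one mappings F of the sum of w[pair] for pair in F.

    Each character of src and tgt is used exactly once, either paired with a
    character on the other side or with EMPTY, so coverage and no-crossing hold
    by construction. Pairs missing from w get `default`; -inf acts as a
    forbidden ("bad") feature that drives the whole score to -inf.
    """
    n, m = len(src), len(tgt)
    dp = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            cur = dp[i][j]
            if cur == -math.inf:
                continue
            if i < n and j < m:  # map src[i] to tgt[j]
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], cur + w.get((src[i], tgt[j]), default))
            if i < n:  # map src[i] to nothing
                dp[i + 1][j] = max(dp[i + 1][j], cur + w.get((src[i], EMPTY), default))
            if j < m:  # nothing maps to tgt[j]
                dp[i][j + 1] = max(dp[i][j + 1], cur + w.get((EMPTY, tgt[j]), default))
    return dp[n][m]


toy_w = {("a", "a"): 0.0, ("b", "b"): 0.0}  # hypothetical weights
print(pair_score("abc", "ab", toy_w, default=-1.0))  # -1.0 (c maps to _)
print(pair_score("ab", "ba", toy_w))                 # -inf (every option is forbidden)
```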
17 Algorithm Outline
- Prediction Model
- How to use existing resources to construct the model?
- Constraints?
- Learning Algorithm
18 Resource: Romanization Table
- Hebrew, Russian
- How can you type Hebrew or Russian?
- Use an English keyboard: C maps to a similar character (C or S) in Hebrew or Russian
- Very easy to get
- Ambiguous
- Special case: Chinese (Pinyin)
- 寿司 → shòu si (low ambiguity)
- Map Pinyin to English (sushi)
- Romanization table? a ?a
19 Initialize the Table
- Every character pair in the romanization table: weight 0
- Everything else: weight -1
- There could be better ways to do initialization
- Note: without constraints, every (v_s, v_t) pair will get a score of zero
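The initialization rule can be written down directly. In this sketch the romanization pairs are a hypothetical toy table (Latin to Cyrillic), and `init_weights`, `weight`, and `unconstrained_score` are illustrative names. The last function shows one reading of the note: if nothing forces characters to be covered, the best mapping keeps only positive-weight pairs, and since every initial weight is 0 or -1, every pair scores 0.

```python
from typing import Dict, Iterable, Tuple

Weights = Dict[Tuple[str, str], float]


def init_weights(romanization_table: Iterable[Tuple[str, str]]) -> Weights:
    """Weight 0 for every character pair in the romanization table."""
    return {pair: 0.0 for pair in romanization_table}


def weight(w: Weights, pair: Tuple[str, str]) -> float:
    """Everything not in the table gets weight -1."""
    return w.get(pair, -1.0)


def unconstrained_score(src: str, tgt: str, w: Weights) -> float:
    """Best score when F may be ANY subset of character pairs (no coverage).

    The optimum keeps exactly the positive-weight pairs, so with initial
    weights in {0, -1} the result is always 0.
    """
    return sum(max(0.0, weight(w, (s, t))) for s in set(src) for t in set(tgt))


# Hypothetical toy romanization table (Latin -> Cyrillic).
table = [("b", "б"), ("u", "у"), ("s", "ш"), ("h", "ш")]
w0 = init_weights(table)
print(weight(w0, ("b", "б")), weight(w0, ("b", "к")))  # 0.0 -1.0
print(unconstrained_score("bush", "буш", w0))           # 0.0
print(unconstrained_score("bush", "кот", w0))           # 0.0: no discrimination at all
```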
20 Algorithm Outline
- Prediction Model
- How to use existing resources to construct the model?
- Constraints?
- Learning Algorithm
21 Constraints
- General constraints
- Coverage: every character needs to be mapped at least once
- No crossing: character mappings cannot cross each other
- Language-specific constraints
- General restricted mapping
- Initial restricted mapping
- Length restriction
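The two general constraints can be checked on a candidate mapping F. In this hedged sketch F is a list of (source position, target position) links, with None standing for the empty symbol; the representation and the names `coverage` and `no_crossing` are assumptions for illustration. Language-specific constraints would be further predicates of the same shape.

```python
from typing import Iterable, Optional, Tuple

Link = Tuple[Optional[int], Optional[int]]  # (source index, target index); None = empty


def coverage(F: Iterable[Link], n_src: int, n_tgt: int) -> bool:
    """Every source and target character is mapped at least once."""
    links = list(F)
    src_used = {i for i, _ in links if i is not None}
    tgt_used = {j for _, j in links if j is not None}
    return src_used == set(range(n_src)) and tgt_used == set(range(n_tgt))


def no_crossing(F: Iterable[Link]) -> bool:
    """No two links (i1, j1), (i2, j2) with i1 < i2 but j1 > j2 (or vice versa)."""
    links = [(i, j) for i, j in F if i is not None and j is not None]
    for a, (i1, j1) in enumerate(links):
        for i2, j2 in links[a + 1:]:
            if (i1 - i2) * (j1 - j2) < 0:  # opposite orders: the links cross
                return False
    return True


F_ok = [(0, 0), (1, 1), (2, 1)]  # two source characters may share one target character
print(coverage(F_ok, 3, 2), no_crossing(F_ok))  # True True
print(no_crossing([(0, 1), (1, 0)]))            # False
```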
22 Constraints
Many other works use similar information as well!
23 Algorithm Outline
- Prediction Model
- How to use existing resources to construct the model?
- Constraints?
- Learning Algorithm
24 High-Level Overview
- Model ← Resource
- While not converged:
- Use model + constraints to get labels (for both F and y)
- Update the model with the newly labeled F and y, without constraints (details in the next slide)
- Similar to ACL 07
- Update the model without constraints
- Difference from ACL 07
- We get feedback from the labels of both the hidden variables and the output
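The loop on this slide can be sketched generically. This is a hedged sketch, not the authors' implementation: `unsupervised_codl`, the convergence test (stop when the learned model no longer changes), the `max_iters` cap, and the 1-D toy problem are all our assumptions; the toy merely shows constraints supplying labels that a bad initial model cannot.

```python
from typing import Any, Callable, List, Sequence, Tuple


def unsupervised_codl(
    unlabeled: Sequence[Any],
    model: Any,
    constraints: Any,
    inference: Callable[[Any, Any, Any], Any],
    learn: Callable[[List[Tuple[Any, Any]]], Any],
    max_iters: int = 20,
) -> Any:
    """Label with model + constraints, retrain without constraints, repeat.

    inference(x, constraints, model) returns the label of x. For
    transliteration it would return the pair (F, y), so the update sees both
    the hidden mapping and the output. learn() never sees the constraints.
    """
    for _ in range(max_iters):
        labeled = [(x, inference(x, constraints, model)) for x in unlabeled]
        new_model = learn(labeled)
        if new_model == model:  # assumed convergence test: the model stopped changing
            break
        model = new_model
    return model


# Toy 1-D problem: the "model" is a threshold t, and the label is 1 iff x > t.
# The hypothetical constraint is a set of points that must be labeled 1.
def toy_inference(x: float, must_be_positive: set, t: float) -> int:
    return 1 if x in must_be_positive or x > t else 0


def toy_learn(labeled: List[Tuple[float, int]]) -> float:
    # New threshold: midpoint between the class means (assumes both classes occur).
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


data = [1.0, 2.0, 8.0, 9.0]
# A starting threshold of 10.0 labels everything 0; the constraint supplies a positive.
print(unsupervised_codl(data, 10.0, {9.0}, toy_inference, toy_learn))  # 5.0
```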
25 Training
Predict hidden variables and the labels
Update Algorithm
26 Outline
- Constraint Driven Learning (CoDL)
- Transliteration Discovery
- Algorithm
- Experimental Results
27 Experimental Setting
- Evaluation
- ACC: the top candidate is (one of) the right answer(s)
- Learning algorithm
- Linear SVM with C = 0.5
- Dataset (source words × target candidates)
- English-Hebrew: 300 × 300
- English-Chinese: 581 × 681
- English-Russian: 727 × 50,648 (the targets include all words)
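A hedged sketch of the ACC metric as described: pick the top-scoring target candidate for each source word, then count how often it is one of the right answers. `top_candidate`, `acc`, the length-based stand-in scorer, and the toy words are hypothetical; a real run would use the learned pair score.

```python
from typing import Callable, Dict, Iterable, Set


def top_candidate(src: str, candidates: Iterable[str], score: Callable[[str, str], float]) -> str:
    """Highest-scoring target candidate for one source word (first one wins ties)."""
    return max(candidates, key=lambda t: score(src, t))


def acc(predictions: Dict[str, str], gold: Dict[str, Set[str]]) -> float:
    """Fraction of source words whose top candidate is one of the right answers."""
    hits = sum(1 for s, answers in gold.items() if predictions.get(s) in answers)
    return hits / len(gold)


def length_score(s: str, t: str) -> float:
    # Hypothetical stand-in scorer: prefer candidates of similar length.
    return -abs(len(s) - len(t))


cands = ["BUSH", "SUSI", "TOKYO"]
preds = {s: top_candidate(s, cands, length_score) for s in ["bush", "sushi"]}
gold = {"bush": {"BUSH"}, "sushi": {"SUSHI", "SUSI"}}
print(preds, acc(preds, gold))  # {'bush': 'BUSH', 'sushi': 'TOKYO'} 0.5
```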
28 Results: Hebrew
29 Results: Russian
30 Analysis
1) Without constraints (on features), the romanization table is useless!
2) General constraints are more important!
3) Learning has a great impact here! But constraints are very important, too!
4) Better constraints lead to better final results
- A small Russian subset was used here
31 Related Work (needs more work)
- Learning the score for edit distance
- Previous transliteration work
- Machine translation?
32 Conclusion
- ML: an unsupervised constraint driven algorithm
- Use hidden variables to find more constraints (e.g., co-reference)
- Use constraints to find a cleaner feature representation
- Transliteration
- Use of the romanization table as the starting point
- We can get good results without training data
- The right constraints (modeling) are the key
- Future work
- Transliteration model: better model, quicker inference
- CoDL: other applications for unsupervised CoDL
33 Constraint-Driven Learning (CoDL)
λ_0 ← learn(Tr)    (any supervised learning algorithm parametrized by λ)
λ ← λ_0
For N iterations do
    T ← ∅
    For each x in the unlabeled dataset
        y ← Inference(x, C, λ)    (any inference algorithm, with constraints C)
        T ← T ∪ {(x, y)}    (augmenting the training set: feedback)
    λ ← γλ_0 + (1 - γ) learn(T)    (learn from the new training data; weight the supervised and unsupervised models, as in Nigam 2000)
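The last line of the loop mixes the initial model with the newly learned one. Assuming, for illustration only, that models are sparse weight vectors stored as dicts, the update λ ← γλ_0 + (1 - γ) learn(T) is a per-feature interpolation; `interpolate` is a hypothetical name. With γ = 0 the result is just learn(T).

```python
from typing import Dict

Weights = Dict[str, float]


def interpolate(lam0: Weights, lam_new: Weights, gamma: float) -> Weights:
    """Return gamma * lam0 + (1 - gamma) * lam_new, treating missing features as 0."""
    keys = set(lam0) | set(lam_new)
    return {k: gamma * lam0.get(k, 0.0) + (1 - gamma) * lam_new.get(k, 0.0) for k in keys}


lam0 = {"a": 1.0}              # e.g. the initial model
lam_new = {"a": 3.0, "b": 2.0}  # e.g. learn(T)
print(sorted(interpolate(lam0, lam_new, 0.5).items()))  # [('a', 2.0), ('b', 1.0)]
print(sorted(interpolate(lam0, lam_new, 0.0).items()))  # [('a', 3.0), ('b', 2.0)]
```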
34 Unsupervised Constraint-Driven Learning
λ_0 ← Construct(Resource)    (construct the model with resources)
λ ← λ_0
For N iterations do
    T ← ∅
    For each x in the unlabeled dataset
        y ← Inference(x, C, λ)    (any inference algorithm, with constraints C)
        T ← T ∪ {(x, y)}    (augmenting the training set: feedback)
    λ ← γλ_0 + (1 - γ) learn(T)    (learn from the new training data; γ = 0 in this work)