1. Homotopy-based Semi-Supervised Hidden Markov Models for Sequence Labeling
- Gholamreza Haffari, Anoop Sarkar
- Presenter: Milan Tofiloski
- Natural Language Lab
- Simon Fraser University
2. Outline
- Motivation and Contributions
- Experiments
- Homotopy method
- More experiments
3. Maximum Likelihood Principle
- Find the parameter setting of the joint input-output model that maximizes the probability of the given data (see the reconstruction below)
- L: labeled data
- U: unlabeled data
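The slide's formula did not survive this transcript; a standard statement of the semi-supervised maximum-likelihood objective it describes, with L the labeled set and U the unlabeled set, is:

\[
\theta^{*} = \arg\max_{\theta} \; \sum_{(x,y)\in L} \log P(x,y \mid \theta) \;+\; \sum_{x\in U} \log \sum_{y} P(x,y \mid \theta)
\]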
4. Deficiency of MLE
- When U is much larger than L, MLE focuses on modeling the input distribution P(x)
- This means the input-output relationship is ignored when estimating the parameters!
- But we are interested in modeling the joint distribution P(x,y)
5. Remedy for the Deficiency
- Balance the effect of the labeled and unlabeled data with a weight λ
- Find the λ that maximally takes advantage of the labeled and unlabeled data; MLE corresponds to one fixed weighting (see the reconstruction below)
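The weighted objective is likewise missing from the transcript; a common form, with λ trading off the two terms (plain MLE then corresponds to one fixed weighting determined by the relative sizes of L and U), is:

\[
\theta_{\lambda} = \arg\max_{\theta} \; (1-\lambda) \sum_{(x,y)\in L} \log P(x,y \mid \theta) \;+\; \lambda \sum_{x\in U} \log \sum_{y} P(x,y \mid \theta)
\]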
6. An Experiment with HMM
[Plot: MLE performance; lower is better]
- MLE can hurt performance
- Balancing the labeled- and unlabeled-data terms is beneficial
7. Our Contributions
- A principled way to choose λ for HMMs in sequence labeling (tagging) tasks
- An efficient dynamic programming algorithm to compute second-order statistics in HMMs
8. Outline
- Motivation and Contributions
- Experiments
- Homotopy method
- More experiments
9. Task
- Field segmentation in information extraction
- 13 field tags: AUTHOR, TITLE, ...
10. Experimental Setup
- Use an HMM with 13 states
- Freeze the transition (state→state) probabilities to those observed in the labeled data
- Use the homotopy method to learn only the emission (state→alphabet) probabilities (see the sketch after this list)
- Apply add-ε smoothing to the initial values of the emission and transition probabilities
- Data statistics:
  - Average sequence length: 36.7
  - Average number of segments per sequence: 5.4
  - Size of labeled/unlabeled data: 300/700
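A minimal sketch of this initialization, assuming simple count matrices taken from the labeled data; the function name, array names, and the exact smoothing constant are illustrative, not from the paper:

```python
import numpy as np

def init_params(trans_counts, emit_counts, eps=0.1):
    # Add-eps smoothing of counts observed in the labeled data.
    # trans_counts: K x K state->state counts (frozen after this step);
    # emit_counts:  K x V state->alphabet counts (re-estimated by homotopy).
    A = trans_counts + eps
    A = A / A.sum(axis=1, keepdims=True)
    B = emit_counts + eps
    B = B / B.sum(axis=1, keepdims=True)
    return A, B
```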
11. Baselines
- Held-out: set aside part of the labeled data as a held-out set, and use it to choose λ
- Oracle: choose λ based on the test data, using per-position accuracy
- Supervised: ignore the unlabeled data and use only the labeled data
12. Homotopy vs. Baselines
[Plot: performance of homotopy vs. baselines; higher is better]
- Even very small values of λ can be useful: homotopy picks λ ≈ 0.004, while supervised corresponds to λ = 0
- Decoding: sequence of most probable states (SMS; see slide 30)
- See the paper for more results
13. Outline
- Motivation and Contributions
- Experiments
- Homotopy method
- More experiments
14. Path of Solutions
- Look at the solution θ_λ as λ changes from 0 to 1
- Choose the best λ based on the path
15. EM_λ for HMM
- Let each event be a state→state or state→observation event in our HMM
- EM_λ finds the parameter values θ that (locally) maximize the objective function for a fixed λ
- Repeat the EM_λ update until convergence (a sketch follows)
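A minimal sketch of one EM_λ iteration under the setup of slide 10 (transitions frozen, emissions re-estimated), mixing observed labeled counts weighted by 1−λ with expected unlabeled counts weighted by λ. The brute-force E-step over all state sequences is for illustration only; in practice the expected counts come from the standard forward-backward recursions:

```python
import itertools
import numpy as np

def seq_prob(x, s, pi, A, B):
    # Joint probability P(x, s) of observations x and state sequence s.
    p = pi[s[0]] * B[s[0], x[0]]
    for t in range(1, len(x)):
        p *= A[s[t-1], s[t]] * B[s[t], x[t]]
    return p

def em_lambda_step(B, pi, A, labeled, unlabeled, lam):
    # One EM_lambda update of the emission matrix B (K states x V symbols).
    K, V = B.shape
    counts = np.zeros((K, V))
    # Observed emission counts from labeled pairs (x, y), weight 1 - lam.
    for x, y in labeled:
        for xi, yi in zip(x, y):
            counts[yi, xi] += 1 - lam
    # Expected emission counts from unlabeled sequences x, weight lam.
    for x in unlabeled:
        paths = list(itertools.product(range(K), repeat=len(x)))
        probs = np.array([seq_prob(x, s, pi, A, B) for s in paths])
        post = probs / probs.sum()            # posterior P(s | x)
        for s, w in zip(paths, post):
            for t, xi in enumerate(x):
                counts[s[t], xi] += lam * w
    return counts / counts.sum(axis=1, keepdims=True)
```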
16. Fixed Points of EM_λ
- Useful fact: at the fixed points, θ = EM_λ(θ) (reconstructed below)
- This makes tracing the fixed points similar to using homotopy for root finding
- The same numerical techniques should be applicable here
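The fixed-point equation is missing from the transcript; the natural reconstruction, which casts the problem as root finding, is:

\[
\theta = \mathrm{EM}_{\lambda}(\theta)
\quad\Longleftrightarrow\quad
G(\theta, \lambda) := \mathrm{EM}_{\lambda}(\theta) - \theta = 0
\]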
17. Homotopy for Root Finding
- To find a root of G(θ):
  - start from a root of a simple problem F(θ)
  - trace the roots of the intermediate problems while morphing F into G
- To find the θ that satisfy the above, setting the derivative to zero gives a differential equation (reconstructed below)
- Numerically solve the resulting differential equation
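The slide's equations are lost; the standard convex homotopy between F and G, and the differential equation obtained by requiring H = 0 along the path, are (plausibly with F and G the fixed-point residuals of EM at λ = 0 and λ = 1):

\[
H(\theta, \lambda) = (1-\lambda)\,F(\theta) + \lambda\,G(\theta) = 0
\]

\[
\frac{d}{d\lambda} H(\theta(\lambda), \lambda)
= \frac{\partial H}{\partial \theta}\,\frac{d\theta}{d\lambda}
+ \frac{\partial H}{\partial \lambda} = 0
\]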
18. Solving the Differential Equation
(The matrix M involves the Jacobian of EM_1; see the next slide.)
- Repeat until λ reaches 1:
  - Update (θ, λ) in a proper direction parallel to v ∈ Kernel(M)
  - Update M
- (A path-tracing sketch follows.)
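A minimal sketch of this kernel-following loop, assuming a helper M_fn that returns the matrix M at the current point z = (θ, λ) and a fixed step size (both hypothetical; the paper's exact update rule may differ):

```python
import numpy as np

def trace_path(z0, M_fn, step=1e-3):
    # Follow the solution path z = (theta, lambda) by stepping along
    # the kernel (null space) of M(z) until lambda reaches 1.
    z, prev_v = np.array(z0, dtype=float), None
    while z[-1] < 1.0:                    # last coordinate of z is lambda
        M = M_fn(z)
        _, _, Vt = np.linalg.svd(M)
        v = Vt[-1]                        # right singular vector of the smallest
                                          # singular value spans Kernel(M)
        if prev_v is None:
            if v[-1] < 0:                 # start by increasing lambda
                v = -v
        elif v @ prev_v < 0:              # keep a consistent direction
            v = -v
        z = z + step * v                  # update (theta, lambda) ...
        prev_v = v                        # ... then M is recomputed next round
    return z
```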
19. Jacobian of EM_1
- See the paper for details
- We need to compute the covariance matrix of the events
- The entry in row i and column j of the covariance matrix is given below
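The slide's formula is lost; the standard (i, j) entry of a covariance matrix of event counts c_i, with expectations taken under the model on the unlabeled data, is:

\[
\Sigma_{ij} = E[\,c_i\,c_j\,] - E[\,c_i\,]\,E[\,c_j\,]
\]

The E[c_i c_j] terms are exactly the expected quadratic counts computed on the next slide.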
20. Expected Quadratic Counts for HMM
- A dynamic programming algorithm efficiently computes the expected quadratic counts (EQC); see the sketch after this list
- Pre-compute a table Z_x for each sequence
- Given the table Z_x, the EQC can be computed efficiently
- The time complexity depends on K, the number of states in the HMM (see the paper for the exact bound and more details)
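The paper's table-based dynamic program is not reproduced in this transcript; as a reference for what is being computed, the brute-force sketch below evaluates E[c c^T] for the transition events of a tiny HMM. It is exponential in the sequence length, so it serves only as a toy check of the quantity:

```python
import itertools
import numpy as np

def expected_quadratic_counts(x, pi, A, B):
    # E[c c^T], where c holds the count of each state->state event in a
    # state sequence drawn from the posterior P(s | x); emission events
    # are handled analogously.
    K = A.shape[0]

    def seq_prob(s):                      # joint P(x, s)
        p = pi[s[0]] * B[s[0], x[0]]
        for t in range(1, len(x)):
            p *= A[s[t-1], s[t]] * B[s[t], x[t]]
        return p

    paths = list(itertools.product(range(K), repeat=len(x)))
    probs = np.array([seq_prob(s) for s in paths])
    post = probs / probs.sum()            # posterior P(s | x)
    eqc = np.zeros((K * K, K * K))
    for s, w in zip(paths, post):
        c = np.zeros(K * K)
        for t in range(1, len(x)):
            c[s[t-1] * K + s[t]] += 1     # count of transition event (i, j)
        eqc += w * np.outer(c, c)
    return eqc   # covariance = eqc - outer(E[c], E[c])
```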
21. How to Choose λ Based on the Path
- monotone: the first point at which the monotonicity of θ_λ changes
- maxEnt: choose the λ for which the model has maximum entropy on the unlabeled data
- minEig: when solving the differential equation, track the minimum singular value of the matrix M; across rounds, choose the λ for which this minimum singular value is smallest
- (A selection sketch follows.)
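A minimal sketch of the three selection rules over a traced path, assuming the per-round entropies and minimum singular values have already been recorded (the argument names are illustrative):

```python
import numpy as np

def pick_lambda(lams, thetas, entropies, min_svals, rule):
    # lams: lambda values along the path; thetas: parameter vectors;
    # entropies: model entropy on the unlabeled data per round;
    # min_svals: minimum singular value of M per round.
    if rule == "maxEnt":
        return lams[int(np.argmax(entropies))]
    if rule == "minEig":
        return lams[int(np.argmin(min_svals))]
    if rule == "monotone":
        d = np.diff(np.asarray(thetas), axis=0)   # per-round parameter deltas
        for i in range(1, len(d)):                # first change of direction
            if np.any(np.sign(d[i]) != np.sign(d[i - 1])):
                return lams[i]
        return lams[-1]
    raise ValueError(rule)
```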
22. Outline
- Motivation and Contributions
- Experiments
- Homotopy method
- More experiments
23. Varying the Size of Unlabeled Data
- The three homotopy-based methods outperform EM
- maxEnt outperforms minEig and monotone
- minEig and monotone perform similarly
- Size of the labeled data: 100
24. Picked λ Values
25. Picked λ Values
- EM gives higher weight to the unlabeled data than the homotopy-based methods
- The λ values selected by maxEnt are much smaller than those selected by minEig and monotone
- minEig and monotone are close
26. Conclusion and Future Work
- Using EM can hurt performance when L << U
- We proposed a method to alleviate this problem for HMMs in sequence labeling tasks
- To speed up the method:
  - use sampling to approximate the covariance matrix
  - use faster methods for recovering the solution path, e.g., predictor-corrector
27. Questions?
28. Is Oracle Outperformed by Homotopy?
- No!
- The performance measure used to select λ in the oracle method may differ from the one used when comparing homotopy and oracle
- The decoding algorithm used in the oracle method may differ from the one used when comparing homotopy and oracle
29. Why Not Just Fix λ to a Small Value?
- This ad hoc way of setting λ has two drawbacks:
  - It may still hurt performance: the proper λ may be much smaller than that value.
  - In some situations, the right choice of λ may be large; a small fixed λ is very conservative and does not fully take advantage of the available unlabeled data.
30. Homotopy vs. Baselines
- Viterbi decoding: the single most probable sequence of states
- SMS decoding: the sequence of most probable states, chosen position by position (a sketch of both follows)
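A minimal sketch contrasting the two decoders, using the standard Viterbi and forward-backward recursions (unscaled, so suitable only for short sequences):

```python
import numpy as np

def viterbi(x, pi, A, B):
    # Most probable state sequence argmax_s P(s | x), in log space.
    T, K = len(x), A.shape[0]
    delta = np.log(pi) + np.log(B[:, x[0]])
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)   # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, x[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def sms(x, pi, A, B):
    # Sequence of individually most probable states, from the posteriors
    # P(s_t | x) given by the forward-backward recursions.
    T, K = len(x), A.shape[0]
    fwd, bwd = np.zeros((T, K)), np.zeros((T, K))
    fwd[0] = pi * B[:, x[0]]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ A) * B[:, x[t]]
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (B[:, x[t + 1]] * bwd[t + 1])
    return list((fwd * bwd).argmax(axis=1))   # argmax per position
```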