1

Homotopy-based Semi-Supervised Hidden Markov
Models for Sequence Labeling
  • Gholamreza Haffari
    Anoop Sarkar
  • Presenter: Milan Tofiloski
  • Natural Language Lab
  • Simon Fraser University

2
Outline
  • Motivation and Contributions
  • Experiments
  • Homotopy method
  • More experiments

3
Maximum Likelihood Principle
  • Choose the parameter setting for the joint probability of input and output that maximizes the probability of the given data (written out below)
  • L: labeled data
  • U: unlabeled data
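Written out (my notation, reconstructed from the bullets above rather than copied from the slides), the MLE objective over both data sets is

  \hat{\theta}_{MLE} = \arg\max_{\theta} \Big[ \sum_{(x,y) \in L} \log P_{\theta}(x, y) + \sum_{x \in U} \log P_{\theta}(x) \Big].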

4
Deficiency of MLE
  • Usually |U| >> |L|, so the unlabeled-data term dominates the objective
  • This means the relationship between input and output is largely ignored when estimating the parameters!
  • MLE then focuses on modeling the input distribution P(x)
  • But we are interested in modeling the joint distribution P(x, y)
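In the same notation (again a reconstruction, not the slides' own formula), when |U| >> |L| the objective above is dominated by its unlabeled part,

  \sum_{(x,y) \in L} \log P_{\theta}(x, y) + \sum_{x \in U} \log P_{\theta}(x) \;\approx\; \sum_{x \in U} \log P_{\theta}(x),

so the fit is driven almost entirely by the marginal P(x) and hardly at all by P(y | x).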

5
Remedy for the Deficiency
  • Balance the effect of labeled and unlabeled data
  • Find the weight λ that maximally takes advantage of both labeled and unlabeled data
  • MLE corresponds to one particular, implicit weighting of the two terms
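A standard way to write such a balanced objective (an assumption based on the usual semi-supervised formulation, not copied from the slides) is

  \max_{\theta} \; (1 - \lambda) \sum_{(x,y) \in L} \log P_{\theta}(x, y) \;+\; \lambda \sum_{x \in U} \log P_{\theta}(x), \qquad \lambda \in [0, 1],

where λ = 0 recovers purely supervised training and larger λ gives the unlabeled data more influence.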

6
An Experiment with an HMM
[Figure: MLE performance; lower is better]
  • MLE can hurt performance
  • Balancing the labeled- and unlabeled-data terms is beneficial

7
Our Contributions
  • A principled way to choose the weight λ for HMMs in sequence labeling (tagging) tasks
  • An efficient dynamic programming algorithm to compute second-order statistics in HMMs

8
Outline
  • Motivation and Contributions
  • Experiments
  • Homotopy method
  • More experiments

9
Task
  • Field segmentation in information extraction
  • 13 field tags: AUTHOR, TITLE, ...

10
Experimental Setup
  • Use an HMM with 13 states
  • Freeze the transition (state → state) probabilities to the values observed in the labeled data
  • Use the homotopy method to learn only the emission (state → symbol) probabilities (a toy sketch of this setup follows the list)
  • Apply additive smoothing to the initial values of the emission and transition probabilities
  • Data statistics:
  • Average sequence length: 36.7
  • Average number of segments per sequence: 5.4
  • Size of labeled/unlabeled data: 300/700
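A minimal sketch of this setup, assuming a recent hmmlearn (0.3+) and tiny synthetic stand-in data (the sizes, variable names, and random parameters are illustrative, not the authors' code or data): the transition matrix is set from labeled estimates and frozen, and EM updates only the emissions. This shows only the frozen-transition arrangement with plain EM on unlabeled data, not the homotopy-weighted objective itself.

  import numpy as np
  from hmmlearn.hmm import CategoricalHMM

  rng = np.random.default_rng(0)
  K, V = 3, 10                                   # toy sizes; the experiments use 13 states

  model = CategoricalHMM(n_components=K, n_features=V,
                         params="e",             # EM re-estimates emission probabilities only
                         init_params="",         # keep the parameters we set below
                         n_iter=20)
  model.startprob_ = np.full(K, 1.0 / K)

  # stand-ins for the smoothed labeled-data estimates
  trans = rng.random((K, K)) + 1.0
  model.transmat_ = trans / trans.sum(axis=1, keepdims=True)    # frozen during fit
  emis = rng.random((K, V)) + 1.0
  model.emissionprob_ = emis / emis.sum(axis=1, keepdims=True)  # initial emissions

  # unlabeled sequences: hmmlearn expects one column of symbols plus per-sequence lengths
  unlab = [rng.integers(V, size=n) for n in (8, 12, 10)]
  X = np.concatenate(unlab).reshape(-1, 1)
  model.fit(X, lengths=[len(s) for s in unlab])                 # EM over the unlabeled data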

11
Baselines
  • Held-out: put aside part of the labeled data as a held-out set and use it to choose λ
  • Oracle: choose λ based on the test data, using per-position accuracy
  • Supervised: ignore the unlabeled data and use only the labeled data

12
Homotopy vs. Baselines
[Figure: homotopy vs. the baselines; higher is better]
Even very small values of λ can be useful: homotopy selects λ ≈ 0.004, while supervised corresponds to λ = 0.
  • Decoding: sequence of most probable states
  • See the paper for more results

13
Outline
  • Motivation and Contributions
  • Experiments
  • Homotopy method
  • More experiments

14
Path of Solutions
  • Look at the solution θ(λ) as λ changes from 0 to 1
  • Choose the best λ based on this path

15
EMλ for HMM
  • Let e be a state → state or state → observation event in our HMM
  • Goal: find the parameter values θ that (locally) maximize the objective function for a fixed λ (a sketch of the update follows below)

Repeat until convergence
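A sketch of the kind of M-step this EMλ describes, assuming the balanced objective written earlier (a standard reconstruction with hypothetical names, not code from the paper): observed event counts from the labeled data are mixed with expected event counts from the unlabeled data and then renormalized.

  import numpy as np

  def em_lambda_m_step(labeled_counts, expected_unlab_counts, lam):
      """Mix observed (labeled) and expected (unlabeled) event counts with
      weight lam, then renormalize each row into a probability distribution."""
      mixed = (1.0 - lam) * labeled_counts + lam * expected_unlab_counts
      return mixed / mixed.sum(axis=1, keepdims=True)

  # toy usage: 2 states, 3 emission symbols
  lab = np.array([[5.0, 1.0, 1.0], [1.0, 2.0, 4.0]])
  exp_unlab = np.array([[20.0, 30.0, 10.0], [5.0, 5.0, 50.0]])
  print(em_lambda_m_step(lab, exp_unlab, lam=0.004))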
16
Fixed Points of EMλ
  • Useful fact: at a fixed point of EMλ, the parameters satisfy θ = EMλ(θ), i.e., θ is a root of θ − EMλ(θ)
  • This is similar to using homotopy for root finding
  • The same numerical techniques should be applicable here

17
Homotopy for Root Finding
  • To find a root of G(θ):
  • start from a root of a simple problem F(θ)
  • trace the roots of the intermediate problems while morphing F into G
  • Find the θ that satisfy the intermediate root conditions as the morphing parameter goes from 0 to 1
  • Differentiating the root condition along the path and setting it to zero gives a differential equation
  • Numerically solve the resulting differential equation (a toy sketch follows)
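To make the recipe concrete, here is a toy sketch on a scalar problem (F and G are chosen only for illustration): follow the roots of H(θ, λ) = (1 - λ)·F(θ) + λ·G(θ) as λ goes from 0 to 1, using a few Newton corrections at each increment instead of integrating the differential equation exactly.

  import numpy as np

  F = lambda t: t - 1.0                   # easy problem: root at t = 1
  G = lambda t: t**3 - 2.0 * t - 5.0      # target problem
  H = lambda t, lam: (1.0 - lam) * F(t) + lam * G(t)

  def dH_dt(t, lam, eps=1e-6):            # numerical derivative w.r.t. theta
      return (H(t + eps, lam) - H(t - eps, lam)) / (2.0 * eps)

  theta = 1.0                             # start at the known root of F
  for lam in np.linspace(0.0, 1.0, 101)[1:]:
      for _ in range(5):                  # Newton corrector at this lambda
          theta -= H(theta, lam) / dH_dt(theta, lam)

  print(theta, G(theta))                  # theta is about 2.0946, G(theta) is about 0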

18
Solving the Differential Equation
(The matrix M involves the Jacobian of EM1; see the next slide)
  • Repeat until λ reaches 1:
  • Update (θ, λ) in a direction parallel to v ∈ Kernel(M) (illustrated below)
  • Update M
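As an illustration of the "direction parallel to v ∈ Kernel(M)" step (M here is a stand-in matrix, not the paper's Jacobian-based one), the kernel direction can be read off the SVD, with its sign chosen so the path keeps moving the same way.

  import numpy as np

  def kernel_direction(M, prev_dir=None):
      """Unit vector v with M @ v approximately 0 (right singular vector of the
      smallest singular value), oriented consistently with the previous step."""
      _, _, Vt = np.linalg.svd(M)
      v = Vt[-1]
      if prev_dir is not None and v @ prev_dir < 0:
          v = -v                          # keep tracing the path in one direction
      return v

  # toy usage: a 2 x 3 matrix generically has a one-dimensional kernel
  M = np.array([[1.0, 2.0, 0.5],
                [0.0, 1.0, 1.0]])
  v = kernel_direction(M)
  print(np.allclose(M @ v, 0.0), v)       # True, plus the unit kernel vector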

19
Jacobian of EM1
See the paper for details
  • So we need to compute the covariance matrix of the events
  • The entry in row i and column j of this covariance matrix is the covariance between the counts of events e_i and e_j under the model

20
Expected Quadratic Counts for HMM
  • A dynamic programming algorithm to efficiently compute the expected quadratic counts (EQC), i.e., expected products of event counts
  • Pre-compute a table Zx for each sequence x
  • Having the table Zx, the EQC can be computed efficiently
  • The time complexity is polynomial in K, the number of states of the HMM (see the paper for more details); a brute-force illustration follows
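For intuition about what the dynamic program computes, here is a brute-force version on a tiny HMM (toy parameters; the paper's algorithm obtains the same quantity without enumerating state sequences): the expected product of the counts of two events given an observation sequence x.

  import itertools
  import numpy as np

  K = 2
  pi = np.array([0.6, 0.4])                              # initial state probabilities
  A = np.array([[0.7, 0.3], [0.4, 0.6]])                 # transition probabilities
  B = np.array([[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]])       # emission probabilities
  x = [0, 2, 1, 2]                                       # an observation sequence

  def joint(y):                                          # P(x, y) under the HMM
      p = pi[y[0]] * B[y[0], x[0]]
      for t in range(1, len(x)):
          p *= A[y[t - 1], y[t]] * B[y[t], x[t]]
      return p

  count_i = lambda y: sum(y[t - 1] == 0 and y[t] == 1 for t in range(1, len(y)))  # event: transition 0 -> 1
  count_j = lambda y: sum(y[t] == 1 and x[t] == 2 for t in range(len(y)))         # event: state 1 emits symbol 2

  seqs = list(itertools.product(range(K), repeat=len(x)))
  Z = sum(joint(y) for y in seqs)                        # P(x)
  eqc = sum(joint(y) * count_i(y) * count_j(y) for y in seqs) / Z
  print(eqc)                                             # E[count_i * count_j | x]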

21
How to Choose λ Based on the Path
  • monotone: the first point at which the monotonicity of the path changes
  • maxEnt: choose the λ for which the model has maximum entropy on the unlabeled data (sketched below)
  • minEig: when solving the differential equation, track the minimum singular value of the matrix M; across rounds, choose the λ for which this minimum singular value is smallest
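One way to instantiate the maxEnt rule (an assumption about the inputs: for each λ on the path we already have the model's per-position state posteriors on the unlabeled data):

  import numpy as np

  def pick_lambda_maxent(lambdas, posteriors_per_lambda):
      """posteriors_per_lambda[k]: array of shape (n_positions, n_states) with the
      state posteriors on the unlabeled data obtained under lambdas[k]."""
      def avg_entropy(P):
          P = np.clip(P, 1e-12, 1.0)
          return float(-(P * np.log(P)).sum(axis=1).mean())
      scores = [avg_entropy(P) for P in posteriors_per_lambda]
      return lambdas[int(np.argmax(scores))]

  # toy usage with three candidate lambdas
  lams = [0.001, 0.01, 0.1]
  posts = [np.array([[0.9, 0.1], [0.8, 0.2]]),
           np.array([[0.6, 0.4], [0.5, 0.5]]),
           np.array([[0.99, 0.01], [0.97, 0.03]])]
  print(pick_lambda_maxent(lams, posts))                 # picks 0.01 here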

22
Outline
  • Motivation and Contributions
  • Experiments
  • Homotopy method
  • More experiments

23
Varying the Size of the Unlabeled Data
  • The three homotopy-based methods outperform EM
  • maxEnt outperforms minEig and monotone
  • minEig and monotone perform similarly
  • Size of the labeled data: 100

24
Picked λ Values
25
Picked λ Values
  • EM gives a higher weight to the unlabeled data than the homotopy-based methods do
  • The λ values selected by maxEnt are much smaller than those selected by minEig and monotone
  • minEig and monotone pick similar values

26
Conclusion and Future Work
  • Using EM can hurt performance when |L| << |U|
  • We proposed a method that alleviates this problem for HMMs on sequence labeling tasks
  • To speed up the method:
  • use sampling to approximate the covariance matrix
  • use faster methods for recovering the solution path, e.g. predictor-corrector

27
Questions?
28
Is Oracle Outperformed by Homotopy?
  • No!
  • The performance measure used to select λ in the oracle method may differ from the one used when comparing homotopy and oracle
  • The decoding algorithm used in the oracle method may differ from the one used when comparing homotopy and oracle

29
Why Not Just Fix λ to a Small Value?
  • This ad hoc way of setting λ has two drawbacks:
  • It may still hurt performance: the proper λ may be much smaller than such a value
  • In some situations the right choice of λ may be large; fixing λ to a small value is very conservative and does not fully take advantage of the available unlabeled data

30
Homotopy vs. Baselines
  • Viterbi decoding: most probable sequence of states
  • SMS decoding: sequence of most probable states (both are contrasted in the sketch below)
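A compact sketch contrasting the two decoders on a toy HMM (parameters chosen only for illustration, not the paper's model): Viterbi maximizes the probability of the whole state sequence, while SMS picks the individually most probable state at each position from the forward-backward posteriors.

  import numpy as np

  pi = np.array([0.6, 0.4])
  A = np.array([[0.7, 0.3], [0.4, 0.6]])                 # transitions
  B = np.array([[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]])       # emissions
  x = [0, 2, 1, 2]
  K, T = len(pi), len(x)

  # Viterbi decoding: most probable sequence of states
  delta = np.log(pi) + np.log(B[:, x[0]])
  back = np.zeros((T, K), dtype=int)
  for t in range(1, T):
      scores = delta[:, None] + np.log(A)                # scores[i, j]: come from i, land in j
      back[t] = scores.argmax(axis=0)
      delta = scores.max(axis=0) + np.log(B[:, x[t]])
  viterbi = [int(delta.argmax())]
  for t in range(T - 1, 0, -1):
      viterbi.append(int(back[t, viterbi[-1]]))
  viterbi.reverse()

  # SMS decoding: sequence of most probable states from forward-backward posteriors
  alpha = np.zeros((T, K))
  beta = np.zeros((T, K))
  alpha[0] = pi * B[:, x[0]]
  for t in range(1, T):
      alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
  beta[T - 1] = 1.0
  for t in range(T - 2, -1, -1):
      beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
  post = alpha * beta
  post /= post.sum(axis=1, keepdims=True)
  sms = post.argmax(axis=1).tolist()

  print("Viterbi:", viterbi, "SMS:", sms)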