1
Graphical Models for Segmenting and Labeling
Sequence Data
NLP-AI Seminar
  • Manoj Kumar Chinnakotla

2
Outline
  • Introduction
  • Directed Graphical Models
  • Hidden Markov Models (HMMs)
  • Maximum Entropy Markov Models (MEMMs)
  • Label Bias Problem
  • Undirected Graphical Models
  • Conditional Random Fields (CRFs)
  • Summary

3
The Task
  • Labeling
  • Given sequence data, assign an appropriate tag (label) to
    each data item
  • Segmentation
  • Given sequence data, partition it into non-overlapping
    groups so that related entities fall in the same group

4
Applications
  • Computational Linguistics
  • POS Tagging
  • Information Extraction
  • Syntactic Disambiguation
  • Computational Biology
  • DNA and Protein Sequence Alignment
  • Searching for sequence homologues
  • Protein Secondary Structure Prediction

5
Example: POS Tagging
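An illustrative tagged sentence (this example is invented here; the
original slide shows its own):
  The/DT  dog/NN  barked/VBD  at/IN  the/DT  postman/NN  ./.
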
6
Directed Graphical Models
  • Hidden Markov Models (HMMs)
  • Assign a joint probability to paired observation
    and label sequences
  • The parameters are trained to maximize the joint
    likelihood of the training examples

7
Hidden Markov Models (HMMs)
  • Generative Model - models the joint distribution
  • Generation Process - a probabilistic finite state machine
  • Set of states - correspond to tags
  • Alphabet - set of words
  • Transition Probability - probability of moving from one
    state (tag) to the next
  • State (Emission) Probability - probability of a word being
    emitted from a state (a toy sketch of this generation
    process follows this slide)

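A toy sketch of the generation process in Python (the tag set,
vocabulary, and probabilities are all invented for illustration):

  import random

  trans = {"<s>": {"DT": 0.7, "NN": 0.3},       # P(tag | previous tag)
           "DT":  {"NN": 0.9, "JJ": 0.1},
           "JJ":  {"NN": 1.0},
           "NN":  {"VBZ": 0.6, "</s>": 0.4},
           "VBZ": {"</s>": 1.0}}
  emit  = {"DT":  {"the": 0.8, "a": 0.2},       # P(word | tag)
           "JJ":  {"small": 1.0},
           "NN":  {"dog": 0.5, "cat": 0.5},
           "VBZ": {"barks": 1.0}}

  def sample(dist):
      return random.choices(list(dist), weights=list(dist.values()))[0]

  def generate():
      tag, pairs = "<s>", []
      while True:
          tag = sample(trans[tag])                # move to the next state (tag)
          if tag == "</s>":
              return pairs
          pairs.append((sample(emit[tag]), tag))  # emit a word from that state

Each call to generate() yields a (word, tag) sequence such as
[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")].
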
8
HMMs (Contd..)
  • For a given word/tag sequence pair, the model assigns a
    joint probability (reconstructed after this slide)
  • Why Hidden? The sequence of tags that generated the word
    sequence is not visible
  • Why Markov? By the Markov assumption, the current tag
    depends only on the previous n tags
  • This limited history alleviates the sparsity problem
  • Training - learning the transition and emission
    probabilities from data

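A reconstruction of the joint probability referred to above, assuming a
first-order (bigram) Markov model with a designated start tag t_0:

  P(w, t) = \prod_{i=1}^{m} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)

where P(t_i \mid t_{i-1}) are the transition probabilities and
P(w_i \mid t_i) the emission probabilities.
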
9
HMMs Tagging Process
  • Given a string of words w, choose the tag sequence t that
    maximizes P(t | w), equivalently argmax_t P(w | t) P(t)
  • Computationally expensive if done naively - with n possible
    tags and m positions there are n^m candidate tag sequences
    to evaluate!
  • Viterbi Algorithm
  • Finds the optimal tag sequence t
  • An efficient dynamic programming algorithm, O(m n^2) for a
    bigram model (a sketch follows this slide)

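A minimal Viterbi sketch in Python, assuming hypothetical dictionaries
trans[prev][tag] and emit[tag][word] holding smoothed transition and
emission probabilities (all names are illustrative, not from the slides):

  import math

  def viterbi(words, tags, trans, emit, start="<s>"):
      # best[i][t]: log-probability of the best tag sequence for words[:i+1]
      # ending in tag t; back[i][t]: the previous tag on that best path.
      best = [{} for _ in words]
      back = [{} for _ in words]
      for t in tags:
          best[0][t] = math.log(trans[start][t]) + math.log(emit[t][words[0]])
          back[0][t] = start
      for i in range(1, len(words)):
          for t in tags:
              prev, score = max(
                  ((p, best[i - 1][p] + math.log(trans[p][t])) for p in tags),
                  key=lambda pair: pair[1],
              )
              best[i][t] = score + math.log(emit[t][words[i]])
              back[i][t] = prev
      # Pick the best final tag, then follow the back-pointers.
      last = max(tags, key=lambda t: best[-1][t])
      path = [last]
      for i in range(len(words) - 1, 0, -1):
          path.append(back[i][path[-1]])
      return list(reversed(path))

Each position considers every (previous tag, current tag) pair once,
giving the O(m n^2) cost mentioned above instead of enumerating all n^m
sequences.
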
10
Disadvantages of HMMs
  • Need to enumerate all possible observation
    sequences
  • Not possible to represent multiple interacting
    features
  • Difficult to model long-range dependencies of the
    observations
  • Very strict independence assumptions on the
    observations

11
Maximum Entropy Markov Models (MEMMs)
  • Conditional Exponential Models
  • Assumes the observation sequence is given (it need not be
    modeled)
  • Trains the model to maximize the conditional likelihood
    P(Y | X) (the per-state form is sketched after this slide)

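A sketch of the per-state conditional model assumed here (the standard
log-linear form; the feature notation f_k, \lambda_k is not from the
slides):

  P(y_i \mid y_{i-1}, x) = \frac{1}{Z(y_{i-1}, x)}
      \exp\Big( \sum_k \lambda_k f_k(y_i, y_{i-1}, x) \Big)

where Z(y_{i-1}, x) normalizes over the possible next states y_i - this
is the per-state normalization discussed on the following slides.
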
12
MEMMs (Contd..)
  • For a new data sequence x, the label sequence y which
    maximizes P(y | x, T) is assigned (T - the parameter set)
  • Arbitrary, non-independent features of the observation
    sequence are possible
  • Conditional models are known to perform better than
    generative ones
  • Performs per-state normalization
  • The total probability mass arriving at a state must be
    distributed among all its possible successor states

13
Label Bias Problem
  • Bias towards states with fewer outgoing transitions
  • Caused by per-state normalization: each state's outgoing
    probabilities must sum to 1, so a state with few successors
    passes its mass along almost regardless of the observation
    (with a single successor, the observation is ignored
    entirely)
  • An example MEMM (figure)

14
Undirected Graphical Models: Random Fields
15
Conditional Random Fields (CRFs)
  • Conditional exponential model, like the MEMM
  • Has all the advantages of MEMMs without the label bias
    problem
  • An MEMM uses a per-state exponential model for the
    conditional probabilities of next states given the current
    state
  • A CRF has a single exponential model for the joint
    probability of the entire sequence of labels given the
    observation sequence (sketched after this slide)
  • This allows some transitions to vote more strongly than
    others, depending on the corresponding observations

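A sketch of the contrast, in the commonly used linear-chain form
(notation assumed, not from the slides): the CRF normalizes once over
whole label sequences,

  P(y \mid x) = \frac{1}{Z(x)}
      \exp\Big( \sum_{i} \sum_k \lambda_k f_k(y_{i-1}, y_i, x, i) \Big)

Because Z(x) is global rather than per-state, transitions at different
positions compete on an equal footing, which is what lets some
transitions vote more strongly than others.
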
16
Definition of CRFs
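(The slide gives the formal definition; the following is a paraphrase of
the standard definition from Lafferty, McCallum and Pereira (2001), which
these slides follow.) Let G = (V, E) be a graph whose vertices index the
components Y_v of Y. Then (X, Y) is a conditional random field if, when
conditioned on X, the variables Y_v obey the Markov property with respect
to G:

  P(Y_v \mid X, Y_w, w \neq v) = P(Y_v \mid X, Y_w, w \sim v)

where w \sim v means that w and v are neighbours in G.
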
17
CRF Distribution Function
The distribution (written here in the form of Lafferty et al., which the
legend below describes):

  p_\theta(y \mid x) \propto
      \exp\Big( \sum_{e \in E,\,k} \lambda_k f_k(e, y|_e, x)
              + \sum_{v \in V,\,k} \mu_k g_k(v, y|_v, x) \Big)

where
  V                - the set of label random variables (vertices of the graph)
  f_k, g_k         - features: g_k are state (vertex) features, f_k are edge features
  \lambda_k, \mu_k - the parameters to be estimated
  y|_e             - the set of components of y defined by edge e
  y|_v             - the set of components of y defined by vertex v
18
CRF Training
19
CRF Training (Contd..)
  • Condition for maximum likelihood
  • The expected feature count computed using the model equals
    the empirical feature count from the training data (written
    out after this slide)
  • A closed-form solution for the parameters is not possible
  • Iterative algorithms are employed that improve the log
    likelihood in successive iterations
  • Examples
  • Generalized Iterative Scaling (GIS)
  • Improved Iterative Scaling (IIS)

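The maximum-likelihood condition written out (notation assumed, not from
the slides). With log-likelihood
L(\theta) = \sum_j \log p_\theta(y^{(j)} \mid x^{(j)}), setting its
gradient to zero gives, for every feature f_k,

  \tilde{E}[f_k] = E_{p_\theta}[f_k]

i.e. the empirical count of f_k in the training data must equal its
expected count under the model; GIS and IIS iteratively adjust \theta
toward this fixed point.
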
20
Graphical Comparison: HMMs, MEMMs, CRFs
21
POS Tagging Results
22
Summary
  • HMMs
  • Directed, Generative graphical models
  • Cannot be used to model overlapping features on
    observations
  • MEMMs
  • Directed, Conditional Models
  • Can model overlapping features on observations
  • Suffer from label bias problem due to per-state
    normalization
  • CRFs
  • Undirected, Conditional Models
  • Avoid the label bias problem
  • Efficient training possible

23
Thanks!
Acknowledgements: Some slides in this presentation are from
Rongkun Shen's (Oregon State Univ.) presentation on CRFs.