Title: Graphical Models for Segmenting and Labeling Sequence Data
Slide 1: Graphical Models for Segmenting and Labeling Sequence Data
NLP-AI Seminar
Slide 2: Outline
- Introduction
- Directed Graphical Models
- Hidden Markov Models (HMMs)
- Maximum Entropy Markov Models (MEMMs)
- Label Bias Problem
- Undirected Graphical Models
- Conditional Random Fields (CRFs)
- Summary
Slide 3: The Task
- Labeling
- Given sequence data, mark the appropriate tag for each data item
- Segmentation
- Given sequence data, segment it into non-overlapping groups such that related entities are in the same group
Slide 4: Applications
- Computational Linguistics
- POS Tagging
- Information Extraction
- Syntactic Disambiguation
- Computational Biology
- DNA and Protein Sequence Alignment
- Sequence homologue searching
- Protein Secondary Structure Prediction
Slide 5: Example: POS Tagging
Slide 6: Directed Graphical Models
- Hidden Markov Models (HMMs)
- Assign a joint probability to paired observation and label sequences
- The parameters are trained to maximize the joint likelihood of the training examples
Slide 7: Hidden Markov Models (HMMs)
- Generative model: models the joint distribution
- Generation process: a probabilistic finite state machine
- Set of states: corresponds to the tags
- Alphabet: the set of words
- Transition probabilities between states
- State (emission) probabilities for words
Slide 8: HMMs (contd.)
- For a given word/tag sequence pair, the joint probability factorizes as $P(w, t) = \prod_i P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)$ (a toy numeric sketch follows this list)
- Why hidden? The sequence of tags which generated the word sequence is not visible
- Why Markov? By the Markov assumption, the current tag depends only on the previous n tags; this solves the sparsity problem
- Training: learning the transition and emission probabilities from data
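As a concrete illustration, here is a minimal sketch of that factorization; the toy tag set, the probability tables, and the joint_prob helper are all invented for the example:

```python
# Bigram HMM joint probability: P(w, t) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i).
# All numbers below are toy values invented for illustration.
transition = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VB"): 0.4}   # P(t_i | t_{i-1})
emission = {("DT", "the"): 0.5, ("NN", "dog"): 0.01, ("VB", "barks"): 0.02}  # P(w_i | t_i)

def joint_prob(words, tags):
    prob, prev = 1.0, "<s>"  # "<s>" marks the start of the sequence
    for w, t in zip(words, tags):
        prob *= transition.get((prev, t), 0.0) * emission.get((t, w), 0.0)
        prev = t
    return prob

print(joint_prob(["the", "dog", "barks"], ["DT", "NN", "VB"]))
```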
Slide 9: HMM Tagging Process
- Given a string of words w, choose the tag sequence t such that $t^* = \arg\max_t P(t \mid w) = \arg\max_t P(w, t)$
- Computationally expensive: all possible tag sequences would need to be evaluated
- For n possible tags and m positions, there are $n^m$ candidate sequences
- Viterbi algorithm
- Used to find the optimal tag sequence t
- An efficient dynamic-programming based algorithm (a sketch follows this list)
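A minimal Viterbi sketch for this setting, taking the transition/emission tables as parameters (the 1e-12 floor for unseen events is an assumption of the example, not part of the original slides):

```python
import math

def viterbi(words, tags, transition, emission):
    # Dynamic programming: O(m * n^2) instead of enumerating all n^m sequences.
    # best[t] = (log-probability of the best path ending in tag t, that path)
    floor = 1e-12  # smoothing floor for unseen transitions/emissions
    best = {t: (math.log(transition.get(("<s>", t), floor))
                + math.log(emission.get((t, words[0]), floor)), [t])
            for t in tags}
    for w in words[1:]:
        step = {}
        for t in tags:
            lp, path = max((plp + math.log(transition.get((p, t), floor)), ppath)
                           for p, (plp, ppath) in best.items())
            step[t] = (lp + math.log(emission.get((t, w), floor)), path + [t])
        best = step
    return max(best.values())[1]  # path of the highest-scoring final state

# Assumes the toy tables from the previous sketch are in scope.
print(viterbi(["the", "dog", "barks"], ["DT", "NN", "VB"], transition, emission))
```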
Slide 10: Disadvantages of HMMs
- Need to enumerate all possible observation sequences
- Not possible to represent multiple interacting features
- Difficult to model long-range dependencies of the observations
- Very strict independence assumptions on the observations
Slide 11: Maximum Entropy Markov Models (MEMMs)
- Conditional exponential models
- Assume the observation sequence is given (it need not be modeled)
- Train the model to maximize the conditional likelihood P(Y | X)
Slide 12: MEMMs (contd.)
- For a new data sequence x, the label sequence y that maximizes P(y | x, T) is assigned (T: the parameter set)
- Arbitrary, non-independent features on the observation sequence are possible
- Conditional models are known to perform better than generative ones
- Performs per-state normalization: the total mass which arrives at a state must be distributed among all possible successor states (see the formula below)
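Written out, the standard MEMM local model normalizes separately at each step ($\lambda_k$ and $f_k$ denote feature weights and features, as in the usual formulation):

$$P(y_i \mid y_{i-1}, x) \;=\; \frac{1}{Z(y_{i-1}, x)} \exp\Big(\sum_k \lambda_k f_k(y_i, y_{i-1}, x)\Big), \qquad Z(y_{i-1}, x) \;=\; \sum_{y'} \exp\Big(\sum_k \lambda_k f_k(y', y_{i-1}, x)\Big)$$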
Slide 13: Label Bias Problem
- Bias towards states with fewer outgoing transitions
- Due to per-state normalization: the mass leaving each state must sum to one, so a state with a single outgoing transition passes all of its mass along regardless of the observation (see the sketch after this list)
- An example MEMM
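A minimal numeric sketch of the mechanism (the scores and successor names are invented for this example):

```python
import math

def per_state_probs(scores):
    # Per-state normalization: one state's outgoing mass must sum to 1.
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

# State A has two successors: the observation-dependent scores matter.
print(per_state_probs({"x": 2.0, "y": 1.0}))   # mass split between x and y
# State B has a single successor: it gets probability 1.0 no matter how
# poorly it matches the observation, which is exactly the label bias.
print(per_state_probs({"x": -5.0}))            # {'x': 1.0}
```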
Slide 14: Undirected Graphical Models: Random Fields
Slide 15: Conditional Random Fields (CRFs)
- A conditional exponential model, like the MEMM
- Has all the advantages of MEMMs without the label bias problem
- An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state
- A CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence
- This allows some transitions to vote more strongly than others, depending on the corresponding observations
Slide 16: Definition of CRFs
Slide 17: CRF Distribution Function

$$p_\theta(y \mid x) \;\propto\; \exp\Big( \sum_{e \in E,\, k} \lambda_k\, f_k(e, y|_e, x) \;+\; \sum_{v \in V,\, k} \mu_k\, g_k(v, y|_v, x) \Big)$$

where:
- V: the set of label random variables (vertices of the graph); E: its edges
- f_k and g_k: features, with f_k an edge feature and g_k a state feature
- λ_k and μ_k: the parameters to be estimated
- y|_e: the set of components of y defined by edge e
- y|_v: the set of components of y defined by vertex v
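For the common linear-chain case, here is a minimal sketch of this globally normalized model. The tag set, the weight tables W and U (edge and state features with their weights folded in), and the brute-force partition function are all invented for illustration; real implementations compute Z with the forward algorithm in $O(m \cdot n^2)$:

```python
import math
from itertools import product

# Toy linear-chain CRF: score(y, x) = sum_i W[y_{i-1}, y_i] + U[y_i, x_i].
TAGS = ["A", "B"]
W = {("<s>", "A"): 0.5, ("<s>", "B"): 0.0, ("A", "A"): 1.0,
     ("A", "B"): 0.2, ("B", "A"): 0.3, ("B", "B"): 0.8}   # edge features
U = {("A", 0): 0.1, ("A", 1): 1.5, ("B", 0): 1.2, ("B", 1): 0.0}  # state features

def score(y, x):
    prev, s = "<s>", 0.0
    for yi, xi in zip(y, x):
        s += W[(prev, yi)] + U[(yi, xi)]
        prev = yi
    return s

def prob(y, x):
    # Global normalization: ONE partition function over all label sequences,
    # unlike the MEMM's per-state normalizers (brute force here for clarity).
    z = sum(math.exp(score(list(c), x)) for c in product(TAGS, repeat=len(x)))
    return math.exp(score(y, x)) / z

print(prob(["A", "B"], [1, 0]))
```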
Slide 18: CRF Training
Slide 19: CRF Training (contd.)
- Condition for maximum likelihood: the expected feature count computed using the model equals the empirical feature count from the training data (in symbols after this list)
- No closed-form solution for the parameters is possible
- Iterative algorithms are employed, improving the log-likelihood in successive iterations
- Examples:
- Generalized Iterative Scaling (GIS)
- Improved Iterative Scaling (IIS)
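The moment-matching condition and the GIS update, written out (standard results rather than anything specific to this deck; $\tilde{E}$ denotes the empirical expectation and C a bound on the total feature count per example):

$$\frac{\partial \mathcal{L}}{\partial \lambda_k} \;=\; \tilde{E}[f_k] - E_{p_\theta}[f_k] \;=\; 0 \quad\Longleftrightarrow\quad \tilde{E}[f_k] \;=\; E_{p_\theta}[f_k]$$

$$\text{GIS update:}\qquad \lambda_k \;\leftarrow\; \lambda_k + \frac{1}{C}\,\log\frac{\tilde{E}[f_k]}{E_{p_\theta}[f_k]}$$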
Slide 20: Graphical Comparison: HMMs, MEMMs, CRFs
Slide 21: POS Tagging Results
Slide 22: Summary
- HMMs
- Directed, generative graphical models
- Cannot be used to model overlapping features on observations
- MEMMs
- Directed, conditional models
- Can model overlapping features on observations
- Suffer from the label bias problem due to per-state normalization
- CRFs
- Undirected, conditional models
- Avoid the label bias problem
- Efficient training possible
Slide 23: Thanks!
Acknowledgements: Some slides in this presentation are from Rongkun Shen's (Oregon State Univ.) presentation on CRFs.