Title: Graphical Models for Segmenting and Labeling Sequence Data
Slide 1: Graphical Models for Segmenting and Labeling Sequence Data
NLP-AI Seminar
Slide 2: Outline
- Introduction
- Directed Graphical Models
- Hidden Markov Models (HMMs)
- Maximum Entropy Markov Models (MEMMs)
- Label Bias Problem
- Undirected Graphical Models
- Conditional Random Fields (CRFs)
- Summary
Slide 3: The Task
- Labeling
- Given sequence data, mark the appropriate tag for each data item
- Segmentation
- Given sequence data, segment it into non-overlapping groups such that related entities are in the same group
Slide 4: Applications
- Computational Linguistics
- POS Tagging
- Information Extraction
- Syntactic Disambiguation
- Computational Biology
- DNA and Protein Sequence Alignment
- Sequence homologue searching
- Protein Secondary Structure Prediction
Slide 5: Example: POS Tagging
Slide 6: Directed Graphical Models
- Hidden Markov Models (HMMs)
- Assign a joint probability to paired observation and label sequences
- The parameters are trained to maximize the joint likelihood of the training examples
Slide 7: Hidden Markov Models (HMMs)
- Generative model: models the joint distribution
- Generation process: a probabilistic finite state machine
- Set of states: corresponds to the tags
- Alphabet: the set of words
- Transition probabilities between states
- State (emission) probabilities for words
Slide 8: HMMs (contd.)
- For a given word/tag sequence pair, the joint probability factorizes as $P(w, t) = \prod_i P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)$ (a toy numeric sketch follows this list)
- Why hidden? The sequence of tags which generated the word sequence is not visible
- Why Markov? By the Markov assumption, the current tag depends only on the previous n tags; this solves the sparsity problem
- Training: learning the transition and emission probabilities from data
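As a concrete illustration, here is a minimal sketch of that factorization; the toy tag set, the probability tables, and the joint_prob helper are all invented for the example:

```python
# Bigram HMM joint probability: P(w, t) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i).
# All numbers below are toy values invented for illustration.
transition = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VB"): 0.4}   # P(t_i | t_{i-1})
emission = {("DT", "the"): 0.5, ("NN", "dog"): 0.01, ("VB", "barks"): 0.02}  # P(w_i | t_i)

def joint_prob(words, tags):
    prob, prev = 1.0, "<s>"  # "<s>" marks the start of the sequence
    for w, t in zip(words, tags):
        prob *= transition.get((prev, t), 0.0) * emission.get((t, w), 0.0)
        prev = t
    return prob

print(joint_prob(["the", "dog", "barks"], ["DT", "NN", "VB"]))
```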
Slide 9: HMM Tagging Process
- Given a string of words w, choose the tag sequence t such that $t^* = \arg\max_t P(t \mid w) = \arg\max_t P(w, t)$
- Computationally expensive: all possible tag sequences would need to be evaluated
- For n possible tags and m positions, there are $n^m$ candidate sequences
- Viterbi algorithm
- Used to find the optimal tag sequence t
- An efficient dynamic-programming based algorithm (a sketch follows this list)
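A minimal Viterbi sketch for this setting, taking the transition/emission tables as parameters (the 1e-12 floor for unseen events is an assumption of the example, not part of the original slides):

```python
import math

def viterbi(words, tags, transition, emission):
    # Dynamic programming: O(m * n^2) instead of enumerating all n^m sequences.
    # best[t] = (log-probability of the best path ending in tag t, that path)
    floor = 1e-12  # smoothing floor for unseen transitions/emissions
    best = {t: (math.log(transition.get(("<s>", t), floor))
                + math.log(emission.get((t, words[0]), floor)), [t])
            for t in tags}
    for w in words[1:]:
        step = {}
        for t in tags:
            lp, path = max((plp + math.log(transition.get((p, t), floor)), ppath)
                           for p, (plp, ppath) in best.items())
            step[t] = (lp + math.log(emission.get((t, w), floor)), path + [t])
        best = step
    return max(best.values())[1]  # path of the highest-scoring final state

# Assumes the toy tables from the previous sketch are in scope.
print(viterbi(["the", "dog", "barks"], ["DT", "NN", "VB"], transition, emission))
```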
Slide 10: Disadvantages of HMMs
- Need to enumerate all possible observation sequences
- Not possible to represent multiple interacting features
- Difficult to model long-range dependencies of the observations
- Very strict independence assumptions on the observations
Slide 11: Maximum Entropy Markov Models (MEMMs)
- Conditional exponential models
- Assume the observation sequence is given (it need not be modeled)
- Train the model to maximize the conditional likelihood P(Y | X)
Slide 12: MEMMs (contd.)
- For a new data sequence x, the label sequence y that maximizes P(y | x, T) is assigned (T: the parameter set)
- Arbitrary, non-independent features on the observation sequence are possible
- Conditional models are known to perform better than generative ones
- Performs per-state normalization: the total mass which arrives at a state must be distributed among all possible successor states (see the formula below)
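Written out, the standard MEMM local model normalizes separately at each step ($\lambda_k$ and $f_k$ denote feature weights and features, as in the usual formulation):

$$P(y_i \mid y_{i-1}, x) \;=\; \frac{1}{Z(y_{i-1}, x)} \exp\Big(\sum_k \lambda_k f_k(y_i, y_{i-1}, x)\Big), \qquad Z(y_{i-1}, x) \;=\; \sum_{y'} \exp\Big(\sum_k \lambda_k f_k(y', y_{i-1}, x)\Big)$$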
Slide 13: Label Bias Problem
- Bias towards states with fewer outgoing transitions
- Due to per-state normalization: the mass leaving each state must sum to one, so a state with a single outgoing transition passes all of its mass along regardless of the observation (see the sketch after this list)
- An example MEMM
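A minimal numeric sketch of the mechanism (the scores and successor names are invented for this example):

```python
import math

def per_state_probs(scores):
    # Per-state normalization: one state's outgoing mass must sum to 1.
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

# State A has two successors: the observation-dependent scores matter.
print(per_state_probs({"x": 2.0, "y": 1.0}))   # mass split between x and y
# State B has a single successor: it gets probability 1.0 no matter how
# poorly it matches the observation, which is exactly the label bias.
print(per_state_probs({"x": -5.0}))            # {'x': 1.0}
```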
Slide 14: Undirected Graphical Models: Random Fields
Slide 15: Conditional Random Fields (CRFs)
- A conditional exponential model, like the MEMM
- Has all the advantages of MEMMs without the label bias problem
- An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state
- A CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence
- This allows some transitions to vote more strongly than others, depending on the corresponding observations
Slide 16: Definition of CRFs
Slide 17: CRF Distribution Function

$$p_\theta(y \mid x) \;\propto\; \exp\Big( \sum_{e \in E,\, k} \lambda_k\, f_k(e, y|_e, x) \;+\; \sum_{v \in V,\, k} \mu_k\, g_k(v, y|_v, x) \Big)$$

where:
- V: the set of label random variables (vertices of the graph); E: its edges
- f_k and g_k: features, with f_k an edge feature and g_k a state feature
- λ_k and μ_k: the parameters to be estimated
- y|_e: the set of components of y defined by edge e
- y|_v: the set of components of y defined by vertex v
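For the common linear-chain case, here is a minimal sketch of this globally normalized model. The tag set, the weight tables W and U (edge and state features with their weights folded in), and the brute-force partition function are all invented for illustration; real implementations compute Z with the forward algorithm in $O(m \cdot n^2)$:

```python
import math
from itertools import product

# Toy linear-chain CRF: score(y, x) = sum_i W[y_{i-1}, y_i] + U[y_i, x_i].
TAGS = ["A", "B"]
W = {("<s>", "A"): 0.5, ("<s>", "B"): 0.0, ("A", "A"): 1.0,
     ("A", "B"): 0.2, ("B", "A"): 0.3, ("B", "B"): 0.8}   # edge features
U = {("A", 0): 0.1, ("A", 1): 1.5, ("B", 0): 1.2, ("B", 1): 0.0}  # state features

def score(y, x):
    prev, s = "<s>", 0.0
    for yi, xi in zip(y, x):
        s += W[(prev, yi)] + U[(yi, xi)]
        prev = yi
    return s

def prob(y, x):
    # Global normalization: ONE partition function over all label sequences,
    # unlike the MEMM's per-state normalizers (brute force here for clarity).
    z = sum(math.exp(score(list(c), x)) for c in product(TAGS, repeat=len(x)))
    return math.exp(score(y, x)) / z

print(prob(["A", "B"], [1, 0]))
```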
Slide 18: CRF Training
Slide 19: CRF Training (contd.)
- Condition for maximum likelihood: the expected feature count computed using the model equals the empirical feature count from the training data (in symbols after this list)
- No closed-form solution for the parameters is possible
- Iterative algorithms are employed, improving the log-likelihood in successive iterations
- Examples:
- Generalized Iterative Scaling (GIS)
- Improved Iterative Scaling (IIS)
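The moment-matching condition and the GIS update, written out (standard results rather than anything specific to this deck; $\tilde{E}$ denotes the empirical expectation and C a bound on the total feature count per example):

$$\frac{\partial \mathcal{L}}{\partial \lambda_k} \;=\; \tilde{E}[f_k] - E_{p_\theta}[f_k] \;=\; 0 \quad\Longleftrightarrow\quad \tilde{E}[f_k] \;=\; E_{p_\theta}[f_k]$$

$$\text{GIS update:}\qquad \lambda_k \;\leftarrow\; \lambda_k + \frac{1}{C}\,\log\frac{\tilde{E}[f_k]}{E_{p_\theta}[f_k]}$$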
Slide 20: Graphical Comparison: HMMs, MEMMs, CRFs
Slide 21: POS Tagging Results
Slide 22: Summary
- HMMs
- Directed, generative graphical models
- Cannot be used to model overlapping features on observations
- MEMMs
- Directed, conditional models
- Can model overlapping features on observations
- Suffer from the label bias problem due to per-state normalization
- CRFs
- Undirected, conditional models
- Avoid the label bias problem
- Efficient training possible
Slide 23: Thanks!
Acknowledgements: Some slides in this presentation are from Rongkun Shen's (Oregon State Univ.) presentation on CRFs.