Title: Viterbi Algorithm
1. Viterbi Algorithm
- Let X be a sequence of length L. For k ∈ Q and 0 ≤ i ≤ L, consider a path π ending at state k, and let v_k(i) be the probability of the most probable path that emits (x_1, ..., x_i) and ends in state k.
- Initialize: v_0(0) = 1 and v_k(0) = 0 for all k > 0.
- Recursive relation: for each i = 0, ..., L-1 and for each l ∈ Q, v_l(i+1) = e_l(x_{i+1}) · max_{k ∈ Q} { v_k(i) a_kl }.
- The value of P(X, π*) for the most probable path π* is given by max_{k ∈ Q} v_k(L).
2. Viterbi Algorithm
- By keeping back pointers, the most probable state sequence can be recovered by backtracking.
- Start backtracking from the state k ∈ Q for which v_k(L) is maximal.
- Predicted states by the Viterbi algorithm on the casino example for 300 rolls of a die (a Python sketch of the full algorithm follows the example output below):
Rolls:   315116246446644245311321631164152133625144
         54363165662656666665116645313265124563666463163666
         31623264
Die:     FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
         FFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLFFFFFFFFFFFFLLLLL
         LLLLLLLLLLLFFFLLL
Viterbi: FFFFFFFFFFFFFFFFFFFFFFFF
         FFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLFFFFFFFF
         FFFFLLLLLLLLLLLLLLLLLLLLLL

Rolls:   55236266666625151631222555441666566563564324364131
         51346514635341112641462625335636616366646623253441
Die:     LLLLLLLLLLLFFFFFFFFFFFFFFFFFLLLLLLLLLLLLL
         FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLFFF
         FFFFFFFFF
Viterbi: LLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFF
         FFFFFFFFFFPFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLL
         LLLLLLLLLLLFFFFFFFF

Rolls:   3661661163252562462255
         26525226643535333623312162536441443233516324363366
         5562466662632666612355245242
Die:     FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
         FFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLLFFFFFFFFFFF
Viterbi: FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
         FFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLFF
         FFFFFFFFF
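As a concrete illustration of the recursion and backtracking described above, here is a minimal Python sketch of the Viterbi algorithm applied to the casino HMM. The transition and emission values are the usual textbook parameters for this example and are an assumption here (the slides do not list them), and a start distribution over states is used in place of an explicit begin state.

```python
import numpy as np

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for an observation sequence, computed in log space."""
    L, n = len(obs), len(states)
    v = np.full((L, n), -np.inf)        # v[i, k]: log prob of the best path ending in k at position i
    ptr = np.zeros((L, n), dtype=int)   # back pointers

    # Initialization: first symbol emitted from the start distribution
    v[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])

    # Recursion: v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) a_kl  (products become sums in log space)
    for i in range(1, L):
        for l in range(n):
            scores = v[i - 1] + np.log(trans_p[:, l])
            ptr[i, l] = np.argmax(scores)
            v[i, l] = scores[ptr[i, l]] + np.log(emit_p[l, obs[i]])

    # Backtracking, starting from the state where v_k(L) is maximal
    path = [int(np.argmax(v[-1]))]
    for i in range(L - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    path.reverse()
    return [states[k] for k in path], v[-1].max()

# Casino HMM with the usual textbook parameters (an assumption; the slides do not list them):
states = ['F', 'L']
start_p = np.array([0.5, 0.5])
trans_p = np.array([[0.95, 0.05],    # F -> F, F -> L
                    [0.10, 0.90]])   # L -> F, L -> L
emit_p = np.array([[1/6] * 6,                        # fair die: uniform over faces 1..6
                   [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])  # loaded die favours a six
rolls = [6, 6, 3, 6, 5, 6, 6, 1, 6, 6]               # die faces 1..6 mapped to indices 0..5
path, logp = viterbi([r - 1 for r in rolls], states, start_p, trans_p, emit_p)
print(''.join(path), logp)
```

Working in log space keeps Viterbi free of underflow; as noted on the later slides, the forward algorithm cannot use plain logarithms because it sums probabilities, so it needs scaling or a log-sum-exp approach instead.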
3. The Forward Algorithm
- We want to be able to calculate the probability of a given sequence (as we did for a Markov chain), but under HMM conditions.
- However, many different state paths can give rise to the same sequence X in the case of an HMM.
- So we add the probabilities of all possible paths to get the final probability: P(X) = Σ_π P(X, π).
4. The Forward Algorithm
- Given the sequence X = (x_1, ..., x_L), define f_k(i) to be the probability of emitting the prefix (x_1, ..., x_i) and eventually reaching the state π_i = k.
- f_k(i) is the probability of the given sequence up to x_i, requiring that π_i = k: f_k(i) = P(x_1, ..., x_i, π_i = k).
- Initial values: f_0(0) = 1 and f_k(0) = 0 for all k > 0.
5. The Forward Algorithm
- Recursive relation: f_l(i+1) = e_l(x_{i+1}) Σ_{k ∈ Q} f_k(i) a_kl.
- Terminal value: P(X) = Σ_{k ∈ Q} f_k(L).
- Unlike the Viterbi algorithm, we have sums of probabilities.
- So logarithms cannot be used as easily to avoid underflow errors.
- Use exponential functions (the log-sum-exp trick) or a scaling method, as in the sketch below.
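A minimal Python sketch of the forward recursion with per-position scaling, one common way to realize the scaling method mentioned above; it assumes the same start_p, trans_p and emit_p arrays defined in the Viterbi sketch.

```python
import numpy as np

def forward(obs, start_p, trans_p, emit_p):
    """Scaled forward algorithm: returns the scaled f values, the scale factors and log P(X)."""
    L, n = len(obs), len(start_p)
    f = np.zeros((L, n))
    scale = np.zeros(L)

    # Initialization: f_k(1) = start_k * e_k(x_1), then rescale so that the row sums to 1
    f[0] = start_p * emit_p[:, obs[0]]
    scale[0] = f[0].sum()
    f[0] /= scale[0]

    # Recursion: f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) a_kl, rescaled at every position
    for i in range(1, L):
        f[i] = emit_p[:, obs[i]] * (f[i - 1] @ trans_p)
        scale[i] = f[i].sum()
        f[i] /= scale[i]

    # Termination: P(X) is the product of the scale factors, so log P(X) is the sum of their logs
    return f, scale, np.log(scale).sum()

# Usage with the casino arrays from the Viterbi sketch:
# f, scale, log_px = forward([r - 1 for r in rolls], start_p, trans_p, emit_p)
```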
6. The Backward Algorithm
- Complementary to the forward algorithm (a code sketch follows this slide).
- Define b_k(i) to be the probability of emitting the suffix (x_{i+1}, ..., x_L) given π_i = k, i.e. b_k(i) = P(x_{i+1}, ..., x_L | π_i = k).
- Initial values: b_k(L) = 1 for all k ∈ Q (or b_k(L) = a_k0 if an explicit end state is modelled).
- Recursive relation: b_k(i) = Σ_{l ∈ Q} a_kl e_l(x_{i+1}) b_l(i+1).
- Terminal value: P(X) = Σ_{l ∈ Q} a_0l e_l(x_1) b_l(1), where a_0l are the initial-state probabilities.
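A matching sketch of the backward recursion. Setting b_k(L) = 1 assumes no explicit end state, and the scaling deliberately reuses the factors returned by the forward() sketch above so that the two can later be combined for posterior decoding; both choices are assumptions of this sketch, not statements from the slides.

```python
import numpy as np

def backward(obs, trans_p, emit_p, scale):
    """Scaled backward algorithm; `scale` are the per-position factors from the forward pass."""
    L, n = len(obs), trans_p.shape[0]
    b = np.zeros((L, n))

    # Initialization: b_k(L) = 1 (no explicit end state assumed in this sketch)
    b[-1] = 1.0

    # Recursion: b_k(i) = sum_l a_kl e_l(x_{i+1}) b_l(i+1), rescaled with the forward factors
    for i in range(L - 2, -1, -1):
        b[i] = trans_p @ (emit_p[:, obs[i + 1]] * b[i + 1]) / scale[i + 1]
    return b

# Usage, continuing the casino sketch:
# b = backward([r - 1 for r in rolls], trans_p, emit_p, scale)
```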
7. The Posterior Decoding Problem
- The Viterbi algorithm finds the most probable path through the model given a sequence of symbols.
- However, in general we want to find the probability that the observation x_i came from state k, given the observed sequence.
- This is called the posterior probability of state k at step i when the emitted sequence is known: P(π_i = k | X).
- The posterior probability is particularly useful when many different paths have almost the same probability as the most probable path.
- With posterior probabilities we can ask questions like: does the N-th base in the sequence come from a CpG island or not?
8. Posterior Probability
- The posterior probability is obtained by combining the forward and backward probabilities (see the sketch below).
- By the definition of conditional probability, P(A|B) = P(A,B) / P(B), so P(π_i = k | X) = P(X, π_i = k) / P(X) = f_k(i) b_k(i) / P(X),
- where P(X) is the result of either the forward or the backward calculation.
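Combining the two sketches gives the posterior probabilities directly. With the scaling conventions chosen above, the product of the scaled forward and backward values already equals P(π_i = k | X); this builds on the assumed functions from the earlier sketches rather than on code from the slides.

```python
def posterior(obs, start_p, trans_p, emit_p):
    """P(pi_i = k | X) for every position i and state k, via forward and backward."""
    f, scale, log_px = forward(obs, start_p, trans_p, emit_p)   # sketches defined earlier
    b = backward(obs, trans_p, emit_p, scale)

    # With the scaling chosen above, f_hat * b_hat equals f_k(i) b_k(i) / P(X) position-wise.
    post = f * b
    return post / post.sum(axis=1, keepdims=True)   # safeguard: rows should already sum to 1

# e.g. posterior(...)[i, 0] answers "what is the probability that roll i came from the fair die?"
```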
9. Posterior Probability
- The posterior probability of the die being fair in the casino example can be calculated for each roll of the die.
[Figure: x-axis: no. of rolls; y-axis: P(die is fair). The shaded areas show the rolls generated by the loaded die.]
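A curve like the one described above can be reproduced from the posterior() sketch; the snippet below is an assumed usage example (matplotlib for plotting, state index 0 taken to be the fair die), not code from the slides.

```python
import matplotlib.pyplot as plt

# post[:, 0] is P(die is fair) for each roll, using the posterior() sketch above
# (state index 0 = fair is an assumption of these sketches).
post = posterior([r - 1 for r in rolls], start_p, trans_p, emit_p)
plt.plot(post[:, 0])
plt.xlabel('no. of rolls')
plt.ylabel('P(die is fair)')
plt.show()
```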
10. Parameter Estimation for HMM
- All examples considered so far assume that the transition and emission probabilities (θ in the HMM model) are known beforehand.
- In practice, we do not know these HMM model parameters to begin with (for example, in the CpG island case).
- If we have a set of sample sequences X^1, ..., X^n of lengths L_1, ..., L_n (called training sequences), then we can construct the HMM that best characterizes the training sequences.
- Our goal is to find θ such that the logarithmic scores of the training sequences, Σ_j log P(X^j | θ), are maximized (a scoring sketch follows below).
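The logarithmic score being maximized can be evaluated with the forward() sketch from earlier; a minimal illustration, assuming each training sequence is given as a list of symbol indices.

```python
def log_score(training_seqs, start_p, trans_p, emit_p):
    """Sum of log P(X^j | theta) over the training sequences, via the forward algorithm."""
    # training_seqs: list of sequences, each a list of symbol indices (an assumed encoding)
    return sum(forward(x, start_p, trans_p, emit_p)[2] for x in training_seqs)
```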
11. Estimation When State Sequence Known
- Assume that the state sequences π^1, ..., π^n are known.
- In the CpG island case, this corresponds to a given set of genomic training sequences in which the CpG islands are already labelled.
- First scan the sequences and compute
- A_kl = no. of transitions from state k to l, and
- E_k(b) = no. of times symbol b was emitted in state k.
- Then the maximum likelihood estimates are a_kl = A_kl / Σ_l' A_kl' and e_k(b) = E_k(b) / Σ_b' E_k(b') (see the counting sketch below).
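When the state paths are known, the counting above is direct; here is a minimal Python sketch. The pseudocount argument is an added assumption to avoid zero probabilities for unseen transitions or symbols and is not mentioned on the slide.

```python
import numpy as np

def estimate_known_paths(seqs, paths, n_states, n_symbols, pseudo=1.0):
    """Maximum likelihood estimates a_kl and e_k(b) from labelled training data."""
    A = np.full((n_states, n_states), pseudo)   # A_kl: transition counts (pseudocount assumed)
    E = np.full((n_states, n_symbols), pseudo)  # E_k(b): emission counts

    for x, pi in zip(seqs, paths):
        for i, (state, symbol) in enumerate(zip(pi, x)):
            E[state, symbol] += 1
            if i + 1 < len(pi):
                A[state, pi[i + 1]] += 1

    # a_kl = A_kl / sum_l' A_kl',  e_k(b) = E_k(b) / sum_b' E_k(b')
    return A / A.sum(axis=1, keepdims=True), E / E.sum(axis=1, keepdims=True)
```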
12. Estimation When State Sequence Unknown
- Called the Baum-Welch training algorithm, an iterative technique.
- Initialize by assigning arbitrary values to θ.
- Expectation step: compute the expected no. of state transitions from k to l using P(π_i = k, π_{i+1} = l | X^j, θ) = f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1) / P(X^j);
- then the expected counts are A_kl = Σ_j (1 / P(X^j)) Σ_i f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1),
- where f_k^j(i) and b_k^j(i) are the forward and backward probabilities of the sequence X^j.
13. Estimation When State Sequence Unknown
- Similarly, compute the expected no. of emissions of symbol b in state k using E_k(b) = Σ_j (1 / P(X^j)) Σ_{i: x^j_i = b} f_k^j(i) b_k^j(i).
- Maximization step: re-compute the new values for θ from A_kl and E_k(b), as in the case of a known state sequence.
- Repeat steps 2 and 3 until the improvement of the log likelihood score Σ_j log P(X^j | θ) is less than a given threshold ε.
- The Baum-Welch algorithm does not guarantee convergence to the global maximum; it may get stuck in a local maximum.
- The final solution is particularly sensitive to the initial values (a code sketch of the full procedure follows).
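Putting the pieces together, here is a compact Baum-Welch sketch that reuses the forward() and backward() functions from the earlier sketches. The random Dirichlet initialization, the fixed uniform start distribution, and the convergence threshold eps are illustrative assumptions, not choices taken from the slides.

```python
import numpy as np

def baum_welch(seqs, n_states, n_symbols, n_iter=100, eps=1e-4, seed=0):
    """Iterative re-estimation of transition/emission probabilities (local optimum only)."""
    rng = np.random.default_rng(seed)
    trans_p = rng.dirichlet(np.ones(n_states), size=n_states)   # arbitrary initial theta
    emit_p = rng.dirichlet(np.ones(n_symbols), size=n_states)
    start_p = np.full(n_states, 1.0 / n_states)                 # kept fixed here (assumption)
    prev_score = -np.inf

    for _ in range(n_iter):
        A = np.zeros((n_states, n_states))    # expected transition counts A_kl
        E = np.zeros((n_states, n_symbols))   # expected emission counts E_k(b)
        score = 0.0

        # E-step: accumulate expected counts over all training sequences
        for x in seqs:
            f, scale, log_px = forward(x, start_p, trans_p, emit_p)
            b = backward(x, trans_p, emit_p, scale)
            score += log_px
            post = f * b                      # P(pi_i = k | X^j) with the scaling used above
            for i, symbol in enumerate(x):
                E[:, symbol] += post[i]
                if i + 1 < len(x):
                    # expected transitions: f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(X^j)
                    A += np.outer(f[i], emit_p[:, x[i + 1]] * b[i + 1]) * trans_p / scale[i + 1]

        # M-step: re-normalize expected counts into new parameters (as for known paths)
        trans_p = A / A.sum(axis=1, keepdims=True)
        emit_p = E / E.sum(axis=1, keepdims=True)

        # Stop once the improvement of the total log score falls below eps
        if score - prev_score < eps:
            break
        prev_score = score

    return trans_p, emit_p

# e.g. trans_est, emit_est = baum_welch([[r - 1 for r in rolls]], n_states=2, n_symbols=6)
```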