Title: CS621: Artificial Intelligence
1. CS621: Artificial Intelligence
- Pushpak Bhattacharyya, CSE Dept., IIT Bombay
- Lecture 38-39: Baum-Welch Algorithm; HMM training
2. Baum-Welch algorithm
- Training a Hidden Markov Model (not structure learning, i.e., the structure of the HMM is pre-given). This involves
- Learning probability values ONLY
- Correspondence with PCFG
- Not learning production rules but the probabilities associated with them
- Training algorithm for PCFG is called the Inside-Outside algorithm
3. Key Intuition
- Given: training sequence
- Initialization: probability values
- Compute Pr(state seq | training seq)
- Get expected count of each transition
- Compute rule probabilities
- Approach: initialize the probabilities and recompute them (an EM-like approach)
4. Building blocks: Probabilities to be used
[Figure: output sequence W1, W2, ..., Wn-1, Wn]
5. Probabilities to be used, contd.
- Exercise 1: Prove the following
6. Start of Baum-Welch algorithm
[Figure: HMM with states q and r; arc labels a, a, b, b]
- String: aab aaa aab aaa
- Sequence of states with respect to input symbols
[Table: rows "o/p seq" and "State seq"]
7. Calculating probabilities from table
- Table of counts
- T = number of states
- A = number of alphabet symbols
- Now, if we have non-deterministic transitions, then multiple state sequences are possible for the given o/p seq (ref. to previous slides' feature). Our aim is to find the expected count through this.
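For the deterministic case, where a single state sequence is known for the output sequence, the table of counts can be turned into probabilities by normalizing per source state. The following is a minimal sketch under stated assumptions: the function name `transition_probabilities`, the tuple representation of an arc, and the three-symbol example with its state sequence are all hypothetical and are not taken from the slides.

```python
from collections import Counter, defaultdict


def transition_probabilities(output_seq, state_seq):
    """Estimate P(src -sym-> dst) from one observed state sequence.

    state_seq must have one more element than output_seq, since each
    symbol is emitted on the arc between two consecutive states.
    """
    if len(state_seq) != len(output_seq) + 1:
        raise ValueError("state_seq must be one longer than output_seq")

    # Count every (source state, symbol, destination state) arc that occurs.
    counts = Counter(zip(state_seq, output_seq, state_seq[1:]))

    # Total number of arcs leaving each source state (the denominator).
    totals = defaultdict(int)
    for (src, _sym, _dst), c in counts.items():
        totals[src] += c

    return {arc: c / totals[arc[0]] for arc, c in counts.items()}


# Hypothetical example: output "aab" with state sequence q, r, q, q.
probs = transition_probabilities("aab", ["q", "r", "q", "q"])
for arc, p in sorted(probs.items()):
    print(arc, p)
```

When several state sequences are possible, the integer counts in this sketch would be replaced by expected counts, which is the step the last bullet points to.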
8. Interplay Between Two Equations
n(si → sj on wk): No. of times the transition si → sj on symbol wk occurs in the string
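The two equations this slide refers to did not survive extraction. A plausible reconstruction is given below, offered as an assumption rather than as the lecture's exact notation. It uses the symbols s^i, s^j, w_k, T (number of states) and A (number of alphabet symbols) from the surrounding slides. C denotes the expected count, n the per-sequence count of the transition, S a state sequence and W the training string.

```latex
% Assumed reconstruction; the original slide equations were lost.
\begin{align}
P(s^i \xrightarrow{w_k} s^j)
  &= \frac{C(s^i \xrightarrow{w_k} s^j)}
          {\sum_{l=1}^{T}\sum_{m=1}^{A} C(s^i \xrightarrow{w_m} s^l)} \\
C(s^i \xrightarrow{w_k} s^j)
  &= \sum_{S} P(S \mid W)\; n(s^i \xrightarrow{w_k} s^j;\, S, W)
\end{align}
```

The first equation needs the expected counts, and the second needs the probabilities (through P(S | W)). This mutual dependence is why the two are solved by alternation, starting from an initial guess.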
9. Learning probabilities
[Figure: actual (desired) HMM with states q and r; arc labels a:0.67, b:0.17, a:0.16, b:1.0]
[Figure: initial guess HMM with states q and r; arc labels a:0.4, b:0.48, a:0.48, b:1.0]
10. One run of Baum-Welch algorithm: string ababa
[Table: state sequences for the string ababa]
A special symbol is considered as the starting and ending symbol of the input string.
In this way, through multiple iterations, the probability values will converge.
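One such run can be sketched by brute-force enumeration of state sequences, which is feasible only for short strings. The sketch below rests on several assumptions that the slides do not confirm: the arc topology (q to r on a, q to q on b, q to q on a, r to q on b), q as the start state, and the reading of the initial-guess label "a0.4" as 0.04 so that the probabilities leaving q sum to 1. The names `ARCS`, `path_probability` and `baum_welch_step` are hypothetical.

```python
from itertools import product

# Assumed arc-emission HMM: (source, symbol, destination) -> probability.
# Topology and the value 0.04 are assumptions, not confirmed by the slides.
ARCS = {
    ("q", "a", "r"): 0.48,
    ("q", "b", "q"): 0.48,
    ("q", "a", "q"): 0.04,
    ("r", "b", "q"): 1.0,
}
STATES = ("q", "r")
START = "q"


def path_probability(arcs, path, symbols):
    """Joint probability of one state path and the output string."""
    p = 1.0
    for src, sym, dst in zip(path, symbols, path[1:]):
        p *= arcs.get((src, sym, dst), 0.0)  # missing arc => probability 0
    return p


def baum_welch_step(arcs, symbols, start=START, states=STATES):
    """One re-estimation step; returns (new arc probabilities, P(string))."""
    weighted = {arc: 0.0 for arc in arcs}
    likelihood = 0.0
    # Enumerate every state sequence that begins in the start state.
    for tail in product(states, repeat=len(symbols)):
        path = (start,) + tail
        p = path_probability(arcs, path, symbols)
        if p == 0.0:
            continue
        likelihood += p
        # Each arc used on this path contributes the path's joint probability.
        for src, sym, dst in zip(path, symbols, path[1:]):
            weighted[(src, sym, dst)] += p
    if likelihood == 0.0:
        raise ValueError("the string has zero probability under this model")

    # Dividing by P(string) turns joint weights into expected counts.
    expected = {arc: w / likelihood for arc, w in weighted.items()}

    # Normalize expected counts over all arcs leaving the same source state.
    totals = {}
    for (src, _sym, _dst), c in expected.items():
        totals[src] = totals.get(src, 0.0) + c
    new_arcs = {
        arc: (c / totals[arc[0]] if totals[arc[0]] > 0 else arcs[arc])
        for arc, c in expected.items()
    }
    return new_arcs, likelihood


new_arcs, likelihood = baum_welch_step(ARCS, "ababa")
for arc, p in sorted(new_arcs.items()):
    print(arc, round(p, 4))
```

Calling `baum_welch_step` repeatedly on its own output gives the multiple iterations mentioned above. Enumeration grows exponentially with the string length, so this form is suitable only for illustrating a single run on a short string.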
11. Applying Naïve Bayes
Hence, multiplying the transition probabilities is valid.
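The derivation behind this conclusion was lost with the slide's equation. A sketch of the usual argument is given below, written under the assumption that each transition (next state together with the emitted symbol) depends only on the current state. The indexing W_1 ... W_n for symbols and S_0 ... S_n for states is also an assumption.

```latex
% Assumed reconstruction of the factorization; not the slide's own equation.
\begin{align}
P(W_{1,n}, S_{1,n} \mid S_0)
  &= \prod_{k=1}^{n} P(W_k, S_k \mid S_0, W_{1,k-1}, S_{1,k-1})
     && \text{(chain rule, exact)} \\
  &\approx \prod_{k=1}^{n} P(W_k, S_k \mid S_{k-1})
     && \text{(independence assumption)} \\
  &= \prod_{k=1}^{n} P(S_{k-1} \xrightarrow{W_k} S_k)
\end{align}
```

Under this assumption the probability of a state sequence together with the string is the product of the probabilities of the arcs it traverses.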
12. Discussions
- Symmetry breaking
- Example: symmetry breaking leads to no change in initial values
- Stuck in local maxima
- Label bias problem
- Probabilities have to sum to 1.
- Values can rise at the cost of a fall in the values of others.
[Figure: two HMMs, "Desired" and "Initialized"; arc labels in extraction order: a:0.5, b:0.25, a:0.25, b:1.0, a:0.5, a:0.5, b:0.5, a:1.0, b:0.5, a:0.25, b:0.5, b:0.5]
13. Computational part
Exercise 2: What is the complexity of calculating the above expression? Hint: to find this, first solve Exercise 1, i.e., understand how the probability of a given string can be represented as