Title: FSA and HMM
Outline
- FSA
- HMM
- Relation between FSA and HMM
FSA

Definition of FSA
- An FSA is a tuple $(Q, \Sigma, I, F, \delta)$:
  - $Q$: a finite set of states
  - $\Sigma$: a finite set of input symbols
  - $I$: the set of initial states
  - $F$: the set of final states
  - $\delta \subseteq Q \times \Sigma \times Q$: the transition relation between states
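A minimal sketch (not from the slides) of how this definition can be run on an input: $\delta$ is stored as (state, symbol, state) triples, and the toy automaton at the end is hypothetical; epsilon transitions are not handled.

```python
def accepts(delta, initials, finals, s):
    """Does the FSA (delta, initials, finals) accept the string s?"""
    current = set(initials)              # all states reachable so far
    for ch in s:
        current = {dst for (src, sym, dst) in delta
                   if src in current and sym == ch}
    return bool(current & set(finals))   # accept iff a final state is reached

# Hypothetical two-state automaton: q0 -a-> q1, q1 -b-> q1
delta = {("q0", "a", "q1"), ("q1", "b", "q1")}
print(accepts(delta, {"q0"}, {"q1"}, "abb"))   # True
print(accepts(delta, {"q0"}, {"q1"}, "ba"))    # False
```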
An example of FSA
[Figure: a two-state FSA with states q0 and q1 and arcs labeled a and b]
Definition of FST
- An FST is a tuple $(Q, \Sigma, \Gamma, I, F, \delta)$:
  - $Q$: a finite set of states
  - $\Sigma$: a finite set of input symbols
  - $\Gamma$: a finite set of output symbols
  - $I$: the set of initial states
  - $F$: the set of final states
  - $\delta \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times (\Gamma \cup \{\epsilon\}) \times Q$: the transition relation between states
- → An FSA can be seen as a special case of an FST.
- The extended transition relation $\hat{\delta}$ is the smallest set such that
  - $(q, \epsilon, \epsilon, q) \in \hat{\delta}$ for every state $q$, and
  - if $(q, x, y, q') \in \hat{\delta}$ and $(q', a, b, q'') \in \delta$, then $(q, xa, yb, q'') \in \hat{\delta}$.
- T transduces a string x into a string y if there exists a path from an initial state to a final state whose input is x and whose output is y.
An example of FST
Operations on FSTs
- Union
- Concatenation
- Composition
An example of the composition operation
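The figure for this example was lost in extraction. As a stand-in, here is a rough sketch (my own, not from the slides) of the product construction for composing two epsilon-free FSTs given as arc lists; real WFST toolkits such as Carmel handle the general weighted case with epsilons.

```python
def compose(arcs1, arcs2):
    """Compose two epsilon-free FSTs (product construction).

    Arcs are (src, in_sym, out_sym, dst) tuples.  The result maps x to z
    iff the first FST maps x to some y and the second maps y to z.
    Initial/final states (pairs of component states) are omitted here.
    """
    return {((p1, p2), a, c, (q1, q2))
            for (p1, a, b, q1) in arcs1
            for (p2, b2, c, q2) in arcs2
            if b == b2}                  # T1's output must match T2's input

# Toy example (made up): T1 rewrites a as b, T2 rewrites b as c,
# so their composition rewrites a as c.
t1 = {("p", "a", "b", "p")}
t2 = {("q", "b", "c", "q")}
print(compose(t1, t2))                   # {(('p', 'q'), 'a', 'c', ('p', 'q'))}
```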
Probabilistic finite-state automata (PFA)
- Informally, in a PFA each arc is associated with a probability.
- The probability of a path is the product of the probabilities of the arcs on the path.
- The probability of a string x is the sum of the probabilities of all the paths for x.
- Tasks:
  - Given a string x, find the best path for x.
  - Given a string x, find the probability of x in a PFA.
  - Find the string with the highest probability in a PFA.
Formal definition of PFA
- A PFA is a tuple $(Q, \Sigma, I, F, \delta, P)$:
  - $Q$: a finite set of N states
  - $\Sigma$: a finite set of input symbols
  - $I: Q \rightarrow \mathbb{R}^+$ (initial-state probabilities)
  - $F: Q \rightarrow \mathbb{R}^+$ (final-state probabilities)
  - $\delta \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$: the transition relation between states
  - $P: \delta \rightarrow \mathbb{R}^+$ (transition probabilities)
Constraints on the functions
$$\sum_{q \in Q} I(q) = 1 \qquad\qquad \forall q \in Q:\ F(q) + \sum_{(q, a, q') \in \delta} P(q, a, q') = 1$$

Probability of a string
- The probability of a path $q_0 \xrightarrow{a_1} q_1 \cdots \xrightarrow{a_n} q_n$ is $I(q_0) \cdot \prod_{i=1}^{n} P(q_{i-1}, a_i, q_i) \cdot F(q_n)$.
- The probability of a string x is the sum of the probabilities of all of its paths: $P(x) = \sum_{\pi \in \mathrm{Paths}(x)} P(\pi)$.
Consistency of a PFA
- Let A be a PFA.
- Def: $P(x \mid A)$ = the sum of the probabilities of all the valid paths for x in A.
- Def: a valid path in A is a path for some string x with probability greater than 0.
- Def: A is called consistent if $\sum_x P(x \mid A) = 1$.
- Def: a state of a PFA is useful if it appears in at least one valid path.
- Proposition: a PFA is consistent if all its states are useful.
- → Q1 of Hw1
An example of PFA
[Figure: a two-state PFA with states q0 and q1]
- $I(q_0) = 1.0$, $I(q_1) = 0.0$
- $P(ab^n) = 0.2 \cdot 0.8^n$
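The diagram is lost, but one PFA consistent with these numbers (my reconstruction, an assumption) has an a-arc from q0 to q1 with probability 1.0, a b-loop on q1 with probability 0.8, and $F(q_1) = 0.2$. The sketch below sums path probabilities with a forward-style dynamic program and reproduces $P(ab^n) = 0.2 \cdot 0.8^n$:

```python
# Reconstructed guess at the example PFA (the original figure is lost).
I = {"q0": 1.0, "q1": 0.0}            # initial-state probabilities
F = {"q0": 0.0, "q1": 0.2}            # final-state probabilities
P = {("q0", "a", "q1"): 1.0,          # arc probabilities:
     ("q1", "b", "q1"): 0.8}          # (src, symbol, dst) -> prob

def string_prob(x):
    """P(x) = sum of path probabilities, computed state by state."""
    alpha = dict(I)                   # prob. of reaching each state so far
    for ch in x:
        alpha = {q: sum(p * alpha[src]
                        for (src, sym, dst), p in P.items()
                        if sym == ch and dst == q)
                 for q in alpha}
    return sum(alpha[q] * F[q] for q in alpha)

for n in range(4):
    print(string_prob("a" + "b" * n), 0.2 * 0.8 ** n)   # the two columns agree
```

Note that $\sum_{n \ge 0} 0.2 \cdot 0.8^n = 1$, so this PFA is consistent, matching the proposition above (both of its states are useful).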
Weighted finite-state automata (WFA)
- Each arc is associated with a weight.
- "Sum" and "multiplication" can be given other meanings (e.g., the weights can come from a semiring).
HMM
Two types of HMMs
- State-emission HMM (Moore machine)
  - The emission probability depends only on the state (the from-state or the to-state).
- Arc-emission HMM (Mealy machine)
  - The emission probability depends on the (from-state, to-state) pair.
State-emission HMM
[Figure: states s1, s2, ..., sN, each emitting output symbols such as w1, w3, w4, w5]
- Two kinds of parameters:
  - Transition probability $P(s_j \mid s_i)$
  - Output (emission) probability $P(w_k \mid s_i)$
- Number of parameters: $O(NM + N^2)$
Arc-emission HMM
[Figure: states s1, s2, ..., sN, with output symbols such as w1, ..., w5 emitted on the arcs]
- Same kinds of parameters, but the emission probabilities depend on both states: $P(w_k, s_j \mid s_i)$.
- Number of parameters: $O(N^2 M + N^2)$
Are the two types of HMMs equivalent?
- For each state-emission HMM$_1$, there is an arc-emission HMM$_2$ such that for any sequence O, $P(O \mid \mathrm{HMM}_1) = P(O \mid \mathrm{HMM}_2)$.
- The reverse is also true.
- → Q3 and Q4 of Hw1
Definition of arc-emission HMM
- An HMM is a tuple $(S, \Sigma, \pi, A, B)$:
  - A set of states $S = \{s_1, s_2, \ldots, s_N\}$
  - A set of output symbols $\Sigma = \{w_1, \ldots, w_M\}$
  - Initial state probabilities $\pi = \{\pi_i\}$
  - State transition probabilities $A = \{a_{ij}\}$
  - Symbol emission probabilities $B = \{b_{ijk}\}$
- Notation: state sequence $X_{1,n} = X_1 \cdots X_n$; output sequence $O_{1,n} = o_1 \cdots o_n$
Constraints
$$\sum_{i=1}^{N} \pi_i = 1 \qquad \forall i: \sum_{j=1}^{N} a_{ij} = 1 \qquad \forall i, j: \sum_{k=1}^{M} b_{ijk} = 1$$
For any integer n and any HMM:
$$\sum_{O_{1,n}} P(O_{1,n}) = 1$$
→ Q2 of Hw1
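These constraints can be checked numerically. Below is a sketch with made-up toy parameters ($N = M = 2$); it computes $P(O)$ by brute-force enumeration of state sequences (usable only for tiny n) and confirms that the probabilities of all $M^n$ output sequences of length n sum to 1:

```python
from itertools import product

# Hypothetical toy arc-emission HMM (N = 2 states, M = 2 symbols).
pi = [1.0, 0.0]                       # initial state probabilities
A = [[0.6, 0.4], [0.3, 0.7]]          # A[i][j] = a_ij
B = [[[0.8, 0.2], [0.5, 0.5]],        # B[i][j][k] = b_ijk, the probability of
     [[0.1, 0.9], [0.4, 0.6]]]        # emitting w_k on the arc s_i -> s_j
N, M = 2, 2

def seq_prob(O):
    """P(O) by summing over all state sequences (brute force)."""
    n, total = len(O), 0.0
    for X in product(range(N), repeat=n + 1):
        p = pi[X[0]]
        for t in range(n):
            p *= A[X[t]][X[t + 1]] * B[X[t]][X[t + 1]][O[t]]
        total += p
    return total

n = 3
print(sum(seq_prob(O) for O in product(range(M), repeat=n)))   # -> 1.0 (up to rounding)
```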
Properties of HMM
- Limited horizon (the Markov assumption): $P(X_{t+1} = s_j \mid X_{1,t}) = P(X_{t+1} = s_j \mid X_t)$
- Time invariance: the probabilities do not change over time.
- The states are "hidden": we know the structure of the machine (i.e., $S$ and $\Sigma$), but we don't know which state sequence generated a particular output.
Applications of HMM
- N-gram POS tagging
  - Bigram tagger: $o_i$ is a word, and $s_i$ is a POS tag.
  - Trigram tagger: $o_i$ is a word, and $s_i$ is ??
- Other tagging problems
  - Word segmentation
  - Chunking
  - NE tagging
  - Punctuation prediction
  - ...
- Other applications: ASR, ...
Three fundamental questions for HMMs
- Finding the probability of an observation
- Finding the best state sequence
- Training: estimating parameters
(1) Finding the probability of the observation
- Forward probability: the probability of producing $O_{1,t-1}$ while ending up in state $s_i$:
$$\alpha_i(t) = P(O_{1,t-1}, X_t = s_i)$$
Calculating forward probability
- Initialization: $\alpha_i(1) = \pi_i$
- Induction: $\alpha_j(t+1) = \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{ij o_t}$
- The probability of the whole observation is then $P(O_{1,T}) = \sum_{i=1}^{N} \alpha_i(T+1)$.
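A direct transcription of this recurrence into code (a sketch; the toy parameters are made up and follow the arc-emission definition above):

```python
def forward(pi, A, B, O):
    """Forward pass for an arc-emission HMM.

    alpha[t-1][i] holds alpha_i(t) = P(O_{1,t-1}, X_t = s_i);
    Python index 0 corresponds to t = 1 (the initialization).
    """
    N = len(pi)
    alpha = [list(pi)]                          # alpha_i(1) = pi_i
    for o in O:                                 # induction, one step per symbol
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] * B[i][j][o] for i in range(N))
                      for j in range(N)])
    return alpha

# Made-up toy parameters; P(O_{1,T}) is the sum of the last alpha row.
pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[[0.8, 0.2], [0.5, 0.5]],
     [[0.1, 0.9], [0.4, 0.6]]]
print(sum(forward(pi, A, B, [0, 1, 1])[-1]))    # P(O) for O = w1 w2 w2
```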
(2) Finding the best state sequence
- Given the observation $O_{1,T} = o_1 \cdots o_T$, find the state sequence $X_{1,T+1} = X_1 \cdots X_{T+1}$ that maximizes $P(X_{1,T+1} \mid O_{1,T})$.
- → Viterbi algorithm
Viterbi algorithm
- $\delta_i(t)$: the probability of the best path that produces $O_{1,t-1}$ while ending up in state $s_i$
- Initialization: $\delta_i(1) = \pi_i$
- Induction: $\delta_j(t+1) = \max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{ij o_t}$, keeping a back-pointer $\psi_j(t+1) = \arg\max_i \delta_i(t)\, a_{ij}\, b_{ij o_t}$ so the best path can be read off at the end.
- → Modify it to allow epsilon emission: Q5 of Hw1.
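The same recurrence in code, with max in place of sum plus back-pointers (a sketch reusing the made-up toy parameters; epsilon emission, as in Q5, is not handled):

```python
def viterbi(pi, A, B, O):
    """Best state sequence X_{1,T+1} for an arc-emission HMM."""
    N = len(pi)
    delta = list(pi)                     # delta_i(1) = pi_i
    psi = []                             # back-pointers, one row per step
    for o in O:
        best = [max(range(N), key=lambda i: delta[i] * A[i][j] * B[i][j][o])
                for j in range(N)]
        delta = [delta[best[j]] * A[best[j]][j] * B[best[j]][j][o]
                 for j in range(N)]
        psi.append(best)
    path = [max(range(N), key=lambda i: delta[i])]   # best final state
    for best in reversed(psi):                       # follow back-pointers
        path.append(best[path[-1]])
    return list(reversed(path))          # state indices for X_1 .. X_{T+1}

pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[[0.8, 0.2], [0.5, 0.5]],
     [[0.1, 0.9], [0.4, 0.6]]]
print(viterbi(pi, A, B, [0, 1, 1]))      # [0, 0, 1, 1]
```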
Summary of HMM
- Two types of HMMs: state-emission and arc-emission
- Properties: the Markov assumption
- Applications: POS tagging, etc.
- Finding the probability of an observation: forward probability
- Decoding: Viterbi decoding
Relation between FSA and HMM

Relation between WFA and HMM
- An HMM can be seen as a special type of WFA.
- Given an HMM, how do we build an equivalent WFA?
Converting an HMM into a WFA
- Given an HMM, build a WFA such that, for any input sequence O, $P(O \mid \mathrm{HMM}) = P(O \mid \mathrm{WFA})$:
  - Build the WFA: add a final state, and arcs to it.
  - Show that there is a one-to-one mapping between the paths in the HMM and the paths in the WFA.
  - Prove that the corresponding path probabilities in the HMM and in the WFA are identical.
HMM → WFA
Need to create a new state (the final state) and add edges to it.
→ The resulting WFA is not a PFA.
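A sketch of this construction (the weight-1 epsilon edges to the final state are my assumption; the slide only says that edges to the new state are added): each HMM arc $s_i \rightarrow s_j$ emitting $w_k$ becomes a WFA arc of weight $a_{ij} b_{ijk}$, and every state gets an epsilon arc of weight 1 to the new final state, so reading may stop after any prefix. The total weight leaving each state then exceeds 1, which is why the result is a WFA but not a PFA.

```python
def hmm_to_wfa(pi, A, B):
    """Convert an arc-emission HMM into a WFA (sketch).

    Returns (arcs, initial_weights, final_state), where arcs are
    (src, symbol, dst, weight) tuples and symbol None stands for epsilon.
    """
    N = len(pi)
    arcs = []
    for i in range(N):
        for j in range(N):
            for k in range(len(B[i][j])):
                # read w_k while moving s_i -> s_j
                arcs.append((i, k, j, A[i][j] * B[i][j][k]))
        # assumed: a weight-1 epsilon arc to the new final state
        arcs.append((i, None, "qf", 1.0))
    return arcs, {i: pi[i] for i in range(N)}, "qf"
```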
A slightly different definition of HMM
- An HMM is a tuple $(S, \Sigma, \pi, A, B, q_f)$:
  - A set of states $S = \{s_1, s_2, \ldots, s_N\}$
  - A set of output symbols $\Sigma = \{w_1, \ldots, w_M\}$
  - Initial state probabilities $\pi = \{\pi_i\}$
  - State transition probabilities $A = \{a_{ij}\}$
  - Symbol emission probabilities $B = \{b_{ijk}\}$
  - $q_f$ is the final state; there are no outgoing edges from $q_f$.
Constraints
For any HMM (under this new definition):
$$\sum_{O} P(O) = 1$$
(The sum is over output sequences of all lengths, as with a consistent PFA.)
HMM → PFA

PFA → HMM
→ Need to add a new final state and edges to it.
Project Part 1
- Learn to use Carmel (a WFST package).
- Use Carmel as an HMM Viterbi decoder for a trigram POS tagger.
- The instructions will be handed out on 1/12, and the project is due on 1/19.
Summary
- FSA
- HMM
- Relation between FSA and HMM
  - HMM (the common definition) is a special case of WFA.
  - HMM (a slightly different definition) is equivalent to PFA.