1
Hidden Markov Models
Lecture 6, Thursday April 17, 2003
2
Review of Last Lecture
3
Decoding
  • GIVEN x = x1x2…xN
  • We want to find π = π1, …, πN,
  • such that P[x, π] is maximized
  • π* = argmaxπ P[x, π]
  • We can use dynamic programming!
  • Let Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
  • Probability of the most likely sequence of states ending at state πi = k

4
The Viterbi Algorithm
[Figure: the Viterbi dynamic-programming matrix Vj(i), with states 1, 2, …, K as rows and positions x1 x2 x3 … xN as columns]

  • Similar to aligning a set of states to a sequence
  • Time: O(K²N)
  • Space: O(KN)

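As a concrete illustration of the recursion on this slide, here is a minimal Python sketch of Viterbi (not part of the original lecture). It assumes a hypothetical dict-based HMM: a0[k] initial probabilities, a[k][l] transition probabilities, e[k][b] emission probabilities, and states given as a list.

    # Minimal Viterbi sketch; assumes the hypothetical dict-based HMM above.
    def viterbi(x, states, a0, a, e):
        V = [{k: a0[k] * e[k][x[0]] for k in states}]        # V_k(1)
        ptr = [{}]                                           # back-pointers
        for i in range(1, len(x)):
            V.append({}); ptr.append({})
            for l in states:
                # V_l(i) = e_l(x_i) * max_k V_k(i-1) * a_kl
                best_k = max(states, key=lambda k: V[i - 1][k] * a[k][l])
                V[i][l] = e[l][x[i]] * V[i - 1][best_k] * a[best_k][l]
                ptr[i][l] = best_k
        # traceback of the most likely state path, ending in the best state
        last = max(states, key=lambda k: V[-1][k])
        path = [last]
        for i in range(len(x) - 1, 0, -1):
            path.append(ptr[i][path[-1]])
        return list(reversed(path)), V[-1][last]

A dict-based representation keeps the sketch close to the slide notation; a real implementation would typically use arrays and work in log space (see the implementation-techniques slide below).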
5
Evaluation
  • Compute:
  • P(x): probability of x given the model
  • P(xi…xj): probability of a substring of x given the model
  • P(πi = k | x): probability that the ith state is k, given x

6
The Forward Algorithm
  • We can compute fk(i) for all k, i, using dynamic
    programming!
  • Initialization:
  • f0(0) = 1
  • fk(0) = 0, for all k > 0
  • Iteration:
  • fl(i) = el(xi) Σk fk(i-1) akl
  • Termination:
  • P(x) = Σk fk(N) ak0
  • where ak0 is the probability that the terminating state is k (usually a0k)

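A matching sketch of the Forward recursion, under the same assumed dict-based interface as the Viterbi sketch above; for simplicity it folds the silent start state into the initialization (fk(1) = a0k ek(x1)) and takes ak0 = 1 for every state at termination.

    # Forward-algorithm sketch (same assumed interface as viterbi()).
    def forward(x, states, a0, a, e):
        f = [{k: a0[k] * e[k][x[0]] for k in states}]
        for i in range(1, len(x)):
            # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
            f.append({l: e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
                      for l in states})
        px = sum(f[-1][k] for k in states)                   # P(x)
        return f, px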
7
The Backward Algorithm
  • We can compute bk(i) for all k, i, using dynamic
    programming
  • Initialization:
  • bk(N) = ak0, for all k
  • Iteration:
  • bk(i) = Σl el(xi+1) akl bl(i+1)
  • Termination:
  • P(x) = Σl a0l el(x1) bl(1)

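And a mirror-image sketch of the Backward recursion, with the same assumptions (ak0 = 1, so bk(N) = 1); its termination value should reproduce the same P(x) as forward(), which is a handy sanity check.

    # Backward-algorithm sketch, mirroring forward().
    def backward(x, states, a0, a, e):
        N = len(x)
        b = [dict() for _ in range(N)]
        b[N - 1] = {k: 1.0 for k in states}                  # b_k(N) = a_k0 = 1 here
        for i in range(N - 2, -1, -1):
            # b_k(i) = sum_l e_l(x_{i+1}) a_kl b_l(i+1)
            b[i] = {k: sum(e[l][x[i + 1]] * a[k][l] * b[i + 1][l] for l in states)
                    for k in states}
        px = sum(a0[l] * e[l][x[0]] * b[0][l] for l in states)
        return b, px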
8
Posterior Decoding
  • We can now calculate
  • P(πi = k | x) = fk(i) bk(i) / P(x)
  • Then, we can ask
  • What is the most likely state at position i of sequence x?
  • Define π^ by Posterior Decoding:
  • π^i = argmaxk P(πi = k | x)

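Combining the two previous sketches gives posterior decoding in a few lines; again this is an illustrative sketch, not code from the lecture.

    # Posterior-decoding sketch built from forward() and backward():
    # pick argmax_k f_k(i) b_k(i) / P(x) at every position i.
    def posterior_decode(x, states, a0, a, e):
        f, px = forward(x, states, a0, a, e)
        b, _ = backward(x, states, a0, a, e)
        return [max(states, key=lambda k: f[i][k] * b[i][k] / px)
                for i in range(len(x))]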
9
Today
  • Example: CpG Islands
  • Learning

10
Implementation Techniques
  • Viterbi: sum of logs
  • Vl(i) = log el(xi) + maxk [Vk(i-1) + log akl]
  • Forward/Backward: scaling by c(i)
  • One way to perform scaling:
  • fl(i) = c(i) × el(xi) Σk fk(i-1) akl
  • where c(i) = 1/(Σk fk(i))
  • bl(i): use the same factors c(i)
  • Details in Rabiner's Tutorial on HMMs, 1989

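For illustration, a sum-of-logs version of the Viterbi score under the same assumed interface; it presumes all parameters are nonzero (e.g. after pseudocounts, discussed later) so that log() is always defined.

    # Sum-of-logs Viterbi score sketch; returns log P(x, pi*) of the best path.
    from math import log
    def viterbi_log_score(x, states, a0, a, e):
        V = {k: log(a0[k]) + log(e[k][x[0]]) for k in states}
        for xi in x[1:]:
            # V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]
            V = {l: log(e[l][xi]) + max(V[k] + log(a[k][l]) for k in states)
                 for l in states}
        return max(V.values())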
11
A modeling Example
  • CpG islands in DNA sequences

12
Example: CpG Islands
CpG dinucleotides in the genome are frequently
methylated (we write CpG so as not to confuse it with the CG
base pair): C → methyl-C → T. Methylation is
often suppressed around genes and promoters →
CpG islands
13
Example: CpG Islands
  • In CpG islands,
  • CG is more frequent
  • Other pairs (AA, AG, AT) have different
    frequencies
  • Question: Detect CpG islands computationally

14
A model of CpG Islands (1) Architecture
[Figure: two sets of four states. A+, C+, G+, T+ model the "CpG island" region; A-, C-, G-, T- model the "not CpG island" region.]
15
A model of CpG Islands (2) Transitions
  • How do we estimate the parameters of the model?
  • Emission probabilities: 1/0
  • Transition probabilities within CpG islands:
  • Established from many known (experimentally verified) CpG islands (training set)
  • Transition probabilities within other regions:
  • Established from many known non-CpG islands

+   A     C     G     T
A  .180  .274  .426  .120
C  .171  .368  .274  .188
G  .161  .339  .375  .125
T  .079  .355  .384  .182

-   A     C     G     T
A  .300  .205  .285  .210
C  .233  .298  .078  .302
G  .248  .246  .298  .208
T  .177  .239  .292  .292
16
Parenthesis: log likelihoods
A better way to see the effect of transitions: log
likelihoods L(u, v) = log [ P(uv | +) / P(uv | -) ].
Given a region x = x1…xN, a quick-and-dirty
way to decide whether the entire x is CpG:
P(x is CpG) > P(x is not CpG) ⇔ Σi L(xi, xi+1) > 0

    A       C       G       T
A  -0.740   0.419   0.580  -0.803
C  -0.913   0.302   1.812  -0.685
G  -0.624   0.461   0.331  -0.730
T  -1.169   0.573   0.393  -0.679
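The quick-and-dirty test is easy to state in code. A sketch, assuming the two transition tables above are available as dicts of dicts named plus_table and minus_table (hypothetical names):

    # Quick-and-dirty CpG test sketch: build L(u, v) = log2 P(uv|+)/P(uv|-)
    # from the two transition tables, then sum L over adjacent pairs of x.
    from math import log2
    def looks_like_cpg(x, plus_table, minus_table):
        L = {u: {v: log2(plus_table[u][v] / minus_table[u][v])
                 for v in "ACGT"} for u in "ACGT"}
        return sum(L[u][v] for u, v in zip(x, x[1:])) > 0    # positive => "CpG"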
17
A model of CpG Islands (2) Transitions
  • What about transitions between (+) and (-) states?
  • They affect
  • Avg. length of CpG island
  • Avg. separation between two CpG islands

Length distribution of region X:
P[lX = 1] = 1-p
P[lX = 2] = p(1-p)
…
P[lX = k] = p^(k-1) (1-p)
E[lX] = 1/(1-p)
Exponential distribution, with mean 1/(1-p)

[Figure: two-state diagram. X self-loops with probability p and moves to Y with probability 1-p; Y self-loops with probability q and moves to X with probability 1-q.]
18
A model of CpG Islands (2) Transitions
  • No reason to favor exiting/entering (+) and (-) regions at a particular nucleotide
  • To determine transition probabilities between (+) and (-) states:
  • Estimate the average length of a CpG island: lCPG = 1/(1-p) ⇒ p = 1 - 1/lCPG
  • For each pair of (+) states k, l, let akl ← p × akl
  • For each (+) state k and (-) state l, let akl = (1-p)/4
  • (better: take the frequency of l in the (-) regions into account)
  • Do the same for the (-) states
  • A problem with this model: CpG islands don't have an exponential length distribution
  • This is a defect of HMMs, compensated by ease of analysis and computation

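A small sketch of this recipe (illustrative only, with hypothetical argument names): scale the within-(+) transitions by p, spread the remaining 1-p evenly over the four (-) states, and do the symmetric thing for (-) with its own stay-probability q.

    # Combine the "+" and "-" transition tables into one 8-state matrix.
    def combine(plus, minus, mean_island_len, mean_gap_len):
        p = 1 - 1.0 / mean_island_len          # p = 1 - 1/l_CpG
        q = 1 - 1.0 / mean_gap_len
        a = {}
        for u in "ACGT":
            a[u + "+"] = {v + "+": p * plus[u][v] for v in "ACGT"}
            a[u + "+"].update({v + "-": (1 - p) / 4 for v in "ACGT"})
            a[u + "-"] = {v + "-": q * minus[u][v] for v in "ACGT"}
            a[u + "-"].update({v + "+": (1 - q) / 4 for v in "ACGT"})
        return a                               # each row sums to 1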
19
Applications of the model
  • Given a DNA region x,
  • The Viterbi algorithm predicts locations of CpG
    islands
  • Given a nucleotide xi (say xi = A),
  • The Viterbi parse tells whether xi is in a CpG island in the most likely general scenario
  • The Forward/Backward algorithms can calculate
  • P(xi is in a CpG island) = P(πi = A+ | x)
  • Posterior Decoding can assign locally optimal predictions of CpG islands
  • π^i = argmaxk P(πi = k | x)

20
What if a new genome comes?
  • We just sequenced the porcupine genome
  • We know CpG islands play the same role in this
    genome
  • However, we have no known CpG islands for
    porcupines
  • We suspect the frequency and characteristics of
    CpG islands are quite different in porcupines
  • How do we adjust the parameters in our model?
  • - LEARNING

21
Problem 3 Learning
  • Re-estimate the parameters of the model based on
    training data

22
Two learning scenarios
  • Estimation when the right answer is known
  • Examples
  • GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG islands
  • GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
  • Estimation when the right answer is unknown
  • Examples
  • GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
  • GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice
  • QUESTION: Update the parameters θ of the model to maximize P(x | θ)

23
1. When the right answer is known
  • Given x = x1…xN
  • for which the true π = π1…πN is known,
  • Define:
  • Akl = # of times the k→l transition occurs in π
  • Ek(b) = # of times state k in π emits b in x
  • We can show that the maximum likelihood parameters θ are:
  • akl = Akl / Σi Aki
  • ek(b) = Ek(b) / Σc Ek(c)

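A minimal sketch of this maximum-likelihood estimation from labeled data, using the same dict conventions as the earlier sketches: count transitions and emissions along the known path π, then normalize each row.

    # ML estimation from a labeled sequence (x, pi). States never visited
    # in pi simply keep zero rows in this sketch.
    def ml_estimate(x, pi, states, alphabet):
        A = {k: {l: 0.0 for l in states} for k in states}
        E = {k: {b: 0.0 for b in alphabet} for k in states}
        for i in range(len(x)):
            E[pi[i]][x[i]] += 1
            if i + 1 < len(x):
                A[pi[i]][pi[i + 1]] += 1
        a = {k: {l: A[k][l] / max(sum(A[k].values()), 1) for l in states}
             for k in states}
        e = {k: {b: E[k][b] / max(sum(E[k].values()), 1) for b in alphabet}
             for k in states}
        return a, e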
24
1. When the right answer is known
  • Intuition: when we know the underlying states,
  • the best estimate is the average frequency of transitions and emissions that occur in the training data
  • Drawback:
  • Given little data, there may be overfitting:
  • P(x|θ) is maximized, but θ is unreasonable
  • 0 probabilities: VERY BAD
  • Example:
  • Given 10 casino rolls, we observe
  • x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
  • π = F, F, F, F, F, F, F, F, F, F
  • Then:
  • aFF = 1; aFL = 0
  • eF(1) = eF(3) = eF(6) = .2
  • eF(2) = .3; eF(4) = 0; eF(5) = .1

25
Pseudocounts
  • Solution for small training sets:
  • Add pseudocounts
  • Akl = # of times the k→l transition occurs in π + rkl
  • Ek(b) = # of times state k in π emits b in x + rk(b)
  • rkl, rk(b) are pseudocounts representing our prior belief
  • Larger pseudocounts → strong prior belief
  • Small pseudocounts (ε < 1): just to avoid 0 probabilities

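In code, pseudocounts just change where the counters start. A sketch using one scalar pseudocount for all transitions and one for all emissions, which is a simplification of the per-entry rkl and rk(b) above:

    # Pseudocount variant of ml_estimate(): counters start at the prior counts.
    def ml_estimate_with_pseudocounts(x, pi, states, alphabet, r_trans, r_emit):
        A = {k: {l: float(r_trans) for l in states} for k in states}
        E = {k: {b: float(r_emit) for b in alphabet} for k in states}
        for i in range(len(x)):
            E[pi[i]][x[i]] += 1
            if i + 1 < len(x):
                A[pi[i]][pi[i + 1]] += 1
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states}
             for k in states}
        e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet}
             for k in states}
        return a, e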
26
Pseudocounts
  • Example: dishonest casino
  • We will observe the player for one day, 500 rolls
  • Reasonable pseudocounts:
  • r0F = r0L = rF0 = rL0 = 1
  • rFL = rLF = rFF = rLL = 1
  • rF(1) = rF(2) = … = rF(6) = 20 (strong belief: fair is fair)
  • rL(1) = rL(2) = … = rL(6) = 5 (wait and see for loaded)
  • The above is pretty arbitrary; assigning priors is an art

27
2. When the right answer is unknown
  • We don't know the true Akl, Ek(b)
  • Idea
  • We estimate our best guess on what Akl, Ek(b)
    are
  • We update the parameters of the model, based on
    our guess
  • We repeat

28
2. When the right answer is unknown
  • Starting with our best guess of a model M, parameters θ:
  • Given x = x1…xN
  • for which the true π = π1…πN is unknown,
  • We can get to a provably more likely parameter set θ
  • Principle: EXPECTATION MAXIMIZATION
  • 1. Estimate Akl, Ek(b) in the training data
  • 2. Update θ according to Akl, Ek(b)
  • 3. Repeat 1 & 2, until convergence

29
Estimating new parameters
  • To estimate Akl:
  • At each position i of sequence x,
  • Find the probability that transition k→l is used:
  • P(πi = k, πi+1 = l | x) = [1/P(x)] P(πi = k, πi+1 = l, x1…xN) = Q/P(x)
  • where Q = P(x1…xi, πi = k, πi+1 = l, xi+1…xN)
  •   = P(πi+1 = l, xi+1…xN | πi = k) P(x1…xi, πi = k)
  •   = P(πi+1 = l, xi+1 xi+2…xN | πi = k) fk(i)
  •   = P(xi+2…xN | πi+1 = l) P(xi+1 | πi+1 = l) P(πi+1 = l | πi = k) fk(i)
  •   = bl(i+1) el(xi+1) akl fk(i)
  • So: P(πi = k, πi+1 = l | x, θ) = fk(i) akl el(xi+1) bl(i+1) / P(x | θ)

30
Estimating new parameters
  • So,
  • Akl = Σi P(πi = k, πi+1 = l | x, θ) = Σi fk(i) akl el(xi+1) bl(i+1) / P(x | θ)
  • Similarly,
  • Ek(b) = [1/P(x)] Σ{i : xi = b} fk(i) bk(i)

31
Estimating new parameters
  • If we have several training sequences x1, …, xM, each of length N,
  • Akl = Σj Σi P(πi = k, πi+1 = l | xj, θ) = Σj Σi fk(i) akl el(xi+1) bl(i+1) / P(xj | θ)
  • Similarly,
  • Ek(b) = Σj [1/P(xj)] Σ{i : xi = b} fk(i) bk(i)

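A sketch of this expectation step for a single sequence, reusing the forward() and backward() sketches from earlier; it accumulates the expected counts Akl and Ek(b) exactly as in the formulas above.

    # Expected transition and emission counts for one sequence x.
    def expected_counts(x, states, alphabet, a0, a, e):
        f, px = forward(x, states, a0, a, e)
        b, _ = backward(x, states, a0, a, e)
        A = {k: {l: 0.0 for l in states} for k in states}
        E = {k: {c: 0.0 for c in alphabet} for k in states}
        for i in range(len(x)):
            for k in states:
                E[k][x[i]] += f[i][k] * b[i][k] / px         # P(pi_i = k | x)
                if i + 1 < len(x):
                    for l in states:
                        A[k][l] += (f[i][k] * a[k][l] * e[l][x[i + 1]]
                                    * b[i + 1][l] / px)
        return A, E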
32
The Baum-Welch Algorithm
  • Initialization:
  • Pick the best-guess model parameters (or arbitrary ones)
  • Iteration:
  • Forward
  • Backward
  • Calculate Akl, Ek(b)
  • Calculate new model parameters akl, ek(b)
  • Calculate new log-likelihood P(x | θ)
  • GUARANTEED TO BE HIGHER BY EXPECTATION-MAXIMIZATION
  • Until P(x | θ) does not change much

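Tying the pieces together, a compact Baum-Welch sketch built on expected_counts(); for brevity it runs a fixed number of iterations instead of monitoring the change in log-likelihood, keeps a0 fixed, and adds a small pseudocount when normalizing so no probability collapses to zero.

    # Compact Baum-Welch sketch (simplified as described above).
    def baum_welch(x, states, alphabet, a0, a, e, n_iter=50, pseudo=0.01):
        for _ in range(n_iter):
            A, E = expected_counts(x, states, alphabet, a0, a, e)
            a = {k: {l: (A[k][l] + pseudo) /
                        (sum(A[k].values()) + pseudo * len(states))
                     for l in states} for k in states}
            e = {k: {c: (E[k][c] + pseudo) /
                        (sum(E[k].values()) + pseudo * len(alphabet))
                     for c in alphabet} for k in states}
        return a, e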
33
The Baum-Welch Algorithm comments
  • Time complexity:
  • # iterations × O(K²N)
  • Guaranteed to increase the log likelihood of the model
  • P(θ | x) = P(x, θ) / P(x) = P(x | θ) P(θ) / P(x)
  • Not guaranteed to find the globally best parameters
  • Converges to a local optimum, depending on initial conditions
  • Too many parameters / too large a model: overtraining

34
Alternative Viterbi Training
  • Initialization: same
  • Iteration:
  • Perform Viterbi, to find π*
  • Calculate Akl, Ek(b) according to π*, plus pseudocounts
  • Calculate the new parameters akl, ek(b)
  • Until convergence
  • Notes:
  • Convergence is guaranteed. Why?
  • Does not maximize P(x | θ)
  • In general, worse performance than Baum-Welch
  • Convenient when we are only interested in the Viterbi parsing; no need to implement additional procedures (Forward, Backward)!!

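A corresponding sketch of Viterbi training, reusing the viterbi() and ml_estimate_with_pseudocounts() sketches from earlier; the Viterbi path is treated as if it were the true labeling at every iteration.

    # Viterbi-training sketch (a0 is kept fixed, as in the other sketches).
    def viterbi_training(x, states, alphabet, a0, a, e, n_iter=20, pseudo=1.0):
        for _ in range(n_iter):
            path, _ = viterbi(x, states, a0, a, e)
            a, e = ml_estimate_with_pseudocounts(x, path, states, alphabet,
                                                 pseudo, pseudo)
        return a, e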
35
Exercise (submit any time; groups of up to 3)
  • Implement an HMM for the dishonest casino (or any other simple process you feel like)
  • Generate training sequences with the model
  • Implement Baum-Welch and Viterbi training
  • Show a few sets of initial parameters such that
  • Baum-Welch and Viterbi differ significantly, and/or
  • Baum-Welch converges to parameters close to the model, or to unreasonable parameters, depending on the initial parameters
  • Do not use 0-probability transitions
  • Do not use 0s in the initial parameters
  • Do use pseudocounts in Viterbi
  • This exercise will replace the 1-3 lowest
    problems, depending on thoroughness