1. A Modeling Example
- CpG islands in DNA sequences
2. Methylation Silencing
- One way cells differentiate is methylation
- Addition of CH3 to C nucleotides
- Silences genes in the region
- CG (denoted CpG) often mutates to TG when methylated
- In each cell, one copy of the X chromosome is silenced; methylation plays a role
- Methylation is inherited during cell division
3. Example: CpG Islands
- CpG dinucleotides in the genome are frequently methylated (we write CpG so as not to confuse it with a C-G base pair)
- C → methyl-C → T
- Methylation is often suppressed around genes and promoters → CpG islands
4. Example: CpG Islands
- In CpG islands,
  - CG is more frequent
  - Other pairs (AA, AG, AT) have different frequencies
- Question: Detect CpG islands computationally
5. A Model of CpG Islands (1): Architecture
[Figure: an 8-state HMM. Four states A+, C+, G+, T+ model nucleotides inside a CpG island; four states A-, C-, G-, T- model nucleotides outside ("Not CpG Island").]
6. A Model of CpG Islands (2): Transitions
- How do we estimate the parameters of the model? (a counting sketch follows this slide)
- Emission probabilities: 1/0 (each state emits only its own nucleotide)
- Transition probabilities within CpG islands
  - Established from known CpG islands (training set)
- Transition probabilities within other regions
  - Established from known non-CpG islands (training set)
- Note: these transitions out of each state add up to one, leaving no room for transitions between (+) and (-) states
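A minimal counting sketch in Python of how the within-region transition probabilities could be estimated from labeled training data; the `transition_probs` helper and the example strings are hypothetical, not part of the slides.

```python
from collections import defaultdict

def transition_probs(sequences):
    """Estimate within-region transition probabilities P(v | u)
    by counting consecutive nucleotide pairs in example sequences
    (e.g., known CpG islands from the training set)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for u, v in zip(seq, seq[1:]):
            counts[u][v] += 1
    return {u: {v: c / sum(row.values()) for v, c in row.items()}
            for u, row in counts.items()}

# Hypothetical training data: a few known CpG islands and non-islands
plus_probs = transition_probs(["CGCGGCGCCGC", "GCGCGCAGCGC"])
minus_probs = transition_probs(["ATTAATGCATT", "TTAAAGCAATT"])
```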
7. Log Likelihoods: Telling CpG Island from Non-CpG Island
- Another way to see the effect of transitions: log likelihoods
  L(u, v) = log [ P(uv | +) / P(uv | -) ]
- Given a region x = x_1...x_N, a quick-and-dirty way to decide whether the entire x is a CpG island:
  P(x is CpG) > P(x is not CpG)  ⇔  Σ_i L(x_i, x_{i+1}) > 0
  (a scoring sketch follows below)
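A quick scoring sketch for the rule above, assuming `plus_probs` and `minus_probs` are dinucleotide transition tables like those in the earlier sketch, with every pair having nonzero probability (the pseudocounts discussed later guarantee this).

```python
import math

def cpg_score(x, plus_probs, minus_probs):
    """Quick-and-dirty classifier: sum L(x_i, x_{i+1}) over consecutive
    pairs; a total above 0 suggests the whole region is a CpG island."""
    return sum(math.log(plus_probs[u][v] / minus_probs[u][v])
               for u, v in zip(x, x[1:]))
```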
8. A Model of CpG Islands (2): Transitions
- What about transitions between (+) and (-) states?
- They affect
  - Avg. length of a CpG island
  - Avg. separation between two CpG islands
- Length distribution of region X (self-transition probability p):
  P[l_X = 1] = 1-p
  P[l_X = 2] = p(1-p)
  ...
  P[l_X = k] = p^(k-1) (1-p)
  E[l_X] = 1/(1-p)
  Geometric distribution, with mean 1/(1-p)
  (a worked example follows below)
[Figure: two-state diagram with states X and Y, self-transition probabilities p and q, and cross-transition probabilities 1-p and 1-q.]
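A small sketch relating the (+)→(+) self-transition probability to the expected island length via E[l_X] = 1/(1-p); the target length of 1,000 bp is a hypothetical value, not from the slides.

```python
def self_transition_for_mean_length(mean_length):
    """Invert E[l_X] = 1/(1-p) to get the self-transition probability p."""
    return 1.0 - 1.0 / mean_length

p_plus = self_transition_for_mean_length(1000)   # 0.999 for ~1,000 bp islands

# Numerical sanity check of the geometric mean: sum of k * p^(k-1) * (1-p)
approx_mean = sum(k * p_plus**(k - 1) * (1 - p_plus) for k in range(1, 200_000))
```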
9. Applications of the Model
- Given a DNA region x,
  - The Viterbi algorithm predicts locations of CpG islands
- Given a nucleotide x_i (say x_i = A),
  - The Viterbi parse tells whether x_i is in a CpG island in the most likely general scenario
  - The Forward/Backward algorithms can calculate
    P(x_i is in a CpG island) = P(π_i = A+ | x)
  - Posterior decoding can assign locally optimal predictions of CpG islands:
    π̂_i = argmax_k P(π_i = k | x)
    (a posterior-decoding sketch follows below)
- Advantage/Disadvantage?
  - Advantage: each nucleotide is more likely to be called correctly
  - Disadvantage: the overall parse will be choppy, with CpG islands too short
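A posterior-decoding sketch, assuming an N x K matrix of posteriors P(π_i = k | x) has already been produced by the Forward/Backward algorithms; the function name and NumPy representation are illustrative choices.

```python
import numpy as np

def posterior_decode(posterior, states):
    """At each position i, pick the state k maximizing P(pi_i = k | x).
    `posterior` is an N x K array of posteriors from Forward/Backward."""
    return [states[k] for k in np.argmax(posterior, axis=1)]
```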
10. What If a New Genome Comes?
- We just sequenced the porcupine genome
- We know CpG islands play the same role in this genome
- However, we have no known CpG islands for porcupines
- We suspect the frequency and characteristics of CpG islands are quite different in porcupines
- How do we adjust the parameters in our model?
  - LEARNING
11. Learning
- Re-estimate the parameters of the model based on training data
12. Two Learning Scenarios
- Estimation when the right answer is known
  - Examples:
    - GIVEN: a genomic region x = x_1...x_1,000,000 where we have good (experimental) annotations of the CpG islands
    - GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
- Estimation when the right answer is unknown
  - Examples:
    - GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
    - GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice
- QUESTION: Update the parameters θ of the model to maximize P(x | θ)
13. (1) When the Right Answer Is Known
- Given x = x_1...x_N
  - for which the true π = π_1...π_N is known,
- Define:
  - A_kl = # times the k→l transition occurs in π
  - E_k(b) = # times state k in π emits b in x
- We can show that the maximum likelihood parameters θ (maximizing P(x | θ)) are:
  a_kl = A_kl / Σ_i A_ki        e_k(b) = E_k(b) / Σ_c E_k(c)
  (a counting sketch follows below)
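A counting sketch of the maximum-likelihood estimates above, for a labeled training pair (x, π); the helper name and dictionary representation are illustrative.

```python
from collections import Counter

def ml_estimate(x, pi, states, alphabet):
    """ML parameters from labeled data:
    a_kl = A_kl / sum_m A_km,  e_k(b) = E_k(b) / sum_c E_k(c)."""
    A = Counter(zip(pi, pi[1:]))   # A_kl: transition counts
    E = Counter(zip(pi, x))        # E_k(b): emission counts
    a = {k: {l: A[k, l] / (sum(A[k, m] for m in states) or 1) for l in states}
         for k in states}
    e = {k: {b: E[k, b] / (sum(E[k, c] for c in alphabet) or 1) for b in alphabet}
         for k in states}
    return a, e
```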
14. (1) When the Right Answer Is Known
- Intuition: when we know the underlying states, the best estimate is the normalized frequency of the transitions and emissions that occur in the training data
- Drawback:
  - Given little data, there may be overfitting:
    - P(x | θ) is maximized, but θ is unreasonable
    - 0 probabilities: BAD
- Example:
  - Given 10 casino rolls, we observe
    x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
    π = F, F, F, F, F, F, F, F, F, F
  - Then:
    a_FF = 1,  a_FL = 0
    e_F(1) = e_F(3) = e_F(6) = 0.2
    e_F(2) = 0.3,  e_F(4) = 0,  e_F(5) = 0.1
15. Pseudocounts
- Solution for small training sets:
  - Add pseudocounts
    A_kl = (# times the k→l transition occurs in π) + r_kl
    E_k(b) = (# times state k in π emits b in x) + r_k(b)
  - r_kl, r_k(b) are pseudocounts representing our prior belief
  - Larger pseudocounts → strong prior belief
  - Small pseudocounts (ε < 1): just to avoid 0 probabilities
16. Pseudocounts
- Example: dishonest casino
- We will observe the player for one day: 600 rolls
- Reasonable pseudocounts:
  r_0F = r_0L = r_F0 = r_L0 = 1
  r_FL = r_LF = r_FF = r_LL = 1
  r_F(1) = r_F(2) = ... = r_F(6) = 20   (strong belief that fair is fair)
  r_L(1) = r_L(2) = ... = r_L(6) = 5    (wait and see for loaded)
- The above numbers are arbitrary; assigning priors is an art
  (a pseudocount estimation sketch follows below)
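A pseudocount variant of the earlier counting sketch; the `r_trans`/`r_emit` tables encode the casino priors from this slide (the begin-state pseudocounts r_0F, r_F0, etc. are omitted for simplicity), and the function and variable names are illustrative.

```python
def ml_estimate_with_pseudocounts(x, pi, states, alphabet, r_trans, r_emit):
    """Seed the counts with pseudocounts before normalizing:
    A_kl = #(k->l in pi) + r_kl,  E_k(b) = #(k emits b in x) + r_k(b)."""
    A = {k: dict(r_trans[k]) for k in states}
    E = {k: dict(r_emit[k]) for k in states}
    for k, l in zip(pi, pi[1:]):
        A[k][l] += 1
    for k, b in zip(pi, x):
        E[k][b] += 1
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e

# Priors from the slide (F = fair, L = loaded)
r_trans = {"F": {"F": 1, "L": 1}, "L": {"F": 1, "L": 1}}
r_emit = {"F": {b: 20 for b in "123456"}, "L": {b: 5 for b in "123456"}}
```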
17. (2) When the Right Answer Is Unknown
- We don't know the true A_kl, E_k(b)
- Idea:
  - We estimate our best guess of what A_kl, E_k(b) are
  - Or, we start with random / uniform values
  - We update the parameters of the model, based on our guess
  - We repeat
18. (2) When the Right Answer Is Unknown
- Starting with our best guess of a model M, parameters θ:
- Given x = x_1...x_N,
  - for which the true π = π_1...π_N is unknown,
- We can get to a provably more likely parameter set θ
  - i.e., a θ that increases the probability P(x | θ)
- Principle: EXPECTATION MAXIMIZATION
  1. Estimate A_kl, E_k(b) in the training data
  2. Update θ according to A_kl, E_k(b)
  3. Repeat 1 & 2, until convergence
19. Estimating New Parameters
- To estimate A_kl (assume θ = θ_CURRENT in all formulas below):
- At each position i of sequence x, find the probability that the k→l transition is used:
  P(π_i = k, π_{i+1} = l | x) = [1/P(x)] · P(π_i = k, π_{i+1} = l, x_1...x_N) = Q/P(x)
  where Q = P(x_1...x_i, π_i = k, π_{i+1} = l, x_{i+1}...x_N)
          = P(π_{i+1} = l, x_{i+1}...x_N | π_i = k) · P(x_1...x_i, π_i = k)
          = P(π_{i+1} = l, x_{i+1} x_{i+2}...x_N | π_i = k) · f_k(i)
          = P(x_{i+2}...x_N | π_{i+1} = l) · P(x_{i+1} | π_{i+1} = l) · P(π_{i+1} = l | π_i = k) · f_k(i)
          = b_l(i+1) · e_l(x_{i+1}) · a_kl · f_k(i)
- So P(π_i = k, π_{i+1} = l | x, θ) = f_k(i) · a_kl · e_l(x_{i+1}) · b_l(i+1) / P(x | θ_CURRENT)
20. Estimating New Parameters
- So, A_kl is the expected number of times the k→l transition is used, given the current θ:
  A_kl = Σ_i P(π_i = k, π_{i+1} = l | x, θ) = Σ_i f_k(i) · a_kl · e_l(x_{i+1}) · b_l(i+1) / P(x | θ)
- Similarly,
  E_k(b) = [1/P(x | θ)] · Σ_{i : x_i = b} f_k(i) · b_k(i)
  (an expected-counts sketch follows below)
[Figure: position i in sequence x, with f_k(i) covering x_1...x_i ending in state k, then a_kl and e_l(x_{i+1}) for the transition to state l, and b_l(i+1) covering x_{i+2}...x_N.]
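A direct transcription of the two expected-count formulas, assuming the forward values f[k][i], backward values b[k][i] (1-based positions, as in the slides) and the total probability Px = P(x | θ) have already been computed; the names are illustrative.

```python
def expected_counts(x, f, b, a, e, states, Px):
    """E-step: A_kl = sum_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x),
    E_k(b)  = sum over {i : x_i = b} of f_k(i) b_k(i) / P(x).
    The string x is 0-based, so x[i] is x_{i+1} and x[i-1] is x_i."""
    N = len(x)
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {} for k in states}
    for i in range(1, N):                       # transitions i -> i+1
        for k in states:
            for l in states:
                A[k][l] += f[k][i] * a[k][l] * e[l][x[i]] * b[l][i + 1] / Px
    for i in range(1, N + 1):                   # emissions at position i
        for k in states:
            E[k][x[i - 1]] = E[k].get(x[i - 1], 0.0) + f[k][i] * b[k][i] / Px
    return A, E
```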
21. The Baum-Welch Algorithm
- Initialization:
  - Pick a best guess for the model parameters (or arbitrary values)
- Iteration:
  - Forward
  - Backward
  - Calculate A_kl, E_k(b), given θ_CURRENT
  - Calculate new model parameters θ_NEW: a_kl, e_k(b)
  - Calculate the new log-likelihood P(x | θ_NEW)
    - GUARANTEED TO BE HIGHER BY EXPECTATION MAXIMIZATION
- Until P(x | θ) does not change much
  (an iteration-loop sketch follows below)
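A self-contained Baum-Welch sketch in NumPy, under simplifying assumptions not stated on the slide: a uniform initial-state distribution, no begin/end state, a single training sequence, and no numerical scaling (a real implementation would work in log space or rescale the forward/backward values).

```python
import numpy as np

def baum_welch(x, states, alphabet, a, e, n_iter=50, tol=1e-6):
    """a: K x K transition matrix, e: K x M emission matrix (NumPy arrays)."""
    K, N = len(states), len(x)
    sym = np.array([alphabet.index(c) for c in x])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Forward pass: f[i, l] = e_l(x_i) * sum_k f[i-1, k] * a_kl
        f = np.zeros((N, K))
        f[0] = e[:, sym[0]] / K
        for i in range(1, N):
            f[i] = e[:, sym[i]] * (f[i - 1] @ a)
        Px = f[-1].sum()
        # Backward pass: b[i, k] = sum_l a_kl * e_l(x_{i+1}) * b[i+1, l]
        b = np.zeros((N, K))
        b[-1] = 1.0
        for i in range(N - 2, -1, -1):
            b[i] = a @ (e[:, sym[i + 1]] * b[i + 1])
        # E-step: expected transition counts A and emission counts E
        A = np.zeros((K, K))
        for i in range(N - 1):
            A += np.outer(f[i], e[:, sym[i + 1]] * b[i + 1]) * a / Px
        E = np.zeros((K, len(alphabet)))
        post = f * b / Px
        for c in range(len(alphabet)):
            E[:, c] = post[sym == c].sum(axis=0)
        # M-step: renormalize counts into new parameters
        a = A / A.sum(axis=1, keepdims=True)
        e = E / E.sum(axis=1, keepdims=True)
        # Stop once the log-likelihood no longer improves much
        ll = np.log(Px)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return a, e
```

Each iteration runs the Forward and Backward passes plus the count accumulation, which matches the O(K²N) per-iteration cost noted on the next slide.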
22. The Baum-Welch Algorithm
- Time complexity:
  - # iterations × O(K²N)
- Guaranteed to increase the log likelihood P(x | θ)
- Not guaranteed to find the globally best parameters
  - Converges to a local optimum, depending on initial conditions
- Too many parameters / too large a model → overtraining
23. Alternative: Viterbi Training
- Initialization: same
- Iteration:
  - Perform Viterbi to find π*
  - Calculate A_kl, E_k(b) according to π* + pseudocounts
  - Calculate the new parameters a_kl, e_k(b)
- Until convergence
- Notes:
  - Not guaranteed to increase P(x | θ)
  - Guaranteed to increase P(x | θ, π*)
  - In general, worse performance than Baum-Welch
  (a training-loop sketch follows below)
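A self-contained Viterbi-training sketch in NumPy, under the same simplifying assumptions as the Baum-Welch sketch (uniform initial distribution, single sequence, no begin/end state); the uniform pseudocount r is a hypothetical choice, since the slide leaves the pseudocounts unspecified.

```python
import numpy as np

def viterbi_training(x, states, alphabet, a, e, r=1.0, n_iter=20):
    """Repeat: decode the best path pi* with Viterbi, then re-estimate
    a and e from the counts along pi* plus a uniform pseudocount r."""
    K, N = len(states), len(x)
    sym = np.array([alphabet.index(c) for c in x])
    for _ in range(n_iter):
        # Viterbi in log space
        log_a, log_e = np.log(a), np.log(e)
        V = np.full((N, K), -np.inf)
        ptr = np.zeros((N, K), dtype=int)
        V[0] = log_e[:, sym[0]] - np.log(K)
        for i in range(1, N):
            scores = V[i - 1][:, None] + log_a      # [previous state, next state]
            ptr[i] = scores.argmax(axis=0)
            V[i] = scores.max(axis=0) + log_e[:, sym[i]]
        path = np.zeros(N, dtype=int)
        path[-1] = V[-1].argmax()
        for i in range(N - 1, 0, -1):               # traceback
            path[i - 1] = ptr[i, path[i]]
        # Re-estimate from the decoded path, seeded with pseudocounts
        A = np.full((K, K), r)
        E = np.full((K, len(alphabet)), r)
        for i in range(N - 1):
            A[path[i], path[i + 1]] += 1
        for i in range(N):
            E[path[i], sym[i]] += 1
        a = A / A.sum(axis=1, keepdims=True)
        e = E / E.sum(axis=1, keepdims=True)
    return a, e, path
```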