Multiple Alignment

Slides: 18

Provided by: biow


Transcript and Presenter's Notes


1
Multiple Alignment

BioE131/231

2
Pairwise Dynamic Programming (DP)
Time O(L^2), memory O(L^2)
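As a reminder of where the O(L^2) comes from, here is a minimal sketch of global pairwise DP (Needleman-Wunsch style). The scoring values (match, mismatch, gap) and the function name are illustrative assumptions, not taken from the slides; the point is only that the table has one cell per pair of prefix lengths.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b with a linear gap penalty.

    The scoring parameters are illustrative defaults, not from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # F[i][j] = best score for aligning a[:i] with b[:j].
    # The table has (len(a)+1) * (len(b)+1) cells, hence O(L^2) memory.
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        F[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,      # a[i-1] against a gap
                F[i][j - 1] + gap,      # b[j-1] against a gap
            )
    return F[-1][-1]
```

Each cell is filled in constant time from three neighbours, so time is also O(L^2).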
3
Three-sequence DP
Time O(L^3), memory O(L^3)
4
Multidimensional DP

Time O(L^N), memory O(L^N)
Generally impractical, e.g. for globins (99 aa)

k = kilobytes
M = megabytes
G = gigabytes
T = terabytes
P = petabytes
E = exabytes
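To get a feel for how quickly L^N grows, the arithmetic can be sketched as below. This assumes one byte per DP cell and decimal units (1k = 1000 bytes); both are simplifying assumptions, and the helper names are hypothetical.

```python
def dp_cells(length, n_sequences):
    """Number of cells in an N-dimensional DP table, taken as L^N."""
    return length ** n_sequences


def human_readable(n_bytes):
    """Format a byte count with the k/M/G/T/P/E suffixes (decimal units)."""
    units = ["", "k", "M", "G", "T", "P", "E"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f}{unit}"
        value /= 1000


# Memory for N globin-length sequences (L = 99), assuming 1 byte per cell.
for n in range(2, 10):
    print(n, human_readable(dp_cells(99, n)))
```

Every extra sequence multiplies the table size by roughly 100, so each step of N moves about two-thirds of the way to the next unit suffix.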
5
Progressive alignment

Estimate a guide tree (slowest step - why?)
Proceed up tree, building a profile for each internal node
Align siblings, going from leaves to root
Sequence-to-sequence (A-B, D-E)
Sequence-to-profile (U-C)
Profile-to-profile (V-W)
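The leaves-to-root order can be sketched as a post-order walk of the guide tree. The tree shape used in the usage lines is an assumption inferred from the labels above (A-B merged first, then C added, D-E merged separately, then the two halves joined); the function name and the nested-tuple representation are hypothetical, and no actual alignment is performed.

```python
def plan_merges(tree):
    """List the merges of a progressive alignment in leaves-to-root order.

    tree is a nested 2-tuple; a leaf is a sequence name (str).
    Returns a list of (merged_label, kind_of_alignment) pairs.
    """
    steps = []

    def visit(node):
        # Returns (label, is_single_sequence) for the subtree at node.
        if isinstance(node, str):
            return node, True
        left, left_is_seq = visit(node[0])
        right, right_is_seq = visit(node[1])
        if left_is_seq and right_is_seq:
            kind = "sequence-to-sequence"
        elif left_is_seq or right_is_seq:
            kind = "sequence-to-profile"
        else:
            kind = "profile-to-profile"
        label = f"({left},{right})"
        steps.append((label, kind))
        return label, False  # an internal node is represented by a profile

    visit(tree)
    return steps


# Assumed guide tree: ((A,B),C) joined with (D,E).
for label, kind in plan_merges(((("A", "B"), "C"), ("D", "E"))):
    print(label, kind)
```

Children are always handled before their parent, so every profile exists by the time it is needed.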

6
What's a profile?
Alignment
Profile
7
Profile a.k.a. Position-specific Weight Matrix (PWM)
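A minimal sketch of turning an alignment into a profile: one frequency table per column. Treating the gap character as just another symbol and using raw frequencies without pseudocounts are simplifying assumptions; the function name is hypothetical.

```python
from collections import Counter


def build_profile(alignment):
    """Return one {residue: frequency} dict per alignment column.

    alignment is a list of equal-length strings (the rows).
    Gaps, if present, are counted like any other symbol (an assumption).
    """
    n_rows = len(alignment)
    profile = []
    for column in zip(*alignment):       # iterate over columns
        counts = Counter(column)
        profile.append({res: c / n_rows for res, c in counts.items()})
    return profile


profile = build_profile(["AC", "AG", "AC", "TC"])
print(profile)
```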
8
Sequence logos
Scale each column by its entropy (technically,
the difference between its entropy and the
maximum possible entropy)
weblogo.berkeley.edu
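The column height in a logo can be sketched as the maximum possible entropy minus the observed entropy, in bits. This sketch omits any small-sample correction and assumes a 20-letter alphabet by default; the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content of one column, in bits.

    Computed as log2(alphabet_size) - H, where H is the Shannon entropy
    of the observed residue frequencies. No small-sample correction.
    """
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


print(column_information("AAAA", alphabet_size=4))  # fully conserved column
print(column_information("ACGT", alphabet_size=4))  # uniform column
```

A fully conserved column gets the full height and a uniform column gets height zero; within a column, each letter is then drawn in proportion to its frequency.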
9
Sequence logos
Globin, B helix to D helix
10
Scoring schemes

Scoring schemes up to now have been pairwise
Several ways of scoring a multiple alignment
column
Entropy
Sum-of-pairs
Phylogenetic
Position-specific
Sequence-profile and profile-profile scoring

11
Entropy-like scores

If n(x) is the number of times residue x occurs in the column, then p(x) = n(x) / N
Rewards homogeneous columns
Assumes each row is an independent draw from the
same probability distribution
Equivalent to the following

can maximize with Lagrange multipliers
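The formula on this slide did not survive in the transcript. One common entropy-like column score, assumed here for illustration, is the sum over residues of n(x) log p(x): the log-likelihood of the column when every row is an independent draw from p. Maximizing that sum over p subject to the p(x) summing to 1 gives p(x) = n(x) / N, which is what the sketch plugs in.

```python
import math
from collections import Counter


def entropy_like_score(column):
    """Sum over residues of n(x) * log p(x), with p(x) = n(x) / N.

    This is an assumed form of the score (the slide's own formula is
    missing). It equals -N times the column entropy in nats, so the
    maximum is 0, reached when the column is homogeneous.
    """
    n_rows = len(column)
    counts = Counter(column)
    return sum(c * math.log(c / n_rows) for c in counts.values())


print(entropy_like_score("AAAA"))
print(entropy_like_score("AACC"))
```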
12
Sum-of-pairs score

i and j are row indices
x_i is the residue in row i (similarly x_j)
Q(a,b) is pairwise substitution matrix
Problems: overcounting
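With the definitions above, the sum-of-pairs score of a column is usually written as the sum of Q(x_i, x_j) over all pairs of rows i < j. A minimal sketch follows; the toy substitution function (+1 match, -1 mismatch) is an illustrative assumption standing in for a real matrix.

```python
from itertools import combinations


def sum_of_pairs(column, Q):
    """Sum Q(x_i, x_j) over all unordered pairs of rows i < j."""
    return sum(Q(a, b) for a, b in combinations(column, 2))


def toy_q(a, b):
    """Illustrative stand-in for a substitution matrix."""
    return 1 if a == b else -1


print(sum_of_pairs("AAA", toy_q))
print(sum_of_pairs("AAC", toy_q))
```

A column of N rows contributes N(N-1)/2 pairwise terms, which hints at the overcounting problem: a single substitution on the tree is penalized once for every pair of rows it separates.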

13
Probabilistic scoring

Recall pairwise substitution matrix is Q(a,b) = log q(a,b)
where Q is an additive score and q is a multiplicative probability
Strictly, q is usually not a probability per se, but is related to one, e.g. it might be a likelihood ratio:
q(a,b) = P(a,b) / (P(a)P(b)) = P(b|a) / P(b) = P(a|b) / P(a)
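The log-odds relationship can be sketched numerically. The two-letter alphabet and the joint probabilities below are made-up toy values, and the function name is hypothetical; the joint table is assumed to be symmetric and to list every ordered pair.

```python
import math


def log_odds_matrix(joint):
    """Q(a,b) = log( P(a,b) / (P(a) P(b)) ) from a joint distribution.

    joint maps every ordered pair (a, b) to P(a,b); it is assumed to be
    symmetric and to sum to 1, so one marginal serves for both positions.
    """
    marginal = {}
    for (a, _b), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    return {
        (a, b): math.log(p / (marginal[a] * marginal[b]))
        for (a, b), p in joint.items()
    }


# Toy joint distribution over a two-letter alphabet (illustrative values).
joint = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
Q = log_odds_matrix(joint)
print(Q[("A", "A")], Q[("A", "B")])
```

Pairs seen together more often than chance get a positive additive score, and pairs seen less often than chance get a negative one.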

14
Phylogenetic score
(but... you don't actually know u,v,w,x. So what do you do? The probabilistic answer: sum them out)
...and then rearrange the sums optimally (pruning)...
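Summing out the unknown internal states with the sums pushed inside the products can be sketched as a recursion over the tree. This is a simplified illustration: it assumes a single substitution probability matrix P shared by every branch (i.e. equal branch lengths), a nested-tuple tree, and hypothetical function names.

```python
def conditional_likelihoods(node, P, alphabet):
    """For each state a, P(observed leaves below node | node is in state a).

    node is an observed residue (str) at a leaf, or a tuple of child nodes.
    P[a][b] is the probability of a changing to b along any branch
    (one shared matrix is an assumption made for brevity).
    """
    if isinstance(node, str):
        # Leaf: the observed residue is known with certainty.
        return {a: 1.0 if a == node else 0.0 for a in alphabet}
    result = {a: 1.0 for a in alphabet}
    for child in node:
        child_like = conditional_likelihoods(child, P, alphabet)
        for a in alphabet:
            # Sum out the child's state b, then multiply across children.
            result[a] *= sum(P[a][b] * child_like[b] for b in alphabet)
    return result


def column_likelihood(tree, P, prior, alphabet):
    """Likelihood of one alignment column: sum out the root state too."""
    root_like = conditional_likelihoods(tree, P, alphabet)
    return sum(prior[a] * root_like[a] for a in alphabet)


alphabet = "AC"
P = {"A": {"A": 0.9, "C": 0.1}, "C": {"A": 0.1, "C": 0.9}}
prior = {"A": 0.5, "C": 0.5}
print(column_likelihood((("A", "A"), "C"), P, prior, alphabet))
```

Each node is visited once and does work quadratic in the alphabet size, instead of enumerating every joint assignment of the internal states.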
15
Position-specific score

Score for aligning two (or more) residues does
not depend (directly) on their values
Instead, you specify particular scores for
aligning each pair of positions
These can be obtained by pre-processing the
sequences (e.g. scores derived from posterior
probabilities from a Pair HMM), or by other means
e.g. T-COFFEE, PROBCONS

16
Profile-sequence