Multiple Alignment - PowerPoint PPT Presentation

Provided by: biow
1
Multiple Alignment
  • BioE131/231

2
Pairwise Dynamic Programming (DP)
Time O(L^2), memory O(L^2)
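A minimal sketch of where the quadratic cost comes from, assuming the standard global-alignment recurrence (Needleman-Wunsch) with a linear gap penalty. The scoring parameters below are illustrative assumptions, not values from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b.

    Fills a (len(a)+1) x (len(b)+1) table, so time and memory are both
    proportional to L^2 for two sequences of length about L.
    Scoring parameters are assumed for illustration.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: a aligned to gaps only
        dp[i][0] = i * gap
    for j in range(1, cols):          # first row: b aligned to gaps only
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,       # gap in b
                           dp[i][j - 1] + gap)       # gap in a
    return dp[-1][-1]
```

Only the score is returned here; recovering the alignment itself would need a traceback through the same table.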
3
Three-sequence DP
Time O(L^3), memory O(L^3)
4
Multidimensional DP
  • Time O(L^N), memory O(L^N)
  • Generally impractical, e.g. for globins (99aa)

k = kilobytes
M = megabytes
G = gigabytes
T = terabytes
P = petabytes
E = exabytes
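To make the growth concrete, here is a rough sketch that counts DP cells for N sequences of length 99 and prints the size with the unit prefixes above. It assumes one byte per cell and decimal units (powers of 1000); both are assumptions for illustration, and the figures on the original slide are not recoverable from the transcript.

```python
UNITS = ["", "k", "M", "G", "T", "P", "E"]

def dp_cells(length, n_sequences):
    """Number of cells in an N-dimensional DP table, taken as L^N."""
    return length ** n_sequences

def human_bytes(n_bytes):
    """Format a byte count with k/M/G/T/P/E prefixes (powers of 1000)."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f}{unit}"
        value /= 1000

if __name__ == "__main__":
    for n in range(2, 11):
        # Assumes one byte per cell; real implementations store more.
        print(n, human_bytes(dp_cells(99, n)))
```

Under these assumptions the table passes a terabyte at N = 7 and reaches exabytes by N = 10.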
5
Progressive alignment
  • Estimate a guide tree (slowest step - why?)
  • Proceed up tree, building a profile for each
    internal node
  • Align siblings, going from leaves to root
  • Sequence-to-sequence (A-B, D-E)
  • Sequence-to-profile (U-C)
  • Profile-to-profile (V-W)
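The leaves-to-root order can be sketched as a post-order walk over the guide tree. The tree shape below, (((A,B),C),(D,E)), is inferred from the labels in the bullets (U joins A and B, V joins U and C, W joins D and E) and is an assumption, as is the helper name `merge_plan`. The sketch only lists which kind of alignment happens at each internal node; it does not perform the alignments.

```python
def merge_plan(tree):
    """List the alignment steps for a guide tree given as nested 2-tuples.

    Leaves are sequence names (strings). Returns a list of
    (left_members, right_members, kind) in the order they would be aligned.
    """
    kinds = ["sequence-to-sequence", "sequence-to-profile", "profile-to-profile"]
    steps = []

    def visit(node):
        if isinstance(node, str):
            return [node]
        left, right = node
        left_leaves = visit(left)      # children are finished before the parent
        right_leaves = visit(right)
        # A child with more than one leaf is already a profile.
        n_profiles = (len(left_leaves) > 1) + (len(right_leaves) > 1)
        steps.append(("+".join(left_leaves), "+".join(right_leaves), kinds[n_profiles]))
        return left_leaves + right_leaves

    visit(tree)
    return steps

guide_tree = ((("A", "B"), "C"), ("D", "E"))  # assumed shape
for step in merge_plan(guide_tree):
    print(step)
```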

6
What's a profile?
Alignment
Profile
7
Profile a.k.a. Position-specific Weight
Matrix (PWM)
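A minimal sketch of turning an alignment into a profile, assuming the simplest convention: each column becomes a table of residue frequencies, gaps are ignored, and no pseudocounts are added. The toy alignment is made up for illustration.

```python
from collections import Counter

def build_profile(alignment):
    """Return one {residue: frequency} dict per alignment column.

    alignment is a list of equal-length strings; '-' marks a gap.
    Gaps are skipped and no pseudocounts are used (both are assumptions).
    """
    profile = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r != "-")
        total = sum(counts.values())
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile

toy_alignment = ["ACG", "A-G", "TCG"]  # hypothetical example data
print(build_profile(toy_alignment))
```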
8
Sequence logos
Scale each column by its entropy (technically,
the difference between its entropy and the
maximum possible entropy)
weblogo.berkeley.edu
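The column height described above can be sketched as the maximum possible entropy minus the observed entropy, in bits. This is a bare version under stated assumptions: a 20-letter alphabet by default, and no small-sample correction of the kind logo tools may apply.

```python
import math
from collections import Counter

def information_content(column, alphabet_size=20):
    """Height of a logo column in bits: log2(alphabet_size) - entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy
```

A fully conserved column gets the maximum height, and a column with all residues equally frequent gets height zero.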
9
Sequence logos
Globin, B helix to D helix
10
Scoring schemes
  • Scoring schemes up to now have been pairwise
  • Several ways of scoring a multiple alignment
    column
  • Entropy
  • Sum-of-pairs
  • Phylogenetic
  • Position-specific
  • Sequence-profile and profile-profile scoring

11
Entropy-like scores
  • If n(x) is the number of times residue x occurs
    in the column, then p(x) = n(x) / N
  • Rewards homogeneous columns
  • Assumes each row is an independent draw from the
    same probability distribution
  • Equivalent to the following

can maximize with Lagrange multipliers
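The formula on this slide did not survive in the transcript. One form consistent with the bullets is S = sum over x of n(x) log p(x), the log-likelihood of the column when every row is drawn independently from p. Maximizing it subject to the p(x) summing to 1 gives p(x) = n(x) / N. The sketch below assumes that form; it is not confirmed as the slide's own equation.

```python
import math
from collections import Counter

def entropy_score(column):
    """Assumed score: sum_x n(x) * log(n(x) / N), i.e. -N times the column entropy.

    The score is 0 for a perfectly homogeneous column and becomes more
    negative as the column gets more mixed.
    """
    counts = Counter(column)
    n = len(column)
    return sum(c * math.log(c / n) for c in counts.values())
```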
12
Sum-of-pairs score
  • i and j are row indices
  • x_i is the residue in row i (similarly x_j)
  • Q(a,b) is the pairwise substitution matrix
  • Problem: overcounting
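The slide's formula is missing from the transcript; the usual sum-of-pairs definition adds Q(x_i, x_j) over all row pairs i < j, and the sketch below assumes that definition. The toy substitution function is hypothetical and gaps are not handled.

```python
from itertools import combinations

def sum_of_pairs(column, Q):
    """Sum Q(x_i, x_j) over all unordered row pairs i < j of one column."""
    return sum(Q(a, b) for a, b in combinations(column, 2))

def toy_Q(a, b):
    """Hypothetical substitution score: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1

print(sum_of_pairs("AAC", toy_Q))
```

A column of N rows contributes N(N-1)/2 terms, each scored as if the pair were independent of the others, which is one way to read the overcounting problem.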

13
Probabilistic scoring
  • Recall the pairwise substitution matrix is
    Q(a,b) = log q(a,b), where Q is an additive score and
    q is a multiplicative probability
  • Strictly, q is usually not a probability per se,
    but is related to one, e.g. it might be a
    likelihood ratio: q(a,b) = P(a,b) / (P(a)P(b))
    = P(b|a) / P(b) = P(a|b) / P(a)
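A small sketch of the likelihood-ratio reading: given a joint distribution P(a,b) over aligned residue pairs, compute the marginals and take Q(a,b) = log of P(a,b) / (P(a)P(b)). The two-letter joint table is made up for illustration, and the function assumes every pair has nonzero probability.

```python
import math

def log_odds_matrix(joint):
    """Return Q[(a, b)] = log(P(a,b) / (P(a) * P(b))).

    joint maps (a, b) to P(a,b); it is assumed symmetric, strictly
    positive, and summing to 1.
    """
    marginal = {}
    for (a, _b), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p   # P(a) = sum_b P(a,b)
    return {(a, b): math.log(p / (marginal[a] * marginal[b]))
            for (a, b), p in joint.items()}

# Hypothetical joint probabilities over a two-letter alphabet.
toy_joint = {("A", "A"): 0.4, ("A", "B"): 0.1,
             ("B", "A"): 0.1, ("B", "B"): 0.4}
Q = log_odds_matrix(toy_joint)
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative scores, so adding Q values corresponds to multiplying q values.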

14
Phylogenetic score
(but... you don't actually know u,v,w,x. So what
do you do? The probabilistic answer: sum them out)
...and then rearrange the sums optimally
(pruning)...
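The slide's tree and formula are not in the transcript, so the following is only a sketch of the pruning idea under toy assumptions: a two-letter alphabet, a uniform prior at the root, and the same symmetric substitution probabilities on every branch. Each internal node sums over its own unknown state once, using results already computed for its children, instead of enumerating every joint assignment of internal states.

```python
ALPHABET = "AC"                   # hypothetical two-letter alphabet
PRIOR = {"A": 0.5, "C": 0.5}      # assumed root distribution

def transition(parent, child, stay=0.9):
    """Assumed per-branch substitution probability P(child | parent)."""
    return stay if parent == child else 1 - stay

def prune(node):
    """Return {state: P(observed leaves below node | node has this state)}."""
    if isinstance(node, str):     # leaf: the residue is observed
        return {s: 1.0 if s == node else 0.0 for s in ALPHABET}
    left, right = prune(node[0]), prune(node[1])
    return {s: sum(transition(s, t) * left[t] for t in ALPHABET)
               * sum(transition(s, t) * right[t] for t in ALPHABET)
            for s in ALPHABET}

def column_likelihood(tree):
    """Likelihood of one alignment column, internal states summed out."""
    root = prune(tree)
    return sum(PRIOR[s] * root[s] for s in ALPHABET)

print(column_likelihood((("A", "A"), "C")))
```

The tree is written as nested 2-tuples whose leaves are the residues in the column; the log of this likelihood could serve as a column score.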
15
Position-specific score
  • Score for aligning two (or more) residues does
    not depend (directly) on their values
  • Instead, you specify particular scores for
    aligning each pair of positions
  • These can be obtained by pre-processing the
    sequences (e.g. scores derived from posterior
    probabilities from a Pair HMM), or by other means
  • e.g. T-COFFEE, PROBCONS

16
Profile-sequence
  • Q(a,b) = log q(a,b) is the pairwise substitution matrix
  • Aligning profile p(x) with residue y
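The scoring formula for this slide is missing from the transcript. Two commonly used choices are the expected score, sum over x of p(x) Q(x,y), and the log of the averaged odds, log of sum over x of p(x) q(x,y). Which one the slide showed is not recoverable, so both are sketched here with a made-up two-letter matrix.

```python
import math

def expected_score(profile_col, y, Q):
    """Average of Q(x, y) over residues x drawn from the profile column."""
    return sum(p * Q[(x, y)] for x, p in profile_col.items())

def log_average_odds(profile_col, y, Q):
    """Log of the profile-weighted average of q(x, y) = exp(Q(x, y))."""
    return math.log(sum(p * math.exp(Q[(x, y)]) for x, p in profile_col.items()))

# Hypothetical substitution scores and profile column.
toy_Q = {("A", "A"): 1.0, ("A", "C"): -1.0,
         ("C", "A"): -1.0, ("C", "C"): 1.0}
column = {"A": 0.75, "C": 0.25}
print(expected_score(column, "A", toy_Q), log_average_odds(column, "A", toy_Q))
```

Both reduce to plain Q(x,y) when the profile column contains a single residue with probability 1.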

17
Profile-profile
  • Q(a,b) = log q(a,b) is the pairwise substitution matrix
  • Aligning profile p1(x) with profile p2(y)
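The formula for this slide is also missing. One natural choice, assumed here, is the expected pairwise score when one residue is drawn from each profile column: the sum over x and y of p1(x) p2(y) Q(x,y). The matrix and columns below are made up for illustration.

```python
def profile_profile_score(p1, p2, Q):
    """Expected Q(x, y) with x drawn from column p1 and y from column p2."""
    return sum(px * py * Q[(x, y)]
               for x, px in p1.items()
               for y, py in p2.items())

# Hypothetical substitution scores over a two-letter alphabet.
toy_Q = {("A", "A"): 1.0, ("A", "C"): -1.0,
         ("C", "A"): -1.0, ("C", "C"): 1.0}
print(profile_profile_score({"A": 1.0}, {"A": 0.5, "C": 0.5}, toy_Q))
```

With single-residue columns on both sides this reduces to the pairwise score Q(x,y), so sequence-to-sequence and sequence-to-profile scoring are special cases.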