Trees, Stars, and Multiple Biological Sequence Alignment - PowerPoint PPT Presentation

Slides: 30
Provided by: jessewo

Transcript and Presenter's Notes


1
Trees, Stars, and Multiple Biological Sequence
Alignment
  • Jesse Wolfgang
  • CSE 497
  • February 19, 2004

2
Importance?
  • Molecular evolution (Dayhoff)
  • RNA folding (Trifonov, Bolshoi)
  • Gene regulation (Galas et al.)
  • Protein structure-function relationships (Wu,
    Kabat)

3
Introduction
  • Original sequence unknown
  • Must consider all possible transformations
  • Including insertions, deletions, and replacements
  • Choose the most likely set of transformations
  • With a given model of protein evolution

4
Sequences and Alignments
5
Alignments
  • Ex.: sequences DQLF, DNVQ, QGL

6
Lattices and Paths
7
Paths
  • n dimensions: 2^n - 1 possible paths out of each
    lattice point, which is O(2^n)
  • 2 dimensions: 3 possible paths
  • 3 dimensions: 7 possible paths
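
The counts of 3 and 7 can be checked by enumeration. Each step out of a lattice point advances along at least one of the n sequences, so a step is a nonzero vector of 0s and 1s. The sketch below is only an illustration; the function name `lattice_steps` is made up for this example.

```python
from itertools import product

def lattice_steps(n):
    """Return every nonzero 0/1 step vector in an n-dimensional lattice."""
    # A 1 in position k means "consume a letter of sequence k";
    # a 0 means "insert a null (gap) for sequence k".
    return [step for step in product((0, 1), repeat=n) if any(step)]

for n in (2, 3, 4):
    steps = lattice_steps(n)
    # The number of steps equals 2^n - 1 in every dimension.
    print(n, len(steps), len(steps) == 2 ** n - 1)
```
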
8
Paths
  • Sequences DQLF, DNVQ, QGL

9
Paths and Sequence Length
  • Sequences ABCD, ABD, BCD

10
Paths and Sequence Length
  • Sequences ABCD, EFGH, IJK

11
Projections
  • Sequences DQLF, DNVQ, QGL
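
A projection can be sketched in a few lines: keep two rows of the multiple alignment and drop every column in which both rows hold a null. The alignment used here is a hypothetical one for DQLF, DNVQ, and QGL, chosen for illustration and not taken from the slides; the name `project` and the "-" null symbol are likewise assumptions.

```python
GAP = "-"

def project(alignment, i, j):
    """Project a multiple alignment onto rows i and j.

    Columns where both rows contain a gap are dropped, since they carry
    no information about the pairwise alignment.
    """
    cols = [(a, b) for a, b in zip(alignment[i], alignment[j])
            if (a, b) != (GAP, GAP)]
    row_i = "".join(a for a, _ in cols)
    row_j = "".join(b for _, b in cols)
    return row_i, row_j

# Hypothetical alignment of DQLF, DNVQ, QGL (an assumption for this sketch).
alignment = ["D-QLF",
             "DNVQ-",
             "--QGL"]

for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(project(alignment, i, j))
```
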

12
Optimal Paths
13
Calculating Optimal Paths
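
For two sequences, an optimal path through the lattice can be found with standard dynamic programming. The sketch below covers only the two-dimensional case and uses a simple per-null cost. The default costs (1 for two different letters, 2 for a letter against a null) are borrowed from the example on slide 28, and the function name `pairwise_cost` is an assumption.

```python
def pairwise_cost(s, t, mismatch=1, gap=2):
    """Minimum cost of a path from (0, 0) to (len(s), len(t)) in the 2-D lattice."""
    rows, cols = len(s) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]

    # Paths along the lattice edges consist of nulls only.
    for i in range(1, rows):
        cost[i][0] = i * gap
    for j in range(1, cols):
        cost[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Diagonal step: align s[i-1] with t[j-1].
            diag = cost[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            # Vertical / horizontal steps: align a letter with a null.
            up = cost[i - 1][j] + gap
            left = cost[i][j - 1] + gap
            cost[i][j] = min(diag, up, left)
    return cost[-1][-1]

print(pairwise_cost("TCT", "ATCT"))
```

The same recurrence extends to n sequences, but each lattice point then has 2^n - 1 predecessors, so the work grows exponentially with the number of sequences.
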
14
Problems with This Algorithm
  • Scores a multiple alignment as a weighted sum of
    the costs of its projected pairwise alignments
  • Called Sum-of-the-Pairs (SP)
  • Other methods fit biological intuition more
    closely
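
A Sum-of-the-Pairs cost can be sketched as follows. The costs (1 for two different letters, 2 for a letter against a null) are borrowed from the example on slide 28, the alignment is a hypothetical one for DQLF, DNVQ, and QGL, and the names `column_pair_cost` and `sp_cost` are assumptions. All pair weights default to 1.

```python
from itertools import combinations

def column_pair_cost(a, b, mismatch=1, gap=2):
    """Cost of one column of a projected pairwise alignment."""
    if a == b:
        return 0          # identical letters, or null against null
    if a == "-" or b == "-":
        return gap        # letter against null
    return mismatch       # two different letters

def sp_cost(alignment, weights=None):
    """Weighted sum of the costs of all projected pairwise alignments."""
    total = 0
    for i, j in combinations(range(len(alignment)), 2):
        w = 1 if weights is None else weights[(i, j)]
        pair = sum(column_pair_cost(a, b)
                   for a, b in zip(alignment[i], alignment[j]))
        total += w * pair
    return total

# Hypothetical alignment (an assumption for this sketch).
alignment = ["D-QLF",
             "DNVQ-",
             "--QGL"]
print(sp_cost(alignment))
```
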

15
Tree-Alignment
  • Treat sequences as leaves of an evolutionary tree
  • Reconstruct ancestral sequences which minimize
    the cost of the tree
  • Must assign sequences to internal nodes
  • Align the given and reconstructed sequences
  • Star-alignment: the special case with only one
    internal node

16
Tree-Alignment
  • Many different methods for calculating tree
    alignments
  • Discuss version used by ClustalX

17
Tree-Alignment in ClustalX
  • Three main parts
  • Perform pairwise alignment on all sequences to
    calculate a distance matrix
  • Use distance matrix to calculate a guide tree
  • Sequences are progressively aligned using the
    branching order in the guide tree

http://bimas.dcrt.nih.gov/clustalw/clustalw.html
18
Calculating Distance Matrix
  • Use standard dynamic programming to find the best
    alignment
  • Gap penalties for opening a gap and continuing a
    gap (possibly different)
  • Divide number of matches by total number of
    residues compared (excluding gaps) to get a
    percent identity score
  • Convert to distances by dividing the percent
    identity by 100 and subtracting from 1
  • Gives one entry in the n by n matrix

19
Calculating Distance Matrix
  • Ex.: sequences ATCG, ATCC, AGGC, AGCC

20
Calculating Distance Matrix
        ATCG   ATCT   AGGC   GCAA
ATCG    --     --     --     --
ATCT    0.25   --     --     --
AGGC    0.75   0.75   --     --
GCAA    1      1      1      --
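
The conversion from percent identity to distance can be sketched as below. This is a simplified illustration, not ClustalX's code: it assumes the two sequences are already aligned row by row (equal length, "-" for a null) and skips the dynamic programming step. The function names are assumptions.

```python
def identity_distance(a, b):
    """Distance = 1 - (percent identity / 100), ignoring gapped positions."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 1.0  # nothing to compare: treat as maximally distant
    matches = sum(x == y for x, y in compared)
    percent_identity = 100.0 * matches / len(compared)
    return 1.0 - percent_identity / 100.0

def distance_matrix(seqs):
    """Lower-triangular matrix of pairwise distances (row i holds j < i)."""
    return [[identity_distance(seqs[i], seqs[j]) for j in range(i)]
            for i in range(len(seqs))]

seqs = ["ATCG", "ATCT", "AGGC", "GCAA"]
for name, row in zip(seqs, distance_matrix(seqs)):
    print(name, row)
```

A pair that agrees at 3 of 4 compared positions has 75% identity and therefore a distance of 0.25.
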
21
Calculating a Guide Tree
  • Uses the Neighbor-Joining method to group
    sequences
  • Results in an unrooted tree
  • Branch lengths proportional to estimated
    divergence
  • Mid-point method used to determine root
  • Means of the branch lengths to each side of the
    root are equal (or approximately equal)
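
ClustalW and ClustalX build the guide tree with the Neighbor-Joining method. The sketch below shows only its first step, choosing which pair of sequences to join, using the standard criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). It does not build the full tree or perform mid-point rooting. The distance values are hypothetical, and the name `nj_pick_pair` is an assumption.

```python
from itertools import combinations

def nj_pick_pair(d):
    """Return the pair (i, j) minimizing the Neighbor-Joining Q criterion.

    d is a full symmetric distance matrix with zeros on the diagonal.
    Ties are broken in favor of the first pair encountered.
    """
    n = len(d)
    totals = [sum(row) for row in d]

    def q(pair):
        i, j = pair
        return (n - 2) * d[i][j] - totals[i] - totals[j]

    return min(combinations(range(n), 2), key=q)

# Hypothetical distances between four sequences (an assumption).
d = [[0.00, 0.25, 0.75, 1.00],
     [0.25, 0.00, 0.75, 1.00],
     [0.75, 0.75, 0.00, 1.00],
     [1.00, 1.00, 1.00, 0.00]]
print(nj_pick_pair(d))
```
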

22
Calculating a Guide Tree
Figure: guide tree with leaves AGAA, GCAA, AGCC, AGGC, ATCG, ATCT
23
Calculating a Guide Tree
24
Progressive Alignment
  • Perform a series of pairwise alignments
  • Slowly align larger and larger groups of
    sequences
  • Follow the branching order of the tree
  • From leaves to root
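
The leaves-to-root order can be sketched as a post-order traversal of the guide tree. The example only lists which groups would be aligned at each step; it does not perform the alignments themselves. The nested-tuple tree and the name `merge_order` are assumptions made for this illustration.

```python
def merge_order(tree):
    """List the (left group, right group) pairs in the order they are aligned.

    A tree is either a sequence name (leaf) or a 2-tuple of subtrees.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):
            return [node]
        left, right = node
        left_group = visit(left)     # align everything below the left branch
        right_group = visit(right)   # then everything below the right branch
        steps.append((left_group, right_group))
        return left_group + right_group

    visit(tree)
    return steps

# Hypothetical rooted guide tree (an assumption for this sketch).
tree = ((("ATCG", "ATCT"), "AGGC"), "GCAA")
for left_group, right_group in merge_order(tree):
    print(left_group, "+", right_group)
```
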

25
Progressive Alignment
Figure: progressive alignment of AGCC, ATCG, AGAA
26
Alignment Costs
Figure: alignment costs for the column A, A, A, C, C (panel title: Traditional)
27
Alignment Inconsistencies
  • Different definitions of multiple alignments can
    yield different optimal alignments
  • Optimal tree-alignments minimize number of
    mutations from theorized common ancestors
  • SP-alignments maximize number of positions where
    aligned sequences agree
  • Sometimes makes more biological sense, since
    certain regions of proteins are more likely to
    mutate

28
Alignment Inconsistencies
  • Ex.: cost of 1 for aligning two different
    letters, cost of 2 for aligning a letter with a
    null
  • Sequences ACC, ACC, TCT, ATCT

Figure: input sequences and reconstructed sequences
29
ClustalX Demo
  • Multiple sequence alignment program
  • For more information on ClustalX:
    http://www.at.embnet.org/embnet/progs/clustal/clustalx.htm