Title: Multiple Sequence Alignment
1. Multiple Sequence Alignment
- Urmila Kulkarni-Kale
- Bioinformatics Centre
- University of Pune
- urmila_at_bioinfo.ernet.in
2. Approaches to MSA
- Dynamic programming
- Progressive alignment (ClustalW)
- Genetic algorithms (SAGA)
3. Progressive alignment approach
- Align the most closely related sequences first
- Add less closely related sequences to the initial alignment
- Perform pairwise alignments of all sequences
- Use the alignment scores to produce a phylogenetic tree
- Align sequences sequentially, guided by the tree
- In progressive methods, gaps are added to an existing profile
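The first step, pairwise alignment of every sequence pair followed by conversion of each alignment into a distance, can be sketched with a Needleman-Wunsch global alignment. This is a minimal sketch under stated assumptions. The scoring values (match +1, mismatch -1, linear gap -2) and the names `global_align` and `identity_distance` are hypothetical. ClustalW itself uses substitution matrices and affine gap penalties. Taking distance as 1 minus the fractional identity over gap-free columns is one common choice.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty (toy scoring)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell recovers one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def identity_distance(x: str, y: str) -> float:
    """1 - fraction of identical residues over aligned columns without gaps."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


print(global_align("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```

Running this over all pairs fills the distance matrix from which the guide tree is built.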
4. Number of pairwise alignments for N sequences: N(N-1)/2
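The count follows because each of the N sequences is paired with the N-1 others, and every pair is counted twice. A quick check that enumerates the unordered pairs explicitly (the sequence names are placeholders):

```python
from itertools import combinations


def n_pairwise_alignments(n: int) -> int:
    """Number of distinct unordered pairs among n sequences."""
    return n * (n - 1) // 2


seqs = ["seq1", "seq2", "seq3", "seq4", "seq5"]
pairs = list(combinations(seqs, 2))  # every unordered pair exactly once
print(len(pairs), n_pairwise_alignments(len(seqs)))  # 10 10
print(n_pairwise_alignments(100))  # 4950
```

The quadratic growth is why the pairwise stage dominates the running time for large sequence sets.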
6. Steps in ClustalW algorithm
- Pairwise alignment: calculate the distance matrix
- Build an unrooted neighbor-joining (NJ) tree
- Root the NJ tree and derive sequence weights from it
- Progressive alignment using the guide tree
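The guide tree in these steps is built by neighbor joining. Below is a minimal sketch of the standard NJ procedure. At each step it joins the pair minimizing the Q-criterion (n-2)·d(i,j) - r_i - r_j, where r_i is the sum of i's distances to the other nodes, and replaces that pair with a new internal node. The five-taxon matrix is a textbook additive example chosen so the result is easy to check. It is not data from these slides. `neighbor_joining` and the `nodeN` labels are hypothetical names. Rooting the tree, the next step above, is not shown.

```python
def neighbor_joining(names, dist):
    """Return the edges (parent, child, length) of an unrooted NJ tree."""
    d = {a: dict(dist[a]) for a in names}  # working copy of the matrix
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: pick the pair (i, j) minimizing (n-2)*d(i,j) - r_i - r_j.
        i, j = min(
            ((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((u, i, li))
        edges.append((u, j, d[i][j] - li))
        # Distances from u to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


NAMES = ["a", "b", "c", "d", "e"]
ROWS = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
DIST = {x: {y: ROWS[i][j] for j, y in enumerate(NAMES)} for i, x in enumerate(NAMES)}

for edge in neighbor_joining(NAMES, DIST):
    print(edge)
```

For an additive matrix like this one, NJ recovers the tree exactly: the first join is a and b, with branch lengths 2 and 3.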
7. ClustalW sequence weights
- Groups of closely related sequences receive lower weights
- Highly divergent sequences without any close relatives receive higher weights
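One way to turn a rooted guide tree into weights with this behavior is to split each branch's length equally among the sequences beneath it, then sum those shares along each root-to-leaf path. A branch shared by a large group is therefore diluted, while a long private branch counts in full. This follows the branch-sharing idea commonly described for ClustalW, but the sketch is an illustration under assumptions. The tree, branch lengths and the name `sequence_weights` are hypothetical, and any normalization a real tool applies is omitted.

```python
def sequence_weights(children, root):
    """children: dict mapping an internal node to a list of (child, branch_length).
    Nodes that never appear as keys are leaves (sequences)."""
    leaves_below = {}

    def count(node):
        # Number of sequences at or below `node`.
        if node not in children:
            leaves_below[node] = 1
        else:
            leaves_below[node] = sum(count(c) for c, _ in children[node])
        return leaves_below[node]

    count(root)
    weights = {}

    def walk(node, acc):
        if node not in children:
            weights[node] = acc
            return
        for child, length in children[node]:
            # A branch shared by k sequences contributes length / k to each of them.
            walk(child, acc + length / leaves_below[child])

    walk(root, 0.0)
    return weights


# Hypothetical rooted tree: A1 and A2 are close relatives, B is divergent.
TREE = {
    "root": [("X", 0.2), ("B", 0.6)],
    "X": [("A1", 0.1), ("A2", 0.1)],
}
print(sequence_weights(TREE, "root"))  # A1 and A2 about 0.2 each, B about 0.6
```

The two related sequences share the 0.2 branch and so receive lower weights than the isolated sequence B.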
8. ClustalW affine gap penalty
- GOP: gap opening penalty
- GEP: gap extension penalty
- Heuristics in calculating the gap penalty
  - Position-specific penalty
    - Gap at this position?
      - Yes → lower GOP and GEP
      - No, but a gap within 8 residues → increase GOP
    - Stretch of hydrophilic residues?
      - Yes → lower GOP
      - No → use residue-specific gap propensities
- Once a gap, always a gap
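The decision order in these heuristics can be sketched as a function that returns a local (GOP, GEP) for one column of an existing profile. In the affine scheme, a gap of length L costs one opening plus an extension for each further residue. `gop + gep * (L - 1)` is one common convention, and some tools charge `gop + gep * L` instead. Everything else here is an assumption. The multiplying factors are placeholders, not ClustalW's values. The hydrophilic set GPSNDQEKR, the 5-residue stretch length and all function names are likewise assumptions. The profile is only read, never modified, in keeping with "once a gap, always a gap".

```python
# Assumed hydrophilic residue set; adjust to the tool's own setting.
HYDROPHILIC = set("GPSNDQEKR")

# Hypothetical multiplying factors, NOT ClustalW's actual values.
GAP_HERE_GOP, GAP_HERE_GEP = 0.3, 0.5
NEAR_GAP_GOP = 2.0
HYDROPHILIC_GOP = 1 / 3


def affine_gap_cost(length: int, gop: float, gep: float) -> float:
    """Cost of one gap of `length` residues: open once, extend the rest."""
    return gop + gep * (length - 1)


def in_hydrophilic_stretch(profile, pos, stretch=5):
    """True if any sequence has >= `stretch` consecutive hydrophilic residues covering pos."""
    for s in profile:
        if s[pos] not in HYDROPHILIC:
            continue
        left = right = pos
        while left > 0 and s[left - 1] in HYDROPHILIC:
            left -= 1
        while right + 1 < len(s) and s[right + 1] in HYDROPHILIC:
            right += 1
        if right - left + 1 >= stretch:
            return True
    return False


def local_gap_penalties(profile, pos, gop, gep, window=8, propensity=None):
    """Position-specific (GOP, GEP) for opening a gap at column `pos` of a profile."""
    column = [s[pos] for s in profile]
    if "-" in column:
        # A gap already exists here: make new gaps cheaper at the same place.
        return gop * GAP_HERE_GOP, gep * GAP_HERE_GEP
    lo, hi = max(0, pos - window), min(len(profile[0]), pos + window + 1)
    if any("-" in s[lo:hi] for s in profile):
        # A gap within `window` residues: discourage a second gap nearby.
        gop *= NEAR_GAP_GOP
    if in_hydrophilic_stretch(profile, pos):
        gop *= HYDROPHILIC_GOP
    elif propensity:
        # Hypothetical residue-specific multipliers, averaged over the column.
        gop *= sum(propensity.get(r, 1.0) for r in column) / len(column)
    return gop, gep


profile = ["AVL-IM", "AVLKIM"]
print(local_gap_penalties(profile, 3, 10.0, 0.2))  # gap already at column 3
print(local_gap_penalties(profile, 0, 10.0, 0.2))  # gap within the window
```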
9. Variation in local GOP
[Figure: local GOP along the alignment compared with the initial GOP; the GOP is lowest in hydrophilic regions]
10. MSA helps detect similarity
[Figure: alignment of hemoglobin sequences from human, chimpanzee, goat, pig, horse and mouse]
11. Sample MSA
12. Applications of MSA
- Detecting diagnostic patterns
- Phylogenetic analysis
- Primer design
- Prediction of protein secondary structure
- Finding novel relationships between genes
- Similar genes conserved across organisms
  - Same or similar function
- Simultaneous alignment of similar genes yields
  - regions subject to mutation
  - regions of conservation
  - mutations or rearrangements causing change in conformation or function
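An MSA exposes conserved and variable regions, and a simple per-column score makes this concrete. The sketch below scores each column by the fraction of sequences that share its most common residue. This is one of many possible conservation measures, and gaps count as non-matching. The toy alignment and the name `column_conservation` are invented for illustration. They are not taken from the hemoglobin example.

```python
from collections import Counter


def column_conservation(msa):
    """Per-column conservation score and a simple consensus sequence."""
    scores, consensus = [], []
    for j in range(len(msa[0])):
        column = [s[j] for s in msa]
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            consensus.append("-")
            continue
        # most_common breaks ties by first occurrence in the column.
        residue, count = Counter(residues).most_common(1)[0]
        scores.append(count / len(column))  # gaps lower the score
        consensus.append(residue)
    return scores, "".join(consensus)


msa = ["ACGTA", "ACGTT", "AC-TC"]  # hypothetical toy alignment
scores, consensus = column_conservation(msa)
conserved = [j for j, s in enumerate(scores) if s == 1.0]
print(consensus, conserved)  # ACGTA [0, 1, 3]
```

Fully conserved columns (score 1.0) are candidate functional or structural sites. Low-scoring columns mark regions subject to mutation.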
13. Limitations of the progressive alignment approach
- Greedy nature
- Any errors in the initial alignment are carried through to the final alignment
- More reliable for closely related sequences than for divergent sequences