Integrating Biological Information In Multiple Sequence Alignments

1
Integrating Biological Information In
Multiple Sequence Alignments
  • Confronting Bits and Pieces of Information

Cédric Notredame, CNRS-Marseille, France
www.tcoffee.org
2
Mangel M., Samaniego F.J., Abraham Wald's Work
on Aircraft Survivability, J. American
Statistical Association, 79, 259-270 (1984)
3
What's in a Multiple Sequence Alignment (I)?
Evolution Inertia: Common Ancestry Shows Up in the Sequences
Selection: Important Features Are Preserved
Functional Constraint: Same Function, Same Sequence (Convergence)
Phylogenetic Footprint, Evolutionary Trace
4
Why So Much Interest in Multiple Alignments?
Extrapolation
Structure Prediction
Motifs/Patterns
SNP Analysis
Profiles
Regulatory Elements
Phylogeny
Reactivity Analysis
5
What's in a Multiple Alignment (II)?
  • The MSA contains what you put inside
  • Structural Similarity
  • Evolutionary Similarity
  • Sequence Similarity
  • You can view your MSA as
  • A record of evolution
  • A summary of a protein family
  • A collection of experiments made for you by
    Nature

6
Building and Using Models
35.67 Angstrom
7
Computing the Correct Alignment is a Complicated
Problem
8
Stochastic Optimization
9
Stochastic Optimization
  • Exploration of Complex Optimization Problems With
    Multiple Constraints
  • Genomic Alignments
  • RNA Alignments
  • Generation of Population of Suboptimal Solutions
  • Quality = f(optimality)
  • Specification of Consistency Objective Function
    of T-Coffee

10
Three Types of Algorithms
  • Progressive: ClustalW
  • Iterative: MUSCLE
  • Consistency Based: T-Coffee and ProbCons

11
ClustalW: The Progressive Algorithm
12
T-Coffee and Consistency
SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT
SeqC GARFIELD THE VERY FAST CAT
SeqD THE FAT CAT
SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST CA-T ---
SeqC GARFIELD THE VERY FAST CAT
SeqD -------- THE ---- FA-T CAT
13
T-Coffee and Consistency
SeqA GARFIELD THE LAST FAT CAT    Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---
SeqA GARFIELD THE LAST FA-T CAT   Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqA GARFIELD THE LAST FAT CAT    Prim. Weight = 100
SeqD -------- THE ---- FAT CAT
SeqB GARFIELD THE ---- FAST CAT   Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT
SeqC GARFIELD THE VERY FAST CAT   Prim. Weight = 100
SeqD -------- THE ---- FA-T CAT
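The primary weights attached to the pairwise alignments above read like percent identities. The transcript does not give the exact counting rule, so the sketch below is only one plausible reading: it assumes that gaps and the blanks between words are skipped and that the result is truncated to an integer. The function name `percent_identity` is hypothetical, and this rule is not guaranteed to reproduce every weight on the slide.

```python
def percent_identity(row_a: str, row_b: str) -> int:
    """Percent identity over columns where both rows carry a letter.

    Assumption (not stated in the slides): gaps ('-') and word separators
    are ignored, and the percentage is truncated to an integer.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if not (a.isalpha() and b.isalpha()):
            continue  # skip gaps and blanks
        compared += 1
        matches += a == b
    return 100 * matches // compared if compared else 0


# Two of the pairwise alignments from the slide.
weight_ab = percent_identity("GARFIELD THE LAST FAT CAT",
                             "GARFIELD THE FAST CAT ---")
weight_ad = percent_identity("GARFIELD THE LAST FAT CAT",
                             "-------- THE ---- FAT CAT")
print(weight_ab, weight_ad)  # 88 100
```

Under this rule the SeqA/SeqB pair compares 18 letter columns with 16 matches, and the SeqA/SeqD pair compares 9 columns with 9 matches.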
14
T-Coffee and Consistency
15
T-Coffee and Consistency
16
T-Coffee and Consistency
17
T-Coffee and Consistency
18
T-Coffee and Consistency
19
T-Coffee and Consistency
  • Each Library Line is a Soft Constraint (a wish)
  • You can't satisfy them all
  • You must satisfy as many as possible (The easy
    ones)
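One way to see why some library lines are "easy" to satisfy is the library extension step used by consistency-based aligners: a residue pair gains weight whenever a third sequence supports it. The sketch below is a minimal illustration, assuming each library line is a pair of residues with an integer weight and that support through an intermediate residue counts as the minimum of the two weights involved. The names `extend_library` and `Residue` are hypothetical, and the second SeqA/SeqB line in the example is invented to show a conflicting wish.

```python
from collections import defaultdict
from itertools import combinations

Residue = tuple[str, int]  # (sequence name, residue index)


def extend_library(primary: dict[tuple[Residue, Residue], int]) -> dict[tuple[Residue, Residue], int]:
    """Add, to every residue pair, the support it receives through third residues."""
    # Symmetric view of the library: residue -> {partner: weight}.
    partners: dict[Residue, dict[Residue, int]] = defaultdict(dict)
    for (x, y), weight in primary.items():
        partners[x][y] = weight
        partners[y][x] = weight

    extended: dict[tuple[Residue, Residue], int] = defaultdict(int)
    # Start from the direct (primary) weights.
    for (x, y), weight in primary.items():
        extended[tuple(sorted((x, y)))] += weight
    # Add the weight contributed through each intermediate residue z.
    for z, linked in partners.items():
        for x, y in combinations(sorted(linked), 2):
            if x[0] == y[0]:
                continue  # residues of the same sequence are never paired
            extended[(x, y)] += min(linked[x], linked[y])
    return dict(extended)


# Toy library: one residue from each of SeqA, SeqB and SeqC, plus a
# conflicting wish that pairs the same SeqA residue with another SeqB residue.
primary = {
    (("A", 5), ("B", 5)): 88,
    (("A", 5), ("C", 5)): 77,
    (("B", 5), ("C", 5)): 100,
    (("A", 5), ("B", 7)): 88,
}
extended = extend_library(primary)
print(extended[(("A", 5), ("B", 5))])  # 165 = 88 + min(77, 100)
print(extended[(("A", 5), ("B", 7))])  # 88, no third sequence supports it
```

The two SeqA/SeqB lines start with the same weight, but only the one backed by SeqC grows, so it is the wish an aligner maximising total weight will tend to satisfy.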

20
T-Coffee and Consistency
21
Consistency-Based Algorithms: T-Coffee
  • Gotoh (1990)
  • Iterative strategy using consistency
  • Martin Vingron (1991)
  • Dot Matrices Multiplications
  • Accurate but too stringent
  • Dialign (1996, Morgenstern)
  • Consistency
  • Agglomerative Assembly
  • T-Coffee (2000, Notredame)
  • Consistency
  • Progressive algorithm

22
How Good Is My Method ?
23
Structures Vs Sequences
24
Validation Using BaliBase
25
Too Many Methods for ONE Alignment: M-Coffee
27
Combining Many MSAs into ONE
ClustalW
MAFFT
T-Coffee
MUSCLE
???????
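A simple way to picture the combination of several aligners' outputs is to let each alignment vote for the residue pairs it places in the same column. The sketch below assumes every aligner gets one equal vote and that each MSA is given as a mapping from sequence name to gapped row. The helper names `aligned_pairs` and `pair_support` are hypothetical and the two toy alignments are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(msa: dict[str, str]) -> set:
    """Return every pair of residues that share a column in one alignment."""
    next_index = dict.fromkeys(msa, 0)  # next residue index per sequence
    length = len(next(iter(msa.values())))
    pairs = set()
    for col in range(length):
        column = []
        for name, row in msa.items():
            if row[col] != "-":
                column.append((name, next_index[name]))
                next_index[name] += 1
        pairs.update(combinations(sorted(column), 2))
    return pairs


def pair_support(alignments: list[dict[str, str]]) -> dict:
    """Fraction of the alternative alignments that contain each residue pair."""
    votes = Counter()
    for msa in alignments:
        votes.update(aligned_pairs(msa))
    return {pair: count / len(alignments) for pair, count in votes.items()}


# Two hypothetical aligners disagree on where the last residue of s2 goes.
method_1 = {"s1": "ACG", "s2": "A-G"}
method_2 = {"s1": "ACG", "s2": "AG-"}
support = pair_support([method_1, method_2])
print(support[(("s1", 0), ("s2", 0))])  # 1.0, both methods agree
print(support[(("s1", 2), ("s2", 1))])  # 0.5, only method_1 proposes it
```

Pairs with high support can then be treated as strong constraints when the single combined alignment is built, and regions where support is low are the ones where the methods disagree.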
28
Comparing Methods
MAFFT
31
Estimating the Accuracy of your MSA
32
What To Do Without Structures
33
Where to Trust Your Alignments
Most Methods Disagree
Most Methods Agree
34
What To Do Without Structures
35
When Sequences Are Not Enough: 3D-Coffee and
Expresso
36
3D-Coffee: Combining Sequences and Structures
Within Multiple Sequence Alignments
37
3D-Coffee: Combining Sequences and Structures
Within Multiple Sequence Alignments
38
Expresso: Finding the Right Structure
Why Not Use Structure-Based Alignments?
39
Expresso: Finding the Right Structure
Sources
BLAST
BLAST
SAP
Templates
Templates
Template Alignment
Source Template Alignment
Library
Remove Templates
40
3D-Coffee: Combining Sequences and Structures
Within Multiple Sequence Alignments
41
Template Based Multiple Sequence Alignments
42
Template Based Multiple Sequence Alignments
Sources
-Structure -Profile -
Template Aligner
-Structure -Profile -
Templates
Templates
Template Alignment
Source Template Alignment
Library
Remove Templates
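The boxes above suggest that residue pairs found by aligning two templates are transferred back onto the source sequences before the templates are discarded. The sketch below illustrates that projection under simple assumptions: each source-to-template alignment is a mapping from source residue index to template residue index, and the template alignment is a set of index pairs. The function name `project_template_pairs` and all data are hypothetical.

```python
def project_template_pairs(
    source_a_to_template: dict[int, int],
    source_b_to_template: dict[int, int],
    template_pairs: set[tuple[int, int]],
) -> list[tuple[int, int]]:
    """Turn template/template residue pairs into source/source residue pairs."""
    # Invert the source -> template mappings.
    template_to_source_a = {t: s for s, t in source_a_to_template.items()}
    template_to_source_b = {t: s for s, t in source_b_to_template.items()}
    projected = []
    for t_a, t_b in sorted(template_pairs):
        # Keep a pair only if both template residues map back to a source residue.
        if t_a in template_to_source_a and t_b in template_to_source_b:
            projected.append((template_to_source_a[t_a], template_to_source_b[t_b]))
    return projected


# Source A residues 0..2 match template residues 10..12;
# source B residues 0 and 1 match template residues 20 and 22.
a_to_template = {0: 10, 1: 11, 2: 12}
b_to_template = {0: 20, 1: 22}
template_alignment = {(10, 20), (11, 21), (12, 22)}
library_lines = project_template_pairs(a_to_template, b_to_template, template_alignment)
print(library_lines)  # [(0, 0), (2, 1)]
```

The middle template pair is dropped because template residue 21 has no counterpart in source B. The remaining pairs are what would enter the library once the templates are removed.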
43
Method      Score        Templates   Prefab  Homstrad
-----------------------------------------------------
ClustalW    Matrix       ----        61.80   ----
Kalign      Matrix       ----        63.00   ----
MUSCLE      Matrix       ----        68.00   45.0
-----------------------------------------------------
T-Coffee    Consistency  ----        69.97   44.0
ProbCons    Consistency  ----        70.54   ----
Mafft       Consistency  ----        72.20   ----
M-Coffee    Consistency  ----        72.91   ----
MUMMALS     Consistency  ----        73.10   ----
-----------------------------------------------------
Clustal-db  Matrix       Profiles    ----    ----
PRALINE     Matrix       Profiles    ----    50.2
PROMALS     Consistency  Profiles    79.00   ----
SPEM        Matrix       Profiles    77.00   ----
-----------------------------------------------------
EXPRESSO    Consistency  Structures  ----    71.9
T-Lara      Consistency  Structures  ----    ----
-----------------------------------------------------
Table 1. Summary of all the methods described in the review.
Validation figures were compiled from several sources and
selected for compatibility. Prefab refers to validation made
on Prefab Version 3. The HOMSTRAD validation was made on
datasets having less than 30% identity. The source of each
figure is indicated by a reference. The EXPRESSO figure comes
from a slightly more demanding subset of HOMSTRAD (HOM39)
made of sequences less than 25% identical.
44
Improving The Evaluation
45
How Do We Perform In The Twilight Zone?
  • Consistency Based Methods Have an Edge
  • Hard to tell Methods Apart
  • Sequence Alignment is NOT solved

46
More Than Structure based Alignments
  • Structural Correctness Is Only the Easy Side of
    the Coin.
  • In practice, MSAs are intermediate models used to
    generate other models

47
Conclusion
  • Template based Multiple Sequence Alignments
  • Projecting any relevant information onto the
    sequences
  • Using this Information
  • Need for new evaluation procedures
  • Functional Analysis
  • Phylogenetic Analysis
  • Homology Search (Profiles)
  • Homology Modelling
  • Integrating data: making sure your bits of data
    can fight with one another

48
  • Fabrice Armougom (CNRS, FR)
  • Sebastien Moretti (CNRS, FR)
  • Olivier Poirot (CNRS, FR)
  • Frederic Reinier (CRS4, IT)
  • Karsten Suhre (CNRS, FR)
  • Vladimir Saudek (Sanofi-Aventis, FR)
  • Des Higgins (UCD, IE)
  • Orla O'Sullivan (UCD, IE)
  • Iain Wallace (UCD, IE)
  • Victor Jongeneel (SIB/VitalIT, CH)
  • Bruno Nyfler (VitalIT, CH)
  • Roger Hersch (EPFL, CH)
  • Pierre Dumas (EPFL, CH)
  • Basile Schaeli (EPFL, CH)

www.tcoffee.org cedric.notredame_at_europe.com
51
Turning Data into Models
  • Data
  • Columbus considered that the landmass occupied
    225°, leaving only 135° of water (Marinus of
    Tyre, 70 AD).
  • Columbus believed that 1° represented only 56
    miles (Alfraganus, IXth century)
  • He knew there was an island named Japan off the
    coast of China
  • Model
  • Circumference of the Earth as 25,255 km at most
  • Canary Islands to Japan: 3,700 km (Reality:
    12,000 km)

52
The More Structures The Merrier
Average Improvement over T-Coffee
Struc/Seq Ratio
53
The Right Mix of Methods
54
3D-Coffee: Combining Sequences and Structures
Within Multiple Sequence Alignments
55
Applications
56
Looking Up the DNA Behind the Sequences: PROTOGENE
57
SAR Analysis
  • Correlate Alignment Variations with Reactivity
  • Application to the Human Kinome
  • Collaboration with Sanofi-Aventis
  • Main Issue
  • Training problem: Proper Benchmarking