Title: The Bioinformatics Toolkit at the MPI for Developmental Biology
Slide 1: The Bioinformatics Toolkit at the MPI for Developmental Biology
Workshop Systems Biology, Berlin, March 3, 2006
Johannes Söding, Department for Protein Evolution (Andrei Lupas), Max-Planck-Institute for Developmental Biology
Slide 2: Our toolkit assists the department's research in protein evolution and makes methods developed in our group accessible to a wider public
- Sequence similarity searches
- Multiple sequence alignment
- Sequence analysis (repeats, periodicities, subtyping)
- Secondary structure and transmembrane prediction
- Tertiary structure prediction and structure analysis
- Phylogeny and classification
- Utilities (reformatting, sequence retrieval, filtering)
Slide 3: Overview page for sequence search in the toolkit
Slide 4: The toolkit's PSI-BLAST has more functionality than the NCBI version
- Select subsets of more than 300 genomes
- Upload personal databases
- Change databases between search rounds
- Show colored multiple alignment (JalView)
- Submit results to other tools
Slide 5: Quick2D integrates the results of various secondary structure prediction programs
Contributed by Christian Mayer, MPI-DevBio
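One simple way to integrate several secondary structure predictions is a per-residue majority vote. The sketch below shows this on three-state strings (H helix, E strand, C coil). It is a hypothetical illustration, not a description of how Quick2D itself presents or combines predictions; the function name `consensus`, the three-state alphabet, the tie rule, and the program names are assumptions.

```python
from collections import Counter


def consensus(predictions: dict[str, str]) -> str:
    """Per-residue majority vote over three-state secondary-structure strings.

    Hypothetical helper: H = helix, E = strand, C = coil. Ties are resolved
    to coil, an arbitrary choice made only for this sketch.
    """
    lengths = {len(s) for s in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")
    (n,) = lengths
    result = []
    for i in range(n):
        counts = Counter(s[i] for s in predictions.values())
        top = max(counts.values())
        winners = [state for state, c in counts.items() if c == top]
        # A single clear winner is kept; any tie falls back to coil.
        result.append(winners[0] if len(winners) == 1 else "C")
    return "".join(result)


# Hypothetical outputs of three predictors for the same 8 residues.
preds = {"prog1": "HHHCCEEE", "prog2": "HHHCCEEC", "prog3": "HHCCCEEE"}
print(consensus(preds))  # HHHCCEEE
```

A majority vote hides disagreement between methods; showing the individual predictions side by side keeps that information visible to the user.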
Slide 6: REPPER detects periodic regions in proteins
Gruber M, Söding J, and Lupas AN. (2005) NAR 33, W239-243.
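Periodicity in a protein sequence can be detected with a Fourier transform of a hydrophobicity profile. The sketch below shows the idea on a synthetic coiled-coil-like sequence, where hydrophobic residues at heptad positions a and d produce a dominant period of 3.5 residues. It is not REPPER's implementation: the binary hydrophobicity scale (`HYDROPHOBIC`), the candidate periods, and the helper names are assumptions chosen for illustration.

```python
import cmath
import math

# Simplified binary hydrophobicity scale: an assumption made for this sketch.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_profile(seq: str) -> list[float]:
    return [1.0 if aa in HYDROPHOBIC else 0.0 for aa in seq]


def periodicity_power(values: list[float], period: float) -> float:
    """Power of the mean-centred profile at the given period (discrete Fourier sum)."""
    mean = sum(values) / len(values)
    total = sum((v - mean) * cmath.exp(2j * math.pi * k / period)
                for k, v in enumerate(values))
    return abs(total) ** 2 / len(values)


def best_period(seq: str, periods: list[float]) -> float:
    """Candidate period with the strongest hydrophobic periodicity."""
    profile = hydrophobic_profile(seq)
    return max(periods, key=lambda t: periodicity_power(profile, t))


# Synthetic heptad repeat: hydrophobic L at positions a and d of each heptad.
coil = "LEELKEK" * 6
print(best_period(coil, [2.0, 3.0, 3.5, 4.0, 5.0, 7.0]))  # 3.5
```

Scanning the sequence in sliding windows instead of using the whole sequence at once would localize periodic regions rather than just detecting them.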
Slide 7: Several tools rely on a sensitive new method for remote homology detection
- HHrep: de novo repeat detection
- HHpred: structure and function prediction by detecting remote homologs in databases such as the PDB, SCOP, Pfam, SMART, InterPro, and CDD at NCBI
- HHsenser: sequence search method that employs an exhaustive intermediate profile search
Underlying method: pairwise comparison of profile hidden Markov models (HMMs)
What is a sequence profile? What is a profile HMM?
Slide 8: Sequence profiles are a condensed representation of multiple alignments
master sequence
HBA_human  ... W G K V G A - - H A G E ...
HBB_human  ... W G K V - - - - N V D E ...
MYG_phyca  ... W G K V E A - - D V A G ...
LGB2_luplu ... W K D F N A - - N I P K ...
GLB1_glydi ... W E E I A G A D N G A G ...
Each column j of the profile, p_j(a), contains the frequencies of the amino acids a in column j of the multiple sequence alignment.
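To make p_j(a) concrete, the sketch below computes raw column frequencies for the alignment excerpt on this slide (only the columns shown, without the flanking "..."). Gaps are simply skipped. Real profile-building tools additionally apply sequence weighting and pseudocounts, which are omitted here; the function and variable names are hypothetical.

```python
from collections import Counter

# Columns of the slide's alignment excerpt (flanking "..." omitted).
ALIGNMENT = {
    "HBA_human": "WGKVGA--HAGE",
    "HBB_human": "WGKV----NVDE",
    "MYG_phyca": "WGKVEA--DVAG",
    "LGB2_luplu": "WKDFNA--NIPK",
    "GLB1_glydi": "WEEIAGADNGAG",
}


def profile(alignment: dict[str, str]) -> list[dict[str, float]]:
    """Return p_j(a) for each column j: raw amino-acid frequencies, gaps skipped."""
    seqs = list(alignment.values())
    columns = []
    for j in range(len(seqs[0])):
        residues = [s[j] for s in seqs if s[j] != "-"]
        counts = Counter(residues)
        columns.append({aa: n / len(residues) for aa, n in counts.items()})
    return columns


p = profile(ALIGNMENT)
print(p[0])  # {'W': 1.0}
print(p[1])  # {'G': 0.6, 'K': 0.2, 'E': 0.2}
```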
Slide 9: HMMs include position-specific gap penalties
Columns are match or delete (M/D) states or insert (I) states; "-" marks deletions, lower-case letters mark insertions
M/D M/D M/D I I M/D M/D M/D M/D M/D
HBA_human  ... V G A . . H A G E Y ...
HBB_human  ... V - - . . N V D E V ...
MYG_phyca  ... V E A . . D V A G H ...
LGB2_luplu ... F N A . . N I P K H ...
GLB1_glydi ... I A G a d N G A G V ...
Position-specific probabilities for insert open, insert extend, delete open, and delete extend
Slide 10: Profile HMMs can be represented as states connected by transitions
M/D M/D M/D I I M/D M/D M/D M/D M/D
HBA_human  ... V G A . . H A G E Y ...
HBB_human  ... V - - . . N V D E V ...
MYG_phyca  ... V E A . . D V A G H ...
LGB2_luplu ... F N A . . N I P K H ...
GLB1_glydi ... I A G a d N G - G V ...
HMM p: emission probabilities p_i(a) (the profile matrix) and transition probabilities p_i(X→Y)
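The transition probabilities p_i(X→Y) can be estimated by turning each aligned sequence into a path through the states and counting transitions. The sketch below does this for the alignment on this slide, treating the two columns with "." and lower-case letters as insert columns and the rest as match/delete columns. It uses raw counts without pseudocounts or sequence weights, and the state names (M1, D2, I3, ...) and helper functions are hypothetical. The last helper turns p(M_k→D_k+1) into a gap-open cost in bits, which illustrates why an HMM's gap penalties differ from position to position.

```python
import math
from collections import Counter, defaultdict

# Alignment from the slide; "M" marks match/delete columns, "I" insert columns.
COLUMN_TYPES = "MMMIIMMMMM"
ROWS = {
    "HBA_human": "VGA..HAGEY",
    "HBB_human": "V--..NVDEV",
    "MYG_phyca": "VEA..DVAGH",
    "LGB2_luplu": "FNA..NIPKH",
    "GLB1_glydi": "IAGadNG-GV",
}


def state_path(row: str) -> list[str]:
    """Map one aligned sequence to its state path (M_k, D_k or I_k)."""
    path, k = [], 0  # k = index of the most recent match/delete column
    for col_type, ch in zip(COLUMN_TYPES, row):
        if col_type == "M":
            k += 1
            path.append(f"D{k}" if ch == "-" else f"M{k}")
        elif ch != ".":  # residue in an insert column
            path.append(f"I{k}")
    return path


def transition_probs(rows) -> dict[str, dict[str, float]]:
    """Maximum-likelihood p(X -> Y) from counted transitions (no pseudocounts)."""
    counts = defaultdict(Counter)
    for row in rows:
        path = state_path(row)
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in counts.items()}


def gap_open_cost(probs, k: int) -> float:
    """Position-specific gap-open cost in bits: -log2 p(M_k -> D_{k+1})."""
    p = probs.get(f"M{k}", {}).get(f"D{k + 1}", 0.0)
    return math.inf if p == 0.0 else -math.log2(p)


probs = transition_probs(ROWS.values())
print(probs["M3"])  # {'M4': 0.75, 'I3': 0.25}
print(round(gap_open_cost(probs, 1), 2), gap_open_cost(probs, 2))  # 2.32 inf
```

The infinite cost after M2 is an artifact of using raw counts on five sequences; pseudocounts would give every transition a small non-zero probability.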
Slide 14: Find the path through two HMMs that maximizes the co-emission probability
[Figure: path through HMM q and HMM p, listing aligned state pairs (state q, state p) such as M M and M I, and the co-emitted sequence x1 ... x6]
Include a null model: maximize the log-sum-of-odds score
Söding, J. (2005) Bioinformatics 21, 951-960.
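The log-sum-of-odds score for aligning match state i of HMM q with match state j of HMM p can be written as S(q_i, p_j) = log2 Σ_a q_i(a) p_j(a) / f(a), where f(a) are the null-model (background) frequencies. The sketch below evaluates only this column score on a hypothetical four-letter alphabet with a uniform null model; the full method also scores state transitions and finds the best path by dynamic programming, which is omitted here. The function name is an assumption.

```python
import math


def column_score(q: dict[str, float], p: dict[str, float],
                 background: dict[str, float]) -> float:
    """Log-sum-of-odds score in bits for two aligned match-state columns.

    S(q_i, p_j) = log2 sum_a q_i(a) * p_j(a) / f(a); -inf if no residue is shared.
    """
    total = sum(q.get(a, 0.0) * p.get(a, 0.0) / f for a, f in background.items())
    return math.log2(total) if total > 0 else -math.inf


# Hypothetical four-letter alphabet with a uniform null model f(a) = 0.25.
f = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}
print(column_score({"A": 0.5, "C": 0.5}, {"A": 1.0}, f))  # 1.0
print(column_score(f, f, f))                               # 0.0
print(column_score({"A": 1.0}, {"C": 1.0}, f))             # -inf
```

Two columns that both look like the background score 0 bits, so only similarity beyond what the null model expects is rewarded.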
Slide 15: HHrep detects repeats by HMM-HMM comparison of the sequence with itself
[Figure: self-comparison dot plot with repeats 1 to 4 along both axes]
The dot plot with suboptimal alignments reveals internal symmetries
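The idea of comparing a sequence with itself can be illustrated with a much cruder tool than HHrep. The sketch below uses plain residue identity instead of HMM-HMM comparison: it builds a self dot plot and scores each off-diagonal by its fraction of identical residue pairs, so an internal repeat of length L appears as a strong diagonal at offset L. The helper names and the toy sequence are hypothetical.

```python
def dot_plot(seq: str) -> list[list[int]]:
    """Identity dot plot of a sequence against itself (1 = same residue)."""
    return [[int(a == b) for b in seq] for a in seq]


def diagonal_identity(seq: str, offset: int) -> float:
    """Fraction of identical residue pairs on the diagonal shifted by `offset`."""
    pairs = len(seq) - offset
    return sum(seq[i] == seq[i + offset] for i in range(pairs)) / pairs


def best_repeat_offset(seq: str) -> int:
    """Smallest off-diagonal offset with the highest identity."""
    offsets = range(1, len(seq) // 2 + 1)
    return max(offsets, key=lambda d: diagonal_identity(seq, d))


toy = "MKLV" * 3  # three tandem copies of a hypothetical 4-residue repeat
print(best_repeat_offset(toy), diagonal_identity(toy, 4))  # 4 1.0
```

Identity works only for nearly exact repeats; the diverged repeats in the examples that follow need profile- or HMM-based scores to become visible.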
Slide 16: Outer membrane β-barrels might have evolved by duplication of a single ββ-hairpin
[Figure: OmpA]
But is there an internal symmetry in the sequences?
Slide 17: HHrep indeed finds a fourfold sequence symmetry in OMPs
[Figure: HHrep dot plot of OmpA against itself, axes in residue positions (50, 100, 150); blue: significant alignments]
Slide 18: TIM barrels possess approximate structural symmetry, but until now it has not been possible to detect this repeat pattern at the sequence level
Slide 19: HHrep detects structural repeats in TIM barrels
[Figure: HHrep results for 1fq0a]
Slide 20: Did TIM barrels evolve by duplication of a quarter-barrel peptide?
[Figure: dot plots for HisF and KDPG aldolase, labelled fourfold symmetry and eightfold symmetry; panels: profile-profile dot plot, after consistency transformation, and the same with a lower score threshold]
Slide 21: HMM-HMM comparison improves upon profile-profile comparison
All-against-all benchmark on SCOP (20% seq. id.)
[Figure: benchmark curves plotted against the rate of false positives for HMM-HMM + predicted SS, HMM-HMM + SS, HMM-HMM + corr, HMM-HMM, three profile-profile methods, HMM-seq, profile-seq, and seq-seq]
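A benchmark of this kind ranks all compared pairs by score and asks how many true homologs are found before a given rate of false positives is reached. The sketch below implements that counting step for a list of (score, is_true_homolog) pairs. How pairs are labelled true or false, and defining the false-positive rate as a fraction of all negative pairs, are assumptions of this sketch rather than the exact protocol behind the plot; the function name and data are hypothetical.

```python
def true_positives_at_fp_rate(scored_pairs, max_fp_rate: float) -> int:
    """Count true positives ranked above the point where false positives
    exceed max_fp_rate * (number of negative pairs)."""
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    n_negatives = sum(1 for _, is_true in ranked if not is_true)
    allowed_fp = max_fp_rate * n_negatives
    tp = fp = 0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
            if fp > allowed_fp:
                break
    return tp


# Hypothetical (score, is_true_homolog) pairs.
pairs = [(9.1, True), (8.4, True), (7.7, False), (6.2, True), (5.0, False), (4.3, True)]
print(true_positives_at_fp_rate(pairs, 0.0), true_positives_at_fp_rate(pairs, 0.5))  # 2 3
```

Evaluating this at many false-positive rates traces out one curve per method, which is how the methods in such a plot can be compared.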
Slide 22: The HHpred input page
1. Paste the ScbA sequence
2. Select a database
3. Submit the job
All input parameters are linked to explanations on help pages.
ScbA from Streptomyces is involved in regulating the onset of antibiotic production, but its function is unknown.
Slide 23: Search results, alignment view
Annotations in the screenshot:
- Create 3D model
- Graphical representation of the best database hits along the query sequence
- Statistical significance
- View template structure
- View template alignment
- Summary hit list for the best database matches
- View alignments as histograms
- Predicted secondary structure (query)
- Query sequence (ScbA)
- Match quality
- Alignments with database sequences (templates)
- Template sequence (from database)
- Actual secondary structure (template)
- Interesting region of high similarity
- Predicted secondary structure (template)
The six best hits belong to a superfamily of enzymes from the fatty acid synthesis pathway!
Slide 24: Histogram view
[Figure: histogram view of the alignments with FabZ and FabA]
Highly conserved arginine: catalytic?
The highly conserved residues E and Q are catalytic residues in FabZ/FabA!
Slide 25: Homology between histones and the C-terminal subdomain of AAA ATPases
[Figure: RuvB (AAA), TAFII62, and TAFII42, with a kink marked]
Work in progress, V. Alva Kullanja, M. Ammelburg et al.
Slide 26: [Figure: OmpA and MspA porin]
Slide 28: [Figure: OmpW]
Slide 29: [Figure: diagram with the query and the sequences Elt10 and Elt10-3; shaded: accepted sequences]
Slide 30: Sequences obtained with HHsenser, clustered with CLANS
[Figure: CLANS cluster map with groups MazE (1mvf), YjiW, Vir, archaeal PhoU, PemI/MazE, VagC, AbrB (new, 1yfb), cyano TF, SpoVT, PrlF, AbrB (1ekt), MraZ-N, MraZ-C, and MraZ (1n0g)]
M. Coles et al. (2005) Structure 13, 919-928.
Slide 31: Retroactive from Drosophila was identified in a screen for chitin-associated defects
[Figure: wild type and rtv mutant]
- The retroactive fly larvae are bloated and show a characteristic disarrangement of chitin fibres in the cuticle
- Except for the orthologous genes from D. pseudoobscura and Anopheles, no homologs are found in the databases
- Understanding chitin-related developmental and metabolic pathways is important for pest control
Slide 32:
- Rtv is membrane-bound and adopts a three-finger neurotoxin fold
- The long fingers carry two exposed aromatic residues each
- These exposed residues are likely to bind chitin at the surface of epidermal cells
B. Moussian, J. Söding, H. Schwarz, and C. Nüsslein-Volhard, Dev Dyn 2005
Slide 35: [Figure: Gas1]
In collaboration with Mart Saarma, Helsinki
Slide 36: Outlook
- Toolkit as an open-source package
- Continuous integration of the best available tools
- Several new tools planned or in development:
  - Clustering of known folds by sequence similarity (Galaxy of folds)
  - Functional subtyping
  - PDB remote homology alert
  - β-barrel membrane protein prediction
  - Repeat detection (database-assisted)
  - Expert system
Slide 37: The Toolkit Team
Andreas Biegert
Michael Remmert
Christian Mayer
Andrei Lupas
Johannes Söding
Many thanks to:
- Tancred Frickey, Markus Gruber, Alex Diemand, and Pavel Szczesny for contributing tools
- Alexander Diemand for system administration and support
- Members of our group for critical feedback
http://toolkit.tuebingen.mpg.de
Slide 40: [Figure: global alignment and local alignment of a query with a database match]
- BLAST and PSI-BLAST use a local alignment method
- HHpred can construct both local and global alignments
- Probabilities and E-values are more reliable for local alignment
- Global alignment mode is useful for making 3D models and for determining structural domain boundaries
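The difference between the two modes can be seen on a toy example with textbook Needleman-Wunsch (global) and Smith-Waterman (local) dynamic programming. This is a sketch with a hypothetical scoring scheme (match +2, mismatch -1, linear gap -2) on plain sequences, not HHpred's HMM-HMM alignment: the local score rewards only the conserved core, while the global score also pays for the unaligned flanks.

```python
def align_score(a: str, b: str, local: bool,
                match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal alignment score: Smith-Waterman if local, else Needleman-Wunsch."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                h = max(h, 0)  # local alignment may restart anywhere
                best = max(best, h)
            H[i][j] = h
    return best if local else H[-1][-1]


# Hypothetical query that matches only the core of a longer database sequence.
print(align_score("WKDF", "PPWKDFPP", local=True),
      align_score("WKDF", "PPWKDFPP", local=False))  # 8 0
```

Because the global score is forced to cover the whole of both sequences, it is the natural choice when the alignment must span complete domains, as for 3D modelling.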