Title: Bioinformatics how to
1. Bioinformatics how to
- Use publicly available free tools to predict protein structure by comparative modeling
2. Proteins are 3D objects with complex shapes
- Over 60,000 protein structures have been determined, mostly by X-ray crystallography (PDB)
- The 3D structure of 70% of bacterial and 50% of human proteins can be predicted (comparative modeling)
3. A predicted model simply illustrates our assumptions
No assumptions, this is nature telling us how it is
GNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPA QNTAHLDQFERIKTL
GTGSFGRVMLVKHKETGNH FAMKILDKQKVVKLKQIEHTLNEKRILQAV
NFPF LVKLEYSFKDNSNLYMVMEYVPGGEMFSHLRRIG RFSEPHARFY
AAQIVLTFEYLHSLDLIYRDLKPE NLLIDQQGYIQVTDFGFAKRVKGRT
WTLCGTPEY LAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPF FADQP
IQIYEKIVSGKVRFPSHFSSDLKDLLRNL LQVDLTKRFGNLKDGVNDIK
NHKWFATTDWIAIY QRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSIN
EKCGKEFSEF
Assumption (protein A is similar to protein B)
Result (protein A is similar to protein B)
Sequence
4. How do we know that these proteins are similar?
- Well-studied protein
  - SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAA
- Unknown protein
  - GLLTTKFVSLLQEAKDGVLDLKLAADTLAVRQKRRIYDITNVLEGIGLIEKKSKNSIQW
similarity
prediction
5. How can we make such assumptions?
- Statistical reliability of the prediction
  - E-value: the number of hits one can "expect" to see just by chance when searching a database of a particular size (the closer to zero, the better)
  - Z-score: the score expressed as its distance from the mean, measured in standard deviations (the bigger, the better)
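The two statistics above can be illustrated with a minimal sketch. It assumes the background distribution is given as a plain list of scores (for example, scores against shuffled or unrelated sequences) and uses the population standard deviation. The function names are hypothetical. The conversion from E-value to p-value assumes chance hits follow a Poisson distribution, as in BLAST statistics.

```python
import math
import statistics


def z_score(score, background_scores):
    """Distance of `score` from the background mean, in standard deviations."""
    mean = statistics.mean(background_scores)
    sd = statistics.pstdev(background_scores)  # assumption: population SD
    if sd == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean) / sd


def p_value_from_e_value(e_value):
    """Probability of at least one chance hit, assuming Poisson-distributed hits.

    P = 1 - exp(-E); for small E the two numbers are nearly equal.
    """
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(z_score(30, [10, 12, 8, 10]))
    print(p_value_from_e_value(0.001))
```

A large Z-score and an E-value close to zero point the same way: the match is unlikely to be a chance event.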
6. Similar, but not homologous
- Phosphoribosyltransferase and viral coat protein: identity 42%, different folds, different functions
   99 IRLKSYCNDQSTGDIKVIGGDDLSTLTGKNVLIVEDIIDTGKTMQTLLSLVRQY.NPKMVKVASLLVKRTPRSVGY 173
  214 VPLKTDANDQ.IGDSLY....SAMTVDDFGVLAVRVVNDHNPTKVT..SKVRIYMKPKHVRV...WCPRPPRAVPY 279
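The identity figure quoted on this slide can be checked with a short sketch. It assumes that "." marks a gap and that identity is counted only over columns where both rows carry a residue. Other definitions (for example, dividing by the full alignment length) give somewhat different numbers. The function and variable names are hypothetical.

```python
def percent_identity(row_a, row_b, gap_chars=".-"):
    """Percent of identical residues over columns with no gap in either row."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a in gap_chars or b in gap_chars:
            continue  # skip columns that contain a gap
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / aligned


# The two alignment rows from the slide (residues 99-173 and 214-279).
ROW_99_173 = (
    "IRLKSYCNDQSTGDIKVIGGDDLSTLTGKNVLIVEDIIDTGKTMQT"
    "LLSLVRQY.NPKMVKVASLLVKRTPRSVGY"
)
ROW_214_279 = (
    "VPLKTDANDQ.IGDSLY....SAMTVDDFGVLAVRVVNDHNPTKVT"
    "..SKVRIYMKPKHVRV...WCPRPPRAVPY"
)

if __name__ == "__main__":
    print(round(percent_identity(ROW_99_173, ROW_214_279)))
```

Under these assumptions the result is close to the 42 quoted on the slide, which underlines the point: a high identity over a local alignment does not by itself establish homology.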
7. Different, but homologous
- Histone H5 and transcription factor E2F4: identity 7%, similar fold, similar function (DNA binding)
  PTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAAGVLKQTKGVGASGSFRL
  GLLTTKFVSLLQEAKD-GVLDLKLAADTLA------VRQKRRIYDITNVLEGIGLIEKKS----KNSIQW
8. Steps in comparative modeling
- Recognition: Are there any well-characterized proteins similar to my protein?
- Alignment: What is the position-by-position target/template equivalence?
- Modeling: What is the detailed 3D structure of my protein?
- Model analysis: Is my model any good?
9. Recognition
- BLAST, PSI-BLAST or PFAM, FFAS, metaserver (bioinfo)
- Name (PDB code) of the template
- Statistical significance of the match (Z-score, E-value, p-value, points)
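As one way to picture the recognition step, the sketch below reads BLAST tabular output (the standard 12-column layout produced by `-outfmt 6`) and picks the hit with the lowest E-value as the template. The sample lines, the identifiers in them, and the 1e-3 cutoff are made up for illustration. The helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    e_value: float
    bit_score: float


def parse_tabular_hits(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST output, skipping comments and blanks."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard hit line
        hits.append(
            Hit(fields[1], float(fields[2]), float(fields[10]), float(fields[11]))
        )
    return hits


def best_template(hits: List[Hit], max_e_value: float = 1e-3) -> Optional[Hit]:
    """Return the most significant hit, or None if nothing passes the cutoff."""
    significant = [h for h in hits if h.e_value <= max_e_value]
    return min(significant, key=lambda h: h.e_value, default=None)


# Made-up example rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
EXAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ("query", "template_A", "35.2", "210", "130", "3", "1", "208", "5", "214", "2e-30", "118"),
        ("query", "template_B", "28.1", "150", "100", "4", "20", "165", "1", "150", "4e-05", "45.1"),
        ("query", "template_C", "24.0", "90", "65", "2", "60", "148", "10", "99", "3.1", "27.3"),
    ]
)

if __name__ == "__main__":
    print(best_template(parse_tabular_hits(EXAMPLE)))
```

The cutoff is a judgment call; more sensitive tools such as PSI-BLAST or FFAS report their own scores and would need their own parsing.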
10. Alignment
- The same tools as in recognition (perhaps with different parameters), editing by hand
- Position-by-position equivalence table
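A position-by-position equivalence table can be derived from a gapped pairwise alignment. The sketch below assumes both rows have equal length, that "-" or "." marks a gap, and that residue numbering starts at the given offsets. The function name and the toy alignment are hypothetical.

```python
def equivalence_table(target_row, template_row,
                      target_start=1, template_start=1, gap_chars="-."):
    """List (target_pos, target_res, template_pos, template_res) for aligned columns."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    table = []
    t_pos, s_pos = target_start, template_start
    for t_res, s_res in zip(target_row, template_row):
        t_gap = t_res in gap_chars
        s_gap = s_res in gap_chars
        if not t_gap and not s_gap:
            table.append((t_pos, t_res, s_pos, s_res))
        # Advance each counter only when its row holds a real residue.
        if not t_gap:
            t_pos += 1
        if not s_gap:
            s_pos += 1
    return table


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    for row in equivalence_table("AC-DE", "A-GDE"):
        print(row)
```

Target residues that fall opposite a gap get no entry in the table; these are the regions a modeling program has to build without template coordinates.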
11. Modeling
- Commercial programs
  - Accelrys (Insight)
  - Tripos (Sybyl)
- Freeware/shareware/servers
  - Modeller (Andrej Sali)
  - Jackal (Barry Honig)
  - SCWRL (Roland Dunbrack)
  - SwissModel
12. Model quality
- Empirical energy-based tools
  - PSQS (http://www1.jcsg.org/psqs/psqs.cgi)
  - SwissPDB viewer
- Geometric quality
  - Procheck, SFCHECK, etc. (http://www.jcsg.org/scripts/prod/validation/sv3.cgi)
13. Expectations of comparative modeling
- Easy (100-40% sequence identity): strong sequence similarity, strong structure similarity, obvious function analogy
- Difficult (40-25% sequence identity): twilight zone of sequence similarity, increasing structure divergence, function diversification
- Fold prediction (below 25% sequence identity): no apparent sequence similarity, extreme function divergence
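The three regimes above can be written as a simple lookup. This is only a sketch: the slide does not say which regime the boundary values 40 and 25 belong to, so assigning them to the upper regime is an assumption, and the function name is hypothetical.

```python
def modeling_regime(percent_identity):
    """Map target/template sequence identity (in %) to the expected difficulty."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 40:   # assumption: 40 counts as "easy"
        return "easy"
    if percent_identity >= 25:   # assumption: 25 counts as "difficult"
        return "difficult"
    return "fold prediction"


if __name__ == "__main__":
    for pid in (85, 30, 7):
        print(pid, modeling_regime(pid))
```

Identity alone is a rough guide. The earlier slides show a pair at 42 with different folds and a pair at 7 that is homologous, so the statistical significance of the match still has to be checked.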
14. Challenges of comparative modeling
(Figure: scale from 100 down to 20; figure content not recoverable)
15. Hands-on Activity
- Click below for a hands-on bioinformatics how-to activity
- Go to http://bioinformatics.burnham.org/
- Click the "Structure Biology Course - Protein Modeling Tutorial" link on the homepage
- Or go to http://bioinformatics.burnham.org/SSBC/modeling.html