Title: Slide 1
1 Transmembrane protein annotation
Rita Casadio
BIOCOMPUTING GROUP, Interdepartmental Centre for
Biotechnological Research/Department of
Biology, University of Bologna, Italy
2 The omic era
Genome Sequencing Projects
Archaea
26 species
Bacteria
286 species
Complete - 21 Assembly - 86 In Progress - 171
Eukaryotic
http://www.ncbi.nlm.nih.gov/
Update March 2006
3 The Data Bases of Biological Sequences and Structures
GenBank: 54,584,635 sequences; 59,750,386,305 nucleotides
>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
SKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYH
WPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDE
YSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGI
KSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITR
GNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVS
LAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPY
YLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNT
KRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH
NR: 2,638,494 sequences; 847,653,699 residues
SwissProt: 211,104 sequences; 77,361,893 residues
PDB: 35,460 structures
Membrane proteins: <1%
Update March 2006
4 Why membrane proteins? So many important functions.
5 Different architectures
Outer Membrane proteins (all β-Transmembrane proteins)
Inner Membrane proteins (all α-Transmembrane proteins)
6 Membrane classification and membrane protein structures
Structural Type of M proteins
All-β
All-α
7 Outer Membrane
Inner Membrane
β-barrel
α-helices
Bilayer
Bacteriorhodopsin (Halobacterium salinarum)
Porin (Rhodobacter capsulatus)
8 Functional annotation in silico by homology search
ADH1_SULSO ----------MRAVRLVEIGKP--LSLQEIGVPKPKGPQVLIKVEAAGVCHSDVHMRQGRFGNLRIVE
ADH_CLOBE  ----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------
ADH_THEBR  ----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------
ADH1_SOLTU MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
ADH2_LYCES MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
ADH1_ASPFL ----MSIPEMQWAQVAEQKGGP--LIYKQIPVPKPGPDEILVKVRYSGVCHTDLHALKGDW-------
Sequence comparison is performed with alignment
programs
Sequence identity ≥ 30%
Similar function
Methods for similarity searches
BLAST, Psi-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/)
Altschul et al. (1990) J Mol Biol 215:403-410
Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402
Pfam (http://pfam.wustl.edu/hmmsearch.shtml)
Bateman et al. (2000) Nucleic Acids Research 28:263-266
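The identity criterion on this slide can be made concrete with a short calculation. The sketch below computes percent identity between two rows of the alignment shown above and compares it with the threshold of 30 quoted on the slide, read here as a percentage. The function name and the choice to count only columns where both sequences have a residue are assumptions; alignment programs differ in how they define the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Two rows taken from the alignment on this slide.
adh_clobe = "----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------"
adh_thebr = "----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------"

identity = percent_identity(adh_clobe, adh_thebr)
print(f"ADH_CLOBE vs ADH_THEBR: {identity:.1f}% identity")
print("similar function likely" if identity >= 30.0 else "below the identity threshold")
```

In practice the alignment itself would come from a program such as BLAST; this helper only scores an alignment that already exists.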
9 Our strategy: Annotation by predicting membrane protein topology
10 Predictors of the Topology of Membrane Proteins
11 Tools from machine learning approaches: Neural Networks (NNs) and/or Hidden Markov Models (HMMs)
Testing
Training
NN HMM
General rules
Prediction
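As a rough illustration of how an HMM turns a sequence into a per-residue prediction, the sketch below decodes a two-state model (membrane helix vs. loop) with the Viterbi algorithm. The states, the hydrophobic residue set, and all probabilities are hand-picked assumptions for illustration; they are not the trained parameters of any predictor named in these slides.

```python
import math

STATES = ("M", "L")  # M = membrane helix, L = loop (assumed two-state model)
HYDROPHOBIC = set("AILMFVWC")  # assumed hydrophobic alphabet

# Toy parameters chosen by hand, not trained values.
START = {"M": 0.1, "L": 0.9}
TRANS = {"M": {"M": 0.95, "L": 0.05}, "L": {"M": 0.05, "L": 0.95}}
P_HYDROPHOBIC = {"M": 0.8, "L": 0.3}  # P(hydrophobic residue | state)


def emission(state: str, residue: str) -> float:
    p = P_HYDROPHOBIC[state]
    return p if residue in HYDROPHOBIC else 1.0 - p


def viterbi(sequence: str) -> str:
    """Return the most probable state path, one label per residue."""
    if not sequence:
        return ""
    score = {s: math.log(START[s]) + math.log(emission(s, sequence[0])) for s in STATES}
    backpointers = []
    for residue in sequence[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state for entering state s at this position.
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(emission(s, residue))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


toy_sequence = "DDKKDDKKDD" + "LLIVALLVIALLIVALLVIA" + "KKDDKKDDKK"
print(toy_sequence)
print(viterbi(toy_sequence))
```

Training would estimate the start, transition, and emission probabilities from proteins of known structure; testing would run the decoder on proteins held out from training.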
12 Annotation/prediction of all-alpha transmembrane proteins
13 Biosapiens network of excellence: Annotation of all-alpha membrane proteins
14 Our starting Data Base
UniProt (September 22, 2004): 33,135 unique human proteins
http://www.ebi.ac.uk/integr8/FtpSearch.do?orgProteomeID=25
In UniProt:
4002 sequences are annotated as Transmembrane (12%)
8897 sequences are annotated as Hypothetical (27%)
15 Methods
Prediction of the signal peptide
SPepLip. Method: NNs. Input: single sequence.
Fariselli P, Finocchiaro G, Casadio R (2003) Bioinformatics 19:2498-2499
Prediction of transmembrane α-helices
MEMSAT. Method: dynamic programming. Input: sequence profiles.
Jones DT, Taylor WR, Thornton JM (1994) Biochemistry 33:3038-3049
TMHMM2.0. Method: HMMs. Input: single sequence.
Krogh A, Larsson B, von Heijne G, Sonnhammer ELL (2001) JMB 305:567-580
ENSEMBLE 1.0. Method: NNs and HMMs. Input: sequence profiles.
Martelli PL, Fariselli P, Casadio R (2003) Bioinformatics 19:I205-I211
ENSEMBLE 1.0 FILTER. Filtering procedure for reducing false positives.
Martelli PL, Fariselli P, Casadio R (2003) Bioinformatics 19:I205-I211
New methods
TMHMMdomfix. Method: HMMs. Input: single profiles, SMART domains.
Bernsel A, von Heijne G (2005) Prot Sci 14:1723-1728
PRODIV_TMHMM. Method: HMMs. Input: sequence profiles.
Viklund H, Elofsson A (2004) Prot Sci 13:1908-1917
In progress.
ENSEMBLE 2.0. Alternative topology assignment. 2 preliminary versions.
16 Performance of the high scoring methods on the 121 high-resolution chains (from PDB)
Correct Topography: correct position of TM helices along the sequence
Correct Topology: correct Topography AND correct orientation with respect to the membrane plane
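The two evaluation criteria on this slide can be written as simple checks. The sketch below is one possible formalization: helix positions count as correct when the predicted and observed helix counts agree and each pair overlaps by a minimum number of residues, and the full assignment is correct when the orientation also matches. The overlap threshold of 9 residues, the segment representation, and the function names are assumptions, since the slide does not state the exact criteria used.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue positions of one TM helix


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def helix_positions_correct(predicted: List[Segment], observed: List[Segment],
                            min_overlap: int = 9) -> bool:
    """Same number of helices, each overlapping its counterpart (assumed criterion)."""
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(overlap(p, o) >= min_overlap for p, o in pairs)


def orientation_also_correct(predicted: List[Segment], observed: List[Segment],
                             predicted_nterm_inside: bool, observed_nterm_inside: bool,
                             min_overlap: int = 9) -> bool:
    """Helix positions correct AND same side of the membrane for the N-terminus."""
    return (helix_positions_correct(predicted, observed, min_overlap)
            and predicted_nterm_inside == observed_nterm_inside)


observed = [(10, 30), (45, 65)]
predicted = [(12, 31), (44, 62)]
print(helix_positions_correct(predicted, observed))
print(orientation_also_correct(predicted, observed, True, False))
```

With these toy segments the helix positions agree, but the orientation check fails because the predicted N-terminus lies on the opposite side of the membrane.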
17 A new annotation server: http://pongo.biocomp.unibo.it/pongo
18 A new annotation server: http://pongo.biocomp.unibo.it/pongo
For retrieving results stored in the data base..
19 A new annotation server: http://pongo.biocomp.unibo.it/pongo
..and for predicting new sequences
22 A new annotation server: http://pongo.biocomp.unibo.it/pongo
23 TM Protein Annotation of the human genome: Annotation of the UniProt data base (33,135 sequences)
24 Out of 33,135 unique sequences of the Ensemble35a1:
- 19.5% of the sequences are predicted as membrane proteins by TMHMM2.0 (single-sequence based) (19% are predicted by all three predictors)
- Prodiv: 17% are predicted by all predictors..
- 33.5% and 32.2% are predicted as membrane proteins by MEMSAT and ENSEMBLE, respectively (25% are predicted by both predictors).
- These results set the lower and upper bounds for the membrane protein content of the human genome and provide a list of putative membrane proteins for further applications.
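The idea of bracketing the membrane protein content with several predictors can be sketched with set operations. In the example below, the proteins flagged by every predictor give a conservative estimate and the proteins flagged by at least one predictor give a permissive estimate. The protein identifiers and the per-predictor sets are invented toy data, and the slides do not state that the bounds were computed exactly this way.

```python
from typing import Dict, Set


def consensus_bounds(predictions: Dict[str, Set[str]], total: int) -> Dict[str, float]:
    """Percent of proteins flagged by all predictors and by at least one predictor."""
    if not predictions or total <= 0:
        raise ValueError("need at least one predictor and a positive total")
    sets = list(predictions.values())
    flagged_by_all = set.intersection(*sets)
    flagged_by_any = set.union(*sets)
    return {
        "all_predictors_pct": 100.0 * len(flagged_by_all) / total,
        "any_predictor_pct": 100.0 * len(flagged_by_any) / total,
    }


# Hypothetical predictions on a toy proteome of 10 proteins (P1..P10).
toy_predictions = {
    "TMHMM2.0": {"P1", "P2"},
    "MEMSAT": {"P1", "P2", "P3", "P4"},
    "ENSEMBLE": {"P1", "P3", "P4"},
}

bounds = consensus_bounds(toy_predictions, total=10)
print(bounds)
```

The same function applies unchanged to real predictor output once each predictor's positive set is keyed by a shared protein identifier.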
25 Distribution of the number of predicted α-helices
26 Distribution of predicted TM proteins among the chromosomes
28 Annotation/prediction of all-beta transmembrane proteins
29 Prediction Server Page of The Biocomputing Group
http://gpcr.biocomp.unibo.it/
30 Trample
http://gpcr.biocomp.unibo.it/biodec
31 Strand
Helix
SignalPep
Fariselli et al. NAR 33, 2005
32 TRAMPLE www.biocomp.unibo.it
33 Performance of HMM-B2TMR on 21 high-resolution TM beta barrel proteins compared to other predictors
34 A brand new prediction with Trample
Omp32 anion-selective porin Delftia
acidovorans, 5 Å (2FGR) and 1.45 Å (2FGQ)
Zachariae et al. (2006)
35 Rate of false positives for ENSEMBLE (all-alpha) and B2-TMR (all-beta)
The predictors are tested on 809 globular proteins with sequence identity ≤ 25%:
0.5% have at least 1 α-TM helix predicted
5.6% have at least 2 β-TM strands predicted
36 A software system for genome annotation
37 PROTEOME HUNTER
[Flowchart nodes: Signal peptide (Yes/No), All-α TM (two nodes), All-β TM]
38 Escherichia coli K12, complete genome
Completed: Oct 13, 1998. Total bases: 4,639,221 bp
NCBI (www.ncbi.nlm.nih.gov): protein coding genes 4,289; structural RNAs 115
EcoGene/EcoProt (bmb.med.miami.edu/EcoGene): protein coding genes 4,173; structural RNAs 120
39 Classification of the non-annotated proteins: 1253 NON-ANNOTATED PROTEINS
40 Experimental validation on 8 new outer membrane proteins (Protein Science, March 2006/von Heijne-Casadio Labs)
Outer Membrane Fraction
41 Predicting globular, inner and outer membrane proteins in genomes of Gram-negative bacteria with Hunter
- The number of new proteins predicted in the class with Hunter, out of the non-annotated region
- Lists available at www.biocomp.unibo.it
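One possible reading of the Hunter flowchart labels (signal peptide, all-alpha TM, all-beta TM) is a cascade of filters that sorts each protein into globular, inner membrane, or outer membrane. The sketch below encodes that reading. The branch order, the function names, and the toy stand-in predictors are all assumptions; the real system would call the signal peptide and transmembrane predictors described in the earlier slides.

```python
from typing import Callable

Predictor = Callable[[str], bool]  # takes a protein sequence, returns yes/no


def classify_protein(sequence: str,
                     has_signal_peptide: Predictor,
                     is_all_alpha_tm: Predictor,
                     is_all_beta_tm: Predictor) -> str:
    """Assumed cascade: signal peptide first, then all-alpha, then all-beta."""
    if has_signal_peptide(sequence):
        if is_all_alpha_tm(sequence):
            return "inner membrane"
        if is_all_beta_tm(sequence):
            return "outer membrane"
        return "globular"
    # Without a signal peptide only the all-alpha check is applied (assumption).
    return "inner membrane" if is_all_alpha_tm(sequence) else "globular"


# Toy stand-ins for the real predictors, for demonstration only.
def toy_signal_peptide(sequence: str) -> bool:
    return sequence.startswith("MKK")


def toy_alpha_tm(sequence: str) -> bool:
    return "L" * 20 in sequence


def toy_beta_tm(sequence: str) -> bool:
    return sequence.endswith("F")


for toy in ("MKK" + "L" * 20, "MKKAAYTVF", "MKKAAA", "AAAF"):
    label = classify_protein(toy, toy_signal_peptide, toy_alpha_tm, toy_beta_tm)
    print(f"{toy[:12]:<12} -> {label}")
```

Passing the predictors in as callables keeps the cascade independent of any particular tool, so individual predictors can be swapped without touching the decision logic.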
42 PROTEOME MANHUNTER
Human genome annotation
[Flowchart nodes: Subcellular Localization/SPEP (Yes/No), All-α TM (two nodes), All-β TM (two nodes)]
43 Some preliminary results: Distribution of the different protein structures among the chromosomes in Homo sapiens
44 3D structure prediction of proteins
New folds
Existing folds
Membrane proteins
Building by homology
Ab initio prediction
Threading/fold recognition
[Figure axis: Homology (%), 0 to 100]
45 - On the basis of predicted TM topology it may be possible to select a template for 3D structure prediction, even when sequence identity is <30%
- Modeling the 3D structure of all-alpha membrane proteins
- Modeling the 3D structure of eukaryotic β-barrel proteins (VDAC) on prokaryotic porins
46 Some examples..
A VDAC in Neurospora
A carrier in mitochondria
Casadio et al., FEBS Lett (2002)
A VDAC in Drosophila
OGC_BOVIN (20%, 1OKC)
Morozzo Della Rocca et al., J Mol Biol (2005)
pori_drome (15%, 2OMF)
Aiello et al., JBC (2004)
47 The Biocomputing Group of the University of Bologna
Remo
Piero
Gianluca
Emidio
Pier Luigi
Ludovica
Paola
Ivan
Rita
Lisa
Alberto