Title: Analysis and prediction of protein subcellular localization for Gram-negative bacteria
1Analysis and prediction of protein subcellular
localization for Gram-negative bacteria
- Jennifer Gardy
- Dept. of Molecular Biology & Biochemistry
- Simon Fraser University
- Burnaby, B.C.
2Gram-negative Subcellular Localization
3PSORT-B www.psort.org/psortb
- Web-based subcellular localization prediction tool
- Analyzes 6 biological features using 6 modules
- More comprehensive than existing tools
- Score for each of 5 primary Gram-negative localization sites
- PSORT I does not predict extracellular proteins
- Designed for high precision (specificity): 97%
- PSORT I's specificity measured at 59%
- Trained and tested using a dataset of proteins of experimentally-verified subcellular localization
- Constructed manually through literature review
- Largest dataset of its kind
- Freely available at the PSORT-B site
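The slide uses "precision" and "specificity" for the same figure. A minimal sketch of how such figures are usually computed, assuming precision means TP / (TP + FP) and sensitivity (recall) means TP / (TP + FN); the function names and the counts are hypothetical and are not PSORT-B results:

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of reported predictions that are correct: TP / (TP + FP)."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no predictions were made")
    return true_pos / total


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of proteins at a site that are recovered: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no proteins at this site")
    return true_pos / total


# Hypothetical counts, for illustration only.
print(precision(true_pos=90, false_pos=10))
print(recall(true_pos=90, false_neg=60))
```

A tool tuned this way declines to predict when evidence is weak, which raises precision at the cost of recall.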
4PSORT-B 101
Feature analyzed: localization(s) predicted
- Signal peptides: Non-cytoplasmic
- Amino acid composition: Cytoplasmic
- Transmembrane helices: Inner membrane
- PROSITE motifs: All localizations
- Outer membrane motifs: Outer membrane
- Homology to proteins of known localization: All localizations
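The feature-to-localization pairs on this slide can be written down as a small lookup table. This is only an illustrative sketch: the names `SITES`, `MODULE_TARGETS` and `modules_for` are hypothetical, the five site names are the usual Gram-negative compartments, and "non-cytoplasmic" is assumed to mean the four sites other than the cytoplasm.

```python
from __future__ import annotations

# The five primary Gram-negative localization sites (names assumed).
SITES = (
    "cytoplasmic",
    "inner membrane",
    "periplasmic",
    "outer membrane",
    "extracellular",
)

# Feature analyzed -> sites it can provide evidence for (from the slide).
MODULE_TARGETS = {
    # Assumption: "non-cytoplasmic" expands to every site except the cytoplasm.
    "signal peptides": tuple(s for s in SITES if s != "cytoplasmic"),
    "amino acid composition": ("cytoplasmic",),
    "transmembrane helices": ("inner membrane",),
    "PROSITE motifs": SITES,
    "outer membrane motifs": ("outer membrane",),
    "homology to proteins of known localization": SITES,
}


def modules_for(site: str) -> list[str]:
    """List the features that can contribute evidence for a given site."""
    if site not in SITES:
        raise ValueError(f"unknown site: {site}")
    return [name for name, targets in MODULE_TARGETS.items() if site in targets]


print(modules_for("periplasmic"))
```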
5Understanding The Results
- Output available in several formats
- Predictions for NCBI genomes have been pre-computed
- www.psort.org/psortb/genomes
6Current Limitations
- Sensitivity is not emphasized in this version
- You will not always get a prediction
- Examine your results carefully!
- Proteins at multiple localization sites
- Flagged in comments, score distribution
- Certain classes difficult to identify
- Inner membrane with 1-2 helices
- Extracellular
- Lipoproteins not identified in this version
- Trained primarily on proteobacteria
- Reduced predictive ability for bacteria with atypical cell walls
- Gram-negative bacteria only
- Use PSORT I for Gram-positives
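Two of the points above (no prediction in some cases, and multiple localization visible in the score distribution) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `summarize`, the 0-10 score scale, and both thresholds are hypothetical and are not taken from the slides.

```python
from __future__ import annotations


def summarize(
    scores: dict[str, float],
    cutoff: float = 7.5,     # hypothetical: minimum score to report a site
    runner_up: float = 2.5,  # hypothetical: second-best score worth flagging
) -> tuple[str, str]:
    """Return (prediction, comment) from a per-site score distribution."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two sites")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_site, best), (second_site, second) = ranked[0], ranked[1]

    # Favor precision: report nothing rather than a weak guess.
    prediction = best_site if best >= cutoff else "Unknown"

    # A strong second site hints at a protein found in more than one place.
    comment = ""
    if second >= runner_up:
        comment = f"possible multiple localization: {best_site} / {second_site}"
    return prediction, comment


example = {
    "cytoplasmic": 0.1,
    "inner membrane": 0.2,
    "periplasmic": 5.0,
    "outer membrane": 4.5,
    "extracellular": 0.2,
}
print(summarize(example))
```

This is why the full score distribution is worth reading even when the final call is "Unknown".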
7Insights Gained To Date
- Localization is an evolutionarily conserved trait
- SCL-BLAST (specificity of 96.7%, sensitivity of 60.4%)
- E-value cutoff of 10e-10
- Length restriction: HSP must be 80-120% of the length of the query, to avoid matches to single domains
- Identified motifs characteristic of outer membrane proteins through a data mining approach
- 279 sequences frequent in OMPs, infrequent in non-OMPs
- Typically 6 residues long, may occur in combinations
- Used in OMP classifier
- PSORT-B v.1.1 (3 motifs): spec. 100%, sens. 24%
- SVM approach (for next version): spec. 98%, sens. 81%
- Motifs map primarily to periplasmic turn regions of known 3D structures
- May reflect importance of periplasmic turns in a transmembrane beta-barrel structure vs. other similar non-membrane barrel structures
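The two SCL-BLAST criteria above (E-value cutoff and HSP length relative to the query) can be sketched as a simple hit filter. This is a minimal illustration, not the PSORT-B implementation: the `Hit` record, the function names, the choice of "less than or equal" for the cutoff, and picking the lowest E-value among surviving hits are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

E_VALUE_CUTOFF = 10e-10  # as written on the slide; Python evaluates this to 1e-9
MIN_LENGTH_RATIO = 0.80  # HSP must cover at least 80% of the query length
MAX_LENGTH_RATIO = 1.20  # and at most 120% of it


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLAST hit against the localization dataset."""
    subject_id: str
    localization: str  # known localization of the matched protein
    e_value: float
    hsp_length: int
    query_length: int


def passes_filter(hit: Hit) -> bool:
    """Keep only significant, near-full-length matches."""
    if hit.query_length <= 0:
        raise ValueError("query_length must be positive")
    ratio = hit.hsp_length / hit.query_length
    return (
        hit.e_value <= E_VALUE_CUTOFF
        and MIN_LENGTH_RATIO <= ratio <= MAX_LENGTH_RATIO
    )


def best_localization(hits: Iterable[Hit]) -> Optional[str]:
    """Localization of the most significant surviving hit, or None."""
    kept = [h for h in hits if passes_filter(h)]
    if not kept:
        return None
    return min(kept, key=lambda h: h.e_value).localization


hits = [
    Hit("a", "outer membrane", 1e-50, 300, 310),  # passes both criteria
    Hit("b", "cytoplasmic", 1e-80, 90, 310),      # single-domain match, too short
    Hit("c", "periplasmic", 1e-3, 300, 310),      # not significant enough
]
print(best_localization(hits))
```

The length check is what rejects hit "b": it has the best E-value, but it aligns to only a fraction of the query, as a shared single domain would.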
8Future Directions
- Development of a Gram-positive version
- Dataset being constructed (Dr. S. Rey, SFU)
- Increasing existing Gram-negative dataset
- Literature review, text mining
- Improvement of Gram-negative modules
- New computational techniques
- New biological information
9Still Curious?
- www.psort.org/psortb
- Documentation, User's Guide, datasets & motifs used in the program, many other subcellular localization prediction resources
- PSORT-B is described in
- J.L. Gardy et al. (2003). PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Research 31(13):3613-17
- The data mining approach to OMP motif discovery is described in
- She, R., Chen, F., Wang, K., Ester, M., Gardy, J.L., and F.S.L. Brinkman (2003). Frequent-Subsequence-Based Prediction of Outer Membrane Proteins. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
10Acknowledgements
Initial PSORT-B Development: Fiona S.L. Brinkman, Cory Spencer, Martin Ester, Ke Wang, Gábor E. Tusnády, István Simon, Katalin deFays, Christophe Lambert, Sujun Hua, Kenta Nakai
Ongoing PSORT-B Work: Sébastien Rey (SFU), Matt Laird (SFU)
Also: Bob Hancock (UBC), Oliver Schulte (SFU), Søren Brunak (CBS), Gunnar von Heijne (Stockholm)
Funding: Natural Sciences and Engineering Research Council
www.pathogenomics.sfu.ca/brinkman