Protein Subcellular Localization - PowerPoint PPT Presentation

Provided by: steph217
1
Protein Subcellular Localization
  • Shan Sundararaj
  • ss23_at_ualberta.ca
  • June 24, 2005

2
Why is Localization Important?
  • Function is dependent on context
  • Co-localization of proteins of related function
  • Valuable annotation for new proteins
  • Design of proteins with specific targets
  • Drug targeting
    - Accessibility: membrane-bound > cytoplasmic > nuclear

3
Bacteria
  • Gram positive (3-4 states): cytoplasm, cytoplasmic
    membrane, cell wall, extracellular
  • Gram negative (5 states): cytoplasm, cytoplasmic
    membrane, periplasm, outer membrane, extracellular

4
Eukaryotic Cell
  • Compartmentalized
  • Diverse range of specific organelles
  • Plants: chloroplasts, chromoplasts, other
    plastids
  • Muscle: sarcoplasm
  • Various endosomes, vesicles

(modified from Voet & Voet, Biochemistry;
Wiley-VCH 1992)
5
Yet more categories
Chloroplast
Mitochondrion
Yeast specific
6
Localization signaling
  • Proteins must have intrinsic signals for their
    localization: a cellular address
  • E.g. N-terminal signal sequences

321 Nuclear Inner Membrane Lane
Nucleus, Intracellular County
Eukaryotic Cell CL34V3M3
7
Localization signaling
  • Some signals are easily recognizable
    - Signal peptidase cleavage site, consensus
      sequence for secretion → extracellular
    - Address printed neatly, postal code
  • Others are difficult to understand
    - Outer membrane β-barrel proteins: no consensus
      sequence, few sequence restraints
    - Sloppy address, a different kind of code that we
      don't understand yet

8
Experimental determination
  • Since we don't fully understand the language of
    proteins, our knowledge must often come from
    inference
  • Predicting localization is like sorting mail
    based only on examples of where some mail has
    gone before
  • Important to have good data sets of proteins with
    known localizations

9
Datasets
  • Organelle DB (http://organelledb.lsi.umich.edu/)
    - 25095 eukaryotic proteins from subcellular
      proteomics studies
  • DBSubLoc (http://www.bioinfo.tsinghua.edu.cn/guotao/download.html)
    - Combines Swiss-Prot and PIR annotations (64051
      proteins)
  • PSORTdb (http://db.psort.org/)
    - Bacterial: 1591 Gram -ve proteins, 574 Gram +ve
      proteins
  • SignalP (http://www.cbs.dtu.dk/ftp/signalp/)
    - 940 plant and 2738 human proteins
  • Yeast Protein Localization Server
    (http://bioinfo.mbb.yale.edu/genome/localize/)
    - 2956 yeast proteins

10
Experimental Methods
  • Electron microscopy
  • GFP tagging / fluorescence microscopy
  • Subcellular fractionation and detection
    - Western blotting
    - Mass spectrometry

11
Fluorescence Microscopy
  • Tag gene at either the 3′ or 5′ end
  • Using GFP (or RFP, YFP, CFP, etc.)
  • Using an epitope tag and a fluorescently labeled
    antibody
  • Be careful not to remove signal peptides!
  • Also use a subcellular-specific marker or stain
  • Visualize with confocal fluorescence microscopy
    and analyze images for co-localization
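
The slide does not say how images are analyzed for co-localization. One common measure is the Pearson correlation between the pixel intensities of the two channels, sketched below. The function name and the pixel values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_colocalization(green, red):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(green) != len(red) or not green:
        raise ValueError("channels must be non-empty and the same length")
    n = len(green)
    mean_g = sum(green) / n
    mean_r = sum(red) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(green, red))
    var_g = sum((g - mean_g) ** 2 for g in green)
    var_r = sum((r - mean_r) ** 2 for r in red)
    if var_g == 0 or var_r == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / sqrt(var_g * var_r)

# Hypothetical flattened pixel intensities: a GFP-tagged protein,
# an RFP marker in the same compartment, and one in a different compartment.
gfp = [10, 200, 220, 15, 180, 12]
rfp_same = [12, 190, 230, 10, 170, 15]
rfp_other = [210, 15, 10, 190, 20, 205]

r_same = pearson_colocalization(gfp, rfp_same)    # close to +1
r_other = pearson_colocalization(gfp, rfp_other)  # close to -1
```

A value near +1 suggests the tagged protein and the marker occupy the same pixels; values near 0 or below suggest they do not.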

12
Confirmation by Co-localization (GFP/RFP merging)
13
High-Throughput Experiments
  • Kumar et al., Genes Dev 2002, 16:707-719
    - Epitope-tagged >60% of ORFs, visualized with
      fluorescently labeled antibody
    - 2744 localizations (44% of S. cerevisiae genes)
  • Huh et al., Nature 2003, 425:686-691
    - GFP-tagged all ORFs, RFP-tagged compartments
    - 4156 localizations (75% of S. cerevisiae genes)
  • Combined, nearly 87% of yeast proteins now have a
    localization annotation

14
Subcellular Fractionation
  • Fractionate cells into organelles and other
    compartments using differential solubilization
    and centrifugation
  • Once fractionated, take compartment of interest
    and separate proteins
    - 2D gel or chromatography
  • Identify separated proteins
    - Mass spectrometry for high-throughput
    - Western blot for specific proteins

15
High-Throughput Experiments
  • Lopez-Campistrous et al., Mol Cell Proteomics,
    2005
    - Subcellular fractionation of E. coli, 2D-gel
      separation, MS/MS
    - 2,160 localizations to cytoplasm, inner membrane,
      periplasm, and outer membrane

16
Predictions from known data
  • Enough experimental data exists to build highly
    accurate computational predictors of localization

17
Computational methods for predicting localization
  • Motif-based methods
  • Expert rule-based methods
  • Artificial Neural Nets (ANN)
  • Hidden Markov Models (HMM)
  • Naïve Bayes (NB)
  • Support Vector Machines (SVM)
  • Combinations of the above methods

18
Predictions from known data
  • Different information used for predictions
  • Sequence motifs
    - N-terminal: secretory signal peptides,
      mitochondrial targeting peptide, chloroplast
      transit peptide
    - C-terminal: peroxisome import signal, ER
      retention signal
    - Mid-sequence: nuclear localization signals
  • Amino acid composition
    - AA frequency, dipeptide composition
  • Homology
    - Sequence comparison to proteins of known
      localization
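
The sequence-derived inputs listed above can be sketched in a few lines of Python. This is a minimal illustration, not the feature code of any tool named in this talk. The function names and the example sequence are hypothetical, and only two textbook C-terminal motifs are checked: KDEL (ER retention) and SKL (peroxisomal import, PTS1).

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Fraction of each overlapping residue pair that occurs in the sequence."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}

# Two well-known C-terminal signals; real predictors use broader patterns.
C_TERMINAL_MOTIFS = {
    "KDEL": "ER retention signal",
    "SKL": "peroxisome import signal (PTS1)",
}

def c_terminal_signals(seq):
    """Labels of the C-terminal motifs that the sequence ends with."""
    seq = seq.upper()
    return [label for motif, label in C_TERMINAL_MOTIFS.items()
            if seq.endswith(motif)]

example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQKDEL"  # hypothetical sequence
composition = aa_composition(example)
dipeptides = dipeptide_composition(example)
signals = c_terminal_signals(example)
```

The composition dictionaries are the kind of fixed-length numeric input an SVM or neural network can consume, while the motif check corresponds to the simpler motif-based methods.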

19
The PSORT Family
  • PSORT: plant sequences
    - Expert rule-based system
  • PSORT II: eukaryotic sequences
    - Probabilistic tree
  • iPSORT: eukaryotic N-terminal signal sequences
    - ANN
  • PSORT-B: bacterial sequences
  • WoLF PSORT: eukaryotic
    - Updated (2005) version of PSORT II

20
PSORT-B (http://www.psort.org/psortb/)
21
PSORT-B - methods
  • Signal peptides: non-cytoplasmic
  • AA composition/patterns
    - SVMs trained for each location vs. all other
      locations
  • Transmembrane helices: inner membrane
    - HMMTOP
  • PROSITE motifs: all localizations
  • Outer membrane motifs: outer membrane
  • Homology to proteins of known localization
    - SCL-BLAST

Integration with a Bayesian network
22
PSORT-B results
  • SeqID: Unannotated_bacterial2
  • Analysis Report
      CMSVM-       Unknown       No details
      CytoSVM-     Cytoplasmic   No details
      ECSVM-       Unknown       No details
      HMMTOP-      Unknown       No internal helices found
      Motif-       Unknown       No motifs found
      OMPMotif-    Unknown       No motifs found
      OMSVM-       Unknown       No details
      PPSVM-       Unknown       No details
      Profile-     Unknown       No matches to profiles found
      SCL-BLAST-   Cytoplasmic   matched 118438 Cyto. protein
      SCL-BLASTe-  Unknown       No matches against database
      Signal-      Unknown       No signal peptide detected
  • Localization Scores
      Cytoplasmic          9.97
      CytoplasmicMembrane  0.01
      Periplasmic          0.01
      OuterMembrane        0.00
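
Turning localization scores like these into a single call can be sketched as follows. The scores are copied from this slide. The function name is hypothetical, and the cutoff of 7.5 is an assumption about the reporting threshold, since the slide does not state one.

```python
# Localization scores as reported on the slide.
scores = {
    "Cytoplasmic": 9.97,
    "CytoplasmicMembrane": 0.01,
    "Periplasmic": 0.01,
    "OuterMembrane": 0.00,
}

def final_prediction(scores, cutoff=7.5):
    """Return the top-scoring site, or "Unknown" if it falls below the cutoff.

    The cutoff value is an assumption made for this sketch.
    """
    site, best = max(scores.items(), key=lambda item: item[1])
    return site if best >= cutoff else "Unknown"

call = final_prediction(scores)
```

Returning "Unknown" when no site clears the cutoff trades coverage for precision: the predictor declines to guess rather than report a weakly supported site.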

23
Proteome Analyst (http://www.cs.ualberta.ca/bioinfo/PA/Sub/)
24
Proteome Analyst Feature Extraction
  • TOP 3 Homologs
    - AFP1_ARATH
    - AFP1_BRANA
    - AFP2_ARATH
  • KW: Plant defense; Fungicide; Signal; Multigene
    family; Pyrrolidone carboxylic acid
  • DR InterPro: IPR002118; IPR003614
  • CC Subcellular location: Secreted
  • Token Set: Plant defense, Fungicide, Signal,
    Multigene family, Pyrrolidone carboxylic acid,
    IPR002118, IPR003614, Secreted
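
Building a token set from homolog annotations can be sketched as a set union over the selected fields. This is only an illustration of the idea, not Proteome Analyst's code. The function name and record layout are hypothetical, and attributing all of the slide's annotations to a single homolog is an assumption made for the example.

```python
def build_token_set(homolog_annotations,
                    fields=("KW", "DR InterPro", "CC Subcellular location")):
    """Union of the annotation values found in the chosen fields of each homolog."""
    tokens = set()
    for annotation in homolog_annotations:
        for field in fields:
            tokens.update(annotation.get(field, []))
    return tokens

# Annotation values taken from the slide; assigning them all to one
# homolog record is an assumption made for illustration.
afp1_arath = {
    "KW": ["Plant defense", "Fungicide", "Signal", "Multigene family",
           "Pyrrolidone carboxylic acid"],
    "DR InterPro": ["IPR002118", "IPR003614"],
    "CC Subcellular location": ["Secreted"],
}

tokens = build_token_set([afp1_arath])
```

Because the result is a set, a token shared by several homologs is counted once.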
25
PASub - Results
[Figure: contribution of each token (feature) to the prediction, shown on a log scale]
26
PASub - Interpretation
  • Bars represent -log probability, so a small
    difference in bar length is a large difference
    in probability!
  • Naïve Bayes chosen as classifier because of
    transparency of method
  • Each token contributes a log probability that can
    be summed and shown graphically
  • Neural network actually has higher recall
  • Can change token set, ask to explain with
    different features
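
The transparency argument can be illustrated with a generic Naïve Bayes sketch in which every token's -log probability is kept separately, so it can be summed per class or plotted as a bar. This is not Proteome Analyst's implementation. The training examples are hypothetical, only tokens present in the query are scored, tokens never seen in training are skipped, and add-one (Laplace) smoothing is assumed.

```python
from collections import Counter, defaultdict
from math import log

def train(examples):
    """examples: list of (token_set, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        class_counts[label] += 1
        for token in tokens:
            token_counts[label][token] += 1
        vocabulary |= set(tokens)
    return class_counts, token_counts, vocabulary

def explain(tokens, model):
    """Per-class dict of -log probability contributions (prior plus each token)."""
    class_counts, token_counts, vocabulary = model
    total = sum(class_counts.values())
    report = {}
    for label, n in class_counts.items():
        contributions = {"(prior)": -log(n / total)}
        for token in tokens:
            if token in vocabulary:
                # P(token present | class) with add-one smoothing.
                p = (token_counts[label][token] + 1) / (n + 2)
                contributions[token] = -log(p)
        report[label] = contributions
    return report

def predict(tokens, model):
    """Pick the class with the smallest summed -log probability."""
    report = explain(tokens, model)
    totals = {label: sum(parts.values()) for label, parts in report.items()}
    return min(totals, key=totals.get), report

# Hypothetical training data, for illustration only.
examples = [
    ({"Signal", "Glycoprotein", "Fungicide"}, "extracellular"),
    ({"Signal", "Plant defense"}, "extracellular"),
    ({"Kinase", "ATP-binding"}, "cytoplasm"),
    ({"Ribosomal protein", "ATP-binding"}, "cytoplasm"),
]
model = train(examples)
label, report = predict({"Signal", "Plant defense", "IPR002118"}, model)
```

Each entry in `report[label]` corresponds to one bar in a per-token plot, and removing a token from the query shows directly how much it contributed to the decision.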

27
Save Time: Pre-computed Genomes
  • PSORTdb
    - http://db.psort.org
    - Browse, search, BLAST, download
    - 103 Gram -ve bacteria, 45 Gram +ve bacteria
  • Proteome Analyst (PA-GOSUB)
    - http://www.cs.ualberta.ca/bioinfo/PA/GOSUB/
    - Browse, search, BLAST, download
    - 15 bacterial and 8 eukaryotic genomes

28
Summary
  • The data set of experimentally validated protein
    localizations is ever increasing, especially with
    high-throughput methods
  • Many localization signals are still unknown,
    except for simple sequence motifs
  • Prediction methods are very accurate, especially
    for bacteria and using machine learning
    techniques, but many motifs and other signals
    have yet to be discovered

29
Future Directions
  • Predict proteins with multiple localization
    sites, or with localization that changes over
    time
  • Integrate structural information into
    localization prediction