Shan Sundararaj - PowerPoint PPT Presentation

1
Protein Subcellular Localization
  • Shan Sundararaj
  • University of Alberta
  • Edmonton, AB
  • ss23_at_ualberta.ca

2
Why is Localization Important?
  • Function is dependent on context
  • Co-localization of proteins of related function
  • Valuable annotation for new proteins
  • Design of proteins with specific targets
  • Drug targeting
  • Accessibility
  • Membrane-bound > cytoplasmic > nuclear

3
Why is Localization Important?
  • 1974 Nobel Prize in Physiology/Medicine
  • George Palade
  • for discoveries concerning the structural and
    functional organization of the cell
  • 1999 Nobel Prize in Physiology/Medicine
  • Günter Blobel
  • for the discovery that proteins have intrinsic
    signals that govern their transport and
    localization in the cell

4
Bacteria
  • Gram Positive (3-4 states)
  • cytoplasm, cytoplasmic membrane, cell wall, extracellular
  • Gram Negative (5 states)
  • cytoplasm, cytoplasmic membrane, periplasm, outer membrane,
    extracellular

5
Eukaryotic Cell
  • Compartmentalized
  • Diverse range of specific organelles
  • Plants: chloroplasts, chromoplasts, other
    plastids
  • Muscle: sarcoplasm
  • Various endosomes, vesicles

(modified from Voet & Voet, Biochemistry,
Wiley-VCH 1992)
6
Yet more categories
Chloroplast
Mitochondrion
Yeast specific
7
Level of Annotation
  • As simple as two states
  • membrane protein vs. non-membrane protein
  • secreted protein vs. non-secreted protein
  • Gross compartments
  • cytoplasm, inner membrane, periplasm, cell wall,
    outer membrane, extracellular
  • nucleus, mitochondria, peroxisome, vacuole
  • Fine compartments
  • Mitochondrial matrix, bud neck, spindle pole
  • Any of 1425 GO cellular compartments

8
Localization signaling
  • Proteins must have intrinsic signals for their
    localization: a cellular address
  • E.g. N-terminal signal sequences

321 Nuclear Inner Membrane Lane
Nucleus, Intracellular County
Eukaryotic Cell CL34V3M3
9
Localization signaling
  • Some signals are easily recognizable
  • Signal peptidase cleavage site, consensus
    sequence for secretion → extracellular
  • Address printed neatly, postal code
  • Others are difficult to understand
  • Outer membrane β-barrel proteins, no consensus
    sequence, few sequence restraints
  • Sloppy address, different kind of code that we
    don't understand yet

10
Experimental determination
  • Since we don't fully understand the language of
    proteins, our knowledge must often come from
    inference
  • Predicting localization is like sorting mail
    based only on examples of where some mail has
    gone before
  • Important to have good data sets of proteins with
    known localizations

11
Datasets
  • Organelle_DB (http://organelledb.lsi.umich.edu/)
  • 25095 eukaryotic proteins from subcellular
    proteomics studies
  • DBSubLoc (http://www.bioinfo.tsinghua.edu.cn/guotao/download.html)
  • Combines SwissProt and PIR annotations (64051
    proteins)
  • PSORTDB (http://db.psort.org/)
  • Bacterial. 1591 Gram -ve proteins, 574 Gram +ve
    proteins
  • SignalP (http://www.cbs.dtu.dk/ftp/signalp/)
  • 940 plant and 2738 human proteins
  • YPL (http://bioinfo.mbb.yale.edu/genome/localize/)
  • 2956 yeast proteins

12
Experimental Methods
  • Electron microscopy
  • GFP tagging / fluorescence microscopy
  • Subcellular fractionation + detection
  • Western blotting
  • Mass spectrometry

13
Electron Microscopy
  • Highest resolution, can work at the level of a
    single protein complex
  • Immunolabel proteins of interest in conjunction
    with colloidal gold, and visualize
  • Combined with electron tomography, can even
    visualize unlabeled complexes

(from Koster and Klumperman, Nat Rev Mol Cell
Biol, Sep 2003, S6-10)
14
Fluorescence Microscopy
  • Tag gene at either 3' or 5' end
  • Using GFP (or RFP, YFP, CFP, etc.)
  • Using an epitope tag and a fluorescently labeled
    antibody
  • Careful of removing signal peptides!
  • Also use a subcellular-specific marker or stain
  • Visualize with confocal fluorescence microscopy
    and analyze images for co-localization

15
Specific co-labeling (yeast)
  • Early Golgi: Cop1
  • Endosome: Snf7
  • ER to Golgi: Sec13
  • Golgi apparatus: Anp1
  • Late Golgi: Chc1
  • Lipid particle: Erg6
  • Mitochondrion: MitoTracker
  • Nucleus: DAPI
  • Nucleolus: Sik1
  • Nuclear periphery: Nic96
  • Peroxisome: Pex3
  • Vacuole: FM4-64

Nuclear-specific DAPI staining
16
Subcellular Fractionation
  • Tissue homogenate: centrifuge at 1000 g
  • Pellet: unbroken cells, nuclei, chloroplasts
  • Transfer supernatant: centrifuge at 10,000 g
  • Pellet: mitochondria
  • Transfer supernatant: centrifuge at 100,000 g
  • Pellet: microsomal fraction (ER, Golgi,
    lysosomes, peroxisomes)
  • Supernatant: cytosol, soluble enzymes

17
Detergent Fractionation
  • Cells
  • Extraction with digitonin/EDTA: cytoplasmic
    fraction (supernatant)
  • Extraction of the pellet with Triton X-100/EDTA:
    organelle membranes
  • Extraction with SDS/EDTA: nuclear; cytoskeletal
    (in SDS)

18
Fractionation → Identification
  • Once fractionated, take compartment of interest
    and separate proteins
  • 2D gel or chromatography
  • Identify separated proteins
  • Mass spectrometry for high-throughput
  • Western blot for specific proteins

19
Fractionation in proteomics
20
High-Throughput Experiments
  • Kumar et al., Genes Dev 2002, 16:707-719
  • Epitope-tagged >60% of ORFs, visualized with
    fluorescently labeled antibody
  • 2744 localizations (44% of S. cerevisiae genes)
  • Huh et al., Nature 2003, 425:686-691
  • GFP tagged all ORFs, RFP tagged compartments
  • 4156 localizations (75% of S. cerevisiae genes)
  • Combined, now nearly 87% of yeast proteins have a
    localization annotation

21
High-Throughput Experiments
  • Lopez-Campistrous et al, Mol Cell Proteomics,
    2005
  • Subcellular fractionation of E. coli, 2D-gel
    separation, MS-MS
  • 2,160 localizations to cytoplasm, inner membrane,
    periplasm, and outer membrane

22
Predictions from known data
  • Enough experimental data exists to build highly
    accurate computational predictors of localization

23
Predictions from known data
  • Different information used for predictions
  • Sequence motifs
  • N-terminal secretory signal peptides,
    mitochondrial targeting peptide, chloroplast
    transit peptide
  • C-terminal peroxisome import signal, ER
    retention signal
  • Mid-sequence nuclear localization signals
  • Amino acid composition
  • AA frequency, dipeptide composition.
  • Homology
  • Sequence comparison to proteins of known
    localization
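
The composition features listed above can be sketched in a few lines of Python. This is only an illustration of AA frequency and dipeptide composition as generic feature vectors; the function names and the toy sequence are assumptions, not part of any tool named in this talk.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs in seq."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip pairs that contain a non-standard character such as X.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid)
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product(AMINO_ACIDS, repeat=2)}


example = "MKKLLLAAAL"  # hypothetical toy sequence
comp = aa_composition(example)
dipep = dipeptide_composition(example)
print(round(comp["L"], 2), round(dipep["LL"], 3))
```

Either dictionary can be flattened into a fixed-length vector (20 or 400 numbers) and handed to any of the classifiers discussed later.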

24
N-terminal signal peptides
  • Common structure of signal peptides
  • positively charged n-region, followed by a
    hydrophobic h-region and a neutral but polar
    c-region.
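
As a rough illustration of the n-region / h-region pattern, the sketch below screens a sequence for a positively charged start followed by a hydrophobic stretch, using the Kyte-Doolittle hydropathy scale. The region lengths, window size, and thresholds are arbitrary assumptions chosen for the toy example, the c-region is not checked at all, and this is in no way how SignalP works.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def n_region_charge(seq, n_len=5):
    """Net charge of the first n_len residues (K/R = +1, D/E = -1)."""
    head = seq[:n_len].upper()
    return sum(c in "KR" for c in head) - sum(c in "DE" for c in head)


def max_window_hydropathy(seq, start=0, end=30, window=8):
    """Highest mean Kyte-Doolittle score over any window in seq[start:end]."""
    region = seq[start:end].upper()
    best = float("-inf")  # stays -inf if the region is shorter than the window
    for i in range(len(region) - window + 1):
        chunk = region[i:i + window]
        best = max(best, sum(KD.get(c, 0.0) for c in chunk) / window)
    return best


def looks_like_signal_peptide(seq, min_charge=1, min_hydropathy=2.0):
    """Crude screen: positive n-region followed by a hydrophobic stretch."""
    return (n_region_charge(seq) >= min_charge
            and max_window_hydropathy(seq, start=3) >= min_hydropathy)


# Both sequences are made up for illustration.
print(looks_like_signal_peptide("MKKTLLLALLLAVASAQAADE"))
print(looks_like_signal_peptide("MSDEEKRNSTDQGEHKRNDESTQNPGSDEK"))
```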

25
N-terminal signal peptides
26
More work to do
  • Multiple bacterial secretion pathways
  • C-terminal signal peptides
  • Internal mitochondrial transit peptides
  • Structural aspects of targeting
  • Gene re-localization
  • Still a lot to discover in how signaling works!

27
Computational methods for predicting localization
  • Expert rule based methods
  • Artificial Neural Nets (ANN)
  • Hidden Markov Models (HMM)
  • Naïve Bayes (NB)
  • Support Vector Machines (SVM)
  • Combination of above methods

28
Naïve Bayes
  • Assumption
  • Features are conditionally independent, given
    class labels
  • Structure
  • 1-level tree
  • Class labels: root
  • Features: leaf nodes
  • Prediction
  • $\mathrm{class}(f) = \arg\max_{c} \; P(C = c)\, P(F = f \mid C = c)$
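
The prediction rule above can be sketched as a tiny classifier over sets of annotation tokens. This is a minimal sketch, assuming binary present/absent features and Laplace smoothing; neither choice is stated on the slide, and the training examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (set_of_tokens, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)  # label -> token -> number of examples
    vocab = set()
    for tokens, label in examples:
        vocab.update(tokens)
        for tok in tokens:
            token_counts[label][tok] += 1
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return argmax over c of log P(C=c) + sum of log P(F=f | C=c)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for tok in vocab:
            # Laplace-smoothed estimate of P(token present | class).
            p = (token_counts[label][tok] + 1) / (n_c + 2)
            score += math.log(p if tok in tokens else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, for illustration only.
training = [
    ({"signal", "secreted"}, "extracellular"),
    ({"signal", "fungicide"}, "extracellular"),
    ({"kinase", "atp-binding"}, "cytoplasm"),
    ({"ribosomal protein"}, "cytoplasm"),
]
model = train_naive_bayes(training)
print(predict(model, {"signal"}), predict(model, {"kinase"}))
```

Working in log space turns the product of probabilities into a sum, which avoids underflow when many features are used.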

29
Artificial Neural Network
  • Excellent for modeling non-linear input/output
    relationships
  • Robust to noise in training data
  • Widely used in bioinformatics

30
Support Vector Machines
  • Input vectors are separated into positive vs.
    negative instances
  • Map to new feature space
  • Find hyperplane that best separates the two
    classes by distance
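
To make the separating-hyperplane idea concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It deliberately omits the mapping to a new feature space (no kernel), and the learning rate, regularisation strength, and toy points are all assumptions; real predictors would use an established SVM library.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss (labels are +1/-1)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation shrinks w a little on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Point is misclassified or inside the margin: move the plane.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Report which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable 2-D feature vectors.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([classify(w, b, x) for x in points])
```

For a multi-class problem such as localization, one such classifier can be trained per location against all other locations.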

31
Evaluating Predictors - Precision
(Figure: confusion matrix of predicted vs. true labels)
  • Number of proteins correctly labeled as cyt
    divided by the total number of proteins labeled
    as cyt
  • How often the label is correct
  • If there are 90 proteins correctly labeled as
    cyt, and 10 proteins incorrectly labeled as
    cyt, then the precision is 90/100 = 0.90.
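
The worked example above can be checked with a short function. The label lists are constructed to match the slide's counts; the function name and the "other" label are illustrative assumptions.

```python
def precision(true_labels, predicted_labels, label):
    """Fraction of proteins predicted as `label` that truly are `label`."""
    predicted_as = [t for t, p in zip(true_labels, predicted_labels) if p == label]
    if not predicted_as:
        return 0.0  # nothing was given this label
    return sum(t == label for t in predicted_as) / len(predicted_as)


# The slide's example: 90 correct "cyt" calls plus 10 incorrect ones.
true = ["cyt"] * 90 + ["other"] * 10
pred = ["cyt"] * 100
print(precision(true, pred, "cyt"))  # 0.9
```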

32
Evaluating Predictors - Sensitivity
(Figure: confusion matrix of predicted vs. true labels)
  • Number of proteins correctly labeled as
    cytoplasmic divided by the total number of
    proteins that are cytoplasmic
  • How many of the true results were retrieved
    (also called recall or true positive rate)
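
A per-class version of this measure can be sketched as follows. The counts are hypothetical (the slide gives no numbers for sensitivity), and "peri" is just a second label added for illustration.

```python
from collections import Counter


def recall_per_class(true_labels, predicted_labels):
    """Map each true class to (correctly predicted) / (all proteins truly in it)."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


# Hypothetical counts: 100 truly cytoplasmic proteins, 80 of them recovered;
# 20 truly periplasmic proteins, 15 of them recovered.
true = ["cyt"] * 100 + ["peri"] * 20
pred = ["cyt"] * 80 + ["peri"] * 20 + ["peri"] * 15 + ["cyt"] * 5
print(recall_per_class(true, pred))
```

Note the different denominators: precision divides by everything that received the label, while sensitivity divides by everything that truly belongs to the class.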

33
Predictions from known data
  • Different information used for predictions
  • Sequence motifs
  • N-terminal secretory signal peptides,
    mitochondrial targeting peptide, chloroplast
    transit peptide
  • C-terminal peroxisome import signal, ER
    retention signal
  • Mid-sequence nuclear localization signals
  • Amino acid composition
  • AA frequency, dipeptide composition,
    hydrophobicity
  • Homology
  • Sequence comparison to proteins of known
    localization

34
TargetP, SignalP, P (http://www.cbs.dtu.dk/services/)
  • Sequence-based methods
  • TargetP (85-90% recall)
  • Predicts mitochondria/chloroplast/secreted
  • Contains SignalP and ChloroP
  • LipoP
  • lipoproteins and signal peptides in Gram negative
    bacteria
  • SecretomeP
  • non-classical secretion in eukaryotes

35
SignalP result
  • Common structure of signal peptides
  • positively charged n-region, followed by a
    hydrophobic h-region and a neutral but polar
    c-region.

Cleavage site
  • Prediction: Signal peptide
  • Signal peptide probability: 0.945
  • Signal anchor probability: 0.000
  • Max cleavage site probability: 0.723 between
    pos. 28 and 29
36
Organellar Prediction
  • Predotar (http://www.inra.fr/predotar/) (80%
    recall)
  • Mitochondrial and plastid sequences: N-terminal
    sequences
  • MitoPred (http://mitopred.sdsc.edu/) (82% recall)
  • Mitochondrial: Pfam domains, AA composition
  • MitoProteome (http://www.mitoproteome.org/)
  • Database of experimentally identified human
    mitochondrial proteins
  • MitoP (http://ihg.gsf.de/mitop2/)
  • Combines data from multiple experimental and
    computational sources to give a consensus score
    for each mitochondrial protein in yeast and
    human

37
The PSORT Family
  • PSORT: plant sequences
  • Expert rule-based system
  • PSORT II: eukaryotic sequences
  • Probabilistic tree
  • iPSORT: eukaryotic N-term. signal sequences
  • ANN
  • PSORT-B: bacterial sequences
  • WoLF PSORT: eukaryotic
  • Updated (2005) version of PSORT II

38
PSORT-B (http://www.psort.org/psortb/)
39
PSORT-B - methods
  • Signal peptides → Non-cytoplasmic
  • AA composition/patterns
  • SVMs trained for each location vs. all other
    locations
  • Transmembrane helices → Inner membrane
  • HMMTOP
  • PROSITE motifs → all localizations
  • Outer membrane motifs → Outer membrane
  • Homology to proteins of known localization
  • SCL-BLAST

Integration with a Bayesian network
40
PSORT-B results
  • SeqID: Unannotated_bacterial2
  • Analysis Report
  • CMSVM- Unknown No details
  • CytoSVM- Cytoplasmic No details
  • ECSVM- Unknown No details
  • HMMTOP- Unknown No internal helices found
  • Motif- Unknown No motifs found
  • OMPMotif- Unknown No motifs found
  • OMSVM- Unknown No details
  • PPSVM- Unknown No details
  • Profile- Unknown No matches to profiles found
  • SCL-BLAST- Cytoplasmic matched 118438 Cyto. protein
  • SCL-BLASTe- Unknown No matches against database
  • Signal- Unknown No signal peptide detected
  • Localization Scores
  • Cytoplasmic 9.97
  • CytoplasmicMembrane 0.01
  • Periplasmic 0.01
  • OuterMembrane 0.00
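
Turning the localization scores above into a single call can be sketched as below. The parsing assumes one "Name score" pair per line, as shown on the slide, and the 7.5 cutoff is an assumption about how a final call might be made; check the PSORT-B documentation for the rule the tool actually applies.

```python
def parse_scores(report_lines):
    """Turn lines such as 'Cytoplasmic 9.97' into a {location: score} dict."""
    scores = {}
    for line in report_lines:
        name, _, value = line.strip().rpartition(" ")
        scores[name.strip()] = float(value)
    return scores


def call_localization(scores, cutoff=7.5):
    """Return the top-scoring site, or 'Unknown' if it falls below cutoff."""
    site, best = max(scores.items(), key=lambda kv: kv[1])
    return site if best >= cutoff else "Unknown"


# Scores copied from the result shown on this slide.
lines = ["Cytoplasmic 9.97", "CytoplasmicMembrane 0.01",
         "Periplasmic 0.01", "OuterMembrane 0.00"]
scores = parse_scores(lines)
print(call_localization(scores))
```

Returning "Unknown" below the cutoff trades coverage for precision: fewer proteins get a label, but the labels given are more trustworthy.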

41
Proteome Analyst (http://www.cs.ualberta.ca/bioinfo/PA/Sub/)
42
Proteome Analyst - Method
43
Proteome Analyst - Feature Extraction
44
Proteome Analyst Feature Extraction
  • TOP 3 Homologs
  • AFP1_ARATH
  • AFP1_BRANA
  • AFP2_ARATH
  • KW
  • Plant defense; Fungicide
  • Signal; Multigene Family
  • Pyrrolidone carboxylic acid
  • DR InterPro
  • IPR002118; IPR003614
  • CC Subcellular location
  • Secreted
  • Token Set

Plant defense; Fungicide; Signal; Multigene
Family; Pyrrolidone carboxylic acid; IPR002118;
IPR003614; Secreted
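
The feature-extraction step above amounts to pooling selected annotation fields from the top homologs into one set of tokens. A minimal sketch follows; the dictionary layout and the split of tokens between homologs are hypothetical (the slide only shows the pooled result), and retrieving the homologs themselves is not shown.

```python
def build_token_set(homolog_annotations,
                    fields=("KW", "DR InterPro", "CC Subcellular location")):
    """Union the chosen annotation fields of the top homologs into one token set."""
    tokens = set()
    for annotation in homolog_annotations:
        for field in fields:
            tokens.update(annotation.get(field, []))
    return tokens


# Hypothetical per-homolog annotations that pool to the slide's token set.
top_homologs = [
    {"KW": ["Plant defense", "Fungicide", "Signal"],
     "DR InterPro": ["IPR002118", "IPR003614"],
     "CC Subcellular location": ["Secreted"]},
    {"KW": ["Fungicide", "Multigene Family"],
     "DR InterPro": ["IPR003614"]},
    {"KW": ["Pyrrolidone carboxylic acid", "Signal"],
     "CC Subcellular location": ["Secreted"]},
]
token_set = build_token_set(top_homologs)
print(sorted(token_set))
```

Using a set means a token shared by several homologs is counted once, which suits a present/absent feature encoding.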
45
PASub - Results
(Figure: contribution of each token to the prediction, by feature, on a log scale)
46
PASub - Interpretation
  • Bars represent -log probability, so a little
    difference is a lot!
  • Naïve Bayes chosen as classifier because of
    transparency of method
  • Each token gives a probability that can be summed
    and shown graphically
  • Neural network actually has higher recall
  • Can change token set, ask to explain with
    different features
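
Why a small difference between bars matters can be seen by summing per-token log probabilities. The sketch below assumes base-10 logarithms and uses made-up P(token | class) values; it does not reproduce Proteome Analyst's actual display.

```python
import math


def token_contributions(token_probs):
    """-log10 P(token | class) for each token; shorter bars mean likelier tokens."""
    return {tok: -math.log10(p) for tok, p in token_probs.items()}


def total_score(token_probs):
    """Sum of the per-token bars, i.e. -log10 of the product of probabilities."""
    return sum(token_contributions(token_probs).values())


# Hypothetical P(token | class) values for two candidate classes.
extracellular = {"Signal": 0.5, "Secreted": 0.4, "Fungicide": 0.1}
cytoplasm = {"Signal": 0.05, "Secreted": 0.004, "Fungicide": 0.1}

gap = total_score(cytoplasm) - total_score(extracellular)
print(round(gap, 2), round(10 ** gap))
```

With these numbers the totals differ by about 3 log units, which corresponds to roughly a 1000-fold difference in the underlying probabilities.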

47
Save Time: Pre-computed Genomes
  • PSORTDB
  • http://db.psort.org
  • Browse, search, BLAST, download
  • 103 Gram -ve bacteria, 45 Gram +ve bacteria
  • Proteome Analyst (PA-GOSUB)
  • http://www.cs.ualberta.ca/bioinfo/PA/GOSUB/
  • Browse, search, BLAST, download
  • 15 bacterial and 8 eukaryotic