1
Experimental Bioinformatic Tools for Proteomics

Steve Oliver
Professor of Genomics
Faculty of Life Sciences
The University of Manchester
http://www.cogeme.man.ac.uk
http://www.bioinf.man.ac.uk

2
Functional Genomics
Level of analysis | Definition | Status | Method of analysis
Genome | Complete set of genes of an organism or its organelles. | Context-independent (modifications to the yeast genome may be made with exquisite precision). | Systematic DNA sequencing.
Transcriptome | Complete set of mRNA molecules present in a cell, tissue or organ. | Context-dependent (the complement of mRNAs varies with changes in physiology, development or pathology). | Hybridisation arrays. SAGE. High-throughput Northern analysis.
Proteome | Complete set of protein molecules present in a cell, tissue or organ. | Context-dependent. | 2-D gel electrophoresis. Peptide mass fingerprinting. Two-hybrid analysis.
Metabolome | Complete set of metabolites (low molecular weight intermediates) present in a cell, tissue or organ. | Context-dependent. | Infra-red spectroscopy. Mass spectrometry. Nuclear magnetic resonance spectrometry.

3
GENOME
TRANSCRIPTOME
PROTEOME
METABOLOME
4
Proteomics

Separation
Identification
Quantitation
Bioinformatics

5
Complex mixture analysis
genome
knowledge prediction
peptide mass database
virtual proteome
post-translational modification
Bioinformatics Identification
real proteome
separation methods
2D-gels, functional separations, n-dimensional chromatography
digest
complex peptide map fingerprint
complex mixtures subsets
digest
simple peptide map fingerprint
simple mixtures single proteins
6
(No Transcript)
7
Peptide mass fingerprinting
denature
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRC LPVNTFVHE
SLADVQAVCSQKNVACKNGQTNCYQSYSTMS ITDCRETGSSKYPNCAYK
TTQANKHIIVACEGNPYVPVHF DASV
digest (trypsin)
KETAAAK FER QHMDSSTSAASSSNYCNQMMK SR
NLTK DR CLPVNTFVHESLADVQAVCSQK
NVACK NGQTNCYQSYSTMSITDCR ETGSSK
YPNCAYKTTQANK
HIIVACEGNPYVPVHFDASV
peptide masses m1 to m12
mass spectrometry
[Spectrum: abundance versus mass, with peaks labelled m1, m7, m9, m10, m11 and m12]
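The digest step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the talk: the cleavage rule (cut after K or R unless the next residue is P) is the usual textbook simplification, the residue masses are standard monoisotopic values, and the function names are made up for the example. Note that a strict rule would also cut the slide's KETAAAK and YPNCAYKTTQANK peptides, which the slide shows uncleaved.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# A fragment of the sequence shown on the slide.
fragment = "FERQHMDSSTSAASSSNYCNQMMKSR"
for pep in tryptic_digest(fragment):
    print(f"{pep:25s} {peptide_mass(pep):10.4f}")
```

The list of masses printed here plays the role of m1, m2, ... on the slide: it is what would be compared against a peptide mass database.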
8
Proteomic applications

Quantitative Proteomics
Expression proteomics
protein levels under different conditions/times
Qualitative Proteomics
Identification proteomics
protein-protein interactions
post-translational modifications

9
A MASS SPECTROMETER MEASURES THE MW.
...AN MS ANALYSIS GIVES THE
MASS-TO-CHARGE RATIO (m/z) FOR IONS IN THE GAS PHASE.
Brancia FL, Trieste, 12/02/2004
10
What is a mass spectrometer...?
11
Pumping system
vacuum
ION SOURCE (ion generation)
ANALYZER (mass analysis)
Sample introduction
Detector
Data Processing
12
Various ionisation methods

Electron impact ionisation (1919 A.J. Dempster)
Chemical Ionisation CI
Fast atom bombardment FAB (1981 M. Barber)
Matrix-assisted laser desorption ionisation MALDI
(1988 K. Tanaka; M. Karas and F. Hillenkamp)
Electrospray ES (1985, J. Fenn)

13
Soft Ionisation Techniques

"Soft" refers to the low amount of energy
imparted to the analyte during ionisation. Too
much internal energy will result in
fragmentation. Soft ionisation techniques form
intact molecular or pseudo-molecular (MH+) ions.
Matrix-assisted laser desorption
ionisation (MALDI)
Electrospray (ES)

14
15
Electrospray (ES)
16
17
The principal outcome of the electrospray process
is the transfer of analyte species, generally
ionised in condensed phase, into the gas phase as
isolated entities

HV

Aerosol of charged droplets
Gaskell SJ, Journal of Mass Spectrometry 1997
18
ES spectrum of Rho protein
Rho Protein 47004.33 Da
[M+56H]56+
[M+50H]50+
Courtesy of Dr Matt Openshaw
19
Electrospray (ES)
[M+56H]56+ = 840.3 m/z
Therefore, M = (840.3 × 56) - 56 = 47000.8 Da
Deconvolution: takes all the multiply charged ions
and converts them into a spectrum on a mass (Da)
scale, i.e. works out what the molecular weight is
most likely to be.
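The arithmetic on this slide can be written out as a small Python sketch. It is only an illustration of the idea behind deconvolution, not the algorithm of any instrument software. The function names are invented for the example, and the proton mass of 1.00728 Da is a standard value that the slide rounds to 1.

```python
PROTON = 1.00728  # Da; the slide rounds this to 1


def mz_from_mass(mass, z, proton=PROTON):
    """m/z of the [M+zH]z+ ion of a molecule of neutral mass `mass`."""
    return (mass + z * proton) / z


def mass_from_mz(mz, z, proton=PROTON):
    """Neutral mass from the m/z of an ion carrying z protons."""
    return z * mz - z * proton


def charge_from_adjacent(mz_z, mz_z_plus_1, proton=PROTON):
    """Charge z of a peak, given the neighbouring peak that carries z + 1.

    The peak with charge z sits at the higher m/z of the two.
    """
    return round((mz_z_plus_1 - proton) / (mz_z - mz_z_plus_1))


# The slide's example, using the slide's rounded proton mass of 1.
print(mass_from_mz(840.3, 56, proton=1.0))

# Two neighbouring peaks are enough to recover the charge state.
true_mass = 47004.33
peak_50 = mz_from_mass(true_mass, 50)
peak_51 = mz_from_mass(true_mass, 51)
z = charge_from_adjacent(peak_50, peak_51)
print(z, mass_from_mz(peak_50, z))
```

A real deconvolution combines the estimates from every charge state in the envelope; the sketch uses a single pair of peaks to keep the relationship visible.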
20
ES spectrum after deconvolution
47004.0 Da
21
Advantages

Production of molecular ions from solution
The ease of coupling with separation techniques
(micro LC-MS/MSMS, nano LC-MS/MSMS)
Production of multiply charged ions

22
Matrix Assisted Laser Desorption Ionisation (MALDI)

Time-of-Flight

23
Matrix assisted laser desorption ionisation
(MALDI)
α-cyano-4-hydroxycinnamic acid (CHCA)
2,5-dihydroxybenzoic acid (DHB)
trans-3,5-dimethoxy-4-hydroxycinnamic acid
(sinapinic acid, SA)
Typically used with a nitrogen laser (337 nm)
24
MALDI is an efficient desorption ionisation
technique for producing gaseous ions from a solid
sample by laser pulses
MH+
25
Matrix Assisted Laser Desorption/Ionisation (MALDI)
Unlike ES, MALDI forms predominantly singly
charged ions, e.g. MH+ or adducts (sodium MNa+ or
potassium MK+).
Sodium = 23 amu; potassium = 39 amu
[Spectrum: MH+, with MNa+ at +22 m/z and MK+ at +38 m/z]
26
Why is the matrix so important?

Matrix is necessary to dilute and disperse the
analyte
It functions as an energy mediator for ionising
the analyte itself or other neutral molecules
It forms an activated state produced by
photo-ionisation

27
Advantages

MALDI primarily creates singly charged ions
(MH+)
Less sensitive to contaminants
Sensitivity at femtomole level
High throughput analysis

28
Time-of-flight (ToF) mass spectrometer
mv²/2 = zV
t² = (m/z)(d²/2V)
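The time-of-flight relation on this slide follows from setting the kinetic energy equal to zV and writing v = d/t. A minimal Python sketch of the calculation is given below. The drift length and accelerating voltage in the usage lines are made-up illustrative values, not the settings of any particular instrument, and the function names are invented for the example.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
DALTON = 1.66053906660e-27  # one dalton in kilograms


def flight_time(mz, drift_length_m, accel_voltage_v):
    """t = d * sqrt((m/z) / (2V)), with m/z given in Da per elementary charge."""
    mass_per_charge = mz * DALTON / E_CHARGE  # kg per coulomb
    return drift_length_m * math.sqrt(mass_per_charge / (2.0 * accel_voltage_v))


def mz_from_time(t_seconds, drift_length_m, accel_voltage_v):
    """Invert the relation: m/z = 2V (t/d)^2, converted back to Da per charge."""
    mass_per_charge = 2.0 * accel_voltage_v * (t_seconds / drift_length_m) ** 2
    return mass_per_charge * E_CHARGE / DALTON


# Hypothetical instrument: 1 m drift tube, 20 kV acceleration.
for mz in (1000.0, 2000.0, 4000.0):
    print(mz, flight_time(mz, 1.0, 20000.0))
```

Because t scales with the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector.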
29
Reflectron-time of flight mass analyser
30
MALDI ESI
Sensitivity femtomole 10⁻¹⁵ M/µl
(...attomole 10⁻¹⁸ M)
Simplicity very easy training required
70 to 650 k 120 to 650 k
Speed (high throughput) 10⁴/day
dynamic system
Selectivity (resolution) >5000
Structural information MSn
MSn
Software ...evaluation in progress.
31
Structural information can be achieved by tandem
mass spectrometry
32
The tandem mass spectrometry experiment
33
34

PROBLEMS WITH CLASSICAL
PROTEOME ANALYSIS
1. Not comprehensive
2. Not high-throughput
3. Destroys protein-protein interactions
that provide important clues to function

35
(No Transcript)
36

Multidimensional protein identification
technology (MudPIT)
Washburn MP, et al. Nat Biotechnol 2001,
19:242-247.

Reverse Phase
SCX
Load complete digest of sample
Develop with gradient and spray directly onto
MSMS
MS/MS
Identified 1500 proteins from yeast, including
lower abundance species and membrane proteins
2415 proteins (46%) of the Plasmodium genome
identified in all 4 stages of the parasitic life cycle
37
Just Enough Diagnostic Information
38
Sidhu KS, Sangavich P, Brancia FL, Sullivan AG,
Gaskell SJ, Wolkenhauer O, Oliver SG, Hubbard
SJ (2001) Bioinformatic assessment of mass
spectrometric chemical derivatisation techniques
for proteome database searching. Proteomics 1,
1368-1377.
39

Provide limited sequence information by
1. Identification of N-terminal amino acid by
PTC derivatisation
2. Use guanidination to identify C-terminus,
determine lysine content, and improve
signal response
3. Specifically fragment next to Asp residues
using MALDI-QToF MS

40
PTC-derivatisation

phenylthiocarbamoyl derivative
Edman chemistry
N-terminal amino acid
b1 ion created via low energy collisions
precursor ion scan gives parents
increased sensitivity

ms2
peptide ions
ms1
scan for precursors
fixed on b1
collision cell
Spectra collected of all peptides which give rise
to a given b1 ion (implying knowledge of the
N-terminal amino acid)
41
Database peptide hits by N-terminal amino acid
Average number of matching proteins in the yeast
proteome when searching with a peptide mass in
the 1000-2000 Da range (error 0.5 Da). Rare amino
acids give a bigger search gain.
N-terminal amino acid | Mean number of peptides
ANY | 74.15
W | 1.70
C | 1.77
H | 2.30
M | 3.41
N | 5.61
I | 5.76
E | 6.04
S | 7.18
L | 8.39
I/L | 14.16
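The "search gain" on this slide can be made concrete by dividing the unconstrained hit count by the count obtained once the N-terminal residue is known. The short Python sketch below does only that arithmetic on the values from the slide; the function name is invented for the example.

```python
# Mean number of database hits per search mass, taken from the slide.
MEAN_HITS = {
    "ANY": 74.15, "W": 1.70, "C": 1.77, "H": 2.30, "M": 3.41,
    "N": 5.61, "I": 5.76, "E": 6.04, "S": 7.18, "L": 8.39, "I/L": 14.16,
}


def search_gain(n_terminal_residue):
    """Fold reduction in candidate hits when the N-terminal residue is known."""
    return MEAN_HITS["ANY"] / MEAN_HITS[n_terminal_residue]


for residue in MEAN_HITS:
    if residue != "ANY":
        print(f"{residue:4s} {search_gain(residue):6.1f}-fold")
```

Knowing that a peptide starts with tryptophan cuts the candidate list roughly forty-fold, whereas the isobaric pair I/L, which cannot be told apart by mass, gives only about a five-fold gain.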
42
Guanidination of Lysine
[Reaction scheme: lysine + O-methyl isourea gives homoarginine]
43
MALDI spectrum of an enolase tryptic digest
[Spectrum peak labels: R (6 peaks), K (3 peaks)]
44
MALDI spectrum of a tryptic digest of enolase
after guanidination
[Spectrum peak labels: K (9 peaks), R (5 peaks)]
45
Initial set of search peptides and associated
information
Search database, compile protein hit list with
matching peptides
If all initial search peptides masses are
matched, stop, else continue searching
Top-scoring protein is matched. Remove
corresponding peptides from search list
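The loop described on this slide can be sketched as follows. This is a toy under stated assumptions, not the software used in the study: the score is simply the number of matched search masses, the tolerance of 0.5 Da is borrowed from the earlier slide, and the database contents and protein names are made up.

```python
def iterative_search(search_masses, database, tolerance=0.5):
    """Repeatedly take the top-scoring protein and remove its peptides.

    `database` maps a protein name to its list of theoretical peptide masses.
    The score used here (number of matched masses) is an assumption.
    """
    remaining = list(search_masses)
    identified = []
    while remaining:
        hits = {}
        for protein, theoretical in database.items():
            if protein in identified:
                continue
            matched = [m for m in remaining
                       if any(abs(m - t) <= tolerance for t in theoretical)]
            if matched:
                hits[protein] = matched
        if not hits:
            break  # nothing left in the database explains the remaining masses
        best = max(hits, key=lambda name: len(hits[name]))
        identified.append(best)
        remaining = [m for m in remaining if m not in hits[best]]
    return identified, remaining


# Hypothetical database and search list.
toy_database = {
    "protA": [1000.5, 1500.7, 2000.9],
    "protB": [1200.3, 1800.2],
    "protC": [1000.6],
}
toy_masses = [1000.4, 1500.8, 1200.2, 1800.3, 999.0]
print(iterative_search(toy_masses, toy_database))
```

Removing the peptides of each identified protein before searching again is what lets a mixture be resolved one component at a time.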
46
Real yeast proteomics

Alternatives to 2D-gels
denaturing technology
low abundance spots difficult to identify
Many steps of orthogonal 1D-steps
Size exclusion chromatography
Ion exchange chromatography
1D-gels

47
Yeast proteome sample
[Spectrum before guanidination, mass (m/z) 800 to 3600; labelled peaks: 795.32, 811.32, 1470.68, 1708.61, 1752.62, 1768.59, 3570.36]
[Spectrum after guanidination, mass (m/z) 800 to 3600; labelled peaks: 795.23, 925.33, 1040.30, 1150.49, 1210.39, 1221.90, 1416.55, 1512.69, 1752.65, 3612.77; some peaks marked K or R]
48
Database search gains
Standard MALDI, 7 search peptides (before
guanidination): 1656 proteins match at least 1 peptide
Standard MALDI, 12 search peptides (after
guanidination): 2549 proteins match at least 1 peptide
Combined, 19 (7 + 12) search peptides (both
experiments): 3235 proteins match at least 1 peptide
49
Database search gains
peptides in common
Search peptides in common (5 from expt 1, 4
from expt 2)
Only 289 proteins match at least 1 peptide in
both experiments
PTC derivatised 3 peptides N-term Ile/Leu
Only 204 proteins match at least 1 peptide
All 3 sets of experimental data combined
Only 18 proteins match at least 1 peptide in all
3 experiments
50
(No Transcript)
51
(No Transcript)
52
S. cerevisiae 1 protein
S. cerevisiae 2 proteins
53
Improved bioinformatics approaches for complex
mixtures
[Diagram labels: primary data (input masses); secondary data (experimental proteome data); database (proteome, proteins, peptides); search engine; rule-based system; protein hit list; protein information; quantitative data; qualitative data; possibility; probability; final scores]
54
Contextual information

pI (theoretical and experimental)
Molecular weight (oligomerisation state)
Subcellular localisation (known, predicted -
PSORT)
Molecular environment (soluble, membrane, DNA-,
actin- associated.)
Post-translational modifications (known,
putative, predicted)
Sequence motifs
Homology relationships
Non-native state digestions

55
Scoring systems

Bayesian approach
k is the hypothesis that the sample protein is
protein k,
D is mass spec fingerprint data,
I is background information,
P(k|D,I) is posterior probability for k given D
and I,
P(k|I) is prior probability of k given I,
P(D|I) is a normalisation constant
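The quantities defined on this slide combine through Bayes' theorem. The slide does not name the likelihood term, so P(D|k,I) below is the standard term from the theorem rather than something taken from the talk, and the priors and likelihoods in the example are invented numbers. A minimal Python sketch:

```python
def posterior(priors, likelihoods):
    """P(k|D,I) = P(k|I) * P(D|k,I) / P(D|I) for every candidate protein k.

    `priors` holds P(k|I) and `likelihoods` holds P(D|k,I), keyed by protein.
    P(D|I) is the normalisation constant: the sum over all candidates.
    """
    evidence = sum(priors[k] * likelihoods[k] for k in priors)
    return {k: priors[k] * likelihoods[k] / evidence for k in priors}


# Hypothetical candidates: the prior could encode contextual information
# (pI, molecular weight, localisation), the likelihood the fingerprint match.
priors = {"protA": 0.5, "protB": 0.3, "protC": 0.2}
likelihoods = {"protA": 0.01, "protB": 0.20, "protC": 0.05}
for protein, p in posterior(priors, likelihoods).items():
    print(protein, round(p, 3))
```

In this made-up case protB ends up with the highest posterior despite a lower prior than protA, because its fingerprint match is much better.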

56
QUANTITATIVE PROTEOMICS
57
DiGE: Difference Gel Electrophoresis

Ünlü M, et al. (1997) Difference gel
electrophoresis: a single gel method for detecting
changes in protein extracts. Electrophoresis 18,
2071-2077.

58
Sample 2
Sample 3
Sample 1
label with Cy2 in dark, 30 min at 4 °C
label with Cy3 in dark, 30 min at 4 °C
label with Cy5 in dark, 30 min at 4 °C
quench unreacted dye by adding 1 mM lysine in
dark, 10 min at 4 °C
Difference Gel Electrophoresis
2D gel electrophoresis
59
[Spot patterns: no difference; presence / absence; up / down-regulation]
60

Stable Isotope Labelling

In vivo labelling: isotopes introduced during
cell culture
Pro | Con
Cheap | Only works for microbes and cell culture?
Information rich | Very complex samples
 | Have to deduce sequence before assigning pairs

[Spectrum: 14N and 15N peak pair along m/z]
61

Growth of C. elegans on isotopically labelled
E. coli
E. coli grown on 15N nitrogen source
E. coli grown on 14N nitrogen source
Metabolic labelling of C. elegans
Heavy mutant
Light mutant
Light WT
Heavy WT
Krijgsveld et al. (2003) Nat. Biotech.
Also grew Drosophila on metabolically labelled
yeast
62
In vitro labelling - continued

I. Isotopes introduced during proteolysis: 18O-
labelled water, C-termini
II. Guanidinylation of lysine using isotopes of
O-methyl isourea: lysine residues
III. Dimethyl labelling: lysine residues
Pro | Con
Cheap | Complex peptide mixture
Universal | Small mass difference on MS

63
ICAT: Isotope Coded Affinity Tags
Gygi SP, et al. Nat Biotechnol 1999, 17:994-999.
Isotope-coded linker: 227 / 236 (9 × 13C) amu
SH-reactive group (iodoacetamide)
Pros | Cons
Universal | Protein must contain cysteine
Simplified sample |
64
ICAT method
Biotin
Linker (heavy or light)
Thiol-specific reactive group
Gygi S, Rist B et al. (1999) Nature Biotech. 17
994.
65
Control sample
Test sample
Denature (SDS) and reduce (TCEP)
Label with heavy reagent
Label with light reagent
Pool Samples
66
Purify labelled peptides using avidin column
Digest overnight with trypsin
Cleave biotin portion of the tag with
concentrated TFA
LC-MSMS
67
(No Transcript)
68
iTRAQ
69
Ross P. et al. Mol Cell Proteomics. 2004 Sep 22
70
WORKFLOW

Reduce, alkylate (cysteine block) and digest
protein sample with trypsin as usual
Label each sample (max of 4) with a different
iTRAQ reagent; 100 µg of protein is optimal
Combine all iTRAQ-labelled samples into one
sample mixture
Clean up sample by cation-exchange
chromatography
For complex sample mixtures, pre-fractionation
is achieved by using a high-resolution
cation-exchange column
Analyse the mixture by LC/MS/MS
Results are analysed by Pro Quant software

71
(No Transcript)
72
PROTEIN TURNOVER: The missing dimension of
proteomics
JM Pratt, J Petty, I Riba-Garcia, DHL Robertson,
SJ Gaskell, SG Oliver, RJ Beynon (2002) Molec.
Cell. Proteomics 1, 579-591.
73
Experimental Approach
Dilution rate = 0.1 h⁻¹; half-time = 6.9 h
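The half-time on this slide is ln 2 divided by the dilution rate. The Python sketch below checks that figure and adds two relations that are assumptions rather than statements from the slide: that label is lost by simple first-order decay, and that in a chemostat the observed loss rate is the sum of degradation and dilution, so the degradation rate is the loss rate minus the dilution rate. Function names are invented for the example.

```python
import math


def half_time(rate_per_hour):
    """Half-time of a first-order process with the given rate constant."""
    return math.log(2) / rate_per_hour


def relative_abundance(t_hours, k_loss):
    """Fraction of label remaining at time t, assuming first-order loss from 1."""
    return math.exp(-k_loss * t_hours)


def degradation_rate(k_loss, dilution_rate):
    """Assumed relation: k_loss = k_degradation + dilution rate."""
    return k_loss - dilution_rate


dilution = 0.1  # h^-1, from the slide
print(round(half_time(dilution), 2))

# Hypothetical protein whose label disappears at 0.13 h^-1.
print(round(degradation_rate(0.13, dilution), 3))
```

A protein that is not degraded at all would still lose label at the dilution rate, which is why the dilution term has to be subtracted.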
74
Pratt et al., Figure 3
[Mass spectra at 100%, 50% and 0% d9; peaks annotated L0, L1, L2 and L3]
75
27 Da (3 Leu)
9 Da (1 Leu)
0 h, 4 h, 6 h, 8 h, 12 h, 25 h, 51 h
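The mass differences on this slide are 9 Da per leucine, consistent with the d9 label. A minimal Python sketch of that bookkeeping follows; the peptide sequences are made up for illustration and the function name is invented.

```python
D9_LEU_SHIFT = 9  # nominal Da per labelled leucine, as on the slide


def label_shift(peptide):
    """Nominal mass difference between fully labelled and unlabelled peptide."""
    return D9_LEU_SHIFT * peptide.count("L")


# Hypothetical peptides with three, one and no leucine residues.
for peptide in ("LVELLK", "AVDLK", "AGEK"):
    print(peptide, label_shift(peptide), "Da")
```

Peptides without leucine show no shift, so they carry no turnover information in this labelling scheme.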
76
Pratt et al., Figure 3
[Plots of RIAt (0 to 1) against time (h) for NADP-glutamate dehydrogenase (GDH) (3 peptides), Hsp26 (2 peptides), pyruvate decarboxylase (PDC) (4 peptides) and Hsp71 (4 peptides); bar chart of kloss (h⁻¹) ± SEM, scale 0 to 0.16, for NADP-GDH, Hsp26, Hsp71 and PDC]
77
Pratt et al., Figure 5
[Distribution (%) of degradation rate constants in the bins < 0.01, 0.01-0.02, 0.02-0.03, 0.03-0.04 and > 0.04 h⁻¹; plot of degradation rate constant (h⁻¹) ± SEM against protein (spot ID)]
78
INTEGRATION
79
Evaluating protein-interaction data
von Mering C, Krause R, Snel B, Cornell M,
Oliver SG, Fields S, Bork P (2002) Comparative
assessment of large-scale data sets of
protein-protein interactions. Nature 417,
399-403.
Cornell M, Paton NW, Oliver SG (2004) A
critical and integrated view of the yeast
interactome. Comp. Funct. Genom. 5, 382-402.
80

81
Schematic representation of the two hybrid system
in case of interaction of protein A and B
activation D
B
A
RNA POL II
DNA-binding D
reporter gene
UAS
Gene expression
82
Schematic representation of the two hybrid system
in absence of interaction of protein A and B
activation D
B
RNA POL II
A
NO TRANSCRIPT
DNA-binding D
reporter gene
UAS
83
(No Transcript)
84
Synthetic lethals
Definition: lethality is caused by mutating two
or more genes
gene1
gene1
gene2
gene2
geneA
gene3
gene3
geneB
gene4
gene4
geneC
gene5
gene5
Single essential pathway
Functionally overlapping pathways
85
Asparagine-linked Glycosylation
Dolpp-GlcNAc2Man9Glc3 (Substrate)
(ALG genes are responsible for the core
synthesis)
Asp -NH -GlcNAc2Man9Glc3

STT3, OST1 WBP1, OST3 OST6, SWP1 OST2 OST5 OST4
X
Asp-NH2
SER/THR
X
SER/THR
alg mutations are synthetically lethal
with conditional mutation affecting
oligosaccharyltransferase activity
86
Integrating complex data with yeast two-hybrid
data
B
Complex consists of six proteins A, B, C, D, E, F
C
A
F
D
E
In a yeast two-hybrid experiment, A interacts
with another protein
Is it B, C, D, E or F?
87
Large-scale interaction data and the distribution
of interactions according to functional
categories.
88
Quantitative comparison of interaction datasets.
89
Set of confirmed Y2H interactions
Confirmation of an interaction requires

Identification in more than one Y2H screen, OR
The reverse interaction must have been
identified, OR
The two proteins must have been identified in the
same protein complex (from either classical or
high-throughput affinity purification
studies).

A total of 451 reliable interactions involving
581 proteins have been identified from a
combined data set comprising 5214
interactions and 4025 proteins
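The three confirmation criteria on this slide translate directly into a filter. The Python sketch below is an illustration only: the data layout (one set of bait-prey pairs per screen, one set of protein names per complex) and all protein names are assumptions made for the example, not the format used in the study.

```python
def confirmed_interactions(screens, complexes):
    """Return the unordered pairs that satisfy at least one criterion.

    `screens` is a list of sets of (bait, prey) tuples, one set per Y2H screen.
    `complexes` is a list of sets of protein names from affinity purification.
    """
    directed = set().union(*screens)
    confirmed = set()
    for bait, prey in directed:
        # Criterion 1: seen in more than one screen (in either orientation).
        in_screens = sum(1 for s in screens
                         if (bait, prey) in s or (prey, bait) in s)
        # Criterion 2: the reverse interaction has also been identified.
        reverse_seen = bait != prey and (prey, bait) in directed
        # Criterion 3: both proteins found in the same complex.
        same_complex = any(bait in c and prey in c for c in complexes)
        if in_screens > 1 or reverse_seen or same_complex:
            confirmed.add(frozenset((bait, prey)))
    return confirmed


# Hypothetical data: two screens and one purified complex.
screens = [
    {("A", "B"), ("C", "D"), ("E", "F")},
    {("A", "B"), ("D", "C"), ("G", "H")},
]
complexes = [{"G", "H", "X"}]
for pair in sorted(sorted(p) for p in confirmed_interactions(screens, complexes)):
    print(pair)
```

In the toy data the E-F pair is dropped: it appears in one screen, in one orientation, and in no complex.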
90

91
PEDRo: A Systematic Approach to Modelling,
Capturing and Disseminating Proteomics Data

Taylor CF, Paton NW, Garwood KL,
Kirby PD, Stead DA, Yin Z, Deutsch EW, Selway L,
Walker J,
Riba-Garcia I, Mohammed S, Deery MJ, Howard JA,
Dunkley T, Aebersold R, Kell DB, Lilley KS,
Roepstorff P,
Yates JR III, Brass A, Brown AJP, Cash P, Gaskell
SJ, Hubbard SJ, Oliver SG (2003)
Nature Biotechnol. 21, 247-254.
Garwood K, McLaughlin T, Garwood C, Joens S,
Morrison N, Taylor CF, Carroll K, Evans C,
Whetton AD, Hart S, Stead D, Yin Z, Brown AJP,
Hesketh A, Chater K, Hansson L, Mewissen M,
Ghazal P, Howard J, Lilley KS, Gaskell SJ, Brass
A, Hubbard SJ, Oliver SG, Paton NW (2004)
PEDRo: A database for storing, searching and
disseminating experimental proteomics data.
BMC Genomics 5, 68. doi:10.1186/1471-2164-5-68.

92
Proteomics the state of play

The volume of generated proteome data is rapidly
increasing
Movement towards high-throughput approaches
Experimental techniques increasing in complexity
Analyses also increasing in complexity
Current publicly available proteomics data is
limited
2D-gel image databases (e.g. SWISS-2DPAGE)
contain little information about sample
preparation, or analysis of results
No widely used databases of mass spectrometry
data or analyses
A robust, future-proofed, standard representation
of both methods and data from proteomics
experiments is required
Analogous to the MIAME guidelines for
transcriptomics
Users will know what to expect from datasets
(formats etc.)
Will facilitate handling, exchange and
dissemination of data
Will guide the development of effective
search/analysis tools

93
PEDRo and PEML

The PEDRo (Proteome Experiment Data Repository)
model
Specifies the information required about a
proteomics experiment
sufficient information to exactly replicate that
experiment
Organised in a manner reflecting the procedures
that generated it
Flexible enough to accommodate new technological
developments
Described in UML (Unified Modelling Language),
making it implementation-independent (effectively
a generic blueprint)
Implemented in SQL (the relational database
repository)
Also implemented in Java (later slide), and XML
(next bullet)
PEML (Proteomics Experiment Markup Language)
The XML implementation of PEDRo for data exchange
and rapid dissemination (using XSLT to display
PEML files as web pages)
Two benefits arising from early implementation of
the model
Implementation allows the underlying technologies
to be tested
Making explicit what data might most usefully be
captured about proteomics experiments will speed
the model's evolution

94
The nature of proteomics experiment data

Sample generation
Origin of sample
hypothesis, organism, environment, preparation,
paper citations
Sample processing
Gels (1D/ 2D) and columns
images, gel type and ranges, band/spot
coordinates
stationary and mobile phases, flow rate,
temperature, fraction details
Mass Spectrometry
machine type, ion source, voltages
In Silico analysis
peak lists, database name and version, partial
sequence, search parameters, search hits,
accession numbers

95
The PEDRo UML schema in reduced form
96
(No Transcript)
97
The Framework Around PEDRo

Lab generated data is encoded using the PEDRo
data entry tool, producing an XML (PEML) file for
local storage, or submission
Locally stored PEML files may be viewed in a web
browser (with XSLT), allowing web pages to be
quickly generated from datasets
Upon receipt of a PEML file at the repository
site, a validation tool checks the file before
entering it into the database
The repository (a relational database) holds
submitted data, allowing various analyses to be
performed, or data to be extracted as a PEML file
or another format

98
The PEDRo Data Collator

The tool with which a user enters information
about, and data from, proteomics experiments
The tool collates these data into a single PEML
file
The hierarchical nature of the PEDRo schema (and
PEML) is reflected in the structure of the data
entry tool
Successive stages of the experimental design are
added as children of the previous stage
Enforces an audit trail for data e.g. details of
a gel cannot be entered without first describing
the sample
A simple, filterable list of all the subrecords
present and tree-style browser act as index and
contents for the PEML file being edited
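The parent-child constraint described on this slide can be illustrated with a short Python sketch that builds an XML tree and refuses out-of-order entries. The element names and the table of allowed children are hypothetical and chosen only for illustration; they are not the real PEML schema, and this is not how the PEDRo data entry tool is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of parent element to permitted children.
# These names are NOT taken from the PEML schema.
ALLOWED_CHILDREN = {
    "Experiment": {"Sample"},
    "Sample": {"Gel", "Column"},
    "Gel": {"Spot"},
    "Column": {"Fraction"},
    "Spot": {"MassSpecExperiment"},
    "Fraction": {"MassSpecExperiment"},
}


def add_stage(parent, tag, **attributes):
    """Add a stage as a child of the previous stage, enforcing the hierarchy."""
    if tag not in ALLOWED_CHILDREN.get(parent.tag, set()):
        raise ValueError(f"{tag} cannot be added under {parent.tag}")
    return ET.SubElement(parent, tag, attributes)


experiment = ET.Element("Experiment", {"hypothesis": "example"})
sample = add_stage(experiment, "Sample", organism="S. cerevisiae")
gel = add_stage(sample, "Gel", type="2D")
xml_text = ET.tostring(experiment, encoding="unicode")
print(xml_text)

# A gel cannot be described before its sample exists.
try:
    add_stage(experiment, "Gel", type="2D")
except ValueError as error:
    print("rejected:", error)
```

Because every record hangs off the record that produced it, the resulting file carries its own audit trail from sample to result.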

99
(No Transcript)
100
Conclusions

The PEDRo model does require a substantial amount
of data
Much of this information will be available in the
lab of origin
Some data will be common to many experiments, and
therefore need only be entered once, then saved
as a template in PEDRoDC
But there are several advantages to adopting such
a model
All datasets will contain information sufficient
to quickly establish the provenance and relevance
(to the researcher) of a dataset
Datasets will be detailed enough to allow
nonstandard searches, for example, by sample
extraction technique
Tools can be developed that allow easy access to
large numbers of such datasets, from a wide range
of proteomics sites
Integration with other resources such as the
major sequence databases, will provide
sophisticated search and analysis capability
Information exchange between researchers will be
facilitated through the use of a common language
(PEML), and the ability to rapidly display
PEML-encoded data as a web page
