Title:
1Proteomics Bioinformatics
MBI, Master's Degree Program in Helsinki, Finland
Lecture 2
8 May, 2007
Sophia Kossida, BRF, Academy of Athens, Greece
Esa Pitkänen, University of Helsinki, Finland
Juho Rousu, University of Helsinki, Finland
2Gel Image Analysis
staining
Image acquisition
Image analysis /quantification
Image analysis extracts measurable data from the
2DE image and stores it in a database. It involves
detecting spots and warping separate images so that
spots of the same protein align. Spot data comes
from the level of spot darkness, which is
proportional to the level of protein staining or
of dye labeling of particular amino acids.
Reiner Westermeier, GE Healthcare LifeSciences,
Munich, Germany
3Image analysis software
Some commercially available software packages:
- ImageMaster 2D / Melanie
- PDQuest (Bio-Rad, USA)
- Progenesis (Nonlinear, UK)
- Delta2D (Decodon, Germany)
4Melanie
http://www.2d-gel-analysis.com/
5Melanie
http://au.expasy.org/melanie/
6PDQuest
http://www.bio-rad.com/
7Progenesis
http://www.nonlinear.com/products/progenesis/
8Delta 2D
http://www.decodon.com/Solutions/Delta2D/
9Staining
Detection
- Pre-labeling: radioisotopes, stable isotopes, fluorescence
- Intermediate labeling: fluorescence during equilibration of the IPG strip
- Staining of gel background: imidazole-zinc staining
- Staining of proteins: organic dyes, silver, fluorescent dyes
- Blotting: immuno / affinity detection
10Scanning
Avoiding background, artifacts and noise is
essential (insufficient destaining, contamination
by fingerprints, fluorescent sprinkles, bubbles,
gel breaks, gel pieces).
- Use grayscale (complete range) instead of color images.
- Scan all gels in the same orientation, placing each gel at the same position on the scanner plate.
- Avoid scanning too much of the area around the gel.
- Limit post-processing to cropping, mirroring and rotation by 90, 180 or 270 degrees.
- Avoid producing TIFF files if you can process calibrated image file formats such as .IMG/INF and .GEL. TIFF files are usually produced without grayscale calibration, which means you lose precision or the grayscales are distorted nonlinearly, making quantitation questionable.
- Avoid using JPEG files for quantitative analysis.
11Image analysis
- Image manipulation / normalization: separation of overlapping spots, removal of lines and speckles
- Spot detection / quantification: background subtraction, spot segmentation, landmarking, spot matching
- Gel comparison: matching of gels (e.g. normal, diseased, treated), alignment
- Data analysis: changes in expression
- Data representation: annotation of spots, linking of data (spots, intensity, MS data)
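The background subtraction and normalization steps named on this slide can be illustrated with a minimal sketch. It assumes a constant background value and normalizes each spot to its percentage of the total spot volume on the gel; the spot names, volumes and the function `normalize_spots` are hypothetical, and real packages estimate the background locally and offer other normalization schemes.

```python
def normalize_spots(raw_volumes, background):
    """Subtract a constant background, then express each spot as % of total volume."""
    # Clamp at zero so a spot weaker than the background cannot go negative.
    corrected = {spot: max(v - background, 0.0) for spot, v in raw_volumes.items()}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal left after background subtraction")
    return {spot: 100.0 * v / total for spot, v in corrected.items()}


# Hypothetical raw spot volumes (arbitrary units) from one gel.
gel = {"spot_1": 1200.0, "spot_2": 700.0, "spot_3": 300.0}
percent_volume = normalize_spots(gel, background=100.0)
for spot, value in percent_volume.items():
    print(f"{spot}: {value:.2f} % of total volume")
```

Relative volumes of this kind make gels with different total loading or staining intensity comparable, which is the purpose of the normalization step.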
12Organizing experiments
Organizing the experiment: creating projects,
folders and subfolders. Importing gel images.
Melanie/ImageMaster 2D Platinum 6.0
13Import gels
Toolbox to easily manipulate gels
Melanie/ImageMaster 2D Platinum 6.0
14Viewing and manipulating images
Adjusting contrast
Intensity variations in x- and y-direction
3D-view
Automatically subtracted background
Melanie/ImageMaster 2D Platinum 6.0
15Spot detection
- Adjust the separation between spots
- Split overlapping spots
- Eliminate artifacts/noise
- Stain saturation
- Incomplete resolution
Melanie/ImageMaster 2D Platinum 6.0
16Spots report
A spot report summarizes the information about
the selected spots
Melanie/ImageMaster 2D Platinum 6.0
17Detection/matching
Spot detection
Spot matching
Normalization of spot intensities
PTM?
Downregulation?
Modified from: mouse cardiac, 250 µg loading, pH
3-10 IEF strips, 12.5% SDS-PAGE, file ID sc5bcon
vs. sc15iso
18Matching
Reference gel
Combining 2D gel images -creating a master gel, a
typical profile.
Melanie/ImageMaster 2D Platinum 6.0
19Master gels
- Combine several images, creating the master image
- all the spots on a single image
- even those that will never be expressed at the same time
- a summary of groups of replicate gels (average gel)
Delta 2D
Any point on a gel can be labeled, and
automatically transferred from one gel to another.
20Gel image warping
Variations in migration, protein separation,
stain artifacts and stain saturation complicate
gel matching and quantitation.
Warping compensates for running differences between
gels. After warping, corresponding spots have
the same position on every image.
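As a rough illustration of warping, the sketch below fits a separate scale and offset for the x and y axes from matched landmark spots, then maps any point of a sample gel onto the reference gel. This is a deliberately simplified linear model with hypothetical coordinates and function names; gel analysis software typically uses more flexible, nonlinear warps.

```python
def fit_axis(src, dst):
    """Least-squares fit of dst ~ scale * src + offset for one axis."""
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    scale = cov / var
    return scale, mean_d - scale * mean_s


def fit_warp(landmarks_src, landmarks_ref):
    """Return a function mapping sample-gel coordinates to reference-gel coordinates."""
    sx, ox = fit_axis([p[0] for p in landmarks_src], [p[0] for p in landmarks_ref])
    sy, oy = fit_axis([p[1] for p in landmarks_src], [p[1] for p in landmarks_ref])

    def warp(point):
        x, y = point
        return (sx * x + ox, sy * y + oy)

    return warp


# Hypothetical landmark spots matched between a sample gel and the reference gel.
sample_landmarks = [(10.0, 20.0), (50.0, 80.0), (90.0, 40.0)]
reference_landmarks = [(12.0, 25.0), (56.0, 91.0), (100.0, 47.0)]

warp = fit_warp(sample_landmarks, reference_landmarks)
warped = warp((30.0, 60.0))
print(warped)
```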
21Expression
Comparison of individual experimental gels to
master gels. Identification of variant spots
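One simple way to flag variant spots is a fold-change filter on matched, normalized spot intensities. The sketch below assumes a two-fold cutoff and hypothetical spot identifiers and values; the cutoff and the function `variant_spots` are illustrative choices, and a real analysis would add replicate gels and statistical testing.

```python
def variant_spots(master, experiment, min_fold=2.0):
    """Return {spot: fold change} for matched spots changed at least min_fold up or down."""
    variants = {}
    for spot, reference in master.items():
        if spot not in experiment or reference <= 0 or experiment[spot] <= 0:
            continue  # unmatched or absent spots need separate handling
        fold = experiment[spot] / reference
        if fold >= min_fold or fold <= 1.0 / min_fold:
            variants[spot] = fold
    return variants


# Hypothetical normalized intensities for spots matched to the master gel.
master_gel = {"A": 10.0, "B": 5.0, "C": 8.0}
treated_gel = {"A": 25.0, "B": 5.5, "C": 2.0}

result = variant_spots(master_gel, treated_gel)
print(result)
```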
22Miscellaneous
- Automatic retrieval of web information. Send out a Scout to the web and bring back corresponding data like pI, MW, sequence, function.
- Create a PowerPoint slide from a gel image.
Delta 2D
232D Gel Databases
- Swiss-2DPAGE: www.expasy.ch
- GelBank: http://www.gelscape.ualberta.ca:8080/htm/gdbIndex.html
- Cornea 2D-PAGE: http://www.cornea-proteomics.com/
- World-2DPAGE, index of 2D gel databases: http://ca.expasy.org/ch2d/2d-index.html
24Swiss 2D PAGE viewer
25Gel bank
26cornea
27World-2DPAGE
http://ca.expasy.org/ch2d/2d-index.html
28Make 2D database
A software package to create, convert, publish,
interconnect and keep up to date 2DE databases.
Provided by ExPASy.
- The database is queryable via description, accession or spot clicking.
- Cross-references are provided to other federated 2D-PAGE database entries, Medline and SWISS-PROT.
- Entries are linked to images showing the experimentally determined and theoretical protein locations.
- Search via clickable images and keywords.
Data can be marked as public, or as fully or
partially private. A highly secured administration
Web interface makes external data integration,
data export, data privacy control, database
publication and version control easy to perform.
It runs on most UNIX-based operating systems
(Linux, Solaris/SunOS, IRIX). Being continuously
developed, the tool is evolving in concert with
the current Proteomics Standards Initiative of
the Human Proteome Organization (HUPO).
29Federated databases
A collection of databases that are treated as one
entity and viewed through a single user interface
(pc.mag.com)
- Robustness
- Consistency
- Maintenance of the database
- Data quality
Limitations of current databases:
- Do not contain strict/detailed descriptions of protocol (buffers, sample volume, staining techniques: all important information for gel comparisons).
- Designed as 2D (and not proteomics) databases and therefore not readily expandable to incorporate other proteomics data, e.g. MS, MDLC.
- Designed for reference gels, not ongoing projects.
30Guidelines for building a federated 2-DE database
http://ca.expasy.org/ch2d/fed-rules.html
- Individual entries in the database must be accessible by a keyword search. Other methods are possible but not required.
- The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must at least be linked to the main index.
- A main index has to be supplied that provides a means of querying all databases through one unique query point.
- Individual protein entries must be available through clickable images.
- 2DE analysis software designed for use with federated databases must be able to access individual entries in any federated 2DE database.
(For a complete reference, see Appel et al.,
Electrophoresis 17, 1996, 540-546.)
31SWISS 2D PAGE
http://au.expasy.org/ch2d/
32Swiss 2D PAGE viewer
Which gel you want to look at
33Swiss 2D PAGE
34Swiss-2D PAGE
35Estimated position
Estimated position in human liver sample
36Vimentin_human (P08670)
37Peptide Mass Fingerprinting
A protein identification technique that
correlates experimental data with theoretical
data.
Experimental path: protein -> proteolytic digestion -> experimental MS
Theoretical path: protein sequence from database -> in silico digestion -> theoretical MS
A computer search compares the experimental and theoretical MS data.
38Peptide Mass Fingerprinting
- Protein digestion with protease (trypsin)
- Determination of the mass by MS
  - Calibration
- Database searching
  - Generation of the peptide map
- Comparison with theoretical peptide maps of known proteins
  - In silico digestion
- Identification of the protein on a probabilistic basis
  - percent coverage, similarity etc.
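The comparison step can be sketched as counting, for each database protein, how many experimental peaks fall within a mass tolerance of one of its theoretical peptide masses. The peak values, protein names, tolerance and function names below are hypothetical, and plain match counting is a stand-in for the probabilistic scoring used by real search engines.

```python
def count_matches(experimental, theoretical, tolerance=0.2):
    """Count experimental masses within +/- tolerance (Da) of any theoretical mass."""
    return sum(
        any(abs(exp - theo) <= tolerance for theo in theoretical)
        for exp in experimental
    )


def rank_candidates(experimental, database, tolerance=0.2):
    """Rank database proteins by number of matched peaks, best first."""
    scored = [
        (count_matches(experimental, masses, tolerance), name)
        for name, masses in database.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical experimental peaks and theoretical peptide maps.
peaks = [1051.54, 1244.64, 1476.67, 1613.88]
database = {
    "protein_X": [1051.50, 1244.70, 1476.60, 2000.10],
    "protein_Y": [900.40, 1613.90, 1750.00],
}

ranking = rank_candidates(peaks, database)
print(ranking)
```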
39Protein digestion with protease (trypsin)
The molecule is cleaved at all possible sites,
producing a set of peptides of varying masses
that is characteristic of that protein. The mass
of each peptide is the sum of the masses of the
amino acid residues present, including any
modifications those amino acids have undergone.
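The mass calculation can be sketched as follows, using standard monoisotopic residue masses and adding one water molecule for the free N- and C-termini. The example peptide SAMPLER, the oxidation value and the function name are illustrative assumptions and are not taken from the lecture.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O, added once per peptide for the two termini


def peptide_mass(sequence, modification_mass=0.0):
    """Monoisotopic mass of a peptide, plus the total mass shift of any modifications."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + modification_mass


unmodified = peptide_mass("SAMPLER")
# Assumed example modification: oxidation of methionine (+15.99491 Da).
oxidized = peptide_mass("SAMPLER", modification_mass=15.99491)
print(f"{unmodified:.4f} {oxidized:.4f}")
```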
Trypsin
Cleaves after lysine and arginine, unless the
next residue in the C-terminal direction is proline.
From a tutorial written by Dr J. R. Jefferies,
Parasitology Group, Institute of Biological
Sciences, University of Wales, UK
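The trypsin rule stated on this slide translates directly into a short in silico digestion routine. The sketch below assumes complete digestion (no missed cleavages), and the input sequence is a made-up example.

```python
def trypsin_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


# Hypothetical sequence: the R is followed by P, so that site is not cleaved.
example = "MKWVTFRPSLLKAG"
peptides = trypsin_digest(example)
print(peptides)
```

Search tools usually also allow one or two missed cleavages, which this sketch leaves out for brevity.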
40Determination of mass
MALDI - MS is used to measure the masses of the
proteolytic peptide fragments.
Every peak corresponds to the exact mass (m/z) of
a peptide ion
Peak list:
1051.54 1086.52 1094.56 1111.59 1244.64 1421.7
1476.67 1542.84 1613.88 1664.97 1763.79 1777.82
Select monoisotopic peaks, MH+, i.e. singly
charged.
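Because the selected peaks are singly charged protonated ions, the neutral peptide mass is obtained by subtracting the mass of one proton from each m/z value. The sketch below applies this to the peak list from the slide; the general formula for a charge state z is included as an assumption-free extension, and the function name is illustrative.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge=1):
    """Neutral monoisotopic mass M from the m/z of an [M + zH]z+ ion."""
    return charge * (mz - PROTON)


peak_list = [1051.54, 1086.52, 1094.56, 1111.59, 1244.64, 1421.7,
             1476.67, 1542.84, 1613.88, 1664.97, 1763.79, 1777.82]
neutral_masses = [round(neutral_mass(mz), 2) for mz in peak_list]
print(neutral_masses)
```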
41Isotopes
Isotopes are different forms of an element, each
having a different atomic mass. Their nuclei have
the same number of protons (same atomic number)
but different numbers of neutrons.
Naturally occurring isotopes
Isotope (A)  Mass           Abundance (%)  Isotope (A+1)  Mass           Abundance (%)  Isotope (A+2)  Mass
12C          12             98.93          13C            13.0033548378  1.07           14C            14.003241988
1H           1.0078250321   99.9885        2H             2.0141017780   0.0115         3H             3.0160492675
14N          14.0030740052  99.632         15N            15.0001088984  0.368
16O          15.9949146221  99.757         17O            16.99913150    0.038          18O            17.9991604
Modified from http://www.ionsource.com/Card/Mass/mass.htm
42Monoisotopic- /Average mass
The monoisotopic mass is the mass of the isotopic
peak whose elemental composition is composed of
the most abundant isotopes of those elements.
A simulated isotopic distribution of the MH+
ion of a compound (polyalanine)
Monoisotopic mass is expressed in atomic mass
units (amu), or in daltons (Da).
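The difference between monoisotopic and average mass can be made concrete with the isotope masses and abundances from the table of naturally occurring isotopes. The sketch below computes both for one alanine residue (C3H5NO), chosen because the slide shows polyalanine. The 18O abundance of 0.205 % is taken from standard isotope tables rather than from the lecture, and the function names are illustrative.

```python
# (mass in Da, natural abundance in %) for each stable isotope.
ISOTOPES = {
    "C": [(12.0, 98.93), (13.0033548378, 1.07)],
    "H": [(1.0078250321, 99.9885), (2.0141017780, 0.0115)],
    "N": [(14.0030740052, 99.632), (15.0001088984, 0.368)],
    # 18O abundance (0.205 %) is an added value from standard tables.
    "O": [(15.9949146221, 99.757), (16.99913150, 0.038), (17.9991604, 0.205)],
}


def monoisotopic_mass(formula):
    """Sum the masses of the most abundant isotope of each element."""
    return sum(
        count * max(ISOTOPES[element], key=lambda iso: iso[1])[0]
        for element, count in formula.items()
    )


def average_mass(formula):
    """Sum the abundance-weighted mean mass of each element."""
    return sum(
        count * sum(mass * abundance for mass, abundance in ISOTOPES[element]) / 100.0
        for element, count in formula.items()
    )


alanine_residue = {"C": 3, "H": 5, "N": 1, "O": 1}
mono = monoisotopic_mass(alanine_residue)
avg = average_mass(alanine_residue)
print(f"monoisotopic {mono:.4f} Da, average {avg:.4f} Da")
```

The average mass is always slightly higher here because the heavier isotopes contribute in proportion to their abundance.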
43Accuracy
The higher the accuracy, the better and more
specific the protein hit.
- Accurate measurements of peptide masses
- Accurate databases
Relies on the ability to search data already
present in various databases.
44Effect of Mass Accuracy and Mass Tolerance
search m/z   mass tolerance (Da)   hits
1529         1                     478
1529.7       0.1                   164
1529.73      0.01                  25
1529.734     0.001                 4
1529.7348    0.0001                2
Tryptic digestion of human hemoglobin alpha chain
yields 14 tryptic peptides, of which the peptide
VGAHAGEYGAEALER has a monoisotopic mass of
1528.7348 Da. The singly charged ion of this
peptide has an m/z value of 1529.7348. The table
shows the result of searching the SWISS-PROT
database against all human and mouse proteins.
Liebler, Introduction to Proteomics
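The effect of the tolerance window can be reproduced on a small scale: the tighter the window around the search m/z, the fewer candidate peptides remain. The candidate m/z values below are invented for illustration and are not real database entries; only the search value 1529.7348 comes from the slide.

```python
def hits_within(query_mz, candidate_mzs, tolerance):
    """Return the candidates whose m/z lies within +/- tolerance of the query."""
    return [mz for mz in candidate_mzs if abs(mz - query_mz) <= tolerance]


# Hypothetical theoretical peptide m/z values near the search value.
candidates = [1528.9, 1529.2, 1529.70, 1529.73,
              1529.7341, 1529.73485, 1529.76, 1530.1]
query = 1529.7348

counts = {
    tol: len(hits_within(query, candidates, tol))
    for tol in (1, 0.1, 0.01, 0.001, 0.0001)
}
for tol, n in counts.items():
    print(f"tolerance {tol} Da: {n} hits")
```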
45Database search
Peptide mass fingerprinting provides evidence for
the most probable identity of a protein.
First check whether the genome of the organism
you are working on is available. If not, the next
best situation is that good cDNA data is
available. If neither is the case, it is worth
checking whether there are any expressed sequence
tags (ESTs) that can be used.
The quality of the protein identification depends
upon:
- The quality of the mass spectrometry data
- The accuracy of the database
- The power of the search algorithms and software used
46Tools for fingerprinting
- Mascot (Matrix Science)
- Aldente (ExPASy)
- ProFound
- MS-Fit (Prospector, UCSF)
Several of the available peptide mass
fingerprinting programs use more sophisticated
scoring algorithms. These correct for scoring bias
due to protein size, in which larger proteins give
rise to a greater number of peptides, and for the
tendency of smaller peptides in databases to have
a greater number of matches with search m/z
values. Some of these algorithms also apply
probability-based statistics to better define the
significance of protein identifications.
47Mascot PMF
48Mascot PMF score
> 5% probability that the match is a random
event: of no significance.
The significance of the result depends on the
size of the database being searched.
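The dependence on database size can be sketched numerically. The example assumes the convention described in Matrix Science's public documentation, where the reported score is -10 log10(P) and P is the probability that the match is a random event; the threshold formula below, which divides the 5% significance level by the number of entries searched, is a simplified illustration and not Mascot's exact implementation.

```python
import math


def score_from_probability(p):
    """Convert a random-match probability into a -10*log10(P) score."""
    return -10.0 * math.log10(p)


def significance_threshold(database_entries, alpha=0.05):
    """Score needed so that a random match is expected with frequency below alpha."""
    return -10.0 * math.log10(alpha / database_entries)


# Hypothetical database sizes: the larger the database, the higher the threshold.
for entries in (10_000, 100_000, 1_500_000):
    print(entries, round(significance_threshold(entries), 1))
```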
49Mascot PMF results
Coverage:
% of protein length covered by the experimental
peptides
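Sequence coverage can be computed by marking every residue of the protein that lies inside at least one matched peptide. The protein sequence and matched peptides in this sketch are hypothetical, as is the function name.

```python
def sequence_coverage(protein, matched_peptides):
    """Percentage of protein residues covered by at least one matched peptide."""
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 17-residue protein with two matched tryptic peptides.
protein = "MKWVTFRPSLLKAGDEK"
coverage = sequence_coverage(protein, ["MK", "AGDEK"])
print(f"{coverage:.1f} % coverage")
```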
50Mascot protein view
51ALDENTE
Aldente is a tool to identify proteins from
peptide mass fingerprinting data
(http://www.expasy.org/tools/aldente/)
52Aldente, protein window
53Aldente, peptide view
54Aldente, results
S: final score for this entry, computed from S1 and S2
S1: sum of each peptide score
S2: protein-level score
The scoring is tunable: the weight of each
parameter in the score can be defined
independently.
55newt
56Swiss prot entry
57Aldente results
58Profound
http://prowl.rockefeller.edu/prowl-cgi/profound.exe
59Profound results
60Graphic display of results
61MS-FIT
University of California, San Francisco (UCSF) Mass
Spectrometry Facility
http://prospector.ucsf.edu/prospector/4.0.8/html/msfit.htm
62MS Fit
63Ms fit detailed report