Computational Methods and Bioinformatics in Proteomic Studies

1
Computational Methods and Bioinformatics in Proteomic Studies
Bioinformatics Building Bridges, April 14, 2005
Tim Griffin
Dept. Biochemistry, Molecular Biology and Biophysics
tgriffin_at_umn.edu
2
Interdisciplinary biology in the 21st century
3
Genome-era biology: system-wide studies
The yeast genome on a chip
DeRisi et al., 1997, Science 278:680
4
The simple view: one gene, one protein
5
The reality: biological systems are complex
Protein interaction network in Drosophila, Science (2003) 302, p. 1727
6
Why analyze at the protein level? Control of eukaryotic gene expression
[Figure: control of eukaryotic gene expression across nucleus and cytosol. Species shown: DNA, primary RNA transcript, mRNA, inactive mRNA, protein, active protein, inactive protein. Control points: transcriptional control, RNA processing control, RNA transport control, translational control, protein activity control.]
7
What is proteomics?
  • Proteomics includes not only the identification
    and quantification of proteins, but also the
    determination of their localization,
    modifications, interactions, activities, and,
    ultimately, their function.
  • -Stan Fields in Science, 2001.
  • Alternatively: proteomics = fast biochemistry

8
Proteomics: a complement to genomics
What proteomic analysis has to offer
  • measurement of protein response, which is not
    always indicated by mRNA response
  • post-translational modifications
  • macromolecular interactions
  • sub-cellular location
  • high-resolution structural and molecular
    characterization

9
Genomics, Proteomics, and Systems Biology
10
Proteomics technologies and methods
  • Two-dimensional gel electrophoresis
  • mass spectrometry
  • protein chips
  • yeast 2-hybrid
  • phage display
  • antibody engineering
  • high-throughput protein expression
  • high-throughput X-ray crystallography

11
The 1990s revolution: mass spectrometry
Development of physical methods to mass analyze large
biomolecules
separation by m/z
  • quadrupole
  • ion trap
  • time-of-flight
  • MALDI
  • Electrospray
  • liquid chromatography
  • nanospray
  • mass analysis of proteins, peptides, DNA

12
Electrospray ionization (ESI)
200 µm
  • protein and peptide analysis, multiply charged
    ions
  • quadrupole and TOF detection
  • tandem mass spectrometry
  • solution phase ionization enables online
    coupling with liquid chromatography (LC)

13
Separations of complex mixtures: crowd control
  • Enables the processing of the many components
    in big protein mixtures

[Figure: turnstile analogy, components pass one at a time (1, 2, 3, ...)]
14
Identification of protein mixtures by tandem mass spectrometry
Protein mixture → trypsin → peptides → µLC → ESI
1. MS survey scan
2. select specific peptide
3. CID (Ar)
4. detect fragments
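The trypsin step above can be mimicked in silico. This is a minimal sketch, assuming the common cleavage rule (cut after K or R, but not before P) and no missed cleavages. The function name and the toy protein sequence are hypothetical and do not come from the presentation.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence after K or R, except when followed by P.

    Simplifying assumption: complete digestion, no missed cleavages.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if p]


# Hypothetical toy sequence that embeds the peptide shown on the later slides.
toy_protein = "MKNSGDIVNLGSIAGRAPK"
peptides = tryptic_digest(toy_protein)
```

Real digests also produce missed cleavages and non-specific fragments, which search tools typically allow for through configurable settings.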
15
Peptide sequence determination from MS/MS spectra
Collision-induced dissociation (CID) creates two prominent ion series
y-series: y14 to y1
H2N-N--S--G--D--I--V--N--L--G--S--I--A--G--R-COOH
b-series: b1 to b14
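The two ion series can be computed directly from the peptide sequence. The sketch below assumes singly charged fragments and standard monoisotopic residue masses. The function and constant names are illustrative and do not come from the presentation.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier for a 1+ ion


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged (b, y) ion m/z lists, b1..bn and y1..yn."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus a proton (N-terminal fragment).
    b = [sum(masses[:i]) + PROTON for i in range(1, n + 1)]
    # y_i: last i residues plus water plus a proton (C-terminal fragment).
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n + 1)]
    return b, y


b_ions, y_ions = fragment_ions("NSGDIVNLGSIAGR")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i:<2} {b:9.3f}   y{i:<2} {y:9.3f}")
```

Because I and L have the same mass, this kind of calculation cannot distinguish them. That is one reason sequence assignment usually relies on a database.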
16
Identification of protein mixtures by mass
spectrometry
  • De novo (i.e. manually)
  • Database searching

observed spectrum vs. theoretical spectrum (DNA or protein database)
→ peptide identification → protein identification
17
Peptide sequence identifies the protein
[Figure: MS/MS spectrum of H2N-NSGDIVNLGSIAGR-COOH, relative abundance vs. m/z (200 to 1200), with peaks labeled by the fragment ladder GDIVNLGSIAGR, DIVNLGSIAGR, IVNLGSIAGR, VNLGSIAGR, NLGSIAGR, LGSIAGR, GSIAGR, SIAGR, IAGR, AGR, GR, R]
YMR134W, yeast protein involved in iron metabolism
18
High-throughput protein identification by
LC-MS/MS and automated sequence database searching
Direct identification of 1000 proteins from
complex mixtures
Raw MS/MS spectrum → protein sequence and/or DNA sequence database
search → peptide sequence match → protein identification
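The last step of this workflow, going from a matched peptide sequence to a protein, can be sketched as a simple lookup. The database entries below are invented toy sequences, and the function name is hypothetical. Real search tools work on full sequence databases and score spectra rather than exact strings.

```python
def proteins_containing(peptide: str, database: dict[str, str]) -> list[str]:
    """Return the names of all database proteins whose sequence contains the peptide."""
    return [name for name, sequence in database.items() if peptide in sequence]


# Hypothetical toy database: names and sequences are made up for illustration.
toy_database = {
    "toy_protein_A": "MKTAYNSGDIVNLGSIAGRLLK",
    "toy_protein_B": "MSEQNNTEMTFQIQR",
}

hits = proteins_containing("NSGDIVNLGSIAGR", toy_database)
print(hits)
```

A peptide that occurs in several database entries cannot by itself identify a single protein, which is one reason database redundancy matters.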
19
Case Study: Proteomic Analysis of Oral Cancer
Progression
  • Mouth cancer, tongue cancer, throat cancer
  • In the USA, 30,000 people are newly diagnosed
    with oral cancer each year; a person dies from
    oral cancer every hour of every day
  • 350,000 to 400,000 new cases annually worldwide
  • Less than half will be alive in 5 years; 20x
    higher risk of developing second primary tumors
  • However, 80 to 90% cure rate when found early.
    Unfortunately, at this time, the majority are
    found as late-stage cancers

20
Progression of oral cancer
normal → insult or injury → dysplasia → carcinoma
Malignancy transformation rate: 5-17%
Can we find molecular markers that predict
this transition?
(adapted from Dr. Nelson Rhodus, U of M Dental
School)
21
Saliva as a diagnostic fluid in oral cancer
progression
  • Readily available, non-invasive collection
  • Heterogeneous human fluid with large dynamic
    range of protein abundances; requires
    fractionation
  • Many post-translationally modified proteins
  • Currently only 100-150 proteins have been
    identified in whole saliva (LC-MS/MS)

First step: obtain a comprehensive profile of the
protein components from a normal individual
saliva sample
22
Multidimensional separations followed by mass
spectrometry
Whole saliva protein mixture
→ FFE fractionation (70 fractions)
→ RP-capLC
→ ESI-MS/MS (500,000 spectra)
→ Protein sequence and/or DNA sequence database search
→ Protein identification
23
Raw data processing: automated database searching
Computational algorithms for searching
MS/MS spectra against protein sequence databases,
mRNA sequences, DNA sequences
  • ProFound
  • Mascot
  • PepSea
  • MS-Fit
  • MOWSE
  • Peptident
  • Multident
  • Sequest
  • PepFrag
  • MS-Tag

Protein identification
24
Choosing a sequence database
  • National Center for Biotechnology Information
    (NCBI)
  • Swiss-Prot/TrEMBL
  • Protein Information Resource (PIR)
  • European Bioinformatics Institute (EBI)

Considerations: organism-specificity, redundancy,
annotation
25
Analysis of processed data: quality control of
protein matches
Unfiltered: 10^5 matches (lots of noise and
junk)
→ filtering →
Filtered: thousands of true matches
26
Probability of sequence match via statistical
modeling
Keller et al (2002) Analytical Chemistry 74, 5383
Sequence matches automatically assigned a P score
between 0 and 1
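The idea of turning a search score into a probability between 0 and 1 can be sketched with Bayes' rule over two score distributions, one for correct matches and one for incorrect matches. This is a simplified illustration and does not reproduce the cited model. It assumes both distributions are Gaussian with hand-picked parameters rather than fitted to data, and every name and number below is hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def match_probability(
    score: float,
    prior_correct: float = 0.5,
    correct: tuple[float, float] = (2.0, 1.0),     # assumed (mean, sd)
    incorrect: tuple[float, float] = (-2.0, 1.0),  # assumed (mean, sd)
) -> float:
    """Posterior probability that a match with this score is correct."""
    p_pos = prior_correct * normal_pdf(score, *correct)
    p_neg = (1.0 - prior_correct) * normal_pdf(score, *incorrect)
    return p_pos / (p_pos + p_neg)


for s in (-3.0, 0.0, 3.0):
    print(f"score {s:+.1f} -> P = {match_probability(s):.4f}")
```

Matches can then be filtered by keeping only those whose P exceeds a chosen cutoff, trading sensitivity against the expected error rate.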
27
Collating and interpreting the data: Interact
software tool
http://www.systemsbiology.org/Default.aspx?pagename=proteomicssoftware
28
Result: Processed and Filtered Data
Saliva example: 433 unique proteins identified
29
Interpreting the data: annotated protein databases
  • National Center for Biotechnology Information (NCBI)
  • ExPASy/Uniprot
  • European Bioinformatics Institute (EBI)
Organism/biology specific
  • Saccharomyces Genome Database (SGD)
  • Human Mitochondrial Protein Database
  • Human Proteome Organization (HUPO)
30
Mining databases for data interpretation: Example 1
31
Mining databases for data interpretation: Example 1
32
Mining databases for data interpretation: Example 2
33
Mining databases for data interpretation: Example 2
34
Classification of interpreted data: subcellular
localization
35
Classification of interpreted data: functional
characterization
36
What about quantitative measurements?
normal → insult or injury → dysplasia → carcinoma
Malignancy transformation rate: 5-17%
Can we find molecular markers that predict
this transition?
(adapted from Dr. Nelson Rhodus, U of M Dental
School)
37
Stable-isotope labeling of proteins for
quantitative profiling
20 vs. 37
  • Light (L) and heavy (H) labels are chemically
    identical, but isotopically different due to
    incorporation of stable isotopes (e.g. 2H, 15N, 13C)
  • Chemically identical but isotopically different
    peptides ionize with the same efficiency and act
    as mutual internal standards
38
Quantitative analysis of mRNA data
DeRisi et al., 1997, Science 278:680
39
Automated Quantitative Proteomics
mixture 1 (light) + mixture 2 (heavy) → combine and proteolyze → multi-dimensional separation → mass analysis
[Figure: mass spectrum (m/z 550 to 580) with a light/heavy peak pair used to quantify, and an MS/MS spectrum (m/z 0 to 800) used to identify the peptide NH2-EACDPLR-COOH]
40
Quantitative analysis
Relative intensity = relative protein abundance
[Figure: peak intensities for Sample 1 and Sample 2]
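Since light and heavy peptide pairs act as mutual internal standards, a relative abundance can be estimated from their intensity ratio. The sketch below assumes each protein has several (light, heavy) peptide intensity pairs and summarizes them with the median log ratio. The function name and the example numbers are hypothetical.

```python
import math
from statistics import median


def abundance_ratio(peptide_pairs: list[tuple[float, float]]) -> float:
    """Estimate heavy/light protein abundance from (light, heavy) peak intensities.

    The median of log2 ratios is used so that one outlier peptide
    does not dominate the protein-level estimate.
    """
    log_ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_pairs
        if light > 0 and heavy > 0
    ]
    if not log_ratios:
        raise ValueError("no usable light/heavy pairs")
    return 2.0 ** median(log_ratios)


# Hypothetical intensities for three peptides of one protein.
pairs = [(100.0, 200.0), (50.0, 100.0), (10.0, 40.0)]
print(abundance_ratio(pairs))
```

Working in log space treats a 2-fold increase and a 2-fold decrease symmetrically, which a plain mean of ratios would not.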
41
Disease proteomics: androgen-induced effects in
prostate cancer
42
Dealing with the data
Data acquisition
Raw data processing (Database searching)
Analysis of processed data (Statistical
filtering, quantitative analysis)
Data organization and interpretation
Archiving and databasing
Modeling (Computational Biology)
43
Need for better data archives and repositories
http://proteomics.jhu.edu/dl/pathidb.php
44
Archiving challenges: different data formats
http://sashimi.sourceforge.net/software_glossolalia.html
45
Computational Biology: integrating proteomics
and genomics data
[Figure: control of eukaryotic gene expression across nucleus and cytosol. Species shown: DNA, primary RNA transcript, mRNA, inactive mRNA, protein, active protein, inactive protein. Control points: transcriptional control, RNA processing control, RNA transport control, translational control, protein activity control.]
46
Integrating proteomics and genomics data:
elucidating gene expression regulatory networks
Griffin TJ et al. (2002) Mol Cell Proteomics 1, 323
47
Post-transcriptionally regulated proteins?
48
Computational biology: integrating information to
assign function
Cytoscape: http://www.cytoscape.org/
49
Modeling cellular circuitry based on genomic and
proteomic data
50
Is the virtual human on the horizon???
51
Acknowledgements
Griffin Laboratory: Mikel Roe, Sri Bandhakavi, Hongwei Xie, Clive Nyauncho
U of M Dental School: Dr. Nelson Rhodus
MSI: Patton Fast
University of Minnesota
Funding: Minnesota Medical Foundation, NIH