INTERNATIONAL COLLABORATION IN PROTEOMICS AND INFORMATICS - PowerPoint PPT Presentation

1
INTERNATIONAL COLLABORATION IN PROTEOMICS AND
INFORMATICS
  • Bibliotheca Alexandrina, 9 October, 2007
  • Gilbert S. Omenn, M.D., Ph.D.
  • Center for Computational Medicine and Biology
  • Chair, HUPO Plasma Proteome Project
  • University of Michigan, Ann Arbor, MI, USA

2
It Is Such A Great Pleasure to Visit The
Bibliotheca Alexandrina
  • One of the Wonders of the Modern World!
  • The First Digital Library, from its Birth
  • Facilitating International Collaboration in
    Science and Technology

3
Nearly-Complete Human Genome Sequence, 15-16 Feb
2001
4
We Live in a New World of Life Sciences
  • New Biology, New Technology: a parts list
  • Genome Expression Microarrays
  • Comparative Genomics, CNV, miRNA
  • Proteomics and Metabolomics
  • Bioinformatics and Computational Biology
  • Mechanism- and Evidence-Based Medicine
    (What were you doing up to now?!)
  • Predictive, personalized, preventive,
    participatory healthcare and community
    health services

5
Key Components of the Vision of Biology As An
Information Science
  • An avalanche of genomic information: validated
    SNPs, haplotype blocks, candidate genes/alleles,
    proteins, metabolites, all associated with
    disease risk
  • Powerful computational methods
  • Effective linkages with better environmental and
    behavioral datasets for eco-genetic analyses
  • Credible privacy and confidentiality protections
  • Breakthrough tests, vaccines, drugs, behaviors,
    and regulatory actions to reduce health risks and
    cost-effectively treat patients globally.

6
A Golden Age for the Public Health Sciences
  • Sequencing and analyzing the human genome is
    generating genetic information that must be
    linked with information about:
  • Nutrition and metabolism
  • Lifestyle behaviors
  • Diseases and medications
  • Microbial, chemical, physical exposures
  • Every discipline of the public health sciences
    is needed.

7
Definitions
  • Genetics is the scientific study of genes and
    their roles in health and disease, physiology,
    and evolution.
  • Genomics is a modern subset of the broader field
    of genetics, made feasible by remarkable advances
    in molecular biology, biotechnology, and
    computational sciences, to examine the entire
    complement of genes and their actions.
  • Global analyses permit us and require us to go
    beyond the known lamp-posts of individual gene
    associations and effects.

8
  • Proteins are the action molecules of the cell and
    the leading candidates for biomarkers in tissues
    and in the blood. Proteins are coded for by
    genes. Understanding one protein can be a
    lifetime's work!
  • Proteomics is the global analysis of proteins in
    cells or body fluids. Techniques for global
    analysis of proteins are advancing rapidly,
    especially for discovery of biomarkers for
    diagnosis, treatment, and prevention.
  • Metabolomics is the global analysis of
    metabolites.
  • Proteomics + metabolomics + epigenomics =
    functional genomics

9
Protein
DNA
10
Rationale for Proteomics
  • Proteins are much closer to the pathophysiologic
    changes and molecular targets for drugs than are
    mRNAs.
  • Changes in mRNAs are clues, but changes in
    corresponding proteins often are not highly
    correlated.
  • Advances in fractionation of complex tissue and
    plasma protein mixtures, in mass spectrometry,
    and in curated databases of proteins help address
    complexity, dynamic range, and uncertainty of
    protein identifications.

11
A Vision For Proteomics
  • Multiple protein biomarkers discovered
  • Biomarkers combined on diagnostic chips
  • Detect organ location of cancers, for surgery or
    radiation
  • Detect mechanism of disease for chemotherapy,
    even if location unknown
  • Mechanistic, rather than geographic
    classification
  • Better efficacy/less toxicity for all types of
    patients

12
Status of Proteomics Assays
  • Many technology platforms of increasing
    sensitivity and resolution
  • Patterns or specific proteins are still just
    biomarker candidates; most lack independent
    confirmation and coefficient of variation, let
    alone validation with standard clinical
    chemistry parameters of sensitivity, specificity,
    and especially positive predictive value
  • Approaches of clinical chemistry needed to guide
    further development of the field
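
The emphasis on positive predictive value can be illustrated with a short calculation. This is a minimal sketch, not part of the original slides: the function name and the example numbers (95% sensitivity, 95% specificity, 1% prevalence) are assumptions chosen only to show how quickly PPV falls for a rare disease.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive test results that are true positives."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical assay: good sensitivity and specificity, rare disease.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.95, prevalence=0.01)
print(f"PPV = {ppv:.3f}")
```

Even with both parameters at 95%, most positive results in this hypothetical screening setting would be false positives, which is why PPV is singled out.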

13
Barriers for Proteomic Cancer Biomarker Discovery
in Plasma
  • Human cancers are very heterogeneous
  • Tumor proteins are in low abundance for early
    detection of cancers
  • Tumor proteins are greatly diluted upon release
    to ECF and blood
  • Plasma is an extraordinarily complex specimen
    dominated by high abundance proteins (50% by
    weight is albumin)
  • Knowledge of the plasma proteome is still limited

14
Outline of Lecture
  • Review of the vision, strategy, and output of the
    HUPO Human Plasma Proteome Project Pilot Phase
  • Objectives for the New Phase of the Plasma
    Proteome Project
  • Example of the power of computational tools and
    collaborations (if time)

15
HUPO
  • The international Human Proteome Organization
    (HUPO) was founded in 2001. Its aims are:
  • To advance the science of proteomics
  • To enhance training in proteomics
  • To build international initiatives by organ
    (liver, brain, kidney), biofluid (plasma, urine,
    CSF, saliva), and disease (cardiovascular,
    cancers), plus antibodies and data standards.

16
Proteomics Interaction Map (Ruth McNally,
sociologist)
17
Samir Hanash, founding President of HUPO
Gil Omenn, leader of HUPO PPP
18
THE PLASMA PROTEOME
  • Advantages: the most available human specimen;
    the most comprehensive sample of tissue-derived
    proteins; the basis for a Disease Biomarkers
    Initiative tied to organ proteomes.
  • Specific Disadvantages:
  • Extreme complexity/enormous dynamic range
  • High risk of ex vivo modifications
  • Lack of highly standardized protocols
  • General Challenges: inadequate appreciation of
    incomplete sampling by MS/MS; evolving
    annotations and unstable databases

19
Long-Term Scientific Goals of the HUPO Human
Plasma Proteome Project
  • 1. Comprehensive analysis of plasma and serum
    protein constituents in people
  • 2. Identification of biological sources of
    variation within individuals over time, with
    validation of biomarkers
  • Physiological: age, sex/menstrual cycle,
    exercise
  • Pathological: selected diseases/special
    cohorts
  • Pharmacological: common medications
  • 3. Determination of the extent of variation
    across populations and within populations

20
Scheme Showing Aims and Linkages of the HUPO
Plasma Proteome Project, Pilot Phase
[Figure: scheme linking the HUPO Human Plasma Proteome Project (PPP) with: Serum vs Plasma; Technology Platforms (Separation and Identification); Reference Specimens; Development and Validation of Biomarkers; HUPO PPP Participating Labs; Technology Vendors; Liver and Brain Proteome, Antibody, and Protein Stds Projects]
Omenn GS. The Human Proteome Organization Plasma
Proteome Project Pilot Phase: Reference
Specimens, Technology Platform Comparisons, and
Standardized Data Submissions and Analyses.
Proteomics 2004; 4: 1235-1240.
21
OUTPUT FROM PPP Pilot Phase
  • Special Issue Aug 2005, Proteomics, "Exploring
    the Human Plasma Proteome": 28 papers with
    collaborative analyses and annotations, plus
    lab-specific analyses, and Wiley book (2006)
  • Publicly-accessible datasets
  • www.ebi.ac.uk/pride (EBI);
    www.peptideatlas.org/repository (ISB)
  • www.bioinformatics.med.umich.edu/hupo/ppp
  • Additional papers are encouraged
  • Nature Biotechnology 2006; 24: 333-338
    (States et al)
  • Genome Biology 2006; 7: R35 (Fermin et al)
  • Proteomics 2006; 6: 5662-5673 (Omenn)
  • Numerous citations/comparisons of datasets

22
(No Transcript)
23
SERUM AND PLASMA REFERENCE SPECIMENS
  • 1. BD: specially prepared male/female pooled
    samples, divided into EDTA-, Heparin-, and
    Citrate-anti-coagulated Plasma and Serum (250 ul
    x4 of each).
    BD clot activator. No protease inhibitors.
    Three separate ethnic pools prepared. Shipped
    frozen.
  • 2. Chinese Academy of Medical Sciences: sets of
    three plasmas and serum, similar to BD protocol.
  • 3. National Institute for Biological Standards
    and Control, UK: citrate-anti-coagulated,
    freeze-dried plasma, from 25 donors, prepared
    for Intl Soc Thrombosis and Hemostasis,
    1 ml aliquots/ampoules.

24
Specifications for Data Submission
  • Each of 55 labs agreed (July, 2003 Workshop) to
    provide, and 31 labs did provide:
  • a) a detailed experimental protocol, to push
    the limits to detect low-abundance proteins
  • b) peptide sequences, rated as high or lower
    confidence, based on MS/MS criteria
  • c) protein IDs from IPI 2.21 (July 2003) and
    search engine parameters used to align peptide
    sequences with proteins in human database
  • Later, we obtained m/z peak lists and raw spectra
    (by DVD) for independent analyses.

25
From Peptides to Genome Annotation
[Figure: workflow. Sample -> extraction -> Proteins -> digestion -> Peptides -> LC-MS/MS -> Mass Spectrum -> database search -> table of Spectrum / Peptide / Probability (e.g., Spectrum 1: LGEYGH, 1.0; Spectrum N: EIQKKF, 0.3) -> statistical filtering -> BLAST against protein database -> SBEAMS -> map to genome, with Peptide / Chrom / Start_Coord / End_Coord (e.g., PAp00007336, X, 132217318, 132217368) -> PeptideAtlas Database -> visualization in a Genome Browser]
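
The "statistical filtering" step in the workflow can be sketched as a probability threshold on peptide-spectrum matches. This is an illustrative sketch only: the first and last rows reuse the two examples shown on the slide, the other two rows are hypothetical, and the 0.9 cutoff and function name are assumptions rather than the project's actual settings.

```python
def filter_by_probability(matches, min_probability=0.9):
    """Keep matches at or above the cutoff and estimate their error rate.

    Each match is (spectrum_id, peptide, probability). The expected
    false-positive fraction among accepted matches is the mean of (1 - p).
    """
    accepted = [m for m in matches if m[2] >= min_probability]
    if not accepted:
        return accepted, 0.0
    expected_error = sum(1.0 - p for _, _, p in accepted) / len(accepted)
    return accepted, expected_error


matches = [
    ("spectrum_1", "LGEYGH", 1.0),   # from the slide
    ("spectrum_2", "AAAAK", 0.95),   # hypothetical
    ("spectrum_3", "CCCCR", 0.85),   # hypothetical
    ("spectrum_N", "EIQKKF", 0.3),   # from the slide
]
accepted, expected_error = filter_by_probability(matches)
print(len(accepted), round(expected_error, 3))
```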
26
Numbers of Proteins Identified (LC-MS/MS or
FTICR-MS, 18 labs)
  • From 15,519 reported distinct protein IDs in IPI
    2.21, we chose one representative per cluster:
  • (a) 9504 with 1 or more peptide matches
  • (b) 3020 with 2 or more peptide matches (Core
    Dataset)
  • (c) 1274 with 3 or more peptide matches
  • 889 after follow-up high-stringency analysis with
    adjustments for protein length and multiple
    (43,000) comparisons in IPI v2.21
  • (Nature Biotech 2006; 24: 333-338)
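
The tiers above differ only in how many distinct peptides support each protein. A minimal sketch of that counting step follows; the protein and peptide identifiers are hypothetical placeholders, and the clustering of IPI entries into one representative per cluster is not modeled here.

```python
from collections import defaultdict


def count_by_peptide_support(peptide_hits, thresholds=(1, 2, 3)):
    """Count proteins supported by at least N distinct peptides."""
    peptides_per_protein = defaultdict(set)
    for protein_id, peptide in peptide_hits:
        peptides_per_protein[protein_id].add(peptide)  # duplicates collapse
    return {
        t: sum(1 for peps in peptides_per_protein.values() if len(peps) >= t)
        for t in thresholds
    }


# Hypothetical (protein, peptide) pairs pooled across labs.
hits = [
    ("P1", "pepA"), ("P1", "pepB"), ("P1", "pepC"),
    ("P2", "pepD"), ("P2", "pepE"),
    ("P3", "pepF"), ("P3", "pepF"),  # same peptide seen twice
]
tiers = count_by_peptide_support(hits)
print(tiers)
```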

27
GREATEST RESOLUTION AND SENSITIVITY
  • The most extensive high-confidence yield was from
    combined methods of immunoaffinity (top-6)
    depletion, 2-D or 3-D high-resolution
    fractionation, and then ESI-MS/MS with ion-trap
    LTQ instrument.
  • LTQ gave several fold more IDs (1168) than did
    LCQ (271) in same hands (B1-serum vs B1-heparin)
    and obtained multiple peptides for many proteins
    which had just one hit with LCQ.

28
SPECIFIC OBSERVATIONS: DEPLETION
  • Many investigators depleted albumin and/or
    immunoglobulins
  • Several were provided an Agilent immunoaffinity
    column to remove top-6 proteins
  • Much higher numbers of identifications after
    depletion if sufficient fractionation
  • Inadvertent removal of other proteins: "sponge"
    effect of albumin
  • Assay both flow-through and bound fractions

29
SPECIMEN VARIABLES
  • What evidence have we developed for choice of
    specimens for analysis?
  • Plasma preferred over serum: more consistent, less
    degradation
  • EDTA-plasma preferred over heparin (interferences)
    and citrate (dilution)
  • Clot activator? Necessary only for serum
  • Minimize freeze/thaw cycles (archives)
  • Minimal evidence of platelet activation at 4°C
  • Protease inhibitors desirable, but alter proteins

30
INFLUENCE OF ABUNDANCE
  • Using quantitative immunoassays and microarrays
    (generally unknown epitopes), we have found very
    high rates of detection of the more abundant
    proteins, less in the mid-range, and occasional
    detection of very low abundance proteins, as
    expected.
  • High correlation (r ~ 0.9) between peptides and
    measured concentrations

31
Least Abundant Proteins Identified with two
distinct peptides (range 200 pg/ml to 20 ng/ml)
  Protein                                Conc. (pg/ml)
  Alpha fetoprotein                      2.9E-02
  TNF-R-8                                3.3E+02
  TNF-ligand-6                           1.5E+03
  PDGF-R alpha                           4.6E+03
  Leukemia inhibitory factor receptor    5.0E+03
  MMP-2/gelatinase                       8.8E+03
  EGFR                                   1.1E+04
  TIMP-1                                 1.4E+04
  IGFBP-2                                1.5E+04
  Activated leukocyte adhesion mol       1.6E+04
  Selectin L (five labs, 10 peptides)    1.7E+04

32
BIOLOGICAL INSIGHTS
  • The proteins identified can be annotated by many
    methods. We have searched multiple databases,
    including Gene Ontology, Novartis Atlas, Online
    Mendelian Inheritance in Man (OMIM), incomplete
    or unidentified sequences in the human genome,
    microbial genomes, InterPro protein domains,
    transmembrane domains, secretion signals.
  • See Proteomics 2005; 5: 3226-3519; Wiley, 2006

33
GENE ONTOLOGY SPECIFIC TERMS
  • Over-represented in PPP 3020 (vs whole genome):
    extracellular, immune response, blood
    coagulation, lipid transport, complement
    activation, regulation of blood pressure, as
    expected; also cytoskeletal proteins, receptors
    and transporters.
  • Proteins from most cellular locations and
    molecular processes are recognized.
  • Under-represented: perception of smell (1 vs
    25 expected), cation transporters, ribosomal
    proteins, G-protein coupled receptors, and
    nucleic acid binding proteins.
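
To give a feel for how extreme "1 observed vs 25 expected" is, one can compute a lower-tail probability. The sketch below uses a Poisson approximation, which is an assumption made here for illustration; the slides do not state which test the PPP annotation analysis used.

```python
import math


def poisson_cdf(k, expected):
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(k + 1)
    )


# "Perception of smell": 1 protein observed vs 25 expected by chance.
p_under = poisson_cdf(1, 25.0)
print(f"P(X <= 1 | mean 25) = {p_under:.2e}")
```

Under this approximation the probability is well below one in a billion, consistent with the term being listed as under-represented.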

34
InterPro Protein Domain Analysis
  • Compared with the whole human genome, the 3020
    PPP proteins are:
  • Over-represented for: EGF, intermediate filament
    protein, sushi, thrombospondin, complement C1q,
    and cysteine protease inhibitor.
  • Under-represented for: Zinc finger (C2H2, B-box,
    RING), tyrosine protein phosphatase, tyrosine and
    serine/threonine protein kinases,
    helix-turn-helix motif, and IQ calmodulin binding
    region domains.

35
TRANSMEMBRANE AND SECRETED PROTEIN FEATURES
  • 1297 of 3020:
                        SwissProt Annotated   ProFun   Both
    Transmembrane              230             151     104
    Secretion signal           373             420     358
  • 1723 of 3020:
                        ProFun Predicted
    TM domain(s)               137
    Secretion signal           255

36
Cardiovascular-Related Proteins Biomarker
Candidates in the PPP Database
  • Proteins characterized in eight groups
  • Inflammation
  • Vascular
  • Signaling
  • Growth and differentiation
  • Cytoskeletal
  • Transcription factors
  • Channels
  • Receptors

37
Comparison of Five Search Algorithms
  • Using PPP data, Kapp et al (Proteomics 2005)
    found Sequest and Spectrum Mill more sensitive
    and MASCOT, Sonar, and X!Tandem more specific for
    peptide identifications at specified
    false-positive rates.
  • Some investigators have reported using
    combinations of two or more search engines.
    Decision rules are necessary.
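
One simple decision rule for combining search engines is a vote: accept a peptide only if a minimum number of engines report it. The sketch below is one possible rule, not the rule used by any PPP laboratory; the engine names come from the slide, while the peptide strings and the threshold of two engines are hypothetical.

```python
from collections import Counter


def consensus_peptides(engine_results, min_engines=2):
    """Return peptides reported by at least `min_engines` search engines."""
    votes = Counter()
    for peptides in engine_results.values():
        votes.update(set(peptides))  # one vote per engine per peptide
    return {peptide for peptide, n in votes.items() if n >= min_engines}


# Hypothetical peptide lists, each already filtered at a fixed error rate.
results = {
    "Sequest": ["PEPTIDEK", "SAMPLER", "ANOTHERK"],
    "MASCOT": ["PEPTIDEK", "SAMPLER"],
    "X!Tandem": ["PEPTIDEK", "DIFFERENTR"],
}
consensus = consensus_peptides(results)
print(sorted(consensus))
```

Raising `min_engines` trades sensitivity for specificity, mirroring the contrast between the more sensitive and more specific engines noted above.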

38
Can We Overcome the Idiosyncrasies of Individual
Instruments and Laboratories?
  • Several informatics investigators approached the
    human PPP with an offer to re-analyze the
    complete MS/MS datasets using their own software
    and criteria from the raw spectra (or peaklists).
  • These analyses eliminated the heterogeneity of
    search algorithms, search parameters, and
    idiosyncrasies of individual labs.
  • The results are hard to compare, given different
    extent of analysis. However, each can be
    compared with the Core Dataset.

39
Independent Analyses from Raw Spectra (IDs with
2 peptides)
  • Core Dataset (18 datasets, 3020)
  • PepMiner (Beer, 8 large datasets, 2895): 1051 in
    the 3020 dataset, 700 in the 9504
  • X!Tandem (Beavis/States, 18 datasets, 2678): 577
    in the 3020; 218 in the 889
  • PeptideProphet/ProteinProphet (Deutsch, 7
    datasets, 960): 479 in the 3020
  • Mascot/Digger (Kapp, Australia, 14 datasets, 513
    with 1.4% error rate): ongoing analysis

40
What is Required and Feasible to Enhance the
Statistical Robustness of Findings?
  • Many complex proteomics analyses are done once,
    without replicates required to estimate
    coefficient of variation or other standard
    parameters for clinical chemistry use.
  • "Five to ten independent repetitions of the
    experiments are a must" (Hamacher et al,
    Proteomics in Drug Discovery, 2006).
  • How should we determine how similar or different
    are samples A and B, or the results of methods X
    and Y? What decision rules apply?
  • We have a long way to go from discovery research
    to clinical applications.
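
The coefficient of variation mentioned above needs replicate measurements. A minimal sketch, assuming five hypothetical replicate intensities for one protein (the numbers are illustrative, not PPP data):

```python
import statistics


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(replicates) / statistics.mean(replicates)


# Hypothetical replicate measurements of one analyte.
replicates = [10.0, 12.0, 11.0, 9.0, 13.0]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1%}")
```

With a single run there is no standard deviation to compute, which is the practical reason replicate experiments are a prerequisite for clinical chemistry parameters.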

41
Comparison of 5 Published Reports on Plasma
Proteins with HUPO PPP Datasets
  Report      IDs     IPI    in 3020   in 9504
  Anderson    1175     990     316       471
  Shen        1682    1842     213       526
  Chan        1444    1019     257       402
  Zhou         210     148      51        88
  Rose         405     287     142       159
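
The overlap counts in this comparison are easier to read as fractions. The sketch below reuses the numbers from the slide; treating the "IPI" column as the number of each report's IDs that mapped to IPI identifiers, and using it as the denominator, is an interpretation made here rather than something the slide states.

```python
# (IPI-mapped IDs, found in PPP 3020, found in PPP 9504), from the slide.
reports = {
    "Anderson": (990, 316, 471),
    "Shen": (1842, 213, 526),
    "Chan": (1019, 257, 402),
    "Zhou": (148, 51, 88),
    "Rose": (287, 142, 159),
}

overlap = {
    name: (in_3020 / ipi, in_9504 / ipi)
    for name, (ipi, in_3020, in_9504) in reports.items()
}

for name, (frac_3020, frac_9504) in overlap.items():
    print(f"{name:<9} {frac_3020:6.1%} in 3020   {frac_9504:6.1%} in 9504")
```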

42
Comparison of New Biofluid Proteome Findings with
HUPO PPP-3020 Proteins
  Proteome   Proteins   IPI 2.21   PPP-3020
  Urine        1543        910        293
  Tears         491        313        117
  Semen         923        560        180
  • Refs from Matthias Mann Lab, Genome Biology,
    2007, different IPI versions.
  • Comparison: Omenn, Proteomics-Clinical
    Applications (2007).

43
NEXT PHASE OF PPP (PPP-2)
  • Standard operating procedures (SOPs), including
    EDTA-plasma as standard specimen; replication and
    confirmation of results
  • Quantitation and subproteomes, using new methods
    and advanced instruments
  • Databases and robust bioinformatics
  • Clinical chem/disease-related studies

44
PPP-2 Research Technology Thrusts
  • Learned a lot from Pilot Phase: plasma is a very
    complex specimen; no single platform is
    sufficient; analyses are currently far from
    comprehensive, let alone reproducible; we now
    have improved data quality and informatics
    resources.
  • PPP-2: use multiple methods; focus on biomarker
    discovery; build upon already-funded laboratories
    and repositories.

45
Specific Technology Recommendations
  • N-Glycosite (proteotypic) peptide resource is
    a special subproteome likely to have high
    biomarker relevance.
  • Capture glycoproteins, digest with
    trypsin and PNGase F to yield N-linked
    glycopeptides. Choose one unique to each protein
    (a finite number; not all proteins). Use
    complementary lectin approach to characterize
    glycans.
  • Prepare isotope-labeled
    N-glyco-peptides for multiple uses as standards
    and to spike specimens.

46
N-Glycosites
  • Glycoproteins are enriched on cell surface, in
    secreted proteome and in plasma
  • Glycoproteins tend to be stable
  • Only few glycosites per protein: reduction in
    sample complexity (excludes albumin)
  • Inherent validation of N-glycosite by fragment
    ion spectrum
  • N-glycosite subproteome is probably the one
    easiest to completely map
47
Glycopeptide Isolation
Zhang H., Li X.-J., Martin D.B., Aebersold R.
(2003) Nat Biotech 21: 660-666
48
Flow chart of process
[Figure: flow chart with these stages, in extracted order: Tissue Samples and Plasma Samples (Normal, Disease); Capture / Digestion; 'Glycopeptide' Fractions; LC-MS; LC/MS Maps; Target peptides; Data Analysis; MRM LC/MS/MS; Data Analysis; Targeted LC/MS/MS]
49
Reducing Complexity: Glycoprotein-Enriched
Subproteomes
  Methods          Lab 2                 Lab 11
  Enrichment       hydrazide chem        lectin chromatography
  Peptide Fxn      SCX, RP               RP
  Mass Spec        qtof                  deca-xp
  Search engine    Seq/ProteinProphet    Sequest
  Protein IDs      222                   83
  • In B1-serum: 51 in common
  • Of total 254, 164 found among data from 11 other
    labs without glycoprotein enrichment.

50
Technology Recommendations (contd)
  • Orbitrap and other advanced instruments with high
    mass accuracy and increased throughput
  • Multiple Reaction Monitoring (Q-Trap, triple
    quad): LOD <50 amol, 5 logs range, probably ng/ml
    range for GP.
  • Extensive fractionation and newer labeling
    methods.
  • Recruit several major labs; be open to
    volunteers.
  • Determine interest in reference specimen.
  • Make peptide standards available through PPP-2;
    post lists and make labeled compounds.

51
Multiple Reaction Monitoring (MRM)
  • High selectivity: two levels of mass selection
    (increased S/N)
  • High sensitivity because of high duty cycle (Q1
    and Q3 are static)
  • Only known peptides (candidates) are detected
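
The two levels of mass selection correspond to a precursor m/z set in Q1 and a fragment m/z set in Q3. The sketch below computes such Q1/Q3 pairs for singly charged y-ions using standard monoisotopic residue masses; the example peptide "SAMPLEK", the doubly charged precursor, and the function names are hypothetical, and real transition lists are normally chosen from observed spectra.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge=2):
    """m/z selected in Q1 for the given charge state."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def y_ion_mz(sequence, n):
    """m/z of the singly charged y-ion holding the last n residues."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[-n:]) + WATER + PROTON


def mrm_transitions(sequence, charge=2):
    """List of (Q1, Q3) pairs, one per y-ion from y1 to y(n-1)."""
    q1 = precursor_mz(sequence, charge)
    return [(q1, y_ion_mz(sequence, n)) for n in range(1, len(sequence))]


transitions = mrm_transitions("SAMPLEK")  # hypothetical target peptide
for q1, q3 in transitions:
    print(f"Q1 {q1:.4f} -> Q3 {q3:.4f}")
```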

52
Technology Recommendations (contd)
  • Compare pooled samples from disease and control;
    high throughput not essential for discovery phase
  • Continue to build the catalog
  • Do longitudinal repeat measures on individuals to
    establish CoV; must reliably tell whether two
    samples are the same or different, including PTMs
  • Pay attention to precursor ions
  • Known interested labs: Aebersold, Paik, Smith,
    Speicher, Hancock, Mann; probably Chinese,
    Michigan, FHCRC, Japanese/glycomics.

53
Issues for PPP Bioinformatics
  • What are imperatives for project design?
  • How can many more spectra be interpreted?
  • How can more confident protein IDs be generated?
  • How do we add value and benefit from EBI/PRIDE
    and ISB/PeptideAtlas repositories?
  • What is required to make the datasets more useful
    for other investigators?
  • Can quantitation, including of PTMs, be achieved
    with statistical robustness?

54
A Robust Bioinformatics Architecture
[Figure: architecture diagram with these components: Individual labs; Level I repository; PRIDE; Peptide Atlas; Genome annotation; Dissemination]
55
Repositories and Resources for Proteomics
Informatics
  • PRIDE at EBI, repository for protein
    identifications (Martens)
  • PeptideAtlas, repository for raw data processed
    through the Trans-Proteomic Pipeline at ISB
    (Deutsch), plus SpectraST barcodes from NIST
  • Tranche Distributed File System/DFS (Andrews, UM)
    at ProteomeCommons.org, National Resource for
    Proteomics and Pathways
  • CPAS, developed as part of Mouse Models of Human
    Cancers Consortium, at Fred Hutchinson (McIntosh)
  • GPMdb, developed by Beavis (Canada)

56
  • Tranche Distributed (P2P) File System
  • Open, simple, cross-platform protocols
  • e-Commerce-grade encryption makes it appropriate
    for scientific research (peer-review and
    traceability)
  • Can easily grow to accommodate very large amounts
    of data and users
  • Commodity hardware @ $0.37 per GB storage
  • 16 TB over 12 servers (30 additional TB ordered)
    and funding for additional 20TB
  • Documentation, tools, code, credits:
    http://www.proteomecommons.org/dev/dfs
  • Data sets: GPM, PNNL, Aurum, QqTOF vs QSTAR, sPRG
    ABRF 2006, HUPO PPP
  • Links with PeptideAtlas, OPD, HPRD, TheGPM

57
Can We Identify More High Confidence Peptides
from the MS/MS spectra?
  • The spectra, not protein lists, are the raw data.
    <20% of spectra are confidently assigned to
    peptide sequences; the rest are typically
    discarded.
  • More high quality spectra can be mined
    (Nesvizhskii et al, MCP 2006).
  • Higher mass accuracy greatly enhances results
    (with some complications, per Eric Deutsch).
  • Error estimates and thresholds should be routine
    for peptide IDs and protein matches. The
    Trans-Proteomic Pipeline (TPP) from ISB has been
    designed for this purpose.

58
Mining Un-assigned High Quality Spectra
(Nesvizhskii)
  • Typical search: SEQUEST, IPI database
  • semi-constrained (tryptic on one end)
  • Met +16
  • +/- 3 Da, average mass
  • Average numbers (LCQ/LTQ data): 10-15% of all
    spectra assigned peptide with high confidence
  • 20-25% of all high quality spectra are not
    assigned
59
Why Are Spectra Not Assigned?
  • Possible causes of failure to assign peptide:
  • Imperfect scoring scheme
  • Constrained search (PTM, not tryptic etc.)
  • Incorrect mass/charge state
  • Low spectrum quality / contaminant ion
  • Correct sequence may not be in the database
    searched (e.g., SNP)
  • Novel sequence (splice variants, fusion
    peptides?)
  • Use MS/MS data for genome annotation

60
Finding and Mining High Quality Unassigned
Spectra (Nesvizhskii)
61
Further Analyses at the Peptide Level
  • The PPP, GPM, and PeptideAtlas databases are rich
    with peptide-level findings, which can be
    analyzed for many questions, e.g., which
    peptides are most likely to be detected from
    among the predicted tryptic peptides of various
    proteins, and why? Can peptides be used directly
    to identify sequences of splice isoforms and
    SNPs? Can PTMs be identified more readily?
    Answer: yes to all three questions.
  • Proteotypic peptides will be a major feature of
    Next Phase PPP.
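
The "predicted tryptic peptides" of a protein can be listed with an in-silico digest. The sketch below applies the commonly used rule (cleave after K or R unless the next residue is P) with no missed cleavages; the input sequence is a hypothetical toy example, and predicting which of these peptides are actually detectable (proteotypic) requires empirical data that this sketch does not model.

```python
def tryptic_digest(protein):
    """Split a protein sequence after K or R, except before P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


digest = tryptic_digest("AKRPGKLM")  # hypothetical sequence
print(digest)
```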

62
What Kinds of Biological Insights Emerge from
Annotation?
  • The aim of proteomics analyses is not just to
    create lists of peptides and proteins, but to
    advance our understanding of complex biological
    processes in health and disease. Going forward,
    quantitation of proteins and their PTMs will be
    increasingly important, and feasible.

63
High Throughput Proteomics and Systems Biology
[Figure: conditions 1, 2, and 3; integration of genomic, transcriptomic, proteomic, and metabolomic data; understanding and modeling cell behavior (Systems Biology)]
64
SUMMARY
  • Enthusiasm for continuing and expanding Plasma
    Proteome Project, confirmed at Seoul, Korea,
    World Congress of Proteomics Oct 2007
  • Commitment to combine PPP with concept of Disease
    Biomarker Initiative
  • Interest in linking with and absorbing datasets
    from other Biofluid Proteomes (saliva, urine,
    CSF, organ-related proximal fluids)

65
Biology as an Information Science: NIH Roadmap
National Centers for Biomedical Computing
  • Informatics for Integrating Biology and the
    Bedside (i2b2): Isaac Kohane, PI
  • Physics-Based Simulation of Biological Structures
    (SIMBIOS): Russ Altman, PI
  • National Center for Integrative Biomedical
    Informatics (NCIBI): Brian Athey, PI
  • National Alliance for Medical Imaging Computing
    (NA-MIC): Ron Kikinis, PI
  • The National Center For Biomedical Ontology
    (NCBO): Mark Musen, PI
  • Multiscale Analysis of Genomic and Cellular
    Networks (MAGNet): Andrea Califano, PI
  • Center for Computational Biology (CCB): Arthur
    Toga, PI
66
A Bioinformatics Approach to Discover Candidate
Oncogenes
  • Few causal cancer genes have been discovered
    using gene expression microarrays
  • Oncogenic events are often heterogeneous
  • ERBB2/HER2 amplification in 20% of breast CA
  • Activating Ras mutations in 25% of melanomas
  • E2A-PBX1 translocation in 5-10% of leukemias
  • Chromosomal aberrations that result in marked
    over-expression of an oncogene should be
    detectable in transcriptome data
  • Protein products then may be identified in tumor,
    biofluids, and plasma

67
(No Transcript)
68
COPA of microarray data revealed ETV1 and ERG as
outlier genes across multiple prostate cancer
gene expression data sets
Tomlins et al., Science 2005; 310: 644-648
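
COPA (cancer outlier profile analysis) is usually described as median-centering each gene, scaling by the median absolute deviation (MAD), and then ranking genes by upper percentiles of the transformed values, so that genes highly expressed in only a subset of tumors stand out. The sketch below follows that description under stated assumptions: the percentile choices (75, 90, 95), the interpolation method, and the toy expression values are illustrative, and the published implementation may differ in detail.

```python
import statistics


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def copa_scores(expression, quantiles=(75, 90, 95)):
    """Median-center, scale by MAD, and report upper percentiles."""
    median = statistics.median(expression)
    mad = statistics.median([abs(v - median) for v in expression])
    if mad == 0:
        raise ValueError("MAD is zero; this gene cannot be scaled")
    transformed = [(v - median) / mad for v in expression]
    return {q: percentile(transformed, q) for q in quantiles}


# Toy data: one gene with two outlier samples, one gene without.
outlier_gene = [1, 2, 3, 4, 5, 6, 7, 8, 50, 60]
flat_gene = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
outlier_scores = copa_scores(outlier_gene)
flat_scores = copa_scores(flat_gene)
print(outlier_scores[95], flat_scores[95])
```

Because the median and MAD are robust to a minority of extreme samples, the outlier gene keeps a small scale factor and its upper percentiles become large, which a mean-and-variance statistic would tend to dilute.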
69
COPA Unveils Androgen-Responsive TF Fusion
Genes
70
The Molecular Concept Map Project (Chinnaiyan,
Rhodes)
71
(No Transcript)
72
Our Genetic Future
  • "Mapping the human genetic terrain may rank with
    the great expeditions of Lewis and Clark, Sir
    Edmund Hillary, and the Apollo Program."
    Francis Collins, Director, National Human Genome
    Research Institute, 1999
  • Next:
  • Understand gene and protein expression
  • Elucidate genetic, environmental, and
    behavioral interactions in health and disease
  • Engage scientists globally

73
Acknowledgements
  • HUPO PPP: Ruedi Aebersold and Young-Ki Paik,
    co-chairs; Eric Deutsch, Lennart Martens, Alexey
    Nesvizhskii, David States, bioinformatics; lab
    leaders and sponsors (see Proteomics 2005)
  • UM Proteomics Alliance for Cancer Research: Phil
    Andrews, David States, Alexey Nesvizhskii, George
    Michailidis, Mike Pisano, Arul Chinnaiyan, Dan
    Rhodes, Scott Tomlins, Arun Sreekumar, Adai
    Vellaichamy, Brian Haab
  • UM National Center for Integrative Biomedical
    Informatics: Brian Athey, David States, HV
    Jagadish, Jignesh Patel, Peter Woolf, Biaoyang Lin