Proteomic Characterization of Alternative Splicing and Coding Polymorphism

Title: Faster, More Sensitive Peptide ID by Sequence DB Compression
Last modified by: Nathan John Edwards
Created Date: 12/6/2004 12:44:14 AM

1
Proteomic Characterization of Alternative
Splicing and Coding Polymorphism
  • Nathan Edwards
  • Center for Bioinformatics and Computational
    Biology
  • University of Maryland, College Park

2
Proteomics
  • Proteins are the machines that drive much of
    biology
  • Genes are merely the recipe
  • The direct characterization of a sample's
    proteins en masse.
  • What proteins are present?
  • How much of each protein is present?

3
Systems Biology
  • Establish relationships by
  • Choosing related samples,
  • Global characterization, and
  • Comparison.

                   Gene / Transcript / Protein
Measurement        Predetermined      Unknown
Discrete (DNA)     Genotyping         Sequencing
Continuous         Gene Expression    Proteomics

4
Samples
  • Healthy / Diseased
  • Cancerous / Benign
  • Drug resistant / Drug susceptible
  • Progression or Prognosis
  • Bound / Unbound
  • Tissue specific
  • Cellular location specific
  • Mitochondria, Membrane

5
2D Gel-Electrophoresis
  • Protein separation
  • Molecular weight (MW)
  • Isoelectric point (pI)
  • Staining
  • Birds-eye view of protein abundance

6
2D Gel-Electrophoresis
Bécamel et al., Biol. Proced. Online 2002; 4: 94-104.
7
Paradigm Shift
  • Traditional protein chemistry assay methods
    struggle to establish identity.
  • Identity requires
  • Specificity of measurement (Precision)
  • A reference for comparison

8
Mass Spectrometry for Proteomics
  • Measure mass of many (bio)molecules
    simultaneously
  • High bandwidth
  • Mass is an intrinsic property of all
    (bio)molecules
  • No prior knowledge required

9
Mass Spectrometer
  • Electron Multiplier (EM)
  • Time-Of-Flight (TOF)
  • Quadrupole
  • Ion-Trap
  • MALDI
  • Electrospray Ionization (ESI)

10
High Bandwidth
11
Mass is fundamental!
12
Mass Spectrometry for Proteomics
  • Measure mass of many molecules simultaneously
  • ...but not too many, abundance bias
  • Mass is an intrinsic property of all
    (bio)molecules
  • ...but need a reference to compare to

13
Mass Spectrometry for Proteomics
  • Mass spectrometry has been around since the turn
    of the 20th century...
  • ...why is MS-based Proteomics so new?
  • Ionization methods
  • MALDI, Electrospray
  • Protein chemistry automation
  • Chromatography, Gels, Computers
  • Protein sequence databases
  • A reference for comparison

14
Sample Preparation for Peptide Identification
15
Single Stage MS
[Figure: single-stage MS spectrum; axis: m/z]
16
Tandem Mass Spectrometry (MS/MS)
[Figure: MS spectrum with precursor selection; axis: m/z]
17
Tandem Mass Spectrometry (MS/MS)
Precursor selection and collision-induced dissociation (CID)
[Figure: MS spectrum followed by MS/MS spectrum; axis: m/z]
18
Peptide Identification
  • For each (likely) peptide sequence
  • 1. Compute fragment masses
  • 2. Compare with spectrum
  • 3. Retain those that match well
  • Peptide sequences from protein sequence databases
  • Swiss-Prot, IPI, NCBI's nr, ...
  • Automated, high-throughput peptide identification
    in complex mixtures
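
The three steps above can be sketched in a few lines of Python. This is only an illustration, not the scoring used by any particular search engine: it assumes singly charged b- and y-ions, standard monoisotopic residue masses, and a simple count of matched fragments within a tolerance. The names `fragment_mz`, `match_count`, and `best_matches` are hypothetical; the peptide DLATVYVDVLK is borrowed from a later slide.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_mz(peptide):
    """Step 1: singly charged b- and y-ion m/z values (an assumed ion model)."""
    masses = [MONO[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def match_count(peptide, observed_mz, tol=0.5):
    """Step 2: count theoretical fragments that have an observed peak within tol."""
    return sum(
        any(abs(mz - obs) <= tol for obs in observed_mz)
        for mz in fragment_mz(peptide)
    )


def best_matches(candidates, observed_mz, keep=1, tol=0.5):
    """Step 3: retain the candidate peptides that match the spectrum best."""
    ranked = sorted(
        candidates,
        key=lambda p: match_count(p, observed_mz, tol),
        reverse=True,
    )
    return ranked[:keep]


# Toy usage: an idealized "spectrum" built from the true peptide's fragments.
spectrum = fragment_mz("DLATVYVDVLK")
top = best_matches(["KLVDVYVTALD", "DLATVYVDVLK"], spectrum)
```

Real search engines also weigh peak intensities, charge states, and the precursor mass; the fragment count here only shows the shape of the loop.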

19
Why don't we see more novel peptides?
  • Tandem mass spectrometry doesn't discriminate
    against novel peptides... but protein
    sequence databases do!
  • Searching traditional protein sequence databases
    biases the results towards well-understood
    protein isoforms!

20
What goes missing?
  • Known coding SNPs
  • Novel coding mutations
  • Alternative splicing isoforms
  • Alternative translation start-sites
  • Microexons
  • Alternative translation frames

21
Why should we care?
  • Alternative splicing is the norm!
  • Only 20-25K human genes
  • Each gene makes many proteins
  • Proteins have clinical implications
  • Biomarker discovery
  • Evidence for SNPs and alternative splicing stops
    with transcription
  • Genomic assays, ESTs, mRNA sequence.
  • Little hard evidence for translation start site

22
Novel Splice Isoform
  • Human Jurkat leukemia cell-line
  • Lipid-raft extraction protocol, targeting T cells
  • von Haller, et al. MCP 2003.
  • LIME1 gene
  • LCK interacting transmembrane adaptor 1
  • LCK gene
  • Leukocyte-specific protein tyrosine kinase
  • Proto-oncogene
  • Chromosomal aberration involving LCK in
    leukemias.
  • Multiple significant peptide identifications

23
Novel Splice Isoform
24
Novel Splice Isoform
25
Novel Mutation
  • HUPO Plasma Proteome Project
  • Pooled samples from 10 male and 10 female healthy
    Chinese subjects
  • Plasma/EDTA sample protocol
  • Li, et al. Proteomics 2005. (Lab 29)
  • TTR gene
  • Transthyretin (pre-albumin)
  • Defects in TTR are a cause of amyloidosis.
  • Familial amyloidotic polyneuropathy
  • late-onset, dominant inheritance

26
Novel Mutation
Ala2→Pro associated with familial amyloid
polyneuropathy
27
Novel Mutation
28
Expressed Sequence Tags (ESTs)
  • Cheap, fast, coding
  • Single sequencing reads of mRNA
  • Sequence from 5' or 3' end
  • No assembly

http://www.ncbi.nlm.nih.gov/About/primer/est.html
29
Searching ESTs
  • Proposed long ago
  • Yates, Eng, and McCormack, Anal. Chem., 1995.
  • Now
  • Protein sequences are sufficient for protein
    identification
  • Computationally expensive/infeasible
  • Difficult to interpret
  • Make EST searching feasible for routine searching
    to discover novel peptides.

30
Searching Expressed Sequence Tags (ESTs)
  • Pros
  • No introns!
  • Primary splicing evidence for annotation
    pipelines
  • Evidence for dbSNP
  • Often derived from clinical cancer samples
  • Cons
  • No frame
  • Large (8Gb)
  • Untrusted by annotation pipelines
  • Highly redundant
  • Nucleotide error rate ~1%

31
Compressed EST Peptide Sequence Database
  • For all ESTs mapped to a UniGene gene
  • Six-frame translation
  • Eliminate ORFs < 30 amino-acids
  • Eliminate amino-acid 30-mers observed once
  • Compress to C2 FASTA database
  • Complete, Correct for amino-acid 30-mers
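
A minimal sketch of the first filtering steps, under stated assumptions: an "ORF" is taken to be any stop-free stretch of a six-frame translation, "observed once" is read as "seen in only one EST", and codons with ambiguous bases become `X`. The function names `six_frame_orfs` and `repeated_kmers` are hypothetical, and the final compression to a FASTA database is not shown.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_orfs(dna, min_len=30):
    """Yield stop-free translated stretches of at least min_len residues."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    for strand in (dna, reverse_complement):
        for frame in range(3):
            protein = "".join(
                CODON.get(strand[i:i + 3], "X")  # X marks an ambiguous codon
                for i in range(frame, len(strand) - 2, 3)
            )
            for orf in protein.split("*"):
                if len(orf) >= min_len:
                    yield orf


def repeated_kmers(ests, k=30):
    """Return amino-acid k-mers supported by more than one EST."""
    counts = Counter()
    for est in ests:
        seen = set()
        for orf in six_frame_orfs(est, min_len=k):
            seen.update(orf[i:i + k] for i in range(len(orf) - k + 1))
        counts.update(seen)  # each EST contributes at most once per k-mer
    return {kmer for kmer, n in counts.items() if n > 1}
```

Counting each 30-mer once per EST means a single read with a sequencing error cannot support its own erroneous peptide, which is one plausible reading of why singletons are dropped.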

33
Compressed EST Database
  • Gene centric compressed EST peptide sequence
    database
  • 20,774 sequence entries
  • 8Gb vs 223 Mb
  • 35-fold compression
  • 22 hours becomes 15 minutes
  • E-values improve by similar factor!
  • Makes routine EST searching feasible
  • Search ESTs instead of IPI?
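
The figures on this slide can be checked with a little arithmetic. The sketch below assumes 8 Gb means 8000 Mb and that an E-value scales linearly with the number of candidate sequences searched, which is the usual definition but is not spelled out on the slide. The function names are hypothetical.

```python
def fold_change(before, after):
    """Ratio of a quantity before and after compression."""
    return before / after


def rescaled_e_value(e_value, size_fold):
    """E-value after shrinking the database, assuming linear scaling with size."""
    return e_value / size_fold


size_fold = fold_change(8000, 223)    # database size in Mb, 8 Gb taken as 8000 Mb
time_fold = fold_change(22 * 60, 15)  # search time in minutes
```

The size ratio comes out at roughly 36, in line with the quoted 35-fold compression, while the search-time ratio is 88, so the speed-up is larger than the size reduction alone.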

34
Back to the lab...
  • Current LC/MS/MS workflows identify a few
    peptides per protein
  • ...not sufficient for protein isoforms
  • Need to raise the sequence coverage to (say) 80%
  • ...protein separation prior to LC/MS/MS analysis
  • Potential for database of splice sites of
    (functional) proteins!

35
Microorganism Identification by MALDI Mass
Spectrometry
  • Direct observation of microorganism biomarkers in
    the field.
  • Peaks represent masses of abundant proteins.
  • Statistical models assess identification
    significance.

B. anthracis spores
MALDI Mass Spectrometry
36
Key Principles
  • Protein mass from protein sequence
  • No introns, few PTMs
  • Specificity of single mass is very weak
  • Statistical significance from many peaks
  • Not all proteins are equally likely to be
    observed
  • Ribosomal proteins, SASPs

37
Rapid Microorganism Identification Database
(www.RMIDb.org)
  • Protein Sequences
  • 8.1M (2.9M)
  • Species
  • 18K
  • Genbank,
  • Microbial, Virus, Plasmid
  • RefSeq
  • CMR,
  • Swiss-Prot
  • TrEMBL

38
Rapid Microorganism Identification Database
(www.RMIDb.org)
39
Informatics Issues
  • Need good species / strain annotation
  • B. anthracis vs. B. thuringiensis
  • Need correct protein sequence
  • B. anthracis Sterne α/β SASP
  • RefSeq/Gb: MVMARN... (7442 Da)
  • CMR: MARN... (7211 Da)
  • Need chemistry-based protein classification
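
The 231 Da gap between the two SASP annotations is about what two extra N-terminal residues (Met, Val) would add. A small sketch, assuming standard average residue masses and no modifications; `average_mass` is a hypothetical helper, and only the sequence prefixes shown on the slide are used, so the result is a mass difference rather than a full protein mass.

```python
# Average residue masses (Da) for the 20 standard amino acids.
AVERAGE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.0153


def average_mass(sequence):
    """Average mass of an unmodified peptide or protein chain."""
    return sum(AVERAGE[aa] for aa in sequence) + WATER_AVG


# Only the prefixes from the slide are known, so compare those.
delta = average_mass("MVMARN") - average_mass("MARN")
```

Here `delta` is about 230.3 Da, close to the 7442 - 7211 = 231 Da difference between the two database entries; the small gap is within what rounding of the quoted masses could explain.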

40
Spectral Matching
  • Detection vs. identification
  • Increased sensitivity
  • No novel peptides
  • NIST GC/MS Spectral Library
  • Identifies small molecules,
  • 100,000s of (consensus) spectra
  • Bundled/Sold with many instruments
  • Dot-product spectral comparison
  • Current project: Peptide MS/MS
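
A dot-product comparison of two spectra can be sketched as follows. This is a generic normalized dot product (cosine similarity) over binned peaks, not the NIST library's actual scoring; the 1 m/z bin width and the function names are assumptions.

```python
import math
from collections import defaultdict


def binned(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    vector = defaultdict(float)
    for mz, intensity in peaks:
        vector[int(mz / bin_width)] += intensity
    return vector


def dot_product_score(spectrum_a, spectrum_b, bin_width=1.0):
    """Normalized dot product of two peak lists of (m/z, intensity) pairs."""
    a = binned(spectrum_a, bin_width)
    b = binned(spectrum_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Toy usage with made-up peaks: a library spectrum and a noisy query.
library = [(147.1, 40.0), (260.2, 100.0), (359.3, 65.0)]
query = [(147.2, 35.0), (260.1, 90.0), (359.4, 70.0), (500.0, 5.0)]
score = dot_product_score(library, query)
```

The score is 1 for spectra with identical binned intensities and 0 when no bins are shared, so it rewards agreement in peak intensity as well as peak position.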

41
Peptide DLATVYVDVLK
42
Peptide DLATVYVDVLK
43
Hidden Markov Models for Spectral Matching
  • Capture statistical variation and consensus in
    peak intensity
  • Capture semantics of peaks
  • Extrapolate model to other peptides
  • Good specificity with superior sensitivity for
    peptide detection

44
Conclusions
  • Molecular biology and bioinformatics provide a
    reference for biotechnologies
  • Foundation of systems biology
  • Peptides identify more than just proteins
  • Untapped source of disease biomarkers
  • Compressed peptide sequence databases make
    routine EST searching feasible

45
Future Research Directions
  • Identification of protein isoforms
  • Optimize proteomics workflow for isoform
    detection
  • Identify splice variants in cancer cell-lines
    (MCF-7) and clinical brain tumor samples
  • dbPep for genomic annotation

46
Future Research Directions
  • Proteomics for Microorganism Identification
  • Specificity of tandem mass spectra
  • Revamp RMIDb prototype
  • Incorporate spectral matching

47
Acknowledgements
  • Catherine Fenselau, Steve Swatkoski
  • UMCP Biochemistry
  • Chau-Wen Tseng, Xue Wu
  • UMCP Computer Science
  • Cheng Lee
  • Calibrant Biosystems
  • PeptideAtlas, HUPO PPP, X!Tandem
  • Funding: NIH/NCI, USDA/ARS