Improving the Sensitivity of Peptide Identification

1
Improving the Sensitivity of Peptide Identification
  • Nathan Edwards
      • Department of Biochemistry and Molecular & Cellular Biology
      • Georgetown University Medical Center
  • Xue Wu, Chau-Wen Tseng
      • Department of Computer Science
      • University of Maryland, College Park

2
Lost peptide identifications
  • Missing from the sequence database
  • Search engine strengths, weaknesses, quirks
  • Poor score or statistical significance
  • Thorough search takes too long

3
Lost peptide identifications
  • Missing from the sequence database
      • Build exhaustive peptide sequence databases
  • Search engine strengths, weaknesses, quirks
      • Use multiple search engines and combine results
  • Poor score or statistical significance
      • Use spectral matching to identify weak spectra
      • Use search-engine consensus to boost confidence
      • Use machine learning to distinguish true from false
  • Thorough search takes too long
      • Harness the power of heterogeneous computational grids

4
Peptide Sequence Databases
  • All peptides at most 30 amino acids long from:
      • IPI and all IPI constituent protein sequences
      • IPI, HInvDB, VEGA, UniProt, EMBL, RefSeq, GenBank
      • SwissProt variants, conflicts, splices, and
        signal peptide truncations
      • GenBank and RefSeq mRNA sequences
          • 3-frame translation
      • GenBank EST and HTC sequences
          • 6-frame translation, and found in at least 2
            sequences
  • Grouped by UniGene cluster and compressed.
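
The EST rule above (6-frame translation, keep a peptide only if it is found in at least 2 sequences) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that "all peptides" means every stop-free substring of up to 30 residues, that the standard genetic code applies, and that support is counted once per EST. The function names and the `min_support` parameter are hypothetical.

```python
from collections import defaultdict

# Standard genetic code, codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; codons with unknown bases become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "*") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Three forward and three reverse-complement translations."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (dna, rev) for f in range(3)]


def peptides(protein, max_len=30):
    """All stop-free substrings of length 1..max_len (an assumed definition)."""
    found = set()
    for segment in protein.split("*"):
        for start in range(len(segment)):
            for end in range(start + 1, min(start + max_len, len(segment)) + 1):
                found.add(segment[start:end])
    return found


def supported_peptides(ests, min_support=2, max_len=30):
    """Peptides seen in at least min_support distinct EST sequences."""
    support = defaultdict(int)
    for est in ests:
        seen = set()
        for frame in six_frames(est):
            seen |= peptides(frame, max_len)
        for pep in seen:  # count each peptide once per EST
            support[pep] += 1
    return {pep for pep, n in support.items() if n >= min_support}
```

For example, `supported_peptides(["ATGGCCAAA", "GGATGGCCAAATT"])` contains `"MAK"`, because both sequences encode it in some frame, while a single EST on its own yields an empty set.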

5
Peptide Sequence Databases
  • Formatted as a FASTA sequence database
  • Easy integration with search engines.
  • One entry per gene/cluster.
  • Automated rebuild every few months.

Organism    Size (AA)   Size (Entries)
Human       209 Mb      75,043
Mouse       151 Mb      55,929
Rat         67 Mb       43,211
Zebrafish   90 Mb       47,922

6
Spectral Matching with HMMs

7
Spectral Matching with HMMs

8
Hidden Markov Model
  • States: Delete, Insert, Ion
  • (m/z, intensity) pairs are emitted by the ion and insert states

9
Boosting Identification Sensitivity

10
Spectral Matching of Peptide Variants
DFLAGGIAAAISK
DFLAGGVAAAISK
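
The two peptides differ only at position 7 (I versus V), so only the fragment ions that contain that residue change mass. The sketch below works this out for singly charged b- and y-ions using standard monoisotopic residue masses. It illustrates why a spectrum of one variant can inform matching of the other; it is not the HMM method from the slides, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in these two peptides.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "I": 113.08406, "D": 115.02694, "K": 128.09496,
    "F": 147.06841,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """Singly charged b-ion m/z values, b1 .. b(n-1)."""
    total, out = 0.0, []
    for aa in peptide[:-1]:
        total += MONO[aa]
        out.append(total + PROTON)
    return out


def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 .. y(n-1)."""
    total, out = 0.0, []
    for aa in reversed(peptide[1:]):
        total += MONO[aa]
        out.append(total + WATER + PROTON)
    return out


def shifted(ions_a, ions_b, tol=1e-6):
    """1-based indices of ions whose m/z differs between the two peptides."""
    return [i + 1 for i, (a, b) in enumerate(zip(ions_a, ions_b)) if abs(a - b) > tol]


REFERENCE = "DFLAGGIAAAISK"
VARIANT = "DFLAGGVAAAISK"

b_shift = shifted(b_ions(REFERENCE), b_ions(VARIANT))
y_shift = shifted(y_ions(REFERENCE), y_ions(VARIANT))
delta = MONO["V"] - MONO["I"]  # about -14.016 Da, one CH2 group
```

Here `b_shift` and `y_shift` are both `[7, 8, 9, 10, 11, 12]`: b1 to b6 and y1 to y6 are identical between the variants, and every other fragment moves by the same `delta`.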

11
Spectral Matching Extrapolation

12
Comparison of search engine results
  • No single score is comprehensive
  • Search engines disagree
  • Many spectra lack confident peptide assignment

Searle et al. JPR 7(1), 2008

13
Combining search engine results: harder than it looks!
  • Consensus boosts confidence, but...
      • How to assess statistical significance?
      • Gain specificity, but lose sensitivity!
      • Incorrect identifications are correlated too!
  • How to handle weak identifications?
      • Consensus vs disagreement vs abstention
      • Threshold at some significance?
  • We apply unsupervised machine learning...
      • Lots of related work unified in a single
        framework.
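
To make the consensus, disagreement, and abstention cases concrete, here is a minimal sketch that labels the per-engine peptide assignments for one spectrum. The label definitions and the function name are assumptions made for illustration. This is a plain tally, not the unsupervised machine-learning combiner the slides describe, and it does not address the statistical-significance questions raised above.

```python
from collections import Counter


def summarize_calls(assignments):
    """Label one spectrum's results across engines.

    assignments maps engine name -> peptide string, or None when the
    engine reported nothing for this spectrum (abstention).
    Returns (label, most common peptide or None, number of engines agreeing).
    """
    calls = [pep for pep in assignments.values() if pep is not None]
    if not calls:
        return ("abstention", None, 0)
    peptide, agree = Counter(calls).most_common(1)[0]
    if len(calls) == 1:
        return ("single", peptide, 1)
    if agree == len(calls):
        return ("consensus", peptide, agree)
    return ("disagreement", peptide, agree)


example = summarize_calls({
    "Mascot": "DFLAGGIAAAISK",
    "X!Tandem": "DFLAGGIAAAISK",
    "OMSSA": None,
})
```

In this example two engines agree and one abstains, so `example` is `("consensus", "DFLAGGIAAAISK", 2)`. A hard rule like this is where the sensitivity loss noted above comes from: spectra identified correctly by only one engine never reach consensus.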

14
Supervised Learning

15
Unsupervised Learning

16
PepArML Combining Results

17
Unsupervised Learning
[Figure: false positive rate versus iteration; series labeled U-TMO, C-TMO, and H]

18
Searching for Consensus
  • Search engine quirks can destroy consensus
      • Initial methionine loss as tryptic peptide
      • Charge state enumeration or guessing
      • X!Tandem's refinement mode
      • Pyro-Gln, Pyro-Glu modifications
      • Difficulty tracking spectrum identifiers
      • Precursor mass tolerance (Da vs ppm)
  • Decoy searches must be identical!
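
The Da versus ppm point is easy to check with arithmetic: a fixed window in daltons corresponds to a different ppm window at every precursor mass, so engines configured in different units do not consider the same candidate peptides. A small sketch, with hypothetical function names:

```python
def ppm_to_da(mass, ppm):
    """Width in Da of a ppm tolerance at the given mass."""
    return mass * ppm / 1e6


def da_to_ppm(mass, da):
    """Width in ppm of a Da tolerance at the given mass."""
    return da / mass * 1e6


def within_tolerance(observed, theoretical, tol, unit="ppm"):
    """True if the observed mass matches the theoretical mass within tol."""
    error = abs(observed - theoretical)
    if unit == "ppm":
        return error <= ppm_to_da(theoretical, tol)
    if unit == "Da":
        return error <= tol
    raise ValueError("unit must be 'ppm' or 'Da'")
```

For instance, a 0.03 Da error at 2000 Da is 15 ppm, so `within_tolerance(2000.03, 2000.0, 0.05, "Da")` is true while `within_tolerance(2000.03, 2000.0, 10, "ppm")` is false. The same candidate is accepted under one setting and rejected under the other.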

19
Configuring for Consensus
  • Search engine configuration can be difficult
  • Correct spectral format
  • Search parameter files and command-line
  • Pre-processed sequence databases.
  • Tracking spectrum identifiers
  • Extracting peptide identifications, especially
    modifications and protein identifiers

20
Peptide Identification Meta-Search Parameters
  • Instrument
  • Precursor Tolerance
  • Fragment Tolerance
  • Max. Charge
  • Sequence Database
      • Target/Decoy
  • Modification
      • Fixed/Variable
      • Amino-Acids
      • Position
      • Delta
  • Proteolytic Agent
      • Motif
  • Peptide Candidates
      • Termini Specificity
      • Precursor Tolerance
      • Missed cleavages
      • Charge State Handling
      • 13C Peaks
  • Search Engines
      • Mascot, X!Tandem
      • OMSSA, MyriMatch
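
One way to picture a single engine-independent parameter set is as a small typed record that is later translated into each engine's own configuration. The sketch below is an assumption made for illustration: the class names, field names, and example values are hypothetical and are not the tool's actual schema. Only the parameter categories come from the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Modification:
    name: str
    fixed: bool        # fixed (True) or variable (False)
    amino_acids: str   # residues the modification applies to
    position: str      # e.g. "any", "N-term", "C-term"
    delta: float       # monoisotopic mass change in Da


@dataclass
class MetaSearchParams:
    instrument: str
    precursor_tolerance: float
    precursor_unit: str            # "Da" or "ppm"
    fragment_tolerance: float      # Da
    max_charge: int
    sequence_database: str
    decoy: bool                    # also run the decoy search
    proteolytic_motif: str
    missed_cleavages: int
    search_engines: List[str]
    modifications: List[Modification] = field(default_factory=list)


# Illustrative values only.
params = MetaSearchParams(
    instrument="ion trap",
    precursor_tolerance=2.0,
    precursor_unit="Da",
    fragment_tolerance=0.6,
    max_charge=3,
    sequence_database="human.fasta",
    decoy=True,
    proteolytic_motif="[KR]",
    missed_cleavages=1,
    search_engines=["Mascot", "X!Tandem", "OMSSA", "MyriMatch"],
    modifications=[
        Modification("Carbamidomethyl", True, "C", "any", 57.02146),
        Modification("Oxidation", False, "M", "any", 15.99491),
    ],
)
```

Keeping the unit next to the precursor tolerance matters for the consensus issues discussed earlier, since each engine must be given an equivalent setting.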

21
Peptide Identification Meta-Search
  • Simple unified search interface for
      • Mascot, X!Tandem
      • OMSSA, MyriMatch
  • Automatic decoy searches
  • Automatic spectrum file "chunking"
  • Automatic scheduling
      • Serial, Multi-Processor,
      • Cluster, Grid
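
Spectrum file "chunking" can be sketched as splitting the input into fixed-size batches of spectra so that each batch can be scheduled as an independent search job. The example assumes MGF-style input, where each spectrum is delimited by `BEGIN IONS` and `END IONS` lines. The slides do not name a spectrum format, and the function names and chunk size are hypothetical.

```python
def read_mgf_spectra(lines):
    """Yield each BEGIN IONS ... END IONS block as a list of lines."""
    block = None
    for line in lines:
        text = line.strip()
        if text == "BEGIN IONS":
            block = [text]
        elif block is not None:
            block.append(text)
            if text == "END IONS":
                yield block
                block = None


def chunk(items, size):
    """Group items into lists of at most `size`, preserving order."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, chunk
        yield batch


# Five tiny made-up spectra, split into chunks of two.
mgf_text = "\n".join(
    f"BEGIN IONS\nTITLE=scan{i}\nPEPMASS=500.{i}\n100.0 10\nEND IONS"
    for i in range(5)
)
chunks = list(chunk(read_mgf_spectra(mgf_text.splitlines()), 2))
chunk_sizes = [len(c) for c in chunks]
```

Here `chunk_sizes` is `[2, 2, 1]`. Keeping the `TITLE` line inside every chunk is what lets results from separate jobs be traced back to the original spectrum identifiers.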

22
Peptide Identification Meta-Search
  • Heterogeneous compute resources
  • NSF TeraGrid: 1000 CPUs
  • UMIACS: 250 CPUs
  • Edwards Lab Scheduler: 48 CPUs
  • Secure communication
  • Simple search request

23
Conclusions
  • Improve sensitivity of peptide identification
  • Exhaustive peptide sequence databases
  • Machine-learning for matching and combining
  • Meta-search tools maximize consensus
  • Grid-computing to achieve thorough search

24
Acknowledgements
  • Catherine Fenselau
      • University of Maryland Biochemistry
  • Funding: NIH/NCI, USDA/ARS