Novel Peptide Identification using ESTs and Genomic Sequence

Description:

Title: Faster, More Sensitive Peptide ID by Sequence DB Compression
Last modified by: Nathan John Edwards
Created Date: 12/6/2004 12:44:14 AM
Document presentation format: PowerPoint PPT presentation

Slides: 27
Provided by: edwardsla

Transcript and Presenter's Notes


1
Novel Peptide Identification using ESTs and
Genomic Sequence
  • Nathan Edwards
  • Center for Bioinformatics and Computational
    Biology
  • University of Maryland, College Park

2
Sample Preparation for Peptide Identification
3
Mass Spectrometer
  • Electron Multiplier (EM)
  • Time-of-Flight (TOF)
  • Quadrupole
  • Ion Trap
  • MALDI
  • Electrospray Ionization (ESI)

4
Single Stage MS
[Figure: MS spectrum, m/z axis]
5
Tandem Mass Spectrometry (MS/MS)
[Figure: precursor selection between two spectra, each with an m/z axis]
6
Tandem Mass Spectrometry (MS/MS)
Precursor selection and collision-induced dissociation (CID)
[Figure: MS spectrum and MS/MS spectrum, each with an m/z axis]
7
Peptide Identification
  • For each (likely) peptide sequence
  • 1. Compute fragment masses
  • 2. Compare with spectrum
  • 3. Retain those that match well
  • Peptide sequences from protein sequence databases
  • Swiss-Prot, IPI, NCBI's nr, ...
  • Automated, high-throughput peptide identification
    in complex mixtures
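The three steps above can be sketched in a few lines. This is a minimal illustration, not the scoring used by any particular search engine: it assumes singly charged b- and y-ions, standard monoisotopic residue masses, and a crude "count matched peaks" score. The function names (`fragment_mz`, `count_matches`, `rank`) and the 0.5 Da tolerance are hypothetical choices made for this example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_mz(peptide):
    """Step 1: singly charged b- and y-ion m/z values for a peptide."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for m in masses[:-1]:            # prefixes (b-ions)
        total += m
        b_ions.append(total + PROTON)
    total = 0.0
    for m in reversed(masses[1:]):   # suffixes (y-ions)
        total += m
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


def count_matches(peptide, peaks, tol=0.5):
    """Step 2: number of predicted fragments within tol of an observed peak."""
    b_ions, y_ions = fragment_mz(peptide)
    return sum(any(abs(f - p) <= tol for p in peaks) for f in b_ions + y_ions)


def rank(candidates, peaks, tol=0.5):
    """Step 3: order candidate peptides by how well they match the spectrum."""
    return sorted(candidates, key=lambda p: count_matches(p, peaks, tol),
                  reverse=True)
```

For example, `rank(["SAG", "GAS"], [58.03, 177.09])` places `"GAS"` first, because those two peaks agree with its b1 and y2 ions. Real search engines use far more careful scoring; the point here is only the shape of the loop over candidate sequences.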

8
What goes missing?
  • Known coding SNPs
  • Novel coding mutations
  • Alternative splicing isoforms
  • Alternative translation start-sites
  • Microexons
  • Alternative translation frames

9
Why should we care?
  • Alternative splicing is the norm!
  • Only 20-25K human genes
  • Each gene makes many proteins
  • Proteins have clinical implications
  • Biomarker discovery
  • Evidence for SNPs and alternative splicing stops
    with transcription
  • Genomic assays, ESTs, mRNA sequence.
  • Little hard evidence for translation start site

10
Novel Splice Isoform
11
Novel Splice Isoform
12
Novel Frame
13
Novel Frame
14
Novel Mutation
Ala2→Pro, associated with familial amyloid
polyneuropathy
15
Novel Mutation
16
Genomic Peptide Sequences
  • Genomic DNA
  • Exons and introns, 6 frames, large (3 Gb → 6 Gb)
  • ESTs
  • No introns, 6 frames, large (4 Gb → 8 Gb)
  • Used by gene, protein, and alternative splicing
    annotation pipelines
  • Highly redundant, nucleotide error rate ~1%
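The "6 frames" in the bullets above refers to six-frame translation: three reading frames on each strand. A minimal sketch follows, assuming the standard genetic code and writing stop codons as `*`; the helper names are hypothetical and not taken from the talk.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate frame 0 of dna; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]
```

Each strand contributes about one residue per nucleotide across its three frames, so six frames yield roughly two residues per nucleotide. That is consistent with the sizes quoted on the slide (3 Gb of sequence becoming about 6 Gb of translated sequence).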

17
Compressed EST Database
  • Six-frame translation of all ESTs
  • Optionally, ESTs that map to a gene
  • Eliminate ORFs < 30 amino acids
  • Amino-acid 30-mers
  • Observed in at least two ESTs
  • Represent AA 30-mers in C3 FASTA database
  • Complete, Correct, Compact
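The filtering steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the talk: an ORF is taken to be any stretch between stop codons (`*`) in a translated frame, and each EST is assumed to arrive as a list of already translated frames. The names `orfs` and `shared_kmers` are hypothetical.

```python
from collections import Counter


def orfs(translated_frame, min_len=30):
    """Split a translated frame at stop codons and drop short pieces."""
    return [piece for piece in translated_frame.split("*")
            if len(piece) >= min_len]


def shared_kmers(est_translations, k=30, min_ests=2):
    """Amino-acid k-mers observed in at least min_ests different ESTs.

    est_translations: one list of translated frames per EST.
    """
    support = Counter()
    for frames in est_translations:
        seen = set()                      # count each k-mer once per EST
        for frame in frames:
            for orf in orfs(frame, min_len=k):
                for i in range(len(orf) - k + 1):
                    seen.add(orf[i:i + k])
        support.update(seen)
    return {kmer for kmer, n in support.items() if n >= min_ests}
```

Requiring support from two ESTs is what suppresses k-mers created by isolated sequencing errors, since the same error is unlikely to recur in an independent read.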

18
SBH-graph
ACDEFGI, ACDEFACG, DEFGEFGI
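A sketch of how such a graph can be built for the three example sequences, assuming the SBH-graph is the usual de Bruijn-style construction in which each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The slide does not state k; k = 4 is an assumption made for this example, and `sbh_graph` is a hypothetical name.

```python
from collections import Counter, defaultdict


def sbh_graph(seqs, k):
    """Return (edge_counts, adjacency) for the k-mers of seqs.

    edge_counts maps each k-mer (edge) to its number of occurrences;
    adjacency maps each (k-1)-mer node to the set of nodes it points to.
    """
    edge_counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            edge_counts[s[i:i + k]] += 1
    adjacency = defaultdict(set)
    for kmer in edge_counts:
        adjacency[kmer[:-1]].add(kmer[1:])
    return edge_counts, dict(adjacency)


# k = 4 is an assumption; the slide does not state the value used.
edge_counts, adjacency = sbh_graph(["ACDEFGI", "ACDEFACG", "DEFGEFGI"], k=4)
```

With these inputs the graph has ten distinct 4-mer edges, and the nodes `DEF` and `EFG` are the branch points where the three sequences diverge.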
19
Compressed SBH-graph
ACDEFGI, ACDEFACG, DEFGEFGI
20
Sequence Databases & cSBH-graphs
  • Original sequences correspond to paths

ACDEFGI, ACDEFACG, DEFGEFGI
21
Sequence Databases & cSBH-graphs
  • All k-mers represented by an edge have the same
    count
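To illustrate the claim, the sketch below collapses non-branching paths of the k-mer graph into single edges and reports the k-mer counts along each one. It assumes k = 4 (not stated on the slides) and the de Bruijn-style construction; `compress` is a hypothetical helper, and cycles that contain no branching node are not handled.

```python
from collections import Counter, defaultdict


def kmer_counts(seqs, k):
    """Occurrence count of every k-mer in seqs."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def compress(counts):
    """Collapse non-branching paths; return [(edge_label, [kmer counts])]."""
    out_edges = defaultdict(list)
    in_degree = Counter()
    for kmer in counts:
        out_edges[kmer[:-1]].append(kmer)
        in_degree[kmer[1:]] += 1

    def internal(node):
        # Exactly one edge in and one edge out: safe to merge through.
        return in_degree[node] == 1 and len(out_edges.get(node, [])) == 1

    paths = []
    for node in sorted(set(out_edges) | set(in_degree)):
        if internal(node):
            continue
        for kmer in sorted(out_edges.get(node, [])):
            label, path_counts = kmer, [counts[kmer]]
            nxt = kmer[1:]
            while internal(nxt):
                kmer = out_edges[nxt][0]
                label += kmer[-1]
                path_counts.append(counts[kmer])
                nxt = kmer[1:]
            paths.append((label, path_counts))
    return paths


paths = compress(kmer_counts(["ACDEFGI", "ACDEFACG", "DEFGEFGI"], k=4))
for label, path_counts in paths:
    print(label, path_counts)
```

For the example sequences this gives five compressed edges, three with count 2 and two with count 1, and within each edge every k-mer carries the same count.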

[Figure: cSBH-graph with edge counts 1, 2, 2, 1, 2]
22
cSBH-graphs
  • Quickly determine those that occur twice
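A small sketch of this selection step: keep the k-mers that occur at least twice, then chain overlapping survivors into as few sequences as possible. It assumes k = 4 (not stated on the slides) and counts total occurrences rather than distinct source sequences; the function names are hypothetical, and chains that form a closed cycle with no start are not emitted.

```python
from collections import Counter


def kmers_seen_twice(seqs, k):
    """k-mers with at least two occurrences across seqs."""
    counts = Counter(s[i:i + k] for s in seqs
                     for i in range(len(s) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= 2}


def chain(kmers):
    """Merge k-mers that overlap unambiguously into longer sequences."""
    by_prefix, by_suffix = {}, {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)
        by_suffix.setdefault(kmer[1:], []).append(kmer)

    def unique_link(node):
        # The shared (k-1)-mer must have exactly one k-mer on each side.
        before = by_suffix.get(node, [])
        after = by_prefix.get(node, [])
        if len(before) == 1 and len(after) == 1:
            return before[0], after[0]
        return None

    sequences = []
    for kmer in sorted(kmers):
        if unique_link(kmer[:-1]) is not None:
            continue                      # something chains into this k-mer
        seq, current = kmer, kmer
        link = unique_link(current[1:])
        while link is not None:
            current = link[1]
            seq += current[-1]
            link = unique_link(current[1:])
        sequences.append(seq)
    return sequences


kept = kmers_seen_twice(["ACDEFGI", "ACDEFACG", "DEFGEFGI"], k=4)
print(chain(kept))
```

On the example input the four surviving 4-mers (ACDE, CDEF, DEFG, EFGI) chain into the single sequence ACDEFGI.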

[Figure: cSBH-graph with edge counts 2, 2, 1, 2]
23
Compressed-SBH-graph
[Figure: compressed SBH-graph with edge counts 2, 2, 1, 2 and the sequence ACDEFGI]
24
Compressed EST Database
  • Gene-centric compressed EST peptide sequence
    database
  • 20,774 sequence entries
  • 8 Gb vs 223 Mb
  • 35-fold compression
  • 22 hours becomes 15 minutes
  • E-values improve by similar factor!
  • Makes routine EST searching feasible
  • Search ESTs instead of IPI?

25
Conclusions
  • Peptides identify more than just proteins
  • Compressed peptide sequence databases make
    routine EST searching feasible
  • cSBH-graphs plus edge counts give C2/C3 enumeration
    algorithms
  • Minimal FASTA representation of k-mer sets

26
Collaborators
  • Chau-Wen Tseng, Xue Wu
  • Computer Science
  • Catherine Fenselau, Crystal Harvey
  • Biochemistry
  • Calibrant Biosystems
  • Thanks to PeptideAtlas, X!Tandem