Novel Peptide Identification using ESTs and Sequence Database Compression

Transcript and Presenter's Notes

Title: Novel Peptide Identification using ESTs and Sequence Database Compression

1
Novel Peptide Identification using ESTs and
Sequence Database Compression

Nathan Edwards
Center for Bioinformatics and Computational
Biology
University of Maryland, College Park

2
What is missing from protein sequence databases?

Known coding SNPs
Novel coding mutations
Alternative splicing isoforms
Alternative translation start-sites
Microexons
Alternative translation frames

3
Why don't we see more novel peptides?

Tandem mass spectrometry doesn't discriminate
against novel peptides... but protein
sequence databases do!
Searching traditional protein sequence databases
biases the results towards well-understood
protein isoforms!

4
Novel Splice Isoform
5
Novel Splice Isoform
6
Novel Mutation
Ala2→Pro associated with familial amyloid
polyneuropathy
7
Novel Mutation
8
Searching ESTs

Proposed long ago
Yates, Eng, and McCormack, Anal Chem, '95.
Now
Protein sequences are sufficient for protein
identification
Computationally expensive/infeasible
Difficult to interpret
Make EST searching feasible for routine searching
to discover novel peptides.

9
Searching Expressed Sequence Tags (ESTs)

Pros
No introns!
Primary splicing evidence for annotation
pipelines
Evidence for dbSNP
Often derived from clinical cancer samples

Cons
No frame
Large (8Gb)
Untrusted by annotation pipelines
Highly redundant
Nucleotide error rate ~1%

10
Other Search Strategies

Genome Corrected ESTs
Large (2Gb)
Controls for nucleotide error rate
Polymorphism lost, potential errors introduced
Genome Clustered ESTs
Small, Gene model
Convergence to well-understood isoforms
Controls nucleotide error rate
Full-Length mRNAs
Incomplete gene coverage, most are already in
IPI

11
Other Search Strategies

Genome
Large (6Gb), lots of non-coding DNA
Find novel ORFs, no sampling bias
Miss spliced peptide sequences.
Genscan Exons
Small, find novel ORFs.
Miss spliced peptide sequences.
How should we interpret peptide identifications
with no mRNA evidence?

12
Compressed EST Peptide Sequence Database

For all ESTs mapped to a UniGene gene
Six-frame translation
Eliminate ORFs < 30 amino-acids
Eliminate amino-acid 30-mers observed once
Compress to C2 FASTA database
Complete, Correct for amino-acid 30-mers
Gene-centric peptide sequence database
Size < 3% of naïve enumeration, 20774 FASTA
entries
Running time ~1% of naïve enumeration search
E-values ~2% of naïve enumeration search results
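
The first three steps on this slide can be sketched in Python as follows. This is only an illustration, not the author's pipeline: the function names are hypothetical, an "ORF" is taken here to mean any stop-free stretch of a translation (an assumption), and the standard genetic code is used. With the slide's parameters one would call orfs(est, 30) and repeated_kmers(peptides, 30).

```python
from collections import Counter

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + l]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for l, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def six_frame(dna):
    """Yield the six translations of dna; '*' is a stop, 'X' an unknown codon."""
    for strand in (dna, dna.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            codons = (strand[i:i + 3] for i in range(frame, len(strand) - 2, 3))
            yield "".join(CODON.get(c, "X") for c in codons)

def orfs(dna, min_len):
    """Stop-free stretches of at least min_len residues from all six frames."""
    return [p for t in six_frame(dna) for p in t.split("*") if len(p) >= min_len]

def repeated_kmers(peptides, k):
    """Amino-acid k-mers observed at least twice across the peptides."""
    counts = Counter(p[i:i + k] for p in peptides for i in range(len(p) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= 2}
```

Dropping k-mers seen only once is what controls for the EST nucleotide error rate: a sequencing error is unlikely to produce the same 30-mer in two independent ESTs.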

14
SBH-graph
ACDEFGI, ACDEFACG, DEFGEFGI
15
Compressed SBH-graph
ACDEFGI, ACDEFACG, DEFGEFGI
16
Sequence Databases CSBH-graphs

Original sequences correspond to paths

ACDEFGI, ACDEFACG, DEFGEFGI
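
A minimal sketch of the uncompressed SBH-graph for the three example sequences, assuming k = 4 (the slides do not state k for this toy example; 4 is the value under which the later enumeration examples check out). Each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, so every input sequence spells a path. Function names are illustrative only.

```python
from collections import Counter

def sbh_graph(seqs, k):
    """Map each edge (prefix, suffix) of a k-mer to its number of occurrences."""
    edges = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges

def is_path(seq, edges, k):
    """True if every k-mer of seq is an edge, i.e. seq spells a walk in the graph."""
    return all((seq[i:i + k - 1], seq[i + 1:i + k]) in edges
               for i in range(len(seq) - k + 1))

seqs = ["ACDEFGI", "ACDEFACG", "DEFGEFGI"]
graph = sbh_graph(seqs, 4)
```

Strings that were never in the input, such as ACDEFGEFGI, can also be paths in the graph; this is what the enumeration slides later exploit.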
17
Sequence Databases CSBH-graphs

All k-mers represented by an edge have the same
count

[Figure: CSBH-graph with edge counts 1, 2, 2, 1, 2]
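
The same-count property can be checked with a small sketch. Assumptions, none of them taken from the slides: k = 4; a node is treated as non-trivial if its in-degree or out-degree differs from 1 or if some input sequence starts or ends there; cycles made up only of trivial nodes are ignored. Chains of trivial nodes are merged into one compressed edge, and the assertion verifies that all k-mers on that edge share one count.

```python
from collections import Counter, defaultdict

def compressed_edges(seqs, k):
    """Return [(edge_string, count)] for the compressed graph."""
    count = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    succ, indeg = defaultdict(list), Counter()
    for kmer in sorted(count):
        succ[kmer[:-1]].append(kmer[1:])
        indeg[kmer[1:]] += 1
    # Assumption: sequence start and end nodes are kept as non-trivial.
    ends = {s[:k - 1] for s in seqs} | {s[-(k - 1):] for s in seqs}
    nodes = set(succ) | set(indeg)
    nontrivial = {n for n in nodes
                  if n in ends or indeg[n] != 1 or len(succ[n]) != 1}
    result = []
    for start in sorted(nontrivial):
        for nxt in succ[start]:
            label, node = start + nxt[-1], nxt
            counts = [count[label]]
            while node not in nontrivial:      # follow the chain of trivial nodes
                step = succ[node][0]
                counts.append(count[node + step[-1]])
                label, node = label + step[-1], step
            assert len(set(counts)) == 1       # all k-mers on the edge agree
            result.append((label, counts[0]))
    return result

edges = compressed_edges(["ACDEFGI", "ACDEFACG", "DEFGEFGI"], 4)
```

For the example sequences this gives five compressed edges with counts 2, 1, 2, 1, 2, so the k-mers that occur at least twice can be read off the edges with count 2 without counting each k-mer separately.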
18
CSBH-graphs

Quickly determine which k-mers occur at least
twice

[Figure: CSBH-graph with edge counts 2, 2, 1, 2]
19
de Bruijn Sequences

de Bruijn sequences represent all words of length
k from some alphabet A.
A = {0,1}, k = 3: s = 0001110100
A = {0,1}, k = 4: s = 0000111101011001000
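
Both example strings can be checked with a few lines of Python. The helper name is hypothetical; it treats s as a linear (not cyclic) string, which is why the examples have length |A|^k + k - 1.

```python
from itertools import product

def is_de_bruijn(s, alphabet, k):
    """True if the linear string s contains every length-k word exactly once."""
    words = [s[i:i + k] for i in range(len(s) - k + 1)]
    expected = {"".join(w) for w in product(alphabet, repeat=k)}
    return len(words) == len(expected) and set(words) == expected
```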

20
de Bruijn Graph: A = {0,1}, k = 4
[Figure: de Bruijn graph with edges labeled 0 or 1]
21
Correct, Complete, Compact (C3) Enumeration

Set of paths that use each edge exactly once

ACDEFGEFGI, DEFACG
22
Correct, Complete (C2) Enumeration

Set of paths that use each edge at least once

ACDEFGEFGI, DEFACG
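
The three properties can be tested directly on k-mer sets. The sketch below assumes k = 4 and uses hypothetical function names: "correct" means the enumeration contains no k-mer absent from the input, "complete" means every input k-mer is present, and "compact" means no k-mer appears more than once.

```python
from collections import Counter

def kmer_counts(seqs, k):
    return Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))

def check_enumeration(originals, enumeration, k):
    """Return (correct, complete, compact) for a candidate enumeration."""
    source = set(kmer_counts(originals, k))
    found = kmer_counts(enumeration, k)
    correct = set(found) <= source                  # nothing invented
    complete = source <= set(found)                 # nothing lost
    compact = all(n == 1 for n in found.values())   # nothing repeated
    return correct, complete, compact

originals = ["ACDEFGI", "ACDEFACG", "DEFGEFGI"]
```

Under this assumption the pair ACDEFGEFGI, DEFACG satisfies all three properties, while the original three sequences are correct and complete but not compact, because ACDE, CDEF, DEFG and EFGI each occur twice.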
23
Patching the CSBH-graph

Use artificial edges to fix unbalanced nodes
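
One way to picture the patching step is the textbook construction below. It is a sketch under assumptions, not the talk's implementation: it works on the uncompressed graph with k = 4, and it routes every artificial edge through a single hypothetical hub node. Each node with more outgoing than incoming edges receives artificial edges from the hub, each node with the opposite imbalance sends artificial edges to the hub, and an Eulerian circuit of the patched graph is cut at the hub to give the paths.

```python
from collections import defaultdict

HUB = None  # artificial node; every artificial edge starts or ends here

def euler_circuit(out_edges, start):
    """Hierholzer's algorithm; consumes the edge lists as it walks."""
    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            stack.append(out_edges[node].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]

def spell(nodes):
    return nodes[0] + "".join(n[-1] for n in nodes[1:])

def c3_enumeration(seqs, k):
    kmers = sorted({s[i:i + k] for s in seqs for i in range(len(s) - k + 1)})
    out_edges, balance = defaultdict(list), defaultdict(int)
    for kmer in kmers:
        out_edges[kmer[:-1]].append(kmer[1:])
        balance[kmer[:-1]] += 1      # balance = out-degree minus in-degree
        balance[kmer[1:]] -= 1
    for node, b in balance.items():
        if b > 0:
            out_edges[HUB].extend([node] * b)     # hub -> start of a path
        elif b < 0:
            out_edges[node].extend([HUB] * -b)    # end of a path -> hub
    paths, current = [], []
    for node in euler_circuit(out_edges, HUB):
        if node is HUB:
            if current:
                paths.append(spell(current))
            current = []
        else:
            current.append(node)
    # Balanced components never touch the hub; emit each as a closed walk.
    for node in list(out_edges):
        if out_edges[node]:
            paths.append(spell(euler_circuit(out_edges, node)))
    return paths

paths = c3_enumeration(["ACDEFGI", "ACDEFACG", "DEFGEFGI"], 4)
```

For the example sequences this yields two strings with 16 characters in total. The particular pair depends on traversal order and need not be the pair shown on the enumeration slides, but the number of strings and the total length are the same.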

24
Patching the CSBH-graph

Use matching-style formulations to choose
artificial edges
Optimal C2/C3 enumeration in polynomial time.
Chinese Postman Problem
Edmonds and Johnson, '73
l-tuple DNA sequencing
Pevzner, '89
Shortest (Common) Superstring
MAX-SNP-hard, 2.5-approximation algorithm

25
C3 Enumeration
[Figure: unbalanced nodes labeled in-out; artificial edge labeled Cost k]
26
C3 Enumeration
[Figure: unbalanced nodes labeled in-out; artificial edges labeled Cost 0, Cost 0, Cost k]
27
Reusing Edges

ACDEHAC, ACDFHAC, ACDGHACD

28
Reusing Edges

C3: ACDEHACDFHAC, ACDGHACD

29
Reusing Edges

C2: ACDEHACDFHACDGHAC
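
The saving from reusing an edge can be checked numerically. The sketch assumes k = 4, which the slides do not state, and measures database size simply as the total number of residues; the variable names are illustrative.

```python
from collections import Counter

def kmer_counts(seqs, k):
    return Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))

def total_residues(entries):
    return sum(len(e) for e in entries)

K = 4  # assumed k-mer length
originals = ["ACDEHAC", "ACDFHAC", "ACDGHACD"]
c3 = ["ACDEHACDFHAC", "ACDGHACD"]      # enumeration shown on the C3 slide
c2 = ["ACDEHACDFHACDGHAC"]             # enumeration shown on the C2 slide

c2_counts = kmer_counts(c2, K)
reused = sorted(kmer for kmer, n in c2_counts.items() if n > 1)
sizes = [total_residues(x) for x in (originals, c3, c2)]
```

Under this assumption the C2 string contains exactly the 4-mers of the originals, traverses HACD twice, and needs 17 residues in one entry, against 20 residues in two entries for the C3 pair and 22 residues in three entries for the originals.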

30
C2 Enumeration
[Figure: unbalanced nodes labeled in-out; shortcut paths; edge labels 4, 10, 7]
31
Implementation

CSBH-graph construction
Determine non-trivial nodes directly
Consecutive non-trivial nodes determine edges
C3/C2 enumeration
C3: Trivial assignment of artificial edges
C2: Depth-first search, Goldberg's CS2
min cost flow code
Eulerian path algorithm
Can be applied to entire EST database
Condor grid and PBS cluster for CSBH-graph
construction
Large memory machine for C3/C2 enumeration

32
Conclusions

Peptides identify more than just proteins
Compressed peptide sequence databases make
routine EST searching feasible
Currently available for download
Can include other sources of peptide sequence at
little additional cost.
CSBH-graph edge counts, C2/C3 enumeration
algorithms
Minimal FASTA representation of k-mer sets

33
Acknowledgements

Chau-Wen Tseng, Xue Wu
UMCP Computer Science
Catherine Fenselau, Crystal Harvey
UMCP Biochemistry
Calibrant Biosystems
PeptideAtlas, HUPO PPP, X!Tandem
Funding: National Cancer Institute
