TOPP - PowerPoint PPT Presentation

About This Presentation
Title:

TOPP

Description:

Motivation: typical steps in proteome analysis. OpenMS: an open source software ... Inspired by EMBOSS: one application for each frequently used analysis step ...

Slides: 71
Provided by: clemen88
Tags: topp | emboss


Transcript and Presenter's Notes

Title: TOPP


1
  • TOPP - The OpenMS Proteomics Pipeline

2
Outline
  • Motivation: typical steps in proteome analysis
  • OpenMS: an open source software framework for
    LC/MS based proteomics
  • TOPP - The OpenMS Proteomics Pipeline
  • Example applications / use cases

3
Proteomics
Same genome...
...different proteomes!
4
Proteomics
  • Interested in qualitative, quantitative, and
    dynamic aspects of a proteome
  • Very complex biological samples
  • Need for separation
  • Cannot amplify proteins
  • Cannot identify and quantitate proteins directly

5
TOPP Goals
  • Large data volume (> 1 GB per experiment)
  • Complex data, changing experimental designs
  • Limitations of vendor-supplied software
  • Data analysis currently limits the types of
    experiments which can be performed!
  • Goal
  • Bridge the gap between algorithms and
    applications (computer science / biology)
  • Needed: flexible, easy-to-use software enabling
    complex experimental designs

6
Shotgun Proteomics
[Figure: Proteins -> Digestion -> Peptides -> Separation]
  • Idea
  • Cannot analyze proteins directly? Digest
    proteins into peptides
  • Separate peptides
  • Identify and quantitate proteins through their
    peptides

7
Liquid Chromatography
8
Mass Spectrometry
9
Raw HPLC-MS map
[Figure: raw HPLC-MS map with axes rt, m/z and intensity; MS scans shown schematically]
10
Raw HPLC-MS map
[Figure: raw HPLC-MS map with axes rt, m/z and intensity]
11
Raw HPLC-MS map
[Figure: raw HPLC-MS map with axes rt, m/z and intensity; features marked]
12
Raw HPLC-MS map
[Figure: raw HPLC-MS map with axes rt, m/z and intensity; features marked]
But what was it?
13
HPLC ESI QTOF MS ID
[Figure: HPLC -> ESI-QTOF-MS -> MS2 spectrum (intensity i vs. m/z) -> identification: KGFSPDGR]
14
Interpretation of tandem MS spectra
[Figure: annotated tandem MS spectrum, intensity (100-800) vs. m/z (200-1200). Labeled peaks include the y-ion series, b-ions and their water losses (b - H2O), an a2 ion, the doubly protonated precursor, and the m/z values 739.34 and 1120.60; peak spacings are annotated with the residues S, V, I/L, S]
15
Overview / Glossary
  • HPLC: High-performance liquid chromatography.
    Peptides elute at different retention times
    (RT) from a chromatographic column
  • MS: Mass spectrometry. Peptides have different
    mass/charge ratios (m/z) in a mass spectrum
  • MS/MS: Tandem mass spectrometry. Selected ions
    in a mass spectrum can be further fragmented ->
    derived mass spectrum

16

OpenMS: an open source software framework for shotgun
proteomics
  • OpenMS is a C++ library that provides solutions
    for many tasks in proteomics data processing
  • Efficient data structures
  • E.g., support for external memory (1 LC/MS map >
    1 GB)
  • D-dimensional kernel
  • New algorithms for
  • Signal processing of raw MS data
  • Feature finding
  • Superposition
  • Identification
  • Standard file formats, relational database
    support
  • Visualization

17

OpenMS: an open source software framework for shotgun
proteomics
  • Design goals
  • Extensibility
  • Template code allows flexible and efficient reuse
  • Interoperability
  • Import/export of standard MS formats
  • Robustness
  • Black box unit testing
  • Automated test builds on various platforms and
    architectures
  • Usability
  • HTML documentation of all classes
  • Tutorial with examples
  • Consistent coding style

19
TOPP - Motivation
  • OpenMS is ...
  • very powerful
  • only usable for real programmers

20
TOPP design goals
  • Inspired by EMBOSS
  • One application for each frequently used analysis
    step
  • Functionality of OpenMS as easy-to-use standalone
    tools
  • All applications share identical user
    interfaces -> easy integration into workflow
    systems
  • Uses PSI standard formats (mzData, mzXML,
    analysisXML)
  • Can use a common XML configuration file
  • Requirements: only familiarity with UNIX/Linux
    systems
  • Comprehensive HTML documentation (Doxygen)
  • GUI: TOPPView

21
TOPP tools
  • File Handling: FileConverter, FileInfo, FileFilter,
    FileMerger, DTAExtractor
  • Signal Processing: NoiseFilter, BaselineFilter,
    PeakPicker, SpectrumFilter
  • Quantitation: FeatureFinder
  • Identification: RTModel, RTPredict, MascotAdapter,
    InspectAdapter
  • Map Alignment: UnlabeledMatcher, MapMatcher, Dewarper
  • Isotope Labeling: LabeledMatcher
  • Analysis: IDFilter, AdditiveSeries, TOPPView,
    MapStatistics
29
Example Pipeline: Peptide identification using
LC-MS/MS
  • Input: HPLC-MS(-MS) raw data
  • Intermediate steps
  • Output: reliable protein/peptide identifications
30
Example Pipeline: Identification
HPLC-MS(-MS) raw data -> extraction of tandem-MS
spectra -> MS-MS raw data
31
Example Pipeline: Identification
HPLC-MS(-MS) raw data -> MS-MS raw data -> noise
filtering -> smoothed MS-MS raw data
32
Example Pipeline: Identification
HPLC-MS(-MS) raw data -> MS-MS raw data -> smoothed
MS-MS raw data -> peak picking -> MS-MS peak data
33
Peak picking using wavelet techniques
  • Original spectrum
  • Spectrum filtered with Marr wavelet
    (local maxima -> peak centroids)
  • Picked peak spectrum: RT, m/z, intensity, FWHM,
    skew, quality, ...
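
The idea on this slide can be illustrated with a minimal Python sketch. It is not the OpenMS PeakPicker: the kernel width, the response threshold, and the synthetic spectrum are all assumptions chosen for illustration. The spectrum is convolved with a Marr ("Mexican hat") wavelet, and local maxima of the filtered signal are reported as peak positions.

```python
import math

def marr_wavelet(x, scale):
    """Unnormalized Marr ("Mexican hat") wavelet evaluated at offset x."""
    u = x / scale
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def wavelet_filter(intensities, scale, half_width):
    """Convolve the signal with the wavelet; edge values are repeated at the borders."""
    n = len(intensities)
    offsets = range(-half_width, half_width + 1)
    kernel = [marr_wavelet(k, scale) for k in offsets]
    filtered = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # clamp index to the valid range
            acc += intensities[j] * w
        filtered.append(acc)
    return filtered

def pick_peaks(mz, intensities, scale=2.0, half_width=8, min_response=1.0):
    """Return (m/z, raw intensity) at local maxima of the filtered spectrum."""
    filtered = wavelet_filter(intensities, scale, half_width)
    peaks = []
    for i in range(1, len(filtered) - 1):
        is_maximum = filtered[i - 1] < filtered[i] >= filtered[i + 1]
        if is_maximum and filtered[i] > min_response:
            peaks.append((mz[i], intensities[i]))
    return peaks

def gauss(x, mu, sigma, height):
    return height * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Synthetic spectrum (assumed data): two Gaussian peaks on a flat baseline.
mz = [500.0 + 0.05 * i for i in range(200)]
intensities = [5.0 + gauss(x, 502.5, 0.1, 100.0) + gauss(x, 506.0, 0.1, 60.0)
               for x in mz]
peaks = pick_peaks(mz, intensities)
for position, height in peaks:
    print(f"peak at m/z {position:.2f}, intensity {height:.1f}")
```

Because the wavelet integrates to approximately zero, a constant baseline contributes almost nothing to the filtered signal, which is one reason this kind of filter is attractive for peak picking.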
34
Example Pipeline: Identification
HPLC-MS(-MS) raw data -> MS-MS raw data -> smoothed
MS-MS raw data -> MS-MS peak data -> identification
-> protein/peptide identifications
Identification using
  • InSpecT
  • Mascot
  • Sequest

35
Example Pipeline: Identification
HPLC-MS(-MS) raw data -> MS-MS raw data -> smoothed
MS-MS raw data -> MS-MS peak data -> protein/peptide
identifications -> filtering of identifications ->
reliable protein/peptide identifications
42
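Example Pipeline: Identification
  • HPLC-MS(-MS) raw data (mzData) -> FileFilter ->
    MS-MS raw data (mzData)
  • MS-MS raw data (mzData) -> NoiseFilter ->
    smoothed MS-MS raw data (mzData)
  • smoothed MS-MS raw data (mzData) -> PeakPicker ->
    MS-MS peak data (mzData)
  • MS-MS peak data (mzData) -> InspectAdapter ->
    protein/peptide identifications (analysisXML)
  • protein/peptide identifications (analysisXML) ->
    IDFilter -> reliable protein/peptide
    identifications (analysisXML)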
43
Example Pipeline Identification
  • The TOPP tools can be combined into workflows in
    many ways (e.g., Makefiles, shell scripts,
    complex workflow systems)
  • For simplicity, we shall use shell scripts in
    this example.

44
Example Pipeline: Identification
id.sh
    FileFilter -in raw.mzData -out tandem_ms.mzData -level 2
    NoiseFilter -in tandem_ms.mzData -out smoothed.mzData -ini id.ini
    PeakPicker -in smoothed.mzData -out peaks.mzData -ini id.ini
    InspectAdapter -in peaks.mzData -out id.analysisXML -ini id.ini
    IDFilter -in id.analysisXML -out result.analysisXML -ini id.ini
45
Example Pipeline: Identification
id.ini
    <PARAMETERS>
      ...
      <NODE name="InspectAdapter">
        <ITEM name="protease" value="Trypsin" type="string" />
        <ITEM name="PM_tolerance" value="1.0" type="float" />
        <ITEM name="ion_tolerance" value="0.3" type="float" />
      </NODE>
      ...
    </PARAMETERS>
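
To make the shared configuration file more concrete, here is a small Python sketch that reads the InspectAdapter section of an INI file laid out like the one on this slide. It only assumes the PARAMETERS / NODE / ITEM layout shown above; the helper name `read_tool_parameters` and the type table are hypothetical, and this is not how the TOPP tools themselves parse their configuration.

```python
import xml.etree.ElementTree as ET

# INI content as shown on the slide, with the elided parts left out.
INI_TEXT = """
<PARAMETERS>
  <NODE name="InspectAdapter">
    <ITEM name="protease" value="Trypsin" type="string" />
    <ITEM name="PM_tolerance" value="1.0" type="float" />
    <ITEM name="ion_tolerance" value="0.3" type="float" />
  </NODE>
</PARAMETERS>
"""

# Assumed mapping from the "type" attribute to a Python converter.
CONVERTERS = {"string": str, "float": float, "int": int}

def read_tool_parameters(ini_text, tool_name):
    """Collect the ITEM entries below the NODE that belongs to one tool."""
    root = ET.fromstring(ini_text)
    params = {}
    for node in root.iter("NODE"):
        if node.get("name") != tool_name:
            continue
        for item in node.findall("ITEM"):
            convert = CONVERTERS.get(item.get("type"), str)
            params[item.get("name")] = convert(item.get("value"))
    return params

params = read_tool_parameters(INI_TEXT, "InspectAdapter")
print(params)
```

Keeping one section per tool in a single file is what lets every step of id.sh point at the same id.ini.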
46
Myoglobin as diagnostic marker
  • Myoglobin
  • 17 kDa protein
  • stores oxygen in skeletal and heart muscle
  • released in serum after a myocardial infarct
  • important parameter for blood re-circulation
    after thrombolytic therapy
  • healthy people: 30-90 ng/mL
  • diseased: > 100-1000 ng/mL

47
Additive Method
[Figure: intensity plotted against the measurements of the additive series]
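
The additive method on this slide appears to correspond to what analytical chemistry calls standard addition; under that assumption, the calculation can be sketched in Python as follows. Known amounts of the analyte are added to the sample, the measured intensity is regressed on the added concentration, and the unknown concentration is read off as intercept divided by slope. The numbers below are synthetic and do not come from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def standard_addition_estimate(added, signal):
    """Concentration of the unspiked sample: magnitude of the x-intercept."""
    slope, intercept = fit_line(added, signal)
    return intercept / slope

# Synthetic example: a sample with 0.5 units of analyte and a linear
# detector response of 0.3 intensity units per concentration unit.
true_concentration = 0.5
response = 0.3
added = [0.0, 0.5, 1.0, 1.5, 2.0]
signal = [response * (true_concentration + a) for a in added]

estimate = standard_addition_estimate(added, signal)
print(f"estimated concentration: {estimate:.3f}")
```

With noisy measurements the same fit also yields a confidence interval for the estimate, which is omitted here to keep the sketch short.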
49
Example Pipeline: Quantification
quant.sh
    # Pipeline for Myoglobin Absolute Quantitation

    # Find features in all 32 individual maps.
    for i in $(seq 1 32); do
      # Truncate raw data maps (to save time).
      FileFilter -ini AddSeries.ini -n $i
      # Collect peptide features.
      FeatureFinder -ini AddSeries.ini -n $i
    done

    # Star-like matching (31 edges).
    for i in $(seq 1 32); do
      # Map features across different maps.
      UnlabeledMatcher -ini AddSeries.ini -n $i
      MapMatcher -ini AddSeries.ini -n $i
      Dewarper -ini AddSeries.ini -n $i
    done

    # Compute final concentration (lin. regression).
    AdditiveSeries -ini AddSeries.ini
51
Feature Finding in LC/MS raw data
  • Fit a two-dimensional model to selected
    regions of the LC/MS map

52
Raw Map -> Feature Map
Feature Finding reduces the volume of data by
several orders of magnitude
54
Direct differential quantitation
Assign pairs across two or more maps
  • RT of peptide may vary between maps
  • compute suitable mapping by pose clustering
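
As a rough illustration of the voting idea behind pose clustering, the Python sketch below estimates only a constant RT shift between two feature maps. This is a deliberate simplification and an assumption of this example: a realistic mapping would usually also include a scaling term, and the tolerance, bin width, and feature lists are made up. Every pair of features with similar m/z votes for the RT shift it implies, and the most popular shift wins.

```python
from collections import Counter

def estimate_rt_shift(features_a, features_b, mz_tol=0.05, bin_width=1.0):
    """Vote over candidate RT shifts implied by all m/z-compatible pairs.

    Features are (rt, mz) tuples. Returns the centre of the winning
    shift bin, or None if no pair is compatible.
    """
    votes = Counter()
    for rt_a, mz_a in features_a:
        for rt_b, mz_b in features_b:
            if abs(mz_a - mz_b) <= mz_tol:
                votes[round((rt_b - rt_a) / bin_width)] += 1
    if not votes:
        return None
    best_bin, _ = votes.most_common(1)[0]
    return best_bin * bin_width

# Hypothetical feature maps: map_b is map_a shifted by about 12 s, plus
# one unrelated feature in each map that produces a stray vote or none.
map_a = [(100.0, 500.25), (250.0, 622.31), (400.0, 745.40), (610.0, 500.27)]
map_b = [(112.2, 500.25), (261.9, 622.31), (412.1, 745.40), (300.0, 810.00)]

shift = estimate_rt_shift(map_a, map_b)
print(f"estimated RT shift: {shift} s")
```

Wrong pairings scatter their votes across many bins, while correct pairings agree on one transformation, which is why the approach tolerates a fair number of unmatched features.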

56
Results
  • manual
  • Ca. 2 days work
  • automated
  • Ca. 2 min (CompLife '05)

[Figure: T11hu HGATVLTALGGILK with IS T10ho; relative peak area (0.0 to 0.9) vs. concentration in ng/µl (0 to 3)]
  • 0.382 ng/µl (0.31-0.45)
  • 0.48 ng/µl (0.42-0.55)
  • Expected value: 0.47 ng/µl myoglobin
57
Availability
  • GNU Lesser General Public License (LGPL)
  • Currently runs under
  • Linux
  • (Mac OS X)
  • Other platforms will follow
  • TOPP is hosted at SourceForge as part of OpenMS
  • Latest version: 0.95, released for ECCB '06
  • Project web page: www.openms.de

58
Summary
  • TOPP is a set of tools covering a wide range of
    frequently used data analysis steps in LC/MS
    based proteomics
  • Flexible
  • one tool for each different task
  • common interfaces and standard file formats,
    easily integrated into workflow systems
  • built upon OpenMS
  • bridges the gap between algorithm designers and
    experimenters

59
The OpenMS Team
Oliver Kohlbacher
Andreas Bertsch
Nico Pfeifer
Marc Sturm
Instr. Analysis and Bioanalysis, Saarland University
60
that's all, essentially
61
Appendix

62
Peak picking using wavelet techniques
63
Tandem Mass Spectrometry
  • Trap certain ions inside a mass spectrometer
  • Ions are further fragmented, e.g. by collision
    with a noble gas
  • Analyze the derived ions in another mass
    spectrometer
  • -> Tandem MS

64
Tandem Mass Spectra
  • Most frequently observed fragments: b- and y-ions
  • But other bonds can break as well: a-, b-, c-
    and x-, y-, z-ions
  • Side chain reactions ...
  • Neutral losses ...
  • Noise ...
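
For readers who want to see where the b- and y-ion peaks come from, the following Python sketch computes the singly charged b- and y-ion m/z values of a peptide from standard monoisotopic residue masses. It uses the example peptide KGFSPDGR from the identification slide; modifications, higher charge states, and neutral losses are ignored, and the function name is an illustrative choice.

```python
# Monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def fragment_ions(peptide):
    """Return m/z lists of singly charged b- and y-ions (b1.., y1..)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b-ion: first i residues plus one proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y-ion: last i residues plus water plus one proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions

peptide = "KGFSPDGR"
b_ions, y_ions = fragment_ions(peptide)
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:9.4f}    y{i} = {y:9.4f}")
```

Since I and L have the same mass, a spectrum alone cannot tell them apart, which is why sequence read-outs are often written as "I/L".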

65
Tandem Mass Spectra
[Figure: backbone of a four-residue peptide (side chains R1 to R4) with the cleavage sites of the N-terminal fragments a2, a3, b2, b3 and the C-terminal fragments y1, y2, y3, together with neutral-loss variants such as b2 - H2O, b3 - NH3, y2 - NH3 and y3 - H2O]
66
Average isotopic distribution vs. mass of peptide
67
Feature Models: m/z
  • Isotope patterns

68
Feature Models: RT
  • Elution profiles
  • Can be modeled by a normal distribution or a
    skewed distribution (EMG, log-normal,...)
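
The two kinds of elution profile named above can be written down directly. The Python sketch below evaluates a normal profile and an exponentially modified Gaussian (EMG), both normalized to unit area; the parameter values are arbitrary and only serve to show the tailing of the EMG.

```python
import math

def gaussian_profile(t, mu, sigma):
    """Symmetric elution profile: normal density with mean mu and width sigma."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def emg_profile(t, mu, sigma, tau):
    """Exponentially modified Gaussian: a normal density convolved with an
    exponential decay of time constant tau, which produces a tail at late RT."""
    exponent = sigma * sigma / (2.0 * tau * tau) - (t - mu) / tau
    arg = (sigma / tau - (t - mu) / sigma) / math.sqrt(2.0)
    return math.exp(exponent) * math.erfc(arg) / (2.0 * tau)

# Arbitrary example parameters (seconds).
mu, sigma, tau = 100.0, 5.0, 10.0
for t in (90.0, 100.0, 110.0, 130.0):
    print(f"t={t:5.1f}  normal={gaussian_profile(t, mu, sigma):.5f}  "
          f"EMG={emg_profile(t, mu, sigma, tau):.5f}")
```

The mean of the EMG is mu + tau, so a larger tau both delays the centre of mass of the peak and lengthens its tail.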

69
Feature Finding


[Figure: the feature model combines an isotope pattern (m/z dimension) with an elution profile (RT dimension)]
70
Feature finding algorithm
  • The algorithm for feature finding consists of
    four main phases
  • Seeding: choose a starting point
  • Extension: find a surrounding region (maybe too
    large)
  • Modeling: fit a two-dimensional model to the
    data
  • Adjusting: retain only those data points that are
    compatible with the model
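
The four phases can be sketched on a toy intensity grid in Python. Everything specific here is an assumption for illustration: the seed is simply the most intense point, the extension is a flood fill above a fraction of the seed intensity, and the model is a separable two-dimensional Gaussian estimated from intensity-weighted moments rather than the isotope-pattern and elution-profile models described in this presentation.

```python
import math

def find_feature(grid, extension_fraction=0.01, tolerance=0.5):
    rows, cols = len(grid), len(grid[0])

    # 1. Seeding: start from the most intense data point.
    seed = max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda p: grid[p[0]][p[1]])
    seed_intensity = grid[seed[0]][seed[1]]

    # 2. Extension: grow a connected region around the seed (maybe too large).
    threshold = extension_fraction * seed_intensity
    region, stack = {seed}, [seed]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and grid[nr][nc] >= threshold:
                region.add((nr, nc))
                stack.append((nr, nc))

    # 3. Modeling: estimate a separable 2D Gaussian from weighted moments.
    total = sum(grid[r][c] for r, c in region)
    mu_r = sum(r * grid[r][c] for r, c in region) / total
    mu_c = sum(c * grid[r][c] for r, c in region) / total
    var_r = max(sum((r - mu_r) ** 2 * grid[r][c] for r, c in region) / total, 1e-9)
    var_c = max(sum((c - mu_c) ** 2 * grid[r][c] for r, c in region) / total, 1e-9)

    def model(r, c):
        z = (r - mu_r) ** 2 / var_r + (c - mu_c) ** 2 / var_c
        return seed_intensity * math.exp(-0.5 * z)

    # 4. Adjusting: keep only points that are compatible with the model.
    feature = {(r, c) for r, c in region
               if abs(grid[r][c] - model(r, c)) <= tolerance * model(r, c)}
    return {"seed": seed, "center": (mu_r, mu_c), "region": region, "feature": feature}

def synthetic_map(rows=20, cols=25):
    """Toy map: one Gaussian feature plus one interfering data point."""
    grid = [[1000.0 * math.exp(-0.5 * (((r - 10) / 2.0) ** 2 + ((c - 12) / 1.5) ** 2))
             for c in range(cols)] for r in range(rows)]
    grid[10][16] = 300.0  # interfering signal at the edge of the feature
    return grid

result = find_feature(synthetic_map())
print("seed:", result["seed"])
print("region size:", len(result["region"]), "feature size:", len(result["feature"]))
```

In this toy example the interfering point survives the extension phase but is rejected in the adjusting phase, which is the role the slide assigns to the last step.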

71
Pair Matching
  • Derive empirical score from the 2D distance
    distribution of manually annotated pairs (right)
  • Search for matching features for a given feature
    (left, red) in a bounding box (left, blue)
  • Score pairs with respect to p-value
  • Use greedy algorithm to assign pairs
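
The greedy assignment step can be sketched in Python as follows. The scoring function is a hypothetical stand-in: the slide derives an empirical score from manually annotated pairs, whereas `box_score` below simply decreases linearly from 1 to 0 across an assumed bounding box. Only the greedy part reflects the procedure described above.

```python
def box_score(fa, fb, rt_box=30.0, mz_box=0.05):
    """Hypothetical score: 1 for identical positions, 0 at the edge of the
    bounding box, negative (rejected) outside of it. Features are (rt, mz)."""
    d_rt = abs(fa[0] - fb[0]) / rt_box
    d_mz = abs(fa[1] - fb[1]) / mz_box
    if d_rt > 1.0 or d_mz > 1.0:
        return -1.0
    return 1.0 - max(d_rt, d_mz)

def greedy_pairs(features_a, features_b, score=box_score, min_score=0.0):
    """Assign pairs greedily: best-scoring candidates first, each feature used once."""
    candidates = []
    for i, fa in enumerate(features_a):
        for j, fb in enumerate(features_b):
            s = score(fa, fb)
            if s >= min_score:
                candidates.append((s, i, j))
    candidates.sort(reverse=True)

    used_a, used_b, pairs = set(), set(), []
    for s, i, j in candidates:
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return pairs

# Made-up feature lists as (rt, mz) tuples.
features_a = [(100.0, 500.25), (200.0, 600.30)]
features_b = [(205.0, 600.31), (103.0, 500.26), (120.0, 500.25)]

pairs = greedy_pairs(features_a, features_b)
print(sorted(pairs))
```

A greedy assignment is not guaranteed to maximize the total score, but it is simple and fast, and it never assigns the same feature to two partners.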