Geneva Artificial Intelligence Laboratory. Centre Universitaire d'Informatique ...

1
On Preprocessing SELDI-MS and its Evaluation
  • Julien Prados, Alexandros Kalousis, Melanie
    Hilario

2
Introduction
  • Context
  • diagnosis and biomarker extraction from SELDI-TOF
    mass spectra
  • Issues
  • preprocess mass spectra
  • tune preprocessing parameters
  • evaluate preprocessing quality

3
Typical Work Flow
  • Preprocessing
  • Baseline Removal
  • Normalisation (TIC)
  • Noise Estimation/Elimination
  • Peak Detection
  • Peak alignment

  • Machine Learning
  • Input: spectra with control/diseased labels
  • Preprocessing output: a learning dataset
  • Results: a list of discriminant features and a classification model
    for predicting a patient's state from their mass spectrum

4
Peak Definition
  • Valley definition for a point p: the lowest points to the left and
    right of p that can be reached without crossing a point whose
    intensity is higher than the intensity of p.
  • Peak definition: a spectrum point p is considered a peak if its left
    and right valleys are both deeper than the noise level.
  • Remark
  • No assumption on peak width
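
The valley and peak definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the plain-list input, and the way `noise_level` is supplied are assumptions, and the slides do not say how the noise level is estimated or how ties on a plateau are handled.

```python
def valley_depths(y, i):
    """Depth of the left and right valleys of point i (hypothetical helper).

    On each side we walk away from i until a point strictly higher than
    y[i] is met, and record the lowest intensity seen on the way.
    """
    def depth(indices):
        lowest = y[i]
        for j in indices:
            if y[j] > y[i]:  # a higher point ends the valley search
                break
            lowest = min(lowest, y[j])
        return y[i] - lowest

    return depth(range(i - 1, -1, -1)), depth(range(i + 1, len(y)))


def detect_peaks(y, noise_level):
    """Indices whose left and right valleys are both deeper than noise_level."""
    peaks = []
    for i in range(len(y)):
        left, right = valley_depths(y, i)
        if left > noise_level and right > noise_level:
            peaks.append(i)
    return peaks


spectrum = [0, 1, 5, 1, 2, 1, 0, 4, 0]
strict = detect_peaks(spectrum, noise_level=1.5)
lenient = detect_peaks(spectrum, noise_level=0.5)
```

With the toy spectrum, the small bump at index 4 is kept only with the lower threshold, which is the kind of trade-off the peak detection parameter controls. The scan is quadratic in the worst case, and no peak width enters the test, in line with the remark above.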

5
Peak Detection
6
Visual Choice of Peak Detection Parameter
  • We used a peak detection parameter of 2.5

7
Peak Area
  • p: fixed point found by peak detection
  • The signal is split into regions at the minima between consecutive
    peaks
  • In each region, pl and pr are found by least-squares fitting of a
    piecewise linear model with two segments (horizontal and oblique)
  • The area of the peak is the area of the triangle (pl, p, pr)
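
One possible reading of the peak area computation is sketched below. The slides do not give the exact parameterisation of the two-segment model, so several choices here are assumptions: the horizontal level is the mean of the flat part, the oblique segment is forced through the apex p, and the breakpoint is searched exhaustively over sample positions. `fit_foot` and `peak_area` are hypothetical names.

```python
import numpy as np


def fit_foot(x, y):
    """Fit a horizontal-then-oblique model; the last sample is the apex.

    Returns the breakpoint (x_k, h) that minimises the squared error.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(x))
    best = (np.inf, x[0], y[0])
    for k in range(len(x) - 1):
        h = y[:k + 1].mean()                      # level of the flat segment
        slope = (y[-1] - h) / (x[-1] - x[k])      # oblique segment ends at the apex
        model = np.where(idx <= k, h, h + slope * (x - x[k]))
        sse = float(((y - model) ** 2).sum())
        if sse < best[0]:
            best = (sse, x[k], h)
    return best[1], best[2]


def peak_area(x, y, left, apex, right):
    """Area of the triangle (pl, p, pr) inside the region [left, right]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pl = fit_foot(x[left:apex + 1], y[left:apex + 1])
    # The right side is reversed so that the apex is again the last sample.
    pr = fit_foot(x[apex:right + 1][::-1], y[apex:right + 1][::-1])
    p = (x[apex], y[apex])
    return 0.5 * abs(pl[0] * (p[1] - pr[1])
                     + p[0] * (pr[1] - pl[1])
                     + pr[0] * (pl[1] - p[1]))


mz = np.arange(11)
intensity = np.array([0, 0, 0, 0, 3, 6, 3, 0, 0, 0, 0])
area = peak_area(mz, intensity, left=0, apex=5, right=10)
```

On this noise-free triangular peak the fit recovers the feet at 3 and 7, so the area is that of a triangle with base 4 and height 6.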

8
Peak Alignment and Missing Values
  • Peak alignment is performed by hierarchical
    clustering (closest peaks are merged)
  • Two strategies for missing values
  • set missing values to zero because there is no
    peak
  • retrieve the signal intensity (not obvious for peak
    area)
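
The alignment and the two missing-value strategies can be sketched as follows. This is an illustrative sketch under stated assumptions: the slides do not give the linkage or the stopping rule, so centroid distance and a `max_gap` threshold are assumed here, and `align_peaks`, `feature_matrix` and the `signal` callback are hypothetical names.

```python
def center(cluster):
    """Mean m/z of a cluster of (mz, spectrum_id, intensity) tuples."""
    return sum(p[0] for p in cluster) / len(cluster)


def align_peaks(peaks, max_gap):
    """Agglomerative clustering on m/z: repeatedly merge the two closest
    clusters until the closest pair is more than max_gap apart."""
    clusters = [[p] for p in sorted(peaks)]
    while len(clusters) > 1:
        centers = [center(c) for c in clusters]
        gaps = [b - a for a, b in zip(centers, centers[1:])]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > max_gap:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def feature_matrix(clusters, n_spectra, signal=None):
    """One row per spectrum, one column per aligned peak.

    signal=None       -> zero filling of missing values
    signal=callable   -> signal(spectrum_id, mz) supplies the raw intensity
    """
    rows = [[0.0] * len(clusters) for _ in range(n_spectra)]
    for j, cluster in enumerate(clusters):
        seen = set()
        for _, s, value in cluster:
            rows[s][j] = max(rows[s][j], value)
            seen.add(s)
        if signal is not None:
            for s in range(n_spectra):
                if s not in seen:
                    rows[s][j] = signal(s, center(cluster))
    return rows


peaks = [(1000.0, 0, 5.0), (1000.4, 1, 6.0), (2000.0, 0, 3.0)]
clusters = align_peaks(peaks, max_gap=1.0)
zero_filled = feature_matrix(clusters, n_spectra=2)
signal_filled = feature_matrix(clusters, n_spectra=2, signal=lambda s, mz: 0.7)
```

The constant returned by the lambda stands in for a lookup in the raw spectrum. For peak areas there is no equally direct lookup, which is presumably why the slide calls that case "not obvious".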

9
Data Representation Evaluation
  • Error evaluation of 3 classification algorithms
  • Instance-Based Learning (IBk)
  • Decision Tree (J48)
  • Support Vector Machine (SMO)
  • On 3 data representations
  • peak intensity, signal intensity for missing
    values (is)
  • peak intensity, zero filling of missing values
    (iz)
  • peak area, zero filling of missing values (az)
  • For 3 datasets
  • Stroke (Stk)
  • Prostate Cancer (Pro)
  • Ovarian Cancer (Ova)
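
The evaluation grid above could be reproduced along the following lines. IBk, J48 and SMO are Weka learners. The scikit-learn estimators used here are stand-ins chosen for illustration (in particular, `DecisionTreeClassifier` is CART, not C4.5), and cross-validation is an assumption, since the slides do not state how the error was estimated. The data below is synthetic and only checks that the code runs.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_classifiers():
    """scikit-learn stand-ins for the three Weka learners (assumption)."""
    return {
        "IBk": KNeighborsClassifier(n_neighbors=1),
        "J48": DecisionTreeClassifier(random_state=0),
        "SMO": SVC(kernel="linear"),
    }


def error_table(representations, labels, folds=10):
    """Cross-validated error for every (representation, classifier) pair."""
    table = {}
    for rep_name, features in representations.items():
        for clf_name, clf in make_classifiers().items():
            accuracy = cross_val_score(clf, features, labels, cv=folds).mean()
            table[(rep_name, clf_name)] = 1.0 - accuracy
    return table


# Synthetic two-class data, not taken from the presentation.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)
toy_is = rng.normal(size=(60, 5)) + labels[:, None] * 2.0
toy_iz = np.where(toy_is > 0.5, toy_is, 0.0)  # crude imitation of zero filling
errors = error_table({"is": toy_is, "iz": toy_iz}, labels, folds=5)
```

Running the same function once per dataset (Stk, Pro, Ova) would give the full 3 x 3 x 3 grid.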

10
Data Representation Evaluation: Results
  • is/iz → filling missing values with signal
    intensity instead of zeroes retains more
    discriminatory information
  • iz/az → using area or intensity does not result
    in significant differences
  • is/raw → no significant information is lost in
    preprocessing, and the representation becomes
    much more compact

11
Influence of Peak Detection Parameter
  • With the is representation and the SMO algorithm,
    what is the influence of the peak detection parameter
    on the information content of the preprocessed
    datasets?

12
Conclusion
  • We proposed a preprocessing pipeline with a new
    peak detection algorithm and a way of computing
    peak areas.
  • The preprocessing has been applied to three
    datasets without significant information loss,
    producing a much more compact representation.
  • Classification performance depends heavily on
    parameter tuning, which should be done in an
    informed manner
  • manual selection
  • automatic selection

13
Perspectives
  • Preprocessing pipeline used in a reproducibility
    study of MALDI-TOF MS: Zeferos et al., "Sample
    Preparation and Bioinformatics in MALDI Profiling
    of Urinary Proteins", submitted, JChromat, 2006
  • Peak detection algorithm has been extended to 2D
    (and even nD) and applied to Nano-LC MS

14
2D Nano-LC Peak Detection
15
Thank you!
16
Preprocessing Objectives
  • Correct signal distortions (baseline,
    normalization)
  • Reduce dimensionality of learning problem (peak
    detection)
  • Avoid removing discriminative information

17
Signal Distortions Correction
  • Baseline estimated with an opening operator (local
    maxima of the local minima) in a moving window
    containing 5% of the total number of points in
    the mass spectrum
  • Total Ion Current normalization using the part of the
    signal > 2000 Da
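
Both corrections can be sketched with NumPy alone. This is a minimal sketch, assuming a centred window truncated at the spectrum edges and a TIC defined as the plain sum of intensities above the mass cut-off. The slides specify neither detail, and the function names are hypothetical.

```python
import numpy as np


def moving(y, half, func):
    """Apply func over a centred window of up to 2*half+1 samples."""
    return np.array([func(y[max(0, i - half): i + half + 1])
                     for i in range(len(y))])


def remove_baseline(y, fraction=0.05):
    """Subtract a morphological opening: moving minimum, then moving maximum."""
    y = np.asarray(y, dtype=float)
    half = max(1, int(len(y) * fraction) // 2)
    baseline = moving(moving(y, half, np.min), half, np.max)
    return y - baseline


def tic_normalize(mz, y, min_mz=2000.0):
    """Divide by the total ion current computed on the part above min_mz."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(y, dtype=float)
    return y / y[mz > min_mz].sum()


mz = np.linspace(1000.0, 3000.0, 100)
raw = np.full(100, 2.0)   # constant offset playing the role of the baseline
raw[50] = 10.0            # a single narrow peak
corrected = remove_baseline(raw)
normalized = tic_normalize(mz, corrected)
```

Because the opening never exceeds the original signal, the corrected spectrum stays non-negative. The window must be wider than the peaks, otherwise the opening follows them and part of each peak is subtracted along with the baseline.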

18
vd vs Error