Fuzzy Entropy Classification Systems and Their Application to Mass Spectrometry of the Proteome

Peter de B. Harrington, Ping Chen, and Mariela Ochoa. Ohio University ... Ion source: Grounded 'time lag focusing' source (delayed extraction) ~ 500 ns ...

1
Fuzzy Entropy Classification Systems and Their
Application to Mass Spectrometry of the Proteome
  • Peter de B. Harrington, Ping Chen, and Mariela
    Ochoa

Ohio University
Center for Intelligent Chemical Instrumentation
Department of Chemistry and Biochemistry
Athens, OH 45701-2979
Peter.Harrington@Ohio.edu
2
Rule-Building Expert Systems
  • In the 1980s expert systems were a promising
    technology except for the knowledge acquisition
    bottleneck
  • It was hard to find expertise for solving most
    problems
  • When an expert was found, it was difficult to
    take their knowledge and encode it in a logical
    system.
  • Rule-building expert systems solved this problem
    by using machine learning to construct logical
    rules from exemplary sets of data

3
Classification Trees and Inductive Logic
  • Logical rules may be implemented efficiently in a
    tree structure with general rules at the root of
    the tree and precise rules at the nodes.

4
Multivariate Rules
  • Multivariate rules may be obtained simply by
    using a linear discriminant whose sign furnishes
    the logical outcome of the rule. These systems
    were prevalent in the 1970s and 1980s as linear
    learning machines or perceptrons.

Jurs, P. C.; Kowalski, B. R.; Isenhour, T. L.
Computerized Learning Machines Applied to
Chemical Problems - Investigation of Convergence
Rate and Predictive Ability of Adaptive Binary
Pattern Classifiers. Analytical Chemistry 1969,
41, 690-695.
5
Information Regarding Classes
  • Class membership is encoded in a binary matrix Y.
  • Each row of Y corresponds to one spectrum, i.e.,
    to the matching row of X.
  • Each column designates a class.
  • Summing each column gives the number of spectra
    in that class.
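
The encoding described above can be sketched in a few lines. This is only an illustration: NumPy and the label vector `labels` are assumptions and do not come from the slides.

```python
import numpy as np

# Hypothetical class labels, one per spectrum (one per row of X).
labels = np.array(["A", "B", "A", "C", "B", "A"])

# Map each label to a column index, then set one entry per row to 1.
classes, column = np.unique(labels, return_inverse=True)
Y = np.zeros((labels.size, classes.size), dtype=int)
Y[np.arange(labels.size), column] = 1

# Column sums give the number of spectra in each class.
counts = Y.sum(axis=0)
```

Each row of `Y` contains exactly one 1, so the row sums are all 1 and the column sums recover the class sizes.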

6
Multivariate Fuzzy Rules
  • A simple rule is obtained from a weight vector w
    that is orthogonal to an (n-1)-dimensional
    hyperplane that separates the spectra in X. The
    attribute a defines where the hyperplane
    intersects the weight vector.
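
The geometry of such a rule can be illustrated with a crisp (non-fuzzy) version. The sketch below assumes the hyperplane is written as x·w = a. The function name `crisp_rule` and the toy data are hypothetical.

```python
import numpy as np

def crisp_rule(X, w, a):
    """Return True for spectra on the positive side of the hyperplane x.w = a."""
    return X @ w > a

# Hypothetical two-variable "spectra" and rule parameters, for illustration only.
X = np.array([[0.0, 0.0], [0.2, 1.0], [1.0, 0.3], [1.5, 1.5]])
w = np.array([1.0, 0.0])   # normal (orthogonal) to the separating hyperplane
a = 0.5                    # offset of the hyperplane along w

side = crisp_rule(X, w, a)

# Point at which the hyperplane crosses the line through w.
crossing = a * w / (w @ w)
```

The point `crossing` lies on the hyperplane because its projection onto `w` equals `a`.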

7
Fuzzy Rules
8
Optimizing the Rule Using Fuzzy Entropy
  • The weight w and attribute a of the rule are
    adjusted so that the entropy of classification is
    minimized.
  • The temperature t is constrained so that the
    first derivative of the classification entropy
    with respect to t is maximized.
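
The quantity being minimized can be sketched as follows. The slides do not give the formula, so this sketch assumes a logistic membership function and an entropy averaged over the two branches of the rule, weighted by fuzzy membership. Both choices, and the name `fuzzy_entropy`, are assumptions.

```python
import numpy as np

def fuzzy_entropy(X, Y, w, a, t):
    """Entropy of classification (bits) for one fuzzy rule; assumed form."""
    # Logistic membership of each spectrum in the "positive" branch.
    chi = 1.0 / (1.0 + np.exp(-(X @ w - a) / t))
    n = X.shape[0]
    H = 0.0
    for branch in (chi, 1.0 - chi):
        mass = branch.sum()                 # fuzzy number of spectra in branch
        if mass <= 0.0:
            continue
        p = (branch @ Y) / mass             # class proportions within branch
        p = p[p > 0]
        H -= (mass / n) * np.sum(p * np.log2(p))
    return H

# Hypothetical one-variable data: two well-separated classes.
X = np.array([[0.0], [1.0], [3.0], [4.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
w = np.array([1.0])

H_sharp = fuzzy_entropy(X, Y, w, a=2.0, t=0.01)   # nearly crisp rule
H_soft = fuzzy_entropy(X, Y, w, a=2.0, t=100.0)   # very fuzzy rule
```

With a well-placed hyperplane the entropy approaches zero as the rule sharpens, and approaches the entropy of the class distribution (1 bit here) as the rule becomes completely fuzzy.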

11
Optimal Fuzziness
  • The computational temperature parameter scales
    the magnitude of the weight vector.
  • The optimal temperature is the one that maximizes
    the first derivative of the classification
    entropy H(C | w, a, t), evaluated on X and Y,
    with respect to temperature.
  • This criterion speeds up training using gradient
    methods by maintaining a steep response surface.
  • The response is continuous between clusters.
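
One way to picture this criterion is a numerical scan over temperature. The sketch below is not the authors' procedure: it assumes a logistic membership function, uses a finite-difference derivative on a grid, and works on hypothetical projections `proj` (the values X·w - a).

```python
import numpy as np

def fuzzy_entropy(proj, Y, t):
    """Entropy of classification (bits) for projections proj = X.w - a."""
    chi = 1.0 / (1.0 + np.exp(-proj / t))
    H = 0.0
    for branch in (chi, 1.0 - chi):
        mass = branch.sum()
        p = (branch @ Y) / mass
        p = p[p > 0]
        H -= (mass / proj.size) * np.sum(p * np.log2(p))
    return H

def steepest_temperature(proj, Y, temps):
    """Return the grid temperature at which dH/dt is largest."""
    H = np.array([fuzzy_entropy(proj, Y, t) for t in temps])
    dH_dt = np.gradient(H, temps)          # finite differences on the grid
    return temps[np.argmax(dH_dt)], H, dH_dt

# Hypothetical projections of four spectra from two classes.
proj = np.array([-2.0, -1.0, 1.0, 2.0])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
temps = np.logspace(-2, 2, 200)

t_opt, H, dH_dt = steepest_temperature(proj, Y, temps)
```

At very low and very high temperatures the entropy surface is flat, so the steepest point falls between the two extremes, which is where gradient-based training gets the most useful signal.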

20
Nonlinearly Separable Data
21
Divide and Conquer
The nonlinearly separable problem can be divided
into smaller linearly separable sets of data.
This division can be accomplished using
artificial neural networks or classification
trees.
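
A toy example illustrates the idea. The data and thresholds below are hypothetical: class A lies on both sides of class B, so no single threshold separates them, but two threshold rules arranged as a tree do.

```python
import numpy as np

def two_level_tree(x, a_root=2.0, a_child=-2.0):
    """Classify 1-D points with two threshold rules arranged as a tree."""
    labels = np.empty(x.shape, dtype="<U1")
    right = x > a_root                        # root rule: split off upper cluster
    labels[right] = "A"
    labels[~right & (x > a_child)] = "B"      # child rule on the remaining points
    labels[~right & (x <= a_child)] = "A"
    return labels

x = np.array([-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0])
truth = np.array(["A", "A", "B", "B", "B", "A", "A"])

predicted = two_level_tree(x)

# Best accuracy achievable by any single threshold, in either orientation.
is_a = truth == "A"
best_single = max(
    max(np.mean((x > a) == is_a), np.mean((x <= a) == is_a))
    for a in np.linspace(-5.0, 5.0, 101)
)
```

Each rule in the tree only has to solve a linearly separable subproblem, while the single-threshold search never reaches perfect accuracy on this data.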
22
Classification Trees and Inductive Logic
Logical rules may be implemented efficiently in a
tree structure with general rules at the root of
the tree and precise rules at the nodes.
25
MALDI as a Soft Ionization Method
  • Introduced by Karas and Hillenkamp (1987) as an
    ionization method for non-volatile polar
    biological and organic macromolecules and
    polymers
  • Low concentration of analyte uniformly dispersed
    in a solid or liquid matrix
  • The matrix should have strong absorbance at the
    laser excitation wavelength and a low sublimation
    temperature
  • Three main processes occur: formation of a solid
    solution, matrix excitation, and analyte
    ionization

Karas, M.; Bachmann, D.; Bahr, U.; Hillenkamp, F.
Int. J. Mass Spectrom. Ion Process. 1987, 78,
53-68.
26
MALDI Mechanism and TOF-MS Instrumentation
27
Schematic of a Linear Time-of-Flight Mass
Spectrometer Used in MALDI
[Figure labels: UV laser (337 nm); analyte/matrix on
a grounded backing plate; source region of length s
with pulse voltage; extraction grid (source voltage
-Vs); field-free drift zone of length D (Ed = 0);
detector grid (-Vs); microchannel plate detector]
28
M@LDI-LR™ Time-of-Flight Mass Spectrometer by
Micromass (UK)
  • Instrumental parameters
  • Laser: Nitrogen UV (337 nm)
  • Firing rate: 5 Hz
  • 10 shots/spectrum
  • Laser pulse energy: 1.2 x 10^-4 J
  • Spot width: 100 µm
  • Ion optics: Linear TOF, path length 0.7 m
  • Ion source: Grounded 'time lag focusing' source
    (delayed extraction), ~500 ns
  • Accelerating voltage: 15 kV
  • Detector: Fast dual micro-channel plate (MCP)
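
As a rough worked example, the accelerating voltage and path length listed above give an idealized drift time for an ion. The sketch assumes the ion gains its full kinetic energy z·e·V before a field-free drift of 0.7 m, and it ignores the source region and the delayed-extraction pulse. The function name `drift_time` and the 10 kDa test mass are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

def drift_time(mass_da, charge=1, voltage=15e3, length=0.7):
    """Idealized flight time (s) through a field-free drift tube."""
    mass_kg = mass_da * DALTON
    # Kinetic energy z*e*V = (1/2) m v^2  =>  v = sqrt(2 z e V / m)
    velocity = math.sqrt(2.0 * charge * E_CHARGE * voltage / mass_kg)
    return length / velocity

t_10k = drift_time(10_000.0)    # singly charged 10 kDa ion
t_40k = drift_time(40_000.0)    # four times the mass
```

Under these assumptions a singly charged 10 kDa ion arrives after about 41 microseconds, and flight time scales with the square root of mass, so quadrupling the mass doubles the time.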

32
Mass Alignment Using a Second Order Polynomial to
the Average Spectrum
33
Escherichia Coli Experimental Design
34
Principal Component Analysis
  • Decomposition into orthogonal matrices C and S
  • The matrices maximize variance
  • The matrices are abstract in that they do not
    represent physical or chemical trends
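
A minimal sketch of such a decomposition, using the singular value decomposition from NumPy. It assumes that C holds the scores and S the loadings, and that the data are mean-centered first. The slides do not state either convention.

```python
import numpy as np

def pca(X, n_components):
    """Decompose mean-centered X into scores C and loadings S (X ~ C @ S)."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    C = U[:, :n_components] * s[:n_components]   # scores, orthogonal columns
    S = Vt[:n_components]                        # loadings, orthonormal rows
    return C, S, mean

# Hypothetical data: 20 "spectra" with 50 variables each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))

C, S, mean = pca(X, n_components=3)
gram = C.T @ C                    # diagonal if the score columns are orthogonal
variances = np.diag(gram)         # variance captured, largest first
```

The components come out ordered by the variance they capture, which is the sense in which the matrices maximize variance.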

37
Latin-Partitions for Evaluating Classifiers
  • The Latin-partition method is used to randomly
    divide a data set into training-prediction set
    pairs.
  • A specified number of pairs are obtained so that
    every spectrum in the data set is used once
    and only once for prediction.
  • The partitioning maintains the same proportions
    of class memberships in the training and
    prediction sets.
  • Replicate spectra of the same samples are never
    split between training and prediction sets.
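
The properties listed above can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: samples are shuffled within each class and dealt round-robin into partitions, and each sample is assumed to belong to a single class. The function name and the toy data are hypothetical.

```python
import numpy as np

def latin_partitions(sample_ids, classes, n_parts=2, seed=0):
    """Yield (train, predict) index arrays, one pair per partition."""
    rng = np.random.default_rng(seed)
    sample_ids = np.asarray(sample_ids)
    classes = np.asarray(classes)
    part = np.empty(sample_ids.size, dtype=int)
    for c in np.unique(classes):
        samples = np.unique(sample_ids[classes == c])
        rng.shuffle(samples)                    # random order within the class
        for i, s in enumerate(samples):
            # All replicates of a sample go to the same partition.
            part[sample_ids == s] = i % n_parts
    for p in range(n_parts):
        yield np.flatnonzero(part != p), np.flatnonzero(part == p)

# Hypothetical data: 12 samples, 3 replicate spectra each, 3 classes.
sample_ids = np.repeat(np.arange(12), 3)
classes = np.repeat(np.array(["A", "B", "C"]), 12)

pairs = list(latin_partitions(sample_ids, classes, n_parts=2, seed=1))
```

Repeating the call with different seeds gives independent sets of partitions, for example 50 repetitions of 2 partitions for a 50x2 design.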

C. Wan and P.B. Harrington, Screening GC-MS
data for carbamate pesticides with
temperature-constrained-cascade correlation
neural networks, Analytica Chimica Acta 408
(2000) 1-12.
38
Latin Partitions
[Figure: spectra labeled by class (A, B, C) divided
into a Prediction Set and a Training Set]
39
Table 1. Confusion Matrix from 50x2 Latin
Partitions with 95% Confidence Intervals
Before model building, principal component
compression was used to reduce the number of
variables to 250. The principal components were
recalculated for every Latin partition, and the
same components were then applied to the
corresponding prediction set.
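
A sketch of this compression step for one training-prediction pair. It assumes the components are computed from the training spectra only and then reused, with the training mean, on the prediction spectra. That reading of the slide, the function name `compress`, and the toy dimensions are assumptions.

```python
import numpy as np

def compress(X_train, X_pred, n_components=250):
    """Project both sets onto principal components of the training set."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    S = Vt[:n_components]                 # loadings from training data only
    return (X_train - mean) @ S.T, (X_pred - mean) @ S.T

# Hypothetical spectra: 30 for training, 10 for prediction, 100 variables.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 100))
X_pred = rng.normal(size=(10, 100))

T_train, T_pred = compress(X_train, X_pred, n_components=5)
```

The number of usable components cannot exceed the smaller of the number of training spectra and the number of variables, so a request for 250 components presumes at least that many training spectra.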
40
Table 2. Confusion Matrix from 50x2 Latin
Partitions with 95% Confidence Intervals. Tree
Built with Average Spectra and Evaluated with
Average Spectra
Table 3. Confusion Matrix from 50x2 Latin
Partitions with 95% Confidence Intervals. Tree
Built with Average Spectra and Evaluated with
Average Spectra
43
FuRES Rule 1
ANOVA-PCA
44
Concluding Thoughts
  • FuRES trees can give good prediction even when
    training sets contain data of questionable
    quality.
  • Using individual spectra gives improved
    prediction (significance level P0.08), although
    at a cost of longer computation times
  • Classification trees are amenable to
    interpretation and can be used to disclose the
    location of biomarkers

45
Acknowledgements
  • Students
  • Ping Chen, Lisa Stout
  • Preshious Rearden, Leanna Kishler
  • Yao Lu, Abby Berg
  • National Institutes of Health
  • Mental Health
  • Child Health and Human Development
  • Federal Aviation Administration - Donation of a
    Barringer Ionscan 350
  • Ion Track Instruments for Support and Donation of
    the Itemizer 2 and VaporTracer 1
  • Sionex for the donation of DMS
  • U.S. Army EBCB - GeoCenters Donation of 4
    Chemical Agent Monitors and Funding
  • Research Opportunity Award-Research Corporation
  • Wright-Patterson Air Force Base-INNSSI Fuel
    Analysis