Title: Fuzzy Entropy Classification Systems and Their Application to Mass Spectrometry of the Proteome
- Peter de B. Harrington, Ping Chen, and Mariela Ochoa
Ohio University Center for Intelligent Chemical Instrumentation, Department of Chemistry and Biochemistry, Athens, OH 45701-2979
Peter.Harrington_at_Ohio.edu
2. Rule-Building Expert Systems
- In the 1980s, expert systems were a promising technology, except for the knowledge-acquisition bottleneck.
- It was hard to find expertise for solving most problems.
- When an expert was found, it was difficult to take that knowledge and encode it in a logical system.
- Rule-building expert systems solved this problem by using machine learning to construct logical rules from exemplary sets of data.
3. Classification Trees and Inductive Logic
- Logical rules may be implemented efficiently in a tree structure, with general rules at the root of the tree and progressively more precise rules at the nodes below it.
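To make the tree structure concrete, here is a minimal Python sketch of how a spectrum is routed from the general rule at the root to a leaf. The names `Leaf`, `RuleNode`, and `classify` are hypothetical, and each node uses a crisp linear rule (x · w compared with a) for simplicity; FuRES itself builds fuzzy rules.

```python
from dataclasses import dataclass
from typing import Union

import numpy as np


@dataclass
class Leaf:
    """Terminal node: every spectrum routed here is assigned this class."""
    label: str


@dataclass
class RuleNode:
    """Internal node holding one multivariate rule: weight vector w, attribute a."""
    w: np.ndarray
    a: float
    low: "Node"   # branch followed when x . w <= a
    high: "Node"  # branch followed when x . w > a


Node = Union[Leaf, RuleNode]


def classify(node: Node, x: np.ndarray) -> str:
    """Apply the general rule at the root first, then the more specific rules below it."""
    while isinstance(node, RuleNode):
        node = node.high if float(x @ node.w) > node.a else node.low
    return node.label


# Hypothetical two-rule tree separating three classes described by two variables.
tree = RuleNode(
    w=np.array([1.0, 0.0]), a=0.5,
    low=Leaf("A"),
    high=RuleNode(w=np.array([0.0, 1.0]), a=0.5, low=Leaf("B"), high=Leaf("C")),
)
print(classify(tree, np.array([1.0, 1.0])))  # C
```

Only one rule is evaluated per level, so classifying a spectrum costs one dot product per node on the path rather than one per rule in the tree.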
4. Multivariate Rules
- Multivariate rules may be obtained simply by using a linear discriminant, which furnishes the logic: a spectrum satisfies the rule when it falls on one side of the discriminant hyperplane. These systems were prevalent in the 1970s and 1980s as linear learning machines or perceptrons.
Jurs, P. C.; Kowalski, B. R.; Isenhour, T. L. Computerized Learning Machines Applied to Chemical Problems. Investigation of Convergence Rate and Predictive Ability of Adaptive Binary Pattern Classifiers. Analytical Chemistry 1969, 41, 690-695.
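A linear learning machine can be sketched with the classic perceptron error-correction update. This is a generic illustration under stated assumptions (two classes coded +1/-1, a fixed epoch limit, the hypothetical function name `train_linear_learning_machine`); it may differ in detail from the procedures investigated in the cited work.

```python
import numpy as np


def train_linear_learning_machine(X, y, max_epochs=100):
    """Perceptron-style error-correction training of a linear discriminant.

    X : (m, n) array of spectra; y : (m,) array of +1/-1 class labels.
    Returns (w, a) for the rule: class +1 if x . w > a, else class -1.
    """
    m, n = X.shape
    Xa = np.hstack([X, -np.ones((m, 1))])  # augmented row [x, -1], so [x, -1] . [w, a] = x . w - a
    wa = np.zeros(n + 1)
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xa, y):
            if yi * (xi @ wa) <= 0:        # misclassified or lying on the hyperplane
                wa += yi * xi              # move the hyperplane toward the correct side
                errors += 1
        if errors == 0:                    # every training spectrum is classified correctly
            break
    return wa[:n], wa[n]


X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
y = np.array([-1, -1, 1, 1])
w, a = train_linear_learning_machine(X, y)
print(w, a, np.where(X @ w > a, 1, -1))
```

Training stops only when the classes are linearly separated; for nonlinearly separable data the loop simply runs until `max_epochs`.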
5. Information Regarding Classes
- Class membership is encoded in a binary matrix Y.
- Each row of Y corresponds to a spectrum, i.e., to the same row of X.
- Each column of Y designates a class.
- Summing a column gives the number of spectra in that class.
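The encoding can be illustrated with a short Python sketch; the label list is hypothetical and the column order is simply the sorted class names.

```python
import numpy as np

labels = ["A", "B", "A", "C", "B", "A"]   # hypothetical class of each spectrum (row of X)
classes = sorted(set(labels))             # column order of Y: ['A', 'B', 'C']

# One row per spectrum, one column per class; a 1 marks the spectrum's class.
Y = np.array([[1 if lab == c else 0 for c in classes] for lab in labels])

counts = Y.sum(axis=0)                    # column sums = number of spectra per class
print(dict(zip(classes, counts.tolist())))
```

Each row sums to 1 because every spectrum belongs to exactly one class.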
6. Multivariate Fuzzy Rules
- A simple rule is obtained from a weight vector w that is orthogonal to an (n - 1)-dimensional hyperplane separating the spectra in X, where n is the number of variables in each spectrum. The attribute a defines the point where the hyperplane intersects the weight vector.
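The rule can be sketched in Python as a projection onto w followed by a comparison with a. The fuzzy version below assumes a logistic membership function scaled by a temperature t; that functional form is an assumption for illustration, not taken from the slides, and the function names are hypothetical.

```python
import numpy as np


def rule_projection(X, w, a):
    """Signed position of each spectrum relative to the hyperplane x . w = a."""
    return X @ w - a


def crisp_rule(X, w, a):
    """Crisp logic: True when a spectrum lies on the positive side of the hyperplane."""
    return rule_projection(X, w, a) > 0


def fuzzy_rule(X, w, a, t):
    """Fuzzy membership in [0, 1]; ASSUMED logistic form with temperature t > 0.

    Small t approaches the crisp rule; large t makes the boundary fuzzier.
    """
    return 1.0 / (1.0 + np.exp(-rule_projection(X, w, a) / t))


X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
w, a = np.array([1.0, 1.0]), 2.0
print(crisp_rule(X, w, a))
print(fuzzy_rule(X, w, a, t=1.0))
```

A spectrum lying exactly on the hyperplane receives a membership of 0.5, and dividing by t has the same effect as shrinking the magnitude of w.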
7. Fuzzy Rules
8. Optimizing the Rule Using Fuzzy Entropy
- The weight vector w and attribute a of the rule are adjusted so that the entropy of classification is minimized.
- The temperature t is constrained so that the first derivative of the classification entropy with respect to t is maximized.
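One way to compute a fuzzy entropy of classification is sketched below in Python. The exact formula is an assumption: each spectrum contributes its membership mu to the positive branch of the rule and 1 - mu to the negative branch, and the conditional entropy H(C | branch) is computed from these membership-weighted probabilities. The function name is hypothetical.

```python
import numpy as np


def fuzzy_classification_entropy(mu, Y, eps=1e-12):
    """Conditional entropy (nats) of the classes given a fuzzy two-way split.

    mu : (m,) fuzzy memberships on the positive side of the rule (0..1)
    Y  : (m, k) binary class-encoding matrix
    """
    m = Y.shape[0]
    branches = np.column_stack([mu, 1.0 - mu])     # (m, 2) branch memberships
    joint = branches.T @ Y / m                     # (2, k) p(branch, class)
    p_branch = joint.sum(axis=1, keepdims=True)    # (2, 1) p(branch)
    cond = joint / np.maximum(p_branch, eps)       # p(class | branch)
    return float(np.sum(joint * -np.log(np.maximum(cond, eps))))


Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
print(fuzzy_classification_entropy(np.array([0.0, 0.0, 1.0, 1.0]), Y))  # crisp, pure split
print(fuzzy_classification_entropy(np.full(4, 0.5), Y))                # uninformative split
```

A split that separates the classes perfectly gives zero entropy, while memberships of 0.5 for every spectrum leave the full ln 2 of two equally sized classes, so minimizing this quantity over w and a favours rules that separate the classes.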
11. Optimal Fuzziness
- The computational temperature parameter scales the magnitude of the weight vector.
- The optimal temperature is the one that maximizes the first derivative, with respect to temperature, of the classification entropy H(C | w, a, t) computed for the data X and Y.
- This criterion speeds up training with gradient methods by maintaining a steep response surface.
- The response is continuous between clusters.
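The temperature criterion can be sketched numerically: evaluate dH/dt by central finite differences over a grid of candidate temperatures and keep the one with the largest slope. The logistic membership, the entropy form, the grid search, and the function names are all assumptions made for illustration; the original method may optimize t differently.

```python
import numpy as np


def membership(X, w, a, t):
    """ASSUMED logistic fuzzy membership; t is the computational temperature."""
    z = np.clip((X @ w - a) / t, -500.0, 500.0)   # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))


def entropy(mu, Y, eps=1e-12):
    """Membership-weighted conditional entropy H(C | w, a, t) (assumed form)."""
    joint = np.column_stack([mu, 1.0 - mu]).T @ Y / Y.shape[0]
    cond = joint / np.maximum(joint.sum(axis=1, keepdims=True), eps)
    return float(np.sum(joint * -np.log(np.maximum(cond, eps))))


def optimal_temperature(X, Y, w, a, temps, h=1e-4):
    """Pick the candidate t where dH/dt is largest (central finite differences)."""
    def slope(t):
        up = entropy(membership(X, w, a, t * (1 + h)), Y)
        down = entropy(membership(X, w, a, t * (1 - h)), Y)
        return (up - down) / (2 * t * h)

    slopes = np.array([slope(t) for t in temps])
    return float(temps[int(np.argmax(slopes))]), slopes


X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
temps = np.logspace(-2, 2, 81)
t_opt, slopes = optimal_temperature(X, Y, np.array([1.0]), 0.0, temps)
print(f"t_opt = {t_opt:.3g}")
```

At very low t the entropy is flat near zero and at very high t it is flat near its maximum, so the steepest point lies in between; this is where small changes to w and a still produce a useful gradient.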
20. Nonlinearly Separable Data
21. Divide and Conquer
A nonlinearly separable problem can be divided into smaller, linearly separable subsets of data. This division can be accomplished using artificial neural networks or classification trees.
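Divide and conquer with a classification tree can be sketched on the XOR pattern, a standard nonlinearly separable toy problem. For brevity this sketch swaps in crisp, axis-aligned splits chosen by minimizing weighted Shannon entropy (an ID3/C4.5-style criterion); it is not the fuzzy multivariate rule optimization used by FuRES, and all function names are hypothetical.

```python
import numpy as np


def crisp_entropy(labels):
    """Shannon entropy (nats) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def best_split(X, labels):
    """Axis-aligned split minimizing the weighted entropy of the two subsets."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for thr in (values[:-1] + values[1:]) / 2:   # midpoints between distinct values
            high = X[:, j] > thr
            h = (high.mean() * crisp_entropy(labels[high])
                 + (~high).mean() * crisp_entropy(labels[~high]))
            if best is None or h < best[0]:
                best = (h, j, thr)
    return best


def build_tree(X, labels, depth=0, max_depth=10):
    """Divide and conquer: split recursively until every subset is pure."""
    if len(np.unique(labels)) > 1 and depth < max_depth:
        split = best_split(X, labels)
        if split is not None:
            _, j, thr = split
            high = X[:, j] > thr
            return (j, thr,
                    build_tree(X[~high], labels[~high], depth + 1, max_depth),
                    build_tree(X[high], labels[high], depth + 1, max_depth))
    values, counts = np.unique(labels, return_counts=True)
    return str(values[np.argmax(counts)])             # leaf: majority class


def predict(tree, x):
    """Follow the stored splits from the root to a leaf label."""
    while isinstance(tree, tuple):
        j, thr, low, high = tree
        tree = high if x[j] > thr else low
    return tree


# XOR: no single line separates A from B, but two levels of splits do.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
labels = np.array(["A", "A", "B", "B"])
tree = build_tree(X, labels)
print([predict(tree, x) for x in X])
```

The first split does not reduce the entropy at all on XOR, yet it produces two subsets that are each linearly separable, which is exactly the point of dividing the problem.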
25. MALDI as a Soft Ionization Method
- Introduced by Karas and Hillenkamp (1987) as an ionization method for non-volatile, polar biological and organic macromolecules and polymers.
- A low concentration of analyte is uniformly dispersed in a solid or liquid matrix.
- The matrix should have strong absorbance at the laser excitation wavelength and a low sublimation temperature.
- Three main processes occur: formation of a solid solution, matrix excitation, and analyte ionization.
Karas, M.; Bachmann, D.; Bahr, U.; Hillenkamp, F. Int. J. Mass Spectrom. Ion Process. 1987, 78, 53-68.
26. MALDI Mechanism and TOF-MS Instrumentation
27. Schematic of a Linear Time-of-Flight Mass Spectrometer Used in MALDI
[Figure: labelled components include the UV laser (337 nm), analyte/matrix on a grounded backing plate, pulse voltage, extraction grid (source voltage -Vs), source region of length s, field-free drift zone (Ed = 0) of length D, detector grid (-Vs), and microchannel plate detector.]
28. M@LDI-LR™ Time-of-Flight Mass Spectrometer by Micromass (UK)
- Instrumental parameters:
- Laser: nitrogen UV (337 nm)
- Firing rate: 5 Hz
- 10 shots/spectrum
- Laser pulse energy: 1.2 × 10^-4 J
- Spot width: 100 µm
- Ion optics: linear TOF, path length 0.7 m
- Ion source: grounded time-lag focusing source (delayed extraction), 500 ns
- Accelerating voltage: 15 kV
- Detector: fast dual microchannel plate (MCP)
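The listed parameters allow a rough estimate of flight times. This Python sketch uses the idealized relation t = D·sqrt(m / (2zeV)), assuming every ion leaves the source with kinetic energy zeV and treating the 0.7 m path as the field-free drift length; it ignores time spent in the source and the delayed-extraction interval, so real flight times will differ somewhat. The function name is hypothetical.

```python
import math

DALTON_KG = 1.66053906660e-27          # unified atomic mass unit (kg)
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge (C)


def drift_time_s(mass_da, charge=1, accel_voltage_v=15e3, drift_length_m=0.7):
    """Idealized drift time of an ion through a linear TOF analyzer.

    Assumes kinetic energy z*e*V after acceleration and no time in the source.
    """
    m = mass_da * DALTON_KG
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE_C * accel_voltage_v / m)
    return drift_length_m / velocity


for mass in (1_000, 10_000, 66_000):
    print(f"{mass:>6} Da -> {drift_time_s(mass) * 1e6:6.1f} us")
```

Because t scales with the square root of m/z, quadrupling the mass only doubles the flight time, which is why small timing errors translate into mass errors that grow with m/z.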
32. Mass Alignment to the Average Spectrum Using a Second-Order Polynomial
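A quadratic mass alignment can be sketched with NumPy: fit a second-order polynomial that maps landmark peak positions in one spectrum onto the matching positions in the average spectrum, apply it to the spectrum's m/z axis, and resample onto a common axis. How the landmark peaks are found and matched is left out and is an assumption here, as is the function name `align_spectrum`.

```python
import numpy as np


def align_spectrum(mz, intensity, peaks_obs, peaks_ref, common_mz):
    """Align one spectrum to the average spectrum with a quadratic m/z correction.

    peaks_obs : m/z of landmark peaks located in this spectrum
    peaks_ref : m/z of the same peaks located in the average (reference) spectrum
    """
    coeffs = np.polyfit(peaks_obs, peaks_ref, deg=2)   # second-order calibration polynomial
    corrected_mz = np.polyval(coeffs, mz)              # apply it to the whole m/z axis
    # np.interp needs an increasing x-axis, which holds for small corrections.
    return np.interp(common_mz, corrected_mz, intensity)


# Synthetic check: a spectrum whose m/z axis is slightly stretched and offset.
mz = np.linspace(2000.0, 10000.0, 8001)
peaks_ref = np.array([3000.0, 5000.0, 8000.0])
peaks_obs = 1.001 * peaks_ref + 2.0
intensity = np.exp(-(((mz - peaks_obs[1]) / 5.0) ** 2))   # peak recorded near 5007
aligned = align_spectrum(mz, intensity, peaks_obs, peaks_ref, mz)
print(mz[np.argmax(aligned)])
```

At least three landmark peaks are needed to determine the three polynomial coefficients; with more peaks, `np.polyfit` gives the least-squares fit.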
33. Escherichia coli Experimental Design
34. Principal Component Analysis
- Decomposition into orthogonal matrices C and S
- The components are chosen to maximize the variance they capture
- The matrices are abstract in that they do not represent physical or chemical trends
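A minimal PCA sketch via the singular value decomposition follows. Reading C as the score matrix and S as the loading matrix, so that the mean-centred data equal C·S^T, is an assumption about the slide's notation; the function name `pca` is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """PCA by SVD of the mean-centred data: X - mean ~= C @ S.T.

    C (scores) has mutually orthogonal columns; S (loadings) has orthonormal
    columns. Components are ordered by the variance they capture.
    """
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    C = U[:, :n_components] * s[:n_components]
    S = Vt[:n_components].T
    return C, S, mean


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))      # hypothetical data: 20 spectra x 50 variables
C, S, mean = pca(X, 5)
print(C.shape, S.shape)
```

Keeping only the first few columns of C is what "principal component compression" amounts to: each spectrum is then described by a handful of scores instead of every m/z value.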
37. Latin Partitions for Evaluating Classifiers
- The Latin-partition method randomly divides a data set into training-prediction set pairs.
- A specified number of pairs is obtained so that every spectrum in the data set is used once and only once for prediction.
- The partitioning maintains the same proportions of class membership in the training and prediction sets.
- Replicate spectra of the same sample are never split between training and prediction sets.
Wan, C.; Harrington, P. B. Screening GC-MS Data for Carbamate Pesticides with Temperature-Constrained-Cascade Correlation Neural Networks. Analytica Chimica Acta 2000, 408, 1-12.
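The three constraints above can be met by shuffling the samples of each class and dealing them round-robin into partitions. This Python sketch is one possible implementation, not necessarily the cited authors' code; it assumes every sample id belongs to a single class, and `latin_partitions` is a hypothetical name.

```python
import numpy as np


def latin_partitions(labels, sample_ids, n_parts, rng=None):
    """Return n_parts (training_idx, prediction_idx) pairs.

    Samples (all replicate spectra sharing a sample id) of each class are shuffled
    and dealt round-robin into partitions, so every spectrum is predicted exactly
    once, replicates are never split, and class proportions stay roughly equal.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    sample_ids = np.asarray(sample_ids)
    part_of_sample = {}
    offset = 0                                   # spreads leftover samples across partitions
    for cls in np.unique(labels):
        samples = np.unique(sample_ids[labels == cls])
        rng.shuffle(samples)
        for i, s in enumerate(samples):
            part_of_sample[s] = (offset + i) % n_parts
        offset = (offset + len(samples)) % n_parts
    part = np.array([part_of_sample[s] for s in sample_ids])
    idx = np.arange(len(labels))
    return [(idx[part != k], idx[part == k]) for k in range(n_parts)]


# Hypothetical data: 4 samples x 3 replicate spectra, two classes.
labels = ["A"] * 6 + ["B"] * 6
sample_ids = [f"s{i // 3}" for i in range(12)]
runs = [latin_partitions(labels, sample_ids, n_parts=2, rng=seed) for seed in range(50)]  # 50 x 2
print(runs[0][0][1])   # prediction indices of the first pair in the first run
```

Repeating the partitioning with different random seeds (here 50 runs of 2 partitions) gives a distribution of prediction results from which confidence intervals can be estimated.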
38. Latin Partitions
[Figure: spectra from classes A, B, and C divided into a prediction set and a training set.]
39. Table 1. Confusion Matrix from 50x2 Latin Partitions with 95% Confidence Intervals
Before model building, principal component compression was used to reduce the number of variables to 250. The principal components were recalculated for every Latin partition, and these same components were used for prediction.
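A confusion matrix with confidence intervals over repeated Latin-partition runs could be assembled as sketched below: one confusion matrix per run (summing the predictions from both partitions of that run), then an element-wise mean and interval across runs. The caption does not say how the intervals were computed; the normal approximation used here (1.96 × standard deviation / sqrt(number of runs)) and the function names are assumptions.

```python
import numpy as np


def confusion_matrix(true, pred, classes):
    """Counts of predictions; rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true, pred):
        cm[index[t], index[p]] += 1
    return cm


def mean_with_ci(matrices, z=1.96):
    """Element-wise mean and approximate 95% confidence half-width over runs."""
    stack = np.stack(matrices).astype(float)
    mean = stack.mean(axis=0)
    half_width = z * stack.std(axis=0, ddof=1) / np.sqrt(len(matrices))
    return mean, half_width


classes = ["A", "B"]
cm = confusion_matrix(["A", "A", "B"], ["A", "B", "B"], classes)
mean, hw = mean_with_ci([cm, np.array([[2, 0], [0, 1]])])
print(mean)
print(hw)
```

With 50 runs a Student t multiplier (about 2.01) would give slightly wider intervals than the normal value of 1.96.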
40. Table 2. Confusion Matrix from 50x2 Latin Partitions with 95% Confidence Intervals. Tree Built with Average Spectra and Evaluated with Average Spectra
Table 3. Confusion Matrix from 50x2 Latin Partitions with 95% Confidence Intervals. Tree Built with Average Spectra and Evaluated with Average Spectra
43. FuRES Rule 1
ANOVA-PCA
44. Concluding Thoughts
- FuRES trees can give good predictions even when the training sets contain data of questionable quality.
- Using individual spectra gives improved prediction (significance level P0.08), although at the cost of longer computation times.
- Classification trees are amenable to interpretation and can be used to disclose the location of biomarkers.
45. Acknowledgements
- Students: Ping Chen, Lisa Stout, Preshious Rearden, Leanna Kishler, Yao Lu, Abby Berg
- National Institutes of Health: Mental Health; Child Health and Human Development
- Federal Aviation Administration: donation of a Barringer Ionscan 350
- Ion Track Instruments: support and donation of the Itemizer 2 and VaporTracer 1
- Sionex: donation of DMS
- U.S. Army EBCB - GeoCenters: donation of 4 Chemical Agent Monitors and funding
- Research Opportunity Award, Research Corporation
- Wright-Patterson Air Force Base-INNSSI: fuel analysis