Title: An Ensemble of Classifiers Approach for the Missing Feature Problem Using Learn++
- IEEE Region 2 Student Paper Contest
- University of Maryland Eastern Shore
- April 5th, 2003
- Stefan Krause
- Rowan University
- Project Advisor: Dr. Robi Polikar
- Branch Counselor: Dr. Shreekanth Mandayam
This material is based upon work supported by the National Science Foundation under Grant No. ECS-0239090. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Overview
- Background
- Problem Definition
- Motivation
- Approach and Theory
- Databases and Results
- Conclusions
- References
- Questions
Background
- Pattern recognition
- Recognizing and classifying a previously seen or familiar pattern
Background | Problem Definition | Motivation | Approach and Theory | Databases and Results | Conclusions | References | Questions
[Figure: the digits 0 through 9 as example patterns]
A classifier is necessary for automated machine
recognition of patterns
Background
- Artificial neural network
- An artificial neural network (ANN) is an algorithmic model of the brain, albeit a very crude one, that allows a computer to emulate the brain's decision-making capability
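As a rough illustration of what "algorithmic model" means here, the sketch below computes the forward pass of a small multilayer perceptron, one common kind of ANN. The layer sizes, weights, and function names are hypothetical; a real network would learn its weights from training data.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of ALL inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a network with one hidden layer."""
    hidden = layer(x, w_hidden, b_hidden)
    return layer(hidden, w_out, b_out)

# Hypothetical weights: 3 input features, 2 hidden neurons, 2 output classes.
W_HIDDEN = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [[1.0, -1.0], [-0.5, 0.7]]
B_OUT = [0.0, 0.0]

scores = mlp_forward([1.0, 0.5, -0.2], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
# The class with the highest output score is taken as the decision.
predicted_class = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted_class)
```

Every hidden neuron combines every input feature, which is why such a network expects a complete feature vector.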
Problem Definition
- The missing feature problem
- The missing feature problem occurs when instances
from a data set have features that are missing or
corrupted
Motivation
The missing feature problem is a significant issue in computational and machine learning because:
- Neural networks can only produce a valid classification when all features used to create the network are available.
- Sensor failure, sensor malfunction, and corrupt data are very common in sensor-based applications where multiple sensors observe an event.
- Solving the missing feature problem adds considerable robustness to a data classification algorithm.
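The first point can be made concrete with a single neuron. In this minimal sketch (hypothetical weights, with a missing feature represented as NaN by assumption), the neuron's net input is a sum over every feature, so one missing value leaves the result undefined.

```python
import math

def weighted_sum(x, w, b):
    """Net input of a single neuron: every feature contributes w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w = [0.4, -1.2, 0.7]   # hypothetical trained weights for features f1, f2, f3
b = 0.1

complete = [1.0, 0.5, 2.0]
missing_f2 = [1.0, float("nan"), 2.0]  # f2 lost, e.g. to a failed sensor

print(weighted_sum(complete, w, b))    # finite value
print(weighted_sum(missing_f2, w, b))  # nan: the output is undefined
```

Replacing the NaN with a placeholder such as 0 would produce a number, but one computed from an input value the network was never trained to expect.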
Approach and Theory
- Learn++ automated classification algorithm
- Ensemble-based incremental learning
- Modified for the missing feature problem
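Learn++, like AdaBoost (Freund and Schapire), combines its classifiers by weighted majority voting. The sketch below shows only that combination step, assuming each classifier's training error is already known: each vote is weighted by log(1/β), with β = ε/(1 − ε). It omits the distribution updates and other training steps described in the Learn++ papers, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def vote_weight(error: float) -> float:
    """Voting weight log(1/beta), with beta = error / (1 - error).

    Assumes 0 < error < 0.5; classifiers with error >= 0.5 are
    discarded during training in Learn++-style algorithms.
    """
    beta = error / (1.0 - error)
    return math.log(1.0 / beta)

def weighted_majority_vote(predictions, errors):
    """Return the class label with the largest total vote weight.

    predictions[t] is the label chosen by classifier t for one instance;
    errors[t] is that classifier's (hypothetical) training error.
    """
    totals = defaultdict(float)
    for label, err in zip(predictions, errors):
        totals[label] += vote_weight(err)
    return max(totals, key=totals.get)

# Two weak classifiers say "A", one strong classifier says "B".
print(weighted_majority_vote(["A", "A", "B"], [0.40, 0.45, 0.05]))  # B
```

Here "B" wins because log 19 ≈ 2.94 outweighs log 1.5 + log(11/9) ≈ 0.61: a few accurate classifiers can overrule several weak ones.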
Approach and Theory
[Figure: instances from two classes, marked O and X]
Approach and Theory
- Traditional ensemble of classifiers approach
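In a traditional ensemble, every member is trained on the full feature set and the members' decisions are combined, for example by a plain majority vote. The three toy decision rules below are hypothetical stand-ins for trained classifiers, using the class labels O and X; the point is that each one reads every feature of the instance.

```python
from collections import Counter

# Hypothetical ensemble members: each looks at the full vector (f1, f2, f3),
# so each one needs every feature to be present.
def clf_a(x):
    return "O" if x[0] + x[1] < 1.0 else "X"

def clf_b(x):
    return "O" if x[1] - x[2] < 0.0 else "X"

def clf_c(x):
    return "O" if x[0] * x[2] < 0.5 else "X"

ENSEMBLE = [clf_a, clf_b, clf_c]

def majority_vote(labels):
    """Plain (unweighted) majority vote; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def classify(x):
    """Every member votes on the complete instance x."""
    return majority_vote([clf(x) for clf in ENSEMBLE])

print(classify([0.2, 0.3, 0.9]))  # O
print(classify([0.9, 0.8, 0.1]))  # X
```

Because all three members read f2, losing that single feature disables the entire ensemble.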
Approach and Theory
- Creating networks in the ensemble with only some
features
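One way to give each network only some of the features is to draw a random subset of feature indices per ensemble member, then train that member only on those columns. This is a minimal sketch assuming uniform random selection and a fixed subset size; the function names and parameters are hypothetical.

```python
import random

def draw_feature_subsets(n_features, n_classifiers, subset_size, seed=None):
    """Pick a random subset of feature indices (index 0 is f1) for each member.

    Uniform random selection is an assumption made for this sketch.
    """
    if not 0 < subset_size <= n_features:
        raise ValueError("subset_size must be between 1 and n_features")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_classifiers)]

def project(x, subset):
    """Keep only the features a given member was trained on."""
    return [x[i] for i in subset]

subsets = draw_feature_subsets(n_features=6, n_classifiers=4, subset_size=3, seed=0)
print(subsets)
print(project([10, 11, 12, 13, 14, 15], [0, 2, 5]))  # [10, 12, 15]
```

Each member's training set would be the projection of every training instance onto that member's subset.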
Approach and Theory
- Classifying an instance that is missing feature f2
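A plausible way to carry out this step, sketched under stated assumptions: mark missing features as NaN, skip every ensemble member whose feature subset includes a missing feature, and let the remaining members vote. The member tuples, equal weights, and toy decision rules are hypothetical; in Learn++ the weights would come from each classifier's training performance.

```python
import math
from collections import defaultdict

def classify_with_missing(x, members):
    """Vote using only members whose feature subsets avoid every missing value.

    x: feature list, with float('nan') marking missing features.
    members: list of (subset, predict, weight) tuples; predict takes the
             projected feature list and returns a class label.
    Returns None when no member can be used.
    """
    missing = {i for i, v in enumerate(x) if math.isnan(v)}
    totals = defaultdict(float)
    for subset, predict, weight in members:
        if missing.isdisjoint(subset):  # member was trained without any missing feature
            totals[predict([x[i] for i in subset])] += weight
    return max(totals, key=totals.get) if totals else None

# Hypothetical members trained on (f1, f2), (f1, f3), (f2, f3); index 0 is f1.
members = [
    ((0, 1), lambda v: "O" if v[0] + v[1] < 1.0 else "X", 1.0),
    ((0, 2), lambda v: "O" if v[0] < v[1] else "X", 1.0),
    ((1, 2), lambda v: "X", 1.0),
]

x = [0.2, float("nan"), 0.9]  # f2 is missing
print(classify_with_missing(x, members))  # O (only the (f1, f3) member votes)
```

Members that never used f2 still see everything they were trained on, so their votes remain valid.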
Databases and Results
Gas Identification Database: identification of 5 volatile organic compounds using 6 quartz crystal microbalance sensors.
Databases and Results
Gas Identification Database
Databases and Results
Optical Character Recognition Database: identification of the handwritten digits 0 through 9.
Databases and Results
Optical Character Recognition Database
Databases and Results
Ionosphere Radar Return Database: the radar system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. The targets were free electrons in the ionosphere.
Databases and Results
Ionosphere Radar Return Database
Conclusions
- Initial results indicate that the algorithm can classify data with up to 10 missing features with virtually no drop-off in performance.
- The mathematical equations for the algorithm, as well as a flow chart describing it, can be found in the paper.
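A back-of-the-envelope way to see why performance can hold up as features go missing, assuming each classifier's k features are drawn uniformly at random from f: the chance that a classifier avoids all m missing features is C(f − m, k) / C(f, k), so an ensemble of T classifiers keeps about T times that many usable voters. The numbers below are hypothetical and are not the settings used in the experiments.

```python
from math import comb

def usable_fraction(n_features, subset_size, n_missing):
    """Probability that a member trained on `subset_size` uniformly chosen
    features avoids all `n_missing` missing ones: C(f-m, k) / C(f, k).
    comb() returns 0 when k > f - m, i.e. no member can avoid them."""
    return comb(n_features - n_missing, subset_size) / comb(n_features, subset_size)

def expected_usable(n_classifiers, n_features, subset_size, n_missing):
    """Expected number of ensemble members that can still vote."""
    return n_classifiers * usable_fraction(n_features, subset_size, n_missing)

# Hypothetical settings: 20 features, 5 per member, 3 missing, 100 members.
print(usable_fraction(20, 5, 3))       # 6188 / 15504, roughly 0.40
print(expected_usable(100, 20, 5, 3))  # roughly 40 members
```

Under this assumption, smaller feature subsets or larger ensembles leave more usable classifiers when many features are missing.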
References
R. Polikar, L. Udpa, S. Udpa, and V. Honavar, "Learn++: An incremental learning algorithm for supervised neural networks," IEEE Trans. Systems, Man and Cybernetics, Part C, vol. 31, no. 4, pp. 497-508, 2001.
R. Polikar, J. Byorick, S. Krause, A. Marino, and M. Moreton, "Learn++: A classifier independent incremental learning algorithm for supervised neural networks," Proc. Int. Joint Conf. Neural Networks (IJCNN 2002), vol. 2, pp. 1742-1747, Honolulu, HI, 2002.
L. K. Hansen and P. Salamon, "Neural network ensembles," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993-1001, 1990.
Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.
C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, Dept. of Information and Computer Science, 1998.
R. Polikar, R. Shinar, L. Udpa, and M. Porter, "Artificial intelligence methods for selection of an optimized sensor array for identification of volatile organic compounds," Sensors and Actuators B: Chemical, vol. 80, no. 3, pp. 243-254, Dec. 2001.
Questions
This presentation and the paper are available online at http://engineering.rowan.edu/polikar/RESEARCH/PUBLICATIONS/publications.html