Title: Information Theoretic Learning
1. Information Theoretic Learning
- PDEEC Machine Learning
- Hélder P. Oliveira
2. Outline
- Information Theory
- Information-Theoretic Learning
- ITL Unifying Criterion for Learning
- Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Paper 2: Effective feature selection scheme using mutual information
- Conclusions
- References
3. Introduction
- Humans are constantly bombarded by large amounts of data
- Humans seek information, not data
- Problem -> how to extract information directly from data
4. Information Theory
- In 1948 Shannon introduced the foundations of Information Theory (IT)
- Information can be characterized by mathematical models
- IT has had a tremendous impact on communication systems
- It is able to answer two key questions:
- What is the minimal code for our data?
- What is the maximal amount of information that can be transferred through a channel?
5. Information-Theoretic Learning
- ITL uses descriptors from information theory (entropy and divergences), estimated directly from the data, to replace the conventional statistical descriptors of variance and covariance
- Entropy: single data source
- Divergence: multiple data sources
6. Information-Theoretic Learning
- Rényi's entropy
- Estimation methods
- Parzen window
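Rényi's quadratic entropy combines with the Parzen window into a purely sample-based estimator. A minimal Python sketch, assuming a zero-mean Gaussian kernel; the width sigma is an illustrative choice, not a value from the slides:

```python
import numpy as np

def renyi_quadratic_entropy(x, sigma=0.5):
    """H2(X) = -log( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2) ).

    The double sum is the "information potential"; the variance 2*sigma^2
    comes from convolving two Gaussian Parzen kernels of width sigma.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    n = len(x)
    var = 2.0 * sigma ** 2
    gauss = np.exp(-(x - x.T) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return -np.log(gauss.sum() / n ** 2)

# Example: entropy estimate for a standard normal sample.
rng = np.random.default_rng(0)
print(renyi_quadratic_entropy(rng.normal(size=500)))
```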
7. Information-Theoretic Learning
From [1]
8. Information-Theoretic Learning: Unifying Criterion for Learning
- Criteria 1 and 2: unsupervised learning
- Criterion 3: supervised learning
From [1]
9. Information-Theoretic Learning: Unifying Criterion for Learning
10. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Jeong K., Xu J., Erdogmus D., Principe J. C., A new classifier based on information theoretic learning with unlabeled data. Neural Networks 18, 5-6 (Jun. 2005)
- ITL approach based on density divergence minimization, yielding an extended training algorithm that uses unlabeled data during testing
- Weights are updated during the test phase
- Cost function with a boosting-like algorithm
11. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Euclidean distance-matching algorithm
- f_d^trn: pdf of the desired signal during the training phase
- f_y^tst: pdf of the system output during the testing phase
- w: weights
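With these symbols, the cost being minimized is the integrated squared (Euclidean) distance between the training-phase and testing-phase pdfs; a hedged reconstruction consistent with the slide's notation:

```latex
J(w) = \int \left( f_d^{\mathrm{trn}}(z) - f_y^{\mathrm{tst}}(z; w) \right)^{2} dz
```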
12. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- pdf estimated directly from the data samples (nonparametric method)
- Parzen window
- k(., σ²): zero-mean Gaussian kernel
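The Parzen window estimate implied here, with k(., σ²) the zero-mean Gaussian kernel and N output samples y_i:

```latex
\hat{f}_Y(y) = \frac{1}{N} \sum_{i=1}^{N} \kappa\left(y - y_i,\ \sigma^{2}\right)
```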
13. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Gradient descent algorithm
14. Paper 1: A new classifier based on information theoretic learning with unlabeled data
15. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Weight update (batch mode)
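A minimal sketch of such a batch-mode gradient-descent step. Both the linear model y = X w and the finite-difference gradient are illustrative simplifications; the paper works with a neural network and its own gradient rule:

```python
import numpy as np

def gaussian(u, var):
    return np.exp(-u ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def ed_cost(d, y, sigma=0.5):
    """Sample estimate of J = int (f_d - f_y)^2 dz via Parzen windows."""
    var = 2 * sigma ** 2  # variance of the convolved kernel
    dd = gaussian(d[:, None] - d[None, :], var).mean()
    dy = gaussian(d[:, None] - y[None, :], var).mean()
    yy = gaussian(y[:, None] - y[None, :], var).mean()
    return dd - 2 * dy + yy

def batch_update(w, X, d, lr=0.1, eps=1e-5):
    """One gradient-descent step on J(w); all samples used per step."""
    grad = np.zeros_like(w)
    for k in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[k] += eps
        wm[k] -= eps
        grad[k] = (ed_cost(d, X @ wp) - ed_cost(d, X @ wm)) / (2 * eps)
    return w - lr * grad
```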
16. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Boosting-like algorithm
- A change in the prior probabilities between training and testing has an important impact on the decision boundary
- With this algorithm the effect of the prior probabilities is minimized
17. Paper 1: A new classifier based on information theoretic learning with unlabeled data
18. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Simulation
- Simple pattern classification problem
- Real biomedical classification problem
- Same neural network topology for training and testing
- Classify two different classes
19. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Simulation on a biomedical data set from neural recordings in the surgical treatment of Parkinson's disease
- Spike trains collected from:
- Thalamic (Thal): class 1
- Subthalamic nucleus (STN): class 2
- (cellular activity recorded using deep brain stimulation)
20. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Training data set for three patients
21. Paper 1: A new classifier based on information theoretic learning with unlabeled data
22. Paper 1: A new classifier based on information theoretic learning with unlabeled data
23. Paper 1: A new classifier based on information theoretic learning with unlabeled data
24. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- The conventional classifier does not perform as well as the overall training results would suggest
- The prior probabilities are different from those of the training set
25. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Conclusions
- New information-theoretic approach for training with unlabeled data during the test phase
- Better performance in adjusting the system weights
- For classification, the method requires preservation of the prior probabilities between training and testing
26. Paper 2: Effective feature selection scheme using mutual information
- Huang D., Chow T. W. S., Effective feature selection scheme using mutual information, Neurocomputing, Volume 63 (New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks), January 2005, Pages 325-343
- Novel mutual information-based feature selection (MIIO)
- Proposes a supervised data compression algorithm
- Mutual information estimated directly from the data
27. Paper 2: Effective feature selection scheme using mutual information
- Advantages
- MIIO can be estimated directly even with a small data set
- The supervised data clustering algorithm enhances the computational efficiency of MIIO
- Able to determine the most prominent feature at each iteration, even for a highly nonlinear problem
- Handles high redundancy among features
28. Paper 2: Effective feature selection scheme using mutual information
- Classification -> reduce the uncertainty about the predicted class labels C given the known observations X
- MI classification -> increase the MI as much as possible, i.e. achieve a higher I(X; C) with fewer features
29. Paper 2: Effective feature selection scheme using mutual information
- MI feature selection process
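A minimal sketch of the forward-selection loop this process suggests: at each iteration, keep the feature whose addition maximizes the joint MI of the selected subset with the class labels. The function names are illustrative, not the paper's; `estimate_mi` stands for any I(X; C) estimator (a Parzen-based option is sketched under "Estimating MI" below):

```python
import numpy as np

def greedy_mi_selection(X, c, n_features, estimate_mi):
    """Pick features one at a time, maximizing I(selected + candidate; C)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_features):
        # Joint MI of the enlarged subset naturally discounts candidates
        # that are redundant with already-selected features.
        scores = {j: estimate_mi(X[:, selected + [j]], c) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```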
30. Paper 2: Effective feature selection scheme using mutual information
- Data compression algorithm for estimating MI (two important issues)
- A moderate compression rate is expected, to guarantee reliable estimates
- Unsupervised data compression algorithms are not preferred when estimating the MI between input and output
31. Paper 2: Effective feature selection scheme using mutual information
- Estimating MI
- Parzen window estimator
- The uniform distribution is not applied
- Gaussian kernel function
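A hedged sketch of a Parzen-window MI estimate with a Gaussian kernel, using the resubstitution form I(X; C) = E[log f(x|c) / f(x)]; the kernel width is an illustrative assumption, and the paper's exact estimator may differ in detail:

```python
import numpy as np

def parzen_mi(X, c, sigma=0.5):
    """Estimate I(X; C) as the sample mean of log f(x_i | c_i) / f(x_i)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    c = np.asarray(c)
    n = len(X)
    # Gaussian kernel matrix; its normalizing constant cancels in the ratio.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    f_x = K.mean(axis=1)                                    # Parzen f(x_i)
    f_xc = np.array([K[i, c == c[i]].mean() for i in range(n)])  # f(x_i|c_i)
    return float(np.log(f_xc / f_x).mean())
```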
32. Paper 2: Effective feature selection scheme using mutual information
2000 data patterns; different values of σ²; 10 different data sets. MI-B: Bonnlander's approach; MI-K: Kwak's approach
33. Paper 2: Effective feature selection scheme using mutual information
- Evaluation of computational efficiency
- 2000 data patterns; time consumed: MI-B 7.57 s, MI-K 8.23 s, MI-P 0.67 s
34. Paper 2: Effective feature selection scheme using mutual information
- Effect of the size of the data set
35. Paper 2: Effective feature selection scheme using mutual information - experimental results (prostate cancer classification)
36. Paper 2: Effective feature selection scheme using mutual information - experimental results (prostate cancer classification)
37. Paper 2: Effective feature selection scheme using mutual information - experimental results (prostate cancer classification)
38. Paper 2: Effective feature selection scheme using mutual information
- Conclusions
- New MI-based scheme to search for salient features that are relevant to classification and not redundant with the already-selected features
- The proposed scheme is efficient, especially for large data sets
- Comparative results show the superiority of the proposed methodology
39. Conclusions
- Information-Theoretic Learning works directly with the information contained in the samples, without any further assumptions
- The Parzen window algorithm is used to estimate entropy and mutual information
- Rényi's quadratic entropy can be readily integrated with the Parzen window estimator
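The "readily integrated" claim is the standard closed-form identity: substituting the Gaussian Parzen estimate into the quadratic entropy, each integral of a product of two Gaussian kernels collapses into a single Gaussian of doubled variance (a sketch of the usual derivation, with kernel width σ):

```latex
\hat{H}_2(X) = -\log \int \hat{f}^{\,2}(x)\,dx
             = -\log \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N}
               G_{\sigma\sqrt{2}}\!\left(x_i - x_j\right),
\quad \text{since} \quad
\int G_{\sigma}(x - x_i)\, G_{\sigma}(x - x_j)\, dx = G_{\sigma\sqrt{2}}(x_i - x_j)
```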
40. Conclusions
- With Rényi's quadratic entropy we have a more principled approach to manipulating entropy directly
- Information-Theoretic Learning can be used with the most common Machine Learning algorithms
41. References
- [1] Principe J. C., Information-Theoretic Learning. Tutorial, 2008, http://itl.cnel.ufl.edu/ITL_webpage/website%20review.pdf
- [2] Principe J. C., Xu D., Fisher III J. W., Information-Theoretic Learning, in Unsupervised Adaptive Filtering, S. Haykin, ed., New York: John Wiley & Sons, 2000.
- [3] Jeong K., Xu J., Erdogmus D., Principe J. C., A new classifier based on information theoretic learning with unlabeled data. Neural Networks 18, 5-6 (Jun. 2005).
- [4] Huang D., Chow T. W. S., Effective feature selection scheme using mutual information, Neurocomputing, Volume 63 (New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks), January 2005, Pages 325-343.
42. Thank you. Any questions?