Title: Information Theoretic Learning
1. Information Theoretic Learning
- PDEEC Machine Learning
- Hélder P. Oliveira
2. Outline
- Information Theory
- Information-Theoretic Learning
- ITL Unifying Criterion for Learning
- Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Paper 2: Effective feature selection scheme using mutual information
- Conclusions
- References
3. Introduction
- Humans are constantly bombarded by large amounts of data
- Humans seek information, not data
- Problem -> how to extract information directly from data
4. Information Theory
- In 1948 Shannon introduced the foundations of Information Theory (IT)
- Information can be characterized by mathematical models
- IT has had a tremendous impact on communication systems
- It is able to answer two key questions:
- What is the minimal code for our data?
- What is the maximal amount of information that can be transferred through a channel?
5. Information-Theoretic Learning
- ITL uses descriptors from information theory (entropy and divergences), estimated directly from the data, to replace the conventional statistical descriptors of variance and covariance
- Entropy: single data source
- Divergence: multiple data sources
6. Information-Theoretic Learning
- Rényi's entropy
- Estimation methods
- Parzen window
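Rényi's quadratic entropy combines with the Parzen window into a purely sample-based estimator. A minimal Python sketch, assuming a zero-mean Gaussian kernel; the width sigma is an illustrative choice, not a value from the slides:

```python
import numpy as np

def renyi_quadratic_entropy(x, sigma=0.5):
    """H2(X) = -log( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2) ).

    The double sum is the "information potential"; the variance 2*sigma^2
    comes from convolving two Gaussian Parzen kernels of width sigma.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    n = len(x)
    var = 2.0 * sigma ** 2
    gauss = np.exp(-(x - x.T) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return -np.log(gauss.sum() / n ** 2)

# Example: entropy estimate for a standard normal sample.
rng = np.random.default_rng(0)
print(renyi_quadratic_entropy(rng.normal(size=500)))
```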
7. Information-Theoretic Learning
From [1]
8. Information-Theoretic Learning: Unifying Criterion for Learning
- Criteria 1 and 2: unsupervised learning
- Criterion 3: supervised learning
From [1]
9. Information-Theoretic Learning: Unifying Criterion for Learning
10. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Jeong K., Xu J., Erdogmus D., Principe J. C., A new classifier based on information theoretic learning with unlabeled data. Neural Networks 18, 5-6 (Jun. 2005)
- ITL approach based on density divergence minimization, yielding an extended training algorithm that uses unlabeled data during testing
- Weights are updated during the test phase
- Cost function with a boosting-like algorithm
11. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Euclidean distance-matching algorithm
- f_d^trn: pdf of the desired signal during the training phase
- f_y^tst: pdf of the system output during the testing phase
- w: weights
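With these symbols, the cost being minimized is the integrated squared (Euclidean) distance between the training-phase and testing-phase pdfs; a hedged reconstruction consistent with the slide's notation:

```latex
J(w) = \int \left( f_d^{\mathrm{trn}}(z) - f_y^{\mathrm{tst}}(z; w) \right)^{2} dz
```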
12. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- pdf estimated directly from the data samples (nonparametric method)
- Parzen window
- k(., σ²): zero-mean Gaussian kernel
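The Parzen window estimate implied here, with k(., σ²) the zero-mean Gaussian kernel and N output samples y_i:

```latex
\hat{f}_Y(y) = \frac{1}{N} \sum_{i=1}^{N} \kappa\left(y - y_i,\ \sigma^{2}\right)
```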
13. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Gradient descent algorithm
14. Paper 1: A new classifier based on information theoretic learning with unlabeled data
15. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Weight update (batch mode)
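A minimal sketch of such a batch-mode gradient-descent step. Both the linear model y = X w and the finite-difference gradient are illustrative simplifications; the paper works with a neural network and its own gradient rule:

```python
import numpy as np

def gaussian(u, var):
    return np.exp(-u ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def ed_cost(d, y, sigma=0.5):
    """Sample estimate of J = int (f_d - f_y)^2 dz via Parzen windows."""
    var = 2 * sigma ** 2  # variance of the convolved kernel
    dd = gaussian(d[:, None] - d[None, :], var).mean()
    dy = gaussian(d[:, None] - y[None, :], var).mean()
    yy = gaussian(y[:, None] - y[None, :], var).mean()
    return dd - 2 * dy + yy

def batch_update(w, X, d, lr=0.1, eps=1e-5):
    """One gradient-descent step on J(w); all samples used per step."""
    grad = np.zeros_like(w)
    for k in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[k] += eps
        wm[k] -= eps
        grad[k] = (ed_cost(d, X @ wp) - ed_cost(d, X @ wm)) / (2 * eps)
    return w - lr * grad
```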
16. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Boosting-like algorithm
- A change in the prior probabilities between training and testing has an important impact on the decision boundary
- With this algorithm the effect of the prior probabilities is minimized
17. Paper 1: A new classifier based on information theoretic learning with unlabeled data
18. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Simulation
- Simple pattern classification problem
- Real biomedical classification problem
- Same neural network topology for training and testing
- Classify two different classes
19. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Simulation on a biomedical data set from neural recordings in the surgical treatment of Parkinson's disease
- Spike trains collected from:
- Thalamic (Thal): class 1
- Subthalamic nucleus (STN): class 2
- (cellular activity recorded using deep brain stimulation)
20. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Training data set for three patients
21. Paper 1: A new classifier based on information theoretic learning with unlabeled data
22. Paper 1: A new classifier based on information theoretic learning with unlabeled data
23. Paper 1: A new classifier based on information theoretic learning with unlabeled data
24. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- The conventional classifier does not perform as well as the overall training results would suggest
- The prior probabilities are different from those of the training set
25. Paper 1: A new classifier based on information theoretic learning with unlabeled data
- Conclusions
- New information-theoretic approach for training with unlabeled data during the test phase
- Better performance in adjusting the system weights
- For classification, the method requires preservation of the prior probabilities between training and testing
26. Paper 2: Effective feature selection scheme using mutual information
- Huang D., Chow T. W. S., Effective feature selection scheme using mutual information, Neurocomputing, Volume 63 (New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks), January 2005, Pages 325-343
- Novel mutual information-based feature selection (MIIO)
- Proposes a supervised data compression algorithm
- Mutual information estimated directly from the data
27. Paper 2: Effective feature selection scheme using mutual information
- Advantages
- MIIO can be estimated directly even with a small data set
- The supervised data clustering algorithm enhances the computational efficiency of MIIO
- Able to determine the most prominent feature at each iteration, even for a highly nonlinear problem
- Handles high redundancy among features
28. Paper 2: Effective feature selection scheme using mutual information
- Classification -> reduce the uncertainty about the predicted class labels C given the known observations X
- MI classification -> increase the MI as much as possible, i.e. achieve a higher I(X; C) with fewer features
29. Paper 2: Effective feature selection scheme using mutual information
- MI feature selection process
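A minimal sketch of the forward-selection loop this process suggests: at each iteration, keep the feature whose addition maximizes the joint MI of the selected subset with the class labels. The function names are illustrative, not the paper's; `estimate_mi` stands for any I(X; C) estimator (a Parzen-based option is sketched under "Estimating MI" below):

```python
import numpy as np

def greedy_mi_selection(X, c, n_features, estimate_mi):
    """Pick features one at a time, maximizing I(selected + candidate; C)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_features):
        # Joint MI of the enlarged subset naturally discounts candidates
        # that are redundant with already-selected features.
        scores = {j: estimate_mi(X[:, selected + [j]], c) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```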
30. Paper 2: Effective feature selection scheme using mutual information
- Data compression algorithm for estimating MI (two important issues)
- A moderate compression rate is expected, to guarantee reliable estimates
- Unsupervised data compression algorithms are not preferred when estimating the MI between input and output
31. Paper 2: Effective feature selection scheme using mutual information
- Estimating MI
- Parzen window estimator
- The uniform distribution is not applied
- Gaussian kernel function
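A hedged sketch of a Parzen-window MI estimate with a Gaussian kernel, using the resubstitution form I(X; C) = E[log f(x|c) / f(x)]; the kernel width is an illustrative assumption, and the paper's exact estimator may differ in detail:

```python
import numpy as np

def parzen_mi(X, c, sigma=0.5):
    """Estimate I(X; C) as the sample mean of log f(x_i | c_i) / f(x_i)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    c = np.asarray(c)
    n = len(X)
    # Gaussian kernel matrix; its normalizing constant cancels in the ratio.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    f_x = K.mean(axis=1)                                    # Parzen f(x_i)
    f_xc = np.array([K[i, c == c[i]].mean() for i in range(n)])  # f(x_i|c_i)
    return float(np.log(f_xc / f_x).mean())
```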
32. Paper 2: Effective feature selection scheme using mutual information
2000 data patterns; different values of σ²; 10 different data sets. MI-B: Bonnlander's approach; MI-K: Kwak's approach
33. Paper 2: Effective feature selection scheme using mutual information
- Evaluation of computational efficiency
- 2000 data patterns; time consumed: MI-B 7.57 s, MI-K 8.23 s, MI-P 0.67 s
34. Paper 2: Effective feature selection scheme using mutual information
- Effect of the size of the data set
35. Paper 2: Effective feature selection scheme using mutual information - experimental results (prostate cancer classification)
36. Paper 2: Effective feature selection scheme using mutual information - experimental results (prostate cancer classification)
37. Paper 2: Effective feature selection scheme using mutual information - experimental results (prostate cancer classification)
38. Paper 2: Effective feature selection scheme using mutual information
- Conclusions
- New MI-based scheme to search for salient features that are relevant to classification and not redundant with the already-selected features
- The proposed scheme is efficient, especially for large data sets
- Comparative results show the superiority of the proposed methodology
39. Conclusions
- Information-Theoretic Learning works directly with the information contained in the samples, without any further assumptions
- The Parzen window algorithm is used to estimate entropy and mutual information
- Rényi's quadratic entropy can be readily integrated with the Parzen window estimator
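The "readily integrated" claim is the standard closed-form identity: substituting the Gaussian Parzen estimate into the quadratic entropy, each integral of a product of two Gaussian kernels collapses into a single Gaussian of doubled variance (a sketch of the usual derivation, with kernel width σ):

```latex
\hat{H}_2(X) = -\log \int \hat{f}^{\,2}(x)\,dx
             = -\log \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N}
               G_{\sigma\sqrt{2}}\!\left(x_i - x_j\right),
\quad \text{since} \quad
\int G_{\sigma}(x - x_i)\, G_{\sigma}(x - x_j)\, dx = G_{\sigma\sqrt{2}}(x_i - x_j)
```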
40. Conclusions
- With Rényi's quadratic entropy we have a more principled approach to manipulating entropy directly
- Information-Theoretic Learning can be used with the most common Machine Learning algorithms
41. References
- [1] Principe J. C., Information-Theoretic Learning. Tutorial, 2008, http://itl.cnel.ufl.edu/ITL_webpage/website%20review.pdf
- [2] Principe J. C., Xu D., Fisher III J. W., Information-Theoretic Learning, in Unsupervised Adaptive Filtering, S. Haykin, ed., New York: John Wiley & Sons, 2000.
- [3] Jeong K., Xu J., Erdogmus D., Principe J. C., A new classifier based on information theoretic learning with unlabeled data. Neural Networks 18, 5-6 (Jun. 2005).
- [4] Huang D., Chow T. W. S., Effective feature selection scheme using mutual information, Neurocomputing, Volume 63 (New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks), January 2005, Pages 325-343.
42. Thank you. Any questions?