1
Information Theoretic Learning
  • PDEEC Machine Learning
  • Hélder P. Oliveira

2
Outline
  • Information Theory
  • Information-Theoretic Learning
  • ITL Unifying Criterion for Learning
  • Paper1
  • A new classifier based on information theoretic
    learning with unlabeled data
  • Paper2
  • Effective feature selection scheme using mutual
    information
  • Conclusions
  • References

3
Introduction
  • Humans are constantly bombarded by large
    amounts of data
  • Humans seek information, not data
  • Problem: how to extract information directly
    from data?

4
Information Theory
  • In 1948 Shannon introduced the foundations of
    Information Theory (IT)
  • Information can be characterized by math models
  • IT has had tremendous impact in communication
    systems
  • IT is able to answer two key questions (stated
    formally below)
  • What is the minimal code for our data?
  • What is the maximal amount of information that
    can be transferred through a channel?
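Both questions have compact classical answers (standard Shannon results, added here for reference): the minimal average code length per symbol is bounded below by the source entropy, and the channel capacity is the maximal mutual information over input distributions:

    H(X) = -\sum_x p(x)\,\log_2 p(x), \qquad C = \max_{p(x)} I(X; Y)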

5
Information-Theoretic Learning
  • ITL uses descriptors from information theory
    (entropy and divergences), estimated directly from
    the data, to substitute for the conventional
    statistical descriptors of variance and
    covariance
  • Entropy: single data source
  • Divergence: multiple data sources

6
Information-Theoretic Learning
  • Rényi's entropy (standard form below)
  • Estimation methods
  • Parzen window
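The standard ITL forms [1] are Rényi's quadratic entropy and its Parzen plug-in estimator with a Gaussian kernel G_\sigma (Gaussian kernel widths add under convolution):

    H_2(X) = -\log \int f_X(x)^2\,dx, \qquad
    \hat{H}_2(X) = -\log\Big(\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(x_i - x_j)\Big)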

7
Information-Theoretic Learning
From [1]
8
Information-Theoretic Learning Unifying
Criterion for Learning
  • Criteria 1 and 2: unsupervised learning
  • Criterion 3: supervised learning

From [1]
9
Information-Theoretic Learning Unifying
Criterion for Learning
10
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Jeong K., Xu J., Erdogmus D., Principe J. C., "A
    new classifier based on information theoretic
    learning with unlabeled data", Neural Netw. 18,
    5-6 (Jun. 2005)
  • ITL approach based on density divergence
    minimization, yielding an extended training
    algorithm that uses unlabeled data during the
    testing phase
  • Weights are updated during the test phase
  • Cost function combined with a boosting-like
    algorithm

11
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Euclidean distance-matching algorithm (cost
    reconstructed below)
  • f_d,trn: pdf of the desired signal during the
    training phase
  • f_y,tst: pdf of the system output during the
    testing phase
  • w: weights
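From these definitions, the Euclidean distance-matching cost is the integrated squared error between the two densities (a reconstruction consistent with the slide's definitions and with [3]):

    J(w) = \int \big( f_{d,trn}(z) - f_{y,tst}(z; w) \big)^2\,dz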

12
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • pdf estimated directly from the data samples
    (nonparametric method)
  • Parzen window (see below)
  • k(·, σ²): zero-mean Gaussian kernel
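With this kernel, the Parzen estimate of a pdf from samples {y_i}, i = 1, ..., N, is

    \hat{f}(y) = \frac{1}{N} \sum_{i=1}^{N} k(y - y_i, \sigma^2)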

13
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Gradient descent algorithm (generic update below)
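In its generic form, the update moves the weights along the negative gradient of the cost J(w) with step size \eta:

    w(k+1) = w(k) - \eta \left. \frac{\partial J}{\partial w} \right|_{w = w(k)}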

14
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
15
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Weights update (batch mode)
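In batch mode the gradient is accumulated over all test samples before the weights change; by the chain rule, a generic form of the update (the slide's exact equation is an image) is

    \Delta w = -\eta \sum_{j=1}^{N_{tst}} \frac{\partial J}{\partial y_j} \frac{\partial y_j}{\partial w}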

16
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Boosting-like algorithm
  • A change of prior probabilities between training
    and testing has an important impact on the
    decision boundary
  • With this algorithm the effect of the prior
    probabilities is minimized (illustrative sketch
    below)
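The prior-shift problem that the boosting-like algorithm addresses can be illustrated with the standard Bayes correction of classifier posteriors; this minimal Python sketch is illustrative only and is not the paper's actual update rule:

    import numpy as np

    def prior_corrected_posteriors(p, pi_train, pi_test):
        # p: (n_samples, n_classes) posteriors from a classifier
        # trained under class priors pi_train; rescale to the test
        # priors pi_test and renormalize (standard Bayes correction).
        unnorm = p * (np.asarray(pi_test) / np.asarray(pi_train))
        return unnorm / unnorm.sum(axis=1, keepdims=True)

A classifier whose outputs are left uncorrected implicitly keeps the training priors, which is exactly the mismatch the boosting-like algorithm tries to minimize.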

17
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
18
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Simulation
  • Simple pattern classification problem
  • Real biomedical classification problem
  • Same neural network topology for training and
    testing
  • Classify two different classes

19
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Simulation on a biomedical data set from neural
    recordings made during the surgical treatment of
    Parkinson's disease
  • Spike trains collected from
  • Thalamus (Thal): class 1
  • Subthalamic nucleus (STN): class 2
  • (cellular activity recorded during deep brain
    stimulation surgery)

20
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Training data set for three patients

21
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Test: Patient 1

22
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Test: Patient 2

23
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Test: Patient 3

24
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • The conventional classifier does not perform as
    well as the overall training results would suggest
  • The test-set prior probabilities differ from
    those of the training set

25
Paper 1: A new classifier based on information
theoretic learning with unlabeled data
  • Conclusions
  • New information-theoretic approach for training
    that uses unlabeled data in the test phase
  • Better performance in adjusting the system
    weights
  • For classification, the method requires
    preservation of the prior probabilities between
    training and testing

26
Paper 2: Effective feature selection scheme using
mutual information
  • Huang D., Chow T. W. S., "Effective feature
    selection scheme using mutual information",
    Neurocomputing, Volume 63, New Aspects in
    Neurocomputing: 11th European Symposium on
    Artificial Neural Networks, January 2005, Pages
    325-343
  • Novel mutual-information-based feature selection
    criterion (MIIO)
  • Proposed a supervised data compression algorithm
  • Mutual information estimated directly from data

27
Paper 2: Effective feature selection scheme using
mutual information
  • Advantages
  • MIIO can be estimated reliably even with a small
    data set
  • The supervised data clustering algorithm enhances
    the computational efficiency of MIIO
  • Able to determine the most prominent feature at
    each iteration, even for a highly nonlinear
    problem
  • Avoids high redundancy among the selected
    features

28
Paper 2: Effective feature selection scheme using
mutual information
  • Classification: reduce the uncertainty about the
    predicted class labels C given the known
    observations X
  • MI classification: increase the MI as much as
    possible, i.e. achieve a higher I(X; C) with fewer
    features (definition below)
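For a discrete class variable C, mutual information measures the reduction in label uncertainty (standard definition):

    I(X; C) = H(C) - H(C \mid X) = \sum_{x}\sum_{c} p(x, c)\,\log\frac{p(x, c)}{p(x)\,p(c)}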

29
Paper 2: Effective feature selection scheme using
mutual information
  • MI feature selection process (greedy sketch
    below)
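The selection loop can be sketched as a greedy forward search; estimate_mi stands in for any joint MI estimator (for instance the Parzen-based one sketched a few slides below) and is an assumed helper, not the paper's actual code:

    import numpy as np

    def greedy_mi_selection(X, y, estimate_mi, n_features):
        # At each iteration, add the feature that maximizes the MI
        # between the enlarged feature subset and the class labels.
        selected = []
        remaining = list(range(X.shape[1]))
        while len(selected) < n_features and remaining:
            best = max(remaining,
                       key=lambda f: estimate_mi(X[:, selected + [f]], y))
            selected.append(best)
            remaining.remove(best)
        return selected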

30
Paper 2: Effective feature selection scheme using
mutual information
  • Data compression algorithm for estimating MI (two
    important issues)
  • A moderate compression is expected, to guarantee
    reliable estimates
  • Unsupervised data compression algorithms are not
    preferred when estimating the MI between input and
    output

31
Paper 2: Effective feature selection scheme using
mutual information
  • Estimating MI
  • Parzen window estimator (sketch below)
  • A uniform window is not applied
  • Gaussian kernel function
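A minimal Parzen-based MI estimator consistent with these ingredients (Gaussian kernel, discrete class labels); the shared bandwidth sigma and the resubstitution estimate are simplifying assumptions, not the paper's exact estimator:

    import numpy as np

    def parzen_mi(X, y, sigma=1.0):
        # Estimate I(X; C) = H(C) - H(C|X) using class-conditional
        # Parzen densities with a Gaussian kernel of width sigma.
        X, y = np.asarray(X, float), np.asarray(y)
        n = len(y)
        classes, counts = np.unique(y, return_counts=True)
        priors = counts / n
        # pairwise squared distances and Gaussian kernel matrix
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / (2.0 * sigma ** 2))
        # p(x_j, c) up to a constant that cancels when normalizing
        post = np.stack([priors[k] * K[:, y == c].mean(axis=1)
                         for k, c in enumerate(classes)], axis=1)
        post /= post.sum(axis=1, keepdims=True)    # p(c | x_j)
        h_c = -(priors * np.log2(priors)).sum()
        h_c_x = -(post * np.log2(post + 1e-12)).sum(axis=1).mean()
        return h_c - h_c_x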

32
Paper 2: Effective feature selection scheme using
mutual information
  • MI estimation evaluation

[Figure: MI estimates on 2000 data patterns for
different values of σ², over 10 different data sets.
MI-B: Bonnlander's approach; MI-K: Kwak's approach.]
33
Paper 2: Effective feature selection scheme using
mutual information
  • Evaluation of computational efficiency

[Figure: computational efficiency on 2000 data
patterns. Time consumed: MI-B 7.57 s, MI-K 8.23 s,
MI-P (proposed) 0.67 s.]
34
Paper 2: Effective feature selection scheme using
mutual information
  • Effect of the size of the data set

35
Paper 2: Effective feature selection scheme using
mutual information, experimental results
(prostate cancer classification)
36
Paper 2: Effective feature selection scheme using
mutual information, experimental results
(prostate cancer classification)
37
Paper 2: Effective feature selection scheme using
mutual information, experimental results
(prostate cancer classification)
38
Paper 2: Effective feature selection scheme using
mutual information
  • Conclusions
  • New MI-based scheme to search for salient
    features that are relevant to classification and
    not redundant with the already selected features
  • The proposed scheme is especially efficient for
    large data sets
  • Comparative results show the superiority of the
    proposed methodology

39
Conclusions
  • Information-Theoretic Learning works directly
    with the information contained in the samples,
    without further assumptions
  • The Parzen window algorithm is used to estimate
    entropy and mutual information
  • Rényi's quadratic entropy can be readily
    integrated with the Parzen window estimator

40
Conclusions
  • With Rényi's quadratic entropy we have a more
    principled approach to directly manipulating
    entropy
  • Information-Theoretic Learning can be used with
    the most common machine learning algorithms

41
References
  • [1] Principe J. C., "Information-Theoretic
    Learning", Tutorial, 2008,
    http://itl.cnel.ufl.edu/ITL_webpage/website20review.pdf
  • [2] Principe J. C., Xu D., Fisher III J. W.,
    "Information-Theoretic Learning", in Unsupervised
    Adaptive Filtering, S. Haykin, ed., New York:
    John Wiley & Sons, 2000
  • [3] Jeong K., Xu J., Erdogmus D., Principe J.
    C., "A new classifier based on information
    theoretic learning with unlabeled data", Neural
    Netw. 18, 5-6 (Jun. 2005)
  • [4] Huang D., Chow T. W. S., "Effective feature
    selection scheme using mutual information",
    Neurocomputing, Volume 63, New Aspects in
    Neurocomputing: 11th European Symposium on
    Artificial Neural Networks, January 2005, Pages
    325-343

42
  • Thank you.
  • Any questions?