1
Lecture 16
  • Statistical Pattern Recognition and
  • Small Sample Size Problems
  • (Covariance Estimation)

Many thanks to Carlos Thomaz who authored the
original version of these slides
2
The Multi-dimensional Gaussian distribution
  • The n-dimensional Gaussian distribution is written
  • p(x) = \exp\big(-\tfrac{1}{2}(x - m)^T S^{-1} (x - m)\big) \,/\, \big((2\pi)^{n/2} |S|^{1/2}\big)
  • we can use it to determine the probability of
    membership of a class given S and m. For a given
    class we may also have a prior probability, using
    \pi_i for class i
  • p(\mathrm{class}_i \mid x) \propto \pi_i \exp\big(-\tfrac{1}{2}(x - m_i)^T S_i^{-1} (x - m_i)\big) \,/\, \big((2\pi)^{n/2} |S_i|^{1/2}\big)
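
As a quick illustration, here is a minimal numpy sketch of evaluating this density in log form (the function name and the use of solve/slogdet are illustrative choices, not part of the lecture):

```python
import numpy as np

def gaussian_log_density(x, mu, S):
    """Log of the n-dimensional Gaussian density with mean mu and covariance S."""
    n = mu.size
    diff = x - mu
    # Mahalanobis term, computed via solve() rather than an explicit inverse.
    maha = diff @ np.linalg.solve(S, diff)
    _, logdet = np.linalg.slogdet(S)  # log |S|, numerically stable for large n
    return -0.5 * (maha + logdet + n * np.log(2 * np.pi))
```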

3
The Bayes Plug-in Classifier (parametric)
  • Using logs (the log is monotonic, so the ranking of
    the class scores is unchanged and numerical
    underflow is avoided)
  • the rule becomes
  • Assign pattern x to class i if the quadratic score
    d_i(x) = \ln \pi_i - \tfrac{1}{2}\ln|S_i| - \tfrac{1}{2}(x - m_i)^T S_i^{-1} (x - m_i)
    is the largest over all classes
  • → Focus on covariance estimators for S_i
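
A minimal sketch of this decision rule, assuming the quadratic score written above (names are illustrative):

```python
import numpy as np

def quadratic_score(x, mean_i, S_i, prior_i):
    """Bayes plug-in quadratic discriminant score for one class."""
    diff = x - mean_i
    maha = diff @ np.linalg.solve(S_i, diff)
    _, logdet = np.linalg.slogdet(S_i)
    return np.log(prior_i) - 0.5 * logdet - 0.5 * maha

def classify(x, means, covs, priors):
    """Assign x to the class with the largest quadratic score."""
    scores = [quadratic_score(x, m, S, p) for m, S, p in zip(means, covs, priors)]
    return int(np.argmax(scores))
```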

4
The Parzen Window Classifier (non-parametric)
  • It is based on pdfs estimated locally using Gaussian
    kernels and a number of neighbours. In the standard
    Parzen classifier the kernel covariance is based on
    the sample group covariance matrix S_i
  • → Analogously, focus on estimators for S_i
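
For concreteness, a sketch of a Parzen density estimate with an isotropic Gaussian kernel of bandwidth h (an assumption here; the lecture's version builds the kernel from S_i, which is exactly where the estimators below come in):

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x) from one class's training samples,
    using an isotropic Gaussian kernel of bandwidth h."""
    N, n = samples.shape
    sq = np.sum((samples - x) ** 2, axis=1) / h ** 2
    kernels = np.exp(-0.5 * sq) / ((2 * np.pi) ** (n / 2) * h ** n)
    return kernels.mean()
```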

5
Statistical Pattern Recognition
  • Information about class membership is contained in
    the set of class-conditional probability density
    functions (pdfs), which can be specified in advance
    (parametric) or learned from data (non-parametric).
  • In practice, pdfs are based on Gaussian kernels
    that involve the inverse of the sample group
    covariance matrix.
  • However, in high-dimensional problems the
    covariance matrix may be singular
  • the Small Sample Size Problem

6
Small samples
  • In many pattern recognition applications there are
    a large number of features (n), but the number of
    training patterns per class (N_i) may be
    significantly smaller than the dimension of the
    feature space.
  • N_i << n !

7
For instance
  • In image recognition problems each group is
    commonly defined by a small number of pictures,
    but the number of features used for recognition
    may be thousands of pixels or even hundreds of
    pre-processed image attributes.
  • High dimensional problems!

8
This implies
  • that the performance of classical statistical
    pattern recognition techniques, which have been
    used successfully to design several recognition
    systems, deteriorates in such small sample size
    settings.

9
Small Sample Size Problem
  • Sample group covariance matrix S_i
  • S_i is singular when N_i < n (its rank is at most
    N_i − 1)
  • S_i is poorly estimated when N_i is not >> n
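
This is easy to verify numerically; a small demonstration (the dimensions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N_i = 100, 10                        # n features, N_i << n training patterns
X = rng.standard_normal((N_i, n))
S_i = np.cov(X, rowvar=False)           # n x n sample group covariance

print(np.linalg.matrix_rank(S_i))       # at most N_i - 1 = 9: S_i is singular
print(np.linalg.det(S_i))               # ~0, so the inverse S_i^{-1} does not exist
```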

10
Covariance Matrix Estimation
  • Geometric idea of parametric classifiers

11
Covariance Matrix Estimation
  • Geometric idea of non-parametric classifiers

12
Covariance Estimation
  • 1. Pooled covariance matrix S_p (LDF)
  • → Assumes equal covariance for all groups
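
A minimal sketch of the pooled estimate, using the usual degrees-of-freedom weighting (an assumption; the transcript does not show the lecture's exact weights):

```python
import numpy as np

def pooled_covariance(groups):
    """S_p: weighted average of the sample group covariances,
    as used by the linear discriminant (LDF)."""
    n = groups[0].shape[1]
    scatter = np.zeros((n, n))
    dof = 0
    for X in groups:                         # each X has shape (N_i, n)
        N_i = X.shape[0]
        scatter += (N_i - 1) * np.cov(X, rowvar=False)
        dof += N_i - 1
    return scatter / dof
```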

13
Covariance Estimation (continued)
  • 2. Friedman's RDA covariance estimator (1989)
  • Maximises the classification accuracy
  • Computationally intensive method
  • Same mixing parameters (λ, γ) for all classes
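
A sketch of the usual two-step RDA form, with λ shrinking the group covariance towards the pooled one and γ shrinking towards a multiple of the identity; the weights below are one common convention, not necessarily the lecture's:

```python
import numpy as np

def rda_covariance(S_i, S_p, N_i, N, lam, gam):
    """Friedman-style regularised covariance estimate S_i(lambda, gamma)."""
    n = S_i.shape[0]
    # Step 1: mix group and pooled covariance (scatter-weighted convention).
    w_i, w_p = (1 - lam) * (N_i - 1), lam * (N - 1)
    S_lam = (w_i * S_i + w_p * S_p) / (w_i + w_p)
    # Step 2: shrink towards the identity scaled by the average variance.
    return (1 - gam) * S_lam + (gam / n) * np.trace(S_lam) * np.eye(n)
```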

14
Covariance Estimation (continued)
  • 3. Hoffbeck's LOOC estimator (1996)
  • Maximises the average leave-one-out likelihood of
    each class
  • Requires less computation than RDA

15
Covariance Estimation (non-parametric classifiers)
  • 1. Van Ness covariance matrices S_ness (1980)
  • Maximises the classification accuracy
  • Same smoothing parameter (a) for all classes
  • No covariance information

16
Covariance Estimation (non-parametric classifiers)
  • 2. Toeplitz covariance matrices S_toep (1996)
  • Based on a stationarity assumption (restrictive)
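
A sketch of one way to impose the stationarity assumption: the covariance depends only on the lag |j − k|, so each diagonal of the sample covariance is averaged into a single value (an illustrative construction):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_covariance(X):
    """Toeplitz covariance estimate: one covariance value per lag |j - k|."""
    S = np.cov(X, rowvar=False)
    n = S.shape[0]
    # Average each diagonal of S to get the covariance at each lag.
    lags = np.array([np.diagonal(S, offset=k).mean() for k in range(n)])
    return toeplitz(lags)  # symmetric Toeplitz matrix with first column `lags`
```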

17
A Maximum Entropy Covariance Estimate
  • Work by Carlos Thomaz
  • It is based on the idea that
  • "When we make inferences based on incomplete
    information, we should draw them from that
    probability distribution that has the maximum
    entropy permitted by the information we do have."
  • E. T. Jaynes, 1982

18
Loss of Covariance Information
  • In many pattern recognition problems the sources
    of variation are often the same from group to
    group.
  • A similar covariance shape may be assumed
  • In such situations, and when the S_i are singular
    or poorly estimated, a linear combination of S_i
    and, for instance, S_p may lead to a
  • loss of covariance information.

19
Loss of Covariance Information (cont.)
20
Loss of Covariance Information (cont.)
  • However, when N_i < n, the (n − N_i + 1) smallest
    eigenvalues (variances along the eigenvector axes)
    of S_i are approximately 0 !
  • Therefore, using the same parameters a and b over
    the whole feature space fritters away some of the
    pooled covariance information !

21
Loss of Covariance Information (cont.)
  • Geometric Idea

22
A Maximum Entropy Covariance (cont.)
  • Let an n-dimensional sample X_i be normally
    distributed with true covariance matrix S_i. Its
    entropy h can be written as
  • h(X_i) = \tfrac{1}{2} \ln\big( (2\pi e)^n |S_i| \big)
  • which is simply a function of the determinant of
    S_i and is invariant under any orthonormal
    transformation.

23
A Maximum Entropy Covariance (cont.)
  • Thus, in order to maximise the entropy h, i.e. the
    determinant |S_i| (the product of the eigenvalues),
    we must select the covariance estimate of S_i that
    gives the largest eigenvalues.

24
A Maximum Entropy Covariance (cont.)
  • Considering linear combinations of S_i and S_p of
    the form S_{mix}(a, b) = a\,S_i + b\,S_p, with
    a, b ≥ 0

25
A Maximum Entropy Covariance (cont.)
  • Moreover, as the natural log is a monotonically
    increasing function, we can maximise
    \ln|S_{mix}| = \sum_j \ln \lambda_j, the sum of the
    log variances along the eigenvector axes.
  • However, on each axis a convex combination
    a\,\lambda_j^i + b\,\lambda_j^p can never exceed the
    larger of the two variances \lambda_j^i and
    \lambda_j^p.
  • Therefore, we do not need to choose the best
    parameters a and b but simply select the maximum
    variances of the corresponding matrices.
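
The key inequality is easy to check numerically (a toy verification with arbitrary variances):

```python
import numpy as np

rng = np.random.default_rng(1)
v_i, v_p = rng.random(5), rng.random(5)    # variances on 5 shared axes
a = rng.random(); b = 1 - a                # any convex combination

# Per axis, a*v_i + b*v_p <= max(v_i, v_p), so the product (the determinant,
# hence the entropy) of the max-selected variances is always at least as large.
print(np.prod(a * v_i + b * v_p) <= np.prod(np.maximum(v_i, v_p)))   # True
```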

26
A Maximum Entropy Covariance (cont.)
  • The Maximum Entropy Covariance Selection (MECS)
    method is given by the following procedure (see
    the sketch after this list)
  • 1. Find the eigenvectors Φ of S_i + S_p
  • 2. Calculate the variance contribution of both
    matrices along those eigenvectors
  • 3. Form a new variance matrix from the largest
    value on each axis
  • 4. Form the MECS estimator from Φ and the
    selected variances
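
Putting the four steps together, a minimal numpy sketch of my reading of the procedure (the function name is illustrative):

```python
import numpy as np

def mecs_covariance(S_i, S_p):
    """Maximum Entropy Covariance Selection (MECS) estimate for one class."""
    # 1. Eigenvectors of the combined matrix S_i + S_p.
    _, Phi = np.linalg.eigh(S_i + S_p)
    # 2. Variance contribution of each matrix along those eigenvectors.
    var_i = np.diag(Phi.T @ S_i @ Phi)
    var_p = np.diag(Phi.T @ S_p @ Phi)
    # 3. Keep the larger variance on each axis (the maximum entropy choice).
    var_max = np.maximum(var_i, var_p)
    # 4. Reassemble the estimator in the original feature space.
    return Phi @ np.diag(var_max) @ Phi.T
```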

27
Visual Analysis
  • The top row shows the 5 image training examples
    of a subject and the subsequent rows show the
    image eigenvectors (with the corresponding
    eigenvalues below) of the following covariance
    matrices
  • (1) sample group
  • (2) pooled
  • (3) maximum likelihood mixture
  • (4) maximum classification mixture
  • (5) maximum entropy mixture.

28
Visual Analysis (cont.)
  • (Figure with the same caption as the previous
    slide.)

29
Visual Analysis (cont.)
  • The top row shows the 3 image training examples
    of a subject and the subsequent rows show the
    image eigenvectors (with the corresponding
    eigenvalues below) of the following covariance
    matrices
  • (1) sample group
  • (2) pooled
  • (3) maximum likelihood mixture
  • (4) maximum classification mixture
  • (5) maximum entropy mixture.

30
Visual Analysis (cont.)
  • (Figure with the same caption as the previous
    slide.)

31
How accurate is the MECS idea?
  • The details will be covered in the next lecture
    about the covariance-based classifier called
    Linear Discriminant Analysis (LDA).
  • Exemplar Neonatal Brain Classification and
    Analysis