1
Lecture 16
  • Statistical Pattern Recognition and
  • Small Sample Size Problems
  • (Covariance Estimation)

Many thanks to Carlos Thomaz who authored the
original version of these slides
2
The Multi-dimensional Gaussian distribution
  • The n-dimensional Gaussian distribution is written
  • p(x) = \exp\big(-\tfrac{1}{2}(x - m)^T S^{-1} (x - m)\big) \,/\, \big((2\pi)^{n/2} |S|^{1/2}\big)
  • we can use it to determine the probability of
    membership of a class given S and m. For a given
    class we may also have a prior probability, using
    \pi_i for class i
  • p(\mathrm{class}_i \mid x) \propto \pi_i \exp\big(-\tfrac{1}{2}(x - m_i)^T S_i^{-1} (x - m_i)\big) \,/\, \big((2\pi)^{n/2} |S_i|^{1/2}\big)
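
As a quick illustration, here is a minimal numpy sketch of evaluating this density in log form (the function name and the use of solve/slogdet are illustrative choices, not part of the lecture):

```python
import numpy as np

def gaussian_log_density(x, mu, S):
    """Log of the n-dimensional Gaussian density with mean mu and covariance S."""
    n = mu.size
    diff = x - mu
    # Mahalanobis term, computed via solve() rather than an explicit inverse.
    maha = diff @ np.linalg.solve(S, diff)
    _, logdet = np.linalg.slogdet(S)  # log |S|, numerically stable for large n
    return -0.5 * (maha + logdet + n * np.log(2 * np.pi))
```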

3
The Bayes Plug-in Classifier (parametric)
  • Using logs (the log is monotonic, so the ranking of
    the class scores is unchanged and numerical
    underflow is avoided)
  • the rule becomes
  • Assign pattern x to class i if the quadratic score
    d_i(x) = \ln \pi_i - \tfrac{1}{2}\ln|S_i| - \tfrac{1}{2}(x - m_i)^T S_i^{-1} (x - m_i)
    is the largest over all classes
  • → Focus on covariance estimators for S_i
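
A minimal sketch of this decision rule, assuming the quadratic score written above (names are illustrative):

```python
import numpy as np

def quadratic_score(x, mean_i, S_i, prior_i):
    """Bayes plug-in quadratic discriminant score for one class."""
    diff = x - mean_i
    maha = diff @ np.linalg.solve(S_i, diff)
    _, logdet = np.linalg.slogdet(S_i)
    return np.log(prior_i) - 0.5 * logdet - 0.5 * maha

def classify(x, means, covs, priors):
    """Assign x to the class with the largest quadratic score."""
    scores = [quadratic_score(x, m, S, p) for m, S, p in zip(means, covs, priors)]
    return int(np.argmax(scores))
```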

4
The Parzen Window Classifier (non-parametric)
  • It is based on pdfs estimated locally using Gaussian
    kernels and a number of neighbours. In the standard
    Parzen classifier the kernel covariance is based on
    the sample group covariance matrix S_i
  • → Analogously, focus on estimators for S_i
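
For concreteness, a sketch of a Parzen density estimate with an isotropic Gaussian kernel of bandwidth h (an assumption here; the lecture's version builds the kernel from S_i, which is exactly where the estimators below come in):

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x) from one class's training samples,
    using an isotropic Gaussian kernel of bandwidth h."""
    N, n = samples.shape
    sq = np.sum((samples - x) ** 2, axis=1) / h ** 2
    kernels = np.exp(-0.5 * sq) / ((2 * np.pi) ** (n / 2) * h ** n)
    return kernels.mean()
```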

5
Statistical Pattern Recognition
  • Information about class membership is contained in
    the set of class-conditional probability density
    functions (pdfs), which can be specified in advance
    (parametric) or learned from data (non-parametric).
  • In practice, pdfs are based on Gaussian kernels
    that involve the inverse of the sample group
    covariance matrix.
  • However, in high-dimensional problems the
    covariance matrix may be singular
  • the Small Sample Size Problem

6
Small samples
  • In many pattern recognition applications there are
    a large number of features (n), but the number of
    training patterns per class (N_i) may be
    significantly smaller than the dimension of the
    feature space.
  • N_i << n !

7
For instance
  • In image recognition problems each group is
    commonly defined by a small number of pictures,
    but the number of features used for recognition
    may be thousands of pixels or even hundreds of
    pre-processed image attributes.
  • High dimensional problems!

8
This implies
  • that the performance of classical statistical
    pattern recognition techniques, which have been
    used successfully to design several recognition
    systems, deteriorates in such small sample size
    settings.

9
Small Sample Size Problem
  • Sample group covariance matrix S_i
  • S_i is singular when N_i < n (its rank is at most
    N_i − 1)
  • S_i is poorly estimated when N_i is not >> n
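
This is easy to verify numerically; a small demonstration (the dimensions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N_i = 100, 10                        # n features, N_i << n training patterns
X = rng.standard_normal((N_i, n))
S_i = np.cov(X, rowvar=False)           # n x n sample group covariance

print(np.linalg.matrix_rank(S_i))       # at most N_i - 1 = 9: S_i is singular
print(np.linalg.det(S_i))               # ~0, so the inverse S_i^{-1} does not exist
```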

10
Covariance Matrix Estimation
  • Geometric idea of parametric classifiers

11
Covariance Matrix Estimation
  • Geometric idea of non-parametric classifiers

12
Covariance Estimation
  • 1. Pooled covariance matrix S_p (LDF)
  • → Assumes equal covariance for all groups
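
A minimal sketch of the pooled estimate, using the usual degrees-of-freedom weighting (an assumption; the transcript does not show the lecture's exact weights):

```python
import numpy as np

def pooled_covariance(groups):
    """S_p: weighted average of the sample group covariances,
    as used by the linear discriminant (LDF)."""
    n = groups[0].shape[1]
    scatter = np.zeros((n, n))
    dof = 0
    for X in groups:                         # each X has shape (N_i, n)
        N_i = X.shape[0]
        scatter += (N_i - 1) * np.cov(X, rowvar=False)
        dof += N_i - 1
    return scatter / dof
```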

13
Covariance Estimation (continued)
  • 2. Friedman's RDA covariance estimator (1989)
  • Maximises the classification accuracy
  • Computationally intensive method
  • Same mixing parameters (λ, γ) for all classes
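
A sketch of the usual two-step RDA form, with λ shrinking the group covariance towards the pooled one and γ shrinking towards a multiple of the identity; the weights below are one common convention, not necessarily the lecture's:

```python
import numpy as np

def rda_covariance(S_i, S_p, N_i, N, lam, gam):
    """Friedman-style regularised covariance estimate S_i(lambda, gamma)."""
    n = S_i.shape[0]
    # Step 1: mix group and pooled covariance (scatter-weighted convention).
    w_i, w_p = (1 - lam) * (N_i - 1), lam * (N - 1)
    S_lam = (w_i * S_i + w_p * S_p) / (w_i + w_p)
    # Step 2: shrink towards the identity scaled by the average variance.
    return (1 - gam) * S_lam + (gam / n) * np.trace(S_lam) * np.eye(n)
```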

14
Covariance Estimation (continued)
  • 3. Hoffbeck's LOOC estimator (1996)
  • Maximises the average leave-one-out likelihood of
    each class
  • Requires less computation than RDA

15
Covariance Estimation (non-parametric classifiers)
  • 1. Van Ness covariance matrices S_ness (1980)
  • Maximises the classification accuracy
  • Same smoothing parameter (a) for all classes
  • No covariance information

16
Covariance Estimation (non-parametric classifiers)
  • 2. Toeplitz covariance matrices S_toep (1996)
  • Based on a stationarity assumption (restrictive)
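
A sketch of one way to impose the stationarity assumption: the covariance depends only on the lag |j − k|, so each diagonal of the sample covariance is averaged into a single value (an illustrative construction):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_covariance(X):
    """Toeplitz covariance estimate: one covariance value per lag |j - k|."""
    S = np.cov(X, rowvar=False)
    n = S.shape[0]
    # Average each diagonal of S to get the covariance at each lag.
    lags = np.array([np.diagonal(S, offset=k).mean() for k in range(n)])
    return toeplitz(lags)  # symmetric Toeplitz matrix with first column `lags`
```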

17
A Maximum Entropy Covariance Estimate
  • Work by Carlos Thomaz
  • It is based on the idea that
  • "When we make inferences based on incomplete
    information, we should draw them from that
    probability distribution that has the maximum
    entropy permitted by the information we do have."
  • E. T. Jaynes, 1982

18
Loss of Covariance Information
  • In many pattern recognition problems the sources
    of variation are often the same from group to
    group.
  • A similar covariance shape may be assumed
  • In such situations, and when the S_i are singular
    or poorly estimated, a linear combination of S_i
    and, for instance, S_p may lead to a
  • loss of covariance information.

19
Loss of Covariance Information (cont.)
20
Loss of Covariance Information (cont.)
  • However, when N_i < n, the (n − N_i + 1) smallest
    eigenvalues (variances along the eigenvector axes)
    of S_i are approximately 0 !
  • Therefore, using the same parameters a and b over
    the whole feature space fritters away some of the
    pooled covariance information !

21
Loss of Covariance Information (cont.)
  • Geometric Idea

22
A Maximum Entropy Covariance (cont.)
  • Let an n-dimensional sample X_i be normally
    distributed with true covariance matrix S_i. Its
    entropy h can be written as
  • h(X_i) = \tfrac{1}{2} \ln\big( (2\pi e)^n |S_i| \big)
  • which is simply a function of the determinant of
    S_i and is invariant under any orthonormal
    transformation.

23
A Maximum Entropy Covariance (cont.)
  • Thus, in order to maximise the entropy h, i.e. the
    determinant |S_i| (the product of the eigenvalues),
    we must select the covariance estimate of S_i that
    gives the largest eigenvalues.

24
A Maximum Entropy Covariance (cont.)
  • Considering linear combinations of S_i and S_p of
    the form S_{mix}(a, b) = a\,S_i + b\,S_p, with
    a, b ≥ 0

25
A Maximum Entropy Covariance (cont.)
  • Moreover, as the natural log is a monotonically
    increasing function, we can maximise
    \ln|S_{mix}| = \sum_j \ln \lambda_j, the sum of the
    log variances along the eigenvector axes.
  • However, on each axis a convex combination
    a\,\lambda_j^i + b\,\lambda_j^p can never exceed the
    larger of the two variances \lambda_j^i and
    \lambda_j^p.
  • Therefore, we do not need to choose the best
    parameters a and b but simply select the maximum
    variances of the corresponding matrices.
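
The key inequality is easy to check numerically (a toy verification with arbitrary variances):

```python
import numpy as np

rng = np.random.default_rng(1)
v_i, v_p = rng.random(5), rng.random(5)    # variances on 5 shared axes
a = rng.random(); b = 1 - a                # any convex combination

# Per axis, a*v_i + b*v_p <= max(v_i, v_p), so the product (the determinant,
# hence the entropy) of the max-selected variances is always at least as large.
print(np.prod(a * v_i + b * v_p) <= np.prod(np.maximum(v_i, v_p)))   # True
```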

26
A Maximum Entropy Covariance (cont.)
  • The Maximum Entropy Covariance Selection (MECS)
    method is given by the following procedure (see
    the sketch after this list)
  • 1. Find the eigenvectors Φ of S_i + S_p
  • 2. Calculate the variance contribution of both
    matrices along those eigenvectors
  • 3. Form a new variance matrix from the largest
    value on each axis
  • 4. Form the MECS estimator from Φ and the
    selected variances
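
Putting the four steps together, a minimal numpy sketch of my reading of the procedure (the function name is illustrative):

```python
import numpy as np

def mecs_covariance(S_i, S_p):
    """Maximum Entropy Covariance Selection (MECS) estimate for one class."""
    # 1. Eigenvectors of the combined matrix S_i + S_p.
    _, Phi = np.linalg.eigh(S_i + S_p)
    # 2. Variance contribution of each matrix along those eigenvectors.
    var_i = np.diag(Phi.T @ S_i @ Phi)
    var_p = np.diag(Phi.T @ S_p @ Phi)
    # 3. Keep the larger variance on each axis (the maximum entropy choice).
    var_max = np.maximum(var_i, var_p)
    # 4. Reassemble the estimator in the original feature space.
    return Phi @ np.diag(var_max) @ Phi.T
```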

27
Visual Analysis
  • The top row shows the 5 image training examples
    of a subject and the subsequent rows show the
    image eigenvectors (with the corresponding
    eigenvalues below) of the following covariance
    matrices
  • (1) sample group
  • (2) pooled
  • (3) maximum likelihood mixture
  • (4) maximum classification mixture
  • (5) maximum entropy mixture.

28
Visual Analysis (cont.)
  • (Figure with the same caption as the previous
    slide.)

29
Visual Analysis (cont.)
  • The top row shows the 3 image training examples
    of a subject and the subsequent rows show the
    image eigenvectors (with the corresponding
    eigenvalues below) of the following covariance
    matrices
  • (1) sample group
  • (2) pooled
  • (3) maximum likelihood mixture
  • (4) maximum classification mixture
  • (5) maximum entropy mixture.

30
Visual Analysis (cont.)
  • (Figure with the same caption as the previous
    slide.)

31
How accurate is the MECS idea?
  • The details will be covered in the next lecture
    about the covariance-based classifier called
    Linear Discriminant Analysis (LDA).
  • Exemplar Neonatal Brain Classification and
    Analysis