1
Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

2
Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation (Part 1)
  • Introduction
  • Maximum-Likelihood Estimation
  • Example of a Specific Case
  • The Gaussian Case: unknown μ and σ
  • Bias
  • Appendix: ML Problem Statement

3
  • 3.1 Introduction
  • Data availability in a Bayesian framework
  • We could design an optimal classifier if we knew
  • P(ωi) (the prior probabilities)
  • p(x | ωi) (the class-conditional densities)
  • Unfortunately, we rarely have this complete information!
  • Typically we merely have some general knowledge, together with some design samples or training data
  • Approach to this problem: design a classifier from the training samples
  • Estimating the priors presents no serious difficulty
  • Samples are often too small for class-conditional estimation (large dimension of feature space!)

4
  • A priori information about the problem
  • Normality of p(x | ωi): p(x | ωi) ~ N(μi, Σi) (the density is written out below)
  • Characterized by two parameters: μi and Σi
  • This simplifies the problem from estimating an unknown function p(x | ωi) to estimating the two parameters μi and Σi
  • Estimation techniques
  • Maximum-Likelihood (ML) estimation and Bayesian estimation
  • The results are nearly identical, but the approaches are different
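For reference, the parametric form assumed here (not written out on the slide, but standard for the Gaussian model of the text) is the multivariate normal density, which is completely determined by μi and Σi:

\[
p(\mathbf{x}\mid\omega_i) \;=\; \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}}
\exp\!\Big[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^t\,\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\Big],
\]

where d is the dimension of the feature space.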

5
  • In ML estimation the parameters are viewed as fixed but unknown!
  • The best parameters are obtained by maximizing the probability of obtaining the samples actually observed
  • Bayesian methods view the parameters as random variables having some known prior distribution
  • Observation of the samples converts this into a posterior density function
  • Observing additional samples sharpens the a posteriori density, causing it to peak near the true values of the parameters (Bayesian learning)
  • In either approach, we use P(ωi | x) for our classification rule!

6
  • 3.2 Maximum-Likelihood Estimation
  • Has good convergence properties as the sample size increases
  • Simpler than alternative techniques
  • 3.2.1 General principle
  • Separate a collection of i.i.d. samples according to class, so that we have c data sets D1, ..., Dc, with the samples in Dj drawn independently according to p(x | ωj)
  • p(x | ωj) ~ N(μj, Σj) has a known parametric form
  • We therefore write p(x | ωj) as p(x | ωj, θj) to make its dependence on the parameter vector θj explicit; in the Gaussian case θj consists of the components of μj and Σj

7
  • Use the information provided by the training samples to estimate the parameter vector θ = (θ1, θ2, ..., θc), where each θi (i = 1, 2, ..., c) is associated with one category
  • Suppose that D contains n samples, x1, x2, ..., xn; because the samples were drawn independently, the likelihood factorizes (see below)
  • The ML estimate of θ is, by definition, the value that maximizes p(D | θ)
  • It is the value of θ that best agrees with the actually observed training samples
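The equation that followed these bullets appeared as an image and did not survive the transcript; the standard form from the text is the i.i.d. factorization of the likelihood together with the definition of the ML estimate:

\[
p(D\mid\boldsymbol{\theta}) \;=\; \prod_{k=1}^{n} p(\mathbf{x}_k\mid\boldsymbol{\theta}),
\qquad
\hat{\boldsymbol{\theta}} \;=\; \arg\max_{\boldsymbol{\theta}}\, p(D\mid\boldsymbol{\theta}).
\]

Viewed as a function of θ, p(D | θ) is called the likelihood of θ with respect to the set of samples.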

8
9
  • Optimal estimation
  • The number of parameters to be estimated is p
  • Let θ = (θ1, θ2, ..., θp)^t and let ∇θ be the gradient operator (written out below)
  • We define l(θ) as the log-likelihood function: l(θ) = ln p(D | θ)
  • New problem statement: determine the θ that maximizes the log-likelihood (see below)
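The slide's equations were images; the standard definitions they correspond to in the text are the gradient operator, the log-likelihood of the i.i.d. sample, and the restated problem:

\[
\nabla_{\boldsymbol{\theta}} \;=\; \Big(\tfrac{\partial}{\partial\theta_1},\,\ldots,\,\tfrac{\partial}{\partial\theta_p}\Big)^t,
\qquad
l(\boldsymbol{\theta}) \;=\; \ln p(D\mid\boldsymbol{\theta}) \;=\; \sum_{k=1}^{n}\ln p(\mathbf{x}_k\mid\boldsymbol{\theta}),
\qquad
\hat{\boldsymbol{\theta}} \;=\; \arg\max_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta}).
\]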

10
  • The set of necessary conditions for an optimum is
  • ∇θ l(θ) = 0
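Written out per sample (the form shown as an image on the original slide), this condition reads:

\[
\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta}) \;=\; \sum_{k=1}^{n}\nabla_{\boldsymbol{\theta}}\ln p(\mathbf{x}_k\mid\boldsymbol{\theta}) \;=\; \mathbf{0}.
\]

The text also cautions that this is only a necessary condition: a solution may be a global maximum, a local maximum or minimum, or (rarely) an inflection point of l(θ), so candidates must be checked.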

11
  • Example of a specific case: the Gaussian case with unknown μ
  • p(xk | μ) ~ N(μ, Σ)
  • (Samples are drawn from a multivariate normal population)
  • θ = μ, therefore:
  • The ML estimate for μ must satisfy the condition written out below
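The per-sample log-likelihood and its gradient (shown as images on this slide) take the standard Gaussian form; setting the summed gradient to zero gives the condition that the ML estimate must satisfy:

\[
\ln p(\mathbf{x}_k\mid\boldsymbol{\mu}) \;=\; -\tfrac{1}{2}\ln\!\big[(2\pi)^d|\Sigma|\big]
\;-\;\tfrac{1}{2}(\mathbf{x}_k-\boldsymbol{\mu})^t\Sigma^{-1}(\mathbf{x}_k-\boldsymbol{\mu}),
\qquad
\nabla_{\boldsymbol{\mu}}\ln p(\mathbf{x}_k\mid\boldsymbol{\mu}) \;=\; \Sigma^{-1}(\mathbf{x}_k-\boldsymbol{\mu}),
\]
\[
\sum_{k=1}^{n}\Sigma^{-1}(\mathbf{x}_k-\hat{\boldsymbol{\mu}}) \;=\; \mathbf{0}.
\]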

12
  • Multiplying by Σ and rearranging, we obtain the sample mean (written out after this list)
  • The ML estimate for the unknown population mean is just the arithmetic average of the training samples: the sample mean!
  • Conclusion: if p(xk | ωj) (j = 1, 2, ..., c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ1, θ2, ..., θc)^t and perform an optimal classification!
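The result referred to above (its equation image did not survive the transcript) is:

\[
\hat{\boldsymbol{\mu}} \;=\; \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k.
\]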

13
  • ML Estimation
  • Gaussian case with unknown μ and σ: θ = (θ1, θ2) = (μ, σ²)
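For this univariate case the per-sample log-likelihood and its gradient (shown as images on the slide) are, in the standard form from the text:

\[
\ln p(x_k\mid\boldsymbol{\theta}) \;=\; -\tfrac{1}{2}\ln(2\pi\theta_2) \;-\; \frac{(x_k-\theta_1)^2}{2\theta_2},
\qquad
\nabla_{\boldsymbol{\theta}}\ln p(x_k\mid\boldsymbol{\theta}) \;=\;
\begin{pmatrix}
\dfrac{x_k-\theta_1}{\theta_2}\\[2ex]
-\dfrac{1}{2\theta_2}+\dfrac{(x_k-\theta_1)^2}{2\theta_2^2}
\end{pmatrix}.
\]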

14
  • Summation (the summed gradient conditions (1) and (2) are written out below)
  • Combining (1) and (2), one obtains the ML estimates of the mean and variance
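The equations on this slide were images; the standard forms from the text are the two summed conditions obtained by setting each component of the gradient sum to zero,

\[
\sum_{k=1}^{n}\frac{x_k-\hat{\theta}_1}{\hat{\theta}_2} \;=\; 0 \quad (1),
\qquad
-\sum_{k=1}^{n}\frac{1}{\hat{\theta}_2} \;+\; \sum_{k=1}^{n}\frac{(x_k-\hat{\theta}_1)^2}{\hat{\theta}_2^{\,2}} \;=\; 0 \quad (2),
\]

and, combining (1) and (2), the ML estimates

\[
\hat{\mu} \;=\; \frac{1}{n}\sum_{k=1}^{n}x_k,
\qquad
\hat{\sigma}^2 \;=\; \frac{1}{n}\sum_{k=1}^{n}(x_k-\hat{\mu})^2.
\]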

15
  • Bias
  • The ML estimate for σ² is biased (the expected value, over all data sets of size n, of the sample variance is not equal to the true variance)
  • An elementary unbiased estimator for Σ is the sample covariance matrix, written out below
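The two expressions referred to above (their images did not survive the transcript) are the expectation of the ML variance estimate and the unbiased sample covariance matrix:

\[
\mathcal{E}\Big[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] \;=\; \frac{n-1}{n}\,\sigma^2 \;\neq\; \sigma^2,
\qquad
C \;=\; \frac{1}{n-1}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol{\mu}})(\mathbf{x}_k-\hat{\boldsymbol{\mu}})^t.
\]

A minimal numerical check of the bias (a sketch, not part of the original slides; it assumes NumPy is available, and the sample size, variance, and trial count are arbitrary example values):

import numpy as np

# Draw many small samples from a Gaussian with known variance and compare the
# average ML variance estimate (divide by n) with the unbiased one (divide by n-1).
rng = np.random.default_rng(0)
sigma2_true = 4.0          # true variance of the generating distribution (example value)
n, trials = 10, 100_000

samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, n))
ml_var = samples.var(axis=1, ddof=0).mean()        # ML (biased) estimate, divides by n
unbiased_var = samples.var(axis=1, ddof=1).mean()  # sample variance, divides by n-1

print(f"true variance     : {sigma2_true}")
print(f"mean ML estimate  : {ml_var:.3f}  (approx. (n-1)/n * sigma^2 = {(n-1)/n*sigma2_true:.3f})")
print(f"mean unbiased est.: {unbiased_var:.3f}")

Running this shows the ML estimate concentrating near (n-1)/n times the true variance, while the n-1 version averages to the true variance.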

16
  • Appendix: ML Problem Statement
  • Let D = {x1, x2, ..., xn}
  • P(x1, ..., xn | θ) = ∏_{k=1}^{n} P(xk | θ); |D| = n
  • Our goal is to determine θ̂, the value of θ that makes this sample the most representative!

17
(Figure: the full training set D of n samples x1, x2, ..., xn, partitioned by class into subsets D1, ..., Dk, ..., Dc; the samples in each subset Dj are drawn from the corresponding class-conditional density P(x | ωj) ~ N(μj, Σj).)
18
  • θ = (θ1, θ2, ..., θc)
  • Problem: find the θ̂ that maximizes P(D | θ), i.e. θ̂ = arg max_θ P(D | θ)
