1
Pattern Classification
All materials in these slides were taken from Pattern Classification
(2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons,
2000, with the permission of the authors and the publisher.

2
Chapter 3: Maximum-Likelihood and Bayesian
Parameter Estimation (Part 1)
  • Introduction
  • Maximum-Likelihood Estimation
  • Example of a Specific Case
  • The Gaussian case: unknown μ and σ
  • Bias
  • Appendix: ML Problem Statement

3
  • Introduction
  • Data availability in a Bayesian framework
  • We could design an optimal classifier if we knew:
  • P(ωi) (priors)
  • P(x | ωi) (class-conditional densities)
  • Unfortunately, we rarely have this complete
    information!
  • Design a classifier from a training sample:
  • No problem with prior estimation
  • Samples are often too small for class-conditional
    estimation (large dimension of the feature space!)

4
  • A priori information about the problem
  • Do we know something about the distribution?
  • → find parameters to characterize the
    distribution
  • Example: normality of P(x | ωi)
  • P(x | ωi) ~ N(μi, Σi)
  • Characterized by 2 parameters
  • Estimation techniques:
  • Maximum-Likelihood (ML) and Bayesian
    estimation
  • Results are nearly identical, but the approaches
    are different

5
  • Parameters in ML estimation are fixed but
    unknown!
  • The best parameters are obtained by maximizing the
    probability of obtaining the samples observed
  • Bayesian methods view the parameters as random
    variables having some known distribution
  • In either approach, we use P(ωi | x) for our
    classification rule! (see the sketch below)

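The sketch below is not from the slides; it is a minimal NumPy illustration, with made-up priors, means, and variances for a two-class, one-dimensional problem, of how estimated P(ωi) and P(x | ωi) combine through Bayes' rule into the posteriors P(ωi | x) used for classification.

```python
import numpy as np

# Hypothetical two-class, 1-D setup: estimated priors and Gaussian
# class-conditional densities p(x | w_i) ~ N(mu_i, sigma_i^2).
priors = np.array([0.6, 0.4])   # P(w_1), P(w_2)
means  = np.array([0.0, 2.0])   # mu_1, mu_2
stds   = np.array([1.0, 1.5])   # sigma_1, sigma_2

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posteriors(x):
    """P(w_i | x) = p(x | w_i) P(w_i) / p(x), via Bayes' rule."""
    joint = gauss_pdf(x, means, stds) * priors
    return joint / joint.sum()

post = posteriors(1.2)
print("posteriors:", post, "-> decide w%d" % (post.argmax() + 1))
```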
6
  • Maximum-Likelihood Estimation
  • Has good convergence properties as the sample
    size increases
  • Simpler than alternative techniques
  • General principle:
  • Assume we have c classes and
  • P(x | ωj) ~ N(μj, Σj)
  • P(x | ωj) ≡ P(x | ωj, θj), where θj = (μj, Σj)
    consists of the components of μj and Σj

7
  • Use the information provided by the training
    samples to estimate
  • θ = (θ1, θ2, ..., θc), where each θi (i = 1, 2, ..., c) is
    associated with one category
  • Suppose that D contains n samples, x1, x2, ..., xn
  • The ML estimate of θ is, by definition, the value θ̂
    that maximizes P(D | θ)
  • It is the value of θ that best agrees with the
    actually observed training samples

8
9
  • Optimal estimation
  • Let θ = (θ1, θ2, ..., θp)^t and let ∇θ be the
    gradient operator: ∇θ = (∂/∂θ1, ∂/∂θ2, ..., ∂/∂θp)^t
  • We define l(θ) as the log-likelihood function:
  • l(θ) = ln P(D | θ)
  • (recall D is the training data)
  • New problem statement:
  • determine the θ̂ that maximizes the log-likelihood

10
  • The definition of l(θ) is
    l(θ) = ln P(D | θ) = Σk=1..n ln P(xk | θ)
  • and the ML estimate is θ̂ = arg maxθ l(θ)
  • The set of necessary conditions for an optimum is
  • ∇θ l = Σk=1..n ∇θ ln P(xk | θ) = 0  (eq. 7)
    (a numerical sketch follows below)

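As a rough numerical check of this principle (not part of the original slides), the sketch below generates a hypothetical 1-D Gaussian sample with known σ, evaluates l(θ) = Σk ln P(xk | θ) over a grid of candidate θ values, and confirms that the maximizer agrees with the closed-form solution of ∇θ l = 0, the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=200)   # hypothetical samples from N(3, 1)
sigma = 1.0                                    # variance assumed known

def log_likelihood(theta):
    """l(theta) = sum_k ln p(x_k | theta) for a 1-D Gaussian with known sigma."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - theta) ** 2 / (2 * sigma**2))

# Grid search is purely illustrative; the condition grad_theta l = 0
# yields the maximizer in closed form (the sample mean, see the next slides).
grid = np.linspace(0.0, 6.0, 1001)
theta_hat = grid[np.argmax([log_likelihood(t) for t in grid])]
print(theta_hat, x.mean())   # the two agree up to the grid resolution
```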
11
  • Example: the Gaussian case, unknown μ
  • We assume we know the covariance Σ
  • p(xk | μ) ~ N(μ, Σ)
  • (samples are drawn from a multivariate normal
    population)
  • θ = μ, therefore ∇μ ln p(xk | μ) = Σ⁻¹(xk − μ)
  • The ML estimate for μ must
    satisfy: Σk=1..n Σ⁻¹(xk − μ̂) = 0

12
  • Multiplying by Σ and rearranging, we obtain
    μ̂ = (1/n) Σk=1..n xk
  • Just the arithmetic average of the training
    samples! (a numerical sketch follows below)
  • Conclusion:
  • If P(xk | ωj) (j = 1, 2, ..., c) is assumed to be
    Gaussian in a d-dimensional feature space, then
    we can estimate the vector
  • θ = (θ1, θ2, ..., θc)^t and perform optimal
    classification!

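A minimal NumPy sketch of this result (the true mean, covariance, and data below are made up for illustration): the ML estimate of μ with a known covariance is just the sample average, and it satisfies the necessary condition Σk Σ⁻¹(xk − μ̂) = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])                          # covariance assumed known
X = rng.multivariate_normal(mu_true, cov, size=500)   # rows are samples x_k

mu_hat = X.mean(axis=0)                               # ML estimate: arithmetic average

# Necessary condition: sum_k Sigma^{-1} (x_k - mu_hat) should be (numerically) zero.
condition = np.linalg.inv(cov) @ (X - mu_hat).sum(axis=0)
print("mu_hat =", mu_hat)
print("gradient condition:", condition)
```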
13
  • Example: Gaussian case, unknown μ and σ
  • First consider the univariate case, unknown μ and
    σ²: θ = (θ1, θ2) = (μ, σ²)
  • The per-sample log-likelihood and its gradient are
    ln p(xk | θ) = −½ ln(2πθ2) − (xk − θ1)²/(2θ2)
    ∇θ ln p(xk | θ) = ( (xk − θ1)/θ2 , −1/(2θ2) + (xk − θ1)²/(2θ2²) )^t

14
  • Summation over the training set gives the conditions
    (1) Σk=1..n (xk − θ̂1)/θ̂2 = 0
    (2) −Σk=1..n 1/θ̂2 + Σk=1..n (xk − θ̂1)²/θ̂2² = 0
  • Combining (1) and (2), one obtains
    μ̂ = (1/n) Σk=1..n xk   and   σ̂² = (1/n) Σk=1..n (xk − μ̂)²
    (a numerical sketch follows below)

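The sketch below (hypothetical data, not from the slides) computes the two univariate ML estimates directly and notes that NumPy's np.var uses the same 1/n normalization by default.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical training samples

mu_hat = x.mean()                          # (1/n) sum_k x_k
sigma2_hat = np.mean((x - mu_hat) ** 2)    # (1/n) sum_k (x_k - mu_hat)^2

print(mu_hat, sigma2_hat)                  # close to the true values 5.0 and 4.0
print(np.isclose(sigma2_hat, np.var(x)))   # np.var defaults to the 1/n (ML) form
```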
15
  • The ML estimates for the multivariate case are
    similar
  • The scalars xk and μ̂ are replaced by vectors, and
    the variance σ̂² is replaced by the covariance
    matrix Σ̂:
    μ̂ = (1/n) Σk=1..n xk
    Σ̂ = (1/n) Σk=1..n (xk − μ̂)(xk − μ̂)^t
    (see the sketch below)

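A short sketch of the multivariate estimates on assumed synthetic data: the mean becomes a vector average and the covariance estimate is the 1/n-normalized sum of outer products.

```python
import numpy as np

rng = np.random.default_rng(3)
mu_true = np.array([0.0, 1.0, -1.0])
cov_true = np.diag([1.0, 2.0, 0.5])
X = rng.multivariate_normal(mu_true, cov_true, size=2000)   # n x d sample matrix

mu_hat = X.mean(axis=0)                # vector sample mean
D = X - mu_hat
cov_hat = (D.T @ D) / X.shape[0]       # (1/n) sum_k (x_k - mu_hat)(x_k - mu_hat)^t

print(mu_hat)
print(cov_hat)                         # close to cov_true, with the biased 1/n scaling
```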
16
  • Bias
  • The ML estimate for σ² is biased:
    E[σ̂²] = ((n − 1)/n) σ² ≠ σ²
  • Extreme case: n = 1 gives E[σ̂²] = 0 ≠ σ²
  • As n increases, the bias is reduced
  • → this type of estimator is called asymptotically
    unbiased (a small simulation follows below)

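The small simulation below (hypothetical, with a made-up true variance of 4) approximates E[σ̂²] by averaging the ML variance estimate over many repeated samples of each size n; the result tracks ((n − 1)/n) σ², including the degenerate n = 1 case, and the bias shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2_true = 4.0

for n in (1, 2, 5, 20, 100):
    # Average the ML estimate sigma2_hat = (1/n) sum (x_k - mean)^2 over many
    # repeated samples of size n to approximate its expectation.
    estimates = [np.var(rng.normal(0.0, np.sqrt(sigma2_true), size=n))
                 for _ in range(20000)]
    print(n, np.mean(estimates), (n - 1) / n * sigma2_true)
```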
17
  • An elementary unbiased estimator for Σ is the
    sample covariance matrix
    C = (1/(n − 1)) Σk=1..n (xk − μ̂)(xk − μ̂)^t
  • This estimator is unbiased for all distributions
  • → such estimators are called absolutely unbiased

18
  • Our earlier estimator for Σ,
    Σ̂ = (1/n) Σk=1..n (xk − μ̂)(xk − μ̂)^t, is biased
  • In fact it is asymptotically unbiased
  • Observe that Σ̂ = ((n − 1)/n) C, so the two estimators
    are essentially identical when n is large
    (a comparison sketch follows below)

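The sketch below (synthetic data, small n) computes both estimators and checks that they differ exactly by the factor (n − 1)/n; NumPy's np.cov returns the unbiased version C by default.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.multivariate_normal(np.zeros(2), np.eye(2), size=30)   # small sample, n = 30
n = X.shape[0]

D = X - X.mean(axis=0)
cov_ml  = (D.T @ D) / n          # biased ML estimate (1/n)
cov_unb = (D.T @ D) / (n - 1)    # unbiased sample covariance C (1/(n-1))

print(np.allclose(cov_ml, (n - 1) / n * cov_unb))      # differ by the factor (n-1)/n
print(np.allclose(cov_unb, np.cov(X, rowvar=False)))   # np.cov uses the unbiased form
```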
19
  • Appendix: ML Problem Statement
  • Let D = {x1, x2, ..., xn} with |D| = n
  • P(x1, ..., xn | θ) = Πk=1..n P(xk | θ)
  • Our goal is to determine θ̂, the value of θ that
    maximizes the likelihood of this sample set!
    (a small sketch follows below)

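As a small illustration of the problem statement (hypothetical 1-D Gaussian data), the sketch below evaluates P(D | θ) as the product of per-sample densities, together with its logarithm; in practice the product underflows for large n, which is why the log-likelihood is maximized instead.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=2.0, scale=1.0, size=50)     # the sample set D, |D| = n = 50

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

theta = 2.0
likelihood = np.prod(gauss_pdf(x, theta, 1.0))             # P(D | theta) = prod_k P(x_k | theta)
log_likelihood = np.sum(np.log(gauss_pdf(x, theta, 1.0)))  # ln P(D | theta)

print(likelihood, log_likelihood)
```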
20
(Figure: the training set D of n samples x1, ..., xn is partitioned into
class subsets D1, ..., Dk, ..., Dc; each subset Dj contains the samples
drawn from the class-conditional density P(x | ωj) ~ N(μj, Σj).)
21
  • θ = (θ1, θ2, ..., θc)
  • Problem: find θ̂ such that
    P(D | θ̂) = maxθ Πk=1..n P(xk | θ)