1
Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

2
Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation (Part 1)
  • Introduction
  • Maximum-Likelihood Estimation
  • Example of a Specific Case
  • The Gaussian Case: unknown μ and σ
  • Bias
  • Appendix: ML Problem Statement

3
  • 3.1 Introduction
  • Data availability in a Bayesian framework
  • We could design an optimal classifier if we knew
  • P(ωi) (the prior probabilities)
  • p(x | ωi) (the class-conditional densities)
  • Unfortunately, we rarely have this complete information!
  • Typically we merely have some general knowledge, together with some design samples or training data
  • Approach to this problem: design a classifier from the training samples
  • Estimating the priors presents no serious difficulty
  • Samples are often too small for class-conditional estimation (large dimension of feature space!)

4
  • A priori information about the problem
  • Normality of p(x | ωi): p(x | ωi) ~ N(μi, Σi) (the density is written out below)
  • Characterized by two parameters: μi and Σi
  • This simplifies the problem from estimating an unknown function p(x | ωi) to estimating the two parameters μi and Σi
  • Estimation techniques
  • Maximum-Likelihood (ML) estimation and Bayesian estimation
  • The results are nearly identical, but the approaches are different
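For reference, the parametric form assumed here (not written out on the slide, but standard for the Gaussian model of the text) is the multivariate normal density, which is completely determined by μi and Σi:

\[
p(\mathbf{x}\mid\omega_i) \;=\; \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}}
\exp\!\Big[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^t\,\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\Big],
\]

where d is the dimension of the feature space.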

5
  • In ML estimation the parameters are viewed as fixed but unknown!
  • The best parameters are obtained by maximizing the probability of obtaining the samples actually observed
  • Bayesian methods view the parameters as random variables having some known prior distribution
  • Observation of the samples converts this into a posterior density function
  • Observing additional samples sharpens the a posteriori density, causing it to peak near the true values of the parameters (Bayesian learning)
  • In either approach, we use P(ωi | x) for our classification rule!

6
  • 3.2 Maximum-Likelihood Estimation
  • Has good convergence properties as the sample size increases
  • Simpler than alternative techniques
  • 3.2.1 General principle
  • Separate a collection of i.i.d. samples according to class, so that we have c data sets D1, ..., Dc, with the samples in Dj drawn independently according to p(x | ωj)
  • p(x | ωj) ~ N(μj, Σj) has a known parametric form
  • We therefore write p(x | ωj) as p(x | ωj, θj) to make its dependence on the parameter vector θj explicit; in the Gaussian case θj consists of the components of μj and Σj

7
  • Use the information provided by the training samples to estimate the parameter vector θ = (θ1, θ2, ..., θc), where each θi (i = 1, 2, ..., c) is associated with one category
  • Suppose that D contains n samples, x1, x2, ..., xn; because the samples were drawn independently, the likelihood factorizes (see below)
  • The ML estimate of θ is, by definition, the value that maximizes p(D | θ)
  • It is the value of θ that best agrees with the actually observed training samples
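The equation that followed these bullets appeared as an image and did not survive the transcript; the standard form from the text is the i.i.d. factorization of the likelihood together with the definition of the ML estimate:

\[
p(D\mid\boldsymbol{\theta}) \;=\; \prod_{k=1}^{n} p(\mathbf{x}_k\mid\boldsymbol{\theta}),
\qquad
\hat{\boldsymbol{\theta}} \;=\; \arg\max_{\boldsymbol{\theta}}\, p(D\mid\boldsymbol{\theta}).
\]

Viewed as a function of θ, p(D | θ) is called the likelihood of θ with respect to the set of samples.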

8
9
  • Optimal estimation
  • The number of parameters to be estimated is p
  • Let θ = (θ1, θ2, ..., θp)^t and let ∇θ be the gradient operator (written out below)
  • We define l(θ) as the log-likelihood function: l(θ) = ln p(D | θ)
  • New problem statement: determine the θ that maximizes the log-likelihood (see below)
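The slide's equations were images; the standard definitions they correspond to in the text are the gradient operator, the log-likelihood of the i.i.d. sample, and the restated problem:

\[
\nabla_{\boldsymbol{\theta}} \;=\; \Big(\tfrac{\partial}{\partial\theta_1},\,\ldots,\,\tfrac{\partial}{\partial\theta_p}\Big)^t,
\qquad
l(\boldsymbol{\theta}) \;=\; \ln p(D\mid\boldsymbol{\theta}) \;=\; \sum_{k=1}^{n}\ln p(\mathbf{x}_k\mid\boldsymbol{\theta}),
\qquad
\hat{\boldsymbol{\theta}} \;=\; \arg\max_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta}).
\]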

10
  • The set of necessary conditions for an optimum is
  • ∇θ l(θ) = 0
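Written out per sample (the form shown as an image on the original slide), this condition reads:

\[
\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta}) \;=\; \sum_{k=1}^{n}\nabla_{\boldsymbol{\theta}}\ln p(\mathbf{x}_k\mid\boldsymbol{\theta}) \;=\; \mathbf{0}.
\]

The text also cautions that this is only a necessary condition: a solution may be a global maximum, a local maximum or minimum, or (rarely) an inflection point of l(θ), so candidates must be checked.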

11
  • Example of a specific case: the Gaussian case with unknown μ
  • p(xk | μ) ~ N(μ, Σ)
  • (Samples are drawn from a multivariate normal population)
  • θ = μ, therefore:
  • The ML estimate for μ must satisfy the condition written out below
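The per-sample log-likelihood and its gradient (shown as images on this slide) take the standard Gaussian form; setting the summed gradient to zero gives the condition that the ML estimate must satisfy:

\[
\ln p(\mathbf{x}_k\mid\boldsymbol{\mu}) \;=\; -\tfrac{1}{2}\ln\!\big[(2\pi)^d|\Sigma|\big]
\;-\;\tfrac{1}{2}(\mathbf{x}_k-\boldsymbol{\mu})^t\Sigma^{-1}(\mathbf{x}_k-\boldsymbol{\mu}),
\qquad
\nabla_{\boldsymbol{\mu}}\ln p(\mathbf{x}_k\mid\boldsymbol{\mu}) \;=\; \Sigma^{-1}(\mathbf{x}_k-\boldsymbol{\mu}),
\]
\[
\sum_{k=1}^{n}\Sigma^{-1}(\mathbf{x}_k-\hat{\boldsymbol{\mu}}) \;=\; \mathbf{0}.
\]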

12
  • Multiplying by Σ and rearranging, we obtain the sample mean (written out after this list)
  • The ML estimate for the unknown population mean is just the arithmetic average of the training samples: the sample mean!
  • Conclusion: if p(xk | ωj) (j = 1, 2, ..., c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ1, θ2, ..., θc)^t and perform an optimal classification!
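The result referred to above (its equation image did not survive the transcript) is:

\[
\hat{\boldsymbol{\mu}} \;=\; \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k.
\]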

13
  • ML Estimation
  • Gaussian case with unknown μ and σ: θ = (θ1, θ2) = (μ, σ²)
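For this univariate case the per-sample log-likelihood and its gradient (shown as images on the slide) are, in the standard form from the text:

\[
\ln p(x_k\mid\boldsymbol{\theta}) \;=\; -\tfrac{1}{2}\ln(2\pi\theta_2) \;-\; \frac{(x_k-\theta_1)^2}{2\theta_2},
\qquad
\nabla_{\boldsymbol{\theta}}\ln p(x_k\mid\boldsymbol{\theta}) \;=\;
\begin{pmatrix}
\dfrac{x_k-\theta_1}{\theta_2}\\[2ex]
-\dfrac{1}{2\theta_2}+\dfrac{(x_k-\theta_1)^2}{2\theta_2^2}
\end{pmatrix}.
\]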

14
  • Summation (the summed gradient conditions (1) and (2) are written out below)
  • Combining (1) and (2), one obtains the ML estimates of the mean and variance
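The equations on this slide were images; the standard forms from the text are the two summed conditions obtained by setting each component of the gradient sum to zero,

\[
\sum_{k=1}^{n}\frac{x_k-\hat{\theta}_1}{\hat{\theta}_2} \;=\; 0 \quad (1),
\qquad
-\sum_{k=1}^{n}\frac{1}{\hat{\theta}_2} \;+\; \sum_{k=1}^{n}\frac{(x_k-\hat{\theta}_1)^2}{\hat{\theta}_2^{\,2}} \;=\; 0 \quad (2),
\]

and, combining (1) and (2), the ML estimates

\[
\hat{\mu} \;=\; \frac{1}{n}\sum_{k=1}^{n}x_k,
\qquad
\hat{\sigma}^2 \;=\; \frac{1}{n}\sum_{k=1}^{n}(x_k-\hat{\mu})^2.
\]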

15
  • Bias
  • The ML estimate for σ² is biased (the expected value, over all data sets of size n, of the sample variance is not equal to the true variance)
  • An elementary unbiased estimator for Σ is the sample covariance matrix, written out below
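The two expressions referred to above (their images did not survive the transcript) are the expectation of the ML variance estimate and the unbiased sample covariance matrix:

\[
\mathcal{E}\Big[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] \;=\; \frac{n-1}{n}\,\sigma^2 \;\neq\; \sigma^2,
\qquad
C \;=\; \frac{1}{n-1}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol{\mu}})(\mathbf{x}_k-\hat{\boldsymbol{\mu}})^t.
\]

A minimal numerical check of the bias (a sketch, not part of the original slides; it assumes NumPy is available, and the sample size, variance, and trial count are arbitrary example values):

import numpy as np

# Draw many small samples from a Gaussian with known variance and compare the
# average ML variance estimate (divide by n) with the unbiased one (divide by n-1).
rng = np.random.default_rng(0)
sigma2_true = 4.0          # true variance of the generating distribution (example value)
n, trials = 10, 100_000

samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, n))
ml_var = samples.var(axis=1, ddof=0).mean()        # ML (biased) estimate, divides by n
unbiased_var = samples.var(axis=1, ddof=1).mean()  # sample variance, divides by n-1

print(f"true variance     : {sigma2_true}")
print(f"mean ML estimate  : {ml_var:.3f}  (approx. (n-1)/n * sigma^2 = {(n-1)/n*sigma2_true:.3f})")
print(f"mean unbiased est.: {unbiased_var:.3f}")

Running this shows the ML estimate concentrating near (n-1)/n times the true variance, while the n-1 version averages to the true variance.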

16
  • Appendix: ML Problem Statement
  • Let D = {x1, x2, ..., xn}
  • P(x1, ..., xn | θ) = ∏_{k=1}^{n} P(xk | θ); |D| = n
  • Our goal is to determine θ̂, the value of θ that makes this sample the most representative!

17
(Figure: the full training set D of n samples x1, x2, ..., xn, partitioned by class into subsets D1, ..., Dk, ..., Dc; the samples in each subset Dj are drawn from the corresponding class-conditional density P(x | ωj) ~ N(μj, Σj).)
18
  • θ = (θ1, θ2, ..., θc)
  • Problem: find the θ̂ that maximizes P(D | θ), i.e. θ̂ = arg max_θ P(D | θ)
