Title: Ch. 3. Maximum-Likelihood and Bayesian Parameter Estimation
Ch. 3. Maximum-Likelihood and Bayesian Parameter Estimation
Introduction
- If we knew the prior probabilities P(ωi) and the class-conditional densities p(x|ωi), we could design an optimal classifier.
- Unfortunately, we rarely have this kind of complete knowledge about the probabilistic structure of the problem.
- In a typical case, we merely have some vague, general knowledge about the situation, together with a number of design samples (training data) that are particular representatives of the patterns we want to classify.
Introduction
- Problem: find some way to use this information to design or train the classifier.
- An approach to this problem is to use the samples to estimate the unknown probabilities and probability densities,
- and to use the resulting estimates as if they were the true values.
Introduction: Maximum-Likelihood Parameter Estimation
- The ML approach assumes the parameters are fixed but unknown.
- The ML approach estimates the parameter values that maximize the probability of obtaining the (given) training set.
- In other words, the ML approach seeks the parameter estimates that maximize the likelihood function.
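Written compactly (a standard statement of the ML principle; D denotes the training set and θ the parameter vector):

\[
\hat{\theta} = \arg\max_{\theta} \, p(D \mid \theta)
\]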
Introduction: Bayesian Estimation
- The Bayesian approach models the parameters to be estimated as random variables with some (assumed) known a priori distribution.
- The Bayesian approach uses the training set to update the training-set-conditioned density function of the unknown parameters.
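A compact statement of this update (a standard form of Bayes' rule for the parameter posterior, with D the training set, θ the parameters, and p(θ) the assumed prior):

\[
p(\theta \mid D) \;\propto\; p(D \mid \theta)\, p(\theta)
\]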
Maximum-Likelihood Estimation
Formulation of ML Estimation
- ML estimation assumes the parameters to be estimated are unknown but constant.
- The ML formulation assumes:
- We have a training set D in the form of c subsets of samples (feature vectors) D1, D2, ..., Dc.
- Samples in Di are assumed to be generated by the underlying density function for class i, p(x|ωi); i.e., the parametric form of p(x|ωi) is assumed known.
- The parameter vector θi is the set of parameters to be estimated for class i.
- In the Gaussian case, where x ~ N(mi, Ci), the components of θi are the elements of mi and Ci.
Use of the Training Set (ML)
- We consider the training of each class separately.
- Samples in Di give no information about θj, j ≠ i; i.e., it is assumed that the parameters for the different classes are functionally independent.
- We use a set Di of training samples, drawn independently according to the probability density p(x|ωi), to estimate the unknown parameter vector θi.
The Likelihood Function
- Suppose Di = {x1, x2, ..., xn}.
- If the samples xk within Di are assumed independent, the joint parameter-conditional pdf of Di is the product of the individual densities, as shown below.
- p(Di|θi), viewed as a function of θi, is the likelihood function of θi.
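Under the independence assumption, the likelihood takes the familiar product form (a standard statement; n is the number of samples in Di):

\[
p(D_i \mid \theta_i) \;=\; \prod_{k=1}^{n} p(x_k \mid \theta_i)
\]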
Maximum-Likelihood Estimation
- Given Di, the objective of ML estimation is to find the θi that maximizes p(Di|θi), i.e., to find the θi that maximizes the likelihood of θi.
- The goal is to maximize p(Di|θi) with respect to the parameter vector θi.
ML Estimation Example: 1D Gaussian
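A minimal numerical sketch of this kind of example, assuming the samples are drawn from a univariate Gaussian; the sample size and the "true" mean and standard deviation below are illustrative assumptions:

```python
import numpy as np

# Illustrative data: n samples from a 1D Gaussian with assumed true parameters.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# ML estimates for a univariate Gaussian:
# the sample mean, and the biased (1/n) sample variance.
m_hat = x.mean()
var_hat = np.mean((x - m_hat) ** 2)

print(f"m_hat = {m_hat:.3f}, var_hat = {var_hat:.3f}")
```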
ML Estimation
ML Estimation: Log-Likelihood
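It is usually easier to work with the log-likelihood; a standard form of this step (maximize l(θi) and, where it is differentiable, set its gradient to zero):

\[
l(\theta_i) \;=\; \ln p(D_i \mid \theta_i) \;=\; \sum_{k=1}^{n} \ln p(x_k \mid \theta_i),
\qquad
\hat{\theta}_i = \arg\max_{\theta_i} l(\theta_i),
\qquad
\nabla_{\theta_i}\, l(\theta_i) = 0
\]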
ML Estimation Example: Gaussian with Unknown m, Known C
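For this case the ML estimate reduces to the sample mean (a standard result, stated here for reference):

\[
\hat{m}_i \;=\; \frac{1}{n} \sum_{k=1}^{n} x_k
\]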
ML Estimation Example: Gaussian with Unknown m, Unknown C
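For this case the ML estimates are the sample mean and the biased (1/n) sample covariance (again a standard result):

\[
\hat{m}_i = \frac{1}{n} \sum_{k=1}^{n} x_k,
\qquad
\hat{C}_i = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{m}_i)(x_k - \hat{m}_i)^{T}
\]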
ML Estimation Example, 2D
ML Estimation Example, 3D
Maximum A Posteriori (MAP) Estimation
- Posterior density p(θ|D): p(θ|D) ∝ p(D|θ)p(θ) = l(θ)p(θ)
- MAP estimation: find the value of θ that maximizes l(θ)p(θ) = p(D|θ)p(θ).
- The maximum-likelihood estimator is a MAP estimator for a uniform (flat) prior.
- The MAP estimator finds the peak (mode) of the posterior density.
- Generally speaking, information on p(θ) is derived from the designer's knowledge of the problem domain (beyond our study of classifier design).
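Written out (a standard form, using logarithms for convenience):

\[
\hat{\theta}_{\text{MAP}}
\;=\; \arg\max_{\theta}\, p(D \mid \theta)\, p(\theta)
\;=\; \arg\max_{\theta}\, \bigl[\ln p(D \mid \theta) + \ln p(\theta)\bigr]
\]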
Bayesian Parameter Estimation
- Although the desired probability density p(x) is unknown, we assume that it has a known parametric form:
- the function p(x|θ) is completely known;
- the only thing assumed unknown is the value of the parameter vector θ.
- Any prior information about θ is assumed to be contained in a density p(θ).
- Observation of the samples D converts this density p(θ) into a posterior density p(θ|D), which is sharply peaked about the true value of θ.
Class-Conditional Densities
Basic Assumptions of Bayesian Parameter Estimation
- The basic assumptions are summarized as follows:
- The form of the density p(x|θ) is assumed to be known, but the value of the parameter vector θ is not known exactly.
- The initial knowledge about θ is assumed to be contained in a known a priori density p(θ).
- The rest of our knowledge about θ is contained in a set D of n samples x1, ..., xn drawn independently according to the unknown probability density p(x).
The Parameter Distribution
- The central problem is to compute the posterior density p(θ|D), because from it we can calculate p(x|D), as shown below.
- If p(θ|D) is sharply peaked about some value θ0, then we obtain p(x|D) ≈ p(x|θ0).
- If we are less certain about the exact value of θ (the general case), the equation below averages p(x|θ) over the possible values of θ.
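A standard form of this calculation (integrating the known parametric density against the parameter posterior):

\[
p(x \mid D) \;=\; \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta
\]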
Example: Gaussian Density with Unknown Mean Vector
- Problem: calculate the posterior density p(m|D) and the desired pdf p(x|D) for p(x|m) ~ N(m, C), with C assumed known.
Estimation of p(m|D)
- As the training sample size increases, p(m|D) becomes more sharply peaked (see the expressions below).
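For the univariate case with known variance σ² and a Gaussian prior p(m) ~ N(m0, σ0²), the posterior is itself Gaussian (a standard result; x̄n denotes the sample mean of the n training samples):

\[
p(m \mid D) \sim N(m_n, \sigma_n^2),
\qquad
m_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\bar{x}_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,m_0,
\qquad
\sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}
\]

Since σn² shrinks toward zero as n grows, the posterior indeed becomes more sharply peaked.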
The Univariate Gaussian Case: p(x|D)
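Carrying out the integral p(x|D) = ∫ p(x|m) p(m|D) dm for this case gives another Gaussian (a standard result):

\[
p(x \mid D) \sim N\bigl(m_n,\; \sigma^2 + \sigma_n^2\bigr)
\]

so the remaining uncertainty about the mean simply adds σn² to the variance of the predictive density.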
Bayesian Parameter Estimation: General Theory
- The basic problem is to compute the posterior density p(θ|D), because from this we can calculate p(x|D).
- By the independence assumption, the likelihood p(D|θ) factors over the individual samples, as shown below.
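The standard set of relations for this general case, stated here for reference:

\[
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta},
\qquad
p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta),
\qquad
p(x \mid D) = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta
\]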
Questions That Remain
- How difficult is it to carry out these computations?
- Does p(x|D) converge to p(x)?
When ML and Bayesian Methods Differ
- ML is computationally easier, since it requires only differential calculus techniques or a gradient search for θ (cf. the complex multidimensional integration needed for Bayesian estimation).
- The Bayesian method is more accurate, since it uses all the available information, including the prior p(θ).
Bayesian Parameter Estimation
- The probabilities P(ωi|x), i = 1, 2, ..., c, are needed for classification.
- The objective is to form the posterior probabilities P(ωi|x, Di) for the given training sets Di.
- An application of Bayes' rule yields the expression below.
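A standard form of this Bayes-rule step for the class posteriors (each class ωi with its own training set Di):

\[
P(\omega_i \mid x, D_i)
\;=\;
\frac{p(x \mid \omega_i, D_i)\, P(\omega_i \mid D_i)}
     {\sum_{j=1}^{c} p(x \mid \omega_j, D_j)\, P(\omega_j \mid D_j)}
\]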
Estimation of P(ωi|x, Di)
- Estimating the posterior probabilities P(ωi|x, Di) requires computing the a priori probability P(ωi|Di) and the density function p(x|ωi, Di).
- Assumptions for simplification:
- 1. The probability P(ωi|Di) is independent of the training set Di, i.e., P(ωi|Di) = P(ωi).
- 2. The a priori probabilities P(ωi), i = 1, 2, ..., c, are known.
- 3. A training set Di carries information only about the parameters of class ωi.
- 4. The functional form of p(x|Di) is known.
Estimation of P(ωi|x, Di)
- Then P(ωi|x, Di) can be computed from p(x|Di), as shown below.
- Therefore, the problem is to estimate a random vector of parameters θi for the density p(x|Di).
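With the assumptions above, the Bayes-rule expression simplifies so that only the known class priors P(ωi) and the per-class density estimates appear (a standard simplification):

\[
P(\omega_i \mid x, D_i)
\;=\;
\frac{p(x \mid \omega_i, D_i)\, P(\omega_i)}
     {\sum_{j=1}^{c} p(x \mid \omega_j, D_j)\, P(\omega_j)}
\]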
Estimation Equation
- The class-conditional density p(x|ωi, Di) is obtained from the known parametric form p(x|θi) and the training set Di, as sketched below.
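A standard form of this estimation equation under the assumptions above (treating each class separately, so the class label can be suppressed inside the integral):

\[
p(x \mid \omega_i, D_i) \;=\; \int p(x \mid \theta_i)\, p(\theta_i \mid D_i)\, d\theta_i
\]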