Title: Theory & Applications of Pattern Recognition
Dept. of Electrical and Computer Engineering, 0909.402.02 / 0909.504.04
Lecture 2

Probability Theory
- Random variables
- Probability mass function
- Joint probability
- Expected values
- Mean / variance / covariance
- Statistical independence
- Correlation
- Conditional probability
- Bayes rule
- Vector random variables
- Normal distribution
- Central limit theorem
- Gaussian derivatives
- Multivariate densities
Probability Theory
- Discrete random variables
- Probability mass function
- Cumulative mass function
- Expected value (average)
- Variance and standard deviation
- Pairs of random variables
- Joint probability
- Joint distribution
- Statistical independence
- Expectation for two variables
- Covariance / covariance matrix
- Correlation / correlation coefficient
The Law of Total Probability
- Conditional Probability
- Bayes Rule
- Prior probability
- Likelihood
- Evidence
- Posterior probability
Vector Random Variables / Continuous Random Variables
- Joint distribution for vector random variables
- Bayes rule for vector r.v.
- Expectation, mean vector, covariance matrix for vector r.v.
- Probability density function (pdf)
- Cumulative distribution function (cdf): the probability that the random variable X takes a value no greater than x, F_X(x) = P(X ≤ x)
- Distribution for a sum of random variables
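In symbols, the cumulative distribution function referred to above, and the density of a sum of two independent random variables (the convolution result the last bullet points to), are:

    F_X(x) = P(X \le x) = \int_{-\infty}^{x} p_X(t)\,dt,
    \qquad
    Z = X + Y \ (X, Y \text{ independent}) \;\Rightarrow\;
    p_Z(z) = \int_{-\infty}^{\infty} p_X(t)\, p_Y(z - t)\,dt .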
Gaussian Distribution
- Central Limit Theorem
- Gaussian (Normal) Distribution
- Mean and variance
- Standardizing Gaussian dist.
- Gaussian derivatives and integrals
- Error function
- Using Gaussian Distribution
- Using tables of Gaussian distribution
- Using MATLAB
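As a quick illustration of standardizing and evaluating a Gaussian probability, here is a minimal Python/SciPy sketch standing in for the table or MATLAB lookup mentioned above (the mean, standard deviation, and interval are made-up example values):

    import numpy as np
    from scipy.stats import norm
    from scipy.special import erf

    mu, sigma = 5.0, 2.0          # example parameters (assumed values)
    a, b = 4.0, 7.0               # want P(a < X < b) for X ~ N(mu, sigma^2)

    # Standardize: Z = (X - mu) / sigma ~ N(0, 1)
    za, zb = (a - mu) / sigma, (b - mu) / sigma

    # Using the standard normal cdf (what a z-table provides)
    p_table = norm.cdf(zb) - norm.cdf(za)

    # Equivalent expression through the error function:
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_erf = 0.5 * (erf(zb / np.sqrt(2)) - erf(za / np.sqrt(2)))

    print(p_table, p_erf)   # both ~ 0.5328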
Other Important Distributions
- Chi-square
- Poisson
- Binomial
- Beta
- Gamma
- Student's t-distribution
- F-distribution
Multivariate Gaussian Distribution
- Multivariate normal density function
- Mahalanobis distance
- Whitening Transform
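A minimal numerical sketch (Python/NumPy; the mean vector, covariance matrix, and test point are made-up example values) of the Mahalanobis distance and of a whitening transform A_w = Φ Λ^(-1/2) that maps the data to identity covariance:

    import numpy as np

    mu = np.array([1.0, 2.0])                    # example mean (assumed)
    Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # example covariance (assumed)
    x = np.array([2.0, 0.0])                     # example test point

    # Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu)
    d = x - mu
    r2 = d @ np.linalg.inv(Sigma) @ d

    # Whitening transform: columns of Phi are eigenvectors, lam the eigenvalues
    lam, Phi = np.linalg.eigh(Sigma)
    Aw = Phi @ np.diag(lam ** -0.5)

    # A whitened point y = Aw^T (x - mu) has identity covariance; its squared
    # Euclidean norm equals the squared Mahalanobis distance computed above.
    y = Aw.T @ d
    print(r2, y @ y)   # the two numbers agree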
Bayes Decision Theory
- The statistically best classifier
- Based on quantifying the trade-offs between various classification decisions using a probabilistic approach
- The theory assumes:
  - The decision problem can be posed in probabilistic terms
  - All relevant probability values are known (in practice this is not true)
Class Conditional Probabilities
P(x | ω2): the class-conditional probability (likelihood) for salmon. Given that a salmon has been observed, what is the probability that this salmon's lightness is between 11 and 12?
[Figure: class-conditional densities over the lightness feature; ω1 = sea bass, ω2 = salmon.]
Definitions: Bayes Decision Rule
- State of nature
- A priori probability (prior)
- A posteriori probability (posterior)
- Likelihood
- Evidence
Posterior Probabilities
- Bayes rule allows us to compute the posterior probability (difficult to determine directly) from the prior probabilities, the likelihood, and the evidence (all easier to determine).
Posterior probabilities for priors P(ω1) = 2/3 and P(ω2) = 1/3. For example, given that a pattern is measured to have feature value x = 14, the probability it is in category ω2 is roughly 0.08, and that it is in ω1 is roughly 0.92. At every x, the posteriors sum to 1.0.
Bayes Decision Rule
- Choose the class that has the larger posterior probability!

Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i, i, j = 1, 2, …, c
P(error | x) = 1 − max_i P(ωi | x)  (for two classes this is min[P(ω1 | x), P(ω2 | x)])

If there are multiple features, x = [x1, x2, …, xd]^T, the rule is unchanged:
Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i
P(error | x) = 1 − max_i P(ωi | x)
2.2 The Loss Function
- A mathematical description of how costly each action (making a class decision) is. Are certain mistakes more costly than others?

{ω1, ω2, …, ωc}: the set of states of nature (classes)
{α1, α2, …, αa}: the set of possible actions. Note that a need not equal c, because we may allow more (or fewer) actions than there are classes; for example, not making a decision is also an action.
{λ1, λ2, …, λa}: the losses associated with each action
λ(αi | ωj): the loss function, the loss incurred for taking action αi when the true state of nature is in fact ωj
R(αi | x): the conditional risk, the expected loss for taking action αi

The Bayes decision rule takes the action that minimizes the conditional risk!
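A minimal sketch of the conditional-risk computation R(αi | x) = Σ_j λ(αi | ωj) P(ωj | x) (Python/NumPy; the loss matrix and posterior values are made-up examples, and the third "reject" action is added purely for illustration):

    import numpy as np

    # lam[i, j] = loss for taking action alpha_i when the true class is w_j
    lam = np.array([[ 0.0, 10.0],   # action 1: decide w1
                    [ 5.0,  0.0],   # action 2: decide w2
                    [ 1.0,  1.0]])  # action 3: reject (illustrative)

    posteriors = np.array([0.8, 0.2])   # P(w1|x), P(w2|x)  (example values)

    R = lam @ posteriors                # conditional risk R(alpha_i | x) for each action
    best = np.argmin(R)                 # Bayes decision: take the minimum-risk action
    print(R, best)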
Bayes Decision Rule Using Conditional Risk
- Compute the conditional risk R(αi | x) for each possible action
- Select the action that has the minimum conditional risk; let this be action αk
- The overall risk is then R = ∫ R(α(x) | x) p(x) dx
- This is the Bayes risk, the minimum possible risk that can be achieved by any classifier!
2.3 Minimum Error Rate Classification
- If we associate taking action αi with selecting class ωi, and if all errors are equally costly, we obtain the zero-one loss.
- This loss function assigns no loss to a correct classification and unit loss to a misclassification. The risk corresponding to this loss function is then R(αi | x) = 1 − P(ωi | x),
- which is precisely the average probability of error. Clearly, to minimize this risk we choose the class that maximizes the posterior probability, hence the Bayes rule!
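Written out, the zero-one loss and the resulting risk are:

    \lambda(\alpha_i \mid \omega_j) =
    \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases}
    \qquad\Rightarrow\qquad
    R(\alpha_i \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x),

so minimizing the conditional risk is the same as maximizing the posterior P(ω_i | x).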
2.4 Discriminant Based Classification
- A discriminant is a function g_i(x) that discriminates between classes. The function assigns the input vector to a class according to its definition: choose class ωi if g_i(x) > g_j(x) for all j ≠ i.
- Bayes rule can be implemented in terms of discriminant functions, as written out below.

The discriminant functions divide the feature space into c decision regions, R1, …, Rc, separated by decision boundaries. Decision regions need NOT be contiguous. The decision boundary between regions Ri and Rj satisfies g_i(x) = g_j(x).
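For the minimum-error-rate case, any of the following (monotonically equivalent) choices serves as the discriminant:

    g_i(x) = P(\omega_i \mid x)
           = \frac{p(x \mid \omega_i)\,P(\omega_i)}{\sum_{j} p(x \mid \omega_j)\,P(\omega_j)},
    \qquad
    g_i(x) = p(x \mid \omega_i)\,P(\omega_i),
    \qquad
    g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i).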
The Normal Density
Discriminant Functions for the Normal Density
- If the likelihoods (class-conditional densities) are normally distributed, then a number of simplifications can be made. In particular, the discriminant function can be written in a greatly simplified form(!), given below.
- (cf. Section 2.4.1)
- There are three distinct cases that can occur.
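For p(x | ω_i) ~ N(μ_i, Σ_i), the log-form discriminant g_i(x) = ln p(x | ω_i) + ln P(ω_i) becomes:

    g_i(x) = -\tfrac{1}{2}(x - \mu_i)^{T}\Sigma_i^{-1}(x - \mu_i)
             - \tfrac{d}{2}\ln 2\pi - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i).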
Case 1: Σi = σ²I
- Features are statistically independent, and all features have the same variance σ². The distributions are spherical in d dimensions, the boundary is a generalized hyperplane of d−1 dimensions, and the samples form equal-sized hyperspherical clusters. Examples of such hyperspherical clusters are shown in the figure.
Case 1: Σi = σ²I (continued)
- This case results in linear discriminants of the form given below.

Note how the priors shift the decision boundary away from the more likely mean!
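Dropping the terms that are constant across classes when Σi = σ²I, the discriminant reduces to the linear form

    g_i(x) = w_i^{T}x + w_{i0},
    \qquad
    w_i = \frac{\mu_i}{\sigma^{2}},
    \qquad
    w_{i0} = -\frac{\mu_i^{T}\mu_i}{2\sigma^{2}} + \ln P(\omega_i),

and the boundary between ωi and ωj is a hyperplane orthogonal to the line joining the means, passing through

    x_0 = \tfrac{1}{2}(\mu_i + \mu_j)
          - \frac{\sigma^{2}}{\lVert \mu_i - \mu_j \rVert^{2}}
            \ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j);

with equal priors, x_0 lies halfway between the two means.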
Case 2: Σi = Σ (equal covariance matrices)
Covariance matrices are arbitrary but equal to each other for all classes. The samples then form hyperellipsoidal clusters of equal size and shape. This also results in linear discriminant functions whose decision boundaries are again hyperplanes, as written out below.
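With a shared covariance Σ, the discriminant is again linear:

    g_i(x) = w_i^{T}x + w_{i0},
    \qquad
    w_i = \Sigma^{-1}\mu_i,
    \qquad
    w_{i0} = -\tfrac{1}{2}\mu_i^{T}\Sigma^{-1}\mu_i + \ln P(\omega_i);

the hyperplane separating Ri and Rj passes through

    x_0 = \tfrac{1}{2}(\mu_i + \mu_j)
          - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}
                 {(\mu_i - \mu_j)^{T}\Sigma^{-1}(\mu_i - \mu_j)}\,(\mu_i - \mu_j),

but it is generally not orthogonal to the line between the means.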
Case 3: Σi arbitrary
All bets are off! In the two-class case the decision boundaries are hyperquadrics. The discriminant functions are now, in general, quadratic (not linear), as written out below.
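With arbitrary Σi, the quadratic term no longer cancels and the discriminant keeps its full quadratic form:

    g_i(x) = x^{T}W_i\,x + w_i^{T}x + w_{i0},
    \qquad
    W_i = -\tfrac{1}{2}\Sigma_i^{-1},
    \qquad
    w_i = \Sigma_i^{-1}\mu_i,

    w_{i0} = -\tfrac{1}{2}\mu_i^{T}\Sigma_i^{-1}\mu_i
             - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i).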
Case 3: Σi arbitrary (continued)
For the multi-class case the boundaries can look even more complicated.
[Figure: example decision boundaries.]
Case 3: Σi arbitrary (continued)
[Figure: decision boundaries in 3-D.]
Error Probabilities
In a two-class case there are two sources of error: x falls in R1 while the true state of nature is ω2, or vice versa.
[Figure: xB marks the optimal Bayes decision point; a non-optimal decision point is also shown, and the shaded area is P(error).]
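For two classes the total error splits over the two decision regions:

    P(\text{error}) = P(x \in R_2,\, \omega_1) + P(x \in R_1,\, \omega_2)
                    = \int_{R_2} p(x \mid \omega_1)\,P(\omega_1)\,dx
                    + \int_{R_1} p(x \mid \omega_2)\,P(\omega_2)\,dx,

which is minimized by the Bayes decision boundary x_B.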
Bayes Error
- Since the decision rule determines P(error | x) at every x, minimizing P(error | x) for every x minimizes the overall P(error); the resulting minimum is the Bayes error.
- (cf. pp. 54, Eq. (71): P(error) is obtained by averaging P(error | x) over all x.)
Error Bounds
It is difficult, if possible at all, to compute the error probabilities analytically, particularly when the decision regions are not contiguous. However, upper bounds for this error can be obtained: the Chernoff bound and its approximation, the Bhattacharyya bound, are two such bounds that are often used. If the distributions are Gaussian, these expressions are relatively easy to compute, so even non-Gaussian cases are often treated as Gaussian.
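The bounds referred to here have the form (Chernoff bound, tightened by minimizing over β; the Bhattacharyya bound is the β = 1/2 special case):

    P(\text{error}) \le P(\omega_1)^{\beta} P(\omega_2)^{1-\beta}
      \int p(x \mid \omega_1)^{\beta}\, p(x \mid \omega_2)^{1-\beta}\,dx,
      \qquad 0 \le \beta \le 1,

and for Gaussian densities the β = 1/2 exponent is

    k(1/2) = \tfrac{1}{8}(\mu_2 - \mu_1)^{T}
             \Big[\tfrac{\Sigma_1 + \Sigma_2}{2}\Big]^{-1}(\mu_2 - \mu_1)
           + \tfrac{1}{2}\ln\frac{\big|\tfrac{\Sigma_1+\Sigma_2}{2}\big|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}},
    \qquad
    P(\text{error}) \le \sqrt{P(\omega_1)P(\omega_2)}\; e^{-k(1/2)}.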
2.7 Error Probabilities and Integrals
2.8 Error Bounds for Normal Densities
2.8.1 Chernoff Bound
2.8.2 Bhattacharyya Bound
2.8.3 Signal Detection Theory and Operating Characteristics
Receiver operating characteristic (ROC)
Can be used to calculate the Bayes error rate
2.9 Bayes Decision Theory: Discrete Features
Bayes formula
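With discrete-valued features, the integrals over x become sums and the Bayes formula reads:

    P(\omega_j \mid x) = \frac{P(x \mid \omega_j)\,P(\omega_j)}
                              {\sum_{k=1}^{c} P(x \mid \omega_k)\,P(\omega_k)} .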
2.9.1 Independent Binary Features
Relevance of a yes answer for xi in determining
the classification
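In the standard two-class formulation, with p_i = P(x_i = 1 | ω_1) and q_i = P(x_i = 1 | ω_2), the discriminant is linear in the x_i, and the weight w_i is exactly the relevance of a "yes" answer for x_i mentioned above:

    g(x) = \sum_{i=1}^{d} w_i x_i + w_0, \qquad
    w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \qquad
    w_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)},

with the decision: choose ω_1 if g(x) > 0, otherwise ω_2.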
2.10.1 Missing Features
Marginal distribution
→ Integrate the posterior probability over the bad (missing) features, as written out below
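Writing the feature vector as x = (x_g, x_b) with good and bad (missing) parts, the marginalization reads:

    P(\omega_i \mid x_g)
      = \frac{\int P(\omega_i \mid x_g, x_b)\, p(x_g, x_b)\, dx_b}
             {\int p(x_g, x_b)\, dx_b}
      = \frac{\int g_i(x)\, p(x)\, dx_b}{\int p(x)\, dx_b},

where g_i(x) = P(ω_i | x) is the usual discriminant and the integral runs over the missing features only.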
2.10.2 Noisy Features
Assumption: if the true feature value were known, the observed noisy measurement would be independent of the class and of the other (clean) features.
2.11 Bayesian Belief Networks
Useful in the case where we seek to determine the probability of some particular configuration of variables, given the evidence (the values of the other, observed variables).
2.12 Compound Bayesian Decision Theory and Context
Exploit statistical dependence among the patterns to gain improved performance → by using context
- Compound decision problem
- Sequential compound decision problem
2.12 Compound Bayesian Decision Theory and Context (continued)
The posterior probability of the compound state ω = (ω(1), …, ω(n)) given the observation sequence X = (x1, …, xn):
P(ω | X) = p(X | ω) P(ω) / p(X)
→ The optimal procedure is to minimize the compound conditional risk.
If there is no loss for being correct and all errors are equally costly, the procedure is to compute P(ω | X) for all ω and select the ω for which the posterior probability is maximum.
In practice this is an enormous task (there are c^n possible values of ω); the prior P(ω) captures the dependence between the states, i.e. the context.