1
Bayesian Decision Theory
COMPE 467 - Pattern Recognition
2
Bayesian Decision Theory
  • Bayesian Decision Theory is a fundamental
    statistical approach that quantifies the
    trade-offs between various decisions using
    probabilities and the costs that accompany such
    decisions.
  • First, we will assume that all probabilities are
    known.
  • Then, we will study the cases where the
    probabilistic structure is not completely known.

3
Fish Sorting Example
  • State of nature is a random variable.
  • Define ω as the type of fish we observe (state of
    nature, class), where
  • ω = ω1 for sea bass,
  • ω = ω2 for salmon.
  • P(ω1) is the a priori probability that the next
    fish is a sea bass.
  • P(ω2) is the a priori probability that the next
    fish is a salmon.

4
Prior Probabilities
  • Prior probabilities reflect our knowledge of how
    likely each type of fish is to appear before we
    actually see it.
  • How can we choose P(ω1) and P(ω2)?
  • Set P(ω1) = P(ω2) if the classes are equiprobable
    (uniform priors).
  • We may use different values depending on the
    fishing area, the time of year, etc.
  • Assume there are no other types of fish:
  • P(ω1) + P(ω2) = 1

5
Making a Decision
  • How can we make a decision with only the prior
    information?
  • What is the probability of error for this
    decision?
  • P(error) = min{P(ω1), P(ω2)}

6
Making a Decision
  • Decision rule with only the prior information
    (sketched below):
  • Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.
  • Make a further measurement x and compute the
    class-conditional densities.
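A minimal Python sketch of the prior-only rule (not from the slides; the
prior values below are hypothetical):

    # Decide from priors alone: always pick the more probable class.
    priors = {"sea bass": 0.6, "salmon": 0.4}   # hypothetical P(w1), P(w2)

    decision = max(priors, key=priors.get)      # decide w1 if P(w1) > P(w2)
    p_error = min(priors.values())              # P(error) = min{P(w1), P(w2)}
    print(decision, p_error)                    # 'sea bass' 0.4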

7
Class-Conditional Probabilities
  • Let's try to improve the decision using the
    lightness measurement x.
  • Let x be a continuous random variable.
  • Define p(x|ωj) as the class-conditional
    probability density (the probability density of x
    given that the state of nature is ωj, for j = 1, 2).
  • p(x|ω1) and p(x|ω2) describe the difference in
    lightness between the populations of sea bass and
    salmon.

8
Class-Conditional Probabilities
9
Posterior Probabilities
  • Suppose we know P(ωj) and p(x|ωj) for j = 1, 2,
    and measure the lightness of a fish as the value
    x.
  • Define P(ωj|x) as the a posteriori probability
    (the probability of the state of nature being ωj
    given the measurement of feature value x).
  • We can use Bayes' formula to convert the prior
    probability to the posterior probability
    (a small numeric sketch follows below):
  • P(ωj|x) = p(x|ωj) P(ωj) / p(x)
  • where p(x) = Σj p(x|ωj) P(ωj).
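A small numeric sketch of Bayes' formula in Python; the likelihood values
at the measured x and the priors are made-up numbers:

    # posterior_j = p(x|w_j) P(w_j) / p(x), with p(x) = sum_j p(x|w_j) P(w_j)
    def posteriors(likelihoods, priors):
        evidence = sum(l * p for l, p in zip(likelihoods, priors))   # p(x)
        return [l * p / evidence for l, p in zip(likelihoods, priors)]

    print(posteriors([0.8, 0.3], [0.5, 0.5]))   # P(w1|x), P(w2|x); sums to 1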

10
Posterior Probabilities (remember)
11
Posterior Probabilities (remember)
Posterior = (Likelihood × Prior) / Evidence
12
Making a Decision
  • p(x|ωj) is called the likelihood and p(x) is
    called the evidence.
  • How can we make a decision after observing the
    value of x?

13
Making a Decision
  • Decision strategy: given the posterior
    probabilities for each class and an observation x,
  • if P(ω1|x) > P(ω2|x), decide the true state of
    nature is ω1;
  • if P(ω2|x) > P(ω1|x), decide the true state of
    nature is ω2.

14
Making a Decision
  • p(x|ωj) is called the likelihood and p(x) is
    called the evidence.
  • How can we make a decision after observing the
    value of x?
  • Rewriting the rule with Bayes' formula gives
    (see the sketch below):
  • Decide ω1 if p(x|ω1) P(ω1) > p(x|ω2) P(ω2);
    otherwise decide ω2 (the evidence p(x) cancels).
  • Note that, at every x, P(ω1|x) + P(ω2|x) = 1.
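A minimal sketch of the rewritten rule in Python; all density and prior
values are hypothetical:

    # The evidence p(x) cancels, so comparing posteriors is the same as
    # comparing likelihood * prior.
    def decide(lik1, lik2, prior1, prior2):
        return "w1" if lik1 * prior1 > lik2 * prior2 else "w2"

    print(decide(0.8, 0.3, 0.5, 0.5))   # hypothetical densities at x -> 'w1'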

15
Probability of Error
  • What is the probability of error for this
    decision at a particular x?
  • P(error|x) = P(ω1|x) if we decide ω2, and
    P(ω2|x) if we decide ω1.
  • What is the overall probability of error?
  • P(error) = ∫ P(error|x) p(x) dx

16
Probability of Error
  • Decision strategy for minimizing the probability
    of error:
  • Decide ω1 if P(ω1|x) > P(ω2|x); otherwise
    decide ω2.
  • Therefore
  • P(error|x) = min{P(ω1|x), P(ω2|x)}
  • (Bayes decision)

17
Probability of Error
  • What is the probability of error for this
    decision?
  • The Bayes decision rule minimizes this error
    because
  • P(error) = ∫ P(error|x) p(x) dx, and the rule makes
    P(error|x) as small as possible for every x.

18
Example
19
Example (cont.)
20
Example (cont.)
Assign colours to objects.
21
Example (cont.)
22
Example (cont.)
23
Example (cont.)
24
Example (cont.)
Assign colour to pen objects.
25
Example (cont.)
26
Example (cont.)
Assign colour to paper objects.
27
Example (cont.)
28
Example (cont.)
29
Bayesian Decision Theory
  • How can we generalize to
  • more than one feature?
  • replace the scalar x by the feature vector x
  • more than two states of nature?
  • just a difference in notation
  • allowing actions other than just decisions?
  • allow the possibility of rejection
  • different risks in the decision?
  • define how costly each action is

30
Bayesian Decision Theory
  • Let {ω1, . . . , ωc} be the finite set of c
    states of nature (classes, categories).
  • Let {α1, . . . , αa} be the finite set of a
    possible actions.
  • Let λ(αi|ωj) be the loss incurred for taking
    action αi when the state of nature is ωj.
  • Let x be the d-component vector-valued random
    variable called the feature vector.

31
Bayesian Decision Theory
  • p(x|ωj) is the class-conditional probability
    density function.
  • P(ωj) is the prior probability that nature is in
    state ωj.
  • The posterior probability can be computed as
  • P(ωj|x) = p(x|ωj) P(ωj) / p(x)
  • where p(x) = Σj p(x|ωj) P(ωj), summing over the c
    classes.

32
Loss function
  • Allow actions, and not only decisions on the
    state of nature. How costly is each action?
  • Introduce a loss function, which is more general
    than the probability of error.
  • The loss function states how costly each action
    taken is.
  • Allowing actions other than classification
    primarily allows the possibility of rejection:
  • refusing to make a decision in close or bad cases.

33
Loss function
Let {ω1, ω2, ..., ωc} be the set of c states of
nature (or categories).
Let α(x) be a decision rule that maps a pattern x
into one of the actions in {α1, α2, ..., αa}, the
set of possible actions.
Let λ(αi|ωj) be the loss incurred for taking action
αi when the category is ωj.
34
Conditional Risk
  • Suppose we observe x and take action αi.
  • If the true state of nature is ωj, we incur the
    loss λ(αi|ωj).
  • The expected loss of taking action αi is
  • R(αi|x) = Σj λ(αi|ωj) P(ωj|x),
  • which is also called the conditional risk.

35
Ex. Target Detection
Actual class:     ω1 (target present)      ω2 (target absent)
Choose α1:        λ(α1|ω1): hit            λ(α1|ω2): false alarm
Choose α2:        λ(α2|ω1): miss           λ(α2|ω2): do nothing
36
Minimum-Risk Classi?cation
  • The general decision rule α(x) tells us which
    action to take for each observation x.
  • We want to find the decision rule that minimizes
    the overall risk R = ∫ R(α(x)|x) p(x) dx.
  • The Bayes decision rule minimizes the overall risk
    by selecting the action αi for which R(αi|x) is
    minimum (sketched below with a sample loss matrix).
  • The resulting minimum overall risk is called the
    Bayes risk and is the best performance that can
    be achieved.
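A minimal Python sketch of the minimum-risk rule. The loss matrix is in
the spirit of the target-detection example above, but the numeric λ values
and posteriors are made up:

    # Conditional risk: R(a_i|x) = sum_j lambda(a_i|w_j) P(w_j|x).
    # Bayes rule: take the action with the smallest conditional risk.
    loss = [[0.0, 10.0],   # a1: lambda(a1|w1) hit, lambda(a1|w2) false alarm
            [5.0,  0.0]]   # a2: lambda(a2|w1) miss, lambda(a2|w2) do nothing

    def bayes_action(post):
        risks = [sum(loss[i][j] * post[j] for j in range(len(post)))
                 for i in range(len(loss))]
        best = min(range(len(risks)), key=lambda i: risks[i])
        return best, risks

    print(bayes_action([0.3, 0.7]))   # -> (1, [7.0, 1.5]): choose a2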

37
Two-Category Classi?cation
  • Define λij ≡ λ(αi|ωj), the loss incurred for
    deciding ωi when the true state of nature is ωj.
  • The conditional risks can then be written as
  • R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x)
  • R(α2|x) = λ21 P(ω1|x) + λ22 P(ω2|x)

38
Two-Category Classi?cation
  • The minimum-risk decision rule becomes:
  • decide ω1 if (λ21 - λ11) P(ω1|x) > (λ12 - λ22) P(ω2|x).
  • This corresponds to deciding ω1 if
  • p(x|ω1) / p(x|ω2) > [(λ12 - λ22) / (λ21 - λ11)] [P(ω2) / P(ω1)]
  • i.e. comparing the likelihood ratio to a threshold
    that is independent of the observation x.

39
Optimal decision property
If the likelihood ratio p(x|ω1) / p(x|ω2) exceeds a
threshold value T, independent of the input pattern
x, we can take the optimal action (see the sketch
below).
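A minimal Python sketch of the likelihood-ratio test; the loss values,
priors and densities are hypothetical, and λ21 > λ11 is assumed so the
division is valid:

    def decide_lr(lik1, lik2, prior1, prior2, l11, l12, l21, l22):
        # threshold T depends only on losses and priors, not on x
        T = ((l12 - l22) / (l21 - l11)) * (prior2 / prior1)
        return "w1" if lik1 / lik2 > T else "w2"

    # With zero-one losses T reduces to the prior ratio P(w2)/P(w1):
    print(decide_lr(0.8, 0.3, 0.5, 0.5, 0.0, 1.0, 1.0, 0.0))   # -> 'w1'
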
40
Minimum-Error-Rate Classi?cation
  • Actions are decisions on classes (αi is deciding
    ωi).
  • If action αi is taken and the true state of
    nature is ωj, then the decision is correct if
    i = j and in error if i ≠ j.
  • We want to find a decision rule that minimizes the
    probability of error.

41
Minimum-Error-Rate Classi?cation
  • Define the zero-one loss function:
  • λ(αi|ωj) = 0 if i = j, and 1 if i ≠ j
    (all errors are equally costly).
  • The conditional risk then becomes
  • R(αi|x) = Σ_{j ≠ i} P(ωj|x) = 1 - P(ωi|x)
    (see the small check below).
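A small Python check, with hypothetical posteriors, that the zero-one
conditional risk equals 1 - P(ωi|x):

    # lambda(a_i|w_j) = 0 if i == j else 1, so R(a_i|x) = sum_{j != i} P(w_j|x).
    def zero_one_risk(i, post):
        return sum(p for j, p in enumerate(post) if j != i)

    post = [0.7, 0.2, 0.1]                               # hypothetical P(w_i|x)
    print([zero_one_risk(i, post) for i in range(3)])    # ~[0.3, 0.8, 0.9]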

42
Minimum-Error-Rate Classi?cation
  • Minimizing the risk requires maximizing P(ωi|x)
    and results in the minimum-error decision rule:
  • Decide ωi if P(ωi|x) > P(ωj|x) for all j ≠ i.
  • The resulting error is called the Bayes error and
    is the best performance that can be achieved.

43
Minimum-Error-Rate Classi?cation
Decision regions under the zero-one loss function,
therefore:
44
Minimum-Error-Rate Classi?cation
45
Discriminant Functions
  • A useful way of representing classifiers is
    through discriminant functions gi(x), i = 1, . . .
    , c, where the classifier assigns a feature
    vector x to class ωi if
  • gi(x) > gj(x) for all j ≠ i.
  • For the classifier that minimizes conditional
    risk: gi(x) = -R(αi|x).
  • For the classifier that minimizes error:
    gi(x) = P(ωi|x).

46
Discriminant Functions
47
Discriminant Functions
  • These functions divide the feature space into c
    decision regions (R1, . . . , Rc), separated by
    decision boundaries.

48
Discriminant Functions
gi(x) can be any monotonically increasing
function of P(ωi|x):
  • gi(x) = f(P(ωi|x)), e.g. gi(x) = p(x|ωi) P(ωi),
  • or the natural logarithm of such a function, e.g.
  • gi(x) = ln p(x|ωi) + ln P(ωi).

49
Discriminant Functions
  • The two-category case:
  • a classifier with two discriminant functions g1
    and g2 is called a dichotomizer.
  • Let g(x) ≡ g1(x) - g2(x).
  • Decide ω1 if g(x) > 0; otherwise decide ω2.

50
Discriminant Functions
  • The two-category case:
  • the computation of g(x), e.g. (sketched below)
  • g(x) = P(ω1|x) - P(ω2|x), or, equivalently for
    the decision,
  • g(x) = ln [p(x|ω1) / p(x|ω2)] + ln [P(ω1) / P(ω2)].
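A minimal sketch of a dichotomizer built from the log form of g(x)
(Python; the density and prior values are made up):

    import math

    # g(x) = ln p(x|w1) - ln p(x|w2) + ln P(w1) - ln P(w2); decide w1 if g(x) > 0.
    def g(lik1, lik2, prior1, prior2):
        return math.log(lik1) - math.log(lik2) + math.log(prior1) - math.log(prior2)

    print("w1" if g(0.8, 0.3, 0.5, 0.5) > 0 else "w2")   # -> 'w1'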

51
Example
52
Exercise
53
Example
54
Exercise
  • Select the optimal decision, where
  • the classes are ω1 and ω2,
  • p(x|ω1) ~ N(2, 0.5) (normal distribution),
  • p(x|ω2) ~ N(1.5, 0.2),
  • P(ω1) = 2/3,
  • P(ω2) = 1/3.
  • (A numerical sketch follows below.)
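A numerical sketch for this exercise in Python, assuming N(·,·) denotes
(mean, variance); if the second parameter is the standard deviation
instead, the variance arguments below would change:

    import math

    def gauss(x, mean, var):
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def decide(x):
        s1 = gauss(x, 2.0, 0.5) * (2.0 / 3.0)   # p(x|w1) P(w1)
        s2 = gauss(x, 1.5, 0.2) * (1.0 / 3.0)   # p(x|w2) P(w2)
        return "w1" if s1 > s2 else "w2"

    print([decide(x) for x in (1.0, 1.5, 2.0, 2.5)])   # decisions at sample x values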

55
The Gaussian Density
  • Gaussian can be considered as a model where the
    feature vectors for a given class are
    continuous-valued, randomly corrupted versions of
    a single typical or prototype vector.
  • Some properties of the Gaussian:
  • Analytically tractable.
  • Completely specified by the 1st and 2nd moments.
  • Has the maximum entropy of all distributions
    with a given mean and variance.
  • Many processes are asymptotically Gaussian
    (Central Limit Theorem).
  • Linear transformations of a Gaussian are also
    Gaussian.

56
Univariate Gaussian
57
Univariate Gaussian
58
Multivariate Gaussian
59
Linear Transformations
60
Linear Transformations
61
Mahalanobis Distance
Mahalanobis distance takes into account the
covariance among the variables in calculating
distance.
62
Mahalanobis Distance
  • Takes into account the covariance among the
    variables in calculating distance (see the sketch
    below).
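A minimal Python/NumPy sketch of the Mahalanobis distance; the mean and
covariance values are made up:

    import numpy as np

    mu = np.array([0.0, 0.0])
    sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])              # hypothetical covariance matrix

    def mahalanobis(x, mu, sigma):
        d = x - mu
        # sqrt of (x - mu)^T Sigma^(-1) (x - mu)
        return float(np.sqrt(d @ np.linalg.inv(sigma) @ d))

    x = np.array([1.0, 1.0])
    print(mahalanobis(x, mu, sigma), float(np.linalg.norm(x - mu)))   # vs. Euclidean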

63
Discriminant Functions for the Gaussian Density
Assume that the class-conditional density p(x|ωi)
is multivariate normal, i.e. p(x|ωi) ~ N(μi, Σi).
64
Discriminant Functions for the Gaussian Density
65
  • The simplest case:
  • the features are statistically independent, and
  • each feature has the same variance, i.e. Σi = σ²I.

66
  • The determinant of Σ:
  • det Σ = σ^(2d)
  • because Σ = σ²I is a d×d diagonal matrix with σ²
    in every diagonal entry, so det Σ = (σ²)^d.

67
  • The inverse of Σ:
  • Σ^(-1) = (1/σ²) I
  • because Σ = σ²I.

68
  • By using
  • det Σ = σ^(2d) and Σ^(-1) = (1/σ²) I
  • the discriminant gi(x) simplifies as shown next.

69
(No Transcript)
70
  • The quadratic term x^T x is the same for all the
    discriminant functions, so we can omit it (see the
    sketch below).
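A minimal Python/NumPy sketch of the resulting linear discriminant for
the Σi = σ²I case; the means, variance and priors are hypothetical:

    import numpy as np

    # g_i(x) = (1/sigma^2) mu_i.x - (1/(2 sigma^2)) mu_i.mu_i + ln P(w_i),
    # after dropping the x.x term that is common to all classes.
    def g(x, mu, sigma2, prior):
        return (mu @ x) / sigma2 - (mu @ mu) / (2.0 * sigma2) + np.log(prior)

    x = np.array([1.0, 2.0])
    means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
    priors = [0.5, 0.5]
    scores = [g(x, m, 1.0, p) for m, p in zip(means, priors)]
    print(int(np.argmax(scores)))   # -> 1: x is closer to the second mean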

71
(No Transcript)
72
(No Transcript)
73
(No Transcript)
74
(No Transcript)
75
References
  • R.O. Duda, P.E. Hart, and D.G. Stork, Pattern
    Classification, New York: John Wiley, 2001.
  • Selim Aksoy, Pattern Recognition Course
    Materials, 2011.
  • M. Narasimha Murty, V. Susheela Devi, Pattern
    Recognition: An Algorithmic Approach, Springer,
    2011.