Pattern Recognition: Bayesian Decision Theory - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Pattern Recognition: Bayesian Decision Theory


1
Pattern Recognition: Bayesian Decision Theory
  • Charles Tappert
  • Seidenberg School of CSIS, Pace University

2
Pattern Classification
Most of the material in these slides was taken from
the figures in Pattern Classification (2nd ed.) by
R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley
& Sons, 2001
3
Bayesian Decision Theory
  • A fundamental, purely statistical approach
  • Assumes the relevant probabilities are known
    perfectly
  • Makes theoretically optimal decisions

4
Bayesian Decision Theory
  • Based on Bayes formula
  • P(ωj | x) = p(x | ωj) P(ωj) / p(x)
  • which is easily derived by writing the joint
    probability density two ways (written out below)
  • P(ωj, x) = P(ωj | x) p(x)
  • P(ωj, x) = p(x | ωj) P(ωj)
  • Note: uppercase P(·) denotes a probability mass
    function and lowercase p(·) a density function
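Written out in standard notation (the expansion of the evidence p(x) is not on the slide, but follows from the law of total probability):

```latex
P(\omega_j, x) = P(\omega_j \mid x)\,p(x) = p(x \mid \omega_j)\,P(\omega_j)
\;\Longrightarrow\;
P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)},
\qquad
p(x) = \sum_{j} p(x \mid \omega_j)\,P(\omega_j)
```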

5
Bayes Formula
  • Bayes formula
  • P(ωj | x) = p(x | ωj) P(ωj) / p(x)
  • can be expressed informally in English as
  • posterior = likelihood × prior / evidence
  • and Bayes decision chooses the class j with the
    greatest posterior probability

6
Bayes Formula
  • Bayes formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x)
  • The Bayes decision chooses the class j with the
    greatest P(ωj | x)
  • Since p(x) is the same for all classes, the greatest
    P(ωj | x) means the greatest p(x | ωj) P(ωj)
  • Special case: if all classes are equally likely,
    i.e. have the same P(ωj), we get a further simplification:
    the greatest P(ωj | x) comes from the greatest likelihood
    p(x | ωj) (a minimal sketch of the rule follows)
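A minimal sketch of this decision rule in Python; the one-feature Gaussian class-conditional densities and all numbers here are illustrative assumptions, not values from the slides:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate normal density, used as the likelihood p(x | w_j)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Assumed class-conditional parameters and priors (hypothetical values)
means  = [11.0, 13.0]      # lightness means for class 1 and class 2
stds   = [1.0, 1.2]
priors = [2 / 3, 1 / 3]    # P(w_1), P(w_2)

def bayes_decide(x):
    # Posterior is proportional to likelihood * prior; p(x) cancels in the argmax
    scores = [normal_pdf(x, m, s) * p for m, s, p in zip(means, stds, priors)]
    return int(np.argmax(scores))   # index of the class with greatest posterior

print(bayes_decide(12.0))
```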

7
Bayesian Decision Theory
  • Now, let's look at the fish example: two
    classes (sea bass and salmon) and one feature
    (lightness)
  • Let p(x | ω1) and p(x | ω2) describe the
    difference in lightness between populations of
    sea bass and salmon (see next slide)

8
(Figure: class-conditional lightness densities p(x | ω1) and p(x | ω2) for sea bass and salmon)
9
Bayesian Decision Theory
  • In the previous slide, if the two classes are
    equally likely, we get the simplification that the
    greatest posterior means the greatest likelihood, and
    the Bayes decision is to choose class 1 when
    p(x | ω1) > p(x | ω2), i.e. when lightness is greater
    than approximately 12.4
  • However, if the two classes are not equally
    likely, we get a case like the next slide

10
(Figure: the decision when the two classes have unequal prior probabilities)
11
Bayesian Parameter Estimation
  • Because the actual probabilities are rarely
    known, they are usually estimated after assuming
    the form of the distributions
  • The form usually assumed for the distributions is
    multivariate normal

12
Bayesian Parameter Estimation
  • Assuming multivariate normal probability density
    functions, it is necessary to estimate, for each
    pattern class (see the sketch after this list):
  • Feature means
  • Feature covariance matrices
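A minimal sketch of those estimates with NumPy (sample mean and sample covariance per class; the function and variable names are illustrative):

```python
import numpy as np

def estimate_class_parameters(X, y):
    """Per-class mean vector and covariance matrix, assuming the data
    in each class is multivariate normal.
    X: (n_samples, n_features) array of feature vectors
    y: (n_samples,) array of class labels"""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]                  # samples belonging to this class
        mu = Xc.mean(axis=0)                # feature means
        sigma = np.cov(Xc, rowvar=False)    # feature covariance matrix
        params[label] = (mu, sigma)
    return params
```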

13
Multivariate Normal Densities
  • Simplifying assumptions can be made for
    multivariate normal density functions (a
    discriminant sketch follows this list)
  • Statistically independent features with equal
    variances yield hyperplane decision surfaces
  • Equal covariance matrices for each class also
    yield hyperplane decision surfaces
  • Arbitrary normal distributions yield
    hyperquadric decision surfaces
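These cases come out of the Gaussian discriminant function gj(x) = ln p(x | ωj) + ln P(ωj). A minimal sketch, building on the per-class estimates above; when all classes share one covariance matrix the quadratic term cancels between classes, leaving linear (hyperplane) boundaries:

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """g_j(x) = ln p(x | w_j) + ln P(w_j) for one multivariate normal class.
    Quadratic in x in general, hence hyperquadric decision surfaces."""
    diff = x - mu
    _, log_det = np.linalg.slogdet(sigma)       # stable log-determinant
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * len(mu) * np.log(2 * np.pi)
            - 0.5 * log_det
            + np.log(prior))
```

Classification chooses the class whose gj(x) is largest.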

14
Nonparametric Techniques
  • Probabilities are not known
  • Two approaches:
  • Estimate the density functions from sample
    patterns
  • Bypass probability estimation entirely
  • Use a nonparametric method
  • Such as k-Nearest-Neighbor

15
k-Nearest-Neighbor
16
k-Nearest-Neighbor (k-NN) Method
  • Used where probabilities are not known
  • Bypasses probability estimation entirely
  • Easy to implement (see the sketch below)
  • Asymptotic error never worse than twice the Bayes
    error
  • Computationally intensive, therefore slow
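A minimal brute-force k-NN classifier in Python (Euclidean distance; names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Label x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```

Scanning every training sample for every query is what makes the method computationally intensive.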

17
Simple PR System with k-NN
  • Good for feasibility studies; easy to implement
  • Typical procedural steps (the normalization step
    is sketched after this list):
  • Extract feature measurements
  • Normalize features to the 0-1 range
  • Classify by k nearest neighbors
  • Using Euclidean distance
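A sketch of the normalization step as min-max scaling fit on the training data (an assumed convention; the slide only says 0-1 range):

```python
import numpy as np

def minmax_fit(X_train):
    """Per-feature minimum and range from the training data only."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)   # guard against zero range

def minmax_apply(X, lo, span):
    return (X - lo) / span    # training features land in [0, 1]
```

Test patterns are scaled with the training-set parameters and then passed to knn_classify above.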

18
Simple PR System with k-NN (cont.): Two Modes of
Operation
  • Leave-one-out procedure (sketched after this list)
  • One input file of training/test patterns
  • Repeatedly train on all samples except one, which
    is left out for testing
  • Good for a feasibility study with little data
  • Train and test on separate files
  • One input file for training and one for testing
  • Good for measuring performance change when
    varying an independent variable (e.g., different
    keyboards for keystroke biometrics)
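A minimal sketch of the leave-one-out mode using the knn_classify function above:

```python
import numpy as np

def leave_one_out_accuracy(X, y, k=3):
    """Classify each sample with all the other samples as training data."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i    # every sample except i
        correct += (knn_classify(X[i], X[mask], y[mask], k) == y[i])
    return correct / len(X)
```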

19
Simple PR System with k-NN (cont.)
  • Used in keystroke biometric studies
  • Feasibility study: Dr. Mary Curtin
  • Different keyboards/modes: Dr. Mary Villani
  • Used in other studies that used keystroke data
  • Study of procedures for handling incomplete and
    missing data, e.g. fallback procedures in the
    keystroke biometric system: Dr. Mark Ritzmann
  • New kNN-ROC procedures: Dr. Robert Zack
  • Used in other biometric studies
  • Mouse movement: Larry Immohr
  • Stylometry keystroke study: John Stewart

20
Conclusions
  • The Bayes decision method is best if the
    probabilities are known
  • The Bayes method is okay if you are good with
    statistics and the form of the probability
    distributions can be assumed, especially if there
    is justification for simplifying assumptions like
    independent features
  • Otherwise, stay with easier-to-implement methods
    that provide reasonable results, like k-NN