Bayesian Decision Theory

Transcript and Presenter's Notes
1
Bayesian Decision Theory
  • Basic Concepts
  • Discriminant Functions
  • The Normal Density
  • Decision Theory for Discrete Features
  • Bayesian Belief Networks

2
Discrete Features
Now we consider the case where the feature vector x is such that each component takes on discrete values. Examples: color, texture, type, etc.
We no longer have probability densities, but probability masses:
∫ p(x|ωj) dx becomes Σx P(x|ωj)
All we have said about Bayes rule and conditional risks remains the same.
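As a concrete illustration, here is a minimal Python sketch of Bayes rule with probability masses; the feature values, class-conditional tables, and priors below are all hypothetical:

```python
# Class-conditional probability masses P(x | wj) for one discrete feature
# (color), plus priors P(wj). All numbers are hypothetical.
P_x_given_w = {"w1": {"red": 0.6, "green": 0.3, "blue": 0.1},
               "w2": {"red": 0.2, "green": 0.3, "blue": 0.5}}
prior = {"w1": 0.7, "w2": 0.3}

def posterior(x):
    """Bayes rule, with P(x) a sum over classes rather than an integral."""
    evidence = sum(P_x_given_w[w][x] * prior[w] for w in prior)  # P(x)
    return {w: P_x_given_w[w][x] * prior[w] / evidence for w in prior}

print(posterior("red"))  # {'w1': 0.875, 'w2': 0.125}
```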
3
Independent Binary Features
Consider two classes only, ω1 and ω2, and binary-valued features:
x = (x1, x2, …, xd)ᵗ, where each xi can be either 1 or 0.
Now, for each feature component, define two probabilities:
pi = P(xi = 1 | ω1)
qi = P(xi = 1 | ω2)
Remember that it is usually very complicated to compute the posterior probability in Bayes formula:
P(ωj | x) = P(x|ωj) P(ωj) / P(x)
4
Independent Binary Features
Things, however, are much easier if we assume the attributes to be independent. In that case we can define the following likelihoods based simply on products of conditional probabilities:
P(x|ω1) = Πi pi^xi (1 − pi)^(1 − xi)
P(x|ω2) = Πi qi^xi (1 − qi)^(1 − xi)
The likelihood ratio is then:
P(x|ω1) / P(x|ω2) = Πi (pi/qi)^xi [(1 − pi)/(1 − qi)]^(1 − xi)
5
Independent Binary Features
Now remember that using Bayes rule we select ω1 if
P(x|ω1) / P(x|ω2) > P(ω2) / P(ω1), or
P(x|ω1) / P(x|ω2) − P(ω2) / P(ω1) > 0.
Taking logarithms, our discriminant function is
g(x) = Σi xi ln(pi/qi) + Σi (1 − xi) ln[(1 − pi)/(1 − qi)] + ln[P(ω1)/P(ω2)]
This can be expressed as g(x) = Σi wi xi + w0, where
wi = ln[pi (1 − qi) / (qi (1 − pi))]
w0 = Σi ln[(1 − pi)/(1 − qi)] + ln[P(ω1)/P(ω2)]
(a linear combination of the features)
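Since g(x) is just a dot product plus a bias, it is cheap to evaluate. A minimal sketch, with made-up values for pi, qi, and the priors:

```python
import numpy as np

# Linear discriminant for independent binary features.
# Decide w1 when g(x) > 0, otherwise decide w2.
p = np.array([0.8, 0.6, 0.7])   # p_i = P(x_i = 1 | w1)  (hypothetical)
q = np.array([0.3, 0.4, 0.2])   # q_i = P(x_i = 1 | w2)  (hypothetical)
P_w1, P_w2 = 0.5, 0.5           # class priors           (hypothetical)

w = np.log(p * (1 - q) / (q * (1 - p)))                       # weights w_i
w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(P_w1 / P_w2)  # bias w_0

def g(x):
    """Discriminant g(x) = sum_i w_i x_i + w_0."""
    return w @ x + w0

x = np.array([1, 0, 1])
print("decide w1" if g(x) > 0 else "decide w2")
```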
6
Bayesian Belief Networks
What does it mean for two variables to be independent? Consider a multidimensional distribution p(x). If for two features we know that
p(xi, xj) = p(xi) p(xj)
we say the features are statistically independent. If we know which features are independent and which are not, we can simplify the computation of joint probabilities.
7
Figure 2.23
8
Bayesian Belief Networks
A Bayesian Belief Network is a method to describe the joint probability distribution of a set of variables. It is also called a causal network or belief net. Let x1, x2, …, xn be a set of variables or features. A Bayesian Belief Network (BBN) will tell us the probability of any combination of x1, x2, …, xn.
9
An Example
A set of Boolean variables and their relations:
[Network diagram with nodes Storm, Bus Tour Group, Lightning, Campfire, Thunder, and Forest Fire; the edges follow the factorization shown on slide 15.]
10
Conditional Probabilities
Conditional probability table for Campfire (C) given its parents Storm (S) and Bus Tour Group (B); ¬ marks a variable being false:

        S,B    S,¬B   ¬S,B   ¬S,¬B
  C     0.4    0.1    0.8    0.2
  ¬C    0.6    0.9    0.2    0.8
11
Conditional Independence
We say x1 is conditionally independent of x2 given x3 if the probability of x1 is independent of x2 once x3 is given:
P(x1 | x2, x3) = P(x1 | x3)
The same can be said for sets of variables: x1, x2, x3 is independent of y1, y2, y3 given z1, z2, z3 if
P(x1, x2, x3 | y1, y2, y3, z1, z2, z3) = P(x1, x2, x3 | z1, z2, z3)
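To see the definition in action, the sketch below builds a small joint distribution that factorizes as P(x3) P(x2|x3) P(x1|x3), so x1 is conditionally independent of x2 given x3 by construction, and then verifies that P(x1|x2,x3) = P(x1|x3); all numbers are made up:

```python
from itertools import product

# Hypothetical CPTs; the joint is built so that x1 is independent of x2 given x3.
P_x3 = {0: 0.4, 1: 0.6}
P_x2_given_x3 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # [x3][x2]
P_x1_given_x3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # [x3][x1]

joint = {(x1, x2, x3): P_x3[x3] * P_x2_given_x3[x3][x2] * P_x1_given_x3[x3][x1]
         for x1, x2, x3 in product((0, 1), repeat=3)}

def cond(x1, x2, x3):
    """P(x1 | x2, x3) computed directly from the joint table."""
    return joint[(x1, x2, x3)] / sum(joint[(v, x2, x3)] for v in (0, 1))

# P(x1 | x2, x3) equals P(x1 | x3) for every assignment, as the definition says.
for x1, x2, x3 in product((0, 1), repeat=3):
    assert abs(cond(x1, x2, x3) - P_x1_given_x3[x3][x1]) < 1e-12
print("conditional independence verified")
```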
12
Representation
  • A BBN represents the joint probability distribution of a set of variables
  • by explicitly indicating the assumptions of conditional independence,
  • through
  • a directed acyclic graph and
  • local conditional probabilities.

[Diagram: nodes Storm and Bus Tour Group point to Campfire; each node is a variable annotated with its local conditional probabilities.]
13
Representation
Each variable is independent of its nondescendants given its predecessors. We say x1 is a descendant of x2 if there is a directed path from x2 to x1.
Example: the predecessors of Campfire are Storm and Bus Tour Group (Campfire is a descendant of these two variables). Campfire is independent of Lightning given its predecessors.

[Diagram: Bus Tour Group and Storm point to Campfire; Storm also points to Lightning.]
14
Figure 2.25
15
Joint Probability Distribution
To compute the joint probability distribution of a set of variables given a Bayesian Belief Network, we simply use the following formula:
P(x1, x2, …, xn) = Πi P(xi | Parents(xi))
where the parents are the immediate predecessors of xi.
Example:
P(Campfire, Storm, BusTourGroup, Lightning, Thunder, ForestFire) = P(Storm) P(BusTourGroup) P(Campfire | Storm, BusTourGroup) P(Lightning | Storm) P(Thunder | Lightning) P(ForestFire | Lightning, Storm, Campfire)
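The factorization translates directly into code. A minimal sketch of this network follows; the Campfire table uses the numbers from the slides, while every other CPT entry is invented for illustration:

```python
# Parents(x_i) for each variable, following the factorization above.
parents = {
    "Storm": (), "BusTourGroup": (),
    "Lightning": ("Storm",),
    "Campfire": ("Storm", "BusTourGroup"),
    "Thunder": ("Lightning",),
    "ForestFire": ("Lightning", "Storm", "Campfire"),
}

# cpt[var][parent_values] = P(var = True | parent_values).
# Campfire's entries come from the slide table; the rest are made up.
cpt = {
    "Storm": {(): 0.3},
    "BusTourGroup": {(): 0.5},
    "Lightning": {(True,): 0.8, (False,): 0.1},
    "Campfire": {(True, True): 0.4, (True, False): 0.1,
                 (False, True): 0.8, (False, False): 0.2},
    "Thunder": {(True,): 0.9, (False,): 0.05},
    "ForestFire": {(True, True, True): 0.5, (True, True, False): 0.4,
                   (True, False, True): 0.3, (True, False, False): 0.2,
                   (False, True, True): 0.2, (False, True, False): 0.1,
                   (False, False, True): 0.1, (False, False, False): 0.01},
}

def joint(assignment):
    """P(assignment) = prod_i P(x_i | Parents(x_i)) for a full assignment."""
    prob = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob

x = {"Storm": True, "BusTourGroup": True, "Lightning": False,
     "Campfire": True, "Thunder": False, "ForestFire": False}
print(joint(x))  # one factor per variable, never the full 2^6 joint table
```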
16
Joint Distribution, An Example
[Network diagram: Storm and Bus Tour Group are root nodes; Storm points to Lightning and Campfire; Bus Tour Group points to Campfire; Lightning points to Thunder; Lightning, Storm, and Campfire point to Forest Fire.]

P(Storm) P(BusTourGroup) P(Campfire | Storm, BusTourGroup) P(Lightning | Storm) P(Thunder | Lightning) P(ForestFire | Lightning, Storm, Campfire)
17
Conditional Probabilities, An Example
        S,B    S,¬B   ¬S,B   ¬S,¬B
  C     0.4    0.1    0.8    0.2
  ¬C    0.6    0.9    0.2    0.8

Reading the entry for Campfire (C) given its parents Storm (S) and Bus Tour Group (B):
P(Campfire = true | Storm = true, BusTourGroup = true) = 0.4
18
Learning Belief Networks
  • We can learn a BBN in different ways. Three basic approaches follow.
  • Assume we know the network structure:
  • we can estimate the conditional probabilities for each variable from the data (see the sketch after this list).
  • Assume we know part of the structure but some variables are missing:
  • this is like learning hidden units in a neural network;
  • one can use a gradient ascent method to train the BBN.
  • Assume nothing is known:
  • we can learn the structure and conditional probabilities by searching the space of possible networks.
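For the first case (known structure, fully observed data), estimating a CPT reduces to counting. A minimal sketch on synthetic data; the column layout and the random sample are assumptions made purely for illustration:

```python
import numpy as np

# Estimate P(Campfire = 1 | Storm, BusTourGroup) by counting co-occurrences.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(1000, 3))  # columns: Storm, BusTourGroup, Campfire

for s in (0, 1):
    for b in (0, 1):
        rows = data[(data[:, 0] == s) & (data[:, 1] == b)]
        est = rows[:, 2].mean() if len(rows) else float("nan")
        print(f"P(C=1 | S={s}, B={b}) estimated as {est:.2f}")
```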

19
Naïve Bayes
What is the connection between a BBN and classification? Suppose one of the variables is the target variable. Can we compute the probability of the target variable given the other variables? In Naïve Bayes:

[Diagram: the class node ωj is the sole parent of every feature node x1, x2, …, xn.]

P(x1, x2, …, xn, ωj) = P(ωj) P(x1 | ωj) P(x2 | ωj) … P(xn | ωj)
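Because every feature has the class as its only parent, classification just multiplies the factors and picks the class with the largest score. A minimal sketch with hypothetical binary features and made-up probabilities:

```python
import numpy as np

prior = {"w1": 0.6, "w2": 0.4}              # P(w_j)            (hypothetical)
p1 = {"w1": np.array([0.8, 0.5, 0.1]),      # P(x_i = 1 | w1)   (hypothetical)
      "w2": np.array([0.2, 0.5, 0.7])}      # P(x_i = 1 | w2)   (hypothetical)

def classify(x):
    """Pick the class maximizing P(w_j) * prod_i P(x_i | w_j)."""
    x = np.asarray(x)
    scores = {w: prior[w] * np.prod(np.where(x == 1, p1[w], 1 - p1[w]))
              for w in prior}
    return max(scores, key=scores.get)

print(classify([1, 0, 0]))  # -> w1 (scores 0.216 vs. 0.012 before normalizing)
```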
20
General Case
In the general case we can use a BBN to specify independence assumptions among the variables:

[Diagram: the class node ωj points to x1, x2, and x4; x1, x2, and ωj point to x3.]

P(x1, x2, x3, x4, ωj) = P(ωj) P(x1 | ωj) P(x2 | ωj) P(x3 | x1, x2, ωj) P(x4 | ωj)
21
Points to Remember
  • Bayes formula and Bayes decision theory
  • Loss functions (e.g., zero-one loss)
  • Discriminant functions
  • ROC curves
  • Discriminant functions for independent
    binary-valued features
  • Conditional independence
  • Bayesian Belief Networks
  • Naïve Bayes