Marginalization - PowerPoint PPT Presentation

Slides: 28
Provided by: dpa86

Title: Marginalization


1
Marginalization & Conditioning
  • Marginalization (summing out): for any sets of
    variables Y and Z, P(Y) = Σ_z P(Y, z)
  • Conditioning (variant of marginalization):
    P(Y) = Σ_z P(Y | z) P(z)

2
Example of Marginalization
  • Using the full joint distribution

P(cavity) = P(cavity, toothache, catch)
          + P(cavity, toothache, ¬catch)
          + P(cavity, ¬toothache, catch)
          + P(cavity, ¬toothache, ¬catch)
          = 0.108 + 0.012 + 0.072 + 0.008
          = 0.2
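
The sum above can be reproduced with a few lines of Python. This is only a sketch: the four cavity entries are the ones on the slide, while the four ¬cavity entries are not in this transcript and are assumed from the standard textbook version of the dentist example. The names JOINT and marginal_cavity are illustrative.

```python
from itertools import product

# Full joint keyed by (cavity, toothache, catch).
# The cavity=True entries appear on the slide; the cavity=False entries are
# an assumption taken from the standard textbook dentist example.
JOINT = {
    (True, True, True): 0.108,
    (True, True, False): 0.012,
    (True, False, True): 0.072,
    (True, False, False): 0.008,
    (False, True, True): 0.016,
    (False, True, False): 0.064,
    (False, False, True): 0.144,
    (False, False, False): 0.576,
}


def marginal_cavity(cavity: bool) -> float:
    """Sum out toothache and catch: P(cavity) = sum over t, c of P(cavity, t, c)."""
    return sum(JOINT[(cavity, t, c)] for t, c in product([True, False], repeat=2))


print(round(marginal_cavity(True), 3))  # 0.2
```
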
3
Inference By Enumeration using Full Joint
Distribution
  • Let X be the random variable whose probabilities
    we want to know, given some evidence (values e
    for a set E of other variables). Let the
    remaining (unobserved, so-called hidden)
    variables be Y. The query is P(X | e), and it can
    be answered using the full joint distribution by
    P(X | e) = α P(X, e) = α Σ_y P(X, e, y),
    where α is a normalization constant.

4
Example of Inference By Enumeration using Full
Joint Distribution
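
The slide's own worked example did not survive in this transcript. As a stand-in, here is a minimal sketch of inference by enumeration on the dentist domain, answering the hypothetical query P(Cavity | toothache). The ¬cavity entries of the joint are assumed from the standard textbook example, and enumerate_query is an illustrative name.

```python
VARS = ("cavity", "toothache", "catch")

# Full joint keyed by (cavity, toothache, catch). The cavity=False entries
# are an assumption from the standard textbook dentist example.
JOINT = {
    (True, True, True): 0.108,
    (True, True, False): 0.012,
    (True, False, True): 0.072,
    (True, False, False): 0.008,
    (False, True, True): 0.016,
    (False, True, False): 0.064,
    (False, False, True): 0.144,
    (False, False, False): 0.576,
}


def enumerate_query(query_var, evidence):
    """Return P(query_var | evidence) as a dict {value: probability}."""
    qi = VARS.index(query_var)
    unnormalized = {True: 0.0, False: 0.0}
    for assignment, p in JOINT.items():
        # Keep only atomic events consistent with the evidence. Adding them up
        # per query value sums out the hidden variables.
        if all(assignment[VARS.index(v)] == val for v, val in evidence.items()):
            unnormalized[assignment[qi]] += p
    alpha = 1.0 / sum(unnormalized.values())  # normalization constant
    return {value: alpha * p for value, p in unnormalized.items()}


posterior = enumerate_query("cavity", {"toothache": True})
print({value: round(p, 3) for value, p in posterior.items()})  # {True: 0.6, False: 0.4}
```

Here catch is the hidden variable: it is neither queried nor observed, so its values are summed out inside the loop.
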
5
Independence
  • Propositions a and b are independent if and only
    if P(a | b) = P(a)
  • Equivalently (by product rule) P(a ∧ b) = P(a) P(b)
  • Equivalently P(b | a) = P(b)

6
Illustration of Independence
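  • We know (product rule) that
    P(Toothache, Catch, Cavity, Weather) =
    P(Weather | Toothache, Catch, Cavity)
    P(Toothache, Catch, Cavity)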

7
Illustration continued
  • Allows us to represent a 32-element table for
    full joint on Weather, Toothache, Catch, Cavity
    by an 8-element table for the joint of Toothache,
    Catch, Cavity, and a 4-element table for Weather.
  • If we add a Boolean variable X to the 8-element
    table, we get 16 elements. If X is independent of
    the other variables, a separate 2-element table
    suffices instead.
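
The table sizes quoted on this slide can be checked by counting entries. The sketch below uses the domain sizes stated above (Weather has 4 values, the others are Boolean); DOMAIN_SIZES and table_entries are illustrative names.

```python
from math import prod

# Domain sizes as stated on the slide: Weather has 4 values, the rest are Boolean.
DOMAIN_SIZES = {"Weather": 4, "Toothache": 2, "Catch": 2, "Cavity": 2, "X": 2}
DENTAL = ["Toothache", "Catch", "Cavity"]


def table_entries(variables):
    """Number of entries in a joint table over the given variables."""
    return prod(DOMAIN_SIZES[v] for v in variables)


# One table over everything versus separate tables for independent parts.
full_joint = table_entries(DENTAL + ["Weather"])
factored = table_entries(DENTAL) + table_entries(["Weather"])
print(full_joint, factored)  # 32 12

# Adding an independent Boolean variable X.
with_x_joint = table_entries(DENTAL + ["X"])
with_x_factored = table_entries(DENTAL) + table_entries(["X"])
print(with_x_joint, with_x_factored)  # 16 10
```
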



8
Difficulty with Bayes Rule with More than Two
Variables
9
Conditional Independence
  • X and Y are conditionally independent given Z if
    and only if P(X, Y | Z) = P(X | Z) P(Y | Z).
  • Y1, ..., Yn are conditionally independent given
    X1, ..., Xm if and only if
    P(Y1, ..., Yn | X1, ..., Xm) =
    P(Y1 | X1, ..., Xm) P(Y2 | X1, ..., Xm) ...
    P(Yn | X1, ..., Xm).
  • We've reduced 2^n 2^m to 2n 2^m. Additional
    conditional independencies may reduce 2^m.
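
To get a feel for the savings, the sketch below counts table entries for Boolean variables, reading the two counts on the slide as 2^n · 2^m (one table for all of Y1..Yn given X1..Xm) and 2n · 2^m (n separate tables, one per Yi). That reading of the garbled exponents is an assumption, and the function names are illustrative.

```python
def unfactored_entries(n: int, m: int) -> int:
    """Entries in one table P(Y1..Yn | X1..Xm), all variables Boolean."""
    return 2**n * 2**m


def factored_entries(n: int, m: int) -> int:
    """Entries in n tables P(Yi | X1..Xm): each has 2 * 2^m entries."""
    return n * 2 * 2**m


for n, m in [(2, 1), (10, 3), (20, 3)]:
    print(n, m, unfactored_entries(n, m), factored_entries(n, m))
```

The unfactored count grows exponentially in n while the factored count grows only linearly in n; the remaining 2^m factor is what further conditional independencies among the Xs could reduce.
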

10
Conditional Independence
  • As with absolute independence, the equivalent
    forms of X and Y being conditionally independent
    given Z can also be used
  • P(X | Y, Z) = P(X | Z) and
  • P(Y | X, Z) = P(Y | Z)

11
Benefits of Conditional Independence
  • Allows probabilistic systems to scale up; tabular
    representations of full joint distributions
    quickly become too large.
  • Conditional independence is much more commonly
    available than is absolute independence.

12
Decomposing a Full Joint by Conditional
Independence
  • Might assume Toothache and Catch are
    conditionally independent given Cavity:
    P(Toothache, Catch | Cavity) =
    P(Toothache | Cavity) P(Catch | Cavity).
  • Then P(Toothache, Catch, Cavity)
    = P(Toothache, Catch | Cavity) P(Cavity)
      (product rule)
    = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
      (conditional independence).
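
The decomposition can be checked numerically. The sketch below rebuilds every joint entry from P(Toothache | Cavity), P(Catch | Cavity) and P(Cavity) and compares it with the table. The ¬cavity entries are assumed from the standard textbook dentist example (they are not in this transcript), and the helper names are illustrative.

```python
# Full joint keyed by (cavity, toothache, catch). The cavity=False entries
# are an assumption from the standard textbook dentist example.
JOINT = {
    (True, True, True): 0.108,
    (True, True, False): 0.012,
    (True, False, True): 0.072,
    (True, False, False): 0.008,
    (False, True, True): 0.016,
    (False, True, False): 0.064,
    (False, False, True): 0.144,
    (False, False, False): 0.576,
}
BOOL = (True, False)


def p_cavity(c):
    """P(Cavity = c), summing out toothache and catch."""
    return sum(JOINT[(c, t, k)] for t in BOOL for k in BOOL)


def p_toothache_given(t, c):
    """P(Toothache = t | Cavity = c)."""
    return sum(JOINT[(c, t, k)] for k in BOOL) / p_cavity(c)


def p_catch_given(k, c):
    """P(Catch = k | Cavity = c)."""
    return sum(JOINT[(c, t, k)] for t in BOOL) / p_cavity(c)


# Largest difference between a table entry and its factored reconstruction.
max_error = max(
    abs(JOINT[(c, t, k)] - p_toothache_given(t, c) * p_catch_given(k, c) * p_cavity(c))
    for c in BOOL for t in BOOL for k in BOOL
)
print(max_error < 1e-12)  # True
```

With these numbers the factored form needs only 2 + 2 + 1 = 5 independent probabilities instead of the 7 needed for the full table.
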

13
Naive Bayes Algorithm
  • Let Fi be the i-th feature, with value vj, and
    let Out be the target feature.
  • We can use training data to estimate
  • P(Fi = vj)
  • P(Fi = vj | Out = True)
  • P(Fi = vj | Out = False)
  • P(Out = True)
  • P(Out = False)

14
Naive Bayes Algorithm
  • For a test example described by F1 = v1, ...,
    Fn = vn, we need to compute
    P(Out = True | F1 = v1, ..., Fn = vn)
  • Applying Bayes rule:
    P(Out = True | F1 = v1, ..., Fn = vn)
    = P(F1 = v1, ..., Fn = vn | Out = True) P(Out = True)
      / P(F1 = v1, ..., Fn = vn)

15
Naive Bayes Algorithm
  • By the independence assumption:
    P(F1 = v1, ..., Fn = vn)
    = P(F1 = v1) x ... x P(Fn = vn)
  • The same assumption is applied given the target
    (conditional independence):
    P(F1 = v1, ..., Fn = vn | Out = True)
    = P(F1 = v1 | Out = True) x ... x
      P(Fn = vn | Out = True)

16
Naive Bayes Algorithm
  • P(Out = True | F1 = v1, ..., Fn = vn)
    = P(F1 = v1 | Out = True) x ... x
      P(Fn = vn | Out = True) x P(Out = True)
      / (P(F1 = v1) x ... x P(Fn = vn))
  • All terms are computed using the training data!
  • Works well despite its strong assumptions (see
    Domingos and Pazzani, MLJ 97) and thus provides
    a simple benchmark test-set accuracy for a new
    data set.
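
A minimal sketch of the algorithm on a tiny made-up data set follows. Everything here is illustrative: the data, the feature values, and the names train and posterior_true are hypothetical. One deliberate difference from the slide: instead of dividing by the product of feature marginals, the sketch normalizes over the two classes, which always yields a value between 0 and 1; the two agree only when the features really are independent. No smoothing is applied, so both classes must occur in the training data.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count-based estimates from a list of (feature_tuple, out) pairs."""
    class_counts = Counter(out for _, out in examples)
    feature_counts = defaultdict(Counter)  # (feature index, out) -> value counts
    for features, out in examples:
        for i, value in enumerate(features):
            feature_counts[(i, out)][value] += 1
    return class_counts, feature_counts


def posterior_true(features, class_counts, feature_counts):
    """P(Out = True | features) under the naive Bayes assumption."""
    total = sum(class_counts.values())
    scores = {}
    for out in (True, False):
        score = class_counts[out] / total  # P(Out = out)
        for i, value in enumerate(features):
            # P(Fi = value | Out = out) as a relative frequency (no smoothing).
            score *= feature_counts[(i, out)][value] / class_counts[out]
        scores[out] = score
    # Normalize over both classes (swapped in for the slide's denominator).
    return scores[True] / (scores[True] + scores[False])


# Hypothetical training data: (weather, temperature) -> Out.
DATA = [
    (("sunny", "hot"), False),
    (("sunny", "mild"), False),
    (("rainy", "mild"), True),
    (("rainy", "hot"), True),
    (("sunny", "mild"), True),
    (("rainy", "hot"), False),
]

class_counts, feature_counts = train(DATA)
print(round(posterior_true(("rainy", "mild"), class_counts, feature_counts), 3))  # 0.8
```
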

17
Bayesian Networks Motivation
  • Although the full joint distribution can answer
    any question about the domain, it can become
    intractably large as the number of variables
    grows.
  • Specifying probabilities for atomic events is
    rather unnatural and may be very difficult.
  • Use a graphical representation for which we can
    more easily investigate the complexity of
    inference and can search for efficient inference
    algorithms.

18
Bayesian Networks
  • Capture independence and conditional independence
    where they exist, thus reducing the number of
    probabilities that need to be specified.
  • They represent dependencies among variables and
    encode a concise specification of the full joint
    distribution.

19
A Bayesian Network is a ...
  • Directed Acyclic Graph (DAG) in which
  • the nodes denote random variables
  • each node X has a conditional probability
    distribution P(X | Parents(X)).
  • The intuitive meaning of an arc from X to Y is
    that X directly influences Y.

20
Additional Terminology
  • If X and its parents are discrete, we can
    represent the distribution P(X | Parents(X)) by a
    conditional probability table (CPT) specifying
    the probability of each value of X given each
    possible combination of settings for the
    variables in Parents(X).
  • A conditioning case is a row in this CPT (a
    setting of values for the parent nodes). Each row
    must sum to 1.

21
Bayesian Network Semantics
  • A Bayesian Network completely specifies a full
    joint distribution over its random variables, as
    below -- this is its meaning.
  • P(x1, ..., xn) = Π_{i=1..n} P(xi | parents(Xi))
  • In the above, P(x1, ..., xn) is shorthand notation
    for P(X1 = x1, ..., Xn = xn).

22
Inference Example
  • What is the probability that the alarm sounds, but
    neither a burglary nor an earthquake has occurred,
    and both John and Mary call?
  • Using j for John Calls, a for Alarm, etc.
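
The network and its CPT values are not in this transcript. Assuming the standard textbook burglary-alarm network (Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls) and its usual CPT numbers, the query works out to P(j, m, a, ¬b, ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e). A sketch with illustrative names:

```python
from itertools import product

# CPT values below are assumed from the standard textbook alarm network;
# they do not appear in this transcript.
P_B = 0.001  # P(Burglary = true)
P_E = 0.002  # P(Earthquake = true)
P_A = {      # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)


def bern(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Product of each node's CPT entry given its parents."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.6f}")  # 0.000628
```
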

23
Chain Rule
  • Generalization of the product rule, easily proven
    by repeated application of the product rule.
  • Chain Rule:
    P(x1, ..., xn) = P(xn | xn-1, ..., x1)
    P(xn-1 | xn-2, ..., x1) ... P(x2 | x1) P(x1)

24
Chain Rule and BN Semantics
25
Example of the Key Property
  • The following conditional independence holds:
  • P(MaryCalls | JohnCalls, Alarm, Earthquake,
    Burglary) = P(MaryCalls | Alarm)
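
This property can be confirmed by brute force: build the full joint from the network, compute both conditionals by enumeration, and compare. The sketch assumes the standard textbook alarm network and CPT values, which are not in this transcript; all names are illustrative.

```python
from itertools import product

BOOL = (True, False)

# CPT values assumed from the standard textbook alarm network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def bern(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Full joint entry as the product of the CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def m_given_all(j, a, e, b):
    """P(MaryCalls = true | j, a, e, b), computed from the joint."""
    numerator = joint(b, e, a, j, True)
    return numerator / (numerator + joint(b, e, a, j, False))


def m_given_alarm(a):
    """P(MaryCalls = true | a), summing out b, e and j."""
    numerator = sum(joint(b, e, a, j, True) for b, e, j in product(BOOL, repeat=3))
    denominator = sum(joint(b, e, a, j, m) for b, e, j, m in product(BOOL, repeat=4))
    return numerator / denominator


# Largest gap between the two conditionals over every setting of j, a, e, b.
gap = max(abs(m_given_all(j, a, e, b) - m_given_alarm(a))
          for j, a, e, b in product(BOOL, repeat=4))
print(gap < 1e-12)  # True
```
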

26
Procedure for BN Construction
  • Choose relevant random variables.
  • While there are variables left

27
Principles to Guide Choices
  • Goal: build a locally structured (sparse) network
    -- each component interacts with a bounded number
    of other components.
  • Add root causes first, then the variables that
    they influence.