Title: Bayesian Statistics and Belief Networks
1 Bayesian Statistics and Belief Networks
2 Overview
- Book Ch 8.3
- Refresher on Bayesian statistics
- Bayesian classifiers
- Belief Networks / Bayesian Networks
3 Why Should We Care?
- Theoretical framework for machine learning, classification, knowledge representation, and analysis
- Bayesian methods are capable of handling noisy, incomplete data sets
- Bayesian methods are commonly in use today
4 Bayesian Approach To Probability and Statistics
- Classical Probability: a physical property of the world (e.g., the 50% chance of heads on a flip of a fair coin). "True" probability.
- Bayesian Probability: a person's degree of belief in event X. Personal probability.
- Unlike classical probability, Bayesian probabilities benefit from, but do not require, repeated trials - they focus only on the next event, e.g. the probability that the Seawolves win their next game.
5 Bayes Rule
Product rule:
P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
Equating the two right-hand sides and dividing by P(B):
P(A|B) = P(B|A) P(A) / P(B)
All classification methods can be seen as estimates of Bayes Rule, with different techniques to estimate P(evidence|Class).
6 Simple Bayes Rule Example
The probability your computer has a virus, V, is 1/1000. If virused, the probability of a crash that day, C, is 4/5. The probability your computer crashes in one day, C, is 1/10.
P(C|V) = 0.8
P(V) = 1/1000
P(C) = 1/10
P(V|C) = P(C|V) P(V) / P(C) = (0.8)(0.001) / (0.1) = 0.008
Even though a crash is a strong indicator of a virus, we expect only 8/1000 crashes to be caused by viruses.
Why not compute P(V|C) from direct evidence? Causal knowledge such as P(C|V) is more robust than diagnostic knowledge such as P(V|C): consider what happens if P(C) suddenly drops - P(C|V) is unchanged, while P(V|C) must be re-estimated.
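The arithmetic above can be checked in a few lines. A minimal sketch (the helper name `bayes` is ours, not from the slides):

```python
# Bayes rule applied to the virus example:
#   P(V|C) = P(C|V) * P(V) / P(C)

def bayes(p_e_given_h, p_h, p_e):
    """Posterior P(H|E) from the likelihood, prior, and evidence probability."""
    return p_e_given_h * p_h / p_e

p_virus_given_crash = bayes(p_e_given_h=0.8, p_h=0.001, p_e=0.1)
print(p_virus_given_crash)  # ≈ 0.008: only 8 in 1000 crashes are virus-caused
```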
7 Bayesian Classifiers
By Bayes rule, P(Class|e) = P(e|Class) P(Class) / P(e). If we are selecting the single most likely class, the denominator P(e) is the same for every class, so we only need to find the class that maximizes P(e|Class) P(Class).
The hard part is estimating P(e|Class).
The evidence e typically consists of a set of observations e1, ..., en.
The usual simplifying assumption is conditional independence: P(e|Class) = P(e1|Class) ... P(en|Class).
8 Bayesian Classifier Example

  Probability     C=Virus   C=Bad Disk
  P(C)            0.4       0.6
  P(crashes|C)    0.1       0.2
  P(diskfull|C)   0.6       0.1

Given a case where the disk is full and the computer crashes, the classifier chooses Virus as most likely, since (0.4)(0.1)(0.6) = 0.024 > (0.6)(0.2)(0.1) = 0.012.
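The table above plugs straight into a naive Bayes score. A sketch (the dictionaries and the `score` helper are ours, not from the slides):

```python
# Naive Bayes scoring for the virus vs. bad-disk example.
# Conditional independence of the observations is assumed.

priors = {"Virus": 0.4, "BadDisk": 0.6}
likelihoods = {                # P(observation | Class)
    "Virus":   {"crashes": 0.1, "diskfull": 0.6},
    "BadDisk": {"crashes": 0.2, "diskfull": 0.1},
}

def score(cls, observations):
    """Unnormalized posterior: P(Class) times the product of P(obs|Class)."""
    s = priors[cls]
    for obs in observations:
        s *= likelihoods[cls][obs]
    return s

obs = ["crashes", "diskfull"]
best = max(priors, key=lambda c: score(c, obs))
print(best)  # Virus, since 0.024 > 0.012
```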
9 Beyond Conditional Independence
[Figure: a linear classifier separating classes C1 and C2]
- Include second-order dependencies, i.e. pairwise combinations of variables, via joint probabilities, with a correction factor for the overlap with the first-order terms.
- Difficult to compute - with n evidence variables there are O(n^2) pairwise joint probabilities to consider.
10 Belief Networks
- A DAG that represents the dependencies between variables and specifies the joint probability distribution
- Random variables make up the nodes
- Directed links represent causal direct influences
- Each node has a conditional probability table quantifying the effects from its parents
- No directed cycles
11 Burglary Alarm Example

  Burglary            Earthquake
  P(B) = 0.001        P(E) = 0.002

  Alarm
  B  E  P(A|B,E)
  T  T  0.95
  T  F  0.94
  F  T  0.29
  F  F  0.001

  John Calls          Mary Calls
  A  P(J|A)           A  P(M|A)
  T  0.90             T  0.70
  F  0.05             F  0.01
12 Sample Bayesian Network
[Figure: an example Bayesian network]
13 Using The Belief Network
(Network and CPTs as in the Burglary Alarm Example above.)
The probability of the alarm sounding, with no burglary or earthquake, and both John and Mary calling, follows from the chain rule over the network:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
  = (0.90)(0.70)(0.001)(0.999)(0.998) ≈ 0.00062
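The chain-rule product can be written directly from the CPTs. A sketch (the table encoding and the `joint` helper are ours, not from the slides):

```python
# CPTs of the burglary network, and the chain-rule product for one
# full assignment: each node's probability given its parents.

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of each node given its parents."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[b, e] if a else 1 - P_A[b, e]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Alarm with no burglary or earthquake, both John and Mary call:
print(joint(b=False, e=False, a=True, j=True, m=True))  # ≈ 0.00062
```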
14 Belief Computations
- Two types; both are NP-hard
- Belief Revision
  - Models explanatory/diagnostic tasks
  - Given evidence, what is the most likely hypothesis to explain the evidence?
  - Also called abductive reasoning
- Belief Updating
  - Queries
  - Given evidence, what is the probability of some other random variable occurring?
15 Belief Revision
- Given some evidence variables, find the state of all other variables that maximizes the joint probability.
- E.g. we know John calls, but Mary does not. What is the most likely state of the network? Consider only assignments where J=T and M=F, and pick the one with the highest probability.
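Belief revision over this small network can be done by brute force: fix the evidence and maximize over the remaining variables. A sketch under the CPTs of the Burglary Alarm Example (the encoding is ours):

```python
from itertools import product

# CPTs from the Burglary Alarm Example.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """Chain-rule joint probability of a full assignment."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[b, e] if a else 1 - P_A[b, e]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Belief revision: fix evidence J=T, M=F; maximize over B, E, A.
best = max(product([True, False], repeat=3),
           key=lambda s: joint(*s, j=True, m=False))
print(best)  # (False, False, False): no burglary, no earthquake, no alarm
```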
16 Belief Updating
- Causal Inferences - evidence at a cause, query at an effect
- Diagnostic Inferences - evidence at an effect, query at a cause
- Intercausal Inferences - between causes of a common effect
- Mixed Inferences - combinations of the above
[Figure: network diagrams marking the evidence (E) and query (Q) nodes for each inference type]
17 Causal Inferences
(Network and CPTs as in the Burglary Alarm Example above.)
Inference from cause to effect. E.g. given a burglary, what is P(J|B)?
P(A|B) = P(A|B,E) P(E) + P(A|B,¬E) P(¬E) = (0.95)(0.002) + (0.94)(0.998) ≈ 0.94
P(J|B) = P(J|A) P(A|B) + P(J|¬A) P(¬A|B) = (0.90)(0.94) + (0.05)(0.06) ≈ 0.85
P(M|B) = (0.70)(0.94) + (0.01)(0.06) ≈ 0.66 via a similar calculation
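The causal calculation above - summing out E and then A - is a few lines of arithmetic (the variable names are ours):

```python
# Causal inference: P(J|B) and P(M|B) with B=T fixed,
# summing out Earthquake and then Alarm.
P_E = 0.002
P_A_given_B = {True: 0.95, False: 0.94}   # P(A=T | B=T, E=e)
P_J = {True: 0.90, False: 0.05}           # P(J=T | A)
P_M = {True: 0.70, False: 0.01}           # P(M=T | A)

p_a_given_b = P_A_given_B[True] * P_E + P_A_given_B[False] * (1 - P_E)
p_j_given_b = P_J[True] * p_a_given_b + P_J[False] * (1 - p_a_given_b)
p_m_given_b = P_M[True] * p_a_given_b + P_M[False] * (1 - p_a_given_b)
print(round(p_j_given_b, 2), round(p_m_given_b, 2))  # ≈ 0.85 and ≈ 0.66
```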
18Diagnostic Inferences
From effect to cause. E.g. Given that John calls,
what is the P(burglary)?
What is P(J)? Need P(A) first
Many false positives.
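For a network this small, the diagnostic query can also be answered by enumerating all assignments consistent with the evidence (a brute-force sketch; the encoding is ours):

```python
from itertools import product

# Diagnostic inference P(B=T | J=T) by enumeration over E, A, M.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """Chain-rule joint probability of a full assignment."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[b, e] if a else 1 - P_A[b, e]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

num = sum(joint(True, e, a, True, m)        # B=T, J=T
          for e, a, m in product([True, False], repeat=3))
den = sum(joint(b, e, a, True, m)           # J=T
          for b, e, a, m in product([True, False], repeat=4))
print(round(num / den, 3))  # ≈ 0.016 - most of John's calls are false alarms
```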
19Intercausal Inferences
Explaining Away Inferences.
Given an alarm, P(BA)0.37. But if we add the
evidence that earthquake is true, then
P(BAE)0.003.
Even though B and E are independent, the presence
of one may make the other more/less likely.
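The two posteriors in the explaining-away example can be checked directly from the CPTs (the variable names are ours):

```python
# Explaining away: compare P(B|A) with P(B|A,E).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)

# P(B|A) = P(A ∧ B) / P(A), summing the earthquake out of both terms.
p_a_and_b = sum(P_A[True, e] * P_B * (P_E if e else 1 - P_E)
                for e in (True, False))
p_a = sum(P_A[b, e] * (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
          for b in (True, False) for e in (True, False))
p_b_given_a = p_a_and_b / p_a

# P(B|A,E): condition on the earthquake as well.
p_b_given_ae = (P_A[True, True] * P_B) / (
    P_A[True, True] * P_B + P_A[False, True] * (1 - P_B))

print(round(p_b_given_a, 2), round(p_b_given_ae, 4))  # ≈ 0.37 and ≈ 0.0033
```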
20Mixed Inferences
Simultaneous intercausal and diagnostic inference.
E.g., if John calls and Earthquake is false
Computing these values exactly is somewhat
complicated.
21 Exact Computation - Polytree Algorithm
- Judea Pearl, 1982
- Only works on singly-connected networks: at most one undirected path between any two nodes.
- Backward-chaining, message-passing algorithm for computing posterior probabilities for a query node X
- Computes causal support for X from evidence variables above X
- Computes evidential support for X from evidence variables below X
22 Polytree Computation
[Figure: query node X with parents U(1) ... U(m), children Y(1) ... Y(n), and the children's other parents Z(1,j) ... Z(n,j)]
The algorithm is recursive: messages are passed along the chains of parents and children.
23 Other Query Methods
- Exact Algorithms
  - Clustering - cluster nodes until the network forms a single cluster chain, then message-pass along that chain
  - Symbolic Probabilistic Inference - uses d-separation to find expressions to combine
- Approximate Algorithms - still tractable for dense networks
  - Forward Simulation - select a sampling distribution, conduct trials sampling from the roots to the evidence nodes, accumulating a weight for each node
  - Stochastic Simulation
24 Summary
- Bayesian methods provide a sound theory and framework for the implementation of classifiers
- Bayesian networks are a natural way to represent conditional independence information. Qualitative information lives in the links, quantitative information in the tables.
- Computing exact values is NP-hard, so it is typical to make simplifying assumptions or use approximate methods.
- Many Bayesian tools and systems exist
25 References
- Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.
- Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.
- Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06.
- Internet Resources on Bayesian Networks and Machine Learning: http://www.cs.orst.edu/wangxi/resource.html