Title: Bayesian Statistics and Belief Networks
1 Bayesian Statistics and Belief Networks
2 Overview
- Book Ch 8.3
- Refresher on Bayesian statistics
- Bayesian classifiers
- Belief Networks / Bayesian Networks
3 Why Should We Care?
- Theoretical framework for machine learning, classification, knowledge representation, and analysis
- Bayesian methods are capable of handling noisy, incomplete data sets
- Bayesian methods are commonly in use today
4 Bayesian Approach To Probability and Statistics
- Classical Probability: a physical property of the world (e.g., the 50% chance of heads on a flip of a fair coin). "True" probability.
- Bayesian Probability: a person's degree of belief in event X. Personal probability.
- Unlike classical probability, Bayesian probabilities benefit from, but do not require, repeated trials - they focus only on the next event, e.g. the probability that the Seawolves win their next game.
5 Bayes Rule
Product rule:
P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
Equating the two right-hand sides and dividing by P(B):
P(A|B) = P(B|A) P(A) / P(B)
All classification methods can be seen as estimates of Bayes Rule, with different techniques to estimate P(evidence|Class).
6 Simple Bayes Rule Example
The probability your computer has a virus, V, is 1/1000. If virused, the probability of a crash that day, C, is 4/5. The probability your computer crashes in one day, C, is 1/10.
P(C|V) = 0.8
P(V) = 1/1000
P(C) = 1/10
P(V|C) = P(C|V) P(V) / P(C) = (0.8)(0.001) / (0.1) = 0.008
Even though a crash is a strong indicator of a virus, we expect only 8/1000 crashes to be caused by viruses.
Why not compute P(V|C) from direct evidence? Causal knowledge such as P(C|V) is more robust than diagnostic knowledge such as P(V|C): consider what happens if P(C) suddenly drops - P(C|V) is unchanged, while P(V|C) must be re-estimated.
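The arithmetic above can be checked in a few lines. A minimal sketch (the helper name `bayes` is ours, not from the slides):

```python
# Bayes rule applied to the virus example:
#   P(V|C) = P(C|V) * P(V) / P(C)

def bayes(p_e_given_h, p_h, p_e):
    """Posterior P(H|E) from the likelihood, prior, and evidence probability."""
    return p_e_given_h * p_h / p_e

p_virus_given_crash = bayes(p_e_given_h=0.8, p_h=0.001, p_e=0.1)
print(p_virus_given_crash)  # ≈ 0.008: only 8 in 1000 crashes are virus-caused
```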
7 Bayesian Classifiers
By Bayes rule, P(Class|e) = P(e|Class) P(Class) / P(e). If we are selecting the single most likely class, the denominator P(e) is the same for every class, so we only need to find the class that maximizes P(e|Class) P(Class).
The hard part is estimating P(e|Class).
The evidence e typically consists of a set of observations e1, ..., en.
The usual simplifying assumption is conditional independence: P(e|Class) = P(e1|Class) ... P(en|Class).
8 Bayesian Classifier Example

  Probability     C=Virus   C=Bad Disk
  P(C)            0.4       0.6
  P(crashes|C)    0.1       0.2
  P(diskfull|C)   0.6       0.1

Given a case where the disk is full and the computer crashes, the classifier chooses Virus as most likely, since (0.4)(0.1)(0.6) = 0.024 > (0.6)(0.2)(0.1) = 0.012.
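The table above plugs straight into a naive Bayes score. A sketch (the dictionaries and the `score` helper are ours, not from the slides):

```python
# Naive Bayes scoring for the virus vs. bad-disk example.
# Conditional independence of the observations is assumed.

priors = {"Virus": 0.4, "BadDisk": 0.6}
likelihoods = {                # P(observation | Class)
    "Virus":   {"crashes": 0.1, "diskfull": 0.6},
    "BadDisk": {"crashes": 0.2, "diskfull": 0.1},
}

def score(cls, observations):
    """Unnormalized posterior: P(Class) times the product of P(obs|Class)."""
    s = priors[cls]
    for obs in observations:
        s *= likelihoods[cls][obs]
    return s

obs = ["crashes", "diskfull"]
best = max(priors, key=lambda c: score(c, obs))
print(best)  # Virus, since 0.024 > 0.012
```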
9 Beyond Conditional Independence
[Figure: a linear classifier separating classes C1 and C2]
- Include second-order dependencies, i.e. pairwise combinations of variables, via joint probabilities, with a correction factor for the overlap with the first-order terms.
- Difficult to compute - with n evidence variables there are O(n^2) pairwise joint probabilities to consider.
10 Belief Networks
- A DAG that represents the dependencies between variables and specifies the joint probability distribution
- Random variables make up the nodes
- Directed links represent causal direct influences
- Each node has a conditional probability table quantifying the effects from its parents
- No directed cycles
11 Burglary Alarm Example

  Burglary            Earthquake
  P(B) = 0.001        P(E) = 0.002

  Alarm
  B  E  P(A|B,E)
  T  T  0.95
  T  F  0.94
  F  T  0.29
  F  F  0.001

  John Calls          Mary Calls
  A  P(J|A)           A  P(M|A)
  T  0.90             T  0.70
  F  0.05             F  0.01
12 Sample Bayesian Network
[Figure: an example Bayesian network]
13 Using The Belief Network
(Network and CPTs as in the Burglary Alarm Example above.)
The probability of the alarm sounding, with no burglary or earthquake, and both John and Mary calling, follows from the chain rule over the network:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
  = (0.90)(0.70)(0.001)(0.999)(0.998) ≈ 0.00062
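The chain-rule product can be written directly from the CPTs. A sketch (the table encoding and the `joint` helper are ours, not from the slides):

```python
# CPTs of the burglary network, and the chain-rule product for one
# full assignment: each node's probability given its parents.

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of each node given its parents."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[b, e] if a else 1 - P_A[b, e]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Alarm with no burglary or earthquake, both John and Mary call:
print(joint(b=False, e=False, a=True, j=True, m=True))  # ≈ 0.00062
```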
14 Belief Computations
- Two types; both are NP-hard
- Belief Revision
  - Models explanatory/diagnostic tasks
  - Given evidence, what is the most likely hypothesis to explain the evidence?
  - Also called abductive reasoning
- Belief Updating
  - Queries
  - Given evidence, what is the probability of some other random variable occurring?
15 Belief Revision
- Given some evidence variables, find the state of all other variables that maximizes the joint probability.
- E.g. we know John calls, but Mary does not. What is the most likely state of the network? Consider only assignments where J=T and M=F, and pick the one with the highest probability.
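Belief revision over this small network can be done by brute force: fix the evidence and maximize over the remaining variables. A sketch under the CPTs of the Burglary Alarm Example (the encoding is ours):

```python
from itertools import product

# CPTs from the Burglary Alarm Example.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """Chain-rule joint probability of a full assignment."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[b, e] if a else 1 - P_A[b, e]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Belief revision: fix evidence J=T, M=F; maximize over B, E, A.
best = max(product([True, False], repeat=3),
           key=lambda s: joint(*s, j=True, m=False))
print(best)  # (False, False, False): no burglary, no earthquake, no alarm
```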
16 Belief Updating
- Causal Inferences - evidence at a cause, query at an effect
- Diagnostic Inferences - evidence at an effect, query at a cause
- Intercausal Inferences - between causes of a common effect
- Mixed Inferences - combinations of the above
[Figure: network diagrams marking the evidence (E) and query (Q) nodes for each inference type]
17 Causal Inferences
(Network and CPTs as in the Burglary Alarm Example above.)
Inference from cause to effect. E.g. given a burglary, what is P(J|B)?
P(A|B) = P(A|B,E) P(E) + P(A|B,¬E) P(¬E) = (0.95)(0.002) + (0.94)(0.998) ≈ 0.94
P(J|B) = P(J|A) P(A|B) + P(J|¬A) P(¬A|B) = (0.90)(0.94) + (0.05)(0.06) ≈ 0.85
P(M|B) = (0.70)(0.94) + (0.01)(0.06) ≈ 0.66 via a similar calculation
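The causal calculation above - summing out E and then A - is a few lines of arithmetic (the variable names are ours):

```python
# Causal inference: P(J|B) and P(M|B) with B=T fixed,
# summing out Earthquake and then Alarm.
P_E = 0.002
P_A_given_B = {True: 0.95, False: 0.94}   # P(A=T | B=T, E=e)
P_J = {True: 0.90, False: 0.05}           # P(J=T | A)
P_M = {True: 0.70, False: 0.01}           # P(M=T | A)

p_a_given_b = P_A_given_B[True] * P_E + P_A_given_B[False] * (1 - P_E)
p_j_given_b = P_J[True] * p_a_given_b + P_J[False] * (1 - p_a_given_b)
p_m_given_b = P_M[True] * p_a_given_b + P_M[False] * (1 - p_a_given_b)
print(round(p_j_given_b, 2), round(p_m_given_b, 2))  # ≈ 0.85 and ≈ 0.66
```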
18Diagnostic Inferences
From effect to cause. E.g. Given that John calls,
what is the P(burglary)?
What is P(J)? Need P(A) first
Many false positives.
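For a network this small, the diagnostic query can also be answered by enumerating all assignments consistent with the evidence (a brute-force sketch; the encoding is ours):

```python
from itertools import product

# Diagnostic inference P(B=T | J=T) by enumeration over E, A, M.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """Chain-rule joint probability of a full assignment."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[b, e] if a else 1 - P_A[b, e]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

num = sum(joint(True, e, a, True, m)        # B=T, J=T
          for e, a, m in product([True, False], repeat=3))
den = sum(joint(b, e, a, True, m)           # J=T
          for b, e, a, m in product([True, False], repeat=4))
print(round(num / den, 3))  # ≈ 0.016 - most of John's calls are false alarms
```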
19Intercausal Inferences
Explaining Away Inferences.
Given an alarm, P(BA)0.37. But if we add the
evidence that earthquake is true, then
P(BAE)0.003.
Even though B and E are independent, the presence
of one may make the other more/less likely.
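The two posteriors in the explaining-away example can be checked directly from the CPTs (the variable names are ours):

```python
# Explaining away: compare P(B|A) with P(B|A,E).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)

# P(B|A) = P(A ∧ B) / P(A), summing the earthquake out of both terms.
p_a_and_b = sum(P_A[True, e] * P_B * (P_E if e else 1 - P_E)
                for e in (True, False))
p_a = sum(P_A[b, e] * (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
          for b in (True, False) for e in (True, False))
p_b_given_a = p_a_and_b / p_a

# P(B|A,E): condition on the earthquake as well.
p_b_given_ae = (P_A[True, True] * P_B) / (
    P_A[True, True] * P_B + P_A[False, True] * (1 - P_B))

print(round(p_b_given_a, 2), round(p_b_given_ae, 4))  # ≈ 0.37 and ≈ 0.0033
```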
20Mixed Inferences
Simultaneous intercausal and diagnostic inference.
E.g., if John calls and Earthquake is false
Computing these values exactly is somewhat
complicated.
21 Exact Computation - Polytree Algorithm
- Judea Pearl, 1982
- Only works on singly-connected networks: at most one undirected path between any two nodes.
- Backward-chaining, message-passing algorithm for computing posterior probabilities for a query node X
- Computes causal support for X from evidence variables above X
- Computes evidential support for X from evidence variables below X
22 Polytree Computation
[Figure: query node X with parents U(1) ... U(m), children Y(1) ... Y(n), and the children's other parents Z(1,j) ... Z(n,j)]
The algorithm is recursive: messages are passed along the chains of parents and children.
23 Other Query Methods
- Exact Algorithms
  - Clustering - cluster nodes until the network forms a single cluster chain, then message-pass along that chain
  - Symbolic Probabilistic Inference - uses d-separation to find expressions to combine
- Approximate Algorithms - still tractable for dense networks
  - Forward Simulation - select a sampling distribution, conduct trials sampling from the roots to the evidence nodes, accumulating a weight for each node
  - Stochastic Simulation
24 Summary
- Bayesian methods provide a sound theory and framework for the implementation of classifiers
- Bayesian networks are a natural way to represent conditional independence information. Qualitative information lives in the links, quantitative information in the tables.
- Computing exact values is NP-hard, so it is typical to make simplifying assumptions or use approximate methods.
- Many Bayesian tools and systems exist
25 References
- Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.
- Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.
- Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06.
- Internet Resources on Bayesian Networks and Machine Learning: http://www.cs.orst.edu/wangxi/resource.html