Representations for KBS: Uncertainty - PowerPoint PPT Presentation

About This Presentation

Title:

Representations for KBS: Uncertainty

Slides: 33

Provided by: howard127

Learn more at: https://courses.csail.mit.edu

Transcript and Presenter's Notes

1
Representations for KBS: Uncertainty & Decision Support

6.871 - Knowledge Based Systems
Tuesday March 20, 2006
Howard Shrobe
Randall Davis

2
Outline

The Other Problem with Mycin
Brief review of history of uncertainty in AI
Bayes Theorem
Some tractable Bayesian situations
Bayes Nets
Decision Theory and Rational Choice

3
The Other Problem with Mycin

In an earlier class we argued that Mycin used an
extremely impoverished language for stating facts
and rules (A-O-V triples)
Here we argue that its notion of uncertainty was
broken
In Mycin the certainty factor for OR is Max:
CF(OR A B) = (Max (CF A) (CF B))
Consider
Rule-1 IF A then C, certainty factor 1
Rule-2 If B then C, certainty factor 1
This is logically the same as
If (Or A B) then C, certainty factor 1

4
More Problems

If CF(A) = .8 and CF(B) = .3
Then, using the two separate rules, CF(C) = .8 + .3 × (1 - .8) = .8 + .06 = .86
But with the single OR rule, CF(OR A B) = (Max .8 .3) = .8 and CF(C) = .8
If A -> B, A -> C, B -> D, C -> D there will also be
a mistake (why?)
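The discrepancy can be checked numerically. This is a minimal sketch, assuming the combination rule for two positive certainty factors is x + y(1 - x), which is the form the .86 arithmetic above uses; the function names are illustrative and not taken from Mycin.

```python
def cf_parallel(x, y):
    """Combine two positive certainty factors that support the same conclusion."""
    return x + y * (1 - x)


def cf_or(a, b):
    """Certainty factor of (OR A B): the maximum of the two."""
    return max(a, b)


cf_a, cf_b = 0.8, 0.3
rule_cf = 1.0  # both rules, and the merged rule, have certainty factor 1

# Rule-1 and Rule-2 fire separately and their conclusions are combined.
two_rules = cf_parallel(rule_cf * cf_a, rule_cf * cf_b)

# The logically equivalent single rule: IF (OR A B) THEN C.
one_rule = rule_cf * cf_or(cf_a, cf_b)

print(round(two_rules, 2), round(one_rule, 2))
```

The two formulations are logically the same rule base, yet they assign different certainty to C.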

5
History of Uncertainty Representations

Probability tried and Rejected
too many numbers
Focus on Logical, qualitative
reasoning by cases
non-monotonic reasoning
Numerical Approaches retried
Certainty factors
Dempster-Shafer
Fuzzy
Bayes Networks

6
Understanding Bayes Theorem
[Figure: probability tree for a population of 1000]
Has Cancer?
Yes: 10
Test? Positive .95
Has it and Tests for it: 10 × .95 = 9.5
Has it and Doesn't Test for it
No: 990
Test? Positive .05
Doesn't Have it But Tests for it: 990 × .05 = 49.5
Doesn't Have it and Doesn't Test for it
Number that test positive = 9.5 + 49.5 = 59. If
you test positive, your probability of
having Cancer is 9.5 / 59 = 16.1%
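The same count-based argument can be written out as a short calculation. This is a sketch using the numbers on the slide; the variable names are illustrative.

```python
population = 1000
sick = 10                  # people who have the disease
healthy = population - sick
p_pos_given_sick = 0.95    # P(test positive | has it)
p_pos_given_healthy = 0.05 # P(test positive | does not have it)

true_positives = sick * p_pos_given_sick          # has it and tests for it
false_positives = healthy * p_pos_given_healthy   # doesn't have it but tests for it
all_positives = true_positives + false_positives

# Probability of having the disease, given a positive test.
posterior = true_positives / all_positives
print(f"{posterior:.3f}")
```

Even with a 95% accurate test, most positives come from the much larger healthy group.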
7
Reviewing Bayes Theorem

Symptom S

Conditional Probability of S given D
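For reference, the standard definitions behind this slide can be written as follows. D is a disease and S a symptom, both viewed as subsets of the universe U; the notation is assumed here rather than taken from the slide.

```latex
P(S \mid D) = \frac{P(S \cap D)}{P(D)}, \qquad
P(D \mid S) = \frac{P(S \cap D)}{P(S)} = \frac{P(S \mid D)\,P(D)}{P(S)}
```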
8
Independence & Conditional Independence

Independence
P(A) P(B) = P(A,B)
A varies the same within B as it does in the
universe
Conditional Independence within C
P(A|C) P(B|C) = P(A,B|C)
When we restrict attention to C, A and B are
independent
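The difference between the two notions can be seen on a small joint distribution. The sketch below uses made-up numbers (hypothetical, not from the lecture) in which A and B are independent once attention is restricted to C, but are not independent in the universe as a whole.

```python
from itertools import product

# Hypothetical parameters: C is a common cause of A and B.
p_c = 0.5
p_a_given = {True: 0.9, False: 0.1}  # P(A | C), P(A | not C)
p_b_given = {True: 0.9, False: 0.1}  # P(B | C), P(B | not C)


def bern(p, value):
    """Probability that a binary event with P(true) = p takes the given value."""
    return p if value else 1 - p


# Full joint over (A, B, C), built so that A and B are independent given C.
joint = {
    (a, b, c): bern(p_c, c) * bern(p_a_given[c], a) * bern(p_b_given[c], b)
    for a, b, c in product([True, False], repeat=3)
}


def prob(pred):
    """Sum the joint over all worlds where pred(a, b, c) holds."""
    return sum(p for world, p in joint.items() if pred(*world))


p_a = prob(lambda a, b, c: a)
p_b = prob(lambda a, b, c: b)
p_ab = prob(lambda a, b, c: a and b)

p_only_c = prob(lambda a, b, c: c)
p_a_c = prob(lambda a, b, c: a and c) / p_only_c
p_b_c = prob(lambda a, b, c: b and c) / p_only_c
p_ab_c = prob(lambda a, b, c: a and b and c) / p_only_c

print("independent in the universe:", abs(p_a * p_b - p_ab) < 1e-9)
print("independent within C:", abs(p_a_c * p_b_c - p_ab_c) < 1e-9)
```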

9
Examples
[Figure: four diagrams of events A and B, captioned as follows]
A and B are Independent
A and B are Dependent
A and B are conditionally Dependent, given C
A and B are Conditionally Independent, given C.
10
IDIOT BAYES Model
[Figure: disease node D linked to symptom nodes S1 ... SK]

Single Disease
Conditionally Independent Symptoms
P(S1,S2|D) = P(S1|D) P(S2|D)
N Symptoms means N probabilities
Without conditional independence need joint
probabilities: 2^N
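A minimal sketch of the idiot Bayes computation follows. The prior and the per-symptom probabilities are hypothetical numbers chosen for illustration, and the function name is not from the lecture.

```python
def idiot_bayes(prior, likelihoods, observed):
    """Posterior over hypotheses, assuming symptoms are conditionally independent.

    prior:       {hypothesis: P(hypothesis)}
    likelihoods: {hypothesis: {symptom: P(symptom present | hypothesis)}}
    observed:    {symptom: True if present, False if absent}
    """
    scores = {}
    for h, p in prior.items():
        for symptom, present in observed.items():
            p_s = likelihoods[h][symptom]
            # Multiply in one factor per symptom: the "nice multiplicative form".
            p *= p_s if present else 1 - p_s
        scores[h] = p
    total = sum(scores.values())
    return {h: score / total for h, score in scores.items()}


# Hypothetical numbers: one disease, two symptoms.
prior = {"D": 0.1, "not D": 0.9}
likelihoods = {
    "D": {"S1": 0.8, "S2": 0.7},
    "not D": {"S1": 0.2, "S2": 0.1},
}
posterior = idiot_bayes(prior, likelihoods, {"S1": True, "S2": True})
print(posterior)
```

Only one number per symptom and hypothesis is stored, rather than a table over every combination of symptoms.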

11
Using Multiple Pieces of Evidence
[Figure: hypothesis node H linked to evidence nodes E1 and E2]

If you assume "conditional independence" between
the evidence then this takes on a nice
multiplicative form.
Conditional Independence is the notion that the
various pieces of evidence are statistically
independent of one another, given that the
hypothesis obtains, i.e. the hypothesis
"separates" the different pieces of evidence
P(E1,E2|H) = P(E1|H) P(E2|H)
P(E1,E2|~H) = P(E1|~H) P(E2|~H)
Without conditional independence you need to
build up a very large database of joint
probabilities and joint conditional probabilities.

12
Sequential Bayesian Inference

Consider symptoms one by one
Prior Probabilities P(Di)
Observe Symptom Sj
Update Priors using Bayes Rule
Repeat for Other Symptoms using the resulting
Posterior as the new Prior
If symptoms are conditionally independent, same
as doing it all at once
Allows choice of what symptom to observe (test to
perform) next in terms of cost/benefit.
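The claim that sequential updating matches doing it all at once can be checked directly. This sketch assumes conditionally independent symptoms; the three diseases and all probabilities are hypothetical.

```python
def update(prior, p_symptom_given):
    """One application of Bayes Rule: returns the posterior over diseases."""
    unnormalized = {d: prior[d] * p_symptom_given[d] for d in prior}
    total = sum(unnormalized.values())
    return {d: v / total for d, v in unnormalized.items()}


# Hypothetical priors P(Di) and symptom probabilities P(Sj | Di).
prior = {"D1": 0.5, "D2": 0.3, "D3": 0.2}
p_s1 = {"D1": 0.9, "D2": 0.4, "D3": 0.1}
p_s2 = {"D1": 0.2, "D2": 0.7, "D3": 0.5}

# Sequential: observe S1, then use the posterior as the new prior for S2.
sequential = update(update(prior, p_s1), p_s2)

# All at once: under conditional independence P(S1,S2|Di) = P(S1|Di) P(S2|Di).
p_both = {d: p_s1[d] * p_s2[d] for d in prior}
batch = update(prior, p_both)

print(sequential)
print(batch)
```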

13
Bipartite Graphs

Multiple Symptoms, multiple diseases
Diseases are probabilistically independent
Symptoms are conditionally independent
Symptom probabilities depend only on the diseases
causing them
Symptoms with multiple causes require joint
probabilities P(S2|D1,D2,D3)
Information explosion

14
Noisy OR

A useful element in the modeling vocabulary
Make the simplifying assumption that only 1
disease is present at a time
Probability that all diseases cause the symptom
is just the probability that at least 1 does
Therefore Symptom is absent only if no disease
caused it.

1 - P(S2|D1,D2,D3) = (1 - P(S2|D1))
                     (1 - P(S2|D2))
                     (1 - P(S2|D3))

Use Causal Probabilities for the basic data
Reduces probability table size: if n diseases and
k symptoms, from k 2^n to nk
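The noisy-OR formula above can be sketched as follows. The causal probabilities are hypothetical, and the helper name is illustrative.

```python
from math import prod


def noisy_or(causal_probs, present):
    """P(symptom | the listed diseases are present) under the noisy-OR model.

    causal_probs: {disease: P(symptom | that disease alone)}
    present:      iterable of the diseases that are present
    """
    # The symptom is absent only if no present disease caused it.
    p_absent = prod(1 - causal_probs[d] for d in present)
    return 1 - p_absent


# Hypothetical causal probabilities P(S2 | Di).
causal = {"D1": 0.6, "D2": 0.3, "D3": 0.1}
p_s2 = noisy_or(causal, ["D1", "D2", "D3"])
print(round(p_s2, 3))

# Table sizes for n diseases and k symptoms.
n, k = 3, 4
full_tables = k * 2 ** n   # one entry per symptom per combination of diseases
noisy_or_tables = n * k    # one causal probability per disease-symptom pair
print(full_tables, noisy_or_tables)
```

With no disease present the product is empty, so the symptom probability is 0, as the model requires.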

15
Polytrees

What if diseases cause or influence each other?
Are there still well behaved versions?
Yes, Polytrees: at most one path between any two
nodes
Don't have to worry about double-counting
Efficient Sequential updating is still possible

16
Bayes Nets

Directed Acyclic Graphs
Absence of link --> conditional independence
P(X1,...,Xn) = Product P(Xi | parents(Xi))
Specify joint probability tables over parents for
each node
Probability A,B,C,D,E all present:
P(A,B,C,D,E)
= P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)
Probability A,C,D present, B,E absent:
P(A,~B,C,D,~E)
= P(A) P(~B|A) P(C|A) P(D|~B,C) P(~E|C)

17
Example
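[Figure: network with nodes Burglary, Earthquake, Alarm, Radio Report, Phone Call]

P(Call|Alarm)
            Alarm=t   Alarm=f
Call=t        .9        .01
Call=f        .1        .99

P(RadioReport|Earthquake)
                 Earthquake=t   Earthquake=f
RadioReport=t         1              0
RadioReport=f         0              1

P(Alarm|B,E)
            B,E=t,t   t,f    f,t    f,f
Alarm=t       .8      .99    .6     .01
Alarm=f       .2      .01    .4     .99

16 vs. 32 probabilities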
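The factored joint for this example can be sketched in a few lines. The three conditional tables are the ones on the slide; the priors for Burglary and Earthquake are not given there, so the values below are hypothetical placeholders.

```python
from itertools import product

P_B = 0.01  # hypothetical prior P(Burglary), not from the slide
P_E = 0.02  # hypothetical prior P(Earthquake), not from the slide

# P(Alarm = t | Burglary, Earthquake), from the slide.
P_ALARM = {(True, True): 0.8, (True, False): 0.99,
           (False, True): 0.6, (False, False): 0.01}
P_CALL = {True: 0.9, False: 0.01}   # P(Call = t | Alarm)
P_RADIO = {True: 1.0, False: 0.0}   # P(RadioReport = t | Earthquake)


def bern(p, value):
    """Probability that a binary variable with P(true) = p takes the given value."""
    return p if value else 1 - p


def joint(b, e, a, r, c):
    """Product of each node's probability given its parents."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_ALARM[(b, e)], a)
            * bern(P_RADIO[e], r) * bern(P_CALL[a], c))


worlds = list(product([True, False], repeat=5))
total = sum(joint(*w) for w in worlds)

# Query by brute-force summation: P(Burglary | Phone Call).
p_call = sum(joint(*w) for w in worlds if w[4])
p_b_and_call = sum(joint(*w) for w in worlds if w[0] and w[4])
p_b_given_call = p_b_and_call / p_call
print(total, p_b_given_call)
```

Summing over all 32 worlds is only practical for tiny networks; it is used here to show that the factored tables define a complete joint distribution.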
18
Computing with Partial Information

Probability that A present and E absent

Graph separators (e.g. C) correspond to
factorizations
General problem of finding separators is NP-hard

19
Odds Likelihood Formulation

Define Odds as O(H) = P(H) / P(~H)
Define Likelihood as L(E|H) = P(E|H) / P(E|~H)

Divide complementary instances of Bayes Rule
Bayes Rule is Then O(H|E) = L(E|H) O(H)
In Logarithmic Form: Log Odds(H|E) = Log Odds(H) + Log
Likelihood
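As a check, the odds-likelihood form can be applied to the cancer test from the earlier slide (10 in 1000 have the disease, the test is positive with probability .95 if you have it and .05 if you do not). The sketch assumes the standard definitions, odds = P(H) / P(not H) and likelihood = P(E|H) / P(E|not H); the function names are illustrative.

```python
import math


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def probability(o):
    """Convert odds back to a probability."""
    return o / (1 + o)


prior = 10 / 1000
likelihood = 0.95 / 0.05  # P(positive | cancer) / P(positive | no cancer)

# Multiplicative form: posterior odds = likelihood * prior odds.
posterior_odds = likelihood * odds(prior)
posterior = probability(posterior_odds)

# Logarithmic form: the update becomes an addition.
log_posterior_odds = math.log(odds(prior)) + math.log(likelihood)

print(f"{posterior:.3f}")
```

The result agrees with the count-based answer of 9.5 / 59.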
20
Certainty Factors
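[Figure: A and B each lead to C, with strengths x and y; C leads to D, with strength z]

Rules
If A, then C (x)
If B, then C (y)
If C, then D (z)

Parallel Combination
CF(C) = x + y (1 - x), when x and y are both positive
Series Combination
CF(D) = z max(0, CF(C))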
21
Issues with Certainty Factors

Results obtained depend on order in which
evidence is considered in some cases
Reasoning is often fairly insensitive to them.
Variations of 20% yield no change in MYCIN
What do they mean? (in some cases the answer is)

Conditional Probability
Likelihood
Certainty Factor
22
Decision Making

So far, what we've considered is how to use
evidence to evaluate a situation.
In many cases, this is only the first part of the
problem
What we want to do is to take actions to improve
the situation
But which action should we take?
The one which is most likely to leave us in the
best condition
Decision Analysis helps us calculate which action
that is

23
A Decision Making Problem

There are two types of urn, U1 and U2 (80% are U1)
U1 contains 4 Red balls and 6 Black balls
U2 contains 9 Red balls and 1 Black ball
An urn is selected at random and you are to guess
which type it is.
You have several courses of action:
Refuse to play: no payoff, no cost
Guess it is of type 1: $40 payoff if right, $20
penalty if wrong
Guess it is of type 2: $100 payoff if right, $5
penalty if wrong
Sample a ball: $8 payment for the right to sample

24
Decision Flow Diagrams
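[Figure: decision flow diagram for the urn problem, built from decision forks and chance forks. The first decision is Refuse to Play (0.00), No Observation, or Make an Observation (-8.00). An observation yields R or B. Every path then ends in a guess, a1 or a2: a1 pays 40.00 if right and -20.00 if wrong; a2 pays 100.00 if right and -5.00 if wrong. Nodes are labeled by the path taken, e.g. (e1,B) and (e1,R,a1).]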
25
Expected Monetary Value

Suppose there are several possible outcomes
Each has a monetary payoff or penalty
Each has a probability
The Expected Monetary Value is the sum of the
products of the monetary payoffs times their
corresponding probabilities.

[Figure: chance fork with probability .8 of a 40 payoff and probability .2 of a -20 payoff]
EMV = .8 × 40 + .2 × (-20) = 32 + (-4) = 28

EMV is a normative notion of what a person who
has no other biases (risk aversion, e.g.) should
be willing to accept in exchange for the
situation facing him. In the picture above, you
should be indifferent to the choice of taking $28
or playing the game.
Most people have some extra biases and these can
be incorporated in the form of a utility function
applied to the calculated value.
A rational person should choose that course of
action which has the highest EMV.

26
Averaging Out and Folding Back

EMV of Decision Node is Max over all branches
EMV of Chance Node is Probability Weighted Sum
over all branches

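[Figure: the No Observation part of the diagram, folded back]
a1: .8 × 40 + .2 × (-20) = 32.00 + (-4.00) = 28.00
a2: .8 × (-5) + .2 × 100 = -4.00 + 20.00 = 16.00
Decision node: Max(28.00, 16.00) = 28.00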
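Averaging out and folding back is naturally recursive. The sketch below evaluates the part of the urn problem that needs no observation; the tuple-based tree encoding is an assumption made for this example, not a representation used in the lecture.

```python
def fold_back(node):
    """Return the EMV of a decision-tree node.

    A node is one of:
      ("payoff", value)
      ("chance", [(probability, child), ...])
      ("decision", [(label, child), ...])
    """
    kind = node[0]
    if kind == "payoff":
        return node[1]
    if kind == "chance":
        # Averaging out: probability-weighted sum over all branches.
        return sum(p * fold_back(child) for p, child in node[1])
    if kind == "decision":
        # Folding back: a rational agent takes the best branch.
        return max(fold_back(child) for _, child in node[1])
    raise ValueError(f"unknown node kind: {kind}")


# Guess U1 (a1) or U2 (a2) with no observation; 80% of urns are U1.
guess_u1 = ("chance", [(0.8, ("payoff", 40)), (0.2, ("payoff", -20))])
guess_u2 = ("chance", [(0.8, ("payoff", -5)), (0.2, ("payoff", 100))])
no_observation = ("decision", [("a1", guess_u1), ("a2", guess_u2)])
tree = ("decision", [("refuse to play", ("payoff", 0)),
                     ("no observation", no_observation)])

print(fold_back(guess_u1), fold_back(guess_u2), fold_back(tree))
```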
27
The Effect of Observation
Bayes theorem is used to calculate probabilities
at chance nodes following decision nodes that
provide relevant evidence.
P(R) = P(R|U1) P(U1) + P(R|U2) P(U2)
P(U1|R) = P(R|U1) P(U1) / P(R)
28
Calculating the Updated Probabilities
Initial Probabilities P(Outcome|State)
                 State
Outcome       U1      U2
Red           .4      .9
Black         .6      .1
Prior         .8      .2

Joint & Marginal Probabilities P(Outcome & State)
                 State                          Marginal Probability
Outcome       U1                U2                of Outcome
Red           .8 × .4 = .32     .2 × .9 = .18     .50
Black         .8 × .6 = .48     .2 × .1 = .02     .50

Updated Probabilities P(State|Outcome)
                 State
Outcome       U1      U2
Red           .64     .36
Black         .96     .04
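The three tables can be reproduced with a few lines of code. This is a sketch using the numbers of the urn problem; the dictionary layout and function name are illustrative.

```python
prior = {"U1": 0.8, "U2": 0.2}

# P(Outcome | State): U1 holds 4 red and 6 black, U2 holds 9 red and 1 black.
p_outcome_given_state = {
    "Red": {"U1": 0.4, "U2": 0.9},
    "Black": {"U1": 0.6, "U2": 0.1},
}


def update(outcome):
    """Return (marginal probability of the outcome, posterior over states)."""
    joint = {u: p_outcome_given_state[outcome][u] * prior[u] for u in prior}
    marginal = sum(joint.values())
    posterior = {u: j / marginal for u, j in joint.items()}
    return marginal, posterior


p_red, given_red = update("Red")
p_black, given_black = update("Black")
print(p_red, given_red)
print(p_black, given_black)
```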
29
Illustrating Evaluation
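[Figure: the Make an Observation part of the diagram, evaluated with the updated probabilities]
         U1     U2     P(Outcome)
R        .64    .36    .5
B        .96    .04    .5
After R, a1: .64 × 40 + .36 × (-20) = 25.60 + (-7.20) = 18.40
After R, a2: .64 × (-5) + .36 × 100 = -3.20 + 36.00 = 32.80
After B, a1: .96 × 40 + .04 × (-20) = 38.40 + (-.80) = 37.60
After B, a2: .96 × (-5) + .04 × 100 = -4.80 + 4.00 = -.80
Chance node: .5 × 32.80 + .5 × 37.60 = 16.40 + 18.80 = 35.20
After the 8.00 payment to sample: 35.20 - 8.00 = 27.20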
30
Final Value of Decision Flow Diagram
[Figure: the same decision flow diagram with folded-back values. Refuse to Play is worth 0.00, No Observation is worth 28.00, and Make an Observation is worth 27.20 after the 8.00 payment, so the value of the diagram is 28.00.]
31
Maximum Entropy
[Figure: hypotheses with probabilities .2, .1, .2, .5]
Several Competing Hypotheses, Each with a
Probability rating.

Suppose there are several tests you can make.
Each test can change the probability of some (or
all) of the hypotheses (using Bayes Theorem).
Each outcome of the test has a probability.
We're only interested in gathering information at
this point
Which test should you make?
Entropy = Sum -P(i) Log2 P(i) is a standard
measure of Information.
For each outcome of a test calculate the change
in entropy.
Weigh this by the probability of that outcome.
Sum these to get an expected change of entropy
for the test.

32
Maximum Entropy (2)

Choose that test which has the greatest expected
change in entropy.
This is equivalent to choosing the test which is
most likely to provide the most information.
Tests have different costs (sometimes quite
drastic ones like life and death).
Normalize the benefits by the costs and then make
choice.
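The test-selection procedure can be sketched as follows. The prior is the one shown on the previous slide; the two tests, their outcome probabilities, and their costs are hypothetical, and dividing the expected change by the cost is one simple reading of "normalize the benefits by the costs".

```python
import math


def entropy(dist):
    """Entropy in bits: Sum -P(i) Log2 P(i), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def expected_entropy_change(prior, test):
    """Expected reduction in entropy from running a test.

    test: {outcome: [P(outcome | hypothesis i) for each hypothesis i]}
    """
    before = entropy(prior)
    change = 0.0
    for likelihoods in test.values():
        joint = [l * p for l, p in zip(likelihoods, prior)]
        p_outcome = sum(joint)
        if p_outcome == 0:
            continue  # this outcome cannot occur
        posterior = [j / p_outcome for j in joint]  # Bayes Theorem
        # Weigh the change for this outcome by the outcome's probability.
        change += p_outcome * (before - entropy(posterior))
    return change


prior = [0.2, 0.1, 0.2, 0.5]

# Hypothetical tests: A separates the first two hypotheses from the last two,
# B is equally likely to come out either way under every hypothesis.
tests = {
    "test A": {"pos": [0.9, 0.9, 0.1, 0.1], "neg": [0.1, 0.1, 0.9, 0.9]},
    "test B": {"pos": [0.5, 0.5, 0.5, 0.5], "neg": [0.5, 0.5, 0.5, 0.5]},
}
costs = {"test A": 2.0, "test B": 1.0}  # hypothetical costs

changes = {name: expected_entropy_change(prior, t) for name, t in tests.items()}
best = max(tests, key=lambda name: changes[name] / costs[name])
print(changes, best)
```

Test B is cheaper but carries no information, so test A is chosen even after dividing by its higher cost.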
