1
Representations for KBS: Uncertainty & Decision Support
  • 6.871 - Knowledge Based Systems
  • Tuesday March 20, 2006
  • Howard Shrobe
  • Randall Davis

2
Outline
  • The Other Problem with Mycin
  • Brief review of history of uncertainty in AI
  • Bayes Theorem
  • Some tractable Bayesian situations
  • Bayes Nets
  • Decision Theory and Rational Choice

3
The Other Problem with Mycin
  • In an earlier class we argued that Mycin used an
    extremely impoverished language for stating facts
    and rules (A-O-V triples)
  • Here we argue that its notion of uncertainty was
    broken
  • In Mycin the certainty factor for OR is Max
  • CF(OR A B) = Max(CF(A), CF(B))
  • Consider
  • Rule-1: If A then C, certainty factor 1
  • Rule-2: If B then C, certainty factor 1
  • This is logically the same as
  • If (Or A B) then C, certainty factor 1

4
More Problems
  • If CF(A) = .8 and CF(B) = .3
  • Then CF(C) = .8 + .3 × (1 - .8) = .8 + .06 = .86
  • CF(OR A B) = Max(.8, .3) = .8, and CF(C) = .8
  • If A → B, A → C, B → D, C → D there will also be
    a mistake (why?)
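A minimal Python sketch (not from the slides) of the two combination rules the example contrasts; the rule-by-rule parallel combination and the single OR rule give different certainty factors for C, which is the inconsistency being pointed out:

```python
def cf_parallel(x, y):
    """Mycin-style parallel combination of two positive CFs for the same conclusion."""
    return x + y * (1 - x)

def cf_or(x, y):
    """Mycin certainty factor for a disjunction: the max of the disjuncts' CFs."""
    return max(x, y)

cf_a, cf_b = 0.8, 0.3

# Two separate rules A -> C and B -> C (each with CF 1), combined in parallel:
print(cf_parallel(cf_a, cf_b))   # 0.86

# The logically equivalent single rule (A or B) -> C with CF 1:
print(cf_or(cf_a, cf_b))         # 0.8
```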

5
History of Uncertainty Representations
  • Probability: tried and rejected
  • too many numbers
  • Focus on logical, qualitative approaches
  • reasoning by cases
  • non-monotonic reasoning
  • Numerical approaches retried
  • Certainty factors
  • Dempster-Shafer
  • Fuzzy logic
  • Bayes Networks

6
Understanding Bayes Theorem
A worked example: out of 1,000 people, 10 have cancer; the test is positive with
probability .95 if you have cancer and .05 if you don't.
  • Has cancer (10 of 1,000): 10 × .95 = 9.5 true positives
  • Doesn't have cancer (990 of 1,000): 990 × .05 = 49.5 false positives
  • Number that test positive = 9.5 + 49.5 = 59
  • If you test positive, your probability of having cancer is 9.5 / 59 ≈ 16.1%
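The same computation done directly with Bayes Theorem, as a small Python check (the code is illustrative, not from the slides):

```python
p_cancer = 10 / 1000           # prior: 10 in 1,000 have cancer
p_pos_given_cancer = 0.95      # P(positive test | cancer)
p_pos_given_healthy = 0.05     # P(positive test | no cancer)

# Total probability of a positive test
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes Theorem: P(cancer | positive test)
print(p_pos_given_cancer * p_cancer / p_pos)   # about 0.161, i.e. 16.1%
```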
7
Reviewing Bayes Theorem
  • Disease D, Symptom S
  • Bayes Theorem: P(D|S) = P(S|D) × P(D) / P(S)
  • P(S|D) is the conditional probability of S given D
8
Independence & Conditional Independence
  • Independence
  • P(A) × P(B) = P(A,B)
  • A varies the same within B as it does in the
    universe
  • Conditional Independence within C
  • P(A|C) × P(B|C) = P(A,B|C)
  • When we restrict attention to C, A and B are
    independent
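A tiny numeric illustration in Python (the joint distribution is made up, chosen to satisfy the definition) of checking plain independence; conditional independence is the same check applied after restricting attention to C:

```python
# A made-up joint distribution over two binary events A and B,
# constructed so that A and B are independent (P(A) = .4, P(B) = .5).
p_joint = {(True, True): 0.20, (True, False): 0.20,
           (False, True): 0.30, (False, False): 0.30}

p_a = sum(p for (a, _), p in p_joint.items() if a)
p_b = sum(p for (_, b), p in p_joint.items() if b)

# Independence: P(A) × P(B) should equal P(A,B)
print(abs(p_a * p_b - p_joint[(True, True)]) < 1e-9)   # True
```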

9
Examples
(Venn-diagram panels illustrating four cases:)
  • A and B are Independent
  • A and B are Dependent
  • A and B are Conditionally Dependent, given C
  • A and B are Conditionally Independent, given C
10
IDIOT BAYES Model
(Figure: a single disease node D with arrows to symptoms S1 ... SK)
  • Single Disease
  • Conditionally Independent Symptoms
  • P(S1,S2|D) = P(S1|D) × P(S2|D)
  • N Symptoms means N probabilities
  • Without conditional independence need 2^N joint
    probabilities

11
Using Multiple Pieces of Evidence
(Figure: hypothesis node H with arrows to evidence nodes E1 and E2)
  • If you assume "conditional independence" between
    the evidence then this takes on a nice
    multiplicative form.
  • Conditional Independence is the notion that the
    various pieces of evidence are statistically
    independent of one another, given that the
    hypothesis obtains, i.e. the hypothesis
    "separates" the different pieces of evidence
  • P(E1,E2|H) = P(E1|H) × P(E2|H)
  • P(E1,E2|¬H) = P(E1|¬H) × P(E2|¬H)
  • Without conditional independence you need to
    build up a very large database of joint
    probabilities and joint conditional probabilities.

12
Sequential Bayesian Inference
  • Consider symptoms one by one
  • Prior Probabilities P(Di)
  • Observe Symptom Sj
  • Updates Priors using Bayes Rule
  • Repeat for Other Symptoms using the resulting
    Posterior as the new Prior
  • If symptoms are conditionally independent, same
    as doing it all at once (see the sketch below)
  • Allows choice of what symptom to observe (test to
    perform) next in terms of cost/benefit.
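A short Python sketch of the Idiot Bayes model updated one symptom at a time (the numbers are illustrative, not from the slides); because the symptoms are conditionally independent given the disease, the sequential result matches the all-at-once computation:

```python
# Hypothetical priors over two diseases and P(symptom | disease) tables.
priors = {"D1": 0.7, "D2": 0.3}
p_symptom = {"S1": {"D1": 0.8, "D2": 0.2},
             "S2": {"D1": 0.4, "D2": 0.9}}

def update(prior, symptom):
    """One step of Bayes Rule: posterior(D) proportional to P(symptom|D) * prior(D)."""
    unnorm = {d: p_symptom[symptom][d] * p for d, p in prior.items()}
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}

# Sequential: observe S1, then S2, using each posterior as the next prior.
sequential = update(update(priors, "S1"), "S2")

# All at once: multiply both likelihoods into the prior, then normalize.
unnorm = {d: priors[d] * p_symptom["S1"][d] * p_symptom["S2"][d] for d in priors}
z = sum(unnorm.values())
batch = {d: v / z for d, v in unnorm.items()}

print(sequential)   # {'D1': 0.806..., 'D2': 0.194...}
print(batch)        # same values either way
```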

13
Bipartite Graphs
  • Multiple Symptoms, multiple diseases
  • Diseases are probabilistically independent
  • Symptoms are conditionally independent
  • Symptom probabilities depend only on the diseases
    causing them
  • Symptoms with multiple causes require joint
    probabilities P(S2|D1,D2,D3)
  • Information explosion

14
Noisy OR
  • A useful element in the modeling vocabulary
  • Make the simplifying assumption that only 1
    disease is present at a time
  • Probability that all diseases cause the symptom
    is just the probability that at least 1 does
  • Therefore Symptom is absent only if no disease
    caused it.

1 - P(S2|D1,D2,D3) = (1 - P(S2|D1)) × (1 - P(S2|D2)) × (1 - P(S2|D3))
  • Use Causal Probabilities for the basic data
  • Reduces probability table size: with n diseases and
    k symptoms, from k × 2^n to n × k
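A minimal Python sketch of the noisy-OR combination (the causal probabilities below are made up): the symptom is absent only if none of the diseases causes it.

```python
def noisy_or(causal_probs):
    """P(symptom | these diseases present), assuming each disease
    independently causes the symptom with its own causal probability."""
    p_absent = 1.0
    for p in causal_probs:
        p_absent *= (1.0 - p)
    return 1.0 - p_absent

# Hypothetical causal probabilities P(S2|D1), P(S2|D2), P(S2|D3)
print(noisy_or([0.6, 0.3, 0.5]))   # 0.86
```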

15
Polytrees
  • What if diseases cause or influence each other?
  • Are there still well behaved versions?
  • Yes, Polytrees: at most one path between any two
    nodes
  • Don't have to worry about double-counting
  • Efficient sequential updating is still possible

16
Bayes Nets
  • Directed Acyclic Graphs
  • Absence of link → conditional independence
  • P(X1,...,Xn) = Π P(Xi | parents(Xi))
  • Specify joint probability tables over parents for
    each node
  • Probability A,B,C,D,E all present:
  • P(A,B,C,D,E)
    = P(A) × P(B|A) × P(C|A) × P(D|B,C) × P(E|C)
  • Probability A,C,D present, B,E absent:
  • P(A,¬B,C,D,¬E)
    = P(A) × P(¬B|A) × P(C|A) × P(D|¬B,C) × P(¬E|C)

17
Example
(Network: Burglary → Alarm ← Earthquake, Earthquake → Radio Report, Alarm → Phone Call)

P(Call | Alarm)
               Alarm = t   Alarm = f
  Call = t        .9          .01
  Call = f        .1          .99

P(RadioReport | Earthquake)
               EQ = t      EQ = f
  Report = t      1           0
  Report = f      0           1

P(Alarm | Burglary, Earthquake)
               B,E = t,t   t,f    f,t    f,f
  Alarm = t       .8        .99    .6     .01
  Alarm = f       .2        .01    .4     .99

16 vs. 32 probabilities
18
Computing with Partial Information
  • Probability that A is present and E is absent:
    P(A,¬E) = Σ (over B,C,D) P(A) P(B|A) P(C|A) P(D|B,C) P(¬E|C)
  • Graph separators (e.g. C) correspond to
    factorizations
  • General problem of finding separators is NP-hard
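A compact Python sketch of both computations, using the burglary network of the previous slide; the priors P(Burglary) and P(Earthquake) are not given on the slides, so the .01 and .02 below are made-up placeholders.

```python
# CPT entries from the burglary-network slide; the two priors are assumed.
p_b = 0.01                                              # P(Burglary): placeholder
p_e = 0.02                                              # P(Earthquake): placeholder
p_alarm = {(True, True): 0.8, (True, False): 0.99,
           (False, True): 0.6, (False, False): 0.01}    # P(Alarm=t | B, E)
p_call = {True: 0.9, False: 0.01}                       # P(Call=t | Alarm)
p_radio = {True: 1.0, False: 0.0}                       # P(RadioReport=t | Earthquake)

def joint(b, e, a, c, r):
    """Chain-rule factorization: P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(C|A) P(R|E)."""
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_alarm[(b, e)] if a else 1 - p_alarm[(b, e)]
    pc = p_call[a] if c else 1 - p_call[a]
    pr = p_radio[e] if r else 1 - p_radio[e]
    return pb * pe * pa * pc * pr

# Partial information: P(Call present, RadioReport absent),
# summing the joint over the unobserved variables B, E, Alarm.
tf = (True, False)
print(sum(joint(b, e, a, True, False) for b in tf for e in tf for a in tf))
```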

19
Odds Likelihood Formulation
  • Define Odds as O(H) = P(H) / P(¬H)
  • Define Likelihood ratio as λ = P(E|H) / P(E|¬H)

Divide complementary instances of Bayes Rule:
P(H|E) / P(¬H|E) = [P(E|H) / P(E|¬H)] × [P(H) / P(¬H)]
Bayes Rule is then O(H|E) = λ × O(H)
In logarithmic form: log O(H|E) = log O(H) + log λ
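A quick Python sanity check (made-up numbers) that the odds-likelihood update agrees with ordinary Bayes Rule, and that the logarithmic form is additive:

```python
import math

p_h = 0.2                  # prior P(H): hypothetical
p_e_given_h = 0.7          # P(E|H): hypothetical
p_e_given_not_h = 0.1      # P(E|not H): hypothetical

# Odds-likelihood form: O(H|E) = lambda * O(H)
odds_prior = p_h / (1 - p_h)
lam = p_e_given_h / p_e_given_not_h
odds_post = lam * odds_prior
p_h_given_e = odds_post / (1 + odds_post)

# Ordinary Bayes Rule for comparison
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
print(p_h_given_e, p_e_given_h * p_h / p_e)   # both about 0.636

# Logarithmic form: log odds add
print(math.log(odds_post), math.log(odds_prior) + math.log(lam))
```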
20
Certainty Factors
(Figure: A → C with CF x, B → C with CF y, C → D with CF z)
  • Rules
  • If A, then C (x)
  • If B, then C (y)
  • If C, then D (z)

Parallel Combination
CF(C) = x + y × (1 - x)
Series Combination
CF(D) = z × max(0, CF(C))
21
Issues with Certainty Factors
  • Results obtained depend on order in which
    evidence is considered in some cases
  • Reasoning is often fairly insensitive to them:
    ±20% variations yield no change in MYCIN
  • What do they mean? (in some cases the answer is)

(Table on the slide relating conditional probability, likelihood, and certainty factor)
22
Decision Making
  • So far, what we've considered is how to use
    evidence to evaluate a situation
  • In many cases, this is only the first part of the
    problem
  • What we want to do is to take actions to improve
    the situation
  • But which action should we take?
  • The one which is most likely to leave us in the
    best condition
  • Decision Analysis helps us calculate which action
    that is

23
A Decision Making Problem
  • There are two types of urn, U1 and U2 (80% are U1)
  • U1 contains 4 red balls and 6 black balls
  • U2 contains 9 red balls and 1 black ball
  • An urn is selected at random and you are to guess
    which type it is
  • You have several courses of action
  • Refuse to play: no payoff, no cost
  • Guess it is of type 1: $40 payoff if right, $20
    penalty if wrong
  • Guess it is of type 2: $100 payoff if right, $5
    penalty if wrong
  • Sample a ball: $8 payment for the right to sample

24
Decision Flow Diagrams
(Decision flow diagram. At the initial decision fork the choices are: Refuse to Play
(payoff 0.00); No Observation; or Make an Observation (pay 8.00, then a chance fork
on the ball drawn, Red or Black). Each of these leads to a decision fork between a1,
guess type 1 (chance fork: +40.00 if right, -20.00 if wrong), and a2, guess type 2
(chance fork: +100.00 if right, -5.00 if wrong); path labels such as (e1,R,a1) name
the experiment, its outcome, and the action taken.)
25
Expected Monetary Value
  • Suppose there are several possible outcomes
  • Each has a monetary payoff or penalty
  • Each has a probability
  • The Expected Monetary Value is the sum of the
    products of the monetary payoffs times their
    corresponding probabilities.

(Example chance node: probability .8 of +$40, probability .2 of -$20)
EMV = .8 × 40 + .2 × (-20) = 32 + (-4) = 28
  • EMV is a normative notion of what a person who
    has no other biases (risk aversion, e.g.) should
    be willing to accept in exchange for the
    situation facing him. In the picture above, you
    should be indifferent to the choice of taking $28
    or playing the game.
  • Most people have some extra biases and these can
    be incorporated in the form of a utility function
    applied to the calculated value.
  • A rational person should choose that course of
    action which has the highest EMV.
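The same rule in a few lines of Python, applied to the urn-guessing actions before any observation (P(U1) = .8, P(U2) = .2, payoffs from the earlier slide):

```python
def emv(outcomes):
    """Expected monetary value: sum of probability × payoff over all outcomes."""
    return sum(p * v for p, v in outcomes)

print(emv([(0.8, 40), (0.2, -20)]))   # guess type 1 -> 28.0 (the example above)
print(emv([(0.8, -5), (0.2, 100)]))   # guess type 2 -> 16.0
print(emv([(1.0, 0)]))                # refuse to play -> 0.0
```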

26
Averaging Out and Folding Back
  • EMV of Decision Node is Max over all branches
  • EMV of Chance Node is Probability Weighted Sum
    over all branches

(Folding back the no-observation subtree: a1 has EMV .8 × 40 + .2 × (-20) = 32 - 4 = 28.00,
a2 has EMV .8 × (-5) + .2 × 100 = -4 + 20 = 16.00, so the decision node takes the max, 28.00)
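A small recursive Python sketch of averaging out and folding back (the tree below is the no-observation part of the urn problem): chance nodes return the probability-weighted sum of their branches, decision nodes the max.

```python
# A tree node is either a number (terminal payoff), a ("decision", [children]) pair,
# or a ("chance", [(probability, child), ...]) pair.

def fold_back(node):
    """Average out at chance nodes, maximize at decision nodes."""
    if isinstance(node, (int, float)):
        return node
    kind, branches = node
    if kind == "chance":
        return sum(p * fold_back(child) for p, child in branches)
    if kind == "decision":
        return max(fold_back(child) for child in branches)
    raise ValueError(kind)

# The no-observation subtree of the urn problem:
no_obs = ("decision", [
    ("chance", [(0.8, 40), (0.2, -20)]),    # a1: guess type 1
    ("chance", [(0.8, -5), (0.2, 100)]),    # a2: guess type 2
    0,                                       # refuse to play
])
print(fold_back(no_obs))    # 28.0
```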
27
The Effect of Observation
Bayes theorem is used to calculate probabilities
at chance nodes following decision nodes that
provide relevant evidence.
P(R) = P(R|U1) P(U1) + P(R|U2) P(U2)
P(U1|R) = P(R|U1) P(U1) / P(R)
28
Calculating the Updated Probabilities
Initial Probabilities P(Outcome | State)
  Outcome          U1      U2
  Red              .4      .9
  Black            .6      .1
  Prior P(State)   .8      .2

Joint and Marginal Probabilities P(Outcome, State)
  Outcome     U1               U2               Marginal P(Outcome)
  Red         .8 × .4 = .32    .2 × .9 = .18    .50
  Black       .8 × .6 = .48    .2 × .1 = .02    .50

Updated Probabilities P(State | Outcome)
  Outcome     U1      U2
  Red         .64     .36
  Black       .96     .04
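The same update in a few lines of Python:

```python
priors = {"U1": 0.8, "U2": 0.2}
p_red = {"U1": 0.4, "U2": 0.9}      # P(Red | urn type)

# Joint probabilities for drawing a red ball, and their marginal
joint_red = {u: priors[u] * p_red[u] for u in priors}     # {'U1': 0.32, 'U2': 0.18}
marginal_red = sum(joint_red.values())                     # 0.50

# Updated (posterior) probabilities P(urn type | Red)
print({u: joint_red[u] / marginal_red for u in priors})    # {'U1': 0.64, 'U2': 0.36}
```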
29
Illustrating Evaluation
(Evaluation of the observation branch, using the updated probabilities: after Red,
P(U1) = .64 and P(U2) = .36; after Black, P(U1) = .96 and P(U2) = .04; each outcome
has probability .5.)
  • After a red ball: a1 = .64 × 40 + .36 × (-20) = 18.40; a2 = .64 × (-5) + .36 × 100 = 32.80
  • After a black ball: a1 = .96 × 40 + .04 × (-20) = 37.60; a2 = .96 × (-5) + .04 × 100 = -0.80
  • Chance node: .5 × 32.80 + .5 × 37.60 = 35.20; less the 8.00 sampling cost gives 27.20
30
Final Value of Decision Flow Diagram
(The same decision flow diagram with the folded-back values filled in: "No
Observation" is worth 28.00, "Make an Observation" is worth 27.20 after the 8.00
sampling charge, and "Refuse to Play" is worth 0.00; the best course of action is
therefore to play without observing and guess type 1, for an expected value of 28.00.)
31
Maximum Entropy
(Figure: several competing hypotheses, each with a probability rating, e.g. .2, .1, .2, .5)
  • Suppose there are several tests you can make.
  • Each test can change the probability of some (or
    all) of the hypotheses (using Bayes Theorem).
  • Each outcome of the test has a probability.
  • We're only interested in gathering information at
    this point
  • Which test should you make?
  • Entropy = -Σ P(i) log2 P(i) is a standard
    measure of information
  • For each outcome of a test calculate the change
    in entropy.
  • Weigh this by the probability of that outcome.
  • Sum these to get an expected change of entropy
    for the test.
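A Python sketch of scoring one test this way; the outcome probabilities and post-test posteriors below are hypothetical, and in practice would come from Bayes Theorem as described above.

```python
import math

def entropy(dist):
    """Shannon entropy, -sum p log2 p, of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def expected_entropy_change(prior, outcomes):
    """outcomes: list of (P(outcome), posterior distribution after that outcome)."""
    h0 = entropy(prior)
    return sum(p_out * (h0 - entropy(post)) for p_out, post in outcomes)

# Hypothetical: four hypotheses, and a test with two equally likely outcomes
# whose posteriors average back to the prior.
prior = [0.2, 0.1, 0.2, 0.5]
test_a = [(0.5, [0.10, 0.05, 0.10, 0.75]),   # outcome 1: sharpens toward hypothesis 4
          (0.5, [0.30, 0.15, 0.30, 0.25])]   # outcome 2: spreads the mass out
print(expected_entropy_change(prior, test_a))
```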

32
Maximum Entropy (2)
  • Choose the test which has the greatest expected
    change in entropy
  • This is equivalent to choosing the test which is
    most likely to provide the most information.
  • Tests have different costs (sometimes quite
    drastic ones like life and death).
  • Normalize the benefits by the costs and then make
    the choice