Title: Uncertainty in AI, Probabilistic Reasoning (1): Especially for Bayesian Networks
1 Uncertainty in AI, Probabilistic Reasoning (1)
Especially for Bayesian Networks
- KyuTae Cho, Jeong Ki Yoo, HeeJin Lee
2 Contents
- Uncertainty
  - Degree of Belief / Degree of Truth
  - Handling Uncertain Knowledge
  - Probabilistic Reasoning System
  - Basic Probability Theories, Bayes' Rule
- Bayesian Networks
  - Concepts of Bayesian Networks
  - Structure of Bayesian Networks
  - Features of Bayesian Networks
  - Evaluating Networks
  - Initial Probability of Bayesian Networks
- Conclusion
3 Uncertainty
- Agents almost never have access to the whole truth about their environment
- Characteristics of real-world applications
  - Truth values are unknown
  - Too complex to compute everything before making a decision
- Rational decision: doing the right thing
  - Depends both on the relative importance of various goals and on the likelihood that, and degree to which, they will be achieved
4 Degree of Belief
- An agent can provide a degree of belief for a sentence
- Main tool: probability theory
  - Assigns a numerical degree of belief between 0 and 1 to sentences
  - A way of summarizing the uncertainty that comes from laziness and ignorance
  - Probabilities can be derived from statistical data
5 Degree of Belief vs. Degree of Truth
- Degree of Belief
  - The sentence itself is in fact either true or false
  - Same ontological commitment as logic: the facts either do or do not hold in the world
  - Handled by probability theory
- Degree of Truth (membership)
  - Not a question about the external world
  - A case of vagueness or uncertainty about the meaning of a linguistic term: "tall", "pretty"
  - Handled by fuzzy set theory and fuzzy logic
6 Handling Uncertain Knowledge
- Diagnostic rule
  - ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
  - ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, ImpactedWisdom) ∨ …
  - P(Disease | Symptom)
- Causal rule
  - ∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
  - Not every cavity causes a toothache
  - P(Symptom | Disease)
7 Why Does First-Order Logic Fail?
- Laziness: it is too much work to prepare a complete set of exceptionless rules, and too hard to use the resulting enormous rule set
- Theoretical ignorance: medical science has no complete theory for the domain
- Practical ignorance: all the necessary tests cannot be run, even if we know all the rules
8 Probabilistic Reasoning System
- Assigns a probability to a proposition based on the percepts the agent has received to date
- Evidence: perception that an agent receives
- Probabilities can change when more evidence is acquired
  - Prior / unconditional probability: no evidence at all
  - Posterior / conditional probability: after evidence is obtained
9 Uncertainty and Rational Decisions
- No plan can guarantee to achieve the goal
- To make a choice, an agent must have preferences between the different possible outcomes of various plans
  - e.g., missing the plane vs. a long wait at the airport
- Utility theory: to represent and reason with preferences
  - Utility: the quality of being useful (degree of usefulness)
- Decision Theory = Probability Theory + Utility Theory
- Principle of Maximum Expected Utility: an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes of the action
10 Basic Probability
- Prior probability P(A): unconditional or prior probability that proposition A is true
- Conditional probability
  - P(a|b) = P(a,b) / P(b)
  - Product rule: P(a,b) = P(a|b) P(b)
- The axioms of probability (Kolmogorov's axioms)
  - 0 ≤ P(a) ≤ 1, for any proposition a
  - P(true) = 1, P(false) = 0
  - P(a ∨ b) = P(a) + P(b) − P(a,b)
11 Basic Probability
- The probability of a proposition is equal to the sum of the probabilities of the atomic events in which it holds
  - P(a) = Σ_{e ∈ e(a)} P(e), where e(a) is the set of all atomic events in which a holds
- Marginalization (summing out)
  - P(Y) = Σ_z P(Y, z)
- Conditioning
  - P(Y) = Σ_z P(Y|z) P(z)
- Independence between a and b
  - P(a|b) = P(a)
  - P(b|a) = P(b)
  - P(a,b) = P(a) P(b)
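The identities above can be checked numerically. The sketch below builds a toy joint distribution over two binary variables (all numbers are illustrative assumptions, not from the slides) and confirms that marginalization and conditioning give the same P(Y).

```python
# Toy joint distribution over binary Y and Z; the four entries sum to 1.
joint = {
    (True,  True):  0.20,  # P(Y=true,  Z=true)
    (True,  False): 0.30,  # P(Y=true,  Z=false)
    (False, True):  0.10,  # P(Y=false, Z=true)
    (False, False): 0.40,  # P(Y=false, Z=false)
}

# Marginalization (summing out): P(Y) = sum_z P(Y, z)
p_y = sum(p for (y, _), p in joint.items() if y)

# Conditioning: P(Y) = sum_z P(Y|z) P(z)
p_z_true = joint[(True, True)] + joint[(False, True)]
p_z_false = 1.0 - p_z_true
p_y_given_z_true = joint[(True, True)] / p_z_true
p_y_given_z_false = joint[(True, False)] / p_z_false
p_y_conditioned = p_y_given_z_true * p_z_true + p_y_given_z_false * p_z_false

print(p_y, p_y_conditioned)  # both 0.5
```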
12 Bayes' Rule
- P(b|a) = P(a|b) P(b) / P(a)
- P(b|a,e) = P(a|b,e) P(b|e) / P(a|e), where e is the background evidence
- Example
  - s: the patient has a stiff neck
  - m: the patient has meningitis
  - P(s|m) = 0.5, P(m) = 1/50000, P(s) = 1/20
  - P(m|s) = P(s|m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
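The meningitis calculation is a direct plug-in of Bayes' rule:

```python
# Meningitis example: P(m|s) = P(s|m) P(m) / P(s)
p_s_given_m = 0.5        # P(stiff neck | meningitis)
p_m = 1 / 50000          # prior P(meningitis)
p_s = 1 / 20             # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # 0.0002
```

Even though half of all meningitis patients have a stiff neck, the posterior is tiny because the prior P(m) is so small.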
13 Conditional Independence
- Conditional independence of two variables X and Y, given a third variable Z
  - P(X,Y|Z) = P(X|Z) P(Y|Z)
- X, Y, and Z are related to each other, but once the value of Z is settled, X and Y become independent of each other
- P(Cavity | toothache, catch) = α P(toothache, catch | cavity) P(cavity)
  = α P(toothache | cavity) P(catch | cavity) P(cavity)
- Toothache and catch are directly caused by the cavity, but neither has a direct effect on the other
14 Naive Bayes Model
- A single cause directly influences a number of effects, all of which are conditionally independent given the cause
  - P(Cause, Effect1, …, EffectN) = P(Cause) P(Effect1|Cause) … P(EffectN|Cause)
- "Naive" because it is often used in cases where the effect variables are not actually conditionally independent given the cause variable
- It nevertheless works surprisingly well in practice
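A minimal sketch of the naive Bayes factorization for the dental domain: Cavity is the cause, Toothache and Catch the effects. The CPT numbers here are illustrative assumptions, not taken from the slides.

```python
# Naive Bayes: P(Cavity, Toothache, Catch)
#   = P(Cavity) P(Toothache|Cavity) P(Catch|Cavity)
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | cavity)

def joint(cavity, toothache, catch):
    """Joint probability from the naive Bayes factorization."""
    p = p_cavity if cavity else 1 - p_cavity
    p *= p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
    p *= p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    return p

# Posterior by normalization: P(cavity | toothache, catch)
num = joint(True, True, True)
den = num + joint(False, True, True)
print(num / den)
```

Normalizing over the two values of the cause is exactly the α step from the previous slide.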
16 Bayesian Networks
- Concepts of Bayesian Networks
- Structure of Bayesian Networks
- Features of Bayesian Networks
- Evaluating Networks
- Initial Probability of Bayesian Networks
17 Concepts
- A model for representing uncertainty in our knowledge
- A graphical model of causality and influence
- A representation of the dependencies among random variables
18 Bayesian Networks
- Concepts of Bayesian Networks
- Structure of Bayesian Networks
- Features of Bayesian Networks
- Evaluating Networks
- Initial Probability of Bayesian Networks
19 Structure
- Form
  - DAGs with the following properties:
    - Nodes are random variables
    - Certain independence assumptions hold
- Components
  - Nodes: random variables
  - Arcs: specification of dependency among random variables
  - Probability distributions: P(a) or P(a|b)
20 Structure (cont.)
- Initial configuration of a BN
  - Root nodes: prior probabilities
  - Non-root nodes: conditional probabilities given all possible combinations of direct predecessors
21 Structure Example
< The family-out problem >
- Nodes: Family-out (fo), Bowel-problem (bp), Dog-out (do), Light-on (lo), Hear-bark (hb)
- Arcs: fo → lo, fo → do, bp → do, do → hb
- CPTs:
  - P(fo) = .15, P(bp) = .01
  - P(lo|fo) = .6, P(lo|¬fo) = .05
  - P(do|fo,bp) = .99, P(do|fo,¬bp) = .90, P(do|¬fo,bp) = .97, P(do|¬fo,¬bp) = .3
  - P(hb|do) = .7, P(hb|¬do) = .05
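The family-out network can be written out directly from its CPTs. The sketch below uses the slide's numbers and checks that the factored joint is a genuine probability distribution (its 32 entries sum to 1).

```python
from itertools import product

# CPTs from the family-out slide. The joint factorizes as
# P(fo,bp,lo,do,hb) = P(fo) P(bp) P(lo|fo) P(do|fo,bp) P(hb|do).
P_fo = 0.15
P_bp = 0.01
P_lo = {True: 0.6, False: 0.05}                      # P(lo | fo)
P_do = {(True, True): 0.99, (True, False): 0.90,     # P(do | fo, bp)
        (False, True): 0.97, (False, False): 0.3}
P_hb = {True: 0.7, False: 0.05}                      # P(hb | do)

def joint(fo, bp, lo, do, hb):
    """Joint probability of one complete assignment."""
    p  = P_fo if fo else 1 - P_fo
    p *= P_bp if bp else 1 - P_bp
    p *= P_lo[fo] if lo else 1 - P_lo[fo]
    p *= P_do[(fo, bp)] if do else 1 - P_do[(fo, bp)]
    p *= P_hb[do] if hb else 1 - P_hb[do]
    return p

# Sanity check: the 32 joint entries sum to 1.
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
print(total)
```

Note that the whole distribution over 2^5 = 32 atomic events is specified by only 10 numbers.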
22 Bayesian Networks
- Concepts of Bayesian Networks
- Structure of Bayesian Networks
- Features of Bayesian Networks
  - Independence Assumptions
  - Consistent Probabilities
- Evaluating Networks
- Initial Probability of Bayesian Networks
23 Features
- Independence assumptions
  - Relating to the causal interpretation of arcs
- Consistent probabilities
  - Relating to the probabilities that are specified
24 Independence Assumptions
- Problem with plain probability theory
  - 2^n − 1 joint-distribution entries for n variables
  - For 5 variables, 31 entries
- Solution by BN
  - For the 5 variables of the family-out example, only 10 numbers
- Bayesian networks have built-in independence assumptions
25 Independence Assumptions (cont.)
- Definition of independence
  - If random variables a and b are independent, P(a|b) = P(a)
- How to determine the dependency between two variables
  - By the existence of a d-connecting path between the two random variables
  - A d-connecting path exists ⇒ dependent
26 Independence Assumptions: D-connecting Path
- A path from A to B is d-connecting, given the evidence nodes, if every intermediate node n on it satisfies one of:
  1) n is linear or diverging and is not a member of the evidence nodes
  2) n is converging, and either n or one of its descendants is a member of the evidence nodes
[Figure: a path from A to B through intermediate nodes X, …, Y]
27 Independence Assumptions (cont.)
[Figure: three example networks with evidence nodes marked]
- A is dependent on B
- C is independent of G
- H is independent of I
28 Consistent Probabilities
- Problem with plain probability theory: inconsistent probabilities
  - e.g., P(a|b) = .7, P(b|a) = .3, P(b) = .5
  - Then P(a) = P(b) P(a|b) / P(b|a) = .5 × .7 / .3 = .35 / .3 > 1, which is impossible
- Solution by BN
  - A Bayesian network always has consistent probabilities
    - Consistent numbers
    - A unique definition of the distribution
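The inconsistency in the example can be exposed with one line of arithmetic, using Bayes' rule to recover P(a) from the three stated numbers:

```python
# From Bayes' rule: P(a) = P(b) P(a|b) / P(b|a).
# With the slide's numbers the result exceeds 1, so no genuine
# probability distribution can satisfy all three values at once.
p_a_given_b, p_b_given_a, p_b = 0.7, 0.3, 0.5
p_a = p_b * p_a_given_b / p_b_given_a
print(p_a)  # about 1.167 -- not a valid probability
```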
29 Consistent Probabilities: Joint Distribution
- Definition of the joint distribution
  - For a set of boolean variables (a, b): P(a,b), P(¬a,b), P(a,¬b), P(¬a,¬b)
- Role of the joint distribution
  - The joint distribution gives all the information about the probability distribution
  - e.g., P(a|b) = P(a,b) / P(b) = P(a,b) / (P(a,b) + P(¬a,b))
- For n random variables, 2^n − 1 independent joint-distribution entries
30 Consistent Probabilities: Unique Definition of the Distribution
- The joint distribution for a BN is uniquely defined
  - By the product of the individual distributions of the random variables
  - Using the chain rule, a topological sort, and the dependency structure
- Ex) P(a,b,c,d,e) = P(a) P(b) P(c|a) P(d|a,b) P(e|d)
31 Consistent Probabilities: Unique Definition of the Distribution (cont.)
- Chain rule
  - P(x1, …, xn) = P(x1) P(x2|x1) … P(xn|x1, …, xn−1)
- Topological sort
  - An ordering in which every variable comes before all of its descendants
32 Consistent Probabilities: Unique Definition of the Distribution (cont.)
- Dependency
  - Given no evidence nodes, a node depends only on the nodes immediately above it (its parents)
- Ex) b is independent of a and c; d is independent of c; e is independent of a, b, and c
33 Consistent Probabilities: Unique Definition of the Distribution (cont.)
- Chain rule with a topological sort:
  - P(a,b,c,d,e) = P(a) P(b|a) P(c|a,b) P(d|a,b,c) P(e|a,b,c,d)
- Applying the independence assumptions (b is independent of a and c; d is independent of c; e is independent of a, b, and c):
  - P(a,b,c,d,e) = P(a) P(b) P(c|a) P(d|a,b) P(e|d)
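The collapse of the chain-rule terms can be verified numerically: build the joint from the factored form (the CPT numbers below are illustrative assumptions) and check that the full chain-rule term P(b|a) really equals P(b), as the independence assumption claims.

```python
from itertools import product

# CPTs for the factorization P(a,b,c,d,e) = P(a)P(b)P(c|a)P(d|a,b)P(e|d)
P_a, P_b = 0.3, 0.6
P_c = {True: 0.8, False: 0.1}                       # P(c | a)
P_d = {(True, True): 0.9, (True, False): 0.5,       # P(d | a, b)
       (False, True): 0.4, (False, False): 0.2}
P_e = {True: 0.7, False: 0.25}                      # P(e | d)

def joint(a, b, c, d, e):
    p  = P_a if a else 1 - P_a
    p *= P_b if b else 1 - P_b
    p *= P_c[a] if c else 1 - P_c[a]
    p *= P_d[(a, b)] if d else 1 - P_d[(a, b)]
    p *= P_e[d] if e else 1 - P_e[d]
    return p

def marginal(**fixed):
    """Sum the joint over all variables not fixed by keyword."""
    names = ["a", "b", "c", "d", "e"]
    total = 0.0
    for vals in product([True, False], repeat=5):
        assign = dict(zip(names, vals))
        if all(assign[k] == v for k, v in fixed.items()):
            total += joint(**assign)
    return total

# P(b|a) computed from the joint collapses to P(b) = 0.6
p_b_given_a = marginal(a=True, b=True) / marginal(a=True)
print(p_b_given_a)
```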
34 Bayesian Networks
- Concepts of Bayesian Networks
- Structure of Bayesian Networks
- Features of Bayesian Networks
- Evaluating Networks
  - Exact Inference
  - Approximate Solutions
- Initial Probability of Bayesian Networks
35 Evaluating Networks
- Evaluation of a network
  - Computation of every node's conditional probability given the evidence
- Types of evaluation
  - Exact inference
    - An NP-hard problem in general
  - Approximate inference
    - Not exact, but within a small distance of the correct answer
36 Exact Inference
- Two network types
  - Singly connected networks (polytrees)
  - Multiply connected networks
- Complexity depends on the network type
  - Singly connected networks can be evaluated efficiently
37 Exact Inference (cont.)
- Inference by enumeration (with the alarm example)
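A sketch of inference by enumeration on the standard burglary/alarm example (CPT numbers follow the usual textbook version): to answer P(Burglary | john_calls, mary_calls), sum the full joint over the hidden variables Earthquake and Alarm, then normalize.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(a | b, e)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(j | a)
P_M = {True: 0.70, False: 0.01}                   # P(m | a)

def joint(b, e, a, j, m):
    """Joint probability from the network factorization."""
    p  = P_B if b else 1 - P_B
    p *= P_E if e else 1 - P_E
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

def query_burglary(j, m):
    """P(Burglary | j, m) by enumerating the hidden variables e, a."""
    scores = {}
    for b in (True, False):
        scores[b] = sum(joint(b, e, a, j, m)
                        for e, a in product([True, False], repeat=2))
    norm = sum(scores.values())
    return {b: s / norm for b, s in scores.items()}

print(query_burglary(j=True, m=True))  # P(b | j, m) is about 0.284
```

Enumeration is exact but repeats work: the same sub-sums are recomputed for each value of the query variable, which motivates variable elimination on the next slide.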
38 Exact Inference (cont.)
- Variable elimination
  - Idea: do each calculation once and reuse it later
  - Intermediate results are stored as factors
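A minimal sketch of the two factor operations that variable elimination is built from: pointwise product (joining two factors on their shared variables) and summing out a variable. Factors are represented as dicts from value tuples to numbers; the example CPT values are illustrative assumptions.

```python
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Join two factors on their shared variables."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for vals in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        k1 = tuple(assign[v] for v in vars1)
        k2 = tuple(assign[v] for v in vars2)
        out[vals] = f1[k1] * f2[k2]
    return out, out_vars

def sum_out(f, fvars, var):
    """Eliminate var by summing the factor over its values."""
    i = fvars.index(var)
    out_vars = fvars[:i] + fvars[i + 1:]
    out = {}
    for key, p in f.items():
        k = key[:i] + key[i + 1:]
        out[k] = out.get(k, 0.0) + p
    return out, out_vars

# Example: join P(a) with P(b|a), then sum out a, yielding P(b).
f_a = {(True,): 0.3, (False,): 0.7}                # factor over ["a"]
f_b_a = {(True, True): 0.9, (True, False): 0.1,    # factor over ["a", "b"],
         (False, True): 0.4, (False, False): 0.6}  # entries are P(b|a)

joined, jvars = pointwise_product(f_a, ["a"], f_b_a, ["a", "b"])
p_b, _ = sum_out(joined, jvars, "a")
print(p_b[(True,)])  # 0.3*0.9 + 0.7*0.4 = 0.55
```

In full variable elimination these two operations are applied repeatedly, eliminating hidden variables one at a time, so each intermediate factor is computed once and reused.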
39 Exact Inference: Multiply Connected Networks
- Multiply connected networks are hard to evaluate
  - A node's probability can be affected both by its neighbor nodes and by other nodes, along multiple paths
[Figure: a multiply connected network over nodes A, B, C, D, with D an evidence node; query P(C|D) = ?]
40 Exact Inference: Multiply Connected Networks (cont.)
- Methodology to evaluate the network exactly
  - Clustering: combine nodes until the resulting graph is singly connected
41 Approximate Solutions
- Logic sampling
  - While not all variables have been assigned values:
    - Guess the value of the next lower node, on the basis of the values already assigned to the nodes above it
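Logic sampling can be sketched on the family-out network from the structure-example slide: sample each node top-down given its already-sampled parents, then estimate any probability by counting.

```python
import random

def sample_network(rng):
    """Draw one complete assignment, parents before children."""
    fo = rng.random() < 0.15
    bp = rng.random() < 0.01
    lo = rng.random() < (0.6 if fo else 0.05)
    do = rng.random() < {(True, True): 0.99, (True, False): 0.90,
                         (False, True): 0.97, (False, False): 0.3}[(fo, bp)]
    hb = rng.random() < (0.7 if do else 0.05)
    return fo, bp, lo, do, hb

rng = random.Random(0)
samples = [sample_network(rng) for _ in range(100_000)]

# Estimate P(do) by counting samples in which dog-out is true;
# the exact value from the CPTs is about 0.396.
p_do = sum(s[3] for s in samples) / len(samples)
print(p_do)
```

To estimate a conditional probability such as P(fo | hb), one would keep only the samples where hb is true; the drawback is that samples inconsistent with the evidence are wasted.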
42 Bayesian Networks
- Concepts of Bayesian Networks
- Structure of Bayesian Networks
- Features of Bayesian Networks
- Evaluating Networks
- Initial Probability of Bayesian Networks
43 Initial Probability of a BN
- Determined by an expert who subjectively assesses the problem
- Ex) the three causes of a fever
44 Conclusion
- Bayesian networks solve the problems of traditional probability theory
- Drawbacks and trade-offs
  - Evaluation time is the main drawback of BNs
  - Reasons to use BNs:
    - A BN needs far fewer numbers than a full joint distribution
    - Efficient exact solution methods exist, as well as a variety of approximation schemes