Title: Belief Networks
Belief Networks
Other Names
- Bayesian networks
- Probabilistic networks
- Causal networks
Probabilistic Belief
- There are several possible worlds that are indistinguishable to an agent given some prior evidence
- The agent believes that a logic sentence B is True with probability p and False with probability 1 - p. B is called a belief
- In the frequency interpretation of probabilities, this means that the agent believes that the fraction of possible worlds that satisfy B is p
- The distribution (p, 1 - p) is the strength of B
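To make the frequency interpretation concrete, here is a minimal Python sketch. The propositions P, Q, R, the prior evidence "P or Q", and the belief "Q and R" are all hypothetical; every world that survives the evidence is counted equally, so the strength of B is simply the fraction of those worlds in which B holds.

```python
from fractions import Fraction
from itertools import product

def strength(sentence, worlds):
    """Return (p, 1 - p), p being the fraction of worlds in which sentence holds."""
    worlds = list(worlds)
    p = Fraction(sum(1 for w in worlds if sentence(w)), len(worlds))
    return p, 1 - p

# Hypothetical setup: three propositions P, Q, R; the prior evidence "P or Q"
# rules out the two worlds in which P and Q are both false.
all_worlds = [dict(zip("PQR", values)) for values in product([True, False], repeat=3)]
indistinguishable = [w for w in all_worlds if w["P"] or w["Q"]]

def belief_b(world):
    """Hypothetical belief B = "Q and R"."""
    return world["Q"] and world["R"]

print(strength(belief_b, indistinguishable))   # (Fraction(1, 3), Fraction(2, 3))
```

B holds in 2 of the 6 indistinguishable worlds, so its strength is (1/3, 2/3).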
Problem
- At a certain time t, the KB of an agent is some collection of beliefs
- At time t, the agent's sensors make an observation that changes the strength of one of its beliefs
- How should the agent update the strength of its other beliefs?
Toothache Example
- A certain dentist is only interested in two things about any patient: whether he has a toothache and whether he has a cavity
- Over years of practice, she has constructed the following joint distribution
Toothache Example
- Using the joint distribution, the dentist can compute the strength of any logic sentence built with the propositions Toothache and Cavity
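Only some entries of the dentist's table are recoverable from these slides: P(Cavity ∧ Toothache) = 0.04 and P(Toothache) = 0.05 appear in the Corresponding Calculus slide, which forces P(¬Cavity ∧ Toothache) = 0.01. In the minimal sketch below, the two ¬Toothache entries are assumed values chosen only so that the table sums to 1. A sentence is passed as a predicate on (cavity, toothache), and its strength is the total probability of the rows where it holds.

```python
# Joint distribution P(Cavity, Toothache), keyed by (cavity, toothache).
# From the slides: P(C and T) = 0.04 and P(T) = 0.05, so P(not C and T) = 0.01.
# ASSUMED: the two not-Toothache entries are illustrative values only.
JOINT = {
    (True, True): 0.04,
    (False, True): 0.01,
    (True, False): 0.06,   # assumed
    (False, False): 0.89,  # assumed
}

def strength(sentence):
    """(p, 1 - p) for a sentence given as a predicate on (cavity, toothache)."""
    p = sum(prob for (cavity, toothache), prob in JOINT.items()
            if sentence(cavity, toothache))
    return p, 1 - p

print(strength(lambda c, t: t))            # Toothache
print(strength(lambda c, t: c or t))       # Cavity or Toothache
print(strength(lambda c, t: c and not t))  # Cavity and not Toothache
```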
New Evidence
- She now makes an observation E that indicates that a specific patient x has a high probability (0.8) of having a toothache, but E is not directly related to whether he has a cavity
Adjusting Joint Distribution
- She can use this additional information to create a joint distribution (specific to x) conditional on E, by keeping the same probability ratios between Cavity and ¬Cavity (both among Toothache worlds and among ¬Toothache worlds)
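Corresponding Calculus
- $P(C \mid T) = P(C \wedge T) / P(T) = 0.04 / 0.05 = 0.8$
- $P(C \wedge T \mid E) = P(C \mid T, E)\, P(T \mid E) = P(C \mid T)\, P(T \mid E) = (0.04 / 0.05) \times 0.8 = 0.64$
- The middle step uses $P(C \mid T, E) = P(C \mid T)$: E is not directly related to Cavity, so it affects Cavity only through Toothache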
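Applying the same calculation to every cell gives the whole distribution conditional on E. In the minimal sketch below (this ratio-preserving update is also known as Jeffrey's rule of conditioning), each Toothache column is rescaled to its new weight, P(Toothache | E) = 0.8 or P(¬Toothache | E) = 0.2, while the Cavity : ¬Cavity ratio inside the column stays fixed. Only 0.04 and P(Toothache) = 0.05 come from the slides; the ¬Toothache entries are assumed values, and the function name `update_on_toothache` is illustrative.

```python
# P(Cavity, Toothache) keyed by (cavity, toothache).
# 0.04 and 0.01 follow from the slides; the not-Toothache entries are ASSUMED.
JOINT = {(True, True): 0.04, (False, True): 0.01,
         (True, False): 0.06, (False, False): 0.89}

def update_on_toothache(joint, p_toothache_given_e):
    """Build P(Cavity, Toothache | E) when E only changes P(Toothache).

    Within each Toothache value the ratio Cavity : not Cavity is kept,
    i.e. P(C | T, E) = P(C | T); only the weight of each column changes.
    """
    new_weight = {True: p_toothache_given_e, False: 1 - p_toothache_given_e}
    updated = {}
    for t in (True, False):
        p_t = joint[(True, t)] + joint[(False, t)]   # old P(Toothache = t)
        for c in (True, False):
            updated[(c, t)] = joint[(c, t)] / p_t * new_weight[t]
    return updated

posterior = update_on_toothache(JOINT, 0.8)
print(round(posterior[(True, True)], 4))   # 0.64, i.e. P(Cavity and Toothache | E)
```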
Generalization
- $n$ beliefs $X_1, \dots, X_n$
- The joint distribution can be used to update probabilities when new evidence arrives
- But:
  - The joint distribution contains $2^n$ probabilities
  - Useful independence is not made explicit
Purpose of Belief Networks
- Facilitate the description of a collection of beliefs by making explicit the causality relations and conditional independence among beliefs
- Provide a more efficient way (than joint distribution tables) to update belief strengths when new evidence is observed
Alarm Example
- Five beliefs:
  - A: Alarm
  - B: Burglary
  - E: Earthquake
  - J: JohnCalls
  - M: MaryCalls
A Simple Belief Network
- Intuitive meaning of an arrow from x to y: x has direct influence on y
- Directed acyclic graph (DAG)
- Nodes are beliefs
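The structure of this network can be read off the factorization in the Calculation of Joint Probability slide: Burglary and Earthquake point to Alarm, and Alarm points to JohnCalls and MaryCalls. A minimal sketch that stores each node's parents and uses Python's standard `graphlib` module (Python 3.9+) to confirm the graph is acyclic; the dictionary layout is an illustrative choice.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Parents of each node, read off P(J|A) P(M|A) P(A|B,E) P(B) P(E).
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

# An arrow x -> y means "x has direct influence on y".
edges = [(parent, child) for child, parents in PARENTS.items() for parent in parents]

# TopologicalSorter takes a node -> predecessors mapping and raises
# graphlib.CycleError on a cycle, so a successful ordering shows it is a DAG.
order = list(TopologicalSorter(PARENTS).static_order())
print(edges)
print(order)   # every parent appears before its children
```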
Assigning Probabilities to Roots
Conditional Probability Tables
- Size of the CPT for a node with k parents: $2^k$
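For Boolean beliefs, a CPT has one row per assignment of the parents, and each row stores only P(node = True | parents), since the False case is its complement. A small sketch counting rows for the alarm network (parents read off the joint-probability factorization in these slides) and comparing the total with the $2^n$ entries of the full joint table:

```python
# Parents of each node in the alarm network.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def cpt_rows(parents):
    """A node with k Boolean parents needs 2**k CPT rows."""
    return {node: 2 ** len(ps) for node, ps in parents.items()}

rows = cpt_rows(PARENTS)
print(rows)               # {'Burglary': 1, 'Earthquake': 1, 'Alarm': 4, 'JohnCalls': 2, 'MaryCalls': 2}
print(sum(rows.values())) # 10 numbers specify the whole network
print(2 ** len(PARENTS))  # 32 entries in the full joint table
```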
What the BN Means
$$P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(X_i))$$
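Calculation of Joint Probability
$$P(J \wedge M \wedge A \wedge \neg B \wedge \neg E) = P(J \mid A)\, P(M \mid A)\, P(A \mid \neg B, \neg E)\, P(\neg B)\, P(\neg E)$$
$$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$$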
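The factorization can be evaluated mechanically. In the minimal sketch below, P(B) = 0.001 and P(E) = 0.002 follow from P(¬B) = 0.999 and P(¬E) = 0.998, and P(A | ¬B, ¬E) = 0.001, P(J | A) = 0.9, P(M | A) = 0.7 are the numbers in the calculation. The remaining CPT rows are not in these slides; the values used are assumptions (the ones commonly quoted for this example in Russell and Norvig's textbook), and they do not affect this particular query.

```python
# P(X = True | values of X's parents) for each node of the alarm network.
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},                      # from P(not B) = 0.999
    "E": {(): 0.002},                      # from P(not E) = 0.998
    "A": {(True, True): 0.95,              # ASSUMED
          (True, False): 0.94,             # ASSUMED
          (False, True): 0.29,             # ASSUMED
          (False, False): 0.001},          # from the slide
    "J": {(True,): 0.90, (False,): 0.05},  # P(J | A) from the slide; P(J | not A) ASSUMED
    "M": {(True,): 0.70, (False,): 0.01},  # P(M | A) from the slide; P(M | not A) ASSUMED
}

def joint_probability(world):
    """P(x1, ..., xn) = product over i of P(xi | Parents(Xi))."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(world[p] for p in parents)]
        prob *= p_true if world[node] else 1.0 - p_true
    return prob

p = joint_probability({"J": True, "M": True, "A": True, "B": False, "E": False})
print(f"{p:.6f}")   # 0.000628
```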
What The BN Encodes
- Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm
- The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm
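These independences can be checked numerically by building the full joint distribution from the factorization and comparing conditional probabilities. In this sketch, CPT rows not given in the slides are assumed values (the commonly quoted textbook numbers); the independences themselves follow from the network structure alone, so they hold for any CPT values. The helper names `joint_probability` and `prob` are illustrative.

```python
from itertools import product

PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
# P(X = True | parents); rows not given in the slides are ASSUMED values.
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def joint_probability(world):
    """Product of P(xi | Parents(Xi)) over all nodes."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(world[p] for p in parents)]
        prob *= p_true if world[node] else 1.0 - p_true
    return prob

# All 2**5 = 32 possible worlds.
WORLDS = [dict(zip(PARENTS, values)) for values in product([True, False], repeat=len(PARENTS))]

def prob(event, given):
    """P(event | given), summing the factorized joint over every world."""
    num = sum(joint_probability(w) for w in WORLDS if given(w) and event(w))
    den = sum(joint_probability(w) for w in WORLDS if given(w))
    return num / den

checks = []
for a in (True, False):
    given_a = lambda w, a=a: w["A"] == a
    # JohnCalls and MaryCalls are independent given Alarm (or not Alarm) ...
    both = prob(lambda w: w["J"] and w["M"], given_a)
    separate = prob(lambda w: w["J"], given_a) * prob(lambda w: w["M"], given_a)
    checks.append(abs(both - separate) < 1e-12)
    # ... and JohnCalls is independent of Burglary given Alarm (or not Alarm).
    with_b = prob(lambda w: w["J"], lambda w, a=a: w["A"] == a and w["B"])
    checks.append(abs(with_b - prob(lambda w: w["J"], given_a)) < 1e-12)

print(checks)   # [True, True, True, True]
```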
Inference In BN
- Set E of evidence variables that are observed with a new probability distribution, e.g., {JohnCalls, MaryCalls}
- Query variable X, e.g., Burglary, for which we would like to know the posterior probability distribution $P(X \mid E)$
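The simplest (exponential-time) way to compute $P(X \mid E)$ is inference by enumeration: sum the factorized joint over all worlds consistent with the evidence, then normalize. This is a standard technique but not necessarily the one these slides go on to use. The sketch assumes hard evidence (JohnCalls and MaryCalls observed true) rather than a new distribution over them, uses assumed values for the CPT rows not given in the slides, and its function names are illustrative.

```python
from itertools import product

PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
# P(X = True | parents); rows not given in the slides are ASSUMED values.
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def joint_probability(world):
    """Product of P(xi | Parents(Xi)) over all nodes."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(world[p] for p in parents)]
        prob *= p_true if world[node] else 1.0 - p_true
    return prob

def enumerate_query(query, evidence):
    """Posterior P(query | evidence) by enumerating all worlds consistent with the evidence."""
    unnormalized = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(PARENTS)):
        world = dict(zip(PARENTS, values))
        if all(world[var] == val for var, val in evidence.items()):
            unnormalized[world[query]] += joint_probability(world)
    total = unnormalized[True] + unnormalized[False]
    return {val: p / total for val, p in unnormalized.items()}

posterior = enumerate_query("B", {"J": True, "M": True})
print(f"{posterior[True]:.3f}")   # 0.284 with the CPT values above
```

With these numbers, both calls raise the probability of a burglary from 0.001 to roughly 0.28.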
Inference Patterns
- Basic use of a BN: given new observations, compute the new strengths of some (or all) beliefs
- Other use: given the strength of a belief, which observation should we gather to make the greatest change in this belief's strength?
Applications
- http://excalibur.brc.uconn.edu/baynet/researchApps.html
- Medical diagnosis, e.g., lymph-node diseases
- Fraud/uncollectible debt detection
- Troubleshooting of hardware/software systems
Neural Networks
Function-Learning Formulation
- Goal function f
- Training set: $(x_i, f(x_i))$, $i = 1, \dots, n$
- Inductive inference: find a function h that fits the points well
- Issues:
  - Representation
  - Incremental learning
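As a minimal illustration of inductive inference, assume the hypothesis space is straight lines h(x) = a x + b (a representation choice, which is exactly the first issue listed) and pick the line that minimizes squared error on a small, made-up training set. Incremental learning is not addressed: the fit is recomputed from all points. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(points):
    """Least-squares fit of h(x) = a*x + b to training pairs (x_i, f(x_i))."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

# Hypothetical training set sampled from an unknown goal function f.
training = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
h = fit_line(training)
print(h(4.0))   # prediction of h at a point outside the training set
```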
Unit (Neuron)
$$y = g\left(\sum_{i=1}^{n} w_i x_i\right)$$
$$g(u) = \frac{1}{1 + \exp(-a u)}$$
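The unit's formula translates directly into code. A minimal sketch with hypothetical weights and inputs; it also shows that as the gain a grows, g approaches a 0/1 step function.

```python
import math

def g(u, a=1.0):
    """Sigmoid activation g(u) = 1 / (1 + exp(-a u)); a is the gain."""
    return 1.0 / (1.0 + math.exp(-a * u))

def unit(weights, inputs, a=1.0):
    """Output of one unit: y = g(sum_i w_i x_i)."""
    return g(sum(w * x for w, x in zip(weights, inputs)), a)

# Hypothetical weights and inputs: the weighted sum is 0.5 - 1.0 + 2.0 = 1.5.
y = unit([0.5, -1.0, 2.0], [1.0, 1.0, 1.0])
print(round(y, 3))   # 0.818

# A large gain makes g behave almost like a threshold at u = 0.
print(round(g(0.2, a=100), 3), round(g(-0.2, a=100), 3))
```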
Particular Case: Perceptron
Neural Network
- Network of interconnected neurons
- Acyclic (feed-forward) vs. recurrent networks
Two-Layer Feed-Forward Neural Network
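A forward pass through such a network can be sketched as follows, assuming sigmoid units with a = 1, no bias terms (matching the unit formula), and hypothetical weights: each hidden unit applies the unit formula to the inputs, and each output unit applies it to the hidden activations.

```python
import math

def g(u):
    """Sigmoid with gain a = 1 (assumed)."""
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, w_hidden, w_out):
    """Two-layer feed-forward pass: inputs -> hidden units -> output units.

    w_hidden[j] holds the weights from every input to hidden unit j;
    w_out[k] holds the weights from every hidden unit to output unit k.
    """
    hidden = [g(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [g(sum(w * h for w, h in zip(row, hidden))) for row in w_out]

# Hypothetical network: 2 inputs, 3 hidden units, 1 output unit.
W_HIDDEN = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.9]]
W_OUT = [[1.0, -1.5, 0.5]]
print(forward([1.0, 0.0], W_HIDDEN, W_OUT))
```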
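Backpropagation (Principle)
- New example: $Y_k = f(x_k)$
- Error function: $E(\mathbf{w}) = \lVert y_k - Y_k \rVert^2$, where $y_k$ is the network's output for $x_k$
- Gradient step: $w_{ij}(k) = w_{ij}(k-1) - \epsilon\, \partial E / \partial w_{ij}$
- Backpropagation: update the weights of the inputs to the last layer, then the weights of the inputs to the previous layer, etc.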
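A minimal sketch of this principle for one hidden layer and a single output unit, assuming sigmoid units with a = 1, $E = (y_k - Y_k)^2$ per example, and one gradient step per new example. Two assumptions go beyond the slide: a constant input of 1 is appended to each example to act as a bias, and the task (learning Boolean AND) and all parameter values are hypothetical. The hidden-layer error terms are computed before the output weights change, so both layers are updated with the gradient at the same point.

```python
import math
import random

def g(u):
    """Sigmoid unit with a = 1; its derivative is g(u) * (1 - g(u))."""
    return 1.0 / (1.0 + math.exp(-u))

def train(examples, n_hidden=3, epsilon=0.5, epochs=2000, seed=0):
    """Online backpropagation for inputs -> hidden layer -> one output unit."""
    rng = random.Random(seed)
    n_in = len(examples[0][0]) + 1  # +1 for the assumed constant bias input
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def forward(x):
        x = list(x) + [1.0]
        h = [g(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
        y = g(sum(w * hj for w, hj in zip(w_out, h)))
        return x, h, y

    def total_error():
        return sum((forward(x)[2] - target) ** 2 for x, target in examples)

    before = total_error()
    for _ in range(epochs):
        for example, target in examples:
            x, h, y = forward(example)
            # dE/du at the output for E = (y - Y)^2 and y = g(u): 2 (y - Y) y (1 - y)
            delta_out = 2 * (y - target) * y * (1 - y)
            # Error terms for the hidden units, using the current output weights
            delta_hid = [delta_out * w_out[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Weights of the inputs to the last layer first ...
            for j in range(n_hidden):
                w_out[j] -= epsilon * delta_out * h[j]
            # ... then the weights of the inputs to the previous layer
            for j in range(n_hidden):
                for i in range(n_in):
                    w_hid[j][i] -= epsilon * delta_hid[j] * x[i]
    return before, total_error(), forward

# Hypothetical training set: Boolean AND.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
before, after, predict = train(AND)
print(f"error before: {before:.3f}, after: {after:.3f}")
```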
Issues
- How to choose the size and structure of networks?
- If the network is too large, risk of over-fitting (data caching)
- If the network is too small, the representation may not be rich enough
- Role of representation: e.g., learn the concept of an odd number
What is AI?
- Discipline that systematizes and automates
intellectual tasks to create machines that
What Have We Learned?
- Collection of useful methods
- Connection between fields
- Relation between high-level (e.g., logic) and low-level (e.g., neural networks) representations
- Impact of hardware
- What is intelligence?
- Our techniques are better than our understanding