Cooperating Intelligent Systems - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Cooperating Intelligent Systems
  • Bayesian networks
  • Chapter 14, AIMA

2
Inference
  • Inference in the statistical setting means
    computing the probabilities of different outcomes,
    given the available information.
  • We need an efficient method for doing this that
    is more powerful than the naïve Bayes model.

3
Bayesian networks
  • A Bayesian network is a directed graph in which
    each node is annotated with quantitative
    probability information
  • A set of random variables, X1,X2,X3,..., makes
    up the nodes of the network.
  • A set of directed links connects pairs of nodes,
    parent → child.
  • Each node Xi has a conditional probability
    distribution P(Xi | Parents(Xi)).
  • The graph is a directed acyclic graph (DAG).
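As an illustration of this definition, here is a minimal sketch (Python; the variable names are ours, not from the slides) of storing a BN's structure as a node → parents map, with Kahn's algorithm checking the DAG requirement:

```python
from collections import deque

def is_dag(parents):
    """Verify that a parent map defines a directed acyclic graph (Kahn's algorithm)."""
    nodes = set(parents)
    for ps in parents.values():
        nodes.update(ps)
    indeg = {n: 0 for n in nodes}      # number of incoming edges (parents)
    children = {n: [] for n in nodes}
    for child, ps in parents.items():
        for p in ps:
            indeg[child] += 1
            children[p].append(child)
    queue = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(nodes)          # every node popped <=> no cycle

# the alarm network from this lecture, stored as node -> list of parents
alarm_parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
print(is_dag(alarm_parents))               # True
print(is_dag({"A": ["B"], "B": ["A"]}))    # False (a 2-cycle)
```

The CPTs would then be attached per node, keyed by parent values.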

4
The dentist network
Nodes: Cavity, Weather, Toothache, Catch
(Cavity is the parent of Toothache and Catch;
Weather is independent of the other variables.)
5
The alarm network
The burglar alarm responds to both earthquakes and
burglars. Two neighbors, John and Mary, have
promised to call you when the alarm goes
off. John always calls when there's an alarm,
and sometimes when there's not an alarm. Mary
sometimes misses the alarm (she likes loud
music).
Burglary
Earthquake
Alarm
JohnCalls
MaryCalls
6
The cancer network
From Breese and Koller 1997
7
The cancer network
P(A,G) = P(A)P(G)
Age
Gender
Smoking
Toxics
P(C|S,T,A,G) = P(C|S,T)
Cancer
GeneticDamage
SerumCalcium
LungTumour
P(A,G,T,S,C,SC,LT,GD) = P(A)P(G)P(T|A)P(S|A,G)
·P(C|T,S)P(GD)P(SC|C)·P(LT|C,GD)
P(SC,C,LT,GD) = P(SC|C)P(LT|C,GD)P(C)P(GD)
From Breese and Koller 1997
8
The product (chain) rule
(This is for Bayesian networks; the general case
comes later in this lecture.)
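The chain rule for a Bayesian network factorises the joint as P(x1, ..., xn) = Π_i P(xi | parents(Xi)). A minimal sketch with a hypothetical three-node boolean network A → C ← B (the CPT numbers are made up), checking that the product defines a valid joint distribution:

```python
from itertools import product

# Hypothetical boolean network A -> C <- B with made-up CPT numbers
P_A_true = 0.3                                   # P(A = true)
P_B_true = 0.6                                   # P(B = true)
P_C = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.7, (False, False): 0.1}  # P(C = true | A, B)

def joint(a, b, c):
    """Chain rule for this BN: P(a, b, c) = P(a) P(b) P(c | a, b)."""
    p = (P_A_true if a else 1 - P_A_true) * (P_B_true if b else 1 - P_B_true)
    pc = P_C[(a, b)]
    return p * (pc if c else 1 - pc)

# a valid joint distribution must sum to 1 over all assignments
total = sum(joint(a, b, c) for a, b, c in product((True, False), repeat=3))
print(abs(total - 1.0) < 1e-9)  # True
```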
9
Bayes network node is a function
A
B
a
b
C
P(C | a,b) = U(0.7, 1.9)
10
Bayes network node is a function
A
B
C
  • A BN node is a conditional distribution function:
  • Inputs: parent values
  • Output: distribution over values
  • Any type of function from values to distributions
    is allowed.

11
Example: The alarm network
Note: Each number in the tables represents a
boolean distribution. Hence there is a
distribution output for every input.
12
Example: The alarm network
Probability distribution for 'no earthquake, no
burglary, but alarm, and both Mary and John make
the call'
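This entry can be checked directly with the chain rule, using the CPT numbers of AIMA's alarm network:

```python
# CPT entries for the alarm network (numbers from AIMA, ch. 14)
P_B = 0.001   # P(Burglary)
P_E = 0.002   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls=true | Alarm)

# P(not e, not b, a, j, m) by the chain rule
p = (1 - P_E) * (1 - P_B) * P_A[(False, False)] * P_J[True] * P_M[True]
print(round(p, 7))  # 0.0006281
```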
13
Meaning of Bayesian network
The BN is a correct representation of the domain
iff each node is conditionally independent of
its predecessors, given its parents.
14
The alarm network
The fully correct alarm network might look
something like the figure. The Bayesian network
(red) assumes that some of the variables are
independent (or that the dependencies can be
neglected since they are very weak). The
correctness of the Bayesian network of course
depends on the validity of these assumptions.
It is this sparse connection structure that makes
the BN approach feasible (linear growth in
complexity rather than exponential).
15
How to construct a BN?
  • Add nodes in causal order (causal order
    determined from expertise).
  • Determine conditional independence using either
    (or all) of the following semantics:
  • Blocking/d-separation rule
  • Non-descendant rule
  • Markov blanket rule
  • Experience/your beliefs

16
Path blocking / d-separation
  • Intuitively, knowledge about Serum Calcium
    influences our belief about Cancer if we don't
    know the value of Cancer, which in turn
    influences our belief about Lung Tumour, etc.
  • However, if we are given the value of Cancer
    (i.e. C = true or false), then knowledge of Serum
    Calcium will not tell us anything about Lung
    Tumour that we don't already know.
  • We say that Cancer d-separates (direction-dependent
    separation) Serum Calcium and Lung Tumour.

17
Some definitions of BN(from Wikipedia)
  • X is a Bayesian network with respect to G if its
    joint probability density function can be written
    as a product of the individual density functions,
    conditional on their parent variables:

X = {X1, X2, ..., XN} is a set of random
variables. G = (V, E) is a directed acyclic graph
(DAG) with vertices (V) and edges (E).
18
Some definitions of BN(from Wikipedia)
  • X is a Bayesian network with respect to G if it
    satisfies the local Markov property each
    variable is conditionally independent of its
    non-descendants given its parent variables

Note:
X = {X1, X2, ..., XN} is a set of random
variables. G = (V, E) is a directed acyclic graph
(DAG) with vertices (V) and edges (E).
19
Non-descendants
  • A node is conditionally independent of its
    non-descendants (Zij), given its parents.

20
Some definitions of BN(from Wikipedia)
  • X is a Bayesian network with respect to G if
    every node is conditionally independent of all
    other nodes in the network, given its Markov
    blanket. The Markov blanket of a node is its
    parents, children and children's parents.

X = {X1, X2, ..., XN} is a set of random
variables. G = (V, E) is a directed acyclic graph
(DAG) with vertices (V) and edges (E).
21
Markov blanket
X2
X3
X1
X4
  • A node is conditionally independent of all other
    nodes in the network, given its parents,
    children, and children's parents.
  • These constitute the node's Markov blanket.

X5
X6
Xk
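The blanket definition reads off directly from a node → parents map; a small sketch (the function name is ours), tested on the alarm network:

```python
def markov_blanket(node, parents):
    """Markov blanket = parents + children + children's other parents."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(node)   # a node is not in its own blanket
    return blanket

alarm_parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
# Alarm's blanket is everything; Burglary's blanket is Alarm plus
# Alarm's other parent, Earthquake (explaining away).
print(sorted(markov_blanket("Alarm", alarm_parents)))
print(sorted(markov_blanket("Burglary", alarm_parents)))
```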
22
Path blocking / d-separation
Xi and Xj are d-separated if all paths between
them are blocked.
  • Two nodes Xi and Xj are conditionally independent
    given a set W = {X1, X2, X3, ...} of nodes if for
    every undirected path in the BN between Xi and Xj
    there is some node Xk on the path having one of
    the following three properties:
  • Xk ∈ W, and both arcs on the path lead out of
    Xk.
  • Xk ∈ W, and one arc on the path leads into Xk
    and one arc leads out.
  • Neither Xk nor any descendant of Xk is in W, and
    both arcs on the path lead into Xk.
  • Such an Xk blocks the path between Xi and Xj.
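The three blocking rules can be sketched as a checker for a single explicit path (function names are ours; a full d-separation test would enumerate all paths):

```python
def _children(node, parents):
    return [c for c, ps in parents.items() if node in ps]

def _descendants(node, parents):
    out, stack = set(), [node]
    while stack:
        for c in _children(stack.pop(), parents):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_blocked(path, W, parents):
    """Apply the three blocking rules to each interior node Xk of an undirected path."""
    W = set(W)
    for i in range(1, len(path) - 1):
        prev, xk, nxt = path[i - 1], path[i], path[i + 1]
        in1 = prev in parents[xk]   # arc on the prev side points into Xk
        in2 = nxt in parents[xk]    # arc on the next side points into Xk
        if xk in W and not in1 and not in2:
            return True             # rule 1: Xk in W, both arcs lead out of Xk
        if xk in W and in1 != in2:
            return True             # rule 2: Xk in W, one arc in, one out
        if in1 and in2 and xk not in W and not (_descendants(xk, parents) & W):
            return True             # rule 3: collider, Xk and its descendants outside W
    return False

alarm_parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
# Burglary-Alarm-Earthquake is blocked by the collider Alarm when W is empty...
print(path_blocked(["Burglary", "Alarm", "Earthquake"], set(), alarm_parents))      # True
# ...but unblocked ("explaining away") once Alarm is observed
print(path_blocked(["Burglary", "Alarm", "Earthquake"], {"Alarm"}, alarm_parents))  # False
```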

23
Some definitions of BN(from Wikipedia)
  • X is a Bayesian network with respect to G if, for
    any two nodes i and j, Xi and Xj are conditionally
    independent given their d-separating set.

The d-separating set (i, j) is the set of nodes
that d-separates nodes i and j.
The Markov blanket of node i is the minimal set
of nodes that d-separates node i from all other
nodes.
X = {X1, X2, ..., XN} is a set of random
variables. G = (V, E) is a directed acyclic graph
(DAG) with vertices (V) and edges (E).
24
Causal networks
  • Bayesian networks are usually used to represent
    causal relationships. This is, however, not
    strictly necessary: a directed edge from node i
    to node j does not require that Xj is causally
    dependent on Xi.
  • This is demonstrated by the fact that Bayesian
    networks on the two graphs are equivalent:
    they impose the same conditional independence
    requirements.

A causal network is a Bayesian network with an
explicit requirement that the relationships be
causal.
25
Causal networks
The equivalence is proved with Bayes' theorem...
26
Exercise 14.3 (a) in AIMA
  • Two astronomers in different parts of the world
    make measurements M1 and M2 of the number of
    stars N in some small region of the sky, using
    their telescopes. Normally there is a small
    possibility e of error of up to one star in each
    direction. Each telescope can also (with a much
    smaller probability f) be badly out of focus
    (events F1 and F2), in which case the scientist
    will undercount by three or more stars (or, if N
    is less than 3, fail to detect any stars at all).
    Consider the three networks in Figure 14.19.
  • (a) Which of these Bayesian networks are correct
    (but not necessarily efficient) representations
    of the preceding information?

27
Exercise 14.3 (a) in AIMA
(Problem statement repeated; the figure shows the
candidate networks over F1, F2, N, M1, M2.)
28
Exercise 14.3 (a) in AIMA
(Problem statement repeated.)

29
Exercise 14.3 (a) in AIMA
  • (i) must be incorrect: N is d-separated from F1
    and F2 given M, i.e. knowing the focus states F
    does not affect N if we know M. This cannot be
    correct.

(i): wrong
30
Exercise 14.3 (a) in AIMA
(Problem statement repeated; network (i) marked
wrong.)
31
Exercise 14.3 (a) in AIMA
  • (ii) is correct: it describes the causal
    relationships. It is a causal network.

(i): wrong
(ii): ok
32
Exercise 14.3 (a) in AIMA
(Problem statement repeated; network (i) marked
wrong, network (ii) marked ok.)
33
Exercise 14.3 (a) in AIMA
  • (iii) is also ok: a fully connected graph would
    be correct (but not efficient). (iii) has all
    connections except Mi-Fj and Fi-Fj. (iii) is not
    causal and not efficient.

(i): wrong
(ii): ok
(iii): ok, but not good
34
Efficient representation of PDs
A
C
P(C | a,b) = ?
  • Boolean → Boolean
  • Boolean → Discrete
  • Boolean → Continuous
  • Discrete → Boolean
  • Discrete → Discrete
  • Discrete → Continuous
  • Continuous → Boolean
  • Continuous → Discrete
  • Continuous → Continuous

B
35
Efficient representation of PDs
  • Boolean → Boolean: Noisy-OR, Noisy-AND
  • Boolean/Discrete → Discrete: Noisy-MAX
  • Bool./Discr./Cont. → Continuous: Parametric
    distribution (e.g. Gaussian)
  • Continuous → Boolean: Logit/Probit

36
Noisy-OR example: Boolean → Boolean
The effect (E) is off (false) when none of the
causes are true. The probability of the effect
increases with the number of true causes.
(for this example)
Example from L.E. Sucar
37
Noisy-OR general case: Boolean → Boolean
Example on the previous slide used qi = 0.1 for all i.
Needs only n parameters, not 2^n parameters.
Image adapted from Laskey & Mahoney 1999
38
Noisy-OR example (II)
  • Fever is true if and only if Cold, Flu or Malaria
    is true.
  • Each cause has an independent chance of causing
    the effect.
  • All possible causes are listed.
  • Inhibitors are independent.

Cold
Flu
Malaria
q1
q2
q3
Fever
39
Noisy-OR example (II)
  • P(Fever | Cold) = 0.4 ⇒ q1 = 0.6
  • P(Fever | Flu) = 0.8 ⇒ q2 = 0.2
  • P(Fever | Malaria) = 0.9 ⇒ q3 = 0.1

Cold
Flu
Malaria
q1 = 0.6
q2 = 0.2
q3 = 0.1
Fever
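The noisy-OR combination can be sketched directly: with the inhibitor probabilities q_i from the slide, P(Fever | causes) = 1 − Π q_i over the causes that are true (function name is ours):

```python
def noisy_or(q, active):
    """Noisy-OR: P(effect | causes) = 1 - product of q_i over the true causes."""
    p_off = 1.0
    for cause, qi in q.items():
        if active.get(cause, False):
            p_off *= qi
    return 1.0 - p_off

q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}   # inhibitor probabilities from the slide
print(round(noisy_or(q, {"Cold": True}), 3))                                # 0.4
print(round(noisy_or(q, {"Cold": True, "Flu": True}), 3))                   # 0.88
print(round(noisy_or(q, {"Cold": True, "Flu": True, "Malaria": True}), 3))  # 0.988
```

Note the model needs only the three q_i, not a full 2^3-entry CPT.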
43
Parametric probability densities: Bool./Discr./Continuous → Continuous
  • Use parametric probability densities, e.g., the
    normal distribution

Gaussian networks (a = input to the node)
44
Probit / Logit: Continuous → Boolean
  • If the input is continuous but the output is
    boolean, use probit or logit

P(A | x)
x
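The logit choice is the logistic curve; a minimal sketch with hypothetical threshold (mu) and steepness (sigma) parameters:

```python
import math

def logistic(x, mu=0.0, sigma=1.0):
    """P(A = true | x): soft threshold at mu; sigma controls how sharp it is."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / sigma))

print(logistic(0.0))         # 0.5 exactly at the threshold
print(logistic(4.0) > 0.95)  # True: well above the threshold -> nearly certain
```

Probit replaces the logistic curve with the Gaussian CDF; both map a continuous input to a boolean probability.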
45
The cancer network
  • Age: 1-10, 11-20, ... (discrete)
  • Gender: M, F (discrete/boolean)
  • Toxics: Low, Medium, High (discrete)
  • Smoking: No, Light, Heavy (discrete)
  • Cancer: No, Benign, Malignant (discrete)
  • Serum Calcium: level (continuous)
  • Lung Tumour: Yes, No (discrete/boolean)
46
Inference in BN
  • Inference means computing P(X | e), where X is a
    query (variable) and e is a set of evidence
    variables (for which we know the values).
  • Examples:
  • P(Burglary | john_calls, mary_calls)
  • P(Cancer | age, gender, smoking, serum_calcium)
  • P(Cavity | toothache, catch)

47
Exact inference in BN
  • Doable for boolean variables: look up entries
    in conditional probability tables (CPTs).

48
Example: The alarm network
What is the probability of a burglary if both
John and Mary call?
Evidence variables: J, M. Query variable: B.
49
Example: The alarm network
What is the probability of a burglary if both
John and Mary call?
0.001 = 10^-3
50
Example: The alarm network
What is the probability for a burglary if both
John and Mary call?
51
Example: The alarm network
What is the probability for a burglary if both
John and Mary call?
52
Example: The alarm network
What is the probability for a burglary if both
John and Mary call?
Answer: ≈ 28%
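The computation on these slides can be reproduced by enumeration: sum the joint (chain rule, CPT numbers from AIMA's alarm network) over the hidden variables E and A with J = M = true, then normalise over B:

```python
from itertools import product

# CPT numbers from AIMA's alarm network
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """Chain rule: P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    pa = P_A[(b, e)]
    p *= pa if a else 1 - pa
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# sum out the hidden variables E and A, then normalise over B
num = sum(joint(True, e, a, True, True) for e, a in product((True, False), repeat=2))
den = num + sum(joint(False, e, a, True, True) for e, a in product((True, False), repeat=2))
print(round(num / den, 3))  # 0.284
```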
53
Use depth-first search
A lot of unnecessary repeated computation...
54
Complexity of exact inference
  • By eliminating repeated calculations and
    uninteresting paths we can speed up the inference
    a lot.
  • Linear time complexity for singly connected
    networks (polytrees).
  • Exponential for multiply connected networks.
  • Clustering can improve this.

55
Approximate inference in BN
  • Exact inference is intractable in large multiply
    connected BNs ⇒ use approximate inference:
    Monte Carlo methods (random sampling).
  • Direct sampling
  • Rejection sampling
  • Likelihood weighting
  • Markov chain Monte Carlo
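One of the listed methods, rejection sampling, can be sketched for the alarm network: draw samples from the prior (CPT numbers from AIMA), discard those inconsistent with the evidence J = M = true, and estimate P(B | j, m) from the rest. Function names are ours; note how wasteful this is when the evidence is unlikely (only about 0.2% of samples survive here):

```python
import random

# CPT numbers from AIMA's alarm network
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def sample_alarm(rng):
    """Forward (prior) sampling in topological order."""
    b = rng.random() < 0.001                    # Burglary
    e = rng.random() < 0.002                    # Earthquake
    a = rng.random() < P_A[(b, e)]              # Alarm
    j = rng.random() < (0.90 if a else 0.05)    # JohnCalls
    m = rng.random() < (0.70 if a else 0.01)    # MaryCalls
    return b, j, m

def rejection_estimate(n=500_000, seed=1):
    """Estimate P(Burglary | j, m): keep only samples where both call."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        b, j, m = sample_alarm(rng)
        if j and m:
            kept += 1
            hits += b
    return hits / kept if kept else None

print(rejection_estimate())   # noisy estimate of the exact value, ~0.284
```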

56
Markov chain Monte Carlo
  • Fix the evidence variables (E1, E2, ...) at their
    given values.
  • Initialize the network with values for all other
    variables, including the query variable.
  • Repeat the following many, many, many times:
  • Pick a non-evidence variable at random (query Xi
    or hidden Yj).
  • Select a new value for this variable, conditioned
    on the current values in the variable's Markov
    blanket.
  • Monitor the values of the query variables.
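The loop above, specialised to the alarm network with evidence J = M = true, is Gibbs sampling: each non-evidence variable is resampled from its distribution given its Markov blanket. A minimal sketch (function name is ours; CPT numbers from AIMA; the exact answer is ≈ 0.284):

```python
import random

# CPT numbers from AIMA's alarm network
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def gibbs_burglary(n_samples=200_000, seed=0):
    """Estimate P(Burglary | JohnCalls=true, MaryCalls=true) by Gibbs sampling."""
    rng = random.Random(seed)
    b = e = a = False          # arbitrary initial state; J = M = true is fixed evidence
    count_b = 0
    for _ in range(n_samples):
        # resample B from P(B | blanket), proportional to P(B) P(A | B, E)
        w = {bv: (P_B if bv else 1 - P_B) *
                 (P_A[(bv, e)] if a else 1 - P_A[(bv, e)])
             for bv in (True, False)}
        b = rng.random() < w[True] / (w[True] + w[False])
        # resample E from P(E | blanket), proportional to P(E) P(A | B, E)
        w = {ev: (P_E if ev else 1 - P_E) *
                 (P_A[(b, ev)] if a else 1 - P_A[(b, ev)])
             for ev in (True, False)}
        e = rng.random() < w[True] / (w[True] + w[False])
        # resample A from P(A | blanket), proportional to P(A | B, E) P(j | A) P(m | A)
        w = {av: (P_A[(b, e)] if av else 1 - P_A[(b, e)]) * P_J[av] * P_M[av]
             for av in (True, False)}
        a = rng.random() < w[True] / (w[True] + w[False])
        count_b += b
    return count_b / n_samples

print(gibbs_burglary())
```

Because each update conditions only on the Markov blanket, every step is cheap regardless of network size; the price is correlated samples and slow mixing in some networks.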