Connectionist Computing COMP 30230

1
Connectionist Computing COMP 30230

Gianluca Pollastri
office 2nd floor, UCD CASL
email gianluca.pollastri_at_ucd.ie

2
Credits

Geoffrey Hinton, University of Toronto.
borrowed some of his slides for Neural Networks
and Computation in Neural Networks courses.
Ronan Reilly, NUI Maynooth.
slides from his CS4018.
Paolo Frasconi, University of Florence.
slides from tutorial on Machine Learning for
structured domains.

3
Lecture notes

http://gruyere.ucd.ie/2009_courses/30230/
Strictly confidential...

4
Books

No book covers large fractions of this course.
Parts of chapters 4, 6, (7), 13 of Tom Mitchell's
Machine Learning.
Parts of chapter V of MacKay's Information
Theory, Inference, and Learning Algorithms,
available online at
http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html
Chapter 20 of Russell and Norvig's Artificial
Intelligence: A Modern Approach, also available
at
http://aima.cs.berkeley.edu/newchap20.pdf
More materials later..

5
Make a Boltzmann Machine

http://gruyere.ucd.ie/2009_courses/30230/boltzmann.doc
Due on March 6th
30!
-5 every day late

6
d-separation

Two variables A and B are d-separated given the
evidence e if, for every path between A and B, there
is an intermediate variable V such that either
the connection is serial or diverging and V is
instantiated by e, or
the connection is converging and neither V nor
any of V's descendants have received evidence.

7
P(U)

If we can store P(U), it is easy to update
our belief for all the variables composing
U when we receive evidence. This is done by:
inserting evidence;
marginalising.
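
The two steps above can be sketched on a tiny joint table. This is only an illustration: the variables Rain and Wet, the numbers, and the helper names insert_evidence and marginalise are assumptions, not part of the lecture.

```python
# Hypothetical joint P(Rain, Wet) over two binary variables; numbers are made up.
variables = ("Rain", "Wet")
joint = {
    (0, 0): 0.63, (0, 1): 0.07,
    (1, 0): 0.03, (1, 1): 0.27,
}

def insert_evidence(table, variables, var, value):
    """Zero out every configuration that contradicts the finding var=value."""
    i = variables.index(var)
    return {cfg: (p if cfg[i] == value else 0.0) for cfg, p in table.items()}

def marginalise(table, variables, var):
    """Sum the table down to a single variable and normalise."""
    i = variables.index(var)
    out = {}
    for cfg, p in table.items():
        out[cfg[i]] = out.get(cfg[i], 0.0) + p
    total = sum(out.values())
    return {state: p / total for state, p in out.items()}

# Belief in Rain after observing Wet = 1.
posterior_rain = marginalise(
    insert_evidence(joint, variables, "Wet", 1), variables, "Rain"
)
```

The table has one entry per configuration of U, so this approach only works while P(U) is small enough to store.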

8
Bayesian Networks

A BN consists of:
a set of variables/nodes and a set of directed
edges between nodes;
each variable has a finite set of mutually
exclusive states;
the overall graph is a DAG;
to each node A with parents B1, B2, ..., Bn there
is attached a conditional probability table (CPT)
P(A | B1, B2, ..., Bn).
Each variable is independent of its
non-descendants given its parents.

9
Chain rule for BN

Let BN be a Bayesian Network over U = {A1, ...,
An}. The joint probability distribution P(U) is
the product of the conditional probabilities that
label the BN.
If pa(Ai) = parents of Ai, then
P(U) = \prod_{i=1}^{n} P(A_i \mid pa(A_i))
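
As a rough illustration of the chain rule, the sketch below multiplies one CPT entry per variable to obtain each entry of P(U). The network A -> B, A -> C, its numbers, and the names parents, cpt and joint_probability are all made up for the example.

```python
from itertools import product

# Hypothetical network A -> B, A -> C with binary states; numbers are made up.
parents = {"A": (), "B": ("A",), "C": ("A",)}
# cpt[X][(parent states...)][state of X] = P(X = state | parents)
cpt = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
    "C": {(0,): [0.7, 0.3], (1,): [0.5, 0.5]},
}
order = ("A", "B", "C")

def joint_probability(assignment):
    """P(U = assignment) as the product of one CPT entry per variable."""
    p = 1.0
    for var in order:
        pa_states = tuple(assignment[q] for q in parents[var])
        p *= cpt[var][pa_states][assignment[var]]
    return p

# Full joint table, one entry per configuration of (A, B, C).
joint = {
    states: joint_probability(dict(zip(order, states)))
    for states in product((0, 1), repeat=len(order))
}
```

The three CPTs hold 2 + 4 + 4 numbers, while the joint table they define has 8 entries; the gap widens quickly as variables are added.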

10
Probability updates in BN

Now P(U) is factorised into a set of smaller
tables.
Given evidence e, how can we update P(A) to
P(A | e) for each variable in U, using the BN, i.e.
how can we insert findings and marginalise?

11
Marginalising in BN

What we want to do is
P(A) = \sum_{U \setminus \{A\}} P(U)
Which means
P(A) = \sum_{U \setminus \{A\}} \prod_{i} P(A_i \mid pa(A_i))

12
distributing

In
\sum_{U \setminus \{A\}} \prod_{i} P(A_i \mid pa(A_i))
we want to distribute the sums over the product so
that we perform the smallest possible number of operations.
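
A small sketch of what distributing the sums buys, assuming a hypothetical chain A1 -> A2 -> A3 with made-up numbers. Both functions compute P(A3); the second pushes the sum over A1 inside the product, so it never forms the full joint.

```python
from itertools import product

# Hypothetical chain A1 -> A2 -> A3 with binary states; numbers are made up.
p_a1 = [0.3, 0.7]                           # P(A1)
p_a2_given_a1 = [[0.9, 0.1], [0.4, 0.6]]    # P(A2 | A1), indexed [a1][a2]
p_a3_given_a2 = [[0.8, 0.2], [0.5, 0.5]]    # P(A3 | A2), indexed [a2][a3]

def naive_marginal(a3):
    """Sum the full product over every (a1, a2) pair."""
    return sum(
        p_a1[a1] * p_a2_given_a1[a1][a2] * p_a3_given_a2[a2][a3]
        for a1, a2 in product((0, 1), repeat=2)
    )

def distributed_marginal(a3):
    """Push the sum over a1 inside: first build P(A2), then sum over a2."""
    p_a2 = [
        sum(p_a1[a1] * p_a2_given_a1[a1][a2] for a1 in (0, 1))
        for a2 in (0, 1)
    ]
    return sum(p_a2[a2] * p_a3_given_a2[a2][a3] for a2 in (0, 1))
```

On a chain of n variables the naive sum grows exponentially in n, while the distributed version only ever handles tables over two neighbouring variables.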

13
example
(Figure: tables over the domains {A1, A2, A3, A4}, {A3, A4, A5}, {A4, A5}, {A4, A5} and {A3, A4}.)
14
marginalisation by elimination

We now know that we can marginalise a probability
distribution wrt a variable (or set of variables)
by successive eliminations.
Not all elimination sequences carry the same
complexity.
Now the task is finding the best elimination
sequence.

15
Domain graph

We say that two variables are members of the same
domain if they appear in the same conditional
probability table of a BN.
The domain graph for a set of variables is a
graph with one node for each variable and an
undirected edge between any two variables that
are members of the same domain.
This is sometimes also called the moralised graph
(or moral graph) of the BN.
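
A minimal sketch of building the domain graph from the parent sets of a BN. The function name domain_graph and the three-node example are assumptions made for illustration.

```python
from itertools import combinations

def domain_graph(parents):
    """Return an adjacency dict linking variables that share a CPT.

    `parents` maps each variable to the tuple of its parents (every parent
    must also appear as a key), so the CPT of X has domain {X} + parents[X].
    """
    adj = {v: set() for v in parents}
    for child, pas in parents.items():
        for u, v in combinations((child, *pas), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Hypothetical network: A and B are both parents of C.
example = domain_graph({"A": (), "B": (), "C": ("A", "B")})
```

In the example, A and B end up linked even though the BN has no edge between them, because both appear in the table P(C | A, B).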

16
example
(Figure: domain graph over the nodes A, B, C, D, E, F, G.)
17
elimination from a domain graph

We eliminate a variable A from a domain graph G
by the following procedure:
add a link (fill-in) between any two neighbours
of A that are not already linked;
remove A.
The new domain graph is called G-A. It can be
shown that G-A is the domain graph for P(U\A).
This means that we can perform variable
elimination on the domain graph.
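
The procedure can be sketched on an adjacency-set representation. The function name eliminate and the star-shaped example graph are assumptions, not the graph shown in the lecture.

```python
from itertools import combinations

def eliminate(adj, node):
    """Return (new_graph, fill_ins) after eliminating `node` from `adj`.

    `adj` is an adjacency dict {variable: set of neighbours}; it is not modified.
    """
    new_adj = {v: set(nbrs) for v, nbrs in adj.items() if v != node}
    fill_ins = []
    # Link every pair of neighbours of `node` that is not linked yet.
    for u, v in combinations(sorted(adj[node]), 2):
        if v not in new_adj[u]:
            fill_ins.append((u, v))
            new_adj[u].add(v)
            new_adj[v].add(u)
    # Remove the node itself from the remaining neighbour sets.
    for nbrs in new_adj.values():
        nbrs.discard(node)
    return new_adj, fill_ins

# Hypothetical star graph: A is linked to B, C and D, which are not linked to each other.
star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
after_a, fills_a = eliminate(star, "A")   # eliminating the hub links B, C, D pairwise
after_b, fills_b = eliminate(star, "B")   # eliminating a leaf needs no fill-in
```

Eliminating the hub first adds three fill-ins, while eliminating a leaf adds none, so the choice of which variable to eliminate already changes the resulting graph.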

18
example eliminate A
(Figure: the domain graph over A, B, C, D, E, F, G, and the graph over B, C, D, E, F, G obtained by eliminating A.)
19
example eliminate B
(Figure: the graph over B, C, D, E, F, G, and the graph over C, D, E, F, G obtained by eliminating B.)
20
example eliminate C
(Figure: the graph over C, D, E, F, G, and the graph over D, E, F, G obtained by eliminating C.)
21
example eliminate C first
(Figure: the domain graph over A, B, C, D, E, F, G, and the graph over A, B, D, E, F, G obtained by eliminating C first.)
order matters!
22
induced graph

Given a graph G and an elimination order s, we
call induced graph (or s-completion of G) the graph
G^s obtained by augmenting G with all the fill-ins
associated with s.

23
triangulated graph

An undirected graph G is triangulated if every
cycle with more than three links has a chord (a
link connecting two nodes that are not neighbours in
the cycle).
A graph G' is said to be a triangulation of G if
G' is triangulated, and G is a subgraph of G'
over the same nodes.
Any s-completion of G is a triangulation of G.
A graph is triangulated if, and only if, it has
an elimination sequence without fill-ins.
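
The last statement suggests a simple test, sketched below: keep eliminating any node whose neighbours are already pairwise linked, and report failure if none exists. The function name is_triangulated and the two example graphs are assumptions made for illustration.

```python
from itertools import combinations

def is_triangulated(adj):
    """True if the nodes can all be eliminated without adding any fill-in."""
    g = {v: set(n) for v, n in adj.items()}
    while g:
        # A node needs no fill-in when its neighbours are already pairwise linked.
        candidate = next(
            (v for v in g
             if all(b in g[a] for a, b in combinations(g[v], 2))),
            None,
        )
        if candidate is None:
            return False
        for n in g.pop(candidate):
            g[n].discard(candidate)
    return True

# Hypothetical 4-cycle A-B-C-D-A, without and with the chord A-C.
square = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
square_with_chord = {
    "A": {"B", "C", "D"}, "B": {"A", "C"},
    "C": {"A", "B", "D"}, "D": {"A", "C"},
}
```

The plain 4-cycle fails the test because every node has two unlinked neighbours; adding the chord A-C makes it pass.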

24
induced graph and cliques

Every time we eliminate a node A (eliminate a
variable A) we create a clique, i.e. a fully
connected subgraph of G containing all neighbours
of A.
Every maximal clique in an induced
graph corresponds to an intermediate factor in the
computations.
Every factor stored during the process is a
subset of some maximal clique in the graph.

25
example
Elimination order A, B, C, D, E, F, G. No
fill-ins needed.
(Figure: graph over the nodes A, B, C, D, E, F, G.)
26
induced width

The size of the largest clique in the induced
graph is an indicator of the complexity of
variable elimination.
This quantity (conventionally, the largest clique
size minus one) is called the induced width of a
graph according to the specified ordering.
Finding a good ordering for a graph is equivalent
to finding the minimal induced width of the graph.
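
A sketch of how the largest clique created by a given ordering could be measured. The function name largest_clique_size and the star graph are assumptions; the function returns the clique size itself, and texts that define the width as that size minus one would subtract 1 from the result.

```python
def largest_clique_size(adj, order):
    """Size of the largest clique created when eliminating nodes in `order`."""
    g = {v: set(n) for v, n in adj.items()}
    largest = 0
    for node in order:
        nbrs = g.pop(node)
        largest = max(largest, len(nbrs) + 1)   # the node plus its current neighbours
        for u in nbrs:
            g[u].discard(node)
            g[u] |= nbrs - {u}                  # fill-ins among the neighbours
    return largest

# Hypothetical star graph with hub A.
star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
leaves_first = largest_clique_size(star, ["B", "C", "D", "A"])
hub_first = largest_clique_size(star, ["A", "B", "C", "D"])
```

On the same graph, eliminating the leaves first never creates a clique of more than two nodes, while eliminating the hub first creates a clique over all four.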

27
Elimination on Trees

Suppose we have a tree:
a network where each variable has at most one
parent.
All the factors involve at most two variables.
Thus, the domain graph is also a tree.

28
Elimination on Trees

We can maintain the tree structure by eliminating
extreme variables (leaves) in the tree.

(Figure: a tree over the variables A, B, C, D, E, F, G.)
29
Elimination on Trees

Formally, for any tree, there is an elimination
ordering with induced width 1
Inference on trees is linear in number of
variables

30
PolyTrees

A polytree is a network where there is at most
one (undirected) path between any two variables.
Inference in a polytree is linear in the
representation size of the network.
This assumes tabular CPT representation
Check it if you wish..

31
General Networks

What do we do when the network is not a polytree?
If the network has a cycle, the induced width for any
ordering is greater than 1.

32
Example

Eliminating A, B, C, D, E, ...

33
Example

Eliminating H, G, E, C, F, D, E, A

(Figure: graph over the nodes A, B, C, D, E, F, G, H.)
34
General Networks

It can be shown that finding an ordering that
minimises the induced width is NP-hard.
However:
there are reasonable heuristics for finding good
orderings;
there are provable approximations to the best
induced width;
if the graph has a small induced width, there are
algorithms that find it in polynomial time.
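
One common greedy heuristic, sketched here as an example, always eliminates the node that currently needs the fewest fill-ins. It is not necessarily the heuristic the lecture has in mind, it carries no optimality guarantee, and the names fill_in_count and min_fill_order are assumptions.

```python
from itertools import combinations

def fill_in_count(g, v):
    """Number of links that eliminating v would add to graph g."""
    return sum(1 for a, b in combinations(g[v], 2) if b not in g[a])

def min_fill_order(adj):
    """Greedy ordering: always eliminate the node needing the fewest fill-ins."""
    g = {v: set(n) for v, n in adj.items()}
    order = []
    while g:
        # sorted() makes tie-breaking deterministic (alphabetical here).
        v = min(sorted(g), key=lambda x: fill_in_count(g, x))
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
            g[u] |= nbrs - {u}
        order.append(v)
    return order

# Hypothetical star graph with hub A.
star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
order = min_fill_order(star)
```

On the star graph the heuristic postpones the hub until it has at most one neighbour left, so no fill-in is added.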

35
Junction tree

If G is a triangulated graph, and C1, ..., Ck are
its cliques, a junction tree for G is a tree
whose nodes are C1, ..., Ck
such that each node on the path between Ci and Cj
contains Ci ∩ Cj.
The edge between two nodes is labelled with the
intersection between the nodes.
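
The defining property can be checked directly on a small candidate tree. This is a sketch under assumptions: the function names, the clique names C1..C3 and their contents are made up, and the edge list is assumed to form a connected tree.

```python
def has_junction_property(cliques, edges):
    """Check that every clique on the tree path Ci..Cj contains Ci & Cj.

    `cliques` maps a clique name to a frozenset of variables; `edges` is a
    list of (name, name) pairs assumed to form a connected tree.
    """
    nbrs = {c: set() for c in cliques}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)

    def path(src, dst, seen=()):
        # Depth-first search; in a tree the path found is the only one.
        if src == dst:
            return [src]
        for n in nbrs[src]:
            if n not in seen:
                rest = path(n, dst, seen + (src,))
                if rest:
                    return [src] + rest
        return None

    return all(
        cliques[a] & cliques[b] <= cliques[c]
        for a in cliques for b in cliques
        for c in path(a, b)
    )

def separators(cliques, edges):
    """Label each edge with the intersection of the cliques it joins."""
    return {(a, b): cliques[a] & cliques[b] for a, b in edges}

# Hypothetical cliques of a triangulated graph over A..E.
cliques = {
    "C1": frozenset("ABC"),
    "C2": frozenset("BCD"),
    "C3": frozenset("CDE"),
}
good_tree = [("C1", "C2"), ("C2", "C3")]
bad_tree = [("C1", "C3"), ("C3", "C2")]
```

The chain C1 - C2 - C3 satisfies the property, while putting C3 between C1 and C2 violates it, since C3 does not contain B.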