Title: Cause and Independence
1. Lecture 7
2. Cause in trees
- We noted previously that cause can be found by identifying the root nodes of the network (an expert is required for this).
- However, it is also possible (in theory) to determine cause statistically.
3. Possible configurations for a triplet
4. Conditional Independence
- Remember that, for configurations of type 1 and type 2, the nodes A and B are conditionally independent (given C).
5. Marginal Independence
- However, for the type 3 triplet, the nodes A and B are only independent if there is no information on C.
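As a brief illustration (a standard result about the triplet configurations, not taken verbatim from the slides): in the serial and diverging configurations conditioning on C separates A from B, while in the converging configuration (both arrows pointing into C) A and B are independent only when C is unobserved.

```latex
% Types 1 and 2 (e.g. A -> C -> B or A <- C -> B): independent given C
P(A, B \mid C) = P(A \mid C)\, P(B \mid C)

% Type 3 (A -> C <- B): independent only marginally
P(A, B) = P(A)\, P(B), \qquad P(A, B \mid C) \neq P(A \mid C)\, P(B \mid C) \text{ in general}
```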
6. Determining marginal independence
- Given a data set for our triplet A-C-B we can measure the dependence of A and B using all the data (i.e. with no information on C).
- If this is low we may suspect a multiple parent.
7. Determining marginal independence
- Alternatively, we can partition our data according to the states of C, and then compute a set of dependency values (one for each state of C).
- If any of these is high we may suspect a multiple parent.
8. Practical Computation
- Partition the data according to the states of the middle node and calculate the dependency for each set.
- If Dep(A,B) is small, and some Dep(A,B | C=cj) is large, assume a multiple parent (see the sketch below).
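A minimal sketch of how these dependency values might be computed, assuming the dependency measure is mutual entropy (mutual information), as slide 15 suggests; the function names and the (A, C, B) sample layout are illustrative assumptions.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Dep(X, Y): mutual entropy of two variables from a list of (x, y) samples."""
    n = len(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    pxy = Counter(pairs)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def conditional_dependencies(triples):
    """Dep(X, Y | M=mj) for each state mj of the middle node M.
    'triples' holds (x, m, y) samples with the conditioning node in the middle."""
    by_m = {}
    for x, m, y in triples:
        by_m.setdefault(m, []).append((x, y))
    return {m: mutual_information(pairs) for m, pairs in by_m.items()}
```

Dep(A,B) over all the data is then simply mutual_information([(a, b) for a, c, b in triples]).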
9. Algorithm for determining causal directions
- For each triplet A-C-B in the tree:
- Test to see if A and B are independent, but have some dependency given C.
- For any such triplet set the arrow directions A → C ← B (a sketch follows below).
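A sketch of that loop under the same assumptions as above; the thresholds are illustrative, since (as slide 15 notes) mutual entropy only gives a degree of dependence, and data_for is a hypothetical accessor returning the (a, c, b) samples for a triplet.

```python
def find_colliders(tree_triplets, data_for, low_thresh=0.05, high_thresh=0.5):
    """Orient A -> C <- B for every triplet whose end nodes look marginally
    independent but dependent given the middle node.  Thresholds are illustrative."""
    arrows = []
    for (A, C, B) in tree_triplets:
        triples = data_for(A, C, B)                      # (a, c, b) samples for this triplet
        marginal = mutual_information([(a, b) for a, c, b in triples])
        conditional = conditional_dependencies(triples)  # Dep(A, B | C=cj)
        if marginal < low_thresh and max(conditional.values()) > high_thresh:
            arrows += [(A, C), (B, C)]                   # both arrows point into C
    return arrows
```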
10. Continuing to find causal links
- Having obtained some arrows in the network, we can now extend the procedure to resolve the case where a node has a known parent.
- If A is the known parent of B and the edge B-C is undirected: when A and C are independent given B, B is the parent of C; otherwise vice versa (see the sketch below).
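A sketch of that propagation rule, reusing conditional_dependencies from above; the threshold is again an illustrative assumption.

```python
def propagate_direction(triples, thresh=0.05):
    """Given an oriented edge A -> B and an undirected edge B - C, decide the
    direction of B - C.  'triples' holds (a, b, c) samples with the shared
    node B in the middle."""
    cond = conditional_dependencies(triples)   # Dep(A, C | B=bj) for each state bj
    if max(cond.values()) < thresh:
        return ("B", "C")   # A and C independent given B: chain A -> B -> C, so B is C's parent
    return ("C", "B")       # otherwise a collider A -> B <- C, so C is B's parent
```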
11. We start with an undirected tree
12. Test for multiple parents
13. Propagate the arrows where possible
14. Continue until no propagation is possible
15. Problems in determining cause
- Mutual entropy is a continuous function; it tells us only the degree to which variables are dependent. Thus we need thresholds to decide whether a node has multiple parents.
- We may find a few (or no) cases of multiple parents.
16. Problem break
- Given the following data, what arc directions would you give to the triple A-B-C?
- If we ignore B then A and C are completely independent. However, given B=b0 or B=b1 there is complete dependence. Thus the causal picture is A → B ← C.
If you believe Pearl!
17. Structure and Parameter Learning
- One of the good features of Bayesian networks is that they combine both structure and parameters.
- We can express our knowledge (if any) about the data by choosing a structure.
- We then optimise the performance by adjusting the parameters (link matrices).
18. Pure Parameter Learning
- Neural networks are a class of inference systems which offer just parameter learning.
- Generally it is very difficult to embed knowledge into a neural net, or to infer a structure once the learning phase is complete.
19. Pure Structural Learning
- Traditional rule-based inference systems have just structure (sometimes with a rudimentary parameter mechanism).
- They do offer structure modification through methods such as rule induction.
- However, they are difficult to optimise using large data sets.
20. Small is beautiful
- The joint probability of the variables in a Bayesian network is simply the product of the conditional probabilities and the priors of the root(s) (see the factorisation below).
- If the network is an exact model of the data then it must represent the dependency exactly.
- However, using a spanning tree algorithm this may not be the case.
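For concreteness, the general factorisation and one illustrative instance; the particular four-variable tree shape is an assumption, chosen to match the multi-tree example later on (root D).

```latex
P(X_1, \dots, X_n) = \prod_i P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr),
\qquad\text{e.g.}\quad
P(A, B, C, D) = P(D)\, P(B \mid D)\, P(C \mid D)\, P(A \mid B).
```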
21. Small is beautiful
- In particular, we may not be able to insert an arc between two nodes with some dependency because it would form a loop.
- The effect of unaccounted dependencies is likely to be more pronounced as the number of variables in the network increases.
22. Minimal spanning tree approach
- A variant on the spanning tree for cases where the class node is known was proposed by Enrique Sucar.
- This requires a measure of the quality of the tree.
- A simple approach to this is to test the predictive ability of the network.
23. The steps are as follows
- 1. Build a spanning tree and obtain an ordering of the nodes starting at the root.
- 2. Remove all arcs.
- 3. Add arcs in the order of the magnitude of their dependency.
- 4. If the predictive ability of the network is good enough (or the nodes are all joined) stop, otherwise go to step 3 (see the sketch below).
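A sketch of that greedy loop; fit_parameters and predictive_accuracy stand in for whatever parameter estimation and quality measure are actually used, and are assumptions rather than part of Sucar's method.

```python
def improve_tree(nodes, candidate_arcs, dependency, data, target_accuracy):
    """Greedy arc-adding loop from slide 23.  'candidate_arcs' come from the
    spanning tree, 'dependency' maps an arc to its dependency score, and
    fit_parameters / predictive_accuracy are assumed helper functions."""
    arcs = []                                              # step 2: start with no arcs
    remaining = sorted(candidate_arcs, key=dependency, reverse=True)
    while remaining:
        arcs.append(remaining.pop(0))                      # step 3: add the strongest remaining arc
        model = fit_parameters(nodes, arcs, data)
        if predictive_accuracy(model, data) >= target_accuracy:
            break                                          # step 4: good enough, stop
    return arcs                                            # otherwise all nodes end up joined
```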
24. Multi-trees
- An interesting way of reducing the size of Bayesian classifier networks was proposed by Heckerman.
- Here the data set is partitioned according to the states of the root node(s).
25. Example of multi-trees
- Given a data set with the root identified as D:
- a1,b1,c1,d1   a2,b1,c1,d1   a2,b1,c2,d1   a2,b2,c1,d1
- a1,b2,c2,d2   a2,b1,c1,d2   a2,b2,c2,d2   a1,b1,c1,d2
- a1,b1,c1,d3   a2,b2,c2,d3   a2,b2,c1,d3   a2,b2,c2,d3
- Since D has three states we split the data into 3 sets.
26. Example of multi-trees, part 2
- Data set for D=d1:
- a1,b1,c1   a2,b1,c1   a2,b1,c2   a2,b2,c1
- Data set for D=d2:
- a1,b2,c2   a2,b1,c1   a2,b2,c2   a1,b1,c1
- Data set for D=d3:
- a1,b1,c1   a2,b2,c2   a2,b2,c1   a2,b2,c2
- We then use the spanning tree methodology to find three trees with 3 variables rather than one tree with 4 variables (see the sketch below).
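A minimal sketch of the partitioning step using the twelve rows from slide 25; the function name and tuple layout are illustrative.

```python
def partition_by_root(rows, root_index=3):
    """Split (a, b, c, d) rows into one data set per state of the root D,
    dropping the root value from each retained row."""
    subsets = {}
    for row in rows:
        d = row[root_index]
        subsets.setdefault(d, []).append(row[:root_index] + row[root_index + 1:])
    return subsets

rows = [  # the twelve samples from slide 25
    ("a1", "b1", "c1", "d1"), ("a2", "b1", "c1", "d1"), ("a2", "b1", "c2", "d1"), ("a2", "b2", "c1", "d1"),
    ("a1", "b2", "c2", "d2"), ("a2", "b1", "c1", "d2"), ("a2", "b2", "c2", "d2"), ("a1", "b1", "c1", "d2"),
    ("a1", "b1", "c1", "d3"), ("a2", "b2", "c2", "d3"), ("a2", "b2", "c1", "d3"), ("a2", "b2", "c2", "d3"),
]
subsets = partition_by_root(rows)   # {'d1': [...], 'd2': [...], 'd3': [...]} with (a, b, c) rows
```

Each subset is then handed to the spanning-tree learner to produce its own three-variable tree.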
27. Example of multi-trees, part 3
The resulting trees will have different structures and different conditional probabilities (when some causal relation has been established).
28. Using multi-trees
- For a given data point (ai, bj, ck) we calculate the joint probability using each tree found:
- Evidence for d1 is P(ai, bj, ck | D=d1)
- Evidence for d2 is P(ai, bj, ck | D=d2)
- Evidence for d3 is P(ai, bj, ck | D=d3)
- The evidence can be normalised to form a distribution over D (see the sketch below).
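A sketch of the normalisation step; tree_models is an assumed mapping from each state of D to a function returning the joint probability of (a, b, c) under that state's tree.

```python
def distribution_over_root(point, tree_models):
    """Normalise the per-tree evidence into a distribution over the root D."""
    evidence = {d: model(point) for d, model in tree_models.items()}
    total = sum(evidence.values())
    return {d: e / total for d, e in evidence.items()}

# e.g. distribution_over_root(("a1", "b1", "c1"), tree_models) -> {'d1': ..., 'd2': ..., 'd3': ...}
```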