Cause and Independence
1
Lecture 7
  • Cause and Independence

2
Cause in trees
  • We noted previously that cause can be found by
    identifying the root nodes of the network (an
    expert is required for this).
  • However, it is also possible (in theory) to
    determine cause statistically.

3
Possible configurations for a triplet
4
Conditional Independence
  • Remember that, for configurations of type 1 and
    type 2, the nodes A and B are conditionally
    independent given C.

5
Marginal Independence
  • However, for the type 3 triplet, the nodes A and B
    are only independent if there is no information
    on C.

6
Determining marginal independence
  • Given a data set for our triplet A-C-B, we can
    measure the dependence of A and B using all the
    data (i.e. with no information on C).
  • If this is low, we may suspect a multiple parent.

7
Determining marginal independence
  • Alternatively, we can partition our data according
    to the states of C and then compute a set of
    dependency values (one for each state of C).
  • If any of these is high, we may suspect a multiple
    parent.

8
Practical Computation
  • Partition the data according to the states of the
    middle node and calculate the dependency for each
    set.
  • If Dep(A,B) is small and some Dep(A,B | C = cj) is
    large, assume a multiple parent (see the sketch
    below).
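
Below is a minimal sketch of this computation, assuming the data
arrive as (a, b, c) state triples and using mutual information (the
"mutual entropy" of slide 15) as the Dep measure; the function and
variable names are assumptions, not from the slides.

    from collections import Counter
    from math import log2

    def mutual_information(pairs):
        # Dep(A,B): mutual information, in bits, from (a, b) samples
        n = len(pairs)
        pab = Counter(pairs)
        pa = Counter(a for a, _ in pairs)
        pb = Counter(b for _, b in pairs)
        return sum((nab / n) * log2(nab * n / (pa[a] * pb[b]))
                   for (a, b), nab in pab.items())

    def conditional_dependencies(triples):
        # Dep(A,B | C=cj): partition (a, b, c) triples on c, then
        # score the (a, b) pairs within each partition
        by_c = {}
        for a, b, c in triples:
            by_c.setdefault(c, []).append((a, b))
        return {c: mutual_information(pairs) for c, pairs in by_c.items()}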

9
Algorithm for determining causal directions
  • For each triplet A-C-B in the tree:
  • Test whether A and B are independent but have
    some dependency given C.
  • For any such triplet, set the arrow directions
    A → C ← B (see the sketch below).
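
A sketch of this test, reusing the functions above; the two
thresholds are assumptions (slide 15 notes that thresholds are
needed to make this decision).

    def is_common_child(triples, low=0.05, high=0.5):
        # triples: (a, b, c) states with c the middle node of A-C-B
        marginal = mutual_information([(a, b) for a, b, _ in triples])
        conditional = conditional_dependencies(triples)
        # independent overall, but dependent given some state of C
        return marginal < low and max(conditional.values()) > high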

10
Continuing to find causal links
  • Having obtained some arrows in the network, we can
    now extend the procedure to resolve the case
    where a node has a known parent.
  • Given a known arc A → B and an undirected arc
    B - C: if A and C are independent given B, then B
    is the parent of C; otherwise vice versa (see the
    sketch below).
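
A sketch of this propagation step, again reusing
conditional_dependencies; the arc names and the threshold are
assumptions.

    def orient_from_parent(triples, low=0.05):
        # triples: (a, c, b) states, the known node B last so that
        # it is the conditioning variable in the partition
        dep_given_b = conditional_dependencies(triples)  # Dep(A,C | B=bj)
        return 'B -> C' if max(dep_given_b.values()) < low else 'C -> B'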

11
We start with an undirected tree
12
Test for multiple parents
13
Propagate the arrows where possible
14
Continue until no propagation is possible
15
Problems in determining cause
  • Mutual entropy is a continuous function: it tells
    us only the degree to which variables are
    dependent. Thus we need thresholds to decide
    whether a node has multiple parents.
  • We may find only a few (or no) cases of multiple
    parents.

16
Problem break
  • Given the following data, what arc directions
    would you give to the triple A-B-C?

If we ignore B then A and C are completely
independent. However, given B = b0 or B = b1, there
is complete dependence. Thus the causal picture is
A → B ← C
If you believe Pearl!
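
The slide's data table is not reproduced in this transcript, but a
hypothetical dataset with exactly the stated behaviour is B = A xor C
with A and C independent fair coins. Checking it with the earlier
sketch:

    data = [(a, a ^ c, c) for a in (0, 1) for c in (0, 1)]  # (A, B, C)
    print(mutual_information([(a, c) for a, _, c in data]))
    # 0.0: A and C are marginally independent
    print(conditional_dependencies([(a, c, b) for a, b, c in data]))
    # {0: 1.0, 1: 1.0}: complete dependence given B, hence A -> B <- C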
17
Structure and Parameter Learning
  • One of the good features of Bayesian networks is
    that they combine both structure and parameters.
  • We can express our knowledge (if any) about the
    data by choosing a structure.
  • We then optimise the performance by adjusting the
    parameters (link matrices).

18
Pure Parameter Learning
  • Neural networks are a class of inference systems
    that offer only parameter learning.
  • Generally it is very difficult to embed knowledge
    into a neural net, or to infer a structure once
    the learning phase is complete.

19
Pure structural Learning
  • Traditional rule-based inference systems have
    just structure (sometimes with a rudimentary
    parameter mechanism).
  • They do offer structure modification through
    methods such as rule induction.
  • However, they are difficult to optimise using
    large data sets.

20
Small is beautiful
  • The joint probability of the variables in a
    Bayesian Network is simply the product of the
    conditional probabilities and the priors of the
    root(s).
  • If the network is an exact model of the data then
    it must represent the dependencies exactly.
  • However, a network built with a spanning tree
    algorithm may not achieve this.
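
A minimal sketch of the factorisation in the first bullet; the data
structures (a parent map and per-arc link matrices) are assumptions.

    def joint_probability(assignment, root_prior, cond, parent):
        # assignment: {node: state}; parent: {node: its parent}, root
        # omitted; cond[node][parent_state][state] is a link-matrix entry
        root = next(n for n in assignment if n not in parent)
        p = root_prior[assignment[root]]
        for node, par in parent.items():
            p *= cond[node][assignment[par]][assignment[node]]
        return p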

21
Small is Beautiful
  • In particular, we may not be able to insert an
    arc between two nodes with some dependency
    because it would form a loop.
  • The effect of unaccounted dependencies is likely
    to be more pronounced as the number of variables
    in the network increases.

22
Minimal Spanning tree approach
  • A variant on the spanning tree for cases where
    the class node is known was proposed by Enrique
    Sucar
  • This requires a measure of quality of the tree.
  • A simple approach to this is to test the
    predictive ability of the network.

23
The steps are as follows
  • 1. Build a spanning tree and obtain an ordering
    of the nodes starting at the root.
  • 2. Remove all arcs.
  • 3. Add arcs in order of the magnitude of their
    dependency.
  • 4. If the predictive ability of the network is
    good enough (or the nodes are all joined) stop;
    otherwise go to step 3 (see the sketch below).
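
A sketch of these steps, assuming the candidate arcs and their
dependency values come from step 1's spanning tree and that `score`
measures the network's predictive ability (both assumptions).

    def sucar_prune(arc_dependencies, score, good_enough, n_nodes):
        arcs = []  # step 2: start from no arcs
        # step 3: re-add arcs in decreasing order of dependency
        for arc, _ in sorted(arc_dependencies.items(),
                             key=lambda kv: kv[1], reverse=True):
            arcs.append(arc)
            # step 4: stop when prediction is good enough or all joined
            if score(arcs) >= good_enough or len(arcs) == n_nodes - 1:
                break
        return arcs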

24
Multi-Trees
  • An interesting way of reducing the size of
    Bayesian classifier networks was proposed by
    Heckerman.
  • Here the data set is partitioned according to the
    states of the root node(s).

25
Example of Multi trees
  • Given a data set with the root identified as D
  • a1,b1,c1,d1  a2,b1,c1,d1  a2,b1,c2,d1  a2,b2,c1,d1
  • a1,b2,c2,d2  a2,b1,c1,d2  a2,b2,c2,d2  a1,b1,c1,d2
  • a1,b1,c1,d3  a2,b2,c2,d3  a2,b2,c1,d3  a2,b2,c2,d3
  • Since D has three states, we split the data into
    three sets.

26
Example of Multi trees, part 2
  • Data set for D = d1
  • a1,b1,c1  a2,b1,c1  a2,b1,c2  a2,b2,c1
  • Data set for D = d2
  • a1,b2,c2  a2,b1,c1  a2,b2,c2  a1,b1,c1
  • Data set for D = d3
  • a1,b1,c1  a2,b2,c2  a2,b2,c1  a2,b2,c2
  • We then use the spanning tree methodology to find
    three trees with 3 variables rather than one tree
    with 4 variables (see the sketch below).
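
A sketch of the partition-and-learn step; `learn_spanning_tree`
stands in for the spanning tree methodology and is assumed, not
shown.

    def build_multitrees(records, learn_spanning_tree):
        # records: (a, b, c, d) tuples with the class state d last
        by_class = {}
        for *features, d in records:
            by_class.setdefault(d, []).append(tuple(features))
        # here: three 3-variable trees instead of one 4-variable tree
        return {d: learn_spanning_tree(rows) for d, rows in by_class.items()}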

27
Example of Multi trees, part 3
The resulting trees will have different structures
and different conditional probabilities (when
some causal relation has been established).
28
Using multi-trees
  • For a given data point (ai, bj, ck) we calculate
    the joint probability using each tree found.
  • Evidence for d1 is P(ai, bj, ck | D = d1)
  • Evidence for d2 is P(ai, bj, ck | D = d2)
  • Evidence for d3 is P(ai, bj, ck | D = d3)
  • The evidence can be normalised to form a
    distribution (see the sketch below).
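
A sketch of this classification step; `tree_joint` (the joint
probability of a point under one class's tree) is assumed, and equal
class priors are implied by plain normalisation.

    def classify(point, trees, tree_joint):
        evidence = {d: tree_joint(tree, point) for d, tree in trees.items()}
        total = sum(evidence.values())
        return {d: e / total for d, e in evidence.items()}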