1
Learning Bayesian Network Structure from Massive
Datasets: The Sparse Candidate Algorithm
  • Nir Friedman, Iftach Nachman, and Dana Pe'er
  • Presenter: Kyu-Baek Hwang

2
Abstract
  • Learning a Bayesian network can be viewed as
  • an optimization problem (in machine learning), or
  • a constraint satisfaction problem (in statistics).
  • The search space is extremely large.
  • The search procedure spends most of its time
    examining extremely unreasonable candidate
    structures.
  • If we can reduce the search space, faster learning
    becomes possible.
  • The approach: restrict the set of candidate parent
    variables for each variable.
  • Application domain: bioinformatics

3
Learning Bayesian Network Structures
  • As a constraint satisfaction problem
  • χ²-test of independence
  • As an optimization problem
  • Scoring metrics: BDe, MDL
  • Learning means finding the structure that maximizes
    these scores.
  • Search techniques
  • Finding an optimal structure is generally NP-hard.
  • Greedy hill-climbing, simulated annealing
  • O(n²) possible local changes to examine per step
  • If the number of examples and the number of
    attributes are large, the computational cost is
    too high to obtain a result in reasonable time.

4
Combining Statistical Properties
  • Most of the candidates considered during the
    search procedure can be eliminated in advance,
    based on our statistical understanding of the
    domain.
  • If X and Y are almost independent in the data, we
    might decide not to consider Y as a parent of X.
  • Measure: mutual information
  • Restrict the possible parents of each variable to a
    small candidate set of size k, with k << n − 1
    (a sketch follows this list).
  • The key idea is to use the network structure
    found at the last stage to find better candidate
    parents.
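
Since the bullets leave the restriction step abstract, here is a minimal
sketch of first-round, mutual-information-based candidate selection,
assuming discrete data in a 2-D NumPy array with one column per variable;
the function names are illustrative, not the authors' code.

    import numpy as np

    def mutual_information(x, y):
        """Empirical mutual information I(X;Y) of two discrete columns."""
        n = len(x)
        joint, px, py = {}, {}, {}
        for a, b in zip(x, y):
            joint[(a, b)] = joint.get((a, b), 0) + 1
            px[a] = px.get(a, 0) + 1
            py[b] = py.get(b, 0) + 1
        # I(X;Y) = sum over (a,b) of P(a,b) * log(P(a,b) / (P(a) * P(b)))
        return sum((c / n) * np.log(c * n / (px[a] * py[b]))
                   for (a, b), c in joint.items())

    def restrict_candidates(data, k):
        """First Restrict step: for each variable, keep the k other
        variables with the highest pairwise mutual information."""
        n_vars = data.shape[1]
        candidates = {}
        for i in range(n_vars):
            scored = sorted(((mutual_information(data[:, i], data[:, j]), j)
                             for j in range(n_vars) if j != i), reverse=True)
            candidates[i] = [j for _, j in scored[:k]]
        return candidates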

5
Background
  • A Bayesian network for X = {X1, X2, …, Xn}
  • B = <G, Θ>
  • The problem of learning a Bayesian network:
  • given a training set D = {x1, x2, …, xN},
  • find the B that best matches D.
  • Scoring metrics: BDe, MDL
  • Score(G : D) = Σi Score(Xi, Pa(Xi) : N_{Xi, Pa(Xi)})
  • Greedy hill-climbing search
  • At each step, all possible local changes are
    examined, and the change that brings the maximal
    gain in score is applied.
  • The calculation of sufficient statistics is the
    computational bottleneck (a sketch follows below).
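
To make the bottleneck concrete, this is a minimal sketch of collecting
the sufficient statistics N_{Xi, Pa(Xi)} for one family from complete
discrete data; the function name and data layout are assumptions, not the
paper's code.

    from collections import Counter

    def family_counts(data, child, parents):
        """Sufficient statistics for one family: how often each value of
        the child co-occurs with each configuration of its parents.
        Decomposable scores such as BDe and MDL are computed from these
        counts alone."""
        counts = Counter()
        for row in data:
            parent_config = tuple(row[p] for p in parents)
            counts[(row[child], parent_config)] += 1
        return counts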

6
Simple Intuitions
  • Using mutual information or correlation
  • If the true structure is X → Y → Z, then
    I(X;Z) > 0, I(Y;Z) > 0, I(X;Y) > 0, and
    I(X;Z | Y) = 0.
  • Basic idea of the Sparse Candidate algorithm
  • For each variable X, find a set of variables
    {Y1, Y2, …, Yk} that are the most promising
    candidate parents for X.
  • This gives us a smaller search space.
  • The main drawback of this idea:
  • A mistake in the initial stage can lead us to an
    inferior-scoring network.
  • The remedy: iterate the basic procedure, using the
    previously constructed network to reconsider the
    candidate parents.

7
Outline of the Sparse Candidate Algorithm
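The body of this slide was a figure that did not survive transcription.
Below is a minimal sketch of the loop as the paper describes it, with
restrict, maximize, and score passed in as assumed helpers (restrict
generalizes the first-iteration sketch shown earlier by also consulting
the current network), not the authors' actual implementations.

    def sparse_candidate(data, k, restrict, maximize, score):
        """Outer loop of the Sparse Candidate algorithm (sketch).

        restrict(data, B, k) -> candidate sets {i: C_i}, |C_i| <= k,
                                required to contain Pa_B(X_i);
        maximize(data, candidates) -> best network B found with
                                      Pa(X_i) a subset of C_i;
        score(B, data) -> decomposable network score (e.g. BDe or MDL).
        """
        B = None                      # start with the empty network
        prev = float("-inf")
        while True:
            candidates = restrict(data, B, k)   # Restrict step
            B = maximize(data, candidates)      # Maximize step
            s = score(B, data)
            if s <= prev:                       # score stopped improving
                return B
            prev = s
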
8
Convergence Properties of the Sparse Candidate
Algorithm
  • We require that, in the Restrict step, the selected
    candidates for Xi's parents include Xi's current
    parents:
  • Pa_Gn(Xi) ⊆ Ci^(n+1)
  • This requirement implies that the winning network
    Bn is still a legal structure in the (n+1)-th
    iteration, so
  • Score(Bn+1 : D) ≥ Score(Bn : D)
  • Stopping criterion:
  • Score(Bn) = Score(Bn−1)

9
Mutual Information
  • Mutual information (defined below)
  • Example:
  • I(A;C) > I(A;D) > I(A;B)
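
The formula on this slide was an image that did not survive transcription;
the standard definition, over the empirical distribution P̂ estimated from
the data, is:

    I(X;Y) = \sum_{x,y} \hat{P}(x,y) \log \frac{\hat{P}(x,y)}{\hat{P}(x)\,\hat{P}(y)}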

[Figure: an example network over the variables A, B, C, and D]
10
Discrepancy Test
  • The initial iteration uses mutual information;
    every later iteration uses a discrepancy measure
    (given below) instead.
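
The measure itself was an image on the slide; in the paper, the
discrepancy between Xi and Xj given the current network B is the KL
divergence between the empirical joint distribution and the joint
distribution that B predicts:

    M_{disc}(X_i, X_j \mid B) = D_{KL}\left( \hat{P}(X_i, X_j) \,\|\, P_B(X_i, X_j) \right)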

11
Other Tests
  • Conditional mutual information (given below)
  • Penalizing structures with more parameters
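
The formulas here were images as well; in the paper, the conditional
measure gauges how much information Xj adds about Xi beyond Xi's current
parents in B:

    M_{shield}(X_i, X_j \mid B) = I\left( X_i ; X_j \mid Pa_B(X_i) \right)

The parameter-penalizing variant instead scores Xj by the change in a
penalized score (such as MDL) when Xj is added to Xi's current parents.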

12
Learning with Small Candidate Sets
  • Standard heuristics
  • Unconstrained:
  • Space O(C(n, k))
  • Time O(n²)
  • Constrained by small candidate sets:
  • Space O(2^k)
  • Time O(k·n)
  • Divide-and-conquer heuristics (a sketch of the
    constrained move enumeration follows this list)
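
A minimal sketch of why the constraint shrinks the per-step work: with
candidate sets of size k, only O(k·n) edge additions need scoring instead
of O(n²). Names and the simplified move set (additions only, no
deletions, reversals, or acyclicity checks) are illustrative assumptions.

    def legal_moves(n_vars, candidates, current_parents):
        """Enumerate edge additions allowed by the candidate sets.
        candidates[i] lists the at-most-k candidate parents of X_i;
        current_parents[i] is the set of X_i's parents so far."""
        moves = []
        for i in range(n_vars):
            for j in candidates[i]:             # at most k per variable
                if j not in current_parents[i]:
                    moves.append(("add", j, i)) # add edge X_j -> X_i
        return moves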

13
Strongly Connected Components
  • Decomposing H into strongly connected components
    takes linear time (a sketch follows below).
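
A minimal sketch of the linear-time decomposition using Kosaraju's
algorithm (the slide relies only on the fact that such an algorithm
exists); adj is an assumed adjacency map from each node of H to its
successors, with every node present as a key.

    def strongly_connected_components(adj):
        """Kosaraju's algorithm: SCCs of a digraph in O(V + E) time."""
        order, seen = [], set()

        def dfs(u, graph, out):
            # Iterative post-order DFS: append u to out after its subtree.
            stack = [(u, iter(graph.get(u, ())))]
            seen.add(u)
            while stack:
                node, it = stack[-1]
                advanced = False
                for v in it:
                    if v not in seen:
                        seen.add(v)
                        stack.append((v, iter(graph.get(v, ()))))
                        advanced = True
                        break
                if not advanced:
                    stack.pop()
                    out.append(node)

        for u in adj:                       # pass 1: record finish order
            if u not in seen:
                dfs(u, adj, order)

        rev = {u: [] for u in adj}          # reverse every edge
        for u, vs in adj.items():
            for v in vs:
                rev[v].append(u)

        seen, components = set(), []
        for u in reversed(order):           # pass 2: on the reversed graph
            if u not in seen:
                comp = []
                dfs(u, rev, comp)
                components.append(comp)
        return components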

14
Separator Decomposition
  • The bottleneck is the separator S.
  • We can order the variables in S so as to disallow
    any cycle in H1 ∪ H2.

15
Experiments on Synthetic Data
16
Experiments on Real-Life Data
17
Conclusions
  • Sparse candidate sets enable us to search for a
    good structure efficiently.
  • A better candidate-selection criterion is necessary.
  • The authors applied these techniques to
    Spellman's cell-cycle data.
  • The exploitation of the network structure for the
    search in H needs to be improved.