1
Learning factor graphs in polynomial time & sample
complexity
  • Pieter Abbeel
  • Daphne Koller
  • Andrew Y. Ng
  • Stanford University

2
Overview
Introduction
  • First polynomial time & sample complexity
    learning algorithm for factor graphs,
  • a superset of Bayesian networks and Markov networks.
  • Applicable to any factor graph of bounded factor
    size and connectivity,
  • including intractable networks (e.g., grids).
  • New technical ideas:
  • Parameter learning: closed-form parameterization
    with low-dimensional frequencies only.
  • Structure learning: results about
    guaranteed-approximate Markov blankets from
    sample data.

3
Factor graph distributions
Introduction
Bayesian network -> factor graph: 1 factor per
conditional probability table.
Markov random field -> factor graph: 1 factor per
clique.
4
Factor graph distributions
Introduction
  • Example: P(x_{1..n}) = (1/Z) prod_j f_j(x_{C_j}), where
  • f_j is a factor over variables C_j ⊆ {X_1, ..., X_n},
  • Z is the partition function,
  • x_{C_j} is the instantiation x_{1..n} restricted to C_j.
[Figure: factor graph with factor nodes and variable nodes.]
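To make the definition concrete, here is a minimal brute-force sketch (the variable count, factor scopes, and values are invented for illustration, not taken from the slides):

```python
# Evaluating P(x) = (1/Z) * prod_j f_j(x_{C_j}) by brute force.
from itertools import product

n = 4  # binary variables X_1..X_4 (illustrative)

# Each factor: (scope, table). scope is a tuple of variable indices;
# table maps an assignment of the scope to a positive value.
factors = [
    ((0, 1), {(a, b): 2.0 if a == b else 0.5
              for a in (0, 1) for b in (0, 1)}),
    ((1, 2, 3), {xs: 1.0 + xs[0] * xs[1] * xs[2]
                 for xs in product((0, 1), repeat=3)}),
]

def unnormalized(x):
    """prod_j f_j(x restricted to C_j)."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(x[i] for i in scope)]
    return p

# Partition function Z: sum over all 2^n assignments. Tractable only
# for tiny n; the learning algorithms in this talk never compute Z.
Z = sum(unnormalized(x) for x in product((0, 1), repeat=n))

print("P(1,0,1,1) =", unnormalized((1, 0, 1, 1)) / Z)
```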
5
Related work
Introduction
Target distribution (learning task)   True distr.  Samples  Time     Graceful degradation  Ref.
ML tree (structure)                   any          poly     poly     yes                   [1]
ML bounded tree-width (structure)     any          poly     NP-hard  yes                   [2]
Bounded tree-width (structure)        same         poly     poly     no                    [3]
Factor graph (parameter)              same         poly     poly     yes                   this work
Factor graph (structure)              same         poly     poly     yes                   this work
  • Our work: first poly time & sample complexity
    solutions for parameter estimation & structure
    learning of factor graphs.
  • Current practice for parameter learning: max
    likelihood.
  • Expensive, and applies only to tractable
    networks.
  • Current practice for structure learning: local
    search heuristics or heuristic learning of a
    bounded tree-width model.
  • Slow to evaluate, and no performance guarantees.
    [4,5,6,7,8]

[1] Chow & Liu, 1968. [2] Srebro, 2001.
[3] Narasimhan & Bilmes, 2004. [4] Della Pietra et al., 1997.
[5] McCallum, 2003. [6] Malvestuto, 1991.
[7] Bach & Jordan, 2002. [8] Deshpande et al., 2001.
6
Canonical parameterization
Parameter learning
  • Consider the factor graph shown on the slide
    [figure omitted].
  • The Hammersley-Clifford theorem gives the
    distribution as a product of canonical factors
    [equation omitted].
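The equation on this slide was an image and did not survive extraction. As a reconstruction, this is the standard Hammersley-Clifford canonical parameterization relative to a fixed default assignment x̄ (the textbook form, not necessarily the slide's exact notation):

```latex
% Canonical factor for a variable subset C, by inclusion-exclusion
% over the subsets U of C, with default assignment \bar{x}:
f^*_C(x_C) = \prod_{U \subseteq C}
  P\bigl(x_U,\ \bar{x}_{\mathcal{X} \setminus U}\bigr)^{(-1)^{|C \setminus U|}}
% For a positive distribution P, the product of the canonical factors
% over the (sub)scopes of the graph recovers P exactly.
```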
7
Canonical factors
Parameter learning
  • No lower-order interactions, by inclusion-exclusion:
  • Complete interaction.
  • Subtract lower-order interactions.
  • Compensate for double counting.
  • Frequencies only.
  • Equal number of +, - terms.
  • Closed-form parameter learning? NO. (Not yet.)
  • The frequencies P(X_{1:6} = (x_1, x_2, 0, ..., 0)) involve
    full instantiations and are thus expensive to
    estimate from samples.
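A short sketch of the inclusion-exclusion computation (helper names invented; `log_joint` stands in for the frequencies the slide refers to). It makes the problem visible: every term queries the probability of a full instantiation, which is exponentially rare in samples:

```python
# Canonical factor via inclusion-exclusion over subsets of its scope.
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def canonical_log_factor(C, x, log_joint, n):
    """log f*_C(x_C) relative to the all-zeros default assignment.
    x maps variable index -> value; log_joint maps a FULL length-n
    assignment tuple to its log probability."""
    total = 0.0
    for U in subsets(C):
        # Full instantiation: x on U, the default value 0 everywhere else.
        full = tuple(x[i] if i in U else 0 for i in range(n))
        sign = (-1) ** (len(C) - len(U))  # equal numbers of +/- terms
        total += sign * log_joint(full)
    return total
```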
8
Markov blanket canonical factors
Parameter learning
Positive and negative terms of the canonical factor
[derivation omitted]:
  • Transform to conditional probabilities.
  • Terms cancel.
  • Conditional independence.
  • Low-dimensional distributions.
  • (MB = Markov blanket.)
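The derivation itself was an image; the following reconstructs the standard identity behind these bullets (assuming the default assignment x̄ and that X_C is independent of the remaining variables given X_MB(C)):

```latex
% Each full-instantiation term factors via the chain rule, with
% everything outside C held at the default \bar{x}:
%   P(x_U, \bar{x}_{-U})
%     = P\bigl(x_U, \bar{x}_{C \setminus U} \mid \bar{x}_{\mathcal{X} \setminus C}\bigr)
%       \, P\bigl(\bar{x}_{\mathcal{X} \setminus C}\bigr).
% The P(\bar{x}_{\mathcal{X} \setminus C}) parts cancel (equal numbers
% of + and - terms), and conditional independence shrinks the
% conditioning set to the Markov blanket:
f^*_C(x_C) = \prod_{U \subseteq C}
  P\bigl(x_U,\ \bar{x}_{C \setminus U} \mid \bar{x}_{\mathrm{MB}(C)}\bigr)^{(-1)^{|C \setminus U|}}
```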
9
Markov blanket canonical factors
Parameter learning
  • C_j: all subfactors of the given structure.
  • Each canonical factor is computed from the
    distribution over C_j, MB(C_j) only.
  • Low-dimensional distributions.
  • Efficient estimation from samples.
  • Example [figure omitted].

10
Parameter learning
Parameter learning
  • Algorithm:
  • Estimate the Markov blanket canonical
    factors from data.
  • Return their product as the learned distribution.
  • Theorem. The parameter learning algorithm
  • runs in polynomial time,
  • uses a polynomial number of samples,
  • guarantees D(P || P̂) is small with high probability.

No dependence on tree-width of the network!
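A sketch of the estimation step (invented helper names; binary variables; reuses subsets() from the earlier sketch). Each inclusion-exclusion term is now a conditional frequency given the default assignment on MB(C), so only the low-dimensional distribution over C and MB(C) is ever touched:

```python
import math

def empirical_cond(samples, target, mb_evidence):
    """P-hat(target | mb_evidence); both are dicts {var index: value}."""
    pool = [s for s in samples
            if all(s[i] == v for i, v in mb_evidence.items())]
    hits = sum(all(s[i] == v for i, v in target.items()) for s in pool)
    # Crude smoothing so log() below is defined; a real implementation
    # would handle sparse counts more carefully.
    return max(hits, 0.5) / max(len(pool), 1)

def mb_canonical_log_factor(samples, C, MB, x):
    """log f-hat*_C(x_C), with the all-zeros default assignment."""
    mb_evidence = {i: 0 for i in MB}
    total = 0.0
    for U in subsets(C):
        target = {i: (x[i] if i in U else 0) for i in C}
        sign = (-1) ** (len(C) - len(U))
        total += sign * math.log(empirical_cond(samples, target, mb_evidence))
    return total
```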
11
Graceful degradation
Parameter learning
  • Theorem. When
  • the true distribution has factor graph G,
  • the structure given to parameter learning is a
    factor graph Ĝ (≠ G),
  • then the additional error consists of two terms:
  • (1) Canonical factors capture residual highest-order
    interactions only; the error is small when the
    subfactors are in Ĝ.
  • (2) If the Markov blankets in the given factor graph Ĝ
    are a good approximation of the Markov blankets in G,
    the error will be small. (See structure learning.)
12
Structure learning
Structure learning
  • Assume factor size ≤ k.
  • Naive plan: take as structure all factors of
    size ≤ k, then run parameter learning? NO:
  • estimating Markov blanket canonical factors
    requires knowledge of the Markov blankets,
  • but if we knew the Markov blankets, the structure
    learning problem would be solved.
13
Recovering the Markov blankets
Structure learning
  • Markov blanket criterion + true distribution =>
    true Markov blankets.
  • Markov blanket criterion + sample data => ???
  • At best an approximate Markov blanket from sample
    data.
  • Key for parameter learning: a desired property for
    the approximate Markov blanket [stated below].
14
Conditional entropy
Structure learning
  • Conditional entropy criterion: score a candidate
    Markov blanket Y for C by H(X_C | X_Y).
  • Conditional independence:
    H(X_C | X_{MB(C)}) = H(X_C | X_{MB(C)}, X_Y).
  • Conditioning reduces entropy: for any X, Y, Z,
    H(X | Y, Z) ≤ H(X | Y).
  • Thus H(X_C | X_{MB(C)}) ≤ H(X_C | X_Y): the true
    Markov blanket minimizes conditional entropy.
  • Conditional entropy + true distribution => true
    Markov blankets.
  • Conditional entropy + sample data => ???
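A sketch of the empirical score (invented names; samples is a list of assignment tuples; C and Y are disjoint tuples of variable indices):

```python
import math
from collections import Counter

def empirical_conditional_entropy(samples, C, Y):
    """H-hat(X_C | X_Y) = H-hat(X_C, X_Y) - H-hat(X_Y)."""
    def entropy(indices):
        counts = Counter(tuple(s[i] for i in indices) for s in samples)
        n = len(samples)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy(tuple(C) + tuple(Y)) - entropy(tuple(Y))

# The candidate Y with the smallest H-hat(X_C | X_Y) is kept as the
# approximate Markov blanket of C.
```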
15
Conditional entropy
Structure learning
  • Theorem. Empirical conditional entropy estimates
    are a good approximation of the true conditional
    entropy, even with only a polynomial number of
    samples.
  • Theorem. Conditional entropy satisfies the
    desired approximate Markov blanket property:
    for any ε > 0,
  • if MB̂(C) looks like a Markov blanket (its
    conditional entropy score is within ε of optimal),
  • then MB̂(C) can be used as the Markov blanket for
    learning, with error bounded in terms of ε
    [equations omitted].
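The slide's equations were images; the information-theoretic step behind this kind of guarantee is standard (a reconstruction, not necessarily the slide's exact statement): the conditional-entropy gap equals an expected KL divergence.

```latex
% For a candidate blanket Y (disjoint from C), with Z = MB(C):
I(X_C ; X_Z \mid X_Y)
  = H(X_C \mid X_Y) - H(X_C \mid X_Y, X_Z)
  = \mathbb{E}\left[ D\bigl( P(X_C \mid X_Y, X_Z) \,\|\, P(X_C \mid X_Y) \bigr) \right].
% Since H(X_C \mid X_Y, X_Z) = H(X_C \mid X_Z) by conditional
% independence, H(X_C \mid X_Y) \le H(X_C \mid X_Z) + \epsilon implies
% that the expected KL between the true conditional P(X_C \mid X_Z)
% and the surrogate P(X_C \mid X_Y) is at most \epsilon.
```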
16
Structure learning algorithm
Structure learning
  • Assume factor size ≤ k, Markov blanket size ≤ b.
  • For all subsets of variables C_j of size ≤ k:
  • find the Markov blankets from empirical
    conditional entropy,
  • estimate the Markov blanket canonical factors
    from data (parameter learning),
  • discard factors that are close to the trivial
    all-ones factor (simplify structure).
  • Return the product of the remaining factors
    (see the sketch below).
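A sketch of the full loop (invented helper names; binary variables; reuses empirical_conditional_entropy and mb_canonical_log_factor from the earlier sketches). The enumeration over candidate scopes and blankets is exponential in k and b, as the next slide notes:

```python
import math
from itertools import combinations, product

def learn_structure(samples, n, k, b, triviality_eps=0.05):
    factors = []
    for size in range(1, k + 1):
        for C in combinations(range(n), size):
            rest = [v for v in range(n) if v not in C]
            # 1. Approximate Markov blanket by empirical conditional entropy.
            candidates = (Y for r in range(b + 1)
                          for Y in combinations(rest, r))
            MB = min(candidates,
                     key=lambda Y: empirical_conditional_entropy(samples, C, Y))
            # 2. Parameter learning: Markov blanket canonical factor.
            table = {x: math.exp(mb_canonical_log_factor(samples, C, MB,
                                                         dict(zip(C, x))))
                     for x in product((0, 1), repeat=size)}
            # 3. Simplify: discard factors close to the trivial all-ones factor.
            if max(abs(math.log(v)) for v in table.values()) > triviality_eps:
                factors.append((C, table))
    return factors
```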
17
Structure learning theorem
Structure learning
  • Assume fixed factor size ≤ k, MB size ≤ b.
  • Theorem. The structure learning algorithm
  • runs in polynomial time,
  • uses a polynomial number of samples,
  • guarantees D(P || P̂) is small with high
    probability.
  • Note:
  • exponential dependence on factor size and MB size,
    for both computational and sample complexity;
  • bounded connectivity implies bounded factor and
    MB size.

No dependence on tree-width of the network!
18
Graceful degradation
Structure learning
  • Theorem. Let G be the factor graph of the true
    distribution. When in the true distribution the
    max factor size > k or the max MB size > b, the
    additional error consists of three terms:
  • (1) Canonical factors capture residual highest-order
    interactions only; the error is small when the true
    interactions of order > k are small.
  • (2) If MB̂ is a good approximation of MB, the error
    will be small.
  • (3) Factors that are trivial in the true
    distribution but estimated as non-trivial, because
    their MB size is larger than b.
19
Consequences for Bayesian networks
Structure learning
  • Bayesian network -> factor graph: 1 factor per
    conditional probability table.
  • Bounded fan-in and fan-out imply bounded factor
    size and bounded Markov blanket size.
  • Samples from P_BN with unknown structure
    -> (structure learning) -> factor graph
    distribution P̂ with D(P_BN || P̂) ≤ ε.
  • Learning a factor graph (not a Bayesian network)
    gives efficient learning of the distribution from
    finite data.
20
Related work
Structure learning
  • Finding the highest-scoring, bounded in-degree
    Bayesian network is NP-hard (Chickering, Meek &
    Heckerman, 2003).
  • Our algorithm recovers a factor graph
    representation only.
  • The (difficult) acyclicity constraint is avoided.
  • Learning a factor graph (not a Bayesian network)
    gives efficient learning of the distribution
    from finite data.
  • Note: Spirtes, Glymour & Scheines (2000) and
    Chickering & Meek (2002) do recover Bayesian
    network structure, but only with access to the
    true distribution (infinite sample size).

21
Discussion and conclusion
Conclusion
  • First polynomial time & polynomial sample
    complexity learning algorithm for factor graphs.
  • Applicable to any factor graph of bounded factor
    size and connectivity,
  • including intractable networks (e.g., grids).
  • Practical drawbacks of the proposed algorithm:
  • estimates parameters from only a small fraction
    of the data;
  • the structure learning algorithm enumerates all
    possible Markov blankets;
  • complexity exponential in Markov blanket size.

22
Done ...
  • Additional and outdated slides follow.

23
Parameter learning theorem
Detailed theorem statements
24
Structure learning theorem
Detailed theorem statements
25
Learning factor graphs in polynomial time & sample
complexity
  • Factor graphs: a superset of Markov and Bayesian
    networks.

Markov network (MN) -> factor graph: 1 factor per
clique.
Bayesian network (BN) -> factor graph: 1 factor per
conditional probability table.
  • Current practice in Markov network learning:
  • parameter learning: max likelihood, only
    applicable in tractable MNs;
  • structure learning: local-search heuristics or
    heuristic learning of a bounded tree-width model.
    No performance guarantees.
  • Finding the highest-scoring BN is NP-hard
    (Chickering et al., 2003).

Pieter Abbeel, Daphne Koller and Andrew Y. Ng
26
Learning factor graphs in polynomial time
sample complexity
  • First polynomial time & sample complexity
    learning algorithm for factor graphs.
  • Applicable to any factor graph of bounded factor
    size and connectivity,
  • including intractable networks (e.g., grids).
  • New technical ideas
  • Parameter learning in closed-form, using
    parameterization with low-dimensional frequencies
    only.
  • Structure learning: results about
    guaranteed-approximate Markov blankets from
    sample data.

Pieter Abbeel, Daphne Koller and Andrew Y. Ng
27
Relation to Narasimhan Bilmes (2004)
Structure learning
                                     Narasimhan & Bilmes (2004)   This paper
Independent of tree-width?           NO                           YES
Independent of Markov blanket size?  YES                          NO
Graceful degradation result?         NO                           YES

n x n grid: tree-width = n+1, Markov blanket size = 6.
n-star graph: tree-width = 2, Markov blanket size = n.
28
Canonical parameterization
29
Canonical parameterization (2)
30
Canonical parameterization (3)
31
Markov blanket canonical factors
32
Markov blanket canonical parameterization
33
Approximate Markov blankets
34
Structure learning algorithm
35
Structure learning algorithm