1
Learning factor graphs in polynomial time & sample
complexity
  • Pieter Abbeel
  • Daphne Koller
  • Andrew Y. Ng
  • Stanford University

2
Overview
Introduction
  • First polynomial time & sample complexity
    learning algorithm for factor graphs,
  • a superset of Bayesian networks and Markov networks.
  • Applicable to any factor graph of bounded factor
    size and connectivity,
  • including intractable networks (e.g., grids).
  • New technical ideas:
  • Parameter learning: closed-form parameterization
    with low-dimensional frequencies only.
  • Structure learning: results about
    guaranteed-approximate Markov blankets from
    sample data.

3
Factor graph distributions
Introduction
Bayesian network -> factor graph: 1 factor per
conditional probability table.
Markov random field -> factor graph: 1 factor per
clique.
4
Factor graph distributions
Introduction
  • Example: P(x_{1..n}) = (1/Z) prod_j f_j(x_{C_j}), where
  • f_j is a factor over variables C_j ⊆ {X_1, ..., X_n},
  • Z is the partition function,
  • x_{C_j} is the instantiation x_{1..n} restricted to C_j.
[Figure: factor graph with factor nodes and variable nodes.]
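To make the definition concrete, here is a minimal brute-force sketch (the variable count, factor scopes, and values are invented for illustration, not taken from the slides):

```python
# Evaluating P(x) = (1/Z) * prod_j f_j(x_{C_j}) by brute force.
from itertools import product

n = 4  # binary variables X_1..X_4 (illustrative)

# Each factor: (scope, table). scope is a tuple of variable indices;
# table maps an assignment of the scope to a positive value.
factors = [
    ((0, 1), {(a, b): 2.0 if a == b else 0.5
              for a in (0, 1) for b in (0, 1)}),
    ((1, 2, 3), {xs: 1.0 + xs[0] * xs[1] * xs[2]
                 for xs in product((0, 1), repeat=3)}),
]

def unnormalized(x):
    """prod_j f_j(x restricted to C_j)."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(x[i] for i in scope)]
    return p

# Partition function Z: sum over all 2^n assignments. Tractable only
# for tiny n; the learning algorithms in this talk never compute Z.
Z = sum(unnormalized(x) for x in product((0, 1), repeat=n))

print("P(1,0,1,1) =", unnormalized((1, 0, 1, 1)) / Z)
```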
5
Related work
Introduction
Target distribution (learning task)   True distr.  Samples  Time     Graceful degradation  Ref.
ML tree (structure)                   any          poly     poly     yes                   [1]
ML bounded tree-width (structure)     any          poly     NP-hard  yes                   [2]
Bounded tree-width (structure)        same         poly     poly     no                    [3]
Factor graph (parameter)              same         poly     poly     yes                   this work
Factor graph (structure)              same         poly     poly     yes                   this work
  • Our work: first poly time & sample complexity
    solutions for parameter estimation & structure
    learning of factor graphs.
  • Current practice for parameter learning: max
    likelihood.
  • Expensive, and applies only to tractable
    networks.
  • Current practice for structure learning: local
    search heuristics or heuristic learning of a
    bounded tree-width model.
  • Slow to evaluate, and no performance guarantees.
    [4,5,6,7,8]

[1] Chow & Liu, 1968. [2] Srebro, 2001.
[3] Narasimhan & Bilmes, 2004. [4] Della Pietra et al., 1997.
[5] McCallum, 2003. [6] Malvestuto, 1991.
[7] Bach & Jordan, 2002. [8] Deshpande et al., 2001.
6
Canonical parameterization
Parameter learning
  • Consider the factor graph shown on the slide
    [figure omitted].
  • The Hammersley-Clifford theorem gives the
    distribution as a product of canonical factors
    [equation omitted].
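The equation on this slide was an image and did not survive extraction. As a reconstruction, this is the standard Hammersley-Clifford canonical parameterization relative to a fixed default assignment x̄ (the textbook form, not necessarily the slide's exact notation):

```latex
% Canonical factor for a variable subset C, by inclusion-exclusion
% over the subsets U of C, with default assignment \bar{x}:
f^*_C(x_C) = \prod_{U \subseteq C}
  P\bigl(x_U,\ \bar{x}_{\mathcal{X} \setminus U}\bigr)^{(-1)^{|C \setminus U|}}
% For a positive distribution P, the product of the canonical factors
% over the (sub)scopes of the graph recovers P exactly.
```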
7
Canonical factors
Parameter learning
  • No lower-order interactions, by inclusion-exclusion:
  • Complete interaction.
  • Subtract lower-order interactions.
  • Compensate for double counting.
  • Frequencies only.
  • Equal number of +, - terms.
  • Closed-form parameter learning? NO. (Not yet.)
  • The frequencies P(X_{1:6} = (x_1, x_2, 0, ..., 0)) involve
    full instantiations and are thus expensive to
    estimate from samples.
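A short sketch of the inclusion-exclusion computation (helper names invented; `log_joint` stands in for the frequencies the slide refers to). It makes the problem visible: every term queries the probability of a full instantiation, which is exponentially rare in samples:

```python
# Canonical factor via inclusion-exclusion over subsets of its scope.
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def canonical_log_factor(C, x, log_joint, n):
    """log f*_C(x_C) relative to the all-zeros default assignment.
    x maps variable index -> value; log_joint maps a FULL length-n
    assignment tuple to its log probability."""
    total = 0.0
    for U in subsets(C):
        # Full instantiation: x on U, the default value 0 everywhere else.
        full = tuple(x[i] if i in U else 0 for i in range(n))
        sign = (-1) ** (len(C) - len(U))  # equal numbers of +/- terms
        total += sign * log_joint(full)
    return total
```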
8
Markov blanket canonical factors
Parameter learning
Positive and negative terms of the canonical factor
[derivation omitted]:
  • Transform to conditional probabilities.
  • Terms cancel.
  • Conditional independence.
  • Low-dimensional distributions.
  • (MB = Markov blanket.)
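The derivation itself was an image; the following reconstructs the standard identity behind these bullets (assuming the default assignment x̄ and that X_C is independent of the remaining variables given X_MB(C)):

```latex
% Each full-instantiation term factors via the chain rule, with
% everything outside C held at the default \bar{x}:
%   P(x_U, \bar{x}_{-U})
%     = P\bigl(x_U, \bar{x}_{C \setminus U} \mid \bar{x}_{\mathcal{X} \setminus C}\bigr)
%       \, P\bigl(\bar{x}_{\mathcal{X} \setminus C}\bigr).
% The P(\bar{x}_{\mathcal{X} \setminus C}) parts cancel (equal numbers
% of + and - terms), and conditional independence shrinks the
% conditioning set to the Markov blanket:
f^*_C(x_C) = \prod_{U \subseteq C}
  P\bigl(x_U,\ \bar{x}_{C \setminus U} \mid \bar{x}_{\mathrm{MB}(C)}\bigr)^{(-1)^{|C \setminus U|}}
```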
9
Markov blanket canonical factors
Parameter learning
  • C_j: all subfactors of the given structure.
  • Each canonical factor is computed from the
    distribution over C_j, MB(C_j) only.
  • Low-dimensional distributions.
  • Efficient estimation from samples.
  • Example [figure omitted].

10
Parameter learning
Parameter learning
  • Algorithm:
  • Estimate the Markov blanket canonical
    factors from data.
  • Return their product as the learned distribution.
  • Theorem. The parameter learning algorithm
  • runs in polynomial time,
  • uses a polynomial number of samples,
  • guarantees D(P || P̂) is small with high probability.

No dependence on tree-width of the network!
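A sketch of the estimation step (invented helper names; binary variables; reuses subsets() from the earlier sketch). Each inclusion-exclusion term is now a conditional frequency given the default assignment on MB(C), so only the low-dimensional distribution over C and MB(C) is ever touched:

```python
import math

def empirical_cond(samples, target, mb_evidence):
    """P-hat(target | mb_evidence); both are dicts {var index: value}."""
    pool = [s for s in samples
            if all(s[i] == v for i, v in mb_evidence.items())]
    hits = sum(all(s[i] == v for i, v in target.items()) for s in pool)
    # Crude smoothing so log() below is defined; a real implementation
    # would handle sparse counts more carefully.
    return max(hits, 0.5) / max(len(pool), 1)

def mb_canonical_log_factor(samples, C, MB, x):
    """log f-hat*_C(x_C), with the all-zeros default assignment."""
    mb_evidence = {i: 0 for i in MB}
    total = 0.0
    for U in subsets(C):
        target = {i: (x[i] if i in U else 0) for i in C}
        sign = (-1) ** (len(C) - len(U))
        total += sign * math.log(empirical_cond(samples, target, mb_evidence))
    return total
```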
11
Graceful degradation
Parameter learning
  • Theorem. When
  • the true distribution has factor graph G,
  • the structure given to parameter learning is a
    factor graph Ĝ (≠ G),
  • then the additional error consists of two terms:
  • (1) Canonical factors capture residual highest-order
    interactions only; the error is small when the
    subfactors are in Ĝ.
  • (2) If the Markov blankets in the given factor graph Ĝ
    are a good approximation of the Markov blankets in G,
    the error will be small. (See structure learning.)
12
Structure learning
Structure learning
  • Assume factor size ≤ k.
  • Naive plan: take as structure all factors of
    size ≤ k, then run parameter learning? NO:
  • estimating Markov blanket canonical factors
    requires knowledge of the Markov blankets,
  • but if we knew the Markov blankets, the structure
    learning problem would be solved.
13
Recovering the Markov blankets
Structure learning
  • Markov blanket criterion + true distribution =>
    true Markov blankets.
  • Markov blanket criterion + sample data => ???
  • At best an approximate Markov blanket from sample
    data.
  • Key for parameter learning: a desired property for
    the approximate Markov blanket [stated below].
14
Conditional entropy
Structure learning
  • Conditional entropy criterion: score a candidate
    Markov blanket Y for C by H(X_C | X_Y).
  • Conditional independence:
    H(X_C | X_{MB(C)}) = H(X_C | X_{MB(C)}, X_Y).
  • Conditioning reduces entropy: for any X, Y, Z,
    H(X | Y, Z) ≤ H(X | Y).
  • Thus H(X_C | X_{MB(C)}) ≤ H(X_C | X_Y): the true
    Markov blanket minimizes conditional entropy.
  • Conditional entropy + true distribution => true
    Markov blankets.
  • Conditional entropy + sample data => ???
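A sketch of the empirical score (invented names; samples is a list of assignment tuples; C and Y are disjoint tuples of variable indices):

```python
import math
from collections import Counter

def empirical_conditional_entropy(samples, C, Y):
    """H-hat(X_C | X_Y) = H-hat(X_C, X_Y) - H-hat(X_Y)."""
    def entropy(indices):
        counts = Counter(tuple(s[i] for i in indices) for s in samples)
        n = len(samples)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy(tuple(C) + tuple(Y)) - entropy(tuple(Y))

# The candidate Y with the smallest H-hat(X_C | X_Y) is kept as the
# approximate Markov blanket of C.
```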
15
Conditional entropy
Structure learning
  • Theorem. Empirical conditional entropy estimates
    are a good approximation of the true conditional
    entropy, even with only a polynomial number of
    samples.
  • Theorem. Conditional entropy satisfies the
    desired approximate Markov blanket property:
    for any ε > 0,
  • if MB̂(C) looks like a Markov blanket (its
    conditional entropy score is within ε of optimal),
  • then MB̂(C) can be used as the Markov blanket for
    learning, with error bounded in terms of ε
    [equations omitted].
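The slide's equations were images; the information-theoretic step behind this kind of guarantee is standard (a reconstruction, not necessarily the slide's exact statement): the conditional-entropy gap equals an expected KL divergence.

```latex
% For a candidate blanket Y (disjoint from C), with Z = MB(C):
I(X_C ; X_Z \mid X_Y)
  = H(X_C \mid X_Y) - H(X_C \mid X_Y, X_Z)
  = \mathbb{E}\left[ D\bigl( P(X_C \mid X_Y, X_Z) \,\|\, P(X_C \mid X_Y) \bigr) \right].
% Since H(X_C \mid X_Y, X_Z) = H(X_C \mid X_Z) by conditional
% independence, H(X_C \mid X_Y) \le H(X_C \mid X_Z) + \epsilon implies
% that the expected KL between the true conditional P(X_C \mid X_Z)
% and the surrogate P(X_C \mid X_Y) is at most \epsilon.
```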
16
Structure learning algorithm
Structure learning
  • Assume factor size ≤ k, Markov blanket size ≤ b.
  • For all subsets of variables C_j of size ≤ k:
  • find the Markov blankets from empirical
    conditional entropy,
  • estimate the Markov blanket canonical factors
    from data (parameter learning),
  • discard factors that are close to the trivial
    all-ones factor (simplify structure).
  • Return the product of the remaining factors
    (see the sketch below).
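A sketch of the full loop (invented helper names; binary variables; reuses empirical_conditional_entropy and mb_canonical_log_factor from the earlier sketches). The enumeration over candidate scopes and blankets is exponential in k and b, as the next slide notes:

```python
import math
from itertools import combinations, product

def learn_structure(samples, n, k, b, triviality_eps=0.05):
    factors = []
    for size in range(1, k + 1):
        for C in combinations(range(n), size):
            rest = [v for v in range(n) if v not in C]
            # 1. Approximate Markov blanket by empirical conditional entropy.
            candidates = (Y for r in range(b + 1)
                          for Y in combinations(rest, r))
            MB = min(candidates,
                     key=lambda Y: empirical_conditional_entropy(samples, C, Y))
            # 2. Parameter learning: Markov blanket canonical factor.
            table = {x: math.exp(mb_canonical_log_factor(samples, C, MB,
                                                         dict(zip(C, x))))
                     for x in product((0, 1), repeat=size)}
            # 3. Simplify: discard factors close to the trivial all-ones factor.
            if max(abs(math.log(v)) for v in table.values()) > triviality_eps:
                factors.append((C, table))
    return factors
```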
17
Structure learning theorem
Structure learning
  • Assume fixed factor size ≤ k, MB size ≤ b.
  • Theorem. The structure learning algorithm
  • runs in polynomial time,
  • uses a polynomial number of samples,
  • guarantees D(P || P̂) is small with high
    probability.
  • Note:
  • exponential dependence on factor size and MB size,
    for both computational and sample complexity;
  • bounded connectivity implies bounded factor and
    MB size.

No dependence on tree-width of the network!
18
Graceful degradation
Structure learning
  • Theorem. Let G be the factor graph of the true
    distribution. When in the true distribution the
    max factor size > k or the max MB size > b, the
    additional error consists of three terms:
  • (1) Canonical factors capture residual highest-order
    interactions only; the error is small when the true
    interactions of order > k are small.
  • (2) If MB̂ is a good approximation of MB, the error
    will be small.
  • (3) Factors that are trivial in the true
    distribution but estimated as non-trivial, because
    their MB size is larger than b.
19
Consequences for Bayesian networks
Structure learning
  • Bayesian network -> factor graph: 1 factor per
    conditional probability table.
  • Bounded fan-in and fan-out imply bounded factor
    size and bounded Markov blanket size.
  • Samples from P_BN with unknown structure
    -> (structure learning) -> factor graph
    distribution P̂ with D(P_BN || P̂) ≤ ε.
  • Learning a factor graph (not a Bayesian network)
    gives efficient learning of the distribution from
    finite data.
20
Related work
Structure learning
  • Finding the highest-scoring, bounded in-degree
    Bayesian network is NP-hard (Chickering, Meek &
    Heckerman, 2003).
  • Our algorithm recovers a factor graph
    representation only.
  • The (difficult) acyclicity constraint is avoided.
  • Learning a factor graph (not a Bayesian network)
    gives efficient learning of the distribution
    from finite data.
  • Note: Spirtes, Glymour & Scheines (2000) and
    Chickering & Meek (2002) do recover Bayesian
    network structure, but only with access to the
    true distribution (infinite sample size).

21
Discussion and conclusion
Conclusion
  • First polynomial time & polynomial sample
    complexity learning algorithm for factor graphs.
  • Applicable to any factor graph of bounded factor
    size and connectivity,
  • including intractable networks (e.g., grids).
  • Practical drawbacks of the proposed algorithm:
  • estimates parameters from only a small fraction
    of the data;
  • the structure learning algorithm enumerates all
    possible Markov blankets;
  • complexity exponential in Markov blanket size.

22
Done ...
  • Additional and outdated slides follow.

23
Parameter learning theorem
Detailed theorem statements
24
Structure learning theorem
Detailed theorem statements
25
Learning factor graphs in polynomial time & sample
complexity
  • Factor graphs: a superset of Markov and Bayesian
    networks.

Markov network (MN) -> factor graph: 1 factor per
clique.
Bayesian network (BN) -> factor graph: 1 factor per
conditional probability table.
  • Current practice in Markov network learning:
  • parameter learning: max likelihood, only
    applicable in tractable MNs;
  • structure learning: local-search heuristics or
    heuristic learning of a bounded tree-width model.
    No performance guarantees.
  • Finding the highest-scoring BN is NP-hard
    (Chickering et al., 2003).

Pieter Abbeel, Daphne Koller and Andrew Y. Ng
26
Learning factor graphs in polynomial time
sample complexity
  • First polynomial time & sample complexity
    learning algorithm for factor graphs.
  • Applicable to any factor graph of bounded factor
    size and connectivity,
  • including intractable networks (e.g., grids).
  • New technical ideas
  • Parameter learning in closed-form, using
    parameterization with low-dimensional frequencies
    only.
  • Structure learning: results about
    guaranteed-approximate Markov blankets from
    sample data.

Pieter Abbeel, Daphne Koller and Andrew Y. Ng
27
Relation to Narasimhan Bilmes (2004)
Structure learning
                                     Narasimhan & Bilmes (2004)   This paper
Independent of tree-width?           NO                           YES
Independent of Markov blanket size?  YES                          NO
Graceful degradation result?         NO                           YES

n x n grid: tree-width = n+1, Markov blanket size = 6.
n-star graph: tree-width = 2, Markov blanket size = n.
28
Canonical parameterization
29
Canonical parameterization (2)
30
Canonical parameterization (3)
31
Markov blanket canonical factors
32
Markov blanket canonical parameterization
33
Approximate Markov blankets
34
Structure learning algorithm
35
Structure learning algorithm