Transcript and Presenter's Notes

Title: Algorithms for Answering Queries with Graphical Models


1
Algorithms for Answering Queries with Graphical Models
Thesis Proposal
Anton Chechetka
  • Thesis committee: Carlos Guestrin, Eric Xing, Drew Bagnell, Pedro Domingos (UW)

21 May 2009
2
Motivation
Activity recognition
Sensor networks
Patient monitoring & diagnosis
Image credit: http://www.dremed.com
Image credit: [Pentney et al. 2006]
3
Motivation
Common problem: compute P(Q | E = e)
True temperature in a room?
Sensor 3 reads 25C
Has the person finished cooking?
The person is next to the kitchen sink (RFID)
Is the patient well?
Heart rate is 70 BPM
4
Common solution
Common problem: compute P(Q | E = e)   (the query)
Common solution: probabilistic graphical models (PGMs)
This thesis: new algorithms for learning and inference
in PGMs, to make answering queries better
[Pentney et al. 2006]
[Deshpande et al. 2004]
[Beinlich et al. 1988]
5
Graphical models
Represent factorized distributions P(X) ∝ ∏_α f_α(X_α), where the X_α are
small subsets of X ⇒ compact representation, with a
corresponding graph structure
(figure: example graph over variables X1–X5)
Learn/construct structure → Learn/define parameters → Inference → P(Q | E = e)
  • Fundamental problems:
  • Computing P(Q | E = e) given a PGM? (#P-complete / NP-complete)
  • The best parameters f_α given the structure? (exp(|X|) complexity)
  • The optimal structure (i.e. the sets X_α)? (NP-complete)
6
This thesis
Learn/construct structure → Learn/define parameters → Inference → P(Q | E = e)
NIPS 2007
1. Learning tractable models efficiently
and with quality guarantees
2. Simplifying large-scale models /
focusing inference on the query
Thesis contributions
3. Learning simple local models by
exploiting evidence assignments
7
Learning tractable models
Learn/construct structure → Learn/define parameters → Inference → P(Q | E = e)
  • Every step in the pipeline is computationally
    hard for general PGMs
  • Compounding errors
  • But there are exact inference and parameter
    learning algorithms with exp(graph treewidth)
    complexity
  • So if we learn low-treewidth models, all the rest
    is easy!

8
Treewidth
Learn/construct structure → Learn/define parameters → Inference → P(Q | E = e)
  • Learn low-treewidth models ⇒ all the rest is easy!
  • Treewidth: the size of the largest clique in a triangulated graph (minus one)
  • Computing treewidth is NP-complete in general
  • But it is easy to construct graphs with a given treewidth
  • Convenient representation: junction tree
(figure: a triangulated graph over X1–X7 and the corresponding junction tree with cliques {X1,X4,X5}, {X4,X5,X6}, {X1,X2,X5}, {X1,X3,X5}, {X1,X2,X7} and separators {X4,X5}, {X1,X5}, {X1,X5}, {X1,X2})
9
Junction trees
  • Learn junction trees ⇒ all the rest is easy!
  • Other classes of tractable models exist, e.g. [Lowd & Domingos 2008]
  • Running intersection property
  • Finding the most likely junction tree of fixed treewidth > 1 is NP-complete
  • We will look for good approximations

(figure: the same junction tree, cliques C1–C5 with their separators)
10
Independencies in low-treewidth distributions
P(X) factorizes according to a JT ⇒ the corresponding conditional independencies hold,
i.e. the conditional mutual information across each separator is zero;
and it works in the other direction too!
(figure: junction tree fragment; e.g. the separator {X1,X5} implies X4,X6 ⊥ X2,X3,X7 | X1,X5)
11
Constraint-based structure learning
We will look for JTs where this holds.
Constraint-based structure learning:
  • take all candidate separators S
  • test whether I(V, X∖(V∪S) | S) < ε for the corresponding partitions V
  • construct a junction tree consistent with the results (e.g. using dynamic programming)
(figure: candidate separators S1–S4)
12
Mutual information estimation
I(V, X∖(V∪S) | S) < ε ?
Definition: I(A, B | S) = H(A | S) − H(A | B, S).
Naïve estimation costs exp(|X|): it sums over all 2^|X| assignments to X — too expensive.
Our work: an upper bound on I(V, X∖(V∪S) | S) using the values of I(Y, Z | S)
for |Y ∪ Z| ≤ treewidth + 1; there are O(|X|^(treewidth+1)) such subsets Y and Z
⇒ complexity polynomial in |X|
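To make the cost of the naïve estimator concrete, here is a minimal sketch (my own illustration, not part of the proposal; `cond_mutual_info` and the toy data are hypothetical) of computing the empirical I(A; B | S) = H(A | S) − H(A | B, S) from samples; its cost grows with the number of joint assignments, which is what motivates the polynomial upper bound above.

```python
from collections import Counter
from math import log

def entropy(counts):
    """Entropy (in nats) of an empirical distribution given as a Counter."""
    n = sum(counts.values())
    return -sum(c / n * log(c / n) for c in counts.values())

def cond_entropy(samples, left, cond):
    """Empirical H(left | cond); samples is a list of dicts var -> value."""
    joint = Counter(tuple(s[v] for v in left + cond) for s in samples)
    marg = Counter(tuple(s[v] for v in cond) for s in samples)
    return entropy(joint) - entropy(marg)

def cond_mutual_info(samples, A, B, S):
    """Naive empirical I(A; B | S) = H(A | S) - H(A | B, S)."""
    return cond_entropy(samples, A, S) - cond_entropy(samples, A, B + S)

# toy usage: I(X1; X2 | X3) from four binary samples
data = [{"X1": 0, "X2": 0, "X3": 1}, {"X1": 1, "X2": 1, "X3": 0},
        {"X1": 0, "X2": 1, "X3": 1}, {"X1": 1, "X2": 0, "X3": 0}]
print(cond_mutual_info(data, ["X1"], ["X2"], ["X3"]))
```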
13
Mutual information estimation
I(V, X∖(V∪S) | S) < ε ?   (hard to estimate directly)
  • Theorem: suppose that P(X), S, V are such that
  • an ε-JT of treewidth k for P(X) exists, and
  • for every A ⊆ V, B ⊆ X∖(V∪S) with |A ∪ B| ≤ k+1, I(A, B | S) ≤ δ   (easy: |A ∪ B| ≤ treewidth + 1)
  • Then I(V, X∖(V∪S) | S) ≤ |X| (ε + δ)

  • Complexity O(|X|^(k+1)) ⇒ exponential speedup
  • No need to know the ε-JT, only that it exists
  • The bound is loose only when there is no hope to
    learn a good JT

14
Guarantees on learned model quality
  • Theorem: suppose that P(X) is such that
  • a strongly connected ε-JT of treewidth k for P(X) exists.
  • Then our algorithm will, with probability at least (1 − δ),
    find a JT (C, E) satisfying the quality guarantee,
    using a polynomial number of samples and polynomial time.

Corollary: strongly connected junction trees are PAC-learnable
15
Related work
Ref. | Model | Guarantees | Time
[Bach & Jordan 2002] | tractable | local | poly(n)
[Chow & Liu 1968] | tree | global | O(n² log n)
[Meila & Jordan 2001] | tree mixture | local | O(n² log n)
[Teyssier & Koller 2005] | compact | local | poly(n)
[Singh & Moore 2005] | all | global | exp(n)
[Karger & Srebro 2001] | tractable | const-factor | poly(n)
[Abbeel et al. 2006] | compact | PAC | poly(n)
[Narasimhan & Bilmes 2004] | tractable | PAC | exp(n)
our work | tractable | PAC | poly(n)
16
Results: typical convergence time
(figure: test log-likelihood vs. computation time; good results early on in practice)
17
Results: log-likelihood
Baselines: OBS — local search in limited in-degree Bayes nets;
Chow-Liu — most likely JTs of treewidth 1;
Karger-Srebro — constant-factor approximation JTs
(figure: test log-likelihood comparison across datasets; higher is better; our method)
18
This thesis
Learn/construct structure → Learn/define parameters → Inference → P(Q | E = e)
NIPS 2007
1. Learning tractable models efficiently
and with quality guarantees
2. Simplifying large-scale models /
focusing inference on the query
Thesis contributions
3. Learning simple local models by
exploiting evidence assignments
19
Approximate inference is still useful
  • Often learning a tractable graphical model is not
    an option
  • Need domain knowledge
  • Templatized models
  • Markov logic nets
  • Probabilistic relational models
  • Dynamic Bayesian nets
  • In this part, the (intractable) PGM is a given
  • What can we do with inference?
  • What if we know the query variables Q and
    the evidence E = e?

20
Query-specific simplification
  • In this part, the (intractable) PGM is a given

Observation: often many variables are unknown,
but also not important to the user
Suppose we know the variables Q of interest (the query)
Observation: usually, variables far away from
the query do not affect P(Q) much
21
Query-specific simplification
Observation: variables far away from the query
do not affect P(Q) much
(figure: model around the query; distant parts have little effect on P(Q), the part near the query matters most)
Idea: discard parts of the model that
have little effect on the query
Observation: the values of the potentials are important
Our work:
  • edge importance derived from the values of potentials
  • efficient algorithms for model simplification
  • focused inference as soft model simplification
22
Belief propagation [Pearl 1988]
  • For every edge Xi–Xj and each direction, a message m_{i→j};
    the messages into Xi determine the belief about the marginal over Xi
  • Algorithm: update messages until convergence
  • A fixed point of BP(·) is the solution
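For reference, the standard sum-product updates on a pairwise model have the form (standard textbook notation, not copied from the slides):

\[
m_{i \to j}(x_j) \;\propto\; \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i),
\qquad
b_i(x_i) \;\propto\; \phi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i).
\]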

23
Model simplification problem

Model simplification problem:
choose which message updates to skip so that
- the inference cost becomes small enough
- the BP fixed point for P(Q) does not change much
24
Edge costs
  • Inference cost IC(i→j): the complexity of one BP update for m_{i→j}
  • Approximation value AV(i→j): a measure of the influence of m_{i→j} on the belief P(Q)

Model simplification problem:
find the set E′ ⊆ E of edges such that
- Σ AV(i→j) → max   (maximize fit quality)
- Σ IC(i→j) ≤ inference budget   (keep inference affordable)

Lemma: the model simplification problem is NP-hard
Greedy edge selection gives a constant-factor approximation
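A minimal sketch of greedy edge selection for this budgeted problem (my own illustration, not the proposal's implementation; the `Edge` class and the toy values are hypothetical): repeatedly take the edge with the best value-to-cost ratio while the inference budget allows.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    name: str
    av: float   # approximation value AV(i -> j)
    ic: float   # inference cost IC(i -> j)

def greedy_select(edges, budget):
    """Greedily pick edges by AV/IC ratio subject to a total inference budget."""
    chosen, spent = [], 0.0
    for e in sorted(edges, key=lambda e: e.av / e.ic, reverse=True):
        if spent + e.ic <= budget:
            chosen.append(e)
            spent += e.ic
    return chosen

edges = [Edge("1->2", av=0.9, ic=4), Edge("2->3", av=0.5, ic=1),
         Edge("3->q", av=0.8, ic=2)]
print([e.name for e in greedy_select(edges, budget=3)])
```

In the budgeted-maximization literature such ratio-greedy rules are typically combined with the best single element to obtain constant-factor guarantees.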
25
Approximation values
  • Approximation value AV(i→j): a measure of the influence of m_{i→j} on the belief P(Q)

How important is the edge (i→j) to the message (r→q) at the query?
Along a path π of edges, m_{r→q} = BP_π(m_{v→n}), so we can define
path strength(π) as the strength of that dependency.
Max-sensitivity approximation value: AV(i→j) is the single strongest
dependency (in terms of derivative) that (i→j) participates in:
AV(i→j) = max_{(i→j) ∈ π} path strength(π)
26
Efficient model simplification
Max-sensitivity approximation value: AV(i→j) is the single strongest
dependency (in derivative) that (i→j) participates in.
Lemma: with max-sensitivity edge values one can find the optimal submodel
- as the first M edges expanded by best-first search
- with constant-time computation per expanded edge
  (using [Mooij & Kappen 2007])
Simplification complexity is independent of the size
of the full model (it only depends on the solution size)
Templated models: only instantiate the model parts
that are in the solution
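A sketch of how such a best-first expansion could look (my own illustration, under the assumption that path strength is the product of per-edge derivative bounds, which makes the strongest path to the query computable Dijkstra-style; the names and data layout are made up):

```python
import heapq

def best_first_submodel(query_edges, dependents, M):
    """
    Best-first expansion of BP edges by (assumed) max-sensitivity value.

    dependents[e] -> list of (e_prev, deriv) pairs: message e_prev feeds into
    message e with dependency strength |d m_e / d m_e_prev| <= deriv.
    Path strength = product of derivative bounds along the path, so the
    strongest path to the query is found by a max-product Dijkstra search.
    Returns the first M edges expanded, i.e. the selected submodel.
    """
    heap = [(-1.0, e) for e in query_edges]    # max-heap via negated priorities
    best, selected = {}, []
    while heap and len(selected) < M:
        neg_s, e = heapq.heappop(heap)
        if e in best:                          # already reached by a stronger path
            continue
        best[e] = -neg_s
        selected.append(e)
        for e_prev, deriv in dependents.get(e, []):
            if e_prev not in best:
                heapq.heappush(heap, (neg_s * deriv, e_prev))
    return selected

# toy usage: edges named by message identifiers
deps = {"r->q": [("i->r", 0.6), ("j->r", 0.9)], "j->r": [("k->j", 0.5)]}
print(best_first_submodel(["r->q"], deps, M=3))
```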
27
Future work: multi-path dependencies
(figure: two dependency paths from the edge (i→j) to the query edge (r→q); we want to take both of them into account)
  • Using all paths is possible, but expensive: O(|E|³)
  • k strongest paths?
  • AV(i→j) = max_{(i→j) ∈ π1,…,πk} Σ_m path strength(π_m)
  • best-first search with at most k visits of an edge?

28
Perturbation approximation values
(figure: a simple path π from the edge (v→n) to the query edge (r→q); all messages not in π are held fixed)
m_{r→q} = BP_π(m_{v→n})
The max-sensitivity path strength(π) is the largest derivative value
along the path w.r.t. the endpoint message; by the mean value theorem
this gives an upper bound on the change in m_{r→q}.
Observation: it does not take the possible range
of the endpoint message into account.
Define the perturbation path strength(π) via a tighter bound
that uses properties of BP messages.
29
Efficient model simplification
Define the max-perturbation value AV(i→j) = max_{(i→j) ∈ π} path strength(π)
Lemma: with max-perturbation edge values, assuming that the message
derivatives along the paths π are known, one can find the optimal
submodel - as the first M edges expanded by best-first search
- with constant-time computation per expanded edge
Extra work: we need to know the derivatives along the paths π
Solution: use max-sensitivity best-first search as a subroutine
30
Future work: efficient max-perturbation simplification
Extra work: we need to know the derivatives along the paths π — but not always!
(figure: AV(i→j) on a number line; the exact derivative is only needed when it falls within the range above the current lower bound on path strength maintained by the best-first search)
31
Future work: computation trees
(figure: a small model and its computation trees, with variable nodes replicated across BP iterations)
Computation tree traversal = message update schedule
Idea: prune computation trees according to edge importance
32
Focused inference
  • BP proceeds until all beliefs converge
  • But we only care about the query beliefs
  • Residual importance weighting for convergence testing
  • For residual BP ⇒ more attention to more important regions

Weigh residuals by edge importance: convergence near the query
is more important, convergence far from the query is less important
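One way this could be realized in a residual-BP-style scheduler (a minimal sketch of my own; `apply_update`, the initial residuals, and the convergence threshold are placeholders, not the proposal's implementation):

```python
import heapq

def focused_residual_bp(importance, apply_update, max_updates, tol=1e-3):
    """
    Residual-BP-style scheduling weighted by edge importance to the query.

    importance:   dict edge -> weight (e.g. the AV(i -> j) values above)
    apply_update: edge -> dict {edge: new residual} for every edge whose
                  residual changed when that message was recomputed
                  (including the edge itself)
    Edges are updated in order of importance-weighted residual, so beliefs
    near the query converge first; far-away residuals carry little weight.
    """
    residual = {e: 1.0 for e in importance}              # initial residuals
    heap = [(-residual[e] * importance[e], e) for e in importance]
    heapq.heapify(heap)
    for _ in range(max_updates):
        while heap:
            neg_p, e = heapq.heappop(heap)
            if abs(-neg_p - residual[e] * importance[e]) < 1e-12:
                break                                    # entry is up to date
        else:
            return residual                              # queue exhausted
        if residual[e] * importance[e] < tol:
            return residual                              # weighted residuals converged
        for f, r in apply_update(e).items():             # recompute message on edge e
            residual[f] = r
            heapq.heappush(heap, (-r * importance[f], f))
    return residual
```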
33
Related work
  • Minimal submodel that yields exactly the same P(Q | E = e)
    regardless of the values of the potentials: knowledge-based model
    construction [Wellman et al. 1992; Richardson & Domingos 2006]
  • Graph distance as an edge importance measure
    [Pentney et al. 2006]
  • Empirical mutual information as a variable
    importance measure [Pentney et al. 2007]
  • Inference in a simplified model to quantify the
    effect of an extra edge exactly
    [Kjaerulff 1993; Choi & Darwiche 2008]

34
This thesis
Learn/construct structure → Learn/define parameters → Inference → P(Q | E = e)
NIPS 2007
1. Learning tractable models efficiently
and with quality guarantees
2. Simplifying large-scale models /
focusing inference on the query
Thesis contributions
3. Learning simple local models by
exploiting evidence assignments
35
Local models: motivation
Common approach: learn/construct structure → approximate parameters → approximate inference → P(Q | E = e)
This talk, part 1: learn tractable structure → optimal parameters → exact inference → P(Q | E = e)
What if no single tractable structure fits well?
36
Local models: motivation
What if no single tractable structure fits well?
Regression analogy: no single line q = f(e) fits well,
but locally the dependence is almost linear
(figure: query q vs. evidence e, with locally linear fits)
Solution: learn local tractable models
Instead of: learn tractable structure → optimal parameters → exact inference → P(Q | E = e),
do: get the evidence assignment E = e → learn a tractable structure for E = e → parameters for E = e → exact inference → P(Q | E = e)
37
Local models: example
Pipeline: get the evidence assignment E = e → learn a tractable structure for E = e → parameters for E = e → exact inference → P(Q | E = e)
Example: local conditional random fields (CRFs)
A global CRF uses one set of features and weights for all evidence assignments;
a local CRF uses a query-specific structure I_α(E) ∈ {0, 1},
selecting features per evidence assignment E = e1, E = e2, …, E = en
(figure: global CRF vs. local CRF structures for the evidence assignments E = e1, …, E = en)
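Written out, one plausible parametrization of this gating (my reconstruction under the assumption that I_α(e) switches individual features on and off; the proposal's exact form may differ):

\[
P_{\text{global}}(Q \mid E{=}e) \;\propto\; \exp\Big(\sum_{\alpha} w_\alpha\, f_\alpha(Q_\alpha, e)\Big),
\qquad
P_{\text{local}}(Q \mid E{=}e) \;\propto\; \exp\Big(\sum_{\alpha} I_\alpha(e)\, w_\alpha\, f_\alpha(Q_\alpha, e)\Big),
\quad I_\alpha(e) \in \{0, 1\}.
\]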
38
Learning local models
Need to learn the weights w and the query-specific structure I(E)
Iterate:
  • with known weights w, find good local structures for every training
    point (E = e1, Q = q1), …, (E = en, Q = qn) (e.g. by local search)
  • with known structures for every training point, find the optimal
    weights w (convex optimization)
Problem: the structure step needs the query values here —
it cannot be used at test time!
(figure: alternating optimization over the training points E = e1, …, E = en)
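A minimal sketch of this alternating scheme (my own illustration; `local_structure_search` and `fit_weights_convex` are hypothetical stand-ins for the local search and convex weight optimization above):

```python
def learn_local_models(train, init_w, local_structure_search, fit_weights_convex,
                       num_iters=10):
    """
    Alternating optimization for local models:
      1) with weights w fixed, pick a good tractable structure per training
         point (e, q) by local search;
      2) with structures fixed, fit the weights w by convex optimization.
    train is a list of (evidence, query) pairs.
    """
    w = init_w
    structures = None
    for _ in range(num_iters):
        structures = [local_structure_search(e, q, w) for e, q in train]
        w = fit_weights_convex(train, structures)
    return w, structures
```

Note the caveat on the slide: the structure step uses the query values q, so at test time the structure has to come from the predictor I(E, V) introduced on the next slide.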
39
Learning local models
Parametrize I(E) by V: I = I(E, V), and learn w together with the
query-specific structure parameters V
Iterate:
  • with known weights w, find good local structures for every training
    point (e.g. by local search)
  • with known structures for every training point, find the optimal
    weights w (convex optimization)
  • optimize V so that I(E, V) mimics the good local structures
    on the training data
(figure: the same alternating scheme, now with the structure predictor I(E, V))
40
Future work: better exploration
Need to avoid shallow local minima:
- multiple structures per data point
- stochastic optimization → sample structures
  (will the sampled structures be different enough?)
(figure: the alternating scheme with multiple / sampled structures per training point E = e1, …, E = en)
41
Future work: multi-query optimization
A separate structure for every query may be too costly.
Query clustering:
- directly using the evidence
- using the inferred model parameters (given w and V)
42
Future work: faster local search
Need efficient structure learning:
- amortize inference cost when scoring multiple search steps
Also need support for nuisance variables in the structure scores
43
Recap
Learn/construct structure → Learn/define parameters → Inference → P(Q | E = e)
NIPS 2007
1. Learning tractable models efficiently
and with quality guarantees
2. Simplifying large-scale models /
focusing inference on the query
Thesis contributions
3. Learning local tractable models by
exploiting evidence assignments
44
Timeline
  • Validation of QS model simplification
  • Activity recognition data, MLN data
  • QS simplification
  • Multi-path extensions for edge importance
    measures
  • Computation trees connections
  • Max-perturbation computation speedups
  • QS learning
  • Better exploration (stochastic optimization /
    multiple structures per datapoint)
  • Multi-query optimization
  • Validation
  • QS learning
  • Nuisance variables support
  • Local search speedups
  • Quality guarantees
  • Validation
  • Write thesis, defend

Summer 2009
Fall 2009
Spring 2010
Summer 2010
45
Thank you!
Collaborators: Carlos Guestrin, Joseph Bradley, Dafna Shahaf
46
Speeding things up
There are O(|X|^k) candidate separators here:
  • Constraint-based algorithm:
  • set L = ∅
  • for every potential separator S ⊂ X with |S| ≤ k:
  •   estimate I(·), update L
  • find a junction tree (C, E) consistent with L

Observation: there are only |X| − k separators in (C, E)
⇒ the I(·) estimations for the other O(|X|^k) separators are wasted
  • Faster heuristic:
  • until (C, E) passes the checks:
  •   estimate I(·), update L
  •   find a junction tree (C, E) consistent with L
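In code-sketch form (my own illustration under assumed helper functions `estimate_mi_bound`, `find_consistent_jt`, `passes_checks`, and a `separators` attribute on the returned tree; not the proposal's actual implementation), the two variants differ only in which separators get scored before a junction tree is attempted:

```python
from itertools import combinations

def learn_jt_exhaustive(variables, k, estimate_mi_bound, find_consistent_jt):
    """Score every candidate separator of size <= k, then build one junction tree."""
    L = {}
    for size in range(1, k + 1):
        for S in combinations(variables, size):
            L[S] = estimate_mi_bound(S)        # upper bound on I(V, rest | S)
    return find_consistent_jt(L)               # e.g. via dynamic programming

def learn_jt_lazy(estimate_mi_bound, find_consistent_jt, passes_checks,
                  max_rounds=100):
    """Faster heuristic: only score the separators of the current candidate tree."""
    L = {}
    jt = find_consistent_jt(L)
    for _ in range(max_rounds):
        if passes_checks(jt, L):
            return jt
        for S in jt.separators:                # only the |X| - k separators in use
            L[S] = estimate_mi_bound(S)
        jt = find_consistent_jt(L)
    return jt
```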

47
Speeding things up
  • Faster heuristic:
  • until (C, E) passes the checks:
  •   estimate I(·), update L
  •   find a junction tree (C, E) consistent with L

Recall that our upper bound on I(V, X∖(V∪S) | S) ≤ ε uses all Y ⊆ X∖S with |Y| ≤ k
(figure: partition into V and X∖(V∪S) across the separator S)
Idea: get a rough estimate by only looking at smaller Y (e.g. |Y| ≤ 2),
i.e. I(Y ∩ V, Y ∩ (X∖(V∪S)) | S) for Y ⊆ X∖S
  • Faster heuristic:
  • estimate I(·) with |Y| ≤ 2, form L
  • do:
  •   find a junction tree (C, E) consistent with L
  •   estimate I(· | S) with |Y| ≤ k for the separators S that appear in (C, E), update L
  •   check whether (C, E) is still an ε-JT with the updated estimates
  • until (C, E) passes the checks