Title: Performing Bayesian Inference by Weighted Model Counting
1. Performing Bayesian Inference by Weighted Model Counting
- Tian Sang, Paul Beame, and Henry Kautz
- Department of Computer Science & Engineering
- University of Washington
- Seattle, WA
2. Goal
- Extend the success of compilation-to-SAT work for NP-complete problems to compilation to SAT for #P-complete problems
- Leverage rapid advances in SAT technology
- Example: computing the permanent of a 0/1 matrix
- Inference in Bayesian networks (Roth 1996; Dechter 1999)
- Provide a practical reasoning tool
- Demonstrate the relationship between SAT and conditioning algorithms
- In particular, compilation to d-DNNF (Darwiche 2002, 2004)
3. Contributions
- A simple encoding of Bayesian networks into weighted model counting
- Techniques for extending state-of-the-art SAT algorithms for efficient weighted model counting
- Evaluation on computationally challenging domains
- Outperforms join-tree methods on problems with high tree-width
- Competitive with the best conditioning methods on problems with a high degree of determinism
4. Outline
- Model counting
- Encoding Bayesian networks
- Related Bayesian inference algorithms
- Experiments
- Grid networks
- Plan recognition
- Conclusion
5. SAT and #SAT
- Given a CNF formula:
- SAT: find a satisfying assignment
- #SAT: count the satisfying assignments
- Example: (x ∨ y) ∧ (y ∨ ¬z)
- 5 models: (0,1,0), (0,1,1), (1,1,0), (1,1,1), (1,0,0)
- Equivalently, satisfying probability = 5/2^3
- Probability that the formula is satisfied by a random truth assignment
- Can modify Davis-Putnam-Logemann-Loveland (DPLL) to calculate this value
6. DPLL for SAT
- DPLL(F)
    if F is empty, return 1
    if F contains an empty clause, return 0
    else choose a variable x to branch
    return DPLL(F|x=1) ∨ DPLL(F|x=0)
- DPLL for #SAT
  #DPLL(F)   // computes the satisfying probability of F
    if F is empty, return 1
    if F contains an empty clause, return 0
    else choose a variable x to branch
    return 0.5 * #DPLL(F|x=1) + 0.5 * #DPLL(F|x=0)
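A minimal, runnable Python sketch of the #SAT version above (illustrative code, not Cachet's): a formula is a list of clauses, each a frozenset of signed integer literals, and the function returns the satisfying probability, so the model count is that value times 2^n.

```python
def simplify(clauses, lit):
    """Assign lit = True: drop satisfied clauses, shrink clauses containing -lit."""
    out = []
    for c in clauses:
        if lit in c:
            continue                 # clause satisfied, drop it
        if -lit in c:
            c = c - {-lit}           # literal falsified, remove it
        out.append(c)
    return out

def sat_probability(clauses):
    """Counting DPLL: probability that a random assignment satisfies the formula."""
    if any(len(c) == 0 for c in clauses):
        return 0.0                   # empty clause: this branch is unsatisfiable
    if not clauses:
        return 1.0                   # empty formula: every extension satisfies it
    x = abs(next(iter(next(iter(clauses)))))   # branch on a variable occurring in F
    return 0.5 * sat_probability(simplify(clauses, x)) + \
           0.5 * sat_probability(simplify(clauses, -x))

# The slide's example: (x ∨ y) ∧ (y ∨ ¬z), with x, y, z numbered 1, 2, 3
F = [frozenset({1, 2}), frozenset({2, -3})]
print(sat_probability(F))            # 0.625 = 5 / 2^3
```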
7. Weighted Model Counting
- Each literal has a weight
- Weight of a model = product of the weights of its literals
- Weight of a formula = sum of the weights of its models
  WMC(F)
    if F is empty, return 1
    if F contains an empty clause, return 0
    else choose a variable x to branch
    return weight(x) * WMC(F|x=1) + weight(¬x) * WMC(F|x=0)
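Extending the earlier sketch to weighted counting (reusing `simplify` from above; the `weight` map from literals to weights is illustrative). The branch now combines the two sub-results with the literal weights, and variables left unconstrained when the formula empties contribute weight(v) + weight(¬v), a detail the slide's pseudocode leaves implicit.

```python
def wmc(clauses, variables, weight):
    """Weighted model count of the formula over the given variable set."""
    if any(len(c) == 0 for c in clauses):
        return 0.0                                   # conflict: contributes nothing
    if not clauses:                                  # remaining variables are unconstrained
        total = 1.0
        for v in variables:
            total *= weight[v] + weight[-v]
        return total
    x = abs(next(iter(next(iter(clauses)))))         # branch on a variable occurring in F
    rest = variables - {x}
    return (weight[x]  * wmc(simplify(clauses, x),  rest, weight) +
            weight[-x] * wmc(simplify(clauses, -x), rest, weight))

# With weight 0.5 on every literal, wmc is exactly the satisfying probability.
w = {l: 0.5 for l in (1, -1, 2, -2, 3, -3)}
print(wmc([frozenset({1, 2}), frozenset({2, -3})], {1, 2, 3}, w))   # 0.625
```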
8. Cachet
- State-of-the-art model counting program (Sang, Bacchus, Beame, Kautz, Pitassi 2004)
- Key innovation: sound integration of component caching and clause learning
- Component analysis (Bayardo & Pehoushek 2000): if formulas C1 and C2 share no variables, WMC(C1 ∧ C2) = WMC(C1) * WMC(C2)
- Caching (Majercik & Littman 1998; Darwiche 2002; Bacchus, Dalmao, Pitassi 2003; Beame, Impagliazzo, Pitassi, Segerlind 2003): save and reuse values of internal nodes of the search tree
- Clause learning (Marques-Silva 1996; Bayardo & Schrag 1997; Zhang, Madigan, Moskewicz, Malik 2001): analyze the reason for backtracking, store it as a new clause
9. Cachet
- State-of-the-art model counting program (Sang, Bacchus, Beame, Kautz, Pitassi 2004)
- Key innovation: sound integration of component caching and clause learning
- A naïve combination of all three techniques is unsound
- Can be resolved by careful cache management (Sang, Bacchus, Beame, Kautz, Pitassi 2004)
- New branching strategy (VSADS) optimized for counting (Sang, Beame, Kautz, SAT-2005)
10. Computing All Marginals
- Task: in one counting pass,
- Compute the number of models in which each literal is true
- Equivalently, compute the marginal satisfying probabilities
- Approach:
- Each recursion computes a vector of marginals
- At a branch point, compute the left and right vectors and combine them with a vector sum
- Cache vectors, not just counts
- Reasonable overhead: 10-40% slower than counting
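A sketch of that scheme, building on `simplify` and the weight map used above (no caching, so only the combination rule is shown): each call returns the total weight plus, for every unassigned variable, the weight of the models in which it is true, and the two branch vectors are combined by the weighted vector sum the slide describes.

```python
def wmc_marginals(clauses, variables, weight):
    """Return (total, marg): weighted count and, per variable, the weight of
    models in which that variable is true, so marg[v] / total is v's marginal."""
    if any(len(c) == 0 for c in clauses):
        return 0.0, {v: 0.0 for v in variables}
    if not clauses:                                   # all remaining variables are free
        total = 1.0
        for v in variables:
            total *= weight[v] + weight[-v]
        marg = {v: total * weight[v] / (weight[v] + weight[-v]) for v in variables}
        return total, marg
    x = abs(next(iter(next(iter(clauses)))))
    rest = variables - {x}
    t1, m1 = wmc_marginals(simplify(clauses, x),  rest, weight)
    t0, m0 = wmc_marginals(simplify(clauses, -x), rest, weight)
    total = weight[x] * t1 + weight[-x] * t0
    marg = {x: weight[x] * t1}                        # x is true only in the left branch
    for v in rest:                                    # vector sum of the two branches
        marg[v] = weight[x] * m1[v] + weight[-x] * m0[v]
    return total, marg

# For the earlier example with all weights 0.5: y (variable 2) is true in 4 of 5 models.
w = {l: 0.5 for l in (1, -1, 2, -2, 3, -3)}
total, marg = wmc_marginals([frozenset({1, 2}), frozenset({2, -3})], {1, 2, 3}, w)
print(marg[2] / total)                                # 0.8
```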
11. Encoding Bayesian Networks to Weighted Model Counting
[Figure: a two-node Bayesian network, A → B, with P(A) = 0.1]
12. Encoding Bayesian Networks to Weighted Model Counting
- Chance variable P added, with weight(P) = 0.2
13. Encoding Bayesian Networks to Weighted Model Counting
- ... and weight(¬P) = 0.8
14. Encoding Bayesian Networks to Weighted Model Counting
- Chance variable Q added, with weight(Q) = 0.6
15. Encoding Bayesian Networks to Weighted Model Counting
- ... and weight(¬Q) = 0.4
16. Encoding Bayesian Networks to Weighted Model Counting
- w(A) = 0.1, w(¬A) = 0.9
- w(P) = 0.2, w(¬P) = 0.8
- w(Q) = 0.6, w(¬Q) = 0.4
- w(B) = 1.0, w(¬B) = 1.0
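Read as code, the finished encoding for this network might look like the sketch below, assuming (from the figure) that chance variable P stands for P(B|A) = 0.2 and Q for P(B|¬A) = 0.6; the variable numbering and the final sanity checks (which reuse the earlier `wmc` sketch) are illustrative, not part of the original slides.

```python
# Illustrative encoding of the two-node network A -> B, with A = 1, B = 2
# and chance variables P = 3 (row A = true), Q = 4 (row A = false).
A, B, P, Q = 1, 2, 3, 4

bayes_cnf = [
    frozenset({-A, -P,  B}),   #  A ∧  P →  B
    frozenset({-A,  P, -B}),   #  A ∧ ¬P → ¬B
    frozenset({ A, -Q,  B}),   # ¬A ∧  Q →  B
    frozenset({ A,  Q, -B}),   # ¬A ∧ ¬Q → ¬B
]

weights = {
     A: 0.1, -A: 0.9,          # source node carries its prior directly
     P: 0.2, -P: 0.8,          # chance variable for P(B | A)
     Q: 0.6, -Q: 0.4,          # chance variable for P(B | ¬A)
     B: 1.0, -B: 1.0,          # non-source node: both literals weigh 1
}

# Sanity checks with the earlier wmc sketch: the encoding sums to 1, and
# conjoining the unit clause B recovers P(B) = 0.1*0.2 + 0.9*0.6 = 0.56.
print(wmc(bayes_cnf, {A, B, P, Q}, weights))                      # ≈ 1.0
print(wmc(bayes_cnf + [frozenset({B})], {A, B, P, Q}, weights))   # ≈ 0.56
```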
17. Main Theorem
- Let
- F, a weighted CNF encoding of a Bayes net
- E, an arbitrary CNF formula, the evidence
- Q, an arbitrary CNF formula, the query
- Then: Pr(Q | E) = WMC(F ∧ Q ∧ E) / WMC(F ∧ E)
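As a concrete check, applying the theorem to the illustrative encoding sketched above (reusing the hypothetical `wmc`, `bayes_cnf`, and `weights`): evidence and query are just unit clauses conjoined onto the formula before counting.

```python
evidence = [frozenset({A})]           # E: A is true
query    = [frozenset({B})]           # Q: B is true
p = (wmc(bayes_cnf + evidence + query, {A, B, P, Q}, weights) /
     wmc(bayes_cnf + evidence,         {A, B, P, Q}, weights))
print(p)                              # ≈ 0.2, i.e. P(B | A) from the CPT
```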
18. Exact Bayesian Inference Algorithms
- Junction tree algorithm (Shenoy & Shafer 1990)
- Most widely used approach
- Data structure grows exponentially large in the tree-width of the underlying graph
- To handle high tree-width, researchers developed conditioning algorithms, e.g.
- Recursive conditioning (Darwiche 2001)
- Value elimination (Bacchus, Dalmao, Pitassi 2003)
- Compilation to d-DNNF (Darwiche 2002; Chavira, Darwiche, Jaeger 2004; Darwiche 2004)
- These algorithms become similar to DPLL...
19. Techniques

| Method | Cache index | Cache value | Branching heuristic | Clause learning? |
|---|---|---|---|---|
| Weighted Model Counting | component | probability | dynamic | yes |
| Recursive Conditioning | partial assignment | probability | static | |
| Value Elimination | dependency set | probability | semi-dynamic | |
| Compiling to d-DNNF | residual formula | d-DNNF | semi-dynamic | yes |
20. Experiments
- Our benchmarks: Grid, Plan Recognition
  - Junction tree: Netica
  - Recursive conditioning: SamIam
  - Value elimination: Valelim
  - Weighted model counting: Cachet
- ISCAS-85 and SATLIB benchmarks
  - Compilation to d-DNNF: timings from (Darwiche 2004)
  - Weighted model counting: Cachet
21. Experiments: Grid Networks
- CPTs are set randomly
- A fraction of the nodes are deterministic, specified by the parameter ratio
- T is the query node
22. Results for ratio = 0.5

| Size | Junction Tree | Recursive Conditioning | Value Elimination | Weighted Model Counting |
|---|---|---|---|---|
| 10×10 | 0.02 | 0.88 | 2.0 | 7.3 |
| 12×12 | 0.55 | 1.6 | 15.4 | 38 |
| 14×14 | 21 | 7.9 | 87 | 419 |
| 16×16 | X | 104 | >20,861 | 890 |
| 18×18 | X | 2,126 | X | 13,111 |

10 problems of each size; X = memory out or time out
23. Results for ratio = 0.75

| Size | Junction Tree | Recursive Conditioning | Value Elimination | Weighted Model Counting |
|---|---|---|---|---|
| 12×12 | 0.47 | 1.5 | 1.4 | 1.0 |
| 14×14 | 2,120 | 15 | 8.3 | 4.7 |
| 16×16 | >227 | 93 | 71 | 39 |
| 18×18 | X | 1,751 | >1,053 | 81 |
| 20×20 | X | >24,026 | >94,997 | 248 |
| 22×22 | X | X | X | 1,300 |
| 24×24 | X | X | X | 4,998 |
24. Results for ratio = 0.9

| Size | Junction Tree | Recursive Conditioning | Value Elimination | Weighted Model Counting |
|---|---|---|---|---|
| 16×16 | 259 | 102 | 0.55 | 0.47 |
| 18×18 | X | 1,151 | 1.9 | 1.4 |
| 20×20 | X | >44,675 | 13 | 1.7 |
| 24×24 | X | X | 84 | 4.5 |
| 26×26 | X | X | >8,010 | 14 |
| 30×30 | X | X | X | 108 |
25. Plan Recognition
- Task
- Given a planning domain described by STRIPS operators, initial and goal states, and a time horizon
- Infer the marginal probabilities of each action
- Abstraction of strategic plan recognition: we know the enemy's capabilities and goals; what will it do?
- Modified the Blackbox planning system (Kautz & Selman 1999) to create instances
26. Plan Recognition: Results

| Problem | Variables | Junction Tree | Recursive Conditioning | Value Elimination | Weighted Model Counting |
|---|---|---|---|---|---|
| 4-step | 165 | 0.16 | 8.3 | 0.03 | 0.03 |
| 5-step | 177 | 56 | 36 | 0.04 | 0.03 |
| tire-1 | 352 | X | X | 0.68 | 0.12 |
| tire-2 | 550 | X | X | 4.1 | 0.09 |
| tire-3 | 577 | X | X | 24 | 0.23 |
| tire-4 | 812 | X | X | 25 | 1.1 |
| log-1 | 939 | X | X | 24 | 0.11 |
| log-2 | 1337 | X | X | X | 7.9 |
| log-3 | 1413 | X | X | X | 9.7 |
| log-4 | 2303 | X | X | X | 65 |
27. ISCAS/SATLIB Benchmarks

| Benchmarks reported in (Darwiche 2004) | Compiling to d-DNNF | Weighted Model Counting |
|---|---|---|
| uf200 (100 instances) | 13 | 7 |
| flat200 (100 instances) | 50 | 8 |
| c432 | 0.1 | 0.1 |
| c499 | 6 | 85 |
| c880 | 80 | 17,506 |
| c1355 | 15 | 7,057 |
| c1908 | 187 | 1,855 |
28. Summary
- Bayesian inference by translation to model counting is competitive with the best known algorithms for problems with
- High tree-width
- High degree of determinism
- Recent conditioning algorithms already make use of important SAT techniques
- Most striking: compilation to d-DNNF
- The translation approach makes it possible to quickly exploit future SAT algorithms and implementations