Title: Inference
1. Inference

2. Overview
- The MC-SAT algorithm
- Knowledge-based model construction
- Lazy inference
- Lifted inference
3MCMC Gibbs Sampling
state ? random truth assignment for i ? 1 to
num-samples do for each variable x
sample x according to P(xneighbors(x))
state ? state with new value of x P(F) ? fraction
of states in which F is true
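The loop above can be sketched concretely. This is a minimal, self-contained illustration over a toy weighted-clause model with two atoms (the clause encoding and all helper names are hypothetical, not from the slides); P(x) ∝ exp(Σᵢ wᵢ fᵢ(x)), and each Gibbs step resamples one variable from its conditional.

```python
import math
import random

random.seed(0)

# Each clause is (weight, literals); a literal (var, sign) holds when
# state[var] == sign. A clause is satisfied if any literal holds.
clauses = [
    (1.5, [("Smokes", False), ("Cancer", True)]),  # Smokes => Cancer
    (0.5, [("Smokes", True)]),
]
variables = ["Smokes", "Cancer"]

def total_weight(state):
    """Sum of weights of satisfied clauses (the exponent of P, up to Z)."""
    return sum(w for w, lits in clauses
               if any(state[v] == sign for v, sign in lits))

def gibbs(num_samples, query="Cancer"):
    state = {v: random.random() < 0.5 for v in variables}
    hits = 0
    for _ in range(num_samples):
        for v in variables:
            # Conditional P(v = True | rest) from the two candidate states.
            state[v] = True
            wt = total_weight(state)
            state[v] = False
            wf = total_weight(state)
            p_true = math.exp(wt) / (math.exp(wt) + math.exp(wf))
            state[v] = random.random() < p_true
        hits += state[query]
    return hits / num_samples  # fraction of samples in which query is true

print(gibbs(2000))
```

With these weights the exact marginal P(Cancer) is about 0.66, and the estimate should land nearby.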
4. But Insufficient for Logic
- Problem: Deterministic dependencies break MCMC; near-deterministic ones make it very slow
- Solution: Combine MCMC and WalkSAT → the MC-SAT algorithm
5. The MC-SAT Algorithm
- MC-SAT = MCMC + SAT
- MCMC: Slice sampling with an auxiliary variable for each clause
- SAT: Wraps around SampleSAT (a uniform sampler) to sample from highly non-uniform distributions
- Sound: Satisfies ergodicity and detailed balance
- Efficient: Orders of magnitude faster than Gibbs and other MCMC algorithms
6. Auxiliary-Variable Methods
- Main ideas:
  - Use auxiliary variables to capture dependencies
  - Turn difficult sampling into uniform sampling
- Given distribution P(x):
  - Sample from f(x, u), then discard u
7. Slice Sampling [Damien et al. 1999]

[Figure: a 1-D density P(x); the auxiliary variable u(k) is drawn uniformly from [0, P(x(k))], defining a horizontal "slice" {x : P(x) ≥ u(k)}, and x(k+1) is drawn uniformly from that slice]
8. Slice Sampling
- Identifying the slice may be difficult
- Introduce an auxiliary variable uᵢ for each φᵢ
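The auxiliary-variable scheme can be sketched for a 1-D density. This toy avoids the slice-identification difficulty by assuming a known bounded support and rejection-sampling uniformly inside it (the function names and bounds are illustrative, not from the slides):

```python
import math
import random

random.seed(0)

def p(x):
    """Unnormalized density: a standard normal bump."""
    return math.exp(-0.5 * x * x)

def slice_sample(num_samples, lo=-5.0, hi=5.0):
    x = 0.0
    out = []
    for _ in range(num_samples):
        # Auxiliary variable: the slice height, uniform under the curve.
        u = random.uniform(0.0, p(x))
        # Sample uniformly from the slice {x : p(x) >= u} by rejection
        # inside the bounding box [lo, hi] (assumes bounded support).
        while True:
            x_new = random.uniform(lo, hi)
            if p(x_new) >= u:
                x = x_new
                break
        out.append(x)
    return out

xs = slice_sample(5000)
print(sum(xs) / len(xs))  # sample mean, close to 0 for this density
```

Discarding u leaves samples distributed according to p(x), which is the point of the auxiliary-variable construction.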
9. The MC-SAT Algorithm
- Approximate inference for Markov logic
- Use slice sampling in MCMC
- Auxiliary var. uᵢ for each clause Cᵢ:
  - Cᵢ unsatisfied: 0 ≤ uᵢ ≤ 1
    → exp(wᵢ fᵢ(x)) ≥ uᵢ for any next state x
  - Cᵢ satisfied: 0 ≤ uᵢ ≤ exp(wᵢ)
    → With prob. 1 − exp(−wᵢ), next state x must satisfy Cᵢ, to ensure that exp(wᵢ fᵢ(x)) ≥ uᵢ
10. The MC-SAT Algorithm
- Select random subset M of satisfied clauses
  - Larger wᵢ ⇒ Cᵢ more likely to be selected
  - Hard clause (wᵢ → ∞): Always selected
- Slice = States that satisfy all clauses in M
- Sample uniformly from these
11. The MC-SAT Algorithm

X(0) ← a random solution satisfying all hard clauses
for k ← 1 to num_samples
    M ← Ø
    forall Cᵢ satisfied by X(k−1)
        with prob. 1 − exp(−wᵢ) add Cᵢ to M
    endfor
    X(k) ← a uniformly random solution satisfying M
endfor
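The loop above can be sketched for a tiny model. For tractability this toy replaces SampleSAT with exhaustive enumeration of the satisfying states, which is a perfect uniform sampler on two atoms (the clause encoding and helper names are hypothetical):

```python
import math
import random
from itertools import product

random.seed(0)

atoms = ["A", "B"]
# (weight, literals); a literal (var, sign) holds when state[var] == sign.
clauses = [(2.0, [("A", False), ("B", True)]),  # A => B
           (1.0, [("A", True)])]

def satisfied(lits, state):
    return any(state[v] == sign for v, sign in lits)

def mc_sat(num_samples):
    all_states = [dict(zip(atoms, bits))
                  for bits in product([False, True], repeat=len(atoms))]
    x = random.choice(all_states)  # no hard clauses in this toy model
    b_count = 0
    for _ in range(num_samples):
        # M: keep each clause satisfied by x with prob. 1 - exp(-w_i).
        M = [lits for w, lits in clauses
             if satisfied(lits, x) and random.random() < 1 - math.exp(-w)]
        # Uniform sample from the slice: states satisfying all clauses in M
        # (never empty, since x itself satisfies M).
        candidates = [s for s in all_states
                      if all(satisfied(lits, s) for lits in M)]
        x = random.choice(candidates)
        b_count += x["B"]
    return b_count / num_samples

print(mc_sat(5000))
```

For these weights the exact marginal P(B) is about 0.73; the MC-SAT estimate should be close, illustrating that the chain mixes over the weighted distribution even though each step only does uniform sampling.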
12. The MC-SAT Algorithm
- Sound: Satisfies ergodicity and detailed balance
  (assuming we have a perfect uniform sampler)
- Approximately uniform sampler [Wei et al. 2004]
  - SampleSAT = WalkSAT + simulated annealing
  - WalkSAT: Finds a solution very efficiently, but may be highly non-uniform
  - Sim. anneal.: Uniform sampling as temperature → 0, but very slow to reach a solution
- Trade off uniformity vs. efficiency by tuning the prob. of WalkSAT steps vs. annealing steps
13. Combinatorial Explosion
- Problem: If there are n constants and the highest clause arity is c, the ground network requires O(n^c) memory (and inference time grows in proportion)
- Solutions:
  - Knowledge-based model construction
  - Lazy inference
  - Lifted inference
14. Knowledge-Based Model Construction
- Basic idea: Most of the ground network may be unnecessary, because the evidence renders the query independent of it
- Assumption: Evidence is a conjunction of ground atoms
- Knowledge-based model construction (KBMC):
  - First construct the minimum subset of the network needed to answer the query (generalization of KBMC)
  - Then apply MC-SAT (or another inference algorithm)
15. Ground Network Construction

network ← Ø
queue ← query nodes
repeat
    node ← front(queue)
    remove node from queue
    add node to network
    if node not in evidence then
        add neighbors(node) to queue
until queue = Ø
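The construction above is a breadth-first search from the query nodes that stops expanding at evidence. A minimal sketch, assuming the Markov blanket is given as a hypothetical adjacency dict:

```python
from collections import deque

def construct_network(query_nodes, evidence, neighbors):
    """BFS from the queries; evidence nodes are added but not expanded."""
    network = set()
    queue = deque(query_nodes)
    while queue:
        node = queue.popleft()
        if node in network:
            continue  # already added via another path
        network.add(node)
        if node not in evidence:  # evidence separates the query from the rest
            queue.extend(n for n in neighbors.get(node, ())
                         if n not in network)
    return network

# Usage: query Cancer(B); Smokes(A) is evidence, so Cancer(A) is never reached.
adj = {"Cancer(B)": ["Smokes(B)"],
       "Smokes(B)": ["Cancer(B)", "Smokes(A)"],
       "Smokes(A)": ["Cancer(A)", "Smokes(B)"]}
net = construct_network(["Cancer(B)"], {"Smokes(A)"}, adj)
print(sorted(net))  # → ['Cancer(B)', 'Smokes(A)', 'Smokes(B)']
```

Note that evidence atoms stay in the network (they are needed for conditioning); only their unexplored neighbors are pruned.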
16. Example: Grounding

P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A) )

[Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B); the original slides animate construction of the subnetwork needed to answer the query]
25. Lazy Inference
- Most domains are extremely sparse
  - Most ground atoms are false
  - Therefore most clauses are trivially satisfied
- We can exploit this by:
  - Having a default state for atoms and clauses
  - Grounding only those atoms and clauses with non-default states
- Typically reduces memory (and time) by many orders of magnitude
26. Example: Scientific Research

1000 papers, 100 authors

Author(person, paper)
  100,000 possible groundings, but only a few thousand are true

Author(person1, paper) ∧ Author(person2, paper) ⇒ Coauthor(person1, person2)
  10 million possible groundings, but only tens of thousands are unsatisfied
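The grounding counts above follow directly from the predicate arities; a quick check:

```python
people, papers = 100, 1000

# Author(person, paper): one grounding per (person, paper) pair.
author_groundings = people * papers

# Author(p1, paper) ∧ Author(p2, paper) ⇒ Coauthor(p1, p2):
# one grounding per (person1, person2, paper) triple.
coauthor_rule_groundings = people * people * papers

print(author_groundings)         # → 100000
print(coauthor_rule_groundings)  # → 10000000
```

This is the O(n^c) blow-up from the Combinatorial Explosion slide with c = 2 and c = 3 respectively, and it is exactly what lazy inference avoids materializing.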
27. Lazy Inference
- Here: LazySAT (a lazy version of WalkSAT)
- The method is applicable to many other algorithms (including MC-SAT)
28. Naïve Approach
- Create the groundings and keep in memory:
  - True atoms
  - Unsatisfied clauses
- Memory cost is O(# unsatisfied clauses)
- Problem:
  - Need to go to the KB for each flip
  - Too slow!
- Solution idea: Keep more things in memory
  - A list of active atoms
  - Potentially unsatisfied clauses (active clauses)
29. LazySAT: Definitions
- An atom is an active atom if:
  - It is in the initial set of active atoms, or
  - It was flipped at some point during the search
- A clause is an active clause if:
  - It can be made unsatisfied by flipping zero or more active atoms in it
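The active-clause definition has a direct operational reading: with non-active atoms fixed at the default value, a clause can be made unsatisfied iff every literal over a non-active atom is already false. A sketch of that test (the clause encoding is hypothetical; a literal (atom, sign) holds when the atom's value equals sign, and the default value is False):

```python
def is_active(clause, active_atoms, default=False):
    """True iff the clause can be made unsatisfied by flipping only
    active atoms: every literal over a fixed (non-active) atom must
    already be false under the default assignment."""
    return all(sign != default
               for atom, sign in clause
               if atom not in active_atoms)

# Coa(A,B) ∧ Coa(B,C) ⇒ Coa(A,C)  ≡  ¬Coa(A,B) ∨ ¬Coa(B,C) ∨ Coa(A,C)
clause = [("Coa(A,B)", False), ("Coa(B,C)", False), ("Coa(A,C)", True)]

# With no active atoms, the negative literals hold by default,
# so the clause is trivially satisfied and stays inactive.
print(is_active(clause, set()))                       # → False
# Once the two body atoms are active, they can be flipped to True,
# leaving all literals false — the clause becomes active.
print(is_active(clause, {"Coa(A,B)", "Coa(B,C)"}))    # → True
```

This is why sparsity helps: with a false default, almost all clauses keep a true negative literal over a fixed atom and never need to be grounded.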
30. LazySAT: The Basics
- Activate all the atoms appearing in clauses unsatisfied by the evidence DB
- Create the corresponding clauses
- Randomly assign truth values to all active atoms
- Activate an atom when it is flipped, if not already active
  - Potentially activate additional clauses
- No need to go to the KB to calculate the change in cost for flipping an active atom
31. LazySAT

for i ← 1 to max-tries do
    active_atoms ← atoms in clauses unsatisfied by DB
    active_clauses ← clauses activated by active_atoms
    soln ← random truth assignment to active_atoms
    for j ← 1 to max-flips do
        if Σ weights(sat. clauses) ≥ threshold then
            return soln
        c ← random unsatisfied clause
        with probability p
            vf ← a randomly chosen variable from c
        else
            for each variable v in c do
                compute DeltaGain(v), using weighted_KB if v ∉ active_atoms
            vf ← v with highest DeltaGain(v)
        if vf ∉ active_atoms then
            activate vf and add clauses activated by vf
        soln ← soln with vf flipped
return failure, best soln found
35. Example

Clauses (groundings of Coa(x,y) ∧ Coa(y,z) ⇒ Coa(x,z)):
  Coa(C,A) ∧ Coa(A,A) ⇒ Coa(C,A)
  Coa(A,B) ∧ Coa(B,C) ⇒ Coa(A,C)
  Coa(C,A) ∧ Coa(A,B) ⇒ Coa(C,B)
  Coa(C,B) ∧ Coa(B,A) ⇒ Coa(C,A)
  Coa(C,B) ∧ Coa(B,B) ⇒ Coa(C,B)
  ...

Atoms: Coa(A,A) = False, Coa(A,B) = False, Coa(A,C) = False, ...
36. Example (Coa(A,B) flipped to True)

Same clauses as above.

Atoms: Coa(A,A) = False, Coa(A,B) = True, Coa(A,C) = False, ...
38. LazySAT: Performance
- Solution quality:
  - Performs the same sequence of flips as WalkSAT
  - Same result as WalkSAT
- Memory cost:
  - O(# potentially unsatisfied clauses)
- Time cost:
  - Much lower initialization cost
  - Cost of creating active clauses is amortized over many flips
39. Lifted Inference
- We can do inference in first-order logic without grounding the KB (e.g., resolution)
- Let's do the same for inference in MLNs:
  - Group atoms and clauses into indistinguishable sets
  - Do inference over those
- First approach: Lifted variable elimination (not practical)
- Here: Lifted belief propagation
40. Belief Propagation

[Figure: factor graph with features (f) on one side and nodes (x) on the other; BP passes messages along the edges]

41–43. Lifted Belief Propagation

[Figures: the same factor graph with indistinguishable nodes and features progressively merged; the merged messages carry weights that are functions of the edge counts]
44. Lifted Belief Propagation
- Form a lifted network composed of supernodes and superfeatures
  - Supernode: Set of ground atoms that all send and receive the same messages throughout BP
  - Superfeature: Set of ground clauses that all send and receive the same messages throughout BP
- Run belief propagation on the lifted network
- Guaranteed to produce the same results as ground BP
- Time and memory savings can be huge
45. Forming the Lifted Network
1. Form initial supernodes: one per predicate and truth value (true, false, unknown)
2. Form superfeatures by doing joins of their supernodes
3. Form supernodes by projecting superfeatures down to their predicates
   - Supernode: Groundings of a predicate with the same number of projections from each superfeature
4. Repeat until convergence
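The join/project fixpoint above can be sketched as iterated signature refinement: atoms start grouped by evidence value, then are repeatedly regrouped by the multiset of groups they interact with, until the grouping stops changing. This toy (all names hypothetical, mirroring the slides' Smokes/Friends example) groups Smokes atoms by a simplified neighbor structure rather than real clause joins:

```python
def lift(atoms, evidence, neighbors):
    """Group atoms into supernodes by refining signatures to a fixpoint."""
    # color[a] identifies a's current group; start with one group
    # per evidence value (true / false / unknown).
    color = {a: str(evidence.get(a, "unknown")) for a in atoms}
    while True:
        # New signature: own group plus sorted groups of interaction partners.
        sig = {a: (color[a], tuple(sorted(color[n] for n in neighbors[a])))
               for a in atoms}
        new = {a: str(sig[a]) for a in atoms}
        # Stop when the refinement induces the same partition as before.
        stable = (len(set(new.values())) == len(set(color.values())) and
                  all((new[a] == new[b]) == (color[a] == color[b])
                      for a in atoms for b in atoms))
        if stable:
            return color
        color = new

atoms = ["S(Ana)", "S(Bob)", "S(Charles)", "S(James)"]
evidence = {"S(Ana)": True}
# Simplified interaction structure: Bob and Charles are known friends;
# every atom interacts with Ana's atom.
neighbors = {"S(Ana)": ["S(Bob)", "S(Charles)", "S(James)"],
             "S(Bob)": ["S(Ana)", "S(Charles)"],
             "S(Charles)": ["S(Ana)", "S(Bob)"],
             "S(James)": ["S(Ana)"]}
groups = lift(atoms, evidence, neighbors)
print(groups["S(Bob)"] == groups["S(Charles)"])  # → True: one supernode
print(groups["S(Bob)"] == groups["S(James)"])    # → False: split apart
```

The fixpoint matches the intuitive grouping in the slides that follow: Smokes(Ana) alone, Smokes(Bob) with Smokes(Charles), and the evidence-free rest together.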
46. Example

Evidence: Smokes(Ana), Friends(Bob,Charles), Friends(Charles,Bob)
N people in the domain

47. Example

Evidence: Smokes(Ana), Friends(Bob,Charles), Friends(Charles,Bob)

Intuitive grouping:
  {Smokes(Ana)}
  {Smokes(Bob), Smokes(Charles)}
  {Smokes(James), Smokes(Harry), ...}
48. Initialization

Supernodes:
  {Smokes(Ana)}
  {Smokes(X)}, X ≠ Ana
  {Friends(Bob,Charles), Friends(Charles,Bob)}
  {Friends(Ana,X), Friends(X,Ana), Friends(Bob,X) with X ≠ Charles, ...}

Superfeatures: (none yet)
49–54. Joining the Supernodes

Superfeatures (formed by joining the supernodes above, one per animation step in the original slides):
  Smokes(Ana) ∧ Friends(Ana,X) ⇒ Smokes(X), X ≠ Ana
  Smokes(X) ∧ Friends(X,Ana) ⇒ Smokes(Ana), X ≠ Ana
  Smokes(Bob) ∧ Friends(Bob,Charles) ⇒ Smokes(Charles)
  Smokes(Bob) ∧ Friends(Bob,X) ⇒ Smokes(X), X ≠ Charles

Supernodes (unchanged from initialization):
  {Smokes(Ana)}
  {Smokes(X)}, X ≠ Ana
  {Friends(Bob,Charles), Friends(Charles,Bob)}
  {Friends(Ana,X), Friends(X,Ana), Friends(Bob,X) with X ≠ Charles, ...}
55–63. Projecting the Superfeatures

Populate each supernode with its projection counts from each superfeature; atoms with the same counts stay grouped, the rest split.

Superfeatures:
  Smokes(Ana) ∧ Friends(Ana,X) ⇒ Smokes(X), X ≠ Ana
  Smokes(X) ∧ Friends(X,Ana) ⇒ Smokes(Ana), X ≠ Ana
  Smokes(Bob) ∧ Friends(Bob,Charles) ⇒ Smokes(Charles)
  Smokes(Bob) ∧ Friends(Bob,X) ⇒ Smokes(X), X ≠ Charles

Resulting Smokes supernodes:
  {Smokes(Ana)}
  {Smokes(Bob), Smokes(Charles)}
  {Smokes(X)}, X ≠ Ana, Bob, Charles
64. Theorem
- There exists a unique minimal lifted network
- The lifted network construction algorithm finds it
- BP on the lifted network gives the same result as on the ground network
65. Representing Supernodes and Superfeatures
- List of tuples: Simple but inefficient
- Resolution-like: Use equality and inequality
- Form clusters (in progress)
66. Open Questions
- Can we do approximate KBMC / lazy / lifted inference?
- Can KBMC, lazy, and lifted inference be combined?
- Can we have lifted inference over both probabilistic and deterministic dependencies? (Lifted MC-SAT?)
- Can we unify resolution and lifted BP?
- Can other inference algorithms be lifted?