Title: Memory-Efficient Inference in Relational Domains
1 Memory-Efficient Inference in Relational Domains
- Parag Singla & Pedro Domingos
- Computer Science & Engineering
- University of Washington
2 Outline
- Motivation
- Satisfiability and Relational Inference
- Memory-Efficient Inference
- Experiments
- Discussion
3 Motivation
- Most problems in AI are relational
- Multiple types of objects
- Relations between objects
- An efficient inference approach: propositionalization followed by satisfiability testing [Kautz & Selman 96]
- MPE/MAP inference in a large class of SRL models can be done using weighted satisfiability [Richardson & Domingos 06]
4 Problem
- Exponential cost of propositionalization
- Memory cost is O((#objects)^(clause arity))
- Applicability is severely limited
- Even moderately sized domains tend to run out of memory
5 Example: Scientific Research Domain
1000 papers, 100 authors
Author(person, paper)
100,000 possible groundings, but only a few thousand true
Author(person1, paper) ∧ Author(person2, paper) ⇒ Coauthor(person1, person2)
10 million possible groundings, but only tens of thousands unsatisfied
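The grounding counts on this slide are products of domain sizes, one factor per logical variable. A minimal sketch of that arithmetic follows; the helper name `num_groundings` and the dictionary layout are illustrative assumptions, not part of the talk.

```python
from math import prod

# Domain sizes taken from the slide: 100 authors, 1000 papers.
DOMAIN_SIZES = {"person": 100, "paper": 1000}

def num_groundings(variable_types):
    """Product of the domain sizes of the logical variables (hypothetical helper)."""
    return prod(DOMAIN_SIZES[t] for t in variable_types)

# Author(person, paper): one person variable and one paper variable.
author_atoms = num_groundings(["person", "paper"])

# Author(person1, paper) ∧ Author(person2, paper) ⇒ Coauthor(person1, person2):
# two person variables and one paper variable.
coauthor_clauses = num_groundings(["person", "person", "paper"])

print(author_atoms)      # 100000
print(coauthor_clauses)  # 10000000
```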
6 Solution
- Most real-world domains are characterized by extreme sparseness
- Majority of predicates are false
- Majority of clauses are satisfied
- Exploit sparseness
- Embodied in a lazy variation of WalkSAT (LazySAT)
- Create only potentially unsatisfied clause groundings
- Memory cost is O(#potentially unsatisfied clauses)
8 Satisfiability
- KB: set of formulas over Boolean variables
- Every KB can be converted to CNF
- (X1 ∨ X5 ∨ X7) ∧ ... ∧ (X12 ∨ X53 ∨ X5) ∧ ...
- SAT: problem of finding a satisfying assignment
- Prototypical NP-complete problem
- Weighted SAT
- Each clause given a weight
- Maximize sum of weights of satisfied clauses
- One of the most efficient approaches is stochastic local search (e.g., WalkSAT [Selman et al. 96])
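To make the weighted SAT objective concrete, here is a small sketch that scores an assignment by the sum of the weights of the clauses it satisfies and finds the best assignment by exhaustive search. The clause encoding, the toy clauses, and the function names are assumptions made for illustration; exhaustive search is only viable for tiny problems, which is why local search is used in practice.

```python
from itertools import product

# Hypothetical toy KB: each clause is (weight, [(variable, wanted_value), ...]).
CLAUSES = [
    (2.0, [("a", True), ("b", True)]),   # a ∨ b
    (1.0, [("a", False)]),               # ¬a
    (1.5, [("b", False), ("c", True)]),  # ¬b ∨ c
]

def satisfied_weight(clauses, assignment):
    """Sum of the weights of the clauses satisfied by `assignment`."""
    return sum(
        weight
        for weight, literals in clauses
        if any(assignment[var] == wanted for var, wanted in literals)
    )

def brute_force_max(clauses):
    """Try every truth assignment and keep the one with the highest score."""
    variables = sorted({var for _, literals in clauses for var, _ in literals})
    best_score, best_assignment = float("-inf"), None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        score = satisfied_weight(clauses, assignment)
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_score, best_assignment

best_score, best_assignment = brute_force_max(CLAUSES)
print(best_score, best_assignment)
```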
9 (Max)WalkSAT
for i ← 1 to max-tries do
    soln ← random truth assignment to all atoms          // initialization
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return soln
        c ← random unsatisfied clause
        with probability p                                // random move
            vf ← a randomly chosen variable from c
        else                                              // greedy move
            for each variable v in c do
                compute DeltaGain(v)
            vf ← v with highest DeltaGain(v)
        soln ← soln with vf flipped
return failure, best solution found
// DeltaGain(v) returns the increase in ∑ weights(sat. clauses)
// caused by flipping v
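The search loop on this slide can be sketched in Python as follows. This is an illustrative sketch, not the Alchemy implementation: the clause encoding, the toy clauses, the naive recomputation in `delta_gain`, the strict `>` threshold test, and the `(found, soln)` return convention are all assumptions made here.

```python
import random

# Hypothetical toy problem: (weight, [(variable, wanted_value), ...]) per clause.
CLAUSES = [
    (2.0, [("a", True), ("b", True)]),   # a ∨ b
    (1.0, [("a", False)]),               # ¬a
    (1.5, [("b", False), ("c", True)]),  # ¬b ∨ c
]

def is_sat(clause, soln):
    return any(soln[v] == wanted for v, wanted in clause[1])

def sat_weight(clauses, soln):
    return sum(c[0] for c in clauses if is_sat(c, soln))

def delta_gain(clauses, soln, v):
    """Increase in satisfied weight caused by flipping v (naive recomputation)."""
    flipped = dict(soln)
    flipped[v] = not flipped[v]
    return sat_weight(clauses, flipped) - sat_weight(clauses, soln)

def max_walksat(clauses, threshold, max_tries=10, max_flips=1000, p=0.5, seed=0):
    """Return (True, soln) on success, else (False, best soln found)."""
    rng = random.Random(seed)
    variables = sorted({v for _, lits in clauses for v, _ in lits})
    best_score, best_soln = float("-inf"), None
    for _ in range(max_tries):
        # Initialization: random truth assignment to all atoms.
        soln = {v: rng.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            score = sat_weight(clauses, soln)
            if score > best_score:
                best_score, best_soln = score, dict(soln)
            if score > threshold:
                return True, soln
            unsat = [c for c in clauses if not is_sat(c, soln)]
            if not unsat:
                break  # all clauses satisfied, yet the threshold is out of reach
            c = rng.choice(unsat)
            if rng.random() < p:
                # Random move: flip a randomly chosen variable from c.
                vf = rng.choice(c[1])[0]
            else:
                # Greedy move: flip the variable with the highest DeltaGain.
                vf = max((v for v, _ in c[1]),
                         key=lambda v: delta_gain(clauses, soln, v))
            soln[vf] = not soln[vf]
    return False, best_soln

found, soln = max_walksat(CLAUSES, threshold=4.0)
print(found, soln, sat_weight(CLAUSES, soln))
```

A real solver would maintain per-clause satisfaction counts so that DeltaGain is updated incrementally rather than recomputed from scratch at every step.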
10 Relational Inference
- First-Order Logic (FOL) explicitly represents a domain's relational structure
- Focus on function-free finite FOL
- Propositionalize the first-order KB
- Replace universal quantification by conjunction
- Replace existential quantification by disjunction
- Perform satisfiability testing over the propositionalized KB
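The two replacement rules can be illustrated on a tiny finite domain. In the sketch below, the constants, the tuple encoding of ground atoms, and the second formula (every paper has some author) are hypothetical choices made for the example; only the coauthor rule comes from the talk.

```python
from itertools import product

# Hypothetical finite domain.
PEOPLE = ["Ann", "Bob"]
PAPERS = ["P1"]

def ground_universal():
    """∀ p1, p2, x: ¬Author(p1, x) ∨ ¬Author(p2, x) ∨ Coauthor(p1, p2).

    Universal quantification becomes a conjunction: one ground clause
    per combination of constants.
    """
    clauses = []
    for p1, p2, paper in product(PEOPLE, PEOPLE, PAPERS):
        clauses.append([
            (("Author", p1, paper), False),   # ¬Author(p1, paper)
            (("Author", p2, paper), False),   # ¬Author(p2, paper)
            (("Coauthor", p1, p2), True),     # Coauthor(p1, p2)
        ])
    return clauses

def ground_existential():
    """∀ x ∃ p: Author(p, x)   (hypothetical formula).

    Existential quantification becomes a disjunction: one literal per
    person inside a single ground clause for each paper.
    """
    return [
        [(("Author", person, paper), True) for person in PEOPLE]
        for paper in PAPERS
    ]

universal_clauses = ground_universal()
existential_clauses = ground_existential()
print(len(universal_clauses))   # 4 ground clauses: 2 * 2 * 1
print(existential_clauses)
```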
11 Statistical Relational Learning
- Statistical relational models explicitly deal with:
- Relational structure
- Probabilistic dependencies
- Markov Logic [Richardson & Domingos 06]
- Weighted first-order formulas
- Subsumes many other SRL models
- MAP/MPE inference in Markov Logic is an instance of weighted satisfiability
13 Naïve Approach
- Create the groundings and keep in memory:
- True atoms
- Unsatisfied clauses
- Memory cost is O(#unsatisfied clauses)
- Problem
- Need to go to the KB for each flip
- Too slow!
- Solution idea: keep more things in memory
- A list of active atoms
- Potentially unsatisfied clauses (active clauses)
14 LazySAT: Definitions
- An atom is an active atom if
- It is in the initial set of active atoms, or
- It was flipped at some point during the search
- A clause is an active clause if
- It can be made unsatisfied by flipping zero or more active atoms in it
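One way to read the active-clause definition: a ground clause is active unless some literal in it is currently true and belongs to an inactive atom, because that literal cannot be falsified by flipping active atoms. The sketch below encodes that reading; the function name, the tuple encoding, and the convention that unlisted atoms default to false are assumptions of this example.

```python
def is_active_clause(literals, truth, active_atoms):
    """Check the active-clause definition for one ground clause.

    literals:     [(atom, wanted_value), ...]
    truth:        current truth values; atoms not listed default to False
                  (an assumption of this sketch)
    active_atoms: set of atoms that the search is allowed to flip
    """
    for atom, wanted in literals:
        literal_is_true = truth.get(atom, False) == wanted
        if literal_is_true and atom not in active_atoms:
            # This literal keeps the clause satisfied no matter which
            # active atoms are flipped.
            return False
    return True

def coauthor_clause(p1, p2, paper):
    """¬Author(p1, paper) ∨ ¬Author(p2, paper) ∨ Coauthor(p1, p2)."""
    return [
        (("Author", p1, paper), False),
        (("Author", p2, paper), False),
        (("Coauthor", p1, p2), True),
    ]

truth = {("Author", "Ann", "P1"): True, ("Author", "Bob", "P1"): True}

# All three literals are false, so the clause is unsatisfied with zero flips.
active_p1 = is_active_clause(coauthor_clause("Ann", "Bob", "P1"), truth, set())

# ¬Author(Ann, P2) is true and Author(Ann, P2) is inactive: not active.
active_p2 = is_active_clause(coauthor_clause("Ann", "Bob", "P2"), truth, set())

# Once both Author atoms for P2 are active, the clause could become unsatisfied.
active_p2_later = is_active_clause(
    coauthor_clause("Ann", "Bob", "P2"),
    truth,
    {("Author", "Ann", "P2"), ("Author", "Bob", "P2")},
)
print(active_p1, active_p2, active_p2_later)
```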
15 LazySAT: The Basics
- Activate all the atoms appearing in clauses unsatisfied by the evidence DB
- Create the corresponding clauses
- Randomly assign truth values to all active atoms
- Activate an atom when it is flipped, if it is not already active
- Potentially activate additional clauses
- No need to go to the KB to calculate the change in cost for flipping an active atom
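The initialization step can be sketched for the coauthor rule from the earlier example. The sketch assumes that every Author atom is evidence and that every Coauthor atom is non-evidence and initially false; under those assumptions, the groundings unsatisfied by the DB are exactly those whose two Author literals are both true, so they can be found by joining the true Author facts on the paper instead of enumerating every grounding. The toy data and names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical evidence DB: the only true Author atoms, as (person, paper).
AUTHOR_DB = {("Ann", "P1"), ("Bob", "P1"), ("Cat", "P2")}
PEOPLE = ["Ann", "Bob", "Cat", "Dan"]
PAPERS = ["P1", "P2", "P3"]

def clauses_unsatisfied_by_db(author_db):
    """Groundings (p1, p2, paper) of
    Author(p1, paper) ∧ Author(p2, paper) ⇒ Coauthor(p1, p2)
    whose body is true in the DB."""
    authors_of = defaultdict(list)
    for person, paper in sorted(author_db):
        authors_of[paper].append(person)
    clauses = []
    for paper, authors in sorted(authors_of.items()):
        # Only pairs of true authors of the same paper are ever touched.
        for p1, p2 in product(authors, repeat=2):
            clauses.append((p1, p2, paper))
    return clauses

active_clauses = clauses_unsatisfied_by_db(AUTHOR_DB)
# Non-evidence atoms appearing in those clauses become the initial active atoms.
active_atoms = {("Coauthor", p1, p2) for p1, p2, _ in active_clauses}
full_grounding_size = len(PEOPLE) ** 2 * len(PAPERS)

print(len(active_clauses), "of", full_grounding_size)  # 5 of 48
```

Even on this toy domain only 5 of the 48 groundings are created; the gap widens as the domain grows and stays sparse.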
16 LazySAT
for i ← 1 to max-tries do
    active_atoms ← atoms in clauses unsatisfied by DB
    active_clauses ← clauses activated by active_atoms
    soln ← random truth assignment to active_atoms
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return soln
        c ← random unsatisfied clause
        with probability p
            vf ← a randomly chosen variable from c
        else
            for each variable v in c do
                compute DeltaGain(v), using weighted_KB if v ∉ active_atoms
            vf ← v with highest DeltaGain(v)
        if vf ∉ active_atoms then
            activate vf and add clauses activated by vf
        soln ← soln with vf flipped
return failure, best soln found
20 LazySAT: Performance Analysis
- Solution quality
- Performs the same sequence of flips
- Same result as WalkSAT
- Memory cost
- O(#potentially unsatisfied clauses)
- Time cost
- Much lower initialization cost
- Cost of creating active clauses amortized over many flips
22 Experiments
- Two domains
- De-duplicating citation databases
- Cora
- BibServ
- Planning
- Blocks world
- Used the Alchemy system [Kok et al. 2005]
- Experiments run on 3 GHz machines with 3.46 GB of RAM
23 De-duplication
- Weighted KB consisting of 33 first-order rules
- High similarity score ⇒ same field
- Same record ⇒ same fields
- Cora
- Cleaned version of McCallum's hand-labeled dataset
- 1295 citations to 132 different research papers
- BibServ
- Public repository of half a million citations
- Experimented on user-donated subset (21,805 citations)
24 Methodology
- Compared using varying numbers of records
- Memory: number of clauses grounded
- Speed: average flips/sec
- Results reported as an average over 5 randomly chosen subsets for a given record count
25 Results: Memory
[Figure: memory plots for Cora and BibServ; growth-rate labels on the plots: n^2.97, n^2.98, n^1.75, n^2.34]
Memory reduction on full DB: 300X
Memory reduction on full DB: 400,000X
26 Results: Speed
[Figure: speed plots for Cora and BibServ]
28 Conclusion
- Satisfiability testing is an effective approach to inference in relational domains
- Limitation: exponential cost of propositionalization
- LazySAT
- Exploits sparseness
- Reduces memory cost by many orders of magnitude
- Same solution with similar speed
29 Future Work
- Extending LazySAT to other SAT solvers, MCMC
- Degrading gracefully when the number of clauses exceeds available memory
- Combining LazySAT with KBMC
- LazySAT available in the Alchemy system
- http://www.cs.washington.edu/ai/alchemy