Memory-Efficient Inference in Relational Domains

1
Memory-Efficient Inference in Relational Domains

Parag Singla and Pedro Domingos
Computer Science & Engineering
University of Washington

2
Outline

Motivation
Satisfiability and Relational Inference
Memory-Efficient Inference
Experiments
Discussion

3
Motivation

Most problems in AI are relational
Multiple types of objects
Relations between objects
An efficient inference approach
Propositionalization followed by satisfiability testing [Kautz & Selman 96]
MPE/MAP inference in a large class of SRL models can be done using weighted satisfiability [Richardson & Domingos 06]

4
Problem

Exponential cost of propositionalization
Memory cost is O((# objects)^(clause arity))
Applicability is severely limited
Even moderately sized domains tend to run out of memory

5
Example: Scientific Research Domain
1,000 papers, 100 authors
Author(person, paper)
100,000 possible groundings, but only a few thousand true
Author(person1, paper) ∧ Author(person2, paper) ⇒ Coauthor(person1, person2)
10 million possible groundings, but only tens of thousands unsatisfied
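
The grounding counts on this slide follow from multiplying the domain sizes of the variables involved. A minimal sketch of that arithmetic, assuming the domain sizes from the slide (the helper name `count_groundings` is made up for illustration):

```python
from math import prod

# Domain sizes taken from the slide: 100 authors (persons), 1,000 papers.
domain_sizes = {"person": 100, "paper": 1000}

def count_groundings(variable_types):
    """Number of groundings = product of the domain sizes of the variables."""
    return prod(domain_sizes[t] for t in variable_types)

# Author(person, paper): one person variable, one paper variable.
author_atoms = count_groundings(["person", "paper"])

# Author(p1, paper) ∧ Author(p2, paper) ⇒ Coauthor(p1, p2):
# two person variables, one paper variable.
coauthor_clauses = count_groundings(["person", "person", "paper"])

print(author_atoms)      # 100000
print(coauthor_clauses)  # 10000000
```

Adding one more person variable to a clause multiplies the count by another factor of 100, which is the exponential dependence on clause arity.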
6
Solution

Most real-world domains are characterized by extreme sparseness
Majority of predicates are false
Majority of clauses are satisfied
Exploit sparseness
Embodied in a lazy variation of WalkSAT
Create only potentially unsatisfied clause groundings
LazySAT
Memory cost is O(# potentially unsatisfied clauses)

7
Outline

Motivation
Satisfiability and Relational Inference
Memory-Efficient Inference
Experiments
Discussion

8
Satisfiability

KB: set of formulas over Boolean variables
Every KB can be converted to CNF
(X1 ∨ X5 ∨ X7) ∧ ... ∧ (X12 ∨ X53 ∨ X5) ∧ ...
SAT: problem of finding a satisfying assignment
Prototypical NP-complete problem
Weighted SAT
Each clause is given a weight
Maximize the sum of weights of satisfied clauses
One of the most efficient approaches is stochastic local search (e.g., WalkSAT [Selman et al. 96])
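
To make the weighted SAT objective concrete, here is a small sketch that evaluates the sum of weights of satisfied clauses for one truth assignment. The representation (a literal as a `(variable, sign)` pair, a weighted clause as a `(weight, literals)` pair) and the example weights are assumptions chosen for illustration, not taken from the slides.

```python
# A literal is (variable, sign): sign=True is the positive literal X,
# sign=False is the negated literal ¬X.
# A weighted clause is (weight, [literals]); a clause is a disjunction.

def is_satisfied(clause, assignment):
    """A clause is satisfied if at least one literal agrees with the assignment."""
    return any(assignment[var] == sign for var, sign in clause)

def satisfied_weight(weighted_clauses, assignment):
    """Weighted SAT objective: sum of weights of satisfied clauses."""
    return sum(w for w, clause in weighted_clauses if is_satisfied(clause, assignment))

# Hypothetical weighted KB in CNF.
weighted_clauses = [
    (2.0, [("X1", True), ("X2", False)]),   # X1 ∨ ¬X2
    (1.5, [("X2", True)]),                  # X2
    (0.5, [("X1", False), ("X3", True)]),   # ¬X1 ∨ X3
]
assignment = {"X1": True, "X2": True, "X3": False}

total = satisfied_weight(weighted_clauses, assignment)
print(total)  # 3.5 (the third clause is unsatisfied)
```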

9
(Max)WalkSAT
for i ← 1 to max-tries do
    soln ← random truth assignment to all atoms      // initialization
    for j ← 1 to max-flips do
        if Σ weights(sat. clauses) ≥ threshold then
            return soln
        c ← random unsatisfied clause
        with probability p                            // random move
            vf ← a randomly chosen variable from c
        else                                          // greedy move
            for each variable v in c do
                compute DeltaGain(v)
            vf ← v with highest DeltaGain(v)
        soln ← soln with vf flipped
return failure, best solution found

// DeltaGain(v) returns the increase in Σ weights(sat. clauses) caused by flipping v
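
The pseudocode can be turned into a short runnable sketch. This is an illustrative Python version, not the authors' implementation: the clause representation, the `seed` parameter, the tracking of the best assignment, and the naive `delta_gain` (which re-evaluates every clause instead of updating incrementally) are all assumptions made to keep the example small.

```python
import random

def satisfied(clause, soln):
    return any(soln[var] == sign for var, sign in clause)

def sat_weight(clauses, soln):
    return sum(w for w, clause in clauses if satisfied(clause, soln))

def delta_gain(clauses, soln, var):
    """Increase in the weight of satisfied clauses caused by flipping var."""
    before = sat_weight(clauses, soln)
    soln[var] = not soln[var]
    after = sat_weight(clauses, soln)
    soln[var] = not soln[var]  # undo the trial flip
    return after - before

def max_walksat(clauses, threshold, max_tries=10, max_flips=1000, p=0.5, seed=0):
    rng = random.Random(seed)
    variables = sorted({var for _, clause in clauses for var, _ in clause})
    best, best_weight = None, float("-inf")
    for _ in range(max_tries):
        # Initialization: random truth assignment to all atoms.
        soln = {var: rng.random() < 0.5 for var in variables}
        for _ in range(max_flips):
            weight = sat_weight(clauses, soln)
            if weight > best_weight:
                best, best_weight = dict(soln), weight
            if weight >= threshold:
                return True, soln
            unsat = [clause for _, clause in clauses if not satisfied(clause, soln)]
            if not unsat:
                break  # everything is satisfied but the threshold is unreachable
            c = rng.choice(unsat)
            if rng.random() < p:
                vf = rng.choice(c)[0]  # random move
            else:
                # Greedy move: variable in c with the highest DeltaGain.
                vf = max((var for var, _ in c),
                         key=lambda var: delta_gain(clauses, soln, var))
            soln[vf] = not soln[vf]
    return False, best  # failure, best solution found

# Hypothetical example: (A ∨ B), (¬A ∨ B), (¬B ∨ C), each with weight 1.
clauses = [
    (1.0, [("A", True), ("B", True)]),
    (1.0, [("A", False), ("B", True)]),
    (1.0, [("B", False), ("C", True)]),
]
found, soln = max_walksat(clauses, threshold=3.0)
```

Every satisfying assignment of this toy KB sets both B and C to true, so a successful run returns such an assignment.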
10
Relational Inference

First-order logic (FOL) explicitly represents a domain's relational structure
Focus on function-free, finite FOL
Propositionalize the first-order KB
Replace universal quantification by conjunction
Replace existential quantification by disjunction
Perform satisfiability testing over the propositionalized KB
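
As an illustration of propositionalizing a universally quantified clause over a finite domain, the sketch below enumerates every substitution of constants for variables. The conjunction of the resulting ground clauses replaces the universal quantifier. The data layout and the names `ground_clause`, `domains`, and `literals` are hypothetical.

```python
from itertools import product

def ground_clause(variables, domains, literals):
    """Yield one ground clause per substitution of constants for variables.

    variables: list of (name, type)
    literals:  list of (predicate, argument names, sign); sign=False is a negation
    """
    names = [name for name, _ in variables]
    for constants in product(*(domains[typ] for _, typ in variables)):
        theta = dict(zip(names, constants))
        yield [(pred, tuple(theta[a] for a in args), sign)
               for pred, args, sign in literals]

# Tiny hypothetical domain.
domains = {"person": ["Ann", "Bob"], "paper": ["P1", "P2", "P3"]}
variables = [("x", "person"), ("y", "person"), ("p", "paper")]

# Author(x,p) ∧ Author(y,p) ⇒ Coauthor(x,y), written as the clause
# ¬Author(x,p) ∨ ¬Author(y,p) ∨ Coauthor(x,y).
literals = [
    ("Author", ("x", "p"), False),
    ("Author", ("y", "p"), False),
    ("Coauthor", ("x", "y"), True),
]

ground = list(ground_clause(variables, domains, literals))
print(len(ground))  # 12 = 2 persons * 2 persons * 3 papers
```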

11
Statistical Relational Learning

Statistical relational models explicitly deal with
Relational structure
Probabilistic dependencies
Markov logic [Richardson & Domingos 06]
Weighted first-order formulas
Subsumes many other SRL models
MAP/MPE inference in Markov logic is an instance of weighted satisfiability

12
Outline

Motivation
Satisfiability and Relational Inference
Memory-Efficient Inference
Experiments
Discussion

13
Naïve Approach

Create the groundings and keep in memory only
True atoms
Unsatisfied clauses
Memory cost is O(# unsatisfied clauses)
Problem
Need to go to the KB for each flip
Too slow!
Solution idea: keep more things in memory
A list of active atoms
Potentially unsatisfied clauses (active clauses)

14
LazySAT: Definitions

An atom is an active atom if
It is in the initial set of active atoms, or
It was flipped at some point during the search
A clause is an active clause if
It can be made unsatisfied by flipping zero or more active atoms in it
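
One way to read the active-clause definition: a clause can be made unsatisfied by flipping active atoms exactly when none of its inactive atoms currently satisfies it. The sketch below encodes that test. It assumes that atoms missing from the assignment are false (the sparseness assumption) and that inactive atoms keep their current value; the function and variable names are illustrative.

```python
def is_active_clause(clause, assignment, active_atoms):
    """clause: list of (atom, sign); sign=False means the literal is negated."""
    for atom, sign in clause:
        value = assignment.get(atom, False)  # assumption: unlisted atoms are false
        if atom not in active_atoms and value == sign:
            # An inactive atom satisfies the clause, so flipping active atoms
            # alone can never make it unsatisfied.
            return False
    return True

# ¬Author(Ann,P1) ∨ ¬Author(Bob,P1) ∨ Coauthor(Ann,Bob)
clause = [
    ("Author(Ann,P1)", False),
    ("Author(Bob,P1)", False),
    ("Coauthor(Ann,Bob)", True),
]

# Both Author atoms are true and Coauthor is false: the clause is already
# unsatisfied (zero flips needed), so it is active.
evidence = {"Author(Ann,P1)": True, "Author(Bob,P1)": True}
active_with_evidence = is_active_clause(clause, evidence, active_atoms=set())

# No Author facts: ¬Author(Ann,P1) is satisfied by an inactive atom,
# so the clause is not active.
active_without_evidence = is_active_clause(clause, {}, active_atoms=set())
```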

15
LazySAT: The Basics

Activate all the atoms appearing in clauses unsatisfied by the evidence DB
Create the corresponding clauses
Randomly assign truth values to all active atoms
Activate an atom when it is flipped, if it is not already active
Potentially activate additional clauses
No need to go to the KB to calculate the change in cost for flipping an active atom

16
LazySAT
for i ← 1 to max-tries do
    active_atoms ← atoms in clauses unsatisfied by DB
    active_clauses ← clauses activated by active_atoms
    soln ← random truth assignment to active_atoms
    for j ← 1 to max-flips do
        if Σ weights(sat. clauses) ≥ threshold then
            return soln
        c ← random unsatisfied clause
        with probability p
            vf ← a randomly chosen variable from c
        else
            for each variable v in c do
                compute DeltaGain(v), using weighted_KB if v ∉ active_atoms
            vf ← v with highest DeltaGain(v)
        if vf ∉ active_atoms then
            activate vf and add clauses activated by vf
        soln ← soln with vf flipped
return failure, best soln found
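
The step that distinguishes LazySAT from WalkSAT is "activate vf and add clauses activated by vf". The sketch below shows one possible way to implement only that step: when an atom is activated, ground just the first-order clauses that mention it and keep those groundings that are active. It is not the full algorithm and not the Alchemy implementation. The data layout, the helper names, and the assumption that atoms absent from `assignment` are false are all illustrative.

```python
from itertools import product

def is_active(ground_clause, assignment, active_atoms):
    """Active if no inactive atom currently satisfies the clause."""
    for pred, args, sign in ground_clause:
        atom = (pred, args)
        if atom not in active_atoms and assignment.get(atom, False) == sign:
            return False
    return True

def groundings_with_atom(variables, literals, domains, atom):
    """Yield the groundings of one first-order clause that contain `atom`."""
    pred, consts = atom
    seen = set()
    for lit_pred, lit_args, _ in literals:
        if lit_pred != pred or len(lit_args) != len(consts):
            continue
        theta = {}  # bind the literal's variables to the atom's constants
        if not all(theta.setdefault(a, c) == c for a, c in zip(lit_args, consts)):
            continue
        free = [(name, typ) for name, typ in variables if name not in theta]
        for values in product(*(domains[typ] for _, typ in free)):
            full = dict(theta)
            full.update(zip((name for name, _ in free), values))
            ground = tuple((p, tuple(full[a] for a in args), s)
                           for p, args, s in literals)
            if ground not in seen:
                seen.add(ground)
                yield ground

def activate(atom, first_order_clauses, domains, assignment,
             active_atoms, active_clauses):
    """Mark `atom` active and add the clauses it activates."""
    active_atoms.add(atom)
    for variables, literals in first_order_clauses:
        for ground in groundings_with_atom(variables, literals, domains, atom):
            if is_active(ground, assignment, active_atoms):
                active_clauses.add(ground)

# Hypothetical example: ¬Author(x,p) ∨ ¬Author(y,p) ∨ Coauthor(x,y)
domains = {"person": ["Ann", "Bob"], "paper": ["P1", "P2"]}
coauthor_rule = (
    [("x", "person"), ("y", "person"), ("p", "paper")],
    [("Author", ("x", "p"), False),
     ("Author", ("y", "p"), False),
     ("Coauthor", ("x", "y"), True)],
)
# Evidence: Ann and Bob both wrote P1; nothing is known about P2.
assignment = {("Author", ("Ann", "P1")): True, ("Author", ("Bob", "P1")): True}
active_atoms, active_clauses = set(), set()
activate(("Coauthor", ("Ann", "Bob")), [coauthor_rule], domains,
         assignment, active_atoms, active_clauses)
print(len(active_clauses))  # 1: only the grounding with p = P1 is active
```

Of the two groundings that mention Coauthor(Ann,Bob), the one for P2 is permanently satisfied by the false atom Author(Ann,P2), so it never needs to be stored. This is the kind of saving the slides attribute to sparseness.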
20
LazySAT: Performance Analysis

Solution quality
Performs the same sequence of flips
Same result as WalkSAT
Memory cost
O(# potentially unsatisfied clauses)
Time cost
Much lower initialization cost
Cost of creating active clauses is amortized over many flips

21
Outline

Motivation
Satisfiability and Relational Inference
Memory-Efficient Inference
Experiments
Discussion

22
Experiments

Two domains
De-duplicating citation databases
Cora
BibServ
Planning
Blocks world
Used the Alchemy system [Kok et al. 2005]
Experiments run on 3 GHz machines with 3.46 GB of RAM

23
De-duplication

Weighted KB consisting of 33 first-order rules
High similarity score ⇒ same field
Same record ⇒ same fields
Cora
Cleaned version of McCallum's hand-labeled dataset
1,295 citations to 132 different research papers
BibServ
Public repository of half a million citations
Experimented on a user-donated subset (21,805 citations)

24
Methodology

Compared WalkSAT and LazySAT on varying numbers of records
Memory: number of clauses grounded
Speed: average flips/sec
Results are averaged over 5 randomly chosen subsets for each record count

25
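Results: Memory
[Figure: number of clauses grounded vs. number of records; panels: Cora, BibServ]
Growth-rate annotations on the curves: n^2.97, n^2.98, n^1.75, n^2.34
Memory reduction on full DB: 300X
Memory reduction on full DB: 400,000X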
26
Results: Speed
[Figure: average flips/sec vs. number of records; panels: Cora, BibServ]
27
Outline

Motivation
Satisfiability and Relational Inference
Memory-Efficient Inference
Experiments
Discussion

28
Conclusion

Satisfiability testing is an effective approach to inference in relational domains
Limitation: exponential cost of propositionalization
LazySAT
Exploits sparseness
Reduces memory cost by many orders of magnitude
Same solution with similar speed

29
Future Work