Title: Approximation Techniques for Automated Reasoning
1 Approximation Techniques for Automated Reasoning
- Irina Rish
- IBM T.J. Watson Research Center
- rish@us.ibm.com
- Rina Dechter
- University of California, Irvine
- dechter@ics.uci.edu
2 Outline
- Introduction
- Reasoning tasks
- Reasoning approaches: elimination and conditioning
- CSPs: exact inference and approximations
- Belief networks: exact inference and approximations
- MDPs: decision-theoretic planning
- Conclusions
3 Automated reasoning tasks
- Propositional satisfiability
- Constraint satisfaction
- Planning and scheduling
- Probabilistic inference
- Decision-theoretic planning
- Etc.
Reasoning is NP-hard => Approximations
4 Graphical Frameworks
- Our focus: graphical frameworks
- constraint and belief networks
- Nodes = variables
- Edges = dependencies
- (constraints, probabilities, utilities)
- Reasoning = graph transformations
5 Propositional Satisfiability
Example: party problem
- If Alex goes, then Becky goes
- If Chris goes, then Alex goes
- Query: Is it possible that Chris goes to the party but Becky does not?
6 Constraint Satisfaction
- Example: map coloring
- Variables: countries (A, B, C, etc.)
- Values: colors (e.g., red, green, yellow)
- Constraints: adjacent countries must have different colors (e.g., A ≠ B)
7 Constrained Optimization
Example: power plant scheduling
8 Probabilistic Inference
Example: medical diagnosis
[Figure: a belief network with nodes V (visit to Asia), S (smoking), T (tuberculosis), C (lung cancer), B (bronchitis), A (abnormality in lungs), X (X-ray), D (dyspnoea, shortness of breath).]
Query: P(T = yes | S = no, D = yes) = ?
9 Decision-Theoretic Planning
Example: robot navigation
- State: (X, Y, Battery_Level)
- Actions: Go_North, Go_South, Go_West, Go_East
- Probability of success: P
- Task: reach the goal location ASAP
10 Reasoning Methods
- Our focus: conditioning and elimination
- Conditioning
- (guessing assignments, reasoning by assumptions)
- Branch-and-bound (optimization)
- Backtracking search (CSPs)
- Cycle-cutset (CSPs, belief nets)
- Variable elimination
- (inference, propagation of constraints, probabilities, cost functions)
- Dynamic programming (optimization)
- Adaptive consistency (CSPs)
- Joint-tree propagation (CSPs, belief nets)
11 Conditioning: Backtracking Search
12 Bucket Elimination: Adaptive Consistency (Dechter and Pearl, 1987)
Bucket E: E ≠ D, E ≠ C
Bucket D: D ≠ A
Bucket C: C ≠ B
Bucket B: B ≠ A
Bucket A:
13 Bucket elimination and conditioning: a uniform framework
- Unifying approach to different reasoning tasks
- Understanding commonality and differences
- Technology transfer
- Ease of implementation
- Extensions to hybrids: conditioning + elimination
- Approximations
14 Exact CSP techniques: complexity
15 Approximations
- Exact approaches can be intractable
- Approximate conditioning
- Local search, gradient descent (optimization, CSPs, SAT)
- Stochastic simulations (belief nets)
- Approximate elimination
- Local consistency enforcing (CSPs), local probability propagation (belief nets)
- Bounded resolution (SAT)
- Mini-bucket approach (belief nets)
- Hybrids (conditioning + elimination)
- Other approximations (e.g., variational)
16 Road map
- CSPs: complete algorithms
- Variable Elimination
- Conditioning (Search)
- CSPs: approximations
- Belief nets: complete algorithms
- Belief nets: approximations
- MDPs
17 Constraint Satisfaction: Applications
- Planning and scheduling
- Configuration and design problems
- Circuit diagnosis
- Scene labeling
- Temporal reasoning
- Natural language processing
18 Constraint Satisfaction
- Example: map coloring
- Variables: countries (A, B, C, etc.)
- Values: colors (e.g., red, green, yellow)
- Constraints: adjacent countries must have different colors (e.g., A ≠ B)
19 Constraint Networks
20 The Idea of Elimination
21 Variable Elimination
Eliminate variables one by one: constraint propagation
Solution generation after elimination is backtrack-free
22 Elimination operation: join followed by projection
Join operation over A finds all solutions satisfying constraints that involve A
23 Bucket Elimination: Adaptive Consistency (Dechter and Pearl, 1987)
[Figure: bucket-elimination trace; joining the relations in each bucket and projecting out the bucket variable records new relations (e.g., R_CB, R_DB) in lower buckets.]
24 Induced Width
Width along ordering d: the maximum number of a node's earlier neighbors (parents).
Induced width w*(d): the width of the ordered induced graph, obtained by recursively connecting the parents of each node, processing the nodes from last (X_n) to first (X_1).
25 Induced width (continued)
- Finding a minimum-w* ordering is NP-complete (Arnborg, 1985)
- Greedy ordering heuristics: min-width, min-degree, max-cardinality (Bertele and Brioschi, 1972; Freuder, 1982)
- Tractable classes: trees have w* = 1
- w*(d) of a given ordering is computed in O(n) time,
- i.e., the complexity of elimination is easy to predict (see the sketch below)
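To make the "easy to predict" point concrete, here is a minimal sketch (my own code, not part of the tutorial) that computes the induced width of a given ordering; the graph representation and function name are illustrative.

def induced_width(edges, ordering):
    """edges: iterable of 2-element frozensets; ordering: nodes, first to last."""
    pos = {v: i for i, v in enumerate(ordering)}
    adj = {v: set() for v in ordering}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    width = 0
    for v in reversed(ordering):          # process nodes from last to first
        parents = {u for u in adj[v] if pos[u] < pos[v]}
        width = max(width, len(parents))
        for u in parents:                 # connect the parents of v
            adj[u] |= parents - {u}
    return width

# Example: the chain A-B-C has induced width 1 along the ordering [A, B, C].
print(induced_width([frozenset("AB"), frozenset("BC")], list("ABC")))  # 1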
26 Example: crossword puzzle
27 Crossword puzzle: adaptive consistency
28 Adaptive Consistency as bucket-elimination
- Initialize: partition the constraints into buckets bucket(X_1), ..., bucket(X_n)
- For i = n down to 1 // process buckets in reverse order
- for all relations R_1, ..., R_m in bucket(X_i), do
- // join all relations and project out X_i
- If the resulting relation is not empty, add it to bucket(X_k), where k is the largest variable index in its scope
- Else the problem is unsatisfiable
- Return the set of all relations (old and new) in the buckets (see the sketch below)
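The pseudocode above can be turned into a small, naive Python sketch. This is my illustration, not the tutorial's code: relations are (scope, set-of-tuples) pairs, and the join enumerates the Cartesian product of the domains, exponential in the joined scope, exactly as the complexity results predict.

from itertools import product

def join(rels, domains, variables):
    """Naive join of relations over the union of their scopes."""
    scope = sorted({v for sc, _ in rels for v in sc}, key=variables.index)
    tuples = set()
    for values in product(*(domains[v] for v in scope)):
        a = dict(zip(scope, values))
        if all(tuple(a[v] for v in sc) in rel for sc, rel in rels):
            tuples.add(values)
    return scope, tuples

def adaptive_consistency(variables, domains, constraints):
    """variables: ordered list; constraints: list of (scope-tuple, set-of-tuples)."""
    buckets = {v: [] for v in variables}
    for scope, rel in constraints:                # place each constraint in the
        k = max(scope, key=variables.index)       # bucket of its highest variable
        buckets[k].append((scope, rel))
    for v in reversed(variables):                 # process buckets in reverse order
        if not buckets[v]:
            continue
        scope, rel = join(buckets[v], domains, variables)
        if not rel:
            return None                           # empty join: unsatisfiable
        i = scope.index(v)
        new_scope = tuple(u for u in scope if u != v)
        if not new_scope:
            continue
        new_rel = {t[:i] + t[i + 1:] for t in rel}    # project out v
        k = max(new_scope, key=variables.index)
        buckets[k].append((new_scope, new_rel))
    return buckets    # solutions can now be generated backtrack-free along d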
29 Solving Trees (Mackworth and Freuder, 1985)
Adaptive consistency is linear for trees and equivalent to enforcing directional arc-consistency (recording only unary constraints)
30 Properties of bucket-elimination (adaptive consistency)
- Adaptive consistency generates a constraint network that is backtrack-free (can be solved without deadends).
- The time and space complexity of adaptive consistency along ordering d is O(n exp(w*(d))).
- Therefore, problems having bounded induced width are tractable (solved in polynomial time).
- Examples of tractable problem classes: trees (w* = 1), series-parallel networks (w* = 2), and in general k-trees (w* = k).
31 Road map
- CSPs: complete algorithms
- Variable Elimination
- Conditioning (Search)
- CSPs: approximations
- Belief nets: complete algorithms
- Belief nets: approximations
- MDPs
32 The Idea of Conditioning
33 Backtracking Search + Heuristics
Vanilla backtracking: variable/value ordering
Heuristics: constraint propagation, learning
- Look-ahead schemes
- Forward checking (Haralick and Elliot, 1980)
- MAC (full arc-consistency at each node) (Gaschnig, 1977)
- Look-back schemes
- Backjumping (Gaschnig 1977, Dechter 1990, Prosser 1993)
- Backmarking (Gaschnig, 1977)
- BJ+DVO (Frost and Dechter, 1994)
- Constraint learning (Dechter 1990, Frost and Dechter 1994, Bayardo and Miranker 1996)
34 Search complexity distributions
Complexity histograms (deadends, time) => continuous distributions (Frost, Rish, and Vila 1997; Selman and Gomes 1997; Hoos 1998)
[Plot: frequency (probability) vs. number of nodes explored in the search space.]
35 Constraint Programming
- Constraint solving embedded in programming languages
- Allows flexible modeling combined with algorithms
- Logic programs + forward checking
- Eclipse, ILOG, OPL
- Using only look-ahead schemes
36 Complete CSP algorithms: summary
- Bucket elimination:
- adaptive consistency (CSP), directional resolution (SAT)
- elimination operation: join-project (CSP), resolution (SAT)
- Time and space exponential in the induced width (given a variable ordering)
- Conditioning:
- Backtracking search + heuristics
- Time complexity: worst-case O(exp(n)), but average case is often much better. Space complexity: linear.
37 Road map
- CSPs: complete algorithms
- CSPs: approximations
- Approximating elimination
- Approximating conditioning
- Belief nets: complete algorithms
- Belief nets: approximations
- MDPs
38 Approximating Elimination: Local Constraint Propagation
- Problem: bucket-elimination algorithms are intractable when the induced width is large
- Approximation: bound the size of recorded dependencies, i.e., perform local constraint propagation (local inference)
- Advantages: efficiency; may discover inconsistencies by deducing new constraints
- Disadvantages: does not guarantee that a solution exists
39 From Global to Local Consistency
40 Constraint Propagation
- Arc-consistency, unit resolution, i-consistency
[Figure: constraint network over X, Y, Z, T, each with domain {1, 2, 3}; constraints: 1 ≤ X, Y, Z, T ≤ 3, X < Y, Y = Z, T < Z, X < T.]
41 Constraint Propagation
- Arc-consistency, unit resolution, i-consistency
[Figure: the same network (1 ≤ X, Y, Z, T ≤ 3, X < Y, Y = Z, T < Z, X < T) after propagation, with pruned domains.]
- Incorporated into backtracking search
- Constraint programming languages: a powerful approach for modeling and solving combinatorial optimization problems
42 Arc-consistency
Only domain constraints are recorded
Example: an AC-3 style sketch follows.
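Below is a minimal AC-3 style sketch (my own code, with assumed data structures: binary constraints given as sets of allowed value pairs, indexed by directed variable pairs). It illustrates how arc-consistency records only domain (unary) constraints.

def revise(domains, constraints, x, y):
    """Delete values of x that have no support in y; report whether x changed."""
    allowed = constraints[(x, y)]
    pruned = {a for a in domains[x]
              if not any((a, b) in allowed for b in domains[y])}
    domains[x] -= pruned
    return bool(pruned)

def ac3(domains, constraints):
    """domains: dict var -> set; constraints: dict (x, y) -> allowed (a, b) pairs."""
    queue = list(constraints)
    while queue:
        x, y = queue.pop()
        if revise(domains, constraints, x, y):
            if not domains[x]:
                return False              # empty domain: inconsistency detected
            queue.extend((z, w) for (z, w) in constraints
                         if w == x and z != y)   # recheck arcs pointing at x
    return True   # arc-consistent, but a solution is still not guaranteed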
43 Local consistency: i-consistency
- i-consistency:
- Any consistent assignment to any i-1 variables is consistent with at least one value of any i-th variable
- strong i-consistency: k-consistency for every k ≤ i
- directional i-consistency:
- Given an ordering, each variable is i-consistent with any i-1 preceding variables
- strong directional i-consistency:
- Given an ordering, each variable is strongly i-consistent with any i-1 preceding variables
44 Directional i-consistency
[Figure: constraints recorded under adaptive consistency, directional arc-consistency (d-arc), and directional path-consistency (d-path).]
45 Enforcing Directional i-consistency
- Directional i-consistency bounds the size of recorded constraints by i
- i = 1: arc-consistency
- i = 2: path-consistency
- For i = w*(d), directional i-consistency is equivalent to adaptive consistency
46 Example: SAT
- Elimination operation: resolution
- Directional Resolution = adaptive consistency (Davis and Putnam, 1960; Dechter and Rish, 1994)
- Bounded resolution: bounds the resolvent size
- BDR(i): directional i-consistency (Dechter and Rish, 1994)
- k-closure: full k-consistency (van Gelder and Tsuji, 1996)
- In general: bounded induced-width resolution
- DCDR(b): generalizes the cycle-cutset idea; limits induced width by conditioning on cutset variables (Rish and Dechter 1996; Rish and Dechter 2000)
47 Directional Resolution = Adaptive Consistency
48 DR complexity
49 History
- 1960: resolution-based Davis-Putnam algorithm
- 1962: resolution step replaced by conditioning (Davis, Logemann and Loveland, 1962) to avoid memory explosion, resulting in a backtracking search algorithm known as Davis-Putnam (DP), or the DPLL procedure.
- The dependency on induced width was not known in 1960.
- 1994: Directional Resolution (DR), a rediscovery of the original Davis-Putnam; identification of tractable classes (Dechter and Rish, 1994).
50 DR versus DPLL: complementary properties
(k,m)-tree 3-CNFs (bounded induced width)
Uniform random 3-CNFs (large induced width)
51 Complementary properties => hybrids
52 BDR-DP(i): bounded resolution + backtracking
- Complete algorithm: run BDR(i) as preprocessing before the Davis-Putnam backtracking algorithm.
- Empirical results: random vs. structured (low-w*) problems
53 DCDR(b): Conditioning + DR
55 DCDR(b): empirical results
56 Approximating Elimination: Summary
- Key idea: local propagation, restricting the number of variables involved in recorded constraints
- Examples: arc-, path-, and i-consistency (CSPs); bounded resolution, k-closure (SAT)
- For SAT:
- bucket elimination = directional resolution (the original resolution-based Davis-Putnam)
- Conditioning = DPLL (backtracking search)
- Hybrids: bounded resolution + search
- complete algorithms (BDR-DP(i), DCDR(b))
57 Road map
- CSPs: complete algorithms
- CSPs: approximations
- Approximating elimination
- Approximating conditioning
- Belief nets: complete algorithms
- Belief nets: approximations
- MDPs
58 Approximating Conditioning: Local Search
- Problem: complete (systematic, exhaustive) search can be intractable (O(exp(n)) worst-case)
- Approximation idea: explore only parts of the search space
- Advantages: anytime answer; may run into a solution quicker than systematic approaches
- Disadvantages: may not find an exact solution even if there is one; cannot detect that a problem is unsatisfiable
59 Simple greedy search
- 1. Generate a random assignment to all variables
- 2. Repeat until no improvement is made or a solution is found:
- 3. flip a variable (change its value) so that the number of satisfied constraints increases // hill-climbing step
Easily gets stuck at local maxima
60 GSAT: local search for SAT (Selman, Levesque and Mitchell, 1992)
Greatly improves hill-climbing by adding restarts and sideways moves (a runnable sketch follows the pseudocode)
- For i = 1 to MaxTries
- Select a random assignment A
- For j = 1 to MaxFlips
- if A satisfies all constraints, return A
- else flip a variable to maximize the score (the number of satisfied constraints); if no variable assignment increases the score, flip at random
- end
- end
61 WalkSAT (Selman, Kautz and Cohen, 1994)
Adds random walk to GSAT:
- With probability p:
- random walk: flip a variable in some unsatisfied constraint
- With probability 1-p:
- perform a hill-climbing step
Randomized hill-climbing often solves large and hard satisfiable problems
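As a sketch, WalkSAT changes only the flip-selection step of the GSAT loop above (this reuses num_satisfied and the clause encoding from the previous snippet; p matches the slide's probability parameter):

def walksat_flip(clauses, assign, n_vars, p=0.5):
    """One WalkSAT step; assumes the caller has checked that some clause is unsatisfied."""
    unsat = [c for c in clauses
             if not any(assign[abs(l)] == (l > 0) for l in c)]
    if random.random() < p:
        v = abs(random.choice(random.choice(unsat)))   # var from an unsat clause
    else:
        def flip_score(u):                             # hill-climbing step
            assign[u] = not assign[u]
            s = num_satisfied(clauses, assign)
            assign[u] = not assign[u]
            return s
        v = max(range(1, n_vars + 1), key=flip_score)
    assign[v] = not assign[v]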
62 Other approaches
- Different flavors of GSAT with randomization (GenSAT by Gent and Walsh, 1993; Novelty by McAllester, Kautz and Selman, 1997)
- Simulated annealing
- Tabu search
- Genetic algorithms
- Hybrid approximations
- elimination + conditioning
63 Approximating conditioning with elimination
- Energy minimization in neural networks (Pinkas and Dechter, 1995)
For cycle-cutset nodes, use the greedy update function (relative to neighbors). For the rest of the nodes, run the arc-consistency algorithm followed by value assignment.
64 GSAT with Cycle-Cutset (Kask and Dechter, 1996)
Input: a CSP, and a partition of the variables into cycle-cutset and tree variables
Output: an assignment to all the variables
Within each try: generate a random initial assignment, and then alternate between the two steps:
1. Run the Tree algorithm (arc-consistency + assignment) on the problem with fixed values of the cutset variables.
2. Run GSAT on the problem with fixed values of the tree variables.
65 Results: GSAT with Cycle-Cutset (Kask and Dechter, 1996)
66 Results: GSAT with Cycle-Cutset (Kask and Dechter, 1996)
67 Road map
- CSPs: complete algorithms
- CSPs: approximations
- Bayesian belief nets:
- complete algorithms
- Bucket elimination
- Relation to join-tree, Pearl's poly-tree algorithm, conditioning
- Belief nets: approximations
- MDPs
68 Belief Networks
[Figure: a belief network over Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D).]
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
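To make the factorization concrete, here is a small sketch with made-up CPT numbers (the tables, values, and helper are mine, purely illustrative; 1/0 encode yes/no):

from itertools import product

p_s = {1: 0.2, 0: 0.8}
p_c_s = {(1, 1): 0.10, (1, 0): 0.01}      # P(C=1 | S); P(C=0 | S) = 1 - P(C=1 | S)
p_b_s = {(1, 1): 0.30, (1, 0): 0.05}
p_x_cs = {(1, 1, 1): 0.95, (1, 1, 0): 0.90, (1, 0, 1): 0.10, (1, 0, 0): 0.05}
p_d_cb = {(1, 1, 1): 0.90, (1, 1, 0): 0.70, (1, 0, 1): 0.60, (1, 0, 0): 0.05}

def bernoulli(table, value, *parents):
    """Look up P(child = value | parents) from a table of P(child = 1 | ...)."""
    p = table[(1, *parents)]
    return p if value == 1 else 1.0 - p

def joint(s, c, b, x, d):
    """The factorized joint P(S,C,B,X,D) = P(S)P(C|S)P(B|S)P(X|C,S)P(D|C,B)."""
    return (p_s[s] * bernoulli(p_c_s, c, s) * bernoulli(p_b_s, b, s)
            * bernoulli(p_x_cs, x, c, s) * bernoulli(p_d_cb, d, c, b))

# Sanity check: the joint sums to 1 over all 32 configurations.
assert abs(sum(joint(*t) for t in product((0, 1), repeat=5)) - 1.0) < 1e-9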
69 Example: Printer Troubleshooting
70 Example: Car Diagnosis
71 What are they good for?
- Diagnosis: P(cause | symptom)?
- Prediction: P(symptom | cause)?
- Decision-making (given a cost function)
72 Probabilistic Inference Tasks
- Belief updating
- Finding the most probable explanation (MPE)
- Finding the maximum a posteriori (MAP) hypothesis
- Finding the maximum-expected-utility (MEU) decision
73 Belief Updating
[Figure: the Smoking / lung Cancer / Bronchitis / X-ray / Dyspnoea network.]
P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
74 Moral Graph
Each Conditional Probability Distribution (CPD) corresponds to a clique in the moral graph (a family: a node and its parents).
75 Belief updating: P(X | evidence)?
P(a | e=0) ∝ P(a, e=0) = Σ_{b,c,d} P(a) P(b|a) P(c|a) P(d|a,b) P(e=0|b,c)
76 Bucket elimination: Algorithm elim-bel (Dechter, 1996)
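The elim-bel slide itself is a figure; as an illustration, here is a generic sum-product bucket-elimination sketch (my own code, not the tutorial's). Factors are (scope-tuple, table) pairs whose table keys follow the scope order, and evidence is assumed to have been absorbed into the factor tables beforehand.

from itertools import product

def eliminate(factors, var, domains):
    """Multiply all factors mentioning var, then sum var out."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = sorted({v for sc, _ in touching for v in sc} - {var})
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        a = dict(zip(scope, vals))
        total = 0.0
        for x in domains[var]:
            a[var] = x
            p = 1.0
            for sc, tab in touching:
                p *= tab[tuple(a[v] for v in sc)]
            total += p
        table[vals] = total
    return rest + [(tuple(scope), table)]

def elim_bel(factors, ordering, domains, query_var):
    """Sum out the variables in 'ordering' (everything except query_var)."""
    for var in ordering:
        factors = eliminate(factors, var, domains)
    # remaining scopes are (query_var,) or (); multiply and normalize
    belief = {x: 1.0 for x in domains[query_var]}
    for sc, tab in factors:
        for x in belief:
            belief[x] *= tab[(x,)] if sc == (query_var,) else tab[()]
    z = sum(belief.values())
    return {x: p / z for x, p in belief.items()}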
77 Finding the MPE: Algorithm elim-mpe (Dechter, 1996)
Elimination operator: maximization (instead of summation)
78 Generating the MPE-tuple
79 Complexity of elimination
The effect of the ordering
80 Other tasks and algorithms
- MAP and MEU tasks
- Similar bucket-elimination algorithms: elim-map, elim-meu (Dechter, 1996)
- Elimination operation: either summation or maximization
- Restriction on variable ordering: summation must precede maximization (i.e., hypothesis or decision variables are eliminated last)
- Other inference algorithms:
- Join-tree clustering
- Pearl's poly-tree propagation
- Conditioning, etc.
81 Relationship with join-tree clustering
[Figure: a join-tree whose clusters (e.g., ABC, BCE, ADB) merge several buckets.]
A cluster is a set of buckets (a super-bucket).
82 Relationship with Pearl's belief propagation in poly-trees
Pearl's belief propagation for single-root queries = elim-bel using a topological ordering and super-buckets for families
Elim-bel, elim-mpe, and elim-map are linear for poly-trees.
83 Conditioning generates the probability tree
Complexity of conditioning: exponential time, linear space
84 Conditioning + Elimination
85 Super-bucket elimination (Dechter and El Fattah, 1996)
- Eliminating several variables at once
- Conditioning is done only in super-buckets
86 The idea of super-buckets
Larger super-buckets (cliques) => more time but less space
- Complexity:
- Time: exponential in clique (super-bucket) size
- Space: exponential in separator size
87 Application: circuit diagnosis
Problem: given a circuit and its unexpected output, identify faulty components. The problem can be modeled as a constraint optimization problem and solved by bucket elimination.
88 Time-Space Tradeoff
89 Road map
- CSPs: complete algorithms
- CSPs: approximations
- Belief nets: complete algorithms
- Belief nets: approximations
- Local inference: mini-buckets
- Stochastic simulations
- Variational techniques
- MDPs
90 Mini-buckets: local inference
- The idea is similar to i-consistency: bound the size of recorded dependencies
- Computation in a bucket is time and space exponential in the number of variables involved
- Therefore, partition the functions in a bucket into mini-buckets over smaller numbers of variables
91 Mini-bucket approximation: MPE task
Split a bucket into mini-buckets => bound the complexity
92 Approx-mpe(i)
- Input: i, the max number of variables allowed in a mini-bucket
- Output: a lower bound (the probability of a suboptimal solution) and an upper bound on the MPE
Example: approx-mpe(3) versus elim-mpe
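A sketch of the core step (the greedy packing heuristic and all names here are mine). Each mini-bucket's joined scope is capped at i variables, and each mini-bucket is then maximized over the bucket variable separately; this yields an upper bound because the max of a product is at most the product of the maxes: max_x f(x)g(x) ≤ (max_x f(x)) (max_x g(x)).

def partition_into_minibuckets(bucket_factors, i):
    """Greedily pack (scope, table) factors into mini-buckets whose combined
    scope has at most i variables; start a new mini-bucket when none fits."""
    minibuckets = []
    for scope, table in bucket_factors:
        for mb in minibuckets:
            if len(mb["scope"] | set(scope)) <= i:
                mb["scope"] |= set(scope)
                mb["factors"].append((scope, table))
                break
        else:
            minibuckets.append({"scope": set(scope),
                                "factors": [(scope, table)]})
    return minibuckets

Each mini-bucket can then be eliminated independently (with max replacing sum in an eliminate routine such as the elim-bel sketch above), so time and space are exponential in i rather than in the full bucket scope.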
93 Properties of approx-mpe(i)
- Complexity: O(exp(2i)) time and O(exp(i)) space.
- Accuracy: determined by the upper/lower (U/L) bound ratio.
- As i increases, both accuracy and complexity increase.
- Possible uses of mini-bucket approximations:
- As anytime algorithms (Dechter and Rish, 1997)
- As heuristics in best-first search (Kask and Dechter, 1999)
- Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter and Rish, 1997)
94 Anytime Approximation
95 Empirical Evaluation (Dechter and Rish, 1997; Rish, 1999)
- Randomly generated networks
- Uniform random probabilities
- Random noisy-OR
- CPCS networks
- Probabilistic decoding
- Comparing approx-mpe and anytime-mpe versus elim-mpe
96 Random networks
- Uniform random: 60 nodes, 90 edges (200 instances)
- In 80% of cases, a 10-100 times speed-up while U/L < 2
- Noisy-OR: even better results
- Exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.
97 CPCS networks: medical diagnosis (noisy-OR model)
Test case: no evidence
98 The effect of evidence
More likely evidence => higher MPE => higher accuracy (why?)
Likely evidence versus random (unlikely) evidence
99 Probabilistic decoding
Error-correcting linear block code
State-of-the-art approximate algorithm: iterative belief propagation (IBP) (Pearl's poly-tree algorithm applied to loopy networks)
100 Iterative Belief Propagation
- Belief propagation is exact for poly-trees
- IBP: applying BP iteratively to cyclic networks
- No guarantees of convergence
- Works well for many coding networks
101 approx-mpe vs. IBP
Bit error rate (BER) as a function of noise (sigma)
102 Mini-buckets: summary
- Mini-buckets: a local inference approximation
- Idea: bound the size of recorded functions
- Approx-mpe(i): a mini-bucket algorithm for MPE
- Better results for noisy-OR than for random problems
- Accuracy increases with decreasing noise
- Accuracy increases for likely evidence
- Sparser graphs -> higher accuracy
- Coding networks: approx-mpe outperforms IBP on low-induced-width codes
103 Heuristic search
- Mini-buckets record upper-bound heuristics
- The evaluation function over a partial assignment: f = g * H
- Best-first: expand the node with the maximal evaluation function
- Branch and Bound: prune a node if f is no better than the best solution found so far
- Properties:
- both are exact algorithms
- Better heuristics lead to more pruning
104 Heuristic Function
Given a cost function:
P(a,b,c,d,e) = P(a) P(b|a) P(c|a) P(e|b,c) P(d|b,a)
Define an evaluation function over a partial assignment as the probability of its best extension:
f(a,e,d) = max_{b,c} P(a,b,c,d,e) = P(a) * max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b) = g(a,e,d) * H(a,e,d)
[Figure: a partial search tree branching on A, E, D with values 0/1.]
105 Heuristic Function
H(a,e,d) = max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
         = max_c P(c|a) max_b P(e|b,c) P(b|a) P(d|a,b)
         ≤ [max_c P(c|a) max_b P(e|b,c)] * [max_b P(b|a) P(d|a,b)]
Thus, with H taken as the mini-bucket product on the right, g(a,e,d) * H(a,e,d) ≥ f(a,e,d).
The heuristic function H is compiled during the preprocessing stage of the Mini-Bucket algorithm.
106 Heuristic Function
The evaluation function f(x^p) can be computed using the functions recorded by the Mini-Bucket scheme, and can be used to estimate the probability of the best extension of a partial assignment x^p = (x_1, ..., x_p):
f(x^p) = g(x^p) * H(x^p)
For example, the mini-bucket trace:
Bucket B: max_b P(e|b,c) => h^B(e,c); max_b P(b|a) P(d|a,b) => h^B(d,a)
Bucket C: max_c P(c|a) h^B(e,c) => h^C(e,a)
Bucket D: max_d h^B(d,a) => h^D(a)
Bucket E: max_e h^C(e,a) => h^E(a)
Bucket A: max_a P(a) h^E(a) h^D(a)
yields H(a,e,d) = h^B(d,a) * h^C(e,a) and g(a,e,d) = P(a).
107 Properties
- The heuristic is monotone
- The heuristic is admissible
- The heuristic is computed in linear time
- IMPORTANT:
- Mini-buckets generate heuristics of varying strength using the control parameter (bound) i
- Higher bound -> more preprocessing -> stronger heuristics -> less search
- Allows a controlled trade-off between preprocessing and search
108 Empirical Evaluation of mini-bucket heuristics
109 Road map
- CSPs: complete algorithms
- CSPs: approximations
- Belief nets: complete algorithms
- Belief nets: approximations
- Local inference: mini-buckets
- Stochastic simulations
- Variational techniques
- MDPs
110 Stochastic Simulation
- Forward sampling (logic sampling)
- Likelihood weighting
- Markov Chain Monte Carlo (MCMC): Gibbs sampling
111 Approximation via Sampling
112 Forward Sampling (logic sampling; Henrion, 1988)
113 Forward sampling (example)
Drawback: high rejection rate!
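A sketch of forward sampling with rejection (the names here, such as sample_cpd and topo_order, are assumed callbacks and data, not from the tutorial): sample the variables in topological order from their CPDs, then discard samples inconsistent with the evidence.

def forward_sample(topo_order, parents, sample_cpd):
    """Draw one full sample, assigning variables in topological order."""
    sample = {}
    for v in topo_order:
        sample[v] = sample_cpd(v, tuple(sample[p] for p in parents[v]))
    return sample

def rejection_estimate(query, evidence, topo_order, parents, sample_cpd, n=10000):
    """Estimate P(query | evidence) by forward sampling with rejection."""
    accepted = hits = 0
    for _ in range(n):
        s = forward_sample(topo_order, parents, sample_cpd)
        if all(s[v] == val for v, val in evidence.items()):
            accepted += 1                 # keep only evidence-consistent samples
            hits += all(s[q] == val for q, val in query.items())
    return hits / accepted if accepted else None   # the drawback: many rejections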
114 Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990)
Clamping evidence + forward sampling; weighting samples by evidence likelihood
Works well for likely evidence!
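A sketch of likelihood weighting (continuing the conventions of the forward-sampling snippet; prob_cpd(v, value, parent_values) is an assumed callback returning P(v = value | parents)): evidence variables are clamped instead of sampled, and each sample carries the likelihood of its clamped values as a weight.

def likelihood_weighting(query, evidence, topo_order, parents,
                         sample_cpd, prob_cpd, n=10000):
    total_w = query_w = 0.0
    for _ in range(n):
        sample, w = {}, 1.0
        for v in topo_order:
            pa = tuple(sample[p] for p in parents[v])
            if v in evidence:
                sample[v] = evidence[v]              # clamp the evidence value
                w *= prob_cpd(v, evidence[v], pa)    # weight by its likelihood
            else:
                sample[v] = sample_cpd(v, pa)
        total_w += w
        if all(sample[q] == val for q, val in query.items()):
            query_w += w
    return query_w / total_w if total_w else None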
115 Gibbs Sampling (Geman and Geman, 1984)
Markov Chain Monte Carlo (MCMC): create a Markov chain of samples
Advantage: guaranteed to converge to P(X)
Disadvantage: convergence may be slow
116 Gibbs Sampling (cont'd) (Pearl, 1988)
Markov blanket: a node's parents, children, and children's other parents
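A sketch of the Gibbs sampler (my own code; markov_blanket_dist(v, state) is an assumed callback returning {value: P(v = value | Markov blanket of v in state)}): resample each non-evidence variable in turn from its conditional given the current state.

import random

def gibbs_query(query, evidence, variables, init_state, markov_blanket_dist,
                n_steps=10000, burn_in=1000):
    """Estimate P(query | evidence) by Gibbs sampling."""
    state = dict(init_state, **evidence)
    hits = count = 0
    for step in range(n_steps):
        for v in variables:
            if v in evidence:
                continue                            # evidence stays clamped
            dist = markov_blanket_dist(v, state)    # P(v | Markov blanket)
            values, probs = zip(*dist.items())
            state[v] = random.choices(values, weights=probs)[0]
        if step >= burn_in:                         # discard burn-in samples
            count += 1
            hits += all(state[q] == val for q, val in query.items())
    return hits / count if count else None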
117 Road map
- CSPs: complete algorithms
- CSPs: approximations
- Belief nets: complete algorithms
- Belief nets: approximations
- Local inference: mini-buckets
- Stochastic simulations
- Variational techniques
- MDPs
118 Variational Approximations
- Idea:
- a variational transformation of CPDs simplifies inference
- Advantages:
- Compute upper and lower bounds on P(Y)
- Usually faster than sampling techniques
- Disadvantages:
- More complex and less general; must be re-derived for each particular form of CPD functions
119 Variational bounds: example
log(x) ≤ λx - log λ - 1 for any λ > 0 (with equality at λ = 1/x)
This approach can be generalized for any concave (convex) function in order to compute its upper (lower) bounds
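The bound for the concave logarithm comes from conjugate (convex) duality; a standard one-line derivation, not spelled out on the slide, is:

% Variational (conjugate-dual) form of the concave logarithm:
%   minimize g(\lambda) = \lambda x - \log\lambda - 1 over \lambda > 0:
%   g'(\lambda) = x - 1/\lambda = 0  =>  \lambda = 1/x,  and  g(1/x) = \log x.
\log x \;=\; \min_{\lambda > 0} \bigl\{ \lambda x - \log \lambda - 1 \bigr\}
\qquad\Longrightarrow\qquad
\log x \;\le\; \lambda x - \log \lambda - 1 \quad \text{for all } \lambda > 0 .

Fixing λ replaces the logarithm by a linear function of x, which is what simplifies inference; optimizing over λ afterwards recovers the tightest such bound.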
120 Convex duality approach (Jaakkola and Jordan, 1997)
121 Example: QMR-DT network (Quick Medical Reference, Decision-Theoretic; Shwe et al., 1991)
600 diseases
4000 findings
Noisy-OR model
122 Inference in QMR-DT
Negative evidence is factorized; positive evidence couples the disease nodes
Inference complexity: O(exp(min(p, k))), where p = number of positive findings and k = max family size (Heckerman, 1989 (Quickscore); Rish and Dechter, 1998)
123 Variational approach to QMR-DT (Jaakkola and Jordan, 1997)
The effect of positive evidence is now factorized (diseases are decoupled)
124 Variational approach (cont.)
- Bounds on local CPDs yield a bound on the posterior
- Two approaches: sequential and block
- Sequential: applies the variational transformation to (a subset of) nodes sequentially during inference, using a heuristic node ordering; then optimizes across the variational parameters
- Block: selects in advance the nodes to be transformed, then selects variational parameters minimizing the KL-distance between the true and approximate posteriors
125 Block approach
126 Variational approach: summary
- Variational approximations were successfully applied to inference in QMR-DT and neural networks (logistic functions), and to learning (the approximate E-step in the EM algorithm)
- For more details, see:
- Saul, Jaakkola, and Jordan, 1996
- Jaakkola and Jordan, 1997
- Neal and Hinton, 1998
- Jordan, 1999
127 Road map
- CSPs: complete algorithms
- CSPs: approximations
- Belief nets: complete algorithms
- Belief nets: approximations
- MDPs
- Elimination and Conditioning
128 Decision-Theoretic Planning
Example: robot navigation
- State: (X, Y, Battery_Level)
- Actions: Go_North, Go_South, Go_West, Go_East
- Probability of success: P
- Task: reach the goal location ASAP
129 Dynamic Belief Networks (DBNs)
Two-stage influence diagram
Interaction graph
130 Markov Decision Process
131 Dynamic Programming = Elimination
132 Bucket Elimination
Complexity: O(exp(w*))
133 MDPs: Elimination and Conditioning
- Finite-horizon MDPs:
- dynamic programming = elimination along the temporal ordering (N slices)
- Infinite-horizon MDPs:
- Value Iteration (VI): elimination along the temporal ordering (iterative; see the sketch below)
- Policy Iteration (PI): conditioning on A_j, elimination on X_j (iterative)
- Bucket elimination: non-temporal orderings
- Complexity: exponential in the induced width
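As referenced above, here is a minimal Value Iteration sketch for a finite MDP (the data layout and names are mine: P[s][a] is a list of (next_state, probability) pairs, R[s][a] the immediate reward, and all actions are assumed available in every state):

def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """Iterate the Bellman optimality backup until the values converge."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                       for a in actions)          # one-step lookahead over actions
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V      # a greedy policy w.r.t. V is then (near-)optimal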
134 MDPs: approximations
- Open directions for further research:
- Applying probabilistic inference approximations to DBNs
- Handling actions (rewards)
- Approximating elimination, heuristic search, etc.
135 Conclusions
- Common reasoning approaches: elimination and conditioning
- Exact reasoning is often intractable => need approximations
- Approximation principles:
- Approximating elimination: local inference, bounding the size of dependencies among variables (cliques in a problem's graph)
- Mini-buckets, IBP, i-consistency enforcing
- Approximating conditioning: local search, stochastic simulations
- Other approximations: variational techniques, etc.
- Further research:
- Combining orthogonal approximation approaches
- Better understanding of what works well where: which approximation suits which problem structure
- Other approximation paradigms (e.g., other ways of approximating probabilities, constraints, cost functions)