Title: Exact Mode Estimation for POMDPs based on Constraint Decomposition and Symbolic Encoding
1 Exact Mode Estimation for POMDPs based on Constraint Decomposition and Symbolic Encoding
- Martin Sachenbacher
- July 1, 2003
2 Exact vs. Approximate ME
- Problems of ME with incomplete belief state
  - Dead ends (no solutions)
  - Incorrect leading solutions
  - Incorrect probabilities of solutions
- Usefulness of ME with complete belief state
  - As accuracy reference
  - As performance reference
  - As a starting point for approximations
- Key: Compact representation of belief state
  - Map to semiring-based CSP
  - Decompose Hypergraph into Hypertree
  - Encode Tree Nodes symbolically as ADDs
3 Outline
- SCSPs (Semiring-based CSPs)
- Mapping State Constraints to SCSPs
- Mapping Transition Constraints to SCSPs
- ADDs (Algebraic Decision Diagrams)
- Hypertree Decompositions of SCSPs
- Solving Tree-structured SCSPs
- Exact Mode Estimation for POMDPs as Decomposition/ADD-based SCSP Solving
- Demonstration: Two Switches Example
4 SCSPs (Semiring-based CSPs)
- Generalization of CSPs [Bistarelli et al. 97]
- Domain D, Variables V, Set S, Type T ⊆ V
- Constraints are mappings D^k → S
- Operations × (for join) and + (for projection) on S
- (S, +, ×, 0, 1) must form a c-semiring
- Dynamic Programming applicable to all SCSPs
- Examples
  - ({0,1}, ∨, ∧, 0, 1): Classical CSPs
  - (R, min, +, ∞, 0): Weighted CSPs
  - ([0,1], max, ×, 0, 1): Probabilistic CSPs
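The three example semirings can be sketched in a few lines of Python. This is only an illustration under assumptions: the class `Semiring`, the helper `best` and the sample numbers are hypothetical names and values, not taken from the slides. The `times` operation combines (joins) constraint values and the `plus` operation selects among alternatives (projection).

```python
from dataclasses import dataclass
from functools import reduce
from math import inf
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A c-semiring (S, +, x, 0, 1): `plus` projects, `times` joins."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any

    def join(self, values):
        # Combine the values contributed by several constraints.
        return reduce(self.times, values, self.one)

    def project(self, values):
        # Summarize alternative tuples into a single value.
        return reduce(self.plus, values, self.zero)


CLASSICAL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
WEIGHTED = Semiring(min, lambda a, b: a + b, inf, 0.0)
PROBABILISTIC = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def best(semiring, solutions):
    """Value of the best solution; each solution is a tuple of constraint values."""
    return semiring.project(semiring.join(vals) for vals in solutions)


# Illustrative numbers: two candidate solutions, two constraints each.
best_prob = best(PROBABILISTIC, [(0.99, 0.9), (0.01, 1.0)])
best_cost = best(WEIGHTED, [(3.0, 4.0), (1.0, 2.0)])
satisfiable = best(CLASSICAL, [(True, False), (True, True)])
```

The same `best` routine works unchanged for all three instances, which is the sense in which dynamic programming applies to every SCSP.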
5 Encoding States as SCSPs
- Example: Or-Gate
- P(Or=ok) = 99%, P(Or=fty) = 1%
[Figure: Or-gate with inputs in1, in2 and output out]
xt    in1   in2   out   f
ok    lo    lo    lo    0.99
ok    lo    hi    hi    0.99
ok    hi    lo    hi    0.99
ok    hi    hi    hi    0.99
fty                     0.01
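A minimal sketch of this state constraint as a Python table over the probabilistic semiring. The function name `or_gate_constraint` is hypothetical, and the sketch assumes that the single `fty` row leaves in1, in2 and out unrestricted.

```python
from itertools import product

VALUES = ("lo", "hi")


def or_gate_constraint(p_ok=0.99, p_fty=0.01):
    """Map tuples (mode, in1, in2, out) to a value in [0, 1]."""
    table = {}
    for in1, in2, out in product(VALUES, repeat=3):
        expected = "hi" if "hi" in (in1, in2) else "lo"
        # Mode ok: only tuples obeying the Or truth table get a non-zero value.
        table[("ok", in1, in2, out)] = p_ok if out == expected else 0.0
        # Mode fty (assumption): inputs and output are unrestricted.
        table[("fty", in1, in2, out)] = p_fty
    return table


or_gate = or_gate_constraint()
```

Tuples with value 0.0 are the ones excluded by the constraint; the four `ok` rows of the slide are exactly the entries with value 0.99.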
6 Encoding Observations as SCSPs
- Example: (Probabilistic) Observation, Distribution over values for xi
xi    f
0     0.6
1     0.9
2     0.3
3     0.0
[Figure: plot of P over xi = 0, 1, 2, 3 with values 0.6, 0.9, 0.3, 0.0]
7 Encoding Transitions as SCSPs
- Example: (Probabilistic) CCA Transition Function
xt    cmd   xt+1   f
0     off   0      0.9
0     on    0      0.1
0     off   1      0.1
0     on    1      0.9
1     off   0      0.9
1     on    0      0.1
1     off   1      0.1
1     on    1      0.9
[Figure: two-state automaton over states 0 and 1; the cmd=off and cmd=on transitions are labelled with probability 0.9]
8 Algebraic Decision Diagrams
- ADDs: Symbolic (graph-based) representation of functions {0,1}^n → R
- Generalization of BDDs (functions {0,1}^n → {0,1})
- Canonicity of representation (as for BDDs)
- Efficient package: CUDD
[Figure: ADD with decision nodes A, B, B, C, C and leaves 0, 1, 2, 3]
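To make the canonicity property concrete, here is a toy reduced, ordered ADD manager in Python. It is a sketch only and not the CUDD API: the class `ADD` and its methods are hypothetical, and it builds diagrams by full enumeration (exponential in the number of variables), which a real package avoids.

```python
class ADD:
    """Toy reduced, ordered ADD manager with a unique table (not CUDD)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}  # node key -> node id
        self.nodes = []   # node id -> key: ("leaf", value) or (var, low, high)

    def _mk(self, key):
        # Hash-consing: structurally equal nodes share one id.
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def leaf(self, value):
        return self._mk(("leaf", value))

    def node(self, var, low, high):
        # Reduction rule: skip a test whose two branches coincide.
        return low if low == high else self._mk((var, low, high))

    def from_function(self, f, var=0, prefix=()):
        """Build the ADD of f by Shannon expansion in a fixed variable order."""
        if var == self.num_vars:
            return self.leaf(f(*prefix))
        low = self.from_function(f, var + 1, prefix + (0,))
        high = self.from_function(f, var + 1, prefix + (1,))
        return self.node(var, low, high)

    def evaluate(self, root, assignment):
        key = self.nodes[root]
        while key[0] != "leaf":
            var, low, high = key
            key = self.nodes[high if assignment[var] else low]
        return key[1]

    def size(self, root):
        """Return (number of decision nodes, number of leaves) below root."""
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            key = self.nodes[n]
            if key[0] != "leaf":
                stack.extend(key[1:])
        internal = sum(1 for n in seen if self.nodes[n][0] != "leaf")
        return internal, len(seen) - internal


mgr = ADD(3)
f = mgr.from_function(lambda a, b, c: a + b + c)
g = mgr.from_function(lambda a, b, c: c + b + a)
```

Because equal functions reduce to the same node, `f` and `g` end up with the same id, so an equality test between two ADDs is a single integer comparison.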
9 ADD Join Operations
- Multiplication, addition, maximum, ...
- Generalization of BDD operations
ABC        000  001  010  011  100  101  110  111
f            0    1    1    2    1    2    2    3
g            3    2    0    1    0    0    0    1
f+g          3    3    1    3    1    2    2    4
f×g          0    2    0    2    0    0    0    3
5f           0    5    5   10    5   10   10   15
f>1          0    0    0    1    0    1    1    1
max(f,g)     3    2    1    2    1    2    2    3
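The rows of this table can be reproduced with pointwise operations on explicit value tables. This sketch uses plain Python dictionaries instead of ADDs; the helper names `table`, `apply`, `monadic` and `row` are hypothetical.

```python
from itertools import product
import operator

# Assignments to (A, B, C) in the order 000, 001, ..., 111.
ASSIGNMENTS = list(product((0, 1), repeat=3))


def table(values):
    return dict(zip(ASSIGNMENTS, values))


def apply(op, f, g):
    """Pointwise binary operation, the table analogue of an ADD join."""
    return {a: op(f[a], g[a]) for a in ASSIGNMENTS}


def monadic(op, f):
    """Pointwise unary operation (scaling, thresholding, ...)."""
    return {a: op(f[a]) for a in ASSIGNMENTS}


def row(f):
    return [f[a] for a in ASSIGNMENTS]


f = table([0, 1, 1, 2, 1, 2, 2, 3])
g = table([3, 2, 0, 1, 0, 0, 0, 1])

f_plus_g = apply(operator.add, f, g)
f_times_g = apply(operator.mul, f, g)
f_max_g = apply(max, f, g)
five_f = monadic(lambda v: 5 * v, f)
f_gt_1 = monadic(lambda v: int(v > 1), f)
```

An ADD package computes the same results on the shared graph structure rather than on all 2^n rows.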
10 Example
- Summation of ADD f, ADD g
[Figure: ADDs for f, g and f+g over variables A, B, C; leaf values 0 to 3 for f and g, 1 to 4 for f+g]
11 ADD Projection Operations
- ∃(f,X) (and ∀(f,X)) obtained by summing (multiplying) values of tuples that differ only w.r.t. X
ABC        000  001  010  011  100  101  110  111
f            0    1    1    2    1    2    2    3

AB         00   01   10   11
∃(f,C)      1    3    3    5
∀(f,C)      0    2    2    6
12 ADD Projection Operations
- For optimization, we require operation ∃max(f,X) that yields maximum value of tuples differing only w.r.t. X
ABC        000  001  010  011  100  101  110  111
f            0    1    1    2    1    2    2    3

AB         00   01   10   11
∃(f,C)      1    3    3    5
∀(f,C)      0    2    2    6
∃max(f,C)   1    2    2    3
Not part of CUDD, but easy to implement as variant of ∃/∀(f,X).
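The three projections differ only in the operation used to combine the grouped values, which the following table-based sketch makes explicit. The function `project` and its signature are hypothetical and operate on dictionaries, not on ADDs.

```python
from functools import reduce
from itertools import product
import operator


def project(f, variables, eliminate, combine):
    """Eliminate the variables in `eliminate` from table f.

    Values of tuples that differ only on the eliminated variables are
    merged with `combine` (sum, product or max).
    """
    keep = [i for i, v in enumerate(variables) if v not in eliminate]
    groups = {}
    for assignment, value in f.items():
        key = tuple(assignment[i] for i in keep)
        groups.setdefault(key, []).append(value)
    kept_vars = tuple(variables[i] for i in keep)
    return kept_vars, {k: reduce(combine, vs) for k, vs in groups.items()}


variables = ("A", "B", "C")
f = dict(zip(product((0, 1), repeat=3), [0, 1, 1, 2, 1, 2, 2, 3]))

_, f_sum = project(f, variables, {"C"}, operator.add)
_, f_prod = project(f, variables, {"C"}, operator.mul)
kept, f_max = project(f, variables, {"C"}, max)
```

Swapping `operator.add` for `max` is the whole difference between the summing projection and the one needed for optimization.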
13 Solving SCSPs using Decomposition
- Transform SCSPs into Hypertree H(T,χ,λ)
- Compute constraint φ(v) for each node v
- Bottom-up phase for computing values
- Top-down phase for extracting solutions
14 Pseudocode for Bottom-Up Phase
- Function solve(v)
  - For Each child ∈ children(v)
    - φ(v) ← φ(v) × ∃max(φ(child), χ(child) \ χ(v))
  - Next child
  - Return φ(v)
Generalization of (Semi-)Join Operation
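A minimal table-based sketch of the bottom-up phase over the probabilistic semiring (join by multiplication, projection by max). Everything here is illustrative: the `Constraint` class, the helpers `join` and `max_project`, the two-node tree and its numbers are hypothetical, variables are assumed Boolean, and the recursive call on each child is an assumption of this sketch (the pseudocode above does not spell out the traversal order).

```python
from itertools import product


class Constraint:
    """Table constraint: maps 0/1 assignments of `vars` to a utility."""

    def __init__(self, vars_, table):
        self.vars = tuple(vars_)
        self.table = dict(table)

    def value(self, assignment):
        key = tuple(assignment[v] for v in self.vars)
        return self.table.get(key, 0.0)  # missing tuples have utility 0


def join(c1, c2):
    """Multiply two constraints over the union of their variables."""
    vars_ = c1.vars + tuple(v for v in c2.vars if v not in c1.vars)
    table = {}
    for values in product((0, 1), repeat=len(vars_)):
        assignment = dict(zip(vars_, values))
        u = c1.value(assignment) * c2.value(assignment)
        if u > 0.0:
            table[values] = u
    return Constraint(vars_, table)


def max_project(c, eliminate):
    """Eliminate variables, keeping the maximum utility per remaining tuple."""
    keep = tuple(v for v in c.vars if v not in eliminate)
    table = {}
    for values, u in c.table.items():
        assignment = dict(zip(c.vars, values))
        key = tuple(assignment[v] for v in keep)
        table[key] = max(table.get(key, 0.0), u)
    return Constraint(keep, table)


def solve(v, phi, chi, children):
    """Bottom-up phase: fold each child's max-projection into phi[v]."""
    for child in children.get(v, []):
        solved_child = solve(child, phi, chi, children)  # assumed recursion
        message = max_project(solved_child, set(chi[child]) - set(chi[v]))
        phi[v] = join(phi[v], message)
    return phi[v]


# Hypothetical two-node tree: root over (X, Y), child over (Y, Z).
chi = {"root": ("X", "Y"), "child": ("Y", "Z")}
children = {"root": ["child"]}
phi = {
    "root": Constraint(("X", "Y"),
                       {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}),
    "child": Constraint(("Y", "Z"),
                        {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.1}),
}
root = solve("root", phi, chi, children)
best = max(root.table.values())
```

After the call, the largest value in the root table is the utility of the best global solution, even though no table over all three variables was ever built.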
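15 Example
[Figure: Boolean polycell with gates Or1, Or2, Or3, And1, And2; inputs A = 1, B = 1, C = 1, D = 1, E = 0; internal variables X, Y, Z; observed outputs F = 0, G = 1]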
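16 Example
- Hypertree Decomposition of Boolean Polycell
[Figure: hypertree with root v0 and children v1, v2, v3; each node shown with its tuples and utilities U]
v0: O3 A1 C E F X Y Z
    ok  ok  1 0 0 0 0 1    U = .98505
    ok  ok  1 0 0 0 1 1    U = .98505
    ok  ok  1 0 0 1 0 1    U = .98505
v1: A2 G Y Z (shares Y, Z with v0)
    ok  1 1 1    U = .995
    fty 1 1 1    U = .005
    fty 1 0 1    U = .005
    fty 1 1 0    U = .005
    fty 1 0 0    U = .005
v2: O2 B D Y (shares Y with v0)
    ok  1 1 1    U = .99
    fty 1 1 1    U = .01
    fty 1 1 0    U = .01
v3: O1 A C X (shares C, X with v0)
    ok  1 1 1    U = .99
    fty 1 1 1    U = .01
    fty 1 1 0    U = .01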
17 Example
[Figure: ADD for node v0; ADD with 20 nodes, 5 leaves]
O3   A1   C E F X Y Z    U
ok   ok   1 0 0 0 0 1    .98505
ok   ok   1 0 0 0 1 1    .98505
ok   ok   1 0 0 1 0 1    .98505
fty  ok   1 0 0 0 1 1    .00995
fty  ok   1 0 0 0 1 0    .00995
fty  ok   1 0 0 1 0 0    .00995
fty  ok   1 0 0 1 0 1    .00995
Further leaves: U = .00495, U = .00005
18 Example
- After multiplication with ∃max(φ(v1),A2,G)
[Figure: ADD for node v0; ADD with 28 nodes, 7 leaves]
O3   A1   C E F X Y Z    U
ok   ok   1 0 0 0 1 1    .98012
fty  ok   1 0 0 0 1 1    .00990
ok   ok   1 0 0 0 0 1    .00492
ok   ok   1 0 0 1 0 1    .00492
Further leaves: U = 2.4E-5, U = 4.9E-5, U = 2.5E-7
19 Example
- After multiplication with ∃max(φ(v2),O2,B,D)
[Figure: ADD for node v0; ADD with 30 nodes, 8 leaves]
O3   A1   C E F X Y Z    U
ok   ok   1 0 0 0 1 1    .97032
fty  ok   1 0 0 0 1 1    .00980
ok   fty  1 0 0 0 1 1    .00487
Further leaves: U = 4.9E-5, U = 4.9E-7, U = 2.4E-7, U = 2.5E-9
20 Example
- After multiplication with ∃max(φ(v3),O1,A)
[Figure: ADD for node v0; ADD with 35 nodes, 10 leaves]
O3   A1   C E F X Y Z    U
ok   ok   1 0 0 0 1 1    .00970
ok   fty  1 0 0 1 1 1    .00482
fty  ok   1 0 0 0 1 1    9.8E-5
Best Solution: Umax = .0097
Further leaves: U = 4.8E-5, U = 4.9E-7, U = 2.4E-7, U = 4.9E-9, U = 2.4E-9, U = 2.5E-11
21 Pseudocode for Top-Down Phase
No search queue necessary
- Function extractSolutions(vroot)
  - E ← edges(vroot)
  - σ ← φ(vroot)
  - σ ← ∃max(σ, vars(σ) \ (decvars(σ) ∪ vars(E)))    (Restrict to decision and shared variables)
  - While E ≠ ∅ Do
    - e ← choose(E)
    - v ← son-node(e)
    - E ← (E \ e) ∪ edges(v)
    - σ0-1 ← (σ ≠ 0)
    - σdiv ← ∃max(σ0-1 × φ(v), vars(σ))    (Divisor)
    - σ ← (σ × φ(v)) ×⁻¹ σdiv
    - σ ← ∃max(σ, vars(σ) \ (decvars(σ) ∪ vars(E)))    (Restrict to decision and shared variables)
  - End While
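Why a divisor is needed can be seen on a two-node tree: after the bottom-up phase the root already contains the child's max-projection, so joining the root with the child again would count that factor twice. The sketch below is a simplification and not the slide's exact procedure: the divisor here is just the child's bottom-up message (the 0-1 masking step is omitted), and all tables and names are hypothetical.

```python
from itertools import product

B = (0, 1)

# Hypothetical original constraints: root over (X, Y), child over (Y, Z).
psi_root = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}
psi_child = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.1}

# Bottom-up: the child's message over the shared variable Y is folded
# into the root constraint.
message = {y: max(psi_child[(y, z)] for z in B) for y in B}
phi_root = {(x, y): psi_root[(x, y)] * message[y] for x, y in product(B, B)}


def divide(a, b):
    # Division with the convention that a zero divisor yields 0.
    return 0.0 if b == 0.0 else a / b


# Top-down step for the edge (root, child): join with the child's
# constraint, then divide out the message that was already multiplied in.
joint = {
    (x, y, z): divide(phi_root[(x, y)] * psi_child[(y, z)], message[y])
    for x, y, z in product(B, repeat=3)
}

best = max(joint, key=joint.get)
```

After the division, `joint` equals the plain product of the two original constraints, and its maximum coincides with the best value already visible at the root, so solutions can be read off in order of utility without search.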
22 Example
- Initial σ ← ∃max(φ(vroot),E,F)
[Figure: ADD with 21 tuples, 33 nodes, 10 leaves]
O3   A1   C X Y Z    U
ok   ok   1 0 1 1    .00970
ok   fty  1 1 1 1    .00482
fty  ok   1 0 1 1    9.8E-5
Further leaves: U = 4.8E-5, U = 4.9E-7, U = 2.4E-7, U = 4.9E-9, U = 2.4E-9, U = 2.5E-11
23 Example
- After processing edge(v0,v3)
[Figure: ADD with 21 tuples, 32 nodes, 10 leaves]
O1   O3   A1   Y Z    U
fty  ok   ok   1 1    .00970
ok   ok   fty  1 1    .00482
fty  fty  ok   1 1    9.8E-5
Further leaves: U = 4.8E-5, U = 4.9E-7, U = 2.4E-7, U = 4.9E-9, U = 2.4E-9, U = 2.5E-11
24 Example
- After processing edge(v0,v2)
[Figure: ADD with 30 tuples, 47 nodes, 11 leaves]
O1   O2   O3   A1   Y Z    U
fty  ok   ok   ok   1 1    .00970
ok   ok   ok   fty  1 1    .00482
fty  fty  ok   ok   1 1    9.8E-5
fty  ok   fty  ok   1 1    9.8E-5
Further leaves shown: U = 4.8E-5, U = 9.9E-7, U = 4.9E-7, U = 2.5E-11
25 Example
- After processing edge(v0,v1)
[Figure: ADD with 26 tuples, 35 nodes, 12 leaves]
O1   O2   O3   A1   A2    U
fty  ok   ok   ok   ok    .00970
ok   ok   ok   fty  ok    .00482
fty  fty  ok   ok   ok    9.8E-5
fty  ok   fty  ok   ok    9.8E-5
Further leaves shown: U = 4.8E-5, U = 2.4E-5, U = 9.9E-7, U = 2.5E-11
Solutions: 26
Easy to focus on leading solutions.
26 Application: Exact ME for POMDPs
- Given: POMDP (Feasible States, Observables, Control Actions, Transitions), Observations
- Approach: Complete representation of belief state (through decomposition and symbolic encoding)
- Benefit: Allows for exploiting Markov property
[Figure: belief state over states S0, S1, ..., Sn at time t mapped to the belief state over S0, S1, ..., Sn at time t+1]
27 Algorithm: Exact ME for POMDPs
- Construct Hypertree (offline)
- Construct State-ADDs for each node (offline)
- Construct Transition-ADDs for each node (offline)
- Repeat for each time step
  - Multiply nodes with Obs-ADDs (Condition on Observations)
  - Establish consistency in the tree (Bottom-up)
  - Extract leading solution(s) from the tree (Top-down)
  - Multiply nodes with Transition-ADDs, project on xt+1, set xt ← xt+1, multiply with State-ADDs (Transition Expansion)
- Complexity: Polynomial for Hypertrees of bounded width
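The per-time-step loop can be sketched for a single component with a flat (undecomposed) belief state, which leaves out the hypertree and the ADD encoding entirely. The transition probabilities and the initial belief follow the switch of the two-switches example; the function names and the observation likelihood `not_on` are hypothetical, and the belief is left unnormalized after conditioning.

```python
MODES = ("on", "off", "fty")


def switch_transition(x, cmd, x_next):
    """P(x_next | x, cmd) for one switch (cmd is 'on', 'off' or 'idl')."""
    if x == "fty":
        return 1.0 if x_next == "fty" else 0.0
    if x_next == "fty":
        return 0.05
    target = x if cmd == "idl" else cmd  # idl keeps the mode, on/off sets it
    return 0.95 if x_next == target else 0.0


def condition(belief, likelihood):
    """Condition on observations: multiply belief and observation constraint."""
    return {x: p * likelihood[x] for x, p in belief.items()}


def expand(belief, cmd):
    """Transition expansion: join with the transition, then sum out x_t."""
    return {
        x_next: sum(p * switch_transition(x, cmd, x_next)
                    for x, p in belief.items())
        for x_next in MODES
    }


def leading(belief):
    """Leading solution: the mode with the largest belief value."""
    return max(belief, key=belief.get)


def step(belief, likelihood, cmd):
    conditioned = condition(belief, likelihood)
    return leading(conditioned), expand(conditioned, cmd)


belief0 = {"on": 0.475, "off": 0.475, "fty": 0.05}
no_obs = {m: 1.0 for m in MODES}
_, belief1 = step(belief0, no_obs, "on")

# Hypothetical observation that rules out mode 'on' at the next step.
not_on = {"on": 0.0, "off": 1.0, "fty": 1.0}
mode1, belief2 = step(belief1, not_on, "idl")
```

Because the full belief state is carried forward, a mode that was unlikely before the observation (here `fty`) is still available when the observation rules out the leading one; an approximate estimator that had kept only `on` would hit a dead end at this point.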
28 Example
- Adapted from Jim Kurien's thesis
- t0: Sw1.cmd = on
- t1: Or.out = lo, Sw1.cmd = idl, Sw2.cmd = on
- t2: Or.out = lo
[Figure: two switches Sw1 and Sw2 (inputs labelled hi) connected to the inputs of an Or-gate]
Switches more likely to fail than Or-Gate
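29 Example
[Figure: mode automaton of a switch with modes on, off and fty. Mode on is kept under cmd=on,idl and mode off under cmd=off,idl; cmd=off and cmd=on switch between the two; each of these transitions has probability 0.95. Modes on and off each move to fty with probability 0.05; fty (constraint: true) stays fty with probability 1.0. Constraints over the terminals t1, t2: mode on allows (lo,lo), (hi,hi); mode off allows (lo,lo), (lo,hi), (hi,lo), (hi,hi).]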
30 Example
xt    cmd   xt+1   f
on    on    on     0.95
on    off   off    0.95
on    idl   on     0.95
on          fty    0.05
off   on    on     0.95
off   off   off    0.95
off   idl   off    0.95
off         fty    0.05
fty         fty    1.0

xt    t1    t2     f
on    lo    lo     1.0
on    hi    hi     1.0
off                1.0
fty                1.0
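31 Example
xt    in1   in2   out   f
ok    lo    lo    lo    1.0
ok    lo    hi    hi    1.0
ok    hi    lo    hi    1.0
ok    hi    hi    hi    1.0
fty                     1.0

xt    xt+1   f
ok    ok     0.99
ok    fty    0.01
fty   fty    1.0
[Figure: mode automaton of the Or-gate. Mode ok stays ok with probability 0.99 and moves to fty with probability 0.01; fty (constraint: true) stays fty with probability 1.0. In mode ok, (in1, in2, out) is one of (lo,lo,lo), (lo,hi,hi), (hi,lo,hi), (hi,hi,hi).]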
32 Example
- Initial belief state (chosen)
  - p(Sw=on) = p(Sw=off) = 0.475, p(Sw=fty) = 0.05
  - p(Or=ok) = 0.99, p(Or=fty) = 0.01
- Observations/Commands
  - t0: Sw1.cmd = on
  - t1: Or.out = lo, Sw1.cmd = idl, Sw2.cmd = on
  - t2: Or.out = lo
- Leading Solutions
  - t0: Sw1 = on/off, Sw2 = on/off, Or = ok
  - t1: Sw1 = fty, Sw2 = off, Or = ok
  - t2: Sw1 = on, Sw2 = on, Or = fty
33 Conclusion
- SCSPs: elegant and general representation
- ADDs: encoding of SCSPs efficient in average case, exponential in the number of variables in worst case
- Decomposition: factors problem into set of ADDs, each confined to a small number of variables
- The two methods complement each other well
- How far can we get with this combination?