External Memory Value Iteration - PowerPoint PPT Presentation

1
External Memory Value Iteration

Stefan Edelkamp, Shahid Jabbar
Chair for Programming Systems,
University of Dortmund, Germany
Blai Bonet
Departamento de Computacion
Universidad Simon Bolivar, Caracas, Venezuela

2
Motivation: Reinforcement Learning

Aim: Write a controller that acts successfully in the environment.
Minimize cost / maximize rewards.

3
Motivation: External Reinforcement Learning

Cover deterministic, non-deterministic, and probabilistic environments (and games).
But what to do if the agent's state space or policy space is too large to be computed and stored in RAM?
Disk space is cheap (500 GB 100).
⇒ External Memory Algorithm

4
Overview

Uniform Search Model
Internal Memory Value Iteration
Existing External Model and BFS
External Memory Value Iteration
Experimental Highlights
Summary & Outlook

5
Overview

Uniform Search Model
Internal Memory Value Iteration
Existing External Model and BFS
External Memory Value Iteration
Experimental Highlights
Summary & Outlook

6
Uniform Search Model

Deterministic
Non-Deterministic
Probabilistic
7
Overview

Uniform Search Model
Internal Memory Value Iteration
Existing External Model and BFS
External Memory Value Iteration
Experimental Highlights
Summary & Outlook

8
ε-optimal for solving MDPs and AND/OR trees.
Problem: Needs to have the whole state space in main memory.
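For reference, a minimal in-memory sketch of value iteration, assuming a cost-minimizing MDP with goal states of value 0. The function and parameter names (`value_iteration`, `outcomes`, `cost`, `goals`) and the toy chain are illustrative assumptions, not taken from the slides. The dictionary `h` holds one value per state, which is exactly the table that has to fit in main memory.

```python
def value_iteration(states, actions, outcomes, cost, goals, epsilon=1e-4):
    """In-memory value iteration for a cost-minimizing MDP (illustrative).

    outcomes(s, a) returns (successor, probability) pairs.
    """
    h = {s: 0.0 for s in states}          # the whole value table lives in RAM
    while True:
        residual = 0.0
        for s in states:
            if s in goals:
                continue                   # goal states keep value 0
            best = min(
                cost(s, a) + sum(p * h[t] for t, p in outcomes(s, a))
                for a in actions(s)
            )
            residual = max(residual, abs(best - h[s]))
            h[s] = best                    # in-place (Gauss-Seidel style) update
        if residual < epsilon:
            return h


# Toy chain 0 -> 1 -> 2 (goal): a move succeeds with p = 0.9, else the agent stays.
def chain_outcomes(s, a):
    return [(s + 1, 0.9), (s, 0.1)]


h_chain = value_iteration(
    states=[0, 1, 2],
    actions=lambda s: ["go"],
    outcomes=chain_outcomes,
    cost=lambda s, a: 1.0,
    goals={2},
)
```

In the toy chain each step costs 1 and succeeds with probability 0.9, so the expected cost per step is 1/0.9 and the values approach 10/9 for state 1 and 20/9 for state 0.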
9
Why External Memory Algorithms?

Search algorithms perform well as long as they consume RAM only!
Virtual memory slows down the performance!

[Figure: virtual address space from 0x000000 to 0xFFFFFF, divided into memory pages; 7 I/Os]
10
Overview

Uniform Search Model
Internal Memory Value Iteration
Existing External Memory Model and BFS
External Memory Value Iteration
Experimental Highlights
Summary & Outlook

11
External Memory Model [Vitter and Shriver, 94]
If the input size is very large, the running time depends on the number of I/Os rather than on the number of instructions.
[Figure: internal memory of size M, block size B, input of size N >> M]
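The later complexity slides count I/Os in terms of scan(E) and sort(E). A small sketch of the standard bounds in this model, scan(N) = N/B and sort(N) = (N/B) · log_{M/B}(N/B), ignoring constant factors; the function names `scan_ios` and `sort_ios` are hypothetical helpers introduced only for this illustration.

```python
def scan_ios(n, b):
    """Block transfers needed to read n items with block size b."""
    return -(-n // b)                      # integer ceiling of n / b


def sort_ios(n, m, b):
    """Asymptotic I/O bound for external sorting, constants ignored.

    n: input size, m: internal memory size, b: block size (all in items).
    """
    fan = m // b                           # blocks that fit into memory at once
    if fan < 2:
        raise ValueError("memory must hold at least two blocks")
    blocks = scan_ios(n, b)
    passes, runs = 0, blocks
    while runs > 1:                        # computes ceil(log_fan(blocks)) exactly
        runs = -(-runs // fan)
        passes += 1
    return blocks * max(1, passes)


example_scan = scan_ios(1000, 10)
example_sort = sort_ios(10**6, 10**4, 10**2)
```

With N = 10^6, M = 10^4 and B = 10^2 the input occupies 10^4 blocks and needs two passes over them, so the sort bound is twice the scan bound.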
12
External Breadth-First Search (Munagala and Ranade, SODA 99)
[Figure: BFS layers, starting from Open(0)]
For undirected graphs, subtracting two layers is enough [Munagala & Ranade, 99]. For directed graphs, the longest back-edge has to be taken into account [Zhou & Hansen, 05].
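The layer subtraction can be sketched as follows. This is an in-memory stand-in, assuming Python lists in place of disk files and a hypothetical `successors` callback; `locality=2` corresponds to the undirected case, and a directed graph would need a larger value that covers its longest back-edge.

```python
def bfs_layers(start, successors, locality=2):
    """Layered BFS with delayed duplicate detection (in-memory stand-in)."""
    layers = [[start]]
    while layers[-1]:
        # Expansion: generate all successors of the previous layer.
        candidates = [t for s in layers[-1] for t in successors(s)]
        # Sort and remove duplicates within the new layer.
        layer = sorted(set(candidates))
        # Subtract the last `locality` layers to remove already-seen states.
        for previous in layers[-locality:]:
            seen = set(previous)
            layer = [s for s in layer if s not in seen]
        layers.append(layer)
    return layers[:-1]                     # drop the final empty layer


# Undirected 4-cycle 0 - 1 - 2 - 3 - 0.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
cycle_layers = bfs_layers(0, lambda s: cycle[s])
```

On the 4-cycle the search yields the layers [0], [1, 3], [2]; state 0 is regenerated from layer 1 and removed by subtracting the layer two steps back.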
13
External Memory Algorithms for Implicit Graphs

Frontier Search [Korf, 03]
External A* [Edelkamp, Jabbar, Schrödl, 04]
Structured Duplicate Detection [Zhou & Hansen, 04]
Cost-Optimal External Planning [Edelkamp, Jabbar, 06]
Model Checking for Linear Temporal Logic
    [Jabbar & Edelkamp, 05] for safety error detection
    [Edelkamp & Jabbar, 06] for liveness detection (cycle)
    [Barnat, Brim, Simecek, 07] for liveness detection (cycle)
Real-Time Model Checking/Scheduling [Edelkamp, Jabbar, 06]

14
Overview

Uniform Search Model
Internal Memory Value Iteration
Existing External Memory Model and BFS
External Memory Value Iteration
Experimental Highlights
Summary & Outlook

15
External Memory Algorithm for Value Iteration

What makes value iteration different from the usual external memory search algorithms?
Answer: Propagation of information from states to predecessors!
⇒ Edges are more important than the states.
Ext-VI works on edges.

16
External Memory Value Iteration

Phase I: Generate the edge space by External BFS.
Open(0) ← Init; i ← 1
while (Open(i-1) ≠ ∅)
    Open(i) ← Succ(Open(i-1))
    Externally-Sort-and-Remove-Duplicates(Open(i))
    for loc ← 1 to Locality(Graph)        // remove previous layers
        Open(i) ← Open(i) \ Open(i - loc)
    i ← i + 1
endwhile

Merge all BFS layers into one edge list on disk!
Opent ← Open(0) ∪ Open(1) ∪ ... ∪ Open(DIAM)
Temp ← Opent
Sort Opent wrt. the successors; sort Temp wrt. the predecessors

17
Working of Ext-VI: Phase II

Temp: edge list on disk, sorted on predecessors
Opent: edge list on disk, sorted on successors

Temp             Opent
edge     h       edge     h    updated h
(Ø,1)    3       (Ø,1)    3    3
(1,2)    2       (1,2)    2    2
(1,3)    2       (1,3)    2    1
(1,4)    2       (2,3)    2    1
(2,3)    2       (1,4)    2    2
(2,5)    1       (3,4)    2    2
(3,4)    2       (2,5)    1    2
(3,8)    0       (4,6)    1    2
(4,6)    1       (5,6)    1    2
(5,6)    1       (5,7)    1    1
(5,7)    1       (3,8)    0    0
(6,9)    1       (7,8)    0    0
(7,8)    0       (9,8)    0    0
(7,10)   0       (6,9)    1    1
(9,8)    0       (7,10)   0    0
(9,10)   0       (9,10)   0    0

Alternate sorting and update until residual < ε
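A minimal sketch of one backward update on the 16-edge example from this slide, under stated assumptions: the graph is treated as deterministic with unit edge costs (the slides also cover probabilistic models), Python lists stand in for the disk files, and the empty predecessor Ø is encoded as 0. The names `backward_update` and `ext_vi` are hypothetical. Each record is (predecessor, successor, h of successor); both lists are scanned once in parallel, which works because Temp is ordered by predecessor and Opent by successor.

```python
from itertools import groupby

EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 8),
         (4, 6), (5, 6), (5, 7), (6, 9), (7, 8), (7, 10), (9, 8), (9, 10)]
H0 = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 0}


def backward_update(temp, opent, cost=1):
    """One pass: temp is sorted on predecessors, opent on successors."""
    # New value of each state = cost + min over the h-values of its successors
    # (assumption: deterministic, unit-cost backup).
    backups = iter([(u, cost + min(h for _, _, h in group))
                    for u, group in groupby(temp, key=lambda r: r[0])])
    state, value = next(backups, (None, None))
    updated, residual = [], 0
    for u, v, h in opent:
        while state is not None and state < v:
            state, value = next(backups, (None, None))
        new_h = value if state == v else h     # no outgoing edges: keep h
        residual = max(residual, abs(new_h - h))
        updated.append((u, v, new_h))
    return updated, residual


def ext_vi(edges, h0, epsilon=1e-4, cost=1):
    """Alternate sorting and updating until the residual drops below epsilon."""
    opent = sorted(((u, v, h0[v]) for u, v in edges), key=lambda r: (r[1], r[0]))
    while True:
        temp = sorted(opent)                   # re-sort on predecessors
        opent, residual = backward_update(temp, opent, cost)
        if residual < epsilon:
            return opent


opent0 = sorted(((u, v, H0[v]) for u, v in EDGES), key=lambda r: (r[1], r[0]))
first_pass, first_residual = backward_update(sorted(opent0), opent0)
converged = ext_vi(EDGES, H0)
```

Under these assumptions the first pass turns the h-values of Opent into 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0. An on-disk version would replace `sorted` with an external sort and the list scans with block-wise file reads.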
18
Complexity Analysis

Phase I: External Memory Breadth-First Search.
Expansion
    Scanning the red bucket: O(scan(E))
Duplicate Removal
    Sorting the green bucket, which has one state for every edge from the red bucket.
    Scanning and compaction.
    O(sort(E))
Subtraction
    Removing states of the blue buckets (duplicate-free) from the green one.
    O(l × scan(E))

Complexity of Phase I: O(l × scan(E) + sort(E)) I/Os
19
Complexity Analysis

Phase II: Backward Update
Update
    Simple block-wise scanning.
    Scanning time for red and green files: O(scan(E)) I/Os
External Sort
    Sorting the blue file with the updated values, to be used as the red file later: O(sort(E)) I/Os
Fast External Sort
    If E / M < max file pointers: O(scan(E)) I/Os

[Figure labels: sorted on preds; sorted on states; updated h-values]

Total complexity of Phase II: for tmax iterations, O(tmax × sort(E)) I/Os.
With Fast External Sort: O(tmax × scan(E)) I/Os.
20
Overview

Uniform Search Model
Internal Memory Value Iteration
Existing External Model and BFS
External Memory Value Iteration
Experimental Highlights
Summary & Outlook

21
Experiments: 3x3 Sliding Tiles Puzzle

p = 1.0, heuristic = 0
Alg.     S/E       RAM   Iterations   Time
VI       181,440   21M   27           6.3
Ext-VI   483,839   11M   32           71.5

p = 0.9, heuristic = Manhattan distance
Alg.     S/E       RAM   Iterations   Time
VI       181,440   21M   35           8.3
Ext-VI   967,677   12M   43           237.4

Number of iterations differs!
22
3x4 Sliding Tile Puzzle with p = 0.9 (state space: 12!/2 ≈ 239 × 10^6)

On 2 Gigabytes, VI could not generate the state space.
External VI finished.
Took 45 GB of disk space for the edges.
Total: 1,357,171,197 edges.
Took 437 hours and 72 iterations to converge.
ε = 0.0001
RAM used: 1.4 Gigabytes
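The state-space sizes quoted in the experiments can be checked with a few lines of arithmetic: only half of the tile permutations of a sliding-tile puzzle are reachable, giving n!/2 states. The bytes-per-edge figure below is a rough estimate that assumes decimal gigabytes (10^9 bytes), which the slides do not specify.

```python
import math

# Reachable states of the 3x3 and 3x4 sliding-tile puzzles: n!/2.
states_3x3 = math.factorial(9) // 2
states_3x4 = math.factorial(12) // 2

# Rough disk footprint per edge, assuming 45 GB means 45 * 10**9 bytes.
bytes_per_edge = 45e9 / 1_357_171_197

print(states_3x3, states_3x4, round(bytes_per_edge, 1))
```

The first value matches the 181,440 states reported for VI on the 3x3 puzzle, and the second is 239,500,800, the 239 × 10^6 quoted above.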

23
Race Track Domain

Example

Alg.     150x300 RaceTrack
VI       Out of mem.    > 2 GB
LRTDP    Out of mem.    > 2 GB     12 hours
LDFS     Out of time    > 1.5 GB   118 hours
Ext-VI   Converged!     1.6 GB     91 hours
24
Overview