Title: External Memory Value Iteration
1External Memory Value Iteration
- Stefan Edelkamp, Shahid Jabbar
- Chair for Programming Systems,
- University of Dortmund, Germany
- Blai Bonet
- Departamento de Computacion
- Universidad Simon Bolivar, Caracas, Venezuela
2Motivation: Reinforcement Learning
- Aim: Write a controller to act successfully in the environment
- Minimize cost / maximize rewards
3Motivation: External Reinforcement Learning
- Cover deterministic, non-deterministic, and probabilistic environments (and games)
- But what to do if the agent's state space or policy space is too large to be computed and stored in RAM?
- Disk space is cheap (500 GB 100)
- → External Memory Algorithm
4Overview
- Uniform Search Model
- Internal Memory Value Iteration
- Existing External Memory Model and BFS
- External Memory Value Iteration
- Experimental Highlights
- Summary and Outlook
6Uniform Search Model
- Deterministic
- Non-Deterministic
- Probabilistic
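The three settings differ only in how successor values are combined when a state is backed up. A minimal Python sketch, assuming the usual Bellman-style forms (the slide's own formulas did not survive, so the notation is an assumption): each action is a hypothetical tuple `(cost, outcomes)`, where `outcomes` is a list of `(probability, successor)` pairs, and `backup` is an illustrative name.

```python
def backup(h, actions, mode):
    """Back up one state under a uniform search model.

    h       -- dict: successor state -> current value estimate
    actions -- list of (cost, outcomes); outcomes = [(prob, succ), ...]
    mode    -- "det", "nondet" or "prob"
    """
    best = float("inf")
    for cost, outcomes in actions:
        values = [h[succ] for _, succ in outcomes]
        if mode == "det":
            q = cost + values[0]      # exactly one successor per action
        elif mode == "nondet":
            q = cost + max(values)    # plan for the worst outcome
        elif mode == "prob":
            q = cost + sum(p * h[succ] for p, succ in outcomes)  # expectation
        else:
            raise ValueError("unknown mode: " + mode)
        best = min(best, q)           # choose the cheapest action
    return best


h = {"a": 2.0, "b": 4.0}
actions = [(1.0, [(0.5, "a"), (0.5, "b")])]
worst_case = backup(h, actions, "nondet")
expected = backup(h, actions, "prob")
```

Only the inner combination (first successor, maximum, expectation) changes; the outer minimization over actions is shared, which is what makes a single algorithm applicable to all three models.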
8ε-Optimal for solving MDPs and AND/OR trees
- Problem: Needs to have the whole state space in main memory.
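To make the memory problem concrete, here is a minimal sketch of internal-memory value iteration for the probabilistic case. The MDP encoding (`state -> list of (cost, [(prob, succ), ...])`) and the function name are assumptions made for illustration; the point is that the dictionary `h` holds one entry per state, so the whole state space sits in RAM.

```python
def value_iteration(mdp, goals, eps=1e-4):
    """mdp: dict state -> list of (cost, [(prob, succ), ...]).
    Every successor must itself be a key of mdp."""
    h = {s: 0.0 for s in mdp}          # one value per state, all in RAM
    iterations = 0
    while True:
        residual = 0.0
        new_h = {}
        for s, actions in mdp.items():
            if s in goals or not actions:
                new_h[s] = h[s]        # goals and dead ends keep their value
                continue
            new_h[s] = min(
                cost + sum(p * h[t] for p, t in outcomes)
                for cost, outcomes in actions
            )
            residual = max(residual, abs(new_h[s] - h[s]))
        h = new_h
        iterations += 1
        if residual < eps:             # stop once the largest change is tiny
            return h, iterations


# Tiny example: from s0 the move succeeds with probability 0.9.
mdp = {
    "s0": [(1.0, [(0.9, "s1"), (0.1, "s0")])],
    "s1": [(1.0, [(1.0, "g")])],
    "g": [],
}
values, sweeps = value_iteration(mdp, goals={"g"})
```

In this example the exact value of `s0` satisfies h = 1 + 0.9 * 1 + 0.1 * h, i.e. h = 1.9 / 0.9.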
9Why External Memory Algorithms?
- Search algorithms perform well as long as they consume RAM only!
- Virtual memory slows down the performance!
[Figure: virtual address space from 0x000000 to 0xFFFFFF divided into memory pages; the access pattern shown costs 7 I/Os]
11External Memory Model (Vitter and Shriver, 94)
If the input size is very large, running time depends on the I/Os rather than on the number of instructions.
[Figure: internal memory of size M, block size B, input of size N >> M]
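In this model the two basic costs are usually written scan(N), about N/B block transfers, and sort(N), about (N/B) log_{M/B}(N/B) block transfers. A minimal sketch for getting a feel for the numbers, ignoring constant factors; the function names are hypothetical, while `n`, `m`, `b` mirror N, M, B from the slide.

```python
import math


def scan_ios(n, b):
    """Block transfers needed to read n items once with block size b."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """Rough I/O count of a multiway merge sort: (n/b) * log_{m/b}(n/b)."""
    blocks = math.ceil(n / b)
    fan_in = m // b                    # blocks that fit in memory at once
    if fan_in < 2:
        raise ValueError("memory must hold at least two blocks")
    passes, reach = 0, 1
    while reach < blocks:              # integer ceil(log_fan_in(blocks))
        reach *= fan_in
        passes += 1
    return blocks * max(1, passes)


n, m, b = 10**6, 10**4, 10**3
print(scan_ios(n, b), sort_ios(n, m, b))
```

With these illustrative parameters a scan touches 1,000 blocks and a sort needs three passes over them, which is why I/O counts rather than instruction counts dominate the running time.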
12External Breadth-First Search (Munagala and Ranade, SODA 99)
[Figure: example graph with node A and BFS layer Open(0)]
For undirected graphs, subtracting two layers is enough (Munagala and Ranade, 99). For directed graphs, the longest back-edge has to be taken into account (Zhou and Hansen, 05).
13External Memory Algorithms for Implicit Graphs
- Frontier Search (Korf, 03)
- External A* (Edelkamp, Jabbar, Schrödl, 04)
- Structured Duplicate Detection (Zhou and Hansen, 04)
- Cost-Optimal External Planning (Edelkamp and Jabbar, 06)
- Model Checking for Linear Temporal Logic
- (Jabbar and Edelkamp, 05) for safety error detection
- (Edelkamp and Jabbar, 06) for liveness detection (cycle)
- (Barnat, Brim, Simecek, 07) for liveness detection (cycle)
- Real-Time Model Checking/Scheduling (Edelkamp and Jabbar, 06)
15External Memory Algorithm for Value Iteration
- What makes value iteration different from the usual external memory search algorithms?
- Answer:
- Propagation of information from states to predecessors!
- → Edges are more important than the states.
- Ext-VI works on edges
16External Memory Value Iteration
- Phase I: Generate the edge space by External BFS.
- Open(0) = Init; i = 1
- while (Open(i-1) != empty)
- Open(i) = Succ(Open(i-1))
- Externally-Sort-and-Remove-Duplicates(Open(i))
- for loc = 1 to Locality(Graph)
- Open(i) = Open(i) \ Open(i - loc) (remove previous layers)
- i = i + 1
- endwhile
- Merge all BFS layers into one edge list on disk!
- Opent = Open(0) U Open(1) U ... U Open(DIAM)
- Temp = Opent
- Sort Opent wrt. the successors; sort Temp wrt. the predecessors
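The Phase I steps can be traced on a small graph. The sketch below is an in-memory stand-in, not the disk-based implementation: Python lists play the role of files, `sorted(set(...))` replaces the external sort-and-compact step, and `ROOT = 0` stands for the pseudo-predecessor Ø. The names `external_bfs_edges` and `SUCC` are hypothetical; the example graph is the ten-state graph whose edge list appears on the Phase II slide.

```python
ROOT = 0  # assumption: stands in for the pseudo-predecessor of the start state

# Successor lists of the ten-state example graph.
SUCC = {1: [2, 3, 4], 2: [3, 5], 3: [4, 8], 4: [6], 5: [6, 7],
        6: [9], 7: [8, 10], 8: [], 9: [8, 10], 10: []}


def external_bfs_edges(init, succ, locality):
    """Generate all reachable edges layer by layer (Phase I)."""
    layers = [[(ROOT, init)]]                      # Open(0)
    while layers[-1]:
        # Expansion: one edge (v, w) per successor w of every reached v.
        nxt = [(v, w) for (_, v) in layers[-1] for w in succ(v)]
        # Sort and remove duplicates within the new layer.
        nxt = sorted(set(nxt))
        # Subtract the previous `locality` layers.
        for prev in layers[-locality:]:
            seen = set(prev)
            nxt = [e for e in nxt if e not in seen]
        layers.append(nxt)                         # Open(i)
    # Merge all layers into one edge list.
    return [e for layer in layers for e in layer]


edges = external_bfs_edges(1, lambda s: SUCC[s], locality=1)
temp = sorted(edges, key=lambda e: e[0])    # sorted on predecessors
open_t = sorted(edges, key=lambda e: e[1])  # sorted on successors
```

For this graph a locality of 1 is enough; a directed graph with longer back-edges would need a larger value, as noted on the External BFS slide.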
17Working of Ext-VI: Phase II
Temp: edge list on disk, sorted on predecessors
(Ø,1), (1,2), (1,3), (1,4), (2,3), (2,5), (3,4), (3,8), (4,6), (5,6), (5,7), (6,9), (7,8), (7,10), (9,8), (9,10)
h (one value per edge, in list order): 3 2 2 2 2 1 2 0 1 1 1 1 0 0 0 0
Opent: edge list on disk, sorted on successors
(Ø,1), (1,2), (1,3), (2,3), (1,4), (3,4), (2,5), (4,6), (5,6), (5,7), (3,8), (7,8), (9,8), (6,9), (7,10), (9,10)
h (one value per edge, in list order): 3 2 2 2 2 2 1 1 1 1 0 0 0 1 0 0
h after the update (one value per edge of Opent, in list order): 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0
Alternate sorting and update until residual < epsilon
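The update can be replayed on the slide's edge list. This is a sketch under stated assumptions, not the authors' implementation: the model is deterministic with unit action costs, states without outgoing edges keep their value, `ROOT = 0` stands for Ø, lists stand in for disk files, and `sorted` stands in for external sorting. The names `backward_update` and `ext_vi` are hypothetical.

```python
from itertools import groupby

ROOT = 0  # assumption: stands in for the pseudo-predecessor Ø
EDGES = [(ROOT, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 8),
         (4, 6), (5, 6), (5, 7), (6, 9), (7, 8), (7, 10), (9, 8), (9, 10)]
H0 = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 0}


def backward_update(open_t, temp, cost=1):
    """One pass: scan both files in parallel and rewrite h per state.

    open_t -- records (u, v, h(v)) sorted on successor v
    temp   -- the same records sorted on predecessor u
    """
    out = []
    groups = groupby(temp, key=lambda r: r[0])
    cur = next(groups, None)
    for v, in_edges in groupby(open_t, key=lambda r: r[1]):
        in_edges = list(in_edges)
        while cur is not None and cur[0] < v:      # advance the Temp scan
            cur = next(groups, None)
        if cur is not None and cur[0] == v:
            # Outgoing edges (v, w) carry h(w): take the cheapest one.
            new_h = min(cost + h_w for _, _, h_w in cur[1])
        else:
            new_h = in_edges[0][2]                 # no outgoing edge: keep h
        out.extend((u, v, new_h) for u, _, _ in in_edges)
    return out


def ext_vi(edges, h, eps=1e-4):
    """Alternate sorting and updating until the residual drops below eps."""
    open_t = sorted(((u, v, h[v]) for u, v in edges), key=lambda r: r[1])
    iterations = 0
    while True:
        temp = sorted(open_t, key=lambda r: r[0])
        new_open = backward_update(open_t, temp)
        residual = max(abs(a[2] - b[2]) for a, b in zip(open_t, new_open))
        open_t, iterations = new_open, iterations + 1
        if residual < eps:
            return {v: hv for _, v, hv in open_t}, iterations


final_h, passes = ext_vi(EDGES, H0)
```

Under these assumptions the first pass reproduces the updated h column listed on the slide, and the values stop changing after a few passes.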
18Complexity Analysis
- Phase I: External Memory Breadth-First Search.
- Expansion
- Scanning the red bucket: O(scan(E))
- Duplicates Removal
- Sorting the green bucket, which has one state for every edge from the red bucket.
- Scanning and compaction.
- O(sort(E))
- Subtraction
- Removing states of the blue buckets (duplicate-free) from the green one.
- O(l x scan(E))
Complexity of Phase I: O(l x scan(E) + sort(E)) I/Os
19Complexity Analysis
- Phase II: Backward Update
- Update
- Simple block-wise scanning.
- Scanning time for red and green files: O(scan(E)) I/Os
- External Sort
- Sorting the blue file with the updated values, to be used as the red file later: O(sort(E)) I/Os
- Fast External Sort
- If E / M < max. number of file pointers
- O(scan(E)) I/Os
[Figure: files sorted on preds, sorted on states, and with updated h-values]
Total complexity of Phase II: for tmax iterations, O(tmax x sort(E)) I/Os. With Fast External Sort: O(tmax x scan(E)) I/Os
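One reading of the fast external sort condition: if the E / M memory-sized sorted runs can all be open at once, a single merge pass finishes the sort, so the cost is a constant number of scans. A minimal sketch of that idea, with lists standing in for run files; `make_runs`, `fast_external_sort`, and `max_file_pointers` are hypothetical names, and the slide does not spell out the authors' actual procedure.

```python
import heapq
from itertools import islice


def make_runs(records, m, key):
    """Cut the input into chunks of at most m records and sort each one."""
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, m))
        if not chunk:
            return runs
        runs.append(sorted(chunk, key=key))


def fast_external_sort(records, m, max_file_pointers, key):
    """Sort with one run-formation pass and one k-way merge pass."""
    runs = make_runs(records, m, key)
    if len(runs) > max_file_pointers:
        raise ValueError("too many runs for a single merge pass")
    # heapq.merge reads each run sequentially, like one open file per run.
    return list(heapq.merge(*runs, key=key))


edges = [(1, 4), (3, 8), (1, 2), (9, 10), (2, 5), (7, 8), (5, 6), (4, 6)]
by_successor = fast_external_sort(edges, m=3, max_file_pointers=4,
                                  key=lambda e: e[1])
```

When the run count exceeds the file-pointer limit, several merge passes are needed and the logarithmic factor of sort(E) returns.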
21Experiments: 3x3 Sliding Tile Puzzle
p = 1.0, heuristic = 0
Alg.    S/E      RAM  Iterations  Time
VI      181,440  21M  27          6.3
Ext-VI  483,839  11M  32          71.5
p = 0.9, heuristic = Manhattan distance
Alg.    S/E      RAM  Iterations  Time
VI      181,440  21M  35          8.3
Ext-VI  967,677  12M  43          237.4
The number of iterations differs!
22Experiments: 3x4 Sliding Tile Puzzle with p = 0.9 (state space 12!/2 ≈ 239 x 10^6)
- On 2 Gigabytes, VI could not generate the state space.
- External VI finished
- Took 45 GB of disk space for the edges.
- Total 1,357,171,197 edges.
- Took 437 hours and 72 iterations to converge.
- ε = 0.0001
- RAM used: 1.4 Gigabytes
23Race Track Domain
Alg.    150x300 RaceTrack
VI      Out of mem. (> 2 GB)
LRTDP   Out of mem. (> 2 GB), 12 hours
LDFS    Out of time (> 1.5 GB), 118 hours
Ext-VI  Converged! (1.6 GB), 91 hours
25Summary
- Achievements
- First I/O-efficient disk-based algorithm for solving Markov Decision Processes.
- I/O complexity analysis.
- Features
- General cost model
- Can pause and resume execution to add more hard disks.
- Refinements
- Disk space eaten by duplicate states
- → Start delayed duplicate detection early
26Outlook
- Application to Bellman-Ford
- Parallel External Value Iteration: during the internal update, the hard disk is not in use.
27Thank You! Questions?