External Memory Value Iteration

1
External Memory Value Iteration
  • Stefan Edelkamp, Shahid Jabbar
  • Chair for Programming Systems,
  • University of Dortmund, Germany
  • Blai Bonet
  • Departamento de Computacion
  • Universidad Simon Bolivar, Caracas, Venezuela

2
Motivation: Reinforcement Learning
  • Aim: Write a controller that acts successfully in the
    environment
  • Minimize cost / maximize rewards

3
Motivation: External Reinforcement Learning
  • Cover deterministic, non-deterministic, and
    probabilistic environments (and games)
  • But what to do if the agent's state space or
    policy space is too large to be computed and
    stored in RAM?
  • Disk Space is Cheap (500 GB 100)
  • ⇒ External Memory Algorithm

4
Overview
  • Uniform Search Model
  • Internal Memory Value Iteration
  • Existing External Memory Model and BFS
  • External Memory Value Iteration
  • Experimental Highlights
  • Summary and Outlook

6
Uniform Search Model
  • Deterministic
  • Non-Deterministic
  • Probabilistic
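
The formulas for the three cases are not preserved in this transcript. A minimal sketch, assuming the usual Bellman-style backups (take the single successor, the worst successor, or the expected successor value, then minimize over actions), could look as follows. The function name `backup` and the data layout are hypothetical and chosen only for illustration.

```python
def backup(h, actions, mode):
    """Return the backed-up value of one state.

    h:       dict mapping successor states to their current values
    actions: list of (cost, outcomes); outcomes is a list of
             (probability, successor) pairs (probabilities are only
             used in the probabilistic case)
    mode:    "det", "nondet", or "prob"
    """
    values = []
    for cost, outcomes in actions:
        succ_values = [h[v] for _, v in outcomes]
        if mode == "det":
            # deterministic: the action has exactly one successor
            agg = succ_values[0]
        elif mode == "nondet":
            # non-deterministic: plan for the worst possible outcome
            agg = max(succ_values)
        elif mode == "prob":
            # probabilistic: expected value over the outcomes
            agg = sum(p * h[v] for p, v in outcomes)
        else:
            raise ValueError(f"unknown mode: {mode}")
        values.append(cost + agg)
    # in all three cases the best (cheapest) action is chosen
    return min(values)


h = {"a": 2.0, "b": 4.0}
one_action = [(1.0, [(0.5, "a"), (0.5, "b")])]
worst_case = backup(h, one_action, "nondet")   # 1 + max(2, 4)
expected = backup(h, one_action, "prob")       # 1 + 0.5*2 + 0.5*4
```

Only the aggregation over successors changes between the cases, which is what allows a single search model to cover all three.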
8
Internal Memory Value Iteration
ε-optimal for solving MDPs and AND/OR trees.
Problem: Needs to have the whole state space in main memory.
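
The algorithm listing from this slide is not preserved in the transcript. As a rough sketch of textbook value iteration for a cost-based MDP (not necessarily the exact variant shown on the slide), the following keeps the full value table `h` in a dictionary, which is exactly the memory requirement named above. The names `value_iteration`, `mdp`, and `goals` are hypothetical.

```python
def value_iteration(mdp, goals, epsilon=1e-4):
    """Textbook value iteration over an in-memory state space.

    mdp:   dict state -> list of (cost, [(prob, successor), ...])
           every successor must itself be a key of mdp
    goals: set of goal states, whose value stays 0
    """
    h = {s: 0.0 for s in mdp}          # whole state space held in RAM
    while True:
        residual = 0.0
        new_h = {}
        for s, actions in mdp.items():
            if s in goals or not actions:
                new_h[s] = h[s]        # terminal states are not updated
                continue
            new_h[s] = min(
                cost + sum(p * h[v] for p, v in outcomes)
                for cost, outcomes in actions
            )
            residual = max(residual, abs(new_h[s] - h[s]))
        h = new_h
        if residual < epsilon:         # stop once no value moved by epsilon
            return h


# Tiny example: from "s" the single action reaches the goal with
# probability 0.9 and stays in "s" with probability 0.1.
toy = {"s": [(1.0, [(0.9, "g"), (0.1, "s")])], "g": []}
values = value_iteration(toy, goals={"g"})
```

In this toy example the value of "s" approaches 1/0.9, the expected number of unit-cost steps to reach the goal.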
9
Why External Memory Algorithms?
  • Search algorithms perform well as long as they
    consume RAM only!
  • Virtual memory slows down the performance!

[Figure: virtual address space from 0x000000 to 0xFFFFFF, split into memory pages; annotation: 7 I/Os]
11
External Memory Model [Vitter and Shriver, 94]
If the input size is very large, the running time
depends on the number of I/Os rather than on the number of
instructions.
[Figure: internal memory of size M, block size B, input of size N >> M]
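
The complexity slides later in the deck are stated in terms of scan(E) and sort(E). In this model the standard bounds are scan(N) = Θ(N/B) and sort(N) = Θ((N/B) · log_{M/B}(N/B)). The following rough calculator drops all constant factors; the function names `scan_ios` and `sort_ios` are made up for illustration.

```python
import math


def scan_ios(n, block_size):
    """Number of block transfers needed to read n items once."""
    return math.ceil(n / block_size)


def sort_ios(n, memory_size, block_size):
    """Order-of-magnitude I/O count for external merge sort.

    Computes (N/B) * ceil(log_{M/B}(N/B)) with integer arithmetic,
    ignoring constant factors.
    """
    blocks = math.ceil(n / block_size)
    fan_in = memory_size // block_size    # blocks that fit into memory
    if fan_in < 2:
        raise ValueError("memory must hold at least two blocks")
    passes, reach = 0, 1
    while reach < blocks:                 # smallest p with fan_in**p >= blocks
        reach *= fan_in
        passes += 1
    return blocks * max(1, passes)


# Example: 10^6 items, blocks of 10^3 items, memory for 10^5 items.
scan_cost = scan_ios(10**6, 10**3)
sort_cost = sort_ios(10**6, 10**5, 10**3)
```

With these example parameters, sorting needs two passes over the 1,000 blocks, while a scan needs one.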
12
External Breadth-First Search [Munagala and
Ranade, SODA 99]
[Figure: external BFS layers; labels: A, Open(0)]
For undirected graphs, subtracting two layers is
enough [Munagala and Ranade, 99]. For directed
graphs, the longest back-edge has to be taken
into account [Zhou and Hansen, 05].
13
External Memory Algorithms for Implicit Graphs
  • Frontier Search [Korf, 03]
  • External A* [Edelkamp, Jabbar, Schrödl, 04]
  • Structured Duplicate Detection [Zhou and Hansen,
    04]
  • Cost-Optimal External Planning [Edelkamp, Jabbar,
    06]
  • Model Checking for Linear Temporal Logic
      • [Jabbar and Edelkamp, 05] for safety error
        detection
      • [Edelkamp and Jabbar, 06] for liveness detection
        (cycle)
      • [Barnat, Brim, Simecek, 07] for liveness
        detection (cycle)
  • Real-Time Model Checking/Scheduling [Edelkamp,
    Jabbar, 06]

15
External Memory Algorithm for Value Iteration
  • What makes value iteration different from the
    usual external memory search algorithms?
  • Answer:
  • Propagation of information from states to
    predecessors!
  • ⇒ Edges are more important than the states.
  • Ext-VI works on edges.

16
External Memory Value Iteration
  • Phase I: Generate the edge space by External BFS.
      Open(0) = Init; i = 1
      while (Open(i-1) != empty)
          Open(i) = Succ(Open(i-1))
          Externally-Sort-and-Remove-Duplicates(Open(i))
          for loc = 1 to Locality(Graph)      (remove previous layers)
              Open(i) = Open(i) \ Open(i - loc)
          i = i + 1
      endwhile
  • Merge all BFS layers into one edge list on disk!
      Open_t = Open(0) ∪ Open(1) ∪ ... ∪ Open(DIAM)
      Temp = Open_t
      Sort Open_t wrt. the successors; sort Temp wrt. the predecessors
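
The Phase I steps above can be simulated in memory to see what ends up in the two edge lists. This is only a sketch: Python lists stand in for the files on disk, `ROOT = 0` is a hypothetical stand-in for the Ø pseudo-predecessor of the initial state, and the example graph and the locality value 2 are assumptions (the graph is read off the edge lists shown on the Phase II slide).

```python
ROOT = 0  # hypothetical stand-in for the Ø predecessor of the initial state


def external_bfs_edges(init, succ, locality):
    """Layered BFS that records every generated edge (u, v)."""
    layers = [[init]]                     # Open(0)
    edges = [(ROOT, init)]
    i = 1
    while layers[i - 1]:
        # expansion: one record per outgoing edge of the previous layer
        generated = [(u, v) for u in layers[i - 1] for v in succ(u)]
        edges.extend(generated)
        # sort and remove duplicates among the generated successors
        layer = sorted({v for _, v in generated})
        # subtract the previous `locality` layers
        for loc in range(1, locality + 1):
            if i - loc >= 0:
                previous = set(layers[i - loc])
                layer = [v for v in layer if v not in previous]
        layers.append(layer)
        i += 1
    return layers[:-1], edges             # drop the final empty layer


GRAPH = {1: [2, 3, 4], 2: [3, 5], 3: [4, 8], 4: [6], 5: [6, 7],
         6: [9], 7: [8, 10], 8: [], 9: [8, 10], 10: []}

layers, edges = external_bfs_edges(1, GRAPH.__getitem__, locality=2)

# merge step: the same edge list in two sort orders
open_t = sorted(edges, key=lambda e: (e[1], e[0]))   # sorted on successors
temp = sorted(edges)                                  # sorted on predecessors
```

With locality 1, state 8 would reappear in a later layer because the edge (7,8) points back by more than one layer, which is why the directed case needs the longer subtraction window.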

17
Working of Ext-VI: Phase II
Temp: edge list on disk, sorted on predecessors (one h-value per edge, in the same order)
edges: (Ø,1), (1,2), (1,3), (1,4), (2,3), (2,5), (3,4), (3,8), (4,6), (5,6), (5,7), (6,9), (7,8), (7,10), (9,8), (9,10)
h:     3, 2, 2, 2, 2, 1, 2, 0, 1, 1, 1, 1, 0, 0, 0, 0
Open_t: edge list on disk, sorted on successors (one h-value per edge, in the same order)
edges:     (Ø,1), (1,2), (1,3), (2,3), (1,4), (3,4), (2,5), (4,6), (5,6), (5,7), (3,8), (7,8), (9,8), (6,9), (7,10), (9,10)
h:         3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0
updated h: 3, 2, 1, 1, 2, 2, 2, 2, 2, 1, 0, 0, 0, 1, 0, 0
Alternate sorting and update until residual < epsilon
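
The numbers on this slide are consistent with a deterministic graph with unit edge costs in which each edge (u,v) carries h(v) and states 8 and 10 are terminal. Under those assumptions (they are inferred from the values, not stated on the slide), one backward update is a parallel scan of the two sorted lists. In this sketch, lists stand in for the disk files, and `ROOT = 0` replaces Ø.

```python
from itertools import groupby

ROOT = 0  # hypothetical stand-in for the Ø pseudo-predecessor
EDGES = [(ROOT, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 8),
         (4, 6), (5, 6), (5, 7), (6, 9), (7, 8), (7, 10), (9, 8), (9, 10)]
H0 = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 0}


def backward_update(open_t, temp, cost=1):
    """One pass over records (u, v, h(v)).

    open_t is sorted on successors v, temp on predecessors u.
    Returns the updated open_t and the residual of this pass.
    """
    out, residual, j = [], 0, 0
    for v, group in groupby(open_t, key=lambda r: r[1]):
        group = list(group)
        old_h = group[0][2]
        while j < len(temp) and temp[j][0] < v:      # skip smaller predecessors
            j += 1
        best = None
        while j < len(temp) and temp[j][0] == v:     # outgoing edges (v, w)
            candidate = cost + temp[j][2]            # cost + h(w)
            best = candidate if best is None else min(best, candidate)
            j += 1
        new_h = old_h if best is None else best      # terminal: keep old value
        residual = max(residual, abs(new_h - old_h))
        out.extend((u, v, new_h) for u, _, _ in group)
    return out, residual


def ext_vi_phase2(edges, h0, epsilon=1e-4, cost=1):
    """Alternate sorting and updating until the residual is below epsilon."""
    open_t = sorted(((u, v, h0[v]) for u, v in edges),
                    key=lambda r: (r[1], r[0]))      # sorted on successors
    while True:
        temp = sorted(open_t)                        # sorted on predecessors
        open_t, residual = backward_update(open_t, temp, cost)
        if residual < epsilon:
            return {v: h for _, v, h in open_t}


final_h = ext_vi_phase2(EDGES, H0)
```

The first call to `backward_update` reproduces the updated h row shown on the slide. Each pass reads only values from the previous pass, since Temp is a sorted copy made before the update.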
18
Complexity Analysis
  • Phase I: External Memory Breadth-First Search.
  • Expansion
      • Scanning the red bucket: O(scan(E))
  • Duplicates Removal
      • Sorting the green bucket, which has one state for
        every edge from the red bucket.
      • Scanning and compaction.
      • O(sort(E))
  • Subtraction
      • Removing states of the blue buckets (duplicate-free)
        from the green one.
      • O(l × scan(E))

Complexity of Phase I: O(l × scan(E) + sort(E)) I/Os
19
Complexity Analysis
  • Phase II: Backward Update
  • Update
      • Simple block-wise scanning.
      • Scanning time for red and green files:
        O(scan(E)) I/Os
  • External Sort
      • Sorting the blue file with the updated values, to
        be used as the red file later: O(sort(E)) I/Os
  • Fast External Sort
      • If E / M < max. number of file pointers:
        O(scan(E)) I/Os

[Figure: files sorted on preds and sorted on states; updated h-values]
Total complexity of Phase II: for t_max iterations, O(t_max × sort(E)) I/Os.
With Fast External Sort: O(t_max × scan(E)) I/Os.
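
One plausible reading of the Fast External Sort condition (an assumption; the slides do not spell out the procedure) is that each memory-sized sorted run gets its own file pointer, so a single merge pass suffices and the cost stays linear in the number of blocks. In the sketch below, in-memory lists stand in for the run files, and `single_pass_sort` is a made-up name.

```python
import heapq


def single_pass_sort(records, memory_size, max_file_pointers, key=None):
    """Sort via memory-sized runs followed by exactly one k-way merge."""
    # run formation: read one memory load at a time and sort it internally
    runs = [sorted(records[i:i + memory_size], key=key)
            for i in range(0, len(records), memory_size)]
    if len(runs) > max_file_pointers:
        # more runs than open files: a single merge pass is not possible
        raise ValueError("too many runs for one merge pass")
    # merge: one pointer per run, each record is read and written once
    return list(heapq.merge(*runs, key=key))


edge_records = [(9, 8), (1, 2), (7, 10), (3, 4), (5, 6), (2, 3), (1, 4)]
by_successor = single_pass_sort(edge_records, memory_size=3,
                                max_file_pointers=4, key=lambda e: e[1])
```

When the number of runs exceeds the pointer limit, several merge passes are needed and the general sort(E) bound applies again.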
21
Experiments: 3x3 Sliding Tiles Puzzle

p = 1.0, heuristic = 0
Alg.     |S|/|E|   RAM   Iterations   Time
VI       181,440   21M   27           6.3
Ext-VI   483,839   11M   32           71.5

p = 0.9, heuristic = Manhattan distance
Alg.     |S|/|E|   RAM   Iterations   Time
VI       181,440   21M   35           8.3
Ext-VI   967,677   12M   43           237.4

The numbers of iterations differ!
22
3x4 Sliding Tile Puzzle with p = 0.9 (State space:
12!/2 ≈ 239 × 10^6)
  • On 2 Gigabytes, VI could not generate the state
    space.
  • External VI: finished.
      • Took 45 GB of disk space for the edges.
      • Total: 1,357,171,197 edges.
      • Took 437 hours and 72 iterations to converge.
      • ε = 0.0001
      • RAM used: 1.4 Gigabytes
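
The state-space sizes quoted in the experiments can be checked with a few lines of arithmetic. The per-edge and per-state figures below are derived here, not reported on the slides, and they assume that "45 GB" means 45 × 10^9 bytes.

```python
import math

# Half of all tile permutations are reachable in a sliding-tile puzzle.
states_3x3 = math.factorial(9) // 2     # 3x3 puzzle
states_3x4 = math.factorial(12) // 2    # 3x4 puzzle

edges = 1_357_171_197                   # edge count reported for Ext-VI
disk_bytes = 45 * 10**9                 # assumption: decimal gigabytes

bytes_per_edge = disk_bytes / edges     # average disk footprint of one edge
edges_per_state = edges / states_3x4    # average number of stored edges per state
```

The first value matches the 181,440 states listed for VI on the 3x3 puzzle, and the second matches the 239 × 10^6 given for the 3x4 puzzle.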

23
Race Track Domain
  • Example: 150x300 RaceTrack

Alg.     Result        RAM        Time
VI       Out of mem.   > 2 GB
LRTDP    Out of mem.   > 2 GB     12 hours
LDFS     Out of time   > 1.5 GB   118 hours
Ext-VI   Converged!    1.6 GB     91 hours
25
Summary
  • Achievements
      • First I/O-efficient disk-based algorithm for
        solving Markov Decision Processes.
      • I/O complexity analysis.
  • Features
      • General cost model
      • Can pause and resume execution to add more hard
        disks.
  • Refinements
      • Disk space eaten by duplicate states
      • ⇒ Start Early Delayed Duplicate Detection

26
Outlook
  • Application to Bellman-Ford
  • Parallel External Value Iteration: during the
    time of the internal update, the hard disk is not in
    use.

27
Thank You! Questions?