1
Space-Indexed Dynamic Programming Learning to
Follow Trajectories
  • J. Zico Kolter, Adam Coates, Andrew Y. Ng, Yi Gu,
    Charles DuHadway
  • Computer Science Department, Stanford University
  • July 2008, ICML

2
Outline
  • Reinforcement Learning and Following Trajectories
  • Space-indexed Dynamical Systems and Space-indexed
    Dynamic Programming
  • Experimental Results

3
Reinforcement Learning and Following Trajectories
4
Trajectory Following
  • Consider task of following trajectory in a
    vehicle such as a car or helicopter
  • State space is too large to discretize, so we
    can't apply tabular RL / dynamic programming

5
Trajectory Following
  • Dynamic programming algorithms w/ non-stationary
    policies seem well-suited to task
  • Policy Search by Dynamic Programming (Bagnell
    et al.), Differential Dynamic Programming
    (Jacobson and Mayne)

6
Dynamic Programming
[Figure: trajectory with time step t = 1 marked]
Divide control task into discrete time steps
7
Dynamic Programming
[Figure: trajectory with time steps t = 1, 2 marked]
Divide control task into discrete time steps
8
Dynamic Programming
[Figure: trajectory with time steps t = 1, ..., 5 marked]
Divide control task into discrete time steps
9
Dynamic Programming
[Figure: trajectory with time steps t = 1, ..., 5 marked]
Proceeding backwards in time, learn policies
for t = T, T-1, ..., 2, 1
10
Dynamic Programming
[Figure: trajectory with time steps t = 1, ..., 5 marked]
Proceeding backwards in time, learn policies
for t = T, T-1, ..., 2, 1
11
Dynamic Programming
[Figure: trajectory with time steps t = 1, ..., 5 marked]
Proceeding backwards in time, learn policies
for t = T, T-1, ..., 2, 1
12
Dynamic Programming
[Figure: trajectory with time steps t = 1, ..., 5 marked]
Proceeding backwards in time, learn policies
for t = T, T-1, ..., 2, 1
13
Dynamic Programming
[Figure: trajectory with time steps t = 1, ..., 5 marked]
Key Advantage: Policies are local (each only needs
to perform well over a small portion of the state space)
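
This backward sweep can be written in a few lines. A
minimal sketch in Python, assuming generic helpers;
sample_states and learn_policy are illustrative
placeholder names, not the paper's API:

    # PSDP-style backward pass over time steps.
    def backward_pass(T, sample_states, learn_policy):
        policies = [None] * (T + 1)    # index 0 unused
        for t in range(T, 0, -1):      # t = T, T-1, ..., 1
            # states the system may occupy at step t
            states = sample_states(t)
            # fit a local policy for step t, given that
            # policies[t+1:] handle the rest of the horizon
            policies[t] = learn_policy(states, policies[t + 1:])
        return policies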
14
Problems with Dynamic Programming
Problem 1: Policies from traditional dynamic
programming algorithms are time-indexed
15
Problems with Dynamic Programming
Suppose we learned a policy assuming this
distribution over states
16
Problems with Dynamic Programming
But, due to the natural stochasticity of the
environment, the car is actually here at t = 5
17
Problems with Dynamic Programming
Resulting policy will perform very poorly
18
Problems with Dynamic Programming
Partial Solution: Re-indexing. Execute the policy
learned for the location closest to the current
state, regardless of time (sketched below)
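
A minimal sketch of this re-indexing rule, under the
assumptions that the state's first two entries are
planar position and that a reference state was stored
for each time step (reference_states is an
illustrative name):

    import numpy as np

    # Re-indexing: pick the policy whose training-time
    # reference position is nearest the car right now,
    # ignoring the clock.
    def reindexed_action(x, policies, reference_states):
        dists = [np.linalg.norm(x[:2] - r[:2])
                 for r in reference_states]
        t = int(np.argmin(dists))
        return policies[t](x)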
19
Problems with Dynamic Programming
Problem 2: Uncertainty over future states makes
it hard to learn any good policy
20
Problems with Dynamic Programming
Dist. over states at time t = 5
Due to stochasticity, there is large uncertainty
over states in the distant future
21
Problems with Dynamic Programming
Dist. over states at time t = 5
DP algorithms require learning a policy that
performs well over the entire distribution
22
Space-Indexed Dynamic Programming
  • Basic idea of Space-Indexed Dynamic Programming
    (SIDP)

Perform DP with respect to space indices (planes
tangent to the trajectory)
23
Space-Indexed Dynamical Systems and Dynamic
Programming
24
Difficulty with SIDP
  • No guarantee that taking a single action will
    move the vehicle to the next plane along the
    trajectory
  • Introduce the notion of a space-indexed
    dynamical system

25
Time-Indexed Dynamical System
  • Creating time-indexed dynamical systems

26
Time-Indexed Dynamical System
  • Creating time-indexed dynamical systems

ẋ = f(x, u), where x is the current state
27
Time-Indexed Dynamical System
  • Creating time-indexed dynamical systems

ẋ = f(x, u), where x is the current state and
u is the control action
28
Time-Indexed Dynamical System
  • Creating time-indexed dynamical systems

ẋ = f(x, u), where x is the current state, u is
the control action, and f(x, u) is the time
derivative of the state
29
Time-Indexed Dynamical System
  • Creating time-indexed dynamical systems

Euler integration: x_{t+1} = x_t + f(x_t, u_t) · Δt
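
A direct transcription of this Euler step in Python
(f, dt, and the array state convention are generic
placeholders):

    # One Euler step of the time-indexed system:
    # x_{t+1} = x_t + f(x_t, u_t) * dt
    def time_indexed_step(x, u, f, dt):
        return x + f(x, u) * dt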
30
Space-Indexed Dynamical Systems
  • Creating space-indexed dynamical systems
  • Simulate forward until the vehicle hits the
    next tangent plane

[Figure: vehicle advancing from the plane at
space index d to the plane at space index d+1]
31
Space-Indexed Dynamical Systems
  • Creating space-indexed dynamical systems

32
Space-Indexed Dynamical Systems
  • Creating space-indexed dynamical systems

x_{d+1} = x_d + f(x_d, u_d) · Δt*, where Δt* is the
smallest t > 0 at which the simulated state reaches
the next plane (a positive solution exists as long
as the controller makes some forward progress)
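
A sketch of one space-indexed step, simplified to a
single Euler substep so the crossing time has a
closed form (the slides describe simulating forward).
The conventions that x[:2] is planar position and
that the next plane is given by a point p and unit
normal n are assumptions for illustration:

    import numpy as np

    # Advance from plane d to plane d+1: pick the Euler
    # step length t* that lands exactly on the plane
    # {y : n.(y - p) = 0}.
    def space_indexed_step(x, u, f, p, n):
        xdot = f(x, u)
        num = n @ (p - x[:2])    # offset to plane along n
        den = n @ xdot[:2]       # velocity toward plane
        if den <= 0:
            raise ValueError("no forward progress")
        t_star = num / den       # the positive solution
        return x + xdot * t_star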
33
Space-Indexed Dynamical Systems
  • Result is a dynamical system indexed by
    spatial-index variable d rather than time
  • Space-indexed dynamic programming runs DP
    directly on this system; execution is sketched
    below

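For concreteness, a sketch of executing the learned
policies plane by plane, reusing space_indexed_step
from the previous sketch; indexing planes[d] by the
plane number d (index 0 unused) is an illustrative
convention:

    # Run the policies plane by plane: the index that
    # selects the policy is the plane the vehicle has
    # reached, never the wall-clock time.
    def sidp_rollout(x0, policies, planes, f):
        x = x0                    # state on plane 1
        for d in range(1, len(planes) - 1):
            u = policies[d](x)    # policy for plane d
            p, n = planes[d + 1]  # next tangent plane
            x = space_indexed_step(x, u, f, p, n)
        return x
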
34
Space-Indexed Dynamic Programming
[Figure: trajectory with space plane d = 1 marked]
Divide trajectory into discrete space planes
35
Space-Indexed Dynamic Programming
[Figure: trajectory with space planes d = 1, 2 marked]
Divide trajectory into discrete space planes
36
Space-Indexed Dynamic Programming
[Figure: trajectory with space planes d = 1, ..., 5 marked]
Divide trajectory into discrete space planes
37
Space-Indexed Dynamic Programming
[Figure: trajectory with space planes d = 1, ..., 5 marked]
Proceeding backwards, learn policies
for d = D, D-1, ..., 2, 1
38
Space-Indexed Dynamic Programming
[Figure: trajectory with space planes d = 1, ..., 5 marked]
Proceeding backwards, learn policies
for d = D, D-1, ..., 2, 1
39
Space-Indexed Dynamic Programming
[Figure: trajectory with space planes d = 1, ..., 5 marked]
Proceeding backwards, learn policies
for d = D, D-1, ..., 2, 1
40
Space-Indexed Dynamic Programming
[Figure: trajectory with space planes d = 1, ..., 5 marked]
Proceeding backwards, learn policies
for d = D, D-1, ..., 2, 1
41
Problems with Dynamic Programming
Problem 1: Policies from traditional dynamic
programming algorithms are time-indexed
42
Space-Indexed Dynamic Programming
Space-indexed DP: always executes the policy for
the current spatial index
Time-indexed DP: can execute a policy learned for
a different location
43
Problems with Dynamic Programming
Problem 2: Uncertainty over future states makes
it hard to learn any good policy
44
Space-Indexed Dynamic Programming
Dist. over states at time t = 5
Dist. over states at index d = 5
Space-indexed DP: much tighter distribution over
future states
Time-indexed DP: wide distribution over future
states
45
Space-Indexed Dynamic Programming
Dist. over states at time t = 5
Dist. over states at index d = 5
t(5)
Space-indexed DP: much tighter distribution over
future states
Time-indexed DP: wide distribution over future
states
46
Experiments
47
Experimental Domain
  • Task: follow a race-track trajectory in an RC
    car, with randomly placed obstacles

48
Experimental Setup
  • Implemented a space-indexed version of the PSDP
    algorithm
  • Policy chooses the steering angle using an SVM
    classifier (velocity held constant); a sketch of
    this policy class follows below
  • Used a simple textbook model of the car dynamics
    in simulation to learn the policy
  • Evaluated three variants of PSDP: time-indexed,
    time-indexed with re-indexing, and space-indexed

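A minimal sketch of such a policy class, assuming
scikit-learn and a small discretized set of steering
angles; the feature set, library choice, and the
names STEERING / fit_policy are assumptions, not
details from the paper:

    import numpy as np
    from sklearn.svm import SVC

    # Candidate steering angles in radians (illustrative).
    STEERING = np.linspace(-0.5, 0.5, 7)

    def fit_policy(states, best_action_idx):
        # states: (N, state_dim) array of sampled states;
        # best_action_idx: index into STEERING that the DP
        # step judged lowest-cost from each state.
        clf = SVC(kernel="rbf").fit(states, best_action_idx)
        return lambda x: STEERING[
            int(clf.predict(x.reshape(1, -1))[0])]

Each backward-pass step would fit one such classifier
per index, giving the non-stationary policy sequence.
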
49
Time-Indexed PSDP
50
Time-Indexed PSDP w/ Re-indexing
51
Space-Indexed PSDP
52
Empirical Evaluation
Method                                Cost
Time-indexed PSDP                     Infinite (no trajectory succeeds)
Time-indexed PSDP with Re-indexing    59.74
Space-indexed PSDP                    49.32
53
Additional Experiments
  • In the paper: additional experiments on the
    Stanford Grand Challenge car using space-indexed
    DDP, and on a simulated helicopter domain using
    space-indexed PSDP

54
Related Work
  • Reinforcement learning / dynamic programming:
    Bagnell et al., 2004; Jacobson and Mayne, 1970;
    Lagoudakis and Parr, 2003; Langford and Zadrozny,
    2005
  • Differential Dynamic Programming: Atkeson, 1994;
    Tassa et al., 2008
  • Gain Scheduling, Model Predictive Control: Leith
    and Leithead, 2000; Garcia et al., 1989

55
Summary
  • Trajectory following calls for non-stationary
    policies, but traditional DP / RL algorithms
    suffer because they are time-indexed
  • In this paper, we introduce the notions of a
    space-indexed dynamical system and space-indexed
    dynamic programming
  • Demonstrated the usefulness of these methods on
    real-world control tasks

56
Thank you!
  • Videos available online at
    http://cs.stanford.edu/~kolter/icml08videos