Planning Policies Using Dynamic Optimization - PowerPoint PPT Presentation

Title: Handling uncertainty over time: predicting, estimating, recognizing Author: cga Last modified by: cga Created Date: 1/27/2004 2:24:05 AM Document presentation ... – PowerPoint PPT presentation

Slides: 13
Provided by: cga50
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes


1
Planning Policies Using Dynamic Optimization
  • Chris Atkeson 2012

2
Example: One Link Swing Up
3
One Link Swing Up
  • State
  • Action
  • Cost function
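
The three ingredients above can be made concrete with a minimal sketch. The slide does not give the model, so everything here is an assumption: a point mass on a massless link driven by a torque, state (angle from upright, angular velocity), a scalar torque action, a quadratic one-step cost, and hypothetical constants.

```python
import math

# Hypothetical constants; the slides do not specify them.
GRAVITY = 9.81   # m/s^2
LENGTH = 1.0     # m
MASS = 1.0       # kg
DT = 0.01        # s, Euler integration step

def step(state, torque):
    """Advance the pendulum one Euler step. Angle 0 is upright (the goal)."""
    theta, theta_dot = state
    # Point mass on a massless link:
    # m l^2 theta_ddot = m g l sin(theta) + torque
    theta_ddot = (MASS * GRAVITY * LENGTH * math.sin(theta) + torque) / (MASS * LENGTH ** 2)
    return (theta + DT * theta_dot, theta_dot + DT * theta_ddot)

def one_step_cost(state, torque):
    """Assumed quadratic penalty on angle error, velocity, and torque."""
    theta, theta_dot = state
    return DT * (theta ** 2 + 0.1 * theta_dot ** 2 + 0.01 * torque ** 2)
```

Summing `one_step_cost` along a trajectory produced by repeated calls to `step` gives the total cost that a swing-up controller would try to minimize under these assumptions.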

4
Possible Trajectories
5
What is a policy?
  • Function mapping state to command: u(x)

6
Policy
7
How can we compute a policy?
  • Optimize trajectory from every starting point.
    The value function is the cost of each of those
    trajectories.
  • Parameterize the policy u(x,p) and optimize the
    parameters for some distribution of initial
    conditions.
  • Dynamic programming.
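
The dynamic programming option can be illustrated on a toy problem. This is a sketch under stated assumptions, not the method used for the swing-up figures: the one-dimensional grid, the goal cell, the unit step cost, and all function names are hypothetical.

```python
# Toy problem (hypothetical): cells 0..N-1 on a line, goal at cell GOAL.
N = 11
GOAL = 5
ACTIONS = (-1, 0, 1)  # move left, stay, move right

def next_cell(x, u):
    """Deterministic dynamics, clipped to the grid."""
    return min(max(x + u, 0), N - 1)

def cost(x, u):
    """One-step cost: free at the goal, 1 everywhere else."""
    return 0.0 if x == GOAL else 1.0

def value_iteration(sweeps=100):
    value = [0.0] * N
    for _ in range(sweeps):
        # Bellman backup: V(x) = min_u [ cost(x, u) + V(next(x, u)) ]
        new_value = [min(cost(x, u) + value[next_cell(x, u)] for u in ACTIONS)
                     for x in range(N)]
        if new_value == value:  # stop once a sweep changes nothing
            break
        value = new_value
    return value

def greedy_policy(value):
    """u(x): the action minimizing one-step cost plus value of the next cell."""
    return [min(ACTIONS, key=lambda u: cost(x, u) + value[next_cell(x, u)])
            for x in range(N)]

value = value_iteration()
policy = greedy_policy(value)
```

Here the value function ends up equal to the number of steps to the goal from each cell, and the policy is read off the value function rather than optimized trajectory by trajectory.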

8
Optimize Trajectory From Every Cell
9
Value Function
10
Types of tasks
  • Regulator tasks: want to stay at xd
  • Trajectory tasks: go from A to B in time T, or
    attain goal set G
  • Periodic tasks: cyclic behavior such as walking

11
Ways to Parameterize Policies
  • Linear function: u(x,p) = p^T x = Kx
  • Table
  • Polynomial (nonlinear controller)
  • Associated with trajectory:
    u(t) = uff(t) + K(t)(x - xd(t))
  • Associated with trajectory(ies):
    u(x) = unn(x) + Knn(x)(x - xdnn(x))
    where nn = nearest neighbor
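
Two of these parameterizations, the linear function and the nearest-neighbor form, can be sketched as follows. The data layout (a list of (xd, uff, K) tuples) and the sample numbers are assumptions made for illustration; the slides do not specify how the trajectory points are stored.

```python
def linear_policy(x, K):
    """u = K x for a state vector x and a gain row vector K (scalar command)."""
    return sum(k_i * x_i for k_i, x_i in zip(K, x))

def nearest_neighbor_policy(x, library):
    """u(x) = unn + Knn (x - xdnn), using the stored point closest to x.

    library is a list of (xd, uff, K) tuples sampled along one or more
    trajectories; this layout is an assumption made for the sketch.
    """
    def squared_distance(entry):
        xd = entry[0]
        return sum((x_i - xd_i) ** 2 for x_i, xd_i in zip(x, xd))

    xd, uff, K = min(library, key=squared_distance)
    error = [x_i - xd_i for x_i, xd_i in zip(x, xd)]
    return uff + linear_policy(error, K)

# Hypothetical two-point library for a two-dimensional state.
library = [
    ((0.0, 0.0), 0.0, (-2.0, -1.0)),
    ((1.0, 0.0), 0.5, (-4.0, -1.0)),
]
u = nearest_neighbor_policy((0.9, 0.1), library)
```

The query state is closest to the second stored point, so the command is that point's feedforward term plus its local gains applied to the state error.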

12
Optimizing Policies Using Function Optimization
13
Policy Search
  • Parameterized policy u = π(x,p), p is vector of
    adjustable parameters.
  • Simplest approach: run it for a while, and
    measure total cost.
  • Use your favorite function optimization approach
    to search for the best p.
  • There are tricks to improve policy comparison,
    such as using the same perturbations in different
    trials, and terminating trial early if really bad
    (racing algorithms).
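
A minimal sketch of this loop, assuming a hypothetical noisy scalar system, a one-parameter policy u = p x, and a plain search over a list of candidate values standing in for the "favorite" optimizer. Reusing the same seeds gives every candidate the same perturbations. The early stop is a simple cost-budget cutoff, not a full racing algorithm.

```python
import random

def rollout_cost(p, seed, steps=200, give_up_above=float("inf")):
    """Run the policy u = p * x on a noisy scalar system; return total cost."""
    rng = random.Random(seed)      # same seed -> same perturbations for every p
    x, total = 1.0, 0.0
    for _ in range(steps):
        u = p * x
        total += x * x + 0.1 * u * u
        if total > give_up_above:  # terminate a clearly bad trial early
            return total
        # Hypothetical dynamics: unstable without control, plus Gaussian noise.
        x = x + 0.1 * (x + u) + 0.01 * rng.gauss(0.0, 1.0)
    return total

def total_cost(p, seeds, budget=float("inf")):
    """Sum rollout costs over all seeds, stopping once the budget is exceeded."""
    total = 0.0
    for seed in seeds:
        total += rollout_cost(p, seed, give_up_above=budget - total)
        if total > budget:
            break
    return total

def search(candidates, seeds):
    """Return the candidate p with the lowest summed cost, and its mean cost."""
    best_p, best_total = None, float("inf")
    for p in candidates:
        # Costs are nonnegative, so a candidate whose partial sum already
        # exceeds the best total so far cannot win and is cut off.
        total = total_cost(p, seeds, budget=best_total)
        if total < best_total:
            best_p, best_total = p, total
    return best_p, best_total / len(seeds)

candidates = [-10.0 + 0.5 * i for i in range(21)]   # -10.0 ... 0.0
seeds = range(5)
best_p, best_mean_cost = search(candidates, seeds)
```

Because the one-step cost is nonnegative, the budget cutoff never discards a candidate that would have won; it only saves simulation time on bad ones.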