Learning optimal behavior

About This Presentation

Description:

Learning optimal behavior Twan van Laarhoven AIBO robot Walking Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion Nate Kohl, Peter Stone (2004 ... – PowerPoint PPT presentation


Slides: 23

Provided by: TwanvanL

Transcript and Presenter's Notes

1
Learning optimal behavior

Twan van Laarhoven

2
AIBO robot Walking

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion
Nate Kohl, Peter Stone (2004)
Goal: speed

3
(No Transcript)
4
Parameterization

12 parameters
Front ellipse
Rear ellipse
Height body
etc.

5
Learning

No simulator
test on actual AIBO
expensive
Not an MDP
No Q-Learning
Gradient Reinforcement Learning

6
Gradient Reinforcement Learning

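Parameter vector: $\pi = \{\theta_1, \ldots, \theta_N\}$
Random policies: $R_i = \{\theta_1 + \Delta_1, \ldots, \theta_N + \Delta_N\}$, each $\Delta_n \in \{-\epsilon_n, 0, +\epsilon_n\}$
Each parameter: group the $R_i$ into $S_{-\epsilon,n}$ / $S_{0,n}$ / $S_{+\epsilon,n}$
Averages: $\mathrm{Avg}_{-\epsilon,n} = \mathrm{avgscore}(S_{-\epsilon,n})$, likewise for $S_{0,n}$ and $S_{+\epsilon,n}$
Adjust: $A_n = 0$ if $\mathrm{Avg}_{0,n}$ is the largest, otherwise $A_n = \mathrm{Avg}_{+\epsilon,n} - \mathrm{Avg}_{-\epsilon,n}$
Repeat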
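The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `policy_gradient_step`, the `score` callback, the step size `eta`, and the normalization of the adjustment vector are assumptions that do not appear on the slide.

```python
import math
import random


def policy_gradient_step(theta, score, epsilon, eta, num_policies=15, rng=random):
    """One iteration: perturb, evaluate, estimate an adjustment, take a step.

    theta   -- current parameter vector (list of floats)
    score   -- hypothetical callback that evaluates a parameter vector
    epsilon -- per-parameter perturbation sizes
    eta     -- step size (an assumption, not shown on the slide)
    """
    n = len(theta)
    # Each random policy shifts every parameter by -eps, 0, or +eps.
    signs = [[rng.choice((-1, 0, 1)) for _ in range(n)] for _ in range(num_policies)]
    scores = [
        score([t + s * e for t, s, e in zip(theta, row, epsilon)])
        for row in signs
    ]

    adjustment = []
    for j in range(n):
        # Group the scores by how parameter j was perturbed.
        groups = {-1: [], 0: [], 1: []}
        for row, value in zip(signs, scores):
            groups[row[j]].append(value)
        if not groups[-1] or not groups[1]:
            adjustment.append(0.0)  # not enough samples to compare
            continue
        avg_minus = sum(groups[-1]) / len(groups[-1])
        avg_plus = sum(groups[1]) / len(groups[1])
        if groups[0]:
            avg_zero = sum(groups[0]) / len(groups[0])
            if avg_zero > avg_minus and avg_zero > avg_plus:
                adjustment.append(0.0)  # leaving the parameter alone scored best
                continue
        adjustment.append(avg_plus - avg_minus)

    # Assumed here: normalize the adjustment and move a fixed distance eta.
    norm = math.sqrt(sum(a * a for a in adjustment))
    if norm == 0.0:
        return list(theta)
    return [t + eta * a / norm for t, a in zip(theta, adjustment)]
```

On the robot, `score` would be a timed walk with the given gait parameters. The individual evaluations are independent of each other, so they can be spread over several robots.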

7
(No Transcript)
8
Conclusion

Gradient Reinforcement Learning is very simple
and gives good results
Evaluation can be done in parallel

9
Learning from experts

Apprenticeship Learning for Motion Planning with Application to Parking Lot Navigation
Pieter Abbeel, Dmitri Dolgov, Andrew Y. Ng, Sebastian Thrun (2008)

10
Parking lot navigation

Path planning
Many cost functions
length
backward
smoothness
off road
etc.

11
Cost functions

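forward length: $\phi_{\mathrm{fwd}} = \sum_{\mathrm{fwd}} \lVert x_i - x_{i-1} \rVert$
reverse length: $\phi_{\mathrm{rev}} = \sum_{\mathrm{rev}} \lVert x_i - x_{i-1} \rVert$
off-road: $\phi_{\mathrm{road}} = \sum_i \mathrm{road}(i) \, \lVert x_i - x_{i-1} \rVert$
curvature: $\phi_{\mathrm{curv}} = \sum_i (\Delta x_{i+1} - \Delta x_i)^2$
in lane: $\phi_{\mathrm{lane}} = \sum_i D(x_i, \theta_i, G)$
direction: $\phi_{\mathrm{dir}} = \sum_i \sin^2(2(\theta_i - \alpha_i))$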
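As a rough illustration of how some of these terms could be evaluated on a discretized path, the sketch below computes the forward length, reverse length, curvature, and direction costs. The function name `path_features` and the list-based path representation are assumptions. The off-road and lane terms are left out because they need a map of the parking lot.

```python
import math


def path_features(points, headings, reverse, lane_angles):
    """Cost features of a discretized path (hypothetical representation).

    points      -- list of (x, y) positions x_i
    headings    -- heading theta_i at each point, in radians
    reverse     -- reverse[i] is True if the step from i-1 to i is driven backwards
    lane_angles -- lane direction alpha_i at each point, in radians
    """
    forward_length = 0.0
    reverse_length = 0.0
    for i in range(1, len(points)):
        step = math.dist(points[i], points[i - 1])
        if reverse[i]:
            reverse_length += step
        else:
            forward_length += step

    # Curvature: squared change between consecutive displacement vectors.
    curvature = 0.0
    for i in range(1, len(points) - 1):
        ddx = points[i + 1][0] - 2 * points[i][0] + points[i - 1][0]
        ddy = points[i + 1][1] - 2 * points[i][1] + points[i - 1][1]
        curvature += ddx ** 2 + ddy ** 2

    # Direction: zero when the heading is parallel or perpendicular to the lane.
    direction = sum(
        math.sin(2 * (theta - alpha)) ** 2
        for theta, alpha in zip(headings, lane_angles)
    )
    return {
        "fwd": forward_length,
        "rev": reverse_length,
        "curv": curvature,
        "dir": direction,
    }
```

Note that the direction term peaks when the car is at 45 degrees to the lane direction, so it penalizes diagonal driving rather than driving against the lane.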

12
Path planning

Two step approach
Coarse A* search
Refinement
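The coarse search step can be illustrated with a small grid A* search. This is only a sketch under simplifying assumptions: the grid representation, the 4-connected moves with unit cost, and the Manhattan-distance heuristic are not from the slides, and the refinement step is not shown.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of strings, where '#' is blocked.

    Returns a list of (row, col) cells from start to goal, or None.
    """
    def heuristic(cell):
        # Manhattan distance never overestimates with unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was found already
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue
            if grid[r][c] == "#":
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

In the parking lot setting the unit step cost would be replaced by the weighted sum of the cost functions, and the resulting coarse path would then be handed to the refinement step.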

13
Cost and paths

Total cost: $F(s) = \sum_k w_k \phi_k(s)$
Best path: $\arg\min_{s \in S} F(s)$
Many cost functions
how to weigh them?
learn from examples
Goal: match the expert's costs, $\phi_k(s) = \phi_k(s_E)$

14
Apprenticeship learning

random weights: $w^{(0)} = \text{random}$
find paths: $s_i = \arg\min_s F(s)$
sum costs: $\mu^{(i)}_k = \sum \phi_k(s_i)$
find new weights: $w^{(j+1)}_k = \mu_k - \mu^E_k$
repeat until $\lVert w^{(j+1)} \rVert \le \epsilon$
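A toy version of this loop is sketched below. It is not the paper's planner: "finding a path" is reduced to picking the cheapest entry from a fixed list of candidate feature vectors, and the weight update uses a clamped projection step in the spirit of the projection method from Abbeel and Ng's "Apprenticeship learning via inverse reinforcement learning". The function name `apprenticeship_weights`, the candidate list, and the clamping are assumptions.

```python
import math


def apprenticeship_weights(candidates, mu_expert, w0, eps=1e-6, max_iter=100):
    """Learn cost weights so that the planner's feature costs match the expert's.

    candidates -- feature-cost vectors, one per candidate path (toy planner)
    mu_expert  -- feature costs of the expert's path
    w0         -- initial (for example random) weights
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = list(w0)
    mu_bar = None  # running convex combination of the feature costs found so far
    for iteration in range(1, max_iter + 1):
        # "Find path": cheapest candidate under the current weights.
        mu = min(candidates, key=lambda phi: dot(w, phi))
        if mu_bar is None:
            mu_bar = list(mu)
        else:
            d = [m - b for m, b in zip(mu, mu_bar)]
            denom = dot(d, d)
            if denom == 0:
                break  # the planner returned nothing new
            to_expert = [e - b for e, b in zip(mu_expert, mu_bar)]
            # Move toward the new point as far as it brings us closer to the
            # expert; clamping keeps mu_bar inside the convex hull.
            step = max(0.0, min(1.0, dot(d, to_expert) / denom))
            mu_bar = [b + step * x for b, x in zip(mu_bar, d)]
        # New weights: difference between our feature costs and the expert's.
        w = [b - e for b, e in zip(mu_bar, mu_expert)]
        if math.sqrt(dot(w, w)) <= eps:
            break
    return mu_bar, w, iteration
```

For example, with candidates `[(4.0, 0.0), (0.0, 4.0), (2.0, 2.0)]`, expert costs `(1.0, 3.0)` and initial weights `(0.3, 0.7)`, the loop first picks `(4.0, 0.0)`, then `(0.0, 4.0)`, and the mixture of the two matches the expert exactly.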

15
Results

Nice
Sloppy
Backwards

16
Results

Nice
Sloppy
Backwards

17
Results

Nice
Sloppy
Backwards

18
Conclusion

Always performs as well as expert!
$\mu \approx \mu_E$ once $\lVert w \rVert \le \epsilon$
Algorithm is difficult to understand
Paper uses confusing notation

19
EOF
20
More information

Apprenticeship learning via inverse reinforcement learning
Pieter Abbeel, Andrew Y. Ng
Maximal margin

21
More information

Apprenticeship learning via inverse reinforcement learning
Pieter Abbeel, Andrew Y. Ng
Projection method

22
Apprenticeship learning

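random weights: $w^{(0)} = \text{random}$
find path(s): $s_i = \arg\min_s \sum_k w^{(i)}_k \phi_k(s)$
sum costs: $\mu^{(i)}_k = \sum \phi_k(s_i)$
find weights: $\min_{w,x} \lVert w \rVert$ s.t. $\mu_k = \sum_j x_j \mu^{(j)}_k$, $w_k = \mu_k - \mu^E_k$
repeat: $w^{(j+1)} = w / \lVert w \rVert$
combine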
