Purposive Behavior Acquisition for a real robot by vision based Reinforcement Learning

1
Purposive Behavior Acquisition for a real robot
by vision based Reinforcement Learning
  • Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida,
    Koh Hosoda
  • Presented by
  • Subarna Sadhukhan

2
Reinforcement learning
  • Vision-based reinforcement learning by which a robot
    learns to shoot a ball into a goal. The aim is a
    method that acquires strategies for this
    automatically.
  • The robot and its environment are modeled as two
    synchronized finite state automata interacting
    in a discrete-time cyclical process.
  • The robot senses the current state and selects an
    action. The environment makes a transition
    to a new state and generates a reward back to the
    robot.
  • Through these interactions the robot learns a
    purposive behavior to achieve a given goal.
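
The sense, act, transition, reward cycle described on this slide can be sketched as a small loop. This is only an illustration under stated assumptions: `ToyEnvironment`, its line of six cells, and `run_episode` are hypothetical stand-ins, not the robot, camera, or field used in the presented work.

```python
class ToyEnvironment:
    """Hypothetical environment: cells 0..5 on a line, with the goal at cell 0."""

    def __init__(self, start=3):
        self.state = start

    def step(self, action):
        # action is -1 (toward the goal) or +1 (away from it), clipped to the line
        self.state = max(0, min(5, self.state + action))
        reward = 1 if self.state == 0 else 0   # reward only in the goal state
        return self.state, reward


def run_episode(env, select_action, max_steps=100):
    """Alternate robot and environment steps in discrete time."""
    state = env.state
    history = []
    for _ in range(max_steps):
        action = select_action(state)           # robot: sense state, choose action
        next_state, reward = env.step(action)   # environment: transition + reward
        history.append((state, action, next_state, reward))
        state = next_state
        if reward:
            break
    return history


# A fixed policy that always moves toward the goal reaches it in three steps.
history = run_episode(ToyEnvironment(start=3), lambda s: -1)
```

Each tuple in `history` records one cycle of the two automata: the state the robot sensed, the action it chose, the state the environment moved to, and the reward it returned.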

3
  • Environment: ball and goal
  • Robot: mobile, equipped with a camera
  • Nothing about the system is known in advance
  • Assume the robot can discriminate the set S of states
    and can take the set A of actions on the world

4
Q-learning
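  • Let Q(s,a) be the expected return
    for taking action a in situation s and acting
    optimally thereafter:
    $$Q(s,a) = r(s,a) + \gamma \sum_{s'} T(s,a,s') \max_{a'} Q(s',a')$$
  • where T(s,a,s') is the probability of transition from
    s to s',
  • r(s,a) is the reward for the state-action pair (s,a),
  • γ is the discount factor.
  • Since T and r are not known, we can write the update
    $$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') \right)$$
  • where r is the actual reward for taking a, s' is
    the next state, and α is the learning rate.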
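
The sample-based update on this slide can be sketched as a tabular Q-learning step. The function name `q_update`, the list-of-lists table layout, and the values chosen for the learning rate and discount factor are assumptions made for illustration; the slides do not give an implementation.

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: blend the old estimate with the new target."""
    target = r + gamma * max(Q[s_next])           # observed reward + discounted best next value
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target
    return Q[s][a]


# Q[state][action] for two states and two actions, initialised to zero
Q = [[0.0, 0.0], [0.0, 0.0]]
first = q_update(Q, s=0, a=1, r=1.0, s_next=1)    # 0.5*0 + 0.5*(1 + 0.9*0)   = 0.5
second = q_update(Q, s=1, a=0, r=0.0, s_next=0)   # 0.5*0 + 0.5*(0 + 0.9*0.5) = 0.225
```

The second call shows how value propagates backward: state 1 earns no reward itself, but it inherits a discounted share of the value already learned for state 0.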

5
State Set
  • 9 × 27 + 27 + 9 = 279 states
  • (3 × 3 of ball × 3 × 3 × 3 of goal + no goal + no ball)
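
One reading of this count is: every combination of a ball sub-state (3 × 3) with a goal sub-state (3 × 3 × 3), plus the cases where the goal is not visible, plus the cases where the ball is not visible. The sketch below enumerates the states under that reading. The integer encoding of sub-states and the use of `None` for "not visible" are assumptions for illustration only.

```python
from itertools import product

# Each sub-state is encoded as a tuple of small integers (hypothetical encoding).
ball_states = list(product(range(3), range(3)))             # 3 x 3 = 9
goal_states = list(product(range(3), range(3), range(3)))   # 3 x 3 x 3 = 27

both_visible = [(b, g) for b in ball_states for g in goal_states]
no_goal = [(b, None) for b in ball_states]   # ball seen, goal not in the image
no_ball = [(None, g) for g in goal_states]   # goal seen, ball not in the image

states = both_visible + no_goal + no_ball
print(len(both_visible), len(no_goal), len(no_ball), len(states))
# prints: 243 9 27 279
```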

6
Action set
  • Two motors
  • Each motor: forward, stop, back
  • 9 actions in all.
  • State-action deviation problem: a small change near the
    observer results in a large change in the image, while a
    large change far from the observer results in only a
    small change in the image
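
The action count follows from pairing the three commands of one motor with the three commands of the other. A minimal sketch, where the constant names and the (first motor, second motor) ordering are assumptions:

```python
from itertools import product

MOTOR_COMMANDS = ("forward", "stop", "back")

# One action = one command for each of the two motors.
ACTIONS = list(product(MOTOR_COMMANDS, repeat=2))

print(len(ACTIONS))   # prints: 9
```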

7
Learning from Easy Missions
  • Delayed reinforcement problem: there is no explicit
    teacher signal, since the reward is received only after
    the ball is kicked into the goal. r(s,a) = 1 only in
    the goal state.
  • Construct the learning schedule so that the robot can
    learn in easy situations at early stages and
    later on learn in more difficult situations:
    Learning from Easy Missions (LEM).
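
The idea of scheduling easy situations first can be sketched on a toy problem. Everything concrete here is an assumption for illustration: a one-dimensional chain in which state 0 is the goal and states 1..k lie progressively farther away, `m` actions of which only action 0 moves one step closer, uniformly random exploration, and the parameter values. It is not the robot experiment from the presentation.

```python
import random


def train_lem(k=5, m=3, episodes_per_stage=20, alpha=0.5, gamma=0.8,
              max_steps=200, seed=0):
    """Q-learning on a chain, starting episodes near the goal first (LEM-style)."""
    rng = random.Random(seed)
    Q = [[0.0] * m for _ in range(k + 1)]      # Q[s][a]; state 0 is the goal
    for start in range(1, k + 1):              # easy missions first, then harder ones
        for _ in range(episodes_per_stage):
            s = start
            for _ in range(max_steps):
                a = rng.randrange(m)                   # uniform random exploration
                s_next = s - 1 if a == 0 else s        # only action 0 approaches the goal
                r = 1.0 if s_next == 0 else 0.0        # reward only on reaching the goal
                future = 0.0 if s_next == 0 else max(Q[s_next])   # goal is terminal
                Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * future)
                s = s_next
                if s == 0:
                    break
    return Q


Q = train_lem()
# Greedy action for states 1..k; each entry is 0, the action that approaches the goal.
greedy = [max(range(len(row)), key=row.__getitem__) for row in Q[1:]]
```

Because each stage starts only one step beyond states whose values are already positive, the first successful move in a new state immediately receives a nonzero target, rather than waiting for the reward to propagate back from the goal.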

8
Complexity analysis
  • k states, m possible actions
  • Q-learning for first , for second hence
  • LEM: m · k, since a reward is obtained at each step

9
Implementing LEM
  • Rough ordering of easy situations
  • Small -> medium -> large
  • (the size of the ball in the image roughly indicates how
    close the robot is to reaching the goal)
  • The state space is categorized into sub-states such as
    ball size, position and so on.
  • n: size of the state space; m: number of ordered
    sets
  • Apply LEM with m ordered states takes
  • As opposed to

10
When to shift
  • S1 is nearest to goal, next is S2 and so on.
  • Shifting occurs when
  • Where
  • Δt is a time interval, counted in steps, over which
    the change is evaluated. We suppose that the current state set
    S(k-1) can transit only to its neighbors

11
  • From the previous Q-learning equation, if Q converges
  • Thus

12
LEM

13
Experiments