Title: Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning
1 Purposive Behavior Acquisition for a Real Robot
by Vision-Based Reinforcement Learning
- Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda
- Presented by Subarna Sadhukhan
2 Reinforcement Learning
- Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is to develop a method that automatically acquires strategies for this task.
- The robot and its environment are modeled as two synchronized finite state automata interacting in a discrete-time cyclical process.
- The robot senses the current state and selects an action; the environment then decides the transition to a new state and returns a reward to the robot.
- Through this loop the robot learns purposive behavior for achieving the given goal.
3
- Environment: a ball and a goal
- Robot: mobile and equipped with a camera
- Nothing about the system is known in advance
- Assume the robot can discriminate the set S of states and can take the set A of actions on the world
4 Q-learning
- Let Q(s, a) be the expected return for taking action a in state s:
  Q*(s, a) = r(s, a) + γ Σ_{s'∈S} T(s, a, s') max_{a'∈A} Q*(s', a')
- where T(s, a, s') is the probability of a transition from state s to s' under action a,
- r(s, a) is the reward for the state-action pair (s, a),
- and γ is the discount factor.
- Since T and r are not known, Q is estimated incrementally from experience:
  Q(s, a) ← (1 − α) Q(s, a) + α ( r + γ max_{a'∈A} Q(s', a') )
- where r is the actual reward received for taking a, s' is the next state, and α is the learning rate (a minimal code sketch of this update follows).
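To make the update rule concrete, here is a minimal tabular Q-learning sketch in Python. It is an illustration only: the ε-greedy exploration scheme and the parameter values are my assumptions, not details taken from the paper.

import random
from collections import defaultdict

# Tabular Q-learning sketch (illustrative, not the authors' implementation).
# Q maps a (state, action) pair to its estimated discounted return.
Q = defaultdict(float)
ALPHA = 0.25   # learning rate alpha (assumed value)
GAMMA = 0.9    # discount factor gamma (assumed value)

def update(state, action, reward, next_state, actions):
    # Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a'))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * (reward + GAMMA * best_next)

def select_action(state, actions, epsilon=0.1):
    # epsilon-greedy choice over the current estimates (exploration scheme assumed).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])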
5 State Set
- 279 states (an enumeration sketch follows this list)
- 3 × 3 = 9 ball sub-states and 3 × 3 × 3 = 27 goal sub-states, plus the "no goal" and "no ball" cases
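A small sketch of how this state set can be enumerated; the position/size/orientation labels are my own, and states in which both the ball and the goal are lost are not counted here.

from itertools import product

# Illustrative enumeration of the visual state set described above.
POSITIONS = ["left", "center", "right"]
SIZES = ["small", "medium", "large"]
ORIENTATIONS = ["left-oriented", "frontal", "right-oriented"]

ball_states = list(product(POSITIONS, SIZES))                     # 9 ball sub-states
goal_states = list(product(POSITIONS, SIZES, ORIENTATIONS))       # 27 goal sub-states

states  = [(b, g) for b, g in product(ball_states, goal_states)]  # 243 combinations
states += [("ball-lost", g) for g in goal_states]                 # +27 (ball not visible)
states += [(b, "goal-lost") for b in ball_states]                 # +9 (goal not visible)

print(len(states))  # 279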
6 Action Set
- Two motors
- Each motor: forward, stop, back
- 9 actions in all (a small sketch follows this list)
- State-action deviation problem: a small physical change near the observer produces a large change in the image, while a large change far from the observer produces only a small change in the image
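The action set is simply the product of the two motor commands; a minimal sketch (the command names are assumptions for illustration):

from itertools import product

# The 9 actions as (left motor, right motor) command pairs.
MOTOR_COMMANDS = ["forward", "stop", "back"]
ACTIONS = list(product(MOTOR_COMMANDS, MOTOR_COMMANDS))
print(len(ACTIONS))  # 9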
7 Learning from Easy Missions
- Delayed reinforcement problem: there is no explicit teacher signal, since a reward is received only after the ball is kicked into the goal; r(s, a) = 1 only in the goal state.
- Construct the learning schedule so that the robot learns in easy situations at the early stages and only later learns in more difficult situations: Learning from Easy Missions (LEM).
8 Complexity Analysis
- Assume k states on the path to the goal and m possible actions in each state.
- Plain Q-learning: undirected exploration needs on the order of m^k steps before the first reward is ever received, so the learning time is exponential in k.
- LEM: a reward is obtained at each stage of the schedule after roughly m trials, so learning takes on the order of mk steps in total (a toy simulation of this gap follows this list).
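The gap between the two estimates can be illustrated with a rough Monte Carlo sketch. This is my own toy model, not the authors' analysis: states lie on a chain, exactly one of the m actions moves one step toward the goal, any wrong action restarts the episode, and states already covered by earlier LEM stages are handled greedily.

import random

def steps_to_first_reward(start, learned, m):
    # Steps until the goal (distance 0) is first reached. In states farther
    # than `learned` from the goal one of m actions is correct and a wrong
    # choice restarts the episode; at or inside `learned` the greedy policy
    # is already known and always takes the correct action.
    d, steps = start, 0
    while True:
        steps += 1
        if d <= learned:
            d -= 1
        elif random.randrange(m) == 0:
            d -= 1
        else:
            d = start
        if d == 0:
            return steps

def average(f, trials=200, **kw):
    return sum(f(**kw) for _ in range(trials)) / trials

K, M = 8, 3
# Plain Q-learning: nothing learned yet, start from the farthest state.
print("no schedule :", average(steps_to_first_reward, start=K, learned=0, m=M))
# LEM: stage d starts at distance d with distances below d already learned,
# so each stage finds its reward after only a few trials.
print("LEM schedule:", sum(average(steps_to_first_reward, start=d, learned=d - 1, m=M)
                           for d in range(1, K + 1)))

In this toy model the first count grows roughly like m^k, while the staged total stays close to linear in k.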
9 Implementing LEM
- Rough ordering of easy situations: small → medium → large
- (the ball size in the image roughly indicates how close the robot is to achieving the goal)
- The state space is categorized into sub-state sets based on features such as ball size, position, and so on.
- Let n be the size of the state space and m the number of ordered sub-state sets.
- Applying LEM over the m ordered sets takes far less learning time than applying plain Q-learning to all n states at once (a sketch of the ordered sets follows this list).
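A hypothetical grouping of the visual states into the ordered sets S1 … Sm; the grouping key (here the observed goal size) and the easy-to-hard order are assumptions for illustration.

from itertools import product

POSITIONS = ["left", "center", "right"]
SIZES = ["small", "medium", "large"]
ORIENTATIONS = ["left-oriented", "frontal", "right-oriented"]
EASY_TO_HARD = ["large", "medium", "small"]   # S1 (closest to scoring) ... S3 (farthest)

ball_states = list(product(POSITIONS, SIZES))
goal_states = list(product(POSITIONS, SIZES, ORIENTATIONS))

# ordered_sets[i] collects the states assigned to stage i of the schedule.
ordered_sets = {i: [] for i in range(len(EASY_TO_HARD))}
for ball, goal in product(ball_states, goal_states):
    goal_size = goal[1]
    ordered_sets[EASY_TO_HARD.index(goal_size)].append((ball, goal))

for i, members in ordered_sets.items():
    print(f"S{i + 1}: {len(members)} states")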
10 When to Shift
- S1 is the state set nearest to the goal, S2 the next nearest, and so on.
- Shifting from S(k−1) to S(k) occurs when the Q-values of the states in S(k−1) have almost stopped changing, i.e. when their change measured over the last Δt steps falls below a small threshold (a sketch of this test follows this list).
- Δt indicates the time interval, in number of steps, over which the change is measured. We suppose that the current state set S(k−1) can transit only to its neighbors.
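A hedged sketch of such a shift test; the monitored statistic (summed greedy Q-values over the current set), the interval length, and the tolerance are all assumptions.

DELTA_T = 500        # interval in steps over which change is measured (assumed)
THRESHOLD = 1e-3     # "almost converged" tolerance (assumed)

def summed_greedy_q(Q, state_set, actions):
    # Sum of max_a Q(s, a) over the given state set.
    return sum(max(Q[(s, a)] for a in actions) for s in state_set)

def should_shift(history, t):
    # history[t] holds summed_greedy_q over S(k-1) at step t; shift to S(k)
    # once that sum has barely changed during the last DELTA_T steps.
    if t < DELTA_T:
        return False
    return abs(history[t] - history[t - DELTA_T]) < THRESHOLD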
11
- From the previous Q-learning equation, if Q converges, then Q(s, a) = r(s, a) + γ max_{a'} Q(s', a').
- Thus, since r = 1 only in the goal state, a state from which the goal is reached in k steps converges to a Q-value of γ^(k−1) (a quick numerical check follows).
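A quick numerical check of the γ^(k−1) pattern on a toy chain; this is my own example, assuming the reward of 1 is received on the shooting action itself.

GAMMA = 0.9
K = 5                         # states 1 (adjacent to the goal) ... 5

# Value iteration on the chain: V(k) = r(k) + GAMMA * V(k-1), with r = 1
# only for the state adjacent to the goal and V(0) = 0 at the goal.
V = [0.0] * (K + 1)
for _ in range(100):
    V = [0.0] + [(1.0 if k == 1 else 0.0) + GAMMA * V[k - 1] for k in range(1, K + 1)]

print([round(v, 4) for v in V[1:]])   # [1.0, 0.9, 0.81, 0.729, 0.6561] = gamma^(k-1)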
12 LEM
13 Experiments