Title: Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning
1 Purposive Behavior Acquisition for a Real Robot
by Vision-Based Reinforcement Learning
- Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda
- Presented by Subarna Sadhukhan
2 Reinforcement Learning
- Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is to develop a method that automatically acquires strategies for this task.
- The robot and its environment are modeled as two synchronized finite state automata interacting in a discrete-time cyclical process.
- The robot senses the current state and selects an action; the environment then decides the transition to a new state and returns a reward to the robot.
- Through this loop the robot learns purposive behavior for achieving the given goal.
3
- Environment: a ball and a goal
- Robot: mobile and equipped with a camera
- Nothing about the system is known in advance
- Assume the robot can discriminate the set S of states and can take the set A of actions on the world
4 Q-learning
- Let Q(s, a) be the expected return for taking action a in state s:
  Q*(s, a) = r(s, a) + γ Σ_{s'∈S} T(s, a, s') max_{a'∈A} Q*(s', a')
- where T(s, a, s') is the probability of a transition from state s to s' under action a,
- r(s, a) is the reward for the state-action pair (s, a),
- and γ is the discount factor.
- Since T and r are not known, Q is estimated incrementally from experience:
  Q(s, a) ← (1 − α) Q(s, a) + α ( r + γ max_{a'∈A} Q(s', a') )
- where r is the actual reward received for taking a, s' is the next state, and α is the learning rate (a minimal code sketch of this update follows).
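To make the update rule concrete, here is a minimal tabular Q-learning sketch in Python. It is an illustration only: the ε-greedy exploration scheme and the parameter values are my assumptions, not details taken from the paper.

import random
from collections import defaultdict

# Tabular Q-learning sketch (illustrative, not the authors' implementation).
# Q maps a (state, action) pair to its estimated discounted return.
Q = defaultdict(float)
ALPHA = 0.25   # learning rate alpha (assumed value)
GAMMA = 0.9    # discount factor gamma (assumed value)

def update(state, action, reward, next_state, actions):
    # Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a'))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * (reward + GAMMA * best_next)

def select_action(state, actions, epsilon=0.1):
    # epsilon-greedy choice over the current estimates (exploration scheme assumed).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])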
5 State Set
- 279 states (an enumeration sketch follows this list)
- 3 × 3 = 9 ball sub-states and 3 × 3 × 3 = 27 goal sub-states, plus the "no goal" and "no ball" cases
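A small sketch of how this state set can be enumerated; the position/size/orientation labels are my own, and states in which both the ball and the goal are lost are not counted here.

from itertools import product

# Illustrative enumeration of the visual state set described above.
POSITIONS = ["left", "center", "right"]
SIZES = ["small", "medium", "large"]
ORIENTATIONS = ["left-oriented", "frontal", "right-oriented"]

ball_states = list(product(POSITIONS, SIZES))                     # 9 ball sub-states
goal_states = list(product(POSITIONS, SIZES, ORIENTATIONS))       # 27 goal sub-states

states  = [(b, g) for b, g in product(ball_states, goal_states)]  # 243 combinations
states += [("ball-lost", g) for g in goal_states]                 # +27 (ball not visible)
states += [(b, "goal-lost") for b in ball_states]                 # +9 (goal not visible)

print(len(states))  # 279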
6 Action Set
- Two motors
- Each motor: forward, stop, back
- 9 actions in all (a small sketch follows this list)
- State-action deviation problem: a small physical change near the observer produces a large change in the image, while a large change far from the observer produces only a small change in the image
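The action set is simply the product of the two motor commands; a minimal sketch (the command names are assumptions for illustration):

from itertools import product

# The 9 actions as (left motor, right motor) command pairs.
MOTOR_COMMANDS = ["forward", "stop", "back"]
ACTIONS = list(product(MOTOR_COMMANDS, MOTOR_COMMANDS))
print(len(ACTIONS))  # 9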
7 Learning from Easy Missions
- Delayed reinforcement problem: there is no explicit teacher signal, since a reward is received only after the ball is kicked into the goal; r(s, a) = 1 only in the goal state.
- Construct the learning schedule so that the robot learns in easy situations at the early stages and only later learns in more difficult situations: Learning from Easy Missions (LEM).
8 Complexity Analysis
- Assume k states on the path to the goal and m possible actions in each state.
- Plain Q-learning: undirected exploration needs on the order of m^k steps before the first reward is ever received, so the learning time is exponential in k.
- LEM: a reward is obtained at each stage of the schedule after roughly m trials, so learning takes on the order of mk steps in total (a toy simulation of this gap follows this list).
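The gap between the two estimates can be illustrated with a rough Monte Carlo sketch. This is my own toy model, not the authors' analysis: states lie on a chain, exactly one of the m actions moves one step toward the goal, any wrong action restarts the episode, and states already covered by earlier LEM stages are handled greedily.

import random

def steps_to_first_reward(start, learned, m):
    # Steps until the goal (distance 0) is first reached. In states farther
    # than `learned` from the goal one of m actions is correct and a wrong
    # choice restarts the episode; at or inside `learned` the greedy policy
    # is already known and always takes the correct action.
    d, steps = start, 0
    while True:
        steps += 1
        if d <= learned:
            d -= 1
        elif random.randrange(m) == 0:
            d -= 1
        else:
            d = start
        if d == 0:
            return steps

def average(f, trials=200, **kw):
    return sum(f(**kw) for _ in range(trials)) / trials

K, M = 8, 3
# Plain Q-learning: nothing learned yet, start from the farthest state.
print("no schedule :", average(steps_to_first_reward, start=K, learned=0, m=M))
# LEM: stage d starts at distance d with distances below d already learned,
# so each stage finds its reward after only a few trials.
print("LEM schedule:", sum(average(steps_to_first_reward, start=d, learned=d - 1, m=M)
                           for d in range(1, K + 1)))

In this toy model the first count grows roughly like m^k, while the staged total stays close to linear in k.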
9 Implementing LEM
- Rough ordering of easy situations: small → medium → large
- (the ball size in the image roughly indicates how close the robot is to achieving the goal)
- The state space is categorized into sub-state sets based on features such as ball size, position, and so on.
- Let n be the size of the state space and m the number of ordered sub-state sets.
- Applying LEM over the m ordered sets takes far less learning time than applying plain Q-learning to all n states at once (a sketch of the ordered sets follows this list).
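A hypothetical grouping of the visual states into the ordered sets S1 … Sm; the grouping key (here the observed goal size) and the easy-to-hard order are assumptions for illustration.

from itertools import product

POSITIONS = ["left", "center", "right"]
SIZES = ["small", "medium", "large"]
ORIENTATIONS = ["left-oriented", "frontal", "right-oriented"]
EASY_TO_HARD = ["large", "medium", "small"]   # S1 (closest to scoring) ... S3 (farthest)

ball_states = list(product(POSITIONS, SIZES))
goal_states = list(product(POSITIONS, SIZES, ORIENTATIONS))

# ordered_sets[i] collects the states assigned to stage i of the schedule.
ordered_sets = {i: [] for i in range(len(EASY_TO_HARD))}
for ball, goal in product(ball_states, goal_states):
    goal_size = goal[1]
    ordered_sets[EASY_TO_HARD.index(goal_size)].append((ball, goal))

for i, members in ordered_sets.items():
    print(f"S{i + 1}: {len(members)} states")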
10 When to Shift
- S1 is the state set nearest to the goal, S2 the next nearest, and so on.
- Shifting from S(k−1) to S(k) occurs when the Q-values of the states in S(k−1) have almost stopped changing, i.e. when their change measured over the last Δt steps falls below a small threshold (a sketch of this test follows this list).
- Δt indicates the time interval, in number of steps, over which the change is measured. We suppose that the current state set S(k−1) can transit only to its neighbors.
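A hedged sketch of such a shift test; the monitored statistic (summed greedy Q-values over the current set), the interval length, and the tolerance are all assumptions.

DELTA_T = 500        # interval in steps over which change is measured (assumed)
THRESHOLD = 1e-3     # "almost converged" tolerance (assumed)

def summed_greedy_q(Q, state_set, actions):
    # Sum of max_a Q(s, a) over the given state set.
    return sum(max(Q[(s, a)] for a in actions) for s in state_set)

def should_shift(history, t):
    # history[t] holds summed_greedy_q over S(k-1) at step t; shift to S(k)
    # once that sum has barely changed during the last DELTA_T steps.
    if t < DELTA_T:
        return False
    return abs(history[t] - history[t - DELTA_T]) < THRESHOLD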
11
- From the previous Q-learning equation, if Q converges, then Q(s, a) = r(s, a) + γ max_{a'} Q(s', a').
- Thus, since r = 1 only in the goal state, a state from which the goal is reached in k steps converges to a Q-value of γ^(k−1) (a quick numerical check follows).
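A quick numerical check of the γ^(k−1) pattern on a toy chain; this is my own example, assuming the reward of 1 is received on the shooting action itself.

GAMMA = 0.9
K = 5                         # states 1 (adjacent to the goal) ... 5

# Value iteration on the chain: V(k) = r(k) + GAMMA * V(k-1), with r = 1
# only for the state adjacent to the goal and V(0) = 0 at the goal.
V = [0.0] * (K + 1)
for _ in range(100):
    V = [0.0] + [(1.0 if k == 1 else 0.0) + GAMMA * V[k - 1] for k in range(1, K + 1)]

print([round(v, 4) for v in V[1:]])   # [1.0, 0.9, 0.81, 0.729, 0.6561] = gamma^(k-1)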
12 LEM
13 Experiments