Implementing Reinforcement Learning in RoboCup Soccer

1
Implementing Reinforcement Learning in RoboCup Soccer
  • By
  • Satyam Prasad Shaw
  • Kaushik Kumar Mondal

2
Motivation
  • Robosoccer is one of the most prestigious events
    in the AI world.
  • At present, no specific strategy based on
    reinforcement learning is implemented in the
    institute's robosoccer team.
  • Our work will be of direct use to the institute
    team.

3
RoboCup Soccer
  • Described as the next grand challenge for AI after Deep Blue.
  • Dream: by the year 2050, develop a team of fully
    autonomous humanoid robots that can win against
    the human world soccer champion team.
  • Focus: developing cooperation among autonomous
    agents in a dynamic multi-agent environment.
  • The RoboCup Soccer simulation league is used to model
    teamwork and agent behavior for team play.

4
RoboCup Soccer
5
RoboCup Soccer
  • The Soccer Server enables various teams to compete in
    a game of soccer.
  • The match is carried out in a client-server style.
  • Each client (representing a player) is a separate
    process and connects to the server through a
    specified port.
  • A team can have up to 12 clients: 11 players and a
    coach.
  • Players send requests to the server regarding the
    actions they want to perform (see the sketch below).
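
For concreteness, a minimal sketch (not the team's actual code) of how one client
process could talk to the soccer server, assuming the standard rcssserver setup of
UDP on port 6000 and its text-based command format:

  import socket

  # Minimal sketch of one client (player) process talking to the soccer server.
  # Assumes rcssserver is listening on UDP port 6000 on the local machine.
  SERVER = ("127.0.0.1", 6000)

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  sock.sendto(b"(init MyTeam (version 15))", SERVER)  # register this client as a player

  reply, server_addr = sock.recvfrom(8192)  # the server answers from a player-specific port
  print(reply.decode())                     # e.g. "(init l 1 before_kick_off)"

  # From then on, send one action request per cycle and read sensor messages back.
  sock.sendto(b"(dash 100)", server_addr)   # request: accelerate forward with power 100
  msg, _ = sock.recvfrom(8192)              # next sensor message, e.g. "(see ...)"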

6
RoboCup Soccer
  • The Soccer Monitor provides a visual interface.
  • A RoboCup agent has three different sensors
    (sample messages below):
  • 1. Aural sensor: detects messages sent by the coach
    and other players.
  • 2. Visual sensor: provides visual information about
    the field.
  • 3. Body sensor: detects the agent's current physical
    status, such as stamina and speed.
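
Illustrative only (the exact fields depend on the server version): the three sensor
streams arrive as text messages, which a client can tell apart by their leading
keyword.

  # Indicative shapes of the three kinds of sensor messages (examples, not exhaustive).
  samples = [
      "(hear 120 referee kick_off_l)",                     # aural sensor
      "(see 121 ((b) 17.2 3) ((p MyTeam 5) 9.1 -20))",     # visual sensor
      "(sense_body 121 (stamina 3820 1) (speed 0.31 0))",  # body sensor
  ]

  def sensor_type(msg):
      """Classify a raw server message by its leading keyword."""
      return msg[1:].split(None, 1)[0]  # 'hear', 'see', or 'sense_body'

  for m in samples:
      print(sensor_type(m), "->", m)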

7
RoboCup Soccer
  • Has motivated many research works.
  • As many as three doctoral theses.
  • Many Masters theses, including one by Sreangshu
    Acharya on "Real-Time Learning of Soccer
    Strategies Based on Radial Basis Function
    Networks".
  • A lot of B.Tech projects, including one by Rajat
    Raina.

8
Reinforcement Learning
  • The computational approach to learning by
    interaction is known as reinforcement learning.
  • Reinforcement learning is a natural choice for
    RoboCup soccer.
  • Many of the skills that we acquire (e.g. learning
    to walk, learning to ride a bicycle) are learned
    through interacting with the environment.

9
Reinforcement Learning in RoboCup
  • RoboCup simulated soccer presents many challenges
    to Reinforcement Learning.
  • A Large State Space
  • Hidden and Uncertain States
  • Multiple Agents
  • Long and Variable delays in effects of actions

10
Reinforcement Learning in RoboCup
  • The sensory inputs provided by the server to the
    client might be noisy.
  • The communication is unreliable, which complicates
    the matter further.

11
Past Work
  • RoboCup has been an area of wide research.
  • Peter Stone did his PhD thesis on implementing
    reinforcement learning in robosoccer.
  • In our institute, a substantial amount of work
    related to reinforcement learning has also been
    done.

12
Past Work
  • Akhil Gupta and Rajat Raina implemented dribbling
    using reinforcement learning in robosoccer, using
    the shooting and ball-interception modules
    developed by Sreangshu Acharya.
  • They used Radial Basis Function Networks as the
    function approximators.

13
Past Work
  • Peter Stone's contribution to RoboCup is huge.
  • As mentioned earlier, he did his PhD thesis on
    implementing reinforcement learning in RoboCup.
    His research has been applied to the CMUnited
    simulator team.
  • CMUnited won the RoboCup in 1998 using Peter
    Stone's work.

14
Past Work
  • Our work is inspired by one of Peter Stone's
    papers, where he implemented reinforcement
    learning in Keep Away Soccer.
  • Keep Away Soccer is a subtask of real soccer. It
    consists of the Keepers and the Takers.

15
Keep Away Soccer
  • The Keepers try to keep possession of the ball for
    as long as possible.
  • The Takers try to minimize this time.
  • Whenever the Keepers lose possession of the ball
    or the ball goes outside the playing region, the
    episode ends and the Keepers and Takers are both
    reset for another episode.

16
Keep Away Soccer
  • Parameters of the task include the size of the
    region, the number of keepers, and the number of
    takers.
  • An omniscient coach agent manages the play, ending
    episodes when a taker gains possession of the ball
    or the ball goes outside the playing region.
  • Each player learns independently and perceives the
    world differently.
  • For each player, the episode ends when the keepers
    lose possession of the ball.

17
Keep Away Soccer
  • Actions
  • HoldBall()
  • PassBall(k)
  • GetOpen()
  • GoToBall()
  • BlockPass(k)

18
Keep Away Soccer
  • Of the actions mentioned, PassBall(k) influences
    the world for several time steps.
  • Moreover, even the simpler actions may last more
    than one time step, as the simulator occasionally
    misses a command.
  • To handle this, the problem is treated as a
    semi-Markov decision process (SMDP); the
    bookkeeping is sketched below.
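
A small sketch (our own bookkeeping, not taken from the paper) of the SMDP
treatment: the reward credited to a macro-action is the discounted sum of the
per-step rewards collected while it ran, and the bootstrap term Q(s',a') is
discounted by gamma raised to the number of steps the action took.

  def smdp_return(step_rewards, gamma):
      """Accumulate the discounted reward collected while one macro-action runs.

      step_rewards: per-simulator-step rewards observed during the action.
      Returns (R, gamma_k): the SMDP reward and the discount for Q(s', a').
      """
      R, g = 0.0, 1.0
      for r in step_rewards:
          R += g * r
          g *= gamma
      return R, g

  # Example: a pass that took 4 simulator steps, reward +1 per step of possession.
  R, gamma_k = smdp_return([1, 1, 1, 1], gamma=0.99)
  # The SARSA target for that macro-action is then R + gamma_k * Q(s', a').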

19
Keep Away Soccer
  • Keepers
  • A keeper not holding the ball:
  •   Receive
  • The keeper holding the ball chooses between:
  •   HoldBall
  •   PasskThenReceive

20
Keep Away Soccer
  • Keeper benchmark policies:
  • Random
  • Hand-coded:
  •   Hold the ball while no taker is within a threshold distance.
  •   If a teammate is in a better position, pass the ball to it.
  •   Otherwise, hold the ball.
  • The judging criterion for the better position was the
    CMUnited strategy (a rough sketch follows).
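
A rough sketch of the hand-coded keeper rule above. The hold threshold and the
pass-quality scores are placeholders, since the original "better position" test
was the CMUnited strategy, which is not available to us.

  def handcoded_keeper(dist_to_nearest_taker, teammate_scores, hold_threshold=10.0):
      """Benchmark policy for the keeper holding the ball (sketch).

      teammate_scores maps teammate id -> estimated pass quality (higher is better);
      the real criterion was the CMUnited strategy, stubbed out here.
      """
      # Hold the ball while no taker is close enough to threaten it.
      if dist_to_nearest_taker > hold_threshold:
          return ("HoldBall",)
      # Under pressure: pass to a teammate judged to be in a better position, if any.
      if teammate_scores:
          best = max(teammate_scores, key=teammate_scores.get)
          if teammate_scores[best] > 0:
              return ("PassBall", best)
      # No good pass available: keep holding the ball.
      return ("HoldBall",)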

21
Keep Away Soccer
  • Taker benchmark policies:
  • Random
  • All to Ball
  • Hand-coded:
  •   If this taker is the fastest and closest to the ball: GoToBall.
  •   Otherwise, let k be the keeper with the largest angle with
      vertex at the ball, and Block(k) (a rough sketch follows).
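
A corresponding sketch of the hand-coded taker rule. The geometry helper and the
reading of "largest angle with vertex at the ball" as the most open passing lane
are our own assumptions.

  import math

  def angle_at_ball(ball, keeper_pos, taker_pos):
      """Angle in degrees at the ball between the directions to a keeper and a taker."""
      a = math.atan2(keeper_pos[1] - ball[1], keeper_pos[0] - ball[0])
      b = math.atan2(taker_pos[1] - ball[1], taker_pos[0] - ball[0])
      d = abs(a - b)
      return math.degrees(min(d, 2 * math.pi - d))

  def handcoded_taker(fastest_and_closest, ball, keepers, takers):
      """Benchmark taker policy (sketch of the rule on this slide).

      keepers: {keeper id -> (x, y)} for the keepers without the ball;
      takers: list of (x, y) positions of all takers.
      """
      if fastest_and_closest:
          return ("GoToBall",)
      # Otherwise block the keeper whose passing lane is most open, i.e. the keeper
      # with the largest (over keepers) smallest (over takers) angle at the ball.
      def openness(k):
          return min(angle_at_ball(ball, keepers[k], t) for t in takers)
      k = max(keepers, key=openness)
      return ("BlockPass", k)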

22
State space: 3 vs 2
  • 13 state variables for the Keepers (a sketch
    computing them follows the list):
  • dist(K1,C), dist(K2,C), dist(K3,C)
  • dist(T1,C), dist(T2,C)
  • dist(K1,K2), dist(K1,K3)
  • dist(K1,T1), dist(K1,T2)
  • min(dist(K2,T1), dist(K2,T2))
  • min(dist(K3,T1), dist(K3,T2))
  • min(ang(K2,K1,T1), ang(K2,K1,T2))
  • min(ang(K3,K1,T1), ang(K3,K1,T2))
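
A sketch of how these 13 variables could be computed from player coordinates,
assuming K1 is the keeper with the ball and C is the centre of the playing
region; the helper functions are our own.

  import math

  def dist(a, b):
      return math.hypot(a[0] - b[0], a[1] - b[1])

  def ang(p, vertex, q):
      """Angle at `vertex` between the directions to p and to q, in degrees."""
      a = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
      b = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
      d = abs(a - b)
      return math.degrees(min(d, 2 * math.pi - d))

  def keeper_state(K1, K2, K3, T1, T2, C):
      """The 13 state variables for 3 vs 2 keepaway, in the order listed above."""
      return [
          dist(K1, C), dist(K2, C), dist(K3, C),
          dist(T1, C), dist(T2, C),
          dist(K1, K2), dist(K1, K3),
          dist(K1, T1), dist(K1, T2),
          min(dist(K2, T1), dist(K2, T2)),
          min(dist(K3, T1), dist(K3, T2)),
          min(ang(K2, K1, T1), ang(K2, K1, T2)),
          min(ang(K3, K1, T1), ang(K3, K1, T2)),
      ]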

23
State space: 3 vs 3
  • 18 state variables for the Takers:
  • dist(K1,C), dist(K2,C), dist(K3,C)
  • dist(T1,C), dist(T2,C), dist(T3,C)
  • dist(K1,K2), dist(K1,K3)
  • dist(K1,T1), dist(K1,T2), dist(K1,T3)
  • dist(T1,K2mid), dist(T1,K3mid)
  • min(dist(K2mid,T2), dist(K2mid,T3))
  • min(dist(K3mid,T2), dist(K3mid,T3))
  • min(ang(K2,K1,T2), ang(K2,K1,T3))
  • min(ang(K3,K1,T2), ang(K3,K1,T3))
  • Number of takers closer to the ball than T1

24
Reinforcement Learning Algorithm
  • The SMDP version of Sarsa(λ) was used.
  • Tile coding was used (described on the next slide).

25
Tile Coding
  • Tile coding is a form of coarse coding that is
    particularly well suited for use on sequential
    digital computers and for efficient online
    learning.
  • In tile coding, the receptive fields of the
    features are grouped into exhaustive partitions
    of the input space.
  • Each such partition is called a tiling, and each
    element of the partition is called a tile.
  • Each tile is the receptive field for one binary
    feature (a toy example follows).
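
A toy one-dimensional tile coder, just to make the idea concrete. The last slide
links Sutton's tile-coding software, which a real implementation would use; that
software also handles multiple dimensions and hashing, which this toy version
ignores.

  def active_tiles(x, num_tilings=8, tiles_per_tiling=10, lo=0.0, hi=1.0):
      """Return the index of the one active (binary) tile in each offset tiling.

      Each tiling is an exhaustive partition of [lo, hi); each tile is the
      receptive field of one binary feature, as described on this slide.
      """
      tile_width = (hi - lo) / tiles_per_tiling
      indices = []
      for t in range(num_tilings):
          offset = t * tile_width / num_tilings           # shift each tiling slightly
          i = int((x - lo + offset) / tile_width)
          i = min(i, tiles_per_tiling)                    # clamp the extra, shifted tile
          indices.append(t * (tiles_per_tiling + 1) + i)  # unique index per tiling
      return indices

  # The binary feature vector has exactly num_tilings ones, one per tiling.
  print(active_tiles(0.37))  # e.g. [3, 14, 25, ...]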

26
Sarsa algorithm
  • The SARSA algorithm [2] is a temporal-difference
    (TD) method that learns action-value functions by
    a bootstrapping mechanism, that is, by making
    estimates based on previous estimates. The SARSA
    algorithm in procedural form (a compact code
    sketch follows the steps):
  • 1. Initialize the Q(s,a) value functions
    arbitrarily.
  • 2. Initialize the environment and set a start state s.
  • 3. Select an action a following a certain policy.
  • 4. Take action a, observe the reward r, find the
    next state s', and select the next action a'.
  • 5. Update the estimate Q(s,a) as follows:
    Q(s,a) ← Q(s,a) + α · TDerr, where
    TDerr = r + γ · Q(s',a') - Q(s,a) is the temporal-
    difference error, α is the step size, and γ is
    the discount factor.
  • 6. Let s ← s' and a ← a'.
  • 7. Go to step 4 until the state s is a terminal
    state.
  • 8. Repeat steps 2 to 7 for a certain number of
    episodes.
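
A compact sketch of these steps as tabular SARSA with an epsilon-greedy policy.
The environment interface is hypothetical, and the project's actual learner is
the SMDP Sarsa(λ) variant with tile coding rather than a lookup table.

  import random
  from collections import defaultdict

  def sarsa(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
      """Tabular SARSA following steps 1-8 above.

      `env` is a hypothetical environment with reset() -> s and
      step(a) -> (r, s', done), standing in for the keepaway task.
      """
      Q = defaultdict(float)                     # step 1: Q(s, a) starts at 0

      def policy(s):                             # epsilon-greedy action selection
          if random.random() < epsilon:
              return random.choice(actions)
          return max(actions, key=lambda a: Q[(s, a)])

      for _ in range(episodes):                  # step 8: loop over episodes
          s = env.reset()                        # step 2: start state
          a = policy(s)                          # step 3: first action
          done = False
          while not done:                        # step 7: until terminal
              r, s2, done = env.step(a)          # step 4: act, observe r and s'
              a2 = policy(s2)                    #         ... and choose a'
              td_err = r + gamma * Q[(s2, a2)] - Q[(s, a)]  # temporal-difference error
              Q[(s, a)] += alpha * td_err        # step 5: update the estimate
              s, a = s2, a2                      # step 6
      return Q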

27
Sarsa algorithm
  • At every time step, SARSA updates the estimates
    of the action-value function Q(s,a) using the
    quintuple (s,a,r,s',a'), which gives rise to the
    name of the algorithm. SARSA is an on-policy
    version of the well-known Q-learning algorithm
    [3], in which the learned action-value function Q
    directly approximates the optimal action-value
    function, denoted by Q*(s,a). The two update
    targets are contrasted below.
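
The contrast between the two update targets, in the same notation (sketch):

  def sarsa_target(Q, r, s_next, a_next, gamma):
      """On-policy SARSA: bootstrap on the action a' the policy actually chose."""
      return r + gamma * Q[(s_next, a_next)]

  def q_learning_target(Q, r, s_next, actions, gamma):
      """Off-policy Q-learning: bootstrap on the greedy action, whatever was chosen."""
      return r + gamma * max(Q[(s_next, b)] for b in actions)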

28
What we are doing
  • We are implementing the keep away soccer model
    with three keepers and one taker.
  • We are implementing it with a single taker because
    Peter Stone used a specific strategy implemented
    by the CMUnited team, which is not available to us.
  • We are going to make the Keepers learn how to
    keep, while the Taker is hardcoded to GoToBall; if
    it gets the ball or the ball goes outside the
    playing area, the episode ends.

29
Thanks
  • http://www-anw.cs.umass.edu/rich/tiles.html
  • http://lslwww.epfl.ch/aperez/RL/RL.html
  • "CMUnited: A Team of Robotic Soccer Agents
    Collaborating in an Adversarial Environment",
    Manuela Veloso, Peter Stone, Kwun Han and Sorin
    Achim, Crossroads: The ACM Student Magazine, issue
    (4.3), February 1998.