Title: Implementing Reinforcement Learning in RoboCup Soccer
1. Implementing Reinforcement Learning in RoboCup Soccer
- By
- Satyam Prasad Shaw
- Kaushik Kumar Mondal
2. Motivation
- RoboCup soccer is one of the most prestigious events in the AI world.
- At present, no specific strategy based on reinforcement learning is implemented in the institute's RoboCup soccer team.
- Our work will be of direct use to the institute team.
3. RoboCup Soccer
- Described as AI's "life after Deep Blue".
- Dream: by the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team.
- Focus: developing cooperation among autonomous agents in a dynamic multi-agent environment.
- The RoboCup Soccer simulation league is used to model teamwork and agent behavior for team play.
4. RoboCup Soccer
5. RoboCup Soccer
- The soccer server enables various teams to compete in a game of soccer.
- A match is carried out in client-server style.
- Each client (representing a player) is a separate process and connects to the server through a specified port.
- A team can have up to 12 clients: 11 players and a coach.
- Players send requests to the server regarding the actions they want to perform.
6. RoboCup Soccer
- The soccer monitor provides a visual interface.
- A RoboCup agent has three different sensors:
- 1. Aural sensor: detects messages sent by the coach and other players.
- 2. Visual sensor: provides visual information about the field.
- 3. Body sensor: detects the current physical status, such as stamina and speed.
7. RoboCup Soccer
- Has motivated many research works.
- As many as three doctoral theses.
- Many Master's theses, including one by Sreangshu Acharya on "Real-Time Learning of Soccer Strategies Based on Radial Basis Function Networks".
- A lot of B.Tech projects, including one by Rajat Raina.
8. Reinforcement Learning
- The computational approach to learning by interaction is known as reinforcement learning.
- Reinforcement learning is a natural choice for RoboCup soccer.
- Many of the skills we acquire (e.g. learning to walk, learning to ride a bicycle) are learned through interacting with the environment.
9. Reinforcement Learning in RoboCup
- RoboCup simulated soccer presents many challenges to reinforcement learning:
- A large state space
- Hidden and uncertain states
- Multiple agents
- Long and variable delays in the effects of actions
10. Reinforcement Learning in RoboCup
- The sensory inputs provided by the server to the client might be noisy.
- The communication is unreliable, which complicates the matter further.
11. Past Work
- RoboCup has been an area of wide research.
- Peter Stone did his PhD thesis on implementing reinforcement learning in RoboCup soccer.
- In our institute also, a substantial amount of work related to reinforcement learning has already been done.
12. Past Work
- Akhil Gupta and Rajat Raina implemented dribbling using reinforcement learning in RoboCup soccer, using the shooting and ball-interception modules developed by Sreangshu Acharya.
- They used radial basis function networks as the function approximators.
13. Past Work
- Peter Stone's contribution to RoboCup is huge.
- As mentioned earlier, he did his PhD thesis on implementing reinforcement learning in RoboCup. His research has been applied to the CMUnited simulator team.
- CMUnited won the RoboCup in 1998 using Peter Stone's work.
14. Past Work
- Our work is inspired by one of Peter Stone's papers, in which he implemented reinforcement learning in keepaway soccer.
- Keepaway soccer is a subtask of real soccer. It consists of the keepers and the takers.
15. Keepaway Soccer
- The keepers try to keep possession of the ball for as long as possible.
- The takers try to minimize this time.
- Whenever the keepers lose possession of the ball, or the ball goes outside the playing region, the episode ends and both the keepers and the takers are reset for another episode.
16. Keepaway Soccer
- Parameters of the task include the size of the region, the number of keepers, and the number of takers.
- An omniscient coach agent manages the play, ending episodes when a taker gains possession of the ball or the ball goes outside the playing region.
- Each player learns independently and perceives the world differently.
- For each player, the episode ends when the keepers lose possession of the ball.
17. Keepaway Soccer
- Actions:
- HoldBall()
- PassBall(k)
- GetOpen()
- GoToBall()
- BlockPass(k)
18. Keepaway Soccer
- Of the actions mentioned, PassBall(k) influences actions for several time steps.
- Moreover, even the simpler actions may last more than one time step, as the simulator occasionally misses commands.
- To handle this, the problem is treated as a semi-Markov decision process (SMDP); see the sketch below.
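As a rough illustration of the SMDP view, the Python sketch below (with an assumed env.step interface; none of these names are from the original system) shows how a multi-step action is folded into a single SMDP transition: rewards are accumulated with discounting while the action runs, and the number of elapsed primitive steps is kept for the Q update.

    def run_macro_action(env, action, gamma=0.9):
        # Execute one possibly multi-step action; return the resulting state,
        # the cumulative discounted reward, and how many primitive steps passed.
        total_reward, discount, steps = 0.0, 1.0, 0
        while True:
            state, reward, terminal, action_done = env.step(action)  # assumed API
            total_reward += discount * reward
            discount *= gamma
            steps += 1
            if terminal or action_done:
                return state, total_reward, steps, terminal

    # SMDP Sarsa then discounts the successor value by gamma**steps:
    # Q(s,a) <- Q(s,a) + alpha * (total_reward + gamma**steps * Q(s',a') - Q(s,a))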
19. Keepaway Soccer
- Keepers:
- The one not holding the ball:
- Receive
- The one holding the ball:
- HoldBall
- PassKThenReceive
20. Keepaway Soccer
- Keeper policies:
- Random
- Hand-coded:
- Hold the ball, based on the distance from the taker.
- If a teammate is in a better position, then PassBall; otherwise HoldBall.
- The judging criterion for a better position was the CMUnited strategy (a sketch of this policy follows the list).
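A minimal, hypothetical sketch of such a hand-coded keeper policy. All names and the threshold are illustrative assumptions; the real criterion was the CMUnited strategy, which is not reproduced here.

    import math

    TAKER_RADIUS = 5.0  # assumed threshold distance

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def better_position(k, me, takers):
        # Stand-in for the CMUnited criterion: a teammate counts as better
        # placed if its nearest taker is farther away than ours.
        return min(dist(k, t) for t in takers) > min(dist(me, t) for t in takers)

    def keeper_policy(me, teammates, takers):
        # Hold if no taker is close enough to threaten possession.
        if all(dist(me, t) > TAKER_RADIUS for t in takers):
            return ("HoldBall", None)
        # Otherwise pass to a teammate judged to be in a better position.
        for k in teammates:
            if better_position(k, me, takers):
                return ("PassBall", k)
        return ("HoldBall", None)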
21. Keepaway Soccer
- Taker policies:
- Random
- All-to-ball
- Hand-coded:
- If the fastest and closest taker to the ball, GoToBall.
- Otherwise, let k be the keeper with the largest angle with vertex at the ball, and Block(k).
- (A sketch of the hand-coded taker follows.)
22. State Space: 3 vs 2
- 13 state variables for the keepers (a sketch for computing them follows the list):
- dist(K1,C), dist(K2,C), dist(K3,C)
- dist(T1,C), dist(T2,C)
- dist(K1,K2), dist(K1,K3)
- dist(K1,T1), dist(K1,T2)
- min(dist(K2,T1), dist(K2,T2))
- min(dist(K3,T1), dist(K3,T2))
- min(ang(K2,K1,T1), ang(K2,K1,T2))
- min(ang(K3,K1,T1), ang(K3,K1,T2))
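A minimal sketch (hypothetical helper names) of assembling this 13-variable state vector, assuming K1 is the keeper with the ball, C is the center of the playing region, and ang(P,V,Q) is the angle at vertex V:

    import math

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def ang(p, vertex, q):
        a = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
        b = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
        d = abs(a - b)
        return min(d, 2 * math.pi - d)

    def state_3v2(K1, K2, K3, T1, T2, C):
        # Order matches the list on the slide.
        return [
            dist(K1, C), dist(K2, C), dist(K3, C),
            dist(T1, C), dist(T2, C),
            dist(K1, K2), dist(K1, K3),
            dist(K1, T1), dist(K1, T2),
            min(dist(K2, T1), dist(K2, T2)),
            min(dist(K3, T1), dist(K3, T2)),
            min(ang(K2, K1, T1), ang(K2, K1, T2)),
            min(ang(K3, K1, T1), ang(K3, K1, T2)),
        ]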
23. State Space: 3 vs 3
- 18 state variables for the takers:
- dist(K1,C), dist(K2,C), dist(K3,C)
- dist(T1,C), dist(T2,C), dist(T3,C)
- dist(K1,K2), dist(K1,K3)
- dist(K1,T1), dist(K1,T2), dist(K1,T3)
- dist(T1,K2mid), dist(T1,K3mid)
- min(dist(K2mid,T2), dist(K2mid,T3))
- min(dist(K3mid,T2), dist(K3mid,T3))
- min(ang(K2,K1,T2), ang(K2,K1,T3))
- min(ang(K3,K1,T2), ang(K3,K1,T3))
- the number of takers closer to the ball than T1
24. Reinforcement Learning Algorithm
- The SMDP version of Sarsa(λ) was used.
- Tile coding was used as the function approximator.
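As a rough illustration of what the λ adds (Sarsa itself is summarized on a later slide), the sketch below shows a single tabular Sarsa(λ) step: an eligibility trace e(s,a) spreads each TD error backward over recently visited state-action pairs. All parameter values are illustrative assumptions.

    def sarsa_lambda_step(Q, e, s, a, r, s2, a2, done,
                          alpha=0.1, gamma=0.9, lam=0.9):
        # Q and e are dicts keyed by (state, action), defaulting to 0.0.
        td_err = r + gamma * Q.get((s2, a2), 0.0) * (not done) - Q.get((s, a), 0.0)
        e[(s, a)] = e.get((s, a), 0.0) + 1.0      # accumulating trace
        for key in list(e):
            Q[key] = Q.get(key, 0.0) + alpha * td_err * e[key]
            e[key] *= gamma * lam                 # decay every trace
        return Q, e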
25. Tile Coding
- Tile coding is a form of coarse coding that is particularly well suited for use on sequential digital computers and for efficient online learning.
- In tile coding, the receptive fields of the features are grouped into exhaustive partitions of the input space.
- Each such partition is called a tiling, and each element of the partition is called a tile.
- Each tile is the receptive field for one binary feature.
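A minimal one-dimensional sketch (all parameter values are illustrative assumptions): several tilings partition the input range, each shifted by a small offset, and the input activates exactly one tile per tiling.

    def active_tiles(x, n_tilings=8, tiles_per_tiling=10, lo=0.0, hi=1.0):
        # Return one active-tile index per tiling for a scalar x in [lo, hi].
        width = (hi - lo) / tiles_per_tiling
        features = []
        for t in range(n_tilings):
            offset = t * width / n_tilings            # each tiling is shifted
            idx = int((x - lo + offset) / width)       # which tile x falls in
            features.append(t * (tiles_per_tiling + 1) + idx)
        return features

    # Example: x = 0.37 lights up one binary feature in each of the 8 tilings.
    print(active_tiles(0.37))

The value of a state is then approximated as the sum of the weights of its active tiles, one per tiling.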
26. Sarsa Algorithm
- The Sarsa algorithm [2] is a temporal-difference (TD) method that learns action-value functions by a bootstrapping mechanism, that is, by making estimates based on previous estimates. The Sarsa algorithm in procedural form:
- 1. Initialize the Q(s,a) value functions arbitrarily.
- 2. Initialize the environment and set a start state s.
- 3. Select an action a following a certain policy.
- 4. Take action a; observe the reward r and the next state s', and select the next action a'.
- 5. Update the estimate Q(s,a) as follows: Q(s,a) ← Q(s,a) + α·TDerr, where TDerr = r + γ·Q(s',a') − Q(s,a) is the temporal-difference error, α is the step size, and γ is the discount factor.
- 6. Let s ← s' and a ← a'.
- 7. Go to step 4 until the state s is a terminal state.
- 8. Repeat steps 2 to 7 for a certain number of episodes.
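A minimal tabular sketch of these steps in Python (the environment's reset/step interface and the ε-greedy policy are illustrative assumptions):

    import random
    from collections import defaultdict

    def sarsa(env, actions, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                    # step 1: Q(s,a) starts at 0

        def policy(s):                            # epsilon-greedy selection
            if random.random() < epsilon:
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(s, a)])

        for _ in range(episodes):                 # step 8
            s = env.reset()                       # step 2
            a = policy(s)                         # step 3
            done = False
            while not done:                       # step 7
                s2, r, done = env.step(a)         # step 4: observe r, s'
                a2 = policy(s2)                   #         select a'
                td_err = r + gamma * Q[(s2, a2)] * (not done) - Q[(s, a)]
                Q[(s, a)] += alpha * td_err       # step 5
                s, a = s2, a2                     # step 6
        return Q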
27. Sarsa Algorithm
- At every time step, Sarsa updates the estimates of the action-value functions Q(s,a) using the quintuple (s,a,r,s',a'), which gives rise to the name of the algorithm. Sarsa is an on-policy version of the well-known Q-learning algorithm [3], in which the learned action-value function Q directly approximates the optimal action-value function, denoted Q*(s,a).
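To make the on-policy/off-policy contrast concrete, the two update rules differ only in how the successor value is chosen:

    Sarsa (on-policy):       Q(s,a) ← Q(s,a) + α [ r + γ Q(s',a') − Q(s,a) ]
    Q-learning (off-policy): Q(s,a) ← Q(s,a) + α [ r + γ max_b Q(s',b) − Q(s,a) ]

Sarsa evaluates the action a' it will actually take next, while Q-learning evaluates the greedy action regardless of what the behavior policy does.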
28. What We Are Doing
- We are implementing the keepaway soccer model with three keepers and one taker.
- We are using a single taker because Peter Stone used a specific strategy implemented by the CMUnited team, which is not available to us.
- We are going to make the keepers learn how to keep, but the taker is going to be hard-coded to GoToBall; if it gets the ball, or the ball goes outside the playing area, the episode ends.
29. Thanks
- http://www-anw.cs.umass.edu/~rich/tiles.html
- http://lslwww.epfl.ch/~aperez/RL/RL.html
- "CMUnited: A Team of Robotic Soccer Agents Collaborating in an Adversarial Environment", Manuela Veloso, Peter Stone, Kwun Han and Sorin Achim, Crossroads: The ACM Student Magazine, issue 4.3, February 1998.