Implementing Reinforcement Learning in RoboCup Soccer

1
Implementing Reinforcement Learning in RoboCup Soccer
  • By
  • Satyam Prasad Shaw
  • Kaushik Kumar Mondal

2
Motivation
  • Robosoccer is one of the most prestigious events
    in the AI world.
  • At present, no specific strategy based on
    reinforcement learning is implemented in the
    institute's robosoccer team.
  • Our work will be of direct use to the institute
    team.

3
RoboCup Soccer
  • Described as the next grand challenge for AI after Deep Blue.
  • Dream: by the year 2050, develop a team of fully
    autonomous humanoid robots that can win against
    the human world soccer champion team.
  • Focus: developing cooperation among autonomous
    agents in a dynamic multi-agent environment.
  • The RoboCup Soccer simulation league is used to model
    teamwork and agent behavior for team play.

4
RoboCup Soccer
5
RoboCup Soccer
  • The Soccer Server enables various teams to compete in
    a game of soccer.
  • The match is carried out in a client-server style.
  • Each client (representing a player) is a separate
    process and connects to the server through a
    specified port.
  • A team can have up to 12 clients: 11 players and a
    coach.
  • Players send requests to the server regarding the
    actions they want to perform (see the sketch below).
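
For concreteness, a minimal sketch (not the team's actual code) of how one client
process could talk to the soccer server, assuming the standard rcssserver setup of
UDP on port 6000 and its text-based command format:

  import socket

  # Minimal sketch of one client (player) process talking to the soccer server.
  # Assumes rcssserver is listening on UDP port 6000 on the local machine.
  SERVER = ("127.0.0.1", 6000)

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  sock.sendto(b"(init MyTeam (version 15))", SERVER)  # register this client as a player

  reply, server_addr = sock.recvfrom(8192)  # the server answers from a player-specific port
  print(reply.decode())                     # e.g. "(init l 1 before_kick_off)"

  # From then on, send one action request per cycle and read sensor messages back.
  sock.sendto(b"(dash 100)", server_addr)   # request: accelerate forward with power 100
  msg, _ = sock.recvfrom(8192)              # next sensor message, e.g. "(see ...)"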

6
RoboCup Soccer
  • The Soccer Monitor provides a visual interface.
  • A RoboCup agent has three different sensors
    (sample messages below):
  • 1. Aural sensor: detects messages sent by the coach
    and other players.
  • 2. Visual sensor: provides visual information about
    the field.
  • 3. Body sensor: detects the agent's current physical
    status, such as stamina and speed.
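
Illustrative only (the exact fields depend on the server version): the three sensor
streams arrive as text messages, which a client can tell apart by their leading
keyword.

  # Indicative shapes of the three kinds of sensor messages (examples, not exhaustive).
  samples = [
      "(hear 120 referee kick_off_l)",                     # aural sensor
      "(see 121 ((b) 17.2 3) ((p MyTeam 5) 9.1 -20))",     # visual sensor
      "(sense_body 121 (stamina 3820 1) (speed 0.31 0))",  # body sensor
  ]

  def sensor_type(msg):
      """Classify a raw server message by its leading keyword."""
      return msg[1:].split(None, 1)[0]  # 'hear', 'see', or 'sense_body'

  for m in samples:
      print(sensor_type(m), "->", m)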

7
RoboCup Soccer
  • Has motivated many research works.
  • As many as three doctoral theses.
  • Many Masters theses, including one by Sreangshu
    Acharya on "Real-Time Learning of Soccer
    Strategies Based on Radial Basis Function
    Networks".
  • A lot of B.Tech projects, including one by Rajat
    Raina.

8
Reinforcement Learning
  • The computational approach to learning by
    interaction is known as reinforcement learning.
  • Reinforcement learning is a natural choice for
    RoboCup soccer.
  • Many of the skills that we acquire (e.g. learning
    to walk, learning to ride a bicycle) are learned
    through interacting with the environment.

9
Reinforcement Learning in RoboCup
  • RoboCup simulated soccer presents many challenges
    to Reinforcement Learning.
  • A Large State Space
  • Hidden and Uncertain States
  • Multiple Agents
  • Long and Variable delays in effects of actions

10
Reinforcement Learning in RoboCup
  • The sensory inputs provided by the server to the
    client might be noisy.
  • The communication is unreliable, which complicates
    the matter further.

11
Past Work
  • RoboCup has been an area of wide research.
  • Peter Stone did his PhD thesis on implementing
    reinforcement learning in robosoccer.
  • In our institute, a substantial amount of work
    related to reinforcement learning has also been
    done.

12
Past Work
  • Akhil Gupta and Rajat Raina implemented dribbling
    using reinforcement learning in robosoccer, using
    the shooting and ball-interception modules
    developed by Sreangshu Acharya.
  • They used Radial Basis Function Networks as the
    function approximators.

13
Past Work
  • Peter Stone's contribution to RoboCup is huge.
  • As mentioned earlier, he did his PhD thesis on
    implementing reinforcement learning in RoboCup.
    His research has been applied to the CMUnited
    simulator team.
  • CMUnited won the RoboCup in 1998 using Peter
    Stone's work.

14
Past Work
  • Our work is inspired by one of Peter Stone's
    papers, where he implemented reinforcement
    learning in Keep Away Soccer.
  • Keep Away Soccer is a subtask of real soccer. It
    consists of the Keepers and the Takers.

15
Keep Away Soccer
  • The Keepers try to keep possession of the ball for
    as long as possible.
  • The Takers try to minimize this time.
  • Whenever the Keepers lose possession of the ball
    or the ball goes outside the playing region, the
    episode ends and the Keepers and Takers are both
    reset for another episode.

16
Keep Away Soccer
  • Parameters of the task include the size of the
    region, the number of keepers, and the number of
    takers.
  • An omniscient coach agent manages the play, ending
    episodes when a taker gains possession of the ball
    or the ball goes outside the playing region.
  • Each player learns independently and perceives the
    world differently.
  • For each player, the episode ends when the keepers
    lose possession of the ball.

17
Keep Away Soccer
  • Actions
  • HoldBall()
  • PassBall(k)
  • GetOpen()
  • GoToBall()
  • BlockPass(k)

18
Keep Away Soccer
  • Of the actions mentioned, PassBall(k) influences
    the world for several time steps.
  • Moreover, even the simpler actions may last more
    than one time step, as the simulator occasionally
    misses a command.
  • To handle this, the problem is treated as a
    semi-Markov decision process (SMDP); the
    bookkeeping is sketched below.
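
A small sketch (our own bookkeeping, not taken from the paper) of the SMDP
treatment: the reward credited to a macro-action is the discounted sum of the
per-step rewards collected while it ran, and the bootstrap term Q(s',a') is
discounted by gamma raised to the number of steps the action took.

  def smdp_return(step_rewards, gamma):
      """Accumulate the discounted reward collected while one macro-action runs.

      step_rewards: per-simulator-step rewards observed during the action.
      Returns (R, gamma_k): the SMDP reward and the discount for Q(s', a').
      """
      R, g = 0.0, 1.0
      for r in step_rewards:
          R += g * r
          g *= gamma
      return R, g

  # Example: a pass that took 4 simulator steps, reward +1 per step of possession.
  R, gamma_k = smdp_return([1, 1, 1, 1], gamma=0.99)
  # The SARSA target for that macro-action is then R + gamma_k * Q(s', a').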

19
Keep Away Soccer
  • Keepers
  • A keeper not holding the ball:
  •   Receive
  • The keeper holding the ball chooses between:
  •   HoldBall
  •   PasskThenReceive

20
Keep Away Soccer
  • Keeper benchmark policies:
  • Random
  • Hand-coded:
  •   Hold the ball while no taker is within a threshold distance.
  •   If a teammate is in a better position, pass the ball to it.
  •   Otherwise, hold the ball.
  • The judging criterion for the better position was the
    CMUnited strategy (a rough sketch follows).
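
A rough sketch of the hand-coded keeper rule above. The hold threshold and the
pass-quality scores are placeholders, since the original "better position" test
was the CMUnited strategy, which is not available to us.

  def handcoded_keeper(dist_to_nearest_taker, teammate_scores, hold_threshold=10.0):
      """Benchmark policy for the keeper holding the ball (sketch).

      teammate_scores maps teammate id -> estimated pass quality (higher is better);
      the real criterion was the CMUnited strategy, stubbed out here.
      """
      # Hold the ball while no taker is close enough to threaten it.
      if dist_to_nearest_taker > hold_threshold:
          return ("HoldBall",)
      # Under pressure: pass to a teammate judged to be in a better position, if any.
      if teammate_scores:
          best = max(teammate_scores, key=teammate_scores.get)
          if teammate_scores[best] > 0:
              return ("PassBall", best)
      # No good pass available: keep holding the ball.
      return ("HoldBall",)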

21
Keep Away Soccer
  • Taker benchmark policies:
  • Random
  • All to Ball
  • Hand-coded:
  •   If this taker is the fastest and closest to the ball: GoToBall.
  •   Otherwise, let k be the keeper with the largest angle with
      vertex at the ball, and Block(k) (a rough sketch follows).
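
A corresponding sketch of the hand-coded taker rule. The geometry helper and the
reading of "largest angle with vertex at the ball" as the most open passing lane
are our own assumptions.

  import math

  def angle_at_ball(ball, keeper_pos, taker_pos):
      """Angle in degrees at the ball between the directions to a keeper and a taker."""
      a = math.atan2(keeper_pos[1] - ball[1], keeper_pos[0] - ball[0])
      b = math.atan2(taker_pos[1] - ball[1], taker_pos[0] - ball[0])
      d = abs(a - b)
      return math.degrees(min(d, 2 * math.pi - d))

  def handcoded_taker(fastest_and_closest, ball, keepers, takers):
      """Benchmark taker policy (sketch of the rule on this slide).

      keepers: {keeper id -> (x, y)} for the keepers without the ball;
      takers: list of (x, y) positions of all takers.
      """
      if fastest_and_closest:
          return ("GoToBall",)
      # Otherwise block the keeper whose passing lane is most open, i.e. the keeper
      # with the largest (over keepers) smallest (over takers) angle at the ball.
      def openness(k):
          return min(angle_at_ball(ball, keepers[k], t) for t in takers)
      k = max(keepers, key=openness)
      return ("BlockPass", k)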

22
State space: 3 vs 2
  • 13 state variables for the Keepers (a sketch
    computing them follows the list):
  • dist(K1,C), dist(K2,C), dist(K3,C)
  • dist(T1,C), dist(T2,C)
  • dist(K1,K2), dist(K1,K3)
  • dist(K1,T1), dist(K1,T2)
  • min(dist(K2,T1), dist(K2,T2))
  • min(dist(K3,T1), dist(K3,T2))
  • min(ang(K2,K1,T1), ang(K2,K1,T2))
  • min(ang(K3,K1,T1), ang(K3,K1,T2))
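
A sketch of how these 13 variables could be computed from player coordinates,
assuming K1 is the keeper with the ball and C is the centre of the playing
region; the helper functions are our own.

  import math

  def dist(a, b):
      return math.hypot(a[0] - b[0], a[1] - b[1])

  def ang(p, vertex, q):
      """Angle at `vertex` between the directions to p and to q, in degrees."""
      a = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
      b = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
      d = abs(a - b)
      return math.degrees(min(d, 2 * math.pi - d))

  def keeper_state(K1, K2, K3, T1, T2, C):
      """The 13 state variables for 3 vs 2 keepaway, in the order listed above."""
      return [
          dist(K1, C), dist(K2, C), dist(K3, C),
          dist(T1, C), dist(T2, C),
          dist(K1, K2), dist(K1, K3),
          dist(K1, T1), dist(K1, T2),
          min(dist(K2, T1), dist(K2, T2)),
          min(dist(K3, T1), dist(K3, T2)),
          min(ang(K2, K1, T1), ang(K2, K1, T2)),
          min(ang(K3, K1, T1), ang(K3, K1, T2)),
      ]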

23
State space: 3 vs 3
  • 18 state variables for the Takers:
  • dist(K1,C), dist(K2,C), dist(K3,C)
  • dist(T1,C), dist(T2,C), dist(T3,C)
  • dist(K1,K2), dist(K1,K3)
  • dist(K1,T1), dist(K1,T2), dist(K1,T3)
  • dist(T1,K2mid), dist(T1,K3mid)
  • min(dist(K2mid,T2), dist(K2mid,T3))
  • min(dist(K3mid,T2), dist(K3mid,T3))
  • min(ang(K2,K1,T2), ang(K2,K1,T3))
  • min(ang(K3,K1,T2), ang(K3,K1,T3))
  • Number of takers closer to the ball than T1

24
Reinforcement Learning Algorithm
  • The SMDP version of Sarsa(λ) was used.
  • Tile coding was used (described on the next slide).

25
Tile Coding
  • Tile coding is a form of coarse coding that is
    particularly well suited for use on sequential
    digital computers and for efficient online
    learning.
  • In tile coding, the receptive fields of the
    features are grouped into exhaustive partitions
    of the input space.
  • Each such partition is called a tiling, and each
    element of the partition is called a tile.
  • Each tile is the receptive field for one binary
    feature (a toy example follows).
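
A toy one-dimensional tile coder, just to make the idea concrete. The last slide
links Sutton's tile-coding software, which a real implementation would use; that
software also handles multiple dimensions and hashing, which this toy version
ignores.

  def active_tiles(x, num_tilings=8, tiles_per_tiling=10, lo=0.0, hi=1.0):
      """Return the index of the one active (binary) tile in each offset tiling.

      Each tiling is an exhaustive partition of [lo, hi); each tile is the
      receptive field of one binary feature, as described on this slide.
      """
      tile_width = (hi - lo) / tiles_per_tiling
      indices = []
      for t in range(num_tilings):
          offset = t * tile_width / num_tilings           # shift each tiling slightly
          i = int((x - lo + offset) / tile_width)
          i = min(i, tiles_per_tiling)                    # clamp the extra, shifted tile
          indices.append(t * (tiles_per_tiling + 1) + i)  # unique index per tiling
      return indices

  # The binary feature vector has exactly num_tilings ones, one per tiling.
  print(active_tiles(0.37))  # e.g. [3, 14, 25, ...]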

26
Sarsa algorithm
  • The SARSA algorithm [2] is a temporal-difference
    (TD) method that learns action-value functions by
    a bootstrapping mechanism, that is, by making
    estimates based on previous estimates. The SARSA
    algorithm in procedural form (a compact code
    sketch follows the steps):
  • 1. Initialize the Q(s,a) value functions
    arbitrarily.
  • 2. Initialize the environment and set a start state s.
  • 3. Select an action a following a certain policy.
  • 4. Take action a, observe the reward r, find the
    next state s', and select the next action a'.
  • 5. Update the estimate Q(s,a) as follows:
    Q(s,a) ← Q(s,a) + α · TDerr, where
    TDerr = r + γ · Q(s',a') - Q(s,a) is the temporal-
    difference error, α is the step size, and γ is
    the discount factor.
  • 6. Let s ← s' and a ← a'.
  • 7. Go to step 4 until the state s is a terminal
    state.
  • 8. Repeat steps 2 to 7 for a certain number of
    episodes.
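
A compact sketch of these steps as tabular SARSA with an epsilon-greedy policy.
The environment interface is hypothetical, and the project's actual learner is
the SMDP Sarsa(λ) variant with tile coding rather than a lookup table.

  import random
  from collections import defaultdict

  def sarsa(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
      """Tabular SARSA following steps 1-8 above.

      `env` is a hypothetical environment with reset() -> s and
      step(a) -> (r, s', done), standing in for the keepaway task.
      """
      Q = defaultdict(float)                     # step 1: Q(s, a) starts at 0

      def policy(s):                             # epsilon-greedy action selection
          if random.random() < epsilon:
              return random.choice(actions)
          return max(actions, key=lambda a: Q[(s, a)])

      for _ in range(episodes):                  # step 8: loop over episodes
          s = env.reset()                        # step 2: start state
          a = policy(s)                          # step 3: first action
          done = False
          while not done:                        # step 7: until terminal
              r, s2, done = env.step(a)          # step 4: act, observe r and s'
              a2 = policy(s2)                    #         ... and choose a'
              td_err = r + gamma * Q[(s2, a2)] - Q[(s, a)]  # temporal-difference error
              Q[(s, a)] += alpha * td_err        # step 5: update the estimate
              s, a = s2, a2                      # step 6
      return Q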

27
Sarsa algorithm
  • At every time step, SARSA updates the estimates
    of the action-value function Q(s,a) using the
    quintuple (s,a,r,s',a'), which gives rise to the
    name of the algorithm. SARSA is an on-policy
    version of the well-known Q-learning algorithm
    [3], in which the learned action-value function Q
    directly approximates the optimal action-value
    function, denoted by Q*(s,a). The two update
    targets are contrasted below.
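
The contrast between the two update targets, in the same notation (sketch):

  def sarsa_target(Q, r, s_next, a_next, gamma):
      """On-policy SARSA: bootstrap on the action a' the policy actually chose."""
      return r + gamma * Q[(s_next, a_next)]

  def q_learning_target(Q, r, s_next, actions, gamma):
      """Off-policy Q-learning: bootstrap on the greedy action, whatever was chosen."""
      return r + gamma * max(Q[(s_next, b)] for b in actions)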

28
What we are doing
  • We are implementing the keep away soccer model
    with three keepers and one taker.
  • We are implementing it with a single taker because
    Peter Stone used a specific strategy implemented
    by the CMUnited team, which is not available to us.
  • We are going to make the Keepers learn how to
    keep, while the Taker is hardcoded to GoToBall; if
    it gets the ball or the ball goes outside the
    playing area, the episode ends.

29
Thanks
  • http://www-anw.cs.umass.edu/rich/tiles.html
  • http://lslwww.epfl.ch/aperez/RL/RL.html
  • "CMUnited: A Team of Robotic Soccer Agents
    Collaborating in an Adversarial Environment",
    Manuela Veloso, Peter Stone, Kwun Han and Sorin
    Achim, Crossroads: The ACM Student Magazine, issue
    (4.3), February 1998.