A Learning Process Architecture for Continuous Strategic Games
1
A Learning Process Architecture for Continuous
Strategic Games
  • By Jonathan Gibbs
  • Mentor: Richard Murray
  • Co-Mentor: Ling Shi

2
Artificial Intelligence in Games
3
The RoboFlag Game
  • Up to 6-on-6 capture-the-flag game
  • Limited sensing and communication capability
  • Simulator and Hardware testbed
  • Each robot operates as a separate entity

Courtesy Richard Murray
4
Objectives
  • Create a learning process architecture that does
    not rely on predefined strategies
  • Implement the architecture so that a simple
    strategy can be defeated in a small number of
    tries
  • Make the process cooperative

5
Personal Computer Architecture
6
Typical Learning Processes
  • State Definition
  • Reward Scheme
  • Mathematical Model
  • Strategy Database
  • Probabilistic decision maker
  • Solve the game as a math problem
  • Solve a probabilistic graph

(Diagrams: Current State → Game → Database → Next Action, and Current State → Game → Model → Next Action)
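The "probabilistic decision maker" listed above can be sketched as a simple sampler over the action probabilities stored for the current state. Everything here (names, the action count, the sampling scheme) is an illustrative assumption, not code from the actual RoboFlag system:

```c
/* Sketch of the "probabilistic decision maker" stage: given the
 * probability row stored for the current state, sample the next
 * action. Names, the action count, and the sampling scheme are
 * illustrative assumptions, not from the RoboFlag code. */
#include <stdlib.h>

#define NUM_ACTIONS 8  /* matches the eight prob fields in the state */

/* Sample an action index in [0, NUM_ACTIONS) according to prob[],
 * which is assumed to sum to 1.0. */
int sample_action(const float prob[NUM_ACTIONS])
{
    float r = (float)rand() / ((float)RAND_MAX + 1.0f);  /* r in [0, 1) */
    float cum = 0.0f;
    for (int i = 0; i < NUM_ACTIONS; i++) {
        cum += prob[i];
        if (r < cum)
            return i;
    }
    return NUM_ACTIONS - 1;  /* guard against floating-point rounding */
}
```

With a degenerate distribution (all mass on one action) the sampler always returns that action; otherwise it returns each action with its stored probability.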
7
Challenges with RoboFlag
  • RoboFlag is a dynamic game, NOT a board game
  • Limited model detail
  • Limited database size
  • Limited computation time
  • Small amount of useful information available
  • Limited state definition must be efficient and
    effective
  • Limited sharing capability
  • Reward system must be aggressive

(Diagrams: Current State → Game → Next Action, without the database and without the model)
8
State Definition
struct JRobotStatus {
    float radius;         // radius from flag
    float theta;          // theta from flag
    BOOL  myside;         // which side of the field
    BOOL  enemy_present;  // Is there an enemy in front of us?
    BOOL  gotflag;        // Do we have the flag?
    float prob1;          // Probabilities of assigned actions
    float prob2;
    float prob3;
    float prob4;
    float prob5;
    float prob6;
    float prob7;
    float prob8;
};
  • Contain relevant information
  • Easy to interpret
  • Small
  • Computationally efficient

9
Reward Scheme
  • Aggressive
  • Robust
  • Efficient
  • enum JReward { Tagged = -5, Ambig = 0, MovedCloser = 2, InZone = 10, GotFlag = 10 };
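One way these reward values could feed back into the stored action probabilities is an additive update on the chosen action, clamped so every action stays reachable, followed by renormalization. The slides give only the reward values; the update rule and learning rate below are assumptions for illustration:

```c
/* Sketch of feeding a reward back into the stored action
 * probabilities: an additive update on the chosen action, clamped so
 * every action stays reachable, then renormalized. The learning rate
 * and the update rule itself are assumptions; the slides give only
 * the reward values. */
#define NUM_ACTIONS 8

enum JReward { Tagged = -5, Ambig = 0, MovedCloser = 2, InZone = 10, GotFlag = 10 };

void apply_reward(float prob[NUM_ACTIONS], int action, enum JReward r)
{
    const float rate = 0.05f;          /* assumed learning rate */
    prob[action] += rate * (float)r;   /* reward raises, penalty lowers */
    if (prob[action] < 0.01f)
        prob[action] = 0.01f;          /* keep the action reachable */

    float sum = 0.0f;
    for (int i = 0; i < NUM_ACTIONS; i++)
        sum += prob[i];
    for (int i = 0; i < NUM_ACTIONS; i++)
        prob[i] /= sum;                /* renormalize to sum to 1 */
}
```

The large positive rewards (InZone, GotFlag) make the scheme aggressive, while the clamp keeps a tagged action from vanishing entirely.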

10
Markov Chain Evolution
11
The Architecture (Good)
RoboFlag
12
The Opposition (Evil)
  • Man-to-Man Strategy
    - Feasible for one robot to beat
  • Spiral Approach
    - Change directions

13
Results
  • Very little movement
  • No reaction based on enemy location
  • Many inconclusive events
  • Flag was never captured

14
Changes
  • Changed default probabilities
  • Replaced 2 boolean variables with enemy location
    information
  • Cosmetic changes to the update function
  • Added ability to read an old log file

15
Results
  • More movement towards the flag
  • New probability weights made enemy information
    insignificant
  • Did capture the flag
  • Logger failed

16
The New Architecture
RoboFlag
17
Conclusions
  • Architecture did not achieve original objective
    but showed potential
  • No matter how much learning the computer does,
    the mechanisms by which it learns must be
    continuously tweaked
  • Trial and Error is easy to implement but is
    probably not the best approach
  • A model is needed to reduce the order of the
    system to an acceptable level

18
Future Work
  • Increase state definition size until it is
    computationally too expensive
  • Implement a mechanism for cooperation with other
    robots
  • Perfect the architecture so that it can learn
    defensive and offensive strategy at the same time

19
Acknowledgments
  • Richard Murray
  • Ling Shi
  • Brian Beck and Jing Xiong
  • CDS Staff
  • MURF 2004