Title: A Learning Process Architecture for Continuous Strategic Games
Slide 1: A Learning Process Architecture for Continuous Strategic Games
- By Jonathan Gibbs
- Mentor: Richard Murray
- Co-Mentor: Ling Shi
Slide 2: Artificial Intelligence in Games
Slide 3: The RoboFlag Game
- Up to 6-on-6 capture-the-flag game
- Limited sensing and communication capability
- Simulator and Hardware testbed
- Each robot operates as a separate entity
Courtesy of Richard Murray
Slide 4: Objectives
- Create a learning process architecture that does not rely on predefined strategies
- Implement the architecture so that a simple strategy can be defeated in a small number of tries
- Make the process cooperative
Slide 5: Personal Computer Architecture
Slide 6: Typical Learning Processes
- State Definition
- Reward Scheme
- Mathematical Model
- Strategy Database
- Probabilistic decision maker
- Solve the game as a math problem
- Solve a probabilistic graph
[Diagram: database-driven loop (Current State, Game, Database, Next Action)]
[Diagram: model-driven loop (Current State, Game, Model, Next Action)]
Slide 7: Challenges with RoboFlag
- RoboFlag is a dynamic game, NOT a board game
- Limited model detail
- Limited database size
- Limited computation time
- Small amount of useful information available
- Limited state definition: must be efficient and effective
- Limited sharing capability
- Reward system must be aggressive
[Diagrams: stripped-down loops (Current State, Game, Next Action), with no database or model]
Slide 8: State Definition
  struct JRobotStatus {
      float radius;        // radius from flag
      float theta;         // theta from flag
      BOOL  myside;        // which side of the field
      BOOL  enemy_present; // is there an enemy in front of us
      BOOL  gotflag;       // do we have the flag
      float prob1;         // probabilities of assigned actions
      float prob2;
      float prob3;
      float prob4;
      float prob5;
      float prob6;
      float prob7;
      float prob8;
  };
- Contain relevant information
- Easy to interpret
- Small
- Computationally efficient
Slide 9: Reward Scheme
- Aggressive
- Robust
- Efficient
  enum JReward { Tagged = -5, Ambig = 0, MovedCloser = 2, InZone = 10, GotFlag = 10 };
Slide 10: Markov Chain Evolution
Slide 11: The Architecture (Good)
[Diagram: learning architecture wrapped around the RoboFlag game]
Slide 12: The Opposition (Evil)
- Man-to-man strategy
- Feasible for one robot to beat
- Spiral Approach
- Change directions
Slide 13: Results
- Very little movement
- No reaction based on enemy location
- Many inconclusive events
- Flag was never captured
Slide 14: Changes
- Changed default probabilities
- Replaced 2 boolean variables with enemy location information
- Cosmetic changes to the update function
- Added ability to read an old log file
Slide 15: Results
- More movement towards the flag
- New probability weights made enemy information insignificant
- Did capture the flag
- Logger failed
Slide 16: The New Architecture
[Diagram: revised learning architecture wrapped around the RoboFlag game]
Slide 17: Conclusions
- Architecture did not achieve the original objective but showed potential
- No matter how much learning the computer does, the mechanisms by which it learns must be continuously tweaked
- Trial and error is easy to implement but is probably not the best approach
- A model is needed to reduce the order of the system to an acceptable level
Slide 18: Future Work
- Increase state definition size until it is computationally too expensive
- Implement a mechanism for cooperation with other robots
- Perfect the architecture so that it can learn defensive and offensive strategy at the same time
Slide 19: Acknowledgments
- Richard Murray
- Ling Shi
- Brian Beck and Jing Xiong
- CDS Staff
- MURF 2004