A Learning Process Architecture for Continuous Strategic Games
1
A Learning Process Architecture for Continuous
Strategic Games
  • By Jonathan Gibbs
  • Mentor: Richard Murray
  • Co-Mentor: Ling Shi

2
Artificial Intelligence in Games
3
The RoboFlag Game
  • Up to 6-on-6 capture-the-flag game
  • Limited sensing and communication capability
  • Simulator and Hardware testbed
  • Each robot operates as a separate entity

Courtesy Richard Murray
4
Objectives
  • Create a learning process architecture that does
    not rely on predefined strategies
  • Implement the architecture so that a simple
    strategy can be defeated in a small number of
    tries
  • Make the process cooperative

5
Personal Computer Architecture
6
Typical Learning Processes
  • State Definition
  • Reward Scheme
  • Mathematical Model
  • Strategy Database
  • Probabilistic decision maker
  • Solve the game as a math problem
  • Solve a probabilistic graph

(Diagrams: Current State → Game → Database → Next Action, and Current State → Game → Model → Next Action)
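The "probabilistic decision maker" listed above can be sketched as a simple sampler over the action probabilities stored for the current state. Everything here (names, the action count, the sampling scheme) is an illustrative assumption, not code from the actual RoboFlag system:

```c
/* Sketch of the "probabilistic decision maker" stage: given the
 * probability row stored for the current state, sample the next
 * action. Names, the action count, and the sampling scheme are
 * illustrative assumptions, not from the RoboFlag code. */
#include <stdlib.h>

#define NUM_ACTIONS 8  /* matches the eight prob fields in the state */

/* Sample an action index in [0, NUM_ACTIONS) according to prob[],
 * which is assumed to sum to 1.0. */
int sample_action(const float prob[NUM_ACTIONS])
{
    float r = (float)rand() / ((float)RAND_MAX + 1.0f);  /* r in [0, 1) */
    float cum = 0.0f;
    for (int i = 0; i < NUM_ACTIONS; i++) {
        cum += prob[i];
        if (r < cum)
            return i;
    }
    return NUM_ACTIONS - 1;  /* guard against floating-point rounding */
}
```

With a degenerate distribution (all mass on one action) the sampler always returns that action; otherwise it returns each action with its stored probability.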
7
Challenges with RoboFlag
  • RoboFlag is a dynamic game, NOT a board game
  • Limited model detail
  • Limited database size
  • Limited computation time
  • Small amount of useful information available
  • Limited state definition must be efficient and
    effective
  • Limited sharing capability
  • Reward system must be aggressive

(Diagrams: Current State → Game → Next Action, without the database and without the model)
8
State Definition
struct JRobotStatus {
    float radius;         // radius from flag
    float theta;          // theta from flag
    BOOL  myside;         // which side of the field
    BOOL  enemy_present;  // Is there an enemy in front of us?
    BOOL  gotflag;        // Do we have the flag?
    float prob1;          // Probabilities of assigned actions
    float prob2;
    float prob3;
    float prob4;
    float prob5;
    float prob6;
    float prob7;
    float prob8;
};
  • Contain relevant information
  • Easy to interpret
  • Small
  • Computationally efficient

9
Reward Scheme
  • Aggressive
  • Robust
  • Efficient
  • enum JReward { Tagged = -5, Ambig = 0, MovedCloser = 2, InZone = 10, GotFlag = 10 };
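One way these reward values could feed back into the stored action probabilities is an additive update on the chosen action, clamped so every action stays reachable, followed by renormalization. The slides give only the reward values; the update rule and learning rate below are assumptions for illustration:

```c
/* Sketch of feeding a reward back into the stored action
 * probabilities: an additive update on the chosen action, clamped so
 * every action stays reachable, then renormalized. The learning rate
 * and the update rule itself are assumptions; the slides give only
 * the reward values. */
#define NUM_ACTIONS 8

enum JReward { Tagged = -5, Ambig = 0, MovedCloser = 2, InZone = 10, GotFlag = 10 };

void apply_reward(float prob[NUM_ACTIONS], int action, enum JReward r)
{
    const float rate = 0.05f;          /* assumed learning rate */
    prob[action] += rate * (float)r;   /* reward raises, penalty lowers */
    if (prob[action] < 0.01f)
        prob[action] = 0.01f;          /* keep the action reachable */

    float sum = 0.0f;
    for (int i = 0; i < NUM_ACTIONS; i++)
        sum += prob[i];
    for (int i = 0; i < NUM_ACTIONS; i++)
        prob[i] /= sum;                /* renormalize to sum to 1 */
}
```

The large positive rewards (InZone, GotFlag) make the scheme aggressive, while the clamp keeps a tagged action from vanishing entirely.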

10
Markov Chain Evolution
11
The Architecture (Good)
RoboFlag
12
The Opposition (Evil)
  • Man-to-Man Strategy
    - Feasible for one robot to beat
  • Spiral Approach
    - Change directions

13
Results
  • Very little movement
  • No reaction based on enemy location
  • Many inconclusive events
  • Flag was never captured

14
Changes
  • Changed default probabilities
  • Replaced 2 boolean variables with enemy location
    information
  • Cosmetic changes to the update function
  • Added ability to read an old log file

15
Results
  • More movement towards the flag
  • New probability weights made enemy information
    insignificant
  • Did capture the flag
  • Logger failed

16
The New Architecture
RoboFlag
17
Conclusions
  • Architecture did not achieve original objective
    but showed potential
  • No matter how much learning the computer does,
    the mechanisms by which it learns must be
    continuously tweaked
  • Trial and Error is easy to implement but is
    probably not the best approach
  • A model is needed to reduce the order of the
    system to an acceptable level

18
Future Work
  • Increase state definition size until it is
    computationally too expensive
  • Implement a mechanism for cooperation with other
    robots
  • Perfect the architecture so that it can learn
    defensive and offensive strategy at the same time

19
Acknowledgments
  • Richard Murray
  • Ling Shi
  • Brian Beck and Jing Xiong
  • CDS Staff
  • MURF 2004