Title: Testbed for Integrating and Evaluating Learning Techniques (TIELT)

1. Testbed for Integrating and Evaluating Learning Techniques (TIELT)
David W. Aha (1), Matthew Molineaux (2)
(1) Intelligent Decision Aids Group, Navy Center for Applied Research in AI, Naval Research Laboratory, Washington, DC
(2) ITT Industries, AES Division, Alexandria, VA
first.surname_at_nrl.navy.mil
17 November 2004
2. Outline
- Motivation: Learning in cognitive systems
- Objectives
  - Encourage machine learning research on complex tasks that require knowledge-intensive approaches
  - Provide industry & military with access to the results
- Design: TIELT functionality & components
- Example: Knowledge base content
- Status
  - Implementation & documentation
  - Collaborations & events
  - Task list
- Summary
3. DARPA
- Defense Advanced Research Projects Agency ($2.3B/yr)
4. Cognitive Systems
Systems that know what they're doing
- A cognitive system is one that
  - can reason, using substantial amounts of appropriately represented knowledge
  - can learn from its experience so that it performs better tomorrow than it did today
  - can explain itself and be told what to do
  - can be aware of its own capabilities and reflect on its own behavior
  - can respond robustly to surprise
5. Anatomy of a Cognitive Agent (Brachman, 2003)
(Diagram) A cognitive agent comprises reflective processes, deliberative processes (learning, prediction, planning, other reasoning), and reactive processes. Long-term memory (LTM) holds concepts and short-term memory (STM) holds sentences. Communication (language, gesture, image), perception, attention, and action connect the agent, via sensors and effectors, to the external environment.
6. Learning in Cognitive Systems (Langley & Laird, 2002)
Many opportunities exist for learning in cognitive systems.
7. Status of Learning in Cognitive Systems
Problem
- Few deployed cognitive systems integrate techniques that exhibit rapid & enduring learning behavior on complex tasks
- It's costly to integrate & evaluate embedded learning techniques
8. TIELT Motivation
- We want Cognitive Agents that Learn
  - Rapidly,
  - in context, and
  - over the long-term.
- We have few (if any) of them
9. TIELT Objective
- Encourage the study of research on learning in cognitive systems, with subsequent transition goals
(Diagram) ML Researchers produce Learning Modules; combined with Cognitive Agents, these yield Cognitive Agents That Learn, for transition to Industry and the Military.
10. Current ML Research Focus
- Benchmark studies of multiple algorithms on simple (e.g., supervised) learning tasks from many static datasets
(Diagram) An ML researcher runs ML Systems 1..n against Databases 1..m, obtaining m results per system for benchmark analysis.
This was encouraged (in part) by the availability of datasets in a standard (interface) format.
11. Previous API for ML Investigations
Inspiration
- UC Irvine Repository of Machine Learning (ML) Databases
- An interface for empirical benchmarking studies on supervised learning
- 1525 citations (and many publications use it w/o citing) since 1986
(Diagram) Supervised Learning: ML System j or Decision System k connects to Database i through an interface in a standard format.
12. Accomplishing TIELT's Objective
- One approach: Shift ML research focus from static datasets to dynamic simulators of rich environments
13. Refining TIELT's Objective
Objective
- Develop a tool for evaluating decision systems in simulators
- Specific support for evaluating learning techniques
- Demonstrate research utility prior to approaching industry/military
Benefits
- Reduce system-simulator integration costs from m×n to m+n (see next)
- Permits benchmark studies on selected simulator tasks
- Encourages study of ML for knowledge-intensive problems
- Provide support for DARPA Challenge Problems on Cognitive Learning
14. Reducing Integration Costs
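The original slide's diagram is not preserved; it presumably contrasted pairwise integrations (each decision system wired to each simulator) with mediated ones. A minimal sketch of the arithmetic behind the cost claim (function names are illustrative):

```python
# Without a mediator, each of m decision systems needs a custom
# adapter for each of n simulators: m * n integrations.
def pairwise_integrations(m: int, n: int) -> int:
    return m * n

# With TIELT mediating, each decision system and each simulator is
# integrated once, against TIELT itself: m + n integrations.
def mediated_integrations(m: int, n: int) -> int:
    return m + n

# e.g., 10 learning systems and 8 gaming simulators:
assert pairwise_integrations(10, 8) == 80
assert mediated_integrations(10, 8) == 18
```

The gap widens as either side of the ecosystem grows, which is the point of the slide.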
15. What Domain?
Desiderata
- Available implementations (cheap to acquire & run)
- Challenging problems for CogSys/ML research
- Significant interest (academia, military, industry, funding, public)
Simulation Games?
16. Gaming Genres of Interest (modified from (Laird & van Lent, 2001))

Genre | Example | Description | Sub-Genres | AI Roles
Action | Quake, Unreal | Control a character | 1st vs. 3rd person, solo vs. team play | Control enemies
Role-Playing | Temple of Elemental Evil | Be a character (includes puzzle solving, etc.) | Solo vs. (massively) multi-player | Control enemies, partners, and supporting characters
Strategy (real-time, discrete) | Empire Earth 2, AoE, Civilization | Controlling at multiple levels (e.g., strategic, tactical warfare) | God, first-person perspectives | Control all units and strategic enemies
Team Sports | Madden NFL Football | Act as coach and a key player | | Control units and strategic enemy (i.e., other coach), commentator
Individual Sports | Many (e.g., driving games) | Individual competition | 1st vs. 3rd person | Control enemy
17. Some Game Environment Challenges
- Significant background knowledge available
  - e.g., Processes, tasks, objects, actions
  - Use: Provide opportunities for rapid learning
- Adversarial
- Collaborative
- Multiple reasoning levels (e.g., strategic, tactical)
- Real-time
- Uncertainty (Fog of War)
- Noise (e.g., imprecision)
- Relational (e.g., social networks)
- Temporal
- Spatial
18. Academia: Learning in Simulation Games
Focus: Broad interests
- Game engines (e.g., GameBots, ORTS, RoboCup Soccer Server)
- Use (other) open source engines (e.g., FreeCiv, Stratagus)
- Representation (e.g., Forbus et al., 2001; Houk, 2004; Munoz-Avila & Fisher, 2004)
- Learning opponent & unit models (e.g., Laird, 2001; Hill et al., 2002)
- (see table)
Evidence of commitment
- "Interactive Computer Games: Human-Level AI's Killer Application" (Laird & van Lent, AAAI'00 Invited Talk)
- Meetings
  - AAAI symposia (several in recent years)
  - International Conference on Computers and Games
  - AAAI'04 Workshop on Challenges in Game AI
  - AI in Interactive Digital Entertainment Conference (2005-)
- New journals focusing on (e.g., real-time) simulation games
  - J. of Game Development
  - Int. J. of Intelligent Games and Simulation
19. Survey: Selected Previous Work on Learning in Gaming Simulators

Name | Reference | Method | Learning | Performance | Test Plan & Metrics (independent variables to vary and dependents to measure)
 | (Goodman, AAAI93) | Projective Visualization | 1 TDIDT per feature cluster | Predict amount of inflicted damage | Vary training amount & projection length; predict summed pain
MAYOR | (Fasciano, 1996 M.S. Thesis) | Case-based planning | Plan Execution Conds. | Maximize SimCity Game Score | Online; vary whether learning was used; measure successful plan executions
 | (Fogel et al., CCGFBR96) | Genetic Alg. | Rule learning | 1x1 tank battles | Vary locations/space of routes; measure damage
KnoMic | (van Lent & Laird, ICML98) | Production Rules | Rule Conds. & Goals | Racetrack Mission for TacAir-SOAR | Measure speed with which KnoMic learned correct control rules
 | (Agogino et al., 1999 NPL) | Neuro-evolution | Wt. & genetic learning | 30 gold-collecting peons vs. 1 human | Vary learning methodology; measure survival rate of peons
 | (Laird, ICAA01) | SOAR Chunking | Rule learning | Predict enemy beh. | None; would focus on speedup
 | (Geisler, 2002 M.S. Thesis) | NB, TDIDT, BP, ensembles | Depends on the method | 4 simple classification tasks | Vary training set size & ensembles; measure classification accuracy
 | (Bryant & Miikkulainen, CEC03) | Neuroevolution | NN wts., etc. | Discrete Legions vs. Barbarians | Offline; vary training set size; measure a game-specific fn.
 | (Chia & Williams, BRIMS03) | Naïve Bayes | Learning to add/del. rules | 1x1 tank battles | Vary adversarial aggressiveness & whether learning occurs; measure wins
 | (Fagan & Cunningham, ICCBR03) | Case-based prediction | Selecting plans to save | Predict a player's action | Vary the stored plans and the user; measure acc. & prediction freq.
 | (Guestrin et al., IJCAI03) | Relational MDPs | Partition objects | Beat enemy in 3x3 Freecraft games | Simplistic; one run
 | (Sweetser & Dennis, 2003 Ent. Computing: Tech. & Applications) | Advice giving | Regression wts. | Just-in-time hints to human player | Vary with vs. without providing hints; measure hints that were useful
 | (Spronck et al., 2004 IJIGS) | Dynamic Scripting | Rule wts. | Beat NWN AI in simple scenarios | Offline; measure average turning point, speed, effectiveness, robustness, efficiency
 | (Ponsen, 2004 M.S. Thesis) | Dynamic Scripting & GA for rule learning | Rule wts. and new rules | Defeat Wargus opponent | Offline; vary map size, learning algorithm, and opponent control alg.; measure wins
 | (Ulam et al., AAAI04 Workshop) | Self-adaptation | Task edits | Defend city (FreeCiv) | Offline; vary trace size; measure successes
20. Industry: Learning in Simulation Games
Focus: Increase sales via enhanced gaming experience
- USA: $7B in sales in 2003 (ESA, 2004)
  - Strategy games: $0.3B
- Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
- Target: Control avatars, unit behaviors
Evidence of commitment
- "Developers keenly interested in building AIs that might learn, both from the player & environment around them." (GDC03 Roundtable Report)
- Middleware products that support learning (e.g., MASA, SHAI, LearningMachine)
- Long-term investments in learning (e.g., iKuni, Inc.)
- Conferences
  - Game Developers Conference
  - Computer Game Technology Conference
21. Industry: Learning in Simulation Games
Status
- Few deployed systems have used learning (Kirby, 2004), e.g.,
  - Black & White: on-line, explicit (player immediately reinforces behavior)
  - C&C Renegade: on-line, implicit (agent updates set of legal paths)
  - Re-volt: off-line, implicit (GA tunes racecar behaviors prior to shipping)
- Problems: Performance, constraints (preventing learning something dumb), trust in learning system
Some Promising Techniques (Rabin, 2004)
- Belief networks for probabilistic inference
- Decision tree learning
- Genetic algorithms (e.g., for offline parameter tuning)
- Statistical prediction (e.g., using N-grams to predict future events)
- Neural networks (e.g., for offline applications)
- Player modeling (e.g., to regulate game difficulty, model reputation)
- Reinforcement learning
- Weakness modification learning (e.g., don't repeat failed strategies)
22. Military: Learning in Simulation Games
Focus: Training, analysis, experimentation
- Learning: Acquisition of new knowledge or behaviors
- Simulators: JWARS, OneSAF, Full Spectrum Command, etc.
- Target: Control strategic opponent or own units
Evidence of commitment
- "Learning is an essential ability of intelligent systems" (NRC, 1998)
- "To realize the full benefit of a human behavior model within an intelligent simulator, the model should incorporate learning" (Hunter et al., CCGBR00)
- "Successful employment of human behavior models ... requires that they possess the ability to integrate learning" (Banks & Stytz, CCGBR00)
- Conferences: BRIMS, I/ITSEC
Status: No CGF simulator has been deployed with learning (D. Reece, 2003)
- Some problems (Petty, CGFBR01)
  - Cost of training phase
  - Loss of training control
  - Learning non-doctrinal behaviors
  - Learning unpredictable behaviors
23. Analysis & Conclusions
State-of-the-art
- Research on learning in complex gaming simulators is in its infancy
- Knowledge-poor approaches are limited to simple performance tasks
- Knowledge-intensive approaches require huge knowledge bases, which to date have been manually encoded
- Existing approaches have many simplifying assumptions
  - Scenario limitations (e.g., on number and/or capabilities of adversaries)
  - Learning is (usually) performed only off-line
  - Learned knowledge is not transferred (e.g., to playing other games)
Significant advances would include
- Fast acquisition approaches for a large amount of domain knowledge
  - This would enable rapid learning without requiring manual encoding
- Demonstrations of on-line learning (i.e., within a single simulation run)
- Increasing knowledge transfer among tasks & simulators over time
  - e.g., knowledge of processes, strategies, tasks, roles, objects, actions
24. TIELT Specification
- Simplifies integration & evaluation!
  - Learning-embedded decision systems & gaming simulators
  - Supports communications, game model, perf. task, evaluation
  - Free & available
- Learning foci
  - Task (e.g., learn how to execute, or advise on, a task)
  - Player (e.g., accept advice, predict a player's strategies)
  - Game (e.g., learn/refine its objects, their relations, behaviors)
- Learning methods
  - Supervised/unsupervised, immediate/delayed feedback, analytic, active/passive, online/offline, direct/indirect, automated/interactive
  - Learning results should be available for inspection
- Gaming simulators: Those with challenging learning tasks
- Reuse
  - Communications are separated from the game model & perf. task
  - Provide access to libraries of simulators & decision systems
25. Distinguishing TIELT

System | Focus | Game Engine(s) | Prominent Feature | Reasoning Activity
DirectIA (MASA) | AI SDK | FPS, RTS, etc. | Behavior authoring | Sense-act
SimBionic (SHAI) | AI SDK | FPS, etc. | Behavior authoring | Sense-act
FEAR | AI SDK | Quake 2, etc. | Behavior authoring | Sense-act
RoboCup | Research Testbed | RoboCup | Soccer game play | Sense-act, coaching, etc.
GameBots | Research Testbed | UT (FPS) | UT game play | Sense-act
ORTS | Research Testbed | RTS games | Hack-free MM RTS | Sense-act, strategy
TIELT | Research Testbed | Several genres | Experimentation for evaluating learning & learned behaviors | Sense-act, advice processing, prediction, model updating, etc.

- Provides an interface for message-passing interfaces
- Supports composable system-level interfaces
26. TIELT Integration Architecture
(Diagram) Components:
- TIELT's User Interface: Evaluation, Prediction, Coordination, and Advice Interfaces, used by the TIELT User
- TIELT's Internal Communication Modules, connecting a Selected Game Engine (drawn from the Game Engine Library, played by the Game Player(s)) with a Selected Decision System
- Learned Knowledge (inspectable)
- TIELT's KB Editors, used by the TIELT User
- Selected/Developed Knowledge Bases: Game Model (GM), Agent Description (AD), Game Interface Model (GIM), Decision System Interface Model (DSIM), Experiment Methodology (EM)
- Knowledge Base Libraries, each holding GM, AD, EM, GIM, and DSIM entries
27. TIELT's Knowledge Bases
Game Interface Model
- Defines communication processes with the game engine
Decision System Interface Model
- Defines communication processes with the decision system
Game Model
- Defines interpretation of the game
  - e.g., initial state, classes, operators, behaviors (rules)
- Behaviors could be used to provide constraints on learning
Agent Description
- Defines what decision tasks (if any) TIELT must support
Experiment Methodology
- Defines selected performance tasks (taken from the Game Model Description) and the experiment to conduct
28. TIELT-Supported Performance Tasks
Performance vs. learning tasks
- Performance: Application of the learned knowledge (e.g., classification)
- Learning: Activity of the learning system (e.g., update weights in a neural net)
TIELT users will define complex, user-configurable performance tasks.
29. An Example Complex Learning Task
Task description
- Win a real-time strategy game
- This involves several challenging learning tasks
Subtasks and supporting operations
- Diagnosis: Identify (computer and/or human) opponent strategies & goals
- Classification: Opponent recognition
- Recording: Actions of opponents and their effects
  - This repeatedly involves classification
- Diagnosis: Identify goal(s) being solved by these effects
- Classification: Identify goal(s) that, if solved, prevent opponent goals
- Planning: Select/adapt or create plan to achieve goals and win the game
- Classification: Select top-level actions to achieve goals
- Iteratively identify necessary sub-goals and, finally, primitive actions
- Design (parametric): Identify good initial layout of controllable assets
- Execute plan
- Recording: Collect measures of effectiveness, to provide feedback
- Planning: If needed, re-plan, based on feedback, at Step 2
30. Use: Controlling a Game Character
(Diagram) The integration-architecture diagram of slide 26, with TIELT mediating between the Selected Game Engine and the Selected Decision System.
31. UT Example: Game Model

State Description
- Players: Array of Player
- Self: Player
- Score: Integer

Classes
- Player: Team: String; Number: Integer; Position: Location
- Location: x: Integer; y: Integer; z: Integer

Operators
- Shoot(Player)
  - Preconditions: Player.isVisible
  - Effects: Player.Health -= rand(10)
- MoveTo(Location)
  - Preconditions: Location.isReachable()
  - Effects: Self.position = Location

Rules
- GetShotBy(Player)
  - Preconditions: Player.hasLineOfSight(Self)
  - Effects: Self.Health -= rand(10)
- EnemyMovements(Enemy, Location1, Location2)
  - Preconditions: Location2.isReachableFrom(Location1); Enemy.position = Location1
  - Effects: Enemy.position = Location2
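The Game Model above pairs each operator with preconditions and effects. A minimal sketch of how such an operator might be checked and applied, using the slide's Shoot(Player) operator; the class and function names are hypothetical, not TIELT's actual representation:

```python
import random

class Player:
    def __init__(self, is_visible=True, health=100):
        self.isVisible = is_visible
        self.Health = health

# Sketch of the Shoot(Player) operator from the slide:
# precondition Player.isVisible, effect Player.Health -= rand(10).
def shoot(target: Player) -> bool:
    if not target.isVisible:                # precondition check
        return False
    target.Health -= random.randint(0, 10)  # apply the effect
    return True

enemy = Player()
applied = shoot(enemy)
assert applied and enemy.Health <= 100
```

The same check-then-apply shape covers the MoveTo operator and the GetShotBy/EnemyMovements rules.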
32. UT Example: Game Interface Model
Communication
- Medium: TCP/IP, Port 3000
- Message Format: <name> <attr1> <value1> <attr2> <value2> ...
- Examples: interface messages from the GameBots API
  - http://www.planetunreal.com/gamebots/docapi.html
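A minimal sketch of parsing the whitespace-delimited `<name> <attr1> <value1> ...` format described above; the sample message is illustrative, not taken from the GameBots API:

```python
def parse_message(raw: str):
    """Split '<name> <attr1> <value1> <attr2> <value2> ...' into
    a (name, {attr: value}) pair."""
    tokens = raw.split()
    name, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("attributes must come in name/value pairs")
    attrs = dict(zip(rest[0::2], rest[1::2]))
    return name, attrs

# Hypothetical sensor message:
name, attrs = parse_message("See Id Pod1 Location 12,40,7")
assert name == "See" and attrs["Id"] == "Pod1"
```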
33UT Example Decision System Interface Model
34. UT Example: Agent Description
Think-Act Cycle
- Shoot Something → Call Shoot Operator
- Pick up a Healthpack → Call Pickup Operator
- Go Somewhere Else → Ask Decision System: "Where Do I Go?"
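The think-act cycle above can be sketched as a dispatch loop; the option names follow the slide, while the handlers and their return values are hypothetical illustrations:

```python
# Each think-act option maps to the handler the slide pairs it with.
def shoot_operator() -> str:
    return "Shoot"

def pickup_operator() -> str:
    return "Pickup"

def ask_decision_system(query: str) -> str:
    return f"DS? {query}"   # defer the choice to the decision system

def think_act(option: str) -> str:
    if option == "Shoot Something":
        return shoot_operator()
    if option == "Pick up a Healthpack":
        return pickup_operator()
    if option == "Go Somewhere Else":
        return ask_decision_system("Where Do I Go?")
    raise ValueError(f"unknown option: {option}")

assert think_act("Shoot Something") == "Shoot"
```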
35. UT Example: Experiment Methodology
Initialization
- Game Model: Unreal Tournament.xml
- Game Interface: GameBots.xml
- Decision System: MyUTBot.xml
- Runs: 100
- Call slowdown(0.5)
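A minimal sketch of what such an experiment specification might drive: the file names and parameters come from the slide, but the runner itself is hypothetical:

```python
# Hypothetical experiment driver: takes a spec naming the three KB
# files, then repeats the run the requested number of times.
def run_experiment(spec: dict, run_once=lambda: "done"):
    results = [run_once() for _ in range(spec["runs"])]
    return results

spec = {
    "game_model": "Unreal Tournament.xml",
    "game_interface": "GameBots.xml",
    "decision_system": "MyUTBot.xml",
    "runs": 100,
    "slowdown": 0.5,   # Call slowdown(0.5) at initialization
}
results = run_experiment(spec)
assert len(results) == 100
```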
36. Use: Predicting Opponent Actions
(Diagram) The integration-architecture diagram of slide 26, additionally annotated with Raw State and Processed State flows.
37. Use: Updating a Game Model
(Diagram) The integration-architecture diagram of slide 26.
38. TIELT: A Researcher Use Case
- Define/store decision system interface model
- Select game simulator & interface
- Select game model
- Select/define performance task(s)
- Define/select expt. methodology
- Run experiments
- Analyze displayed results
(Diagram) Selected/Developed Knowledge Bases and Knowledge Base Libraries, as in slide 26.
39. TIELT: A Game Developer Use Case
- Define/store game interface model
- Define/store game model
- Select decision system/interface
- Define performance task(s)
- Define/select expt. methodology
- Run experiments
- Analyze displayed results
(Diagram) Selected/Developed Knowledge Bases and Knowledge Base Libraries, as in slide 26.
40. TIELT's Internal Communication Modules
(Diagram) Components: a Database with its Database Engine; the Evaluation Interface and Evaluator; the Advice Interface; the Controller; State (Stored State and Current State); a Translated Model (Subset); the Learning Translator (Mapper) and Model Updater; the Action/Control Translator (Mapper); the Selected Decision System (exchanging the Learning Task and Learning Outputs); the Selected Game Engine (exchanging Percepts and Actions); the Perf. Task; and the knowledge bases (Game Model, Game Interface Model, Agent Description, Decision System Interface Model, Experiment Methodology), each with an editor used by the User.
41. Sensing the Game State (City placement example, inspired by Alpha Centauri, etc.)
1. In the Game Engine, the game begins: a colony pod is created and placed.
2. The Game Engine sends a "See" sensor message identifying the pod's location.
3. The Model Updater receives the sensor message and finds the corresponding message template in the Game Interface Model.
4. This message template provides updates (instructions) to the Current State, telling it that there is a pod at the location "See" describes.
5. The Model Updater notifies the Controller that the "See" event has occurred.
(Diagram components: Game Engine, Sensors, Actions, Action Translator, Model Updater, Controller, Current State, Game Model, Game Interface Model, and their editors.)
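The five steps above amount to template-driven state updating: a message template maps a sensor message onto an update of the Current State, followed by a notification to the Controller. A minimal sketch, with hypothetical names and a dict standing in for the Current State:

```python
# Message templates (from the Game Interface Model) map a sensor
# message name to an update applied to the Current State.
templates = {
    "See": lambda state, attrs: state.__setitem__(attrs["Id"], attrs["Location"]),
}

def handle_sensor(state: dict, name: str, attrs: dict, notify):
    templates[name](state, attrs)   # step 4: update the Current State
    notify(name)                    # step 5: tell the Controller

events = []
current_state = {}
handle_sensor(current_state, "See", {"Id": "Pod1", "Location": "12,40"}, events.append)
assert current_state == {"Pod1": "12,40"} and events == ["See"]
```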
42. Fetching Decisions from the Decision System (City placement example)
1. The Controller notifies the Learning Translator that it has received a "See" message.
2. The Learning Translator finds a city-location task, which is triggered by the "See" message. It queries the Controller for the learning mode, then creates a TestInput message to send to the reasoning system with information on the pod's location and the map from the Current State.
3. The Learning Translator transmits the TestInput message to the Decision System.
4. The Decision System transmits output to the Action Translator.
(Diagram components: Controller, Learning Translator, Current State, Translated Model (Subset), Selected Decision System with Learning Modules 1..n, Learning Outputs, Action Translator, Agent Description, Decision System Interface Model, and their editors.)
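Step 2 above, translating a triggering message into a TestInput for the decision system, can be sketched as follows (the field names and trigger table are hypothetical):

```python
# The Learning Translator looks up the task triggered by a message
# and packages Current State information into a TestInput message.
tasks_by_trigger = {"See": "city location"}

def build_test_input(trigger: str, current_state: dict, learning_mode: str) -> dict:
    task = tasks_by_trigger[trigger]
    return {
        "type": "TestInput",
        "task": task,
        "mode": learning_mode,          # queried from the Controller
        "state": dict(current_state),   # e.g., pod location and map
    }

msg = build_test_input("See", {"Pod1": "12,40"}, "online")
assert msg["task"] == "city location"
```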
43. Acting in the Game World (City placement example)
1. The Action Translator receives a TestOutput message from the Decision System.
2. The Action Translator finds the TestOutput message template, determines it is associated with the city-location task, and builds a MovePod operator (defined by the Current State) with the parameters of TestOutput.
3. The Action Translator determines that the "Move" action from the Game Interface Model is triggered by the MovePod operator and binds "Move" using information from MovePod.
4a. The Game Engine receives "Move" and updates the game to move the pod toward its destination, or
4b, c. The Advice Interface receives "Move" and displays advice to a human player on what to do next, or the Prediction Interface makes a prediction.
(Diagram components: Current State, Action Translator, Actions, Game Engine, Advice Interface, Prediction Interface, Game Interface Model, Decision System Interface Model, and their editors.)
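A minimal sketch of steps 1-4a, translating a TestOutput back through an operator into a game action; all names are hypothetical illustrations of the two-stage translation:

```python
# Stage 1: TestOutput -> MovePod operator (city-location task).
def to_operator(test_output: dict) -> dict:
    return {"op": "MovePod", "dest": test_output["dest"]}

# Stage 2: MovePod operator -> "Move" game action, bound with the
# operator's parameters per the Game Interface Model.
def to_game_action(operator: dict) -> str:
    return f"Move {operator['dest']}"

action = to_game_action(to_operator({"dest": "14,41"}))
assert action == "Move 14,41"
```

Keeping the two stages separate mirrors the design point of the slides: the decision system's vocabulary and the game engine's vocabulary never meet directly.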
44. TIELT Status (November 2004)
Implementation
- TIELT (v0.5) available
- Features
  - Message protocols
    - Current: Console I/O, TCP/IP, UDP
    - Future: Library calls, HLA interface, RMI (possibly)
  - Message content: Configurable
    - Instantiated templates tell it how to communicate with other modules
    - Initialization messages: Start, Stop, Load Scenario, Set Speed
  - Game Model representations (w/ Lehigh University)
    - Simple programs
    - TMK process models
    - PDDL (language used in planning competitions)
45. TIELT Status (November 2004)
Documentation
- TIELT User's Manual (82 pages)
- TIELT Overview
- The TIELT User Interface
- Scripting in TIELT
- Theory of the Game Model
- Communications
- TMK Models
- Experiments
- TIELT Tutorial (45 pages)
- The Game Model
- The Game Interface Model
- Decision System Interface Model
- Agent Description
- Experiment Methodology
46. TIELT Status (November 2004)
Access
- TIELT www site (new)
- Selected Components
  - Documents: Documentation, publications, XML Spec
  - Status
  - Forum: A full-featured web forum/bulletin board
  - Bug Tracker: TIELT bug/feature tracking facility
  - FAQ-o-Matic: Questions and problem solutions (user-driven)
  - Download
47. TIELT Issues (November 2004)
1. Communication
- TIELT is a "multilingual" application; this provides interfacing with many different games (TCP/IP, Library Calls, SWIG).
2. Resources for learning to use TIELT
- TIELT Scripting syntax highlighting
- Map of TIELT Component Interactions (Thanks, Megan)
- Typed script interface
48. TIELT Issues (November 2004)
3. Formatting: Game Model
- To no one's surprise, everyone agrees that TIELT's Game Model representation is inadequate.
- Requests have been made for
  - 3D Maps (Quake)
  - A different programming language
  - A relational operator representation
  - Standardized events
49. TIELT Collaborations (2004-05)
(Diagram) The integration architecture annotated with collaborators:
- User interfaces (Prediction, Evaluation, Coordination, Advice): U.Minn-D., USC/ICT, U.Mich.
- Decision System Library: Soar (U.Mich.), ICARUS (ISLE), DCA (UT Arlington)
- Learning Modules: Neuroevolution (UT Austin)
- Game Library: EE2 (Mad Doc), Troika, FreeCiv (NWU), Others: Many
- Knowledge Bases: Game Model (LU, USC), Task Descriptions (U.Mich./ISLE), Game Interface Model (U.Mich.), Decision System Interface Model and Experiment Methodology (Many)
50. TIELT Collaboration Projects (2004-05)

Organization | Game Interface and Model | Decision System | Tasks and Evaluation Methodology
Mad Doc Software | Empire Earth 2 (RTS) | |
Troika Games | Temple of Elemental Evil (RPG) | |
ISLE | SimCity (RTS) | ICARUS | ICARUS w/ FreeCiv, design
Lehigh U. | Stratagus/Wargus (RTS), and HTN/TMK designs | Case-based planner (CBP) | Wargus/CBP
NWU | FreeCiv (discrete strategy), and qualitative game representations | |
U. Michigan | | SOAR | SOAR w/ 2 games (e.g., FSW, ToEE), design
U. Minnesota-Duluth | RoboCup (team sports) | Advice-taking components | Advice processing
USC/ICT | Full Spectrum Command (RTS) | | SOAR with FSC
UT Arlington | Urban Terror (FPS) | DCA (lite version) |
UT Austin | | Neuroevolution | e.g., Neuroevolution/EE2
51. Games Being Integrated with TIELT

Category | Gaming Simulator | Genre | Foci | Perspective
Commercial | Empire Earth II (Mad Doc S/W) | RTS | Civilization | God
Commercial | Temple of Elemental Evil (Troika) | Role-playing | Solve quests | 1st person
Commercial | SimCity (ISLE) | RTS | City manager | God
Freeware | FreeCiv (NWU) (Civilization clone) | Discrete strategy | Civilization | God
Freeware | Wargus (Lehigh U.) (Warcraft II clone) | RTS | Civilization | God
Freeware | Urban Terror (UT Arlington) | FPS | Shooter | 1st person
Freeware | RoboCup Soccer (UW) | Team sports | Team of agents | Behavior designer
Military | Full Spectrum Command (USC/Inst. Creative Technologies) | RTS | Leading an Army Light Infantry Company | 1st person
52. Promising Learning Strategies

Learning Strategy | Description | When to Use | Justification
Advice Giving | Expert explains how to perform in a given state (this is the only interactive strategy listed here) | Speedup needed; expert is available | Permits quick acquisition of specific and general domain knowledge
Backpropagation | Trains a 3-layer neural network (NN) of sigmoidal hidden units | Target is a non-linear function; offline training is OK | Many learning tasks are non-linear, and some can be performed off-line
Case-Based Reasoning | Use/adapt solutions from experiences to solve similar problems | Cases complement an incomplete domain model; problem-solving speed is crucial | Quicker to adapt cases than reason from scratch, but requires domain-specific adaptation knowledge
Chunking | Compile a sequence of steps into a macro | For tasks requiring speedup | Transforms a complex reasoning task into a fast retrieval task
Dynamic Scripting | RL for tasks with large state spaces that, with domain knowledge, can be collapsed into a smaller set | A small set of states exists, with a set of rules for each | Greatly speeds up the RL approach, but requires analysis of task states
Evolutionary Computation | Evolutionary (genetic) selection on a population of genomes, where the application dictates their representation | Search space is huge, and training can be done offline | Genome representations can be task specific, so this powerful search method can be tuned for the task
Meta Reasoning | After a failure, identifies its type & the task that failed, retrieves a task-specific strategy to avoid this failure, and updates its model | To support self-adaptation | Although knowledge intensive, this is an excellent method for changing problem-solving strategies
Neuroevolution | Uses a separate genetic algorithm population for learning each hidden unit's weights in a NN | To support cooperating heterogeneous agents | A good offline agent-based learning approach for multi-agent gaming
Reinforcement Learning (RL) | Reinforce a sequence of decisions after problem solving is completed | Reward is known only after the sequence ends, and blame can be ascribed | Well-understood paradigm for learning action policies (i.e., what action to perform in a given state)
Relational MDPs | Learn a Markov decision process over objects & their relations using probabilistic relational models | Seeking knowledge transfer (KT) to similar environments | KT is crucial for learning quickly, and feasibly, for some tasks
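The Dynamic Scripting entry above describes reinforcement of rule weights. A minimal sketch of such a weight update, following the general idea in (Spronck et al., 2004) rather than their exact formula; the rule names and constants are illustrative:

```python
# Each rule carries a weight used when drawing rules into a script.
# Rules used in a rewarded script gain weight; the remaining rules
# absorb the difference so the total weight mass stays constant.
def update_weights(weights: dict, used: set, reward: float, floor=1.0):
    total_before = sum(weights.values())
    for rule in used:
        weights[rule] = max(floor, weights[rule] + reward)
    delta = total_before - sum(weights.values())   # redistribute
    unused = [r for r in weights if r not in used]
    if unused:
        for rule in unused:
            weights[rule] = max(floor, weights[rule] + delta / len(unused))
    return weights

w = update_weights({"rush": 10.0, "turtle": 10.0, "expand": 10.0},
                   used={"rush"}, reward=5.0)
assert w["rush"] == 15.0 and w["turtle"] < 10.0
```

The `floor` keeps every rule selectable, preserving exploration, which is one of the properties the row's "Justification" column alludes to.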
53. TIELT-General Game Player Integration (with Stanford University's Michael Genesereth)
TIELT
- Experiment design/control capabilities
- Common game engine interface
- Support for several learning approaches
GGP-TIELT
- Play entire class of general games as well as TIELT-integrated gaming simulators.
- Compete remotely against reference players and other GGP systems.
- Define evaluation methodologies for learning experimentation.
- Participate in AAAI'05 GGP Competition.
GGP
- Logical game formalisms
- Access to remote players
- WWW access
54. Upcoming Events
- National Conference on AI (AAAI'05, 24-28 July, Pittsburgh)
  - General Game Playing Competition ($10K prize)
- Int. Joint Conference on AI (IJCAI'05, 30 July-5 August, Edinburgh)
  - Workshop: Reasoning, Representation, and Learning in Gaming & Simulation Tasks (tentative title)
- Int. Conference on ML (ICML'05, 7-11 August, Bonn)
  - Workshop submission in progress
- Int. Conference on CBR (ICCBR'05, 23-26 August, Chicago)
  - Workshop & Competition: CBR in Games
55. Summary
- TIELT: Mediates between a (gaming) simulator and a learning-embedded decision system
- Goals
  - Simplify running learning expts. with cognitive systems
  - Support DARPA challenge problems in learning
- Designed to work with many types of simulators & decision systems
- Status
  - TIELT (v0.5 Alpha) completed in 10/04
  - User's Manual, Tutorial, www site exist
  - 10 collaborating organizations (1-year contracts)
    - Enhances probability that TIELT will achieve its goals
  - We're planning several TIELT-related events
56. Backup Slides
57. Metrics
Research perspective
- Time required to develop reasoning interface & KB
- Ability to design/facilitate selected evaluation methodology
- Expressiveness of KB representation
- Breadth of learning techniques supported
- Breadth of learning and performance tasks supported
- Availability of integrated gaming simulators & challenges
Industry perspective
- Ability to develop learned/learning behaviors of interest
- Time required to
  - develop game interface model & KBs, and
  - develop these behaviors
- Availability of learning-embedded reasoning systems
- Support for both off-line and on-line learning
58. Some Expected User Metrics
Performance tasks
- Some standards
  - e.g., classification accuracy, ROC analyses, precision & recall
- Decision making speed and accuracy
- Plan execution quality (e.g., time to execute, mission-specific Measures of Effectiveness)
- Number of constraint violations
- Ability to transfer learned knowledge
59. TIELT: Potential Learning Challenge Problems
- Learn to win a game (i.e., accomplish an objective)
  - e.g., solve a challenging diplomacy task, provide a realistic military training course facing intelligent adversaries, or help users to develop real-time cognitive reasoning skills for a defined role in support of a multi-echelon mission
- Learn an adversary's strategy
  - e.g., predict a terrorist group's plan and/or tactics, suggest appropriate responses to prevent adversarial goals, help users identify characteristics of adversarial strategies
- Learn crucial processes of an environment
  - e.g., learn to improve an incorrect/incomplete game model so that it more accurately/reliably defines objects/agents in the game, their behaviors, their capabilities, and their limitations
- Intelligent situation assessment
  - e.g., learn which factors in the simulation require attention to accomplish different types of tasks
60. Example Game: FreeCiv (Discrete-time strategy)
Civilization II (MicroProse)
- Civilization II (1996-): 850K copies sold
- PC Gamer Game of the Year Award winner
- Many other awards
- Civilization series (1991-): Introduced the civilization-based game genre
FreeCiv (Civ II clone)
- Open source freeware
- Discrete strategy game
- Goal: Defeat opponents, or build a spaceship
- Resource management
  - Economy, diplomacy, science, cities, buildings, world wonders
  - Units (e.g., for combat)
- Up to 7 opponent civs
- Partial observability
http://www.freeciv.org
61Previous FreeCiv/Learning Research
(Ulam et al., AAAI04 Workshop on Challenges in
Game AI)
- Title: Reflection in Action: Model-Based
Self-Adaptation in Game Playing Agents
- Scenarios
- City defense: Defend a city for 3000 years
62FreeCiv CP Scenario
General description
- Game initialization: Your only unit, a settler,
is placed randomly on a random world (see Game
Options below). Players cyclically alternate play.
- Objective: Obtain the highest score, conquer all
opponents, or build the first spaceship
- Scoring: The basic goal is to obtain 1000 points.
Game options affect the score.
- Citizens: 2 pts per happy citizen, 1 per content
citizen
- Advances: 20 pts per World Wonder, 5 per
futuristic advance
- Peace: 3 pts per turn of world peace (no wars or
combat)
- Pollution: -10 pts per square currently polluted
- Top-level tasks (to achieve a high score)
- Develop an economy
- Increase population
- Pursue research advances
- Opponent interactions: Diplomacy and
defense/combat
Game Option Y1 Y2 Y3
World size Small Normal Large
Difficulty level Warlord (2/6) Prince (3/6) King (4/6)
Opponent civilizations 5 5 7
Level of barbarian activity Low Medium High
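The scoring rules above are simply additive, so they can be sketched as a small function (the function name and argument names are illustrative, not part of FreeCiv; the point values are those listed on this slide):

```python
def freeciv_score(happy, content, wonders, future_advances,
                  peace_turns, polluted_squares):
    """Score per the slide's rules: 2 pts per happy citizen,
    1 pt per content citizen, 20 pts per World Wonder, 5 pts per
    futuristic advance, 3 pts per turn of world peace, and
    -10 pts per currently polluted square."""
    return (2 * happy + content + 20 * wonders + 5 * future_advances
            + 3 * peace_turns - 10 * polluted_squares)

# e.g., 10 happy + 5 content citizens, 2 World Wonders,
# 100 peaceful turns, and 3 polluted squares:
print(freeciv_score(10, 5, 2, 0, 100, 3))  # 20+5+40+0+300-30 = 335
```

Note how far this example state is from the 1000-point goal; most of the score must come from sustained peace, happy citizens, and World Wonders.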
63FreeCiv CP Information Sources
Concepts in an Initial Knowledge Base
- Resources: Collection and use
- Food, production, trade (money)
- Terrain
- Resources gained per turn
- Movement requirements
- Units
- Type (military, trade, diplomatic, settlers,
explorers)
- Health
- Combat: Offense & defense
- Movement constraints (e.g., land, sea, air)
- Government: Types (e.g., anarchy, despotism,
monarchy, democracy)
- Research network: Identifies constraints on what
can be studied at any time
- Buildings (e.g., cost, capabilities)
- Cities
- Population growth
- Happiness
- Pollution
- Civilizations (e.g., military strength,
aggressiveness, finances, cities, units)
- Diplomatic states & negotiations
64FreeCiv CP Decisions
Civilization decisions
- Choice of government type (e.g., democracy)
- Distribution of income devoted to research,
entertainment, and wealth goals
- Strategic decisions affecting other decisions
(e.g., coordinated unit movement for trade)
City decisions
- Production choice (i.e., what to create,
including city buildings and units)
- Citizen roles (e.g., laborers, entertainers, or
specialists), and laborer placement
- Note: Locations vary in their terrain, which
generates different amounts of food, income, and
production capability
Unit decisions
- Task (e.g., where to build a city, whether/where
to engage in combat, espionage)
- Movement
Diplomacy decisions
- Whether to sign a proffered peace treaty with
another civilization
- Whether to offer a gift
65FreeCiv CP Decision Space
Variables
- Civilization-wide variables
- N: Number of civilizations encountered
- D: Number of diplomatic states (that you can have
with an opponent)
- G: Number of government types available to you
- R: Number of research advances that can be
pursued
- I: Number of partitions of income into
entertainment, money, and research
- U: Units
- L: Number of locations a unit can move to in a
turn
- C: Cities
- Z: Number of citizens per city
- S: Citizen statuses (i.e., laborer, entertainer,
doctor)
- B: Number of choices for city production
Decision complexity per turn (for a typical game
state)
- O(D^N * G * R * I * L^U * (S^Z * B)^C); this
ignores both other variables and domain knowledge
- This becomes large with the number of units and
cities
- Example: N=3, D=5, G=3, R=4, I=10, U=25, L=4,
C=8, Z=10, S=3, B=10
- Size of decision space (i.e., possible next
states): ≈2.5 × 10^65 (in one turn!)
- Comparison: The decision space of chess per turn
is well below 140 (e.g., 20 at the first move)
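As a sanity check, the complexity expression above can be evaluated directly. This is a minimal sketch (the function name is illustrative); the parameter names follow the slide's variable definitions:

```python
def decision_space(D, N, G, R, I, L, U, C, Z, S, B):
    """Rough one-turn decision-space size:
    D**N diplomatic choices, G government types, R research goals,
    I income splits, L**U joint unit moves, and (S**Z * B)**C
    per-city citizen-status and production choices."""
    return (D ** N) * G * R * I * (L ** U) * (S ** Z * B) ** C

# The typical mid-game state above: result is roughly 2.5e65
print(decision_space(D=5, N=3, G=3, R=4, I=10, L=4, U=25,
                     C=8, Z=10, S=3, B=10))

# A small early-game state (one city, one contacted civ, three units)
# gives roughly 1.2e9
print(decision_space(D=3, N=1, G=2, R=4, I=5, L=14, U=3,
                     C=1, Z=6, S=3, B=5))
```

Both results match the figures quoted in this deck, which is how the reconstructed exponents in the formula were verified.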
66FreeCiv CP A Simple Example Learning Task
Situation
- We're England (e.g., London)
- Barbarians are north (in red)
- Two other civs exist
- Our military is weak
What should we do?
- Ally with Wales? If so, how?
- Build a military unit? Which?
- Improve defenses?
- Increase the city's production rate?
- Build a new city to the south? Where?
- Research Gun Powder? Or?
- Move our diplomat back to London?
- A combination of these?
What information could help with this decision?
- Previous similar experiences
- Generalizations of those experiences
- Similarity knowledge
- Adaptation knowledge
- Opponent model
- Statistics on barbarian strength, etc.
67Analysis of the Example Learning Task
Complexity function
Situation
- D = 3 (war, neutral, peace)
- N = 1: Only 1 other civilization contacted (i.e.,
Wales)
- G = 2 government types known
- R = 4 research advances available
- I = 5 partitions of income available
- L = 14 per unit
- U = 3 units (1 external, 2 in city)
- C = 1 city
- S = 3 (entertainer, laborer, doctor)
- Z = 6 citizens
- B = 5 units/buildings it can produce
Decision Space Size
- ≈1.2 × 10^9
- This reduces to 32 sensible choices after
applying some domain knowledge
- e.g., don't change diplomatic status now, keep
units in the city for defense, don't change
government now (because it'll slow production),
and keep the external unit away from danger
68FreeCiv CP Learning Opportunities
Learn to keep citizens happy
- Citizens in a city who are unhappy will revolt;
this temporarily eliminates city production
- Several factors influence happiness (e.g.,
entertainment, military presence, gov't type)
Learn to obtain diplomatic advantages
- Countries at war tend to have decreased trade,
lose units and cities, etc.
- Diplomats can sometimes obtain peace treaties or
otherwise end wars
- Unit movement decisions can also impact
opponents' diplomatic decisions
Learn how to wage war successfully
- Good military decisions can yield new
cities/citizens/trade, but losses can be huge
- Unit decisions can benefit from learning
tactical coordinated behaviors
- The selection of military unit(s) for a task
depends on the opponent's capabilities
Learn how to increase territory size
- Initially, unexplored areas are unknown; their
resources (e.g., gold) cannot be harvested
- Exploration needs to be balanced with security
- City placement decisions influence territory
expansion
69FreeCiv CP Example Learned Knowledge
Learn what playing strategy to use in each
adversarial situation
- Situations are defined by relative military
strength, diplomatic status, whether the opponent
has strong alliances, locations of forces, etc.
- Selecting a good playing strategy depends on many
of these variables
70What Techniques Could Learn the Task of Selecting
a Playing Strategy?
Meta-reasoning (e.g., Ulam et al., AAAI04 Wkshp
on Challenges in Game AI)
- Requires knowledge of
- Tasks being performed
- Types of failures that can occur when performing
these tasks
- T2: Overestimate own strength, underestimate
enemy strength, ...
- T3: Incorrect assessment of enemy's diplomatic
status, ...
- Strategies for adapting these tasks
- S1: Increase military strength
- S2: Assess distribution of enemy forces
- S3: Consider enemy's diplomatic history
- Mapping of failure types in (2) to adaptation
strategies in (3)
- Example: We decided to Attack, but underestimated
enemy strength. This failure was indexed by
strategy S2, which we'll do from now on in T2.
[Diagram: Task structure. T1: Determine Playing
Strategy; T2: Assess Military Advantage; T3: Assess
Diplomatic Status; T4: Select Strategy (options:
Attack, Retreat!, Fortify, Trade, Seek Peace,
Bribe)]
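The failure-to-strategy mapping described above can be sketched as a lookup table. The dictionary keys and the `adapt` helper are illustrative assumptions for this deck, not Ulam et al.'s actual implementation:

```python
# Per-task failure types mapped to adaptation strategies, following
# the slide's T2/T3 failure examples and S1-S3 strategies.
FAILURE_TO_STRATEGY = {
    ("T2", "underestimated enemy strength"): "S2",  # assess enemy force distribution
    ("T2", "overestimated own strength"):   "S1",  # increase military strength
    ("T3", "wrong diplomatic assessment"):  "S3",  # consider enemy's diplomatic history
}

def adapt(task, failure):
    """After diagnosing which task failed and how, return the
    adaptation strategy indexed for future executions of that task
    (None if the failure type is unknown)."""
    return FAILURE_TO_STRATEGY.get((task, failure))

# The slide's example: we attacked but T2 underestimated enemy
# strength, so S2 is retrieved and applied from now on in T2.
print(adapt("T2", "underestimated enemy strength"))  # S2
```

The point of the sketch is that meta-reasoning needs this mapping as background knowledge; the next slide asks how such knowledge could itself be learned.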
71Challenges for Using Learning via Meta-Reasoning
How can its background knowledge be learned
(efficiently)?
- i.e., tasks, failure types, failure adaptation
strategies, and mappings
- Also, the agent needs to understand how to
diagnose an error (i.e., identify which task
failed and its failure type)
What if only incomplete background knowledge
exists?
- Could complementary learning techniques apply it?
- e.g., relational MDPs (which handle uncertainty)
- Could learning techniques be used to
extend/correct it?
- e.g., learning from advice, case-based reasoning
Can we scale it to more challenging learning
problems?
- Currently, it has only been applied to simpler
tasks
- Defend a City (in FreeCiv)
- More difficult would be Play Entire Game
72Full Spectrum Command & Warrior
(http://www.ict.usc.edu/disp.php?bd=proj_games)
Organization: USC's Institute for Creative
Technologies
- POC: Michael van Lent (Editor-in-Chief, Journal
of Game Development)
- Goal: Develop immersive, interactive, real-time
training simulations to help the Army create
decision-making & leadership-development tools
Focus: US Army training tools (deployed at Ft.
Benning & in Afghanistan)
- Full Spectrum Command (PC-based simulator)
- Role: Commander of a U.S. Army light infantry
company (120 soldiers)
- Tasks: Interpret the assigned mission, organize
the force, plan strategically, and coordinate the
actions of the company
- Full Spectrum Warrior (MS Xbox-based simulator)
- Role: Light infantry squad leader
- Tasks: Complete assigned missions safely
73METAGAME (Pell, 1992)
Focus: Learn strategies to win any game in a
pre-defined category
- Initial category: Chess-like games
- Games are produced by a game generator
- Input: Rules on how to play the game
- A move grammar is used to communicate actions
- Output (desired): A winning playing strategy
[Diagram: METAGAME architecture. The Game Manager
exchanges percepts, actions, and clock signals with
the players and games, maintains temporary state
data and records, and provides graphics for
spectators]
74Collaborator: Mad Doc Software
Summary
- PI: Ron Rosenberg (Producer)
- Experience
- Mad Doc is a leader in real-time strategy games;
Empire Earth II is expected to sell millions of
copies
- CEO Ian Davis (CMU PhD in Robotics) is a
well-known collaborator with the AI research
community, and gave an invited presentation at
AAAI04. He will work with Ron on this contract.
- Deliverables: Mad Doc (RTS) game simulator API
- This will be used by multiple other collaborators
75Collaborator: Troika Games
Summary
- PI: Tim Cain, Joint-CEO
- Experience
- Troika has outstanding experience with developing
state-of-the-art role-playing games, including
Temple of Elemental Evil (ToEE)
- A game developer since 1982, Tim obtained an M.S.
with a focus on machine learning at UC Irvine in
the late 1980s.
- Deliverables: ToEE (RPG) game simulator API
- This will be used by some other collaborators
(e.g., U. Michigan)
76Collaborator: ISLE
Summary
- PIs: Dr. Seth Rogers, Dr. Pat Langley
- Experience
- ISLE (Institute for the Study of Learning and
Expertise) is known for its ICARUS cognitive
architecture, which is distinguished in part by
its commitment to ground every symbol in a
physical-world object
- Pat Langley, founder of the journal Machine
Learning, is known for his expertise in cognitive
architectures and evaluation methodologies for
learning systems.
- Deliverables
- ICARUS reasoning system API
- FreeCiv agent (with assistance from NWU) and
SimCity agent
- This will also be used by USC/ICT
- SimCity (RTS) game simulator API
77Collaborator: Lehigh U.
Summary
- PI: Prof. Héctor Muñoz-Avila
- Experience
- Héctor is an expert on hierarchical planning
technology, and in particular has expertise in
case-based planning
- Collaborating with NRL on TIELT during CY04 on
(1) Game Model description representations, (2) a
Stratagus/Wargus game simulator API, and (3)
feedback on TIELT usage
- Deliverables
- Software for translating among Game Model
representations
- Stratagus/Wargus (RTS) game simulator API
- This may be used by UT Austin
- Case-based planning reasoning system API
78Collaborator: NWU
Summary
- PIs: Prof. Ken Forbus, Prof. Tom Hinrichs
- Experience
- Ken is a leading AI/games researcher. He is also
the leading worldwide researcher in computational
approaches to reasoning by analogy.
- Ken's group has extensive experience with
qualitative reasoning approaches and with using
the FreeCiv gaming simulator.
- Deliverables
- FreeCiv (discrete strategy) game simulator API
- This will be used by ISLE
- Qualitative spatial reasoning system API for
FreeCiv
79Collaborator: U. Michigan
Summary
- PI: Prof. John Laird
- Experience
- John is the best-known AI/games researcher, and
has extensive experience with integrating many
commercial, freeware, and military game
simulators with the Soar cognitive architecture.
- Deliverables
- Soar reasoning system API
- This will be used by USC/ICT
- Applications of Soar to two game simulators
(e.g., ToEE, Wargus)
80Collaborator: USC/ICT
Summary
- PI: Dr. Michael van Lent
- Experience
- Extensive implementation experience with AI/game
research; his PhD advisor was John Laird.
- Led ICT's development of Full Spectrum Warrior
and Full Spectrum Command (FSC) in collaboration
with Quicksilver Software and the Army's PEO
STRI. FSC is deployed at Ft. Benning and in
Afghanistan.
- Editor-in-Chief, Journal of Game Development
- Deliverables
- FSC (RTS) game simulator API
- Applications of FSC with U. Michigan's Soar and
ISLE's ICARUS
81Collaborator: UT Arlington
Summary
- PIs: Prof. Larry Holder, G. Michael Youngblood
- Experience
- Larry has extensive experience with developing
unsupervised machine learning systems that use
relational representations, and has led efforts
on developing the D'Artagnan cognitive
architecture.
- Deliverables
- Urban Terror (FPS) game simulator API
- D'Artagnan reasoning system API (partial)
82Collaborator: UT Austin
Summary
- PI: Prof. Risto Miikkulainen
- Experience
- Risto has significant experience with integrating
neuro-evolution and similar approaches with game
simulators.
- Collaborating with UT Austin's Digital Media
Laboratory on its development of the NERO (FPS)
game simulator
- Deliverables
- Knowledge-intensive neuro-evolution reasoning
system API
- Application of this API using other simulators
(e.g., FSC, Wargus) and U. Wisconsin's
advice-processing module
83Collaborator: U. Wisconsin
Summary
- PIs: Prof. Jude Shavlik (UW), Prof. Richard
Maclin (U. Minn-Duluth)
- Experience
- Jude advised the first significant M.S. thesis on
applying machine learning to FPS game simulators
(Geisler, 2002)
- Maclin, who will be on sabbatical at U. Wisconsin
during this project, has performed extensive work
applying AI techniques (e.g., advice processing)
to the RoboCup game simulator
- Deliverables
- RoboCup (team sports) game simulator API
- Advice-processing module
- WWW-based repository for TIELT software
components (e.g., APIs)