Title: Testbed for Integrating and Evaluating Learning Techniques (TIELT)

1. Testbed for Integrating and Evaluating Learning Techniques (TIELT)
David W. Aha (1), Matthew Molineaux (2)
(1) Intelligent Decision Aids Group, Navy Center for Applied Research in AI, Naval Research Laboratory, Washington, DC
(2) ITT Industries, AES Division, Alexandria, VA
first.surname_at_nrl.navy.mil
17 November 2004
2. Outline
- Motivation: Learning in cognitive systems
- Objectives
  - Encourage machine learning research on complex tasks that require knowledge-intensive approaches
  - Provide industry & military with access to the results
- Design: TIELT functionality & components
- Example: Knowledge base content
- Status
  - Implementation & documentation
  - Collaborations & events
  - Task list
- Summary
3. DARPA
- Defense Advanced Research Projects Agency ($2.3B/yr)
4. Cognitive Systems
Systems that know what they're doing
- A cognitive system is one that
  - can reason, using substantial amounts of appropriately represented knowledge
  - can learn from its experience so that it performs better tomorrow than it did today
  - can explain itself and be told what to do
  - can be aware of its own capabilities and reflect on its own behavior
  - can respond robustly to surprise
5. Anatomy of a Cognitive Agent (Brachman, 2003)
(Diagram) A cognitive agent comprises reflective processes, deliberative processes (learning, prediction, planning, other reasoning), and reactive processes. Long-term memory (LTM) holds concepts and short-term memory (STM) holds sentences. Communication (language, gesture, image), perception, attention, and action connect the agent, via sensors and effectors, to the external environment.
6. Learning in Cognitive Systems (Langley & Laird, 2002)
Many opportunities exist for learning in cognitive systems.
7. Status of Learning in Cognitive Systems
Problem
- Few deployed cognitive systems integrate techniques that exhibit rapid & enduring learning behavior on complex tasks
- It's costly to integrate & evaluate embedded learning techniques
8. TIELT Motivation
- We want Cognitive Agents that Learn
  - Rapidly,
  - in context, and
  - over the long-term.
- We have few (if any) of them
9. TIELT Objective
- Encourage the study of research on learning in cognitive systems, with subsequent transition goals
(Diagram) ML Researchers produce Learning Modules; combined with Cognitive Agents, these yield Cognitive Agents That Learn, for transition to Industry and the Military.
10. Current ML Research Focus
- Benchmark studies of multiple algorithms on simple (e.g., supervised) learning tasks from many static datasets
(Diagram) An ML researcher runs ML Systems 1..n against Databases 1..m, obtaining m results per system for benchmark analysis.
This was encouraged (in part) by the availability of datasets in a standard (interface) format.
11. Previous API for ML Investigations
Inspiration
- UC Irvine Repository of Machine Learning (ML) Databases
- An interface for empirical benchmarking studies on supervised learning
- 1525 citations (and many publications use it w/o citing) since 1986
(Diagram) Supervised Learning: ML System j or Decision System k connects to Database i through an interface in a standard format.
12. Accomplishing TIELT's Objective
- One approach: Shift ML research focus from static datasets to dynamic simulators of rich environments
13. Refining TIELT's Objective
Objective
- Develop a tool for evaluating decision systems in simulators
- Specific support for evaluating learning techniques
- Demonstrate research utility prior to approaching industry/military
Benefits
- Reduce system-simulator integration costs from m×n to m+n (see next)
- Permits benchmark studies on selected simulator tasks
- Encourages study of ML for knowledge-intensive problems
- Provide support for DARPA Challenge Problems on Cognitive Learning
14. Reducing Integration Costs
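The original slide's diagram is not preserved; it presumably contrasted pairwise integrations (each decision system wired to each simulator) with mediated ones. A minimal sketch of the arithmetic behind the cost claim (function names are illustrative):

```python
# Without a mediator, each of m decision systems needs a custom
# adapter for each of n simulators: m * n integrations.
def pairwise_integrations(m: int, n: int) -> int:
    return m * n

# With TIELT mediating, each decision system and each simulator is
# integrated once, against TIELT itself: m + n integrations.
def mediated_integrations(m: int, n: int) -> int:
    return m + n

# e.g., 10 learning systems and 8 gaming simulators:
assert pairwise_integrations(10, 8) == 80
assert mediated_integrations(10, 8) == 18
```

The gap widens as either side of the ecosystem grows, which is the point of the slide.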
15. What Domain?
Desiderata
- Available implementations (cheap to acquire & run)
- Challenging problems for CogSys/ML research
- Significant interest (academia, military, industry, funding, public)
Simulation Games?
16. Gaming Genres of Interest (modified from (Laird & van Lent, 2001))

Genre | Example | Description | Sub-Genres | AI Roles
Action | Quake, Unreal | Control a character | 1st vs. 3rd person, solo vs. team play | Control enemies
Role-Playing | Temple of Elemental Evil | Be a character (includes puzzle solving, etc.) | Solo vs. (massively) multi-player | Control enemies, partners, and supporting characters
Strategy (real-time, discrete) | Empire Earth 2, AoE, Civilization | Controlling at multiple levels (e.g., strategic, tactical warfare) | God, first-person perspectives | Control all units and strategic enemies
Team Sports | Madden NFL Football | Act as coach and a key player | | Control units and strategic enemy (i.e., other coach), commentator
Individual Sports | Many (e.g., driving games) | Individual competition | 1st vs. 3rd person | Control enemy
17. Some Game Environment Challenges
- Significant background knowledge available
  - e.g., Processes, tasks, objects, actions
  - Use: Provide opportunities for rapid learning
- Adversarial
- Collaborative
- Multiple reasoning levels (e.g., strategic, tactical)
- Real-time
- Uncertainty (Fog of War)
- Noise (e.g., imprecision)
- Relational (e.g., social networks)
- Temporal
- Spatial
18. Academia: Learning in Simulation Games
Focus: Broad interests
- Game engines (e.g., GameBots, ORTS, RoboCup Soccer Server)
- Use (other) open source engines (e.g., FreeCiv, Stratagus)
- Representation (e.g., Forbus et al., 2001; Houk, 2004; Munoz-Avila & Fisher, 2004)
- Learning opponent & unit models (e.g., Laird, 2001; Hill et al., 2002)
- (see table)
Evidence of commitment
- "Interactive Computer Games: Human-Level AI's Killer Application" (Laird & van Lent, AAAI'00 Invited Talk)
- Meetings
  - AAAI symposia (several in recent years)
  - International Conference on Computers and Games
  - AAAI'04 Workshop on Challenges in Game AI
  - AI in Interactive Digital Entertainment Conference (2005-)
- New journals focusing on (e.g., real-time) simulation games
  - J. of Game Development
  - Int. J. of Intelligent Games and Simulation
19. Survey: Selected Previous Work on Learning in Gaming Simulators

Name | Reference | Method | Learning | Performance | Test Plan & Metrics (independent variables to vary and dependents to measure)
 | (Goodman, AAAI93) | Projective Visualization | 1 TDIDT per feature cluster | Predict amount of inflicted damage | Vary training amount & projection length; predict summed pain
MAYOR | (Fasciano, 1996 M.S. Thesis) | Case-based planning | Plan Execution Conds. | Maximize SimCity Game Score | Online; vary whether learning was used; measure successful plan executions
 | (Fogel et al., CCGFBR96) | Genetic Alg. | Rule learning | 1x1 tank battles | Vary locations/space of routes; measure damage
KnoMic | (van Lent & Laird, ICML98) | Production Rules | Rule Conds. & Goals | Racetrack Mission for TacAir-SOAR | Measure speed with which KnoMic learned correct control rules
 | (Agogino et al., 1999 NPL) | Neuro-evolution | Wt. & genetic learning | 30 gold-collecting peons vs. 1 human | Vary learning methodology; measure survival rate of peons
 | (Laird, ICAA01) | SOAR Chunking | Rule learning | Predict enemy beh. | None; would focus on speedup
 | (Geisler, 2002 M.S. Thesis) | NB, TDIDT, BP, ensembles | Depends on the method | 4 simple classification tasks | Vary training set size & ensembles; measure classification accuracy
 | (Bryant & Miikkulainen, CEC03) | Neuroevolution | NN wts., etc. | Discrete Legions vs. Barbarians | Offline; vary training set size; measure a game-specific fn.
 | (Chia & Williams, BRIMS03) | Naïve Bayes | Learning to add/del. rules | 1x1 tank battles | Vary adversarial aggressiveness & whether learning occurs; measure wins
 | (Fagan & Cunningham, ICCBR03) | Case-based prediction | Selecting plans to save | Predict a player's action | Vary the stored plans and the user; measure acc. & prediction freq.
 | (Guestrin et al., IJCAI03) | Relational MDPs | Partition objects | Beat enemy in 3x3 Freecraft games | Simplistic; one run
 | (Sweetser & Dennis, 2003 Ent. Computing: Tech. & Applications) | Advice giving | Regression wts. | Just-in-time hints to human player | Vary with vs. without providing hints; measure hints that were useful
 | (Spronck et al., 2004 IJIGS) | Dynamic Scripting | Rule wts. | Beat NWN AI in simple scenarios | Offline; measure average turning point, speed, effectiveness, robustness, efficiency
 | (Ponsen, 2004 M.S. Thesis) | Dynamic Scripting & GA for rule learning | Rule wts. and new rules | Defeat Wargus opponent | Offline; vary map size, learning algorithm, and opponent control alg.; measure wins
 | (Ulam et al., AAAI04 Workshop) | Self-adaptation | Task edits | Defend city (FreeCiv) | Offline; vary trace size; measure successes
20. Industry: Learning in Simulation Games
Focus: Increase sales via enhanced gaming experience
- USA: $7B in sales in 2003 (ESA, 2004)
  - Strategy games: $0.3B
- Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
- Target: Control avatars, unit behaviors
Evidence of commitment
- "Developers keenly interested in building AIs that might learn, both from the player & environment around them." (GDC03 Roundtable Report)
- Middleware products that support learning (e.g., MASA, SHAI, LearningMachine)
- Long-term investments in learning (e.g., iKuni, Inc.)
- Conferences
  - Game Developers Conference
  - Computer Game Technology Conference
21. Industry: Learning in Simulation Games
Status
- Few deployed systems have used learning (Kirby, 2004), e.g.,
  - Black & White: on-line, explicit (player immediately reinforces behavior)
  - C&C Renegade: on-line, implicit (agent updates set of legal paths)
  - Re-volt: off-line, implicit (GA tunes racecar behaviors prior to shipping)
- Problems: Performance, constraints (preventing learning something dumb), trust in learning system
Some Promising Techniques (Rabin, 2004)
- Belief networks for probabilistic inference
- Decision tree learning
- Genetic algorithms (e.g., for offline parameter tuning)
- Statistical prediction (e.g., using N-grams to predict future events)
- Neural networks (e.g., for offline applications)
- Player modeling (e.g., to regulate game difficulty, model reputation)
- Reinforcement learning
- Weakness modification learning (e.g., don't repeat failed strategies)
22. Military: Learning in Simulation Games
Focus: Training, analysis, experimentation
- Learning: Acquisition of new knowledge or behaviors
- Simulators: JWARS, OneSAF, Full Spectrum Command, etc.
- Target: Control strategic opponent or own units
Evidence of commitment
- "Learning is an essential ability of intelligent systems" (NRC, 1998)
- "To realize the full benefit of a human behavior model within an intelligent simulator, the model should incorporate learning" (Hunter et al., CCGBR00)
- "Successful employment of human behavior models ... requires that they possess the ability to integrate learning" (Banks & Stytz, CCGBR00)
- Conferences: BRIMS, I/ITSEC
Status: No CGF simulator has been deployed with learning (D. Reece, 2003)
- Some problems (Petty, CGFBR01)
  - Cost of training phase
  - Loss of training control
  - Learning non-doctrinal behaviors
  - Learning unpredictable behaviors
23. Analysis & Conclusions
State-of-the-art
- Research on learning in complex gaming simulators is in its infancy
- Knowledge-poor approaches are limited to simple performance tasks
- Knowledge-intensive approaches require huge knowledge bases, which to date have been manually encoded
- Existing approaches have many simplifying assumptions
  - Scenario limitations (e.g., on number and/or capabilities of adversaries)
  - Learning is (usually) performed only off-line
  - Learned knowledge is not transferred (e.g., to playing other games)
Significant advances would include
- Fast acquisition approaches for a large amount of domain knowledge
  - This would enable rapid learning without requiring manual encoding
- Demonstrations of on-line learning (i.e., within a single simulation run)
- Increasing knowledge transfer among tasks & simulators over time
  - e.g., knowledge of processes, strategies, tasks, roles, objects, actions
24. TIELT Specification
- Simplifies integration & evaluation!
  - Learning-embedded decision systems & gaming simulators
  - Supports communications, game model, perf. task, evaluation
  - Free & available
- Learning foci
  - Task (e.g., learn how to execute, or advise on, a task)
  - Player (e.g., accept advice, predict a player's strategies)
  - Game (e.g., learn/refine its objects, their relations, behaviors)
- Learning methods
  - Supervised/unsupervised, immediate/delayed feedback, analytic, active/passive, online/offline, direct/indirect, automated/interactive
  - Learning results should be available for inspection
- Gaming simulators: Those with challenging learning tasks
- Reuse
  - Communications are separated from the game model & perf. task
  - Provide access to libraries of simulators & decision systems
25. Distinguishing TIELT

System | Focus | Game Engine(s) | Prominent Feature | Reasoning Activity
DirectIA (MASA) | AI SDK | FPS, RTS, etc. | Behavior authoring | Sense-act
SimBionic (SHAI) | AI SDK | FPS, etc. | Behavior authoring | Sense-act
FEAR | AI SDK | Quake 2, etc. | Behavior authoring | Sense-act
RoboCup | Research Testbed | RoboCup | Soccer game play | Sense-act, coaching, etc.
GameBots | Research Testbed | UT (FPS) | UT game play | Sense-act
ORTS | Research Testbed | RTS games | Hack-free MM RTS | Sense-act, strategy
TIELT | Research Testbed | Several genres | Experimentation for evaluating learning & learned behaviors | Sense-act, advice processing, prediction, model updating, etc.

- Provides an interface for message-passing interfaces
- Supports composable system-level interfaces
26. TIELT Integration Architecture
(Diagram) Components:
- TIELT's User Interface: Evaluation, Prediction, Coordination, and Advice Interfaces, used by the TIELT User
- TIELT's Internal Communication Modules, connecting a Selected Game Engine (drawn from the Game Engine Library, played by the Game Player(s)) with a Selected Decision System
- Learned Knowledge (inspectable)
- TIELT's KB Editors, used by the TIELT User
- Selected/Developed Knowledge Bases: Game Model (GM), Agent Description (AD), Game Interface Model (GIM), Decision System Interface Model (DSIM), Experiment Methodology (EM)
- Knowledge Base Libraries, each holding GM, AD, EM, GIM, and DSIM entries
27. TIELT's Knowledge Bases
Game Interface Model
- Defines communication processes with the game engine
Decision System Interface Model
- Defines communication processes with the decision system
Game Model
- Defines interpretation of the game
  - e.g., initial state, classes, operators, behaviors (rules)
- Behaviors could be used to provide constraints on learning
Agent Description
- Defines what decision tasks (if any) TIELT must support
Experiment Methodology
- Defines selected performance tasks (taken from the Game Model Description) and the experiment to conduct
28. TIELT-Supported Performance Tasks
Performance vs. learning tasks
- Performance: Application of the learned knowledge (e.g., classification)
- Learning: Activity of the learning system (e.g., update weights in a neural net)
TIELT users will define complex, user-configurable performance tasks.
29. An Example Complex Learning Task
Task description
- Win a real-time strategy game
- This involves several challenging learning tasks
Subtasks and supporting operations
- Diagnosis: Identify (computer and/or human) opponent strategies & goals
- Classification: Opponent recognition
- Recording: Actions of opponents and their effects
  - This repeatedly involves classification
- Diagnosis: Identify goal(s) being solved by these effects
- Classification: Identify goal(s) that, if solved, prevent opponent goals
- Planning: Select/adapt or create plan to achieve goals and win the game
- Classification: Select top-level actions to achieve goals
- Iteratively identify necessary sub-goals and, finally, primitive actions
- Design (parametric): Identify good initial layout of controllable assets
- Execute plan
- Recording: Collect measures of effectiveness, to provide feedback
- Planning: If needed, re-plan, based on feedback, at Step 2
30. Use: Controlling a Game Character
(Diagram) The integration-architecture diagram of slide 26, with TIELT mediating between the Selected Game Engine and the Selected Decision System.
31. UT Example: Game Model

State Description
- Players: Array of Player
- Self: Player
- Score: Integer

Classes
- Player: Team: String; Number: Integer; Position: Location
- Location: x: Integer; y: Integer; z: Integer

Operators
- Shoot(Player)
  - Preconditions: Player.isVisible
  - Effects: Player.Health -= rand(10)
- MoveTo(Location)
  - Preconditions: Location.isReachable()
  - Effects: Self.position = Location

Rules
- GetShotBy(Player)
  - Preconditions: Player.hasLineOfSight(Self)
  - Effects: Self.Health -= rand(10)
- EnemyMovements(Enemy, Location1, Location2)
  - Preconditions: Location2.isReachableFrom(Location1); Enemy.position = Location1
  - Effects: Enemy.position = Location2
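The Game Model above pairs each operator with preconditions and effects. A minimal sketch of how such an operator might be checked and applied, using the slide's Shoot(Player) operator; the class and function names are hypothetical, not TIELT's actual representation:

```python
import random

class Player:
    def __init__(self, is_visible=True, health=100):
        self.isVisible = is_visible
        self.Health = health

# Sketch of the Shoot(Player) operator from the slide:
# precondition Player.isVisible, effect Player.Health -= rand(10).
def shoot(target: Player) -> bool:
    if not target.isVisible:                # precondition check
        return False
    target.Health -= random.randint(0, 10)  # apply the effect
    return True

enemy = Player()
applied = shoot(enemy)
assert applied and enemy.Health <= 100
```

The same check-then-apply shape covers the MoveTo operator and the GetShotBy/EnemyMovements rules.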
32. UT Example: Game Interface Model
Communication
- Medium: TCP/IP, Port 3000
- Message Format: <name> <attr1> <value1> <attr2> <value2> ...
- Examples: interface messages from the GameBots API
  - http://www.planetunreal.com/gamebots/docapi.html
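A minimal sketch of parsing the whitespace-delimited `<name> <attr1> <value1> ...` format described above; the sample message is illustrative, not taken from the GameBots API:

```python
def parse_message(raw: str):
    """Split '<name> <attr1> <value1> <attr2> <value2> ...' into
    a (name, {attr: value}) pair."""
    tokens = raw.split()
    name, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("attributes must come in name/value pairs")
    attrs = dict(zip(rest[0::2], rest[1::2]))
    return name, attrs

# Hypothetical sensor message:
name, attrs = parse_message("See Id Pod1 Location 12,40,7")
assert name == "See" and attrs["Id"] == "Pod1"
```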
33UT Example Decision System Interface Model
34. UT Example: Agent Description
Think-Act Cycle
- Shoot Something → Call Shoot Operator
- Pick up a Healthpack → Call Pickup Operator
- Go Somewhere Else → Ask Decision System: "Where Do I Go?"
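The think-act cycle above can be sketched as a dispatch loop; the option names follow the slide, while the handlers and their return values are hypothetical illustrations:

```python
# Each think-act option maps to the handler the slide pairs it with.
def shoot_operator() -> str:
    return "Shoot"

def pickup_operator() -> str:
    return "Pickup"

def ask_decision_system(query: str) -> str:
    return f"DS? {query}"   # defer the choice to the decision system

def think_act(option: str) -> str:
    if option == "Shoot Something":
        return shoot_operator()
    if option == "Pick up a Healthpack":
        return pickup_operator()
    if option == "Go Somewhere Else":
        return ask_decision_system("Where Do I Go?")
    raise ValueError(f"unknown option: {option}")

assert think_act("Shoot Something") == "Shoot"
```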
35. UT Example: Experiment Methodology
Initialization
- Game Model: Unreal Tournament.xml
- Game Interface: GameBots.xml
- Decision System: MyUTBot.xml
- Runs: 100
- Call slowdown(0.5)
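A minimal sketch of what such an experiment specification might drive: the file names and parameters come from the slide, but the runner itself is hypothetical:

```python
# Hypothetical experiment driver: takes a spec naming the three KB
# files, then repeats the run the requested number of times.
def run_experiment(spec: dict, run_once=lambda: "done"):
    results = [run_once() for _ in range(spec["runs"])]
    return results

spec = {
    "game_model": "Unreal Tournament.xml",
    "game_interface": "GameBots.xml",
    "decision_system": "MyUTBot.xml",
    "runs": 100,
    "slowdown": 0.5,   # Call slowdown(0.5) at initialization
}
results = run_experiment(spec)
assert len(results) == 100
```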
36. Use: Predicting Opponent Actions
(Diagram) The integration-architecture diagram of slide 26, additionally annotated with Raw State and Processed State flows.
37. Use: Updating a Game Model
(Diagram) The integration-architecture diagram of slide 26.
38. TIELT: A Researcher Use Case
- Define/store decision system interface model
- Select game simulator & interface
- Select game model
- Select/define performance task(s)
- Define/select expt. methodology
- Run experiments
- Analyze displayed results
(Diagram) Selected/Developed Knowledge Bases and Knowledge Base Libraries, as in slide 26.
39. TIELT: A Game Developer Use Case
- Define/store game interface model
- Define/store game model
- Select decision system/interface
- Define performance task(s)
- Define/select expt. methodology
- Run experiments
- Analyze displayed results
(Diagram) Selected/Developed Knowledge Bases and Knowledge Base Libraries, as in slide 26.
40. TIELT's Internal Communication Modules
(Diagram) Components: a Database with its Database Engine; the Evaluation Interface and Evaluator; the Advice Interface; the Controller; State (Stored State and Current State); a Translated Model (Subset); the Learning Translator (Mapper) and Model Updater; the Action/Control Translator (Mapper); the Selected Decision System (exchanging the Learning Task and Learning Outputs); the Selected Game Engine (exchanging Percepts and Actions); the Perf. Task; and the knowledge bases (Game Model, Game Interface Model, Agent Description, Decision System Interface Model, Experiment Methodology), each with an editor used by the User.
41. Sensing the Game State (City placement example, inspired by Alpha Centauri, etc.)
1. In the Game Engine, the game begins: a colony pod is created and placed.
2. The Game Engine sends a "See" sensor message identifying the pod's location.
3. The Model Updater receives the sensor message and finds the corresponding message template in the Game Interface Model.
4. This message template provides updates (instructions) to the Current State, telling it that there is a pod at the location "See" describes.
5. The Model Updater notifies the Controller that the "See" event has occurred.
(Diagram components: Game Engine, Sensors, Actions, Action Translator, Model Updater, Controller, Current State, Game Model, Game Interface Model, and their editors.)
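The five steps above amount to template-driven state updating: a message template maps a sensor message onto an update of the Current State, followed by a notification to the Controller. A minimal sketch, with hypothetical names and a dict standing in for the Current State:

```python
# Message templates (from the Game Interface Model) map a sensor
# message name to an update applied to the Current State.
templates = {
    "See": lambda state, attrs: state.__setitem__(attrs["Id"], attrs["Location"]),
}

def handle_sensor(state: dict, name: str, attrs: dict, notify):
    templates[name](state, attrs)   # step 4: update the Current State
    notify(name)                    # step 5: tell the Controller

events = []
current_state = {}
handle_sensor(current_state, "See", {"Id": "Pod1", "Location": "12,40"}, events.append)
assert current_state == {"Pod1": "12,40"} and events == ["See"]
```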
42. Fetching Decisions from the Decision System (City placement example)
1. The Controller notifies the Learning Translator that it has received a "See" message.
2. The Learning Translator finds a city-location task, which is triggered by the "See" message. It queries the Controller for the learning mode, then creates a TestInput message to send to the reasoning system with information on the pod's location and the map from the Current State.
3. The Learning Translator transmits the TestInput message to the Decision System.
4. The Decision System transmits output to the Action Translator.
(Diagram components: Controller, Learning Translator, Current State, Translated Model (Subset), Selected Decision System with Learning Modules 1..n, Learning Outputs, Action Translator, Agent Description, Decision System Interface Model, and their editors.)
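Step 2 above, translating a triggering message into a TestInput for the decision system, can be sketched as follows (the field names and trigger table are hypothetical):

```python
# The Learning Translator looks up the task triggered by a message
# and packages Current State information into a TestInput message.
tasks_by_trigger = {"See": "city location"}

def build_test_input(trigger: str, current_state: dict, learning_mode: str) -> dict:
    task = tasks_by_trigger[trigger]
    return {
        "type": "TestInput",
        "task": task,
        "mode": learning_mode,          # queried from the Controller
        "state": dict(current_state),   # e.g., pod location and map
    }

msg = build_test_input("See", {"Pod1": "12,40"}, "online")
assert msg["task"] == "city location"
```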
43. Acting in the Game World (City placement example)
1. The Action Translator receives a TestOutput message from the Decision System.
2. The Action Translator finds the TestOutput message template, determines it is associated with the city-location task, and builds a MovePod operator (defined by the Current State) with the parameters of TestOutput.
3. The Action Translator determines that the "Move" action from the Game Interface Model is triggered by the MovePod operator and binds "Move" using information from MovePod.
4a. The Game Engine receives "Move" and updates the game to move the pod toward its destination, or
4b, c. The Advice Interface receives "Move" and displays advice to a human player on what to do next, or the Prediction Interface makes a prediction.
(Diagram components: Current State, Action Translator, Actions, Game Engine, Advice Interface, Prediction Interface, Game Interface Model, Decision System Interface Model, and their editors.)
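A minimal sketch of steps 1-4a, translating a TestOutput back through an operator into a game action; all names are hypothetical illustrations of the two-stage translation:

```python
# Stage 1: TestOutput -> MovePod operator (city-location task).
def to_operator(test_output: dict) -> dict:
    return {"op": "MovePod", "dest": test_output["dest"]}

# Stage 2: MovePod operator -> "Move" game action, bound with the
# operator's parameters per the Game Interface Model.
def to_game_action(operator: dict) -> str:
    return f"Move {operator['dest']}"

action = to_game_action(to_operator({"dest": "14,41"}))
assert action == "Move 14,41"
```

Keeping the two stages separate mirrors the design point of the slides: the decision system's vocabulary and the game engine's vocabulary never meet directly.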
44. TIELT Status (November 2004)
Implementation
- TIELT (v0.5) available
- Features
  - Message protocols
    - Current: Console I/O, TCP/IP, UDP
    - Future: Library calls, HLA interface, RMI (possibly)
  - Message content: Configurable
    - Instantiated templates tell it how to communicate with other modules
    - Initialization messages: Start, Stop, Load Scenario, Set Speed
  - Game Model representations (w/ Lehigh University)
    - Simple programs
    - TMK process models
    - PDDL (language used in planning competitions)
45. TIELT Status (November 2004)
Documentation
- TIELT User's Manual (82 pages)
- TIELT Overview
- The TIELT User Interface
- Scripting in TIELT
- Theory of the Game Model
- Communications
- TMK Models
- Experiments
- TIELT Tutorial (45 pages)
- The Game Model
- The Game Interface Model
- Decision System Interface Model
- Agent Description
- Experiment Methodology
46. TIELT Status (November 2004)
Access
- TIELT www site (new)
- Selected Components
  - Documents: Documentation, publications, XML Spec
  - Status
  - Forum: A full-featured web forum/bulletin board
  - Bug Tracker: TIELT bug/feature tracking facility
  - FAQ-o-Matic: Questions and problem solutions (user-driven)
  - Download
47. TIELT Issues (November 2004)
1. Communication
- TIELT is a "multilingual" application; this provides interfacing with many different games (TCP/IP, Library Calls, SWIG).
2. Resources for learning to use TIELT
- TIELT Scripting syntax highlighting
- Map of TIELT Component Interactions (Thanks, Megan)
- Typed script interface
48. TIELT Issues (November 2004)
3. Formatting: Game Model
- To no one's surprise, everyone agrees that TIELT's Game Model representation is inadequate.
- Requests have been made for
  - 3D Maps (Quake)
  - A different programming language
  - A relational operator representation
  - Standardized events
49. TIELT Collaborations (2004-05)
(Diagram) The integration architecture annotated with collaborators:
- User interfaces (Prediction, Evaluation, Coordination, Advice): U.Minn-D., USC/ICT, U.Mich.
- Decision System Library: Soar (U.Mich.), ICARUS (ISLE), DCA (UT Arlington)
- Learning Modules: Neuroevolution (UT Austin)
- Game Library: EE2 (Mad Doc), Troika, FreeCiv (NWU), Others: Many
- Knowledge Bases: Game Model (LU, USC), Task Descriptions (U.Mich./ISLE), Game Interface Model (U.Mich.), Decision System Interface Model and Experiment Methodology (Many)
50. TIELT Collaboration Projects (2004-05)

Organization | Game Interface and Model | Decision System | Tasks and Evaluation Methodology
Mad Doc Software | Empire Earth 2 (RTS) | |
Troika Games | Temple of Elemental Evil (RPG) | |
ISLE | SimCity (RTS) | ICARUS | ICARUS w/ FreeCiv, design
Lehigh U. | Stratagus/Wargus (RTS), and HTN/TMK designs | Case-based planner (CBP) | Wargus/CBP
NWU | FreeCiv (discrete strategy), and qualitative game representations | |
U. Michigan | | SOAR | SOAR w/ 2 games (e.g., FSW, ToEE), design
U. Minnesota-Duluth | RoboCup (team sports) | Advice-taking components | Advice processing
USC/ICT | Full Spectrum Command (RTS) | | SOAR with FSC
UT Arlington | Urban Terror (FPS) | DCA (lite version) |
UT Austin | | Neuroevolution | e.g., Neuroevolution/EE2
51. Games Being Integrated with TIELT

Category | Gaming Simulator | Genre | Foci | Perspective
Commercial | Empire Earth II (Mad Doc S/W) | RTS | Civilization | God
Commercial | Temple of Elemental Evil (Troika) | Role-playing | Solve quests | 1st person
Commercial | SimCity (ISLE) | RTS | City manager | God
Freeware | FreeCiv (NWU) (Civilization clone) | Discrete strategy | Civilization | God
Freeware | Wargus (Lehigh U.) (Warcraft II clone) | RTS | Civilization | God
Freeware | Urban Terror (UT Arlington) | FPS | Shooter | 1st person
Freeware | RoboCup Soccer (UW) | Team sports | Team of agents | Behavior designer
Military | Full Spectrum Command (USC/Inst. Creative Technologies) | RTS | Leading an Army Light Infantry Company | 1st person
52. Promising Learning Strategies

Learning Strategy | Description | When to Use | Justification
Advice Giving | Expert explains how to perform in a given state (this is the only interactive strategy listed here) | Speedup needed; expert is available | Permits quick acquisition of specific and general domain knowledge
Backpropagation | Trains a 3-layer neural network (NN) of sigmoidal hidden units | Target is a non-linear function; offline training is OK | Many learning tasks are non-linear, and some can be performed off-line
Case-Based Reasoning | Use/adapt solutions from experiences to solve similar problems | Cases complement an incomplete domain model; problem-solving speed is crucial | Quicker to adapt cases than reason from scratch, but requires domain-specific adaptation knowledge
Chunking | Compile a sequence of steps into a macro | For tasks requiring speedup | Transforms a complex reasoning task into a fast retrieval task
Dynamic Scripting | RL for tasks with large state spaces that, with domain knowledge, can be collapsed into a smaller set | A small set of states exists, with a set of rules for each | Greatly speeds up the RL approach, but requires analysis of task states
Evolutionary Computation | Evolutionary (genetic) selection on a population of genomes, where the application dictates their representation | Search space is huge, and training can be done offline | Genome representations can be task specific, so this powerful search method can be tuned for the task
Meta Reasoning | After a failure, identifies its type & the task that failed, retrieves a task-specific strategy to avoid this failure, and updates its model | To support self-adaptation | Although knowledge intensive, this is an excellent method for changing problem-solving strategies
Neuroevolution | Uses a separate genetic algorithm population for learning each hidden unit's weights in a NN | To support cooperating heterogeneous agents | A good offline agent-based learning approach for multi-agent gaming
Reinforcement Learning (RL) | Reinforce a sequence of decisions after problem solving is completed | Reward is known only after the sequence ends, and blame can be ascribed | Well-understood paradigm for learning action policies (i.e., what action to perform in a given state)
Relational MDPs | Learn a Markov decision process over objects & their relations using probabilistic relational models | Seeking knowledge transfer (KT) to similar environments | KT is crucial for learning quickly, and feasibly, for some tasks
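The Dynamic Scripting entry above describes reinforcement of rule weights. A minimal sketch of such a weight update, following the general idea in (Spronck et al., 2004) rather than their exact formula; the rule names and constants are illustrative:

```python
# Each rule carries a weight used when drawing rules into a script.
# Rules used in a rewarded script gain weight; the remaining rules
# absorb the difference so the total weight mass stays constant.
def update_weights(weights: dict, used: set, reward: float, floor=1.0):
    total_before = sum(weights.values())
    for rule in used:
        weights[rule] = max(floor, weights[rule] + reward)
    delta = total_before - sum(weights.values())   # redistribute
    unused = [r for r in weights if r not in used]
    if unused:
        for rule in unused:
            weights[rule] = max(floor, weights[rule] + delta / len(unused))
    return weights

w = update_weights({"rush": 10.0, "turtle": 10.0, "expand": 10.0},
                   used={"rush"}, reward=5.0)
assert w["rush"] == 15.0 and w["turtle"] < 10.0
```

The `floor` keeps every rule selectable, preserving exploration, which is one of the properties the row's "Justification" column alludes to.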
53. TIELT-General Game Player Integration (with Stanford University's Michael Genesereth)
TIELT
- Experiment design/control capabilities
- Common game engine interface
- Support for several learning approaches
GGP-TIELT
- Play entire class of general games as well as TIELT-integrated gaming simulators.
- Compete remotely against reference players and other GGP systems.
- Define evaluation methodologies for learning experimentation.
- Participate in AAAI'05 GGP Competition.
GGP
- Logical game formalisms
- Access to remote players
- WWW access
54. Upcoming Events
- National Conference on AI (AAAI'05, 24-28 July, Pittsburgh)
  - General Game Playing Competition ($10K prize)
- Int. Joint Conference on AI (IJCAI'05, 30 July-5 August, Edinburgh)
  - Workshop: Reasoning, Representation, and Learning in Gaming & Simulation Tasks (tentative title)
- Int. Conference on ML (ICML'05, 7-11 August, Bonn)
  - Workshop submission in progress
- Int. Conference on CBR (ICCBR'05, 23-26 August, Chicago)
  - Workshop & Competition: CBR in Games
55. Summary
- TIELT: Mediates between a (gaming) simulator and a learning-embedded decision system
- Goals
  - Simplify running learning expts. with cognitive systems
  - Support DARPA challenge problems in learning
- Designed to work with many types of simulators & decision systems
- Status
  - TIELT (v0.5 Alpha) completed in 10/04
  - User's Manual, Tutorial, www site exist
  - 10 collaborating organizations (1-year contracts)
    - Enhances probability that TIELT will achieve its goals
  - We're planning several TIELT-related events
56. Backup Slides
57. Metrics
Research perspective
- Time required to develop reasoning interface & KB
- Ability to design/facilitate selected evaluation methodology
- Expressiveness of KB representation
- Breadth of learning techniques supported
- Breadth of learning and performance tasks supported
- Availability of integrated gaming simulators & challenges
Industry perspective
- Ability to develop learned/learning behaviors of interest
- Time required to
  - develop game interface model & KBs, and
  - develop these behaviors
- Availability of learning-embedded reasoning systems
- Support for both off-line and on-line learning
58. Some Expected User Metrics
Performance tasks
- Some standards
  - e.g., classification accuracy, ROC analyses, precision & recall
- Decision making speed and accuracy
- Plan execution quality (e.g., time to execute, mission-specific Measures of Effectiveness)
- Number of constraint violations
- Ability to transfer learned knowledge
59. TIELT: Potential Learning Challenge Problems
- Learn to win a game (i.e., accomplish an objective)
  - e.g., solve a challenging diplomacy task, provide a realistic military training course facing intelligent adversaries, or help users to develop real-time cognitive reasoning skills for a defined role in support of a multi-echelon mission
- Learn an adversary's strategy
  - e.g., predict a terrorist group's plan and/or tactics, suggest appropriate responses to prevent adversarial goals, help users identify characteristics of adversarial strategies
- Learn crucial processes of an environment
  - e.g., learn to improve an incorrect/incomplete game model so that it more accurately/reliably defines objects/agents in the game, their behaviors, their capabilities, and their limitations
- Intelligent situation assessment
  - e.g., learn which factors in the simulation require attention to accomplish different types of tasks
60. Example Game: FreeCiv (Discrete-time strategy)
Civilization II (MicroProse)
- Civilization II (1996-): 850K copies sold
- PC Gamer Game of the Year Award winner
- Many other awards
- Civilization series (1991-): Introduced the civilization-based game genre
FreeCiv (Civ II clone)
- Open source freeware
- Discrete strategy game
- Goal: Defeat opponents, or build a spaceship
- Resource management
  - Economy, diplomacy, science, cities, buildings, world wonders
  - Units (e.g., for combat)
- Up to 7 opponent civs
- Partial observability
http://www.freeciv.org
61Previous FreeCiv/Learning Research
(Ulam et al., AAAI04 Workshop on Challenges in
Game AI)
- Title: Reflection in Action: Model-Based
Self-Adaptation in Game Playing Agents
- Scenarios
- City defense: Defend a city for 3000 years
62FreeCiv CP Scenario
General description
- Game initialization: Your only unit, a settler,
is placed randomly on a random world (see Game
Options below). Players cyclically alternate play.
- Objective: Obtain the highest score, conquer all
opponents, or build the first spaceship
- Scoring: The basic goal is to obtain 1000 points.
Game options affect the score.
- Citizens: 2 pts per happy citizen, 1 per content
citizen
- Advances: 20 pts per World Wonder, 5 per
futuristic advance
- Peace: 3 pts per turn of world peace (no wars or
combat)
- Pollution: -10 pts per square currently polluted
- Top-level tasks (to achieve a high score)
- Develop an economy
- Increase population
- Pursue research advances
- Opponent interactions: Diplomacy and
defense/combat
Game Option Y1 Y2 Y3
World size Small Normal Large
Difficulty level Warlord (2/6) Prince (3/6) King (4/6)
Opponent civilizations 5 5 7
Level of barbarian activity Low Medium High
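The scoring rules above are simply additive, so they can be sketched as a small function (the function name and argument names are illustrative, not part of FreeCiv; the point values are those listed on this slide):

```python
def freeciv_score(happy, content, wonders, future_advances,
                  peace_turns, polluted_squares):
    """Score per the slide's rules: 2 pts per happy citizen,
    1 pt per content citizen, 20 pts per World Wonder, 5 pts per
    futuristic advance, 3 pts per turn of world peace, and
    -10 pts per currently polluted square."""
    return (2 * happy + content + 20 * wonders + 5 * future_advances
            + 3 * peace_turns - 10 * polluted_squares)

# e.g., 10 happy + 5 content citizens, 2 World Wonders,
# 100 peaceful turns, and 3 polluted squares:
print(freeciv_score(10, 5, 2, 0, 100, 3))  # 20+5+40+0+300-30 = 335
```

Note how far this example state is from the 1000-point goal; most of the score must come from sustained peace, happy citizens, and World Wonders.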
63FreeCiv CP Information Sources
Concepts in an Initial Knowledge Base
- Resources: Collection and use
- Food, production, trade (money)
- Terrain
- Resources gained per turn
- Movement requirements
- Units
- Type (military, trade, diplomatic, settlers,
explorers)
- Health
- Combat: Offense & defense
- Movement constraints (e.g., land, sea, air)
- Government: Types (e.g., anarchy, despotism,
monarchy, democracy)
- Research network: Identifies constraints on what
can be studied at any time
- Buildings (e.g., cost, capabilities)
- Cities
- Population growth
- Happiness
- Pollution
- Civilizations (e.g., military strength,
aggressiveness, finances, cities, units)
- Diplomatic states & negotiations
64FreeCiv CP Decisions
Civilization decisions
- Choice of government type (e.g., democracy)
- Distribution of income devoted to research,
entertainment, and wealth goals
- Strategic decisions affecting other decisions
(e.g., coordinated unit movement for trade)
City decisions
- Production choice (i.e., what to create,
including city buildings and units)
- Citizen roles (e.g., laborers, entertainers, or
specialists), and laborer placement
- Note: Locations vary in their terrain, which
generates different amounts of food, income, and
production capability
Unit decisions
- Task (e.g., where to build a city, whether/where
to engage in combat, espionage)
- Movement
Diplomacy decisions
- Whether to sign a proffered peace treaty with
another civilization
- Whether to offer a gift
65FreeCiv CP Decision Space
Variables
- Civilization-wide variables
- N: Number of civilizations encountered
- D: Number of diplomatic states (that you can have
with an opponent)
- G: Number of government types available to you
- R: Number of research advances that can be
pursued
- I: Number of partitions of income into
entertainment, money, and research
- U: Units
- L: Number of locations a unit can move to in a
turn
- C: Cities
- Z: Number of citizens per city
- S: Citizen statuses (i.e., laborer, entertainer,
doctor)
- B: Number of choices for city production
Decision complexity per turn (for a typical game
state)
- O(D^N * G * R * I * L^U * (S^Z * B)^C); this
ignores both other variables and domain knowledge
- This becomes large with the number of units and
cities
- Example: N=3, D=5, G=3, R=4, I=10, U=25, L=4,
C=8, Z=10, S=3, B=10
- Size of decision space (i.e., possible next
states): ≈2.5 × 10^65 (in one turn!)
- Comparison: The decision space of chess per turn
is well below 140 (e.g., 20 at the first move)
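As a sanity check, the complexity expression above can be evaluated directly. This is a minimal sketch (the function name is illustrative); the parameter names follow the slide's variable definitions:

```python
def decision_space(D, N, G, R, I, L, U, C, Z, S, B):
    """Rough one-turn decision-space size:
    D**N diplomatic choices, G government types, R research goals,
    I income splits, L**U joint unit moves, and (S**Z * B)**C
    per-city citizen-status and production choices."""
    return (D ** N) * G * R * I * (L ** U) * (S ** Z * B) ** C

# The typical mid-game state above: result is roughly 2.5e65
print(decision_space(D=5, N=3, G=3, R=4, I=10, L=4, U=25,
                     C=8, Z=10, S=3, B=10))

# A small early-game state (one city, one contacted civ, three units)
# gives roughly 1.2e9
print(decision_space(D=3, N=1, G=2, R=4, I=5, L=14, U=3,
                     C=1, Z=6, S=3, B=5))
```

Both results match the figures quoted in this deck, which is how the reconstructed exponents in the formula were verified.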
66FreeCiv CP A Simple Example Learning Task
Situation
- We're England (e.g., London)
- Barbarians are north (in red)
- Two other civs exist
- Our military is weak
What should we do?
- Ally with Wales? If so, how?
- Build a military unit? Which?
- Improve defenses?
- Increase the city's production rate?
- Build a new city to the south? Where?
- Research Gun Powder? Or?
- Move our diplomat back to London?
- A combination of these?
What information could help with this decision?
- Previous similar experiences
- Generalizations of those experiences
- Similarity knowledge
- Adaptation knowledge
- Opponent model
- Statistics on barbarian strength, etc.
67Analysis of the Example Learning Task
Complexity function
Situation
- D = 3 (war, neutral, peace)
- N = 1: Only 1 other civilization contacted (i.e.,
Wales)
- G = 2 government types known
- R = 4 research advances available
- I = 5 partitions of income available
- L = 14 per unit
- U = 3 units (1 external, 2 in city)
- C = 1 city
- S = 3 (entertainer, laborer, doctor)
- Z = 6 citizens
- B = 5 units/buildings it can produce
Decision Space Size
- ≈1.2 × 10^9
- This reduces to 32 sensible choices after
applying some domain knowledge
- e.g., don't change diplomatic status now, keep
units in the city for defense, don't change
government now (because it'll slow production),
and keep the external unit away from danger
68FreeCiv CP Learning Opportunities
Learn to keep citizens happy
- Citizens in a city who are unhappy will revolt;
this temporarily eliminates city production
- Several factors influence happiness (e.g.,
entertainment, military presence, gov't type)
Learn to obtain diplomatic advantages
- Countries at war tend to have decreased trade,
lose units and cities, etc.
- Diplomats can sometimes obtain peace treaties or
otherwise end wars
- Unit movement decisions can also impact
opponents' diplomatic decisions
Learn how to wage war successfully
- Good military decisions can yield new
cities/citizens/trade, but losses can be huge
- Unit decisions can benefit from learning
tactical coordinated behaviors
- The selection of military unit(s) for a task
depends on the opponent's capabilities
Learn how to increase territory size
- Initially, unexplored areas are unknown; their
resources (e.g., gold) cannot be harvested
- Exploration needs to be balanced with security
- City placement decisions influence territory
expansion
69FreeCiv CP Example Learned Knowledge
Learn what playing strategy to use in each
adversarial situation
- Situations are defined by relative military
strength, diplomatic status, whether the opponent
has strong alliances, locations of forces, etc.
- Selecting a good playing strategy depends on many
of these variables
70What Techniques Could Learn the Task of Selecting
a Playing Strategy?
Meta-reasoning (e.g., Ulam et al., AAAI04 Wkshp
on Challenges in Game AI)
- Requires knowledge of
- Tasks being performed
- Types of failures that can occur when performing
these tasks
- T2: Overestimate own strength, underestimate
enemy strength, ...
- T3: Incorrect assessment of enemy's diplomatic
status, ...
- Strategies for adapting these tasks
- S1: Increase military strength
- S2: Assess distribution of enemy forces
- S3: Consider enemy's diplomatic history
- Mapping of failure types in (2) to adaptation
strategies in (3)
- Example: We decided to Attack, but underestimated
enemy strength. This failure was indexed by
strategy S2, which we'll do from now on in T2.
[Diagram: Task structure. T1: Determine Playing
Strategy; T2: Assess Military Advantage; T3: Assess
Diplomatic Status; T4: Select Strategy (options:
Attack, Retreat!, Fortify, Trade, Seek Peace,
Bribe)]
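The failure-to-strategy mapping described above can be sketched as a lookup table. The dictionary keys and the `adapt` helper are illustrative assumptions for this deck, not Ulam et al.'s actual implementation:

```python
# Per-task failure types mapped to adaptation strategies, following
# the slide's T2/T3 failure examples and S1-S3 strategies.
FAILURE_TO_STRATEGY = {
    ("T2", "underestimated enemy strength"): "S2",  # assess enemy force distribution
    ("T2", "overestimated own strength"):   "S1",  # increase military strength
    ("T3", "wrong diplomatic assessment"):  "S3",  # consider enemy's diplomatic history
}

def adapt(task, failure):
    """After diagnosing which task failed and how, return the
    adaptation strategy indexed for future executions of that task
    (None if the failure type is unknown)."""
    return FAILURE_TO_STRATEGY.get((task, failure))

# The slide's example: we attacked but T2 underestimated enemy
# strength, so S2 is retrieved and applied from now on in T2.
print(adapt("T2", "underestimated enemy strength"))  # S2
```

The point of the sketch is that meta-reasoning needs this mapping as background knowledge; the next slide asks how such knowledge could itself be learned.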
71Challenges for Using Learning via Meta-Reasoning
How can its background knowledge be learned
(efficiently)?
- i.e., tasks, failure types, failure adaptation
strategies, and mappings
- Also, the agent needs to understand how to
diagnose an error (i.e., identify which task
failed and its failure type)
What if only incomplete background knowledge
exists?
- Could complementary learning techniques apply it?
- e.g., relational MDPs (which handle uncertainty)
- Could learning techniques be used to
extend/correct it?
- e.g., learning from advice, case-based reasoning
Can we scale it to more challenging learning
problems?
- Currently, it has only been applied to simpler
tasks
- Defend a City (in FreeCiv)
- More difficult would be Play Entire Game
72Full Spectrum Command & Warrior
(http://www.ict.usc.edu/disp.php?bd=proj_games)
Organization: USC's Institute for Creative
Technologies
- POC: Michael van Lent (Editor-in-Chief, Journal
of Game Development)
- Goal: Develop immersive, interactive, real-time
training simulations to help the Army create
decision-making & leadership-development tools
Focus: US Army training tools (deployed at Ft.
Benning & in Afghanistan)
- Full Spectrum Command (PC-based simulator)
- Role: Commander of a U.S. Army light infantry
company (120 soldiers)
- Tasks: Interpret the assigned mission, organize
the force, plan strategically, and coordinate the
actions of the company
- Full Spectrum Warrior (MS Xbox-based simulator)
- Role: Light infantry squad leader
- Tasks: Complete assigned missions safely
73METAGAME (Pell, 1992)
Focus: Learn strategies to win any game in a
pre-defined category
- Initial category: Chess-like games
- Games are produced by a game generator
- Input: Rules on how to play the game
- A move grammar is used to communicate actions
- Output (desired): A winning playing strategy
[Diagram: METAGAME architecture. The Game Manager
exchanges percepts, actions, and clock signals with
the players and games, maintains temporary state
data and records, and provides graphics for
spectators]
74Collaborator: Mad Doc Software
Summary
- PI: Ron Rosenberg (Producer)
- Experience
- Mad Doc is a leader in real-time strategy games;
Empire Earth II is expected to sell millions of
copies
- CEO Ian Davis (CMU PhD in Robotics) is a
well-known collaborator with the AI research
community, and gave an invited presentation at
AAAI04. He will work with Ron on this contract.
- Deliverables: Mad Doc (RTS) game simulator API
- This will be used by multiple other collaborators
75Collaborator: Troika Games
Summary
- PI: Tim Cain, Joint-CEO
- Experience
- Troika has outstanding experience with developing
state-of-the-art role-playing games, including
Temple of Elemental Evil (ToEE)
- A game developer since 1982, Tim obtained an M.S.
with a focus on machine learning at UC Irvine in
the late 1980s.
- Deliverables: ToEE (RPG) game simulator API
- This will be used by some other collaborators
(e.g., U. Michigan)
76Collaborator: ISLE
Summary
- PIs: Dr. Seth Rogers, Dr. Pat Langley
- Experience
- ISLE (Institute for the Study of Learning and
Expertise) is known for its ICARUS cognitive
architecture, which is distinguished in part by
its commitment to ground every symbol in a
physical-world object
- Pat Langley, founder of the journal Machine
Learning, is known for his expertise in cognitive
architectures and evaluation methodologies for
learning systems.
- Deliverables
- ICARUS reasoning system API
- FreeCiv agent (with assistance from NWU) and
SimCity agent
- This will also be used by USC/ICT
- SimCity (RTS) game simulator API
77Collaborator: Lehigh U.
Summary
- PI: Prof. Héctor Muñoz-Avila
- Experience
- Héctor is an expert on hierarchical planning
technology, and in particular has expertise in
case-based planning
- Collaborating with NRL on TIELT during CY04 on
(1) Game Model description representations, (2) a
Stratagus/Wargus game simulator API, and (3)
feedback on TIELT usage
- Deliverables
- Software for translating among Game Model
representations
- Stratagus/Wargus (RTS) game simulator API
- This may be used by UT Austin
- Case-based planning reasoning system API
78Collaborator: NWU
Summary
- PIs: Prof. Ken Forbus, Prof. Tom Hinrichs
- Experience
- Ken is a leading AI/games researcher. He is also
the leading worldwide researcher in computational
approaches to reasoning by analogy.
- Ken's group has extensive experience with
qualitative reasoning approaches and with using
the FreeCiv gaming simulator.
- Deliverables
- FreeCiv (discrete strategy) game simulator API
- This will be used by ISLE
- Qualitative spatial reasoning system API for
FreeCiv
79Collaborator: U. Michigan
Summary
- PI: Prof. John Laird
- Experience
- John is the best-known AI/games researcher, and
has extensive experience with integrating many
commercial, freeware, and military game
simulators with the Soar cognitive architecture.
- Deliverables
- Soar reasoning system API
- This will be used by USC/ICT
- Applications of Soar to two game simulators
(e.g., ToEE, Wargus)
80Collaborator: USC/ICT
Summary
- PI: Dr. Michael van Lent
- Experience
- Extensive implementation experience with AI/game
research; his PhD advisor was John Laird.
- Led ICT's development of Full Spectrum Warrior
and Full Spectrum Command (FSC) in collaboration
with Quicksilver Software and the Army's PEO
STRI. FSC is deployed at Ft. Benning and in
Afghanistan.
- Editor-in-Chief, Journal of Game Development
- Deliverables
- FSC (RTS) game simulator API
- Applications of FSC with U. Michigan's Soar and
ISLE's ICARUS
81Collaborator: UT Arlington
Summary
- PIs: Prof. Larry Holder, G. Michael Youngblood
- Experience
- Larry has extensive experience with developing
unsupervised machine learning systems that use
relational representations, and has led efforts
on developing the D'Artagnan cognitive
architecture.
- Deliverables
- Urban Terror (FPS) game simulator API
- D'Artagnan reasoning system API (partial)
82Collaborator: UT Austin
Summary
- PI: Prof. Risto Miikkulainen
- Experience
- Risto has significant experience with integrating
neuro-evolution and similar approaches with game
simulators.
- Collaborating with UT Austin's Digital Media
Laboratory on its development of the NERO (FPS)
game simulator
- Deliverables
- Knowledge-intensive neuro-evolution reasoning
system API
- Application of this API using other simulators
(e.g., FSC, Wargus) and U. Wisconsin's
advice-processing module
83Collaborator: U. Wisconsin
Summary
- PIs: Prof. Jude Shavlik (UW), Prof. Richard
Maclin (U. Minn-Duluth)
- Experience
- Jude advised the first significant M.S. thesis on
applying machine learning to FPS game simulators
(Geisler, 2002)
- Maclin, who will be on sabbatical at U. Wisconsin
during this project, has performed extensive work
applying AI techniques (e.g., advice processing)
to the RoboCup game simulator
- Deliverables
- RoboCup (team sports) game simulator API
- Advice-processing module
- WWW-based repository for TIELT software
components (e.g., APIs)