Title: Automated Tactics Modeling: Techniques and Applications
1Automated Tactics Modeling: Techniques and Applications
- Robert G Abbott
- Advisor: Stephanie Forrest
- April 5, 2007
2Objectives of Simulation-Based Training
- Increase
- Safety
- Availability
- Flexibility
- Decrease
- Cost
- Environmental Impact
3But is Simulation Training Valid?
- Topic selection
- Physics accuracy
- Human behavior simulation
- Corrective feedback
4Proposed Solution
- More SME Control
- Simulation as Interactive Media
Getting the message from here (Sender: SME), through the Medium, to here (Recipients: Students)
5Summary of Research
Technology | Application
1) Automated Perception | Grounding in Reality
2) Tactics Modeling | Models for Experimentation
3) Auto Student Evaluation | Corrective Feedback
4) Team Behavioral Cloning | Tactics Transferability
5) Agent Team Evolution | Simulate Learner
6Background and Related Work
7Training Transfer Studies
- The Gold Standard
- Few and far between (Carretta 1998)
So much for that new flight training software!
8SME Validation
(Wallace 1989)
- Highly subjective
- Limited SME availability
Wheeeee!
9Reverse the Process
From
Simulation
Reality
To
Simulation
Reality
10Conventional Expert System Construction
(Shadrick 2005)
SME
Knowledge Engineer
Programmer
Simulation
11Problems With Conventional Expert System Construction
SME
- Implicit knowledge
- Coordination
- Expensive
- Slow
Knowledge Engineer
Programmer
Simulation
12An Example From Soccer
Soccer SME: "Force the attacker to screen the ball with their first touch."
Knowledge Engineer: "Section 23, Paragraph 9: In defensive roles (section 7.21), players shall reverse the course of the ball prior to initial contact by the advancing opponent."
The Telephone Game
Programmer: else if (player.id > 1 && player.id < 4) dpos.x = cos((w.y-s.y)/(w.x-s.x));
Student: "I can beat the defender every time by turning left."
13Alternate Approach: Trainable Agents
SME
Software Developer
Software
Trainable Software
14Related Work: Behavioral Cloning
- Data
- Model
- Agents
- Generalization
- Applications
- Inverted pendulum (Widrow 1964)
- Aircraft piloting (Sammut 1992)
- Bicycle riding (Suc 1999)
- Soccer play in RoboCup (Aler 2005)
- Highly synthetic
15Related Work: Learning by Observation
- Robotics
- Applications
- Juggling (Schaal 2004)
- Table tennis (Atkeson 2000)
- Automated driving (Pomerleau 1991, Thrun 2006)
- Radio-controlled helicopter (Abbeel 2007)
- many, many more
- The simulator (if any) is just a byproduct
16Task Domain: RoboCup Simulation League
- Popular simulation of a popular sport
- Requires expertise
17Summary of Research
18MuTTSA
19Summary of Research
20Soccer Field Positioning
- Predict position of each player
- Captures high-level tactics
- Instance-based learning algorithm (IBL)
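The positioning idea above can be sketched as a nearest-neighbor lookup: store observed contexts together with the player position seen in each, then answer a query with the position from the closest stored context. This is only a minimal illustration; the function name, the choice of ball (x, y) as the context, and the sample data are assumptions, not the thesis implementation.

```python
import numpy as np

def predict_position(contexts, positions, query, k=1):
    """Predict a player's (x, y) from the k stored contexts nearest to the query."""
    contexts = np.asarray(contexts, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Euclidean distance from the query to every stored context.
    dists = np.linalg.norm(contexts - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    # With k=1 this returns the stored position itself; k>1 averages neighbors.
    return positions[nearest].mean(axis=0)

# Hypothetical data: ball (x, y) contexts and the player position observed in each.
ball_contexts = [[-30.0, 0.0], [0.0, 0.0], [30.0, 10.0]]
player_positions = [[-40.0, -5.0], [-15.0, -5.0], [5.0, 0.0]]

pred_1nn = predict_position(ball_contexts, player_positions, query=[25.0, 8.0])
pred_2nn = predict_position(ball_contexts, player_positions, query=[25.0, 8.0], k=2)
print(pred_1nn, pred_2nn)
```

With k=1 the prediction is always a position that was actually observed, whereas k=2 averages two observed positions and can land somewhere no player ever stood.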
21Model Parameters: Number of Observations
- Learning success
- Data is precious
- Up to 20 minutes of data
22Parameter Selection: Number of Contexts
- Amount of agent memory
- Over-fitting vs. over-generalization
- Computational and memory costs
- Up to 256 contexts
23Parameter Selection: Feature Selection
- Markov property
- Curse of dimensionality
- Conditions
- BallX
- Ball position
- Ball position + velocity
24Prediction Accuracy Matrix (NumObservations, NumContexts, Features)
Figure: mean squared error of player position prediction. Each point: 10-fold cross-validation for each of 10 agents. Markers: min(row), min(column).
25Training Data Reduces Prediction Error
26Additional Contexts Reduce Prediction Error
27Soccer Tactics Modeling Results
- (Some) team tactics can be modeled
- Ball position surprisingly effective
- Room for improvement
- But good enough to continue
28Summary of Research
29Automated Student Evaluation
- Behavior models effective in assessing student behavior
- Automated Expert Modeling for Automated Student Evaluation (Abbott 2006)
- Or Chapter 6
30Summary of Research
31Hierarchy of Skills for a Human-Like RoboCup Team
Team: Field Positioning (Model-Based)
Individual: Ball-Handling (Ad-Hoc)
Low-Level Skills: UVA TriLearn
32RoboCup Performance Matrix: Parameter Sweep (NumObservations, NumContexts, Features)
Figure: mean penalty score. Each point: mean of 100 RoboCup matches.
33RoboCup Performance Increases with Additional
Human Observation
34Large Models Perform Poorly in RoboCup
35Human Model Accuracy and RoboCup Performance are
Correlated
Average correlation for all 3 feature sets: 0.43
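The correlation reported above can be illustrated with a Pearson coefficient between model prediction error and mean penalty score across parameter settings. The sketch below uses made-up numbers purely to show the computation; the variable names and values are hypothetical and do not reproduce the thesis data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical values, one entry per (NumObservations, NumContexts) setting.
prediction_error = [4.0, 3.0, 2.5, 2.0]   # lower = more accurate human model
penalty_score = [3.0, 2.0, 2.5, 1.0]      # lower = better RoboCup performance

r = pearson_r(prediction_error, penalty_score)
print(r)
```

A positive value here means that settings with a less accurate human model also tend to score worse in the simulator.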
36RoboCup Performance Experiment: Results
- Strong correlation implies
- Modeled behavior is significant
- Real-world tactics transfer to simulation
- The humans being modeled are skilled
- Weak correlation implies
- At least one of the above statements is false.
37RoboCup Performance Experiment: Significance to Training Simulations
- Strong correlation in a training simulator
- The skill is important
- Simulator penalizes arbitrary tactics
- Simulator likely valid for evaluating students
- Lack of significant correlation implies
- Negative training
- Back to the drawing board
38Limitations of the Results
- Constrained process for producing arbitrary tactics
- Degraded models probably not representative of students
39Summary of Research
40Gaming the System
- Gaming: tactics that beat the game without regard to training benefit
- Circle-kick vulnerability (Burkhard 1998)
- Tank simulator
41Evolutionary Optimization
- Maximizes reward
- No expert preconceptions
- Global search
- Approximate training influence on student
42Characteristics of RoboCup as an Optimization Problem
- Huge search space
- Reward (fitness) is
- Expensive to compute
- Stochastic
- Delayed
43Evolutionary Optimization Complements Behavioral Cloning
- Shared strengths
- Implementer is not a domain expert
- Reduced programming
- Complementary strengths
- Fast vs. Slow
- Open-ended vs. dead-end
44Evolving a Team: Fitness
- Goal difference
- Fixed opponent
- UVA Trilearn clone
- 0 Penalty achievable
45Unit of Selection: What is an Individual?
Diagram: Population > Team > Agent > Context > (X, Y) values
- Each genome codes for an entire team
46Crossover Operation
Diagram: Population > Team > Agent > Context > (X, Y) values, with agents exchanged between two teams
- Single-point crossover: agents 1..N switch teams
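A minimal sketch of single-point crossover at the agent level, assuming each team genome is stored as a NumPy array of shape (agents, contexts, 2) holding (X, Y) values. The array layout, function name, and team sizes are illustrative assumptions rather than the thesis code.

```python
import numpy as np

def crossover(team_a, team_b, point):
    """Swap the first `point` agents between two teams and return both children."""
    child_a = team_a.copy()
    child_b = team_b.copy()
    child_a[:point] = team_b[:point]
    child_b[:point] = team_a[:point]
    return child_a, child_b

rng = np.random.default_rng(0)
n_agents, n_contexts = 4, 2                        # hypothetical sizes
team_a = np.zeros((n_agents, n_contexts, 2))       # parent A: all zeros
team_b = np.ones((n_agents, n_contexts, 2))        # parent B: all ones

# Crossover point N is drawn from 1..n_agents-1 so both parents contribute.
point = int(rng.integers(1, n_agents))
child_a, child_b = crossover(team_a, team_b, point)
print(point, child_a[:, 0, 0], child_b[:, 0, 0])
```

Because whole agents are exchanged, each agent's contexts stay together and only the team composition changes.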
47Mutation
Diagram: Population > Team > Agent > Context > (X, Y) values
- P(mutation) = 0.1 for all teams created by crossover
- Mutation: all values in team perturbed by N(0, 0.5 m)
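The mutation rule can be sketched as follows: with probability 0.1 a team produced by crossover has Gaussian noise added to every value. The sketch assumes that 0.5 m is the standard deviation of the noise (the slide does not say whether it is a standard deviation or a variance) and that a team is a NumPy array of (X, Y) values; both are assumptions made for illustration.

```python
import numpy as np

P_MUTATION = 0.1   # from the slide
SIGMA_M = 0.5      # assumption: 0.5 m interpreted as a standard deviation

def maybe_mutate(team, rng, p=P_MUTATION, sigma=SIGMA_M):
    """With probability p, perturb every value in the team by N(0, sigma)."""
    if rng.random() >= p:
        return team.copy()          # team passes through unchanged
    return team + rng.normal(0.0, sigma, size=team.shape)

rng = np.random.default_rng(1)
team = np.zeros((4, 2, 2))          # hypothetical: 4 agents x 2 contexts x (X, Y)

forced = maybe_mutate(team, rng, p=1.0)      # always mutates
untouched = maybe_mutate(team, rng, p=0.0)   # never mutates
print(np.abs(forced).max(), np.abs(untouched).max())
```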
48Selection Operator: Tournament of Gaussians
49Start Conditions
- Only initial model contents vary
- Random
- Human Median
- Human Clone
- Trilearn Clone
50Evolution Parameters
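- Volume of search space: $32 \times 10 = 320$ positions, each on a $105\,\mathrm{m} \times 68\,\mathrm{m}$ field: $(7140\,\mathrm{m}^2)^{320} \approx 10^{1233}\,\mathrm{m}^{640}$
- 250K RoboCup matches per run, 15 min. ea.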
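The search-space figure can be checked with a few lines of arithmetic: 320 positions, each free to lie anywhere on a 105 m by 68 m field, give a volume of 7140^320 in units of m^640. The sketch works in log10 to avoid an enormous integer; reading 320 as 32 x 10 positions is an interpretation of the slide, and the match count uses the 250 matches per generation and 1000 generations stated elsewhere in the deck.

```python
import math

FIELD_LENGTH_M = 105
FIELD_WIDTH_M = 68
N_POSITIONS = 32 * 10          # assumption: 32 x 10 (X, Y) positions per team

field_area_m2 = FIELD_LENGTH_M * FIELD_WIDTH_M
# log10 of (area ** n_positions); the unit is m^(2 * n_positions).
log10_volume = N_POSITIONS * math.log10(field_area_m2)
unit_exponent = 2 * N_POSITIONS

MATCHES_PER_GENERATION = 250
GENERATIONS = 1000
matches_per_run = MATCHES_PER_GENERATION * GENERATIONS

print(field_area_m2, math.floor(log10_volume), unit_exponent, matches_per_run)
```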
51Team Evolution Results
52Fitness - 1000 Generations
- Progress but not convergence
53Initial Fitness and Fitness Gain
- Starting condition is critical
- Synthetic (TriLearn) tactics beat human tactics
54Human Similarity Metric
a1
b1
t4
t3
t2
t1
Player 1 Contexts
Cost function (a12b12 a12b12)/n
Player 2 Contexts
55Evolution of Human Similarity
56Total Change of Human Similarity
57Team Evolution: Discussion
- Expert model captures valuable information
- No drastic results from tactics evolution
- 3 of 4 teams' tactics became less human-like
- Can't model all human teams
- Can't cover the space of tactics
- Hypothesis space of model is limited
- Breakthrough tactics in RoboCup difficult
58Team Evolution: Limitations
- Can't model all human teams
- Can't cover the space of tactics
- Limited behavior model
- Breakthrough tactics in RoboCup difficult
60Conclusion
- Human behavior critical to modeling and simulation
- Importance of real-world data
- Quantification of
- Student tactical behavior
- Tactics transferability
- Training influence of simulator
61Contributions
- A data set of real-world human soccer tournament play
- An attention-driven image segmentation algorithm which drastically reduces computation costs in a vision-based multiple target tracking system
- A method for accurate, real-time assessment of student behavior in a tactical domain using comparison with a behavioral clone of domain experts
- A measure of tactical fidelity in a simulator based on the correlation between human behavior predictive accuracy and software agent performance in the simulator
- Tournament of Gaussians, a genetic selection algorithm which combines favorable aspects of proportionate and ordinal techniques
62Publications
- R.G. Abbott. Automated expert modeling for automated student evaluation. Intelligent Tutoring Systems 2006: 1-10.
- R.G. Abbott, J.H. Whetzel, J.D. Basilico. Automated Student Evaluation for a Distributed After-Action Review Application. Human System Integration Symposium 2007.
- R.G. Abbott, S. Forrest, K.J. Pienta. Simulating the hallmarks of cancer. Artificial Life 12(4):617-634. 2006.
- R.G. Abbott. Behavioral cloning for simulator validation. Submitted to RoboCup Symposium 2007.
- R.G. Abbott, L.R. Williams. Multiple target tracking with lazy background subtraction and connected components analysis. Submitted to Journal of Machine Vision and Applications.
63References
- Pieter Abbeel, Adam Coates, Morgan Quigley, and Andrew Y. Ng, 2007. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19 (NIPS).
- R. Aler, O. Garcia, and J.M. Valls, 2005. Correcting and improving imitation models of humans for robosoccer agents. IEEE Congress on Evolutionary Computation. Volume 3, pages 2402-2409.
- Christopher G. Atkeson, Joshua G. Hale, Frank Pollick, Marcia Riley, Shinya Kotosaka, Stefan Schaal, Tomohiro Shibata, Gaurav Tevatia, Ales Ude, Sethu Vijayakumar, and Mitsuo Kawato, 2000. Using humanoid robots to study human behavior. IEEE Intelligent Systems, 15(4):46-56.
- Hans-Dieter Burkhard, Markus Hannebauer, and Jan Wendler, 1998. AT Humboldt - Development, Practice and Theory. In RoboCup-97: Robot Soccer World Cup I, pages 357-372.
- T.R. Carretta and R.D. Dunlap, 1998. Transfer of training effectiveness in flight simulation: 1986-1997 (Report No. AFRL-HE-AZ-TR-1998-0078). Mesa, AZ: Air Force Research Laboratory.
- Dean Pomerleau, 1991. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3(1):88-97.
- Claude Sammut, Scott Hurst, Dana Kedzier, and Donald Michie, 1992. Learning to fly. In ML '92: Proceedings of the Ninth International Workshop on Machine Learning, pages 385-393, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
- S. Schaal, A. Ijspeert, and A. Billard, 2004. Computational approaches to motor learning by imitation, volume 1431, pages 199-218. Oxford University Press.
- Scott B. Shadrick and James W. Lussier, 2005. Concept Development for Future Domains: A New Method of Knowledge Elicitation. U.S. Army Research Institute Tech Report 1167.
- D. Suc and I. Bratko, 1999. Symbolic and qualitative reconstruction of control skill. Electronic Transactions on Artificial Intelligence, Section B, Vol. 3:1-22.
- S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, P. Stang, S. Strohband, C. Dupont, L.-E. Jendrossek, C. Koelen, C. Markey, C. Rummel, J. van Niekerk, E. Jensen, P. Alessandrini, G. Bradski, B. Davies, S. Ettinger, A. Kaehler, A. Nefian, and P. Mahoney, 2006. Winning the DARPA Grand Challenge. Journal of Field Robotics.
- B. Widrow and F.W. Smith, 1964. Pattern-recognizing control systems. In Computer and Information Sciences (COINS) Proceedings, Washington, D.C. Spartan.
64Backup Slides
65Data
AEMASE Algorithm
Code
Model Construction
Student Evaluation
Expert Performs Task
Student Performs Task
Observation Sequence
Feature Extraction
Feature Vector
Feature Vector Sequence
Context Set
Context Recognition
Transition Probabilities
Performance Evaluation
Sequencing
66Clustering for Observation Selection
- Try to preserve only semantically distinct Contexts (cluster centers)
- K-Means Clustering
- Allows (requires) manual specification of model size
- Requires many distance computations; expensive if using rotational invariance
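The clustering step above can be sketched with a plain K-Means (Lloyd's algorithm) that reduces a set of observations to k cluster centers, which then serve as contexts. This is a generic textbook version using Euclidean distance; it does not include the rotational invariance mentioned above, and the function name, seed, and sample points are assumptions for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: returns (centers, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct observations chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center: shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):                 # leave an empty cluster's center in place
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Hypothetical observations forming two well-separated groups.
observations = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
contexts, labels = kmeans(observations, k=2)
print(contexts, labels)
```

The model size k must be chosen by hand, which is the "allows (requires) manual specification" trade-off noted in the list.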
67Feature Mapping vs. Clustering
- "Specifying an appropriate dissimilarity measure is far more important in obtaining success with clustering than choice of clustering algorithm"
- Hastie, Tibshirani, Friedman, 2001
68Clustering Example
Input: 1215 Observations
Output: 30 Cluster Centers (Contexts)
69Introduction: Intelligent Tutoring Systems
Tutoring Module
Student Module
Expert Module
A Minimal Intelligent Tutoring System
Burns & Capps 1998
70Introduction: Intelligent Tutoring Systems
- Student Module
- Represents a student's domain knowledge
- Expert Module
- Represents the tutor's domain knowledge and problem-solving expertise
- Tutoring Module
- Selects exercises and presents instruction
71Perception
- Discretize input into a long sequence of Observations
- Observation: a (potentially) high-dimensional vector
- Computer vision, or application instrumentation
Soccer Video, 1 Game: 70,000 frames, 640x240 pixels, 4 cameras, highly compressed: 1,200,000,000 bytes
Player/Ball Coordinates: 70,000 46-dimensional vectors, i.e. a 70,000-row x 46-column observation matrix: 12,880,000 bytes
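The size of the observation matrix follows from simple arithmetic: 70,000 rows of 46 values each. The stated total of 12,880,000 bytes works out to 4 bytes per value, which suggests single-precision floats; that storage type is an inference, not something the slide states.

```python
N_FRAMES = 70_000            # observations in one game
N_POINTS = 23                # 22 players plus the ball
N_COLUMNS = N_POINTS * 2     # an (x, y) pair per point
BYTES_PER_VALUE = 4          # assumption: 32-bit floats

matrix_bytes = N_FRAMES * N_COLUMNS * BYTES_PER_VALUE
video_bytes = 1_200_000_000  # compressed video size given on the slide

# How many times smaller the coordinate data is than the compressed video.
reduction_factor = video_bytes / matrix_bytes
print(N_COLUMNS, matrix_bytes, round(reduction_factor))
```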
72Perception Example Soccer Data
1000 samples (approx. 1 min.) of 23 points in R^2: 2 teams of 11 players and the ball
73Perception Example: Soccer Data Matrix Representation
N = 46 columns (indicators / inputs / random variables)
M = 70,000 rows (observations)
Each observation is a point in R^46
                Ball      Player 1      Player 2  ...
t = 1      0      0  -50.5      0    -20    -17    -20   -5.4    -20    5.4  ...
t = 2   -1.1   -2.0  -50.5      0    -20    -17    -20   -5.4    -20    5.4  ...
        -2.2   -3.9  -49.8   -0.0  -19.5  -16.6  -19.3   -5.3  -19.4    5.3  ...
        -3.3   -5.5  -49.0   -0.0  -19.4  -16.5  -18.6   -5.1  -18.5    5.1  ...
        -4.4   -7.1  -48.7   -0.1  -18.7  -16.4  -17.6   -4.9  -17.7    4.8  ...
        -5.3   -8.6  -48.6   -0.1  -17.9  -16.5  -16.7   -4.8  -16.8    4.5  ...
        -6.3  -10.1  -48.0   -0.2  -16.9  -16.5  -15.8   -4.5  -15.9    4.1  ...
        -7.3  -11.5  -47.2   -0.2  -15.9  -16.6  -15.5   -4.4  -15.5    4.0  ...
        -6.8  -11.7  -46.3   -0.2  -15.0  -16.7  -14.8   -4.4  -14.7    3.9  ...
        -6.3  -11.8  -45.3   -0.2  -13.9  -16.7  -13.9   -4.4  -13.9    3.8  ...
         ...    ...    ...    ...    ...    ...    ...    ...    ...    ...  ...
74Feature Mapping: Complex Features
- The linear observations are segmented into logically atomic Complex (multidimensional) features
- Each input might be used in several complex features, or not at all
- Complex features may also use past inputs (not shown)
Observation (R20): Self (Fuel, Position, Heading); Teammate 1 (Fuel, Position, Heading); Opponent 2 (Position, Heading); Opponent 2 (Position, Heading)
Complex Feature 1, Complex Feature 2, Complex Feature 3
Feature Vector (R6): X1, Y1, Z1, X3, Y3, X2
75Merging of 4 Viewpoints
- UNM Soccer Pitch 70x66m
- 4 cameras utilized for adequate resolution
- Total resolution 2720 x 240 pixels
- Efficient processing necessary!
76Synthesized Overhead View
77Some Target Tracking Terminology
- Targets: soccer players (etymology: radar)
- Track: computer's current state estimate for each target
- Observation: for each frame of video, the set of perceived target returns
78LBSCCA Tentative Dilation
Tentative Connected Components With Dilation
Vertical Erosion
Horizontal Dilation
Vertical Erosion
Connected Components
Horizontal Dilation, Connected Components
LBSCCA (Correct Result)
Conventional Process
81 1NN Clustering Avoids Undesirable Generalization
Knowledge Base
1NN Response
2NN Response
Query
82Tournament of Gaussians
83Selective Pressure vs. Population Diversity
- Low selective pressure: fitness is nullified
- Evolution does not progress
- High selective pressure: diversity is destroyed
- Fast short-term gains
- Converges to local minimum
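The trade-off above can be seen in ordinary k-way tournament selection, where the tournament size k sets the selective pressure. The sketch below is the common textbook operator, not the Tournament of Gaussians algorithm, whose details are not given in these slides; the population, penalty values, and function name are hypothetical.

```python
import random

def tournament_select(population, penalty, k, rng):
    """Draw k distinct individuals at random and return the one with the lowest penalty."""
    contestants = rng.sample(range(len(population)), k)
    winner = min(contestants, key=lambda i: penalty[i])
    return population[winner]

rng = random.Random(0)
population = ["A", "B", "C", "D"]      # hypothetical teams
penalty = [3.0, 1.0, 2.0, 0.5]         # lower is better

# k = 1: selection ignores fitness entirely (low selective pressure).
low_pressure = [tournament_select(population, penalty, 1, rng) for _ in range(1000)]
# k = population size: the best individual always wins (high selective pressure).
high_pressure = [tournament_select(population, penalty, 4, rng) for _ in range(1000)]

print(sorted(set(low_pressure)), sorted(set(high_pressure)))
```

With k fixed, the pressure stays constant throughout a run regardless of how diverse or homogeneous the population has become.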
84Tournament of Gaussians
85Common Tournament Selection: Constant Selective Pressure
Diverse
Homogeneous
86Individual Tutoring
Instructor
Student
Group Instruction
Several Students
Instructor
Several Students
Books
Several Students
Many Students
Many Students
Instructor
Book
Many Students
Several Students
Computer Instruction
Many Students
Many Students
Many Students
Instructor
Programmer
Software
Other Applications
Computer Instruction
Several Students
Many Students
Many Students
Many Students
Instructor
Software
Programmer
Other Applications
87Behavior Dataset Produced
- 10 Hz samples
- Positions of 22 players
- Ball
- PlayMode: in play, waiting for kickoff, etc.
- 20 minutes of play
- Hand-verified
- Accuracy 1m
- Player identities preserved for duration of dataset
- UC Irvine Anteaters vs. Western Illinois U Leathernecks
88RoboCup Performance Experiment Protocol
- For each combination of observation set size, context set size, and selected features, calculate the mean penalty score.
- Penalty score = Goals_opponent - Goals_self
- The opponent
- Is fixed: it is a model-based team, but uses the same parameters in all conditions.
- Is a clone of the UVA Trilearn RoboCup team.
- 100 RoboCup matches per condition to estimate the mean penalty score.
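The penalty score in the protocol above is the opponent's goals minus the team's own goals, averaged over the matches played in one condition. A minimal sketch with made-up match results (the thesis used 100 matches per condition; the four results and function names here are hypothetical):

```python
from statistics import mean

def penalty_score(goals_self, goals_opponent):
    """Penalty for one match: positive when the opponent outscores the team."""
    return goals_opponent - goals_self

def mean_penalty(results):
    """Mean penalty over a list of (goals_self, goals_opponent) match results."""
    return mean(penalty_score(own, opp) for own, opp in results)

# Hypothetical results for one parameter condition.
matches = [(0, 2), (1, 1), (2, 1), (0, 3)]
score = mean_penalty(matches)
print(score)
```

A mean penalty of 0 corresponds to playing the fixed opponent to a draw on average; lower is better.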
89Player Prediction Error: X Component of Ball Position
- Best performance with only 256 (25s) of Observations
- Best performance with only 1-4 Contexts
90Player Prediction Error: Ball Position (2D)
- Best performance with all (12000) Observations
- Best performance with max (256) Contexts
91Player Prediction Error: Ball Position + Ball Velocity
- Best performance with all (12000) Observations
- Best performance with max (256) Contexts
92RoboCup Performance: X Component of Ball Position (1D)
- Best performance with only 256 (25s) of Observations
- Best performance with only 1-4 Contexts
93RoboCup Performance: Ball Position (2D)
- Best performance with 1024 to 4096 Observations
- Best performance with 16 Contexts
94RoboCup Performance: Ball Position + Ball Velocity (4D)
- Performance improves through 12000 (all available) Observations
- Best performance with 8-32 Contexts
95Summary of Research
1) Perception: Multiple Target Tracking (MuTTSA, LBSCCA)
- The foundation of simulator validation: parallel observations of the real world and the simulator. (Chapters 3, 4)
2) Tactics Modeling: Temporal Feature Extraction, Goal-Driven Instance-Based Learning
- Computational representation of tactics for experimentation and analysis. (Chapter 5)
3) Automated Student Evaluation: AEMASE, Tactical Aircraft Maneuver
- Corrective feedback for students using a simulator for training. (Chapter 6)
4) Team Behavioral Cloning: RoboCup Simulation, Simulator Validation
- Quantify transferability of tactics between reality and simulation. (Chapter 7)
5) Team Evolution: RoboCup Simulation, Reinforcement Learning
- Predict influence of simulator on students. (Chapter 8)
96Previous Coping Strategies for Evolving RoboCup
Agents
- Evolve an individual agent instead of the whole
team. - Implement a graduated fitness function to provide
short-term rewards (and bias behavior). - Play a less complicated game in the RoboCup
domain - Keepaway soccer
- Single-player team
- Focus exclusively on goalie
97Hardware
- Cluster of commodity PCs
- 45 to 70 nodes
- 1GB of RAM
- 2.4 GHz Pentium-4 processor
- Gigabit ethernet
- Each generation (250 soccer matches): 15 minutes of cluster execution time
- Batch processing model implemented
- Torque (PBS)
- Maximize node utilization
- Cope with node failures
- Arbitrate cluster access with others' jobs
98Fitness Gain Isolated for Between-Run Comparison