Automated Tactics Modeling: Techniques and Applications


1
Automated Tactics Modeling: Techniques and Applications

Robert G Abbott
Advisor: Stephanie Forrest
April 5, 2007

2
Objectives of Simulation-Based Training

Increase
Safety
Availability
Flexibility

Decrease
Cost
Environmental Impact

3
But is Simulation Training Valid?

Topic selection
Physics accuracy
Human behavior simulation
Corrective feedback

4
Proposed Solution

More SME Control
Simulation as Interactive Media

Figure: getting the message from here (Sender: SME) to here (Recipients: Students) through the Medium
5
Summary of Research (Technology → Application)
1) Automated Perception → Grounding in Reality
2) Tactics Modeling → Models for Experimentation
3) Auto Student Evaluation → Corrective Feedback
4) Team Behavioral Cloning → Tactics Transferability
5) Agent Team Evolution → Simulate Learner
6
Background and Related Work
7
Training Transfer Studies

The Gold Standard
Few and far between (Carretta 1998)

So much for that new flight training software!
8
SME Validation
(Wallace 1989)

Highly subjective
Limited SME availability

Wheeeee!
9
Reverse the Process
From
Simulation
Reality
To
Simulation
Reality
10
Conventional Expert System Construction
(Shadrick 2005)
SME → Knowledge Engineer → Programmer → Simulation
11
Problems With Conventional Expert System Construction
SME → Knowledge Engineer → Programmer → Simulation

Implicit knowledge
Coordination
Expensive
Slow
12
An Example From Soccer
The Telephone Game
Soccer SME: "Force the attacker to screen the ball with their first touch."
Knowledge Engineer: "Section 23, Paragraph 9: In defensive roles (section 7.21), players shall reverse the course of the ball before prior to initial contact by the advancing opponent."
Programmer: else if (player.id > 1 && player.id < 4) dpos.x = cos((w.y-s.y)/(w.x-s.x)); dpos.x = cos((w.y-s.y)/(w.x-s.x));
Student: "I can beat the defender every time by turning left."
13
Alternate Approach: Trainable Agents
SME
Software Developer
Software
Trainable Software
14
Related Work: Behavioral Cloning

Data
Model
Agents
Generalization
Applications
Inverted pendulum (Widrow 1964)
Aircraft piloting (Sammut 1992)
Bicycle riding (Suc 1999)
Soccer play in RoboCup (Aler 2005)
Highly synthetic

15
Related Work: Learning by Observation

Robotics
Applications
Juggling (Schaal 2004)
Table tennis (Atkeson 2000)
Automated driving (Pomerleau 1991, Thrun 2006)
Radio-controlled helicopter (Abbeel 2007)
many, many more
The simulator (if any) is just a byproduct

16
Task Domain: RoboCup Simulation League

Popular simulation of a popular sport
Requires expertise

18
MuTTSA
20
Soccer Field Positioning

Predict position of each player
Captures high-level tactics
Instance-based learning algorithm (IBL)
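As a rough illustration of instance-based position prediction, the sketch below stores contexts as pairs of a feature vector (here only the ball position) and the positions the players held in that situation, and answers a query with the positions from the nearest stored context. The class name `ContextModel`, the Euclidean distance, and the single-nearest-neighbor lookup are assumptions for illustration, not the presenter's implementation. The first context borrows coordinates from the soccer data matrix in the backup slides; the second is made up.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[float, ...]


class ContextModel:
    """Hypothetical 1-nearest-neighbor model of team field positioning.

    Each stored context pairs a feature vector (here: the ball position)
    with the positions the observed players held in that situation.
    """

    def __init__(self) -> None:
        self.features: List[Feature] = []
        self.positions: List[List[Point]] = []

    def add_context(self, feature: Feature, players: List[Point]) -> None:
        self.features.append(feature)
        self.positions.append(players)

    def predict(self, query: Feature) -> List[Point]:
        # Return the stored player positions of the closest context
        # (Euclidean distance in feature space).
        best = min(range(len(self.features)),
                   key=lambda i: math.dist(self.features[i], query))
        return self.positions[best]


model = ContextModel()
model.add_context((0.0, 0.0), [(-50.5, 0.0), (-20.0, -17.0)])
model.add_context((-30.0, 10.0), [(-50.0, 3.0), (-35.0, -5.0)])
print(model.predict((-25.0, 8.0)))  # [(-50.0, 3.0), (-35.0, -5.0)]
```

Because the model only replays positions that were actually observed, its predictions are always situations a human player really adopted; the trade-off is that accuracy depends on how densely the stored contexts cover the game.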

21
Model Parameters: Number of Observations

Learning success
Data is precious
Up to 20 minutes of data

22
Parameter Selection: Number of Contexts

Amount of agent memory
Over-fitting vs. over-generalization
Computational and memory costs
Up to 256 contexts

23
Parameter Selection: Feature Selection

Markov property
Curse of dimensionality
Conditions:
BallX (x component of ball position)
Ball position
Ball position + velocity
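The three conditions can be read as feature extractors over the observation matrix. This sketch assumes the ball's x and y are the first two columns of each row (as in the soccer data matrix in the backup slides) and estimates velocity with a backward difference at the 10 Hz rate of the behavior dataset; the function name `ball_features` and the feature-set keys are hypothetical.

```python
from typing import List, Sequence

SAMPLE_HZ = 10.0  # assumed sampling rate (that of the behavior dataset)


def ball_features(rows: Sequence[Sequence[float]], feature_set: str) -> List[List[float]]:
    """Map observation rows to one of three hypothetical feature sets.

    Assumes columns 0 and 1 hold the ball's x and y coordinates (metres).
    """
    out: List[List[float]] = []
    for t, row in enumerate(rows):
        x, y = row[0], row[1]
        if feature_set == "ball_x":          # 1-D: BallX
            out.append([x])
        elif feature_set == "ball_pos":      # 2-D: ball position
            out.append([x, y])
        elif feature_set == "ball_pos_vel":  # 4-D: ball position + velocity
            # Backward difference; the first row has no predecessor, so
            # its velocity is taken as zero.
            px, py = (rows[t - 1][0], rows[t - 1][1]) if t > 0 else (x, y)
            out.append([x, y, (x - px) * SAMPLE_HZ, (y - py) * SAMPLE_HZ])
        else:
            raise ValueError(f"unknown feature set: {feature_set}")
    return out


# First two rows of the soccer data matrix (ball x, y, then Player 1 x, y).
rows = [[0.0, 0.0, -50.5, 0.0], [-1.1, -2.0, -50.5, 0.0]]
print(ball_features(rows, "ball_pos_vel"))
```

Each step up (1-D, 2-D, 4-D) gives the model more information about the game state, at the cost of a sparser feature space for the same amount of training data.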

24
Prediction Accuracy Matrix (NumObservations, NumContexts, Features)
Mean Squared Error of Player Position Prediction
Each point: 10-fold cross-validation for each of 10 agents
Figure: error matrix annotated with min(row) and min(column)
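The evaluation behind each point (10-fold cross-validation of position error) can be sketched as follows. This is a minimal illustration assuming a 1-nearest-neighbor predictor and Euclidean position error; `nn_predict` and `cross_validated_mse` are hypothetical names, and the toy data set is invented.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, actual player position) pairs
Sample = Tuple[Tuple[float, ...], Tuple[float, float]]


def nn_predict(train: Sequence[Sample], query: Tuple[float, ...]) -> Tuple[float, float]:
    # 1-nearest-neighbor lookup over the training contexts.
    _, position = min(train, key=lambda s: math.dist(s[0], query))
    return position


def cross_validated_mse(data: List[Sample], k: int = 10) -> float:
    """k-fold cross-validation estimate of mean squared position error (m^2)."""
    squared_errors: List[float] = []
    for fold in range(k):
        # Interleaved folds: sample i is held out in fold i % k.
        test = [s for i, s in enumerate(data) if i % k == fold]
        train = [s for i, s in enumerate(data) if i % k != fold]
        for feature, actual in test:
            predicted = nn_predict(train, feature)
            squared_errors.append(math.dist(predicted, actual) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Toy data: the player stays 10 m behind the ball along x.
data = [((float(i),), (float(i) - 10.0, 0.0)) for i in range(20)]
print(cross_validated_mse(data, k=10))  # 1.0
```

Folds are interleaved here for brevity. With 10 Hz trajectories, neighboring frames are nearly identical, so holding out contiguous blocks of time would give a more conservative error estimate.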
25
Training Data Reduces Prediction Error
26
Additional Contexts Reduce Prediction Error
27
Soccer Tactics Modeling Results

(Some) team tactics can be modeled
Ball position surprisingly effective
Room for improvement
But good enough to continue

29
Automated Student Evaluation

Behavior models effective in assessing student behavior
Automated Expert Modeling for Automated Student Evaluation (Abbott 2006)
Or Chapter 6

31
Hierarchy of Skills for a Human-Like RoboCup Team
Team: Field Positioning (Model-Based)
Individual: Ball-Handling (Ad-Hoc)
Low-Level Skills: UVA TriLearn
32
RoboCup Performance Matrix: Parameter Sweep (NumObservations, NumContexts, Features)
Mean Penalty Score
Each point: mean of 100 RoboCup matches
33
RoboCup Performance Increases with Additional
Human Observation
34
Large Models Perform Poorly in RoboCup
35
Human Model Accuracy and RoboCup Performance are Correlated
Average correlation for all 3 feature sets: 0.43
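Assuming the correlation above is a standard Pearson coefficient computed across parameter settings, the calculation is sketched below in plain Python. The two input lists are invented placeholders, not results from the study.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, one per parameter setting (not data from the study):
model_accuracy = [1.0, 2.0, 3.0, 4.0]
robocup_performance = [1.0, 3.0, 2.0, 4.0]
print(pearson(model_accuracy, robocup_performance))  # 0.8
```

If accuracy is expressed as a prediction error and performance as a penalty score (lower is better for both), a positive coefficient still means the two measures agree. Python 3.10 and later also provide `statistics.correlation` for the same calculation.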
36
RoboCup Performance Experiment: Results

Strong correlation:
Modeled behavior is significant
Real-world tactics transfer to simulation
The humans being modeled are skilled
Weak correlation:
At least one of the above statements is false.

37
RoboCup Performance Experiment: Significance to Training Simulations

Strong correlation in a training simulator:
The skill is important
Simulator penalizes arbitrary tactics
Simulator likely valid for evaluating students
Lack of significant correlation implies:
Negative training
Back to the drawing board

38
Limitations of the Results

Constrained process for producing arbitrary
tactics
Degraded models probably not representative of
students

40
Gaming the System

Gaming tactics: beat the game without regard to training benefit
Circle-kick vulnerability (Burkhard 1998)
Tank simulator

41
Evolutionary Optimization

Maximizes reward
No expert preconceptions
Global search
Approximate training influence on student

42
Characteristics of RoboCup as an Optimization Problem

Huge search space
Reward (fitness) is
Expensive to compute
Stochastic
Delayed

43
Evolutionary Optimization Complements Behavioral Cloning

Shared strengths
Implementer is not a domain expert
Reduced programming
Complementary strengths
Fast vs. Slow
Open-ended vs. dead-end

44
Evolving a Team: Fitness

Goal difference
Fixed opponent
UVA Trilearn clone
0 Penalty achievable

45
Unit of Selection: What is an Individual?
Figure: hierarchy Population → Team → Agent → Context → (X, Y) coordinates (X1, Y1 to X4, Y4 shown)

Each genome codes for an entire team
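Because each genome codes for an entire team, one possible representation is a nested list: agents, each holding one (x, y) target position per context. The sketch below assumes 10 agents, 32 contexts, and a 105 m x 68 m pitch centred on the origin (these match the factors in the search-space volume given later in the talk). `TeamGenome` and `random_team` are hypothetical names, and uniform sampling is only one way to realise the "Random" start condition.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

PITCH_LENGTH_M = 105.0  # assumed RoboCup field length
PITCH_WIDTH_M = 68.0    # assumed RoboCup field width


@dataclass
class TeamGenome:
    """One individual = one whole team.

    agents[a][c] is the (x, y) position agent a takes in context c.
    """
    agents: List[List[Tuple[float, float]]]

    def flatten(self) -> List[float]:
        # Concatenate x, y values for every agent and context.
        return [v for agent in self.agents for (x, y) in agent for v in (x, y)]


def random_team(num_agents: int = 10, num_contexts: int = 32,
                rng: Optional[random.Random] = None) -> TeamGenome:
    """Random start condition: uniform positions over a pitch centred at (0, 0)."""
    rng = rng or random.Random()
    agents = [[(rng.uniform(-PITCH_LENGTH_M / 2, PITCH_LENGTH_M / 2),
                rng.uniform(-PITCH_WIDTH_M / 2, PITCH_WIDTH_M / 2))
               for _ in range(num_contexts)]
              for _ in range(num_agents)]
    return TeamGenome(agents)


team = random_team(rng=random.Random(1))
print(len(team.agents), len(team.agents[0]), len(team.flatten()))  # 10 32 640
```

Keeping the whole team in one genome means fitness (goal difference) is assigned to the team as a unit, which matches how it is measured in a match.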

46
Crossover Operation
Figure: two parent teams (Population → Team → Agent → Context → X, Y values) exchanging agents

Single-point crossover: agents 1..N switch teams
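Single-point crossover at the agent level can be sketched as below: a cut point n is drawn and agents 1..n swap between the two parent teams, so each agent's context table moves as a unit. Drawing n from 1 to N-1 (so both children mix parents) is an assumption, and agents are represented by strings purely for readability; `single_point_crossover` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Agent = TypeVar("Agent")


def single_point_crossover(team_a: Sequence[Agent], team_b: Sequence[Agent],
                           rng: random.Random) -> Tuple[List[Agent], List[Agent]]:
    """Swap agents 1..n between two parent teams (n chosen at random)."""
    if len(team_a) != len(team_b) or len(team_a) < 2:
        raise ValueError("parents must have the same number (>= 2) of agents")
    # Cut point in [1, N-1] so that each child mixes both parents.
    n = rng.randint(1, len(team_a) - 1)
    child_a = list(team_b[:n]) + list(team_a[n:])
    child_b = list(team_a[:n]) + list(team_b[n:])
    return child_a, child_b


parent_a = [f"A{i}" for i in range(1, 11)]
parent_b = [f"B{i}" for i in range(1, 11)]
child_a, child_b = single_point_crossover(parent_a, parent_b, random.Random(0))
print(child_a)
print(child_b)
```

Swapping whole agents, rather than cutting inside an agent's context table, keeps each player's learned positioning internally consistent.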

47
Mutation
Figure: Population → Team → Agent → Context → (X, Y) values

$P_{\text{mutation}} = 0.1$ for all teams created by crossover
Mutation: all values in team perturbed by $\mathcal{N}(0, 0.5\,\text{m})$
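The mutation rule reads as: each team produced by crossover is mutated with probability 0.1, and a mutated team has every coordinate perturbed by Gaussian noise. This sketch assumes the 0.5 m is the standard deviation (it could also denote a variance) and works on a flat list of coordinates; `maybe_mutate` is a hypothetical name, and no clamping to the pitch boundary is applied.

```python
import random
from typing import List

P_MUTATION = 0.1  # per-team mutation probability for crossover children
SIGMA_M = 0.5     # assumed standard deviation of the perturbation, in metres


def maybe_mutate(genome: List[float], rng: random.Random) -> List[float]:
    """With probability P_MUTATION, add N(0, SIGMA_M) noise to every value."""
    if rng.random() >= P_MUTATION:
        return list(genome)  # unmutated copy
    return [v + rng.gauss(0.0, SIGMA_M) for v in genome]


rng = random.Random(42)
parent = [0.0] * 640  # e.g. 10 agents x 32 contexts x (x, y)
children = [maybe_mutate(parent, rng) for _ in range(1000)]
mutated = sum(child != parent for child in children)
print(f"{mutated} of 1000 children mutated")  # about 10% on average
```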

48
Selection Operator: Tournament of Gaussians
49
Start Conditions

Only initial model contents vary
Random
Human Median
Human Clone
Trilearn Clone

50
Evolution Parameters

Volume of search space: $(105\,\text{m} \times 68\,\text{m})^{32 \times 10} = (7140\,\text{m}^2)^{320} \approx 10^{1233}\,\text{m}^{640}$
250K RoboCup matches per run, 15 min. each
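The size of the search space is easiest to check in log space. This arithmetic sketch assumes the exponent 32 x 10 counts 32 contexts for each of 10 agents, so each genome holds 320 (x, y) positions, each anywhere on a 105 m x 68 m pitch.

```python
import math

contexts, agents = 32, 10        # assumed reading of the exponent 32 x 10
area_m2 = 105 * 68               # pitch area: 7140 m^2
positions = contexts * agents    # (x, y) positions per genome: 320
log10_volume = positions * math.log10(area_m2)

print(area_m2, positions, 2 * positions, math.floor(log10_volume))  # 7140 320 640 1233
```

The 640 in the unit $\text{m}^{640}$ is simply the number of scalar coordinates (320 positions, two coordinates each).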

51
Team Evolution Results
52
Fitness - 1000 Generations

Progress but not convergence

53
Initial Fitness and Fitness Gain

Starting condition is critical
Synthetic (TriLearn) tactics beat human tactics

54
Human Similarity Metric
Figure: Player 1 contexts and Player 2 contexts (labels a1, b1, t1 to t4)
Cost function: $(a_1^2 + b_1^2 + \dots)/n$
55
Evolution of Human Similarity
56
Total Change of Human Similarity
57
Team Evolution: Discussion

Expert model captures valuable information
No drastic results from tactics evolution
3 of 4 teams' tactics became less human-like
Can't model all human teams
Can't cover the space of tactics
Hypothesis space of model is limited
Breakthrough tactics in RoboCup difficult

58
Team Evolution: Limitations

Can't model all human teams
Can't cover the space of tactics
Limited behavior model
Breakthrough tactics in RoboCup difficult

60
Conclusion

Human behavior critical to modeling and
simulation
Importance of real-world data
Quantification of
Student tactical behavior
Tactics transferability
Training influence of simulator

61
Contributions

A data set of real-world human soccer tournament
play
An attention-driven image segmentation algorithm
which drastically reduces computation costs in a
vision-based multiple target tracking system
A method for accurate, real-time assessment of
student behavior in a tactical domain using
comparison with a behavioral clone of domain
experts
A measure of tactical fidelity in a simulator
based on the correlation between human behavior
predictive accuracy and software agent
performance in the simulator
Tournament of Gaussians, a genetic selection
algorithm which combines favorable aspects of
proportionate and ordinal techniques

62
Publications

R.G. Abbott. Automated expert modeling for automated student evaluation. Intelligent Tutoring Systems 2006, pages 1-10.
R.G. Abbott, J.H. Whetzel, J.D. Basilico. Automated Student Evaluation for a Distributed After-Action Review Application. Human System Integration Symposium 2007.
R.G. Abbott, S. Forrest, K.J. Pienta. Simulating the hallmarks of cancer. Artificial Life 12(4):617-634, 2006.
R.G. Abbott. Behavioral cloning for simulator validation. Submitted to RoboCup Symposium 2007.
R.G. Abbott, L.R. Williams. Multiple target tracking with lazy background subtraction and connected components analysis. Submitted to Journal of Machine Vision and Applications.

63
References

Pieter Abbeel, Adam Coates, Morgan Quigley, and Andrew Y. Ng, 2007. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19 (NIPS).
R. Aler, O. Garcia, and J.M. Valls, 2005. Correcting and improving imitation models of humans for robosoccer agents. IEEE Congress on Evolutionary Computation, Volume 3, pages 2402-2409.
Christopher G. Atkeson, Joshua G. Hale, Frank Pollick, Marcia Riley, Shinya Kotosaka, Stefan Schaal, Tomohiro Shibata, Gaurav Tevatia, Ales Ude, Sethu Vijayakumar, and Mitsuo Kawato, 2000. Using humanoid robots to study human behavior. IEEE Intelligent Systems, 15(4):46-56.
Hans-Dieter Burkhard, Markus Hannebauer, and Jan Wendler, 1998. AT Humboldt: Development, Practice and Theory. In RoboCup-97: Robot Soccer World Cup I, pages 357-372.
T. R. Carretta and R. D. Dunlap, 1998. Transfer of training effectiveness in flight simulation: 1986-1997 (Report No. AFRL-HE-AZ-TR-1998-0078). Mesa, AZ: Air Force Research Laboratory.
Dean Pomerleau, 1991. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3(1):88-97.
Claude Sammut, Scott Hurst, Dana Kedzier, and Donald Michie, 1992. Learning to fly. In ML '92: Proceedings of the Ninth International Workshop on Machine Learning, pages 385-393, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
S. Schaal, A. Ijspeert, and A. Billard, 2004. Computational approaches to motor learning by imitation, volume 1431, pages 199-218. Oxford University Press.
Scott B. Shadrick and James W. Lussier, 2005. Concept Development for Future Domains: A New Method of Knowledge Elicitation. U.S. Army Research Institute Tech Report 1167.
D. Suc and I. Bratko, 1999. Symbolic and qualitative reconstruction of control skill. Electronic Transactions on Artificial Intelligence, Section B, Vol. 3122.
S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, P. Stang, S. Strohband, C. Dupont, L.-E. Jendrossek, C. Koelen, C. Markey, C. Rummel, J. van Niekerk, E. Jensen, P. Alessandrini, G. Bradski, B. Davies, S. Ettinger, A. Kaehler, A. Nefian, and P. Mahoney, 2006. Winning the DARPA Grand Challenge. Journal of Field Robotics.
B. Widrow and F.W. Smith, 1964. Pattern-recognizing control systems. In Computer and Information Sciences (COINS) Proceedings, Washington, D.C.: Spartan.

64
Backup Slides
65
AEMASE Algorithm
Figure: flowchart (legend: Data, Code) with a Model Construction path (Expert Performs Task) and a Student Evaluation path (Student Performs Task); stages shown: Observation Sequence, Feature Extraction, Feature Vector, Feature Vector Sequence, Context Set, Context Recognition, Transition Probabilities, Sequencing, Performance Evaluation
66
Clustering for Observation Selection

Try to preserve only semantically distinct contexts (cluster centers)
K-Means Clustering
Allows (requires) manual specification of model size
Requires many distance computations: expensive if using rotational invariance
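A plain version of the K-means step is sketched below: points (observations or feature vectors) are assigned to the nearest center, and each center moves to the mean of its cluster until nothing changes. It uses Euclidean distance and random initial centers and ignores the rotational-invariance variant; `kmeans` and its parameters are illustrative, with `k` playing the role of the manually specified model size.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def mean(points: Sequence[Vec]) -> Vec:
    # Component-wise mean of a non-empty list of points.
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points: Sequence[Vec], k: int, iterations: int = 50, seed: int = 0) -> List[Vec]:
    """Plain Lloyd's k-means; returns k cluster centers (the contexts)."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    for _ in range(iterations):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(kmeans(points, k=2)))  # [(0.5, 0.5), (10.5, 10.5)]
```

Every iteration computes a distance from every point to every center, which is why an expensive distance (for example, one that searches over rotations) makes this step costly.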

67
Feature Mapping vs. Clustering

"Specifying an appropriate dissimilarity measure is far more important in obtaining success with clustering than choice of clustering algorithm."
- Hastie, Tibshirani, Friedman, 2001

68
Clustering Example
Input: 1215 observations
Output: 30 cluster centers (contexts)
69
Introduction: Intelligent Tutoring Systems
Figure: Tutoring Module, Student Module, Expert Module
A Minimal Intelligent Tutoring System
(Burns & Capps 1998)
70
Introduction: Intelligent Tutoring Systems

Student Module
Represents a student's domain knowledge
Expert Module
Represents the tutor's domain knowledge and problem-solving expertise
Tutoring Module
Selects exercises and presents instruction

71
Perception

Discretize input into a long sequence of observations
Observation: a (potentially) high-dimensional vector
Computer vision, or application instrumentation

Soccer video, 1 game: 70,000 frames, 640x240 pixels, 4 cameras, highly compressed: 1,200,000,000 bytes
Player/ball coordinates: 70,000 46-dimensional vectors, i.e. a 70,000-row x 46-column observation matrix: 12,880,000 bytes
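The two storage figures above differ by roughly two orders of magnitude. The quick calculation below reproduces the observation-matrix size, assuming each coordinate is stored as a 4-byte single-precision float, which is the assumption that makes 70,000 x 46 values come to exactly 12,880,000 bytes.

```python
import struct

rows, cols = 70_000, 46
bytes_per_value = struct.calcsize("f")   # 4 bytes: assumes 32-bit floats
matrix_bytes = rows * cols * bytes_per_value
video_bytes = 1_200_000_000

print(matrix_bytes)                        # 12880000
print(round(video_bytes / matrix_bytes))   # 93
```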
72
Perception Example: Soccer Data
1000 samples (approx. 1 min.) of 23 points in $\mathbb{R}^2$: 2 teams of 11 players, and the ball
73
Perception Example: Soccer Data Matrix Representation
N = 46 columns (indicators / inputs / random variables)
M = 70,000 rows (observations), ordered by time t
Each observation is a point in $\mathbb{R}^{46}$
Columns: Ball (x, y), Player 1 (P1), Player 2 (P2), ...; the first 10 of 46 columns are shown

Ball x  Ball y  P1 x    P1 y    P2 x    P2 y    ...
0       0       -50.5   0       -20     -17     -20     -5.4    -20     5.4     ...
-1.1    -2.0    -50.5   0       -20     -17     -20     -5.4    -20     5.4     ...
-2.2    -3.9    -49.8   -0.0    -19.5   -16.6   -19.3   -5.3    -19.4   5.3     ...
-3.3    -5.5    -49.0   -0.0    -19.4   -16.5   -18.6   -5.1    -18.5   5.1     ...
-4.4    -7.1    -48.7   -0.1    -18.7   -16.4   -17.6   -4.9    -17.7   4.8     ...
-5.3    -8.6    -48.6   -0.1    -17.9   -16.5   -16.7   -4.8    -16.8   4.5     ...
-6.3    -10.1   -48.0   -0.2    -16.9   -16.5   -15.8   -4.5    -15.9   4.1     ...
-7.3    -11.5   -47.2   -0.2    -15.9   -16.6   -15.5   -4.4    -15.5   4.0     ...
-6.8    -11.7   -46.3   -0.2    -15.0   -16.7   -14.8   -4.4    -14.7   3.9     ...
-6.3    -11.8   -45.3   -0.2    -13.9   -16.7   -13.9   -4.4    -13.9   3.8     ...
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
74
Feature Mapping: Complex Features

The linear observations are segmented into logically atomic complex (multidimensional) features

Each input might be used in several complex features, or not at all
Complex features may also use past inputs (not shown)

Figure: an observation in $\mathbb{R}^{20}$ (fuel, position, and heading for Self and Teammate 1; position and heading for opponents) is mapped into Complex Features 1-3, which form a feature vector in $\mathbb{R}^{6}$ (X1, Y1, Z1 from feature 1; X2 from feature 2; X3, Y3 from feature 3)
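A feature mapping of this kind can be expressed as a list of extractor functions, each reading whichever observation inputs it needs (possibly overlapping with other features, possibly skipping some inputs) and returning a short vector. The index layout, feature names, and the relative-position feature below are hypothetical; only the dimensions (an observation in R^20 mapped to a feature vector in R^6 built from 3-, 1-, and 2-dimensional parts) follow the figure.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]

# Hypothetical layout of the 20-value observation (indices are illustrative):
# 0 self fuel, 1-3 self position (x, y, z), ..., 10-11 an opponent's (x, y).
COMPLEX_FEATURES: List[Tuple[str, Callable[[Observation], List[float]]]] = [
    # Feature 1 (3-D): own position, read directly from inputs 1-3.
    ("self_position", lambda o: [o[1], o[2], o[3]]),
    # Feature 2 (1-D): own fuel.
    ("self_fuel", lambda o: [o[0]]),
    # Feature 3 (2-D): opponent position relative to self; reuses inputs 1-2.
    ("opponent_offset", lambda o: [o[10] - o[1], o[11] - o[2]]),
]


def feature_vector(observation: Observation) -> List[float]:
    """Concatenate all complex features into one flat feature vector."""
    vector: List[float] = []
    for _name, extract in COMPLEX_FEATURES:
        vector.extend(extract(observation))
    return vector


obs = [float(i) for i in range(20)]
print(feature_vector(obs))  # [1.0, 2.0, 3.0, 0.0, 9.0, 9.0]
```

Grouping inputs into named features lets a distance measure treat each feature as a unit (for example, comparing positions as points rather than as unrelated numbers).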
75
Merging of 4 Viewpoints

UNM Soccer Pitch 70x66m
4 cameras utilized for adequate resolution
Total resolution 2720 x 240 pixels
Efficient processing necessary!

76
Synthesized Overhead View
77
Some Target Tracking Terminology

Targets: soccer players (etymology: radar)
Track: the computer's current state estimate for each target
Observation: for each frame of video, the set of perceived target returns

78
LBSCCA: Tentative Dilation
Figure: comparison of LBSCCA (correct result) with the conventional process; stages shown include vertical erosion, horizontal dilation, connected components, and tentative connected components with dilation
81
1NN Clustering Avoids Undesirable Generalization
Figure: a query against the knowledge base, comparing the 1NN response with the 2NN response
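One way to see why a single nearest neighbor can be preferable: when the knowledge base holds two distinct expert responses, averaging the two nearest neighbors can produce a response no expert ever gave. The knowledge base, coordinates, and `knn_response` below are an invented illustration, not data from the talk.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knn_response(kb: List[Tuple[Point, Point]], query: Point, k: int) -> Point:
    """Average the responses of the k knowledge-base entries nearest the query."""
    nearest = sorted(kb, key=lambda entry: math.dist(entry[0], query))[:k]
    xs = [resp[0] for _, resp in nearest]
    ys = [resp[1] for _, resp in nearest]
    return (sum(xs) / k, sum(ys) / k)


# Hypothetical knowledge base of (ball position, defender position): with the
# ball on the left wing the defender covers the left, on the right wing the right.
kb = [((0.0, -20.0), (-30.0, -15.0)),
      ((0.0, 20.0), (-30.0, 15.0))]
query = (0.0, 1.0)  # ball slightly right of centre
print(knn_response(kb, query, k=1))  # (-30.0, 15.0): a position an expert used
print(knn_response(kb, query, k=2))  # (-30.0, 0.0): an average no expert chose
```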
82
Tournament of Gaussians
83
Selective Pressure vs. Population Diversity

Low selective pressure: fitness is nullified
Evolution does not progress
High selective pressure: diversity is destroyed
Fast short-term gains
Converges to local minimum

84
Tournament of Gaussians
85
Common Tournament Selection: Constant Selective Pressure
Figure: diverse vs. homogeneous populations
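For reference, conventional tournament selection is sketched below: draw `size` individuals at random and keep the fittest. The tournament size fixes the selective pressure for the whole run, which is the limitation noted on the slide. This is the standard operator only; the Tournament of Gaussians operator itself is not described in this transcript, so it is not reproduced here, and the function names are illustrative.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tournament_select(population: Sequence[T], fitness: Callable[[T], float],
                      size: int, rng: random.Random) -> T:
    """Standard tournament selection: best of `size` randomly drawn individuals."""
    contestants = rng.sample(list(population), size)
    return max(contestants, key=fitness)


def mean_selected_fitness(size: int, draws: int = 2000, seed: int = 0) -> float:
    # Larger tournaments favour fitter individuals more strongly.
    rng = random.Random(seed)
    population = list(range(10))  # fitness of individual i is simply i
    picks = [tournament_select(population, lambda x: x, size, rng) for _ in range(draws)]
    return sum(picks) / draws


for size in (1, 2, 5, 10):
    print(size, round(mean_selected_fitness(size), 2))
```

A tournament as large as the population always returns the best individual (maximum pressure), while size 1 is uniform random choice (no pressure).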
86
Figure: modes of instruction. Individual Tutoring: Instructor, Student. Group Instruction: Instructor, Several Students. Books: Instructor, Book, Many Students. Computer Instruction (shown in two arrangements): Instructor, Programmer, Software, and Other Applications, reaching Many Students.
87
Behavior Dataset Produced

10 Hz samples
Positions of 22 players
Ball
PlayMode (in play, waiting for kickoff, etc.)
20 minutes of play
Hand-verified
Accuracy: 1 m
Player identities preserved for duration of dataset
UC Irvine Anteaters vs. Western Illinois U Leathernecks

88
RoboCup Performance Experiment Protocol

For each combination of observation set size, context set size, and selected features, calculate the mean penalty score.
Penalty score: $\text{Goals}_{\text{opponent}} - \text{Goals}_{\text{self}}$
The opponent:
Is fixed: it is a model-based team, but uses the same parameters in all conditions.
Is a clone of the UVA TriLearn RoboCup team.
100 RoboCup matches per condition to estimate the mean penalty score.
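The penalty score and its per-condition mean are simple to compute. This sketch assumes each match result is recorded as a hypothetical (goals_self, goals_opponent) pair, and the three results are invented; a lower (more negative) mean penalty means the team outscored the fixed opponent more often.

```python
from statistics import mean
from typing import Iterable, Tuple


def penalty(goals_self: int, goals_opponent: int) -> int:
    """Penalty score for one match: opponent goals minus own goals."""
    return goals_opponent - goals_self


def mean_penalty(matches: Iterable[Tuple[int, int]]) -> float:
    """Mean penalty over (goals_self, goals_opponent) results."""
    return mean(penalty(s, o) for s, o in matches)


# Hypothetical results for three matches (the protocol uses 100 per condition):
results = [(1, 2), (0, 0), (3, 1)]
print(mean_penalty(results))
```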

89
Player Prediction Error: X Component of Ball Position

Best performance with only 256 observations (25 s)
Best performance with only 1-4 contexts

90
Player Prediction Error: Ball Position (2D)

Best performance with all (12,000) observations
Best performance with max (256) contexts

91
Player Prediction Error: Ball Position + Ball Velocity

Best performance with all (12,000) observations
Best performance with max (256) contexts

92
RoboCup Performance: X Component of Ball Position (1D)

Best performance with only 256 observations (25 s)
Best performance with only 1-4 contexts

93
RoboCup Performance: Ball Position (2D)

Best performance with 1024 to 4096 observations
Best performance with 16 contexts

94
RoboCup Performance: Ball Position + Ball Velocity (4D)

Performance improves through 12,000 observations (all available)
Best performance with 8-32 contexts

95
Summary of Research

1) Perception: Multiple Target Tracking (MuTTSA, LBSCCA)
The foundation of simulator validation: parallel observations of the real world and the simulator (Chapters 3, 4)

2) Tactics Modeling: Temporal Feature Extraction, Goal-Driven Instance-Based Learning
Computational representation of tactics for experimentation and analysis (Chapter 5)

3) Automated Student Evaluation: AEMASE, Tactical Aircraft Maneuver
Corrective feedback for students using a simulator for training (Chapter 6)

4) Team Behavioral Cloning: RoboCup Simulation, Simulator Validation
Quantify transferability of tactics between reality and simulation (Chapter 7)

5) Team Evolution: RoboCup Simulation, Reinforcement Learning
Predict influence of simulator on students (Chapter 8)
96
Previous Coping Strategies for Evolving RoboCup
Agents

Evolve an individual agent instead of the whole
team.
Implement a graduated fitness function to provide
short-term rewards (and bias behavior).
Play a less complicated game in the RoboCup
domain
Keepaway soccer
Single-player team
Focus exclusively on goalie

97
Hardware