Automated Tactics Modeling: Techniques and Applications
1
Automated Tactics Modeling: Techniques and Applications
  • Robert G. Abbott
  • Advisor: Stephanie Forrest
  • April 5, 2007

2
Objectives of Simulation-Based Training
  • Increase
  • Safety
  • Availability
  • Flexibility
  • Decrease
  • Cost
  • Environmental Impact

3
But is Simulation Training Valid?
  • Topic selection
  • Physics accuracy
  • Human behavior simulation
  • Corrective feedback

4
Proposed Solution
  • More SME Control
  • Simulation as Interactive Media

[Diagram: getting the message from the sender (SME), through the medium, to the recipients (students)]
5
Summary of Research
Technology -> Application
  1) Automated Perception -> Grounding in Reality
  2) Tactics Modeling -> Models for Experimentation
  3) Auto Student Evaluation -> Corrective Feedback
  4) Team Behavioral Cloning -> Tactics Transferability
  5) Agent Team Evolution -> Simulate Learner
6
Background and Related Work
7
Training Transfer Studies
  • The Gold Standard
  • Few and far between (Carretta & Dunlap 1998)

So much for that new flight training software!
8
SME Validation
(Wallace 1989)
  • Highly subjective
  • Limited SME availability

Wheeeee!
9
Reverse the Process
From: Simulation -> Reality
To: Reality -> Simulation
10
Conventional Expert System Construction (Shadrick 2005)
[Diagram: SME -> Knowledge Engineer -> Programmer -> Simulation]
11
Problems With Conventional Expert System Construction
[Diagram: SME -> Knowledge Engineer -> Programmer -> Simulation]
  • Implicit knowledge
  • Coordination
  • Expensive
  • Slow
12
An Example From Soccer
Soccer SME: "Force the attacker to screen the ball with their first touch."
Knowledge Engineer: "Section 23, Paragraph 9: In defensive roles (section 7.21), players shall reverse the course of the ball prior to initial contact by the advancing opponent."
The Telephone Game
Programmer:
  else if (player.id > 1 && player.id < 4)
    dpos.x = cos((w.y-s.y)/(w.x-s.x));
    dpos.x = cos((w.y-s.y)/(w.x-s.x));
Student: "I can beat the defender every time by turning left."
13
Alternate Approach: Trainable Agents
[Diagram: the software developer builds trainable software; the SME then trains it directly]
14
Related Work: Behavioral Cloning
  • Data
  • Model
  • Agents
  • Generalization
  • Applications
  • Inverted pendulum (Widrow 1964)
  • Aircraft piloting (Sammut 1992)
  • Bicycle riding (Suc 1999)
  • Soccer play in RoboCup (Aler 2005)
  • Highly synthetic

15
Related Work: Learning by Observation
  • Robotics
  • Applications
  • Juggling (Schaal 2004)
  • Table tennis (Atkeson 2000)
  • Automated driving (Pomerleau 1991, Thrun 2006)
  • Radio-controlled helicopter (Abbeel 2007)
  • many, many more
  • The simulator (if any) is just a byproduct

16
Task Domain: RoboCup Simulation League
  • Popular simulation of a popular sport
  • Requires expertise

17
Summary of Research
Technology -> Application
  1) Automated Perception -> Grounding in Reality
  2) Tactics Modeling -> Models for Experimentation
  3) Auto Student Evaluation -> Corrective Feedback
  4) Team Behavioral Cloning -> Tactics Transferability
  5) Agent Team Evolution -> Simulate Learner
18
MuTTSA
19
Summary of Research
Technology -> Application
  1) Automated Perception -> Grounding in Reality
  2) Tactics Modeling -> Models for Experimentation
  3) Auto Student Evaluation -> Corrective Feedback
  4) Team Behavioral Cloning -> Tactics Transferability
  5) Agent Team Evolution -> Simulate Learner
20
Soccer Field Positioning
  • Predict position of each player
  • Captures high-level tactics
  • Instance-based learning (IBL) algorithm (a minimal lookup sketch follows)
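The positioning model can be pictured as a nearest-context lookup. The sketch below is a minimal illustration of that idea, assuming the stored contexts are ball features and the outputs are remembered player positions; it is not the exact model or feature set from the dissertation.

```python
# Minimal sketch of instance-based field-positioning prediction (illustrative only).
# `contexts` holds remembered game situations (e.g. ball position); `positions`
# holds the player coordinates observed in each remembered situation.
import numpy as np

def predict_positions(query, contexts, positions):
    """Return the stored player positions of the nearest remembered context."""
    contexts = np.asarray(contexts, dtype=float)     # shape (n_contexts, n_features)
    positions = np.asarray(positions, dtype=float)   # shape (n_contexts, n_players, 2)
    dists = np.linalg.norm(contexts - np.asarray(query, dtype=float), axis=1)
    return positions[np.argmin(dists)]

# Example: two remembered ball positions, one player per context.
contexts = [[-20.0, 5.0], [30.0, -10.0]]
positions = [[[-15.0, 4.0]], [[25.0, -8.0]]]
print(predict_positions([28.0, -9.0], contexts, positions))   # -> [[25. -8.]]
```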

21
Model Parameters: Number of Observations
  • Learning success
  • Data is precious
  • Up to 20 minutes of data

22
Parameter Selection: Number of Contexts
  • Amount of agent memory
  • Over-fitting vs. over-generalization
  • Computational and memory costs
  • Up to 256 contexts

23
Parameter Selection: Feature Selection
  • Markov property
  • Curse of dimensionality
  • Conditions (feature sets; a construction sketch follows):
  • Ball X (1D)
  • Ball position (2D)
  • Ball position + velocity (4D)
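Below is an illustrative construction of these three condition sets. The assumption that the ball's (x, y) occupies the first two columns of the observation matrix, and the 10 Hz time step, are mine, made only to keep the sketch self-contained.

```python
# Illustrative construction of the three feature (condition) sets named above.
import numpy as np

def feature_sets(obs, dt=0.1):
    ball = obs[:, :2]                                   # ball (x, y) per frame (assumed layout)
    vel = np.vstack([np.zeros((1, 2)), np.diff(ball, axis=0) / dt])  # finite-difference velocity
    return {
        "ball_x": ball[:, :1],                           # 1-D feature
        "ball_position": ball,                           # 2-D feature
        "ball_position_velocity": np.hstack([ball, vel]) # 4-D feature
    }

obs = np.array([[0.0, 0.0], [-1.1, -2.0], [-2.2, -3.9]])
print({k: v.shape for k, v in feature_sets(obs).items()})
```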

24
Prediction Accuracy Matrix (NumObservations, NumContexts, Features)
Mean squared error of player position prediction
Each point: 10-fold cross-validation for each of 10 agents (a sketch of the procedure follows)
[Matrix annotations: min(row), min(column)]
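A sketch of the kind of 10-fold cross-validation behind each matrix entry: split the observations into folds, predict held-out player positions with a nearest-context lookup, and average the squared error. The fold construction and the plain 1-NN predictor are illustrative assumptions.

```python
# Sketch of 10-fold cross-validated MSE for player-position prediction.
import numpy as np

def cv_mse(features, targets, n_folds=10, seed=0):
    features = np.asarray(features, dtype=float)   # shape (n_obs, n_features)
    targets = np.asarray(targets, dtype=float)     # shape (n_obs, ...)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(features)), n_folds)
    errors = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        for i in test:
            d = np.linalg.norm(features[train] - features[i], axis=1)
            pred = targets[train][np.argmin(d)]    # nearest-context prediction
            errors.append(np.mean((pred - targets[i]) ** 2))
    return float(np.mean(errors))
```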
25
Training Data Reduces Prediction Error
26
Additional Contexts Reduce Prediction Error
27
Soccer Tactics Modeling Results
  • (Some) team tactics can be modeled
  • Ball position surprisingly effective
  • Room for improvement
  • But good enough to continue

28
Summary of Research
Technology -> Application
  1) Automated Perception -> Grounding in Reality
  2) Tactics Modeling -> Models for Experimentation
  3) Auto Student Evaluation -> Corrective Feedback
  4) Team Behavioral Cloning -> Tactics Transferability
  5) Agent Team Evolution -> Simulate Learner
29
Automated Student Evaluation
  • Behavior models are effective in assessing student behavior
  • "Automated expert modeling for automated student evaluation" (Abbott 2006)
  • Or see Chapter 6

30
Summary of Research
Technology -> Application
  1) Automated Perception -> Grounding in Reality
  2) Tactics Modeling -> Models for Experimentation
  3) Auto Student Evaluation -> Corrective Feedback
  4) Team Behavioral Cloning -> Tactics Transferability
  5) Agent Team Evolution -> Simulate Learner
31
Hierarchy of Skills for a Human-Like RoboCup Team
Team field positioning: model-based
Individual ball handling: ad hoc
Low-level skills: UVA TriLearn
32
RoboCup Performance Matrix: Parameter Sweep (NumObservations, NumContexts, Features)
Mean penalty score
Each point: mean of 100 RoboCup matches
33
RoboCup Performance Increases with Additional
Human Observation
34
Large Models Perform Poorly in RoboCup
35
Human Model Accuracy and RoboCup Performance are
Correlated
Average correlation for all 3 feature sets: 0.43
36
RoboCup Performance Experiment: Results
  • Strong correlation implies:
  • Modeled behavior is significant
  • Real-world tactics transfer to simulation
  • The humans being modeled are skilled
  • Weak correlation implies:
  • At least one of the above statements is false.

37
RoboCup Performance Experiment: Significance to Training Simulations
  • Strong correlation in a training simulator implies:
  • The skill is important
  • The simulator penalizes arbitrary tactics
  • The simulator is likely valid for evaluating students
  • Lack of significant correlation implies:
  • Negative training
  • Back to the drawing board

38
Limitations of the Results
  • Constrained process for producing arbitrary
    tactics
  • Degraded models probably not representative of
    students

39
Summary of Research
Technology -> Application
  1) Automated Perception -> Grounding in Reality
  2) Tactics Modeling -> Models for Experimentation
  3) Auto Student Evaluation -> Corrective Feedback
  4) Team Behavioral Cloning -> Tactics Transferability
  5) Agent Team Evolution -> Simulate Learner
40
Gaming the System
  • Gaming tactics: beat the game without regard to training benefit
  • Circle-kick vulnerability (Burkhard 1998)
  • Tank simulator

41
Evolutionary Optimization
  • Maximizes reward
  • No expert preconceptions
  • Global search
  • Approximate the training influence on the student

42
Characteristics of RoboCup as an Optimization Problem
  • Huge search space
  • Reward (fitness) is:
  • Expensive to compute
  • Stochastic
  • Delayed

43
Evolutionary Optimization Complements Behavioral Cloning
  • Shared strengths
  • Implementer is not a domain expert
  • Reduced programming
  • Complementary strengths
  • Fast vs. slow
  • Open-ended vs. dead-end

44
Evolving a Team: Fitness
  • Goal difference
  • Fixed opponent
  • UVA Trilearn clone
  • 0 penalty achievable

45
Unit of Selection: What is an Individual?
[Diagram: Population -> Team -> Agent -> Context -> (X, Y) values]
  • Each genome codes for an entire team

46
Crossover Operation
[Diagram: two parent teams exchange whole agents, each with all of its contexts and (X, Y) values]
  • Single-point crossover: agents 1..N switch teams

47
Mutation
[Diagram: every (X, Y) value in a mutated team is perturbed]
  • P(mutation) = 0.1 for every team created by crossover
  • Mutation: all values in the team are perturbed by N(0, 0.5 m)
  • A sketch of the genome and both operators follows
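A compact sketch of the genome and the two operators just described. The team size of 10 agents, the 32 contexts per agent, and the 105 m x 68 m coordinate bounds are assumptions chosen to match numbers elsewhere in the talk, not a copy of the actual implementation.

```python
# Sketch of the team genome and the crossover / mutation operators above.
import numpy as np

N_AGENTS, N_CONTEXTS = 10, 32          # assumed team size and contexts per agent
rng = np.random.default_rng(0)

def random_team():
    # genome: one (x, y) target per context per agent -> shape (agents, contexts, 2)
    return rng.uniform([-52.5, -34.0], [52.5, 34.0], size=(N_AGENTS, N_CONTEXTS, 2))

def crossover(team_a, team_b):
    # single-point crossover: agents 1..cut switch teams
    cut = rng.integers(1, N_AGENTS)
    child_a = np.concatenate([team_b[:cut], team_a[cut:]])
    child_b = np.concatenate([team_a[:cut], team_b[cut:]])
    return child_a, child_b

def maybe_mutate(team, p=0.1, sigma=0.5):
    # with probability p, perturb every value in the team by N(0, sigma metres)
    if rng.random() < p:
        team = team + rng.normal(0.0, sigma, size=team.shape)
    return team

child, _ = crossover(random_team(), random_team())
child = maybe_mutate(child)
```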

48
Selection Operator: Tournament of Gaussians
49
Start Conditions
  • Only initial model contents vary
  • Random
  • Human Median
  • Human Clone
  • Trilearn Clone

50
Evolution Parameters
  • Volume of search space: 32 contexts x 10 players x (105 m x 68 m field) = (7140 m^2)^320 ≈ 10^1233 m^640 (a quick arithmetic check follows)
  • 250K RoboCup matches per run, 15 min. ea.
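A quick check of that arithmetic (the grouping into 32 contexts per agent times 10 players is my reading of the figures):

```python
# Back-of-the-envelope check of the search-space volume quoted above.
import math

positions = 32 * 10                                # 320 field positions per genome
area = 105 * 68                                    # 7140 m^2 per position
print(area)                                        # 7140
print(positions * math.log10(area))                # ~1233 -> (7140 m^2)^320 ~ 10^1233 m^640
```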

51
Team Evolution Results
52
Fitness - 1000 Generations
  • Progress but not convergence

53
Initial Fitness and Fitness Gain
  • Starting condition is critical
  • Synthetic (TriLearn) tactics beat human tactics

54
Human Similarity Metric
[Diagram: corresponding contexts a_i and b_i of two models, shown for the Player 1 and Player 2 contexts]
Cost function: (|a_1 - b_1|^2 + |a_2 - b_2|^2 + ...) / n (a sketch follows)
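A minimal sketch of that cost, treating a_i and b_i as the (x, y) values of corresponding contexts in the two models being compared; pairing contexts by index is an assumption of the sketch.

```python
# Mean squared distance between corresponding contexts of two models.
import numpy as np

def similarity_cost(model_a, model_b):
    a = np.asarray(model_a, dtype=float)      # shape (n_contexts, 2)
    b = np.asarray(model_b, dtype=float)
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))

print(similarity_cost([[0, 0], [10, 5]], [[1, 0], [10, 7]]))   # (1 + 4) / 2 = 2.5
```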
55
Evolution of Human Similarity
56
Total Change of Human Similarity
57
Team Evolution: Discussion
  • Expert model captures valuable information
  • No drastic results from tactics evolution
  • 3 of 4 teams' tactics became less human-like
  • Can't model all human teams
  • Can't cover the space of tactics
  • Hypothesis space of model is limited
  • Breakthrough tactics in RoboCup are difficult

58
Team Evolution: Limitations
  • Can't model all human teams
  • Can't cover the space of tactics
  • Limited behavior model
  • Breakthrough tactics in RoboCup are difficult
60
Conclusion
  • Human behavior critical to modeling and
    simulation
  • Importance of real-world data
  • Quantification of
  • Student tactical behavior
  • Tactics transferability
  • Training influence of simulator

61
Contributions
  • A data set of real-world human soccer tournament
    play
  • An attention-driven image segmentation algorithm
    which drastically reduces computation costs in a
    vision-based multiple target tracking system
  • A method for accurate, real-time assessment of
    student behavior in a tactical domain using
    comparison with a behavioral clone of domain
    experts
  • A measure of tactical fidelity in a simulator
    based on the correlation between human behavior
    predictive accuracy and software agent
    performance in the simulator
  • Tournament of Gaussians, a genetic selection
    algorithm which combines favorable aspects of
    proportionate and ordinal techniques

62
Publications
  • R.G. Abbott. Automated expert modeling for automated student evaluation. Intelligent Tutoring Systems 2006: 1-10.
  • R.G. Abbott, J.H. Whetzel, J.D. Basilico. Automated Student Evaluation for a Distributed After-Action Review Application. Human System Integration Symposium 2007.
  • R.G. Abbott, S. Forrest, K.J. Pienta. Simulating the hallmarks of cancer. Artificial Life 12(4): 617-634, 2006.
  • R.G. Abbott. Behavioral cloning for simulator validation. Submitted to RoboCup Symposium 2007.
  • R.G. Abbott, L.R. Williams. Multiple target tracking with lazy background subtraction and connected components analysis. Submitted to Journal of Machine Vision and Applications.

63
References
  • Pieter Abbeel, Adam Coates, Morgan Quigley, and
    Andrew Y. Ng, 2007. An application of
    reinforcement learning to aerobatic helicopter
    flight. In Advances in Neural Information
    Processing Systems 19 (NIPS).
  • R. Aler, O. Garcia, and J.M. Valls, 2005.
    Correcting and improving imitation models of
    humans for robosoccer agents. IEEE Congress on
    Evolutionary Computation. Volume 3, pages
    2402-2409.
  • Christopher G. Atkeson, Joshua G. Hale, Frank
    Pollick, Marcia Riley, Shinya Kotosaka, Stefan
    Schaal, Tomohiro Shibata, Gaurav Tevatia, Ales
    Ude, Sethu Vijayakumar, and Mitsuo Kawato, 2000.
    Using humanoid robots to study human behavior.
    IEEE Intelligent Systems, 15(4): 46-56.
  • Hans-Dieter Burkhard, Markus Hannebauer, and Jan
    Wendler, 1998. AT Humboldt - Development,
    Practice and Theory. In RoboCup-97: Robot Soccer World Cup I, pages 357-372.
  • Carretta, T. R. & Dunlap, R. D. (1998). Transfer of training effectiveness in flight simulation: 1986-1997 (Report No. AFRL-HE-AZ-TR-1998-0078). Mesa, AZ: Air Force Research Laboratory.
  • Dean Pomerleau, 1991. Efficient training of
    artificial neural networks for autonomous
    navigation. Neural Computation, 3(1): 88-97.
  • Claude Sammut, Scott Hurst, Dana Kedzier, and
    Donald Michie, 1992. Learning to fly. In ML '92: Proceedings of the Ninth International Workshop
    on Machine Learning, pages 385-393, San
    Francisco, CA, USA. Morgan Kaufmann Publishers
    Inc.
  • S. Schaal, A. Ijspeert, and A. Billard, 2004.
    Computational approaches to motor learning by
    imitation, volume 1431, pages 199-218. Oxford
    University Press.
  • Scott B. Shadrick and James W. Lussier, 2005.
    Concept Development for Future Domains: A New Method of Knowledge Elicitation. U.S. Army
    Research Institute Tech Report 1167.
  • D. Suc and I. Bratko, 1999. Symbolic and
    qualitative reconstruction of control skill.
    Electronic Transactions on Artificial Intelligence, Section B, Vol. 3: 1-22.
  • S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens,
    A. Aron, J. Diebel, P. Fong, J. Gale, M.
    Halpenny, G. Hoffmann, K. Lau, C. Oakley, M.
    Palatucci, V. Pratt, P. Stang, S. Strohband, C.
    Dupont, L.-E. Jendrossek, C. Koelen, C. Markey,
    C. Rummel, J. van Niekerk, E. Jensen, P.
    Alessandrini, G. Bradski, B. Davies, S. Ettinger,
    A. Kaehler, A. Nefian, and P. Mahoney, 2006.
    Winning the DARPA Grand Challenge. Journal of
    Field Robotics.
  • B. Widrow and F.W. Smith, 1964.
    Pattern-recognizing control systems. In Computer
    and Information Sciences (COINS) Proceedings, Washington, D.C.: Spartan.

64
Backup Slides
65
AEMASE Algorithm
[Diagram: data and code in the AEMASE pipeline]
Model construction: Expert performs task -> Observation sequence -> Feature extraction -> Feature vector sequence -> Context set
Student evaluation: Student performs task -> Feature vector -> Context recognition -> Transition probabilities / Sequencing -> Performance evaluation
(A schematic sketch of this flow follows.)
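The sketch below follows the student-evaluation branch of that flow under simplifying assumptions: contexts built from expert data are given, context recognition is a nearest-neighbour lookup, and the student is scored by how often their feature vectors fall near some expert context. The distance threshold and scoring rule are illustrative; the actual AEMASE algorithm also uses transition probabilities and sequencing.

```python
# Schematic sketch of context recognition and a simple student score.
import numpy as np

def recognise_context(feature, contexts):
    d = np.linalg.norm(np.asarray(contexts, float) - np.asarray(feature, float), axis=1)
    return int(np.argmin(d)), float(np.min(d))

def evaluate_student(student_features, contexts, max_distance=5.0):
    # fraction of time the student's behaviour lies close to some expert context
    hits = [recognise_context(f, contexts)[1] <= max_distance for f in student_features]
    return float(np.mean(hits))

contexts = [[-20.0, 5.0], [30.0, -10.0]]           # contexts built from expert data
print(evaluate_student([[-18.0, 4.0], [50.0, 0.0]], contexts))   # 0.5
```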
66
Clustering for Observation Selection
  • Try to preserve only semantically distinct contexts (cluster centers)
  • K-means clustering (a minimal sketch follows)
  • Allows (requires) manual specification of model size
  • Requires many distance computations; expensive if using rotational invariance
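A minimal k-means sketch for picking the context set (cluster centers) from an observation sequence; the initialization and iteration count are illustrative choices rather than the dissertation's settings.

```python
# K-means for selecting cluster centers (contexts) from observations.
import numpy as np

def kmeans_contexts(observations, n_contexts, n_iter=50, seed=0):
    x = np.asarray(observations, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=n_contexts, replace=False)]
    for _ in range(n_iter):
        # assign each observation to its nearest center, then recompute centers
        labels = np.argmin(np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2), axis=1)
        for k in range(n_contexts):
            members = x[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers

# e.g. reduce 1,215 ball-position observations to 30 contexts:
# contexts = kmeans_contexts(ball_positions, n_contexts=30)
```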

67
Feature Mapping vs. Clustering
  • "Specifying an appropriate dissimilarity measure is far more important in obtaining success with clustering than choice of clustering algorithm"
  • (Hastie, Tibshirani, and Friedman, 2001)

68
Clustering Example
Input: 1,215 observations
Output: 30 cluster centers (contexts)
69
Introduction: Intelligent Tutoring Systems
[Diagram: a minimal intelligent tutoring system with tutoring, student, and expert modules (Burns & Capps 1998)]
70
Introduction: Intelligent Tutoring Systems
  • Student module: represents a student's domain knowledge
  • Expert module: represents the tutor's domain knowledge and problem-solving expertise
  • Tutoring module: selects exercises and presents instruction

71
Perception
  • Discretize input into a long sequence of observations
  • Observation: a (potentially) high-dimensional vector
  • Computer vision, or application instrumentation

Soccer video, 1 game: 70,000 frames x 640x240 pixels x 4 cameras, highly compressed: 1,200,000,000 bytes
Player/ball coordinates: 70,000 46-dimensional vectors, i.e. a 70,000-row x 46-column observation matrix: 12,880,000 bytes
(A quick check of these figures follows.)
72
Perception Example: Soccer Data
1,000 samples (approx. 1 min.) of 23 points in R^2: 2 teams of 11 players and the ball
73
Perception Example: Soccer Data, Matrix Representation
N = 46 columns (indicators / inputs / random variables): Ball, Player 1, Player 2, ...
M = 70,000 rows (observations); each observation is a point in R^46
Rows t = 1, 2, ...:
   0     0    -50.5   0    -20   -17   -20   -5.4  -20    5.4  ...
  -1.1  -2.0  -50.5   0    -20   -17   -20   -5.4  -20    5.4  ...
  -2.2  -3.9  -49.8  -0.0  -19.5 -16.6 -19.3 -5.3  -19.4  5.3  ...
  -3.3  -5.5  -49.0  -0.0  -19.4 -16.5 -18.6 -5.1  -18.5  5.1  ...
  -4.4  -7.1  -48.7  -0.1  -18.7 -16.4 -17.6 -4.9  -17.7  4.8  ...
  -5.3  -8.6  -48.6  -0.1  -17.9 -16.5 -16.7 -4.8  -16.8  4.5  ...
  -6.3 -10.1  -48.0  -0.2  -16.9 -16.5 -15.8 -4.5  -15.9  4.1  ...
  -7.3 -11.5  -47.2  -0.2  -15.9 -16.6 -15.5 -4.4  -15.5  4.0  ...
  -6.8 -11.7  -46.3  -0.2  -15.0 -16.7 -14.8 -4.4  -14.7  3.9  ...
  -6.3 -11.8  -45.3  -0.2  -13.9 -16.7 -13.9 -4.4  -13.9  3.8  ...
   ...   ...    ...   ...    ...   ...   ...  ...    ...  ...  ...
74
Feature Mapping: Complex Features
  • The linear observations are segmented into
    logically atomic Complex (multidimensional)
    features
  • Each input might be used in several complex
    features, or not at all
  • Complex features may also use past inputs (not
    shown)

[Diagram: an observation in R^20 (fuel, position, and heading for self, a teammate, and two opponents) is mapped through three complex features into a feature vector in R^6]
(An illustrative sketch of such a mapping follows.)
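An illustrative mapping in the spirit of that diagram: a few raw inputs are combined into a handful of complex features (egocentric offsets and a fuel margin here). The specific input layout and feature choices are assumptions made for the sketch, not the features used in the dissertation.

```python
# Illustrative "complex feature" mapping from a raw observation vector.
import numpy as np

def complex_features(obs):
    # assumed raw layout: self fuel, self (x, y), self heading,
    # teammate fuel, teammate (x, y), ..., opponent (x, y), ...
    self_fuel, sx, sy, heading = obs[0], obs[1], obs[2], obs[3]
    mate_fuel, mx, my = obs[4], obs[5], obs[6]
    ox, oy = obs[8], obs[9]
    return np.array([
        mx - sx, my - sy,            # complex feature 1: offset to teammate
        ox - sx, oy - sy,            # complex feature 2: offset to an opponent
        self_fuel - mate_fuel,       # complex feature 3: fuel margin
    ])

print(complex_features(np.arange(20.0)))   # maps R^20-style input to R^5 here
```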
75
Merging of 4 Viewpoints
  • UNM soccer pitch: 70 x 66 m
  • 4 cameras utilized for adequate resolution
  • Total resolution: 2720 x 240 pixels
  • Efficient processing necessary!

76
Synthesized Overhead View
77
Some Target Tracking Terminology
  • Targets: soccer players (the terminology comes from radar)
  • Track: the computer's current state estimate for each target
  • Observation: for each frame of video, the set of perceived target returns
  • (A sketch of a conventional detection step follows.)
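As a point of reference, here is a sketch of the conventional detection step (background subtraction followed by connected-components analysis) that produces such observations; the lazy variant (LBS/CCA) contributed by this work adds optimisations that are not shown here.

```python
# Conventional background subtraction + connected-components detection sketch.
import numpy as np
from scipy import ndimage

def detect_targets(frame, background, threshold=30):
    """Return one (row, col) centroid per foreground blob in the frame."""
    foreground = np.abs(frame.astype(int) - background.astype(int)) > threshold
    labels, n_blobs = ndimage.label(foreground)
    return ndimage.center_of_mass(foreground, labels, list(range(1, n_blobs + 1)))
```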

78
LBS/CCA: Tentative Dilation
[Diagram comparing the conventional process (vertical erosion, horizontal dilation, connected components) with LBS/CCA's tentative connected components with dilation, which gives the correct result]
79
(No Transcript)
80
(No Transcript)
81
1-NN Clustering Avoids Undesirable Generalization
[Diagram: a query against the knowledge base, comparing the 1-NN response with the 2-NN response]
82
Tournament of Gaussians
83
Selective Pressure vs. Population Diversity
  • Low selective pressure: fitness is nullified
  • Evolution does not progress
  • High selective pressure: diversity is destroyed
  • Fast short-term gains
  • Converges to a local minimum
  • (A plain tournament-selection sketch below illustrates the trade-off.)
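For reference, the sketch below is plain k-way tournament selection, shown only to make the pressure/diversity trade-off concrete: a larger tournament size raises selective pressure and erodes diversity faster. It is not the Tournament of Gaussians itself, whose details are not reproduced on these slides. With the penalty-score fitness used in this work one would take the minimum rather than the maximum.

```python
# Plain k-way tournament selection (illustration of selective pressure only).
import random

def tournament_select(population, fitnesses, tournament_size=3, rng=random):
    """Pick tournament_size random contenders and return the fittest one."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```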

84
Tournament of Gaussians
85
Common Tournament Selection: Constant Selective Pressure
[Plot: selective pressure from diverse to homogeneous populations]
86
[Diagram: scaling of instruction. Individual tutoring: instructor -> student. Group instruction: instructor -> several students. Books: instructor -> book -> many students. Computer instruction: instructor -> programmer -> software -> many students (and other applications).]
87
Behavior Dataset Produced
  • 10 Hz samples
  • Positions of 22 players
  • Ball
  • PlayMode: in play, waiting for kickoff, etc.
  • 20 minutes of play
  • Hand-verified
  • Accuracy: 1 m
  • Player identities preserved for duration of
    dataset
  • UC Irvine Anteaters vs. Western Illinois University Leathernecks

88
RoboCup Performance Experiment Protocol
  • For each combination of observation set size, context set size, and selected features, calculate the mean penalty score (a sketch of this loop follows).
  • Penalty score = Goals_opponent - Goals_self
  • The opponent:
  • Is fixed: a model-based team, but uses the same parameters in all conditions.
  • Is a clone of the UVA Trilearn RoboCup team.
  • 100 RoboCup matches per condition to estimate the mean penalty score.
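A sketch of that protocol as a loop. `run_match` is a hypothetical stand-in for launching one RoboCup match against the fixed Trilearn-clone opponent and returning (goals_self, goals_opponent); everything else follows the bullets above.

```python
# Sweep over (observations, contexts, features) conditions and estimate the
# mean penalty score for each one.
import itertools
import statistics

def mean_penalty(condition, n_matches=100):
    scores = []
    for _ in range(n_matches):
        goals_self, goals_opponent = run_match(condition)   # hypothetical helper
        scores.append(goals_opponent - goals_self)            # penalty score
    return statistics.mean(scores)

def sweep(observation_sizes, context_sizes, feature_sets):
    return {cond: mean_penalty(cond)
            for cond in itertools.product(observation_sizes, context_sizes, feature_sets)}
```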

89
Player Prediction Error: X Component of Ball Position
  • Best performance with only 256 (≈25 s) of observations
  • Best performance with only 1-4 contexts

90
Player Prediction Error: Ball Position (2D)
  • Best performance with all (12,000) observations
  • Best performance with max (256) contexts

91
Player Prediction Error: Ball Position + Ball Velocity
  • Best performance with all (12,000) observations
  • Best performance with max (256) contexts

92
RoboCup Performance: X Component of Ball Position (1D)
  • Best performance with only 256 (≈25 s) of observations
  • Best performance with only 1-4 contexts

93
RoboCup Performance: Ball Position (2D)
  • Best performance with 1024 to 4096 observations
  • Best performance with 16 contexts

94
RoboCup Performance: Ball Position + Ball Velocity (4D)
  • Performance improves through 12,000 (all available) observations
  • Best performance with 8-32 contexts

95
Summary of Research
1) Perception: Multiple Target Tracking (MuTTSA, LBS/CCA)
  • The foundation of simulator validation: parallel observations of the real world and the simulator. (Chapters 3, 4)
2) Tactics Modeling: Temporal Feature Extraction, Goal-Driven Instance-Based Learning
  • Computational representation of tactics for experimentation and analysis. (Chapter 5)
3) Automated Student Evaluation: AEMASE, Tactical Aircraft Maneuver
  • Corrective feedback for students using a simulator for training. (Chapter 6)
4) Team Behavioral Cloning: RoboCup Simulation, Simulator Validation
  • Quantify transferability of tactics between reality and simulation. (Chapter 7)
5) Team Evolution: RoboCup Simulation, Reinforcement Learning
  • Predict influence of the simulator on students. (Chapter 8)
96
Previous Coping Strategies for Evolving RoboCup
Agents
  • Evolve an individual agent instead of the whole
    team.
  • Implement a graduated fitness function to provide
    short-term rewards (and bias behavior).
  • Play a less complicated game in the RoboCup
    domain
  • Keepaway soccer
  • Single-player team
  • Focus exclusively on goalie

97
Hardware
  • Cluster of commodity PCs
  • 45 to 70 nodes
  • 1 GB of RAM
  • 2.4 GHz Pentium 4 processor
  • Gigabit Ethernet
  • Each generation (250 soccer matches): 15 minutes of cluster execution time
  • Batch processing model implemented
  • Torque (PBS)
  • Maximize node utilization
  • Cope with node failures
  • Arbitrate cluster access with other users' jobs

98
Fitness Gain Isolated for Between-Run Comparison