Predicting and Explaining Individual Performance in Complex Tasks

Learn more at: http://act-r.psy.cmu.edu
1
Predicting and Explaining Individual Performance
in Complex Tasks

Marsha Lovett, Lynne Reder, Christian Lebiere,
John Rehling, Baris Demiral

This project is sponsored by the Department of
the Navy, Office of Naval Research
2
Multi-Tasking

A single person can perform multiple tasks.
A single model should be able to capture
performance on those multiple tasks.
A single person brings to bear the same
fundamental processing capacities to perform all
those tasks.
A single model should be able to predict that
person's performance across tasks from his/her
capacities.

A way to keep the multiple-constraint advantage
offered by unified theories of cognition while
making their development tractable is to do
Individual Data Modeling. That is, to gather a
large number of empirical/experimental
observations on a single subject (or a few
subjects analysed individually) using a variety
of tasks that exercise multiple abilities (e.g.,
perception, memory, problem solving), and then to
use these data to develop a detailed
computational model of the subject that is able
to learn while performing the tasks.

Gobet & Ritter, 2000
4

ZERO
PARAMETER
PREDICTIONS!

5
Basic Goals of Project

Combine best features of cognitive modeling
Study performance in a dynamic, multi-tasking
situation (albeit less complex than real world)
Explain not only aggregate behavior but variation
(using individual difference variables)
Predict (not fit/postdict) complex performance
Use cognitive architecture and fixed parameters
Employ off-the-shelf models whenever possible
Plug in individual difference params for each
person

6
How to predict task performance

Estimate each individual's processing parameters
Measure individuals' performance on standard
tasks
Using models of these tasks, estimate
participants' corresponding architectural
parameters (e.g., working memory capacity,
perceptual/motor speed)
Build/refine model of target task
Select global parameters for model of target task
(e.g., from previously collected data)
Plug into model of target task each individual's
parameters to predict his/her target task
performance
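
The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: both model functions below are hypothetical toy stand-ins, not the ACT-R models used in the project, and the candidate grid and observed score are made up.

```python
from typing import Callable, Sequence


def estimate_parameter(
    model: Callable[[float], float],
    observed: float,
    candidates: Sequence[float],
) -> float:
    """Return the candidate whose model output is closest to the observed score."""
    return min(candidates, key=lambda p: abs(model(p) - observed))


def standard_task_model(capacity: float) -> float:
    """Hypothetical model of a standard task: accuracy rises with capacity."""
    return capacity / (1.0 + capacity)


def target_task_model(capacity: float, global_difficulty: float = 2.0) -> float:
    """Hypothetical model of the target task: errors fall with capacity.

    global_difficulty plays the role of a global parameter fixed in advance.
    """
    return global_difficulty / capacity


# Step 1: estimate the individual's parameter from the standard task.
candidates = [0.5 + 0.1 * i for i in range(16)]  # 0.5 .. 2.0
observed_accuracy = 0.5  # made-up score for one participant
capacity = estimate_parameter(standard_task_model, observed_accuracy, candidates)

# Step 2: plug it into the target-task model; nothing is fit to target data.
predicted_errors = target_task_model(capacity)
```

The point of the design is that the target-task data are never used for fitting, so the prediction for each person has no free parameters.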

7
Example Memory Task Performance

Fit task A to estimate an individual's parameters

8
Zero-Parameter Predictions

Plug those parameters into model of task B

(Lovett, Daily, & Reder, 2000)
9
Challenges of Complex Tasks

Modeling the target task is harder
More than one individual difference variable
likely impacting target task
Possibility of knowledge/strategy differences

10
What about knowledge differences?

Develop tasks that reduce their relevance
Train participants on specific procedures
Measure skill/knowledge differences in another
task and incorporate them in model
Use model to predict variation in relative use of
strategies by way of estimates of individuals'
processing capacities

11
Individual Differences in ACT-R

Most ACT-R models don't account for impact of
individual differences on performance, but the
potential is there
There are many parameters with particular
interpretations related to individual difference
variables
Most ACT-R modelers set parameters to universal
or global values, i.e., defaults or values that
fit aggregate data

12
ACT-R Individual Differences
P1, P2, P3,
M1, M2, M3,
W1, W2, W3,
13
Overview of Talk

Review tasks we are studying
Illustrate methodology
Highlight key results
Visual search vs. memory strategies trade off in
final performance -> complex task modeling offers
best constraint with fine-grained analysis

14
Modified Digit Span (MODS)
15
Modified Digit Span (MODS)
16
P/M Tasks

In our earlier studies, the initial training phase of
the target task was used to collect data on
individuals' perceptual/motor speed.
E.g., time to find object A7 and click on it
In later studies, a separate task was used to measure
perceptual and motor speed.

17
How to predict task performance

Estimate each individual's processing parameters
Measure individuals' performance on MODS,
Perc/Motor
Using models of these tasks, estimate
participants' corresponding architectural
parameters (e.g., working memory capacity,
perceptual/motor speed)
Build/refine model of target task
Select global parameters for model of target task
(e.g., from previously collected data)
Plug into model of target task each individual's
parameters to predict his/her target task
performance

18
W affects Performance

W is the ACT-R parameter for source activation,
which impacts the degree to which activation of
goal-related facts rises above the sea of other
facts' activations
Higher W -> goal-related facts relatively more
activated -> faster and more accurately retrieved
-> better MODS performance
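
A minimal sketch of this chain, assuming the activation, retrieval-probability, and latency equations as they are commonly published for ACT-R (the talk itself does not show them). All numeric values, including the threshold, noise, and latency factor, are illustrative assumptions.

```python
import math


def activation(base_level: float, strengths: list[float], W: float) -> float:
    """A = B + sum_j (W / n) * S_j, with W shared equally over n goal sources."""
    n = len(strengths)
    return base_level + sum((W / n) * s for s in strengths)


def retrieval_probability(A: float, threshold: float = 0.0, noise_s: float = 0.5) -> float:
    """Logistic probability that activation A exceeds the retrieval threshold."""
    return 1.0 / (1.0 + math.exp(-(A - threshold) / noise_s))


def retrieval_latency(A: float, latency_factor: float = 1.0) -> float:
    """Retrieval time shrinks exponentially as activation grows."""
    return latency_factor * math.exp(-A)


# Same fact, same associative strengths; only W differs between two people.
strengths = [1.5, 1.5, 1.5]  # illustrative values
low = activation(0.0, strengths, W=0.7)
high = activation(0.0, strengths, W=1.3)

p_low, p_high = retrieval_probability(low), retrieval_probability(high)
t_low, t_high = retrieval_latency(low), retrieval_latency(high)
```

With these assumptions the higher-W individual retrieves the goal-related fact both more reliably (p_high > p_low) and faster (t_high < t_low).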

19
Estimating W

Model of MODS task is fit to an individual's MODS
performance by varying W
Best-fitting value of W is taken as the estimate
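
One way to picture this fit is a grid search over W that minimizes squared error against the individual's accuracy. The sketch below is an assumption-laden illustration: toy_mods_accuracy is a hypothetical stand-in for the real MODS model, and the "observed" data are generated from it rather than taken from any participant.

```python
import math


def toy_mods_accuracy(W: float, list_lengths=(3, 4, 5, 6)) -> list[float]:
    """Hypothetical stand-in for the MODS model: recall probability per list length."""
    predictions = []
    for n in list_lengths:
        A = W * 2.0 / n  # source activation is spread over n items
        predictions.append(1.0 / (1.0 + math.exp(-(A - 0.3) / 0.2)))
    return predictions


def fit_W(observed: list[float], grid: list[float]) -> float:
    """Return the grid value of W with the smallest sum of squared errors."""
    def sse(W: float) -> float:
        return sum((p - o) ** 2 for p, o in zip(toy_mods_accuracy(W), observed))
    return min(grid, key=sse)


grid = [round(0.5 + 0.05 * i, 2) for i in range(21)]  # 0.50 .. 1.50
observed = toy_mods_accuracy(0.9)  # synthetic "participant" with W = 0.9
best_W = fit_W(observed, grid)
```

Because only W varies, every other model parameter stays at its global value, which is what lets the estimate be carried over to another task.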

20
Estimating PM

For simplicity, we estimated a combined PM
parameter directly from each individual's
perceptual/motor task performance.
This PM parameter was then used to scale the
timing of the target task's perceptual-motor
productions.
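
A rough sketch of the scaling idea follows. It assumes, purely for illustration, that PM is the ratio of an individual's mean time on the perceptual/motor task to a reference mean; the talk does not say how PM was computed. The action names and default times are hypothetical as well.

```python
# Hypothetical default durations (seconds) for perceptual-motor actions.
DEFAULT_ACTION_TIMES = {
    "shift_attention": 0.2,
    "move_mouse": 1.0,
    "click": 0.2,
}


def estimate_pm(individual_mean_rt: float, reference_mean_rt: float) -> float:
    """Assumed definition: PM > 1 means slower than the reference, PM < 1 faster."""
    return individual_mean_rt / reference_mean_rt


def scale_action_times(times: dict[str, float], pm: float) -> dict[str, float]:
    """Scale every perceptual-motor duration by the individual's PM value."""
    return {action: duration * pm for action, duration in times.items()}


pm = estimate_pm(individual_mean_rt=1.8, reference_mean_rt=1.5)  # made-up times
scaled = scale_action_times(DEFAULT_ACTION_TIMES, pm)
```

Under this convention a larger PM means slower actions, which matches the positive correlations between PM and response time reported later in the talk.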

21
Joint Distribution of W and P/M
W and P/M are tapping distinct characteristics
23
Specifics of our Approach

Estimate each individual's processing parameters
Measure individuals' performance on modified
digit span, spatial span, perceptual/motor speed
Using models of these tasks, estimate
participants' W, P, M
Build/refine model of air traffic control
task (AMBR)
Select global parameters for AMBR model
Plug in individuals' parameters to predict
performance across different AMBR scenarios

24
AMBR Air Traffic Control Task

Complex and dynamic task
Spatial and verbal aspects
Multi-tasking
Testbed for cognitive modeling architectures

25
AMBR Task (AC = aircraft, ATC = air traffic controller)

As ATC, you communicate with AC and other ATC to
handle all AC in your airspace
Six commands with different triggers
First ACCEPT, then WELCOME incoming AC (these two
separated by short interval)
First TRANSFER, then order a CONTACT message from
outgoing AC (these two separated by short
interval)
Decide to OK or REJECT requests for speed
increase
When a command is not handled before AC reaches
zone boundary, this is a HOLD (error)

26
Issuing an AMBR Command

Text message or radar cues particular action
Click on Command Button
Click on Aircraft (in radar screen)
Click on Air Traffic Controller (if necessary)
Click on SEND Button

29
General Methods

Empirical Methods
Day 1: Collect MODS and P/M data and train on
AMBR plus AMBR practice
Day 2: Review AMBR instructions, battery of AMBR
scenarios
Modeling Methods
Use MODS & P/M data to estimate W and PM for each
subject
Plug individual W and PM values into AMBR model
Compare individuals' AMBR performance with model
predictions

30
Experiments 1 & 2

AMBR Scenario Design
Experiment 1: alternating 5 easy, 5 hard
Experiment 2: 9 scenarios of varying difficulty
AMBR Dependent Measures
Total time to handle each command
Number of hold errors

31
Off-the-shelf ACT-R Model of AMBR

Scan for something to do: Radar, Left, Right,
Bottom text windows
When an action cue is noticed, determine if it
has been handled or not (scan/remember)
If the cue has not been handled, click command,
AC, ATC, SEND
Resume scanning
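
The control loop above can be sketched as follows. This is a toy illustration, not the ACT-R model itself: the cue strings are hypothetical, and a plain set stands in for the scan/remember step that decides whether a cue was already handled.

```python
SCAN_ORDER = ("radar", "left", "right", "bottom")
CLICK_SEQUENCE = ("command", "AC", "ATC", "SEND")


def scan_and_handle(display: dict[str, list[str]], handled: set[str]) -> list[tuple[str, str]]:
    """Make one pass over the four regions and return the clicks issued."""
    clicks = []
    for region in SCAN_ORDER:
        for cue in display.get(region, []):
            if cue in handled:  # already dealt with: resume scanning
                continue
            # Unhandled cue: click command, AC, ATC, SEND in that order.
            clicks.extend((cue, step) for step in CLICK_SEQUENCE)
            handled.add(cue)
    return clicks


# Hypothetical display state: one new cue and one that was handled earlier.
display = {"left": ["ACCEPT T6"], "bottom": ["SPEED K2"]}
handled = {"SPEED K2"}
clicks = scan_and_handle(display, handled)
second_pass = scan_and_handle(display, handled)
```

In the actual model the handled-or-not decision is where memory retrieval and visual search compete, which is why that step matters for individual differences.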

32
Model Captures Range of Performance
33
Model Predictions

Prediction of whether a subject commits an error
in a scenario, based on scenario details and the
individual's W and P/M

34
Individual Differences' Impact on Hold Errors

Hold errors only weakly dependent on W, more
strongly on P/M and scenario difficulty

(Figure: Hold Errors vs. Parameter Value)
35
Scenario Difficulty
(Figure; x-axis: Scenario)
36
Mean Errors by Scenario
(Figure; x-axis: Scenario)
37
Be Careful What (DM) you Model

Error data too coarse to constrain model
Even total RT/command data insufficient
Model predicts that scanning strategy plays a
large role in performance.
This is consistent with reports from participants,
who may be using any combination of visual search
and memory retrieval

38
Observable Behaviors

Subject
T = 0.0  Cue: Accept T6?
T = 3.6  ACCEPT button
T = 5.9  AC T6
T = 6.7  ATC EAST
T = 7.7  SEND button

Model
T = 0.0  Cue: Accept T6?
T = 3.7  ACCEPT button
T = 5.7  AC T6
T = 7.0  ATC EAST
T = 8.2  SEND button

Stochastic variation on the single-action level
is part of subject and model behavior
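
As a small worked example, the two traces on this slide can be compared click by click rather than by total time. The timestamps below are copied from the slide; the helper function is just an illustrative sketch.

```python
# (event, time in seconds) pairs taken from the slide.
subject = [("cue", 0.0), ("ACCEPT button", 3.6), ("AC T6", 5.9),
           ("ATC EAST", 6.7), ("SEND button", 7.7)]
model = [("cue", 0.0), ("ACCEPT button", 3.7), ("AC T6", 5.7),
         ("ATC EAST", 7.0), ("SEND button", 8.2)]


def intervals(events):
    """Time elapsed before each action since the previous event, in seconds."""
    return [round(t - prev_t, 1)
            for (_, prev_t), (_, t) in zip(events, events[1:])]


subject_intervals = intervals(subject)  # [3.6, 2.3, 0.8, 1.0]
model_intervals = intervals(model)      # [3.7, 2.0, 1.3, 1.2]

# Model minus subject, per click.
differences = [round(m - s, 1) for m, s in zip(model_intervals, subject_intervals)]
```

The totals differ by only 0.5 s (8.2 vs. 7.7), yet the per-click differences are 0.1, -0.3, 0.5, and 0.2 s, so most of the gap sits in the ATC click.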
39
The Details Are Inside

Model I/O
T = 0.0  Cue: Accept T6?
T = 3.7  ACCEPT button
T = 5.7  AC T6
T = 7.0  ATC EAST
T = 8.2  SEND button

Model Trace
T = 1.5  Notice cue
T = 2.5  Subgoal task
T = 3.7  Mouse click
T = 3.8  Start AC search
T = 4.9  Find AC
T = 5.7  Mouse click
T = 7.0  Mouse click
T = 8.2  Mouse click

40
Conclusion thus far

Visual search vs. memory strategies trade off in
final performance -> even when modeling a complex
task, coarse dependent measures (accuracy, total
RT) hide important details
Previous AMBR model fit group data well
Only by seeking extra constraint of modeling
individual participants were important gaps in
model fidelity revealed

41
Modifications for Experiment 3

Use more fine-grained measures: Action RT,
Clicks
Modify the ATC task to increase memory demand
More interesting for our purposes
More realistic
Lengthen scenarios so the same planes stay in
play
Hide AC names until click, then show only after a delay
Use model to bracket appropriate difficulty level

42
Raw Characteristics of Data

Experiment 3
Action RT: 12.1 sec; Holds: 3.3 / subject
Action RT correlates with W (r = -0.314) and PM
(r = 0.485)
Holds correlate with W (r = -0.444) and PM (r =
0.508)
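
For readers who want to reproduce this kind of summary, here is a sketch of the computation, assuming the r values are Pearson correlations across subjects. The five data points are invented for illustration and are not the experiment's data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented per-subject values: estimated W and mean action RT in seconds.
w_estimates = [0.8, 0.9, 1.0, 1.1, 1.2]
action_rts = [13.5, 12.0, 12.8, 11.1, 11.6]

r_w_rt = pearson_r(w_estimates, action_rts)
```

A negative r here means that subjects with higher W tended to respond faster, the same direction as the W correlations reported on this slide.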

43
Model Modifications

Search can give not only the answer sought (a
specific AC's location) but also an additional
rehearsal of that information
In slack times, possible strategy of studying
radar screen to rehearse AC names (called
exploratory clicks)

44
Model Predicts Hold Errors

Predicts errors per subject, r = 0.81
Hold errors depend more on W (compared to
previous version of task) but still mostly
dependent on PM and scenario difficulty
Move to modeling more fine-grained aspects of
data

45
Model Predicts Number of Clicks
47
W, P/M affect RT click by click

Set W-P/M parameters in model corresponding to
participants (e.g., hi-hi, lo-lo)
Run model to produce RT predictions click by
click (for 2 commands: Accept and Contact)

(Figure panels: Hi-Hi model vs. subject; Lo-Lo model vs. subject)
48
W, P/M affect RT click by click

Set W-P/M parameters in model corresponding to
participants
Run model to produce RT predictions click by
click (for 2 commands: Accept and Contact)

49
Conclusion thus far

Modeling more fine-grained measures required task
and model modifications, but this produced
individual participant predictions that were very
promising.
Clicking on correct AC the first time ranges from
69% to 96%
Akin to remember vs. scan strategies
Higher number -> more (accurate) remembering
This detailed aspect of performance relates to W

50
Theoretical Interlude: Spatial vs. Verbal WM

Our working assumption (parsimoniously) posits a
single source activation parameter, W
W modulates the degree to which goal-relevant
facts are activated above the sea of unrelated
facts
regardless of spatial/verbal representation
This perspective still allows for spatial/verbal
distinctions in performance but explains them as
a function of differences in spatial/verbal
skills etc.

51
Opportunity to Test in Current Work

AMBR task has spatial and verbal aspects
Included verbal and spatial working memory tasks
in battery, starting with Experiment 3
Which span task produces W estimates that best
predict individuals' AMBR performance?
Spatial Span task from Miyake and Shah (1996)

(Figure: example Spatial Span stimuli, the letter R shown as normal, normal, reversed)
52
Opportunity to Test in Current Work

Result
Experiments 3 & 4: Spatial Span-based W predicts
AMBR performance better than MODS-based W
Possible explanations
Spatial format more relevant for this task?
Spatial Span shows more variability -> more
sensitive?
Spatial Span variability taps other sources of
variation?
Are there separate Ws for verbal and spatial WM?

54
Spatial Span taps speed as well

Another study, spawned by this issue, shows a
relationship between individuals' mental rotation
speed and Spatial Span
Pattern of correlations with PM:
MODS r = .25; Spatial Span r = .65
Pattern of correlations with AMBR components

(Table not recovered; surviving labels: Mem, Mouse)
55
Theoretical Interlude Conclusion

Studying verbal vs. spatial memory resources in
context of AMBR task moves theoretical debate to
more realistic arena
This complements work with laboratory tasks and
allows greater potential for generalization of
results

56
Strategic Variation Emerges

Experiment 4 also revealed several sources of
strategic variation, explored further in
Experiment 5
Waiting for AC name ranges from 42% to 100%
May reflect lack of confidence in memory, utility
of checking one's memory
Somewhat negatively correlated with W
Initiating welcome and contact commands in
anticipation of text cue (ranges from 0% to 100%)
Making exploratory clicks on ACs during slack
time (ranges from never to > 5 per scenario)

57
Experiment 5 Details

Scenarios designed to have low (6 ACs) vs. high
memory load (total 12 ACs)
Speed requests most common command
Most interesting for model predictions
Least susceptible to snowball effects
Dependent measures include RTs for individual
clicks and strategy use as a function of scenario
difficulty and command

58
Modeling Specific AMBR Components
Hard Scenarios
Accuracy of first AC click
Easy Scenarios
Accuracy of first AC click
59
Modeling Specific AMBR Components
Hard Scenarios
RT to Correct AC click
Easy Scenarios
RT to Correct AC click
60
Model Predictions Match Data

Main effects of scenario difficulty amplified for
low W individuals
Main effects of command type (more/less
memory-demanding) amplified for low W
Wait-for-AC-name strategy varied as a function of
command type
Exploratory clicks strategy varied as a function
of scenario difficulty

61
Summary of Conclusions

Complex tasks are not a modeling panacea! Only
by seeking extra constraint of modeling
individual participants were important gaps in
the model's fidelity revealed.
Studying verbal vs. spatial memory resources in
context of AMBR task moves theoretical debate to
more realistic arena.
Variability in performance -- from different use
of strategies and/or from differences in
processing capacities -- is there for the
looking. Studying performance on average offers
incomplete understanding.

63
Features of Our Approach

Our approach aims to jointly provide
Predictions that are accurate and detailed
At the individual participant level
Generated in real time (or faster)
Based on an interpretable model with variation in
meaningful individual difference parameters
That generalize to variants of the target task
