Predicting and Explaining Individual Performance in Complex Tasks

Learn more at: http://act-r.psy.cmu.edu

1
Predicting and Explaining Individual Performance
in Complex Tasks
  • Marsha Lovett, Lynne Reder, Christian Lebiere,
    John Rehling, Baris Demiral

This project is sponsored by the Department of
the Navy, Office of Naval Research
2
Multi-Tasking
  • A single person can perform multiple tasks.
  • A single model should be able to capture
    performance on those multiple tasks.
  • A single person brings to bear the same
    fundamental processing capacities to perform all
    those tasks.
  • A single model should be able to predict that
    person's performance across tasks from his/her
    capacities.

3
  • A way to keep the multiple-constraint advantage
    offered by unified theories of cognition while
    making their development tractable is to do
    Individual Data Modeling. That is, to gather a
    large number of empirical/experimental
    observations on a single subject (or a few
    subjects analysed individually) using a variety
    of tasks that exercise multiple abilities (e.g.,
    perception, memory, problem solving), and then to
    use these data to develop a detailed
    computational model of the subject that is able
    to learn while performing the tasks.

Gobet & Ritter, 2000
4
  • ZERO
  • PARAMETER
  • PREDICTIONS!

5
Basic Goals of Project
  • Combine best features of cognitive modeling
  • Study performance in a dynamic, multi-tasking
    situation (albeit less complex than real world)
  • Explain not only aggregate behavior but variation
    (using individual difference variables)
  • Predict (not fit/postdict) complex performance
  • Use cognitive architecture and fixed parameters
  • Employ off-the-shelf models whenever possible
  • Plug in individual difference params for each
    person

6
How to predict task performance
  • Estimate each individual's processing parameters
  • Measure individuals' performance on standard
    tasks
  • Using models of these tasks, estimate
    participants' corresponding architectural
    parameters (e.g., working memory capacity,
    perceptual/motor speed)
  • Build/refine model of target task
  • Select global parameters for model of target task
    (e.g., from previously collected data)
  • Plug into model of target task each individual's
    parameters to predict his/her target task
    performance
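
The recipe above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the project's ACT-R code: `IndividualParams`, `estimate_params`, `toy_target_model`, and every numeric constant are hypothetical stand-ins. What it is meant to show is that target-task data never enter the estimation step, so nothing is fit to the performance being predicted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndividualParams:
    """Architectural parameters estimated for one participant."""
    w: float   # working memory capacity (source activation)
    pm: float  # perceptual/motor time multiplier


def estimate_params(standard_scores):
    """Hypothetical estimator mapping standard-task scores to parameters.

    In the talk this step fits models of the standard tasks; a simple
    arithmetic mapping stands in for that fitting here.
    """
    w = 0.5 + standard_scores["span_accuracy"]   # assumed mapping
    pm = standard_scores["click_time"] / 1.0     # assumed 1.0 s reference time
    return IndividualParams(w=w, pm=pm)


def toy_target_model(params, global_params):
    """Hypothetical target-task model returning a predicted response time."""
    retrieval = global_params["retrieval_base"] / params.w
    motor = global_params["motor_base"] * params.pm
    return retrieval + motor


def predict_target(standard_scores, global_params):
    """Estimate from standard tasks only, then plug into the target model."""
    params = estimate_params(standard_scores)
    return toy_target_model(params, global_params)


# Global parameters are fixed once and shared by every participant.
global_params = {"retrieval_base": 2.0, "motor_base": 4.0}
fast = predict_target({"span_accuracy": 0.5, "click_time": 1.0}, global_params)
slow = predict_target({"span_accuracy": 0.3, "click_time": 1.5}, global_params)
```

The two calls differ only in the standard-task scores, so any difference between `fast` and `slow` comes entirely from the individual difference parameters.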

7
Example Memory Task Performance
  • Fit task A to estimate each individual's parameters

8
Zero-Parameter Predictions
  • Plug those parameters into model of task B

(Lovett, Daily, & Reder, 2000)
9
Challenges of Complex Tasks
  • Modeling the target task is harder
  • More than one individual difference variable
    likely impacting target task
  • Possibility of knowledge/strategy differences

10
What about knowledge differences?
  • Develop tasks that reduce their relevance
  • Train participants on specific procedures
  • Measure skill/knowledge differences in another
    task and incorporate them in model
  • Use model to predict variation in relative use of
    strategies by way of estimates of individuals'
    processing capacities

11
Individual Differences in ACT-R
  • Most ACT-R models don't account for impact of
    individual differences on performance, but the
    potential is there
  • There are many parameters with particular
    interpretations related to individual difference
    variables
  • Most ACT-R modelers set parameters to universal
    or global values, i.e., defaults or values that
    fit aggregate data

12
ACT-R Individual Differences
P1, P2, P3,
M1, M2, M3,
W1, W2, W3,
13
Overview of Talk
  • Review tasks we are studying
  • Illustrate methodology
  • Highlight key results
  • Visual search vs. memory strategies trade off in
    final performance -> complex task modeling offers
    best constraint with fine-grained analysis

14
Modified Digit Span (MODS)
15
Modified Digit Span (MODS)
16
P/M Tasks
  • In our earlier studies, initial training phase of
    target task was used to collect data on
    individuals' perceptual/motor speed.
      • e.g., time to find object A7 and click on it
  • In later studies, separate task used to measure
    perceptual and motor speed.

17
How to predict task performance
  • Estimate each individual's processing parameters
  • Measure individuals' performance on MODS,
    Perc/Motor
  • Using models of these tasks, estimate
    participants' corresponding architectural
    parameters (e.g., working memory capacity,
    perceptual/motor speed)
  • Build/refine model of target task
  • Select global parameters for model of target task
    (e.g., from previously collected data)
  • Plug into model of target task each individual's
    parameters to predict his/her target task
    performance

18
W affects Performance
  • W is the ACT-R parameter for source activation,
    which impacts the degree to which activation of
    goal-related facts rises above the sea of other
    facts' activations
  • Higher W -> goal-related facts relatively more
    activated -> faster and more accurately retrieved
    -> better MODS performance
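
As a rough illustration of why W matters, the sketch below uses the standard ACT-R activation equation, A_i = B_i + sum_j (W/n) * S_ji, in which W is divided evenly among the n elements of the current goal. The base levels and association strengths are made-up numbers and the function names are hypothetical; noise and retrieval thresholds are left out.

```python
def activation(base_level, strengths, w):
    """ACT-R activation: base level plus source activation spread from the goal.

    `strengths` holds the association strengths S_ji from each of the n goal
    elements to this fact; each goal element carries W/n of source activation.
    """
    if not strengths:
        return base_level
    share = w / len(strengths)
    return base_level + sum(share * s for s in strengths)


def goal_advantage(w):
    """Activation gap between a goal-related fact and an unrelated one."""
    related = activation(0.0, [2.0, 1.5], w)    # illustrative strengths
    unrelated = activation(0.0, [0.0, 0.0], w)  # no association with the goal
    return related - unrelated


low_gap = goal_advantage(0.7)   # lower-W individual
high_gap = goal_advantage(1.0)  # higher-W individual
```

With these numbers the gap grows in proportion to W, which is the sense in which a higher W lifts goal-related facts further above the rest.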

19
Estimating W
  • Model of MODS task is fit to each individual's MODS
    performance by varying W
  • Best fitting value of W is taken as estimate
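
One simple way to carry out such a fit is a grid search over candidate W values, keeping the one with the smallest squared error. The sketch below assumes that approach; the slides do not say which search procedure was used. `toy_mods_accuracy` is a hypothetical stand-in for the real ACT-R model of MODS, and the "observed" data are synthetic.

```python
import math


def toy_mods_accuracy(w, span_length):
    """Hypothetical stand-in for the MODS model: recall accuracy rises
    with W and falls with span length. Not the actual ACT-R model."""
    return 1.0 / (1.0 + math.exp(-(4.0 * w - 0.8 * span_length)))


def fit_w(observed, candidates):
    """Return the candidate W whose predictions best match the observed
    accuracies (dict: span length -> proportion correct)."""
    def sum_squared_error(w):
        return sum((toy_mods_accuracy(w, n) - p) ** 2
                   for n, p in observed.items())
    return min(candidates, key=sum_squared_error)


# Candidate grid from 0.50 to 1.50 in steps of 0.05 (an assumed range).
candidates = [round(0.5 + 0.05 * i, 2) for i in range(21)]

# Synthetic participant generated from the toy model with W = 1.10,
# so the fit should recover that value.
observed = {n: toy_mods_accuracy(1.10, n) for n in (3, 4, 5, 6)}
best_w = fit_w(observed, candidates)
```

Only W varies during the search; everything else in the model stays at its global value, which is what allows the best-fitting W to be read as a property of the individual.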

20
Estimating PM
  • For simplicity, we estimated a combined PM
    parameter directly from each individual's
    perceptual/motor task performance.
  • This PM parameter was then used to scale the
    timing of the target task's perceptual-motor
    productions.
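
A minimal sketch of this scaling step follows. It assumes PM is expressed as a ratio of the participant's mean perceptual/motor task time to a reference time; the slides do not give the exact formula, and the default durations and function names are made up for illustration.

```python
def estimate_pm(individual_times, reference_time):
    """Ratio of a participant's mean P/M task time to a reference time.

    Values above 1 mean slower than the reference (assumed convention).
    """
    mean_time = sum(individual_times) / len(individual_times)
    return mean_time / reference_time


def scale_durations(default_durations, pm):
    """Multiply each perceptual-motor action time by the PM factor."""
    return {action: seconds * pm
            for action, seconds in default_durations.items()}


# Made-up default timings (seconds) for perceptual-motor actions only.
defaults = {"find_target": 1.0, "move_mouse": 0.8, "click": 0.2}
pm = estimate_pm([1.5, 1.4, 1.6], reference_time=1.25)
scaled = scale_durations(defaults, pm)
```

Only perceptual-motor timings are scaled here; cognitive steps such as memory retrieval would stay under the control of W.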

21
Joint Distribution of W and P/M
W and P/M are tapping distinct characteristics
22
ACT-R Individual Differences
P1, P2, P3,
M1, M2, M3,
W1, W2, W3,
23
Specifics of our Approach
  • Estimate each individual's processing parameters
  • Measure individuals' performance on modified
    digit span, spatial span, perceptual/motor speed
  • Using models of these tasks, estimate
    participants' W, P, M
  • Build/refine model of air traffic control
    task (AMBR)
  • Select global parameters for AMBR model
  • Plug in individuals' parameters to predict
    performance across different AMBR scenarios

24
AMBR Air Traffic Control Task
  • Complex and dynamic task
  • Spatial and verbal aspects
  • Multi-tasking
  • Testbed for cognitive modeling architectures

25
AMBR Task (AC = aircraft, ATC = air traffic controller)
  • As ATC, you communicate with AC and other ATC to
    handle all AC in your airspace
  • Six commands with different triggers
  • First ACCEPT, then WELCOME incoming AC (these two
    separated by short interval)
  • First TRANSFER, then order a CONTACT message from
    outgoing AC (these two separated by short
    interval)
  • Decide to OK or REJECT requests for speed
    increase
  • When a command is not handled before AC reaches
    zone boundary, this is a HOLD (error)

26
Issuing an AMBR Command
  • Text message or radar cues particular action
  • Click on Command Button
  • Click on Aircraft (in radar screen)
  • Click on Air Traffic Controller (if necessary)
  • Click on SEND Button

27
(No Transcript)
28
(No Transcript)
29
General Methods
  • Empirical Methods
      • Day 1: Collect MODS and P/M data and train on
        AMBR plus AMBR practice
      • Day 2: Review AMBR instructions, battery of AMBR
        scenarios
  • Modeling Methods
      • Use MODS and P/M data to estimate W and PM for
        each subject
      • Plug individual W and PM values into AMBR model
      • Compare individuals' AMBR performance with model
        predictions

30
Experiments 1 & 2
  • AMBR Scenario Design
      • Experiment 1: alternating 5 easy, 5 hard
      • Experiment 2: 9 scenarios of varying difficulty
  • AMBR Dependent Measures
      • Total time to handle each command
      • Number of hold errors

31
Off-the-shelf ACT-R Model of AMBR
  • Scan for something to do: Radar, Left, Right,
    Bottom text windows
  • When an action cue is noticed, determine whether
    it has been handled or not (scan/remember)
  • If the cue has not been handled, click command,
    AC, ATC, SEND
  • Resume scanning
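
The control loop described above can be sketched as follows. This is a plain-Python caricature, not the ACT-R production system: the "handled" check is reduced to a set lookup (standing in for the scan/remember decision), every command gets the full four-click sequence, and all names and cue strings are hypothetical.

```python
SCAN_ORDER = ["radar", "left", "right", "bottom"]   # windows, in scan order
CLICK_SEQUENCE = ["command", "AC", "ATC", "SEND"]   # clicks to issue a command


def run_scan_cycle(cues_by_region, handled):
    """Make one pass over the windows and act on every unhandled cue.

    Returns the list of (cue, click_target) pairs issued on this pass and
    records each acted-on cue in `handled`.
    """
    clicks = []
    for region in SCAN_ORDER:
        for cue in cues_by_region.get(region, []):
            if cue in handled:        # already dealt with: keep scanning
                continue
            clicks.extend((cue, target) for target in CLICK_SEQUENCE)
            handled.add(cue)
    return clicks


cues = {"left": ["accept T6"], "radar": ["transfer K2"]}
handled = {"transfer K2"}               # this one was handled earlier
first_pass = run_scan_cycle(cues, handled)
second_pass = run_scan_cycle(cues, handled)
```

On the second pass nothing is left to do, so the loop falls through every window and would simply resume scanning.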

32
Model Captures Range of Performance
33
Model Predictions
  • Prediction of whether a subject commits an error
    in a scenario, based on scenario details and
    the individual's W and P/M

34
Individual Differences' Impact on Hold Errors
  • Hold errors only weakly dependent on W, more
    strongly on P/M and scenario difficulty

(Figure: Hold Errors plotted against Parameter Value)
35
Scenario Difficulty
(Figure; axis label: Scenario)
36
Mean Errors by Scenario
(Figure; axis label: Scenario)
37
Be Careful What (DM) you Model
  • Error data too coarse to constrain model
  • Even total RT/command data insufficient
  • Model predicts that scanning strategy plays a
    large role in performance.
  • This is consistent with reports from participants,
    who may be using any combination of visual search
    and memory retrieval

38
Observable Behaviors
  • Subject
      • T = 0.0: Cue "Accept T6?"
      • T = 3.6: ACCEPT button
      • T = 5.9: AC T6
      • T = 6.7: ATC EAST
      • T = 7.7: SEND button
  • Model
      • T = 0.0: Cue "Accept T6?"
      • T = 3.7: ACCEPT button
      • T = 5.7: AC T6
      • T = 7.0: ATC EAST
      • T = 8.2: SEND button

Stochastic variation on the single-action level
is part of subject and model behavior
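
To make the comparison concrete, the sketch below takes the subject and model click times listed on this slide and computes the time spent on each action plus an overall timeline discrepancy. The helper names and the choice of mean absolute deviation as the summary measure are assumptions for illustration, not the measures used in the talk.

```python
CLICKS = ["ACCEPT button", "AC T6", "ATC EAST", "SEND button"]
subject_times = [3.6, 5.9, 6.7, 7.7]  # seconds after the cue, from the slide
model_times = [3.7, 5.7, 7.0, 8.2]    # seconds after the cue, from the slide


def intervals(times, start=0.0):
    """Time spent on each action: gap from the previous event (cue at T = 0)."""
    previous = [start] + times[:-1]
    return [t - p for t, p in zip(times, previous)]


def mean_abs_deviation(xs, ys):
    """Average absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


subject_rt = intervals(subject_times)
model_rt = intervals(model_times)
timeline_error = mean_abs_deviation(subject_times, model_times)

for name, s, m in zip(CLICKS, subject_rt, model_rt):
    print(f"{name:14s} subject {s:.1f} s   model {m:.1f} s")
```

Looking at per-action intervals rather than the total time to SEND is what exposes where the model and the subject diverge within a single command.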
39
The Details Are Inside
  • Model I/O
      • T = 0.0: Cue "Accept T6?"
      • T = 3.7: ACCEPT button
      • T = 5.7: AC T6
      • T = 7.0: ATC EAST
      • T = 8.2: SEND button
  • Model Trace
      • T = 1.5: Notice cue
      • T = 2.5: Subgoal task
      • T = 3.7: Mouse click
      • T = 3.8: Start AC search
      • T = 4.9: Find AC
      • T = 5.7: Mouse click
      • T = 7.0: Mouse click
      • T = 8.2: Mouse click

40
Conclusion thus far
  • Visual search vs. memory strategies trade off in
    final performance -> even when modeling a complex
    task, coarse dependent measures (accuracy, total
    RT) hide important details
  • Previous AMBR model fit group data well
  • Only by seeking the extra constraint of modeling
    individual participants were important gaps in
    model fidelity revealed

41
Modifications for Experiment 3
  • Use more fine-grained measures: Action RT and
    Clicks
  • Modify the ATC task to increase memory demand
      • More interesting for our purposes
      • More realistic
      • Lengthen scenarios so the same planes stay in
        play
      • Hide AC names until click, then show them only
        after a delay
  • Use model to bracket appropriate difficulty level

42
Raw Characteristics of Data
  • Experiment 3
      • Action RT: 12.1 sec; Holds: 3.3 / subject
      • Action RT correlates with W (r = -0.314) and PM
        (r = 0.485)
      • Holds correlate with W (r = -0.444) and PM
        (r = 0.508)
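
For readers who want to see how such coefficients are obtained, here is a minimal Pearson correlation in plain Python. The W estimates and Action RTs in the example are invented for illustration and are not the Experiment 3 data, so the resulting value should not be compared with the coefficients reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-participant values, NOT the Experiment 3 participants.
w_estimates = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
action_rt = [13.5, 12.0, 12.8, 11.9, 12.4, 11.0]  # seconds
r_w_rt = pearson_r(w_estimates, action_rt)
```

A negative `r_w_rt` has the same reading as the negative W correlations on the slide: participants with higher W tend to have shorter action times.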

43
Model Modifications
  • Search can give not only the answer sought (a
    specific AC's location) but also an additional
    rehearsal of that information
  • In slack times, possible strategy of studying
    radar screen to rehearse AC names (called
    exploratory clicks)

44
Model Predicts Hold Errors
  • Predicts errors per subject, r = 0.81
  • Hold errors depend more on W (compared to
    previous version of task) but still mostly
    dependent on PM and scenario difficulty
  • Move to modeling more fine-grained aspects of
    data

45
Model Predicts Number of Clicks
46
(No Transcript)
47
W, P/M affect RT click by click
  • Set W-P/M parameters in model corresponding to
    participants' (e.g., hi-hi, lo-lo)
  • Run model to produce RT predictions click by
    click (for 2 commands: Accept and Contact)

(Figure panels: Hi-Hi and Lo-Lo, each comparing Model and Subject)
48
W, P/M affect RT click by click
  • Set W-P/M parameters in model corresponding to
    participants'
  • Run model to produce RT predictions click by
    click (for 2 commands: Accept and Contact)
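
One plausible way to form groups such as hi-hi and lo-lo is a median split on each parameter, sketched below. The slides do not say how the groups were defined, so the median split, the assumption that a higher P/M score means faster performance, and the participant values are all illustrative.

```python
from statistics import median


def median_split_groups(participants):
    """Label each participant hi/lo on W and on P/M by median split.

    `participants` maps an id to a (w, pm_score) pair; the label has the
    form "<W level>-<P/M level>", e.g. "hi-lo".
    """
    w_cut = median(w for w, _ in participants.values())
    pm_cut = median(pm for _, pm in participants.values())
    labels = {}
    for pid, (w, pm) in participants.items():
        w_label = "hi" if w > w_cut else "lo"
        pm_label = "hi" if pm > pm_cut else "lo"
        labels[pid] = f"{w_label}-{pm_label}"
    return labels


# Hypothetical participants: id -> (W estimate, P/M score).
participants = {
    "s1": (1.2, 1.3),
    "s2": (0.8, 0.7),
    "s3": (1.1, 0.8),
    "s4": (0.7, 1.2),
}
groups = median_split_groups(participants)
```

Each group's parameter values would then be plugged into the model to generate the click-by-click RT predictions for comparison with the matching participants.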

49
Conclusion thus far
  • Modeling more fine-grained measures required task
    and model modifications, but this produced
    individual participant predictions that were very
    promising.
  • Clicking on correct AC the first time ranges from
    69% to 96%
      • Akin to remember vs. scan strategies
      • Higher number -> more (accurate) remembering
      • This detailed aspect of performance relates to W

50
Theoretical Interlude: Spatial vs. Verbal WM
  • Our working assumption (parsimoniously) posits a
    single source activation parameter, W
  • W modulates the degree to which goal-relevant
    facts are activated above the sea of unrelated
    facts, regardless of spatial/verbal representation
  • This perspective still allows for spatial/verbal
    distinctions in performance but explains them as
    a function of differences in spatial/verbal
    skills etc.

51
Opportunity to Test in Current Work
  • AMBR task has spatial and verbal aspects
  • Included verbal and spatial working memory tasks
    in battery, starting with Experiment 3
  • Which span task produces W estimates that best
    predict individuals' AMBR performance?
  • Spatial Span task from Miyake and Shah (1996)

(Figure: Spatial Span stimuli, three instances of the letter R labeled normal, normal, reversed)
52
Opportunity to Test in Current Work
  • Result
      • Experiments 3 & 4: Spatial Span-based W predicts
        AMBR performance better than MODS-based W
  • Possible explanations
      • Spatial format more relevant for this task?
      • Spatial Span shows more variability -> more
        sensitive?
      • Spatial Span variability taps other sources of
        variation?
      • Are there separate Ws for verbal and spatial WM?

54
Spatial Span taps speed as well
  • Another study, spawned by this issue, shows
    relationship between individuals' mental rotation
    speed and Spatial Span
  • Pattern of correlations with PM
      • MODS r = .25, Spatial Span r = .65
  • Pattern of correlations with AMBR components

(Figure: correlations with AMBR components; surviving labels: Mem, Mouse)
55
Theoretical Interlude Conclusion
  • Studying verbal vs. spatial memory resources in
    context of AMBR task moves theoretical debate to
    more realistic arena
  • This complements work with laboratory tasks and
    allows greater potential for generalization of
    results

56
Strategic Variation Emerges
  • Experiment 4 also revealed several sources of
    strategic variation, explored further in
    Experiment 5
  • Waiting for AC name ranges from 42% to 100%
      • May reflect lack of confidence in memory, utility
        of checking one's memory
      • Somewhat negatively correlated with W
  • Initiating welcome and contact commands in
    anticipation of text cue (ranges from 0% to 100%)
  • Making exploratory clicks on ACs during slack
    time (ranges from never to > 5 per scenario)

57
Experiment 5 Details
  • Scenarios designed to have low (6 ACs total) vs.
    high (12 ACs total) memory load
  • Speed requests most common command
      • Most interesting for model predictions
      • Least susceptible to snowball effects
  • Dependent measures include RTs for individual
    clicks and strategy use as a function of scenario
    difficulty and command

58
Modeling Specific AMBR Components
(Figure panels: Hard Scenarios and Easy Scenarios, each showing accuracy of first AC click)
59
Modeling Specific AMBR Components
(Figure panels: Hard Scenarios and Easy Scenarios, each showing RT to correct AC click)
60
Model Predictions Match Data
  • Main effects of scenario difficulty amplified for
    low-W individuals
  • Main effects of command type (more/less
    memory-demanding) amplified for low-W individuals
  • Wait-for-AC-name strategy varied as a function of
    command type
  • Exploratory clicks strategy varied as a function
    of scenario difficulty

61
Summary of Conclusions
  • Complex tasks are not a modeling panacea! Only
    by seeking the extra constraint of modeling
    individual participants were important gaps in
    the model's fidelity revealed.
  • Studying verbal vs. spatial memory resources in
    context of AMBR task moves theoretical debate to
    more realistic arena.
  • Variability in performance -- from different use
    of strategies and/or from differences in
    processing capacities -- is there for the
    looking. Studying performance on average offers
    incomplete understanding.

62
(No Transcript)
63
Features of Our Approach
  • Our approach aims to jointly provide:
      • Predictions that are accurate and detailed
      • At the individual participant level
      • Generated in real time (or faster)
      • Based on an interpretable model with variation in
        meaningful individual difference parameters
      • That generalize to variants of the target task

64
Joint Distribution of W and P/M
W and P/M are tapping distinct characteristics