How a Modeler - PowerPoint PPT Presentation

Provided by: uclic

Transcript and Presenter's Notes

1
How a Modeler's Conception of Rewards Influences
a Model's Behavior
  • Investigating ACT-R 6's utility learning mechanism
  • Christian P. Janssen
  • Wayne D. Gray
  • Michael J. Schoelles

2
Temporal difference learning & ACT-R
  • Temporal difference learning has recently been
    introduced as ACT-R's new utility learning
    mechanism (e.g., Fu & Anderson, 2004; Anderson,
    2006, 2007; Bothell, 2005)
  • Utility learning optimizes behavior so as to
    maximize the rewards that the model receives
  • A model can
  • Receive rewards at different moments in time
  • Receive rewards of different magnitudes
  • There are no guidelines for choosing when a
    reward should be given and what its magnitude
    should be

3
New issues for ACT-R
  • We studied two aspects of TD learning
  • When is reward given
  • Magnitude of the reward
  • This is a new issue for ACT-R
  • When the reward is given could be varied in
    ACT-R 5
  • Magnitude of the reward could not be varied in
    ACT-R 5
  • As we will show, the modeler's conception of
    rewards has a big influence on a model's behavior
  • Case study: Blocks World task (Gray et al., 2006)

4
Why the Blocks World task?
  • Previous work indicates that the utility learning
    mechanism is crucial for this task
  • ACT-R 5 models (Gray, Schoelles, & Sims, 2005)
  • Regular ACT-R 5 cannot provide a good fit to the
    human data
  • Because rewards in ACT-R 5 are binary (i.e.,
    successes and failures) and not scalar
  • Ideal Performer Model (Gray et al., 2006)
  • A model outside of ACT-R that uses temporal
    difference learning provided a very good fit
    (Gray et al., 2006)

5
Blocks World task
  • So what's the task?

6
Blocks World task
Task: Copy the pattern in the target window by moving
blocks from the resource window to the workspace window
7
Blocks World task
Windows are covered with gray rectangles. Accessing
information requires interaction with the
interface
8
Blocks World task
Windows are covered with gray rectangles. Accessing
information requires interaction with the
interface
9
Blocks World task
Windows are covered with gray rectangles. Accessing
information requires interaction with the
interface
10
Blocks World task
Windows are covered with gray rectangles. Accessing
information requires interaction with the
interface
11
Blocks World task
  • Blocks world task
  • Information in the target window is only
    available after waiting for a lockout time
  • Lockout time: 0, 400, or 3200 milliseconds
    (between subjects)

12
Blocks World task: human data (Gray et al., 2006)
  • Size of lockout time influences human behavior

13
Blocks World task: Modeling Strategies
  • Strategy: How many blocks do you plan to place
    after a visit to the target window?
  • 8 encode-x production rules
  • Each rule studies x blocks
  • Encode-1 through encode-8
  • Model learns the utility value of each production
    rule using ACT-R's temporal difference learning
    algorithm
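
A minimal sketch of how a learned utility could drive the choice among the eight encode-x rules, assuming the usual ACT-R scheme in which the rule with the highest noisy utility is selected. The names `RULES`, `logistic_noise`, and `choose_rule`, and the noise scale `s`, are illustrative assumptions, not taken from the slides.

```python
import math
import random

# Hypothetical rule names following the slide: encode-1 through encode-8.
RULES = [f"encode-{x}" for x in range(1, 9)]

def logistic_noise(rng, s):
    """Sample zero-mean logistic noise with scale s (s = 0 disables noise)."""
    if s == 0:
        return 0.0
    u = rng.random()
    # Keep u strictly inside (0, 1) so the logarithm is always defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return s * math.log(u / (1.0 - u))

def choose_rule(utilities, rng, s=0.5):
    """Return the rule whose utility plus noise is highest."""
    return max(utilities, key=lambda rule: utilities[rule] + logistic_noise(rng, s))

# Usage: with all utilities equal, the choice is driven by noise alone.
utilities = {rule: 0.0 for rule in RULES}
picked = choose_rule(utilities, random.Random(1))
```

As the utilities diverge through learning, the noise matters less and the model settles on the encode-x rules that have earned the most reward.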

14
Utility learning
  • Utility learning requires the incorporation of
    rewards
  • Two choices are crucial
  • When is the reward given?
  • What is the magnitude of the reward?
  • After some experience, the utility of a
    production rule approximates (Anderson, 2007)
    [equation not preserved in this transcript; its
    two annotated terms are the magnitude of the
    reward and when the reward is given]
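
A sketch of the approximation, assuming the slide showed the difference-learning rule commonly documented for ACT-R 6. The symbols are notation chosen here rather than taken from the slide: U_i(n) is the utility of rule i after its n-th update, α the learning rate, R the magnitude of the reward, and Δt_i the time between the firing of rule i and the delivery of the reward.

```latex
\begin{align}
  U_i(n) &= U_i(n-1) + \alpha \left[ R_i(n) - U_i(n-1) \right], \\
  R_i(n) &= R - \Delta t_i, \\
  U_i    &\approx R - \Delta t_i \quad \text{(after some experience, for stable } R \text{ and } \Delta t_i\text{)}.
\end{align}
```

Under this reading, the utility is linear in both the magnitude of the reward and the time at which the reward is given, which matches the two choices listed on this slide.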
15
Utility learning
  • Choice 1: When is the reward given?
  • Important because
  • Utility value has a linear relationship with the
    time at which the reward is given
  • Choice in Blocks World
  • Once model: update once, at the end of the trial
  • Each model: update each time that part of the
    task is completed, i.e., when a (set of) block(s)
    has been placed and the model either returns to
    the target window to study more blocks or
    finishes the trial
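
The difference between the "once" and "each" schedules can be sketched as follows. This is an illustration under stated assumptions, not the authors' model code: a trial is reduced to a list of hypothetical `(rule, fire_time, end_time)` segments, the effective reward is taken to be the reward magnitude minus the delay since the rule fired, and the learning rate and reward per segment are made-up values.

```python
def apply_reward(utilities, fired, reward, reward_time, alpha=0.2):
    """Update every rule that fired since the previous reward.

    fired is a list of (rule_name, fire_time) pairs. The effective reward
    for a rule is the reward magnitude minus the delay since it fired.
    """
    for rule, fire_time in fired:
        effective = reward - (reward_time - fire_time)
        utilities[rule] += alpha * (effective - utilities[rule])

def run_trial(segments, utilities, schedule, reward_per_segment=1.0):
    """Run one trial; schedule is "each" or "once".

    segments is a list of (rule_name, fire_time, end_time) tuples, one per
    visit to the target window followed by block placement.
    """
    pending = []
    for rule, fire_time, end_time in segments:
        pending.append((rule, fire_time))
        if schedule == "each":
            # Reward as soon as this part of the task is completed.
            apply_reward(utilities, pending, reward_per_segment, end_time)
            pending = []
    if schedule == "once":
        # A single reward at the end of the trial covers every rule fired.
        total = reward_per_segment * len(segments)
        apply_reward(utilities, pending, total, segments[-1][2])
    return utilities

# Usage: the same two-segment trial under both schedules (times in seconds).
trial = [("encode-2", 0.0, 4.0), ("encode-2", 4.0, 8.0)]
each_result = run_trial(trial, {"encode-2": 0.0}, "each")
once_result = run_trial(trial, {"encode-2": 0.0}, "once")
```

In the "once" schedule the early firing is charged for the whole remainder of the trial, so the same behavior ends up with a different utility than under the "each" schedule.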

16
Utility learning
  • Choice 2: magnitude of the reward
  • Important because
  • Utility value has a linear relationship with the
    magnitude of the reward
  • But how to set this value?
  • Experimental tweaking? -> unfavorable
  • Fixed range of values? (e.g., between 0 and 1) ->
    difficult
  • Relate to neurological data? -> not available for
    most models

17
Utility learning
  • Choice 2: magnitude of the reward
  • Choice in Blocks World
  • Relate the reward to what might be important in
    the task
  • Accuracy: accuracy with which the task is
    performed. Options:
  • Success: blocks placed (once)
  • Success: blocks placed (each)
  • Success & Failure: blocks placed - blocks
    forgotten (each)
  • Time: how much time does (part of) the task
    take? Options:
  • Time spent on the task: -1 × time spent (once)
  • Time spent waiting for a specific aspect of the
    task: -1 × lockout size × number of visits to
    target window (once)
  • Number of blocks placed per second (each)
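
The reward concepts on this slide can be written as small functions. This is a sketch under assumptions: the time-based rewards are read as products (-1 × time spent, and -1 × lockout size × number of visits), times are expressed in seconds, and all function and parameter names are hypothetical.

```python
def reward_success(blocks_placed):
    """Accuracy: number of blocks placed (used in a once and an each model)."""
    return float(blocks_placed)

def reward_success_failure(blocks_placed, blocks_forgotten):
    """Accuracy: blocks placed minus blocks forgotten (each model)."""
    return float(blocks_placed - blocks_forgotten)

def reward_time_spent(seconds_spent):
    """Time: -1 times the time spent on the task (once model)."""
    return -1.0 * seconds_spent

def reward_time_waiting(lockout_seconds, visits):
    """Time: -1 times lockout size times visits to the target window (once model)."""
    return -1.0 * lockout_seconds * visits

def reward_blocks_per_second(blocks_placed, seconds_spent):
    """Time: number of blocks placed per second (each model)."""
    return blocks_placed / seconds_spent

# Usage: five visits to the target window in the 3200 ms lockout condition.
waiting_penalty = reward_time_waiting(3.2, 5)
```

Note how the time-based rewards scale with the lockout condition while the accuracy-based rewards do not, so the same strategy can be rewarded very differently depending on the concept chosen.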

18
Blocks World task: Modeling Strategies
  • 6 models were developed
  • Each model is run 6 times for each of 3
    experimental conditions
  • 0, 400 and 3200 milliseconds
  • Models interact with the same interface as human
    participants

19
Blocks World task: general results
  • Each model has unique results

20
Blocks World task: general results
  • What is the impact of
  • When the reward is given (once/each)
  • The concept of the reward (related to
    accuracy/time)
  • Results are averaged over 3 models

21
Utility learning: impact of when the reward is given
22
Utility learning: impact of the concept of reward
23
Comparison with ACT-R 5 (Gray, Schoelles, & Sims,
2005)
24
Conclusion
  • Rewards can be given at different times during a
    trial and according to different concepts
  • There are no guidelines for what the best choices
    are
  • Blocks World suggests that rewards should
  • Be given once: the model can then optimize
    behavior over the entire task
  • Relate to the concept of time, because different
    strategy choices have a big impact on reward size
  • Models of other tasks should show whether this
    finding is consistent

25
Conclusion
  • This is not just a Blocks World issue
  • General Computer Science / AI issue:
    representing a task in the right way is
    crucial (e.g., Russell & Norvig, 1995; Sutton &
    Barto, 1998)
  • Many experiments involve manipulations and
    measurements of accuracy and speed of performance
  • This is a new issue for ACT-R
  • When the reward is given could be varied in
    ACT-R 5
  • Magnitude of the reward could not be varied in
    ACT-R 5

26
Thank you for your attention
  • Questions?
  • More information
  • cjanssen@ai.rug.nl
  • www.ai.rug.nl/cjanssen
  • www.cogsci.rpi.edu/cogworks
  • Poster session at CogSci 2008: Thursday, July
    24th, "Cognitive Models of Strategy Shifts in
    Interactive Behavior" (session: Attention and
    Implicit Learning)

27
References
  • Anderson, J. R. (2006). A new utility learning
    mechanism. Paper presented at the 2006 ACT-R
    workshop.
  • Anderson, J. R. (2007). How can the human mind
    occur in the physical universe? New York: Oxford
    University Press.
  • Bothell, D. (2005). ACT-R 6 Official Release.
    Proceedings of the 12th ACT-R Workshop.
  • Fu, W. T., & Anderson, J. R. (2004). Extending
    the computational abilities of the procedural
    learning mechanism in ACT-R. Proceedings of the
    26th annual meeting of the Cognitive Science
    Society, 416-421.
  • Gray, W. D., Schoelles, M. J., & Sims, C. R.
    (2005). Adapting to the task environment:
    Explorations in expected value. Cognitive Systems
    Research, 6(1), 27-40.
  • Gray, W. D., Sims, C. R., Fu, W. T., & Schoelles,
    M. J. (2006). The soft constraints hypothesis: A
    rational analysis approach to resource allocation
    for interactive behavior. Psychological Review,
    113(3), 461-482.
  • Russell, S. J., & Norvig, P. (1995). Artificial
    intelligence: A modern approach. Upper Saddle
    River, NJ: Prentice-Hall, Inc.
  • Sutton, R. S., & Barto, A. G. (1998).
    Reinforcement learning: An introduction.
    Cambridge, MA: MIT Press.