Title: Relational Macros for Transfer in Reinforcement Learning

Transcript and Presenter's Notes

1
Relational Macros for Transfer in Reinforcement Learning
Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA)
Richard Maclin (University of Minnesota-Duluth, USA)
2
Transfer Learning Scenario
Agent learns Task A, then uses the acquired knowledge while learning a related target task.
3
Goals of Transfer Learning
[Figure: learning curves in the target task, plotting performance against training time, comparing with transfer vs. without transfer]
4
Reinforcement Learning
  • Observe the world state, described by a set of features
  • Take an action and receive a reward
  • Policy: choose the action with the highest Q-value in the current state
  • Use the rewards to estimate the Q-values of actions in states (a minimal sketch of this loop follows)
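As a concrete reference for this loop, here is a minimal tabular Q-learning sketch in Python; the hyperparameters and the epsilon-greedy policy are illustrative assumptions, since the slide does not specify the presenters' exact RL algorithm.

import random
from collections import defaultdict

# Q[(state, action)] -> estimated value of taking "action" in "state".
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.05   # assumed hyperparameters

def choose_action(state, actions):
    # Policy: choose the action with the highest Q-value in the current
    # state, with occasional random exploration.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Use the observed reward to refine the Q-value estimate.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])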
5
The RoboCup Domain
6
Transfer in Reinforcement Learning
  • Related work
    • Model reuse (Taylor & Stone 2005): copy the Q-function
    • Policy reuse (Fernandez & Veloso 2006)
    • Option transfer (Perkins & Precup 1999)
    • Relational RL (Driessens et al. 2006)
  • Our previous work
    • Policy transfer (Torrey et al. 2005)
    • Skill transfer (Torrey et al. 2006): learn rules that describe when to take individual actions
  • Now we learn a strategy instead of individual skills
7
Representing a Multi-step Strategy
  • A relational macro is a finite-state machine (a data-structure sketch follows)
  • Nodes represent internal states of the agent, in which separate policies apply; really these are rule sets, not just single rules
  • Conditions for transitions and actions are in first-order logic
  • During execution, the learning agent jumps between players
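Below is a minimal Python sketch of a relational macro as a finite-state machine. Conditions are stored as strings standing in for first-order logic, and matcher is a hypothetical helper that tests a condition against the current state; none of these names come from the presenters' system.

from dataclasses import dataclass

@dataclass
class Rule:
    condition: str   # first-order condition, e.g. "distBetween(me, goalCenter) < 10"
    score: float     # estimated probability of success when this rule is obeyed

@dataclass
class Node:
    action_rules: list    # rule set selecting this internal state's action
    transitions: dict     # next-node name -> list of Rule

@dataclass
class Macro:
    nodes: dict           # node name -> Node
    start: str            # initial internal state

def next_node(macro, current, state, matcher):
    # Evaluate all transition rules out of the current node; if several
    # fire, obey the one with the highest score (see the rule-scores slide).
    node = macro.nodes[current]
    fired = [(dest, rule)
             for dest, rules in node.transitions.items()
             for rule in rules if matcher(rule.condition, state)]
    return max(fired, key=lambda dr: dr[1].score)[0] if fired else current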

8
Our Proposed Method
  • Learn a relational macro that describes a
    successful strategy in the source task
  • Execute the macro in the target task to
    demonstrate the successful strategy
  • Continue learning the target task with standard RL after the demonstration (a high-level driver sketch follows)
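A high-level driver sketch of these three steps; the callables learn_macro, demonstrate, and standard_rl are placeholders for the ILP learner, the demonstration period, and a standard RL loop, not the presenters' actual interfaces.

def transfer_via_macro(learn_macro, demonstrate, standard_rl,
                       source_games, target_task):
    # Phase 1: learn a relational macro describing a successful
    # source-task strategy (via ILP over good and bad games).
    macro = learn_macro(source_games)
    # Phase 2: execute the macro in the target task to demonstrate the
    # strategy, producing an initial Q-function.
    q_init = demonstrate(macro, target_task)
    # Phase 3: continue learning the target task with standard RL.
    return standard_rl(target_task, q_init)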

9
Learning a Relational Macro
  • We use ILP to learn macros
    • Aleph: top-down search in a bottom clause
    • Heuristic and randomized search
    • Maximize F1 score (a small helper is sketched below)
  • We learn a macro in two phases
    • The action sequence (node structure)
    • The rule sets for actions and transitions
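For reference, the F1 score that the clause search maximizes, in its standard form; tp, fp, and fn count the positive and negative games a candidate clause covers or misses.

def f1_score(tp, fp, fn):
    # F1 is the harmonic mean of precision and recall:
    # precision = tp / (tp + fp), recall = tp / (tp + fn).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)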

10
Learning Macro Structure
  • Objective: find an action pattern that separates good and bad games

macroSequence(Game) :-
    actionTaken(Game, StateA, move, ahead, StateB),
    actionTaken(Game, StateB, pass, _, StateC),
    actionTaken(Game, StateC, shoot, _, gameEnd).
11
Learning Macro Conditions
  • Objective: describe when transitions and actions should be taken

12
Examples for Actions
Scoring games:
  Game 1: move(ahead), pass(a1), shoot(goalRight)
  Game 2: move(ahead), pass(a2), shoot(goalLeft)
Non-scoring games:
  Game 3: move(right), pass(a1)
  Game 4: move(ahead), pass(a1), shoot(goalRight)
13
Examples for Transitions
Scoring game:
  Game 1: move(ahead), pass(a1), shoot(goalRight)
Non-scoring games:
  Game 2: move(ahead), move(ahead), shoot(goalLeft)
  Game 3: move(ahead), pass(a1), shoot(goalRight)
(A sketch of building ILP examples from such games follows.)
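A toy sketch of splitting such games into positive and negative ILP examples for one action; the flat (action_sequence, scored) representation is an assumption made for illustration, since the real examples are first-order descriptions of game states.

def action_examples(games, action="pass"):
    # A game in which the action was taken is a positive example for that
    # action if the game scored, and a negative example otherwise.
    pos, neg = [], []
    for action_sequence, scored in games:
        if any(step.startswith(action) for step in action_sequence):
            (pos if scored else neg).append(action_sequence)
    return pos, neg

games = [(["move(ahead)", "pass(a1)", "shoot(goalRight)"], True),
         (["move(right)", "pass(a1)"], False)]
pos, neg = action_examples(games)   # one positive, one negative example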
14
Transferring a Macro
  • Demonstration
    • Execute the macro strategy to get Q-value estimates
    • Infer low Q-values for actions not taken by the macro
    • Compute an initial Q-function with these examples (bookkeeping sketched below)
    • Continue learning with standard RL
  • Advantage: potential for a large immediate jump in performance
  • Disadvantage: risk that the agent will blindly follow an inappropriate strategy
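A minimal sketch of that demonstration bookkeeping; the data layout (per-game step lists, an observed return, a fixed legal-action set) is assumed for illustration.

def demonstration_examples(demo_games, low_value=0.0):
    # For each state visited while executing the macro, the macro-chosen
    # action keeps the game's observed return as its Q-value estimate;
    # every other legal action is inferred to have a low Q-value. The
    # resulting examples seed the initial target-task Q-function.
    examples = {}
    for steps, game_return, legal_actions in demo_games:
        for state, chosen_action in steps:
            for action in legal_actions:
                examples[(state, action)] = (game_return if action == chosen_action
                                             else low_value)
    return examples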

15
Experiments
  • Source task: 2-on-1 BreakAway
    • 3000 existing games from the learning curve
    • Learn macros from 5 separate runs
  • Target tasks: 3-on-2 and 4-on-3 BreakAway
    • Demonstration period of 100 games
    • Continue training up to 3000 games
    • Perform 5 target runs for each source run

16
2-on-1 BreakAway Macro
[Figure: the macro learned from 2-on-1 BreakAway. Annotations: the learning agent jumped players at the pass node; in one source run a node was absent; an early shot is apparently a leading pass; the ordering of some nodes varied across runs.]
17
Results: 2-on-1 to 3-on-2
18
Results: 2-on-1 to 4-on-3
19
Conclusions
  • This transfer method can significantly improve
    initial target-task performance
  • It can handle new elements being added to the
    target task, but not new objectives
  • It is an aggressive approach that is a good
    choice for tasks with similar strategies

20
Future Work
  • Alternative ways to apply relational macros in the target task
    • Keep the initial benefits
    • Alleviate risks when tasks differ more
  • Alternative ways to make decisions about steps within macros
    • Statistical relational learning techniques

21
Acknowledgements
  • DARPA Grant HR0011-04-1-0007
  • DARPA IPTO contract FA8650-06-C-7606

Thank You
22
Rule scores
  • Each transition and action has a set of rules, one or more of which may fire
  • If multiple rules fire, we obey the one with the highest score
  • The score of a rule is the probability that following it leads to a successful game
  • Score = (# source-task games that followed the rule and scored) / (# source-task games that followed the rule)
A scoring sketch follows.
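A small sketch of this scoring rule and the tie-breaking policy; the (game, scored) pairs and the rule_fires predicate are illustrative placeholders.

def rule_score(source_games, rule_fires):
    # Score = (# source-task games that followed the rule and scored)
    #       / (# source-task games that followed the rule).
    followed = [scored for game, scored in source_games if rule_fires(game)]
    return sum(followed) / len(followed) if followed else 0.0

def obey(fired_rules):
    # If multiple rules fire, obey the one with the highest score.
    return max(fired_rules, key=lambda rule: rule.score)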