Title: Relational Macros for Transfer in Reinforcement Learning
Slide 1: Relational Macros for Transfer in Reinforcement Learning
Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA)
Richard Maclin (University of Minnesota-Duluth, USA)
Slide 2: Transfer Learning Scenario
The agent learns Task A (the source task), then uses that knowledge while learning a related target task.
Slide 3: Goals of Transfer Learning
[Figure: learning curves in the target task, plotting performance against training, with transfer vs. without transfer.]
Slide 4: Reinforcement Learning
- Observe the world state, described by a set of features
- Take an action
  - Policy: choose the action with the highest Q-value in the current state
- Receive a reward
- Use the rewards to estimate the Q-values of actions in states (a minimal Q-learning sketch follows below)
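To make the loop above concrete, here is a minimal tabular Q-learning sketch. It is only an illustration: the environment interface (reset, step, actions), the epsilon-greedy exploration, and the parameter values are assumptions for this example, not necessarily the setup used in the RoboCup tasks.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
        """Tabular Q-learning: use observed rewards to estimate Q-values of actions in states."""
        Q = defaultdict(float)                        # (state, action) -> estimated Q-value
        for _ in range(episodes):
            state = env.reset()                       # observe the world state
            done = False
            while not done:
                # Policy: choose the action with the highest Q-value, with occasional exploration.
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)   # take an action, receive a reward
                # Move the estimate toward the reward plus discounted future value.
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q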
Slide 5: The RoboCup Domain
Slide 6: Transfer in Reinforcement Learning
- Related work
  - Model reuse (Taylor & Stone 2005): copy the Q-function
  - Policy reuse (Fernandez & Veloso 2006)
  - Option transfer (Perkins & Precup 1999)
  - Relational RL (Driessens et al. 2006)
- Our previous work
  - Policy transfer (Torrey et al. 2005)
  - Skill transfer (Torrey et al. 2006): learn rules that describe when to take individual actions
- Now we learn a strategy instead of individual skills
7Representing a Multi-step Strategy
Really these are rule sets, not just single rules
The learning agent jumps between players
- A relational macro is a finite-state machine
- Nodes represent internal states of agent in which
limited independent policies apply - Conditions for transitions and actions are in
first-order logic
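A minimal sketch of how a relational macro could be represented as a finite-state machine, assuming a simple encoding in which each condition is reduced to a predicate on a state-feature dictionary. The node names, action arguments, and empty rule lists are placeholders for illustration, not the authors' actual representation.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    Rule = Callable[[dict], bool]        # a first-order condition, reduced here to a state predicate

    @dataclass
    class MacroNode:
        action: str                                                # action taken in this internal state
        action_rules: List[Rule] = field(default_factory=list)     # when the action may be taken
        transitions: Dict[str, List[Rule]] = field(default_factory=dict)  # next node -> transition rules

    # Example structure only (rule lists left empty for brevity): move, then pass, then shoot.
    macro = {
        "move":  MacroNode("move(ahead)",     transitions={"pass": []}),
        "pass":  MacroNode("pass(Teammate)",  transitions={"shoot": []}),
        "shoot": MacroNode("shoot(GoalPart)", transitions={}),
    }

    def next_node(current: str, state: dict) -> str:
        """Follow the first transition whose rules all fire; otherwise stay in the current node."""
        for target, rules in macro[current].transitions.items():
            if all(rule(state) for rule in rules):   # an empty rule list trivially fires in this sketch
                return target
        return current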
Slide 8: Our Proposed Method
- Learn a relational macro that describes a successful strategy in the source task
- Execute the macro in the target task to demonstrate the successful strategy
- Continue learning the target task with standard RL after the demonstration
Slide 9: Learning a Relational Macro
- We use ILP to learn macros
  - Aleph: top-down search bounded by a bottom clause
  - Heuristic and randomized search
  - Maximize the F1 score (a small worked example follows below)
- We learn a macro in two phases
  - The action sequence (node structure)
  - The rule sets for actions and transitions
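The clause-scoring criterion mentioned above (F1) is the harmonic mean of precision and recall over the examples a candidate clause covers. A small illustration with made-up counts:

    def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
        """F1 = harmonic mean of precision and recall for a candidate clause."""
        if true_pos == 0:
            return 0.0
        precision = true_pos / (true_pos + false_pos)   # covered positives / all covered examples
        recall = true_pos / (true_pos + false_neg)      # covered positives / all positive examples
        return 2 * precision * recall / (precision + recall)

    # e.g. a clause covering 40 of 50 positive games and 5 negative games:
    # f1_score(40, 5, 10) ~= 0.84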
Slide 10: Learning Macro Structure
- Objective: find an action pattern that separates good and bad games

    macroSequence(Game) :-
        actionTaken(Game, StateA, move, ahead, StateB),
        actionTaken(Game, StateB, pass, _, StateC),
        actionTaken(Game, StateC, shoot, _, gameEnd).
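The clause above can be read procedurally as: the game took move(ahead), immediately followed by a pass, immediately followed by a shot that ended the game. A hypothetical plain-Python version of that check, assuming a game trace is a list of (action, argument) pairs:

    def matches_macro_sequence(trace):
        """True if the trace contains move(ahead), then a pass, then a game-ending shot."""
        for i in range(len(trace) - 2):
            (a1, arg1), (a2, _), (a3, _) = trace[i], trace[i + 1], trace[i + 2]
            if (a1, arg1) == ("move", "ahead") and a2 == "pass" and a3 == "shoot" \
                    and i + 2 == len(trace) - 1:     # the shot is the last action (gameEnd)
                return True
        return False

    # matches_macro_sequence([("move", "ahead"), ("pass", "a1"), ("shoot", "goalRight")])  -> True
    # matches_macro_sequence([("move", "right"), ("pass", "a1")])                          -> False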
Slide 11: Learning Macro Conditions
- Objective: describe when transitions and actions should be taken
Slide 12: Examples for Actions
Scoring games:
  Game 1: move(ahead), pass(a1), shoot(goalRight)
  Game 2: move(ahead), pass(a2), shoot(goalLeft)
Non-scoring games:
  Game 3: move(right), pass(a1)
  Game 4: move(ahead), pass(a1), shoot(goalRight)
13Examples for Transitions
Game 1 move(ahead) pass(a1) shoot(goalR
ight)
scoring
Game 2 move(ahead) move(ahead) shoot(go
alLeft)
non-scoring
Game 3 move(ahead)
pass(a1) shoot(goalRight)
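One simplified, hypothetical way to assemble positive and negative example games when learning the rule set for a single action (here, pass): games that took the action and scored become positives, games that took it without scoring become negatives. The actual example construction for macros may be more fine-grained, so treat this purely as an illustration of the labeling idea.

    def split_examples(games, action="pass"):
        """Split games that took `action` into scoring (positive) and non-scoring (negative) sets."""
        positives, negatives = [], []
        for game in games:
            if any(a == action for a, _ in game["trace"]):
                (positives if game["scoring"] else negatives).append(game)
        return positives, negatives

    # The four games from the Examples for Actions slide:
    games = [
        {"trace": [("move", "ahead"), ("pass", "a1"), ("shoot", "goalRight")], "scoring": True},
        {"trace": [("move", "ahead"), ("pass", "a2"), ("shoot", "goalLeft")],  "scoring": True},
        {"trace": [("move", "right"), ("pass", "a1")],                         "scoring": False},
        {"trace": [("move", "ahead"), ("pass", "a1"), ("shoot", "goalRight")], "scoring": False},
    ]
    positives, negatives = split_examples(games)    # Games 1-2 positive, Games 3-4 negative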
Slide 14: Transferring a Macro
- Demonstration
  - Execute the macro strategy to get Q-value estimates
  - Infer low Q-values for actions not taken by the macro
  - Compute an initial Q-function from these examples (sketch below)
- Continue learning with standard RL
- Advantage: potential for a large immediate jump in performance
- Disadvantage: risk that the agent will blindly follow an inappropriate strategy
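A rough sketch of the demonstration step described above: the states visited while executing the macro become (state, action, target-Q) training examples, with the macro's chosen action assigned the observed discounted return and the untaken actions assigned an inferred low value. The trace format, the discount factor, and the constant low value are assumptions for this sketch, and the regression step that fits the initial Q-function is left out.

    def demonstration_examples(demo_games, all_actions, gamma=0.97, low_q=0.0):
        """demo_games: list of games; each game is a list of (state, action_taken, reward) steps."""
        examples = []
        for game in demo_games:
            ret = 0.0
            for state, action_taken, reward in reversed(game):   # walk backward to accumulate returns
                ret = reward + gamma * ret
                examples.append((state, action_taken, ret))       # macro's action: estimated Q = return
                for other in all_actions:
                    if other != action_taken:
                        examples.append((state, other, low_q))    # actions the macro did not take: low Q
        return examples

    # An initial Q-function can then be fit to `examples` with any regression
    # method before standard RL continues in the target task.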
Slide 15: Experiments
- Source task: 2-on-1 BreakAway
  - 3000 existing games from the learning curve
  - Learn macros from 5 separate runs
- Target tasks: 3-on-2 and 4-on-3 BreakAway
  - Demonstration period of 100 games
  - Continue training up to 3000 games
  - Perform 5 target runs for each source run
Slide 16: 2-on-1 BreakAway Macro
[Figure: the macro learned from 2-on-1 BreakAway. Notes: the learning agent jumped players here; in one source run this node was absent; this shot is apparently a leading pass; the ordering of these nodes varied.]
Slide 17: Results (2-on-1 to 3-on-2)
Slide 18: Results (2-on-1 to 4-on-3)
Slide 19: Conclusions
- This transfer method can significantly improve initial target-task performance
- It can handle new elements being added to the target task, but not new objectives
- It is an aggressive approach that is a good choice for tasks with similar strategies
Slide 20: Future Work
- Alternative ways to apply relational macros in the target task
  - Keep the initial benefits
  - Alleviate risks when tasks differ more
- Alternative ways to make decisions about steps within macros
  - Statistical relational learning techniques
Slide 21: Acknowledgements
- DARPA Grant HR0011-04-1-0007
- DARPA IPTO contract FA8650-06-C-7606
Thank You
Slide 22: Rule Scores
- Each transition and action has a set of rules, one or more of which may fire
- If multiple rules fire, we obey the one with the highest score
- The score of a rule is the probability that following it leads to a successful game (sketch below)
- Score = (# source-task games that followed the rule and scored) / (# source-task games that followed the rule)
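A small sketch of the score defined above and of obeying the highest-scoring fired rule. The rule representation (two predicates per rule: one saying whether its condition fires in a state, one saying whether a source-task game followed it) is a hypothetical encoding for illustration; in practice the scores would be computed once from the source-task games rather than on every decision.

    def rule_score(rule, source_games):
        """Probability that a source-task game that followed this rule ended up scoring."""
        followed = [g for g in source_games if rule["followed"](g)]
        if not followed:
            return 0.0
        return sum(1 for g in followed if g["scoring"]) / len(followed)

    def choose_fired_rule(rules, state, source_games):
        """Among the rules whose conditions fire in this state, obey the one with the highest score."""
        fired = [r for r in rules if r["condition"](state)]
        if not fired:
            return None
        return max(fired, key=lambda r: rule_score(r, source_games))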