Title: Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression

1. Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression
- Richard Maclin
- University of Minnesota-Duluth
- Jude Shavlik, Lisa Torrey, Trevor Walker, Edward Wild
- University of Wisconsin-Madison
2. Goal
- Given
  - Environment to explore
  - Reinforcements for that environment
  - Advice from a human observer
- Do
  - Learn a good policy for the environment
(Figure caption: "Pass to your teammate!!")
3. Our Contribution
- A natural form of advice for reinforcement learning (RL): preference advice
- Advice format:
  - If <agent is in this region of feature space>
  - Then Prefer Action1 To Action2
- Advice about the policy rather than about Q values
4. Desiderata for Advice-Taking
- Human observer expresses advice naturally and without knowledge of the ML agent's internals
- Agent incorporates advice directly into the function it is learning
- Additional feedback (rewards, more advice) is used to continually refine the learner
5. Advice in Knowledge-Based Kernel Regression (KBKR; Mangasarian, Shavlik & Wild, JMLR 2004)
- If the goal center is close and the goalie isn't covering it
- Then shoot!
- Formally:
  If distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
  Then Q(shoot) ≥ 0.9
6. Preference Advice (Pref-KBKR)
- Likely hard for the user to generate "Q(shoot) ≥ 0.9"
- Would be more useful to say "Shoot is better than Pass":
  If distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
  Then Prefer Shoot to Pass
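A piece of preference advice like the one above can be represented as a region of feature space plus a preferred/nonpreferred action pair. The sketch below is not the authors' code; the class name, field names, and the margin value β = 0.1 are made up for illustration.

```python
# Hypothetical container for one piece of preference advice:
# "If Bx <= d Then Prefer `preferred` to `nonpreferred`".
from dataclasses import dataclass
import numpy as np

@dataclass
class PreferenceAdvice:
    B: np.ndarray        # (m, n) matrix defining the advice region Bx <= d
    d: np.ndarray        # (m,) right-hand side of the region constraints
    preferred: str       # action the advice prefers inside the region
    nonpreferred: str    # action it is preferred to
    beta: float          # required margin Q_pref(x) - Q_nonpref(x) >= beta

    def region_contains(self, x: np.ndarray) -> bool:
        """True if state x falls inside the advice region."""
        return bool(np.all(self.B @ x <= self.d))

# Encoding "If distGoalCenter <= 15 and angleGoalieGCenter >= 25
# Then Prefer Shoot to Pass" with features
# x = [distGoalCenter, angleGoalieGCenter]:
advice = PreferenceAdvice(
    B=np.array([[1.0, 0.0],     # distGoalCenter <= 15
                [0.0, -1.0]]),  # -angleGoalieGCenter <= -25
    d=np.array([15.0, -25.0]),
    preferred="shoot", nonpreferred="pass", beta=0.1)

print(advice.region_contains(np.array([10.0, 30.0])))  # True: in region
print(advice.region_contains(np.array([20.0, 30.0])))  # False: too far out
```

Note how a ">=" condition becomes a "<=" row by negating both sides, so the whole region is one system Bx ≤ d.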
7. Knowledge-Based SVMs: Generalizing an Example from a POINT to a REGION
(Figure: advice generalizes a single labeled point to a whole labeled region of feature space, with POS and NEG areas shown.)
8. Support-Vector Regression for RL
- min ||w||_1 + ν|b| + C||s||_1
- such that, for all training examples x:
  Q_a(x) - s ≤ w·x + b ≤ Q_a(x) + s
  (the middle term is the learned model's prediction; s is the error slack)
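The LP on this slide can be solved directly with an off-the-shelf solver. The sketch below (my own, assuming made-up weights ν = 1, C = 10 and synthetic data, not the paper's setup) linearizes the 1-norms by splitting each variable into nonnegative parts and calls `scipy.optimize.linprog`:

```python
# Sketch of the slide's support-vector regression LP:
#   min ||w||_1 + nu*|b| + C*||s||_1
#   s.t. Q(x) - s <= w.x + b <= Q(x) + s  for every training example,
# linearized via w = wp - wm, b = bp - bm with all parts nonnegative.
import numpy as np
from scipy.optimize import linprog

def fit_q_model(X, Q, nu=1.0, C=10.0):
    m, n = X.shape
    # Variable order: wp (n), wm (n), bp, bm, s (m); all >= 0 by default.
    c = np.concatenate([np.ones(2 * n), [nu, nu], C * np.ones(m)])
    ones, I = np.ones((m, 1)), np.eye(m)
    upper = np.hstack([X, -X, ones, -ones, -I])   # w.x + b - s <= Q
    lower = np.hstack([-X, X, -ones, ones, -I])   # -(w.x + b) - s <= -Q
    res = linprog(c, A_ub=np.vstack([upper, lower]),
                  b_ub=np.concatenate([Q, -Q]), method="highs")
    z = res.x
    return z[:n] - z[n:2 * n], z[2 * n] - z[2 * n + 1]   # w, b

# Tiny check: data generated by Q = 2*x0 - x1 + 0.5 is fit exactly,
# since with a large C the slack s is driven to zero.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 2))
Q = X @ np.array([2.0, -1.0]) + 0.5
w, b = fit_q_model(X, Q)
print(np.round(w, 3), round(b, 3))   # approximately [2, -1] and 0.5
```

With smaller C the solver trades training error for a sparser w, which is the usual 1-norm regularization effect.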
9. Mathematically
- min ||w||_1 + ν|b| + C||s||_1 + (penalties for not following advice, hence advice can be refined)
- such that, for all training examples x:
  Q_a(x) - s ≤ w·x + b ≤ Q_a(x) + s
  (plus constraints that represent the advice)
10. Incorporating Advice in KBKR
- Advice format: Bx ≤ d ⇒ f(x) ≥ h·x + β
- Example:
  If distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
  Then Q(shoot) ≥ 0.9
11. Preference Advice
- If distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
- Then Prefer Shoot to Pass
- Advice format: Bx ≤ d ⇒ Q_shoot(x) - Q_pass(x) ≥ β
- Note: we learn (w, b) for each action simultaneously
12. Preference Advice Theorem
- Let {x | Bx ≤ d} be nonempty. For fixed (w_p, b_p, w_n, b_n, β),
  Bx ≤ d ⇒ Q_p(x) - Q_n(x) ≥ β
  is equivalent to the following system having a solution u (by Motzkin's Theorem of the Alternative):
  B^T u + w_p - w_n = 0
  -d^T u + b_p - b_n - β ≥ 0,  u ≥ 0
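The theorem can be sanity-checked numerically: the implication holds iff the minimum of (w_p - w_n)·x + (b_p - b_n) over the region is at least β, and the theorem says this is equivalent to the u-system being feasible. The toy instance below (unit box region, linear models I invented for illustration) checks both sides with `scipy.optimize.linprog`:

```python
# Numeric check of the theorem on a made-up instance: region
# 0 <= x <= 1 in R^2, preferred-minus-nonpreferred model
# (w, b) = ([1, 0], 0.2), required margin beta = 0.1.
import numpy as np
from scipy.optimize import linprog

B = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])  # unit box
d = np.array([1., 1., 0., 0.])
w = np.array([1., 0.])   # w_p - w_n
b, beta = 0.2, 0.1       # b_p - b_n and required margin

# Direct check: implication holds iff min_{Bx<=d} w.x + b >= beta.
primal = linprog(w, A_ub=B, b_ub=d, bounds=[(None, None)] * 2,
                 method="highs")
holds_directly = primal.fun + b >= beta - 1e-9

# Alternative system: exists u >= 0 with B^T u + w = 0 and
# -d^T u + b - beta >= 0.  Minimize d^T u subject to B^T u = -w
# (linprog's default bounds already enforce u >= 0).
alt = linprog(d, A_eq=B.T, b_eq=-w, method="highs")
holds_by_alternative = alt.status == 0 and -alt.fun + b - beta >= -1e-9

print(holds_directly, holds_by_alternative)  # both True for this instance
```

Shrinking b below beta makes both checks flip to False together, which is exactly the equivalence the slide states.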
13. Pref-KBKR Linear Program
- min  Σ_a ( ||w_a||_1 + ν|b_a| + C||s_a||_1 )  +  Σ_k ( μ1||z_k||_1 + μ2 ζ_k )
- such that
  for each action a, for all training examples x:
    Q_a(x) - s_a ≤ w_a·x + b_a ≤ Q_a(x) + s_a
  for each piece of advice k (preferring action p to action n):
    -z_k ≤ w_p - w_n + B_k^T u_k ≤ z_k
    -d_k^T u_k + b_p - b_n - β_k + ζ_k ≥ 0
    u_k ≥ 0,  ζ_k ≥ 0
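The full program is just the regression constraints of slide 8 plus the soft advice constraints from the theorem. Below is a runnable sketch (not the paper's implementation) for one feature, two actions (shoot, pass), and one piece of advice "If x ≤ 5 Then Prefer Shoot to Pass" with β = 0; the weights ν, C, μ1, μ2 and the synthetic Q values are made-up:

```python
# Pref-KBKR LP sketch: regression constraints for each action plus
# soft advice constraints, all in one linprog call.
import numpy as np
from scipy.optimize import linprog

nu, C, mu1, mu2 = 1.0, 10.0, 1.0, 1.0
beta, B, d = 0.0, np.array([[1.0]]), np.array([5.0])   # region x <= 5

# Synthetic training data: Q_shoot = 1 - 0.1x, Q_pass = 0.5.
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
Qs, Qp = 1.0 - 0.1 * x, np.full_like(x, 0.5)
m = len(x)

# Nonnegative variables, in order: [w+, w-, b+, b-, s(m)] for shoot,
# the same block for pass, then z, u, zeta for the advice.
blk = 4 + m
nvar = 2 * blk + 3
iz, iu, izeta = nvar - 3, nvar - 2, nvar - 1

c = np.zeros(nvar)
for off in (0, blk):
    c[[off, off + 1]] = 1.0        # ||w_a||_1
    c[[off + 2, off + 3]] = nu     # nu * |b_a|
    c[off + 4:off + 4 + m] = C     # C * ||s_a||_1
c[iz], c[izeta] = mu1, mu2         # advice penalties

rows, rhs = [], []
def add(entries, b):
    r = np.zeros(nvar)
    for i, v in entries:
        r[i] += v
    rows.append(r)
    rhs.append(b)

for off, Q in ((0, Qs), (blk, Qp)):          # regression constraints
    for j in range(m):
        # w*x_j + b - s_j <= Q_j   and   -(w*x_j + b) - s_j <= -Q_j
        add([(off, x[j]), (off + 1, -x[j]), (off + 2, 1),
             (off + 3, -1), (off + 4 + j, -1)], Q[j])
        add([(off, -x[j]), (off + 1, x[j]), (off + 2, -1),
             (off + 3, 1), (off + 4 + j, -1)], -Q[j])

# Advice constraints (p = shoot, n = pass):
# -z <= w_p - w_n + B^T u <= z
add([(0, 1), (1, -1), (blk, -1), (blk + 1, 1), (iu, B[0, 0]),
     (iz, -1)], 0.0)
add([(0, -1), (1, 1), (blk, 1), (blk + 1, -1), (iu, -B[0, 0]),
     (iz, -1)], 0.0)
# -d^T u + b_p - b_n - beta + zeta >= 0, written as a <= row:
add([(iu, d[0]), (2, -1), (3, 1), (blk + 2, 1), (blk + 3, -1),
     (izeta, -1)], -beta)

res = linprog(c, A_ub=np.vstack(rows), b_ub=np.array(rhs),
              method="highs")
sol = res.x
w_shoot, b_shoot = sol[0] - sol[1], sol[2] - sol[3]
w_pass, b_pass = sol[blk] - sol[blk + 1], sol[blk + 2] - sol[blk + 3]
print(w_shoot, b_shoot, w_pass, b_pass)
```

Here the data already satisfy the advice, so the advice constraints cost nothing and the solver recovers the generating models; with conflicting data, the slacks z and ζ let the learner override the advice at a price set by μ1 and μ2.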
14. Methodology
- Test on 2-on-1 BreakAway
- The learner chooses the action when its player has the ball
- 13 basic features describe the world, each augmented by 32 tiles
- Average over 10 runs
- Batch learning every 100 games using SARSA estimates
- Learn on 2000 examples chosen stochastically (favoring recent examples)
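The slide does not say how "favoring recent examples" is weighted, so the sketch below shows one plausible scheme (geometric recency weighting with a made-up decay rate), not the paper's actual sampler:

```python
# Hypothetical recency-weighted sampler: each stored example is chosen
# with probability proportional to decay^age, so newer examples are
# favored but older ones can still appear.
import numpy as np

def sample_recent(num_stored, num_chosen, decay=0.999, seed=0):
    """Sample distinct indices, weighted toward the most recent."""
    rng = np.random.default_rng(seed)
    age = np.arange(num_stored)[::-1]       # last-stored example has age 0
    p = decay ** age
    p /= p.sum()
    return rng.choice(num_stored, size=num_chosen, replace=False, p=p)

idx = sample_recent(10000, 2000)
print(len(idx), idx.mean() > 10000 / 2)    # 2000 distinct, skewed recent
```

A decay near 1 approaches uniform sampling; a smaller decay concentrates almost all mass on the newest games.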
15. (Results figure; no slide text available.)
16. Related Work
- Advice-taking RL
  - Gordon & Subramanian, Informatica 1994
  - Maclin & Shavlik, AAAI 1994, MLJ 1996
  - Andre & Russell, NIPS 2001
- RL and SVMs
  - Dietterich & Wang, ECML 2001
  - Lagoudakis & Parr, ICML 2003
17. Current and Future Work
- Knowledge transfer via preference advice
- A wider variety of problems
- Large numbers of examples
- Large numbers of pieces of advice
- Other types of advice (e.g., multi-step plans)
18. Conclusions
- Pref-KBKR
  - Allows a human user to advise an RL agent in a natural manner
  - Accepts rules about policies rather than Q values:
    If distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
    Then Prefer Shoot to Pass
  - When applied to a complex RL problem, significantly outperforms agents without advice and agents with KBKR advice
19. Acknowledgements
- DARPA Grant HR0011-04-0007
- US Naval Research Lab Grant N00173-04-1-G026