Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression

Transcript and Presenter's Notes

1
Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression
  • Richard Maclin
  • University of Minnesota-Duluth
  • Jude Shavlik, Lisa Torrey, Trevor Walker, Edward Wild
  • University of Wisconsin-Madison

2
Goal!!!!
  • Given
    • Environment to explore
    • Reinforcements for that environment
    • Advice from a human observer
  • Do
    • Learn a good policy for the environment

[Figure: soccer scene with speech-bubble advice "Pass to your teammate!"]
3
Our Contribution
  • A natural form of advice for reinforcement learning (RL): preference advice
  • Advice format
    • If <agent is in this region of feature space>
    • Then Prefer Action1 To Action2
  • Advice about the policy rather than Q values

4
Desiderata for Advice-Taking
  • Human observer expresses advice naturally and without knowledge of the ML agent's internals
  • Agent incorporates advice directly into the function it is learning
  • Additional feedback (rewards, more advice) is used to continually refine the learner

5
Advice in Knowledge-Based Kernel Regression (KBKR; Mangasarian, Shavlik & Wild, JMLR 2004)
  • If   the goal center is close and the goalie isn't covering it
  • Then Shoot!

In KBKR's formal notation:
If   distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
Then Q(shoot) ≥ 0.9
6
Preference Advice (Pref-KBKR)
  • Likely hard for the user to generate Q(shoot) ≥ 0.9
  • Would be more useful to say "Shoot is better than Pass":

If   distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
Then Prefer Shoot to Pass
7
Knowledge-Based SVMs: Generalizing an Example from POINT to REGION

[Figure: a single POS training point generalized to a POS region of feature space, with NEG examples outside the region]
8
Support-Vector Regression for RL
  • min ||w||1 + λ|b| + C||s||1
  • such that for all training examples x:
  • Qa(x) - s ≤ w·x + b ≤ Qa(x) + s
  • (w·x + b is the learned model's prediction; the slack vector s absorbs the error; a small solver sketch follows below)
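As a concrete illustration of this formulation (not the authors' code), the 1-norm LP can be handed to an off-the-shelf solver. In the sketch below, scipy.optimize.linprog stands in for whatever LP solver the authors used, and the names svr_lp, X, q, lam, and C are inventions of this example.

```python
# Minimal sketch of the 1-norm support-vector regression LP above, using
# scipy.optimize.linprog. Names (svr_lp, X, q, lam, C) are illustrative,
# not from the paper.
import numpy as np
from scipy.optimize import linprog

def svr_lp(X, q, lam=1.0, C=1.0):
    """min ||w||1 + lam*|b| + C*||s||1  s.t.  q - s <= Xw + b <= q + s."""
    m, n = X.shape
    # Variable layout: [w+ (n), w- (n), b+, b-, s (m)], all >= 0,
    # so w = w+ - w- and b = b+ - b- make the 1-norm objective linear.
    c = np.concatenate([np.ones(2 * n), [lam, lam], C * np.ones(m)])
    ones = np.ones((m, 1))
    # Upper side:  Xw + b - s <= q;   lower side:  -Xw - b - s <= -q
    A_up = np.hstack([X, -X, ones, -ones, -np.eye(m)])
    A_lo = np.hstack([-X, X, -ones, ones, -np.eye(m)])
    res = linprog(c, A_ub=np.vstack([A_up, A_lo]),
                  b_ub=np.concatenate([q, -q]), bounds=(0, None))
    w = res.x[:n] - res.x[n:2 * n]
    b = res.x[2 * n] - res.x[2 * n + 1]
    return w, b

# Toy usage: fit Q-values that are linear in two features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
q = X @ np.array([0.5, -0.2]) + 0.1
print(svr_lp(X, q))
```

Splitting w and b into nonnegative parts is the standard trick for turning a 1-norm objective into a linear one.
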
9
Mathematically
  • min ||w||1 + λ|b| + C||s||1 + penalties for not following advice
    (because violations are penalized rather than enforced, the advice can be refined by later experience)
  • such that
    • Qa(x) - s ≤ w·x + b ≤ Qa(x) + s
    • constraints that represent the advice
10
Incorporating Advice in KBKR
  • Advice format: Bx ≤ d ⟹ f(x) ≥ h·x + β
  • Example (encoded in matrix form in the sketch below):

If   distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
Then Q(shoot) ≥ 0.9
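To make the Bx ≤ d half of the format concrete, here is one hypothetical encoding of the rule's IF-part, assuming a two-feature state x = (distGoalCenter, angleGoalieGCenter):

```python
# Hypothetical encoding of the rule's IF-part as Bx <= d, assuming the
# feature vector is x = (distGoalCenter, angleGoalieGCenter).
import numpy as np

B = np.array([[ 1.0,  0.0],   # distGoalCenter <= 15
              [ 0.0, -1.0]])  # -angleGoalieGCenter <= -25, i.e. angle >= 25
d = np.array([15.0, -25.0])

x = np.array([10.0, 30.0])    # a state inside the advice region
assert np.all(B @ x <= d)     # the advice's precondition fires for this x
```
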
11
Preference Advice
  • If   distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
  • Then Prefer Shoot to Pass

Advice format: Bx ≤ d ⟹ Qshoot(x) - Qpass(x) ≥ β
  • Note: we learn w, b for each action simultaneously

12
Preference Advice Theorem
  • Let {x : Bx ≤ d} be nonempty. For fixed (wp, bp, wn, bn, β),
  • Bx ≤ d ⟹ Qp(x) - Qn(x) ≥ β
  • is equivalent to the following system having a solution u (by Motzkin's Theorem of the Alternative):
  • Bᵀu + wp - wn = 0
  • -dᵀu + bp - bn - β ≥ 0,  u ≥ 0
  • (a numeric sanity check follows below)
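The forward direction of the equivalence is easy to check numerically: any u ≥ 0 solving the system forces the preference to hold everywhere in the advice region. All concrete numbers in this sketch are made up for illustration.

```python
# Numeric check of one direction of the theorem: a solution u >= 0 of
#   B'u + (wp - wn) = 0   and   -d'u + (bp - bn) - beta >= 0
# guarantees Qp(x) - Qn(x) >= beta whenever Bx <= d.
# All concrete numbers below are made up for illustration.
import numpy as np

B = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([15.0, 25.0])
u = np.array([0.2, 0.1])              # any nonnegative multipliers
wp_minus_wn = -B.T @ u                # forces B'u + (wp - wn) = 0
beta = 0.5
bp_minus_bn = d @ u + beta + 1.0      # gives -d'u + (bp - bn) - beta = 1 >= 0

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.uniform(-50.0, 50.0, size=2)
    if np.all(B @ x <= d):            # x lies in the advice region
        # Qp(x) - Qn(x) = (wp - wn).x + (bp - bn) for the linear models
        assert wp_minus_wn @ x + bp_minus_bn >= beta
print("preference holds throughout the advice region")
```
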

13
Pref-KBKR Linear Program
  • min  Σ over actions a:   ||wa||1 + λ|ba| + C||sa||1
       + Σ over advice items k:   μ1||zk||1 + μ2 ζk
  • such that
    • for each action a:
      Qa(x) - sa ≤ wa·x + ba ≤ Qa(x) + sa
    • for each piece of advice k (p, n = the preferred and non-preferred actions of advice k):
      -zk ≤ wp - wn + Bkᵀuk ≤ zk
      -dᵀuk + ζk ≥ βk - bp + bn
      uk ≥ 0,  ζk ≥ 0
  • (a two-action solver sketch follows below)
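Putting the pieces together, here is a hedged sketch of this LP for the special case of two actions and a single piece of advice. The function pref_kbkr_lp and its argument names are inventions of this sketch, and scipy.optimize.linprog stands in for whatever LP solver the authors used.

```python
# Sketch of the Pref-KBKR LP above, specialized to TWO actions
# (p = preferred, n = other) and ONE advice item Bx <= d => Qp - Qn >= beta.
# Penalty names lam, C, mu1, mu2 follow the slide; everything else is
# illustrative, not the authors' code.
import numpy as np
from scipy.optimize import linprog

def pref_kbkr_lp(X, qp, qn, B, d, beta, lam=1.0, C=1.0, mu1=1.0, mu2=1.0):
    m, n = X.shape
    r = B.shape[0]
    blk = 2 * n + 2 + m               # per-action variables: w+, w-, b+, b-, s
    N = 2 * blk + r + n + 1           # plus advice variables u (r), z (n), zeta
    c = np.zeros(N)
    for off in (0, blk):              # objective terms for each action
        c[off:off + 2 * n] = 1.0                 # ||w_a||_1
        c[off + 2 * n:off + 2 * n + 2] = lam     # lam * |b_a|
        c[off + 2 * n + 2:off + blk] = C         # C * ||s_a||_1
    c[2 * blk + r:2 * blk + r + n] = mu1         # mu1 * ||z||_1
    c[N - 1] = mu2                               # mu2 * zeta
    A, rhs = [], []
    for off, q in ((0, qp), (blk, qn)):  # data: q - s <= w.x + b <= q + s
        for sign, lim in ((1.0, q), (-1.0, -q)):
            row = np.zeros((m, N))
            row[:, off:off + n] = sign * X
            row[:, off + n:off + 2 * n] = -sign * X
            row[:, off + 2 * n] = sign
            row[:, off + 2 * n + 1] = -sign
            row[:, off + 2 * n + 2:off + blk] = -np.eye(m)
            A.append(row); rhs.append(lim)
    for sign in (1.0, -1.0):          # advice: -z <= (wp - wn) + B'u <= z
        row = np.zeros((n, N))
        row[:, 0:n], row[:, n:2 * n] = sign * np.eye(n), -sign * np.eye(n)
        row[:, blk:blk + n] = -sign * np.eye(n)
        row[:, blk + n:blk + 2 * n] = sign * np.eye(n)
        row[:, 2 * blk:2 * blk + r] = sign * B.T
        row[:, 2 * blk + r:2 * blk + r + n] = -np.eye(n)
        A.append(row); rhs.append(np.zeros(n))
    row = np.zeros((1, N))            # advice: -d'u + zeta >= beta - bp + bn
    row[0, 2 * blk:2 * blk + r] = d
    row[0, N - 1] = -1.0
    row[0, 2 * n], row[0, 2 * n + 1] = -1.0, 1.0                 # -bp
    row[0, blk + 2 * n], row[0, blk + 2 * n + 1] = 1.0, -1.0     # +bn
    A.append(row); rhs.append(np.array([-beta]))
    res = linprog(c, A_ub=np.vstack(A), b_ub=np.concatenate(rhs),
                  bounds=(0, None))
    unpack = lambda o: (res.x[o:o + n] - res.x[o + n:o + 2 * n],
                        res.x[o + 2 * n] - res.x[o + 2 * n + 1])
    return unpack(0), unpack(blk)     # (wp, bp), (wn, bn)
```

With more pieces of advice, each item k contributes its own (uk, zk, ζk) block and the same three groups of rows, which is how the sums over k on the slide arise; because zk and ζk are penalized rather than forced to zero, the learner can partially disobey advice that the data contradicts.
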
14
Methodology
  • Test on 2-on-1 BreakAway
  • Learner chooses the action whenever it has the ball
  • 13 basic features describe the world, each augmented by 32 tiles
  • Average over 10 runs
  • Batch learning every 100 games using SARSA estimates
  • Learn on 2000 examples chosen stochastically (favoring recent examples)

15
[Figure-only slide, no transcript text; presumably the BreakAway results plot]
16
Related Work
  • Advice-taking RL
    • Gordon & Subramanian, Informatica 1994
    • Maclin & Shavlik, AAAI 1994; MLJ 1996
    • Andre & Russell, NIPS 2001
  • RL and SVMs
    • Dietterich & Wang, ECML 2001
    • Lagoudakis & Parr, ICML 2003

17
Current and Future Work
  • Knowledge transfer via preference advice
  • Wider variety of problems
  • Large numbers of examples
  • Large numbers of pieces of advice
  • Other types of advice (e.g., multi-step plans)

18
Conclusions
  • Pref-KBKR
    • Allows a human user to advise an RL agent in a natural manner
    • Accepts rules about policies rather than Q values:
      If   distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
      Then Prefer Shoot to Pass
    • When applied to a complex RL problem, significantly outperforms agents without advice and agents with standard KBKR advice

19
Acknowledgements
  • DARPA Grant HR0011-04-0007
  • US Naval Research Lab Grant N00173-04-1-G026