Machine Learning via Advice Taking

Transcript
1
Machine Learning via Advice Taking
  • Jude Shavlik

2
Thanks To ...
Rich Maclin, Lisa Torrey, Trevor Walker, Prof. Olvi
Mangasarian, Glenn Fung, Ted Wild, DARPA
3
Quote (2002) from DARPA
  • Sometimes an assistant will merely watch you and
    draw conclusions.
  • Sometimes you have to tell a new person, 'Please
    don't do it this way' or 'From now on when I say
    X, you do Y.'
  • It's a combination of learning by example and by
    being guided.

4
Widening the Communication Pipeline between
Humans and Machine Learners
[Diagram: human teacher and machine learner as pupil]
5
Our Approach to Building Better Machine Learners
  • Human partner expresses advice naturally and
    without knowledge of the ML agent's internals
  • Agent incorporates advice directly into the
    function it is learning
  • Additional feedback (rewards, I/O pairs,
    inferred labels, more advice) used to refine
    learner continually

6
Standard Machine Learning vs. Theory Refinement
  • Positive Examples (should see doctor)
  • temp = 102.1, age = 21, sex = F, ...
  • temp = 101.7, age = 37, sex = M, ...
  • Negative Examples (take two aspirins)
  • temp = 99.1, age = 43, sex = M, ...
  • temp = 99.6, age = 24, sex = F, ...
  • Approximate Domain Knowledge
  • if temp is high and age is young
    then negative example
    (a toy sketch of this contrast follows below)
  • Related work by the labs of Mooney, Pazzani,
    Cohen, Giles, etc.
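The contrast can be made concrete with a toy Python sketch
(the feature encoding and thresholds are my assumptions; a
real theory-refinement system integrates the rule into the
learned model rather than merely checking it against data):

import_free = True  # pure standard library, nothing to import

# Labeled examples: feature dicts plus a label
# (True = should see doctor, False = take two aspirins).
examples = [
    ({"temp": 102.1, "age": 21, "sex": "F"}, True),
    ({"temp": 101.7, "age": 37, "sex": "M"}, True),
    ({"temp": 99.1, "age": 43, "sex": "M"}, False),
    ({"temp": 99.6, "age": 24, "sex": "F"}, False),
]

# Approximate domain knowledge, as a (possibly imperfect) rule:
# "if temp is high and age is young then negative example".
# The 100.0 and 30 thresholds are illustrative assumptions.
def advice_says_negative(x):
    return x["temp"] > 100.0 and x["age"] < 30

# Theory refinement starts from the rule and lets the data
# correct it; here we simply report where the two disagree.
for x, label in examples:
    if advice_says_negative(x) and label:
        print("advice disagrees with data on", x)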

7
Rich Maclin's PhD (1995)
  IF   a Bee is (Near and West) AND
       an Ice is (Near and North)
  THEN
    BEGIN
      Move East
      Move North
    END

8
Sample Results
[Figure: learning curves comparing performance with
advice vs. without advice]
9
Our Motto
"Give advice, rather than commands, to your
computer"
10
Outline
  • Prior Knowledge and Support Vector Machines
  • Intro to SVMs
  • Linear Separation
  • Non-Linear Separation
  • Function Fitting (Regression)
  • Advice-Taking Reinforcement Learning
  • Transfer Learning via Advice Taking

11
Support Vector Machines: Maximizing the Margin
between Bounding Planes
[Figure: classes A+ and A- separated by two bounding
planes; the support vectors lie on the planes, with
the margin between them]
12
Linear Algebra for SVMs
  • Given p points in n-dimensional space
  • Represent by p-by-n matrix A of reals
  • Each point Ai is in class +1 or -1
  • Separate by two bounding planes
  • More succinctly (see the reconstruction below)
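The slide's equations were images that did not survive
extraction; a standard reconstruction in Mangasarian's
notation, with D the p-by-p diagonal matrix of the +1/-1
labels and e a vector of ones:

\[
x^\top w = \gamma + 1 \quad\text{and}\quad x^\top w = \gamma - 1
\qquad \text{(the two bounding planes)}
\]
\[
D(Aw - e\gamma) \ge e
\qquad \text{(the separation constraints, succinctly)}
\]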

13
Slack Variables: Dealing with Data that is not
Linearly Separable
[Figure: overlapping classes A+ and A-; slack
variable y measures how far a point falls on the
wrong side of its bounding plane]
14
Support Vector Machines: Quadratic Programming
Formulation
  • Solve the quadratic program below
  • Minimize the sum of the slack variables,
    weighted by a parameter C
  • Maximize the margin by minimizing ||w||2
    (the margin between the planes is 2 / ||w||2)
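A reconstruction of the quadratic program in the notation
above (the slide's own formula was an image; C weights the
slack term):

\[
\min_{w,\gamma,y}\; C\, e^\top y + \tfrac{1}{2}\, w^\top w
\quad \text{s.t.} \quad
D(Aw - e\gamma) + y \ge e, \quad y \ge 0
\]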

15
Support Vector Machines: Linear Programming
Formulation
Use the 1-norm instead of the 2-norm (typically runs
faster, gives better feature selection, and might
generalize better; NIPS '03)
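Writing the 1-norm of w with an auxiliary vector s turns the
quadratic program into a linear program; a sketch of the
reconstruction:

\[
\min_{w,\gamma,y,s}\; C\, e^\top y + e^\top s
\quad \text{s.t.} \quad
D(Aw - e\gamma) + y \ge e, \quad -s \le w \le s, \quad y \ge 0
\]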
16
Knowledge-Based SVMs: Generalizing the Example from
POINT to REGION
[Figure: classes A+ and A-, with a polyhedral advice
region added on the A+ side]
17
Incorporating Knowledge Sets Into the SVM
Linear Program
  • Suppose that the knowledge set {x | Bx ≤ d}
    belongs to class A+
  • Hence it must lie in the half space
    {x | x'w ≥ γ + 1}

This implication is equivalent to a set of linear
constraints (proof in the NIPS '02 paper; see below)
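The equivalence, reconstructed from the NIPS '02 result
(valid when the knowledge set is nonempty):

\[
\{x \mid Bx \le d\} \subseteq \{x \mid x^\top w \ge \gamma + 1\}
\;\Longleftrightarrow\;
\exists\, u \ge 0 :\; B^\top u + w = 0, \;\; d^\top u + \gamma + 1 \le 0
\]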
18
Resulting LP for KBSVMs
(the new constraints and variables range over the
advice regions, one set per region)
19
KBSVM with Slack Variables
(the advice constraints' right-hand sides, which were
0, are now slacked so that advice need only be
satisfied approximately)
20
SVMs and Non-Linear Separating Surfaces
Non-linearly map to new space
Linearly separate in new space (using kernels)
Result is non-linear separator in original space
Fung et al. (2003) present knowledge-based
non-linear SVMs
21
Support Vector Regression (aka Kernel Regression)
Linearly approximate a function, given an array A
of inputs and a vector y of (numeric) outputs:
  f(x) = x'w + b
Find weights such that Aw + be ≈ y
In dual space w = A'α, so get (AA')α + be ≈ y
Kernelizing (to get a non-linear approximation):
  K(A,A')α + be ≈ y
[Figure: training points and fitted curve, y vs. x]
22
What to Optimize?
Linear program to optimize (reconstructed below)
  • The 1st term (on α) is a regularizer that
    minimizes model complexity
  • The 2nd term is the approximation error, weighted
    by the parameter C
  • Reduces to the classical least-squares fit if the
    quadratic version is used and the first term is
    ignored
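A sketch of the linear program described (reconstructed; α
and b as in the regression slide, s a vector of slacks):

\[
\min_{\alpha, b, s}\; \|\alpha\|_1 + C\, e^\top s
\quad \text{s.t.} \quad
-s \le K(A, A')\,\alpha + be - y \le s
\]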

23
Predicting Y for New X
  • y = K(x, A')α + b
  • Use the kernel to compute a distance to each
    training point (i.e., each row in A)
  • Weight by αi (hopefully many of the αi are zero),
    then sum
  • Add b (a scalar)
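A minimal Python sketch of this prediction step (names are
hypothetical; a Gaussian kernel is assumed, and alpha and b
are taken as already learned):

import numpy as np

def gaussian_kernel(X, A, sigma=1.0):
    """K[i, j] = exp(-||X_i - A_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def predict(x, A, alpha, b, sigma=1.0):
    """y = K(x, A') alpha + b: kernel distance to each
    training row of A, weighted by alpha, summed, plus b."""
    K = gaussian_kernel(np.atleast_2d(x), A, sigma)  # shape (1, p)
    return (K @ alpha + b).item()

# Usage with toy values (illustrative only):
A = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # p = 3 inputs
alpha = np.array([0.5, 0.0, -0.2])  # many alpha_i may be zero
b = 0.1
print(predict(np.array([0.5, 0.5]), A, alpha, b))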

24
Knowledge-Based SVR (Mangasarian, Shavlik, Wild,
JMLR '04)
  • Add soft constraints to the linear program (so the
    learner need only follow the advice approximately)

[Figure: fitted curve with a shaded advice region S;
the advice says that in this region y should exceed 4]
minimize   ||w||1 + C ||s||1 + a penalty for violating advice
such that  y - s ≤ Aw + be ≤ y + s   (a slacked match to the
           data, with analogous slacked constraints for the advice)
25
Testbeds: Subtasks of RoboCup
26
Reinforcement Learning Overview
Receive a state, described by a set of features
Take an action
Receive a reward
Use the rewards to estimate the Q-values of
actions in states
Policy: choose the action with the highest
Q-value in the current state
(a minimal sketch of this loop appears below)
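A tabular Q-learning sketch of this loop in Python
(illustrative only; the env object and its reset/step/actions
interface are assumptions, and the actual work uses
support-vector function approximation rather than a table):

import random
from collections import defaultdict

def q_learning(env, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    """env must expose reset() -> state, a list env.actions, and
    step(state, action) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Policy: mostly choose the highest-Q action here.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(state, action)
            # Use the reward to refine the Q-value estimate.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q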
27
Incorporating Advice in KBKR
  • Advice format
  • Bx ≤ d  ⇒  f(x) ≥ h'x + β

If distanceToGoal ≤ 10 and shotAngle ≥ 30 Then
Q(shoot) ≥ 0.9
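As with KBSVMs, this implication can be replaced by linear
constraints; a sketch for the linear case f(x) = x'w + b, via
the same theorem of the alternative (assuming the advice
region is nonempty):

\[
Bx \le d \;\Rightarrow\; x^\top w + b \ge h^\top x + \beta
\;\Longleftrightarrow\;
\exists\, u \ge 0 :\; B^\top u + w - h = 0, \;\; d^\top u + \beta - b \le 0
\]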
28
Giving Advice About Relative Values of Multiple
Functions (Maclin et al., AAAI '05)
When the input satisfies preconditions(input)
Then f1(input) > f2(input)
29
Sample Advice-Taking Results
if distanceToGoal ≤ 10 and shotAngle ≥ 30
then prefer shoot over all other actions
  Q(shoot) > Q(pass),  Q(shoot) > Q(move)
[Figure: learning curves for the advice-taking
learner vs. standard RL on 2-vs-1 BreakAway;
rewards +1, -1]
30
Transfer Learning
Agent learns Task A (the source)
Agent encounters related Task B (the target)
Agent discovers how the tasks are related (we use a
user mapping to tell the agent this)
Agent uses knowledge from Task A to learn Task B
faster
31
Transfer Learning: The Goal for the Target Task
[Figure: performance vs. training curves with and
without transfer; transfer should give a better
start, a faster rise, and a better asymptote]
32
Our Transfer Algorithm
Observe source-task games to learn skills
Translate learned skills into transfer advice
Use ILP to create advice for the target task
If there is user advice, add it in
Learn the target task with KBKR
(a sketch of this pipeline follows)
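A runnable Python sketch of the pipeline's shape. Every
function here is a trivial stand-in of my own invention; the
real steps (ILP rule learning, skill translation, KBKR) are
far richer:

from collections import Counter

def learn_skills(source_games):
    """Stand-in for the ILP step: the real system learns
    first-order rules that classify states by their actions."""
    return Counter(action for game in source_games
                   for _state, action in game)

def translate_to_advice(skills, mapping):
    """Stand-in for skill translation: rename source-task
    actions to target-task actions via the user mapping."""
    return ["prefer " + mapping.get(action, action)
            for action in skills]

def transfer(source_games, mapping, user_advice=()):
    advice = translate_to_advice(learn_skills(source_games), mapping)
    advice.extend(user_advice)   # if there is user advice, add it in
    return advice                # in the real system, fed to KBKR

# Usage: two observed KeepAway games, mapped to BreakAway.
games = [[("state1", "pass"), ("state2", "hold")],
         [("state3", "pass")]]
print(transfer(games, {"hold": "shoot"}, ["prefer shoot near goal"]))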
33
Learning Skills By Observation
  • Source-task games are sequences of (state, action)
    pairs
  • Learning skills is like learning to classify
    states by their correct actions
  • ILP = Inductive Logic Programming

34
ILP: Searching for First-Order Rules
We also use a random-sampling approach
35
Advantages of ILP
  • Can produce first-order rules for skills
  • Capture only the essential aspects of the skill
  • We expect these aspects to transfer better
  • Can incorporate background knowledge

pass(teammate1) ... pass(teammateN)   vs.   pass(Teammate)
36
Example of a Skill Learned by ILP from KeepAway
pass(Teammate) :- distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
We also gave human advice about shooting, since
that is a new skill in BreakAway
37
TL Level 7: KA to BA, Raw Curves
38
TL Level 7: KA to BA, Averaged Curves
39
TL Level 7 Statistics
TL Metrics: Average Reward

Type  Name                                   KA to BA           MD to BA
                                             Score    P-Value   Score    P-Value
I     Jump start                             0.05     0.0312    0.08     0.0086
I     Jump start, smoothed                   0.08     0.0002    0.06     0.0014
II    Transfer ratio                         1.82     0.0034    1.86     0.0004
II    Transfer ratio (truncated)             1.82     0.0032    1.86     0.0004
II    Average relative reduction (narrow)    0.58     0.0042    0.54     0.0004
II    Average relative reduction (wide)      0.70     0.0018    0.71     0.0008
II    Ratio (of area under the curves)       1.37     0.0056    1.41     0.0012
II    Transfer difference                    503.57   0.0046    561.27   0.0008
II    Transfer difference (scaled)           1017.00  0.0040    1091.2   0.0016
III   Asymptotic advantage                   0.09     0.0086    0.11     0.0040
III   Asymptotic advantage, smoothed         0.08     0.0116    0.10     0.0030

Boldface indicates a significant difference was
found
40
Conclusion
  • Can use much more than I/O pairs in ML
  • Give advice to computers; they automatically
    refine it based on feedback from the user or
    the environment
  • Advice is an appealing mechanism for transferring
    learned knowledge computer-to-computer

41
Some Papers (on-line; use Google :-)
  • Creating Advice-Taking Reinforcement Learners,
    Maclin & Shavlik, Machine Learning 1996
  • Knowledge-Based Support Vector Machine
    Classifiers, Fung, Mangasarian, & Shavlik, NIPS
    2002
  • Knowledge-Based Nonlinear Kernel Classifiers,
    Fung, Mangasarian, & Shavlik, COLT 2003
  • Knowledge-Based Kernel Approximation,
    Mangasarian, Shavlik, & Wild, JMLR 2004
  • Giving Advice about Preferred Actions to
    Reinforcement Learners Via Knowledge-Based Kernel
    Regression, Maclin, Shavlik, Torrey, Walker, &
    Wild, AAAI 2005
  • Skill Acquisition via Transfer Learning and
    Advice Taking, Torrey, Shavlik, Walker, &
    Maclin, ECML 2006

42
  • Backups

43
Breakdown of Results
44
What if User Advice is Bad?
45
Related Work on Transfer
  • Q-function transfer in RoboCup
  • Taylor & Stone (AAMAS 2005, AAAI 2005)
  • Transfer via policy reuse
  • Fernandez & Veloso (AAMAS 2006, ICML workshop
    2006)
  • Madden & Howley (AI Review 2004)
  • Torrey et al. (ECML 2005)
  • Transfer via relational RL
  • Driessens et al. (ICML workshop 2006)