Title: Machine Learning via Advice Taking
1 Machine Learning via Advice Taking
2 Thanks To ...
Rich Maclin, Lisa Torrey, Trevor Walker, Prof. Olvi Mangasarian, Glenn Fung, Ted Wild, DARPA
3 Quote (2002) from DARPA
- Sometimes an assistant will merely watch you and draw conclusions.
- Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.'
- It's a combination of learning by example and by being guided.
4 Widening the Communication Pipeline between Humans and Machine Learners
[Figure: a teacher instructing a pupil, analogous to a human instructing a machine learner]
5 Our Approach to Building Better Machine Learners
- Human partner expresses advice naturally and without knowledge of the ML agent's internals
- Agent incorporates advice directly into the function it is learning
- Additional feedback (rewards, I/O pairs, inferred labels, more advice) is used to continually refine the learner
6 Standard Machine Learning vs. Theory Refinement
- Positive examples (should see doctor)
  - temp = 102.1, age = 21, sex = F, ...
  - temp = 101.7, age = 37, sex = M, ...
- Negative examples (take two aspirins)
  - temp = 99.1, age = 43, sex = M, ...
  - temp = 99.6, age = 24, sex = F, ...
- Approximate domain knowledge
  - if temp is high and age is young, then negative example
- Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.
7 Rich Maclin's PhD (1995)
Example advice:
    IF    a Bee is (Near and West) AND
          an Ice is (Near and North)
    THEN
      BEGIN
        Move East
        Move North
      END
8 Sample Results
[Figure: learning curves with advice vs. without advice]
9 Our Motto
Give advice, rather than commands, to your computer.
10 Outline
- Prior Knowledge and Support Vector Machines
- Intro to SVMs
- Linear Separation
- Non-Linear Separation
- Function Fitting (Regression)
- Advice-Taking Reinforcement Learning
- Transfer Learning via Advice Taking
11 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: points of classes A+ and A- separated by two bounding planes; the support vectors lie on the planes, and the margin is the distance between the planes]
12 Linear Algebra for SVMs
- Given p points in n-dimensional space
- Represent them by a p-by-n matrix A of reals
- Separate the two classes by two bounding planes, x'w = γ + 1 and x'w = γ - 1
  (a tiny numerical illustration follows)
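A tiny numerical illustration of this representation; the data, labels, and the plane parameters w and gamma below are made up:

import numpy as np

# p = 4 points in n = 2 dimensions, stored as a p-by-n matrix A, with +1/-1 labels
A = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
labels = np.array([1, 1, -1, -1])

# A separating direction w and threshold gamma (chosen by hand here)
w, gamma = np.array([1.0, 1.0]), 0.0

# Bounding planes x'w = gamma + 1 (class A+) and x'w = gamma - 1 (class A-):
print(A @ w >= gamma + 1)   # True for the A+ rows
print(A @ w <= gamma - 1)   # True for the A- rows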
13 Slack Variables: Dealing with Data that is not Linearly Separable
[Figure: classes A+ and A- with overlapping points; a slack variable y measures how far a point falls on the wrong side of its bounding plane]
14 Support Vector Machines: Quadratic Programming Formulation
- Solve this quadratic program:
      min   (1/2)||w||^2 + C e'y
      s.t.  D(Aw - e gamma) + y >= e,   y >= 0
  (e is a vector of ones; D is the diagonal matrix of +1/-1 labels)
- The second term minimizes the sum of slack variables y, with weight C
- The first term maximizes the margin by minimizing ||w||^2
15 Support Vector Machines: Linear Programming Formulation
- Use the 1-norm ||w||_1 instead of the 2-norm
- Typically runs faster, gives better feature selection, and might generalize better (NIPS '03)
  (a minimal code sketch follows)
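A minimal Python sketch of this 1-norm SVM as a linear program, solved with scipy.optimize.linprog. The names A (p-by-n data matrix) and d (+1/-1 label vector) follow the slides; everything else is illustrative, not the authors' code.

import numpy as np
from scipy.optimize import linprog

def linear_svm_1norm(A, d, C=1.0):
    """min ||w||_1 + C*sum(y)  s.t.  D(Aw - e*gamma) + y >= e,  y >= 0."""
    p, n = A.shape
    D = np.diag(d)
    # Variables: [w_plus (n), w_minus (n), gamma (1), slacks y (p)], with w = w_plus - w_minus
    c = np.concatenate([np.ones(2 * n), [0.0], C * np.ones(p)])
    # Rewrite the margin constraint as:  -D A w_plus + D A w_minus + d*gamma - y <= -1
    A_ub = np.hstack([-D @ A, D @ A, d.reshape(-1, 1), -np.eye(p)])
    b_ub = -np.ones(p)
    bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:n] - res.x[n:2 * n]
    gamma = res.x[2 * n]
    return w, gamma   # classify a new x by sign(x @ w - gamma)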
16 Knowledge-Based SVMs: Generalizing an Example from a POINT to a REGION
[Figure: a polyhedral knowledge region, rather than a single point, labeled as belonging to class A+]
17 Incorporating Knowledge Sets Into the SVM Linear Program
- Suppose the knowledge set {x : Bx ≤ d} belongs to class A+
- Hence it must lie in the half-space {x : x'w ≥ γ + 1}
- This implication is equivalent to a set of linear constraints (proof in the NIPS '02 paper; see the sketch below)
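A sketch of that equivalence, following the NIPS '02 result and assuming the knowledge set {x : Bx ≤ d} is nonempty. By linear-programming duality,

    \min\{\, x^{\top}w : Bx \le d \,\} \;=\; \max\{\, -d^{\top}u : B^{\top}u + w = 0,\; u \ge 0 \,\},

so the implication Bx ≤ d ⇒ x'w ≥ γ + 1 holds exactly when there exists a u with

    u \ge 0, \qquad B^{\top}u + w = 0, \qquad d^{\top}u + \gamma + 1 \le 0,

and these linear constraints (with u as extra variables) are what get added to the SVM linear program.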
18 Resulting LP for KBSVMs
[LP from the previous slides, with one set of knowledge constraints per advice region; the new index ranges over regions]
19 KBSVM with Slack Variables
[Same LP, but the knowledge constraints, whose right-hand sides were previously 0 and had to hold exactly, are now slacked, with the slacks penalized in the objective]
20 SVMs and Non-Linear Separating Surfaces
- Non-linearly map the data to a new space
- Linearly separate in the new space (using kernels)
- The result is a non-linear separator in the original space
- Fung et al. (2003) present knowledge-based non-linear SVMs
21 Support Vector Regression (aka Kernel Regression)
Linearly approximate a function, given an array A of inputs and a vector y of (numeric) outputs:
      f(x) = x'w + b
Find weights such that
      Aw + be ≈ y
In dual space w = A'α, so we get
      AA'α + be ≈ y
Kernelizing (to get a non-linear approximation):
      K(A, A')α + be ≈ y
[Figure: a curve fit to (x, y) data points]
22 What to Optimize?
Linear program to optimize:
- 1st term (the 1-norm of the model weights) is a regularizer that minimizes model complexity
- 2nd term is the approximation error, weighted by parameter C
- Becomes a classical least-squares fit if the quadratic version is used and the first term is ignored
23 Predicting Y for New X
      y = K(x, A')α + b
- Use the kernel to compute a distance to each training point (i.e., row in A)
- Weight by α_i (hopefully many of the α_i are zero), and sum
- Add b (a scalar)
  (a minimal code sketch follows)
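A minimal Python sketch of this fit-and-predict cycle, using a Gaussian kernel and a regularized least-squares solve (the papers above use a 1-norm linear program instead); all names here are illustrative.

import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """K[i, j] = exp(-||X_i - Z_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def fit_kernel_regression(A, y, lam=0.1, sigma=1.0):
    """Solve for alpha, b so that K(A, A') alpha + b e ~= y."""
    p = A.shape[0]
    M = np.hstack([gaussian_kernel(A, A, sigma), np.ones((p, 1))])
    reg = lam * np.eye(p + 1)
    reg[-1, -1] = 0.0                       # do not penalize the offset b
    z = np.linalg.solve(M.T @ M + reg, M.T @ y)
    return z[:p], z[p]                      # alpha, b

def predict(A, alpha, b, X_new, sigma=1.0):
    """y = K(x, A') alpha + b for each new x (a row of X_new)."""
    return gaussian_kernel(X_new, A, sigma) @ alpha + b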
24 Knowledge-Based SVR (Mangasarian, Shavlik, Wild, JMLR '04)
- Add soft constraints to the linear program (so the learner need only follow the advice approximately)
[Figure: a fitted curve and a shaded advice region; advice: "In this region, y should exceed 4"]
      minimize   ||w||_1 + C ||s||_1 + penalty for violating advice
      such that  y - s ≤ Aw + be ≤ y + s      (slacked match to the data)
                 plus a slacked match to the advice
(a rough code sketch follows)
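A rough Python sketch of the idea. This is a simplification of the paper's LP: here the advice "y should exceed 4 when 0 ≤ x ≤ 1" is enforced as a soft squared-hinge penalty at sampled points of the region rather than as slacked linear constraints, and the region, threshold, and weights are made up for illustration.

import numpy as np
from scipy.optimize import minimize

def fit_with_advice(A, y, X_adv, y_min=4.0, mu=10.0):
    """Least-squares fit of y ~= A w + b, plus a soft penalty whenever the
    model's prediction at an advice point falls below y_min."""
    p, n = A.shape

    def objective(z):
        w, b = z[:n], z[n]
        fit_err = np.sum((A @ w + b - y) ** 2)
        shortfall = np.maximum(0.0, y_min - (X_adv @ w + b))   # advice violations
        return fit_err + mu * np.sum(shortfall ** 2)

    res = minimize(objective, np.zeros(n + 1), method="BFGS")
    return res.x[:n], res.x[n]

# Example: 1-D data, plus advice that predictions on [0, 1] should exceed 4
A = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2.0 * A[:, 0] + 1.0
X_adv = np.linspace(0.0, 1.0, 5).reshape(-1, 1)   # sampled points of the advice region
w, b = fit_with_advice(A, y, X_adv)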
25 Testbeds: Subtasks of RoboCup
26 Reinforcement Learning Overview
- Receive a state, described by a set of features
- Take an action
- Receive a reward
- Use the rewards to estimate the Q-values of actions in states
- Policy: choose the action with the highest Q-value in the current state
(a minimal sketch of this loop follows)
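A minimal tabular Q-learning sketch of this loop (generic RL rather than the kernel-regression Q-functions used in this talk; the env interface, with reset(), step(), and an actions list, is assumed for illustration).

import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """env.reset() -> state; env.step(a) -> (next_state, reward, done)."""
    Q = defaultdict(float)                      # Q[(state, action)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Policy: mostly take the highest-Q action, sometimes explore
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Use the reward to update the estimated Q-value of (state, action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q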
27 Incorporating Advice in KBKR
- Advice format:
      Bx ≤ d   ⟹   f(x) ≥ h'x + β
- Example:
      If distanceToGoal ≤ 10 and shotAngle ≥ 30
      Then Q(shoot) ≥ 0.9
(an encoding of this example follows)
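For instance, with the feature vector x = (distanceToGoal, shotAngle), that example rule can be written in the Bx ≤ d ⟹ f(x) ≥ h'x + β format roughly as below (a hand-constructed illustration, not the system's actual encoding).

import numpy as np

# x = (distanceToGoal, shotAngle); rewrite "shotAngle >= 30" as "-shotAngle <= -30"
B = np.array([[ 1.0,  0.0],      # distanceToGoal <= 10
              [ 0.0, -1.0]])     # -shotAngle     <= -30
d = np.array([10.0, -30.0])
h = np.zeros(2)                  # the conclusion's right-hand side is a constant,
beta = 0.9                       # so:  Q_shoot(x) >= 0*x + 0.9  whenever Bx <= d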
28 Giving Advice About Relative Values of Multiple Functions (Maclin et al., AAAI '05)
      When the input satisfies preconditions(input)
      Then f1(input) > f2(input)
29 Sample Advice-Taking Results
Advice:
      if distanceToGoal ≤ 10 and shotAngle ≥ 30
      then prefer shoot over all other actions
      (Q(shoot) > Q(pass), Q(shoot) > Q(move))
[Figure: learning curves on 2-vs-1 BreakAway (rewards 1, -1) for advice vs. standard RL]
30 Transfer Learning
- Agent learns Task A (the source task)
- Agent encounters related Task B (the target task)
- Agent discovers how the tasks are related (we use a user-provided mapping to tell the agent this)
- Agent uses knowledge from Task A to learn Task B faster
31 Transfer Learning: The Goal for the Target Task
[Figure: performance vs. training curves with and without transfer, showing a better start, a faster rise, and a better asymptote with transfer]
32 Our Transfer Algorithm
- Observe source-task games
- Use ILP to learn skills from those games
- Translate the learned skills into transfer advice for the target task
- If there is user advice, add it in
- Learn the target task with KBKR
33 Learning Skills By Observation
- Source-task games are sequences of (state, action) pairs
- Learning skills is like learning to classify states by their correct actions (see the sketch after this list)
- ILP = Inductive Logic Programming
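A small Python sketch of how such classification examples might be assembled from observed games before they are handed to ILP; the trace format and names are made up for illustration.

def skill_examples(games, skill_action):
    """Split observed states into positive/negative examples for one skill.

    games: list of games, each a list of (state, action) pairs, where a state
    is a dict of features (e.g. {"distBetween(me,t1)": 12.3, ...}).
    """
    positives, negatives = [], []
    for game in games:
        for state, action in game:
            if action == skill_action:
                positives.append(state)   # states where the skill was used
            else:
                negatives.append(state)   # states where some other action was chosen
    return positives, negatives

# Example: gather training data for the pass skill
# pos, neg = skill_examples(observed_games, skill_action="pass")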
34 ILP: Searching for First-Order Rules
We also use a random-sampling approach to search for rules
35 Advantages of ILP
- Can produce first-order rules for skills
  - Capture only the essential aspects of the skill
  - We expect these aspects to transfer better
- Can incorporate background knowledge
First-order vs. propositional representation:
      pass(Teammate)   vs.   pass(teammate1), ..., pass(teammateN)
36 Example of a Skill Learned by ILP from KeepAway
      pass(Teammate) :-
          distBetween(me, Teammate) > 14,
          passAngle(Teammate) > 30,
          passAngle(Teammate) < 150,
          distBetween(me, Opponent) < 7.
We also gave human advice about shooting, since that is a new skill in BreakAway
37 TL Level 7: KA to BA, Raw Curves
38 TL Level 7: KA to BA, Averaged Curves
39 TL Level 7 Statistics  (KA = KeepAway, MD = MoveDownfield, BA = BreakAway)

                                                    Average Reward
Type  TL Metric                             KA to BA            MD to BA
                                           Score   P Value    Score   P Value
I     Jump start                            0.05   0.0312      0.08   0.0086
I     Jump start, smoothed                  0.08   0.0002      0.06   0.0014
II    Transfer ratio                        1.82   0.0034      1.86   0.0004
II    Transfer ratio (truncated)            1.82   0.0032      1.86   0.0004
II    Average relative reduction (narrow)   0.58   0.0042      0.54   0.0004
II    Average relative reduction (wide)     0.70   0.0018      0.71   0.0008
II    Ratio (of area under the curves)      1.37   0.0056      1.41   0.0012
II    Transfer difference                 503.57   0.0046    561.27   0.0008
II    Transfer difference (scaled)       1017.00   0.0040   1091.20   0.0016
III   Asymptotic advantage                  0.09   0.0086      0.11   0.0040
III   Asymptotic advantage, smoothed        0.08   0.0116      0.10   0.0030

Boldface indicates a significant difference was found
40 Conclusion
- Can use much more than I/O pairs in ML
- Give advice to computers; they automatically refine it based on feedback from the user or the environment
- Advice is an appealing mechanism for transferring learned knowledge computer-to-computer
41 Some Papers (on-line, use Google :-)
- Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning, 1996
- Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian & Shavlik, NIPS 2002
- Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian & Shavlik, COLT 2003
- Knowledge-Based Kernel Approximation, Mangasarian, Shavlik & Wild, JMLR 2004
- Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker & Wild, AAAI 2005
- Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker & Maclin, ECML 2006
43 Breakdown of Results
44 What if User Advice is Bad?
45 Related Work on Transfer
- Q-function transfer in RoboCup
  - Taylor & Stone (AAMAS 2005, AAAI 2005)
- Transfer via policy reuse
  - Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
  - Madden & Howley (AI Review 2004)
- Torrey et al. (ECML 2005)
- Transfer via relational RL
  - Driessens et al. (ICML workshop 2006)