Title: Machine Learning via Advice Taking
1 Machine Learning via Advice Taking
2 Thanks To ...
Rich Maclin, Lisa Torrey, Trevor Walker, Prof. Olvi Mangasarian, Glenn Fung, Ted Wild, DARPA
3 Quote (2002) from DARPA
- Sometimes an assistant will merely watch you and draw conclusions.
- Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.'
- It's a combination of learning by example and by being guided.
4 Widening the Communication Pipeline between Humans and Machine Learners
[Figure: a teacher instructing a pupil, analogous to a human instructing a machine learner]
5 Our Approach to Building Better Machine Learners
- Human partner expresses advice naturally and without knowledge of the ML agent's internals
- Agent incorporates advice directly into the function it is learning
- Additional feedback (rewards, I/O pairs, inferred labels, more advice) is used to continually refine the learner
6 Standard Machine Learning vs. Theory Refinement
- Positive examples (should see doctor)
  - temp = 102.1, age = 21, sex = F, ...
  - temp = 101.7, age = 37, sex = M, ...
- Negative examples (take two aspirins)
  - temp = 99.1, age = 43, sex = M, ...
  - temp = 99.6, age = 24, sex = F, ...
- Approximate domain knowledge
  - if temp is high and age is young, then negative example
- Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.
7 Rich Maclin's PhD (1995)
Example advice:
    IF    a Bee is (Near and West) AND
          an Ice is (Near and North)
    THEN
      BEGIN
        Move East
        Move North
      END
8 Sample Results
[Figure: learning curves with advice vs. without advice]
9 Our Motto
Give advice, rather than commands, to your computer.
10 Outline
- Prior Knowledge and Support Vector Machines
- Intro to SVMs
- Linear Separation
- Non-Linear Separation
- Function Fitting (Regression)
- Advice-Taking Reinforcement Learning
- Transfer Learning via Advice Taking
11 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: points of classes A+ and A- separated by two bounding planes; the support vectors lie on the planes, and the margin is the distance between the planes]
12 Linear Algebra for SVMs
- Given p points in n-dimensional space
- Represent them by a p-by-n matrix A of reals
- Separate the two classes by two bounding planes, x'w = γ + 1 and x'w = γ - 1
  (a tiny numerical illustration follows)
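A tiny numerical illustration of this representation; the data, labels, and the plane parameters w and gamma below are made up:

import numpy as np

# p = 4 points in n = 2 dimensions, stored as a p-by-n matrix A, with +1/-1 labels
A = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
labels = np.array([1, 1, -1, -1])

# A separating direction w and threshold gamma (chosen by hand here)
w, gamma = np.array([1.0, 1.0]), 0.0

# Bounding planes x'w = gamma + 1 (class A+) and x'w = gamma - 1 (class A-):
print(A @ w >= gamma + 1)   # True for the A+ rows
print(A @ w <= gamma - 1)   # True for the A- rows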
13 Slack Variables: Dealing with Data that is not Linearly Separable
[Figure: classes A+ and A- with overlapping points; a slack variable y measures how far a point falls on the wrong side of its bounding plane]
14 Support Vector Machines: Quadratic Programming Formulation
- Solve this quadratic program:
      min   (1/2)||w||^2 + C e'y
      s.t.  D(Aw - e gamma) + y >= e,   y >= 0
  (e is a vector of ones; D is the diagonal matrix of +1/-1 labels)
- The second term minimizes the sum of slack variables y, with weight C
- The first term maximizes the margin by minimizing ||w||^2
15 Support Vector Machines: Linear Programming Formulation
- Use the 1-norm ||w||_1 instead of the 2-norm
- Typically runs faster, gives better feature selection, and might generalize better (NIPS '03)
  (a minimal code sketch follows)
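A minimal Python sketch of this 1-norm SVM as a linear program, solved with scipy.optimize.linprog. The names A (p-by-n data matrix) and d (+1/-1 label vector) follow the slides; everything else is illustrative, not the authors' code.

import numpy as np
from scipy.optimize import linprog

def linear_svm_1norm(A, d, C=1.0):
    """min ||w||_1 + C*sum(y)  s.t.  D(Aw - e*gamma) + y >= e,  y >= 0."""
    p, n = A.shape
    D = np.diag(d)
    # Variables: [w_plus (n), w_minus (n), gamma (1), slacks y (p)], with w = w_plus - w_minus
    c = np.concatenate([np.ones(2 * n), [0.0], C * np.ones(p)])
    # Rewrite the margin constraint as:  -D A w_plus + D A w_minus + d*gamma - y <= -1
    A_ub = np.hstack([-D @ A, D @ A, d.reshape(-1, 1), -np.eye(p)])
    b_ub = -np.ones(p)
    bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:n] - res.x[n:2 * n]
    gamma = res.x[2 * n]
    return w, gamma   # classify a new x by sign(x @ w - gamma)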
16 Knowledge-Based SVMs: Generalizing an Example from a POINT to a REGION
[Figure: a polyhedral knowledge region, rather than a single point, labeled as belonging to class A+]
17 Incorporating Knowledge Sets Into the SVM Linear Program
- Suppose the knowledge set {x : Bx ≤ d} belongs to class A+
- Hence it must lie in the half-space {x : x'w ≥ γ + 1}
- This implication is equivalent to a set of linear constraints (proof in the NIPS '02 paper; see the sketch below)
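A sketch of that equivalence, following the NIPS '02 result and assuming the knowledge set {x : Bx ≤ d} is nonempty. By linear-programming duality,

    \min\{\, x^{\top}w : Bx \le d \,\} \;=\; \max\{\, -d^{\top}u : B^{\top}u + w = 0,\; u \ge 0 \,\},

so the implication Bx ≤ d ⇒ x'w ≥ γ + 1 holds exactly when there exists a u with

    u \ge 0, \qquad B^{\top}u + w = 0, \qquad d^{\top}u + \gamma + 1 \le 0,

and these linear constraints (with u as extra variables) are what get added to the SVM linear program.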
18 Resulting LP for KBSVMs
[LP from the previous slides, with one set of knowledge constraints per advice region; the new index ranges over regions]
19 KBSVM with Slack Variables
[Same LP, but the knowledge constraints, whose right-hand sides were previously 0 and had to hold exactly, are now slacked, with the slacks penalized in the objective]
20 SVMs and Non-Linear Separating Surfaces
- Non-linearly map the data to a new space
- Linearly separate in the new space (using kernels)
- The result is a non-linear separator in the original space
- Fung et al. (2003) present knowledge-based non-linear SVMs
21 Support Vector Regression (aka Kernel Regression)
Linearly approximate a function, given an array A of inputs and a vector y of (numeric) outputs:
      f(x) = x'w + b
Find weights such that
      Aw + be ≈ y
In dual space w = A'α, so we get
      AA'α + be ≈ y
Kernelizing (to get a non-linear approximation):
      K(A, A')α + be ≈ y
[Figure: a curve fit to (x, y) data points]
22 What to Optimize?
Linear program to optimize:
- 1st term (the 1-norm of the model weights) is a regularizer that minimizes model complexity
- 2nd term is the approximation error, weighted by parameter C
- Becomes a classical least-squares fit if the quadratic version is used and the first term is ignored
23 Predicting Y for New X
      y = K(x, A')α + b
- Use the kernel to compute a distance to each training point (i.e., row in A)
- Weight by α_i (hopefully many of the α_i are zero), and sum
- Add b (a scalar)
  (a minimal code sketch follows)
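A minimal Python sketch of this fit-and-predict cycle, using a Gaussian kernel and a regularized least-squares solve (the papers above use a 1-norm linear program instead); all names here are illustrative.

import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """K[i, j] = exp(-||X_i - Z_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def fit_kernel_regression(A, y, lam=0.1, sigma=1.0):
    """Solve for alpha, b so that K(A, A') alpha + b e ~= y."""
    p = A.shape[0]
    M = np.hstack([gaussian_kernel(A, A, sigma), np.ones((p, 1))])
    reg = lam * np.eye(p + 1)
    reg[-1, -1] = 0.0                       # do not penalize the offset b
    z = np.linalg.solve(M.T @ M + reg, M.T @ y)
    return z[:p], z[p]                      # alpha, b

def predict(A, alpha, b, X_new, sigma=1.0):
    """y = K(x, A') alpha + b for each new x (a row of X_new)."""
    return gaussian_kernel(X_new, A, sigma) @ alpha + b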
24 Knowledge-Based SVR (Mangasarian, Shavlik, Wild, JMLR '04)
- Add soft constraints to the linear program (so the learner need only follow the advice approximately)
[Figure: a fitted curve and a shaded advice region; advice: "In this region, y should exceed 4"]
      minimize   ||w||_1 + C ||s||_1 + penalty for violating advice
      such that  y - s ≤ Aw + be ≤ y + s      (slacked match to the data)
                 plus a slacked match to the advice
(a rough code sketch follows)
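A rough Python sketch of the idea. This is a simplification of the paper's LP: here the advice "y should exceed 4 when 0 ≤ x ≤ 1" is enforced as a soft squared-hinge penalty at sampled points of the region rather than as slacked linear constraints, and the region, threshold, and weights are made up for illustration.

import numpy as np
from scipy.optimize import minimize

def fit_with_advice(A, y, X_adv, y_min=4.0, mu=10.0):
    """Least-squares fit of y ~= A w + b, plus a soft penalty whenever the
    model's prediction at an advice point falls below y_min."""
    p, n = A.shape

    def objective(z):
        w, b = z[:n], z[n]
        fit_err = np.sum((A @ w + b - y) ** 2)
        shortfall = np.maximum(0.0, y_min - (X_adv @ w + b))   # advice violations
        return fit_err + mu * np.sum(shortfall ** 2)

    res = minimize(objective, np.zeros(n + 1), method="BFGS")
    return res.x[:n], res.x[n]

# Example: 1-D data, plus advice that predictions on [0, 1] should exceed 4
A = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2.0 * A[:, 0] + 1.0
X_adv = np.linspace(0.0, 1.0, 5).reshape(-1, 1)   # sampled points of the advice region
w, b = fit_with_advice(A, y, X_adv)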
25 Testbeds: Subtasks of RoboCup
26 Reinforcement Learning Overview
- Receive a state, described by a set of features
- Take an action
- Receive a reward
- Use the rewards to estimate the Q-values of actions in states
- Policy: choose the action with the highest Q-value in the current state
(a minimal sketch of this loop follows)
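A minimal tabular Q-learning sketch of this loop (generic RL rather than the kernel-regression Q-functions used in this talk; the env interface, with reset(), step(), and an actions list, is assumed for illustration).

import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """env.reset() -> state; env.step(a) -> (next_state, reward, done)."""
    Q = defaultdict(float)                      # Q[(state, action)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Policy: mostly take the highest-Q action, sometimes explore
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Use the reward to update the estimated Q-value of (state, action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q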
27 Incorporating Advice in KBKR
- Advice format:
      Bx ≤ d   ⟹   f(x) ≥ h'x + β
- Example:
      If distanceToGoal ≤ 10 and shotAngle ≥ 30
      Then Q(shoot) ≥ 0.9
(an encoding of this example follows)
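For instance, with the feature vector x = (distanceToGoal, shotAngle), that example rule can be written in the Bx ≤ d ⟹ f(x) ≥ h'x + β format roughly as below (a hand-constructed illustration, not the system's actual encoding).

import numpy as np

# x = (distanceToGoal, shotAngle); rewrite "shotAngle >= 30" as "-shotAngle <= -30"
B = np.array([[ 1.0,  0.0],      # distanceToGoal <= 10
              [ 0.0, -1.0]])     # -shotAngle     <= -30
d = np.array([10.0, -30.0])
h = np.zeros(2)                  # the conclusion's right-hand side is a constant,
beta = 0.9                       # so:  Q_shoot(x) >= 0*x + 0.9  whenever Bx <= d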
28 Giving Advice About Relative Values of Multiple Functions (Maclin et al., AAAI '05)
      When the input satisfies preconditions(input)
      Then f1(input) > f2(input)
29 Sample Advice-Taking Results
Advice:
      if distanceToGoal ≤ 10 and shotAngle ≥ 30
      then prefer shoot over all other actions
      (Q(shoot) > Q(pass), Q(shoot) > Q(move))
[Figure: learning curves on 2-vs-1 BreakAway (rewards 1, -1) for advice vs. standard RL]
30 Transfer Learning
- Agent learns Task A (the source task)
- Agent encounters related Task B (the target task)
- Agent discovers how the tasks are related (we use a user-provided mapping to tell the agent this)
- Agent uses knowledge from Task A to learn Task B faster
31 Transfer Learning: The Goal for the Target Task
[Figure: performance vs. training curves with and without transfer, showing a better start, a faster rise, and a better asymptote with transfer]
32 Our Transfer Algorithm
- Observe source-task games
- Use ILP to learn skills from those games
- Translate the learned skills into transfer advice for the target task
- If there is user advice, add it in
- Learn the target task with KBKR
33 Learning Skills By Observation
- Source-task games are sequences of (state, action) pairs
- Learning skills is like learning to classify states by their correct actions (see the sketch after this list)
- ILP = Inductive Logic Programming
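A small Python sketch of how such classification examples might be assembled from observed games before they are handed to ILP; the trace format and names are made up for illustration.

def skill_examples(games, skill_action):
    """Split observed states into positive/negative examples for one skill.

    games: list of games, each a list of (state, action) pairs, where a state
    is a dict of features (e.g. {"distBetween(me,t1)": 12.3, ...}).
    """
    positives, negatives = [], []
    for game in games:
        for state, action in game:
            if action == skill_action:
                positives.append(state)   # states where the skill was used
            else:
                negatives.append(state)   # states where some other action was chosen
    return positives, negatives

# Example: gather training data for the pass skill
# pos, neg = skill_examples(observed_games, skill_action="pass")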
34 ILP: Searching for First-Order Rules
We also use a random-sampling approach to search for rules
35 Advantages of ILP
- Can produce first-order rules for skills
  - Capture only the essential aspects of the skill
  - We expect these aspects to transfer better
- Can incorporate background knowledge
First-order vs. propositional representation:
      pass(Teammate)   vs.   pass(teammate1), ..., pass(teammateN)
36 Example of a Skill Learned by ILP from KeepAway
      pass(Teammate) :-
          distBetween(me, Teammate) > 14,
          passAngle(Teammate) > 30,
          passAngle(Teammate) < 150,
          distBetween(me, Opponent) < 7.
We also gave human advice about shooting, since that is a new skill in BreakAway
37 TL Level 7: KA to BA, Raw Curves
38 TL Level 7: KA to BA, Averaged Curves
39 TL Level 7 Statistics  (KA = KeepAway, MD = MoveDownfield, BA = BreakAway)

                                                    Average Reward
Type  TL Metric                             KA to BA            MD to BA
                                           Score   P Value    Score   P Value
I     Jump start                            0.05   0.0312      0.08   0.0086
I     Jump start, smoothed                  0.08   0.0002      0.06   0.0014
II    Transfer ratio                        1.82   0.0034      1.86   0.0004
II    Transfer ratio (truncated)            1.82   0.0032      1.86   0.0004
II    Average relative reduction (narrow)   0.58   0.0042      0.54   0.0004
II    Average relative reduction (wide)     0.70   0.0018      0.71   0.0008
II    Ratio (of area under the curves)      1.37   0.0056      1.41   0.0012
II    Transfer difference                 503.57   0.0046    561.27   0.0008
II    Transfer difference (scaled)       1017.00   0.0040   1091.20   0.0016
III   Asymptotic advantage                  0.09   0.0086      0.11   0.0040
III   Asymptotic advantage, smoothed        0.08   0.0116      0.10   0.0030

Boldface indicates a significant difference was found
40 Conclusion
- Can use much more than I/O pairs in ML
- Give advice to computers; they automatically refine it based on feedback from the user or the environment
- Advice is an appealing mechanism for transferring learned knowledge computer-to-computer
41 Some Papers (on-line, use Google :-)
- Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning, 1996
- Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian & Shavlik, NIPS 2002
- Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian & Shavlik, COLT 2003
- Knowledge-Based Kernel Approximation, Mangasarian, Shavlik & Wild, JMLR 2004
- Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker & Wild, AAAI 2005
- Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker & Maclin, ECML 2006
43 Breakdown of Results
44 What if User Advice is Bad?
45 Related Work on Transfer
- Q-function transfer in RoboCup
  - Taylor & Stone (AAMAS 2005, AAAI 2005)
- Transfer via policy reuse
  - Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
  - Madden & Howley (AI Review 2004)
- Torrey et al. (ECML 2005)
- Transfer via relational RL
  - Driessens et al. (ICML workshop 2006)