A Survey on Distance Metric Learning (Part 2)

Learn more at: http://www1.cs.columbia.edu

1
A Survey on Distance Metric Learning (Part 2)

Gerry Tesauro
IBM T.J. Watson Research Center

2
Acknowledgement

Lecture material shamelessly adapted from the following sources:
Kilian Weinberger
Survey on Distance Metric Learning slides
IBM summer intern talk slides (Aug. 2006)
Sam Roweis slides (NIPS 2006 workshop on Learning to Compare Examples)
Yann LeCun talk slides (CVPR 2005, 2006)

3
Outline: Part 2

Neighbourhood Components Analysis (Goldberger et al.)
Metric Learning by Collapsing Classes (Globerson & Roweis)
Metric Learning for Kernel Regression (Weinberger & Tesauro)
Metric learning for RL basis function construction (Keller et al.)
Similarity learning for image processing (LeCun et al.)

4
Neighborhood Components Analysis
Distance metric for visualization and kNN
(Goldberger et al. 2004)
5
Metric Learning for Kernel Regression
Weinberger & Tesauro, AISTATS 2007
6
Killing three birds with one stone
We construct a method for linear dimensionality reduction
that generates a meaningful distance metric
optimally tuned for distance-based kernel regression.
7
Kernel Regression

Given a training set (x_j, y_j), j = 1, ..., N, where x is a d-dim vector and y is real-valued, estimate the value of a test point x_i by a weighted avg. of the samples:
$\hat{y}_i = \frac{\sum_{j \neq i} y_j k_{ij}}{\sum_{j \neq i} k_{ij}}$
where k_ij = k_D(x_i, x_j) is a distance-based kernel function using distance metric D.
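
The weighted average above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the unit-width Gaussian kernel exp(-D), and the toy data are all assumptions made for the example.

```python
import numpy as np

def sq_euclidean(a, b):
    """Squared Euclidean distance, standing in for the metric D (assumption)."""
    diff = a - b
    return float(diff @ diff)

def kernel_regression(X_train, y_train, X_test, sq_dist=sq_euclidean):
    """Estimate each test value as a kernel-weighted average of training targets."""
    preds = np.empty(len(X_test))
    for i, x in enumerate(X_test):
        d = np.array([sq_dist(x, xj) for xj in X_train])
        # Gaussian kernel with unit width (assumption). Subtracting d.min()
        # rescales every weight by the same factor, which cancels in the ratio.
        k = np.exp(-(d - d.min()))
        preds[i] = k @ y_train / k.sum()
    return preds

# Toy data: targets equal to the single input coordinate.
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 2.0])
pred = kernel_regression(X_train, y_train, np.array([[1.0]]))
```

At the query point x = 1 the two outer samples get equal weight, so the estimate is the symmetric average of the targets.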

8
Choice of Kernel

Many functional forms for k_ij can be used in MLKR; our empirical work uses the Gaussian kernel,
where σ is a kernel width parameter (can set σ = 1 w.l.o.g. since we learn D).
This gives a softmax regression estimate, similar to the Roweis softmax classifier.
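
A short worked step behind the w.l.o.g. remark. The kernel formula itself is not part of this transcript, so the form below is an assumption: a Gaussian kernel on a squared distance of the form ||A(x_i - x_j)||^2, with A the learned linear map and any normalizing constant dropped because it cancels in the weighted average.

```latex
\begin{aligned}
k_{ij} &= \exp\!\left(-\frac{\lVert A(x_i - x_j)\rVert^2}{\sigma^2}\right)
        = \exp\!\left(-\lVert A'(x_i - x_j)\rVert^2\right),
        \qquad A' = \frac{A}{\sigma} \\
\hat{y}_i &= \sum_{j \neq i} p_{ij}\, y_j,
        \qquad p_{ij} = \frac{\exp\!\left(-\lVert A'(x_i - x_j)\rVert^2\right)}
                             {\sum_{l \neq i} \exp\!\left(-\lVert A'(x_i - x_l)\rVert^2\right)}
\end{aligned}
```

Under that assumption, the width only rescales A, so learning A already covers every choice of σ, and the weights p_ij are a softmax over negative distances.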

9
Distance Metric for Nearest Neighbor Regression
Learn a linear transformation that allows us to estimate the value of a test point from its nearest neighbors
10
Mahalanobis Metric
Distance function is a pseudo Mahalanobis metric
(Generalizes Euclidean distance)
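
A minimal sketch of such a distance, assuming the usual parameterization M = A^T A with a linear map A (the names `mahalanobis_sq`, `A`, and `M` are chosen for illustration). With A equal to the identity it reduces to the squared Euclidean distance; with a rank-deficient A, distinct points can sit at distance zero, which is one reading of "pseudo".

```python
import numpy as np

def mahalanobis_sq(xi, xj, A):
    """Squared pseudo-Mahalanobis distance ||A (xi - xj)||^2."""
    diff = A @ (xi - xj)
    return float(diff @ diff)

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=3), rng.normal(size=3)

A = rng.normal(size=(2, 3))      # rectangular map: 3-dim input, 2-dim output
M = A.T @ A                      # positive semidefinite, rank at most 2

d_A = mahalanobis_sq(xi, xj, A)                      # via the linear map
d_M = float((xi - xj) @ M @ (xi - xj))               # via the matrix M
d_euclid = mahalanobis_sq(xi, xj, np.eye(3))         # A = I gives Euclidean
```

Both routes give the same number because (xi - xj)^T A^T A (xi - xj) is exactly the squared norm of A (xi - xj).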
11
General Metric Learning Objective

Find a parameterized distance function D_θ that minimizes the total leave-one-out cross-validation loss function,
e.g. params θ = elements A_ij of the A matrix.
Since we're solving for A, not M, the optimization is non-convex, so use gradient descent.
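
A sketch of the leave-one-out objective as a function of A. The squared-error loss, the unit-width Gaussian kernel, and the names `loo_predictions` and `loo_loss` are assumptions for illustration; the slide's own loss formula is not in the transcript.

```python
import numpy as np

def loo_predictions(A, X, y):
    """Leave-one-out kernel regression estimates under the map x -> A x."""
    Z = X @ A.T
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq)               # Gaussian kernel, unit width (assumption)
    np.fill_diagonal(K, 0.0)      # leave-one-out: a sample never votes for itself
    return K @ y / K.sum(axis=1)

def loo_loss(A, X, y):
    """Total squared leave-one-out error (assumed loss)."""
    return float(np.sum((y - loo_predictions(A, X, y)) ** 2))

# Toy data: dimension 0 carries the signal, dimension 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0].copy()

A_signal = np.array([[1.0, 0.0]])   # keeps only the informative dimension
A_noise = np.array([[0.0, 1.0]])    # keeps only the noise dimension
loss_signal = loo_loss(A_signal, X, y)
loss_noise = loo_loss(A_noise, X, y)
```

On this toy problem the map that keeps the informative dimension gives a much smaller loss than the one that keeps the noise dimension, which is the behavior the objective is meant to reward.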

12
Gradient Computation

The gradient is written in terms of the differences x_ij = x_i - x_j.
For fast implementation:
Don't sum over all i-j pairs; only go up to the 1000 nearest neighbors for each sample i.
Maintain nearest neighbors in a heap-tree structure; update the heap tree every 15 gradient steps.
Ignore sufficiently small values of k_ij (< e^-34).
Even better data structures: cover trees, k-d trees.
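
A sketch of the gradient computation, derived here for an assumed squared leave-one-out loss with a unit-width Gaussian kernel on ||A x_ij||^2; it may differ in form from the expression on the original slide. The function names and the `k_min` parameter are hypothetical. The dense sum over all pairs is kept for clarity, so the nearest-neighbor truncation and heap maintenance from the list above are not implemented; only the small-kernel cutoff is.

```python
import numpy as np

def mlkr_loss_and_grad(A, X, y, k_min=np.exp(-34.0)):
    """Squared leave-one-out loss and its gradient with respect to A."""
    Z = X @ A.T
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq)
    np.fill_diagonal(K, 0.0)          # leave-one-out
    K[K < k_min] = 0.0                # ignore sufficiently small kernel values
    denom = K.sum(axis=1)
    y_hat = K @ y / denom
    err = y_hat - y
    # W[i, j] = (y_hat_i - y_i) * (y_hat_i - y_j) * k_ij / sum_l k_il
    W = (err / denom)[:, None] * (y_hat[:, None] - y[None, :]) * K
    Xd = X[:, None, :] - X[None, :, :]                 # x_ij = x_i - x_j
    S = np.einsum('ij,ijk,ijl->kl', W, Xd, Xd)         # sum_ij W_ij x_ij x_ij^T
    return float(err @ err), 4.0 * A @ S

def numerical_grad(A, X, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        Ap, Am = A.copy(), A.copy()
        Ap[idx] += eps
        Am[idx] -= eps
        G[idx] = (mlkr_loss_and_grad(Ap, X, y)[0]
                  - mlkr_loss_and_grad(Am, X, y)[0]) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
A = 0.3 * rng.normal(size=(2, 3))
loss, grad = mlkr_loss_and_grad(A, X, y)
```

A gradient-descent step would then be `A -= lr * grad` for some small, hand-tuned step size `lr`; comparing `grad` against `numerical_grad(A, X, y)` is a cheap sanity check before running the optimizer.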

13
Learned Distance Metric example
orig. Euclidean D < 1
learned D < 1
14
Twin Peaks test
Training: n = 8000
we added 3 dimensions with 1000 noise
we rotated 5 dimensions randomly
15
Input Variance
Noise
Signal
16
Test data
17
Test data
18
Output Variance
Signal
Noise
19
DimReduction with MLKR

FG-NET face data: 82 persons, 984 face images w/ age

20
DimReduction with MLKR

FG-NET face data: 82 persons, 984 face images w/ age

21
DimReduction with MLKR
PowerManagement data (d = 21)

Force A to be rectangular
Project onto eigenvectors of A
Allows visualization of data
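
A sketch of the two routes to a low-dimensional view named above. Since a rectangular A has no eigenvectors of its own, "eigenvectors of A" is read here as the eigenvectors of M = A^T A; that reading, the function name `project_top_eigs`, and the random stand-in for a learned A are all assumptions.

```python
import numpy as np

def project_top_eigs(A, X, r=2):
    """Project X onto the top-r eigenvectors of M = A^T A, scaled by sqrt(eigenvalue)."""
    M = A.T @ A
    evals, evecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:r]         # indices of the r largest
    scale = np.sqrt(np.clip(evals[top], 0.0, None))
    return (X @ evecs[:, top]) * scale

def pairwise_sq(Z):
    """All pairwise squared Euclidean distances between rows of Z."""
    return np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 21))                  # d = 21 inputs, as on the slide
A_rect = rng.normal(size=(2, 21))             # stand-in for a learned 2 x 21 map

Z_direct = X @ A_rect.T                       # route 1: rectangular A gives 2-D points
Z_eig = project_top_eigs(A_rect, X, r=2)      # route 2: eigenvectors of A^T A
```

For a 2 x d map both routes produce 2-D coordinates with identical pairwise distances (they differ only by a rotation or reflection), so either can be plotted directly.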

22
Robot arm results (8, 32 dim)
regression error
23
Unity Data Center Prototype

Objective: Learn long-range resource value estimates for each application manager
State Variables (48):
Arrival rate
ResponseTime
QueueLength
iatVariance
rtVariance
Action: # of servers allocated by Arbiter
Reward: SLA(Resp. Time)

[Figure: system diagram. Labels: Maximize Total SLA Revenue; 5 sec; Demand (HTTP req/sec); SLA; Value(srvrs); Value(RT); WebSphere 5.1; DB2; Trade3; Batch; 8 xSeries servers]
(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)
24
Power Performance Management

Objective: Managing systems to multi-discipline objectives: minimize Resp. Time and minimize Power Usage
State Variables (21):
Power Cap
Power Usage
CPU Utilization
Temperature
# of requests arrived
Workload intensity (# Clients)
Response Time
Action: Power Cap
Reward: SLA(Resp. Time) Power Usage

(Kephart et al., ICAC 2007)
25
IBM Regression Results: TEST ERROR
MLKR
14/47
3/5
10/22
26
IBM Regression Results: TRAINING ERROR
MLKR
27
Metric Learning for RL basis function
construction (Keller et al. ICML 2006)

RL: Dataset of state-action-reward tuples (s_i, a_i, r_i), i = 1, ..., N

28
Value Iteration

Define an iterative bootstrap calculation
Each round of VI must iterate over all states in the state space
Try to speed this up using state aggregation (Bertsekas & Castanon, 1989)
Idea: Use NCA to aggregate states
project states into a lower-dim rep; keep states with similar Bellman error close together
use projected states to define a set of basis functions φ
learn a linear value function over basis functions: V = Σ_i θ_i φ_i
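
For reference, a sketch of plain tabular value iteration, the baseline whose per-round sweep over all states the aggregation idea tries to avoid. It assumes a known model (transition tensor `P`, reward table `R`) and a discount `gamma`, none of which appear in the transcript; the three-state chain is a made-up example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P: array of shape (n_actions, n_states, n_states), P[a, s, t] = Pr(t | s, a).
    R: array of shape (n_states, n_actions), expected immediate reward.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Bootstrap backup over every state and action:
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Three-state chain: action 0 stays put, action 1 moves right (state 2 absorbs).
P = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # stay
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # move right
])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])          # reward only in state 2
V = value_iteration(P, R, gamma=0.9)
```

Every iteration touches all states and actions, which is what makes the method expensive on large state spaces and motivates replacing the table with a small set of basis functions.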

29
Chopra et al. 2005
Similarity metric for image verification.
Problem: Given a pair of face images, decide if they are from the same person.
30
Chopra et al. 2005
Similarity metric for image verification.
Problem: Given a pair of face images, decide if they are from the same person.
Too difficult for linear mapping!