Title: A Survey on Distance Metric Learning (Part 2)
1A Survey on Distance Metric Learning (Part 2)
- Gerry Tesauro
- IBM T.J. Watson Research Center
2Acknowledgement
- Lecture material shamelessly adapted from the following sources:
- Kilian Weinberger
- "Survey on Distance Metric Learning" slides
- IBM summer intern talk slides (Aug. 2006)
- Sam Roweis slides (NIPS 2006 workshop on Learning to Compare Examples)
- Yann LeCun talk slides (CVPR 2005, 2006)
3Outline Part 2
- Neighbourhood Components Analysis (Goldberger et al.), Metric Learning by Collapsing Classes (Globerson & Roweis)
- Metric Learning for Kernel Regression (Weinberger & Tesauro)
- Metric learning for RL basis function construction (Keller et al.)
- Similarity learning for image processing (LeCun et al.)
4Neighborhood Component Analysis
Distance metric for visualization and kNN
(Goldberger et al., 2004)
5Metric Learning for Kernel Regression
Weinberger & Tesauro, AISTATS 2007
6Killing three birds with one stone
We construct a method for linear dimensionality
reduction
that generates a meaningful distance metric
optimally tuned for distance-based kernel
regression
7Kernel Regression
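- Given a training set $(x_j, y_j)$, $j = 1, \dots, N$, where $x_j$ is a $d$-dimensional vector and $y_j$ is real-valued, estimate the value of a test point $x_i$ by a weighted average of the samples:
$$\hat{y}_i = \frac{\sum_{j \neq i} y_j \, k_{ij}}{\sum_{j \neq i} k_{ij}}$$
- where $k_{ij} = k_D(x_i, x_j)$ is a distance-based kernel function using distance metric $D$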
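The weighted-average estimate can be sketched in a few lines of Python. This is only an illustration: the helper names `sq_dist` and `kernel_regress` are made up here, and the default kernel (a Gaussian on squared Euclidean distance) is an assumption rather than something fixed by this slide.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kernel_regress(x_query, xs, ys, kernel=lambda d2: math.exp(-d2)):
    """Kernel-weighted average of ys, each weighted by kernel(distance to x_query)."""
    weights = [kernel(sq_dist(x_query, x)) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

# Toy training set (hypothetical values).
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
ys = [1.0, 3.0, 5.0]
estimate = kernel_regress((0.0, 0.0), xs, ys)
```

Because the weights are non-negative and normalized, the estimate always lies between the smallest and largest training target.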
8Choice of Kernel
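- Many functional forms for $k_{ij}$ can be used in MLKR; our empirical work uses the Gaussian kernel
$$k_{ij} = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{D(x_i, x_j)}{\sigma^2}\right)$$
- where $\sigma$ is a kernel width parameter (can set $\sigma = 1$ W.L.O.G. since we learn $D$)
- The result is a softmax regression estimate, similar to Roweis' softmax classifier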
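To make the softmax connection concrete, the sketch below normalizes Gaussian kernel values and compares them with a softmax over negative squared distances. It assumes a kernel of the form exp(-D / sigma^2) times a constant; the function names and the sample distances are illustrative only.

```python
import math

def gaussian_kernel(d2, sigma=1.0):
    """Gaussian kernel evaluated on a squared distance d2 with width sigma."""
    return math.exp(-d2 / sigma ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

d2 = [0.5, 2.0, 4.5]                      # squared distances from one query to three samples
k = [gaussian_kernel(v) for v in d2]
weights = [v / sum(k) for v in k]         # normalized kernel weights used in the estimate
soft = softmax([-v for v in d2])          # softmax over negative squared distances
```

The normalization constant cancels in the ratio, so `weights` and `soft` agree. A width other than 1 only rescales the distances by 1/sigma^2, which a learned metric can absorb.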
9Distance Metric for Nearest Neighbor Regression
Learn a linear transformation that allows us to
estimate the value of a test point from its
nearest neighbors
10Mahalanobis Metric
Distance function is a pseudo-Mahalanobis metric
$$D(x_i, x_j) = (x_i - x_j)^\top M \, (x_i - x_j), \quad M = A^\top A$$
(generalizes Euclidean distance, which is the case $M = I$)
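A small NumPy sketch of this distance, assuming the usual parameterization M = AᵀA so that the distance is the squared Euclidean norm after mapping by A. The function name `mahalanobis_sq` and the example matrices are illustrative.

```python
import numpy as np

def mahalanobis_sq(xi, xj, A):
    """Squared distance (xi - xj)^T A^T A (xi - xj) = ||A (xi - xj)||^2."""
    diff = A @ (np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float))
    return float(diff @ diff)

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([0.0, 0.0, 1.0])

d_euclid = mahalanobis_sq(xi, xj, np.eye(3))   # A = I recovers squared Euclidean distance
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])                # rank-2 A ignores the third coordinate
d_proj = mahalanobis_sq(xi, xj, A)
d_zero = mahalanobis_sq([0.0, 0.0, 1.0], [0.0, 0.0, 5.0], A)
```

The last line shows why the metric is only a pseudo-metric: when M is rank-deficient, two distinct points can be at distance zero.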
11General Metric Learning Objective
- Find a parameterized distance function $D_\theta$ that minimizes the total leave-one-out cross-validation loss (in MLKR, the squared error $\mathcal{L} = \sum_i (y_i - \hat{y}_i)^2$)
- e.g. params $\theta$ = elements $A_{ij}$ of the matrix $A$
- Since we're solving for $A$, not $M$, the optimization is non-convex ⇒ use gradient descent
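The leave-one-out objective can be evaluated directly for a candidate A. The sketch below assumes a squared-error loss and a Gaussian kernel with unit width. The synthetic data (one informative input, one pure-noise input) and the name `loo_loss` are illustrative.

```python
import numpy as np

def loo_loss(A, X, y):
    """Sum of squared leave-one-out kernel-regression errors under the metric A^T A."""
    Z = X @ A.T                                           # map samples through A
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-d2)
    np.fill_diagonal(K, 0.0)                              # leave-one-out: exclude j == i
    y_hat = (K @ y) / K.sum(axis=1)
    return float(((y - y_hat) ** 2).sum())

rng = np.random.default_rng(0)
signal = rng.uniform(-2.0, 2.0, size=200)
noise = rng.normal(0.0, 10.0, size=200)
X = np.column_stack([signal, noise])      # one informative input, one noisy input
y = np.sin(signal)

loss_euclid = loo_loss(np.eye(2), X, y)                  # plain Euclidean metric
loss_signal_only = loo_loss(np.diag([1.0, 0.0]), X, y)   # metric that ignores the noise input
```

A metric that suppresses the noise dimension yields a lower leave-one-out loss than the Euclidean one on data like this, which is the kind of solution gradient descent on A is meant to find.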
12Gradient Computation
- where $x_{ij} = x_i - x_j$
- For fast implementation:
- Don't sum over all i-j pairs; only go up to the 1000 nearest neighbors of each sample i
- Maintain nearest neighbors in a heap-tree structure; update the heap tree every 15 gradient steps
- Ignore sufficiently small values of $k_{ij}$ ($< e^{-34}$)
- Even better data structures: cover trees, k-d trees
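The gradient formula itself did not survive in this transcript. The sketch below therefore uses a gradient derived here, not copied from the slide, assuming squared-error leave-one-out loss and the kernel k_ij = exp(-||A x_ij||^2). With normalized weights p_ij = k_ij / Σ_j k_ij, differentiating gives ∂L/∂A = 4A Σ_i (ŷ_i − y_i) Σ_j (ŷ_i − y_j) p_ij x_ij x_ijᵀ. The code checks this against finite differences and runs a few descent steps with a simple step-halving rule, which is an arbitrary choice. It is a dense O(N²) version and omits the nearest-neighbor truncation and heap tricks listed above.

```python
import numpy as np

def loss_and_grad(A, X, y):
    """LOO squared error and its gradient w.r.t. A for k_ij = exp(-||A x_ij||^2)."""
    Z = X @ A.T
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2)
    np.fill_diagonal(K, 0.0)
    P = K / K.sum(axis=1, keepdims=True)        # normalized weights, rows sum to 1
    y_hat = P @ y
    err = y_hat - y
    # W_ij = (y_hat_i - y_i) * (y_hat_i - y_j) * p_ij
    W = err[:, None] * (y_hat[:, None] - y[None, :]) * P
    # sum_ij W_ij x_ij x_ij^T, computed without forming every outer product
    Ws = W + W.T
    S = X.T @ (np.diag(Ws.sum(axis=1)) - Ws) @ X
    return float((err ** 2).sum()), 4.0 * A @ S

def numeric_grad(A, X, y, eps=1e-6):
    """Central finite-difference gradient, used only to check the analytic one."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        Ap, Am = A.copy(), A.copy()
        Ap[idx] += eps
        Am[idx] -= eps
        G[idx] = (loss_and_grad(Ap, X, y)[0] - loss_and_grad(Am, X, y)[0]) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
A = rng.normal(scale=0.5, size=(2, 3))

loss0, G = loss_and_grad(A, X, y)
G_num = numeric_grad(A, X, y)

A_gd, step = A.copy(), 0.1
for _ in range(20):
    loss, g = loss_and_grad(A_gd, X, y)
    # Halve the step until the loss does not increase (NaN counts as failure).
    while not loss_and_grad(A_gd - step * g, X, y)[0] <= loss:
        step *= 0.5
    A_gd = A_gd - step * g
loss1 = loss_and_grad(A_gd, X, y)[0]
```

Restricting the inner sum to each sample's nearest neighbors, as the slide suggests, would replace the dense matrices `K` and `W` with sparse ones.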
13Learned Distance Metric example
orig. Euclidean D < 1
learned D < 1
14Twin Peaks test
Training
n = 8000
we added 3 dimensions with 1000% noise
we rotated 5 dimensions randomly
15Input Variance
Noise
Signal
16Test data
17Test data
18Output Variance
Signal
Noise
19DimReduction with MLKR
- FG-NET face data: 82 persons, 984 face images w/ age
20DimReduction with MLKR
- FG-NET face data: 82 persons, 984 face images w/ age
21DimReduction with MLKR
Power Management data (d = 21)
- Force A to be rectangular
- Project onto eigenvectors of A
- Allows visualization of data
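A minimal sketch of the projection step, using a random matrix as a stand-in for a learned rectangular A (no learning happens here). It also shows one possible reading of "project onto eigenvectors", namely keeping the leading eigenvectors of M = AᵀA. That reading is an assumption, as are the sizes `r` and `n`.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 21, 2, 50                      # d = 21 as on the slide; r and n are arbitrary
X = rng.normal(size=(n, d))
A = rng.normal(size=(r, d))              # stand-in for a learned rectangular A

Z = X @ A.T                              # n x r embedding, ready for a 2-D scatter plot

# Distances in the embedding equal the metric distances on the original inputs.
M = A.T @ A
diff = X[0] - X[1]
d_metric = diff @ M @ diff
d_embed = ((Z[0] - Z[1]) ** 2).sum()

# Alternative: keep the top-r eigenvectors of M = A^T A.
evals, evecs = np.linalg.eigh(M)         # eigenvalues in ascending order
top = evecs[:, -r:]                      # directions with the largest eigenvalues
Z_eig = X @ top
d_eig = (((Z_eig[0] - Z_eig[1]) ** 2) * evals[-r:]).sum()
```

With r = 2 or 3 the rows of `Z` can be plotted directly, coloured by the regression target.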
22Robot arm results (8,32dim)
regression error
23Unity Data Center Prototype
- Objective: Learn long-range resource value estimates for each application manager
- State Variables (48)
- Arrival rate
- ResponseTime
- QueueLength
- iatVariance
- rtVariance
- Action: # of servers allocated by Arbiter
- Reward: SLA(Resp. Time)
[Figure: Unity data center prototype. Labels: Maximize Total SLA Revenue; 5 sec; Demand (HTTP req/sec); SLA; Value(srvrs); Value(RT); WebSphere 5.1; DB2; Trade3; Batch; 8 xSeries servers]
(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)
24Power Performance Management
- Objective: Managing systems to multi-discipline objectives: minimize Resp. Time and minimize Power Usage
- State Variables (21)
- Power Cap
- Power Usage
- CPU Utilization
- Temperature
- # of requests arrived
- Workload intensity (# Clients)
- Response Time
- Action: Power Cap
- Reward: SLA(Resp. Time) - Power Usage
(Kephart et al., ICAC 2007)
25IBM Regression Results TEST ERROR
MLKR
14/47
3/5
10/22
26IBM Regression Results TRAINING ERROR
MLKR
27Metric Learning for RL basis function
construction (Keller et al. ICML 2006)
- RL dataset of state-action-reward tuples $(s_i, a_i, r_i)$, $i = 1, \dots, N$
28Value Iteration
- Define an iterative bootstrap calculation
- Each round of VI must iterate over all states in the state space
- Try to speed this up using state aggregation (Bertsekas & Castanon, 1989)
- Idea: Use NCA to aggregate states
- project states into a lower-dim rep; keep states with similar Bellman error close together
- use projected states to define a set of basis functions $\phi$
- learn a linear value function over the basis functions: $V = \sum_i \theta_i \phi_i$
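For reference, plain tabular value iteration can be sketched as below. The slide's own update equation is not in the transcript, so this assumes the standard Bellman backup V(s) ← max_a [R(s, a) + γ Σ_s' P(s' | s, a) V(s')]. The two-state MDP, the discount factor and the function name are hypothetical. State aggregation and NCA-based basis construction are not implemented here.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """P[s][a] = list of (prob, next_state); R[s][a] = immediate reward."""
    V = [0.0] * len(P)
    while True:
        # One sweep: back up every state in the state space.
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(len(P))]
        if max(abs(v - w) for v, w in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Two-state toy MDP: action 0 stays in place, action 1 switches state.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0],      # no reward available from state 0
     [1.0, 0.0]]      # staying in state 1 pays 1 per step
V = value_iteration(P, R)
```

Every sweep touches every state, which is the cost that aggregating states into a smaller set of basis functions is meant to reduce.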
29Chopra et al. 2005
Similarity metric for image verification.
Problem: Given a pair of face images, decide whether
they are from the same person.
30Chopra et al. 2005
Similarity metric for image verification.
Problem: Given a pair of face images, decide whether
they are from the same person.
Too difficult for linear mapping!