Online and Batch Learning of Pseudo-Metrics

1
Online and Batch Learning of Pseudo-Metrics
  • Shai Shalev-Shwartz
  • Hebrew University, Jerusalem
  • Joint work with
  • Yoram Singer, Google Inc.
  • Andrew Y. Ng, Stanford University

2
Motivating Example
3
Our Technique
  • Map instances into a space in which distances
    correspond to labels

4
Outline
  • Distance learning setting
  • Large margin for distances
  • An online learning algorithm
  • Online loss analysis
  • A dual version
  • Experiments
  • Online - document filtering
  • Batch - handwritten digit recognition

5
Problem Setting
  • Training examples: two instances and a
    similarity label
  • Hypothesis class: pseudo-metrics parameterized
    by a symmetric positive semi-definite matrix
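The slide's formula did not survive extraction, so the following is only a minimal sketch of one standard way a symmetric positive semi-definite matrix induces a pseudo-metric. The squared form (x - x')^T A (x - x') and the helper name `sq_pseudo_distance` are assumptions made for illustration, not taken from the slides.

```python
import numpy as np

def sq_pseudo_distance(A, x, x_prime):
    """Squared distance (x - x')^T A (x - x') for a symmetric PSD matrix A."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(v @ A @ v)

# With the identity matrix this is the squared Euclidean distance.
d_euclid = sq_pseudo_distance(np.eye(2), [0, 0], [3, 4])

# A rank-deficient PSD matrix ignores the second coordinate, so two
# distinct points can be at distance zero (hence "pseudo"-metric).
A_rank1 = np.array([[1.0, 0.0], [0.0, 0.0]])
d_pseudo = sq_pseudo_distance(A_rank1, [0, 0], [0, 4])
```

The second call shows why the hypothesis class is one of pseudo-metrics rather than metrics: a zero distance does not force the two instances to be equal.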
6
Large Margin for Pseudo-Metrics
  • Sample S is γ-separated w.r.t. a metric
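The separation condition itself was lost in extraction. One plausible way to write it, under stated assumptions: labels are y = +1 for similar pairs and y = -1 for dissimilar pairs, b is a distance threshold, and the squared distance induced by the matrix A is used. The symbols y, b, and the loss name are illustrative choices.

```latex
\[
  d_A(x, x')^2 = (x - x')^\top A \, (x - x'),
  \qquad
  y_i \left( b - d_A(x_i, x_i')^2 \right) \ge \gamma
  \quad \text{for all } (x_i, x_i', y_i) \in S
\]
\[
  \ell_\gamma(A, b; x, x', y)
  = \max\left\{ 0,\; \gamma - y \left( b - d_A(x, x')^2 \right) \right\}
\]
```

Under this reading, similar pairs must lie at squared distance at most b - γ and dissimilar pairs at least b + γ, and the hinge loss in the second line is zero exactly when an example meets the margin requirement.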

7
Batch Formulation
s.t.
s.t.
8
Pseudo-metric Online Learning Algorithm (POLA)
  • For each round t = 1, 2, ...
  • Get two instances
  • Calculate distance
  • Predict
  • Get true label and suffer hinge-loss
  • Update matrix and threshold
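The protocol in this list can be sketched as a short loop. This is only an illustration under assumptions: labels are +1 (similar) or -1 (dissimilar), the squared distance is compared with a threshold b, the starting point is a zero matrix with b = 1, and the update rule is passed in as a hypothetical `update` callable so that only the round structure is shown.

```python
import numpy as np

def hinge_loss(A, b, x, x_prime, y, margin=1.0):
    """Assumed loss: max(0, margin - y * (b - squared distance))."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return max(0.0, margin - y * (b - v @ A @ v))

def run_online(examples, update, dim, b0=1.0, margin=1.0):
    """Run the predict / suffer loss / update protocol over a stream of pairs."""
    A, b = np.zeros((dim, dim)), b0          # assumed starting point
    predictions, cumulative_loss = [], 0.0
    for x, x_prime, y in examples:
        v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
        dist_sq = v @ A @ v                  # calculate distance
        predictions.append(1 if dist_sq <= b else -1)   # predict
        cumulative_loss += hinge_loss(A, b, x, x_prime, y, margin)
        A, b = update(A, b, x, x_prime, y)   # update matrix and threshold
    return A, b, predictions, cumulative_loss

# Placeholder update that changes nothing; it only exercises the protocol.
def no_update(A, b, x, x_prime, y):
    return A, b

stream = [([0, 0], [1, 0], +1), ([0, 0], [3, 0], -1)]
A_final, b_final, preds, total_loss = run_online(stream, no_update, dim=2)
```

With the placeholder update the matrix stays at zero, so every pair is predicted similar and only the dissimilar pair incurs loss; a real update rule would replace `no_update`.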

9
Core Update: Two Projections
  • Projection of a vector v onto a closed convex set C
  • Two-step update
  • 1) Project onto a half-space
  • 2) Project onto the PSD cone

10
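Core Update: Two Projections
  • Start with the current matrix and threshold
  • An example defines a half-space
  • The intermediate solution is the projection of
    the current solution onto this half-space
  • The new solution is the projection of the
    intermediate solution onto the PSD cone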

Figure labels: PSD cone; all zero-loss matrices
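The two-step update can be sketched as follows. The slide's symbols were lost, so this is a reconstruction under assumptions rather than the presenters' exact rule: the zero-loss half-space is taken to be the set of (A, b) with y * (b - v^T A v) >= margin, where v = x - x', and any constraint on the threshold b is left out. The function names are hypothetical.

```python
import numpy as np

def project_halfspace(A, b, x, x_prime, y, margin=1.0):
    """Euclidean projection of (A, b) onto {y * (b - v^T A v) >= margin}."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    loss = max(0.0, margin - y * (b - v @ A @ v))
    # Squared norm of the constraint normal (-y * v v^T, y) is ||v||^4 + 1.
    alpha = loss / (np.dot(v, v) ** 2 + 1.0)
    return A - y * alpha * np.outer(v, v), b + y * alpha

def project_psd(A):
    """Frobenius-norm projection onto the PSD cone: clip negative eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh((A + A.T) / 2.0)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

def two_step_update(A, b, x, x_prime, y, margin=1.0):
    """Step 1: half-space projection. Step 2: PSD projection of the matrix."""
    A_half, b_new = project_halfspace(A, b, x, x_prime, y, margin)
    return project_psd(A_half), b_new

# A dissimilar pair (y = -1) that is currently too close under A = I, b = 2.
A_new, b_new = two_step_update(np.eye(2), 2.0, [0, 0], [1, 0], -1)
clipped = project_psd(np.array([[1.0, 0.0], [0.0, -1.0]]))
```

In this example the first projection stretches the matrix along the direction separating the pair and lowers the threshold, after which the pair has zero loss. In general the second projection can move the solution off the half-space again, so zero loss on the current example is only guaranteed after step 1.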
11
Online Learning
  • Goal: minimize cumulative loss
  • Why Online?
  • Online processing tasks (e.g. Text Filtering)
  • Simple to implement
  • Memory and run-time efficient
  • Worst-case bounds on the performance
  • Online to batch conversions

12
Online Loss Bound
  • sequence of
    examples s.t.
  • any fixed matrix and threshold
  • Then,
  • Loss bound does not depend on dimension

13
Incorporating Kernels
  • Matrix A can be written as
    , where
  • Therefore
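The expansion on this slide was not preserved. As a hedged illustration of why such an expansion allows kernels, assume A can be written as a weighted sum of outer products of difference vectors u_i = p_i - q_i, that is A = sum_ij B[i, j] u_i u_j^T. The coefficient matrix `B`, the point sets `P` and `Q`, and the function name are all hypothetical. The distance then depends on the data only through inner products, which a kernel function can replace.

```python
import numpy as np

def kernel_sq_distance(P, Q, B, x, x_prime, kernel):
    """Squared distance using only kernel evaluations between instances."""
    # <p - q, x - x'> expands by bilinearity into four kernel evaluations.
    g = np.array([
        kernel(p, x) - kernel(p, x_prime) - kernel(q, x) + kernel(q, x_prime)
        for p, q in zip(P, Q)
    ])
    return float(g @ B @ g)

rng = np.random.default_rng(0)
P, Q = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
M = rng.normal(size=(4, 4))
B = M @ M.T                                  # symmetric PSD coefficients
x, x_prime = rng.normal(size=3), rng.normal(size=3)

# Explicit matrix form, available only when features are finite-dimensional.
U = P - Q
A = U.T @ B @ U
v = x - x_prime
explicit = float(v @ A @ v)

# Same value computed without ever forming A, here with the linear kernel.
implicit = kernel_sq_distance(P, Q, B, x, x_prime, kernel=np.dot)
```

With the linear kernel the two values agree, which checks the algebra; swapping in another kernel function gives the distance in the corresponding feature space.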

14
Online Experiments
  • Task: document filtering according to topics
  • Dataset: Reuters-21578
  • 10,000 documents
  • Documents labeled as Relevant and Irrelevant
  • A few relevant documents (1% - 10% of entire set)
  • Algorithms:
  • POLA
  • 1 Nearest Neighbor (1-NN)
  • Perceptron Algorithm
  • Perceptron Algorithm with Uneven Margins (PAUM)
    (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)

15
POLA for Document Filtering
  • Get a document
  • Calculate distance to relevant documents observed
    so far using current matrix
  • Predict document is relevant iff the distance to
    the closest relevant document is smaller than the
    current threshold
  • Get true label
  • Update matrix and threshold
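The prediction rule in this list can be sketched as a small function. Assumptions for illustration: documents are NumPy feature vectors, the squared distance under the current matrix is compared with the threshold, and a document is predicted irrelevant when no relevant document has been seen yet. The name `predict_relevant` is hypothetical.

```python
import numpy as np

def predict_relevant(A, b, doc, relevant_docs):
    """Relevant iff the closest relevant document seen so far is within b."""
    if len(relevant_docs) == 0:
        return False                         # assumed default before any positives
    dists = [float((doc - r) @ A @ (doc - r)) for r in relevant_docs]
    return min(dists) < b

A, b = np.eye(2), 1.0
seen_relevant = [np.array([0.0, 0.0])]
near = predict_relevant(A, b, np.array([0.5, 0.0]), seen_relevant)
far = predict_relevant(A, b, np.array([2.0, 0.0]), seen_relevant)
cold_start = predict_relevant(A, b, np.array([0.5, 0.0]), [])
```

In a full filtering loop, each document whose true label turns out to be relevant would be appended to `seen_relevant` before the matrix and threshold are updated.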

16
Document Filtering Results
  • Each blue point corresponds to one topic
  • Y-axis designates the error of POLA
  • Points beneath the black diagonal line mean that
    POLA wins

17
Batch Experiments
  • Task: handwritten digit recognition
  • Dataset: MNIST
  • 45 binary classification problems (all pairs)
  • 10,000 training examples
  • 10,000 test examples
  • Algorithms: k-NN with various metrics
  • Pseudo-metric learned by POLA
  • Euclidean distance
  • Metric induced by Fisher Discriminant Analysis
    (FDA)
  • Metric learned by Relevant Components Analysis
    (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
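To illustrate how a learned matrix plugs into k-NN, here is a minimal sketch. The data and the matrix below are toy values chosen by hand, not results from the experiments, and `knn_predict` is a hypothetical helper.

```python
import numpy as np
from collections import Counter

def knn_predict(A, train_X, train_y, x, k=1):
    """Majority vote among the k nearest training points under matrix A."""
    diffs = train_X - x
    # Row-wise squared distances diffs[i]^T A diffs[i].
    dists = np.einsum("ij,jk,ik->i", diffs, A, diffs)
    nearest = np.argsort(dists, kind="stable")[:k]
    return Counter(int(train_y[i]) for i in nearest).most_common(1)[0][0]

train_X = np.array([[0.0, 0.0], [5.0, 1.0]])
train_y = np.array([0, 1])
query = np.array([4.0, 0.1])

# Euclidean distance is dominated by the first (uninformative) coordinate.
pred_euclid = knn_predict(np.eye(2), train_X, train_y, query)

# A matrix that keeps only the second coordinate changes the neighbor.
A_toy = np.array([[0.0, 0.0], [0.0, 1.0]])
pred_toy_metric = knn_predict(A_toy, train_X, train_y, query)
```

The classifier is unchanged across metrics; only the matrix passed in differs, which is what makes the comparison between Euclidean, FDA, RCA, and POLA metrics a like-for-like one.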

18
MNIST Results
  • Each blue point corresponds to one binary
    classification problem
  • Y-axis designates the error of POLA
  • Points beneath the black diagonal line mean that
    POLA wins

X-axes of the three plots: Euclidean distance error,
FDA error, RCA error
RCA was applied after using PCA as a pre-processing
step
19
Experiments: Dimensionality Reduction
Figure panels: PCA, POLA
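The link between a learned PSD matrix and dimensionality reduction can be sketched as follows. Any symmetric PSD matrix A factors as A = W^T W, so the pseudo-metric equals the Euclidean distance after mapping x to W x, and W has as many rows as A has nonzero eigenvalues. The function name and the tolerance are assumptions, and the example matrix is a toy value rather than one learned by POLA.

```python
import numpy as np

def mapping_from_psd(A, tol=1e-10):
    """Return W with W^T W = A, keeping only directions with eigenvalue > tol."""
    eigvals, eigvecs = np.linalg.eigh((A + A.T) / 2.0)
    keep = eigvals > tol
    return (eigvecs[:, keep] * np.sqrt(eigvals[keep])).T

A = np.array([[1.0, 1.0], [1.0, 1.0]])       # rank-1 PSD matrix
W = mapping_from_psd(A)                      # maps 2-D inputs to 1-D

x, x_prime = np.array([1.0, 2.0]), np.array([0.0, 0.0])
v = x - x_prime
dist_matrix_form = float(v @ A @ v)
dist_mapped = float(np.sum((W @ x - W @ x_prime) ** 2))
```

Both distances agree, and the mapped points live in a space whose dimension equals the rank of A, so a low-rank matrix yields a low-dimensional embedding.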
20
Toy problem
A color-coded matrix of Euclidean distances
between pairs of images
21
Metric found by POLA
22
Mapping found by POLA
  • Our Pseudo-metrics

23
Mapping found by POLA
24
Summary and Extensions
  • An online algorithm for learning pseudo-metrics
  • Formal properties, good experimental results
  • Extensions
  • Alternative regularization schemes to the
    Frobenius norm
  • Learning to learn
  • Learning a metric from one set of classes and
    applying it to another set of related classes
