Nearest Neighbor Editing and Condensing Techniques - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Nearest Neighbor Editing and Condensing Techniques


1
Nearest Neighbor Editing and Condensing
Techniques
Organization
  • Nearest Neighbor Revisited
  • Condensing Techniques
  • Proximity Graphs and Decision Boundaries
  • Editing Techniques

Last updated Oct. 7, 2005
2
Nearest Neighbour Rule
Non-parametric pattern classification. Consider a
two-class problem where each sample consists of
two measurements (x, y).
k = 1
For a given query point q, assign the class of
the nearest neighbour.
k = 3
Compute the k nearest neighbours and assign the
class by majority vote.
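The rule above is just a brute-force distance computation followed by a majority vote. Below is a minimal Python/NumPy sketch; the function name and example data are illustrative, not part of the original slides.

import numpy as np

def knn_classify(X_train, y_train, q, k=1):
    # Distance from the query q to every training sample (brute force).
    dists = np.linalg.norm(X_train - q, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest samples
    votes = y_train[nearest]
    return np.bincount(votes).argmax()     # majority vote (k = 1 gives the NN rule)

# Two-class example, each sample has two measurements (x, y).
X = np.array([[1.0, 1.0], [1.2, 0.8], [3.0, 3.1], [2.9, 3.3]])
y = np.array([0, 0, 1, 1])
print(knn_classify(X, y, np.array([2.8, 3.0]), k=3))   # predicts class 1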
3
Example Digit Recognition
  • Yann LeCun's MNIST digit recognition benchmark
  • Handwritten digits
  • 28x28 pixel images (d = 784)
  • 60,000 training samples
  • 10,000 test samples
  • Nearest neighbour is competitive

Method and test error rate (%):
Linear classifier (1-layer NN) 12.0
K-nearest-neighbors, Euclidean 5.0
K-nearest-neighbors, Euclidean, deskewed 2.4
K-NN, Tangent Distance, 16x16 1.1
K-NN, shape context matching 0.67
1000 RBF linear classifier 3.6
SVM deg 4 polynomial 1.1
2-layer NN, 300 hidden units 4.7
2-layer NN, 300 HU, deskewing 1.6
LeNet-5, distortions 0.8
Boosted LeNet-4, distortions 0.7
4
Nearest Neighbour Issues
  • Expensive
    • To determine the nearest neighbour of a query
      point q, we must compute the distance to all N
      training examples
    • Pre-sort training examples into fast data
      structures such as kd-trees (see the sketch
      after this list)
    • Compute only an approximate nearest neighbour
      (locality-sensitive hashing, LSH)
    • Remove redundant data (condensing)
  • Storage requirements
    • Must store all N training examples
    • Remove redundant data (condensing)
    • Pre-sorting often increases the storage
      requirements
  • High-dimensional data
    • Curse of dimensionality
    • The required amount of training data increases
      exponentially with dimension
    • Computational cost also increases dramatically
    • Partitioning techniques degrade to linear search
      in high dimensions
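As one illustration of the pre-sorting idea above, the sketch below builds a kd-tree with SciPy's scipy.spatial.cKDTree and queries it; the random data and sizes are made up. As the slide notes, such partitioning structures degrade toward linear search as the dimension grows.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((60_000, 2))      # illustrative low-dimensional training set
tree = cKDTree(X_train)                # pre-sort the training data once

q = np.array([0.5, 0.5])
dist, idx = tree.query(q, k=3)         # k nearest neighbours of the query point
print(idx, dist)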

5
Exact Nearest Neighbour
  • Asymptotic error (infinite sample size) is less
    than twice the Bayes classification error
  • Requires a lot of training data
  • Expensive for high-dimensional data (d > 20?)
  • O(Nd) complexity for both storage and query time
  • N is the number of training examples, d is the
    dimension of each sample
  • This can be reduced through dataset
    editing/condensing

6
Decision Regions
Each cell contains one sample, and every location
within the cell is closer to that sample than to
any other sample. A Voronoi diagram divides the
space into such cells.
Every query point will be assigned the
classification of the sample within that cell.
The decision boundary separates the class regions
based on the 1-NN decision rule. Knowledge of
this boundary is sufficient to classify new
points. The boundary itself is rarely computed
explicitly; many algorithms instead seek to retain
only those points necessary to generate an
identical boundary.
7
Condensing
  • Aim is to reduce the number of training samples
  • Retain only the samples that are needed to define
    the decision boundary
  • This is reminiscent of a Support Vector Machine
  • Decision boundary consistent: a subset whose
    nearest neighbour decision boundary is identical
    to the boundary of the entire training set
  • Consistent set: a subset of the training data
    that correctly classifies all of the original
    training data
  • Minimum consistent set: the smallest consistent
    set

Original data
Condensed data
Minimum Consistent Set
8
Condensing
  • Condensed Nearest Neighbour (CNN), Hart 1968
  • Incremental
  • Order dependent
  • Neither minimal nor decision boundary consistent
  • O(n^3) for the brute-force method
  • Can follow up with Reduced NN (Gates 1972):
    remove a sample if doing so does not cause any
    incorrect classifications
  1. Initialize the subset with a single training
    example
  2. Classify all remaining samples using the subset,
    and transfer any incorrectly classified samples
    to the subset
  3. Repeat step 2 until no transfers occur in a
    complete pass (or every sample has been
    transferred), as sketched below

Produces consistent set
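A minimal sketch of the CNN procedure listed above, using brute-force 1-NN inside the loop; the names are illustrative. The result depends on the order in which samples are visited, as the slide points out.

import numpy as np

def nn_label(S_X, S_y, x):
    # 1-NN label of x with respect to the current subset.
    return S_y[np.argmin(np.linalg.norm(S_X - x, axis=1))]

def cnn_condense(X, y):
    keep = [0]                                  # step 1: start with one example
    changed = True
    while changed:                              # step 3: repeat until a full pass
        changed = False                         #         makes no transfers
        for i in range(len(X)):
            if i in keep:
                continue
            if nn_label(X[keep], y[keep], X[i]) != y[i]:
                keep.append(i)                  # step 2: transfer misclassified sample
                changed = True
    return np.array(keep)                       # indices of the consistent subset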
15
Proximity Graphs
  • Condensing aims to retain points along the
    decision boundary
  • How to identify such points?
  • Neighbouring points of different classes
  • Proximity graphs provide various definitions of
    neighbour

  • NNG: Nearest Neighbour Graph
  • MST: Minimum Spanning Tree
  • RNG: Relative Neighbourhood Graph
  • GG: Gabriel Graph
  • DT: Delaunay Triangulation (the neighbours used
    by a 1-NN classifier)
16
Proximity Graphs Delaunay
  • The Delaunay Triangulation is the dual of the
    Voronoi diagram
  • Three points are each other's neighbours if
    their circumscribed sphere (circumcircle in 2-D)
    contains no other points
  • Voronoi condensing: retain those points that
    have at least one Delaunay neighbour of the
    opposite class (see the sketch below)
  • The decision boundary is identical
  • Conservative subset
  • Retains extra points
  • Expensive to compute in high dimensions
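A sketch of Voronoi condensing for low-dimensional data, using scipy.spatial.Delaunay to obtain the Delaunay neighbours; it keeps every point that has at least one neighbour of a different class. The function name and usage are illustrative.

import numpy as np
from scipy.spatial import Delaunay

def voronoi_condense(X, y):
    tri = Delaunay(X)                   # expensive in high dimensions
    keep = set()
    for simplex in tri.simplices:       # vertices of a simplex are mutual neighbours
        for i in simplex:
            for j in simplex:
                if y[i] != y[j]:        # has a neighbour of the opposite class
                    keep.add(int(i))
    return np.array(sorted(keep))       # decision-boundary-consistent subset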

17
Proximity Graphs Gabriel
  • The Gabriel graph is a subset of the Delaunay
    Triangulation (some decision boundary might be
    missed)
  • Two points are neighbours only if their diametral
    sphere of influence (the sphere whose diameter is
    the segment joining them) contains no other
    points (see the sketch below)
  • Does not preserve the identical decision
    boundary, but most changes occur outside the
    convex hull of the data points
  • Can be computed more efficiently

Green lines denote Tomek links
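The emptiness test for the diametral sphere can be written down directly: a third point k lies inside the sphere of influence of (i, j) exactly when d(i,k)^2 + d(j,k)^2 < d(i,j)^2. The brute-force O(n^3) sketch below is illustrative only.

import numpy as np

def gabriel_edges(X):
    n = len(X)
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared distances
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # (i, j) is a Gabriel edge iff no other point falls in the diametral sphere.
            if not any(D2[i, k] + D2[j, k] < D2[i, j]
                       for k in range(n) if k != i and k != j):
                edges.append((i, j))
    return edges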
19
Not a Gabriel Edge
20
Proximity Graphs RNG
  • The Relative Neighbourhood Graph (RNG) is a
    subset of the Gabriel graph
  • Two points are neighbours if the lune defined
    by the intersection of the two spheres centred
    at each point, with radius equal to their
    separation, is empty (see the sketch below)
  • Further reduces the number of neighbours
  • Decision boundary changes are often drastic, and
    the edited set is not guaranteed to be training
    set consistent

Gabriel edited
RNG edited (not consistent)
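The lune test gives an analogous condensing rule: keep the endpoints of every RNG edge whose endpoints belong to different classes. A brute-force O(n^3) sketch with illustrative names follows.

import numpy as np

def rng_condense(X, y):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    keep = set()
    for i in range(n):
        for j in range(i + 1, n):
            # (i, j) are RNG neighbours iff no other point k lies in their lune,
            # i.e. there is no k with max(d(i,k), d(j,k)) < d(i,j).
            lune_empty = not any(max(D[i, k], D[j, k]) < D[i, j]
                                 for k in range(n) if k != i and k != j)
            if lune_empty and y[i] != y[j]:
                keep.update((int(i), int(j)))
    return np.array(sorted(keep))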
21
Dataset Reduction Editing
  • Training data may contain noise and overlapping
    classes
  • (this starts to make assumptions about the
    underlying distributions)
  • Editing seeks to remove noisy points and produce
    smooth decision boundaries, often by retaining
    points far from the decision boundary
  • Results in homogeneous clusters of points

22
Wilson Editing
  • Wilson 1972
  • Remove points that do not agree with the majority
    of their k nearest neighbours (see the sketch
    below)

Figures: original data and the result of Wilson
editing with k = 7, for the earlier example and for
the overlapping-classes data
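A minimal sketch of Wilson editing as described above: each point is kept only if it agrees with the majority vote of its k nearest neighbours in the full training set. Names are illustrative.

import numpy as np

def wilson_edit(X, y, k=7):
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                              # do not count the point itself
        nearest = np.argsort(d)[:k]
        if np.bincount(y[nearest]).argmax() == y[i]:
            keep.append(i)                         # agrees with its neighbourhood
    return np.array(keep)                          # indices of the edited set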
23
Multi-edit
  1. Diffusion: divide the data into N >= 3 random
    subsets
  2. Classification: classify Si using 1-NN with
    S(i+1) mod N as the training set (i = 1, ..., N)
  3. Editing: discard all samples incorrectly
    classified in step 2
  4. Confusion: pool all remaining samples into a new
    data set
  5. Termination: if the last I iterations produced no
    editing, stop; otherwise go to step 1
  • Multi-edit (Devijver and Kittler, 1979)
  • Repeatedly apply Wilson editing to random
    partitions (see the sketch below)
  • Classify with the 1-NN rule
  • Approximates the error rate of the Bayes decision
    rule

Multi-edit, 8 iterations (the last 3 produced no change)
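A sketch of the multi-edit loop above. The stopping rule is parameterised as patience (the I of the slide), and the code assumes each random subset stays non-empty; it is illustrative only.

import numpy as np

def nn1_predict(train_X, train_y, test_X):
    # Brute-force 1-NN predictions.
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=-1)
    return train_y[np.argmin(d, axis=1)]

def multi_edit(X, y, n_subsets=3, patience=3, seed=0):
    rng = np.random.default_rng(seed)
    quiet = 0
    while quiet < patience:
        idx = rng.permutation(len(X))                    # 1. diffusion
        subsets = np.array_split(idx, n_subsets)
        kept = []
        for i, s in enumerate(subsets):
            ref = subsets[(i + 1) % n_subsets]           # 2. classify Si with S(i+1) mod N
            pred = nn1_predict(X[ref], y[ref], X[s])
            kept.append(s[pred == y[s]])                 # 3. editing: drop misclassified
        keep = np.concatenate(kept)                      # 4. confusion: pool survivors
        quiet = quiet + 1 if len(keep) == len(X) else 0  # 5. termination counter
        X, y = X[keep], y[keep]
    return X, y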
24
Combined Editing/Condensing
  • First edit the data to remove noise and smooth
    the boundary
  • Then condense to obtain a smaller subset
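Assuming the wilson_edit and cnn_condense sketches given earlier in this transcript are in scope, the combined pipeline is simply editing followed by condensing; the synthetic data below are made up for illustration.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)   # noisy two-class data

edited = wilson_edit(X, y, k=7)       # 1. edit: remove noise, smooth the boundary
Xe, ye = X[edited], y[edited]
kept = cnn_condense(Xe, ye)           # 2. condense: keep boundary-defining points
print(len(X), "->", len(edited), "->", len(kept))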

25
Where are we with respect to NN?
  • Simple method, pretty powerful rule
  • Very popular in text mining (it seems to work
    well for this task)
  • Can be made to run fast
  • Requires a lot of training data
  • Edit to reduce noise, class overlap, and
    overfitting
  • Condense to remove data that are not needed,
    which enhances speed

26
Problems when using k-NN in Practice
  • What distance measure to use?
    • Often Euclidean distance is used
    • Locally adaptive metrics
    • More complicated with non-numeric data, or when
      different dimensions have different scales
  • Choice of k?
    • Cross-validation (see the sketch below)
    • 1-NN often performs well in practice
    • k-NN is needed for overlapping classes
    • Reduce the k-NN problem to 1-NN through dataset
      editing
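For choosing k by cross-validation, a common approach is a small grid search. The sketch below uses scikit-learn's KNeighborsClassifier and GridSearchCV on synthetic data, purely as an illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

search = GridSearchCV(KNeighborsClassifier(),            # Euclidean distance by default
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},
                      cv=5)                              # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))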