When Is Nearest Neighbor Meaningful - PowerPoint PPT Presentation

1
When Is Nearest Neighbor Meaningful?
  • By Kevin Beyer, Jonathan Goldstein, Raghu
    Ramakrishnan, and Uri Shaft

2
Nearest neighbor queries
[Figure panels: typical query in 2D; unstable query in 2D]
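A minimal Python sketch of the two situations pictured on this slide. The nearest neighbor is found by a brute-force linear scan. The instability test assumes the definition used in the accompanying paper: a query is unstable when most data points lie within (1 + ε) times the nearest-neighbor distance. Reading "most" as "more than half" (the `fraction` parameter) is an assumption, and all function names are hypothetical.

```python
import math


def distances(query, points):
    """Euclidean distance from the query to every data point (a linear scan)."""
    return [math.dist(query, p) for p in points]


def nearest_neighbor(query, points):
    """Return (index, distance) of the closest data point by brute-force scan."""
    d = distances(query, points)
    best = min(range(len(d)), key=d.__getitem__)
    return best, d[best]


def is_unstable(query, points, eps, fraction=0.5):
    """Hypothetical instability test: True when more than `fraction` of the
    points are within (1 + eps) times the nearest-neighbor distance."""
    d = distances(query, points)
    dmin = min(d)
    close = sum(1 for x in d if x <= (1 + eps) * dmin)
    return close > fraction * len(d)


# Unstable picture: 20 points evenly spaced on a unit circle around the query,
# so every point is (almost exactly) tied for nearest neighbor.
circle = [(math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20))
          for k in range(20)]
print(is_unstable((0.0, 0.0), circle, eps=0.1))

# Typical picture: one point much closer to the query than all the others.
typical = [(0.1, 0.0)] + [(5.0 + k, 5.0) for k in range(19)]
print(nearest_neighbor((0.0, 0.0), typical), is_unstable((0.0, 0.0), typical, eps=0.1))
```

In the circle case the "nearest" neighbor is essentially arbitrary, which is what makes the query unstable; in the typical case a small perturbation of the query would not change the answer.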
3
Main theoretical instability result
(i.e., as dimensionality increases, all points
become nearly equidistant w.r.t. the query point)
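For reference, a paraphrase of the theorem as it is usually stated in the accompanying paper (the notation is assumed here, not taken from the slide): m is the dimensionality, P_{m,1}, …, P_{m,n} are the data points, Q_m is the query point, d_m is the distance function, p is a positive constant, and DMIN_m and DMAX_m are the distances from Q_m to its nearest and farthest data points.

```latex
\text{If}\quad
\lim_{m\to\infty} \operatorname{var}\!\left(
  \frac{d_m(P_{m,1},Q_m)^p}{\mathbb{E}\!\left[d_m(P_{m,1},Q_m)^p\right]}
\right) = 0,
\quad\text{then for every } \varepsilon > 0:\quad
\lim_{m\to\infty} \Pr\!\left[\mathrm{DMAX}_m \le (1+\varepsilon)\,\mathrm{DMIN}_m\right] = 1.
```

The conclusion is the slide's parenthetical made precise: for any fixed ε, the farthest point is eventually within a factor (1 + ε) of the nearest, with probability approaching 1.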
4
IID contrast as dimensionality increases
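The contrast plot can be approximated with a small simulation. Here contrast is taken to be (DMAX − DMIN) / DMIN for a single query, and data and query points are drawn IID uniform on the unit hypercube; both choices, and the function name, are illustrative assumptions rather than the slide's exact setup.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Estimate (DMAX - DMIN) / DMIN for one query, with the query and all
    data points drawn IID uniform on [0, 1]^dim (an illustrative assumption)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    dmin, dmax = min(dists), max(dists)
    return (dmax - dmin) / dmin


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the printed contrast should shrink toward 0: the gap between the farthest and nearest points becomes small relative to the nearest distance.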
5
Repercussions of the technical result
  • Serious questions are raised about techniques
    that map approximate similarity into
    high-dimensional nearest neighbor problems.
  • The ease with which linear scan beats more
    complex access methods for high-D nearest
    neighbor search is explained by our theorem.
  • These results should not be taken to mean that
    all high-dimensional nearest neighbor problems
    are badly framed or that more complex access
    methods will always fail on individual high-D
    data sets.

6
Example result application
  • Assume the following
  • The data distribution and query distribution are
    IID in all dimensions.
  • All the appropriate moments are finite (i.e., up
    to the ⌈2p⌉th moment).
  • The query point is chosen independently of the
    data points.
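A worked check of this example, assuming Euclidean distance (p = 2, so the moment condition asks for finite fourth moments). Let X be a data point and Q an independent query point. Because the coordinates are IID and Q is independent of X, the per-dimension terms Z_i = (X_i − Q_i)^2 are IID; write μ > 0 for their mean and σ² < ∞ for their variance (finite because the fourth moments are).

```latex
d_m(X,Q)^2 = \sum_{i=1}^{m} Z_i,
\qquad
\operatorname{var}\!\left(\frac{d_m(X,Q)^2}{\mathbb{E}\!\left[d_m(X,Q)^2\right]}\right)
= \frac{\operatorname{var}\!\left(\sum_{i=1}^{m} Z_i\right)}{(m\mu)^2}
= \frac{m\,\sigma^2}{m^2\mu^2}
= \frac{\sigma^2}{m\,\mu^2}
\;\longrightarrow\; 0 \quad (m \to \infty).
```

The relative variance shrinks like 1/m, so the main instability result applies with p = 2: in this setting, nearest neighbor queries become unstable as the dimensionality grows.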

7
Examples that meet our condition
  • IID (independent and identically distributed), Q ~ D
    (query distribution follows data distribution)
  • Variance converging to 0 at a bounded rate, Q ~ D
  • Variance converging to infinity at a bounded
    rate, Q ~ D
  • Partial correlation between all dimensions, Q ~ D
  • Variance converging to 0 at a bounded rate, and
    partial correlation between all dimensions, Q ~ D
  • Perfectly realized clustering, Q ~ IID uniform

8
Examples that don't meet our condition
  • Total correlation between all dimensions, Q ~ D
  • All dimensions are linear combinations of a fixed
    number of IID random variables, Q ~ D
  • Perfectly realized clustering with query
    distribution following data distribution, Q ~ D
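A Monte Carlo sketch contrasting the IID case with the first counterexample above (all helper names are hypothetical, and uniform [0, 1] coordinates are an assumption). When every dimension is a copy of the same variable, the distance is sqrt(m)·|x − q|, so D / E[D] does not depend on m and its variance never shrinks; in the IID case it does.

```python
import math
import random
import statistics


def relative_variance(sample_point, dim, n_pairs=1000, seed=0):
    """Monte Carlo estimate of var(D / E[D]) = var(D) / E[D]**2, where D is the
    Euclidean distance between a data point and an independent query point,
    both drawn from the same distribution (query follows data)."""
    rng = random.Random(seed)
    dists = [
        math.dist(sample_point(rng, dim), sample_point(rng, dim))
        for _ in range(n_pairs)
    ]
    return statistics.pvariance(dists) / statistics.fmean(dists) ** 2


def iid_uniform(rng, dim):
    # Every coordinate drawn independently from U[0, 1].
    return [rng.random() for _ in range(dim)]


def totally_correlated(rng, dim):
    # A single U[0, 1] draw copied into every coordinate (total correlation).
    x = rng.random()
    return [x] * dim


for dim in (10, 100, 1000):
    print(dim,
          round(relative_variance(iid_uniform, dim), 5),
          round(relative_variance(totally_correlated, dim), 5))
```

The IID column should fall roughly like 1/m, while the totally correlated column stays constant (near 0.5 for uniform coordinates), so the instability condition is never met in that case.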

9
Contrast in ideally clustered data
[Figure panels: top right, typical distance distribution; bottom left, ideal clusters; bottom right, distance distribution for ideally clustered data/queries]