kNN: A Non-parametric Classification and Prediction Technique
1
kNN: A Non-parametric Classification and
Prediction Technique
  • Goals of this set of transparencies
  • Introduce kNN, a popular non-parametric technique
  • Illustrate differences between parametric
    and non-parametric techniques
  • Later
  • Non-Parametric Density Estimation Techniques
  • Editing and Condensing Techniques to Enhance kNN

2
Classification and Decision Boundaries
  • Classification can be viewed as learning good
    decision boundaries that separate the examples
    belonging to different classes in a data set.

Decision boundary
3
Problems with Parametric Techniques
  • Parametric approaches assume that the type of
    model is known beforehand, which is not
    realistic for many applications.
  • The models used by parametric approaches tend
    to be rather simplistic. If the characteristics of
    the data do not match the assumptions of the
    underlying model, unreliable predictions are
    obtained.
  • Non-parametric approaches: key ideas
  • Let the data speak for themselves
  • Predict new cases based on similar cases
  • Use multiple local models instead of a single
    global model

4
Instance-Based Classifiers
  • Store the training records
  • Use training records to predict the class
    label of unseen cases

5
Instance-Based Classifiers
  • Instance-based Classifiers do not create a model
    but use training examples directly to classify
    unseen examples (lazy classifiers).
  • Examples
  • Rote-learner
  • Memorizes entire training data and performs
    classification only if attributes of record match
    one of the training examples exactly
  • Nearest neighbor
  • Uses k closest points (nearest neighbors) for
    performing classification

6
kNN: k-Nearest-Neighbor Classifiers
  • Requires three things
  • The set of stored records
  • Distance Metric to compute distance between
    records
  • The value of k, the number of nearest neighbors
    to retrieve
  • To classify an unknown record
  • Compute distance to other training records
  • Identify k nearest neighbors
  • Use the class labels of the nearest neighbors to
    determine the class label of the unknown record
    (e.g., by taking a majority vote; see the sketch below)
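The three steps above fit in a few lines of Python. This is a minimal
illustrative sketch, not code from the original slides; the function name
knn_predict and the use of plain Euclidean distance are my own choices:

    import numpy as np
    from collections import Counter

    def knn_predict(query, X, y, k=3):
        """Classify `query` against stored records X (n x d) with labels y
        by a majority vote among the k nearest neighbors."""
        d = np.linalg.norm(X - query, axis=1)   # Euclidean distance to every stored record
        nn = np.argsort(d)[:k]                  # indices of the k nearest neighbors
        return Counter(y[i] for i in nn).most_common(1)[0][0]

For instance, knn_predict(x_new, X_train, y_train, k=5) returns the
majority label among the five stored records closest to x_new.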

7
Definition of Nearest Neighbor
The k-nearest neighbors of a record x are the k data
points that have the smallest distances to x.
8
Voronoi Diagrams for NN-Classifiers
Each cell contains one sample, and every location
within the cell is closer to that sample than to
any other sample. A Voronoi diagram divides the
space into such cells.
Every query point will be assigned the
classification of the sample within that cell.
The decision boundary separates the class regions
based on the 1-NN decision rule. Knowledge of
this boundary is sufficient to classify new
points.
Remarks: Voronoi diagrams can be computed efficiently in
low-dimensional spaces but become infeasible in
higher-dimensional spaces. They also serve as models for
clusters generated by representative-based clustering algorithms.
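To make the link between Voronoi cells and the 1-NN rule concrete, here is a
small illustrative sketch (not from the slides; the sample coordinates and
labels are made up): the label returned for a query is exactly the label of
the site whose Voronoi cell contains the query.

    import numpy as np

    # Made-up training samples (the Voronoi sites) and their class labels.
    sites = np.array([[1.0, 1.0], [4.0, 2.0], [2.5, 4.0]])
    labels = np.array(["A", "B", "A"])

    def one_nn_label(query):
        # The nearest site is, by definition, the owner of the Voronoi
        # cell that contains the query point (1-NN decision rule).
        dists = np.linalg.norm(sites - query, axis=1)
        return labels[np.argmin(dists)]

    print(one_nn_label(np.array([3.5, 2.0])))  # falls in the cell of [4.0, 2.0] -> "B"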
9
Nearest Neighbor Classification
  • Compute the distance between two points
  • Euclidean distance: d(p, q) = sqrt(sum_i (p_i - q_i)^2)
  • Determine the class from nearest neighbor list
  • take the majority vote of class labels among the
    k-nearest neighbors
  • Weigh the vote according to distance
  • weight factor w = 1/d^2 (see the sketch below)
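The distance-weighted vote only changes the voting step of the knn_predict
sketch above; a hedged variant (the small eps guarding against division by
zero for exact duplicates is my addition):

    import numpy as np
    from collections import Counter

    def knn_predict_weighted(query, X, y, k=3, eps=1e-12):
        # Like knn_predict above, but each neighbor votes with weight w = 1/d^2,
        # so closer neighbors influence the decision more strongly.
        d = np.linalg.norm(X - query, axis=1)
        nn = np.argsort(d)[:k]
        votes = Counter()
        for i in nn:
            votes[y[i]] += 1.0 / (d[i] ** 2 + eps)
        return votes.most_common(1)[0][0]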

10
Nearest Neighbor Classification
  • Choosing the value of k (a cross-validation sketch follows below)
  • If k is too small, sensitive to noise points
  • If k is too large, neighborhood may include
    points from other classes
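The slide only states the trade-off; in practice a common (though not the
only) way to pick k is cross-validated accuracy. A rough sketch, assuming
the knn_predict helper from earlier and NumPy arrays X, y:

    import numpy as np

    def choose_k(X, y, candidate_ks=(1, 3, 5, 7, 9), n_folds=5, seed=0):
        # Evaluate each candidate k by simple n-fold cross-validation
        # and return the one with the highest held-out accuracy.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), n_folds)
        best_k, best_acc = None, -1.0
        for k in candidate_ks:
            correct = 0
            for f in range(n_folds):
                test = folds[f]
                train = np.concatenate([folds[j] for j in range(n_folds) if j != f])
                correct += sum(knn_predict(X[t], X[train], y[train], k=k) == y[t]
                               for t in test)
            acc = correct / len(X)
            if acc > best_acc:
                best_k, best_acc = k, acc
        return best_k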

12
k-NN: More Complex Decision Boundaries
13
Nearest Neighbor Classification
  • Scaling issues
  • Attributes may have to be scaled to prevent
    distance measures from being dominated by one of
    the attributes (a scaling sketch follows the example below)
  • Example
  • height of a person may vary from 1.5m to 1.8m
  • weight of a person may vary from 90lb to 300lb
  • income of a person may vary from 10K to 1M
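A simple remedy, not spelled out on the slide, is to rescale every attribute
to a common range (e.g., min-max scaling to [0, 1]) before computing
distances; a sketch with made-up records mirroring the example above:

    import numpy as np

    def min_max_scale(X):
        # Rescale each attribute (column) to [0, 1] so that, e.g., income
        # in the 10K-1M range cannot dominate the Euclidean distance.
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

    # Made-up records: height (m), weight (lb), income ($)
    X = np.array([[1.5,  90.0,    10_000.0],
                  [1.8, 300.0, 1_000_000.0]])
    print(min_max_scale(X))   # every column now spans [0, 1]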

14
Summary: Nearest Neighbor Classifiers
  • k-NN classifiers are lazy learners
  • Unlike eager learners such as decision tree
    induction and rule-based systems, they do not
    build models explicitly
  • Classifying unknown records is relatively
    expensive
  • Rely on local knowledge (let the data speak for
    themselves) and not on global models to make
    decisions.
  • k-NN classifiers rely on a distance function; the
    quality of the distance function is critical for
    the performance of a k-NN classifier.
  • Capable of creating quite complex decision
    boundaries, which consist of edges of the Voronoi
    diagram.
  • k-NN classifiers never actually compute decision
    boundaries, in contrast to decision trees/SVMs.
  • k-NN classifiers obtain high accuracies and are
    quite popular in some fields, such as text data
    mining and information retrieval in general.