Title: kNN: A Non-parametric Classification and Prediction Technique
1. kNN: A Non-parametric Classification and Prediction Technique
- Goals of this set of transparencies
- Introduce kNN, a popular non-parametric technique
- Illustrate the differences between parametric
and non-parametric techniques
- Later
- Non-Parametric Density Estimation Techniques
- Editing and Condensing Techniques to Enhance kNN
2. Classification and Decision Boundaries
- Classification can be viewed as learning good
decision boundaries that separate the examples
belonging to different classes in a data set.
[Figure: decision boundary separating the classes]
3. Problems with Parametric Techniques
- Parametric approaches assume that the type of
model is known beforehand, which is not
realistic for many applications.
- The models used by parametric approaches are
relatively simple. If the characteristics of
the data do not match the assumptions of the
underlying model, unreliable predictions are
obtained.
- Non-parametric approaches: key ideas
- Let the data speak for themselves
- Predict new cases based on similar cases
- Use multiple local models instead of a single
global model
4. Instance-Based Classifiers
- Store the training records
- Use training records to predict the class
label of unseen cases
5. Instance-Based Classifiers
- Instance-based classifiers do not create a model
but use training examples directly to classify
unseen examples (lazy classifiers).
- Examples
- Rote-learner
- Memorizes the entire training data and performs
classification only if the attributes of a record match
one of the training examples exactly
- Nearest neighbor
- Uses the k closest points (nearest neighbors) for
performing classification
6. kNN: k-Nearest-Neighbor Classifiers
- Requires three things
- The set of stored records
- A distance metric to compute the distance between
records
- The value of k, the number of nearest neighbors
to retrieve
- To classify an unknown record (see the sketch after this list)
- Compute its distance to all training records
- Identify the k nearest neighbors
- Use the class labels of the nearest neighbors to
determine the class label of the unknown record
(e.g., by taking a majority vote)
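A minimal sketch of this procedure in Python, assuming numeric attributes and Euclidean distance; the function and variable names are illustrative, not from the slides.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    # Distance from the query record to every stored training record
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k closest training records
    nearest = np.argsort(dists)[:k]
    # Majority vote among the class labels of the k nearest neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage: two classes in 2-D
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))  # prints "A"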
7. Definition of Nearest Neighbor
The k-nearest neighbors of a record x are the data
points that have the k smallest distances to x.
8. Voronoi Diagrams for NN-Classifiers
Each cell contains one sample, and every location
within the cell is closer to that sample than to
any other sample. A Voronoi diagram divides the
space into such cells.
Every query point is assigned the
classification of the sample within its cell.
The decision boundary separates the class regions
based on the 1-NN decision rule. Knowledge of
this boundary is sufficient to classify new
points.
Remarks: Voronoi diagrams can be computed efficiently
in low-dimensional spaces but become infeasible in
higher-dimensional spaces (a small sketch follows). They also serve as
models for the clusters generated by
representative-based clustering algorithms.
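A small sketch of computing the Voronoi diagram of a 2-D training set with scipy.spatial.Voronoi; the sample points are made up for illustration. Each sample owns one cell, and a 1-NN classifier labels a query point with the class of the cell it falls in.

import numpy as np
from scipy.spatial import Voronoi

samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [2.0, 1.5]])
vor = Voronoi(samples)
print(vor.vertices)      # corners of the Voronoi cells
print(vor.ridge_points)  # pairs of samples whose cells share an edge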
9. Nearest Neighbor Classification
- Compute the distance between two points
- e.g., Euclidean distance: d(p, q) = sqrt(sum_i (p_i - q_i)^2)
- Determine the class from the nearest-neighbor list
- Take the majority vote of the class labels among the
k nearest neighbors
- Optionally, weigh each vote according to distance
- e.g., weight factor w = 1/d^2 (see the sketch after this list)
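A sketch of the distance-weighted vote with w = 1/d^2, under the same assumptions as the earlier sketch (numeric attributes, Euclidean distance); the small eps term is a design choice to avoid division by zero when a training record coincides with the query.

import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_query, k=3, eps=1e-9):
    # Euclidean distances from the query to all training records
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Each neighbor votes with weight w = 1 / d^2; eps avoids division by zero
    scores = defaultdict(float)
    for i in nearest:
        scores[y_train[i]] += 1.0 / (dists[i] ** 2 + eps)
    return max(scores, key=scores.get)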
10. Nearest Neighbor Classification
- Choosing the value of k (a selection sketch follows this list)
- If k is too small, the classifier is sensitive to noise points
- If k is too large, the neighborhood may include
points from other classes
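One common way to pick k (not prescribed by the slides) is cross-validated accuracy over a range of candidate values; the sketch below uses scikit-learn and the Iris data set purely for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9, 15, 25):
    # 5-fold cross-validated accuracy of a k-NN classifier for this k
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  mean CV accuracy={acc:.3f}")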
12. k-NN: More Complex Decision Boundaries
13. Nearest Neighbor Classification
- Scaling issues
- Attributes may have to be scaled to prevent
the distance measure from being dominated by one of
the attributes (see the sketch after this list)
- Example
- height of a person may vary from 1.5 m to 1.8 m
- weight of a person may vary from 90 lb to 300 lb
- income of a person may vary from $10K to $1M
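A sketch of the scaling issue above: without rescaling, the income attribute dominates the Euclidean distance; per-attribute min-max scaling (one common remedy) puts all three attributes on a comparable [0, 1] range. The values are illustrative.

import numpy as np

# Columns: height (m), weight (lb), income ($)
X = np.array([[1.5,  90.0,    10_000.0],
              [1.8, 300.0, 1_000_000.0],
              [1.7, 160.0,    50_000.0]])

# Min-max scaling applied per attribute (column)
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)
print(X_scaled)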
14. Summary: Nearest Neighbor Classifiers
- k-NN classifiers are lazy learners
- Unlike eager learners such as decision tree
induction and rule-based systems, they do not
build models explicitly
- Classifying unknown records is therefore relatively
expensive
- They rely on local knowledge (let the data speak for
themselves) rather than on global models to make
decisions.
- k-NN classifiers rely on a distance function; the
quality of the distance function is critical for
the performance of a k-NN classifier.
- They are capable of creating quite complex decision
boundaries, which consist of edges of the Voronoi
diagram.
- k-NN classifiers never actually compute decision
boundaries, in contrast to decision trees/SVMs.
- k-NN classifiers obtain high accuracies and are
quite popular in some fields, such as text data
mining and information retrieval in general.