Clustering Algorithms presentation


1
Clustering Algorithms
2
K Nearest Neighbors (KNN)
3
K Nearest Neighbor

Store all input/output pairs in the training set.
For each pattern in the test set:
Search for the K nearest patterns to the input
pattern using a Euclidean distance measure.
For classification, compute the confidence for
each class as Ci/K, where Ci is the number of
patterns among the K nearest patterns belonging
to class i. The classification of the input
pattern is the class with the highest confidence.
For estimation, the output value is based on the
average of the output values of the K nearest
patterns.
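
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`k_nearest`, `knn_classify`, `knn_estimate`) and the representation of the training set as a list of `(input, output)` pairs are assumptions made for the example, and K is assumed to be no larger than the training set.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train, query, k):
    """Return the k stored (input, output) pairs closest to the query input."""
    return sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]


def knn_classify(train, query, k):
    """Classification: confidence for class i is Ci / K; pick the highest."""
    neighbors = k_nearest(train, query, k)
    counts = Counter(label for _, label in neighbors)
    confidences = {label: c / k for label, c in counts.items()}
    best = max(confidences, key=confidences.get)
    return best, confidences


def knn_estimate(train, query, k):
    """Estimation: plain average of the outputs of the K nearest patterns."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / k
```

Because all training pairs are stored and searched for every query, the cost of each prediction grows with the size of the training set, which is what motivates the compression and clustering ideas on the following slides.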

4
K Nearest Neighbor Settings

Number of Nearest Neighbors (K)
should be based on cross-validation over many K
settings. Generally, K is set as a function of p,
the total number of training patterns.
Input Compression
used if storage/memory is an issue
affects precision of algorithm
Distance Metric
Examples: Euclidean, Manhattan, absolute
dimension
Combination of the K neighbors
make them equal or use a weighted average
May use Principal Component Analysis to map
higher-dimensional inputs into key meaningful
dimensions for a feasible KNN problem
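
Several of these settings can be tried out in a small sketch. The code below is illustrative only: it shows two of the listed metrics (Euclidean and Manhattan), an equal versus weighted combination of the neighbors, and leave-one-out cross-validation as one possible way to compare K settings. The inverse-distance weighting scheme, the squared-error score, and all function names are assumptions for this example and are not taken from the slides.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_estimate(train, query, k, metric=euclidean, weighted=False):
    """Estimate the output for query from its k nearest (input, output) pairs."""
    scored = sorted(((metric(x, query), y) for x, y in train),
                    key=lambda t: t[0])[:k]
    if not weighted:
        # Equal combination: plain average of the neighbor outputs.
        return sum(y for _, y in scored) / len(scored)
    # Weighted combination (assumed scheme): inverse-distance weights.
    eps = 1e-9  # avoids division by zero when a neighbor matches exactly
    weights = [1.0 / (d + eps) for d, _ in scored]
    return sum(w * y for w, (_, y) in zip(weights, scored)) / sum(weights)


def loo_error(train, k, metric=euclidean):
    """Mean squared leave-one-out error of the equal-weight estimator."""
    total = 0.0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]  # hold out pattern i
        total += (knn_estimate(rest, x, k, metric) - y) ** 2
    return total / len(train)


def choose_k(train, candidates, metric=euclidean):
    """Pick the candidate K with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_error(train, k, metric))
```

Leave-one-out is used here only because it needs no extra parameters; any other cross-validation split would serve the same purpose of comparing K settings on held-out patterns.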

5
Nearest Cluster

A condensed version of KNN generally used for
classification
Partitions the training set into a few clusters
of neighbors
Each cluster has a numerical value for the posterior
probability of each possible class given the
input attributes of the members of the cluster
A new item is classified by finding its nearest
cluster and using that cluster's posterior
probability estimates to estimate the class for
the new item.

6
Nearest Cluster Training

Perform K means clustering on the training data
For each cluster, generate a probability for each
class according to

P_{jk} = \frac{N_{jk}}{N_k}

where P_{jk} is the probability for class j within
cluster k, N_{jk} is the number of class-j
patterns belonging to cluster k, and N_k is the
number of patterns belonging to cluster k.
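
As a small illustration of the probability step, the sketch below counts class members per cluster and divides by the cluster size. It assumes the clustering has already been done and is supplied as a list of cluster indices, one per training pattern; the function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_class_probabilities(assignments, labels):
    """Return {cluster k: {class j: N_jk / N_k}}.

    assignments[i] is the cluster index of training pattern i,
    labels[i] is its class.
    """
    counts = defaultdict(Counter)
    for k, j in zip(assignments, labels):
        counts[k][j] += 1  # accumulates N_jk
    probs = {}
    for k, class_counts in counts.items():
        n_k = sum(class_counts.values())  # N_k
        probs[k] = {j: n_jk / n_k for j, n_jk in class_counts.items()}
    return probs
```

Classes that never occur in a cluster are simply absent from that cluster's dictionary, which corresponds to a probability of zero.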
7
Nearest Cluster Testing

For each input pattern, X, find the nearest
cluster, C_k, using the Euclidean distance
measure

D(X, Y) = \sqrt{\sum_{i=1}^{m} (X_i - Y_i)^2}

where Y is a cluster center, and m is the
number of dimensions in the input patterns.

Use the probabilities P_{jk} for all classes j
stored with C_k, and classify pattern X into the
class j with the highest probability.
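
The testing step can be sketched as follows, assuming the cluster centers and the per-cluster class probabilities are already available. The names `classify_by_nearest_cluster`, `centers`, and `probs` are hypothetical, chosen for this example.

```python
import math


def euclidean(x, y):
    """Euclidean distance over the m dimensions of the two patterns."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def classify_by_nearest_cluster(x, centers, probs):
    """Find the nearest cluster center, then pick its most probable class.

    centers[k] is the center of cluster k; probs[k] maps each class j
    to the probability stored with cluster k.
    """
    k = min(range(len(centers)), key=lambda idx: euclidean(x, centers[idx]))
    best_class = max(probs[k], key=probs[k].get)
    return k, best_class
```

Only one distance per cluster is computed for each new pattern, rather than one per training pattern as in plain KNN.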

8
K means Clustering

Initialize the user-selected number of cluster
centers by randomly selecting them from the
training set.
Classify the entire training set. For each
pattern Xi in the training set, find the nearest
cluster center C and classify Xi as a member of
C.
For each cluster, recompute its center by finding
the mean of the cluster

M_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{jk}

where M_k is the new mean, N_k is the number of
training patterns in cluster k, and X_{jk} is the
j-th pattern belonging to cluster k.
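
A compact sketch of these steps is given below. The transcript lists only initialization, classification, and recomputation of the means; the loop that repeats the last two steps until the assignments stop changing is the usual K-means stopping rule and is an assumption here, as are the `max_iters` and `seed` parameters and the choice to leave an empty cluster's center unchanged.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(x, centers):
    """Index of the cluster center closest to pattern x."""
    return min(range(len(centers)), key=lambda k: euclidean(x, centers[k]))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(patterns, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k initial centers at random from the training set.
    centers = rng.sample(patterns, k)
    assignments = None
    for _ in range(max_iters):
        # Step 2: classify every pattern as a member of its nearest center.
        new_assignments = [nearest_center(x, centers) for x in patterns]
        if new_assignments == assignments:
            break  # assumed stopping rule: no pattern changed cluster
        assignments = new_assignments
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [x for x, a in zip(patterns, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean(members)
    return centers, assignments
```

The returned `assignments` list can be combined with the class labels of the training patterns to compute the per-cluster class probabilities used by the Nearest Cluster method.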
