Clustering Algorithms - PowerPoint PPT Presentation

Provided by: mcatheri
1
Clustering Algorithms
2
K Nearest Neighbors (KNN)
3
K Nearest Neighbor
  • Store all input/output pairs in the training set.
  • For each pattern in the test set, search for the K
    nearest training patterns to the input pattern
    using a Euclidean distance measure.
  • For classification, compute the confidence for
    each class as Ci/K, where Ci is the number of
    patterns among the K nearest patterns belonging
    to class i. The classification of the input
    pattern is the class with the highest confidence.
  • For estimation, the output value is based on the
    average of the output values of the K nearest
    patterns.
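
The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's implementation: the function names (`k_nearest`, `knn_classify`, `knn_estimate`) and the toy data are hypothetical, and the sketch assumes K is no larger than the training set.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train, query, k):
    """Return the k stored (input, output) pairs whose inputs are closest to query."""
    return sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]

def knn_classify(train, query, k):
    """Return (best_class, confidences); the confidence of class i is Ci / K."""
    neighbors = k_nearest(train, query, k)
    counts = Counter(label for _, label in neighbors)
    confidences = {label: c / k for label, c in counts.items()}
    best = max(confidences, key=confidences.get)
    return best, confidences

def knn_estimate(train, query, k):
    """Return the plain average of the outputs of the k nearest patterns."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / k

# Hypothetical toy training sets: (input vector, output) pairs.
labeled = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
           ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
numeric = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 50.0)]

best, conf = knn_classify(labeled, (0.1, 0.1), k=5)
estimate = knn_estimate(numeric, (1.0,), k=3)
```

With K = 5 every stored pattern votes, so the confidences are simply the class proportions of the toy set; smaller K makes the vote more local.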

4
K Nearest Neighbor Settings
  • Number of Nearest Neighbors (K)
  • should be based on cross validation over many K
    settings. As a general guide, K is tied to p, the
    total number of training patterns.
  • Input Compression
  • used if storage/memory is an issue
  • affects precision of algorithm
  • Distance Metric
  • Examples: Euclidean, Manhattan, absolute
    dimension
  • Combination of the K neighbors
  • weight them equally or use a weighted average
  • May use Principal Component Analysis to map
    higher dimensional inputs onto a few meaningful
    dimensions so that the KNN problem stays feasible
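
Several of these settings can be tried side by side in a small sketch. It is only an illustration under stated assumptions: leave-one-out is used as the cross-validation scheme, inverse distance is used as the neighbor weighting (one common choice, not something the slides specify), and all function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k, metric=euclidean, weighted=False):
    """Classify query by an equal or distance-weighted vote of its k nearest patterns."""
    ranked = sorted(((metric(x, query), label) for x, label in train),
                    key=lambda t: t[0])[:k]
    votes = defaultdict(float)
    for dist, label in ranked:
        # Equal vote, or inverse-distance weight (an assumed weighting scheme).
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

def loo_accuracy(data, k, **kwargs):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out pattern i
        hits += knn_predict(rest, x, k, **kwargs) == label
    return hits / len(data)

def choose_k(data, candidates, **kwargs):
    """Return the candidate k with the highest leave-one-out accuracy (first wins ties)."""
    return max(candidates, key=lambda k: loo_accuracy(data, k, **kwargs))

# Hypothetical toy data: two well-separated groups of three patterns.
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
best_k = choose_k(data, [1, 3, 5])
```

On this toy set an equal vote with K = 5 misclassifies every held-out pattern, because the held-out class is always outnumbered, while the weighted vote recovers; that is the kind of interaction between K and the combination rule that cross validation is meant to expose.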

5
Nearest Cluster
  • A condensed version of KNN generally used for
    classification
  • Partitions the training set into a few clusters
    of neighbors
  • Each cluster stores a numerical value for the
    posterior probability of every possible class,
    given the input attributes of the cluster's members
  • A new item is classified by finding its nearest
    cluster and using that cluster's posterior
    probability estimates to estimate the class for
    the new item.

6
Nearest Cluster Training
  • Perform K means clustering on the training data
  • For each cluster, generate a probability for each
    class according to

    $P_{jk} = \frac{N_{jk}}{N_k}$

where Pjk is the probability for class j within
cluster k, Njk is the number of class-j
patterns belonging to cluster k, and Nk is the
number of patterns belonging to cluster k.
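
Given the definitions above, each Pjk is the share of cluster k's patterns that belong to class j, that is, Njk divided by Nk. A minimal sketch of that bookkeeping follows; the function name and the example assignments are hypothetical, and the cluster assignments are assumed to come from an earlier clustering step.

```python
from collections import Counter, defaultdict

def cluster_class_probabilities(assignments, labels):
    """Return {k: {j: Pjk}} where Pjk = Njk / Nk.

    assignments[i] is the cluster index of training pattern i,
    labels[i] is its class.
    """
    counts = defaultdict(Counter)
    for k, j in zip(assignments, labels):
        counts[k][j] += 1                      # accumulate Njk
    probs = {}
    for k, class_counts in counts.items():
        n_k = sum(class_counts.values())       # Nk
        probs[k] = {j: n_jk / n_k for j, n_jk in class_counts.items()}
    return probs

# Hypothetical example: five patterns, two clusters, two classes.
probs = cluster_class_probabilities([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
```

Classes that never occur in a cluster are simply absent from its dictionary, which corresponds to a probability of zero.
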
7
Nearest Cluster Testing
  • For each input pattern, X, find the nearest
    cluster, Ck, using the Euclidean distance
    measure

    $d(X, Y) = \sqrt{\sum_{i=1}^{m} (X_i - Y_i)^2}$

where Y is a cluster center, and m is the
number of dimensions in the input patterns.
  • Use the probabilities Pjk for all classes j
    stored with Ck, and classify pattern X into the
    class j with the highest probability.
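
The two testing steps can be sketched as follows. This is an illustration only: the cluster centers and the per-cluster class probabilities are hypothetical values standing in for the output of training, and the function names are not from the slides.

```python
import math

def euclidean(x, y):
    """Euclidean distance over the m dimensions of x and y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def nearest_cluster(x, centers):
    """Index k of the cluster center closest to pattern x."""
    return min(range(len(centers)), key=lambda k: euclidean(x, centers[k]))

def classify(x, centers, probs):
    """Class j with the highest stored probability Pjk in x's nearest cluster."""
    k = nearest_cluster(x, centers)
    return max(probs[k], key=probs[k].get)

# Hypothetical trained model: two cluster centers with stored class probabilities.
centers = [(0.0, 0.0), (5.0, 5.0)]
probs = {0: {"a": 0.8, "b": 0.2}, 1: {"a": 0.1, "b": 0.9}}

label_near_origin = classify((1.0, 1.0), centers, probs)
label_far = classify((4.0, 6.0), centers, probs)
```

Only the cluster centers and their probability tables are consulted at test time, which is what makes this a condensed alternative to searching the whole training set.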

8
K means Clustering
  • Initialize the cluster centers: the user chooses
    how many there are, and each is selected at
    random from the training set.
  • Classify the entire training set. For each
    pattern Xi in the training set, find the nearest
    cluster center C and classify Xi as a member of
    C.
  • For each cluster, recompute its center by finding
    the mean of the cluster

    $M_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{jk}$

where Mk is the new mean, Nk is the number of
training patterns in cluster k, and Xjk is the
j-th pattern belonging to cluster k.
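
The three steps above can be sketched in plain Python. The transcript does not state a stopping rule; the sketch assumes the usual one of repeating the assignment and mean steps until no pattern changes cluster, with a cap on iterations. The function name, the `max_iters` and `seed` parameters, and the handling of empty clusters are likewise assumptions.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance (enough for finding the nearest center)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(patterns, k, max_iters=100, seed=0):
    """Return (centers, assignments) for k clusters over a list of tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct training patterns at random as initial centers.
    centers = [tuple(p) for p in rng.sample(patterns, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign each pattern Xi to its nearest cluster center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(x, centers[c]))
            for x in patterns
        ]
        if new_assignments == assignments:
            break  # assumed stopping rule: no pattern changed cluster
        assignments = new_assignments
        # Step 3: recompute each center Mk as the mean of its Nk members.
        for c in range(k):
            members = [x for x, a in zip(patterns, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignments

# Hypothetical toy data: two well-separated groups of three patterns.
patterns = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignments = k_means(patterns, 2)
```

The result depends on the random initial centers in general; on data this well separated the same two groups are recovered regardless of the starting choice.
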