DATA CLUSTERING WITH KERNEL K-MEANS++
Matt Strautmann, Dept. of Electrical and Computer Engineering
Dr. Donald C. Wunsch II, Dept. of Electrical and Computer Engineering
  • PROJECT OBJECTIVES
  • PROJECT GOAL
  • Experimentally demonstrate the application of
    Kernel K-Means to data sets that are not
    linearly separable
  • ACADEMIC IMPORTANCE
  • Expand the application of the Kernel K-Means
    clustering algorithm to non-traditional uses

PROJECT DATASETS
  • DISCUSSION
  • Kernel K-Means was found to cluster the test
    datasets better than Soft K-Means
  • Kernel data mapping was seen to resolve
    overlapping clusters by:
  • Mapping the data, before clustering, to a
    higher-dimensional feature space using a
    nonlinear function
  • Partitioning the points with linear separators in
    the new space
  • Soft K-Means could not successfully cluster the
    Lung Cancer Dataset: only one cluster out of
    three was successfully clustered
  • Soft K-Means clustered the two-dimensional,
    two-cluster Gaussian dataset with only one error
    out of the one thousand data points
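
The mapping-then-partitioning idea in the discussion can be sketched without ever building the feature space explicitly, because the squared distance from a mapped point to a cluster mean can be written purely in terms of kernel values. The sketch below is a minimal illustration under stated assumptions, not the poster's implementation: the RBF kernel, the `gamma` parameter, the initial labels, and the function names `rbf_kernel` and `kernel_kmeans` are all hypothetical choices, since the poster does not say which kernel or settings were used.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel; an assumed choice, not stated on the poster."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_kmeans(points, init_labels, k, gamma, max_iter=100):
    """Reassign points to the nearest cluster mean in feature space.

    Uses ||phi(x_i) - m_c||^2 = K_ii - 2 * mean_j K_ij + mean_jl K_jl,
    where j and l range over the current members of cluster c.
    """
    n = len(points)
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)]
         for i in range(n)]
    labels = list(init_labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # The last term of the distance depends only on the cluster.
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in clusters
        ]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue  # an empty cluster has no mean to compare against
                cross = sum(K[i][j] for j in members) / len(members)
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    # Start from a deliberately imperfect labeling of the third point.
    print(kernel_kmeans(pts, [0, 0, 1, 1, 1, 1], k=2, gamma=0.01))
```

The result depends strongly on the starting labels and on `gamma`, which is consistent with the poster's remark that initialization mattered most for convergence.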

Iris Plant Dataset
eleves.ens.fr
  • BACKGROUND
  • WHAT IS K-MEANS CLUSTERING?
  • K-Means clustering aims to divide the dataset
    into clusters (groups) in which each data point
    belongs to the cluster with the nearest mean
    vector.
  • WHAT IS KERNEL K-MEANS?
  • Sum-of-squares algorithm
  • Two-step process: data point assignment and
    update
  • WHAT IS THE PLUS PLUS INITIALIZATION SCHEME?
  • The first mean vector is a randomly selected data
    point
  • Each subsequent mean vector is chosen by
    evaluating randomly selected data points against
    a weighted selection probability
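
The PLUS PLUS scheme described above can be sketched as follows. This is a minimal illustration that assumes the standard k-means++ weighting, in which each candidate point is drawn with probability proportional to its squared distance from the nearest mean already chosen; the poster does not spell out its exact weighting, and the names `kmeans_pp_init` and `squared_distance` are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial mean vectors from the data points."""
    rng = rng or random.Random()
    # The first mean vector is a randomly selected data point.
    means = [list(rng.choice(points))]
    while len(means) < k:
        # Weight each point by its squared distance to the nearest chosen mean
        # (assumed weighting: the standard k-means++ rule).
        weights = [min(squared_distance(p, m) for m in means) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen mean; fall back to uniform.
            means.append(list(rng.choice(points)))
        else:
            means.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return means


if __name__ == "__main__":
    data = [(0, 0), (0, 0), (10, 10)]
    print(sorted(kmeans_pp_init(data, k=2, rng=random.Random(0))))
```

Points that sit on an already chosen mean get zero weight, so the second mean in the small demo always lands on the other distinct location.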

2 Dimension, 2 Cluster Dataset (Gaussian 2D2K)
lans.ece.utexas.edu
  • CONCLUDING REMARKS
  • The initialization was seen to be the most
    important factor in the algorithm converging
  • The PLUS PLUS cluster mean initialization was
    seen to improve the results
  • Kernel assignment works better than the maximum
    responsibility calculation of Soft K-Means
  • Kernel K-Means can handle both low- and
    high-dimensional datasets well; the increase in
    dimensionality seemed to be advantageous, with
    the Lung Cancer Dataset (56 dimensions) clustered
    more accurately than the Iris Plant Dataset
    (4 dimensions)
  • Kernel K-Means produced superior results to
    Soft K-Means when clustering the Lung Cancer
    Dataset and demonstrated recognition of all three
    clusters
  • SOFT K-MEANS VS. KERNEL K-MEANS

Soft K-Means (statistics over ten runs)
Dataset                 Clustering Accuracy Average   Standard Deviation of Accuracy   Variance of Accuracy
Iris Plant Dataset      28.00                         8.218                            2.867
Lung Cancer Dataset     43.75                         -                                -
2D2K Gaussian Dataset   99.00                         -                                -
8D5K Gaussian Dataset   58.50                         2.082                            0.043
http://en.wikipedia.org/wiki/K-means_clustering
Kernel K-Means (statistics over ten runs)
Dataset                 Clustering Accuracy Average   Standard Deviation of Accuracy   Variance of Accuracy
Iris Plant Dataset      57.00                         5.009                            2.238
Lung Cancer Dataset     62.00                         6.878                            0.473
2D2K Gaussian Dataset   96.50                         1.677                            0.028
8D5K Gaussian Dataset   76.31                         10.366                           1.075
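
The poster does not state how clustering accuracy was scored. One common convention, shown here purely as an assumption, is to try every one-to-one mapping from cluster indices to class labels and report the best percentage of matching points. The function name `clustering_accuracy` is hypothetical, and the exhaustive search is only practical for a small number of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels, k):
    """Best-match accuracy in percent over all cluster-to-class mappings."""
    best_hits = 0
    for mapping in permutations(range(k)):
        # mapping[c] is the class label assigned to cluster index c.
        hits = sum(
            1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t
        )
        best_hits = max(best_hits, hits)
    return 100.0 * best_hits / len(true_labels)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    found = [1, 1, 0, 0, 2, 0]  # same grouping under other names, one miss
    print(round(clustering_accuracy(truth, found, k=3), 2))
```

Scoring this way makes the result independent of the arbitrary numbering that a clustering run assigns to its clusters.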
1.) Initial Mean Orientations
2.) Voronoi Diagram Generated by the Means
(data points associated with nearest cluster
mean)
  • FUTURE WORK
  • Further improvement of the mean vector
    initialization is believed possible over the
    PLUS PLUS initialization
  • Other options for the mean-squared error
    calculation for data point evaluation are
    possible
  • The time analysis of the algorithm must still be
    calculated
  • ACKNOWLEDGEMENTS
  • The author would like to acknowledge the
    expertise of Dr. Rui Xu in advising this project.
  • RESULTS COMPARISON
  • Kernel K-Means clustering accuracy was superior
    in all cases except the two-dimensional,
    two-cluster dataset.
  • The clustering accuracy of the datasets
    changed by the following amounts:
  • Iris Plant: 104%
  • Lung Cancer: 38%
  • 2D2K: -2.5%
  • 8D5K: 30%

http://en.wikipedia.org/wiki/K-means_clustering
3.) Cluster Centroid Becomes New Cluster Mean
4.) Steps 2 and 3 Repeated until Convergence
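
The four numbered steps in the figure captions (initial means, assignment of each point to its nearest mean, centroid update, repetition until convergence) can be sketched as a short loop. This is a minimal illustration of standard K-Means with hypothetical function names, not the code used for the poster; keeping the old mean when a cluster becomes empty is an assumption made here for simplicity.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_mean(point, means):
    """Index of the mean vector closest to the point (step 2)."""
    return min(range(len(means)), key=lambda j: squared_distance(point, means[j]))


def kmeans(points, initial_means, max_iter=100):
    means = [list(m) for m in initial_means]  # step 1: initial means
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_mean(p, means) for p in points]  # step 2
        if new_labels == labels:
            break  # step 4: assignments unchanged, so the loop has converged
        labels = new_labels
        for j in range(len(means)):  # step 3: centroid becomes the new mean
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, means


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```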
  • APPROACH
  • Evaluate standard (Soft) K-Means against four
    datasets to form a benchmark
  • Hybridize Soft K-Means with Kernel K-Means to
    form Kernel K-Means
  • Test Kernel K-Means on small-size,
    small-dimension Gaussian, large-dimension
    Gaussian, and large-size datasets
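
For the Soft K-Means benchmark, the responsibility calculation mentioned in the concluding remarks can be sketched as below. This is a minimal illustration under assumptions: responsibilities are taken to be a softmax over negative squared distances scaled by a stiffness parameter `beta`, and a hard label is read off as the cluster with the maximum responsibility. The parameter `beta`, its default value, and the function names are hypothetical and do not come from the poster.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def responsibilities(points, means, beta):
    """Soft assignment: each row sums to 1 across the clusters."""
    resp = []
    for p in points:
        dists = [squared_distance(p, m) for m in means]
        d_min = min(dists)  # shift exponents for numerical stability
        weights = [math.exp(-beta * (d - d_min)) for d in dists]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp


def soft_kmeans(points, initial_means, beta=1.0, n_iter=50):
    means = [list(m) for m in initial_means]
    dim = len(points[0])
    for _ in range(n_iter):
        resp = responsibilities(points, means, beta)
        for j in range(len(means)):
            mass = sum(r[j] for r in resp)
            if mass > 0.0:
                # Update: responsibility-weighted average of all points.
                means[j] = [
                    sum(r[j] * p[d] for r, p in zip(resp, points)) / mass
                    for d in range(dim)
                ]
    resp = responsibilities(points, means, beta)
    labels = [r.index(max(r)) for r in resp]  # maximum responsibility
    return labels, means


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(soft_kmeans(pts, [(0, 0), (10, 10)]))
```

A larger `beta` makes the assignments harder and the behaviour closer to standard K-Means; a smaller value lets distant points pull on every mean.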
