Cluster Analysis - PowerPoint PPT Presentation

Provided by: Information2216
Transcript and Presenter's Notes

1
Cluster Analysis
  • Classifying the Exoplanets

2
Cluster Analysis
  • Simple idea, difficult execution
  • Used for indexing large amounts of data in
    databases (a very hot skill to have: $70/hour).
  • "The best form of cluster analysis is ordination,
    because ordination is not a form of cluster
    analysis." - Morgan Byron
  • No formal definition of a cluster
  • Results are descriptive and subjective.

3
R Commands
  • library("scatterplot3d")
  • scatterplot3d(log(planets$mass),
    log(planets$period), log(planets$eccen),
    type = "h", angle = 55, scale.y = 0.7, pch = 16,
    y.ticklabs = seq(0, 10, by = 2),
    y.margin.add = 0.1)
  • log() takes the log of each data point
  • angle and scale.y set the viewing angle and the
    physical scale so the plot looks like a box
  • pch is the symbol used for the data points
  • seq() generates the tick labels for the y axis
  • y.margin.add adds a bit to the vertical margins

4
Interpretation
  • No real insight after our first view of the data,
    but it looks neat.

5
R Commands
  • rge <- apply(planets, 2, max) - apply(planets, 2,
    min)
  • Stores the range of each column of the data
  • 2 indicates the column margin of the data matrix
  • planet.dat <- sweep(planets, 2, rge, FUN = "/")
  • Divides each element in the matrix by the range
    of its column
  • n <- nrow(planet.dat)
  • wss <- rep(0, 10)
  • Creates a vector of ten zeros
  • wss[1] <- (n - 1) * sum(apply(planet.dat, 2, var))
  • This is the sum of squares of all the points if
    we put all the data in 1 group.
  • for (i in 2:10) wss[i] <- sum(kmeans(planet.dat,
    centers = i)$withinss)
  • For each number of groups from 2 to 10, runs
    kmeans and stores the sum of the within-group
    sums of squares.
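
The scaling and sum-of-squares steps on this slide can be tried without the planets data. The sketch below is only an illustration: the data frame toy and its values are made up, and nstart = 10 is an extra argument (not in the slide) that asks kmeans for several random starts.

```r
# Synthetic stand-in for the planets data (values are invented for illustration)
set.seed(1)
toy <- data.frame(
  mass   = c(rnorm(20, 2, 0.5),   rnorm(20, 10, 1)),
  period = c(rnorm(20, 400, 50),  rnorm(20, 1700, 100)),
  eccen  = c(rnorm(20, 0.1, 0.02), rnorm(20, 0.4, 0.05))
)

# Range of each column, then divide every column by its own range
rge <- apply(toy, 2, max) - apply(toy, 2, min)
toy.dat <- sweep(toy, 2, rge, FUN = "/")

# After scaling, every column should have a range of 1
scaled_range <- apply(toy.dat, 2, max) - apply(toy.dat, 2, min)
print(scaled_range)

# Within-group sum of squares for 1 to 6 groups
n <- nrow(toy.dat)
wss <- rep(0, 6)
wss[1] <- (n - 1) * sum(apply(toy.dat, 2, var))
for (i in 2:6) wss[i] <- sum(kmeans(toy.dat, centers = i, nstart = 10)$withinss)
print(round(wss, 3))
```

Dividing by the range keeps a variable with large raw values, such as period, from dominating the distances.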

6
The K-Means Method
  • K-means partitions the data into a chosen number
    of groups so as to minimize a numerical
    criterion, often based on a notion of distance.
  • The criterion used in this analysis is the sum
    of squares of the data within each group; we
    compare it across different numbers of groups.
  • Checking every possible partition is impractical:
    the number of partitions increases very quickly
    as the number of groups and data points
    increases.
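
To get a feel for how fast the number of partitions grows, one can count the ways of splitting n points into k non-empty groups (a Stirling number of the second kind). The function name count_partitions below is hypothetical; the value 101 is the number of planets implied by the group sizes reported later in the deck (14 + 53 + 34).

```r
# Number of ways to split n points into k non-empty groups,
# using the standard recurrence
#   S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1),  S(0, 0) = 1
count_partitions <- function(n, k) {
  S <- matrix(0, nrow = n + 1, ncol = k + 1)  # S[i + 1, j + 1] holds S(i, j)
  S[1, 1] <- 1
  for (i in 1:n) {
    for (j in 1:min(i, k)) {
      S[i + 1, j + 1] <- j * S[i, j + 1] + S[i, j]
    }
  }
  S[n + 1, k + 1]
}

print(count_partitions(4, 2))    # 4 points, 2 groups
print(count_partitions(10, 3))   # 10 points, 3 groups
print(count_partitions(101, 3))  # 101 points, 3 groups (floating-point approximation)
```

This is why kmeans searches iteratively from starting centers rather than enumerating every partition.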

7
The Elbow
  • In choosing a good number of partitions, looking
    for the elbow, the sharpest bend in the plot of
    the within-group sum of squares against the
    number of groups, is an easy approach.
  • The sharpest bends look to be at 3 and 5 groups.
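
A minimal sketch of how such an elbow plot can be drawn, assuming made-up data with three well-separated groups (the matrix toy, the seed, and nstart = 20 are all illustrative choices, not taken from the slides):

```r
set.seed(2)
# Hypothetical data: three well-separated groups in two dimensions
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.3), ncol = 2)
)

# Within-group sum of squares for 1 to 8 groups
n <- nrow(toy)
wss <- rep(0, 8)
wss[1] <- (n - 1) * sum(apply(toy, 2, var))
for (i in 2:8) wss[i] <- kmeans(toy, centers = i, nstart = 20)$tot.withinss

# The elbow is where the curve stops dropping steeply
plot(1:8, wss, type = "b",
     xlab = "Number of groups",
     ylab = "Within groups sum of squares")

# Reduction in the sum of squares gained by each additional group
drops <- -diff(wss)
print(round(drops, 2))
```

With data this clean the bend at 3 groups is obvious; on real data, as the slide notes, more than one bend may look plausible.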

8
Number of planets in the groups
  • planet_kmeans3 <- kmeans(planet.dat, centers =
    3)
  • We chose to try 3 groups
  • table(planet_kmeans3$cluster)
  •  1  2  3
  • 14 53 34
  • ccent <- function(cl) {
  •   f <- function(i) colMeans(planets[cl == i, ])
  • Finds the column means of the original (unscaled)
    planets data for cluster i
  •   x <- sapply(sort(unique(cl)), f)
  • Applies f to each cluster label, in sorted order
  •   colnames(x) <- sort(unique(cl))
  •   return(x)
  • }
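
To see what the cluster-mean function returns, here is a small sketch on invented data. The data frame toy, the label vector cl, and the name ccent_toy are hypothetical stand-ins for planets, the kmeans cluster labels, and ccent.

```r
# Tiny made-up data set with hand-assigned cluster labels
toy <- data.frame(mass = c(1, 2, 10, 12),
                  period = c(100, 200, 1500, 1700))
cl <- c(1, 1, 2, 2)

# Same structure as the function on the slide, applied to toy
ccent_toy <- function(cl) {
  f <- function(i) colMeans(toy[cl == i, ])  # column means of cluster i
  x <- sapply(sort(unique(cl)), f)           # one column of means per cluster
  colnames(x) <- sort(unique(cl))
  x
}

centers <- ccent_toy(cl)
print(centers)
```

The result is a matrix with one row per variable and one column per cluster, which is the layout of the results slide.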

9
The results
  • > ccent(planet_kmeans3$cluster)
  • Cluster          1           2           3
  • mass      10.56786   1.6710566   2.9276471
  • period  1693.17201 427.7105892 616.0760882
  • eccen      0.36650   0.1219491   0.4953529
  • Number          14          53          34

10
Model-Based Clustering in brief
  • The subjective decision, or assumption, is the
    number of clusters.
  • After that, it becomes a problem of finding the
    partition that maximizes the likelihood of the
    data.
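
The idea of scoring a partition by likelihood can be sketched in one dimension. This is a deliberately simplified illustration, not the procedure Mclust uses: each group is given its own normal distribution, and the function class_loglik and the data are hypothetical.

```r
# Made-up one-dimensional data with two obvious groups
x <- c(1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1)

# Log-likelihood of a partition: each group gets its own normal
# distribution, fitted by the group's mean and (maximum-likelihood) sd
class_loglik <- function(x, labels) {
  total <- 0
  for (g in unique(labels)) {
    xg <- x[labels == g]
    s <- sqrt(mean((xg - mean(xg))^2))
    total <- total + sum(dnorm(xg, mean = mean(xg), sd = s, log = TRUE))
  }
  total
}

good <- c(1, 1, 1, 1, 2, 2, 2, 2)  # separates low values from high values
bad  <- c(1, 2, 1, 2, 1, 2, 1, 2)  # mixes low and high values in each group

ll_good <- class_loglik(x, good)
ll_bad  <- class_loglik(x, bad)
print(c(good = ll_good, bad = ll_bad))
```

The partition that keeps similar values together gets the higher log-likelihood, which is the sense in which one partition is "better" than another.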

11
Mclust function
  • Mclust finds an appropriate model AND the optimal
    number of groups.
  • Not Free?!! Need a license agreement from
    University of Washington.
  • R Commands
  • library(mclust)
  • planet_mclust <- Mclust(planet.dat)
  • plot(planet_mclust, planet.dat)
  • print(planet_mclust)
  • The best model is of diagonal clusters of
    varying volume and shape with 3 groups

12
Homework
  • Spend 30 minutes attempting exercise 15.1 and
    send me what you get done.
  • Stick it to the Man!
  • Then practice your air guitar
  • zweihanderdawg_at_gmail.com