Title: Clustering and Visual Data Analysis
1. Clustering and Visual Data Analysis
- Ata Kaban
- The University of Birmingham
2. The Clustering Problem
Unsupervised Learning
Data (input)
Interesting structure (output)
- "Interesting" structure:
- contains essential characteristics
- discards inessential details
- provides a compact summary of the data (e.g. to visualise it on the screen)
- is interpretable for humans
- etc.
An objective function expresses our notion of interestingness for this data.
3. One reason for clustering data
- Here is some data
- Assume you transmit the coordinates of points drawn randomly from this data set.
- You are only allowed to send a small number (say 2 or 3) of bits per point, so it will be a lossy transmission.
- Loss = sum of squared errors between the original and the decoded coordinates.
- Which encoder / decoder will lose the least information?
10. Formalising
- What objective does K-means optimise?
- Given an encoder function ENC : R^T → {1..K}
- (T is the dimension of the data, K is the number of clusters)
- Given a decoder function DEC : {1..K} → R^T
- DISTORTION = sum_n || x_n − DEC(ENC(x_n)) ||²
- where DEC(k) = µ_k are the cluster centres, k = 1..K
- So DISTORTION = sum_{n=1..N} || x_n − µ_ENC(x_n) ||², where N is the number of points (a numpy sketch of ENC, DEC and DISTORTION follows)
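A minimal numpy sketch of this objective (the function names encode, decode and distortion are my own, not from the slides): the encoder maps each point to the index of its nearest centre, the decoder maps an index back to that centre, and the distortion sums the squared reconstruction errors.

    import numpy as np

    def encode(X, centres):
        # ENC: R^T -> {1..K}; here, the index of the nearest centre (0-based)
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        return np.argmin(dists, axis=1)

    def decode(codes, centres):
        # DEC: {1..K} -> R^T; DEC(k) = mu_k, the centre of cluster k
        return centres[codes]

    def distortion(X, centres):
        # DISTORTION = sum_n || x_n - DEC(ENC(x_n)) ||^2
        return np.sum((X - decode(encode(X, centres), centres)) ** 2)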
11. The minimal distortion
- DISTORTION = sum_n || x_n − µ_ENC(x_n) ||²
- We want this to be minimised.
- What properties must µ_1, ..., µ_K satisfy at the minimum?
- 1) Each point x_n must be encoded by its nearest centre; otherwise DISTORTION could be reduced by replacing ENC(x_n) with the index of the centre nearest to x_n.
- 2) Each µ_k must be the centroid of its own points (a one-line derivation follows).
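As a sketch of why condition 2 holds, in the notation above: setting the gradient of the distortion with respect to µ_k to zero forces µ_k to be the centroid of the points encoded to cluster k,

    \frac{\partial}{\partial \mu_k} \sum_{n:\,\mathrm{ENC}(x_n)=k} \lVert x_n - \mu_k \rVert^2
      = -2 \sum_{n:\,\mathrm{ENC}(x_n)=k} (x_n - \mu_k) = 0
      \;\Longrightarrow\;
      \mu_k = \frac{1}{N_k} \sum_{n:\,\mathrm{ENC}(x_n)=k} x_n ,

where N_k is the number of points encoded to cluster k.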
12. The K-means algorithm
- If N is the known number of points and K the desired number of clusters, the K-means algorithm is (a runnable numpy sketch follows this slide):
- Begin
- initialise µ_1, µ_2, ..., µ_K (randomly selected)
- do classify the N samples according to the nearest µ_i
-    recompute each µ_i as the centroid of the samples assigned to it
- until no change in any µ_i
- return µ_1, µ_2, ..., µ_K
- End
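A minimal, self-contained numpy sketch of this procedure (the function name kmeans and the initialisation from randomly chosen data points are assumptions of this sketch, not prescribed by the slides):

    import numpy as np

    def kmeans(X, K, seed=0):
        rng = np.random.default_rng(seed)
        # initialise mu_1..mu_K as K randomly selected data points
        centres = X[rng.choice(len(X), size=K, replace=False)].astype(float)
        assignment = None
        while True:
            # classify each sample according to the nearest centre
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            new_assignment = np.argmin(dists, axis=1)
            if assignment is not None and np.array_equal(new_assignment, assignment):
                break  # assignments unchanged, so the centres will not change either
            assignment = new_assignment
            # recompute each centre as the centroid of its own points
            for k in range(K):
                if np.any(assignment == k):
                    centres[k] = X[assignment == k].mean(axis=0)
        return centres, assignment

For example, on two well-separated blobs the two returned centres land near the blob means:

    X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (50, 2)),
                   np.random.default_rng(2).normal(4, 0.5, (50, 2))])
    centres, labels = kmeans(X, K=2)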
16. Other forms of clustering
- Many times clusters are not disjoint: a cluster may have subclusters, in turn having sub-subclusters.
- → Hierarchical clustering
17.
- Given any two samples x and x′, they will be grouped together at some level, and if they are grouped at level k, they remain grouped at all higher levels.
- Hierarchical clustering → a tree representation called a dendrogram (sketched below)
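As a sketch of what a dendrogram looks like in practice (assuming SciPy and matplotlib are available; the library choice is mine, not the slides'):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(4, 0.5, (5, 2))])

    Z = linkage(X, method='average')  # the sequence of pairwise merges
    dendrogram(Z)                     # tree representation of the groupings
    plt.ylabel('merge level (distance)')
    plt.show()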
18.
- The similarity values may help to determine whether the groupings are natural or forced, but if they are evenly distributed no information can be gained.
- Another representation is based on sets, e.g., Venn diagrams.
19.
- Hierarchical clustering can be divided into agglomerative and divisive methods.
- Agglomerative (bottom-up, clumping): start with n singleton clusters and form the sequence by merging clusters.
- Divisive (top-down, splitting): start with all of the samples in one cluster and form the sequence by successively splitting clusters.
20. Agglomerative hierarchical clustering
- The procedure terminates when the specified number of clusters has been obtained, and returns the clusters as sets of points, rather than a mean or a representative vector for each cluster (see the sketch after this slide).
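A hedged sketch of this procedure using SciPy's agglomerative routines (my choice of library; the slides do not name one), cutting the merge tree once the specified number of clusters is reached and returning the clusters as sets of point indices:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(4, 0.5, (10, 2))])

    Z = linkage(X, method='single')                  # bottom-up merging of singleton clusters
    labels = fcluster(Z, t=2, criterion='maxclust')  # stop once 2 clusters remain

    # each cluster is returned as the set of points it contains, not a centre vector
    clusters = {k: set(np.flatnonzero(labels == k)) for k in np.unique(labels)}
    print(clusters)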
21. The problem of the number of clusters
- Typically, the number of clusters is known.
- When it is not, we face a hard problem called model selection. There are several ways to proceed.
- A common approach is to repeat the clustering with c = 1, c = 2, c = 3, etc. (a sketch follows below).
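One common instance of this, sketched with scikit-learn's KMeans (the library choice is an assumption, not from the slides): run the clustering for c = 1, 2, 3, ... and watch how the distortion (inertia_) decreases as c grows.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2)),
                   rng.normal((0, 4), 0.5, (50, 2))])

    for c in range(1, 7):
        km = KMeans(n_clusters=c, n_init=10, random_state=0).fit(X)
        print(c, km.inertia_)  # inertia_ is the DISTORTION of slide 10 for this c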
22. What did we learn today?
- Data clustering
- K-means algorithm in detail
- How K-means can get stuck, and how to take care of that
- The outline of hierarchical clustering methods
23. Pattern Classification: find out more here!
- Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000