Title: Clustering Methods
1. Clustering Methods Part 6
Dimensionality
Ilja Sidoroff, Pasi Fränti
Speech and Image Processing Unit, Department of
Computer Science, University of Joensuu, FINLAND
2. Dimensionality of data
- Dimensionality of a data set is the minimum number
of free variables needed to represent the data
without information loss
- A d-attribute data set has an intrinsic
dimensionality (ID) of M if its elements lie
entirely within an M-dimensional subspace of R^d
(M < d)
3. Dimensionality of data
- The use of more dimensions than necessary leads
to problems
- greater storage requirements
- the speed of algorithms is slower
- finding clusters and creating good classifiers is
more difficult (curse of dimensionality)
4. Curse of dimensionality
- When the dimensionality of the space increases,
distance measures become less useful
- all points are more or less equidistant
- most of the volume of a sphere is concentrated in
a thin layer near the surface of the sphere (see
next slide)
5. [Figure: fraction of a sphere's volume lying near its surface as the
dimension grows; V(r) = volume of a sphere with radius r, D = dimension
of the sphere]
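For reference (standard results supplied here, not taken from the slide): the volume of a D-dimensional ball of radius r, and the fraction of it lying outside radius (1 - \varepsilon) r, are

V_D(r) = \frac{\pi^{D/2}}{\Gamma(D/2 + 1)}\, r^{D},
\qquad
1 - \frac{V_D\big((1-\varepsilon) r\big)}{V_D(r)} = 1 - (1-\varepsilon)^{D} \xrightarrow{\; D \to \infty \;} 1

so for large D essentially all of the volume sits in a thin shell near the surface.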
6. Two approaches
- Estimation of dimensionality
- knowing the ID of a data set could help in tuning
classification or clustering performance
- Dimensionality reduction
- projecting data to some subspace
- e.g. 2D/3D visualisation of a multi-dimensional data
set
- may result in information loss if the subspace
dimension is smaller than the ID
7. Goodness of the projection
- Can be estimated by two measures
- Trustworthiness: data points that are not
neighbours in the input space are not mapped as
neighbours in the output space
- Continuity: data points that are close are not
mapped far away in the output space [11]
8. Trustworthiness
- N - number of feature vectors
- r(i,j) - the rank of data sample j in the
ordering according to the distance from i in the
original data space
- Uk(i) - set of feature vectors that are in the
size-k neighbourhood of sample i in the
projection space but not in the original space
- A(k) - scales the measure between 0 and 1
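Combining these pieces, the trustworthiness measure in the form used in [11] is

M_T(k) = 1 - A(k) \sum_{i=1}^{N} \sum_{j \in U_k(i)} \big( r(i,j) - k \big),
\qquad A(k) = \frac{2}{N k (2N - 3k - 1)}

where the normalisation A(k) is part of that standard form rather than stated explicitly on this slide.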
9. Continuity
- r'(i,j) - the rank of data sample j in the
ordering according to the distance from i in the
projection space
- Vk(i) - set of feature vectors that are in the
size-k neighbourhood of sample i in the original
space but not in the projection space
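Continuity is defined analogously, using ranks in the projection space and the set Vk(i):

M_C(k) = 1 - A(k) \sum_{i=1}^{N} \sum_{j \in V_k(i)} \big( r'(i,j) - k \big)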
10. Example data sets
- Swiss roll: 20000 3D points
- 2D manifold in 3D space
- http://isomap.stanford.edu
11. Example data sets
- 16 × 16 pixel images of hands in different
positions
- Each image can be considered as a 4096-dimensional
data element
- Could also be interpreted in terms of finger
extension and wrist rotation (2D)
12. Example data sets
http://isomap.stanford.edu
13. Synthetic data sets [11]
S-shaped manifold
Sphere
Six clusters
14. Principal component analysis (PCA)
- Idea: find the directions of maximal variance and
align the coordinate axes to them
- If the variance is zero, that dimension is not
needed
- Drawback: works well only with linear data [1]
15. PCA method (1/2)
- Center the data so that its means are zero
- Calculate the covariance matrix of the data
- Calculate the eigenvalues and eigenvectors of the
covariance matrix
- Arrange the eigenvectors according to the eigenvalues
- For dimensionality reduction, choose the desired
number of eigenvectors (2 or 3 for visualization);
a sketch of these steps follows below
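A minimal NumPy sketch of these steps (the function name, its arguments, and the use of np.linalg.eigh are illustrative choices, not taken from the slides):

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project data X (n_samples x n_features) onto its leading principal components."""
    # 1. Center the data so that each attribute has zero mean
    Xc = X - X.mean(axis=0)
    # 2. Covariance matrix of the centered data
    C = np.cov(Xc, rowvar=False)
    # 3. Eigenvalues and eigenvectors (the covariance matrix is symmetric)
    eigvals, eigvecs = np.linalg.eigh(C)
    # 4. Arrange eigenvectors in order of decreasing eigenvalue
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Keep the desired number of eigenvectors and project: y_i = A x_i
    A = eigvecs[:, :n_components].T
    return Xc @ A.T, eigvals
```

The number of clearly non-zero eigenvalues returned here gives the PCA-based ID estimate discussed on the next slide.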
16. PCA method (2/2)
- Intrinsic dimensionality = number of non-zero
eigenvalues
- Dimensionality reduction by projection: y_i = A x_i
- Here x_i is the input vector, y_i the output
vector, and A is the matrix containing the
eigenvectors corresponding to the largest
eigenvalues
- For visualization, typically 2 or 3 eigenvectors are
preserved
17. Example of PCA
- The distances between points are different in the
projections
- Test set (c):
- two clusters are projected into one cluster
- the s-shaped cluster is projected nicely
18. Another example of PCA [10]
- Data set: points lying on a circle (x^2 + y^2 = 1),
so the intrinsic dimensionality is 1
- PCA nevertheless yields two non-null eigenvalues
- u, v: principal components
19. Limitations of PCA
- Since the eigenvectors are orthogonal, PCA works well
only with linear data
- Tends to overestimate ID
- Kernel PCA uses the so-called kernel trick to apply
PCA also to non-linear data
- make a non-linear projection into a higher
dimensional space, and perform the PCA analysis in this
space
20. Multidimensional scaling method (MDS)
- Project data into a new space while trying to
preserve the distances between data points
- Define a stress E (difference of pairwise distances
in the original and projection spaces)
- E is minimized using some optimization algorithm
- With certain stress functions (e.g. Kruskal's), when
E is 0 a perfect projection exists
- The ID of the data is the smallest projection
dimension for which a perfect projection exists
21. Metric MDS
- The simplest stress function [2] is the raw stress
(written out below), where
- d(xi, xj) - distance in the original space
- d(yi, yj) - distance in the projection space
- yi, yj - representations of xi, xj in the output space
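With this notation, the raw stress is

E = \sum_{i<j} \big( d(x_i, x_j) - d(y_i, y_j) \big)^2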
22. Sammon's mapping
- Sammon's mapping gives small distances a larger
weight [5] (see the formula below)
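Sammon's original stress function [5], which divides each squared error by the original distance so that small distances dominate, is

E = \frac{1}{\sum_{i<j} d(x_i, x_j)} \sum_{i<j} \frac{\big( d(x_i, x_j) - d(y_i, y_j) \big)^2}{d(x_i, x_j)}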
23. Kruskal's stress
- Ranking the point distances compensates for the
decreasing distances in lower dimensional
projections (see the formula below)
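Kruskal's stress-1 [2] is commonly written as below; here the disparities \hat{d}_{ij} are obtained by monotone regression on the rank order of the original distances (this description of the disparities is supplied here, not taken from the slide):

S = \sqrt{ \frac{ \sum_{i<j} \big( d(y_i, y_j) - \hat{d}_{ij} \big)^2 }{ \sum_{i<j} d(y_i, y_j)^2 } }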
24. MDS example
- Separates clusters better than PCA
- Local structures are not always preserved
(leftmost test set)
25. Other MDS approaches
- ISOMAP [12]
- Curvilinear component analysis, CCA [13]
26. Local methods
- The previous methods are global in the sense that
all input data is considered at once
- Local methods consider only some neighbourhood of
the data points, so they may be computationally less
demanding
- They try to estimate the topological dimension of the
data manifold
27. Fukunaga-Olsen algorithm [6]
- Assume that the data can be divided into small
regions, i.e. clustered
- Each cluster (Voronoi set) of the data vectors
lies on an approximately linear surface, so the PCA
method can be applied to each cluster separately
- Eigenvalues are normalized by dividing by the
largest eigenvalue
28. Fukunaga-Olsen algorithm
- ID is defined as the number of normalized
eigenvalues that are larger than a threshold T
- Defining a good threshold is a problem in itself;
a rough sketch of the whole procedure follows below
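A rough Python sketch of the procedure. Clustering with k-means, skipping very small clusters, and averaging the per-cluster counts are choices made for this sketch, and the threshold value is arbitrary:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def fukunaga_olsen_id(X, n_clusters=10, threshold=0.05):
    """Estimate ID by local PCA: count normalized eigenvalues above a threshold."""
    _, labels = kmeans2(X, n_clusters, minit='++')
    counts = []
    for c in range(n_clusters):
        cluster = X[labels == c]
        if len(cluster) <= X.shape[1]:
            continue  # too few points for a stable covariance estimate
        cov = np.cov(cluster, rowvar=False)
        eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
        if eigvals[0] <= 0:
            continue
        normalized = eigvals / eigvals[0]        # divide by the largest eigenvalue
        counts.append(np.sum(normalized > threshold))
    return float(np.mean(counts))
```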
29. Near neighbour algorithm
- Trunk's method [7]
- An initial value for an integer parameter k is
chosen (usually k = 1)
- The k nearest neighbours of each data vector are
identified
- For each data vector i, the subspace spanned by the
vectors from i to each of its k neighbours is
constructed
30. Near neighbour algorithm
- The angle between the (k+1)th near neighbour and its
projection onto the subspace is calculated for each
data vector
- If the average of these angles is below a
threshold, the ID is k; otherwise increase k and
repeat the process
[Figure: the angle between the vector to the (k+1)th neighbour and the
subspace spanned by the k nearest neighbours]
31. Pseudocode
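The original pseudocode is not reproduced here; the following Python sketch follows the description on the two previous slides (the angle threshold and the maximum k are illustrative values):

```python
import numpy as np

def trunk_id_estimate(X, angle_threshold_deg=30.0, k_max=10):
    """Near neighbour (Trunk) ID estimate: increase k until the average angle
    between the (k+1)th neighbour and the k-neighbour subspace is small."""
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(D, axis=1)               # order[i, 0] is i itself
    for k in range(1, k_max + 1):
        angles = []
        for i in range(N):
            neighbours = order[i, 1:k + 1]      # k nearest neighbours of i
            extra = X[order[i, k + 1]] - X[i]   # vector to the (k+1)th neighbour
            B = (X[neighbours] - X[i]).T        # spans the k-neighbour subspace
            Q, _ = np.linalg.qr(B)              # orthonormal basis of that subspace
            proj = Q @ (Q.T @ extra)            # projection onto the subspace
            cosang = np.linalg.norm(proj) / (np.linalg.norm(extra) + 1e-12)
            angles.append(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
        if np.mean(angles) < angle_threshold_deg:
            return k                            # average angle below threshold: ID = k
    return k_max
```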
32. Near neighbour algorithm
- It is not clear how to select a suitable value for
the threshold
- Improvements to Trunk's method:
- Pettis et al. [8]
- Verveer-Duin [9]
33. Fractal methods
- Global methods, but with a different definition of
dimensionality
- Basic idea:
- count the observations f(r) inside a ball of radius r
- analyse the growth rate of f(r)
- if f grows as r^k, the dimensionality of the data can
be considered to be k (see below)
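Stated as a formula, the dimension is the scaling exponent of f(r) for small r:

k = \lim_{r \to 0} \frac{\ln f(r)}{\ln r}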
34. Fractal methods
- The dimensionality can be fractional, e.g. 1.5
- Hence it does not provide projections to a
lower-dimensional space (what is an R^1.5 anyway?)
- A fractal dimensionality estimate can be used in
time-series analysis etc. [10]
35. Fractal methods
- Different definitions of fractal dimension [10]:
- Hausdorff dimension
- Box-counting dimension
- Correlation dimension
- In order to get an accurate estimate of the
dimension D, the data set cardinality must be at
least 10^(D/2)
36. Hausdorff dimension
- The data set is covered by cells s_i with variable
diameters r_i, all r_i < r
- In other words, we look for a collection of
covering sets s_i with diameter less than or equal
to r which minimizes the sum below
- This minimum defines the d-dimensional Hausdorff measure
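In the standard formulation, the minimized sum and the resulting measure are

\Gamma_H^d(r) = \inf \Big\{ \sum_i r_i^d : \text{the cells } s_i \text{ cover the data set, } r_i \le r \Big\},
\qquad
\Gamma_H^d = \lim_{r \to 0} \Gamma_H^d(r)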
37. Hausdorff dimension
- For every data set, Γ_H^d is infinite if d is less
than some critical value D_H, and 0 if d is
greater than D_H
- The critical value D_H is the Hausdorff dimension
of the data set
38. Box-counting dimension
- The Hausdorff dimension is not easy to calculate
- The box-counting dimension D_B is an upper bound of
the Hausdorff dimension and does not usually differ
from it; it is defined below in terms of
v(r), the number of boxes of size r
needed to cover the data set
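In formula form (standard definition):

D_B = \lim_{r \to 0} \frac{\ln v(r)}{\ln (1/r)}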
39. Box-counting dimension
- Although the box-counting dimension is easier to
calculate than the Hausdorff dimension, the
algorithmic complexity grows exponentially with
the set dimensionality, so it can be used only for
low-dimensional data sets
- The correlation dimension is a computationally more
feasible fractal dimension measure
- The correlation dimension is a lower bound of the
box-counting dimension
40. Correlation dimension
- Let x1, x2, x3, ..., xN be the data points
- The correlation integral can be defined as
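A common form of the correlation integral (supplied here; the exact normalisation on the original slide may differ):

C(r) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} I\big( \lVert x_i - x_j \rVert \le r \big)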
where I(x) is the indicator function: I(x) = 1 if x is
true, and I(x) = 0 otherwise.
41. Correlation dimension
- The correlation dimension D_C describes the growth rate
of the correlation integral C(r) as r approaches zero; in
practice it is estimated from the slope of ln C(r) versus ln r
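In formula form:

D_C = \lim_{r \to 0} \frac{\ln C(r)}{\ln r}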
42. Literature
[1] M. Kirby, Geometric Data Analysis: An Empirical
Approach to Dimensionality Reduction and the
Study of Patterns, John Wiley and Sons, 2001.
[2] J. B. Kruskal, Multidimensional scaling by
optimizing goodness of fit to a nonmetric
hypothesis, Psychometrika 29 (1964) 1-27.
[3] R. N. Shepard, The analysis of proximities:
Multidimensional scaling with an unknown distance
function, Psychometrika 27 (1962) 125-140.
[4] R. S. Bennett, The intrinsic dimensionality of
signal collections, IEEE Transactions on
Information Theory 15 (1969) 517-525.
[5] J. W. Sammon Jr., A nonlinear mapping for data
structure analysis, IEEE Transactions on Computers
C-18 (1969) 401-409.
[6] K. Fukunaga, D. R. Olsen, An algorithm for finding
intrinsic dimensionality of data, IEEE
Transactions on Computers 20 (2) (1976) 165-171.
[7] G. V. Trunk, Statistical estimation of the
intrinsic dimensionality of a noisy signal
collection, IEEE Transactions on Computers 25
(1976) 165-171.
43. Literature
[8] K. Pettis, T. Bailey, A. Jain, R. Dubes, An
intrinsic dimensionality estimator from
near-neighbor information, IEEE Transactions on
Pattern Analysis and Machine Intelligence 1 (1)
(1979) 25-37.
[9] P. J. Verveer, R. Duin, An evaluation of
intrinsic dimensionality estimators, IEEE
Transactions on Pattern Analysis and Machine
Intelligence 17 (1) (1995) 81-86.
[10] F. Camastra, Data dimensionality estimation
methods: a survey, Pattern Recognition 36 (2003)
2945-2954.
[11] J. Venna, Dimensionality reduction for visual
exploration of similarity structures (2007), PhD
thesis manuscript (submitted).
[12] J. B. Tenenbaum, V. de Silva, J. C. Langford, A
global geometric framework for nonlinear
dimensionality reduction, Science 290 (12) (2000)
2319-2323.
[13] P. Demartines, J. Herault, Curvilinear component
analysis: A self-organizing neural network for
nonlinear mapping in cluster analysis, IEEE
Transactions on Neural Networks 8 (1) (1997)
148-154.