Clustering methods - PowerPoint PPT Presentation

Slides: 45
Provided by: csJoe
Transcript and Presenter's Notes

Title: Clustering methods


1
Clustering methods
Part 1: Introduction
Pasi Fränti, 9.2.2017
Machine Learning
School of Computing, University of Eastern Finland
Joensuu, FINLAND
2
Sample data
Sources of RGB vectors
Red-Green plot of the vectors
3
Sample data
Employment statistics
4
Application examples
5
Color reconstruction
Image with original colors
Image with compression artifacts
6
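Speaker modeling for voice biometrics
[Figure: training data from speakers Tomi, Mikko and Matti goes through feature extraction and clustering to produce speaker models; an unknown sample (?) goes through feature extraction and is compared with the models. Best match: Matti!]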
7
Speaker modeling
Speech data
Result of clustering
8
Image segmentation
Image with 4 color clusters
Normalized color plots according to red and
green components (plot axes: red, green).
9
Signal quantization
Approximation of continuous range values (or a
very large set of possible discrete values) by a
small set of discrete symbols or integer values
Quantized signal
Original signal
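The idea of quantization can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the function name `quantize`, the sample values and the 5-level codebook are all assumptions chosen for the example. Each sample is replaced by the index of its nearest level (the discrete symbol), and the level itself serves as the approximation.

```python
def quantize(signal, levels):
    """Map each sample to the index of the nearest level and to that level's value."""
    indices = [min(range(len(levels)), key=lambda j: abs(x - levels[j]))
               for x in signal]
    reconstructed = [levels[j] for j in indices]
    return indices, reconstructed

# Hypothetical data: five samples from a continuous range [0, 1].
signal = [0.12, 0.48, 0.51, 0.93, 0.27]
# Hypothetical codebook: a small set of discrete values.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]

indices, approx = quantize(signal, levels)
print(indices)  # [0, 2, 2, 4, 1]
print(approx)   # [0.0, 0.5, 0.5, 1.0, 0.25]
```

Here the levels are fixed in advance. Choosing the levels from the data, so that the approximation error is small, is where clustering comes in.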
10
Color quantization of images
Color image
RGB samples
Clustering
11
Users on map
12
Clustering the users
13
Clustering of photos in two ways
Clustering timeline
Clustering of photos
14
Photo clusters on map
Last known location of the user
User and date
Number of photos
Clusters
15
(No Transcript)
16
Clusters in the timeline view
Number of photos
Clusters
Functions
Open cluster
Start slideshow
17
Clustering GPS tracks
Mobile users, taxi routes, fleet management
18
Conclusions from clusters
Cluster 2: Home
Cluster 1: Office
19
Clustering keywords
20
Clustering text descriptions
21
Home take care services
22
Clustering user preferences
23
Part I: Clustering problem
24
Subproblems of clustering
  1. Where are the clusters? (Algorithmic problem)
  2. How many clusters? (Methodological problem: which
    criterion?)
  3. Selection of attributes (Application-related
    problem)
  4. Preprocessing the data (Practical problems:
    normalization, outliers)

25
Definitions and data
  • Set of N data points:

$X = \{x_1, x_2, \ldots, x_N\}$
Partition of the data:
$P = \{p_1, p_2, \ldots, p_M\}$
Set of M cluster prototypes (centroids):
$C = \{c_1, c_2, \ldots, c_M\}$
26
Distance and cost function
Euclidean distance of data vectors
Total square error
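The formulas on this slide were not captured in the transcript. Assuming the standard definitions, with K denoting the number of attributes per vector (a symbol introduced here, not in the slides) and each $p_j$ read as the set of points in cluster j, they would look roughly like this:

```latex
% Euclidean distance between two K-dimensional data vectors
d(x_i, x_j) = \sqrt{\sum_{k=1}^{K} \left( x_{ik} - x_{jk} \right)^2}

% Total square error of a clustering with partition P and centroids C
\mathrm{TSE} = \sum_{j=1}^{M} \sum_{x_i \in p_j} \lVert x_i - c_j \rVert^2
```

The mean square error (MSE) mentioned elsewhere in the deck is commonly this total divided by N; the exact normalization used by the author is not stated in the transcript.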
27
Clustering result as partition
Cluster prototypes
Partition of data
Illustrated by Voronoi diagram
Illustrated by Convex hulls
28
Duality of partition and centroids
Cluster prototypes
Partition of data
Centroids as prototypes
Partition by nearest prototype mapping
29
Dependency of data structures
  • Centroid condition: for a given partition (P),
    the optimal cluster centroids (C) for minimizing
    MSE are the average vectors of the clusters.
  • Optimal partition: for given centroids (C), the
    optimal partition is the one that maps each data
    point to its nearest centroid.
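Written out, and reading each $p_j$ as the set of data points in cluster j (an interpretation of the notation, since the slide gives the conditions only in words), the two conditions are approximately:

```latex
% Centroid condition: given the partition, each centroid is the cluster mean
c_j = \frac{1}{|p_j|} \sum_{x_i \in p_j} x_i , \qquad j = 1, \ldots, M

% Optimal partition: given the centroids, each point goes to its nearest centroid
p_j = \left\{ x_i \in X : \lVert x_i - c_j \rVert^2 \le \lVert x_i - c_k \rVert^2
      \ \text{for all } k = 1, \ldots, M \right\}
```

Ties in the second condition have to be broken by some rule (for example, the lowest cluster index) so that every point belongs to exactly one cluster.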

30
K-means algorithm
31
K-means algorithm
X = Data set
C = Cluster centroids
P = Partition

K-Means(X, C) → (C, P)
REPEAT
    Cprev ← C
    FOR all i ∈ [1, N] DO
        pi ← FindNearest(xi, C)          (optimal partition)
    FOR all j ∈ [1, k] DO
        cj ← Average of xi whose pi = j   (optimal centroids)
UNTIL C = Cprev
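The pseudocode above translates fairly directly into Python. The following is a minimal sketch using only the standard library, not the author's implementation. The helper names `find_nearest` and `average` mirror the pseudocode; the `max_iter` safety cap and the rule that an empty cluster keeps its previous centroid are assumptions added here, and partition labels are 0-based rather than 1-based.

```python
import math

def find_nearest(x, centroids):
    """Return the index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))

def average(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def k_means(X, C, max_iter=100):
    """Alternate the optimal-partition and optimal-centroid steps until C is stable."""
    C = [tuple(c) for c in C]
    P = []
    for _ in range(max_iter):  # safety cap (assumption, not in the pseudocode)
        C_prev = C
        # Optimal partition: label every point with its nearest centroid.
        P = [find_nearest(x, C) for x in X]
        # Optimal centroids: each centroid becomes the mean of its points.
        C = []
        for j in range(len(C_prev)):
            members = [x for x, p in zip(X, P) if p == j]
            # Assumption: an empty cluster keeps its previous centroid.
            C.append(average(members) if members else C_prev[j])
        if C == C_prev:
            break
    return C, P

# Hypothetical toy data: two well-separated groups of three points.
X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
C, P = k_means(X, C=[X[0], X[3]])
print(P)  # [0, 0, 0, 1, 1, 1]
print(C)
```

The initial centroids are passed in by the caller, as in the pseudocode. The result depends on that choice, since the iteration only guarantees a local optimum.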
32
Summary
33
How to solve?
  • Solve the clustering (algorithmic problem):
    • Given input data (X) of N data vectors, and
      number of clusters (M), find the clusters.
    • Result given as a set of prototypes, or
      partition.
  • Solve the number of clusters (mathematical problem):
    • Define appropriate cluster validity function f.
    • Repeat the clustering algorithm for several M.
    • Select the best result according to f.
  • Solve the problem efficiently (computer science
    problem).
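The "repeat for several M, select by f" procedure can be sketched as follows. Everything specific here is an assumption made for illustration: the slides do not say which validity function f to use, so this sketch picks the Calinski-Harabasz index (higher is better); the deterministic farthest-first seeding, the function names and the toy data are likewise hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def init_farthest_first(X, M):
    """Seeding (assumption): start at X[0], then add the point farthest from all seeds."""
    C = [X[0]]
    while len(C) < M:
        C.append(max(X, key=lambda x: min(sq_dist(x, c) for c in C)))
    return C

def k_means(X, M, max_iter=100):
    """Plain k-means; returns centroids C and a 0-based label per point P."""
    C = init_farthest_first(X, M)
    P = []
    for _ in range(max_iter):
        P = [min(range(M), key=lambda j: sq_dist(x, C[j])) for x in X]
        new_C = []
        for j in range(M):
            members = [x for x, p in zip(X, P) if p == j]
            new_C.append(mean(members) if members else C[j])
        if new_C == C:
            break
        C = new_C
    return C, P

def validity(X, C, P):
    """Calinski-Harabasz index, one possible f. Undefined if the within-cluster error is 0."""
    N, M = len(X), len(C)
    g = mean(X)
    ssw = sum(sq_dist(x, C[p]) for x, p in zip(X, P))            # within clusters
    ssb = sum(P.count(j) * sq_dist(C[j], g) for j in range(M))   # between clusters
    return (ssb / (M - 1)) / (ssw / (N - M))

def select_clustering(X, candidates):
    """Cluster for each candidate M and keep the result with the best f."""
    results = []
    for M in candidates:
        C, P = k_means(X, M)
        results.append((validity(X, C, P), M, C, P))
    return max(results, key=lambda r: r[0])

# Hypothetical toy data: three compact groups of four points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 0), (10, 1), (11, 0), (11, 1),
     (5, 10), (5, 11), (6, 10), (6, 11)]
score, best_M, C, P = select_clustering(X, range(2, 6))
print(best_M)  # 3
```

Other validity functions can be dropped in by replacing `validity`; the surrounding loop stays the same.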
34
Challenges in clustering
Incorrect cluster allocation
Incorrect number of clusters
Too many clusters
Cluster missing
Clusters missing
35
Taxonomy of clustering
[Jain, Murty, Flynn, Data clustering: A review,
ACM Computing Surveys, 1999]
  • One possible classification based on cost
    function.
  • MSE is well defined and most popular.

36
Clustering method
  • Clustering method defines the problem
  • Clustering algorithm solves the problem
  • Problem defined as cost function:
    • Goodness of one cluster
    • Similarity vs. distance
    • Global vs. local (merge cost, cut)
  • Solution: algorithm to solve the problem

37
Complexity of clustering
  • Number of possible clusterings
  • Clustering problem is NP-complete [Garey et al.,
    1982]
  • Optimal solution by branch-and-bound in
    exponential time.
  • Practical solutions by heuristic algorithms.
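The formula for the number of possible clusterings did not survive in the transcript. One standard way to count them, assumed here rather than taken from the slide, is the Stirling number of the second kind: the number of ways to split N labelled points into M non-empty clusters. A short sketch shows how quickly it grows, which is why exhaustive search is impractical:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, m):
    """Number of ways to partition n labelled points into m non-empty clusters."""
    if n == m:
        return 1          # includes the empty case S(0, 0) = 1
    if m == 0 or m > n:
        return 0
    # Point n either joins one of the m existing clusters or forms a new one.
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

print(stirling2(4, 2))    # 7
print(stirling2(10, 3))   # 9330
print(stirling2(100, 5))  # a number with about 68 digits
```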

38
Software
39
Animator
http://cs.uef.fi/sipu/clustering/animator/
40
Clusterator
http://cs.uef.fi/paikka/Radu/clusterator/
41
Cluster software
http://cs.uef.fi/sipu/soft/cluster2009.exe
  • Main area: working space for data
  • Input area: inputs to be processed
  • Output area: obtained results
  • Menu Process: selection of operation

42
Procedure to simulate k-means
Clustering image
Data set
Codebook
Partition
Open data set (file .ts), move it into Input area
Process: Random codebook, select number of clusters
REPEAT
    Move obtained codebook from Output area into Input area
    Process: Optimal partition, select Error function
    Move codebook into Main area, partition into Input area
    Process: Optimal codebook
UNTIL DESIRED CLUSTERING
43
Conclusions
  • Clustering is a fundamental tool needed
    everywhere in computer science and beyond.
  • Failing to do clustering properly may undermine
    the application analysis.
  • A good clustering tool is needed so that
    researchers can focus on application requirements.

44
Literature
  1. S. Theodoridis and K. Koutroumbas, Pattern
    Recognition, Academic Press, 3rd edition, 2006.
  2. C. Bishop, Pattern Recognition and Machine
    Learning, Springer, 2006.
  3. A.K. Jain, M.N. Murty and P.J. Flynn, Data
    clustering: A review, ACM Computing Surveys,
    31(3): 264-323, September 1999.
  4. M.R. Garey, D.S. Johnson and H.S. Witsenhausen,
    The complexity of the generalized Lloyd-Max
    problem, IEEE Transactions on Information Theory,
    28(2): 255-256, March 1982.
  5. F. Aurenhammer, Voronoi diagrams - a survey of a
    fundamental geometric data structure, ACM
    Computing Surveys, 23(3): 345-405, September
    1991.