Title: Similarity and Dissimilarity, Biclustering
UNIT-IV
- Measuring (dis)similarity
- Evaluating the output of clustering methods
- Spectral clustering
- Hierarchical clustering
- Agglomerative clustering
- Divisive clustering
- Choosing the number of clusters
- Clustering data points and features
- Bi-clustering
- Multi-view clustering
- K-Means clustering
- K-medoids clustering
- Application: image segmentation using K-means clustering
What are similarity and dissimilarity measures?
- Similarities are usually non-negative and are often between 0 (no similarity) and 1 (complete similarity). The dissimilarity between two objects is the numerical measure of the degree to which the two objects are different. Dissimilarity is lower for more similar pairs of objects, as the sketch below illustrates.
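A minimal sketch of this relationship in Python, using Euclidean distance as the dissimilarity and one common (but not the only) way of converting it into a bounded similarity; the conversion s = 1 / (1 + d) is an illustrative choice, not something the slides specify:

```python
import numpy as np

def dissimilarity(a, b):
    """Euclidean distance: 0 for identical objects, larger for more different ones."""
    return np.linalg.norm(np.asarray(a) - np.asarray(b))

def similarity(a, b):
    """One common way to map a distance into (0, 1]: s = 1 / (1 + d).
    Identical objects get s = 1 (complete similarity)."""
    return 1.0 / (1.0 + dissimilarity(a, b))

print(similarity([1, 2], [1, 2]))  # 1.0 (complete similarity)
print(similarity([1, 2], [4, 6]))  # 1 / (1 + 5) ~= 0.167 (less similar)
```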
- Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates.
- In simple language, it is the sum of the absolute differences between the x-coordinates and y-coordinates of the two points.
- Suppose we have a point A and a point B. To find the Manhattan distance between them, we just have to sum up the absolute variation along the x and y axes.
- We find the Manhattan distance between two points by measuring along axes at right angles. In a plane with p1 at (x1, y1) and p2 at (x2, y2), the Manhattan distance is |x1 - x2| + |y1 - y2|.
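A minimal sketch of this formula in Python with NumPy (the example points are made up for illustration):

```python
import numpy as np

def manhattan_distance(p1, p2):
    """Sum of absolute coordinate differences: |x1 - x2| + |y1 - y2|
    (and so on for higher dimensions)."""
    return np.sum(np.abs(np.asarray(p1) - np.asarray(p2)))

print(manhattan_distance((1, 2), (4, 6)))  # |1-4| + |2-6| = 7
```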
Clustering in Machine Learning
- Clustering or cluster analysis is a machine learning technique which groups the unlabelled dataset.
- It can be defined as "a way of grouping the data points into different clusters, consisting of similar data points. The objects with the possible similarities remain in a group that has less or no similarities with another group."
- It does this by finding some similar patterns in the unlabelled dataset, such as shape, size, color, behavior, etc., and divides the data as per the presence and absence of those similar patterns.
- It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with the unlabelled dataset.
- After applying this clustering technique, each cluster or group is given a cluster ID. An ML system can use this ID to simplify the processing of large and complex datasets.
- Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another; similarly, in the vegetable section, apples, bananas, mangoes, etc., are grouped separately, so that we can easily find things.
- The clustering technique works in the same way. Another example of clustering is grouping documents according to topic.
- The clustering technique can be widely used in various tasks. Some of the most common uses of this technique are:
- Market segmentation
- Statistical data analysis
- Social network analysis
- Image segmentation
- etc.
- Apart from these general usages, it is used by Amazon in its recommendation system to provide recommendations as per the past search of products.
- Netflix also uses this technique to recommend movies and web series to its users as per their watch history.
- The below diagram explains the working of the clustering algorithm: we can see the different fruits are divided into several groups with similar properties.
(Diagram: different fruits divided into several groups with similar properties)
What is the spectral clustering algorithm?
- Spectral clustering is a technique with roots in graph theory, where the approach is used to identify communities of nodes in a graph based on the edges connecting them.
- The method is flexible and allows us to cluster non-graph data as well.
To perform spectral clustering we need 3 main steps (a sketch follows this list):
- Create a similarity graph between our N objects to cluster.
- Compute the first k eigenvectors of its Laplacian matrix to define a feature vector for each object.
- Run k-means on these features to separate objects into k classes.
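A minimal sketch of these three steps in Python, assuming scikit-learn, toy two-moons data, and an RBF similarity graph (the gamma value and dataset are illustrative choices, not part of the original slides):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import KMeans

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Step 1: similarity graph between the N objects (RBF affinity here).
A = rbf_kernel(X, gamma=20.0)

# Step 2: first k eigenvectors of the unnormalized graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A
k = 2
eigvals, eigvecs = np.linalg.eigh(L)  # eigh: L is symmetric; eigenvalues ascend
features = eigvecs[:, :k]             # one k-dimensional feature vector per object

# Step 3: k-means on the spectral features.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
print(labels[:20])
```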
Spectral Clustering: Matrix Representation
- Adjacency and Affinity Matrix (A)
Degree Matrix (D)
- A Degree Matrix is a diagonal matrix where each diagonal value is the degree of the corresponding node, i.e. the number of edges connected to it.
- We can also obtain the degrees of the nodes by taking the sum of each row of the adjacency matrix, as sketched below.
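A minimal sketch with a small, made-up adjacency matrix (the graph here is purely illustrative):

```python
import numpy as np

# Adjacency matrix of a small undirected graph:
# edges 0-1, 1-2, 2-0, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

# Degree matrix: row sums of A placed on the diagonal.
D = np.diag(A.sum(axis=1))
print(D)
# [[2 0 0 0]
#  [0 2 0 0]
#  [0 0 3 0]
#  [0 0 0 1]]
```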
Laplacian Matrix (L)
- This is another representation of the graph/data points, which carries the beautiful properties leveraged by Spectral Clustering.
- One such representation is obtained by subtracting the Adjacency Matrix from the Degree Matrix (i.e. L = D - A).
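Continuing the small example from the Degree Matrix sketch above:

```python
# Unnormalized graph Laplacian: L = D - A.
L = D - A
print(L)
# [[ 2 -1 -1  0]
#  [-1  2 -1  0]
#  [-1 -1  3 -1]
#  [ 0  0 -1  1]]
```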
Choosing the number of clusters in hierarchical clustering
- To get the optimal number of clusters for hierarchical clustering, we make use of a dendrogram, which is a tree-like chart that shows the sequence of merges or splits of clusters.
- If two clusters are merged, the dendrogram joins them in the chart, and the height of the join is the distance between those clusters, as the sketch below shows.
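A minimal sketch of drawing such a dendrogram with SciPy, assuming toy 2-D data (the data and the Ward linkage choice are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))      # toy data

Z = linkage(X, method='ward')     # agglomerative merge sequence
dendrogram(Z)                     # join heights = distances at which clusters merge
plt.xlabel('data points')
plt.ylabel('merge distance')
plt.show()
```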
How do you select the number of clusters in hierarchical clustering?
- Decide the number of clusters (k).
- Select k random points from the data as centroids.
- Assign all the points to the nearest cluster centroid.
- Calculate the centroids of the newly formed clusters.
How do you determine the number of clusters?
- Compute the clustering algorithm (e.g., k-means clustering) for different values of k. ...
- For each k, calculate the total within-cluster sum of squares (WSS).
- Plot the curve of WSS against the number of clusters k; the "elbow" of the curve suggests a good k, as sketched below.
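A minimal sketch of this procedure with scikit-learn, assuming toy blob data (the dataset and the range of k values are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

wss = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)       # total within-cluster sum of squares

plt.plot(ks, wss, marker='o')     # look for the "elbow" in this curve
plt.xlabel('number of clusters k')
plt.ylabel('within-cluster sum of squares')
plt.show()
```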
Do we need to define the number of clusters in advance for hierarchical clustering?
- Hierarchical clustering does not require you to pre-specify the number of clusters the way that k-means does, but you do select a number of clusters from your output, e.g. by cutting the dendrogram at a chosen height.
Clustering data points and features
- Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.
What are features in clustering?
- A clustering feature is essentially a summary of the statistics for the given cluster.
- Using a clustering feature, we can easily derive many useful statistics of a cluster, for example the cluster's centroid x0, radius R, and diameter D.
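The formulas for these statistics were lost from the transcript; for reference, the usual textbook definitions for a cluster of n points x_i are:

$$x_0 = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
R = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \lVert x_i - x_0 \rVert^2}, \qquad
D = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lVert x_i - x_j \rVert^2}{n(n-1)}}$$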
What is clustering in business intelligence?
- Cluster analysis, or simply clustering, is the process of partitioning a set of data objects (or observations) into subsets. ... In business intelligence, clustering can be used to organize a large number of customers into groups, where customers within a group share strongly similar characteristics.
What is Biclustering used for?
- Biclustering is a powerful data mining technique that allows clustering of the rows and columns of a matrix-format data set simultaneously. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples.
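A minimal sketch of biclustering with scikit-learn's spectral co-clustering, assuming a toy matrix with planted biclusters (the data shape and number of clusters are illustrative; rows and columns could stand for genes and conditions):

```python
from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering

# Toy matrix (rows x columns) with 3 planted biclusters.
data, rows, cols = make_biclusters(shape=(30, 20), n_clusters=3,
                                   noise=0.5, random_state=0)

model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(data)

print(model.row_labels_)      # bicluster assignment of each row (e.g. gene)
print(model.column_labels_)   # bicluster assignment of each column (e.g. condition)
```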
What is a clustering algorithm in BI?
- The K-Means clustering algorithm is a process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another, and as similar as possible within each group. K-Means clustering is a grouping of similar things or data.
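A minimal sketch of K-Means with scikit-learn, assuming toy blob data (the dataset and number of clusters are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)    # each point gets a cluster ID

print(labels[:10])
print(km.cluster_centers_)    # one centroid per group
```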
What is multi-view clustering?
- Multi-view graph clustering: this category of methods seeks to find a fusion graph (or network) across all views and then uses graph-cut algorithms or other technologies (e.g., spectral clustering) on the fusion graph in order to produce the clustering result.
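A minimal sketch of this idea under a deliberately simple fusion rule: the per-view affinities are just averaged here, whereas real multi-view methods learn the fusion graph. The toy views and cluster count are made up for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n = 100
view1 = rng.normal(size=(n, 5))   # one feature representation of the n objects
view2 = rng.normal(size=(n, 8))   # another representation of the same objects

# One similarity graph per view, fused by simple averaging (an assumption).
fused = (rbf_kernel(view1) + rbf_kernel(view2)) / 2

# Spectral clustering on the fused affinity graph.
labels = SpectralClustering(n_clusters=3, affinity='precomputed',
                            random_state=0).fit_predict(fused)
print(labels[:20])
```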
K-means clustering
- Steps:
- Take the mean value.
- Find the points nearest to the mean and put them in the cluster.
- Repeat steps 1-2 until we get the same mean.
- A minimal from-scratch sketch of this loop follows.