Title: Similarity and Dissimilarity, Biclustering

Description: Similarity and dissimilarity measures, biclustering, multi-view clustering, K-means clustering, K-medoids clustering

Provided by: srilekh1214
Slides: 42



1
UNIT-IV
2
  • Measuring (dis)similarity - Evaluating output of
    clustering methods - Spectral clustering -
    Hierarchical clustering - Agglomerative
    clustering - Divisive clustering - Choosing the
    number of clusters - Clustering data points and
    features - Bi-clustering - Multi-view clustering
    - K-Means clustering - K-medoids clustering -
    Application: image segmentation using K-means
    clustering

3
What are similarity and dissimilarity
measures?
  • Similarity measures are usually non-negative and
    often range between 0 (no similarity) and 1
    (complete similarity). The dissimilarity between
    two objects is a numerical measure of the degree
    to which the two objects differ; dissimilarity is
    lower for more similar pairs of objects.
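As a minimal sketch of the idea (the variable names and the 1 / (1 + d) scaling are illustrative choices, not from the slides), a dissimilarity can be computed as a distance, and a bounded similarity derived from it:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 5.0])

# Dissimilarity: Euclidean distance, 0 for identical objects, grows with
# difference.
dissimilarity = np.linalg.norm(a - b)

# Similarity: rescale into (0, 1], where 1 means complete similarity.
similarity = 1.0 / (1.0 + dissimilarity)
```

Identical objects give dissimilarity 0 and similarity 1, matching the ranges described above.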

8
  • Manhattan distance is a metric in which the
    distance between two points is the sum of the
    absolute differences of their Cartesian
    coordinates.
  • In simple language, it is the sum of the absolute
    differences between the x-coordinates and
    y-coordinates of the two points.
  • Suppose we have a point A and a point B. To find
    the Manhattan distance between them, we just sum
    up the absolute variation along the x and y axes.
  • We find the Manhattan distance between two points
    by measuring along axes at right angles. In a
    plane with p1 at (x1, y1) and p2 at (x2, y2), the
    Manhattan distance is |x1 - x2| + |y1 - y2|.
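The formula above can be sketched directly (the function name is an illustrative choice):

```python
def manhattan_distance(p1, p2):
    """Manhattan distance in the plane: |x1 - x2| + |y1 - y2|."""
    (x1, y1), (x2, y2) = p1, p2
    return abs(x1 - x2) + abs(y1 - y2)

# Moving 3 blocks east and 4 blocks north costs 7 "city blocks",
# unlike the straight-line (Euclidean) distance of 5.
d = manhattan_distance((0, 0), (3, 4))
```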

10
Clustering in Machine Learning
  • Clustering, or cluster analysis, is a machine
    learning technique that groups an unlabelled
    dataset.
  • It can be defined as "a way of grouping the data
    points into different clusters, consisting of
    similar data points. The objects with the
    possible similarities remain in a group that has
    less or no similarities with another group."

11
  • It does this by finding similar patterns in the
    unlabelled dataset, such as shape, size, color,
    and behavior, and divides the data according to
    the presence or absence of those patterns.
  • It is an unsupervised learning method, hence no
    supervision is provided to the algorithm, and it
    deals with an unlabelled dataset.
  • After applying this clustering technique, each
    cluster or group is given a cluster ID, which an
    ML system can use to simplify the processing of
    large and complex datasets.

12
  • Example: let's understand the clustering
    technique with a real-world example, a shopping
    mall. When we visit any shopping mall, we can
    observe that things with similar usage are
    grouped together: t-shirts are grouped in one
    section and trousers in another; similarly, in
    the vegetable section, apples, bananas, mangoes,
    etc. are grouped separately, so that we can
    easily find things.
  • The clustering technique works in the same way.
    Another example of clustering is grouping
    documents according to topic.
  • The clustering technique can be widely used in
    various tasks. Some of the most common uses of
    this technique are:
  • Market segmentation
  • Statistical data analysis
  • Social network analysis
  • Image segmentation
  • etc.

13
  • Apart from these general usages, it is used by
    Amazon in its recommendation system to provide
    recommendations based on a user's past product
    searches.
  • Netflix also uses this technique to recommend
    movies and web series to its users based on their
    watch history.
  • The diagram below explains the working of the
    clustering algorithm: different fruits are
    divided into several groups with similar
    properties.

15
What is the spectral clustering algorithm?
  • Spectral clustering is a technique with roots in
    graph theory, where the approach is used to
    identify communities of nodes in a graph based on
    the edges connecting them.
  • The method is flexible and allows us to cluster
    non-graph data as well.

16
To perform spectral clustering we need three main
steps:
  • Create a similarity graph between our N objects
    to cluster.
  • Compute the first k eigenvectors of its Laplacian
    matrix to define a feature vector for each
    object.
  • Run k-means on these features to separate objects
    into k classes.
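The three steps above can be sketched in plain NumPy (the function name, the Gaussian-kernel similarity graph, and the deterministic k-means initialisation are illustrative assumptions, not from the slides):

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, iters=50):
    """Sketch: similarity graph -> Laplacian eigenvectors -> k-means."""
    # Step 1: similarity graph between the N objects (Gaussian kernel).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: first k eigenvectors of the (unnormalised) Laplacian L = D - A
    # give each object a k-dimensional feature vector.
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues come in ascending order
    emb = eigvecs[:, :k]
    # Step 3: plain k-means on the embedded points (deterministic init).
    centers = emb[np.linspace(0, len(emb) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs should come back as two clean clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
               rng.normal(5.0, 0.1, (10, 2))])
labels = spectral_clustering(X, k=2)
```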

18
Spectral Clustering Matrix Representation
  • Adjacency and Affinity Matrix (A)

19
Degree Matrix (D)
  • A degree matrix is a diagonal matrix whose
    diagonal entries are the degrees of the nodes,
    i.e. the number of edges connected to each node.
  • We can also obtain the degree of each node by
    taking the sum of the corresponding row in the
    adjacency matrix.
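A small sketch of the row-sum construction (the example graph is illustrative):

```python
import numpy as np

# Adjacency matrix of a small path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

degrees = A.sum(axis=1)   # row sums give each node's degree
D = np.diag(degrees)      # degree matrix: degrees on the diagonal
```

The middle node has two edges, so its diagonal entry is 2; the endpoints have degree 1.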

21
Laplacian Matrix (L)
  • This is another representation of the graph/data
    points, which gives rise to the useful properties
    leveraged by spectral clustering.
  • One such representation is obtained by
    subtracting the adjacency matrix from the degree
    matrix (i.e. L = D - A).
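The subtraction L = D - A in code, on the same kind of small illustrative graph:

```python
import numpy as np

# Path graph 0 - 1 - 2 as an adjacency matrix.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # the (unnormalised) graph Laplacian
# A standard sanity check: every row of L sums to zero.
```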

27
Choosing the number of clusters in hierarchical
clustering
  • To get the optimal number of clusters for
    hierarchical clustering, we make use of a
    dendrogram, which is a tree-like chart that shows
    the sequences of merges or splits of clusters.
  • If two clusters are merged, the dendrogram joins
    them, and the height of the join is the distance
    between those clusters.
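One way to sketch this is with SciPy's hierarchical clustering routines (`linkage` builds the merge sequence that `dendrogram` would plot; the toy data are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points forming two obvious groups.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

Z = linkage(X, method="ward")   # merge sequence; dendrogram(Z) plots it
# The last merge happens at a much greater height than the others, which
# suggests cutting the tree into 2 flat clusters:
labels = fcluster(Z, t=2, criterion="maxclust")
```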

28
How do you select clusters in k-means
clustering?
  • Decide the number of clusters (k). Select k
    random points from the data as centroids. Assign
    all the points to the nearest cluster centroid.
    Calculate the centroids of the newly formed
    clusters. Repeat the assignment and update steps
    until the centroids stop changing.

29
How do you determine the number of clusters?
  • Compute the clustering algorithm (e.g., k-means)
    for different values of k.
  • For each k, calculate the total within-cluster
    sum of squares (WSS).
  • Plot the curve of WSS against the number of
    clusters k; the "elbow" of the curve suggests an
    appropriate k.
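The elbow procedure above can be sketched with a minimal k-means (the helper name and its deterministic spread-out initialisation are illustrative assumptions):

```python
import numpy as np

def kmeans_wss(X, k, iters=50):
    """Run a plain k-means, return the within-cluster sum of squares."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return float(((X - centers[labels]) ** 2).sum())

# Two tight groups of identical points at (0, 0) and (10, 10).
X = np.vstack([np.full((5, 2), 0.0), np.full((5, 2), 10.0)])
wss = [kmeans_wss(X, k) for k in (1, 2, 3)]
# WSS drops sharply up to the true number of clusters (2), then flattens;
# plotting wss against k and looking for that "elbow" suggests k = 2.
```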

30
Do we need to define the number of clusters in
advance for hierarchical clustering?
  • Hierarchical clustering does not require you to
    pre-specify the number of clusters, the way that
    k-means does, but you do select a number of
    clusters from your output.

31
Clustering data points and features
  • Clustering is the task of dividing the population
    or data points into a number of groups such that
    data points in the same groups are more similar
    to other data points in the same group than those
    in other groups. In simple words, the aim is to
    segregate groups with similar traits and assign
    them into clusters.

32
What are features in clustering?
  • A clustering feature is essentially a summary of
    the statistics for the given cluster.
  • Using a clustering feature, we can easily derive
    many useful statistics of a cluster, such as the
    cluster's centroid x0, radius R, and diameter D.
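As a sketch, a BIRCH-style clustering feature CF = (N, LS, SS) stores the point count, the linear sum, and the sum of squared norms, from which centroid, radius, and diameter follow (the data values are illustrative):

```python
import numpy as np

points = np.array([[1.0, 1.0], [3.0, 1.0], [2.0, 4.0]])
N = len(points)
LS = points.sum(axis=0)            # linear sum of the points
SS = (points ** 2).sum()           # sum of squared norms

centroid = LS / N                                  # x0 = LS / N
radius = np.sqrt(SS / N - (centroid ** 2).sum())   # RMS distance from x0
# diameter: RMS pairwise distance between points in the cluster
diameter = np.sqrt((2 * N * SS - 2 * (LS ** 2).sum()) / (N * (N - 1)))
```

The point of the CF summary is that these statistics are derived without revisiting the individual points.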

33
What is clustering in business intelligence?
  • Cluster analysis, or simply clustering, is the
    process of partitioning a set of data objects (or
    observations) into subsets. In business
    intelligence, clustering can be used to organize
    a large number of customers into groups, where
    customers within a group share strongly similar
    characteristics.

34
What is Biclustering used for?
  • Biclustering is a powerful data mining technique
    that allows clustering of rows and columns,
    simultaneously, in a matrix-format data set. It
    was first applied to gene expression data in
    2000, aiming to identify co-expressed genes under
    a subset of all the conditions/samples.
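A minimal sketch using scikit-learn's `SpectralCoclustering`, on a toy expression-style matrix with two planted blocks (rows 0-1 high on columns 0-1, rows 2-3 high on columns 2-3; the values are illustrative):

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

data = np.array([[10, 9, 1, 1],
                 [9, 10, 1, 1],
                 [1, 1, 10, 9],
                 [1, 1, 9, 10]], dtype=float)

model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(data)
row_labels = model.row_labels_       # cluster id for each row ("gene")
col_labels = model.column_labels_    # cluster id for each column ("sample")
```

Rows and columns are clustered simultaneously: each bicluster pairs a set of rows with the subset of columns on which they behave alike.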

35
What is clustering algorithm in BI?
  • The K-Means clustering algorithm is a process by
    which objects are classified into a number of
    groups so that they are as dissimilar as possible
    from one group to another, and as similar as
    possible within each group. K-Means clustering is
    a grouping of similar things or data.

38
What is multi-view clustering?
  • Multi-view graph clustering: this category of
    methods seeks to find a fusion graph (or network)
    across all views and then uses graph-cut
    algorithms or other techniques (e.g., spectral
    clustering) on the fusion graph in order to
    produce the clustering result.

41
K-means clustering
  • Steps
  • Take the mean value of each cluster.
  • Find the points nearest to each mean and put them
    in that cluster.
  • Repeat steps 1-2 until we get the same means.
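The steps above can be sketched on 1-D data (the function name and its deterministic initialisation are illustrative assumptions):

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Minimal k-means: assign each point to the nearest mean, recompute
    the means, repeat until the means stop changing."""
    means = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        # assign every point to its nearest mean
        labels = np.argmin(np.abs(X[:, None] - means[None, :]), axis=1)
        new_means = np.array([X[labels == j].mean() for j in range(k)])
        if np.allclose(new_means, means):   # same means -> converged
            break
        means = new_means
    return labels, means

values = np.array([2.0, 3.0, 4.0, 20.0, 21.0, 22.0])
labels, means = kmeans(values, k=2)
```

On this data the two groups {2, 3, 4} and {20, 21, 22} are recovered, with means 3 and 21.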