Sound Mathematical Foundation and Modification of TRICLUSTER

Provided by: golammors

1
Sound Mathematical Foundation and Modification of
TRICLUSTER

Presented by
Morshed Osmani

2
Problem Statement

The paper under study introduces a new method,
TRICLUSTER, for clustering gene expression data.
The paper's mathematical notation is weak
(as indicated by Dr. Perrizo).
The algorithm has a large search space.

3
My Planned Research-Work

Correct the mathematical notation
Develop a method for pruning the large search space
Implement the modified solution and obtain
simulation results
Compare the modified algorithm with the original
paper's algorithm

4
Original Paper

TRICLUSTER: An Effective Algorithm for Mining
Coherent Clusters in 3D Microarray Data
Lizhuang Zhao, Mohammed J. Zaki
Rensselaer Polytechnic Institute, New York
ACM SIGMOD International Conference on Management
of Data, 2005

5
Introduction

Traditional clustering algorithms work in the
full dimensional space.
Biclustering, on the other hand, does not have
such a strict requirement. If some points are
similar in several dimensions (a subspace), they
will be clustered together in that subspace.
Biclustering is able to identify the
co-expression patterns of a subset of genes that
might be relevant to a subset of the samples of
interest.

6
Introduction (Cont.)

There has been a lot of interest in mining gene
expression patterns across time. These approaches
are also mainly two-dimensional, i.e., finding
patterns along the gene-time dimensions.
The paper deals with mining tri-clusters, i.e.,
mining coherent clusters along the
gene-sample-time (temporal) or gene-sample-region
(spatial) dimensions.
The authors claim TRICLUSTER is the first 3D
microarray subspace clustering method.

7
Related Work

There has been work on mining gene expression
patterns across time.
There are many full-space and biclustering
algorithms designed to work with microarray
datasets, such as feature based clustering, graph
based clustering and pattern based clustering.
There is no previous method that mines
tri-clusters.

8
Challenges

Biclustering itself is known to be an NP-hard
problem, so heuristic methods or probabilistic
approximations are used.
Microarray data is inherently susceptible to
noise due to varying experimental conditions,
so it is essential that the methods be robust
to noise.
As we do not understand the complex gene
regulation circuitry in the cell, clustering
methods should allow overlapping clusters that
share subsets of genes, samples or
time-courses/spatial regions.
Furthermore, the methods should be flexible
enough to mine several (interesting) types of
clusters, and should not be too sensitive to
input parameters.

9
Mathematical Notations Used

Let G be a set of n genes,
let S be a set of m biological samples (e.g.,
different tissues or experiments), and
let T be a set of l experimental time points.
A three dimensional microarray dataset is a
real-valued n × m × l matrix D = G × S × T,
whose three dimensions correspond to genes,
samples and times respectively.
A tricluster C is a submatrix of the dataset D,
provided certain conditions of homogeneity are
satisfied.

10
Problems with Notation

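The paper uses an unconventional notation system
that appears to be incorrect.
It may make it hard for the reader to grasp the
intended meaning.
The Cartesian product is used incorrectly: the
product of the gene, sample and time sets is a set
of index triples, not a real-valued matrix.
This may be solved by defining the dataset as a
mapping (function) from index triples to real
values.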
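
One possible way to write the dataset as a mapping is sketched below. It is only an illustration of the idea, not the notation finally adopted in this work. G, S and T are the sets of genes, samples and time points; the element names g_i, s_j, t_k, the value d_{ijk} and the subsets G', S', T' are assumptions introduced for this sketch.

```latex
\[
D : G \times S \times T \to \mathbb{R}, \qquad d_{ijk} = D(g_i, s_j, t_k)
\]
\[
C = D\big|_{G' \times S' \times T'}, \qquad
G' \subseteq G,\; S' \subseteq S,\; T' \subseteq T
\]
```

Under this reading the Cartesian product is only the index set (the domain of D), and a candidate tricluster C is the restriction of D to a product of subsets; the homogeneity conditions would still have to be stated separately.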

11
The TRICLUSTER Algorithm

3D microarray datasets have far more genes than
samples, and perhaps an equal number of time
points and samples, i.e., |G| >> |S| ≈ |T|.
Because the problem is symmetric in the three
dimensions, TRICLUSTER always transposes the input
3D matrix so that the dimension with the largest
cardinality (say G) becomes the 1st dimension;
S is then made the 2nd and T the 3rd
dimension.
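
The transposition step can be illustrated with a minimal sketch, assuming the dataset is held as a NumPy 3D array. The function name `largest_axis_first` is hypothetical, and keeping the two smaller axes in their original relative order is an assumption of this sketch, not something the paper prescribes.

```python
import numpy as np

def largest_axis_first(data):
    """Move the largest axis of a 3D array to position 0.

    The remaining two axes keep their original relative order
    (an assumption made for this sketch).
    """
    largest = int(np.argmax(data.shape))
    order = [largest] + [ax for ax in range(data.ndim) if ax != largest]
    return np.transpose(data, order), order

# Toy cube stored as samples x genes x times: 4 samples, 10 genes, 3 time points.
cube = np.zeros((4, 10, 3))
reordered, order = largest_axis_first(cube)
print(reordered.shape, order)  # (10, 4, 3) [1, 0, 2]
```

Returning the axis order alongside the array lets mined clusters be mapped back to the original layout afterwards.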

12
Steps of TRICLUSTER

TRICLUSTER has the following main steps:
For each GxS time slice matrix, find the valid
ratio-ranges for all pairs of samples, and
construct a range multigraph
Mine the maximal biclusters from the range
multigraph
Construct a graph based on the mined biclusters
(as vertices) and get the maximal triclusters
Optionally, delete or merge clusters if certain
overlapping criteria are met.
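
The first step above can be sketched for a single pair of samples as follows. This is a simplified illustration, not the paper's procedure: it assumes all expression values are positive, treats a range as valid when max/min - 1 of the ratios is at most a tolerance, and ignores any further range handling the paper may perform. The names `ratio_ranges`, `eps` and `min_genes` are hypothetical.

```python
def ratio_ranges(col_a, col_b, eps=0.01, min_genes=2):
    """Find maximal groups of genes whose expression ratios between two
    samples are nearly constant (simplified sketch, positive values only).

    Returns a list of (low_ratio, high_ratio, gene_indices) tuples.
    """
    # Ratio of sample a to sample b for every gene, sorted by ratio.
    ratios = sorted((a / b, g) for g, (a, b) in enumerate(zip(col_a, col_b)))
    ranges = []
    start = 0
    for end in range(len(ratios)):
        # Shrink the window from the left until it is coherent again.
        while ratios[end][0] / ratios[start][0] - 1 > eps:
            start += 1
        # Keep the window only if it cannot be extended to the right.
        is_last = end == len(ratios) - 1
        if is_last or ratios[end + 1][0] / ratios[start][0] - 1 > eps:
            genes = sorted(g for _, g in ratios[start:end + 1])
            if len(genes) >= min_genes:
                ranges.append((ratios[start][0], ratios[end][0], genes))
    return ranges

# Genes 0-2 have ratio 2.0 between the two samples; gene 3 has ratio 5.0.
sample_a = [2.0, 4.0, 6.0, 5.0]
sample_b = [1.0, 2.0, 3.0, 1.0]
print(ratio_ranges(sample_a, sample_b))  # [(2.0, 2.0, [0, 1, 2])]
```

In a fuller implementation each returned range would become one edge between the two samples in the range multigraph, labeled with its gene set.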

13
Future Direction

Reduce the search space
- How (yet to be determined)
Implement the modified algorithm
Compare the results with those of the original
algorithm

14
Suggestions Needed

Here dataset D is a data cube. What benefits
might the data cube model offer in this
case?
How can this problem be integrated into DataDEX (if
there is a possibility)?
How can we reduce the search space?
Any other comments or suggestions are welcome.
