Sound Mathematical Foundation and Modification of TRICLUSTER

1
Sound Mathematical Foundation and Modification of
TRICLUSTER
  • Presented by Morshed Osmani

2
Problem Statement
  • We study a paper on clustering in which a new
    method, TRICLUSTER, is used to cluster gene
    expression data.
  • The paper uses a weak mathematical notation
    system (as indicated by Dr. Perrizo).
  • The algorithm explores a large search space.

3
My Planned Research-Work
  • Correct the mathematical notation
  • Develop a method for pruning the large search
    space
  • Implement the modified solution and obtain
    simulation results
  • Compare the modified algorithm with the original
    paper's algorithm

4
Original Paper
  • TRICLUSTER: An Effective Algorithm for Mining
    Coherent Clusters in 3D Microarray Data
  • Lizhuang Zhao, Mohammed J. Zaki
  • Rensselaer Polytechnic Institute, New York
  • ACM SIGMOD International Conference on Management
    of Data, 2005

5
Introduction
  • Traditional clustering algorithms work in the
    full-dimensional space, i.e., points are compared
    across all dimensions at once.
  • Biclustering, on the other hand, does not have
    such a strict requirement. If some points are
    similar in several dimensions (a subspace), they
    will be clustered together in that subspace.
  • Biclustering is able to identify the
    co-expression patterns of a subset of genes that
    might be relevant to a subset of the samples of
    interest.

6
Introduction (Cont.)
  • There has been a lot of interest in mining gene
    expression patterns across time. These approaches
    are also mainly two-dimensional, i.e., finding
    patterns along the gene-time dimensions.
  • The paper deals with mining tri-clusters, i.e.,
    mining coherent clusters along the
    gene-sample-time (temporal) or gene-sample-region
    (spatial) dimensions.
  • The authors claim TRICLUSTER is the first 3D
    microarray subspace clustering method.

7
Related Work
  • There has been work on mining gene expression
    patterns across time.
  • There are many full-space and biclustering
    algorithms designed to work with microarray
    datasets, such as feature based clustering, graph
    based clustering and pattern based clustering.
  • According to the authors, no previous method
    mines tri-clusters.

8
Challenges
  • Biclustering itself is known to be NP-hard, so
    heuristic methods or probabilistic approximations
    are used.
  • Microarray data is inherently susceptible to
    noise due to varying experimental conditions, so
    the methods must be robust to noise.
  • As we do not understand the complex gene
    regulation circuitry in the cell, clustering
    methods should allow overlapping clusters that
    share subsets of genes, samples or
    time-courses/spatial regions.
  • Furthermore, the methods should be flexible
    enough to mine several (interesting) types of
    clusters, and should not be too sensitive to
    input parameters.

9
Mathematical Notations Used
  • Let $G = \{g_0, g_1, \ldots, g_{n-1}\}$ be a set of
    $n$ genes, let $S = \{s_0, s_1, \ldots, s_{m-1}\}$ be a
    set of $m$ biological samples (e.g., different
    tissues or experiments), and let
    $T = \{t_0, t_1, \ldots, t_{l-1}\}$ be a set of $l$
    experimental time points.
  • A three-dimensional microarray dataset is a
    real-valued $n \times m \times l$ matrix
    $D = G \times S \times T = \{d_{ijk}\}$, whose three
    dimensions correspond to genes, samples and times,
    respectively.
  • A tricluster C is a submatrix of the dataset D,
    provided certain conditions of homogeneity are
    satisfied.

10
Problems with Notation
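  • The paper uses an unconventional notation system
    that appears to be incorrect.
  • This may keep the reader from grasping the
    intended meaning.
  • Wrong use of Cartesian product: $G \times S \times T$
    is a set of (gene, sample, time) triples, not a
    matrix of real values.
  • May be solved by using a mapping (function), e.g.,
    $D : G \times S \times T \to \mathbb{R}$, which assigns
    an expression value to each (gene, sample, time)
    triple.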
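The mapping-based fix proposed on this slide can be sketched in Python. This is a minimal illustration under our own assumptions: the toy gene, sample, and time labels and the values are made up, and `restrict` is a hypothetical helper name. The sketch keeps the index set G × S × T separate from the dataset D, which is a dict from (gene, sample, time) triples to real values; a tricluster is then the restriction of D to X × Y × Z.

```python
from itertools import product

# Hypothetical toy labels; a real dataset would have n genes, m samples, l time points.
G = ["g0", "g1", "g2"]
S = ["s0", "s1"]
T = ["t0", "t1"]

# G x S x T is only an index set: every element is a (gene, sample, time) triple.
index_set = set(product(G, S, T))

# The dataset is a mapping D: G x S x T -> R. The values here are arbitrary.
D = {triple: float(pos) for pos, triple in enumerate(product(G, S, T))}


def restrict(D, X, Y, Z):
    """Restrict D to X x Y x Z, i.e. the values of a candidate tricluster C."""
    return {(g, s, t): D[(g, s, t)] for g, s, t in product(X, Y, Z)}


# X is a subset of G, Y a subset of S, Z a subset of T.
C = restrict(D, ["g0", "g2"], ["s1"], ["t0", "t1"])
print(len(index_set), len(C))  # 12 4
```

With this view, "C is a submatrix of D" becomes "C is the restriction of the function D to a sub-product of its domain", which avoids equating an index set with a set of real values.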

11
The TRICLUSTER Algorithm
  • 3D microarray datasets generally have more genes
    than samples, and perhaps an equal number of time
    points and samples, i.e., $|G| \gg |S| \approx |T|$.
  • Due to the symmetric property, TRICLUSTER always
    transposes the input 3D matrix such that the
    dimension with the largest cardinality (say G) is
    the 1st dimension, S is the 2nd, and T is the 3rd.
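The reordering step above can be sketched in plain Python on a nested-list 3D array. This is a hypothetical helper, not the authors' code; we assume the two remaining dimensions keep their original relative order, since the slide only fixes which dimension comes first.

```python
def axis_order(shape):
    """Put the axis with the largest cardinality first; keep the others in order."""
    largest = max(range(len(shape)), key=lambda axis: shape[axis])
    return [largest] + [axis for axis in range(len(shape)) if axis != largest]


def transpose3d(data, order):
    """Permute the axes of a nested-list 3D array: new axis p is old axis order[p]."""
    shape = (len(data), len(data[0]), len(data[0][0]))
    new_shape = [shape[axis] for axis in order]
    out = [[[None] * new_shape[2] for _ in range(new_shape[1])]
           for _ in range(new_shape[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                old = (i, j, k)
                p, q, r = (old[axis] for axis in order)
                out[p][q][r] = data[i][j][k]
    return out


# Toy input with the gene axis in the middle: shape (2 samples, 5 genes, 3 times).
data = [[[100 * i + 10 * j + k for k in range(3)] for j in range(5)] for i in range(2)]
order = axis_order((2, 5, 3))
genes_first = transpose3d(data, order)
print(order)  # [1, 0, 2]
```

After the call, `genes_first[g][s][t]` indexes genes first, so later steps can always treat the first axis as the largest one.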

12
Steps of TRICLUSTER
  • TRICLUSTER has the following main steps:
  • For each G × S time slice matrix, find the valid
    ratio-ranges for all pairs of samples, and
    construct a range multigraph
  • Mine the maximal biclusters from the range
    multigraph
  • Construct a graph based on the mined biclusters
    (as vertices) and get the maximal TRICLUSTERs
  • Optionally, delete or merge clusters if certain
    overlapping criteria are met.
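To illustrate the first step, the sketch below computes ratio ranges for one G × S time slice and collects them into a range multigraph keyed by sample pairs. It is a simplified sketch under our own assumptions, not the paper's exact definition: a range is taken to be a maximal run of genes, sorted by the ratio d[g][a] / d[g][b], whose largest ratio is at most (1 + epsilon) times its smallest and which covers at least `min_genes` genes. The function and parameter names (`ratio_ranges`, `range_multigraph`, `epsilon`, `min_genes`) are hypothetical, and expression values are assumed positive so that ratios are well defined.

```python
from itertools import combinations


def ratio_ranges(time_slice, a, b, epsilon, min_genes):
    """Maximal gene runs whose ratios d[g][a] / d[g][b] lie within a factor 1 + epsilon.

    time_slice[g][s] is the (assumed positive) expression of gene g in sample s.
    Returns (lowest ratio, highest ratio, sorted gene indices) for each range.
    """
    ratios = sorted((row[a] / row[b], g) for g, row in enumerate(time_slice))
    ranges = []
    hi = 0
    prev_hi = -1
    for lo in range(len(ratios)):
        hi = max(hi, lo)
        # Extend the window while the largest ratio stays within (1 + epsilon) x smallest.
        while hi + 1 < len(ratios) and ratios[hi + 1][0] <= (1 + epsilon) * ratios[lo][0]:
            hi += 1
        # hi never decreases, so the window is maximal only if it reaches past the previous one.
        if hi > prev_hi and hi - lo + 1 >= min_genes:
            genes = sorted(g for _, g in ratios[lo:hi + 1])
            ranges.append((ratios[lo][0], ratios[hi][0], genes))
        prev_hi = hi
    return ranges


def range_multigraph(time_slice, epsilon, min_genes):
    """Map each sample pair (a, b) to its list of ratio ranges (parallel edges)."""
    num_samples = len(time_slice[0])
    graph = {}
    for a, b in combinations(range(num_samples), 2):
        edges = ratio_ranges(time_slice, a, b, epsilon, min_genes)
        if edges:
            graph[(a, b)] = edges
    return graph


# Toy slice: 4 genes x 2 samples. Genes 0-1 have ratio ~2, genes 2-3 have ratio 3.
time_slice = [[2.0, 1.0], [4.1, 2.0], [3.0, 1.0], [6.0, 2.0]]
graph = range_multigraph(time_slice, epsilon=0.1, min_genes=2)
print([genes for _, _, genes in graph[(0, 1)]])  # [[0, 1], [2, 3]]
```

Each sample pair can carry several ranges, which is why the result is a multigraph. Mining maximal biclusters from it (step 2) and combining them across time slices (step 3) are not shown.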

13
Future Direction
  • Reduce the search space (method yet to be
    determined)
  • Implement the modified algorithm
  • Compare the results with those of the original
    algorithm

14
Need Suggestion
  • Here the dataset D is a data cube. What benefits
    could the data cube model offer in this case?
  • How could this problem be integrated into DataDEX,
    if that is possible?
  • How can we reduce the search space?
  • Any other comments or suggestions are welcome.