An Association Analysis Approach to Biclustering
Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers and Vipin Kumar
Department of Computer Science and Engineering, University of Minnesota
E-mail: gaurav_at_cs.umn.edu
Website: http://www.cs.umn.edu/kumar/dmbio
MOTIVATION
Bicluster: a group of objects showing similarity over only a subset of
the features in a data set.
The problem has been studied extensively for microarray data, for
finding various types of biclusters.
Biclustering finds more functionally enriched groups of genes than
hierarchical clustering (Prelic et al., 2006).

APPROACH
Association patterns are biclusters!
Advantages
  • Exhaustive (and efficient) discovery of biclusters.
  • Can discover small biclusters owing to bottom-up
    search procedure.
Disadvantage
  • Need to binarize or discretize the original
    real-valued data set, which causes a loss of
    information (Becquet et al., 2002; Creighton et
    al., 2003; McIntosh et al., 2007).

RESULTS
[Figure: Functional enrichment for small classes (1-30 members)]
[Figure: Functional enrichment for large classes (31-500 members)]
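
The information loss from binarization noted under Disadvantage can be seen in a small sketch. The threshold, the two matrices, and the helper names `binarize` and `support_count` are assumptions chosen for illustration; none of them come from the poster.

```python
def binarize(matrix, threshold):
    """Map each real value to 1 if it is at least `threshold`, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


def support_count(binary_matrix, itemset):
    """Count rows (transactions) in which every column in `itemset` is 1."""
    return sum(all(row[j] == 1 for j in itemset) for row in binary_matrix)


# Two hypothetical expression matrices (rows = conditions, columns = genes).
# In `coherent`, genes 0 and 1 have nearly equal values in every condition.
coherent = [[2.0, 2.1], [3.0, 3.1], [1.5, 1.6]]
# In `incoherent`, the values differ widely but all exceed the threshold.
incoherent = [[2.0, 9.0], [8.0, 1.2], [1.5, 7.0]]

THRESHOLD = 1.0  # assumed cut-off, not a value from the poster

# Both matrices collapse to the same binary matrix, so a binary support
# count cannot tell the coherent gene pair from the incoherent one.
same_binary = binarize(coherent, THRESHOLD) == binarize(incoherent, THRESHOLD)
print(same_binary)
print(support_count(binarize(incoherent, THRESHOLD), [0, 1]))
```

After thresholding, the two gene pairs are indistinguishable, which is the motivation for a support measure defined directly on real values.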
Range Support: an anti-monotonic support measure for real-valued data!
Types of biclusters (Madeira & Oliveira, 2004):
  • Constant value biclusters
  • Constant row (column) biclusters
  • Constant addition biclusters
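
The bicluster types named here can be made concrete with small checks on a submatrix. This is only a sketch: the function names and the tolerance parameter `tol` are illustrative assumptions, and the definitions follow the usual reading of the terms (all entries equal; each row or column constant; each row equal to the first row plus one row-specific offset).

```python
def is_constant_value(sub, tol=1e-9):
    """Every entry equals the same value."""
    first = sub[0][0]
    return all(abs(v - first) <= tol for row in sub for v in row)


def is_constant_row(sub, tol=1e-9):
    """Within each row, all entries are equal (rows may differ)."""
    return all(abs(v - row[0]) <= tol for row in sub for v in row)


def is_constant_column(sub, tol=1e-9):
    """Within each column, all entries are equal (columns may differ)."""
    transposed = [list(col) for col in zip(*sub)]
    return is_constant_row(transposed, tol)


def is_constant_addition(sub, tol=1e-9):
    """Each row equals the first row shifted by a single additive offset."""
    base = sub[0]
    for row in sub:
        offset = row[0] - base[0]
        if any(abs((v - b) - offset) > tol for v, b in zip(row, base)):
            return False
    return True


# Hypothetical submatrices, one per type.
constant_value = [[1.0, 1.0], [1.0, 1.0]]
constant_row = [[1.0, 1.0], [3.0, 3.0]]
additive = [[1.0, 2.0, 4.0], [3.0, 4.0, 6.0]]  # second row = first row + 2

print(is_constant_value(constant_value))
print(is_constant_row(constant_row))
print(is_constant_addition(additive))
```

Under these definitions a constant value bicluster is also a constant row bicluster and a constant addition bicluster, so the types are nested rather than disjoint.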
Constraints imposed:
  • Consistency of expression values
  • Same direction of expression
  • These conditions are satisfied over a substantial
    number of conditions
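
The three constraints above can be sketched as a support-like score for a set of genes over real-valued data. The poster does not spell out the formula, so the version below is an illustrative assumption rather than the authors' definition: the tolerance parameter `alpha`, the function name, and the choice of the smallest absolute value as the per-condition contribution are all hypothetical.

```python
def range_support_sketch(data, genes, alpha=0.5):
    """Sum, over conditions, of a contribution that is non-zero only when
    the selected genes agree in sign and lie in a narrow relative range.

    data  : list of rows, one per condition; each row holds one value per gene
    genes : column indices forming the candidate pattern
    alpha : assumed tolerance; the range must not exceed alpha * min |value|
    """
    total = 0.0
    for row in data:
        values = [row[g] for g in genes]
        same_sign = all(v > 0 for v in values) or all(v < 0 for v in values)
        if not same_sign:
            continue  # "same direction of expression" is violated
        smallest = min(abs(v) for v in values)
        if max(values) - min(values) > alpha * smallest:
            continue  # "consistency of expression values" is violated
        total += smallest  # this condition supports the pattern
    return total


# Hypothetical data: 4 conditions (rows) by 3 genes (columns).
data = [
    [2.0, 2.2, 2.1],
    [-1.0, -1.1, 3.0],
    [4.0, 4.4, 0.5],
    [1.0, 1.0, 1.0],
]
print(range_support_sketch(data, [0, 1]))
print(range_support_sketch(data, [0, 1, 2]))
```

Adding a gene can only shrink the smallest absolute value and widen the range, so under these assumptions the score never increases as the gene set grows. A threshold on the score then plays the role of the third constraint, requiring support from a substantial number of conditions.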

CURRENT BICLUSTERING APPROACHES
[Figure: Fraction of patterns (biclusters) enriched by several groups of small classes at p-value 1 x 10^-5]
[Figure: Fraction of class covered by patterns (biclusters) among several groups of small classes at p-value 1 x 10^-5]
Define an objective function/measure for coherence of a bicluster,
then optimize it in one of these ways:
  • Reorder rows and columns for global minimum
  • Eliminate rows and columns for local minimum
  • Eliminate rows and columns from random seed

Common Issues
  • Non-exhaustive: heuristic search scheme doesn't
    enumerate all biclusters satisfying the specified
    condition
  • Bias towards larger biclusters: objective
    function/measure satisfied early
  • Non-overlapping biclusters (some)
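
As one concrete instance of an objective function for bicluster coherence, the mean squared residue used by Cheng and Church scores how far a submatrix departs from an additive row-plus-column model, with lower values meaning more coherence. The sketch below uses plain Python lists; the function name and the example matrices are illustrative choices.

```python
def mean_squared_residue(sub):
    """Average of (a_ij - row_mean_i - col_mean_j + overall_mean)^2."""
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(row[j] for row in sub) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(sub):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical submatrices: the first is perfectly additive, the second
# has one entry perturbed.
additive = [[1.0, 2.0, 4.0], [3.0, 4.0, 6.0]]
noisy = [[1.0, 2.0, 4.0], [3.0, 9.0, 6.0]]

print(mean_squared_residue(additive) < 1e-12)
print(mean_squared_residue(noisy) > mean_squared_residue(additive))
```

A heuristic search that removes rows and columns until such a score drops below a threshold stops at the first acceptable submatrix it reaches, which is one way to read the issues listed above.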
  • Range support can be used within an Apriori-like
    framework (Agrawal et al., 1994)
  • Implementation at http://www.cs.umn.edu/vk/gaurav/rap
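
An Apriori-like framework relies on the anti-monotonic property: if a pattern fails the threshold, no superset can pass, so candidates are grown level by level from surviving patterns only. A minimal sketch follows. The function `apriori_like`, the generic `measure` callback, and the toy binary data are assumptions for illustration and are not the implementation linked above.

```python
from itertools import combinations


def apriori_like(items, measure, threshold):
    """Level-wise search for all itemsets whose measure meets `threshold`.

    Exhaustive only when `measure` is anti-monotonic, i.e. it never
    increases when an item is added to the set.
    """
    current = [frozenset([i]) for i in items
               if measure(frozenset([i])) >= threshold]
    found = list(current)
    size = 1
    while current:
        size += 1
        survivors = set(current)
        # Join step: unions of surviving sets that have exactly `size` items.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself have survived.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in survivors
                   for s in combinations(c, size - 1))
        }
        current = [c for c in candidates if measure(c) >= threshold]
        found.extend(current)
    return found


# Toy binary data: each transaction is the set of items it contains.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
                {"a", "b", "c"}]


def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions)


patterns = apriori_like(["a", "b", "c"], support, threshold=3)
print(sorted(sorted(p) for p in patterns))
```

Any anti-monotonic measure can be passed as `measure`, which is what allows a real-valued support measure to replace the binary count without changing the search.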

REFERENCES
  • G. Pandey, V. Kumar, and M. Steinbach.
    Computational Approaches for Protein Function
    Prediction: A Survey. Technical Report 06-028,
    Department of Computer Science, University of
    Minnesota, October 2006.
  • G. Pandey, M. Steinbach, R. Gupta, T. Garg, and
    V. Kumar. Association Analysis-based
    Transformations for Protein Interaction Networks:
    A Function Prediction Case Study. In Proceedings
    of ACM KDD, pages 540-549, 2007.
  • G. Pandey, G. Atluri, M. Steinbach, and V. Kumar.
    Association Analysis for Real-valued Data:
    Definitions and Application to Microarray Data.
    Technical Report 08-007, Department of Computer
    Science, University of Minnesota, 2008.
  • H. Xiong, G. Pandey, M. Steinbach, and V. Kumar.
    Enhancing data analysis with noise removal. IEEE
    Transactions on Knowledge and Data Engineering,
    18(3):304-319, 2006.
  • H. Xiong, X. He, C. Ding, Y. Zhang, V. Kumar, and
    S. R. Holbrook. Identification of functional
    modules in protein complexes via hyperclique
    pattern discovery. In Proc. Pacific Symposium on
    Biocomputing (PSB), pages 221-232, 2005.
  • H. Xiong, P.-N. Tan, and V. Kumar. Hyperclique
    pattern discovery. Data Min. Knowl. Discov.,
    13(2):219-242, 2006.

Cheng & Church (CC)
ISA
Coclustering
ACKNOWLEDGEMENTS
This work has been supported by NSF grants
CRI-0551551, IIS-0308264 and ITR-0325949.
Computational resources for this work were
provided by MSI.