Title: An Association Analysis Approach to Biclustering
Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers and Vipin Kumar
Department of Computer Science and Engineering, University of Minnesota
E-mail: gaurav_at_cs.umn.edu
Website: http://www.cs.umn.edu/kumar/dmbio
MOTIVATION
Bicluster: a group of objects showing similarity over only a subset of the features in a data set.
The problem has been studied extensively for microarray data, for finding various types of biclusters.
Biclustering finds more functionally enriched groups of genes than hierarchical clustering (Prelic et al., 2006).
APPROACH
Association patterns are biclusters!
Advantages
- Exhaustive (and efficient) discovery of biclusters.
- Can discover small biclusters owing to the bottom-up search procedure.
Disadvantage
- Need to binarize or discretize the original real-valued data set, which causes a loss of information (Becquet et al., 2002; Creighton et al., 2003; McIntosh et al., 2007).
RESULTS
Functional enrichment for small classes (1-30 members)
Functional enrichment for large classes (31-500 members)
Range Support: an anti-monotonic support measure for real-valued data!
Constant addition biclusters
Constant value biclusters
Constant row (column) biclusters
(Madeira & Oliveira, 2004)
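The bicluster types listed above can be illustrated with small synthetic matrices. The following is a minimal sketch only: the function names are hypothetical, and "constant addition" is assumed here to mean that every row equals a shared base row shifted by a row-specific additive constant.

```python
def constant_value(n_rows, n_cols, mu):
    """Constant value bicluster: every cell holds the same value mu."""
    return [[mu] * n_cols for _ in range(n_rows)]


def constant_row(row_values, n_cols):
    """Constant row bicluster: each row is constant, rows may differ."""
    return [[v] * n_cols for v in row_values]


def constant_column(col_values, n_rows):
    """Constant column bicluster: each column is constant, columns may differ."""
    return [list(col_values) for _ in range(n_rows)]


def constant_addition(base_row, row_offsets):
    """Assumed additive model: row i = base_row + row_offsets[i]."""
    return [[b + off for b in base_row] for off in row_offsets]
```

For example, `constant_addition([1.0, 3.0, 2.0], [0.0, 0.5])` yields the rows `[1.0, 3.0, 2.0]` and `[1.5, 3.5, 2.5]`: the values differ, but the two rows rise and fall together across the columns.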
- Constraints imposed
- Consistency of expression values
- Same direction of expression
- These conditions are satisfied over a substantial number of conditions
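The constraints above can be turned into a support-like score in several ways. The sketch below is one plausible formalization and is not necessarily the authors' exact definition of range support. It assumes that a condition counts only if all genes in the pattern have the same sign and their absolute values lie within a relative spread `alpha` (a hypothetical parameter), and that the condition then contributes the smallest absolute value. The names `range_support`, `data` and `alpha` are illustrative.

```python
def range_support(data, genes, alpha=0.5):
    """Sum the contributions of all conditions supporting a gene set.

    data  : dict mapping gene name -> list of real values, one per condition
    genes : iterable of gene names forming the candidate pattern
    alpha : assumed tolerance on the relative spread of absolute values
    """
    genes = list(genes)
    n_conditions = len(data[genes[0]])
    total = 0.0
    for c in range(n_conditions):
        values = [data[g][c] for g in genes]
        # Same direction of expression: all strictly positive or all strictly negative.
        same_sign = all(v > 0 for v in values) or all(v < 0 for v in values)
        if not same_sign:
            continue
        mags = [abs(v) for v in values]
        lo, hi = min(mags), max(mags)
        # Consistency of expression values: bounded relative spread.
        if (hi - lo) / lo <= alpha:
            # Assumed contribution: the weakest expression level in the set.
            total += lo
    return total


# Hypothetical toy data: three genes measured under four conditions.
data = {
    "g1": [2.0, -1.0, 0.1, 3.0],
    "g2": [2.2, -1.2, -0.5, 1.0],
    "g3": [2.4, 1.0, 0.3, 3.1],
}
```

Under these assumptions the score is anti-monotonic: adding a gene can only break the sign or spread constraint for a condition, or lower that condition's minimum, so a superset never scores higher than any of its subsets.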
CURRENT BICLUSTERING APPROACHES
Fraction of patterns (biclusters) enriched by several groups of small classes at p-value 1x10^-5
Fraction of class covered by patterns (biclusters) among several groups of small classes at p-value 1x10^-5
Common Issues
Define an objective function/measure for
coherence of a bicluster
- Non-exhaustive: the heuristic search scheme doesn't enumerate all biclusters satisfying the specified condition
- Bias towards larger biclusters: the objective function/measure is satisfied early
- Non-overlapping biclusters (some)
Reorder rows and columns for global minimum
Eliminate rows and columns for local minimum
Eliminate rows and columns from random seed
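As a concrete instance of an objective function/measure for the coherence of a bicluster, the mean squared residue used by Cheng and Church scores a submatrix by how far each cell deviates from an additive row-plus-column model. The sketch below is a plain-Python illustration; the function name and argument layout are assumptions, not taken from any particular implementation.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by row and column indices.

    The residue of a cell is a_ij - (row mean) - (column mean) + (overall mean);
    a perfectly additive bicluster has a score of zero.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_r * n_c)


# Hypothetical examples: an additive bicluster and a perturbed copy of it.
additive = [[1.0, 3.0, 2.0], [1.5, 3.5, 2.5], [3.0, 5.0, 4.0]]
perturbed = [[1.0, 3.0, 2.0], [1.5, 3.5, 2.5], [9.0, 0.0, 4.0]]
```

A heuristic search that removes rows and columns until this score falls below a threshold stops at the first acceptable submatrix it reaches, which is one way to read the non-exhaustiveness noted above.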
- Can be used within an Apriori-like framework (Agrawal et al., 1994)
- Implementation at http://www.cs.umn.edu/vk/gaurav/rap
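To show why anti-monotonicity matters for an Apriori-like framework, here is a minimal level-wise search sketch. It is not the authors' implementation: `levelwise_search`, `toy_support` and `toy_data` are hypothetical names, and the toy measure (the sum over conditions of the minimum value, on non-negative data) merely stands in for any anti-monotonic support measure.

```python
from itertools import combinations


def levelwise_search(items, measure, min_support):
    """Enumerate all item sets with measure >= min_support, bottom-up.

    Relies on the measure being anti-monotonic: a set can pass only if
    every one of its subsets passes, so failed sets are never extended.
    """
    result = {}
    level = []
    for item in sorted(items):
        single = frozenset([item])
        score = measure(single)
        if score >= min_support:
            result[single] = score
            level.append(single)
    k = 1
    while level:
        k += 1
        previous = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset must have passed at the previous level.
            if all(union - {x} in previous for x in union):
                candidates.add(union)
        level = []
        for cand in candidates:
            score = measure(cand)
            if score >= min_support:
                result[cand] = score
                level.append(cand)
    return result


# Hypothetical non-negative toy data: three genes under three conditions.
toy_data = {"g1": [2.0, 1.0, 0.0], "g2": [2.0, 1.0, 3.0], "g3": [0.0, 1.0, 3.0]}


def toy_support(genes):
    """Stand-in anti-monotonic measure: sum over conditions of the minimum value."""
    return sum(min(toy_data[g][c] for g in genes) for c in range(3))


found = levelwise_search(toy_data, toy_support, min_support=3.0)
```

With this toy data the pair {g1, g3} falls below the threshold, so the triple {g1, g2, g3} is pruned without being scored. The search still returns every set that meets the threshold, including the small ones.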
Cheng & Church (CC)
ISA
Coclustering
ACKNOWLEDGEMENTS
This work has been supported by NSF grants
CRI-0551551, IIS-0308264 and ITR-0325949.
Computational resources for this work were
provided by MSI.