
Title: Matrix Tile Analysis


1
MATRIX TILE ANALYSIS (MTA)
Inmar Givoni, Vincent Cheung, Brendan Frey
2
Goal: explain patterns in matrix data as a set
of non-overlapping generalized tiles.
  • A tile is a subset of similar matrix
    elements defined by a subset of rows and columns
  • Under some permutation of the rows and columns,
    all elements of a particular tile form a
    contiguous block.
  • But there may not exist one permutation for
    which all tiles appear contiguous.
  • Assumptions:
  • Each matrix element belongs to at most one tile.
  • A row/column may contain elements belonging to
    many tiles.
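
The tile definition above can be made concrete with a small sketch. It assumes a tile covers every element in the cross product of its row subset and column subset; the helper names `tile_mask` and `tiles_overlap` are hypothetical and not taken from the slides.

```python
import numpy as np

def tile_mask(n_rows, n_cols, rows, cols):
    """Boolean mask of the elements covered by one tile (rows x cols)."""
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    mask[np.ix_(rows, cols)] = True
    return mask

def tiles_overlap(n_rows, n_cols, tiles):
    """True if any matrix element is covered by more than one tile."""
    coverage = np.zeros((n_rows, n_cols), dtype=int)
    for rows, cols in tiles:
        coverage += tile_mask(n_rows, n_cols, rows, cols)
    return bool((coverage > 1).any())

# Two tiles that share row 2 but use disjoint columns: allowed, because
# a row may contain elements belonging to many tiles.
tiles = [([0, 2], [1, 3]), ([1, 2], [0, 2])]
overlap = tiles_overlap(4, 4, tiles)
```

Neither tile is contiguous in the original ordering, yet each becomes a contiguous block under a suitable permutation of rows and columns.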

MTA is applicable to datasets that are
represented as matrices, e.g. high-throughput
biological data and collaborative filtering.
3
Existing methods
  • Matrix Factorization (PCA, ICA, NNMF, ...)
  • Assume sensibly defined addition and
    multiplication of matrix elements
  • Not necessarily the case: likelihoods, non-ordinal
    input, etc.


[Figure: matrix factorization diagram with labels X, Y, Z, N, M, C]
4
Existing methods
  • Clustering (Hierarchical, K-means, ...)
  • Assume each row/column is in a single cluster
  • Similarity based on the entire row/column
  • We may wish for each row/column to be explained
    by several clusters.

5
Probabilistic Model for MTA
  • Input: an N x M matrix X
  • T - number of tiles to search for
  • By ranging over T we can do model selection.
  • L0, L - user-defined data likelihoods of X's
    elements under the background model and the
    foreground model, respectively
  • The model can easily be extended to more than one
    foreground model.

For example: background (L0), tile elements (L)
6
Probabilistic Model for MTA
  • Latent indicator variables for each tile
  • r_i^t = 1 if the ith row of X contains elements in
    tile t
  • c_j^t = 1 if the jth column of X contains elements in
    tile t

P(X, r, c) = P(r, c) P(X | r, c)
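
As a rough illustration of this factorization, the sketch below evaluates log P(X, r, c) for one configuration of row indicators r and column indicators c. It assumes that matrix elements are independent given their assignment, that an element is in tile t when both its row and column indicators for t are on, and that the prior is constant over valid configurations; none of these details are confirmed by the slides, and `log_joint` is a hypothetical name.

```python
import numpy as np

def log_joint(log_L0, log_L, r, c, log_prior=0.0):
    """log P(X, r, c) = log P(r, c) + log P(X | r, c).

    log_L0, log_L : (N, M) per-element log-likelihoods under the
                    background and foreground models.
    r : (T, N) binary row indicators, c : (T, M) binary column indicators.
    log_prior : assumed constant log P(r, c) for valid configurations.
    """
    r = np.asarray(r)
    c = np.asarray(c)
    # Number of tiles that claim each element (outer product per tile).
    coverage = np.einsum("ti,tj->ij", r, c)
    if (coverage > 1).any():
        return -np.inf  # overlapping tiles violate the model constraint
    in_tile = coverage == 1
    return float(log_prior + np.where(in_tile, log_L, log_L0).sum())

log_L0 = np.zeros((2, 2))
log_L = np.ones((2, 2))
value = log_joint(log_L0, log_L, r=[[1, 0]], c=[[1, 1]])
```

Here the single tile covers row 0 across both columns, so two elements use the foreground log-likelihood and two use the background one.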
7
Factor Graph for MTA
  • We introduce dummy indicator variables for
    every matrix element in every tile.

[Figure: factor graph with one layer per tile, Tile 1 through Tile T]


Enforces the constraint that if xij is in tile t,
the corresponding row and column indicators must
be active.
Enforces the constraint that each element is
accounted for by at most one tile.
MTA-SP: perform inference using the sum-product
algorithm (loopy belief propagation).
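
The two constraints can be written as hard 0/1 factors. This is only a sketch of the constraint functions, not of the message-passing itself; the variable name `s` for the dummy element indicators and both function names are hypothetical.

```python
def row_col_factor(s_ijt, r_it, c_jt):
    """1.0 if allowed, 0.0 otherwise.

    If element (i, j) is assigned to tile t (s_ijt == 1), the row and
    column indicators for that tile must both be active.
    """
    return 1.0 if (s_ijt == 0 or (r_it == 1 and c_jt == 1)) else 0.0

def at_most_one_tile_factor(s_ij):
    """1.0 if element (i, j) is claimed by at most one tile.

    s_ij is the sequence of dummy indicators of one element over all tiles.
    """
    return 1.0 if sum(s_ij) <= 1 else 0.0

allowed = row_col_factor(1, 1, 1)
forbidden = row_col_factor(1, 1, 0)
```

In a sum-product implementation these factors would multiply into the messages, zeroing out any configuration that violates a constraint.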
8
Alternative methods to sum-product for solving MTA
  • MTA-ICM - Iterative row/column update s.t.
    constraints are not violated.
  • PCA - Extract T principal components (of e.g.
    columns); for each component, threshold to find
    the corresponding component elements (which row is in
    which tile), then project the matrix columns.
  • Plaid (Lazzeroni & Owen, 2000) - Interpret each
    layer as either a background or a foreground tile.
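
One possible reading of the PCA baseline is sketched below. The slides do not give the thresholds or the sign convention, so the relative threshold `thresh`, the sign rule, and the function name `pca_tiles` are all assumptions made for illustration.

```python
import numpy as np

def pca_tiles(X, T, thresh=0.5):
    """Guess T tiles from the top T principal components of the columns."""
    Xc = X - X.mean(axis=0, keepdims=True)          # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    tiles = []
    for t in range(T):
        u = U[:, t]
        # Assumed sign rule: orient the component so that it correlates
        # positively with the raw data.
        if (u @ X).sum() < 0:
            u = -u
        # Threshold the component to decide which rows are in the tile.
        rows = np.flatnonzero(u > thresh * u.max())
        # Project the columns onto the component and threshold again.
        proj = u @ Xc
        cols = np.flatnonzero(proj > thresh * proj.max())
        tiles.append((rows, cols))
    return tiles

X = np.zeros((6, 6))
X[:3, :3] = 1.0                                     # one noiseless tile
rows, cols = pca_tiles(X, T=1)[0]
```

Nothing in this procedure prevents two components from claiming the same element, which is one reason it serves as a baseline rather than a solution to the constrained problem.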

9
Experimental Results
  • Generate synthetic tile data
  • Corrupt with noise
  • X: 40x40, T = 5, σ² = 0.0316
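
A minimal sketch of how such synthetic data could be generated is given here. The slides do not specify the noise distribution, the tile sizes, or the signal levels, so additive Gaussian noise, a foreground mean of 1, a background mean of 0, and the `max_side` bound are all assumptions; `make_tile_matrix` is a hypothetical name.

```python
import numpy as np

def make_tile_matrix(n=40, m=40, T=5, sigma2=0.0316, max_side=10, seed=0):
    """Return (X, labels): noisy matrix and per-element tile labels.

    labels[i, j] is 0 for background and t (1..T) for tile t.
    """
    rng = np.random.default_rng(seed)
    labels = np.zeros((n, m), dtype=int)
    for t in range(1, T + 1):
        for _ in range(1000):                        # rejection sampling
            rows = rng.choice(n, size=rng.integers(2, max_side + 1), replace=False)
            cols = rng.choice(m, size=rng.integers(2, max_side + 1), replace=False)
            idx = np.ix_(rows, cols)
            if not labels[idx].any():                # no overlap with earlier tiles
                labels[idx] = t
                break
        else:
            raise RuntimeError("could not place tile without overlap")
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(n, m))
    X = (labels > 0).astype(float) + noise           # corrupt with noise
    return X, labels

X, labels = make_tile_matrix()
```

Because rows and columns are sampled at random, the tiles are generally not contiguous in the generated ordering.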

[Figure: result panels for MTA-SP, MTA-ICM, PCA, and Plaid]
  • Evaluation criteria: Hamming distance,
    classification error, cost
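
The slides do not define the Hamming distance criterion precisely. One plausible reading, sketched here as an assumption, is the fraction of matrix elements whose tile/background status differs between the true and the estimated assignment; `hamming_distance` is a hypothetical helper.

```python
import numpy as np

def hamming_distance(true_labels, est_labels):
    """Fraction of elements whose in-a-tile vs. background status disagrees.

    Both inputs are (N, M) integer arrays with 0 for background and a
    positive tile index otherwise. Tile identities are ignored.
    """
    true_fg = np.asarray(true_labels) > 0
    est_fg = np.asarray(est_labels) > 0
    return float(np.mean(true_fg != est_fg))

truth = np.array([[1, 1, 0], [0, 0, 2]])
guess = np.array([[1, 0, 0], [0, 0, 3]])
dist = hamming_distance(truth, guess)
```

This version ignores which tile an element belongs to, so it cannot penalize two true tiles being merged into one.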

10
Experimental Results: Hamming Distance
[Plot: error rate vs. noise level (σ²)]
  • Vary the number of tiles and the matrix size.
  • Test across 7 noise levels, 20 different matrices
    per setting.

11
Experimental Results: Classification Error
[Plot: error rate vs. noise level (σ²)]
12
Experimental Results: Cost
13
Experimental Results on SGA Data
Synthetic Genetic Interaction (Tong et al., 2004):
135 x 1023 binary interaction matrix.
A synthetic genetic interaction occurs when the
deletion of gene A or gene B alone yields a
viable mutant, while the double knockout is
lethal.
Functional enrichment by GO categories.
14
Summary
  • Introduced Matrix Tile Analysis
  • Factorization of a matrix into non-overlapping
    similar tiles.
  • Probabilistic model and inference algorithms.
  • Comparison to existing methods, and application
    to synthetic and real data.