Matrix Tile Analysis

About This Presentation


Slides: 13

Provided by: inm96


Transcript and Presenter's Notes


1
MATRIX TILE ANALYSIS (MTA)
Inmar Givoni, Vincent Cheung, Brendan Frey
2
Goal: explain patterns in matrix data as a set of non-overlapping generalized tiles.

A tile is a subset of similar matrix elements, defined by a subset of rows and a subset of columns.
Under some permutation of the rows and columns, all elements of a particular tile form a contiguous block.
But there may not exist a single permutation for which all tiles appear contiguous.
Assumptions:
Each matrix element belongs to at most one tile.
A row/column may contain elements belonging to many tiles.
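The tile definition and the two assumptions above can be illustrated with a minimal NumPy sketch. The helper names (tile_mask, tiles_overlap) and the example row/column subsets are hypothetical and chosen only for illustration; they are not from the presentation.

```python
import numpy as np

def tile_mask(shape, rows, cols):
    """Boolean mask of the elements in the tile defined by rows x cols."""
    mask = np.zeros(shape, dtype=bool)
    mask[np.ix_(rows, cols)] = True
    return mask

def tiles_overlap(masks):
    """True if any matrix element is claimed by more than one tile."""
    return bool((np.sum(masks, axis=0) > 1).any())

shape = (4, 5)
# Two hypothetical tiles that share row 1 but share no matrix element.
tile_a = tile_mask(shape, rows=[0, 1], cols=[0, 3])
tile_b = tile_mask(shape, rows=[1, 2], cols=[1, 4])
overlap = tiles_overlap([tile_a, tile_b])

# Permuting the columns turns tile_a into a contiguous block
# in the top-left corner of the matrix.
col_perm = [0, 3, 1, 4, 2]
tile_a_permuted = tile_a[:, col_perm]
```

Here row 1 contains elements of both tiles, which the assumptions allow, while no single element belongs to both.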

MTA is applicable to datasets that are represented as matrices, e.g. high-throughput biological data and collaborative filtering.
3
Existing methods

Matrix factorization (PCA, ICA, NNMF, ...)
Assumes sensibly defined addition and multiplication of matrix elements.
This is not necessarily the case, e.g. for likelihoods or non-ordinal input.

[Figure: matrix factorization diagram with labels X, Y, Z, N, M, C]
4
Existing methods

Clustering (hierarchical, K-means, ...)
Assumes each row/column is in a single cluster.
Similarity is based on the entire row/column.
We may wish for each row/column to be explained by several clusters.

5
Probabilistic Model for MTA

Input for N x M matrix X:
T - # of tiles to search for. By ranging over T we can do model selection.
L0, L - user-defined data likelihoods of X's elements under the background model and the foreground model, respectively.
The model can easily be extended to more than one foreground model.

For example: background (L0), tile elements (L)
6
Probabilistic Model for MTA

Latent indicator variables for each tile t:
r_i^t = 1 if the ith row of X contains elements in tile t
c_j^t = 1 if the jth column of X contains elements in tile t

P(X, r, c) = P(r, c) P(X | r, c)
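As a rough sketch of the data term P(X | r, c), the code below assumes that element (i, j) belongs to tile t exactly when both its row and column indicators for t are on, that tile elements are scored with L and all other elements with L0, and that any configuration placing an element in more than one tile has zero probability. These are assumptions made for illustration; the prior P(r, c) is not modeled, and the function names are hypothetical.

```python
import numpy as np

def tile_membership(r, c):
    """r: (T, N) binary row indicators, c: (T, M) binary column indicators.
    Returns a (T, N, M) array whose entry [t, i, j] equals r[t, i] * c[t, j]."""
    return r[:, :, None] * c[:, None, :]

def log_likelihood(log_L0, log_L, r, c):
    """log P(X | r, c) from per-element log-likelihoods.

    log_L0, log_L: (N, M) arrays of log-likelihoods of each element of X
    under the background and foreground models (user-defined).
    Returns -inf if any element falls in more than one tile.
    """
    in_tile = tile_membership(r, c).sum(axis=0)
    if (in_tile > 1).any():
        return -np.inf
    return float(np.where(in_tile == 1, log_L, log_L0).sum())

# Toy 2 x 2 example with constant, made-up log-likelihoods.
log_L0 = np.full((2, 2), -1.0)
log_L = np.full((2, 2), -0.5)
r = np.array([[1, 0]])   # one tile, covering row 0
c = np.array([[1, 1]])   # and both columns
ll = log_likelihood(log_L0, log_L, r, c)

# Two identical tiles claim element (0, 0) twice, violating the constraint.
r_bad = np.array([[1, 0], [1, 0]])
c_bad = np.array([[1, 0], [1, 0]])
ll_bad = log_likelihood(log_L0, log_L, r_bad, c_bad)
```

In the toy example, two elements are scored as tile elements and two as background, so the total is 2 * (-0.5) + 2 * (-1.0).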
7
Factor Graph for MTA

We introduce dummy indicator variables for every matrix element in every tile.

[Figure: factor graph with one layer per tile, Tile 1 through Tile T]

One factor enforces the constraint that if x_ij is in tile t, the corresponding row and column indicators must be active.
Another factor enforces the constraint that each element is accounted for by at most one tile.
MTA-SP: perform inference using the sum-product algorithm (loopy belief propagation).
8
Alternative methods to sum-product for solving MTA

MTA-ICM - Iterative row/column updates, subject to the constraints not being violated.
PCA - Extract T principal components (e.g. of the columns); for each component, threshold to find the corresponding component elements (which row is in which tile), then project the matrix columns.
Plaid (Lazzeroni & Owen, 2000) - Interpret each layer as either a background or a foreground tile.
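The slide gives no detail on MTA-ICM beyond "iterative row/column update". One plausible reading is sketched below: a greedy coordinate-wise search that flips one row or column indicator at a time and keeps the flip only if the data log-likelihood improves and no element ends up in more than one tile. This is an illustrative variant under those assumptions, not the authors' implementation; the function names, the caller-supplied initialization, and the toy scores are all hypothetical.

```python
import numpy as np

def objective(log_L0, log_L, r, c):
    """Data log-likelihood; -inf if an element lies in more than one tile."""
    in_tile = np.einsum("ti,tj->ij", r, c)
    if (in_tile > 1).any():
        return -np.inf
    return float(np.where(in_tile == 1, log_L, log_L0).sum())

def icm(log_L0, log_L, r_init, c_init, sweeps=10):
    """Greedy single-indicator updates from a caller-supplied starting point."""
    r, c = r_init.copy(), c_init.copy()
    best = objective(log_L0, log_L, r, c)
    for _ in range(sweeps):
        changed = False
        for indicators in (r, c):            # rows first, then columns
            for idx in np.ndindex(indicators.shape):
                indicators[idx] ^= 1         # tentatively flip one indicator
                score = objective(log_L0, log_L, r, c)
                if score > best:             # keep only strict improvements
                    best, changed = score, True
                else:
                    indicators[idx] ^= 1     # revert the flip
        if not changed:
            break
    return r, c, best

# Toy 4 x 4 problem: a foreground block in rows 0-1, columns 0-1.
log_L0 = np.zeros((4, 4))
log_L = -np.ones((4, 4))
log_L[:2, :2] = 2.0
r0 = np.zeros((1, 4), dtype=int)   # no rows active
c0 = np.ones((1, 4), dtype=int)    # all columns active
r_hat, c_hat, score = icm(log_L0, log_L, r0, c0)
```

Starting with all indicators off would leave this search stuck, because a single flip then changes no element's membership; that is why the example starts with all columns active. Like any ICM-style method, the result depends on the initialization.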

9
Experimental Results

Generate synthetic tile data.
Corrupt with noise.
X: 40 x 40, T = 5, σ² = 0.0316

[Figure panels: MTA-SP, MTA-ICM, PCA, Plaid]

Evaluation criteria: Hamming distance, classification error, cost
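The experimental setup can be sketched as follows. The slides do not specify the generative details, so this sketch assumes tile elements have mean 1, background elements have mean 0, and the corruption is additive Gaussian noise with variance σ². It also assumes the Hamming distance is taken between binary tile-membership masks. Two tiles are used instead of five for brevity, and all names and tile positions are hypothetical.

```python
import numpy as np

def make_synthetic(n, m, tiles, noise_var, rng):
    """Build a noisy n x m matrix from a list of (rows, cols) tiles.

    Returns (X, labels), where labels[i, j] is the 1-based index of the
    tile containing element (i, j), or 0 for background.
    """
    labels = np.zeros((n, m), dtype=int)
    for t, (rows, cols) in enumerate(tiles, start=1):
        labels[np.ix_(rows, cols)] = t
    clean = (labels > 0).astype(float)      # assumed means: tile 1, background 0
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(n, m))
    return clean + noise, labels

def hamming_distance(a, b):
    """Number of positions at which two equally shaped arrays differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

rng = np.random.default_rng(0)
tiles = [
    (np.arange(0, 10), np.arange(0, 10)),
    (np.arange(20, 30), np.arange(15, 25)),
]
X, labels = make_synthetic(40, 40, tiles, noise_var=0.0316, rng=rng)

truth = labels > 0
estimate = X > 0.5                          # naive threshold, not MTA
distance = hamming_distance(truth, estimate)
```

The threshold estimate is only a stand-in to show how the metric is applied; comparing binary masks sidesteps the fact that tile labels are only defined up to a permutation.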

10
Experimental Results: Hamming Distance
[Plot: error rate vs. noise level (σ²)]

Vary # of tiles, matrix size.
Test across 7 noise levels, 20 different matrices per setting.

11
Experimental Results: Classification Error
[Plot: error rate vs. noise level (σ²)]
12
Experimental Results: Cost
13
Experimental Results on SGA Data
Synthetic genetic interactions (Tong et al., 2004): 135 x 1023 binary interaction matrix.
A synthetic genetic interaction occurs when the deletion of gene A or gene B alone yields a viable mutant, while the double knockout is lethal.
Functional enrichment by GO categories.
14
Summary