Self-Organizing Maps

Provided by: compbi

1
Self-Organizing Maps
2
Basic Theory of SOMs
  • Roughly a constrained version of K-means clustering
    (closely matches the description in Hastie's
    ch. 13)
  • K-means where the cluster centers lie on a smooth
    manifold
  • A nearest-neighbor model designed to project
    multi-dimensional data points down to a
    low-dimensional space

3
The Basic SOM Algo
  • Take a bunch of high-dimensional data points and
    a low-dimensional graph (typically a 2-d grid)
  • Each vertex of the graph has an associated vector
    that lives in the high-dimensional space
  • Each vector starts out with random weights

4
The Basic SOM Algo (Cont.)
  • At each iteration, take a data point and move each
    vector toward it; how far a vector moves depends
    on how close it is to that point
  • OR move only the nearest vector and its neighbors
    in the graph
  • The learning rate decreases at each iteration
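The steps on these two slides can be sketched in a few lines of NumPy. This is a minimal sketch, not the presenter's implementation: it uses the second variant (the nearest vector and its grid neighbors move), with a Gaussian neighborhood and linear decay schedules. The function names `train_som` and `project` and the parameters `lr0` and `sigma0` are assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a SOM on data (n_samples x n_features); return (weights, grid)."""
    rng = np.random.default_rng(seed)
    # Low-dimensional graph: a rows x cols grid; grid[k] is the (row, col) of vertex k.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # One vector per vertex, living in the data space, with random initial weights.
    weights = rng.random((rows * cols, data.shape[1]))
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decreases every iteration
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighborhood radius shrinks (assumed schedule)
        x = data[rng.integers(len(data))]     # take one data point
        # Vertex whose vector is nearest to x in the data space.
        nearest = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian weight based on distance to that vertex on the grid.
        grid_dist2 = np.sum((grid - grid[nearest]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Move every vector toward x, scaled by its neighborhood weight.
        weights += lr * h[:, None] * (x - weights)
    return weights, grid

def project(data, weights, grid):
    """Map each data point to the grid coordinates of its nearest vector."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return grid[np.argmin(dists, axis=1)]
```

Calling `project(data, *train_som(data))` gives each data point a 2-d grid position, which is the projection to a low-dimensional space that the theory slide describes.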

5
The Basic SOM Algo (Cont.)
  • Why?
  • Well, there is a topographic relationship between
    the clusters
  • What it looks like

6
(No Transcript)
7
Using it for gene expression arrays
  • Whitehead Institute likes SOMs
  • In 1999, they pushed this method pretty hard
  • Early success, but now not used that much for
    gene array clustering

8
(No Transcript)
9
What they looked at
  • Took yeast cells throughout their cell cycle and
    looked at various RNA levels
  • Clusters matched human analysis
  • More interestingly, similar expression patterns
    for neighbors

10
What else they looked at
  • Blood stem cells and their differentiation
  • Looked at how expression levels changed over a 24
    hour period as the stem cells differentiated
    into macrophages
  • Underlying idea: similar time courses suggest
    shared regulation
  • Also looked at differences in gene regulation
    across cell lines

11
The good
  • The algorithm runs in time linear in the number of
    data points
  • Robust against noise
  • A relation between clusters
  • Bustas lament

12
The bad
  • No high-order relations
  • You have to guess the number of vertices in the
    lower-dimensional space up front
  • What the heck does an edge mean here?

13
The ugly
  • The child of Salma Hayek and Ed Norton

14
Is there hope?
  • Toronen trains multiple maps with different
    numbers of nodes
  • For example, maps of 3x4, 6x8, 12x16
  • Pick the one that produces the best separation
  • Same connectivity, just a different number of nodes
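The multi-map idea above could look roughly like the sketch below. The slide does not say how "best separation" is measured, so `separation_score` here is an assumed stand-in: a simplified silhouette that compares each point's distance to its nearest and second-nearest map vector. The helper names `train_som` and `pick_map_size` are also hypothetical, and the compact trainer is a generic SOM, not Toronen's actual procedure.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=3000, lr0=0.5, seed=0):
    """Compact SOM trainer on a rows x cols grid; returns the vertex vectors."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))
    sigma0 = max(rows, cols) / 2.0            # assumed starting neighborhood radius
    for t in range(n_iter):
        decay = 1.0 - t / n_iter              # learning rate and radius both shrink
        x = data[rng.integers(len(data))]
        nearest = np.argmin(np.linalg.norm(weights - x, axis=1))
        grid_dist2 = np.sum((grid - grid[nearest]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * (sigma0 * decay + 0.5) ** 2))
        weights += lr0 * decay * h[:, None] * (x - weights)
    return weights

def separation_score(data, weights):
    """Assumed criterion: mean of (b - a) / b, where a and b are each point's
    distances to its nearest and second-nearest vector. Higher is better."""
    d = np.sort(np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2), axis=1)
    a, b = d[:, 0], d[:, 1]
    return float(np.mean((b - a) / np.maximum(b, 1e-12)))

def pick_map_size(data, sizes=((3, 4), (6, 8), (12, 16))):
    """Train one map per size (same grid connectivity) and keep the best scorer."""
    scores = {size: separation_score(data, train_som(data, *size)) for size in sizes}
    return max(scores, key=scores.get), scores
```

A score of this kind drops when a map has so many nodes that neighboring vectors sit almost on top of each other, which is one way to keep the largest map from winning by default.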

15
Combining Hierarchical Clustering and SOMs
  • Hierarchy of clusters
  • Self-organizing process like SOM, but uses a
    binary tree topology
  • Keeps splitting for as long as the researcher wants
  • The problem with this method is that it requires
    training

16
Another Idea
  • Pick an underlying feature space that means
    something biologically
  • Easier said than done
  • Perhaps try to map biological knowledge of gene
    regulatory networks into an appropriate feature
    space topology
  • How unsupervised is that?

17
A mathematical aside
  • Can be thought of as a discretized version of
    principal curves
  • Principal curves are a nonlinear generalization of
    principal components
  • A smooth, 1-d curved approximation to the data
  • Which brings us to some next-level ish
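For reference, the principal-curve idea in the aside can be written down as follows. The notation is an assumption borrowed from the standard Hastie and Stuetzle formulation, not from the slides: $f$ is the curve, $\lambda_f(x)$ is the position on the curve closest to $x$, $\bar{x}$ is the data mean, $v_1$ is the first principal direction, and $m_k$ are the SOM vectors on a 1-d chain of $K$ vertices.

```latex
% Self-consistency: each point on the curve is the mean of the data projecting onto it
f(\lambda) = \mathbb{E}\left[\, X \mid \lambda_f(X) = \lambda \,\right],
\qquad
\lambda_f(x) = \sup\left\{ \lambda : \lVert x - f(\lambda) \rVert
             = \inf_{\mu} \lVert x - f(\mu) \rVert \right\}

% Straight-line special case: the first principal component line
f(\lambda) = \bar{x} + \lambda v_1

% Discretized view: a SOM on a 1-d chain places one vector at each of K positions
m_k \approx f(\lambda_k), \qquad k = 1, \dots, K
```

Read this way, the SOM's vectors are samples of a smooth curve through the data, and the neighborhood update is what keeps those samples smooth.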