Title: Barbara Hammer, TU Clausthal

1. Clustering and visualization of relational data
- Barbara Hammer, TU Clausthal
- (joint with Marie Cottrell, Alexander Hasenfuss, Alessio Micheli, Nicolas Neubauer, Alessandro Sperduti, Marc Strickert, Thomas Villmann)
2. ???

3. 2D-Visualization, Clustering
4. Outline
- Clustering methods: SOM and Neural Gas
- Time series and recursive data: Merge SOM/NG
- Dissimilarities: Relational SOM/NG
5. Clustering Methods: SOM and Neural Gas

6. Clustering Methods: SOM and NG
Task: given data x_j ∈ R^n, find representative prototypes w_i ∈ R^n

Vector Quantization: initialize w_i, adapt
  w_winner += ε·(x_j − w_winner)
optimizes E_VQ = Σ_ij δ_j(i)·|x_j − w_i|²; sensitive to initialization

Self-Organizing Map: initialize w_i, adapt
  w_i += ε·h(nd(winner, i))·(x_j − w_i)
optimizes E_SOM = Σ_ij δ_j(i)·Σ_k h(nd(i, k))·|x_j − w_k|² (Heskes variant); sensitive to the topology

Neural Gas: initialize w_i, adapt
  w_i += ε·h(rk(x_j, w_i))·(x_j − w_i)
optimizes E_NG = Σ_ij h(rk(x_j, w_i))·|x_j − w_i|²
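As a sketch, the three rules differ only in how strongly each prototype is pulled toward a sample: VQ moves the winner alone, SOM weights by lattice distance, NG by distance rank. A minimal NumPy illustration of the neural-gas step (the function name and the values of the learning rate `eps` and neighbourhood range `lam` are illustrative, not from the talk):

```python
import numpy as np

def neural_gas_step(W, x, eps=0.1, lam=1.0):
    """One online neural-gas update: every prototype moves toward x,
    weighted by exp(-rank/lam), where rank orders prototypes by distance."""
    d = np.linalg.norm(W - x, axis=1)      # |x - w_i| for all prototypes
    ranks = np.argsort(np.argsort(d))      # rk(x, w_i): 0 for the winner
    h = np.exp(-ranks / lam)               # neighbourhood weights
    return W + eps * h[:, None] * (x - W)

W = np.array([[0.0, 0.0], [1.0, 1.0]])
x = np.array([0.0, 0.2])
W_new = neural_gas_step(W, x)              # winner (row 0) moves most
```

Setting `h` to an indicator of the winner recovers the VQ rule; replacing the rank by the lattice distance nd(winner, i) recovers the SOM rule.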
7. Visualization
- SOM: direct
- NG: subsequent multidimensional scaling of the prototype vectors (choose the distance based on the optimum topology learned by NG)
for euclidean data
8. Time series and recursive data: Merge SOM/NG

9. Time series

10. SOM for time series
- Temporal Kohonen Map [Chappell/Taylor, 93]
  sequence x_1, x_2, x_3, x_4, …, x_t, …
  d(x_t, w_i) = |x_t − w_i|² + α·d(x_{t−1}, w_i)
  training: w_i → x_t
- Recurrent SOM [Koskela/Varsta/Heikkonen, 98], similar:
  d(x_t, w_i) = |y_t|² where y_t = (x_t − w_i) + α·y_{t−1}
  training: w_i → y_t
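A toy sketch of the two recursions for a single prototype (1-D data and the value of the decay α are assumptions for brevity). TKM accumulates squared distances, so two sequences whose entries lie at the same distances from w in a different order yield the same value; RSOM accumulates signed difference vectors, so opposite deviations can cancel:

```python
import numpy as np

def tkm_distance(xs, w, alpha=0.5):
    """TKM: leaky sum of per-step squared distances to prototype w."""
    d = 0.0
    for x in xs:
        d = np.sum((x - w) ** 2) + alpha * d
    return d

def rsom_activation(xs, w, alpha=0.5):
    """RSOM: leaky sum of difference vectors; the distance is |y_t|^2."""
    y = np.zeros_like(w)
    for x in xs:
        y = (x - w) + alpha * y
    return y

w = np.array([0.0])
seq = [np.array([1.0]), np.array([-1.0])]
rev = [np.array([-1.0]), np.array([1.0])]
```

Here `tkm_distance(seq, w) == tkm_distance(rev, w)`, which foreshadows why a leaky average alone has trouble separating contexts.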
11. SOM for time series
- TKM/RSOM compute a leaky average of the time series
- It is not clear how they can differentiate various contexts: different sequences can produce the same leaky average
12. Merge SOM
- Idea: explicit notion of context
  (w_j, c_j) ∈ R^n × R^n
  w_j represents the current entry x_t; c_j represents the context, i.e. the content of the winner of the last step
  d(x_t, w_j) = α·|x_t − w_j|² + (1−α)·|C_t − c_j|²
  where C_t = γ·w_{I(t−1)} + (1−γ)·c_{I(t−1)}, with I(t−1) the winner in step t−1 (the "merge")
13. Merge SOM
C1 = (42 + 50)/2 = 46
C2 = (33 + 45)/2 = 39
C3 = (33 + 38)/2 = 35.5
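The three contexts above are the merge rule with γ = 0.5, i.e. a plain average of the previous winner's weight and context. As a one-line sketch (the function name is illustrative):

```python
def merge_context(w_winner, c_winner, gamma=0.5):
    """C_t = gamma * w_I(t-1) + (1 - gamma) * c_I(t-1); gamma = 0.5 averages."""
    return gamma * w_winner + (1 - gamma) * c_winner

# the three contexts from the slide:
c1 = merge_context(42, 50)   # 46.0
c2 = merge_context(33, 45)   # 39.0
c3 = merge_context(33, 38)   # 35.5
```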
14. Merge SOM
- Training
- MSOM: w_j += ε·nhd(j, j0)·(x_t − w_j)
        c_j += ε·nhd(j, j0)·(C_t − c_j)
  euclidean or alternative (e.g. hyperbolic) lattices
- MNG:  w_j += ε·h(rk(w_j, x_t))·(x_t − w_j)
        c_j += ε·h(rk(w_j, x_t))·(C_t − c_j)
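One MNG step can be sketched by combining the blended distance of slide 12 with rank-based updates of both weight and context (function name and the values of `eps`, `lam`, `alpha` are illustrative assumptions):

```python
import numpy as np

def mng_step(W, C, x, Ct, eps=0.1, lam=1.0, alpha=0.5):
    """One merge-NG update: rank by alpha*|x - w_j|^2 + (1-alpha)*|Ct - c_j|^2,
    then move weights toward x and contexts toward Ct, weighted by rank."""
    d = alpha * np.sum((W - x) ** 2, axis=1) \
        + (1 - alpha) * np.sum((C - Ct) ** 2, axis=1)
    h = np.exp(-np.argsort(np.argsort(d)) / lam)
    W = W + eps * h[:, None] * (x - W)
    C = C + eps * h[:, None] * (Ct - C)
    return W, C, int(np.argmin(d))   # winner index feeds the next context

W = np.array([[0.0, 0.0], [5.0, 5.0]])
C = np.zeros((2, 2))
x = np.array([0.0, 0.0])
Ct = np.array([0.0, 0.0])
W2, C2, winner = mng_step(W, C, x, Ct)
```

In a full run, `Ct` for the next step would be recomputed from `W2[winner]` and `C2[winner]` via the merge rule.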
15. Merge SOM
- Experiment
- speaker identification, Japanese vowel "ae"
- 9 speakers, 30 articulations per speaker in the training set
- separate test set
- http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html
(figure: 12-dim. cepstrum over time)
16. Merge SOM
- MNG with posterior labeling
- γ = 0.5, α = 0.99 → 0.63, ε = 0.3
- 150 neurons
  - 0% training error
  - 2.7% test error
- 1000 neurons
  - 0% training error
  - 1.6% test error
- rule based: 5.9%, HMM: 3.8% [Kudo et al.]
17. Merge SOM
- Experiment
- classification of donor sites for C. elegans
- 5 settings with 10000 training data, 10000 test data; 50 nucleotides (TCGA) embedded in 3 dim; 38% donor sites [Sonnenburg, Rätsch et al.]
- MNG with posterior labeling
- 512 neurons, ε = 0.25, λ = 0.075, α = 0.999 → 0.4/0.7
- 14.06 ± 0.66% training error, 14.26 ± 0.39% test error
- sparse representation: 512 × 6 dim
18. Merge SOM
reconstruction of the frequencies by counting
19. Merge SOM
(figure: SOM and HSOM lattices, neurons labeled with states E, B, P, S, V, X, T)
20. General recursive SOM
sequence x_t, x_{t−1}, x_{t−2}, …, x_0
a neuron (w, c) compares the current entry x_t to w via |x_t − w|², and a context representation C_t of the remaining sequence x_{t−1}, x_{t−2}, …, x_0 to c via |C_t − c|²
The methods differ in the choice of context!
Hebbian learning: w → x_t, c → C_t
21. General recursive SOM
- MSOM
  - C_t = merged content of the winner in the previous time step
- TKM/RSOM
  - C_t = activation of the current neuron (implicit c)
- Recursive SOM (RecSOM) [Voegtlin]
  - C_t = exponential transformation of the activation of all neurons
  - (exp(−d(x_{t−1}, w_1)), …, exp(−d(x_{t−1}, w_N)))
- Feedback SOM (FSOM) [Horio/Yamakawa]
  - C_t = leaky integrated activation of all neurons
  - (d(x_{t−1}, w_1), …, d(x_{t−1}, w_N)) + β·C_{t−1}
- SOM for structured data (SOMSD) [Hagenbuchner/Sperduti/Tsoi]
  - C_t = index of the winner in the previous step
- Supervised recurrent networks
  - C_t = sgd(activation), metric as dot product
- Can be generalized to tree structures and beyond!
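Two of the contexts above have a simple closed form; a sketch contrasting them (function names are illustrative, squared distances assumed):

```python
import numpy as np

def recsom_context(x_prev, W):
    """RecSOM: exponentially transformed activation of all N neurons,
    (exp(-d(x_{t-1}, w_1)), ..., exp(-d(x_{t-1}, w_N)))."""
    d = np.sum((W - x_prev) ** 2, axis=1)
    return np.exp(-d)

def somsd_context(x_prev, W):
    """SOMSD: only the index of the winner in the previous step."""
    return int(np.argmin(np.sum((W - x_prev) ** 2, axis=1)))

W = np.array([[0.0], [1.0]])
c_rec = recsom_context(np.array([0.0]), W)   # N-dim context vector
c_sd = somsd_context(np.array([0.0]), W)     # single index
```

The trade-off is visible in the types: RecSOM carries an N-dimensional context per step, SOMSD compresses it to one index.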
22. General recursive SOM
for normalized or WTA semilinear context
23. Dissimilarity data: Relational SOM/NG

24. Dissimilarity data: Median Neural Gas
(figure: frog's favorites)
- Data given by pairwise dissimilarities d(x_i, x_j)
25. Batch clustering
Vector Quantization: initialize w_i, adapt
  w_winner += ε·(x_j − w_winner)
optimizes E_VQ = Σ_ij δ_j(i)·|x_j − w_i|²

k-means, for a priorly given data set x_j: adapt
  I(x_j) = index of the w_i closest to x_j
  w_i = Σ_j δ(I(x_j), i)·x_j / Σ_j δ(I(x_j), i)
sensitive to initialization

Self-Organizing Map: initialize w_i, adapt
  w_i += ε·h(nd(winner, i))·(x_j − w_i)
optimizes E_SOM = Σ_ij δ_j(i)·Σ_k h(nd(i, k))·|x_j − w_k|² (Heskes variant)

Batch-SOM: adapt
  I(x_j) = index of the w_i closest to x_j, using Heskes' definition of closeness
  w_i = Σ_j h(nd(I(x_j), i))·x_j / Σ_j h(nd(I(x_j), i))
topological mismatches occur easily; initialization with PCA directions necessary
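The batch k-means step above (assign, then average) can be sketched directly (function name is illustrative):

```python
import numpy as np

def kmeans_batch_step(X, W):
    """One batch k-means step: I(x_j) = closest prototype, then
    w_i = mean of the points assigned to i."""
    assign = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
    W_new = W.copy()
    for i in range(len(W)):
        pts = X[assign == i]
        if len(pts) > 0:            # keep empty prototypes in place
            W_new[i] = pts.mean(axis=0)
    return W_new, assign

X = np.array([[0.0], [0.2], [5.0], [5.2]])
W = np.array([[0.0], [5.0]])
W_new, assign = kmeans_batch_step(X, W)   # prototypes move to cluster means
```

Batch-SOM replaces the hard indicator δ(I(x_j), i) by the neighbourhood weight h(nd(I(x_j), i)) in both numerator and denominator.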
26Batch clustering
In general optimize ?ijf1(kij(w)) f2ij(w) ?
EVQ ?ij ?j(i) (xj-wi)2
? ESOM ?ij ?j(i)
?k?(nd(i,k)) (xj-wk)2
? ENG
?ij ?(rk(xj,wi)) (xj-wi)2 with constraints on
kij(w) ? kij in 0,1, ?i kij 1
? kij in
0,1, ?i kij 1
? kij in 0,...,n-1 permutation
for each j algorithm in turn optimize kij
given fixed w
optimize w given fixed kij
Batch-NG kij
rank of prototype wi given xj
wi ?j ?(kij) xj / ?j
?(kij)
Fast!
One can prove convergence (for general
situations)!
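The batch-NG half-steps (ranks given fixed prototypes, then the weighted average) can be sketched as follows (function name and the neighbourhood range `lam` are illustrative):

```python
import numpy as np

def batch_ng_step(X, W, lam=1.0):
    """One batch neural-gas step: k_ij = rank of prototype i for point j,
    then w_i = sum_j h(k_ij) x_j / sum_j h(k_ij)."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)   # n_points x n_protos
    ranks = np.argsort(np.argsort(d, axis=1), axis=1)    # k_ij per data point
    h = np.exp(-ranks / lam)
    return (h.T @ X) / h.sum(axis=0)[:, None]

X = np.array([[0.0], [0.2], [5.0], [5.2]])
W = np.array([[0.0], [5.0]])
W_new = batch_ng_step(X, W)
```

With a large `lam` every point pulls every prototype and the solutions blur together; annealing `lam` toward 0 recovers the hard k-means assignment.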
27. Median NG
Data set x_j given by dissimilarities d(x_i, x_j) → matrix D with entries d(x_i, x_j)²
Restrict prototypes to data locations: w_i = x_l for some l. Hence |x_j − w_i|² = d(x_j, x_l)².
Standard cost function: E_NG = Σ_ij h(rk(x_j, w_i))·d(x_j, w_i)²
Median-NG:
  compute |x_j − w_i|²
  k_ij = rank of prototype w_i given x_j
  w_i = argmin_{x_l} Σ_j h(k_ij)·d(x_j, x_l)²
Discrete updates. Convergence. Applicable to SOM.
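Since prototypes are data indices, one median-NG step only needs the squared-dissimilarity matrix (the function name, the toy 1-D data used to build `D2`, and `lam` are illustrative assumptions):

```python
import numpy as np

# four 1-D points in two clusters; only their squared dissimilarities are used
pts = np.array([0.0, 0.1, 5.0, 5.1])
D2 = (pts[:, None] - pts[None, :]) ** 2

def median_ng_step(D2, proto_idx, lam=1.0):
    """One median-NG step: rank prototypes per point, then replace each
    prototype by the data point minimizing sum_j h(k_ij) d(x_j, x_l)^2."""
    d = D2[:, proto_idx]                               # d(x_j, w_i)^2
    ranks = np.argsort(np.argsort(d, axis=1), axis=1)  # k_ij
    h = np.exp(-ranks / lam)
    return [int(np.argmin(h[:, i] @ D2)) for i in range(len(proto_idx))]

new_idx = median_ng_step(D2, [0, 2])   # each prototype stays in its cluster
```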
28. Relational NG
Data set x_j given by dissimilarities d(x_i, x_j) → matrix D with entries d(x_i, x_j)², which stems from a metric which is, however, unknown
Optimum prototypes fulfill w_i = Σ_l α_il·x_l where Σ_l α_il = 1 (normalized ranks). Hence |x_j − w_i|² = (D·α_i)_j − ½·α_iᵗ·D·α_i.
Dual cost function: E_NG = Σ_i Σ_{l,l'} h(rk(x_l, w_i))·h(rk(x_l', w_i))·d(x_l, x_l')²
Relational-NG:
  |x_j − w_i|² = (D·α_i)_j − ½·α_iᵗ·D·α_i
  k_ij = rank of prototype w_i given x_j
  α_ij = h(k_ij), normalize
Extends NG. Convergence. Applicable to SOM.
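The key identity, (D·α)_j − ½·αᵗ·D·α = |x_j − w|² for w = Σ_l α_l·x_l, can be checked numerically: the sketch below builds explicit points only to verify the result, while the relational formula itself touches nothing but the dissimilarity matrix and the coefficients (the point coordinates and coefficient vector `a` are arbitrary test values):

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared dissimilarities

def relational_dist2(D2, a):
    """|x_j - w|^2 for the implicit prototype w = sum_l a_l x_l
    (sum a_l = 1), computed without any coordinates."""
    return D2 @ a - 0.5 * (a @ D2 @ a)

a = np.array([0.5, 0.25, 0.25])
implicit = relational_dist2(D2, a)          # uses D2 only
explicit = ((X - a @ X) ** 2).sum(-1)       # uses the hidden embedding
```

Both routes give the same distances, which is exactly what lets relational NG run when the embedding is unknown.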
29. Relational NG
- Protein classification (metric benchmark data set)
- 226 points, 5 classes (HA, HB, MY, GG/GP, other), alignment distance
- evolutionary distance of globin proteins
- 45 neurons, 150 epochs, 10-fold cross-validation, average over 100 runs, mixing parameter for supervision 0.5
30. Relational NG

31. Relational NG
- Cat cortex (non-euclidean data)
- 65 points, 4 classes (4 cortex regions), symmetric matrix data
- connection strength of cortical areas
- 12 neurons, 150 epochs, 10-fold cross-validation, average over 250 runs, mixing parameter for supervision 0.5
32. Conclusions
- SOM and NG for robust clustering / visualization
- Recursive versions for the inspection of events in a temporal context (… constituents of recursive data)
- Median and relational versions for general similarity data