Title: Barbara Hammer, TU Clausthal

1. Clustering and visualization of relational data
- Barbara Hammer, TU Clausthal
- (joint with Marie Cottrell, Alexander Hasenfuss, Alessio Micheli, Nicolas Neubauer, Alessandro Sperduti, Marc Strickert, Thomas Villmann)
2. ???

3. 2D-Visualization, Clustering
4. Outline
- Clustering methods: SOM and Neural Gas
- Time series and recursive data: Merge SOM/NG
- Dissimilarities: Relational SOM/NG
5. Clustering Methods: SOM and Neural Gas

6. Clustering Methods: SOM and NG
Task: given data x_j ∈ R^n, find representative prototypes w_i ∈ R^n

Vector Quantization: initialize w_i, adapt
  w_winner += ε·(x_j − w_winner)
optimizes E_VQ = Σ_ij δ_j(i)·|x_j − w_i|²; sensitive to initialization

Self-Organizing Map: initialize w_i, adapt
  w_i += ε·h(nd(winner, i))·(x_j − w_i)
optimizes E_SOM = Σ_ij δ_j(i)·Σ_k h(nd(i, k))·|x_j − w_k|² (Heskes variant); sensitive to the topology

Neural Gas: initialize w_i, adapt
  w_i += ε·h(rk(x_j, w_i))·(x_j − w_i)
optimizes E_NG = Σ_ij h(rk(x_j, w_i))·|x_j − w_i|²
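As a sketch, the three rules differ only in how strongly each prototype is pulled toward a sample: VQ moves the winner alone, SOM weights by lattice distance, NG by distance rank. A minimal NumPy illustration of the neural-gas step (the function name and the values of the learning rate `eps` and neighbourhood range `lam` are illustrative, not from the talk):

```python
import numpy as np

def neural_gas_step(W, x, eps=0.1, lam=1.0):
    """One online neural-gas update: every prototype moves toward x,
    weighted by exp(-rank/lam), where rank orders prototypes by distance."""
    d = np.linalg.norm(W - x, axis=1)      # |x - w_i| for all prototypes
    ranks = np.argsort(np.argsort(d))      # rk(x, w_i): 0 for the winner
    h = np.exp(-ranks / lam)               # neighbourhood weights
    return W + eps * h[:, None] * (x - W)

W = np.array([[0.0, 0.0], [1.0, 1.0]])
x = np.array([0.0, 0.2])
W_new = neural_gas_step(W, x)              # winner (row 0) moves most
```

Setting `h` to an indicator of the winner recovers the VQ rule; replacing the rank by the lattice distance nd(winner, i) recovers the SOM rule.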
7. Visualization
- SOM: direct
- NG: subsequent multidimensional scaling of the prototype vectors (choose the distance based on the optimum topology learned by NG)
for euclidean data
8. Time series and recursive data: Merge SOM/NG

9. Time series

10. SOM for time series
- Temporal Kohonen Map [Chappell/Taylor, 93]
  sequence x_1, x_2, x_3, x_4, …, x_t, …
  d(x_t, w_i) = |x_t − w_i|² + α·d(x_{t−1}, w_i)
  training: w_i → x_t
- Recurrent SOM [Koskela/Varsta/Heikkonen, 98], similar:
  d(x_t, w_i) = |y_t|² where y_t = (x_t − w_i) + α·y_{t−1}
  training: w_i → y_t
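A toy sketch of the two recursions for a single prototype (1-D data and the value of the decay α are assumptions for brevity). TKM accumulates squared distances, so two sequences whose entries lie at the same distances from w in a different order yield the same value; RSOM accumulates signed difference vectors, so opposite deviations can cancel:

```python
import numpy as np

def tkm_distance(xs, w, alpha=0.5):
    """TKM: leaky sum of per-step squared distances to prototype w."""
    d = 0.0
    for x in xs:
        d = np.sum((x - w) ** 2) + alpha * d
    return d

def rsom_activation(xs, w, alpha=0.5):
    """RSOM: leaky sum of difference vectors; the distance is |y_t|^2."""
    y = np.zeros_like(w)
    for x in xs:
        y = (x - w) + alpha * y
    return y

w = np.array([0.0])
seq = [np.array([1.0]), np.array([-1.0])]
rev = [np.array([-1.0]), np.array([1.0])]
```

Here `tkm_distance(seq, w) == tkm_distance(rev, w)`, which foreshadows why a leaky average alone has trouble separating contexts.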
11. SOM for time series
- TKM/RSOM compute a leaky average of the time series
- It is not clear how they can differentiate various contexts: different sequences can produce the same leaky average
12. Merge SOM
- Idea: explicit notion of context
  (w_j, c_j) ∈ R^n × R^n
  w_j represents the current entry x_t; c_j represents the context, i.e. the content of the winner of the last step
  d(x_t, w_j) = α·|x_t − w_j|² + (1−α)·|C_t − c_j|²
  where C_t = γ·w_{I(t−1)} + (1−γ)·c_{I(t−1)}, with I(t−1) the winner in step t−1 (the "merge")
13. Merge SOM
C1 = (42 + 50)/2 = 46
C2 = (33 + 45)/2 = 39
C3 = (33 + 38)/2 = 35.5
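The three contexts above are the merge rule with γ = 0.5, i.e. a plain average of the previous winner's weight and context. As a one-line sketch (the function name is illustrative):

```python
def merge_context(w_winner, c_winner, gamma=0.5):
    """C_t = gamma * w_I(t-1) + (1 - gamma) * c_I(t-1); gamma = 0.5 averages."""
    return gamma * w_winner + (1 - gamma) * c_winner

# the three contexts from the slide:
c1 = merge_context(42, 50)   # 46.0
c2 = merge_context(33, 45)   # 39.0
c3 = merge_context(33, 38)   # 35.5
```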
14. Merge SOM
- Training
- MSOM: w_j += ε·nhd(j, j0)·(x_t − w_j)
        c_j += ε·nhd(j, j0)·(C_t − c_j)
  euclidean or alternative (e.g. hyperbolic) lattices
- MNG:  w_j += ε·h(rk(w_j, x_t))·(x_t − w_j)
        c_j += ε·h(rk(w_j, x_t))·(C_t − c_j)
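One MNG step can be sketched by combining the blended distance of slide 12 with rank-based updates of both weight and context (function name and the values of `eps`, `lam`, `alpha` are illustrative assumptions):

```python
import numpy as np

def mng_step(W, C, x, Ct, eps=0.1, lam=1.0, alpha=0.5):
    """One merge-NG update: rank by alpha*|x - w_j|^2 + (1-alpha)*|Ct - c_j|^2,
    then move weights toward x and contexts toward Ct, weighted by rank."""
    d = alpha * np.sum((W - x) ** 2, axis=1) \
        + (1 - alpha) * np.sum((C - Ct) ** 2, axis=1)
    h = np.exp(-np.argsort(np.argsort(d)) / lam)
    W = W + eps * h[:, None] * (x - W)
    C = C + eps * h[:, None] * (Ct - C)
    return W, C, int(np.argmin(d))   # winner index feeds the next context

W = np.array([[0.0, 0.0], [5.0, 5.0]])
C = np.zeros((2, 2))
x = np.array([0.0, 0.0])
Ct = np.array([0.0, 0.0])
W2, C2, winner = mng_step(W, C, x, Ct)
```

In a full run, `Ct` for the next step would be recomputed from `W2[winner]` and `C2[winner]` via the merge rule.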
15. Merge SOM
- Experiment
- speaker identification, Japanese vowel "ae"
- 9 speakers, 30 articulations per speaker in the training set
- separate test set
- http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html
(figure: 12-dim. cepstrum over time)
16. Merge SOM
- MNG with posterior labeling
- γ = 0.5, α = 0.99 → 0.63, ε = 0.3
- 150 neurons
  - 0% training error
  - 2.7% test error
- 1000 neurons
  - 0% training error
  - 1.6% test error
- rule based: 5.9%, HMM: 3.8% [Kudo et al.]
17. Merge SOM
- Experiment
- classification of donor sites for C. elegans
- 5 settings with 10000 training data, 10000 test data; 50 nucleotides (TCGA) embedded in 3 dim; 38% donor sites [Sonnenburg, Rätsch et al.]
- MNG with posterior labeling
- 512 neurons, ε = 0.25, λ = 0.075, α = 0.999 → 0.4/0.7
- 14.06 ± 0.66% training error, 14.26 ± 0.39% test error
- sparse representation: 512 × 6 dim
18. Merge SOM
reconstruction of the frequencies by counting
19. Merge SOM
(figure: SOM and HSOM lattices, neurons labeled with states E, B, P, S, V, X, T)
20. General recursive SOM
sequence x_t, x_{t−1}, x_{t−2}, …, x_0
a neuron (w, c) compares the current entry x_t to w via |x_t − w|², and a context representation C_t of the remaining sequence x_{t−1}, x_{t−2}, …, x_0 to c via |C_t − c|²
The methods differ in the choice of context!
Hebbian learning: w → x_t, c → C_t
21. General recursive SOM
- MSOM
  - C_t = merged content of the winner in the previous time step
- TKM/RSOM
  - C_t = activation of the current neuron (implicit c)
- Recursive SOM (RecSOM) [Voegtlin]
  - C_t = exponential transformation of the activation of all neurons
  - (exp(−d(x_{t−1}, w_1)), …, exp(−d(x_{t−1}, w_N)))
- Feedback SOM (FSOM) [Horio/Yamakawa]
  - C_t = leaky integrated activation of all neurons
  - (d(x_{t−1}, w_1), …, d(x_{t−1}, w_N)) + β·C_{t−1}
- SOM for structured data (SOMSD) [Hagenbuchner/Sperduti/Tsoi]
  - C_t = index of the winner in the previous step
- Supervised recurrent networks
  - C_t = sgd(activation), metric as dot product
- Can be generalized to tree structures and beyond!
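Two of the contexts above have a simple closed form; a sketch contrasting them (function names are illustrative, squared distances assumed):

```python
import numpy as np

def recsom_context(x_prev, W):
    """RecSOM: exponentially transformed activation of all N neurons,
    (exp(-d(x_{t-1}, w_1)), ..., exp(-d(x_{t-1}, w_N)))."""
    d = np.sum((W - x_prev) ** 2, axis=1)
    return np.exp(-d)

def somsd_context(x_prev, W):
    """SOMSD: only the index of the winner in the previous step."""
    return int(np.argmin(np.sum((W - x_prev) ** 2, axis=1)))

W = np.array([[0.0], [1.0]])
c_rec = recsom_context(np.array([0.0]), W)   # N-dim context vector
c_sd = somsd_context(np.array([0.0]), W)     # single index
```

The trade-off is visible in the types: RecSOM carries an N-dimensional context per step, SOMSD compresses it to one index.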
22. General recursive SOM
for normalized or WTA semilinear context
23. Dissimilarity data: Relational SOM/NG

24. Dissimilarity data: Median Neural Gas
(figure: frog's favorites)
- Data given by pairwise dissimilarities d(x_i, x_j)
25. Batch clustering
Vector Quantization: initialize w_i, adapt
  w_winner += ε·(x_j − w_winner)
optimizes E_VQ = Σ_ij δ_j(i)·|x_j − w_i|²

k-means, for a priorly given data set x_j: adapt
  I(x_j) = index of the w_i closest to x_j
  w_i = Σ_j δ(I(x_j), i)·x_j / Σ_j δ(I(x_j), i)
sensitive to initialization

Self-Organizing Map: initialize w_i, adapt
  w_i += ε·h(nd(winner, i))·(x_j − w_i)
optimizes E_SOM = Σ_ij δ_j(i)·Σ_k h(nd(i, k))·|x_j − w_k|² (Heskes variant)

Batch-SOM: adapt
  I(x_j) = index of the w_i closest to x_j, using Heskes' definition of closeness
  w_i = Σ_j h(nd(I(x_j), i))·x_j / Σ_j h(nd(I(x_j), i))
topological mismatches occur easily; initialization with PCA directions necessary
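The batch k-means step above (assign, then average) can be sketched directly (function name is illustrative):

```python
import numpy as np

def kmeans_batch_step(X, W):
    """One batch k-means step: I(x_j) = closest prototype, then
    w_i = mean of the points assigned to i."""
    assign = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
    W_new = W.copy()
    for i in range(len(W)):
        pts = X[assign == i]
        if len(pts) > 0:            # keep empty prototypes in place
            W_new[i] = pts.mean(axis=0)
    return W_new, assign

X = np.array([[0.0], [0.2], [5.0], [5.2]])
W = np.array([[0.0], [5.0]])
W_new, assign = kmeans_batch_step(X, W)   # prototypes move to cluster means
```

Batch-SOM replaces the hard indicator δ(I(x_j), i) by the neighbourhood weight h(nd(I(x_j), i)) in both numerator and denominator.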
26Batch clustering
In general optimize ?ijf1(kij(w)) f2ij(w) ?
EVQ ?ij ?j(i) (xj-wi)2
? ESOM ?ij ?j(i)
?k?(nd(i,k)) (xj-wk)2
? ENG
?ij ?(rk(xj,wi)) (xj-wi)2 with constraints on
kij(w) ? kij in 0,1, ?i kij 1
? kij in
0,1, ?i kij 1
? kij in 0,...,n-1 permutation
for each j algorithm in turn optimize kij
given fixed w
optimize w given fixed kij
Batch-NG kij
rank of prototype wi given xj
wi ?j ?(kij) xj / ?j
?(kij)
Fast!
One can prove convergence (for general
situations)!
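The batch-NG half-steps (ranks given fixed prototypes, then the weighted average) can be sketched as follows (function name and the neighbourhood range `lam` are illustrative):

```python
import numpy as np

def batch_ng_step(X, W, lam=1.0):
    """One batch neural-gas step: k_ij = rank of prototype i for point j,
    then w_i = sum_j h(k_ij) x_j / sum_j h(k_ij)."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)   # n_points x n_protos
    ranks = np.argsort(np.argsort(d, axis=1), axis=1)    # k_ij per data point
    h = np.exp(-ranks / lam)
    return (h.T @ X) / h.sum(axis=0)[:, None]

X = np.array([[0.0], [0.2], [5.0], [5.2]])
W = np.array([[0.0], [5.0]])
W_new = batch_ng_step(X, W)
```

With a large `lam` every point pulls every prototype and the solutions blur together; annealing `lam` toward 0 recovers the hard k-means assignment.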
27. Median NG
Data set x_j given by dissimilarities d(x_i, x_j) → matrix D with entries d(x_i, x_j)²
Restrict prototypes to data locations: w_i = x_l for some l. Hence |x_j − w_i|² = d(x_j, x_l)².
Standard cost function: E_NG = Σ_ij h(rk(x_j, w_i))·d(x_j, w_i)²
Median-NG:
  compute |x_j − w_i|²
  k_ij = rank of prototype w_i given x_j
  w_i = argmin_{x_l} Σ_j h(k_ij)·d(x_j, x_l)²
Discrete updates. Convergence. Applicable to SOM.
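Since prototypes are data indices, one median-NG step only needs the squared-dissimilarity matrix (the function name, the toy 1-D data used to build `D2`, and `lam` are illustrative assumptions):

```python
import numpy as np

# four 1-D points in two clusters; only their squared dissimilarities are used
pts = np.array([0.0, 0.1, 5.0, 5.1])
D2 = (pts[:, None] - pts[None, :]) ** 2

def median_ng_step(D2, proto_idx, lam=1.0):
    """One median-NG step: rank prototypes per point, then replace each
    prototype by the data point minimizing sum_j h(k_ij) d(x_j, x_l)^2."""
    d = D2[:, proto_idx]                               # d(x_j, w_i)^2
    ranks = np.argsort(np.argsort(d, axis=1), axis=1)  # k_ij
    h = np.exp(-ranks / lam)
    return [int(np.argmin(h[:, i] @ D2)) for i in range(len(proto_idx))]

new_idx = median_ng_step(D2, [0, 2])   # each prototype stays in its cluster
```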
28. Relational NG
Data set x_j given by dissimilarities d(x_i, x_j) → matrix D with entries d(x_i, x_j)², which stems from a metric which is, however, unknown
Optimum prototypes fulfill w_i = Σ_l α_il·x_l where Σ_l α_il = 1 (normalized ranks). Hence |x_j − w_i|² = (D·α_i)_j − ½·α_iᵗ·D·α_i.
Dual cost function: E_NG = Σ_i Σ_{l,l'} h(rk(x_l, w_i))·h(rk(x_l', w_i))·d(x_l, x_l')²
Relational-NG:
  |x_j − w_i|² = (D·α_i)_j − ½·α_iᵗ·D·α_i
  k_ij = rank of prototype w_i given x_j
  α_ij = h(k_ij), normalize
Extends NG. Convergence. Applicable to SOM.
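The key identity, (D·α)_j − ½·αᵗ·D·α = |x_j − w|² for w = Σ_l α_l·x_l, can be checked numerically: the sketch below builds explicit points only to verify the result, while the relational formula itself touches nothing but the dissimilarity matrix and the coefficients (the point coordinates and coefficient vector `a` are arbitrary test values):

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared dissimilarities

def relational_dist2(D2, a):
    """|x_j - w|^2 for the implicit prototype w = sum_l a_l x_l
    (sum a_l = 1), computed without any coordinates."""
    return D2 @ a - 0.5 * (a @ D2 @ a)

a = np.array([0.5, 0.25, 0.25])
implicit = relational_dist2(D2, a)          # uses D2 only
explicit = ((X - a @ X) ** 2).sum(-1)       # uses the hidden embedding
```

Both routes give the same distances, which is exactly what lets relational NG run when the embedding is unknown.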
29. Relational NG
- Protein classification (metric benchmark data set)
- 226 points, 5 classes (HA, HB, MY, GG/GP, other), alignment distance
- evolutionary distance of globin proteins
- 45 neurons, 150 epochs, 10-fold cross-validation, average over 100 runs, mixing parameter for supervision 0.5
30. Relational NG

31. Relational NG
- Cat cortex (non-euclidean data)
- 65 points, 4 classes (4 cortex regions), symmetric matrix data
- connection strength of cortical areas
- 12 neurons, 150 epochs, 10-fold cross-validation, average over 250 runs, mixing parameter for supervision 0.5
32. Conclusions
- SOM and NG for robust clustering / visualization
- Recursive versions for the inspection of events in a temporal context (… constituents of recursive data)
- Median and relational versions for general similarity data