1
Clustering and visualization of relational data
  • Barbara Hammer, TU Clausthal
  • (joint with Marie Cottrell, Alexander Hasenfuss,
    Alessio Micheli, Nicolas Neubauer, Alessandro
    Sperduti, Marc Strickert, Thomas Villmann)

2
???
3
2D-Visualization, Clustering
4
Outline
  • Clustering methods: SOM and Neural Gas
  • Time series and recursive data: Merge SOM/NG
  • Dissimilarities: Relational SOM/NG

5
Clustering Methods: SOM and Neural Gas
6
Clustering Methods: SOM and NG
Task: given data xj ∈ Rn, find representative
prototypes wi ∈ Rn
Vector Quantization: initialize wi; adapt
wwinner += η·(xj − wwinner);
optimizes EVQ = Σij δj(i)·‖xj − wi‖²; sensitive to
initialization
Self-Organizing Map: initialize wi; adapt
wi += η·hσ(nd(winner,i))·(xj − wi);
optimizes ESOM = Σij δj(i)·Σk hσ(nd(i,k))·‖xj − wk‖²
(Heskes variant); sensitive to the topology
Neural Gas: initialize wi; adapt
wi += η·hσ(rk(xj,wi))·(xj − wi);
optimizes ENG = Σij hσ(rk(xj,wi))·‖xj − wi‖²
(here δj(i) = 1 iff wi is the winner for xj, and
rk(xj,wi) is the rank of wi among all prototypes
ordered by distance from xj)
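The three rules differ only in how strongly each prototype is pulled toward a sample. A minimal sketch of one online Neural Gas step, assuming a Gaussian neighborhood hσ over ranks (names are illustrative, not from the slides):

```python
import numpy as np

def neural_gas_step(W, x, eta=0.1, sigma=1.0):
    """One online Neural Gas update: every prototype moves toward x,
    weighted by a decaying function of its distance rank."""
    d = np.sum((W - x) ** 2, axis=1)      # squared distances to x
    ranks = np.argsort(np.argsort(d))     # rk(x, w_i) for each prototype
    h = np.exp(-ranks / sigma)            # neighborhood h_sigma(rank)
    W += eta * h[:, None] * (x - W)       # adapt all prototypes
    return W

# usage: 10 prototypes in R^2, one sample
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 2))
W = neural_gas_step(W, np.array([1.0, 2.0]))
```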
7
Visualization
  • SOM: direct (via the lattice)
  • NG: subsequent multidimensional scaling of the
    prototype vectors (choose the distance based on
    the optimum topology learned by NG)

→ for Euclidean data
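A sketch of the NG visualization route using scikit-learn's MDS; the plain Euclidean prototype distances below are only a stand-in for a distance derived from the topology NG has learned:

```python
import numpy as np
from sklearn.manifold import MDS

# prototypes found by NG (hypothetical values)
W = np.random.default_rng(1).normal(size=(10, 5))

# pairwise prototype distances; the slide suggests a distance based on
# the learned topology (e.g. graph distances along the induced
# Delaunay graph) -- Euclidean is used here only as a placeholder
D = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)

# embed the prototypes in 2D for visualization
pos = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(D)
```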
8
Time series and recursive data: Merge SOM/NG
9
Time series
10
SOM for time series
  • Temporal Kohonen Map [Chappell/Taylor, 93]

x1, x2, x3, x4, …, xt, …
d(xt,wi) = ‖xt − wi‖² + α·d(xt−1,wi)
training: wi → xt
  • Recurrent SOM [Koskela/Varsta/Heikkonen, 98],
    similar:

d(xt,wi) = ‖yt‖² where yt = (xt − wi) + α·yt−1
training: wi → yt
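Both recursions as code, a minimal sketch with α as the (assumed) leak parameter:

```python
import numpy as np

def tkm_distance(xs, w, alpha=0.5):
    """Temporal Kohonen Map: leaky integration of squared distances."""
    d = 0.0
    for x in xs:                  # xs = x1, x2, ..., xt
        d = np.sum((x - w) ** 2) + alpha * d
    return d

def rsom_distance(xs, w, alpha=0.5):
    """Recurrent SOM: leaky integration of difference vectors y_t."""
    y = np.zeros_like(w)
    for x in xs:
        y = (x - w) + alpha * y
    return np.sum(y ** 2)         # d(x_t, w_i) = ||y_t||^2
```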
11
SOM for time series
  • TKM/RSOM compute a leaky average of the time series
  • It is not clear how they can differentiate
    various contexts: different sequences with the
    same leaky average are represented identically
12
Merge SOM
  • Idea: an explicit notion of context

(wj, cj) ∈ Rn × Rn
wj represents the current entry xt; cj
represents the context: the content of the
winner of the last step
d(xt,wj) = α·‖xt − wj‖² + (1−α)·‖Ct − cj‖², where
Ct = γ·wI(t−1) + (1−γ)·cI(t−1) and I(t−1) = winner in
step t−1, i.e. the context is the "merge" of the
previous winner's weight and context
13
Merge SOM
  • Example: sequence 42 → 33 → 33 → 34

C1 = (42 + 50)/2 = 46
C2 = (33 + 45)/2 = 39
C3 = (33 + 38)/2 = 35.5
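The context recursion behind this example as code; with γ = 0.5 the new context is simply the average of the previous winner's weight and context:

```python
def merge_context(w_prev, c_prev, gamma=0.5):
    """C_t = gamma * w_{I(t-1)} + (1 - gamma) * c_{I(t-1)}"""
    return gamma * w_prev + (1 - gamma) * c_prev

# first step of the example: previous winner has weight 42, context 50
print(merge_context(42.0, 50.0))  # 46.0
```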
14
Merge SOM
  • Training (a code sketch follows below)
  • MSOM: wj := wj + η·hσ(nhd(j,j0))·(xt − wj)
  •       cj := cj + η·hσ(nhd(j,j0))·(Ct − cj)
  • Euclidean or alternative (e.g. hyperbolic)
    lattices
  • MNG: wj := wj + η·hσ(rk(wj,xt))·(xt − wj)
  •      cj := cj + η·hσ(rk(wj,xt))·(Ct − cj)
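A sketch of one MSOM step over all neurons, assuming a precomputed lattice-distance matrix nd and a Gaussian neighborhood (illustrative, not the authors' code):

```python
import numpy as np

def msom_step(W, C, x, Ct, nd, eta=0.1, sigma=1.0, alpha=0.5):
    """One MSOM step: winner by the blended distance, then weights move
    toward x_t and contexts toward the merged context C_t."""
    d = alpha * np.sum((W - x) ** 2, axis=1) \
        + (1 - alpha) * np.sum((C - Ct) ** 2, axis=1)
    j0 = np.argmin(d)                         # winner j0
    h = np.exp(-(nd[j0] ** 2) / sigma ** 2)   # lattice neighborhood nhd(j, j0)
    W += eta * h[:, None] * (x - W)
    C += eta * h[:, None] * (Ct - C)
    return j0
```

For MNG, the lattice neighborhood h would be replaced by hσ(rank) computed from the blended distances, as in the Neural Gas sketch above.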

15
Merge SOM
  • Experiment:
  • speaker identification, Japanese vowel /ae/
  • 9 speakers, 30 articulations per speaker in the
    training set
  • separate test set
  • http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html

[figure: 12-dim. cepstrum coefficients over time]
16
Merge SOM
  • MNG with posterior labeling
  • γ = 0.5, α: 0.99 → 0.63, learning rate 0.3
  • 150 neurons:
  • 0% training error
  • 2.7% test error
  • 1000 neurons:
  • 0% training error
  • 1.6% test error
  • rule-based: 5.9%, HMM: 3.8% [Kudo et al.]

17
Merge SOM
  • Experiment:
  • classification of donor sites for C. elegans
  • 5 settings with 10,000 training data, 10,000 test
    data, 50 nucleotides (TCGA) embedded in 3 dim, 38%
    donor sites [Sonnenburg, Rätsch et al.]
  • MNG with posterior labeling
  • 512 neurons, η = 0.25, λ = 0.075, α: 0.999 →
    0.4–0.7
  • 14.06 ± 0.66% training error, 14.26 ± 0.39% test
    error
  • sparse representation: 512 prototypes × 6 dimensions

18
Merge SOM
  • Reber grammar

reconstruction of the frequencies by counting
19
Merge SOM
[figure: Reber grammar represented on a SOM and an
HSOM lattice; node labels B, T, S, X, P, V, E]
20
General recursive SOM
A neuron (w, c) processes the sequence xt, xt−1,
xt−2, …, x0: the weight w represents the current
entry xt via ‖xt − w‖²; the context c represents the
history xt−1, xt−2, …, x0 via ‖Ct − c‖².
Hebbian learning: w → xt, c → Ct.
The methods differ in the choice of context!
21
General recursive SOM
  • MSOM:
  • Ct = merged content of the winner in the
    previous time step
  • TKM/RSOM:
  • Ct = activation of the current neuron
    (implicit c)
  • Recursive SOM (RecSOM) [Voegtlin]:
  • Ct = exponential transformation of the
    activations of all neurons,
  • (exp(−d(xt−1,w1)), …, exp(−d(xt−1,wN)))
  • Feedback SOM (FSOM) [Horio/Yamakawa]:
  • Ct = leaky integrated activations of all
    neurons,
  • (d(xt−1,w1), …, d(xt−1,wN)) + λ·Ct−1
  • SOM for structured data (SOMSD)
    [Hagenbuchner/Sperduti/Tsoi]:
  • Ct = index of the winner in the previous
    step
  • Supervised recurrent networks:
  • Ct = sigmoid(activation), metric as dot product
  • Can be generalized to tree structures and beyond!
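Some of the context choices above written out as small functions (a hedged sketch; W holds the weights, C the contexts, j_prev the previous winner):

```python
import numpy as np

def msom_context(W, C, j_prev, gamma=0.5):
    """MSOM: merge of weight and context of the previous winner."""
    return gamma * W[j_prev] + (1 - gamma) * C[j_prev]

def recsom_context(W, x_prev):
    """RecSOM: exponentially transformed activations of all neurons."""
    return np.exp(-np.sum((W - x_prev) ** 2, axis=1))

def somsd_context(lattice_coords, j_prev):
    """SOMSD: lattice location of the previous winner."""
    return lattice_coords[j_prev]
```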

22
General recursive SOM
for normalized or WTA semilinear context
23
Dissimilarity data: Relational SOM/NG
24
Dissimilarity data: Median Neural Gas
[figure: frog's favorites]
  • Data given by pairwise dissimilarities d(xi, xj)
    only, not by explicit vectors

25
Batch clustering
Vector Quantization: initialize wi; adapt
wwinner += η·(xj − wwinner);
optimizes EVQ = Σij δj(i)·‖xj − wi‖²
k-means (for a priorly given data set xj): adapt
I(xj) = index of the wi closest to xj;
wi = Σj δ(I(xj),i)·xj / Σj δ(I(xj),i); sensitive
to initialization
Self-Organizing Map: initialize wi; adapt
wi += η·hσ(nd(winner,i))·(xj − wi);
optimizes ESOM = Σij δj(i)·Σk hσ(nd(i,k))·‖xj − wk‖²
(Heskes variant)
Batch-SOM: adapt I(xj) = index of the wi closest to
xj, using Heskes' definition of closeness;
wi = Σj hσ(nd(I(xj),i))·xj / Σj hσ(nd(I(xj),i));
topological mismatches occur
easily, initialization with PCA directions
necessary
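Batch k-means as stated above, alternating assignments and means (a minimal sketch without empty-cluster handling):

```python
import numpy as np

def batch_kmeans(X, W, iters=50):
    """Alternate I(x_j) = closest prototype and w_i = mean of its points."""
    for _ in range(iters):
        d = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)
        I = np.argmin(d, axis=1)           # assignments I(x_j)
        for i in range(len(W)):
            if np.any(I == i):             # keep w_i if its cluster is empty
                W[i] = X[I == i].mean(axis=0)
    return W, I
```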
26
Batch clustering
In general, optimize Σij f1(kij(w))·f2ij(w), with
constraints on kij(w):

→ EVQ = Σij δj(i)·‖xj − wi‖²
  with kij ∈ {0,1}, Σi kij = 1
→ ESOM = Σij δj(i)·Σk hσ(nd(i,k))·‖xj − wk‖²
  with kij ∈ {0,1}, Σi kij = 1
→ ENG = Σij hσ(rk(xj,wi))·‖xj − wi‖²
  with kij ∈ {0,…,n−1}, a permutation
  for each j
Algorithm: in turn, optimize kij
given fixed w;
optimize w given fixed kij
Batch-NG: kij =
rank of prototype wi given xj;
wi = Σj hσ(kij)·xj / Σj hσ(kij)
Fast!
One can prove convergence (for general
situations)!
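Batch-NG as described, alternating rank assignment and weighted averaging; the Gaussian hσ is an assumption:

```python
import numpy as np

def batch_ng(X, n_proto=10, iters=50, sigma=2.0, seed=0):
    """Batch Neural Gas: k_ij = rank of w_i for x_j;
    w_i = sum_j h(k_ij) x_j / sum_j h(k_ij)."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_proto, replace=False)].copy()
    for _ in range(iters):
        d = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)  # (m, n)
        k = np.argsort(np.argsort(d, axis=1), axis=1)             # ranks k_ij
        h = np.exp(-k / sigma)                                    # h_sigma(k_ij)
        W = h.T @ X / h.sum(axis=0)[:, None]                      # weighted means
    return W
```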
27
Median NG
Data set xj given by dissimilarities d(xi,xj) →
matrix D with entries d(xi,xj)²
Restrict prototypes to data locations: wi = xl for
some l. Hence ‖xj − wi‖² = d(xj,xl)². Standard cost
function: ENG = Σij hσ(rk(xj,wi))·d(xj,wi)²
Median-NG: compute ‖xj − wi‖²; kij =
rank of prototype wi given xj;
wi = argmin over xl of Σj hσ(kij)·d(xj,xl)²
Discrete updates. Convergence. Applicable to SOM
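One Median-NG step as code: given the matrix D2 of squared dissimilarities, each prototype jumps to the data location minimizing its rank-weighted cost (a sketch, assuming Gaussian hσ):

```python
import numpy as np

def median_ng_step(D2, proto_idx, sigma=2.0):
    """One Median-NG step on the squared-dissimilarity matrix D2 (m x m).
    proto_idx[i] = data index l with w_i = x_l."""
    d = D2[:, proto_idx]                           # d[j, i] = d(x_j, x_l)^2
    k = np.argsort(np.argsort(d, axis=1), axis=1)  # ranks k_ij
    h = np.exp(-k / sigma)                         # h_sigma(k_ij)
    # w_i = argmin over x_l of sum_j h(k_ij) d(x_j, x_l)^2
    costs = h.T @ D2                               # costs[i, l]
    return np.argmin(costs, axis=1)                # new prototype locations
```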
28
Relational NG
Data set xj given by dissimilarities d(xi,xj) →
matrix D with entries d(xi,xj)², which stems from
a metric, but the embedding is unknown
Optimum prototypes fulfill wi = Σl ail·xl where
Σl ail = 1 (normalized ranks). Hence ‖xj − wi‖² =
(D·ai)j − ½·aiᵗ·D·ai. Dual cost function: ENG =
Σi Σll′ hσ(rk(xl,wi))·hσ(rk(xl′,wi))·d(xl,xl′)²
Relational-NG: ‖xj − wi‖² = (D·ai)j − ½·aiᵗ·D·ai;
kij = rank of prototype wi
given xj; aij ∝ hσ(kij), normalized
Extends NG. Convergence. Applicable to SOM
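The relational computations as code: distances follow from D and the coefficient rows ai alone, and one update step recomputes the coefficients from the ranks (a sketch under the same Gaussian-hσ assumption):

```python
import numpy as np

def relational_distances(D2, A):
    """||x_j - w_i||^2 = (D2 @ a_i)_j - 0.5 * a_i^T D2 a_i
    for prototypes w_i = sum_l A[i, l] x_l, rows of A summing to 1."""
    cross = A @ D2                                     # (n, m): (D2 a_i)_j
    self_term = 0.5 * np.einsum('il,lk,ik->i', A, D2, A)
    return cross - self_term[:, None]

def relational_ng_step(D2, A, sigma=2.0):
    """One Relational-NG step: ranks from current distances, then
    a_ij proportional to h_sigma(k_ij), normalized per prototype."""
    d = relational_distances(D2, A).T                  # (m, n): d[j, i]
    k = np.argsort(np.argsort(d, axis=1), axis=1)      # ranks k_ij
    h = np.exp(-k / sigma)
    return h.T / h.sum(axis=0)[:, None]                # rows sum to 1
```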
29
Relational NG
  • Protein classification (metric benchmark data set)
  • 226 points, 5 classes (HA, HB, MY, GG/GP, other),
    alignment distance
  • Evolutionary distance of globin proteins
  • 45 neurons, 150 epochs, 10-fold cross-validation,
    average over 100 runs, mixing parameter for
    supervision 0.5

30
Relational NG
31
Relational NG
  • Cat cortex (non-Euclidean data)
  • 65 points, 4 classes (4 cortex regions), matrix
    data, symmetric
  • connection strength of cortical areas
  • 12 neurons, 150 epochs, 10-fold cross-validation,
    average over 250 runs, mixing parameter for
    supervision 0.5

32
Conclusions
  • SOM and NG for robust clustering / visualization
  • Recursive versions for the inspection of events
    in a temporal context (and of constituents of
    recursive data)
  • Median and relational versions for general
    dissimilarity data

33
(No Transcript)