Title: SOMs for time series
1. SOMs for time series
2. Time series ...
3. try to learn French
design the next research project
be very polite
organize everything
enjoy the excellent food
enjoy the nice company
enjoy yourself
go to sleep
drink something
4. Time series
- time series of red wine, white wine, sake, coke, green tea, water
- spoken language: time series of frequencies
- written language: time series of symbols
- sensor streams
- music
- motor functions
- heart beat and other biological time series
- metabolic reactions
- DNA sequences
Vive la France!
5. SOM for time series ...
6. SOM for time series
- self-organizing map (SOM) [Kohonen]
- very popular unsupervised self-organizing neural method for data mining and visualization
- network given by prototypes w_j ∈ R^n arranged in a lattice
- mapping: R^n ∋ x → the position j in the lattice for which ‖x − w_j‖ is minimal
- Hebbian learning based on examples x_i and neighborhood cooperation,
  i.e. choose x_i and adapt all w_j: w_j ← w_j + η·nhd(j, j_0)·(x_i − w_j) (sketch below)
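A minimal sketch of this Hebbian update in Python (NumPy assumed; the Gaussian lattice neighborhood and all parameter values are illustrative choices, not taken from the slides):

```python
import numpy as np

def som_step(W, grid, x, eta=0.1, sigma=1.0):
    """One Hebbian SOM step: pick the winner j0 for input x and move
    every prototype w_j towards x, weighted by the lattice neighborhood."""
    j0 = np.argmin(np.linalg.norm(W - x, axis=1))        # winner: minimal ||x - w_j||
    # nhd(j, j0): Gaussian of the lattice distance (one common choice)
    nhd = np.exp(-np.linalg.norm(grid - grid[j0], axis=1) ** 2 / (2 * sigma ** 2))
    W += eta * nhd[:, None] * (x - W)                    # w_j <- w_j + eta*nhd*(x - w_j)
    return j0

# toy usage: a 5x5 lattice of prototypes in R^3
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
W = np.random.rand(25, 3)
for x in np.random.rand(200, 3):
    som_step(W, grid, x)
```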
7. SOM for time series
- time window technique [Martinetz et al., Simon et al.]
- specific metrics for sequences [Günter/Bunke, Kohonen, Somervuo, Yin]
- statistical models [Bishop et al., Tino et al., Swarup et al.]
- temporal aspects by spatial representation [Euliano/Principe, James/Miikkulainen, Kohonen, Schulz/Reggia, Wiemer]
- recurrent processing of time series
8. SOM for time series
- Temporal Kohonen Map (TKM) [Chappell/Taylor, 93]
  for a sequence x_1, x_2, x_3, x_4, ..., x_t, ...
  d(x_t, w_i) = ‖x_t − w_i‖² + α·d(x_{t−1}, w_i)
  training: w_i → x_t
- Recurrent SOM (RSOM) [Koskela/Varsta/Heikkonen, 98]
  d(x_t, w_i) = ‖y_t‖² where y_t = (x_t − w_i) + α·y_{t−1}
  training: w_i → y_t
(both recursions are sketched in code below)
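A sketch of the two recursions in Python, following the formulas above (NumPy assumed; resetting the recursion to zero at the start of the sequence is an assumption of this sketch):

```python
import numpy as np

def tkm_distances(X, W, alpha=0.5):
    """Temporal Kohonen Map: d_i(t) = ||x_t - w_i||^2 + alpha * d_i(t-1)."""
    d = np.zeros(len(W))
    for x in X:                            # X: the sequence x_1, ..., x_t
        d = np.sum((x - W) ** 2, axis=1) + alpha * d
    return d                               # leaky distances after the last entry

def rsom_distances(X, W, alpha=0.5):
    """Recurrent SOM: y_i(t) = (x_t - w_i) + alpha * y_i(t-1), d_i(t) = ||y_i(t)||^2."""
    y = np.zeros_like(W)
    for x in X:
        y = (x - W) + alpha * y            # leaky integrated difference vectors
    return np.sum(y ** 2, axis=1)
```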
9. SOM for time series
- TKM/RSOM compute a leaky average of the time series
- it is not clear how they can differentiate various contexts - no explicit context! Different histories can end up with the same leaky average.
10. Merge SOM ...
11. Merge SOM
- Idea: an explicit notion of context
- each neuron stores a pair (w_j, c_j) ∈ R^n × R^n:
  w_j represents the current entry x_t, c_j represents the context, i.e. the content of the winner of the last step
- d(x_t, w_j) = α·‖x_t − w_j‖² + (1−α)·‖C_t − c_j‖²
  where C_t = γ·w_{I(t−1)} + (1−γ)·c_{I(t−1)} is the "merge" and I(t−1) is the winner in step t−1
12. Merge SOM
- numerical example of the merged context for γ = 0.5 (checked in code below):
  C_1 = (42 + 50)/2 = 46
  C_2 = (33 + 45)/2 = 39
  C_3 = (33 + 38)/2 = 35.5
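A two-line check of these numbers in Python; for γ = 0.5 the merged context is simply the average of the previous winner's weight and context entries (the scalar values 42/50 etc. are the ones from the slide):

```python
def merged_context(w_winner, c_winner, gamma=0.5):
    # C_t = gamma * w_{I(t-1)} + (1 - gamma) * c_{I(t-1)}
    return gamma * w_winner + (1 - gamma) * c_winner

print(merged_context(42, 50))   # C_1 = 46.0
print(merged_context(33, 45))   # C_2 = 39.0
print(merged_context(33, 38))   # C_3 = 35.5
```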
13. Merge SOM
- Training (one full step is sketched in code below)
- MSOM: w_j ← w_j + η·nhd(j, j_0)·(x_t − w_j)
        c_j ← c_j + η·nhd(j, j_0)·(C_t − c_j)
  with euclidean or alternative (e.g. hyperbolic) lattices
- MNG:  w_j ← w_j + η·rk(j, x_t, w)·(x_t − w_j)
        c_j ← c_j + η·rk(j, x_t, w)·(C_t − c_j)
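A sketch of one full MSOM training step in Python, combining the blended distance of slide 11 with the Hebbian updates above (NumPy assumed; the Gaussian lattice neighborhood and the parameter values are illustrative, not taken from the slides):

```python
import numpy as np

def msom_step(W, C, grid, x, C_t, alpha=0.7, gamma=0.5, eta=0.1, sigma=1.0):
    """One MSOM step: winner by the blended distance, Hebbian update of
    weights and contexts, and computation of the next merged context."""
    # d(x_t, w_j) = alpha*||x_t - w_j||^2 + (1 - alpha)*||C_t - c_j||^2
    d = alpha * np.sum((x - W) ** 2, axis=1) + (1 - alpha) * np.sum((C_t - C) ** 2, axis=1)
    j0 = np.argmin(d)
    nhd = np.exp(-np.linalg.norm(grid - grid[j0], axis=1) ** 2 / (2 * sigma ** 2))
    W += eta * nhd[:, None] * (x - W)      # w_j <- w_j + eta*nhd*(x_t - w_j)
    C += eta * nhd[:, None] * (C_t - C)    # c_j <- c_j + eta*nhd*(C_t - c_j)
    # merged context for the next step: gamma*w_winner + (1-gamma)*c_winner
    return gamma * W[j0] + (1 - gamma) * C[j0]

# usage over a sequence X with prototypes W, contexts C, lattice coordinates grid:
#   C_t = np.zeros(X.shape[1])
#   for x in X:
#       C_t = msom_step(W, C, grid, x, C_t)
```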
14. Merge SOM
- Training: choice of the merge parameters
- C_t = γ·w_{I(t−1)} + (1−γ)·c_{I(t−1)}
  - γ determines the influence of the history on the internal context representation
  - γ = 0.5 is often a good choice (balanced history)
- d(x_t, w_j) = α·‖x_t − w_j‖² + (1−α)·‖C_t − c_j‖²
  - α determines the influence of the history on the winner
  - an annealing strategy that starts from α = 1 and is driven by the map entropy works well (one possible schedule is sketched below)
- Note
  - γ determines the representation
  - α determines the network dynamics/stability
  - they can be controlled separately!
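One plausible reading of the entropy-driven annealing of α, sketched in Python: start at α = 1 and lower α only while the normalized entropy of the winner histogram stays high, i.e. while the map still spreads its winners over many neurons. The threshold, step size, and lower bound are assumptions, not values from the slides:

```python
import numpy as np

def anneal_alpha(alpha, winner_counts, step=0.01, threshold=0.9, alpha_min=0.3):
    """Lower alpha while the map entropy (normalized to [0, 1]) stays above a threshold."""
    p = winner_counts / winner_counts.sum()
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0])) / np.log(len(p))
    if entropy > threshold and alpha > alpha_min:
        alpha -= step          # give the context more influence on the winner
    return alpha
```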
15. Merge SOM
- Experiment
- speaker identification, Japanese vowel 'ae'
- 9 speakers, 30 articulations per speaker in the training set
- separate test set
- http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html
[Figure: articulations as time series of 12-dim. cepstrum vectors]
16. Merge SOM
- MNG with posterior labeling
- γ = 0.5, α = 0.99 → 0.63, η = 0.3
- 150 neurons
  - 0% training error
  - 2.7% test error
- 1000 neurons
  - 0% training error
  - 1.6% test error
- rule based: 5.9%, HMM: 3.8% [Kudo et al.]
17. Merge SOM
- Experiment
- Reber grammar
- 3·10^6 input vectors for training
- 10^6 vectors for testing
- MNG
  - 617 neurons, γ = 0.5, α = 1 → 0.57
- evaluation on the test data
  - attach the longest unique sequence to each winner
  - 428 distinct words
  - average length 8.902
- reconstruction from the map via the stored (w, c) pairs
  - backtracking of the best matching predecessor
  - triplets: only valid Reber words
  - unlimited: average length 13.78
  - e.g. TVPXTTVVEBTSXXTVPSEBPVPXTVVEBPVVEB, BTXXVPXVPXVPSE, BTXXVPXVPSE
18. Merge SOM
- Experiment
- classification of donor sites for C. elegans
- 5 settings with 10000 training data and 10000 test data; 50 nucleotides (TCGA, embedded in 3 dim.), 38% donor sites [Sonnenburg, Rätsch et al.]
- MNG with posterior labeling
  - 512 neurons, γ = 0.25, η = 0.075, α = 0.999 → 0.4/0.7
  - 14.06 ± 0.66% training error, 14.26 ± 0.39% test error
  - sparse representation: 512 · 6 dim.
19. Merge SOM
- Theory (training)
- Assume
  - a SOM with merge context is given (no neighborhood)
  - a sequence x_0, x_1, x_2, x_3, ... is given
  - enough neurons are available
- Then
  - the optimum weight/context pair for x_t is
    w = x_t,  c = Σ_{i=0..t−1} γ·(1−γ)^{t−i−1}·x_i
  - Hebbian training converges to this setting as a stable fixed point (derivation sketch below)
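A short inductive check of the optimal context in LaTeX: assuming the winner of step t−1 already carries the optimal pair for its own time step, unrolling the merge formula of slide 11 yields exactly the stated sum (this sketches the fixed-point shape, not the full stability argument):

```latex
\begin{aligned}
C_t &= \gamma\, w_{I(t-1)} + (1-\gamma)\, c_{I(t-1)} \\
    &= \gamma\, x_{t-1} + (1-\gamma) \sum_{i=0}^{t-2} \gamma (1-\gamma)^{\,t-2-i}\, x_i \\
    &= \sum_{i=0}^{t-1} \gamma (1-\gamma)^{\,t-1-i}\, x_i ,
\end{aligned}
```

so the optimal context c for x_t is exactly the exponentially weighted history given above.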
20. Merge SOM
- MSOM
  - w = x_t,  c = Σ_{i=0..t−1} γ·(1−γ)^{t−i−1}·x_i
  - stable fixed point of Hebbian training
  - dynamics driven by the entropy-controlled parameter α
- Compare to TKM/RSOM
  - the optimum weights are w = Σ_{i=0..t} (1−α)^i·x_{t−i} / Σ_{i=0..t} (1−α)^i
  - but this is no fixed point for TKM
  - it is a fixed point for RSOM, but no separate control of the dynamics is possible
21. Merge SOM
- Theory (capacity)
- MSOM can simulate finite automata
- TKM/RSOM cannot
- ⇒ MSOM is strictly more powerful than TKM/RSOM!
[Diagram: automaton transition δ(state, input) → state, with inputs encoded as unit vectors, e.g. (1,0,0,0)]
22. General recursive SOM ...
23. General recursive SOM
[Diagram: the sequence x_t, x_{t−1}, x_{t−2}, ..., x_0 is split into the current entry x_t and its history x_{t−1}, x_{t−2}, ..., x_0; a neuron (w, c) measures ‖x_t − w‖² for the entry and ‖C_t − c‖² for a context C_t representing the history]
The methods differ in the choice of context C_t!
Hebbian learning: w → x_t, c → C_t
24. General recursive SOM
[Diagram as on the previous slide: ‖x_t − w‖² for the current entry x_t, ‖C_t − c‖² for the context C_t representing x_{t−1}, x_{t−2}, ..., x_0]
MSOM: C_t = merged content of the winner in the previous time step
TKM/RSOM: C_t = activation of the current neuron (implicit c)
25. General recursive SOM
- MSOM
  - C_t = merged content of the winner in the previous time step
- TKM/RSOM
  - C_t = activation of the current neuron (implicit c)
- Recursive SOM (RecSOM) [Voegtlin]
  - C_t = exponential transformation of the activation of all neurons:
    (exp(−d(x_{t−1}, w_1)), ..., exp(−d(x_{t−1}, w_N)))
- Feedback SOM (FSOM) [Horio/Yamakawa]
  - C_t = leaky integrated activation of all neurons:
    (d(x_{t−1}, w_1), ..., d(x_{t−1}, w_N)) + λ·C_{t−1}
- SOM for structured data (SOMSD) [Hagenbuchner/Sperduti/Tsoi]
  - C_t = index of the winner in the previous step
- Supervised recurrent networks
  - C_t = sgd(activation), metric as dot product
(some of these context choices are sketched in code below)
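A sketch of the common framework in Python: the winner is always determined by the blended distance over pairs (w, c), and only the function producing C_t differs. The three context functions below follow the MSOM, RecSOM, and SOMSD definitions listed above; the argument conventions (and the fact that the dimension of c_j changes with the context choice) are assumptions of this sketch:

```python
import numpy as np

def msom_context(W, C, d_prev, winner_prev, gamma=0.5):
    # MSOM: merged content of the previous winner (lives in R^n)
    return gamma * W[winner_prev] + (1 - gamma) * C[winner_prev]

def recsom_context(W, C, d_prev, winner_prev):
    # RecSOM: exponentially transformed activation of all neurons (lives in R^N)
    return np.exp(-d_prev)

def somsd_context(W, C, d_prev, winner_prev, grid=None):
    # SOMSD: lattice position of the winner in the previous step
    return grid[winner_prev]

def recursive_som_winner(x, C_t, W, C, alpha=0.7):
    # d_j = alpha*||x_t - w_j||^2 + (1 - alpha)*||C_t - c_j||^2
    d = alpha * np.sum((x - W) ** 2, axis=1) + (1 - alpha) * np.sum((C_t - C) ** 2, axis=1)
    return np.argmin(d), d
```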
26. General recursive SOM
for normalized or WTA semilinear context
27. General recursive SOM
- Experiment
- Mackey-Glass time series
- 100 neurons
- different lattices
- different contexts
- evaluation by the temporal quantization error:
  average of (mean activity k steps into the past − observed activity k steps into the past)² (one reading is sketched in code below)
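One way to read this definition in Python: for each lag k, group the past values x[t−k] by the winner at time t, compare each value with the mean of its group, and average the squared deviations (the grouping by winners and the averaging convention are assumptions of this sketch):

```python
import numpy as np

def temporal_quantization_error(x, winners, max_lag=30):
    """x: 1-d signal, winners[t]: winner index for x[t].
    Returns the temporal quantization error for lags k = 0..max_lag-1."""
    errors = np.zeros(max_lag)
    for k in range(max_lag):
        t_idx = np.arange(k, len(x))          # times with a valid k-step past
        past = x[t_idx - k]                   # observed activity k steps into the past
        total = 0.0
        for j in np.unique(winners[t_idx]):
            sel = past[winners[t_idx] == j]   # past values in neuron j's receptive field
            total += np.sum((sel - sel.mean()) ** 2)
        errors[k] = total / len(t_idx)        # average squared deviation
    return errors
```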
28. General recursive SOM
[Plot: temporal quantization error against the number of steps into the past (now → past) for SOM, NG, RSOM, RecSOM, SOMSD, HSOMSD, and MNG]
29. General recursive SOM
- MNG weight/context development
30. General recursive SOM
- SOMSD average receptive fields and variation
31. General recursive SOM
The principle can be generalized to tree structures! (sketch below)
[Diagram: a tree a(t, t') with root label a and subtrees t, t' is compared to a neuron (w, c, c') via ‖a − w‖² + ‖C(t) − c‖² + ‖C(t') − c'‖², where C(t) and C(t') are the contexts representing the subtrees]
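A small recursive sketch of the tree case in Python, assuming a SOMSD-style context in which every subtree is represented by the lattice coordinates of its winner and empty children get a zero context (these choices, and the tuple encoding of trees, are assumptions of this sketch):

```python
import numpy as np

def tree_winner(tree, W, C1, C2, grid):
    """tree = (label a, left subtree, right subtree) or None.
    Returns the winner index and its lattice position, which serves as C(tree)."""
    if tree is None:
        return None, np.zeros(grid.shape[1])                 # empty child -> zero context
    a, left, right = tree
    _, C_left = tree_winner(left, W, C1, C2, grid)
    _, C_right = tree_winner(right, W, C1, C2, grid)
    # d_j = ||a - w_j||^2 + ||C(t) - c_j||^2 + ||C(t') - c'_j||^2
    d = (np.sum((a - W) ** 2, axis=1)
         + np.sum((C_left - C1) ** 2, axis=1)
         + np.sum((C_right - C2) ** 2, axis=1))
    j0 = np.argmin(d)
    return j0, grid[j0]
```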
32. General recursive SOM
Supervised counterpart: well-established recursive neural networks for learning on tree-structured inputs.