Title: On the capacity of unsupervised recursive neural networks for symbol processing
1. On the capacity of unsupervised recursive neural networks for symbol processing

Prof. Dr. Barbara Hammer
Computational Intelligence Group, Institute of Computer Science, Clausthal University of Technology

Nicolas Neubauer, M.Sc.
Neural Information Processing Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin

29 August 2006
2. Overview

- Introduction: GSOMSD models and capacities
- Main part: Implementing deterministic pushdown automata in RecSOM
- Conclusion
3. Unsupervised neural networks

- Clustering algorithms
  - Each neuron/prototype i has a weight vector w_i
  - Inputs x are mapped to a winning neuron i such that d(x, w_i) is minimal
  - Training: adapt the weights to minimize some error function
- Self-Organizing Maps (SOMs): neurons arranged on a lattice (sketched below)
  - During training, adapt the neighbours as well
  - After training, similar inputs → neighbouring neurons
  - Variants without a fixed grid exist (e.g. neural gas)
- Defined for finite-dimensional input vectors in ℝⁿ
- Question: how to adapt these algorithms to inputs of non-fixed length, such as time series?
  - E.g. time windowing, statistical analysis, ...
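A minimal numpy sketch of the winner computation and neighbourhood update just described; the 1D lattice, learning rate eta, and neighbourhood width sigma are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Hypothetical 1D SOM: N prototypes on a line, inputs in R^n.
rng = np.random.default_rng(0)
N, n = 10, 3
W = rng.random((N, n))  # one weight vector w_i per neuron

def winner(x, W):
    """Index i minimizing d(x, w_i) (squared Euclidean here)."""
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))

def som_update(x, W, eta=0.1, sigma=1.0):
    """Adapt the winner and its lattice neighbours towards x."""
    i = winner(x, W)
    grid = np.arange(N)                                # 1D lattice positions
    h = np.exp(-((grid - i) ** 2) / (2 * sigma ** 2))  # neighbourhood function
    W += eta * h[:, None] * (x - W)                    # move neighbours towards x
    return i

x = rng.random(n)
print("winner:", som_update(x, W))
```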
4. Recursive processing of time series

- Process each input of the time series separately,
- along with a representation C of the map's response to the previous input (the context)
  - function rep: ℝᴺ → ℝʳ (N = number of neurons)
  - C_t = rep(d_1(t−1), ..., d_N(t−1))
- Neurons respond not only to the input, but also to the context
  - Neurons require additional context weights c
  - Distance of neuron i at timestep t (sketched in code below):
    d_i(t) = α · d(x_t, w_i) + β · d_r(C_t, c_i)
[Diagram: the inputs x_1, ..., x_t are processed one per timestep; each step's distances pass through rep to form the contexts C_2, ..., C_t for the next step]
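A minimal sketch of this recursion under assumed shapes (one-dimensional inputs, L1 distances); `recursive_distances` and its defaults are hypothetical names and values, and `rep` is passed in as a placeholder:

```python
import numpy as np

def recursive_distances(xs, W, Cw, rep, alpha=1.0, beta=0.25):
    """d_i(t) = alpha * d(x_t, w_i) + beta * d_r(C_t, c_i) along a sequence,
    with L1 distances; W: (N,) input weights, Cw: (N, r) context weights."""
    C = np.zeros(Cw.shape[1])   # initial (empty) context
    for x in xs:
        d = alpha * np.abs(W - x) + beta * np.abs(Cw - C).sum(axis=1)
        C = rep(d)              # context fed into the next timestep
    return d

# RecSOM-style context (next slide): rep returns all N activations, so r = N.
W = np.array([0.0, 0.5, 1.0])
Cw = np.zeros((3, 3))
print(recursive_distances([0.1, 0.9], W, Cw, rep=lambda d: np.exp(-d)))
```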
5. Generalized SOM for Structured Data (GSOMSD)
[Hammer, Micheli, Sperduti, Strickert, 2004]
- Unsupervised recursive algorithms are instances of GSOMSD,
- varying in
  - the context function rep and the distances d, d_r
  - and the lattice (metric, hyperbolic, neural gas, ...)
- Example: Recursive SOM (RecSOM)
  - the context stores all neurons' activations
  - r = N, rep(d_1, ..., d_N) = (exp(−d_1), ..., exp(−d_N))
  - each neuron needs N context weights! (memory O(N²))
- Other models store
  - properties of the winning neuron,
  - or previous activations of single neurons only (see the comparison sketch below)
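To make the memory trade-off concrete, here is a hedged sketch contrasting a RecSOM-style context (all N activations per step, hence N context weights per neuron) with a winner-index-style context as in SOMSD; the function names are illustrative:

```python
import numpy as np

# Two GSOMSD context functions rep (illustrative implementations):
def rep_recsom(d):
    # RecSOM: the full activation profile, r = N values per timestep
    return np.exp(-d)

def rep_somsd(d):
    # SOMSD-style: only the winner's index, r = 1
    return np.array([float(np.argmin(d))])

d = np.array([0.3, 0.1, 0.7])
print(rep_recsom(d))  # N context weights per neuron -> O(N^2) memory overall
print(rep_somsd(d))   # 1 context weight per neuron  -> O(N) memory overall
```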
6. Computational capacity

- 01111... vs. 11111...
  - Ability to keep state information ⇒ equivalence to finite state automata (FSA) / regular languages
  - A decaying context will eventually forget the leading 0 (see the toy example below)
- (()) vs. (()
  - Ability to keep stack information ⇒ equivalence to pushdown automata (PDA) / context-free languages
  - A finite context cannot store a potentially infinite stack
- Ability to store at least two binary stacks ⇒ Turing machine equivalence
- ⇒ connecting context models to the Chomsky hierarchy
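A toy numeric illustration (mine, not from the talk's proofs) of the first point: with an exponentially decaying context, the encodings of 01111... and 11111... converge, so the leading 0 is eventually forgotten. The mixing constant `lam` is an arbitrary choice:

```python
# Toy leaky context: mix each new input into C with decay lam < 1.
def context(seq, lam=0.5):
    C = 0.0
    for x in seq:
        C = lam * C + (1 - lam) * x
    return C

for T in (3, 10, 30):
    a = context([0] + [1] * T)   # 0111...1
    b = context([1] * (T + 1))   # 1111...1
    print(T, abs(a - b))         # gap shrinks like lam**T: the 0 is forgotten
```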
7. Why capacity matters

- Explore the dynamics of the algorithms in detail
- Distinguish the power of different models
  - of different contexts within GSOMSD: e.g., justify the huge memory costs of RecSOM compared to other models
  - against other approaches: e.g., supervised recurrent networks
- Supervised recurrent networks: Turing machine equivalence
  - in exponential time for sigmoidal activation functions [Kilian/Siegelmann, 96]
  - in polynomial time for semilinear activation functions [Siegelmann/Sontag, 95]
8. Various recursive models and their capacity

|          | TKM [Chappell/Taylor, 93] | RSOM [Koskela/Varsta/Heikkonen, 98] | MSOM [Hammer/Strickert, 04] | SOMSD [Hagenbuchner/Sperduti/Tsoi, 03] | RecSOM [Voegtlin, 02] |
|----------|---------------------------|-------------------------------------|-----------------------------|----------------------------------------|-----------------------|
| context  | neuron itself             | neuron itself                       | winner content              | winner index                           | exp(all activations)  |
| encoding | input space               | input space                         | input space                 | index space                            | activation space      |
| lattice  | all                       | all                                 | all                         | SOM/HSOM                               | all                   |
| capacity | < FSA                     | < FSA                               | FSA                         | FSA                                    | PDA*                  |

* for winner-takes-all, semilinear context
9. Overview

- Introduction: GSOMSD models and capacities
- Main part: Implementing deterministic pushdown automata in RecSOM
- Conclusion
10. Goal: Build a deterministic PDA

- Using
  - the GSOMSD recursive equation d_i(t) = α · d(x_t, w_i) + β · d_r(C_t, c_i)
  - L1 distances for d, d_r (i.e., d(a, b) = |a − b|)
  - parameters
    - α = 1
    - β = 1/4
  - a modified RecSOM context
    - instead of the original exp(−d_i): max(1 − d_i, 0)
    - similar overall shape
    - easier to handle analytically
  - additionally, winner-takes-all
    - required at one place in the proof...
    - ...but makes life easier overall (see the sketch below)
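A small sketch of this modified context, max(1 − d, 0) with optional winner-takes-all; the function name `rep_modified` is mine, not the talk's:

```python
import numpy as np

def rep_modified(d, wta=True):
    """Piecewise-linear stand-in for RecSOM's exp(-d_i), optionally
    with winner-takes-all."""
    a = np.maximum(1.0 - d, 0.0)     # similar shape to exp(-d),
                                     # but exactly 0 once d >= 1
    if wta:
        out = np.zeros_like(a)
        out[np.argmax(a)] = a.max()  # only the winner stays active
        return out
    return a

print(rep_modified(np.array([0.2, 0.8, 1.5])))  # -> [0.8  0.  0. ]
```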
11. Three levels of abstraction

- Layers: temporal (vertical) grouping → feed-forward architecture
- Operators: functional grouping → defined operations
- Phases: combining operators → automaton simulation
12. First level of abstraction: Feed-forward

- One-dimensional input weights w
- Encoding function enc: Σ → ℕˡ maps input symbol s_i to the series of inputs (i, e, 2e, 3e, ..., (l−1)·e)
  - with e >> max(i), i.e. much larger than any symbol index
- Each neuron is active for at most one component of enc
  - resulting in l layers of neurons
- In layer l, we know that only neurons from layer l−1 have been active, i.e. are represented > 0 in the context
  - pass on activity from layer to layer (see the sketch below)
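A sketch of the encoding function under these assumptions; the concrete value of e is arbitrary, only e >> max symbol index matters:

```python
def enc(i, l, e=1000):
    """Map symbol s_i to the input series (i, e, 2e, ..., (l-1)*e);
    e must only be much larger than any symbol index."""
    return [i] + [k * e for k in range(1, l)]

print(enc(2, l=4))  # -> [2, 1000, 2000, 3000]
```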
13. Feed-forward: generic architecture

- Sample architecture
  - 2 states: S = {a, b}
  - 2 inputs: Σ = {s_0, s_1}
- Output layer
  - network represents state a ⟺ neuron a active ⟺ C = (0, 0, ..., 1, 0)
  - network represents state b ⟺ neuron b active ⟺ C = (0, 0, ..., 0, 1)
- Hidden layers: arbitrary intermediate computations
- Input layer
  - encodes input × state
  - gets the input via the input weight (enc)
  - gets the state via the context weight (w_c)
⇒ simulation of a state transition function δ: Σ × S → S (illustrated below)
[Diagram: layered architecture; inputs 0, 1 enter via enc at input weights e, ..., (l−1)·e, and the state neurons a, b form the output layer]
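At the automaton level, one pass through the layered architecture computes δ(input, state), with the state held as a one-hot context vector over the output neurons. A sketch with a hypothetical transition table:

```python
# State transitions with a one-hot context over the state neurons.
delta = {("s0", "a"): "a", ("s0", "b"): "a",   # hypothetical delta: Sigma x S -> S
         ("s1", "a"): "b", ("s1", "b"): "b"}

def one_hot(state, states=("a", "b")):
    """The context C the map would expose after settling on `state`."""
    return [1.0 if q == state else 0.0 for q in states]

state = "a"
for sym in ["s1", "s1", "s0"]:
    state = delta[(sym, state)]
    print(sym, "->", state, one_hot(state))
```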
14. Second level of abstraction: Operators

- We might be finished
  - in fact, we are: for the FSA case
- However, what about the stack?
  - It looks like γ_0 γ_1 γ_1 ...
  - How to store potentially infinite symbol sequences?
- General idea
  - Encode the stack in the winner neuron's activation
  - Then build operators to
    - read,
    - modify, or
    - copy
  - the stack by changing the winner's activation
15. Encoding the stack in neurons' activations

- To save a sequence of stack symbols within the map,
  - turn γ_0 γ_1 γ_1 into the binary sequence α = 011
  - and encode it as f_4(α):
    - f_4(ε) = 0
    - f_4(0) = 1/4
    - f_4(1) = 3/4
    - f_4(01) = 1/4 + 3/16
    - f_4(011) = 1/4 + 3/16 + 3/64
- Stack operations on the encoding s:
  - push(s, γ_1) = 1/4 · s + 3/4
  - pop(s, γ_0) = (s − 1/4) · 4
- Encode the stack in the activation: a = 1 − 1/4 · s
  - ⇒ push(a, γ_1) = 13/16 − 1/16 · s
  - ⇒ pop(a, γ_0) = 5/4 − s (verified in the sketch below)
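A sketch checking the encoding and the stack-level operators against the values on this slide; it assumes the natural generalization f_4(b_1...b_n) = Σ_i (2·b_i + 1)/4^i, which reproduces the listed cases:

```python
def f4(bits):
    """Fractal stack encoding, top symbol first:
    f4(b1 b2 ... bn) = sum_i (2*b_i + 1) / 4**i."""
    return sum((2 * b + 1) / 4 ** (i + 1) for i, b in enumerate(bits))

def push(s, b):   # new top bit b contributes (2b+1)/4, rest is scaled by 1/4
    return s / 4 + (2 * b + 1) / 4

def pop(s, b):    # inverse of push for top bit b
    return (s - (2 * b + 1) / 4) * 4

assert f4([]) == 0 and f4([0]) == 1/4 and f4([1]) == 3/4
assert abs(f4([0, 1, 1]) - (1/4 + 3/16 + 3/64)) < 1e-12
assert abs(push(f4([1, 1]), 0) - f4([0, 1, 1])) < 1e-12   # push(s, gamma_0)
assert abs(pop(f4([0, 1, 1]), 0) - f4([1, 1])) < 1e-12    # pop(s, gamma_0)
print("stack encoding round-trips")
```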
16. Operators

- COPY
  - copy the activation into the next layer
- TOP
  - identify the top stack symbol
- OR
  - get the activation of the active neurons (if any)
- PUSH
  - modify the activation for a push: push(a, γ_1) = 13/16 − 1/16 · s
- POP
  - modify the activation for a pop: pop(a, γ_0) = 5/4 − s (checked below)
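A quick consistency check (mine, not from the slides) that the activation-level PUSH/POP formulas agree with the stack-level operators under a = 1 − s/4:

```python
def act(s):            # activation encoding of a stack value s
    return 1 - s / 4   # a = 1 - (1/4) * s

def push1_act(s):      # PUSH gamma_1, activation level (slide's formula)
    return 13/16 - s/16

def pop0_act(s):       # POP gamma_0, activation level (slide's formula)
    return 5/4 - s

s = 1/4 + 3/16         # encoding of stack "01" (top first)
assert abs(push1_act(s) - act(s/4 + 3/4)) < 1e-12      # matches stack-level push
assert abs(pop0_act(s) - act((s - 1/4) * 4)) < 1e-12   # matches stack-level pop
print("activation-level operators agree with the stack-level ones")
```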
17. Third level of abstraction: Phases

| Set | Content        | Generic elements | Examples                    |
|-----|----------------|------------------|-----------------------------|
| S   | states         | s                | a, s(uccess), f(ail)        |
| Σ   | input alphabet | σ                | (, ), ε                     |
| Γ   | stack alphabet | γ                | (                           |
| U   | stack actions  | push(γ), pop(γ)  | push((), pop((), do nothing |

| Phase    | Task                                                  | Input → Output    | Required operators     |
|----------|-------------------------------------------------------|-------------------|------------------------|
| Finalize | Collect all results leading to the same state         | U × S → S         | OR, COPY               |
| Execute  | Manipulate the stack where needed                     | U × S → U × S     | PUSH, COPY / POP, COPY |
| Merge    | Collect all states resulting in a common stack action | Σ × S × Γ → U × S | OR, COPY               |
| Separate | Read the top stack symbol                             | Σ × S → Σ × S × Γ | TOP                    |

One automaton step chains these phases (sketched below).
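A sketch of how one step decomposes into the four phases, for a hypothetical balanced-parentheses DPDA; the function bodies are illustrative, not the map construction itself:

```python
# One automaton step decomposed into the phases, for a hypothetical
# balanced-parentheses DPDA (states "success"/"fail", stack symbol "(").
def separate(sym, state, stack):    # Separate: read the top stack symbol
    return sym, state, (stack[-1] if stack else "#")

def merge(sym, state, top):         # Merge: map (sym, state, top) to a stack action
    if state == "fail":
        return ("nop", None), "fail"
    if sym == "(":
        return ("push", "("), state
    if sym == ")" and top == "(":
        return ("pop", "("), state
    return ("nop", None), "fail"

def execute(action, state, stack):  # Execute: manipulate the stack
    op, g = action
    if op == "push": return state, stack + [g]
    if op == "pop":  return state, stack[:-1]
    return state, stack

state, stack = "success", []
for sym in "(())":
    sym, state, top = separate(sym, state, stack)
    action, state = merge(sym, state, top)
    state, stack = execute(action, state, stack)  # Finalize: collect the result
print(state, stack)                               # -> success []
```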
18. The final architecture

[Diagram: the complete layered map combining the phases]
19. Overview

- Introduction: GSOMSD models and capacities
- Main part: Implementing deterministic pushdown automata in RecSOM
- Conclusion
20. Conclusions

- RecSOM has stronger computational capacity than SOMSD/MSOM
- Does this mean it's worth the cost?
  - The simulations are not learnable with Hebbian learning ⇒ practical relevance questionable
  - In any case, an elaborate (costly) context is rather a hindrance for simulations
    - too much context results in a lot of noise
  - Maybe simpler models, slightly enhanced, are better
    - for example MSOM or SOMSD with a context variable indicating the last winner's activation
- Are Turing machines also possible?
  - Storing two stacks in one real number is possible
  - Reconstructing two stacks from a real number is hard
    - particularly when using only differences
    - may have to leave constant-size simulations
  - Other representations of the stack may be required
21. Thanks
22. Aux slide: PDA definition

23. Aux slide: PDA definition suitable for map construction