Title: On the capacity of unsupervised recursive neural networks for symbol processing
1. On the capacity of unsupervised recursive neural networks for symbol processing

Prof. Dr. Barbara Hammer
Computational Intelligence Group, Institute of Computer Science, Clausthal University of Technology

Nicolas Neubauer, M.Sc.
Neural Information Processing Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin

29 August 2006
2. Overview

- Introduction: GSOMSD models and capacities
- Main part: Implementing deterministic pushdown automata in RecSOM
- Conclusion
3. Unsupervised neural networks

- Clustering algorithms
  - Each neuron/prototype i has a weight vector w_i
  - Inputs x are mapped to a winning neuron i such that d(x, w_i) is minimal
  - Training: adapt the weights to minimize some error function
- Self-Organizing Maps (SOMs): neurons arranged on a lattice (sketched below)
  - During training, adapt the neighbours as well
  - After training, similar inputs → neighbouring neurons
  - Variants without a fixed grid exist (e.g. neural gas)
- Defined for finite-dimensional input vectors in ℝⁿ
- Question: how to adapt these algorithms to inputs of non-fixed length, such as time series?
  - E.g. time windowing, statistical analysis, ...
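A minimal numpy sketch of the winner computation and neighbourhood update just described; the 1D lattice, learning rate eta, and neighbourhood width sigma are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Hypothetical 1D SOM: N prototypes on a line, inputs in R^n.
rng = np.random.default_rng(0)
N, n = 10, 3
W = rng.random((N, n))  # one weight vector w_i per neuron

def winner(x, W):
    """Index i minimizing d(x, w_i) (squared Euclidean here)."""
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))

def som_update(x, W, eta=0.1, sigma=1.0):
    """Adapt the winner and its lattice neighbours towards x."""
    i = winner(x, W)
    grid = np.arange(N)                                # 1D lattice positions
    h = np.exp(-((grid - i) ** 2) / (2 * sigma ** 2))  # neighbourhood function
    W += eta * h[:, None] * (x - W)                    # move neighbours towards x
    return i

x = rng.random(n)
print("winner:", som_update(x, W))
```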
4. Recursive processing of time series

- Process each input of the time series separately,
- along with a representation C of the map's response to the previous input (the context)
  - function rep: ℝᴺ → ℝʳ (N = number of neurons)
  - C_t = rep(d_1(t−1), ..., d_N(t−1))
- Neurons respond not only to the input, but also to the context
  - Neurons require additional context weights c
  - Distance of neuron i at timestep t (sketched in code below):
    d_i(t) = α · d(x_t, w_i) + β · d_r(C_t, c_i)
[Diagram: the inputs x_1, ..., x_t are processed one per timestep; each step's distances pass through rep to form the contexts C_2, ..., C_t for the next step]
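A minimal sketch of this recursion under assumed shapes (one-dimensional inputs, L1 distances); `recursive_distances` and its defaults are hypothetical names and values, and `rep` is passed in as a placeholder:

```python
import numpy as np

def recursive_distances(xs, W, Cw, rep, alpha=1.0, beta=0.25):
    """d_i(t) = alpha * d(x_t, w_i) + beta * d_r(C_t, c_i) along a sequence,
    with L1 distances; W: (N,) input weights, Cw: (N, r) context weights."""
    C = np.zeros(Cw.shape[1])   # initial (empty) context
    for x in xs:
        d = alpha * np.abs(W - x) + beta * np.abs(Cw - C).sum(axis=1)
        C = rep(d)              # context fed into the next timestep
    return d

# RecSOM-style context (next slide): rep returns all N activations, so r = N.
W = np.array([0.0, 0.5, 1.0])
Cw = np.zeros((3, 3))
print(recursive_distances([0.1, 0.9], W, Cw, rep=lambda d: np.exp(-d)))
```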
5. Generalized SOM for Structured Data (GSOMSD)
[Hammer, Micheli, Sperduti, Strickert, 2004]
- Unsupervised recursive algorithms are instances of GSOMSD,
- varying in
  - the context function rep and the distances d, d_r
  - and the lattice (metric, hyperbolic, neural gas, ...)
- Example: Recursive SOM (RecSOM)
  - the context stores all neurons' activations
  - r = N, rep(d_1, ..., d_N) = (exp(−d_1), ..., exp(−d_N))
  - each neuron needs N context weights! (memory O(N²))
- Other models store
  - properties of the winning neuron,
  - or previous activations of single neurons only (see the comparison sketch below)
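To make the memory trade-off concrete, here is a hedged sketch contrasting a RecSOM-style context (all N activations per step, hence N context weights per neuron) with a winner-index-style context as in SOMSD; the function names are illustrative:

```python
import numpy as np

# Two GSOMSD context functions rep (illustrative implementations):
def rep_recsom(d):
    # RecSOM: the full activation profile, r = N values per timestep
    return np.exp(-d)

def rep_somsd(d):
    # SOMSD-style: only the winner's index, r = 1
    return np.array([float(np.argmin(d))])

d = np.array([0.3, 0.1, 0.7])
print(rep_recsom(d))  # N context weights per neuron -> O(N^2) memory overall
print(rep_somsd(d))   # 1 context weight per neuron  -> O(N) memory overall
```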
6. Computational capacity

- 01111... vs. 11111...
  - Ability to keep state information ⇒ equivalence to finite state automata (FSA) / regular languages
  - A decaying context will eventually forget the leading 0 (see the toy example below)
- (()) vs. (()
  - Ability to keep stack information ⇒ equivalence to pushdown automata (PDA) / context-free languages
  - A finite context cannot store a potentially infinite stack
- Ability to store at least two binary stacks ⇒ Turing machine equivalence
- ⇒ connecting context models to the Chomsky hierarchy
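A toy numeric illustration (mine, not from the talk's proofs) of the first point: with an exponentially decaying context, the encodings of 01111... and 11111... converge, so the leading 0 is eventually forgotten. The mixing constant `lam` is an arbitrary choice:

```python
# Toy leaky context: mix each new input into C with decay lam < 1.
def context(seq, lam=0.5):
    C = 0.0
    for x in seq:
        C = lam * C + (1 - lam) * x
    return C

for T in (3, 10, 30):
    a = context([0] + [1] * T)   # 0111...1
    b = context([1] * (T + 1))   # 1111...1
    print(T, abs(a - b))         # gap shrinks like lam**T: the 0 is forgotten
```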
7. Why capacity matters

- Explore the dynamics of the algorithms in detail
- Distinguish the power of different models
  - of different contexts within GSOMSD: e.g., justify the huge memory costs of RecSOM compared to other models
  - against other approaches: e.g., supervised recurrent networks
- Supervised recurrent networks: Turing machine equivalence
  - in exponential time for sigmoidal activation functions [Kilian/Siegelmann, 96]
  - in polynomial time for semilinear activation functions [Siegelmann/Sontag, 95]
8. Various recursive models and their capacity

|          | TKM [Chappell/Taylor, 93] | RSOM [Koskela/Varsta/Heikkonen, 98] | MSOM [Hammer/Strickert, 04] | SOMSD [Hagenbuchner/Sperduti/Tsoi, 03] | RecSOM [Voegtlin, 02] |
|----------|---------------------------|-------------------------------------|-----------------------------|----------------------------------------|-----------------------|
| context  | neuron itself             | neuron itself                       | winner content              | winner index                           | exp(all activations)  |
| encoding | input space               | input space                         | input space                 | index space                            | activation space      |
| lattice  | all                       | all                                 | all                         | SOM/HSOM                               | all                   |
| capacity | < FSA                     | < FSA                               | FSA                         | FSA                                    | PDA*                  |

* for winner-takes-all, semilinear context
9. Overview

- Introduction: GSOMSD models and capacities
- Main part: Implementing deterministic pushdown automata in RecSOM
- Conclusion
10. Goal: Build a deterministic PDA

- Using
  - the GSOMSD recursive equation d_i(t) = α · d(x_t, w_i) + β · d_r(C_t, c_i)
  - L1 distances for d, d_r (i.e., d(a, b) = |a − b|)
  - parameters
    - α = 1
    - β = 1/4
  - a modified RecSOM context
    - instead of the original exp(−d_i): max(1 − d_i, 0)
    - similar overall shape
    - easier to handle analytically
  - additionally, winner-takes-all
    - required at one place in the proof...
    - ...but makes life easier overall (see the sketch below)
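A small sketch of this modified context, max(1 − d, 0) with optional winner-takes-all; the function name `rep_modified` is mine, not the talk's:

```python
import numpy as np

def rep_modified(d, wta=True):
    """Piecewise-linear stand-in for RecSOM's exp(-d_i), optionally
    with winner-takes-all."""
    a = np.maximum(1.0 - d, 0.0)     # similar shape to exp(-d),
                                     # but exactly 0 once d >= 1
    if wta:
        out = np.zeros_like(a)
        out[np.argmax(a)] = a.max()  # only the winner stays active
        return out
    return a

print(rep_modified(np.array([0.2, 0.8, 1.5])))  # -> [0.8  0.  0. ]
```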
11. Three levels of abstraction

- Layers: temporal (vertical) grouping → feed-forward architecture
- Operators: functional grouping → defined operations
- Phases: combining operators → automaton simulation
12. First level of abstraction: Feed-forward

- One-dimensional input weights w
- Encoding function enc: Σ → ℕˡ maps input symbol s_i to the series of inputs (i, e, 2e, 3e, ..., (l−1)·e)
  - with e >> max(i), i.e. much larger than any symbol index
- Each neuron is active for at most one component of enc
  - resulting in l layers of neurons
- In layer l, we know that only neurons from layer l−1 have been active, i.e. are represented > 0 in the context
  - pass on activity from layer to layer (see the sketch below)
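A sketch of the encoding function under these assumptions; the concrete value of e is arbitrary, only e >> max symbol index matters:

```python
def enc(i, l, e=1000):
    """Map symbol s_i to the input series (i, e, 2e, ..., (l-1)*e);
    e must only be much larger than any symbol index."""
    return [i] + [k * e for k in range(1, l)]

print(enc(2, l=4))  # -> [2, 1000, 2000, 3000]
```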
13. Feed-forward: generic architecture

- Sample architecture
  - 2 states: S = {a, b}
  - 2 inputs: Σ = {s_0, s_1}
- Output layer
  - network represents state a ⟺ neuron a active ⟺ C = (0, 0, ..., 1, 0)
  - network represents state b ⟺ neuron b active ⟺ C = (0, 0, ..., 0, 1)
- Hidden layers: arbitrary intermediate computations
- Input layer
  - encodes input × state
  - gets the input via the input weight (enc)
  - gets the state via the context weight (w_c)
⇒ simulation of a state transition function δ: Σ × S → S (illustrated below)
[Diagram: layered architecture; inputs 0, 1 enter via enc at input weights e, ..., (l−1)·e, and the state neurons a, b form the output layer]
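At the automaton level, one pass through the layered architecture computes δ(input, state), with the state held as a one-hot context vector over the output neurons. A sketch with a hypothetical transition table:

```python
# State transitions with a one-hot context over the state neurons.
delta = {("s0", "a"): "a", ("s0", "b"): "a",   # hypothetical delta: Sigma x S -> S
         ("s1", "a"): "b", ("s1", "b"): "b"}

def one_hot(state, states=("a", "b")):
    """The context C the map would expose after settling on `state`."""
    return [1.0 if q == state else 0.0 for q in states]

state = "a"
for sym in ["s1", "s1", "s0"]:
    state = delta[(sym, state)]
    print(sym, "->", state, one_hot(state))
```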
14. Second level of abstraction: Operators

- We might be finished
  - in fact, we are: for the FSA case
- However, what about the stack?
  - It looks like γ_0 γ_1 γ_1 ...
  - How to store potentially infinite symbol sequences?
- General idea
  - Encode the stack in the winner neuron's activation
  - Then build operators to
    - read,
    - modify, or
    - copy
  - the stack by changing the winner's activation
15. Encoding the stack in neurons' activations

- To save a sequence of stack symbols within the map,
  - turn γ_0 γ_1 γ_1 into the binary sequence α = 011
  - and encode it as f_4(α):
    - f_4(ε) = 0
    - f_4(0) = 1/4
    - f_4(1) = 3/4
    - f_4(01) = 1/4 + 3/16
    - f_4(011) = 1/4 + 3/16 + 3/64
- Stack operations on the encoding s:
  - push(s, γ_1) = 1/4 · s + 3/4
  - pop(s, γ_0) = (s − 1/4) · 4
- Encode the stack in the activation: a = 1 − 1/4 · s
  - ⇒ push(a, γ_1) = 13/16 − 1/16 · s
  - ⇒ pop(a, γ_0) = 5/4 − s (verified in the sketch below)
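A sketch checking the encoding and the stack-level operators against the values on this slide; it assumes the natural generalization f_4(b_1...b_n) = Σ_i (2·b_i + 1)/4^i, which reproduces the listed cases:

```python
def f4(bits):
    """Fractal stack encoding, top symbol first:
    f4(b1 b2 ... bn) = sum_i (2*b_i + 1) / 4**i."""
    return sum((2 * b + 1) / 4 ** (i + 1) for i, b in enumerate(bits))

def push(s, b):   # new top bit b contributes (2b+1)/4, rest is scaled by 1/4
    return s / 4 + (2 * b + 1) / 4

def pop(s, b):    # inverse of push for top bit b
    return (s - (2 * b + 1) / 4) * 4

assert f4([]) == 0 and f4([0]) == 1/4 and f4([1]) == 3/4
assert abs(f4([0, 1, 1]) - (1/4 + 3/16 + 3/64)) < 1e-12
assert abs(push(f4([1, 1]), 0) - f4([0, 1, 1])) < 1e-12   # push(s, gamma_0)
assert abs(pop(f4([0, 1, 1]), 0) - f4([1, 1])) < 1e-12    # pop(s, gamma_0)
print("stack encoding round-trips")
```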
16. Operators

- COPY
  - copy the activation into the next layer
- TOP
  - identify the top stack symbol
- OR
  - get the activation of the active neurons (if any)
- PUSH
  - modify the activation for a push: push(a, γ_1) = 13/16 − 1/16 · s
- POP
  - modify the activation for a pop: pop(a, γ_0) = 5/4 − s (checked below)
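A quick consistency check (mine, not from the slides) that the activation-level PUSH/POP formulas agree with the stack-level operators under a = 1 − s/4:

```python
def act(s):            # activation encoding of a stack value s
    return 1 - s / 4   # a = 1 - (1/4) * s

def push1_act(s):      # PUSH gamma_1, activation level (slide's formula)
    return 13/16 - s/16

def pop0_act(s):       # POP gamma_0, activation level (slide's formula)
    return 5/4 - s

s = 1/4 + 3/16         # encoding of stack "01" (top first)
assert abs(push1_act(s) - act(s/4 + 3/4)) < 1e-12      # matches stack-level push
assert abs(pop0_act(s) - act((s - 1/4) * 4)) < 1e-12   # matches stack-level pop
print("activation-level operators agree with the stack-level ones")
```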
17. Third level of abstraction: Phases

| Set | Content        | Generic elements | Examples                    |
|-----|----------------|------------------|-----------------------------|
| S   | states         | s                | a, s(uccess), f(ail)        |
| Σ   | input alphabet | σ                | (, ), ε                     |
| Γ   | stack alphabet | γ                | (                           |
| U   | stack actions  | push(γ), pop(γ)  | push((), pop((), do nothing |

| Phase    | Task                                                  | Input → Output    | Required operators     |
|----------|-------------------------------------------------------|-------------------|------------------------|
| Finalize | Collect all results leading to the same state         | U × S → S         | OR, COPY               |
| Execute  | Manipulate the stack where needed                     | U × S → U × S     | PUSH, COPY / POP, COPY |
| Merge    | Collect all states resulting in a common stack action | Σ × S × Γ → U × S | OR, COPY               |
| Separate | Read the top stack symbol                             | Σ × S → Σ × S × Γ | TOP                    |

One automaton step chains these phases (sketched below).
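A sketch of how one step decomposes into the four phases, for a hypothetical balanced-parentheses DPDA; the function bodies are illustrative, not the map construction itself:

```python
# One automaton step decomposed into the phases, for a hypothetical
# balanced-parentheses DPDA (states "success"/"fail", stack symbol "(").
def separate(sym, state, stack):    # Separate: read the top stack symbol
    return sym, state, (stack[-1] if stack else "#")

def merge(sym, state, top):         # Merge: map (sym, state, top) to a stack action
    if state == "fail":
        return ("nop", None), "fail"
    if sym == "(":
        return ("push", "("), state
    if sym == ")" and top == "(":
        return ("pop", "("), state
    return ("nop", None), "fail"

def execute(action, state, stack):  # Execute: manipulate the stack
    op, g = action
    if op == "push": return state, stack + [g]
    if op == "pop":  return state, stack[:-1]
    return state, stack

state, stack = "success", []
for sym in "(())":
    sym, state, top = separate(sym, state, stack)
    action, state = merge(sym, state, top)
    state, stack = execute(action, state, stack)  # Finalize: collect the result
print(state, stack)                               # -> success []
```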
18. The final architecture

[Diagram: the complete layered map combining the phases]
19. Overview

- Introduction: GSOMSD models and capacities
- Main part: Implementing deterministic pushdown automata in RecSOM
- Conclusion
20. Conclusions

- RecSOM has stronger computational capacity than SOMSD/MSOM
- Does this mean it's worth the cost?
  - The simulations are not learnable with Hebbian learning ⇒ practical relevance questionable
  - In any case, an elaborate (costly) context is rather a hindrance for simulations
    - too much context results in a lot of noise
  - Maybe simpler models, slightly enhanced, are better
    - for example MSOM or SOMSD with a context variable indicating the last winner's activation
- Are Turing machines also possible?
  - Storing two stacks in one real number is possible
  - Reconstructing two stacks from a real number is hard
    - particularly when using only differences
    - may have to leave constant-size simulations
  - Other representations of the stack may be required
21. Thanks
22. Aux slide: PDA definition

23. Aux slide: PDA definition suitable for map construction