Title: The flavors of Neural Networks
Variations on Back-Propagation
- Changing the activation function: sigmoid, sin(x)
  - Should be absolutely equivalent; check on the results
- Discrete activation function:
  z = sign(w_1 z_1 + w_2 z_2 + ... + w_n z_n - t)
- Perceptron: no hidden layer
- Learning rule: Δw_i = a (z - o) z_i (see the sketch below)
- Convergence theorem: if a solution exists, the learning rule finds it
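A minimal sketch of the rule above, assuming ±1 coding, a sign activation with threshold t, and an illustrative linearly separable dataset (AND); the function name and constants are my own choices.

```python
import numpy as np

def train_perceptron(patterns, targets, a=0.1, cycles=100):
    """Perceptron learning rule: dw_i = a * (z - o) * z_i, with threshold t."""
    w = np.zeros(patterns.shape[1])
    t = 0.0
    for _ in range(cycles):
        for z_in, z in zip(patterns, targets):
            o = np.sign(np.dot(w, z_in) - t) or 1.0   # sign(0) treated as +1
            w += a * (z - o) * z_in                   # weight update
            t -= a * (z - o)                          # threshold update
    return w, t

# Illustrative example: the (linearly separable) AND function in +/-1 coding
patterns = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
targets  = np.array([-1, -1, -1, 1], dtype=float)
w, t = train_perceptron(patterns, targets)
print([int(np.sign(np.dot(w, p) - t)) for p in patterns])   # -> [-1, -1, -1, 1]
```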
Perceptron cannot learn CNOT
CNOT truth table (z_1, z_2 -> z):
  0 0 -> 0
  0 1 -> 1
  1 0 -> 1
  1 1 -> 0
Can we get it with +/-1 coding,
  -1 -1 -> -1
  -1  1 ->  1
   1 -1 ->  1
   1  1 -> -1
and z = sign(w_1 z_1 + w_2 z_2 - t)?
NO. The four patterns require
  -w_1 - w_2 - t < 0
  -w_1 + w_2 - t > 0
   w_1 - w_2 - t > 0
   w_1 + w_2 - t < 0
Adding the first and fourth inequalities gives t > 0, while adding the second and third gives t < 0: a contradiction.
The perceptron cannot implement all of Boolean algebra.
Perceptron divides the plane
- CNOT is not linearly separable
- Multistate perceptrons: hyperplanes in n-dimensional spaces
- Multilayer NN: non-linear cuts! (illustration below)
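A small illustration of the last two points, assuming 0/1 coding and step activations; the hidden-layer weights are one standard hand-picked choice, not the only one.

```python
import numpy as np

step = lambda x: (x > 0).astype(float)   # discrete (threshold) activation

# CNOT / XOR truth table in 0/1 coding
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# A single perceptron cannot separate these patterns: the learning rule
# keeps cycling and at least one pattern stays misclassified.
w, t, a = np.zeros(2), 0.0, 0.1
for _ in range(1000):
    for x, z in zip(X, y):
        o = step(np.dot(w, x) - t)
        w += a * (z - o) * x
        t -= a * (z - o)
print("perceptron:", step(X @ w - t))        # never reproduces [0, 1, 1, 0]

# One hidden layer is enough: h1 = OR, h2 = AND, output = OR AND (NOT AND)
W1 = np.array([[1.0, 1.0],     # h1: z1 + z2 - 0.5 > 0  (OR)
               [1.0, 1.0]])    # h2: z1 + z2 - 1.5 > 0  (AND)
t1 = np.array([0.5, 1.5])
W2 = np.array([1.0, -1.0])     # output: h1 - h2 - 0.5 > 0
t2 = 0.5

H = step(X @ W1.T - t1)
print("two layers:", step(H @ W2 - t2))      # -> [0, 1, 1, 0]
```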
Changing the learning rate
- Start with a large a to search for the global minimum.
- Avoid hovering by decreasing a as the number of cycles increases (sketch below).
- Local learning-rate adaptation a(t).
Decaying weights
- Systematically reduce the weights after every cycle; back-propagation will have to re-enhance the relevant weights.
- Irrelevant weights go to zero.
Dynamical change of architecture
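A minimal sketch of the first two tricks inside a plain gradient-descent update; the decay schedule a(t) = a_0 / (1 + t/tau) and the weight-decay constant are illustrative choices, not prescribed here.

```python
import numpy as np

def sgd_step(w, grad, cycle, a0=0.5, tau=100.0, weight_decay=1e-4):
    """One update with a decaying learning rate and decaying weights."""
    a = a0 / (1.0 + cycle / tau)      # large a early on, smaller as cycles increase
    w = w - a * grad                  # usual back-propagation step
    w = (1.0 - weight_decay) * w      # systematically shrink all weights each cycle
    return w

# Illustrative use on a quadratic error E(w) = 0.5 * ||w - w_target||^2
w_target = np.array([1.0, -2.0])
w = np.zeros(2)
for cycle in range(500):
    grad = w - w_target               # dE/dw
    w = sgd_step(w, grad, cycle)
print(w)                              # close to w_target, slightly shrunk by the decay
```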
Choosing the appropriate error function
- Not all patterns are known with the same precision, so minimizing the same squared error for every pattern is conceptually wrong.
- Add the diagonal errors: weight each pattern by its precision (sketch below).
- Patterns with big errors weigh very little; the NN first fits the precise data.
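A sketch of this weighting, assuming the usual chi-square-style reading of "add the diagonal errors" (divide each pattern's squared deviation by its variance sigma_p^2); the data are illustrative.

```python
import numpy as np

def weighted_squared_error(outputs, targets, sigmas):
    """Chi-square-style error: each pattern's squared deviation is divided
    by its (diagonal) variance, so imprecise patterns weigh very little."""
    r = (outputs - targets) / sigmas
    return 0.5 * np.sum(r ** 2)

# Illustrative data: the third pattern is known much less precisely
outputs = np.array([0.9, 0.2, 0.4])
targets = np.array([1.0, 0.0, 1.0])
sigmas  = np.array([0.05, 0.05, 1.0])    # big sigma -> tiny weight in the error
print(weighted_squared_error(outputs, targets, sigmas))
```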
- Often the patterns carry systematic errors which are correlated.
- Taking these correlations into account is a very heavy CPU overhead: batch training becomes necessary (sketch below).
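A sketch of what handling correlated errors could look like, assuming the standard generalization of the weighted error to a full covariance matrix C between patterns; this reading is an assumption on my part, the text above only states the consequence (heavy CPU cost, batch training).

```python
import numpy as np

def correlated_squared_error(outputs, targets, C):
    """Generalized chi-square error with a full covariance matrix C between
    patterns (assumed reading of 'correlated systematic errors').
    Because C couples all patterns, the error can only be evaluated on the
    whole batch at once, hence the heavy CPU cost and batch training."""
    r = outputs - targets                      # residual vector over the batch
    return 0.5 * r @ np.linalg.solve(C, r)     # 0.5 * r^T C^{-1} r

# Illustrative batch of 3 patterns with correlated errors
outputs = np.array([0.9, 0.2, 0.4])
targets = np.array([1.0, 0.0, 1.0])
C = np.array([[0.01, 0.005, 0.0],
              [0.005, 0.01, 0.0],
              [0.0,   0.0,  1.0]])
print(correlated_squared_error(outputs, targets, C))
```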
NN and inference
- A neural network is a map o(x).
- The error function is a functional depending on the net.
- Under the assumptions that
  - the patterns are independent,
  - the underlying probability distribution of the deviations has zero mean,
  - the mean square deviations σ² are uniformly bounded,
  the strong law of large numbers applies.
In the limit of infinitely many patterns, the minimum for o(x) corresponds to the conditional average <z|x>.
NNs trained to minimize the standard error are therefore Bayesian classifiers: take the output layer as 0s and 1s, with a = 1, ..., C classes and z^(a) = (0, 0, ..., 1_(a), ..., 0, 0); then output a gives the conditional probability of x belonging to class a (derivation sketched below).
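A compact version of the standard argument behind these two statements (my wording of the textbook derivation, consistent with the conclusion stated above):

```latex
\[
  E \;\xrightarrow{\;P \to \infty\;}\;
  \int \! dx\, p(x) \int \! dz\, p(z \mid x)\, \bigl(o(x) - z\bigr)^2 ,
  \qquad
  \frac{\delta E}{\delta o(x)} = 0
  \;\Rightarrow\;
  o(x) = \int \! dz\, p(z \mid x)\, z = \langle z \mid x \rangle .
\]
\[
  z^{(a)} = (0, \dots, 1_{(a)}, \dots, 0)
  \;\Rightarrow\;
  o_a(x) = \langle z_a \mid x \rangle = P(a \mid x) .
\]
```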
Unsupervised learning
Is it possible to learn with no teacher (no error function)? Yes:
- you learn songs from ads,
- to guess the age of a person,
- to forecast a politician's behavior.
What makes this possible: redundancy.
Hebbian learning
- With the plain Hebbian rule, the weights diverge.
- Oja's rule: add a V² term to the Hebbian rule (example below). The increment depends on the input and on the back-propagated output, so there is no runaway learning.
- Sanger's rule (feedforward).
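A minimal sketch of Oja's rule in the usual form Δw = a V (ξ − V w) with V = w·ξ; the data, learning rate, and random seed are illustrative. On zero-mean data the weights stay bounded and line up with the first principal direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative zero-mean 2-D data, elongated along one direction
xi = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.0], [-0.2, 0.4]])
xi -= xi.mean(axis=0)

w = rng.normal(size=2)
a = 0.01
for x in xi:
    V = w @ x                      # single linear output unit
    # The plain Hebbian term a*V*x makes |w| diverge; the -V^2 w term bounds it.
    w += a * V * (x - V * w)       # Oja's rule

print(w / np.linalg.norm(w))       # ~ first principal direction (up to sign)
```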
Principal Component Analysis
- Find a set of M orthogonal vectors which faithfully represent the data variance.
- The map from the N data variables onto the M directions preserves, to some extent, the information (relative distances) (sketch below).
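A short PCA sketch via the eigenvectors of the covariance matrix (one standard way to obtain the M directions; numpy and the toy data are my choices).

```python
import numpy as np

def pca(data, M):
    """Return the M orthogonal directions of largest variance and the
    projected (compressed) data."""
    X = data - data.mean(axis=0)                 # centre the N-dimensional data
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)         # ascending eigenvalues
    directions = eigvec[:, ::-1][:, :M]          # top-M principal directions
    return directions, X @ directions            # map N variables -> M directions

# Illustrative use: compress 3-D data onto its 2 main directions
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 3)) * [3.0, 1.0, 0.1]
dirs, projected = pca(data, M=2)
print(dirs.shape, projected.shape)               # (3, 2) (200, 2)
```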
Competitive unsupervised learning: Kohonen topological maps
- Empirical evidence for a two-dimensional representation of sensory information in the brain.
- Idea: nearby units behave similarly.
- Information reaches the neural net organized on a plane. A neuron is faithful to the information it represents, and its weights are communicated to its nearest neighbors.
[Diagram: output units V_1, ..., V_M, each connected to the inputs ξ_1, ..., ξ_N]
- w_i is the vector of weights connecting to neuron i, and it is normalized.
- Winner takes all: the unit that responds most strongly to the pattern is selected.
- The update corrects for the error and enhances the winner, localizing the pattern representation: a self-organizing map (sketch below).
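A compact sketch of a one-dimensional Kohonen map consistent with this description (winner-take-all plus an update shared with nearby units); the Gaussian neighbourhood and the decay schedules are common choices I am assuming, not details given here.

```python
import numpy as np

rng = np.random.default_rng(2)

M, N = 20, 2                        # M map units arranged on a line, N inputs
W = rng.normal(size=(M, N))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # normalized weight vectors w_i

data = rng.uniform(-1, 1, size=(2000, N))        # illustrative input patterns

for t, xi in enumerate(data):
    eta = 0.5 * np.exp(-t / 1000)                 # decaying learning rate
    width = 3.0 * np.exp(-t / 1000)               # shrinking neighbourhood
    winner = np.argmin(np.linalg.norm(W - xi, axis=1))   # winner takes all
    # The winner and its nearest neighbours on the map are pulled towards the
    # pattern, so nearby units end up behaving similarly.
    h = np.exp(-((np.arange(M) - winner) ** 2) / (2 * width ** 2))
    W += eta * h[:, None] * (xi - W)

print(W[:5])    # weights ordered along the map: a self-organizing map
```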
The identity NN
- The NN is a map from R^n to R^n.
- Train the NN to be the identity: output = input (sketch below).
- The hidden layer implements data compression! and clustering!
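A minimal numpy sketch of such an identity network with one small hidden layer, trained by back-propagation of the squared error; the architecture, data, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: 4-D inputs that really live on a 2-D subspace
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 4))

n, m = 4, 2                          # R^n -> hidden layer of size m < n -> R^n
W1 = rng.normal(size=(n, m)) * 0.1
W2 = rng.normal(size=(m, n)) * 0.1
eta = 0.01

for _ in range(5000):
    H = np.tanh(X @ W1)              # hidden layer: the compressed code
    O = H @ W2                       # output layer tries to reproduce the input
    err = O - X                      # train the net to be the identity
    # Back-propagation of the squared error
    gW2 = H.T @ err / len(X)
    gW1 = X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)
    W2 -= eta * gW2
    W1 -= eta * gW1

print(np.mean((X - np.tanh(X @ W1) @ W2) ** 2))   # reconstruction error shrinks with training
```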
Recurrent Neural Networks
- Feed-forward NNs are not Nature's solution.
- Feed-forward is an architecture dictated by mathematical simplicity.
[Diagram: layers 1, 2 and 3 at time t feeding the network at time t+1]
- We feed layer 2 with the activation from the previous pattern (sketch below).
- Recurrent back-propagation: back-propagation through time.
- The back-propagation algorithm is easily adapted to this scenario.
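A small sketch of this idea, assuming an Elman-style layout in which the layer-2 activation from the previous pattern is fed back as extra input; only the forward pass is written out, since back-propagation through time is ordinary back-propagation applied to this loop unrolled over the sequence.

```python
import numpy as np

rng = np.random.default_rng(4)

n_in, n_hid, n_out = 3, 5, 2
W_in  = rng.normal(size=(n_in, n_hid)) * 0.3
W_rec = rng.normal(size=(n_hid, n_hid)) * 0.3   # feeds layer 2 with its previous activation
W_out = rng.normal(size=(n_hid, n_out)) * 0.3

def forward(sequence):
    """Forward pass over a sequence of patterns; the hidden state carries
    the activation from the previous pattern (time t -> t+1)."""
    h = np.zeros(n_hid)
    outputs = []
    for x in sequence:
        h = np.tanh(x @ W_in + h @ W_rec)   # layer 2 sees the input AND its old activation
        outputs.append(h @ W_out)           # layer 3
    return np.array(outputs)

seq = rng.normal(size=(10, n_in))
print(forward(seq).shape)                   # (10, 2)
```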
And many more
- Radial Basis Function NNs
- Probabilistic NNs
- Adaptive Resonance Theory
For standard applications, use a standard feed-forward multilayer back-propagation NN.