Title: The flavors of Neural Networks

Transcript and Presenter's Notes

1
The flavors of Neural Networks
2
Variations on Back-Propagation
  • Changing activation function
  • Sigmoid
  • Sin(x)
  • Should be absolutely equivalent
  • Check on results
  • Discrete activation function

z = sign(w_1 z_1 + w_2 z_2 + ... + w_n z_n - t)
Perceptron: no hidden layer
Learning rule: Δw_i = α (z - o) z_i
Convergence theorem: the rule finds a solution whenever the patterns are linearly separable
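Below is a minimal Python sketch of this discrete perceptron and its learning rule; the function names, the learning rate alpha and the epoch count are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sign(x):
    # Discrete activation: returns +1 or -1 (0 is mapped to +1 for definiteness)
    return np.where(x >= 0, 1.0, -1.0)

def train_perceptron(inputs, targets, alpha=0.1, n_epochs=100):
    """Perceptron with no hidden layer: o = sign(w . z - t).

    Learning rule from the slide: dw_i = alpha * (z - o) * z_i, with the
    threshold t updated as a weight attached to a constant input of -1.
    """
    w = np.zeros(inputs.shape[1])
    t = 0.0
    for _ in range(n_epochs):
        for z_vec, z_target in zip(inputs, targets):
            o = sign(np.dot(w, z_vec) - t)
            w += alpha * (z_target - o) * z_vec
            t -= alpha * (z_target - o)   # same rule applied to the threshold
    return w, t
```

Run on a linearly separable problem (for instance AND in the ±1 encoding) this converges in a few epochs, which is the content of the convergence theorem.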
3
The perceptron cannot learn CNOT
CNOT (the output is the XOR of the two inputs):
   z_1  z_2 | z
    0    0  | 0
    0    1  | 1
    1    0  | 1
    1    1  | 0
Can we get it, in the ±1 encoding
   z_1  z_2 | z
   -1   -1  | -1
   -1    1  |  1
    1   -1  |  1
    1    1  | -1
with z = sign(w_1 z_1 + w_2 z_2 - t) ?
NO:
  (-1,-1) → -1  requires  -w_1 - w_2 - t < 0
  (-1, 1) → +1  requires  -w_1 + w_2 - t > 0
  ( 1,-1) → +1  requires   w_1 - w_2 - t > 0
  ( 1, 1) → -1  requires   w_1 + w_2 - t < 0
Adding the first and fourth inequalities gives t > 0; adding the second and
third gives t < 0, a contradiction.
The perceptron cannot implement all of Boolean algebra.
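As a sanity check on the derivation above, here is a small Python scan (entirely my own illustration) over a grid of weights and thresholds, showing that no single perceptron reproduces the ±1-encoded CNOT table:

```python
import itertools

import numpy as np

# ±1-encoded CNOT patterns from the slide: (z_1, z_2) -> z
patterns = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]

def separates(w1, w2, t):
    # A single perceptron z = sign(w1*z1 + w2*z2 - t) must reproduce every row.
    return all(np.sign(w1 * z1 + w2 * z2 - t) == z for z1, z2, z in patterns)

# Scan a coarse grid of weights and thresholds: no combination works,
# in agreement with the t > 0 / t < 0 contradiction above.
grid = np.linspace(-2.0, 2.0, 41)
found = any(separates(w1, w2, t)
            for w1, w2, t in itertools.product(grid, grid, grid))
print("separating perceptron found:", found)   # prints: False
```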
4
A perceptron divides the plane with a straight line, and CNOT is not
linearly separable.
Multistate perceptrons: hyperplanes in n-dimensional spaces.
Multilayer NNs: non-linear cuts!
5
  • Changing the learning rate: start with a large α to search for the
    global minimum; avoid hovering around it by decreasing α as the number
    of cycles increases. Local learning-rate adaptation: α(t).
  • Decaying weights: systematically reduce the weights after every cycle.
    Back-propagation then has to re-enhance the relevant weights, and the
    irrelevant weights go to zero. (A sketch of both tricks follows below.)
  • Dynamical change of the architecture.
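A minimal sketch of the first two tricks in Python; the decay constants, the 1/(1 + decay·t) schedule and the multiplicative shrink factor are illustrative assumptions rather than anything specified on the slide.

```python
import numpy as np

def sgd_step(w, grad, cycle, alpha0=0.5, alpha_decay=0.01, weight_decay=1e-4):
    """One update combining the two tricks above (illustrative constants).

    - Learning-rate schedule: alpha(t) = alpha0 / (1 + alpha_decay * t),
      large at the start to explore, shrinking to avoid hovering.
    - Decaying weights: every cycle all weights are pulled toward zero, so
      back-propagation must re-enhance the relevant ones while the
      irrelevant ones drift to zero.
    """
    alpha = alpha0 / (1.0 + alpha_decay * cycle)
    w = w - alpha * grad           # usual gradient step
    w = (1.0 - weight_decay) * w   # systematic shrinkage of every weight
    return w
```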
6
Choosing the appropriate error function: not all patterns are known with
the same precision, so the plain sum of squared deviations is conceptually
wrong. Add the (diagonal) pattern errors as weights, e.g.
E = Σ_p (o(x_p) - z_p)² / σ_p².
Patterns with big errors then weigh very little: the NN first fits the
precise data.
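A short sketch of this error-weighted (chi-square-like) cost in Python; the array names and example values are illustrative.

```python
import numpy as np

def weighted_squared_error(outputs, targets, sigmas):
    """Sum of squared deviations weighted by the per-pattern errors.

    Patterns with large sigma contribute very little, so the network
    fits the precise data first, as described on the slide.
    """
    residuals = (outputs - targets) / sigmas
    return np.sum(residuals ** 2)

# Illustrative usage: the second pattern is poorly measured and barely counts.
o = np.array([0.9, 0.1, 0.5])
z = np.array([1.0, 0.8, 0.5])
s = np.array([0.1, 2.0, 0.1])
print(weighted_squared_error(o, z, s))
```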
7
Often the patterns carry systematic errors which are correlated between
patterns. Taking that correlation into account is a very heavy CPU
overload, and batch training becomes necessary.
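A hedged sketch (my notation, not the slide's) of what a correlated-error cost looks like: assuming a covariance matrix C between patterns, the quadratic form below replaces the diagonal sum of the previous slide; because it couples every pattern to every other, it can only be evaluated over the whole set at once, which explains the CPU overload and the need for batch training.

```latex
% Correlated-error cost; C_{pq} is the assumed covariance between
% patterns p and q.  A diagonal C recovers the per-pattern weighting
% E = \sum_p (o(x_p) - z_p)^2 / \sigma_p^2 of the previous slide.
E = \sum_{p,q} \bigl(o(x_p) - z_p\bigr)\,\bigl(C^{-1}\bigr)_{pq}\,\bigl(o(x_q) - z_q\bigr)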
8
NN and inference
  • A neural network is a map o(x)
  • The error function is a functional depending on the net
  • Under the assumptions
  • patterns are independent
  • the underlying probability distribution of the deviations has zero mean
  • the mean square deviations σ² are uniformly bounded
  • the strong law of large numbers applies

9
As the number of patterns → ∞, the minimum of the error functional is
reached for o(x) = ⟨z|x⟩, the conditional average of the targets.
NNs trained to minimize the standard error are therefore Bayesian
classifiers. Take the output layer as 0s and 1s, with classes a = 1, ..., C
and
z^(a) = (0, 0, 0, ..., 1_(a), ..., 0, 0, 0)
Then the output o_a(x) is the conditional probability of x belonging to
class a.
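A compact sketch of the argument behind the last two slides, in my own notation (it is the standard result, not copied from the presentation):

```latex
% As the number of patterns grows, the empirical error tends (strong law
% of large numbers) to the functional
E[o] = \int dx\, p(x) \int dz\, p(z \mid x)\, \lVert o(x) - z \rVert^2 ,
% whose minimum over all maps o is the conditional expectation
o^*(x) = \int dz\, p(z \mid x)\, z = \langle z \mid x \rangle .
% With one-hot targets z^{(a)} = (0,\dots,1_{(a)},\dots,0) this becomes
o^*_a(x) = \sum_b P(b \mid x)\, z^{(b)}_a = P(a \mid x),
% i.e. the output is the conditional probability of class a given x.
```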
10
Unsupervised learning
Is it possible to learn with no teacher (no error function)? Yes: you learn
songs from ads, you learn to guess the age of a person, to forecast a
politician's behavior.
The key ingredient is redundancy in the data.
11
Hebbian learning
With the plain Hebbian rule the weights diverge.
Oja's rule: add a V² term to the Hebbian rule (see the example below). The
increment then depends on the input and on the back-propagated output, so
there is no runaway learning.
Sanger's rule: the feedforward extension to several output units.
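A minimal example of the two rules in Python (my notation: xi is the input vector, V = w·xi the unit's output, alpha the learning rate):

```python
import numpy as np

def hebb_update(w, xi, alpha=0.01):
    """Plain Hebbian rule: dw = alpha * V * xi.  Left alone, |w| diverges."""
    V = np.dot(w, xi)
    return w + alpha * V * xi

def oja_update(w, xi, alpha=0.01):
    """Oja's rule: the extra V^2 term feeds the output back into the
    increment and keeps |w| close to 1, so there is no runaway learning."""
    V = np.dot(w, xi)
    return w + alpha * (V * xi - V ** 2 * w)
```

Iterated over zero-mean data, oja_update drives w toward the direction of largest variance, which is the link to the PCA slide that follows.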
12
Principal Component Analysis: find a set of M orthogonal vectors which
represent the data variance faithfully. The map from the N data variables
to the M directions preserves information (relative distances) to some
extent.
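For reference, a short sketch of the classical way to obtain the same M directions, as the leading eigenvectors of the data covariance matrix (array names and shapes are illustrative):

```python
import numpy as np

def pca_directions(data, M):
    """Return the M orthogonal directions of largest variance.

    data: array of shape (n_samples, N).  The rows of the returned array
    are the leading eigenvectors of the covariance matrix, i.e. the
    directions that Oja/Sanger learning converges to.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:M]     # largest variance first
    return eigvecs[:, order].T
```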
13
Competitive unsupervised learning
Kohonen topological maps
  • Empirical evidence for a bi-dimensional
    representation of sensory information in the
    brain
  • Idea: near units behave similarly
  • Information reaches the neural net organized on a
    plane. A neuron is faithful to the information it
    represents, and its weights are communicated to its
    nearest neighbors

14
[Figure: inputs ξ_1, ..., ξ_N feeding output neurons V_1, ..., V_M]
w_i is the vector of weights connecting to neuron i, and it is normalized.
The winner takes all: the update corrects for the error and enhances the
winner, localizing the representation of the pattern.
Self-organizing map
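A minimal sketch of this competitive, winner-takes-all update in Python; the Gaussian neighborhood over the 2-D map and all constants are illustrative assumptions about how the winner's correction is communicated to its nearest neighbors.

```python
import numpy as np

def som_update(weights, xi, grid, alpha=0.1, radius=1.0):
    """One Kohonen step on normalized weight vectors.

    weights: (M, N) array, one normalized weight vector w_i per neuron.
    xi:      input pattern of length N.
    grid:    (M, 2) array with each neuron's position on the 2-D map.
    """
    # Winner takes all: the neuron whose weights best match the input.
    winner = np.argmax(weights @ xi)
    # Neighborhood on the map: the winner's correction is communicated
    # to its nearest neighbors, more weakly the farther away they sit.
    dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-dist2 / (2.0 * radius ** 2))
    # Correct toward the pattern, then re-normalize each weight vector.
    weights = weights + alpha * h[:, None] * (xi - weights)
    weights /= np.linalg.norm(weights, axis=1, keepdims=True)
    return weights
```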
15
The identity NN
NN: R^n → R^n
Train the NN to be the identity: input = output.
The hidden layer implements data compression! And clustering!
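A sketch of such an identity network with a narrow hidden layer (the sizes, the initialization and the tanh nonlinearity are my illustrative choices); training would back-propagate the reconstruction error below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Identity network R^n -> R^n with a narrow hidden layer: training it to
# reproduce its input forces the hidden layer to compress the data.
n, hidden = 8, 2
W1, b1 = rng.normal(size=(hidden, n)) * 0.1, np.zeros(hidden)
W2, b2 = rng.normal(size=(n, hidden)) * 0.1, np.zeros(n)

def forward(x):
    h = np.tanh(W1 @ x + b1)        # compressed code (the hidden layer)
    o = W2 @ h + b2                 # reconstruction of the input
    return h, o

def reconstruction_error(x):
    _, o = forward(x)
    return np.sum((o - x) ** 2)     # train by back-propagating this error
```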
16
Recurrent Neural Networks
Feed-forward NNs are not Nature's solution: feed-forward is an
architecture dictated by mathematical simplicity.
17
[Figure: layers 1, 2 and 3 shown at times t and t+1]
We feed layer 2 with its activation from the previous pattern.
Recurrent back-propagation / back-propagation through time: the
back-propagation algorithm is easily adapted to this scenario.
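A minimal sketch of the recurrence described above (layer 2 also receives its own activation from the previous pattern); the layer sizes and the tanh nonlinearity are illustrative assumptions, and training would back-propagate through the unrolled sequence (back-propagation through time).

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 4, 6, 3   # illustrative layer sizes

W_in  = rng.normal(size=(n_hidden, n_in)) * 0.1      # layer 1 -> layer 2
W_rec = rng.normal(size=(n_hidden, n_hidden)) * 0.1  # layer 2 at t -> layer 2 at t+1
W_out = rng.normal(size=(n_out, n_hidden)) * 0.1     # layer 2 -> layer 3

def run_sequence(patterns):
    """Forward pass over a sequence of patterns: layer 2 is fed with its own
    activation from the previous pattern through the recurrent connection."""
    h = np.zeros(n_hidden)
    outputs = []
    for x in patterns:
        h = np.tanh(W_in @ x + W_rec @ h)   # current input + previous state
        outputs.append(W_out @ h)
    return outputs
```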
18
And many more
Radial Basis Function NNs, Probabilistic NNs, Adaptive Resonance Theory, ...
For standard applications, use a standard feed-forward multilayer
back-propagation NN.