1
Connectionist Computing CS4018
  • Gianluca Pollastri
  • office: CS A1.07
  • email: gianluca.pollastri@ucd.ie

2
Credits
  • Geoffrey Hinton, University of Toronto.
  • borrowed some of his slides from his Neural
    Networks and Computation in Neural Networks
    courses.
  • Ronan Reilly, NUI Maynooth.
  • slides from his CS4018.
  • Paolo Frasconi, University of Florence.
  • slides from his tutorial on Machine Learning for
    structured domains.

3
Lecture notes
  • http://gruyere.ucd.ie/2007_courses/4018/
  • Strictly confidential...

4
Books
  • No book covers large fractions of this course.
  • Parts of chapters 4, 6, (7), 13 of Tom Mitchell's
    Machine Learning.
  • Parts of chapter V of MacKay's Information
    Theory, Inference, and Learning Algorithms,
    available online at
  • http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html
  • Chapter 20 of Russell and Norvig's Artificial
    Intelligence: A Modern Approach, also available
    at
  • http://aima.cs.berkeley.edu/newchap20.pdf
  • More materials later...

5
Paper 2
  • Read the paper NETtalk: a parallel network that
    learns to read aloud, by Sejnowski and Rosenberg
    (1986).
  • The paper is linked from the course website.
  • Email me (gianluca.pollastri@ucd.ie) a 250-word
    MAX summary by Feb the 26th at midnight in any
    time zone of your choice.
  • Worth 5%; 1% off for each day late.
  • You are responsible for making sure I get it, etc
    etc.

6
MLP applications: matching words and sounds
  • Sejnowski and Rosenberg, NETtalk: a parallel
    network that learns to read aloud, Cognitive
    Science, 14, 179-211 (1986)
  • Teaching an MLP how to pronounce English by
    backprop.
  • The network was given a stream of words, together
    with the corresponding phonemes (see the
    input-encoding sketch below).
  • Once the network had learned, it was possible to
    make it read aloud.
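  • A minimal sketch of the kind of sliding-window,
    one-hot input encoding NETtalk uses (the paper uses
    a 7-letter window; the reduced symbol set and the
    function name here are illustrative only):

```python
import numpy as np

# Simplified alphabet: 26 letters plus space (NETtalk itself uses 29 symbols).
SYMBOLS = "abcdefghijklmnopqrstuvwxyz "
WINDOW = 7  # the network sees 7 characters at a time, centred on the target letter

def encode_window(text, centre):
    """One-hot encode the 7-character window centred on position `centre`."""
    vec = np.zeros((WINDOW, len(SYMBOLS)))
    for i in range(WINDOW):
        pos = centre - WINDOW // 2 + i
        ch = text[pos] if 0 <= pos < len(text) else " "  # pad beyond the edges
        vec[i, SYMBOLS.index(ch)] = 1.0
    return vec.ravel()  # 7 x 27 = 189 inputs in this simplified version

x = encode_window("we can read", centre=4)  # window around the 'a' in "can"
print(x.shape)  # (189,)
```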

7
MLP applications: protein secondary structure
prediction
  • Proteins are strings
  • FEFHGYARSGVIMNDSGASTKS
  • GAYITPAGETGGAIGRLGNQAD
  • TYVEMNLEHKQTLDNG
  • Structures too

8
Deep network
  • Le Cun et al.'s digit-recognition network:
    4 hidden layers.

9
Feature maps
  • Hidden layers 1 and 3 implement feature maps.
  • Layer 1: the input is the 16x16 image, with borders
    added for technical reasons -> 28x28. The output is
    composed of 4 maps of 24x24 units. This is really
    implemented with 4 neurons, each taking 5x5
    inputs, each replicated in every possible position
    on the input map.
  • This sounds complicated but is fairly easy:
    instead of a (28x28) -> (24x24x4) full connectivity,
    only 5x5 inputs are connected to each output
    unit. Moreover, there are just 4 neurons/sets
    of weights (weight sharing).
  • So, a (5x5) -> 4 full connectivity that sweeps the
    whole input. Only 104 weights including biases!
    (A minimal sketch follows.)
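  • A minimal numpy sketch of the idea (sizes taken
    from this slide; the tanh output function and the
    names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))          # 16x16 digit padded to 28x28

n_maps, k = 4, 5
W = rng.standard_normal((n_maps, k, k)) * 0.1  # 4 shared 5x5 kernels
b = np.zeros(n_maps)                           # one bias per map
print(W.size + b.size)                         # 104 weights including biases

def feature_maps(image, W, b):
    """Sweep each shared 5x5 kernel over every position of the input."""
    out = np.zeros((n_maps, 24, 24))
    for m in range(n_maps):
        for r in range(24):
            for c in range(24):
                patch = image[r:r + k, c:c + k]
                out[m, r, c] = np.tanh(np.sum(W[m] * patch) + b[m])
    return out

print(feature_maps(image, W, b).shape)         # (4, 24, 24)
```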

10
Averaging/subsampling layer
  • Hidden layers 2 and 4 implement
    averaging/subsampling stages.
  • Layer 2: 24x24x4 -> 12x12x4.
  • This is performed using 4 units, each one doing a
    2x2 -> 1 mapping. The weights are constrained to be
    all the same (see the sketch below).
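  • A sketch of the 2x2 -> 1 averaging stage, continuing
    the feature_maps example above (one shared
    coefficient and bias per map is an assumption about
    the details):

```python
import numpy as np

def subsample(maps, w, b):
    """Average/subsample: each 24x24 map -> 12x12, one shared weight per map."""
    n, h, _ = maps.shape
    out = np.zeros((n, h // 2, h // 2))
    for m in range(n):
        for r in range(h // 2):
            for c in range(h // 2):
                block = maps[m, 2*r:2*r + 2, 2*c:2*c + 2]
                out[m, r, c] = w[m] * block.sum() + b[m]  # same weight for all 4 inputs
    return out

# e.g. subsample(feature_maps(image, W, b), w=0.25 * np.ones(4), b=np.zeros(4))
# turns (4, 24, 24) into (4, 12, 12)
```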

11
Layers 3, 4 and 5
  • Layer 3: more feature maps. 12 maps of 8x8 units.
    Each map is produced by a single neuron (always
    the same one) mapping a 5x5 area into one unit.
  • Layer 4: same as layer 2, (8x8x12) -> (4x4x12).
  • Layer 5: 10 output units fully connected to layer
    4. This is where most of the weights are.

12
Overall
  • 5 layers, position invariance encoded in the
    architecture, a lot of weights shared.
  • 100k connections -> 2k independent parameters:
    every weight is shared on average by 50
    connections (counts are sketched below).
  • Training complexity is still O(100k), though.
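  • Back-of-the-envelope counts from the layer sizes
    quoted above (a sketch; the exact totals in the
    paper differ slightly):

```python
layer1_params      = 4 * (5 * 5 + 1)            # 104 shared weights + biases
layer1_connections = 4 * 24 * 24 * (5 * 5 + 1)  # every output unit has 26 incoming links
layer5_params      = 10 * (4 * 4 * 12 + 1)      # fully connected output layer

print(layer1_params, layer1_connections)  # 104 and ~60k: heavy weight sharing
print(layer5_params)                      # 1930: most of the ~2k free parameters
```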

13
Training the network
  • Training is by gradient descent, using
    backpropagation.
  • For each copy j of a shared weight there will be
    a Δwj computed by backprop. They are simply added
    together (a sketch follows).
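  • A sketch of what this means for one shared 5x5
    kernel (names and shapes follow the earlier
    feature-map sketch, not the original code):

```python
import numpy as np

def shared_kernel_gradient(image, deltas, k=5):
    """Gradient for one shared 5x5 kernel: sum the gradients of all its copies.

    image: 28x28 input; deltas: backpropagated errors for its 24x24 feature map.
    """
    grad = np.zeros((k, k))
    for r in range(deltas.shape[0]):
        for c in range(deltas.shape[1]):
            # contribution of the copy applied at position (r, c)
            grad += deltas[r, c] * image[r:r + k, c:c + k]
    return grad
```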

14
Results
  • After 30 epochs the error on the training set was
    1.1% and the squared error 0.017.
  • On the test set: 3.4% and 0.024.
  • To get 1% error: 5.7% rejection (9% on just
    handwritten).
  • A lot of these errors were actually caused by
    preprocessing. Some of those that weren't were
    ambiguous even to humans.

15
Invariances
  • In Le Cun's paper we saw that translation
    invariance was introduced into the network by
    weight sharing.
  • Teaching neural networks invariances is a general
    problem.

16
The invariance problem
  • Our perceptual systems are very good at dealing
    with invariances:
  • translation, rotation, scaling
  • deformation, contrast, lighting, rate
  • We are so good at this that it's hard to
    appreciate how difficult it is.
  • It's one of the main difficulties in making
    computers perceive.
  • We still don't have generally accepted solutions.

17
Invariances using features
  • Instead of representing an object directly, first
    extract features that are invariant to whatever
    transformation matters.
  • For instance, if we want roto-translational
    invariance, represent an object by the distances
    between its parts instead of their xyz coordinates
    (a sketch follows).
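  • A minimal sketch of such a representation (the
    helper name is illustrative):

```python
import numpy as np

def pairwise_distances(points):
    """points: (n, 3) array of xyz coordinates -> vector of all pairwise distances."""
    n = len(points)
    return np.array([np.linalg.norm(points[i] - points[j])
                     for i in range(n) for j in range(i + 1, n)])

pts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 2., 0.]])
shifted = pts + np.array([5., -3., 2.])  # translate the whole object
# the distance representation is unchanged by the translation
print(np.allclose(pairwise_distances(pts), pairwise_distances(shifted)))  # True
```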

18
Invariances: normalisation
  • For instance, put a box around an object, then
    scale it to a fixed size, the same as the
    preprocessing of the digits in Le Cun et al.
    (see the sketch below).
  • This eliminates degrees of freedom.
  • It is not always obvious how to choose the box.
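  • A rough sketch of box-and-rescale normalisation for
    a binary digit image (nearest-neighbour resampling
    is just for illustration):

```python
import numpy as np

def normalise(img, size=16):
    """Crop a binary image to the bounding box of its ink, then rescale."""
    rows = np.where(img.any(axis=1))[0]
    cols = np.where(img.any(axis=0))[0]
    crop = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]  # assumes some ink exists
    # crude nearest-neighbour resampling to size x size
    r_idx = np.arange(size) * crop.shape[0] // size
    c_idx = np.arange(size) * crop.shape[1] // size
    return crop[np.ix_(r_idx, c_idx)]
```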

19
Invariances: brute force
  • We can tackle invariances by:
  • constraining network weights
  • using features
  • normalising
  • But computer scientists should be lazy and
    impatient. Wouldn't it be great if we could let
    the network do all the work?
  • Brute force: to create invariance to
    transformation X, for each example generate a lot
    of new examples by applying X to it. Then train
    a large network on a fast computer.

20
Invariances: brute force
  • For example, translate and rotate a digit in a
    lot of different ways and train a large network
    to recognise it (a data-augmentation sketch
    follows).
  • It generally works well if the transformations
    aren't too large: do approximate (easy)
    normalisation first.
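  • A data-augmentation sketch along these lines
    (scipy's rotate/shift are used here; the parameter
    ranges are arbitrary):

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment(img, label, n_copies=20, seed=0):
    """Generate randomly translated and rotated copies of a digit, same label."""
    rng = np.random.default_rng(seed)
    examples = []
    for _ in range(n_copies):
        angle = rng.uniform(-15, 15)          # small rotations only
        dy, dx = rng.integers(-2, 3, size=2)  # shifts of up to 2 pixels
        examples.append((shift(rotate(img, angle, reshape=False), (dy, dx)), label))
    return examples
```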

21
Summary: invariances
  • Handling invariances is often as tough a problem
    as the learning that remains once they are
    tackled.
  • Possible solutions:
  • network design
  • features
  • normalisation
  • brute force

22
Problems with squared error
  • So far, for gradient descent, we used:
  • error: squared
  • output function: linear or sigmoid (binary won't
    work)
  • There are tricky problems with squared error. For
    instance, if the desired output is 1 and the
    actual output is very close to 0, there is almost
    no gradient (see the numerical sketch below).
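  • A quick numerical illustration of the problem
    (a sketch, assuming a sigmoid output and
    E = 1/2 (y - t)^2):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

target = 1.0
for net in (-6.0, -2.0, 0.0, 2.0):
    y = sigmoid(net)
    grad = (y - target) * y * (1 - y)   # dE/dnet for squared error + sigmoid
    print(f"net={net:+.1f}  y={y:.4f}  dE/dnet={grad:+.6f}")
# At net = -6 the output is badly wrong (y ~ 0.0025) yet the gradient is only
# about -0.002: learning has almost stalled.
```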

23
Problems with squared error
  • These are the deltas for the output layer with
    squared error: δk = (dk - yk) f'(netk).
  • And these are f and f': for a sigmoid,
    f(x) = 1/(1 + e^(-x)) and f'(x) = f(x)(1 - f(x)),
    which is nearly zero when the output saturates
    near 0 or 1.

24
Alternatives: softmax and relative entropy
  • Non-local non-linearity.
  • The outputs add up to 1 (they can be interpreted
    as the probability of the output given the input);
    a sketch follows.
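  • A minimal sketch of softmax outputs with the
    relative-entropy (cross-entropy) error, where the
    output-layer gradient reduces to y - t, with no f'
    factor to vanish:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-6.0, 2.0, 1.0])  # net inputs to the output units
t = np.array([1.0, 0.0, 0.0])   # target: the correct class is unit 0
y = softmax(z)
print(y.sum())   # 1.0 -- the outputs can be read as probabilities
print(y - t)     # gradient wrt the net inputs: large exactly where the net is wrong
```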