1
Overview of multilayer neural networks
  • Chapter 6 in Duda et al.

"There is nothing particularly magical about
multilayer neural networks; they implement linear
discriminants, but in a space where the inputs
have been mapped nonlinearly." (Duda, Hart, Stork)
2
Multilayer neural networks
  • In general a NN implements a non-linear mapping g: R^d -> R^c
  • For classification:
  • Input is the d-dimensional feature vector x
  • Output is the c discriminant functions g_1(x), ..., g_c(x)
  • We strive to obtain outputs that approximate the Bayes
    discriminant functions, g_k(x) \approx P(\omega_k \mid x)
  • Example: 3-d feature vectors, two-category case, neural
    network with 5 hidden units
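
The example network can be written out directly. Below is a minimal
sketch (in NumPy) of the forward pass of a fully-connected network with
d = 3 inputs, 5 hidden units, and c = 2 outputs; the weight values,
function names, and logistic non-linearity are illustrative assumptions,
not code from the slides.

```python
import numpy as np

def sigmoid(net):
    """Logistic non-linearity, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-net))

# Illustrative dimensions from the slide: d = 3, 5 hidden units, c = 2.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(5, 3))   # input-to-hidden weights w_ji
b_hidden = rng.normal(size=5)        # hidden bias weights w_j0
W_output = rng.normal(size=(2, 5))   # hidden-to-output weights w_kj
b_output = rng.normal(size=2)        # output bias weights w_k0

def forward(x):
    """Map a 3-d feature vector x to the c = 2 discriminant values g_k(x)."""
    y = sigmoid(W_hidden @ x + b_hidden)   # hidden-unit outputs
    g = sigmoid(W_output @ y + b_output)   # output discriminants
    return g

x = np.array([0.5, -1.2, 0.3])             # a 3-d feature vector
g = forward(x)
print("discriminants:", g, "-> decide category", g.argmax() + 1)
```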
3
Terminology of neural networks
(The original slide is a labelled diagram of a three-layer network;
the labels are:)
  • Weights (synapses)
  • Bias weights
  • Target vector
  • Non-linearity (activation function)
  • A hidden unit
  • Net activation
  • Input layer
  • Output layer
  • Hidden layer
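
To tie these labels together, here is the standard notation (after Duda
et al.; the index convention is an assumption, chosen to match the
"weight indices distinguish layers" remark on the next slide):

net_j = \sum_{i=1}^{d} w_{ji} x_i + w_{j0}, \qquad y_j = f(net_j)   (hidden unit j)
net_k = \sum_{j=1}^{n_H} w_{kj} y_j + w_{k0}, \qquad z_k = f(net_k)   (output unit k)

Here f is the non-linearity (activation function), w_{j0} and w_{k0}
are the bias weights, net_j and net_k are the net activations, and the
outputs z_k are compared against the target vector during training.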
4
Structure of a neural network
  • We will study fully-connected, three-layer
    networks with a fixed non-linearity
  • We train the NN by optimizing the weights
    according to some criterion
  • Generalizations:
  • Different non-linearities in each node
  • Other network topologies: not fully connected,
    feedback paths

Sloppy notation! Weight indices are used to
distinguish between layers (w_{ji} for input-to-hidden,
w_{kj} for hidden-to-output)
5
Sigmoid non-linearities
  • Sigmoids are non-decreasing, scalar functions
    that saturate, i.e. f(net) approaches finite limits
    as net \to -\infty and net \to +\infty
  • Examples: the logistic function f(net) = 1 / (1 + e^{-net}),
    the hyperbolic tangent f(net) = tanh(net), and the
    hard limiter or step function
  • For training it is beneficial (if not crucial)
    that the sigmoid is differentiable
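
A small sketch of these non-linearities and of why differentiability
matters for training; the derivative identity f' = f(1 - f) for the
logistic function is standard, but the code itself is illustrative,
not from the slides.

```python
import numpy as np

def logistic(net):
    """Smooth sigmoid with range (0, 1); differentiable everywhere."""
    return 1.0 / (1.0 + np.exp(-net))

def logistic_deriv(net):
    """f'(net) = f(net) * (1 - f(net)): cheap to reuse during backprop."""
    f = logistic(net)
    return f * (1.0 - f)

def hard_limiter(net):
    """Step function: a sigmoid in the loose sense, but its derivative
    is zero almost everywhere, so gradient training cannot use it."""
    return np.where(net >= 0, 1.0, 0.0)

net = np.linspace(-4, 4, 9)
print(logistic(net))
print(np.tanh(net))        # another common sigmoid, with range (-1, 1)
print(hard_limiter(net))
```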
6
Expressive power of neural networks
  • Neural networks can implement any
    multidimensional mapping
  • Kolmogorov (1957): a finite number of hidden units,
    but unknown and arbitrarily complex scalar
    non-linearities
  • Hornik (Neural Networks, vol. 4, 1991) and many
    others: fixed scalar non-linearities (continuous,
    bounded, non-constant) but arbitrarily many
    hidden units
  • The latter situation is closer to practice, where we
    typically use differentiable sigmoids and vary
    the number of hidden units until the performance
    is satisfactory (a sketch of this follows below)
  • In practice, engineering skills matter more than
    these theoretical guarantees:
  • Application-specific knowledge that guides the
    choice of network topology
  • Number of hidden layers
  • Number of units in each hidden layer
  • Feedback networks
  • Pruning techniques
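
As a hedged illustration of "vary the number of hidden units until the
performance is satisfactory", the sketch below uses scikit-learn's
MLPClassifier on toy data; the data, candidate widths, and selection
criterion are all assumptions, not from the slides.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy data standing in for real training/validation sets (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)
X_train, X_val, y_train, y_val = X[:150], X[150:], y[:150], y[150:]

# Try increasingly wide hidden layers; keep the lowest held-out error.
best_err, best_n = float("inf"), None
for n_hidden in [1, 2, 5, 10, 20]:
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                        activation="logistic", max_iter=2000,
                        random_state=0).fit(X_train, y_train)
    err = 1.0 - net.score(X_val, y_val)
    if err < best_err:
        best_err, best_n = err, n_hidden
print(f"best width: {best_n} hidden units, validation error {best_err:.2f}")
```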

7
Backpropagation training of neural networks
  • Supervised learning
  • For each feature vector x there is an
    associated target vector t
  • A gradient descent algorithm that modifies the
    weights iteratively so that the MSE is
    minimized
  • Often a stochastic gradient descent algorithm is
    used
  • For each input vector we consider the error
    J(w) = \frac{1}{2} \| t - z \|^2
  • Calculate the stochastic gradient with respect to
    all weights and update by w \leftarrow w - \eta \nabla_w J(w)
    (see the sketch after this list)
  • How do we choose the target vectors?
  • We do not know the posterior probabilities!
  • Somehow the target vector should indicate the
    category
  • For the batch version we have J(w) = \sum_n J_n(w),
    where the sum runs over training data from all categories
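
A minimal sketch of stochastic-gradient backpropagation for the
three-layer network above, assuming logistic units, one-of-c target
vectors, and the squared-error criterion J(w) = (1/2)||t - z||^2;
the variable names, learning rate, and toy data are illustrative
assumptions, not the slides' own example.

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_sgd(X, T, n_hidden=5, eta=0.1, epochs=1000, seed=0):
    """Stochastic gradient descent on J(w) = 1/2 ||t - z||^2."""
    rng = np.random.default_rng(seed)
    d, c = X.shape[1], T.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_hidden, d)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(c, n_hidden)); b2 = np.zeros(c)
    for _ in range(epochs):
        for n in rng.permutation(len(X)):       # one pattern at a time
            x, t = X[n], T[n]
            y = logistic(W1 @ x + b1)           # hidden-unit outputs
            z = logistic(W2 @ y + b2)           # network outputs
            # Backpropagate: delta terms for output and hidden layers.
            delta_out = (z - t) * z * (1 - z)
            delta_hid = (W2.T @ delta_out) * y * (1 - y)
            # Update each weight by w <- w - eta * dJ/dw.
            W2 -= eta * np.outer(delta_out, y); b2 -= eta * delta_out
            W1 -= eta * np.outer(delta_hid, x); b1 -= eta * delta_hid
    return W1, b1, W2, b2

# Two-category toy problem with one-of-c (one-hot) target vectors:
# the target vector indicates the category.
X = np.array([[0., 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])
labels = np.array([0, 1, 1, 0])
T = np.eye(2)[labels]
W1, b1, W2, b2 = train_sgd(X, T)
z = logistic(W2 @ logistic(W1 @ X.T + b1[:, None]) + b2[:, None])
print("predicted categories:", z.argmax(axis=0))
```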
8
What more for session 6?
  • Read sections 6.1-6.6
  • Derivation of the backpropagation algorithm
  • Convergence of gradient algorithms
  • Interpretations of neural networks
  • Mapping of feature vectors to a space where they
    can be linearly separated
  • MSE approximation of the Bayes discriminant
    functions
  • This gives one idea of how to specify the target
    vectors
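
For reference, the interpretation above is also what motivates the
target-vector choice (Duda et al., section 6.6): with one-of-c targets
(t_k = 1 when x belongs to \omega_k, and 0 otherwise), minimizing the
MSE over enough training data drives each output toward the Bayes
discriminant function,

z_k(x) \approx g_k(x) = P(\omega_k \mid x),

which is why a one-hot target vector is a natural choice.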