Title: Overview of multilayer neural networks
1. Overview of multilayer neural networks
- Chapter 6 in Duda et al.
There is nothing particularly magical about multilayer neural networks; they implement linear discriminants, but in a space where the inputs have been mapped nonlinearly. (Duda, Hart, Stork)
2. Multilayer neural networks
- In general, a NN implements a non-linear mapping
- For classification
- Input is the d-dimensional feature vector x
- Output is the c discriminant functions g_1(x), ..., g_c(x)
- We strive to obtain g_k(x) ≈ P(ω_k | x), i.e. outputs that approximate the posterior probabilities
- Example (see the forward-pass sketch below): 3-d feature vectors, two-category case, neural network with 5 hidden units
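As a complement to the example, here is a minimal forward-pass sketch in NumPy (not from the slides; the weights are random and the helper name forward is illustrative), showing how a 3-5-2 network computes its two discriminant functions:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of a fully-connected three-layer network:
    x is the d-dimensional feature vector, the return value holds
    the c discriminant values g_1(x), ..., g_c(x)."""
    y = np.tanh(W1 @ x + b1)   # hidden-unit outputs, shape (n_hidden,)
    z = np.tanh(W2 @ y + b2)   # output-unit values, shape (c,)
    return z

rng = np.random.default_rng(0)
d, n_hidden, c = 3, 5, 2       # the example from the slide
W1 = rng.normal(size=(n_hidden, d)); b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(c, n_hidden)); b2 = rng.normal(size=c)

x = np.array([0.5, -1.2, 0.3])        # a 3-d feature vector
g = forward(x, W1, b1, W2, b2)
print(g, "-> category", g.argmax())   # decide on the largest discriminant
```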
3. Terminology of neural networks
Weights, synapses
Bias weights
Target vector
Non-linearity, activation function
A hidden unit
Net activation
Input layer
Output layer
Hidden layer
4. Structure of a neural network
- We will study fully-connected, three-layer networks with a fixed non-linearity
- We train the NN by optimizing the weights according to some criterion
- Generalizations
- Different non-linearities in each node
- Other network topologies: not fully connected, feedback paths
Sloppy notation! Weight indices are used to distinguish between layers (the equations are summarized below)
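Spelled out, the notation looks as follows (a sketch in the book's conventions; the symbols n_H for the number of hidden units and f for the fixed non-linearity are assumed here):

```latex
\begin{align*}
net_j &= \sum_{i=1}^{d} w_{ji}\, x_i + w_{j0}, & y_j &= f(net_j), \quad j = 1, \dots, n_H \\
net_k &= \sum_{j=1}^{n_H} w_{kj}\, y_j + w_{k0}, & z_k &= f(net_k), \quad k = 1, \dots, c
\end{align*}
```

Only the index pair tells the layers apart: w_{ji} connects input i to hidden unit j, while w_{kj} connects hidden unit j to output k. That is the "sloppy" part the slide warns about.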
5. Sigmoid non-linearities
- Sigmoids are non-decreasing, scalar functions that satisfy f(net) → -1 as net → -∞ and f(net) → +1 as net → +∞ (or 0 and 1, depending on convention)
- Examples: the logistic function, tanh, and the hard limiter or step function
- For training it is beneficial (if not crucial) that the sigmoid is differentiable; the hard limiter is not
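A small sketch of these choices in NumPy (the function names are illustrative):

```python
import numpy as np

def logistic(net):
    """Logistic sigmoid: non-decreasing, range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def logistic_deriv(net):
    """Its derivative, f'(net) = f(net) (1 - f(net)):
    exactly what gradient-based training needs."""
    f = logistic(net)
    return f * (1.0 - f)

def hard_limiter(net):
    """Step function: a sigmoid by the definition above, but its
    derivative is zero almost everywhere, so it is unusable for
    gradient descent."""
    return np.where(net >= 0, 1.0, -1.0)

net = np.linspace(-4.0, 4.0, 9)
print(logistic(net))
print(np.tanh(net))   # range (-1, 1), derivative 1 - tanh(net)**2
print(hard_limiter(net))
```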
6. Expressive power of neural networks
- Neural networks can implement any multidimensional mapping
- Kolmogorov (1957): a finite number of hidden units suffices, but with unknown and arbitrarily complex scalar non-linearities
- Hornik (Neural Networks, vol. 4, 1991) and many others: fixed scalar non-linearities (continuous, bounded, non-constant), but arbitrarily many hidden units
- This situation is closer to practice, where we typically use differentiable sigmoids and vary the number of hidden units until performance is satisfactory (see the sketch after this list)
- In practice, engineering skills are more important
- Application-specific knowledge that guides the choice of network topology
- Number of hidden layers
- Number of units in each hidden layer
- Feedback networks
- Pruning techniques
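To make "vary the number of hidden units" concrete, here is a small illustration (not from the slides): it fits sin(x) with a one-hidden-layer tanh network, using random hidden weights and a least-squares output layer instead of iterative training, so the effect of the number of hidden units is isolated:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-np.pi, np.pi, 200)
target = np.sin(x)

for n_hidden in (1, 3, 10, 30):
    # Random input-to-hidden weights; only the output layer is fitted.
    w = rng.normal(size=n_hidden)
    b = rng.normal(size=n_hidden)
    hidden = np.tanh(np.outer(x, w) + b)              # shape (200, n_hidden)
    a, *_ = np.linalg.lstsq(hidden, target, rcond=None)
    mse = np.mean((hidden @ a - target) ** 2)
    print(f"n_hidden = {n_hidden:3d}   MSE = {mse:.5f}")
```

The error typically shrinks as n_hidden grows, matching the Hornik-style result: a fixed non-linearity works if you allow enough hidden units.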
7. Backpropagation training of neural networks
- Supervised learning
- For each feature vector x there is an associated target vector t
- A gradient descent algorithm that modifies the weights iteratively so that the MSE is minimized
- Often a stochastic gradient descent algorithm is used
- For each input vector we consider the error J(w) = (1/2) ||t - z||^2, where z is the network output
- Calculate the stochastic gradient with respect to all weights and update by w ← w - η ∇_w J, with learning rate η (a sketch follows below)
- How to choose the target vectors?
- We do not know the posterior probabilities!
- Somehow the target vector should indicate the category; one common choice is t_k = +1 for the correct category and -1 for the others
- For the batch version we have J(w) = Σ_n J_n(w), summed over training data from all categories
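A minimal sketch of stochastic-gradient backpropagation for the three-layer network (assumptions beyond the slides: tanh units in both layers, the toy data, and the helper name train_sgd):

```python
import numpy as np

def train_sgd(X, T, n_hidden=5, eta=0.1, epochs=500, seed=0):
    """Stochastic-gradient backpropagation, minimizing the
    per-pattern MSE criterion J = 0.5 * ||t - z||^2."""
    rng = np.random.default_rng(seed)
    d, c = X.shape[1], T.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_hidden, d)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(c, n_hidden)); b2 = np.zeros(c)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):          # one pattern at a time
            x, t = X[i], T[i]
            y = np.tanh(W1 @ x + b1)               # hidden layer
            z = np.tanh(W2 @ y + b2)               # output layer
            # Backpropagate: delta = dJ/dnet for each layer.
            delta_out = (z - t) * (1.0 - z**2)
            delta_hid = (W2.T @ delta_out) * (1.0 - y**2)
            W2 -= eta * np.outer(delta_out, y); b2 -= eta * delta_out
            W1 -= eta * np.outer(delta_hid, x); b1 -= eta * delta_hid
    return W1, b1, W2, b2

# Toy two-category problem: 3-d features, +/-1 target vectors.
X = np.array([[0.0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])
T = np.array([[1.0, -1], [-1, 1], [-1, 1], [1, -1]])
W1, b1, W2, b2 = train_sgd(X, T)
Z = np.tanh(W2 @ np.tanh(W1 @ X.T + b1[:, None]) + b2[:, None])
print(np.round(Z.T, 2))   # each row should approach its target vector
```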
8. What more for session 6?
- Read sections 6.1-6.6
- Derivation of the backpropagation algorithm
- Convergence of gradient algorithms
- Interpretations of neural networks
- Mapping of feature vectors to a space where they can be linearly separated
- MSE approximation of the Bayes discriminant functions
- Gives one idea how to specify the target vector