Neural Network Architecture and Learning

Slides: 17
Provided by: ccGa

Transcript and Presenter's Notes

1
Neural Network Architecture and Learning
  • Guest Lecture by
  • Some slides by Jim Rehg

2
Recursive Error Propagation
Now we can compute the errors further from the
output recursively, and use the results to
compute gradients for intermediate weights.
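The slide's equations did not survive in this transcript. As a sketch, assuming the notation of Bishop's Chapter 5 (which later slides cite): a_j is the pre-activation of unit j, z_i = h(a_i) is the output of unit i, and k ranges over the units that unit j feeds into. The recursion and the resulting weight gradient can then be written as:

```latex
\delta_j = h'(a_j) \sum_k w_{kj}\,\delta_k ,
\qquad
\frac{\partial E_n}{\partial w_{ji}} = \delta_j\, z_i
```

For a sum-of-squares error with linear output units, the recursion starts from the output errors \delta_k = y_k - t_k.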
3
Summary: Calculating the Gradient Using Backprop
  • Do a forward pass with the current parameters to obtain
    the node activations
  • Compute errors for the output nodes
  • Recursion for backpropagating the error from
    output to input
  • Weight gradient is given by (backwards) error times
    (forwards) node prediction
  • Reduce error through gradient-based methods
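The steps of this summary can be sketched in a few lines of plain Python. This is a minimal illustration rather than the lecture's own code: the network shape (one tanh hidden layer, linear outputs, squared error), the function names, and the sample values are all assumptions made for the example. A finite-difference check on one weight is included to confirm the backpropagated gradient.

```python
import math
import random

def forward(x, W1, W2):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    a = [sum(w * xi for w, xi in zip(row, x)) for row in W1]   # hidden pre-activations
    z = [math.tanh(aj) for aj in a]                             # hidden outputs
    y = [sum(w * zj for w, zj in zip(row, z)) for row in W2]    # network outputs
    return z, y

def loss(x, t, W1, W2):
    """Sum-of-squares error for one pattern."""
    _, y = forward(x, W1, W2)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop(x, t, W1, W2):
    z, y = forward(x, W1, W2)                         # 1. forward pass
    d_out = [yk - tk for yk, tk in zip(y, t)]         # 2. errors at the output nodes
    d_hid = [(1 - z[j] ** 2) *                        # 3. recursion: tanh'(a) = 1 - z^2
             sum(W2[k][j] * d_out[k] for k in range(len(d_out)))
             for j in range(len(z))]
    g2 = [[dk * zj for zj in z] for dk in d_out]      # 4. gradient = error * node input
    g1 = [[dj * xi for xi in x] for dj in d_hid]
    return g1, g2

random.seed(0)
x, t = [0.5, -1.0, 1.0], [0.3]                        # last input acts as a bias term
W1 = [[random.uniform(-0.5, 0.5) for _ in x] for _ in range(4)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in t]
g1, g2 = backprop(x, t, W1, W2)

# Central finite difference on a single weight, for comparison with g1[2][1].
eps = 1e-6
W1[2][1] += eps
up = loss(x, t, W1, W2)
W1[2][1] -= 2 * eps
down = loss(x, t, W1, W2)
W1[2][1] += eps
numeric = (up - down) / (2 * eps)
```

The last step of the summary (reducing the error) would subtract a small multiple of g1 and g2 from W1 and W2.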
4
Properties of Neural Networks
  • A fixed number of basis functions that adapt to the data
  • Universal function approximator
  • Wide range of architectural possibilities
  • Trivially easy to handle very large datasets
    (training on data that does not fit in memory)
  • Patterns are presented to the network sequentially
    and the weights are updated
  • Backprop is efficient: O(W) in the number of weights W
  • Many ways to make it faster: Hessian update,
    conjugate gradient, fastprop, etc.
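Sequential presentation is what makes large datasets easy to handle: only one pattern needs to be in memory at a time. The sketch below illustrates this with a generator standing in for a dataset read from disk. For brevity it trains a single linear unit rather than a full network, and the target function, learning rate, and names are hypothetical choices for the example.

```python
import random

def pattern_stream(n, seed=0):
    """Yield (x, t) pairs one at a time, standing in for data read from disk."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]   # last entry is a bias input
        t = 2.0 * x[0] - 3.0 * x[1] + 0.5                     # hypothetical target function
        yield x, t

def train_online(stream, n_inputs, lr=0.1):
    """Update the weights after every pattern; the full dataset is never stored."""
    w = [0.0] * n_inputs        # zero start is fine for a single linear unit
    for x, t in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))
        err = y - t
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

w = train_online(pattern_stream(5000), 3)
```

After the stream is consumed, w should be close to the generating coefficients 2.0, -3.0 and 0.5.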

5
Adaptive Bases
6
Construction of Input-Output Mapping
7
Neural Net as Universal Function Approximator
Fig. 5.3 from Bishop. Inputs are all
one-dimensional in these examples. Neural nets
are powerful. Legend: training data, learned function.
8
Modular Training Via Jacobian
Fig. 5.8 from Bishop
  • Given a pre-trained model
  • How to update weight w in the blue module
    efficiently?
  • The green module has no effect
  • The red module participates in learning via its
    Jacobian
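The figure itself is not reproduced in this transcript. As a sketch, assuming z denotes the outputs of the blue module (which are the inputs to the red module) and y denotes the outputs of the red module, the chain rule gives the gradient for w with the red module's Jacobian as the middle factor:

```latex
\frac{\partial E}{\partial w}
  = \sum_{k,j} \frac{\partial E}{\partial y_k}\,
               \frac{\partial y_k}{\partial z_j}\,
               \frac{\partial z_j}{\partial w},
\qquad
J_{kj} \equiv \frac{\partial y_k}{\partial z_j}
```

The red module's own weights do not need to change for this; only its Jacobian J is required.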

9
Challenges in Neural Net Training
  • Objective function is nonlinear, nonconvex
  • Local minima are a significant problem
  • How to control capacity?

10
Capacity Control
  • Capacity of network is roughly the number of
    hidden units
  • Many schemes for determining the number of hidden
    units
  • Standard approach to capacity control is
    regularization via early stopping

11
Early Stopping for Regularization
Fig. 5.13 from Bishop
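A minimal sketch of early stopping follows. It is an illustration under stated assumptions, not the procedure behind the figure: the polynomial model, the noisy sine data, the patience rule, and all names are hypothetical. The idea shown is to monitor the error on a held-out validation set during training and keep the weights from the epoch where that error was lowest.

```python
import math
import random

def fit_with_early_stopping(train, val, degree=9, lr=0.05,
                            max_epochs=2000, patience=20):
    """Gradient descent on a polynomial model; stop when validation error stalls."""
    def predict(w, x):
        return sum(wi * x ** i for i, wi in enumerate(w))

    def mse(w, data):
        return sum((predict(w, x) - t) ** 2 for x, t in data) / len(data)

    w = [0.0] * (degree + 1)
    best_w, best_val, since_best, history = list(w), mse(w, val), 0, []
    for epoch in range(max_epochs):
        grad = [0.0] * len(w)
        for x, t in train:                      # batch gradient of the training MSE
            err = predict(w, x) - t
            for i in range(len(w)):
                grad[i] += 2 * err * x ** i / len(train)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        v = mse(w, val)                         # validation error after this epoch
        history.append(v)
        if v < best_val:
            best_w, best_val, since_best = list(w), v, 0
        else:
            since_best += 1
            if since_best >= patience:          # no improvement for `patience` epochs
                break
    return best_w, best_val, history

rng = random.Random(0)

def sample(n):
    """Noisy samples of sin(pi * x) on [-1, 1]; a made-up toy dataset."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, math.sin(math.pi * x) + rng.gauss(0, 0.3)))
    return data

best_w, best_val, history = fit_with_early_stopping(sample(10), sample(10))
```

The returned best_w are the weights at the validation minimum, not the weights from the final epoch, which is what gives the regularizing effect.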
12
Numerical Optimization
  • Training is a local, gradient-based method
  • Various techniques for avoiding local minima:
    momentum, stochastic gradient, etc.
  • Initialization procedure must be well-designed
  • Suppose weights are chosen to saturate function
    outputs?
  • Suppose weights are initialized to zero?
  • Solution: Initialize weights to small nonzero
    values (on the linear part of the function)
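The two failure cases in this list can be checked numerically. The sketch below is a hypothetical example with assumed names, sizes, and values: it computes the hidden-layer gradient of a small tanh network under three initializations. With all weights at zero the gradient vanishes, with very large weights the tanh units saturate and the gradient vanishes again, and with small random weights the gradient is usable. Drawing the small values at random, so that hidden units differ from one another, is an assumption added here.

```python
import math
import random

def hidden_gradients(x, t, W1, w2):
    """dE/dW1 for a tanh hidden layer and one linear output (squared error)."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(wj * zj for wj, zj in zip(w2, z))
    d_out = y - t
    d_hid = [(1 - zj ** 2) * wj * d_out for zj, wj in zip(z, w2)]
    return [[dj * xi for xi in x] for dj in d_hid]

def max_abs(grad):
    return max(abs(g) for row in grad for g in row)

x, t, n_hidden = [0.8, -0.6, 1.0], 1.0, 3
rng = random.Random(0)

def init(scale):
    """Weights drawn uniformly from [-scale, scale]."""
    W1 = [[rng.uniform(-scale, scale) for _ in x] for _ in range(n_hidden)]
    w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
    return W1, w2

# Zero initialization: hidden outputs and output weights are all zero.
zero = max_abs(hidden_gradients(x, t, *init(0.0)))

# Saturating initialization: pre-activations are huge, so tanh'(a) is about 0.
W1_sat = [[30.0, -30.0, 30.0] for _ in range(n_hidden)]
saturated = max_abs(hidden_gradients(x, t, W1_sat, [0.1] * n_hidden))

# Small random initialization: units operate on the near-linear part of tanh.
small = max_abs(hidden_gradients(x, t, *init(0.1)))
```

Comparing zero, saturated, and small shows why only the third initialization gives gradient descent something to work with.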

13
Invariance
  • How to handle invariance to nuisance parameters?
  • Rotation, position, scale of patterns such as
    handwritten digits
  • Solution 1: Augment the training data set
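As a minimal sketch of Solution 1, the example below expands a dataset with translated copies of each pattern while keeping the labels unchanged. Only position is handled here; rotation and scale would need their own transforms. The function names, the one-pixel shifts, and the 3x3 toy image are assumptions for illustration.

```python
def shift(image, dx, dy, fill=0):
    """Translate a 2-D list image by (dx, dy) pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:     # pixels shifted off the edge are dropped
                out[nr][nc] = image[r][c]
    return out

def augment(dataset, shifts=((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))):
    """Return the dataset expanded with shifted copies; labels are unchanged."""
    return [(shift(img, dx, dy), label)
            for img, label in dataset
            for dx, dy in shifts]

digit = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]
augmented = augment([(digit, 1)])
```

Each original pattern yields five training patterns here, so the network sees the same digit at several positions.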

14
Invariance
  • Solution 2: Tangent Propagation

15
Tangent Propagation
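The slide content is not included in this transcript. As a sketch in the notation of Bishop's Chapter 5 (an assumption, since the slide's own formulas are missing), tangent propagation adds a regularizer that penalizes changes in the network output along the direction in which a transformation s(x, \xi), such as a small rotation, moves each input:

```latex
\tilde{E} = E + \lambda\,\Omega,
\qquad
\Omega = \frac{1}{2} \sum_n \sum_k \Bigl( \sum_i J_{nki}\,\tau_{ni} \Bigr)^{2},
\qquad
J_{nki} = \frac{\partial y_{nk}}{\partial x_{ni}},
\quad
\tau_n = \left. \frac{\partial s(x_n, \xi)}{\partial \xi} \right|_{\xi = 0}
```

Here \tau_n is the tangent vector of the transformation at pattern x_n, and \lambda sets the balance between fitting the data and enforcing invariance.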
16
Convolutional Networks