1
Learning with Neural Networks
  • Artificial Intelligence
  • CMSC 25000
  • February 19, 2002

2
Agenda
  • Neural Networks
  • Biological analogy
  • Review single-layer perceptrons
  • Perceptron Pros & Cons
  • Neural Networks: Multilayer perceptrons
  • Neural net training: Backpropagation
  • Strengths & Limitations
  • Conclusions

3
Neurons: The Concept

[Figure: neuron diagram labeling dendrites, axon, nucleus, and cell body]

Neurons receive inputs from other neurons (via synapses). When the input exceeds a threshold, the neuron fires, sending output along its axon to other neurons. Brain: ~10^11 neurons, ~10^16 synapses.
4
Perceptron Structure
Single neuron-like element: binary inputs and output; fires when the weighted sum of inputs > threshold

[Figure: perceptron with inputs x0 = -1, x1, x2, x3, ..., xn, weights w0, w1, w2, w3, ..., wn, and output y]

Training rule (sketched in code below): until the perceptron gives the correct output for all samples:
  • If the perceptron is correct, do nothing
  • If it incorrectly says yes, subtract the input vector from the weight vector
  • Otherwise, add the input vector to the weight vector
The fixed input x0 = -1 with weight w0 compensates for the threshold
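A minimal sketch of this update rule in Python (the helper name train_perceptron and the toy AND dataset are illustrative, not from the slides):

    # Perceptron training: subtract the input when the unit wrongly fires,
    # add it when the unit wrongly stays silent; x0 = -1 absorbs the threshold.
    def train_perceptron(samples, n_inputs, max_epochs=100):
        w = [0.0] * (n_inputs + 1)               # w[0] is the threshold weight
        for _ in range(max_epochs):
            all_correct = True
            for x, target in samples:
                xv = [-1.0] + list(x)            # prepend the fixed input x0 = -1
                fired = 1 if sum(wi * xi for wi, xi in zip(w, xv)) > 0 else 0
                if fired == target:
                    continue                     # correct: do nothing
                all_correct = False
                sign = -1.0 if fired == 1 else 1.0   # wrong yes: subtract; wrong no: add
                w = [wi + sign * xi for wi, xi in zip(w, xv)]
            if all_correct:
                break
        return w

    # AND is linearly separable, so training is guaranteed to converge.
    print(train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)], 2))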
5
Perceptron Learning
  • Perceptrons learn linear decision boundaries
  • E.g., classes separable by a line in the (x1, x2) plane
  • Guaranteed to converge, if linearly separable
  • Many simple functions NOT learnable

[Figure: a linearly separable arrangement of + and 0 points in the (x1, x2) plane, but not XOR, whose + and 0 points cannot be split by any single line]
6
Neural Nets
  • Multi-layer perceptrons
  • Inputs: real-valued
  • Intermediate: hidden nodes
  • Output(s): one (or more) discrete-valued

[Figure: feedforward network with inputs X1-X4, two hidden layers, and outputs Y1 and Y2]
7
Neural Nets
  • Pro: More general than perceptrons
  • Not restricted to linear discriminants
  • Multiple outputs: one classification each
  • Con: No simple, guaranteed training procedure
  • Use greedy, hill-climbing procedure to train
  • Gradient descent: Backpropagation

8
Solving the XOR Problem
Network topology: 2 hidden nodes (o1, o2), 1 output (y); each unit has a fixed -1 threshold input

[Figure: x1 and x2 feed hidden nodes o1 and o2 via weights w11, w12, w21, w22 with threshold weights w01, w02; o1 and o2 feed output y via weights w13, w23 with threshold weight w03]

Desired behavior:

x1  x2 | o1  o2 | y
 0   0 |  0   0 | 0
 1   0 |  0   1 | 1
 0   1 |  0   1 | 1
 1   1 |  1   1 | 0

Weights: w11 = w12 = 1, w21 = w22 = 1, w01 = 3/2, w02 = 1/2, w03 = 1/2, w13 = -1, w23 = 1
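A quick Python check (a sketch; the step function follows the -1 threshold-input convention from slide 4) confirms that these weights make o1 an AND, o2 an OR, and y their XOR combination:

    # Verify the XOR network: y fires iff o2 (OR) fires and o1 (AND) does not.
    def step(z, threshold):
        return 1 if z > threshold else 0

    def xor_net(x1, x2):
        o1 = step(1 * x1 + 1 * x2, 3 / 2)      # w11 = w21 = 1, w01 = 3/2 -> AND
        o2 = step(1 * x1 + 1 * x2, 1 / 2)      # w12 = w22 = 1, w02 = 1/2 -> OR
        return step(-1 * o1 + 1 * o2, 1 / 2)   # w13 = -1, w23 = 1, w03 = 1/2

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, '->', xor_net(a, b))   # prints the XOR truth table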
9
Backpropagation
  • Greedy, hill-climbing procedure
  • Weights are the parameters to change
  • Original hill-climbing changes one parameter per step
  • Slow
  • If the function is smooth, change all parameters per step
  • Gradient descent
  • Backpropagation: computes the current output, then works backward to correct the error

10
Producing a Smooth Function
  • Key problem
  • Pure step threshold is discontinuous
  • Not differentiable
  • Solution
  • Sigmoid (squashed S function): the logistic function, shown below
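The logistic function and its derivative, for reference (standard forms):

    s(z) = 1 / (1 + e^(-z))
    ds/dz = s(z) (1 - s(z))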

11
Neural Net Training
  • Goal:
  • Determine how to change the weights to get the correct output
  • A large change in a weight should produce a large reduction in the error
  • Approach:
  • Compute the actual output o
  • Compare it to the desired output d
  • Determine the effect of each weight w on the error (d - o)
  • Adjust the weights

12
Neural Net Example
Notation: x^i = i-th sample input vector; w = weight vector; y^i = desired output for the i-th sample

Sum-of-squares error over training samples:

    E(w) = Σ_i ( y^i - o(x^i, w) )²

where o(x^i, w) is the full expression of the output in terms of the inputs and weights, chaining the sigmoid units layer by layer.
(From 6.034 notes, Lozano-Pérez)
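For a network with one hidden layer, for example, that chained expression has the form (notation illustrative, not verbatim from the slides):

    o(x, w) = s( Σ_j w_j · s( Σ_k w_jk · x_k ) )

where s is the sigmoid, w_jk are input-to-hidden weights, and w_j are hidden-to-output weights.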
13
Gradient Descent
  • Error: sum-of-squares error of the inputs with the current weights
  • Compute the rate of change of the error w.r.t. each weight
  • Which weights have the greatest effect on the error?
  • Effectively, the partial derivatives of the error w.r.t. the weights
  • These in turn depend on other weights => chain rule

14
Gradient Descent
  • E = G(w)
  • Error as a function of the weights
  • Find the rate of change of the error, dG/dw
  • Follow the steepest rate of change
  • Change the weights so that the error is minimized

[Figure: error curve G(w) over weight w, descending from w0 toward w1; gradient descent can get stuck in local minima]
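A minimal sketch of the descent loop on a one-dimensional error function (the quadratic G and the rate r = 0.1 are illustrative):

    # Gradient descent on G(w) = (w - 3)^2: repeatedly step against the slope dG/dw.
    def descend(dG_dw, w, r=0.1, steps=50):
        for _ in range(steps):
            w = w - r * dG_dw(w)                  # move opposite the gradient
        return w

    print(descend(lambda w: 2 * (w - 3), w=0.0))  # converges near the minimum w = 3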
15
Gradient of Error
[Equation: partial derivatives of the sum-of-squares error w.r.t. each weight, expanded by the chain rule]

Note: derivative of the sigmoid:

    ds(z1)/dz1 = s(z1) (1 - s(z1))

(From 6.034 notes, Lozano-Pérez)
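For a single output-layer weight w_j, for instance, the chain rule expands as follows (a standard derivation, not verbatim from the slides):

    ∂E/∂w_j = -2 (y - o) · s(z)(1 - s(z)) · o_j,   where z = Σ_j w_j o_j and o = s(z)

Each factor is local: the error term, the sigmoid derivative at the node, and the input o_j carried by the weight.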
16
From Effect to Update
  • Gradient computation:
  • How each weight contributes to performance
  • To train:
  • Need to determine how to CHANGE each weight based on its contribution to performance
  • Need to determine how MUCH change to make per iteration
  • Rate parameter r
  • Large enough to learn quickly
  • Small enough to reach, but not overshoot, the target values

17
Backpropagation Procedure
[Figure: nodes in successive layers i -> j -> k, from inputs toward the output]

  • Pick rate parameter r
  • Until performance is good enough:
  • Do forward computation to calculate the output
  • Compute Beta in the output node (output rule below)
  • Compute Beta in all other nodes (backprop rule below)
  • Compute the change for all weights (descent rule below)
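The per-node equations were images in the original deck and did not survive the transcript; the standard rules in Winston's notation (referenced on the final slide) are:

    Beta_z = d_z - o_z                                  (output node: desired minus actual)
    Beta_j = Σ_k w_(j->k) · o_k (1 - o_k) · Beta_k      (other nodes: sum over successors k)
    Δw_(i->j) = r · o_i · o_j (1 - o_j) · Beta_j        (weight change, by the descent rule)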

18
Backprop Example
Forward prop: compute z_i and y_i given x_k and w_l
(From 6.034 notes, Lozano-Pérez)
19
Backpropagation Observations
  • Procedure is (relatively) efficient
  • All computations are local
  • Use inputs and outputs of current node
  • What is good enough?
  • Rarely reach target (0 or 1) outputs
  • Typically, train until within 0.1 of target

20
Neural Net Summary
  • Training
  • Backpropagation procedure
  • Gradient descent strategy (usual problems)
  • Prediction
  • Compute outputs based on input vector & weights
  • Pros: very general; fast prediction
  • Cons: training can be VERY slow (1000s of epochs); overfitting

21
Training Strategies
  • Online training
  • Update weights after each sample
  • Offline (batch) training
  • Compute the error over all samples, then update the weights
  • Online training is noisy
  • Sensitive to individual instances
  • However, it may escape local minima; see the sketch below
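A schematic contrast of the two schedules (grad here is a stand-in that returns dE/dw over the given data; the scalar weight keeps the sketch minimal):

    # Online: one gradient step per sample; offline/batch: one step per full pass.
    def online_epoch(w, samples, grad, r):
        for sample in samples:
            w = w - r * grad(w, [sample])   # noisy per-sample step
        return w

    def batch_epoch(w, samples, grad, r):
        return w - r * grad(w, samples)     # single step on the total error

    # Example: squared error on y = 2x, so grad(w, data) = sum of -2x(y - wx).
    data = [(x, 2.0 * x) for x in (0.1, 0.4, 0.7, 1.0)]
    g = lambda w, d: sum(-2 * x * (y - w * x) for x, y in d)
    w = 0.0
    for _ in range(100):
        w = online_epoch(w, data, g, r=0.1)
    print(w)   # approaches 2.0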

22
Training Strategy
  • To avoid overfitting:
  • Split the data into training, validation, and test sets
  • Also, avoid excess weights (fewer weights than samples)
  • Initialize with small random weights
  • Small changes then have a noticeable effect
  • Use offline training
  • Train until the validation-set error reaches its minimum
  • Then evaluate on the test set
  • No more weight changes after that (see the sketch below)
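An early-stopping loop in this style, on a toy one-weight problem (make_split, the rate r = 0.02, and the target function y = 2x are all illustrative):

    import random

    random.seed(0)

    def make_split(n):   # noisy samples of y = 2x
        return [(x, 2.0 * x + random.gauss(0, 0.1)) for x in [random.random() for _ in range(n)]]

    train, valid, test = make_split(20), make_split(10), make_split(10)

    def error(w, data):                       # sum-of-squares error
        return sum((y - w * x) ** 2 for x, y in data)

    def train_epoch(w, data, r=0.02):         # one offline (batch) gradient step
        return w - r * sum(-2 * x * (y - w * x) for x, y in data)

    w, best_w, best_val = 0.0, 0.0, float('inf')
    for _ in range(1000):
        w = train_epoch(w, train)
        val = error(w, valid)
        if val >= best_val:
            break                             # validation minimum passed: stop
        best_w, best_val = w, val
    print('test error:', error(best_w, test))  # evaluate once; no more weight changes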

23
Classification
  • Neural networks are best for classification tasks
  • Single output -> binary classifier
  • Multiple outputs -> multiway classification
  • Applied successfully to learning pronunciation
  • The sigmoid pushes outputs toward binary classification
  • Not good for regression

24
Neural Net Conclusions
  • Simulation based on neurons in the brain
  • Perceptrons (single neuron)
  • Guaranteed to find a linear discriminant, IF one exists -> problem: XOR
  • Neural nets (multi-layer perceptrons)
  • Very general
  • Backpropagation training procedure
  • Gradient descent -> local minima and overfitting issues

25
Backpropagation
An efficient method of implementing gradient
descent for neural networks
Descent rule and backprop rule: the slide's equations are given in Winston's notation (see the sketch below); y_i is x_i for the input layer
  1. Initialize weights to small random values
  2. Choose a random sample input feature vector
  3. Compute the total input (z_i) and output (y_i) for each unit (forward prop)
  4. Compute Beta for the output layer
  5. Compute Beta for the preceding layer by the backprop rule (repeat for all layers)
  6. Compute the weight change by the descent rule (repeat for all weights)

Notation as in Winston's book
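A compact, runnable sketch of the whole procedure: Winston-style Beta propagation with sigmoid units on a 2-2-1 network, trained on XOR (the topology, rate r = 0.5, and iteration count are illustrative; backprop can land in a local minimum, so other seeds may need more iterations):

    import math
    import random

    random.seed(1)
    s = lambda z: 1.0 / (1.0 + math.exp(-z))     # sigmoid; ds/dz = s(z)(1 - s(z))

    # 2-2-1 network; each unit gets a fixed -1 threshold input, as on slide 4.
    w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    r = 0.5

    for _ in range(20000):
        x, d = random.choice(data)               # step 2: random sample
        xs = [-1.0, x[0], x[1]]
        h = [s(sum(w * v for w, v in zip(ws, xs))) for ws in w_hid]      # step 3
        hs = [-1.0] + h
        o = s(sum(w * v for w, v in zip(w_out, hs)))
        beta_o = d - o                           # step 4: Beta at the output
        beta_h = [w_out[j + 1] * o * (1 - o) * beta_o for j in range(2)] # step 5
        for j in range(3):                       # step 6: descent rule
            w_out[j] += r * hs[j] * o * (1 - o) * beta_o
        for j in range(2):
            for k in range(3):
                w_hid[j][k] += r * xs[k] * h[j] * (1 - h[j]) * beta_h[j]

    for x, d in data:                            # trained outputs approach XOR targets
        xs = [-1.0, x[0], x[1]]
        hs = [-1.0] + [s(sum(w * v for w, v in zip(ws, xs))) for ws in w_hid]
        print(x, '->', round(s(sum(w * v for w, v in zip(w_out, hs))), 2), 'target', d)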