Artificial Neural Networks
1
Artificial Neural Networks
  • Learning real-valued, discrete-valued, and
    vector-valued functions from examples.
  • Robust to errors in training data.
  • Applications: interpreting visual scenes, speech
    recognition, robot control strategies

2
Biological Motivation
  • Human brain: 10^11 neurons
  • Each connected to 10^4 others
  • Switching time: 10^-3 seconds
  • (Computer switching speed: 10^-10 seconds)
  • It requires 10^-1 seconds to recognize a human
    face
  • ⇒ highly parallel and distributed processes

3
Biological Motivation
  • The ANN model is not the same as that of biological
    neural systems
  • Using ANNs to study and model biological learning
    processes
  • Obtaining highly effective machine learning
    algorithms

4
ANN Representation
5
Appropriate Problems for ANNs
  • Instances are represented by many attribute-value
    pairs
  • Target function output may be discrete-valued,
    real-valued, or vector-valued
  • Training examples can contain errors
  • Long training time is acceptable
  • Fast evaluation of the learned target function
    may be required
  • Understanding the learned target concept is not
    important

6
Perceptrons
(Figure: a perceptron unit with inputs x1 … xn, weights w1 … wn, and a
constant bias input x0 = 1 with weight w0)
net = Σi wi·xi
o = 1 if Σi wi·xi > 0, −1 otherwise
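
A minimal Python sketch of this unit (the function name is my own, not
from the slides):

  # Perceptron: threshold the weighted sum of the inputs.
  # x[0] is the constant bias input, fixed at 1; w[0] is its weight w0.
  def perceptron_output(w, x):
      net = sum(wi * xi for wi, xi in zip(w, x))
      return 1 if net > 0 else -1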
7
Perceptrons
(Figure: a perceptron and its +/− decision surface in the (x1, x2) plane;
a perceptron can represent boolean functions such as A ∧ ¬B)
8
Perceptron Training Rule
  • wi ← wi + Δwi
  • Δwi = η(t − o)xi
  • t: target output of the current training example
  • o: the thresholded output generated by the
    perceptron
  • η: learning rate (a small positive constant)

9
Perceptron Training Rule
wi ← wi + Δwi, where Δwi = η(t − o)xi
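
A hedged Python sketch of one pass of this rule over the training set
(the names and the 0.1 learning rate are illustrative assumptions):

  # One epoch of the perceptron training rule.
  # Each example is (x, t) with x[0] == 1 serving as the bias input.
  def perceptron_train_epoch(w, examples, eta=0.1):
      for x, t in examples:
          o = perceptron_output(w, x)        # thresholded output o
          for i in range(len(w)):
              w[i] += eta * (t - o) * x[i]   # wi <- wi + eta*(t - o)*xi
      return w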
10
Perceptron Training Rule
(Figure: two sets of + and − training examples in the (x1, x2) plane;
left: linearly separable, right: not linearly separable)
11
Perceptron Training Rule
  • The learning procedure converges to a weight
    vector that correctly classifies all training
    examples, provided they are linearly separable

12
Perceptron Training Rule
  • Minsky, M. & Papert, S. (1969). Perceptrons.
    MIT Press.

13
Gradient Descent Rule
(Figure: a linear unit with inputs x1 … xn, weights w1 … wn, and a
constant bias input x0 = 1 with weight w0)
o = Σi wi·xi (unthresholded linear unit)
14
Gradient Descent Rule
  • Training error:
  • E(w) = (1/2) Σd∈D (td − od)²
  • td: target output of training example d
  • od: the unthresholded output for d (= w · xd)

15
Gradient Descent Rule
16
Gradient Descent Rule
  • Gradient of E (steepest increase direction):
  • ∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]
  • w ← w + Δw
  • i.e. w ← w − η∇E(w)

17
Gradient Descent Rule
  • wi ← wi + Δwi
  • Δwi = −η ∂E/∂wi
  • ∂E/∂wi = −Σd∈D (td − od)xid
  • ⇒ Δwi = η Σd∈D (td − od)xid
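
A sketch of one batch gradient-descent step for the linear unit, directly
following Δwi = η Σd∈D (td − od)xid (all names are illustrative):

  # One batch update: accumulate the gradient over all of D, then move once.
  def gradient_descent_step(w, examples, eta=0.05):
      delta = [0.0] * len(w)
      for x, t in examples:                           # each d in D
          o = sum(wi * xi for wi, xi in zip(w, x))    # unthresholded od = w . x
          for i in range(len(w)):
              delta[i] += eta * (t - o) * x[i]        # accumulate eta*(td - od)*xid
      for i in range(len(w)):
          w[i] += delta[i]
      return w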

18
Gradient Descent Rule
  • Converging to a local minimum can be quite slow
  • No guarantee to converge to the global minimum

19
Stochastic Approximation
  • Delta rule:
  • Δwi = η(td − od)xid
  • Ed(w) = (td − od)²/2
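
The stochastic version updates the weights immediately after each example;
a minimal sketch under the same assumptions as the batch version above:

  # Delta rule (stochastic / incremental gradient descent).
  def delta_rule_epoch(w, examples, eta=0.05):
      for x, t in examples:
          o = sum(wi * xi for wi, xi in zip(w, x))   # od = w . xd
          for i in range(len(w)):
              w[i] += eta * (t - o) * x[i]           # Δwi = η(td − od)xid
      return w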

20
Stochastic Approximation
  • Weights are updated upon examining each training
    example
  • Less computation per weight update step is
    required
  • Falling into local minima can sometimes be avoided

21
Stochastic Approximation
  • The delta rule converges towards a best-fit
    approximation to the target concept, regardless
    of whether the training data are linearly
    separable

22
Multilayer Networks
  • Single perceptrons can express only linear
    decision surfaces
  • A multilayer network can represent highly
    nonlinear decision surfaces

23
Multilayer Networks
(Figure: highly nonlinear decision regions learned by a multilayer network
for distinguishing the spoken words "head", "hid", "who'd", and "hood"
from two input features F1 and F2)
24
Multilayer Networks
25
Multilayer Networks
  • What type of unit?
  • Perceptrons: not differentiable (discontinuous
    threshold)
  • Linear units: can represent only linear functions
  • ...

26
Multilayer Networks
(Figure: a sigmoid unit with inputs x1 … xn, weights w1 … wn, and a
constant bias input x0 = 1 with weight w0)
net = Σi wi·xi
o = σ(net) = 1 / (1 + e^(−net))
27
Multilayer Networks
  • Sigmoid unit:
  • ∂σ(y)/∂y = σ(y)·(1 − σ(y))
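
In Python, a direct transcription of these two formulas:

  import math

  def sigmoid(y):
      # σ(y) = 1 / (1 + e^(−y))
      return 1.0 / (1.0 + math.exp(-y))

  def sigmoid_prime(y):
      # dσ/dy expressed through the output itself: σ(y)·(1 − σ(y))
      s = sigmoid(y)
      return s * (1.0 - s)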

28
Backpropagation Algorithm
  • Training error:
  • E(w) = (1/2) Σd∈D Σk∈outputs (tkd − okd)²

29
Backpropagation Algorithm
(Figure: a two-layer network: input units i feed hidden units h through
weights whi; hidden units h feed output units k through weights wkh)
δk = ok(1 − ok)(tk − ok)
δh = oh(1 − oh) Σk wkh·δk
wji ← wji + η·δj·xji
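
A compact Python sketch of these updates for one training example in a
two-layer sigmoid network (all names are assumptions; w_hidden[h][i] is
the weight from input i to hidden unit h, w_out[k][h] from hidden unit h
to output unit k, and sigmoid is the function sketched earlier):

  # Backpropagation for a single example (x, t).
  def backprop_step(w_hidden, w_out, x, t, eta=0.1):
      # Forward pass through hidden and output layers.
      o_h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
      o_k = [sigmoid(sum(w * oh for w, oh in zip(ws, o_h))) for ws in w_out]
      # Output deltas: δk = ok(1 − ok)(tk − ok).
      d_k = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o_k, t)]
      # Hidden deltas: δh = oh(1 − oh) Σk wkh·δk.
      d_h = [oh * (1 - oh) * sum(w_out[k][h] * d_k[k] for k in range(len(d_k)))
             for h, oh in enumerate(o_h)]
      # Weight updates: wji <- wji + η·δj·xji.
      for k in range(len(w_out)):
          for h in range(len(o_h)):
              w_out[k][h] += eta * d_k[k] * o_h[h]
      for h in range(len(w_hidden)):
          for i in range(len(x)):
              w_hidden[h][i] += eta * d_h[h] * x[i]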
30
Backpropagation Algorithm
(Figure: unit j receives input xji from unit i through weight wji)
wji ← wji + η·δj·xji
31
Backpropagation Algorithm
  • Adding momentum:
  • Δwji(n) = η·δj·xji + α·Δwji(n − 1)
  • n: iteration number; α: momentum constant (0 ≤ α < 1)
  • Keeping the search moving in the same direction
    ⇒ passing small local minima
  • Increasing the effective step size ⇒ speeding up
    convergence
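
One way to sketch the momentum update (the per-weight cache of the
previous update is an assumption of this sketch):

  # Momentum: Δw(n) = η·δ·x + α·Δw(n − 1).
  # grad_term[i] holds δj·xji for weight i; prev_delta holds Δw(n − 1).
  def update_with_momentum(w, grad_term, prev_delta, eta=0.1, alpha=0.9):
      for i in range(len(w)):
          delta = eta * grad_term[i] + alpha * prev_delta[i]
          w[i] += delta
          prev_delta[i] = delta   # becomes Δw(n − 1) at the next iteration
      return w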

32
Backpropagation Algorithm
  • Learning in arbitrary acyclic networks

(Figure: unit r in layer m feeds units s in layer m+1 through weights wsr)
δr = or(1 − or) Σs∈layer m+1 wsr·δs
33
Backpropagation Algorithm
  • Convergence and local minima:
  • Not guaranteed to converge to the global minimum
    error, but highly effective in practice
  • The network is approximately linear while the weights
    are close to 0, so early training can pass over local
    minima that only appear as the weights grow

34
Backpropagation Algorithm
  • Heuristics to alleviate the local-minima problem:
  • Add a momentum term to the weight-update rule
  • Use stochastic gradient descent rather than true
    gradient descent
  • Train multiple networks on the same data, each
    initialized with different random weights

35
Backpropagation Algorithm
  • Representation power of feedforward networks:
  • Boolean functions: any boolean function, using a
    two-layer (1 hidden + 1 output) network
  • Continuous functions: any bounded continuous
    function, to arbitrary accuracy, using a two-layer
    network
  • Arbitrary functions: any function, to arbitrary
    accuracy, using a three-layer network

36
Backpropagation Algorithm
  • Hypothesis space and inductive bias:
  • Hypothesis space: every possible assignment of
    network weights
  • Inductive bias: smooth interpolation between data
    points

37
Backpropagation Algorithm
  • Hidden layer representations

(Figure: a network trained to learn the identity function, with the
hidden layer discovering a compact encoding of the inputs)
38
Backpropagation Algorithm
39
Backpropagation Algorithm
  • Stopping criterion and overfitting:
  • A fixed number of iterations
  • A threshold on the training error

40
Backpropagation Algorithm
41
Applications
  • To recognize face pose
  • 30 × 32 resolution input images
  • 4 directions: left, straight, right, up
  • ⇒ a 960 × 3 × 4 network

42
Exercises
  • In Mitchell's Machine Learning (Chapter 4):
    Exercises 4.1 to 4.10