1
Artificial Neural Networks
2
Artificial Neural Networks
  • Interconnected networks of simple units
    ("artificial neurons").
  • Weight wij is the weight of the ith input into
    unit j.
  • The class is determined from the final output
    value.
  • If there is more than one output unit, we choose
    the one with the greatest value.
  • Learning takes place by adjusting the weights in
    the network so that the desired output is produced
    whenever a training instance is presented.

3
Single Perceptron Unit
  • We start by looking at a simpler kind of
    "neural-like" unit called a perceptron.

The unit computes a linear function h(x) = w·x + w0
of its inputs; depending on the sign of h(x) it
outputs one class or the other (a minimal sketch
follows below).
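A minimal sketch of such a unit in Python, assuming the usual form h(x) = w·x + w0 with a hard threshold; the example weights are illustrative, not taken from the slides:

```python
def perceptron(x, w, w0):
    # h(x) = w . x + w0; output 1 if h(x) >= 0, else 0 (hard threshold).
    h = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return 1 if h >= 0 else 0

# Example: a unit that fires when x1 + x2 exceeds 1 (illustrative weights).
print(perceptron([0.4, 0.9], w=[1.0, 1.0], w0=-1.0))   # -> 1
```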
4
Beyond Linear Separability
  • Values of the XOR boolean function cannot be
    separated by a single perceptron unit.

5
Multi-Layer Perceptron
  • Solution: combine multiple linear separators.
  • The introduction of "hidden" units into neural
    networks makes them much more powerful:
  • they are no longer limited to linearly separable
    problems.
  • Earlier layers transform the problem into more
    tractable problems for the later layers.

6
Example XOR problem
Output: class 0 or class 1.
7
Example XOR problem
8
Example XOR problem
The output unit fires when w13·o1 + w23·o2 + w03 ≥ 0.
With w03 = -1/2, w13 = -1, w23 = 1 this becomes
o2 - o1 - 1/2 ≥ 0 (the sketch below implements this
network).
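A small sketch of this network in Python. The slides give only the output unit's weights (w03 = -1/2, w13 = -1, w23 = 1); the hidden-unit weights below (o1 acting like AND, o2 like OR) are one standard choice and are an assumption, not taken from the slides:

```python
def step(h):
    return 1 if h >= 0 else 0

def xor_net(x1, x2):
    # Hidden unit 1 ~ AND(x1, x2); hidden unit 2 ~ OR(x1, x2).
    # These hidden weights are an illustrative choice, not from the slides.
    o1 = step(1.0 * x1 + 1.0 * x2 - 1.5)
    o2 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    # Output unit uses the slide's weights (w13 = -1, w23 = 1, w03 = -1/2),
    # i.e. it fires when o2 - o1 - 1/2 >= 0.
    return step(-1.0 * o1 + 1.0 * o2 - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))   # prints 0, 1, 1, 0
```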
9
Multi-Layer Perceptron
  • Any set of training points can be separated by a
    three-layer perceptron network.
  • Almost any set of points is separable by a
    two-layer perceptron network.

10
Backpropagation technique
  • High-level summary:
  1. Present a training sample to the neural network.
  2. Calculate the error in each output neuron. This
     is the local error.
  3. Adjust the weights of each neuron to lower the
     local error.
  4. Assign "blame" for the local error to neurons at
     the previous level, giving greater responsibility
     to neurons connected by stronger weights.
  5. Repeat from step 3 on the neurons at the previous
     level, using each one's "blame" as its error.

11
Autonomous Land Vehicle In a Neural Network
(ALVINN)
  • ALVINN is an automatic steering system for a car
    based on input from a camera mounted on the
    vehicle.
  • Successfully demonstrated in a cross-country trip.

12
ALVINN
  • The ALVINN neural network has:
  • 960 inputs (a 30x32 array derived from the pixels
    of an image),
  • 4 hidden units and
  • 30 output units (each representing a steering
    command).
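A shape-level sketch of this topology (960 → 4 → 30) in Python with NumPy; the random weights are placeholders rather than ALVINN's trained weights, and biases are omitted:

```python
import numpy as np

# Placeholder weights for a 960 -> 4 -> 30 network (not ALVINN's weights).
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.05, size=(4, 960))   # 4 hidden units
W_output = rng.normal(scale=0.05, size=(30, 4))    # 30 steering outputs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

image = rng.random(960)                 # a 30x32 image flattened to 960 inputs
hidden = sigmoid(W_hidden @ image)      # 4 hidden activations
outputs = sigmoid(W_output @ hidden)    # 30 output activations
steering = int(np.argmax(outputs))      # pick the strongest steering command
```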

13
SVMs vs. ANNs
  • Comparable in practice.
  • A comment:
  • "SVMs have been developed in the reverse order
    to the development of neural networks (NNs). SVMs
    evolved from the sound theory to the
    implementation and experiments, while the NNs
    followed a more heuristic path, from applications
    and extensive experimentation to the theory."
    (Wang 2005)

14
Soft Threshold
  • A natural question to ask is whether we could use
    gradient ascent/descent to train a multi-layer
    perceptron.
  • The answer is that we can't as long as the output
    is discontinuous with respect to changes in the
    inputs and the weights.
  • In a perceptron unit it doesn't matter how far a
    point is from the decision boundary; we still get
    a 0 or a 1.
  • We need a smooth output (as a function of changes
    in the network weights) if we're to do gradient
    descent.

15
Sigmoid Unit
  • Commonly used in neural nets is a "sigmoid"
    (S-like) function.
  • The one used here is the logistic function
    y = 1 / (1 + e^(-z)).
  • The value z (the weighted sum of the unit's
    inputs) is also called the "activation" of the
    neuron (see the sketch below).
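A minimal sketch of a sigmoid unit, assuming z is the weighted sum of the inputs plus a bias; the example weights are illustrative:

```python
import math

def logistic(z):
    # The logistic ("sigmoid") function: smooth, S-shaped, differentiable.
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_unit(x, w, w0):
    z = sum(wi * xi for wi, xi in zip(w, x)) + w0   # activation z
    return logistic(z)                              # output y in (0, 1)

print(sigmoid_unit([0.4, 0.2], w=[0.6, 0.8], w0=0.5))   # ~0.71
```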

16
Training
  • Key property of the sigmoid is that it is
    differentiable.
  • This means that we can use gradient based methods
    of minimization for training.
  • The output of a multi-layer net of sigmoid units
    is a function of two vectors, the inputs (x) and
    the weights (w).
  • As we train the ANN, the training instances are
    considered fixed.
  • The output of this function (y) therefore varies
    smoothly with changes in the weights.

17
Training
18
Training
The training error is E(w) = ½ Σm (y(xm, w) - ym)²;
the ½ is only there to simplify the derivations.
19
Gradient Descent
We follow gradient descent: the gradient of the
training error is computed as a function of the
weights, and each weight is moved a small step
against its gradient, wi ← wi - η ∂E/∂wi.
Online version: we consider at each step only the
error for one data item.
As a shorthand, we will denote y(xm, w) just by y.
20
Gradient Descent Single Unit
Substituting in the equation of the previous slide
we get (for the arbitrary ith element of w):
∂E/∂wi = (y - ym) y (1 - y) xi,
so the update is wi ← wi - η (y - ym) y (1 - y) xi.
This is the delta rule (a training sketch follows
below).
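A sketch of online training of a single sigmoid unit with this delta rule; the learning rate, epoch count, and the OR-like data set are illustrative choices, not from the slides:

```python
import math

def train_single_unit(data, eta=0.5, epochs=1000):
    """Online gradient descent (delta rule) for one sigmoid unit.
    data: list of (x, y_m) pairs; x is the input list, y_m the target."""
    n = len(data[0][0])
    w = [0.0] * n          # input weights
    w0 = 0.0               # bias weight
    for _ in range(epochs):
        for x, y_m in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + w0
            y = 1.0 / (1.0 + math.exp(-z))
            delta = y * (1.0 - y) * (y - y_m)        # delta rule
            w = [wi - eta * delta * xi for wi, xi in zip(w, x)]
            w0 -= eta * delta * 1.0                  # bias input is 1
    return w, w0

# Illustrative use: learn a linearly separable OR-like target.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_single_unit(data))
```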
21
Derivative of the sigmoid
For the logistic function y = 1/(1 + e^(-z)),
dy/dz = e^(-z) / (1 + e^(-z))² = y (1 - y).
22
Generalized Delta Rule
  • For an output unit p we similarly have:
    δp = yp (1 - yp)(yp - ym).

(p = 3 in this example.)
23
Backpropagation Example
First do forward propagation: compute the zi's and
yi's.
We'll see soon why δ2 and δ3 have these formulas.
24
Deriving δ2 and δ3
We similarly derive δ1.
25
Backpropagation Algorithm
  • Initialize weights to small random values
  • Choose a random sample training item, say (xm,
    ym)
  • Compute total input zj and output yj for each
    unit (forward prop)
  • Compute δp for the output layer:
    δp = yp (1 - yp)(yp - ym)
  • Compute δj for all preceding layers by the
    backprop rule
  • Compute the weight change by the descent rule,
    Δwij = -η δj yi (repeat for all weights); a code
    sketch of the full step follows this list.
  • Note that each expression involves only data local
    to a particular unit; we don't have to look around
    summing things over the whole network.
  • It is for this reason (simplicity, locality and,
    therefore, efficiency) that backpropagation has
    become the dominant paradigm for training neural
    nets.
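A compact sketch of one online backpropagation step for a network with one hidden layer of sigmoid units, following the steps above; the function name, the use of NumPy, and the default learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y_m, W1, b1, W2, b2, eta=0.1):
    """One online backprop step. W1: (hidden x inputs), W2: (outputs x hidden)."""
    # Forward propagation: total input z and output y for each unit.
    z1 = W1 @ x + b1
    y1 = sigmoid(z1)                      # hidden-layer outputs
    z2 = W2 @ y1 + b2
    y2 = sigmoid(z2)                      # output-layer outputs

    # Output layer: delta_p = y_p (1 - y_p)(y_p - y_m).
    d2 = y2 * (1 - y2) * (y2 - y_m)
    # Hidden layer, by the backprop rule:
    # delta_j = y_j (1 - y_j) * sum_k w_jk delta_k.
    d1 = y1 * (1 - y1) * (W2.T @ d2)

    # Descent rule: w_ij <- w_ij - eta * delta_j * (output of unit i).
    W2 -= eta * np.outer(d2, y1)
    b2 -= eta * d2
    W1 -= eta * np.outer(d1, x)
    b1 -= eta * d1
    return W1, b1, W2, b2
```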

26
Generalized Delta Rule
In general, for a hidden unit j we have:
δj = yj (1 - yj) Σk wjk δk,
where the sum runs over the units k that unit j
feeds into.
27
Input And Output Encoding
  • For neural networks, all attribute values must be
    encoded in a standardized manner, taking values
    between 0 and 1, even for categorical variables.
  • For continuous variables, we simply apply the
    min-max normalization
  • X* = (X - min(X)) / (max(X) - min(X))
  • For categorical variables use indicator (flag)
    variables.
  • E.g. marital status attribute, containing values
    single, married, divorced.
  • Records for single would have
  • 1 for single, and 0 for the rest, i.e. (1,0,0)
  • Records for married would have
  • 1 for married, and 0 for the rest, i.e. (0,1,0)
  • Records for divorced would have
  • 1 for divorced, and 0 for the rest, i.e. (0,0,1)
  • Records for unknown would have
  • 0 for all, i.e. (0,0,0)
  • In general, categorical attributes with k values
    can be translated into k - 1 indicator attributes.
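A small sketch of both encodings; the attribute range and the category list are illustrative:

```python
def min_max(x, lo, hi):
    # Min-max normalization: maps x into [0, 1].
    return (x - lo) / (hi - lo)

def indicator(value, categories):
    # k categories encoded with k - 1 flags; the left-out value
    # ("unknown" here) maps to all zeros.
    return tuple(1 if value == c else 0 for c in categories)

print(min_max(30_000, lo=0, hi=100_000))                        # 0.3
print(indicator("married", ("single", "married", "divorced")))  # (0, 1, 0)
print(indicator("unknown", ("single", "married", "divorced")))  # (0, 0, 0)
```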

28
Output
  • Neural network output nodes always return a
    continuous value between 0 and 1 as output.
  • Many classification problems have a dichotomous
    result, with only two possible outcomes.
  • E.g., meningitis: yes or no.
  • For such problems, one option is to use a single
    output node, with a threshold value set a priori
    that separates the classes.
  • For example, with the threshold "Yes if output
    ≥ 0.3," an output of 0.4 from the output node
    would classify that record as likely to be Yes.
  • Single output nodes may also be used when the
    classes are clearly ordered. E.g., suppose that
    we would like to classify patients' disease
    levels. We can say (see the sketch below):
  • If 0 ≤ output < 0.33, classify as mild
  • If 0.33 ≤ output < 0.66, classify as severe
  • If 0.66 ≤ output ≤ 1, classify as grave
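A small sketch of both options; the 0.3 threshold and the cut points are the ones given on this slide:

```python
def two_class(output, threshold=0.3):
    # "Yes if output >= 0.3": e.g. 0.4 -> "Yes"
    return "Yes" if output >= threshold else "No"

def disease_level(output):
    # Ordered classes via fixed cut points on the single output node.
    if output < 0.33:
        return "mild"
    elif output < 0.66:
        return "severe"
    else:
        return "grave"

print(two_class(0.4))        # Yes
print(disease_level(0.45))   # severe
```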

29
Multiple Output Nodes
  • If we have unordered categories for the target
    attribute, we create one output node for each
    possible category.
  • E.g. for marital status as target attribute, the
    network would have four output nodes in the
    output layer, one for each of
  • single, married, divorced, and unknown.
  • The output node with the highest value is then
    chosen as the classification for that particular
    record.

30
NN for Estimation And Prediction
  • Since NNs produce continuous output, they can be
    used for estimation and prediction.
  • Suppose, we are interested in predicting the
    price of a stock three months in the future.
  • Presumably, we would have encoded price
    information using the min-max normalization.
  • However, the neural network would output a value
    between zero and 1.
  • The min-max normalization needs to be inverted.
  • This denormalization is:
    prediction = output × (max - min) + min
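A one-line sketch of this inverse transformation; the price range in the example is made up:

```python
def denormalize(output, lo, hi):
    # Invert min-max normalization: prediction = output * (max - min) + min.
    return output * (hi - lo) + lo

# E.g. if prices were scaled from the range [20, 80] and the net outputs 0.75:
print(denormalize(0.75, lo=20, hi=80))   # 65.0
```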

31
ANN Example
32
Learning Weights
  • For an output unit p we similarly have:
    δp = yp (1 - yp)(yp - ym).

(p = 3 in this example.)
33
Backpropagation
First do forward propagation: compute the zi's and
yi's.
34
Backpropagation Example
First do forward propagation: compute the zi's and
yi's. Suppose we have initially chosen (randomly)
the weights given in the table. The table also
gives one training instance (first column).
35
Feed-Forward Example
z1 = 1.0×0.5 + 0.4×0.6 + 0.2×0.8 + 0.7×0.6 = 1.32
y1 = 1/(1 + e^(-z1)) = 1/(1 + e^(-1.32)) = 0.7892
z2 = 1.0×0.7 + 0.4×0.9 + 0.2×0.8 + 0.7×0.4 = 1.5
y2 = 1/(1 + e^(-z2)) = 1/(1 + e^(-1.5)) = 0.8175
z3 = 1.0×0.5 + 0.79×0.9 + 0.82×0.9 ≈ 1.95
y3 = 1/(1 + e^(-z3)) = 1/(1 + e^(-1.95)) ≈ 0.87
[Network diagram: inputs x1, x2, x3 and a bias input
of 1 feed hidden units 1 and 2 through weights
w11, w21, w31, w01 and w12, w22, w32, w02; the hidden
outputs and a bias feed output unit 3 through
w13, w23 and w03.]
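The following sketch reproduces this feed-forward computation in Python, using the inputs and initial weights that appear in the calculation above:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2, x3 = 0.4, 0.2, 0.7                        # the training instance
w01, w11, w21, w31 = 0.5, 0.6, 0.8, 0.6           # weights into hidden unit 1
w02, w12, w22, w32 = 0.7, 0.9, 0.8, 0.4           # weights into hidden unit 2
w03, w13, w23 = 0.5, 0.9, 0.9                     # weights into output unit 3

z1 = w01 * 1.0 + w11 * x1 + w21 * x2 + w31 * x3   # 1.32
y1 = logistic(z1)                                 # 0.7892
z2 = w02 * 1.0 + w12 * x1 + w22 * x2 + w32 * x3   # 1.50
y2 = logistic(z2)                                 # 0.8175
z3 = w03 * 1.0 + w13 * y1 + w23 * y2              # about 1.95
y3 = logistic(z3)                                 # about 0.875
print(round(y1, 4), round(y2, 4), round(y3, 4))
```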
36
Backpropagation
  • So, the network output, for the given training
    example, is y3 ≈ 0.87 (more precisely, 0.8750).
  • Assume the actual value of the target attribute
    is y = 0.8.
  • Then the prediction error equals
    0.8 - 0.8750 = -0.075.
  • Now
  • δ3 = y3(1 - y3)(y3 - y) = 0.87(1 - 0.87)(0.87 - 0.8)
    ≈ 0.008
  • Let's use a learning rate of η = 0.01. Then, we
    update the weights:
  • w03 = w03 - η × δ3 × (1) = 0.5 - 0.01 × 0.008 × 1
    ≈ 0.4999
  • w13 = w13 - η × δ3 × y1 = 0.9 - 0.01 × 0.008 × 0.7892
    ≈ 0.8999
  • w23 = w23 - η × δ3 × y2 = 0.9 - 0.01 × 0.008 × 0.8175
    ≈ 0.8999

37
Backpropagation
  • δ2 = y2(1 - y2) δ3 w23
       = 0.8175(1 - 0.8175) × 0.008 × 0.9 ≈ 0.001
  • δ1 = y1(1 - y1) δ3 w13
       = 0.7892(1 - 0.7892) × 0.008 × 0.9 ≈ 0.0012
  • Then, we update the weights:
  • w02 = w02 - η × δ2 × (1) = 0.7 - 0.01 × 0.001 × 1 ≈ 0.6999
  • w12 = w12 - η × δ2 × x1 = 0.9 - 0.01 × 0.001 × 0.4 ≈ 0.8999
  • w22 = w22 - η × δ2 × x2 = 0.8 - 0.01 × 0.001 × 0.2 ≈ 0.7999
  • w32 = w32 - η × δ2 × x3 = 0.4 - 0.01 × 0.001 × 0.7 ≈ 0.3999
  • w01 = w01 - η × δ1 × (1) = 0.5 - 0.01 × 0.0012 × 1 ≈ 0.4999
  • w11 = w11 - η × δ1 × x1 = 0.6 - 0.01 × 0.0012 × 0.4 ≈ 0.5999
  • w21 = w21 - η × δ1 × x2 = 0.8 - 0.01 × 0.0012 × 0.2 ≈ 0.7999
  • w31 = w31 - η × δ1 × x3 = 0.6 - 0.01 × 0.0012 × 0.7 ≈ 0.5999
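The following sketch mechanizes the delta computations and weight updates on these two slides, reusing the forward-pass values from the previous sketch; only a few of the weight updates are written out, since the rest follow the same pattern:

```python
eta = 0.01                                # learning rate
y = 0.8                                   # actual value of the target
y1, y2, y3 = 0.7892, 0.8175, 0.8750       # outputs from the forward pass
x1, x2, x3 = 0.4, 0.2, 0.7                # the training instance

# Output-unit delta, then hidden-unit deltas by the generalized delta rule.
d3 = y3 * (1 - y3) * (y3 - y)             # about 0.008
d2 = y2 * (1 - y2) * d3 * 0.9             # w23 = 0.9, about 0.001
d1 = y1 * (1 - y1) * d3 * 0.9             # w13 = 0.9, about 0.0012

# Descent rule: w <- w - eta * delta * (the input feeding that weight).
w03 = 0.5 - eta * d3 * 1.0                # about 0.4999
w13 = 0.9 - eta * d3 * y1                 # about 0.8999
w23 = 0.9 - eta * d3 * y2                 # about 0.8999
w01 = 0.5 - eta * d1 * 1.0                # about 0.4999
# The remaining hidden-layer weights are updated the same way, each using
# its own delta and the input (x1, x2, x3 or the bias 1) feeding it.
print(round(d3, 4), round(d2, 4), round(d1, 4), round(w13, 4))
```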