Artificial Intelligence Chapter 20.5: Neural Networks

Department of Computer Science, Kent State University. November 11, 2004.



1
Artificial Intelligence Chapter 20.5: Neural Networks
  • Michael Scherger
  • Department of Computer Science
  • Kent State University

2
Contents
  • Introduction
  • Simple Neural Networks for Pattern Classification
  • Pattern Association
  • Neural Networks Based on Competition
  • Backpropagation Neural Network

3
Introduction
  • Many of these notes come from Fundamentals of
    Neural Networks: Architectures, Algorithms, and
    Applications by Laurene Fausett, Prentice Hall,
    Englewood Cliffs, NJ, 1994.

4
Introduction
  • Aims
  • Introduce some of the fundamental techniques and
    principles of neural network systems
  • Investigate some common models and their
    applications

5
What are Neural Networks?
  • Neural Networks (NNs) are networks of neurons,
    for example, as found in real (i.e. biological)
    brains.
  • Artificial Neurons are crude approximations of
    the neurons found in brains. They may be physical
    devices, or purely mathematical constructs.
  • Artificial Neural Networks (ANNs) are networks of
    Artificial Neurons, and hence constitute crude
    approximations to parts of real brains. They may
    be physical devices, or simulated on conventional
    computers.
  • From a practical point of view, an ANN is just a
    parallel computational system consisting of many
    simple processing elements connected together in
    a specific way in order to perform a particular
    task.
  • One should never lose sight of how crude the
    approximations are, and how over-simplified our
    ANNs are compared to real brains.

6
Why Study Artificial Neural Networks?
  • They are extremely powerful computational devices
    (Turing equivalent, universal computers)
  • Massive parallelism makes them very efficient
  • They can learn and generalize from training data
    so there is no need for enormous feats of
    programming
  • They are particularly fault tolerant; this is
    equivalent to the graceful degradation found in
    biological systems
  • They are very noise tolerant, so they can cope
    with situations where normal symbolic systems
    would have difficulty
  • In principle, they can do anything a
    symbolic/logic system can do, and more. (In
    practice, getting them to do it can be rather
    difficult)

7
What are Artificial Neural Networks Used for?
  • As with the field of AI in general, there are two
    basic goals for neural network research
  • Brain modeling: the scientific goal of building
    models of how real brains work
  • This can potentially help us understand the
    nature of human intelligence, formulate better
    teaching strategies, or devise better remedial
    actions for brain-damaged patients.
  • Artificial system building: the engineering goal
    of building efficient systems for real-world
    applications.
  • This may make machines more powerful, relieve
    humans of tedious tasks, and may even improve
    upon human performance.

8
What are Artificial Neural Networks Used for?
  • Brain modeling
  • Models of human development help children with
    developmental problems
  • Simulations of adult performance aid our
    understanding of how the brain works
  • Neuropsychological models suggest remedial
    actions for brain damaged patients
  • Real world applications
  • Financial modeling: predicting stocks, shares,
    currency exchange rates
  • Other time series prediction: climate, weather,
    airline marketing tactician
  • Computer games: intelligent agents, backgammon,
    first person shooters
  • Control systems: autonomous adaptable robots,
    microwave controllers
  • Pattern recognition: speech recognition,
    handwriting recognition, sonar signals
  • Data analysis: data compression, data mining
  • Noise reduction: function approximation, ECG
    noise reduction
  • Bioinformatics: protein secondary structure, DNA
    sequencing

9
Learning in Neural Networks
  • There are many forms of neural networks. Most
    operate by passing neural activations through a
    network of connected neurons.
  • One of the most powerful features of neural
    networks is their ability to learn and generalize
    from a set of training data. They adapt the
    strengths/weights of the connections between
    neurons so that the final output activations are
    correct.

10
Learning in Neural Networks
  • There are three broad types of learning
  • Supervised Learning (i.e. learning with a
    teacher)
  • Reinforcement learning (i.e. learning with
    limited feedback)
  • Unsupervised learning (i.e. learning with no help)

11
A Brief History
  • 1943 McCulloch and Pitts proposed the
    McCulloch-Pitts neuron model
  • 1949 Hebb published his book The Organization of
    Behavior, in which the Hebbian learning rule was
    proposed.
  • 1958 Rosenblatt introduced the simple single
    layer networks now called Perceptrons.
  • 1969 Minsky and Papert's book Perceptrons
    demonstrated the limitations of single layer
    perceptrons, and almost the whole field went into
    hibernation.
  • 1982 Hopfield published a series of papers on
    Hopfield networks.
  • 1982 Kohonen developed the Self-Organizing Maps
    that now bear his name.
  • 1986 The Back-Propagation learning algorithm for
    Multi-Layer Perceptrons was re-discovered and the
    whole field took off again.
  • 1990s The sub-field of Radial Basis Function
    Networks was developed.
  • 2000s The power of Ensembles of Neural Networks
    and Support Vector Machines becomes apparent.

12
Overview
  • Artificial Neural Networks are powerful
    computational systems consisting of many simple
    processing elements connected together to perform
    tasks analogously to biological brains.
  • They are massively parallel, which makes them
    efficient, robust, fault tolerant and noise
    tolerant.
  • They can learn from training data and generalize
    to new situations.
  • They are useful for brain modeling and real world
    applications involving pattern recognition,
    function approximation, prediction,

13
The Nervous System
  • The human nervous system can be broken down into
    three stages that may be represented in block
    diagram form as
  • The receptors collect information from the
    environment, e.g. photons on the retina.
  • The effectors generate interactions with the
    environment, e.g. activate muscles.
  • The flow of information/activation is represented
    by arrows: feed forward and feedback.

14
Levels of Brain Organization
  • The brain contains both large scale and small
    scale anatomical structures and different
    functions take place at higher and lower levels.
    There is a hierarchy of interwoven levels of
    organization
  • 1. Molecules and Ions
  • 2. Synapses
  • 3. Neuronal microcircuits
  • 4. Dendritic trees
  • 5. Neurons
  • 6. Local circuits
  • 7. Inter-regional circuits
  • 8. Central nervous system
  • The ANNs we study in this module are crude
    approximations to levels 5 and 6.

15
Brains vs. Computers
  • There are approximately 10 billion neurons in the
    human cortex, compared with tens of thousands of
    processors in the most powerful parallel
    computers.
  • Each biological neuron is connected to several
    thousand other neurons, similar to the
    connectivity in powerful parallel computers.
  • Lack of processing units can be compensated by
    speed. The typical operating speed of biological
    neurons is measured in milliseconds ($10^{-3}$ s),
    while a silicon chip can operate in nanoseconds
    ($10^{-9}$ s).
  • The human brain is extremely energy efficient,
    using approximately $10^{-16}$ joules per operation
    per second, whereas the best computers today use
    around $10^{-6}$ joules per operation per second.
  • Brains have been evolving for tens of millions of
    years; computers have been evolving for tens of
    decades.

16
Structure of a Human Brain
17
Slice Through a Real Brain
18
Biological Neural Networks
  • 1. The majority of neurons encode their outputs or
    activations as a series of brief electrical pulses
    (i.e. spikes or action potentials).
  • 2. Dendrites are the receptive zones that receive
    activation from other neurons.
  • 3. The cell body (soma) of the neuron processes
    the incoming activations and converts them into
    output activations.
  • 4. Axons are transmission lines that send
    activation to other neurons.
  • 5. Synapses allow weighted transmission of
    signals (using neurotransmitters) between axons
    and dendrites to build up large neural networks.

19
The McCulloch-Pitts Neuron
  • This vastly simplified model of real neurons is
    also known as a Threshold Logic Unit
  • A set of synapses (i.e. connections) brings in
    activations from other neurons.
  • A processing unit sums the inputs, and then
    applies a non-linear activation function (i.e.
    squashing/transfer/threshold function).
  • An output line transmits the result to other
    neurons.

20
Networks of McCulloch-Pitts Neurons
  • Artificial neurons have the same basic components
    as biological neurons. The simplest ANNs consist
    of a set of McCulloch-Pitts neurons labeled by
    indices k, i, j, and activation flows between them
    via synapses with strengths $w_{ki}$, $w_{ij}$

21
Some Useful Notation
  • We often need to talk about ordered sets of
    related numbers; we call them vectors, e.g.
  • $x = (x_1, x_2, x_3, \dots, x_n)$, $y = (y_1, y_2, y_3, \dots, y_m)$
  • The components $x_i$ can be added up to give a
    scalar (number), e.g.
  • $s = x_1 + x_2 + x_3 + \dots + x_n = \sum_{i=1}^{n} x_i$
  • Two vectors of the same length may be added to
    give another vector, e.g.
  • $z = x + y = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)$
  • Two vectors of the same length may be multiplied
    to give a scalar, e.g.
  • $p = x \cdot y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n = \sum_{i=1}^{n} x_i y_i$

22
Some Useful Functions
  • Common activation functions
  • Identity function
  • f(x) = x for all x
  • Binary step function with threshold θ (also known
    as the Heaviside function or threshold function)

23
Some Useful Functions
  • Binary sigmoid
  • Bipolar sigmoid
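
The formulas for these functions were images on the slides and are not in this transcript. The sketch below uses the standard textbook forms, which is an assumption about what the slides showed: a step that fires when x >= theta, a logistic binary sigmoid with steepness sigma, and a bipolar sigmoid obtained by rescaling the binary one to the range (-1, 1). The function names are illustrative.

```python
import math


def identity(x):
    """Identity function: f(x) = x for all x."""
    return x


def binary_step(x, theta=0.0):
    """Binary step (assumed convention): 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0


def binary_sigmoid(x, sigma=1.0):
    """Logistic sigmoid with outputs in (0, 1); sigma is an assumed steepness parameter."""
    return 1.0 / (1.0 + math.exp(-sigma * x))


def bipolar_sigmoid(x, sigma=1.0):
    """Binary sigmoid rescaled to the range (-1, 1)."""
    return 2.0 * binary_sigmoid(x, sigma) - 1.0
```

The binary sigmoid suits targets in {0, 1}; the bipolar form suits targets in {-1, 1}.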

24
The McCulloch-Pitts Neuron Equation
  • Using the above notation, we can now write down a
    simple equation for the output out of a
    McCulloch-Pitts neuron as a function of its n
    inputs $in_i$
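
The equation itself did not survive in this transcript. One common way to write it, assuming unweighted inputs and a threshold θ as in the Threshold Logic Unit described earlier, is sketched below. Treating sgn as the binary step (1 at or above zero, 0 below) is an assumption about the slide's notation.

```latex
out = \operatorname{sgn}\!\left(\sum_{i=1}^{n} in_i - \theta\right),
\qquad
\operatorname{sgn}(x) =
\begin{cases}
1 & \text{if } x \ge 0 \\
0 & \text{if } x < 0
\end{cases}
```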

25
Review
  • Biological neurons, consisting of a cell body,
    axons, dendrites and synapses, are able to
    process and transmit neural activation
  • The McCulloch-Pitts neuron model (Threshold Logic
    Unit) is a crude approximation to real neurons
    that performs a simple summation and thresholding
    function on activation levels
  • Appropriate mathematical notation facilitates the
    specification and programming of artificial
    neurons and networks of artificial neurons.

26
Networks of McCulloch-Pitts Neurons
  • One neuron can't do much on its own. Usually we
    will have many neurons labeled by indices k, i, j,
    and activation flows between them via synapses
    with strengths $w_{ki}$, $w_{ij}$

27
The Perceptron
  • We can connect any number of McCulloch-Pitts
    neurons together in any way we like.
  • An arrangement of one input layer of
    McCulloch-Pitts neurons feeding forward to one
    output layer of McCulloch-Pitts neurons is known
    as a Perceptron.

28
Logic Gates with MP Neurons
  • We can use McCulloch-Pitts neurons to implement
    the basic logic gates.
  • All we need to do is find the appropriate
    connection weights and neuron thresholds to
    produce the right outputs for each set of inputs.
  • We shall see explicitly how one can construct
    simple networks that perform NOT, AND, and OR.
  • It is then a well known result from logic that we
    can construct any logical function from these
    three operations.
  • The resulting networks, however, will usually
    have a much more complex architecture than a
    simple Perceptron.
  • We generally want to avoid decomposing complex
    problems into simple logic gates, by finding the
    weights and thresholds that work directly in a
    Perceptron architecture.

29
Implementation of Logical NOT, AND, and OR
  • Logical OR
    x1  x2 | y
    0   0  | 0
    0   1  | 1
    1   0  | 1
    1   1  | 1

[Figure: x1 and x2 each feed the output unit y with weight 2; threshold θ = 2]
30
Implementation of Logical NOT, AND, and OR
  • Logical AND
    x1  x2 | y
    0   0  | 0
    0   1  | 0
    1   0  | 0
    1   1  | 1

[Figure: x1 and x2 each feed the output unit y with weight 1; threshold θ = 2]
31
Implementation of Logical NOT, AND, and OR
  • Logical NOT
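    x1 | y
    0  | 1
    1  | 0

[Figure: x1 feeds the output unit y with weight -1, and a bias input fixed at 1 feeds y with weight 2; threshold θ = 2]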
32
Implementation of Logical NOT, AND, and OR
  • Logical AND NOT
    x1  x2 | y
    0   0  | 0
    0   1  | 0
    1   0  | 1
    1   1  | 0

[Figure: x1 feeds the output unit y with weight 2 and x2 feeds y with weight -1; threshold θ = 2]
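
The four gates above can be checked with a few lines of code. This is a minimal sketch that assumes the neuron fires when the weighted sum is at least the threshold (net >= θ) and takes the weights and θ = 2 from the figures; the helper names are illustrative.

```python
def mp_neuron(inputs, weights, theta=2):
    """McCulloch-Pitts unit: output 1 if the weighted sum reaches theta, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0


def gate_or(x1, x2):
    return mp_neuron((x1, x2), (2, 2))


def gate_and(x1, x2):
    return mp_neuron((x1, x2), (1, 1))


def gate_not(x1):
    # The second input is the bias line, fixed at 1, with weight 2.
    return mp_neuron((x1, 1), (-1, 2))


def gate_and_not(x1, x2):
    return mp_neuron((x1, x2), (2, -1))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, gate_or(a, b), gate_and(a, b), gate_and_not(a, b))
```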
33
Logical XOR
  • Logical XOR
    x1  x2 | y
    0   0  | 0
    0   1  | 1
    1   0  | 1
    1   1  | 0

[Figure: x1 and x2 feed the output unit y; the weights are left as question marks]
34
Logical XOR
  • How long do we keep looking for a solution? We
    need to be able to calculate appropriate
    parameters rather than looking for solutions by
    trial and error.
  • Each training pattern produces a linear
    inequality for the output in terms of the inputs
    and the network parameters. These can be used to
    compute the weights and thresholds.

35
Finding the Weights Analytically
  • We have two weights $w_1$ and $w_2$ and the threshold
    θ, and for each training pattern we need to
    satisfy

36
Finding the Weights Analytically
  • For the XOR network
  • Clearly the second and third inequalities are
    incompatible with the fourth, so there is in fact
    no solution. We need more complex networks, e.g.
    that combine together many simple networks, or
    use different activation/thresholding/transfer
    functions.
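
The inequalities were shown as an image and are missing here. A reconstruction, assuming the unit outputs 1 when $w_1 in_1 + w_2 in_2 - \theta \ge 0$ and 0 otherwise, looks like this for the four XOR patterns:

```latex
\begin{aligned}
(0,0) \mapsto 0 &: \quad -\theta < 0 &&\Rightarrow\ \theta > 0 \\
(0,1) \mapsto 1 &: \quad w_2 - \theta \ge 0 &&\Rightarrow\ w_2 \ge \theta \\
(1,0) \mapsto 1 &: \quad w_1 - \theta \ge 0 &&\Rightarrow\ w_1 \ge \theta \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 - \theta < 0 &&\Rightarrow\ w_1 + w_2 < \theta
\end{aligned}
```

Adding the second and third lines gives $w_1 + w_2 \ge 2\theta$, and since the first line forces $\theta > 0$, this means $w_1 + w_2 > \theta$, which contradicts the fourth line.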

37
ANN Topologies
  • Mathematically, ANNs can be represented as
    weighted directed graphs. For our purposes, we
    can simply think in terms of activation flowing
    between processing units via one-way connections
  • Single-Layer Feed-forward NNs: one input layer and
    one output layer of processing units. No
    feed-back connections. (For example, a simple
    Perceptron.)
  • Multi-Layer Feed-forward NNs: one input layer, one
    output layer, and one or more hidden layers of
    processing units. No feed-back connections. The
    hidden layers sit in between the input and output
    layers, and are thus hidden from the outside
    world. (For example, a Multi-Layer Perceptron.)
  • Recurrent NNs: any network with at least one
    feed-back connection. It may, or may not, have
    hidden units. (For example, a Simple Recurrent
    Network.)

38
ANN Topologies
39
Detecting Hot and Cold
  • It is a well-known and interesting psychological
    phenomenon that if a cold stimulus is applied to
    a person's skin for a short period of time, the
    person will perceive heat.
  • However, if the same stimulus is applied for a
    longer period of time, the person will perceive
    cold. The use of discrete time steps enables the
    network of MP neurons to model this phenomenon.

40
Detecting Hot and Cold
  • The desired response of the system is that cold
    is perceived if a cold stimulus is applied for
    two time steps
  • y2(t) = x2(t-2) AND x2(t-1)
  • It is also required that heat be perceived if
    either a hot stimulus is applied or a cold
    stimulus is applied briefly (for one time step)
    and then removed
  • y1(t) = x1(t-1) OR (x2(t-3) AND NOT x2(t-2))
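
The two requirements can be simulated with discrete time steps. The sketch below assumes a network with two hidden units z1 and z2, threshold 2 on every unit, and the wiring given in the comments. That wiring is one arrangement that realizes the two equations above; it is an assumption rather than something read from the figure.

```python
def step(net, theta=2):
    """McCulloch-Pitts threshold: fire if the net input reaches theta."""
    return 1 if net >= theta else 0


def simulate(x1_seq, x2_seq):
    """Return [(y1(t), y2(t)), ...] for t = 0 .. len(x1_seq).

    Every unit at time t sees the activations from time t-1.
    Assumed wiring: x2 -> z2 (2); z2 -> z1 (2); x2 -> z1 (-1);
    x1 -> y1 (2); z1 -> y1 (2); x2 -> y2 (1); z2 -> y2 (1).
    """
    z1 = z2 = y1 = y2 = 0
    history = []
    for x1, x2 in zip(x1_seq, x2_seq):
        history.append((y1, y2))
        z1, z2, y1, y2 = (
            step(2 * z2 - 1 * x2),  # z1(t) = z2(t-1) AND NOT x2(t-1)
            step(2 * x2),           # z2(t) = x2(t-1)
            step(2 * x1 + 2 * z1),  # y1(t) = x1(t-1) OR z1(t-1)
            step(1 * x2 + 1 * z2),  # y2(t) = x2(t-1) AND z2(t-1)
        )
    history.append((y1, y2))
    return history


if __name__ == "__main__":
    print(simulate([0, 0, 0], [1, 0, 0]))  # brief cold stimulus
    print(simulate([0, 0], [1, 1]))        # sustained cold stimulus
```

With a cold stimulus applied for one step and then removed, heat is perceived three steps after the stimulus began; with cold applied for two steps, cold is perceived at the second step.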

41
Detecting Heat and Cold
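[Figure: network of McCulloch-Pitts neurons with inputs x1 (Heat) and x2 (Cold), hidden units z1 and z2, and outputs y1 (Heat) and y2 (Cold); the connection weights shown are 2, 2, -1, 2, 1, 2, 1]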
42
Detecting Heat and Cold
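[Figure: network state. Heat 0, Cold 1. Apply cold.]
43
Detecting Heat and Cold
[Figure: network state with values 0, 0, 1, 0. Remove cold.]
44
Detecting Heat and Cold
[Figure: network state with values 0, 1, 0, 0.]
45
Detecting Heat and Cold
[Figure: network state. Heat 1, Cold 0. Perceive heat.]
46
Detecting Heat and Cold
[Figure: network state. Heat 0, Cold 1. Apply cold.]
47
Detecting Heat and Cold
[Figure: network state with values 0, 0, 1, 1.]
48
Detecting Heat and Cold
[Figure: network state with values 0, 0, 1, 1. Perceive cold.]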
49
Example Classification
  • Consider the example of classifying airplanes
    given their masses and speeds
  • How do we construct a neural network that can
    classify any type of bomber or fighter?

50
A General Procedure for Building ANNs
  • 1. Understand and specify your problem in terms
    of inputs and required outputs, e.g. for
    classification the outputs are the classes
    usually represented as binary vectors.
  • 2. Take the simplest form of network you think
    might be able to solve your problem, e.g. a
    simple Perceptron.
  • 3. Try to find appropriate connection weights
    (including neuron thresholds) so that the network
    produces the right outputs for each input in its
    training data.
  • 4. Make sure that the network works on its
    training data, and test its generalization by
    checking its performance on new testing data.
  • 5. If the network doesn't perform well enough, go
    back to stage 3 and try harder.
  • 6. If the network still doesn't perform well
    enough, go back to stage 2 and try harder.
  • 7. If the network still doesn't perform well
    enough, go back to stage 1 and try harder.
  • 8. Problem solved; move on to the next problem.

51
Building a NN for Our Example
  • For our airplane classifier example, our inputs
    can be direct encodings of the masses and speeds
  • Generally we would have one output unit for each
    class, with activation 1 for yes and 0 for no
  • With just two classes here, we can have just one
    output unit, with activation 1 for fighter and
    0 for bomber (or vice versa)
  • The simplest network to try first is a simple
    Perceptron
  • We can further simplify matters by replacing the
    threshold with a bias

52
Building a NN for Our Example
53
Building a NN for Our Example
54
Decision Boundaries in Two Dimensions
  • For simple logic gate problems, it is easy to
    visualize what the neural network is doing. It
    is forming decision boundaries between classes.
    Remember, the network output is
  • The decision boundary (between out = 0 and out =
    1) is at
  • $w_1 in_1 + w_2 in_2 - \theta = 0$

55
Decision Boundaries in Two Dimensions
In two dimensions the decision boundaries are
always on straight lines
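
To see why, the two-input boundary condition (weighted sum minus threshold equal to zero) can be solved for $in_2$, assuming $w_2 \neq 0$:

```latex
w_1\, in_1 + w_2\, in_2 - \theta = 0
\quad\Longrightarrow\quad
in_2 = -\frac{w_1}{w_2}\, in_1 + \frac{\theta}{w_2}
```

This is a straight line with slope $-w_1/w_2$ and intercept $\theta/w_2$. For instance, with the AND weights $w_1 = w_2 = 1$ and $\theta = 2$ used earlier, the boundary is $in_2 = 2 - in_1$.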
56
Decision Boundaries for AND and OR
57
Decision Boundaries for XOR
  • There are two obvious remedies
  • either change the transfer function so that it
    has more than one decision boundary
  • use a more complex network that is able to
    generate more complex decision boundaries

58
Logical XOR (Again)
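  • z1 = x1 AND NOT x2
  • z2 = x2 AND NOT x1
  • y = z1 OR z2

[Figure: two-layer network. x1 feeds z1 with weight 2 and z2 with weight -1; x2 feeds z2 with weight 2 and z1 with weight -1; z1 and z2 each feed y with weight 2]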
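
The two-layer construction can be checked directly. This sketch assumes threshold 2 on every unit and the firing rule net >= θ, and reuses the AND NOT weights (2, -1) and OR weights (2, 2) from the earlier gate slides; the function names are illustrative.

```python
def mp_neuron(inputs, weights, theta=2):
    """McCulloch-Pitts unit: output 1 if the weighted sum reaches theta, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0


def xor(x1, x2):
    z1 = mp_neuron((x1, x2), (2, -1))   # z1 = x1 AND NOT x2
    z2 = mp_neuron((x2, x1), (2, -1))   # z2 = x2 AND NOT x1
    return mp_neuron((z1, z2), (2, 2))  # y  = z1 OR z2


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor(a, b))
```

The hidden layer is what makes this possible: each hidden unit draws one straight-line boundary, and the output unit combines the two regions.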
59
Decision Hyperplanes and Linear Separability
  • If we have two inputs, then the weights define a
    decision boundary that is a one dimensional
    straight line in the two dimensional input space
    of possible input values
  • If we have n inputs, the weights define a
    decision boundary that is an n-1 dimensional
    hyperplane in the n dimensional input space
  • $w_1 in_1 + w_2 in_2 + \dots + w_n in_n - \theta = 0$

60
Decision Hyperplanes and Linear Separability
  • This hyperplane is clearly still linear (i.e.
    straight/flat) and can still only divide the
    space into two regions. We still need more
    complex transfer functions, or more complex
    networks, to deal with XOR type problems
  • Problems with input patterns which can be
    classified using a single hyperplane are said to
    be linearly separable. Problems (such as XOR)
    which cannot be classified in this way are said
    to be non-linearly separable.

61
General Decision Boundaries
  • Generally, we will want to deal with input
    patterns that are not binary, and expect our
    neural networks to form complex decision
    boundaries
  • We may also wish to classify inputs into many
    classes (such as the three shown here)

62
Learning and Generalization
  • A network will also produce outputs for input
    patterns that it was not originally set up to
    classify (shown with question marks), though
    those classifications may be incorrect
  • There are two important aspects of the network's
    operation to consider
  • Learning: the network must learn decision surfaces
    from a set of training patterns so that these
    training patterns are classified correctly
  • Generalization: after training, the network must
    also be able to generalize, i.e. correctly
    classify test patterns it has never seen before
  • Usually we want our neural networks to learn
    well, and also to generalize well.

63
Learning and Generalization
  • Sometimes, the training data may contain errors
    (e.g. noise in the experimental determination of
    the input values, or incorrect classifications)
  • In this case, learning the training data
    perfectly may make the generalization worse
  • There is an important tradeoff between learning
    and generalization that arises quite generally

64
Generalization in Classification
  • Suppose the task of our network is to learn a
    classification decision boundary
  • Our aim is for the network to generalize to
    classify new inputs appropriately. If we know
    that the training data contains noise, we don't
    necessarily want the training data to be
    classified totally accurately, as that is likely
    to reduce the generalization ability.

65
Generalization in Function Approximation
  • Suppose we wish to recover a function for which
    we only have noisy data samples
  • We can expect the neural network output to give a
    better representation of the underlying function
    if its output curve does not pass through all the
    data points. Again, allowing a larger error on
    the training data is likely to lead to better
    generalization.

66
Training a Neural Network
  • Whether our neural network is a simple
    Perceptron, or a much more complicated multilayer
    network with special activation functions, we
    need to develop a systematic procedure for
    determining appropriate connection weights.
  • The general procedure is to have the network
    learn the appropriate weights from a
    representative set of training data
  • In all but the simplest cases, however, direct
    computation of the weights is intractable

67
Training a Neural Network
  • Instead, we usually start off with random initial
    weights and adjust them in small steps until the
    required outputs are produced
  • We shall now look at a brute force derivation of
    such an iterative learning algorithm for simple
    Perceptrons.
  • Later, we shall see how more powerful and general
    techniques can easily lead to learning algorithms
    which will work for neural networks of any
    specification we could possibly dream up

68
Perceptron Learning
  • For simple Perceptrons performing classification,
    we have seen that the decision boundaries are
    hyperplanes, and we can think of learning as the
    process of shifting around the hyperplanes until
    each training pattern is classified correctly
  • Somehow, we need to formalize that process of
    shifting around into a systematic algorithm
    that can easily be implemented on a computer
  • The shifting around can conveniently be split
    up into a number of small steps.

69
Perceptron Learning
  • If the network weights at time t are $w_{ij}(t)$, then
    the shifting process corresponds to moving them
    by an amount $\Delta w_{ij}(t)$, so that at time t+1 we have
    weights
  • $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$
  • It is convenient to treat the thresholds as
    weights, as discussed previously, so we don't
    need separate equations for them

70
Formulating the Weight Changes
  • Suppose the target output of unit j is $targ_j$ and
    the actual output is $out_j = \mathrm{sgn}(\sum_i in_i w_{ij})$, where
    $in_i$ are the activations of the previous layer of
    neurons (e.g. the network inputs)
  • Then we can just go through all the possibilities
    to work out an appropriate set of small weight
    changes

71
Perceptron Algorithm
  • Step 0: Initialize weights and bias
  • For simplicity, set weights and bias to zero
  • Set learning rate α (0 < α < 1), also written η
  • Step 1: While stopping condition is false, do
    steps 2-6
  • Step 2: For each training pair s:t, do steps 3-5
  • Step 3: Set activations of input units
  • $x_i = s_i$

72
Perceptron Algorithm
  • Step 4 Compute response of output unit

73
Perceptron Algorithm
  • Step 5: Update weights and bias if an error
    occurred for this pattern
  • if y ≠ t
  • $w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$
  • $b(\text{new}) = b(\text{old}) + \alpha t$
  • else
  • $w_i(\text{new}) = w_i(\text{old})$
  • $b(\text{new}) = b(\text{old})$
  • Step 6: Test stopping condition
  • If no weights changed in Step 2, stop; else
    continue
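
Steps 0 to 6 can be sketched as follows. The Step 4 response formula is not present in this transcript, so the sketch assumes a bipolar response with an undecided band: y = 1 if y_in > θ, y = -1 if y_in < -θ, and 0 otherwise, with y_in = b + Σ x_i w_i. The function names, the default θ = 0, and the max_epochs guard are illustrative choices rather than part of the slides.

```python
def response(y_in, theta=0.0):
    """Assumed Step 4 response: bipolar output with an undecided band of width 2*theta."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(pairs, alpha=0.5, theta=0.0, max_epochs=100):
    """Train on (input vector, bipolar target) pairs; return (weights, bias)."""
    n = len(pairs[0][0])
    w = [0.0] * n  # Step 0: zero weights and bias
    b = 0.0
    for _ in range(max_epochs):       # Step 1
        changed = False
        for s, t in pairs:            # Step 2
            x = list(s)               # Step 3
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            y = response(y_in, theta)  # Step 4
            if y != t:                # Step 5: update only on error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:               # Step 6: an epoch with no changes
            break
    return w, b


# Bipolar AND as a small linearly separable example.
AND_PAIRS = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

if __name__ == "__main__":
    print(train_perceptron(AND_PAIRS))
```

On the bipolar AND data the loop stops after the second epoch, since the first epoch already yields weights that classify all four patterns.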

74
Convergence of Perceptron Learning
  • The weight changes $\Delta w_{ij}$ need to be applied
    repeatedly for each weight $w_{ij}$ in the network,
    and for each training pattern in the training
    set. One pass through all the weights for the
    whole training set is called one epoch of
    training
  • Eventually, usually after many epochs, when all
    the network outputs match the targets for all the
    training patterns, all the $\Delta w_{ij}$ will be zero and
    the process of training will cease. We then say
    that the training process has converged to a
    solution

75
Convergence of Perceptron Learning
  • It can be shown that if there does exist a
    possible set of weights for a Perceptron which
    solves the given problem correctly, then the
    Perceptron Learning Rule will find them in a
    finite number of iterations
  • Moreover, it can be shown that if a problem is
    linearly separable, then the Perceptron Learning
    Rule will find a set of weights in a finite
    number of iterations that solves the problem
    correctly

76
Overview and Review
  • Neural network classifiers learn decision
    boundaries from training data
  • Simple Perceptrons can only cope with linearly
    separable problems
  • Trained networks are expected to generalize, i.e.
    deal appropriately with input data they were not
    trained on
  • One can train networks by iteratively updating
    their weights
  • The Perceptron Learning Rule will find weights
    for linearly separable problems in a finite
    number of iterations.

77
Hebbian Learning
  • In 1949 neuropsychologist Donald Hebb postulated
    how biological neurons learn
  • "When an axon of cell A is near enough to excite
    a cell B and repeatedly or persistently takes
    part in firing it, some growth process or
    metabolic change takes place in one or both cells
    such that A's efficiency, as one of the cells
    firing B, is increased."
  • In other words
  • 1. If two neurons on either side of a synapse
    (connection) are activated simultaneously (i.e.
    synchronously), then the strength of that synapse
    is selectively increased.
  • This rule is often supplemented by
  • 2. If two neurons on either side of a synapse are
    activated asynchronously, then that synapse is
    selectively weakened or eliminated.
  • so that chance coincidences do not build up
    connection strengths.

78
Hebbian Learning Algorithm
  • Step 0: Initialize all weights
  • For simplicity, set weights and bias to zero
  • Step 1: For each input training vector, do steps
    2-4
  • Step 2: Set activations of input units
  • $x_i = s_i$
  • Step 3: Set the activation for the output unit
  • $y = t$
  • Step 4: Adjust weights and bias
  • $w_i(\text{new}) = w_i(\text{old}) + x_i y$
  • $b(\text{new}) = b(\text{old}) + y$
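
The Hebbian steps above make a single pass over the training vectors. A minimal sketch, using bipolar AND as illustrative data and a hypothetical function name:

```python
def train_hebb(pairs):
    """One pass of the Hebb rule over (input vector, target) pairs."""
    n = len(pairs[0][0])
    w = [0] * n  # Step 0: zero weights and bias
    b = 0
    for s, t in pairs:  # Step 1
        x = list(s)     # Step 2: x_i = s_i
        y = t           # Step 3: output activation is set to the target
        w = [wi + xi * y for wi, xi in zip(w, x)]  # Step 4
        b += y
    return w, b


# Bipolar AND as a small example.
AND_PAIRS = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

if __name__ == "__main__":
    print(train_hebb(AND_PAIRS))
```

Unlike the Perceptron rule, the update happens for every pattern whether or not the current output is wrong, and there is no repeated pass.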

79
Hebbian vs Perceptron Learning
  • In the notation used for Perceptrons, the Hebbian
    learning weight update rule is
  • $w_{ij}(\text{new}) = w_{ij}(\text{old}) + out_j \cdot in_i$
  • There is strong physiological evidence that this
    type of learning does take place in the region of
    the brain known as the hippocampus.
  • Recall that the Perceptron learning weight update
    rule we derived was
  • $w_{ij}(\text{new}) = w_{ij}(\text{old}) + \eta \cdot targ_j \cdot in_i$
  • There is some similarity, but it is clear that
    Hebbian learning is not going to get our
    Perceptron to learn a set of training data.

80
Adaline
  • Adaline (Adaptive Linear Neuron) was developed
    by Widrow and Hoff in 1960.
  • Uses bipolar activations (-1 and 1) for its input
    signals and target values
  • Weight connections are adjustable
  • Trained using the delta rule for weight update
  • $w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha (targ_j - out_j) x_i$

81
Adaline Training Algorithm
  • Step 0: Initialize weights and bias
  • Set weights to small random values
  • Set learning rate α (0 < α < 1), also written η
  • Step 1: While stopping condition is false, do
    steps 2-6
  • Step 2: For each training pair s:t, do steps 3-5
  • Step 3: Set activations of input units
  • $x_i = s_i$

82
Adaline Training Algorithm
  • Step 4: Compute net input to output unit
  • $y\_in = b + \sum_i x_i w_i$
  • Step 5: Update bias and weights
  • $w_i(\text{new}) = w_i(\text{old}) + \alpha (t - y\_in) x_i$
  • $b(\text{new}) = b(\text{old}) + \alpha (t - y\_in)$
  • Step 6: Test for stopping condition
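
The Adaline steps can be sketched as below. Two simplifications are assumptions made for the example: weights start at zero instead of small random values so the run is reproducible, and the unspecified stopping condition is replaced by a fixed number of epochs. The function names and the bipolar AND data are illustrative.

```python
def train_adaline(pairs, alpha=0.1, epochs=50):
    """Delta-rule training on (input vector, bipolar target) pairs; returns (weights, bias)."""
    n = len(pairs[0][0])
    w = [0.0] * n  # assumption: zeros in place of small random values
    b = 0.0
    for _ in range(epochs):  # assumption: fixed epoch count as the stopping condition
        for s, t in pairs:
            x = list(s)                                       # Step 3
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4
            err = t - y_in                                    # error on the net input
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]  # Step 5
            b += alpha * err
    return w, b


def classify(x, w, b):
    """Bipolar decision taken from the sign of the net input."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if y_in >= 0 else -1


AND_PAIRS = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

if __name__ == "__main__":
    w, b = train_adaline(AND_PAIRS)
    print(w, b, [classify(s, w, b) for s, _ in AND_PAIRS])
```

Note that the update uses the net input y_in and not the thresholded output, so the weights keep moving toward a least-squares fit even after every pattern is classified correctly.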

83
Autoassociative Net
  • The feed forward autoassociative net has the
    following diagram
  • Useful for determining whether something is a part of
    the test pattern or not
  • Weight matrix diagonal is usually zero; this improves
    generalization
  • Hebbian learning can be used if the stored vectors
    are mutually orthogonal

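[Figure: feed-forward autoassociative net with input units x1 ... xi ... xn connected to output units y1 ... yj ... ym]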
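
As a rough illustration of the points above, a bipolar vector can be stored with the Hebb rule (the weight between units i and j grows by s_i * s_j) and then recalled from a noisy copy. The storage and recall functions below are a sketch under those assumptions; the names and the example vector are illustrative and not taken from the slides.

```python
def store(vectors, zero_diagonal=True):
    """Build the weight matrix by Hebbian learning: W[i][j] += s_i * s_j."""
    n = len(vectors[0])
    W = [[0] * n for _ in range(n)]
    for s in vectors:
        for i in range(n):
            for j in range(n):
                W[i][j] += s[i] * s[j]
    if zero_diagonal:
        for i in range(n):
            W[i][i] = 0  # remove self-connections
    return W


def recall(W, x):
    """Feed x forward once and threshold each output to a bipolar value."""
    n = len(x)
    out = []
    for j in range(n):
        net = sum(x[i] * W[i][j] for i in range(n))
        out.append(1 if net > 0 else -1)
    return out


if __name__ == "__main__":
    stored = [1, 1, 1, -1]
    W = store([stored])
    print(recall(W, stored))          # the stored vector itself
    print(recall(W, [-1, 1, 1, -1]))  # one component flipped
```

In this small case the net recovers the stored vector from a copy with one component flipped.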
84
BAM Net
  • Bidirectional Associative Memory