Introduction to Neural Network



Chapter 3
  • Introduction to Neural Network

Before we start
  • Information processing technology inspired by
    studies of the brain and the nervous system.

Brain's Capabilities
  • Its performance tends to degrade gracefully
    under partial damage.
  • It can learn (reorganize itself) from experience.
  • It performs massively parallel computations
    extremely efficiently.
  • It supports our intelligence and self-awareness.

What Is A Neural Network?
  • "...a computing system made up of a number of
    simple, highly interconnected processing
    elements, which process information by their
    dynamic state response to external inputs."
  • An artificial neural network (ANN) is a network
    of many very simple processors ("units"), each
    possibly having a (small amount of) local memory.
    The units are connected by unidirectional
    communication channels ("connections"), which
    carry numeric (as opposed to symbolic) data. The
    units operate only on their local data and on the
    inputs they receive via the connections.

Historical
  • 1943 --- McCulloch and Pitts (start of the modern
    era of neural networks). Logical calculus of
    neural networks. A network consisting of a
    sufficient number of neurons (using a simple
    model) and properly set synaptic connections can
    compute any computable function.
  • 1949 --- Hebb's book "The Organization of
    Behavior". An explicit statement of a
    physiological learning rule for synaptic
    modification was presented for the first time.
  • Hebb proposes that the connectivity of the brain
    is continually changing as an organism learns
    differing functional tasks, and that neural
    assemblies are created by such changes. 
  • Hebb's work was immensely influential among
  • 1958 --- Rosenblatt introduced the perceptron, a
    novel method of supervised learning.

Historical Contd
  • Perceptron convergence theorem.
  • Least mean-square (LMS) algorithm
  • 1969 --- Minsky and Papert showed that there are
    fundamental limits on what single-layer
    perceptrons can compute.
  • They speculated that the limits could not be
    overcome for the multi-layer version.
  • 1982 --- Hopfield networks. Hopfield showed how
    to use an "Ising spin glass" type of model to
    store information in dynamically stable networks.
  • His work paved the way for physicists to enter
    neural modeling, thereby transforming the field
    of neural networks.

  • 1982 --- Kohonen's self-organizing maps (SOM).
    Kohonen's self-organizing map is capable of
    reproducing important aspects of the structure of
    biological neural nets: data representation using
    topographic maps (which are common in nervous
    systems). SOM also has a wide range of
    applications.
  • SOM shows how the output layer can pick up the
    correlational structure (from the inputs) in the
    form of the spatial arrangement of units.
  • 1985 --- Ackley, Hinton, and Sejnowski developed
    the Boltzmann machine, which was the first
    successful realization of a multilayer neural
    network.
  • 1986 --- Rumelhart, Hinton, and Williams
    developed the back-propagation algorithm, the
    most popular learning algorithm for the training
    of multilayer perceptrons. It has been the
    workhorse for many neural network applications.

Why Neural Nets?
  • Adaptive learning: an ability to learn how to do
    tasks based on the data given for training or
    initial experience.
  • Self-organisation: an ANN can create its own
    organisation or representation of the information
    it receives during learning time.
  • Real-time operation: ANN computations may be
    carried out in parallel, and special hardware
    devices are being designed and manufactured which
    take advantage of this capability.
  • Fault tolerance via redundant information coding:
    partial destruction of a network leads to the
    corresponding degradation of performance.
    However, some network capabilities may be
    retained even with major network damage.

Before we start...
Differentiating between the brain and the computer
Neuron vs. ANN

Relationships between biological and artificial neural networks
  Biological                Artificial
  Soma                      Node
  Dendrites                 Input
  Axon                      Output
  Synapse                   Weight
  Slow speed                Fast speed
  Many neurons (10^9)       Few neurons (a dozen to hundreds of thousands)

Summary of selected biophysical mechanisms and the
possible neural operations they could implement
  • Biophysical mechanism → neural operation
  • Action potential initiation → analog OR/AND,
    1-bit A/D converter
  • Repetitive spiking activity → current-to-frequency
    transducer
  • Action potential conduction → impulse transmission
  • Chemically mediated synaptic transduction →
    sigmoid threshold or nonreciprocal 2-port
    negative resistance
  • Electrically mediated synaptic transduction →
    reciprocal 1-port resistance
  • Distributed excitatory synapses in dendritic tree
    → linear addition
  • Excitatory and inhibitory synapses of dendritic
    → local AND-NOT presynaptic inhibition
  • Long distance action of neurotransmitter →
    modulating and routing transmission of signals

Neural Network Fundamentals
  • Components and Structures
  • Composed of processing elements organized in
    different ways to form the network's structure
  • Processing Elements
  • Artificial neurons are called processing elements
    (PEs)
  • Each PE receives inputs, processes them, and
    delivers a single output (refer to diagram)
  • Inputs can be raw data or the outputs of other
    processing elements.

Neural Network Fundamentals Contd
  • The Network
  • Composed of a collection of neurons grouped in
    layers (input, intermediate, output)
  • Network Structure
  • Neurons can be connected in several different
    ways
  • Network Information Processing
  • After structure is determined, information can be

Neural Network Fundamentals Contd
  • Input
  • Each input corresponds to a single attribute.
  • Inputs can be text, pictures, or voice
  • Preprocessing is needed to convert these data
    into meaningful inputs
  • Output
  • Contains the solution to a problem
  • Post-processing is required
  • Weights
  • Express the relative strength (mathematical
    value) of the input data
  • Crucial in that they store learned patterns of
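The slides leave the preprocessing step open. As a minimal sketch, assuming numeric attributes are rescaled into [0, 1] (min-max scaling) and text categories are one-hot encoded, the hypothetical helpers `min_max_scale` and `one_hot` below turn raw values into network inputs. Both schemes are common choices rather than ones the slides prescribe, and the sample values are invented.

```python
def min_max_scale(column):
    """Rescale one numeric attribute linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def one_hot(value, categories):
    """Encode a categorical (text) attribute as one input per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


incomes = [20000, 35000, 50000]                  # hypothetical raw attribute
print(min_max_scale(incomes))                    # [0.0, 0.5, 1.0]
print(one_hot("red", ["red", "green", "blue"]))  # [1.0, 0.0, 0.0]
```

Scaling every attribute to a similar range keeps one large-valued input from dominating the weighted sum.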

Neural Network Fundamentals Contd
  • Summation Function
  • Computes the weighted sum of all the input
    elements entering each processing element
  • Multiplies each input value by its weight and
    totals the values for a weighted sum Y.
  • The formula is
      Y = \sum_{i=1}^{n} X_i W_i
    where X_i is the i-th of n inputs and W_i is its
    weight. For the j-th neuron in a layer,
      Y_j = \sum_{i=1}^{n} X_i W_{ij}
  • The summation function computes the internal
    stimulation, or activation level, of the neuron.
    Depending on this level, the neuron may or may
    not produce an output.
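A minimal sketch of the summation function, assuming inputs X_i and weights are plain Python lists, computes Y = sum of X_i * W_i for one neuron and Y_j = sum of X_i * W_ij for each neuron j in a layer. The helper names `weighted_sum` and `layer_sums` and all numbers are hypothetical; `weight_matrix[i][j]` is assumed to hold W_ij, the weight from input i to neuron j.

```python
def weighted_sum(inputs, weights):
    """Summation function for one neuron: Y = sum_i X_i * W_i."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(x * w for x, w in zip(inputs, weights))


def layer_sums(inputs, weight_matrix):
    """Weighted sums Y_j = sum_i X_i * W_ij for every neuron j in a layer.

    weight_matrix[i][j] is the weight from input i to neuron j.
    """
    n_neurons = len(weight_matrix[0])
    return [
        weighted_sum(inputs, [row[j] for row in weight_matrix])
        for j in range(n_neurons)
    ]


x = [3.0, 1.0, 2.0]          # hypothetical inputs X_i
w = [[0.2, 0.5],             # weights from input 1 to neurons 1 and 2
     [0.4, -0.1],            # weights from input 2
     [0.1, 0.3]]             # weights from input 3
print(layer_sums(x, w))      # approximately [1.2, 2.0]
```

For the first neuron this is 3(0.2) + 1(0.4) + 2(0.1) = 1.2; floating-point rounding may show an extra trailing digit.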
Neural Network Fundamentals Contd
  • Transformation (Transfer) Function
  • This function produces the output after the
    summation function has been computed (if
  • The popular sigmoid function is a useful
    nonlinear transfer function:
      Y_T = \frac{1}{1 + e^{-Y}}
  • where Y_T is the transformed (normalized) value
    of Y
  • Transformation modifies the output level to be
    within reasonable values (0 to 1)
  • This is performed before the output reaches the
    next layer
  • Without transformation, the value can become very
    large, especially when there are several layers of
    neurons
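The transformation step can be sketched the same way, assuming the sigmoid Y_T = 1 / (1 + e^(-Y)) as the transfer function. `sigmoid` and `processing_element` are illustrative names, not part of any library; the two branches in `sigmoid` compute the same value and exist only so that `math.exp` does not overflow for large |Y|.

```python
import math


def sigmoid(y):
    """Sigmoid transfer function Y_T = 1 / (1 + e^(-Y)).

    Both branches compute the same function; the split only keeps
    math.exp from overflowing for large negative or positive Y.
    """
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def processing_element(inputs, weights):
    """One PE: weighted sum followed by the sigmoid transformation."""
    y = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(y)


# Raw sums of any size are squashed into the 0-1 range.
for y in (-50.0, 0.0, 1.2, 50.0):
    print(y, sigmoid(y))
print(processing_element([3.0, 1.0, 2.0], [0.2, 0.4, 0.1]))  # about 0.77
```

Because the output stays bounded, stacking several layers cannot make the values grow without limit.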

Learning Algorithm
  • There are many learning algorithms, classified
    as supervised learning and unsupervised learning.
  • Supervised learning uses a set of inputs for
    which the appropriate (desired) outputs are known
  • Unsupervised learning: only input stimuli are
    shown to the network. The network is

2 Main Types of ANN
  • Supervised ANN, e.g.
  • - Adaline
  • - Perceptron
  • - MLP
  • - RBF
  • - Fuzzy ARTMAP
  • - etc.
  • Unsupervised ANN, e.g. competitive learning
    networks:
  • - SOM
  • - ART families
  • - Neocognitron
  • - etc.
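To make the unsupervised column concrete, here is a minimal sketch of plain winner-take-all competitive learning: only inputs are shown to the network, units compete for each input, and only the winner's weights move toward it. It is not SOM (which also updates neighboring units) or ART (which adds a vigilance test); the function names, learning rate `alpha`, and data are assumptions for illustration.

```python
def nearest(x, prototypes):
    """Index of the unit whose weight vector is closest to input x."""
    distances = [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in prototypes]
    return distances.index(min(distances))


def competitive_learning(inputs, initial_weights, alpha=0.5, epochs=10):
    """Winner-take-all competitive learning; no target outputs are used."""
    weights = [list(w) for w in initial_weights]
    for _ in range(epochs):
        for x in inputs:
            winner = nearest(x, weights)  # units compete for the input
            # Only the winning unit moves its weights toward the input.
            weights[winner] = [
                wi + alpha * (xi - wi) for xi, wi in zip(x, weights[winner])
            ]
    return weights


# Two unlabeled clusters of 2-D inputs (hypothetical data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
units = competitive_learning(data, [(0.2, 0.2), (0.8, 0.8)])
print(nearest((0.05, 0.05), units), nearest((0.95, 0.95), units))  # 0 1
```

After training, each unit's weights sit near the center of one cluster, so the network has grouped the inputs without ever seeing a desired output.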
How does an ANN learn?
  • Neurons are connected by links, and each link
    has a numerical weight
  • Weights
  • - are the basic means of long-term memory in ANNs
  • - express the strength of each neuron input
  • An ANN learns through repeated adjustments of
    these weights

(Figure: network with an input layer, a middle layer, and an output layer)
Learning Process of ANN
  • Learns from experience
  • Uses learning algorithms
  • Recognizes patterns of activity
  • Involves 3 tasks:
  • - Compute outputs
  • - Compare outputs with desired targets
  • - Adjust the weights and repeat the process

(Figure: learning flowchart: compute output; is the desired output achieved? If not, adjust weights and repeat)
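The three tasks and the loop in the flowchart can be sketched with Rosenblatt's perceptron learning rule, used here only as one concrete example since the slides do not tie the loop to a specific algorithm. The AND data set, the learning rate `alpha=0.25`, and the function names are illustrative assumptions; `train_perceptron` reports `max_epochs` as the epoch count if the outputs never all match.

```python
def step(y):
    """Hard-limit transfer function: fire (1) when the weighted sum is >= 0."""
    return 1 if y >= 0 else 0


def train_perceptron(samples, alpha=0.25, max_epochs=100):
    """Perceptron learning: compute outputs, compare, adjust, repeat."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in samples:
            # Task 1: compute the output.
            y = step(sum(xi * wi for xi, wi in zip(x, weights)) + bias)
            # Task 2: compare it with the desired target.
            error = target - y
            # Task 3: adjust the weights (only when the output was wrong).
            if error != 0:
                mistakes += 1
                weights = [wi + alpha * error * xi for wi, xi in zip(weights, x)]
                bias += alpha * error
        if mistakes == 0:  # desired output achieved for every sample
            return weights, bias, epoch
    return weights, bias, max_epochs


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, epochs = train_perceptron(AND)
for x, target in AND:
    print(x, step(sum(xi * wi for xi, wi in zip(x, weights)) + bias), target)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop stops with every output matching its target; on XOR, the kind of case Minsky and Papert analysed, it would run until `max_epochs`.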
NN Application Development
  • Similar to the structured design methodologies of
    traditional computer-based information systems
    (IS)
  • There are 9 steps (Turban and Aronson, 2001):
  • - Collect data
  • - Separate into training and test sets
  • - Define a network structure
  • - Select a training algorithm
  • - Set parameters and values, and initialize
    weights
  • - Transform data to network inputs
  • - Start training, and determine and revise weights
  • - Stop and test
  • - Implementation: use the network with new cases
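As a small illustration of step 2, a hypothetical `split_data` helper can shuffle the collected cases and hold some back for testing. The 25% test fraction and the fixed seed (for repeatability) are assumptions, not values from Turban and Aronson.

```python
import random


def split_data(data, test_fraction=0.25, seed=0):
    """Step 2: separate collected cases into training and test sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # avoid bias from the original order
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


cases = list(range(20))                    # stand-in for collected data
train, test = split_data(cases)
print(len(train), len(test))               # 15 5
```

The test set is kept out of training entirely so that step 8 ("stop and test") measures performance on cases the network has never seen.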

What Applications Should Neural Networks Be Used For?
  • capturing associations or discovering
    regularities within a set of patterns
  • where the volume, number of variables or
    diversity of the data is very great
  • the relationships between variables are vaguely
    understood or,
  • the relationships are difficult to describe
    adequately with conventional approaches.

Mathematical Relations
Neural Network Architecture
  • Feedforward Flow
  • Algorithms: backpropagation, Madaline III
  • Neuron outputs feed forward to the subsequent
    layer
  • Problems solved: static pattern recognition,
    classification, and generalization problems
    (e.g., quality control, loan evaluation)
  • Recurrent Structure
  • Algorithms: TrueTime algorithm
  • Neuron outputs are fed back as neuron inputs
  • Problems solved: dynamic, time-dependent problems
    (e.g., sales forecasting, process analysis,
    sequence recognition, and sequence generation)
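The difference between the two flows can be sketched with forward passes only (no training): a feedforward net maps the same input to the same output every time, while a recurrent unit feeds its previous output back in, so its response depends on the input history. The weights and function names are invented, and none of the training algorithms named above (backpropagation, Madaline III, TrueTime) is implemented here.

```python
import math


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


def feedforward(x, hidden_weights, output_weights):
    """Outputs flow only forward: input -> hidden layer -> output."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in hidden_weights]
    return sigmoid(sum(h * w for h, w in zip(hidden, output_weights)))


def recurrent(sequence, w_in, w_back):
    """One recurrent unit: its previous output is fed back as an extra input."""
    state = 0.0
    outputs = []
    for x in sequence:
        state = sigmoid(w_in * x + w_back * state)
        outputs.append(state)
    return outputs


# Feedforward: the same input always gives the same output (static mapping).
hw, ow = [[0.5, -0.4], [0.3, 0.8]], [1.0, -1.0]
print(feedforward([1.0, 0.0], hw, ow) == feedforward([1.0, 0.0], hw, ow))  # True
# Recurrent: the same input (1.0) gives different outputs over time.
print(recurrent([1.0, 1.0, 1.0], w_in=1.0, w_back=1.0))
```

This history dependence is what makes recurrent structures suited to the time-dependent problems listed above.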

Topologies of ANN
(Figure: fully connected feed-forward, partially recurrent, and fully recurrent network topologies)

Advantages of ANN
  • Parallel processing
  • Distributed representations
  • Online (i.e., incremental) algorithm
  • Simple computations
  • Robust with respect to noisy data
  • Robust with respect to node failure
  • Empirically shown to work well for many problems

Disadvantages of ANN
  • Slow training
  • Poor interpretability
  • Network topology layouts are ad hoc
  • Hard to debug because distributed representations
    preclude content checking
  • May converge to a local, not global, minimum of
  • Not known how to model higher-level cognitive
  • May be hard to describe a problem in terms of
    features with numerical values

Limitations of ANN
  • Lack of explanation capability
  • Do not produce an explicit model
  • Do not perform well on tasks that people do not
    perform well
  • Require extensive training and testing data

Applications of NN
  • Because neural networks are best at identifying
    patterns or trends in data, they are well suited
    for prediction or forecasting needs, including:
  • sales forecasting
  • industrial process control
  • customer research
  • data validation
  • risk management
  • target marketing

Examples of Applications
  • NETtalk (Sejnowski and Rosenberg, 1987)
  • Maps character strings into phonemes for learning
    speech from text.
  • Neurogammon (Tesauro and Sejnowski, 1989)
  • Backgammon learning program
  • Speech recognition (Waibel, 1989)
  • Converts sound to text
  • Character recognition (Le Cun et al., 1989)
  • Face Recognition (Mitchell)
  • ALVINN (Pomerleau, 1988)

Other Issues
  • How to Set Alpha, the Learning Rate Parameter?
    Use a tuning set or cross-validation to train
    using several candidate values for alpha, and
    then select the value that gives the lowest
    error.
  • How to Estimate the Error? Use cross-validation
    (or some other evaluation method) multiple times
    with different random initial weights. Report the
    average error rate.
  • How many Hidden Layers and How many Hidden Units
    per Layer? Usually just one hidden layer is used
    (i.e., a 2-layer network). How many units should
    it contain? Too few, and the network can't learn;
    too many, and it generalizes poorly. Determine
    the number experimentally, using a tuning set or
    cross-validation to select the number that
    minimizes error.
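The tuning-set procedure for alpha can be sketched with the simplest trainable unit, a single linear neuron trained by the LMS (delta) rule, standing in for a full network. The data are hypothetical noise-free points on t = 2x + 1, the candidate values and the 50 training epochs are assumptions, and cross-validation would repeat the same comparison over several splits.

```python
def train_lms(data, alpha, epochs=50):
    """Train a single linear neuron y = w*x + b with the LMS (delta) rule."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - (w * x + b)
            w += alpha * error * x
            b += alpha * error
    return w, b


def mse(data, w, b):
    """Mean squared error of the neuron on a data set."""
    return sum((t - (w * x + b)) ** 2 for x, t in data) / len(data)


def select_alpha(train, tune, candidates):
    """Train once per candidate alpha; keep the one with the lowest tuning error."""
    errors = {}
    for alpha in candidates:
        w, b = train_lms(train, alpha)
        errors[alpha] = mse(tune, w, b)
    return min(errors, key=errors.get), errors


# Hypothetical noise-free data from the line t = 2x + 1.
train = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
tune = [(i / 10 + 0.05, 2 * (i / 10 + 0.05) + 1) for i in range(10)]
best, errors = select_alpha(train, tune, [0.001, 0.01, 0.1, 0.5])
print(best, errors)
```

The same loop can compare numbers of hidden units instead of learning rates: train one network per candidate size and keep the size with the lowest tuning error.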

Other Issues (cont..)
  • How many examples in the Training Set?
  • Under what circumstances can I be assured that a
    net that is trained to classify 1 - e/2 of the
    training set correctly will also classify 1 - e
    of the testing set correctly? Clearly, the larger
    the training set, the better the generalization,
    but the longer the training time required. To
    obtain 1 - e correct classification on the
    testing set, the training set should be of size
    approximately n/e, where n is the number of
    weights in the network and e is a fraction
    between 0 and 1. For example, if e = 0.1 and
    n = 80, then a training set of size 800 that is
    trained until 95% correct classification is
    achieved on the training set should produce 90%
    correct classification on the testing set.
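The rule of thumb above reduces to simple arithmetic; a hypothetical `training_set_size` helper makes the worked example explicit. Rounding n/e to the nearest integer is an assumption, since the slide only says "approximately".

```python
def training_set_size(n_weights, e):
    """Rule of thumb from the slide: roughly n / e training examples."""
    if not 0.0 < e < 1.0:
        raise ValueError("e must be a fraction between 0 and 1")
    return round(n_weights / e)


n, e = 80, 0.1
print(training_set_size(n, e))  # 800
print(f"train to {1 - e / 2:.0%}, expect about {1 - e:.0%} on the test set")
```

Halving e doubles the required training set, which is the trade-off between generalization and training time described above.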

Other Issues (cont..)
  • When to Stop?
  • Too much training "overfits" the data, and hence
    the error rate will go up on the testing set.
    Hence it is not usually advantageous to continue
    training until the MSE is minimized. Instead,
    train the network until the error rate on a
    tuning set starts to increase.
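Stopping when the tuning-set error starts to increase can be sketched as a scan over per-epoch tuning errors. The error values are invented, and the `patience` parameter (how many non-improving epochs to tolerate before stopping) is an added assumption; the slide's rule corresponds to `patience=1`. In practice the errors would be computed during training, and the weights saved at the best epoch would be kept.

```python
def early_stopping_epoch(tuning_errors, patience=1):
    """Return the epoch whose tuning-set error was lowest before it rose.

    tuning_errors[k] is the error on the tuning set after epoch k.
    Scanning stops once the error has failed to improve for
    `patience` consecutive epochs.
    """
    best_epoch, best_error = 0, tuning_errors[0]
    epochs_without_improvement = 0
    for epoch in range(1, len(tuning_errors)):
        if tuning_errors[epoch] < best_error:
            best_epoch, best_error = epoch, tuning_errors[epoch]
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch


# Hypothetical tuning-set errors recorded after each training epoch.
errors = [0.40, 0.31, 0.25, 0.22, 0.24, 0.27]
print(early_stopping_epoch(errors))  # 3
```

A `patience` above 1 keeps a single noisy uptick in the tuning error from ending training too early.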
