Title: Artificial Neural Network
1 Artificial Neural Network
- 1 Brief Introduction
- 2 Backpropagation Algorithm
- 3 A Simple Illustration
2 Chapter 1 Brief Introduction
- 1.2 Review of Decision Trees
- The learning process aims to reduce the error, which can be understood as the difference between the target values and the output values of the learned structure.
- The ID3 algorithm can be applied only to discrete values.
- An Artificial Neural Network (ANN) can approximate arbitrary functions.
3 - 1.3 Basic Structure
- This example of ANN learning is Pomerleau's (1993) system ALVINN, which uses a learned ANN to steer an autonomous vehicle driving at normal speeds. The input to the ANN is a 30x32 grid of pixel intensities obtained from a forward-facing camera mounted on the vehicle. The output is the direction in which the vehicle is steered.
- As can be seen, four units receive inputs directly from all of the 30x32 pixels from the camera in the vehicle. These are called hidden units because their outputs are available only to the following units in the network, not as part of the global network output.
5 - 1.4 Ability
- Instances are represented by many attribute-value pairs. The target function to be learned is defined over instances that can be described by a vector of predefined features, such as the pixel values in the ALVINN example.
- The training examples may contain errors. As the following sections show, ANN learning methods are quite robust to noise in the training data.
- Long training times are acceptable. Compared to decision tree learning, the network training algorithm requires a longer training time, depending on factors such as the number of weights in the network.
6 Chapter 2 Backpropagation Algorithm
- 2.1 Sigmoid
- Like the perceptron, the sigmoid unit first computes a linear combination of its inputs.
- Then the sigmoid unit computes its output with the following function, reconstructed below.
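The equation images did not survive this transcript. A standard reconstruction, assuming the usual notation in which $w_i$ are the unit's weights and $x_i$ its inputs, is:

$$net = \sum_i w_i x_i \qquad (1)$$

$$o = \sigma(net) = \frac{1}{1 + e^{-net}} \qquad (2)$$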
7 - This equation (2) is often referred to as the squashing function, since it maps a very large input domain onto a small range of outputs.
- This sigmoid function has a useful property: its derivative is easily expressed in terms of its output. As the following description of backpropagation shows, the algorithm makes use of this derivative.
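The derivative itself is not shown in the transcript; for the sigmoid defined in equation (2) it is the standard identity

$$\frac{d\,\sigma(net)}{d\,net} = \sigma(net)\,\big(1 - \sigma(net)\big) = o\,(1 - o),$$

which lets the algorithm compute gradients directly from unit outputs.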
8 - 2.2 Function
- The sigmoid is only one unit in the network; now we take a look at the whole function the neural network computes, illustrated in figure 2.2. If we consider an example (x, t), where x is called the input attribute and t the target attribute, then the network maps x forward through its hidden and output units to an output o(x), which learning should drive toward t (a sketch of this forward pass follows below).
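Figure 2.2 is missing from the transcript. As a stand-in, here is a minimal Python sketch of the forward pass such a layered sigmoid network computes; the layer sizes and the weight initialization are illustrative assumptions, not values from the original slides.

    import numpy as np

    def sigmoid(net):
        """Equation (2): the squashing function."""
        return 1.0 / (1.0 + np.exp(-net))

    def forward(x, W_hidden, W_out):
        """Compute the whole network function o(x) for one example x.

        Each row of a weight matrix holds one unit's weights, so a
        layer's linear combinations (equation 1) are just W @ x.
        """
        h = sigmoid(W_hidden @ x)   # hidden-unit outputs
        return sigmoid(W_out @ h)   # network outputs

    # Illustrative shapes only: 4 inputs, 3 hidden units, 2 outputs.
    rng = np.random.default_rng(0)
    W_hidden = rng.uniform(-0.05, 0.05, size=(3, 4))
    W_out = rng.uniform(-0.05, 0.05, size=(2, 3))
    x = np.array([1.0, 0.0, 0.5, 0.2])
    print(forward(x, W_hidden, W_out))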
10 - 2.3 Squared Error
- As mentioned above, the whole learning process serves to reduce the error; but how can the error be described? Generally the squared-error function is used, reconstructed below as equation (3).
- Notice that this function sums the error over all of the network's output units and over the whole set of training examples.
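Equation (3) itself is missing from the transcript; the standard squared-error definition matching this description, summing over the training set D and the output units, is

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 \qquad (3)$$

where $t_{kd}$ and $o_{kd}$ are the target and actual output values of output unit k for training example d.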
12 - Then the weight vector can be updated by

$$\vec{w} \leftarrow \vec{w} - \eta \, \nabla E(\vec{w})$$

- where $\nabla E(\vec{w})$ is the gradient of E and $\eta$ the learning rate, so each individual weight $w_k$ can be updated by

$$\Delta w_k = -\eta \, \frac{\partial E}{\partial w_k}$$
13 - But in practice, because equation (3) sums the error over the whole set of training data, the algorithm needs more computation time with this function and can easily be caught in a local minimum. One therefore constructs a new function, the stochastic squared error, reconstructed below.
- As can be seen, this function computes the error for a single example only. The gradient of $E_d(\vec{w})$ is easily derived.
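The definition did not survive the transcript; the usual per-example (stochastic) squared error, written in the same notation as equation (3), is

$$E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$$

and the weights are then updated after every single example d by

$$\Delta w_k = -\eta \, \frac{\partial E_d}{\partial w_k}.$$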
14 - 2.4 Backpropagation Algorithm
- The learning problem faced by backpropagation is to search a large hypothesis space defined by all possible weight values for all the units in the network. An outline of the algorithm follows below.
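The algorithm diagram is missing from the transcript. Below is a minimal Python sketch of the stochastic-gradient-descent version of backpropagation for one hidden layer, consistent with the error terms described on the next slide; the learning rate and the omission of bias weights and momentum are simplifying assumptions, not details from the original slides.

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    def backprop_step(x, t, W_hidden, W_out, eta=0.05):
        """One stochastic update: propagate x forward, then errors backward.

        W_hidden has shape (n_hidden, n_in), W_out has (n_out, n_hidden).
        Returns the updated weight matrices.
        """
        # Forward pass through both sigmoid layers.
        h = sigmoid(W_hidden @ x)              # hidden outputs o_h
        o = sigmoid(W_out @ h)                 # network outputs o_k

        # Error term of each output unit: delta_k = o_k(1-o_k)(t_k-o_k).
        delta_out = o * (1.0 - o) * (t - o)

        # Error term of each hidden unit:
        # delta_h = o_h(1-o_h) * sum_k w_kh * delta_k.
        delta_hidden = h * (1.0 - h) * (W_out.T @ delta_out)

        # Weight updates: w_ji <- w_ji + eta * delta_j * x_ji.
        W_out = W_out + eta * np.outer(delta_out, h)
        W_hidden = W_hidden + eta * np.outer(delta_hidden, x)
        return W_hidden, W_out

Iterating this step over all training examples until the error is acceptably small is the whole algorithm.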
16 - Notice that the error term for hidden unit h is calculated by summing the error terms $\delta_k$ for each output unit influenced by unit h, weighting each $\delta_k$ by $w_{kh}$, the weight from hidden unit h to output unit k. This weight characterizes the degree to which hidden unit h is responsible for the error in output unit k.
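Written out (a reconstruction, since the formula image is absent from the transcript), this is

$$\delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \, \delta_k.$$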
17 Chapter 3 A Simple Illustration
Now we work through an example to gain a more intuitive understanding. How does an ANN learn the simplest of functions, the identity function id? We construct the network shown in the figure: eight network input units are connected to three hidden units, which are in turn connected to eight output units. Because of this structure, the three hidden units are forced to represent the eight input values in some way that captures their relevant features, so that this hidden-layer representation can be used by the output units to compute the correct target values.
18 - This 8 x 3 x 8 network was trained to learn the identity function. After 5,000 training iterations, the three hidden-unit values encode the eight distinct inputs using the encoding shown in the table. Notice that if the encoded values are rounded to zero or one, the result is the standard binary encoding for eight distinct values (a reproduction sketch follows below).
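The encoding table itself did not survive the transcript. As a self-contained Python sketch of the experiment (the learning rate and weight initialization are assumptions, and bias weights are omitted for brevity, so the exact hidden values may differ from the original run):

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.1, 0.1, size=(3, 8))    # input -> hidden
    W2 = rng.uniform(-0.1, 0.1, size=(8, 3))    # hidden -> output

    # Eight training examples: one-hot vectors whose target is the
    # input itself (the identity function).
    X = np.eye(8)

    eta = 0.3
    for _ in range(5000):                       # 5,000 passes, as on the slide
        for x in X:
            h = sigmoid(W1 @ x)
            o = sigmoid(W2 @ h)
            d_out = o * (1 - o) * (x - o)           # output error terms
            d_hid = h * (1 - h) * (W2.T @ d_out)    # hidden error terms
            W2 += eta * np.outer(d_out, h)
            W1 += eta * np.outer(d_hid, x)

    # One row per input: its three hidden-unit values. Rounded to 0/1,
    # they should form eight distinct binary codes.
    print(np.round(sigmoid(X @ W1.T), 2))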