Neural Networks and Pattern Recognition
1
(No Transcript)
2
unit 4
Neural Networks and Pattern Recognition
Giansalvo EXIN Cirrincione
3
Single-layer networks
They compute linear discriminant functions directly from the
training set (TS), without the need to determine probability
densities.
4
Linear discriminant functions
Two classes
(d-1)-dimensional hyperplane
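The discriminant on this slide is not transcribed; in the standard notation (an assumption) it is

  y(x) = w^T x + w_0 ,

with x assigned to class C1 if y(x) \geq 0 and to C2 otherwise. The decision boundary y(x) = 0 is a (d-1)-dimensional hyperplane in the d-dimensional input space, with orientation set by w and position set by the bias w_0.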
5
Linear discriminant functions
Several classes
6
Linear discriminant functions
Several classes
The decision regions are always simply connected
and convex.
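A sketch of the several-class form (assumed, since the slide's equations are not transcribed): one linear discriminant per class,

  y_k(x) = w_k^T x + w_{k0} ,

with x assigned to class C_k when y_k(x) > y_j(x) for all j \neq k. Each decision region is then an intersection of half-spaces, which is why the regions are always simply connected and convex.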
7
Logistic discrimination
The decision boundary is still linear
  • two classes
  • Gaussians with equal covariance matrices, Σ1 = Σ2 = Σ

8
Logistic discrimination
9
Logistic discrimination
logistic sigmoid
10
Logistic discrimination
The use of the logistic sigmoid activation
function allows the outputs of the discriminant
to be interpreted as posterior probabilities.
logistic sigmoid
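The formulas behind these slides are presumably the standard ones: for two classes the posterior can be written

  P(C_1 | x) = g(a) ,   g(a) = \frac{1}{1 + \exp(-a)} ,   a = \ln \frac{p(x|C_1) P(C_1)}{p(x|C_2) P(C_2)} ,

and for Gaussian class-conditional densities with a shared covariance matrix the quantity a reduces to a linear function a = w^T x + w_0. The logistic sigmoid of the linear discriminant is therefore exactly the posterior probability P(C_1 | x).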
11
binary input vectors
Let Pki denote the probability that the input xi
takes the value 1 when the input vector is drawn
from class Ck. The corresponding probability that
xi = 0 is then 1 - Pki.
Assuming the input variables are statistically
independent, the probability for the complete
input vector is given by
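The untranscribed formula is, under the stated independence assumption, the Bernoulli product

  P(x | C_k) = \prod_i P_{ki}^{x_i} (1 - P_{ki})^{1 - x_i} .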
12
binary input vectors
Linear discriminant functions arise when we
consider input patterns in which the variables
are binary.
13
binary input vectors
Consider a set of independent binary variables
having Bernoulli class-conditional densities. For
the two-class problem
For both normally distributed and Bernoulli-distributed
class-conditional densities, the posterior probabilities are
therefore obtained from a logistic single-layer network.
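The two-class result referred to above (not transcribed on the slide) is again a logistic sigmoid acting on a linear function of the inputs; in the usual notation,

  P(C_1 | x) = g(w^T x + w_0) ,
  w_i = \ln \frac{P_{1i}(1 - P_{2i})}{P_{2i}(1 - P_{1i})} ,
  w_0 = \sum_i \ln \frac{1 - P_{1i}}{1 - P_{2i}} + \ln \frac{P(C_1)}{P(C_2)} .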
14
homework
15
(No Transcript)
16
(No Transcript)
17
Generalized discriminant functions
A generalized discriminant, built from fixed non-linear basis
functions, can approximate any CONTINUOUS functional
transformation to arbitrary accuracy.
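Its standard form (assumed here, since the slide's equation is not transcribed) is

  y_k(x) = \sum_{j=1}^{M} w_{kj} \phi_j(x) + w_{k0} ,

a linear combination of fixed non-linear basis functions \phi_j. The model remains linear in the weights, which is what keeps the training problem simple.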
18
Sum-of-squares error function
target
quadratic in the weights
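The error function itself is not transcribed; its standard form is

  E(w) = \frac{1}{2} \sum_n \sum_k \left( y_k(x^n; w) - t_k^n \right)^2 ,

where t_k^n is the target for output k on pattern n. Because each y_k is linear in the weights, E is a quadratic function of them and has a single minimum.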
19
Geometrical interpretation of least squares
20
Pseudo-inverse solution
21
Pseudo-inverse solution
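The closed-form solution referred to on these two slides is, in the usual matrix notation (assumed),

  W^T = \Phi^{\dagger} T ,   \Phi^{\dagger} = (\Phi^T \Phi)^{-1} \Phi^T ,

where the rows of \Phi are the basis-function vectors \phi^n, the rows of T are the target vectors t^n, and \Phi^{\dagger} is the pseudo-inverse of \Phi. In practice \Phi^{\dagger} is computed via an SVD rather than by forming \Phi^T \Phi explicitly.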
22
bias
The role of the biases is to compensate for the
difference between the averages (over the data
set) of the target values and the averages of the
output vectors
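Making the bias explicit in the least-squares solution (a standard result, since the slide's equation is missing) gives

  w_{k0} = \bar{t}_k - \sum_j w_{kj} \bar{x}_j ,

where \bar{t}_k and \bar{x}_j denote averages over the data set, so that the average of each output over the data set equals the average of the corresponding target.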
23
gradient descent
Group all of the parameters (weights and biases)
together to form a single weight vector w.
batch
sequential
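The two update schemes (not transcribed on the slide) take the standard forms, with learning rate \eta and E = \sum_n E^n:

  batch:       \Delta w = -\eta \, \nabla_w E       (one step uses the whole training set)
  sequential:  \Delta w = -\eta \, \nabla_w E^n     (one step per pattern n, cycling through the TS)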
24
gradient descent
Differentiable non-linear activation functions
25
gradient descent
Generate and plot a set of data points in two
dimensions, drawn from two classes each of which
is described by a Gaussian class-conditional
density function. Implement the gradient descent
algorithm for training a logistic discriminant,
and plot the decision boundary at regular
intervals during the training procedure on the
same graph as the data. Explore the effect of
choosing different values for the learning rate.
Compare the behaviour of the sequential and batch
weight update procedures.
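A minimal Python sketch of this exercise, assuming two isotropic Gaussian classes in 2-D and a logistic discriminant y = g(w^T x + w_0) trained by gradient descent on the cross-entropy error (per-pattern gradient (y - t) x); every name and constant below is an illustrative assumption, not taken from the slides.

  import numpy as np
  import matplotlib.pyplot as plt

  rng = np.random.default_rng(0)

  # Two classes, each drawn from an isotropic Gaussian in two dimensions
  N = 100
  X1 = rng.normal(loc=[-1.0, -1.0], scale=0.8, size=(N, 2))   # class C1, target t = 1
  X2 = rng.normal(loc=[ 1.0,  1.0], scale=0.8, size=(N, 2))   # class C2, target t = 0
  X = np.vstack([X1, X2])
  t = np.hstack([np.ones(N), np.zeros(N)])

  # Absorb the bias by appending a constant input of 1 to every pattern
  X_tilde = np.hstack([X, np.ones((2 * N, 1))])

  def sigmoid(a):
      return 1.0 / (1.0 + np.exp(-a))

  def train(eta=0.05, epochs=200, sequential=False):
      w = np.zeros(3)
      for epoch in range(epochs):
          if sequential:
              for n in rng.permutation(len(t)):          # update after every pattern
                  y = sigmoid(X_tilde[n] @ w)
                  w -= eta * (y - t[n]) * X_tilde[n]
          else:
              y = sigmoid(X_tilde @ w)                   # one update per pass (batch)
              w -= eta * X_tilde.T @ (y - t) / len(t)
          if epoch % 50 == 0:                            # plot the current decision boundary
              xs = np.linspace(-3, 3, 2)
              if abs(w[1]) > 1e-9:
                  plt.plot(xs, -(w[0] * xs + w[2]) / w[1], alpha=0.4)
      return w

  plt.scatter(*X1.T, marker='o', label='C1')
  plt.scatter(*X2.T, marker='x', label='C2')
  train(eta=0.05, sequential=False)    # try sequential=True and different eta values
  plt.legend()
  plt.xlim(-3, 3)
  plt.ylim(-3, 3)
  plt.show()

A large learning rate typically makes the plotted boundary oscillate between passes, while sequential updates move the boundary after every pattern; plotting the boundary at regular intervals makes the comparison visible.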
26
The perceptron
Applied to classification problems in which the
inputs are usually binary images of characters or
simple shapes
27
The perceptron
Define the error function in terms of the total
number of misclassifications over the TS.
However, an error function based on a loss matrix
is piecewise constant with respect to the weights,
so gradient descent cannot be applied.
Instead, minimize the perceptron criterion.
The criterion is continuous and piecewise linear
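The criterion referred to above has the standard form

  E^{perc}(w) = - \sum_{n \in M} w^T \phi^n \, t^n ,

where M is the set of misclassified patterns and the targets are coded t^n = +1 for class C1 and t^n = -1 for class C2. Every misclassified pattern contributes a positive term, so the criterion is non-negative, continuous and piecewise linear in w.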
28
The perceptron
Apply the sequential gradient descent rule to the
perceptron criterion
Cycle through all of the patterns in the TS and
test each pattern in turn using the current set
of weight values. If the pattern is correctly
classified do nothing, otherwise add the pattern
vector to the weight vector if the pattern is
labelled class C1 or subtract the pattern vector
from the weight vector if the pattern is labelled
class C2.
The value of the learning rate η is unimportant, since
changing it is equivalent to a re-scaling of the weights
and biases.
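A compact sketch of this update rule (illustrative; it assumes the patterns are stored in a numpy array phi of shape (N, d+1) with the bias absorbed as a constant component, and targets t coded as +1 for C1 and -1 for C2):

  import numpy as np

  def perceptron_train(phi, t, eta=1.0, max_epochs=100):
      """Sequential perceptron rule: add or subtract misclassified patterns."""
      w = np.zeros(phi.shape[1])
      for _ in range(max_epochs):
          errors = 0
          for n in range(len(t)):
              if t[n] * (phi[n] @ w) <= 0:    # misclassified (or on the boundary)
                  w += eta * t[n] * phi[n]    # add for C1 (t = +1), subtract for C2 (t = -1)
                  errors += 1
          if errors == 0:                     # all patterns correctly classified: stop
              break
      return w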
29
The perceptron
30
The perceptron convergence theorem
For any data set which is linearly separable, the
perceptron learning rule is guaranteed to find a
solution in a finite number of steps.
proof
null initial conditions
31
The perceptron convergence theorem
For any data set which is linearly separable, the
perceptron learning rule is guaranteed to find a
solution in a finite number of steps.
proof
end proof
32
The perceptron convergence theorem
33
If the data set happens not to be linearly
separable, then the learning algorithm will never
terminate. If we arbitrarily stop the learning
process, there is no guarantee that the weight
vector found will generalize well for new data.
  • decrease η during the training process
  • the pocket algorithm.

34
Limitations of the perceptron
Even though the data set of input patterns may
not be linearly separable in the input space, it
can become linearly separable in the φ-space.
However, this requires the number and complexity
of the φj to grow very rapidly (typically
exponentially).
35
Fisher's linear discriminant
optimal linear dimensionality reduction
no bias
36
Fisher's linear discriminant
Constrained optimization: w ∝ (m2 - m1)
The separation of the projected class means can be made
arbitrarily large simply by increasing the magnitude of w.
Maximize a function which represents the
difference between the projected class means,
normalized by a measure of the within-class
scatter along the direction of w.
37
Fisher's linear discriminant
The within-class scatter of the transformed data
from class Ck is described by the within-class
covariance given by
Fisher criterion
between-class covariance matrix
within-class covariance matrix
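In standard notation (the equations on this slide are not transcribed), the Fisher criterion and the two matrices are

  J(w) = \frac{w^T S_B w}{w^T S_W w} ,
  S_B = (m_2 - m_1)(m_2 - m_1)^T ,
  S_W = \sum_{n \in C_1} (x^n - m_1)(x^n - m_1)^T + \sum_{n \in C_2} (x^n - m_2)(x^n - m_2)^T ,

where m_k is the mean vector of class C_k in the input space.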
38
Fisher's linear discriminant
Generalized eigenvector problem
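Setting the derivative of J(w) to zero gives the generalized eigenvector problem S_B w = \lambda S_W w. Since S_B w always points in the direction of (m_2 - m_1), the solution (up to an irrelevant scale factor) is

  w \propto S_W^{-1} (m_2 - m_1) .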
39
Fisher's linear discriminant
EXAMPLE
40
Fisher's linear discriminant
The projected data can subsequently be used to
construct a discriminant, by choosing a threshold
y0 so that we classify a new point as belonging
to C1 if y(x) ≥ y0 and as belonging to C2
otherwise. Note that y = w^T x is the sum of a set
of random variables, so we may invoke the central
limit theorem and model the class-conditional
density functions p(y|Ck) using normal
distributions.
Once we have obtained a suitable weight vector
and a threshold, the procedure for deciding the
class of a new vector is identical to that of the
perceptron network. So, the Fisher criterion can
be viewed as a learning law for the single-layer
network.
41
Fisher's linear discriminant
relation to the least-squares approach
42
Fisher's linear discriminant
relation to the least-squares approach
Bias threshold
43
Fisher's linear discriminant
relation to the least-squares approach
44
Fisher's linear discriminant
Several classes
d' linear features
45
Fisher's linear discriminant
Several classes
46
Fisher's linear discriminant
Several classes
In the projected d'-dimensional y-space
47
Fisher's linear discriminant
Several classes
One possible criterion ...
This criterion is unable to find more than (c -
1) linear features
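One standard choice of criterion (assumed here, since the slide's formula is not transcribed) is

  J(W) = \mathrm{Tr}\{ s_W^{-1} s_B \} ,

where s_W and s_B are the within-class and between-class covariance matrices of the projected variables y = W^T x. Because s_B is built from the c class means, which are tied together by the overall mean, its rank is at most (c - 1), which is why no more than (c - 1) linear features can be found this way.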
48
END