Title: Non-Bayes classifiers.
1. Non-Bayes classifiers
- Linear discriminants
- Neural networks
2. Discriminant functions (1)
Bayes classification rule: decide $\omega_1$ if $P(\omega_1|x) > P(\omega_2|x)$, otherwise decide $\omega_2$.
Instead we might try to find a function $g(x)$ such that we decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) < 0$; such a $g$ is called a discriminant function.
- decision surface: $g(x) = 0$
3. Discriminant functions (2)
Linear discriminant function: $g(x) = w^T x + w_0$
The decision surface $g(x) = 0$ is a hyperplane.
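As a concrete illustration, a minimal NumPy sketch of classifying a point by the sign of $g(x) = w^T x + w_0$; the weight and bias values here are made up for illustration:

```python
import numpy as np

w = np.array([2.0, -1.0])  # hypothetical weight vector
w0 = 0.5                   # hypothetical bias term

def classify(x):
    g = w @ x + w0             # g(x) = w^T x + w_0
    return 1 if g > 0 else 2   # omega_1 on one side of the hyperplane, omega_2 on the other

print(classify(np.array([1.0, 1.0])))  # g = 1.5 > 0 -> class 1
```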
4. Linear discriminant: perceptron cost function
Replace $x \leftarrow [x^T, 1]^T$ and $w \leftarrow [w^T, w_0]^T$.
Thus now the decision function is $g(x) = w^T x$ and the decision
surface is $w^T x = 0$.
Perceptron cost function:
$J(w) = \sum_{x \in Y} \delta_x w^T x$,
where $Y$ is the set of samples misclassified by $w$, and $\delta_x = -1$ if $x \in \omega_1$, $\delta_x = +1$ if $x \in \omega_2$.
5. Linear discriminant: perceptron cost function
Perceptron cost function: $J(w) = \sum_{x \in Y} \delta_x w^T x$
The value of $J(w)$ is proportional to the sum of the
distances of all misclassified samples to the
decision surface.
If the discriminant function separates the classes
perfectly, then $J(w) = 0$. Otherwise $J(w) > 0$,
and we want to minimize it.
$J(w)$ is continuous and piecewise linear, so
we might try to use a gradient descent algorithm.
6. Linear discriminant: perceptron algorithm
Gradient descent: $w(t+1) = w(t) - \rho_t \frac{\partial J(w)}{\partial w}$
At points where $J$ is differentiable, $\frac{\partial J(w)}{\partial w} = \sum_{x \in Y} \delta_x x$.
Thus $w(t+1) = w(t) - \rho_t \sum_{x \in Y} \delta_x x$.
The perceptron algorithm converges when the classes are
linearly separable, under some conditions on the step sizes $\rho_t$.
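A short sketch of the batch update above, assuming NumPy, labels coded as +1/-1 (so that $\delta_x = -y$), samples already extended by the bias trick of slide 4, and an illustrative fixed step size:

```python
import numpy as np

def perceptron_train(X, y, rho=0.1, max_epochs=1000):
    """X: (n, d) samples as rows, each with a trailing 1 appended
    (bias trick); y: (n,) labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mis = y * (X @ w) <= 0   # the set Y of misclassified samples
        if not mis.any():        # J(w) = 0: classes perfectly separated
            return w
        # w(t+1) = w(t) - rho * sum_{x in Y} delta_x x, with delta_x = -y
        w += rho * (y[mis][:, None] * X[mis]).sum(axis=0)
    return w  # epoch cap hit: classes may not be linearly separable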
7. Sum of error squares estimation
Let $y(x)$ denote the desired
output: $+1$ for one class and $-1$ for the
other.
We want to find a discriminant function whose output
is similar to $y(x)$.
Use the sum of error squares as the similarity criterion:
$J(w) = \sum_i (y_i - w^T x_i)^2$
8. Sum of error squares estimation
Minimize the mean square error: $\frac{\partial J(w)}{\partial w} = -2 \sum_i (y_i - w^T x_i)\, x_i = 0$
Thus $w = \left( \sum_i x_i x_i^T \right)^{-1} \sum_i x_i y_i$.
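This closed-form solution can be computed directly from the normal equations; a small sketch assuming NumPy, with the samples stacked as the rows of a matrix $X$ so that $\sum_i x_i x_i^T = X^T X$:

```python
import numpy as np

def least_squares_weights(X, y):
    """X: (n, d) samples as rows (bias trick applied); y: (n,) desired
    outputs in {+1, -1}. Solves the normal equations X^T X w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Using np.linalg.solve avoids forming the matrix inverse explicitly, which is cheaper and numerically more stable.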
9. Neurons
10. Artificial neuron
The figure above represents an artificial neuron
calculating $y = f\left( \sum_k w_k x_k + w_0 \right)$.
11. Artificial neuron
Threshold functions f:
- Step function: $f(v) = 1$ if $v > 0$, $f(v) = 0$ otherwise
- Logistic function: $f(v) = \frac{1}{1 + e^{-av}}$
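Both threshold functions in a short NumPy sketch; the slope parameter a of the logistic is often simply fixed to 1:

```python
import numpy as np

def step(v):
    """Step threshold: 1 if v > 0, else 0."""
    return np.where(v > 0, 1.0, 0.0)

def logistic(v, a=1.0):
    """Logistic threshold: a smooth, differentiable squashing of v."""
    return 1.0 / (1.0 + np.exp(-a * v))
```

The differentiability of the logistic is what makes gradient-based training (backpropagation, below) possible.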
12. Combining artificial neurons
Multilayer perceptron with 3 layers.
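As a worked example of what combining neurons buys, a sketch of a small multilayer perceptron of step units whose weights are hand-picked here purely for illustration: it computes XOR, a function no single linear neuron can represent:

```python
import numpy as np

def step(v):
    return np.where(v > 0, 1.0, 0.0)

# Hidden layer computes OR and AND of the inputs (last column is the bias);
# the output neuron fires for "OR and not AND", i.e. XOR.
W1 = np.array([[1.0, 1.0, -0.5],    # x1 + x2 - 0.5 > 0  <=>  OR
               [1.0, 1.0, -1.5]])   # x1 + x2 - 1.5 > 0  <=>  AND
W2 = np.array([[1.0, -2.0, -0.5]])  # OR - 2*AND - 0.5 > 0  <=>  XOR

def mlp(x):
    h = step(W1 @ np.append(x, 1.0))        # first layer
    return step(W2 @ np.append(h, 1.0))[0]  # second layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mlp(np.array(x, dtype=float)))  # prints the XOR truth table
```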
14. Discriminating ability of multilayer perceptron
Since a 3-layer perceptron can approximate any
smooth function, it can approximate
$g(x) = P(\omega_1|x) - P(\omega_2|x)$, the
optimal discriminant function of two classes.
15. Training of multilayer perceptron
(figure: neurons with threshold function f arranged in consecutive layers r-1 and r)
16. Training and cost function
Desired network output: $y(i)$
Trained network output: $\hat{y}(i)$
Cost function for one training sample: $E(i) = \frac{1}{2} \sum_j \left( \hat{y}_j(i) - y_j(i) \right)^2$
Total cost function: $J = \sum_i E(i)$
Goal of the training: find the values of the weights $w$
which minimize the cost function $J$.
17. Gradient descent
Denote by $w_j^r$ the weight vector of neuron $j$ in layer $r$.
Gradient descent: $w_j^r(\text{new}) = w_j^r(\text{old}) - \rho \frac{\partial J}{\partial w_j^r}$
Since $J = \sum_i E(i)$, we might want to
update the weights after processing each training
sample separately.
18. Gradient descent
Chain rule for differentiating composite
functions:
$\frac{\partial E(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r} \frac{\partial v_j^r}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r} \, y^{r-1}(i)$,
where $v_j^r(i) = (w_j^r)^T y^{r-1}(i)$ is the weighted input of neuron $j$ in layer $r$.
Denote $\delta_j^r(i) = \frac{\partial E(i)}{\partial v_j^r}$.
19. Backpropagation
If $r = L$, then $\delta_j^L(i) = e_j(i) \, f'(v_j^L(i))$, where $e_j(i) = \hat{y}_j(i) - y_j(i)$.
If $r < L$, then $\delta_j^{r-1}(i) = \left( \sum_k \delta_k^r(i) \, w_{kj}^r \right) f'(v_j^{r-1}(i))$.
20. Backpropagation algorithm
- Initialization: initialize all weights with
random values.
- Forward computations: for each training vector
$x(i)$ compute all $v_j^r(i)$ and $y_j^r(i)$.
- Backward computations: for each $i$, $j$ and $r = L,
L-1, \ldots, 2$ compute $\delta_j^r(i)$.
- Update weights: $w_j^r(\text{new}) = w_j^r(\text{old}) - \rho \sum_i \delta_j^r(i) \, y^{r-1}(i)$.
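Putting the four steps together, a compact sketch of the algorithm for a network with one hidden layer, assuming NumPy, logistic units (for which $f'(v) = f(v)(1 - f(v))$), the squared-error cost of slide 16, and per-sample updates as suggested on slide 17; layer sizes, step size, and epoch count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_mlp(X, T, hidden=4, rho=0.5, epochs=2000):
    """X: (n, d) inputs; T: (n, m) desired outputs in (0, 1)."""
    d, m = X.shape[1], T.shape[1]
    # Initialization: random weights, one bias column per layer.
    W1 = rng.normal(scale=0.5, size=(hidden, d + 1))
    W2 = rng.normal(scale=0.5, size=(m, hidden + 1))
    for _ in range(epochs):
        for x, t in zip(X, T):  # update after each training sample
            # Forward computations: v_j^r and y_j^r for each layer.
            x1 = np.append(x, 1.0)
            y1 = logistic(W1 @ x1)
            y1b = np.append(y1, 1.0)
            y2 = logistic(W2 @ y1b)
            # Backward computations.
            # r = L: delta^L = e * f'(v^L), with e = y_hat - y_desired.
            d2 = (y2 - t) * y2 * (1.0 - y2)
            # r < L: delta^{r-1} = (W^T delta^r) * f'(v^{r-1}); bias column dropped.
            d1 = (W2[:, :-1].T @ d2) * y1 * (1.0 - y1)
            # Update weights: w <- w - rho * delta * y^{r-1}.
            W2 -= rho * np.outer(d2, y1b)
            W1 -= rho * np.outer(d1, x1)
    return W1, W2

# Usage: learn XOR; after training the network outputs should approach the targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_mlp(X, T)
```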
21. MLP issues
- What is the best network configuration?
- How to choose a proper learning parameter $\rho$?
- When should training be stopped?
- Should we choose another threshold function $f$ or cost
function $J$?