Title: Linear Discriminant Functions
Chapter 5: Linear Discriminant Functions, The Perceptron Model, The Gradient Descent Procedures
Linear Discriminant Functions
- Objective: Designing discriminant functions that are linear in x and that define decision boundaries
- How: By formulating the problem as one of minimizing a criterion function, based on the perceptron model
- Perceptron what? You'll see!
- What criterion function? A cost function to be minimized, such as the training error
- Is it difficult? Yes
- Why? Because small training error need not mean small test error!
- So what do we do? Use gradient descent optimization approaches. Not guaranteed, but the de facto standard in engineering optimization problems
- Why should I care? Because it will be on the exam... that you will take over and over again after you graduate!!!
The Perceptron Model
Perceptron Decision Boundary
Multicategory Case
The Solution Vector, The Solution Region, Augmented Vectors
The Gradient Descent
Homework 5
- Implement the Parzen window density estimation using the Gaussian window function in 1 dimension. Test it on a number of distributions. You can generate random numbers from different distributions using the data generation commands in the statistics toolbox. Then modify your algorithm for 2 dimensions (modify Vn accordingly).
- Implement Algorithms 1 and 2 in your textbook for PNN.
- Computer exercises 1 and 2 from Chapter 4.
- Reading Assignment: Chapter 4, pp. 172-187 and 195-197 (which you have already read, of course, for last week's class), and Chapter 5, pp. 215-227.
- Yes, there will be (yet another) quiz on Wednesday!!!
The Gradient Descent
- So how do we find the appropriate solution region / vector that satisfies aᵀyᵢ > 0?
- We define a criterion function (again), J(a), and minimize it such that a is the solution vector
- This reduces the problem of a massive search into a problem of minimizing a scalar function
- How do we minimize J(a)? you ask:
- Start at some arbitrary point a₁, and compute the corresponding J(a₁)
- Compute the gradient (what else?) of J(a₁): ∇J(a₁)
- Obtain the next point a₂ by moving in the direction of the negative gradient, −∇J(a₁), by some amount η (the learning rate)
Gradient Descent, De-Mystified!

[Figure: the criterion function J(a) plotted against a; starting from a₁, successive points a₂, a₃ are obtained by stepping along −η₁∇J(a₁), −η₂∇J(a₂) down toward the minimum of J]

begin initialize a, threshold θ, learning rate η(k), k ← 0
    do k ← k + 1
        a ← a − η(k)·∇J(a)
    until |η(k)·∇J(a)| < θ
    return a
end
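As a concrete illustration, here is a minimal Python sketch of the loop above, applied to a simple quadratic criterion. The quadratic J, the fixed learning rate, and the threshold value are illustrative assumptions, not from the slides:

```python
import numpy as np

def gradient_descent(grad_J, a, eta=0.1, theta=1e-6, max_iter=1000):
    """Basic gradient descent: step along -grad J(a) until the update
    eta * grad_J(a) falls below the threshold theta."""
    for k in range(max_iter):
        step = eta * grad_J(a)
        a = a - step
        if np.linalg.norm(step) < theta:
            break
    return a

# Example: J(a) = ||a - c||^2 has gradient 2*(a - c) and minimum at a = c
c = np.array([3.0, -1.0])
a_star = gradient_descent(lambda a: 2 * (a - c), a=np.zeros(2))
print(a_star)  # approximately [3.0, -1.0]
```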
Some Issues to Consider
- How to choose the learning rate η? (one standard answer is sketched after this list)
- What should be the criterion function?
- Local / global minimum?
- When to terminate?
- There are many forms of gradient descent that address these issues: Newton's descent, the momentum term, etc., etc., etc.
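For the learning-rate question, the standard second-order analysis (as in Duda, Hart & Stork's treatment of gradient descent; the formula is not spelled out on the slide) gives the step size that minimizes a quadratic approximation of J along the negative gradient:

```latex
% Step size minimizing the second-order Taylor expansion of J along
% the negative gradient; H is the Hessian of J at the current a
\eta(k) = \frac{\|\nabla J\|^{2}}{\nabla J^{T} H \, \nabla J}
```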
Newton's Descent

[Figure: Red: simple gradient descent; Black: Newton's descent]
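For reference, Newton's descent replaces the scalar learning rate with the inverse Hessian, which is why it takes the more direct (black) path; this is the standard form of the rule, stated here since the slide itself only shows the figure:

```latex
% Newton's descent step: rescale the gradient by the inverse of the
% Hessian matrix H of second derivatives of J(a)
a \leftarrow a - H^{-1} \nabla J(a)
```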
The Criterion Function
- What should be the criterion function?
- The obvious choice is the number of misclassified training samples, but this is a discontinuous (piecewise-constant) function, hence not differentiable.
- A better choice: the perceptron criterion function
  Jₚ(a) = Σ_{y ∈ Yk} (−aᵀy), where Yk is the set of samples misclassified by a(k)
- Geometrically, this is the summation of distances from the misclassified samples to the decision boundary.
- Then, since ∇Jₚ = Σ_{y ∈ Yk} (−y), the gradient descent update becomes a(k+1) = a(k) + η(k) Σ_{y ∈ Yk} y (see the sketch after this list)
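A minimal Python sketch of the perceptron criterion and its gradient; the convention that samples are in augmented, label-normalized form (so misclassification means aᵀy ≤ 0) follows the textbook, but the specific numbers are illustrative assumptions:

```python
import numpy as np

def perceptron_criterion(a, Y):
    """Return J_p(a) = sum over misclassified y of (-a^T y) and its
    gradient; a sample y (augmented and normalized by its label) is
    misclassified when a^T y <= 0."""
    mis = Y[Y @ a <= 0]                        # samples misclassified by a
    return -(mis @ a).sum(), -mis.sum(axis=0)  # J_p(a), grad J_p(a)

# Toy augmented samples (first component is the bias input 1)
Y = np.array([[1.0, 2.0], [1.0, 0.5], [1.0, -0.3]])
J_p, grad = perceptron_criterion(np.array([0.0, 1.0]), Y)
print(J_p, grad)  # only the third sample is misclassified here
```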
The Criterion Functions

[Figure: four candidate criterion functions plotted over the weight space]
- Number of patterns misclassified: bad
- Perceptron criterion function: good!
- Total squared error (TSE): better
- TSE with margin: best, but can be computationally expensive (common definitions are sketched after this list)
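The slide names the squared-error criteria without defining them; the forms below follow Duda, Hart & Stork's squared-error and margin criteria and should be treated as an assumption about what the figure shows:

```latex
% Squared error over the misclassified set Y(a) (a smooth relative of J_p):
J_q(a) = \sum_{y \in \mathcal{Y}(a)} (a^{T} y)^{2}
% Squared error relative to a margin b (smooth, with its minimum away
% from the decision boundary):
J_r(a) = \frac{1}{2} \sum_{y:\, a^{T} y \le b} \frac{(a^{T} y - b)^{2}}{\|y\|^{2}}
```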
The Batch Perceptron Algorithm
- Training data is cycled through until the error falls below a threshold

The Batch Perceptron Algorithm for finding a solution vector: the next weight vector is obtained by adding some multiple of the sum of the misclassified samples to the present weight vector, a(k+1) = a(k) + η(k) Σ_{y ∈ Yk} y
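A runnable sketch of the batch perceptron in Python; the toy data, the fixed learning rate, and the stopping threshold are illustrative assumptions:

```python
import numpy as np

def batch_perceptron(Y, eta=1.0, theta=1e-8, max_iter=1000):
    """Batch perceptron: add eta times the sum of the currently
    misclassified (augmented, normalized) samples to the weight vector
    until the update becomes negligible."""
    a = np.zeros(Y.shape[1])
    for k in range(max_iter):
        mis = Y[Y @ a <= 0]              # samples misclassified by a(k)
        update = eta * mis.sum(axis=0)
        a = a + update
        if np.linalg.norm(update) < theta:
            break
    return a

# Linearly separable toy data, already normalized by their class labels
Y = np.array([[1.0, 2.0, 1.0], [1.0, 1.0, 2.0], [-1.0, 1.0, -2.0]])
a = batch_perceptron(Y)
print(a, Y @ a)  # all components of Y @ a should end up positive
```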
The Batch Perceptron Algorithm

[Figure: the error surface over weight space, with the weight trajectory starting at a(1) = 0 and descending to the bottom of the error surface]
Single / Multi-Layer

[Figure: a single-layer network vs. a two-layer network]
The Multilayer Perceptron
Applications
- Ultrasonic Weld Inspection
Weld Inspection
Feature Extraction / Selection
- Discrete Wavelet Transform
DWT of a UT Signal

[Figure: DWT of an ultrasonic signal; f_t = 1 MHz, f_s = 10 MHz]
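As an illustration of DWT-based feature extraction, the sketch below uses the PyWavelets package on a synthetic 1 MHz tone burst sampled at 10 MHz; the wavelet choice ('db4'), the decomposition level, and the synthetic signal itself are assumptions, since the slides do not specify an implementation:

```python
import numpy as np
import pywt

# Synthetic stand-in for a UT A-scan: a 1 MHz tone burst sampled at 10 MHz
fs, ft = 10e6, 1e6
t = np.arange(0, 50e-6, 1 / fs)
signal = np.sin(2 * np.pi * ft * t) * np.exp(-((t - 25e-6) ** 2) / (5e-6) ** 2)

# Multilevel DWT; subband coefficients (or their energies) serve as features
coeffs = pywt.wavedec(signal, 'db4', level=4)
features = [np.sum(c ** 2) for c in coeffs]  # energy per subband
print(features)
```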
Gas Sensing

[Figure: piezoelectric crystal gas sensor: the central part of the crystal is coated first with gold and then with the polymer material; electrodes on the front and back of the crystal; crystal holder]
Homework
- Implement computer exercises 2, 3, and 4