Title: Least-Mean-Square Algorithm
1. Least-Mean-Square Algorithm
- CS/CMPE 537 Neural Networks
2. Linear Adaptive Filter
- A linear adaptive filter (LAF) performs a linear transformation of a signal according to a performance measure that is minimized or maximized
- The development of LAFs followed the work of Rosenblatt (the perceptron) and other early neural network researchers
- LAFs can be considered linear single-layer feedforward neural networks
- The least-mean-square (LMS) algorithm is a popular learning algorithm for LAFs (and for linear single-layer networks)
- Wide applicability
  - Signal processing
  - Control
3. Historical Note
- Linear associative memory (early 1970s)
  - Function: memory by association
  - Type: linear single-layer feedforward network
- Perceptron (late 1950s, early 1960s)
  - Function: pattern classification
  - Type: nonlinear single-layer feedforward network
- Linear adaptive filter or Adaline (1960s)
  - Function: adaptive signal processing
  - Type: linear single-layer feedforward network
4. Spatial Filter
5. Wiener-Hopf Equations (1)
- The goal is to find the optimum weights that minimize the difference between the system output y and some desired response d in the mean-square sense
- System equations (see the sketch below)
  - $y = \sum_{k=1}^{p} w_k x_k$
  - $e = d - y$
- Performance measure or cost function
  - $J = \frac{1}{2} E[e^2]$, where $E$ is the expectation operator
- Find the optimum weights for which J is a minimum
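A minimal sketch of the spatial-filter equations above in Python; the weight, input, and desired-response values are illustrative placeholders, not values from the slides:

```python
import numpy as np

# Spatial filter: the output y is a weighted sum of the p inputs,
# and the error e is the deviation from the desired response d.
# All numerical values below are illustrative placeholders.
p = 3
w = np.array([0.2, -0.5, 0.1])   # weights w_1 ... w_p
x = np.array([1.0, 0.3, -0.7])   # inputs  x_1 ... x_p
d = 0.4                          # desired response

y = np.dot(w, x)       # y = sum_k w_k x_k
e = d - y              # e = d - y
J_sample = 0.5 * e**2  # one sample of the squared error; J = 0.5 E[e^2]
print(y, e, J_sample)
```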
6. Wiener-Hopf Equations (2)
- Substituting and simplifying
  - $J = \frac{1}{2} E[d^2] - E\left[\sum_{k=1}^{p} w_k x_k d\right] + \frac{1}{2} E\left[\sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k x_j x_k\right]$
- Noting that expectation is a linear operator and each w a constant
  - $J = \frac{1}{2} E[d^2] - \sum_{k=1}^{p} w_k E[x_k d] + \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k E[x_j x_k]$
- Let
  - $r_d = E[d^2]$, $\quad r_{dx}(k) = E[d x_k]$, $\quad r_x(j, k) = E[x_j x_k]$
- Then
  - $J = \frac{1}{2} r_d - \sum_{k=1}^{p} w_k r_{dx}(k) + \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k r_x(j, k)$
- To find the optimum weights
  - $\nabla_{w_k} J = \frac{\partial J}{\partial w_k} = 0, \quad k = 1, 2, \ldots, p$
  - $\nabla_{w_k} J = -r_{dx}(k) + \sum_{j=1}^{p} w_j r_x(j, k)$
7. Wiener-Hopf Equations (3)
- Let $w_{ok}$ be the optimum weights; then
  - $\sum_{j=1}^{p} w_{oj} r_x(j, k) = r_{dx}(k), \quad k = 1, 2, \ldots, p$
- This system of equations is known as the Wiener-Hopf equations. Its solution yields the optimum weights of the Wiener filter (spatial filter)
- Solving the Wiener-Hopf equations requires inverting the autocorrelation matrix $r_x(j, k)$, which can be computationally expensive (see the sketch below)
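As a rough illustration of this step, the sketch below estimates $r_x(j, k)$ and $r_{dx}(k)$ from synthetic data and solves the resulting Wiener-Hopf system with a linear solver; the data-generating model, noise level, and sample size are assumptions, not part of the slides:

```python
import numpy as np

# Solve the Wiener-Hopf equations R_x w_o = r_dx for the optimum weights.
# R_x and r_dx are sample estimates from synthetic data (illustrative only).
rng = np.random.default_rng(0)
N, p = 10_000, 3
X = rng.standard_normal((N, p))                  # rows are input vectors x(n)
w_true = np.array([0.5, -1.0, 2.0])              # assumed "environment"
d = X @ w_true + 0.1 * rng.standard_normal(N)    # desired response

R_x = (X.T @ X) / N     # autocorrelation matrix, entries r_x(j, k)
r_dx = (X.T @ d) / N    # cross-correlation vector, entries r_dx(k)

w_o = np.linalg.solve(R_x, r_dx)   # optimum (Wiener) weights
print(w_o)                         # close to w_true
```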
8. Method of Steepest Descent (1)
9. Method of Steepest Descent (2)
- Iteratively move in the direction of steepest descent (opposite the gradient direction) until the minimum is reached, approximately
- Let $w_k(n)$ be the weight at iteration n. Then the gradient at iteration n is
  - $\nabla_{w_k} J(n) = -r_{dx}(k) + \sum_{j=1}^{p} w_j(n) r_x(j, k)$
- The adjustment applied to $w_k(n)$ at iteration n is given by (see the sketch below)
  - $\Delta w_k(n) = -\eta \nabla_{w_k} J(n) = \eta \left[ r_{dx}(k) - \sum_{j=1}^{p} w_j(n) r_x(j, k) \right]$
  - $\eta$: positive learning-rate parameter
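A small sketch of the steepest-descent iteration, assuming the statistics $R_x$ and $r_{dx}$ are known exactly; the particular matrices, the learning rate, and the iteration count are illustrative choices:

```python
import numpy as np

# Steepest descent on J with known statistics: w <- w - eta * grad J(w),
# where grad J = -r_dx + R_x w. Values below are illustrative.
R_x = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.1],
                [0.0, 0.1, 1.0]])     # autocorrelation matrix r_x(j, k)
r_dx = np.array([0.5, -0.3, 0.8])     # cross-correlation vector r_dx(k)
eta = 0.1                             # positive learning-rate parameter
w = np.zeros(3)                       # initial weights w(0)

for n in range(200):
    grad = -r_dx + R_x @ w            # gradient of J at iteration n
    w = w - eta * grad                # move opposite the gradient

print(w)   # approaches the Wiener solution np.linalg.solve(R_x, r_dx)
```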
10. Method of Steepest Descent (3)
- The cost function $J(n) = \frac{1}{2} E[e^2(n)]$ is the ensemble average of all squared errors at instant n, drawn from a population of identical filters
- An identical update rule can be derived when the cost function is $J = \frac{1}{2} \sum_{i=1}^{n} e^2(i)$
- The method of steepest descent requires knowledge of the environment; specifically, the terms $r_{dx}(k)$ and $r_x(j, k)$ must be known
- What happens in an unknown environment?
  - Use estimates → least-mean-square algorithm
11. Least-Mean-Square Algorithm (1)
- The LMS algorithm is based on instantaneous estimates of $r_x(j, k)$ and $r_{dx}(k)$
  - $r_x(j, k; n) = x_j(n) x_k(n)$
  - $r_{dx}(k; n) = x_k(n) d(n)$
- Substituting these estimates, the update rule becomes
  - $w_k(n+1) = w_k(n) + \eta \left[ x_k(n) d(n) - \sum_{j=1}^{p} w_j(n) x_j(n) x_k(n) \right]$
  - $w_k(n+1) = w_k(n) + \eta \left[ d(n) - \sum_{j=1}^{p} w_j(n) x_j(n) \right] x_k(n)$
  - $w_k(n+1) = w_k(n) + \eta \left[ d(n) - y(n) \right] x_k(n), \quad k = 1, 2, \ldots, p$
- This is also known as the delta rule or the Widrow-Hoff rule (see the sketch below)
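A sketch of the LMS (delta / Widrow-Hoff) update operating on one sample at a time; the synthetic data stream, noise level, and learning rate are assumptions for illustration:

```python
import numpy as np

# LMS: w(n+1) = w(n) + eta * (d(n) - y(n)) * x(n), using only the current sample.
# The environment (w_true) is unknown to the filter; w starts at zero.
rng = np.random.default_rng(1)
p, eta = 3, 0.01
w = np.zeros(p)                                    # w(0) = 0
w_true = np.array([0.5, -1.0, 2.0])                # hidden "environment"

for n in range(5000):
    x = rng.standard_normal(p)                     # input x(n)
    d = w_true @ x + 0.05 * rng.standard_normal()  # desired response d(n)
    y = w @ x                                      # filter output y(n)
    w = w + eta * (d - y) * x                      # LMS / delta-rule update

print(w)   # adapts toward w_true
```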
12. LMS Algorithm (2)
13. LMS vs. Method of Steepest Descent

| LMS | Steepest Descent |
| --- | --- |
| Can operate in an unknown environment (e.g., starting from w(0) = 0) | Cannot operate in an unknown environment (r_x and r_dx must be known) |
| Can operate in stationary and non-stationary environments (optimum seeking and tracking) | Can operate in a stationary environment only (no adaptation or tracking) |
| Minimizes the instantaneous squared error | Minimizes the mean-square error (or sum of squared errors) |
| Stochastic | Deterministic |
| Approximate | Exact |
14. Adaline (1)
15. Adaline (2)
- Adaline (adaptive linear element) is an adaptive signal processing / pattern classification machine that uses the LMS algorithm. It was developed by Widrow and Hoff
- Inputs x are either -1 or +1, the threshold is between 0 and 1, and the output is either -1 or +1
- The LMS algorithm is used to determine the weights. Instead of the output y, the net input u is used in the error computation, i.e., $e = d - u$ (because y is quantized in the Adaline); see the sketch below
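A toy Adaline sketch trained with the LMS rule, where the error uses the net input u rather than the quantized output y; the tiny dataset (logical AND in ±1 encoding), the bias handling, and the learning rate are illustrative assumptions:

```python
import numpy as np

# Adaline: linear combiner + sign quantizer; weights learned with LMS on e = d - u.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)  # +/-1 inputs
d = np.array([-1, -1, -1, 1], dtype=float)   # desired +/-1 outputs (AND)
eta = 0.1
w = np.zeros(2)
b = 0.0                                      # threshold / bias term

for epoch in range(50):
    for x, target in zip(X, d):
        u = w @ x + b                        # net input (before quantization)
        e = target - u                       # error uses u, not the quantized output
        w += eta * e * x                     # LMS weight update
        b += eta * e

y = np.sign(X @ w + b)                       # quantized Adaline outputs
print(w, b, y)                               # y matches d for this separable problem
```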