Title: Least-Mean-Square Algorithm
1. Least-Mean-Square Algorithm
- CS/CMPE 537 Neural Networks
2. Linear Adaptive Filter
- A linear adaptive filter (LAF) performs a linear transformation of a signal according to a performance measure that is minimized or maximized
- The development of LAFs followed the work of Rosenblatt (the perceptron) and other early neural network researchers
- LAFs can be considered linear single-layer feedforward neural networks
- The least-mean-square (LMS) algorithm is a popular learning algorithm for LAFs (and for linear single-layer networks)
- Wide applicability
  - Signal processing
  - Control
3. Historical Note
- Linear associative memory (early 1970s)
  - Function: memory by association
  - Type: linear single-layer feedforward network
- Perceptron (late 1950s, early 1960s)
  - Function: pattern classification
  - Type: nonlinear single-layer feedforward network
- Linear adaptive filter or Adaline (1960s)
  - Function: adaptive signal processing
  - Type: linear single-layer feedforward network
4. Spatial Filter
5. Wiener-Hopf Equations (1)
- The goal is to find the optimum weights that minimize the difference between the system output y and some desired response d in the mean-square sense
- System equations (see the sketch below)
  - $y = \sum_{k=1}^{p} w_k x_k$
  - $e = d - y$
- Performance measure or cost function
  - $J = \frac{1}{2} E[e^2]$, where $E$ is the expectation operator
- Find the optimum weights for which J is a minimum
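A minimal sketch of the spatial-filter equations above in Python; the weight, input, and desired-response values are illustrative placeholders, not values from the slides:

```python
import numpy as np

# Spatial filter: the output y is a weighted sum of the p inputs,
# and the error e is the deviation from the desired response d.
# All numerical values below are illustrative placeholders.
p = 3
w = np.array([0.2, -0.5, 0.1])   # weights w_1 ... w_p
x = np.array([1.0, 0.3, -0.7])   # inputs  x_1 ... x_p
d = 0.4                          # desired response

y = np.dot(w, x)       # y = sum_k w_k x_k
e = d - y              # e = d - y
J_sample = 0.5 * e**2  # one sample of the squared error; J = 0.5 E[e^2]
print(y, e, J_sample)
```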
6. Wiener-Hopf Equations (2)
- Substituting and simplifying
  - $J = \frac{1}{2} E[d^2] - E\left[\sum_{k=1}^{p} w_k x_k d\right] + \frac{1}{2} E\left[\sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k x_j x_k\right]$
- Noting that expectation is a linear operator and each w a constant
  - $J = \frac{1}{2} E[d^2] - \sum_{k=1}^{p} w_k E[x_k d] + \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k E[x_j x_k]$
- Let
  - $r_d = E[d^2]$, $\quad r_{dx}(k) = E[d x_k]$, $\quad r_x(j, k) = E[x_j x_k]$
- Then
  - $J = \frac{1}{2} r_d - \sum_{k=1}^{p} w_k r_{dx}(k) + \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k r_x(j, k)$
- To find the optimum weights
  - $\nabla_{w_k} J = \frac{\partial J}{\partial w_k} = 0, \quad k = 1, 2, \ldots, p$
  - $\nabla_{w_k} J = -r_{dx}(k) + \sum_{j=1}^{p} w_j r_x(j, k)$
7. Wiener-Hopf Equations (3)
- Let $w_{ok}$ be the optimum weights; then
  - $\sum_{j=1}^{p} w_{oj} r_x(j, k) = r_{dx}(k), \quad k = 1, 2, \ldots, p$
- This system of equations is known as the Wiener-Hopf equations. Its solution yields the optimum weights of the Wiener filter (spatial filter)
- Solving the Wiener-Hopf equations requires inverting the autocorrelation matrix $r_x(j, k)$, which can be computationally expensive (see the sketch below)
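As a rough illustration of this step, the sketch below estimates $r_x(j, k)$ and $r_{dx}(k)$ from synthetic data and solves the resulting Wiener-Hopf system with a linear solver; the data-generating model, noise level, and sample size are assumptions, not part of the slides:

```python
import numpy as np

# Solve the Wiener-Hopf equations R_x w_o = r_dx for the optimum weights.
# R_x and r_dx are sample estimates from synthetic data (illustrative only).
rng = np.random.default_rng(0)
N, p = 10_000, 3
X = rng.standard_normal((N, p))                  # rows are input vectors x(n)
w_true = np.array([0.5, -1.0, 2.0])              # assumed "environment"
d = X @ w_true + 0.1 * rng.standard_normal(N)    # desired response

R_x = (X.T @ X) / N     # autocorrelation matrix, entries r_x(j, k)
r_dx = (X.T @ d) / N    # cross-correlation vector, entries r_dx(k)

w_o = np.linalg.solve(R_x, r_dx)   # optimum (Wiener) weights
print(w_o)                         # close to w_true
```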
8. Method of Steepest Descent (1)
9. Method of Steepest Descent (2)
- Iteratively move in the direction of steepest descent (opposite the gradient direction) until the minimum is reached, approximately
- Let $w_k(n)$ be the weight at iteration n. Then the gradient at iteration n is
  - $\nabla_{w_k} J(n) = -r_{dx}(k) + \sum_{j=1}^{p} w_j(n) r_x(j, k)$
- The adjustment applied to $w_k(n)$ at iteration n is given by (see the sketch below)
  - $\Delta w_k(n) = -\eta \nabla_{w_k} J(n) = \eta \left[ r_{dx}(k) - \sum_{j=1}^{p} w_j(n) r_x(j, k) \right]$
  - $\eta$: positive learning-rate parameter
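A small sketch of the steepest-descent iteration, assuming the statistics $R_x$ and $r_{dx}$ are known exactly; the particular matrices, the learning rate, and the iteration count are illustrative choices:

```python
import numpy as np

# Steepest descent on J with known statistics: w <- w - eta * grad J(w),
# where grad J = -r_dx + R_x w. Values below are illustrative.
R_x = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.1],
                [0.0, 0.1, 1.0]])     # autocorrelation matrix r_x(j, k)
r_dx = np.array([0.5, -0.3, 0.8])     # cross-correlation vector r_dx(k)
eta = 0.1                             # positive learning-rate parameter
w = np.zeros(3)                       # initial weights w(0)

for n in range(200):
    grad = -r_dx + R_x @ w            # gradient of J at iteration n
    w = w - eta * grad                # move opposite the gradient

print(w)   # approaches the Wiener solution np.linalg.solve(R_x, r_dx)
```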
10. Method of Steepest Descent (3)
- The cost function $J(n) = \frac{1}{2} E[e^2(n)]$ is the ensemble average of all squared errors at instant n, drawn from a population of identical filters
- An identical update rule can be derived when the cost function is $J = \frac{1}{2} \sum_{i=1}^{n} e^2(i)$
- The method of steepest descent requires knowledge of the environment; specifically, the terms $r_{dx}(k)$ and $r_x(j, k)$ must be known
- What happens in an unknown environment?
  - Use estimates → least-mean-square algorithm
11. Least-Mean-Square Algorithm (1)
- The LMS algorithm is based on instantaneous estimates of $r_x(j, k)$ and $r_{dx}(k)$
  - $r_x(j, k; n) = x_j(n) x_k(n)$
  - $r_{dx}(k; n) = x_k(n) d(n)$
- Substituting these estimates, the update rule becomes
  - $w_k(n+1) = w_k(n) + \eta \left[ x_k(n) d(n) - \sum_{j=1}^{p} w_j(n) x_j(n) x_k(n) \right]$
  - $w_k(n+1) = w_k(n) + \eta \left[ d(n) - \sum_{j=1}^{p} w_j(n) x_j(n) \right] x_k(n)$
  - $w_k(n+1) = w_k(n) + \eta \left[ d(n) - y(n) \right] x_k(n), \quad k = 1, 2, \ldots, p$
- This is also known as the delta rule or the Widrow-Hoff rule (see the sketch below)
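A sketch of the LMS (delta / Widrow-Hoff) update operating on one sample at a time; the synthetic data stream, noise level, and learning rate are assumptions for illustration:

```python
import numpy as np

# LMS: w(n+1) = w(n) + eta * (d(n) - y(n)) * x(n), using only the current sample.
# The environment (w_true) is unknown to the filter; w starts at zero.
rng = np.random.default_rng(1)
p, eta = 3, 0.01
w = np.zeros(p)                                    # w(0) = 0
w_true = np.array([0.5, -1.0, 2.0])                # hidden "environment"

for n in range(5000):
    x = rng.standard_normal(p)                     # input x(n)
    d = w_true @ x + 0.05 * rng.standard_normal()  # desired response d(n)
    y = w @ x                                      # filter output y(n)
    w = w + eta * (d - y) * x                      # LMS / delta-rule update

print(w)   # adapts toward w_true
```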
12. LMS Algorithm (2)
13. LMS vs. Method of Steepest Descent

| LMS | Steepest Descent |
| --- | --- |
| Can operate in an unknown environment (e.g., starting from w(0) = 0) | Cannot operate in an unknown environment (r_x and r_dx must be known) |
| Can operate in stationary and non-stationary environments (optimum seeking and tracking) | Can operate in a stationary environment only (no adaptation or tracking) |
| Minimizes the instantaneous squared error | Minimizes the mean-square error (or sum of squared errors) |
| Stochastic | Deterministic |
| Approximate | Exact |
14. Adaline (1)
15. Adaline (2)
- Adaline (adaptive linear element) is an adaptive signal processing / pattern classification machine that uses the LMS algorithm. It was developed by Widrow and Hoff
- Inputs x are either -1 or +1, the threshold is between 0 and 1, and the output is either -1 or +1
- The LMS algorithm is used to determine the weights. Instead of the output y, the net input u is used in the error computation, i.e., $e = d - u$ (because y is quantized in the Adaline); see the sketch below
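A toy Adaline sketch trained with the LMS rule, where the error uses the net input u rather than the quantized output y; the tiny dataset (logical AND in ±1 encoding), the bias handling, and the learning rate are illustrative assumptions:

```python
import numpy as np

# Adaline: linear combiner + sign quantizer; weights learned with LMS on e = d - u.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)  # +/-1 inputs
d = np.array([-1, -1, -1, 1], dtype=float)   # desired +/-1 outputs (AND)
eta = 0.1
w = np.zeros(2)
b = 0.0                                      # threshold / bias term

for epoch in range(50):
    for x, target in zip(X, d):
        u = w @ x + b                        # net input (before quantization)
        e = target - u                       # error uses u, not the quantized output
        w += eta * e * x                     # LMS weight update
        b += eta * e

y = np.sign(X @ w + b)                       # quantized Adaline outputs
print(w, b, y)                               # y matches d for this separable problem
```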