Title: Introduction to Kalman filtering
1. An Introduction to Kalman Filtering: Parameter Estimation
- Serge P. Hoogendoorn, Hans van Lint
- Transport Planning Department
- Delft University of Technology
Rudolf E. Kalman
2. Scope of course
- Introduction to Kalman filters
- Application of Kalman filters to training ANNs
- Hands-on experience through exercises applied to your own problems or a problem we provide
- Book review format: each week one chapter will be discussed by one of the course participants
- Take care: both the book and the lecture notes have a very loose notational convention!
- Important information resource: http://en.wikipedia.org/wiki/Kalman_filtering
3. Contents of second lecture
- Parameter fitting with KF
- A simple regression example
- Error bars and the effect of R and Q
- More complex parameterized models
- Overfitting and underfitting
- ANN: mathematical structure
- ANN: training and testing
- ANN: EKF training
- Final thoughts and conclusions
- Assignments / Discussion
5. Parameters are everywhere
- All models we use in our domain are (often heavily) parameterized: y = F(x, w)
- Utility/choice models
- Traffic flow models
- Queueing/delay models
- Forecasting (time series: MA, AR), regression, classification (linear and nonlinear), clustering, inference in general
- Parameters, parameters and more parameters
6. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Suppose we have a calibration / training dataset {dk, xk}, k = 1, ..., N
- Steps
- Minimize a cost function, e.g. C = E[ek^2], with ek = (dk - yk)
- That is, set dC/dw = 0
- -> d/dw E[(dk - yk)^2] = 0
- -> E[-2 (dk - yk) dy/dw] = 0
7. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Easy in the 2-dimensional linear case
- And thus:
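Setting dC/dw = 0 yields the familiar normal equations; the closed-form solution (presumably what this slide shows) is:

```latex
\hat{\mathbf{w}} = (X^{\top}X)^{-1}X^{\top}\mathbf{d},
\qquad
X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix},
\qquad
\mathbf{d} = (d_1, \ldots, d_N)^{\top}
```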
8. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Can this also be solved by a Kalman filter (which is a minimum variance estimator, right)?
- YES, if we transform the problem into state-space form:
- wk+1 = wk + vr
- dk = F(xk, wk) + ve
9. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Kalman equations:
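These are the standard Kalman filter recursions, written here for the state-space form above with observation row H_k = (1, x_k):

```latex
\begin{aligned}
\text{predict:}\quad & \hat{w}_k^- = \hat{w}_{k-1}, & P_k^- &= P_{k-1} + Q,\\
\text{gain:}\quad & K_k = P_k^- H_k^{\top}\bigl(H_k P_k^- H_k^{\top} + R\bigr)^{-1}, & H_k &= (1 \;\; x_k),\\
\text{update:}\quad & \hat{w}_k = \hat{w}_k^- + K_k\bigl(d_k - H_k \hat{w}_k^-\bigr), & P_k &= (I - K_k H_k)\,P_k^-.
\end{aligned}
```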
10. Example parameter fitting
- For example y(k) = F(x, w) = w0 + w1·x(k)
- How to find parameters w?
- Example one
- Suppose we observe noisy measurements d = y + re, with re ~ N(0, 0.2)
- Fixed (guessed) Q, R (a runnable sketch follows below)
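A minimal runnable sketch of this example. The true weights, sample size and the initial P, Q, R guesses below are illustrative assumptions, not values taken from the slides:

```python
# Kalman-filter parameter estimation for y = w0 + w1*x (minimal sketch).
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0])          # hypothetical true parameters
N = 100
x = rng.uniform(0.0, 1.0, N)
d = w_true[0] + w_true[1] * x + rng.normal(0.0, np.sqrt(0.2), N)  # d = y + re

w = np.zeros(2)                        # initial weight estimate
P = np.eye(2)                          # initial weight covariance
Q = np.eye(2) * 1e-4                   # guessed process noise
R = 0.2                                # guessed measurement noise variance

for k in range(N):
    H = np.array([[1.0, x[k]]])        # observation row for this sample
    P = P + Q                          # predict: P- = P + Q (w unchanged)
    S = H @ P @ H.T + R                # innovation variance
    K = (P @ H.T) / S                  # Kalman gain (2x1)
    w = w + (K * (d[k] - H @ w)).ravel()   # measurement update
    P = (np.eye(2) - K @ H) @ P        # covariance update

print("estimated w:", w)
```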
11-14. Example parameter fitting (1): result plots
15. Do different settings of R, Q make things better?
- Example two
- Suppose we observe noisy measurements d = y + re, with re ~ N(0, 0.2)
- Fixed (guessed) Q, R
- R is now 100 times larger than Q, which means that per step the Kalman gain K will be smaller than in example one (see the check below)
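A quick scalar check of this claim (the numbers are hypothetical): with scalar gain K = P/(P + R), inflating R shrinks K:

```python
# Effect of R on the (scalar) Kalman gain; numbers are illustrative only.
P_minus = 1.0                      # prior variance after the predict step
for R in (0.01, 1.0):              # small R vs. R one hundred times larger
    K = P_minus / (P_minus + R)    # scalar gain
    print(f"R = {R:5.2f}  ->  K = {K:.3f}")
# R =  0.01  ->  K = 0.990   (observations trusted, large updates)
# R =  1.00  ->  K = 0.500   (observations distrusted, small updates)
```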
16-17. Example parameter fitting (2): result plots
18. Do different settings of R, Q make things better?
- Example three
- Suppose we observe noisy measurements d = y + re, with re ~ N(0, 0.2)
- Fixed (guessed) Q, R
- Q is now 10 times larger than R: this also leads to a smaller K per step (or does it?) and it affects convergence (through P!)
19-20. Example parameter fitting (3): result plots
21. Graphical explanation of what happens
22. Other ideas?
- What if R would depend on the actual observations and model performance?
- What if Q would depend on the actual observations and model performance?
- What if we could reduce the weight space a bit?
23. Other ideas?
- What if R would depend on the actual observations and model performance? (one possible scheme is sketched below)
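The slides only pose the question; one common approach (an assumption here, not necessarily the lecturers' method) is to estimate R online from the innovations:

```python
# Sketch of an innovation-based adaptive R (one common choice, assumed here).
def update_R(R_est, innovation, alpha=0.95):
    """Exponentially weighted estimate of the innovation variance.

    Strictly, E[innovation^2] = H P- H^T + R, so H P- H^T could be
    subtracted for a purer estimate of R; omitted to keep the sketch short.
    """
    return alpha * R_est + (1.0 - alpha) * innovation ** 2
```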
24. Leads to a slight improvement
25. Other ideas?
- What if Q would depend on the actual observations and model performance?
- Assumption: the Kalman update equals the real update (a sketch of one reading follows)
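A hedged reading of this assumption: size Q so that the modeled weight drift matches the weight change the filter actually had to make. A sketch of my interpretation, not the lecturers' exact scheme:

```python
import numpy as np

# Adaptive Q sketch: assume the Kalman update K*e reflects the "real" update
# the weights needed, and let Q track (a smoothed version of) its magnitude.
def update_Q(Q_prev, K, innovation, alpha=0.95):
    dw = (K * innovation).ravel()        # actual weight change this step
    Q_step = np.outer(dw, dw)            # rank-1 process-noise estimate
    return alpha * Q_prev + (1.0 - alpha) * Q_step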
26. Leads to a significant improvement
27. Other ideas?
- What if we could reduce the weight space a bit?
- E.g. only search in an area (w0 ± Δw0, w1 ± Δw1)
- Can be done: the model becomes y = z(w)·x = θ·x
- But we can update the weights in unconstrained weight space by observing that around the current ŵ: dy/dw = (dy/dθ)(dθ/dw)
- So in the Kalman filter update equations replace x by x·(dθ/dw); the rest stays the same!
- E.g. θ = z(w) = b·w / (1 + w/a)
- Note: dy/dw depicts the sensitivity of the model to changes in w (see the sketch below)
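A short sketch of this trick, with z(w) as reconstructed from the slide and the range parameters a, b as placeholders:

```python
import numpy as np

a, b = 1.0, 1.0                          # hypothetical range parameters

def z(w):
    """Squashing map theta = z(w) = b*w / (1 + w/a), as on the slide."""
    return b * w / (1.0 + w / a)

def dz_dw(w):
    """dtheta/dw = b / (1 + w/a)^2 (quotient rule)."""
    return b / (1.0 + w / a) ** 2

# In the Kalman update, the observation row H = x simply becomes
# H = x * dz_dw(w_hat); all other equations stay the same.
```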
28. Works fine if the chosen range is feasible
29. Conclusions: linear regression
- Given noisy observations, the resulting KF model is always (in varying degrees) wrong, and so is (in the linear case) a regression line, albeit the latter is better than KF. Why?
- In both examples, KF leads to smaller confidence bounds at more recent observations, but these are much larger than in the case of direct LR estimation
- Changing R influences the magnitude of K: a small R puts more weight on the observations and gives a larger K. Making R adaptive yields a slight improvement
- Changing Q also influences the magnitude of K and, more so, affects the convergence speed: P- = P + Q and P = (I - KX)P-. Making Q adaptive leads to a significant improvement
31. Linear or nonlinear?
- y = F(x, w)
- A polynomial of order P is LINEAR in its parameters!
- Nonlinear regression arises if y depends nonlinearly on its parameters -> h(z) is a nonlinear function (NB: this is almost an ANN; see the symbols below)
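In symbols, the contrast is:

```latex
\underbrace{y = \sum_{p=0}^{P} w_p\, x^{p}}_{\text{linear in the parameters } w}
\qquad\text{versus}\qquad
\underbrace{y = h\Bigl(\sum_{p} w_p\, x_p\Bigr)}_{\text{nonlinear in } w \text{ for nonlinear } h}
```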
32. Model complexity versus generality
33. Model complexity versus generality
- The higher the polynomial order, the more flexible the model becomes: more degrees of freedom (parameters)
- BUT
- The more prone it becomes to overfitting
- So how do you determine the right degree of complexity when x and y are nonlinearly related, multidimensional, and when the observations are noisy?
34. Through sound statistics!
- Completely wrong: test on the training data
- Wrong: arbitrarily dividing the x, y data into a training and a test set (how would you know the test set represents more general properties of the population than the training set?)
- Right (at least better):
- Bootstrap B training sets and test (B times) on the residual data -> leads to stable estimates of the mean and variance of the parameters (see the sketch below)
- Random (or distribution-based) subsampling (similar idea)
- Bayesian methods
35. What are Artificial Neural Networks?
- Mathematical models! (nothing more, nothing less)
- Can be trained (= calibrated) on real data
- Are capable of modeling complex mappings in many dimensions in a very efficient manner
36. So an ANN is a Mathematical Model
- Basic idea of mathematical models: mimic the behavior of some process in real life, and use it to
- Better understand the real process
- Predict future states of the real process
- Optimize the real process
37. When do I use ANNs?
- Some real processes are very well known: models can be built on physical considerations (Newtonian physics, relativity theory)
- Some real processes are partially known: models are parameterized approximations based on physical / socio-economic considerations (micro-economics, traffic flow theory, demographics)
- For some processes no ready-to-use physical theory is available: models can only be based on generic parameterized mathematical constructs (stock market exchange, image recognition, human behaviour in general?)
38. ANN Design
Step 1: Abstraction. Determine system borders, relevant inputs, outputs, disturbances.
[Diagram: real process mapping input to output, mirrored by the mathematical model y = F_ANN(x, w)]
39. ANN Design
Step 2: Model Selection. Determine the nature and generic form of the mathematical model, in our case the ANN:
- Type of Artificial Neural Network
- Topology of Artificial Neural Network
- Learning mechanisms and methodology
40. Topology and Structure of ANNs
Inspiration: the human brain (a biological neural network), a massively parallel processing system with almost unlimited capacity in its distributed memory; to date (and for the next 50 years) this is still ALIEN TECHNOLOGY!!!
41. Topology and Structure of ANNs
- An ANN is a mathematical abstraction of its biological counterpart
42. Topology and Structure of ANNs
- Forward and backward propagation of signals through the ANN
[Diagram: input x -> ANN output yNN, compared with target y to produce the error]
43. Topology and Structure of ANNs
44. Topology and Structure of ANNs
- Many different forms and topologies
- Static feed-forward (sketched below)
- Recurrent or feedback
- Self-organizing maps
- Probabilistic
- Different types of training / learning mechanisms
- Supervised (with a teacher correcting each error)
- Reinforcement (with a teacher steering in the right direction)
- Unsupervised (let the ANN figure out the statistical properties of the input by itself)
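For the static feed-forward case, the mathematical structure is just nested matrix products and nonlinearities. A one-hidden-layer sketch; the shapes and the tanh activation are illustrative choices:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """y_NN = W2 @ h(W1 @ x + b1) + b2, with h a nonlinear activation."""
    hidden = np.tanh(W1 @ x + b1)      # nonlinear hidden layer
    return W2 @ hidden + b2            # linear output layer

# Supervised training then propagates the error (y - y_NN) backward
# to adjust the weights W1, b1, W2, b2.
```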
45. ANN Training
Step 3: Model Calibration / Validation. Estimate the model parameters w on data from the real process:
- Define some performance criterion / function (e.g. MSE)
- Minimize / maximize the performance function on the calibration dataset -> leads to w
- Validate the model (with parameter set w) on a validation dataset, because we want the model to perform well on unseen data!!! (GENERALIZATION)
46. Applications of ANNs: general
- Image and speech recognition
- Signal processing, filtering and fault detection
- Plant control, automated control of complex processes
- Time series prediction (multivariate!)
- Data mining, data modelling, clustering, data compression
- Music composition (links on www.idsia.ch)
- Good starting point: the ANN FAQ at ftp://ftp.sas.com/pub/neural/FAQ.html
47. ANN in Traffic and Transport
- Prediction/detection of congestion
- Incident detection
- Modeling driver behavior
- Classification of vehicle platoons
- License plate detection
- Travel time prediction
48. Example parameter fitting (revisited)
- For example y = F(x, w)
- How to find parameters w?
- Suppose we have a calibration / training dataset {dk, xk}, k = 1, ..., N
- Steps
- Minimize a cost function, e.g. C = E[ek^2], with ek = (dk - yk)
- That is, set dC/dw = 0
- -> d/dw E[(dk - yk)^2] = 0
- -> E[-2 (dk - yk) dy/dw] = 0
- So take steps in -dC/dw to come closer to the true w!
- Note: often C = ½ E[ek^2], which makes the calculations a bit easier (runnable sketch below)
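The resulting update, as a runnable sketch for the linear example; the step size eta is a hypothetical choice:

```python
import numpy as np

def gd_step(w, x, d, eta=0.01):
    """One gradient-descent step for y = w0 + w1*x with C = 0.5*E[e^2]."""
    y = w[0] + w[1] * x                              # model predictions
    e = d - y                                        # errors e_k = d_k - y_k
    grad = -np.array([np.mean(e), np.mean(e * x)])   # dC/dw (the 0.5 cancels the 2)
    return w - eta * grad                            # step in -dC/dw
```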
49. ANN Training: Classic BP
- See the regression example (WB)
- Minimizing some cost function implies finding w where dC/dw = 0
- Is this a global minimum???
50. ANN Training: Classic BP
- NO! C may be quadratic in the output error, but the error is a function of w
- The w space is huge (dimension = number of weights)
- C(w) has many (probably infinitely many) local minima
- Derivation of BP on demand (looks very similar to the Extended Kalman Filter equations!)
- End result: adjust w in the negative direction of dC/dw, so:
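That is, the standard steepest-descent weight update:

```latex
\Delta w = -\,\eta\,\frac{\partial C}{\partial w}, \qquad \eta > 0 \ \text{(learning rate)}
```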
51. ANN Training: Classic BP
52. ANN Training: Classic BP
- Note that the cost function contains an expectation over all data
- Convergence to a (local) minimum is guaranteed ONLY with weight updates based on ALL available data (one pass over the data is called an epoch)
- Usually classic BP converges slowly (1000s of epochs required)
- A bad local minimum is almost guaranteed
- Solutions:
- Smooth the weight updates (called momentum; see the update rule after this list)
- Higher-order (batch) algorithms
- Higher-order (incremental) algorithms: EKF!!!
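The momentum variant mentioned above smooths successive updates (standard form):

```latex
\Delta w_t = -\,\eta\,\frac{\partial C}{\partial w} \;+\; \mu\,\Delta w_{t-1},
\qquad 0 \le \mu < 1 \ \text{(momentum term)}
```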
53. ANN Training: EKF
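The EKF view of training treats all weights as the state, with the network output linearized around the current weight estimate. A minimal sketch (scalar output for simplicity; `f` and `jacobian` are placeholders for the network and its weight Jacobian):

```python
import numpy as np

def ekf_step(w, P, x, d, f, jacobian, Q, R):
    """One EKF update of the weight vector w with covariance P.

    f(x, w)        -> network output (scalar here, for simplicity)
    jacobian(x, w) -> H = dy/dw, the row of weight sensitivities
    """
    P = P + Q                              # predict (weights as a random walk)
    H = jacobian(x, w).reshape(1, -1)      # linearize around current weights
    S = H @ P @ H.T + R                    # innovation variance
    K = P @ H.T / S                        # Kalman gain
    w = w + (K * (d - f(x, w))).ravel()    # update driven by the output error
    P = (np.eye(len(w)) - K @ H) @ P
    return w, P
```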
54. Conclusions
- (Will be shown next week:) EKF training does not naturally lead to a general solution, but if applied correctly it leads to a good solution in light of recent data
- Recall that without addressing the dependence of R and Q on w, it is like driving a car blindfolded with a (nervous) instructor pulling the wheel: you come home safely but learn nothing
- Controlling complexity is crucial (remember the polynomial example)
- Nonetheless, batch training leads to superior models over online training
55. Next week: ANN/EKF Examples
- Matlab ANN/EKF examples in the first (±) 30 mins of the next lecture
- Presentation of an EKF alternative, the UKF (chap. 7). The UKF does not require derivatives, nor does it pose any normality assumptions on the posterior distribution (it does on the prior, as should now be obvious!)
- You might also (as soon as the book is there) want to look at chapter 5, which has a clearer explanation of the EKF parameter fitting problem than chapter 2.
56-57. Assignments