Title: Introduction to Kalman filtering
1. An Introduction to Kalman Filtering: Parameter Estimation
- Serge P. Hoogendoorn, Hans van Lint
- Transport Planning Department
- Delft University of Technology
Rudolf E. Kalman
2. Scope of course
- Introduction to Kalman filters
- Application of Kalman filters to training ANNs
- Hands-on experience through exercises applied to your own problems or a problem we provide
- Book review format: each week one chapter will be discussed by one of the course participants
- Take care: both the book and the lecture notes have a very loose notational convention!
- Important information resource: http://en.wikipedia.org/wiki/Kalman_filtering
3. Contents of second lecture
- Parameter fitting with KF
- A simple regression example
- Error bars and the effect of R and Q
- More complex parameterized models
- Overfitting and underfitting
- ANN: mathematical structure
- ANN: training and testing
- ANN: EKF training
- Final thoughts and conclusions
- Assignments / Discussion
5. Parameters are everywhere
- All models we use in our domain are (often heavily) parameterized: y = F(x, w)
- Utility/choice models
- Traffic flow models
- Queueing/delay models
- Forecasting (time series: MA, AR), regression, classification (linear and nonlinear), clustering, inference in general
- Parameters, parameters and more parameters
6. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Suppose we have a calibration / training dataset {dk, xk}, k = 1, ..., N
- Steps
- Minimize a cost function, e.g. C = E[ek^2], with ek = (dk - yk)
- That is, set dC/dw = 0
- -> d/dw E[(dk - yk)^2] = 0
- -> E[-2 (dk - yk) dy/dw] = 0
7. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Easy in the 2-dimensional linear case
- And thus:
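Setting dC/dw = 0 yields the familiar normal equations; the closed-form solution (presumably what this slide shows) is:

```latex
\hat{\mathbf{w}} = (X^{\top}X)^{-1}X^{\top}\mathbf{d},
\qquad
X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix},
\qquad
\mathbf{d} = (d_1, \ldots, d_N)^{\top}
```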
8. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Can this also be solved by a Kalman filter (which is a minimum variance estimator, right)?
- YES, if we transform the problem into state-space form:
- wk+1 = wk + vr
- dk = F(xk, wk) + ve
9. Example parameter fitting
- For example y = F(x, w) = w0 + w1x
- How to find parameters w?
- Kalman equations:
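These are the standard Kalman filter recursions, written here for the state-space form above with observation row H_k = (1, x_k):

```latex
\begin{aligned}
\text{predict:}\quad & \hat{w}_k^- = \hat{w}_{k-1}, & P_k^- &= P_{k-1} + Q,\\
\text{gain:}\quad & K_k = P_k^- H_k^{\top}\bigl(H_k P_k^- H_k^{\top} + R\bigr)^{-1}, & H_k &= (1 \;\; x_k),\\
\text{update:}\quad & \hat{w}_k = \hat{w}_k^- + K_k\bigl(d_k - H_k \hat{w}_k^-\bigr), & P_k &= (I - K_k H_k)\,P_k^-.
\end{aligned}
```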
10. Example parameter fitting
- For example y(k) = F(x, w) = w0 + w1·x(k)
- How to find parameters w?
- Example one
- Suppose we observe noisy measurements d = y + re, with re ~ N(0, 0.2)
- Fixed (guessed) Q, R (a runnable sketch follows below)
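A minimal runnable sketch of this example. The true weights, sample size and the initial P, Q, R guesses below are illustrative assumptions, not values taken from the slides:

```python
# Kalman-filter parameter estimation for y = w0 + w1*x (minimal sketch).
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0])          # hypothetical true parameters
N = 100
x = rng.uniform(0.0, 1.0, N)
d = w_true[0] + w_true[1] * x + rng.normal(0.0, np.sqrt(0.2), N)  # d = y + re

w = np.zeros(2)                        # initial weight estimate
P = np.eye(2)                          # initial weight covariance
Q = np.eye(2) * 1e-4                   # guessed process noise
R = 0.2                                # guessed measurement noise variance

for k in range(N):
    H = np.array([[1.0, x[k]]])        # observation row for this sample
    P = P + Q                          # predict: P- = P + Q (w unchanged)
    S = H @ P @ H.T + R                # innovation variance
    K = (P @ H.T) / S                  # Kalman gain (2x1)
    w = w + (K * (d[k] - H @ w)).ravel()   # measurement update
    P = (np.eye(2) - K @ H) @ P        # covariance update

print("estimated w:", w)
```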
11-14. Example parameter fitting (1): result plots
15. Do different settings of R, Q make things better?
- Example two
- Suppose we observe noisy measurements d = y + re, with re ~ N(0, 0.2)
- Fixed (guessed) Q, R
- R is now 100 times larger than Q, which means that per step the Kalman gain K will be smaller than in example one (see the check below)
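A quick scalar check of this claim (the numbers are hypothetical): with scalar gain K = P/(P + R), inflating R shrinks K:

```python
# Effect of R on the (scalar) Kalman gain; numbers are illustrative only.
P_minus = 1.0                      # prior variance after the predict step
for R in (0.01, 1.0):              # small R vs. R one hundred times larger
    K = P_minus / (P_minus + R)    # scalar gain
    print(f"R = {R:5.2f}  ->  K = {K:.3f}")
# R =  0.01  ->  K = 0.990   (observations trusted, large updates)
# R =  1.00  ->  K = 0.500   (observations distrusted, small updates)
```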
16-17. Example parameter fitting (2): result plots
18. Do different settings of R, Q make things better?
- Example three
- Suppose we observe noisy measurements d = y + re, with re ~ N(0, 0.2)
- Fixed (guessed) Q, R
- Q is now 10 times larger than R: this also leads to a smaller K per step (or does it?) and it affects convergence (through P!)
19-20. Example parameter fitting (3): result plots
21. Graphical explanation of what happens
22. Other ideas?
- What if R would depend on the actual observations and model performance?
- What if Q would depend on the actual observations and model performance?
- What if we could reduce the weight space a bit?
23. Other ideas?
- What if R would depend on the actual observations and model performance? (one possible scheme is sketched below)
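The slides only pose the question; one common approach (an assumption here, not necessarily the lecturers' method) is to estimate R online from the innovations:

```python
# Sketch of an innovation-based adaptive R (one common choice, assumed here).
def update_R(R_est, innovation, alpha=0.95):
    """Exponentially weighted estimate of the innovation variance.

    Strictly, E[innovation^2] = H P- H^T + R, so H P- H^T could be
    subtracted for a purer estimate of R; omitted to keep the sketch short.
    """
    return alpha * R_est + (1.0 - alpha) * innovation ** 2
```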
24. Leads to a slight improvement
25. Other ideas?
- What if Q would depend on the actual observations and model performance?
- Assumption: the Kalman update equals the real update (a sketch of one reading follows)
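A hedged reading of this assumption: size Q so that the modeled weight drift matches the weight change the filter actually had to make. A sketch of my interpretation, not the lecturers' exact scheme:

```python
import numpy as np

# Adaptive Q sketch: assume the Kalman update K*e reflects the "real" update
# the weights needed, and let Q track (a smoothed version of) its magnitude.
def update_Q(Q_prev, K, innovation, alpha=0.95):
    dw = (K * innovation).ravel()        # actual weight change this step
    Q_step = np.outer(dw, dw)            # rank-1 process-noise estimate
    return alpha * Q_prev + (1.0 - alpha) * Q_step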
26. Leads to a significant improvement
27. Other ideas?
- What if we could reduce the weight space a bit?
- E.g. only search in an area (w0 ± Δw0, w1 ± Δw1)
- Can be done: the model becomes y = z(w)·x = θ·x
- But we can update the weights in unconstrained weight space by observing that around the current ŵ: dy/dw = (dy/dθ)(dθ/dw)
- So in the Kalman filter update equations replace x by x·(dθ/dw); the rest stays the same!
- E.g. θ = z(w) = b·w / (1 + w/a)
- Note: dy/dw depicts the sensitivity of the model to changes in w (see the sketch below)
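A short sketch of this trick, with z(w) as reconstructed from the slide and the range parameters a, b as placeholders:

```python
import numpy as np

a, b = 1.0, 1.0                          # hypothetical range parameters

def z(w):
    """Squashing map theta = z(w) = b*w / (1 + w/a), as on the slide."""
    return b * w / (1.0 + w / a)

def dz_dw(w):
    """dtheta/dw = b / (1 + w/a)^2 (quotient rule)."""
    return b / (1.0 + w / a) ** 2

# In the Kalman update, the observation row H = x simply becomes
# H = x * dz_dw(w_hat); all other equations stay the same.
```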
28. Works fine if the chosen range is feasible
29. Conclusions: linear regression
- Given noisy observations, the resulting KF model is always (in varying degrees) wrong, and so is (in the linear case) a regression line, albeit the latter is better than KF. Why?
- In both examples, KF leads to smaller confidence bounds at more recent observations, but these are much larger than in the case of direct LR estimation
- Changing R influences the magnitude of K: a small R puts more weight on the observations and gives a larger K. Making R adaptive yields a slight improvement
- Changing Q also influences the magnitude of K and, more so, affects the convergence speed: P- = P + Q and P = (I - KX)P-. Making Q adaptive leads to a significant improvement
31. Linear or nonlinear?
- y = F(x, w)
- A polynomial of order P is LINEAR in its parameters!
- Nonlinear regression arises if y depends nonlinearly on its parameters -> h(z) is a nonlinear function (NB: this is almost an ANN; see the symbols below)
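In symbols, the contrast is:

```latex
\underbrace{y = \sum_{p=0}^{P} w_p\, x^{p}}_{\text{linear in the parameters } w}
\qquad\text{versus}\qquad
\underbrace{y = h\Bigl(\sum_{p} w_p\, x_p\Bigr)}_{\text{nonlinear in } w \text{ for nonlinear } h}
```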
32. Model complexity versus generality
33. Model complexity versus generality
- The higher the polynomial order, the more flexible the model becomes: more degrees of freedom (parameters)
- BUT
- The more prone it becomes to overfitting
- So how do you determine the right degree of complexity when x and y are nonlinearly related, multidimensional, and when the observations are noisy?
34. Through sound statistics!
- Completely wrong: test on the training data
- Wrong: arbitrarily dividing the x, y data into a training and a test set (how would you know the test set represents more general properties of the population than the training set?)
- Right (at least better):
- Bootstrap B training sets and test (B times) on the residual data -> leads to stable estimates of the mean and variance of the parameters (see the sketch below)
- Random (or distribution-based) subsampling (similar idea)
- Bayesian methods
35. What are Artificial Neural Networks?
- Mathematical models! (nothing more, nothing less)
- Can be trained (= calibrated) on real data
- Are capable of modeling complex mappings in many dimensions in a very efficient manner
36. So an ANN is a Mathematical Model
- Basic idea of mathematical models: mimic the behavior of some process in real life, and use it to
- Better understand the real process
- Predict future states of the real process
- Optimize the real process
37. When do I use ANNs?
- Some real processes are very well known: models can be built on physical considerations (Newtonian physics, relativity theory)
- Some real processes are partially known: models are parameterized approximations based on physical / socio-economic considerations (micro-economics, traffic flow theory, demographics)
- For some processes no ready-to-use physical theory is available: models can only be based on generic parameterized mathematical constructs (stock market exchange, image recognition, human behaviour in general?)
38. ANN Design
Step 1: Abstraction. Determine system borders, relevant inputs, outputs, disturbances.
[Diagram: real process mapping input to output, mirrored by the mathematical model y = F_ANN(x, w)]
39. ANN Design
Step 2: Model Selection. Determine the nature and generic form of the mathematical model, in our case the ANN:
- Type of Artificial Neural Network
- Topology of Artificial Neural Network
- Learning mechanisms and methodology
40. Topology and Structure of ANNs
Inspiration: the human brain (a biological neural network), a massively parallel processing system with almost unlimited capacity in its distributed memory; to date (and for the next 50 years) this is still ALIEN TECHNOLOGY!!!
41. Topology and Structure of ANNs
- An ANN is a mathematical abstraction of its biological counterpart
42. Topology and Structure of ANNs
- Forward and backward propagation of signals through the ANN
[Diagram: input x -> ANN output yNN, compared with target y to produce the error]
43. Topology and Structure of ANNs
44. Topology and Structure of ANNs
- Many different forms and topologies
- Static feed-forward (sketched below)
- Recurrent or feedback
- Self-organizing maps
- Probabilistic
- Different types of training / learning mechanisms
- Supervised (with a teacher correcting each error)
- Reinforcement (with a teacher steering in the right direction)
- Unsupervised (let the ANN figure out the statistical properties of the input by itself)
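For the static feed-forward case, the mathematical structure is just nested matrix products and nonlinearities. A one-hidden-layer sketch; the shapes and the tanh activation are illustrative choices:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """y_NN = W2 @ h(W1 @ x + b1) + b2, with h a nonlinear activation."""
    hidden = np.tanh(W1 @ x + b1)      # nonlinear hidden layer
    return W2 @ hidden + b2            # linear output layer

# Supervised training then propagates the error (y - y_NN) backward
# to adjust the weights W1, b1, W2, b2.
```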
45. ANN Training
Step 3: Model Calibration / Validation. Estimate the model parameters w on data from the real process:
- Define some performance criterion / function (e.g. MSE)
- Minimize / maximize the performance function on the calibration dataset -> leads to w
- Validate the model (with parameter set w) on a validation dataset, because we want the model to perform well on unseen data!!! (GENERALIZATION)
46. Applications of ANNs: general
- Image and speech recognition
- Signal processing, filtering and fault detection
- Plant control, automated control of complex processes
- Time series prediction (multivariate!)
- Data mining, data modelling, clustering, data compression
- Music composition (links on www.idsia.ch)
- Good starting point: the ANN FAQ at ftp://ftp.sas.com/pub/neural/FAQ.html
47. ANN in Traffic and Transport
- Prediction/detection of congestion
- Incident detection
- Modeling driver behavior
- Classification of vehicle platoons
- License plate detection
- Travel time prediction
48. Example parameter fitting (revisited)
- For example y = F(x, w)
- How to find parameters w?
- Suppose we have a calibration / training dataset {dk, xk}, k = 1, ..., N
- Steps
- Minimize a cost function, e.g. C = E[ek^2], with ek = (dk - yk)
- That is, set dC/dw = 0
- -> d/dw E[(dk - yk)^2] = 0
- -> E[-2 (dk - yk) dy/dw] = 0
- So take steps in -dC/dw to come closer to the true w!
- Note: often C = ½ E[ek^2], which makes the calculations a bit easier (runnable sketch below)
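The resulting update, as a runnable sketch for the linear example; the step size eta is a hypothetical choice:

```python
import numpy as np

def gd_step(w, x, d, eta=0.01):
    """One gradient-descent step for y = w0 + w1*x with C = 0.5*E[e^2]."""
    y = w[0] + w[1] * x                              # model predictions
    e = d - y                                        # errors e_k = d_k - y_k
    grad = -np.array([np.mean(e), np.mean(e * x)])   # dC/dw (the 0.5 cancels the 2)
    return w - eta * grad                            # step in -dC/dw
```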
49. ANN Training: Classic BP
- See the regression example (WB)
- Minimizing some cost function implies finding w where dC/dw = 0
- Is this a global minimum???
50. ANN Training: Classic BP
- NO! C may be quadratic in the output error, but the error is a function of w
- The w space is huge (dimension = number of weights)
- C(w) has many (probably infinitely many) local minima
- Derivation of BP on demand (looks very similar to the Extended Kalman Filter equations!)
- End result: adjust w in the negative direction of dC/dw, so:
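That is, the standard steepest-descent weight update:

```latex
\Delta w = -\,\eta\,\frac{\partial C}{\partial w}, \qquad \eta > 0 \ \text{(learning rate)}
```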
51. ANN Training: Classic BP
52. ANN Training: Classic BP
- Note that the cost function contains an expectation over all data
- Convergence to a (local) minimum is guaranteed ONLY with weight updates based on ALL available data (one pass over the data is called an epoch)
- Usually classic BP converges slowly (1000s of epochs required)
- A bad local minimum is almost guaranteed
- Solutions:
- Smooth the weight updates (called momentum; see the update rule after this list)
- Higher-order (batch) algorithms
- Higher-order (incremental) algorithms: EKF!!!
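The momentum variant mentioned above smooths successive updates (standard form):

```latex
\Delta w_t = -\,\eta\,\frac{\partial C}{\partial w} \;+\; \mu\,\Delta w_{t-1},
\qquad 0 \le \mu < 1 \ \text{(momentum term)}
```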
53. ANN Training: EKF
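The EKF view of training treats all weights as the state, with the network output linearized around the current weight estimate. A minimal sketch (scalar output for simplicity; `f` and `jacobian` are placeholders for the network and its weight Jacobian):

```python
import numpy as np

def ekf_step(w, P, x, d, f, jacobian, Q, R):
    """One EKF update of the weight vector w with covariance P.

    f(x, w)        -> network output (scalar here, for simplicity)
    jacobian(x, w) -> H = dy/dw, the row of weight sensitivities
    """
    P = P + Q                              # predict (weights as a random walk)
    H = jacobian(x, w).reshape(1, -1)      # linearize around current weights
    S = H @ P @ H.T + R                    # innovation variance
    K = P @ H.T / S                        # Kalman gain
    w = w + (K * (d - f(x, w))).ravel()    # update driven by the output error
    P = (np.eye(len(w)) - K @ H) @ P
    return w, P
```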
54. Conclusions
- (Will be shown next week:) EKF training does not naturally lead to a general solution, but if applied correctly it leads to a good solution in light of recent data
- Recall that without addressing the dependence of R and Q on w, it is like driving a car blindfolded with a (nervous) instructor pulling the wheel: you come home safely but learn nothing
- Controlling complexity is crucial (remember the polynomial example)
- Nonetheless, batch training leads to superior models over online training
55. Next week: ANN/EKF Examples
- Matlab ANN/EKF examples in the first (±) 30 mins of the next lecture
- Presentation of an EKF alternative, the UKF (chap. 7). The UKF does not require derivatives, nor does it pose any normality assumptions on the posterior distribution (it does on the prior, as should now be obvious!)
- You might also (as soon as the book is there) want to look at chapter 5, which has a clearer explanation of the EKF parameter fitting problem than chapter 2.
56-57. Assignments