Title: Introduction to Kalman filtering


1
An Introduction to Kalman Filtering: Parameter
estimation
  • Serge P. Hoogendoorn, Hans van Lint
  • Transport Planning Department
  • Delft University of Technology

Rudolf E. Kalman
2
Scope of course
  • Introduction to Kalman filters
  • Application of Kalman filters to training ANN
  • Hands-on experience through exercises applied to your
    own problems or a problem we provide
  • Book review format
  • Each week one chapter will be discussed by one of
    the course participants
  • Take care: both the book and the lecture notes use a
    very loose notational convention!
  • Important information resource
  • http://en.wikipedia.org/wiki/Kalman_filtering

3
Contents of second lecture
  • Parameter fitting with KF
  • A simple regression example
  • Error bars and the effect of R and Q
  • More complex parameterized models
  • Overfitting and underfitting
  • ANN Mathematical structure
  • ANN Training and Testing
  • ANN EKF training
  • Final thoughts and conclusions
  • Assignments / Discussion

5
Parameters are everywhere
  • All models we use in our domain are (often
    heavily) parameterized
  • y = F(x, w)
  • Utility/choice models
  • Traffic flow models
  • Queueing/Delay models
  • Forecasting (time series: MA, AR), regression,
    classification (linear and nonlinear),
    clustering, inference in general
  • Parameters, parameters and more parameters

6
Example parameter fitting
  • For example y = F(x, w) = w0 + w1·x
  • How to find parameters w?
  • Suppose we have a calibration / training dataset
    {(d_k, x_k)}, k = 1, ..., N
  • Steps
  • minimize a cost function, e.g. C = E[e_k^2],
    e_k = d_k − y_k
  • i.e. dC/dw = 0
  • → d/dw E[(d_k − y_k)^2] = 0
  • → E[−2 (d_k − y_k) dy/dw] = 0

7
Example parameter fitting
  • For example y = F(x, w) = w0 + w1·x
  • How to find parameters w?
  • Easy in the 2-dimensional linear case
  • And thus w follows in closed form (the normal
    equations shown on the slide)
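
A minimal sketch of this closed-form fit (not from the slides; the
synthetic data and all names below are chosen here for
illustration):

    import numpy as np

    # Synthetic training data d_k = w0 + w1*x_k + noise
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 50)
    d = 1.0 + 2.0 * x + rng.normal(0.0, 0.2, size=x.shape)

    # Design matrix X = [1, x]; solving dC/dw = 0 yields the
    # normal equations, which lstsq solves directly
    X = np.column_stack([np.ones_like(x), x])
    w_hat, *_ = np.linalg.lstsq(X, d, rcond=None)
    print(w_hat)  # close to the true [w0, w1] = [1.0, 2.0]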

8
Example parameter fitting
  • For example y = F(x, w) = w0 + w1·x
  • How to find parameters w?
  • Can this also be solved by a Kalman filter (which
    is a minimum-variance estimator, right)?
  • YES, if we transform the problem into state-space
    form
  • w_{k+1} = w_k + v_r
  • d_k = F(x_k, w_k) + v_e

9
Example parameter fitting
  • For example y = F(x, w) = w0 + w1·x
  • How to find parameters w?
  • Kalman equations
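
The equations themselves are not in the transcript. For the
random-walk parameter model above they take the standard form
P⁻ = P + Q, K = P⁻x / (xᵀP⁻x + R), w = w + K(d − xᵀw) and
P = (I − Kxᵀ)P⁻, consistent with the updates P⁻ = P + Q and
P = (I − KX)P⁻ quoted in the conclusions. A minimal Python sketch
(all names chosen here for illustration):

    import numpy as np

    def kf_step(w, P, x, d, Q, R):
        """One Kalman step for w_{k+1} = w_k + v_r (cov Q) and
        d_k = x_k^T w_k + v_e (var R)."""
        P_minus = P + Q                    # time update: P- = P + Q
        S = x @ P_minus @ x + R            # innovation variance
        K = P_minus @ x / S                # Kalman gain
        e = d - x @ w                      # innovation d_k - y_k
        w_new = w + K * e                  # measurement update
        P_new = P_minus - np.outer(K, x @ P_minus)  # (I - K x^T) P-
        return w_new, P_new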

10
Example parameter fitting
  • For example y(k) = F(x, w) = w0 + w1·x(k)
  • How to find parameters w?
  • Example one
  • Suppose we observe noisy measurements d = y + r_e,
    with r_e ~ N(0, 0.2)
  • Fixed (guessed) Q, R
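
Reusing kf_step from the sketch above, example one could be run as
follows (whether the slide's 0.2 is a variance or a standard
deviation is not stated; a standard deviation is assumed here, and
the true weights and noise levels are made up):

    rng = np.random.default_rng(1)
    w_true = np.array([1.0, 2.0])            # illustrative true parameters
    w, P = np.zeros(2), np.eye(2)            # initial guesses
    Q, R = 1e-4 * np.eye(2), 0.04            # fixed (guessed) covariances
    for k in range(200):
        xk = np.array([1.0, rng.uniform()])  # regressor [1, x(k)]
        dk = xk @ w_true + rng.normal(0.0, 0.2)  # d = y + r_e
        w, P = kf_step(w, P, xk, dk, Q, R)
    print(w)                                 # drifts toward w_true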

11
Example parameter fitting (1)
12
Example parameter fitting (1)
13
Example parameter fitting (1)
14
Example parameter fitting (1)
15
Do different settings of R, Q make things better?
  • Example two
  • Suppose we observe noisy measurements d = y + r_e,
    with r_e ~ N(0, 0.2)
  • Fixed (guessed) Q, R
  • R now 100 times larger than Q means that per step
    the Kalman gain K will be smaller than in example one

16
Example parameter fitting (2)
17
Example parameter fitting (2)
18
Do different settings of R, Q make things better?
  • Example three
  • Suppose we observe noisy measurements d = y + r_e,
    with r_e ~ N(0, 0.2)
  • Fixed (guessed) Q, R
  • Q now 10 times larger than R also leads to a smaller
    K per step (or does it?) and affects convergence
    (watch P!)

19
Example parameter fitting (3)
20
Example parameter fitting (3)
21
Graphical explanation of what happens
  • (on the whiteboard)

22
Other ideas?
  • What if R depended on the actual observations
    and model performance?
  • What if Q depended on the actual observations
    and model performance?
  • What if we could reduce the weight space a bit?

23
Other ideas?
  • What if R depended on the actual observations
    and model performance?
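
The slides do not show which scheme is used; one common option
(an assumption here, not necessarily the authors' method) is
covariance matching, which tracks R from the observed innovations
using E[e^2] = xᵀP⁻x + R:

    def adapt_R(R, e, x, P_minus, alpha=0.05):
        """Smoothed covariance-matching estimate of R from the
        innovation e (numpy arrays assumed)."""
        R_sample = e**2 - x @ P_minus @ x
        return max((1 - alpha) * R + alpha * R_sample, 1e-6)  # keep R > 0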

24
Leads to slight improvement
25
Other ideas?
  • What if Q depended on the actual observations
    and model performance?
  • Assumption: the Kalman update
    equals the real update
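
The derivation itself is not in the transcript. Reading "the
Kalman update equals the real update" as equating the filter's
weight step K·e with the process noise v_r suggests the common
covariance-matching estimate Q ≈ K e^2 Kᵀ (an assumption here),
smoothed over time:

    import numpy as np

    def adapt_Q(Q, K, e, alpha=0.05):
        """If K*e is taken to equal the real weight change v_r,
        then E[v_r v_r^T] suggests Q ~ K e^2 K^T."""
        Q_sample = np.outer(K, K) * e**2
        return (1 - alpha) * Q + alpha * Q_sample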

26
Leads to significant improvement
27
Other ideas?
  • What if we could reduce the weight space a bit?
  • E.g. only search in an area (w0 ± Δw0, w1 ± Δw1)
  • Can be done: the model becomes
  • y = z(w)·x = θ·x
  • But we can update the weights in unconstrained
    weight space by observing that around the current ŵ
  • dy/dw = (dy/dθ)·(dθ/dw)
  • So in the Kalman filter update equations replace x
    by x·(dθ/dw); the rest stays the same! (see the
    sketch below)
  • e.g. θ = z(w) = b·w / (1 + w/a)
  • Note: dy/dw depicts the sensitivity of the model
    to changes in w
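
A sketch of this chain-rule trick with the mapping
θ = z(w) = b·w / (1 + w/a) from the slide (the values of a and b,
and all names, are chosen here for illustration):

    import numpy as np

    a, b = 2.0, 1.0                 # illustrative range parameters

    def z(w):                       # constrained weights theta = z(w)
        return b * w / (1.0 + w / a)

    def dz_dw(w):                   # d(theta)/dw by the quotient rule
        return b / (1.0 + w / a)**2

    def kf_step_constrained(w, P, x, d, Q, R):
        """Kalman step in unconstrained w-space for y = theta^T x:
        replace x by x * dtheta/dw, the rest stays the same."""
        J = x * dz_dw(w)            # chain rule: dy/dw = x * dtheta/dw
        P_minus = P + Q
        S = J @ P_minus @ J + R
        K = P_minus @ J / S
        e = d - x @ z(w)            # model output uses theta = z(w)
        return w + K * e, P_minus - np.outer(K, J @ P_minus)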

28
Works fine if chosen range is feasible
29
Conclusions linear regression
  • Given noisy observations, the resulting KF model
    is always (in varying degrees) wrong, and so is
    (in the linear case) a regression line, albeit
    the latter is better than the KF. Why?
  • In both examples, KF leads to smaller confidence
    bounds at more recent observations, but these are
    much larger than in case of direct LR estimation
  • Changing R influences the magnitude of K: a small
    R puts more weight on the observations and gives a
    larger K. Making R adaptive yields a slight
    improvement
  • Changing Q also influences the magnitude of K and
    moreover affects convergence speed: P⁻ = P + Q and
    P = (I − K·X)·P⁻. Making Q adaptive leads to a
    significant improvement

30
Contents of second lecture
  • Parameter fitting with KF
  • A simple regression example
  • Error bars and the effect of R and Q
  • More complex parameterized models
  • Overfitting and underfitting
  • ANN Mathematical structure
  • ANN Training and Testing
  • ANN EKF training
  • Final thoughts and conclusions
  • Assignments / Discussion

31
Linear or nonlinear?
  • y = F(x, w)
  • A polynomial of order P
  • is LINEAR in its parameters!
  • Nonlinear regression if y depends nonlinearly
    on its parameters → h(z) is
    a nonlinear function (NB: this is almost an ANN)
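
For instance, y = w0 + w1·x + ... + wP·x^P is still a linear
least-squares problem once the powers of x are built as features;
a brief sketch (data made up for illustration):

    import numpy as np

    P_order = 3
    x = np.linspace(-1.0, 1.0, 100)
    X = np.vander(x, P_order + 1, increasing=True)  # 1, x, x^2, x^3
    d = np.sin(2.0 * x) + np.random.default_rng(2).normal(0.0, 0.1, x.shape)
    w_hat, *_ = np.linalg.lstsq(X, d, rcond=None)   # still linear in w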

32
Model complexity versus generality
33
Model complexity versus generality
  • The higher the polynomial order,
  • the more flexible it becomes: more degrees of
    freedom (parameters)
  • BUT
  • the more prone it becomes to overfitting
  • So how do you determine the right degree of
    complexity when x and y are nonlinearly related,
    multidimensional, and when the observations are
    noisy?

34
Through sound statistics!
  • Completely wrong:
  • test on the training data
  • Wrong:
  • arbitrarily dividing the (x, y) data into a
    training and a test set (how would you know the
    test set represents more general properties of the
    population than the training set?)
  • Right (at least better):
  • bootstrap B training sets and test (B times) on
    the residual data → leads to a stable estimate of
    the mean and variance of the parameters (see the
    sketch below)
  • random (or distribution-based) subsampling
    (similar idea)
  • Bayesian methods
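
A minimal sketch of the bootstrap idea (B resampled training sets;
the spread of the B estimates gives the stable mean and variance
of the parameters):

    import numpy as np

    def bootstrap_fit(X, d, B=200, rng=np.random.default_rng(3)):
        """Fit B times on bootstrap resamples of (X, d)."""
        n, ws = len(d), []
        for _ in range(B):
            idx = rng.integers(0, n, size=n)  # sample with replacement
            w, *_ = np.linalg.lstsq(X[idx], d[idx], rcond=None)
            ws.append(w)
        ws = np.array(ws)
        return ws.mean(axis=0), ws.std(axis=0)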

35
What are Artificial Neural Networks?
  • Mathematical models! (nothing more, nothing less)
  • Can be trained (= calibrated) on real data
  • Are capable of modeling complex mappings in many
    dimensions in a very efficient manner

36
So an ANN is a Mathematical Model
  • Basic idea of mathematical models
  • mimic the behavior of some process in real life
  • and use it to
  • Better understand the real process
  • Predict future states of the real process
  • Optimize the real process

37
When do I use ANNs?
  • Some real processes are very well known: models
    can be built on physical considerations
  • Newtonian Physics, Relativity Theory
  • Some real processes are partially known: models
    are parameterized approximations based on
    physical / socio-economical considerations
  • Micro-economics, Traffic flow theory,
    Demographics
  • For some processes no ready-to-use physical
    theory is available: models can only be based on
    generic parameterized mathematical constructs
  • Stock Market exchange, Image recognition,
    Human behaviour in general?

38
ANN Design
Step 1 Abstraction: determine system borders,
relevant inputs, outputs, disturbances
(diagram: a real process with input and output,
approximated by the mathematical model
y = F_ANN(x, w))
39
ANN Design
Step 2 Model Selection: determine the nature and
generic form of the mathematical model, in our case
the ANN
  • Type of Artificial Neural Network
  • Topology of Artificial Neural Network
  • Learning Mechanisms and Methodology
40
Topology and Structure of ANNs
Inspiration: the human brain (a biological neural
network), a massively parallel processing system
with almost unlimited capacity in its distributed
memory; to date (and for the next 50 years) this is
still ALIEN TECHNOLOGY!!!
41
Topology and Structure of ANNs
  • An ANN is a mathematical abstraction of its
    biological counterpart

42
Topology and Structure of ANNs
  • Forward and backward propagation of signals
    through ANN

(diagram: input x produces network output y_NN,
which is compared with the target y to form the
error)
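
A minimal one-hidden-layer forward pass matching the diagram (all
shapes and names are illustrative):

    import numpy as np

    def ann_forward(x, W1, b1, W2, b2):
        """Forward propagation: x -> hidden h -> output y_NN."""
        h = np.tanh(W1 @ x + b1)    # hidden layer nonlinearity h(z)
        return W2 @ h + b2          # linear output layer

    # Backward propagation then distributes the error y - y_NN over
    # the weights via the chain rule (see the BP slides below).
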
43
Topology and Structure of ANNs
44
Topology and Structure of ANNs
  • Many different forms and topologies
  • Static feed-forward
  • Recurrent or Feedback
  • Self Organizing Maps
  • Probabilistic
  • Different types of training / learning
    mechanisms
  • Supervised (with a teacher correcting each error)
  • Reinforcement (with a teacher steering in the
    right direction)
  • Unsupervised (Let ANN figure out statistical
    properties of input by itself)

45
ANN Training
Step 3 Model Calibration / Validation: estimate the
model parameters w on data from the real process
  • Define some performance criterion / function
    (e.g. MSE)
  • Minimize / maximize the performance function on
    the calibration dataset → leads to w
  • Validate the model (with parameter set w) on a
    validation dataset, because
  • we want the model to perform well on unseen
    data!!! (GENERALIZATION)

46
Applications of ANNs general
  • Image and Speech Recognition
  • Signal Processing, Filtering and Fault detection
  • Plant Control, Automated Control of complex
    processes
  • Time Series Prediction (Multivariate!)
  • Data Mining, Data Modelling, Clustering, Data
    Compression
  • Music composition (links on www.idsia.ch)
  • Good starting point: the ANN FAQ at
    ftp://ftp.sas.com/pub/neural/FAQ.html

47
ANN in Traffic and Transport
  • Prediction/detection of congestion
  • Incident detection
  • Modeling driver behavior
  • Classification of vehicle platoons
  • License plate detection
  • Travel time prediction

48
Example parameter fitting (revisited)
  • For example y = F(x, w)
  • How to find parameters w?
  • Suppose we have a calibration / training dataset
    {(d_k, x_k)}, k = 1, ..., N
  • Steps
  • minimize a cost function, e.g. C = E[e_k^2],
    e_k = d_k − y_k
  • i.e. dC/dw = 0
  • → d/dw E[(d_k − y_k)^2] = 0
  • → E[−2 (d_k − y_k) dy/dw] = 0
  • So take steps in −dC/dw to come closer to the true
    w! (see the sketch below)
  • Note: often C = ½ E[e_k^2], which makes the
    calculations a bit easier
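
A sketch of the resulting gradient step for the linear model
(numpy arrays and the learning rate eta are assumptions here):

    def gd_step(w, X, d, eta=0.1):
        """One step in -dC/dw for C = 0.5 * E[(d - Xw)^2]."""
        e = d - X @ w               # residuals e_k = d_k - y_k
        grad = -(X.T @ e) / len(d)  # the 1/2 cancels the factor 2
        return w - eta * grad       # move against the gradient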

49
ANN Training Classic BP
  • See the regression example (on the whiteboard)
  • Minimizing some cost function implies finding w
    where dC/dw = 0
  • Is this a global minimum???

50
ANN Training Classic BP
  • NO! C may be quadratic in the output error, but
    the error is a function of w
  • w-space is huge (its dimension is the number of
    weights)
  • C(w) has many (probably infinitely many) local
    minima
  • Derivation of BP on demand (it looks very similar
    to the Extended Kalman Filter equations!)
  • End result: adjust w in the negative direction of
    dC/dw, so w ← w − η·dC/dw

51
ANN Training Classic BP
52
ANN Training Classic BP
  • Note that the cost function contains an
    expectation over all data
  • Convergence to a (local) minimum ONLY with weight
    updates based on ALL available data (called an
    epoch)
  • Usually classic BP converges slowly (1000s of
    epochs required)
  • A bad local minimum is almost guaranteed
  • Solutions
  • Smoothed weight updates (called momentum; see the
    sketch below)
  • Higher-order (batch) algorithms
  • Higher-order (incremental) algorithm: the EKF!!!
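
A sketch of a momentum-smoothed update (the coefficient mu is an
assumption):

    def momentum_step(w, v, grad, eta=0.1, mu=0.9):
        """Mix the previous step into the current one, damping
        oscillations in the weight updates."""
        v = mu * v - eta * grad
        return w + v, v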

53
ANN Training EKF
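
The slide's equations are not in the transcript. The usual
construction (an assumption here) treats the weights as the state
and linearizes the network around the current weights, so the
linear update from the earlier sketch applies with x replaced by
the Jacobian H = dy_NN/dw:

    import numpy as np

    def ekf_train_step(w, P, x, d, f, Q, R, eps=1e-6):
        """One EKF step for ANN training: state = weights w,
        measurement model d = f(x, w) + noise, scalar output."""
        y = f(x, w)
        H = np.array([(f(x, w + eps * np.eye(len(w))[i]) - y) / eps
                      for i in range(len(w))])  # H = dy/dw, numeric
        P_minus = P + Q
        S = H @ P_minus @ H + R
        K = P_minus @ H / S
        w_new = w + K * (d - y)                 # innovation update
        P_new = P_minus - np.outer(K, H @ P_minus)
        return w_new, P_new
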
54
Conclusions
  • (will show next week) EKF training does not
    naturally lead to a general solution, but if
    applied correctly it leads to a good solution in
    light of recent data
  • Recall that without addressing the dependence of
    R and Q on w, it is like driving a car blindfolded
    with a (nervous) instructor pulling the wheel:
    you come home safely but learn nothing
  • Controlling complexity is crucial (remember the
    polynomial example)
  • Nonetheless, batch training leads to superior
    models over online training

55
Next week ANN/EKF Examples
  • Matlab ANN/EKF examples in the first (±) 30 mins
    of next lecture
  • Presentation of the EKF alternative, the UKF
    (chap. 7). The UKF does not require derivatives,
    nor does it pose any normality assumptions on the
    posterior distribution (it does on the prior, as
    should now be obvious!)
  • You might also, as soon as the book is there,
    want to look at chapter 5, which has a clearer
    explanation of the EKF parameter fitting problem
    than chap. 2

56
Assignments
57
Assignments