1
Bridging the Gaps: a central problem in data modelling
  • Rob Harrison
  • Automatic Control & Systems Engineering

2
Agenda
  • sampling
  • density / distribution
  • representation
  • accuracy vs generality
  • regularization
  • trading off accuracy and generality
  • cross validation
  • what happens when we can't see

3
The Data Modelling Problem
  • y = f(x), z = y + e
  • multivariate, non-linear
  • measurement errors
  • {x_i, z_i}, i = 1…N, where z_i = f(x_i) + e_i
  • infer behaviour everywhere from a few examples
  • little or no prior information on f(x)
  • ŷ etc. indicates an estimate
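As a concrete illustration of this setup, a minimal Python sketch; the target f, the noise level, and N here are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # stand-in for the unknown target function (an assumption)
    return np.sin(2 * np.pi * x)

N = 6                                   # a few examples
x = np.sort(rng.uniform(0.0, 1.0, N))   # sample locations x_i
e = rng.normal(0.0, 0.1, N)             # measurement errors e_i
z = f(x) + e                            # observations z_i = f(x_i) + e_i
# the task: infer f everywhere from (x, z) alone
```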

4
A Simple Example
  • well-spaced samples
  • enough data (N = 6)
  • noise-free

5–13
(figure slides, no transcript)
14
(figure: observational data)
15
What's So Hard?
  • the gaps
  • get more (well-spaced) data
  • lack of prior knowledge
  • can't see

16
Dimensionality
  • lose ability to see the shape of f(x)
  • try it in 13-D
  • number of samples exponential in d
  • if N samples suffice in 1-D, N^d are needed in d-D (see the sketch after this list)
  • how do we know if well-spaced?
  • how can we sample where the action is?
  • observational vs experimental data!
  • ALWAYS undersampled!
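A quick sketch of that exponential growth, using N = 6 per axis as in the 1-D example:

```python
N = 6
for d in (1, 2, 3, 5, 10, 13):
    print(f"d = {d:2}: {N**d:,} samples")   # 6**13 is over 13 billion
```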

17
Two Dimensions
  • same sampling density (N = 6² = 36)
  • well-spaced?

18–20
(figure slides, no transcript)
21
What goes on in the Gaps?
  • Universal Approximator
  • Advantage
  • can bend to (almost) any shape
  • Disadvantage
  • can bend to (almost) any shape
  • Training data is all we have to go on

22
Generic Structure
  • use e.g. a power series
  • Stone-Weierstrass
  • other bases e.g. Fourier series
  • y = a5x^5 + a4x^4 + a3x^3 + a2x^2 + a1x + a0
  • d = 5, six samples: no error!
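A sketch of that exact fit in NumPy; with six samples and d = 5 the system is exactly determined, so the polynomial passes through every point (the sample values are assumed, as before):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 6)        # six well-spaced, noise-free samples
z = np.sin(2 * np.pi * x)           # assumed target values

a = np.polyfit(x, z, deg=5)         # solves for a5 ... a0
print(np.abs(np.polyval(a, x) - z).max())   # ~1e-15: no error at the samples
```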

23
PROBLEM SOLVED!
24
Generic Structure
  • use e.g. a power series
  • Stone-Weierstrass
  • other bases e.g. Fourier series
  • y = a5x^5 + a4x^4 + a3x^3 + a2x^2 + a1x + a0
  • d = 5, six samples: no error!
  • d > 5: still no error, but…

25
poor inter-sample behaviour: how can we know without looking?
26
Generic Structure
  • use e.g. a power series
  • Stone-Weierstrass
  • other bases e.g. Fourier series
  • y = a5x^5 + a4x^4 + a3x^3 + a2x^2 + a1x + a0
  • d = 5, six samples: no error!
  • d > 5: still no error, but…
  • measurement error

27
(figure slide, no transcript)
28
Generic Structure
  • use e.g. a power series
  • Stone-Weierstrass
  • other bases e.g. Fourier series
  • y = a5x^5 + a4x^4 + a3x^3 + a2x^2 + a1x + a0
  • d = 5, six samples: no error!
  • d > 5: still no error, but…
  • measurement error
  • the model is as complex as the data

29
Curse Of Dimension
  • we can still use the idea, but…
  • in 2-D we get 21 terms
  • direct and cross products
  • in d-D a degree-p polynomial has (d+p)!/(d!·p!) terms
  • e.g. transform a 16×16 bitmap by a degree-3 polynomial and get ~3 million terms
  • sample size / distribution
  • practical for small problems
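The counts follow directly from that combinatorial formula; a short check of the two figures quoted above:

```python
from math import comb

# terms in a degree-p polynomial in d inputs: (d + p)! / (d! p!) = C(d + p, p)
print(comb(2 + 5, 5))     # 2-D, degree 5  -> 21 terms
print(comb(256 + 3, 3))   # 256 inputs (16x16 bitmap), degree 3 -> 2,862,209 terms
```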

30
Other Basis Functions
  • Gaussian radial basis functions
  • additional design choices
  • how many?, where?, how wide?
  • adaptive sigmoidal basis functions
  • how many?
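A minimal sketch of a Gaussian RBF design matrix; the number, placement, and width of the basis functions are exactly the design choices listed above, and the values used here are arbitrary assumptions:

```python
import numpy as np

def rbf_design(x, centres, width):
    # Phi[i, j] = exp(-(x_i - c_j)^2 / (2 * width^2))
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2.0 * width ** 2))

x = np.linspace(0.0, 1.0, 6)
centres = np.linspace(0.0, 1.0, 4)       # how many? where?
Phi = rbf_design(x, centres, width=0.2)  # how wide?
```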

31
Overfitting (sample data)
(figure: zero error? rough components)
32
Underfitting (sample data)
(figure: over-smooth components)
33
Goldilocks
(figure: very small error, just-right components)
34
Restricting Flexibility
  • use data to tell the estimator how to behave
  • regularization/penalization
  • penalize roughness
  • e.g. minimize SSE + ρΩ
  • Ω = Σ w_ij² ⇒ w = (ΦᵀΦ + ρI)⁻¹Φᵀz
  • use potentially complex structure
  • data constrains where it can
  • Ω constrains elsewhere
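A sketch of that penalized least-squares solution, where Phi is any design matrix (e.g. the RBF one above) and rho is the regularization constant; the helper name is hypothetical:

```python
import numpy as np

def ridge_weights(Phi, z, rho):
    # w = (Phi^T Phi + rho * I)^-1 Phi^T z
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + rho * np.eye(k), Phi.T @ z)

# e.g. w = ridge_weights(Phi, z, rho=1e-3); larger rho -> smoother fit
```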

35
Hold-out Method
keep back P for testing: wasteful, sample-dependent
Training RMSE = 0.23, Testing RMSE = 0.38
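A sketch of the hold-out split; the data, the split fraction, and the polynomial model are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 40)
z = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 40)

idx = rng.permutation(len(x))
n_test = len(x) // 4                         # hold back 25% for testing
test, train = idx[:n_test], idx[n_test:]

a = np.polyfit(x[train], z[train], deg=5)    # fit on the training part only
rmse = lambda u, v: np.sqrt(np.mean((u - v) ** 2))
print("train RMSE:", rmse(np.polyval(a, x[train]), z[train]))
print("test RMSE: ", rmse(np.polyval(a, x[test]), z[test]))
```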
36
Cross Validation
  • leave-one-out CV
  • train on all but one
  • test that one
  • repeat N times
  • compute performance
  • m-fold CV
  • divide sample into m non-overlapping sets
  • proceed as above
  • all data used for training and testing
  • more work but realistic performance estimates
  • used to choose hyper-parameters
  • e.g. ρ, and the number and width of basis functions
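A sketch of m-fold CV used to pick a hyper-parameter; here the polynomial degree stands in for "size of estimator", and the data and degree range are assumptions:

```python
import numpy as np

def mfold_cv_rmse(x, z, deg, m=5, seed=0):
    # train on m-1 folds, test on the held-out fold, average over the m folds
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), m)
    errs = []
    for k in range(m):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(m) if j != k])
        a = np.polyfit(x[train], z[train], deg=deg)
        errs.append(np.sqrt(np.mean((np.polyval(a, x[test]) - z[test]) ** 2)))
    return np.mean(errs)

# leave-one-out is the special case m = N
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 40)
z = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 40)
best_deg = min(range(1, 10), key=lambda d: mfold_cv_rmse(x, z, d))
```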

37
(figure: cross-validation illustrated on pairs (X1,Z1)…(X5,Z5) with held-out predictions Y1…Y4; no transcript)
38
(figure: the same pairs plotted on X–Z axes with their predictions; no transcript)
39
(figure: X, Z, Y for all five folds; Y vs Z; generalization error)
40
Best Practice
  • Use m-fold CV to select
  • regularization constant (hyper-parameter)
  • size of estimator (e.g. polynomial degree)
  • Train using the best value on all the data
  • Retain a final test sample
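Putting the recipe together, a hypothetical end-to-end sketch; it reuses mfold_cv_rmse from the cross-validation sketch above, and the data, split, and degree range are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 50)
z = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 50)

final = slice(0, 10)     # retained final test sample, never touched during selection
dev = slice(10, None)    # development data for CV-based model selection

best_deg = min(range(1, 10),
               key=lambda d: mfold_cv_rmse(x[dev], z[dev], d))  # CV selects the size
a = np.polyfit(x[dev], z[dev], deg=best_deg)                    # retrain on all dev data
print("final test RMSE:",
      np.sqrt(np.mean((np.polyval(a, x[final]) - z[final]) ** 2)))
```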