1
Model Assessment and Selection
  • Lecture Notes for Comp540, Chapter 7
  • Jian Li
  • Mar. 2007

2
Goal
  • Model Selection
  • Model Assessment

3
A Regression Problem
  • y = f(x) + noise
  • Can we learn f from this data?
  • Let's consider three methods...

4
Linear Regression
5
Quadratic Regression
6
Joining the dots
7
Which is best?
  • Why not choose the method with the best fit to
    the data?

How well are you going to predict future data
drawn from the same distribution?
8
Model Selection and Assessment
  • Model Selection: estimating the performance of different models in order to choose the best one (the one with the minimum test error)
  • Model Assessment: having chosen a final model, estimating its prediction error on new data

9
Why Errors
  • Why do we want to study errors?
  • In a data-rich situation, split the data into three parts:
    Train | Validation | Test
    The validation set is used for model selection, the test set for model assessment.
  • But that's not usually the case.

10
Overall Motivation
  • Errors
  • Measurement of errors (loss functions)
  • Decomposing test error into bias and variance
  • Estimating the true error:
    • Estimating in-sample error (analytically): AIC, BIC, MDL, SRM with VC dimension
    • Estimating extra-sample error (efficient sample reuse): cross-validation and the bootstrap

11
Measuring Errors: Loss Functions
  • Typical regression loss functions:
  • Squared error: L(y, \hat{f}(x)) = (y - \hat{f}(x))^2
  • Absolute error: L(y, \hat{f}(x)) = |y - \hat{f}(x)|

12
Measuring Errors: Loss Functions
  • Typical classification loss functions:
  • 0-1 loss: L(G, \hat{G}(x)) = I(G \neq \hat{G}(x))
  • Log-likelihood (cross-entropy loss / deviance): L(G, \hat{p}(x)) = -2 \log \hat{p}_G(x)

13
The Goal: Low Test Error
  • We want to minimize the generalization error, or test error:
    Err = E[L(Y, \hat{f}(X))]
  • But all we really know is the training error:
    \overline{err} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat{f}(x_i))
  • And this is a bad estimate of the test error.

14
Bias, Variance & Complexity
  • Training error can always be reduced by increasing model complexity, but this risks over-fitting.
[Figure: typically, as model complexity grows, training error decreases steadily while test error first falls and then rises.]
15
Decomposing Test Error
  • Model: Y = f(X) + \varepsilon, with E[\varepsilon] = 0 and Var(\varepsilon) = \sigma_\varepsilon^2
  • For squared-error loss and additive noise, the test error at a point x_0 decomposes as
    Err(x_0) = E[(Y - \hat{f}(x_0))^2 | X = x_0]
             = \sigma_\varepsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2
             = Irreducible Error + Bias^2 + Variance
  • Irreducible error: the variance of the target Y around the true function's mean
  • Bias^2: the squared deviation of the average estimate from the true function's mean
  • Variance: the expected squared deviation of our estimate around its own mean
16
Further Bias Decomposition
  • For linear models (e.g. ridge regression), the bias can be further decomposed:
    E_{x_0}[f(x_0) - E\hat{f}_\alpha(x_0)]^2 = E_{x_0}[f(x_0) - x_0^T\beta_*]^2 + E_{x_0}[x_0^T\beta_* - E(x_0^T\hat{\beta}_\alpha)]^2
                                            = Ave(Model Bias)^2 + Ave(Estimation Bias)^2
  • \beta_* is the parameter of the best-fitting linear approximation to f
  • For standard (least-squares) linear regression, the estimation bias is 0.
17
Graphical Representation of Bias & Variance
[Figure: the truth lies outside the hypothesis space. The model space of basic linear regression sits closest to the truth, separated from it by the model bias; the regularized model space of ridge regression is shrunken, adding estimation bias but reducing estimation variance. Each realization of the fit scatters around the average model fit with estimation variance.]
18
Bias-Variance Decomposition: Examples
  • kNN regression
  • Linear regression
  • (standard forms reconstructed below)

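For reference, the standard forms of the decomposition for these two fits (Hastie, Tibshirani & Friedman, Ch. 7) are as follows. For kNN regression with k neighbors:

    Err(x_0) = \sigma_\varepsilon^2 + \Bigl[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\Bigr]^2 + \frac{\sigma_\varepsilon^2}{k}

For linear regression \hat{f}_p(x) = x^T\hat{\beta} with p parameters:

    Err(x_0) = \sigma_\varepsilon^2 + [f(x_0) - E\hat{f}_p(x_0)]^2 + \|h(x_0)\|^2\,\sigma_\varepsilon^2,

where h(x_0) = X(X^TX)^{-1}x_0; averaged over the training inputs, the variance term becomes (p/N)\,\sigma_\varepsilon^2.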
19
Simulated Example of Bias-Variance Decomposition
[Figure: prediction error, squared bias (Bias^2), and variance plotted against model complexity. Top panels: regression with squared-error loss. Bottom panels: classification with 0-1 loss.]
  • The bias-variance decomposition behaves differently for 0-1 loss than for squared-error loss.
  • Estimation errors on the right side of the decision boundary don't hurt!
(A simulation sketch of the squared-error case follows below.)
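A minimal simulation sketch of the squared-error decomposition at a single test point, using a kNN fit; the true function, noise level, and constants here are illustrative assumptions, not taken from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(2 * x)          # assumed true function
    sigma = 0.3                          # assumed noise standard deviation
    x0 = 1.0                             # test point
    n, n_sims, k = 50, 2000, 5           # sample size, replications, kNN neighbors

    preds = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.uniform(0, np.pi, n)
        y = f(x) + rng.normal(0, sigma, n)
        nearest = np.argsort(np.abs(x - x0))[:k]   # k-nearest-neighbor fit at x0
        preds[s] = y[nearest].mean()

    bias2 = (preds.mean() - f(x0)) ** 2
    variance = preds.var()
    err = sigma**2 + bias2 + variance    # Err(x0) = irreducible + bias^2 + variance
    print(f"bias^2={bias2:.4f}  variance={variance:.4f}  Err(x0)={err:.4f}")

With squared-error loss the three estimated terms add up to the prediction error, which is exactly the identity the regression panels illustrate; for 0-1 loss no such additive identity holds.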
20
Optimism of the Training Error Rate
  • Typically, training error rate < true error
  • (the same data is being used both to fit the method and to assess its error)
    \overline{err} < Err, i.e., the training error is overly optimistic.
21
Estimating Test Error
  • Can we estimate the discrepancy between \overline{err} and Err?
  • Err = E[L(Y, \hat{f}(X)) | T] is the extra-sample error.
  • The in-sample error Err_in replaces each y_i with the expectation over N new responses at the same inputs x_i:
    Err_{in} = \frac{1}{N}\sum_{i=1}^{N} E_{Y^{new}}[L(Y_i^{new}, \hat{f}(x_i)) | T]
  • The optimism op = Err_{in} - \overline{err} is the adjustment for the optimism of the training error.
22
Optimism Summary
  • For squared error, 0-1, and other loss functions, the average optimism is
    \omega = \frac{2}{N}\sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)
  • For a linear fit with d independent inputs/basis functions, \sum_i \mathrm{Cov}(\hat{y}_i, y_i) = d\,\sigma_\varepsilon^2, so \omega = 2\,(d/N)\,\sigma_\varepsilon^2
  • Optimism increases linearly with d
  • Optimism decreases as the training sample size N increases
(A short derivation of the linear-fit case follows below.)

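A one-line sketch of where the linear-fit result comes from, assuming a linear smoother \hat{y} = Sy with homoskedastic noise:

    \sum_{i=1}^{N}\mathrm{Cov}(\hat{y}_i, y_i) = \mathrm{trace}\bigl(S\,\mathrm{Cov}(y)\bigr) = \sigma_\varepsilon^2\,\mathrm{trace}(S) = d\,\sigma_\varepsilon^2,

since \mathrm{Cov}(Sy, y) = S\,\mathrm{Cov}(y) = \sigma_\varepsilon^2 S, and for least squares on d independent inputs S is a rank-d projection, so \mathrm{trace}(S) = d.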
23
Ways to Estimate Prediction Error
  • In-sample error estimates:
    • AIC
    • BIC
    • MDL
    • SRM
  • Extra-sample error estimates:
    • Cross-Validation
      • Leave-one-out
      • K-fold
    • Bootstrap

24
Estimates of In-Sample Prediction Error
  • General form of the in-sample estimate:
    \widehat{Err}_{in} = \overline{err} + \hat{\omega}
  • For a linear fit (the C_p statistic):
    C_p = \overline{err} + 2\,\frac{d}{N}\,\hat{\sigma}_\varepsilon^2

25
AIC & BIC
Similarly, the Akaike Information Criterion (AIC):
    AIC = -\frac{2}{N}\,\mathrm{loglik} + 2\,\frac{d}{N}
and the Bayesian Information Criterion (BIC):
    BIC = -2\,\mathrm{loglik} + (\log N)\, d
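A small sketch of computing both criteria for a Gaussian linear model; the simulated data and design matrix are made-up illustrations:

    import numpy as np

    rng = np.random.default_rng(1)
    N, d = 100, 3
    X = rng.normal(size=(N, d))                  # made-up design matrix
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=N)

    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares fit
    resid = y - X @ beta
    sigma2 = resid @ resid / N                   # ML estimate of noise variance

    # Gaussian log-likelihood evaluated at the ML estimates
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)

    aic = -2 / N * loglik + 2 * d / N
    bic = -2 * loglik + np.log(N) * d
    print(f"AIC={aic:.3f}  BIC={bic:.3f}")

Note the different scaling conventions: AIC as written is per-observation, while BIC is on the total log-likelihood scale; only comparisons within one criterion are meaningful.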
26
AIC & BIC
27
MDL (Minimum Description Length)
  • Regularity ⇔ Compressibility
  • Learning = Finding regularities
[Diagram: a learning model maps input samples in R^n to predictions in R^1; the real model produces the real class in R^1; the discrepancy between prediction and real class is the error.]
28
MDL (Minimum Description Length)
  • Regularity ⇔ Compressibility
  • Learning = Finding regularities
  • Total description length = (description of the model under optimal coding) + (length of transmitting the discrepancy given the model, under the optimal coding for that model):
    length = -\log \Pr(\theta | M) - \log \Pr(y | \theta, M, X)
  • MDL principle: choose the model with the minimum description length
  • Equivalent to maximizing the posterior \Pr(\theta | y, M, X)
29
SRM with VC (Vapnik-Chervonenkis) Dimension
  • Vapnik showed that, with probability 1 - \eta over the training set, a bound of the form
    Err \le \overline{err} + \sqrt{\frac{h\,(\log(2N/h) + 1) - \log(\eta/4)}{N}}
    holds, where h is the VC dimension of the class.
  • The bound loosens as h increases.
  • Structural risk minimization (SRM): a method of selecting a class F from a family of nested classes by minimizing this upper bound.
30
Err_in Estimation
  • Each of these criteria trades off the fit to the data against the model complexity.

31
Estimation of Extra-Sample Err
  • Cross-Validation
  • Bootstrap

32
Cross-Validation
[Figure: the data are split into K folds; each fold in turn serves as the test set while the model is trained on the remaining K-1 folds.]
  • The K-fold CV estimate averages the held-out losses:
    CV(\hat{f}) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat{f}^{-\kappa(i)}(x_i)),
    where \kappa(i) is the fold containing observation i.
(A code sketch follows below.)

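A minimal sketch of K-fold cross-validation, assuming a generic fit/predict pair; here a least-squares fit on a made-up dataset:

    import numpy as np

    def k_fold_cv(X, y, K, fit, predict, loss):
        """Average held-out loss over K folds."""
        N = len(y)
        idx = np.random.default_rng(0).permutation(N)
        folds = np.array_split(idx, K)
        total = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(idx, test_idx)
            model = fit(X[train_idx], y[train_idx])
            total += loss(y[test_idx], predict(model, X[test_idx])).sum()
        return total / N

    # Example: least-squares regression with squared-error loss
    fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
    predict = lambda beta, X: X @ beta
    sq_loss = lambda y, yhat: (y - yhat) ** 2

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
    print("10-fold CV error:", k_fold_cv(X, y, 10, fit, predict, sq_loss))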
33
How Many Folds?
  • As k increases (from small k toward leave-one-out): computation increases, the bias of the CV estimate decreases, and its variance increases.
34
Cross-Validation: Choosing K
Popular choices for K: 5, 10, and N (leave-one-out).
35
Generalized Cross-Validation
  • LOOCV can be computationally expensive for linear fitting with large N.
  • Linear fitting: \hat{y} = Sy
  • For linear fitting under squared-error loss, LOOCV has the closed form
    \frac{1}{N}\sum_{i=1}^{N}\Bigl[\frac{y_i - \hat{f}(x_i)}{1 - S_{ii}}\Bigr]^2
  • GCV provides a computationally cheaper approximation:
    GCV(\hat{f}) = \frac{1}{N}\sum_{i=1}^{N}\Bigl[\frac{y_i - \hat{f}(x_i)}{1 - \mathrm{trace}(S)/N}\Bigr]^2

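A sketch of GCV used to pick a ridge penalty; the data and the penalty grid are illustrative assumptions:

    import numpy as np

    def gcv_ridge(X, y, lambdas):
        """Pick the ridge penalty minimizing the GCV score."""
        N, d = X.shape
        scores = []
        for lam in lambdas:
            # hat matrix S for ridge: X (X'X + lam I)^{-1} X'
            S = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
            resid = y - S @ y
            scores.append(np.mean((resid / (1 - np.trace(S) / N)) ** 2))
        return lambdas[int(np.argmin(scores))], scores

    rng = np.random.default_rng(2)
    X = rng.normal(size=(80, 5))
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=80)
    best, _ = gcv_ridge(X, y, np.logspace(-3, 2, 20))
    print("GCV-selected lambda:", best)

The win over exact LOOCV is that only trace(S) is needed, not every diagonal entry S_ii, and for many smoothers the trace (effective degrees of freedom) is available cheaply.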
36
Bootstrap: Main Concept
"The bootstrap is a computer-based method of statistical inference that can answer many real statistical questions without formulas." (An Introduction to the Bootstrap, Efron and Tibshirani, 1993)
Step 1: Draw B samples of size N with replacement from the training data.
Step 2: Calculate the statistic of interest on each bootstrap sample.
(A code sketch follows below.)
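A minimal sketch of the two steps, bootstrapping the sample median as an illustrative statistic:

    import numpy as np

    rng = np.random.default_rng(3)
    data = rng.exponential(scale=2.0, size=100)   # made-up sample
    B = 1000

    # Step 1: draw B samples of size N with replacement
    boot = rng.choice(data, size=(B, len(data)), replace=True)

    # Step 2: calculate the statistic on each bootstrap sample
    medians = np.median(boot, axis=1)
    print(f"median={np.median(data):.3f}  bootstrap SE={medians.std(ddof=1):.3f}")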
37
How Does It Work?
  • In practice we cannot afford to draw a large number of fresh random samples from the true distribution.
  • Bootstrap theory tells us that resampling the data approximates the sampling distribution of the statistic.
38
Bootstrap Error Estimation with Err_boot
  • The quantity we want depends on the unknown true distribution F.
  • A straightforward application of the bootstrap to error prediction:
    \widehat{Err}_{boot} = \frac{1}{B}\frac{1}{N}\sum_{b=1}^{B}\sum_{i=1}^{N} L(y_i, \hat{f}^{*b}(x_i))
  • But each \hat{f}^{*b} is evaluated on training points that also appear in its own bootstrap sample, so the estimate is optimistic.
39
Bootstrap Error Estimation with Err^(1)
  • A CV-inspired improvement on Err_boot: the leave-one-out bootstrap evaluates each observation only on bootstrap samples that do not contain it:
    \widehat{Err}^{(1)} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{|C^{-i}|}\sum_{b \in C^{-i}} L(y_i, \hat{f}^{*b}(x_i)),
    where C^{-i} is the set of indices of bootstrap samples not containing observation i.
40
Bootstrap Error Estimation with Err^(.632)
  • An improvement on Err^(1) in light-fitting cases: a bootstrap sample contains only about 0.632 of the distinct observations on average, so Err^(1) is biased upward, and the .632 estimator pulls it back toward the training error:
    \widehat{Err}^{(.632)} = .368\,\overline{err} + .632\,\widehat{Err}^{(1)}
41
Bootstrap Error Estimation with Err^(.632+)
An improvement on Err^(.632) that adaptively accounts for overfitting:
  • Depending on the amount of overfitting, the best error estimate is as little as Err^(.632), or as much as Err^(1), or something in between
  • Err^(.632+) is like Err^(.632) with adaptive weights, with Err^(1) weighted at least .632
  • Err^(.632+) adaptively mixes training error and leave-one-out bootstrap error using the relative overfitting rate R
(The formula appears on the next slide, followed by a code sketch.)

42
Bootstrap Error Estimation with Err^(.632+)
    \widehat{Err}^{(.632+)} = (1 - \hat{w})\,\overline{err} + \hat{w}\,\widehat{Err}^{(1)}, \qquad \hat{w} = \frac{.632}{1 - .368\,\hat{R}}, \qquad \hat{R} = \frac{\widehat{Err}^{(1)} - \overline{err}}{\hat{\gamma} - \overline{err}}
    where \hat{\gamma} is the no-information error rate and \hat{R} \in [0, 1] is the relative overfitting rate.
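A sketch of the leave-one-out bootstrap and the plain .632 estimator for a 1-nearest-neighbor classifier; the classifier, data, and constants are illustrative assumptions, and the .632+ weight \hat{w} would be computed from the same pieces via the formula above:

    import numpy as np

    def err_632(X, y, B=200, seed=0):
        """Leave-one-out bootstrap error Err^(1) and the .632 estimator for 1-NN."""
        rng = np.random.default_rng(seed)
        N = len(y)

        def predict_1nn(Xtr, ytr, Xte):
            d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
            return ytr[d.argmin(axis=1)]

        # Training error (0 for 1-NN: each point is its own nearest neighbor)
        err_bar = np.mean(predict_1nn(X, y, X) != y)

        loss_sum = np.zeros(N)
        counts = np.zeros(N)
        for _ in range(B):
            idx = rng.integers(0, N, N)            # bootstrap sample
            out = np.setdiff1d(np.arange(N), idx)  # observations left out
            if out.size == 0:
                continue
            pred = predict_1nn(X[idx], y[idx], X[out])
            loss_sum[out] += (pred != y[out])
            counts[out] += 1
        err1 = np.mean(loss_sum[counts > 0] / counts[counts > 0])  # Err^(1)
        return 0.368 * err_bar + 0.632 * err1

    X = np.random.default_rng(4).normal(size=(60, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    print("Err^(.632) estimate:", err_632(X, y))

1-NN is the classic example motivating .632+: its training error is identically 0, so the fixed .632 mixture is too optimistic, and the adaptive weight pushes the estimate toward Err^(1).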
43
Cross-Validation & Bootstrap
  • Why bother with cross-validation and the bootstrap when analytical estimates are known?
  1) AIC, BIC, MDL, and SRM all require knowledge of d, which is difficult to obtain in most situations.
  2) The bootstrap and cross-validation give similar results to the above, but are also applicable in more complex situations.
  3) Estimating the noise variance requires a roughly working model; cross-validation and the bootstrap work well even if the model is far from correct.
44
Conclusion
  • Test error plays a crucial role in model selection.
  • AIC, BIC, and SRM/VC have the advantage that you only need the training error.
  • If the VC dimension is known, SRM is a good method for model selection; it requires much less computation than CV and the bootstrap, but is wildly conservative.
  • Methods like CV and the bootstrap give tighter error bounds, but may have more variance.
  • Asymptotically, AIC and leave-one-out CV should be the same.
  • Asymptotically, BIC and a carefully chosen k-fold CV should be the same.
  • BIC is what you want if you want the best structure rather than the best predictor.
  • The bootstrap has much wider applicability than just estimating prediction error.