Title: Bias-Variance Tradeoff
1 Bias-Variance Tradeoff
- Presented by Yang Yu
- yyu3_at_glue.umd.edu
2 Outline
- Generalization Performance of Learning Methods
- Bias, Variance and Model Complexity
- The Bias-Variance Decomposition
- Example Bias-Variance Tradeoff
- Summary
3 Generalization Performance of Learning Methods
- Generalization performance of a learning method
  relates to its prediction capability on
  independent test data
- Assessment of generalization performance is
  extremely important in practice
- Bias and variance are important in assessing
  generalization performance
4 Bias, Variance and Model Complexity
- Test error and training error
- Test error: expected prediction error over an
  independent test sample,
  \mathrm{Err} = E[L(Y, \hat{f}(X))]
- Training error: average loss over the training
  sample,
  \overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat{f}(x_i))
- X: input vector; Y: target variable
- \hat{f}(X): prediction model
- L(Y, \hat{f}(X)): loss function measuring the error
  between Y and \hat{f}(X)
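As a concrete illustration (not from the slides), the following Python sketch estimates both quantities for a polynomial regression fit; the true function f, the noise level, the sample sizes, and the polynomial degree are all arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # True regression function (an arbitrary choice for this sketch)
    return np.sin(2 * np.pi * x)

sigma = 0.3                       # noise standard deviation (assumed)
n = 30                            # training-sample size

# Training sample: Y = f(X) + noise
x_train = rng.uniform(0, 1, n)
y_train = f(x_train) + rng.normal(0, sigma, n)

# Prediction model f_hat: a degree-5 polynomial fit by least squares
f_hat = np.poly1d(np.polyfit(x_train, y_train, deg=5))

# Training error: average squared-error loss over the training sample
train_err = np.mean((y_train - f_hat(x_train)) ** 2)

# Test error: average loss over a large independent test sample,
# approximating the expected prediction error
x_test = rng.uniform(0, 1, 10_000)
y_test = f(x_test) + rng.normal(0, sigma, 10_000)
test_err = np.mean((y_test - f_hat(x_test)) ** 2)

print(train_err, test_err)        # training error is typically the smaller
```

Because the polynomial is fit to the same sample on which the training error is computed, the training error systematically underestimates the test error.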
5 Test and Training Error vs. Model Complexity
- Behavior of test-sample and training-sample error
  as the model complexity is varied: training error
  decreases steadily with complexity, while test
  error first falls and then rises once the model
  starts to overfit
6 Bias, Variance and Model Complexity
- The relationship of bias, variance and model
  complexity
- The goal is to find a model with optimal
  complexity that gives minimum test error
7 The Bias-Variance Decomposition
- Assume Y = f(X) + \varepsilon, where E(\varepsilon) = 0 and
  \mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2; the expected prediction
  error of a regression fit \hat{f}(X) at an input point
  X = x_0, with squared-error loss, is
  \mathrm{Err}(x_0) = E[(Y - \hat{f}(x_0))^2 \mid X = x_0]
    = \sigma_\varepsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2
    = \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}
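The decomposition can be checked numerically by refitting the model on many simulated training sets and comparing the direct estimate of the prediction error against the sum of its three parts. This sketch uses an arbitrary true function, noise level, and learning method (a cubic polynomial fit):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # True regression function (an arbitrary choice for this sketch)
    return np.sin(2 * np.pi * x)

sigma = 0.3          # noise s.d., so the irreducible error is sigma**2
x0 = 0.5             # fixed query point
n, reps = 50, 2000   # training-set size, number of simulated training sets

# Refit the model on many independent training sets and record f_hat(x0)
preds = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    fit = np.poly1d(np.polyfit(x, y, deg=3))   # the learning method
    preds[r] = fit(x0)

bias2 = (preds.mean() - f(x0)) ** 2            # [E f_hat(x0) - f(x0)]^2
variance = preds.var()                         # E[f_hat(x0) - E f_hat(x0)]^2

# Direct Monte Carlo estimate of Err(x0) = E[(Y - f_hat(x0))^2 | X = x0]
y0 = f(x0) + rng.normal(0, sigma, reps)
err_direct = np.mean((y0 - preds) ** 2)

print(err_direct, sigma**2 + bias2 + variance)  # agree up to simulation noise
```

The two printed numbers differ only by Monte Carlo error, since Y and \hat{f}(x_0) are independent given X = x_0.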
8 The Bias-Variance Decomposition for
K-Nearest-Neighbor Regression
- For k-nearest-neighbor regression,
  \mathrm{Err}(x_0) = \sigma_\varepsilon^2 + \left[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\right]^2 + \frac{\sigma_\varepsilon^2}{k}
- k is inversely related to the model complexity:
  small k gives low bias and high variance, large k
  the reverse
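The opposite movements of bias and variance in k can be seen in a small simulation. The setup below is an arbitrary choice made for illustration (the query point sits at a peak of f, where averaging over distant neighbors hurts most):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # True regression function (an arbitrary choice for this sketch)
    return np.sin(2 * np.pi * x)

def knn_predict(x_train, y_train, x0, k):
    # Average the targets of the k training points nearest to x0
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[idx].mean()

sigma, n, reps = 0.3, 100, 1000
x0 = 0.25            # query point at the peak of f, where smoothing hurts

results = {}
for k in (1, 10, 50):
    preds = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, sigma, n)
        preds[r] = knn_predict(x, y, x0, k)
    # Squared bias grows with k; variance shrinks roughly like sigma**2 / k
    results[k] = ((preds.mean() - f(x0)) ** 2, preds.var())

for k, (bias2, var) in results.items():
    print(k, bias2, var)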
9 The Bias-Variance Decomposition for Linear Model
- For a linear fit \hat{f}_p(x_0) = x_0^T \hat{\beta} with p
  parameters fit by least squares,
  \mathrm{Err}(x_0) = \sigma_\varepsilon^2 + [f(x_0) - E\hat{f}_p(x_0)]^2 + \|h(x_0)\|^2 \sigma_\varepsilon^2
- where h(x_0) = X(X^T X)^{-1} x_0 is the vector of
  weights producing the fit \hat{f}_p(x_0) = h(x_0)^T y
- Average of the prediction error over the sample
  values x_i:
  \frac{1}{N}\sum_{i=1}^{N} \mathrm{Err}(x_i) = \sigma_\varepsilon^2 + \frac{1}{N}\sum_{i=1}^{N}[f(x_i) - E\hat{f}(x_i)]^2 + \frac{p}{N}\sigma_\varepsilon^2
- The model complexity is directly related to p, the
  number of parameters
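Both linear-model identities can be verified numerically: the closed-form variance \|h(x_0)\|^2 \sigma_\varepsilon^2 matches a Monte Carlo estimate, and the in-sample average variance equals (p/N)\sigma_\varepsilon^2 because the hat matrix has trace p. All sizes and distributions below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

N, p, sigma = 40, 5, 0.5             # sample size, parameters, noise s.d.
X = rng.normal(size=(N, p))          # fixed design matrix (assumed Gaussian)
beta = rng.normal(size=p)
f_true = X @ beta                    # f itself is linear here, so bias is zero

XtX_inv = np.linalg.inv(X.T @ X)
x0 = X[0]                            # evaluate at the first training input

# Closed form: Var[f_hat_p(x0)] = ||h(x0)||^2 sigma^2, h(x0) = X (X^T X)^{-1} x0
h = X @ XtX_inv @ x0
var_closed = (h @ h) * sigma**2

# Monte Carlo over repeated noise draws confirms the closed form
reps = 5000
preds = np.empty(reps)
for r in range(reps):
    y = f_true + rng.normal(0, sigma, N)
    beta_hat = XtX_inv @ X.T @ y     # least-squares fit
    preds[r] = x0 @ beta_hat
print(var_closed, preds.var())

# Averaging ||h(x_i)||^2 over the sample gives (p/N) sigma^2,
# since the hat matrix X (X^T X)^{-1} X^T has trace p
H = X @ XtX_inv @ X.T
avg_var = np.trace(H) / N * sigma**2
print(avg_var, p / N * sigma**2)
```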
10 The Bias-Variance Decomposition for Linear
Model: more on bias
- Let \beta_* denote the parameters of the
  best-fitting linear approximation to f, i.e.,
  \beta_* = \arg\min_\beta E[(f(X) - X^T \beta)^2]
- The squared bias then splits into model bias,
  f(x_0) - x_0^T \beta_*, and estimation bias,
  x_0^T \beta_* - E\hat{f}(x_0)
12 Example: Bias-Variance Tradeoff
13 Summary
- As model complexity increases (decreases),
  estimation bias decreases (increases) but variance
  increases (decreases)
- Methods of estimating the test error are needed
  so that the error can be minimized at the optimal
  model complexity by tuning the model parameters
- The bias-variance tradeoff behaves differently
  with different loss functions, and so does the
  choice of tuning parameters