Transcript and Presenter's Notes

Title: Bias-Variance Decomposition

1
Bias-Variance Decomposition
  • References
  • A Unified Bias-Variance Decomposition for
    Zero-One and Squared Loss (2000)
  • Pedro Domingos
  • A Bias-Variance Analysis of a Real World Learning
    Problem: The CoIL Challenge 2000
  • Peter van der Putten, Maarten van Someren

Derk Crezee
2
Performance Analysis
  • Why
  • Gaining insight into the factors that determine
    the success of a learning algorithm
  • Getting a better understanding of a learning
    algorithm
  • Making different learning methods comparable
  • How
  • Bias-Variance Decomposition

3
Loss Functions
  • Loss Function
  • A function that defines the misclassification
    cost
  • The loss function L(t,y) measures the cost of
    predicting y when the true value is t
  • Learning goal: minimizing average loss
  • Examples (see the sketch below)
  • Squared loss: L(t,y) = (t - y)^2
  • Absolute loss: L(t,y) = |t - y|
  • Zero-one loss: L(t,y) = 0 when t = y, and 1
    otherwise
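
A minimal sketch of these three loss functions in Python; the
function names are illustrative, not taken from the papers:

    def squared_loss(t, y):
        """Squared loss: penalizes large errors quadratically."""
        return (t - y) ** 2

    def absolute_loss(t, y):
        """Absolute loss: penalizes errors linearly."""
        return abs(t - y)

    def zero_one_loss(t, y):
        """Zero-one loss: 0 for a correct prediction, 1 otherwise."""
        return 0 if t == y else 1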

4
Decomposition
  • Loss Function
  • L(t,y) is a function of the training set, because
    the classification model constructed by the
    learner is (usually) different for every
    training set
  • Expected Loss
  • E_{D,t}[L(t,y)], with D a set of training sets
  • Decomposes into three terms: Bias, Variance and
    Noise

5
Definitions
  • Optimal classification y*: the classification
    that minimizes E_t[L(t,y)]
  • The optimal model classifies every instance as y*
  • Main prediction y_m = argmin_y' E_D[L(y,y')],
    with D a set of training sets (sketch below)
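
Under squared loss the main prediction is the mean of the
per-training-set predictions; under zero-one loss it is the mode
(this follows Domingos's definitions). A hedged sketch, with an
illustrative helper name:

    from statistics import mean, mode

    def main_prediction(predictions, loss="squared"):
        """predictions: one prediction per training set in D.
        The main prediction minimizes expected loss to them."""
        if loss == "squared":
            return mean(predictions)   # mean minimizes squared loss
        if loss == "zero_one":
            return mode(predictions)   # mode minimizes zero-one loss
        raise ValueError("unknown loss")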

6
Bias, Variance and Noise
  • Bias
  • B(x) = L(y*, y_m)
  • The loss of the main prediction relative to the
    optimal classification
  • The systematic loss caused by the learning
    algorithm
  • Noise
  • N(x) = E_t[L(t, y*)]
  • The loss which is an unavoidable consequence of
    the data

7
Bias, Variance and Noise
  • Variance
  • V(x) = E_D[L(y_m, y)]
  • The average loss of the classifications relative
    to the main prediction
  • A high variance tells us that classifications
    differ a lot when we use different training sets
  • Overfitting!
  • Trade-off between Bias and Variance!

8
Bias-Variance Decomposition
  • E_{D,t}[L(t,y)] =
    c1 E_t[L(t, y*)] + L(y*, y_m) + c2 E_D[L(y_m, y)]
  • = c1 N(x) + B(x) + c2 V(x)
  • c1 and c2 take on different values for different
    loss functions: per Domingos (2000), c1 = c2 = 1
    for squared loss; for zero-one loss they depend
    on the example (see the simulation sketch below)
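
A minimal simulation sketch of the squared-loss case (c1 = c2 = 1),
using synthetic data and a deliberately simple learner; every name
and constant here is an illustrative assumption:

    import random
    random.seed(0)

    def f(x):
        return x ** 2              # assumed true function

    noise_sd = 0.3                 # assumed noise level in targets

    def train_and_predict(x):
        """Fit a straight line to a fresh noisy training set and
        predict at x (an intentionally biased learner)."""
        us = [random.random() for _ in range(20)]
        ts = [f(u) + random.gauss(0, noise_sd) for u in us]
        mu, mt = sum(us) / 20, sum(ts) / 20
        b = sum((u - mu) * (t - mt) for u, t in zip(us, ts)) / \
            sum((u - mu) ** 2 for u in us)
        return (mt - b * mu) + b * x

    x = 0.9
    preds = [train_and_predict(x) for _ in range(200)]  # one per train set
    y_star = f(x)                               # optimal prediction
    y_m = sum(preds) / len(preds)               # main prediction (mean)
    bias = (y_star - y_m) ** 2                  # B(x) = L(y*, y_m)
    var = sum((p - y_m) ** 2 for p in preds) / len(preds)   # V(x)
    noise = noise_sd ** 2                       # N(x) for squared loss
    print(bias, var, noise)   # expected loss = noise + bias + var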

9
Remarks
  • Decomposition not always possible
  • Bias, Variance and Noise are not always separable
  • Sometimes it is not possible to calculate the
    bias, variance or noise error

10
Real World Example
  • The CoIL Challenge 2000
  • Real-world problem
  • Data mining competition
  • Data
  • noisy
  • skewed
  • correlated
  • high dimensional
  • weak relation between input and output

11
Real World Example
  • The CoIL Challenge 2000
  • Environment: insurance products
  • Goals
  • predict which people would be interested in a
    specific insurance product
  • explain why people would be interested in this
    product
  • Use any algorithm(s) you want, just get the best
    result possible

12
CoIL Challenge Results
  • Performance varies over a wide range
  • from 1 to 2.5 times the result of random
    selection (0.5 times the total number of correct
    predictions possible)
  • How to compare the results?
  • Bias-Variance Decomposition
  • Zero-one loss function used (sketch below)
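
Under zero-one loss the main prediction is the most frequent class
across training sets, and, following Domingos (2000), the variance
term enters with c2 = +1 on unbiased examples and c2 = -1 on biased
ones. A hedged sketch with illustrative names:

    from collections import Counter

    def zero_one_decomposition(preds, y_star):
        """preds: class predictions for one test example, one per
        training set; y_star: the optimal classification."""
        y_m = Counter(preds).most_common(1)[0][0]   # main prediction
        bias = 0 if y_m == y_star else 1            # B(x) = L(y*, y_m)
        var = sum(p != y_m for p in preds) / len(preds)   # V(x)
        c2 = 1 if bias == 0 else -1  # variance hurts unbiased cases,
                                     # helps biased ones
        return bias, var, c2

    # e.g. predictions from 10 models trained on different samples:
    print(zero_one_decomposition(list("AAABABAABB"), "A"))  # (0, 0.4, 1)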

13
Result Analysis
  • After decomposition, learning algorithms are
    characterized by their bias and variance error
  • Relation between the strength and correctness of
    the bias and the bias-variance error

14
Conclusion CoIL Challenge
  • For noisy data the biggest problems are the
    noise and variance error
  • Try to reduce all three components of error
  • Noise error is unavoidable, so no solutions here
  • Reduce variance error by making use of data
    preparation
  • Reduce variance error by using a strong bias
  • Test model for bias error, to reduce it
  • Use simple models, because each degree of freedom
    introduces more possible variance error

15
??? !!!!
  • Questions?
  • Remarks?