1
Bagging and Boosting in Data Mining
  • Carolina Ruiz
  • ruiz@cs.wpi.edu
  • http://www.cs.wpi.edu/~ruiz

2
Motivation and Background
  • Problem Definition
  • Given a dataset of instances and a target
    concept,
  • Find a model (e.g., a set of association rules,
    a decision tree, or a neural network) that helps
    predict the classification of unseen instances.
  • Difficulties
  • The model should be stable (i.e., it shouldn't
    depend too much on the input data used to
    construct it)
  • The model should be a good predictor (difficult
    to achieve when the input dataset is small)

3
Two Approaches
  • Bagging (Bootstrap Aggregating)
  • Leo Breiman, UC Berkeley
  • Boosting
  • Rob Schapire, AT&T Research
  • Jerry Friedman, Stanford U.

4
Bagging
  • Model Creation
  • Create bootstrap replicates of the dataset and
    fit a model to each one
  • Prediction
  • Average (numeric targets) or vote (class labels)
    over the predictions of the individual models
  • Advantages
  • Stabilizes unstable methods
  • Easy to implement and to parallelize

5
Bagging Algorithm
  • 1. Create k bootstrap replicates of the dataset
  • 2. Fit a model to each of the replicates
  • 3. Average/vote the predictions of the k models
    (a minimal code sketch follows)

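A minimal Python sketch of these three steps, for illustration only: fit_model is a hypothetical zero-argument factory for any estimator with scikit-learn-style fit/predict methods, and class labels are assumed to be non-negative integers so the vote can use np.bincount.

```python
import numpy as np

def bagging_predict(X_train, y_train, X_test, fit_model, k=25, seed=None):
    """Bagging sketch: fit models on k bootstrap replicates, then vote.

    `fit_model` is a hypothetical factory returning an unfitted
    estimator with fit/predict; labels are assumed to be
    non-negative integers.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(k):
        # 1. Bootstrap replicate: draw n instances with replacement
        idx = rng.integers(0, n, size=n)
        # 2. Fit a model to this replicate
        model = fit_model().fit(X_train[idx], y_train[idx])
        all_preds.append(model.predict(X_test))
    # 3. Vote: most frequent predicted label per test instance
    votes = np.stack(all_preds)  # shape (k, n_test)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Called with, say, fit_model=lambda: DecisionTreeClassifier(), the vote averages away much of the variance of a single unstable tree, which is the stabilizing effect described on the previous slide.
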
6
Boosting
  • Creating the model
  • Construct a sequence of datasets and models such
    that each dataset in the sequence weights an
    instance heavily when the previous model has
    misclassified it.
  • Prediction
  • Merge the models in the sequence
  • Advantages
  • Improves classification accuracy

7
Generic Boosting Algorithm
  • 1. Equally weight all instances in the dataset
  • 2. For i = 1 to T
  • 2.1. Fit a model to the current dataset
  • 2.2. Upweight poorly predicted instances
  • 2.3. Downweight well-predicted instances
  • 3. Merge the models in the sequence to obtain the
    final model (see the sketch below)

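One way to make this loop concrete is the AdaBoost-style sketch below. The -1/+1 label encoding, the hypothetical weak-learner factory fit_model (whose fit() is assumed to accept sample_weight, as scikit-learn estimators do), and the round count T are all illustrative assumptions; the slide's "merge" step becomes a vote weighted by each model's accuracy.

```python
import numpy as np

def boost_predict(X_train, y_train, X_test, fit_model, T=50):
    """Generic boosting loop, AdaBoost-style sketch.

    Assumes binary labels encoded as -1/+1 and a hypothetical
    weak-learner factory `fit_model` whose fit() accepts
    sample_weight (e.g., a decision stump).
    """
    n = len(X_train)
    w = np.full(n, 1.0 / n)      # 1. equally weight all instances
    models, alphas = [], []
    for _ in range(T):           # 2. for i = 1 to T
        # 2.1. Fit a model to the current (weighted) dataset
        model = fit_model().fit(X_train, y_train, sample_weight=w)
        pred = model.predict(X_train)
        err = w[pred != y_train].sum()
        if err >= 0.5:           # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        # 2.2./2.3. Upweight misclassified, downweight correct instances
        w *= np.exp(-alpha * y_train * pred)
        w /= w.sum()             # renormalize to a distribution
        models.append(model)
        alphas.append(alpha)
    # 3. Merge: accuracy-weighted vote over the sequence of models
    score = sum(a * m.predict(X_test) for a, m in zip(alphas, models))
    return np.sign(score)
```

The update exp(-alpha * y_train * pred) grows an instance's weight exactly when the previous model misclassified it, which is the reweighting behavior the slide describes.
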
8
Conclusions and References
  • Boosted naïve Bayes tied for first place in the
    KDD Cup 1997 competition
  • Reference
  • J. F. Elder and G. Ridgeway, "Combining Estimators
    to Improve Performance," KDD-99 tutorial notes