Bagging and Boosting Classifiers


Transcript and Presenter's Notes



1
Bagging and Boosting Classifiers
  • The Churn problem

2
  • The Challenge
  • A U.S. wireless telecom company
  • To predict whether a customer will churn during
    the next few months

  • Churn is the response dummy variable such that
  • Churn = 1 if the customer is churning
  • Churn = -1 if the customer is not churning

3
Opportunity for this Challenge
  • Bagging and Boosting
  • Aggregating Classifiers

FINAL RULE
  • Breiman (1996) found gains in accuracy by
    aggregating predictors built from reweighted
    versions of the learning set

4
Bagging and Boosting: Aggregating Classifiers
5
Bagging
  • Bagging = Bootstrap Aggregating
  • Reweighting of the learning set is done by
    drawing at random with replacement from the
    learning set
  • Predictors are aggregated by plurality voting

6
The Bagging Algorithm
  • Draw B bootstrap samples from the learning set
  • From each sample we derive a classifier; the B
    classifiers are then aggregated by voting (a
    sketch follows below)
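As an illustration only, here is a minimal Python sketch of
this procedure, assuming a scikit-learn decision tree as the
unstable base classifier and the -1/+1 churn coding of slide 2
(function and variable names are hypothetical):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, B=100, seed=0):
        """Fit B trees, each on a bootstrap sample of the learning set."""
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(seed)
        n = len(y)
        models = []
        for _ in range(B):
            idx = rng.integers(0, n, size=n)   # draw n indices with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Aggregate by plurality voting: the sign of the summed -1/+1 votes."""
        votes = np.sum([m.predict(X) for m in models], axis=0)
        return np.where(votes >= 0, 1, -1)

With two classes coded -1/+1, plurality voting and taking the
sign of the summed predictions give the same final rule.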

7
Weighting
8
Aggregation
  • Final rule: the sign of the sum of the B
    individual -1/+1 predictions
9
Boosting
  • Freund and Schapire (1997), Breiman (1998)
  • Data adaptively resampled

10
AdaBoost
  • Initialize the observation weights
  • Fit a classifier with these weights
  • Give predicted probabilities to the observations
    according to this classifier
  • Compute the pseudo probabilities
  • Get new weights
  • Normalize them (i.e., rescale so that they sum
    to 1)
  • Combine the pseudo probabilities into the final
    rule (a sketch follows below)
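The steps above correspond to the Real AdaBoost formulation of
Friedman, Hastie & Tibshirani (2000), in which each stage
contributes half the log-odds of its predicted probability. A
rough Python sketch under that reading, using decision stumps
as the weighted base learner (all names are illustrative, not
taken from the slides):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def real_adaboost_fit(X, y, M=50, eps=1e-6):
        """y is coded -1/+1, as on slide 2; returns the M fitted stages."""
        X, y = np.asarray(X), np.asarray(y)
        w = np.full(len(y), 1.0 / len(y))        # initialize weights
        stages = []
        for _ in range(M):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)     # fit a classifier with these weights
            p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)  # P(churn = 1 | x)
            f = 0.5 * np.log(p / (1 - p))        # pseudo probabilities (half log-odds)
            w = w * np.exp(-y * f)               # new weights: up-weight hard cases
            w = w / w.sum()                      # normalize so the weights sum to 1
            stages.append(stump)
        return stages

    def real_adaboost_predict(stages, X, eps=1e-6):
        """Combine the pseudo probabilities and take the sign (slides 11-12)."""
        F = np.zeros(len(X))
        for stump in stages:
            p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)
            F += 0.5 * np.log(p / (1 - p))
        return np.where(F >= 0, 1, -1)

The discrete AdaBoost of Freund and Schapire (1997) differs
only in letting each stage vote -1/+1 with a single coefficient
based on its weighted error rate.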

11
Weighting
12
Aggregation
  • Final rule: the sign of the combined classifier
13
Advantages of Bagging and Boosting
  • Easy to implement, without additional information
  • For Bagging: variance reduction
  • For Boosting: variance and bias reduction
  • For Boosting: no overfitting

14
Choose an Unstable Classifier for Bagging
  • Changes in the dataset produce big changes in
    the predictor
  • E.g. neural networks, classification and
    regression trees
  • Example of a stable classifier: K-nearest neighbours

Reference: L. Breiman, "Bagging Predictors", Machine
Learning, 24(2), p.123-140, 1996
15
Choose a Weak Classifier for Boosting
  • A classifier that performs slightly better than
    random guessing
  • Classifiers that are too weak do not provide
    good results
  • E.g. classification into 2 classes
  • Random guessing: error rate of 50%
  • Weak classifier: error rate close to 50% (e.g. 45%)
  • Stumps are appropriate weak classifiers (binary
    trees with 2 terminal nodes; see the sketch below)
  • "AdaBoost with trees is the best off-the-shelf
    classifier in the world" (Breiman, 1996)

Reference: J. Friedman, T. Hastie & R. Tibshirani,
"Additive Logistic Regression: A Statistical View of
Boosting", The Annals of Statistics, 28(2), p.337-407, 2000
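For concreteness, a small self-contained sketch (scikit-learn
assumed; the data here are synthetic stand-ins, since the
actual churn predictors are not given on the slides) of a
decision stump, i.e. a tree constrained to 2 terminal nodes:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the churn data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = np.where(X[:, 0] + rng.normal(size=1000) > 0, 1, -1)   # -1/+1 labels

    # A stump: a single split, hence a binary tree with 2 terminal nodes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
    error_rate = 1.0 - stump.score(X_test, y_test)   # below 0.5: weak, but better than random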
16
The Churn Problem
  • Company
  • A major U.S. wireless telecom company
  • Churn rate: 1.8% per month
  • Industry: highly competitive
  • Consolidation into a few major players
  • Growth is slowing down
  • Competition based on price
  • Customer strategy: to offer new services

17
The Churn Problem
  • Churn rates
  • Annual churn rate in the telecom industry: 20-25%
  • (in recent years: 25-46%)
  • Monthly churn rate: 2%
  • Reasons for churn
  • Increased competition
  • Similarities in offerings
  • Portability

18
The Churn Problem
19
Selection of the Variables: the Procedure
  • Descriptive analysis
  • Rejection of the variables with more than 30%
    missing values
  • Theoretical background in Marketing
  • Principal Components Analysis for each category
    (a sketch follows below)
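A rough pandas/scikit-learn sketch of that procedure; the
actual variable categories are not listed on these slides, so
the grouping is passed in as an argument and all names are
illustrative (numeric predictors assumed):

    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def select_variables(df, categories):
        """df: one row per customer; categories: dict mapping a category
        name to the list of its candidate predictor columns."""
        # Reject the variables with more than 30% missing values
        df = df.loc[:, df.isna().mean() <= 0.30]

        # Principal Components Analysis within each remaining category
        summary = {}
        for name, cols in categories.items():
            cols = [c for c in cols if c in df.columns]
            X = StandardScaler().fit_transform(df[cols].fillna(df[cols].mean()))
            summary[name + "_pc1"] = PCA(n_components=1).fit_transform(X).ravel()
        return pd.DataFrame(summary, index=df.index)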

20
Selection of the Variables
  • Available predictors for churn

21
Considered variables
22
Assessment of the Performance
  • Training set (80%) and test set (20%)
  • Misclassification rate
  • Gini Index
  • Top Decile Index

23
Top Decile
  • Customers are sorted from the predicted most
    likely to churn to the predicted least likely
  • Keep only the top 10%

24
Top Decile
  • Customers are sorted from the predicted most
    likely to churn to the predicted least likely
  • Keep only the top 10% (a sketch follows below)

Example: N = 100 customers, of whom 10 are churners
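Assuming the Top Decile Index is the churn rate among the 10%
of customers with the highest predicted risk, relative to the
overall churn rate (top-decile lift), a small sketch:

    import numpy as np

    def top_decile_lift(scores, churned):
        """scores: predicted risk of churning; churned: 1 if the
        customer actually churned, 0 otherwise."""
        scores, churned = np.asarray(scores), np.asarray(churned)
        order = np.argsort(-scores)               # most likely churners first
        top = order[: max(1, len(order) // 10)]   # keep only the top 10%
        return churned[top].mean() / churned.mean()

In the example above (N = 100 customers, 10 churners), a model
that places all 10 churners in its top decile reaches the
maximum lift of 1.0 / 0.1 = 10.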
25
Gini Index
(Chart: customers ranked by predicted risk to churn)
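The slides do not spell out how the Gini index is computed; a
common convention in churn scoring is the Gini coefficient of
the ranking, related to the area under the ROC curve by
Gini = 2*AUC - 1. A sketch under that assumption:

    from sklearn.metrics import roc_auc_score

    def gini_index(churned, scores):
        """Gini coefficient of the ranking of customers by predicted
        risk, assuming the usual Gini = 2*AUC - 1 relation."""
        return 2.0 * roc_auc_score(churned, scores) - 1.0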
26
Results: Bagging a Decision Tree
27
Gini Index
28
Gini Index
29
Gini Index
30
Top Decile
31
Top Decile
32
Top Decile
33
Misclassification Rate
34
Misclassification Rate
35
Misclassification Rate
36
Results: Boosting a Decision Stump
  • Gini Index: 36% improvement
  • Top Decile: 23% improvement
  • Misclassification Rate: 7% improvement

37
Comparison of Bagging and Boosting
38
Conclusions
  • Bagging and Boosting are easy to implement
  • They give convincing results without any
    additional information
  • Results depend heavily on the particular
    classification problem
  • Many competing versions of Boosting exist (e.g.
    TreeNet / Stochastic Gradient Boosting)
  • Still many open issues

39
(No Transcript)
40
Gini Index
OVER-OPTIMISM
41
Top Decile
OVER-OPTIMISM
42
Misclassification Rate
OVER-OPTIMISM