Bagging and Boosting Classifiers


Transcript and Presenter's Notes



1
Bagging and Boosting Classifiers
  • The Churn problem

2
  • The Challenge
  • A U.S. wireless telecom company
  • To predict whether a customer will churn during
    the next few months

  • Churn is the response dummy variable such that
  • Churn = 1 if the customer is churning
  • Churn = -1 if the customer is not churning

3
Opportunity for this Challenge
  • Bagging and Boosting
  • Aggregating Classifiers

FINAL RULE
  • Breiman (1996) found gains in accuracy by
    aggregating predictors built from reweighted
    versions of the learning set

4
Bagging and Boosting: Aggregating Classifiers
5
Bagging
  • Bagging = Bootstrap Aggregating
  • Reweighting of the learning set is done by
    drawing at random with replacement from the
    learning set
  • Predictors are aggregated by plurality voting

6
The Bagging Algorithm
  • Draw B bootstrap samples from the learning set
  • From each sample we derive a classifier; the B
    classifiers are then aggregated by voting (a
    sketch follows below)
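As an illustration only, here is a minimal Python sketch of
this procedure, assuming a scikit-learn decision tree as the
unstable base classifier and the -1/+1 churn coding of slide 2
(function and variable names are hypothetical):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, B=100, seed=0):
        """Fit B trees, each on a bootstrap sample of the learning set."""
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(seed)
        n = len(y)
        models = []
        for _ in range(B):
            idx = rng.integers(0, n, size=n)   # draw n indices with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Aggregate by plurality voting: the sign of the summed -1/+1 votes."""
        votes = np.sum([m.predict(X) for m in models], axis=0)
        return np.where(votes >= 0, 1, -1)

With two classes coded -1/+1, plurality voting and taking the
sign of the summed predictions give the same final rule.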

7
Weighting
8
Aggregation
  • Final rule: the sign of the sum of the B
    individual -1/+1 predictions
9
Boosting
  • Freund and Schapire (1997), Breiman (1998)
  • Data adaptively resampled

10
AdaBoost
  • Initialize the observation weights
  • Fit a classifier with these weights
  • Give predicted probabilities to the observations
    according to this classifier
  • Compute the pseudo probabilities
  • Get new weights
  • Normalize them (i.e., rescale so that they sum
    to 1)
  • Combine the pseudo probabilities into the final
    rule (a sketch follows below)
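The steps above correspond to the Real AdaBoost formulation of
Friedman, Hastie & Tibshirani (2000), in which each stage
contributes half the log-odds of its predicted probability. A
rough Python sketch under that reading, using decision stumps
as the weighted base learner (all names are illustrative, not
taken from the slides):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def real_adaboost_fit(X, y, M=50, eps=1e-6):
        """y is coded -1/+1, as on slide 2; returns the M fitted stages."""
        X, y = np.asarray(X), np.asarray(y)
        w = np.full(len(y), 1.0 / len(y))        # initialize weights
        stages = []
        for _ in range(M):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)     # fit a classifier with these weights
            p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)  # P(churn = 1 | x)
            f = 0.5 * np.log(p / (1 - p))        # pseudo probabilities (half log-odds)
            w = w * np.exp(-y * f)               # new weights: up-weight hard cases
            w = w / w.sum()                      # normalize so the weights sum to 1
            stages.append(stump)
        return stages

    def real_adaboost_predict(stages, X, eps=1e-6):
        """Combine the pseudo probabilities and take the sign (slides 11-12)."""
        F = np.zeros(len(X))
        for stump in stages:
            p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)
            F += 0.5 * np.log(p / (1 - p))
        return np.where(F >= 0, 1, -1)

The discrete AdaBoost of Freund and Schapire (1997) differs
only in letting each stage vote -1/+1 with a single coefficient
based on its weighted error rate.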

11
Weighting
12
Aggregation
  • Final rule: the sign of the combined classifier
13
Advantages of Bagging and Boosting
  • Easy to implement, without additional information
  • For Bagging: variance reduction
  • For Boosting: variance and bias reduction
  • For Boosting: no overfitting

14
Choose an Unstable Classifier for Bagging
  • Changes in the dataset produce big changes in
    the predictor
  • E.g. neural networks, classification and
    regression trees
  • Example of a stable classifier: K-nearest neighbours

Reference: L. Breiman, "Bagging Predictors", Machine
Learning, 24(2), p.123-140, 1996
15
Choose a Weak Classifier for Boosting
  • A classifier that performs slightly better than
    random guessing
  • Classifiers that are too weak do not provide
    good results
  • E.g. classification into 2 classes
  • Random guessing: error rate of 50%
  • Weak classifier: error rate close to 50% (e.g. 45%)
  • Stumps are appropriate weak classifiers (binary
    trees with 2 terminal nodes; see the sketch below)
  • "AdaBoost with trees is the best off-the-shelf
    classifier in the world" (Breiman, 1996)

Reference: J. Friedman, T. Hastie & R. Tibshirani,
"Additive Logistic Regression: A Statistical View of
Boosting", The Annals of Statistics, 28(2), p.337-407, 2000
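For concreteness, a small self-contained sketch (scikit-learn
assumed; the data here are synthetic stand-ins, since the
actual churn predictors are not given on the slides) of a
decision stump, i.e. a tree constrained to 2 terminal nodes:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the churn data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = np.where(X[:, 0] + rng.normal(size=1000) > 0, 1, -1)   # -1/+1 labels

    # A stump: a single split, hence a binary tree with 2 terminal nodes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
    error_rate = 1.0 - stump.score(X_test, y_test)   # below 0.5: weak, but better than random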
16
The Churn Problem
  • Company
  • A major U.S. wireless telecom company
  • Churn rate: 1.8% per month
  • Industry: highly competitive
  • Consolidation into a few major players
  • Growth is slowing down
  • Competition based on price
  • Customer strategy: to offer new services

17
The Churn Problem
  • Churn rates
  • Annual churn rate in the telecom industry: 20-25%
  • (in recent years: 25-46%)
  • Monthly churn rate: 2%
  • Reasons for churn
  • Increased competition
  • Similarities in offerings
  • Portability

18
The Churn Problem
19
Selection of the Variables: the Procedure
  • Descriptive analysis
  • Rejection of the variables with more than 30%
    missing values
  • Theoretical background in Marketing
  • Principal Components Analysis for each category
    (a sketch follows below)
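A rough pandas/scikit-learn sketch of that procedure; the
actual variable categories are not listed on these slides, so
the grouping is passed in as an argument and all names are
illustrative (numeric predictors assumed):

    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def select_variables(df, categories):
        """df: one row per customer; categories: dict mapping a category
        name to the list of its candidate predictor columns."""
        # Reject the variables with more than 30% missing values
        df = df.loc[:, df.isna().mean() <= 0.30]

        # Principal Components Analysis within each remaining category
        summary = {}
        for name, cols in categories.items():
            cols = [c for c in cols if c in df.columns]
            X = StandardScaler().fit_transform(df[cols].fillna(df[cols].mean()))
            summary[name + "_pc1"] = PCA(n_components=1).fit_transform(X).ravel()
        return pd.DataFrame(summary, index=df.index)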

20
Selection of the Variables
  • Available predictors for churn

21
Considered variables
22
Assessment of the Performance
  • Training set (80%) and test set (20%)
  • Misclassification rate
  • Gini Index
  • Top Decile Index

23
Top Decile
  • Customers are sorted from the predicted most
    likely to churn to the predicted least likely
  • Keep only the top 10%

24
Top Decile
  • Customers are sorted from the predicted most
    likely to churn to the predicted least likely
  • Keep only the top 10% (a sketch follows below)

Example: N = 100 customers, of whom 10 are churners
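Assuming the Top Decile Index is the churn rate among the 10%
of customers with the highest predicted risk, relative to the
overall churn rate (top-decile lift), a small sketch:

    import numpy as np

    def top_decile_lift(scores, churned):
        """scores: predicted risk of churning; churned: 1 if the
        customer actually churned, 0 otherwise."""
        scores, churned = np.asarray(scores), np.asarray(churned)
        order = np.argsort(-scores)               # most likely churners first
        top = order[: max(1, len(order) // 10)]   # keep only the top 10%
        return churned[top].mean() / churned.mean()

In the example above (N = 100 customers, 10 churners), a model
that places all 10 churners in its top decile reaches the
maximum lift of 1.0 / 0.1 = 10.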
25
Gini Index
(Chart: customers ranked by predicted risk to churn)
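The slides do not spell out how the Gini index is computed; a
common convention in churn scoring is the Gini coefficient of
the ranking, related to the area under the ROC curve by
Gini = 2*AUC - 1. A sketch under that assumption:

    from sklearn.metrics import roc_auc_score

    def gini_index(churned, scores):
        """Gini coefficient of the ranking of customers by predicted
        risk, assuming the usual Gini = 2*AUC - 1 relation."""
        return 2.0 * roc_auc_score(churned, scores) - 1.0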
26
Results: Bagging a Decision Tree
27
Gini Index
28
Gini Index
29
Gini Index
30
Top Decile
31
Top Decile
32
Top Decile
33
Misclassification Rate
34
Misclassification Rate
35
Misclassification Rate
36
Results: Boosting a Decision Stump
  • Gini Index: 36% improvement
  • Top Decile: 23% improvement
  • Misclassification Rate: 7% improvement

37
Comparison of Bagging and Boosting
38
Conclusions
  • Bagging and Boosting are easy to implement
  • They give convincing results without any
    additional information
  • Results depend heavily on the particular
    classification problem
  • Many competing versions of Boosting exist (e.g.
    TreeNet / Stochastic Gradient Boosting)
  • Still many open issues

39
(No Transcript)
40
Gini Index
OVER-OPTIMISM
41
Top Decile
OVER-OPTIMISM
42
Misclassification Rate
OVER-OPTIMISM