Title: More Trees: Ensemble and Applications
1 More Trees: Ensemble and Applications
2 Bias-Variance Decomposition
- Average error (expected distance from the true boundary) = bias + variance + noise
3 Bias
- T_2 is a pruned version of T_1, thus having a larger bias
4 Bias
- Bias is the distance from the true decision boundary (target) to the average trained boundary
- Bias becomes larger when stronger assumptions are made by a classifier about the nature of its decision boundary
- A high-bias classifier is less sensitive to the data (it already has its own mind; not very open-minded)
5 Variance
- A training set is only one instance of many possible data sets; we could have gotten a different training set
- If so, would the trained classifier be very different? If it would, the variance is large
- Variance is the variability of the trained boundaries from one another
6 Variance
- If a classifier is overfitted, it will have a large variance
- This is because the resulting classifier will be very different depending on the particular training data set
7 Bias and Variance
- Average boundaries (over 100 data sets)
- Whose bias is larger, in general? (A simulation sketch follows.)
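The decomposition can be checked numerically. The sketch below is a regression analogue, not part of the original slides: it draws 100 training sets, fits a heavily pruned tree and a fully grown tree on each, and compares the squared distance of the average model from the target (bias) with the spread of the trained models (variance). The data generator, noise level, and depths are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 200).reshape(-1, 1)
f_true = np.sin(4 * x_grid).ravel()          # the true target function (assumed for illustration)

def average_and_variance(max_depth, n_sets=100, n=50):
    preds = []
    for _ in range(n_sets):
        x = rng.random((n, 1))
        y = np.sin(4 * x).ravel() + rng.normal(0, 0.3, n)   # one noisy training set
        preds.append(DecisionTreeRegressor(max_depth=max_depth).fit(x, y).predict(x_grid))
    preds = np.array(preds)
    bias2 = ((preds.mean(axis=0) - f_true) ** 2).mean()     # distance of the average model from the target
    var = preds.var(axis=0).mean()                           # variability of the 100 trained models
    return bias2, var

print("pruned-like tree (depth 2):", average_and_variance(2))      # larger bias, smaller variance
print("fully grown tree:          ", average_and_variance(None))   # smaller bias, larger variance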
8 Ensemble Methods
- Construct a set of trees (or classifiers) from the training data
- Predict the class label of previously unseen records by aggregating the predictions made by the multiple classifiers
9 General Idea
10 Why does it work?
- Suppose there are T = 25 base classifiers
- Each classifier has error rate ε = 0.35
- Assume the classifiers are independent
- The ensemble (majority vote) makes a wrong prediction only when 13 or more base classifiers err:
  P(ensemble wrong) = Σ_{i=13}^{25} C(25, i) · ε^i · (1 − ε)^{25−i} ≈ 0.06 (a quick calculation follows)
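As a quick check, the binomial calculation above can be reproduced in a few lines of Python; this is just the arithmetic for 25 independent classifiers, not part of the original slides.

```python
from math import comb

T = 25       # number of independent base classifiers
eps = 0.35   # error rate of each base classifier

# The majority vote is wrong only when 13 or more of the 25 classifiers err
ensemble_error = sum(comb(T, i) * eps**i * (1 - eps)**(T - i)
                     for i in range(T // 2 + 1, T + 1))
print(f"{ensemble_error:.3f}")   # ~0.06, versus 0.35 for a single classifier
```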
11 Examples of Ensemble Methods
- How to generate an ensemble of classifiers?
- Bagging of complex, overfitted classifiers (trees, neural networks)
- Boosting of simple, underfitted classifiers (decision stumps)
- Random Forest of complex trees
12 Bagging (Breiman)
- Generate T bootstrap data sets by sampling with replacement (example: T = 3, n = 10)
- Each record has probability 1 − (1 − 1/n)^n ≈ 0.632 of being selected into a given bootstrap sample
- Build a classifier on each bootstrap data set
- Combine the outputs of the T classifiers, e.g., by majority vote (a minimal sketch follows)
- Reduces the variance of the classifiers
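A minimal bagging sketch, assuming scikit-learn and a toy dataset from make_classification; the data, number of trees, and tree settings are illustrative, not from the slides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # toy data (assumption)
rng = np.random.default_rng(0)
T, n = 25, len(X)

# Probability that a given record appears in one bootstrap sample: 1 - (1 - 1/n)^n ~ 0.632
print(1 - (1 - 1 / n) ** n)

trees = []
for _ in range(T):
    idx = rng.integers(0, n, size=n)                 # sample n records with replacement
    tree = DecisionTreeClassifier(random_state=0)    # fully grown (low-bias, high-variance) tree
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote over the T trees reduces variance
votes = np.array([t.predict(X) for t in trees])      # shape (T, n)
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)     # binary majority vote
print("Accuracy of the bagged ensemble:", (y_pred == y).mean())
```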
13 Why Bagging works
14 Boosting
- An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
- Initially, all n records are assigned equal weights
- Unlike bagging, the weights may change at the end of each boosting round
15 Boosting
- Records that are wrongly classified will have their weights increased
- Records that are classified correctly will have their weights decreased
- Example 4 is hard to classify
- Its weight is increased, so it is more likely to be chosen again in subsequent rounds
16 Example: AdaBoost
- Base classifiers C_1, C_2, ..., C_T
- Error rate of classifier C_j: ε_j = Σ_i w_i · I(C_j(x_i) ≠ y_i)
- Importance of a classifier: α_j = (1/2) · ln((1 − ε_j) / ε_j)
17 Example: AdaBoost
- Update of the weight of training record i during the j-th boosting round:
  w_i^(j+1) = (w_i^(j) / Z_j) · exp(−α_j) if C_j(x_i) = y_i, and (w_i^(j) / Z_j) · exp(+α_j) otherwise, where Z_j is a normalization factor
- The weight increases for an incorrectly classified pattern and decreases for a correctly classified pattern
- The higher the weight, the more likely the record is to be selected
18 Example: AdaBoost
- Instead of simple majority voting, the prediction by each classifier is weighted according to its importance α_j
- Classification: C*(x) = argmax_y Σ_j α_j · I(C_j(x) = y) (a minimal sketch follows)
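A minimal sketch of the AdaBoost quantities from slides 16-18 (weighted error rate, importance α_j, weight update, and importance-weighted vote), using scikit-learn decision stumps on a toy dataset; the data and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = 2 * y - 1                              # labels in {-1, +1} for the weighted vote
n, T = len(X), 10
w = np.full(n, 1 / n)                      # initially, all records get equal weight
stumps, alphas = [], []

for j in range(T):
    stump = DecisionTreeClassifier(max_depth=1)      # simple, underfitted base classifier
    stump.fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y

    eps_j = max(w[miss].sum(), 1e-10)                # weighted error rate of C_j (guarded against 0)
    alpha_j = 0.5 * np.log((1 - eps_j) / eps_j)      # importance of C_j

    # Increase weights of misclassified records, decrease the rest, then renormalize
    w = w * np.exp(np.where(miss, alpha_j, -alpha_j))
    w /= w.sum()

    stumps.append(stump)
    alphas.append(alpha_j)

# Final classification: sign of the importance-weighted vote
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("Training accuracy:", (np.sign(scores) == y).mean())
```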
19 Random Forest
20 Random Forest
- A forest (set) of decision trees, each of which is built from a random subset of the variables
- Less intelligent individually, but intelligent as a group
- Useful when a large number of variables exist (a minimal sketch follows)
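A minimal random forest sketch, assuming scikit-learn; the dataset and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# Each tree is grown on a bootstrap sample and, at every split, considers only a
# random subset of the variables (max_features="sqrt" -> about 7 of the 50 here)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
forest.fit(X_tr, y_tr)
print("Test accuracy:", forest.score(X_te, y_te))
```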
21 Ensemble
- Pros
  - Reduces model error
  - Is easy to automate
- Cons
  - Hurts interpretability
  - Requires more time for training and evaluation
- Reference
  - Introduction to Data Mining by Tan, Steinbach, and Kumar for more information
22 Up-sell
- [Diagram] Card tiers Green → Gold → Platinum, plotted against amount per transaction (ARPU)
- You want your customer to spend more: maximize ARPU by up-selling
23 Target Marketing (cf. Mass Marketing)
- To whom should we recommend the Platinum Card among our valued customers?
- → Those who show usage patterns similar to Platinum customers
- How to find them?
- → A Decision Tree
24 To identify Platinum customers
- Objective: identify those non-Platinum customers who are very much like Platinum customers
- Identify Platinum behaviors
- How? Using a Decision Tree
- [Diagram] Non-Platinum customers are compared with Platinum customers; those who resemble them (pseudo-Platinum customers) become the candidates for up-sell
25 To identify Platinum customers
- Build a DT from the usage data of both Platinum and non-Platinum customers
- Target those non-Platinum customers who are incorrectly classified as Platinum customers by the DT (a minimal sketch follows)
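A minimal sketch of this scoring approach, assuming a hypothetical card_usage.csv with the usage variables and an is_platinum flag; all file and column names here are assumptions, not from the case study.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical usage data: one row per customer, store-category spend variables,
# total amount, payment-ratio columns, and an is_platinum label (names assumed)
usage = pd.read_csv("card_usage.csv")
feature_cols = [c for c in usage.columns if c not in ("customer_id", "is_platinum")]

# 60/40 training/validation split, as in the case study
train, valid = train_test_split(usage, test_size=0.4, stratify=usage["is_platinum"],
                                random_state=0)

tree = DecisionTreeClassifier(min_samples_leaf=50, random_state=0)
tree.fit(train[feature_cols], train["is_platinum"])

# "Pseudo-Platinum" prospects: non-Platinum customers the tree classifies as Platinum
non_platinum = usage[usage["is_platinum"] == 0]
prospects = non_platinum[tree.predict(non_platinum[feature_cols]) == 1]
print(len(prospects), "candidates for up-sell")
```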
26 To identify Platinum customers (usage data: how much, on where)
- Variables: 17 store categories, amount of usage (total, credit/cash/loan ratio)
- Training data: Platinum / non-Platinum = 4,877 / 4,877 (training/validation split 60/40)
- [Diagram] U: all usage data; A: training data (Platinum customers); B: training data (non-Platinum customers); B, C: to be scored
27 To identify Platinum customers
- Resulting rules (a rule-extraction sketch follows this slide)
- IF 5-star hotel ≥ 110 dollars AND use of Airline THEN Platinum 93.1% (787)
- IF Country Club > 480 AND Japanese Restaurant > 100 AND no use of Airline THEN Platinum 92.7% (151)
- IF Country Club > 70 AND Japanese Restaurant < 240 AND 5-star hotel < 110 AND use of Airline THEN Platinum 93.3% (90)
- Identify prospects among the 295,123 non-Platinum customers
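Rules of this IF-THEN form can be read off a fitted scikit-learn tree. The snippet below assumes the `tree` and `feature_cols` objects from the earlier up-sell sketch; it is not the tool used in the original case study.

```python
from sklearn.tree import export_text

# Print the fitted tree as IF-THEN style decision paths
print(export_text(tree, feature_names=feature_cols))
```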
28 Target Marketing Channel
- First decile: experienced telemarketers
- Second decile: less experienced telemarketers
- Third decile: letter
- Fourth decile: SMS
- Fifth decile: email
- Below the fifth decile: no action (a decile-assignment sketch follows)
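A minimal sketch of the decile-to-channel assignment, assuming each prospect carries a Platinum-likelihood score (e.g., from the tree's predict_proba); the dummy scores and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Dummy prospect scores standing in for the tree's predicted Platinum probability
prospects = pd.DataFrame({"customer_id": np.arange(1000),
                          "platinum_score": np.random.default_rng(0).random(1000)})

# Rank prospects into deciles (1 = most Platinum-like) and map deciles to channels
ranks = prospects["platinum_score"].rank(method="first", ascending=False)
prospects["decile"] = pd.qcut(ranks, 10, labels=range(1, 11)).astype(int)
channel_by_decile = {1: "experienced telemarketers", 2: "less experienced telemarketers",
                     3: "letter", 4: "SMS", 5: "email"}
prospects["channel"] = prospects["decile"].map(channel_by_decile).fillna("no action")
print(prospects["channel"].value_counts())
```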
29 Netflix (www.netflix.com)
- Monthly DVD movie service
- Revenue: $5M in 2000 to $1B in 2006
- Busted Blockbuster (www.blockbuster.com)
- CEO Hastings: Stanford CS Master's
- A true analytics maniac
- Test, test, test before doing anything
30 Netflix Core Competence
- Cinematch: movie recommendation engine
- Clustering of movies + customer evaluations
- Personalized homepage with recommendations
- $1M cash award for a 10% improvement (the Netflix Prize)
- Throttling: send movies to valued customers first
- Valued customers: those who watch rarely
- Movie procurement decisions are made based on customer response to similar movies
31 Amazon (www.amazon.com)
- Started as an Internet bookstore
- Now they sell just about anything
- Shin ramen?
32 Amazon Core Competence
- Product Recommendations
- Past purchase history
- Past search history
- Check recommendations