Title: Bagging a Stacked Classifier
1. Bagging a Stacked Classifier
- Lemmens A., Joossens K., and Croux C.
2. Outline of the talk
- Combining different classifiers by stacking (Wolpert 1992; LeBlanc and Tibshirani 1996)
- We propose:
  - A new combination algorithm: Optimal Stacking
  - Applying bagging after stacking: Bagged Optimal Stacking
3. Stacked Classification
- The problem
- Training set of n observations (x_i, y_i), i = 1, ..., n
- The aim is to predict the class label y for a new instance x
4. Stacked Classification
- Several classification models exist:
  - Logistic regression
  - Discriminant analysis
  - Classification trees
  - Neural nets
  - Support vector machines
- Which one is best?
- A solution is to combine:
  - Homogeneous classifiers, e.g. boosting (for a review, see Hastie, Tibshirani and Friedman 2001)
  - Heterogeneous classifiers, e.g. stacking
5. Stacked Classification
- Stacking consists in combining K level-zero classifiers
- [Diagram: level-zero classifiers feeding a level-one combining classifier]
6. Stacked Classification
- Stacking consists in combining K level-zero classifiers
- [Diagram: level-zero classifiers feeding a level-one combining classifier]
- The level-one inputs are the scores C_k^(-i)(x_i), k = 1, ..., K, where the superscript -i indicates that observation i was left out when fitting classifier k
- Estimated by cross-validation (a code sketch follows below)
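A minimal sketch of this step, assuming scikit-learn and an illustrative choice of data set, level-zero classifiers and fold count (not the authors' code): each observation is scored by models fitted without its fold, and the resulting score matrix is the level-one input.

    # Illustrative sketch: cross-validated level-zero scores as level-one input.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # K heterogeneous level-zero classifiers (this choice is an assumption).
    level_zero = [
        LogisticRegression(max_iter=1000),
        LinearDiscriminantAnalysis(),
        DecisionTreeClassifier(max_depth=5, random_state=0),
    ]

    # Cross-validated class-1 scores: observation i is scored by a model
    # fitted on the folds that do not contain observation i.
    Z = np.column_stack([
        cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]
        for clf in level_zero
    ])
    print(Z.shape)  # (n, K) matrix of level-one inputs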
7. How to combine classifiers?
- Optimal Weighted Average combination
- Find weights w_1, ..., w_K such that the combined classifier C(x) = w_1 C_1(x) + ... + w_K C_K(x)
- performs better than any level-zero classifier,
- with w_k >= 0 for every k,
- and w_1 + ... + w_K = 1.
8. How to combine classifiers?
- Optimal Weighted Average combination
- Find weights w_1, ..., w_K such that the combined classifier C(x) = w_1 C_1(x) + ... + w_K C_K(x)
- performs better than any level-zero classifier,
- with w_k >= 0 for every k,
- and w_1 + ... + w_K = 1.
- "Performs better" is measured by the cross-validated error rate on the training set (illustrated below)
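A small helper illustrating this criterion on the score matrix Z from the previous sketch: the error rate of a convex combination of the cross-validated level-zero scores. The 0.5 decision threshold is an assumption, not something stated in the talk.

    import numpy as np

    def combined_error_rate(Z, y, w, threshold=0.5):
        # Error rate of the classifier that thresholds the weighted score Z @ w,
        # with w_k >= 0 and sum_k w_k = 1 (the Optimal Weighted Average weights).
        w = np.asarray(w, dtype=float)
        assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
        y_pred = (Z @ w >= threshold).astype(int)
        return np.mean(y_pred != y)

    # Example: equal weights as a naive baseline; the greedy algorithm on the
    # next slide searches for better weights.
    # print(combined_error_rate(Z, y, np.ones(Z.shape[1]) / Z.shape[1]))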
9. How to combine classifiers?
- Finding the weights by a greedy algorithm:
- Compute, by 10-fold cross-validation, the scores of the K level-zero classifiers.
- Compute the cross-validated error rates to sort the classifiers from the smallest to the highest error rate. Set the combined classifier equal to the best single classifier.
- Update cycle: for each of the remaining classifiers, find the mixing weight such that the error rate of the combined classifier is minimized.
- Set the combined classifier to this new weighted average.
- Iterate over different update cycles (a sketch follows below).
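One possible reading of this greedy search, sketched under stated assumptions (the alpha grid, the 0.5 threshold and the fixed number of cycles are choices made here, not the authors' settings):

    import numpy as np

    def greedy_stacking_weights(Z, y, n_cycles=5, alpha_grid=None):
        # Z: (n, K) matrix of cross-validated level-zero scores, y: 0/1 labels.
        if alpha_grid is None:
            alpha_grid = np.linspace(0.0, 1.0, 101)
        K = Z.shape[1]

        def err(score):
            # Error rate of a score vector on the training set (0.5 cut-off assumed).
            return np.mean((score >= 0.5).astype(int) != y)

        # Sort classifiers from the smallest to the highest error rate.
        order = np.argsort([err(Z[:, k]) for k in range(K)])

        w = np.zeros(K)
        w[order[0]] = 1.0                      # start from the best single classifier
        for _ in range(n_cycles):              # iterate over update cycles
            for k in order[1:]:
                current = Z @ w
                # Mixing weight minimizing the error of (1 - a)*current + a*C_k.
                errors = [err((1 - a) * current + a * Z[:, k]) for a in alpha_grid]
                a_best = alpha_grid[int(np.argmin(errors))]
                w = (1 - a_best) * w           # weights stay nonnegative
                w[k] += a_best                 # and keep summing to one
        return w

By construction, every accepted update can only decrease (never increase) the cross-validated error rate of the combination, which is the property claimed on the next slide.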
10. Advantages of the Algorithm
- By construction, the stacked classifier outperforms each of its components in terms of cross-validated error rate on the training data.
- Other optimization criteria can be chosen (ROC, error rate, specificity, etc.).
- Easy to implement.
- Remark: other level-one classifiers exist.
11. Results (1): Cross-Validated Error Rates
12. Results (2): Test Error Rates
13. Bagging the Stacked Classifiers
- Why bagging (Breiman 1996)?
- Bagging reduces the variance of the stacked classifiers,
- and exploits their instability to improve predictive performance (see the sketch below).
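A rough sketch of the bagging step under stated assumptions: B bootstrap replicates of the training set, the full stacking procedure refitted on each, and the resulting scores averaged. The helper fit_stacked is hypothetical and stands for the stacking pipeline sketched above.

    import numpy as np

    def bagged_stacked_scores(X, y, X_new, fit_stacked, B=50, random_state=0):
        # fit_stacked(X, y) is assumed to refit the level-zero classifiers and the
        # optimal weights, and to return a function mapping new data to scores.
        rng = np.random.default_rng(random_state)
        n = len(y)
        scores = np.zeros((B, len(X_new)))
        for b in range(B):
            idx = rng.integers(0, n, size=n)      # bootstrap sample with replacement
            scores[b] = fit_stacked(X[idx], y[idx])(X_new)
        return scores.mean(axis=0)                # average over the B replicates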
14. Bagging Example 1: Test Error Rate
15. Bagging Example 2: Test Error Rate
16. Results (3): Test Error Rates
17. Conclusions
- New optimal weighted combination algorithm for stacking:
  - Easy to implement,
  - Optimizes any criterion of choice (ROC, error rate, specificity, etc.),
  - By construction, systematic improvement of the cross-validated error rate on the training data.
- Bagging a stacked classifier:
  - The bagged optimal weighted average outperforms the non-bagged version; this also holds for other level-one classifiers.
18. References
- Breiman, L. (1996), Bagging Predictors, Machine Learning, 24, 123-140.
- Hastie, T., Tibshirani, R. and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York.
- LeBlanc, M. and Tibshirani, R. (1996), Combining estimates in regression and classification, Journal of the American Statistical Association, 91, 1641-1647.
- Wolpert, D.H. (1992), Stacked generalization, Neural Networks, 5, 241-259.
- Proceedings, COMPSTAT 2004.