1
Bagging and Boosting in Data Mining
  • Carolina Ruiz
  • ruiz@cs.wpi.edu
  • http://www.cs.wpi.edu/~ruiz

2
Motivation and Background
  • Problem Definition
  • Given a dataset of instances and a target
    concept,
  • Find a model (e.g., a set of association rules,
    a decision tree, or a neural network) that helps
    predict the classification of unseen instances.
  • Difficulties
  • The model should be stable (i.e., it shouldn't
    depend too much on the input data used to
    construct it)
  • The model should be a good predictor (difficult
    to achieve when the input dataset is small)

3
Two Approaches
  • Bagging (Bootstrap Aggregating)
  • Leo Breiman, UC Berkeley
  • Boosting
  • Rob Schapire, AT&T Research
  • Jerry Friedman, Stanford U.

4
Bagging
  • Model Creation
  • Create bootstrap replicates of the dataset and
    fit a model to each one
  • Prediction
  • Average (numeric targets) or vote (class labels)
    over the predictions of the individual models
  • Advantages
  • Stabilizes unstable methods
  • Easy to implement and to parallelize

5
Bagging Algorithm
  • 1. Create k bootstrap replicates of the dataset
  • 2. Fit a model to each of the replicates
  • 3. Average/vote the predictions of the k models
    (a minimal code sketch follows)

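A minimal Python sketch of these three steps, for illustration only: fit_model is a hypothetical zero-argument factory for any estimator with scikit-learn-style fit/predict methods, and class labels are assumed to be non-negative integers so the vote can use np.bincount.

```python
import numpy as np

def bagging_predict(X_train, y_train, X_test, fit_model, k=25, seed=None):
    """Bagging sketch: fit models on k bootstrap replicates, then vote.

    `fit_model` is a hypothetical factory returning an unfitted
    estimator with fit/predict; labels are assumed to be
    non-negative integers.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(k):
        # 1. Bootstrap replicate: draw n instances with replacement
        idx = rng.integers(0, n, size=n)
        # 2. Fit a model to this replicate
        model = fit_model().fit(X_train[idx], y_train[idx])
        all_preds.append(model.predict(X_test))
    # 3. Vote: most frequent predicted label per test instance
    votes = np.stack(all_preds)  # shape (k, n_test)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Called with, say, fit_model=lambda: DecisionTreeClassifier(), the vote averages away much of the variance of a single unstable tree, which is the stabilizing effect described on the previous slide.
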
6
Boosting
  • Creating the model
  • Construct a sequence of datasets and models such
    that each dataset in the sequence weights an
    instance heavily when the previous model has
    misclassified it.
  • Prediction
  • Merge the models in the sequence
  • Advantages
  • Improves classification accuracy

7
Generic Boosting Algorithm
  • 1. Equally weight all instances in the dataset
  • 2. For i = 1 to T
  • 2.1. Fit a model to the current dataset
  • 2.2. Upweight poorly predicted instances
  • 2.3. Downweight well-predicted instances
  • 3. Merge the models in the sequence to obtain the
    final model (see the sketch below)

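One way to make this loop concrete is the AdaBoost-style sketch below. The -1/+1 label encoding, the hypothetical weak-learner factory fit_model (whose fit() is assumed to accept sample_weight, as scikit-learn estimators do), and the round count T are all illustrative assumptions; the slide's "merge" step becomes a vote weighted by each model's accuracy.

```python
import numpy as np

def boost_predict(X_train, y_train, X_test, fit_model, T=50):
    """Generic boosting loop, AdaBoost-style sketch.

    Assumes binary labels encoded as -1/+1 and a hypothetical
    weak-learner factory `fit_model` whose fit() accepts
    sample_weight (e.g., a decision stump).
    """
    n = len(X_train)
    w = np.full(n, 1.0 / n)      # 1. equally weight all instances
    models, alphas = [], []
    for _ in range(T):           # 2. for i = 1 to T
        # 2.1. Fit a model to the current (weighted) dataset
        model = fit_model().fit(X_train, y_train, sample_weight=w)
        pred = model.predict(X_train)
        err = w[pred != y_train].sum()
        if err >= 0.5:           # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        # 2.2./2.3. Upweight misclassified, downweight correct instances
        w *= np.exp(-alpha * y_train * pred)
        w /= w.sum()             # renormalize to a distribution
        models.append(model)
        alphas.append(alpha)
    # 3. Merge: accuracy-weighted vote over the sequence of models
    score = sum(a * m.predict(X_test) for a, m in zip(alphas, models))
    return np.sign(score)
```

The update exp(-alpha * y_train * pred) grows an instance's weight exactly when the previous model misclassified it, which is the reweighting behavior the slide describes.
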
8
Conclusions and References
  • Boosted naïve Bayes tied for first place in the
    KDD Cup 1997 competition
  • Reference
  • J. F. Elder and G. Ridgeway, "Combining Estimators
    to Improve Performance," KDD-99 tutorial notes