1
Decision Trees
  • Steve Herrin
  • University of Washington

2
What is a Classifier?
  • Given a set of training cases, each with a vector of
    attributes and a label: (X, t) = (x1, x2, …, xk, t)
  • Usually, t is a discrete variable representing the
    class into which a case falls
  • Want a way to predict t based on X
  • Machine learning algorithms, called classifiers,
    provide a means to do this
  • Examples: neural nets, decision trees, Bayesian
    filters, and many more

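As a concrete illustration (a sketch, not from the original slides), a set of training cases can be stored as a feature matrix X and a label vector t; the attribute values and labels below are made up.

```python
import numpy as np

# Hypothetical training cases: each row of X is one attribute vector
# (x1, x2, x3), and t holds the class each case falls into
# (0 = background, 1 = signal).
X = np.array([[0.5, 1.2, 3.0],
              [0.7, 0.9, 2.5],
              [2.1, 0.1, 0.4],
              [1.9, 0.2, 0.6]])
t = np.array([1, 1, 0, 0])

# A classifier is any algorithm that learns a mapping X -> t from
# these cases and can then predict t for a new, unseen X.
```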
3
Decision Trees
  • Decision trees are a type of classifier
  • Node, Leaf and Branch structure
  • Generally binary
  • Leaf value may reflect a full classification
    (t = 0 or 1)
  • Or may give an idea of how close a case is to one
    class (depends on implementation)

[Diagram: a small binary tree. The root node splits on xi > a
vs. xi < a; one branch ends in a leaf with t = 1, the other
leads to a node that splits on xj > b vs. xj < b, whose leaves
carry t = 0 and t = 0.83. The node, branch, and leaf elements
are labeled.]
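A minimal sketch (not from the slides) of this node/branch/leaf structure and of classifying a case by walking the tree; the attribute indices and cut values below are invented to mirror the diagram.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    # Internal node: split on attribute `attr` at threshold `cut`.
    # Leaf: `value` holds the class label or a signal-likeness score.
    attr: Optional[int] = None
    cut: Optional[float] = None
    left: Optional["TreeNode"] = None   # branch taken when x[attr] < cut
    right: Optional["TreeNode"] = None  # branch taken when x[attr] >= cut
    value: Optional[float] = None

def classify(node, x):
    """Walk from the root node down to a leaf and return the leaf value."""
    while node.value is None:
        node = node.left if x[node.attr] < node.cut else node.right
    return node.value

# The tree from the diagram: split on x_i at a, then on x_j at b
# (indices and cut values are hypothetical).
i, j, a, b = 0, 1, 1.0, 2.0
tree = TreeNode(attr=i, cut=a,
                right=TreeNode(value=1.0),
                left=TreeNode(attr=j, cut=b,
                              left=TreeNode(value=0.0),
                              right=TreeNode(value=0.83)))

print(classify(tree, [0.4, 2.5]))  # -> 0.83
```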
4
Building a Decision Tree
  • At a node, find the attribute xm in X that
    provides the most discrimination
  • Find what value of xm to branch on
  • Gini improvement: how much purer the
    subsequent sets are
  • Information gain: how much the entropy of the
    subsequent sets decreases
  • Absolute error: how much the signal/background
    separation improves
  • The first two work best and give similar results

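A sketch of the split search at a node, assuming the unweighted Gini form n·p·(1−p); the function names are illustrative, not from the slides.

```python
import numpy as np

def gini(t):
    """Gini of a set of labels t in {0, 1}: n * p * (1 - p)."""
    if len(t) == 0:
        return 0.0
    p = np.mean(t)
    return len(t) * p * (1.0 - p)

def best_split(X, t):
    """Scan every attribute and candidate cut value; keep the split
    with the largest Gini improvement (parent minus children)."""
    best_attr, best_cut, best_gain = None, None, 0.0
    parent = gini(t)
    for m in range(X.shape[1]):
        for cut in np.unique(X[:, m]):
            left, right = t[X[:, m] < cut], t[X[:, m] >= cut]
            gain = parent - gini(left) - gini(right)
            if gain > best_gain:
                best_attr, best_cut, best_gain = m, cut, gain
    return best_attr, best_cut
```

Information gain works the same way with entropy in place of Gini (see the appendix slides).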
5
Overtraining
  • For any classifier, there is a danger of
    overtraining
  • Ideally, separating surface between classes
    should be simple
  • However, in an overtrained classifier, the
    separating surface is complicated
  • This occurs because the classifier optimizes too
    much on the training data
  • Leads to poor performance on test data

[Diagram: two classes scattered in the (xi, xj) plane, with a
smooth "ideal" separating surface and a convoluted
"overtrained" one.]
6
Pre-pruning
  • The tree growing process continues recursively
  • To prevent overtraining, the process stops at a node
    when certain conditions are met:
  • The node contains only one class of case (same t)
  • The node contains cases that all have the same X
  • The node contains fewer than N events (e.g. N ≈ 100)

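A sketch of how these stopping conditions might look as the base case of the recursive growing routine; N_MIN stands in for the N ≈ 100 above.

```python
import numpy as np

N_MIN = 100  # stop splitting nodes with fewer than N events

def should_stop(X, t):
    """Pre-pruning: return True if this node must become a leaf."""
    if len(np.unique(t)) <= 1:            # only one class of case (same t)
        return True
    if len(np.unique(X, axis=0)) <= 1:    # all cases share the same X
        return True
    if len(t) < N_MIN:                    # too few events in the node
        return True
    return False
```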
7
Pruning
  • For a more complicated tree, pre-pruning may
    still allow overtraining
  • Many different pruning algorithms exist
  • Simplest approach:
  • Withhold a small set of the training data
  • Grow the tree using the remaining data
  • After the tree is finished, prune a node to a leaf
    if doing so leads to a lower error rate on the
    withheld data

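A sketch of this simple reduced-error pruning, assuming trees are stored as nested dicts with a cached 'majority' label at each internal node (that representation is an assumption, not from the slides).

```python
import numpy as np

def predict(node, x):
    while 'leaf' not in node:
        node = node['left'] if x[node['attr']] < node['cut'] else node['right']
    return node['leaf']

def error_rate(node, X_held, t_held):
    return np.mean([predict(node, x) != y for x, y in zip(X_held, t_held)])

def prune(node, X_held, t_held):
    """Prune a node to a leaf if that does not raise the error rate
    on the withheld data; children are pruned first (bottom-up)."""
    if 'leaf' in node:
        return node
    node['left'] = prune(node['left'], X_held, t_held)
    node['right'] = prune(node['right'], X_held, t_held)
    as_leaf = {'leaf': node['majority']}
    if error_rate(as_leaf, X_held, t_held) <= error_rate(node, X_held, t_held):
        return as_leaf
    return node
```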
8
Advantages
  • A decision tree can be easily parsed by a human
    or computer program, unlike the black box of a
    neural net
  • Can be grown quickly
  • Handles discrete data (e.g. number of jets)

9
Disadvantages
  • Unstable: a small change in the training data can
    lead to large changes in the trees grown
  • For the simplest algorithms, cannot make use of
    correlations (esp. nonlinear) that only occur in
    one of signal or background
  • Does not separate classes along smooth boundaries,
    since each split is a straight cut on one attribute

10
Ensemble Methods
  • Many of these disadvantages can be removed by
    combining multiple trees
  • Boosting
  • Train a series of trees, then take a linear
    combination of their outputs
  • In each subsequent tree, more weight is given to
    hard cases (i.e. the ones misclassified by
    previous trees)
  • Sensitivity to noisy cases may lead to poor
    performance

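The slides do not name a specific boosting algorithm; below is a sketch of one common choice (AdaBoost-style reweighting), with scikit-learn's DecisionTreeClassifier standing in for the individual trees and labels assumed to be in {-1, +1}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, t, n_trees=50):
    """Train a series of trees, upweighting the hard (misclassified)
    cases for each subsequent tree; return (weight, tree) pairs."""
    w = np.full(len(t), 1.0 / len(t))           # case weights, start uniform
    ensemble = []
    for _ in range(n_trees):
        tree = DecisionTreeClassifier(max_depth=3)
        tree.fit(X, t, sample_weight=w)
        pred = tree.predict(X)
        err = np.sum(w[pred != t]) / np.sum(w)
        if err >= 0.5:                           # no better than guessing
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        ensemble.append((alpha, tree))
        w *= np.exp(-alpha * t * pred)           # boost weight of hard cases
        w /= np.sum(w)
    return ensemble

def boosted_score(ensemble, X):
    """Linear combination of the individual tree outputs."""
    return sum(alpha * tree.predict(X) for alpha, tree in ensemble)
```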
11
Ensemble Methods (cont.)
  • Bagging
  • Pick many random subsets of the training cases
    (may or may not allow replacement)
  • Train trees using these subsets, then take an
    average of their results
  • It is tempting to use a weighted average based on
    how accurately a tree classifies the training
    data, but this can lead to overtraining
  • Effective for noisy data and for unstable
    classifiers like trees (small changes in the
    training set can lead to large changes in
    predictions)

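A sketch of bagging under the same assumptions (scikit-learn trees); here the random subsets are bootstrap samples, i.e. drawn with replacement, and the combination is a plain unweighted average as the slide recommends.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bag(X, t, n_trees=100, seed=0):
    """Train each tree on a random subset of the training cases
    (drawn with replacement) and collect the trees."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(t), size=len(t))   # bootstrap sample
        tree = DecisionTreeClassifier()
        tree.fit(X[idx], t[idx])
        trees.append(tree)
    return trees

def bagged_predict(trees, X):
    """Plain (unweighted) average of the individual tree outputs."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)
```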
12
Ensemble Methods (cont.)
  • Random Forest
  • Almost always used in conjunction with bagging
  • In each tree, at each node, pick at random only a
    small subset of the attributes to split on OR
    take a random linear combination of the
    attributes
  • Again, a weighted average can lead to overtraining

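As an off-the-shelf example (not something the slides reference), scikit-learn's RandomForestClassifier combines bagging with a random attribute subset at each split; max_features controls the size of that subset.

```python
from sklearn.ensemble import RandomForestClassifier

# Random forest = bagging + a random subset of attributes at each node.
forest = RandomForestClassifier(
    n_estimators=200,      # number of bagged trees
    max_features="sqrt",   # consider only sqrt(k) attributes per split
    bootstrap=True,        # resample the training cases with replacement
)
# forest.fit(X_train, t_train)                  # X_train, t_train: your data
# scores = forest.predict_proba(X_test)[:, 1]   # unweighted ensemble average
```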
13
Testing Effectiveness
  • In the past: train on 60% of the data, test on the
    other 40%
  • Statisticians use 10-fold cross-validation
  • Divide the data into 10 sets S1, S2, …, S10
  • Train on 9 of these sets, test on the remaining one
  • Repeat so that each Si serves once as the test set,
    starting over from scratch each time
  • Combine results to get a measure of performance
  • Provides a better measure of expected accuracy
    (though difference is small for large data sets)

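A sketch of the 10-fold procedure, using a scikit-learn decision tree as the classifier (an assumption; any classifier fits the same loop).

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def ten_fold_accuracy(X, t):
    """Divide the data into 10 sets, train on 9 and test on the
    remaining one, repeat for every set, and combine the results."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(X):
        tree = DecisionTreeClassifier()        # start from scratch each time
        tree.fit(X[train_idx], t[train_idx])
        scores.append(tree.score(X[test_idx], t[test_idx]))
    return np.mean(scores)
```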
14
Appendix: Gini
  • Assign each event a weight; let WS and WB be the
    total signal and background weight in a node
  • Purity: P = WS / (WS + WB)
  • Gini: G = P (1 − P)
  • Smaller Gini is better (0 represents total
    separation)
  • We look at the improvement in Gini from a split:
    the parent's Gini minus the weight-averaged Gini of
    the two child nodes

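A sketch of these quantities in code, under the assumption spelled out above that WS and WB are sums of per-event weights in a node.

```python
import numpy as np

def gini(weights, is_signal):
    """Gini of a node: P(1 - P), with the purity P computed from
    summed signal and background weights."""
    w_s = np.sum(weights[is_signal])
    w_b = np.sum(weights[~is_signal])
    if w_s + w_b == 0:
        return 0.0
    p = w_s / (w_s + w_b)        # purity
    return p * (1.0 - p)         # 0 means total separation

def gini_improvement(weights, is_signal, goes_left):
    """Parent Gini minus the weight-averaged Gini of the two children."""
    frac_left = np.sum(weights[goes_left]) / np.sum(weights)
    children = (frac_left * gini(weights[goes_left], is_signal[goes_left])
                + (1 - frac_left) * gini(weights[~goes_left],
                                         is_signal[~goes_left]))
    return gini(weights, is_signal) - children
```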
15
Appendix: Information Gain
  • Denote the probability of class i in the data set
    by pi
  • Entropy of a set: H = − Σi pi log2(pi)
  • Divide the data into subsets S
  • InfoGain = H(parent) − ΣS (|S| / |parent|) · H(S)
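A matching sketch for entropy and information gain; the worked example splits a mixed set into two pure subsets, which gives the maximum possible gain of 1 bit.

```python
import numpy as np

def entropy(t):
    """Entropy of a set of class labels: -sum_i p_i * log2(p_i)."""
    _, counts = np.unique(t, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(t, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    weighted = sum(len(s) / len(t) * entropy(s) for s in subsets)
    return entropy(t) - weighted

t = np.array([0, 0, 1, 1])
print(info_gain(t, [t[:2], t[2:]]))   # -> 1.0
```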