Title: Decision Trees
1. Decision Trees
- Definition
- Mechanism
- Splitting Function
- Issues in Decision-Tree Learning
- Avoiding overfitting through pruning
- Numeric and missing attributes
2. Example of a Decision Tree
Example: learning to classify stars.
[Figure: an example decision tree. The root tests Luminosity against a threshold T1; one branch leads to a test on Mass against a threshold T2; the three leaves assign the star types Type A, Type B, and Type C.]
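Read as code, the tree is just a pair of nested threshold tests. The sketch below is only an illustration: the threshold values and the assignment of star types to branches are assumptions, since the slide does not fix them.

```python
# Hypothetical thresholds; in practice T1 and T2 are chosen by the learning algorithm.
T1, T2 = 1.5, 0.8

def classify_star(luminosity: float, mass: float) -> str:
    """Walk the example tree: test luminosity first, then (on one branch) mass."""
    if luminosity > T1:
        # Assumed branch: high-luminosity stars are further split on mass.
        return "Type B" if mass > T2 else "Type A"
    # Assumed branch: low-luminosity stars go straight to a leaf.
    return "Type C"

print(classify_star(luminosity=2.0, mass=1.1))  # -> "Type B" under these assumptions
```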
3Short vs Long Hypotheses
We mentioned a top-down, greedy approach to
constructing decision trees denotes a preference
of short hypotheses over long hypotheses. Why is
this the right thing to do?
Occams Razor Prefer the simplest hypothesis
that fits the data.
Back since William of Occam (1320). Great debate
in the philosophy of science.
4. Issues in Decision Tree Learning
- Practical issues while building a decision tree can be enumerated as follows:
- How deep should the tree be?
- How do we handle continuous attributes?
- What is a good splitting function?
- What happens when attribute values are missing?
- How do we improve the computational efficiency?
5. How deep should the tree be? Overfitting the Data
A tree overfits the data if we let it grow deep enough that it begins to capture aberrations in the data that harm its predictive power on unseen examples.
[Figure: a scatter plot over the attributes humidity and size; thresholds t2 and t3 carve out a few examples that are possibly just noise, but the tree is grown larger to capture them.]
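As an illustration of this effect (not part of the slides), the sketch below, assuming scikit-learn is available, grows one unrestricted tree and one depth-limited tree on synthetic data with label noise; the dataset and noise level are invented for the demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise standing in for "aberrations" in the data.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (None, 3):  # None = grow the tree until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}  test={tree.score(X_te, y_te):.2f}")

# Typically the unrestricted tree reaches ~1.0 training accuracy but lower test accuracy
# than the shallow tree -- the signature of overfitting.
```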
6. Overfitting the Data: Definition
Assume a hypothesis space H. We say a hypothesis h in H overfits a dataset D if there is another hypothesis h' in H where h has better classification accuracy than h' on D, but worse classification accuracy than h' on unseen (testing) examples.
[Figure: classification accuracy (0.5 to 1.0) as a function of the size of the tree; accuracy on the training data keeps increasing while accuracy on the testing data eventually drops as the tree overfits.]
7. Causes for Overfitting the Data
- What causes a hypothesis to overfit the data?
- Random errors or noise: examples have incorrect class labels or incorrect attribute values.
- Coincidental patterns: by chance, examples seem to deviate from a pattern due to the small size of the sample.
- Overfitting is a serious problem that can cause strong performance degradation.
8. Solutions for Overfitting the Data
- There are two main classes of solutions:
- 1) Stop growing the tree early, before it begins to overfit the data. In practice this solution is hard to implement because it is not clear what a good stopping point is.
- 2) Grow the tree until the algorithm stops, even if the overfitting problem shows up, and then prune the tree as a post-processing step.
- The second method has found great popularity in the machine-learning community.
9. Decision Tree Pruning
1) Grow the tree to learn the training data.
2) Prune the tree to avoid overfitting the data.
10. Methods to Validate the New Tree
- Training and Validation Set Approach
- Divide dataset D into a training set TR and a validation set TE.
- Build a decision tree on TR.
- Test pruned trees on TE to decide the best final tree.
[Diagram: Dataset D is split into Training TR and Validation TE.]
11. Training and Validation
[Diagram: Dataset D is split into Training TR (normally 2/3 of D) and Validation TE (normally 1/3 of D).]
- There are two approaches
- Reduced Error Pruning
- Rule Post-Pruning
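A minimal sketch of this split, assuming scikit-learn; the 2/3 vs. 1/3 proportions come from the slide, while the placeholder dataset is only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # placeholder dataset standing in for D
# Hold out roughly 1/3 of D as the validation set TE; the rest is the training set TR.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
print(len(X_tr), len(X_te))         # about 2/3 of D for training, 1/3 for validation
```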
12. Reduced Error Pruning
- Main Idea
- 1) Consider all internal nodes in the tree.
- 2) For each node, check whether removing it (along with the subtree below it) and assigning the most common class to it does not harm accuracy on the validation set.
- 3) Pick the node n that yields the best performance and prune its subtree.
- 4) Go back to (2) until no more improvements are possible (a sketch of this loop follows below).
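The sketch below is one way to realize this greedy loop; the `Node` representation, the helper names, and the tie-breaking choices are assumptions made for the illustration, not the slides' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    majority_class: int                      # most common class among training examples at this node
    test: Optional[Callable] = None          # maps an example to a child index; None at a leaf
    children: List["Node"] = field(default_factory=list)
    pruned: bool = False                     # True once the node has been turned into a leaf

    def predict(self, x):
        if self.pruned or not self.children:
            return self.majority_class
        return self.children[self.test(x)].predict(x)

def accuracy(root: Node, data) -> float:
    return sum(root.predict(x) == y for x, y in data) / len(data)

def internal_nodes(node: Node):
    if node.children and not node.pruned:
        yield node
        for child in node.children:
            yield from internal_nodes(child)

def reduced_error_prune(root: Node, validation) -> Node:
    """Repeatedly prune the internal node whose removal best preserves validation accuracy."""
    while True:
        base = accuracy(root, validation)
        best, best_acc = None, base
        for node in list(internal_nodes(root)):
            node.pruned = True               # tentatively replace the subtree by its majority class
            acc = accuracy(root, validation)
            if acc >= best_acc:              # pruning must not harm accuracy on the validation set
                best, best_acc = node, acc
            node.pruned = False
        if best is None:                     # no prune preserves accuracy: stop
            return root
        best.pruned = True                   # commit the best prune and repeat
```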
13. Example
[Figure: the original tree and the possible trees obtained after the first pruning step.]
14. Example
[Figure: the pruned tree and the possible trees obtained after a second pruning step.]
15. Example
The process continues until no improvement is observed on the validation set; at that point we stop pruning the tree.
[Figure: accuracy on the validation data (0.5 to 1.0) as a function of the size of the tree, marking the point where pruning stops.]
16. Reduced Error Pruning
- Disadvantages
- If the original dataset is small, setting examples aside for validation may leave you with few examples for training.
[Diagram: a small dataset D split into Training TR and Validation TE; both resulting sets are too small.]
17. Rule Post-Pruning
- Main Idea
- 1) Convert the tree into a rule-based system (one rule per root-to-leaf path).
- 2) Prune every single rule by removing redundant conditions.
- 3) Sort the rules by accuracy.
18. Example
[Figure: original tree with root test x1; one branch leads to a test on x2 with leaves Class A and Class B, the other branch leads to a test on x3 with leaves Class A and Class C.]
Rules extracted from the tree (one per leaf):
x1 ∧ x2 → Class A
x1 ∧ ¬x2 → Class B
¬x1 ∧ x3 → Class A
¬x1 ∧ ¬x3 → Class C
Possible rules after pruning (based on the validation set):
x1 → Class A
x1 ∧ ¬x2 → Class B
x3 → Class A
¬x1 ∧ ¬x3 → Class C
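The condition-dropping step (step 2 above) can be sketched as follows; the rule representation (a list of attribute-value conditions plus a class label) and the helper names are assumptions made for this illustration.

```python
def matches(conditions, example) -> bool:
    """conditions is a list of (attribute, value) pairs, e.g. [("x1", 1), ("x2", 0)]."""
    return all(example.get(attr) == val for attr, val in conditions)

def rule_accuracy(conditions, label, validation) -> float:
    covered = [(x, y) for x, y in validation if matches(conditions, x)]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(conditions, label, validation):
    """Greedily drop conditions as long as validation accuracy does not decrease."""
    conditions = list(conditions)
    improved = True
    while improved and conditions:
        improved = False
        current = rule_accuracy(conditions, label, validation)
        for cond in list(conditions):
            shorter = [c for c in conditions if c != cond]
            if rule_accuracy(shorter, label, validation) >= current:
                conditions, improved = shorter, True
                break
    return conditions, label

# Example: the first rule, x1 AND x2 -> Class A, might be pruned to x1 -> Class A
# if dropping the x2 condition does not hurt accuracy on the validation set.
# After pruning every rule this way, sort the rules by accuracy (step 3).
```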
19. Advantages of Rule Post-Pruning
- The rule language is more expressive.
- It improves interpretability.
- Pruning is more flexible, since conditions can be dropped individually.
- In practice this method yields high predictive accuracy.
20. Decision Trees
- Definition
- Mechanism
- Splitting Functions
- Issues in Decision-Tree Learning
- Avoiding overfitting through pruning
- Numeric and missing attributes
21. Discretizing Continuous Attributes
Example: attribute temperature.
1) Order all values in the training set.
2) Consider only those cut points where there is a change of class.
3) Choose the cut point that maximizes information gain.
Sorted temperature values: 97, 97.5, 97.6, 97.8, 98.5, 99.0, 99.2, 100, 102.2, 102.6, 103.2
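A sketch of this procedure on the temperature values above; the class labels paired with the temperatures are invented here purely to make the example runnable.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_cut_point(values, labels):
    """Order the values, consider only boundaries where the class changes,
    and return the cut point with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best_gain, best_cut = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] == pairs[i][1]:
            continue                                   # no class change: not a candidate cut
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2      # midpoint between the two values
        left = [y for v, y in pairs if v <= cut]
        right = [y for v, y in pairs if v > cut]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

temps = [97, 97.5, 97.6, 97.8, 98.5, 99.0, 99.2, 100, 102.2, 102.6, 103.2]
labels = ["no", "no", "no", "no", "no", "yes", "yes", "yes", "yes", "yes", "yes"]  # invented labels
print(best_cut_point(temps, labels))
```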
22Claude Shannon
1916 2001 Funded information theory on 1948
with his paper A Mathematical Theory of
Communication Awarded the Alfred Noble American
Institute of American Engineers Award for his
masters thesis. Worked at MIT, Bell Labs. Met
with Alan Turing, Marvin Minsky, John von
Neumann, and Albert Einstein. Creator of the
Ultimate Machine.
23. Missing Attribute Values
Example: X = (luminosity > T1, mass = ?)
- We are at a node n in the decision tree.
- Different approaches:
- Assign the most common value for that attribute in node n (see the sketch after this list).
- Assign the most common value in n among examples with the same classification as X.
- Assign a probability to each value of the attribute based on the frequency of those values in node n; each fraction is propagated down the tree.
24. Summary
- Decision-tree induction is a popular approach to classification that enables us to interpret the output hypothesis.
- The hypothesis space is very powerful: all possible DNF formulas.
- We prefer shorter trees over larger trees.
- Overfitting is an important issue in decision-tree induction.
- Different methods exist to avoid overfitting, such as reduced-error pruning and rule post-pruning.
- Techniques exist to deal with continuous attributes and missing attribute values.