Decision Trees - PowerPoint PPT Presentation

About This Presentation

Title: Decision Trees

Description: Decision Trees. Definition, mechanism, splitting function, issues in decision-tree learning, avoiding overfitting through pruning, numeric and missing attributes.

Provided by: Ricardo242
Transcript and Presenter's Notes

Title: Decision Trees


1
Decision Trees
  • Definition
  • Mechanism
  • Splitting Function
  • Issues in Decision-Tree Learning
  • Avoiding overfitting through pruning
  • Numeric and missing attributes

2
Example of a Decision Tree
Example: learning to classify stars.
[Figure: a decision tree whose root tests Luminosity against threshold T1; one branch then tests Mass against threshold T2; the leaves assign Type A, Type B, and Type C.]
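Read as code, the mechanism is just a chain of threshold tests. Below is a minimal sketch; the thresholds T1 and T2 and the exact leaf assignments are assumptions for illustration, not taken from the original figure.

# Illustrative sketch of the star-classification tree above.
# T1, T2 and the type returned at each leaf are assumed values;
# the original figure may attach the types to different branches.
T1, T2 = 3.0, 1.5  # hypothetical thresholds

def classify_star(luminosity, mass):
    # Walk the tree: test luminosity first, then mass.
    if luminosity > T1:
        if mass > T2:
            return "Type A"
        return "Type B"
    return "Type C"

print(classify_star(luminosity=4.2, mass=2.0))  # -> Type A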
3
Short vs Long Hypotheses
We mentioned that the top-down, greedy approach to
constructing decision trees denotes a preference for
short hypotheses over long hypotheses. Why is this
the right thing to do?
Occam's Razor: prefer the simplest hypothesis
that fits the data.
The idea goes back to William of Occam (c. 1320) and
remains a great debate in the philosophy of science.
4
Issues in Decision Tree Learning
  • Practical issues while building a decision tree can be enumerated as follows:
  • How deep should the tree be?
  • How do we handle continuous attributes?
  • What is a good splitting function?
  • What happens when attribute values are missing?
  • How do we improve the computational efficiency?

5
How deep should the tree be? Overfitting the Data
A tree overfits the data if we let it grow deep
enough so that it begins to capture aberrations
in the data that harm the predictive power on
unseen examples.
[Figure: training examples plotted by size and humidity, with splits at thresholds t2 and t3; a few deviating points, possibly just noise, cause the tree to be grown larger to capture them.]
6
Overfitting the Data: Definition
Assume a hypothesis space H. We say a hypothesis
h in H overfits a dataset D if there is another
hypothesis h' in H such that h has better
classification accuracy than h' on the training
data, but worse classification accuracy than h'
on unseen (testing) data.
[Figure: classification accuracy (0.5-1.0) vs. size of the tree; accuracy on the training data keeps rising as the tree grows, while accuracy on the testing data drops once overfitting begins.]
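This curve is easy to reproduce. The following sketch assumes scikit-learn and a synthetic dataset (neither is prescribed by the slides): it grows trees of increasing depth and prints training vs. testing accuracy, which diverge once the tree starts fitting noise.

# Sketch: training accuracy keeps rising with tree size while testing
# accuracy eventually drops -- the signature of overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1,
                           random_state=0)          # flip_y injects label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, 2, 4, 8, 16, None):                # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"depth={depth} leaves={tree.get_n_leaves()} "
          f"train={tree.score(X_tr, y_tr):.3f} test={tree.score(X_te, y_te):.3f}")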
7
Causes for Overfitting the Data
  • What causes a hypothesis to overfit the data?
  • Random errors or noise: examples have an incorrect class label or incorrect attribute values.
  • Coincidental patterns: by chance, examples seem to deviate from a pattern due to the small size of the sample.
  • Overfitting is a serious problem that can cause strong performance degradation.

8
Solutions for Overfitting the Data
  • There are two main classes of solutions (both are sketched in code below):
  • 1) Stop the tree early, before it begins to overfit the data. In practice this solution is hard to implement because it is not clear what a good stopping point is.
  • 2) Grow the tree until the algorithm stops, even if the overfitting problem shows up. Then prune the tree as a post-processing step. This method has found great popularity in the machine learning community.
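As a rough illustration (assuming scikit-learn; the library and parameter values are not from the slides, and scikit-learn's cost-complexity pruning is only a stand-in for the pruning methods discussed on the following slides):

from sklearn.tree import DecisionTreeClassifier

# Solution 1: stop early, e.g. by limiting depth (picking a good limit is the hard part).
early_stopped = DecisionTreeClassifier(max_depth=4)

# Solution 2: grow the full tree, then prune as a post-processing step.
# Here ccp_alpha enables scikit-learn's cost-complexity pruning, used only
# as a stand-in for the reduced-error and rule post-pruning methods below.
grown_then_pruned = DecisionTreeClassifier(ccp_alpha=0.01)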

9
Decision Tree Pruning
1) Grow the tree to learn the training data.
2) Prune the tree to avoid overfitting the data.
10
Methods to Validate the New Tree
  • Training and Validation Set Approach:
  • Divide dataset D into a training set TR and a validation set TE.
  • Build a decision tree on TR.
  • Test pruned trees on TE to decide the best final tree.

[Figure: dataset D divided into a training set TR and a validation set TE.]
11
Training and Validation
[Figure: dataset D split into a training set TR (normally 2/3 of D) and a validation set TE (normally 1/3 of D).]
  • There are two approaches:
  • Reduced Error Pruning
  • Rule Post-Pruning
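A minimal sketch of that split (the 2/3 : 1/3 ratio comes from the slide; the shuffling, seed, and names are assumptions):

import random

def split_dataset(D, train_fraction=2/3, seed=0):
    # Shuffle D and split it into a training set TR and a validation set TE.
    examples = list(D)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]   # TR, TE

TR, TE = split_dataset(range(30))
print(len(TR), len(TE))                     # -> 20 10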

12
Reduced Error Pruning
  • Main Idea (a sketch in code follows this list):
  • 1) Consider all internal nodes in the tree.
  • 2) For each node, check whether removing it (along with the subtree below it) and assigning the most common class to it harms accuracy on the validation set.
  • 3) Pick the node n that yields the best performance and prune its subtree.
  • 4) Go back to (2) until no more improvements are possible.
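A compact sketch of this procedure. The tree representation (nested dicts with an "attr", a "branches" map, and a "majority" class per internal node) and the helper names are assumptions; only the pruning loop itself follows the slide.

def classify(tree, x):
    # Walk the tree until a leaf (a plain class label) is reached.
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["attr"]]]
    return tree

def accuracy(tree, validation):
    return sum(classify(tree, x) == y for x, y in validation) / len(validation)

def internal_nodes(tree, path=()):
    # Yield the path (sequence of branch values) to every internal node.
    if isinstance(tree, dict):
        yield path
        for value, subtree in tree["branches"].items():
            yield from internal_nodes(subtree, path + (value,))

def prune_at(tree, path):
    # Return a copy of the tree with the node at `path` replaced by its majority class.
    if not path:
        return tree["majority"]
    copy = {**tree, "branches": dict(tree["branches"])}
    copy["branches"][path[0]] = prune_at(copy["branches"][path[0]], path[1:])
    return copy

def reduced_error_pruning(tree, validation):
    best = accuracy(tree, validation)
    while isinstance(tree, dict):
        # Try pruning every internal node and keep the best candidate.
        candidates = [(accuracy(prune_at(tree, p), validation), p)
                      for p in internal_nodes(tree)]
        score, path = max(candidates, key=lambda c: c[0])
        if score < best:          # every possible pruning would harm accuracy
            break
        best, tree = score, prune_at(tree, path)
    return tree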

13
Example
Possible trees after pruning
Original Tree
14
Example
Possible trees after 2nd pruning
Pruned Tree
15
Example
The process continues until no improvement is observed on the validation set.
[Figure: accuracy on the validation data vs. size of the tree; pruning stops at the tree size where validation accuracy peaks.]
16
Reduced Error Pruning
  • Disadvantages:
  • If the original dataset is small, setting examples aside for validation may leave you with too few examples for training.
[Figure: a small dataset D split into training TR and testing TE; both the training set and the validation set end up too small.]
17
Rule Post-Pruning
  • Main Idea (a sketch in code follows this list):
  • 1) Convert the tree into a rule-based system: one rule per leaf.
  • 2) Prune every single rule by removing redundant conditions.
  • 3) Sort the rules by accuracy.
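A sketch of these three steps, reusing the nested-dict tree representation assumed above; each rule is a (conditions, class) pair, where conditions is a tuple of (attribute, value) tests collected on the path from the root to a leaf. The names and representation are assumptions.

def tree_to_rules(tree, conditions=()):
    # Step 1: one rule per leaf, conjoining the tests on the path to it.
    if not isinstance(tree, dict):
        return [(conditions, tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += tree_to_rules(subtree, conditions + ((tree["attr"], value),))
    return rules

def rule_accuracy(conditions, label, validation):
    covered = [(x, y) for x, y in validation
               if all(x[a] == v for a, v in conditions)]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(conditions, label, validation):
    # Step 2: drop any condition whose removal does not hurt validation accuracy.
    conditions = list(conditions)
    changed = True
    while changed and conditions:
        changed = False
        base = rule_accuracy(tuple(conditions), label, validation)
        for c in list(conditions):
            rest = tuple(x for x in conditions if x != c)
            if rule_accuracy(rest, label, validation) >= base:
                conditions.remove(c)
                changed = True
                break
    return tuple(conditions)

def rule_post_pruning(tree, validation):
    rules = [(prune_rule(cond, label, validation), label)
             for cond, label in tree_to_rules(tree)]
    # Step 3: sort the rules by validation accuracy, most accurate first.
    return sorted(rules, key=lambda r: rule_accuracy(r[0], r[1], validation),
                  reverse=True)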

18
Example
Original tree
[Figure: the root tests x1; its two branches test x2 and x3; the four leaves assign Classes A, B, A, and C.]

Rules (one per leaf):
  x1 ∧ x2 → Class A
  x1 ∧ ¬x2 → Class B
  ¬x1 ∧ x3 → Class A
  ¬x1 ∧ ¬x3 → Class C

Possible rules after pruning (based on the validation set):
  x1 → Class A
  x1 ∧ ¬x2 → Class B
  x3 → Class A
  ¬x1 ∧ ¬x3 → Class C
19
Advantages of Rule Post-Pruning
  • The language is more expressive.
  • It improves interpretability.
  • Pruning is more flexible.
  • In practice this method yields high-accuracy performance.

20
Decision Trees
  • Definition
  • Mechanism
  • Splitting Functions
  • Issues in Decision-Tree Learning
  • Avoiding overfitting through pruning
  • Numeric and missing attributes

21
Discretizing Continuous Attributes
Example attribute: temperature.
1) Order all values in the training set.
2) Consider only those cut points where there is a change of class.
3) Choose the cut point that maximizes information gain.
Sorted temperature values: 97, 97.5, 97.6, 97.8, 98.5, 99.0, 99.2, 100, 102.2, 102.6, 103.2.
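The steps above can be written out directly. The temperature values come from the slide, but the class labels and the entropy-based gain computation below are illustrative assumptions (the slide does not list the labels).

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_cut_point(values, labels):
    # 1) Order the values; 2) consider cuts only where the class changes;
    # 3) return the cut point with the highest information gain.
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] == pairs[i][1]:
            continue                                  # no class change: skip this cut
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= cut]
        right = [y for v, y in pairs if v > cut]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

temps = [97, 97.5, 97.6, 97.8, 98.5, 99.0, 99.2, 100, 102.2, 102.6, 103.2]
labels = ['n', 'n', 'n', 'n', 'n', 'y', 'y', 'n', 'y', 'y', 'y']   # made-up classes
print(best_cut_point(temps, labels))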
22
Claude Shannon
1916-2001. Founded information theory in 1948
with his paper "A Mathematical Theory of
Communication". Awarded the Alfred Noble American
Institute of American Engineers Award for his
master's thesis. Worked at MIT and Bell Labs. Met
with Alan Turing, Marvin Minsky, John von
Neumann, and Albert Einstein. Creator of the
Ultimate Machine.
23
Missing Attribute Values
Example: X = (luminosity > T1, mass = ?)
  • We are at a node n in the decision tree.
  • Different approaches (a sketch in code follows this list):
  • Assign the most common value for that attribute in node n.
  • Assign the most common value in n among examples with the same classification as X.
  • Assign a probability to each value of the attribute based on the frequency of those values in node n. Each fraction is propagated down the tree.
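A small sketch of the first and third approaches; the dictionary-based example representation and the function names are assumptions.

from collections import Counter

def most_common_value(examples_at_node, attribute):
    # Approach 1: fill a missing attribute with the most common value
    # among the training examples that reached this node.
    values = [x[attribute] for x in examples_at_node if x[attribute] is not None]
    return Counter(values).most_common(1)[0][0]

def value_distribution(examples_at_node, attribute):
    # Approach 3: assign a probability to each value, so a fractional
    # example can be propagated down every branch of the node.
    values = [x[attribute] for x in examples_at_node if x[attribute] is not None]
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

# Example: X = (luminosity > T1, mass = ?) arriving at node n
node_examples = [{"mass": "high"}, {"mass": "high"}, {"mass": "low"}, {"mass": None}]
print(most_common_value(node_examples, "mass"))    # -> 'high'
print(value_distribution(node_examples, "mass"))   # -> {'high': 0.666..., 'low': 0.333...}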

24
Summary
  • Decision-tree induction is a popular approach to classification that enables us to interpret the output hypothesis.
  • The hypothesis space is very powerful: all possible DNF formulas.
  • We prefer shorter trees over larger ones.
  • Overfitting is an important issue in decision-tree induction.
  • Different methods exist to avoid overfitting, such as reduced-error pruning and rule post-pruning.
  • Techniques exist to deal with continuous attributes and missing attribute values.