Decision Trees: Overview

1
DSCI 4520/5240 Data Mining, Fall 2013
Dr. Nick Evangelopoulos
  • Lecture 4
  • Decision Trees: Overview

Some slide material taken from SAS Education
2
On the News: The Rise of the Numerati
(BusinessWeek, Sep 8, 2008). With the explosion of
data from the Internet, cell phones, and credit
cards, the people who can make sense of it all
are changing our world. An excerpt from the
introduction of the book The Numerati by Stephen
Baker:
Imagine you're in a café, perhaps the noisy one
I'm sitting in at this moment. A young woman at a
table to your right is typing on her laptop. You
turn your head and look at her screen. She surfs
the Internet. You watch. Hours pass. She reads an
online newspaper. You notice that she reads three
articles about China. She scouts movies for
Friday night and watches the trailer for Kung Fu
Panda. She clicks on an ad that promises to
connect her to old high school classmates. You
sit there taking notes. With each passing minute,
you're learning more about her. Now imagine that
you could watch 150 million people surfing at the
same time. That's what is happening today in the
business world.
3
On the News: The Rise of the Numerati
By building mathematical models of its own
employees, IBM aims to improve productivity and
automate management.
In 2005, IBM embarked on research to harvest
massive data on employees and to build
mathematical models of 50,000 of the company's
consultants. The goal was to optimize them, using
operations research, so that they can be deployed
with ever more efficiency. Data on IBM employees
include:
  • Allergies
  • Number of interns managed
  • Client visits
  • Computer languages
  • Number of words per e-mail
  • Amount spent entertaining clients
  • Number of weekends worked
  • Time spent in meetings
  • Social network participation
  • Time spent surfing the Web
  • Response time to e-mails
  • Amount of sales
  • Marital status
  • Ratio of personal to work e-mails
4
Agenda
  • Introduce the concept of the curse of
    dimensionality
  • Benefits and Pitfalls in Decision Tree modeling
  • Consequences of a decision

5
The Curse of Dimensionality
The dimension of a problem refers to the number
of input variables (actually, degrees of
freedom). Data mining problems are often massive
in both the number of cases and the dimension.
[Figure: the same eight points plotted in one, two, and three dimensions]
The curse of dimensionality refers to the
exponential increase in data required to densely
populate space as the dimension increases. For
example, the eight points fill the
one-dimensional space but become more separated
as the dimension increases. In 100-dimensional
space, they would be like distant galaxies. The
curse of dimensionality limits our practical
ability to fit a flexible model to noisy data
(real data) when there are a large number of
input variables. A densely populated input space
is required to fit highly complex models.
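To make the separation concrete, here is a minimal Python sketch (the eight points come from the slide's example; sampling them uniformly in the unit cube is our assumption):

```python
import numpy as np

# Eight random points in the unit d-cube: their nearest-neighbor
# distances grow steadily as the dimension d increases.
rng = np.random.default_rng(0)
for d in (1, 2, 3, 10, 100):
    pts = rng.uniform(size=(8, d))
    diff = pts[:, None, :] - pts[None, :, :]      # all pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distance matrix
    np.fill_diagonal(dist, np.inf)                # ignore self-distances
    print(f"d = {d:3d}: mean nearest-neighbor distance {dist.min(axis=1).mean():.3f}")
```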
6
Addressing the Curse of Dimensionality: Reduce the Dimensions
Redundancy
Irrelevancy
The two principal reasons for eliminating a
variable are redundancy and irrelevancy. A
redundant input does not give any new information
that has not already been explained. Useful
methods include principal components, factor
analysis, and variable clustering. An irrelevant
input is not
useful in explaining variation in the target.
Interactions and partial associations make
irrelevancy more difficult to detect than
redundancy. It is often useful to first eliminate
redundant dimensions and then tackle irrelevancy.
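As a concrete illustration of removing redundancy, here is a hedged Python sketch using principal components, one of the methods named above (the synthetic input matrix X is our assumption, standing in for any real data table):

```python
import numpy as np
from sklearn.decomposition import PCA

# Ten inputs generated from only two underlying degrees of freedom:
# most of the ten dimensions are redundant.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(500, 10))

pca = PCA().fit(X)
print(pca.explained_variance_ratio_.round(3))     # nearly all variance in 2 components
X_reduced = PCA(n_components=2).fit_transform(X)  # 10 inputs -> 2 dimensions
```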
7
Model Complexity
[Figure: an overly flexible fit vs. a fit that is not flexible enough]
A naïve modeler might assume that the most
complex model should always outperform the
others, but this is not the case. An overly
complex model might be too flexible. This will
lead to overfitting: accommodating nuances of
the random noise in the particular sample (high
variance). A model with just enough flexibility
will give the best generalization.
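A small numerical illustration of this trade-off (a sketch only; the sine-plus-noise data and the polynomial model family are our assumptions):

```python
import numpy as np

# Fit polynomials of increasing flexibility; training error keeps falling,
# while validation error typically bottoms out and then rises (overfitting).
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + 0.3 * rng.normal(size=60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

for degree in (1, 3, 15):
    coef = np.polyfit(x_tr, y_tr, degree)
    err = lambda xs, ys: np.mean((np.polyval(coef, xs) - ys) ** 2)
    print(f"degree {degree:2d}: train MSE {err(x_tr, y_tr):.3f}, "
          f"validation MSE {err(x_va, y_va):.3f}")
```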
8
Overfitting
[Figure: an overfitted model evaluated on the training set and on the test set]
9
Better Fitting
[Figure: a better-fitting model evaluated on the training set and on the test set]
10
The Cultivation of Trees
  • Split Search: which splits are to be considered?
  • Splitting Criterion: which split is best?
  • Stopping Rule: when should the splitting stop?
  • Pruning Rule: should some branches be lopped off?
    (The first three steps are sketched in code below.)
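These four questions map directly onto a recursive algorithm. Below is a minimal, hedged Python sketch of the first three (split search, a Gini splitting criterion, and two stopping rules); it is an illustration rather than any particular tool's algorithm, and it omits pruning:

```python
import numpy as np

def gini(y):
    """Gini impurity of a 0/1 label vector."""
    p = y.mean() if len(y) else 0.0
    return 2 * p * (1 - p)

def grow(X, y, depth=0, max_depth=3, min_leaf=5):
    """Greedy recursive partitioning with simple stopping rules."""
    # Stopping rules: pure node, depth limit, or too few cases to split.
    if gini(y) == 0 or depth == max_depth or len(y) < 2 * min_leaf:
        return {"predict": y.mean()}
    best = None
    for j in range(X.shape[1]):                 # split search: every input...
        for t in np.unique(X[:, j])[1:]:        # ...and every candidate cutoff
            left = X[:, j] < t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            # Splitting criterion: weighted child impurity (lower is better).
            worth = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or worth < best[0]:
                best = (worth, j, t, left)
    if best is None:
        return {"predict": y.mean()}
    _, j, t, left = best
    return {"var": j, "cut": t,
            "left":  grow(X[left],  y[left],  depth + 1, max_depth, min_leaf),
            "right": grow(X[~left], y[~left], depth + 1, max_depth, min_leaf)}

# Example: grow a small tree on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
tree = grow(X, y)
```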

11
Possible Splits to Consider: an enormous number
[Figure: number of possible splits (vertical axis, up to 500,000)
versus number of input levels (1 to 20), for nominal and ordinal
inputs]
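The scale of the chart follows from a standard counting argument (a general fact about trees, stated here for binary splits): a nominal input with L levels admits 2^(L-1) - 1 distinct splits, because any nonempty proper subset of levels can go left, while an ordinal input admits only L - 1, since a split must respect the level ordering. At L = 20 the nominal count is 2^19 - 1 = 524,287, which matches the chart's 500,000 scale.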
12
Splitting Criteria
How is the best split determined? In some
situations, the worth of a split is obvious. If
the expected target is the same in the child
nodes as in the parent node, no improvement was
made, and the split is worthless! In contrast,
if a split results in pure child nodes, the split
is undisputedly best. For classification trees,
the three most widely used splitting criteria are
based on the Pearson chi-squared test, the Gini
index, and entropy. All three measure the
difference in class distributions across the
child nodes. The three methods usually give
similar results.
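For reference, the standard definitions (general facts, not specific to these slides): with class proportions p_k in a node, the Gini index is 1 - Σ p_k² and the entropy is -Σ p_k log₂ p_k; the Pearson chi-squared statistic compares the observed class counts in each child node with the counts expected if the split carried no information. A split's worth is then the case-weighted reduction in impurity from parent to children.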
13
Splitting Criteria
A candidate two-way split (Debt-to-Income Ratio < 45):

            Left    Right   Total
  Not Bad   3196    1304    4500
  Bad        154     346     500

A competing three-way split:

            Left    Center  Right   Total
  Not Bad   2521    1188     791    4500
  Bad        115     162     223     500

A perfect split:

            Left    Right   Total
  Not Bad   4500       0    4500
  Bad          0     500     500
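A small check of the three candidates above, scoring each by the weighted Gini impurity of its child nodes (the counts come straight from the tables; choosing Gini over chi-squared or entropy is our arbitrary pick):

```python
# Score each candidate split by the weighted Gini impurity of its children.
def gini(bad, total):
    p = bad / total
    return 2 * p * (1 - p)

splits = {
    "two-way":   [(3196 + 154, 154), (1304 + 346, 346)],
    "three-way": [(2521 + 115, 115), (1188 + 162, 162), (791 + 223, 223)],
    "perfect":   [(4500 + 0, 0), (0 + 500, 500)],
}
n = 5000
for name, children in splits.items():
    worth = sum(total / n * gini(bad, total) for total, bad in children)
    print(f"{name:9s}: weighted child Gini = {worth:.4f}")
```

Against a root impurity of 2 × 0.1 × 0.9 = 0.18, this gives roughly 0.168 for the two-way split, 0.171 for the three-way split, and 0 for the perfect split, so here the two-way split narrowly wins; as noted above, the three criteria would usually rank candidates similarly.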
14
Controlling tree growth: Stunting
A universally accepted rule is to stop growing if
the node is pure. Two other popular rules for
stopping tree growth are to stop if the number of
cases in a node falls below a specified limit, or
to stop when the split is not statistically
significant at a specified level. This is called
pre-pruning, or stunting.
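Enterprise Miner exposes these rules as tree-node options; for readers following along in Python, analogous pre-pruning controls exist in scikit-learn (the parameter values below are arbitrary examples, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning (stunting): growth stops early rather than being cut back.
stunted = DecisionTreeClassifier(
    min_samples_leaf=50,          # stop if a child would hold fewer than 50 cases
    min_impurity_decrease=1e-3,   # stop if the split barely improves purity
)
# stunted.fit(X, y) would then grow an already-stunted tree; growth also
# stops automatically at pure nodes.
```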
15
Controlling tree growth: Pruning
Pruning (also called post-pruning) creates a
sequence of trees of increasing complexity. An
assessment criterion is needed for deciding the
best (sub)tree. The assessment criteria are
usually based on performance on holdout samples
(validation data, or with cross-validation). Cost
or profit considerations can be incorporated into
the assessment.
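A hedged Python sketch of the same idea, using scikit-learn's built-in post-pruning (minimal cost-complexity pruning) with a holdout sample as the assessment criterion; the synthetic data are our assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Grow a full tree, enumerate its pruned subtrees, and keep the one
# that performs best on the validation sample.
X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
scores = [
    DecisionTreeClassifier(ccp_alpha=a, random_state=0)
    .fit(X_tr, y_tr).score(X_va, y_va)
    for a in path.ccp_alphas
]
best = path.ccp_alphas[int(np.argmax(scores))]
print(f"best alpha {best:.5f}, validation accuracy {max(scores):.3f}")
```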
16
Benefits of Trees
  • Interpretability
    • tree-structured presentation
  • Mixed Measurement Scales
    • nominal, ordinal, interval
    • regression trees
  • Robustness
  • Missing Values

17
Benefits of Trees
  • Automatically:
    • detects interactions (AID)
    • accommodates nonlinearity
    • selects input variables

[Figure: a fitted tree as a multivariate step function]
18
Drawbacks of Trees
  • Roughness (the fitted surface is a step function)
  • Linear, main-effects relationships are captured poorly
  • Instability (small changes in the data can change the tree)

19
Building and Interpreting Decision Trees
  • Explore the types of decision tree models
    available in Enterprise Miner.
  • Build a decision tree model.
  • Examine the model results and interpret these
    results.
  • Choose a decision threshold theoretically and
    empirically.

20
Consequences of a Decision
               Decision 1        Decision 0
  Actual 1     True Positive     False Negative
  Actual 0     False Positive    True Negative
21
Example
  • Recall the home equity line of credit scoring
    example. Presume that every two dollars loaned
    eventually returns three dollars if the loan is
    paid off in full.

22
Consequences of a Decision
               Decision 1                   Decision 0
  Actual 1     True Positive                False Negative (cost = 2)
  Actual 0     False Positive (cost = 1)    True Negative
23
Bayes Rule: Optimal threshold
  • Using the cost structure defined for the home
    equity example, the optimal threshold is
    1/(1 + (2/1)) = 1/3. That is,
  • reject all applications whose predicted
    probability of default exceeds 0.33.
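The threshold follows from comparing expected costs (a one-line derivation from the cost matrix two slides back): approving an applicant has expected cost p × 2 (the false-negative cost weighted by the default probability p), while rejecting has expected cost (1 − p) × 1. Rejecting is optimal when 2p > 1 − p, i.e. when p > 1/3.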

24
Consequences of a Decision: Profit matrix (SAS EM)

               Decision 1                     Decision 0
  Actual 1     True Positive (profit = 2)     False Negative
  Actual 0     False Positive (profit = -1)   True Negative
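This profit matrix encodes the same rule: the expected profit of Decision 1 is 2p − (1 − p) = 3p − 1, which is positive exactly when p > 1/3, reproducing the Bayes-rule threshold above.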
25
Decision Tree Algorithms
  • Read Lecture 5 notes (Tree Algorithms) before
    coming to class next week
  • Focus on Rule-induction using Entropy and
    Information Gain