1
Lecture Notes 3: Decision Tree Construction
  • Zhangxi Lin
  • ISQS 7342-001
  • Texas Tech University
  • Note: Most slides are from Decision Tree Modeling, by SAS

2
Outline
  • The Mechanics of Decision Tree Construction
  • Recursive Partitioning

3
Growing a Decision Tree: Six Steps
  1. Pre-process data
  2. Set input-target characteristics
  3. Select tree growth parameters
  4. Process/cluster inputs
  5. Select branch/split
  6a. Stop/grow/prune?
  6b. Select final tree
  (Flow-diagram annotations: source data; trigger input search; manual/automatic)
4
1. Data Pre-process
  • Distinguish categorical and continuous data
  • Re-code multi-category targets into a 1-of-N scheme, making each derived target binary (see the sketch below)
  • Convert date and time data into a computable form
  • Avoid information loss
  • Make the scale directions consistent
  • Handle multiple-response items carefully
  • Interpret missing values correctly
  • Pivot records if necessary
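
A minimal Python (pandas) sketch of two of these steps; the data frame and its column names (digit, signup_date) are illustrative only, not from the course data:

    import pandas as pd

    # Hypothetical example frame with a multi-category target and a date field.
    df = pd.DataFrame({
        "digit": ["1", "7", "9", "7"],
        "signup_date": ["1999-03-21", "2001-07-04", "2000-01-15", "1999-12-31"],
    })

    # 1-of-N (one-hot) recoding: one binary column per target category.
    target_1_of_n = pd.get_dummies(df["digit"], prefix="digit")

    # Convert dates into a computable form, e.g., days elapsed since a reference date.
    df["signup_date"] = pd.to_datetime(df["signup_date"])
    df["days_since_1999"] = (df["signup_date"] - pd.Timestamp("1999-01-01")).dt.days

    print(target_1_of_n)
    print(df[["signup_date", "days_since_1999"]])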

5
2. Set the Input and Target Modeling Characteristics
  • Target
  • Check the missing-value indicator for the target field.
  • Look for values such as -1, -99, or 0 to ensure the target field is in the right state.
  • Create a 1-of-N derivation of the categorical codes for variables that appear to be interval-scaled but are actually categorical.
  • Inputs
  • For decision tree analysis, inputs are transformed into discrete categories.
  • Determine whether an input is ordered or unordered.

6
3. Select the Decision Tree Growth Parameters
  • Considerations
  • Combination of input categories for branching
  • Sorting and combining of branches
  • Number of nodes on a branch
  • Number of alternative branches
  • Determining differences among branches
  • Branch evaluation, selection, and display
  • Segmentation of the input data in terms of branches
  • Branch growth strategy: empirical tests or theoretical tests
  • Pre- or post-branching pruning
  • Stopping rule: potential branches and nodes

7
4. Cluster and Process Each Branch-Forming Input
Field (within an input)
  • Goal of clustering in decision tree construction
  • Cluster observations, values of input fields, and splits at the same level of the tree
  • Maximize the predictive relationship between the input and the target
  • The most understandable branch may not always be the best predictor
  • Clustering algorithms (how do they differ from k-means?)
  • Variance reduction
  • Entropy
  • Gini
  • Significance tests
  • Tuning the levels of significance
  • The Kass merge-and-split heuristic for multiway splits

8
Kass Merge-and-Split Heuristic
  • Merge-and-Split: converges on a single, optimal clustering of like codes (a simplified sketch of the level-merging idea follows below).
  • Merges codes within clusters and reassigns consolidated groups of observations to different branches
  • Breaks up consolidated groups by splitting out the members with the weakest relationships
  • Re-merges the broken-up groups with consolidated groups that are similar
  • SAS Enterprise Miner uses a variation of this heuristic, Merge-and-Shuffle:
  • Assigns each consolidated group of observations to a different node; the two nodes that degrade the worth of the split the least are merged.
  • Reassigns consolidated groups of observations to different nodes
  • Stops when no consolidated group can be reassigned
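
A rough illustration of the agglomerative level-merging idea these heuristics build on (not SAS's exact Merge-and-Shuffle implementation): repeatedly pool the two groups of input levels whose target distributions look most alike, as judged by a chi-squared test, until the desired number of branches remains. The counts below are made up.

    import numpy as np
    from scipy.stats import chi2_contingency

    # counts[level] = target-class counts for that input level (illustrative numbers).
    counts = {
        "A": np.array([40, 10]),
        "B": np.array([38, 12]),
        "C": np.array([5, 45]),
        "D": np.array([8, 42]),
    }

    def merge_step(groups):
        # Merge the pair of groups whose two-row table has the largest chi-squared
        # p-value, i.e., the pair that looks most alike with respect to the target.
        best_pair, best_p = None, -1.0
        keys = list(groups)
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                table = np.vstack([groups[keys[i]], groups[keys[j]]])
                p = chi2_contingency(table)[1]
                if p > best_p:
                    best_pair, best_p = (keys[i], keys[j]), p
        a, b = best_pair
        groups[a + "+" + b] = groups.pop(a) + groups.pop(b)
        return groups

    while len(counts) > 2:          # collapse until two consolidated branches remain
        counts = merge_step(counts)
    print(counts)                   # levels A and B end up pooled, as do C and D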

9
Dealing with missing values
  • Treat a missing value as a legitimate value
    (explicitly include it in the analysis)
  • Use surrogates to populate descendent nodes where
    the input value for the preferred input is
    missing
  • Estimate missing value based on non-missing
    inputs
  • Distribute the missing value in the input to the
    descendent node based on a distribution rule
  • Distribute missing values over all branches in
    proportion to the missing values by branch
  • How SAS EM handles missing values
  • Distribute missing values across all available
    branches
  • Assign missing values to the most correlated
    branch
  • Assign missing values to the largest branch

10
5. Select the Candidate Decision Tree Branches
(among inputs)
  • The CHAID approach
  • F-test: for numeric targets with interval-level measurements; compares between-group variability to within-group variability.
  • Chi-squared test: for categorical targets.
  • Statistical adjustments
  • Bonferroni adjustments
  • The CRT approach
  • Choice of branches by parameters: number of leaves, best assessment value, the most leaves, Gini, variance reduction
  • Inputs are either nominal or interval; ordinal inputs are treated as interval
  • Different pruning approaches
  • Retrospective pruning: tries to identify the best subtree
  • Cost-complexity pruning: uses training data to create a subtree sequence
  • Reduced-error pruning: relies on validation data

11
Statistical Adjustments
  • Bonferroni correction
  • The Bonferroni correction states that if an experimenter is testing n dependent or independent hypotheses on a set of data, then the statistical significance level used for each hypothesis separately should be 1/n times what it would be if only one hypothesis were tested. "Statistically significant" simply means that a given result is unlikely to have occurred by chance. (See the small example below.)
  • It was developed by the Italian mathematician Carlo Emilio Bonferroni.
  • Kass adjustment
  • A p-value adjustment that multiplies the p-value by a Bonferroni factor that depends on the number of branches and chi-square target values, and sometimes on the number of distinct input values. The Kass adjustment is used in the Tree node.
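
A tiny illustration of the correction, with made-up numbers:

    # Bonferroni correction: with n hypotheses, test each at 1/n of the usual level.
    alpha, n_tests = 0.05, 20
    per_test_level = alpha / n_tests          # 0.0025

    # Equivalently, multiply each raw p-value by n before comparing it to alpha.
    raw_p = 0.0031                            # hypothetical p-value for one test
    adjusted_p = min(1.0, raw_p * n_tests)    # 0.062 -> no longer significant at 0.05
    print(per_test_level, adjusted_p)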

12
6. Complete the Form and Content of the Final Decision Tree
  • Stop, grow, prune, or iterate?
  • CHAID stops growing the decision tree when no node can produce any significant split below it; a stopping rule is used.
  • The user decides when to stop:
  • The node contains too few observations
  • The maximum depth is reached
  • No further split passes the F-test or chi-squared test
  • CRT relies on validation tests to prune branches, to stop tree growth, and to form an optimal decision tree.

13
Issues
  • Assessment measures
  • Proportion correctly classified
  • The sum of squared errors (quantitative targets), or average squared error (continuous targets)
  • Others: the proportion of events in the top 50% (or a user-defined percentage) for target = 1
  • Main difference between CHAID and CRT
  • Whether a test of significance or a train-and-test measurement comparison is used
  • Guiding tree growth with costs and benefits in the target
  • Implied costs and benefits lie behind a wide range of human decision-making
  • Prior probabilities
  • Affect the misclassification measure
  • Do not change the decision tree shape

14
Effect of Decision Threshold
[Figure: a decision threshold applied to the target distribution separates hits from misses.]
15
Effect of Decision Threshold
[Figure: the same decision threshold also defines false alarms and correct rejections alongside hits and misses.]
16
Questions
  1. How does the 6-step decision tree construction process differ from your previous understanding of decision tree modeling?
  2. Why is clustering utilized in decision tree algorithms? How?
  3. How is the missing-values problem resolved in decision tree modeling?
  4. What is a surrogate split? How does it work?
  5. What are the differences between CHAID and CRT?

17
3
2. Recursive Partitioning
2.1 Recursive Partitioning
2.2 Split Selection Bias
2.3 Regression Diagnostics
18
Recursive Partitioning
  • Recursive partitioning is the standard method used to fit decision trees. It is a top-down, greedy algorithm (a minimal sketch follows below).
  • Example
  • Handwriting recognition is a classic application
    of supervised prediction. The example data set is
    a subset of the pen-based recognition of
    handwritten digits data, available from the UCI
    repository (Blake et al 1998). The cases are
    digits written on a pressure-sensitive tablet.
    The input variables measure the position of the
    pen. They are scaled to be between 0 and 100. Two
    of the original 16 inputs are shown (X1 and X10).
    The target is the true written digit (0-9).
  • This subset contains the 1064 cases corresponding
    to the three digits 1, 7, and 9. Each case
    represents a point in the input space. (The data
    have been jittered for display because many of
    the points overlap.)
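
A compact sketch of top-down, greedy recursive partitioning for a classification target, assuming numeric inputs and Gini impurity; it only illustrates the idea, not the CHAID/CRT or SAS implementations discussed later. The toy data loosely mimic the X1/X10 digit example.

    import numpy as np

    def gini(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def best_split(X, y):
        # Exhaustive search over inputs and cut points for the largest Gini reduction.
        best = (None, None, 0.0)                     # (column, threshold, delta_gini)
        parent = gini(y)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left = X[:, j] < t
                if left.sum() == 0 or left.sum() == len(y):
                    continue
                w = left.mean()
                delta = parent - (w * gini(y[left]) + (1 - w) * gini(y[~left]))
                if delta > best[2]:
                    best = (j, t, delta)
        return best

    def grow(X, y, depth=0, max_depth=2, min_size=5):
        j, t, delta = best_split(X, y)
        if depth >= max_depth or delta <= 0 or len(y) < min_size:
            values, counts = np.unique(y, return_counts=True)
            return {"leaf": values[np.argmax(counts)]}      # majority-class leaf
        left = X[:, j] < t
        return {"var": j, "cut": t,
                "left": grow(X[left], y[left], depth + 1, max_depth, min_size),
                "right": grow(X[~left], y[~left], depth + 1, max_depth, min_size)}

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 100, size=(200, 2))          # two inputs standing in for X1, X10
    y = np.where(X[:, 0] < 38.5, "9", np.where(X[:, 1] < 41.5, "1", "7"))
    print(grow(X, y))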

19
Supervised Prediction: Nominal Target
20
Classification Tree
21
Multiway Splits
22
Decision Regions
23
Root-Node Split
[Tree diagram]
Root node: n = 1064 (D1 364, D7 364, D9 336)
Split X1 < 38.5:
  yes branch: n = 366 (D1 71, D7 1, D9 294)
  no branch:  n = 698 (D1 293, D7 363, D9 42)
24
1-Deep Space
[Scatter plot: the 1064 cases (digits 1, 7, 9) plotted in the X1-by-X10 input space (both axes 0-100), showing the regions created by the root-node split.]
25
Depth 2
[Tree diagram, depth 2]
Root split X1 < 38.5:
  no branch (n = 698: D1 293, D7 363, D9 42), split on X10 < 51.5:
    yes: n = 469 (D1 285, D7 143, D9 41)
    no:  n = 229 (D1 8, D7 220, D9 1)
  yes branch (n = 366: D1 71, D7 1, D9 294), split on X10 < 0.5:
    yes: n = 280 (D1 4, D7 0, D9 276)
    no:  n = 86 (D1 67, D7 1, D9 18)
26
2-Deep Space
[Scatter plot: the same X1-by-X10 space with the depth-2 partition boundaries overlaid on the digit 1, 7, 9 cases.]
27
Split Characteristics
[Scatter plot: digits 1, 7, 9 in the X1-by-X10 space (axes 0-100), illustrating the characteristics of the selected splits.]
28
Improvement of the greedy algorithm
  • This greedy algorithm could be improved by incorporating some type of lookahead or backup. Aside from the computational burden, trees built using limited lookahead have not been shown to be an improvement; in many cases they give inferior trees (Murthy and Salzberg 1995).
  • Another variation is oblique splits. Standard
    decision trees partition the input space using
    boundaries that are parallel to the input
    coordinates. These coordinate-axis splits make a
    fitted decision tree easy to interpret and
    provide resistance to the curse of
    dimensionality.
  • Splits on linear combinations of inputs give
    oblique boundaries. Several algorithms have been
    developed for inducing oblique decision trees
    (BFOS 1984, Murthy et al 1994, Loh and Shih
    1997).

29
Outline
  • Split Search
  • Ordinal input
  • Nominal input
  • Multiway splits
  • Splitting Criterion
  • Impurity reduction
  • Chi-squared test
  • Regression Trees
  • Missing Values

Variable:  X10   X10   X10      X1    X1        . . .
Values:    0.5   1.8   11, 46   2.4   1, 4, 61  . . .
30
Partitioning on an Ordinal Input
[Figure: the 7 possible ways to partition an ordinal input with levels 1-4 into branches; only groupings of adjacent levels, such as {1}{2 3 4} or {1 2}{3}{4}, are allowed.]
31
At Least Ordinal
X:        0.20   3.3   1.7   14    3.5   2515
ln(X):   -1.6    1.2   0.53  2.6   1.3   7.8
rank(X):  1      3     2     5     4     6
Potential split locations lie between adjacent values in rank order.
For interval or ordinal inputs, splits in a
decision tree depend only on the ordering of the
levels, making tree models robust to outliers in
input space. The application of a rank or any
monotonic transformation to an interval variable
will not change the fitted tree.
32
Partitioning on a Nominal Input
[Figure: the possible partitions of a nominal input with L = 4 levels into B branches; any grouping of levels is allowed, e.g., {1}{2 3 4}, {1 3}{2 4}, {1}{2}{3 4}, ...]
33
Split Search Shortcuts
  • Trees treat splits on inputs with nominal and ordinal measurement scales differently. Splits on a nominal input are not restricted. For a nominal input with L distinct levels, there are S(L, B) partitions into B branches, where S(L, B) is a Stirling number of the second kind.
  • Binary splits exclusively: ordinal inputs have L - 1 candidate splits; nominal inputs have 2^(L-1) - 1.
  • Agglomerative clustering of levels
  • Kass (1980)
  • Minimum child size

34
Stirling Number
  • In mathematics, Stirling numbers arise in a variety of combinatorics problems. They are named after James Stirling, who introduced them in the 18th century. Two different sets of numbers bear this name: the Stirling numbers of the first kind and the Stirling numbers of the second kind.
  • See http://en.wikipedia.org/wiki/Stirling_number

35
Combinatorial Explosion Problem
  • An exhaustive tree algorithm considers all
    possible partitions of all inputs at every node
    in the tree. The combinatorial explosion usually
    makes an exhaustive search prohibitively
    expensive.
  • Tree algorithms usually take shortcuts to reduce
    the split search.
  • Restricting searches to binary splits,
  • Using level clustering routines, and
  • Imposing minimum child size restrictions.
  • Other options designed to improve tree
    efficiency, performance, and interpretability
    also impact the split search. They include the
    following
  • Minimum Categorical Size
  • Use Input Once
  • Within-node Sampling

36
Clustering Levels
37
Nominal Variable Split - Clustering Branches
  • Algorithm
  • Start with an L-way split.
  • Collapse the two levels that are closest (based on a splitting criterion).
  • Repeat the process on the set of L - 1 consolidated levels.
  • This gives a split of each size. Choose the best of these.
  • Repeat this process for every input and choose the best.
  • The CHAID algorithm adds a backward elimination step (Kass 1980). The number of splits to consider is greatly reduced: to L(L-1)/2 for ordinal inputs and to (L-1)L(L+1)/6 for nominal inputs. For example, only 165 of the 115,974 possible splits of a 10-level nominal input would be considered (the counts are checked in the sketch below).
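
These counts are easy to check numerically; a small sketch assuming the formulas above:

    from math import comb, factorial

    def stirling2(L, B):
        # Stirling number of the second kind S(L, B): partitions of L levels into B branches.
        return sum((-1) ** j * comb(B, j) * (B - j) ** L for j in range(B + 1)) // factorial(B)

    L = 10
    all_nominal_splits = sum(stirling2(L, B) for B in range(2, L + 1))  # every multiway partition
    binary_nominal = 2 ** (L - 1) - 1                                   # binary splits, nominal input
    binary_ordinal = L - 1                                              # binary splits, ordinal input
    chaid_ordinal = L * (L - 1) // 2                                    # CHAID-style search, ordinal
    chaid_nominal = (L - 1) * L * (L + 1) // 6                          # CHAID-style search, nominal

    print(all_nominal_splits, binary_nominal, binary_ordinal, chaid_ordinal, chaid_nominal)
    # 115974 511 9 45 165 -> only 165 of the roughly 116,000 possible splits are considered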

38
Multiway versus Binary
[Figure: the same partition of levels 1-5 expressed as one multiway split and as a cascade of binary splits.]
In theory, multiway splits are no more flexible
than binary splits. Multiway splits often give
more interpretable trees because split variables
tend to be used fewer times. Many prefer binary
splits because an exhaustive search is more
feasible.
39
SAS's Split Search Strategy
  • The SAS Decision Tree node uses a blend of different shortcuts:
  • By default, if the node size is greater than
    5000, then a sample of 5000 cases is used. For
    classification trees, the sample is constructed
    to be as balanced as possible among the target
    classes. To make changes to sample size, use Node
    Sample.
  • Binary splits are used by default. To change
    split number, use Maximum Branch.
  • If multiway splits are specified, then an initial
    consolidation phase is conducted to group the
    levels of the inputs.
  • All possible splits among the consolidated levels
    are examined, unless that number exceeds 5000, in
    which case, an agglomerative algorithm is used.
    To change this threshold, use Exhaustive.
  • For categorical variables, a category must
    contain at least the number of observations
    specified in Minimum Categorical Size (default is
    5) to be considered in a split search. Otherwise,
    these observations are treated as missing values.
  • The use of an input can be limited with the Use
    Input Once option. It is turned off by default.

40
Splitting Criteria
[Figure: two candidate splits of the root node compared.
Binary split on X1 at 38.5 (branches X1 >= 38.5 | X1 < 38.5): D1 293 | 71, D7 363 | 1, D9 42 | 294; ΔGini = .197, Δentropy = .504, logworth = 140.
Four-way split on X10 (branches < 0.5, 1-41, 42-51, >= 51.5): ΔGini = .255, Δentropy = .600, logworth = 172.]
41
Impurity Reduction
[Diagram: a parent node with impurity i(0) and size n0 split into children with impurities i(1), ..., i(4) and sizes n1, ..., n4. The impurity reduction of the split is i(0) minus the size-weighted average of the child impurities.]
42
Gini Impurity
High diversity, low purity:
  Pr(interspecific encounter) = 1 - 2(3/8)^2 - 2(1/8)^2 = .69
Low diversity, high purity:
  Pr(interspecific encounter) = 1 - (6/7)^2 - (1/7)^2 = .24
43
Entropy
[Figure: entropy of a two-class node as a function of the class proportion; it is 0 at proportions 0 and 1 and reaches its maximum of 1.0 at 0.5.]
44
Gini vs. Entropy
  • Gini
  • The Gini index is a measure of variability for categorical data (developed by the Italian statistician Corrado Gini in 1912).
  • The Gini index can be interpreted as the probability that any two elements of a multi-set chosen at random are different.
  • In mathematical ecology, the Gini index is known as Simpson's diversity index. In cryptanalysis, it is 1 minus the repeat rate.
  • Entropy
  • Entropy is a measure of variability for categorical data.
  • The Δentropy splitting criterion is used by Quinlan (1993). It is equivalent to using the likelihood-ratio chi-squared test statistic for association between the branches and the target categories.
  • For classification trees with binary splits, Breiman (1996) showed that the ΔGini criterion tends to favor isolating the largest target class in one branch, while the Δentropy criterion tends to favor split balance. (The root-split values ΔGini = .197 and Δentropy = .504 from the digit example are reproduced in the sketch below.)
  • The ΔGini and Δentropy splitting criteria also tend to increase as the number of branches increases. They are not appropriate for fairly evaluating multiway splits because they favor large B.
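
The ΔGini = .197 and Δentropy = .504 values quoted for the root-node split X1 < 38.5 of the digit example (slide 40) can be reproduced directly from the node counts:

    import numpy as np

    def gini(counts):
        p = np.asarray(counts) / np.sum(counts)
        return 1.0 - np.sum(p ** 2)

    def entropy(counts):
        p = np.asarray(counts) / np.sum(counts)
        return -np.sum(p * np.log2(p))

    root  = [364, 364, 336]    # digits 1, 7, 9 at the root (n = 1064)
    left  = [293, 363, 42]     # X1 >= 38.5 branch (n = 698)
    right = [71, 1, 294]       # X1 <  38.5 branch (n = 366)

    n, n_l, n_r = sum(root), sum(left), sum(right)
    delta_gini    = gini(root)    - (n_l / n) * gini(left)    - (n_r / n) * gini(right)
    delta_entropy = entropy(root) - (n_l / n) * entropy(left) - (n_r / n) * entropy(right)
    print(round(delta_gini, 3), round(delta_entropy, 3))   # 0.197 0.504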

45
Chi-Squared Test
Contingency table for the root split on X1 (columns are the branches X1 >= 38.5 | X1 < 38.5; rows are the digit classes):

         Observed      Expected      (O - E)^2 / E    Row proportion
D1       293 |  71     239 | 125      12 |  23        .342
D7       363 |   1     239 | 125      64 | 123        .342
D9        42 | 294     225 | 116     149 | 273        .316
Column proportions: .656 | .344;  n = 1064.
An expected cell count equals n x column proportion x row proportion (the figure illustrates this with 1064 x 0.656 x 0.316).
46
Chi-Squared Test
  • The Pearson chi-squared test can be used to judge
    the worth of the split. It tests whether the
    column distributions (class proportions) are the
    same in each row (child node). The test statistic
    measures the difference between the observed cell
    counts and what would be expected if the branches
    and target classes (rows and columns) were
    independent.
  • The statistical significance of the test is not monotonically related to the size of the chi-squared test statistic. The degrees of freedom of the test are (r-1)(B-1), where r and B are the dimensions of the table. The expected value of a chi-squared test statistic with ν degrees of freedom equals ν. Consequently, larger tables (more branches) will naturally have larger chi-squared statistics.
  • The chi-squared splitting criterion uses the p-value of the chi-squared test (Kass 1980). When the p-values are very small, it is more convenient to use the logworth, defined as -log10(p-value), which increases as p decreases. (A short computation for the example split follows below.)
  • The ΔGini and Δentropy splitting criteria also tend to increase as the number of branches increases. However, they do not have an analogous degrees-of-freedom adjustment. Consequently, they are not appropriate for fairly evaluating multiway splits because they favor large B.
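
For the example split (the table on slide 45), the chi-squared statistic and its logworth can be computed as follows; the result of roughly 140 matches the logworth shown for this split on the neighboring slides.

    import numpy as np
    from scipy.stats import chi2, chi2_contingency

    # Rows = target classes (digits 1, 7, 9); columns = branches (X1 >= 38.5, X1 < 38.5).
    observed = np.array([[293,  71],
                         [363,   1],
                         [ 42, 294]])

    stat, p_value, dof, expected = chi2_contingency(observed)
    # p_value is around 1e-140; working on the log scale avoids any underflow worries.
    logworth = -chi2.logsf(stat, dof) / np.log(10)      # -log10(p-value)
    print(round(stat, 1), dof, round(logworth, 1))      # about 643.5, 2, 139.7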

47
p-Value Adjustments
[Figure: three candidate splits of the root node and their logworths, where logworth = -log10(p-value): a binary split on X1 at 38.5 (logworth 140), a three-way split on X1 at 17.5 and 36.5, and a four-way split on X10 at 0.5, 41.5, and 51.5 (logworth 172).]
48
Splitting Criteria
  • The Decision Tree node uses the logworth (ProbChisq) chi-squared splitting criterion by default. Alternatively, the ΔGini and Δentropy splitting criteria can be specified.
  • By default, the Decision Tree node applies Bonferroni adjustments to the p-value.
  • Kass (1980) adjusted the p-values after the split was chosen on each input; thus, p-values for splits on the same input are compared without adjustment. The Decision Tree node allows these adjustments to be applied before the split variable is chosen; thus, splits on the same input are compared using adjusted p-values.
  • So "after" implies that the adjusted p-values are compared between different inputs, not within the same input?

49
P-Value Adjustments
  • Step one: comparing splits on the same input variable
  • The chi-squared test statistic (as well as ΔGini and Δentropy) favors splits into greater numbers of branches. The p-value (or logworth) adjusts for this bias through the degrees of freedom. For binary splits, no adjustment is necessary.
  • Step two: comparing splits on different input variables
  • The maximum logworth tends to become larger as the number of candidate splits, m, increases. Consequently, input variables with a larger m are favored. Nominal inputs are favored over ordinal inputs with the same number of levels. Among inputs with the same measurement scale, those with more levels are favored. Kass (1980) proposed Bonferroni adjustments of the p-values to account for this bias. Let α be the probability of a type I error on each test (that is, of discovering an erroneous association). For a set of m tests, a conservative upper bound on the probability of at least one type I error is mα (the Bonferroni inequality). Consequently, the Kass adjustment multiplies the p-values by m (equivalently, subtracts log10(m) from the logworth; see the sketch below).
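
A small illustration of the step-two adjustment with hypothetical numbers:

    import math

    def kass_adjusted_logworth(raw_p, m):
        # Bonferroni/Kass adjustment: multiply the p-value by the number of candidate
        # splits m, which is the same as subtracting log10(m) from the logworth.
        return -math.log10(raw_p) - math.log10(m)

    # Hypothetical case: best raw p-value of 1e-8 found among m = 19 candidate splits.
    print(round(kass_adjusted_logworth(1e-8, m=19), 2))   # 8 - log10(19), about 6.72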

50
Splitting with Missing Values
[Figure: example trees splitting an input with levels 1, 2, 3 and missing values (?), showing the missing level routed to different branches in different candidate splits, e.g., {1}{2, 3, ?}, {1, ?}{2, 3}, {1, 2}{3, ?}, {1, 2, ?}{3}.]
51
Handling Missing Values
  • One of the chief benefits of recursive
    partitioning is the treatment of missing input
    data. Parametric regression models require
    complete cases. One missing value on one input
    variable eliminates that case from analysis.
    Imputation methods are often used prior to model
    fitting to fill in the missing values.
  • Decision trees can treat missing input values as a separate level of the input variable. A nominal input with L levels and a missing value can be treated as an (L + 1)-level input (see the sketch below). If a new case has a missing value on a splitting variable, then the case is sent to whatever branch contains the missing values.
  • In the case of an ordinal input with missing values, the missing value cannot usually be placed in order among the input levels but acts as a nominal level. Consequently, the split search should not place any restrictions on the branch that contains the missing level.
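
A minimal pandas sketch of the "missing as its own level" treatment, with made-up values:

    import pandas as pd

    # Hypothetical nominal input with L = 3 observed levels plus missing values.
    x = pd.Series(["a", "b", None, "c", None, "a"])

    # Treat missing as a separate level: the input now effectively has L + 1 levels,
    # and the split search can route the "MISSING" level to whichever branch fits best.
    x_with_missing_level = x.fillna("MISSING")
    print(x_with_missing_level.value_counts())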

52
Surrogate Splits
  • Surrogate splits can be used to handle missing
    values (BFOS 1984). A surrogate split is a
    partition using a different input that mimics the
    selected split.
  • A perfect surrogate maps all the cases that are
    in the same node of the primary split to the same
    node of the surrogate split.
  • The agreement between two splits can be measured as the proportion of cases that are sent to the same branch. The split with the greatest agreement is taken as the best surrogate (see the small example below).
  • The surrogates in SAS EM are used for scoring new
    cases, not for fitting the training data. Missing
    values on the training data are treated as a new
    input level.
  • If a new case has a missing value on the
    splitting variable, then the best surrogate is
    used to classify the case.
  • If the surrogate variable is missing as well,
    then the second best surrogate is used.
  • If the new case has a missing value on all the
    surrogates, it is sent to the branch that
    contains the missing values of the training data.
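
Agreement between a primary split and a candidate surrogate is simply the fraction of cases that the two splits route to the same branch; the cross-tabulated counts shown on the next slide give the quoted 76%.

    # Counts from the surrogate-split figure: primary X1 < 38.5 vs. surrogate X10 < 41.5.
    same_branch = 354 + 454        # cases both splits send to the same branch
    different_branch = 12 + 244    # cases the two splits route differently
    n = same_branch + different_branch
    print(round(same_branch / n, 2))   # 0.76 -> agreement of 76%

    # With raw data the same quantity would be, e.g.,
    # np.mean((X[:, 0] < 38.5) == (X[:, 9] < 41.5)) for a NumPy input matrix X
    # whose first and tenth columns hold X1 and X10 (a hypothetical layout).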

53
Surrogate Splits
[Figure: the digit cases in the X1-by-X10 space, comparing the primary split X1 < 38.5 (yes/no) with the candidate surrogate X10 < 41.5 (yes/no). The two splits route 354 + 454 of the 1064 cases to the same branch and 12 + 244 to different branches, for an agreement of 76%.]
54
Variable Importance
  • Developed by BFOS in 1984; useful for tree interpretation.
  • The importance of the jth input is a weighted average of the reduction in impurity for the surrogate splits using that input across all the internal nodes in the tree.
55
3
Recursive Partitioning
2.1 Recursive Partitioning
2.2 Split Selection Bias
2.3 Regression Diagnostics
56
Split Selection Bias
Worth of the best split found for each input (Inv, Branch), by criterion and p-value adjustment; the number of branches chosen is in parentheses:

                  ΔGini        Logworth (No Kass)   Logworth (Kass Before)   Logworth (Kass After)
<= 2-Way Split
  Inv             .0030 (2)    8.10 (2)             7.62 (2)                 7.62 (2)
  Branch          .0043 (2)    11.32 (2)            5.90 (2)                 5.90 (2)
<= 4-Way Split
  Inv             .0042 (3)    10.12 (3)            10.12 (3)                10.12 (3)
  Branch          .0059 (4)    13.51 (3)            6.50 (4)                 6.05 (3)
<= 19-Way Split
  Inv             .0042 (3)    10.12 (3)            10.12 (3)                10.12 (3)
  Branch          .0062 (19)   13.51 (3)            7.14 (19)                6.05 (3)
57
Interval Targets
[Figure: interval-target example using the Boston housing variables NOX, RM, and the target MEDV.]
58
Impurity Reduction
[Diagram: a parent node with impurity i(0) and size n0 split into children with impurities i(1), ..., i(4) and sizes n1, ..., n4; the impurity reduction is i(0) minus the size-weighted average of the child impurities.]
59
Variance Reduction
[Figure: the binary split RM < 6.94 (yes/no branches), used to illustrate variance reduction for an interval target.]
60
One-Way ANOVA
61
Heteroscedasticity
62
3
2. Recursive Partitioning
2.1 Recursive Partitioning
2.2 Split Selection Bias
2.3 Regression Diagnostics
63
The SAS EM Model
64
Configure the SAS EM Model
65
Diagnose model residuals
66
Diagnose model residuals
  • Option B Use the SAS Code node to enhance and
    register diagnostic output.
  • Select the upper SAS Code node. In the Training
    section of the Properties panel, select SAS Code.
    Right-click in the Editor window and select
    Open. Select the EX2.2a.sas program.
  • Select OK to exit the SAS Code node. Run the node
    and view results.
  • Copy the plot location from the Output window to a Windows Explorer Address field. (Or select Start -> Run from the Windows toolbar and copy this location into the Open field.)
  • Close the HTML output and SAS Code results
    windows.

67
Results