Title: Tree-Based Methods
1 Tree-Based Methods
- ENEE698A Communication Seminar
- Nov. 05, 2003
- He Huang
2 Outline
- Overview of Tree-Based Methods
- Regression Tree
- Classification Tree
- Spam Example (an application of Classification Trees)
3 Overview of Tree-Based Methods
- 1.1 General Tree-Based Methods
- a. Split the feature space into a set of regions.
- b. Fit a simple model (e.g. a constant) in each partition region.
- Problem: some of the resulting regions can be complicated to describe.
- Solution: use recursive binary partitioning.
[Figure: a general partition of the feature space into regions R1-R5, each fit with a constant c1-c5]
4 1.2 Recursive Binary Partition
- a. How it works (an example)
[Figure: an example recursive binary partition of the (x1, x2) plane at split points t1-t4, yielding regions R1-R5 with fitted constants c1-c5, and the corresponding binary tree. Panels: Binary Partition, Binary Tree]
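The binary tree in the figure can be read as a chain of nested threshold tests. Below is a minimal Python sketch of the piecewise-constant predictor such a tree encodes, assuming one plausible reading of the split order in the figure; the function name and the example t and c values are illustrative, not from the slides.

  def predict(x1, x2, t1, t2, t3, t4, c1, c2, c3, c4, c5):
      """Piecewise-constant model encoded by the binary tree in the
      figure (one plausible reading of its split order)."""
      if x1 <= t1:                         # first split, on x1 at t1
          return c1 if x2 <= t2 else c2    # regions R1, R2
      if x1 <= t3:                         # right branch splits on x1 again
          return c3                        # region R3
      return c4 if x2 <= t4 else c5        # regions R4, R5

  # Illustrative call with made-up split points and constants:
  print(predict(0.3, 0.8, t1=0.5, t2=0.5, t3=0.8, t4=0.6,
                c1=1.0, c2=2.0, c3=3.0, c4=4.0, c5=5.0))  # -> 2.0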
5 1.2 Recursive Binary Partition (Cont.)
- b. How to describe the model
- The fitted model predicts a constant in each region:
  f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)
- c_m: the regression model's prediction value corresponding to the region R_m
- 1.3 Basic Issues in Tree-Based Methods
- 1. How to decide the splitting points?
- 2. How to control the size of the tree?
6 2. Regression Tree
- 2.1 Basics for Regression Trees
- For each of N observations, the input is x_i = (x_{i1}, x_{i2}, ..., x_{ip}) and the output is y_i.
- Split the space into M regions R_1, R_2, ..., R_M, and model each region with a constant c_m.
- The optimal value of each c_m is decided by minimizing the sum of squared errors
  \sum_i (y_i - f(x_i))^2
  which gives c_m as the average of the y_i falling in region R_m:
  \hat{c}_m = ave(y_i | x_i \in R_m)
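A quick numerical check of this fact, as a minimal sketch with made-up data: for squared-error loss, the best constant for a region is the mean of the responses falling in it.

  import numpy as np

  y = np.array([1.0, 2.0, 2.5, 4.0])        # responses in one region Rm
  candidates = np.linspace(0.0, 5.0, 501)   # candidate constants c
  sse = [np.sum((y - c) ** 2) for c in candidates]
  best = candidates[np.argmin(sse)]
  print(best, y.mean())                     # both approximately 2.375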
7 2. Regression Tree (Cont.)
- 2.2 How to decide each splitting point, i.e. the pair (j, s)?
- Greedy algorithm: for each splitting variable j, the optimal splitting point s can be decided by solving the criterion (2-2)
  \min_{j,s} [ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 ]
  where R_1(j,s) = {x | x_j <= s} and R_2(j,s) = {x | x_j > s}.
- The best pair (j, s) is then decided after going through all splitting variables j (a sketch of this search follows).
8 2. Regression Tree (Cont.)
- A very large tree overfits; a small tree might not capture the structure.
- 2.3 How to control the size of the tree? (Where to stop the splitting?)
- Strategies (compared in the sketch below)
- 1. Split only when the decrease in error exceeds some threshold (short-sighted: a seemingly worthless split may lead to a very good split further down).
- 2. Grow the tree to a pre-defined size, then apply cost-complexity pruning (preferred).
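Both strategies correspond to standard knobs in tree libraries. A minimal scikit-learn sketch, where the threshold and alpha values are illustrative assumptions, not tuned:

  from sklearn.datasets import make_regression
  from sklearn.tree import DecisionTreeRegressor

  X, y = make_regression(n_samples=200, n_features=5, random_state=0)

  # Strategy 1: stop early, splitting only when the impurity decrease
  # exceeds a threshold (the short-sighted rule).
  early_stop = DecisionTreeRegressor(min_impurity_decrease=100.0).fit(X, y)

  # Strategy 2: grow a large tree, then prune it back; ccp_alpha is the
  # cost-complexity tuning parameter alpha of the next slide.
  grown_then_pruned = DecisionTreeRegressor(ccp_alpha=100.0).fit(X, y)

  print(early_stop.get_n_leaves(), grown_then_pruned.get_n_leaves())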
9 2. Regression Tree - Cost-Complexity Pruning
- Pruning: collapse some internal nodes to minimize the cost-complexity criterion (2-3)
  C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|,
  where Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{c}_m)^2
- The first term is the cost (the sum of squared errors); the second is a penalty on the complexity/size of the tree.
- N_m: number of observations falling in the region R_m
- m: index of the terminal nodes of the binary tree T
- |T|: the number of terminal nodes in T
- \alpha: tuning parameter trading off tree size against goodness of fit
10 2. Regression Tree - Pruning
- For each \alpha, there is a unique smallest tree T_\alpha that minimizes C_\alpha(T).
- To find T_\alpha: weakest-link pruning.
- Each time, collapse the internal node that produces the smallest increase in \sum_m N_m Q_m(T); continue until reaching the single-node tree.
- From this tree sequence, choose the tree T_\alpha for which C_\alpha(T) is minimal.
- Estimate \alpha by five- or tenfold cross-validation: choose the value \hat{\alpha} that minimizes the cross-validated sum of squares. The final tree is T_{\hat{\alpha}} (see the sketch below).
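scikit-learn implements this same weakest-link scheme; a minimal sketch, assuming five-fold cross-validated squared error as the selection criterion:

  import numpy as np
  from sklearn.datasets import make_regression
  from sklearn.model_selection import cross_val_score
  from sklearn.tree import DecisionTreeRegressor

  X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

  # Weakest-link pruning: ccp_alphas holds the alpha at which each
  # successive internal-node collapse occurs, down to the root tree.
  path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

  # Choose alpha by five-fold cross-validated squared error.
  scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                            X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()
            for a in path.ccp_alphas]
  best_alpha = path.ccp_alphas[int(np.argmax(scores))]
  final_tree = DecisionTreeRegressor(ccp_alpha=best_alpha,
                                     random_state=0).fit(X, y)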
11 3. Classification Trees
- 3.1 Basics of Classification Trees
- a. For each observation, the output y_i takes values in 1, 2, ..., K (not continuous values as in regression trees).
- R_m is the partition region corresponding to the terminal node m, with N_m observations. The proportion of class-k observations in node m is
  \hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k)
- b. The majority class in node m is
  k(m) = \arg\max_k \hat{p}_{mk}
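A minimal sketch of these two quantities for a single node, with made-up labels:

  import numpy as np

  labels_in_node = np.array([1, 1, 2, 1, 3, 1])   # y_i values in node m, K = 3
  classes, counts = np.unique(labels_in_node, return_counts=True)
  p_mk = counts / counts.sum()                    # class proportions p_mk
  k_m = classes[np.argmax(p_mk)]                  # majority class k(m)
  print(dict(zip(classes, np.round(p_mk, 3))), k_m)   # {1: 0.667, 2: 0.167, 3: 0.167} 1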
12 3. Classification Trees (Cont.)
- 3.2 The criteria applied for splitting and pruning
- Misclassification error: 1 - \hat{p}_{m,k(m)}
- Gini index: \sum_k \hat{p}_{mk} (1 - \hat{p}_{mk})
- Cross-entropy or deviance: -\sum_k \hat{p}_{mk} \log \hat{p}_{mk}
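The three criteria are direct to compute from a node's class proportions; a minimal sketch of the formulas above (not a library API):

  import numpy as np

  def node_impurities(p):
      """Misclassification error, Gini index, and cross-entropy for a
      vector of class proportions p_mk (assumed to sum to 1)."""
      p = np.asarray(p, dtype=float)
      misclass = 1.0 - p.max()                        # 1 - p_{m,k(m)}
      gini = np.sum(p * (1.0 - p))                    # sum_k p (1 - p)
      entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))  # treat 0 log 0 as 0
      return misclass, gini, entropy

  print(node_impurities([0.5, 0.5]))   # all three maximal at p = 1/2
  print(node_impurities([0.9, 0.1]))   # purer node: all three smaller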
13 3. Classification Trees (Cont.)
- [Figure: the three cost criteria for 2-class classification, plotted as functions of the proportion p in one class]
- Cross-entropy and the Gini index are more sensitive to changes in the node probabilities than misclassification error, which is why they are preferred for growing the tree.
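The lost figure is straightforward to regenerate; a minimal matplotlib sketch of the 2-class criteria (Gini and cross-entropy are sometimes rescaled to pass through (0.5, 0.5); they are plotted unscaled here):

  import numpy as np
  import matplotlib.pyplot as plt

  p = np.linspace(0.001, 0.999, 200)                 # proportion in class 1
  plt.plot(p, 1 - np.maximum(p, 1 - p), label="Misclassification error")
  plt.plot(p, 2 * p * (1 - p), label="Gini index")
  plt.plot(p, -p * np.log(p) - (1 - p) * np.log(1 - p), label="Cross-entropy")
  plt.xlabel("p"); plt.legend(); plt.show()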
14 4. Spam Example - Classification Tree
- 4601 observations
- Inputs: 58 attributes indicating whether a particular word or character is frequently used in the email.
- Output: spam or non-spam
- The purpose: to build a spam filter.
- Grow the tree by cross-entropy; prune the tree by misclassification error (a sketch follows).
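A minimal scikit-learn sketch of this recipe. It assumes the UCI spambase data is available on OpenML under the name used below; note also that scikit-learn prunes with the training criterion rather than misclassification error, so the pruning step only approximates the slides', and ccp_alpha is an illustrative value that would be chosen by cross-validation as on slide 10.

  from sklearn.datasets import fetch_openml
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  # Assumption: spambase (4601 emails, word/character-frequency
  # features, spam label) is fetched from OpenML.
  X, y = fetch_openml("spambase", version=1, return_X_y=True, as_frame=False)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # Grow by cross-entropy, then cost-complexity prune.
  tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=1e-3,
                                random_state=0).fit(X_train, y_train)
  print("test error:", 1 - tree.score(X_test, y_test))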
15 [Figure: no transcript available]
16 Spam Example - Results
- Confusion matrix of the final tree (percentages of all observations):

                 Predicted
  True           email    spam
  email          57.3%    4.0%
  spam            5.3%   33.4%

- Overall error rate is 8.7%.
17 References
- 1. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
- 2. B. D. Ripley, Pattern Recognition and Neural Networks. Cambridge University Press.
- 3. L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Wadsworth.