Transcript and Presenter's Notes

Title: Tree-Based Methods


1
Tree-Based Methods
  • ENEE698A Communication Seminar
  • Nov. 05, 2003
  • He Huang

2
Outline
  • Overview of Tree-Based Methods
  • Regression Tree
  • Classification Tree
  • Spam Example (application of Classification Tree)

3
Overview of the Tree-Based Methods
  • 1.1 General Tree-Based Methods
  • a. Split the feature space into a set of regions.
  • b. Fit a simple model (e.g., a constant) in each region.
  • Problem
  • Some of the resulting regions can be complicated to describe.
  • Solution
  • Use recursive binary partitioning.

(Figure: a partition of the feature space into regions R1-R5, each fit with a constant c1-c5.)
4
1.2 Recursive Binary Partition
  • a. How it works (an example)

(Figure: Binary Partition - the (x1, x2) plane split recursively at points t1-t4 into regions R1-R5 with fitted constants c1-c5; Binary Tree - the corresponding tree whose terminal nodes are the regions R1-R5.)
5
1.2 Recursive Binary Partition (Cont.)
  • How to describe the model (see the formula below)
  • cm is the constant predicted by the regression model in region Rm.
  • 1.3 Basic Issues in Tree-Based Methods
  • 1. How to choose the split points?
  • 2. How to control the size of the tree?
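The model itself was shown as a formula image on the original slide; reconstructed from the definitions above (and consistent with reference 1), it is

  f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)

where I(.) is the indicator function, so the prediction at x is the constant cm of the region Rm that contains x.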

6
2. Regression Tree
  • 2.1 Basics for Regression Trees
  • For each of the N observations, the input is xi = (xi1, xi2, ..., xip)
    and the output is yi.
  • Split the space into M regions R1, R2, ..., RM and model the response
    in each region as a constant cm.
  • The optimal value of cm is found by minimizing the sum of squared
    errors (see below).
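Minimizing the sum of squared errors \sum_i (y_i - f(x_i))^2 within each region has a simple closed form, namely the region average (a standard result, matching the notation above):

  \hat{c}_m = \operatorname{ave}(y_i \mid x_i \in R_m) = \frac{1}{N_m} \sum_{x_i \in R_m} y_i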

7
2. Regression Tree (Cont.)
  • 2.2 How to decide each split point, i.e., the pair (j, s)?
  • Greedy algorithm: for each splitting variable j, the optimal split
    point s is found by minimizing the criterion function (2-2),
    reconstructed below; the best pair (j, s) is then chosen by scanning
    over all splitting variables j.
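Criterion (2-2) appeared as an image on the original slide; reconstructed from the surrounding text (and matching reference 1), the split on variable j at point s defines the half-planes

  R_1(j, s) = \{ X \mid X_j \le s \}, \qquad R_2(j, s) = \{ X \mid X_j > s \},

and the greedy step solves

  \min_{j,\, s} \Big[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \Big].

For any fixed (j, s) the inner minimizations are solved by the region averages, so only the scan over candidate split points remains.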

8
2. Regression Tree (cont.)
  • A very large tree overfits the data.
  • A small tree might not capture the structure.
  • 2.3 How to control the size of the tree? (Where to stop splitting?)
  • Strategies (illustrated in the sketch below)
  • 1. Split only when the decrease in error exceeds some threshold
    (short-sighted).
  • 2. Grow the tree to a pre-defined size, then apply cost-complexity
    pruning (preferred).
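As a purely illustrative comparison of the two strategies, the sketch below uses scikit-learn's regression tree; the synthetic data set and the threshold values are placeholders, not something from the original slides.

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor

  # Placeholder data: 200 noisy observations of a step function.
  rng = np.random.default_rng(0)
  X = rng.uniform(0, 10, size=(200, 2))
  y = (X[:, 0] > 5).astype(float) + 0.1 * rng.normal(size=200)

  # Strategy 1 (short-sighted): keep splitting only while the decrease in
  # squared-error impurity exceeds a threshold.
  pre_pruned = DecisionTreeRegressor(min_impurity_decrease=0.01).fit(X, y)

  # Strategy 2 (preferred): grow a large tree, then prune it back with
  # cost-complexity pruning, controlled by the tuning parameter alpha.
  post_pruned = DecisionTreeRegressor(ccp_alpha=0.01).fit(X, y)

  print(pre_pruned.get_n_leaves(), post_pruned.get_n_leaves())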

9
2. Regression Tree: Cost-Complexity Pruning
  • Pruning: collapse some internal nodes so as to minimize the
    cost-complexity criterion (2-3), reconstructed below.
  • Nm: number of observations falling in region Rm
  • m: index of the terminal nodes of the binary tree T
  • |T|: number of terminal nodes in T
  • α: tuning parameter
  • In (2-3), the first term is the cost (sum of squared errors) and the
    second term is a penalty on the complexity/size of the tree.
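Criterion (2-3) was also an image on the original slide; reconstructed from the definitions above (and consistent with reference 1), it is

  C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|,
  \qquad Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{c}_m)^2,

so a small α favours large trees and a large α favours small trees.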
10
2. Regression Tree - Pruning
  • For each α there is a unique smallest subtree Tα that minimizes
    Cα(T).
  • To find Tα: weakest-link pruning.
  • At each step, collapse the internal node that produces the smallest
    increase in Σm NmQm(T); continue until only the single-node (root)
    tree remains.
  • From this sequence of trees, choose the one for which Cα(T) is
    minimal; this is Tα.
  • Estimate α by five- or tenfold cross-validation: choose the value of
    α that minimizes the cross-validated sum of squares. The final tree
    is Tα (a sketch follows).
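A minimal sketch of this procedure with scikit-learn, assuming a generic regression data set (X, y); scikit-learn's cost_complexity_pruning_path plays the role of the weakest-link tree sequence, and cross-validation selects α.

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor
  from sklearn.model_selection import cross_val_score

  def prune_by_cv(X, y, cv=5):
      # The pruning path lists the alphas at which internal nodes collapse,
      # i.e., the nested sequence of subtrees down to the root.
      path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
      best_alpha, best_score = 0.0, -np.inf
      for alpha in path.ccp_alphas:
          tree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
          # Cross-validated squared error (as a negative score); keep the
          # alpha that minimizes it.
          score = cross_val_score(tree, X, y, cv=cv,
                                  scoring="neg_mean_squared_error").mean()
          if score > best_score:
              best_alpha, best_score = alpha, score
      # The final tree, refit on all the data with the chosen alpha.
      return DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X, y)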



11
3. Classification Trees
  • 3.1 Basics of Classification Tree
  • a. For each observation, the output yi takes values in {1, 2, ..., K}
    (not continuous values as in regression trees).
  • Rm is the partition region corresponding to terminal node m, and it
    contains Nm observations; pmk is the proportion of class-k
    observations in node m.
  • b. The majority class in node m is given below.
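Both quantities were formula images on the original slide; in the notation of reference 1 they are

  \hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k),
  \qquad k(m) = \arg\max_k \hat{p}_{mk},

i.e., observations in terminal node m are classified to its majority class k(m).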

12
3. Classification Trees (Cont.)
  • 3.2 The criteria used for splitting and pruning (written out below)
  • Misclassification error
  • Gini index
  • Cross-entropy or deviance
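The three formulas were images on the original slide; written out in the notation above (and matching reference 1):

  Misclassification error:  \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i \ne k(m)) = 1 - \hat{p}_{m k(m)}
  Gini index:               \sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})
  Cross-entropy (deviance): -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}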

13
3. Classification Trees (cont.)
  • The three cost criteria for 2-class classification, as functions of
    the proportion p in one class (figure omitted; see the curves below).
  • Cross-entropy and the Gini index are more sensitive to changes in the
    node probabilities than the misclassification error.
14
4. Spam Example: Classification Tree
  • 4601 observations.
  • Inputs: 58 attributes indicating whether a particular word or
    character is frequently used in the email.
  • Output: spam or non-spam (email).
  • Purpose: to build a spam filter.
  • Grow the tree by cross-entropy;
    prune the tree by misclassification error (a rough sketch follows).
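A rough sketch of this recipe with scikit-learn. Two caveats: the data loading assumes the UCI spambase data is published on OpenML under the name "spambase" (an assumption, not something stated on the slides), and scikit-learn prunes by cost-complexity rather than directly by misclassification error, so this only approximates the procedure described above.

  from sklearn.datasets import fetch_openml
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  # Assumed to exist on OpenML: 4601 emails, word/character frequency
  # attributes, labels spam / non-spam (the exact label encoding may differ).
  data = fetch_openml("spambase", version=1, as_frame=False)
  X, y = data.data, data.target

  X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                            random_state=0)

  # Grow the tree with the cross-entropy (deviance) splitting criterion,
  # then prune with cost-complexity pruning; ccp_alpha is a placeholder
  # value and would normally be chosen by cross-validation.
  clf = DecisionTreeClassifier(criterion="entropy", ccp_alpha=1e-3,
                               random_state=0).fit(X_tr, y_tr)

  print("test error rate:", 1.0 - clf.score(X_te, y_te))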

15
(No Transcript)
16
Confusion matrix (entries are percentages of observations):

              Predicted
True          email     spam
email         57.3%      4.0%
spam           5.3%     33.4%

Overall error rate is 8.7%.
17
Reference
  • 1. The Elements of Statistical Learning: Data Mining, Inference, and
    Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman.
    Springer.
  • 2. Pattern Recognition and Neural Networks. B. D. Ripley. Cambridge
    University Press.
  • 3. Classification and Regression Trees. L. Breiman et al. Wadsworth.

18
  • The End
  • Thanks!