Decision Tree Pruning

Provided by: tau

1
Decision Tree Pruning
2
Problem Statement
  • We would like to output a small decision tree
  • Model selection
  • Tree building continues until the training error is zero
  • Option 1: Stop early
  • Stop splitting when the decrease in the index function is small
  • Cons: may miss structure
  • Option 2: Prune after building

3
Pruning
  • Input: tree T
  • Sample S
  • Output: tree T'
  • Basic pruning: T' is a sub-tree of T
  • Can only replace inner nodes by leaves
  • More advanced:
  • Replace an inner node by one of its children

4
Reduced Error Pruning
  • Split the sample into two parts, S1 and S2
  • Use S1 to build a tree
  • Use S2 to decide whether to prune
  • Process every inner node v
  • After all its children have been processed
  • Compute the observed error of Tv and of leaf(v)
  • If leaf(v) has fewer errors, replace Tv by leaf(v)
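
The bottom-up procedure above can be sketched in Python. This is a minimal illustration, not the presenter's code: the `Node` class, the binary `test` predicate, and the choice to store the S1 majority label at every node (so that leaf(v) is well defined) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Assumed: majority label of the S1 examples that reached this node.
    label: int
    # Assumed: binary test on an example; None marks a leaf.
    test: Optional[Callable[[tuple], bool]] = None
    left: Optional["Node"] = None    # branch taken when test(x) is False
    right: Optional["Node"] = None   # branch taken when test(x) is True


def predict(node, x):
    """Follow the tests from node down to a leaf and return its label."""
    while node.test is not None:
        node = node.right if node.test(x) else node.left
    return node.label


def errors(node, sample):
    """Number of examples (x, y) in sample that the sub-tree misclassifies."""
    return sum(predict(node, x) != y for x, y in sample)


def reduced_error_prune(node, s2):
    """Prune bottom-up, using only the part of S2 that reaches this node."""
    if node.test is None:
        return node
    s2_right = [(x, y) for x, y in s2 if node.test(x)]
    s2_left = [(x, y) for x, y in s2 if not node.test(x)]
    # Children are processed before their parent.
    node.left = reduced_error_prune(node.left, s2_left)
    node.right = reduced_error_prune(node.right, s2_right)
    leaf_errors = sum(y != node.label for _, y in s2)
    # Replace Tv by leaf(v) only when the leaf makes strictly fewer errors.
    if leaf_errors < errors(node, s2):
        return Node(label=node.label)
    return node
```

The sketch modifies the tree in place and returns the (possibly new) root, so a caller would write `tree = reduced_error_prune(tree, s2)`.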

5
Reduced Error Pruning Example
6
Pruning: CV and SRM
  • For each pruning size,
  • compute the minimal-error pruning
  • At most m different sub-trees
  • Select among the prunings using:
  • Cross Validation
  • Structural Risk Minimization
  • Any other index method

7
Finding the minimum pruning
  • Procedure Compute
  • Inputs:
  • k: number of errors
  • T: tree
  • S: sample
  • Outputs:
  • P: pruned tree
  • size: size of P

8
Procedure compute
  • IF IsLeaf(T)
  • IF Errors(T) ≤ k
  • THEN size = 1
  • ELSE size = ∞
  • P = T; return
  • IF Errors(root(T)) ≤ k
  • size = 1; P = root(T); return

9
Procedure compute
  • For i = 0 to k DO
  • Call Compute(i, T0, S0, size_{i,0}, P_{i,0})
  • Call Compute(k-i, T1, S1, size_{i,1}, P_{i,1})
  • size = min over i of (size_{i,0} + size_{i,1} + 1)
  • I = arg min over i of (size_{i,0} + size_{i,1} + 1)
  • P = MakeTree(root(T), P_{I,0}, P_{I,1})
  • What is the time complexity?
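
A possible Python rendering of Procedure Compute follows, as a sketch under stated assumptions. The `Node` class, the binary `test` predicate, and the per-node `label` used for root(T) are hypothetical; the size of a pruning is taken to be its number of nodes; and when no pruning fits the error budget the sketch returns `None` with size infinity rather than the tree itself.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    label: int                                       # assumed: label used if this node becomes a leaf
    test: Optional[Callable[[tuple], bool]] = None   # None marks a leaf
    left: Optional["Node"] = None                    # T0: branch for test(x) == False
    right: Optional["Node"] = None                   # T1: branch for test(x) == True


def compute(k, tree, sample):
    """Return (size, P): a smallest pruning P of tree with at most k errors on sample.

    size is math.inf and P is None when no pruning meets the budget.
    """
    root_errors = sum(y != tree.label for _, y in sample)
    # Covers both "T is a leaf" and "root(T) alone already meets the budget".
    if root_errors <= k:
        return 1, Node(label=tree.label)
    if tree.test is None:
        return math.inf, None
    s0 = [(x, y) for x, y in sample if not tree.test(x)]
    s1 = [(x, y) for x, y in sample if tree.test(x)]
    best_size, best_tree = math.inf, None
    # Try every split of the error budget between the two sub-trees.
    for i in range(k + 1):
        size0, p0 = compute(i, tree.left, s0)
        size1, p1 = compute(k - i, tree.right, s1)
        if size0 + size1 + 1 < best_size:
            best_size = size0 + size1 + 1
            best_tree = Node(tree.label, tree.test, p0, p1)
    return best_size, best_tree
```

The plain recursion recomputes the same (sub-tree, budget) pairs many times; caching results per node and budget is one way to avoid that, which is worth keeping in mind when answering the complexity question on the slide.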

10
Cross Validation
  • Split the sample into S1 and S2
  • Build a tree using S1
  • Compute the candidate prunings
  • Select using S2
  • Output the tree with the smallest error on S2
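
The selection step can be illustrated with a few lines of Python. This is only a sketch: each candidate pruning is represented as a hypothetical `(size, predictor)` pair, and breaking ties in favour of the smaller tree is an assumption, not something stated on the slide.

```python
def holdout_errors(predictor, s2):
    """Number of examples (x, y) in S2 that the predictor misclassifies."""
    return sum(predictor(x) != y for x, y in s2)


def select_pruning(candidates, s2):
    """Pick the (size, predictor) pair with the fewest errors on S2.

    Assumption: ties are broken in favour of the smaller size.
    """
    return min(candidates, key=lambda c: (holdout_errors(c[1], s2), c[0]))
```

For example, the candidates could be the prunings produced for each error budget, each wrapped in a function that classifies a single example.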

11
SRM
  • Build a tree T using S
  • Compute the candidate prunings
  • k_d: the size of the pruning with d errors
  • Select using the SRM formula
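
The transcript does not preserve the SRM formula itself, so the sketch below takes the complexity penalty as a parameter. The function names and the example penalty are hypothetical and serve only to show the shape of the selection: minimize the observed error rate d/m plus a penalty that grows with the pruning size k_d.

```python
import math


def srm_select(sizes, m, penalty):
    """sizes maps d (number of errors) to k_d (size of the pruning with d errors).

    Returns the d that minimizes d/m + penalty(k_d, m).
    """
    return min(sizes, key=lambda d: d / m + penalty(sizes[d], m))


def example_penalty(size, m, log_h=1.0, delta=0.05):
    # Hypothetical penalty for illustration only; not the formula from the slides.
    return math.sqrt((size * log_h + math.log(1 / delta)) / m)
```

With a small sample the penalty dominates and a small pruning is chosen; as the number of errors a small pruning makes grows, the larger tree wins.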

12
Drawbacks
  • Running time
  • Since $|T| = O(m)$
  • Running time is $O(m^2)$
  • Many passes over the data
  • Significant drawback for large data sets

13
Linear Time Pruning
  • Single bottom-up pass
  • Linear time
  • Uses an SRM-like formula
  • Local soundness
  • Competitive with any pruning

14
Algorithm
  • Process a node after processing its children
  • Local parameters:
  • $T_v$: current sub-tree at v, of size $\mathrm{size}_v$
  • $S_v$: sample reaching v, of size $m_v$
  • $l_v$: length of the path leading to v
  • Local test:
  • $\mathrm{obs}(T_v, S_v) + a(m_v, \mathrm{size}_v, l_v, \delta) > \mathrm{obs}(\mathrm{root}(T_v), S_v)$
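
The single bottom-up pass might look as follows in Python. Several points are assumptions for the sake of a runnable sketch: the `Node` class is hypothetical, obs is taken to be the observed error rate on the sample reaching v, the sub-tree is replaced by a leaf when the local test holds, a node that no example reaches is pruned, and the penalty `a` is supplied by the caller with δ already fixed inside it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    label: int                                       # assumed: label used if this node becomes a leaf
    test: Optional[Callable[[tuple], bool]] = None   # None marks a leaf
    left: Optional["Node"] = None                    # branch for test(x) == False
    right: Optional["Node"] = None                   # branch for test(x) == True


def prune_bottom_up(node, sample, a, depth=0):
    """Return (pruned node, its size, its number of errors on sample).

    a(m_v, size_v, l_v) is the caller-supplied penalty; depth plays the role of l_v.
    """
    m_v = len(sample)
    leaf_errors = sum(y != node.label for _, y in sample)
    if node.test is None:
        return node, 1, leaf_errors
    s_left = [(x, y) for x, y in sample if not node.test(x)]
    s_right = [(x, y) for x, y in sample if node.test(x)]
    # Children first, so sizes and error counts are already available at v.
    node.left, size_l, err_l = prune_bottom_up(node.left, s_left, a, depth + 1)
    node.right, size_r, err_r = prune_bottom_up(node.right, s_right, a, depth + 1)
    size_v, tree_errors = size_l + size_r + 1, err_l + err_r
    if m_v == 0:
        # Assumption: with no data reaching v there is no evidence for keeping Tv.
        return Node(label=node.label), 1, 0
    # Local test: obs(Tv, Sv) + a(...) > obs(root(Tv), Sv).
    if tree_errors / m_v + a(m_v, size_v, depth) > leaf_errors / m_v:
        return Node(label=node.label), 1, leaf_errors
    return node, size_v, tree_errors
```

Because each node passes its size and error count up to its parent, the test at v needs no second traversal of the sub-tree.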

15
The function a()
  • Parameters:
  • paths(H, l): the set of paths of length l over H
  • trees(H, s): the set of trees of size s over H
  • Formula

16
The function a()
  • Finite class H
  • $|\mathrm{paths}(H, l)| < |H|^l$
  • $|\mathrm{trees}(H, s)| < (4|H|)^s$
  • Formula
  • Infinite classes: VC-dim

17
Example
(Figure: example tree annotated with $m$, $l_v = 3$, $m_v$, $\mathrm{size}_v$, and $a(m_v, \mathrm{size}_v, l_v, \delta)$.)
18
Local uniform convergence
  • Sample S
  • $S_c = \{x \in S : c(x) = 1\}$, $m_c = |S_c|$
  • Finite classes C and H
  • $e(h|c) = \Pr[h(x) \ne f(x) \mid c(x) = 1]$
  • $\mathrm{obs}(h|c)$
  • Lemma: with probability $1 - \delta$

19
Global Analysis
  • Notation:
  • T: original tree (depends on S)
  • T': pruned tree
  • $T_{opt}$: optimal tree
  • $r_v = (l_v + \mathrm{size}_v) \log|H| + \log(m_v/\delta)$
  • $a(m_v, \mathrm{size}_v, l_v, \delta) = O(\sqrt{r_v / m_v})$ ? 1
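
To make the order of magnitude concrete, the penalty can be evaluated numerically. The sketch reads r_v as (l_v + size_v) log|H| + log(m_v/δ) and sets the constant hidden in the O() to a hypothetical `c = 1.0`; `h_size` stands for |H|. Both function names are made up for this example.

```python
import math


def r_v(l_v, size_v, m_v, h_size, delta):
    """r_v = (l_v + size_v) * log|H| + log(m_v / delta)."""
    return (l_v + size_v) * math.log(h_size) + math.log(m_v / delta)


def penalty_a(m_v, size_v, l_v, delta, h_size, c=1.0):
    """c * sqrt(r_v / m_v); c is an assumed constant, since the slide gives only the O() form."""
    return c * math.sqrt(r_v(l_v, size_v, m_v, h_size, delta) / m_v)
```

For fixed size and depth the penalty shrinks as more examples reach v, so sub-trees near the root, which see most of the sample, are kept more readily than equally sized sub-trees deep in the tree.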

20
Sub-Tree Property
  • Lemma: with probability $1 - \delta$,
  • T' is a sub-tree of $T_{opt}$
  • Proof:
  • Assume that all the local lemmas hold
  • Each pruning reduces the error
  • Assume T' has a sub-tree outside $T_{opt}$
  • Adding that sub-tree to $T_{opt}$ would improve it!

21
Comparing T' and $T_{opt}$
  • Additional pruned nodes: $V = \{v_1, \dots, v_t\}$
  • Additional error:
  • $e(T') - e(T_{opt}) = \sum_{i=1}^{t} (e(v_i) - e(T_{v_i})) \Pr[v_i]$
  • Claim: with high probability

22
Analysis
  • Lemma: with probability $1 - \delta$,
  • IF $\Pr[v_i] > 12 (l_{opt} \log|H| + \log(1/\delta)) / m = b$
  • THEN $\Pr[v_i] < 2\,\mathrm{obs}(v_i)$
  • Proof:
  • Relative Chernoff bound
  • Union bound over the $|H|^l$ paths
  • $V' = \{v_i \in V : \Pr[v_i] > b\}$

23
Analysis of ?
  • Sum over $V - V'$ is bounded by $s_{opt} \cdot b$

24
Analysis of ?
  • Sum of $m_v < l_{opt}\,\mathrm{size}_{opt}$
  • Sum of $r_v < \mathrm{size}_{opt} (l_{opt} \log|H| + \log(m/\delta))$
  • Putting it all together