Variable Randomness in Decision Tree Ensembles

Variable Randomness in Decision Tree Ensembles
Fei Tony Liu, Kai Ming Ting
Gippsland School of Information Technology
Introduction
A decision tree ensemble is a data-mining classifier that combines multiple decision trees for predictive tasks. Some decision tree ensembles use randomization techniques to create diverse individual trees; examples include Bagging, Randomized Trees, Random Subspace, Decision Tree randomization, Random Forests, Random Decision Tree, and Max-diverse Ensemble. In this work, we are concerned with three research questions.
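As background to what a decision tree ensemble does, here is a minimal Bagging-style sketch in Python (toy one-feature decision stumps and majority voting; an illustration only, not the authors' implementation):

```python
import random

def bootstrap_sample(data, rng):
    # Sample n points with replacement (the Bagging resampling step).
    return [rng.choice(data) for _ in range(len(data))]

def train_stump(data):
    # Toy "tree": a one-level threshold classifier on a scalar feature.
    # data: list of (x, label) pairs with label in {0, 1}.
    best = None
    for threshold, _ in data:
        preds = [(1 if x >= threshold else 0) for x, _ in data]
        acc = sum(p == y for p, (_, y) in zip(preds, data)) / len(data)
        if best is None or acc > best[1]:
            best = (threshold, acc)
    t = best[0]
    return lambda x: 1 if x >= t else 0

def bagging_predict(stumps, x):
    # Majority vote across the individual trees.
    votes = sum(s(x) for s in stumps)
    return 1 if votes * 2 >= len(stumps) else 0

rng = random.Random(0)
data = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
stumps = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
```

Each stump is weak on its own, but resampling makes the stumps diverse and the vote combines them into a stronger classifier; randomizing the trees themselves (the topic of this poster) is another route to the same diversity.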
2. What are the effects of different degrees of randomness? (a) Using Max-diverse.α, one can explore the changes in forming non-axis-parallel boundaries as α varies; examples can be found in Figures 1 and 2. (b) Analytically, we find that the ability to eliminate irrelevant features also changes with the degree of randomness.
Fig. 1: Training samples (Gaussian mixture data) and the optimal decision boundary.
Fig. 2: Decision boundaries of Max-diverse.α for α = 0.1, 0.5, and 0.9; as α decreases, the ability to approximate the non-axis-parallel boundary increases.
3. What is the appropriate level of randomness for a given problem? To choose an appropriate α value for a given task, we introduce in Figure 3 an estimation procedure based on progressive training errors. Using progressive training errors, Max-diverse.α is able to select an estimated α prior to its predictive tasks. Our experiments show that Max-diverse.α is significantly better than Max-diverse Ensemble and Random Forests, and comparable to C5 boosting. A pair-wise comparison can be found in Figure 4.
1. Is there a better way to control the amount of randomness used? We propose Max-diverse.α; its feature-selection process can be found in Algorithm 1. Max-diverse.α is a better alternative for controlling the amount of randomness than Random Forests: it covers the full spectrum of variable randomness, from completely random to purely deterministic, which gives a fine granularity for representing any level of randomness.
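This transcript does not reproduce Algorithm 1, so the following is only a plausible sketch of an α-controlled split selection, assuming each node uses the deterministic best-gain test with probability α and a uniformly random test otherwise (α = 1 purely deterministic, α = 0 completely random):

```python
import math
import random

def entropy(labels):
    # Shannon entropy of a label multiset.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(data, feature):
    # Gain of splitting `data` (list of (features, label)) on a binary feature.
    labels = [y for _, y in data]
    left = [y for x, y in data if x[feature] == 0]
    right = [y for x, y in data if x[feature] == 1]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(data)
    return entropy(labels) - weighted

def choose_split(data, features, alpha, rng):
    # Sketch of variable-randomness selection: with probability alpha take
    # the deterministic best-gain feature, otherwise a uniformly random one.
    if rng.random() < alpha:
        return max(features, key=lambda f: info_gain(data, f))
    return rng.choice(features)

# Feature 0 separates the classes perfectly; feature 1 is irrelevant.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
print(choose_split(data, [0, 1], 1.0, random.Random(1)))  # alpha=1: best gain, feature 0
```

A single parameter α thus spans the whole spectrum between a completely random tree and a purely deterministic one, which is the granularity claim made above.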
t is the number of trees in an ensemble; err(i) returns the training error of an ensemble of size i, set at α.
Fig. 3: Estimating α prior to predictive tasks.
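Figure 3 itself is not reproduced in this transcript, so the following is a hedged sketch of one way such an estimation could work, assuming each candidate α is scored by the average of its progressive training errors err(i) over ensemble sizes i = 1..t, and the lowest-scoring α is kept (`err` here is a hypothetical callback standing in for the poster's err(·)):

```python
def estimate_alpha(err, candidates, t):
    # Sketch only, not the paper's exact procedure: pick the candidate
    # alpha with the lowest average progressive training error.
    def score(alpha):
        # err(alpha, i): training error of an ensemble of i trees set at alpha.
        return sum(err(alpha, i) for i in range(1, t + 1)) / t
    return min(candidates, key=score)

# Toy stand-in error surface: error shrinks with ensemble size and is
# minimised around alpha = 0.5 (fabricated for illustration only).
toy_err = lambda alpha, i: abs(alpha - 0.5) + 1.0 / i
print(estimate_alpha(toy_err, [0.1, 0.3, 0.5, 0.7, 0.9], t=10))  # -> 0.5
```

Because the score uses training error only, the whole selection can run before any predictive task, which matches the "prior to predictive tasks" framing of Figure 3.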
Fig. 4: A pair-wise comparison using 45 UCI data sets.
This work was awarded both the Best Paper and the Best Student Paper Awards at PAKDD 06.