Title: Boosted Decision Trees, an Alternative to Artificial Neural Networks
1 Boosted Decision Trees, an Alternative to Artificial Neural Networks
- B. Roe, University of Michigan
2 Collaborators on this work
- H. Yang, J. Zhu, University of Michigan
- Y. Liu, I. Stancu, University of Alabama
- G. McGregor, Los Alamos National Lab.
- and the good people of Mini-BooNE
3 What I will talk about
- I will VERY BRIEFLY mention Artificial Neural Networks (ANN), introduce the new technique of boosted decision trees, and then, using the MiniBooNE experiment as a test bed, compare the two techniques for distinguishing signal from background.
4 Outline
- What is an ANN?
- What is boosting?
- What is MiniBooNE?
- Comparisons of ANN and boosting for the MiniBooNE experiment
5 Artificial Neural Networks
- Used to classify events, for example into signal and noise/background.
- Suppose you have a set of feature variables, obtained from the kinematic variables of the event.
6 Neural Network Structure
- Combine the features in a non-linear way into a hidden layer and then into a final layer (a small sketch follows below).
- Use a training set to find the best weights w_ik to distinguish signal from background.
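A minimal Python sketch of the structure described above: the feature variables feed a hidden layer through weights w_ik, a non-linearity is applied, and a final node gives a signal/background score. The one-hidden-layer architecture, the tanh/sigmoid choices, and the name ann_score are illustrative assumptions, not the MiniBooNE ANN itself.

```python
import numpy as np

def ann_score(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward network: features -> hidden layer -> single output.

    x        : feature variables of one event, shape (n_features,)
    w_hidden : weights w_ik from feature k to hidden node i
    w_out    : weights from the hidden nodes to the output node
    Returns a score in (0, 1); values near 1 are signal-like.
    """
    hidden = np.tanh(w_hidden @ x + b_hidden)                 # non-linear combination of features
    return 1.0 / (1.0 + np.exp(-(w_out @ hidden + b_out)))    # sigmoid output node

# Training adjusts w_hidden and w_out (e.g. by back-propagation) on the
# training set so that signal events score near 1 and background near 0.
```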
7 Training and Testing Events
- Both ANN and boosting algorithms use a set of known events to train the algorithm.
- It would be biased to use the same set to estimate the accuracy of the selection, since the algorithm has been trained on this specific sample.
- A new set, the testing set of events, is used to test the algorithm (see the split sketched below).
- All results quoted here are for the testing set.
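One simple way to form such independent samples; the function name, the NumPy arrays, and the 50/50 split are illustrative assumptions, not the MiniBooNE bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def split_train_test(events, labels, train_fraction=0.5):
    """Randomly divide the known (Monte Carlo) events into a training set,
    used to fit the classifier, and an independent testing set, used to
    quote unbiased performance numbers."""
    order = rng.permutation(len(events))
    n_train = int(train_fraction * len(events))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (events[train_idx], labels[train_idx],
            events[test_idx], labels[test_idx])
```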
8 Boosted Decision Trees
- What is a decision tree?
- What is boosting the decision trees?
- Two algorithms for boosting.
9 Decision Tree
- Go through all PID variables and find the best variable and value at which to split the events.
- For each of the two resulting subsets, repeat the process.
- Proceeding in this way, a tree is built (a sketch of the recursive construction follows below).
- The ending nodes are called leaves.
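A minimal sketch of this recursive construction, assuming NumPy arrays and a find_best_split(X, w, y) function that returns the chosen variable, cut value, and the two subsets (one possibility is sketched after the Gini slides below). The dictionary tree representation and the stopping rules (max_depth, min_size) are illustrative, not the MiniBooNE code.

```python
def grow_tree(X, w, y, find_best_split, depth=0, max_depth=5, min_size=10):
    """Recursively split events until a stopping condition is met.
    X : (n_events, n_variables) array of PID variables,
    w : event weights,  y : +1 for signal, -1 for background.
    Terminal nodes (leaves) are labelled by the weighted majority."""
    W, Ws = w.sum(), w[y == +1].sum()
    majority = +1 if Ws > 0.5 * W else -1              # signal leaf if > 1/2 of the weight is signal
    if depth >= max_depth or len(y) < min_size or Ws in (0.0, W):
        return {"leaf": majority}                      # stop: depth, size, or pure node
    var, cut, left, right = find_best_split(X, w, y)   # best variable and cut value
    if var is None:                                    # no valid split available
        return {"leaf": majority}
    return {"var": var, "cut": cut,
            "left":  grow_tree(X[left],  w[left],  y[left],  find_best_split,
                               depth + 1, max_depth, min_size),
            "right": grow_tree(X[right], w[right], y[right], find_best_split,
                               depth + 1, max_depth, min_size)}
```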
10 Select Signal and Background Leaves
- Assume an equal weight of signal and background training events.
- If more than ½ of the weight of a leaf corresponds to signal, it is a signal leaf; otherwise it is a background leaf.
- Signal events on a background leaf or background events on a signal leaf are misclassified.
11 Criterion for Best Split
- Purity, P, is the fraction of the weight of a leaf due to signal events.
- Gini = W P (1 - P), where W is the total weight of the leaf. Note that Gini is 0 for an all-signal or all-background leaf.
- The criterion is to minimize gini_left + gini_right (one possible implementation is sketched below).
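One possible implementation of the split search, using the Gini definition above (W times P times 1 - P); the exhaustive scan over variables and candidate cut values is an illustrative choice, not necessarily how the MiniBooNE program does it. This find_best_split matches the one assumed in the tree-growing sketch earlier.

```python
import numpy as np

def gini(w, y):
    """Gini = W * P * (1 - P); zero for a pure signal or pure background node."""
    W = w.sum()
    if W == 0:
        return 0.0
    P = w[y == +1].sum() / W            # purity: signal fraction of the weight
    return W * P * (1.0 - P)

def find_best_split(X, w, y):
    """Scan every PID variable and every candidate cut value; keep the split
    that minimizes gini_left + gini_right."""
    best = (None, None, None, None, np.inf)
    for var in range(X.shape[1]):
        for cut in np.unique(X[:, var]):
            left = X[:, var] < cut
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue                # skip splits that leave one side empty
            score = gini(w[left], y[left]) + gini(w[right], y[right])
            if score < best[4]:
                best = (var, cut, left, right, score)
    return best[:4]   # variable index, cut value, boolean masks of the two subsets
```

Since the parent Gini is fixed, minimizing gini_left + gini_right is the same as maximizing the Gini decrease used on the next slide.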
12 Criterion for Next Branch to Split
- Pick the branch to split next so as to maximize the change in Gini.
- Criterion: gini(parent) - gini(right child) - gini(left child).
13 Decision Trees
- This is a decision tree.
- Decision trees have been known for some time, but they are often unstable: a small change in the training sample can produce a large difference.
14 Boosting the Decision Tree
- Give the training events misclassified under this procedure a higher weight.
- Continuing in this way, build perhaps 1000 trees and average the results (+1 if the event lands on a signal leaf, -1 if on a background leaf); a sketch of the loop follows below.
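A minimal sketch of this boosting loop, reusing the grow_tree and find_best_split sketches from above; the names boost, classify, score and the reweight argument are illustrative. The reweight slot is filled by the AdaBoost or epsilon-boost updates described on the next slides.

```python
import numpy as np

def classify(tree, x):
    """Follow one event down a tree to its leaf: +1 (signal) or -1 (background)."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["var"]] < tree["cut"] else tree["right"]
    return tree["leaf"]

def boost(X, y, reweight, n_trees=1000):
    """Build n_trees trees; after each one, raise the weights of the
    misclassified training events with the supplied reweight rule."""
    w = np.ones(len(y)) / len(y)                       # start with equal event weights
    trees, alphas = [], []
    for m in range(n_trees):
        tree = grow_tree(X, w, y, find_best_split)     # sketched earlier
        T = np.array([classify(tree, x) for x in X])   # T_m(x_i) = +/-1
        miss = (T != y)                                # misclassified events
        err = w[miss].sum() / w.sum()
        w, alpha = reweight(w, miss, err)              # AdaBoost or epsilon boost update
        w /= w.sum()                                   # renormalize the weights
        trees.append(tree)
        alphas.append(alpha)
    return trees, alphas

def score(trees, alphas, x):
    """Final score for an event: weighted sum of the tree outputs."""
    return sum(a * classify(t, x) for t, a in zip(trees, alphas))
```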
15 Two Commonly Used Algorithms for Changing Weights
- 1. AdaBoost
- 2. Epsilon boost (shrinkage)
16 Definitions
- x_i = the set of particle ID variables for event i.
- y_i = 1 if event i is signal, -1 if it is background.
- T_m(x_i) = 1 if event i lands on a signal leaf of tree m, and -1 if it lands on a background leaf.
17 AdaBoost
- Define err_m = (weight of misclassified events) / (total weight).
- Define alpha_m = beta ln((1 - err_m)/err_m); the weight of each misclassified event is multiplied by exp(alpha_m). A sketch of this update follows below.
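A sketch of the AdaBoost weight update under the definitions above, written to plug into the reweight slot of the loop sketched earlier; beta = 1/2 is taken from the worked example two slides below, and the function name is an illustrative assumption.

```python
import numpy as np

def adaboost_reweight(w, misclassified, err, beta=0.5):
    """AdaBoost: err_m = (weight of wrong events)/(total weight);
    alpha_m = beta * ln((1 - err_m)/err_m); the weight of each
    misclassified event is multiplied by exp(alpha_m)."""
    alpha = beta * np.log((1.0 - err) / err)
    w = w.copy()
    w[misclassified] *= np.exp(alpha)
    return w, alpha          # alpha_m also weights tree m in the final score
```

Usage with the earlier sketch would look like `trees, alphas = boost(X, y, adaboost_reweight)`.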
18 Scoring events with AdaBoost
- Renormalize the weights before building the next tree.
- Score each event by summing over trees: score(x) = sum over m of alpha_m T_m(x).
19 Epsilon Boost (shrinkage)
- After tree m, change the weight of misclassified (wrong) events: multiply each by exp(2 epsilon), with epsilon typically 0.01 (0.05). A sketch follows below.
- Renormalize the weights.
- Score each event by summing the tree outputs T_m(x) over trees.
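The corresponding epsilon-boost (shrinkage) update, again as a hedged sketch in the style of the earlier blocks; epsilon = 0.01 is the typical value quoted above, and returning a tree weight of 1 makes the final score the plain sum over trees, as the slide says.

```python
import numpy as np

def epsilon_reweight(w, misclassified, err, eps=0.01):
    """Epsilon boost: after each tree, multiply the weight of every
    misclassified event by exp(2*eps); err is unused but kept so the
    function matches the reweight interface of the boosting loop."""
    w = w.copy()
    w[misclassified] *= np.exp(2.0 * eps)
    return w, 1.0            # each tree enters the summed score with equal weight
```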
20 Example
- AdaBoost: suppose the weighted error rate is 40%, i.e., err = 0.4, and beta = 1/2.
- Then alpha = (1/2) ln((1 - 0.4)/0.4) = 0.203.
- The weight of a misclassified event is multiplied by exp(0.203) = 1.225.
- Epsilon boost: the weight of wrong events is increased by exp(2 x 0.01) = 1.02 (these numbers are verified in the snippet below).
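The numbers above can be reproduced directly (a quick arithmetic check, not part of the original slides):

```python
import math

err, beta = 0.4, 0.5
alpha = beta * math.log((1 - err) / err)
print(round(alpha, 3))               # 0.203
print(round(math.exp(alpha), 3))     # 1.225  (AdaBoost weight factor)
print(round(math.exp(2 * 0.01), 3))  # 1.02   (epsilon boost weight factor)
```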
21 Comparison of methods
- Epsilon boost changes the weights a little at a time.
- AdaBoost can be shown to try to optimize each change of weights. Let's look a little further at that.
22 AdaBoost Optimization
23 AdaBoost Fitting is Monotone
24 References
- R.E. Schapire, "The strength of weak learnability," Machine Learning 5 (2), 197-227 (1990). First suggested the boosting approach, with 3 trees taking a majority vote.
- Y. Freund, "Boosting a weak learning algorithm by majority," Information and Computation 121 (2), 256-285 (1995). Introduced using many trees.
- Y. Freund and R.E. Schapire, "Experiments with a new boosting algorithm," Machine Learning: Proceedings of the Thirteenth International Conference, Morgan Kaufmann, San Francisco, pp. 148-156 (1996). Introduced AdaBoost.
- J. Friedman, T. Hastie, and R. Tibshirani, "Additive logistic regression: a statistical view of boosting," Annals of Statistics 28 (2), 337-407 (2000). Showed that AdaBoost can be looked at as successive approximations to a maximum likelihood solution.
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer (2001). A good reference for decision trees and boosting.
25 The MiniBooNE Experiment
26 The MiniBooNE Collaboration
- University of Alabama: Y. Liu, I. Stancu
- Bucknell University: S. Koutsoliotas
- University of Cincinnati: E. Hawker, R.A. Johnson, J.L. Raaf
- University of Colorado: T. Hart, R.H. Nelson, E.D. Zimmerman
- Columbia University: A.A. Aguilar-Arevalo, L. Bugel, J.M. Conrad, J. Link, J. Monroe, D. Schmitz, M.H. Shaevitz, M. Sorel, G.P. Zeller
- Embry Riddle Aeronautical University: D. Smith
- Fermi National Accelerator Laboratory: L. Bartoszek, C. Bhat, S.J. Brice, B.C. Brown, D.A. Finley, R. Ford, F.G. Garcia, P. Kasper, T. Kobilarcik, I. Kourbanis, A. Malensek, W. Marsh, P. Martin, F. Mills, C. Moore, E. Prebys, A.D. Russell, P. Spentzouris, R. Stefanski, T. Williams
- Indiana University: D. Cox, A. Green, T. Katori, H. Meyer, R. Tayloe
- Los Alamos National Laboratory: G.T. Garvey, C. Green, W.C. Louis, G. McGregor, S. McKenney, G.B. Mills, H. Ray, V. Sandberg, B. Sapp, R. Schirato, R. Van de Water, N.L. Walbridge, D.H. White
- Louisiana State University: R. Imlay, W. Metcalf, S. Ouedraogo, M. Sung, M.O. Wascko
- University of Michigan: J. Cao, Y. Liu, B.P. Roe, H.J. Yang
- Princeton University: A.O. Bazarko, P.D. Meyers, R.B. Patterson, F.C. Shoemaker, H.A. Tanaka
- St. Mary's University of Minnesota: P. Nienaber
- Yale University: B.T. Fleming
32 Examples of data events
33 Numerical Results
- There are 2 reconstruction / particle ID packages used in MiniBooNE, rfitter and sfitter.
- The best results for ANN and boosting used different numbers of variables: 21 or 22 being best for ANN and 50-52 for boosting.
- Results quoted are ratios of the background kept by ANN to the background kept by boosting, for a given fraction of signal events kept.
- Only relative results are shown.
35 Comparison of Boosting and ANN
- A: Background is cocktail events; red is 21 and black is 52 training variables.
- B: Background is pi0 events; red is 22 and black is 52 training variables.
- Relative ratio = ANN background kept / boosting background kept, plotted versus the percent of nue CCQE signal kept.
36 Comparison of 21 (or 22) vs 52 variables for Boosting
- Vertical axis is the ratio of background kept for 21 (22) variables to that kept for 52 variables, both for boosting.
- Red is for a cocktail training sample and black is for a pi0 training sample.
- Error bars are MC statistical errors only.
37 AdaBoost vs Epsilon Boost and differing tree sizes
- A: Background kept for 8 leaves / background kept for 45 leaves. Red is AdaBoost, black is Epsilon Boost.
- B: Background kept for AdaBoost / background kept for Epsilon Boost, with Nleaves = 45.
38 Numerical Results from sfitter
- An extensive attempt was made to find the best variables for ANN and for boosting, starting from about 3000 candidates.
- Training against pi0 and related backgrounds, with 22 ANN variables and 50 boosting variables: in the region near 50% of signal kept, the ratio of ANN to boosting background was about 1.2.
39 How did the sensitivities change with a new optical model?
- In Nov. 2004, a new, much-changed optical model was introduced for making MC events.
- Both rfitter and sfitter needed to be changed to optimize fits for this model.
- Using the SAME feature variables as for the old model:
- For both rfitter and sfitter, the boosting results were about the same.
- For sfitter, the ANN results became about a factor of 2 worse.
40 Number of feature variables in boosting
- In recent trials we have used 92 variables. Boosting worked well.
- However, by looking at the frequency with which each variable was used as a splitting variable, it was possible to reduce the number to 60 without loss of sensitivity (a sketch of this bookkeeping follows below).
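One simple way to do this bookkeeping, assuming the dictionary-style trees of the earlier sketches; the function name split_counts is an illustrative assumption.

```python
from collections import Counter

def split_counts(trees):
    """Count how many times each PID variable is used as a splitting
    variable over the whole set of boosted trees."""
    counts = Counter()

    def walk(node):
        if "leaf" in node:
            return
        counts[node["var"]] += 1      # this node splits on variable node["var"]
        walk(node["left"])
        walk(node["right"])

    for tree in trees:
        walk(tree)
    return counts

# Variables that are essentially never chosen can be dropped and the boosting
# retrained; in the slides this reduced 92 variables to 60 with no loss of
# sensitivity.
```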
41 For ANN
- For ANN one needs to set the temperature, hidden layer size, learning rate, etc. There are lots of parameters to tune.
- For ANN, if one
- a. multiplies a variable by a constant, var(17) -> 2*var(17),
- b. switches two variables, var(17) <-> var(18), or
- c. puts a variable in twice,
- then the result is very likely to change.
42 For boosting
- Boosting can handle more variables than ANN; it will use what it needs.
- Duplication or switching of variables will not affect boosting results.
- Suppose we make a change of variables y = f(x) such that if x_2 > x_1, then y_2 > y_1. The boosting results are unchanged: they depend only on the ordering of the events (a quick illustration follows below).
- There is considerably less tuning for boosting than for ANN.
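A quick toy illustration of the ordering statement, using the Gini-based split search sketched earlier (assumed to be in scope): replacing a variable x by a strictly increasing function y = f(x) leaves the chosen event partition unchanged, because every cut on x has an equivalent cut on y. The toy data and labels are made up for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                                  # toy PID variables
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, +1, -1) # toy signal/background labels
w = np.ones(200) / 200

var_a, cut_a, left_a, _ = find_best_split(X, w, y)

X2 = X.copy()
X2[:, 0] = np.exp(X[:, 0])            # monotone change of variable, y = f(x)
var_b, cut_b, left_b, _ = find_best_split(X2, w, y)

# Same splitting variable and the same partition of events, even though the
# numerical cut value differs.
print(var_a == var_b, np.array_equal(left_a, left_b))
```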
43 Robustness
- For either boosting or ANN, it is important to know how robust the method is, i.e., whether small changes in the model produce large changes in the output.
- In MiniBooNE this is handled by generating many sets of events with parameters varied by about 1 sigma and checking the differences. This is not complete, but, so far, the selections look quite robust.
44 Conclusions
- For MiniBooNE, boosting is better than ANN by a factor of 1.2-1.8.
- AdaBoost and Epsilon Boost give comparable results within the region of interest (40-60% nue kept).
- Use of a larger number of leaves (45) gives 10-20% better performance than use of a small number (8).
- It is expected that boosting techniques will have wide applications in physics.
- Preprint: physics/0408124; N.I.M., in press.
- C and FORTRAN versions of the boost program (including a manual) are available on my homepage: http://www.gallatin.physics.lsa.umich.edu/roe/