1
Boosted Decision Trees, an Alternative to
Artificial Neural Networks
  • B. Roe, University of Michigan

2
Collaborators on this work
  • H. Yang, J. Zhu, University of Michigan
  • Y. Liu, I. Stancu, University of Alabama
  • G. McGregor, Los Alamos National Lab.
  • and the good people of MiniBooNE

3
What I will talk about
  • I will VERY BRIEFLY mention Artificial Neural
    Networks (ANN), introduce the new technique of
    boosted decision trees, and then, using the
    MiniBooNE experiment as a test bed, compare the
    two techniques for distinguishing signal from
    background.

4
Outline
  • What is an ANN?
  • What is Boosting?
  • What is MiniBooNE?
  • Comparisons of ANN and Boosting for the MiniBooNE
    experiment

5
Artificial Neural Networks
  • Used to classify events, for example into signal
    and noise/background.
  • Suppose you have a set of feature variables,
    obtained from the kinematic variables of the
    event.

6
Neural Network Structure
  • Combine the features in a non-linear way to a
    hidden layer and then to a final layer.
  • Use a training set to find the best weights w_ik
    to distinguish signal from background (see the
    sketch below).
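
A minimal sketch in C of the structure just described: the feature
variables x[k] feed a hidden layer through weights w[i][k] (the w_ik of
the slide), and the hidden-node outputs are combined into a single
score. The layer sizes, the tanh activation, and all numerical values
below are illustrative placeholders, not the MiniBooNE network.

    /* One-hidden-layer feed-forward network: illustrative sketch only. */
    #include <math.h>
    #include <stdio.h>

    #define NFEAT 3            /* number of input feature variables */
    #define NHID  4            /* number of hidden-layer nodes      */

    double ann_score(const double x[NFEAT],
                     const double w_hid[NHID][NFEAT], /* input -> hidden weights  */
                     const double w_out[NHID])        /* hidden -> output weights */
    {
        double out = 0.0;
        for (int i = 0; i < NHID; ++i) {
            double sum = 0.0;
            for (int k = 0; k < NFEAT; ++k)
                sum += w_hid[i][k] * x[k];            /* linear combination ...   */
            out += w_out[i] * tanh(sum);              /* ... made non-linear      */
        }
        return tanh(out);                             /* near +1 = signal-like    */
    }

    int main(void)
    {
        double x[NFEAT] = {0.5, -1.2, 0.3};           /* toy feature values       */
        double w_hid[NHID][NFEAT] = {{ 0.1, 0.2, -0.3}, {0.4, -0.1, 0.2},
                                     {-0.2, 0.3,  0.1}, {0.0,  0.1, 0.5}};
        double w_out[NHID] = {0.3, -0.2, 0.5, 0.1};   /* would come from training */
        printf("ANN output = %f\n", ann_score(x, w_hid, w_out));
        return 0;
    }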

7
Training and Testing Events
  • Both ANN and boosting algorithms use a set of
    known events to train the algorithm.
  • It would be biased to use the same set to
    estimate the accuracy of the selection, since the
    algorithm has been trained for this specific
    sample.
  • A new set, the testing set of events, is used to
    test the algorithm.
  • All results quoted here are for the testing set.

8
Boosted Decision Trees
  • What is a decision tree?
  • What is boosting the decision trees?
  • Two algorithms for boosting.

9
Decision Tree
  • Go through all PID variables and find the best
    variable and value at which to split the events.
  • For each of the two subsets, repeat the process.
  • Proceeding in this way, a tree is built.
  • Ending nodes are called leaves.

10
Select Signal and Background Leaves
  • Assume an equal weight of signal and background
    training events.
  • If more than ½ of the weight of a leaf
    corresponds to signal, it is a signal leaf;
    otherwise it is a background leaf.
  • Signal events on a background leaf or background
    events on a signal leaf are misclassified (see
    the sketch below).
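
A minimal sketch in C of the tree and leaf-labelling rule of the last
two slides: an event follows the splits down to a leaf, and a leaf
counts as signal (+1) if more than half of its training weight is
signal, background (-1) otherwise. The struct layout and the toy
two-leaf tree are illustrative, not the MiniBooNE code.

    #include <stdio.h>

    struct Node {
        int    split_var;            /* index of the PID variable used to split      */
        double split_val;            /* events with x[split_var] < split_val go left */
        struct Node *left, *right;   /* both NULL for a leaf                         */
        double w_sig, w_bkg;         /* summed signal / background training weight   */
    };

    /* +1 for a signal leaf, -1 for a background leaf */
    int leaf_label(const struct Node *n)
    {
        return (n->w_sig > 0.5 * (n->w_sig + n->w_bkg)) ? +1 : -1;
    }

    /* Follow the splits down to a leaf and return its label, i.e. T_m(x). */
    int tree_response(const struct Node *n, const double *x)
    {
        while (n->left != NULL)
            n = (x[n->split_var] < n->split_val) ? n->left : n->right;
        return leaf_label(n);
    }

    int main(void)
    {
        struct Node sig_leaf = {0, 0.0, NULL, NULL, 3.0, 1.0};  /* mostly signal     */
        struct Node bkg_leaf = {0, 0.0, NULL, NULL, 0.5, 2.5};  /* mostly background */
        struct Node root     = {2, 1.5, &sig_leaf, &bkg_leaf, 3.5, 3.5};
        double x[3] = {0.0, 0.0, 0.7};       /* x[2] < 1.5, so the event goes left */
        printf("T(x) = %+d\n", tree_response(&root, x));
        return 0;
    }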

11
Criterion for Best Split
  • Purity, P, is the fraction of the weight of a
    leaf due to signal events.
  • Gini = W P (1 - P), where W is the total weight
    of the leaf. Note that Gini is 0 for all signal
    or all background.
  • The criterion is to minimize Gini_left +
    Gini_right (see the sketch below).
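
A sketch in C of the split criterion, assuming Gini = W P (1 - P) for a
node of total weight W and signal purity P (so it vanishes for a pure
node): for one variable, scan candidate cut values and keep the one
that minimizes Gini_left + Gini_right. Placing a candidate cut at every
event value is a simple illustrative choice, not necessarily what the
MiniBooNE program does.

    #include <float.h>
    #include <stdio.h>

    static double gini(double w_sig, double w_bkg)
    {
        double w = w_sig + w_bkg;
        if (w <= 0.0) return 0.0;
        double p = w_sig / w;          /* purity */
        return w * p * (1.0 - p);      /* 0 for pure signal or pure background */
    }

    /* Best cut on one variable: x[i] is its value for event i, w[i] the
       event weight, is_sig[i] = 1 for signal training events, 0 otherwise. */
    double best_cut(int n, const double *x, const double *w,
                    const int *is_sig, double *best_gini_sum)
    {
        double cut = x[0];
        *best_gini_sum = DBL_MAX;
        for (int j = 0; j < n; ++j) {              /* candidate cut at each event */
            double sl = 0, bl = 0, sr = 0, br = 0;
            for (int i = 0; i < n; ++i) {
                if (x[i] < x[j]) { if (is_sig[i]) sl += w[i]; else bl += w[i]; }
                else             { if (is_sig[i]) sr += w[i]; else br += w[i]; }
            }
            double g = gini(sl, bl) + gini(sr, br);
            if (g < *best_gini_sum) { *best_gini_sum = g; cut = x[j]; }
        }
        return cut;
    }

    int main(void)
    {
        double x[6] = {0.1, 0.3, 0.4, 0.6, 0.8, 0.9};
        double w[6] = {1, 1, 1, 1, 1, 1};
        int  sig[6] = {1, 1, 1, 0, 0, 0};
        double gsum;
        double cut = best_cut(6, x, w, sig, &gsum);
        printf("best cut at %.2f, Gini_left + Gini_right = %.3f\n", cut, gsum);
        return 0;
    }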

12
Criterion for Next Branch to Split
  • Pick the branch to maximize the change in Gini.
  • Criterion: Gini_parent - Gini_right-child -
    Gini_left-child.

13
Decision Trees
  • This is a decision tree
  • They have been known for some time, but are
    often unstable: a small change in the training
    sample can produce a large difference.

14
Boosting the Decision Tree
  • Give the training events misclassified under
    this procedure a higher weight.
  • Continuing, build perhaps 1000 trees and average
    the results (+1 if the event lands on a signal
    leaf, -1 if on a background leaf).

15
Two Commonly used Algorithms for changing weights
  • 1. AdaBoost
  • 2. Epsilon boost (shrinkage)

16
Definitions
  • x_i = set of particle ID variables for event i
  • y_i = 1 if event i is signal, -1 if background
  • T_m(x_i) = 1 if event i lands on a signal leaf of
    tree m and -1 if the event lands on a background
    leaf.

17
AdaBoost
  • Define err_m = (weight of misclassified events) /
    (total weight).
  • Compute alpha_m = beta ln((1 - err_m)/err_m) and
    multiply the weight of each misclassified event
    by exp(alpha_m).

18
Scoring events with AdaBoost
  • Renormalize the weights.
  • Score by summing alpha_m T_m(x) over the trees
    (see the sketch below).
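
The AdaBoost formulas appeared as equations on the original slides; the
sketch below (in C) uses the standard AdaBoost update, which is
consistent with the worked example two slides on: err_m =
(misclassified weight)/(total weight), alpha_m = beta ln((1 -
err_m)/err_m) with beta = 1/2, each misclassified event weight
multiplied by exp(alpha_m), weights renormalized, and the final score
the sum over trees of alpha_m T_m(x).

    #include <math.h>
    #include <stdio.h>

    /* One AdaBoost step after tree m.  y[i] = +1/-1 is the true class,
       t[i] = +1/-1 the tree response T_m(x_i), w[i] the event weight.
       Returns alpha_m, the weight this tree gets in the final score. */
    double adaboost_step(int n, const int *y, const int *t, double *w, double beta)
    {
        double wrong = 0.0, total = 0.0;
        for (int i = 0; i < n; ++i) {
            total += w[i];
            if (y[i] != t[i]) wrong += w[i];
        }
        double err   = wrong / total;
        double alpha = beta * log((1.0 - err) / err);

        double new_total = 0.0;
        for (int i = 0; i < n; ++i) {
            if (y[i] != t[i]) w[i] *= exp(alpha);    /* boost misclassified events */
            new_total += w[i];
        }
        for (int i = 0; i < n; ++i)
            w[i] *= total / new_total;               /* renormalize                */
        return alpha;
    }

    int main(void)
    {
        /* 5 equal-weight events, 2 misclassified, so err = 0.4 */
        int    y[5] = {+1, +1, +1, -1, -1};
        int    t[5] = {+1, +1, -1, +1, -1};
        double w[5] = {1, 1, 1, 1, 1};
        double alpha = adaboost_step(5, y, t, w, 0.5);
        printf("alpha_m = %.3f, misclassified weight factor = %.3f\n",
               alpha, exp(alpha));
        return 0;
    }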

19
Epsilon Boost (shrinkage)
  • After tree m, multiply the weight of each
    misclassified (wrong) event by exp(2 epsilon),
    with epsilon typically 0.01 (0.05).
  • Renormalize the weights.
  • Score by summing over the trees (see the sketch
    below).
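
A sketch in C of one epsilon-boost reweighting step: each misclassified
event has its weight multiplied by exp(2 epsilon), with epsilon
typically 0.01 as in the example on the next slide, after which the
weights are renormalized. The toy arrays in main are illustrative only.

    #include <math.h>
    #include <stdio.h>

    /* One epsilon-boost step after tree m.  y[i] = +1/-1 is the true class,
       t[i] = +1/-1 the tree response T_m(x_i), w[i] the event weight. */
    void epsilon_boost_step(int n, const int *y, const int *t, double *w, double eps)
    {
        double total = 0.0, new_total = 0.0;
        for (int i = 0; i < n; ++i) total += w[i];
        for (int i = 0; i < n; ++i) {
            if (y[i] != t[i]) w[i] *= exp(2.0 * eps);  /* small, gradual boost */
            new_total += w[i];
        }
        for (int i = 0; i < n; ++i)
            w[i] *= total / new_total;                 /* renormalize */
    }

    int main(void)
    {
        int    y[5] = {+1, +1, +1, -1, -1};
        int    t[5] = {+1, +1, -1, +1, -1};            /* events 2 and 3 are wrong */
        double w[5] = {1, 1, 1, 1, 1};
        epsilon_boost_step(5, y, t, w, 0.01);
        printf("weight of a misclassified event after the step: %.4f\n", w[2]);
        return 0;
    }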

20
Example
  • AdaBoost: Suppose the weighted error rate is 40%,
    i.e., err = 0.4, and beta = 1/2.
  • Then alpha = (1/2) ln((1 - 0.4)/0.4) = 0.203.
  • The weight of a misclassified event is multiplied
    by exp(0.203) = 1.225.
  • Epsilon boost: The weight of wrong events is
    increased by exp(2 x 0.01) = 1.02.

21
Comparison of methods
  • Epsilon boost changes the weights a little at a
    time.
  • AdaBoost can be shown to optimize each change of
    weights. Let's look a little further at that.

22
AdaBoost Optimization
23
AdaBoost Fitting is Monotone
24
References
  • R.E. Schapire, "The strength of weak
    learnability," Machine Learning 5 (2), 197-227
    (1990). First suggested the boosting approach,
    with three trees taking a majority vote.
  • Y. Freund, "Boosting a weak learning algorithm
    by majority," Information and Computation 121
    (2), 256-285 (1995). Introduced using many trees.
  • Y. Freund and R.E. Schapire, "Experiments with
    a new boosting algorithm," Machine Learning:
    Proceedings of the Thirteenth International
    Conference, Morgan Kaufmann, San Francisco,
    pp. 148-156 (1996). Introduced AdaBoost.
  • J. Friedman, T. Hastie, and R. Tibshirani,
    "Additive logistic regression: a statistical
    view of boosting," Annals of Statistics 28 (2),
    337-407 (2000). Showed that AdaBoost can be
    viewed as successive approximations to a
    maximum-likelihood solution.
  • T. Hastie, R. Tibshirani, and J. Friedman, The
    Elements of Statistical Learning, Springer
    (2001). A good reference for decision trees and
    boosting.

25
The MiniBooNE Experiment
26
The MiniBooNE Collaboration
  • University of Alabama: Y. Liu, I. Stancu
  • Bucknell University: S. Koutsoliotas
  • University of Cincinnati: E. Hawker, R.A. Johnson, J.L. Raaf
  • University of Colorado: T. Hart, R.H. Nelson, E.D. Zimmerman
  • Columbia University: A.A. Aguilar-Arevalo, L. Bugel, J.M. Conrad,
    J. Link, J. Monroe, D. Schmitz, M.H. Shaevitz, M. Sorel, G.P. Zeller
  • Embry Riddle Aeronautical University: D. Smith
  • Fermi National Accelerator Laboratory: L. Bartoszek, C. Bhat,
    S.J. Brice, B.C. Brown, D.A. Finley, R. Ford, F.G. Garcia, P. Kasper,
    T. Kobilarcik, I. Kourbanis, A. Malensek, W. Marsh, P. Martin,
    F. Mills, C. Moore, E. Prebys, A.D. Russell, P. Spentzouris,
    R. Stefanski, T. Williams
  • Indiana University: D. Cox, A. Green, T. Katori, H. Meyer, R. Tayloe
  • Los Alamos National Laboratory: G.T. Garvey, C. Green, W.C. Louis,
    G. McGregor, S. McKenney, G.B. Mills, H. Ray, V. Sandberg, B. Sapp,
    R. Schirato, R. Van de Water, N.L. Walbridge, D.H. White
  • Louisiana State University: R. Imlay, W. Metcalf, S. Ouedraogo,
    M. Sung, M.O. Wascko
  • University of Michigan: J. Cao, Y. Liu, B.P. Roe, H.J. Yang
  • Princeton University: A.O. Bazarko, P.D. Meyers, R.B. Patterson,
    F.C. Shoemaker, H.A. Tanaka
  • St. Mary's University of Minnesota: P. Nienaber
  • Yale University: B.T. Fleming
27-31
(Image-only slides; no transcript)
32
Examples of data events
33
Numerical Results
  • There are two reconstruction / particle-ID
    packages used in MiniBooNE, rfitter and sfitter.
  • The best results for ANN and boosting used
    different numbers of variables, 21 or 22 being
    best for ANN and 50-52 for boosting.
  • Results quoted are ratios of background kept by
    ANN to background kept by boosting, for a given
    fraction of signal events kept.
  • Only relative results are shown.

34
(No Transcript)
35
Comparison of Boosting and ANN
  • A. Background is cocktail events. Red is 21 and
    black is 52 training variables.
  • B. Background is pi0 events. Red is 22 and black
    is 52 training variables.
  • Relative ratio is ANN background kept / boosting
    background kept.

(x-axis: percent of nue CCQE events kept)
36
Comparison of 21 (or 22) vs 52 variables for
Boosting
  • Vertical axis is the ratio of background kept
    for 21 (22) variables to that kept for 52
    variables, both for boosting.
  • Red is for a cocktail training sample and black
    is for a pi0 training sample.
  • Error bars are MC statistical errors only.

37
AdaBoost vs Epsilon Boost and differing tree sizes
  • A. Background for 8 leaves / background for 45
    leaves. Red is AdaBoost, black is Epsilon Boost.
  • B. Background for AdaBoost / background for
    Epsilon Boost, with N_leaves = 45.

38
Numerical Results from sfitter
  • Extensive attempt to find the best variables for
    ANN and for boosting, starting from about 3000
    candidates.
  • Trained against pi0 and related backgrounds: 22
    ANN variables and 50 boosting variables. For the
    region near 50% of signal kept, the ratio of ANN
    to boosting background was about 1.2.

39
How did the sensitivities change with a new
optical model?
  • In Nov. 2004, a new, much-changed optical model
    was introduced for making MC events.
  • Both rfitter and sfitter needed to be changed to
    optimize fits for this model.
  • Using the SAME feature variables as for the old
    model:
  • For both rfitter and sfitter, the boosting
    results were about the same.
  • For sfitter, the ANN results became about a
    factor of 2 worse.

40
Number of feature variables in boosting
  • In recent trials we have used 92 variables.
    Boosting worked well.
  • However, by looking at the frequency with which
    each variable was used as a splitting variable,
    it was possible to reduce the number to 60
    without loss of sensitivity (see the sketch
    below).
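
A sketch in C of the variable-frequency bookkeeping described above:
walk each trained tree, count how often every feature index is used as
a splitting variable, and treat variables that are never or rarely
chosen as candidates for removal. The Node layout repeats the
illustrative one from the decision-tree sketch earlier, not the
MiniBooNE data structure.

    #include <stdio.h>

    struct Node {
        int split_var;                 /* feature index used to split at this node */
        double split_val;
        struct Node *left, *right;     /* NULL for leaves                          */
    };

    /* Recursively add this tree's splitting variables to the counts. */
    void count_split_vars(const struct Node *n, int *counts)
    {
        if (n == NULL || n->left == NULL) return;   /* leaf: nothing to count */
        counts[n->split_var]++;
        count_split_vars(n->left,  counts);
        count_split_vars(n->right, counts);
    }

    int main(void)
    {
        /* a toy tree splitting on variables 2 and 0 */
        struct Node lleaf = {0}, lrleaf = {0}, rleaf = {0};
        struct Node lchild = {0, 0.3, &lleaf, &lrleaf};
        struct Node root   = {2, 1.5, &lchild, &rleaf};
        int counts[3] = {0, 0, 0};
        count_split_vars(&root, counts);    /* in practice, loop over all trees */
        for (int v = 0; v < 3; ++v)
            printf("variable %d used %d times as a splitter\n", v, counts[v]);
        return 0;
    }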

41
For ANN
  • For ANN one needs to set the temperature, the
    hidden-layer size, the learning rate, and so on;
    there are lots of parameters to tune.
  • For ANN, if one
  • a. Multiplies a variable by a constant,
    var(17) → 2 x var(17),
  • b. Switches two variables,
    var(17) ↔ var(18), or
  • c. Puts a variable in twice,
  • the result is very likely to change.

42
For boosting
  • Boosting can handle more variables than ANN; it
    will use what it needs.
  • Duplication or switching of variables will not
    affect boosting results.
  • Suppose we make a change of variables y = f(x),
    such that if x_2 > x_1, then y_2 > y_1. The
    boosting results are unchanged; they depend only
    on the ordering of the events.
  • There is considerably less tuning for boosting
    than for ANN.

43
Robustness
  • For either boosting or ANN, it is important to
    know how robust the method is, i.e., whether
    small changes in the model produce large changes
    in the output.
  • In MiniBooNE this is handled by generating many
    sets of events with parameters varied by about 1
    sigma and checking the differences. This is not
    complete, but, so far, the selections look quite
    robust.

44
Conclusions
  • For MiniBooNE, boosting is better than ANN by a
    factor of 1.2-1.8.
  • AdaBoost and Epsilon Boost give comparable
    results within the region of interest (40-60% of
    nue events kept).
  • Use of a larger number of leaves (45) gives
    10-20% better performance than use of a small
    number (8).
  • It is expected that boosting techniques will have
    wide applications in physics.
  • Preprint physics/0408124; N.I.M., in press.
  • C and FORTRAN versions of the boost program
    (including a manual) are available on my
    homepage:
  • http://www.gallatin.physics.lsa.umich.edu/roe/