Some Current Statistical Considerations in Particle Physics - PowerPoint PPT Presentation

1 / 55
About This Presentation
Title:

Some Current Statistical Considerations in Particle Physics

Description:

Try to determine a parameter l given a measurement x ... Both Cher. and scintillation light. Geometrical shape and timing distinguishes events ... – PowerPoint PPT presentation

Number of Views:116
Avg rating:3.0/5.0
Slides: 56
Provided by: byr79
Category:

less

Transcript and Presenter's Notes

Title: Some Current Statistical Considerations in Particle Physics


1
Some Current Statistical Considerations in
Particle Physics
  • Byron P. Roe
  • Department of Physics
  • University of Michigan
  • Ann Arbor, MI 48109

2
Outline
  • Preliminaries
  • Nuisance Variables
  • Modern Classification Methods (especially
    boosting and related methods).

3
Preliminaries
  • Try to determine a parameter l given a
    measurement x
  • For each l draw line so probability x is within
    limits is 90
  • The probability of a result falling in region is
    90
  • Given an x, then for 90 of experiments l is in
    that region. This is the Neyman Construction

4
Frequentist and Bayesian
  • This construction is frequentist no
    probability is assigned to l a physical quantity
  • Bayesian point of view probability refers to
    state of knowledge of parameter and l can have a
    probability
  • Formerly a war between the two views.
  • People starting to realize each side has some
    merits and uses war abating

5
Ambiguities
  • At a given l, 90 of the time x will fall in
    region, but do you want 5 on each side or 8 on
    lower and 2 on upper?
  • Useful ordering principle introduced into physics
    by Feldman and Cousins Choose the region to
    have the largest values of Rlikelihood this l
    given x / best likelihood of any physical l given
    x
  • Always gives a region goes automatically from
    limits to regions

6
But is it new?
  • Feldman and Cousins soon realized this was a
    standard statistical technique described in a
    text (Kendall and Stuart)
  • Physicists have, in the past, often ignored
    statistical literature to the detriment of both
    physicists and statisticians
  • In recent years, helped by conferences on
    statistics in physics since 2000, there has been
    more and more cooperation

7
Nuisance Parameters
  • Experiments may depend on background, efficiency
    which are not the targets of the experiment, but
    are needed to get to the physical parameter l
  • These are called nuisance parameters
  • The expectation values may be well known or have
    an appreciable uncertainty.

8
Problem with Feldman Cousins
  • Karmen experiment in 1999 reported results. They
    were checking LSND expt.
  • Background was known to be 2.83/-0.13 events.
  • They observed 0 events and set a limit on lambda
    using FC at 1.1 at 90 CL
  • Common sense if 0 signal, 2.3 is 90 CL
  • FC ordering is P given data, BUT 90 CL is
    overall P, not P given data

9
Attempts to Improve Estimate
  • With a statistician, Michael Woodroofe, I tried
    different methods
  • Suppose try Bayesian method, taking a prior
    (initial) probability for l uniform.
  • Obtain credible limit (Bayesian equivalent of CL)
  • Go back to frequentist view and look at coverage
  • Quite close to frequentist and with small
    modification, very close in almost all regions,
    but gives 2.5 for Karmen limit close to desired
    2.3

10
Nuisance Variables with Significant Uncertainty
  • Can draw a 90 CL region for joint probability of
    l and b (nuisance par.)
  • Project onto l axis and take extreme values for
    CL
  • Safe, but often grossly over-covers

11
Nuisance Parameters 2
  • 1. Integrate over nuisance parameter b using
    measured probability of b. Introduces Bayesian
    concept for b. Tends to over-cover and there are
    claims of under-coverage
  • 2. Suppose max. likelihood solution has values L,
    B. Suppose, given a l, the maximum likelihood
    solution for b is bl. Consider
  • R likelihood(xl,bl)/
    likelihood(xL,B)
  • (Essentially FC ratio)
  • Let Gl,b probl,b(RgtC) CL
  • Approximate Gl,b approx Gl,bl

12
Nuisance Parameters 3
  • Use this and make a full Neyman construction.
    Good coverage for a number of examples, OR
  • Assume -2ln R is approximately a c2 distribution.
    (It is asymptotically.) This is method of MINOS
    in MINUIT. Coverage good in some recent cases.
    Clipping required for nuisance parameters far
    from expected values.

13
Data Classification
  • Given a set of events to separate into signal and
    background and some partial knowledge in the form
    of set of particle identification (PID)
    variables.
  • Make a series of cuts on PID variablesoften
    inefficient.
  • Neural net. Invented by John Hopfield as a
    method the brain might use to learn.
  • Newer methodsboosting,

14
Neural Nets and Modern Methods
  • Use a training sample of events for which you
    know which are signal and which are background.
  • Practice an algorithm on this set, updating it
    and trying to find best discrimination.
  • Need second unbiased set to test result on, the
    test sample.
  • If the test set was used to determine parameters
    or stopping point of algorithm, need a third set,
    verification sample
  • Results here for testing samples. Verification
    samples in our tests gave essentially same
    results.

15
Neural Network Structure
  • Combine the features in a non-linear way to a
    hidden layer and then to a final layer
  • Use a training set to find the best wik to
    distinguish signal and background

16
Intuition
  • Neural nets and most modern methods use PID
    variables in complicated non-linear ways.
    Intuition is somewhat difficult
  • However, they are often much more efficient than
    cuts and are used more and more.
  • I will not discuss neural nets further, but will
    discuss modern methodsboosting,etc.

17
Boosted Decision Trees
  • What is a decision tree?
  • What is boosting the decision trees?
  • Two algorithms for boosting.

18
Decision Tree
Background/Signal
  • Go through all PID variables and find best
    variable and value to split events.
  • For each of the two subsets repeat the process
  • Proceeding in this way a tree is built.
  • Ending nodes are called leaves.

19
Select Signal and Background Leaves
  • Assume an equal weight of signal and background
    training events.
  • If more than ½ of the weight of a leaf
    corresponds to signal, it is a signal leaf
    otherwise it is a background leaf.
  • Signal events on a background leaf or background
    events on a signal leaf are misclassified.

20
Criterion for Best Split
  • Purity, P, is the fraction of the weight of a
    leaf due to signal events.
  • Gini Note that gini is 0 for all signal or all
    background.
  • The criterion is to minimize ginileft giniright
    of the two children from a parent node

21
Criterion for Next Branch to Split
  • Pick the branch to maximize the change in gini.
  • Criterion giniparent giniright-child
    ginileft-child

22
Decision Trees
  • This is a decision tree
  • They have been known for some time, but often are
    unstable a small change in the training sample
    can produce a large difference.

23
Boosting the Decision Tree
  • Give the training events misclassified under this
    procedure a higher weight.
  • Continuing build perhaps 1000 trees and do a
    weighted average of the results (1 if signal
    leaf, -1 if background leaf).

24
Two Commonly used Algorithms for changing weights
  • 1. AdaBoost
  • 2. Epsilon boost (shrinkage)

25
Definitions
  • Xi set of particle ID variables for event i
  • Yi 1 if event i is signal, -1 if background
  • Tm(xi) 1 if event i lands on a signal leaf of
    tree m and -1 if the event lands on a background
    leaf.

26
AdaBoost
  • Define err_m weight wrong/total weight

Increase weight for misidentified events
27
Scoring events with AdaBoost
  • Renormalize weights
  • Score by summing over trees

28
e-Boost (shrinkage)
  • After tree m, change weight of misclassified
    events, typical e 0.01 (0.03). For
    misclassfied events
  • Renormalize weights
  • Score by summing over trees

29
Unwgted, Wgted Misclassified Event Rate vs No.
Trees
30
Comparison of methods
  • e-boost changes weights a little at a time
  • Let y1 for signal, -1 for bkrd, Tscore summed
    over trees
  • AdaBoost can be shown to try to optimize each
    change of weights. exp(-yT) is minimized
  • The optimum value is
  • T½ log odds probability that y is 1 given x

31
Tests of Boosting Parameters
  • 45 Leaves seemed to work well for our application
  • 1000 Trees was sufficient (or over-sufficient).
  • AdaBoost with b about 0.5 and e-Boost with e
    about 0.03 worked well, although small changes
    made little difference.
  • For other applications these numbers may need
    adjustment
  • For MiniBooNE need around 50-100 variables for
    best results. Too many variables degrades
    performance.
  • Relative ratio const.(fraction bkrd kept)/
  • (fraction
    signal kept). Smaller is better!

32
Effects of Number of Leaves and Number of Trees
Smaller is better! R c X frac. sig/frac. bkrd.
33
Number of feature variables in boosting
  • In recent trials we have used 182 variables.
    Boosting worked well.
  • However, by looking at the frequency with which
    each variable was used as a splitting variable,
    it was possible to reduce the number to 86
    without loss of sensitivity. Several methods for
    choosing variables were tried, but this worked as
    well as any
  • After using the frequency of use as a splitting
    variable, some further improvement may be
    obtained by looking at the correlations between
    variables.

34
Effect of Number of PID Variables
35
Comparison of Boosting and ANN
  • Relative ratio here is ANN bkrd kept/Boosting
    bkrd kept. Greater than one implies boosting
    wins!
  • A. All types of background events. Red is 21
    and black is 52 training var.
  • B. Bkrd is p0 events. Red is 22 and black is 52
    training variables

Percent nue CCQE kept
36
Robustness
  • For either boosting or ANN, it is important to
    know how robust the method is, i.e. will small
    changes in the model produce large changes in
    output.
  • In MiniBooNE this is handled by generating many
    sets of events with parameters varied by about 1s
    and checking on the differences. This is not
    complete, but, so far, the selections look quite
    robust for boosting.

37
How did the sensitivities change with a new
optical model?
  • In Nov. 04, a new, much changed optical model of
    the detector was introduced for making MC events
  • The reconstruction tunings needed to be changed
    to optimize fits for this model
  • Using the SAME PID variables as for the old
    model
  • For a fixed background contamination of p0
    events fraction of signal kept dropped by 8.3
    for boosting and dropped by 21.4 for ANN

38
For ANN
  • For ANN one needs to set temperature, hidden
    layer size, learning rate There are lots of
    parameters to tune.
  • For ANN if one
  • a. Multiplies a variable by a
    constant,
  • var(17)? 2.var(17)
  • b. Switches two variables
  • var(17)??var(18)
  • c. Puts a variable in twice
  • The result is very likely to change.

39
For Boosting
  • Only a few parameters and once set have been
    stable for all calculations within our
    experiment.
  • Let yf(x) such that if x1gtx2 then y1gty2, then
    the results are identical as they depend only on
    the ordering of values.
  • Putting variables in twice or changing the order
    of variables has no effect.

40
Tests of Boosting Variants
  • None clearly better than AdaBoost or EpsilonBoost
  • I will not go over most, except Random Forests
  • For Random Forest, one uses only a random
    fraction of the events (WITH replacement) per
    tree and only a random fraction of the variables
    per node. NO boosting is usedjust many trees.
    Each tree should go to completionevery node very
    small or pure signal or background
  • Our UM programs werent designed well for this
    many leaves and better results (Narsky) have been
    obtainedbut not better than boosting.

41
(No Transcript)
42
Can Convergence Speed be Improved?
  • Removing correlations between variables helps.
  • Random Forest WHEN combined with boosting.
  • Softening the step function scoring
    y(2purity-1) score sign(y)sqrt(y).

43
Soft Scoring and Step Function
44
Performance of AdaBoost with Step Function and
Soft Scoring Function
45
Conclusions for Nuisance Variables
  • Likelihood ratio methods seem very useful as an
    organizing principle with or without nuisance
    variables
  • Some problems in extreme cases where data is much
    smaller than is expected
  • Several tools for handling nuisance variables
    were described. The method using approx.
    likelihood to construct Neyman region seems to
    have good performance.

46
References for Nuisance Variables 1
  • J. Neyman, Phil. Trans. Royal Soc., London A333
    (1937).
  • G.J. Feldman and R.D. Cousins, Phys. Rev.
    D57,3873 (1998).
  • A. Stuart, K. Ord, and S. Arnold, Kendalls
    Advanced Theory of Statistics, Vol 2A, 6th ed.,
    (London 1999).
  • R. Eitel and B. Zeitnitz, hep-ex/9809007.
  • The LSND collaboration, C. Athanassopoulos et.
    al. Phys. Rev. Lett. 75, 2650(1995) Phys. Rev.
    Lett. 77, 3082(1996) Phys. Rev. C54, 2685
    (1996) Phys. Rev. D64, 112008 (2001).
  • B. P. Roe and M.B. Woodroofe, Phys. Rev. D60,
    053009 (1999).
  • B.P. Roe and M.B. Woodroofe, Phys. Rev. D63,
    013009 (2001).

47
References for Nuisance Variables 2
  • R. Cousins and V.L. Highland, Nucl. Instrum.
    Meth. A 320, 331 (1992).
  • J. Conrad, O. Botner, A. Hallgren and C. Perez de
    los Heros, Phys. Rev. D67, 118101 (2003).
  • R.D. Cousins, to appear in Proceedings of
    PHYSTAT2005 Statistical Problems in Particle
    Physics, Astrophysics, and Cosmology (2005).
  • K.S. Cranmer, Proceedings of PHYSTAT2003
    Statistical Problems in Particle Physics,
    Astrophysics and Cosmology, 261 (2003).
  • G, Punzi, to appear in Proceedings of
    PHYSTAT2005.
  • F. James and M. Roos, Nucl. Phys. B172, 475
    (1980).
  • W.A. Rolke and A.M. Lopez, Nucl. Intrum. Meth.
    A458, 745 (2001).
  • W.A. Rolke, A.M. Lopez and J. Conrad, Nucl.
    Instrum. Meth. A551, 493 (2005).

48
Conclusions For Classification
  • Boosting is very robust. Given a sufficient
    number of leaves and trees AdaBoost or
    EpsilonBoost reaches an optimum level, which is
    not bettered by any variant tried.
  • Boosting was better than ANN in our tests by
    1.2-1.8.
  • There are ways (such as the smooth scoring
    function) to increase convergence speed in some
    cases.
  • Several techniques can be used for weeding
    variables. Examining the frequency with which a
    given variable is used works reasonably well.
  • Downloads in FORTRAN or C available at
  • http//www.gallatin.physics.lsa.umich.edu/ro
    e/

49
References for Boosting
  • R.E. Schapire The strength of weak
    learnability. Machine Learning 5 (2), 197-227
    (1990). First suggested the boosting approach
    for 3 trees taking a majority vote
  • Y. Freund, Boosting a weak learning algorithm
    by majority, Information and Computation 121
    (2), 256-285 (1995) Introduced using many trees
  • Y. Freund and R.E. Schapire, Experiments with
    an new boosting algorithm, Machine Learning
    Proceedings of the Thirteenth International
    Conference, Morgan Kauffman, SanFrancisco,
    pp.148-156 (1996). Introduced AdaBoost
  • J. Friedman, Recent Advances in Predictive
    (Machine) Learning, Proceedings of PHYSTAT2003
    Statistical Problems in Particle Physics,
    Astrophyswics and Cosmology, 196 (2003).
  • J. Friedman, T. Hastie, and R. Tibshirani,
    Additive logistic regression a statistical
    view of boosting, Annals of Statistics 28 (2),
    337-407 (2000). Showed that AdaBoost could be
    looked at as successive approximations to a
    maximum likelihood solution.
  • T. Hastie, R. Tibshirani, and J. Friedman, The
    Elements of Statistical Learning Springer
    (2001). Good reference for decision trees and
    boosting.
  • B.P. Roe et. al., Boosted decision trees as an
    alternative to artificial neural networks for
    particle identification, NIM A543, pp. 577-584
    (2005).
  • Hai-Jun Yang, Byron P. Roe, and Ji Zhu, Studies
    of Boosted Decision Trees for MiniBooNE Particle
    Identification, Physics/0508045, NIM A555,
    370-385 (2005).

50
Adaboost Output for Training and Test Samples
51
The MiniBooNE Collaboration
52
40 D tank, mineral oil, surrounded by about 1280
photomultipliers. Both Cher. and scintillation
light. Geometrical shape and timing
distinguishes events
53
Numerical Results from sfitter (a second
reconstruction program)
  • Extensive attempt to find best variables for ANN
    and for boosting starting from about 3000
    candidates
  • Train against pi0 and related backgrounds22 ANN
    variables and 50 boosting variables
  • For the region near 50 of signal kept, the
    ratio of ANN to boosting background was about 1.2

54
Post-Fitting
  • Post-Fitting is an attempt to reweight the trees
    when summing tree scores after all the trees are
    made
  • Two attempts produced only a very modest (few ),
    if any, gain.

55
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com