An update on the Goodness of Fit Statistical Toolkit - PowerPoint PPT Presentation

About This Presentation
Title:

An update on the Goodness of Fit Statistical Toolkit

Description:

Anderson-Darling test. Anderson-Darling approximated test. Cramer-von Mises test ... Anderson-Darling. Unbinned Distributions. Binned Distributions. AVERAGE CPU TIME ...

Slides: 21
Provided by: mariagr

Transcript and Presenter's Notes



1
An update on the Goodness of Fit Statistical Toolkit
  • B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo

4th Geant4 Space Users Workshop
http://www.ge.infn.it/geant4/analysis/HEPstatistics
http://www.ge.infn.it/statisticaltoolkit
2
Goodness of Fit testing
Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions.

Use cases in experimental physics:
  • Regression testing
      • Throughout the software life-cycle
  • Online DAQ
      • Monitoring detector behaviour w.r.t. a reference
  • Simulation validation
      • Comparison with experimental data
  • Reconstruction
      • Comparison of reconstructed vs. expected distributions
  • Physics analysis
      • Comparison with theoretical distributions
      • Comparisons of experimental distributions
3
(No Transcript)
4
Software process guidelines
  • Adopt a process
      • to ensure software quality
  • Unified Process, specifically tailored to the project
      • practical guidance and tools from the Rational Unified Process (RUP)
      • both rigorous and lightweight
      • mapping onto ISO 15504 (and CMM)
  • Incremental and iterative life-cycle
      • 1st cycle: 2-sample GoF tests
      • 1-sample GoF tests in preparation

5
Architectural guidelines
  • The project adopts a solid architectural approach
      • to offer the functionality and the quality needed by the users
      • to be maintainable over a long time scale
      • to be extensible, to accommodate future evolutions of the requirements
  • Component-based architecture
      • to facilitate re-use and integration in diverse frameworks
  • Layered architecture pattern
      • core component for statistical computation
      • independent components for interfacing to user analysis environments
  • Dependencies
      • no dependence on any specific analysis tool
      • can be used by, or together with, any analysis tool
      • offer a (HEP) standard (AIDA) for the user layer
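The layered, component-based design can be illustrated with a minimal Python sketch. It is not the Toolkit's actual API: the classes `TwoSampleTest`, `ChiSquaredBinned` and `KolmogorovSmirnovUnbinned` are hypothetical. The core layer sees only plain numeric sequences, so it depends on no analysis tool; each algorithm declares whether it applies to binned or unbinned data, and adding a test means adding a class without touching the existing ones.

```python
from abc import ABC, abstractmethod
from bisect import bisect_right
from typing import Sequence


class TwoSampleTest(ABC):
    """Hypothetical core-layer interface: plain sequences in, statistic out."""

    kind = "unspecified"  # concrete tests set "binned" or "unbinned"

    @abstractmethod
    def statistic(self, a: Sequence[float], b: Sequence[float]) -> float:
        """Return the test statistic for the two data sets."""


class ChiSquaredBinned(TwoSampleTest):
    """Two-sample chi-squared for two histograms with identical binning."""

    kind = "binned"

    def statistic(self, a, b):
        if len(a) != len(b):
            raise ValueError("histograms must have the same binning")
        total_a, total_b = sum(a), sum(b)
        # Scale factors allow histograms with different numbers of entries.
        ka, kb = (total_b / total_a) ** 0.5, (total_a / total_b) ** 0.5
        # Bins that are empty in both histograms carry no information.
        return sum(
            (ka * x - kb * y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0
        )


class KolmogorovSmirnovUnbinned(TwoSampleTest):
    """Two-sample Kolmogorov-Smirnov distance between unbinned samples."""

    kind = "unbinned"

    def statistic(self, a, b):
        xs, ys = sorted(a), sorted(b)
        # Both EDFs only change at observed points: check the pooled sample.
        return max(
            abs(bisect_right(xs, z) / len(xs) - bisect_right(ys, z) / len(ys))
            for z in xs + ys
        )
```

In this sketch a user-layer adapter for AIDA or ROOT objects would only extract bin contents or sample values and pass them to `statistic`; the core never imports the analysis tool.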

6
(No Transcript)
7
The algorithms are specialised according to the kind of distribution (binned or unbinned).
8
GoF algorithms in the Statistical Toolkit
TWO-SAMPLE PROBLEM
  • Binned distributions
      • Anderson-Darling test
      • Anderson-Darling approximated test
      • Chi-squared test
      • Fisz-Cramer-von Mises test
      • Tiku test (Cramer-von Mises test in chi-squared approximation)
  • Unbinned distributions
      • Anderson-Darling test
      • Anderson-Darling approximated test
      • Cramer-von Mises test
      • Generalised Girone test
      • Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
      • Kolmogorov-Smirnov test
      • Kuiper test
      • Tiku test (Cramer-von Mises test in chi-squared approximation)
      • Weighted Kolmogorov-Smirnov test (2 flavours)
      • Weighted Cramer-von Mises test

It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools: it provides all the two-sample GoF algorithms based on the empirical distribution function (EDF) that exist in the statistics literature.
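The unbinned tests in this list are built on the empirical distribution function (EDF). As an illustration, a minimal Python sketch (not the Toolkit's implementation; `edf` and `ks_and_kuiper` are hypothetical names) computes the two-sample Kolmogorov-Smirnov statistic D = sup|F_n - G_m| and the Kuiper statistic V = sup(F_n - G_m) + sup(G_m - F_n) in one pass over the pooled sample.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def edf(sorted_sample: Sequence[float], z: float) -> float:
    """Empirical distribution function of a sorted sample, evaluated at z."""
    return bisect_right(sorted_sample, z) / len(sorted_sample)


def ks_and_kuiper(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (D, V): the two-sample Kolmogorov-Smirnov and Kuiper statistics."""
    xs, ys = sorted(x), sorted(y)
    d_plus = d_minus = 0.0
    # Both EDFs are right-continuous step functions that only jump at
    # observed points, so evaluating them on the pooled sample is enough.
    for z in xs + ys:
        diff = edf(xs, z) - edf(ys, z)
        d_plus = max(d_plus, diff)     # largest excess of F_n over G_m
        d_minus = max(d_minus, -diff)  # largest excess of G_m over F_n
    return max(d_plus, d_minus), d_plus + d_minus
```

For example, `ks_and_kuiper([1, 4], [2, 3])` gives D = 0.5 but V = 1.0: the Kuiper statistic adds the deviations in both directions, which makes it invariant under cyclic shifts of the variable.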

9
User Layer
  • Simple user layer
      • shields the user from the complexity of the underlying algorithms and design
      • the user only deals with their own analysis objects and the choice of comparison algorithm
  • First release: user layer for AIDA analysis objects
      • LCG Architecture Blueprint, Geant4 requirement
  • Second release: added user layer for ROOT analysis objects
      • in response to user requirements

10
Which test to use?
  • Do we really need such a wide collection of GoF tests? Why?
  • Which is the most appropriate test to compare two distributions?
  • How good is a test at recognizing truly equivalent distributions and rejecting non-equivalent ones?
  • The choice of the most suitable GoF test can be based on two different criteria:
      • computational performance
      • statistical performance (power)

11
A) Performance of the GoF tests
Average CPU time
Test                                    Binned distributions   Unbinned distributions
Anderson-Darling                        (0.69 ± 0.01) ms       (16.9 ± 0.2) ms
Anderson-Darling (approximated)         (0.60 ± 0.01) ms       (16.1 ± 0.2) ms
Chi-squared                             (0.55 ± 0.01) ms       -
Cramer-von Mises                        (0.44 ± 0.01) ms       (16.3 ± 0.2) ms
Generalised Girone                      -                      (15.9 ± 0.2) ms
Goodman                                 -                      (11.9 ± 0.1) ms
Kolmogorov-Smirnov                      -                      (8.9 ± 0.1) ms
Kuiper                                  -                      (12.1 ± 0.1) ms
Tiku                                    (0.69 ± 0.01) ms       (16.7 ± 0.2) ms
Watson                                  -                      (14.2 ± 0.1) ms
Weighted Kolmogorov-Smirnov (AD)        -                      (14.0 ± 0.1) ms
Weighted Kolmogorov-Smirnov (Buning)    -                      (14.0 ± 0.1) ms
Weighted Cramer-von Mises               -                      (14.0 ± 0.1) ms
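The slide does not say how these averages were measured. A minimal Python sketch of one way to obtain a mean CPU time per call with its statistical uncertainty, using the standard-library `time.process_time`; the function name `average_cpu_time_ms` and the batching parameters are assumptions.

```python
import math
import statistics
import time
from typing import Callable, Tuple


def average_cpu_time_ms(func: Callable, *args, repeats: int = 20,
                        calls: int = 50) -> Tuple[float, float]:
    """Hypothetical helper: (mean, standard error) of CPU time per call, in ms."""
    if repeats < 2:
        raise ValueError("need at least two repeats to estimate an uncertainty")
    per_call = []
    for _ in range(repeats):
        start = time.process_time()
        # Batch several calls so that coarse timer resolution does not dominate.
        for _ in range(calls):
            func(*args)
        per_call.append((time.process_time() - start) * 1e3 / calls)
    mean = statistics.fmean(per_call)
    sem = statistics.stdev(per_call) / math.sqrt(repeats)
    return mean, sem
```

For instance, `average_cpu_time_ms(some_test, sample_a, sample_b)` would report a figure in the same "(mean ± error) ms" form as the table, where `some_test` is any callable comparison function.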
12
B) Power of GoF tests
The power of a test is the probability of correctly rejecting the null hypothesis when it is false.
  • Systematic study of all existing GoF tests in progress
      • made possible by the extensive collection of tests in the Statistical Toolkit
  • The power of the GoF tests is evaluated under a variety of alternative hypotheses
  • No clear winner: the statistical performance of a test depends on the features of the distributions to be compared (skewness and tailweight) and on the sample size
  • Practical recommendations
      • first classify the type of the distributions in terms of skewness and tailweight
      • then choose the most appropriate test for that type of distribution, identifying the best test by means of the quantitative model proposed
  • Topic still subject to research activity in the domain of statistics

General recipe
p < 0.0001
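Power can be estimated by Monte Carlo: draw many pairs of samples from two different parent distributions and count how often the test rejects H0 at significance level α. A minimal Python sketch for the two-sample Kolmogorov-Smirnov test, assuming its large-sample critical value D_crit = c(α)·sqrt((n+m)/(nm)) with c(α) = sqrt(-ln(α/2)/2); the function names and the Gaussian alternatives in the example are illustrative, not the configuration of the study described above.

```python
import math
import random
from typing import Callable, Sequence

Sampler = Callable[[random.Random], float]


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov distance via a merge of sorted samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        z = min(xs[i], ys[j])
        # Advance past every observation equal to z so that ties are handled.
        while i < n and xs[i] <= z:
            i += 1
        while j < m and ys[j] <= z:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_power(sample_a: Sampler, sample_b: Sampler, n: int, m: int,
             alpha: float = 0.05, trials: int = 500, seed: int = 1) -> float:
    """Hypothetical helper: fraction of simulated experiments rejecting H0."""
    rng = random.Random(seed)
    # Asymptotic critical value of the two-sample KS test at level alpha.
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    d_crit = c_alpha * math.sqrt((n + m) / (n * m))
    rejections = 0
    for _ in range(trials):
        x = [sample_a(rng) for _ in range(n)]
        y = [sample_b(rng) for _ in range(m)]
        if ks_statistic(x, y) > d_crit:
            rejections += 1
    return rejections / trials
```

With `ks_power(lambda r: r.gauss(0, 1), lambda r: r.gauss(1, 1), 100, 100)` a one-sigma location shift is detected almost always, while identical samplers give a rejection rate close to α. Skewed or heavy-tailed alternatives can be explored by passing other samplers, which is where the ranking of tests starts to change.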
13
Examples of practical applications
14
Statistical Toolkit Usage
  • Geant4 physics validation
      • rigorous approach: quantitative evaluation of Geant4 physics models with respect to established reference data
      • see for instance K. Amako et al., "Comparison of Geant4 electromagnetic physics models against the NIST reference data", IEEE Trans. Nucl. Sci. 52(4) (2005) 910-918
  • LCG Simulation Validation project
      • see for instance A. Ribon, "Testing Geant4 with a simplified calorimeter setup", http://www.ge.infn.it/geant4/events/july2005
  • CMS
      • validation of new histograms w.r.t. reference ones in the OSCAR Validation Suite
  • Usage also in space science, medicine, statistics, etc.

15
Validation of Geant4 e.m. physics models vs. NIST reference data
[Figures: experimental set-up; electron stopping power; p-value stability study as a function of Z, with the H0 rejection area marked. Legend: Geant4 LowE Penelope, Geant4 Standard, Geant4 LowE EEDL, NIST-XCOM]
χ² test (data uncertainties are included in the computation of the test statistic)
The three Geant4 models are equivalent.
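The χ² test above folds the uncertainties of both data sets into the test statistic. A common form, assumed here because the slide does not spell it out, is χ² = Σ_i (a_i - b_i)² / (σ_a,i² + σ_b,i²), compared with a χ² distribution with N degrees of freedom for N points when no parameters are fitted. A minimal standard-library Python sketch; `chi2_with_errors` and `chi2_sf` are hypothetical names, and `chi2_sf` uses the closed forms of the χ² survival function for integer degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi2_with_errors(a: Sequence[float], sigma_a: Sequence[float],
                     b: Sequence[float], sigma_b: Sequence[float]) -> Tuple[float, int]:
    """Hypothetical helper: (chi2, degrees of freedom) for two data sets."""
    if not len(a) == len(sigma_a) == len(b) == len(sigma_b):
        raise ValueError("all inputs must have the same length")
    chi2 = 0.0
    for ai, sa, bi, sb in zip(a, sigma_a, b, sigma_b):
        variance = sa * sa + sb * sb
        if variance <= 0:
            raise ValueError("each point needs a non-zero combined uncertainty")
        chi2 += (ai - bi) ** 2 / variance
    # Assumption: no fitted parameters, so one degree of freedom per point.
    return chi2, len(a)


def chi2_sf(x: float, k: int) -> float:
    """P(X >= x) for X ~ chi-squared with k (integer) degrees of freedom."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    y = x / 2.0
    if k % 2 == 0:
        # Even k: Q(k/2, y) = exp(-y) * sum_{j < k/2} y^j / j!
        term = total = 1.0
        for j in range(1, k // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd k: Q(k/2, y) = erfc(sqrt(y)) + exp(-y) * sum_j y^(j+1/2) / Gamma(j+3/2)
    total = math.erfc(math.sqrt(y))
    for j in range((k - 1) // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total
```

A p-value is then `chi2_sf(*chi2_with_errors(a, sa, b, sb))`; a small p-value (inside the H0 rejection area) would indicate that a model is incompatible with the reference data.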
16
Validation of Geant4 Atomic Relaxation vs NIST reference data
Shell-end   Kolmogorov-Smirnov D   p-value
10          0.0192                 1
11          0.0175                 1
13          0.0250                 1
14          0.0256                 1
18          0.0294                 1
19          0.0312                 1
21          0.1429                 0.997085
22          0.0588                 1
[Figure: fluorescence, shell-start 3; Geant4 vs. NIST]
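The table pairs each Kolmogorov-Smirnov distance D with its p-value; the sample sizes are not given, so the values cannot be recomputed here. For illustration, a minimal Python sketch of a widely used large-sample approximation: with effective size N_e = nm/(n+m) and λ = (√N_e + 0.12 + 0.11/√N_e)·D, the p-value is Q_KS(λ) = 2 Σ_{k≥1} (-1)^(k-1) exp(-2k²λ²). Whether the Toolkit uses exactly this approximation is an assumption, and the function names are hypothetical.

```python
import math


def q_ks(lam: float) -> float:
    """Asymptotic Kolmogorov-Smirnov survival function Q_KS(lambda)."""
    if lam < 0.2:
        # Here Q_KS differs from 1 by less than 1e-12 while the series
        # converges slowly, so return 1 directly.
        return 1.0
    total = 0.0
    for k in range(1, 101):
        total += 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    # Guard against rounding pushing the alternating sum outside [0, 1].
    return min(max(total, 0.0), 1.0)


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Hypothetical helper: approximate p-value of a two-sample KS distance."""
    sqrt_ne = math.sqrt(n * m / (n + m))
    return q_ks((sqrt_ne + 0.12 + 0.11 / sqrt_ne) * d)
```

Small distances such as those in the table give λ well below 1 unless the samples are very large, which is consistent with p-values at or near 1.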
17
Validation of Geant4 electromagnetic and hadronic models against proton data
  • Low Energy EM ICRU49: p, ions
  • Low Energy EM Livermore: γ, e-
  • Standard EM: e+
  • HadronElastic with BertiniElastic
  • Bertini Inelastic

[Figure: Geant4 (LowE EM ICRU49, BertiniElastic, Bertini Inelastic; 0.5 M events) vs. experimental data; horizontal axis in mm]
p-values (CvM = Cramer-von Mises test, KS = Kolmogorov-Smirnov test, AD = Anderson-Darling test):
  Left branch     0.977
  Right branch    0.985
  Whole curve     0.994
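As an illustration of one of these tests, a minimal Python sketch of the two-sample Cramer-von Mises statistic in Anderson's form, T = (nm/(n+m)²) Σ_z [F_n(z) - G_m(z)]², summed over the pooled sample, with a p-value obtained by permutation. The permutation approach is a generic technique swapped in for illustration, not necessarily how the Toolkit computes its p-values; `cvm_two_sample` and `permutation_pvalue` are hypothetical names.

```python
import random
from bisect import bisect_right
from typing import Callable, Sequence


def cvm_two_sample(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Cramer-von Mises statistic T (Anderson's form)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # Squared EDF differences evaluated at every point of the pooled sample.
    total = sum(
        (bisect_right(xs, z) / n - bisect_right(ys, z) / m) ** 2 for z in xs + ys
    )
    return n * m / (n + m) ** 2 * total


def permutation_pvalue(x: Sequence[float], y: Sequence[float],
                       stat: Callable[[Sequence[float], Sequence[float]], float],
                       n_perm: int = 999, seed: int = 0) -> float:
    """Fraction of random relabellings whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[: len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # Count the observed labelling itself, so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)
```

Because the same `permutation_pvalue` accepts any statistic, it could equally be applied to a KS or Anderson-Darling statistic, which mirrors the slide's practice of reporting several tests side by side.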
18
Test beam at BESSY, BepiColombo mission
χ² test not appropriate (< 5 entries in some bins; physical information would be lost if rebinned)
Experimental measurements are comparable with Geant4 simulations
Anderson-Darling test: critical value A_c (95%) = 0.752
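The reason given for preferring an unbinned test can be checked mechanically. A minimal Python sketch (the helper name `sparse_bins` is hypothetical; the threshold of 5 entries is the rule of thumb quoted on the slide) that lists the bins where either histogram has fewer than 5 entries, i.e. where the χ² approximation becomes unreliable:

```python
from typing import List, Sequence


def sparse_bins(counts_a: Sequence[int], counts_b: Sequence[int],
                min_entries: int = 5) -> List[int]:
    """Hypothetical helper: indices of bins with fewer than min_entries entries."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same binning")
    return [
        i
        for i, (a, b) in enumerate(zip(counts_a, counts_b))
        if min(a, b) < min_entries
    ]
```

If the list is not empty, the options are to rebin, at the cost of the physical information mentioned above, or to apply an EDF test such as Anderson-Darling to the unbinned data.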
19
Comparison of alternative vehicle concepts in human missions to Mars
Reference rigid structures as in the ISS (2-4 cm Al)
  • Kolmogorov-Smirnov test
      • Multi-layer (ML) 10 cm water equivalent to 4 cm Al
      • Multi-layer (ML) 5 cm water equivalent to 2.15 cm Al

Energy deposited in phantom (MeV)
Shielding material   EM            Bertini        Binary
ML 5 cm water        73.5 ± 0.3    130.2 ± 0.5    119.3 ± 0.4
ML 10 cm water       71.9 ± 0.3    128.0 ± 0.5    117.3 ± 0.5
4 cm Al              72.9 ± 0.3    127.5 ± 0.5    117.0 ± 0.4
2.15 cm Al           73.9 ± 0.3    130.5 ± 0.5    119.3 ± 0.5

Inflatable habitat vs. a conventional rigid habitat: an inflatable habitat exhibits a shielding capability equivalent to a conventional rigid one.
20
Conclusions
  • A novel, complete software toolkit for statistical analysis is being developed
      • all the two-sample GoF tests available in the statistics literature, plus the chi-squared test
      • rigorous architectural design
      • rigorous software process
  • It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools
  • A systematic study of the power of GoF tests is in progress
      • an unexplored area of research
  • Application in various domains
      • Geant4, HEP, space science, medicine
  • Feedback and suggestions are very much appreciated