Title: An update on the Goodness of Fit Statistical Toolkit
1. An update on the Goodness of Fit Statistical Toolkit
- B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo
4th Geant4 Space Users Workshop
http://www.ge.infn.it/geant4/analysis/HEPstatistics
http://www.ge.infn.it/statisticaltoolkit
2. Goodness of Fit testing
Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions
Use cases in experimental physics
- Regression testing
  - Throughout the software life-cycle
- Online DAQ
  - Monitoring detector behaviour w.r.t. a reference
- Simulation validation
  - Comparison with experimental data
- Reconstruction
  - Comparison of reconstructed vs. expected distributions
- Physics analysis
  - Comparison with theoretical distributions
  - Comparisons of experimental distributions
4. Software process guidelines
- Adopt a process
  - software quality
- Unified Process, specifically tailored to the project
  - practical guidance and tools from the RUP
  - both rigorous and lightweight
  - mapping onto ISO 15504 (and CMM)
- Incremental and iterative life-cycle
  - 1st cycle: 2-sample GoF tests
  - 1-sample GoF in preparation
5. Architectural guidelines
- The project adopts a solid architectural approach
  - to offer the functionality and the quality needed by the users
  - to be maintainable over a long time scale
  - to be extensible, to accommodate future evolutions of the requirements
- Component-based architecture
  - to facilitate re-use and integration in diverse frameworks
  - layer architecture pattern
  - core component for statistical computation
  - independent components for interface to user analysis environments
- Dependencies
  - no dependence on any specific analysis tool
  - can be used by any analysis tool, or together with any analysis tool
  - offers a (HEP) standard (AIDA) for the user layer
7. The algorithms are specialised according to the kind of distribution (binned/unbinned)
8. GoF algorithms in the Statistical Toolkit
TWO-SAMPLE PROBLEM
- Binned distributions
  - Anderson-Darling test
  - Anderson-Darling approximated test
  - Chi-squared test
  - Fisz-Cramer-von Mises test
  - Tiku test (Cramer-von Mises test in chi-squared approximation)
- Unbinned distributions
  - Anderson-Darling test
  - Anderson-Darling approximated test
  - Cramer-von Mises test
  - Generalised Girone test
  - Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
  - Kolmogorov-Smirnov test
  - Kuiper test
  - Tiku test (Cramer-von Mises test in chi-squared approximation)
  - Weighted Kolmogorov-Smirnov test (2 flavours)
  - Weighted Cramer-von Mises test
It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools. It provides all 2-sample (EDF) GoF algorithms existing in statistics literature
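To illustrate the kind of computation behind the unbinned tests listed above, the two-sample Kolmogorov-Smirnov distance D can be sketched in a few lines of Python. This is not the toolkit's implementation, which these slides do not show: the function name `ks_two_sample` is hypothetical, and the p-value uses the standard asymptotic Kolmogorov series with a commonly used small-sample correction, so it is only approximate.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for two unbinned samples.

    Hypothetical helper for illustration; not the Statistical Toolkit API.
    """
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    # D is the largest vertical distance between the two empirical CDFs.
    # Both are step functions, so it is enough to look at the data points.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
            for v in xs + ys)
    # Asymptotic p-value from the Kolmogorov series (an approximation).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.1:
        # The truncated series is unreliable here; the p-value is 1 to many digits.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Made-up samples, for illustration only.
d, p = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
print(d, p)
```

For the two made-up samples the distance is D = 0.5. The other EDF tests in the list differ mainly in how the difference between the two empirical CDFs is summarised (maximum, integrated, or weighted).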
9. User Layer
- Simple user layer
  - Shields the user from the complexity of the underlying algorithms and design
  - Only deals with the user's analysis objects and choice of comparison algorithm
- First release: user layer for AIDA analysis objects
  - LCG Architecture Blueprint, Geant4 requirement
- Second release: added user layer for ROOT analysis objects
  - in response to user requirements
10. Which test to use?
- Do we really need such a wide collection of GoF tests? Why?
- Which is the most appropriate test to compare two distributions?
- How good is a test at recognizing genuinely equivalent distributions and rejecting non-equivalent ones?
- The choice of the most suitable GoF test can be made on the basis of two different criteria
  - Computational performance
  - Statistical performance (power)
11. A) Performance of the GoF tests
AVERAGE CPU TIME                       Binned distributions   Unbinned distributions
Anderson-Darling                       (0.69 ± 0.01) ms       (16.9 ± 0.2) ms
Anderson-Darling (approximated)        (0.60 ± 0.01) ms       (16.1 ± 0.2) ms
Chi-squared                            (0.55 ± 0.01) ms       -
Cramer-von Mises                       (0.44 ± 0.01) ms       (16.3 ± 0.2) ms
Generalised Girone                     -                      (15.9 ± 0.2) ms
Goodman                                -                      (11.9 ± 0.1) ms
Kolmogorov-Smirnov                     -                      (8.9 ± 0.1) ms
Kuiper                                 -                      (12.1 ± 0.1) ms
Tiku                                   (0.69 ± 0.01) ms       (16.7 ± 0.2) ms
Watson                                 -                      (14.2 ± 0.1) ms
Weighted Kolmogorov-Smirnov (AD)       -                      (14.0 ± 0.1) ms
Weighted Kolmogorov-Smirnov (Buning)   -                      (14.0 ± 0.1) ms
Weighted Cramer-von Mises              -                      (14.0 ± 0.1) ms
12. B) Power of GoF tests
The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when it is false
- Systematic study of all existing GoF tests in progress
  - made possible by the extensive collection of tests in the Statistical Toolkit
- GoF test power evaluated in a variety of alternative situations
- No clear winner: the statistical performance of a test depends on the features of the distributions to be compared (skewness and tailweight) and on the sample size
- Practical recommendations
  - first classify the type of the distributions in terms of skewness and tailweight
  - choose the most appropriate test given the type of distributions, evaluating the best test by means of the quantitative model proposed
- Topic still subject to research activity in the domain of statistics
General recipe
p < 0.0001
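The definition of power given on this slide can be made concrete with a small Monte Carlo sketch: generate many pairs of samples from two known, different distributions and count how often the test rejects the null hypothesis. The code below is only an illustration under stated assumptions, not the procedure used in the study: it uses the Kolmogorov-Smirnov distance, the asymptotic 5% critical value 1.358 * sqrt((n + m) / (n * m)), equal sample sizes, and hypothetical helper names (`ks_distance`, `estimate_power`).

```python
import math
import random
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between unbinned samples."""
    xs, ys = sorted(a), sorted(b)
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)


def estimate_power(draw_a, draw_b, n=50, trials=400, seed=12345):
    """Fraction of pseudo-experiments in which H0 is rejected at the 5% level."""
    rng = random.Random(seed)
    # Asymptotic 5% critical value of D for two samples of size n each.
    critical = 1.358 * math.sqrt(2.0 / n)
    rejected = 0
    for _ in range(trials):
        a = [draw_a(rng) for _ in range(n)]
        b = [draw_b(rng) for _ in range(n)]
        if ks_distance(a, b) > critical:
            rejected += 1
    return rejected / trials


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


def shifted_normal(rng):
    return rng.gauss(1.0, 1.0)


# Power against a unit shift, and the rejection rate when H0 is actually true.
power = estimate_power(standard_normal, shifted_normal)
false_alarm = estimate_power(standard_normal, standard_normal)
print(f"power against a unit shift: {power:.2f}")
print(f"rejection rate under H0:    {false_alarm:.2f}")
```

Replacing the two generator functions with distributions of different skewness or tailweight, and `ks_distance` with another test statistic and its critical value, gives the kind of comparison the slide describes.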
13. Examples of practical applications
14. Statistical Toolkit Usage
- Geant4 physics validation
  - rigorous approach: quantitative evaluation of Geant4 physics models with respect to established reference data
  - see for instance K. Amako et al., Comparison of Geant4 electromagnetic physics models against the NIST reference data, IEEE Trans. Nucl. Sci. 52-4 (2005) 910-918
- LCG Simulation Validation project
  - see for instance A. Ribon, Testing Geant4 with a simplified calorimeter setup, http://www.ge.infn.it/geant4/events/july2005
- CMS
  - validation of new histograms w.r.t. reference ones in OSCAR Validation Suite
- Usage also in space science, medicine, statistics, etc.
15. Validation of Geant4 e.m. physics models vs. NIST reference data
Experimental set-up
Electron Stopping Power
(Figure: Geant4 LowE Penelope, Geant4 Standard and Geant4 LowE EEDL compared with NIST - XCOM; axis label: Z)
χ² test (to include data uncertainties in the computation of the test statistic value)
p-value stability study
(Figure: p-value for Geant4 LowE Penelope, Geant4 Standard and Geant4 LowE EEDL, with the H0 rejection area marked)
The three Geant4 models are equivalent
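The remark on this slide about including data uncertainties in the chi-squared statistic can be illustrated as follows. A common form, assumed here because the slides do not spell out the formula, divides each squared difference by the sum of the squared uncertainties of the two values being compared. The function name and the numbers are hypothetical.

```python
def chi2_with_uncertainties(sim, sim_err, ref, ref_err):
    """Chi-squared statistic with both sets of uncertainties in the denominator.

    Hypothetical helper for illustration; returns (chi2, number of points).
    """
    if not (len(sim) == len(sim_err) == len(ref) == len(ref_err)):
        raise ValueError("all four sequences must have the same length")
    chi2 = 0.0
    for s, es, r, er in zip(sim, sim_err, ref, ref_err):
        variance = es ** 2 + er ** 2
        if variance <= 0.0:
            raise ValueError("each point needs a non-zero uncertainty")
        chi2 += (s - r) ** 2 / variance
    return chi2, len(sim)


# Made-up numbers, for illustration only (not taken from the slides).
chi2, n_points = chi2_with_uncertainties(
    sim=[1.0, 2.0, 3.0], sim_err=[0.1, 0.1, 0.1],
    ref=[1.1, 2.0, 2.9], ref_err=[0.1, 0.1, 0.1])
print(chi2, n_points)
```

The p-value then follows from the chi-squared distribution with the appropriate number of degrees of freedom, which this sketch does not compute.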
16. Validation of Geant4 Atomic Relaxation vs NIST reference data
Fluorescence - Shell-start 3
Shell-end   Kolmogorov-Smirnov D   p-value
10          0.0192                 1
11          0.0175                 1
13          0.0250                 1
14          0.0256                 1
18          0.0294                 1
19          0.0312                 1
21          0.1429                 0.997085
22          0.0588                 1
(Figure legend: Geant4, NIST)
17. Validation of Geant4 electromagnetic and hadronic models against proton data
- Low Energy EM, ICRU49: p, ions
- Low Energy EM, Livermore: γ, e-
- Standard EM: e
- HadronElastic with BertiniElastic
- Bertini Inelastic
Configuration shown: LowE EM ICRU49, BertiniElastic, Bertini Inelastic; 0.5 M events
               p-value (CvM)   p-value (KS)   p-value (AD)
Left branch    0.977
Right branch   0.985
Whole curve    0.994
CvM: Cramer-von Mises test; KS: Kolmogorov-Smirnov test; AD: Anderson-Darling test
(Figure legend: Geant4, Experimental data; axis unit: mm)
18. Test beam at Bessy, Bepi-Colombo mission
χ² not appropriate (< 5 entries in some bins, physical information would be lost if rebinned)
Experimental measurements are comparable with Geant4 simulations
Anderson-Darling: Ac (95%) = 0.752
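The reason given on this slide for not using the chi-squared test (fewer than 5 entries in some bins) is easy to check before choosing a test. The sketch below flags sparse bins in a pair of histograms with identical binning; the function name, the threshold argument and the bin contents are hypothetical and are not taken from the test-beam data.

```python
def sparse_bins(hist_a, hist_b, minimum=5):
    """Return indices of bins where either histogram has fewer than `minimum` entries."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same binning")
    return [i for i, (a, b) in enumerate(zip(hist_a, hist_b))
            if a < minimum or b < minimum]


# Made-up bin contents, for illustration only.
measured = [12, 30, 4, 0, 17]
simulated = [10, 28, 6, 2, 19]
flagged = sparse_bins(measured, simulated)
if flagged:
    print("chi-squared questionable, sparse bins:", flagged)
```

When bins are flagged and rebinning would lose physical information, a test based on the empirical distribution function, such as Anderson-Darling, is the alternative used on this slide.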
19. Comparison of alternative vehicle concepts in human missions to Mars
Inflatable habitat vs a conventional rigid habitat
Reference: rigid structures as in the ISS (2 - 4 cm Al)
- Kolmogorov-Smirnov test
  - Multi-layer 10 cm water equivalent to 4 cm Al
  - Multi-layer 5 cm water equivalent to 2.15 cm Al
Energy deposited in phantom (MeV)
Shielding material   EM           Bertini       Binary
ML 5 cm water        73.5 ± 0.3   130.2 ± 0.5   119.3 ± 0.4
ML 10 cm water       71.9 ± 0.3   128.0 ± 0.5   117.3 ± 0.5
4 cm Al              72.9 ± 0.3   127.5 ± 0.5   117.0 ± 0.4
2.15 cm Al           73.9 ± 0.3   130.5 ± 0.5   119.3 ± 0.5
An inflatable habitat exhibits a shielding capability equivalent to a conventional rigid one
20. Conclusions
- A novel, complete software toolkit for statistical analysis is being developed
  - all the two-sample GoF tests available in the statistical domain, plus the chi-squared test
  - rigorous architectural design
  - rigorous software process
- It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools.
- A systematic study of the power of GoF tests is in progress
  - unexplored area of research
- Application in various domains
  - Geant4, HEP, space science, medicine
- Feedback and suggestions are very much appreciated