POOLED DATA DISTRIBUTIONS - PowerPoint PPT Presentation

1 / 16
About This Presentation
Title:

POOLED DATA DISTRIBUTIONS

Description:

Alan Steele, Ken Hill, and Rob Douglas. National Research Council of Canada ... Steele, Hill, and Douglas: Pooled Data Distributions. 3. The Normal Approach ... – PowerPoint PPT presentation

Number of Views:30
Avg rating:3.0/5.0
Slides: 17
Provided by: ALANS65
Category:

less

Transcript and Presenter's Notes

Title: POOLED DATA DISTRIBUTIONS


1
POOLED DATA DISTRIBUTIONS
National Research Conseil national Council
Canada de recherches
  • GRAPHICAL AND STATISTICAL TOOLS FOR EXAMINING
    COMPARISON REFERENCE VALUES
  • Alan Steele, Ken Hill, and Rob Douglas
  • National Research Council of Canada
  • E-mail alan.steele_at_nrc.ca

Measurement comparison data sets are generally
summarized using a simple statistical reference
value calculated from the pool of the
participants results. Consideration of the
comparison data sets, particularly with regard to
the consequences and implications of such data
pooling, can allow informed decisions regarding
the appropriateness of choosing a simple
statistical reference value. Graphs of the
relevant distributions provide insight to this
problem.
2
Introduction
  • Comparison data collection and analysis continues
    to grow in importance among the tasks of
    international metrology
  • Sample distributions and populations are
    routinely considered when preparing the summary
    of the comparison
  • Reference values (KCRVs) are often calculated
    from the measurement data supplied by the
    participants
  • We believe that graphical techniques are an aid
    to understanding and communication in this field

3
The Normal Approach
  • Generally, initial implicit assumption is to
    consider that all of the participants data, as
    xi/ui, represent individual samples from a single
    (normal) population
  • A coherent picture of the population mean and
    standard deviation can be built from the
    comparison data set that is fully consistent with
    the reported values and uncertainties
  • Most outlier-test protocols rely on this
    assumption to identify when and if a given
    laboratory result should be excluded, since its
    inclusion would violate this internal consistency

4
Pooled Data Distributions
  • Creating pooled data distributions tackles this
    problem from the opposite direction
  • The independent distributions reported by each
    participant (through their value and uncertainty)
    are summed directly
  • Result is taken as representative of the
    underlying population as revealed in the
    comparison measurements
  • Monte Carlo methods are useful when calculations
    involve Student distributions or medians rather
    than means

5
Monte Carlo Calculations
  • High quality linear congruent uniform random
    number generators are easy to find
  • Transformation from uniform to any distribution
    done via cumulative distribution
  • Example shows Student distribution transform
  • Our Excel Toolkit includes an external DLL for
    doing fast Monte Carlo simulations with multiple
    large arrays

6
Dealing with Student Distributions
  • Student Cumulative Distribution Functions for
    different Degrees of Freedom (n 210)
  • Note that the line at 97.5 cumulative
    probability crosses each curve at the coverage
    factor, k, appropriate for a 95 confidence
    interval

7
Example Data From KCDB
  • Recent results for CCAUV.U-K1
  • Low power, 1.9 MHz 5 Labs
  • Finite degrees of freedom specified for all
    participants
  • Data failed consistency check using weighted mean
  • Median chosen as KCRV

8
Statistical Distributions
  • Results of Monte Carlo simulation
  • lab distributions used to resample comparison
  • pooled data histogram incremented once for each
    lab per event
  • mean, weighted mean, and median calculated for
    each event
  • Population revealed by measurement is multi-modal
    and evidently not normal

9
Statistical Distributions
  • Results of Monte Carlo simulation
  • lab distributions used to resample comparison
  • pooled data histogram incremented once for each
    lab per event
  • mean, weighted mean, and median calculated for
    each event
  • Population revealed by measurement is multi-modal
    and evidently not normal

10
Advantages of Monte Carlo
  • Technique is simple to implement
  • Allows calculation of confidence intervals for
    statistics
  • Covariances can be accommodated in
    straightforward manner
  • Possible to include outlier rejection schemes
  • Easy to track quantities of interest, such as
    probability of a given participant being median
    laboratory
  • Can consider other candidate reference values

11
Example CCT-K3 Argon Point
  • Another example from KCDB
  • CCT-K3 Argon Triple Point
  • Large variation in reported values
  • Large variation in stated uncertainties
  • No KCRV was assigned, based on data pooling
    analysis

12
Algorithmic Reference Values
  • Linear combinations of simple estimators can be
    used as robust estimators of location
  • For CCT-K3, proposal to use simple average of
    mean, weighted mean, and median
  • Evaluation of any such algorithmic estimator is
    easy to do with Monte Carlo

13
Quantifying the Comparison
  • Calculating a reference value typically the
    variance-weighted mean or the median - is a
    routine part of reporting comparisons
  • The suitability of these statistics for
    representing the data set can be checked using
    chi-squared testing
  • It is also possible to perform such tests without
    invoking a reference value by considering the
    data in pair wise fashion
  • Advantages of pair-statistics
  • Always works, even before choosing a reference
    value
  • More rigorous, since can handle correlations
    exactly
  • Explicit, following metrological chains of
    inference

14
Pair-Difference Distributions
  • Similar to exclusive statistics
  • Consider difference between one lab and rest of
    world
  • Sum of per-lab differences is the
    all-pairs-difference (APD) distribution this is
    symmetric
  • Width of APD is a measure of global quality
    assurance for independent calibration of an
    artifact by two different labs chosen at random

15
Reduced Chi-Squared Testing
  • Normalizing the pair differences by the pair
    uncertaintiesallows us to build tests of the
    measurement capability claims
  • This is still independent of any chosen reference
    value
  • This All Pairs Difference reduced ?2 has N-1
    degrees of freedom
  • If a data set fails the APD ?2 test, it will fail
    for every possible KCRV

APD
16
Conclusions
  • Monte Carlo technique is fast and simple to
    implement
  • Graphs provide a powerful tool for visual
    consideration of
  • Pooled data (sum distribution)
  • Simple Estimators (mean, weighted mean, median)
  • Other Estimators (any algorithm can be used)
  • All-pairs reduced chi-squared statistic is
    egalitarian over participants, and independent of
    choice of KCRV
  • No single choice of KCRV can adequately represent
    a comparison that fails the all-pairs-difference
    chi-squared test
Write a Comment
User Comments (0)
About PowerShow.com