Title: Confidence in Metrology: At the National Lab
1Confidence in Metrology: At the National Lab, On the Shop Floor
National Research Council Canada / Conseil national de recherches Canada
- Alan Steele, Barry Wood and Rob Douglas
- National Research Council
- Ottawa, CANADA
- e-mail alan.steele_at_nrc.ca
2Outline
- Measurements
- Communications, Comparisons
- Fluctuations, Predictions
- Confidence
- Comparisons, Proficiency Tests, on the Shop Floor
- Probability Calculus
- confidence intervals
- confidence levels
- A Toolkit for Excel
- some Visual Basic Code
- A Worked Example
- with real comparison data
- Conclusions
3Measurement means Communication
- The sole purpose of measurement is to communicate an aspect of physical reality from one person, place and time to another person (or autonomous system for which a person is responsible), place and time.
"Alas, my work is all in vain / If it doesn't get to Roundhead's brain"
- The two people must have in common:
- an understanding of the measurand
- a system of numbers and units of measurement
- a means for describing measurement accuracy
4Measurement means Comparison
- Any useful measurement is a comparison
- The world uses the SI to provide a network that can inter-relate most of these comparisons
- The implied inter-relationships are checked by special Comparisons for Quality Assurance:
- Shop floor Calibrations (with NMIs)
- Proficiency Demonstrations (with NMIs)
- Bilateral Comparisons (between NMIs)
- Regional Comparisons (among NMIs)
- CIPM Key Comparisons (among NMIs)
- At NMIs, special definition-based comparisons are also required for the kelvin, second, kilogram etc.
5Measurement means Fluctuations
- Usually, fluctuations can be observed in a measurement even when we try to keep everything as constant as possible (WE INCLUDE THIS)
- Usually, larger fluctuations are observed as temperature, pressure, humidity are allowed to vary (WE INCLUDE THIS)
- Usually, we anticipate an even larger range of fluctuations if the measurement were to be made by other reasonable means (WE INCLUDE THIS IN STANDARD UNCERTAINTY)
6Measurement means Prediction
- The most useful aspect of a measurement is its predictive ability, either explicit or implicit
- The results of past comparisons are used to infer results for future comparisons
- There is a challenge to relate environmental conditions, history and aging to the accuracy of a future comparison
7Confidence as a Commodity
- Measurement Confidence starts with CIPM, BIPM, CCs, definition-based standards and realizations
- MRA, JCRB, Key Comparisons and Regional and Bilateral Comparisons demonstrate confidence
- Shared research and visits help develop Confidence in equivalence to the SI
- This system builds confidence at the National Lab level
8Confidence as a Commodity
- You can buy Confidence from your NMI (NRC, NIST) as calibration reports and Round-robin proficiency tests
- You can multiply Confidence in a well-run lab (CLAS) or on your factory floor
- You can sell Confidence as a commodity within your organization, as well as to your organization's clients
- To market Confidence, it should be technically rigorous and accessible to non-statisticians
9False Confidence
- Any technically unjustified confidence claim is potentially very harmful to any calibration or testing laboratory's reputation
- Overly strong or technically wrong confidence claims are potentially lethal or actionable
- Sometimes clients need protection from themselves: "Why do you have to measure it? I just want a calibration certificate for it!!"
- Rigour and careful wording can avoid false confidence
10Overly Complicated Confidence
- The equivalence study of eleven 10 Volt zeners showed a difference of Lab A - Lab B = 2.73 ± 1.91 ppm with 230 degrees of freedom, where the ±1.91 is the expanded uncertainty corresponding to approximately 95% confidence for a Student-t distribution with 230 degrees of freedom, k = 1.97 times the pair standard uncertainty, 0.97 ppm, of the pair difference determined from the internal standard uncertainty statements of the measurements from the two laboratories (±1.06 ppm for Lab A and ±1.49 ppm for Lab B), with a correlation coefficient of 0.76 accounting for a covariance of 1.2 × 10^-12. The external standard deviation was also evaluated with 21 degrees of freedom and gave a Birge ratio of 1.2.
- There is a very limited market for this type of Confidence Statement, which still requires the user to deal with the 2.7 ppm bias it reveals...
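The arithmetic behind the numbers in this statement can be checked in a few lines. The sketch below is in Python rather than the Toolkit's Visual Basic, and the function name is illustrative, not taken from the Toolkit. It assumes the usual formula for the standard uncertainty of a difference of two correlated values.

```python
import math

def pair_uncertainty(u_a, u_b, r=0.0):
    """Standard uncertainty of (A - B) given correlation coefficient r.

    Assumes the standard propagation formula for a difference:
    u^2 = u_a^2 + u_b^2 - 2 r u_a u_b
    """
    return math.sqrt(u_a**2 + u_b**2 - 2.0 * r * u_a * u_b)

# Values quoted on the slide: standard uncertainties in ppm, correlation 0.76
u_a, u_b, r = 1.06, 1.49, 0.76

u_pair = pair_uncertainty(u_a, u_b, r)
# Covariance expressed as a pure fraction (ppm converted with 1e-6)
covariance = r * (u_a * 1e-6) * (u_b * 1e-6)
# Expanded uncertainty with the coverage factor k = 1.97 quoted on the slide
expanded = 1.97 * u_pair

print(f"pair standard uncertainty = {u_pair:.2f} ppm")
print(f"covariance                = {covariance:.1e}")
print(f"expanded uncertainty      = {expanded:.2f} ppm")
```

Run as written, this reproduces the 0.97 ppm pair uncertainty, the 1.2 × 10^-12 covariance and the ±1.91 ppm expanded uncertainty quoted above.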
11Simple Confidence Statements
- "Lab A and Lab B are equivalent." Not rigorous
- "... 10 V measurements from Lab A and Lab B can be expected to agree with each other within 4.3 ppm, 19 times out of 20." Has potential
12Improved Confidence Statements
- The Mutual Recognition Arrangement formalizes the Key Comparison differences as the preferred means for generating confidence about equivalence
- New methods are being used to transform comparisons into statements of confidence like: "On the basis of this Comparison, similar measurements made by Lab A and Lab B can be expected to agree with each other to within 4.3 ppm, with 95% confidence" (or "19 times out of 20").
- This clearer Confidence Statement has a wider market
13Communicating with your Clients
- Clarity is important to
- Users of your measurements
- Your users' management and QA managers
- Your users' clients
- Your management
- Your NMI can help you to communicate confidence
clearly
14Confidence from NMIs
- The methods used to create statements of confidence for Key Comparisons can be used for proficiency testing done by your NMI
- Some calibration reports can also be used to generate this type of confidence statement, provided that the travel uncertainty of the artefact is under proper statistical control.
15Confidence for your Clients
- The methods used to create statements of confidence for Key Comparisons can be used for proficiency testing done by you on your factory floor
- The statements are the simplest quantitative expressions about the equivalence of two measurement stations
16Proficiency Testing
- Accreditation bodies routinely specify that proficiency testing on a regularly scheduled basis is a requirement for maintaining accreditation
- Usually the Pilot Laboratory for the comparison is the National Metrology Institute
- Usually the Pilot Laboratory result is taken as the comparison reference value, and the participants are initially evaluated against this "truth"
- This is a time-consuming and expensive exercise!
17Proficiency Demonstrations
- A pilot lab measures and sends one or more artefacts around to be measured at other Labs
- Pilot re-measures artefact
- Pilot receives other Labs' measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report.
[Figure: diagram of the Pilot Lab and participating labs 1 to 16]
18Proficiency Demonstrations
- CIPM organizes them for NMIs
- NMIs (NRC) organize them for you
- Do you organize them for yourself?
- Do you organize them for your clients?
19Proficiency Demonstrations: NMIs
- A pilot NMI measures and sends one or more artefacts around to be measured by NMIs
- Pilot NMI re-measures the artefact
- Pilot NMI receives other NMIs' measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report; CC and CIPM approve the report; results are posted on the internet.
[Figure: diagram of the Pilot NMI and participating NMIs 1 to 16, with NRC among the participants]
20Proficiency Demonstrations: CLAS labs
- NRC measures and sends one or more artefacts around to be measured by CLAS labs
- NRC re-measures the artefact
- NRC receives CLAS labs' measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report.
[Figure: diagram of NRC as pilot and participating CLAS labs 1 to 16, with Your Lab among the participants]
21Proficiency Demonstrations: Shop-Floor
- You measure and send one or more artefacts around to be measured by instruments you normally calibrate
- You re-measure the artefact
- You receive other workstations' measurements, analyze them in escrow as comparisons, assign travel uncertainty and prepare a report.
[Figure: diagram of Your Lab as pilot and workstations 1 to 16]
22Proficiency Demonstrations vs Calibrations
- Proficiency Demonstrations evaluate travel uncertainty better
- Proficiency Demonstrations evaluate everything affecting the best capabilities, including environment and operator...
- Proficiency Demonstrations can establish tighter equivalence
- Proficiency Demonstrations require more artefacts and more organization
- Proficiency Demonstrations have new statistical tools and toolkits available for evaluating comparisons
23Comparisons
- Measurement comparisons provide the main experimental evidence for equivalence
- In general, all participants measure a common artefact and their various results are analyzed from a single common perspective
- The participants may be different laboratories, or different measurement stations on your shop floor
24Key Comparisons and NMIs
- National Metrology Institutes have recently signed a Mutual Recognition Arrangement in which the validity of their Calibration and Measurement Capabilities is expressed
- The scientific underpinning for this arrangement is a series of Key Comparisons which are conducted at the very highest levels of metrology
- In practice, they are not much different from the proficiency tests already in general use among accredited laboratories around the world
25Reporting Results
- A metrologist reports a result in two parts
- the mean value m_L
- the uncertainty u_L
- The results are plotted as data points with error
bars
26Uncertainty Budgets
- The ISO Guide to the Expression of Uncertainty in Measurement is widely used as the basis for formulating and publishing laboratory uncertainty statements regarding measurement capabilities
- Error bars are an intrinsically probabilistic description of our belief in what will happen next time based on what we have done in the past
27Uncertainty Budgets
- Error bars are intrinsically probabilistic
- The ± standard uncertainty interval contains 68% of the events, or 68% of the histogrammed events, or 68% of the probability density function (in physical sciences often referred to as the probability distribution)
28Probability Distributions
- An ISO Guide-compliant uncertainty statement means that the error bars represent the most expert opinion about the underlying normal (Gaussian) probability distribution
- The fancy name for working with these distributions is Probability Calculus
- In general, we are interested in integrals of the probability distribution
- Integration is only fancy addition
29Confidence Levels
- A confidence level is what we get upon integrating a probability distribution over a given range [a, b]
- The fractional probability of observing a value between a and b is the normalized integration of the probability distribution function in the range [a, b]
- This is just addition of all the bits of the function between a and b
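Written out, the confidence level for the range [a, b] is the normalized integral described above. The symbols here are notation chosen for this note, not taken from the slides: p(x) is the probability distribution, and for the normal case m is the mean and u the standard uncertainty. For a normal distribution the integral reduces to a difference of error functions.

```latex
\[
  C(a,b) \;=\; \frac{\displaystyle\int_a^b p(x)\,dx}
                    {\displaystyle\int_{-\infty}^{\infty} p(x)\,dx}
\]
\[
  p(x) = \frac{1}{u\sqrt{2\pi}}
         \exp\!\left[-\frac{(x-m)^2}{2u^2}\right]
  \quad\Longrightarrow\quad
  C(a,b) = \frac{1}{2}\left[
      \operatorname{erf}\!\left(\frac{b-m}{u\sqrt{2}}\right)
    - \operatorname{erf}\!\left(\frac{a-m}{u\sqrt{2}}\right)
  \right]
\]
```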
30Confidence Intervals
- Remember: a confidence level is what we get by integrating the distribution over a given range [a, b]
- The confidence interval is the fancy name for the range associated with the confidence level
- The range [-1σ, +1σ] is the 68% confidence interval
- The range [-2σ, +2σ] is the 95% confidence interval
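The 68% and 95% figures can be checked directly. The sketch below, in Python with illustrative function names, computes the confidence level for a normal distribution two ways: by literally adding up thin slices of the distribution, and with the closed-form error function.

```python
import math

def normal_pdf(x, mean=0.0, sigma=1.0):
    """Normal (Gaussian) probability density at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def confidence_level_by_addition(a, b, mean=0.0, sigma=1.0, steps=10_000):
    """Midpoint-rule sum of the density between a and b ("fancy addition")."""
    width = (b - a) / steps
    total = sum(normal_pdf(a + (i + 0.5) * width, mean, sigma)
                for i in range(steps))
    return total * width

def confidence_level(a, b, mean=0.0, sigma=1.0):
    """Same quantity in closed form, using the error function."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((b - mean) / s) - math.erf((a - mean) / s))

for k in (1, 2):
    added = confidence_level_by_addition(-k, k)
    exact = confidence_level(-k, k)
    print(f"[-{k} sigma, +{k} sigma]: sum = {added:.4f}, erf = {exact:.4f}")
```

Both methods give about 0.6827 for one sigma and about 0.9545 for two sigma, which the slide rounds to 68% and 95%.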
31Why would you want to do this?
- Lots of time and energy (and expense!) is invested in creating a laboratory result in a comparison
- Getting the maximum amount of information from a measurement comparison is desirable
- You'd like to show off your confidence to colleagues (and auditors!)
- Quantifying things is what we do as metrologists
- Your clients may want specific quantified answers to questions of Demonstrated Equivalence based on your Proficiency Testing results
32How hard is it to do this?
- With normal distributions, the arithmetic is pretty easy
- You can try this for yourself and really see how it works
- or you can let us do it for you!
- We have generated simple expressions to help evaluate normal confidence levels and normal confidence intervals, using well known statistical methods developed over the last hundred years or so
- We have put these expressions into a Toolkit for Excel
33A Toolkit for Excel
- At NRC, we have written a Quantified Demonstrated Equivalence Toolkit for Microsoft Excel
- The Toolkit is freely available by contacting us at qde_at_nrc.ca
- We'll add you to our mailing list and send you a copy of the sample spreadsheet with the Toolkit, plus a User's Guide in .pdf format
34Toolkit Functions and Macros
- The Toolkit contains Functions to
- calculate pair uncertainties (including correlations)
- calculate weighted averages
- calculate confidence levels
- calculate confidence intervals
- The Toolkit contains Macros to
- generate bilateral tables of equivalence
- generate bilateral tables of confidence intervals
- generate bilateral tables of confidence levels
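As an illustration of one item in this list, a weighted average of comparison results might be computed as follows. This is a sketch in Python, not the Toolkit's Visual Basic code. It assumes inverse-variance weights and uncorrelated results, which the slides do not specify, and the function name and input values are hypothetical.

```python
import math

def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty.

    Assumption: the results are uncorrelated, so each weight is 1/u^2.
    """
    weights = [1.0 / u**2 for u in uncertainties]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical results from three participants (same units throughout)
values = [1.0, 2.0, 4.0]
uncertainties = [1.0, 2.0, 2.0]

mean, u_mean = weighted_mean(values, uncertainties)
print(f"weighted mean = {mean:.3f}, standard uncertainty = {u_mean:.3f}")
```

The result with the smallest uncertainty dominates the mean, and the uncertainty of the mean is smaller than that of any single participant.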
35Toolkit Philosophy and Operation
- Functions and Macros are built right in to the
Spreadsheet, and work just like regular Excel
components
36Toolkit Philosophy and Operation
- The code is written in Visual Basic
- You can examine the code to see how it works
- Long variableNames help to self-document the programs
- You don't have to look at the code or write your own functions to use the QDE Toolkit from NRC
37A Worked Example
- 13 Laboratories participated in a Proficiency
Test at 10 kW
38Comparison to the NMI: En
- One common measure of success in Proficiency Tests is the Normalized Error
- This is the ratio of the laboratory deviation to the expanded uncertainty
- $E_n\ (k=2) = |m_{Lab} - m_{Ref}| \,/\, \sqrt{U_{Lab}^2 + U_{Ref}^2}$
- Generally, the Laboratory passes when En < 1
- En is a dimensionless quantity
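The Normalized Error is a one-line calculation. The sketch below uses Python with an illustrative function name and hypothetical input values. U_lab and U_ref are the expanded (k = 2) uncertainties.

```python
import math

def normalized_error(m_lab, m_ref, U_lab, U_ref):
    """En: laboratory deviation divided by the combined expanded uncertainty."""
    return abs(m_lab - m_ref) / math.sqrt(U_lab**2 + U_ref**2)

# Hypothetical values: deviation of 3 units, expanded uncertainties 4 and 3
en = normalized_error(m_lab=3.0, m_ref=0.0, U_lab=4.0, U_ref=3.0)
verdict = "pass" if en < 1.0 else "fail"
print(f"En = {en:.2f} ({verdict})")
```

Here the combined expanded uncertainty is 5 units, so En = 3/5 = 0.6 and the laboratory passes.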
39Comparison to the NMI: QDC
- A quantified approach to Proficiency Tests is to ask the following question:
- What is the probability that a repeat comparison would yield results such that Lab 1's 95% uncertainty interval encompasses the Pilot Lab value?
- We call this Quantified Demonstrated Confidence
- QDC is a dimensionless quantity expressed in %
40Comparison to the NMI: En vs QDC
- Normalized Error (En) and Quantified Demonstrated Confidence (QDC) are both dimensionless quantities
- En and its interpretation as an acceptance criterion are difficult to explain to non-metrologists
- QDC and its numerical value are easily explained to non-metrologists
- Note that when En = 1 (and URef << ULab), QDC ≈ 50%
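One way to evaluate QDC for normal distributions is sketched below in Python. This is a reading of the question posed on the slides, not the Toolkit's code. It assumes that the difference seen in a repeat comparison is normally distributed about the measured difference with the pair standard uncertainty, and that the 95% interval is ±k times the laboratory's standard uncertainty with k = 2. The function names are illustrative.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def qdc(m_lab, m_ref, u_lab, u_ref, k=2.0, r=0.0):
    """Probability that the lab's +/- k*u_lab interval covers the reference
    value in a repeat comparison (assumed model, see the text above)."""
    diff = m_lab - m_ref
    u_pair = math.sqrt(u_lab**2 + u_ref**2 - 2.0 * r * u_lab * u_ref)
    half_width = k * u_lab
    return phi((half_width - diff) / u_pair) - phi((-half_width - diff) / u_pair)

# Check of the note above: En = 1 with a reference uncertainty much smaller
# than the lab's. The values are hypothetical.
u_lab, u_ref = 1.0, 0.001
diff_at_en_1 = math.sqrt((2.0 * u_lab)**2 + (2.0 * u_ref)**2)
result = qdc(diff_at_en_1, 0.0, u_lab, u_ref)
print(f"QDC at En = 1: {100.0 * result:.1f} %")
```

Under these assumptions the result is very close to 50%: when the measured difference sits exactly at the edge of the laboratory's interval, a repeat comparison is about as likely to fall inside it as outside.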
41Comparison to the NMI: QDE0.95
- A different quantified approach to Proficiency Tests is to ask the following question:
- Within what confidence interval can I expect the Lab 1 value and the Pilot Lab value to agree, with a 95% confidence level?
- We call this Quantified Demonstrated Equivalence
- QDE0.95 is a dimensioned quantity, same units as V
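For normal distributions, this interval can be found numerically. The Python sketch below is an interpretation of the question, not the Toolkit's implementation. It assumes a repeat difference that is normally distributed about the measured difference with the pair standard uncertainty, and searches by bisection for the half-width d such that the two values agree within ±d with 95% probability. The function names are illustrative.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def agreement_probability(d, diff, u_pair):
    """Probability that a repeat difference falls within [-d, +d]."""
    return phi((d - diff) / u_pair) - phi((-d - diff) / u_pair)

def qde(diff, u_pair, confidence=0.95):
    """Smallest half-width d giving the requested agreement probability."""
    lo, hi = 0.0, abs(diff) + 10.0 * u_pair   # hi is safely above the answer
    for _ in range(200):                      # bisection on a monotonic function
        mid = 0.5 * (lo + hi)
        if agreement_probability(mid, diff, u_pair) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Numbers from the 10 V zener example earlier in the talk:
# difference 2.73 ppm, pair standard uncertainty 0.97 ppm
d95 = qde(2.73, 0.97)
print(f"QDE0.95 = {d95:.1f} ppm")
```

With the zener numbers this gives about 4.3 ppm, matching the simple confidence statement quoted earlier. With zero difference it reduces to the familiar 1.96 times the pair standard uncertainty.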
42Comparison between Labs: Agreement
- We can ask similar questions about agreement between any two participants in the experiment
- Within what confidence interval (in ppm) can I expect the Lab 1 value and the Lab 2 value to agree, with a 95% confidence level?
43Comparison between Labs: Confidence
- What if we ask:
- What is the probability that a repeat comparison would yield results such that Lab 1's 95% uncertainty interval encompasses Lab 2's value?
- Or how about:
- What is the probability that a repeat comparison would yield results such that Lab 2's 95% uncertainty interval encompasses Lab 1's value?
44Comparison between Labs: Confidence
- The answers to these questions of Quantified
Demonstrated Confidence are shown here
45Quantifying Equivalence
- What is the probability that a repeat comparison would have a Lab 2 value within Lab 1's 95% uncertainty interval?
- Probability Calculus tells us the answer: QDC = 47%
[Figure label: 95% interval]
- This is exactly the type of awkward question that a Client might ask!
46Quantifying Equivalence
- What is the probability that a repeat comparison would have a Lab 1 value within Lab 2's 95% uncertainty interval?
- Probability Calculus tells us the answer: QDC = 22%
[Figure label: 95% interval]
- These subtly different awkward questions have very different straightforward answers!
47Tricky things about Equivalence
- Equivalence is not transitive
- Lab 1 and Lab 2 may both be equivalent to the Pilot, but not to each other!
- Equivalence is not commutative
- we are asking two very different questions here!
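The lack of commutativity is easy to see numerically. The Python sketch below uses hypothetical values for two labs with very different uncertainties. It assumes a normally distributed repeat difference with the pair standard uncertainty, 95% intervals of ±2 standard uncertainties, and no correlation. The function names are illustrative.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def qdc(m_own, m_other, u_own, u_other, k=2.0):
    """Probability that the +/- k*u_own interval covers the other lab's value
    in a repeat comparison (assumed model, uncorrelated labs)."""
    diff = m_own - m_other
    u_pair = math.sqrt(u_own**2 + u_other**2)
    half_width = k * u_own
    return phi((half_width - diff) / u_pair) - phi((-half_width - diff) / u_pair)

# Hypothetical results: Lab 1 has the larger uncertainty
m1, u1 = 0.0, 1.0
m2, u2 = 1.5, 0.3

q12 = qdc(m1, m2, u1, u2)   # does Lab 1's interval cover Lab 2's value?
q21 = qdc(m2, m1, u2, u1)   # does Lab 2's interval cover Lab 1's value?
print(f"Lab 1 interval covers Lab 2: {100.0 * q12:.0f} %")
print(f"Lab 2 interval covers Lab 1: {100.0 * q21:.0f} %")
```

The same pair of results gives two markedly different probabilities, because the wider interval is much more likely to cover the other lab's value than the narrower one.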
48Conclusions
- You are already doing quite a bit of Probability Calculus when you present your results
- The arithmetic for quantified calculations is very straightforward when we have Normal Distributions
- Adding Statistical Confidence explicitly into your Lab's results helps you to explain them to non-metrologists, and to present precisely what Proficiency Testing has demonstrated for:
- equivalence from different National Laboratories
- accreditation assessment
- your clients
- your factory floor