Title: Uncertainty and Sampling
1. Uncertainty and Sampling
- Dr. Richard Young
- Optronic Laboratories, Inc.
2. Introduction
- Uncertainty budgets are a growing requirement of measurements.
- Multiple measurements are generally required for estimates of uncertainty.
- Multiple measurements can also decrease uncertainties in results.
- How many measurement repeats are enough?
3. Random Data Simulation
Here is an example probability distribution
function of some hypothetical measurements.
We can use a random number generator with this
distribution to investigate the effects of
sampling.
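A minimal sketch of this simulation in Python (the slides do not state the distribution parameters, so a normal distribution with mean 100 and standard deviation 10 is assumed here purely for illustration):

```python
# Sketch of the random-data simulation: draw 10,000 points from an assumed
# normal distribution and watch the sample mean and sample standard
# deviation as the sample size grows.
import numpy as np

rng = np.random.default_rng(seed=1)
population_mean, population_std = 100.0, 10.0          # assumed values
data = rng.normal(population_mean, population_std, size=10_000)

for n in (2, 5, 10, 100, 1000, 10_000):
    sample = data[:n]
    print(f"n={n:6d}  mean={sample.mean():8.3f}  std={sample.std(ddof=1):7.3f}")
```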
4. Random Data Simulation
Here is a set of 10,000 data points.
5. Random Data Simulation
Plotting the sample size on a log scale better shows the behaviour at small samples.
6. Random Data Simulation
There is a lot of variation, but how is this
affected by the data set?
7. Sample Mean
Here we have results for 200 data sets.
8. Sample Mean
9. Sample Standard Deviation
10. Sample Standard Deviation
The most probable value for the sample standard deviation of 2 samples is zero! Many samples are needed before 10 becomes the most probable value.
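A sketch of this effect on simulated data, again assuming a normal population with mean 100 and standard deviation 10 (the slides use 200 data sets; more are simulated here only to make the histogram mode stable):

```python
# For each sample size n, compute the sample standard deviation of many
# simulated data sets and locate the most probable (modal) value. For n = 2
# the mode is near zero; only for large n does it approach 10.
import numpy as np

rng = np.random.default_rng(seed=2)
for n in (2, 5, 20, 100):
    stds = rng.normal(100.0, 10.0, size=(20_000, n)).std(axis=1, ddof=1)
    hist, edges = np.histogram(stds, bins=50)
    centres = 0.5 * (edges[:-1] + edges[1:])
    print(f"n={n:3d}  most probable sample std ~ {centres[hist.argmax()]:5.2f}")
```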
11. Cumulative Distribution
Sometimes it is best to look at the CDF.
The 50% level is where lower or higher values are equally likely.
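A small sketch of reading the 50% level from an empirical CDF, using the same assumed distribution:

```python
# Build an empirical CDF from simulated data and read off the 50% level
# (the median), where lower and higher values are equally likely.
import numpy as np

rng = np.random.default_rng(seed=3)
data = np.sort(rng.normal(100.0, 10.0, size=10_000))   # assumed distribution
cdf = np.arange(1, data.size + 1) / data.size
median = data[np.searchsorted(cdf, 0.5)]
print(f"50% level (median) ~ {median:.2f}")
```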
12. Uniform Distribution
What if the distribution were uniform instead of normal?
The most probable value for > 2 samples is ≈ 10.
13. Uniform Distribution
Underestimated values are still more probable
because the PDF is asymmetric.
14. Uniform Distribution
- Throwing a die is an example of a uniform random distribution.
- A uniform distribution is not necessarily random, however.
- It may be cyclic, e.g. temperature variations due to air conditioning.
- With computer-controlled acquisition, data collection is often at regular intervals.
- This can give interactions between the cycle period and the acquisition interval.
15. Cyclic Variations
For symmetric cycles, any multiple of two data
points per cycle will average to the average of
the cycle.
16. Cyclic Variations
Unless synchronized, data collection may begin at
any point (phase) within the cycle.
Correct averages are obtained when full cycles
are sampled, regardless of the phase.
17. Cyclic Variations
Again, whole cycles are needed to give good
values.
Standard Deviation
The value is not 10 because the sample standard deviation has a (n-1)^0.5 term.
18. Cyclic Variations
The population standard deviation is 10 at each
complete cycle.
Each cycle contains all the data of the
population.
The standard deviation of the full-cycle averages is ≈ 0.
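A minimal sketch of the cyclic-sampling point, assuming a sinusoidal cycle with mean 100 and population standard deviation 10 (amplitude 10*sqrt(2)); averages of whole cycles recover the cycle mean regardless of the starting phase, while partial cycles do not:

```python
# Sample a symmetric cycle at regular intervals (100 points per cycle) and
# compare averages over exactly one cycle with averages over 1.5 cycles,
# for several arbitrary starting phases.
import numpy as np

mean, amplitude = 100.0, 10.0 * np.sqrt(2)     # sine with standard deviation 10 (assumed)

def cycle(t, phase):
    return mean + amplitude * np.sin(2 * np.pi * t + phase)

t = np.arange(0, 2, 0.01)                      # 100 samples per cycle
for phase in (0.0, 1.3, 2.7):                  # arbitrary starting phases
    full = cycle(t[:100], phase).mean()        # exactly one whole cycle
    partial = cycle(t[:150], phase).mean()     # one and a half cycles
    print(f"phase={phase:3.1f}  whole-cycle mean={full:8.3f}  partial={partial:8.3f}")
```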
19. Smoothing
- Smoothing involves combining adjacent data points to create a smoother curve than the original.
- A basic assumption is that data contains noise, but the calculation does NOT allow for uncertainty.
- Smoothing should be used with caution.
20. Smoothing
What is the difference?
21. Savitzky-Golay Smoothing
Here is a spectrum of a white LED.
It is recorded at very short integration time to
make it deliberately noisy.
22. Savitzky-Golay Smoothing
A 25 point Savitzky-Golay smooth gives a line
through the center of the noise.
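A minimal sketch of such a smooth using SciPy's savgol_filter; the LED spectrum itself is not reproduced here, so noisy synthetic data stand in for it and the wavelength grid is assumed:

```python
# Apply a 25-point, 2nd-order Savitzky-Golay smooth to a deliberately noisy
# synthetic "spectrum" and compare the residual scatter before and after.
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(seed=5)
wavelength = np.linspace(400, 800, 1024)                    # nm, assumed grid
spectrum = np.exp(-0.5 * ((wavelength - 550) / 40) ** 2)    # broad peak, illustrative
noisy = spectrum + rng.normal(0.0, 0.05, wavelength.size)   # short-integration noise

smoothed = savgol_filter(noisy, window_length=25, polyorder=2)
print("rms deviation before smoothing:", np.std(noisy - spectrum))
print("rms deviation after smoothing :", np.std(smoothed - spectrum))
```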
23. Savitzky-Golay Smoothing
The result of the smooth is very close to the same device measured with the optimum integration time.
24. Spectral Sampling
But how does the number of data points affect
results?
Here we have 1024 data points.
25. Spectral Sampling
Now we have 512 data points.
26. Spectral Sampling
Now we have 256 data points.
27. Spectral Sampling
Now we have 128 data points.
28. Spectral Sampling
A 25 point smooth follows the broad peak but not
the narrower primary peak.
29. Spectral Sampling
To follow the primary peak we need to use a 7-point smooth.
But it doesn't work so well on the broad peak.
30. Spectral Sampling
This is because some of the higher signal data
have been removed.
Comparing to the optimum scan, the intensity of
the primary peak is underestimated.
31. Spectral Sampling
Beware of under-sampling peaks: you may underestimate or overestimate intensities.
32. Exponential Smoothing
Here is the original data again.
What about other types of smoothing?
33. Exponential Smoothing
An exponential smooth shifts the peak.
Beware of asymmetric algorithms!
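A sketch of why the shift happens: a single-sided (causal) exponential smooth only looks backwards, so a peak in the smoothed data appears later than in the original. The peak shape and smoothing constant below are illustrative.

```python
# Single-pole exponential smoothing: out[i] = alpha*y[i] + (1-alpha)*out[i-1].
# Because only past points contribute, the smoothed peak moves to larger x.
import numpy as np

def exponential_smooth(y, alpha=0.05):
    out = np.empty(len(y), dtype=float)
    out[0] = y[0]
    for i in range(1, len(y)):
        out[i] = alpha * y[i] + (1 - alpha) * out[i - 1]
    return out

x = np.linspace(400, 800, 1024)                 # assumed grid
peak = np.exp(-0.5 * ((x - 550) / 10) ** 2)     # narrow peak, illustrative
print("original peak at", x[peak.argmax()])
print("smoothed peak at", x[exponential_smooth(peak).argmax()])
```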
34. Sampling Without Noise
This is the optimum integration scan but with 128
points like the noisy example.
With lower noise, can we describe curves with
fewer points?
35. Sampling Without Noise
64 points.
36. Sampling Without Noise
32 points.
Is this enough to describe the peak?
37. Interpolation
- Interpolation is the process of estimating data between given points.
- National Laboratories often provide data that requires interpolation to be useful.
- Interpolation algorithms generally estimate a smooth curve.
38. Interpolation
- There are many forms of interpolation:
  - Lagrange, B-spline, Bezier, Hermite, Cardinal spline, cubic, etc.
- They all have one thing in common:
  - They go through each given point and hence ignore uncertainty completely (see the sketch below).
- Generally, interpolation algorithms are local in nature and commonly use just 4 points.
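A minimal sketch of the point about uncertainty, using a cubic spline (one of the forms listed above) on noisy synthetic data: the interpolated curve is forced through every point, noise included.

```python
# A cubic spline passes exactly through each given (noisy) point, so it
# reproduces the noise rather than averaging it out.
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(seed=7)
x = np.linspace(0, 10, 21)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)    # noisy samples, illustrative

spline = CubicSpline(x, y)
x_fine = np.linspace(0, 10, 500)
y_fine = spline(x_fine)                          # estimated values between points
print(np.allclose(spline(x), y))                 # True: uncertainty is ignored
```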
39. Interpolation
The interesting thing about interpolating data
containing random noise is you never know what
you will get.
40. Interpolation
Uneven sampling can cause overshoots.
The Excel curve can even double back.
41. Combining a Smooth and Interpolation
- If a spectrum can be represented by a function, e.g. a polynomial, the closest fit to the data can provide smoothing and give the values between points.
- The fit is achieved by changing the coefficients of the function until it is closest to the data.
- A least-squares fit.
42. Combining a Smooth and Interpolation
- The squares of the differences between values predicted by the function and those given by the data are added to give a "goodness of fit" measure.
- Coefficients are changed until the goodness of fit is minimized.
- Excel has a regression facility that performs this calculation (a minimal sketch follows below).
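A minimal sketch of such a least-squares polynomial fit in Python rather than Excel; the data, wavelength grid and polynomial order are all illustrative:

```python
# Fit a polynomial by least squares: the coefficients are chosen to minimize
# the sum of squared differences between the function and the data. The fit
# smooths the noise and can be evaluated between the measured points.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(seed=8)
x = np.linspace(400, 800, 128)                               # assumed grid, nm
y = np.exp(-0.5 * ((x - 550) / 100) ** 2) + rng.normal(0.0, 0.02, x.size)

p = Polynomial.fit(x, y, deg=9)       # 9th order is an arbitrary illustrative choice
x_new = np.linspace(400, 800, 1024)   # finer grid: interpolation from the same fit
y_new = p(x_new)
```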
43. Combining a Smooth and Interpolation
- Theoretically, any simple smoothly varying curve can be fitted by a polynomial.
- Sometimes it is better to extract the data you want to fit by some reversible calculation.
- This means you can use, say, 9th-order polynomials instead of 123rd-order to make the calculations easier.
44. Polynomial Fitting
NIST provide data at uneven intervals.
To use the data, we have to interpolate to
intervals required by our measurements.
45. Method 1
NIST recommend fitting a high-order polynomial to data values multiplied by λ^5/exp(ab/λ) for interpolation.
The result looks good, but...
46. Method 1
...on a log scale, the match is very poor at
lower values.
47. Method 1
When converted back to the original scale, lower
values bear no relation to the data.
48. What went wrong?
- The goodness-of-fit parameter is a measure of absolute differences, not relative differences.
- NIST use a weighting of 1/E^2 to give relative differences, and hence closer matching, but that is not easy in Excel (see the sketch below).
- Large values tend to dominate smaller ones in the calculation.
- A large dynamic range of values should be avoided.
- We are trying to match data over 4 decades!
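A sketch of the weighting issue on synthetic data spanning several decades: an unweighted fit minimizes absolute differences, whereas weights of 1/E on the unsquared residuals (i.e. 1/E^2 on the squared differences) make the fit relative. The data and polynomial order are illustrative.

```python
# Compare an unweighted polynomial fit with a 1/E^2-weighted fit on data
# with a large dynamic range: the large values dominate the unweighted fit,
# so its relative errors at the small values are much worse.
import numpy as np
from numpy.polynomial import Polynomial

x = np.linspace(0.25, 2.5, 60)                  # illustrative wavelength scale, um
E = 1e-3 * np.exp(4.0 * x)                      # nearly four decades of dynamic range

unweighted = Polynomial.fit(x, E, deg=9)
weighted = Polynomial.fit(x, E, deg=9, w=1.0 / E)   # 1/E^2 weighting of squared residuals

for name, p in (("unweighted", unweighted), ("1/E^2 weighted", weighted)):
    rel_err = np.abs(p(x) - E) / E
    print(f"{name:15s} worst relative error = {rel_err.max():.3g}")
```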
49. How do NIST deal with it?
- Although NIST's 1/E^2 weighting gives closer matches than this data, to get the best results they split the data into 2 regions and calculate separate polynomials for each.
- This is a reasonable thing to do, but it can lead to local data effects and arbitrary splits that do not suit all examples.
- Is there an alternative?
50. Alternative Method 1
A plot of the log of E·λ^5 values vs. λ^-1 is a gentle curve, almost a straight line.
We can calculate a polynomial without splitting the data.
The fact that we are fitting on a log scale means we are effectively using relative differences in the least-squares calculation.
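A minimal sketch of this alternative on an illustrative blackbody-like data set (the NIST values themselves are not reproduced here):

```python
# Fit a polynomial to log(E * lambda^5) as a function of 1/lambda. Working
# on the log scale means the least-squares residuals are effectively
# relative differences, and no split of the data is needed.
import numpy as np
from numpy.polynomial import Polynomial

lam = np.linspace(0.25, 2.5, 60)                 # wavelength in um, assumed grid
E = 1e-3 / lam**5 / np.expm1(4.8 / lam)          # Planck-like shape, roughly 3000 K

x = 1.0 / lam
y = np.log(E * lam**5)                           # a gentle curve, almost a straight line
p = Polynomial.fit(x, y, deg=9)

E_interp = np.exp(p(x)) / lam**5                 # converted back to the original scale
print("worst relative residual:", np.max(np.abs(E_interp - E) / E))
```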
51. Method 2
Incandescent lamp emission is close to that of a
blackbody.
52. Method 2
If we calculate a scaled blackbody curve as we
would to get the distribution temperature
and then divide the data by the blackbody...
53. Method 2
...we get a smooth curve with very little dynamic
range.
The fit is not good because of the high initial
slope and almost linear falling slope.
54. Method 2
Plotting vs. λ^-1, as in Alternative Method 1, allows close fitting of the polynomial.
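A minimal sketch of Method 2 on synthetic data; the "lamp" values here are constructed as a scaled Planck curve times a slowly varying factor, so the ratio is nearly flat by construction.

```python
# Divide the data by a scaled blackbody (Planck) curve, fit the nearly flat
# ratio as a polynomial in 1/lambda, then multiply back to recover the data.
import numpy as np
from numpy.polynomial import Polynomial

c1, c2 = 3.7418e8, 1.4388e4                     # radiation constants for lambda in um

def planck(lam_um, temperature):
    return c1 / lam_um**5 / np.expm1(c2 / (lam_um * temperature))

lam = np.linspace(0.25, 2.5, 60)                          # um, assumed grid
E = 1e-7 * planck(lam, 3000.0) * (1 + 0.05 * lam)         # illustrative "lamp" data

ratio = E / planck(lam, 3000.0)                 # smooth, very little dynamic range
p = Polynomial.fit(1.0 / lam, ratio, deg=9)

E_interp = p(1.0 / lam) * planck(lam, 3000.0)   # back on the original scale
print("worst relative residual:", np.max(np.abs(E_interp - E) / E))
```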
55. Comparing results
Method 2 shows lower residuals, but there is not
much difference.
56. Comparing results
All methods discussed give essentially the same
result when converted back to the original scale.
57. Algorithms and Uncertainty
- None of the algorithms mentioned allow for uncertainty, or they assume it is constant.
- If we replaced the least-squares "goodness of fit" parameter with "most probable", this would use the uncertainty we know is there to determine the best fit.
- Why is this not done?
  - Difficult in Excel.
  - Easy with custom programs.
58. Algorithms and Uncertainty
From the data value (mean) and the standard
deviation, we can calculate the PDF.
59. Algorithms and Uncertainty
- Multiply the probabilities at each point to give the goodness-of-fit parameter.
- Use this parameter instead of the least-squares one in the fit calculations.
- MAXIMIZE the goodness-of-fit parameter to obtain the best fit.
- The fit will be closest where uncertainties are lowest (a minimal sketch follows below).
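A minimal sketch of such a "most probable" (maximum-probability) fit on synthetic straight-line data with known, non-constant uncertainties:

```python
# Multiply the normal PDFs of the data at each point (equivalently, sum
# their logs) and MAXIMIZE that product, instead of minimizing least squares.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=12)
x = np.linspace(0, 10, 50)
sigma = 0.05 + 0.3 * (x / 10)                   # known, non-constant uncertainties
y = 2.0 + 0.5 * x + rng.normal(0.0, sigma)      # noisy straight-line data, illustrative

def neg_log_probability(coeffs):
    model = np.polyval(coeffs, x)
    log_pdfs = -0.5 * ((y - model) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return -log_pdfs.sum()                      # minimizing this maximizes the product

best = minimize(neg_log_probability, x0=[0.0, 0.0])
print("fitted slope and intercept:", best.x)    # fit is tightest where sigma is smallest
```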
60. Conclusions
- Standard deviations may be under-estimated with small samples.
- Cyclic variations should be integrated over complete cycle periods.
- Smoothing and interpolation should be used with caution: do not assume results are valid; check.
61. Conclusions
- Polynomial fits can give good results, but:
  - Avoid large dynamic range.
  - Avoid complex curvatures.
  - Avoid high initial slopes.
- All these manipulations ignore uncertainty (or assume it is constant).
- But least-squares fits can be replaced by maximum probability to take uncertainty into consideration.