Title: Statistics
1Statistics
- Introduction
- 1.) All measurements contain random error
- Results therefore always have some uncertainty.
- 2.) Uncertainties are used to determine whether two or more experimental results are equivalent or different.
- Statistics is used to accomplish this task.
Is the mutant (transgenic) mouse significantly fatter than the normal (wild-type) mouse? Statistical methods provide unbiased means to answer such questions.
Masuzaki, H., et al. Science (2001), 294(5549), 2166.
- Gaussian Curve
- 1.) For a series of experimental results with only random error:
- (i) A large number of experiments done under identical conditions will yield a distribution of results.
- (ii) The distribution of results is described by a Gaussian, or normal, error curve.
[Figure: Gaussian error curve, number of occurrences vs. value; high population about the correct value, low population far from the correct value.]
- Gaussian Curve
- 2.) Any set of data (and the corresponding Gaussian curve) can be characterized by two parameters:
- (i) Mean or average value ($\bar{x}$)
- $\bar{x} = \frac{\sum_i x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$
- where n = number of data points
- $x_i$ = value of data point number i
- (ii) Standard deviation (s)
- $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$
The smaller the standard deviation, the more precise the measurement.
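The two definitions above translate directly into code. This is a minimal Python sketch; the function names `mean` and `std_dev` are our own, and s uses the n - 1 denominator, matching the degrees of freedom used later for Student's t. (Python's `statistics.mean` and `statistics.stdev` compute the same quantities.)

```python
import math

def mean(values):
    """Mean (average): the sum of the x_i divided by n."""
    return sum(values) / len(values)

def std_dev(values):
    """Sample standard deviation s, with n - 1 in the denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = mean(values)
    squared_devs = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))

# Bowling scores used in the examples of this section
scores = [116.0, 97.9, 114.2, 106.8, 108.3]
print(f"x_bar = {mean(scores):.2f}, s = {std_dev(scores):.2f}")
```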
- Gaussian Curve
- 3.) Other Terms Used to Describe a Data Set
- (i) Variance: related to the standard deviation
- Used to describe how wide (how precise) a distribution of results is: variance = $s^2$
- where s = standard deviation
- (ii) Range: the difference between the highest and lowest values in a set of data
- Example: measurements of the lifetimes of 4 light bulbs
- 821, 783, 834, 855 hours
- High value = 855 hours
- Low value = 783 hours
- Range = high value - low value = 855 - 783 = 72 hours
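For the light bulb data, the range, variance, and standard deviation can be checked with Python's standard `statistics` module. This is a short sketch; `statistics.variance` returns the sample variance $s^2$ (n - 1 denominator).

```python
import statistics

# Lifetimes (hours) of the four light bulbs in the example
lifetimes = [821, 783, 834, 855]

data_range = max(lifetimes) - min(lifetimes)  # 855 - 783 = 72 hours
variance = statistics.variance(lifetimes)     # s**2, in hours**2
s = statistics.stdev(lifetimes)               # s, in hours

print(f"range = {data_range} h, variance = {variance:.1f} h^2, s = {s:.1f} h")
```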
- Gaussian Curve
- 3.) Other Terms Used to Describe a Data Set
- (iii) Median: the value in a set of data that has an equal number of data values above it and below it
- For an odd number of data points, the median is the middle value.
- For an even number of data points, the median is the value halfway between the two middle values.
- Example
- Data set: 1.19, 1.23, 1.25, 1.45, 1.51; mean ($\bar{x}$) = 1.33; median = 1.25 (the middle value)
- Data set: 1.19, 1.23, 1.25, 1.45; mean ($\bar{x}$) = 1.28; median = (1.23 + 1.25)/2 = 1.24
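The odd/even rule for the median can be written out as a small function. This is a sketch with a hypothetical name `median`; Python's `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [1.19, 1.23, 1.25, 1.45, 1.51]
even_set = [1.19, 1.23, 1.25, 1.45]
print(median(odd_set))  # 1.25
print(median(even_set))
```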
- Gaussian Curve
- (iii) Example
For the following bowling scores: 116.0, 97.9, 114.2, 106.8, and 108.3, find the mean, median, range, and standard deviation.
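One way to check an answer to this example is with Python's `statistics` module. This sketch reports values rounded for display.

```python
import statistics

scores = [116.0, 97.9, 114.2, 106.8, 108.3]

x_bar = statistics.mean(scores)      # mean
med = statistics.median(scores)      # middle value of the sorted scores
spread = max(scores) - min(scores)   # range = high - low
s = statistics.stdev(scores)         # standard deviation (n - 1)

print(f"mean = {x_bar:.2f}, median = {med}, range = {spread:.1f}, s = {s:.2f}")
```

This gives a mean of about 108.6, a median of 108.3, a range of 18.1, and s of about 7.1, which are the values used in the later bowling examples.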
- Gaussian Curve
- 4.) Relating Terms Back to the Gaussian Curve
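- (i) Formula for a Gaussian curve
- $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$
- where e = base of the natural logarithm (2.71828...)
- $\mu \approx \bar{x}$ (mean)
- $\sigma \approx s$ (standard deviation)
[Figure: Gaussian curve with the mean and standard deviation marked. The entire area under the curve is normalized to one.]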
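The Gaussian formula can be evaluated directly, and integrating it numerically confirms that the total area is one. This is a sketch: the trapezoid rule and the helper names `gaussian` and `area` are our own choices for the check, not part of the slide.

```python
import math

def gaussian(x, mu, sigma):
    """Height y of the Gaussian curve with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def area(mu, sigma, lo, hi, steps=10_000):
    """Trapezoid-rule estimate of the area under the curve from lo to hi."""
    h = (hi - lo) / steps
    total = 0.5 * (gaussian(lo, mu, sigma) + gaussian(hi, mu, sigma))
    total += sum(gaussian(lo + k * h, mu, sigma) for k in range(1, steps))
    return total * h

# Bowling statistics from the examples: mu ~ 108.6, sigma ~ 7.1.
# Integrating over +/- 10 sigma captures essentially the whole curve.
mu, sigma = 108.6, 7.1
total_area = area(mu, sigma, mu - 10 * sigma, mu + 10 * sigma)
one_sigma_area = area(mu, sigma, mu - sigma, mu + sigma)
print(f"total area = {total_area:.4f}, area within mu +/- 1 sigma = {one_sigma_area:.3f}")
```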
- Standard Deviation and Probability
- 1.) By knowing the standard deviation (s) and the mean ($\bar{x}$) of a set of results (and the corresponding Gaussian curve):
- (i) The probability of the next result falling in any given range can be calculated.
- (ii) The probability of a result falling in a given portion of the Gaussian curve is equal to the normalized area of the curve in that portion.
- (iii) Example
- 68.3% of the area of a Gaussian curve lies between $\mu - 1\sigma$ and $\mu + 1\sigma$ ($\mu \pm 1\sigma$).
- Thus, any new result has a 68.3% chance of falling within this range.
The probability of measuring a value in a certain range is equal to the area under the curve over that range.
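- Standard Deviation and Probability
- $z = \frac{x - \bar{x}}{s}$, where x is the result; tables list the area under the curve between the mean value and the result (z).
- The total area on one side of the mean (half the curve) is 0.5.
- The remaining area, beyond z, is therefore 0.5 - Area.
- Example
- z = 1.3 → area from the mean to z = 1.3 is 0.403
- → area from z = 1.3 to infinity is 0.5 - 0.403 = 0.097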
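Instead of a printed z table, the same areas can be computed from the standard normal cumulative distribution. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; `area_mean_to_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def area_mean_to_z(z):
    """Area under the standard Gaussian curve between the mean (z = 0) and z."""
    return NormalDist().cdf(abs(z)) - 0.5

a = area_mean_to_z(1.3)
tail = 0.5 - a  # remaining area, from z = 1.3 out to infinity
print(f"mean to z: {a:.3f}, beyond z: {tail:.3f}")  # mean to z: 0.403, beyond z: 0.097
```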
- Standard Deviation and Probability
- (iii) Example
- A bowler has a mean score of 108.6 and a standard deviation of 7.1.
- What fraction of the bowler's scores will be less than 80.2?
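One way to work this example, assuming the z-table approach above ($z = (x - \bar{x})/s$) and a standard table entry of about 0.49997 for the area between the mean and z = 4.0:

```latex
z = \frac{x - \bar{x}}{s} = \frac{80.2 - 108.6}{7.1} = \frac{-28.4}{7.1} = -4.0

\text{fraction below } 80.2 = 0.5 - (\text{area from mean to } |z| = 4.0) \approx 0.5 - 0.49997 = 0.00003
```

So only about 3 scores in 100,000 (about 0.003%) are expected to fall below 80.2.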
- Standard Deviation and Probability
- 2.) Knowing the standard deviation (s) of a data set indicates the precision of a measurement.
- (i) Common intervals used for expressing analytical results are shown below.
- (ii) The precision of many analytical measurements is expressed as a range about the mean that is expected to contain 95% of the results.
- There is only a 5% chance (1 out of 20) that any given measurement on the sample will be outside of this range.
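The fraction of results expected inside a few common intervals can be computed from the Gaussian curve itself. This sketch assumes an ideal Gaussian population and Python 3.8+ (`statistics.NormalDist`); it also finds the multiplier that leaves exactly 5% of results outside, which is often rounded to 2.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Gaussian with mean 0 and standard deviation 1

# Fraction of results expected within +/- k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, frac in coverage.items():
    print(f"mean +/- {k}s: {frac:.1%}")

# Multiplier that leaves exactly 5% (1 in 20) of results outside the interval
k95 = std_normal.inv_cdf(0.975)
print(f"95% of results fall within mean +/- {k95:.2f}s")
```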
- Standard Deviation and Probability
- 4.) The precision of a mean (average) result is expressed using a confidence interval.
- (i) The relationship between the true mean value ($\mu$) and the measured mean ($\bar{x}$) is given by
- $\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}$ (the $\pm$ term defines the confidence interval)
- where s = standard deviation
- n = number of measurements
- t = Student's t value
- degrees of freedom = n - 1
Note: As n increases, the confidence interval becomes smaller ($\mu$ becomes more precisely known).
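The note about n can be seen by computing the $\pm ts/\sqrt{n}$ term for a fixed s. This is a sketch: the Python standard library has no Student's t distribution, so a few 95% two-tailed values are copied from a standard t table, and s = 1.0 is an arbitrary illustrative value.

```python
import math

# Two-tailed Student's t at 95% confidence, keyed by degrees of freedom (n - 1).
# Only df = 1..9 are included in this sketch.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def ci_half_width(s, n, t_table=T_95):
    """Return t*s/sqrt(n), the +/- term in mu = x_bar +/- t*s/sqrt(n)."""
    t = t_table[n - 1]  # degrees of freedom = n - 1
    return t * s / math.sqrt(n)

# Same standard deviation, more measurements: the interval narrows
for n in (2, 3, 5, 10):
    print(f"n = {n:2d}: mu = x_bar +/- {ci_half_width(1.0, n):.3f}")
```

The interval shrinks both because $\sqrt{n}$ grows and because t falls as the degrees of freedom increase.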
- Standard Deviation and Probability
- 4.) The precision of a mean (average) result is expressed using a confidence interval.
- (ii) Student's t
- A statistical tool frequently used to express confidence intervals
- t is tabulated against the degrees of freedom, obtained from the number of measurements (n - 1).
A probability distribution that addresses the problem of estimating the mean of a normally distributed population when the sample size is small.
The population standard deviation ($\sigma$) is unknown and has to be estimated from the data using s.
- Standard Deviation and Probability
- 4.) The precision of a mean (average) result is expressed using a confidence interval.
- (iii) The meaning of a confidence interval
- To determine the true mean, we would need to collect an infinite number of data points, which is obviously not possible.
- The confidence level gives the probability that the range of numbers contains the true mean: over many repeated experiments, that fraction of the intervals will contain it.
- 50% confidence interval → the range of numbers contains the true mean only 50% of the time.
- 90% confidence interval → the range of numbers contains the true mean 90% of the time.
[Figure: confidence intervals from many data sets compared with the true mean; for 50% confidence intervals, 50% of the data sets do not contain the true mean.]
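The "fraction of the time" meaning can be checked by simulation. This sketch draws many hypothetical data sets of 5 measurements from a Gaussian population with a known true mean (the population values 10.0 and 1.0 are made up for illustration), builds a 90% confidence interval from each, and counts how often the interval contains the true mean. t = 2.132 is the standard 90% table value for 4 degrees of freedom.

```python
import math
import random
import statistics

random.seed(1)                  # fixed seed so the run is reproducible
TRUE_MEAN, TRUE_SD = 10.0, 1.0  # hypothetical population parameters
N, T_90 = 5, 2.132              # 5 measurements per data set; t for 90%, df = 4
TRIALS = 2000

hits = 0
for _ in range(TRIALS):
    data = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    x_bar = statistics.mean(data)
    half = T_90 * statistics.stdev(data) / math.sqrt(N)
    if x_bar - half <= TRUE_MEAN <= x_bar + half:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the 90% intervals contain the true mean")
```

The printed fraction should come out close to 90%; it will not be exact because the number of simulated data sets is finite.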
- Standard Deviation and Probability
- (iii) Example
- For the following bowling scores: 116.0, 97.9, 114.2, 106.8, and 108.3, a bowler has a mean score of 108.6 and a standard deviation of 7.1.
- What is the 90% confidence interval for the mean?
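Assuming the standard two-tailed table value t = 2.132 for 90% confidence and n - 1 = 4 degrees of freedom, the confidence-interval formula gives:

```latex
\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}} = 108.6 \pm \frac{(2.132)(7.1)}{\sqrt{5}} = 108.6 \pm 6.8
```

That is, at the 90% confidence level the true mean score lies between about 101.8 and 115.4.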
- Standard Deviation and Probability
- 5.) Comparison of Two Data Sets
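- (i) To determine whether two results obtained by the same method are statistically the same, use the following formula to determine a calculated t:
- $t_{\text{calculated}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$
- where $\bar{x}_1$, $\bar{x}_2$ = mean results of samples 1 and 2
- $n_1$, $n_2$ = number of measurements of samples 1 and 2
- $s_{\text{pooled}}$ = pooled standard deviation, $s_{\text{pooled}} = \sqrt{\frac{\sum_{\text{set 1}} (x_i - \bar{x}_1)^2 + \sum_{\text{set 2}} (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}}$
This test requires that the standard deviations of the two data sets be similar.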
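The pooled-t formula can be sketched in Python as follows. The function name `pooled_t` is our own; the data are the ¹⁴CO₂ sample and blank counts from the worked example of this comparison, and the comparison with the Student's t table is left as a separate step.

```python
import math
import statistics

def pooled_t(set1, set2):
    """Return (t_calculated, degrees of freedom) using the pooled standard deviation."""
    n1, n2 = len(set1), len(set2)
    m1, m2 = statistics.mean(set1), statistics.mean(set2)
    # Sum of squared deviations of each set about its own mean
    ss1 = sum((x - m1) ** 2 for x in set1)
    ss2 = sum((x - m2) ** 2 for x in set2)
    s_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t_calc = abs(m1 - m2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t_calc, n1 + n2 - 2

sample = [28, 32, 27, 39, 40]  # 14CO2 plant sample, counts/min
blank = [28, 21, 28, 20]       # blank, counts/min
t_calc, dof = pooled_t(sample, blank)
print(f"t = {t_calc:.2f}, degrees of freedom = {dof}")
```

With these data the function gives t of about 2.47 with 7 degrees of freedom, close to the value of 2.48 quoted in the worked example; either value exceeds the 95% table value of 2.365.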
- Standard Deviation and Probability
- 5.) Comparison of Two Data Sets
- (ii) Compare the calculated t to the corresponding value in the Student's t probability table.
- Use the desired confidence level at the appropriate degrees of freedom: degrees of freedom = $n_1 + n_2 - 2$.
- (iii) If the calculated t is greater than the value in the Student's t probability table, then the two results are significantly different at the given confidence level.
- A significant difference is easier to establish at a lower confidence level, because the tabulated t is smaller.
For the two results to be considered the same, the calculated t needs to be less than the table value.
- Standard Deviation and Probability
- 5.) Comparison of Two Data Sets
- (iv) Example
The amount of ¹⁴CO₂ in a plant sample is measured to be 28, 32, 27, 39, and 40 counts/min (mean = 33.2). The amount of radioactivity in a blank is found to be 28, 21, 28, and 20 counts/min (mean = 24.2). Are the mean values significantly different at the 95% confidence level?
- Standard Deviation and Probability
- 5.) Comparison of Two Data Sets
- (iv) Example
Degrees of freedom = 5 + 4 - 2 = 7
From the Student's t probability table at 7 degrees of freedom and the 95% confidence level: t = 2.365
Calculated t (2.48) > 2.365: the results are significantly different at the 95% confidence level, but not at the 98% or higher confidence levels.
- Standard Deviation and Probability
- 6.) Comparison of Two Methods
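- (i) To determine whether two methods give the same results for the same samples, use the following formula to determine a calculated t:
- $t_{\text{calculated}} = \frac{|\bar{d}|}{s_d} \sqrt{n}$
- where $\bar{d}$ = average difference between the results of the two methods (the difference in their mean values)
- n = number of samples analyzed by each method
- $s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}}$, where $d_i$ is the difference between the two methods for sample i
- (ii) Degrees of freedom = n - 1
- (iii) If the calculated t is greater than the value in the Student's t probability table, then the two methods are significantly different at the given confidence level.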
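The paired comparison can be sketched in Python. The cholesterol results for the example that follows are not reproduced in this text, so the sketch only implements the formula; `paired_t` is a hypothetical helper name, and the returned t still has to be compared with a Student's t table (for example, 2.571 at 95% confidence and 5 degrees of freedom).

```python
import math
import statistics

def paired_t(method_a, method_b):
    """Return (t_calculated, degrees of freedom) for two methods on the same n samples."""
    if len(method_a) != len(method_b):
        raise ValueError("each sample must be measured by both methods")
    d = [a - b for a, b in zip(method_a, method_b)]  # per-sample differences d_i
    n = len(d)
    d_bar = statistics.mean(d)   # average difference
    s_d = statistics.stdev(d)    # standard deviation of the differences (n - 1)
    return abs(d_bar) / s_d * math.sqrt(n), n - 1
```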
- Standard Deviation and Probability
- 6.) Comparison of Two Methods
- (iv) Example
Two methods for measuring cholesterol in blood provide the following results. Are these methods significantly different at the 95% confidence level?
- Standard Deviation and Probability
- 6.) Comparison of Two Methods
- (iv) Example
Degrees of freedom = 6 - 1 = 5
95% confidence level: t (table) = 2.571
Calculated t (1.20) < 2.571: the results are not significantly different at the 95% confidence level.
- Dealing with Bad Data
- 1.) Q Test
- (i) A method used to decide whether or not to reject a bad data point.
- (ii) Procedure
- 1. Arrange the data in order of increasing value.
- 2. Determine the lowest and highest values and the total range of values.
- Example: 12.47, 12.48, 12.53, 12.56, 12.67 (questionable point: 12.67)
- Range = 12.67 - 12.47 = 0.20
- 3. Determine the difference (gap) between the questionable data point and its nearest value.
- Gap = 12.67 - 12.56 = 0.11
- Calculate the Q value: $Q = \frac{\text{gap}}{\text{range}} = \frac{0.11}{0.20} = 0.55$
- Dealing with Bad Data
- 1.) Q Test
- (ii) Procedure
- 4. Compare the calculated Q value to those in tables at the same value of n and the desired confidence level. The point is rejected only if Q (calculated) > Q (table).
- n = total number of values or observations
- For example, at n = 5 and 90% confidence, the value of Q is 0.64.
- Since Q (calculated) < Q (table), i.e., 0.55 < 0.64,
- the data point 12.67 cannot be rejected at the 90% confidence level.
- (iii) Although the Q test is valuable in eliminating bad data, common sense and repeating experiments with questionable results are usually more helpful.
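The four steps of the Q test can be sketched as a small Python function. `q_test` is a hypothetical name, and the 90% critical values below are commonly tabulated ones (n = 3 to 10); check them against the Q table you are using, since published tables differ slightly in rounding.

```python
# Critical Q values at 90% confidence, indexed by n (as commonly tabulated).
Q_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(values, suspect, q_table=Q_90):
    """Return (Q_calculated, reject?) for a questionable lowest or highest value."""
    data = sorted(values)                  # 1. arrange in increasing order
    if suspect not in (data[0], data[-1]):
        raise ValueError("the questionable point must be the lowest or highest value")
    spread = data[-1] - data[0]            # 2. total range
    nearest = data[1] if suspect == data[0] else data[-2]
    gap = abs(suspect - nearest)           # 3. gap to the nearest value
    q_calc = gap / spread
    return q_calc, q_calc > q_table[len(data)]  # 4. reject only if Q_calc > Q_table

q_calc, reject = q_test([12.47, 12.48, 12.53, 12.56, 12.67], 12.67)
print(f"Q = {q_calc:.2f}, reject 12.67: {reject}")  # Q = 0.55, reject 12.67: False
```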
- Dealing with Bad Data
- 1.) Q Test
- (ii) Example
- For the following bowling scores: 116.0, 97.9, 114.2, 106.8, and 108.3, a bowler has a mean score of 108.6 and a standard deviation of 7.1.
- Using the Q test, decide whether the score 97.9 should be discarded.
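One way to work this example, following the Q-test procedure with the scores sorted as 97.9, 106.8, 108.3, 114.2, 116.0 (so 106.8 is the nearest value to 97.9) and using the 90% critical value Q = 0.64 for n = 5:

```latex
Q = \frac{\text{gap}}{\text{range}} = \frac{106.8 - 97.9}{116.0 - 97.9} = \frac{8.9}{18.1} = 0.49
```

Since 0.49 < 0.64, the score 97.9 cannot be discarded at the 90% confidence level.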