Statistics

Introduction
1.) All measurements contain random error, so results always have some uncertainty.
2.) Uncertainties are used to determine whether two or more experimental results are equivalent or different.
Statistics is used to accomplish this task.

Is the mutant (transgenic) mouse significantly fatter than the normal (wild-type) mouse?
Statistical methods provide an unbiased means of answering such questions.
Masuzaki, H., et al. Science (2001), 294(5549), 2166.

Gaussian Curve
1.) For a series of experimental results with only random error:
(i) A large number of experiments done under identical conditions will yield a distribution of results.
(ii) The distribution of results is described by a Gaussian, or normal, error curve.

[Figure: number of occurrences vs. value. There is a high population about the correct value and a low population far from the correct value.]
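A quick simulation can illustrate why results with only random error pile up around the correct value. This sketch rests on assumptions that are not in the slides. Each hypothetical measurement is a true value plus the sum of many small, independent, uniformly distributed errors. The resulting spread is close to Gaussian, with roughly 68% of the results within one standard deviation of the mean.

```python
import random
import statistics

# Assumption for illustration: each measurement = true value + the sum of
# many small, independent random errors (each uniform in [-0.05, +0.05]).
TRUE_VALUE = 10.0      # hypothetical "correct value"
N_ERRORS = 12          # hypothetical number of small error sources per measurement
N_EXPERIMENTS = 20000  # "a large number of experiments"

random.seed(1)

def one_measurement():
    """Return one simulated result containing only random error."""
    return TRUE_VALUE + sum(random.uniform(-0.05, 0.05) for _ in range(N_ERRORS))

results = [one_measurement() for _ in range(N_EXPERIMENTS)]
mean = statistics.mean(results)
s = statistics.stdev(results)

# For a Gaussian distribution, about 68% of results lie within one s of the mean.
within_1s = sum(abs(x - mean) <= s for x in results) / len(results)
print(f"mean = {mean:.4f}, s = {s:.4f}, fraction within ±1s = {within_1s:.3f}")
```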

Gaussian Curve
2.) Any set of data (and its corresponding Gaussian curve) can be characterized by two parameters:
(i) Mean or average value (x̄)
$$\bar{x} = \frac{\sum_i x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
where n = number of data points
x_i = value of data point number i
(ii) Standard deviation (s)
$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$
The smaller the standard deviation, the more precise the measurement.
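The two definitions above translate directly into code. Here is a minimal sketch that implements x̄ and s by hand, with n − 1 in the denominator of s. It checks the hand-written s against Python's `statistics.stdev`, which uses the same sample definition. The data set is the one used in the median example of this presentation.

```python
import math
import statistics

def mean(values):
    """x̄ = (x1 + x2 + ... + xn) / n"""
    return sum(values) / len(values)

def std_dev(values):
    """Sample standard deviation s, with n - 1 in the denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [1.19, 1.23, 1.25, 1.45, 1.51]
print(f"mean = {mean(data):.3f}, s = {std_dev(data):.3f}")
# statistics.stdev uses the same n - 1 definition, so the two must agree.
assert math.isclose(std_dev(data), statistics.stdev(data))
```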

Gaussian Curve
3.) Other Terms Used to Describe a Data Set
(i) Variance: related to the standard deviation; used to describe how wide (how precise) a distribution of results is
$$\text{variance} = s^2$$
where s = standard deviation
(ii) Range: the difference between the highest and lowest values in a set of data
Example: measurements of 4 light bulb lifetimes
821, 783, 834, 855 hours
High value = 855 hours
Low value = 783 hours
Range = high value − low value = 855 − 783 = 72 hours
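For the light bulb example, the range, the sample variance and the standard deviation can be computed with Python's standard `statistics` module. This sketch assumes the sample (n − 1) definitions used elsewhere in this presentation.

```python
import statistics

lifetimes = [821, 783, 834, 855]  # light bulb lifetimes in hours (from the example)

value_range = max(lifetimes) - min(lifetimes)  # high value - low value
variance = statistics.variance(lifetimes)      # sample variance, equal to s**2
s = statistics.stdev(lifetimes)                # sample standard deviation (n - 1)

print(f"range = {value_range} hours")
print(f"variance = {variance:.2f} hours^2, s = {s:.2f} hours")
```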


Gaussian Curve
3.) Other Terms Used to Describe a Data Set
(iii) Median: the value in a set of data that has an equal number of data values above it and below it
For an odd number of data points, the median is the middle value.
For an even number of data points, the median is the value halfway between the two middle values.
Example
Data set: 1.19, 1.23, 1.25, 1.45, 1.51
mean (x̄) = 1.33, median = 1.25 (the middle value)
Data set: 1.19, 1.23, 1.25, 1.45
mean (x̄) = 1.28, median = 1.24 (halfway between 1.23 and 1.25)
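The odd/even rule for the median can be written as a small function. This is a sketch for illustration. Python's `statistics.median` already applies the same rule.

```python
def median(values):
    """Middle value for odd n; halfway between the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [1.19, 1.23, 1.25, 1.45, 1.51]
even_set = [1.19, 1.23, 1.25, 1.45]
print(median(odd_set), round(median(even_set), 2))
```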

Gaussian Curve
(iii) Example

For the following bowling scores 116.0, 97.9,
114.2, 106.8 and 108.3, find the mean, median,
range and standard deviation.
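One way to check the bowling example is with Python's `statistics` module. This sketch assumes the sample standard deviation (n − 1), which matches the s = 7.1 quoted for this bowler later in the presentation.

```python
import statistics

scores = [116.0, 97.9, 114.2, 106.8, 108.3]

mean = statistics.mean(scores)
median = statistics.median(scores)
value_range = max(scores) - min(scores)
s = statistics.stdev(scores)  # n - 1 in the denominator

print(f"mean = {mean:.1f}, median = {median:.1f}, range = {value_range:.1f}, s = {s:.1f}")
# prints: mean = 108.6, median = 108.3, range = 18.1, s = 7.1
```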

Gaussian Curve
4.) Relating Terms Back to the Gaussian Curve
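(i) Formula for a Gaussian curve
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$$
where e = base of the natural logarithm (2.71828...)
μ ≈ x̄ (mean)
σ ≈ s (standard deviation)
[Figure: Gaussian curve with the mean and the standard deviation marked. The entire area under the curve is normalized to one.]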
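The normalization claim can be checked numerically. This sketch evaluates the Gaussian formula above and integrates it with the trapezoid rule. The numerical method and the illustrative μ and σ (the bowling example's mean and s) are choices made here, not part of the slides. The total area comes out as 1, and the area within ±1σ is about 0.683.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian (normal) error curve, normalized so the total area is 1."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area(mu, sigma, lo, hi, steps=100_000):
    """Trapezoid-rule estimate of the area under the curve between lo and hi."""
    h = (hi - lo) / steps
    total = 0.5 * (gaussian(lo, mu, sigma) + gaussian(hi, mu, sigma))
    total += sum(gaussian(lo + i * h, mu, sigma) for i in range(1, steps))
    return total * h

mu, sigma = 108.6, 7.1  # illustrative values
total_area = area(mu, sigma, mu - 10 * sigma, mu + 10 * sigma)  # ±10σ covers essentially all
one_sigma_area = area(mu, sigma, mu - sigma, mu + sigma)
print(f"total area = {total_area:.4f}, area within ±1σ = {one_sigma_area:.4f}")
```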

Standard Deviation and Probability
1.) By knowing the standard deviation (s) and the mean (x̄) of a set of results (and the corresponding Gaussian curve):
(i) The probability that the next result will fall in any given range can be calculated.
(ii) The probability of a result falling in a given portion of the Gaussian curve is equal to the normalized area of the curve in that portion.
(iii) Example
68.3% of the area of a Gaussian curve lies between −1s and +1s (x̄ ± 1s).
Thus, any new result has a 68.3% chance of falling within this range.

[Figure: the probability of measuring a value in a certain range is equal to the area under the curve over that range.]
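Instead of integrating every time, the area between two values can be taken from the Gaussian cumulative distribution. This can be written with the error function `math.erf` from Python's standard library. This is a sketch of the "probability = area in that range" idea, using the x̄ ± 1s example.

```python
import math

def normal_cdf(x, mean, s):
    """Fraction of the Gaussian area lying below x."""
    return 0.5 * (1 + math.erf((x - mean) / (s * math.sqrt(2))))

def probability_between(a, b, mean, s):
    """Probability that a new result falls between a and b = area in that range."""
    return normal_cdf(b, mean, s) - normal_cdf(a, mean, s)

mean, s = 0.0, 1.0  # any mean and s give the same answer for x̄ ± 1s
p = probability_between(mean - s, mean + s, mean, s)
print(f"P(x̄ - 1s < x < x̄ + 1s) = {p:.3f}")
```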
Standard Deviation and Probability

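- The z table gives the area under the curve between the mean and a result, where z = (x − x̄)/s is the distance of the result from the mean in units of standard deviation.
- Half of the total area (0.5) lies on each side of the mean, so the remaining area beyond z is 0.5 − Area.
Example
z = 1.3: area from the mean to 1.3 is 0.403
area from 1.3 to infinity is 0.5 − 0.403 = 0.097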
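The z-table lookup can be reproduced with `math.erf`. This sketch computes the area from the mean to z and the remaining tail area for the z = 1.3 example.

```python
import math

def area_mean_to_z(z):
    """Area under the standard Gaussian between the mean (z = 0) and z, as in a z table."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

z = 1.3
a = area_mean_to_z(z)
tail = 0.5 - a  # half of the total area lies on each side of the mean
print(f"area from mean to z = {a:.3f}, area beyond z = {tail:.3f}")
# prints: area from mean to z = 0.403, area beyond z = 0.097
```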


Standard Deviation and Probability
(iii) Example
A bowler has a mean score of 108.6 and a standard deviation of 7.1.
What fraction of the bowler's scores will be less than 80.2?
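A sketch of the bowling calculation follows. It converts the score to z and takes the lower-tail area with `math.erfc` instead of a printed z table. The score 80.2 lies 4.0 standard deviations below the mean, so only about 0.003% of the scores are expected below it.

```python
import math

mean, s = 108.6, 7.1
x = 80.2

z = (x - mean) / s                                   # distance from the mean in units of s
fraction_below = 0.5 * math.erfc(-z / math.sqrt(2))  # Gaussian area below x

print(f"z = {z:.1f}, fraction below {x} = {fraction_below:.1e}")
```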


Standard Deviation and Probability
2.) The standard deviation (s) of a data set indicates the precision of a measurement.
(i) Common intervals used for expressing analytical results are shown below.
(ii) The precision of many analytical measurements is expressed as
There is only a 5% chance (1 out of 20) that any given measurement on the sample will be outside this range.
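The table of common intervals is not included in this transcript. As a hedged illustration, the fraction of a Gaussian population inside x̄ ± k·s can be computed for a few commonly used multiples k. The list of k values is chosen here and is not taken from the slides. The output shows that ±1.96s leaves 5% (1 in 20) outside.

```python
import math

def fraction_within(k):
    """Fraction of a Gaussian population lying within x̄ ± k·s."""
    return math.erf(k / math.sqrt(2))

for k in (1.0, 1.96, 2.0, 2.58, 3.0):
    inside = fraction_within(k)
    print(f"x̄ ± {k}s: {inside:.1%} inside, {1 - inside:.1%} outside")
```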


Standard Deviation and Probability
4.) The precision of a mean (average) result is expressed using a confidence interval.
(i) The relationship between the true mean value (μ) and the measured mean (x̄) is given by
$$\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}$$
where s = standard deviation
n = number of measurements
t = Student's t value, taken at n − 1 degrees of freedom
The range x̄ ± ts/√n is the confidence interval.
Note: As n increases, the confidence interval becomes smaller (μ becomes more precisely known).
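The note about n can be made concrete. This sketch holds a hypothetical s = 1.0 fixed and computes the half-width ts/√n for growing n. It uses two-tailed 95% Student's t values copied from a standard t table. Both t and 1/√n shrink as n grows, so the interval narrows.

```python
import math

# Two-tailed Student's t at 95% confidence, from a standard t table,
# keyed by degrees of freedom (n - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def ci_half_width(s, n, t_table=T_95):
    """Half-width t*s/sqrt(n) of the confidence interval mu = x̄ ± t*s/sqrt(n)."""
    return t_table[n - 1] * s / math.sqrt(n)

s = 1.0  # hypothetical standard deviation, held fixed to isolate the effect of n
widths = {n: ci_half_width(s, n) for n in range(2, 11)}
for n, w in widths.items():
    print(f"n = {n:2d}: x̄ ± {w:.3f}")
```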

Standard Deviation and Probability
4.) The precision of a mean (average) result is expressed using a confidence interval.
(ii) Student's t
A statistical tool frequently used to express confidence intervals.
Its value depends on the number of degrees of freedom, n − 1, obtained from the number of measurements.
It is a probability distribution that addresses the problem of estimating the mean of a normally distributed population when the sample size is small.
The population standard deviation (σ) is unknown and has to be estimated from the data using s.
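Table values of Student's t can be reproduced from the t probability density. This sketch uses only the standard library. It integrates the density with Simpson's rule and finds, by bisection, the t that encloses 95% of the area. The numerical approach is a choice made here, not the presentation's method. The critical value is large for few degrees of freedom and approaches the Gaussian 1.96 as the degrees of freedom grow.

```python
import math

def t_pdf(t, dof):
    """Student's t probability density for the given degrees of freedom."""
    log_c = math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
    c = math.exp(log_c) / math.sqrt(dof * math.pi)
    return c * (1 + t * t / dof) ** (-(dof + 1) / 2)

def central_area(t, dof, steps=1000):
    """Area under the t curve between -t and +t, by Simpson's rule (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 2 * total * h / 3  # doubled because the curve is symmetric about 0

def t_critical(confidence, dof):
    """Two-tailed t value whose central area equals `confidence` (bisection search)."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if central_area(mid, dof) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit = {dof: t_critical(0.95, dof) for dof in (1, 4, 7, 30, 1000)}
for dof, t in crit.items():
    print(f"{dof:5d} degrees of freedom: t = {t:.3f}")
```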

Standard Deviation and Probability
4.) The precision of a mean (average) result is expressed using a confidence interval.
(iii) The meaning of a confidence interval
Determining the true mean would require collecting an infinite number of data points, which is obviously not possible.
A confidence interval is constructed so that, over many repeated experiments, a stated percentage of the intervals contain the true mean.
50% confidence interval: the range of numbers contains the true mean only 50% of the time.
90% confidence interval: the range of numbers contains the true mean 90% of the time.

[Figure: confidence intervals from repeated data sets plotted against the true mean. For 50% confidence intervals, 50% of the data sets do not contain the true mean.]
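The "percentage of the time" meaning can be checked by simulation. This sketch draws many hypothetical data sets of n = 5 from a known Gaussian population. The population values are assumptions made for illustration. It builds a 90% confidence interval for each set, with t = 2.132 for 4 degrees of freedom from a standard t table, and counts how often the interval contains the true mean. The result should be close to 90%.

```python
import math
import random
import statistics

random.seed(42)
TRUE_MEAN, SIGMA = 100.0, 5.0  # hypothetical population
N = 5                          # measurements per data set
T_90_DOF4 = 2.132              # two-tailed Student's t, 90% confidence, 4 degrees of freedom
TRIALS = 10_000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    half_width = T_90_DOF4 * statistics.stdev(sample) / math.sqrt(N)
    if x_bar - half_width <= TRUE_MEAN <= x_bar + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the 90% confidence intervals contain the true mean")
```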

Standard Deviation and Probability
(iii) Example
For the following bowling scores: 116.0, 97.9, 114.2, 106.8 and 108.3, the bowler has a mean score of 108.6 and a standard deviation of 7.1.
What is the 90% confidence interval for the mean?
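A sketch of the bowling confidence interval follows. It uses μ = x̄ ± ts/√n with t = 2.132 (90% confidence, n − 1 = 4 degrees of freedom, from a standard t table). The code uses the unrounded s ≈ 7.14. With the rounded s = 7.1 the half-width is about 6.77, so both round to μ = 108.6 ± 6.8.

```python
import math
import statistics

scores = [116.0, 97.9, 114.2, 106.8, 108.3]
n = len(scores)
x_bar = statistics.mean(scores)
s = statistics.stdev(scores)
t_90 = 2.132  # Student's t, 90% confidence, n - 1 = 4 degrees of freedom

half_width = t_90 * s / math.sqrt(n)
print(f"mu = {x_bar:.1f} ± {half_width:.1f}")
# prints: mu = 108.6 ± 6.8
```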


Standard Deviation and Probability
5.) Comparison of Two Data Sets
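(i) To determine whether two results obtained by the same method are statistically the same, use the following formulas to determine a calculated t:
$$t_{\text{calculated}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
$$s_{\text{pooled}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$
where x̄1, x̄2 = mean results of samples 1 and 2
n1, n2 = number of measurements of samples 1 and 2
s1, s2 = standard deviations of samples 1 and 2
s_pooled = pooled standard deviation
This test requires the standard deviations of the two data sets to be similar.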
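The pooled standard deviation weights each set's variance by its degrees of freedom. Equivalently, it adds the squared deviations of both sets about their own means and divides by n1 + n2 − 2. This sketch computes both forms on two small hypothetical data sets to show they agree.

```python
import math
import statistics

def pooled_sd(set1, set2):
    """s_pooled = sqrt((s1^2 (n1 - 1) + s2^2 (n2 - 1)) / (n1 + n2 - 2))."""
    n1, n2 = len(set1), len(set2)
    s1, s2 = statistics.stdev(set1), statistics.stdev(set2)
    return math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))

def pooled_sd_from_deviations(set1, set2):
    """Same quantity written as the summed squared deviations of both sets."""
    m1, m2 = statistics.mean(set1), statistics.mean(set2)
    ss = sum((x - m1) ** 2 for x in set1) + sum((x - m2) ** 2 for x in set2)
    return math.sqrt(ss / (len(set1) + len(set2) - 2))

a = [10.1, 10.4, 9.8, 10.2]  # hypothetical data set 1
b = [9.6, 9.9, 10.0]         # hypothetical data set 2
print(pooled_sd(a, b), pooled_sd_from_deviations(a, b))
```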

Standard Deviation and Probability
5.) Comparison of Two Data Sets
(ii) Compare the calculated t to the corresponding value in the Student's t probability table, using the desired confidence level at the appropriate degrees of freedom:
Degrees of freedom = n1 + n2 − 2
(iii) If the calculated t is greater than the value in the Student's t probability table, then the two results are significantly different at the given confidence level.
This is easier to achieve at a lower confidence level.
For the two results to be considered the same, the calculated t needs to be less than the table value.

Standard Deviation and Probability
5.) Comparison of Two Data Sets
(iv) Example

The amount of 14CO2 in a plant sample is measured to be 28, 32, 27, 39 and 40 counts/min (mean = 33.2). The amount of radioactivity in a blank is found to be 28, 21, 28 and 20 counts/min (mean = 24.2). Are the mean values significantly different at the 95% confidence level?

Standard Deviation and Probability
5.) Comparison of Two Data Sets
(iv) Example

Degrees of freedom = 5 + 4 − 2 = 7
From the Student's t probability table at 7 degrees of freedom and the 95% confidence level: t = 2.365
Calculated t (2.48) > 2.365, so the results are significantly different at the 95% confidence level, but not at the 98% or higher confidence levels.
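The whole comparison can be scripted. This sketch computes s_pooled, the calculated t and the degrees of freedom for the 14CO2 data. It compares the result with the table value 2.365 for 7 degrees of freedom at 95%. With the unrounded blank mean of 24.25 the script gives t ≈ 2.47, in line with the slide's 2.48, and the same conclusion.

```python
import math
import statistics

sample = [28, 32, 27, 39, 40]  # 14CO2 counts/min in the plant sample
blank = [28, 21, 28, 20]       # counts/min in the blank

n1, n2 = len(sample), len(blank)
m1, m2 = statistics.mean(sample), statistics.mean(blank)
s1, s2 = statistics.stdev(sample), statistics.stdev(blank)

s_pooled = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))
t_calc = abs(m1 - m2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
dof = n1 + n2 - 2
t_table_95 = 2.365  # Student's t, 95% confidence, 7 degrees of freedom

print(f"t_calc = {t_calc:.2f}, degrees of freedom = {dof}")
print("significantly different" if t_calc > t_table_95 else "not significantly different")
```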

Standard Deviation and Probability
6.) Comparison of Two Methods
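(i) To determine whether two methods give the same results for the same samples, use the following formula to determine a calculated t:
$$t_{\text{calculated}} = \frac{|\bar{d}|}{s_d}\sqrt{n}, \qquad s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}}$$
where d_i = difference between the results of the two methods for sample i
d̄ = mean of the differences (equal to the difference in the mean values of the two methods)
n = number of samples analyzed by each method
s_d = standard deviation of the differences
(ii) Degrees of freedom = n − 1
(iii) If the calculated t is greater than the value in the Student's t probability table, then the two methods are significantly different at the given confidence level.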
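The cholesterol data table is not included in this transcript, so this sketch of the paired comparison uses hypothetical results for three samples. The values are chosen so that the differences are 1, 2 and 3. It forms the per-sample differences, computes t = |d̄|√n / s_d with n − 1 degrees of freedom, and compares against the two-tailed 95% table value for 2 degrees of freedom (4.303).

```python
import math
import statistics

def paired_t(method_a, method_b):
    """t_calc = |d̄| / s_d * sqrt(n) for two methods applied to the same n samples."""
    if len(method_a) != len(method_b):
        raise ValueError("each sample must be analyzed by both methods")
    d = [a - b for a, b in zip(method_a, method_b)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    return abs(d_bar) / s_d * math.sqrt(n), n - 1

# Hypothetical results (not from the presentation).
method_a = [11.0, 22.0, 33.0]
method_b = [10.0, 20.0, 30.0]
t_calc, dof = paired_t(method_a, method_b)
t_table_95 = 4.303  # Student's t, 95% confidence, 2 degrees of freedom
print(f"t_calc = {t_calc:.3f} with {dof} degrees of freedom")
print("methods differ" if t_calc > t_table_95 else "no significant difference")
```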


Standard Deviation and Probability
6.) Comparison of Two Methods
(iv) Example

Two methods for measuring cholesterol in blood provide the following results. Are these methods significantly different at the 95% confidence level?

Standard Deviation and Probability
6.) Comparison of Two Methods
(iv) Example

Degrees of freedom = 6 − 1 = 5
From the Student's t probability table at the 95% confidence level: t = 2.571
Calculated t (1.20) < 2.571, so the results are not significantly different at the 95% confidence level.

Dealing with Bad Data
1.) Q Test
(i) A method used to decide whether or not to reject a bad data point.
(ii) Procedure
1. Arrange the data in order of increasing value.
2. Determine the lowest and highest values and the total range of values.
Example: 12.47 12.48 12.53 12.56 12.67
Questionable point: 12.67; range = 12.67 − 12.47 = 0.20
3. Determine the difference (gap) between the bad data point and the nearest value.
Gap = 12.67 − 12.56 = 0.11
- Calculate the Q value:
$$Q_{\text{calculated}} = \frac{\text{gap}}{\text{range}} = \frac{0.11}{0.20} = 0.55$$

Dealing with Bad Data
1.) Q Test
(ii) Procedure
4. Compare the calculated Q value to the tabulated values at the same value of n and the desired confidence level.
- n = total number of values or observations
- For example, at n = 5 and 90% confidence, the value of Q is 0.64.
- Since Q (calculated) < Q (table), that is, 0.55 < 0.64,
- the data point 12.67 cannot be rejected at the 90% confidence level.
(iii) Although the Q test is valuable for eliminating bad data, common sense and repeating experiments with questionable results are usually more helpful.
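The four steps above can be sketched as a short function. It tests the highest value of a data set against 90% critical Q values. Only the n = 5 entry (0.64) appears in the slides. The other entries are commonly tabulated 90% values (Dean and Dixon) and are included here as an assumption for other data set sizes.

```python
# Critical Q at 90% confidence, keyed by the number of observations n.
Q_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test_high(values, q_table=Q_90):
    """Q test on the highest value: returns (Q_calc, Q_table, reject?)."""
    data = sorted(values)               # step 1: increasing order
    rng = data[-1] - data[0]            # step 2: total range
    gap = data[-1] - data[-2]           # step 3: gap to the nearest value
    q_calc = gap / rng
    q_crit = q_table[len(data)]         # step 4: table value at the same n
    return q_calc, q_crit, q_calc > q_crit

q_calc, q_crit, reject = q_test_high([12.47, 12.48, 12.53, 12.56, 12.67])
print(f"Q = {q_calc:.2f}, Q_table = {q_crit}, reject 12.67: {reject}")
```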


Dealing with Bad Data
1.) Q Test
(ii) Example
For the following bowling scores 116.0, 97.9,
114.2, 106.8 and 108.3, a bowler has a mean score
of 108.6 and a standard deviation of 7.1.
Using the Q test, decide whether the number 97.9
should be discarded.
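The suspect bowling score is the lowest value, so the gap is measured to the next-lowest score. This sketch generalizes the Q test to a suspect at either end of the data. It reuses the 90% Q table, where only the n = 5 value comes from the slides. For 97.9, Q = 8.9/18.1 ≈ 0.49, which is below 0.64, so the score is kept at 90% confidence.

```python
# Critical Q at 90% confidence, keyed by n (only n = 5 is given in the slides).
Q_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(values, suspect, q_table=Q_90):
    """Q test for a suspect value at either end of the data set: (Q_calc, reject?)."""
    data = sorted(values)
    if suspect == data[0]:
        gap = data[1] - data[0]
    elif suspect == data[-1]:
        gap = data[-1] - data[-2]
    else:
        raise ValueError("the Q test applies only to the lowest or highest value")
    q_calc = gap / (data[-1] - data[0])
    return q_calc, q_calc > q_table[len(data)]

scores = [116.0, 97.9, 114.2, 106.8, 108.3]
q_calc, reject = q_test(scores, 97.9)
print(f"Q = {q_calc:.2f}, discard 97.9: {reject}")
```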
