Statistics - PowerPoint PPT Presentation

Provided by: Robert80
1
Statistics
  • Introduction
  • 1.) All measurements contain random error
  • results always have some uncertainty
  • 2.) Uncertainties are used to determine whether two or
    more experimental results are equivalent or
    different
  • Statistics is used to accomplish this task

Is the mutant (transgenic) mouse significantly
fatter than the normal (wild-type) mouse?
Statistical Methods Provide Unbiased Means to
Answer Such Questions.
Masuzaki, H., et al. Science (2001), 294(5549),
2166
2
Statistics
  • Gaussian Curve
  • 1.) For a series of experimental results with
    only random error
  • (i) A large number of experiments done under
    identical conditions will yield a distribution of
    results.
  • (ii) Distribution of results is described by a
    Gaussian or Normal Error Curve

[Figure: Gaussian curve of Number of Occurrences versus Value, with a high population near the correct value and a low population far from it.]
3
Statistics
  • Gaussian Curve
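  • 2.) Any set of data (and corresponding Gaussian
    curve) can be characterized by two parameters
  • (i) Mean or Average Value ($\bar{x}$)
  • $\bar{x} = \frac{\sum_i x_i}{n} = \frac{\text{value}_1 + \text{value}_2 + \text{value}_3 + \cdots + \text{value}_n}{n}$
  • where
  • n = number of data points
  • $x_i$ = value of data point number i
  • (ii) Standard Deviation (s)
  • $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$

The smaller the standard deviation, the more precise
the measurement.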
4
Statistics
  • Gaussian Curve
  • 3.) Other Terms Used to Describe a Data Set
  • (i) Variance: related to the standard deviation
  • Used to describe how wide or precise a
    distribution of results is
  • variance = $s^2$
  • where s = standard deviation
  • (ii) Range: difference between the highest and lowest
    values in a set of data
  • Example: measurements of 4 light bulb lifetimes
  • 821, 783, 834, 855 hours
  • High Value = 855 hours
  • Low Value = 783 hours
  • Range = High Value - Low Value = 855 - 783 = 72 hours
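The variance and range of the light bulb data can be checked with a short Python sketch. It assumes the sample form of the variance (n - 1 in the denominator), which is what the standard library's `statistics.variance` and `statistics.stdev` compute; the variable names are illustrative only.

```python
from statistics import mean, stdev, variance

# Light bulb lifetimes in hours, taken from the example above
lifetimes = [821.0, 783.0, 834.0, 855.0]

s = stdev(lifetimes)        # sample standard deviation (n - 1 denominator)
var = variance(lifetimes)   # sample variance, equal to s squared
data_range = max(lifetimes) - min(lifetimes)  # high value minus low value

print(f"mean     = {mean(lifetimes):.2f} hours")
print(f"s        = {s:.1f} hours")
print(f"variance = {var:.2f} hours^2")
print(f"range    = {data_range:.0f} hours")
```

The range printed here matches the 72 hours obtained by hand from 855 - 783.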

5
Statistics
  • Gaussian Curve
  • 3.) Other Terms Used to Describe a Data Set
  • (iii) Median: the value in a set of data which
    has an equal number of data values above it and
    below it
  • For an odd number of data points, the median is
    the middle value
  • For an even number of data points, the median is the
    value halfway between the two middle values
  • Example
  • Data Set: 1.19, 1.23, 1.25, 1.45, 1.51
    mean ($\bar{x}$) = 1.33
  • median = 1.25
  • Data Set: 1.19, 1.23, 1.25, 1.45
    mean ($\bar{x}$) = 1.28
  • median = 1.24

6
Statistics
  • Gaussian Curve
  • (iii) Example

For the following bowling scores 116.0, 97.9,
114.2, 106.8 and 108.3, find the mean, median,
range and standard deviation.
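One way to work this example is sketched below in Python, using only the standard library. It assumes the sample standard deviation (n - 1 in the denominator); the variable names are chosen here for illustration.

```python
from statistics import mean, median, stdev

# Bowling scores from the example above
scores = [116.0, 97.9, 114.2, 106.8, 108.3]

x_bar = mean(scores)                     # average value
med = median(scores)                     # middle value of the sorted data (n is odd)
score_range = max(scores) - min(scores)  # highest minus lowest score
s = stdev(scores)                        # sample standard deviation (n - 1 denominator)

print(f"mean   = {x_bar:.1f}")
print(f"median = {med:.1f}")
print(f"range  = {score_range:.1f}")
print(f"s      = {s:.1f}")
```

Rounded to one decimal place, this gives a mean of 108.6 and a standard deviation of 7.1, the values used for this bowler in the later examples.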
7
Statistics
  • Gaussian Curve
  • 4.) Relating Terms Back to the Gaussian Curve
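  • (i) Formula for a Gaussian curve
  • $y = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}$
  • where e = base of natural logarithm (2.71828)
  • $\mu \approx \bar{x}$ (mean)
  • $\sigma \approx s$ (standard deviation)

[Figure: Gaussian curve with the mean and the standard deviation marked.]
Entire area under curve is normalized to one.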
8
Statistics
  • Standard Deviation and Probability
  • 1.) By knowing the standard deviation (s) and the
    mean ($\bar{x}$) of a set of results (and the
    corresponding Gaussian curve)
  • (i) The probability of the next result falling
    in any given range can be calculated
  • (ii) The probability of a result falling in that
    portion of the Gaussian curve is equal to the
    normalized area of the curve in that portion.
  • (iii) Example
  • 68.3% of the area of a Gaussian curve occurs
    between the values $\bar{x} - 1s$ and $\bar{x} + 1s$
    ($\bar{x} \pm 1s$)
  • Thus, any new result has a 68.3% chance of
    falling within this range.

The probability of measuring a value in a certain
range is equal to the area under the curve over that range.
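The area of a Gaussian curve within a given number of standard deviations of the mean can be computed from the error function. The sketch below assumes an ideal Gaussian distribution; `area_within` is a helper name introduced here for illustration, not something from the slides.

```python
import math


def area_within(k: float) -> float:
    """Fraction of a Gaussian's area between mean - k*s and mean + k*s."""
    # For a normal distribution, P(|z| < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/-{k}s: {100 * area_within(k):.1f}% of the area")
```

For k = 1 this reproduces the 68.3% figure quoted above.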
9
Statistics
Standard Deviation and Probability
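  • - Tabulated area: the area under the curve between the mean
    value and the result, where the result is expressed as
    $z = (x - \bar{x})/s$.
  • Total area of each half of the curve is 0.5.
  • Remaining area beyond the result is 0.5 - Area.
  • Example
  • z = 1.3 → area from mean to 1.3 is 0.403
  • → area from 1.3 to infinity is 0.5 -
    0.403 = 0.097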

10
Statistics
  • Standard Deviation and Probability
  • (iii) Example
  • A bowler has a mean score of 108.6 and a
    standard deviation of 7.1.
  • What fraction of the bowler's scores will be
    less than 80.2?
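A possible way to answer this question is sketched below in Python. It assumes the scores follow a Gaussian distribution and that the measured mean and standard deviation can stand in for the true values; `statistics.NormalDist` is part of the standard library (Python 3.8 and later).

```python
from statistics import NormalDist

mean_score = 108.6   # bowler's mean score
std_dev = 7.1        # bowler's standard deviation
threshold = 80.2     # score of interest

# Number of standard deviations between the threshold and the mean
z = (threshold - mean_score) / std_dev

# Area under the Gaussian curve below the threshold
fraction_below = NormalDist(mu=mean_score, sigma=std_dev).cdf(threshold)

print(f"z = {z:.1f}")
print(f"fraction of scores below {threshold}: {fraction_below:.1e}")
```

The threshold lies 4 standard deviations below the mean, so only about 3 scores in 100,000 would be expected to fall below it.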

11
Statistics
  • Standard Deviation and Probability
  • 2.) Knowing the standard deviation (s) of a data
    set indicates the precision of a measurement
  • (i) Common intervals used for expressing
    analytical results are
  • $\bar{x} \pm 1s$ (68.3% of measurements), $\bar{x} \pm 2s$ (95.4%),
    $\bar{x} \pm 3s$ (99.7%)
  • (ii) The precision of many analytical
    measurements is expressed as $\bar{x} \pm 2s$
  • There is only about a 5% chance (1 out of 20) that any
    given measurement on the sample will be outside
    of this range

12
Statistics
  • Standard Deviation and Probability
  • 4.) The precision of a mean (average) result is
    expressed using a confidence interval
  • (i) Relationship between the true mean value ($\mu$)
    and the measured mean ($\bar{x}$) is given by
  • $\mu = \bar{x} \pm \frac{t s}{\sqrt{n}}$
  • where s = standard deviation
  • n = number of measurements
  • t = Student's t value
  • degrees of freedom = (n - 1)

Confidence interval: $\bar{x} \pm t s / \sqrt{n}$
Note: As n increases, the confidence interval
becomes smaller ($\mu$
becomes more precisely known)
13
Statistics
  • Standard Deviation and Probability
  • 4.) The precision of a mean (average) result is
    expressed using a confidence interval
  • (ii) Student's t
  • Statistical tool frequently used to express
    confidence intervals

Degrees of freedom come from the number of measurements (n - 1).
A probability distribution that addresses the
problem of estimating the mean of a normally
distributed population when the sample size is
small.
Population standard deviation ($\sigma$) is unknown and
has to be estimated from the data using s.
14
Statistics
  • Standard Deviation and Probability
  • 4.) The precision of a mean (average) result is
    expressed using a confidence interval
  • (iii) The meaning of Confidence Interval
  • To determine the true mean, an infinite number
    of data points would need to be collected.
  • - obviously not possible
  • Confidence interval tells us the probability that
    the range of numbers contains the true mean.
  • 50% confidence interval → range of numbers
    contains the true mean only 50% of the time
  • 90% confidence interval → range of numbers
    contains the true mean 90% of the time.

[Figure: 50% confidence intervals from repeated data sets plotted against the true mean.]
The 50% confidence intervals of half of the data sets do not contain the true mean.
15
Statistics
  • Standard Deviation and Probability
  • (iii) Example
  • For the following bowling scores 116.0, 97.9,
    114.2, 106.8 and 108.3, a bowler has a mean score
    of 108.6 and a standard deviation of 7.1.
  • What is the 90% confidence interval for the
    mean?
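A worked version of this example is sketched below in Python, using the confidence interval in the form mean ± t·s/√n. The t value of 2.132 is the standard Student's t table entry for 90% confidence and 4 degrees of freedom; it is typed in by hand here as an assumption, since the slide's own table is not part of this transcript.

```python
import math

x_bar = 108.6    # mean bowling score
s = 7.1          # standard deviation
n = 5            # number of scores
t_table = 2.132  # Student's t for 90% confidence, n - 1 = 4 degrees of freedom

# Half-width of the confidence interval: t * s / sqrt(n)
half_width = t_table * s / math.sqrt(n)

lower = x_bar - half_width
upper = x_bar + half_width
print(f"90% confidence interval: {x_bar} +/- {half_width:.1f}")
print(f"range: {lower:.1f} to {upper:.1f}")
```

Under these assumptions the interval comes out to roughly 108.6 ± 6.8.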

16
Statistics
  • Standard Deviation and Probability
  • 5.) Comparison of Two Data Sets
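  • (i) To determine if two results obtained by the
    same method are statistically the same, use the
    following formula to determine a calculated t
  • $t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$
  • where
  • $\bar{x}_1, \bar{x}_2$ = mean results of samples 1 and 2
  • $n_1, n_2$ = number of measurements of samples 1
    and 2
  • $s_{pooled}$ = pooled standard deviation,
    $s_{pooled} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$

Requires that the standard deviations of the two data
sets be similar.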
17
Statistics
  • Standard Deviation and Probability
  • 5.) Comparison of Two Data Sets
  • (ii) Compare calculated t to the corresponding
    value in the Student's t probability table.
  • Use the desired confidence level at the
    appropriate degrees of freedom
  • Degrees of Freedom = ($n_1 + n_2 - 2$)
  • (iii) If calculated t is greater than the value
    in the Student's t probability table, then the
    two results are significantly different at the
    given confidence level.
  • Easier to achieve for a lower confidence level

For the two results to be considered the same, calculated t needs to be less than the table value.
18
Statistics
  • Standard Deviation and Probability
  • 5.) Comparison of Two Data Sets
  • (iv) Example

The amount of 14CO2 in a plant sample is measured
to be 28, 32, 27, 39, and 40 counts/min (mean =
33.2). The amount of radioactivity in a blank is
found to be 28, 21, 28, and 20 counts/min (mean =
24.2). Are the mean values significantly
different at a 95% confidence level?
19
Statistics
  • Standard Deviation and Probability
  • 5.) Comparison of Two Data Sets
  • (iv) Example

Degrees of Freedom = (5 + 4 - 2) = 7
From Student's t probability table:
Degrees of Freedom (7),
95% confidence level → t = 2.365
Calculated t (2.48) > 2.365. The results are
significantly different at a 95% confidence
level, but not at 98% or higher confidence levels.
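The calculation behind this example can be sketched in Python as follows. It assumes the usual pooled-standard-deviation form of the t test, t = |mean1 - mean2| / s_pooled × √(n1·n2 / (n1 + n2)) with s_pooled = √((s1²(n1 - 1) + s2²(n2 - 1)) / (n1 + n2 - 2)); the table value 2.365 is the one quoted above, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

sample = [28.0, 32.0, 27.0, 39.0, 40.0]  # plant sample, counts/min
blank = [28.0, 21.0, 28.0, 20.0]         # blank, counts/min

n1, n2 = len(sample), len(blank)
s1, s2 = stdev(sample), stdev(blank)     # sample standard deviations

# Pooled standard deviation of the two data sets
s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))

# Calculated t for comparing the two means
t_calc = abs(mean(sample) - mean(blank)) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))

dof = n1 + n2 - 2
t_table = 2.365  # Student's t at 95% confidence, 7 degrees of freedom

print(f"degrees of freedom = {dof}")
print(f"calculated t = {t_calc:.2f}")
print("significantly different" if t_calc > t_table else "not significantly different")
```

With unrounded means and standard deviations the calculated t comes out near 2.47, slightly below the 2.48 quoted above; the small difference is presumably due to rounding of intermediate values, and the conclusion at the 95% confidence level is the same.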
20
Statistics
  • Standard Deviation and Probability
  • 6.) Comparison of Two Methods
  • (i) To determine if the results of two methods
    for the same sample are the same, use the
    following formula to determine a calculated t
  • $t_{calculated} = \frac{|\bar{d}|}{s_d} \sqrt{n}$
  • where
  • $\bar{d}$ = difference in the mean values of the
    two methods
  • n = number of samples analyzed by each
    method
  • $s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}}$, the standard deviation
    of the differences $d_i$ between the two methods for each sample
  • (ii) Degrees of Freedom = (n - 1)
  • (iii) If calculated t is greater than the value
    in the Student's t probability table, then the
    two methods are significantly different at the
    given confidence level.
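A minimal Python sketch of this comparison is given below. It assumes the paired form of the t test, t = |mean difference| / s_d × √n, where s_d is the standard deviation of the per-sample differences. The data values are hypothetical, made up purely to illustrate the arithmetic, and the table value 3.182 is the standard Student's t entry for 95% confidence and 3 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Hypothetical results for four samples, each analyzed by both methods
method_a = [10.0, 12.0, 15.0, 11.0]
method_b = [9.0, 13.0, 13.0, 11.0]

# Difference between the two methods for each sample
differences = [a - b for a, b in zip(method_a, method_b)]

n = len(differences)
d_bar = mean(differences)   # mean difference
s_d = stdev(differences)    # standard deviation of the differences (n - 1 denominator)

t_calc = abs(d_bar) / s_d * math.sqrt(n)
dof = n - 1
t_table = 3.182  # Student's t at 95% confidence, 3 degrees of freedom

print(f"degrees of freedom = {dof}")
print(f"calculated t = {t_calc:.2f}")
print("significantly different" if t_calc > t_table else "not significantly different")
```

For these made-up numbers the calculated t is well below the table value, so the two hypothetical methods would not be judged significantly different at the 95% confidence level.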

21
Statistics
  • Standard Deviation and Probability
  • 6.) Comparison of Two Methods
  • (iv) Example

Two methods for measuring cholesterol in blood
provide the following results.
[Table: cholesterol results from the two methods]
Are these methods significantly different at the 95%
confidence level?
22
Statistics
  • Standard Deviation and Probability
  • 6.) Comparison of Two Methods
  • (iv) Example

Degrees of Freedom = (6 - 1) = 5
95% confidence level → t = 2.571
Calculated t (1.20) < 2.571. The results are not
significantly different at a 95% confidence level.
23
Statistics
  • Dealing with Bad Data
  • 1.) Q Test
  • (i) Method used to decide whether or not to
    reject a bad data point.
  • (ii) Procedure
  • 1. Arrange data in order of increasing value.
  • 2. Determine the lowest and highest values and the
    total range of values.
  • Example
  • 12.47 12.48 12.53 12.56 12.67
  • Questionable point = 12.67
  • Range = 12.67 - 12.47 = 0.20
  • 3. Determine the difference (gap) between the bad data
    point and the nearest value.
  • gap = 12.67 - 12.56 = 0.11
  • - Calculate the Q value
  • Q = gap / range = 0.11 / 0.20 = 0.55
24
Statistics
  • Dealing with Bad Data
  • 1.) Q Test
  • (ii) Procedure
  • 4. Compare the calculated Q value to those in
    tables at the same value of n and the desired
    confidence level.
  • - n = total number of values or observations
  • - For example, at n = 5 and 90% confidence, the
    value of Q is 0.64
  • - Since
  • Q (calculated) < Q (table)
  • 0.55 < 0.64
  • - data point 12.67 cannot be rejected at the
    90% confidence level
  • (iii) Although the Q-test is valuable in
    eliminating bad data, common sense and repeating
    experiments with questionable results are usually
    more helpful.

25
Statistics
  • Dealing with Bad Data
  • 1.) Q Test
  • (ii) Example
  • For the following bowling scores 116.0, 97.9,
    114.2, 106.8 and 108.3, a bowler has a mean score
    of 108.6 and a standard deviation of 7.1.
  • Using the Q test, decide whether the number 97.9
    should be discarded.
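The Q test procedure can be sketched in Python as below, taking Q as the gap between the questionable point and its nearest neighbor divided by the total range. The function name `q_value` is introduced here for illustration, and the table value of 0.64 (n = 5, 90% confidence) is the one quoted in the procedure above.

```python
def q_value(values, suspect):
    """Return Q = gap / range for a suspect point at either end of the data."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if suspect == ordered[0]:
        gap = ordered[1] - ordered[0]      # distance to the next-lowest value
    elif suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]    # distance to the next-highest value
    else:
        raise ValueError("suspect must be the lowest or highest value")
    return gap / data_range


Q_TABLE = 0.64  # table value for n = 5 at 90% confidence

# Data set from the procedure, questionable point 12.67
q_example = q_value([12.47, 12.48, 12.53, 12.56, 12.67], 12.67)

# Bowling scores, questionable point 97.9
q_bowling = q_value([116.0, 97.9, 114.2, 106.8, 108.3], 97.9)

for label, q in (("12.67", q_example), ("97.9", q_bowling)):
    verdict = "reject" if q > Q_TABLE else "keep"
    print(f"point {label}: Q = {q:.2f} -> {verdict}")
```

For the bowling scores the gap is 106.8 - 97.9 = 8.9 and the range is 18.1, giving Q of about 0.49. Since this is less than 0.64, the score 97.9 would be kept at the 90% confidence level.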