Title: A. Descriptive Statistics
- A. Descriptive Statistics
- 1. Mean
- 2. Median
- 3. Mode
- 4. Standard Error
- 5. Standard Deviation
- B. Probability
- 1. Addition and Multiplication Rules
- 2. Discrete and Continuous Probability Distributions
- a. Binomial Distribution
- b. Poisson Distribution
- C. The Normal Distribution
- 1. Relation of width to standard deviation
- 2. Kurtosis
- 3. Skewness
- 4. Applicability to experimental measures
- D. Significance Tests for a Single Population
- 1. Z-tests for a single population
- 2. T-tests for a single population
- 3. Z-tests for two populations
- 4. T-tests for two populations
- 5. Confidence Intervals
- E. Error in interpretation of statistical tests
- 1. Type I error
- 2. Type II error
- F. Regression
- 1. Correlation
- 2. Linear Regression
- 3. Multiple Regression
- G. Analysis of Variance (ANOVA)
- 1. F-tests
- 2. Multiple Comparison Procedures
- H. Data Fitting
- 1. Linear Least Squares
- 2. Chi-Squared Analysis
Measures of Central Tendency of a Distribution
Where is the middle of a distribution, or set of data?
- (Arithmetic) mean
- Median
- Mode
- Midrange?
- (Geometric mean)
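Measure of Central Tendency: Arithmetic Mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where $x_1, x_2, x_3, \ldots, x_n$ are the n data points (not necessarily distinct).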
Example
Measure of Central Tendency: Arithmetic Mean
- Most commonly used average.
- Limitation: the mean is overly sensitive to outliers, or extreme values.
Ex. 1: x = 8, 9, 9, 10, 10, 10, 11, 11, 12, 40
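A minimal Python sketch of the arithmetic mean applied to Ex. 1 (the helper name `arithmetic_mean` is ours, not from the slides). Dropping the single extreme value 40 shows how far one outlier moves the mean: from 130/10 = 13 down to 90/9 = 10.

```python
def arithmetic_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

ex1 = [8, 9, 9, 10, 10, 10, 11, 11, 12, 40]

mean_all = arithmetic_mean(ex1)              # includes the outlier 40
mean_no_outlier = arithmetic_mean(ex1[:-1])  # same data with 40 removed
print(mean_all, mean_no_outlier)
```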
Measure of Central Tendency: Median
- Rank the observations in ascending order; the median is the middle observation.
- n odd: the median is the ((n + 1)/2)th observation.
- n even: the median is the mean of the (n/2)th and (n/2 + 1)th observations.
- More or less insensitive to outliers.
Ex. 1: x = 8, 9, 9, 10, 10, 10, 11, 11, 12, 40
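The ranking rule above can be sketched directly in Python; `median` here is our own helper, using 1-based ranks as on the slide and converting to 0-based list indices. For Ex. 1 (n = 10) it averages the 5th and 6th observations, giving 10; removing the outlier 40 (n = 9) takes the 5th observation, which is still 10.

```python
def median(xs):
    """Median by the slide's rule (1-based ranks after sorting):
    n odd  -> the ((n + 1)/2)th observation,
    n even -> mean of the (n/2)th and (n/2 + 1)th observations."""
    s = sorted(xs)                          # rank observations in ascending order
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # convert 1-based rank to 0-based index
    return (s[n // 2 - 1] + s[n // 2]) / 2  # ranks n/2 and n/2 + 1

ex1 = [8, 9, 9, 10, 10, 10, 11, 11, 12, 40]
print(median(ex1), median(ex1[:-1]))
```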
Measure of Central Tendency: Mode
- Most frequently occurring observation.
- Largely insensitive to outliers.
Ex. 1: x = 8, 9, 9, 10, 10, 10, 11, 11, 12, 40
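A short sketch of the mode using `collections.Counter`. The helper `modes` is hypothetical and returns every value tied for the highest count, a design choice of ours since a data set can have more than one mode. For Ex. 1 the mode is 10 (it occurs three times), and replacing the outlier 40 with 400 leaves it unchanged.

```python
from collections import Counter

def modes(xs):
    """All values that occur with the highest frequency, in ascending order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

ex1 = [8, 9, 9, 10, 10, 10, 11, 11, 12, 40]
print(modes(ex1))
print(modes([8, 9, 9, 10, 10, 10, 11, 11, 12, 400]))  # outlier made more extreme
```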
Measure of Central Tendency
If a distribution is (roughly) symmetric, then
the relative positions of points to the left and
right of the median are about the same, (and so
we expect the mean and the median to be
comparable). If a distribution is positively
skewed (skewed to the right), it means that
points to the right of the median tend to be
further from the median than those to the left.
These points pull the mean up.
The mean will tend to be greater than the
median. Similarly, if a distribution is
negatively skewed, the mean will be smaller than
the median.
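The mean-versus-median comparison described above can be sketched as a rough rule of thumb (not a formal skewness measure; the function name `skew_hint` and the mirrored data are our own illustrations). Ex. 1 has a long right tail (mean 13 > median 10); mirroring it as 20 − x gives a long left tail (mean 7 < median 10); the symmetric made-up set has mean equal to median.

```python
import statistics

def skew_hint(xs):
    """Rule-of-thumb reading of skew from mean vs. median (not a formal test)."""
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    if mean > median:
        return "positive (right) skew suggested"
    if mean < median:
        return "negative (left) skew suggested"
    return "no skew suggested"

ex1 = [8, 9, 9, 10, 10, 10, 11, 11, 12, 40]       # long right tail (the 40)
mirrored = [20 - x for x in ex1]                  # same shape, flipped: long left tail
symmetric = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]

for name, xs in [("ex1", ex1), ("mirrored", mirrored), ("symmetric", symmetric)]:
    print(name, statistics.mean(xs), statistics.median(xs), skew_hint(xs))
```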
Measure of Central Tendency
Observation:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Frequency:    1  1  1  1  2  2  3  3  4  4  5  6  7  8 10  8  6  3  2  1
Total of 78 observations, clearly clustered around 15. What's the mode? What's the median? What's the mean?
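A sketch that answers the three questions from the frequency table (observation value paired with its frequency, as read from the slide). The mode is 15; with n = 78 the median is the mean of the 39th and 40th observations, both 13; the mean is 975/78 = 12.5. The ordering mean < median < mode is what the skewness discussion predicts for a distribution with a long left tail (the sparse values 1 to 4).

```python
# Frequency table from the slide: observation value -> frequency
values = list(range(1, 21))
freqs = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 10, 8, 6, 3, 2, 1]

n = sum(freqs)                                        # total number of observations
mean = sum(v * f for v, f in zip(values, freqs)) / n  # frequency-weighted mean
mode = values[freqs.index(max(freqs))]                # value with the largest frequency

# Median: expand the table into the sorted list of raw observations,
# then average the (n/2)th and (n/2 + 1)th observations (n is even here).
data = [v for v, f in zip(values, freqs) for _ in range(f)]
median = (data[n // 2 - 1] + data[n // 2]) / 2

print(n, mode, median, mean)
```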
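Measure of Central Tendency: Geometric Mean
The log of the geometric mean is the (arithmetic) mean of the logs! Any base will do:
$\log \bar{x}_g = \frac{1}{n}\sum_{i=1}^{n} \log x_i$
Or, alternatively, as the nth root of the product of the observations:
$\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n}$
Useful for concentrations, as this type of data is typically skewed.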
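A minimal sketch of both forms of the geometric mean, assuming all observations are positive (logs are undefined otherwise). The concentration values are hypothetical, chosen only to be right-skewed; they are not the MIC data. The log form gives the same result in any base, and it also avoids overflow or underflow of the raw product on long data sets. Python 3.8+ additionally provides `statistics.geometric_mean` as a cross-check.

```python
import math
import statistics

def geometric_mean_logs(xs, base=math.e):
    """Antilog of the arithmetic mean of the logs; every x must be > 0."""
    logs = [math.log(x, base) for x in xs]
    return base ** (sum(logs) / len(logs))

def geometric_mean_root(xs):
    """nth root of the product of the n observations; every x must be > 0."""
    return math.prod(xs) ** (1 / len(xs))

# Hypothetical, right-skewed concentrations (illustrative values, not from the slides)
conc = [0.5, 1.0, 2.0, 4.0, 32.0]

print(geometric_mean_logs(conc))             # natural logs
print(geometric_mean_logs(conc, base=10))    # base-10 logs: same value
print(geometric_mean_root(conc))             # nth root of the product
print(statistics.mean(conc))                 # arithmetic mean, pulled up by 32.0
```

Here the product is 128, so the geometric mean is $128^{1/5} = 2^{1.4} \approx 2.64$, well below the arithmetic mean of 7.9.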
Measure of Central Tendency: Geometric Mean
Show that the geometric mean is the same as the arithmetic mean of the logs. Is it?
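One way to work this exercise, using only the log rules $\log(a^{p}) = p\log a$ and $\log(ab) = \log a + \log b$ (valid for positive values, in any base):

```latex
\log \bar{x}_g
  = \log\left(x_1 x_2 \cdots x_n\right)^{1/n}
  = \frac{1}{n}\log\left(x_1 x_2 \cdots x_n\right)
  = \frac{1}{n}\sum_{i=1}^{n}\log x_i
```

So it is the log of the geometric mean that equals the arithmetic mean of the logs; the geometric mean itself is the antilog of that mean, $\bar{x}_g = b^{\frac{1}{n}\sum_{i} \log_b x_i}$ for base b.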
Measure of Central Tendency: Geometric Mean
Ex.: in serial dilutions, concentrations can be expressed as a constant multiplied by a power of two, as in $c \cdot 2^{k}$, where c is a constant and k is some integer.
Distribution of minimal inhibitory concentration (MIC) of penicillin G for N. gonorrhoeae
What's the arithmetic mean? 0.2336. What's the geometric mean? 0.1862.
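For serial-dilution data, taking logs to base 2 shows why the geometric mean is a natural summary. Writing each concentration as $x_i = c \cdot 2^{k_i}$ (the per-observation exponent notation $k_i$ is ours):

```latex
\log_2 x_i = \log_2 c + k_i
\quad\Longrightarrow\quad
\log_2 \bar{x}_g = \frac{1}{n}\sum_{i=1}^{n}\left(\log_2 c + k_i\right) = \log_2 c + \bar{k}
\quad\Longrightarrow\quad
\bar{x}_g = c \cdot 2^{\bar{k}}, \qquad \bar{k} = \frac{1}{n}\sum_{i=1}^{n} k_i
```

In words, the geometric mean is the constant c times 2 raised to the average dilution step.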
Measure of Spread of a Distribution
- Range: difference between the largest and smallest values of the distribution.
- Variance (or sample variance): roughly, the average squared distance from the mean,
  $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
- Standard deviation: square root of the variance, $s = \sqrt{s^2}$.
Note that the denominator is n - 1. If we were talking of the population variance, there would be an n instead (and the population mean in place of $\bar{x}$).
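A sketch of range, sample variance (n − 1) and population variance (n) written straight from the formulas, then cross-checked against the standard library's `statistics.variance` (n − 1) and `statistics.pvariance` (n). It reuses Ex. 1; the helper names are ours.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def population_variance(xs):
    """Same sum of squared deviations, divided by n (xs is the whole population)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

ex1 = [8, 9, 9, 10, 10, 10, 11, 11, 12, 40]

data_range = max(ex1) - min(ex1)   # largest minus smallest value
s2 = sample_variance(ex1)          # denominator n - 1
s = math.sqrt(s2)                  # standard deviation

# Cross-check against the standard library (variance uses n - 1, pvariance uses n)
assert math.isclose(s2, statistics.variance(ex1))
assert math.isclose(population_variance(ex1), statistics.pvariance(ex1))
print(data_range, round(s2, 3), round(s, 3))
```

The sum of squared deviations here is 822, and 729 of it ($27^2$, from the value 40) comes from the single outlier, so the variance inherits the mean's sensitivity to extreme values.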
Variance and Standard Deviation
Why use the average of the squared distances from the mean? Why not the average of the distances of each value from the mean? Why introduce the concept of standard deviation? Why not just use the variance?
Consider a simple made-up set: 8, 9, 9, 10, 10, 10, 10, 11, 11, 12. The mean is clearly... what? Why not use a simpler way to measure the spread, say, the average of the distances of each value from the mean?
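A quick numerical look at the made-up set (mean 10). The plain signed average of the deviations is always zero, because deviations above and below the mean cancel; taking absolute values avoids the cancellation (the mean absolute deviation), and squaring is the route the variance takes. This is an illustration only, not the slides' own calculation.

```python
xs = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]
m = sum(xs) / len(xs)                              # the mean

signed = sum(x - m for x in xs) / len(xs)          # positive and negative deviations cancel
absolute = sum(abs(x - m) for x in xs) / len(xs)   # mean absolute deviation
squared = sum((x - m) ** 2 for x in xs) / len(xs)  # average squared deviation (divides by n)

print(m, signed, absolute, squared)
```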
For the same made-up set, calculate the variance. What's the variance? Why bother to use the standard deviation? Does it tell you something new? Remember that these data are (typically) measurements; suppose they are in units of mg.
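A sketch of the calculation for the made-up set, treating the values as masses in mg as the slide suggests. The sample variance is 12/9 ≈ 1.333 and carries units of mg², while the standard deviation ≈ 1.155 is back in mg, the same units as the data and the mean (10 mg), so the two can be compared directly.

```python
import math
import statistics

# The made-up set from above, taken to be masses in mg
masses_mg = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]

var_mg2 = statistics.variance(masses_mg)  # sample variance (n - 1): units of mg^2
sd_mg = statistics.stdev(masses_mg)       # standard deviation: units of mg

print(f"variance = {var_mg2:.3f} mg^2, sd = {sd_mg:.3f} mg")
```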
Measure of Position within a Distribution
z score: the number of standard deviations a particular value lies above (or below) the mean,
$z = \frac{x - \bar{x}}{s}$
So, for instance, the z score for the value $x_1 = 8$ is given by $z_1 = \frac{8 - \bar{x}}{s}$,
and the z score for the value $x_6 = 11$ is given by $z_6 = \frac{11 - \bar{x}}{s}$.
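A minimal z-score sketch. The slide's own example data set is not reproduced in this text, so, as an assumption, the made-up set from the variance slides is used instead, with the sample standard deviation s (n − 1). For that set, 8 lies about 1.73 standard deviations below the mean and 11 about 0.87 above.

```python
import statistics

def z_score(x, data):
    """Number of sample standard deviations that x lies above (+) or below (-) the mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

# Illustration only: substitute data set (the made-up set from the variance slides)
xs = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]
print(round(z_score(8, xs), 3), round(z_score(11, xs), 3))
```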
Measure of Position within a Distribution
Percentiles: the pth percentile is (roughly) the number that is larger than p% of all the values. The 50th percentile is the same as the median. Technically, just as with the median, the formal definition depends on the number of data points in the set: it depends on whether np/100 is an integer.
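The slide does not spell out the exact rule, so this sketch assumes one common textbook convention, chosen because it reduces to the median rule at p = 50: if np/100 is not an integer, take the observation whose rank is the next integer up; if it is an integer k, average the kth and (k + 1)th observations. Integer arithmetic (`divmod`) avoids floating-point doubts about whether np/100 is whole. Other software, such as NumPy's default linear interpolation in `numpy.percentile`, can give slightly different values.

```python
def percentile(xs, p):
    """pth percentile for an integer p with 0 < p < 100, using the np/100 rule.

    Ranks are 1-based. If np/100 is not an integer, return the observation whose
    rank is the next integer above np/100; if np/100 is an integer k, return the
    mean of the kth and (k + 1)th observations.
    """
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be an integer strictly between 0 and 100")
    s = sorted(xs)
    n = len(s)
    k, remainder = divmod(n * p, 100)   # np/100 = k + remainder/100
    if remainder:                       # not an integer: round up to rank k + 1
        return s[k]                     # rank k + 1 is 0-based index k
    return (s[k - 1] + s[k]) / 2        # integer k: mean of ranks k and k + 1

ex1 = [8, 9, 9, 10, 10, 10, 11, 11, 12, 40]
print(percentile(ex1, 25), percentile(ex1, 50), percentile(ex1, 90))
```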
Example
Examples of a z score and of percentiles