Title: From the population to the sample
1. From the population to the sample
- The sampling distribution
FETP India
2. Competency to be gained from this lecture
- Use the properties of the sampling distribution to calculate the standard error of the mean
3. Key issues
- Population parameters versus sample statistics
- Sampling distribution and its properties
- Mean and standard error of the sampling distribution
4. Things we already know
- Mean
  - Arithmetic sum of data divided by number of observations
- Standard deviation
  - Index of variability (spread) of data about the mean
- Z-score
  - Distance from the mean in standard deviation units: z = (x - mean) / sd
- Normal curve
  - Bell-shaped curve that relates probability to z-scores
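The z-score formula above can be sketched in a few lines of Python. The function name `z_score` and the height values are hypothetical, chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

# Hypothetical example: a height of 180 cm in a population
# with mean 170 cm and standard deviation 5 cm.
z = z_score(180, 170, 5)
print(z)  # 2.0
```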
Parameters and statistics
5. Population parameters
- A population parameter is a numerical descriptive measure of a population
- Examples
  - Population mean (µ)
  - Standard deviation (σ)
6. A statistic
- A statistic is a numerical descriptive measure of a sample
- Examples
  - Sample mean (x̄)
  - Sample standard deviation (s)
7. Inference
- The parameter is fixed
- The sample statistic varies from sample to sample
- We try to infer what happens in the population from what we see in the sample
8. Sample mean: a typical situation
- A sample might be taken
- The mean and standard deviation are computed
- From these data, one will want to infer that the population values are identical or at least similar
- In other words, it is hoped that the sample data reflect the population data
Sampling distribution
9. Sample mean: another approach
- Change your thinking from a single sample
- Consider the situation where you
  - Take many samples
  - Calculate a mean and standard deviation for each sample
10. Taking many samples from a population
- Consider a population of 1,000 individuals with various heights
- If we take 10 samples of 100 persons from the population, each of the 10 samples will have a specific frequency distribution with
  - A specific mean
  - A specific standard deviation
- In each sample, each data point is a height
11. Looking at the means of the samples
- We can look at the frequency distribution of the means of the 10 samples
- In this case
  - The data points are no longer the heights
  - The data points are the means
12. Intuitive observation
- If we take repeated samples from a population, we are unlikely to sample extreme values every time
  - Values close to the mean are common
  - Extreme values are less common
- Thus, when we compare the distribution of the heights and the distribution of the means, we observe
  - More variation in the distribution of individual heights
  - Less variation in the distribution of the means
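The comparison between the spread of individual heights and the spread of sample means can be tried out with a small simulation. This is only a sketch: the population is generated artificially, and the mean of 165 cm and standard deviation of 10 cm are assumed values that do not come from the lecture.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 heights in cm (mean 165 and SD 10 are assumed).
population = [random.gauss(165, 10) for _ in range(1000)]

# Take 10 samples of 100 persons each and keep the mean of every sample.
sample_means = [statistics.mean(random.sample(population, 100)) for _ in range(10)]

sd_heights = statistics.pstdev(population)  # spread of individual heights
sd_means = statistics.stdev(sample_means)   # spread of the 10 sample means

print(f"SD of individual heights: {sd_heights:.2f}")
print(f"SD of the sample means:   {sd_means:.2f}")
```

The second figure should come out much smaller than the first, which is the intuitive observation made on this slide.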
13. Taking many samples from the population
- If we take many samples, we can plot a complete frequency distribution of the means of the samples
- Each sample produces a statistic (mean)
- The distribution of statistics (means) is called a sampling distribution
14. Multiple sample means
15. Important properties of the sampling distribution
- The sampling distribution of the mean is approximately normal when the sample size is large
- The mean of the sampling distribution is equal to the mean of the population
16. Standard deviation of the sampling distribution
- If the standard deviation of the population is σ
- The standard deviation of the sampling distribution will be σ / √n
- n is the sample size
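As a quick illustration of the formula σ / √n, the sketch below shows how the standard deviation of the sampling distribution shrinks as the sample size grows. The function name `standard_error` and the population standard deviation of 10 are assumptions made for the example.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 10.
for n in (4, 25, 100):
    print(n, standard_error(10, n))
# 4 5.0
# 25 2.0
# 100 1.0
```

Quadrupling the sample size only halves the result, because n enters through its square root.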
17. Terminology
- The mean of the sampling distribution continues to be called the mean
- The standard deviation of the sampling distribution is called the standard error
Standard error
18. Distribution of sample means
- One could obtain a standard deviation of sample means, which would describe the spread of sample means about the true population mean
- In a practical situation
  - There is only one sample mean
  - One hopes this sample mean is near the real population mean
- Wouldn't it be nice to have an estimate of the standard deviation of sample means, which describes the spread of sample means?
19. Standard error of the mean
- Divide the standard deviation by the square root of the number of observations
- The resulting estimate of the standard deviation of sample means is called the standard error of the mean
- It can be interpreted in a manner similar to the standard deviation of raw scores
- For example, the probability of obtaining a sample mean more than 1.96 standard errors away from the population mean (outside the -1.96 to +1.96 range) is 5 out of 100
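In practice the calculation uses a single sample. The sketch below estimates the standard error from one sample; the nine measurements are invented for illustration and do not come from the lecture.

```python
import math
import statistics

# Hypothetical sample of 9 measurements (values invented for illustration).
sample = [4.8, 5.1, 5.6, 5.0, 5.9, 5.3, 4.7, 5.5, 5.2]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)          # standard error of the mean

print(f"n={n}, mean={mean:.2f}, s={s:.2f}, se={se:.2f}")
```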
20. Central limit theorem
- If x possesses any distribution with mean µ and standard deviation SD
- Then, as n increases without limit, the sample mean x̄ based on a random sample of size n will have a distribution that approaches the distribution of a normal random variable with
  - Mean µ
  - Standard deviation SD / √n
- Special case
  - If x is normally distributed, the result is true for any sample size
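The theorem can be checked informally by simulation. The sketch below assumes a strongly skewed population (an exponential distribution with rate 1, which has mean 1 and standard deviation 1); the choice of distribution, the sample sizes, and the 2,000 repetitions are all illustrative assumptions.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

MU = 1.0  # an exponential distribution with rate 1 has mean 1 ...
SD = 1.0  # ... and standard deviation 1

def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

for n in (2, 30, 200):
    means = [sample_mean(n) for _ in range(2000)]
    print(f"n={n:>3}  mean of means={statistics.mean(means):.3f}  "
          f"SD of means={statistics.stdev(means):.3f}  "
          f"SD/sqrt(n)={SD / n ** 0.5:.3f}")
```

For each n, the mean of the sample means should sit close to µ and their standard deviation close to SD / √n. Plotting a histogram of `means` would show the shape becoming more symmetric as n grows.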
21. Simple example
- Let the population be 1, 2, 3, 4, 5
- Mean: µ = 15/5 = 3
- Let's take a sample of two elements
- The 25 possible samples (drawn with replacement, counting order) are
  1,1 1,2 1,3 1,4 1,5
  2,1 2,2 2,3 2,4 2,5
  3,1 3,2 3,3 3,4 3,5
  4,1 4,2 4,3 4,4 4,5
  5,1 5,2 5,3 5,4 5,5
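The list of 25 samples can be generated rather than typed out. A minimal sketch using `itertools.product`, which enumerates ordered pairs drawn with replacement:

```python
from itertools import product

population = [1, 2, 3, 4, 5]

# All ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

print(len(samples))  # 25
print(samples[:5])   # [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5)]
```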
22. The frequency distribution of the population is not normal
[Figure: bar chart of frequency (y-axis) against the values 1 to 5 (x-axis)]
23. Standard deviation of the population
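The calculation for this slide is not preserved in the text. A sketch of one way to obtain it is shown below, assuming the population formula (dividing by N rather than N - 1); under that assumption the result rounds to the value of 1.4 quoted for the population standard deviation in this example.

```python
import math

population = [1, 2, 3, 4, 5]
mu = sum(population) / len(population)  # 3.0

# Population formula: divide the sum of squared deviations by N (not N - 1).
variance = sum((x - mu) ** 2 for x in population) / len(population)  # 10 / 5 = 2.0
sigma = math.sqrt(variance)

print(round(sigma, 2))  # 1.41
```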
24. Looking at the mean of the samples
- The 25 means of the 25 samples are
  1   1.5 2   2.5 3
  1.5 2   2.5 3   3.5
  2   2.5 3   3.5 4
  2.5 3   3.5 4   4.5
  3   3.5 4   4.5 5
- Mean of sample means = 75/25 = 3, the same as the population mean
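The 25 means and their frequency distribution can be tabulated with a short script. This is a sketch for checking the arithmetic, using `collections.Counter` to count how often each mean occurs.

```python
from collections import Counter
from itertools import product

population = [1, 2, 3, 4, 5]

# Mean of every ordered sample of size 2 drawn with replacement.
means = [(a + b) / 2 for a, b in product(population, repeat=2)]

print(sum(means) / len(means))  # 3.0

# Frequency of each distinct sample mean, from smallest to largest.
frequencies = Counter(means)
for value, count in sorted(frequencies.items()):
    print(value, count)
```

The counts rise from 1 at a mean of 1 to 5 at a mean of 3 and fall back to 1 at a mean of 5, so the distribution of means is peaked in the middle even though the population itself is flat.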
25. The sampling distribution tends to be normal
[Figure: bar chart of frequency (y-axis, 0 to 6) against the values of the sample means, 1 to 5 in steps of 0.5 (x-axis)]
- Even if the population is not normally distributed, the sampling distribution will tend to be normal
26. Standard deviation of the sample
27. Standard deviation in the population and standard error
- Standard deviation in the population: 1.4
- Sample size: 2
- Square root of the sample size: 1.4
- Standard error = standard deviation / square root of the sample size = 1.4 / 1.4 = 1
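The result can be checked in two ways for the population 1, 2, 3, 4, 5: by applying the formula, and by directly computing the standard deviation of all 25 sample means. The sketch below uses `statistics.pstdev`, which divides by N, on the assumption that both the population and the full set of 25 means are treated as complete populations.

```python
import math
import statistics
from itertools import product

population = [1, 2, 3, 4, 5]
n = 2

sigma = statistics.pstdev(population)  # population SD, about 1.41
se_formula = sigma / math.sqrt(n)      # SD / sqrt(n)

# Direct route: SD of the means of all 25 possible samples of size 2.
means = [statistics.mean(s) for s in product(population, repeat=n)]
se_direct = statistics.pstdev(means)

print(round(se_formula, 6), round(se_direct, 6))  # 1.0 1.0
```

Both routes give 1, so in this small example the formula agrees exactly with the spread of the sample means.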
28. Applying the standard error: serum uric acid levels in men (1/2)
- Population mean
  - 5.4 mg per 100 ml
- Standard deviation
  - 1 mg per 100 ml
- Take 100 samples of 25 men in each sample
  - Standard error: 1 / √25 = 0.2
- Compute 100 sample means
- How many of those means would you expect to fall within the range 5.4 - (1.96 × 0.2) to 5.4 + (1.96 × 0.2), that is, about 5.0 to 5.8?
  - The answer is about 95
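This thought experiment can be simulated. The sketch below uses the standard error of the mean for samples of 25 men, 1 / √25 = 0.2, to build the range, and it assumes that individual serum uric acid levels follow a normal distribution, which the slide does not state.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

MU, SIGMA, N = 5.4, 1.0, 25  # population mean, population SD, sample size
SE = SIGMA / N ** 0.5        # 1 / 5 = 0.2
low, high = MU - 1.96 * SE, MU + 1.96 * SE

# Assumption: individual levels are drawn from a normal distribution.
means = [
    statistics.mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(100)
]
inside = sum(low <= m <= high for m in means)

print(f"range: {low:.2f} to {high:.2f}")
print(f"{inside} of 100 sample means fall inside the range")
```

The count varies from run to run around 95, since 95 is an expected value rather than a guaranteed one.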
29. Applying the standard error: serum uric acid levels in men (2/2)
- One sample
  - Mean serum uric acid level of 8.2
- Would you assume this was "significantly" different from the population mean?
  - Yes, because a mean of that magnitude would occur fewer than 5 times in 100
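The reasoning can be made concrete by expressing the observed mean as a distance from the population mean in standard errors. The sketch assumes the sample contains 25 men, as in the first part of the example; the slide itself does not give the sample size.

```python
import math

MU, SIGMA = 5.4, 1.0
N = 25             # assumed sample size, carried over from part 1/2
sample_mean = 8.2

se = SIGMA / math.sqrt(N)    # 0.2
z = (sample_mean - MU) / se  # distance from MU in standard errors

print(round(z, 1))    # 14.0
print(abs(z) > 1.96)  # True
```

A sample mean 14 standard errors from the population mean lies far outside the -1.96 to +1.96 range.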
30. Key messages
- While population parameters are fixed, samples provide estimates (statistics) that fluctuate
- The distribution of a statistic for all possible samples of a given size n is called the sampling distribution
- For large n, the sampling distribution is approximately normal, even if the original distribution is not
- If the original distribution is normal, the result is true even for small n
- The mean of the sampling distribution is the population mean, and its standard deviation (the standard error) is the population SD / √n