Title: Estimation Bias, Standard Error and Sampling Distribution
1. Estimation Bias, Standard Error and Sampling Distribution
Topic 9
2. From sample to population
- Inductive (inferential) statistical methods: make inferences about a population based on information from a sample derived from that population
[Diagram: sample → inductive statistical methods → population]
3. Statistical Concepts of Sampling
- Suppose we want to estimate the mean birthweight 
 of Malay male live births in Singapore, 1992
- Due to logistical constraints, we decide to take 
 a random sample of 50 live births from the
 records of all Malay male live births for that
 year
4. Sampling from Target Population
Target population: all Malay male live births in Singapore, 1992
Sample: random sample of 50 Malay male live births in Singapore, 1992
Suppose sample mean = 3.55 kg and sample SD (S) = 0.92 kg. What can we say about the population mean?
5. Statistical Modeling
- Assume the population values follow a normal or 
 some other appropriate distribution. This means a
 relative frequency histogram of the population
 values will look like a normal or that
 appropriate distribution.
- Assume we have a random sample, i.e., we sample 
 n (50 in example) values independently from the
 population
6. Notation
Sample data: $X_1, X_2, \dots, X_n$
Assume $X_1, X_2, \dots, X_n$ are independent and each is distributed according to, say, a normal distribution $N(\mu, \sigma^2)$
Population parameters
Population mean $\mu$ = mean of the normal population
Population variance $\sigma^2$ = variance of the normal population
Population standard deviation $\sigma$
7. Statistical Inference
Two general areas:
(a) Statistical Estimation, i.e., estimating population parameters based on sample statistics
(b) Hypothesis Testing, i.e., testing certain assumptions about the population. Also called Test of Statistical Significance
8. Statistical Estimation
- There are two ways by which a population 
 parameter can be estimated from a sample
- (1) Point estimate 
- (2) Interval estimate
9. Point Estimate
- Estimate the population parameter by a single value
-  Sample mean → population mean
-  Sample median → population median
-  Sample variance → population variance
-  Sample SD → population SD
-  Sample proportion → population proportion
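As a minimal sketch of the point estimates listed above, the Python snippet below computes each sample statistic from a small made-up sample. The birthweight values and the 3.5 kg cutoff used for the proportion are hypothetical, chosen only for illustration; they are not taken from the Singapore records.

```python
import statistics

# Hypothetical birthweights in kg (illustrative values, not real data)
weights = [3.1, 3.6, 2.9, 4.0, 3.4, 3.8, 3.3, 3.5]

sample_mean = statistics.mean(weights)          # estimates the population mean
sample_median = statistics.median(weights)      # estimates the population median
sample_variance = statistics.variance(weights)  # n-1 divisor; estimates the population variance
sample_sd = statistics.stdev(weights)           # estimates the population SD
# Proportion of births at or above a hypothetical 3.5 kg cutoff
sample_proportion = sum(w >= 3.5 for w in weights) / len(weights)

print(sample_mean, sample_median, sample_variance, sample_sd, sample_proportion)
```

Each statistic is a single number, which is exactly why a point estimate on its own says nothing about how far it may be from the population value.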
10. Point Estimate
- If the average birthweight for a random sample of 
 Malay male births was 3.55 kg and we use it to
 estimate $\mu$, the mean birthweight of all Malay
 male births in the population, we would be making
 a point estimate for $\mu$
- Poor practice to report just the point estimate 
 because people cannot judge how good the estimate
 is
- Should also report the accuracy of the estimate. 
- Remember that the quality of an estimator is 
 judged by its performance over REPEATED SAMPLING
 although we have just one sample in hand.
 Inference for population parameter should make 
allowance for sampling error 
11. Accuracy of statistical estimation
- Two types of error
- (a) Sampling error or fluctuation: random error or fluctuation that is due entirely to chance in the process of sampling. Minimizing the sampling error maximizes the precision of a statistical estimate.
- (b) Systematic error or bias: non-random error or bias that is either a property of the estimator itself or due to bias in the sampling or measurement process. Minimizing the systematic error maximizes the validity of a statistical estimate. Systematic errors can be minimized by making efforts to reduce sampling and measurement bias (e.g., non-random sampling, non-response and non-coverage, untruthful answers, unreliable calibration, and errors in data recording and coding).
12. Unbiased estimation of the mean
$E(\bar{X}) = \mu$
i.e., the sample mean equals the population mean when averaged over repeated samples
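The unbiasedness of the sample mean follows from linearity of expectation. A short derivation, writing $X_1, \dots, X_n$ for the sample values, $\bar{X}$ for the sample mean and $\mu$ for the population mean (these symbols are assumed here, since the slide's own notation did not survive in the transcript):

```latex
E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{1}{n}\, n\mu
           = \mu
```

Only the fact that every $X_i$ has mean $\mu$ is used; normality and independence are not needed for this step.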
13. Hypothetical results of repeated sampling
- Unbiasedness means the sample mean equals the 
 population mean when averaged over repeated
 samples
- However, there is fluctuation from sample to 
 sample
- Variance = ?
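Repeated sampling can be imitated on a computer. The sketch below draws many samples of size 50 from a normal population and looks at how the sample means behave. The population mean of 3.5 kg and SD of 0.9 kg are assumed values for illustration, and `simulate_sample_means` is a hypothetical helper name.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` independent samples of size n from N(mu, sigma^2)
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]

# Hypothetical population: mean 3.5 kg, SD 0.9 kg (assumed values)
means = simulate_sample_means(mu=3.5, sigma=0.9, n=50, repeats=2000)

# Averaged over repeated samples, the sample mean sits close to mu
print("average of sample means:", statistics.mean(means))
# Individual sample means still fluctuate from sample to sample
print("SD of sample means:", statistics.stdev(means))
```

The first number illustrates unbiasedness; the second is the sample-to-sample fluctuation whose size the variance question asks about.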
16. Standard Error (SE) of an estimator
- The SE of an estimator (e.g., the sample mean) is 
 just the standard deviation (SD) of the
 estimator. It measures the variability of the
 estimator under repeated sampling
- SE is just a special case of SD 
- The standard deviation of an estimator is called the 
 standard error because it measures the magnitude of
 the estimation error due to sampling fluctuation
17. Standard Deviation vs Standard Error
- The population standard deviation (SD) measures 
 the amount of variation among the individual
 measurements that make up the population and can
 be estimated from a sample using the sample
 standard deviation.
- The standard error (e.g. of the sample mean), on 
 the other hand, measures how much the value of
 the estimator changes from sample to sample under
 repeated sampling.
- As we take only one sample rather than repeated 
 samples in practice, it seems impossible at first
 to estimate the standard error, which is defined with
 reference to repeated sampling.
- Fortunately, the standard error of the sample 
 mean is a function of the population SD. As the
 latter is estimable from a single sample, so is
 the standard error.
18. Estimated standard error of the sample mean
- Let $\sigma$ denote the population SD
- It was shown earlier that
- SE = SD(sample mean) = $\sigma/\sqrt{n}$, where n is the sample size
- Since $\sigma$ can be estimated by the sample standard deviation S, we can estimate the standard error by SE = $S/\sqrt{n}$
Note that SE decreases with n at the rate $1/\sqrt{n}$, i.e., the precision of the sample mean improves as sample size increases
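The estimated standard error is a one-line calculation. The sketch below applies it to the birthweight figures quoted earlier in these slides (S = 0.92 kg, n = 50) and then to a sample four times as large to show the square-root rate. The function name `estimated_standard_error` and the n = 200 comparison are illustrative choices.

```python
import math

def estimated_standard_error(sample_sd, n):
    """Estimated SE of the sample mean: S / sqrt(n)."""
    return sample_sd / math.sqrt(n)

# Birthweight example: S = 0.92 kg, n = 50
se_50 = estimated_standard_error(0.92, 50)
# Quadrupling the sample size halves the SE (rate 1/sqrt(n))
se_200 = estimated_standard_error(0.92, 200)

print(round(se_50, 2), round(se_200, 3))
```

Halving the SE requires four times the data, which is why gains in precision become expensive as n grows.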
19. Knowing the mean and standard error of an estimator still doesn't tell us the whole story.
The whole story is told by the sampling distribution, since it lets us calculate probabilities about the estimator.
20. Sampling distribution of the sample mean
- The distribution of the sample mean under 
 repeated sampling from the population
- Distribution of the sample mean rather than 
 individual measurements
- In practice, we take only one sample, not 
 repeated samples and so the sampling distribution
 is unobserved but fortunately it can often be
 derived theoretically
Demo: http://www.ruf.rice.edu/~lane/stat_sim/index.html
21. Exact result when sampling from a normal population
- If the population is normal with mean $\mu$ and variance $\sigma^2$, then the sample mean based on a random sample of size n is also normal with mean $\mu$ and variance $\sigma^2/n$
- Note how we can derive theoretically the 
 distribution of the sample mean under repeated
 sampling without actually drawing repeated
 samples
- This is important because we usually only have 
 one sample at our disposal in practice
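Once the sampling distribution is known, probabilities about the sample mean can be computed directly, with no repeated sampling. The sketch below uses Python's `statistics.NormalDist`; the population mean of 3.5 kg, SD of 0.9 kg and the 0.2 kg margin are assumed values for illustration only.

```python
import math
from statistics import NormalDist

# Hypothetical population parameters (assumed for illustration)
mu, sigma, n = 3.5, 0.9, 50

# Sampling distribution of the sample mean: N(mu, sigma^2 / n)
sampling_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# Probability that the sample mean lands within 0.2 kg of mu
prob_within = sampling_dist.cdf(mu + 0.2) - sampling_dist.cdf(mu - 0.2)

# The same probability for a single birthweight is much smaller
single = NormalDist(mu=mu, sigma=sigma)
prob_single = single.cdf(mu + 0.2) - single.cdf(mu - 0.2)

print(prob_within, prob_single)
```

The contrast between the two probabilities reflects the smaller variance of the sample mean compared with an individual measurement.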
22. Topic 10: Interval Estimate
- Provides an estimate of the population parameter 
 by defining an interval or range of plausible
 values within which the population parameter
 could be found with a given confidence.
- This interval is called a confidence interval. 
- The sampling distribution is used in constructing 
 confidence intervals.
23. Confidence interval for the mean of a normal population
Fact: With probability 0.95, a normally distributed variable is within 1.96 standard deviations of its mean.
Now $\bar{X} \sim N(\mu, \sigma^2/n)$
- It follows that the sample mean must be within 
 1.96 standard errors from the population mean
 with probability 0.95.
-  Equivalently, the population mean is within 1.96 
 standard errors from the sample mean.
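The figure 1.96 can be checked numerically. The sketch below uses the standard normal distribution from Python's `statistics` module to recover the upper 2.5-percentile and the probability of falling within 1.96 standard deviations of the mean.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Upper 2.5-percentile of the standard normal
z = std_normal.inv_cdf(0.975)
# Probability of falling within 1.96 SDs of the mean
coverage = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(round(z, 2), round(coverage, 4))
```

Because the statement is about standardized distance, the same 1.96 applies to the sample mean once distance is measured in standard errors.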
24. We call
$\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
a 95% confidence interval for the population mean.
If $\sigma$ is unknown, replace it by the sample SD $S$
and replace 1.96 by $t_{n-1,\,0.025}$, the upper 2.5-percentile of a 
t-distribution with n-1 degrees of freedom, to 
yield
25. $\bar{X} \pm t_{n-1,\,0.025}\, S/\sqrt{n}$
as a 95% confidence interval for the population mean
26. The t densities
- t densities are symmetric and similar in 
 appearance to N(0,1) density but with heavier
 tails
- Tables for t distributions are widely available 
- As d.f. increases, t distribution converges to 
 standard normal distribution
Demo: http://www.isds.duke.edu/sites/java.html
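The heavier tails and the convergence to the standard normal can be seen by evaluating the densities directly. The sketch below codes the standard Student's t density formula using only the `math` module; the helper names `t_pdf` and `normal_pdf` and the evaluation point x = 3 are illustrative choices.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare tail heights at x = 3 as d.f. grows
for df in (2, 10, 49, 1000):
    print(df, t_pdf(3.0, df), normal_pdf(3.0))
```

At x = 3 the t density is above the normal density for every d.f. shown, and the gap shrinks as d.f. increases.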
27. 95% confidence interval for the population mean
Birthweight data revisited
- n = 50, Sample mean = 3.55 kg, S = 0.92 kg
- SE = 0.92/sqrt(50) = 0.13 kg
- d.f. = 49, upper 2.5-percentile of t = 2.01
- 95% C.I. for the mean Malay male birthweight is
-  3.55 ± 2.01 (0.13) = (3.29 kg, 3.81 kg)
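The arithmetic on this slide can be reproduced with a few lines of Python. The sketch below takes n = 50 (consistent with the SE and the 49 degrees of freedom used on the slide) and uses the tabulated t value of 2.01 quoted above rather than computing the percentile; `t_confidence_interval` is a hypothetical helper name.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper) for sample_mean +/- t_crit * S / sqrt(n).
    t_crit must be looked up for n - 1 degrees of freedom."""
    se = sample_sd / math.sqrt(n)
    margin = t_crit * se
    return sample_mean - margin, sample_mean + margin

# Birthweight data: n = 50, mean 3.55 kg, S = 0.92 kg, t (49 d.f.) = 2.01
lower, upper = t_confidence_interval(3.55, 0.92, 50, t_crit=2.01)
print(round(lower, 2), round(upper, 2))
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries; with a statistics package available, the percentile could be computed instead of read from a table.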
28. The meaning of confidence interval
Under repeated sampling, the interval $\bar{X} \pm t_{n-1,\,0.025}\, S/\sqrt{n}$
will contain the true mean 95% of the time.
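The repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples of size 50 from a normal population, builds the t-based interval from each, and counts how often the true mean is covered. The population values (mean 3.5 kg, SD 0.9 kg) are assumed for illustration, 2.01 is the tabulated t value for 49 degrees of freedom, and `coverage_rate` is a hypothetical helper name.

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, n, t_crit, repeats, seed=1):
    """Fraction of repeated samples whose t-interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if mean - t_crit * se <= mu <= mean + t_crit * se:
            hits += 1
    return hits / repeats

# Hypothetical population (mu = 3.5, sigma = 0.9); t_crit = 2.01 for 49 d.f.
rate = coverage_rate(mu=3.5, sigma=0.9, n=50, t_crit=2.01, repeats=2000)
print(rate)
```

The observed fraction should be near 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure across repeated samples.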
29. Demo: http://www.isds.duke.edu/sites/java.html