Title: Statistical Sampling
1Statistical Sampling Analysis of Sample Data
- (Lesson - 04/A)
- Understanding the Whole from Pieces
2Sampling
- Sampling is
- Collecting sample data from a population and
- Estimating population parameters
- Sampling is an important tool in business
decisions since it is an effective and efficient
way of obtaining information about the population.
3Sampling (Cont.)
- How good is the estimate obtained from the
sample?
- The means of multiple samples of a fixed size (n)
from some population will form a distribution
called the sampling distribution of the mean
- The standard deviation of the sampling
distribution of the mean is called the standard
error of the mean
4Sampling (Cont.)
- Standard Error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
- Estimates from larger sample sizes provide more
accurate results
- If the sample size is large enough, the sampling
distribution of the mean is approximately normal,
regardless of the shape of the population
distribution (Central Limit Theorem)
5Sampling Distribution of the Mean
THE CENTRAL LIMIT THEOREM: For samples of n
observations taken from a population with mean $\mu$
and standard deviation $\sigma$, regardless of the
population's distribution, provided the sample
size is sufficiently large, the distribution of
the sample mean, $\bar{x}$, will be approximately normal
with a mean equal to the population mean
($\mu_{\bar{x}} = \mu$). Further, the standard deviation will equal the
population standard deviation divided by the
square root of the sample size ($\sigma_{\bar{x}} = \sigma / \sqrt{n}$).
The larger the sample size, the better the
approximation to the normal distribution.
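The theorem can be checked informally by simulation. The sketch below is only an illustration: the skewed (exponential) population, the sample size of 50, and the 2,000 repetitions are assumptions chosen for the demonstration, not values from the lesson.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, clearly non-normal population (exponential, mean about 1).
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50               # fixed sample size
num_samples = 2_000  # number of repeated samples

# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5  # sigma divided by the square root of n

print(f"population mean    = {mu:.4f}")
print(f"mean of the means  = {mean_of_means:.4f}")
print(f"observed std error = {observed_se:.4f}")
print(f"sigma / sqrt(n)    = {theoretical_se:.4f}")
```

The mean of the sample means should land close to the population mean, and their standard deviation close to sigma / sqrt(n), even though the population itself is far from normal.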
6Sampling Statistics
- Sampling statistics are statistics that are based
on values that are created by repeated sampling
from a population, such as
- Mean of the sample means
- Standard error of the sample mean
- Sampling distribution of the means
7Sampling Key Issues
- Key Sampling issues are
- Sample Design (Planning)
- Sampling Methods (Schemes)
- Sampling Error
- Sample Size Determination.
8Sampling Design
- Sample Design (Sample Planning) describes
- Objective of Sampling
- Target Population
- Population Frame
- Method of Sampling
- Statistical tools for Data Analysis
9Sampling Methods
Sampling Methods (Sampling Schemes)
- Subjective Methods
- Judgment Sampling
- Convenience Sampling
- Probabilistic Methods
- Simple Random Sampling
- Systematic Sampling
- Stratified Sampling
- Cluster Sampling
10Sampling Methods (Cont.)
- Simple Random Sampling Method
- refers to a method of selecting items from a
population such that every possible sample of a
specified size has an equal chance of being
selected
- with or without replacement
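A minimal sketch of the two variants, using Python's standard `random` module. The population of 20 item IDs is a made-up example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

population = list(range(1, 21))  # hypothetical population of 20 item IDs

# Without replacement: an item can appear at most once in the sample.
without_replacement = random.sample(population, k=5)

# With replacement: the same item may be drawn more than once.
with_replacement = random.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```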
11Sampling Methods (Cont.)
- Stratified Sampling Method
- Population is divided into natural subsets
(strata)
- Items are randomly selected from each stratum
- The number selected from each stratum is
proportional to the size of the stratum.
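Proportional allocation can be sketched as follows. The function name `stratified_sample` and the three strata (institutions grouped by size) are hypothetical; rounding the allocation is one simple choice among several and can make the total differ slightly from the requested sample size.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from each stratum, sized in
    proportion to that stratum's share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation, rounded to a whole number of items.
        k = round(total_n * len(items) / population_size)
        sample[name] = rng.sample(items, k)
    return sample

# Hypothetical population of 1,000 institutions in three strata.
strata = {
    "large": list(range(0, 100)),
    "medium": list(range(100, 400)),
    "small": list(range(400, 1000)),
}

sample = stratified_sample(strata, total_n=50, seed=42)
sizes = {name: len(items) for name, items in sample.items()}
print(sizes)
```

With these stratum sizes, a sample of 50 is split 5 / 15 / 30, mirroring the 10% / 30% / 60% make-up of the population.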
12Stratified Sampling Example
Population: Cash holdings of all financial
institutions in the country
Stratified sample: Cash holdings of financial
institutions
13Cluster Sampling
- Cluster sampling refers to a method by which the
population is divided into groups, or clusters,
that are each intended to be mini-populations. A
random sample of m clusters is selected.
14Cluster Sampling Example
Mid-Level Managers by Location for a Company
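A minimal sketch of the selection step, assuming the clusters are office locations and each cluster lists its mid-level managers. The location names, manager IDs, and the function name `cluster_sample` are invented for illustration.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m whole clusters and return their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: managers grouped by office location.
clusters = {
    "Location A": ["A1", "A2", "A3"],
    "Location B": ["B1", "B2"],
    "Location C": ["C1", "C2", "C3", "C4"],
    "Location D": ["D1", "D2", "D3"],
}

selected = cluster_sample(clusters, m=2, seed=3)
for location, managers in selected.items():
    print(location, managers)
```

Every manager in a selected location is then surveyed, or a further random sample can be drawn within each selected cluster.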
15Sampling Error
- SAMPLING ERROR - SINGLE MEAN
- The difference between a value (a statistic)
computed from a sample and the corresponding
value (a parameter) computed from a population.
- Sampling error $= \bar{x} - \mu$, where
- $\bar{x}$ = sample mean and $\mu$ = population mean
16Sampling Error (Cont.)
- Sampling error is inherent in any sampling
process because samples are only a
subset of the total population.
- Sampling error depends on the size of the
sample relative to the population
- Sampling error can be minimized but not
eliminated.
17Sampling Error (Cont.)
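- If the sample size is more than 5% of the
population
- The with-replacement assumption of the Central Limit
Theorem, and hence the Standard Error calculation,
is violated
- Correction by the finite population correction
factor is needed: $\sqrt{(N - n)/(N - 1)}$, where
N = population size and n = sample size.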
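The correction can be sketched as below. The factor used, sqrt((N - n) / (N - 1)), is the usual finite population correction and is an assumption here, since the slide's own formula is not shown. The function name and the numbers are hypothetical.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population
    correction applied when the sample exceeds 5% of the population."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        # Assumed correction factor: sqrt((N - n) / (N - 1)).
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(sigma=10, n=100))             # population size unknown or very large
print(standard_error(sigma=10, n=100, N=10_000))   # 1% of population: no correction
print(standard_error(sigma=10, n=100, N=1_000))    # 10% of population: corrected
```

The correction always shrinks the standard error, because a sample that covers a large share of the population leaves less of it unobserved.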
18Sampling Size
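- Sample Size Determination: $n = z_{\alpha/2}^2 \sigma^2 / E^2$
where n = sample size; z = z-score, a factor
representing probability in terms of standard
deviation; $\sigma$ = population standard deviation;
$\alpha$ = 100% - confidence level; E =
interval on either side of the mean (margin of error)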
19Estimation
- Estimation (Inference) is assessing the value
of a population parameter using sample data
- Two types of estimation
- Point Estimates
- Interval Estimates
20Estimation
FOR ESTIMATION, ALWAYS USE THE z OR t DISTRIBUTION
21Estimation (Cont.)
- The most common point estimates are the descriptive
statistical measures.
- If the expected value of an estimator equals
the population parameter, the estimator is called
unbiased.
22Estimation (Cont.)
That means that we can use unbiased sample estimates
as if they were population parameters without
committing a systematic error; an individual
estimate still carries sampling error.
23Estimation (Cont.)
- An Interval Estimate provides a range within which
the population parameter falls with a certain
likelihood.
- Confidence Level is the probability (likelihood)
that the interval contains the population
parameter. The most commonly used confidence levels
are 90%, 95%, and 99%.
24Confidence Interval
- Confidence Interval (CI) is an interval estimate
specified from the perspective of the point
estimate.
- In other words, a CI is
- an interval on either side (±) of the point
estimate
- based on a multiple (t or z-score) of the Std.
Dev. of the point estimate
25Confidence Intervals
[Figure: the Point Estimate, with the Lower Confidence Limit on one side and the Upper Confidence Limit on the other]
26 95% Confidence Intervals
[Figure: standard normal curve with a central area of 0.95 between $-z_{.025} = -1.96$ and $z_{.025} = 1.96$]
27CI for Proportions
- For categorical variables having only two
possible outcomes, proportions are important.
- An unbiased estimator of the population proportion
(p) is the sample statistic
$\bar{p} = x/n$, where x = number of observations in the
sample with the desired characteristic and n = sample size
28Confidence Interval- From General to Specific
Format -
29CI of the Mean (Cont.)
$\bar{x} \pm E$, where E = Margin of Error
30Confidence Interval- From Statistical Expression
to Excel Formula -
- Where
- $z_{\alpha/2}$ = NORMSINV(1 - α/tails)
- and when n < 30 AND σ is not known, z is
replaced by t, so
- $t_{\alpha/2,\,n-1}$ = TINV(2α/tails, n-1)
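For readers working outside Excel, the z critical value can be sketched with Python's standard library. `NormalDist().inv_cdf` plays the role of NORMSINV here; the function name `z_critical` is made up for this example. The standard library has no t-distribution, so the t critical value would need a third-party package (for instance SciPy's `scipy.stats.t.ppf`), which is not shown.

```python
from statistics import NormalDist

def z_critical(alpha, tails=2):
    """Analogue of NORMSINV(1 - alpha/tails) for the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / tails)

# alpha is written as a proportion here: 0.05 for a 95% confidence level.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  two-tailed z = {z_critical(alpha):.3f}")
```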
31CI of the Mean (Cont.)
where
- z = z-score: a critical factor (critical value)
representing probability in terms of Standard
Deviation (Standard Error for sampling); valid
for the normal distribution
- t = t-score: a critical factor (critical value)
representing probability in terms of Standard
Deviation (or Std. Error); valid for the t distribution
- α = 100% - confidence level
32Confidence Interval
33Z-score
- A z-score is a critical factor, indicating how
many standard deviations (standard errors for
sampling) away from the mean a value should be to
observe a particular (cumulative) probability.
- There is a relationship between z-score and
probability: p(x) = (1 - NORMSDIST(z)) × tails
- There is a relationship between z-score and the
value of the random variable: $z = (x - \mu)/\sigma$
34Z-score (Cont.)
- Since the z-score is a measure of distance from
the mean in terms of Standard Deviation (Standard
Error for sampling), it provides us with
information that a cumulative probability could
not. For example, the larger the z-score, the more
unusual the observation.
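Both relationships can be sketched in a few lines, assuming the usual definition z = (x - mean) / standard deviation and using `NormalDist().cdf` in place of NORMSDIST. The function names and the example values are hypothetical; taking the absolute value of z is a small addition so that negative z-scores give the same tail probability.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviations."""
    return (x - mean) / sd

def tail_probability(z, tails=2):
    """Analogue of (1 - NORMSDIST(z)) * tails."""
    return (1 - NormalDist().cdf(abs(z))) * tails

z = z_score(x=110, mean=100, sd=5)
print(f"z = {z:.2f}")
print(f"two-tailed probability = {tail_probability(z):.4f}")
print(f"one-tailed probability = {tail_probability(z, tails=1):.4f}")
```

A larger z gives a smaller tail probability, which is the sense in which a large z-score marks an unusual observation.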
35Student's t-Distribution
The t-distribution is a family of distributions
that is bell-shaped and symmetric like the
Standard Normal Distribution but with greater
area in the tails. Each distribution in the
t-family is defined by its degrees of freedom.
As the degrees of freedom increase, the
t-distribution approaches the normal distribution.
36Degrees of freedom
Degrees of freedom (df) refers to the number of
independent data values available to estimate the
population's standard deviation. If k parameters
must be estimated before the population's
standard deviation can be calculated from a
sample of size n, the degrees of freedom are
equal to n - k.
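For the sample standard deviation, one parameter (the mean) is estimated first, so k = 1 and df = n - 1. The simulation below is a rough illustration of why the divisor n - 1 is used: the normal population with variance 4, the sample size of 5, and the 20,000 trials are all assumptions made for the demonstration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

true_variance = 4.0  # hypothetical population: normal, mean 0, std. dev. 2
n = 5
trials = 20_000

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))         # divisor n
    divide_by_n_minus_1.append(statistics.variance(sample))  # divisor n - 1

avg_biased = statistics.fmean(divide_by_n)
avg_unbiased = statistics.fmean(divide_by_n_minus_1)

print(f"true variance           = {true_variance}")
print(f"average with divisor n   = {avg_biased:.3f}")
print(f"average with divisor n-1 = {avg_unbiased:.3f}")
```

On average, dividing by n understates the population variance by a factor of (n - 1)/n, while dividing by the degrees of freedom does not.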
37Example of a CI Estimate for μ
- A sample of 100 cans, from a population with σ =
0.20, produced a sample mean equal to 12.09. A
95% confidence interval would be
$12.09 \pm 1.96 \times 0.20/\sqrt{100} = 12.09 \pm 0.0392$,
i.e., 12.051 ounces to 12.129 ounces
38Example of Impact of Sample Size on Confidence
Intervals
- If, instead of a sample of 100 cans, a
sample of 400 cans, from a population with σ =
0.20, produced a sample mean equal to 12.09, a
95% confidence interval would be
- n = 400: 12.0704 ounces to 12.1096 ounces
- n = 100: 12.051 ounces to 12.129 ounces
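The two intervals can be reproduced with a short sketch, assuming the z-based interval (sample mean ± z × σ / √n) applies because σ is given. The function name `mean_confidence_interval` is invented for the example.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """z-based confidence interval for a mean with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    margin = z * sigma / math.sqrt(n)        # margin of error E
    return x_bar - margin, x_bar + margin

for n in (100, 400):
    lower, upper = mean_confidence_interval(12.09, 0.20, n)
    print(f"n = {n}: {lower:.4f} to {upper:.4f} ounces")
```

Quadrupling the sample size from 100 to 400 halves the margin of error, since the standard error shrinks with the square root of n.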
39Example of CI for Proportion
- 62 out of a sample of 100 individuals who were
surveyed by Quick-Lube returned within one month
to have their oil changed. A 90%
confidence interval for the true proportion of
customers who actually return is built around the
sample proportion $\bar{p} = 62/100 = 0.62$.
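One way to carry out the calculation is sketched below. It assumes the usual large-sample standard error for a proportion, sqrt(p̄(1 - p̄)/n), which is not spelled out on the slide. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.90):
    """z-based confidence interval for a population proportion."""
    p_bar = x / n                            # sample proportion
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    # Assumed standard error: sqrt(p_bar * (1 - p_bar) / n).
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin

lower, upper = proportion_confidence_interval(x=62, n=100, confidence=0.90)
print(f"90% CI for the proportion: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval runs from roughly 0.54 to 0.70.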
40From Margin of Error to Sampling Size
41Sampling Size
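- Sample Size Determination: $n = z_{\alpha/2}^2 \sigma^2 / E^2$
where n = sample size; z = z-score, a factor
representing probability in terms of standard
deviation; $\sigma$ = population standard deviation;
$\alpha$ = 100% - confidence level; E =
interval on either side of the mean (margin of error)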
42Sampling Size
43Pilot Samples
A pilot sample is a sample taken from the
population of interest to provide an estimate
of the population standard deviation. Normally
its size is smaller than the anticipated sample
size.
44Example of Determining Required Sample Size
- The manager of the Georgia Timber Mill wishes to
construct a 90% confidence interval with a margin
of error of 0.50 inches in estimating the mean
diameter of logs. A pilot sample of 100 logs
yields a sample standard deviation of 4.8 inches.
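The required sample size can be sketched as follows, assuming the formula n = z² σ² / E² with the pilot standard deviation standing in for σ, and rounding up to the next whole log. The function name is made up for the example.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence):
    """Smallest n for which the z-based margin of error is at most `margin`."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return math.ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=4.8, margin=0.50, confidence=0.90)
print(f"required sample size: {n} logs")  # prints 250
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.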
45Next Lesson
- (Lesson - 04/B)
- Hypothesis Testing