Title: Supplement 12: FinitePopulation Correction Factor
1Supplement 12Finite-Population Correction Factor
The ppt is a joint effort Mr KE Yi discussed
whether we should make the finite population
correction with Dr. Ka-fu Wong on 19 April 2007
Ka-fu explained the problem Yi drafted the ppt
Ka-fu revised it. Use it at your own risks.
Comments, if any, should be sent to
kafuwong_at_econ.hku.hk.
2Binomial distribution
- The binomial distribution has the following
characteristics - An outcome of an experiment is classified into
one of two mutually exclusive categories, such as
a success or failure. - The data collected are the results of counts in a
series of trials. - The probability of success stays the same for
each trial. - The trials are independent.
- For example, tossing an unfair coin three times.
- H is labeled success and T is labeled failure.
- The data collected are number of H in the three
tosses. - The probability of H stays the same for each
toss. - The results of the tosses are independent.
3Hypergeometric Distribution
- The hypergeometric distribution has the following
characteristics - There are only 2 possible outcomes.
- The probability of a success is not the same on
each trial. - It results from a count of the number of
successes in a fixed number of trials.
4Example Hypergeometric Distribution
In a bag containing 7 red chips and 5 blue chips
you select 2 chips one after the other without
replacement.
R2
6/11
R1
7/12
B2
5/11
R2
7/11
B1
5/12
B2
4/11
The probability of a success (red chip) is not
the same on each trial.
5Binomial vs. hypergeometric
The formula for the binomial probability
distribution is P(x) C(n,x) px(1- p)n-x
The formula for the hypergeometric probability
distribution is P(x) C(S,x)C(N-S,n-x)/C(N,n)
6Example Hypergeometric Distribution
In a bag containing 700 red chips and 500 blue
chips you select 2 chips one after the other
without replacement.
R2
699/1199
R1
700/1200
500/1199
B2
R2
700/1199
B1
500/1200
B2
499/1199
The probability of a success (red chip) is not
the same on each trial but the difference is very
small. So small that our conclusion will not be
affected much even if we ignore the difference in
probability in the two trials.
7Example Hypergeometric Distribution
In a bag containing 700 red chips and 500 blue
chips you select 2 chips one after the other
without replacement.
The probability of a success (red chip) is not
the same on each trial but the difference is very
small. So small that our conclusion will not be
affected even if we ignore the difference in
probability in the two trials.
Knowing when we can ignore the difference is
important when
1. we have limited computation power.
2. N is unknown but we know that N is somewhat
large relative to sample size.
8Finite? Infinite?
- Actually, most population we are working with is
finite - Number of goods produced in a workshop per hour
- The number of times you visit a club per month
- Even the population of a big city, say Shanghai
-
-
ALL FINITE!!
9Cost and benefits
- Would it be harmful to make correction every time
you came across a finite population? - No harm!
- But the correction can be costly
- Need to collect / know the population size.
- Need additional computational power.
- Need to remember one additional formula
- Whether to recognize the finite population (and
hence whether to use a different formula) depends
on the benefits and costs of doing so.
10Cost and benefits
- Whether to recognize the finite population (and
hence whether to use a different formula) depends
on the benefits and costs of doing so. - Benefits increases with the proportion of sample
size to population size. - Cost is higher when
- We do not have a computer with us.
- When we need to make extra effort to find out N.
Use an approximation instead of the exact formula
if
Cost of using exact formula gt benefit
11Finite? Infinite?
- Actually, most population we are working with is
finite - Number of goods produced in a workshop per hour
- The number of times you visit a club per month
- Even the population of a big city, say Shanghai
-
-
ALL FINITE!!
But do we know N ?
12An Example benefits decreases when n/N decreases
- A population of size N500 (finite), ? is known
- When the size of sample n300
- With finite population correction the standard
error of the sample is - Why? 0.633 make a big difference.
-
The benefit of recognizing the finite population
(in terms of the conclusion of testing hypothesis
and constructing CI) appears big.
13An Example benefits decreases when n/N decreases
- A population of size N500 (finite), ? is known
- When the size of sample n5
- With finite population correction the standard
error of the sample is - Why? 0.996 make a small difference
The benefit of recognizing the finite population
(in terms of the conclusion of testing hypothesis
and constructing CI) appears small.
14N relative to n
- 1st case, 500 is relative small when the sample
size is 300, correction is needed - 2nd case, 500 is relative large when the sample
size is 5, correction can be ignored (a finite
population can be treated as infinite) - Notes
- Usually, If n/N lt .05, the finite-population
correction factor is ignored - This applies to sample proportions as well
15Supplement 12Finite-Population Correction Factor