Chapter 20: chance errors in sampling - PowerPoint PPT Presentation

Slides: 22
Provided by: University354
Transcript and Presenter's Notes

1
Chapter 20 chance errors in sampling
Sample surveys involve chance error:
estimate = parameter + bias + chance error
Chance error is present even when we have used random sampling.
Chance error is also referred to as sampling error.
This chapter deals with how we find the likely size of the chance error in a percentage, for simple random samples, when we know the composition of the population.
Example: a health study based on a representative cross-section of 6,672 people aged 18 to 79.
We want to interview them but can only afford to interview 100. To avoid bias, we want to draw the 100 at random.
2
Chapter 20 chance errors in sampling
Step 1: the computer randomly selects 100 numbers between 1 and 6,672.
Question: is such a sample representative of the population?
For gender: 3,091/6,672 males = 46%; 3,581/6,672 females = 54%.
The same question applies to age, income, education, etc.
Sample 1: 51/100 males = 51%, a chance error of 5 percentage points.
Repeat: draw random samples of 100, 250 times (OH Table 2, p. 357).
Some samples are < 46% male, some are > 46% male; lowest 34%, highest 58%.
Only 17 samples out of 250 had exactly 46% male (OH Fig 1, p. 357).
Chance variability prevents more samples from having exactly 46% males.
Chance variability in sampling = chance variability in coin tossing.
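The repeated-sampling experiment on this slide can be imitated in a few lines. This is only a sketch: the seed and the variable names are arbitrary choices, and the results will not reproduce the textbook's table, only the same kind of spread around 46%.

```python
import random

# Population from the slide: 3,091 males (ticket 1) and 3,581 females (ticket 0).
population = [1] * 3091 + [0] * 3581

def percent_male(sample_size, rng):
    """Draw one simple random sample (without replacement); return the % male."""
    sample = rng.sample(population, sample_size)
    return 100 * sum(sample) / sample_size

rng = random.Random(20)  # arbitrary seed, used only so the run is repeatable
percents = [percent_male(100, rng) for _ in range(250)]

print("lowest % male: ", min(percents))
print("highest % male:", max(percents))
print("samples with exactly 46% male:", sum(1 for p in percents if p == 46))
```

Most samples miss 46% by a few percentage points in one direction or the other, which is the chance error the slide describes.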
3
Chapter 20 chance errors in sampling
Chance variability in sampling = chance variability in coin tossing.
Toss a coin:
get either a head or a tail
the number of heads either goes up or stays the same
probabilities are 50-50 each time
Sampling:
get either a male or a female
the number of males either goes up or stays the same
probabilities are about 46-54 each time (taking 100 out of 6,672 does not change P much)
What about increasing the sample size? Is the sample more representative of the population?
4
Chapter 20 chance errors in sampling
The computer randomly selects 400 numbers between 1 and 6,672.
Question: is such a sample representative of the population?
For gender: 3,091/6,672 males = 46%; 3,581/6,672 females = 54%.
Repeat: draw random samples of 400, 250 times (OH Table 2, p. 357).
Some samples are < 46% male, some are > 46% male; lowest 39%, highest 54%.
OH Fig 2, p. 358 (note less variability than Fig 1).
Note: multiplying the sample size by 4 reduces the likely size of the chance error in the
percentage by a factor of 2.
Sample size (# of draws) = 100: SE for % = 5%.
Sample size x 4 = 400: SE for % = 2.5%.
Note: % males in sample = % males in population + chance error.
The chance error varies from sample to sample.
The standard error allows us to estimate its likely size.
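A quick way to see the factor-of-2 claim is to measure the spread of the sample percentage for both sample sizes. The sketch below assumes the same 0-1 population as the health study example; the helper name `spread_of_percents` and the seed are illustrative only.

```python
import random
import statistics

population = [1] * 3091 + [0] * 3581  # 1 = male, 0 = female

def spread_of_percents(sample_size, repetitions, rng):
    """SD of the sample percentage over repeated simple random samples."""
    percents = [
        100 * sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(repetitions)
    ]
    return statistics.pstdev(percents)

rng = random.Random(400)  # arbitrary seed
spread_100 = spread_of_percents(100, 250, rng)
spread_400 = spread_of_percents(400, 250, rng)
ratio = spread_100 / spread_400

print(f"spread with n=100: {spread_100:.2f} percentage points")
print(f"spread with n=400: {spread_400:.2f} percentage points")
print(f"ratio: {ratio:.2f}")
```

With only 250 repetitions the ratio will not be exactly 2, but it should land close to it.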
5
Chapter 20 Expected value and standard error
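In simple random sampling: expected value for the sample % = population %.
So our box model is:
3,091 1s (males)
3,581 0s (females)
Population % for males = 3,091/6,672 = 46%.
Expected value for the % of males in the sample = 46%.
But this will be off due to chance error, whose likely size we can estimate with the SE.
SE for the number = $\sqrt{\text{number of draws}} \times \text{SD of the box}$
SD of the box = $\sqrt{0.46 \times 0.54} \approx 0.50$
SE for 100 draws = $\sqrt{100} \times 0.50 = 5$
So the number of males in the sample is expected to be 46, give or take 5.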
6
Chapter 20 chance errors in sampling
Note: computing the SE for a %:
SE for % = (SE for the # / sample size) x 100%
In our example (n = 100):
expected value = 46% = 46 out of 100
box SD = 0.50
SE for # = $\sqrt{100} \times 0.5 = 5$
so the sum of 100 draws from the box = 46 +/- 5
SE for % = (5/100) x 100% = 5%
As n increases, the SE for the # increases by the square root of the factor by which n was multiplied.
So with n = 400:
expected value = 46% = 184/400
box SD = 0.50
SE for # = $\sqrt{400} \times 0.5 = 10$
so the sum of 400 draws from the box = 184 +/- 10
SE for % = (10/400) x 100% = 2.5%
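The arithmetic on this slide can be checked with a small calculation. The function names below are made up for illustration, and the code uses the exact box SD rather than the rounded 0.50, so the results come out slightly under 5, 10 and 2.5.

```python
import math

def box_sd(fraction_of_ones):
    """SD of a 0-1 box: sqrt(p * (1 - p))."""
    return math.sqrt(fraction_of_ones * (1 - fraction_of_ones))

def se_for_number(n_draws, fraction_of_ones):
    """Square root law: SE for the sum of the draws (the number of 1s)."""
    return math.sqrt(n_draws) * box_sd(fraction_of_ones)

def se_for_percent(n_draws, fraction_of_ones):
    """SE for the number, converted to percentage points of the sample."""
    return se_for_number(n_draws, fraction_of_ones) / n_draws * 100

p_male = 3091 / 6672
for n in (100, 400):
    print(
        f"n={n}: expected number {n * p_male:.0f}, "
        f"SE for number {se_for_number(n, p_male):.2f}, "
        f"SE for % {se_for_percent(n, p_male):.2f}"
    )
```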
7
Chapter 20 chance errors in sampling
These formulas are exact when sampling with replacement.
They are good approximations when sampling without replacement, if the # of draws (n) is small relative to the # of tickets in the box (the population size).
Our example: population = 6,672; # of draws = 100.
Regardless of which tickets are drawn, the % of 1s in the box stays close to 46%.
First draw: P(1) = 46%.
If the first 99 draws are all 1s: P(final draw = 1) = 2,992/6,573 = 45.5%.
As far as probabilities are concerned, when n is small relative to the population size, there is not much difference between sampling with or without replacement.
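The extreme case on this slide is easy to verify directly. A minimal sketch, using exact fractions so no rounding creeps in before the final print:

```python
from fractions import Fraction

males, population = 3091, 6672

first_draw = Fraction(males, population)
# Extreme case from the slide: the first 99 tickets drawn were all 1s (males),
# so 99 males and 99 tickets have left the box before the final draw.
final_draw = Fraction(males - 99, population - 99)

print(f"P(1) on the first draw:            {float(first_draw):.3f}")
print(f"P(1) on draw 100, after 99 males:  {float(final_draw):.3f}")
```

Even in this worst case the chance moves by less than one percentage point.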
8
Chapter 20 chance errors in sampling
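So the SE for a % comes from the SE for a #:
SE for % = (SE for the # / sample size) x 100%
But notice that the two SEs behave very differently:
SE for a #: multiplying n by a factor increases the SE by the square root of that factor.
SE for a %: multiplying n by a factor decreases the SE by the square root of that factor.
Recall:
with n = 100: SE for # = 5; SE for % = (5/100) x 100% = 5%
with n = 400: SE for # = 10; SE for % = (10/400) x 100% = 2.5%
Exercise Set A: 1, 2, 3, 4, 8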
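The opposite behaviour of the two SEs follows from one line of algebra. In the sketch below, the symbols are notation introduced here, not the slides': $\sigma$ is the SD of the box, $n$ the number of draws, and $k$ the factor by which $n$ is multiplied.

```latex
\begin{align*}
\text{SE for number} &= \sqrt{n}\,\sigma \\
\text{SE for percentage} &= \frac{\sqrt{n}\,\sigma}{n} \times 100\%
                          = \frac{\sigma}{\sqrt{n}} \times 100\% \\
n \to kn: \quad
\text{SE for number} &\to \sqrt{k}\,\bigl(\sqrt{n}\,\sigma\bigr), \qquad
\text{SE for percentage} \to \frac{1}{\sqrt{k}}\left(\frac{\sigma}{\sqrt{n}} \times 100\%\right)
\end{align*}
```

With $k = 4$ and $\sigma = 0.5$, the SE for the number goes from 5 to 10 while the SE for the percentage goes from 5% to 2.5%, matching the recap above.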
9
Chapter 20 using the normal curve
Example: population = 100,000 cable subscribers.
The company wants to do a market survey using a simple random sample of 400.
Census information: 20% of the 100,000 earn > $50,000 per year.
Percentage in the sample earning > $50,000 = ?
Step 1: Draw the box model.
income > $50,000 = ticket with 1
income < $50,000 = ticket with 0
Population = 20,000 1s and 80,000 0s.
Drawing at random without replacement.
Sample of 400 = 400 draws from the box.
Step 2: Expected value for the sum of 400 draws = 400 x (box average) = 400 x 0.20 = 80.
SD of the box = $\sqrt{0.20 \times 0.80} = 0.40$
SE for the sum = $\sqrt{400} \times 0.40 = 8$
So the sum of the draws will be around 80, give or take 8.
10
Chapter 20 using the normal curve
Example: population = 100,000 cable subscribers.
The company wants to do a market survey using a simple random sample of 400.
Census information: 20% of the 100,000 earn > $50,000 per year.
Percentage in the sample earning > $50,000 = ?
So the sum of the draws will be around 80, give or take 8.
The question is about a %, so we need to convert to a % relative to n.
Expected value for % = (80/400) x 100% = 20%
SE for % = (8/400) x 100% = 2%
Expected value for the sample % = 20%, give or take 2%.
So about 20% +/- 2% of the sample will earn > $50,000.
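The two steps of this example (work with the number first, then convert to a percentage) can be written out as a short calculation. The variable names are illustrative only.

```python
import math

fraction_high = 0.20   # 20% of subscribers earn more than $50,000 (from the slide)
n = 400                # size of the simple random sample

box_average = fraction_high
box_sd = math.sqrt(fraction_high * (1 - fraction_high))

# Step 1: expected value and SE for the NUMBER of high earners in the sample.
ev_number = n * box_average
se_number = math.sqrt(n) * box_sd

# Step 2: convert both to a PERCENTAGE of the sample.
ev_percent = ev_number / n * 100
se_percent = se_number / n * 100

print(f"number of high earners: {ev_number:.0f} give or take {se_number:.0f}")
print(f"% of high earners:      {ev_percent:.0f}% give or take {se_percent:.0f}%")
```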
11
Chapter 20 using the normal curve
Example 2: population = 100,000 cable subscribers.
The company wants to do a market survey using a simple random sample of 400.
Census information: 20% of the 100,000 earn > $50,000 per year.
P(between 18% and 22% of the sample earn > $50,000) = ?
We know that:
Expected value for % = (80/400) x 100% = 20%
SE for % = (8/400) x 100% = 2%
Convert to standard units:
Sample %:        18%   20%   22%
Standard units:  -1     0    +1
Table A105: 68% lies between -1 and +1 standard units.
12
Chapter 20 using the normal curve
Example 2: population = 100,000 cable subscribers.
The company wants to do a market survey using a simple random sample of 400.
Census information: 20% of the 100,000 earn > $50,000 per year.
P(between 18% and 22% of the sample earn > $50,000) = 68%
Here the normal curve was used to figure the probability. This is legitimate because there is a probability histogram for the # of high earners in the sample (OH Fig 3, p. 365).
For example: the area between 80 and 90 represents the chance of drawing a sample which has between 80 and 90 high earners.
Expected value = 80; SE = 8.
P(80 to 90 high earners) = P(between 0 and 1.25 standard units) = 39%
The normal approximation is valid because the probability histogram follows the normal curve.
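Instead of reading the normal table, the same areas can be computed with Python's standard library. This is a sketch of the normal approximation only; `chance_between` is a helper name invented here.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def chance_between(low, high, expected, se):
    """Normal-curve area between two values, after converting to standard units."""
    z_low = (low - expected) / se
    z_high = (high - expected) / se
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Sample percentage between 18% and 22% (expected 20%, SE 2%).
p_percent = chance_between(18, 22, expected=20, se=2)
# Number of high earners between 80 and 90 (expected 80, SE 8).
p_number = chance_between(80, 90, expected=80, se=8)

print(f"P(18% to 22%) is about {p_percent:.0%}")
print(f"P(80 to 90)   is about {p_number:.0%}")
```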
13
Chapter 20 using the normal curve
Note: both examples started out with quantitative data (incomes).
Both problems required reclassifying: incomes were reclassified as high (1) and not-high (0), then the high earners were counted.
Ultimately the data were treated as qualitative: each income either is or is not high.
When do we reclassify to a box with 1s and 0s? It depends on what we want to do with the sample values:
to add up the sample values and compute an average: keep the original values
to classify values as 0 or 1 and count the 1s: now we reclassify
Exercise Set B: 1, 2, 3
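The difference between averaging and classify-and-count can be seen on a toy list. The incomes below are hypothetical, invented purely for illustration; only the $50,000 threshold comes from the example.

```python
# Hypothetical incomes in dollars (not from the slides).
incomes = [32_000, 75_000, 48_500, 51_000, 120_000, 27_000, 50_000, 64_000]

# Quantitative use: add up the values and compute an average (no reclassifying).
average_income = sum(incomes) / len(incomes)

# Qualitative use: reclassify each income as a 0-1 ticket, then count the 1s.
threshold = 50_000
tickets = [1 if income > threshold else 0 for income in incomes]
number_high = sum(tickets)
percent_high = 100 * number_high / len(tickets)

print("average income:", average_income)
print("tickets:", tickets)
print("high earners:", number_high, f"({percent_high:.0f}%)")
```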
14
Chapter 20 the correction factor
When estimating %s, accuracy depends on the absolute size of the sample, NOT on the size of the sample relative to the population.
(This is true when the sample is only a small proportion of the population.)
Example: 1992 presidential election.
New Mexico poll: 2,500 voters out of 1.2 million, or about 1 out of 500.
Texas poll: 2,500 voters out of 12.5 million, or 1 out of 5,000.
So we have 2 boxes:
NM box: 1.2 million tickets; Democrats = 1, others = 0; % of 1s = 50%.
TX box: 12.5 million tickets; Democrats = 1, others = 0; % of 1s = 50%.
Company A polls 2,500: estimate of the % of 1s in the population based on the sample; off by chance error.
Company B polls 2,500: estimate of the % of 1s in the population based on the sample; off by chance error.
15
Chapter 20 the correction factor
Assume we sample with replacement: it does not matter which box is used.
A 1 has a 50-50 chance of being drawn from each box, and the box size is irrelevant.
Both boxes: SD = 0.50.
Both SEs = $\sqrt{2{,}500} \times 0.50 = 25$
SE for the % of 1s among the draws = (25/2,500) x 100% = 1%
16
Chapter 20 the correction factor
If we sample without replacement, does it matter which box is used?
The # of draws in both cases is a tiny fraction of the population.
On each draw the chance of drawing a 1 is still close to 0.50.
However, the box does get a little smaller with each draw, so the SE should vary slightly.
This is dealt with via a mathematical formula:
SE (without replacement) = correction factor x SE (with replacement)
17
Chapter 20 the correction factor
Correction factor = $\sqrt{\dfrac{\text{number of tickets in box} - \text{number of draws}}{\text{number of tickets in box} - 1}}$
OH Table 3, p. 368: when the population is large relative to the sample size, the correction factor is close to 1.0 and can be ignored.
Since SE (without replacement) = correction factor x SE (with replacement),
SE (without replacement) is about 1 x SE (with replacement).
In this case the absolute size of the sample determines the accuracy, via the SE for drawing with replacement.
If the sample is a substantial proportion of the population, then the correction factor must be used.
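To see how close to 1 the correction factor is for the two polls, it can be computed directly. The sketch uses the standard finite-population formula, sqrt((N - n) / (N - 1)), with N the number of tickets and n the number of draws; the third box ("small town", 5,000 tickets) is a hypothetical case added here to show a sample that is a substantial share of the population.

```python
import math

def correction_factor(population_size, n_draws):
    """sqrt((N - n) / (N - 1)) for n draws without replacement from N tickets."""
    return math.sqrt((population_size - n_draws) / (population_size - 1))

n_draws = 2500
se_with_replacement = math.sqrt(n_draws) * 0.5   # 25 for a 50-50 box

boxes = {
    "NM": 1_200_000,
    "TX": 12_500_000,
    "small town (hypothetical)": 5_000,
}

results = {}
for name, size in boxes.items():
    cf = correction_factor(size, n_draws)
    se_percent = cf * se_with_replacement / n_draws * 100
    results[name] = (cf, se_percent)
    print(f"{name}: correction factor {cf:.4f}, SE for % {se_percent:.3f}")
```

For the two state polls the SE for the percentage stays at about 1 percentage point; only the hypothetical small town shows a noticeable reduction.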
18
Chapter 20 the correction factor
Correction factor
Note: this works even if the %s of 1s differ between the boxes, as the SDs remain about the same.
Election example:
NM: 46% 1s, 54% 0s; SD = 0.50
TX: 37% 1s, 63% 0s; SD = 0.48
Thus the SEs are not that much different:
NM: SE = 25; TX: SE = 24
In terms of %s: 1% versus 0.96%.
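These figures can be reproduced without rounding the SDs first. A small sketch with illustrative variable names; because the slide rounds the Texas SD to 0.48, its 0.96% becomes about 0.97% here.

```python
import math

n_draws = 2500
shares = {"NM": 0.46, "TX": 0.37}   # fraction of 1s in each box, from the slide

summary = {}
for state, share in shares.items():
    sd = math.sqrt(share * (1 - share))      # SD of the 0-1 box
    se_number = math.sqrt(n_draws) * sd      # SE for the number of 1s
    se_percent = se_number / n_draws * 100   # SE for the % of 1s
    summary[state] = (sd, se_number, se_percent)
    print(f"{state}: SD {sd:.3f}, SE for number {se_number:.1f}, SE for % {se_percent:.2f}")
```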
19
Chapter 20 the correction factor
Non-mathematical analogy: sample a drop of liquid from a bottle for analysis.
A drop from a well mixed liquid should represent the composition of the entire bottle.
It does not matter what the size of the bottle is: the drop could be 1% or 1/100 of 1% of the liquid.
Tickets in the box are analogous to the molecules of the liquid.
A drop from a well mixed liquid is like a random sample.
The # of molecules in the drop corresponds to the # of tickets drawn.
That # (the sample size) is so large that the chance error in the %s is negligible.
Exercise Set C: 1, 2, 4, 5
20
Chapter 20 review
1. The sample is a small part of the population, so the composition of the sample differs from that of the population.
2. For probability samples, the SE estimates the likely size of the chance error.
3. To figure the SE, we need a box model.
4. When classifying and counting, or figuring %s, the box must contain only 1s and 0s.
5. When drawing at random from a 1 and 0 box: expected value for the % of 1s in the sample = the % in the box.
6. To get the SE in this case, first get the SE for the corresponding number and then convert to a % via SE for % = (SE for # / sample size) x 100%.
21
Chapter 20 review
7. When the sample is small compared to the population, it is the absolute size of the sample that determines the accuracy of the sample; the size of the sample relative to the population is meaningless.
8. The square root law is exact when drawing with replacement.
9. When drawing without replacement, the law gives a good approximation provided that the sample size is small compared to the population.
10. When drawing without replacement, to get the exact SE we need to multiply by the correction factor.
11. When the population is much larger than the sample, the correction factor is close to 1 and not necessary.
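Points 8 to 11 can be checked empirically on a case where the correction factor matters. The box below (1,000 tickets, half of them drawn) is a hypothetical example chosen so that the sample is a large share of the population; the seed and repetition count are arbitrary.

```python
import math
import random
import statistics

# Hypothetical small box: 500 tickets marked 1 and 500 marked 0.
box = [1] * 500 + [0] * 500
n_draws = 500
rng = random.Random(11)  # arbitrary seed

# Observed spread of the number of 1s when drawing WITHOUT replacement.
sums = [sum(rng.sample(box, n_draws)) for _ in range(2000)]
observed_se = statistics.pstdev(sums)

# Square root law (exact with replacement), then the correction factor.
se_with_replacement = math.sqrt(n_draws) * 0.5
cf = math.sqrt((len(box) - n_draws) / (len(box) - 1))
predicted_se = cf * se_with_replacement

print(f"SE with replacement:          {se_with_replacement:.2f}")
print(f"correction factor:            {cf:.3f}")
print(f"predicted SE without repl.:   {predicted_se:.2f}")
print(f"observed SD of 2,000 samples: {observed_se:.2f}")
```

The observed spread should sit near the corrected SE and clearly below the uncorrected one, which is why the correction cannot be skipped when the sample is a substantial part of the population.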