Class Handout

About This Presentation

Title:

Class Handout

Description:

A simple random sample of size n is one selected in such a way so that each item ... model to predict whether or not a nonsmoker will develop cardiovascular ... – PowerPoint PPT presentation

Slides: 47

Provided by: genespr

Transcript and Presenter's Notes

1
Class Handout 17 (material not in text)
Definitions
A simple random sample of size n is one selected
in such a way that each item in the sampled
population has an equal chance of being selected
and is selected independently of any other item.
(This is essentially the sampling method which
has been assumed in all of the inferential
statistics studied thus far.)
1. (a)
Suppose that a sample of n = 2 names is to be
selected from the following list of four names:
Patrice, Julia, Daniel, James.
List all possible samples consisting of n = 2
distinct names. How many such samples are
possible?
There are six possible samples of size n = 2;
these are
{Patrice, Julia}
{Patrice, Daniel}
{Patrice, James}
{Julia, Daniel}
{Julia, James}
{Daniel, James}
2
(b)
Suppose one sample in part (a) is to be selected,
with each sample having the same probability of
being selected. Find (i) the probability
corresponding to the selection of any particular
sample, (ii) the probability corresponding
to the selection of any particular name.
(i) 1/6
(ii) 3/6 = 1/2
(c)
Suppose one sample in part (a) is to be selected
by flipping a fair coin; if heads is observed,
the two males are chosen, and if tails is
observed, the two females are chosen. Find
(i) the probability corresponding to the
selection of any particular sample,
(ii) the probability corresponding to the
selection of any particular name.
(i) The probability of selecting the sample
{Patrice, Julia} is 1/2.
The probability of selecting the sample
{Daniel, James} is 1/2.
The probability of selecting each of the other
samples is 0.
(ii) 1/2
(d)
Explain why the sampling method in part (c) is
not simple random sampling.
Even though the probability corresponding to the
selection of any particular name is the same
(1/2) for each name, the sampling method in part
(c) is not simple random sampling because the
names are not selected independently of one
another (for example, Patrice is never selected
together with Daniel).
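The counts and probabilities in Exercise 1 can be checked by brute force. The sketch below is only an illustration: the variable names and the helper inclusion_probability are made up for this example and are not part of the handout. It lists the six samples and compares the two selection schemes from parts (b) and (c).

```python
from fractions import Fraction
from itertools import combinations

names = ["Patrice", "Julia", "Daniel", "James"]

# Part (a): every unordered pair of distinct names.
samples = list(combinations(names, 2))

# Part (b): each of the six samples is equally likely.
srs = {s: Fraction(1, len(samples)) for s in samples}

# Part (c): a fair coin picks either the two females or the two males.
coin = {s: Fraction(0) for s in samples}
coin[("Patrice", "Julia")] = Fraction(1, 2)
coin[("Daniel", "James")] = Fraction(1, 2)

def inclusion_probability(design, name):
    """Probability that `name` appears in the selected sample."""
    return sum(p for s, p in design.items() if name in s)

for name in names:
    print(name, inclusion_probability(srs, name), inclusion_probability(coin, name))
```

Both schemes give every name the same chance of selection, yet under the coin flip four of the six samples can never occur, which is the point made in part (d).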
3
Obtaining a simple random sample in practice is
often not easy, since physical randomization
procedures do not always work as well as one
might expect. It can be difficult and
time-consuming to devise a sampling method that
truly randomizes.
A (calculator or computer) program capable of
generating random numbers or a table of random
numbers produced by such a program can be helpful
with random selection. (The table of random
numbers displayed in this handout was generated
from a sequence of single digits chosen from 0,
1, 2, 3, 4, 5, 6, 7, 8, and 9 in such a way so
that each of the ten integers has an equal
probability (1/10) of being selected, and each
digit is selected independently of all other
digits selected.)
4
Table of Random Numbers Page 1 of 2
52285 53301 71193 18991 34854 61701 10262 41876 19487 06996
70189 45193 46899 90746 97060 46547 64523 16987 60706 51116
83441 17072 50243 83300 63817 07510 05828 95271 07689 29757
16254 51933 02155 93543 20033 88132 16695 58878 39877 32928
84056 70489 14252 94132 04605 38293 60501 20415 82886 04396
01398 76800 59510 70789 59184 72725 81987 60820 67407 06777
46083 69602 70703 54693 85747 69453 40158 84787 65193 07982
74331 57826 36074 65258 56350 67475 10856 05061 54175 45490
25271 57349 62441 93647 49612 26541 48268 02745 50788 47621
39202 76311 95471 81348 27869 51539 78557 98949 37103 52406
34579 78142 36414 51185 76519 93762 52903 80133 50426 60660
05815 46497 15942 97225 73766 07688 03995 63519 96185 79889
12256 07450 14703 66624 65096 05991 47792 58688 42390 72273
47813 55926 97304 99955 38725 88637 32076 00059 69312 80328
14387 76015 22189 48236 40675 56748 08779 05052 81391 91175
12871 38800 45132 29899 05541 97211 00228 37943 75643 24707
01790 16008 71773 70610 30776 99122 06371 95624 85352 10541
51382 87353 27396 55549 61991 35378 10592 91612 24863 82825
07971 05251 88080 17223 00214 02158 32327 39454 00488 12942
10387 42632 71863 15885 19797 33809 43331 23947 52006 49206
5
12011 45094 27304 69411 93673 44863 57021 27073 02530 05554
58606 04896 14194 50974 15115 47297 46172 09915 53970 45107
32011 56324 53923 94211 02603 31294 56761 62518 28402 94369
76525 54749 54816 27929 16543 25163 23840 82143 67689 79686
00885 78999 80525 67499 38229 07047 70281 27868 28125 70575
23897 16928 42382 97327 00010 12017 09077 55481 11533 67540
48352 74755 54420 95487 88860 15474 71007 71176 15957 55571
84380 89850 63181 12825 39798 64614 75342 59859 67318 25476
66567 06900 85623 22070 69356 51678 13675 05150 96915 45290
69947 95093 39012 02347 65222 14159 89423 39265 32780 51071
32141 18317 80061 61293 46152 44829 58578 36048 44759 64105
45368 48058 18536 63557 56784 04185 01216 22678 62302 24258
52706 41960 83558 74152 64791 79863 70243 23171 27735 78682
24526 79798 65401 66916 12948 49739 83458 02173 36878 75965
11143 51658 61071 92410 67354 71470 50351 49670 90852 53216
90436 16823 71004 28796 31809 88289 64149 41630 10620 71445
50695 16631 30928 17151 48671 89919 33993 24160 00432 22646
12661 40143 80449 48565 66144 02950 23605 14699 88568 94758
26095 37076 50570 47774 35325 53309 57110 81659 44003 47856
37401 08878 38040 40886 79776 17453 16386 21934 51879 98593
6
Table of Random Numbers Page 2 of 2
73436 15235 68742 37408 54480 14276 19199 09247 51883 31558
52085 88482 90978 50213 81692 71534 55172 89050 75064 22917
74921 67644 43406 47325 74620 57386 21548 45531 23754 57318
11779 30845 04668 57815 86485 68987 65038 86689 37010 89742
12697 40295 38712 00802 12178 04126 79675 89010 56424 10949
01530 95937 54384 88243 69773 47554 87366 84193 32006 74961
85512 28259 62138 06750 74984 82732 39352 47103 66217 69987
56593 44037 62860 83469 59660 03227 30197 38752 01730 91522
48971 46910 05183 47636 38411 46134 05488 43152 50491 46963
80174 37041 68187 14965 08437 09184 78120 77283 85683 03000
10451 16818 20909 41817 22258 05721 89967 52817 48289 37249
15709 96644 07787 19769 89928 25008 75161 71145 87686 14697
80410 73637 91873 19172 31133 03062 47472 77656 89860 29345
78387 26525 83058 27609 39367 60404 17724 19381 09401 41116
80340 45802 51136 53479 02947 21568 38732 80795 32185 67188
19303 67335 12415 52422 88966 42252 39225 12063 66612 96193
65547 43596 26205 84974 19288 64531 30886 67894 59180 87874
55138 63666 92563 73142 49480 87045 90442 51276 24575 08896
41251 49346 92401 08983 37610 51640 32392 38966 94555 30858
34360 33395 50573 54086 76272 04358 49068 74122 65103 46740
7
28798 62591 88671 40094 20671 94535 13831 73727 23013 17250
28821 35325 49845 40713 10831 62889 31144 94793 47631 69475
61406 11022 10801 82661 96007 82777 10886 50354 80586 36537
53064 03232 18298 42549 17696 90115 73195 46877 43756 65747
10340 07841 84279 96642 43454 32981 36294 62135 87647 37954
75471 44635 36918 78946 58286 46874 08289 02970 45582 97166
54595 16847 31134 89115 09788 97384 69642 64739 60784 08725
46054 66831 76812 76767 33350 66654 32282 46201 38030 61321
64056 31307 94018 92901 18269 76377 77698 36684 01007 31710
48772 39634 51600 09518 90956 87022 30606 24204 42723 99132
71878 37326 75740 56392 33145 48232 04240 85284 24372 70326
52795 28840 07950 09409 59846 58692 84039 66761 14916 74160
94307 80909 98649 46434 65594 18673 27853 77889 54909 49947
36496 61287 09743 69322 00658 57232 68305 88356 10208 65712
93837 94788 28566 25575 69803 02395 80901 40244 25023 58347
62769 19152 42725 06747 32435 50598 47708 66061 26076 77413
01441 77154 23681 26553 06565 60362 97591 65225 55668 47806
52357 67042 87617 05415 84880 38953 67029 58816 03215 41258
91948 81731 28846 88081 38023 26118 25129 69856 67321 65109
49574 96113 07275 51855 73484 97206 38430 93330 87042 50463
8
2.
Suppose we want to select a simple random sample
of size n = 8 from 456 employees of a factory.
Complete each description of how the random
number table displayed in this handout could be
utilized to select the sample; then use the
random number table to select the sample.
(i)
Obtain a list of the names of all 456 employees,
and assign each name one of the labels
001, 002, 003, ..., 456.
Choose an arbitrary starting point in the random
number table, and read distinct sets of
three-digit entries between 001 and 456 inclusive
until we obtain a sample of size 8.
Choosing to start with the 26th row of the first
page of the random number table, and reading the
first three digits of each set of five moving
across the rows until the desired sample is
obtained, labels of the selected entries, in the
order read, are
238
169
423
120
090
115
154
159
Rows 26 through 30 of the first page of the
random number table are shown here, beginning
with the 26th row.
23897 16928 42382 97327 00010 12017 09077 55481 11533 67540
48352 74755 54420 95487 88860 15474 71007 71176 15957 55571
84380 89850 63181 12825 39798 64614 75342 59859 67318 25476
66567 06900 85623 22070 69356 51678 13675 05150 96915 45290
69947 95093 39012 02347 65222 14159 89423 39265 32780 51071
9
We had to read 19 entries from the table in order
to obtain a sample of size 8, because so many of
the entries read were not between 001 and 456 and
went unused. A more efficient way to use the
table is to give each of the 456 employees two
labels.
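The reading procedure in part (i) can be traced mechanically. The following is a minimal sketch, assuming the table is laid out ten five-digit groups to a row (the layout consistent with the handout's row references); read_labels is a hypothetical helper named for this illustration only.

```python
# Rows 26 and 27 of the first page of the table, as five-digit groups.
groups = """
23897 16928 42382 97327 00010 12017 09077 55481 11533 67540
48352 74755 54420 95487 88860 15474 71007 71176 15957 55571
""".split()

def read_labels(groups, population_size, sample_size):
    """Return (selected labels, number of entries read)."""
    selected = []
    entries_read = 0
    for group in groups:
        entries_read += 1
        label = int(group[:3])  # first three digits of the group
        # Keep the label only if it names an employee not yet selected.
        if 1 <= label <= population_size and label not in selected:
            selected.append(label)
        if len(selected) == sample_size:
            break
    return selected, entries_read

labels, entries_read = read_labels(groups, population_size=456, sample_size=8)
print(labels)        # [238, 169, 423, 120, 90, 115, 154, 159]
print(entries_read)  # 19
```

Only 456 of the 1000 possible three-digit entries are usable under this scheme, which is why more than half of the entries read are discarded.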
10
(ii)
Obtain a list of the names of all 456 employees,
and assign each name one of the label pairs
(001, 501), (002, 502), ..., (456, 956).
Choose an arbitrary starting point in the random
number table, and read distinct sets of
three-digit entries between either 001 and 456
inclusive or 501 and 956 inclusive, until we
obtain a sample of size 8.
Choosing to start with the 6th row of the second
page of the random number table, and reading the
first three digits of each set of five moving
across the rows until the desired sample is
obtained, label pairs of the selected entries, in
the order read (the label read from the table
follows each pair), are
(015, 515) read as 015
(043, 543) read as 543
(382, 882) read as 882
(197, 697) read as 697
(373, 873) read as 873
(341, 841) read as 841
(320, 820) read as 320
(249, 749) read as 749
Rows 6 through 10 of the second page of the
random number table are shown here, beginning
with the 6th row.
01530 95937 54384 88243 69773 47554 87366 84193 32006 74961
85512 28259 62138 06750 74984 82732 39352 47103 66217 69987
56593 44037 62860 83469 59660 03227 30197 38752 01730 91522
48971 46910 05183 47636 38411 46134 05488 43152 50491 46963
80174 37041 68187 14965 08437 09184 78120 77283 85683 03000
11
We only had to read 10 entries in order to obtain
the sample of size 8, compared with the 19
entries needed when each employee had a single
label.
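The paired-label scheme in part (ii) can be traced the same way. This is a sketch under the same layout assumption (ten five-digit groups per row); employee_from_entry is a hypothetical helper that turns a three-digit entry into an employee number from 1 to 456.

```python
# Row 6 of the second page of the table, as five-digit groups.
groups = "01530 95937 54384 88243 69773 47554 87366 84193 32006 74961".split()

def employee_from_entry(entry, population_size=456, offset=500):
    """Map a three-digit entry to an employee number, or None if unused."""
    if 1 <= entry <= population_size:
        return entry                      # labels 001 to 456
    if offset + 1 <= entry <= offset + population_size:
        return entry - offset             # labels 501 to 956
    return None                           # 000, 457 to 500, 957 to 999

selected = []
entries_read = 0
for group in groups:
    entries_read += 1
    employee = employee_from_entry(int(group[:3]))
    if employee is not None and employee not in selected:
        selected.append(employee)
    if len(selected) == 8:
        break

print(selected)      # [15, 43, 382, 197, 373, 341, 320, 249]
print(entries_read)  # 10
```

With two labels per employee, 912 of the 1000 possible entries are usable, so far fewer entries are wasted.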
12
3.
Suppose a simple random sample of tax returns is
to be selected for auditing from a population of
2,980 returns. Describe how the random number
table displayed in this handout could be utilized
to select the sample.
Assign each tax return one of the label
triples (0001, 3001, 6001), (0002, 3002,
6002), ..., (2980, 5980, 8980).
The random sample could now be selected by
reading distinct sets of four-digit entries from
an arbitrary starting point in the random number
table, skipping any entry that does not appear
among the assigned labels.
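The triple-label scheme amounts to splitting the four-digit entries into blocks of 3000. A minimal sketch, with return_from_entry as a hypothetical helper name and its parameters chosen to match the labels in Exercise 3:

```python
def return_from_entry(entry, population_size=2980, block=3000, copies=3):
    """Map a four-digit entry (0 to 9999) to a return number, or None if unused."""
    copy, label = divmod(entry, block)    # e.g. 6002 -> copy 2, label 2
    if copy < copies and 1 <= label <= population_size:
        return label
    return None

# Count how many of the 10,000 possible entries identify some return.
usable = sum(return_from_entry(e) is not None for e in range(10000))
print(usable)                    # 8940
print(return_from_entry(6002))   # 2
print(return_from_entry(2981))   # None
```

With a single label per return only 2,980 of the 10,000 possible entries would be usable; with three labels the count rises to 8,940.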
Even with the aid of random number generation,
simple random sampling can be difficult or
tedious. Consequently, it can be convenient to
utilize alternative sampling procedures providing
results at least as good as simple random
sampling.
13
A sampling frame is a list of all the items in a
population. A sampling frame can be useful in
the selection of a sample but is not always
available, especially if the population is large
or infinite.
A systematic random sample of size n from a
population of size N is one selected by defining
the integer k to be N / n, rounded up to the
nearest integer if necessary, and randomly
selecting one of the first k items and every kth
item thereafter; this procedure can be modified
in an obvious way if the size of the population
is large or infinite.
4. (a)
Suppose a systematic random sample of n = 3
employees is to be selected from a mythical
corporation consisting of N = 15 employees.
Describe how to use a list of employee names and
the random number table to select the systematic
random sample.
In order to select a systematic random sample of
size n = 3, first we let k = 15/3 = 5. Then we
use the random number table to randomly select
one of the first k = 5 names on the list, and we
select every 5th name thereafter.
14
(b)
Suppose the systematic random sample is to be
selected from the alphabetical list of employee
names displayed below. Make a list of all the
possible samples which could be selected and
indicate the corresponding probability of each.
For each sample, write the names and salaries.
Employees of Mythical Corporation Ordered
Alphabetically by Last Name
Name        Position                 Salary
Appleby     Salesman                 55,000
Bernhardt   Senior Vice President    90,000
Birch       Salesman                 50,000
Dickenson   Secretary                20,000
Fry         Junior Vice President    85,000
Gray        President               100,000
Jones       Secretary                10,000
Mendel      Salesman                 35,000
Newman      Salesman                 65,000
Olsen       Salesman                 40,000
Pearson     Salesman                 60,000
Quill       Head Secretary           25,000
Smith       Salesman                 45,000
Therman     Salesman                 75,000
Vern        Salesman                 70,000
The possible samples are
Appleby(55,000) Gray(100,000) Pearson(60,000)
Bernhardt(90,000) Jones(10,000) Quill(25,000)
Birch(50,000) Mendel(35,000) Smith(45,000)
Dickenson(20,000) Newman(65,000) Therman(75,000)
Fry(85,000) Olsen(40,000) Vern(70,000)
Each of these samples has a probability of 1/5
of being the selected sample.
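The systematic selection rule defined above (k = N / n rounded up, a random start among the first k items, then every kth item) can be sketched as follows. The function names are hypothetical, chosen for this illustration.

```python
import math
import random

def systematic_samples(frame, n):
    """Return every possible systematic sample of size n from `frame`."""
    k = math.ceil(len(frame) / n)          # N / n, rounded up if necessary
    return [frame[start::k] for start in range(k)]

def draw_systematic_sample(frame, n, rng=random):
    """Pick one of the first k items at random, then every kth item after it."""
    k = math.ceil(len(frame) / n)
    start = rng.randrange(k)               # each start has probability 1/k
    return frame[start::k]

alphabetical = [
    "Appleby", "Bernhardt", "Birch", "Dickenson", "Fry",
    "Gray", "Jones", "Mendel", "Newman", "Olsen",
    "Pearson", "Quill", "Smith", "Therman", "Vern",
]

possible = systematic_samples(alphabetical, 3)
for sample in possible:
    print(sample, "probability 1/%d" % len(possible))
```

Because only the starting position is random, there are just k = 5 possible samples here, compared with 455 possible simple random samples of size 3 from 15 names. When N is not a multiple of n, the possible samples produced this way need not all have the same size.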
15
4. - continued (c)
Suppose the systematic random sample is to be
selected from the list of employee names ordered
by salary displayed below. Make a list of all
the possible samples which could be selected and
indicate the corresponding probability of each.
For each sample, write the names and salaries.
Employees of Mythical Corporation Ordered by
Salary
Name        Position                 Salary
Gray        President               100,000
Bernhardt   Senior Vice President    90,000
Fry         Junior Vice President    85,000
Therman     Salesman                 75,000
Vern        Salesman                 70,000
Newman      Salesman                 65,000
Pearson     Salesman                 60,000
Appleby     Salesman                 55,000
Birch       Salesman                 50,000
Smith       Salesman                 45,000
Olsen       Salesman                 40,000
Mendel      Salesman                 35,000
Quill       Head Secretary           25,000
Dickenson   Secretary                20,000
Jones       Secretary                10,000
The possible samples are
Gray(100,000) Newman(65,000) Olsen(40,000)
Bernhardt(90,000) Pearson(60,000) Mendel(35,000)
Fry(85,000) Appleby(55,000) Quill(25,000)
Therman(75,000) Birch(50,000) Dickenson(20,000)
Vern(70,000) Smith(45,000) Jones(10,000)
16
Each of these samples has a probability of 1/5
of being the selected sample.
(d)
Suppose the purpose in selecting the sample of
n = 3 employees is to represent the distribution
of salaries in the corporation. Compare the
systematic random sampling procedures in parts
(b) and (c).
Each of the possible systematic samples in part
(c) contains one of the top five salaries, one of
the middle five salaries, and one of the bottom
five salaries (because the sampling frame is
ordered by salary), but this is not true for each
of the possible systematic samples in part (b)
(because the sampling frame is ordered
alphabetically, which is random order with regard
to salary). Consequently, we expect the
systematic random sample from part (c) to better
represent the population.
(e)
Again, suppose the purpose in selecting the
sample of n = 3 employees is to represent the
distribution of salaries in the corporation.
Discuss how the systematic random sampling method
compares with the simple random sampling method.
With a sampling frame in random order with regard
to salary, we expect systematic random sampling
and simple random sampling to have about the same
chance of producing a good representation of the
population; with a sampling frame in either
ascending or descending order with regard to
salary, we expect that it is somewhat more likely
for the systematic random sample to better
represent the population.
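One way to make the comparison in parts (d) and (e) concrete is to look at how much the sample mean salary varies from one possible sample to another. This is only one of several possible measures of "good representation", and it is an assumption of this sketch rather than something the handout prescribes. Salaries are in thousands, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance

# Salaries (in thousands) in alphabetical order of employee name.
alphabetical = [55, 90, 50, 20, 85, 100, 10, 35, 65, 40, 60, 25, 45, 75, 70]
# The same salaries in descending order, as in part (c).
by_salary = sorted(alphabetical, reverse=True)

def systematic_samples(frame, k):
    """Every possible systematic sample with sampling interval k."""
    return [frame[start::k] for start in range(k)]

def spread_of_sample_means(samples):
    """Variance of the sample mean over a list of equally likely samples."""
    return pvariance([mean(s) for s in samples])

srs_spread = spread_of_sample_means(list(combinations(alphabetical, 3)))
alpha_spread = spread_of_sample_means(systematic_samples(alphabetical, 5))
sorted_spread = spread_of_sample_means(systematic_samples(by_salary, 5))

print("simple random sampling:        ", round(srs_spread, 2))
print("systematic, alphabetical frame:", round(alpha_spread, 2))
print("systematic, salary-ordered:    ", round(sorted_spread, 2))
```

For this list, the salary-ordered frame gives the smallest spread, in line with the answer to part (d). The figure for the alphabetical frame depends on how the names happen to fall, so with only five possible samples it should not be over-interpreted.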
17
With a sampling frame in random order with regard
to the variable of interest, we expect systematic
random sampling and simple random sampling to
have about the same chance of producing a good
representation of the population. Systematic
random sampling might be preferred, since it
requires no complicated labeling scheme, making
it often easier to implement than simple random
sampling.
With a sampling frame in either ascending order
or descending order with regard to the variable
of interest, we expect systematic random sampling
to have a somewhat better chance than simple
random sampling of producing a good
representation of the population. With a
sampling frame that contains a cyclical pattern
with regard to the variable of interest, how well
a systematic sample represents the population
depends on how the values of n and k are related
to the cyclical pattern.
18
5. (a)
Suppose a systematic random sample of n employees
is to be selected from a mythical corporation
consisting of N = 20 employees with four
employees in each of five different departments.
The purpose is to represent the distribution of
salaries in the corporation. The sampling frame
is the list displayed below.
Employees of Mythical Corporation Ordered by
Salary within Department
Name       Position                    Salary
Taylor     Head of Dept A              50,000
Saunders   Assistant Head of Dept A    45,000
Nelson     Member of Dept A            20,000
King       Member of Dept A            20,000
Templar    Head of Dept B              45,000
Randall    Assistant Head of Dept B    40,000
Sinclair   Member of Dept B            20,000
Wilde      Member of Dept B            15,000
Hopkirk    Head of Dept C              47,500
Nichols    Assistant Head of Dept C    45,000
Philips    Member of Dept C            25,000
Fitzhugh   Member of Dept C            27,500
Kent       Head of Dept D              55,000
Jason      Assistant Chm of Dept D     42,500
Simon      Member of Dept D            22,500
Curry      Member of Dept D            22,500
Hayes      Head of Dept E              52,500
Collins    Assistant Head of Dept E    47,500
Post       Member of Dept E            17,500
Martin     Member of Dept E            22,500
Suppose the sample size to be selected were
n = 5. Then find k, describe the possible
systematic random samples, and compare this
systematic random sampling method with simple
random sampling.
Since k = 20/5 = 4, we select the systematic
sample by randomly selecting one of the first
k = 4 names and selecting every 4th name
thereafter.
19
This will result in one of four possible samples:
a sample consisting of the five department heads,
a sample consisting of the five assistant
department heads, or one of two samples
consisting of one member from each of the five
departments.
Samples containing only department heads or only
assistant department heads will tend to contain
the higher salaries, while samples containing
only department members will tend to contain the
lower salaries. None of these samples appears to
provide a reasonable representation of the
distribution of salaries in the corporation.
Consequently, simple random sampling would be
better.
(b)
Suppose the sample size to be selected were
n = 4. Then find k, describe the possible
systematic random samples, and compare this
systematic random sampling method with simple
random sampling.
Since k = 20/4 = 5, we select the systematic
sample by randomly selecting one of the first
k = 5 names and selecting every 5th name
thereafter.
This will result in one of five possible samples,
each consisting of exactly one department head,
one assistant department head, and two department
members.
Each such sample contains a variety of salaries,
therefore seeming to be a very good
representation of the distribution of salaries in
the corporation. Consequently, this systematic
sampling method would be better than simple
random sampling.
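The effect of the cyclical pattern in Exercise 5 can be seen by listing the positions that land in each possible systematic sample. The sketch below reduces the frame to the position of each employee (head, assistant head, member, member, repeated for five departments); the names used are hypothetical.

```python
from collections import Counter

# Position of each of the 20 employees in frame order, four per department.
frame = ["Head", "Assistant", "Member", "Member"] * 5

def systematic_compositions(frame, n):
    """Count the positions in each possible systematic sample of size n."""
    k = -(-len(frame) // n)               # N / n, rounded up if necessary
    return [Counter(frame[start::k]) for start in range(k)]

for n in (5, 4):
    print("n =", n)
    for start, composition in enumerate(systematic_compositions(frame, n), start=1):
        print("  start at name", start, dict(composition))
```

With n = 5 the interval k = 4 matches the length of the cycle, so every sample contains a single kind of position; with n = 4 the interval k = 5 steps through the cycle, so every sample contains one head, one assistant head, and two members.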
20
A stratified random sample is one selected by
first dividing a population into a few groups
called strata, and then selecting a simple random
sample from each group (stratum). In effect,
selecting a stratified random sample is selecting
several independent simple random samples, one
from each stratum.
21
6. (a)
Suppose a stratified random sample is to be
selected from the mythical corporation consisting
of N = 20 employees from Class Exercise 5. The
purpose is to represent the distribution of
salaries in the corporation. The sampling frame
is the list of employees displayed in Class
Exercise 5 (Employees of Mythical Corporation
Ordered by Salary within Department).
Suppose that the employees are divided into 2
strata, one consisting of the ten department
heads and assistant department heads, and the
other consisting of the ten members of various
departments. A stratified random sample is then
to be selected by randomly selecting 4 employees
from each stratum.
22
Find the total sample size n, describe the
possible stratified random samples, and compare
this stratified random sampling method with
simple random sampling.
The stratified random sample will be of size
n = 8 and is guaranteed to contain four of the ten
highest salaries and four of the ten lowest
salaries. Since no such guarantee is possible
with simple random sampling, we expect that it is
somewhat more likely for the stratified random
sample to better represent the population than
for a simple random sample.
When there is less variation within each stratum
than there is within the entire population, the
likelihood of obtaining a good representation of
the population with stratified random sampling is
higher than that with simple random sampling.
When there is as much variation within each
stratum as there is within the entire population,
the likelihood of obtaining a good representation
of the population with stratified random sampling
is about the same as that with simple random
sampling.
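A stratified random sample for the two strata of Exercise 6(a) can be drawn with one independent simple random sample per stratum. The sketch below uses the salaries from the Exercise 5 list; the function name, the stratum names, and the seed are arbitrary choices made for this illustration.

```python
import random
from statistics import pvariance

strata = {
    "heads and assistant heads": [50000, 45000, 45000, 40000, 47500,
                                  45000, 55000, 42500, 52500, 47500],
    "department members": [20000, 20000, 20000, 15000, 25000,
                           27500, 22500, 22500, 17500, 22500],
}

def stratified_sample(strata, per_stratum, rng):
    """Draw an independent simple random sample from each stratum."""
    return {name: rng.sample(values, per_stratum)
            for name, values in strata.items()}

rng = random.Random(17)                   # arbitrary seed, for repeatability
sample = stratified_sample(strata, per_stratum=4, rng=rng)
n = sum(len(values) for values in sample.values())

everyone = [salary for values in strata.values() for salary in values]
print("total sample size:", n)
print("variance, whole population:", pvariance(everyone))
for name, values in strata.items():
    print("variance within", name + ":", pvariance(values))
```

The variance within each stratum is much smaller than the variance of all 20 salaries, which is the situation in which stratified random sampling is expected to do better than simple random sampling.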
23
(b)
Suppose that the employees are divided into 5
strata, with each of the departments A, B, C, D,
and E being one stratum. A stratified random
sample is then to be selected by randomly
selecting 2 employees from each stratum. Find
the total sample size n, describe the possible
stratified random samples, and compare this
stratified random sampling method with simple
random sampling.
The stratified random sample will be of size
n = 10 and will be about as likely to contain
some of the higher salaries and some of the lower
salaries as will a simple random sample.
Consequently, the stratified random sample is
just as likely to be a good representation of the
population as a simple random sample would be.
24
Stratified sampling can also be used to assure
adequate representation of some small segment of
a population. For example, if only 0.5% of all
tax returns are from individuals making more than
500,000, we might deliberately select 5% or 10%
of a stratified random sample from such
individuals, whereas with simple random sampling
there is no guarantee that any of these
individuals would be selected. When the
percentage of items in a stratified random sample
which come from a given stratum is not the same
as the percentage of items in the population
which come from that stratum, then this must of
course be taken into account in any statistical
analysis that is done.
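One common way to take unequal sampling percentages into account when estimating a population mean is to weight each stratum's sample mean by that stratum's share of the population. The sketch below illustrates the idea with invented numbers: the population of 1,000 returns and the sample incomes (in thousands) are hypothetical and are not data from the handout.

```python
def stratified_mean(strata):
    """Estimate a population mean from (stratum size, sample values) pairs."""
    total = sum(size for size, _ in strata)
    return sum(size / total * (sum(values) / len(values))
               for size, values in strata)

# Hypothetical population of 1,000 returns: 5 high-income (0.5%), 995 others.
high_income = (5, [600, 800])             # deliberately oversampled stratum
all_others = (995, [40, 60, 50, 50])

weighted = stratified_mean([high_income, all_others])

pooled = high_income[1] + all_others[1]
unweighted = sum(pooled) / len(pooled)    # ignores the oversampling

print(round(weighted, 2))     # 53.25
print(round(unweighted, 2))   # 266.67
```

Ignoring the oversampling lets the two high-income returns dominate the estimate; weighting by stratum size restores them to their 0.5% share of the population.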
25
A random cluster sample is one selected by first
dividing a population into many groups called
clusters, and then selecting a simple random
sample of the clusters. The sample consists of
all items included in each of the selected
clusters.
26
7. (a)
Suppose a random cluster sample is to be selected
from the mythical corporation consisting of
N = 20 employees from Class Exercises 5 and 6.
The purpose is to represent the distribution of
salaries in the corporation. The sampling frame
is the list of employees displayed in Class
Exercise 5 (Employees of Mythical Corporation
Ordered by Salary within Department).
Suppose that the employees are divided into 5
clusters, with each of the departments A, B, C,
D, and E being one cluster. A random cluster
sample is then to be selected by randomly
selecting 2 clusters (that is, two departments);
the sample will consist of all employees in the
selected clusters (departments).
27
Find the total sample size n, describe the
possible random cluster samples, and compare this
random cluster sampling method with simple random
sampling.
The random cluster sample will be of size n = 8
and is guaranteed to contain four of the ten
highest salaries and four of the ten lowest
salaries. Since no such guarantee is possible
with simple random sampling, we expect that it is
somewhat more likely for the random cluster
sample to better represent the population than
for a simple random sample.
Cluster sampling works best at giving a good
representation of a population when the
population is divided into many clusters with
much variation within each cluster.
The number of items in each cluster is not
necessarily the same, and consequently the sample
size after selecting random clusters may not be
known exactly ahead of time. This is not really
a serious disadvantage, since one can usually
exercise at least approximate control over the
sample size.
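A random cluster sample with departments as clusters can be sketched as follows, using the salaries from the Exercise 5 list. The function name and the seed are arbitrary choices made for this illustration.

```python
import random

# Salaries by department: head, assistant head, then two members.
departments = {
    "A": [50000, 45000, 20000, 20000],
    "B": [45000, 40000, 20000, 15000],
    "C": [47500, 45000, 25000, 27500],
    "D": [55000, 42500, 22500, 22500],
    "E": [52500, 47500, 17500, 22500],
}

def cluster_sample(clusters, number_of_clusters, rng):
    """Select clusters at random and keep every item in each selected cluster."""
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items

rng = random.Random(3)                    # arbitrary seed, for repeatability
chosen, salaries = cluster_sample(departments, 2, rng)
print(chosen, salaries, "n =", len(salaries))
```

Here every cluster has four employees, so the sample size is always 8; with clusters of unequal size, the sample size would depend on which clusters are chosen.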
28
(b)
Suppose that the employees are divided into
employees are divided into 3 clusters, with one
cluster consisting of the five department
chairmen, one cluster consisting of the five
assistant department chairmen, one cluster
consisting of the ten members of various
departments. (Note that the number of items are
not the same for each cluster.) the sample will
consist of all employees in the selected
clusters. Decide what can be said about the
total sample size n, describe the possible random
cluster samples, and compare this random cluster
sampling method with simple random sampling.
The size n of the random cluster sample will be
between 10 and 15, but the sample is not
guaranteed to contain a variety of salaries.
Consequently, the random cluster sample is not
any more likely to be a good representation of
the population than is a simple random sample.
29
Cluster sampling, like systematic sampling, is
often easier to implement than simple random
sampling. For instance, a labeling scheme to
identify groups of items in a large population
will be considerably less complex than a labeling
scheme to identify each item in the large
population.
A multi-stage sample is one selected by combining
one or more sampling procedures. (For example,
one might first randomly select clusters, and
then select a systematic random sample from each
selected cluster.) A sequential sample is one
where items are selected until some given
criterion is satisfied; consequently, no sample
size n can be chosen ahead of time.
30
A nonrandom sampling method involves no random
selection; in general, it can be difficult to
evaluate the adequacy of such samples. A
convenience sample consists of whatever items are
available. A judgment sample is one which is
deemed to represent accurately the population of
interest by some expert opinion(s). An arbitrary
sample is one chosen by an experimenter in a
haphazard, seemingly random, manner. (The
difference between simple random sampling and
arbitrary sampling is that arbitrary selection
cannot guarantee that each item has an equal
chance of being selected.)
The link to the project (worth 200 points, the
same as a semester exam) is now active. You
should be working on this project, which is due
the last day of class.
Check some answers in Homework 24 before
submission
31
Homework 24 Score____________
/ 15 Name ______________
Additional HW Exercise 9.5 (a)
Predicting whether or not nonsmokers will develop
cardiovascular problems within two years from
age, percent body fat, weight, and hours of
aerobic exercise weekly is being studied. A 0.05
significance level is chosen for a logistic
regression. Data from a random sample of
nonsmokers is recorded and entered into the SPSS
data file cardio.
Define a dummy variable cvp which matches the way
the variable cvp is coded in the SPSS data file
cardio. Write a logistic regression model to
predict whether or not a nonsmoker will develop
cardiovascular problems within two years from
age, percent body fat, weight, and hours of
aerobic exercise weekly.
cvp = 1 for cardiovascular problems within two
years
cvp = 0 for no cardiovascular problems within two
years
prcft = percent body fat, wght = weight,
exc = weekly hours of aerobic exercise
Log(odds) = β0 + β1(age) + β2(prcft) + β3(wght)
+ β4(exc)
32
Additional HW Exercise 9.5.-continued
(c) Based on the SPSS output in part (b), explain
why one predictor should be eliminated from the
model, and identify which predictor should be
eliminated.
Since neither weight nor age is statistically
significant at the 0.05 level with predictors
percent body fat and aerobic exercise in the
model, we choose to eliminate the predictor
corresponding to the larger P-value, which is
weight.
(d) Repeat part (b) with all predictors in the
Covariates section except the one selected to be
eliminated in part (c). Title the output to
identify the homework exercise (Additional HW
Exercise 9.5 - part (d)), your name, today's
date, and the course number (Math 214). Use the
File > Print Preview options to see if any
editing is needed before printing the output.
Attach the printed copy to this assignment before
submission.
33
(e) Based on the SPSS output in part (d), explain
why one more predictor should be eliminated from
the model, and identify which predictor should be
eliminated.
Since age is not statistically significant at the
0.05 level with predictors percent body fat and
aerobic exercise in the model, we choose to
eliminate age.
(f) Repeat part (b) with all predictors in the
Covariates section except the two selected to be
eliminated in parts (c) and (e). Title the output
to identify the homework exercise (Additional HW
Exercise 9.5 - part (f)), your name, today's
date, and the course number (Math 214). Use the
File > Print Preview options to see if any
editing is needed before printing the output.
Attach the printed copy to this assignment before
submission.
34
Additional HW Exercise 9.5.-continued
(g) Based on the SPSS output in part (f), explain
why no predictor should be eliminated from the
model.
Since each of the two predictors is statistically
significant at the 0.05 level with the other
predictor in the model, we choose to keep
both predictors in the model.
(h) Write the estimated logistic equation for
predicting whether or not cardiovascular problems
will occur within two years.
Log(odds) = -4.318 + 0.179(prcft) - 0.063(exc)
(i) Use the estimated logistic equation to predict
whether or not cardiovascular problems will occur
within two years for a nonsmoker whose percent
body fat is 20 and whose weekly aerobic exercise
time is 50 hours.
Log(odds) = -4.318 + 0.179(20) - 0.063(50)
= -3.888
Since Log(odds) is negative, we predict that
cardiovascular problems will not occur within two
years.
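The calculation in the last part can be checked with a short
Python sketch. The signs of the coefficients are an assumption:
an intercept of -4.318, +0.179 for prcft, and -0.063 for exc is
the only sign pattern that gives a negative Log(odds) of
magnitude 3.888 for prcft = 20 and exc = 50. The function names
are hypothetical, and the conversion from Log(odds) to a
probability is the standard logistic formula rather than
anything taken from the SPSS output.

```python
import math

# Coefficients of the estimated logistic equation.
# ASSUMPTION: signs are inferred from the stated result (-3.888).
INTERCEPT = -4.318
B_PRCFT = 0.179   # percent body fat
B_EXC = -0.063    # weekly hours of aerobic exercise

def log_odds(prcft, exc):
    """Log(odds) of cardiovascular problems within two years."""
    return INTERCEPT + B_PRCFT * prcft + B_EXC * exc

def probability(prcft, exc):
    """Convert Log(odds) to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(prcft, exc)))

lo = log_odds(20, 50)
p = probability(20, 50)
print(round(lo, 3), round(p, 3))
# Negative Log(odds) is the same as a probability below 0.5.
print("predict problems" if lo > 0 else "predict no problems")
```

A negative Log(odds) corresponds to a predicted probability
below 0.5 (here roughly 0.02), which is why the prediction is
that cardiovascular problems will not occur.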
35
(1) The Final Exam has been scheduled for Final
Exam Week.
(2) The final exam is open notes and book.
(3) The final exam requires a calculator.
(4) About half of the final exam covers the
material on Handouts 14, 15, 16, 17, 18 (i.e.,
Homeworks 22, 23, 24, 25); the other half is
cumulative and consists of questions from
selected topics.
If you wish, you may keep Homework 25 to study
and submit it with your final exam.
The final exam is out of 400 points (50 of which
are from homework).
Tips on taking the exams:
1. Don't spend too much time on one problem. Go
through the entire exam and do all the problems
you can do easily and quickly.
2. Don't spend a lot of time looking through
your notes and textbook. You should be familiar
enough with the material so that you only need
your notebook for an occasional quick reference.
36
The project (worth 200 points, the same as a
semester exam) is due today (the last day of
class).
You may spend the rest of this class working on
Homework 25 which can be submitted today or any
time up to, but not after, the final exam.
37
Homework 25 Score____________
/ 10 Name ______________
HW Exercise 25-1
Check answers before submitting Homework 25
A simple random sample of n = 6 students is to be
selected from rosters consisting of 434 names.
Each name is represented by one of the following
pairs of labels: (001, 501), (002, 502),
..., (434, 934). Select the labels for the
sample by reading the first three digits of each
set of five digits in the 21st row of the second
page of the random number table.
(287, 787), (125, 625), (386, 886), (400, 900),
(206, 706), (138, 638) (The underlined numbers
are the ones read from the random number table.)
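The labeling scheme can be sketched in Python as follows. Each
reading from the random number table is mapped back to a name
number by subtracting multiples of the label offset (500 in this
exercise); readings that match no label are skipped. The
function names and the list of readings are hypothetical: the
readings are made up for illustration and are not the actual row
of the random number table. Skipping a name that has already
been selected is also an assumption, made so that the sample
contains distinct names.

```python
def name_for_label(label, population, offset, copies):
    """Return the name number (1..population) for a reading, or None if unused."""
    for k in range(copies):
        position = label - k * offset
        if 1 <= position <= population:
            return position
    return None

def select_sample(readings, n, population, offset, copies):
    """Walk through table readings until n distinct names are chosen."""
    chosen = []
    for label in readings:
        name = name_for_label(label, population, offset, copies)
        # Skip unusable readings and names already in the sample.
        if name is not None and name not in chosen:
            chosen.append(name)
        if len(chosen) == n:
            break
    return chosen

# Hypothetical three-digit readings (NOT the real table row).
readings = [787, 125, 999, 886, 400, 287, 450, 206, 638]
print(select_sample(readings, n=6, population=434, offset=500, copies=2))
```

In this made-up run, 999 and 450 match no label and are skipped,
and the second appearance of name 287 is ignored. The same
functions cover triples of labels by setting copies=3, for
example population=30 and offset=30.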
38
HW Exercise 25-2
A simple random sample of n = 5 subjects is to
be selected from a list of 30 names. Each name
is represented by one of the following
triples: (01, 31, 61), (02, 32, 62), ...,
(30, 60, 90). Select the labels for the
sample by reading the first two digits of each
set of five digits in the 26th row of the first
page of the random number table.
(23, 53, 83), (16, 46, 76), (12, 42, 72), (09,
39, 69), (25, 55, 85) (The underlined numbers
are the ones read from the random number table.)
39
HW Exercise 25-3
A simple random sample of n = 100 students is to
be selected from rosters containing 2856 names.
Indicate how the names might be labeled in order
that the sample will be selected in the most
efficient manner from a random number table. Do
not actually select the sample!
Each student can be represented by one of the
following triples: (0001, 3001, 6001), (0002,
3002, 6002), ..., (2856, 5856, 8856)
40
HW Exercise 25-4
A simple random sample of n = 50 students is to
be selected from rosters containing 290 names.
Indicate how the names might be labeled in order
that the sample will be selected in the most
efficient manner from a random number table. Do
not actually select the sample!
Each student can be represented by one of the
following triples: (001, 301, 601), (002, 302, 602),
..., (290, 590, 890)
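One way to see why several labels per name make the selection
more efficient is to compute the fraction of possible table
readings that correspond to some name. The Python sketch below
does this for the two labeling exercises above; the function
name usable_fraction is hypothetical, and the calculation
assumes every reading with the given number of digits is equally
likely.

```python
def usable_fraction(population, copies, digits):
    """Fraction of all readings with `digits` digits that match some label."""
    total_readings = 10 ** digits      # e.g. 0000 through 9999 for 4 digits
    usable = population * copies       # each name owns `copies` labels
    if usable > total_readings:
        raise ValueError("more labels than possible readings")
    return usable / total_readings

# (number of names, digits read from the table) for each exercise
cases = {
    "25-3": (2856, 4),
    "25-4": (290, 3),
}
for exercise, (population, digits) in cases.items():
    single = usable_fraction(population, 1, digits)
    triple = usable_fraction(population, 3, digits)
    print(exercise, round(single, 4), round(triple, 4))
```

With one label per name, fewer than 30% of the readings would be
usable in either exercise; with triples, about 86% to 87% are
usable, so far fewer readings have to be discarded.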
41
HW Exercise 25-5
The average age of cars on the road in a certain
state is to be estimated by selecting a sample of
auto registrations. Decide if a good
representation of the population with systematic
random sampling is more likely, just as likely,
or less likely than with simple random sampling,
and explain why, for each of the following
situations:
(a) The sampling frame is a list of automobiles
ordered alphabetically by owner's last name.
We expect a sampling frame ordered alphabetically
by owner's last name to be randomly ordered with
regard to age of the automobile. Consequently, we
expect a good representation of the population
with systematic random sampling to be just as
likely as with simple random sampling.
(b) The sampling frame is a list of automobiles
first ordered alphabetically by make of the auto,
then ordered by year of the auto.
We expect this sampling frame to exhibit a
cyclical pattern with regard to the age of the
automobile. Consequently, a good representation
of the population with systematic random sampling
could be more likely, just as likely, or less
likely than with simple random sampling.
42
(c) The sampling frame is a list of automobiles
first ordered by year of the auto, then ordered
alphabetically by make of the auto.
We expect this sampling frame to be roughly
ordered with regard to age of the automobile.
Consequently, we expect a good representation of
the population with systematic random sampling to
be somewhat more likely than with simple random
sampling.
43
HW Exercise 25-6
Identify the type of sampling method which is
being utilized in each of the following
situations:
(a) A sample of machine parts produced on an
assembly line is selected by choosing every 25th
machine part.
systematic random sampling
(b) A sample of households from a certain city is
selected by first dividing the city into five
regions based on how expensive the housing in
each region is, then randomly selecting ten
houses from each region.
stratified random sampling
(c) A sample of households from a certain city is
selected by randomly selecting ten city blocks
and including all the households in each block in
the sample.
random cluster sampling
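The first two situations can be sketched in Python as follows.
The data (100 machine parts, five regions of 40 houses each) and
the function names are hypothetical and serve only to show the
mechanics: systematic sampling takes every k-th item after a
random start, while stratified sampling takes a simple random
sample from every stratum.

```python
import random

def systematic_sample(items, k, rng):
    """Choose a random start among the first k items, then every k-th item."""
    start = rng.randrange(k)
    return items[start::k]

def stratified_sample(strata, per_stratum, rng):
    """Take a simple random sample of fixed size from each stratum."""
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, per_stratum))
    return sample

rng = random.Random(0)

# Hypothetical assembly line of 100 parts; every 25th part is chosen.
parts = list(range(1, 101))
print(len(systematic_sample(parts, 25, rng)))

# Hypothetical city: five regions (strata) with 40 houses each.
regions = {r: [f"{r}-house{i}" for i in range(40)] for r in "ABCDE"}
print(len(stratified_sample(regions, 10, rng)))
```

With these made-up inputs the systematic sample always has 4
parts and the stratified sample always has 50 houses, ten from
every region.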
44
HW Exercise 25-7
A stratified random sample of households from a
certain city is to be selected by first dividing
the city into five regions based on how expensive
the housing in each region is, and then randomly
selecting ten houses from each region. Decide if
a good representation of the population with
stratified random sampling is more likely, just
as likely, or less likely than with simple random
sampling, and explain why, for each of the
following situations:
(a) The purpose of sampling is to obtain
information about annual income per household.
Since there will most likely be less variation
with regard to annual income within each stratum
(i.e., within each of the five regions) than
within the population, a good representation of
the population is more likely with stratified
random sampling than with simple random sampling.
(b) The purpose of sampling is to obtain
information about the amount of breakfast cereal
consumed in each household.
Since there will most likely be about the same
variation with regard to cereal consumed within
each stratum (i.e., within each of the five
regions) as within the population, a good
representation of the population is just as
likely with stratified random sampling as with
simple random sampling.
45
HW Exercise 25-8
A random cluster sample of households from a
certain city is to be selected by randomly
selecting 10 city blocks and including all the
households in each block in the sample. Decide
if a good representation of the population with
random cluster sampling is more likely, just as
likely, or less likely than with simple random
sampling, and explain why, for each of the
following situations:
(a) The purpose of sampling is to obtain
information about annual income per household.
Since there will most likely be less variation
with regard to annual income within each cluster
(i.e., within each city block) than within the
population, a good representation of the
population is less likely with random cluster
sampling than with simple random sampling.
(b) The purpose of sampling is to obtain
information about the amount of breakfast cereal
consumed in each household.
Since there will most likely be about the same
variation with regard to cereal consumed within
each cluster (i.e., within each city block) as
within the population, a good representation of
the population is just as likely with random
cluster sampling as with simple random sampling.
46
HW Exercise 25-9
Identify the type of sampling method which is
being utilized in each of the following
situations:
(a) A sample of trees is to be selected from a
wooded area consisting of 25,000 square feet. The
wooded area is divided into 625 areas each
containing 1000 square feet. After randomly
selecting 20 of the 625 areas, every 25th tree in
each selected area is selected for the sample.
Multi-stage sampling is used: first a random
cluster sample of areas is selected, and then a
systematic random sample of trees within each
selected cluster is selected.
(b) A pollster approaches people walking along a
busy street at will and asks them to complete a
short verbal survey.
convenience or arbitrary sampling
