Title: Multistage Sample Design
1- Multistage Sample Design
- Basic ideas
- Two-stage cluster sampling can be extended to
multistage sampling to meet practical sampling
needs and requirements.
- Different frames are used at different stages.
It is possible to use area frames at certain
stages and list frames at other stages. It is
also possible to use more than one frame at any
stage.
2- Different sampling procedures can be used at
different stages, and stratification can be
introduced at any stage.
- Sampling rates at different stages are often
determined to accomplish certain design
objectives, such as self-weighting, oversampling
of certain units, reduction of the design effect,
and accommodation of fieldwork requirements.
- Two examples (the Current Population Survey and the
Third National Health and Nutrition Examination
Survey) are reviewed below.
3- I. Current Population Survey (CPS)
- Background
- The first attempt to design a large-scale
probability sample survey was made in 1937 by the
Works Progress Administration (WPA), and the first
survey was conducted in 1940 to estimate the
unemployment rate.
- In 1942 this survey function was transferred
to the Bureau of the Census. The design
included 68 PSUs, which covered 125 counties and
cities. In 1945 it was expanded to include
25,000 households, of which 21,000 were usually
interviewed.
4- In 1954, the CPS was expanded to include 230 PSUs
covering 25,000 households, and the overall
sampling ratio was 1/2245. In 1967, it was
redesigned to include 449 PSUs covering nearly
60,000 households, or over 100,000 persons aged
16 and over. It had at least some
coverage in every state, and the overall sampling
ratio was 1/1170.
- The current design was introduced in 1996 and
includes about 60,000 households from 754
primary sampling areas.
5- Design objectives of CPS
- To produce monthly unemployment estimates for
the nation and for the states. The survey is also
used to collect current demographic data such as
migration, school enrollment, and family size.
6- The CPS is designed to maintain a CV (coefficient
of variation) of 1.9 percent on national monthly
estimates of the unemployment rate. This translates
into a change of 0.2 percentage point in the
unemployment rate being significant at the 90
percent confidence level. For each state, the design
maintains a CV of at most 8 percent on the
annual average estimate of the unemployment level,
assuming a 6 percent unemployment rate.
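The link between a 1.9 percent CV and a 0.2 percentage-point significant change can be checked with a rough normal approximation. The sketch below assumes a 6 percent unemployment rate, a two-sided test at the 90 percent level, and a hypothetical correlation `rho` between consecutive monthly estimates (induced by the overlapping rotation groups); `rho` is an illustrative assumption, not a published CPS figure.

```python
from statistics import NormalDist

def monthly_se(rate_pct, cv):
    """Standard error (in percentage points) of a rate estimate with the given CV."""
    return cv * rate_pct

def min_detectable_change(se, rho, confidence=0.90):
    """Smallest month-to-month change that is significant in a two-sided z-test.

    rho is an assumed correlation between the two monthly estimates; a positive
    correlation from overlapping samples shrinks the variance of the change.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se_change = (2 * se ** 2 * (1 - rho)) ** 0.5
    return z * se_change

se = monthly_se(6.0, 0.019)  # about 0.114 percentage point
for rho in (0.0, 0.5):
    print(f"rho={rho}: {min_detectable_change(se, rho):.3f} pp")
```

With no correlation the threshold is about 0.265 pp; with an assumed correlation of 0.5 it drops to about 0.188 pp, which shows why the sample overlap matters for meeting the 0.2-point goal.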
7- Sample design of CPS
- The entire area of the US, consisting of 3,141
counties and independent cities, is divided into
2,007 PSUs. The PSUs are grouped into 754
strata. In 428 of these, the stratum consists of a
single populous PSU that is selected automatically;
these form self-representing (certainty) strata.
The remaining PSUs are grouped into 326
non-self-representing strata of PSUs that are similar
in several population characteristics. One PSU is
selected from each non-self-representing stratum by
PPS (probability proportional to size) sampling.
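A minimal sketch of selecting one PSU from a non-self-representing stratum with probability proportional to size. The PSU labels and size measures are hypothetical, and refinements used in practice (for example, controlling overlap with a previous design) are not modeled. A self-representing PSU corresponds to the special case of a stratum with a single PSU, whose selection probability is 1.

```python
import random

def select_pps(psu_sizes, rng):
    """Select one PSU with probability proportional to its size measure.

    psu_sizes maps a PSU label to a size measure (e.g. population count).
    Returns the selected label and every PSU's selection probability.
    """
    total = sum(psu_sizes.values())
    probs = {psu: size / total for psu, size in psu_sizes.items()}
    labels = list(psu_sizes)
    chosen = rng.choices(labels, weights=[psu_sizes[p] for p in labels], k=1)[0]
    return chosen, probs

# Hypothetical non-self-representing stratum with three PSUs.
stratum = {"PSU-A": 40_000, "PSU-B": 25_000, "PSU-C": 35_000}
chosen, probs = select_pps(stratum, random.Random(2024))
print(chosen, probs)  # selection probabilities 0.40, 0.25 and 0.35
```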
8- Within-PSU sampling is based on census blocks,
which are bounded primarily by streets in urban
areas and by other prominent physical features
such as rivers or railroad tracks in rural
areas. Census blocks are grouped into three
strata: Unit (regular housing units), Group
(group living quarters), and Area (open
country). Blocks are sorted within strata by one
or two census characteristics, such as the proportion
of female heads of household and/or the proportion
of owner-occupied households. Within blocks,
housing units (from 1990 census data, updated with
building permits issued since 1990) are sorted
geographically and grouped into clusters of four
households.
9- The state-level sampling rates are determined
based on population size and the reliability
requirements described above. They range
roughly from 1 in every 100 households to 1 in
every 3,000 households. The within-PSU
sampling rate is then determined from the PSU's
probability of selection and the overall state-level
sampling rate. For example, for a PSU with a
probability of selection of 1 in 5 and a state
sampling ratio of 1/500, the within-PSU sampling
ratio is 1 in 100.
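The within-PSU rate follows from requiring that the product of the stage probabilities equal the state rate, which makes the state sample self-weighting. A minimal sketch using exact fractions (the function name is illustrative):

```python
from fractions import Fraction

def within_psu_rate(psu_prob, state_rate):
    """Within-PSU sampling rate that makes the overall rate equal the state rate.

    Overall probability = P(PSU selected) * P(unit selected | PSU), so
    P(unit | PSU) = state_rate / psu_prob.
    """
    rate = state_rate / psu_prob
    if rate > 1:
        raise ValueError("state rate exceeds the PSU selection probability")
    return rate

rate = within_psu_rate(Fraction(1, 5), Fraction(1, 500))
print(rate)  # 1/100
```

For a self-representing PSU (selection probability 1), the within-PSU rate equals the state rate itself, so every household in the state ends up with the same overall probability.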
10- A systematic sample of these clusters is
selected using the within-PSU sampling ratio.
All eligible persons in the selected clusters
are then interviewed.
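Systematic selection from the sorted list of clusters can be sketched as a random start followed by every k-th cluster, with k equal to the inverse of the within-PSU ratio. Because each household belongs to exactly one cluster, selecting 1 cluster in 100 gives each household a 1-in-100 chance. The PSU with 1,000 clusters is hypothetical.

```python
import random

def systematic_sample(clusters, interval, rng):
    """Take every `interval`-th cluster after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return clusters[start::interval]

# Hypothetical PSU with 1,000 clusters of four households, already in the
# geographically sorted order; within-PSU ratio of 1 in 100.
clusters = [f"cluster-{i:04d}" for i in range(1000)]
sample = systematic_sample(clusters, 100, random.Random(7))
print(len(sample))  # 10
```

Sorting before selection gives systematic sampling an implicit stratification effect: the sample is spread evenly over the geographic ordering.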
11- Rotation Scheme in CPS
- The sample within each PSU is divided into 8
subsamples, or rotation groups. Each month, one
rotation group is replaced by a newly selected one.
- A given rotation group remains in the sample for 4
consecutive months, leaves the sample during the
following 8 months, and returns to the sample
for another 4 consecutive months.
- Under this rotation system, 75 percent of
the sample is common from month to month, and 50
percent is common from year to year for the same month.
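The 75 and 50 percent overlaps follow directly from the 4-8-4 pattern. In this sketch each rotation group is labelled by the month it first enters (a simplification for illustration; the survey recycles eight group labels, which does not change the counts).

```python
# Months (counted from entry) in which a rotation group is interviewed under the
# 4-8-4 pattern: in for 4 months, out for 8, back in for 4.
OFFSETS = (0, 1, 2, 3, 12, 13, 14, 15)

def groups_in_sample(month):
    """Rotation groups (labelled by entry month) interviewed in `month`."""
    return {month - d for d in OFFSETS}

def overlap_share(month, lag):
    """Share of the sample in `month` that is also in sample `lag` months later."""
    now, later = groups_in_sample(month), groups_in_sample(month + lag)
    return len(now & later) / len(now)

print(overlap_share(100, 1))   # 0.75
print(overlap_share(100, 12))  # 0.5
```

Six of the eight groups continue into the next month, and four of the eight return twelve months later. Offsets 0 and 12 correspond to the first and fifth months in sample.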
12- Estimation in CPS
- The non-interview adjustment is made
separately for clusters of similar sample areas.
- Post-stratification adjustment is made in two
stages (at the PSU level and the household
level), based on census population figures updated
for postcensal births and deaths.
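Both adjustments are ratio adjustments within cells. The sketch below shows the generic logic only: a non-interview adjustment that spreads nonrespondents' weight over respondents in the same cell, followed by one post-stratification step that scales weights to population controls. The cells, weights and controls are hypothetical, and the actual CPS procedure (two post-stratification stages with specific cell definitions) is not reproduced.

```python
from collections import defaultdict

def nonresponse_adjust(units):
    """Spread the weight of nonrespondents over respondents in the same cell.

    units: list of dicts with keys 'cell', 'weight', 'responded'.
    Nonrespondents get weight 0. Each cell is assumed to have a respondent.
    """
    total, resp = defaultdict(float), defaultdict(float)
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            resp[u["cell"]] += u["weight"]
    return [u["weight"] * total[u["cell"]] / resp[u["cell"]] if u["responded"] else 0.0
            for u in units]

def poststratify(weights, groups, controls):
    """Ratio-adjust weights so each post-stratum sums to its population control."""
    sums = defaultdict(float)
    for w, g in zip(weights, groups):
        sums[g] += w
    return [w * controls[g] / sums[g] for w, g in zip(weights, groups)]

units = [
    {"cell": "A", "weight": 100.0, "responded": True},
    {"cell": "A", "weight": 100.0, "responded": False},
    {"cell": "B", "weight": 200.0, "responded": True},
    {"cell": "B", "weight": 200.0, "responded": True},
]
w1 = nonresponse_adjust(units)  # [200.0, 0.0, 200.0, 200.0]
w2 = poststratify(w1, ["men", "men", "women", "women"], {"men": 250.0, "women": 500.0})
print(w2)  # [250.0, 0.0, 250.0, 250.0]
```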
13- A composite estimation procedure is used to
produce estimates. The composite estimate is a weighted
average of (a) the current estimate based on the entire
sample and (b) the previous month's composite estimate
plus the month-to-month change estimated from the
6 rotation groups common to both months. A bias
adjustment term is also added to the composite
estimate to correct for the somewhat higher
unemployment estimates for persons in their
first and fifth months of interview. The
year-to-year overlap in the sample would also
stabilize the estimates, although it is
not used in the composite estimation
procedure.
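A minimal sketch of a composite estimator of this form, written in the style of the AK composite estimator. It assumes eight rotation-group estimates per month (indexed by month-in-sample 1 to 8), each weighted so the eight sum to the full-sample estimate. The values of `K` and `A`, and the exact form and sign of the adjustment term, are illustrative assumptions rather than the survey's published parameters.

```python
def composite_estimate(x_cur, x_prev, prev_composite, K, A):
    """AK-style composite estimate of a monthly total.

    x_cur, x_prev: rotation-group estimates for months-in-sample 1..8 (list index
    0..7), each weighted so that the eight groups sum to the full-sample estimate.
    """
    full = sum(x_cur)
    # Groups now in months-in-sample 2-4 and 6-8 were in 1-3 and 5-7 last month.
    common = [1, 2, 3, 5, 6, 7]
    # Change measured on the 6 common groups, scaled up to all 8 groups.
    change = (8 / 6) * sum(x_cur[m] - x_prev[m - 1] for m in common)
    # Adjustment term: incoming groups (months-in-sample 1 and 5) minus a third
    # of the six continuing groups; its expectation is zero without group bias.
    beta = (x_cur[0] + x_cur[4]) - sum(x_cur[m] for m in common) / 3
    return (1 - K) * full + K * (prev_composite + change) + A * beta

x_prev = [100.0] * 8
x_cur = [110.0] * 8
est = composite_estimate(x_cur, x_prev, prev_composite=800.0, K=0.4, A=0.2)
print(round(est, 6))  # 880.0
```

With `K = 0` and `A = 0` the estimator reduces to the full-sample estimate; larger `K` leans more on the previous composite plus the change measured on the overlapping groups, which is less variable because those groups are correlated across months.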
14- References for CPS
- Bureau of Labor Statistics,
URL http://stats.bls.gov/cpstn.htm
- Bureau of the Census, The Current Population
Survey Design and Methodology, Technical Paper
No. 40.
15- II. National Health and Nutrition Examination
Survey (NHANES)
- Background
- Three rounds of health examination surveys were
conducted in the 1960s to produce information on the
Nation's health status.
- Beginning in 1970, a large
nutrition component was added to the basic
design, and the name was changed to
NHANES.
16- A special survey of the Hispanic population
(HHANES) was conducted in 1982-84, covering
Mexican Americans in the Southwest, Cuban Americans
in Florida, and Puerto Rican Americans in New
York.
- NHANES III was conducted in 1988-94 as the
seventh in a series of national examination
studies.
17- Survey objectives of NHANES III
- A prevalence statistic of 10 percent should have a
relative standard error (RSE) of less than 30
percent.
- Differences of at least 10
percent in health or nutrition statistics
between any two sub-domains should be detected
with a type I error of no more than 0.05 and a
type II error of no more than 0.10.
- A set of 52 sub-domains is defined based on gender,
age group, and three race-ethnic groups (Black,
White and all other, and Hispanic).
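Under simple random sampling, both objectives translate into standard sample-size formulas. The sketch below solves RSE = sqrt(p(1-p)/n)/p for n, and uses the usual normal-approximation formula for comparing two proportions. The proportions 0.50 and 0.40 are hypothetical, and the slides do not state the assumptions behind the 560 figure, so this sketch does not attempt to reproduce it; design effects would increase both numbers.

```python
from math import ceil
from statistics import NormalDist

def n_for_rse(p, rse):
    """SRS size so that an estimated proportion p has relative standard error <= rse."""
    n = (1 - p) / (p * rse ** 2)
    return ceil(n - 1e-9)  # small tolerance guards against floating-point round-up

def n_per_group(p1, p2, alpha=0.05, beta=0.10):
    """Per-group SRS size to detect p1 vs p2 with a two-sided z-test."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(1 - beta)
    n = (za + zb) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return ceil(n - 1e-9)

print(n_for_rse(0.10, 0.30))    # 100
print(n_per_group(0.50, 0.40))  # 515
```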
18- To meet the above precision requirements, the
sample size for each of the defined sub-domains
was set at 560 or greater. After adjusting
for the design effect, the required sample size
is 9,000 each for black and Hispanic persons, and
12,000 for white and all other persons. To yield the
required sample size, a total of about 40,000
persons were sampled.
- The number of sample persons selected at each
survey site turned out to be between 300 and 600,
with an average of approximately 450, yielding an
expected 340 examined persons.
- The minimum time to complete fieldwork at any site
is 4 weeks.
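The figures on this slide fit together arithmetically, although the slide does not spell this out. The sketch assumes the 9,000 figure applies to black and Hispanic persons separately (for a total of 30,000) and treats 340/450 as an expected examination rate.

```python
# Figures from the slide (9,000 assumed to apply to each of the two groups).
required_examined = 9000 + 9000 + 12000   # black, Hispanic, white and all other
sampled_total = 40_000
per_site_selected, per_site_examined = 450, 340

exam_rate = per_site_examined / per_site_selected   # about 0.756
expected_examined = sampled_total * exam_rate       # about 30,222
sites_needed = sampled_total / per_site_selected    # about 88.9

print(f"{exam_rate:.3f} {expected_examined:,.0f} {sites_needed:.1f}")
```

Roughly 40,000 sampled persons at an expected 76 percent examination rate yield just over the 30,000 examined persons required, and about 89 sites of 450 persons each are needed.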
19- Sample design of NHANES III
- The target population is the total civilian
noninstitutionalized population, 2 months of
age or over, in the 50 states of the US.
20- In the 1st stage, 81 PSUs are selected from 2,812
PSUs defined as counties or combinations of
counties. The PSUs are divided into 47
strata, 13 of which are certainty strata, each
containing 1 large urban county. From each of the
remaining 34 strata, 2 PSUs are selected by PPS
sampling without replacement. The 13 large
counties are rearranged into 21 survey sites by
subdividing some of them.
- The 89 sample areas are randomly divided into 2
sets. The 44 sites in the first set were
surveyed in 1988-91 (Phase I), and the 45 sites
in the second set were surveyed in 1991-94
(Phase II). Each phase sample is an independent
sample.
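The PSU and site counts can be checked directly: 13 certainty PSUs plus 2 PSUs from each of 34 strata give 81 PSUs, and splitting the 13 certainty counties into 21 sites gives 21 + 68 = 89 survey sites. The phase split is sketched here as a simple random split with hypothetical site IDs; the actual split procedure may have imposed additional balancing constraints.

```python
import random

certainty_strata, other_strata, psus_per_other = 13, 34, 2
total_psus = certainty_strata + other_strata * psus_per_other  # 81
survey_sites = 21 + other_strata * psus_per_other               # 89

def split_phases(sites, n_first, rng):
    """Randomly split survey sites into two phase samples (simple random split)."""
    shuffled = rng.sample(sites, len(sites))
    return sorted(shuffled[:n_first]), sorted(shuffled[n_first:])

sites = list(range(1, survey_sites + 1))  # hypothetical site IDs
phase1, phase2 = split_phases(sites, 44, random.Random(1988))
print(total_psus, len(phase1), len(phase2))  # 81 44 45
```

Because the split is random, each half is itself a probability sample of the whole population, which is what allows each phase to be analyzed on its own.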
21- In the 2nd stage, within each selected PSU,
area segments (census blocks) are stratified by
population density and the percentage of Hispanic
population, and a sample of segments is selected
within each stratum. Various controlled selection
procedures are used to provide a self-weighting
sample for all sex-age sub-domains of both black
persons and white and all other persons. In Phase I,
segments are defined based on 1980 census data
supplemented with building permits issued since
1980. In Phase II, segments are defined based on
1990 census data. In most sample areas, 24 segments
are selected.
22- In the 3rd stage, households and certain types of
group quarters are selected. All households in
the sample segments are listed, and a subsample
of households and group quarters is designated
for screening to identify potential sample
persons for interviews and examinations. The
subsampling rates are designed to produce an
approximately equal-probability national sample
of households, with higher rates in the
geographic strata with high minority
concentrations.
23- In the 4th stage, eligible persons are selected
within households. All eligible persons within
the screened households are listed, and a
subsample of individuals is selected based on
sex, age, and race-ethnicity. Oversampled
population groups include young persons,
elderly persons, blacks, and Hispanics.
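In a multistage design, a person's overall probability of selection is the product of the conditional probabilities at each stage, and the base weight is its inverse. The stage probabilities below are hypothetical and serve only to show how oversampling at the 4th stage raises the probability and lowers the base weight.

```python
from math import prod

def selection_probability(stage_probs):
    """Overall selection probability: product of conditional stage probabilities."""
    return prod(stage_probs)

# Hypothetical conditional probabilities: PSU, segment | PSU,
# household | segment, person | household.
base = [0.05, 0.10, 0.20, 0.50]
oversampled = [0.05, 0.10, 0.20, 1.00]  # e.g. a person in an oversampled group

for label, probs in (("base", base), ("oversampled", oversampled)):
    p = selection_probability(probs)
    print(label, p, 1 / p)  # base weight is the inverse of the probability
```

Here the oversampled person has twice the base probability (0.001 vs. 0.0005) and half the base weight (about 1,000 vs. 2,000), so weighting undoes the oversampling in national estimates.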
24- Estimation procedures in NHANES III
- A sample weight is calculated for each individual
in the sample as the product of three component
weights: the inverse of the probability of
selection, a nonresponse adjustment, and a
poststratification ratio adjustment.
- Separate weights are developed for the interviewed
persons, examined persons, and special subsampled
persons. Weights are also available for Phases I
and II separately.
- To facilitate variance estimation, the data are
restructured to have two PSUs in each stratum.
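With exactly two PSUs per stratum, the usual with-replacement variance estimator for a weighted total, the sum over strata of n_h/(n_h - 1) times the sum of squared deviations of the PSU totals, reduces to the sum over strata of the squared difference between the two PSU totals. A minimal sketch with a weight built from the three components; all numbers and records are hypothetical.

```python
from collections import defaultdict

def final_weight(p_selection, nonresponse_adj, poststrat_adj):
    """Product of the three weight components."""
    return (1 / p_selection) * nonresponse_adj * poststrat_adj

def paired_psu_variance(records):
    """Variance of a weighted total with exactly two PSUs per stratum.

    records: iterable of (stratum, psu, weight, y) with psu in {1, 2}.
    """
    totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        totals[(stratum, psu)] += weight * y
    strata = {s for s, _ in totals}
    return sum((totals[(h, 1)] - totals[(h, 2)]) ** 2 for h in strata)

w = final_weight(0.0005, 1.25, 0.96)  # about 2400
records = [
    ("h1", 1, w, 1), ("h1", 2, w, 0), ("h1", 2, w, 1),
    ("h2", 1, w, 0), ("h2", 2, w, 1),
]
total = sum(wt * y for _, _, wt, y in records)  # weighted total, about 7200
var = paired_psu_variance(records)
print(total, var ** 0.5)
```

Stratum h1 contributes nothing (both PSU totals are equal), while h2 contributes the squared weight, so the standard error here equals one weight.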
25- References for NHANES III
- NCHS, Sample Design: Third National Health and
Nutrition Examination Survey, Vital and
Health Statistics, Series 2, No. 113, 1992.
- NCHS, Plan and Operation of the Third National
Health and Nutrition Examination Survey, Vital
and Health Statistics, Series 1, No. 32, 1994.