Title: Sampling Racial and Ethnic Minorities
1Sampling Racial and Ethnic Minorities
- William D. Kalsbeek
- Director, Survey Research Unit
- Professor, Department of Biostatistics
- University of North Carolina
- June 15, 2000
2Acknowledgements
- Ms. Gayle Shimokura
- For significant contributions to this
presentation through her meticulous background
research.
- CDC/National Center for Health Statistics
(Contract No. UR6/CCU417428-01)
- For funding support for this presentation
- UNC-CH's Center for Health Statistics Research
- http://www.sph.unc.edu/chsr
3Race/Ethnic Minorities (% of Population, March
2000 CPS)
- Hispanics (11.7%)
- Settled (95%)
- Mobile (5%)
- African-American (12.8%)
- Settled (99.9%)
- Mobile (0.1%)
- Asian-American (4.0%)
- Native-American (0.9%)
4Overview
- Some basics on probability sampling
- Problems in sampling rare population subgroups
- A review of some existing remedies
- Note that a reference list is available
5Context: Sampling Race/Ethnic Minorities
<-------------------- General Population -------------------->
Targeted
With Oversampling
Ethnic Minority
- As the population subgroup of interest in a
specially targeted study (targeted sampling)
- As a key subgroup in a general population study
(oversampling)
6Probability vs. Nonprobability Sampling?
- Probability sampling
- Random sampling methods used
- Each member of the target population has a
known, nonzero selection probability
- Nonprobability sampling in exceptional
circumstances
- Judgment used
- Requires models to analyze
- Probability sampling is generally preferred
7Sampling Frames and Linkage
- Sampling Frame: List(s) used to select a
probability sample
- EXAMPLE: List of patients to sample health care
users
- Usefulness of a frame is tied to
- The linkage that exists between entries on the
list and the population being sampled
8Sample Weights
- A number for each member of the sample
- Reflecting the inverse of the selection
probability for the sample member
- May be adjusted for sample imbalance due to
- Nonresponse
- Incomplete frame coverage
- Other selection problems
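The weighting idea on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's procedure: the weighting-class nonresponse adjustment and the dictionary keys `prob`, `cls`, and `responded` are assumptions introduced here.

```python
def base_weight(selection_probability):
    """Base sample weight: inverse of a known, nonzero selection probability."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def nonresponse_adjusted_weights(sample):
    """Weighting-class adjustment (an assumed, common choice).

    Within each class, respondents' base weights are inflated so that they
    also represent the nonrespondents of that class. Returns the adjusted
    weights of the respondents, in input order.
    """
    total, responding = {}, {}
    for unit in sample:
        w = base_weight(unit["prob"])
        total[unit["cls"]] = total.get(unit["cls"], 0.0) + w
        if unit["responded"]:
            responding[unit["cls"]] = responding.get(unit["cls"], 0.0) + w
    return [
        base_weight(u["prob"]) * total[u["cls"]] / responding[u["cls"]]
        for u in sample
        if u["responded"]
    ]


# Hypothetical sample: class A loses half of its weighted sample to nonresponse.
sample = [
    {"prob": 0.01, "cls": "A", "responded": True},
    {"prob": 0.01, "cls": "A", "responded": False},
    {"prob": 0.02, "cls": "B", "responded": True},
    {"prob": 0.02, "cls": "B", "responded": True},
]
weights = nonresponse_adjusted_weights(sample)
```

After adjustment the respondents' weights sum to the same total as the base weights of the full sample, which is the point of the adjustment.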
9What are the Statistical Goals of Probability
Sampling?
- Validity
- The ability to produce estimates without bias
tied to sampling
- Achieved if all population members have some
known chance to be chosen in the sample
- Efficiency
- Tied to precision of estimates
- Achieved if the right sampling tools are used
- Greater efficiency costs more (cost-efficiency)
10What Selection Tools Might be Used to Sample
Race/Ethnic Minorities?
- Stratified sampling
- Separate sampling within each of a number of
population groupings (strata)
- Screening for the targeted minority group
- Identify subgroup members in initial sample of
the full population
11Stratified Sampling
- Population divided into H subgroups called
strata
- Separate probability sample in each stratum
- Combine estimates from each stratum to produce
the estimate for the whole population
- Vs. Stratified Analysis
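Combining stratum estimates can be sketched as follows. The sketch assumes simple random sampling within each stratum and known stratum population sizes; the function name and the data are hypothetical.

```python
def stratified_mean(strata):
    """Estimate a population mean from per-stratum samples.

    `strata` is a list of (N_h, sample_values) pairs, where N_h is the
    population size of stratum h. Each stratum's sample mean is weighted
    by its population share W_h = N_h / N.
    """
    N = sum(N_h for N_h, _ in strata)
    return sum(N_h / N * (sum(values) / len(values)) for N_h, values in strata)


# Hypothetical data: a large stratum and a small one with a higher mean.
strata = [(900, [2.0, 4.0]), (100, [10.0, 14.0, 12.0])]
estimate = stratified_mean(strata)  # 0.9 * 3.0 + 0.1 * 12.0
```

The small stratum supplies more sample observations than its population share would suggest, but the weights W_h keep it from dominating the overall estimate.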
12Stratified Sampling Used When
- Wish to improve the efficiency of population-wide
estimates
- AND/OR
- Wish to control the sample size of estimates for
important population subgroups
- Isolatable to some degree by the strata
13Stratum Allocation Options
- C_h = Average cost of adding another respondent
to the sample in the h-th stratum
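The allocation table from the slides did not survive transcription. As a hedged illustration only, the sketch below implements three textbook allocation rules (proportionate, Neyman, and cost-optimum), which may or may not match the options the slide listed. The dictionary keys `N`, `S`, and `C` (stratum size, unit standard deviation, per-respondent cost C_h) and all numbers are assumptions.

```python
from math import sqrt


def allocate(n, strata, rule="proportionate"):
    """Split a total sample size n across strata.

    proportionate: n_h proportional to N_h
    neyman:        n_h proportional to N_h * S_h
    optimum:       n_h proportional to N_h * S_h / sqrt(C_h)
    """
    if rule == "proportionate":
        score = [s["N"] for s in strata]
    elif rule == "neyman":
        score = [s["N"] * s["S"] for s in strata]
    elif rule == "optimum":
        score = [s["N"] * s["S"] / sqrt(s["C"]) for s in strata]
    else:
        raise ValueError("unknown allocation rule: " + rule)
    total = sum(score)
    return [n * x / total for x in score]


# Hypothetical strata: the second is small, more variable, and costlier.
strata = [{"N": 900, "S": 1.0, "C": 1.0}, {"N": 100, "S": 3.0, "C": 4.0}]
proportionate = allocate(1200, strata, "proportionate")
neyman = allocate(1200, strata, "neyman")
optimum = allocate(1200, strata, "optimum")
```

In this example Neyman allocation shifts sample toward the more variable stratum, and the cost-optimum rule pulls some of it back because respondents there are more expensive.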
14Stratum Allocation Options
15Screening for a Targeted Population Subgroup
- Sampling in two phases
- Goal is to locate members of the population
subgroup
- Usually done by telephone or face-to-face in
general population surveys
- Process
- Select an initial sample
- Administer a relatively short interview
- To determine membership in the targeted subgroup
- Retain all target subgroup and (perhaps) a random
portion of the rest
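The screening process above can be sketched as a second-phase retention step. This is a minimal illustration under assumed names: `is_target` stands in for the short screening interview, and `keep_rate_other` is the hypothetical retention rate for non-members.

```python
import random


def screen(initial_sample, is_target, keep_rate_other, rng):
    """Second phase of a two-phase screening design.

    Keeps every member of the targeted subgroup and a random portion of
    everyone else. Returns (person, retention_probability) pairs so that
    the second-phase probability can be folded into the sample weight.
    """
    retained = []
    for person in initial_sample:
        if is_target(person):
            retained.append((person, 1.0))
        elif rng.random() < keep_rate_other:
            retained.append((person, keep_rate_other))
    return retained


# Hypothetical initial sample of 1000 people, 10% in the targeted subgroup.
rng = random.Random(0)
retained = screen(range(1000), lambda i: i % 10 == 0, 0.2, rng)
targets_kept = sum(1 for _, prob in retained if prob == 1.0)
```

Recording the retention probability matters because a retained non-member's final weight is the first-phase weight divided by `keep_rate_other`.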
16What May Lead to Problems in Sampling Race/Ethnic
Minorities?
- Incomplete Frame(s)
- A sizable portion of the population not linked to
entries on the list(s) used for sampling
- Rarity
- They usually comprise a relatively small
percentage of the target population
17What May Lead to Problems in Sampling Race/Ethnic
Minorities?
- Mobility
- Some of them move around a lot, thus creating a
more dynamic than static linkage between the
frame and sampled population
- Dispersion
- They are somewhat scattered geographically
- May have some pockets with relatively high
concentrations
19Some Remedies
- Targeted Sampling
- Multiple Frame Methods
- Linkage Exploitation Methods
- Network/multiplicity sampling
- Snowball sampling
- Adaptive cluster sampling
- Time and Space Sampling
- Oversampling
- Disproportionate Stratified Sampling with
Screening
20Multiple Frame Methods: Selection Approaches
- Premise
- Frame options taken alone may be inadequate or
too costly to use,
- BUT
- Choosing the sample jointly from multiple frames
may
- Produce better coverage of the targeted
population and
- Be more cost-effective
- Dual-Frame Designs --- Two frames
21Multiple Frames
Frame B
Frame A
Frame C
22Multiple Frame Methods: EXAMPLE
- Sampling Native Americans
- Two frames
- List of tribal rolls
- Less complete
- Less expensive to locate NAs
- Area household frame from
- List of residential dwellings in a sample of
block groups (neighborhoods)
- More complete
- More expensive because of the need to screen
- Most cost-effective mix?
23Multiple Frame Methods: Estimation Approaches
- Work by Hartley (1962), Choudry (1989), and
Skinner and Rao (1996)
- Special Requirements
- Identify/eliminate overlap prior to sampling
- OR
- Require knowledge of membership in intersection
groups for analysis adjustments
24Multiple Frame Methods: Estimation Approaches
- Eliminate frame duplication and treat as a
stratified sample
- OR
- Select with duplication present and either
- Combine estimates for intersection groups
- OR
- Determine frame membership for sample respondents
and weight accordingly
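The last option, determining frame membership and weighting accordingly, can be sketched as a Hartley-type weight adjustment for two frames. This is a simplified illustration: the mixing constant `theta`, the dictionary keys, and the numbers are assumptions, and choosing `theta` well is part of the estimation problem that the sketch does not address.

```python
def dual_frame_weights(sample, theta=0.5):
    """Adjust base weights for units reachable through both frames.

    Each unit carries the base weight from the frame it was sampled from.
    Units that belong to both frames would otherwise be counted twice, so
    their weights are scaled by theta (frame A sample) or 1 - theta
    (frame B sample). Units in only one frame keep their base weight.
    """
    adjusted = []
    for unit in sample:
        w = unit["weight"]
        if unit["in_both"]:
            w *= theta if unit["frame"] == "A" else 1.0 - theta
        adjusted.append(w)
    return adjusted


# Hypothetical example: frame A is a tribal roll, frame B an area frame.
sample = [
    {"frame": "A", "weight": 50.0, "in_both": True},
    {"frame": "A", "weight": 50.0, "in_both": True},
    {"frame": "B", "weight": 100.0, "in_both": True},
    {"frame": "B", "weight": 100.0, "in_both": False},
    {"frame": "B", "weight": 100.0, "in_both": False},
]
weights = dual_frame_weights(sample, theta=0.5)
estimated_population = sum(weights)
```

Here each frame's sample estimates the overlap at 100 people, so the combined estimate is 100 for the overlap plus 200 found only through the area frame.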
25Multiple Frame Methods: Implications for Sampling
Race/Ethnic Minorities
- Advantages
- Improved sample coverage over using a single list
- Potential cost savings if cost of frame use
differs among frames
- Disadvantages
- Higher design/selection/analysis complexity
relative to single frame use
- Challenge in finding the most cost-effective mix
of sample sizes for frames
26Linkage Exploitation Methods: Selection Approaches
- Premise
- Population members with a rare attribute can
often identify others with the same attribute
- Various adaptations
- Based on the notion of multiplicity in frames
- Differ according to how multiplicity is utilized
27Multiplicity
Frame Listing
Population Member
28Linkage Exploitation Methods: Various Adaptations
- Network/multiplicity sampling
- Network --- social/spatial/organizational linkage
among members of the targeted subgroup
- EXAMPLES: relatives, friends, co-workers,
co-habitants, organization co-members, etc.
- Linkages may be
- Asymmetric
- Complex
- EXAMPLE: friends
29Linkage Exploitation Methods: Various Adaptations
- Network/multiplicity sampling
- Sampling Process
- Choose an initial sample of the targeted subgroup
- Sample members interviewed and asked to nominate
other members of their network who are members of
the targeted subgroup
- Interview those nominated and have them nominate
others in like manner
- Selection probability directly tied to size of
network
30Linkage Exploitation Methods: Various Adaptations
- Snowball sampling
- Network sampling but with multiple phases of
nomination
- Snowballing may be best used to construct frames
to sample rare populations
- Continue waves of nomination until list expansion
ceases
31Linkage Exploitation Methods: Various Adaptations
- Adaptive cluster sampling
- Exploits the tendency for members of some
targeted subgroups to cluster together
- Original motivation from ecology and geology
- Sampling Process
- Select a random sample of the population
- Where one identifies members of the targeted
subgroup, sample others in the neighborhood
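The selection step of adaptive cluster sampling can be sketched on a grid of areas. This is a minimal illustration with assumed details: areas are grid cells, "neighborhood" means the four adjacent cells, and the initial sample is passed in rather than drawn at random so the example is reproducible. Estimation from such a sample needs the specialized estimators developed for this design and is not shown.

```python
def adaptive_cluster_sample(counts, initial_cells):
    """Expand an initial sample of grid cells around occupied cells.

    `counts` maps (row, col) to the number of targeted-subgroup members
    found there. Whenever a sampled cell contains at least one member,
    its four neighbors are added to the sample, and the rule is applied
    again to each of them.
    """
    sampled = set()
    to_visit = list(initial_cells)
    while to_visit:
        cell = to_visit.pop()
        if cell in sampled or cell not in counts:
            continue  # already handled, or outside the study area
        sampled.add(cell)
        if counts[cell] > 0:
            row, col = cell
            to_visit.extend(
                [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
            )
    return sampled


# Hypothetical 4 x 4 study area with one pocket of three occupied cells.
counts = {(r, c): 0 for r in range(4) for c in range(4)}
counts.update({(1, 1): 5, (1, 2): 3, (2, 1): 2})
sampled = adaptive_cluster_sample(counts, [(0, 0), (1, 1)])
```

The empty cell (0, 0) triggers no expansion, while the hit at (1, 1) pulls in the whole pocket plus its empty border cells.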
32Linkage Exploitation Methods: EXAMPLE
- Snowballing sampling frame of prenatal care
providers
- Study of recent female immigrants from Central
and South America
- Process
- Contact OB-GYNs in private practices and public
clinics
- Those providing prenatal care to immigrants
nominate others doing the same
- Continue iteratively until no new providers
are discovered
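The iterative nomination process above amounts to expanding a list until it stops growing. A minimal sketch, assuming a hypothetical `nominate` callable that returns the providers named by a given provider (the provider names below are invented):

```python
from collections import deque


def snowball_frame(seeds, nominate):
    """Build a frame by waves of nomination until no new names appear."""
    frame = set(seeds)
    queue = deque(seeds)
    while queue:
        provider = queue.popleft()
        for other in nominate(provider):
            if other not in frame:
                frame.add(other)
                queue.append(other)
    return frame


# Hypothetical nominations; dr_z names others but is named by no one.
nominations = {
    "clinic_1": ["dr_a", "dr_b"],
    "dr_a": ["dr_b", "dr_c"],
    "dr_b": [],
    "dr_c": ["clinic_1"],
    "dr_z": ["dr_a"],
}
frame = snowball_frame(["clinic_1"], lambda p: nominations.get(p, []))
```

The provider `dr_z` never enters the frame because nobody reachable from the seed nominates them, which illustrates how asymmetric linkages can leave coverage gaps.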
33Linkage Exploitation Methods: Estimation
- Major contributors: Sirken (network), Goodman
(snowball), and Thompson (adaptive)
- Approaches
- Weighted multiplicity estimation (Sirken)
- Rao-Blackwellization to improve estimator
efficiency (Thompson)
- Special requirements
- Network membership information
- Multiplicity counts
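The role of multiplicity counts can be sketched with a simple weighted multiplicity estimator of a subgroup total. This is a simplified illustration, not a full account of the method: each sampled unit reports the subgroup members linked to it, and each reported member is down-weighted by the number of frame units that could have reported that member. The data layout is an assumption.

```python
def multiplicity_estimate(reports):
    """Estimate the number of targeted-subgroup members.

    `reports` is a list of (weight, multiplicities) pairs, one per sampled
    unit: `weight` is the unit's sample weight and `multiplicities` holds,
    for each member the unit reported, the count of frame units linked to
    that member. Dividing by the multiplicity offsets the member's extra
    chances of being reported.
    """
    return sum(w * sum(1.0 / m for m in mults) for w, mults in reports)


# Hypothetical reports from three sampled households, each with weight 100.
reports = [(100.0, [1, 2]), (100.0, []), (100.0, [2, 4])]
estimated_members = multiplicity_estimate(reports)
```

A member reportable by four households contributes a quarter of a count each time, so the estimate does not overstate well-connected members.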
34Linkage Exploitation Methods: Implications for
Sampling Race/Ethnic Minorities
- Advantages
- Greater operational efficiency in locating
members of the target population
- Find a hotspot then sample nearby
- Disadvantages
- Difficult to determine selection probabilities
for weights
- Asymmetric linkages (A nominates B, but not vice
versa)
- Valid probability samples?
35Time and Space Sampling: Selection Approach
- Premise
- Portions of ethnic subpopulations are relatively
mobile (e.g., migrant farm workers, homeless)
- Sampling a chunk of time
- Linkage between members of the target subgroup
and the frame is dynamic over time
- Those moving more frequently have greater chance
of selection
- Sample space and time to address this potential
for bias
36Time and Space Sampling: EXAMPLE
- Sampling migrant seasonal farm workers
- Process
- Spatial dimension: sample migrant housing
locations
- On farms
- In other residential housing areas
- Time dimension: sample time periods during the
data collection period
- Three consecutive days
37Time and Space Sampling: Estimation
- Contributors: Kalsbeek (1988), Kalton (1991)
- Approaches
- Multiplicity estimators similar to those used in
network samples
- Special Requirements
- Need multiplicity count for each sample member?
- Sampling scheme compromise needed between
- Statistical precision of estimates
- Operational effectiveness
38Time and Space Sampling: Implications for
Sampling Race/Ethnic Minorities
- Advantages
- Deals with the fluidity of frame-population
linkage in mobile populations
- Provides a framework for finding a cost-efficient
solution
- Disadvantages
- Added complexity to selection, data gathering,
and analysis of sample
39Disproportionate Stratified Sampling with
Screening: Selection Approach
- Premise
- Concentrations of the targeted subgroup vary in
the population
- Sample strata with higher concentrations more
heavily
- Result: larger sample size for the target
subgroup relative to a proportionate sample
41DSS with Screening: EXAMPLE
- Oversampling African-Americans
- A simple process
- Stratify the population
- By relatively high and low concentrations of
African-Americans
- High concentration areas in the South and large
cities
- Sample with relatively higher rates in the high
concentration stratum
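The gain from sampling the high-concentration stratum more heavily can be checked with simple arithmetic. The sketch below compares expected minority yields for a fixed total sample size; the stratum sizes, minority shares, and the oversampling ratio are hypothetical values, not figures from the presentation.

```python
def expected_minority_yield(strata, n, oversample_ratio=1.0):
    """Expected number of minority members in a sample of total size n.

    Oversampled strata are sampled at `oversample_ratio` times the rate
    used elsewhere; a ratio of 1 gives a proportionate sample.
    """
    relative = [oversample_ratio if s["oversampled"] else 1.0 for s in strata]
    # Base rate f solves n = sum over strata of f * relative_h * N_h.
    f = n / sum(s["N"] * k for s, k in zip(strata, relative))
    return sum(
        f * k * s["N"] * s["minority_share"] for s, k in zip(strata, relative)
    )


# Hypothetical population: 20% lives in a high-concentration stratum.
strata = [
    {"N": 20000, "minority_share": 0.40, "oversampled": True},
    {"N": 80000, "minority_share": 0.05, "oversampled": False},
]
proportionate_yield = expected_minority_yield(strata, 1000)
oversampled_yield = expected_minority_yield(strata, 1000, oversample_ratio=4.0)
```

With these numbers, a fourfold sampling rate in the high-concentration stratum raises the expected minority sample from 120 to 225 without changing the total sample size.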
42DSS with Screening: Estimation
- Approaches
- Weighted estimate to account for sample
disproportionality
- Effect of variable weights is to lower precision
of some population estimates
- Special Requirements
- Establishing the most cost-efficient overall and
stratum-specific sampling rates
43DSS with Screening: Implications for Sampling
Race/Ethnic Minorities
- Advantages
- Increased sample size for the targeted subgroups
- Also increased for target subgroup non-members in
the (oversampled) high concentration strata
- Disadvantages
- Loss in precision on overall population estimates
44A Two-Stratum Model for Effects of Oversampling
- Setting
- Oversampling a minority group
- 10% of the population
- Two sampling strata
- One with higher minority (to oversample)
- One with lower minority (to undersample)
- Two alternative sets of strata
- Nearly Pure --- strata virtually all members or
non-members
- Less Pure --- strata mostly members or
non-members
45Nearly Pure Strata
Oversampled Stratum
Undersampled Stratum
TARGET POPULATION
46Less Pure Strata
Oversampled Stratum
Undersampled Stratum
TARGET POPULATION
47A Two-Stratum Model for Effects of Oversampling
- Assumptions
- Simple random sampling in each stratum
- Stratum unit variances are equal
- Other minor simplifying conditions
48A Two-Stratum Model for Effects of Oversampling
- Sample Sizes (Relative to Proportionate)
- Minority_Nom: Nominal Sample Size for Minority
- Observed increase in size of minority sample
- Due to oversampling of the predominantly minority
stratum
- Minority_Eff: Effective Sample Size for Minority
- Adjusted size of minority sample
- Considering the (downward) effect of variable
sample weights on statistical quality of
estimates
- Overall_Eff: Effective Size of Overall Sample
- Adjusted size of overall sample
- Considering the (downward) effect of variable
sample weights on statistical quality of
estimates
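The three quantities above can be approximated with a short calculation. This is a sketch under stated assumptions, not the presenter's exact model: it uses the common effective-sample-size approximation (sum of weights squared, divided by the sum of squared weights), simple random sampling within strata, and equal unit variances. The parameter names `W1`, `p1`, `p2`, and `k` and all input values are hypothetical.

```python
def oversampling_effects(W1, p1, p2, k):
    """Sample sizes relative to a proportionate sample of the same total size.

    W1: population share of the oversampled stratum
    p1, p2: minority proportions in the oversampled / undersampled strata
    k: ratio of the sampling rate in stratum 1 to the rate in stratum 2
    """
    W2 = 1.0 - W1
    P = W1 * p1 + W2 * p2               # minority share of the population
    D = k * W1 + W2
    n1, n2 = k * W1 / D, W2 / D         # sample shares by stratum (total = 1)
    m1, m2 = n1 * p1, n2 * p2           # minority sample shares by stratum

    def effective(a, b):
        # a units carrying weight 1/k and b units carrying weight 1
        return (a / k + b) ** 2 / (a / k ** 2 + b)

    return {
        "Minority_Nom": (m1 + m2) / P,
        "Minority_Eff": effective(m1, m2) / P,
        "Overall_Eff": effective(n1, n2),
    }


# Hypothetical strata, each with a minority of about 10% of the population.
nearly_pure = oversampling_effects(W1=0.10, p1=0.95, p2=0.005, k=4.0)
less_pure = oversampling_effects(W1=0.20, p1=0.40, p2=0.025, k=4.0)
no_oversampling = oversampling_effects(W1=0.20, p1=0.40, p2=0.025, k=1.0)
```

Under these assumptions the nominal minority gain always exceeds the effective gain, and the overall effective sample size falls below the proportionate one whenever k differs from 1.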
49Effects of Oversampling: Nearly Pure Strata
50Effects of Oversampling: Less Pure Strata
51Summary
- Sampling rare ethnic groups is possible
- BUT
- Accomplishing it effectively is likely to be
- Complex (dealing with multiplicity, dealing with
multiple frames, resolving statistical-operational
dilemmas)
- Costly (screening, stratification)
- Adverse effect on overall population estimates
(if oversampling done)
- Loss of sampling validity? (snowball sampling)
52A Case-Study in Oversampling Blacks and
Mexican-Americans
- The Third National Health and Nutrition
Examination Survey (NHANES III)
53Cluster Sampling
- Random selection applied to one or more levels of
a population hierarchy
- Sampling Stage: Level of hierarchy at which
sampling is done
- Jargon
- PSU: Primary Sampling Unit is what is sampled in
the first selection stage
- SSU: Secondary Sampling Unit is what is sampled
in the second stage
54Population Hierarchies
55Population Hierarchies
- EXAMPLE: African-American residents of the US
non-institutionalized household population
Resident > Household > Block Group > Census Tract
> Minor Civil Division > County > State > US
56NHANES III Overview
- National health survey
- U.S. civilian noninstitutionalized population
- Stratified multi-stage sample design
- Detailed profile and predictors of health status
- Data gathering timeline
- 1988-94
- Data collected by
- Face-to-face interviews in the home
- Detailed examination at mobile sites
57NHANES III Target Population
- U.S. residents
- Two months and older
- Including those living in Alaska and Hawaii
- Civilians only
- Excludes housing on military bases
- Noninstitutionalized population only
- Excludes some residents of hospitals, nursing
homes, prisons, and other comparable institutions
- Eligibility determined as of the time of interview
58NHANES III in General
- Key minority domains
- Black (non-Hispanic)
- Mexican American
- Children 2 months to 5 years
- The Elderly > 60 years
60Stratification to Oversample Key Minority Domains
Applied at
- The PSU level
- Race/ethnicity or income indicator
- The segment level
- Density of Mexican-Americans
- The household level
- Race/ethnicity
- The (sample) person level
- Age
61Oversampling of Key Minority Domains
- Implementation accomplished by
- Disproportionate allocation favoring key minority
domains
- Using a weighted measure of size
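One way to read "weighted measure of size" is sketched below: each PSU's size measure counts people in key domains more heavily, so that selection with probability proportional to that measure favors PSUs where those domains are concentrated. This is an illustrative assumption, not the documented NHANES III procedure; the domain labels, relative rates, and counts are hypothetical.

```python
def weighted_measure_of_size(domain_counts, relative_rates):
    """Sum of domain population counts, each scaled by its relative rate."""
    return sum(relative_rates[d] * count for d, count in domain_counts.items())


def selection_probabilities(psus, relative_rates):
    """Single-draw selection probabilities proportional to the weighted size."""
    sizes = {
        name: weighted_measure_of_size(counts, relative_rates)
        for name, counts in psus.items()
    }
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


# Hypothetical PSUs of equal total population but different composition.
relative_rates = {"black": 3.0, "mexican_american": 3.0, "other": 1.0}
psus = {
    "psu_x": {"black": 1000, "mexican_american": 0, "other": 9000},
    "psu_y": {"black": 100, "mexican_american": 0, "other": 9900},
}
size_x = weighted_measure_of_size(psus["psu_x"], relative_rates)
size_y = weighted_measure_of_size(psus["psu_y"], relative_rates)
probabilities = selection_probabilities(psus, relative_rates)
```

Both PSUs hold 10,000 people, yet the one with the larger key-domain population gets the larger size measure and therefore the higher chance of selection.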
62Stratification to Oversample Key Minority
Domains in NHANES III
63Stratification to Oversample Key Minority
Domains in NHANES III
- Oversampling implies more widely variable
selection probabilities and sample weights
- Effect of variable weights is to increase
variances of estimates
- One model: Increased variance by a factor of,
64Stratification to Oversample Key Minority Domains
in NHANES III
- EXAMPLE
- Effect of variable sample weights on total
population estimates using data from the
MEC-examined NHANES III sample