Ch 4: Stratified Random Sampling STS - PowerPoint PPT Presentation

Provided by: sarahn
Transcript and Presenter's Notes

1
Ch 4 Stratified Random Sampling (STS)
  • DEFN: A stratified random sample is obtained by
    separating the population units into
    non-overlapping groups, called strata, and then
    selecting a random sample from each stratum

2
Procedure
  • Divide sampling frame into mutually exclusive and
    exhaustive strata
  • Assign each SU to one and only one stratum
  • Select a random sample from each stratum
  • Select random sample from stratum 1
  • Select random sample from stratum 2
  • . . .
  • Select random sample from stratum H

[Diagram: population partitioned into strata h = 1, 2, . . ., H]
3
Ag example
  • Divide 3078 counties into 4 strata corresponding
    to regions of the country
  • Northeast (h = 1)
  • North central (h = 2)
  • South (h = 3)
  • West (h = 4)
  • Select a SRS from each stratum
  • In this example, stratum sample size is
    proportional to stratum population size
  • 300 is 9.75% of 3078
  • Each stratum sample size is 9.75% of stratum
    population

4
Ag example 2
5
Procedure 2
  • Need to have a stratum value for each SU in the
    frame
  • Minimum set of variables in sampling frame: SU
    id, stratum assignment

6
Ag example 3
7
Procedure 3
  • Each stratum sample is selected independently of
    others
  • New set of random numbers for each stratum
  • Basis for deriving properties of estimators
  • Design within a stratum
  • For Ch 4, we will assume a SRS is selected within
    each stratum
  • Can use any probability design within a stratum
  • Sample designs do not need to be the same across
    strata
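
A minimal sketch of the selection procedure, assuming a frame stored as (SU id, stratum) pairs and an SRS without replacement in every stratum. The function name `stratified_sample` and the toy frame are hypothetical, not part of the original slides.

```python
import random
from collections import defaultdict

def stratified_sample(frame, n_h, seed=None):
    """Select an SRS without replacement independently within each stratum.

    frame: list of (su_id, stratum) pairs (the minimum frame variables).
    n_h:   dict mapping stratum label -> stratum sample size.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for su_id, stratum in frame:
        by_stratum[stratum].append(su_id)  # each SU lands in exactly one stratum
    sample = {}
    for stratum, ids in by_stratum.items():
        # Fresh random draws for each stratum; no draw depends on another stratum.
        sample[stratum] = rng.sample(ids, n_h[stratum])
    return sample

# Hypothetical frame: 10 SUs in stratum "A", 6 SUs in stratum "B".
frame = [(i, "A") for i in range(10)] + [(i, "B") for i in range(10, 16)]
sample = stratified_sample(frame, {"A": 3, "B": 2}, seed=1)
print(sample)
```

Any other probability design could replace `rng.sample` inside the loop for a given stratum, since designs need not match across strata.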

8
Uses for STS
  • To improve representativeness of sample
  • In SRS, can get ANY combination of n elements in
    the sample
  • In SYS, we severely restricted the set to k
    possible samples
  • Can get bad samples
  • Less likely to get unbalanced samples if frame is
    sorted using a variable correlated with Y

9
Uses for STS 2
  • To improve representativeness of sample - 2
  • In STS, we also exclude samples
  • Explicitly choose strata to restrict possible
    samples
  • Improve chance of getting representative samples
    if use strata to encourage spread across
    variation in population

10
Uses for STS 3
  • To improve precision of estimates for population
    parameters
  • Achieved by creating strata so that
  • variation WITHIN stratum is small
  • variation AMONG strata is large
  • Uses same principle as blocking in experimental
    design
  • Improve precision of estimate for population
    parameter by obtaining precise estimates within
    each stratum

11
Uses for STS 4
  • To study specific subpopulations
  • Define strata to be subpopulations of interest
  • Examples
  • Male v. female
  • Racial/ethnic minorities
  • Geographic regions
  • Population density (rural v. urban)
  • College classification
  • Can establish sample size within each stratum to
    achieve desired precision level for estimates of
    subpopulations

12
Uses for STS 5
  • To assist in implementing operational aspects of
    survey
  • May wish to apply different sampling and data
    collection procedures for different groups
  • Agricultural surveys (sample designs)
  • Large farms in one stratum are selected using a
    list frame
  • Smaller farms belong to a second stratum, and are
    selected using an area sample
  • Survey of employers (data collection methods)
  • Large firms: mail survey, because information
    is too voluminous to get over the phone
  • Small firms: telephone survey

13
Estimation strategy
  • Objective: estimate population total
  • Obtain estimates for each stratum
  • Estimate stratum population total
  • Use SRS estimator for stratum total
  • Estimate variance of estimator in each stratum
  • Use SRS estimator for variance of estimated
    stratum total
  • Pool estimates across strata
  • Sum stratum total estimates and variance
    estimates across strata
  • Variance formula justified by independence of
    samples across strata
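
The strategy above can be sketched in a few lines, assuming an SRS without replacement in each stratum and at least two sampled units per stratum. The function name `sts_total` and the data are hypothetical; the per-stratum formulas are the usual SRS estimators with the finite population correction.

```python
from statistics import mean, variance

def sts_total(samples, N_h):
    """Return (estimated total, estimated variance of that total).

    samples: dict stratum -> list of sampled y values (n_h >= 2 each).
    N_h:     dict stratum -> stratum population size.
    """
    t_hat = 0.0
    v_hat = 0.0
    for h, y in samples.items():
        n, N = len(y), N_h[h]
        t_hat += N * mean(y)  # SRS estimator of the stratum total
        # SRS variance estimator for the stratum total, with FPC (1 - n/N)
        v_hat += N**2 * (1 - n / N) * variance(y) / n
    # Summing variances is valid because strata are sampled independently.
    return t_hat, v_hat

# Hypothetical data for two strata.
samples = {"A": [2.0, 4.0, 6.0], "B": [10.0, 14.0]}
N_h = {"A": 30, "B": 10}
t_hat, v_hat = sts_total(samples, N_h)

# Estimated population mean and its variance follow by rescaling with N.
N = sum(N_h.values())
ybar_str = t_hat / N
v_mean = v_hat / N**2
print(t_hat, v_hat, ybar_str, v_mean)
```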

14
Ag example 4
15
Ag example 5
  • Estimated total farm acres in US

16
Ag example 6
17
Ag example 7
  • Estimated variance for estimated total farm acres
    in US

18
Ag example 8
  • Compare with SRS estimates

19
Estimation strategy - 2
  • Objective: estimate population mean
  • Divide estimated total by population size
  • OR equivalently,
  • Obtain estimates for each stratum
  • Estimate stratum mean with stratum sample mean
  • Pool estimates across strata
  • Use weighted average of stratum sample means with
    weights proportional to stratum sizes Nh

20
Ag example 9
  • Estimated mean farm acres / county

21
Ag example 10
  • Estimate variance of estimated mean farm acres /
    county

22
Notation
  • Index set for stratum h = 1, 2, ..., H
  • Uh = {1, 2, ..., Nh}
  • Nh = number of OUs in stratum h in the population
  • Partition sample of size n across strata
  • nh = number of sample units from stratum h
    (fixed)
  • Sh = index set for sample belonging to stratum h

23
Notation 2
  • Population sizes
  • Nh = number of OUs in stratum h in the population
  • N = N1 + N2 + ... + NH
  • Partition sample of size n across strata
  • nh = number of sample units from stratum h
  • n = n1 + n2 + ... + nH
  • The stratum sample sizes are fixed
  • In domain estimation, they are random
  • For now, we will assume that the sampling unit
    (SU) is an observation unit (OU)

24
Notation 3
  • Response variable
  • Yhj = characteristic of interest for OU j in
    stratum h
  • Population and stratum totals

25
Notation 4
  • Population and stratum means

26
Notation 5
  • Population stratum variance
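
The formulas for these notation slides did not survive in the transcript. A sketch of the usual textbook definitions, assuming the symbols t, t_h, \bar{Y}_U, \bar{Y}_{hU} and S_h^2 (these names are assumptions, not recovered from the slides):

```latex
% Stratum and population totals
t_h = \sum_{j=1}^{N_h} Y_{hj}, \qquad t = \sum_{h=1}^{H} t_h

% Stratum and population means
\bar{Y}_{hU} = \frac{t_h}{N_h}, \qquad
\bar{Y}_U = \frac{t}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{Y}_{hU}

% Population variance within stratum h
S_h^2 = \frac{1}{N_h - 1} \sum_{j=1}^{N_h} \left( Y_{hj} - \bar{Y}_{hU} \right)^2
```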

27
Notation 6
  • SRS estimators for stratum parameters

28
STS estimators
  • For population total

29
STS estimators 2
  • For population mean

30
STS estimators 3
  • For population proportion
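
The estimator formulas for the total, mean, and proportion are missing from the transcript. Under the chapter's assumption of an SRS within each stratum, the standard forms are sketched below; the symbols \bar{y}_h, s_h^2 and \hat{p}_h (stratum sample mean, sample variance, and sample proportion) are notation assumed here.

```latex
% Population total
\hat{t}_{str} = \sum_{h=1}^{H} N_h\,\bar{y}_h, \qquad
\hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) N_h^2\,\frac{s_h^2}{n_h}

% Population mean
\bar{y}_{str} = \frac{\hat{t}_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h, \qquad
\hat{V}(\bar{y}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h}

% Population proportion
\hat{p}_{str} = \sum_{h=1}^{H} \frac{N_h}{N}\,\hat{p}_h, \qquad
\hat{V}(\hat{p}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{\hat{p}_h (1 - \hat{p}_h)}{n_h - 1}
```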

31
Properties
  • STS estimators are unbiased
  • Each estimate of stratum population mean or total
    is unbiased (from SRS)

32
Properties 2
  • Inclusion probability for SU j in stratum h
  • Definition in words
  • Formula: πhj = nh / Nh (for SRS within stratum h)

33
Properties 3
  • In general, STS will provide a more precise
    estimate of the population parameters (mean,
    total, proportion) than an SRS of the same size,
    provided strata are internally homogeneous and
    the allocation is sensible (e.g., proportional)
  • For example
  • Confidence intervals
  • Same form (using zα/2)
  • Different CLT

34
Sampling weights
  • Note that
  • Sampling weight for SU j in stratum h
  • A sampling weight is a measure of the number of
    units in the population represented by SU j in
    stratum h
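
The relations behind this slide are missing from the transcript. Assuming an SRS of n_h units from the N_h units in stratum h, they can be sketched as follows (the symbols \pi_{hj} and w_{hj} follow the slide text; the displayed identities are the standard ones):

```latex
\pi_{hj} = \frac{n_h}{N_h}, \qquad
w_{hj} = \frac{1}{\pi_{hj}} = \frac{N_h}{n_h}

% The weights in the sample add up to the population size
\sum_{h=1}^{H} \sum_{j \in S_h} w_{hj} = \sum_{h=1}^{H} n_h \cdot \frac{N_h}{n_h} = N

% The estimated total is a weighted sum of the sampled values
\hat{t}_{str} = \sum_{h=1}^{H} \sum_{j \in S_h} w_{hj}\,Y_{hj}
```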

35
Example
  • Note: weights for each OU within a stratum are
    the same

36
Example 2
  • Dataset from study

37
Sampling weights 2
  • For STS estimators presented in Ch 4, sampling
    weight is the inverse of the inclusion probability

38
Defining strata
  • Depends on purpose of stratification
  • Improved representativeness
  • Improved precision
  • Subpopulation estimates
  • Implementing operational aspects
  • If possible, use factors related to variation in
    characteristic of interest, Y
  • Geography, political boundaries, population
    density
  • Gender, ethnicity/race, ISU classification
  • Size or type of business
  • Remember
  • Stratum variable must be available for all OUs

39
Allocation strategies
  • Want to sample n units from the population
  • An allocation rule defines how n will be spread
    across the H strata and thus defines values for
    nh
  • Overview for estimating population parameters

[Diagram: special cases of optimal allocation]
40
Allocation strategies 2
  • Focus is on estimating parameter for entire
    population
  • We'll look at subpopulations later
  • Factors affecting allocation rule
  • Number of OUs in stratum
  • Data collection costs within strata
  • Within-stratum variance

41
Proportional allocation
  • Stratum sample size allocated in proportion to
    population size within stratum
  • Allocation rule
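
The rule itself is missing from the transcript; proportional allocation is conventionally nh = n Nh / N. A minimal sketch follows. The four stratum sizes are an illustrative assumption chosen only so that they sum to 3078; they are not taken from the slides.

```python
def proportional_allocation(N_h, n):
    """Allocate n across strata in proportion to stratum size N_h."""
    N = sum(N_h.values())
    return {h: n * size / N for h, size in N_h.items()}

# Illustrative stratum sizes (assumption): four regions summing to 3078.
N_h = {"NE": 220, "NC": 1054, "S": 1382, "W": 422}

exact = proportional_allocation(N_h, 300)     # generally not whole numbers
n_h = {h: round(x) for h, x in exact.items()}  # round to usable sample sizes

# Sampling weight N_h / n_h in each stratum after rounding.
weights = {h: N_h[h] / n_h[h] for h in N_h}
print(n_h)
print(weights)
```

With these illustrative sizes the rounded allocation still sums to 300, and the weights differ slightly across strata (all between 10.2 and 10.5), which is the rounding effect that makes a proportionally allocated sample only approximately self-weighting.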

42
Ag example 11
43
Proportional allocation 2
  • Proportional allocation rule implies
  • Sampling fraction for stratum h is constant
    across strata
  • Inclusion probability is constant for all SUs in
    population
  • Sampling weight for each unit is constant

44
Proportional allocation 3
  • STS with proportional allocation leads to a
    self-weighting sample
  • What is a self-weighting sample?
  • If whj has the same value for every OU in the
    sample, a sample is said to be self-weighting
  • Since each weight is the same, each sample unit
    represents the same number of units in the
    population
  • For self-weighting samples, estimator for
    population mean reduces to the sample mean
  • Estimator for variance does NOT necessarily
    reduce to SRS estimator for variance of
    the sample mean
45
Proportional allocation 4
  • Check to see that a STS with proportional
    allocation generates a self-weighting sample
  • Is the sampling weight whj the same for each OU?
  • Is estimator for population mean equal to
    the sample mean?
  • What happens to the variance of the estimator?

46
Ag example 12
  • Even though we have used proportional allocation,
    rounding in setting sample sizes can lead to
    unequal (but approximately equal) weights

47
Neyman allocation
  • Suppose within-stratum variances vary
    across strata
  • Stratum sample size allocated in proportion to
  • Population size within stratum Nh
  • Population standard deviation within stratum Sh
  • Allocation rule
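
The rule is missing from the transcript. The standard Neyman allocation, consistent with the two factors listed above, can be written as (the summation index l is introduced here for illustration):

```latex
n_h = n \cdot \frac{N_h S_h}{\sum_{l=1}^{H} N_l S_l}
```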

48
Caribou survey example
49
Optimal allocation
  • Suppose data collection costs ch vary across
    strata
  • Let C = total budget
  • c0 = fixed costs (office rental, field
    manager)
  • ch = cost per SU in stratum h (interviewer
    time, travel cost)
  • Express budget constraint as C = c0 + Σh ch nh
    and determine nh

50
Optimal allocation 2
  • Assume general case: stratum population sizes,
    stratum variances, and stratum data collection
    costs vary across strata
  • Sample size is allocated to strata in proportion
    to
  • Stratum population size Nh
  • Stratum standard deviation Sh
  • Inverse square root of stratum data collection
    costs
  • Allocation rule
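
The rule is missing from the transcript. Given the three factors listed above, the standard optimal allocation can be sketched as follows, where the budget constraint is assumed to take the linear form shown (the summation index l is introduced here for illustration):

```latex
% Assumed linear cost model
C = c_0 + \sum_{h=1}^{H} c_h n_h

% Optimal allocation
n_h = n \cdot \frac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}}
```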

51
Optimal allocation 3
  • Obtain this formula by finding nh such that the
    variance of the estimator is minimized given
    cost constraints
  • The optimal stratum allocation will generate the
    smallest variance of the estimator for a given
    stratification and cost constraint
  • Sample size for stratum h (nh ) is larger in
    strata where one or more of the following
    conditions exist
  • Stratum size Nh is large
  • Stratum variance is large
  • Stratum per-unit data collection costs ch are
    small

52
Welfare example
  • Objective
  • Estimate fraction of welfare participant
    households in NE Iowa that have access to a
    reliable vehicle for work
  • Sample design
  • Frame: welfare participant list
  • Stratum 1: Phone
  • N1 = 4500 households, p1 = 0.85, c1 = 100
  • Stratum 2: No phone
  • N2 = 500 households, p2 = 0.50, c2 = 300
  • Sample size n = 500

53
Welfare example 2
  • Optimal allocation with phone strata

54
Optimal allocation 4
  • Proportional and Neyman allocation are special
    cases of optimal allocation
  • Neyman allocation
  • Data collection costs per sample unit ch are
    approximately constant across strata
  • Telephone survey of US residents with regional
    strata
  • ch term cancels out of optimal allocation formula

55
Optimal allocation 5
  • Proportional allocation
  • Data collection costs per sample unit ch are
    approximately constant across strata
  • Within stratum variances are approximately
    constant across strata
  • Example: Y = number of persons per household is
    relatively constant across regions
  • ch and Sh terms drop out of allocation formula

56
Subpopulation allocation
  • Suppose main interest is in estimating stratum
    parameters
  • Subpopulation (stratum) mean, total, proportion
  • Define strata to be subpopulations
  • Estimate stratum population parameters
  • Allocation rules derived from independent SRS
    within each stratum (subpopulation)
  • Equal allocation for equal stratum costs,
    variances
  • Stratum variances change across strata

57
Subpopulation allocation 2
  • Equal allocation
  • Assume
  • Desired precision levels for each subpopulation
    (stratum) are constant across strata
  • Stratum costs, stratum variances equal across
    strata
  • Stratum FPCs near 1
  • Allocation rule is to divide n equally across
    the H strata (subpopulations)
  • If the Nh vary widely, equal allocation will lead
    to less precise estimates of parameters for the
    full population

58
Welfare example 3
  • Suppose we wanted to estimate proportion of
    welfare households that have access to a car for
    households in each of three subpopulations in NE
    Iowa
  • Metropolitan county
  • Counties adjacent to metropolitan county
  • Counties not adjacent to metro county

59
Welfare example 4
  • Equal allocation with population density strata

60
Subpopulation allocation 3
  • More complex settings: if Sh vary across strata,
    can use SRS formulas for determining stratum
    sample sizes, e.g., for stratum mean
  • Result is
  • May get sample sizes (nh) that are too large or
    small relative to budget
  • Relax margin of error eh and/or confidence level
    100(1-α)%
  • Recalibrate stratum sample sizes to get desired
    sample size
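
The formula after "Result is" did not survive in the transcript. A sketch of the standard SRS sample size calculation for a stratum mean, assuming margin of error e_h at confidence level 100(1-\alpha)%, is given below; the intermediate quantity n_{0h} is a name introduced here.

```latex
% Ignoring the stratum FPC
n_{0h} = \frac{z_{\alpha/2}^2\, S_h^2}{e_h^2}

% Adjusting for the stratum FPC
n_h = \frac{n_{0h}}{1 + n_{0h} / N_h}
```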

61
Welfare example 5
  • 95% CI, e = 0.10 for all pop density strata

62
Compromise allocations
[Figure: three panels of nh versus Nh: equal allocation (nh = n/H), proportional allocation (nh = nNh/N), and square root allocation]
63
Square root allocation
  • More SUs to small strata than proportional
    allocation
  • Fewer SUs to large strata than equal
  • Variance for subpopulation estimates is smaller
    than proportional
  • Variance for whole population estimates is
    smaller than equal allocation
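
A small sketch comparing the three rules, assuming square root allocation means nh proportional to √Nh (the slide does not state the formula). The stratum sizes are hypothetical and chosen so the arithmetic is easy to follow.

```python
from math import sqrt

def equal_allocation(N_h, n):
    """Same sample size in every stratum."""
    return {h: n / len(N_h) for h in N_h}

def proportional_allocation(N_h, n):
    """Sample size proportional to stratum size N_h."""
    N = sum(N_h.values())
    return {h: n * N_h[h] / N for h in N_h}

def square_root_allocation(N_h, n):
    """Sample size proportional to sqrt(N_h) (assumed form of the rule)."""
    total = sum(sqrt(size) for size in N_h.values())
    return {h: n * sqrt(N_h[h]) / total for h in N_h}

# Hypothetical stratum sizes.
N_h = {"small": 100, "medium": 400, "large": 1600}
n = 70

eq = equal_allocation(N_h, n)
prop = proportional_allocation(N_h, n)
sq = square_root_allocation(N_h, n)
for h in N_h:
    print(h, round(eq[h], 1), round(prop[h], 1), round(sq[h], 1))
```

For the small stratum the square root rule falls between proportional and equal allocation, and the same holds in the opposite direction for the large stratum, which is the compromise the slide describes.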

[Figure: square root allocation, nh versus Nh]
64
Compromise allocations 2
  • May want to set
  • Minimum number of SUs in a stratum
  • Cap on max number of SUs in a stratum
  • Rule
  • nh = min nh for Nh < A
  • nh = max nh for Nh > B
  • Apply rule in between A and B
  • Square root
  • Proportional

[Figure: two panels of nh versus Nh, with nh held at min nh below A and at max nh above B]
65
Welfare example 6
  • Comparing equal, proportional and square root
    allocation

66
Other allocations
  • Certainty stratum is used to guarantee inclusion
    in sample
  • Census (sample all) the units in a stratum
  • For certainty stratum h
  • Allocation: nh = Nh
  • Inclusion probability: πhj = 1
  • Ad hoc allocations
  • The sample allocation does not have to follow any
    of the rules mentioned so far
  • However, you should determine the stratum
    allocation in relation to analysis objectives and
    operational constraints

67
Welfare example 7
  • Ad hoc allocation

68
Determining sample size n
  • Determine allocation using rule expressed in
    terms of relative sample size nh /n
  • Rewrite variance of the estimated total as a
    function of relative sample sizes (ignoring
    stratum FPCs)
  • Sample size calculation based on margin of error
    e for population total

69
Determining sample size n 2
  • Rewrite variance of the estimated mean as a
    function of relative sample sizes (ignoring
    stratum FPCs)
  • Sample size calculation based on margin of error
    e for population mean
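
The formulas for these two slides are missing from the transcript. A sketch of the usual derivation, ignoring stratum FPCs, is given below; the symbol \nu for the allocation-dependent constant is an assumption introduced here.

```latex
% Variance of the estimated total in terms of relative sample sizes n_h / n
V(\hat{t}_{str}) \approx \sum_{h=1}^{H} \frac{N_h^2 S_h^2}{n_h}
  = \frac{1}{n} \sum_{h=1}^{H} \frac{n}{n_h}\, N_h^2 S_h^2
  = \frac{\nu}{n}

% Sample size for margin of error e on the population total
e = z_{\alpha/2} \sqrt{\frac{\nu}{n}}
\quad\Longrightarrow\quad
n = \frac{z_{\alpha/2}^2\, \nu}{e^2}

% Sample size for margin of error e on the population mean
V(\bar{y}_{str}) \approx \frac{\nu}{n N^2}
\quad\Longrightarrow\quad
n = \frac{z_{\alpha/2}^2\, \nu}{N^2 e^2}
```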

70
Welfare example 8
  • Relative sample size for equal allocation
  • Value of ?
  • For 95% CI with e = 0.1

71
STS Summary
  • Choose stratification scheme
  • Scheme depends on objectives, operational
    constraints
  • Must know stratum identifier for each SU in the
    frame
  • Set a design for each stratum
  • Design for each stratum: SRS, SYS, ...
  • Determine n and nh
  • Select sample independently within each stratum
  • Pool stratum estimates to get estimates of
    population parameters