Title: Ratio estimation under SRS
1 Ratio estimation under SRS
- Assume
- Absence of nonsampling error
- SRS of size n from a population of size N
- Ratio estimation is an alternative to the sample mean $\bar{y}$ under SRS that uses auxiliary information (X)
- Sample data: observe $y_i$ and $x_i$
- Population information
- Have $y_i$ and $x_i$ on all individual units, or
- Have summary statistics from the population distribution of X, such as the population mean or total of X
- Ratio estimation is also used to estimate a population parameter called a ratio (B)
2 Uses
- Estimate a ratio
- Tree volume or bushels per acre
- Per capita income
- Liability to asset ratio
- More precise estimator of population parameters
- If X and Y are correlated, can improve upon $\bar{y}$
- Estimating totals when population size N is unknown
- Avoids need to know N in formula for $\hat{t}_y$
- Domain estimation
- Obtaining estimates of subsamples
- Incorporate known information into estimates
- Poststratification
- Adjust for nonresponse
3 Estimating a ratio, B
- Population parameter for the ratio: $B = t_y / t_x = \bar{y}_U / \bar{x}_U$
- Examples
- Number of bushels harvested (y) per acre (x)
- Number of children (y) per single-parent household (x)
- Total usable weight (y) relative to total shipment weight (x) for chickens
4 Estimating a ratio
- SRS of n observation units
- Collect data on y and x for each OU
- Natural estimator for B?
5 Estimating a ratio - 2
- Estimator for B: $\hat{B} = \bar{y} / \bar{x} = \hat{t}_y / \hat{t}_x$
- $\hat{B}$ is a biased estimator for B
- $E[\hat{B}] \ne B$
- $\hat{B}$ is a ratio of random variables
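As a minimal sketch of the estimator on this slide, the snippet below computes $\hat{B} = \bar{y} / \bar{x}$ from paired sample values. The function name `ratio_estimate` and the four-farm data set are hypothetical and are used only for illustration.

```python
from statistics import mean

def ratio_estimate(y, x):
    """Return B_hat = ybar / xbar for paired sample values (y_i, x_i)."""
    if len(y) != len(x) or len(y) == 0:
        raise ValueError("y and x must be non-empty and of equal length")
    return mean(y) / mean(x)

# Hypothetical SRS of 4 farms: bushels harvested (y) and acres (x)
y = [120.0, 90.0, 150.0, 60.0]
x = [10.0, 8.0, 12.0, 5.0]
b_hat = ratio_estimate(y, x)  # estimated bushels per acre
print(b_hat)
```

Note that both the numerator and the denominator change from sample to sample, which is why $\hat{B}$ is a ratio of random variables.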
6 Bias of $\hat{B}$
7 Bias of $\hat{B}$ 2
- Bias is small if
- Sample size n is large
- Sampling fraction n/N is large
- $\bar{x}_U$ is large
- $S_x$ is small (population standard deviation for x)
- High positive correlation between X and Y
- (see Lohr p. 67)
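The rules of thumb above can be checked numerically. The sketch below assumes the standard first-order approximation, Bias$(\hat{B}) \approx (1 - n/N)\,(B S_x^2 - R S_x S_y) / (n \bar{x}_U^2)$, where R is the population correlation between X and Y; the formula on the original slide was not preserved, so treating this as the intended expression is an assumption. The function name and all population values are hypothetical.

```python
def approx_bias_ratio(n, N, xbar_U, B, S_x, S_y, R):
    """First-order approximation to the bias of B_hat under SRS (assumed form)."""
    fpc = 1.0 - n / N  # finite population correction
    return fpc * (B * S_x**2 - R * S_x * S_y) / (n * xbar_U**2)

# Hypothetical population values, for illustration only
small_n = approx_bias_ratio(n=10, N=1000, xbar_U=50.0, B=2.0, S_x=10.0, S_y=25.0, R=0.6)
large_n = approx_bias_ratio(n=100, N=1000, xbar_U=50.0, B=2.0, S_x=10.0, S_y=25.0, R=0.6)
print(small_n, large_n)  # the larger sample gives the smaller approximate bias
```

Increasing `xbar_U`, decreasing `S_x`, or moving `R` toward 1 in the call shrinks the result in the same way.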
8 Estimated variance of estimator for B
- Estimator for $V(\hat{B})$
- If $\bar{x}_U$ is unknown?
9 Variance of $\hat{B}$
- Variance is small if
- Sample size n is large
- Sampling fraction n/N is large
- Deviations about the line, $e = y - Bx$, are small
- Correlation between X and Y is close to ±1
- $\bar{x}_U$ is large
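A minimal sketch of the variance estimator, assuming the usual form $\hat{V}(\hat{B}) = (1 - n/N)\, s_e^2 / (n \bar{x}_U^2)$ with $s_e^2 = \sum e_i^2 / (n - 1)$ and $e_i = y_i - \hat{B} x_i$. When $\bar{x}_U$ is not supplied, the sketch substitutes the sample mean $\bar{x}$, which is one common choice rather than something stated on the slides. The function name and data are hypothetical.

```python
from statistics import mean

def var_ratio(y, x, N, xbar_U=None):
    """Estimated variance of B_hat under SRS (assumed textbook form)."""
    n = len(y)
    xbar = mean(x)
    b_hat = mean(y) / xbar
    # Residuals about the fitted line through the origin
    e = [yi - b_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(ei**2 for ei in e) / (n - 1)
    # Assumption: fall back on the sample mean when xbar_U is unknown
    x_ref = xbar if xbar_U is None else xbar_U
    return (1 - n / N) * s2_e / (n * x_ref**2)

# Hypothetical SRS of n = 4 from N = 40
y = [120.0, 90.0, 150.0, 60.0]
x = [10.0, 8.0, 12.0, 5.0]
v_unknown = var_ratio(y, x, N=40)             # uses xbar
v_known = var_ratio(y, x, N=40, xbar_U=10.0)  # uses the population mean of X
```

Small residuals, a large n or n/N, and a large mean of X all shrink the result, matching the list above.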
10 Ag example 1
- Frame: 1987 Agricultural Census
- Take SRS of 300 counties from 3078 counties to estimate conditions in 1992
- Collect data on y, have data on x for sample
- Existing knowledge about the population
11 Ag example 2
- $\hat{B} = 0.9866$ farm acres in 1992 relative to 1987 farm acres
12 Ag example 3
- Need to calculate variance of the $e_i$'s
13 Ag example 4
- For each county i, calculate $e_i = y_i - \hat{B} x_i$
- Coffee Co, AL example
- Sum of squares for $e_i$
14 Ag example 5
15 Estimating proportions
- If denominator variable is random, use ratio estimator to estimate the proportion p
- Example (p. 72)
- 10 plots under protected oak trees used to assess effect of feral pigs on native vegetation on Santa Cruz Island, CA
- Count live seedlings y and total number of seedlings x per plot
- Y and X correlated due to common environmental factors
- Estimate proportion of live seedlings to total number of seedlings
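16 Estimating population mean
- Estimator for $\bar{y}_U$: $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U = \bar{y}\,(\bar{x}_U / \bar{x})$
- Adjustment factor for sample mean: $\bar{x}_U / \bar{x}$
- A measure of discrepancy between sample and population information, $\bar{x}$ and $\bar{x}_U$
- Improves precision if X and Y are correlated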
17 Underlying model
- $y_i = B x_i + \varepsilon_i$
- with B > 0
- B is a slope
- B > 0 indicates X and Y are positively correlated
- Absence of intercept implies line must go through origin (0, 0)
18 Using population mean of X to adjust sample mean
- Discrepancy between sample and population information for X is viewed as evidence that the same relative discrepancy exists between $\bar{y}$ and $\bar{y}_U$
19 Bias of $\hat{\bar{y}}_r$
- Ratio estimator for the population mean is biased
- Rules of thumb for bias of $\hat{B}$ apply
20 Estimator for variance of $\hat{\bar{y}}_r$
- Estimator for variance of $\hat{\bar{y}}_r$
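The ratio estimator of the population mean and its estimated variance can be sketched as follows, assuming the forms $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U$ and $\hat{V}(\hat{\bar{y}}_r) = (1 - n/N)\,(\bar{x}_U / \bar{x})^2\, s_e^2 / n$, with $s_e^2$ the sample variance of the residuals $e_i = y_i - \hat{B} x_i$. The slide formulas were not preserved, so these expressions are an assumption based on the standard textbook treatment; the function name and data are hypothetical.

```python
from statistics import mean

def ratio_mean(y, x, xbar_U, N):
    """Return (ratio estimate of the population mean, estimated variance)."""
    n = len(y)
    xbar = mean(x)
    b_hat = mean(y) / xbar
    ybar_r = b_hat * xbar_U  # same as ybar * (xbar_U / xbar)
    s2_e = sum((yi - b_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    var = (1 - n / N) * (xbar_U / xbar) ** 2 * s2_e / n
    return ybar_r, var

# Hypothetical SRS of n = 4 from N = 40, with known population mean of X = 10
y = [120.0, 90.0, 150.0, 60.0]
x = [10.0, 8.0, 12.0, 5.0]
ybar_r, var_r = ratio_mean(y, x, xbar_U=10.0, N=40)
```

Here the sample mean of x (8.75) is below the population mean (10), so the sample mean of y (105) is adjusted upward by the factor 10 / 8.75.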
21 Ag example 6
22 Ag example - 8
23 Ag example 9
- Expect a linear relationship between X and Y (Figure 3.1)
- Note that sample mean is not equal to population mean for X
24 MSE under ratio estimation
- Recall
- MSE = Variance + Bias²
- SRS estimators are unbiased, so
- MSE = Variance
- Ratio estimators are biased, so
- MSE > Variance
- Use MSE to compare design/estimation strategies
- Ex.: compare sample mean under SRS with ratio estimator for population mean under SRS
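25 Sample mean vs. ratio estimator of mean
- $MSE(\hat{\bar{y}}_r)$ is smaller than $V(\bar{y})$ if and only if $R > \frac{B S_x}{2 S_y} = \frac{CV(x)}{2\,CV(y)}$
- For example, if $CV(x) \approx CV(y)$ and $R > 1/2$, ratio estimation will be better than SRS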
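A small sketch of this comparison, assuming the usual first-order condition that the ratio estimator has smaller approximate MSE than the sample mean when $R > CV(x) / (2\,CV(y))$, where R is the correlation between X and Y. The symbols on the original slide were lost in extraction, so this condition is an assumption drawn from the standard textbook result; the function name and inputs are hypothetical.

```python
def ratio_beats_sample_mean(R, cv_x, cv_y):
    """True if the (approximate) MSE of the ratio estimator is below V(ybar)."""
    return R > cv_x / (2.0 * cv_y)

# With roughly equal CVs, the threshold on the correlation is 1/2
print(ratio_beats_sample_mean(R=0.6, cv_x=0.3, cv_y=0.3))  # correlation above 0.5
print(ratio_beats_sample_mean(R=0.4, cv_x=0.3, cv_y=0.3))  # correlation below 0.5
```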
26 Estimating the MSE
- Estimate MSE with sample estimates of bias and variance of estimator
- This tends to underestimate MSE
- The bias and variance expressions are approximations
- Estimated MSE is less biased if
- Bias is small (see earlier slide)
- Large sample size or sampling fraction
- High correlation for X and Y
- $\bar{x}$ is a precise estimate (small CV for $\bar{x}$)
- We have a reasonably large sample size (n > 30)
27 Ag example 10
28 Estimating population total t
- Estimator for t: $\hat{t}_{yr} = \hat{B}\,t_x$
- Is $\hat{t}_{yr}$ biased?
- Estimator for $V(\hat{t}_{yr})$
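A sketch for the population total, assuming $\hat{t}_{yr} = \hat{B}\,t_x$ and $\hat{V}(\hat{t}_{yr}) = (1 - n/N)\,(t_x / \bar{x})^2\, s_e^2 / n$. In this form N enters only through the sampling fraction, so the sketch takes the sampling fraction as an optional argument that defaults to 0 (that is, the finite population correction is ignored when N is unknown). Both the formula and that default are assumptions for illustration; the function name and data are hypothetical.

```python
from statistics import mean

def ratio_total(y, x, t_x, sampling_fraction=0.0):
    """Return (ratio estimate of the population total, estimated variance)."""
    n = len(y)
    xbar = mean(x)
    b_hat = mean(y) / xbar
    t_hat = b_hat * t_x  # no N needed for the point estimate
    s2_e = sum((yi - b_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    var = (1 - sampling_fraction) * (t_x / xbar) ** 2 * s2_e / n
    return t_hat, var

# Hypothetical sample with known population total of X = 400
y = [120.0, 90.0, 150.0, 60.0]
x = [10.0, 8.0, 12.0, 5.0]
t_hat, var_fpc = ratio_total(y, x, t_x=400.0, sampling_fraction=0.1)
_, var_no_fpc = ratio_total(y, x, t_x=400.0)  # N unknown: fpc omitted
```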
29 Ag example 11
30 Summary of ratio estimation
31 Summary of ratio estimation 2
32 Regression estimation
- What if the relationship between y and x is linear, but does NOT pass through the origin?
- Better model in this case is $y = B_0 + B_1 x + \varepsilon$
33 Regression estimation 2
- New estimator is a regression estimator
- To estimate $\bar{y}_U$, $\hat{\bar{y}}_{reg}$ is the predicted value from the regression of y on x at $x = \bar{x}_U$
- Adjustment factor for sample mean is linear, rather than multiplicative
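34 Estimating population mean
- Regression estimator: $\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U = \bar{y} + \hat{B}_1 (\bar{x}_U - \bar{x})$
- Estimating regression parameters: $\hat{B}_1 = r\,s_y / s_x$, $\hat{B}_0 = \bar{y} - \hat{B}_1 \bar{x}$
35 Estimating pop mean 2
- Sample variances, correlation, covariance
36 Bias in regression estimator
37 Estimating variance
- Note: this is a different residual than in ratio estimation (predicted values differ)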
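The regression estimator and its variance can be sketched as below, assuming $\hat{\bar{y}}_{reg} = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$ with the least-squares slope, and $\hat{V}(\hat{\bar{y}}_{reg}) = (1 - n/N)\, s_e^2 / n$ with residuals $e_i = y_i - (\hat{B}_0 + \hat{B}_1 x_i)$. The divisor used for $s_e^2$ is exposed as a parameter because both n - 1 and n - 2 appear in practice; which one the slides used is not recoverable, so the default here is an assumption. The function name and data are hypothetical.

```python
from statistics import mean

def regression_mean(y, x, xbar_U, N, resid_df_loss=1):
    """Return (regression estimate of the population mean, estimated variance)."""
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx          # least-squares slope
    b0 = ybar - b1 * xbar   # least-squares intercept
    ybar_reg = ybar + b1 * (xbar_U - xbar)  # additive adjustment of ybar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Assumption: divisor n - 1 by default; pass resid_df_loss=2 for n - 2
    s2_e = sum(ei**2 for ei in e) / (n - resid_df_loss)
    return ybar_reg, (1 - n / N) * s2_e / n

# Hypothetical SRS of n = 4 from N = 20, with known population mean of X = 3
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 10.0]
est, var_n1 = regression_mean(y, x, xbar_U=3.0, N=20)
_, var_n2 = regression_mean(y, x, xbar_U=3.0, N=20, resid_df_loss=2)
```

Unlike the ratio residual $y_i - \hat{B} x_i$, these residuals come from a fitted line with an intercept, so the two variance estimates generally differ.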
38 Estimating the MSE
- Plugging sample estimates into Lohr, equation 3.13
39 Estimating population total t
- Is regression estimator for t unbiased?
40 Tree example
- Goal: obtain a precise estimate of the number of dead trees in an area
- Sample
- Select n = 25 out of N = 100 plots
- Make field determination of number of dead trees per plot, $y_i$
- Population
- For all N = 100 plots, have photo determination of number of dead trees per plot, $x_i$
- Calculate $\bar{x}_U = 11.3$ dead trees per plot
41 Tree example 2
- Lohr, p. 77-78
- Data
- Plot of y vs. x
- Output from PROC REG
- Components for calculating estimators and estimating the variance of the estimators
- We will use PROC SURVEYREG, which will give you the correct output for regression estimators
42 Tree example 3
- Estimated mean number of dead trees/plot
- Estimated total number of dead trees
43 Tree example 4
- Due to small sample size, Lohr uses t-distribution with n − 2 degrees of freedom
- Half-width for 95% CI
- Approx. 95% CI for $t_y$ is (1115, 1283) dead trees
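To make the interval arithmetic concrete, the sketch below works backward from the reported interval (1115, 1283): it recovers the midpoint and half-width, then divides the half-width by a t critical value to get the implied standard error. The critical value 2.069 for 23 degrees of freedom (n − 2 with n = 25) is taken from a standard t table and hard-coded here; the function name is hypothetical, and the implied standard error is a derived illustration, not a number quoted from the slides.

```python
def t_interval(estimate, se, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * se."""
    half = t_crit * se
    return estimate - half, estimate + half

# Interval reported for the total number of dead trees
lower, upper = 1115.0, 1283.0
midpoint = (lower + upper) / 2    # point estimate implied by the interval
half_width = (upper - lower) / 2  # half-width of the interval

T_CRIT_23 = 2.069  # approx. 97.5th percentile of t with 23 df (table value)
implied_se = half_width / T_CRIT_23

# Rebuilding the interval from its pieces returns the reported endpoints
rebuilt = t_interval(midpoint, implied_se, T_CRIT_23)
```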
44 Related estimators
- Ratio estimator
- $B_0 = 0$ ⇒ ratio model
- Ratio estimator corresponds to a regression estimator with no intercept
- Difference estimation
- $B_1 = 1$ ⇒ slope is assumed to be 1
45 Domain estimation under SRS
- Usually interested in estimates and inferences for subpopulations, called domains
- If we have not used stratification to set the sample size for each domain, then we should use domain estimation
- We will assume SRS for this discussion
- If we use stratified sampling with strata = domains, then use stratum estimators (Ch 4)
- To use stratification, need to know domain assignment for each unit in the sampling frame prior to sampling
46 Stratification vs. domain estimation
- In stratified random sampling
- Define sample size in each stratum before collecting data
- Sample size in stratum h is fixed, or known
- In other words, the sample size $n_h$ is the same for each sample selected under the specified design
- In domain estimation
- $n_d$ = sample size in domain d is random
- Don't know $n_d$ until after the data have been collected
- The value of $n_d$ changes from sample to sample
47 Population partitioned into domains
- Recall U = index set for population = {1, 2, ..., N}
- Domain index set for domain d = 1, 2, ..., D
- $U_d$ = {1, 2, ..., $N_d$}, where $N_d$ = number of OUs in domain d in the population
- In sample of size n
- $n_d$ = number of units from domain d that are in the sample
- $S_d$ = index set for sample units belonging to domain d
48 Boat owner example
- Population
- N = 400,000 boat owners (currently licensed)
- Sample
- n = 1,500 owners selected using SRS
- Divide universe (population) into 2 domains
- d = 1: own open motor boat > 16 ft. (large boat)
- d = 2: do not own this type of boat
- Of the n = 1,500 sample owners
- $n_1$ = 472 owners of open motor boat > 16 ft.
- $n_2$ = 1,028 owners who do not own this kind of boat
49 New population parameters
50 Boat owner example - 2
- Estimate population domain mean
- Estimate the average number of children for boat owners from domain 1
- Estimate proportion of boat owners from domain 1 who have children
- Estimate population domain total
- Estimate the total number of children for large boat owners (domain 1)
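51 New population parameter 2
- Ratio form of population mean
- $\bar{y}_{U_d} = \sum_{i \in U} u_i / \sum_{i \in U} x_i$
- Numerator variable: $u_i = y_i$ if unit i is in domain d, 0 otherwise
- Denominator variable: $x_i = 1$ if unit i is in domain d, 0 otherwise
52 Boat owner example - 3
- Estimate mean number of children for owners from domain 1
- Applies to whole population
- Zero values for OUs that are not in domain 1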
53 Boat example 4
54 Estimator for population domain mean
55 Boat example 5
56 Boat example 6
- Domain 1 and domain 2 data combined
- 1104 zeros = 76 zeros from domain 1 + 1028 zeros from domain 2
57 Boat example 7
- Two ways of estimating mean
- Whole data set
- Domain 1 data only
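The two routes to the domain mean can be sketched as follows. The first uses the whole data set with a numerator variable that is zero outside the domain and a 0/1 denominator indicator, so the domain mean is a ratio of two sample sums; the second simply averages the domain's own observations. The variable names `u` and `x`, the function names, and the six-unit data set are hypothetical.

```python
def domain_mean_as_ratio(y, in_domain):
    """Whole data set: sum(u_i) / sum(x_i), with x_i = 1 in the domain, else 0."""
    x = [1 if d else 0 for d in in_domain]
    u = [yi * xi for yi, xi in zip(y, x)]  # zero for units outside the domain
    return sum(u) / sum(x)

def domain_mean_direct(y, in_domain):
    """Domain data only: plain average of y over sampled units in the domain."""
    vals = [yi for yi, d in zip(y, in_domain) if d]
    return sum(vals) / len(vals)

# Hypothetical sample of 6 owners: number of children and domain membership
y = [2, 0, 1, 3, 0, 4]
in_domain = [True, False, True, True, False, False]
ratio_form = domain_mean_as_ratio(y, in_domain)
direct_form = domain_mean_direct(y, in_domain)
```

Both give the same point estimate; the ratio form is what makes the ratio-estimation variance formulas applicable, because the domain sample size in the denominator is random.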
58 Estimator for variance of $\bar{y}_d$
59 Boat example 8
60 Boat example 9
61 Approximation for estimator of variance of $\bar{y}_d$
- Domain 1 data only
62 Estimated variance of
- Estimator for
- Domain variance estimator is directly related
63 Relationship to estimating a ratio with
- Population mean of X
- Residual
64 Relationship to estimating a ratio with - 2
65 Estimator for variance of
66 Estimating a population domain total
- If we know the domain sizes, $N_d$
67 Estimating a population domain total - 2
- If we do NOT know the domain sizes
- Standard SRS estimator using u as the variable
68 Boat example 10
- Do not know the domain size, $N_1$
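A sketch of the two cases for a domain total, assuming the usual estimators: $N_d\,\bar{y}_d$ when the domain size is known, and $N\,\bar{u}$ (the standard SRS total for the variable u, which equals y inside the domain and 0 outside) when it is not. The function names and the six-unit data are hypothetical. The last lines use the boat-owner counts from the slides (N = 400,000, n = 1,500, $n_1$ = 472) to estimate the unknown domain size as $N\,n_1 / n$.

```python
def domain_total_known_size(y_domain, N_d):
    """t_hat = N_d * ybar_d, using only sampled units that fall in domain d."""
    return N_d * sum(y_domain) / len(y_domain)

def domain_total_unknown_size(u, N):
    """t_hat = N * ubar, where u_i = y_i in the domain and 0 elsewhere."""
    return N * sum(u) / len(u)

# Hypothetical sample of n = 6 from N = 60; three sampled units are in the domain
u = [2, 0, 1, 3, 0, 0]
t_unknown = domain_total_unknown_size(u, N=60)
t_known = domain_total_known_size([2, 1, 3], N_d=30)

# Boat-owner counts from the slides: estimated size of domain 1
N, n, n1 = 400_000, 1_500, 472
N1_hat = N * n1 / n
```

The two totals agree in this toy case only because the sample share of the domain (3 of 6) happens to equal its assumed population share (30 of 60).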
69 Comparing 2 domain means
- Suppose we want to test the hypothesis that two domain means are equal
- $H_0: \bar{y}_{U_1} = \bar{y}_{U_2}$
- Construct a z-test with Type I error rate α (for falsely rejecting the null hypothesis)
- Test statistic
- Critical value $z_{\alpha/2}$
- Reject $H_0$ if $|z| > z_{\alpha/2}$
70 Boat example - 10
- Large boat owners (d = 1)
- Other boat owners (d = 2)
71 Boat example - 11
- Test whether domain means are equal at α = 0.05
- Calculate z-statistic
- Critical value $z_{\alpha/2} = z_{0.025} = 1.96$
- Apply rejection rule
- $|z| = |-1.04| = 1.04 < 1.96 = z_{0.025}$
- Fail to reject $H_0$
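The test procedure can be sketched as below. The statistic is assumed to be the difference in estimated domain means divided by the square root of the sum of their estimated variances, which treats the two domain estimates as approximately uncorrelated; that form is an assumption, since the slide's formula was not preserved. The function name and the input means and standard errors are hypothetical and are not the boat-example values.

```python
from math import sqrt
from statistics import NormalDist

def domain_z_test(ybar1, se1, ybar2, se2, alpha=0.05):
    """Return (z, critical value, reject?) for H0: the two domain means are equal."""
    z = (ybar1 - ybar2) / sqrt(se1**2 + se2**2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 normal quantile
    return z, z_crit, abs(z) > z_crit

# Hypothetical domain means and standard errors
z, z_crit, reject = domain_z_test(1.20, 0.10, 1.35, 0.10)
print(round(z, 2), round(z_crit, 2), reject)
```

With α = 0.05 the critical value is about 1.96, so a statistic such as the −1.04 reported above also falls short of rejection.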
72 Overview
- Population parameters
- Mean
- Total
- Proportion (w/ fixed denom)
- Ratio
- Includes proportion w/ random denominator
- Domain mean
- Domain total
73 Overview 2
- Estimation strategies
- No auxiliary information
- Auxiliary information X, no intercept
- Y and X positively correlated
- Linear relationship passes through origin
- Auxiliary information X, intercept
- Y and X positively correlated
- Linear relationship does not pass through origin
74 Overview 3
- Make a table of population parameters (rows) by estimation strategy (columns)
- In each cell, write down
- Estimator for population parameter
- Estimator for variance of estimated parameter
- Residual $e_i$
- Notes
- Some cells will be blank
- Look for relationship between mean and total, and mean and proportion
- Look at how the variance formulas for many of the estimators are essentially the same form