1
Investigation of Treatment of Influential Values
  • Mary H. Mulry
  • Roxanne M. Feldpausch

2
Outline
  • Current practices
  • Methods investigated
  • Results
  • Next steps

3
Influential Observation
  • An observation is considered influential if its
    weighted contribution has an excessive effect on
    the estimate of the total (Chambers et al. 2000)
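This definition can be illustrated by computing each unit's weighted contribution as a share of the estimated total. The sketch below is a minimal illustration, not the authors' detection rule. The function names and the 10% threshold are hypothetical, because the slides do not put a number on "excessive."

```python
def contribution_shares(weights, values):
    """Return each unit's share w_i * y_i / sum_j(w_j * y_j) of the weighted total."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    contributions = [w * y for w, y in zip(weights, values)]
    total = sum(contributions)
    if total == 0:
        raise ValueError("the weighted total is zero")
    return [c / total for c in contributions]


def flag_large_contributions(weights, values, max_share=0.10):
    """Return indices of units whose share of the total exceeds max_share.

    max_share is a hypothetical threshold chosen for illustration only.
    """
    shares = contribution_shares(weights, values)
    return [i for i, share in enumerate(shares) if share > max_share]


# Four units with equal weights; the last one reports unusually large sales.
print(flag_large_contributions([10, 10, 10, 10], [1, 1, 1, 7]))  # [3]
```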

4
The Data - U.S. Monthly Retail Trade Survey
  • Collects sales and inventory data
  • Monthly survey of about 12,500 retail businesses
    with paid employees
  • Sample selected every 5 years
  • Sample is stratified based on industry and sales
  • Quarterly sample of births
  • Deaths are removed

5
The Data
  • Analysis done at the published NAICS level
  • The Hidiroglou-Berthelot algorithm was run on the
    data before looking for influential values
  • Horvitz-Thompson estimator
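For readers unfamiliar with the Hidiroglou-Berthelot edit, the sketch below follows its usual published form. Month-to-month ratios are centred on their median, scaled by unit size, and flagged when they fall outside quartile-based bounds. The parameter values U = 0.5, A = 0.05 and C = 4 are illustrative assumptions, not the settings used for the Monthly Retail Trade Survey. The function name is hypothetical.

```python
import statistics


def hidiroglou_berthelot(current, previous, U=0.5, A=0.05, C=4.0):
    """Return indices of units with an extreme month-to-month change.

    current, previous: positive values for the same units in months t and t-1.
    U, A, C: tuning constants; the defaults are illustrative assumptions.
    """
    if any(v <= 0 for v in list(current) + list(previous)):
        raise ValueError("the HB edit needs positive values in both months")
    ratios = [c / p for c, p in zip(current, previous)]
    r_med = statistics.median(ratios)

    effects = []
    for r, c, p in zip(ratios, current, previous):
        # Centre the ratios on the median so decreases and increases are symmetric.
        s = 1 - r_med / r if r < r_med else r / r_med - 1
        # The size factor gives more importance to changes in larger units.
        effects.append(s * max(c, p) ** U)

    q1, e_med, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    # A keeps the bounds from collapsing when the quartiles are very close.
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return [i for i, e in enumerate(effects) if e < lower or e > upper]


previous = [100] * 10
current = [100, 102, 98, 101, 99, 103, 97, 100, 101, 500]
print(hidiroglou_berthelot(current, previous))  # [9]
```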

6
Causes of Influential Units
  • One-time or rare event
  • Erroneous measure of size
  • Change in the make-up of the unit
  • Seasonal businesses

7
Current Practices
  • Analysts review an effect listing of microlevel
    data and investigate units that may be
    influential
  • When the analyst determines a correctly reporting
    unit may be influential, the case is referred to
    a statistician

8
Current Practices
  • One-time influential value: imputation
  • Recurring influential value: weight adjustment
    based on the principles of representativeness
  • Moving the unit to a different industry when the
    nature of the business changes

9
Goals
  • To improve upon current methodology by making it
    more objective and rigorous
  • To find methodology that uses the observation but
    in a manner that assures its contribution does
    not have an excessive effect on the total

10
Assumptions
  • Influential observations occur infrequently, but
    are problematic when they appear.
  • The influential observation is true, although
    unusual. It is not the result of a reporting or
    coding error.

11
Strategy
  • Identify candidate methodologies and test with
    real data from one industry (about 700
    businesses) for a month that contains an
    influential value

12
Evaluation Criteria
  • Number of influential observations detected,
    including the number of true and false detections
    made
  • Estimate of bias
  • Impact on month-to-month change
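The following is a minimal sketch of two of these criteria. It assumes that the difference between an adjusted total and the unadjusted Horvitz-Thompson total serves as a rough indicator of the bias a method introduces. It also assumes that month-to-month change is the relative change between consecutive monthly totals. The slides do not say exactly how either quantity was computed, and the function names are hypothetical.

```python
def weighted_total(weights, values):
    """Horvitz-Thompson style total: sum of w_i * y_i."""
    return sum(w * y for w, y in zip(weights, values))


def relative_difference(adjusted_total, unadjusted_total):
    """How far an adjustment moves the total, relative to the unadjusted total."""
    return (adjusted_total - unadjusted_total) / unadjusted_total


def month_to_month_change(current_total, previous_total):
    """Relative change from the previous month's total to the current one."""
    return current_total / previous_total - 1.0


weights = [10, 10, 10, 10]
this_month = weighted_total(weights, [1, 1, 1, 7])  # 100
adjusted = weighted_total(weights, [1, 1, 1, 2])    # 50
last_month = 60.0
print(relative_difference(adjusted, this_month))    # -0.5
# Compare the change implied by the unadjusted and the adjusted totals.
print(month_to_month_change(this_month, last_month))
print(month_to_month_change(adjusted, last_month))
```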

13
Notation
  • where
  • Yi is the sales for the i-th business in a
    survey sample of size n
  • wi is the sample weight for the i-th unit
  • Xi is the previous month's sales for the i-th
    business
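The formula this notation refers to did not survive in the transcript. For reference only, the Horvitz-Thompson estimator named on slide 5 can be written with these symbols, together with the share of the total contributed by a single unit k. This is standard notation, not a reconstruction of the missing slide content.

```latex
\hat{Y} = \sum_{i=1}^{n} w_i Y_i,
\qquad
\text{share}_k = \frac{w_k Y_k}{\hat{Y}}, \quad k = 1, \dots, n
```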

14
Methods Examined
  • Weight trimming
  • Reverse calibration
  • Winsorization
  • Generalized M-estimation

15
Weight Trimming
  • Does not identify influential units
  • Adjusts the weight of the observation

16
Weight Trimming
  • Truncate the weight of the influential
    observation
  • Adjust the weights of the non-influential
    observations to account for the remainder of the
    truncated weight
  • Sum of the new weights is the same as the sum of
    the original weights
  • (Potter 1990)
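The following is a minimal sketch of the trimming step for one stratum. It assumes the removed part of the weight is spread over the other units in proportion to their weights. This is one simple way to keep the sum of weights fixed, since the slides do not say how the remainder was allocated. The trimmed weight is passed in as a parameter, and the function name is hypothetical.

```python
def trim_weight(weights, index, trimmed_weight):
    """Trim one unit's weight and spread the removed amount over the others.

    weights: sample weights of the units in one stratum.
    index: position of the influential unit.
    trimmed_weight: the new weight for that unit (a hypothetical input).
    The remainder goes to the other units in proportion to their weights,
    so the sum of the weights in the stratum is unchanged.
    """
    original = weights[index]
    if not 0 < trimmed_weight <= original:
        raise ValueError("trimmed_weight must be positive and at most the original weight")
    remainder = original - trimmed_weight
    others_total = sum(w for i, w in enumerate(weights) if i != index)
    if others_total <= 0:
        raise ValueError("no other units in the stratum can absorb the remainder")
    return [
        trimmed_weight if i == index else w + remainder * w / others_total
        for i, w in enumerate(weights)
    ]


stratum_weights = [30.0, 30.0, 30.0, 30.0]
new_weights = trim_weight(stratum_weights, 0, 10.0)
print(new_weights)       # first weight is 10.0, the others about 36.67
print(sum(new_weights))  # about 120.0, the same as before trimming
```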

17
Weight Trimming Notes
  • Calculations were done within each sample stratum.
  • The choice of correction factor could be
    investigated. We arbitrarily chose ci = wi/3.

18
Reverse Calibration
  • Does not identify influential units
  • Adjusts the value of the observation

19
Reverse Calibration
  • Use a robust estimation method to estimate the
    total
  • Modify the influential observations to achieve
    that total
  • (Chambers and Ren 2004)
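The sketch below shows only the second step of reverse calibration. Given a robust total (from any robust estimator, supplied by the caller) and the units specified as influential, it modifies their values so that the weighted total matches. Scaling all flagged values by one common factor is a simplifying assumption and not necessarily the adjustment used by Chambers and Ren (2004). The function name is hypothetical.

```python
def reverse_calibrate(weights, values, flagged, robust_total):
    """Rescale the flagged values so the weighted total equals robust_total.

    flagged: indices of the units specified as influential.
    robust_total: a total from some robust estimator, supplied by the caller.
    Every flagged value is multiplied by the same factor (an illustrative choice).
    """
    flagged = set(flagged)
    fixed_part = sum(
        w * y for i, (w, y) in enumerate(zip(weights, values)) if i not in flagged
    )
    flagged_part = sum(weights[i] * values[i] for i in flagged)
    if flagged_part <= 0:
        raise ValueError("the flagged units must have a positive weighted contribution")
    target = robust_total - fixed_part
    if target < 0:
        raise ValueError("robust_total is below the contribution of the unflagged units")
    factor = target / flagged_part
    return [y * factor if i in flagged else y for i, y in enumerate(values)]


weights = [10, 10, 10]
values = [1.0, 2.0, 30.0]
new_values = reverse_calibrate(weights, values, flagged=[2], robust_total=100.0)
print(new_values)  # the third value drops from 30.0 to about 7.0
```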

20
Winsorization
  • Identifies influential units
  • Adjusts the value of the observation

21
Winsorization
  • Type I
  • Type II
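Type I and Type II winsorization differ in how a value above the cutoff K is replaced. The following is a minimal sketch with a single fixed cutoff on the reported value. In the study the needed parameters came from a robust regression (slide 24), which is not reproduced here, and the function name is hypothetical.

```python
def winsorize(values, weights, cutoff, kind="II"):
    """Winsorize values above a cutoff K.

    Type I replaces y_i > K with K.
    Type II replaces y_i > K with K + (y_i - K) / w_i, so the weighted
    contribution becomes w_i * K + (y_i - K): the excess over K is counted
    once for the unit itself instead of being multiplied by its weight.
    """
    if kind not in ("I", "II"):
        raise ValueError("kind must be 'I' or 'II'")
    result = []
    for y, w in zip(values, weights):
        if y <= cutoff:
            result.append(y)
        elif kind == "I":
            result.append(cutoff)
        else:
            result.append(cutoff + (y - cutoff) / w)
    return result


print(winsorize([5.0, 50.0], [10.0, 10.0], cutoff=20.0, kind="I"))   # [5.0, 20.0]
print(winsorize([5.0, 50.0], [10.0, 10.0], cutoff=20.0, kind="II"))  # [5.0, 23.0]
```

Under Type II a unit with weight 1 is left unchanged, since it represents only itself.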

22
Winsorization Defining K
  • Define a separate Kh for each stratum in a manner
    that minimizes the MSE (Kokic and Bell 1994)
  • Define a separate Ki for each observation in a
    manner that minimizes the MSE (Clarke 1995)

23
Winsorization Defining K
  • Use unweighted data to define Kh for each stratum,
    where Kh = mh + 2sh
  • Use weighted data to define Kh for each stratum,
    where Kh = mh + 2sh and mh and sh are based on
    the weighted data
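The following sketches the two stratum-level cutoffs Kh = mh + 2sh. It assumes mh is a mean and sh a standard deviation (the slide does not define them) and uses the sum of the weights as the divisor in the weighted variance. Passing no weights gives the unweighted version. The function name is hypothetical.

```python
import math


def winsor_cutoff(values, weights=None, multiplier=2.0):
    """Cutoff K = m + multiplier * s for one stratum.

    With weights=None every unit counts once (the unweighted variant);
    otherwise m and s are the weighted mean and weighted standard deviation.
    Assumption: s uses sum(weights) as the divisor.
    """
    if weights is None:
        weights = [1.0] * len(values)
    total_weight = sum(weights)
    m = sum(w * y for w, y in zip(weights, values)) / total_weight
    variance = sum(w * (y - m) ** 2 for w, y in zip(weights, values)) / total_weight
    return m + multiplier * math.sqrt(variance)


sales = [1.0, 2.0, 3.0, 4.0, 5.0]
print(winsor_cutoff(sales))                           # about 5.83
print(winsor_cutoff(sales, weights=[1, 1, 1, 1, 6]))  # about 6.83
```

Giving a large weight to a high value pulls the weighted mean up, so the weighted cutoff here is higher than the unweighted one.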

24
Winsorization-Our Implementation
  • Used a robust regression in SAS to estimate
    the parameters needed in the calculations

25
M-estimation
  • M-estimators are robust estimators that come
    from a generalization of maximum likelihood
    estimation
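In symbols, an M-estimator replaces the negative log-likelihood with a general loss ρ. The setting below is a location parameter θ with residual scale σ, chosen only for illustration. For a differentiable convex ρ, the minimization is equivalent to solving the estimating equation on the right. Taking ρ = -log f for a density f recovers maximum likelihood, and Huber's ρ is a common robust choice.

```latex
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \rho\!\left(\frac{y_i - \theta}{\sigma}\right)
\quad\Longleftrightarrow\quad
\sum_{i=1}^{n} \psi\!\left(\frac{y_i - \theta}{\sigma}\right) = 0, \qquad \psi = \rho'

\rho(r) = -\log f(r) \quad \text{(maximum likelihood)}

\rho_c(r) =
\begin{cases}
  \tfrac{1}{2} r^2, & |r| \le c,\\
  c\,|r| - \tfrac{1}{2} c^2, & |r| > c,
\end{cases}
\qquad
\psi_c(r) = \max\bigl(-c,\, \min(c,\, r)\bigr)
```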

26
M-estimation
  • Identifies influential units
  • Adjusts either the weight or the value of the
    influential observation

27
M-estimation
  • Used a weighted M-estimation technique that is
    able to modify the weights or the values of the
    influential observations (Beaumont and Alavi 2004)
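The sketch below is a generic weighted Huber M-estimator for a ratio model of current sales on previous-month sales (Yi on Xi), fitted by iteratively reweighted least squares. It is not the Beaumont and Alavi (2004) estimator. The ratio model, the tuning constant c = 1.345, the MAD scale and all function names are assumptions. It shows the two forms of adjustment listed on slide 26. Units with large standardized residuals are identified, and their effect is reduced either by replacing the value or by replacing the weight. In this sketch both adjustments give the same contribution to the total.

```python
import statistics


def huber_psi(r, c):
    """Huber's psi function: the residual clipped to [-c, c]."""
    return max(-c, min(c, r))


def m_estimate_ratio(y, x, w, c=1.345, max_iter=100, tol=1e-10):
    """Weighted Huber M-estimate of beta in y_i = beta * x_i + e_i (IRLS sketch).

    The residual scale sigma is re-estimated each step with the normalized MAD.
    """
    beta = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(
        wi * xi * xi for wi, xi in zip(w, x)
    )
    sigma = 0.0
    for _ in range(max_iter):
        resid = [yi - beta * xi for yi, xi in zip(y, x)]
        centre = statistics.median(resid)
        sigma = 1.4826 * statistics.median([abs(e - centre) for e in resid])
        if sigma == 0:
            break
        # Robustness weights psi(r)/r: 1 inside [-c, c], smaller outside.
        u = [1.0 if abs(e) <= c * sigma else c * sigma / abs(e) for e in resid]
        new_beta = sum(wi * ui * xi * yi for wi, ui, xi, yi in zip(w, u, x, y)) / sum(
            wi * ui * xi * xi for wi, ui, xi in zip(w, u, x)
        )
        done = abs(new_beta - beta) <= tol * max(1.0, abs(beta))
        beta = new_beta
        if done:
            break
    return beta, sigma


def adjusted_values(y, x, beta, sigma, c=1.345):
    """Value adjustment y_i* = beta*x_i + sigma*psi(r_i); unchanged (up to rounding) when |r_i| <= c."""
    if sigma == 0:
        return list(y)
    return [beta * xi + sigma * huber_psi((yi - beta * xi) / sigma, c) for yi, xi in zip(y, x)]


def adjusted_weights(y, w, y_star):
    """Weight adjustment with the same contribution: w_i* = w_i * y_i* / y_i (y_i must be nonzero)."""
    return [wi * ys / yi for wi, ys, yi in zip(w, y_star, y)]


sales = [21.0, 39.0, 62.0, 79.0, 101.0, 600.0]  # current month, Yi
prev = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]     # previous month, Xi
wts = [5.0] * 6
beta, sigma = m_estimate_ratio(sales, prev, wts)
y_star = adjusted_values(sales, prev, beta, sigma)
w_star = adjusted_weights(sales, wts, y_star)
print(round(beta, 2), [round(v, 1) for v in y_star])
```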

28
Results

29
(No Transcript)

30
(No Transcript)

31
(No Transcript)

32
Number of Outliers Detected
  • For methods that do not detect outliers, one
    outlier was specified

33
Replacement Values (in Millions)
  • Weight trimming adjusts the other 18 weights in
    the stratum
  • Winsorization with the weighted Kh = mh + 2sh
    cutoff identified 3 other values

34
Total Sales for the Industry

35
(No Transcript)

36
(No Transcript)

37
Chosen for Further Study
  • Winsorization with a separate Ki for each observation
  • M-estimation adjusting the value of the observation
  • M-estimation adjusting the weight

38
Contact Information
  • Mary.H.Mulry_at_census.gov
  • Roxanne.Feldpausch_at_census.gov