Automatic Editing with Hard and Soft Edits - PowerPoint PPT Presentation

About This Presentation
Title:

Automatic Editing with Hard and Soft Edits

Description:

Automatic Editing with Hard and Soft Edits: Some First Experiences. Sander Scholtus, Sevinç Göksen (Statistics Netherlands)

Slides: 17
Provided by: SanderS5
Learn more at: https://unece.org

Transcript and Presenter's Notes

Title: Automatic Editing with Hard and Soft Edits


1
Automatic Editing with Hard and Soft Edits: Some First Experiences
  • Sander Scholtus
  • Sevinç Göksen
  • (Statistics Netherlands)

2
Introduction
  • Error localisation problem: try to identify variables with
    erroneous or missing values
  • Edits: constraints that should be satisfied by the data
  • Hard (fatal) edits, e.g. Turnover - Costs = Profit
  • Soft (query) edits, e.g. Profit / Turnover ≤ 0.6
  • Manual editing: hard and soft edits
  • Automatic editing: only hard edits
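
The difference between the two kinds of edits can be sketched in a few lines of Python. This is only an illustration: it assumes the hard edit reads Turnover - Costs = Profit and the soft edit reads Profit / Turnover ≤ 0.6, and the record layout, function names and tolerance are hypothetical.

```python
# Illustrative sketch: field names and the tolerance are assumptions.

def hard_edit_ok(rec, tol=1e-9):
    """Hard (fatal) edit: Turnover - Costs = Profit must hold."""
    return abs(rec["turnover"] - rec["costs"] - rec["profit"]) <= tol

def soft_edit_ok(rec):
    """Soft (query) edit: Profit / Turnover <= 0.6 is expected, not required."""
    if rec["turnover"] == 0:
        return True  # ratio undefined; treated here as "no signal"
    return rec["profit"] / rec["turnover"] <= 0.6

record = {"turnover": 100.0, "costs": 20.0, "profit": 80.0}
print(hard_edit_ok(record))  # the balance edit holds for this record
print(soft_edit_ok(record))  # ratio is 0.8, so the record is merely suspicious
```

A record that fails a hard edit is certainly wrong; a record that fails only a soft edit, like the one above, may still be correct.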

3
Error localisation (1)
  • Fellegi and Holt (1976)
  • Find the smallest (weighted) number of variables
    that can be imputed so that all edits are
    satisfied
  • Minimise the weighted number of imputed variables,
    so that all edits are satisfied
  • No room for soft edits
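
In symbols, the Fellegi-Holt criterion can be written roughly as below. The notation is an assumption made for illustration and is not taken from the slides: $p$ is the number of variables, $w_j > 0$ is a weight for variable $j$, and $\delta_j = 1$ if variable $j$ is selected for imputation (0 otherwise).

```latex
\[
  \min_{\delta \in \{0,1\}^p} \; D_{FH}(\delta) = \sum_{j=1}^{p} w_j \, \delta_j
\]
\[
  \text{subject to: all edits can be satisfied by changing only the variables with } \delta_j = 1 .
\]
```

With all $w_j = 1$ this reduces to the plain number of variables to impute.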

4
Error localisation (2)
  • Alternative approach
  • Choose a function Dsoft that measures the degree
    of suspicion associated with particular soft edit
    failures
  • Minimise a target function that combines the weighted number of
    imputed variables with Dsoft, so that all hard edits are satisfied
  • Prototype algorithm in R (based on editrules)
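
The idea can be tried out on a toy record with a brute-force search. This is a minimal Python sketch, not the R prototype mentioned above: it assumes a simple additive target (sum of variable weights plus a soft-edit penalty), a small finite grid of candidate values, one balance edit with non-negativity as hard edits, and one ratio edit as soft edit. All names and numbers are hypothetical.

```python
from itertools import combinations, product

VARS = ("turnover", "costs", "profit")
GRID = range(0, 1001, 10)  # assumed finite set of candidate values

def hard_ok(r):
    # Hard edits: Turnover - Costs = Profit, and all values non-negative.
    return r["turnover"] - r["costs"] == r["profit"] and min(r.values()) >= 0

def soft_penalty(r, s=0.5):
    # One soft edit, Profit / Turnover <= 0.6, with an assumed failure weight s.
    fails = r["turnover"] > 0 and r["profit"] / r["turnover"] > 0.6
    return s if fails else 0.0

def localise(record, weights, use_soft=True):
    """Return (cost, variables to impute, adjusted record) minimising
    the sum of variable weights plus the soft penalty, subject to hard edits."""
    best = (float("inf"), None, None)
    for size in range(len(VARS) + 1):
        for subset in combinations(VARS, size):
            d_fh = sum(weights[v] for v in subset)
            if d_fh >= best[0]:
                continue  # this subset cannot beat the best solution so far
            for values in product(GRID, repeat=size):
                cand = {**record, **dict(zip(subset, values))}
                if not hard_ok(cand):
                    continue
                cost = d_fh + (soft_penalty(cand) if use_soft else 0.0)
                if cost < best[0]:
                    best = (cost, subset, cand)
    return best

raw = {"turnover": 100, "costs": 60, "profit": 400}
w = {"turnover": 1.0, "costs": 1.0, "profit": 1.0}
cost, to_impute, fixed = localise(raw, w)
print(cost, to_impute, fixed)
```

In this toy case, changing either turnover or profit alone repairs the balance edit, so the hard edits cannot tell the two apart. Only the repair via profit also satisfies the soft edit, so the soft penalty breaks the tie.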

5
Simulation study (1)
  • Two data sets
  • Dutch SBS 2007, medium-sized wholesale businesses
  • Raw and manually edited data available
  • One half used as test data, one half as reference
    data
  • Test data set 1: 728 records, 12 variables, 16 hard edits,
    10 soft edits; synthetic errors
  • Test data set 2: 580 records, 10 variables, 17 hard edits,
    24 soft edits; real errors

6
Simulation study (2)
editing approach (choice of Dsoft)              % of records with perfect solution
                                                data set 1    data set 2
no soft edits, only hard edits                  40.2          58.4
all edits as hard edits                         36.8          n/a

7
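Choices for Dsoft: fixed weights (1)
  • Fixed failure weights: each soft edit k gets a weight sk, and Dsoft
    is the sum of sk over the soft edits that fail
  • Resulting target function to be minimised: the weighted number of
    imputed variables combined with this sum
  • Higher failure weight → harder soft edit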
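
Written out, a fixed-weight choice of Dsoft and the resulting target function could look as follows. The symbols $z_k$, $\delta_j$, $w_j$, $K$ and $p$, and the plain additive combination of the two terms, are assumptions for illustration.

```latex
\[
  D_{\text{soft}} = \sum_{k=1}^{K} s_k \, z_k ,
  \qquad
  z_k =
  \begin{cases}
    1 & \text{if the adjusted record fails soft edit } k, \\
    0 & \text{otherwise.}
  \end{cases}
\]
\[
  \min \; \sum_{j=1}^{p} w_j \, \delta_j + \sum_{k=1}^{K} s_k \, z_k
  \quad \text{subject to all hard edits being satisfied.}
\]
```

As $s_k$ grows relative to the $w_j$, failing soft edit $k$ becomes more costly than imputing extra variables, so the edit behaves more and more like a hard edit.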

8
Choices for Dsoft: fixed weights (2)
  • Possible choices for sk
  • A: all failure weights equal to 1
  • B: proportion of records that satisfy edit k in
    manually edited reference data
    (interpretation: P(edited record satisfies edit k))
  • C: P(edited record satisfies edit k | raw record
    fails edit k)
  • Alternative: categorised versions of B and C
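
Choices A, B and C can be estimated from paired raw and manually edited reference records. This is a minimal Python sketch under assumptions: the data layout, the function name and the example soft edit (Profit / Turnover ≤ 0.6) are hypothetical, and the categorised variants are not shown.

```python
def failure_weights(raw_records, edited_records, soft_edits):
    """Estimate failure weights per soft edit from paired reference data.

    soft_edits maps an edit name to a predicate returning True when a
    record satisfies the edit. raw_records[i] and edited_records[i]
    describe the same unit before and after manual editing.
    """
    weights = {}
    n = len(edited_records)
    for name, ok in soft_edits.items():
        sat_edited = [ok(e) for e in edited_records]
        # B: share of edited records that satisfy the edit.
        b = sum(sat_edited) / n
        # C: same share, restricted to units whose raw record failed the edit.
        raw_fail = [i for i, r in enumerate(raw_records) if not ok(r)]
        c = sum(sat_edited[i] for i in raw_fail) / len(raw_fail) if raw_fail else None
        weights[name] = {"A": 1.0, "B": b, "C": c}
    return weights

edits = {"ratio": lambda r: r["profit"] / r["turnover"] <= 0.6}
raw = [{"turnover": 100, "profit": p} for p in (80, 40, 90, 70)]
edited = [{"turnover": 100, "profit": p} for p in (80, 40, 50, 30)]
result = failure_weights(raw, edited, edits)
print(result)
```

In the toy data, three of the four edited records satisfy the edit (B = 0.75), and two of the three units whose raw record failed it satisfy it after editing (C = 2/3).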

9
Simulation study (3)
editing approach (choice of Dsoft)              % of records with perfect solution
                                                data set 1    data set 2
no soft edits, only hard edits                  40.2          58.4
all edits, using soft edits as hard edits       36.8          n/a
sum of fixed failure weights A                  47.3          63.4
sum of fixed failure weights B                  52.1          60.9
sum of fixed failure weights C                  43.3          60.7
sum of fixed failure weights B(cat)             50.0          64.5
sum of fixed failure weights C(cat)             43.1          64.5

10
Choices for Dsoft: quantile edits (1)
  • Drawback of fixed failure weights: no difference
    between large and small edit failures
  • Trick: quantile edits

11
Choices for Dsoft: quantile edits (2)
  • Idea: use different versions of the same edit by
    varying one of the constants
  • Choose values for this constant based on the
    fraction of reference data records that fail
    the resulting edit (e.g. 1%, 5%, 10%)
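
The construction of quantile edits can be sketched in Python for a ratio edit of the assumed form "ratio ≥ c". The function names, the simple index-based quantile rule and the synthetic reference ratios are illustrative assumptions.

```python
def quantile_constants(ref_ratios, fail_percents=(10, 5, 1)):
    """For an edit 'ratio >= c', pick c so that about the given percentage
    of reference records fails it (that is, has ratio < c)."""
    ordered = sorted(ref_ratios)
    n = len(ordered)
    # ordered[k] has exactly k smaller values when there are no ties.
    return [ordered[p * n // 100] for p in fail_percents]

def cumulative_failure_weight(ratio, constants, weights):
    """Add up the weights of all quantile edits 'ratio >= c' that fail."""
    return sum(s for c, s in zip(constants, weights) if ratio < c)

# Synthetic reference ratios 0.01, 0.02, ..., 1.00 (assumed, for illustration).
ref = [i / 100 for i in range(1, 101)]
consts = quantile_constants(ref)
print(consts)

# Scoring a record against three versions of the same edit, weight 1 each.
score = cumulative_failure_weight(0.5, [0.75, 0.60, 0.10], [1, 1, 1])
print(score)
```

A value that fails only the loosest version of the edit collects one weight, while a value that fails every version collects them all, so larger edit failures are penalised more heavily.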

12
Choices for Dsoft: quantile edits (3)
  • Example: ratio edit x1 / x3 ≥ c

% records failed in ref. data    c       quantile edit      sk    cumul. sk
10                               0.75    x1 / x3 ≥ 0.75     1     1
5                                0.60    x1 / x3 ≥ 0.60     1     2
1                                0.10    x1 / x3 ≥ 0.10     1     3

13
Simulation study (4)
editing approach (choice of Dsoft)              % of records with perfect solution
                                                data set 1    data set 2
no soft edits, only hard edits                  40.2          58.4
all edits, using soft edits as hard edits       36.8          n/a
sum of fixed failure weights A                  47.3          63.4
sum of fixed failure weights B                  52.1          60.9
sum of fixed failure weights C                  43.3          60.7
sum of fixed failure weights B(cat)             50.0          64.5
sum of fixed failure weights C(cat)             43.1          64.5
10-5-1-quantile edits, weights 0.33-0.33-0.33   54.4          63.4
10-5-1-quantile edits, weights 0.90-0.05-0.05   56.5          63.8

14
Choices for Dsoft: dynamic expressions
  • Size of edit failure $e_k$
  • Linear equality edit: $a_{k1} x_1 + \dots + a_{kp} x_p + b_k = 0$
  • Take $e_k = |a_{k1} x_1 + \dots + a_{kp} x_p + b_k|$
  • Linear inequality edit: $a_{k1} x_1 + \dots + a_{kp} x_p + b_k \geq 0$
  • Take $e_k = \max\{0, -(a_{k1} x_1 + \dots + a_{kp} x_p + b_k)\}$
  • Use reference data to standardise
  • Linear sum
  • Mahalanobis distance
15
Simulation study (5)
editing approach (choice of Dsoft)              % of records with perfect solution
                                                data set 1    data set 2
no soft edits, only hard edits                  40.2          58.4
all edits, using soft edits as hard edits       36.8          n/a
sum of fixed failure weights A                  47.3          63.4
sum of fixed failure weights B                  52.1          60.9
sum of fixed failure weights C                  43.3          60.7
sum of fixed failure weights B(cat)             50.0          64.5
sum of fixed failure weights C(cat)             43.1          64.5
10-5-1-quantile edits, weights 0.33-0.33-0.33   54.4          63.4
10-5-1-quantile edits, weights 0.90-0.05-0.05   56.5          63.8
sum of standardised soft edit failures          49.2          ?
Mahalanobis distance of soft edit failures      46.8          ?

16
Conclusion
  • Using soft edits → improved error localisation
  • Choice of Dsoft: results not unequivocal; quantile edits
    seem to work well; room for improvement
  • Future work: extended simulation study with mixed data/edits