Automatic Editing with Hard and Soft Edits - PowerPoint PPT Presentation

About This Presentation

Title:

Automatic Editing with Hard and Soft Edits

Description:

Automatic Editing with Hard and Soft Edits: Some First Experiences. Sander Scholtus, Sevinç Göksen (Statistics Netherlands)

Slides: 17

Provided by: SanderS5

Learn more at: https://unece.org

Transcript and Presenter's Notes

Title: Automatic Editing with Hard and Soft Edits

1
Automatic Editing with Hard and Soft Edits: Some First Experiences

Sander Scholtus
Sevinç Göksen
(Statistics Netherlands)

2
Introduction

Error localisation problem:
Try to identify variables with erroneous/missing values
Edits:
Constraints that should be satisfied by the data
Hard (fatal), e.g. Turnover − Costs = Profit
Soft (query), e.g. Profit / Turnover ≤ 0.6
Manual editing: hard and soft edits
Automatic editing: only hard edits
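The difference between the two kinds of edits can be sketched in a few lines of Python. This is only an illustration: the record layout, the example values and the helper name `check_record` are assumptions, and the soft edit is taken to be an upper bound on Profit / Turnover.

```python
# Hypothetical record with the three variables used in the slide's examples.
record = {"turnover": 100.0, "costs": 60.0, "profit": 70.0}

# Hard (fatal) edits must hold for every valid record.
hard_edits = {
    "balance": lambda r: r["turnover"] - r["costs"] == r["profit"],
}

# Soft (query) edits flag suspicious, but not necessarily wrong, values.
# Written without a division so that turnover = 0 does not raise an error.
soft_edits = {
    "profit_ratio": lambda r: r["profit"] <= 0.6 * r["turnover"],
}

def check_record(r):
    """Return the names of the failed hard edits and the failed soft edits."""
    failed_hard = [name for name, edit in hard_edits.items() if not edit(r)]
    failed_soft = [name for name, edit in soft_edits.items() if not edit(r)]
    return failed_hard, failed_soft

failed_hard, failed_soft = check_record(record)
print("failed hard edits:", failed_hard)
print("failed soft edits:", failed_soft)
```

In this made-up record both edits fail; a record with profit 40 would pass both.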

3
Error localisation (1)

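Fellegi and Holt (1976):
Find the smallest (weighted) number of variables that can be imputed so that all edits are satisfied
Minimise $\sum_{j=1}^{p} w_j \delta_j$, where $w_j$ is the weight of variable $j$ and $\delta_j = 1$ if variable $j$ is imputed and $0$ otherwise,
so that all edits are satisfied
No room for soft edits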
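The Fellegi-Holt criterion can be illustrated with a brute-force Python sketch. It is not how production tools solve the problem: feasibility is checked here by trying every value on a small grid of candidate values, which is an assumption made purely to keep the example self-contained. The function name, the record, the weights and the grid are all hypothetical.

```python
from itertools import combinations, product

def fellegi_holt_bruteforce(record, edits, weights, candidates):
    """Return (minimum total weight, list of variable sets achieving it).

    A variable set is feasible if its variables can be given values from
    `candidates` such that every edit is satisfied.
    """
    names = list(record)
    best_weight, best_sets = None, []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            weight = sum(weights[v] for v in subset)
            if best_weight is not None and weight > best_weight:
                continue  # cannot improve on the best solution found so far
            feasible = False
            for values in product(*(candidates[v] for v in subset)):
                trial = dict(record)
                trial.update(zip(subset, values))
                if all(edit(trial) for edit in edits):
                    feasible = True
                    break
            if not feasible:
                continue
            if best_weight is None or weight < best_weight:
                best_weight, best_sets = weight, [subset]
            else:
                best_sets.append(subset)
    return best_weight, best_sets

# Hypothetical record that violates Turnover - Costs = Profit.
record = {"turnover": 100, "costs": 60, "profit": 400}
edits = [
    lambda r: r["turnover"] - r["costs"] == r["profit"],
    lambda r: all(value >= 0 for value in r.values()),
]
grid = {name: range(0, 501, 10) for name in record}  # assumed candidate values

equal = fellegi_holt_bruteforce(record, edits, {"turnover": 1, "costs": 1, "profit": 1}, grid)
weighted = fellegi_holt_bruteforce(record, edits, {"turnover": 2, "costs": 1, "profit": 1}, grid)
print(equal)
print(weighted)
```

With equal weights, changing either turnover or profit is an optimal solution, so the criterion alone cannot choose between them; giving turnover a higher weight makes profit the unique choice.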

4
Error localisation (2)

Alternative approach:
Choose a function Dsoft that measures the degree of suspicion associated with particular soft edit failures
Minimise a target function involving Dsoft,
so that all hard edits are satisfied
Prototype algorithm in R (based on editrules)
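A minimal Python sketch of this idea follows. It is not the R prototype mentioned on the slide. Two assumptions are made for illustration: the target function is taken to be the plain sum of the weights of the changed variables and Dsoft, and feasible values are searched on a small grid. The names `localise` and `d_soft` and all numbers are hypothetical.

```python
from itertools import combinations, product

def localise(record, hard_edits, d_soft, weights, candidates):
    """Return (objective, changed variables, adjusted record) with the lowest
    value of: sum of weights of changed variables + d_soft(adjusted record),
    subject to all hard edits being satisfied. Assumes d_soft >= 0."""
    names = list(record)
    best = None
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            base = sum(weights[v] for v in subset)
            if best is not None and base >= best[0]:
                continue  # d_soft is non-negative, so no strict improvement is possible
            for values in product(*(candidates[v] for v in subset)):
                trial = dict(record)
                trial.update(zip(subset, values))
                if not all(edit(trial) for edit in hard_edits):
                    continue
                objective = base + d_soft(trial)
                if best is None or objective < best[0]:
                    best = (objective, subset, trial)
    return best

record = {"turnover": 100, "costs": 60, "profit": 400}
hard_edits = [lambda r: r["turnover"] - r["costs"] == r["profit"]]

def d_soft(r):
    # Assumed penalty of 0.5 when the soft edit Profit / Turnover <= 0.6 fails.
    return 0.0 if r["profit"] <= 0.6 * r["turnover"] else 0.5

weights = {"turnover": 1, "costs": 1, "profit": 1}
grid = {name: range(0, 501, 10) for name in record}  # assumed candidate values

best = localise(record, hard_edits, d_soft, weights, grid)
print(best)
```

In this example, changing turnover to 460 would satisfy the hard edit but leave the soft edit failed, so the penalty makes changing profit to 40 the preferred solution.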

5
Simulation study (1)

Two data sets
Dutch SBS 2007, medium-sized wholesale businesses
Raw and manually edited data available
One half used as test data, one half as reference
data
Test data set 1
728 records, 12 variables, 16 hard edits, 10 soft
edits
Synthetic errors
Test data set 2
580 records, 10 variables, 17 hard edits, 24 soft
edits
Real errors

6
Simulation study (2)
Editing approach (choice of Dsoft)                % records with perfect solution
                                                  data set 1    data set 2
no soft edits, only hard edits                    40.2          58.4
all edits as hard edits                           36.8          n/a

7
Choices for Dsoft: fixed weights (1)

Fixed failure weights: each soft edit k has a weight sk, and Dsoft is the sum of the weights of the failed soft edits
Resulting target function to be minimised
Higher failure weight → harder soft edit

8
Choices for Dsoft: fixed weights (2)

Possible choices for sk:
A. All failure weights equal to 1
B. Proportion of records that satisfy edit k in manually edited reference data
Interpretation: P(edited record satisfies edit k)
C. P(edited record satisfies edit k | raw record fails edit k)
Alternative: categorised versions of B and C
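The three choices of failure weight can be estimated from paired raw and manually edited records, as in the following Python sketch. The function name `failure_weights`, the pair layout and the four reference records are made up for illustration, and the categorised versions are not covered.

```python
def failure_weights(pairs, edit):
    """Estimate failure weights A, B and C for one soft edit.

    `pairs` is a list of (raw_record, edited_record) tuples from reference data.
    """
    edited_ok = [edit(edited) for _, edited in pairs]
    raw_fail = [not edit(raw) for raw, _ in pairs]

    weight_a = 1.0                              # A: all weights equal to 1
    weight_b = sum(edited_ok) / len(pairs)      # B: P(edited record satisfies edit)
    n_raw_fail = sum(raw_fail)
    # C: P(edited record satisfies edit | raw record fails edit)
    weight_c = (
        sum(ok for ok, fail in zip(edited_ok, raw_fail) if fail) / n_raw_fail
        if n_raw_fail
        else None  # undefined when no raw record fails the edit
    )
    return {"A": weight_a, "B": weight_b, "C": weight_c}

def soft_edit(r):
    # Assumed form of the soft edit: Profit / Turnover <= 0.6.
    return r["profit"] <= 0.6 * r["turnover"]

# Made-up reference data: (raw, manually edited) versions of four records.
pairs = [
    ({"profit": 80, "turnover": 100}, {"profit": 40, "turnover": 100}),
    ({"profit": 70, "turnover": 100}, {"profit": 70, "turnover": 100}),
    ({"profit": 30, "turnover": 100}, {"profit": 30, "turnover": 100}),
    ({"profit": 50, "turnover": 100}, {"profit": 50, "turnover": 100}),
]

weights = failure_weights(pairs, soft_edit)
print(weights)
```

Here three of the four edited records satisfy the edit (B = 0.75), and of the two records that failed it in the raw data, one was corrected by the editors (C = 0.5).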

9
Simulation study (3)
Editing approach (choice of Dsoft)                % records with perfect solution
                                                  data set 1    data set 2
no soft edits, only hard edits                    40.2          58.4
all edits, using soft edits as hard edits         36.8          n/a
sum of fixed failure weights A                    47.3          63.4
sum of fixed failure weights B                    52.1          60.9
sum of fixed failure weights C                    43.3          60.7
sum of fixed failure weights B(cat)               50.0          64.5
sum of fixed failure weights C(cat)               43.1          64.5

10
Choices for Dsoft: quantile edits (1)

Drawback of fixed failure weights: no difference between large and small edit failures
Trick: quantile edits

11
Choices for Dsoft: quantile edits (2)

Idea: use different versions of the same edit by varying one of the constants
Choose values for this constant based on the fraction of reference data records that fail the resulting edit (e.g. 1%, 5%, 10%)

12
Choices for Dsoft: quantile edits (3)

Example: ratio edit x1 / x3 ≥ c

% records failed in ref. data    c       quantile edit      sk    cumul. sk
10                               0.75    x1 / x3 ≥ 0.75     1     1
5                                0.60    x1 / x3 ≥ 0.60     1     2
1                                0.10    x1 / x3 ≥ 0.10     1     3
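The construction of quantile edits can be sketched in Python as follows, assuming the ratio edit is a lower bound (the record fails when x1 / x3 is below c), which is the direction under which a smaller c leads to fewer failures. The function names and the synthetic reference ratios are hypothetical, and the simple index-based quantile is only approximate when the reference data contain ties.

```python
def quantile_thresholds(reference_ratios, percentages):
    """For each percentage, pick the constant c such that roughly that
    percentage of reference records fails the edit ratio >= c."""
    ordered = sorted(reference_ratios)
    n = len(ordered)
    return {pct: ordered[pct * n // 100] for pct in percentages}

def d_soft_quantile(ratio, thresholds, weights):
    """Sum the failure weights of all versions of the edit that `ratio` fails."""
    return sum(weights[pct] for pct, c in thresholds.items() if ratio < c)

# Synthetic reference data: 100 ratios 0.01, 0.02, ..., 1.00 (an assumption).
reference_ratios = [i / 100 for i in range(1, 101)]
thresholds = quantile_thresholds(reference_ratios, [10, 5, 1])
weights = {10: 1, 5: 1, 1: 1}  # equal weight per version, as in the example table

share_failing_10 = sum(r < thresholds[10] for r in reference_ratios) / len(reference_ratios)

print(thresholds)
print(d_soft_quantile(0.50, thresholds, weights))  # fails no version
print(d_soft_quantile(0.05, thresholds, weights))  # fails the 10% and 5% versions
print(d_soft_quantile(0.01, thresholds, weights))  # fails all three versions
```

A record that fails the strictest version also fails the weaker ones, so its weights accumulate; this is how larger edit failures end up with a larger Dsoft.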
13
Simulation study (4)
Editing approach (choice of Dsoft)                % records with perfect solution
                                                  data set 1    data set 2
no soft edits, only hard edits                    40.2          58.4
all edits, using soft edits as hard edits         36.8          n/a
sum of fixed failure weights A                    47.3          63.4
sum of fixed failure weights B                    52.1          60.9
sum of fixed failure weights C                    43.3          60.7
sum of fixed failure weights B(cat)               50.0          64.5
sum of fixed failure weights C(cat)               43.1          64.5
10-5-1-quantile edits, weights 0.33-0.33-0.33     54.4          63.4
10-5-1-quantile edits, weights 0.90-0.05-0.05     56.5          63.8

14
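Choices for Dsoft: dynamic expressions

Size of edit failure: $e_k$
Linear equality edit: $a_{k1} x_1 + \cdots + a_{kp} x_p + b_k = 0$
Take $e_k = |a_{k1} x_1 + \cdots + a_{kp} x_p + b_k|$
Linear inequality edit: $a_{k1} x_1 + \cdots + a_{kp} x_p + b_k \geq 0$
Take $e_k = \max\{0, -(a_{k1} x_1 + \cdots + a_{kp} x_p + b_k)\}$
Use reference data to standardise
Linear sum
Mahalanobis distance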
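The size of an edit failure can be computed as in the Python sketch below, assuming inequality edits are written in the form a·x + b ≥ 0. How the reference data are used to standardise is not spelled out on the slide; dividing by the standard deviation of the edit's linear expression over the reference records is an assumption made here, as are the edit representation, the function names and the numbers. The Mahalanobis variant is not shown.

```python
from statistics import pstdev

def linear_value(edit, record):
    """Evaluate a_k1*x_1 + ... + a_kp*x_p + b_k for one record."""
    return sum(a * record[name] for name, a in edit["coeffs"].items()) + edit["const"]

def failure_size(edit, record):
    """e_k: absolute violation for an equality edit, shortfall below zero
    for an inequality edit of the form a.x + b >= 0."""
    value = linear_value(edit, record)
    if edit["kind"] == "eq":
        return abs(value)
    return max(0.0, -value)

def standardised_sum(edits, record, reference):
    """Assumed standardisation: divide each e_k by the standard deviation of
    the edit's linear expression over the reference records, then add up."""
    total = 0.0
    for edit in edits:
        scale = pstdev(linear_value(edit, ref) for ref in reference)
        total += failure_size(edit, record) / scale
    return total

# Soft edit Profit / Turnover <= 0.6, rewritten as 0.6*turnover - profit >= 0.
ratio_edit = {"kind": "ineq", "coeffs": {"turnover": 0.6, "profit": -1.0}, "const": 0.0}

# Made-up reference records.
reference = [{"turnover": 100, "profit": p} for p in (50, 40, 30, 20, 10)]

large_failure = failure_size(ratio_edit, {"turnover": 100, "profit": 80})
no_failure = failure_size(ratio_edit, {"turnover": 100, "profit": 40})
d_soft = standardised_sum([ratio_edit], {"turnover": 100, "profit": 80}, reference)
print(large_failure, no_failure, d_soft)
```

Unlike a fixed failure weight, this measure grows with the distance by which the record misses the edit.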

15
Simulation study (5)
Editing approach (choice of Dsoft)                % records with perfect solution
                                                  data set 1    data set 2
no soft edits, only hard edits                    40.2          58.4
all edits, using soft edits as hard edits         36.8          n/a
sum of fixed failure weights A                    47.3          63.4
sum of fixed failure weights B                    52.1          60.9
sum of fixed failure weights C                    43.3          60.7
sum of fixed failure weights B(cat)               50.0          64.5
sum of fixed failure weights C(cat)               43.1          64.5
10-5-1-quantile edits, weights 0.33-0.33-0.33     54.4          63.4
10-5-1-quantile edits, weights 0.90-0.05-0.05     56.5          63.8
sum of standardised soft edit failures            49.2          ?
Mahalanobis distance of soft edit failures        46.8          ?
16
Conclusion