WP.31 Effects of Rounding on Data Quality

Learn more at: https://unece.org

1
WP.31 Effects of Rounding on Data Quality
  • Jay J. Kim, Lawrence H. Cox,
  • Myron Katzoff, Joe Fred Gonzalez, Jr.
  • U.S. National Center for Health Statistics

2
  • I. Introduction
  • Reasons for rounding:
  • Rounding noninteger values to integer values for
    statistical purposes
  • To enhance readability of the data
  • To protect confidentiality of records in the
    file
  • To keep the important digits only.

3
  • Purpose
  • Evaluate the effects of four rounding methods
    on data quality and utility in two ways:
  • (1) bias and variance;
  • (2) effects on the underlying distribution of the
    data, as determined by a distance measure.

4
  • $B$: base
  • $q$: quotient
  • $r$: remainder
  • Types of rounding
  • Unbiased rounding: $E[R(r) \mid r] = r$
  • Sum-unbiased rounding: $E[R(r)] = E(r)$

5
II. Four rounding rules
  • 1. Conventional rounding
  • Suppose $r = 0, 1, 2, \ldots, 9$ ($= B - 1$).
  • If $r \ge 5$ ($= B/2$), round $r$ up to 10 ($= B$);
  • else round $r$ down to zero (0).
  • If $B$ is odd, round $r$ up when $r \ge (B+1)/2$.

6
  • r is assumed to follow a discrete uniform
    distribution
  • 2. Modified conventional rounding
  • Same as conventional rounding, except that
    $r = 5$ ($= B/2$) is rounded up or down with probability ½.
  • 3. Zero-restricted 50/50 rounding
  • Except for $r = 0$, round $r$ up or down with
    probability ½.

7
  • 4. Unbiased rounding
  • Round r up with probability r/B
  • Round r down with probability 1 - r/B
  • III. Mean and variance
  • III.1 Mean and variance of unrounded number
  • $r = 0, 1, 2, 3, \ldots, B - 1$.
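The four rounding rules of Section II can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `rng` parameter and the helper `round_value` are hypothetical, and a value `x` is assumed to decompose as `x = q*B + r` with only the remainder `r` being rounded to 0 or `B`.

```python
import random

def split(x, B):
    """Return (q, r) with x = q*B + r and 0 <= r < B."""
    return divmod(x, B)

def conventional(r, B, rng=None):
    # Round up when r >= B/2 (for odd B this means r >= (B+1)/2), else down.
    return B if 2 * r >= B else 0

def modified_conventional(r, B, rng=random):
    # Same as conventional, except the midpoint r = B/2 goes either way
    # with probability 1/2.
    if 2 * r == B:
        return B if rng.random() < 0.5 else 0
    return conventional(r, B)

def zero_restricted_50_50(r, B, rng=random):
    # r = 0 stays at 0; every other remainder goes up or down with
    # probability 1/2.
    if r == 0:
        return 0
    return B if rng.random() < 0.5 else 0

def unbiased(r, B, rng=random):
    # Round up with probability r/B, down with probability 1 - r/B.
    return B if rng.random() < r / B else 0

def round_value(x, B, rule, rng=random):
    """Round x to a multiple of B by rounding only its remainder."""
    q, r = split(x, B)
    return q * B + rule(r, B, rng)
```

For example, `round_value(37, 10, conventional)` returns 40 and `round_value(34, 10, conventional)` returns 30, while `round_value(37, 10, unbiased)` returns 40 with probability 0.7 and 30 with probability 0.3.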

8
  • In general,
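The formulas that followed "In general," did not survive in the transcript. For a remainder $r$ that is uniformly distributed on $0, 1, \ldots, B-1$, as assumed on the earlier slide, the standard results for a discrete uniform variable are given below. They are offered as a reference point and are not a recovery of the slide's own notation.

```latex
E(r) = \frac{1}{B}\sum_{r=0}^{B-1} r = \frac{B-1}{2},
\qquad
V(r) = \frac{1}{B}\sum_{r=0}^{B-1} r^{2} - \left(\frac{B-1}{2}\right)^{2}
     = \frac{B^{2}-1}{12}.
```

With $B = 10$ this gives $E(r) = 4.5$ and $V(r) = 8.25$.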

9
  • III.2 Conventional rounding when B is even

  • for unrounded number.

10
(No Transcript)

11
  • III.3 Conventional rounding when B is odd
  • for unrounded number

12
  • Modified conventional rounding,
  • 50/50 rounding
  • and
  • unbiased rounding
  • have the same mean, variance and MSE
  • as the conventional rounding with odd B.
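The claim on this slide can be checked exactly with rational arithmetic. The sketch below assumes that $r$ is uniform on $0, \ldots, B-1$, that $R(r)$ takes only the values 0 and $B$, and that "MSE" means the variance of $R(r)$ plus the squared difference between $E[R(r)]$ and $E(r)$. That last reading is an assumption, since the defining formulas are missing from the transcript. The rule labels and function names are hypothetical.

```python
from fractions import Fraction

def up_probability(rule, r, B):
    """P(remainder r is rounded up to B) under the named rule."""
    if rule == "conventional":
        return Fraction(1 if 2 * r >= B else 0)
    if rule == "modified":
        if 2 * r == B:
            return Fraction(1, 2)
        return Fraction(1 if 2 * r > B else 0)
    if rule == "50/50":
        return Fraction(0) if r == 0 else Fraction(1, 2)
    if rule == "unbiased":
        return Fraction(r, B)
    raise ValueError(rule)

def moments(rule, B):
    """Mean and variance of R(r), and MSE about E(r), for uniform r."""
    p_up = sum(up_probability(rule, r, B) for r in range(B)) / B
    mean = B * p_up                       # R(r) is B with prob p_up, else 0
    var = B * B * p_up * (1 - p_up)
    true_mean = Fraction(B - 1, 2)        # E(r) for uniform r
    mse = var + (mean - true_mean) ** 2   # assumed definition of MSE
    return mean, var, mse

for rule in ("modified", "50/50", "unbiased"):
    print(rule, moments(rule, 10))
print("conventional, B = 9 ", moments("conventional", 9))
print("conventional, B = 10", moments("conventional", 10))
```

Under these assumptions the three randomized rules with $B = 10$ all give mean 9/2 and variance 99/4, which match $(B-1)/2$ and $(B^2-1)/4$; conventional rounding with odd $B = 9$ follows the same two formulas (mean 4, variance 20), whereas conventional rounding with even $B = 10$ gives mean 5 and variance 25.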

13
  • IV. Distance measure
  • Define

14
  • Reexpressing the numerator of U, we have
  • With conventional rounding with $B = 10$,
  • Then we have

15
  • Expected value of U
  • We define

16
  • IV.1 Conventional rounding with even B
  • which can be reexpressed as

17
  • The upper and lower bounds for the harmonic series
  • are
  • The upper bound for the first term of
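The bounds themselves are missing from the transcript. One standard pair, obtained by comparing the partial sum with the integral of $1/x$, is $\ln(n+1) < H_n \le 1 + \ln n$, where $H_n = \sum_{k=1}^{n} 1/k$. Whether these are the exact bounds used on the slide is an assumption. The sketch below checks them numerically; the function names are hypothetical.

```python
import math

def harmonic(n):
    """Partial sum H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_bounds(n):
    """Return (lower, upper) with lower < H_n <= upper."""
    return math.log(n + 1), 1.0 + math.log(n)

for n in (9, 999):
    lower, upper = harmonic_bounds(n)
    print(n, round(lower, 4), round(harmonic(n), 4), round(upper, 4))
```

The two values of `n` correspond to $B - 1$ for the bases $B = 10$ and $B = 1{,}000$ used in the comparison slides.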

18
  • The second term of
  • Note the second term of E(U) is,
  • IV.2 Modified conventional rounding with even B
  • This rule has the same E(U) as conventional rounding.

19
  • IV.3 50/50 rounding
  • The first term of
  • The second term
  • IV.4 Unbiased rounding
  • The first term of

20
  • The second term
  • IV.5 Comparisons of three rounding rules
  •              Conv     50/50     Unbiased
  • 1st term
  • 2nd term

21
  • Comparisons of three rounding rules, B = 10
  •              Conv     50/50          Unbiased
  • 1st term     2.61     11.49 (4.4)    4.5 (1.7)
  • 2nd term     0.85     2.85           1.65

22
  • Comparisons of three rounding rules, B = 1,000
  •              Conv       50/50           Unbiased
  • 1st term     193.65     3,453.88 (18)   499.5 (2.6)
  • 2nd term     83.33      322.83          166.67
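The formulas behind these tables are missing from the transcript, but the numbers can be reproduced under two assumptions inferred from the values themselves. First, the "2nd term" is taken to be $E[(R(r)-r)^2]/B$ for uniform $r$. Second, the "1st term" is taken to be $\frac{1}{B}\sum_{r=1}^{B-1} E[(R(r)-r)^2 \mid r]/r$, with each harmonic sum replaced by a logarithmic upper bound. Neither form is confirmed by the source, and the function names below are hypothetical. The parenthesised figures in the tables appear to be ratios to the conventional-rounding value (for example, 11.49/2.61 ≈ 4.4).

```python
import math
from fractions import Fraction

def sq_err(rule, r, B):
    """E[(R(r) - r)^2 | r] for a single remainder r."""
    down, up = r * r, (B - r) ** 2
    if rule == "conv":
        return Fraction(up if 2 * r >= B else down)
    if rule == "50/50":
        return Fraction(0) if r == 0 else Fraction(down + up, 2)
    if rule == "unbiased":
        return Fraction(r, B) * up + (1 - Fraction(r, B)) * down
    raise ValueError(rule)

def second_term(rule, B):
    # Assumed form: mean squared rounding error divided by B.
    mse = sum(sq_err(rule, r, B) for r in range(B)) / B
    return float(mse / B)

def first_term(rule, B):
    # Assumed form: (1/B) * sum_{r=1}^{B-1} E[(R(r)-r)^2 | r] / r,
    # with harmonic sums replaced by logarithmic upper bounds.
    n = B - 1
    tri = n * (n + 1) / 2                  # sum of r for r = 1..B-1
    if rule == "unbiased":
        total = n * B - tri                # r(B-r)/r = B-r, no harmonic sum
    elif rule == "50/50":
        h = 1 + math.log(n)                # upper bound on H_{B-1}
        total = 0.5 * (2 * tri + B * B * h - 2 * B * n)
    elif rule == "conv":
        if B % 2:
            raise ValueError("even B assumed")
        half = B // 2
        h = math.log(n / (half - 1))       # bounds sum of 1/r, r = B/2..B-1
        total = tri + B * B * h - 2 * B * (B - half)
    else:
        raise ValueError(rule)
    return total / B

for B in (10, 1000):
    for rule in ("conv", "50/50", "unbiased"):
        print(B, rule, round(first_term(rule, B), 2),
              round(second_term(rule, B), 2))
```

Under these assumptions all six 1st-term entries and five of the six 2nd-term entries are reproduced to two decimals. The 50/50 2nd term for $B = 1{,}000$ comes out as 332.83 rather than the tabulated 322.83, which may reflect a transcription slip in the slide or a difference from the assumed formula.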

23
  • IV.6 E(1/q) for log-normal distribution
  • Let
  • $y = \ln x$
  • Then, x has a lognormal distribution, i.e.,
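The density and the resulting expectation are missing from the transcript. As a hedged illustration, suppose $\ln x$ is normal with mean `mu` and standard deviation `sigma` (both symbols are assumptions, not the slide's notation). Then $-\ln x$ is also normal, so $1/x$ is itself lognormal and $E(1/x) = \exp(-\mu + \sigma^2/2)$. The sketch below compares that closed form with a simulation; it treats the lognormal variable itself as the quantity whose reciprocal is averaged, which may differ from the slide's exact setup for $q$.

```python
import math
import random

def expected_reciprocal_lognormal(mu, sigma):
    """E(1/x) when ln x ~ N(mu, sigma^2)."""
    return math.exp(-mu + sigma ** 2 / 2)

def simulated_reciprocal_mean(mu, sigma, n, seed=1):
    """Monte Carlo estimate of E(1/x) from n lognormal draws."""
    rng = random.Random(seed)
    return sum(1.0 / rng.lognormvariate(mu, sigma) for _ in range(n)) / n

mu, sigma = 2.0, 0.5   # illustrative values only
exact = expected_reciprocal_lognormal(mu, sigma)
estimate = simulated_reciprocal_mean(mu, sigma, 200_000)
print(exact, estimate)
```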

24
  • Let
  • Then
  • IV.6 E(1/q) for the Pareto distribution of the second kind
  • The Pareto distribution of the second kind is,

25
  • where k is the minimum value of q and c is the
    cumulative probability from 1.
  • IV.7 Upper limit for E(1/q) for multinomial
    distribution
  • The multinomial distribution has the form
  • 0,1,2,

26
  • Note,
  • for all i.
  • Let be the size of the category i and

27
  • V. Concluding comments
  • Various methods of rounding and, in some
    applications, various choices of rounding base B
    are available.
  • The question becomes: which method and/or
    base is expected to perform best in terms of data
    quality and preserving the distributional properties
    of the original data and, quantitatively, what is the
    expected distortion due to rounding?
  • This paper provides a preliminary analysis
    toward answering these questions.

28
  • References
  • Grab, E.L. and Savage, I.R. (1954), Tables of the
    Expected Value of 1/X for Positive Bernoulli and
    Poisson Variables, Journal of the American
    Statistical Association 49, 169-177.
  • Johnson, N.L. and Kotz, S. (1969), Distributions in
    Statistics: Discrete Distributions, Boston:
    Houghton Mifflin Company.
  • Johnson, N.L. and Kotz, S. (1970), Distributions in
    Statistics: Continuous Univariate
    Distributions-1, New York: John Wiley and Sons,
    Inc.
  • Kim, Jay J., Cox, L.H., Gonzalez, J.F. and
    Katzoff, M.J. (2004), Effects of Rounding
    Continuous Data Using Rounding Rules, Proceedings
    of the American Statistical Association, Survey
    Research Methods Section, Alexandria, VA,
    3803-3807 (available on CD).
  • Chvatal, Vasek, Harmonic Numbers, Natural
    Logarithm and the Euler-Mascheroni Constant. See
    www.cs.rutgers.edu/chvatal/notes/harmonic.html