Title: Why a Small "n"
1 Why a Small "n" is Surrounded by Confidentiality
Ensuring Confidentiality and Reliability in Microdatabases and Summary Tables
2 Glynn D. Ligon, Barbara S. Clements, Vicente Paredes
Evaluation Software Publishing, Incorporated
Austin, Texas
3 There are two angles.
1. Control access to databases, and protect the
identity of individuals in databases extracted
for others.
2. Avoid reporting a statistic that can be
linked to an individual.
4 Problem 1: An individual is identified in the database.
[Figure: a grid of 0s containing a single 1.]
6 Data Problem 1: Small Cell Size
An individual's data are captured by the report when that individual is the only one in a group.
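A minimal sketch of how cells of size one might be found before a report is released. The record layout, the `small_cells` helper, and the `threshold` parameter are hypothetical and only illustrate the idea of counting group membership.

```python
from collections import Counter

def small_cells(records, key, threshold=1):
    """Return the group labels whose count is at or below threshold.

    records: list of dicts (hypothetical layout); key: grouping field.
    """
    counts = Counter(record[key] for record in records)
    return sorted(label for label, n in counts.items() if n <= threshold)

# Hypothetical student records, for illustration only.
students = [
    {"school": "A", "score": 71},
    {"school": "A", "score": 88},
    {"school": "B", "score": 93},  # the only student in school B
]
flagged = small_cells(students, "school")
```

Any statistic reported for a flagged group would describe exactly one person.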
7 Data Problem n: The n Crowd
Internal disclosure to people in the know.
Someone with additional information can identify individuals.
8 Data Problem X: X-ternal Disclosure
External disclosure to people with good math skills.
Using marginal totals, individual cell values can be calculated.
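The arithmetic behind this kind of disclosure is simple subtraction. The sketch below uses made-up numbers and a hypothetical `recover_cell` helper to show how one hidden cell in a row can be recovered when the row total is published.

```python
def recover_cell(row_total, published_cells):
    """Recover a single hidden cell from a row total and the other cells."""
    return row_total - sum(published_cells)

# Hypothetical row: total of 25, two published cells, one cell withheld.
hidden_value = recover_cell(25, [10, 13])
```

Here the withheld cell must be 2, so withholding it alone protected nothing.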
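9 Data Problems 100 and 0: All or None Make the Grade
Bad News: 100 are Low Income.
Good News: 0 are Low Income!
When all or none of a group share a characteristic, the statistic reveals the status of every individual in the group.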
10 Typical PR Decision...
Bad News is Confidential!
Good News is NEWS!
11 Data Problem mm: Mean or Median Equals Min or Max
When the mean equals the minimum or maximum value, all values in the cell are the same. When the median equals the minimum or maximum value, at least half of the values in the cell equal that value.
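A small check along these lines can be run on a cell before its summary statistics are published. The `extreme_flags` function and the sample values are hypothetical; the sketch only flags the condition and leaves the disclosure decision to the analyst.

```python
import statistics

def extreme_flags(values):
    """Flag a cell whose mean or median coincides with its min or max."""
    lowest, highest = min(values), max(values)
    return {
        "mean_at_extreme": statistics.mean(values) in (lowest, highest),
        "median_at_extreme": statistics.median(values) in (lowest, highest),
    }

same_scores = extreme_flags([3, 3, 3])    # every value identical
mixed_scores = extreme_flags([1, 1, 5])   # median sits at the minimum
```

A cell that raises either flag gives a reader exact values for some or all of its members.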
12 Data Problem 6: Small Inns
Reliability (p) is based upon frequency (n) and standard deviation.
Reliability of the data is low.
[Figure: a motel with numbered rooms and a sign reading "WELCOME TO MOTEL 6" and "OFFICE".]
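One common way to see how n and the standard deviation interact is the standard error of the mean, the standard deviation divided by the square root of n. The slide does not name a specific reliability measure, so using the standard error here is an assumption, and the scores are invented.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical scores with the same spread but different group sizes.
small_group = [70, 90]
large_group = [70, 90] * 8

se_small = standard_error(small_group)
se_large = standard_error(large_group)
```

With the same spread of scores, the smaller group has the larger standard error, so its mean is the less dependable figure.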
13 Natural Persons
Most Organizations: All personal data typically protected.
14 Natural Persons
Public Education: Some personal data protected.
Data elements shown: Name, Test Scores, School Location, Handicapping Conditions, Family Income, Salary.
15 Collective Units
Most Organizations: All unit data typically protected.
16 Collective Units
Public Education: Almost no unit data protected.
17 Perturbing the Data
Statisticians change data to mask identities but
maintain statistical integrity.
18 Perturbing the Data
ROUNDING: 3 becomes 5.
NOISE: False data added. 1 becomes 0.
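Both techniques can be sketched in a few lines. The rounding base of 5, the noise range of plus or minus 1, and the function names are assumptions chosen for illustration; the slides do not specify them.

```python
import random

def round_to_base(count, base=5):
    """Round a non-negative count to the nearest multiple of base."""
    return (count + base // 2) // base * base

def add_noise(count, rng, spread=1):
    """Add a random offset in [-spread, spread], never going below zero."""
    return max(0, count + rng.randint(-spread, spread))

rounded = [round_to_base(c) for c in (1, 3, 7, 8)]

rng = random.Random(0)  # seeded so a run can be repeated
noisy = [add_noise(1, rng) for _ in range(20)]
```

With base 5, a count of 3 is reported as 5 and a count of 1 is reported as 0, so the true small count is no longer visible.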
19 Data Swapping
- Any frequency of 1 combines with a frequency of 2.
- If 3 frequencies of 1, the central cell is set to 3, others set to 0.
- Any frequency of 1 combines with the greatest frequency.
- 1 is subtracted from the greatest frequency and added to any frequency of 2.
- A frequency of 2 is split up.
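As an illustration of one of the rules above, the sketch below moves every frequency of 1 into the cell with the greatest frequency. The function name and the tie-breaking choice (the first largest cell wins) are assumptions, and the other rules are not implemented.

```python
def merge_ones_into_largest(frequencies):
    """Combine each frequency of 1 with the greatest frequency in the row."""
    result = list(frequencies)
    # Index of the greatest frequency; ties go to the first such cell.
    largest = max(range(len(result)), key=lambda i: frequencies[i])
    for i, count in enumerate(frequencies):
        if count == 1 and i != largest:
            result[largest] += 1
            result[i] = 0
    return result

swapped = merge_ones_into_largest([1, 7, 1, 4])
```

The row total is unchanged, but the cells that held a single individual now read 0.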
20 What if our text says, "Every gender and ethnic group was represented in the awards!"?
Data swapping creates a dilemma.
21 Suppressing the Data
- Any frequency less than 3 is suppressed.
- Complementary cells for any suppressed are suppressed.
- Complementary tables are masked.
- Whole tables are suppressed.
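The first two rules can be sketched for a single table row as follows. The threshold of 3 comes from the list above; picking the smallest remaining cell as the complementary one, and using `None` to mark a suppressed cell, are assumptions made for this example.

```python
def suppress_row(counts, threshold=3):
    """Suppress small cells in a row, plus one complementary cell if needed.

    Suppressed cells are returned as None.
    """
    row = [c if c >= threshold else None for c in counts]
    hidden = [i for i, value in enumerate(row) if value is None]
    # A single hidden cell could be recovered from the row total,
    # so hide one more cell (here: the smallest one still shown).
    if len(hidden) == 1:
        shown = [i for i, value in enumerate(row) if value is not None]
        if shown:
            row[min(shown, key=lambda i: counts[i])] = None
    return row

masked = suppress_row([2, 10, 15])
```

With two cells hidden, the published row total reveals only their sum, not either value.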
22 What if our text says, "Every gender and ethnic group was represented in the awards!"?
Suppression creates mysteries.
23 What do we call a cell that has been suppressed?
Quell
24 Glynn D. Ligon, Barbara S. Clements, Vicente Paredes
Evaluation Software Publishing, Incorporated
1510 West 34th Street, Suite 200
Austin, Texas 78703
512-458-8364
Fax 512-371-0520
gligon_at_evalsoft.com
www.evalsoft.com