Title: Statistical confidentiality and privacy. 2. Case study: IPUMS-International (www.ipums.org/international). Robert McCaa, Minnesota Population Center, rmccaa@umn.edu
Slide 1: Statistical confidentiality and privacy. 2. Case study: IPUMS-International, www.ipums.org/international
Robert McCaa, Minnesota Population Center, rmccaa_at_umn.edu
"Inadequate use of microdata has high costs" (Len Cook, 2003, Registrar General, ONS)
Slide 2: MPC: largest provider of integrated microdata to trusted, non-commercial researchers
- International (census)
- History (19th c.)
- USA (census)
- GIS
- Employment
- Health
- Time-Use
Slide 3: IPUMS-Global (first 10 years)
- Dark green: integrated and disseminating (44 countries, 130 censuses, 279 million person records)
- Green: to be integrated (35 countries, 90 censuses, 150 million)
See Inventory handout
Inventory: IPUMS confidentiality protocols used
Mollweide projection
Slide 4: Outline: IPUMS statistical confidentiality methods
- IPUMS: A restricted-access, web-based microdata dissemination system
- IPUMS: The trusted user/institution approach
- A. Legal Disclosure Controls
- B. Administrative Disclosure Controls
- C. Technical Disclosure Controls
- Example: Saint Lucia, 1991
- IPUMS Assessments (2007)
- UN-ECE Case Study
- Trewin on-site evaluation
Slide 5: 1. IPUMS-International Goals
- Inventory census microdata and documentation, world-wide
- Recover and preserve at-risk microdata
- Integrate census microdata and documentation
- Disseminate, without cost, extracts of samples to bona fide researchers worldwide, regardless of country of birth, citizenship or residence
- Sustained funding 1999-2015: 6 grants of 5 years duration
- National Science Foundation (USA): 3 successive grants
- National Institutes of Health (USA): Latin America, Europe, Eurasia
Slide 6: IPUMS-International: a restricted-access, web-based microdata extraction system
- Researcher is licensed to access microdata (1/3 rejected)
- NO public access, source files, or complete datasets
- Licensed researcher selects:
- Countries,
- Censuses,
- Cases/sub-populations,
- Variables, and sample densities
- Extract engine queues the request and generates the extract
- Password protected to make and retrieve extracts
- Researcher retrieves the extract via the web with SSL 128-bit encryption and analyzes it using own wares (soft/hard/wet)
Slide 7: 6 steps using www.ipums.org/international
See 10 tips handout
Slide 8: IPUMS-International: world's largest disseminator of integrated microdata to trusted, non-commercial researchers
- 1999: Founded by Steven Ruggles and Bob McCaa
- Restrict access to trusted users, and apply corresponding confidentiality techniques
- 2002: 1st release of integrated samples for 7 countries; >200 users in first year
- Big success! 80 countries signed; 70 entrusted microdata to IPUMS: datasets for more than 250 censuses, >180 entire datasets
- 2006
Slide 9: IPUMS-International: world's largest disseminator of integrated microdata to trusted, non-commercial researchers
- 1999: Founded
- 2006: 3rd release
- data for 20 countries, samples for 63 censuses
- 185 million person records
- >1,000 users
- 2010: 7th release
- data for 50 countries, samples for 160 censuses
- 300 million person records
- >4,000 users
- Note: data extracts are provided only to licensed users.
Slide 10: 2. IPUMS-International: The trusted-user/institution approach to disseminating integrated, anonymized microdata extracts
Disclosure Controls
A. Legal: Memorandum with NSI
B. Administrative: License with researchers
C. Technical: Sample, data modifications
Slide 11: 3 kinds of confidentiality protections
- Legal: Dissemination agreement between the University of Minnesota and each National Statistical Institute
- Uniform 11-point Memorandum of Understanding regarding ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence
- Administrative: conditional use license between the University of Minnesota and each researcher
- Permission to use restricted-access microdata, 3 criteria: research need, research competence, and agreement to abide by the conditions of the use license
- Technical: data protection measures
- Specific to each country /
Slide 12: A. NSI with U. of Minnesota
Slide 13: A. NSI with U. of Minnesota
Slide 15: LICENSE
IPUMSi
B. License with researchers: restricted-access, web-based system
- Legally binding license agreement
- forces a would-be intruder to violate a law under which they can be fined and/or jailed
- Researcher's institution sanctioned
- protects privacy and confidentiality
- assures proper use
- Access limited to:
- Bona fide researchers (credentials)
- with a demonstrated scientific need
- who agree to abide by license restrictions
- Confidentiality
- No redistribution
- Safely secured
- Alleging that a person has been identified is prohibited
Slide 16: LICENSE
IPUMSi
B. License with researchers: restricted-access, web-based system
- Legally binding license agreement
- forces would-be snoopers to violate the law
- protects privacy and confidentiality
- assures proper use
- Access limited to:
- Bona fide researchers (credentialed)
- with demonstrated scientific need
- who agree to abide by license restrictions
- Confidentiality
- No redistribution, no commercial use
- Data safely secured
- Alleging that a person can be or has been identified is a violation
Slide 19: Apply for Access
Slide 24: Must click acceptance of each restriction to gain access.
Slide 25: License is for 1 year, renewable.
End of application
Slide 26: C. 9 Technical Disclosure Controls (Thorogood, 1999)
- Restrict access to samples
- Limit geographical detail
- Recode sparse categories
- Truncate top and bottom codes
- Construct age from birthdate, if necessary
- Suppress date of birth, precise place of birth
- Migration timing/place not identified in detail
- Identify place of residence by major civil division (pop. >20k, 60k, 100k, 250k, 1 million; i.e., national convention)
- Suppress any sensitive variable requested by NSI
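Three of the controls listed above (restricting access to samples, constructing age from birthdate, and suppressing date and precise place of birth) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IPUMS procedure: the field names (`birthdate`, `birthplace_detail`), the systematic sampling rule, and the census date are all hypothetical.

```python
from datetime import date


def age_at_census(birthdate: date, census_date: date) -> int:
    """Completed years of age on the census date."""
    years = census_date.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred in the census year.
    if (census_date.month, census_date.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def anonymize(record: dict, census_date: date) -> dict:
    """Return a copy with age constructed and birth detail suppressed.

    Field names are hypothetical, chosen only for this sketch.
    """
    out = dict(record)
    if "age" not in out and "birthdate" in out:
        out["age"] = age_at_census(out["birthdate"], census_date)
    for field in ("birthdate", "birthplace_detail"):
        out.pop(field, None)  # suppress the identifying detail
    return out


def systematic_sample(records: list, step: int = 10, start: int = 0) -> list:
    """Keep every `step`-th record, e.g. step=10 for a 10% sample."""
    return records[start::step]


census_day = date(1991, 6, 1)  # illustrative date, not taken from the source
person = {
    "id": 1,
    "sex": "F",
    "birthdate": date(1950, 9, 15),
    "birthplace_detail": "Village X",
}
clean = anonymize(person, census_day)
```

Only the derived `age` remains in `clean`; the raw birthdate and the precise birthplace never reach the extract.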
Slide 27: C. Technical Disclosure Controls. Example: Saint Lucia, 1991 Census
- Restrict access to samples: 10% (13,405 persons)
- Limit geographical detail (n<2,000): suppress region, district, town, settlement, enumeration district, school identification; retain urban-rural
- Recode sparse categories (n<25) to other:
- Type of dwelling: suppress townhouse, barracks
- Land occupation: suppress sharecrop
- Type of ownership: suppress squatted, leased
- Type of roof: suppress 5 categories
- Wall material: suppress 5 categories
- Water supply: suppress pubwell
- Type of lighting: suppress gas
- Ethnic origin: suppress Chinese, Portuguese, Syrian-Lebanese
- Religion: suppress 6 categories
- School, work mode of transport: bicycle
- Type of school: technical institute, university
- Number of hours worked last week: 5-hour groups, ..., 70
- Pay period: suppress quarterly, annually
- Occupation, industry, training code: reduce from 4 digits to 1
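The two recoding rules on this slide (fold categories with fewer than 25 cases into "other", and cut occupation-type codes from 4 digits to 1) could look roughly like the following. It is a sketch only: the function names are invented for illustration, and the category counts in the usage lines are made up, not Saint Lucia figures.

```python
from collections import Counter


def recode_sparse(values, min_count=25, other="other"):
    """Replace every category observed fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def truncate_code(code: str, digits: int = 1) -> str:
    """Keep only the leading `digits` characters of a hierarchical code."""
    return code[:digits]


# Made-up counts, for illustration only.
lighting = ["electricity"] * 60 + ["kerosene"] * 30 + ["gas"] * 3
recoded = recode_sparse(lighting)          # the 3 "gas" cases become "other"
major_group = truncate_code("7112")        # 4-digit code reduced to 1 digit
```

Counting before recoding matters: the threshold is applied to the frequencies in the sample being released, so the same variable may keep different categories in different censuses.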
Slide 28: C. Technical Disclosure Controls. Example: Saint Lucia, 1991
- Top-bottom code:
- Number of rooms: 10
- Number of bedrooms: 7
- Number of radios: 4
- Number of tvs: 3
- Number of videos: 2
- Number of emigrants in dwelling: 2
- Age: 81
- Age at first child: <14
- Age at first union: <14, 41
- Age at last child: <14, 45
- Number of school subjects: <3, >7
- Income categories: 8
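Top and bottom coding amounts to clamping a value to a published range so that extreme cases cannot be singled out. A minimal sketch follows, assuming the thresholds are inclusive (for example, 81 stands for "81 or older" and 13 stands for "under 14"); the slide does not say how the boundary values themselves are coded, so that convention is an assumption.

```python
def top_bottom_code(value, bottom=None, top=None):
    """Clamp `value` to [bottom, top]; either bound may be omitted."""
    if bottom is not None and value < bottom:
        return bottom  # all values below the floor share one code
    if top is not None and value > top:
        return top     # all values above the ceiling share one code
    return value


age = top_bottom_code(95, top=81)                           # oldest ages collapsed
age_first_union = top_bottom_code(12, bottom=13, top=41)    # "<14" coded as 13
rooms = top_bottom_code(6, top=10)                          # unchanged, within range
```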
Slide 29: C. Technical Disclosure Controls. Example: Saint Lucia, 1991
- Suppress:
- date of birth, precise place of birth, type of work wanted
- Migration timing/place not identified in detail
- Country last lived: suppress 37 categories
- Year of immigration: <1948
- Identify place of residence by major civil division (pop. >20k, 60k, 100k, 250k, 1 million; i.e., national convention): all suppressed
- Suppress any sensitive variable requested by NSI
- none (as yet)
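The place-of-residence rule can be read as a population threshold: a civil division is identified only if it is large enough, otherwise its code is blanked. The sketch below assumes a 20,000 threshold and uses invented division names and populations; when every division falls under the threshold, all codes are suppressed, which is the outcome the slide reports for Saint Lucia.

```python
def mask_residence(records, populations, threshold=20_000, field="division"):
    """Blank the residence code wherever the division's population is below `threshold`.

    `populations` maps division code -> population; unknown divisions are
    treated as too small and are suppressed as well.
    """
    masked = []
    for rec in records:
        out = dict(rec)
        if populations.get(out.get(field), 0) < threshold:
            out[field] = None  # suppressed
        masked.append(out)
    return masked


# Hypothetical divisions and populations, for illustration only.
populations = {"division_a": 55_000, "division_b": 8_000}
people = [
    {"id": 1, "division": "division_a"},
    {"id": 2, "division": "division_b"},
    {"id": 3, "division": "division_c"},
]
released = mask_residence(people, populations)
```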
Slide 30: 3. Assessments
A. Why was IPUMS cited as good practice by the UN-ECE (2007, Annex 23, pp. 98-103)?
http://www.unece.org/stats/documents/tfcm.htm
Slide 31: UN-ECE Good practices (see annex 23)
- High level of confidence and transparency between the researchers (users) and the national statistical institutes
- The data are anonymized by highly efficient technical means
- The conditions of use are well defined
- Good use is assured by both juridical and administrative mechanisms to prevent violations
- Sanctions for misuse are clearly spelled out
- Sanctions are imposed not only against those who misuse the data but also against their institutions
Slide 32: See Trewin Report handout
B. The Trewin Report
"The security of the computing environment used by IPUMS-International is first class and appears to be of the standard of the best statistical offices."
Dennis Trewin, former Australian Statistician, past President, International Statistical Institute, chair, UN-ECE Committee on Managing Statistical Confidentiality and Microdata Access (CES 2007)
Slide 33: Statistical confidentiality and security: see the on-site review by Dennis Trewin, www.hist.umn.edu/rmccaa/ipums-global (click Trewin Report)
- An outsider's view from inside IPUMS-International
- The best practice for an international repository of microdata
- The security of IPUMS is first class: the standard of the best national statistical offices
- in full compliance with the principles and recommendations of the ECE
Slide 34: IPUMS-International strengths
- Uniform legal authorization with national statistical authorities
- Access restricted to academics with need who agree to abide by stringent confidentiality protections. Sanctions against individual and institution: denial of access to all microdata for the entire institution
- Strong technical methods of microdata anonymization
- Experienced integration teams
- Proven web-based access management system
- High producer and user satisfaction
- Sustainable: MPC, NSF, NIH
Slide 35: Join us at the 58th ISI, Dublin, Aug 21-26, 2011, http://www.isi2001.ie
- IPUMS Workshop, Aug 19-20.
- Microdata sessions.
- IPUMS funding for delegates from developing countries.
- IPUMS booth
- Participate in ISI sessions.
- Network with stat offices, international agencies, etc.
Slide 36: Thank you!
More: www.hist.umn.edu/rmccaa/ipums-global; see Durban workshop (2009): Microdata recovery, Jamaica report; Lisbon workshop (2007): Saint Lucia report
Contact: rmccaa_at_umn.edu
This ppt is also available at ipums-global (see Port of Spain workshop)