The Microdata Revolution and IPUMS-International: Building a secure resource for comparative social science research in time and space
Robert McCaa

1
The Microdata Revolution and IPUMS-International
Building a secure resource for comparative
social science research in time and space
Robert McCaa
For additional details, please see: https://international.ipums.org/international
For a copy of this presentation, the Trewin report, and more: www.hist.umn.edu/rmccaa/ipums-global
4th Conference for Social and Economic Data (RatSWD), Wiesbaden, 19-20 June 2008
2
A note of thanks
  • Organizers of RatSWD-2008
  • Mr. Walter Radermacher and Mr. Johann Hahlen,
    former President FSO
  • Dr. Gert Wagner
  • Mr. Markus Zwick
  • Ms. Andrea Harausz
  • Mr. Martin Podehl (Statistics Canada, ret.):
    translated German census documentation for
    IPUMS!!!
  • Dr. James Vaupel, Max Planck Institute (Rostock)
  • Empirical tradition of German scholarship
  • From Alexander von Humboldt to Leopold von Ranke
    and beyond

3
Our common fate on a crowded planet: new forms
of global cooperation are required. We must
engage interdisciplinary research combining
theory and practice. --Jeffrey D. Sachs, Common
Wealth (Penguin 2008)
Imagine!!!
a microdata revolution!
4
Free!!!
A Microdata Revolution
  • Preserve all microdata and documentation (15
    slides)
  • Confidentialize (21 slides)
  • Integrate (6 slides)
  • Disseminate to researchers world-wide (3 slides)
  • Conclusion: strengths, challenges, 7 golden rules
    (3 slides)

5
A Census Microdata Revolution
  • Preserve all census microdata and documentation
  • 1960s to present
  • 100 countries (80 have endorsed MoU)
  • 400 censuses (214 are entrusted to IPUMS)
  • Confidentialize: legal, administrative,
    technical
  • Integrate: both microdata and metadata
  • Disseminate to researchers world-wide: extracts
    of the database by countries, censuses,
    sub-populations, sample size, variables

6
IPUMS-International Today
Dark green: already integrated (35 countries,
111 censuses, 263 million person records)
Green: to be integrated (39 countries,
103 censuses, 150 million)
Mollweide projection
7
IPUMS dissemination calendar (see handout):
samples for 35 countries available now,
74 soon
  • Europe
  • Available (10): Austria, Belarus, France, Greece,
    Hungary, Netherlands, Portugal, Romania, Spain,
    UK
  • Soon (4): Germany, Czech Republic, Slovenia,
    Switzerland
  • Americas (funding renewed July 1)
  • Available (11): Argentina, Brazil, Canada, Chile,
    Colombia, Costa Rica, Ecuador, Mexico, Panama,
    USA, Venezuela
  • Soon (11): Bolivia, Cuba, Dominican Republic, El
    Salvador, Guatemala, Honduras, Nicaragua,
    Paraguay, Peru, Puerto Rico, Uruguay
  • Africa
  • Available (6): Egypt, Ghana, Kenya, Rwanda, South
    Africa, Uganda
  • Soon (11): Botswana, Ethiopia, Guinea (Conakry),
    Madagascar, Malawi, Mali, Mauritius, Sierra
    Leone, Sudan, Tanzania, Zambia
  • Asia
  • Available (8): Cambodia, China, Iraq, Israel,
    Malaysia, Palestine, Philippines, Vietnam
  • Soon (13): Armenia, Bangladesh, Fiji, India,
    Indonesia, Jordan, Kyrgyz Republic, Mongolia,
    Nepal, Pakistan, Thailand, Turkmenistan

8
IPUMS timeline
  • 1995: IPUMS-USA, first release of integrated
    microdata
  • IPUMS-USA continues: 1850-2000 plus ACS
    samples
  • 1999: IPUMS-International funded
  • 2002: 1st International release, 7 countries,
    including Colombia and Mexico
  • 2006: 20 countries, 63 censuses
  • 2008: 35 countries, 111 censuses
  • 263 million person records
  • Two thousand users
  • 2013: 70 countries, 200 censuses
  • 214 sets of microdata are already entrusted to
    MPC
  • Coming: Germany (8), Switzerland (4), Bangladesh
    (2), Cuba (1)...

9
The IPUMS team (Feb. 2008)
Steven Ruggles, inventor of IPUMS, Professor of
History, and Director of the Minnesota Population
Center
(Not present: computer gurus, some researchers,
and others who were too busy for a photo!)
10
1. Preserve (Archive)
IPUMS Global workshop, ISI (Lisbon, Aug 2007)
11
Preservation: 1973 census tapes of Sudan were
at risk!
12
Data recovery. Example: Bangladesh Bureau of
Statistics (1981 census, 276 tapes, recovery in
Aug. 08)
>3,000 tapes recovered: 1971 Germany, 1980
Mexico, Mali 76, Sudan 73, and many more
13
Census Microdata, 1950s: few countries archived
microdata (a country in green indicates
microdata exist for the decade); see
www.hist.umn.edu/rmccaa/IUMSI/country6.htm
Mollweide projection
14
Census Microdata, 1960s: the Americas in the
vanguard for preservation of microdata
Mollweide projection
15
Census Microdata, 1970s: the preservation of
microdata was almost universal in the
Americas and was becoming widespread in Europe,
Africa and Asia
Germany: thanks to RDC, microdata and metadata
for the 1970 FRG and 1971 GDR censuses are
recovered.
Mollweide projection
16
Census Microdata, 1980s: the preservation of
microdata became generalized
Germany: 1981 GDR and 1987 FRG microdata and
metadata are recovered.
Mollweide projection
17
See tomorrow's presentation by Wendy Thomas
for more on archiving
Census Microdata, 1990s: many countries preserved
microdata (or are disposed to recover them)
Mollweide projection
18
Census Microdata, 2000s: many countries have
microdata (or are disposed to make them available
for research)
Mollweide projection
19
Inventory of census microdata archived by region
and decade (% of censuses conducted)
Region/continent        Countries  2000s  1990s  1980s  1970s  1960s
Latin America                  21    100    100     89     81     72
North America                  27     91     72     64     24      8
Africa                         58     15     22     25     15      2
Asia                           44      ?     54     31     30     13
Europe                         46      ?     67     55     41     13
Pacific (pop. > 0.5m)           7    100    100    100     43     29
  • Note: cases confirmed by the corresponding
    official statistical institute. Some datasets
    remain to be certified. Some countries have not
    responded to the invitation to inventory their
    stocks of data.
  • Source: http://www.hist.umn.edu/rmccaa/IPUMS/country6.htm

20
Microdata and documentation for Germany by census
year and type, entrusted to IPUMS. Integrated
samples to be launched for the 5th RatSWD?
See tomorrow's presentation by Andrea Harausz
for more on the German contribution to IPUMS
21
2. Confidentialize
The trusted-user/trusted-institution approach to
disseminating integrated, anonymized extracts of
census samples
22
Imagine!!!
What's the problem?
  • Confidentializing an integrated microdata base
    with
  • 200 census samples of households (70 countries)
  • Containing ½ billion person records with
    thousands of variables
  • Available free of cost to tens of thousands of
    licensed researchers regardless of country of
    birth, citizenship, residence or place of work
  • Without a single allegation of violation of
    privacy or statistical confidentiality

Ever!!
23
Solution: a restricted-access, web-based system
  • Password protected to make extracts and
    retrieve microdata
  • Licensed researcher selects
  • Countries,
  • Censuses,
  • Cases/sub-populations,
  • Variables, and
  • Sample densities
  • Extract engine queues request, generates extract
  • Researcher retrieves extract via web with SSL
    128-bit encryption and analyzes using own wares
    (soft/hard/wet)
  • NO CDs. NO source files. NO complete datasets.
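
The selection steps on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions, not the IPUMS extract engine: the names `ExtractRequest` and `make_extract`, the record fields, and the seeded random subsampling are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ExtractRequest:
    """Hypothetical description of what a licensed researcher selects."""
    samples: frozenset                      # (country, census year) pairs
    variables: tuple                        # variable names to keep
    case_filter: Optional[Callable] = None  # predicate for a sub-population
    density: float = 1.0                    # share of selected cases to keep


def make_extract(records, request, seed=0):
    """Return only the requested samples, cases, and variables."""
    if not 0 < request.density <= 1:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    extract = []
    for rec in records:
        if (rec["country"], rec["year"]) not in request.samples:
            continue  # country or census not requested
        if request.case_filter is not None and not request.case_filter(rec):
            continue  # outside the requested sub-population
        if rng.random() >= request.density:
            continue  # dropped by the sample-density setting
        # Only the requested variables leave the system.
        extract.append({name: rec[name] for name in request.variables})
    return extract
```

The point of the design is that the researcher never receives a source file: every delivered record has already been cut down to the requested samples, cases, and variables.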

24
4 points on IPUMS: Statistical Confidentiality,
Privacy and Security
  1. Memorandum of Understanding between University of
    Minnesota and each National Statistical Office
  2. License agreement between each Researcher and the
    University of Minnesota
  3. Technical protections applied to the microdata
  4. Why these are good practices (UN-ECE) and best
    practices (Dennis Trewin on-site inspection)

25
A. NSI with U of Minnesota
26
A. NSI with U. of Minnesota (2005)
27
LICENSE
IPUMSi
B. License with researchers: restricted-access,
web-based system
  • Legally-binding license agreement
  • forces would-be snoopers to violate law by which
    they can be fined and jailed
  • protects privacy and confidentiality
  • assures proper use
  • Access limited to
  • Bona-fide researchers (credentials)
  • With a demonstrated scientific need
  • who agree to abide by license restrictions
  • Confidentiality
  • No redistribution
  • Safely secured
  • Alleging that a person has been identified is
    prohibited

28
LICENSE
IPUMSi
B. License with researchers: restricted-access,
web-based system
  • Legally-binding license agreement
  • forces would-be snoopers to violate law
  • protects privacy and confidentiality
  • assures proper use
  • Access limited to
  • Bona-fide researchers (credentialed)
  • With a demonstrated scientific need
  • who agree to abide by license restrictions
  • Confidentiality
  • No redistribution, no commercial use
  • Safely secured
  • Alleging that a person can be or has been
    identified is illegal

31
Apply for Access
37
License valid for 1 year, renewable.
End of application
38
CONFIDENTIALIZES
IPUMSi
C. Technical measures (in addition to legal and
administrative protections)
  • Suppress geographical detail
  • Blur/aggregate sensitive codes
  • Convert dates to ages (blur key vars.)
  • Swap cases between districts
  • Scramble order of records
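
A minimal sketch of how these five measures could be chained is given below. It is an illustration only, not the procedure IPUMS applies: the function name `confidentialize`, the record fields, the 20,000 population floor, the top-code at age 90, and the 5% swap share are all assumptions chosen for the example.

```python
import random
from datetime import date


def confidentialize(records, census_day, district_pop, min_pop=20_000,
                    swap_share=0.05, top_age=90, seed=0):
    """Return an anonymized copy of `records` (a list of dicts)."""
    rng = random.Random(seed)
    out = []
    for rec in records:
        new = dict(rec)  # work on a copy; the source file is untouched
        # Convert date of birth to age in completed years, then drop the date.
        born = new.pop("birth_date")
        had_birthday = (census_day.month, census_day.day) >= (born.month, born.day)
        age = census_day.year - born.year - (0 if had_birthday else 1)
        # Blur a sensitive tail by top-coding very high ages.
        new["age"] = min(age, top_age)
        # Suppress geographical detail for places below the population floor.
        if district_pop.get(new["district"], 0) < min_pop:
            new["district"] = None
        out.append(new)
    # Swap district codes between a small share of randomly paired cases.
    n_pairs = int(len(out) * swap_share / 2)
    picked = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(picked[::2], picked[1::2]):
        out[i]["district"], out[j]["district"] = out[j]["district"], out[i]["district"]
    # Scramble record order so file position carries no information.
    rng.shuffle(out)
    return out
```

Swapping district codes between paired cases leaves every district's record count unchanged, so the measure adds uncertainty about any one record while preserving the marginal distribution.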
39
EUROSTAT statistical confidentiality standards
(Thorogood, 1999) --all endorsed by
IPUMS-International
  • 1. Restrict access to samples
  • 2. Limit geographical detail
  • 3. Re-code unique categories--top and bottom
  • 4. Sign non-disclosure agreement
  • 5. Prohibit redistribution to third parties
  • 6. Prohibit attempts to identify individuals or
    to make any claim to that effect
  • 7. Require users to provide copies of
    publications

40
EUROSTAT statistical confidentiality standards
(Thorogood, 1999) --all endorsed by
IPUMS-International
  • 8. Construct age from birthdate, if necessary
  • 9. Do not identify date of birth
  • 10. Do not identify precise place of birth
  • 11. Migration timing/place not identified in
    detail
  • 12. Identify place of residence by major civil
    division (pop. > 20k, 60k, 100k, 1 million, i.e.,
    national convention)
  • 13. Do sensitivity analysis (not yet)
  • 14. Do confidentiality assessment (not yet)

41
D. IPUMS audited: cited as good practice by
UN-ECE Report (2007, Annex 23, pp. 98-103)
http://www.unece.org/stats/documents/tfcm.htm
42
Good practices (UN-ECE report, see annex 23)
  1. High level of confidence and transparency between
    the researchers (users) and the national
    statistical institutes
  2. The conditions of use are well defined
  3. Sanctions for mis-use are clearly spelled out
  4. Good use is assured by both juridical and
    administrative mechanisms to prevent violations
  5. Sanctions are imposed not only against those who
    misuse the data but also against their
    institutions.
  6. The data are anonymized by highly efficient
    technical means

43
Statistical confidentiality and security: see the
on-site review by Dennis Trewin,
www.hist.umn.edu/rmccaa/ipums-global (click Trewin Report)
  • The best practice for an international
    repository of microdata
  • The security of IPUMS is first class: the
    standard of the best national statistical
    offices
  • in full compliance with the principles and
    recommendations of the ECE

44
3. Integration: Microdata and Metadata
45
IPUMS integration of metadata and microdata
  • Comprehensive documentation, including
  • Data dictionaries and codebooks
  • Complete original source documentation in the
    official language: questionnaires, manuals, etc.
  • All translated to English (from the
    German--thanks again to Martin Podehl!!) and
    converted into a metadatabase for each census
  • Integration ≠ standardization
  • Composite codes (11, 12, 21, 22) ≠ serial codes
    (1, 2, 3, ...) (see next slide)

46
IPUMS microdata integration method: composite
codes (multiple digits). Retains not only
significant distinctions but also integrates
comparable concepts

Code Label Chile 1992 Chile 2002 México 1990 México 2000
0 NIU X X X X
ACTIVE (In Labor Force)
100 EMPLOYED, not specified
110 At work X X X X
111 At work, and 'student' X
112 At work, and 'housework' X
113 At work, and 'seeking work' X
114 At work, and 'retired' X
115 At work, and 'no work' X
116 At work, and 'other' X
117 At work, family holding, not specified
118 At work, family holding, not agricultural
119 At work, family holding, agricultural
120 Have job, not at work last week X X X X
47
Goal of integration coding scheme: assist each
researcher in making informed decisions on
comparability, not to attempt to make the one best
decision for all researchers.
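
One way to picture what multi-digit composite codes offer the researcher is the sketch below. It assumes, as the code table suggests but the slide does not state outright, that each further digit adds detail, so that zeroing trailing digits yields the more general category. The helper `collapse` and the sample names `sample_a` and `sample_b` are hypothetical.

```python
# A few employment-status codes copied from the slide's table.
LABELS = {
    0: "NIU",
    100: "EMPLOYED, not specified",
    110: "At work",
    113: "At work, and 'seeking work'",
    120: "Have job, not at work last week",
}


def collapse(code: int, keep_digits: int, width: int = 3) -> int:
    """Zero out all but the first `keep_digits` digits of a composite code."""
    if not 1 <= keep_digits <= width:
        raise ValueError("keep_digits must be between 1 and width")
    factor = 10 ** (width - keep_digits)
    return (code // factor) * factor


# Hypothetical samples: one records three-digit detail, the other only two.
sample_a = [110, 113, 120, 0]
sample_b = [110, 120, 110, 0]

# The researcher, not the archive, chooses the level at which to compare.
comparable_a = [collapse(c, 2) for c in sample_a]
comparable_b = [collapse(c, 2) for c in sample_b]
```

Here `collapse(113, 2)` gives 110 ("At work") and `collapse(113, 1)` gives 100, so the same variable supports detailed single-country work and coarser cross-country comparison.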
48
In addition
  • Microdata: new high-precision samples not only
    for contemporary censuses but also for historical
    ones (before the 90s)
  • Systematic metadata for all variables
  • Universes
  • Definitions
  • Comparability
  • Dynamic system: facilitates comparing the wording
    of questionnaires and instructions for any
    combination of countries and censuses

49
IPUMS integrated metadata: instantly compare
text and/or image of enumeration forms and
instructions for any combination of countries and
censuses (example: educational attainment)
50
4. Dissemination
51
- Caution -
  • IPUMS microdata are anonymized samples.
  • They are for advanced analysis and research.
  • Use of statistical software is required.
  • Statistical software provides great power.
  • With great power comes great responsibility.
  • IPUMS samples are for analysis.
  • IPUMS samples are not official statistics.

52
6 steps using https://international.ipums.org/international
53
Conclusion: IPUMS strengths and challenges, plus
7 golden rules for promoting the microdata revolution
54
IPUMS-International strengths
  1. Uniform legal authorization with national
    statistical authorities
  2. Access restricted to academics with need who
    agree to abide by stringent confidentiality
    protections
  3. Sanctions against individual and
    institution: denial of access to all microdata for
    the entire institution
  4. Experienced integration teams
  5. Proven web-based distribution system
  6. High user satisfaction with microdata and metadata
  7. Sustainable funding: NSF, NIH

55
5 Challenges
  • Microdata to recover (30 countries), integrate
    (60 countries)
  • 2010 round of censuses (100 countries)
  • Tabulator (research tool, not official stats)
  • GIS
  • High security laboratory for sensitive,
    comprehensive microdata

See tomorrow's presentation by Albert Esteve
for more on Tabulator
56
7 golden rules for the global microdata
revolution
  • Respect restricted-access conditions of use
  • protect confidentiality
  • share data only with registered users
  • Study both source documentation and metadata
  • Original source: census forms, instructions to
    enumerators, etc.
  • Integrated metadata: samples, variables,
    comparability discussions
  • Construct extracts judiciously
  • extract only needed countries, censuses,
    variables, sub-pops
  • use sample size and/or subsample features to keep
    samples small
  • Use weights: either households or individuals
    (geographical strata power)
  • Analyze carefully: proper statistical techniques,
    keeping in mind data quality, sample error
  • Cite properly: IPUMS and National Statistical
    Agencies
  • Share publications: IPUMS and National
    Statistical Agencies
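
The rule on weights can be illustrated with a short sketch. The field names `empstat` and `weight` and the figures are made up for the example, and the function `weighted_shares` is hypothetical; the point is only that each sampled person stands for `weight` people in the population.

```python
from collections import defaultdict


def weighted_shares(records, by, weight="weight"):
    """Share of the weighted population falling in each category of `by`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[by]] += rec[weight]
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


# Three sampled persons with unequal weights (illustrative values only).
persons = [
    {"empstat": 110, "weight": 10},
    {"empstat": 110, "weight": 30},
    {"empstat": 120, "weight": 60},
]
shares = weighted_shares(persons, by="empstat")
```

Counting records without weights would put two thirds of this sample in category 110, while the weighted share is 0.4, which is why unweighted tabulations of a sample with unequal weights can mislead.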

57
Thank you!!
rmccaa_at_umn.edu