Title: The census in global perspective and the coming census microdata revolution. Robert McCaa
1 The census in global perspective and the coming census microdata revolution
Robert McCaa, Steven Ruggles
Minnesota Population Center, http://www.ipums.org
IPUMS International, funded by the National Science Foundation
2 Subtext: Why should Nordic countries participate in a project to preserve the world's census microdata and help make them usable?
- Longest historical series of census microdata in the world
- Cross-national research on a global scale requires representation of all cultural regions
- An intriguing demographic and historical laboratory
- Large pool of scientific talent with global concerns
- Persisting cultural and scientific ties with Minnesota (would, for example, the U. of Texas be as interested?)
3 Globalization of the census: the coming census microdata revolution
- 1. Introduction: census, census microdata
- 2. The population census goes global: coverage, periodicity, and content
- 3. Liberating census microdata: preservation, anonymization, integration, dissemination
- 4. Statistical confidentiality and census samples: a 36-year-long perfect record
- 5. International norms of statistical confidentiality
- 6. Harmonizing and disseminating scientifically anonymized census samples: IPUMSi
4 1. Introduction
The census: what is it?
Census microdata: what are they?
How can they be made usable? Why should we care?
5 16th-century census of Mexico (Nahuatl, 1530s). Here is the home of one...
(from the Museum of Anthropology, Mexico City)
Original ms., transcribed, translated, digitized
When is a census a census? Goyer (1986)
- 1. National legal authority
- 2. Defined enumeration area
- 3. Complete coverage
- 4. Simultaneous enumeration
- 5. Individual enumeration
- 6. Periodic enumeration
- 7. Publication of results
- 8. Dissemination of results
7 An Aztec extended family, 1530: 5 conjugal units, 4 generations, 3 married brothers
8 450 years later: an example of a patrilateral household from rural Morelos, 1990: 5 conjugal unions, 3 generations
(household diagram label: not kin)
9 From examples to percentages: have there been changes in 4 1/2 centuries?
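Moving from examples to percentages means computing household measures, such as the number of conjugal units and generations, for every household in a sample and then tallying them. A minimal Python sketch, assuming each person record carries the person numbers of a co-resident spouse, mother and father (the names `sploc`, `momloc` and `poploc` are modeled on IPUMS-style pointer variables and are assumptions here), and counting a conjugal unit simply as a co-resident couple (also an assumption; definitions that include lone parents with children give different totals):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    pernum: int                   # person number within the household
    sploc: Optional[int] = None   # person number of co-resident spouse (hypothetical name)
    momloc: Optional[int] = None  # person number of co-resident mother (hypothetical name)
    poploc: Optional[int] = None  # person number of co-resident father (hypothetical name)


def conjugal_units(household):
    """Count co-resident couples; each couple is counted once (via the lower pernum)."""
    return sum(1 for p in household if p.sploc is not None and p.pernum < p.sploc)


def generations(household):
    """Length of the longest co-resident parent-to-child chain, in generations."""
    by_num = {p.pernum: p for p in household}
    memo = {}

    def depth(n):
        if n not in memo:
            p = by_num[n]
            parents = [q for q in (p.momloc, p.poploc) if q in by_num]
            memo[n] = 1 + max((depth(q) for q in parents), default=0)
        return memo[n]

    return max(depth(p.pernum) for p in household)


def percent_multi_unit(households):
    """Percentage of households containing more than one conjugal unit."""
    multi = sum(1 for hh in households if conjugal_units(hh) > 1)
    return 100.0 * multi / len(households)


# Hypothetical toy data: a three-generation household with two couples,
# and a simple nuclear household.
extended = [
    Person(1, sploc=2), Person(2, sploc=1),
    Person(3, sploc=4, momloc=2, poploc=1), Person(4, sploc=3),
    Person(5, momloc=4, poploc=3),
]
nuclear = [Person(1, sploc=2), Person(2, sploc=1), Person(3, momloc=2, poploc=1)]

print(conjugal_units(extended), generations(extended))  # 2 3
print(percent_multi_unit([extended, nuclear]))          # 50.0
```

Applied to two full samples, the same functions would give comparable percentages for the 1530 and 1990 populations, provided both are coded with the same pointer conventions.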
10 Census microdata of the late 20th century: What are they? Who bears preservation responsibility? Who will make them usable?
Example records, one line per person (fields labeled on the slide: person number, age, sex):
12100102600700720000011210000104
22200202600700720000011210000104
32300100600700720000012123000000
42300200400700000000000000000000
52300200200700000000000000000000
62300200000700000000000000000000
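Making such records usable starts with a record layout (codebook) stating which columns hold which variable. As an illustration only, the Python sketch below decodes the six example records above with a layout inferred from the data themselves: person number in column 1, a sex code in column 6, and a three-digit age in columns 7-9. These positions are an assumption, not the census's documented layout, and the remaining columns are left undecoded.

```python
# Example records copied from the slide.
RECORDS = [
    "12100102600700720000011210000104",
    "22200202600700720000011210000104",
    "32300100600700720000012123000000",
    "42300200400700000000000000000000",
    "52300200200700000000000000000000",
    "62300200000700000000000000000000",
]

# ASSUMED layout inferred from the records, as (name, start, end) using
# 0-based, end-exclusive slice positions. A real codebook would replace this.
LAYOUT = [
    ("pernum", 0, 1),  # person number within the household (one digit here)
    ("sex", 5, 6),     # sex code; the meaning of 1 and 2 is assumed, not documented
    ("age", 6, 9),     # age in completed years, zero-padded
]


def parse_record(line):
    """Decode one fixed-width record into a dict of integer fields."""
    return {name: int(line[start:end]) for name, start, end in LAYOUT}


people = [parse_record(r) for r in RECORDS]
for p in people:
    print(p)
```

Under this assumed layout the household's ages are 26, 26, 6, 4, 2 and 0. Keeping the layout as data rather than hard-coding slices makes it easy to swap in the real codebook for each census.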
Census microdata
Censuses are costly
Public goods should be democratized
Where microdata are available, they are used
12 2. The population census goes global
- Coverage becomes universal (thanks to A. N. Kiær, Statistics Norway, who promoted the globalization of the census at the beginning of the 20th century)
- Content becomes uniform
- Decennial censuses become the norm
13 Population censuses became universal in the 20th century.
Will census microdata ... in the 21st?
- 153 countries with more than 1 million population in 2000
- 2000 round figures are provisional
14 Content increasingly uniform; the census is the principal source of population information: social variables
15 Content increasingly uniform: education and migration variables
16 Content increasingly uniform: demographic and economic variables
17 Decennial censuses are the rule (1945-2004): of 153 countries with more than 1 million population, totaling 6 billion people in 2000
- At least one census per decade: 66 countries, 50% of world population
- Missed a single decennial enumeration: 43 countries, 38% of world population
- Missed 2 or 3 enumerations: 32 countries, 10% of world population
- Fewer than 3 enumerations: 12 countries, 2% of world population
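The four categories partition both the 153 countries and the world total, which a quick arithmetic check confirms. Multiplying each share by the 6 billion total gives the approximate population in each group; approximate because the shares are rounded and the 2000 round figures are provisional.

```python
# Figures from the slide: 153 countries, about 6 billion people in 2000.
WORLD_POP_BILLIONS = 6.0
categories = {
    "at least one census per decade": (66, 50),   # (countries, % of world pop.)
    "missed a single decennial enumeration": (43, 38),
    "missed 2 or 3 enumerations": (32, 10),
    "fewer than 3 enumerations": (12, 2),
}

total_countries = sum(n for n, _ in categories.values())
total_share = sum(s for _, s in categories.values())
print(total_countries, total_share)  # 153 100

for label, (n, share) in categories.items():
    # Implied population = share of the 6-billion total (shares are rounded).
    print(f"{label}: {n} countries, ~{WORLD_POP_BILLIONS * share / 100:.2f} billion people")
```

So roughly 3 billion people lived in countries with an unbroken decennial record, and about 0.12 billion in countries with fewer than three enumerations.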
18 On a millennial scale, censuses and census microdata survive for only a short but significant period
20 "...official statistics that meet the test of practical utility are to be compiled and made available on an impartial basis by official statistical agencies to honor citizens' entitlement to public information." -- UN Statistical Commission, 1994
21 IPUMSi helps in five ways
- 1. Inventory the world's census microdata
- 2. Preserve endangered microdata and documentation
- 3. Anonymize census microdata to preserve statistical confidentiality, using the highest standards (Statistics Netherlands)
- 4. Integrate datasets of selected countries using UN, Eurostat and other standards
- 5. Disseminate the database free of charge, with complete copies to all partners
Integrated Public Use Microdata Series - International
22 IPUMSi INVENTORIES
- Microdata for any population or administrative division: nation, province, district, city, ethnic group, etc.
- Example: Latin America
- - 20 countries
- - 67 censuses inventoried
- - sample densities of 1-100%
- - 100,000 to 150 million cases
- - 19th century: 2 censuses; 1960s: 14; 1970s: 17; 1980s: 16; 1990s: 17
- Found complete census data for Colombia 1973 and 16 other countries
23 IPUMSi PRESERVES
UN Demographic Center for Latin America (CELADE, Santiago, Chile): 3,000 microdata tapes, and their metadata (documentation), to be preserved
24 Preserve against accident, deterioration and technological obsolescence
- Microdata:
- - transfer to stable media
- - use standard data storage protocols
- - entrust copies to at least two depositories
- Metadata: collect, catalogue, and reproduce
- - Enumeration forms (preserve all versions used)
- - Enumerator and data processing instructions
- - Codebooks (photocopies and scanned images)
- - Technical studies, evaluations, reports
UN Statistics Division: entire archive deposited, to be scanned
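Copies held by two depositories only protect against deterioration if someone periodically checks that they are still identical. One common way to do this (a fixity check; a technique the slide does not itself prescribe) is to record a cryptographic checksum when a file is transferred to stable media and to recompute it later. A minimal Python sketch using the standard-library `hashlib`; the file names and contents are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(reference_digest, copies):
    """Map each depository copy to True if its digest still matches the reference."""
    return {Path(p).name: sha256_of(p) == reference_digest for p in copies}


# Demonstration with temporary files standing in for two depositories.
with tempfile.TemporaryDirectory() as d:
    original = Path(d, "master.dat")
    original.write_bytes(b"12100102600700720000011210000104\n")
    copy_a = Path(d, "depository_a.dat")
    copy_a.write_bytes(original.read_bytes())     # intact copy
    copy_b = Path(d, "depository_b.dat")
    copy_b.write_bytes(b"1210010260070072000\n")  # simulated truncated copy
    reference = sha256_of(original)               # recorded at transfer time
    status = verify_copies(reference, [copy_a, copy_b])

print(status)
```

Storing the reference digests alongside the catalogued metadata lets either depository detect a damaged copy and restore it from the other.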
26 How anonymized census samples became a standard statistical product
- US Census Bureau
- - 1960 census: 0.1% public use microdata series
- - 1970 census: six 1% samples, harmonized with 1960
- - 1984: 1940 and 1950 1% samples
- - 1980, 1990: samples of varying densities and contents
- CELADE, Latin America
- - 1960s: 16 countries, densities 1-5%
- - 1970s: 19 countries, densities 1-10%
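Sample density is the fraction of the enumerated population included in a public-use sample (0.1% means 1 in 1,000). A minimal Python sketch of one way to draw such a sample: systematic 1-in-k selection of whole households after a random start, so that members of a household stay together. This is an illustrative assumption, not the documented design of any of the samples listed above, whose designs varied.

```python
import random


def systematic_household_sample(household_ids, density, seed=None):
    """Select every k-th household after a random start, where k = round(1/density).

    Whole households are kept, so relationships among members are preserved.
    """
    if not 0 < density <= 1:
        raise ValueError("density must be in (0, 1]")
    k = round(1 / density)
    start = random.Random(seed).randrange(k)
    return household_ids[start::k]


households = list(range(1, 100_001))  # hypothetical household identifiers
sample = systematic_household_sample(households, density=0.01, seed=42)
print(len(sample))  # 1000
```

Sampling households rather than individuals keeps family structure analyzable, at the cost of slightly larger clustering effects in variance estimates.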
27 How anonymized census samples became a standard statistical product
- Canada
- - 1971, 1976, 1981, 1986, 1991, 1996: varying designs and densities
- - 1996: the Data Liberation Initiative led to an explosion of usage in research and teaching
- UK
- - 1991: 2% of individuals, 0.5% of households; hundreds of publications, thousands of users
- - 2001: double the densities, because confidentiality assessments were too conservative
28 Risk assessment of statistical confidentiality
- Take into account error, coding variability and changes in personal characteristics over time
- Dale and Elliott, JRSS-A (forthcoming): "For a user of an outside database, attempting this sort of match with no opportunity for verification would prove fruitless. In the first place, the small degree of expected overlap would be a considerable deterrent to an intruder. However, if a match between the two files was attempted the large number of apparent matches would be highly confusing as an intruder would have no way of checking correct identification."
29 Statistical confidentiality in the USA: a brief history
- Before 1954
- - 1850: "exclusively for the use of the government, and not to be used...to the gratification of curiosity..."
- - 1920s: access to data on individuals denied
- - 1942: refused to supply the War Dept. with addresses of Japanese-Americans
- After 1954
- - census microdata do not reveal the identities of individuals
- - basic geographical identifiers, low sample densities, masking, swapping, top-coding, re-coding
- In practice, not a single breach or allegation of a breach!
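Two of the listed techniques are simple enough to show directly. Top-coding replaces values above a threshold with the threshold, hiding a few extreme (and therefore identifying) values; re-coding collapses detailed codes into broader groups. A minimal Python sketch; the thresholds (an income ceiling, 5-year age groups, an open-ended 85+ group) are illustrative assumptions, not the Census Bureau's actual rules.

```python
def top_code(value, ceiling):
    """Replace any value above the ceiling with the ceiling itself."""
    return min(value, ceiling)


def recode_age(age, width=5, top=85):
    """Collapse single years of age into groups such as '25-29'.

    Ages at or above `top` share one open-ended group (an assumed cut-off).
    """
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


incomes = [12_000, 48_500, 2_750_000]             # hypothetical incomes
print([top_code(x, 150_000) for x in incomes])    # [12000, 48500, 150000]
print([recode_age(a) for a in (0, 26, 84, 97)])   # ['0-4', '25-29', '80-84', '85+']
```

Both transformations are applied to the released file only; the national statistical office keeps the original values.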
30 Heightened concerns about confidentiality in the USA
- Assault on privacy by businesses
- Distrust of government
- The use of census microdata has never been in question. Yet any possible perception of misuse must be avoided to retain the confidence and cooperation of citizens.
- Pro-active strategy
- - Publicize confidentiality safeguards
- - Offer a variety of microdata products: higher risks, higher security
- - Data enclaves: expensive, low usage, exceedingly detailed microdata
32 "'Statistical confidentiality' shall mean the protection of data related to single statistical units which are obtained directly for statistical purposes or indirectly from administrative or other sources against any breach of the right to confidentiality. It implies the prevention of non-statistical utilization of the data obtained and unlawful disclosure." -- Council Regulation (EC) No 322/97 of 17 February 1997
33 Statistical confidentiality standards in Eurostat countries ( in IPUMSi consortium)
- Norway: Statistics Norway is prohibited from publishing or disclosing data from which information about individual persons or firms can be derived. Researchers may be given access to such information under strict rules and conditions. Guidelines provided by the Norwegian Data Inspectorate form the framework for internal management of data security.
- Other countries with strict provisions: Austria, Canada, Denmark, Finland, France, Germany, Ireland, Netherlands, Sweden
34 Anonymized census microdata sample availability for European countries ( in IPUMSi consortium, negotiating)
- 15 countries available via PAU, 1990 round (3 in IPUMSi):
- - Belgium, Czech Republic, Estonia, Finland, Hungary, Italy, Latvia, Lithuania, Norway, Poland, Spain, Sweden, Switzerland, Turkey, UK
- 11 countries not available via PAU (2 in IPUMSi):
- - Austria, Croatia, Denmark, France, Germany, Iceland, Ireland, Netherlands, Portugal, Slovak Republic, Slovenia
35 EUROSTAT statistical anonymity standards (Thorogood, 1999), all accepted by IPUMSi
- 1. Small sample size
- 2. Limited geographical detail
- 3. Top and bottom coding of unique categories
- 4. Signed non-disclosure agreement
- 5. Prohibit redistribution of datasets to third parties
- 6. Prohibit attempts to identify individuals or the making of any claim to that effect
- 7. Require users to provide copies of publications
36 EUROSTAT statistical anonymity standards (Thorogood, 1999): all accepted by IPUMSi, and more
- 8. Age (constructed, where necessary)
- 9. Never identify date of birth
- 10. Never identify place of birth
- 11. Migration timing and place not identified in detail
- 12. Place of residence identified by major civil division (population > 60k, 120k, 250k or 1 million, according to national rule)
- 13. Sensitivity analysis of variables by national experts
- 14. Confidentiality assessment by national experts
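Rules 8-10 and 12 are mechanical enough to sketch. The Python below is a simplified illustration, not the procedure IPUMSi or any national office actually uses: it constructs age in completed years at the census reference date, drops the date and place of birth, and replaces the place of residence with a catch-all code when the area's population does not exceed the national threshold. The field names, the `OTHER` code and the area populations are hypothetical.

```python
from datetime import date


def age_at(birth, reference):
    """Age in completed years on the census reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def anonymize(record, reference_date, area_population, min_population):
    """Return a copy of the record following a simplified version of rules 8-10 and 12."""
    out = dict(record)
    out["age"] = age_at(out.pop("birth_date"), reference_date)  # rules 8-9
    out.pop("birthplace", None)                                 # rule 10
    if area_population.get(out["residence"], 0) <= min_population:
        out["residence"] = "OTHER"                              # rule 12
    return out


populations = {"A": 450_000, "B": 35_000}  # hypothetical areas
person = {"birth_date": date(1964, 9, 30), "birthplace": "X", "residence": "B", "sex": 2}
print(anonymize(person, date(1990, 3, 15), populations, min_population=60_000))
```

With a 60,000 threshold, the person's small area is collapsed to `OTHER`, and only the constructed age (25) survives from the date of birth.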
37 International Monetary Fund's General Data Dissemination System: 52 countries with uniform standards
- All embrace strict standards of statistical confidentiality
- Prohibit disclosure of information which may identify individuals or entities
- 37 countries distribute anonymized census microdata samples
39 IPUMSi
Making the data usable... and used.
IPUMSi, 1999-2004: 20 countries, 1850-2000
40 IPUMSi PAYS
National experts in each country are contracted to:
- Assemble microdata and documentation
- Develop samples to minimize confidentiality risks and maximize robustness
- Design a national integration plan: census-by-census, concept-by-concept, code-by-code
- Write integrated documentation
41 IPUMSi INTEGRATES
Standard: UN/Eurostat Principles and Recommendations...
Census documentation compiled for Colombian microdata
Photos from the Colombia integration project, February-March 2000: 4 experts from DANE (census office), 7 academics (3 universities)
42 IPUMSi integration principles
- 1. Respect absolute anonymity
- 2. Preserve all original data, except adjustments to ensure privacy (top-codes, blurring, masking, re-ordering, etc.)
- 3. Harmonize codes across countries: occupation (ISCO, HISCO; detailed and general), education (ISCED), family (IPUMS), etc.
- 4. Enhance with constructed variables
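Harmonizing codes amounts to writing, census by census and code by code, a crosswalk from each national classification to one integrated scheme, while keeping the original code alongside it (principle 2). A minimal Python sketch of such a lookup; the four-level education scheme, the Colombian code lists and the "unknown" code 9 are all invented for illustration and are not actual IPUMSi or ISCED codes.

```python
# Hypothetical harmonized education scheme (not actual IPUMSi or ISCED codes).
HARMONIZED_LABELS = {1: "less than primary", 2: "primary", 3: "secondary",
                     4: "university", 9: "unknown"}

# Hypothetical crosswalks, one per census: national code -> harmonized code.
CROSSWALKS = {
    ("Colombia", 1973): {0: 1, 1: 2, 2: 3, 3: 4},
    ("Colombia", 1993): {10: 1, 20: 2, 21: 2, 30: 3, 40: 4},
}
UNKNOWN = 9  # used when a national code has no crosswalk entry


def harmonize(record, country, year):
    """Add a harmonized education code while keeping the original national code."""
    out = dict(record)
    crosswalk = CROSSWALKS[(country, year)]
    out["educ_harmonized"] = crosswalk.get(record["educ_national"], UNKNOWN)
    return out


r73 = harmonize({"pernum": 1, "educ_national": 2}, "Colombia", 1973)
r93 = harmonize({"pernum": 1, "educ_national": 21}, "Colombia", 1993)
print(r73, HARMONIZED_LABELS[r73["educ_harmonized"]])
print(r93, HARMONIZED_LABELS[r93["educ_harmonized"]])
```

Because the original code is retained, users can always check or redo the harmonization for their own purposes.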
43 IPUMSi INTEGRATES: 10 projects started in the first 18 months
- USA: 1850-1880, 1900-2000
- France: 1962, 1968, 1975, 1982, 1990
- Norway: 1801, 1865, 1875, 1900; negotiating 1960, 1970, 1980, 1990, 2001
- Canada: 1871, 1881, 1901; negotiating 1961-2001
- United Kingdom: (1851, 1881), 1991; negotiating 1961, 1971, 1981, 2001
- Argentina: 1869, 1895
- Colombia: 1964, 1973, 1985, 1993, 2003
- Vietnam: 1989, 1999
- Hungary: 1970, 1980, 1990, 2000
44 IPUMSi INTEGRATES: 5 projects planned
- Mexico: 1960, 1970, 1980, 1990, 2000
- Spain: 1981, 1991, 2001
- Brazil: 1960, 1970, 1980, 1991, 2001
- China: 1982, 1990, 2000
- Kenya: 1989, 1999
3 negotiations underway
- Ghana: 1984, 2000
- Italy: 1981, 1991, 2001
- Austria: 1971, 1981, 1991, 2001
45 ?? IPUMSi: 7 future possibilities
Country: census microdata
- a. 1860, 1870, 1880, 1950, 1960, 1970, 1980, 1990, 2000
- b. 1961, 1971, 1981, 1991, 2001
- c. 1961, 1971, 1976, 1981, 1986, 1991, 1996
- d. 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995
- e. 1960, 1966, 1970, 1975, 1980, 1985, 1990, 1995
- f. 1971, 1981, 1991, 2001
- g. 1970, 1980, 1990, 2000
- and ... ???
46 IPUMSi ANONYMIZES
Using the highest standards currently available: technical (Statistics Netherlands) and administrative (license agreement)
Imagine a new statistical product: a scientifically anonymized census microdata sample made up of unidentifiable individuals...
47 IPUMSi preserves statistical confidentiality (in addition to NSO safeguards)
- 1. Construct small samples
- 2. Suppress geographical detail (minor civil divisions and other areas with fewer than 100,000 inhabitants), date of birth, 3-4 digit occupational codes, etc.
- 3. Blur codes for sensitive variables where identity might be compromised (income)
- 4. Top-code income, education, etc.
- 5. Swap a small fraction of records
- 6. Assess confidentiality risks for unique records in all defined geographical areas (ARGUS, Statistics Netherlands)
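Risk assessment of the kind done with Statistics Netherlands' ARGUS software starts from a simple question: which records are unique in the sample on a set of identifying (key) variables within each geographical area? A much-simplified Python sketch of that first step, counting sample-unique key combinations per area. The key variables and data are hypothetical, and ARGUS itself goes further, for example by examining combinations of subsets of the key variables.

```python
from collections import Counter


def sample_uniques(records, keys, area_key):
    """Per area, count records whose combination of key variables occurs only once."""
    combos = Counter((r[area_key],) + tuple(r[k] for k in keys) for r in records)
    uniques = Counter()
    for combo, n in combos.items():
        if n == 1:
            uniques[combo[0]] += 1  # combo[0] is the area code
    return dict(uniques)


# Hypothetical anonymized records: area, age, sex.
records = [
    {"area": "A", "age": 26, "sex": 1},
    {"area": "A", "age": 26, "sex": 1},
    {"area": "A", "age": 97, "sex": 2},
    {"area": "B", "age": 26, "sex": 1},
]
print(sample_uniques(records, keys=["age", "sex"], area_key="area"))
```

Records flagged as unique are the candidates for the other safeguards in the list: coarser geography, re-coding, top-coding or swapping.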
48 Repositories of anonymized census microdata samples for scientific research
- ICPSR, University of Michigan
- ACAP, University of Pennsylvania
- CELADE, Centro Latino Americano de Demografía, Santiago, Chile
- ECE/PAU, Population Affairs Unit, Geneva, Switzerland
- EWC, East-West Center, U. of Hawaii
- IPUMSi, University of Minnesota
- Will others (a Nordic institution?) join the effort?
49 IPUMSi DISSEMINATES
- International web-based access system
- End-user license agreement protects privacy and confidentiality and assures proper use
- User selects countries, cases, variables, and samples, making cross-national research possible
- Open-architecture software and mirror sites available to all partners
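Behind a web-based extract system, the user's choices of samples, cases and variables reduce to a filter over the integrated records plus a projection onto the selected variables. A minimal Python sketch of that logic only; the sample identifiers and variable names are hypothetical, and a real system would also check the user's license agreement and deliver documentation with the extract.

```python
# Hypothetical integrated records from three census samples.
RECORDS = [
    {"sample": "co1973", "country": "Colombia", "age": 26, "sex": 1, "educ": 3},
    {"sample": "co1973", "country": "Colombia", "age": 6, "sex": 1, "educ": 1},
    {"sample": "fr1990", "country": "France", "age": 41, "sex": 2, "educ": 4},
    {"sample": "vn1999", "country": "Vietnam", "age": 33, "sex": 2, "educ": 2},
]


def make_extract(records, samples, variables, case_filter=lambda r: True):
    """Keep records from the selected samples that pass the case filter,
    retaining only the selected variables."""
    return [
        {v: r[v] for v in variables}
        for r in records
        if r["sample"] in samples and case_filter(r)
    ]


adults = make_extract(RECORDS, {"co1973", "fr1990"}, ["country", "age", "sex"],
                      case_filter=lambda r: r["age"] >= 18)
print(adults)
```

Because every sample shares the same integrated variable names, one request can span several countries, which is what makes cross-national extracts possible.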
50 Why should Nordic countries participate now?
- Legal and scientific foundations are in place: EUROSTAT, France, Austria, UK, etc.
- The project has been underway for 18 months of a 5-year project; if resources are required, budget planning must begin soon.
- Historical census microdata projects are well advanced: 1801, 1865 (100% club), 1875, 1900.
- Time to turn to contemporary census microdata
51 Additional information at http://www.ipums.org
Thank you
52 Work plan, part II: make census microdata usable
- 3. Integrate (from March 2000)
- - National partners: integrate phase I countries using UN/Eurostat Principles and Recommendations
- - Help to design prototype
- - Analyze all concepts, variables and codes of census schedules for 30 target countries
- - Help to implement for phase I and II countries
- 4. Disseminate (through October 2004)
- - Design international data access engine
- - Implement with phase I and II countries