Title: Census Processing Procedures
1 Census Processing Procedures
Matt Sobek
Minnesota Population Center
Funded by the National Science Foundation
2 IPUMS Work Process
1. Inventory
2. English Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality Measures
6. Data Harmonization
7. Data Improvement
8. Dissemination
3 IPUMS Work Process
1. Inventory
For each sample:
- data
- data dictionary
- census questionnaire and instructions
- sample design
- census design
- published tabulations, post-enumeration surveys, demographic analyses (when available)
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
4 IPUMS Work Process
1. Inventory
2. Translation
- Census questionnaire
- Census instructions
- Data dictionary codes and labels
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
5 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
a) Create labels/set-up file
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
6 Labels File, Costa Rica 2000
7 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
a) Create labels/set-up file
b) Analyze data
- Unique IDs or other means of distinguishing household membership
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
8 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
a) Create labels/set-up file
b) Analyze data
- Unique IDs or other means of distinguishing household membership
c) Reformat the data
- Convert to household-person hierarchical structure
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
9 Reformat Rectangular Sample
(Person records only; household data duplicated on person records)
(Brazil 1980)
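The conversion from a rectangular file to a household-person hierarchy can be sketched as follows. This is a minimal illustration, assuming person rows that repeat the household fields and are already sorted by household; the field names (`hh_id`, `region`, `age`) and the `H`/`P` record tags are hypothetical and are not the Brazil 1980 layout.

```python
from itertools import groupby

def to_hierarchical(person_rows, hh_key, hh_fields):
    """Turn person rows that repeat household data into H and P records.

    Rows are assumed to be sorted so that members of a household are adjacent.
    """
    out = []
    for hh_id, members in groupby(person_rows, key=lambda r: r[hh_key]):
        members = list(members)
        # Household fields are duplicated on every member, so take them once.
        household = {f: members[0][f] for f in hh_fields}
        out.append(("H", {hh_key: hh_id, **household}))
        for row in members:
            # Person records keep the household key but drop the repeated fields.
            person = {k: v for k, v in row.items() if k not in hh_fields}
            out.append(("P", person))
    return out

rows = [
    {"hh_id": 1, "region": "N", "age": 40},
    {"hh_id": 1, "region": "N", "age": 12},
    {"hh_id": 2, "region": "S", "age": 67},
]
records = to_hierarchical(rows, "hh_id", ["region"])
```

Each household record is followed by its person records, which is the ordering the later steps in these sketches assume.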
10 Reformat Dwelling-Household-Person Sample
(Separate dwelling and household records)
(Chile 1992)
11 Reformat Dwelling-Person Sample
(Multi-household dwellings; no separate household record)
(Colombia 1993)
12 Merge Separate Household and Person Files
Household File
Person File
(Brazil 2000)
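Merging a separate household file with a person file might look like the sketch below. It assumes both files share a household identifier, here the hypothetical field `hh_id`; the actual Brazil 2000 linkage keys are not given in the slides.

```python
def merge_files(households, persons, key="hh_id"):
    """Interleave each household record with its person records.

    Returns the merged record list plus any person rows whose household
    is missing, so they can be reviewed rather than silently dropped.
    """
    by_household = {}
    for p in persons:
        by_household.setdefault(p[key], []).append(p)

    merged = []
    for h in households:
        merged.append(("H", h))
        merged.extend(("P", p) for p in by_household.pop(h[key], []))

    # Whatever is left refers to a household id not in the household file.
    orphans = [p for rows in by_household.values() for p in rows]
    return merged, orphans

households = [{"hh_id": 10, "rooms": 3}, {"hh_id": 11, "rooms": 1}]
persons = [
    {"hh_id": 10, "age": 35},
    {"hh_id": 10, "age": 4},
    {"hh_id": 99, "age": 50},
]
merged, orphans = merge_files(households, persons)
```

Returning unmatched person rows separately is a design choice of this sketch; it keeps linkage problems visible for the error-flagging step.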
13 Reformat Individual-level Data
(Individuals only; not organized in households)
(Mexico 1960)
14 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
a) Create labels/set-up file
b) Analyze data
- Unique IDs or other means of distinguishing household membership
c) Reformat the data
- Convert to household-person hierarchical structure
d) Identify and flag errors in structure
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
15 Flags Identifying Structural Issues, Chile 1970
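Structural flags of this kind could be derived along the following lines. The three checks shown (no persons, no head, more than one head) and the `relate` field are assumptions chosen for illustration; the flags actually used for Chile 1970 are not listed in the slides.

```python
def group_households(records):
    """Group a flat ("H" | "P", dict) record stream into (household, persons)."""
    groups = []
    for rectype, rec in records:
        if rectype == "H":
            groups.append((rec, []))
        elif groups:
            # Person records before the first household record are ignored here.
            groups[-1][1].append(rec)
    return groups

def structure_flags(persons):
    """Return hypothetical structural flags for one household's person list."""
    heads = sum(1 for p in persons if p.get("relate") == "head")
    return {
        "no_persons": not persons,
        "no_head": bool(persons) and heads == 0,
        "multiple_heads": heads > 1,
    }

records = [
    ("H", {"hh_id": 1}),
    ("P", {"relate": "head"}),
    ("P", {"relate": "child"}),
    ("H", {"hh_id": 2}),
    ("H", {"hh_id": 3}),
    ("P", {"relate": "head"}),
    ("P", {"relate": "head"}),
]
flagged = {
    hh["hh_id"]: structure_flags(people)
    for hh, people in group_households(records)
}
```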
16 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
a) Formerly, systematic samples
- We developed a household-substitution technique to exclude corrupt records during sampling
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
17 Sampling Procedure, Colombia 1973
(Diagram: successive households marked "Take" or "No" to show which enter the sample.)
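A systematic sample with household substitution could be sketched as below. The substitution rule is an assumption, since the slides do not state it: when the selected household is corrupt, the next clean household in the same sampling interval is taken instead. The `corrupt` field and the function name are hypothetical.

```python
def systematic_sample(households, interval, start=0,
                      is_corrupt=lambda h: h["corrupt"]):
    """Take every `interval`-th household, substituting for corrupt ones.

    Assumed rule (not from the source): when the selected household is
    corrupt, take the next clean household before the following
    selection point. If the whole interval is corrupt, nothing is taken.
    """
    sample = []
    for i in range(start, len(households), interval):
        # The selected household comes first, then its possible substitutes.
        for candidate in households[i:i + interval]:
            if not is_corrupt(candidate):
                sample.append(candidate)
                break
    return sample

# Nine households; households 3 and 4 are marked corrupt.
households = [{"id": n, "corrupt": n in {3, 4}} for n in range(9)]
sample = systematic_sample(households, interval=3)
sampled_ids = [h["id"] for h in sample]
```

Here the second selection point falls on a corrupt household, so the sketch substitutes the first clean household that follows it.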
18 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
a) Formerly, systematic samples
- We developed a household-substitution technique to exclude corrupt records during sampling
b) Stratified samples
- Variables for variance estimation
- Develop strata for each sample using geography, ethnicity, household size, household type, and socioeconomic status, adjusted as necessary for census
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
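Stratified sampling with a stratum label kept for variance estimation might be sketched as follows. The stratifying fields (`region`, `hh_size`), the equal sampling fraction, and the rule of taking at least one household per stratum are illustrative assumptions, not the IPUMS design.

```python
import random
from collections import defaultdict

def stratified_sample(households, strata_keys, fraction, seed=0):
    """Draw about `fraction` of the households from every stratum.

    Each sampled household is returned with a stratum label that can be
    carried into the file for variance estimation.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for h in households:
        strata[tuple(h[k] for k in strata_keys)].append(h)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        for h in rng.sample(members, n):
            sample.append({**h, "stratum": label})
    return sample

households = [
    {"id": i, "region": "N" if i < 40 else "S", "hh_size": 1 + i % 2}
    for i in range(100)
]
sample = stratified_sample(households, ["region", "hh_size"], fraction=0.1)
```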
19 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
Five measures, as required:
- Limit geographic specificity
- Swap across geographic units
- Randomize order within geographies
- Merge small variable categories
- Top-code sensitive numeric variables
6. Harmonization
7. Data Improvement
8. Dissemination
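Two of these measures, top-coding and merging small categories, can be illustrated with a short sketch. The cap, the minimum cell count, and the residual code are hypothetical values; the thresholds applied to any actual sample are not given in the slides.

```python
from collections import Counter

def top_code(values, cap):
    """Replace any value above `cap` with `cap`."""
    return [min(v, cap) for v in values]

def merge_small_categories(codes, min_count, other_code):
    """Recode categories with fewer than `min_count` cases to `other_code`."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else other_code for c in codes]

incomes = [1200, 5400, 98000, 250000]
capped = top_code(incomes, cap=100000)

# Codes 7 and 8 each occur once, so they fall below the minimum count.
categories = [1, 1, 1, 2, 2, 2, 7, 8]
merged = merge_small_categories(categories, min_count=3, other_code=9)
```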
20 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
a) Data translation matrices
7. Data Improvement
8. Dissemination
21 Translation Matrix, Marital Status
22 Translation Matrix, Marital Status
23 Translation Matrix, Marital Status
24 Translation Matrix, Marital Status
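A translation matrix can be thought of as a lookup from each sample's source codes to one harmonized code. The sketch below assumes made-up sample names, source codes, and harmonized codes; none of them are the actual IPUMS marital status codes.

```python
# Hypothetical harmonized codes; not the actual IPUMS coding scheme.
SINGLE, MARRIED, SEPARATED, WIDOWED, UNKNOWN = 1, 2, 3, 4, 9

# For each sample, map its own source codes to the harmonized codes.
TRANSLATION = {
    "sample_a": {1: SINGLE, 2: MARRIED, 3: MARRIED, 4: SEPARATED, 5: WIDOWED},
    "sample_b": {0: SINGLE, 1: MARRIED, 2: WIDOWED, 3: SEPARATED},
}

def harmonize(sample, source_code):
    """Look up the harmonized code; unmapped source codes become UNKNOWN."""
    return TRANSLATION[sample].get(source_code, UNKNOWN)

harmonized = [harmonize("sample_a", c) for c in (1, 3, 5)]
```

In `sample_a` two source codes collapse into one harmonized category, which is the many-to-one recoding a translation matrix handles directly.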
25 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
a) Data translation matrices
b) Specialized variable programming
- Where one-to-one recoding of the translation matrix is insufficient
7. Data Improvement
8. Dissemination
26 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
a) Constructed variables
- Family structure and other derived variables
- Location of mother, father, and spouse
8. Dissemination
27 IPUMS Pointer Variables
(Simple household)
(Figure: household roster with pointer columns for Spouses, Mothers, and Fathers; each value is a position within the household, with 0 where none applies.)
(Colombia 1985)
28 IPUMS Pointer Variables
(Complex household)
(Figure: household roster with pointer columns for Spouses, Fathers, and Mothers; each value is a position within the household, with 0 where none applies.)
(Colombia 1985)
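For the simple-household case, pointer variables could be assigned roughly as in this sketch. It is a deliberate simplification and not the IPUMS linking rules: it assumes hypothetical `relate` and `sex` fields, links only the head and the head's spouse, and treats every child of the head as the child of both. Complex households need more elaborate logic.

```python
def simple_pointers(persons):
    """Assign spouse, mother, and father positions for a simple household.

    `persons` is a list of dicts in household order with hypothetical
    `relate` ("head", "spouse", "child", "other") and `sex` ("M", "F")
    fields. Positions are 1-based; 0 means no such person was located.
    """
    pos = {"head": 0, "spouse": 0}
    for i, p in enumerate(persons, start=1):
        # Record the first head and the first spouse only.
        if p["relate"] in pos and pos[p["relate"]] == 0:
            pos[p["relate"]] = i

    parents = [i for i in (pos["head"], pos["spouse"]) if i]
    mother = next((i for i in parents if persons[i - 1]["sex"] == "F"), 0)
    father = next((i for i in parents if persons[i - 1]["sex"] == "M"), 0)

    out = []
    for p in persons:
        ptr = {"sploc": 0, "momloc": 0, "poploc": 0}
        if p["relate"] == "head":
            ptr["sploc"] = pos["spouse"]
        elif p["relate"] == "spouse":
            ptr["sploc"] = pos["head"]
        elif p["relate"] == "child":
            ptr["momloc"], ptr["poploc"] = mother, father
        out.append(ptr)
    return out

household = [
    {"relate": "head", "sex": "M"},
    {"relate": "spouse", "sex": "F"},
    {"relate": "child", "sex": "F"},
    {"relate": "child", "sex": "M"},
    {"relate": "other", "sex": "M"},
]
pointers = simple_pointers(household)
```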
29 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
a) Constructed variables
- Family structure and other derived variables
- Location of mother, father, and spouse
b) Data editing and missing data allocation
8. Dissemination
30 Missing Data Allocation, Occupation Script
OCC allocated when 975, 996, 998
sex (2 categories) 1 2
empstat (3 categories) 10-19 20-29 30-39
classwkr (3 categories) 10-19 20-29 99
age (6 categories) 10-19 20-29 30-39 40-49 50-59 60-126
race (3 categories) 100-199 200-299 300-899
(USA pre-1940 samples)
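One way to read the script above is as the cell definition for a hot-deck allocation: a missing occupation is filled from a donor who falls in the same sex, employment status, class of worker, age, and race categories. The slide gives the categories but not the algorithm, so the sequential hot deck below is an assumption, as are the function names and the `occ_allocated` flag.

```python
MISSING_OCC = {975, 996, 998}  # codes the slide marks for allocation

# Category bounds copied from the slide; each pair is an inclusive range.
BINS = {
    "sex": [(1, 1), (2, 2)],
    "empstat": [(10, 19), (20, 29), (30, 39)],
    "classwkr": [(10, 19), (20, 29), (99, 99)],
    "age": [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 126)],
    "race": [(100, 199), (200, 299), (300, 899)],
}

def cell(person):
    """Return the tuple of category indexes, or None if a value fits no bin."""
    key = []
    for var, ranges in BINS.items():
        idx = next((i for i, (lo, hi) in enumerate(ranges)
                    if lo <= person[var] <= hi), None)
        if idx is None:
            return None
        key.append(idx)
    return tuple(key)

def allocate_occ(persons):
    """Sequential hot deck: fill a missing OCC from the latest donor in the cell.

    Modifies the person dicts in place and returns the same list.
    """
    last_donor = {}
    for p in persons:
        k = cell(p)
        if p["occ"] in MISSING_OCC:
            if k in last_donor:
                p["occ"] = last_donor[k]
                p["occ_allocated"] = True
        elif k is not None:
            last_donor[k] = p["occ"]
    return persons

# Occupation code 123 is a made-up donor value.
people = [
    {"sex": 1, "empstat": 10, "classwkr": 20, "age": 34, "race": 100, "occ": 123},
    {"sex": 1, "empstat": 12, "classwkr": 22, "age": 38, "race": 100, "occ": 998},
    {"sex": 2, "empstat": 10, "classwkr": 20, "age": 34, "race": 100, "occ": 996},
]
allocate_occ(people)
```

The third person stays unallocated because no donor has yet appeared in that cell; a fuller version would need a fallback, such as coarser cells.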
31 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
a) Metadata
32 IPUMS Work Process
1. Inventory
2. Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality
6. Harmonization
7. Data Improvement
8. Dissemination
a) Metadata
b) Dissemination Programming
- Extract interface (front end)
- Extract engine (back end)
33 End
Matt Sobek sobek_at_pop.umn.edu