Title: Manipulating Census SF1 by Using SAS
1Manipulating Census SF1 by Using SAS
Wei Sun University of Notre Dame
2SF1 File Structure
- 39 table files and one geographic header file in
each state. - Table file contains data tables
st00001.uf1st0039.uf1. - Geographic header file includes geographic
identification variables stgeo.uf1. -
3Geographic Header Record File
- Provides geographic record codes and area codes.
The area codes - are from region, division, state, to tracts,
and block. - Summary level Sequence Code (SUMLEV) identifies
hierarchy - geographic level.
- e.g. SUMLEV080 stands for
State-County-Place-Tract - The geographic header file is standard across
all Census electronic data products.
4Table File
- Does not contain geographic area codes.
- At the beginning of each table file, there are
five geographic - record codes FILEID, STUSAB, CHARITER, CIFSN,
- LOGRECNO. LOGRECNO is logical record number,
- which is assigned to link all files for a
specific geographic - entity.
- The rest of table file contains data tables.
5Reading SF1 Data into SAS
- Unzipped SF1 data files and geographic header
file. - Look up data dictionary and File/Table
Segmentation in SF1 - document to determine which data files need to
be downloaded. - For example, table P8 in il00002.uf1 has
population for - white alone (p008003), black alone(p008004),
and - Hispanics(p008010)
6Reading SF1 Data into SAS (Cont.)
- Use INFILE statement to point to external file
- il00002.uf1 and ilgeo.uf1
- INPUT statement reads variables from each file
in SAS - geographic header record file (ilgeo.uf1) is
fixed field format -
- table file (il00002.uf1) is comma delimited
format -
7Reading SF1 Data into SAS (Cont.)SAS Program File
libname il 'C\wei\workshop\sf1' filename ilgeo
'c\wei\workshop\sf1\ilgeo.uf1' data
il.header infile ilgeo missover dsd lrecl8192
input fileid 1-6 stusab 7-8 sumlev 9-11
geocomp 12-13 logrecno 19-25 region
26-26 division 27-27 state 30-31
county 32-34 cousub 37-41 place 46-50
tract 56-61
8Reading SF1 Data into SAS (Cont.)
- To test if raw data files are successfully read
into SAS, use the following SAS procedures - PROC CONTENTS
- PROC PRINT
- VAR varnames
9Subsetting Geographic Header File
- Specify summary level (SUMLEV), county code
(COUNTY) - and tract numbers (TRACT) using IF statement.
- if SUMLEV080
- if COUNTY310 (Cook county, IL)
- if 610100
- or 310100
- or 300100/Little Village /
-
10Combining Geographic Header File with Table File
- Sort files by common variable LOGRECNO.
- PROC SORT
- BY logrecno
- Use MERGE statement to combine two files.
-
- DATA newfile
- MERGE file1 file2
- BY logrecno
11Computing Dissimilarity Index
- Summarize each population group by place using
-
- PROC SUMMARY
- CLASS place / New City, Pilsen, and Little
Village / - VAR varnames
- Calculate percent population in each group by
tract. - Take differences between two percent population
- percent Hispanics vs. percent white
- percent Hispanics vs. percent black
- percent white vs. percent black
12Computing Dissimilarity Index (Cont.)
- Take the absolute values of differences and
divide them by 2. - Summarize the dissimilarity index on each tract
by place. - PROC PRINT
- SUM varname
- BY place
13Frequently Used Operators in SAS
14SAS Output
------------------------ regionLittle V
------------------- Obs
difhw difhb difwb 1
0.00383 0.01277 0.00895
2 0.00031 0.00686 0.00656
3 0.00227 0.02205 0.01978
4 0.00022 0.01692 0.01670
5 0.01210 0.00566 0.00644
6 0.00154 0.01074 0.00920
7 0.02101 0.03457
0.01356 8 0.02021 0.02998
0.00977 9 0.02334
0.03715 0.01380 10 0.00367
0.00756 0.00389 11
0.00709 0.01261 0.00552 12
0.00646 0.01718 0.01072
13 0.14540 0.32526 0.17987
14 0.00175 0.00531 0.00357
15 0.01165 0.02610 0.01444
16 0.02043 0.03587 0.01543
17 0.02346 0.05987 0.03640
18 0.01921 0.08513
0.06592 19 0.00144 0.01007
0.01151 20 0.00948
0.01350 0.02298 ------ -------
------- ------- region
0.33487 0.77517 0.47502
15Comparison Between SAS Output and Excel Results
Dissimilarity Index in New
City, IL, 2000
SAS Excel Hispanics/White
0.612
61.2 Hispanics/Black 0.761
76.2 Black/White
0.813 81.3
16Comparison Between SAS Output and Excel Results
Exposure Index in New City, IL,
2000
SAS Excel Hispanics to White
0.096 9.7 Hispanics to Black
0.135
13.6 Hispanics to Others 0.0135
1.4