Manipulating Census SF1 by Using SAS - PowerPoint PPT Presentation

1 / 16
About This Presentation
Title:

Manipulating Census SF1 by Using SAS

Description:

Provides geographic record codes and area codes. The area codes ... Does not contain geographic area codes. ... Specify summary level (SUMLEV), county code (COUNTY) ... – PowerPoint PPT presentation

Number of Views:124
Avg rating:3.0/5.0
Slides: 17
Provided by: wsun
Category:
Tags: sas | area | census | code | manipulating | sf1 | using

less

Transcript and Presenter's Notes

Title: Manipulating Census SF1 by Using SAS


1
Manipulating Census SF1 by Using SAS
Wei Sun University of Notre Dame
2
SF1 File Structure
  • 39 table files and one geographic header file in
    each state.
  • Table file contains data tables
    st00001.uf1st0039.uf1.
  • Geographic header file includes geographic
    identification variables stgeo.uf1.

3
Geographic Header Record File
  • Provides geographic record codes and area codes.
    The area codes
  • are from region, division, state, to tracts,
    and block.
  • Summary level Sequence Code (SUMLEV) identifies
    hierarchy
  • geographic level.
  • e.g. SUMLEV080 stands for
    State-County-Place-Tract
  • The geographic header file is standard across
    all Census electronic data products.

4
Table File
  • Does not contain geographic area codes.
  • At the beginning of each table file, there are
    five geographic
  • record codes FILEID, STUSAB, CHARITER, CIFSN,
  • LOGRECNO. LOGRECNO is logical record number,
  • which is assigned to link all files for a
    specific geographic
  • entity.
  • The rest of table file contains data tables.

5
Reading SF1 Data into SAS
  • Unzipped SF1 data files and geographic header
    file.
  • Look up data dictionary and File/Table
    Segmentation in SF1
  • document to determine which data files need to
    be downloaded.
  • For example, table P8 in il00002.uf1 has
    population for
  • white alone (p008003), black alone(p008004),
    and
  • Hispanics(p008010)

6
Reading SF1 Data into SAS (Cont.)
  • Use INFILE statement to point to external file
  • il00002.uf1 and ilgeo.uf1
  • INPUT statement reads variables from each file
    in SAS
  • geographic header record file (ilgeo.uf1) is
    fixed field format
  • table file (il00002.uf1) is comma delimited
    format

7
Reading SF1 Data into SAS (Cont.)SAS Program File
libname il 'C\wei\workshop\sf1' filename ilgeo
'c\wei\workshop\sf1\ilgeo.uf1' data
il.header infile ilgeo missover dsd lrecl8192
input fileid 1-6 stusab 7-8 sumlev 9-11
geocomp 12-13 logrecno 19-25 region
26-26 division 27-27 state 30-31
county 32-34 cousub 37-41 place 46-50
tract 56-61
8
Reading SF1 Data into SAS (Cont.)
  • To test if raw data files are successfully read
    into SAS, use the following SAS procedures
  • PROC CONTENTS
  • PROC PRINT
  • VAR varnames

9
Subsetting Geographic Header File
  • Specify summary level (SUMLEV), county code
    (COUNTY)
  • and tract numbers (TRACT) using IF statement.
  • if SUMLEV080
  • if COUNTY310 (Cook county, IL)
  • if 610100
  • or 310100
  • or 300100/Little Village /

10
Combining Geographic Header File with Table File
  • Sort files by common variable LOGRECNO.
  • PROC SORT
  • BY logrecno
  • Use MERGE statement to combine two files.
  • DATA newfile
  • MERGE file1 file2
  • BY logrecno

11
Computing Dissimilarity Index
  • Summarize each population group by place using
  • PROC SUMMARY
  • CLASS place / New City, Pilsen, and Little
    Village /
  • VAR varnames
  • Calculate percent population in each group by
    tract.
  • Take differences between two percent population
  • percent Hispanics vs. percent white
  • percent Hispanics vs. percent black
  • percent white vs. percent black

12
Computing Dissimilarity Index (Cont.)
  • Take the absolute values of differences and
    divide them by 2.
  • Summarize the dissimilarity index on each tract
    by place.
  • PROC PRINT
  • SUM varname
  • BY place

13
Frequently Used Operators in SAS
14
SAS Output
------------------------ regionLittle V
-------------------   Obs
difhw difhb difwb   1
0.00383 0.01277 0.00895
2 0.00031 0.00686 0.00656
3 0.00227 0.02205 0.01978
4 0.00022 0.01692 0.01670
5 0.01210 0.00566 0.00644
6 0.00154 0.01074 0.00920
7 0.02101 0.03457
0.01356 8 0.02021 0.02998
0.00977 9 0.02334
0.03715 0.01380 10 0.00367
0.00756 0.00389 11
0.00709 0.01261 0.00552 12
0.00646 0.01718 0.01072
13 0.14540 0.32526 0.17987
14 0.00175 0.00531 0.00357
15 0.01165 0.02610 0.01444
16 0.02043 0.03587 0.01543
17 0.02346 0.05987 0.03640
18 0.01921 0.08513
0.06592 19 0.00144 0.01007
0.01151 20 0.00948
0.01350 0.02298 ------ -------
------- ------- region
0.33487 0.77517 0.47502  
15
Comparison Between SAS Output and Excel Results
Dissimilarity Index in New
City, IL, 2000
SAS Excel Hispanics/White
0.612
61.2 Hispanics/Black 0.761
76.2 Black/White
0.813 81.3
16
Comparison Between SAS Output and Excel Results
Exposure Index in New City, IL,
2000
SAS Excel Hispanics to White
0.096 9.7 Hispanics to Black
0.135
13.6 Hispanics to Others 0.0135
1.4
Write a Comment
User Comments (0)
About PowerShow.com