Using Census Summary File Data for Research - PowerPoint PPT Presentation

1
Using Census Summary File Data for Research
  • Nancy A. Denton
  • SUNY Albany
  • n.denton@albany.edu

2
Why use summary data?
  • Available more quickly than microdata
  • Interested in places rather than people
  • Believe that places affect people

3
Research with Summary Data
  • Contains the same basic variables as individual
    census data
  • Involves a different way of thinking
  • Requires somewhat different programming skills
  • Raises different statistical issues and different
    error sources

4
Characteristics of Summary Data
  • Each value of a traditional microdata variable
    becomes a separate variable in summary data,
    called a summary variable
  • Each summary variable is a single cell of an
    n-way cross-tabulation, so it contains a count
  • Each table has a universe: persons, families,
    households, or housing units

5
Individual v. Summary Data
  • Individual data: the variable is RACE
      1 = White
      2 = Black
      3 = Native American
      4 = Asian
      5 = NHOPI
      6 = Other
      7 = Two or more
  • Values are codes
  • Summary data: variables in Table 7
      P007001 = Total pop
      P007002 = White alone
      P007003 = Black alone
      P007004 = Nat. Am. alone
      P007005 = Asian alone
      P007006 = NHOPI alone
      P007007 = Other alone
      P007008 = Two or more
  • Values are counts
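To make the contrast concrete, here is a minimal Python sketch of how person-level RACE codes (1 to 7 above) become one row of Table 7 counts per geographic unit. The person records and tract IDs are hypothetical. The sketch ignores the sampling weights and other processing the Census Bureau applies when it actually builds the summary files.

```python
from collections import Counter, defaultdict

# Microdata RACE codes (1-7) mapped to the Table 7 summary variables.
RACE_TO_CELL = {
    1: "P007002",  # White alone
    2: "P007003",  # Black alone
    3: "P007004",  # Native American alone
    4: "P007005",  # Asian alone
    5: "P007006",  # NHOPI alone
    6: "P007007",  # Other alone
    7: "P007008",  # Two or more races
}


def summarize(persons):
    """Turn (geo_id, race_code) person records into one row of counts per unit."""
    table = defaultdict(Counter)
    for geo_id, race in persons:
        row = table[geo_id]
        row["P007001"] += 1            # total population
        row[RACE_TO_CELL[race]] += 1   # one cell per race category
    # Every unit gets every summary variable, with zeros where no one was counted.
    cells = ["P007001", *RACE_TO_CELL.values()]
    return {geo: {c: row[c] for c in cells} for geo, row in table.items()}


# Hypothetical person records: (tract ID, RACE code).
persons = [("tract_A", 1), ("tract_A", 2), ("tract_A", 1), ("tract_B", 4)]
summary = summarize(persons)
print(summary["tract_A"]["P007001"], summary["tract_A"]["P007002"])  # 3 2
```

The code value "1" in the microdata becomes the column P007002 in the summary file, and its content changes from a code to a count.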

6
Characteristics of Data Files
  • Individual Data File
      Household 1
        person 1: race, age, sex, ...
        person 2: race, age, sex, ...
        ...
        person n: race, age, sex, ...
      Household 2
      ...
  • Summary Data File
      Geographic Unit 1: Total, Whites, Blacks, Asians,
        NHOPI, Native Americans, Other, Age 0, Age 1,
        Age 2, Age 3, Age 4, ..., Age n, Male, Female
      Geographic Unit 2: Total, Whites, Blacks, etc.

7
Compared to Individual Data Files, Summary Files
are
  • Not as flexible
  • Very large
  • Are seldom amenable to analysis with traditional
    statistical programs without some preliminary
    manipulation

8
But the big advantage of summary files is that
they
  • Allow you to break the 100,000-person barrier:
    100,000 is the minimum population of a PUMA, the
    smallest geographic unit in the microdata

9
For large geographical units, you can use either
type of data
  • How does the child poverty rate compare across
    states?
  • Which is more important in determining the state
    child poverty rate, racial composition or family
    structure?

10
How would you answer these questions with PUMS
data?
  • Essentially, you'd create aggregate data by
    adding up the individual data to the state level
  • Programming-wise, you'd create dummy variables
    for children, poor children, blacks, and
    single-mother families and then add them up to
    the state level to get the rates

11
More concretely (in SAS)
    data temp;
      set pumsdat;  /* person file w/ hhold data attached */
      tpop = 1;     /* each record is one person */
      if age le 17 then child = 1; else child = 0;
      if child = 1 and pov = 1 then poorchild = 1;
      else poorchild = 0;
      if race = 2 then black = 1; else black = 0;
      if hhtyp = 3 then singmom = 1;
      else singmom = 0;
    run;

12
Now add up
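    proc sort data=temp;
      by state;
    run;
    proc means data=temp noprint;
      by state;
      var child poorchild black singmom
          tpop tfam;  /* tfam: family counter, assumed already on the file */
      output out=statdat sum=;
    run;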

13
Now compute rates
    data srate;
      set statdat;
      chpov = poorchild/child*100;
      pcblk = black/tpop*100;
      pcsing = singmom/tfam*100;
    run;
    proc print data=srate;
      var state chpov pcblk pcsing;
    run;
    proc reg data=srate;
      model chpov = pcblk pcsing;
    run;
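For readers working outside SAS, here is a minimal Python sketch of the same steps: dummy variables, sums by state, and rates. The regression is left out. The record layout is an assumption, including the field names `state`, `age`, `pov`, `race`, and `hhtyp` and a hypothetical `famhead` flag marking one record per family. As on the slides, PUMS person weights are ignored. One deliberate difference from the SAS is that single-mother families are counted only on family-head records, so `pcsing` has families in both numerator and denominator.

```python
from collections import defaultdict


def pct(num, den):
    """Percentage num/den * 100, or None when the denominator is zero."""
    return 100 * num / den if den else None


def state_rates(persons):
    """Sum person-level dummies by state, then turn the sums into rates."""
    sums = defaultdict(lambda: defaultdict(int))
    for p in persons:
        s = sums[p["state"]]
        child = int(p["age"] <= 17)
        s["child"] += child
        s["poorchild"] += int(child == 1 and p["pov"] == 1)
        s["black"] += int(p["race"] == 2)          # race code 2 = black, as on the slide
        s["tpop"] += 1
        # Hypothetical famhead flag: 1 on exactly one record per family.
        s["tfam"] += p["famhead"]
        s["singmom"] += int(p["famhead"] == 1 and p["hhtyp"] == 3)  # hhtyp 3 = single mother
    return {
        state: {
            "chpov": pct(s["poorchild"], s["child"]),
            "pcblk": pct(s["black"], s["tpop"]),
            "pcsing": pct(s["singmom"], s["tfam"]),
        }
        for state, s in sums.items()
    }


# Hypothetical records: one single-mother family and one other family.
persons = [
    {"state": "NY", "age": 35, "pov": 1, "race": 2, "hhtyp": 3, "famhead": 1},
    {"state": "NY", "age": 8, "pov": 1, "race": 2, "hhtyp": 3, "famhead": 0},
    {"state": "NY", "age": 40, "pov": 0, "race": 2, "hhtyp": 1, "famhead": 1},
    {"state": "NY", "age": 12, "pov": 0, "race": 1, "hhtyp": 1, "famhead": 0},
]
print(state_rates(persons)["NY"])
# {'chpov': 50.0, 'pcblk': 75.0, 'pcsing': 50.0}
```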

14
Nested Geographic Levels: State Files
  • Nation
  • States
  • Counties
  • County subdivisions
  • Places
  • Census tracts
  • Block groups (for selected
    tables)

15
Small-Area Geography Overview
16
Census Tracts
  • For the first time, in Census 2000, tracts cover
    the entire nation
  • Relatively homogeneous population characteristics
  • About 65,000 census tracts across the U.S.
  • Optimal size is 4,000 people; range is 1,000
    to 8,000

17
2000 Summary File (SF) Data
  • Short form data
      - PL 94-171
      - SF1
      - SF2 (race detail)
  • Long form data
      - SF3
      - SF4 (race/ethnic detail)

18
Census 2000 Short Form Questionnaire
  • 7 Questions
  • Name
  • Sex
  • Age
  • Relationship
  • Hispanic Origin
  • Race
  • Owner/Renter Status

19
Population Subjects Summarized to Census Tract
  • Ancestry
  • Disability
  • Employment Status
  • Grandparents as Caregivers
  • Households and Families
  • Income (Family, Nonfamily, Individual)
  • Language Spoken
  • Marital Status
  • Migration
  • Birthplace, Year of Entry, Citizenship
  • Poverty Status
  • School Enrollment and Educational Attainment

20
Housing Subjects
  • Units in Structure
  • Year Built
  • Rooms
  • Year Householder Moved In
  • Rent/Value
  • House Heating Fuel
  • Vehicles Available
  • Mortgage Status and Monthly Costs
  • Plumbing and Kitchen Facilities
  • Telephone Service
  • Occupants Per Room

21
SF Data features
  • Same data available for ALL units of geography
    covered by that file
  • Smallest unit of geography varies across files
  • All files have nested geography

22
Some Uses of Summary Data in Research
  • Find out about a particular place
  • Compute Metro Area Indices
  • Construct Patterns of Neighborhood Race/Ethnic
    Composition
  • Calculate Neighborhood Profiles
  • Trace Paths of Neighborhood Change over time
  • Attach Summary Data to Individual Data to Predict
    Neighborhood Effects

23
I. Find out about a particular place
  • Go to the library
  • Go online to www.census.gov
  • Use American FactFinder

24
(No Transcript)
25
II. Metro Area Indices
  • What is an index?
  • An Index is a single number
  • which reflects the characteristics of tracts (or
    any other unit of geography)
  • aggregated to the metro (city, suburban, county)
    level
  • in such a way that it reveals something about
    the distribution of groups in space

26
Index of Dissimilarity
  • D is its common name
  • measures evenness
  • what proportion of either group would have to
    change neighborhoods if each neighborhood had the
    same racial composition as the city (or metro
    area) as a whole?
  • workhorse of segregation studies

27
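Formula for Dissimilarity
    D = \frac{1}{2} \sum_{i=1}^{n} \left| \frac{x_i}{X} - \frac{y_i}{Y} \right|
  • where x_i and y_i refer to tract totals of the two
    groups, X and Y refer to metro-wide totals, and n
    is the number of tracts in the metro area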

28
So how would we calculate that
  • Create little x and y for each tract
  • Sum them up to get big X and big Y for the metro
    area
  • Calculate each tract's term of the index and add
    up across all tracts

29
In SAS
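    data temptract;
      set tractdata;
      x = P007002;  /* whites */
      y = P007003;  /* blacks */
    run;
    proc sort data=temptract; by msa; run;
    proc means data=temptract noprint;
      by msa;       /* one set of totals per metro area */
      var x y;
      output out=tots sum=TX TY;
    run;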

30
Then
    data index;
      merge temptract tots;  /* put denominators on */
      by msa;
      /* calculate index: the sum statement accumulates across tracts */
      Dwb + .5*abs(sum((x/TX), -(y/TY)));
      if last.msa then do;
        output;    /* write out index for this msa */
        Dwb = 0;   /* begin anew for next msa */
      end;
      keep msa Dwb;
    run;
    proc print data=index;
      by msa;
      format Dwb 5.3;
    run;
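The same two-pass logic can be sketched in Python. The first pass computes metro totals and the second sums, tract by tract, half of |x_i/X - y_i/Y|. This is an illustration only. It assumes the tract data are already in memory as hypothetical (msa, x, y) tuples, and that every metro area contains at least one member of each group, so X and Y are never zero.

```python
from collections import defaultdict


def dissimilarity(tracts):
    """Index of dissimilarity for each metro area.

    tracts: list of (msa, x, y) with the tract counts of groups x and y.
    Each metro must contain both groups (X > 0 and Y > 0).
    Returns {msa: D}, where D = 0.5 * sum_i |x_i/X - y_i/Y|.
    """
    # Pass 1: metro-wide totals X and Y (the PROC MEANS step).
    totals = defaultdict(lambda: [0, 0])
    for msa, x, y in tracts:
        totals[msa][0] += x
        totals[msa][1] += y
    # Pass 2: accumulate each tract's term (the DATA step sum statement).
    d = defaultdict(float)
    for msa, x, y in tracts:
        X, Y = totals[msa]
        d[msa] += abs(x / X - y / Y)
    return {msa: 0.5 * s for msa, s in d.items()}


tracts = [
    ("A", 100, 0), ("A", 0, 50),    # completely segregated
    ("B", 60, 30), ("B", 40, 20),   # same composition in every tract
    ("C", 80, 20), ("C", 20, 80),
]
print({msa: round(D, 3) for msa, D in dissimilarity(tracts).items()})
# {'A': 1.0, 'B': 0.0, 'C': 0.6}
```

Metro C reads as "60% of either group would have to move for every tract to match the metro-wide composition."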

31
Issue to face with Dissimilarity
  • You're only comparing two groups at a time, while
    the population of almost all areas contains more
    than two groups
  • If you define your groups as group x and the
    remainder, then when you compare indices to each
    other, the reference group changes for each group
    studied

32
P-star Indices
  • Look at things from the perspective of within the
    neighborhood
  • How many people look like me?
  • Isolation
  • How many people are different and of what type?
  • Contact
  • Both are calculated the same way

33
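P-star Formula
    {}_xP^*_y = \sum_{i=1}^{n} \frac{x_i}{X} \cdot \frac{y_i}{t_i}
  • Where x_i and y_i are tract-level populations of
    groups x and y, t_i is the total population of
    tract i, and X is the metro-wide total of group x
  • For isolation, use the same group on both sides:
    {}_xP^*_x = \sum_{i=1}^{n} \frac{x_i}{X} \cdot \frac{x_i}{t_i}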
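A minimal Python sketch of the P-star calculation follows. The exposure index sums, over tracts, x_i/X times y_i/t_i, and isolation is obtained by passing the same group twice. The two-tract metro and its counts are made up for illustration.

```python
def p_star(x, y, t):
    """Exposure index xP*y = sum over tracts of (x_i / X) * (y_i / t_i).

    x, y: tract counts of groups x and y; t: tract total populations.
    Pass the same list for x and y to get the isolation index xP*x.
    Tracts with t_i == 0 are skipped to avoid dividing by zero.
    """
    X = sum(x)
    return sum((xi / X) * (yi / ti) for xi, yi, ti in zip(x, y, t) if ti > 0)


# Hypothetical two-tract metro: Blacks (x), Whites (y), tract totals (t).
blacks = [80, 20]
whites = [20, 80]
totals = [100, 100]

isolation = p_star(blacks, blacks, totals)  # bP*b
exposure = p_star(blacks, whites, totals)   # bP*w
print(round(isolation, 2), round(exposure, 2))  # 0.68 0.32
```

Because only these two groups live in the tracts here, isolation and exposure add up to 1. The typical Black resident lives in a tract that is 68% Black and 32% White.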

34
Issue to face with P-star
  • What is the tract total population in a
    multi-group world?
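The denominator choice changes the totals. If t_i is the full tract population, a group's exposures to the four study groups sum to less than 1 whenever anyone outside those groups lives in the tract. If t_i is the sum of the four groups, they sum to exactly 1. The sketch below uses hypothetical counts and illustrates that one aspect of the question. It is not a recommended answer to it.

```python
def exposure(x, y, t):
    """xP*y = sum_i (x_i / X) * (y_i / t_i), skipping empty tracts."""
    X = sum(x)
    return sum(xi / X * yi / ti for xi, yi, ti in zip(x, y, t) if ti > 0)


def exposure_sum(group, groups, t):
    """Total exposure of one group to all study groups (itself included)."""
    return sum(exposure(groups[group], groups[g], t) for g in groups)


# Hypothetical counts for two tracts: four study groups plus everyone else.
groups = {"W": [50, 10], "B": [20, 60], "H": [10, 10], "A": [10, 0]}
other = [10, 20]

four_group_total = [sum(col) for col in zip(*groups.values())]
full_total = [s + o for s, o in zip(four_group_total, other)]

s_full = exposure_sum("B", groups, full_total)
s_four = exposure_sum("B", groups, four_group_total)
print(f"t = total population:   {s_full:.3f}")  # 0.825
print(f"t = sum of four groups: {s_four:.3f}")  # 1.000
```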

35
Other Dimensions of Segregation
  • Centralization
  • Clustering
  • Concentration
  • Indices representing these dimensions also have
    the two-group problem
  • See Reardon and Firebaugh for the latest work on
    multigroup measures

36
So, what do we know about segregation today?
37
U.S. Census Bureau report on Residential
Segregation 1980-2000 says
  • The trend for Blacks or African Americans is
    clearest of all -- declines in segregation were
    observed over the 1980 to 2000 period across all
    dimensions of segregation we considered.
  • Despite these declines, residential segregation
    was still higher for African Americans than for
    the other groups across all measures. Hispanics
    or Latinos were generally the next most highly
    segregated, followed by Asians and Pacific
    Islanders, and then American Indians and Alaska
    Natives, across a majority of the measures.

38
Same report continues
  • Asians and Pacific Islanders, as well as
    Hispanics, tended to experience increases in
    segregation, though not across all dimensions.
  • Increases were generally larger for Asians and
    Pacific Islanders than for Hispanics.
  • Iceland, Weinberg and Steinmetz, 2002.

39
(No Transcript)
40
III. Portray Patterns of Neighborhood Race/Ethnic
Composition
  • Assume you're interested in four groups
  • Whites, Blacks, Hispanics, Asians

41
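  • Number of Groups in Neighborhood
      1 group   2 groups   3 groups   4 groups
      W---      WB--       WBH-       WBHA
      Wddd      W-H-       WB-A
      -B--      W--A       W-HA
      --H-      -BH-       -BHA
      ---A      -B-A
                --HA
  • Letters mark the groups present (W = White,
    B = Black, H = Hispanic, A = Asian); "-" marks a
    group that is not present
  • Need to establish group presence cut-off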
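The pattern codes can be generated mechanically once a presence cut-off is chosen. The sketch below assumes a group counts as present if it makes up at least 10% of the four-group total. Both the 10% value and the choice of denominator are placeholders, not the cut-off behind the tables that follow. The sketch also does not reproduce the Wddd category.

```python
GROUPS = "WBHA"  # Whites, Blacks, Hispanics, Asians, in the slide's order


def pattern(counts, cutoff=0.10):
    """Return a presence pattern such as 'WB-A' for one tract.

    counts: tract counts keyed by 'W', 'B', 'H', 'A'.
    cutoff: hypothetical threshold; a group is "present" if its share of
    the four-group total is at least this large.
    """
    total = sum(counts[g] for g in GROUPS)
    if total == 0:
        return "----"  # empty tract: no group present
    return "".join(g if counts[g] / total >= cutoff else "-" for g in GROUPS)


print(pattern({"W": 700, "B": 200, "H": 50, "A": 50}))   # WB--
print(pattern({"W": 300, "B": 300, "H": 200, "A": 200}))  # WBHA
print(pattern({"W": 20, "B": 950, "H": 30, "A": 0}))      # -B--
```

Lowering `cutoff` to 0.05 turns the first tract into WBHA. This is why the cut-off has to be fixed before patterns are compared across places or years.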

42
In 50 Largest MSA/CMSAs in 2000
    Pattern   Tracts (%)   Pop (000)
    W---         14.1        18,899
    Wddd         11.0        19,275
    WB--         14.8        23,291
    W-H-         14.2        26,686
    WBH-         13.2        25,534
    W-HA          9.1        18,857
    WBHA          9.3        19,943
    Total        85.7       152,485
               41,521          85.9

43
In Suburbs of 50 Largest MSA/CMSAs in 2000
    Pattern   Tracts (%)   Pop (000)
    W---         22.4        15,666
    Wddd         13.5        12,473
    WB--         12.0         9,773
    W-H-         15.4        14,930
    WBH-         10.3        11,011
    W-HA          9.2        10,165
    WBHA          7.5         8,887
    Total        90.3        82,905
               23,505          85.6

44
IV. Calculate Neighborhood Profiles
  • Average Neighborhood Characteristics for members
    of a particular group
  • Variation on the P-star Index
  • Strategy
  • compute characteristics for each tract
  • use population groups as weights
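In code, a neighborhood profile is a weighted mean: each tract's characteristic, weighted by the group's population in that tract. This is the P-star idea applied to a tract characteristic instead of a group share. The sketch below uses hypothetical tract median incomes. Averaging tract medians this way describes the typical neighborhood a group member lives in, not the group's own median income.

```python
def neighborhood_profile(weights, values):
    """Group-weighted average of a tract characteristic.

    weights: the group's population in each tract (x_i).
    values: the tract characteristic (e.g. tract median income).
    Tracts with a missing value (None) are dropped from both sums;
    returns None if the group has no population in the remaining tracts.
    """
    pairs = [(w, v) for w, v in zip(weights, values) if v is not None]
    total = sum(w for w, _ in pairs)
    if total == 0:
        return None
    return sum(w * v for w, v in pairs) / total


# Hypothetical two-tract metro.
median_income = [20000, 50000]
blacks = [300, 100]
whites = [100, 500]
print(neighborhood_profile(blacks, median_income))  # 27500.0
print(neighborhood_profile(whites, median_income))  # 45000.0
```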

45
Cleveland, 2000
    Pattern   Tracts (%)   Pop (000)
    W---         36.8           991
    Wddd          9.4           241
    -B--         12.6           209
    WB--         21.5           421
    W-H-          3.2            64
    W--A          1.5            39
    WBH-         11.3           208
    WB-A          2.5            52
    Total        98.8         2,225
                               99.2

46
Cleveland, Neighborhood SES Characteristics
    Group (2000)   Med. Income   Med. House Value
    Total              42,937          117,149
    W---               54,051          146,330
    Wddd               51,135          142,348
    -B--               22,334           60,652
    WB--               41,196          108,965
    W-H-               38,359           91,909
    W--A               65,234          231,590
    WBH-               27,503           70,829
    WB-A               38,830          148,829

47
V. Trace Paths of Neighborhood Change over time
  • Basically just computing patterns for different
    years and then cross-classifying them
  • The harder part is that tract boundaries must be
    matched over time
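Once patterns have been computed for each year on matched tract boundaries, tracing paths is a cross-classification. Below is a minimal Python sketch. It assumes a hypothetical dictionary of patterns keyed by year and tract ID. The boundary matching itself, which is the hard part, is not shown.

```python
from collections import Counter


def change_paths(patterns_by_year, years):
    """Count sequences of neighborhood patterns across census years.

    patterns_by_year: {year: {tract_id: pattern}}; tract IDs are assumed
    to refer to boundaries already matched over time.
    Only tracts present in every listed year are counted.
    """
    common = set.intersection(*(set(patterns_by_year[y]) for y in years))
    return Counter(
        tuple(patterns_by_year[y][tract] for y in years) for tract in common
    )


patterns = {
    1970: {"t1": "W---", "t2": "W---", "t3": "W---", "t4": "W---", "t5": "W---"},
    1980: {"t1": "W---", "t2": "WB--", "t3": "W---", "t4": "W---"},
    1990: {"t1": "W---", "t2": "WB--", "t3": "W-H-", "t4": "W---"},
}
paths = change_paths(patterns, [1970, 1980, 1990])
for path, n in paths.most_common(1):
    print(" / ".join(path), n)  # W--- / W--- / W--- 2
```

Tract t5 drops out because it has no match in 1980 or 1990. This is how unmatched boundaries shrink the universe of tracts that can be traced.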

48
Change in All-White (95) Neighborhoods
1970-1990
49
Ten Most Frequent Paths of Neighborhood Change
1970-1990 for All-white Neighborhoods in 1970
    Path (1970 / 1980 / 1990),
    all starting All-White          N       %
    W--- / W--- / W---           2744    30.1
    W--- / W--- / Wddd            815     8.9
    W--- / Wddd / Wddd            271     3.0
    W--- / WB-- / WB--            220     2.4
    W--- / Wddd / W--A            168     1.8
    W--- / W-H- / W-H-            140     1.5
    W--- / W--- / W-H-            123     1.3
    W--- / Wddd / W-H-            120     1.3
    W--- / W--- / W--A            112     1.2
    W--- / W--- / WB--             98     1.1

50
1990 Neighborhood Characteristics of All-white
Neighborhoods in 1970
    Group (1990)          Med. Income   Med. House Value
    All-white 70-90           41,273          113,949
    Wddd                      46,629          152,951
    WB--                      35,743           88,955
    W--A                      61,331          283,363
    W-H-                      38,224          139,400
    WBH-                      33,035          101,165
    W-HA                      47,738          236,144
    WBHA                      42,129          154,509

51
VI. Attach Summary Data to Individual Data
  • If you're collecting your own data, then you can
    geocode it from the address
  • With US Census data, because of privacy issues,
    you must use a confidential data center

52
Neighborhood Effects
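  • Some publicly available data sets have already
    attached neighborhood summary data to individual
    records
  • PSID (Panel Study of Income Dynamics)
  • Add Health (National Longitudinal Study of
    Adolescent Health)
  • MCSUI (Multi-City Study of Urban Inequality)
  • MTO (Moving to Opportunity)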

53
In conclusion
  • Summary data are currently underutilized in
    research
  • Methodological issues remain to be solved
  • Availability of confidential sites should
    increase their potential for use by researchers