Title: Censuses and Surveys: Still Useful for the Common Good
1Censuses and Surveys: Still Useful for the Common Good?
- Henry E. Brady
- Professor of Political Science and Public Policy
- Director, Survey Research Center and UC DATA
- University of California, Berkeley
2Uses of Census and Survey Data
- Two Examples of Their Usefulness
- Historical Question: Was Howard County, Maryland a slave county?
- Current Policy Question: Immigrants and welfare programs in California
- Methods
- Mapping Data
- Linking Data
- Across administrative datasets
- Across administrative and survey datasets
3Example Number One: Was Howard County, Maryland a Slave County?
- Source: Henry E. Brady (UC Berkeley)
4Was Howard County, Maryland a Slave County?
- Method: Consider historical census materials by county across states and over time
- Collect data
- Map it
- Visualize it
- Data Source: University of Virginia Library, Historical Census Browser, Geospatial and Statistical Data Center, http://fisher.lib.virginia.edu/collections/stats/histcensus
- Further Question: How long did this legacy matter for politics?
- Data Source: ICPSR County Level Returns matched to Census Data, http://www.icpsr.umich.edu/
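The collect, map, and visualize steps could start from something like the sketch below, which turns county counts into a share and a shading class for a map. The county names, counts, and class breaks are placeholders chosen for illustration, not figures from the Historical Census Browser, and the slides do not say how the maps were actually classed.

```python
# Hypothetical county records: names and counts are placeholders, not census figures.
counties = [
    {"county": "County A", "total_pop": 10000, "enslaved_pop": 2500},
    {"county": "County B", "total_pop": 8000, "enslaved_pop": 0},
    {"county": "County C", "total_pop": 5000, "enslaved_pop": 100},
]

# Assumed class breaks (upper bound, label) for shading a map; the cut points are arbitrary.
BREAKS = [(0.0, "none"), (0.05, "under 5%"), (0.25, "5% to 25%"), (1.0, "over 25%")]


def enslaved_share(record):
    """Enslaved population as a fraction of total population."""
    if record["total_pop"] <= 0:
        raise ValueError("total_pop must be positive")
    return record["enslaved_pop"] / record["total_pop"]


def map_class(share):
    """Return the label of the first class whose upper bound covers the share."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    for upper, label in BREAKS:
        if share <= upper:
            return label


for record in counties:
    print(record["county"], map_class(enslaved_share(record)))
```

Each label can then be joined to a county boundary file in whatever mapping tool is at hand.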
5Map of Maryland Counties
[Map label: District of Columbia]
6(No Transcript)
7West Virginia and Virginia
[Map labels: Line Where Virginia and West Virginia Separated; West Virginia; Virginia]
8Mason-Dixon Line
9Southern Pennsylvania, Maryland, Delaware, and Virginia: Slavery in 1850
[Map labels: Pennsylvania; Delaware; Maryland; Virginia; Empirical Line of Demarcation Between Slave and Non-Slave Counties]
10What was the Legacy? Getting Voting Data
- Go to ICPSR
- Search for data using search engine
- Download data in SPSS format
- Add Census data by pasting and hand entry
- Analyze data using statistical package
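The slide adds census data by pasting and hand entry. As an alternative sketch (not the workflow the slides describe), county returns could be joined to census measures on a shared state and county key, with unmatched counties reported for manual checking. All field names and values here are hypothetical.

```python
# Hypothetical county election returns; keys and numbers are placeholders.
returns = [
    {"state": "MD", "county": "County A", "year": 1864, "votes_total": 1000, "votes_party_x": 400},
    {"state": "MD", "county": "County B", "year": 1864, "votes_total": 800, "votes_party_x": 600},
    {"state": "MD", "county": "County D", "year": 1864, "votes_total": 500, "votes_party_x": 100},
]

# Hypothetical census measures keyed by (state, county).
census = {
    ("MD", "County A"): {"enslaved_share_1850": 0.25},
    ("MD", "County B"): {"enslaved_share_1850": 0.0},
}


def attach_census(returns, census):
    """Join census fields onto each return row; collect keys with no census match."""
    merged, unmatched = [], []
    for row in returns:
        key = (row["state"], row["county"])
        if key in census:
            merged.append({**row, **census[key]})
        else:
            unmatched.append(key)
    return merged, unmatched


merged, unmatched = attach_census(returns, census)
for row in merged:
    # Vote share for the hypothetical party, ready for analysis against the census measure.
    row["share_party_x"] = row["votes_party_x"] / row["votes_total"]
print(merged)
print("No census match:", unmatched)
```

Listing the unmatched keys matters because county names and boundaries change over time, so a silent drop would bias the comparison.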
11[Map: 1864; labels: Howard, Southern Counties]
12[Map: 1876; label: Howard]
13Example Number Two: What is the Experience of Immigrants with Welfare Programs?
- Source: Henry E. Brady (UCB) and Jon Stiles (UCB)
14Immigrants and Welfare
- Question: What is the experience of immigrants with welfare programs?
- Problem: Very few datasets have both
- Immigration status (native, naturalized citizen, non-citizen)
- Immigrant welfare and job experience over time
15Census Survey Data Can Provide
- Nativity: Whether native or non-native, and date of entry to the US and citizenship status for non-natives
- SES and Demographics: Household composition, education, sources of income, race/ethnicity, marital status, etc.
- Cross-Sectional Population Samples: Description of both program participants and non-participants at a point in time.
16Administrative Data Can Provide
- Program Participation Over Time: Medi-Cal Eligibility Data System (MEDS)
- Monthly record of eligibility for welfare programs, 1988-2002
- Programmatic basis for eligibility
- Work History Over Time: Employment Development Department (EDD) Base Wage files
- Quarterly earnings as reported for UI/DI coverage from 1991 to 1999
- Identifies number of employers, total covered earnings
17Census Surveys with Program Participation by Nativity, 1990-02
[Figure: CPS and SIPP samples by year, 1990-2002]
Samples are drawn each year with household and personal characteristics measured at sampling. The CPS follows sampled housing units for 4 months in 2 consecutive years, while the SIPP follows households for 2.5 years with interviews every 4 months.
18California Administrative Data: Program Participation by Year (MEDS)
[Figure: MEDS DATA, years 1990-2002]
Medi-Cal Eligibility and Program Participation identified monthly for samples following the initial survey interview for the year of sampling.
19Year of MEDS coverage
[Figure: MEDS DATA, numbered years of coverage for each sample year, 1990-2002]
and each subsequent year through 2002. So individuals in each panel may be potentially tracked in the MEDS data for up to 13 years after initial sampling.
20California Administrative Data: Wages by Year from UI Base Wage File
[Figure: EDD DATA, numbered years of wage coverage for each sample year, 1990-2002]
Earnings in UI covered employment are identified for each quarter from mid-1991 through 1999.
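The two coverage windows can be expressed as a small calculation: MEDS follows a panel from its sampling year through 2002, and EDD wages are available from mid-1991 through 1999. The sketch below makes two assumptions that are not in the slides: "mid-1991" is read as the third quarter of 1991, and wage tracking is counted from the first quarter of the sampling year.

```python
MEDS_LAST_YEAR = 2002   # from the slides: MEDS coverage runs through 2002
EDD_FIRST = (1991, 3)   # assumption: "mid-1991" read as 1991 Q3
EDD_LAST = (1999, 4)    # from the slides: wage data run through 1999


def meds_years(sample_year):
    """Calendar years of potential MEDS coverage, counting the sampling year itself."""
    return max(0, MEDS_LAST_YEAR - sample_year + 1)


def quarter_index(year, quarter):
    """Map (year, quarter) to a running integer so quarters can be subtracted."""
    return year * 4 + (quarter - 1)


def edd_quarters(sample_year):
    """Quarters of potential EDD wage coverage, starting at Q1 of the sampling year (assumed)."""
    start = max(quarter_index(sample_year, 1), quarter_index(*EDD_FIRST))
    end = quarter_index(*EDD_LAST)
    return max(0, end - start + 1)


for year in (1990, 1995, 2000):
    print(year, meds_years(year), edd_quarters(year))
```

Under these assumptions a 1990 panel has 13 potential MEDS years, matching the figure quoted on the MEDS coverage slide, while panels drawn in 2000 or later have no overlap with the wage file.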
21Census Survey Data and State Administrative Data
- MEDS: Obtains SSN for (almost) all Medi-Cal eligible persons
- CPS: Requests SSN for all persons aged 15 in the household
- SIPP: Attempts to obtain SSN for all in the household
- EDD: Obtains SSN and wages from employers of UI/DI covered employees
- CES and LEHD assign Protected Identification Keys (PIKs) based on SSN
- CES provides crosswalk between PIKs and publicly available identifiers for CPS and SIPP
- CES provides anonymized MEDS and EDD Base Wage records identified only by PIK
- Survey records and administrative records are merged using the PIK to create a matched file
- Final Linked File
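The merge step can be sketched as a keyed join: survey identifiers are translated to PIKs through the crosswalk, and administrative records are then looked up by PIK. Every identifier, field name, and value below is invented for illustration; the actual file layouts and the way PIKs are assigned are not described in the slides.

```python
# Hypothetical survey records with public identifiers.
survey = [
    {"survey_id": "CPS-0001", "nativity": "native"},
    {"survey_id": "SIPP-0002", "nativity": "non-citizen"},
    {"survey_id": "CPS-0003", "nativity": "naturalized"},
]

# Hypothetical crosswalk from public survey identifier to PIK (CPS-0003 has none).
crosswalk = {"CPS-0001": "PIK-A", "SIPP-0002": "PIK-B"}

# Hypothetical anonymized administrative records keyed only by PIK.
meds = {"PIK-A": {"months_eligible": 12}, "PIK-B": {"months_eligible": 30}}
edd = {"PIK-B": {"quarters_with_earnings": 8}}


def link(survey, crosswalk, meds, edd):
    """Build linked records keyed by PIK; survey cases without a PIK cannot be linked."""
    linked = []
    for person in survey:
        pik = crosswalk.get(person["survey_id"])
        if pik is None:
            continue
        linked.append({
            "pik": pik,
            "nativity": person["nativity"],
            "meds": meds.get(pik),  # None when the person never appears in MEDS
            "edd": edd.get(pik),    # None when no covered earnings were reported
        })
    return linked


linked = link(survey, crosswalk, meds, edd)
print(linked)
```

Keeping a missing administrative record as `None` rather than dropping the case preserves people who were never on aid or never had covered earnings, which is itself an outcome of interest.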
22[Figure: coverage of EDD DATA and MEDS DATA by year]
Matched Survey, MEDS, and UI Earnings data cover pre- and post-Welfare Reform periods, and weak and strong economies.
23Basic Problems
- Highly confidential data
- Requires Census Research Data Center
- Non-public data available to researchers on a strictly controlled basis
- State Data Must be Matched by Census
- But these are expensive to run
24Two Big Findings in California
- Non-Citizen Elderly Immigrants on Welfare: Non-citizen (but legal) immigrants are more likely to eventually end up on welfare for the elderly (SSI/SSP), especially if they came at an older age (probably because of less Social Security based work).
- Non-Citizen Immigrant Women on Welfare: Non-citizen (but legal) immigrant women in two-parent families are less likely to get off welfare (probably because of fewer skills, less language competency, perhaps cultural factors).
25Percent on SSI/SSP at Some Point, Among Adults in Surveys Who Were (or Became) 65 or Older During 1990-2002
SSI/SSP: Welfare Program for Elderly Poor or Disabled
[Chart categories: Non-Citizens and Naturalized; Entire Population; Non-Citizen; Naturalized; Native]
26Aid and Employment in Years after Sampling: Women initially in 2-Parent AFDC/TANF cases (Total percentage declines over time as we lose track of people)
[Chart panels: Native Women; Non-Citizen Women. Series: Working; Welfare and Work; Still on Welfare. Horizontal axis: Years Since Initial Sampling]
27Thinking about Census and Survey Data
28Three Dimensions of Data: Quantity and Quality for Each
Number of Variables and Item Quality
Number of Cases and Representativeness
Length of Time and Panel Integrity
29Ideal Data Set
- Variables
- As many variables as possible
- High item quality
- Cases
- As many cases as possible
- Highly representative (e.g., random sample)
- Time
- As long a period of time as possible
- Continuous observation, no panel mortality
30Surveys: Rich in Variables, For Short Time Periods, Not Many Cases
[Diagram axes: Variables; Cases; Time. Plotted: Most Survey Data]
31Administrative Data: Weak in Variables, Rich in Cases, Rich in Time if Linked Over Time
[Diagram axes: Variables; Cases; Time. Plotted: Linked Administrative Data]
32Problems with Surveys and Censuses
- Designing/Implementing Good Sample Frames
- Telephone: cell phones, no phones, etc.
- Internet: choosing random sample, self-selection
- Responses Hard to Get
- Interview Response Rates Declining
- Item non-responses problematic (e.g., income, race)
- Costs High: In-person and Telephone Expensive
- In-person: about $500 to $1,500/interview
- Telephone: about $50 to $150/interview
- Internet: about $5 to $50/interview
- Confidentiality Concerns with Collected Data
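To see what the per-interview figures imply for a whole study, the ranges can be multiplied out by sample size. This is a worked-arithmetic sketch that reads the slide's numbers as US dollars per completed interview (an assumption); the sample size of 1,000 is an arbitrary example.

```python
# Per-interview cost ranges (low, high) taken from the slide, read as US dollars (assumed).
COST_PER_INTERVIEW = {
    "in_person": (500, 1500),
    "telephone": (50, 150),
    "internet": (5, 50),
}


def budget_range(mode, n_interviews):
    """Return (low, high) total interviewing cost for a mode and number of interviews."""
    if n_interviews < 0:
        raise ValueError("n_interviews must be non-negative")
    low, high = COST_PER_INTERVIEW[mode]
    return low * n_interviews, high * n_interviews


for mode in COST_PER_INTERVIEW:
    low, high = budget_range(mode, 1000)
    print(f"{mode}: {low:,} to {high:,}")
```

At 1,000 interviews the ranges differ by roughly an order of magnitude per step: 500,000 to 1,500,000 in person, 50,000 to 150,000 by telephone, and 5,000 to 50,000 on the Internet.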
33Internet Surveys as Solution?
- Virtues: Inexpensive way to collect data, but it requires e-mail addresses, hence hard to get random samples
- Three Methods
- Self-selected samples
- Starting with a random sample and giving them computers
- Very expensive initially
- Hard to maintain random sample because of panel mortality
- Matching Method
- File of e-mail addresses: Collects large numbers of e-mail addresses and personal information from those willing to be interviewed on the web.
- File enumerating Americans: Chooses random samples from a file (like a phone book) constructed by a commercial firm, which contains a nearly universal file of Americans and some demographic and SES information on each one of them.
- Matched Sample: Interviews the nearest match in its e-mail address file to those in its random samples.
- Is this Representative Enough? Still not sure but
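The matching method can be illustrated with a toy nearest-match routine: for each person drawn at random from the enumerating file, pick the closest available person in the e-mail address file. The variables, scaling constants, distance measure, and the greedy match-without-replacement rule are all assumptions made for this sketch; the slides do not specify how any firm actually computes the nearest match.

```python
# Hypothetical random sample from the enumerating file (age in years, education in years).
frame_sample = [
    {"age": 34, "educ": 12, "female": 1},
    {"age": 61, "educ": 16, "female": 0},
]

# Hypothetical e-mail address file of people willing to be interviewed on the web.
panel = [
    {"email": "a@example.org", "age": 30, "educ": 12, "female": 1},
    {"email": "b@example.org", "age": 65, "educ": 16, "female": 0},
    {"email": "c@example.org", "age": 35, "educ": 18, "female": 1},
]

# Assumed scaling so that no single variable dominates the distance.
SCALES = {"age": 10.0, "educ": 4.0, "female": 1.0}


def distance(a, b):
    """Scaled squared Euclidean distance over the matching variables."""
    return sum(((a[k] - b[k]) / s) ** 2 for k, s in SCALES.items())


def match_sample(frame_sample, panel):
    """Greedily pair each target with its nearest unused panel member."""
    available = list(panel)
    matches = []
    for target in frame_sample:
        if not available:
            break
        best = min(available, key=lambda p: distance(target, p))
        available.remove(best)
        matches.append((target, best))
    return matches


matches = match_sample(frame_sample, panel)
print([m["email"] for _, m in matches])
```

A matched sample resembles the random sample only on the variables used for matching, which is one reason the representativeness question on the slide remains open.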
34Administrative Data as Solution?
- Virtues: Inexpensive way to collect data, but it requires linking of data over time and across various datasets using fallible identifiers
- Problems
- Mixed quality data: Excellent for data related to administrative purpose, often poor for all other
- Confidentiality concerns and problems
- Incomplete coverage
- Change in computer systems over time
35Linked Social Services Data in American
States--1999
36Conclusions
- Exciting New Possibilities
- Internet Interviewing
- Administrative Data
- With Some Real Problems
- Representativeness
- Confidentiality
- Linking