Title: Survey Documentation and Analysis (SDA)
1Survey Documentation and Analysis (SDA)
2Presenter
- Ed Nelson
- Sociology
- CSU Fresno
- ednelson_at_csufresno.edu
- 559-278-2275
3Workshop Agenda
- Overview
- What is online analysis?
- Available SDA data sets
- Statistical procedures (Frequencies, Crosstabs, Regression)
- Recoding, subsetting, downloading
- Teaching resources for SDA and developing
instructional materials
4SSRIC
Social Science Research Instructional Council
http://www.ssric.org
5The Council
- Oldest CSU discipline council
- Founded in 1972
- Representatives from CSU campuses meet three times per year
- Negotiates with data providers for access to data
- Promotes use of data analysis in research and teaching
6The Council
- Annual student research conference
  - at CSU Long Beach in 2008
  - at CSU Sacramento in 2009
- Sponsors travel to ICPSR summer workshops in Ann Arbor, Michigan
  - http://www.ssric.org/participate/icpsr_summer
- Works with Field Research
  - Question credits to California Field Poll
  - Selects faculty fellow
7What is Online Analysis?
- "Online data analysis" refers to the ability to perform statistical analysis using special Web-based software as an alternative to downloading data into a standalone statistical package on your computer.
- The software we're using is called Survey Documentation and Analysis (SDA), which was developed at the University of California, Berkeley.
8Alternative Statistical Packages
- You can get a complete list of available online statistical packages at http://statpages.org/
- Some of these include:
- OpenStat
- ViSta
- Statext
- SISA
9Advantages
- Many, like SDA, are free and don't require a site license
- Only require a computer with an internet connection
- Some, like SDA, are easy to learn
- Can show students how to use some of them in 30 minutes or less
10Disadvantages
- Some online statistical packages (certainly not all) are limited in what they can do statistically
- Documentation is not very good for some
- Some (like SDA) can only be used with data sets that have already been created in a format that can be read by that package
11Available SDA Data Sets
12SDA Data Sets
- While SDA is an extremely easy statistical package to learn to use, it's difficult to create SDA data sets.
- You have to purchase an SDA site license to create a data set and then learn how to use it.
- So we typically use SDA data sets that have been created for us.
13Sources for SDA Data Sets
- SDA Archive located at UC Berkeley (http://sda.berkeley.edu/archive.htm)
- ICPSR Topical Archives (http://www.icpsr.org/cocoon/ICPSR/all/archives.xml)
- Field data located at UC Berkeley (http://ucdata.berkeley.edu/data_record.php?recid=3#analyze)
- List of SDA data sets at CSU Long Beach (http://www.csulb.edu/library/eref/datasets.html)
- University of Denver's IDEA project (http://www.du.edu/idea/data.htm)
14SDA Archive at UC Berkeley (http://sda.berkeley.edu/archive.htm)
- GSS Cumulative Datafile (1972-2008; 2008 is a preliminary version).
- ANES Cumulative Datafile (1948-2000) and ANES datafiles for 1996, 2000, and 2004.
- Census microdata, including 2000-2003 American Community Surveys and 1990 and 2000 U.S. 1% PUMS, with separate files for 2000 and 1990 California PUMS.
15ICPSR
- National Archive of Computerized Data on Aging (http://www.icpsr.umich.edu/NACDA/)
- National Archive of Criminal Justice Data (http://www.icpsr.umich.edu/NACJD/)
- Substance Abuse and Mental Health Data Archive (http://www.icpsr.umich.edu/SAMHDA/)
- International Archive of Education Data (http://www.icpsr.umich.edu/IAED/)
16Field Data http://ucdata.berkeley.edu/data_record.php?recid=3#analyze
- Field Polls from 1956 through 2006 are available as publicly accessible SDA data sets
- More recent Field Polls are available as SPSS data sets (through FTP) for CSU faculty, staff, and students.
17Other Sources of SDA Data Sets at ICPSR
- Voting Behavior: The 2004 Election by Charles Prysby and Carmine Scavo (http://www.icpsr.umich.edu/SETUPS/)
- Investigating Community and Social Capital by Lori Weber (http://www.icpsr.umich.edu/ICSC/index.htm)
18Statistical Procedures
19Available Statistical Procedures
- Frequencies and crosstabulation (discussed in this workshop)
- Comparison of means
- Correlation matrix
- Comparison of correlations
- Multiple regression (discussed in this workshop)
- Logit/Probit regression
20Using SDA
- Select the data set
- Look at the codebook
- Decide what statistical procedure to use
- Fill in what you want to do
- Run it
21Data Set
- We're going to use the GSS 1972-2008 Cumulative Data File (2008 is preliminary data)
- http://sda.berkeley.edu/archive.htm
- We're going to use three variables:
- SEX
- RELITEN
- PORNLAW
22Frequencies
- List the variables you want to use
- ROW: SEX, RELITEN, PORNLAW
- Click on Run the Table
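
For readers who want to see what a frequency table computes, here is a minimal Python sketch. It is not SDA's implementation. The data values are invented for illustration (they are not GSS results), and the coding 1 = male, 2 = female is an assumption.

```python
from collections import Counter

def frequencies(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(v, n, 100.0 * n / total) for v, n in sorted(counts.items())]

# Invented toy codes, NOT GSS data. Assumed coding: 1 = male, 2 = female.
sex = [1, 2, 2, 1, 2, 2, 1, 2]

for value, n, pct in frequencies(sex):
    print(f"{value}: n={n} ({pct:.1f}%)")
```

In SDA the same result comes from listing the variable in the ROW box; the sketch only shows the arithmetic behind the counts and percentages.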
25Crosstabs
- Now let's use RELITEN as our independent variable and PORNLAW as our dependent variable to create a bivariate crosstabulation.
- List the variables
- ROW: PORNLAW
- COLUMN: RELITEN
26Crosstabulation Continued
- Options
- Percentaging: column
- Statistics
- Question text
- Color coding
- Run the Table
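
Column percentaging means each cell is divided by its column total, so the categories of the independent variable (the column) can be compared. The following Python sketch illustrates that calculation under stated assumptions: the category labels are simplified and the cases are invented, so the output says nothing about the actual GSS.

```python
from collections import Counter

def column_percent_crosstab(rows, cols):
    """Crosstab of rows by cols, with each cell as a percent of its column total."""
    cell = Counter(zip(rows, cols))
    col_totals = Counter(cols)
    row_values = sorted(set(rows))
    col_values = sorted(set(cols))
    return {
        r: {c: 100.0 * cell[(r, c)] / col_totals[c] for c in col_values}
        for r in row_values
    }

# Invented toy cases with simplified labels, NOT GSS data.
reliten = ["strong", "strong", "strong", "strong", "weak", "weak", "weak", "weak"]
pornlaw = ["illegal", "illegal", "illegal", "legal", "illegal", "legal", "legal", "legal"]

table = column_percent_crosstab(rows=pornlaw, cols=reliten)
for row_label, cells in table.items():
    print(row_label, cells)
```

Each column sums to 100 percent, which is why the dependent variable goes in the row and the independent variable in the column.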
29Your Turn
- Let's run two more bivariate crosstabs
- Independent variable: SEX
- Dependent variables: RELITEN and PORNLAW
- Go ahead and run these crosstabs
30What Did We Discover?
- RELITEN is strongly related to PORNLAW.
- SEX is also related to both RELITEN and PORNLAW.
- Could the relationship between RELITEN and PORNLAW be spurious? SEX is related to both RELITEN and PORNLAW and could be creating the relationship between RELITEN and PORNLAW.
- How do we test this possibility? Let's run a three-variable crosstabulation with RELITEN as our independent variable, PORNLAW as our dependent variable, and SEX as our control variable.
31Multivariate Crosstabulation
- List the variables
- ROW: PORNLAW
- COLUMN: RELITEN
- CONTROL: SEX
- Options
- Percentaging: column
- Statistics
- Question text
- Color coding
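
Adding a control variable amounts to producing one column-percentaged table per category of the control. The Python sketch below illustrates the idea; it is an illustration rather than SDA's code, and the labels and cases are invented (not GSS data).

```python
from collections import Counter, defaultdict

def crosstab_by_control(rows, cols, controls):
    """Return one column-percentaged crosstab per category of the control variable."""
    groups = defaultdict(list)
    for r, c, k in zip(rows, cols, controls):
        groups[k].append((r, c))

    result = {}
    for k, pairs in groups.items():
        cell = Counter(pairs)
        col_totals = Counter(c for _, c in pairs)
        row_values = sorted({r for r, _ in pairs})
        result[k] = {
            r: {c: 100.0 * cell[(r, c)] / col_totals[c] for c in sorted(col_totals)}
            for r in row_values
        }
    return result

# Invented toy cases with simplified labels, NOT GSS data.
sex     = ["male"] * 4 + ["female"] * 4
reliten = ["strong", "strong", "weak", "weak"] * 2
pornlaw = ["illegal", "legal", "legal", "legal",
           "illegal", "illegal", "illegal", "legal"]

result = crosstab_by_control(rows=pornlaw, cols=reliten, controls=sex)
for group, table in result.items():
    print(group, table)
```

If the column percentages still differ within each category of the control, as they do in this toy data, the original relationship is not explained away by that particular control variable.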
35Spuriousness
- Was the relationship between RELITEN and PORNLAW spurious due to SEX?
- How do you know?
- Does that mean that the relationship can never be spurious?
36Regression
- Crosstabulation is used when all the variables are categorical.
- What do we do when our variables are continuous (i.e., interval and/or ratio)?
- Regression is the answer.
37Bivariate Regression
- Let's look at the relationship between the respondent's socioeconomic status (SEI) and the amount of television the respondent watches (TVHOURS).
- List the variables
- Dependent: TVHOURS
- Independent: SEI
- Options
- T-Tests
- Correlation matrix
- Color coding
- Question Text
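
As a rough illustration of what bivariate regression estimates, the Python sketch below computes the least-squares intercept and slope by hand. The numbers are invented so that they fall on a perfect line; they are not GSS values, and SDA's output also includes statistics (such as t-tests) that this sketch omits.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented toy values, NOT GSS data.
sei = [20, 40, 60, 80]      # independent variable
tvhours = [5, 4, 3, 2]      # dependent variable

intercept, slope = simple_ols(sei, tvhours)
print(f"TVHOURS = {intercept:.2f} + ({slope:.3f}) * SEI")
```

A negative slope in this toy example means predicted television hours fall as SEI rises.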
40Multivariate Regression
- Now let's add in another variable: SEX
- But SEX is not a continuous variable. How do we enter a variable like SEX into the regression analysis? Answer: create a dummy variable.
- Dummy variables take on the values of 1 and 0.
41Creating a Dummy Variable
- SEX(d:1)
- SEX is the name of the variable you want to make into a dummy variable
- d indicates that you want to create a dummy variable
- 1 indicates that the value 1 will be assigned the value 1. All other values will be assigned the value 0.
- Run the table
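
The dummy-variable rule described above can be written out in a few lines of Python. This is a sketch of the rule as stated on the slide, not SDA's code; the function name `dummy` and the sample codes are hypothetical, and the handling of missing data is not covered here.

```python
def dummy(values, target):
    """Return 1 where a value equals target and 0 for every other value."""
    return [1 if v == target else 0 for v in values]

# Invented toy codes, NOT GSS data. Assumed coding: 1 = male, 2 = female.
sex = [1, 2, 2, 1]

sex_dummy = dummy(sex, 1)
print(sex_dummy)
```

The resulting 0/1 variable can then enter a regression like any numeric predictor.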
44Recoding, Subsetting, Downloading
45Recoding Existing Variables: Example (from GSS Cumulative File): ATTEND (How often respondent attends religious services)
- ATTEND
- 0 = Never
- 1 = Less than once a year
- 2 = Once a year
- 3 = Several times a year
- 4 = Once a month
- 5 = 2 to 3 times a month
- 6 = Nearly every week
- 7 = Every week
- 8 = More than once a week
- 9 = DK/NA (Missing)
- ATTENDR
- 1 = Seldom (0 to 3)
- 2 = Sometimes (4 to 5)
- 3 = Often (6 to 8)
- 9 = Missing (9)
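
The ATTEND to ATTENDR mapping above can be expressed as a small function. This Python sketch only mirrors the recoding rules listed on the slide; the function name `recode_attend` is hypothetical, and in SDA the recode is entered through the web form instead.

```python
def recode_attend(value):
    """Collapse ATTEND (0-9) into ATTENDR: 1 seldom, 2 sometimes, 3 often, 9 missing."""
    if 0 <= value <= 3:
        return 1   # Seldom
    if 4 <= value <= 5:
        return 2   # Sometimes
    if 6 <= value <= 8:
        return 3   # Often
    return 9       # Missing (DK/NA or any other code)

recoded = [recode_attend(v) for v in range(10)]
print(recoded)
```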
50Your Turn
- Recode AGE into the following categories:
- 1 = 18-29
- 2 = 30-64
- 3 = 65 and older
- Obtain FREQUENCIES for the result
51For More Information, See
- http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#recode
52Compute a New Variable: Example (from GSS Cumulative File): Alienation Index
- Create a measure of ALIENATION from these variables, asked in 1978 only (all coded as 1 = agree, 2 = disagree, other = missing data)
- ALIENAT1: PEOPLE RUNNING COUNTRY DON'T CARE
- ALIENAT2: RICH GET RICHER, POOR POORER
- ALIENAT3: WHAT YOU THINK DOESN'T COUNT
- ALIENAT4: YOU'RE LEFT OUT OF THINGS
- ALIENAT5: POWERFUL PEOPLE TAKE ADVANTAGE OF YOU
- ALIENAT6: PEOPLE IN WASH D.C. ARE OUT OF TOUCH
56Your Turn
- Create an index of parental education: (MAEDUC + PAEDUC)/2
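
The index is simply the mean of the two parents' years of education. A minimal Python sketch of that arithmetic follows. Returning a missing result when either parent's value is missing is an assumption made here, not something stated on the slides, and the function name `parental_education` is hypothetical.

```python
def parental_education(maeduc, paeduc):
    """Average of mother's and father's years of education.

    Assumption: if either value is missing (None), the index is missing too.
    """
    if maeduc is None or paeduc is None:
        return None
    return (maeduc + paeduc) / 2

# Invented toy values, NOT GSS data.
print(parental_education(12, 16))
print(parental_education(None, 12))
```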
57For More Information, See
- http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#compute
58Subsetting and Downloading
- Example: create and download a subset of the GSS cumulative file, selecting only cases from 2008, all Case Identification variables, and some Personal and Family Information variables (MARITAL, AGEWED, DIVORCE, WIDOWED).
- At the end of each intermediate step, click on the Continue button.
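
Conceptually, subsetting does two things: it filters cases (here, year 2008) and it keeps only the chosen variables. The Python sketch below illustrates both steps on invented records. Treating YEAR and ID as the case identification variables is an assumption, and writing CSV is only for illustration, since SDA itself delivers a data file plus a syntax file.

```python
import csv
import io

# Assumed variable list: YEAR and ID stand in for the case identification variables.
KEEP = ["YEAR", "ID", "MARITAL", "AGEWED", "DIVORCE", "WIDOWED"]

def subset_cases(records, year, keep):
    """Keep only cases from the given year, and only the listed variables."""
    return [{name: rec[name] for name in keep}
            for rec in records if rec["YEAR"] == year]

def to_csv(records, keep):
    """Serialize the subset as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=keep, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Invented toy records, NOT GSS data.
gss = [
    {"YEAR": 2006, "ID": 1, "SEX": 1, "MARITAL": 1, "AGEWED": 24, "DIVORCE": 2, "WIDOWED": 2},
    {"YEAR": 2008, "ID": 1, "SEX": 2, "MARITAL": 3, "AGEWED": 22, "DIVORCE": 1, "WIDOWED": 2},
    {"YEAR": 2008, "ID": 2, "SEX": 1, "MARITAL": 1, "AGEWED": 30, "DIVORCE": 2, "WIDOWED": 2},
]

extract = subset_cases(gss, 2008, KEEP)
print(to_csv(extract, KEEP))
```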
63SPSS Syntax File
64Creating an SPSS system file
- Run the SPSS (syntax) file against the data (ASCII) file.
- For more information, see:
- http://www.ssric.org/data/icpsr_direct (scroll down)
- http://www.ssric.org/data/icpsr_direct (scroll to Syntax Files)
- http://www.icpsr.com/cocoon/ICPSR/FAQ/0062.xml
- http://web.pdx.edu/stipakb/download/Data/SDA_data_to_SPSS.pdf (portions outdated)
65File Directory
68Your Turn
- Subset and download your own custom GSS SPSS
system file.
69Sample Instructional Applications: Crosstabs With a Control Variable
70Example 1
- GSS Cumulative File (selecting 2002 and 2004 only)
- Crosstab voting in the 2000 election (VOTE00) by computer usage (COMPUSE).
- Repeat, but with a control for respondent's education level (DEGREE).
71Example 2
- ANES 2004 Study
- Instructor's note: In addition to using this example to teach the use of control variables, I also use it to teach about reactivity in interviewing.
- Run a frequency distribution for V5205 (Working mother can have warm relationship with kids).
- Crosstab 1: V5205 with V1109a (Respondent gender). Weight by the post-election weight.
- Crosstab 2: Repeat, but use V4103 (Interviewer gender) as the independent variable.
- Run a frequency distribution for V4103.
- Repeat crosstab 1 with a control for V4103.
- Repeat crosstab 2 with a control for V1109a.
72Teaching Resources for SDA and Developing Instructional Materials
73ICPSR Web-Based Instructional Materials http://www.icpsr.umich.edu/ICPSR/training/index.html#instructional
74Investigating Community and Social Capital http://www.icpsr.umich.edu/ICSC/index.html
75Voting Behavior: The 2004 Election http://www.icpsr.umich.edu/SETUPS/index.html