1
Survey Documentation and Analysis (SDA)
2
Workshop Agenda
  • Overview
  • What is online analysis?
  • Available SDA data sets
  • Statistical procedures (Frequencies, Crosstabs,
    Regression)
  • Recoding, subsetting, downloading
  • Teaching resources for SDA and developing
    instructional materials

3
SSRIC
Social Science Research Instructional Council
http://www.ssric.org
4
The Council
  • Oldest CSU discipline council
  • Founded in 1972
  • Representatives from CSU campuses meet three
    times per year
  • Negotiates with data providers for access to
    data
  • Promotes use of data analysis in research and
    teaching

5
The Council
  • Annual student research conference
      • at CSU Long Beach in 2008
      • at CSU Sacramento in 2009
  • Sponsors travel to ICPSR summer workshops in Ann
    Arbor, Michigan
      • http://www.ssric.org/participate/icpsr_summer
  • Works with Field Research
      • Question credits to California Field Poll
      • Selects faculty fellow

6
What is Online Analysis?
  • "Online data analysis" refers to the ability to
    perform statistical analysis using special
    Web-based software as an alternative to
    downloading data into a standalone statistical
    package on your computer.
  • The software we're using is called Survey
    Documentation and Analysis (SDA), which was
    developed at the University of California,
    Berkeley.

7
Alternative Statistical Packages
  • You can get a complete list of available online
    statistical packages at http://statpages.org/
  • Some of these include:
  • OpenStat
  • ViSta
  • Statext
  • SISA

8
Advantages
  • Many, like SDA, are free and don't require a
    site license
  • Only require a computer with an internet
    connection
  • Some, like SDA, are easy to learn
  • Can show students how to use some of them in 30
    minutes or less

9
Disadvantages
  • Some online statistical packages (certainly not
    all) are limited in what they can do
    statistically
  • Documentation is not very good for some
  • Some (like SDA) can only be used with data sets
    that have already been created in a format that
    can be read by that package

10
Available SDA Data Sets
11
SDA Data Sets
  • While SDA is an extremely easy statistical
    package to learn to use, it's difficult to
    create SDA data sets.
  • You have to purchase an SDA site license to
    create a data set and then learn how to use it.
  • So we typically use SDA data sets that have been
    created for us.

12
Sources for SDA Data Sets
  • SDA Archive located at UC Berkeley
    (http://sda.berkeley.edu/archive.htm)
  • ICPSR Topical Archives
    (http://www.icpsr.org/cocoon/ICPSR/all/archives.xml)
  • Field data located at UC Berkeley
    (http://ucdata.berkeley.edu/data_record.php?recid=3#analyze)
  • List of SDA data sets at CSU Long Beach
    (http://www.csulb.edu/library/eref/datasets.html)
  • University of Denver's IDEA project
    (http://www.du.edu/idea/data.htm)

13
SDA Archive at UC Berkeley
(http://sda.berkeley.edu/archive.htm)
  • GSS Cumulative Datafile (1972-2008; 2008 is a
    preliminary version).
  • ANES Cumulative Datafile (1948-2000) and ANES
    datafiles for 1996, 2000, and 2004.
  • Census microdata, including 2000-2003 American
    Community Surveys and 1990 and 2000 U.S. 1% PUMS,
    with separate files for 2000 and 1990 California
    PUMS.

14
ICPSR
  • National Archive of Computerized Data on Aging
    (http://www.icpsr.umich.edu/NACDA/)
  • National Archive of Criminal Justice Data
    (http://www.icpsr.umich.edu/NACJD/)
  • Substance Abuse and Mental Health Data Archive
    (http://www.icpsr.umich.edu/SAMHDA/)
  • International Archive of Education Data
    (http://www.icpsr.umich.edu/IAED/)

15
Field Data
(http://ucdata.berkeley.edu/data_record.php?recid=3#analyze)
  • Field Polls from 1956 through 2006 are available
    as publicly-accessible SDA data sets
  • More recent Field Polls are available as SPSS
    data sets (through FTP) for CSU faculty, staff,
    and students.

16
Other Sources of SDA Data Sets at ICPSR
  • Voting Behavior: The 2004 Election by Charles
    Prysby and Carmine Scavo
    (http://www.icpsr.umich.edu/SETUPS/)
  • Investigating Community and Social Capital by
    Lori Weber
    (http://www.icpsr.umich.edu/ICSC/index.htm)

17
Statistical Procedures
18
Available Statistical Procedures
  • Frequencies and crosstabulation (discussed in
    this workshop)
  • Comparison of means
  • Correlation matrix
  • Comparison of correlations
  • Multiple regression (discussed in this workshop)
  • Logit/Probit regression

19
Using SDA
  • Select the data set
  • Look at the codebook
  • Decide what statistical procedure to use
  • Fill in what you want to do
  • Run it

20
Data Set
  • We're going to use the GSS 1972-2008 Cumulative
    Data File (2008 is preliminary data)
      • http://sda.berkeley.edu/archive.htm
  • We're going to use three variables:
      • SEX
      • RELITEN
      • PORNLAW

21
Frequencies
  • List the variables you want to use
      • ROW: SEX,RELITEN,PORNLAW
  • Click on "Run the Table"
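
As a rough illustration of what a frequency run reports (a count and a percentage for each category), here is a minimal Python sketch. It is not SDA code; the function name `frequencies` and the toy responses are made up for illustration and are not GSS data.

```python
from collections import Counter

# Hypothetical toy responses, NOT real GSS data.
sex = ["male", "female", "female", "male", "female"]

def frequencies(values):
    """Return {category: (count, percent)} for a list of responses."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

freq_sex = frequencies(sex)
for cat, (n, pct) in sorted(freq_sex.items()):
    print(f"{cat:<8} {n:>3} {pct:6.1f}%")
```

SDA produces one such table for each variable listed in the ROW box.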

24
Crosstabs
  • Now let's use RELITEN as our independent variable
    and PORNLAW as our dependent variable to create
    a bivariate crosstabulation.
  • List the variables
      • ROW: PORNLAW
      • COLUMN: RELITEN

25
Crosstabulation Continued
  • Options
      • Percentaging: column
      • Statistics
      • Question text
      • Color coding
  • Run the Table
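
Column percentaging means each cell is shown as a percentage of its column total, so the categories of the independent (column) variable can be compared directly. The sketch below shows that arithmetic in plain Python. It is not SDA code; the function name, the simplified category labels, and the toy cases are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical toy cases as (RELITEN, PORNLAW) pairs, NOT real GSS data.
# The category labels are simplified placeholders.
cases = [
    ("strong", "illegal to all"), ("strong", "illegal to all"),
    ("strong", "illegal under 18"), ("not strong", "illegal to all"),
    ("not strong", "illegal under 18"), ("not strong", "illegal under 18"),
    ("not strong", "illegal under 18"),
]

def column_percents(pairs):
    """Crosstab with column percentaging.

    pairs holds (column_value, row_value) tuples; the result maps
    column -> {row: percent of that column's cases}.
    """
    cell_counts = Counter(pairs)
    col_totals = Counter(col for col, _ in pairs)
    table = {}
    for (col, row), n in cell_counts.items():
        table.setdefault(col, {})[row] = 100.0 * n / col_totals[col]
    return table

table = column_percents(cases)
for col in sorted(table):
    print(col, table[col])
```

Each column's percentages sum to 100, which is why the comparison is read across the columns.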

28
Your Turn
  • Let's run two more bivariate crosstabs
      • Independent variable: SEX
      • Dependent variables: RELITEN and PORNLAW
  • Go ahead and run these crosstabs

29
What Did We Discover?
  • RELITEN is strongly related to PORNLAW.
  • SEX is also related to both RELITEN and PORNLAW.
  • Could the relationship between RELITEN and
    PORNLAW be spurious? SEX is related to both
    RELITEN and PORNLAW and could be creating the
    relationship between RELITEN and PORNLAW.
  • How do we test this possibility? Let's run a
    three-variable crosstabulation with RELITEN as
    our independent variable, PORNLAW as our
    dependent variable, and SEX as our control
    variable.

30
Multivariate Crosstabulation
  • List the variables
      • ROW: PORNLAW
      • COLUMN: RELITEN
      • CONTROL: SEX
  • Options
      • Percentaging: column
      • Statistics
      • Question text
      • Color coding
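
Adding a control variable amounts to producing a separate row-by-column table for each category of the control. A minimal Python sketch of that idea follows. It is not SDA code; the function name, the simplified labels, and the toy cases are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy cases as (SEX, RELITEN, PORNLAW), NOT real GSS data.
cases = [
    ("male", "strong", "illegal to all"),
    ("male", "strong", "illegal under 18"),
    ("male", "not strong", "illegal under 18"),
    ("male", "not strong", "illegal under 18"),
    ("female", "strong", "illegal to all"),
    ("female", "strong", "illegal to all"),
    ("female", "not strong", "illegal to all"),
    ("female", "not strong", "illegal under 18"),
]

def controlled_crosstab(triples):
    """Column percentages of row by column, separately per control value.

    triples holds (control, column, row); the result maps
    control -> column -> {row: percent}.
    """
    cells = Counter(triples)
    col_totals = Counter((ctrl, col) for ctrl, col, _ in triples)
    out = defaultdict(lambda: defaultdict(dict))
    for (ctrl, col, row), n in cells.items():
        out[ctrl][col][row] = 100.0 * n / col_totals[(ctrl, col)]
    return out

partial = controlled_crosstab(cases)
for ctrl in sorted(partial):
    for col in sorted(partial[ctrl]):
        print(ctrl, col, partial[ctrl][col])
```

The tables for each control category are then compared with the original two-variable table.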

34
Spuriousness
  • Was the relationship between RELITEN and PORNLAW
    spurious due to SEX?
  • How do you know?
  • Does that mean that the relationship can never be
    spurious?

35
Regression
  • Crosstabulation is used when all the variables
    are categorical.
  • What do we do when our variables are continuous
    (i.e., interval and/or ratio)?
  • Regression is the answer.

36
Bivariate Regression
  • Let's look at the relationship between the
    respondent's socioeconomic status (SEI) and the
    amount of television the respondent watches
    (TVHOURS).
  • List the variables
      • Dependent: TVHOURS
      • Independent: SEI
  • Options
      • T-tests
      • Correlation matrix
      • Color coding
      • Question text
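
For readers who want to see the calculation behind a bivariate regression, the sketch below fits the least-squares line y = a + b*x in plain Python. It is not SDA code and does not reproduce SDA's output (no t-tests or weighting); the function name and the toy values are assumptions for illustration.

```python
# Hypothetical toy values, NOT real GSS data.
sei = [20.0, 30.0, 40.0, 50.0, 60.0]      # socioeconomic index
tvhours = [5.0, 4.0, 3.5, 2.5, 2.0]       # hours of TV per day

def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = simple_ols(sei, tvhours)
print(f"TVHOURS = {intercept:.2f} + ({slope:.3f} * SEI)")
```

The slope is the predicted change in the dependent variable for a one-unit increase in the independent variable.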

39
Multivariate Regression
  • Now let's add in another variable: SEX
  • But SEX is not a continuous variable. How do we
    enter a variable like SEX into the regression
    analysis? Answer: create a dummy variable.
  • Dummy variables take on the values of 1 and 0.

40
Creating a Dummy Variable
  • SEX(d:1)
      • SEX is the name of the variable you want to
        make into a dummy variable
      • d indicates that you want to create a dummy
        variable
      • 1 indicates that the value 1 will be assigned
        the value 1. All other values will be assigned
        the value 0.
  • Run the table
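
The dummy-variable rule described above (the chosen value becomes 1, every other value becomes 0) can be sketched in a few lines of Python. This is not SDA code; the function name `dummy` and the toy codes are hypothetical, and missing-data codes are not handled here.

```python
# Hypothetical toy codes, NOT real GSS data. In the GSS, SEX is coded
# 1 = male and 2 = female.
sex = [1, 2, 2, 1, 2]

def dummy(values, target):
    """Return 1 where the value equals target and 0 otherwise."""
    return [1 if v == target else 0 for v in values]

# Equivalent in spirit to asking for a dummy on value 1 of SEX.
male = dummy(sex, 1)
print(male)
```

With this coding, the regression coefficient for the dummy is the predicted difference between the category coded 1 and everyone else, holding the other predictors constant.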

43
Recoding, Subsetting, Downloading
44
Recoding Existing Variables
Example (from GSS Cumulative File): ATTEND (how often
respondent attends religious services)
  • ATTEND
      • 0 = Never
      • 1 = Less than once a year
      • 2 = Once a year
      • 3 = Several times a year
      • 4 = Once a month
      • 5 = 2 to 3 times a month
      • 6 = Nearly every week
      • 7 = Every week
      • 8 = More than once a week
      • 9 = DK/NA (Missing)
  • ATTENDR
      • 1 = Seldom (0 to 3)
      • 2 = Sometimes (4 to 5)
      • 3 = Often (6 to 8)
      • 9 = Missing (9)
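
The ATTEND to ATTENDR mapping listed above can be written out as a small Python function. This is only a sketch of the recoding logic, not SDA's recode syntax; the function name and the toy input codes are hypothetical.

```python
def recode_attend(code):
    """Collapse an ATTEND code (0-9) into the ATTENDR categories."""
    if 0 <= code <= 3:
        return 1  # Seldom
    if 4 <= code <= 5:
        return 2  # Sometimes
    if 6 <= code <= 8:
        return 3  # Often
    return 9      # Missing (DK/NA or any other code)

# Hypothetical toy codes, NOT real GSS data.
attend = [0, 3, 4, 5, 6, 8, 9]
attendr = [recode_attend(c) for c in attend]
print(attendr)
```

Running frequencies on both the original and the recoded variable is a quick way to check that every original code landed in the intended category.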

49
Your Turn
  • Recode AGE into the following categories:
      • 1 = 18-29
      • 2 = 30-64
      • 3 = 65 and older
  • Obtain FREQUENCIES for the result

50
For More Information, See
  • http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#recode

51
Compute a New Variable
Example (from GSS Cumulative File): Alienation Index
  • Create a measure of ALIENATION from these
    variables, asked in 1978 only (all coded as
    1 = agree, 2 = disagree, other = missing data)
      • ALIENAT1: PEOPLE RUNNING COUNTRY DON'T CARE
      • ALIENAT2: RICH GET RICHER, POOR POORER
      • ALIENAT3: WHAT YOU THINK DOESN'T COUNT
      • ALIENAT4: YOU'RE LEFT OUT OF THINGS
      • ALIENAT5: POWERFUL PEOPLE TAKE ADVANTAGE OF YOU
      • ALIENAT6: PEOPLE IN WASH D.C. ARE OUT OF TOUCH
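
The transcript does not preserve how the workshop combined these items, so the following Python sketch shows just one plausible scoring rule, stated here as an assumption: count the number of "agree" answers (code 1) and treat a respondent as missing if any item is neither 1 nor 2. The function name and toy respondents are hypothetical, and this is not SDA's compute syntax.

```python
def alienation_index(items):
    """Count 'agree' answers (code 1) across the six ALIENAT items.

    Returns None when any item is neither 1 (agree) nor 2 (disagree),
    treating the whole case as missing. This scoring rule is an
    assumption made for illustration.
    """
    if any(v not in (1, 2) for v in items):
        return None
    return sum(1 for v in items if v == 1)

# Hypothetical toy respondents, NOT real GSS data.
respondents = [
    [1, 1, 2, 2, 1, 1],   # four agrees
    [2, 2, 2, 2, 2, 2],   # no agrees
    [1, 9, 2, 1, 1, 2],   # one missing answer
]
scores = [alienation_index(r) for r in respondents]
print(scores)
```

Under this rule the index runs from 0 (disagrees with every statement) to 6 (agrees with every statement).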

55
Your Turn
  • Create an index of parental education:
    (MAEDUC + PAEDUC)/2
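
The arithmetic behind this index is a simple average of the two parents' years of schooling. A minimal Python sketch is shown below. It is not SDA's compute syntax; the function name is hypothetical, and using `None` for missing data is an assumption made for this illustration.

```python
def parental_education(maeduc, paeduc):
    """Mean of mother's and father's years of schooling, or None if missing."""
    if maeduc is None or paeduc is None:
        return None
    return (maeduc + paeduc) / 2

# Hypothetical toy values, NOT real GSS data; None marks missing data.
pairs = [(12, 16), (8, 10), (12, None)]
index = [parental_education(m, p) for m, p in pairs]
print(index)
```

Whatever tool is used, missing-data codes need to be excluded before averaging so they are not treated as years of schooling.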

56
For More Information, See
  • http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#compute

57
Subsetting and Downloading
  • Example: create and download a subset of the GSS
    cumulative file, selecting only cases from 2008,
    all Case Identification variables, and some
    Personal and Family Information variables
    (MARITAL, AGEWED, DIVORCE, WIDOWED).
  • At the end of each intermediate step, click on
    the Continue button.
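
Conceptually, subsetting does two things: it filters the cases (rows) and it selects the variables (columns). The Python sketch below shows both steps on a few made-up records. It is not how SDA performs the extract; the function name and the toy records are hypothetical.

```python
# Hypothetical toy records, NOT real GSS data.
records = [
    {"YEAR": 2006, "ID": 1, "MARITAL": 1, "AGEWED": 22, "SEX": 2},
    {"YEAR": 2008, "ID": 1, "MARITAL": 3, "AGEWED": 25, "SEX": 1},
    {"YEAR": 2008, "ID": 2, "MARITAL": 5, "AGEWED": None, "SEX": 2},
]

def subset(rows, year, keep):
    """Keep only cases from the given year and only the listed variables."""
    return [{name: row[name] for name in keep}
            for row in rows if row["YEAR"] == year]

extract = subset(records, 2008, ["YEAR", "ID", "MARITAL", "AGEWED"])
for row in extract:
    print(row)
```

Keeping the case identification variables in the extract makes it possible to match the subset back to the full file later.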

62
SPSS Syntax File
63
Creating an SPSS system file
  • Run SPSS (syntax) file against data (ASCII) file.
  • For more information, see
      • http://www.ssric.org/data/icpsr_direct (scroll
        down)
      • http://www.ssric.org/data/icpsr_direct (scroll to
        Syntax Files)
      • http://www.icpsr.com/cocoon/ICPSR/FAQ/0062.xml
      • http://web.pdx.edu/stipakb/download/Data/SDA_data_to_SPSS.pdf
        (portions outdated)

64
File Directory
67
Your Turn
  • Subset and download your own custom GSS SPSS
    system file.

68
Sample Instructional Applications: Crosstabs With
a Control Variable
69
Example 1
  • GSS Cumulative File (selecting 2002 and 2004
    only)
  • Crosstab: voting in 2000 election (VOTE00) by
    computer usage (COMPUSE).
  • Repeat, but with a control for respondent's
    education level (DEGREE).

70
Example 2
  • ANES 2004 Study
  • Instructor's note: In addition to using this
    example in teaching the use of control variables,
    I also use it in teaching about reactivity in
    interviewing.
  • Run frequency distribution for V5205 (Working
    mother can have warm relationship with kids).
  • Crosstab V5205 with V1109a (Respondent gender).
    Weight by post-election weight.
  • Repeat, but use V4103 (Interviewer gender) as
    independent variable.
  • Run frequency distribution for V4103.
  • Repeat the V1109a crosstab with a control for
    V4103.
  • Repeat the V4103 crosstab with a control for
    V1109a.

71
Teaching Resources for SDA and Developing
Instructional Materials
72
ICPSR Web-Based Instructional Materials
http://www.icpsr.umich.edu/ICPSR/training/index.html#instructional
73
Investigating Community and Social Capital
http://www.icpsr.umich.edu/ICSC/index.html

74
Voting Behavior: The 2004 Election
http://www.icpsr.umich.edu/SETUPS/index.html