SDA: a tool for teaching and research with microdata

1
SDA: a tool for teaching and research with
microdata
  • Laine Ruus <laine.ruus_at_utoronto.ca>
  • University of Toronto. Data Library Service
  • 2008-12-03, revised 2009-04-14
  • http://www.chass.utoronto.ca/misc/mun09/sda_intro.ppt

2
What this session covers
  • Introduction
  • Demo of main SDA capabilities
  • Some tips and tricks
  • Advantages and disadvantages for teaching and
    research
  • Common questions about SDA

3
SDA@UT is brought to you by
  • University of California, Berkeley.
    Computer-assisted Survey Methods Program (CSM)
    writes and supports the server-side software
  • University of Toronto. Centre for Computing in
    the Humanities and Social Sciences (CHASS)
    provides the hardware, buys the software, and
    provides system support wetware
  • University of Toronto. Libraries provides the
    budget to purchase the data, and care, feeding
    and user support wetware
  • And Memorial University Libraries which
    subscribes to the service.

4
Our experience with SDA
  • CHASS installed SDA in the fall of 2004
  • At last count, have 900 data files in SDA
  • Some have only the metadata that was generated
    from the original syntax files (SAS/SPSS/Stata),
    but a number also have full question text.
  • Most are microdata, but a few are aggregate
    statistics (census files)
  • A number of voracious data users now expect to
    find the latest microdata released by Stat Can in
    SDA

7
Review of main SDA utilities
  • Frequencies, weighted & unweighted
  • Crosstabulations
  • Comparison of means (ANOVA)
  • Correlations
  • Regressions
  • Logit/probit regressions
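
The difference between weighted and unweighted frequencies can be illustrated outside SDA with a minimal Python sketch. The function name, the variable codes, and the weights below are hypothetical and are not taken from any SDA file.

```python
from collections import defaultdict

def frequencies(values, weights=None):
    """Return {category: percent} for a list of category codes.

    With weights=None every case counts once (unweighted).
    """
    if weights is None:
        weights = [1.0] * len(values)
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {cat: 100.0 * t / grand_total for cat, t in totals.items()}

# Hypothetical data: a category code per respondent and a survey weight.
sex = [1, 1, 2, 2]
weight = [100.0, 100.0, 300.0, 300.0]

unweighted = frequencies(sex)          # each respondent counts once
weighted = frequencies(sex, weight)    # each respondent counts `weight` times
```

In this made-up sample the two categories are equally common among respondents, but the weights shift the estimated population distribution toward category 2.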

8
Tips & tricks
  • Have we not gotten around to coding the missing
    values?
  • Want to include missing values in your
    cross-tabulation, or other analysis?
  • Collapsing uniform categories of continuous
    variables on the fly
  • Recoding variables on the fly

9
Problem: in this variable, we have not yet coded
value 5 as missing data. Therefore it would be
included in analyses.
10
Solution: specify, after the variable name, only
those values you want to include
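The effect of listing only the wanted values can be sketched in Python. This is an illustration of the idea, not SDA's own syntax; the function name and the response codes are hypothetical.

```python
def select_values(cases, keep):
    """Keep only the cases whose code appears in `keep`."""
    keep = set(keep)
    return [c for c in cases if c in keep]

# Hypothetical responses; code 5 has not yet been flagged as missing.
responses = [1, 2, 5, 3, 4, 5, 2]

# Listing values 1-4 explicitly leaves the stray 5s out of the analysis.
valid = select_values(responses, range(1, 5))
```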
11
Problem: to include values coded as missing in
descriptive statistics or analyses
This is a missing value. It will not be included
in descriptive statistics or analyses.
12
Solution 1: specify, after the variable name,
the lowest value thru .
13
Solution 2: use "include missing data values"
under Table options
14
Solution 3: list the values explicitly after the
variable name
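What "including missing values" changes in a table can be shown with a small Python sketch. The function, its `include_missing` flag, and the code 9 are assumptions made for illustration; they do not reproduce SDA's interface.

```python
from collections import Counter

def tabulate(cases, missing_codes, include_missing=False):
    """Count cases per code, dropping declared missing codes by default."""
    missing = set(missing_codes)
    if include_missing:
        return Counter(cases)
    return Counter(c for c in cases if c not in missing)

# Hypothetical responses; 9 is the declared missing-data code.
responses = [1, 2, 9, 1, 9, 3]

default_table = tabulate(responses, missing_codes=[9])
full_table = tabulate(responses, missing_codes=[9], include_missing=True)
```

The default table is based on four cases; the full table keeps all six and shows the missing code as a category of its own.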
15
Problem: to generate frequencies or a
cross-tabulation of a continuous variable
16
Solution 1: collapse to uniform categories,
defining a starting point
c:30000,-30000 means:
  - collapse to uniform categories
  - each category should be 30000 in size
  - begin with value -30000
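The arithmetic behind collapsing to uniform categories can be sketched in Python. This is a minimal illustration assuming integer values; the function name is hypothetical, and SDA's own handling of category boundaries and labels may differ.

```python
def collapse(value, width, start):
    """Return the (low, high) bounds of the uniform category holding `value`.

    Categories are `width` wide and the first one begins at `start`.
    Assumes integer values, so each category ends at low + width - 1.
    """
    index = (value - start) // width   # 0 for the first category
    low = start + index * width
    return (low, low + width - 1)

# Categories of size 30000 beginning at -30000, as on the slide.
first = collapse(-30000, 30000, -30000)
second = collapse(0, 30000, -30000)
third = collapse(45000, 30000, -30000)
```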
17
Solution 2: recode to desired categories. Note
use of * to denote both lowest and highest values.
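Recoding into categories with open-ended lowest and highest bounds can be sketched in Python as follows. The rule format, the cut points, and the use of `None` for an open end are assumptions made for this illustration, not SDA syntax.

```python
def recode(value, rules):
    """Map `value` to a new code using (new_code, low, high) rules.

    A bound of None is open-ended: None as `low` means "from the lowest
    value", None as `high` means "up to the highest value".
    """
    for new_code, low, high in rules:
        if (low is None or value >= low) and (high is None or value <= high):
            return new_code
    return None  # value not covered by any rule

# Hypothetical income categories.
rules = [
    (1, None, 19999),    # lowest value through 19999
    (2, 20000, 59999),
    (3, 60000, None),    # 60000 through the highest value
]

recoded = [recode(v, rules) for v in (-500, 20000, 250000)]
```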
18
Tips & tricks (contd)
  • Computing percentages in aggregate data
  • Dummy coding variables in regressions
  • Defining an interaction on the fly

19
Problem: given a file of aggregate statistics,
list percentages rather than counts. NB: use the
Listcase program
These are all counts
20
Solution: define percentages in the Listcase
program.
Defines a percentage with v4 in the numerator and
v2 in the denominator.
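The percentage defined here is simply 100 × v4 / v2 for each row of the aggregate file. A minimal Python sketch follows; the area labels and the counts are made up, and only the variable names v2 and v4 come from the slide.

```python
def percentage(numerator, denominator):
    """Return numerator as a percentage of denominator, or None if undefined."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator

# Hypothetical aggregate rows: v2 is a total count, v4 a sub-count.
rows = [
    {"area": "A", "v2": 200, "v4": 50},
    {"area": "B", "v2": 400, "v4": 300},
]

# One listing line per row, showing the percentage instead of the counts.
listing = [(r["area"], percentage(r["v4"], r["v2"])) for r in rows]
```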
21
Problem: to use a categorical variable in a
regression analysis, it needs to be dummy-coded
(i.e., 1 and 0).
22
Solution: dummy-code categorical variables
on-the-fly. Interactions can also be coded
on-the-fly, including interactions with
dummy-coded variables.
Dummy coded: values 10-14 will be coded to 1,
all others will be 0.
Interaction involving a dummy coded variable and
a continuous variable.
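Dummy coding and an interaction term amount to the following, shown as a minimal Python sketch. The function names and the sample cases are hypothetical; only the 10-14 range comes from the slide.

```python
def dummy(value, low=10, high=14):
    """Return 1 if `value` falls in low..high (inclusive), else 0."""
    return 1 if low <= value <= high else 0

def interaction(category, continuous):
    """Interaction term: the dummy-coded category times a continuous value."""
    return dummy(category) * continuous

# Hypothetical cases: (category code, continuous value).
cases = [(12, 35.0), (7, 50.0), (14, 20.0)]

terms = [(dummy(c), interaction(c, x)) for c, x in cases]
```

The interaction term equals the continuous value for cases inside the dummy-coded range and zero for all others.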
23
Advantages for teaching
  • Stable environment, 24x7 access
  • Very easy to explain to novice users
  • Reduces/eliminates need for computer labs with
    statistical software
  • Allows you to teach statistics rather than
    software
  • Students get hands on data quickly
  • Switch easily between weighted and unweighted
    distributions

24
Advantages for teaching (contd)
  • Measures of association and tests of significance
    comparable to SAS
  • Design effects, in files in which cluster and/or
    stratum variables are available
  • Interactive demonstration of statistical concepts
  • Share recoded variables
  • Can quickly mount additional data to fulfill your
    teaching needs

25
Advantages for research
  • Stable environment, 24x7 access
  • Access to latest available version of the data
  • Basic exploratory data analysis, e.g., are there
    enough cases for my subset?
  • Design effects, where cluster/sample variables
    available
  • Download data and import to SAS/SPSS/Stata on own
    workstation
  • Share recoded variables
  • Integrated variable descriptions (selected data
    files)

26
Advantages for data management
  • Creates metadata from SAS/SPSS/Stata syntax or
    DDI format xml files
  • Very easy and fast to import files with good
    syntax files
  • Control over what users can and cannot do
  • Outputs include SAS/SPSS/Stata syntax or DDI
    format xml files
  • Overhead size of uncompressed data about 50

27
Disadvantages of SDA
  • Search for variables/values among data files not
    yet implemented at UT/CHASS
  • Can't download created/recoded variables: coming
    in spring 2009
  • Graphics minimal, e.g., no stem-and-leaf, box-plots
    etc.
  • Doesn't output SAS/SPSS/Stata system/export
    files, only raw data files plus syntax files
  • Little support for Study/File level metadata
    (DDI)
  • No support for nCubes (DDI 2)

28
How SDA compares to the competition
  • See table at
  • http://www.chass.utoronto.ca/datalib/misc/accoleds/2008/sda_compare.htm

29
Common questions from researchers & students
  • When to weight versus not to weight
  • Does it only do cross-tabs?
  • But I want the raw data, not a cross-tabulation!
  • Differences between syntax, data, and system
    files.

30
An application we wouldn't have tackled without
SDA
  • Q: I need the average expenditure on eye care in
    Canada by age group of household head for as long
    a time-period as possible.
  • A: Once we explained SDA, the student had
    generated these statistics from each of the
    FAMEX/SHS files, 1969-2004, in under 30 mins. (He
    knew only Stata.)

31
Functions we know to be coming in SDA
  • Among-file variable searching: already available
    but not yet implemented on CHASS
  • Downloading recoded variables
  • Will allow users to load own data files (Archiver
    in SDA 3.1) -- already available but not yet
    implemented on CHASS

32
Exercises
  • First time SDA user? Try these exercises using
    the Census 2001 microdata on individuals
  • Experienced SDA user? Try these exercises using a
    variety of DLI data files

33
Questions
  • Question 1: Where will I find the SDA server at
    University of Toronto?
  • Answer 1: The URL is
  • http://www.chass.utoronto.ca/datalib/
  • Select "Microdata analysis and extraction"

34
Questions (contd)
  • Question 2:
  • How are files chosen to be mounted on the SDA
    server at UT?
  • Answer 2:
  • All significant Canadian microdata files, e.g., by
    Statistics Canada as released by DLI
  • Other files based on your requests

35
Questions (contd)
  • Question 3:
  • My research is being done collaboratively with a
    colleague at another Canadian university. Can my
    colleague get access to SDA?
  • Answer 3:
  • SDA is available as a subscription service to
    other Canadian DLI-member universities and
    colleges. Current subscribers include U of
    Victoria, Ryerson U, and Memorial U