SDA: a tool for teaching and research with microdata

1
SDA: a tool for teaching and research with microdata

Laine Ruus <laine.ruus@utoronto.ca>
University of Toronto. Data Library Service
2008-12-03, revised 2009-04-14
http://www.chass.utoronto.ca/misc/mun09/sda_intro.ppt

2
What this session covers

Introduction
Demo of main SDA capabilities
Some tips and tricks
Advantages and disadvantages for teaching and
research
Common questions about SDA

3
SDA@UT is brought to you by

University of California, Berkeley. Computer-assisted Survey Methods Program (CSM) writes and supports the server-side software
University of Toronto. Centre for Computing in the Humanities and Social Sciences (CHASS) provides the hardware, buys the software, and provides system support wetware
University of Toronto. Libraries provides the budget to purchase the data, and care, feeding and user support wetware
And Memorial University Libraries, which subscribes to the service.

4
Our experience with SDA

CHASS installed SDA in the fall of 2004
At last count, we have 900 data files in SDA
Some have only the metadata that was generated
from the original syntax files (SAS/SPSS/Stata),
but a number also have full question text.
Most are microdata, but a few are aggregate
statistics (census files)
A number of voracious data users now expect to
find the latest microdata released by Stat Can in
SDA

7
Review of main SDA utilities

Frequencies, weighted and unweighted
Crosstabulations
Comparison of means (ANOVA)
Correlations
Regressions
Logit/probit regressions

8
Tips & tricks

Have we not gotten around to coding the missing
values?
Want to include missing values in your
cross-tabulation, or other analysis?
Collapsing uniform categories of continuous
variables on the fly
Recoding variables on the fly

9
Problem: in this variable, we have not yet coded value 5 as missing data. Therefore it would be included in analyses.
10
Solution: specify, after the variable name, only those values you want to include
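
As a rough illustration of what this kind of value restriction does, here is a minimal sketch in plain Python (not SDA syntax). The function name `keep_values` and the sample codes are made up for the example.

```python
from collections import Counter

def keep_values(values, ranges):
    """Keep only values that fall inside one of the inclusive (low, high) ranges."""
    return [v for v in values if any(lo <= v <= hi for lo, hi in ranges)]

# Hypothetical responses: codes 1-4 are substantive, 5 has not yet been
# flagged as missing data.
responses = [1, 2, 2, 3, 5, 4, 5, 1, 2]

# Restrict the analysis to values 1 through 4, so code 5 drops out.
valid = keep_values(responses, [(1, 4)])
freq = Counter(valid)
for code in sorted(freq):
    print(code, freq[code])
```

Only the listed values reach the frequency table, which is the effect the slide describes.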
11
Problem: to include values coded as missing in descriptive statistics or analyses
This is a missing value. It will not be included in descriptive statistics or analyses.
12
Solution 1: specify, after the variable name, the lowest value thru .
13
Solution 2: use "include missing data values" under Table options
14
Solution 3: list the values explicitly after the variable name
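
The difference between leaving out and including missing-data codes can be sketched in plain Python. This is an illustration under assumptions, not how SDA works internally: `frequency_table`, its `include_missing` flag, and the code 9 are all hypothetical.

```python
from collections import Counter

def frequency_table(values, missing_codes, include_missing=False):
    """Return {value: (count, percent)}; missing codes are dropped unless requested."""
    if include_missing:
        used = list(values)
    else:
        used = [v for v in values if v not in missing_codes]
    counts = Counter(used)
    total = len(used)
    return {v: (n, round(100.0 * n / total, 1)) for v, n in sorted(counts.items())}

# Hypothetical data in which code 9 is flagged as missing.
responses = [1, 1, 2, 9, 2, 1, 9, 2]

default_table = frequency_table(responses, missing_codes={9})
full_table = frequency_table(responses, missing_codes={9}, include_missing=True)
print(default_table)
print(full_table)
```

Note that the percentages change as well as the counts, because the base of the table grows when the missing codes are included.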
15
Problem: to generate frequencies or a cross-tabulation of a continuous variable
16
Solution 1: collapse to uniform categories, defining a starting point
c:30000,-30000 means
- collapse to uniform categories
- each category should be 30000 in size
- begin with value -30000
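
The arithmetic behind uniform collapsing can be sketched in plain Python using the width (30000) and starting point (-30000) from this slide. The function name `collapse_uniform` is hypothetical, and two details are assumptions rather than documented SDA behaviour: the data is treated as integer-valued, and values below the starting point simply continue the same grid downward.

```python
def collapse_uniform(value, width, start):
    """Return the inclusive (low, high) bounds of the uniform category holding value.

    Categories are `width` units wide and the first one begins at `start`.
    Integer-valued data is assumed, so the upper bound is low + width - 1.
    """
    index = (value - start) // width   # floor division, also correct for negatives
    low = start + index * width
    return (low, low + width - 1)

incomes = [-12000, 0, 29999, 30000, 75500]   # hypothetical values
for income in incomes:
    print(income, collapse_uniform(income, width=30000, start=-30000))
```

With these settings the categories run -30000 to -1, 0 to 29999, 30000 to 59999, and so on.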
17
Solution 2: recode to desired categories. Note use of * to denote both lowest and highest values.
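
Recoding into unequal categories with open-ended lowest and highest bounds might be sketched as follows in plain Python. The `recode` helper, the use of `None` for an open bound, and the age groups are assumptions made for illustration, not SDA's own recode syntax.

```python
def recode(value, rules):
    """Return the new code for value from (low, high, new_code) rules.

    None as `low` means "from the lowest value"; None as `high` means
    "up to the highest value". Returns None if no rule matches.
    """
    for low, high, new_code in rules:
        if (low is None or value >= low) and (high is None or value <= high):
            return new_code
    return None

# Hypothetical age groups: lowest-24 -> 1, 25-64 -> 2, 65-highest -> 3.
age_rules = [(None, 24, 1), (25, 64, 2), (65, None, 3)]
ages = [17, 24, 25, 40, 64, 65, 93]
recoded = [recode(a, age_rules) for a in ages]
print(recoded)
```

Open-ended bounds avoid having to look up the actual minimum and maximum of the variable before recoding.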
18
Tips & tricks (contd)

Computing percentages in aggregate data
Dummy coding variables in regressions
Defining an interaction on the fly

19
Problem: given a file of aggregate statistics, list percentages rather than counts. NB: use the Listcase program
These are all counts
20
Solution: define percentages in the Listcase program.
Defines a percentage with v4 in the numerator and v2 in the denominator.
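
The calculation itself is simple. A minimal Python sketch, assuming hypothetical aggregate records in which v2 is a total count and v4 a count within that total (the variable meanings and area labels are invented for the example):

```python
def percentage(numerator, denominator):
    """Return numerator as a percentage of denominator, or None if undefined."""
    if denominator in (0, None) or numerator is None:
        return None
    return 100.0 * numerator / denominator

# Hypothetical aggregate records, one row per geographic area.
rows = [
    {"area": "A", "v2": 2000, "v4": 500},
    {"area": "B", "v2": 800, "v4": 200},
    {"area": "C", "v2": 0, "v4": 0},
]
for row in rows:
    row["pct"] = percentage(row["v4"], row["v2"])
    print(row["area"], row["pct"])
```

Areas with a zero denominator get no percentage rather than raising an error, which is one reasonable choice for aggregate files with empty cells.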
21
Problem: to use a categorical variable in a regression analysis, it needs to be dummy-coded (i.e. 1 and 0).
22
Solution: dummy-code categorical variables on-the-fly. Interactions can also be coded on-the-fly, including interactions with dummy-coded variables.
Dummy coded: values 10-14 will be coded to 1, all others will be 0.
Interaction involving a dummy-coded variable and a continuous variable.
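
What dummy coding and an interaction term amount to numerically can be sketched in plain Python. The 10-14 range comes from this slide. The variable names `educ` and `age`, the sample cases, and the `dummy` helper are hypothetical.

```python
def dummy(value, low, high):
    """Return 1 if value lies in the inclusive range low..high, else 0."""
    return 1 if low <= value <= high else 0

# Hypothetical cases: `educ` is a categorical code, `age` is continuous.
cases = [
    {"educ": 9, "age": 34.0},
    {"educ": 10, "age": 51.0},
    {"educ": 14, "age": 28.0},
    {"educ": 15, "age": 62.0},
]
for case in cases:
    case["educ_d"] = dummy(case["educ"], 10, 14)          # values 10-14 -> 1, others -> 0
    case["educ_d_x_age"] = case["educ_d"] * case["age"]   # interaction term
    print(case)
```

The interaction equals the continuous variable for cases where the dummy is 1 and zero otherwise, so its regression coefficient measures how the slope differs for that group.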
23
Advantages for teaching

Stable environment, 24x7 access
Very easy to explain to novice users
Reduces/eliminates need for computer labs with statistical software
Allows you to teach statistics rather than software
Students get their hands on data quickly
Switch easily between weighted and unweighted
distributions

24
Advantages for teaching (contd)

Measures of association and tests of significance
comparable to SAS
Design effects, in files in which cluster and/or stratum variables are available
Interactive demonstration of statistical concepts
Share recoded variables
Can quickly mount additional data to fulfill your
teaching needs

25
Advantages for research

Stable environment, 24x7 access
Access to latest available version of the data
Basic exploratory data analysis, e.g. are there enough cases for my subset?
Design effects, where cluster/sample variables are available
Download data and import to SAS/SPSS/Stata on own
workstation
Share recoded variables
Integrated variable descriptions (selected data
files)

26
Advantages for data management

Creates metadata from SAS/SPSS/Stata syntax or
DDI format xml files
Very easy and fast to import files with good
syntax files
Control over what users can and cannot do
Outputs include SAS/SPSS/Stata syntax or DDI
format xml files
Overhead: about 50% of the size of the uncompressed data

27
Disadvantages of SDA

Search for variables/values among data files not
yet implemented at UT/CHASS
Can't download created/recoded variables -- coming in spring 2009
Graphics minimal, e.g. no stem-and-leaf, box-plots etc.
Doesn't output SAS/SPSS/Stata system/export files, only raw data files plus syntax files
Little support for Study/File level metadata
(DDI)
No support for nCubes (DDI 2)

28
How SDA compares to the competition

See table at http://www.chass.utoronto.ca/datalib/misc/accoleds/2008/sda_compare.htm

29
Common questions from researchers & students

When to weight versus not to weight
Does it only do cross-tabs?
But I want the raw data, not a cross-tabulation!
Differences between syntax, data, and system
files.

30
An application we wouldn't have tackled without SDA

Q: I need the average expenditure on eye care in Canada by age group of household head for as long a time-period as possible.
A: Once we explained SDA, the student had generated these statistics from each of the FAMEX/SHS files, 1969-2004, in under 30 mins. (He knew only Stata.)
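
The statistic requested here is a comparison of means: an average of one variable within categories of another. A minimal Python sketch of a weighted version follows. The field names, age groups, amounts and weights are all invented for illustration and are not taken from the FAMEX/SHS files.

```python
from collections import defaultdict

def weighted_group_means(records, group_key, value_key, weight_key):
    """Return {group: weighted mean of value} for a list of dict records."""
    totals = defaultdict(float)    # sum of weight * value per group
    weights = defaultdict(float)   # sum of weights per group
    for rec in records:
        totals[rec[group_key]] += rec[weight_key] * rec[value_key]
        weights[rec[group_key]] += rec[weight_key]
    return {g: totals[g] / weights[g] for g in totals if weights[g] > 0}

# Hypothetical household records with a survey weight.
households = [
    {"age_group": "under 35", "eye_care": 100.0, "weight": 1.0},
    {"age_group": "under 35", "eye_care": 200.0, "weight": 3.0},
    {"age_group": "65 and over", "eye_care": 300.0, "weight": 2.0},
]
means = weighted_group_means(households, "age_group", "eye_care", "weight")
print(means)
```

Repeating the same calculation on each year's file would give the time series the student was after.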

31
Functions we know to be coming in SDA

Among-file variable searching -- already available but not yet implemented on CHASS
Downloading recoded variables
Will allow users to load own data files (Archiver in SDA 3.1) -- already available but not yet implemented on CHASS

32
Exercises

First time SDA user? Try these exercises using
the Census 2001 microdata on individuals
Experienced SDA user? Try these exercises using a
variety of DLI data files

33
Questions

Question 1: Where will I find the SDA server at University of Toronto?
Answer 1: The URL is http://www.chass.utoronto.ca/datalib/
Select Microdata analysis and extraction

34
Questions (contd)

Question 2
How are files chosen to be mounted on the SDA
server at UT?

Answer 2
All significant Canadian microdata files, e.g. those by Statistics Canada as released by DLI
Other files based on your requests

35
Questions (contd)

Question 3
My research is being done collaboratively with a
colleague at another Canadian university. Can my
colleague get access to SDA?

Answer 3
SDA is available as a subscription service to
other Canadian DLI-member universities and
colleges. Current subscribers include U of
Victoria, Ryerson U, and Memorial U
