Title of the Presentation - PowerPoint PPT Presentation

About This Presentation
Title:

Title of the Presentation

Description:

Mathematical and statistical nutrients at university. Sampling courses ... of 80 university programs that offer MS or PhD in Statistics or Biostatistics ...

Slides: 52
Provided by: Vin585

Transcript and Presenter's Notes



1
The Care, Feeding, and Training of Survey
Statisticians
Sharon L. Lohr
2
Care and Feeding of Iguanas
  • Iguana iguana
  • Natural sunlight
  • Variety of fruits and vegetables
  • Water
  • Bathing is a good habit

Source: drexotic.com/care_iguanas. Picture from Wikipedia.
3
Care and Feeding of Puppies
  • Canis lupus familiaris
  • Balanced diet
  • Exercise
  • Socialization
  • Bathing is a good habit

4
Care and Feeding of Survey Statisticians
  • Statisticus exemplus repræsentativus
  • Balanced diet
  • Exercise
  • Natural sunlight
  • Socialization
  • Bathing is a good habit

5
Survey Sampling
Ethnography
Psychology
Psychology
Statistics
Management
Lots more
Geography
6
Balanced Diet
  • Mathematical and statistical nutrients at
    university
  • Sampling courses
  • Other aspects of training and care

7
Essence of Survey Sampling
  • How to generalize from seen to unseen?
  • Quantify uncertainty about population
  • 18th, 19th Century
  • Immanuel Kant
  • Charles Peirce
  • John Venn
  • Adolphe Quetelet
  • What is P(sun will rise tomorrow)?

8
1920s and 1930s
  • Convenience, judgment samples
  • Models (usually not explicitly stated)
  • Faith
  • Famous example: Literary Digest Survey
  • Correct winner, every election 1912-1932
  • "Uncanny accuracy"; n = 2.3 million
  • 1936: predicted Landon with 55%
  • 1936: Roosevelt won with 61%

9
1940: Probability sampling
  • Revolutionary idea: inference is based on random
    variables for sample inclusion
  • Fisher, Neyman, Mahalanobis, Hansen
  • Robust, nonparametric approach

10
Probability Sampling
[Figure: population units y1, ..., y9. Sampled units have Zi = 1; units not sampled have Zi = 0.]
y's fixed; random variables: Zi
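
The idea on this slide can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the nine population values, the sample size of 3, and the seed are all invented. The y values never change; only the inclusion indicators Zi vary from draw to draw, and the average of the sample means over many draws should sit close to the population mean.

```python
import random
import statistics

# Hypothetical fixed population y1..y9 (values invented for illustration).
Y = [12.0, 7.5, 9.0, 15.0, 11.0, 8.0, 10.5, 13.0, 6.0]
N = len(Y)
n = 3  # sample size, chosen arbitrarily


def draw_inclusion_indicators(N, n, rng):
    """Return Z with Z[i] = 1 if unit i is in an SRS of size n, else 0."""
    sampled = set(rng.sample(range(N), n))
    return [1 if i in sampled else 0 for i in range(N)]


def sample_mean(Y, Z):
    """Mean of the sampled y values; all randomness comes from Z."""
    return sum(y * z for y, z in zip(Y, Z)) / sum(Z)


rng = random.Random(2009)
estimates = [sample_mean(Y, draw_inclusion_indicators(N, n, rng))
             for _ in range(20000)]

population_mean = statistics.fmean(Y)
average_estimate = statistics.fmean(estimates)
print(population_mean, average_estimate)
```

No distributional assumption about the y values is used anywhere: the repeated draws of Z alone generate the sampling distribution.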
11
1960s: Predictive approach
  • Use stochastic model about quantity y to predict
    the values of y not in the sample
  • Brewer, Royall, Dorfman, Valliant
  • Balanced sampling
  • Can model nonresponse

12
Model-based inference
Predict values of y not in sample: Y = f(x) + e
[Figure: population units y1, ..., y9, with the sampled units marked.]
Inference depends on stochastic model
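
A minimal Python sketch of prediction under a model, assuming for illustration that f(x) is a straight line fitted by ordinary least squares. The auxiliary values x, the observed y values, and the choice of sampled units are all invented; the slide does not specify a model.

```python
# Hypothetical auxiliary values x for 9 units (invented).
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0, 14.0]
# y is observed only for sampled units: unit index -> y (invented).
y_observed = {0: 5.1, 3: 14.8, 5: 21.0, 8: 29.2}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


sample_x = [x[i] for i in y_observed]
sample_y = list(y_observed.values())
a, b = fit_line(sample_x, sample_y)

# Predict y for units not in the sample, then add observed and predicted values.
predicted = {i: a + b * x[i] for i in range(len(x)) if i not in y_observed}
estimated_total = sum(sample_y) + sum(predicted.values())
print(round(a, 3), round(b, 3), round(estimated_total, 1))
```

The estimate of the total is only as good as the assumed line: if the unsampled units follow a different relationship, the predictions carry that error with them.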
13
HHM Volume I (1953)
  • Sampling Principles
  • Biases, Nonsampling Errors
  • Sample Designs
  • SRS, Stratified, One- and Two-Stage Cluster
    Sampling, Stratified Multistage
  • Control of Variation in Cluster Size
  • Estimating Variances
  • Regression Estimates, Double Sampling, Other
  • Case Studies

14
HHM Volume II (1953)
  • Fundamental Theory of Probability
  • Derivations for Chapters of Volume 1
  • Response Errors in Surveys

15
What diet are students getting?
  • SRS of 80 university programs that offer MS or
    PhD in Statistics or Biostatistics
  • Exclude JPSM, Iowa State, UNC, UNL
  • Sampling frame: www.amstat.org listings
  • Thank you, Burcu Eke!

16
Basic syllabus
  • SRS
  • Stratified
  • Cluster
  • Multistage
  • Ratio, regression estimation
HHM Vol. 1
  • Sampling Principles
  • Biases, Nonsampling Errors
  • Sample Designs
  • SRS, Stratified, One- and Two-Stage Cluster
    Sampling, Stratified Multistage
  • Control of Variation in Cluster Size
  • Estimating Variances
  • Regression Estimates, Double Sampling, Other
  • Case Studies

17
Beyond Basics
  • Replication variance estimation
  • Nonresponse models, calibration
  • Regression, categorical data
  • Spatial sampling
  • Adaptive sampling
  • Model-based inference

18
SRS of 80 Grad Programs
[Chart; recoverable labels: No class, 21; Not offered, 9]
19
Exercise: Analyze Survey Data
  • Download data from fedstats.gov
  • Codebook, SAS code
  • Investigate topics of interest to students
  • Graph data
  • Multivariate analyses
  • Regression, logistic regression, categorical
  • Discuss nonsampling errors
  • Variance estimation

20
Exercise: Analyze Survey Data
  • Cholesterol, obesity (NHANES)
  • Predicting number of friends (Add Health)
  • Energy-saving systems consumption (Commercial
    Buildings Energy Consumption Survey)
  • Math scores, sex, calculator use (TIMSS)
  • Jackknife macros
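
To give a flavor of what a jackknife macro computes, here is a hedged Python sketch of a delete-one jackknife for a ratio estimator. It assumes an unweighted simple random sample and ignores the finite population correction; the data are invented. Jackknife code for the surveys listed above would work with survey weights, strata, and PSUs, none of which appear here.

```python
import statistics

# Hypothetical paired sample values (invented for illustration).
ys = [12.0, 15.5, 9.8, 20.1, 14.2, 17.3, 11.0, 16.4]
xs = [10.0, 12.0, 8.5, 16.0, 11.5, 14.0, 9.0, 13.5]


def ratio_estimate(ys, xs):
    """Ratio of sample totals."""
    return sum(ys) / sum(xs)


def jackknife_variance(ys, xs, estimator):
    """Delete-one jackknife: drop each unit in turn and re-estimate."""
    n = len(ys)
    theta_full = estimator(ys, xs)
    replicates = [estimator(ys[:j] + ys[j + 1:], xs[:j] + xs[j + 1:])
                  for j in range(n)]
    return (n - 1) / n * sum((r - theta_full) ** 2 for r in replicates)


v_ratio = jackknife_variance(ys, xs, ratio_estimate)

# Sanity check: for the plain sample mean the jackknife reproduces s^2 / n.
v_mean = jackknife_variance(ys, xs, lambda a, b: sum(a) / len(a))
s2_over_n = statistics.variance(ys) / len(ys)
print(v_ratio, v_mean, s2_over_n)
```

The same function accepts any estimator of the two lists, which is the main appeal of replication methods: no new variance formula is needed for each statistic.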

21
Exercise: Design
  • Work on all steps of a survey
  • Survey center helpful, not necessary
  • Take sample from Internet data
  • amazon.com
  • Treat large data set as population
  • IPUMS, baseball
  • Compare sampling designs
  • Generate nonresponse
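
One possible shape for this exercise, sketched in Python. Everything here is an assumption made for illustration: the "population" is simulated rather than taken from IPUMS or baseball data, the two strata and their means are invented, and the response probabilities are arbitrary.

```python
import random
import statistics

rng = random.Random(42)

# Stand-in population: two invented strata with different means.
population = ([("A", rng.gauss(50, 5)) for _ in range(600)] +
              [("B", rng.gauss(80, 5)) for _ in range(400)])
true_mean = statistics.fmean(y for _, y in population)


def srs_mean(pop, n, rng):
    """Mean of a simple random sample of size n."""
    return statistics.fmean(y for _, y in rng.sample(pop, n))


def stratified_mean(pop, n, rng):
    """Proportional allocation: stratum sample sizes mirror stratum sizes."""
    N = len(pop)
    estimate = 0.0
    for label in sorted({s for s, _ in pop}):
        stratum = [y for s, y in pop if s == label]
        n_h = round(n * len(stratum) / N)
        estimate += len(stratum) / N * statistics.fmean(rng.sample(stratum, n_h))
    return estimate


def respondent_mean(pop, n, rng):
    """SRS followed by invented nonresponse: stratum B responds half as often."""
    sample = rng.sample(pop, n)
    respondents = [y for s, y in sample
                   if rng.random() < (0.4 if s == "B" else 0.8)]
    return statistics.fmean(respondents)


REPS, n = 2000, 50
var_srs = statistics.variance(srs_mean(population, n, rng) for _ in range(REPS))
var_strat = statistics.variance(
    stratified_mean(population, n, rng) for _ in range(REPS))
bias_resp = statistics.fmean(
    respondent_mean(population, n, rng) for _ in range(REPS)) - true_mean
print(var_srs, var_strat, bias_resp)
```

Because the full population is known, students can compare the designs against the truth directly and see how nonresponse that depends on the stratum pulls the unadjusted respondent mean away from it.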

22
Exercise: Inferential Framework
  • Population N = 100
  • Take SRS of size 30
  • X1 = mean of first sample
  • Put them back
  • Take a second SRS of size 30
  • X2 = mean of second sample
  • Are X1 and X2 independent?
  • Model-, design-based simulations in R
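
The slide suggests R; the sketch below uses Python instead, purely as an illustration of how such a simulation might be set up. The standard normal model for the population values, the number of repetitions, and the seed are assumptions, not part of the exercise as stated.

```python
import random
import statistics

N, n, REPS = 100, 30, 4000
rng = random.Random(7)


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    ss_a = sum((u - ma) ** 2 for u in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    return cov / (ss_a * ss_b) ** 0.5


def two_sample_means(pop, rng):
    """Draw an SRS, put the units back, draw a second SRS; return both means."""
    x1 = statistics.fmean(rng.sample(pop, n))
    x2 = statistics.fmean(rng.sample(pop, n))
    return x1, x2


# Design-based view: one fixed population, only the samples vary.
fixed_pop = [rng.gauss(0, 1) for _ in range(N)]
design_pairs = [two_sample_means(fixed_pop, rng) for _ in range(REPS)]

# Model-based view (assumed i.i.d. normal model): the population is
# regenerated on every repetition, so the y values are random as well.
model_pairs = [two_sample_means([rng.gauss(0, 1) for _ in range(N)], rng)
               for _ in range(REPS)]

corr_design = correlation(*zip(*design_pairs))
corr_model = correlation(*zip(*model_pairs))
print(round(corr_design, 2), round(corr_model, 2))
```

Under these assumptions the first correlation should be near zero, since the two draws from a fixed population are independent. The second should be near n/N = 0.3, because the two samples share on average 9 units whose random y values enter both means. The answer to the question on the slide therefore depends on which framework is in use.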

23
Socialization
  • Students need to work with people outside
    statistics
  • Socialize with other statisticians
  • Exposure to new ideas
  • Integrate sampling with other classes

24
Bathing
  • Need to cleanse old, crusted concepts
  • What are main goals?
  • Would I teach this material if starting over?
  • Do students really need to work out small samples
    by hand?
  • Want data-centric training
  • Problem solvers

25
Sunlight
  • Instead of preparing statisticians for survey
    problems of 1950, look at
  • What a survey statistician actually does
  • What a survey statistician might need to do in
    the future

26
Current Research Topics
  • Weighting and weight smoothing / trimming
  • Computer-intensive variance estimation
  • Visualization
  • Multi-mode, multi-frame
  • Small area, disease mapping
  • Nonparametric, robust models for surveys
  • Time series / spatial methods
  • Record linkage, administrative data
  • Confidentiality
  • Nonresponse, calibration, imputation

27
Technology and Sampling
  • 1940s: Errors in surveys
  • Depression, war: Need for data
  • Sampling: lower cost, fewer errors
  • Computing
  • 1960s: Telephone, errors, computing
  • Measurement error
  • Model-based inference
  • 1980s: Computing → Replication variance
    estimation methods, data analysis

28
2000s: Internet
  • Inexpensive data collection
  • But
  • Coverage problems
  • Nonresponse
  • Measurement error
  • Opportunity for ingenuity in sample design
    (HHM, Vol. 1, p. 456)

29
1920s and 1930s
  • Convenience, judgment samples
  • Models (usually not explicitly stated)
  • Faith
  • Literary Digest Survey
  • Claimed accuracy
  • Predicted correct winner, 1912-1932

30
2000s
  • ACS, other govt surveys: high-quality data
  • Volunteer (or paid) online panel polls
  • Convenience, judgment samples
  • Models (usually not explicitly stated)
  • Faith
  • Claim accuracy because predicted correct winner
    in last few elections
  • But give margin of error

31
From pollster.com blogs
  • September 10, 2009
  • Justification of convenience samples for
    estimating population values
  • Use model-based inference
  • See Sharon Lohr's Sampling: Design and Analysis
  • But what is the model, and how do you know it
    fits non-volunteers?

32
2000s
  • Coverage
  • Nonresponse
  • Measurement error
  • Massive amounts of data available
  • Networked data
  • Multiple sources, linking
  • Data fusion

33
Danger
  • Ready availability of data
  • Wilkinson (2008): structural equation software
  • Correlational studies ↑
  • Designed experiments ↓
  • Designed surveys are important
  • Careful data collection
  • Inference to population

34
New uses for survey data
  • Detecting anomalies
  • False discovery rates
  • Forecasting
  • Better survey design
  • Combining information from surveys
  • From data sampling to data integration
  • ????

35
New uses for survey methods
  • Relationships in massive data sets
  • SRS sometimes used, but rarely other designs
  • Dynamic data collection
  • Data dispersed on servers
  • Microarray data
  • Effectiveness of medical treatments
  • Value added by teachers

36
Better connections
  • Tukey (1962), "The Future of Data Analysis"
  • "It is, incidentally, both surprising and
    unfortunate that those concerned with statistical
    theory and statistical mathematics have had so
    little contact with the recent developments of
    sophisticated procedures of empirical sampling."

37
Better connections
  • Efron (2007), "The Future of Statistics"
  • "Statistics is in a period of rapid expansion and
    change. During such times, it pays to concentrate
    on basics and not tie oneself too closely to any
    one technology or analysis fad."

38
Training for the Future
  • Balanced diet: mathematical and statistical
    background that will give flexibility
  • Variety of backgrounds
  • Parallels with 1930s
  • Economic
  • Need for more survey theory, expertise
  • Who foresaw probability sampling in 1920?

39
Statistics Curriculum
Mathematical Theory
Methodology: Regression, Categorical, Time
Series, etc.
40
Statistics Curriculum
Mathematical Theory
Methodology: Regression, Categorical, Time
Series, etc.
Sampling
41
Training for the Future
  • Still need
  • mathematical theory for statistics
  • methodology
  • probability and model-based sampling
  • But these need to be updated
  • Solve problems using statistical thinking
  • Integrate theory and practice
  • Emphasize data collection

42
Socialization
  • Better integration of survey sampling with other
    courses
  • Asymptotics, probability
  • Computing
  • See stat.berkeley.edu/users/statcur
  • Some students should learn about
  • Machine learning
  • Graph and social network theory
  • Spatial statistics, bioinformatics,

43
Statistics Curriculum
Data Mining
Sampling
Data and Statistical Thinking
Mathematical Theory
DOE
Statistical Methodology
44
Species Survival
  • Groves Senate Confirmation Hearing, May 15
  • Sen. Akaka: "The federal government is facing
    major human capital challenges: 45% of current
    Census employees will be eligible to retire next
    year."
  • Bob Groves: "I am terribly worried about this
    problem: the number of programs in the country
    training people that have the requisite skills
    for the Census Bureau is way below the need."

45
SRMS Distribution
46
SRMS Members per Million People
47
Morris Hansen
  • Born Thermopolis, WY, 1910
  • Univ. Wyoming (Deming, Bryant)
  • Bachelor's degree, accounting, 1934
  • Why did he become a survey statistician?

Interview with I. Olkin in Statistical Science,
1987
48
Morris Hansen
In accounting I was exposed to courses in
economic statistics by a professor in the
Commerce Department. He was a really fascinating
teacher and got me interested in statistics. When
I finished those courses, I thought I knew
something about statistics and learned later that
was a misconception. But I knew a little and
decided that I would like to go into statistics.

49
Teacher: Forest R. Hall
  • Asst prof, 1927
  • Depression: Regional Director of Dept of Labor
  • 4-state Study of Consumer Purchases

50
Propagating the Species
  • Data not the plural of anecdote
  • But recruitment is anecdotal, personal
  • Activities that allow students to experience
    importance, excitement of subject
  • Great teaching
  • Sampling in intro stat, graduate curriculum
  • Work with survey investigations
  • Numerical detectives (B. Joiner)

51
Adult Care
  • Balanced diet
  • Exercise
  • Natural sunlight
  • Socialization
  • Bathing
  • Reproduction
  • Good teaching
  • Collateral reproduction
  • High pay