Title: Enhancing the Teaching of Statistics Using SPSS Statistics
1. Enhancing the Teaching of Statistics Using SPSS Statistics
- Prof. Sharon L. Weinberg
- New York University
- sharon.weinberg_at_nyu.edu
- Prof. Sarah K. Abramowitz
- Drew University
- sabramow_at_drew.edu
2. Commonly Asked Questions
- 1. Will I be able to get copies of the slides after the event? Yes
- 2. Is this web seminar being taped so I or others can view it after the fact? Yes
3. The Technology Revolution
- Has changed the way statisticians work and should change what and how we teach (GAISE (2005) report)
- A caution: In choosing among technologies, the focus cannot be on the technology itself, but on how that technology improves the teaching of our subject (Moore, 1997, p. 135).
4. Enhancing Student Learning of Statistics: The GAISE (2005) Report
- Use Real Data (many others also have made this recommendation, e.g., Chance, Ben-Zvi, Garfield, and Medina, 2007)
- Use Technology for Developing Conceptual Understanding and Analyzing Such Data
- Emphasize Statistical Literacy and Develop Statistical Thinking
- Stress Conceptual Understanding Rather than Mere Knowledge of Procedures
- Foster Active Learning in (and Outside) the Classroom
- Use Assessments to Improve and Evaluate Student Learning
5. Recommendation 1: Use Real Data
- Gives students a better idea of what statisticians do by having them analyze real and often messy data.
- Facilitates the discussion of more interesting problems that often are captured by large and more complicated data sets.
- Motivates students to take ownership of the process of analyzing and drawing conclusions about data so as to uncover answers to timely and relevant questions.
- Creates an empowering experience: students learn skills that may be used in other settings and arenas unrelated to the class in question.
6. Sources for Real Data: Classroom Activities
- The Data and Story Library (DASL, http://lib.stat.cmu.edu/DASL)
- The Journal of Statistics Education (JSE) Dataset and Stories feature (see http://www.amstat.org/publications/jse/jse_data_archive.html)
- CAUSE (http://www.causeweb.org)
7. Additional Real Data Sources
- http://www.statsci.org/datasets.html
- http://www.stat.ucla.edu/data/
- http://www.cdc.gov/datastatistics/
- http://www.sci.usq.edu.au/staff/dunn/Datasets/
- http://www.umass.edu/statdata/statdata/non-local-data.html
- http://www.amstat.org/publications/jse/jse_data_archive.html
- http://lib.stat.cmu.edu/
- http://espn.go.com/
- https://www.icpsr.umich.edu/
- http://www.du.edu/idea/data.htm
- http://mathforum.org/workshops/sum96/data.collections/datalibrary/other.resources.html
- http://www.google.com/search?q=datasets
8. Another Source of Data: NELS
- The NELS data set was collected by the U.S. Department of Education's National Center for Education Statistics (NCES).
- Nationally representative longitudinal data set designed to measure achievement outcomes in four core subject areas (English, history, mathematics, and science), in addition to personal, familial, social, institutional, and cultural factors that might relate to these outcomes.
9. Students May Find Answers to Such Interesting and Relevant Questions As
- Do boys perform better on math achievement tests than girls?
- Does socioeconomic status relate to educational and/or income aspirations?
- To what extent does enrollment in advanced math in eighth grade predict twelfth-grade math achievement scores?
- Can we distinguish between students who use marijuana and those who don't in terms of self-concept?
- Does owning a computer vary as a function of geographical region of residence (Northeast, North Central, South, and West)?
- Note: The text Statistics Using SPSS: An Integrative Approach (2nd ed.) by Weinberg & Abramowitz (2008) contains a subsample of these data, with 500 cases and 48 variables, on an accompanying CD.
10. And Yet Another Source of Data: Framingham Heart Study
- First prospective study of risk factors and their joint effects related to coronary heart disease (CHD).
- Longitudinal data collection began in 1956 on 5,209 subjects.
- Risk factors and disease markers of CHD include blood pressure, blood chemistry, lung function, smoking history, health behaviors, and medication use.
- Note: Data on a random subsample of 400 cases at first examination in 1956 and at third examination in 1968, blocking on smoking and gender, are included on a CD accompanying Statistics Using SPSS.
11. Students May Find Answers to Such Interesting and Relevant Questions As
- Is the mean body mass index of smokers lower than that for non-smokers among the population of non-institutionalized adults?
- What evidence is there to suggest that HDL is good cholesterol and LDL is bad cholesterol?
- Does total serum cholesterol predict the incidence of coronary heart disease?
- Is there a difference in diastolic blood pressure levels, on average, between those who do and those who do not take anti-hypertensive medication?
12. Recommendation 2: Use Technology
- Advantages
  - Reduces focus on time-consuming computation
  - Frees student to focus on conceptual understanding
  - Enhances the teaching of our subject
  - Models statistical practice
- Pay Attention to Such Features As
  - Availability and cost, with student network versions available
  - Ease of use for particular audiences
  - Ease of data entry, ability to import data in multiple formats
  - Dynamic linking between data, graphical, and numerical analyses
  - Interactive and high-speed capabilities
  - Versatility: all-purpose tool for use throughout the course and beyond, in academic settings and industry
  - Portability for classroom and home
- Our Software Choice: SPSS, which has all of the above features.
13. Recommendation 3: Emphasize Statistical Literacy and Thinking
- Modeling statistical thinking from conception to conclusion: Example 11.8 from Statistics Using SPSS.
- Exploring a large data set and checking underlying inferential assumptions: Exercise 6.9 from Statistics Using SPSS.
- The importance of open-ended problems: Exercise 3.15 from Statistics Using SPSS.
14. Modeling statistical thinking from conception to conclusion
- Example 11.8, Statistics Using SPSS, using the NELS data set. Does the population of college-bound males from the southern United States who have always been at grade level differ from the corresponding population of females in terms of the number of years of math in high school? Conduct the significance test at alpha = .05.
15. Modeling statistical thinking from conception to conclusion
- Use boxplots of the variable, number of years of math in high school, to investigate visually the tenability of the underlying assumptions of the t-test.
- Boxplots suggest that the homogeneity of variance assumption is met.
- Boxplots suggest also that the median number of years of math taken by males and females is quite similar.
16. Modeling statistical thinking from conception to conclusion
- The output of the t-test on means
17. Modeling statistical thinking from conception to conclusion
- Levene's test suggests that the homogeneity of variance assumption of the t-test is met (p = .294 > .05).
- Note that t(148) = 2.107, p = .04.
- Reject the null hypothesis in favor of the alternative.
- Conclude that for those from the South who have always stayed on track, the mean number of years of math taken in high school by college-bound males is statistically significantly different from that taken by females.
- Effect size of the result, according to Cohen's d, is small to moderate.
- Mean number of years of math taken by males is .35 standard deviations greater than the mean number taken by females.
- Results appear to contradict visual impressions based on the boxplots.
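
The t statistic and Cohen's d reported on this slide come from the same ingredients: the difference in means and the pooled standard deviation. The following is a minimal Python sketch of that arithmetic, not the SPSS procedure itself. The function name `pooled_t_and_d` and the two small samples are hypothetical and are not the NELS data.

```python
import math
from statistics import mean, variance


def pooled_t_and_d(group1, group2):
    """Return (t, df, d) for an independent-samples t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    diff = mean(group1) - mean(group2)
    # t divides the mean difference by its standard error.
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Cohen's d expresses the same difference in pooled-SD units.
    d = diff / math.sqrt(pooled_var)
    return t, df, d


# Hypothetical years of high school math (illustration only, NOT the NELS data).
males = [4, 4, 3, 5, 4, 4, 3, 4]
females = [3, 4, 3, 4, 3, 4, 3, 3]

t, df, d = pooled_t_and_d(males, females)
print(f"t({df}) = {t:.3f}, Cohen's d = {d:.2f}")
```

Because d does not shrink or grow with sample size while t does, a modest d can accompany a statistically significant t when the groups are large, which is one way to read the slide's "small to moderate" effect.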
18. Modeling statistical thinking from conception to conclusion
- Explore the apparent contradiction of results with a population pyramid, which provides a more detailed view of the separate male and female distributions.
- Even with similarly-shaped distributions, medians can provide a different result from means, and multiple representations of data can be useful.
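
To make the point about medians and means concrete, here is a small Python sketch that prints a text version of a back-to-back frequency plot (the idea behind a population pyramid) for two groups that share a median but not a mean. The helper `text_pyramid` and both samples are hypothetical and are not taken from the NELS data or from SPSS.

```python
from collections import Counter
from statistics import mean, median


def text_pyramid(left, right, left_label="Males", right_label="Females"):
    """Return the lines of a back-to-back frequency plot, one row per value."""
    left_counts, right_counts = Counter(left), Counter(right)
    width = max(len(left_label), max(left_counts.values()), max(right_counts.values()))
    lines = [f"{left_label:>{width}} | val | {right_label}"]
    for value in sorted(set(left) | set(right), reverse=True):
        left_bar = "#" * left_counts[value]    # bars grow leftward
        right_bar = "#" * right_counts[value]  # bars grow rightward
        lines.append(f"{left_bar:>{width}} | {value:^3} | {right_bar}")
    return lines


# Hypothetical years of math (illustration only).
males = [2, 3, 4, 4, 4, 4, 5, 5, 5]
females = [2, 2, 3, 3, 4, 4, 4, 4, 5]

print("\n".join(text_pyramid(males, females)))
print("medians:", median(males), median(females))
print("means:  ", round(mean(males), 2), round(mean(females), 2))
```

A boxplot of these two samples would show the same median line for both groups, while the pyramid makes the shift in the rest of the distribution visible.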
19. Modeling statistical thinking from conception to conclusion
20. Exploring a large data set and checking underlying inferential assumptions
- Exercise 6.9, Statistics Using SPSS
- Use the Framingham data set to regress initial total cholesterol (TOTCHOL1) on initial body mass index (BMI1).
  - Create the scatterplot for these two variables. Label the scatterplot by ID number and superimpose the regression line. Would you say that linear regression is appropriate in this case? Explain.
  - According to the scatterplot, which person is most unusual in terms of the linear trend between serum cholesterol and BMI? What is his or her ID number? Looking at the data set, is this person male or female? How old was this person when the study began?
  - With all people included in the data set, what is the equation of the regression line? With the bivariate outlier omitted, what is the equation of the regression line? Do the coefficients change as a result of omitting this person from the data set?
21. Exploring a large data set and checking underlying inferential assumptions
- Scatterplot suggests a linear, rather than a non-linear, relationship between these variables.
- A bivariate outlier (case 205) may be identified as 55 years old and female. She has an unusual combination of total cholesterol and BMI values relative to the other cases.
22. Exploring a large data set and checking underlying inferential assumptions
- To what extent does this individual influence the regression coefficients?
- With all people included, the equation is: predicted TOTCHOL1 = 1.82(BMI1) + 189.98
- When omitting ID 205, the equation is: predicted TOTCHOL1 = 1.93(BMI1) + 186.52
- Using technology we are able to demonstrate simply that even one case in a data set of 400 cases can influence the results of a regression analysis.
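
The with-and-without comparison on this slide can be sketched outside SPSS in a few lines. The Python below fits a least-squares line twice, once with every case and once with a single case omitted. The `cases` dictionary, the ID numbers, and the helper names are hypothetical; the values are invented for illustration and are not the Framingham data, so the coefficients will not match those quoted above.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares (slope, intercept) for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_without(cases, omit=None):
    """Fit the line using every case except the ID given in `omit`."""
    kept = [pair for case_id, pair in cases.items() if case_id != omit]
    x, y = zip(*kept)
    return fit_line(x, y)


# Hypothetical ID -> (BMI, total cholesterol); case 7 is a planted bivariate outlier.
cases = {
    1: (21.0, 225), 2: (23.5, 232), 3: (25.0, 236), 4: (27.0, 238),
    5: (29.5, 246), 6: (31.0, 248), 7: (24.0, 300),
}

slope_all, intercept_all = fit_without(cases)
slope_omit, intercept_omit = fit_without(cases, omit=7)
print(f"all cases:    predicted chol = {slope_all:.2f}(BMI) + {intercept_all:.2f}")
print(f"case 7 out:   predicted chol = {slope_omit:.2f}(BMI) + {intercept_omit:.2f}")
```

With only seven cases the single outlier moves the line far more than case 205 does among 400 cases; the sketch exaggerates the effect on purpose so that it is easy to see.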
23. The importance of open-ended problems
- Open-ended problems: the student is required to develop an analytic strategy on his/her own; he/she is not asked to carry out a series of directed analyses (e.g., compute the mean, compute the SD, etc.).
- As an example, see Exercise 3.15, Statistics Using SPSS. This question asks students to provide a demographic assessment by sex of total serum cholesterol (TOTCHOL1) measured at baseline in the Framingham data set.
- The student needs to construct an appropriate strategy for arriving at this assessment.
24. The importance of open-ended problems
- One Approach. According to the boxplot, the cholesterol distribution is fairly symmetric for men, but positively skewed for women. The IQR for females is larger, so their cholesterol values are more heterogeneous. The median cholesterol for women is slightly higher than it is for men. Descriptive statistics quantify these impressions. According to the skewness ratio, the distribution is fairly symmetric for men (1.17) and severely positively skewed for women (4.34). The median cholesterol for men in the study at initial examination (median = 231.50) was slightly lower than for women (median = 239.00). The distribution of cholesterol for men was slightly more homogeneous (IQR = 54) than it was for women (IQR = 60). For men, the cholesterol levels ranged from 133 to 333, whereas for women they ranged from 152 to 464.
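
The descriptive summary above can be reproduced in outline with a short Python sketch. It assumes that "skewness ratio" means the sample skewness statistic divided by its standard error, and it uses Python's default quartile method, which may not match the percentile method SPSS applies. The data and the function names are hypothetical and are not the Framingham values.

```python
import math
from statistics import mean, median, quantiles, stdev


def skewness_ratio(values):
    """Sample skewness divided by its standard error (assumed definition)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    # Bias-adjusted sample skewness.
    skew = n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)
    # Standard error of skewness depends only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew / se_skew


def describe(values):
    """Median, IQR, range, and skewness ratio for one group."""
    q1, _, q3 = quantiles(values, n=4)  # Python's default "exclusive" method
    return {
        "median": median(values),
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
        "skew_ratio": skewness_ratio(values),
    }


# Hypothetical cholesterol values by sex (illustration only).
groups = {
    "men": [180, 205, 215, 225, 230, 235, 245, 255, 280],
    "women": [190, 210, 220, 230, 238, 245, 260, 300, 410],
}
for sex, values in groups.items():
    print(sex, describe(values))
```

Keeping the statistics for both groups in one small table-like structure mirrors the side-by-side comparison the open-ended exercise asks students to build for themselves.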
25. Additional Ways to Emphasize Statistical Literacy and Thinking
- Choosing an appropriate method of analysis, leaving it up to the student: Exercise 11.29, Statistics Using SPSS.
- Visualizing data: Table 6.4, Anscombe data sets, Statistics Using SPSS.
- Using simulations: Example 9.5, Statistics Using SPSS.
26. Choosing an appropriate method of analysis: leaving it up to the student
- Exercise 11.29, Statistics Using SPSS. For each of the following questions based on the NELS data set, select an appropriate statistical procedure to use to answer it from the list that follows. Then use SPSS to conduct the appropriate hypothesis test in cases where the underlying assumptions are tenable. If the result of a hypothesis test is statistically significant, report and interpret an appropriate measure of effect size.
  - Among college-bound students who are always at grade level, do those who attended nursery school (NURSERY) tend to have higher SES than those who did not?
  - Among college-bound students who are always at grade level, does self-concept differ in eighth (SLFCNC08) and tenth (SLFCNC10) grades?
  - Among college-bound students who are always at grade level, do those who attend public school (SCHTYP8) perform differently in twelfth-grade math achievement (ACHMATH12) from those who attend private school? (Note that to answer this question the variable SCHTYP8 has to be recoded to be dichotomous as described in Chapter 4.)
  - Among college-bound students who are always at grade level, do families typically have four members (FAMSIZE)?
  - Among college-bound students who are always at grade level, do students tend to take more years of English (UNITENGL) than math (UNITMATH)?
27. Visualizing data: Anscombe's (1973) data sets from Statistics Using SPSS.
28. Visualizing data: Anscombe's (1973) data sets from Statistics Using SPSS.
- While visually quite different, for all four panels of data:
  - Mx = 9.0, My = 7.5, Sx = 3.17, Sy = 1.94, rxy = .82,
  - the standard error of the estimate is 1.12, and
  - predicted Y = 0.5X + 3.
- Underscores the importance of having the ability to visualize one's data.
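
Students can check the claim of matching summaries themselves. The Python sketch below computes the same statistics for each of Anscombe's (1973) four data sets. It assumes divide-by-n formulas for the standard deviations and the standard error of the estimate, since that choice appears to land closest to the values quoted above; the function name `summarize` is hypothetical.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, SDs, correlation, regression line, and standard error of estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    ss_res = syy - slope * sxy  # residual sum of squares around the fitted line
    return {
        "mx": mx, "my": my,
        "sx": math.sqrt(sxx / n), "sy": math.sqrt(syy / n),  # divide-by-n assumption
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
        "see": math.sqrt(ss_res / n),                        # divide-by-n assumption
    }


for name, (x, y) in ANSCOMBE.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Printing the four rows next to the four scatterplots makes the contrast the slide describes: nearly identical numbers, very different pictures.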
29. The Utility of Simulations
- See Example 9.5, Statistics Using SPSS, which uses simulation to appreciate the power of the Central Limit Theorem.
- Based on the following syntax file, students may view first hand how the sampling distribution of means varies as a function of sample size.
30. The Utility of Simulations
- The syntax file SAMPDISVER2.SPS appears below:
    DEFINE bootmn (nsamp = !tokens(1) / nsize = !tokens(1) / samsize = !tokens(1)
      / bootvar = !tokens(1) / outfile = !tokens(1)).
    Sort cases by !bootvar.
    vector data (!nsize).
    compute data($casenum) = !bootvar.
    compute nobreak = 1.
    Aggregate outfile = *
      /break = nobreak
      /data1 to !concat(data,!nsize) = max(data1 to !concat(data,!nsize)).
    Vector data = data1 to !concat(data,!nsize).
    vector tmp(!nsize).
    loop p = 1 to !nsamp.
    loop q = 1 to !nsize.
    compute tmp(q) = 0.
    End loop.
    Loop I = 1 to !samsize.
    compute id = trunc(uniform(!nsize) + 1).
    Compute tmp(id) = tmp(id) + 1.
31. The Utility of Simulations
- Figure 9.5. Positively skewed population of 1,000 scores: μ = 8, σ = 4.
32. The Utility of Simulations
- Figure 9.6. Sampling distribution of 10,000 means, each of size 100: μ = 8, σ = 0.4.
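
The same experiment can be sketched in Python for readers without access to the SPSS macro. This is not a translation of SAMPDISVER2.SPS; it is a minimal simulation under stated assumptions. The population is drawn from a gamma distribution with shape 4 and scale 2 (chosen only because it is positively skewed with a theoretical mean of 8 and SD of 4), and the function name, seed, and sample sizes are hypothetical.

```python
import random
from statistics import mean, pstdev


def simulate_sampling_distribution(population, sample_size, n_samples, rng):
    """Means of n_samples random samples drawn with replacement from population."""
    return [
        sum(rng.choices(population, k=sample_size)) / sample_size
        for _ in range(n_samples)
    ]


rng = random.Random(2008)  # fixed seed so the run is repeatable

# Assumed stand-in for the skewed population of 1,000 scores.
population = [rng.gammavariate(4, 2) for _ in range(1000)]
print(f"population: mean = {mean(population):.2f}, SD = {pstdev(population):.2f}")

results = {}
for n in (4, 25, 100):
    means = simulate_sampling_distribution(population, n, 10_000, rng)
    results[n] = (mean(means), pstdev(means))
    print(f"n = {n:>3}: mean of means = {results[n][0]:.2f}, "
          f"SD of means = {results[n][1]:.2f}")
```

As the sample size grows, the SD of the means should shrink toward the population SD divided by the square root of n, which for samples of size 100 from a population with SD near 4 is near the 0.4 reported for Figure 9.6.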
33. Recommendation 4: Stress Conceptual Understanding Over Procedural Knowledge
- Avoid time-consuming mechanical hand computation through the use of technology.
- Focus on data exploration of a single data set to show that only from multiple analyses can one achieve a thorough understanding of the information contained in that data set.
- Present formulas in a form that enables greater conceptual understanding, not computational ease. See Equation 3.4 in Statistics Using SPSS.
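
As one illustration of the last point (this is a generic example and is not claimed to be the book's Equation 3.4), the variance can be written in a definitional form that shows what it measures, or in an algebraically equivalent form that is quicker to compute by hand but hides the idea of deviation from the mean. The divide-by-N convention here is an assumption.

```latex
% Definitional form: the average squared deviation from the mean.
\[
  S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}
\]
% Equivalent computational form: easier by hand, but less transparent.
\[
  S^2 = \frac{\sum_{i=1}^{N} X_i^2}{N} - \bar{X}^2
\]
```

When software does the arithmetic, nothing is lost by teaching only the first form.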
34. Recommendation 5: Foster active learning in (and outside) the classroom
- Bring an applied perspective to an otherwise theoretical or conceptual presentation by demonstrating how one may apply the method(s) discussed. We have found that by using SPSS one can accomplish this well.
- Demonstrate with the help of real data that students can relate to (e.g., the NELS).
- Present the theoretical or conceptual underpinnings of a method before demonstrating on SPSS, for example.
- Have the instructor demonstrate on his/her own computer with the class following along.
- Provide good notes on how to replicate what has been demonstrated in class. Our text has the SPSS commands explicitly stated in boxed-in areas.
- Assign data-driven problems for homework that continue to reinforce statistical literacy and thinking.
35. Recommendation 6: Assessments
- Align with Learning Goals.
- Focus on Understanding Key Ideas and Not Just on Skills, Procedures, and Computed Answers.
- Include the Communication of Statistical Concepts and Analytic Results.
- Use Multiple Formats: Quizzes, Exams, Homework Problem Sets, Article Critiques, Individual and Group Term Projects.
36. Poised for the Future
- With a good conceptual understanding of statistics and a good hands-on working knowledge of a versatile and accessible software package like SPSS, students will be well-poised for future study in statistics and for conducting their own research at the undergraduate or graduate level that relies on a fundamental quantitative analysis of data.
- Statistics Using SPSS: An Integrative Approach has been written to allow students to achieve both goals under one cover.
37. Where to Find Materials Discussed Today
- Statistics Using SPSS: An Integrative Approach
  - http://www.cambridge.org
  - ISBN 9780521676373
- Download a free, 30-day trial of SPSS Statistics 17.0 at
  - http://www.spss.com/statistics
  - Click on the Downloads tab
38. References
- Anscombe, F.J. (1973). Graphs in statistical analysis. American Statistician, 27, 17-21.
- Chance, B., Ben-Zvi, D., Garfield, J., & Medina, E. (2007). The role of technology in improving student learning of statistics. Technology Innovations in Statistics Education, Vol. 1, No. 1, Article 2. http://repositories.cdlib.org/uclastat/cts/tise/vol1/iss1/art2
- Friel, S. (2007). The research frontier: Where technology interacts with the teaching and learning of data analysis and statistics. In G.W. Blume & M.K. Heid (Eds.), Research on technology and the teaching and learning of mathematics: Cases and perspectives (Vol. 2, pp. 279-331). Greenwich, CT: Information Age Publishing, Inc.
- GAISE (2005). Guidelines for assessment and instruction in statistics education (GAISE) college report. The American Statistical Association (ASA). Retrieved August 25, 2008. http://www.amstat.org/education/gaise/GAISECollege.htm
- Moore, D.S. (1997). New pedagogy and new content: The case of statistics. International Statistical Review, 65, 123-165.
- Weinberg, S.L., & Abramowitz, S.K. (2008). Statistics Using SPSS: An Integrative Approach (2nd ed.). New York: Cambridge University Press.
39. Contact Information
- Prof. Sharon L. Weinberg
  - New York University
  - sharon.weinberg_at_nyu.edu
- Prof. Sarah K. Abramowitz
  - Drew University
  - sabramow_at_drew.edu