Title: LEARNING TO USE SPSS: SOME COMMON PROBLEMS
1. LEARNING TO USE SPSS: SOME COMMON PROBLEMS
- Paul Kinnear
- University of Aberdeen
2. Introduction
- Colin Gray and I have been teaching our students
statistics and the use of statistical packages
for many years.
- We have written several books on SPSS, published
by LEA and then Psychology Press.
- The latest is SPSS 12 Made Simple (2004).
3. Our associates
- Our teaching has benefited hugely from the advice
and support of John Lemon, our University's
Senior Computing Adviser, and Caroline Green,
Senior Teaching Fellow in the School of
Psychology.
- This morning, I would like to share with you some
of our experiences in teaching SPSS.
4. Purpose of this talk
- The advent of Windows-based versions of SPSS has
been a boon to students and teachers alike.
- Certain aspects of SPSS, however, present
problems for students.
- My purpose today is to identify some of the most
common difficulties.
5. Example of a problem
- Many difficulties arise from students not taking
time to make a list of what their dependent and
independent variables are, and what they are
trying to show in terms of their experimental
hypothesis (or hypotheses).
6. Summary of problem areas
- Not identifying the variables and their types.
- Not understanding the differences between within
subjects and between subjects designs.
- Not undertaking exploratory data analysis (EDA)
before launching into statistical analyses.
- Not knowing how to select the appropriate
analysis.
7. Summary of problem areas (continued)
- Not knowing how to edit out unnecessary bits of
output.
- Misunderstanding the meaning of p-values recorded
as .0000.
- Not knowing how to process frequency data in
contingency tables.
8. Kinds of problems
- Data entry.
- Choosing the right statistics.
- Interpretation of the output.
- These categories are certainly not independent:
problems in one often arise from problems in
another.
9. Data entry
- Often, what seems to be a problem with data
entry really arises from a lack of clarity about
the purpose and design of the investigation and
the variables involved.
10. Purpose and design
- The user must be able to answer the following
questions:
- What was the hypothesis?
- How was the study designed?
- What pattern in the results would confirm the
hypothesis?
11. The variables: some key questions for learners
- Was your investigation an experiment or a
correlational study?
- If it was an experiment, what were the
independent and dependent variables?
- Was the experiment of between subjects or of
within subjects design?
12. Key terms
- To use SPSS, the user must be fully conversant
with some standard terms in experimental design.
- The user needs to know these terms in order to
complete SPSS's dialog boxes.
13. Some common terms
- Dependent (DV) and independent (IV) variables.
- Between subjects versus within subjects (repeated
measures).
- Independent versus related samples.
- Nominal, ordinal and interval (or scale) data.
- Factors and levels.
- Covariate.
14. Between subjects or within subjects?
- If different participants have been used for
different conditions, then this is a between
subjects design.
- If the same participant is tested more than once
under different conditions, then more than one
score will result for each participant. This is
a within subjects design.
15. Misunderstood concepts: univariate, bivariate and
multivariate statistics
- Univariate statistics relate to analyses with
just one DV (e.g. driving performance).
- Bivariate statistics relate to analyses such as
correlation where two variables are measured but
neither can be considered as an IV (e.g. scores
in two tests).
- Multivariate statistics relate to analyses with
more than one DV (e.g. reaction time and number of
errors).
16. We discourage the use of MANOVA
- MANOVA is multivariate analysis of variance, i.e.
ANOVA applied to more than one DV.
- In line with Tabachnick & Fidell's recommendation
(Using Multivariate Statistics, 4th Edition, p. 323),
we discourage students from using MANOVA and
encourage them instead to use several univariate
ANOVAs.
- The difficulty is that a significant MANOVA
result does not tell you which DVs are sensitive
to the IV(s).
17. SPSS data sets
- An SPSS data set in the Data Editor is not in all
respects like the sort of table one sees in a
book or a journal article.
- In a published table, the columns may represent
data gathered on different groups of people.
- In an SPSS data set, each row contains data on
only one participant.
18. Between subjects example (p. 23)
- An experiment has been carried out to investigate
the effects of a drug upon performance.
- The design is between subjects: a Placebo group and
a Drug group are compared in their performance on a
test of skill.
19. A between subjects experiment (Kinnear & Gray
2004, page 23)
20. Grouping variables
- Each participant's group membership is indicated
by a code number: 1 = Placebo, 2 = Drug.
- Together, the code numbers make up what is known
as a grouping variable.
- Now, each row in the data set contains
information on only one participant.
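
The re-arrangement into a grouping variable can also be sketched outside SPSS. The Python fragment below is only an illustration: the scores are made up (they are not the data from Kinnear & Gray) and the names `published_table`, `GROUP_CODES` and `to_spss_rows` are hypothetical. It takes a journal-style table with one column per group and produces one row per participant, with the group recorded as a code number.

```python
# Hypothetical scores: one list per group, as a published table might show them.
published_table = {
    "Placebo": [6, 5, 5, 1],
    "Drug": [8, 6, 9, 7],
}

# Code numbers for the grouping variable (1 = Placebo, 2 = Drug).
GROUP_CODES = {"Placebo": 1, "Drug": 2}


def to_spss_rows(table, codes):
    """Return one (case, group_code, score) row per participant."""
    rows = []
    case = 1
    for group_name, scores in table.items():
        for score in scores:
            rows.append((case, codes[group_name], score))
            case += 1
    return rows


rows = to_spss_rows(published_table, GROUP_CODES)

# Print the rows in the shape of a data grid: Case, Group, Score.
print(f"{'Case':>4} {'Group':>5} {'Score':>5}")
for case, group, score in rows:
    print(f"{case:>4} {group:>5} {score:>5}")
```

Each printed line corresponds to one row of the Data Editor: a case number, the grouping variable and the dependent variable.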
21. Between subjects data in SPSS format (ibid page
24)
22. Variable View of Data Editor for between subjects
design (ibid page 29)
23. Data View of Data Editor for between subjects
design (ibid p.32)
24. Within subjects example (p. 250)
- Participants were each asked to produce three
pictures of an object using a different drawing
medium for each.
- The media were crayon, paintbrush and felt-tip
pen.
- A panel of judges then rated the aesthetic
pleasingness of the pictures.
- Since each participant used all three media, the
design is within subjects. (The order of using
the media was counterbalanced across participants.)
25. Within subjects experiment (ibid page 250)
26. Variable View of Data Editor for within subjects
design
27. Data View of Data Editor for within subjects
design
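
For a within subjects design the layout differs: there is no grouping variable, and each participant's row holds one score per condition. A rough Python sketch follows, assuming invented ratings and hypothetical names (`records`, `MEDIA`, `to_wide`); it is not the data set from the book.

```python
# Hypothetical ratings collected as (participant, medium, rating) records.
records = [
    (1, "crayon", 10), (1, "paintbrush", 12), (1, "felt_tip", 14),
    (2, "crayon", 18), (2, "paintbrush", 10), (2, "felt_tip", 16),
    (3, "crayon", 20), (3, "paintbrush", 15), (3, "felt_tip", 16),
]

# One column per level of the within subjects factor.
MEDIA = ["crayon", "paintbrush", "felt_tip"]


def to_wide(records, media):
    """Return {participant: [rating for each medium, in column order]}."""
    by_participant = {}
    for participant, medium, rating in records:
        by_participant.setdefault(participant, {})[medium] = rating
    return {
        participant: [scores[m] for m in media]
        for participant, scores in sorted(by_participant.items())
    }


wide = to_wide(records, MEDIA)

# One row per participant, one column per drawing medium.
print("Case " + " ".join(f"{m:>10}" for m in MEDIA))
for participant, ratings in wide.items():
    print(f"{participant:>4} " + " ".join(f"{r:>10}" for r in ratings))
```

The three columns play the role that the three levels of the factor play in the repeated measures dialog: the same participant contributes a score to every column.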
28. Not undertaking exploratory data analysis (EDA)
before launching into statistical analyses
- Learners are often so excited at having
completed a data file that they do not:
- Bother to check whether there have been any
transcription errors.
- Check whether there are outliers needing attention.
29. Exploring your data
- Statistics such as the mean, SD and r are very
useful for summarising some feature of the data.
But they can all be misleading.
- It is essential to use graphical methods to
confirm the summaries provided by the statistics.
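
A small numerical sketch may help to show how a summary statistic can mislead. The Python fragment below uses invented scores and a hypothetical helper, `pearson_r`; it compares the mean and r for five cases with the same statistics after a single extreme case (of the kind a transcription error might produce) has been added.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores on two tests for five participants (little association).
test_a = [1, 2, 3, 4, 5]
test_b = [3, 1, 4, 2, 3]

r_without = pearson_r(test_a, test_b)
mean_without = mean(test_b)

# The same data plus one extreme case, e.g. a mistyped pair of scores.
r_with = pearson_r(test_a + [20], test_b + [20])
mean_with = mean(test_b + [20])

print(f"r without the extreme case: {r_without:.2f}; with it: {r_with:.2f}")
print(f"mean without the extreme case: {mean_without:.2f}; with it: {mean_with:.2f}")
```

In this made-up example a single case turns a weak correlation into a very strong one, which a scatterplot would reveal at a glance.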
30. Get to know your data first
- Students must be warned against pressing ahead
with formal tests before they have first explored
their data thoroughly.
31. What to look for in a scatterplot
- The cloud of points should either be elliptical
or circular.
- An ellipse indicates a linear relationship; a
circular cloud indicates independence.
32. Example of an outlier
33. More problems
- Failure to label variables and values carefully.
This can make the output very difficult to read.
- Inability to analyse nominal data.
34. Choosing the right statistic
- We use statistical techniques for TWO purposes.
- Summarising and exploring our data.
- Confirming the patterns we find in our data.
35. Five common research situations (Kinnear & Gray
2004, page 6)
36. Not knowing how to select the appropriate analysis
- Learners have difficulty finding ANOVA
because SPSS does not have an item in
the Analyze drop-down menu called ANOVA.
- They also have difficulty finding the chi-square
test of association and other statistics for
analysing contingency tables because they do not
remember that these are in the Crosstabs item of the
Descriptive Statistics submenu (under Analyze).
37. Yet more problems
- Failure to remove unnecessary items from the SPSS
output.
- Confusion caused by expressions such as .0000 in
columns of p-values.
38. Not knowing how to edit out unnecessary bits of
output
- Learners are prone to print all of SPSS's output
without taking the trouble to edit out
unnecessary bits.
- Editing can include cutting out whole items
(tables, graphics) or just bits within tables.
- The output for within subjects ANOVA can be
particularly obscure without appropriate editing.
39. More extensive outputs
- Let's have a look at some ANOVA outputs and see
how we can edit them.
40. Complete within subjects output
41. Within subjects Mauchly output
42. Within subjects effects output
43. ANOVA summary table after editing
44. Edited Bonferroni output
45. Profile plot before editing
46. Profile plot after editing
47. Results of editing output
- Hopefully you will agree that suitable editing of
the SPSS output has made the material much more
suitable for inserting into laboratory or
research reports.
- Learners often think that they are obliged to
paste everything SPSS outputs into their reports.
48. Misunderstanding the meaning of p-values recorded
as .0000
- Learners seem to have difficulty interpreting
p-values.
- They often do not realise that .0600 is larger
than .0500 and hence that the statistical test is
not significant (assuming α = 0.05).
- They are bewildered by decimals with just zeros
after the point, e.g. .0000.
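
The point about .0000 can be made concrete with a short sketch. The Python fragment below is an illustration only; the function names (`displayed`, `report`, `is_significant`) are hypothetical and the reporting convention shown (an inequality instead of a string of zeros) is one common choice, assumed here. A displayed value of .0000 does not mean that p is zero, only that p rounds to zero at four decimal places.

```python
ALPHA = 0.05  # assumed significance level


def displayed(p, places=4):
    """Format p as a fixed-decimal output column would (no leading zero)."""
    return f"{p:.{places}f}".lstrip("0")


def report(p, places=4):
    """Report p, replacing an all-zero display with an inequality."""
    shown = displayed(p, places)
    if p > 0 and float(shown) == 0.0:
        # ".0000" only tells us that p is smaller than the last displayed unit.
        bound = f"{10 ** -places:.{places}f}".lstrip("0")
        return f"p < {bound}"
    return f"p = {shown}"


def is_significant(p, alpha=ALPHA):
    """A result is significant when p is smaller than alpha."""
    return p < alpha


for p in (0.06, 0.05, 0.00003):
    print(displayed(p), "->", report(p), "| significant:", is_significant(p))
```

Comparing the decimals as numbers, rather than by eye, makes it plain that .0600 exceeds .0500 and that a tiny p-value shown as .0000 is well below it.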
49. Not knowing how to process frequency data in
contingency tables
- Two problems here:
- How to enter the data.
- How to find chi-square and other contingency
table statistics.
- Learners need reminding that they need two coding
variables as well as a count variable.
50. Contingency table example (Kinnear & Gray 2004,
page 308)
51. Contingency Table for SPSS
- The table has to be re-arranged into two coding
variables and a count variable as in the next
slide.
52. Two coding variables and a count variable (ibid
p.309)
53. Remember the Weight Cases procedure
- When data are aggregated as in the previous
table, it is necessary to tell SPSS that the
numbers in the Count variable are frequencies.
- This is done by means of the Weight Cases item in
the Data drop-down menu.
- Weight Cases is not needed when each row of the
data set represents a participant.
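
One way to picture what weighting by Count achieves is to expand the aggregated rows into one row per participant. The Python sketch below does this for an invented 2 x 2 table; the data and the names `aggregated` and `expand` are hypothetical, and the code illustrates the idea only, not how SPSS implements Weight Cases internally.

```python
from collections import Counter

# Hypothetical aggregated data: two coding variables plus a Count variable.
aggregated = [
    # (row_code, column_code, count)
    (1, 1, 12),
    (1, 2, 8),
    (2, 1, 5),
    (2, 2, 15),
]


def expand(aggregated_rows):
    """Repeat each (row_code, column_code) pair `count` times."""
    cases = []
    for row_code, column_code, count in aggregated_rows:
        cases.extend([(row_code, column_code)] * count)
    return cases


cases = expand(aggregated)

# Tallying the expanded cases recovers the original cell frequencies.
tally = Counter(cases)
print(len(cases), "participants")
for cell, frequency in sorted(tally.items()):
    print(cell, frequency)
```

Four weighted rows therefore stand for forty unweighted ones; when the data set already has one row per participant, there is nothing to weight.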
54. The Weight Cases procedure for getting SPSS to
treat the data in Count as a frequency
55. Finding the Chi-square measure of association in
Crosstabs
56. The Crosstabs dialog box (ibid p.310)
57. Selecting Chi-square and expected frequencies
58. Contingency table with expected counts
59. Chi-square output (continued)
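
For learners who wish to check where the expected counts and the chi-square value come from, the calculation can be sketched by hand. The Python fragment below assumes an invented 2 x 2 table of observed frequencies (not the table from the book) and hypothetical function names; it computes each expected count as row total times column total divided by N, and then the Pearson chi-square statistic. Obtaining the p-value would additionally require the chi-square distribution, which is not included here.

```python
# Hypothetical observed frequencies: rows and columns are the two coding variables.
observed = [
    [12, 8],
    [5, 15],
]


def expected_counts(table):
    """Expected count for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )


expected = expected_counts(observed)
chi_square = pearson_chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("Expected counts:", expected)
print(f"Pearson chi-square = {chi_square:.3f}, df = {df}")
```

The expected counts correspond to those requested in the Crosstabs Cells options, and the statistic to the Pearson Chi-Square row of the output.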
60. Conclusions
- I have tried to identify the most common
difficulties we have found for people learning to
use SPSS.
- Some of these are due to basic statistical
inexperience but others relate to the way in
which SPSS operates.
- Learners find difficulties in working out how to
re-arrange their data to fit in with SPSS's Data
Editor requirement that rows generally represent
single participants (or cases) and columns
represent variables.
61. Conclusions (continued)
- Learners need to be clear about what are their
DVs and what are their IVs. They also need to be
clear about whether the experiment has
independent or related samples of scores (i.e.
between subjects or within subjects designs).
- Prudent editing of output is desirable.
- Learners need to understand how to re-arrange data
for the Chi-Square test of association.
62. The End
- Many thanks for your attention.