Title: LEARNING TO USE SPSS: SOME COMMON PROBLEMS
1LEARNING TO USE SPSS: SOME COMMON PROBLEMS
- Paul Kinnear
- University of Aberdeen
2Introduction
- Colin Gray and I have been teaching our students
statistics and the use of statistical packages
for many years.
- We have written several books on SPSS, published
by Psychology Press.
- The latest is SPSS14 Made Simple (2006). We are
currently updating it for the recently released
SPSS15.
- Page numbers in later images relate to the SPSS14
book.
3Our associates
- Our teaching has benefited hugely from the advice
and support of John Lemon, our University's
Senior Computing Adviser, and Caroline Green,
Senior Teaching Fellow in the School of
Psychology.
- Today, I would like to share with you some of our
experiences in teaching SPSS to our students.
4Purpose of this talk
- The advent of Windows-based versions of SPSS has
been a boon to students and teachers alike.
- Certain aspects of SPSS, however, present
problems for students.
- My purpose today is to identify some of the most
common difficulties.
5Example of a problem
- Many difficulties arise from users not taking
time to make a list of what their dependent and
independent variables are, and what they are
trying to show in terms of their experimental
hypothesis (or hypotheses).
6Summary of problem areas
- Not identifying the variables and their types.
- Not understanding the differences between within
subjects and between subjects designs.
- Not undertaking exploratory data analysis (EDA)
before launching into statistical analyses.
- Not knowing how to select the appropriate
analysis.
7Summary of problem areas (continued)
- Not knowing how to edit out unnecessary bits of
output.
- Misunderstanding the meaning of p-values recorded
as .0000.
- Not knowing how to process frequency data in
contingency tables.
8Where problems can arise
- Data entry.
- Choosing the right statistics.
- Interpretation of the output.
- These areas are certainly not independent:
problems in one area often arise from problems in
another.
9Data entry
- Often, what seems to be a problem with data
entry arises from a lack of clarity about the
purpose and design of the investigation and
about the variables involved.
10Purpose and design
- The user must be able to answer the following
questions:
- What was the hypothesis?
- How was the study designed?
- What pattern in the results would confirm the
hypothesis?
11The variables: Some key questions for learners
- Was your investigation an experiment or a
correlational study?
- If it was an experiment, what were the
independent and dependent variables?
- Was the experiment a between subjects or a within
subjects (or possibly a mixed) design?
12Key terms
- To use SPSS, the user must be fully conversant
with some standard terms in experimental design.
- The user needs to know these terms in order to
complete SPSS's dialog boxes.
13Some common terms
- Dependent (DV) and independent (IV) variables.
- Between subjects versus within subjects (repeated
measures).
- Independent versus related samples.
- Nominal, ordinal and interval (or scale) data.
- Factors and levels.
- Covariates.
14Between subjects or within subjects?
- Users are often unclear about this distinction.
- A treatment factor is between subjects if there
is no basis for pairing the data obtained under
different conditions.
- For example, if the same participant is tested
twice under different conditions, a pair of
scores will result, and the factor is within
subjects.
- Such paired data are likely to be correlated.
15Misunderstood concepts: univariate, bivariate and
multivariate statistics
- Univariate statistics relate to analyses with
just one DV (e.g. driving performance).
- Bivariate statistics relate to analyses such as
correlation, where two variables are measured but
neither can be considered as an IV (e.g. scores
in two tests).
- Multivariate statistics relate to analyses with
more than one DV (e.g. reaction time and number of
errors).
16We discourage the use of MANOVA
- MANOVA is multivariate analysis of variance, i.e.
ANOVA applied to more than one DV.
- In line with Tabachnick & Fidell's recommendation
(Using Multivariate Statistics, 4th Edition, p.
323), we discourage students from using MANOVA
and encourage them instead to use several
univariate ANOVAs.
- The difficulty is that a significant MANOVA
result does not tell you which DVs are sensitive
to the IV(s).
17SPSS data sets
- An SPSS data set in the Data Editor is not in all
respects like the sort of table one sees in a
book or a journal article.
- In a published table, the columns may represent
data gathered on different groups of people.
- In an SPSS data set, each row contains data on
only one participant.
18Between subjects example (p. 23)
- An experiment has been carried out to investigate
the effects of a drug upon performance.
- The design is between subjects: a Placebo group
and a Drug group are compared in their performance
on a test of skill.
19A between subjects experiment (p. 26)
20Grouping variables
- Each participant's group membership is indicated
by a code number: 1 = Placebo, 2 = Drug.
- Together, the code numbers make up what is known
as a grouping variable.
- Now, each row in the data set contains
information on only one participant.
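
The rearrangement into a grouping variable can be sketched in Python. This is an illustration only: SPSS itself needs no code for this step, the scores are invented, and the names `placebo`, `drug` and `to_spss_rows` are hypothetical. The sketch turns two columns of a published-style table into one row per participant, with the group coded 1 for Placebo and 2 for Drug.

```python
# Invented scores laid out as in a published table: one column per group.
placebo = [6, 5, 5, 1, 2]
drug = [8, 6, 6, 7, 5]


def to_spss_rows(groups):
    """Return one (case, group_code, score) row per participant.

    `groups` maps a code number (here 1 = Placebo, 2 = Drug)
    to the list of scores obtained in that group.
    """
    rows = []
    case = 1
    for code in sorted(groups):
        for score in groups[code]:
            rows.append((case, code, score))
            case += 1
    return rows


rows = to_spss_rows({1: placebo, 2: drug})
for row in rows:
    print(row)
```

Each printed row corresponds to one line of the Data Editor: a case number, the grouping variable and the score.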
21Between subjects data in SPSS format (p. 27)
22Variable View of Data Editor for between subjects
design (p. 31)
23Data View of Data Editor for between subjects
design (p. 35)
24Within subjects example (p. 302)
- Participants took part in shooting at three
shapes of target (circle, square, triangle) and
their accuracy scores were recorded.
- Since each participant fired at all three target
shapes, the design is within subjects. (The
order of the target shapes was counterbalanced
across participants.)
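
A minimal Python sketch of the corresponding layout follows. It assumes invented accuracy scores, and the names `scores`, `conditions` and `to_wide_rows` are hypothetical. Because every participant contributes a score under each target shape, there is no grouping variable: each shape becomes its own column, and each participant still occupies exactly one row.

```python
conditions = ["circle", "square", "triangle"]

# Invented accuracy scores: participant number -> score for each shape.
scores = {
    1: {"circle": 7, "square": 5, "triangle": 4},
    2: {"circle": 6, "square": 6, "triangle": 3},
    3: {"circle": 8, "square": 4, "triangle": 5},
}


def to_wide_rows(scores, conditions):
    """Return one row per participant: (case, score per condition...)."""
    return [
        (case,) + tuple(scores[case][c] for c in conditions)
        for case in sorted(scores)
    ]


wide = to_wide_rows(scores, conditions)
for row in wide:
    print(row)
```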
25Variable View of Data Editor for within subjects
design (p. 302)
26Data View of Data Editor for within subjects
design
27Not undertaking exploratory data analysis (EDA)
before launching into statistical analyses.
- Learners are often so excited at having
completed a data file that they do not bother to
check:
- Whether there have been any transcription errors.
- Whether there are outliers needing attention.
28Exploring your data
- Statistics such as the mean, SD and r are very
useful for summarising some feature of the data.
But they can all be misleading.
- It is essential to use graphical methods to
confirm the summaries provided by the statistics.
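
How a summary statistic can mislead is easy to demonstrate. The sketch below uses invented data and a hand-written `pearson_r` helper (a hypothetical name, not an SPSS function). It computes r for five points that show no linear trend, then again after one extreme point is added.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: five points with no linear trend.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 3]
r_without = pearson_r(x, y)

# The same data plus a single extreme point (an outlier).
r_with = pearson_r(x + [20], y + [20])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

The single added point moves r from essentially zero to a value above .9, which is the kind of distortion a scatterplot reveals at a glance.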
29Get to know your data first
- Learners must be warned against pressing ahead
with formal tests before they have first explored
their data thoroughly.
30Panelled bar charts (p. 144)
31Pie Chart
32What to look for in a scatterplot
- The cloud of points should either be elliptical
or circular.
- An ellipse indicates a linear relationship; a
circular cloud indicates independence.
33Example of an outlier (p. 185)
34More problems
- Failure to label variables and values carefully.
This can make the output very difficult to read.
- Inability to analyse nominal data.
35Choosing the right statistic
- We use statistical techniques for TWO purposes.
- Summarising and exploring our data.
- Confirming the patterns we find in our data.
36Five common research situations (p. 13)
37Not knowing how to select the appropriate analysis
- Learners have difficulty finding ANOVA because
SPSS does not have an item in the Analyze
drop-down menu called ANOVA.
- They also have difficulty finding the chi-square
test of association and other statistics for
analysing contingency tables because they do not
remember that it is in the Crosstabs item of the
Descriptive Statistics submenu.
38Yet more problems
- Failure to remove unnecessary items from the SPSS
output.
- Confusion caused by expressions such as .0000 in
columns of p-values.
39Not knowing how to edit out unnecessary bits of
output
- Learners are prone to print all of SPSS's output
without taking the trouble to edit out
unnecessary bits.
- Editing can include cutting out whole items
(tables, graphics) or just bits within tables.
- The output for within subjects ANOVA can be
particularly obscure without appropriate editing.
40Need to edit output too
- Learners are also wary of trying to edit output.
- The result is that they copy large tables with
unnecessary columns and unnecessary decimal
places into their documents.
41More extensive outputs
- Let's have a look at some ANOVA outputs and see
how we can edit them.
42Complete within subjects output
43Within subjects Mauchly output (p. 313)
44Within subjects effects output (p. 314)
45ANOVA summary table after editing
46Edited Bonferroni output
47Profile plot before editing
48Profile plot after editing: the differences are
less impressive
49Results of editing output
- Hopefully you will agree that suitable editing of
the SPSS output has made the material much more
suitable for inserting into laboratory or
research reports.
- Learners often think that they are obliged to
paste everything SPSS outputs into their reports.
50Misunderstanding the meaning of p-values recorded
as .0000
- Learners seem to have difficulty interpreting
p-values.
- They often do not realise that .0600 is larger
than .0500 and hence that the statistical test is
not significant (assuming α = 0.05).
- They are bewildered by decimals with just zeros
after the point, e.g. .0000.
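
A displayed value of .0000 is a rounded number, not a probability of exactly zero. The Python sketch below illustrates this under stated assumptions: the helper names `display_p`, `report_p` and `is_significant` are hypothetical, and the formatting only mimics a four-decimal output column rather than reproducing SPSS's own routines.

```python
def display_p(p, decimals=4):
    """Format p as a fixed-width column would: rounded, no leading zero."""
    return f"{p:.{decimals}f}".lstrip("0")


def report_p(p, decimals=4):
    """Wording for a report: a value shown as all zeros becomes 'p < ...'."""
    shown = display_p(p, decimals)
    if float(shown) == 0.0:
        smallest = f"{10 ** -decimals:.{decimals}f}".lstrip("0")
        return "p < " + smallest
    return "p = " + shown


def is_significant(p, alpha=0.05):
    """A result is significant only when p is smaller than alpha."""
    return p < alpha


for p in (0.00003, 0.06):
    print(display_p(p), "|", report_p(p), "| significant:", is_significant(p))
```

The first value prints as .0000 but is reported as p < .0001; the second prints as .0600 and is not significant at the .05 level.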
51Not knowing how to process frequency data in
contingency tables
- Two problems here:
- How to enter the data.
- How to find chi-square and other contingency
table statistics.
- Learners need reminding that they need two coding
variables as well as a count variable.
52Contingency table example (p. 380)
53Contingency Table for SPSS
- The table has to be re-arranged into two coding
variables and a count variable as in the next
slide.
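
The rearrangement can be sketched in Python as follows. The counts are invented, and `table` and `to_coded_rows` are hypothetical names. Each cell of the contingency table becomes one row holding a row code, a column code and the cell's frequency.

```python
# Invented 2 x 2 table of frequencies.
table = [
    [20, 10],  # row category 1
    [5, 25],   # row category 2
]


def to_coded_rows(table):
    """Return (row_code, column_code, count) for every cell of the table."""
    rows = []
    for i, row in enumerate(table, start=1):
        for j, count in enumerate(row, start=1):
            rows.append((i, j, count))
    return rows


coded = to_coded_rows(table)
for row in coded:
    print(row)
```

A table with R rows and C columns always yields R x C rows in this layout, one per cell.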
54Two coding variables and a count variable (p. 383)
55Remember the Weight Cases procedure
- When data are aggregated as in the previous
table, it is necessary to tell SPSS that the
numbers in the Count variable are frequencies.
- This is done by means of the Weight Cases item in
the Data drop-down menu.
- Weight Cases is not needed when each row of the
data set represents a participant.
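
The Python sketch below illustrates why the weighting matters. It uses invented counts and the hypothetical helpers `expand` and `chi_square`. It computes the Pearson chi-square without continuity correction, first treating the counts as frequencies and then ignoring them, as happens if the weighting step is forgotten. This hand calculation is a sketch, not SPSS's own routine, and it assumes every cell of the table appears in the data, including cells with zero counts.

```python
# Invented aggregated data: (row_code, column_code, count) per cell.
coded = [(1, 1, 20), (1, 2, 10), (2, 1, 5), (2, 2, 25)]


def expand(coded):
    """One (row_code, column_code) pair per case, as weighting implies."""
    return [(r, c) for r, c, n in coded for _ in range(n)]


def chi_square(coded):
    """Pearson chi-square, using each row's count as its weight."""
    total = sum(n for _, _, n in coded)
    row_tot, col_tot = {}, {}
    for r, c, n in coded:
        row_tot[r] = row_tot.get(r, 0) + n
        col_tot[c] = col_tot.get(c, 0) + n
    chi2 = 0.0
    for r, c, n in coded:
        expected = row_tot[r] * col_tot[c] / total
        chi2 += (n - expected) ** 2 / expected
    return chi2


weighted = chi_square(coded)
# Forgetting the weights amounts to treating each row as a single case.
unweighted = chi_square([(r, c, 1) for r, c, _ in coded])

print("cases represented:", len(expand(coded)))
print(f"chi-square with weights:    {weighted:.3f}")
print(f"chi-square without weights: {unweighted:.3f}")
```

Without the weights, the four rows look like four participants spread evenly over the cells, so the statistic collapses to zero.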
56The Weight Cases procedure for getting SPSS to
treat the data in Count as a frequency
57Finding the Chi-square measure of association in
Crosstabs (p. 383)
58The Crosstabs dialog box (p. 384)
59Selecting Chi-square and expected frequencies
60Contingency table with expected counts (p. 386)
61Chi-square output (continued)
62Conclusions
- I have tried to identify the most common
difficulties we have found for people learning to
use SPSS.
- Some of these are due to basic statistical
inexperience but others relate to the way in
which SPSS operates.
- Learners find difficulties in working out how to
re-arrange their data to fit in with the SPSS Data
Editor's requirement that rows generally represent
single participants (or cases) and columns
represent variables.
63Conclusions (continued)
- Learners need to be clear about what their DVs
and IVs are. They also need to be
clear about whether the experiment has
independent or related samples of scores (i.e.
between subjects or within subjects designs).
- Prudent editing of output is desirable.
- Learners need to understand how to re-arrange data
for the Chi-Square test of association.
64The End
- Many thanks for your attention.