Title: The Presentation of Statistics in Clinical and Health Psychology Research
1. The Presentation of Statistics in Clinical and Health Psychology Research
- Jeremy Miles, Department of Health Sciences
- Susanne Hempel, Centre for Reviews and Dissemination
2. Introduction
- Statistics in clinical and health psychology
- Appropriate statistics used
- Statistics appropriately presented
  - Graphical display
  - Verbal presentation
3. Methodology
- Reviewed the 2003 volumes (4 issues) of
  - British Journal of Clinical Psychology (BJCP)
  - British Journal of Health Psychology (BJHP)
- Looking for
  - Errors of statistical presentation / interpretation
  - Potential areas of improvement
4. Results
- BJCP: 29 papers reviewed
- BJHP: 31 papers reviewed
  - 5 excluded (qualitative, narrative review), leaving 26
- Wide range of problems identified
- Emerging themes
  - p-values
  - Inferential statistics
  - Effect sizes
  - Reliability
  - Other issues
- 2 papers with no issues
5. Statistical Significance
6. Statistical Significance
- A confusing and controversial issue
- Misunderstood by students, researchers, teachers and textbook authors
- (Broadly) two rival approaches to significance testing
  - Fisher: report the exact significance (p) value
  - Neyman-Pearson: p < 0.05, or not
  - These are incompatible(!)
- (Ignoring Bayes, and ignoring the meanings of probability)
7. A Bastardised Approach
- (From Gigerenzer, 1992)
- The two approaches are misunderstood, and combined
  - We must report the exact p
  - We must present results as p < 0.xx
- Recommended
  - Exact probability values (e.g. Wilkinson et al., 1999)
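The two reporting styles can be contrasted in a short Python sketch. The function names are hypothetical, and reporting two significant figures is an assumption rather than a rule from these slides; the point is that the Fisher-style report keeps the exact value (and never turns a tiny p into "p < .00"), while the Neyman-Pearson style only ever yields a decision.

```python
def report_exact_p(p: float, sig_figs: int = 2) -> str:
    """Fisher-style report: the exact p-value to a fixed number of significant figures."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # '#' keeps trailing zeros (0.010 rather than 0.01); 'g' switches to
    # scientific notation for very small values (4.0e-05 rather than 0.000)
    return f"p = {p:#.{sig_figs}g}"


def neyman_pearson_decision(p: float, alpha: float = 0.05) -> str:
    """Neyman-Pearson style: alpha is fixed in advance and only the decision is reported."""
    return "reject H0" if p < alpha else "do not reject H0"


for p in (0.035, 0.0104, 0.000040):
    print(report_exact_p(p), "|", neyman_pearson_decision(p))
```

This prints p = 0.035, p = 0.010 and p = 4.0e-05, each followed by "reject H0": the decision is identical, while the exact values carry very different information.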
8. Results of p-Value Reporting
- BJCP: 8 out of 29 reported exact p-values
- 1 used a strict N-P approach
- BJHP: 4 out of 26 reported exact p-values
9. More on p-Values
- 2 papers reported p < .00 (i.e. p < 0)
  - True values were 0.000040 and 0.000007
- Several reported arbitrary cutoffs
  - p < 0.07, p < 0.02
  - Incorrect, but not deceptive
10. Misleading?
- Not using exact p-values sometimes appears fishy
  - Exact p-values given for all results except where p = 0.049, which was reported as p < 0.05
  - Gave p > 0.05 (p = 0.057) and p < 0.05 (p = 0.048)
  - p < 0.01 when p = 1 × 10⁻¹⁹ (others in the same paper reported as p < 0.001)
  - p = 0.0104 described as p < 0.01; p = 0.0123 described as p < 0.05
11. Finally: Mistakes
- Good old errors
- Very hard for readers and reviewers to spot, but still:
  - F(1, 69) = 4.58, p < 0.001
    - No: p = 0.035
  - F(1.76, 142.51) = 3.026, p = .058
    - No: p = 0.084
  - F = 4.02 (df not given, but are 2, 62, from information in a table), p = 0.05
    - No: p = 0.022
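Checks like these can be partly automated by recomputing p from the reported F and df. The Python sketch below is an illustration under stated assumptions: it uses only the standard library, computes the upper-tail p-value of F through the regularised incomplete beta function (evaluated with the continued-fraction method described in Numerical Recipes), and the helper names are hypothetical. Non-integer df, such as those produced by sphericity corrections, are accepted.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-14
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # even-numbered term of the fraction
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd-numbered term of the fraction
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularised_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), using the symmetry relation where the fraction converges slowly."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def p_from_f(f: float, df1: float, df2: float) -> float:
    """Upper-tail p of F: P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f)."""
    return regularised_incomplete_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


print(round(p_from_f(4.58, 1, 69), 3))  # F(1, 69) = 4.58
print(round(p_from_f(4.02, 2, 62), 3))  # F(2, 62) = 4.02
```

These agree with the corrected p-values above to within rounding (about 0.036 and 0.023).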
12. Inferential Statistics
13. Reporting Test Statistics
- Most people can't interpret a test statistic
- Even fewer are interested
- Why report a test statistic exactly, and not the exact p?
  - "no significant interaction of both variables, F(1, 67) = .289": no p-value given (it's 0.59)
- F without df
  - No use at all (unless the df can be worked out, which can be tricky or ambiguous)
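One way to make the complete report the default is a small formatting helper. The Python sketch below is hypothetical (not from any package); it assumes the exact p-value has already been computed, and it refuses to produce an F without both df and the p-value.

```python
from typing import Optional


def format_f_report(f: float, df1: Optional[float], df2: Optional[float],
                    p: Optional[float]) -> str:
    """Return 'F(df1, df2) = F, p = p'; raise if either df or the exact p is missing."""
    if df1 is None or df2 is None:
        raise ValueError("an F statistic is of no use to the reader without its df")
    if p is None:
        raise ValueError("report the exact p-value alongside the test statistic")
    # ':g' prints integer df as '1' and corrected df as '1.76'
    return f"F({df1:g}, {df2:g}) = {f:.2f}, p = {p:#.2g}"


print(format_f_report(0.289, 1, 67, 0.59))  # F(1, 67) = 0.29, p = 0.59
```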
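14. Standard Errors
- The standard error is the standard deviation of the sampling distribution
- Used to calculate t (and hence the p-value) and CIs
- 95% CIs given by $\text{estimate} \pm t_{\alpha/2} \times SE$
  - The value of $t_{\alpha/2}$ depends on df
    - df = 5: $t_{\alpha/2} = 2.57$
    - df = 100: $t_{\alpha/2} = 1.98$
- The standard error itself has little direct use to readers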
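The CI formula can be sketched in Python with only the standard library. Because the standard library has no t distribution, `t_critical` approximates $t_{\alpha/2}$ with the Cornish-Fisher-type expansion given in Abramowitz and Stegun (26.7.5), starting from the normal quantile. That approximation is an assumption of this sketch: it is adequate for df of roughly 5 or more, but it is not a replacement for exact quantiles from a statistics package. Function names are hypothetical.

```python
from statistics import NormalDist


def t_critical(df: float, alpha: float = 0.05) -> float:
    """Approximate two-tailed critical value t_{alpha/2, df} (A&S 26.7.5 expansion)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    g1 = (z**3 + z) / 4.0
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96.0
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384.0
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160.0
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def confidence_interval(estimate: float, se: float, df: float, alpha: float = 0.05):
    """CI = estimate +/- t_{alpha/2, df} * SE."""
    half_width = t_critical(df, alpha) * se
    return estimate - half_width, estimate + half_width
```

`t_critical(5)` and `t_critical(100)` round to 2.57 and 1.98, the values listed above. For the independent-groups example later in the talk (difference 2.7, df = 18, and an SE of 1.0 because t = difference / SE = 2.7), `confidence_interval(2.7, 1.0, 18)` gives about 0.60 to 4.80.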
15. [Graph: mean ± 1 SE] The SE of each mean is not showing anything useful
16. [Graph: mean ± standard error] The data are repeated measures
17. Confidence Intervals
- It is generally recommended that confidence intervals be reported
  - They give a better idea of the likely value in the population
  - Not significant ≠ no effect
- Appropriate confidence intervals reported
  - BJCP: 3 (of 29)
  - BJHP: 4 (of 26)
18. Inappropriate Confidence Intervals / Standard Errors
- When comparing two groups
  - The appropriate standard error / confidence interval is that of the difference, not of each group
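A Python sketch of the point, using hypothetical scores for five people measured in two conditions. The same two columns give very different standard errors for the difference depending on whether they are treated as independent groups or as repeated measures, and neither value can be read off error bars drawn around each group mean. The independent-groups branch uses the unpooled formula (an assumption); with equal group sizes the pooled formula gives the same value.

```python
import math
from statistics import stdev


def se_of_difference(a, b, paired: bool) -> float:
    """Standard error of the difference between the means of conditions a and b."""
    if paired:
        # repeated measures: SE of the mean of the within-person differences
        diffs = [x - y for x, y in zip(a, b)]
        return stdev(diffs) / math.sqrt(len(diffs))
    # independent groups (unpooled): combine the two groups' standard errors
    se_a = stdev(a) / math.sqrt(len(a))
    se_b = stdev(b) / math.sqrt(len(b))
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical scores: the same five people in condition 1 and condition 2
condition_1 = [3, 5, 6, 8, 9]
condition_2 = [1, 4, 4, 7, 7]
print(se_of_difference(condition_1, condition_2, paired=False))  # about 1.55
print(se_of_difference(condition_1, condition_2, paired=True))   # about 0.24
```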
19. Independent groups study: significant difference?
Yes. t = 2.7, df = 18, p = 0.016; difference = 2.7, 95% CI 0.60 to 4.80
20. Repeated measures study: significant difference?
t = 2.25, df = 9, p = 0.051; difference = 2.7, 95% CI -0.02 to 5.42
Trick question. It's the same graph, and I haven't given you enough information.
21. Effect Sizes
22. Effect Sizes
- More statistically significant = larger, more important effect?
  - No
- Effect sizes describe the size of the effect
  - r, d, η², R²
- Effect sizes reported?
          Yes   No
  BJCP      4   16
  BJHP      7   10
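A Python sketch of two common effect sizes, using hypothetical data. `cohens_d` standardises the mean difference by the pooled standard deviation, and `r_from_t` converts a reported t and its df to r. The hypothetical helper `t_from_d` uses t = d × sqrt(n/2) for two equal groups of size n, to show that significance grows with sample size while the effect size does not.

```python
import math
from statistics import mean, variance


def cohens_d(a, b) -> float:
    """Cohen's d for two independent groups, standardised by the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def r_from_t(t: float, df: float) -> float:
    """Magnitude of the effect size r implied by a t statistic: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def t_from_d(d: float, n_per_group: int) -> float:
    """t for two equal groups with effect size d: grows with n although d does not."""
    return d * math.sqrt(n_per_group / 2)


d = cohens_d([3, 5, 6, 8, 9], [1, 4, 4, 7, 7])  # about 0.65
# Same d: t is about 1.46 with 10 per group and about 4.62 with 100 per group
print(round(t_from_d(d, 10), 2), round(t_from_d(d, 100), 2))
```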
23. Reliability Reporting
24. Reliability Reporting
- Small, but important
- Reliability is not a property of a test
  - It is a property of a test in a particular population, at a particular time
- Reliability should always be evaluated and presented
- Reliability reported (for all, some or none of the measures):
          All   Some   None
  BJCP      5      4     14
  BJHP      6      3     11
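Because reliability belongs to scores in a particular sample, it has to be computed from the data actually collected. A minimal Python sketch of Cronbach's alpha, one common internal-consistency estimate (chosen here as an example; the slides do not name a coefficient), with hypothetical item scores:

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is person j's score on item i.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]  # each person's total score
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1.0 - sum_item_var / variance(totals))


# Hypothetical two-item scale answered by four people in this sample
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))  # 0.75
```

The same scale given to a different sample would yield a different alpha, which is why it should be reported for each study.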
25. Stepwise Regression
- Almost never appropriate
- Small differences in samples can lead to large differences in results
  - 1 paper discusses differences between two stepwise regressions
- The df are wrong: they count only the predictors retained, not all those tried (hence F and p are also wrong)
- Use of stepwise regression
  - BJCP: 1
  - BJHP: 2 (one not described as stepwise)
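A simplified Python simulation of why the reported tests after stepwise selection are wrong. It mimics only the first step of forward selection: with an outcome and k candidate predictors that are all pure noise, it keeps the predictor most correlated with the outcome and then tests it as if it had been the only one considered. The p-value uses Fisher's z normal approximation; the sample size, number of candidates and seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y) -> float:
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def naive_p(r: float, n: int) -> float:
    """Two-tailed p for one pre-specified correlation (Fisher z, normal approximation)."""
    z = math.atanh(abs(r)) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def selection_false_positive_rate(n=50, k=20, reps=500, seed=1) -> float:
    """Share of pure-noise datasets where the best of k candidate predictors looks
    'significant' when tested as if it had been the only predictor considered."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # first step of forward selection: keep the predictor most correlated with y
        best = max(abs(pearson_r([rng.gauss(0.0, 1.0) for _ in range(n)], y))
                   for _ in range(k))
        if naive_p(best, n) < 0.05:
            hits += 1
    return hits / reps


print(selection_false_positive_rate(k=1))   # near the nominal 0.05
print(selection_false_positive_rate(k=20))  # far above 0.05
```

With 20 candidates, roughly 1 - 0.95^20 ≈ 0.64 of the pure-noise datasets yield a "significant" predictor, because the test ignores the 19 predictors that were tried and discarded.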
26. A Collection of Smaller Issues
27. Distributional Assumptions
- Very few tests assume a normal distribution of the variables themselves (regression, for example, assumes normally distributed residuals)
- When sample sizes are at least moderate, a normal distribution is unimportant
- The Kolmogorov-Smirnov test examines whether there is a significant difference from normality
  - Not whether there is an important difference from normality (Field?)
- 2 papers (BJCP) used the KS test
- Non-parametric tests
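A small Python simulation of the central limit theorem behind the second point, assuming exponentially distributed (strongly skewed) scores: the skewness of the sampling distribution of the mean shrinks as n grows (theoretically 2 / sqrt(n) for the exponential), even though the scores themselves stay just as skewed. The sample sizes, number of replications and seed are arbitrary.

```python
import random


def skewness(xs) -> float:
    """Sample skewness (third standardised moment)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / n / s2 ** 1.5


def skew_of_sample_means(n: int, reps: int = 4000, seed: int = 2) -> float:
    """Skewness of the sampling distribution of the mean of n exponential scores."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)


for n in (1, 5, 30):
    print(n, round(skew_of_sample_means(n), 2))  # theory: 2.0, 0.89, 0.37
```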
28. Other Miscellany
- Mann-Whitney test described as comparing medians (it doesn't necessarily)
- Principal components analysis described as exploratory factor analysis (it's not)
- Minimum expected frequencies for the chi-square test violated
- Arithmetical errors in chi-square tests
- Correlation used as a measure of agreement
  - We all know that it isn't
- Inappropriate dichotomisation of continuous variables
  - Never necessary
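A Python sketch of why correlation is not agreement, using hypothetical ratings in which rater B always scores 5 points higher than rater A. The correlation is perfect, but the Bland-Altman mean difference and 95% limits of agreement (mean difference ± 1.96 SD of the differences), a standard agreement analysis named here as one alternative, show a constant disagreement of 5 points.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y) -> float:
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def limits_of_agreement(x, y):
    """Bland-Altman mean difference (bias) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical ratings: rater B always scores 5 points higher than rater A
rater_a = [10, 12, 15, 18, 20]
rater_b = [score + 5 for score in rater_a]
print(pearson_r(rater_a, rater_b))            # 1.0: perfect correlation
print(limits_of_agreement(rater_a, rater_b))  # bias of -5: the raters never agree
```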
29. Hall of Shame
30. Conclusions
31. Summary
- The picture isn't rosy
- Errors are not limited to psychology
  - Garcia-Berthou and Alcaraz (2004) found errors in Nature and the British Medical Journal
- There are a lot of areas for improvement
32. Solutions? Short Term
- More statistical refereeing?
  - More guidelines for reviewers
  - More reviewers with expertise in statistics
    - BJCP and BJEP have statistical reviewers
- Rapid response?
  - Could be set up with the electronic journals
  - Works in other fields
33. Solutions? Long Term
- Statistical / methodological training?
  - Undergraduate? Postgraduate? CPD?
- Work more closely with statisticians?
  - Common in other fields: an MSc in Medical Statistics is possible, an MSc in Psychological Statistics is not
34. Final Thought
- Aaagggghhhhh!
- We just did a piece of qualitative research?