Title: The Scholarship of Teaching and Learning (SoTL)
1 p Values and Effect Sizes: Making Sense of Research Findings
Scott Cottrell, Ed.D., West Virginia University School of Medicine
Carol Thrush, Ed.D., University of Arkansas for Medical Sciences
Britta Thompson, Ph.D., Baylor College of Medicine
2 p Values and Effect Sizes
- Workshop Goals
- Examine the calls for reporting effect sizes
- Define p values and effect sizes
- Identify limitations of p values
- Identify types of effect sizes
- Discuss how p values and effect sizes influence interpretations of findings
- Offer practical tips for reporting and interpreting findings
4 Reporting Effect Size
- The APA Task Force on Statistical Inference recommends reporting the direction, size, and confidence interval of the effect
- 20 journals require effect size reporting
5 AERA Standards (2006)
- For each critical statistical result there should be:
- An effect size, such as a treatment effect, a regression coefficient, or an odds ratio
- An indication of the uncertainty of the effect (standard error or CI)
- An interpretation of meaningfulness, e.g., "the estimated effect is large enough to be educationally important, but these data do not rule out the possibility that the true effect is actually quite small"
6 Three Types of Significance
- Statistical significance (p value)
- A measure of probability (how likely)
- Evaluates whether group means or SDs differ
- Practical significance (effect size)
- A measure of the magnitude of a difference (how large)
- Clinical significance
- A measure of the value of the intervention to individuals
Thompson, 2002
7 A Matter of Estimation
We calculate statistics (e.g., t, r) from samples and ask: how rare is the statistic (the p value)? If we had population data, there would be no need to calculate a statistic at all; we could measure the population parameter directly.
8 Limitations of Statistical Significance Tests
- Statistical testing answers the question: how rare (0 to 1.00) is a calculated statistical result for a given sample size?
- A p value is NOT the probability that a result is important or practically significant.
9 Effect Size: it's not about how likely, but about how big
- Among lions, gender has a LARGE effect size
- Among tigers, gender has a small effect size
10 Different Effect Sizes
- There are over 40 different effect size measures (Kirk, 1996)
- A chi-square analysis may use an odds ratio
- A correlation may use an r²
- There are many frameworks for distinguishing effect sizes (Thompson, 2007)
- Authors must explicitly say which effect size they are reporting!
11 Common Effect Size Measures
- Standardized differences
- Cohen's d
- Glass's delta
- Measures of association
- Eta squared (η²)
- Coefficient of determination (r²)
12 Calculation of Effect Sizes
- Glass's delta
- Independent samples: (Mexp - Mcontl) / SDcontl
- Dependent samples: (Mpre - Mpost) / SDpre
- Cohen's d
- Independent samples: (Mexp - Mcontl) / SDpooled
Hojat & Xu, 2004
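The formulas above can be sketched in Python. This is a minimal sketch: the function names are my own, and the pooled-SD formula (the standard (n-1)-weighted average of the two group variances) is an assumption, since the slide gives only the numerators and denominators.

```python
def glass_delta(m_treatment, m_baseline, sd_baseline):
    """Glass's delta: mean difference scaled by the baseline (control or pre) SD."""
    return (m_treatment - m_baseline) / sd_baseline

def cohens_d(m_exp, m_ctrl, sd_exp, sd_ctrl, n_exp, n_ctrl):
    """Cohen's d: mean difference scaled by the pooled SD of the two groups."""
    # Pooled variance: (n-1)-weighted average of the two sample variances
    pooled_var = ((n_exp - 1) * sd_exp**2 + (n_ctrl - 1) * sd_ctrl**2) \
                 / (n_exp + n_ctrl - 2)
    return (m_exp - m_ctrl) / pooled_var**0.5
```

When the two groups have equal SDs, the pooled SD equals that common SD, so Cohen's d and Glass's delta coincide.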
13 Interpretation of Effect Sizes
- Small: d = .20, r = .10
- Medium: d = .50, r = .30
- Large: d = .80, r = .50
Hojat & Xu, 2004; Kline, 2004
14 Examples
15 Two Independent Samples
- Hypothesis: a new learning method will help students improve their understanding of statistics.
16 Research Design
- Two independent groups
- Intervention group (innovative method)
- Control group (lecture only)
- Measure
- Quiz students before and after the 2-week course
- Calculate the difference in quiz scores
- Sample
- Intervention group: 5
- Control group: 5
17 Results
- Intervention group: M = 1.14, SD = .10, 95% CI 1.09-1.19
- Control group: M = 1.13, SD = .13, 95% CI 1.09-1.18
18 Statistical Analysis
- Set alpha (by custom) at .05
- Alpha is the cut-off p value, set before the study is conducted
- Run an independent-samples t-test
- Result: t = 1.40, p value = .09
- Conclusion: not statistically significant (.09 > .05)
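The t statistic on this slide comes from the pooled-variance independent-samples formula; here is a minimal Python sketch of that calculation from summary statistics (the function name is mine; converting t to a p value additionally requires the t distribution's tail function, e.g. `scipy.stats.t.sf`, which is not shown here).

```python
import math

def t_independent(m1, m2, sd1, sd2, n1, n2):
    """Pooled-variance t statistic for two independent groups, from summary stats."""
    # Pooled variance with n1 + n2 - 2 degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Standard error of the difference in means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff
```

Equal means give t = 0; the statistic grows as the mean difference grows or the standard error shrinks.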
19 t Distribution
(I need a t statistic > 1.86, which is rare)
20 Implications of My Findings?
- A p value of .09 is not good.
- Does that mean my findings are not important?
21 I Want to Use My Innovative Method!!
- Through a sample-size calculation, I determine that I will need a total sample size of 42
- So, I increase my groups from 5 to 21
- What is my new result with the larger sample size?
- t = 2.30 (higher than 1.86)
- p value of .001
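The jump from p = .09 to p = .001 illustrates the slide's point: as n grows, the same effect becomes "rarer." For two equal-sized groups the pooled t statistic and Cohen's d are linked by t = d * sqrt(n/2), so holding the effect size fixed and growing the per-group n inflates t. A sketch (the function name and the illustrative d = 0.5 are my own choices):

```python
import math

def t_from_d(d, n_per_group):
    """t for two equal-sized groups given effect size d: t = d * sqrt(n/2)."""
    return d * math.sqrt(n_per_group / 2)

# Same (medium) effect size, growing samples: only the "rarity" changes.
for n in (5, 21, 100):
    print(n, round(t_from_d(0.5, n), 2))
```

Running this shows t climbing from well below any critical value to comfortably "significant" while d never moves, which is exactly why a small p cannot by itself certify an important effect.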
22 What Do My Results Mean?
- Are my results important or significant?
- A misconception: a p value determines whether a result is important and replicable!
- The truth: only effect sizes can help us determine whether a result has practical significance.
23 Effect Sizes: My Example
- An observed effect can be statistically significant yet have little practical value.
- For example, look at my mean scores:
- Mean quiz difference for the intervention group: 1.14
- Mean quiz difference for the control group: 1.13
24 Effect Sizes: Independent Samples Example
- Cohen's d = (Mexp - Mcontl) / SDpooled
- = (1.14 - 1.13) / .12 = .083
- Result: a small effect
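The slide's arithmetic, checked directly with the values from slides 17 and 24:

```python
# Mean quiz differences and pooled SD taken from the slides
m_intervention, m_control, sd_pooled = 1.14, 1.13, 0.12

d = (m_intervention - m_control) / sd_pooled
print(round(d, 3))  # 0.083, far below even the 0.20 benchmark for a small effect
```

Despite the "significant-looking" p value achievable with a larger sample, the standardized difference between the groups is negligible.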
25 Dependent Sample
- Hypothesis: a new palliative care curriculum will improve students' palliative care attitudes
26 Research Design
- Dependent group (pre-post)
- Measure
- Attitudinal questionnaire administered before and 6 months after the intervention
- Sample
- Matched pre-post: 300 students
27 Results (scale 1-7)
- Pre: M = 4.00, SD = 1.02, 95% CI 3.41-4.06
- Post: M = 5.50, SD = .56, 95% CI 5.44-5.55
28 Statistical Analysis
- Set alpha (by custom) at .05
- Run a dependent (paired) samples t-test
- Result: t = 4.30, p value < .001
- Conclusion: statistically significant (< .001 < .05)
29 Implications of My Findings?
- A p value of < .001 is "good"
- Does that mean my findings (a difference of 1.5 points) are important?
30 Effect Sizes: Dependent Sample Example
- Glass's delta = (Mpre - Mpost) / SDpre
- = |4.00 - 5.50| / 1.02 = 1.47
- Is this a large effect?
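The slide's Glass's delta, checked directly with the pre-test values from slide 27 (the magnitude of the change is reported, as on the slide):

```python
# Pre/post means and pre-test SD taken from the slides
m_pre, m_post, sd_pre = 4.00, 5.50, 1.02

delta = abs(m_pre - m_post) / sd_pre
print(round(delta, 2))  # 1.47, well beyond the 0.80 benchmark for a large effect
```

Here the statistically significant result is also practically large: the attitudes shifted by nearly one and a half pre-test standard deviations.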
31 Take-Home Pearls
32 Effect Size is Key in Meta-Analysis
- p values can't answer an important question: is the magnitude of the effect stable across studies? Are the results replicable?
- Effect sizes can show whether an innovative teaching method has a stable, practical significance across studies.
33 Differences Between Students in Problem-Based and Lecture-Based Curricula Measured by Clerkship Performance Ratings at the Beginning of the Third Year (Whitfield, Mauger, Zwicker, & Lehman; 14(4), 2002, 211-217)
- RESULTS: Mean scores differed significantly in some clerkships, but the effect sizes were small. The effect sizes for fund of knowledge ranged from 0.20 to 0.41; for clinical problem-solving skills, they ranged from 0.26 to 0.39. These differences between the problem-based and lecture-based students were of the same magnitude as the difference at the start of medical school on the MCAT, namely d = 0.31.
34 A Result that is Rare...
- ...may not be significant
- Rare (p < .05) ≠ practical significance
- "Yes, by itself, statistical significance means very little. It merely means that the results are rare." Carver (1978)
35 Pearls (cont.)
- Avoid saying "almost significant" at .06, or "extremely significant" at .0001 or .000001
- Avoid reporting "p = .000"; p is never exactly zero
- "Surely, God loves the .06 nearly as much as the .05." Rosnow and Rosenthal
36 Pearls (cont.)
- A common misconception: if your results are not statistically significant, then your results are not important
- As Thompson has noted:
- Another experiment with a larger sample may find a statistically significant difference.
- It's a matter of probability for a given sample at a given time.
37 Thank You!!!!
- Remember: statistics should never replace good judgment
38 References
- American Educational Research Association (AERA). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, Aug/Sep 2006, 35(6), 33-40.
- Carver, R. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378-399.
- Colliver, J.A. (2002). Call for greater emphasis on effect-size measures in published articles in Teaching and Learning in Medicine. Teaching and Learning in Medicine, 14(4), 206-210.
- Hojat, M., & Xu, G. (2004). A visitor's guide to effect sizes: Statistical significance versus practical (clinical) importance of research findings. Advances in Health Sciences Education, 9, 241-249.
- Kline, R.B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
- Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. New York: Guilford.
- Thompson, B. (2002). "Statistical," "practical," and "clinical": How many kinds of significance do counselors need to consider? Journal of Counseling & Development, 80, 64-71.
- Wilkinson, L., & Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.