Title: Hypothesis Testing, Statistical Power
1. Hypothesis Testing, Statistical Power
- Psychology 203
- Lecture 15
- Week 8
2. By now you know that
- Psychological science progresses by taking calculated risks
- Accept H1 and reject Ho
- Reject H1 and accept Ho
- But failing to reject Ho does not mean that it is true
- Absence of evidence is not evidence of absence
- Thus we have the concepts of Type 1 and Type 2 errors
3. Summary of Hypothesis Testing
- Hypothesised population parameter: the population mean is the standard against which a sample will be compared and the null hypothesis tested
- Sample statistic: the sample estimate of the population mean is used to establish the probability of the sample mean occurring
- To do this we need the standard error of the mean (SEM), which describes the spread of the sampling distribution of the mean
- We use this to calculate the test statistic
4. Criticisms of hypothesis testing
- Hypothesis testing produces an all-or-none decision: z = 1.90 is not significant whereas z = 2.00 is
- But you have to draw a line in the sand somewhere
- An experimental treatment ALWAYS has an effect, however small; if so, then Ho is always false
- But as we are using an inferential process it is impossible to show that Ho is true; we can only show that Ho is very unlikely (in other words, falsify it)
5. Further Criticism
- Tests of significance tell us about chance; they do not tell us about the importance of the result
- The size of the treatment effect is being compared to what we would expect by chance
- If the SEM is small then the treatment effect can be relatively small and still be statistically significant; in practice the effect might be trivial
6. For Example
If n = 25, s = 20 and x̄ − µ = 5, then SEM = 20/√25 = 4 and z = 5/4 = 1.25 (p > .05): not significant
7. For Example
If n = 100, s = 20 and x̄ − µ = 5, then SEM = 20/√100 = 2 and z = 5/2 = 2.5 (p < .05): significant
LESSON: a small effect can be significant if the sample size is large enough
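The two examples can be checked with a few lines of Python (a sketch; the `z_for_mean` helper is ours, not from the lecture):

```python
import math

def z_for_mean(mean_diff, s, n):
    """z = (x-bar - mu) / SEM, where SEM = s / sqrt(n)."""
    return mean_diff / (s / math.sqrt(n))

print(z_for_mean(5, 20, 25))    # n = 25:  z = 1.25, below 1.96, not significant
print(z_for_mean(5, 20, 100))   # n = 100: z = 2.5, beyond 1.96, significant at p < .05
```

The same 5-point mean difference crosses the 1.96 criterion purely because the larger sample shrinks the SEM.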
8. Tests of hypotheses do not tell us about the MAGNITUDE of the effect!
- It is now a requirement of APA format that tests of significance be accompanied by an EFFECT SIZE.
- Cohen's d is one of the simplest measures of effect size:
Cohen's d = mean difference / standard deviation
9. To go back to our previous example
If n = 100, s = 20 and x̄ − µ = 5, then d = 5/20 = 0.25
If n = 25, s = 20 and x̄ − µ = 5, then d = 5/20 = 0.25
Cohen's d is unaffected by sample size, unlike tests of statistical significance!
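The same pair of examples in code, showing that d never sees the sample size (function name is ours):

```python
def cohens_d(mean_diff, sd):
    """Cohen's d = mean difference / standard deviation."""
    return mean_diff / sd

# Identical d whether n = 25 or n = 100, because n never enters the formula:
print(cohens_d(5, 20))  # 0.25, a small effect by Cohen's rule of thumb
```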
10. What do effect sizes look like?
11. A rule of thumb about effect sizes
- 0 < d < .2: small effect (mean difference less than .2 SD)
- .2 < d < .8: medium effect
- d > .8: large effect
12. Statistical Power
- If the criterion is p < .05 then the Type 1 error rate is set at 5%
- Type 1 error: rejecting Ho when it is true
- The probability of a correct decision is 1 − α, e.g. 1 − .05 = .95
- The determination of the Type 2 error rate is less straightforward
- Type 2 error: failing to reject Ho when it is false
- p(Type 2 error) = β
- It is not straightforward because it depends on the number of subjects and the effect size
- The probability of correctly rejecting a false Ho is 1 − β
- This is known as POWER
13. The Benefits of a Power Analysis
- Imagine the case where Ho states a new cancer drug is no better than the old cancer drug, but Ho is false (i.e. the new drug is better)
- Researchers would be blind to the benefits of the new drug if the test they used was not sensitive to the effect (however small) the drug was having
- Power analysis provides the tools to design studies that, if H1 is true and Ho false, will reject Ho with a relatively large probability (e.g. p = 0.8)
- In other words, if Ho is false we want to conduct an experiment that has a good chance of leading us to this conclusion
- We can quantify this chance
14. Effect Size and Power
- If Ho is true then the size of the effect exerted by the IV is zero
- Sampling variation may nevertheless produce a non-zero observed effect size
- Magnitude of a significance test statistic (t, F, r) = size of effect × size of the study
- If the sample is large enough we could get a difference or a relationship simply by chance alone
- Relying purely on statistical significance to tell us about the real importance of an effect is therefore suspect; the result needs to be interpreted in context
15. (Figure: α.)
16. Statistical Power 1
- If the experiment is under-powered we may miss an experimental effect (commit a Type 2 error), as more sample means would be expected to fall below the critical value at α
- By under-powered we usually mean that the expected effect size (mean difference or extent of covariation) was small relative to the number of participants in the study
17. Statistical Power 2
- If an experiment is over-powered we run the risk of making the opposite mistake: declaring an effect (a Type 1 error) when the difference is trivial and may simply be due to chance
- By over-powered we usually mean that the sample size was so large that trivial differences were treated as significant
- Triviality is determined by the research context
18. A definition of Power
- Power is the probability that the test will correctly reject the null hypothesis
- Power is the probability of obtaining data in the critical region when the null hypothesis is false
- If the probability of incorrectly retaining a false null hypothesis is β = .2, then
- Power = p(reject a false Ho) = 1 − β, or .8 in this case
19. Power and Effect Size
- Power and effect size are related:
- If an effect is large it will be easy to detect (p(Type 2 error) is low), as the sampling distributions of the means will overlap less than if the effect were small
- If an effect is small, p(Type 2 error) is higher, as there is a greater chance the sampling distributions of the means will overlap
- The more the sampling distributions of the means overlap, the greater the probability that the mean for the treatment group will fall below the critical value at α
20. A set of sample means that would be obtained with a 20-point treatment effect.
21. Here we see a 40-point effect. The sampling distribution of the means is the same as in the previous example, but the overlap is much less. The power in this example is roughly 95%.
22. We can calculate the power of a test precisely by locating where α for the null distribution falls with respect to the sampling distribution of the means for the treatment distribution.
23. Calculating power
- Step 1: Where is alpha? (µ + 1.96σM) = 220
- Step 2: Where is 220 in the treatment sampling distribution?
- Step 3: Calculate β = .025, so power (1 − β) is roughly .975 in this case (or 97.5%)
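The three steps can be reproduced with the standard normal CDF. The specific numbers below (null mean 200, a 40-point treatment effect, σM = 20/1.96 ≈ 10.2) are assumptions chosen to match the critical value of 220 and β = .025 on the slide:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def power_z_test(mu_null, mu_treatment, sem, z_alpha=1.96):
    """Steps 1-3 from the slide, for an upper-tail rejection region."""
    critical = mu_null + z_alpha * sem    # Step 1: the critical value at alpha
    z = (critical - mu_treatment) / sem   # Step 2: locate it in the treatment distribution
    beta = normal_cdf(z)                  # Step 3: beta is the area below the critical value
    return 1 - beta

# Assumed values: null mean 200, treatment mean 240, sem = 20 / 1.96
print(round(power_z_test(200, 240, 20 / 1.96), 3))  # 0.975
```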
24. Power and Type 2 Error Rate
- The convention is that the power of a study should be at least .8, which means that the probability of accepting Ho when it is false (a Type 2 error) is at most .2
- This means a false null hypothesis will be rejected 80% of the time
- With a power of .25, Ho will only be rejected on one occasion out of 4!
- Under-powered experiments really are a waste of time!
- Whenever possible we should estimate the power of our studies before we do them
25. Power and Sample Size
- When the sample size is small the sampling distribution of the means is wide (large SEM) and it is harder to detect an experimental effect
- If the effect size is also small, the sampling distributions of the means for the control and treatment groups are more likely to overlap, and hence H1 has a greater probability of being rejected
- When the sample size is large the sampling distribution of the means is narrower, so even with a small effect size the overlap in the sampling distributions of the means for the control and treatment groups will be small
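This can be sketched numerically: holding a small effect fixed at d = .25, growing n shrinks the SEM and drives power up. The helper uses the common approximation power ≈ Φ(d·√n − zα), which neglects the negligible far rejection tail (names and the chosen n values are ours):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def approx_power(d, n, z_alpha=1.96):
    """Approximate power of a two-tailed one-sample z test: CDF(d * sqrt(n) - z_alpha)."""
    return normal_cdf(d * math.sqrt(n) - z_alpha)

# The same small effect (d = .25) at increasing sample sizes:
for n in (25, 100, 200):
    print(n, round(approx_power(0.25, n), 2))  # power climbs from about .24 to about .94
```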
26. Note the effect of sample size on the standard errors. (Figure: sampling distributions of means for the null and treatment populations.)
27. Other Factors Affecting Power
- Making alpha smaller (e.g., p < .01) reduces power, as the more extreme criterion moves the critical value further into the sampling distribution of the treatment means (more of that distribution falls short of it)
- Changing from a two-tailed test to a one-tailed test increases power, as the critical value at α moves away from the sampling distribution of the treatment mean (less overlap)
- See http://www.socialresearchmethods.net/kb/power.htm
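Both factors can be seen with the same approximation, power ≈ Φ(d·√n − zα); the values d = .5 and n = 25 are assumed for illustration:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def approx_power(d, n, z_alpha):
    """Approximate power of a one-sample z test: CDF(d * sqrt(n) - z_alpha)."""
    return normal_cdf(d * math.sqrt(n) - z_alpha)

d, n = 0.5, 25
print(round(approx_power(d, n, 1.645), 2))  # one-tailed,  alpha = .05 (most power)
print(round(approx_power(d, n, 1.960), 2))  # two-tailed, alpha = .05
print(round(approx_power(d, n, 2.576), 2))  # two-tailed, alpha = .01 (least power)
```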
28. Calculating power using tables
Look up the power tables; if power > .80 then the study is acceptable.
(Table: power as a function of Cohen's d and sample size.)
29. If the required power = .8, what does d need to be?
30. Calculating power using tables
From the above we can easily see how to determine the required sample size:
δ is determined from the power table; the value of Cohen's d is estimated from experience (or the literature)
31. Calculating sample size
- Imagine the expected effect size is .5 and the required power is .8.
- From the table, δ = 2.8
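If, as in Howell-style power tables, δ = d·√n for a one-sample test (an assumption about which table the lecture uses), the required n follows directly:

```python
import math

def required_n(d, delta=2.80):
    """n = (delta / d) ** 2 for a one-sample test, rounded up.
    delta = 2.80 corresponds to power .80 at alpha = .05, two-tailed."""
    return math.ceil((delta / d) ** 2)

print(required_n(0.5))   # (2.8 / 0.5)**2 = 31.36, so 32 participants
print(required_n(0.25))  # a smaller effect needs far more participants
```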