Hypothesis Testing, Statistical Power

1
Hypothesis Testing, Statistical Power
  • Psychology 203
  • Lecture 15
  • Week 8

2
By now you know that
  • Psychological science progresses by taking
    calculated risks
  • Accept H1 and reject Ho
  • Reject H1 and accept Ho
  • But failing to reject Ho does not mean that it is
    true
  • Absence of evidence is not evidence of absence
  • Thus we have the concept of Type 1 and Type 2
    errors

3
Summary of Hypothesis Testing
  • Hypothesised population parameter: the
    population mean is the standard against which a
    sample will be compared and the null hypothesis
    tested
  • Sample statistic: the sample estimate of the
    population mean is used to establish the
    probability of the sample mean occurring
  • To do this we need an estimate of the standard
    error of the sampling distribution of the mean
    (the SEM)
  • We use this to help calculate the test statistic
    (see the sketch below)
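
A minimal sketch of this procedure in Python. The numbers (population
mean 100, SD 15, n = 25, sample mean 106) are assumed for illustration
and do not come from the slides:

    # One-sample z-test: compare a sample mean to a hypothesised
    # population mean using the SEM.
    from scipy.stats import norm

    mu, sigma, n, sample_mean = 100, 15, 25, 106   # assumed values

    sem = sigma / n ** 0.5             # standard error of the mean
    z = (sample_mean - mu) / sem       # test statistic
    p = 2 * norm.sf(abs(z))            # two-tailed p-value
    print(f"SEM = {sem:.2f}, z = {z:.2f}, p = {p:.4f}")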

4
Criticisms of hypothesis testing
  • Hypothesis testing produces an all or none
    decision
  • z = 1.90 is not significant whereas z = 2.00 is
  • But you have to draw a line in the sand
    somewhere
  • An experimental treatment ALWAYS has an effect,
    however small.
  • If so, then Ho is always false
  • But as we are using an inferential process it is
    impossible to show that Ho is true; we can,
    however, show that Ho is very unlikely (in other
    words, falsify it)

5
Further Criticism
  • Tests of significance tell us about chance; they
    do not tell us about the practical significance
    of the result
  • The size of the treatment effect is being
    compared to what we would expect by chance
  • If the SEM is small then a relatively small
    treatment effect can still be significant; in
    practice the effect might be trivial

6
For Example
If n = 25, s = 20 and x̄ - µ = 5, then
z = 5 / (20/√25) = 1.25 (p > .05) not significant
7
For Example
If n = 100, s = 20 and x̄ - µ = 5, then
z = 5 / (20/√100) = 2.5 (p < .05) significant
LESSON: A small effect can be significant if the
sample size is large enough (see below)
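
The same comparison in Python, reproducing the arithmetic above:

    from scipy.stats import norm

    # Identical 5-point mean difference with s = 20 at two sample sizes
    for n in (25, 100):
        z = 5 / (20 / n ** 0.5)
        p = 2 * norm.sf(abs(z))
        print(f"n = {n:3d}: z = {z:.2f}, p = {p:.3f}")
    # n =  25: z = 1.25, p = 0.211 -> not significant
    # n = 100: z = 2.50, p = 0.012 -> significant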
8
Tests of hypotheses do not tell us about the
MAGNITUDE of the effect!
  • It is now a requirement of the APA format that
    tests of significance also have to be accompanied
    by an EFFECT SIZE.
  • Cohen's d is one of the simplest measures of
    effect size.

Cohen's d = Mean Difference / Std Deviation
9
To go back to our previous example:
If n = 100, s = 20 and x̄ - µ = 5, then d = 5/20 = .25
If n = 25, s = 20 and x̄ - µ = 5, then d = 5/20 = .25
Cohen's d is unaffected by sample size, unlike
tests of statistical significance!
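
In code the contrast is immediate (same numbers as above):

    # Cohen's d contains no n, so it is identical at both sample sizes,
    # while the z statistic grows with sqrt(n).
    mean_diff, s = 5, 20
    for n in (25, 100):
        d = mean_diff / s                # effect size
        z = mean_diff / (s / n ** 0.5)   # test statistic
        print(f"n = {n:3d}: d = {d:.2f}, z = {z:.2f}")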
10
What do effect sizes look like?
11
A rule of thumb about effect sizes
  • 0 < d < .2: small effect (mean difference less
    than .2 SD)
  • .2 < d < .8: medium effect size
  • d > .8: large effect

12
Statistical Power
  • If the criterion is p < .05 then
  • the Type 1 error rate is set at 5%
  • Type 1 error: rejecting Ho when it is true
  • The probability of a correct decision is 1 - α,
    e.g. 1 - .05 = .95
  • The determination of the Type 2 error rate is
    less straightforward
  • Type 2 error: failing to reject Ho when it is
    false
  • p(Type 2 error) = β
  • It is not straightforward as it depends on the
    number of subjects and the effect size
  • The probability of correctly rejecting Ho is
    1 - β
  • This is known as POWER (see the simulation
    below)
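
A quick simulation (illustrative numbers, not from the slides) showing
that when Ho is true, a test with α = .05 rejects about 5% of the time:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    mu, sigma, n, alpha = 100, 15, 25, 0.05   # assumed values
    crit = norm.ppf(1 - alpha / 2)            # two-tailed critical z

    # Draw 10,000 samples from a population where Ho is true
    means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    z = (means - mu) / (sigma / n ** 0.5)
    print("Type 1 error rate:", (abs(z) > crit).mean())   # close to 0.05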

13
The Benefits of a Power Analysis
  • Imagine the case where Ho states a new cancer
    drug is no better than the old cancer drug, but
    Ho is false (i.e. the new drug is better)
  • Researchers would be blind to the benefits of the
    new drug if the test they used was not sensitive
    to the effect (however small) the drug was
    having.
  • Power analysis provides the tools to set up
    studies that, assuming H1 is true and Ho false,
    will reject Ho with a relatively large
    probability (e.g. p = 0.8) if it is indeed false
  • In other words, if Ho is false we want to be able
    to conduct an experiment that has a chance of
    leading us to this conclusion
  • We can quantify this chance

14
Effect Size and Power
  • If Ho is true then the size of the effect exerted
    by the IV is zero.
  • Sampling variation may lead to a non-zero effect
    size
  • Magnitude of a significance test statistic (t, F,
    r) = size of effect × size of the study
  • If the sample is large enough we could get a
    difference or a relationship simply by chance
    alone
  • Relying purely on statistical significance to
    tell us about the real significance of an effect
    is therefore suspect. The result needs to be
    interpreted in context

15
(Figure: the α criterion marked on the null
sampling distribution)
16
Statistical Power 1
  • If the experiment is underpowered we may miss an
    experimental effect (commit a Type 2 error) as
    more sample means would be expected to fall below
    the α criterion
  • By underpowered we usually mean that the
    expected effect size (mean difference or extent
    of covariation) was small relative to the number
    of participants in the study.

17
Statistical Power 2
  • If an experiment is overpowered we run the risk
    of the opposite mistake: rejecting Ho (a Type 1
    error) when the difference is trivial and, for
    practical purposes, due to chance
  • By overpowered we usually mean that the sample
    size was so large that trivial differences were
    treated as being significant.
  • Triviality is determined by the research context

18
A definition of Power
  • Power is the probability that the test will
    correctly reject the null hypothesis.
  • Power is the probability of obtaining data in the
    critical region when the null hypothesis is
    false.
  • If the probability of incorrectly retaining the
    null hypothesis (β) is .2 then
  • Power = p(reject a false Ho) = 1 - β, or .8 in
    this case

19
Power and Effect Size
  • Power and effect size are related:
  • If an effect is large then it will be easy to
    detect (p(Type 2 error) is low) as the sampling
    distributions of the means will overlap to a
    lesser extent than if the effect were small
  • If an effect size is small then p(Type 2 error)
    is higher as there is a greater chance the
    distributions of the sample means will overlap
  • The more the distributions of the sample means
    overlap, the greater the probability that the
    mean for the treatment group will fall below the
    critical value at α (see the sketch below)
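
A sketch of the overlap argument in Python; σM = 10 and the two effect
sizes are assumed for illustration:

    from scipy.stats import norm

    sigma_m, crit = 10, norm.ppf(0.975)   # critical z = 1.96 under Ho

    for effect in (5, 40):                # small vs large treatment effect
        # Area of the treatment distribution below the criterion = beta
        beta = norm.cdf(crit - effect / sigma_m)
        print(f"effect = {effect:2d}: p(Type 2) = {beta:.2f}, "
              f"power = {1 - beta:.2f}")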

20
A set of sample means that would be obtained with
a 20-point treatment effect.
21
Here we see a 40-point effect. The sampling
distribution of the means is the same as in the
previous example but the overlap is much less.
The power in this example is roughly 95%.
22
We can calculate the power of a test precisely by
locating where α for the null distribution falls
with respect to the sampling distribution of the
means for the treatment population
23
Calculating power
  • Step 1: Where is α? µ + 1.96σM = 220
  • Step 2: Where is 220 in the treatment sampling
    distribution?
  • Step 3: Calculate β (see the sketch below)

β = .025, so power is roughly .975 in this case
(1 - β), or 97.5%
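
The three steps as code. A null mean of 200 and σM = 20/1.96 ≈ 10.2 are
assumed so that µ + 1.96σM = 220 as on the slide; the 40-point effect is
from the earlier example:

    from scipy.stats import norm

    mu_null, sigma_m, effect = 200, 20 / 1.96, 40   # assumed values

    cutoff = mu_null + 1.96 * sigma_m   # Step 1: where is alpha? -> 220
    # Step 2: locate the cutoff within the treatment sampling distribution
    z = (cutoff - (mu_null + effect)) / sigma_m
    beta = norm.cdf(z)                  # Step 3: area below the cutoff
    print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")   # .025 and .975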
24
Power and Type 2 Error Rate
  • The convention is that the power of a study
    should be at least .8, which means that the
    probability of accepting Ho when it is false (a
    Type 2 error) is .2
  • This means the null hypothesis, when it is false,
    will be rejected 80% of the time
  • With a power of .25, Ho will only be rejected on
    one occasion out of 4!
  • Under-powered experiments really are a waste of
    time!
  • Whenever possible we should estimate the power
    of our studies before we do them.

25
Power and Sample Size
  • When the sample size is small the sampling
    distribution of the means is wide and it is
    harder to detect an experimental effect
  • If, in addition, the effect size is small, the
    sampling distributions of the means for the
    control and treatment groups are more likely to
    overlap, and hence a false Ho has a greater
    probability of being retained.
  • When the sample size is large the sampling
    distribution of the means is narrower, and hence
    even with a small effect size the overlap in the
    sampling distributions of the means for the
    control and treatment groups will be small (see
    the sketch below).
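
Because the SEM shrinks with √n, the same effect gains power as n grows.
A sketch using d = .25 (the 5/20 effect from earlier; a two-tailed
α = .05 criterion is assumed):

    from scipy.stats import norm

    d = 0.25                             # effect size in SD units
    for n in (25, 100, 250):
        z_effect = d * n ** 0.5          # effect relative to the SEM
        power = norm.sf(norm.ppf(0.975) - z_effect)
        print(f"n = {n:3d}: power = {power:.2f}")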

26
Note the effect of sample size on the standard
errors: sampling distributions of means for null
and treatment populations
27
Other Factors Affecting Power
  • Making alpha smaller (e.g., p < .01) reduces
    power, as the more extreme criterion moves α
    further into the sampling distribution of the
    treatment means (more overlap in sampling
    distributions)
  • Changing from a two-tailed test to a one-tailed
    test will increase power, as the critical value
    at the alpha point in the distribution moves
    away from the sampling distribution of the
    treatment mean (less overlap); both effects are
    sketched below
  • See http://www.socialresearchmethods.net/kb/power.htm
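
Both claims in code, reusing the assumed setup from the power
calculation (σM ≈ 10.2, here with a 20-point effect):

    from scipy.stats import norm

    sigma_m, effect = 20 / 1.96, 20      # assumed values

    def power(alpha, tails):
        crit = norm.ppf(1 - alpha / tails)   # critical z under Ho
        # Probability a treatment-group mean exceeds the criterion
        return norm.sf(crit - effect / sigma_m)

    print(f"two-tailed, alpha=.05: {power(0.05, 2):.2f}")   # 0.50
    print(f"two-tailed, alpha=.01: {power(0.01, 2):.2f}")   # lower
    print(f"one-tailed, alpha=.05: {power(0.05, 1):.2f}")   # higher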

28
Calculating power using tables
  • The one-sample t-test: δ = d√n

Look up δ in the power tables; if power > .80 then
the study is acceptable.
Cohen's d = (x̄ - µ) / s
29
If the required power = .8, what does δ need to be?
30
Calculating power using tables
  • If δ = d√n

From the above we can easily see how to determine
the required sample size: n = (δ/d)²
δ is determined from the table; the value of
Cohen's d is estimated from experience (or the
literature)
31
Calculating sample size
  • Imagine the expected effect size is .5 and the
    required power is .8.
  • From the table, δ = 2.8, so n = (2.8/.5)² ≈ 31.4,
    i.e. about 32 participants (see the sketch below)
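
The worked example as code:

    import math

    delta, d = 2.8, 0.5      # delta from the power table, expected d
    n = (delta / d) ** 2
    print(f"n = {n:.1f} -> recruit {math.ceil(n)} participants")   # 32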