Title: Update on statistics
1Update on statistics
Phil Rowe Liverpool School of Pharmacy
2Update on statistics
- Plan ahead
- Interpretation of significance/non-significance
- Sample size calculations in the real world
- Keep it simple - keep it clear
- Keep it simple - keep it powerful
- Avoid multiple analyses
- Other techniques are available
3Plan ahead
Or else
- Inadequate power
- Poor experimental design
- Results incapable of analysis
- Optimal analysis no longer legitimate
4Plan ahead
Inadequate power: An amazing proportion of published
experiments were virtually guaranteed to produce
a non-significant result even if a sizeable
experimental effect had been present.
Poor experimental design: Which statistical methods are
applicable will depend upon details of the
experimental design. If you perform an unpaired
experiment, you may have to use (say) a
two-sample t-test, whereas a paired design could
have allowed the more powerful paired t-test.
5Plan ahead
Results incapable of analysis: Fundamental flaws
in experimental design should be identifiable
even without considering statistical analysis.
However, if you produce a satisfactory
statistical analysis plan in advance you can
exclude a lot of errors (e.g. lack of proper
controls).
Optimal analysis no longer legitimate: Some
statistical tweaks are only legitimate if planned
in advance, e.g. one-tailed tests, equivalence
limits.
6Interpretation of significance/non-significance
A statistically significant result provides
evidence against the null hypothesis and
therefore shifts the balance of evidence in
favour of the alternative hypothesis (There is an
experimental effect). But remember several things
7Significant results are not absolute
The evidence against the null hypothesis is not
absolute. If (say) P = 0.01, we have merely
demonstrated that the results we obtained would
have been unlikely to arise if the null
hypothesis were true. We have not shown that the
null hypothesis is impossible.
8Results might have been unlikely to arise if the
null hypothesis is true, but the alternative
hypothesis may be even less likely!
A clinical trial of a homeopathic medicine
suggests there is a pharmacological effect
(P = 0.02).
Explanation 1: Null hypothesis is true.
Homeopathic medicine has no real effect. The
apparent effect we saw was due to a statistical
fluke that would arise on 1 occasion in 50.
Explanation 2: Alternative hypothesis is true.
Homeopathic medicine does work.
The first explanation is difficult to believe, but
the other is even harder. The rational conclusion is
still that homeopathy doesn't work. (A series of
successful trials would eventually force
acceptance of effectiveness.)
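The weighing-up of the two explanations can be sketched with Bayes' rule. This is only a rough illustration: the prior probabilities and the 80% power below are hypothetical values chosen for the example, not figures from the trial, and the calculation treats the result simply as "significant at the 5% level".

```python
def prob_effect_is_real(prior, power, alpha=0.05):
    """Probability that an effect is real, given a significant result.

    A significant result is either a true positive (chance prior * power)
    or a false positive (chance (1 - prior) * alpha).
    """
    true_positive = prior * power
    false_positive = (1.0 - prior) * alpha
    return true_positive / (true_positive + false_positive)


# Hypothetical priors (assumptions for illustration only)
sceptical = prob_effect_is_real(prior=0.001, power=0.8)   # very implausible treatment
open_minded = prob_effect_is_real(prior=0.5, power=0.8)   # plausible treatment

print(f"Implausible treatment: {sceptical:.3f}")
print(f"Plausible treatment:   {open_minded:.3f}")
```

With a very low prior, a single significant trial still leaves the effect unlikely to be real, whereas the same result for a plausible treatment is fairly convincing.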
9Statistical significance is not the same as
practical significance
- Where P < 0.05 all you have produced is evidence
that an effect does exist. You should always
consider the size of the effect.
- With a measured end point: How much does the
mean value change as a result of the change in
treatment?
- With a classified end point: How great is the
change in the proportion of individuals falling
into each category?
10Non-significant doesn't justify a negative
conclusion
- A non-significant result may arise either because
- There is no effect present
- or
- There is an effect but your experiment lacks the
power to detect it.
11Non-significant doesn't justify a negative
conclusion
If you want to achieve an effective exclusion of
any difference, you must establish Equivalence
limits and compare your results to these. See
standard statistics lecture 9.
12Taken from Stats lecture 9
Determining whether two digoxin preparations are
equivalent
Minitab reports the 95% C.I. for the difference in
AUCs as -0.303 to 0.45
[Figure: the 95% C.I. plotted on an axis of change in
AUC (µg.h.L⁻¹) running from -0.8 to 0.8]
Superimpose a region of equivalence. Judgement
is that a difference of 0.6 µg.h.L⁻¹ (or less)
is of no practical significance. The whole C.I.
lies between -0.6 and +0.6. Conclusion:
The two preparations are equivalent.
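The equivalence judgement amounts to checking that the whole confidence interval sits inside the region of equivalence. A minimal sketch, using the digoxin figures from this slide (the function name is just for illustration):

```python
def is_equivalent(ci_low, ci_high, limit):
    """True if the whole confidence interval lies within -limit to +limit."""
    return -limit <= ci_low and ci_high <= limit


# 95% C.I. for the difference in AUCs, and the +/-0.6 equivalence limits
digoxin_equivalent = is_equivalent(ci_low=-0.303, ci_high=0.45, limit=0.6)
print("Equivalent" if digoxin_equivalent else "Equivalence not demonstrated")
```

If either end of the interval fell outside the limits, equivalence would not have been demonstrated, even if the result were non-significant.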
13Sample size calculations in the real world
Say we want to look at the effect of training on
successful completion of a task by hospital
pharmacists. Randomise pharmacists into 2
groups. Train one group and leave the others
alone (controls). Test ability to complete the
task. Classify each individual as "Successful"
or "Unsuccessful". Assume that 60% of controls
will be successful, that we want to be able to
detect an increase to 80% among the trained group
and that we want 80% power.
14Sample size for contingency χ² test
[Diagram: the size of difference between outcomes to
be detected and the power required feed into the
sample size calculation, which yields n]
15Calculating necessary sample size
Follow the menus: Stat > Power and Sample Size >
2 Proportions ...
16[Screenshot: Minitab dialog box with the entries annotated]
- 80% success
- Power of 80%
- 60% success
17Minitab output
Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.8
Alpha = 0.05

              Sample  Target
Proportion 1    Size   Power  Actual Power
         0.6      82     0.8      0.803780

The sample size is for each group.
Require 82 controls plus 82 trained.
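For anyone without Minitab, the calculation can be approximated with a standard normal-approximation formula for comparing two proportions. This is a sketch under that assumption; it is not claimed to be the exact method Minitab uses, and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, power=0.8, alpha=0.05):
    """Sample size per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    pooled = sqrt(2 * p_bar * (1 - p_bar))          # SD term under the null
    unpooled = sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # SD term under the alternative
    return ceil(((z_alpha * pooled + z_beta * unpooled) / (p1 - p2)) ** 2)


def achieved_power(p1, p2, n, alpha=0.05):
    """Approximate power with n per group (the far tail is ignored)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    pooled = sqrt(2 * p_bar * (1 - p_bar))
    unpooled = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return NormalDist().cdf((abs(p1 - p2) * sqrt(n) - z_alpha * pooled) / unpooled)


n = n_per_group(0.6, 0.8)            # 60% of controls vs 80% of trained
power = achieved_power(0.6, 0.8, n)
print(f"n per group = {n}, actual power = {power:.4f}")
```

For the 60% versus 80% example this approximation gives 82 per group, in line with the Minitab output above.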
18Unrealistic - approach it another way
Within an undergraduate project, there is no way
that we will be able to experiment on 164
pharmacists! Start out by deciding the maximum
number we might conceivably deal with. Say this
is 25 controls and 25 trained. Now use Minitab
to calculate the size of change that would be
detectable.
19Max group size we can deal with
60% success rate for controls
20Minitab output
Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.6
Alpha = 0.05

Sample
  Size  Power  Proportion 1  Proportion 1
    25    0.8      0.928484      0.219428

The sample size is for each group.
A sample size of 25 would allow us to distinguish
between a success rate of 60% and one of 93% or
22%.
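The reverse calculation (fix the group size, find the detectable success rate) can be sketched by searching for the proportion at which power reaches 80%. The normal-approximation power formula and the bisection search below are assumptions made for this illustration, not a description of Minitab's internals.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided test with n individuals per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    pooled = sqrt(2 * p_bar * (1 - p_bar))
    unpooled = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return NormalDist().cdf((abs(p1 - p2) * sqrt(n) - z_alpha * pooled) / unpooled)


def detectable_proportion(p1, n, far_end, target=0.8, alpha=0.05):
    """Bisect between p1 and far_end for the proportion giving the target power."""
    near, far = p1, far_end
    for _ in range(60):
        mid = (near + far) / 2
        if power_two_proportions(p1, mid, n, alpha) < target:
            near = mid  # still too close to p1 to be detectable
        else:
            far = mid
    return (near + far) / 2


upper = detectable_proportion(0.6, 25, far_end=0.999)  # detectable increase
lower = detectable_proportion(0.6, 25, far_end=0.001)  # detectable decrease
print(f"Detectable success rates: {upper:.3f} or {lower:.3f}")
```

With 25 per group and a 60% control success rate, the search lands close to the 93% and 22% reported by Minitab.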
21Is the experiment worth doing?
- May decide either that
- There is no realistic probability that the
training method will raise success rates to 93%.
So, even if the training was pretty successful
(maybe raise success rates to 85%) the experiment
would still be virtually guaranteed to produce a
non-significant result. Abandon the whole
proposal.
- or
- Training might be that successful, so it is
worth carrying on. If we do, we must remember
that the experiment has less than optimum power
and a non-significant result must not be
interpreted as definite evidence that the
training failed. Non-significance could simply
reflect the lack of power of our experiment.
22Keep it simple - keep it clear
t-test versus ANOVA
Compare 2 treatments (A & B). If the t-test
produces a significant result the interpretation
is unambiguous: treatment A leads to higher/lower
values than B.
Compare 5 treatments (A to E). If an ANOVA produces
significance, where are the differences? Can use
follow-up tests such as the Tukey test, but even
that is never as clear as the t-test.
2x2 contingency table versus large table
Even worse! There are no follow-up tests.
23Keep it simple - keep it powerful
If treatments A & B genuinely differ from one
another, a simple t-test comparison of the 2 may
show a significant difference. However, if these
2 are accompanied by a string of additional
treatments that produce results intermediate
between A & B and an ANOVA is used, the
significance may be masked.
24Keep it simple - keep it powerful
e.g. The real purpose of the experiment is to see whether
an Iridium catalyst will increase the yield of a
chemical process where a platinum catalyst is
currently used. However various other metals
(Palladium, Pt/Ir alloy and Pd/Ir alloy) are
available, so we try these as well.
25Keep it simple - keep it powerful
Yields of product (g)
  Pt     Ir     Pd    Pt/Ir  Pd/Ir
 3.45   5.34   2.23   3.71   2.61
 1.81   3.61   2.92   3.97   3.14
 2.95   3.25   2.12   1.41   2.42
 0.89   2.66   2.25   3.22   3.67
 2.22   2.26   3.56   2.43   3.11
 3.57   3.97   1.24   1.83   3.27
 2.79   3.39   2.01   3.73   1.93
 2.06   1.41   4.93   4.45   3.20
 2.38   4.13   3.10   2.39   2.08
 1.94   4.99   3.06   1.70   3.91
26If we'd kept it simple
Two-Sample T-Test and CI: Platinum, Iridium
Two-sample T for Platinum vs Iridium

           N   Mean  StDev  SE Mean
Platinum  10  2.406  0.811     0.26
Iridium   10   3.50   1.20     0.38

Difference = mu (Platinum) - mu (Iridium)
Estimate for difference: -1.09500
95% CI for difference: (-2.07008, -0.11992)
T-Test of difference = 0 (vs not =):
T-Value = -2.39  P-Value = 0.030  DF = 15
Statistical significance is achieved
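The two-sample comparison can be reproduced from the platinum and iridium yields. This sketch assumes the test is the unequal-variance (Welch) form with the degrees of freedom truncated to a whole number, which is consistent with the DF of 15 shown; the P value is obtained by numerically integrating the t density, purely to avoid needing any extra libraries.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

platinum = [3.45, 1.81, 2.95, 0.89, 2.22, 3.57, 2.79, 2.06, 2.38, 1.94]
iridium = [5.34, 3.61, 3.25, 2.66, 2.26, 3.97, 3.39, 1.41, 4.13, 4.99]


def welch_t(a, b):
    """t statistic and (fractional) degrees of freedom, unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_sided_p(t, df, steps=2000):
    """Two-sided P from Student's t density, integrated by Simpson's rule."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # area under the density between 0 and |t|
    return 1 - 2 * central


t, df = welch_t(platinum, iridium)
p = two_sided_p(t, int(df))  # df truncated to a whole number (assumption)
print(f"T = {t:.2f}, DF = {int(df)}, P = {p:.3f}")
```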
27But we would get smart
One-way ANOVA: Platinum, Iridium, Palladium,
Pt/Ir, Pd/Ir

Source  DF      SS     MS     F      P
Factor   4   6.314  1.578  1.68  0.172
Error   45  42.372  0.942
Total   49  48.686

S = 0.9704   R-Sq = 12.97%   R-Sq(adj) = 5.23%
Statistical significance is no longer achieved
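The sums of squares and the F ratio can be checked directly from the yields for the five catalysts. A minimal sketch using only the standard library; it computes F but not the P value, which would need the F distribution.

```python
from statistics import mean

yields = {
    "Pt": [3.45, 1.81, 2.95, 0.89, 2.22, 3.57, 2.79, 2.06, 2.38, 1.94],
    "Ir": [5.34, 3.61, 3.25, 2.66, 2.26, 3.97, 3.39, 1.41, 4.13, 4.99],
    "Pd": [2.23, 2.92, 2.12, 2.25, 3.56, 1.24, 2.01, 4.93, 3.10, 3.06],
    "Pt/Ir": [3.71, 3.97, 1.41, 3.22, 2.43, 1.83, 3.73, 4.45, 2.39, 1.70],
    "Pd/Ir": [2.61, 3.14, 2.42, 3.67, 3.11, 3.27, 1.93, 3.20, 2.08, 3.91],
}


def one_way_anova(groups):
    """Return (SS factor, SS error, F) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups: how far each group mean sits from the grand mean
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: scatter of individual values around their own group mean
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_ratio = (ss_factor / df_factor) / (ss_error / df_error)
    return ss_factor, ss_error, f_ratio


ss_factor, ss_error, f_value = one_way_anova(list(yields.values()))
print(f"SS factor = {ss_factor:.3f}, SS error = {ss_error:.3f}, F = {f_value:.2f}")
```

The Pt versus Ir difference is the same as before, but it is now averaged in with three intermediate treatments, so the between-groups mean square is diluted.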
28But we would get smart
[Minitab plot: individual 95% CIs for the mean at each
level (Platinum, Iridium, Palladium, Pt/Ir, Pd/Ir),
on an axis running from 1.80 to 3.60]
Platinum and Iridium contrast strongly (sig.), but
within the group of 5 metals contrasts are
generally weaker (non-sig.)
29Avoid multiple analyses
If you test a treatment that has absolutely no
effect, there is always a 5% risk that random
sampling error will lead to an apparent effect
great enough to pass as statistically
significant. That level of risk is considered
acceptable. However, if you make 10
comparisons, there is a 40% risk that at least
one will generate a false positive. Ultimately
this becomes a problem.
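The 40% figure follows from the chance that none of the tests gives a false positive. A short sketch, assuming the comparisons are independent and each uses a 5% significance level:

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    print(f"{n_tests:2d} tests: {familywise_error(n_tests):.0%} risk")
```

The risk climbs steadily with the number of tests, which is why an isolated significant result after a long series of analyses deserves suspicion.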
30Avoid multiple analyses
- Need to apply some common sense. Many projects
will realistically need more than one statistical
analysis, but
- Avoid unnecessary proliferation of tests
- Consider declaring (in advance) one or two tests
as being "Primary" and others as "Secondary". If
the latter are significant, the results would
need to be confirmed by further work.
- Be especially wary of an odd isolated
"significant" result amid a sea of
non-significance, after a long series of tests.
31Other techniques are available
- In my lectures (L2) I only covered a limited
range of possible experimental structures. Your
project may well not fit any of these.
- Does a measured variable affect a categorical
one? (e.g. Does age affect the likelihood that
patients will comply with instructions?)
- Is a measured endpoint affected by two factors,
one of which is a classification and the other a
measured value? (e.g. Is blood pressure affected
by age and gender?)
- Don't panic!!! There's a statistical procedure
for most experimental structures that you are
likely to use.
32Summary
- Write statistical analysis plan before
generating data
- Think about how you will interpret significance
or non-significance
- Can be flexible about sample size calculations
but you must consider power
- Keep it simple
- Beware of multiple analyses
- Ask about additional statistical procedures