Update on statistics - PowerPoint PPT Presentation



Title: Update on statistics

Update on statistics
Phil Rowe Liverpool School of Pharmacy
Update on statistics
  • Plan ahead
  • Interpretation of significance/non-significance
  • Sample size calculations in the real world
  • Keep it simple - keep it clear
  • Keep it simple - keep it powerful
  • Avoid multiple analyses
  • Other techniques are available

Plan ahead
Or else
  • Inadequate power
  • Poor experimental design
  • Results incapable of analysis
  • Optimal analysis no longer legitimate

Plan ahead
Inadequate power: An amazing proportion of published
experiments were virtually guaranteed to produce
a non-significant result even if a sizeable
experimental effect had been present.
Poor experimental design: Which statistical methods are
applicable will depend upon details of the
experimental design. If you perform an unpaired
experiment, you may have to use (say) a
two-sample t-test, whereas a paired design could
have allowed the more powerful paired t-test.
Plan ahead
Results incapable of analysis: Fundamental flaws
in experimental design should be identifiable
even without considering statistical analysis.
However, if you produce a satisfactory
statistical analysis plan in advance, you can
exclude a lot of errors (e.g. lack
of proper controls).
Optimal analysis no longer legitimate: Some
statistical tweaks are only legitimate if planned
in advance, e.g. one-tailed tests, equivalence limits.
Interpretation of significance/non-significance
A statistically significant result provides
evidence against the null hypothesis and
therefore shifts the balance of evidence in
favour of the alternative hypothesis (There is an
experimental effect). But remember several things

Significant results are not absolute
The evidence against the null hypothesis is not
absolute. If (say) P = 0.01, we have merely
demonstrated that the results we obtained would
have been unlikely to arise if the null
hypothesis were true. We have not shown that the
null hypothesis is impossible.
Results might have been unlikely to arise if the
null hypothesis is true, but the alternative
hypothesis may be even less likely!
A clinical trial of a homeopathic medicine
suggests there is a pharmacological effect (P = 0.02).
Explanation 1: Null hypothesis is true.
Homeopathic medicine has no real effect. The
apparent effect we saw was due to a statistical
fluke that would arise on 1 occasion in 50.
Explanation 2: Alternative hypothesis is true.
Homeopathic medicine does work.
The first explanation is difficult to believe, but
the other is even harder. The rational conclusion is
still that homeopathy doesn't work. (A series of
successful trials would eventually force
acceptance of effectiveness.)
Statistical significance is not the same as
practical significance
  • Where P < 0.05, all you have produced is evidence
    that an effect does exist. You should always
    consider the size of the effect.
  • With a measured end point: How much does the
    mean value change as a result of the change in ...
  • With a classified end point: How great is the
    change in the proportion of individuals falling
    into each category?

Non-significant doesn't justify a negative
A non-significant result may arise either because
  • There is no effect present
or
  • There is an effect but your experiment lacks the
    power to detect it.

Non-significant doesn't justify a negative
If you want to achieve an effective exclusion of
any difference, you must establish Equivalence
limits and compare your results to these. See
standard statistics lecture 9.
Taken from Stats lecture 9
Determining whether two digoxin preparations are
equivalent. Minitab reports the 95% C.I. for the
difference in AUCs as -0.303 to 0.45.
[Figure: the 95% C.I. plotted on an axis of change in AUC (µg.h.L-1) running from -0.8 to 0.8]
Superimpose a region of equivalence. Judgement
is that a difference of 0.6 µg.h.L-1 (or less)
is of no practical significance. The whole C.I.
lies inside the region -0.6 to +0.6. Conclusion:
Two preparations are equivalent.
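The comparison on this slide can be sketched in a few lines of Python. This is only an illustration of the decision rule (equivalence is concluded when the whole confidence interval lies inside the region of equivalence); the function name `within_equivalence` is an assumption, not something from the lecture or from Minitab.

```python
def within_equivalence(ci_low, ci_high, limit):
    """Return True when the whole CI lies inside [-limit, +limit]."""
    return -limit <= ci_low and ci_high <= limit


# Values from the digoxin slide: 95% C.I. for the difference in AUCs
# and the judged limit of practical significance (µg.h.L-1).
ci_low, ci_high = -0.303, 0.45
limit = 0.6

if within_equivalence(ci_low, ci_high, limit):
    print("C.I. lies inside the region of equivalence: equivalent")
else:
    print("C.I. extends outside the region: equivalence not shown")
```

A C.I. that merely includes zero (a non-significant difference) would not be enough; the interval also has to be narrow enough to fit inside the limits.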
Sample size calculations in the real world
Say we want to look at the effect of training on
successful completion of a task by hospital
pharmacists. Randomise pharmacists into 2
groups. Train one group and leave the others
alone (Controls). Test ability to complete the
task. Classify each individual as Successful
or Unsuccessful. Assume that 60% of controls
will be successful, that we want to be able to
detect an increase to 80% among the trained group
and that we want 80% power.
Sample size for contingency χ² test
The sample size calculation needs
  • Size of difference between outcomes to be detected
  • Power required
Calculating necessary sample size
Follow the menus Stat > Power and Sample Size > 2
Proportions ...
[Minitab dialog: 60% success, 80% success, power of 80%]
Minitab output
Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.8
Alpha = 0.05

              Sample  Target
Proportion 1    Size   Power  Actual Power
         0.6      82     0.8      0.803780

The sample size is for each group.
Require 82 controls plus 82 trained.
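For readers without Minitab, the figure of 82 per group can be reproduced approximately with a standard normal-approximation formula for comparing two proportions. This is a sketch under that assumption; Minitab's exact internal method is not stated in the slides, and the function name `n_per_group` is made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two proportions.

    Uses the pooled-variance normal approximation (an assumption here,
    not necessarily Minitab's algorithm).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)            # quantile matching the target power
    p_bar = (p1 + p2) / 2                # pooled proportion under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# 60% success in controls, 80% in the trained group, 80% power
n = n_per_group(0.6, 0.8)
print(n, "per group,", 2 * n, "in total")
```

With the slide's inputs this gives 82 per group (164 in total), in line with the Minitab output.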
Unrealistic - approach it another way
Within an undergraduate project, there is no way
that we will be able to experiment on 164
pharmacists! Start out by deciding the maximum
number we might conceivably deal with. Say this
is 25 controls and 25 trained. Now use Minitab
to calculate the size of change that would be
detectable.
[Minitab dialog: max group size we can deal with (25), 60% success rate for controls, power of 80%]
Minitab output
Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.6
Alpha = 0.05

Sample
  Size  Power  Proportion 1  Proportion 1
    25    0.8      0.928484      0.219428

The sample size is for each group.
A sample size of 25 would allow us to distinguish
between a success rate of 60% and one of 93% or ...
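The reverse calculation (fix the group size, solve for the detectable success rate) can be sketched by searching for the proportion at which an approximate power formula reaches 80%. As before, this assumes a normal approximation rather than Minitab's documented algorithm, and the names `approx_power` and `detectable_increase` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_trained, n, alpha=0.05):
    """Approximate power of a two-sided two-proportion test, n per group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_trained) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p_control * (1 - p_control) + p_trained * (1 - p_trained))
    shift = abs(p_trained - p_control) * sqrt(n)
    return z.cdf((shift - z_alpha * sd_null) / sd_alt)


def detectable_increase(p_control, n, power=0.80, alpha=0.05):
    """Smallest success rate above p_control detectable with the given power.

    Found by bisection; power rises steadily as the trained-group
    proportion moves away from the control proportion.
    """
    lo, hi = p_control + 1e-9, 1 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        if approx_power(p_control, mid, n, alpha) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(detectable_increase(0.6, 25), 3))   # close to Minitab's 0.928484
```

The same `approx_power` function applied to the earlier design (60% vs 80%, 82 per group) gives a value close to the "Actual Power" of 0.803780 that Minitab reported.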
Is the experiment worth doing?
May decide either that
  • There is no realistic probability that the
    training method will raise success rates to 93%.
    So, even if the training was pretty successful
    (maybe raising success rates to 85%) the experiment
    would still be virtually guaranteed to produce a
    non-significant result. Abandon the whole ...
or
  • Training might be that successful, so it is
    worth carrying on. If we do, we must remember
    that the experiment has less than optimum power
    and a non-significant result must not be
    interpreted as definite evidence that the
    training failed. Non-significance could simply
    reflect the lack of power of our experiment.

Keep it simple - keep it clear
t-test versus ANOVA: Compare 2 treatments (A and B).
If the t-test produces a significant result the
interpretation is unambiguous. Treatment A leads
to higher/lower values than B. Compare 5
treatments (A to E). If an ANOVA produces
significance, where are the differences? Can use
follow-up tests such as the Tukey test, but even
that is never as clear as the t-test.
2x2 contingency table versus large table: Even worse!
There are no follow-up tests.
Keep it simple - keep it powerful
If treatments A and B genuinely differ from one
another, a simple t-test comparison of the 2 may
show a significant difference. However, if these
2 are accompanied by a string of additional
treatments that produce results intermediate
between A and B, and an ANOVA is used, the
significance may be masked.
Keep it simple - keep it powerful
e.g. Real purpose of experiment is to see whether
an Iridium catalyst will increase the yield of a
chemical process where a Platinum catalyst is
currently used. However, various other metals
(Palladium, Pt/Ir alloy and Pd/Ir alloy) are
available, so we try these as well.
Keep it simple - keep it powerful
Yields of product (g)

  Pt    Ir    Pd   Pt/Ir  Pd/Ir
 3.45  5.34  2.23  3.71   2.61
 1.81  3.61  2.92  3.97   3.14
 2.95  3.25  2.12  1.41   2.42
 0.89  2.66  2.25  3.22   3.67
 2.22  2.26  3.56  2.43   3.11
 3.57  3.97  1.24  1.83   3.27
 2.79  3.39  2.01  3.73   1.93
 2.06  1.41  4.93  4.45   3.20
 2.38  4.13  3.10  2.39   2.08
 1.94  4.99  3.06  1.70   3.91
If we'd kept it simple
Two-Sample T-Test and CI: Platinum, Iridium
Two-sample T for Platinum vs Iridium

           N   Mean  StDev  SE Mean
Platinum  10  2.406  0.811     0.26
Iridium   10   3.50   1.20     0.38

Difference = mu (Platinum) - mu (Iridium)
Estimate for difference: -1.09500
95% CI for difference: (-2.07008, -0.11992)
T-Test of difference = 0 (vs not =): T-Value = -2.39
P-Value = 0.030  DF = 15
Statistical significance is achieved
But we would get smart
One-way ANOVA: Platinum, Iridium, Palladium,
Pt/Ir, Pd/Ir

Source  DF      SS     MS     F      P
Factor   4   6.314  1.578  1.68  0.172
Error   45  42.372  0.942
Total   49  48.686

S = 0.9704   R-Sq = 12.97%   R-Sq(adj) = 5.23%
Statistical significance is no longer achieved
But we would get smart
[Minitab plot: individual 95% CIs for the mean of each level (Platinum, Iridium, Palladium, Pt/Ir, Pd/Ir) on an axis from 1.80 to 3.60]
Platinum and Iridium contrast strongly (Sig), but
within the group of 5 metals contrasts are
generally weaker (Non-sig.)
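The test statistics in the two Minitab outputs can be recomputed from the yield table with the Python standard library alone. This sketch assumes the unpooled (Welch) form of the two-sample t-test, which is consistent with the reported DF of 15; it stops at the T and F values because the standard library has no t or F distribution for the P values. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Yields of product (g), one list per catalyst, taken from the slide.
yields = {
    "Pt":    [3.45, 1.81, 2.95, 0.89, 2.22, 3.57, 2.79, 2.06, 2.38, 1.94],
    "Ir":    [5.34, 3.61, 3.25, 2.66, 2.26, 3.97, 3.39, 1.41, 4.13, 4.99],
    "Pd":    [2.23, 2.92, 2.12, 2.25, 3.56, 1.24, 2.01, 4.93, 3.10, 3.06],
    "Pt/Ir": [3.71, 3.97, 1.41, 3.22, 2.43, 1.83, 3.73, 4.45, 2.39, 1.70],
    "Pd/Ir": [2.61, 3.14, 2.42, 3.67, 3.11, 3.27, 1.93, 3.20, 2.08, 3.91],
}


def welch_t(a, b):
    """Two-sample t statistic and degrees of freedom without pooling variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def one_way_f(groups):
    """F statistic for a one-way ANOVA across the given groups."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


t, df = welch_t(yields["Pt"], yields["Ir"])
f = one_way_f(list(yields.values()))
print(f"Pt vs Ir only: T = {t:.2f}, DF = {int(df)}")
print(f"All five metals: F = {f:.2f}")
```

The same Pt and Ir data feed both calculations; only the company they keep changes. The three intermediate metals contribute little between-group variation but spread the factor sum of squares over 4 degrees of freedom, which is how the ANOVA loses the significance that the direct comparison had.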
Avoid multiple analyses
If you test a treatment that has absolutely no
effect, there is always a 5% risk that random
sampling error will lead to an apparent effect
great enough to pass as statistically
significant. That level of risk is considered
acceptable. However, if you make 10
comparisons, there is a 40% risk that at least
one will generate a false positive. Ultimately
this becomes a problem.
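The 40% figure follows from a short calculation, sketched below. It assumes the 10 comparisons are independent and each is tested at the 5% level; the function name is made up for illustration.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    risk = familywise_error(0.05, n_tests)
    print(f"{n_tests:2d} tests: {risk:.0%} risk of at least one false positive")
```

Each test has a 95% chance of correctly staying non-significant, so all 10 stay clean with probability 0.95 to the power 10, which is about 0.60, leaving roughly a 40% risk of at least one false positive.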
Avoid multiple analyses
  • Need to apply some common sense. Many projects
    will realistically need more than one statistical
    analysis, but
  • Avoid unnecessary proliferation of tests
  • Consider declaring (in advance) one or two tests
    as being Primary and others as Secondary. If
    the latter are significant, the results would
    need to be confirmed by further work.
  • Be especially wary of an odd isolated
    Significant result amid a sea of
    non-significance, after a long series of tests.

Other techniques are available
  • In my lectures (L2) I only covered a limited
    range of possible experimental structures. Your
    project may well not fit any of these.
  • Does a measured variable affect a categorical
    one? (e.g. Does age affect the likelihood that
    patients will comply with instructions?)
  • Is a measured endpoint affected by two factors,
    one of which is a classification and the other a
    measured value? (e.g. Is blood pressure affected
    by age and gender?)
  • Don't panic! There's a statistical procedure
    for most experimental structures that you are
    likely to use.

  • Write statistical analysis plan before
    generating data
  • Think about how you will interpret significance
    or non-significance
  • Can be flexible about sample size calculations
    but you must consider power
  • Keep it simple
  • Beware of multiple analyses
  • Ask about additional statistical procedures