EPI-820 Evidence-Based Medicine - PowerPoint presentation transcript
(Slides: 33; Provided by: Epidem3; source: https://www.msu.edu)
1
EPI-820 Evidence-Based Medicine
  • LECTURE 7: CLINICAL STATISTICAL INFERENCE
  • Mat Reeves BVSc, PhD

2
Objectives
  • Understand the theoretical underpinnings of, and
    the flaws associated with, the current approach
    to clinical statistical testing (the frequentist
    approach).
  • Understand the difference between testing and
    estimation.
  • Understand the advantages of the CI and of CI
    functions.
  • Understand the logic of a Bayesian approach.

3
Personal Statistical History
  • Post-DVM
  • Clueless; sceptical of the role of statistics
  • Thought research = the search for P < 0.05
  • PhD Era
  • Increasing obsession with statistical methods
  • Lots of tools! SLR, ANOVA, MLR, LR, LL, Cox
  • Thought statistics = real science
  • Post-PhD
  • Healthy scepticism about the way statistics are used
  • Statistical methods have inherent limitations
  • Not a substitute for clear scientific thought or
    understanding the scientific method

4
  • Review of Significance Tests

Substantive hypothesis: Cows on BST will tend to
gain weight.
Null hypothesis (Ho): the mean body wt. of cows
treated with BST is not different from the mean
body wt. of control cows.
    Ho: μx = μy
Alternative hypothesis (Ha): the mean body wt. of
cows treated with BST is different from the
mean body wt. of control cows.
    Ha: μx ≠ μy
5
  • Review of Significance Tests

- Logically, if Ho is refuted, Ha is confirmed
- The investigator seeks to 'nullify' Ho
Expt:
20 cows randomized to BST (X) and control (Y).
Measure wt. gain. Calculate mean wt. change per
group.
6
  • Review of Significance Tests

Assumptions:
i) The sample statistic (X̄ - Ȳ) is one instance of an
infinitely large number of sample statistics
obtained from an infinite number of replications
of the expt. under the same conditions
(frequentist assumption)
ii) Populations are normally distributed, with equal
variance
iii) The Ho is true
7
  • Review of Significance Tests (t-test)

t = (X̄ - Ȳ) / SE(X̄ - Ȳ)  ~  N(0, 1),  df = (n1 - 1) + (n2 - 1)

Where:
SE(X̄ - Ȳ) = √(S²(1/n1 + 1/n2)), the standard error
of the difference between two independent means
S² = estimate of the pooled population variance

- t may take on any value; no value is logically
inconsistent with Ho! Smaller t values are
more consistent with Ho being true.
- All else equal, larger n's increase the value of t
(higher power).
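The pooled t statistic above can be sketched in code. This is an illustrative implementation, not from the slides; the cow weight-gain numbers are invented for the example:

```python
import math
import random

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance assumption).
    Returns (t, df) where df = (n1 - 1) + (n2 - 1)."""
    n1, n2 = len(x), len(y)
    mx, my = sum(x) / n1, sum(y) / n2
    # Pooled estimate S^2 of the common population variance
    ss_x = sum((v - mx) ** 2 for v in x)
    ss_y = sum((v - my) ** 2 for v in y)
    s2 = (ss_x + ss_y) / (n1 - 1 + n2 - 1)
    # Standard error of the difference between two independent means
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))
    return (mx - my) / se, n1 - 1 + n2 - 1

# Hypothetical weight gains (kg) for 10 BST and 10 control cows
random.seed(1)
bst = [random.gauss(25, 5) for _ in range(10)]
control = [random.gauss(20, 5) for _ in range(10)]
t, df = pooled_t(bst, control)
print(f"t = {t:.2f} on {df} df")
```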
8
  • Review of Significance Tests

Large values of t indicate that either:
i) the test assumptions are true and a rare event
has occurred, or
ii) one of the assumptions of the test is false,
and by convention it is assumed that the Ho is not
true.
- By convention, the relative frequency of t at which
we decide to choose (ii) above as the logical
conclusion is set to 5% (the alpha level or
significance level)
- Expt: t = 2.55, P = 0.02, reject Ho: the result is
"significant"
9
  • Review of Significance Tests

- Type I error (alpha) occurs 5% of the time
when Ho is true
- Type II error (beta) occurs β of the time
when Ho is false
- Alpha and beta are inversely related
- Fixing alpha at 5% means Sp is 95%
- Beta is not set a priori, hence Se (power)
tends to be low
- Scientific caution dictates that we set alpha
small
- Scientific ignorance dictates that we ignore beta!
10
Alpha and beta are inversely related
[Figure: overlapping sampling distributions under Ho and Ha, with the cut-off value determining the alpha and beta regions]
11
  • Relationship between diagnostic test result and
    disease status

                         DISEASE
                   PRESENT (D+)    ABSENT (D-)
TEST  POSITIVE (T+)   a (TP)          b (FP)       PVP = a/(a + b)
      NEGATIVE (T-)   c (FN)          d (TN)       PVN = d/(c + d)
                   Se = a/(a + c)  Sp = d/(b + d)

Se = P(T+|D+)    Sp = P(T-|D-)
12
  • Relationship between significance test results
    and truth

                          TRUTH
                    Ho FALSE             Ho TRUE
SIGNF. REJECT Ho    TP (Power, 1 - β)    FP (Type I, α)    PVP = TP/(TP + FP)
TEST   ACCEPT Ho    FN (Type II, β)      TN (1 - α)        PVN = TN/(TN + FN)
                    Se = TP/(TP + FN)    Sp = TN/(TN + FP)

Se = Power = (1 - β)
13
  • Power

- Probability of rejecting Ho when Ho is false
- Se = TP/(TP + FN), or (1 - β)
- Power is a function of:
i) Alpha (increase power by making Ha one-sided, i.e.,
μx > μy; consistent with changing the
cut-off value)
ii) Reliability (as measured by the SE of the
difference)
- Power increases with decreasing SE
- SE decreases with increasing sample size (and
decreasing variance)
iii) Size of the treatment effect
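The three determinants of power can be made concrete with a quick calculation. A sketch using the normal approximation (the function name and example values are assumptions, not from the slides):

```python
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_group, alpha=0.05, two_sided=True):
    """Approximate power of a two-sample test of means via the normal
    approximation. delta: true difference; sigma: common SD."""
    se = sigma * (2 / n_per_group) ** 0.5      # SE of the difference in means
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    # P(reject Ho | true difference = delta); ignores the negligible far tail
    return 1 - z.cdf(z_crit - delta / se)

# Power rises with n (smaller SE) and with effect size, and a one-sided
# test at the same alpha is more powerful than a two-sided one:
for n in (10, 20, 50):
    print(n, round(power_two_sample(delta=5, sigma=5, n_per_group=n), 2))
```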
14
  • The Consequences of Low Power

i) Negative results are difficult to interpret:
- truly no effect? or
- expt unable to detect a true difference?
ii) The proportion of Type I errors in the
literature increases
iii) Many important associations go unidentified
iv) Low power means low precision (indicated by
the width of the confidence interval)
15
Questions?
  • What proportion of statistically significant
    findings published in the literature are false
    positive (Type 1) errors?
  • What well-known measure is this proportion, and
    what elements does this figure therefore depend
    on?

16
Hypothetical outcomes of 500 experiments, α =
0.05, Power = 0.50, and 20% prevalence of false
Hos
  • TRUTH

                     Ho FALSE   Ho TRUE
SIGNF. REJECT Ho        50         20      PV = 50/70 = 71%
TEST   ACCEPT Ho        50        380
                       100        400     N = 500

Se = 50%
Sp = 95%
If all signf. results are published, 29% are Type I
errors
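The slide's arithmetic can be reproduced in a few lines, which makes it easy to vary alpha, power, or the prevalence of false Hos:

```python
# 500 experiments, alpha = 0.05, power = 0.50,
# and 20% of the null hypotheses are actually false.
n_expts = 500
prev_false_ho = 0.20
alpha, power = 0.05, 0.50

ho_false = n_expts * prev_false_ho      # 100 experiments where Ho is false
ho_true = n_expts - ho_false            # 400 experiments where Ho is true
tp = power * ho_false                   # 50 correct rejections
fp = alpha * ho_true                    # 20 Type I errors
ppv = tp / (tp + fp)                    # 50/70: share of "significant"
                                        # results that are real effects
print(f"PV = {ppv:.0%}; Type I error share = {fp / (tp + fp):.0%}")
```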
17
  • The P value

- The probability of obtaining a value of the test
statistic (X) at least as large as the one
observed, given that the Ho is true
  • P(X ≥ x | Ho true)

Common Incorrect Interpretations
  • It is NOT P(Ho true | Data)!!!

- We can never state the probability of a
hypothesis being true (under the frequentist
approach)
  • "The probability that the results were due to
    chance!"

18
  • Criticisms of Significance Tests

i) Decision vs Inference (Neyman-Pearson)
- The pioneers of modern statistics were interested
in producing results that enabled decisions to
be made
- Problem of automatic acceptance or rejection
based on an arbitrary cut-off (P = 0.04 vs P = 0.06)
- Results should adjust your degree of belief in
a hypothesis rather than forcing you to accept
an artificial dichotomy
- "intellectual economy"
19
  • Criticisms of Significance Tests

ii) Asymmetry of significance tests
- Frequently, the experimental data can be found
to be consistent both with a Ho of no effect and
with a Ho of a 20% increase
- Acceptance of both Hos, given the data, leads to
2 very different conclusions!
- The asymmetry was recognized by Fisher, hence the
convention is to identify the theory with the Ha but
to test the Ho
- "Is there an effect?" is the wrong question!
We should ask "What is the size of the effect?"
20
  • Criticisms of Significance Tests

iii) Corroborative power of significance tests
- Both the Fisherian and Neyman-Pearson schools make
no assumption about the prior probability of Ho
- Both schools presume Ho is almost always false
- Rejection of Ho does nothing to illuminate
which of the vast number of Has is supported
by the data!
- Failing to reject Ho does not prove Ho is true
(Popper: 'we can falsify hypotheses but not
confirm them')
21
  • Criticisms of Significance Tests

iv) Effect size and significance tests
- Test statistics and P values are a function of
both effect size and sample size
- The size of an effect cannot be inferred by
inspection of the P value; reporting P < 0.00001
has no scientific merit!
- Highly significant results may be derived from
trivial effects if the sample size is large
- Confidence intervals give a plausible range for
the unknown popl. parameter (signf. tests show
only what the parameter is not!)
22
Relationship between the Size of the Sample and
the Size of the P Value
  • Example RCT
  • Intervention: new a/b (antibiotic) for pneumonia
  • Outcome: Recovery = rate of patients in
    clinical recovery by 5 days
  • Facts:
  • Known: existing drug of choice results in a 35%
    recovery rate at 5 days
  • Unknown: new drug improves recovery rate by 5%
    (to 40%)
23
P values Generated by RCT by Sample Size
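The slide's point can be sketched numerically: with a fixed 35% vs 40% difference, the P value is driven almost entirely by sample size. A sketch using a two-proportion z test (normal approximation); the function name and sample sizes are assumptions:

```python
from math import sqrt
from statistics import NormalDist

def two_prop_p(p1, p2, n):
    """Two-sided P value comparing two proportions (normal approximation),
    n subjects per arm, assuming the observed rates equal p1 and p2 exactly."""
    pbar = (p1 + p2) / 2                      # pooled proportion (equal n)
    se = sqrt(pbar * (1 - pbar) * 2 / n)
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# The same 5-point difference (35% vs 40% recovery), very different P values:
for n in (100, 500, 1000, 5000):
    print(n, round(two_prop_p(0.35, 0.40, n), 4))
```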
24
  • Conclusion?

Significance testing should be abandoned and
replaced with interval estimation (point estimate
and CI)! Why?
- Not couched in pseudo-scientific hypothesis-testing
language
- Does not imply any decision-making implications
- Gives a plausible range for the unknown popl. parameter
- Gives a clue as to sample size (width of the CI)
- Avoids the danger of inferring a large effect when
a result is highly significant
25
  • Interval estimation

- View "experimentation" as a measurement exercise
- We want an unbiased, precise measure of effect
- Point estimate: the best estimate of the true
effect, given the data (aka the MLE); it indicates
the magnitude of effect (but is imprecise)
- Confidence intervals indicate the degree of
precision of the estimate. They represent the set of all
possible values for the parameter that are
consistent with the data
- The width of the CI depends on variability and the level of
confidence (1 - α)
26
  • Interval estimation

- 90% CI
  • 90% of such intervals will include the true
    unknown popl. parameter (the necessary frequentist
    interpretation)
  • It does not represent a 90% probability of
    including the true unknown popl. parameter within
    it

- CIs indicate magnitude and precision
- CIs are linked to alpha and hypothesis testing:
confidence level = (1 - alpha), e.g., 95%
27
  • Interval estimation - Example

              OUTCOME
             +      -    Total
TRT A        7     13      20     P(success) = 35%
TRT B       14      6      20     P(success) = 70%

Significance test: P = 0.06, or NS!
Interval estimate of difference: 35% (95% CI:
-1%, 71%)
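The interval estimate can be sketched as a simple Wald CI for the difference in proportions. Note this plain Wald interval comes out slightly narrower than the slide's (-1%, 71%), which presumably reflects a continuity-corrected or exact method:

```python
from math import sqrt

def risk_diff_ci(s1, n1, s2, n2, z=1.96):
    """Wald 95% CI for a difference in proportions (uncorrected sketch)."""
    p1, p2 = s1 / n1, s2 / n2
    diff = p2 - p1
    # SE of the difference between two independent proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# TRT B (14/20 successes) vs TRT A (7/20 successes)
diff, lo, hi = risk_diff_ci(7, 20, 14, 20)
print(f"difference = {diff:.0%}, 95% CI ({lo:.0%}, {hi:.0%})")
```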
28
  • Confidence Intervals

- CIs are non-uniform: the true parameter is more
likely to be located centrally than near the
limits. Therefore the precise location of a
boundary is irrelevant!
  • For a study to be reassuring about a lack of
    effect, the boundaries of the CI should be near
    the null value
  • CIs have clear advantages over the P value but
    still suffer from the necessary frequentist
    interpretation (a CI represents one member of a
    family of CIs produced by an infinite number of
    replications of the same experiment)

- CI functions
29
Which is the more important study?
  • Study A
  • Study B

[Figure: point estimates with confidence intervals for Study A and Study B, plotted against the null point; one study shows the larger effect]
30
Importance of Beta (Type II error) and Sample
Size in RCTs (Freiman et al 1978)
  • Reviewed 71 "negative" (P > 0.05) RCTs published
    from 1960-77
  • Assuming a 25% treatment effect:
  • 94% (N = 67) of trials had < 90% power
  • Only 15% (N = 10) had sufficient evidence to
    conclude no effect
  • Assuming a 50% treatment effect:
  • 70% (N = 50) of trials had < 90% power
  • Only 32% (N = 16) had sufficient evidence to
    conclude no effect

31
The P Value Fallacy - Goodman
  • Derives from the simultaneous application of the
    P value as:
  • A long-run, error-based, deductive tool (the
    Neyman-Pearson frequentist application), and
  • A short-run, evidential, inductive tool (i.e.,
    what is the meaning of this particular result?)
  • The P value was never designed to serve these two
    conflicting roles

32
The Bayes Factor - Goodman
  • A comparison of how well two hypotheses predict
    the data:
  • Bayes factor = P(Data | Ho) / P(Data | Ha)
  • Explicitly allows the incorporation of external
    evidence (in terms of prior probability/belief)
  • Use of Bayesian statistics shows that the weight
    of evidence against the Ho is not as strong as the
    P value suggests (Table 2)
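Goodman's point can be illustrated with the minimum Bayes factor, exp(-z²/2), the strongest possible evidence against Ho across normal alternatives. A sketch; the function name is an assumption:

```python
from math import exp
from statistics import NormalDist

def min_bayes_factor(p_value):
    """Minimum Bayes factor exp(-z^2/2) for a two-sided P value (Goodman):
    the best case for Ha over all normally distributed alternatives."""
    z = NormalDist().inv_cdf(1 - p_value / 2)   # z score for two-sided P
    return exp(-z * z / 2)

# P = 0.05 corresponds at best to a Bayes factor near 0.15: the data are
# roughly 1/7 as probable under Ho as under the best-supported Ha,
# weaker evidence against Ho than the "1 in 20" language suggests.
for p in (0.05, 0.01, 0.001):
    print(p, round(min_bayes_factor(p), 3))
```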