ASNEMGE - PowerPoint PPT Presentation

1
ASNEMGE
  • Young Investigator Meeting
  • Vienna April 21-23, 2006

Maastricht University Department of Methodology
and Statistics
2
Outline
  • What do clinicians expect from statistics? Could
    that be a source of misunderstanding?
  • The infamous Null Hypothesis Significance Testing
    (NHST)
  • What it is and what it is not
  • A small selection of questions of interest to
    practitioners: sample size (power), randomisation,
    confounding, bias and interaction
  • Interactive session

3
The present (though partly challenged) state of
affairs
  • The simplistic dichotomisation of hypothesis
    testing, already manifest in the language used:
  • Accept/reject, significant/non-significant
  • p-value has been handled and regarded as a
    magical device (it is often overrated and
    misinterpreted)
  • There is this misconception that significance
    testing will put all doubts aside

4
What medical doctors wish for
  • As a model of clarity, I commend to you the
    style of the Improvised Munitions Handbook, in
    which the authors append a comment for each
    material: "This material was tested. It is
    effective."
  • All else is recondite, if elegant, but ultimately
    fruitless ratiocination
  • Can you afford to share this opinion, even if
    this point of view is temptingly attractive and
    practical?
  • Should you get involved with obsessively abstruse
    mathematics and futile conjectures?

5
How things really are
  • Though statistical calculations carry an aura of
    numerical exactitude, debate necessarily
    surrounds statistical conclusions, MADE AS THEY
    ARE AGAINST A BACKGROUND OF UNCERTAINTY
  • Statistical tests are an aid to wise judgment, NOT
    A TWO-VALUED LOGICAL DECLARATION OF TRUTH OR
    FALSITY

6
Statistical Inference
  • The process of drawing conclusions about a
    population on the basis of measurements or
    observations made on a sample of individuals from
    the population
  • Statistics provides the tools for the ancient
    inductive process of reasoning from the
    particular to the general
  • Statistics deals with some general methods of
    finding patterns that are hidden in a cloud of
    irrelevancies, of natural variability, and of
    error prone observations or measurements

7
Inference from Sample to Population
  • Population parameters
  • σ (standard deviation)
  • µ (mean)

Sample statistics (estimates of the parameters):
standard deviation s, sample mean x̄
8
NHST: The formal structure
  • Outcome variable and its measurement scale
  • Distributional properties (put on hold!)
  • Null hypothesis H0
  • Alternative hypothesis HA
  • NHST: a confrontation of the all-chance (H0)
    explanation with the systematic-plus-chance
    explanation (HA) for the observed data
  • Because you usually test what you want to
    discredit, NHST has been described as the
    ritualised exercise of devil's advocate

9
Null Hypothesis Significance Testing (NHST): the
gist
  • The starting assumption of hypothesis testing is
    that the observed data can be primarily explained
    by chance factors. This starting assumption is
    formulated as the null hypothesis, alias H0

10
A small digression: The Normal distribution
11
The H0 distribution and α
[Figure labels: critical area, H0, HA]
12
Formulating the hypothesis (interactive session)
  • NSAIDs frequently cause gastrointestinal injury
    and increase the risk of ulcer complications
  • An RCT is carried out to test whether an NSAID
    (etodolac) causes less gastric injury than a
    standard NSAID (naproxen) in a double-blind trial
    assessing the effects on gastroduodenal injury,
    symptoms and prostaglandin production in healthy
    volunteers
  • (Gastrointest Endosc 1995;42:428-33)

13
The logic behind NHST: The SIGNAL/NOISE ratio
  • Often the test statistics (z, t, F) are referred
    to as the SIGNAL/NOISE ratio, which represents the
    basic rationale of any statistical test!

[Figure labels: HA, H0]
14
The SIGNAL/NOISE ratio
  • If the signal, the difference, is large enough,
    AS COMPARED TO THE NOISE, then it is reasonable
    to conclude that the signal has some effect.
  • If the signal does not rise above the noise
    level, then there is not enough evidence to
    conclude that an association (systematic effect)
    exists.
  • The basis of all inferential statistics is to
    attach a probability to this ratio.

16
Errors in hypothesis testing
  Decision              Null hypothesis is true    Null hypothesis is false
  --------------------------------------------------------------------------
  Reject null           Type I error (α)           Correct decision
                                                   (power = 1-β)
  Fail to reject null   Correct decision           Type II error (β)
                        (confidence = 1-α)

17
The H0 and HA distributions: α and β
Power = 1 - β
Critical value(s)
18
What is the p-value?
p < α => H0 discredited given the data
Observed value (sample measurement) - Test
statistic
19
p-value continued
p >> α => Not enough evidence to dismiss H0. It
is retained.
Observed value (sample measurement) - Test
statistic
20
Statistical significance and the p-value
  • The p-value can be thought of as the probability
    of obtaining a test statistic as extreme as or
    more extreme than the one actually obtained,
    GIVEN THAT THE NULL HYPOTHESIS IS TRUE!
  • All that a significant result implies is that
    one has observed something relatively unlikely to
    happen given the hypothetical situation.
    Everything else is a matter of what one does with
    this information.
  • Statistical significance is a statement about the
    likelihood p(data | H0) of the observed result,
    nothing else. It does not guarantee that
    something important or even meaningful has been
    found.
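
The definition above can be made concrete with a small sketch. It assumes a test statistic z that follows a standard Normal distribution under H0 and a two-sided test; the function name and the example z values are illustrative and do not come from the slides.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability, under H0, of a standard Normal statistic at least as extreme as z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


alpha = 0.05  # conventional significance level, chosen here for illustration
for z in (0.5, 1.96, 3.0):  # hypothetical observed test statistics
    p = two_sided_p_value(z)
    verdict = "H0 discredited" if p < alpha else "H0 retained"
    print(f"z = {z:4.2f}  p = {p:.4f}  {verdict}")
```

Note that p is computed entirely under H0: it describes p(data | H0), not the probability that H0 is true.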

21
The language of the null hypothesis testing
  • The terms "accepting" and "rejecting" the null
    hypothesis are too strong.
  • An alternative would be to replace these terms by
    more moderate ones:
  • "Retaining" the null hypothesis, or "treating it
    as viable", and "discrediting" or "dismissing"
    the null hypothesis

22
The power of a test
  • Definition
  • Statistical power refers to the ability of a
    statistical test to detect a relationship between
    variables or a difference, i.e. an effect of
    specified size, when it exists.
  • It is the basis of procedures for estimating the
    sample size needed to detect an effect of a
    particular magnitude.
  • Official definition: power is the probability of
    dismissing the null hypothesis when it is false
    (1-β), where β represents the probability of
    retaining a null hypothesis when it is false.

23
Power
  • Power is a direct function of four variables:
  • Significance level (α)
  • Sample size (N)
  • Effect size
  • The type of statistical test being conducted

24
How can you increase power?
(1) Select a larger α
[Figure: H0 and HA distributions plotted against x]
28
How can you increase power?
(2) Increase the difference between population
means
29
How can you increase power?
(2) Increase the relevant effect difference
[Figure labels: HA, H0]
30
How can you increase power?
(3) Increase the sample size => standard error
decreases
31
How can you increase power?
(3) Larger sample size
[Figure: distributions plotted against x]
34
Power formulas
For a continuous outcome variable
For proportions
With c1 z0.80 = 0.84, z0.90 = 1.28, z0.95 = 1.65,
z0.975 = 1.96, σ² = p(1 - p) and p = (p1 + p2)/2
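
The formulas themselves did not survive in the transcript. For orientation, a commonly used textbook form for the sample size per group in a two-sided comparison of two groups is sketched below. It is an assumption that this matches the slide: the symbols δ (the difference to be detected) and n (subjects per group) are introduced here, while the z values and the definition of σ² for proportions follow the line above.

```latex
% Continuous outcome, two groups of equal size n:
\[
  n = \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\delta^{2}}
\]
% Proportions: the same expression with
\[
  \sigma^{2} = p\,(1 - p), \qquad p = \frac{p_1 + p_2}{2}, \qquad \delta = p_1 - p_2
\]
```

For example, α = 0.05 (two-sided) and 80% power give z0.975 + z0.80 = 1.96 + 0.84 = 2.80.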
35
A randomized, double-blind comparison of placebo,
etodolac, and naproxen on gastrointestinal
injury and prostaglandin production

Background: NSAIDs frequently cause
gastrointestinal injury and increase the risk of
ulcer complications. We compared an NSAID
suggested to cause less gastric injury (etodolac)
with a standard NSAID (naproxen) and a placebo in
a 4-week double-blind trial assessing the effects
on gastroduodenal injury, symptoms, and
prostaglandin production in healthy volunteers.

Methods: Fifty-two healthy volunteers not taking
NSAIDs, alcohol, antibiotics, bismuth, or
anti-ulcer drugs and with a normal endoscopic
examination were randomly assigned to identical
drugs: placebo, etodolac 400 mg, or naproxen
500 mg b.i.d. for 4 weeks. Endoscopies with
biopsies were repeated at weeks 1 and 4. The
number and dimensions of ulcers and erosions were
recorded to quantitate injury.

Results: At week 1 the mean number and area of
gastric ulcers per subject were greater with
naproxen than placebo or etodolac (area:
naproxen, 7.4 mm²; placebo, 0.6 mm², p = 0.02 vs
naproxen; etodolac, 2.1 mm², p = 0.06 vs
naproxen). Ulcer scores at week 4 were low and
comparable in the three groups. The mean number
and area of gastric erosions per subject were
greatest with naproxen at both weeks 1 and 4
(week 4 area: naproxen, 58.3 mm²; placebo,
29.0 mm²; etodolac, 13.9 mm²; p < 0.02, naproxen
vs placebo and vs etodolac). Placebo injury was
presumably due to biopsies at prior endoscopy.
Gastric mucosal prostaglandin E2 production did
not change significantly from baseline after 1 or
4 weeks of treatment with placebo or etodolac,
but did decrease significantly with naproxen
(week 0, 1689; week 1, 479; week 4, 577 pg/mg
protein). Gastrointestinal symptoms were present
in only 1 (5%) of 20 visits in which endoscopy
showed no erosions or ulcers vs 21 (26%) of 82
visits in which a mucosal defect was identified
(p = 0.066).

Conclusions: Gastric injury with 4 weeks of
etodolac is comparable to that seen with placebo
and significantly less than that occurring with
naproxen, presumably due to the fact that
etodolac does not suppress gastric mucosal
prostaglandin production, whereas naproxen leads
to a significant reduction.
(Gastrointest Endosc 1995;42:428-33.)
36
Interactive session I
  • Acute upper gastrointestinal bleeding (UGIB) is a
    serious complication of a variety of GI diseases,
    and is associated with a mortality rate of around
    10-20%. A new antifibrinolytic drug is to be
    tested to see if it can reduce the mortality in
    UGIB patients, and you are asked to suggest a
    sample size for a proposed placebo-controlled
    trial. How would you explain to the investigators
    the need to specify a clinically significant
    difference to be detected, and what size would
    you recommend if they agree on a 20% reduction in
    mortality?
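
One way to approach the sample size question is sketched below. It is not the presenter's worked answer. It assumes the common two-group formula n = 2 (z(1-α/2) + z(1-β))² σ² / δ² with σ² = p(1 - p) and p = (p1 + p2)/2, a two-sided α of 0.05, 80% power, and a relative (not absolute) 20% reduction. The baseline mortalities of 15% and 20% are picked from the stated 10-20% range purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed per group to detect p1 vs p2 (Normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)               # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2.0
    variance = p_bar * (1.0 - p_bar)
    delta = p1 - p2
    return ceil(2.0 * (z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Assumed scenarios: a relative 20% reduction from two possible baseline mortalities.
n_from_15 = n_per_group_two_proportions(p1=0.15, p2=0.12)
n_from_20 = n_per_group_two_proportions(p1=0.20, p2=0.16)
print("baseline 15% -> 12%:", n_from_15, "per group")
print("baseline 20% -> 16%:", n_from_20, "per group")
```

The result is very sensitive to the difference to be detected, since that difference enters squared in the denominator. This is why the investigators must commit to a clinically relevant difference before a sample size can be given.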

37
Interactive session II: The NSAID example
  • 2. How would you set up an RCT to investigate the
    effect of the supposedly less harmful NSAID? And
    given the study design, which statistical
    technique would you consider adequate to test
    your hypothesis? In order to answer this
    question, consider the following aspects:
  • Which is your outcome variable and how is it
    measured (which scale)?
  • How many groups will be compared?
  • Do you think it suffices to measure the patients
    only once (cross-sectional) or, preferably, twice
    with a pre and post measurement?
  • Are the samples to be compared dependent on or
    independent of each other?
  • Consider an extra explanatory factor, in
    addition to the NSAID, that you know to have an
    influence on your outcome measure. How would your
    study design be changed by considering this extra
    variable? Would you carry out several tests
    separately, one for each variable, or one test
    with the two variables together?

38
Study designs
  • Two samples are said to be independent when the
    data points in one sample are unrelated to the
    data points in a second sample.
  • An experimental study design: the Randomised
    Clinical Trial (RCT). Random allocation of
    patients to experimental and control groups
    (independent samples).
  • Observational study designs:
  • The cross-sectional study, where the participants
    are seen/measured at only one point in time
    (independent measures)
  • The longitudinal, follow-up study, where the same
    group of people is followed over time
    (dependent, repeated measures).
  • These different study designs share a common
    denominator: the main question of interest is
    often the comparison of the mean values of two
    groups
  • The simplest method to make this comparison
    between groups is Student's t-test.

39
The set-up
  • A study is carried out to evaluate the effect of
    the new NSAID on the prostaglandin concentration
  • Three situations are distinguished:
  • One of independent samples with drug(s) and
    control (placebo) groups (CASE I)
  • A second case in which one group is measured
    twice, before and after the administration of the
    drug (CASE II)
  • A third case with dependent samples (before and
    after measurements) within independent
    experimental and control groups (CASE III)
  • Evaluate the pros and cons of each of these
    design settings and choose the adequate
    statistical test.

40
The t-test
  • The usual signal/noise ratio
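
The formula shown on this slide is not in the transcript. As a reminder, the standard textbook forms of the t statistic for the two situations discussed in this talk are given below. The notation (subscripts 1 and 2 for the two groups, d for within-subject differences, s_p for the pooled standard deviation) is introduced here and may differ from the slide.

```latex
% Independent samples: difference of means over its pooled standard error
\[
  t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
  \qquad
  s_p^{2} = \frac{(n_1 - 1)\,s_1^{2} + (n_2 - 1)\,s_2^{2}}{n_1 + n_2 - 2},
  \qquad
  \mathrm{df} = n_1 + n_2 - 2
\]
% Paired samples: mean difference over the standard error of the differences
\[
  t = \frac{\bar{d}}{s_d / \sqrt{n}},
  \qquad
  \mathrm{df} = n - 1
\]
```

In both cases the numerator is the signal and the denominator is the noise.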

41
Subject   Placebo   NSAID 1
---------------------------
 1          65        62
 2          88        86
 3         125       118
 4         103       105
 5          90        91
 6          76        72
 7          85        81
 8         126       122
 9          97        95
10         142       145
11         132       132
12         110       105
---------------------------
Mean       103.3     101.2
SD          24.0      24.8

CASE I: separate, independent groups
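
A minimal sketch of the CASE I analysis, computing the pooled-variance independent-samples t statistic directly from the table. This is not the SPSS output from the next slide. The critical value 2.074 is the standard two-sided 5% value of the t distribution with 22 degrees of freedom, taken from tables; the variable and function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

placebo = [65, 88, 125, 103, 90, 76, 85, 126, 97, 142, 132, 110]
nsaid = [62, 86, 118, 105, 91, 72, 81, 122, 95, 145, 132, 105]


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))  # the noise
    signal = mean(sample_a) - mean(sample_b)                      # the signal
    return signal / standard_error, df


t_independent, df_independent = independent_t(placebo, nsaid)
T_CRIT_DF22 = 2.074  # two-sided 5% critical value for 22 df (from t tables)

print(f"t = {t_independent:.3f}, df = {df_independent}")
print("H0 discredited" if abs(t_independent) > T_CRIT_DF22 else "H0 retained")
```

The signal of about 2.1 is small compared with a standard error of about 10, so the statistic stays far below the critical value and H0 is retained.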
42
With SPSS
Group Statistics
Independent Samples Test
The test statistic
The p-value.
43
A different setting
  • Now, instead of having drug and placebo groups,
    let's follow a single group of patients, whose
    prostaglandin concentrations are measured before
    and after the intervention, i.e. the drug
    administration.

44
Subject   Pretest   Posttest   Difference (d)
---------------------------------------------
 1          65        62          -3
 2          88        86          -2
 3         125       118          -7
 4         103       105           2
 5          90        91           1
 6          76        72          -4
 7          85        81          -4
 8         126       122          -4
 9          97        95          -2
10         142       145           3
11         132       132           0
12         110       105          -5
---------------------------------------------
Mean       103.3     101.2        -2.1
SD          24.0      24.8         3.02

CASE II: repeated measures
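
A minimal sketch of the CASE II analysis, computing the paired t statistic from the within-subject differences in the table. This is not the SPSS output from the next slide. The critical value 2.201 is the standard two-sided 5% value of the t distribution with 11 degrees of freedom, taken from tables; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

pretest = [65, 88, 125, 103, 90, 76, 85, 126, 97, 142, 132, 110]
posttest = [62, 86, 118, 105, 91, 72, 81, 122, 95, 145, 132, 105]

# Each subject serves as his or her own control: analyse the differences only.
differences = [post - pre for pre, post in zip(pretest, posttest)]

n = len(differences)
mean_difference = mean(differences)              # the signal
standard_error = stdev(differences) / sqrt(n)    # the noise, within subjects only
t_paired = mean_difference / standard_error
df_paired = n - 1
T_CRIT_DF11 = 2.201  # two-sided 5% critical value for 11 df (from t tables)

print(f"mean d = {mean_difference:.2f}, SD of d = {stdev(differences):.2f}")
print(f"t = {t_paired:.3f}, df = {df_paired}")
print("H0 discredited" if abs(t_paired) > T_CRIT_DF11 else "H0 retained")
```

The same twelve pairs of numbers that gave a negligible t statistic when treated as independent groups now exceed the critical value, because the noise term no longer contains the between-subject variability.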
45
Paired Samples Statistics
Paired Samples Correlations
Paired Samples Test
The test statistic
The p-value
46
Note.
  • The Signal: the numerator for the computation of
    the test statistic is the same for the dependent
    and independent cases (the difference is 2.1).
    The distinction between the two cases lies in
    the denominator, representing the noise
    variability
  • The Noise: the SDs of the pre-measures and
    post-measures are quite large (about 25),
    reflecting the large stable differences in
    prostaglandin concentrations between human
    beings
  • By contrast, the SD of the concentration
    differences is much smaller, only 3.0
  • Stable differences between individuals are far
    greater than any likely difference resulting from
    treatment within individuals
  • Between-subject SD is much larger than
    within-subject SD. By using the SD of the
    differences, we are eliminating the intrinsic
    variability of prostaglandin concentrations in
    the population from the noise.
  • In this way you can regard each individual as
    her/his own control.

47
To sum up
  • Drastically different conclusions result from the
    application of two different statistical
    approaches to the data (same numerical data,
    different experimental designs!)
  • The appropriate test procedure is determined by
    how the experiment is performed: with independent
    or dependent samples (repeated or matched
    measures)
  • In CASE II the experiment is organised in such a
    way as to measure individual changes rather than
    average changes over the population, the latter
    being what CASE I represents.

48
Advantages of the paired approach
  • You eliminate between-subject differences from
    the denominator (the noise variability) of the
    test
  • This can lead to a potential gain in statistical
    power
  • It might also be possible to correct for baseline
    differences between groups (inadequate
    randomisation)
  • This advantage only exists as long as the
    subjects or pairs have systematic differences
    between them. If this is not the case, the test
    can result in a loss, instead of a gain, in
    statistical power.
  • What is the shortcoming?

49
                  Experimental                   Control
Subject   Pretest  Posttest   (d)      Pretest  Posttest   (d)
---------------------------------------------------------------
 1          65       62       -3         68       70        2
 2          88       86       -2        122      123        1
 3         125      118       -7         84       83       -1
 4         103      105        2         95       97        2
 5          90       91        1        106      106        0
 6          76       72       -4         71       72        1
 7          85       81       -4         87       86       -1
 8         126      122       -4        147      152        5
 9          97       95       -2        129      131        2
10         142      145        3        136      138        2
11         132      132        0        105      104       -1
12         110      105       -5         99      100        1
---------------------------------------------------------------
Mean       103.3    101.2     -2.1      104.1    105.2      1.1
SD          24.0     24.8      3.02      25.2     26.2      1.73

CASE III: Separate groups; within each group, repeated measures
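
One possible analysis of CASE III, offered as a sketch and not necessarily the one shown on the following slide, is to compute the pre-to-post change for every subject and then compare the mean change between the experimental and control groups with an independent-samples t-test. Other approaches (for example analysis of covariance with the pretest as covariate) exist. The critical value 2.074 is the standard two-sided 5% value for 22 degrees of freedom, taken from tables; all names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

exp_pre = [65, 88, 125, 103, 90, 76, 85, 126, 97, 142, 132, 110]
exp_post = [62, 86, 118, 105, 91, 72, 81, 122, 95, 145, 132, 105]
ctl_pre = [68, 122, 84, 95, 106, 71, 87, 147, 129, 136, 105, 99]
ctl_post = [70, 123, 83, 97, 106, 72, 86, 152, 131, 138, 104, 100]

# Within-subject changes remove the stable between-subject variability.
exp_change = [post - pre for pre, post in zip(exp_pre, exp_post)]
ctl_change = [post - pre for pre, post in zip(ctl_pre, ctl_post)]

n_exp, n_ctl = len(exp_change), len(ctl_change)
df_change = n_exp + n_ctl - 2
pooled_var = ((n_exp - 1) * variance(exp_change)
              + (n_ctl - 1) * variance(ctl_change)) / df_change
standard_error = sqrt(pooled_var * (1.0 / n_exp + 1.0 / n_ctl))
t_change = (mean(exp_change) - mean(ctl_change)) / standard_error
T_CRIT_DF22 = 2.074  # two-sided 5% critical value for 22 df (from t tables)

print(f"mean change: experimental {mean(exp_change):.2f}, control {mean(ctl_change):.2f}")
print(f"t = {t_change:.3f}, df = {df_change}")
print("H0 discredited" if abs(t_change) > T_CRIT_DF22 else "H0 retained")
```

This design keeps the noise reduction of the paired approach while the control group guards against attributing to the drug a change that would have occurred anyway.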
51
Reminder
  • Most statistical tests relate the magnitude of
    an observed difference to the probability that
    such a difference might occur by chance alone.
  • The notion of statistical significance is
    embodied in this probability. But statistical
    significance does not, of itself, reveal anything
    about the importance of the observed difference.