Understanding P-values and Confidence Intervals - PowerPoint PPT Presentation

Title: Understanding P-values and Confidence Intervals

1
Understanding P-values and Confidence Intervals

Thomas B. Newman, MD, MPH

20 Nov 08
2
Announcements

Optional reading about P-values and Confidence Intervals on the website
Exam questions due Monday 11/24/08, 5:00 PM
Next week (11/27) is Thanksgiving
Following week: Physicians and Probability (Chapter 12) and Course Review
Final exam to be distributed in SECTION 12/4 and posted on web
Exam due 12/11, 8:45 AM
Key will be posted shortly thereafter

3
Overview

Introduction and justification
What P-values and Confidence Intervals don't mean
What they do mean: analogy between diagnostic tests and clinical research
Useful confidence interval tips
CI for negative studies; absolute vs. relative risk
Confidence intervals for small numerators

4
Why cover this material here?

P-values and confidence intervals are ubiquitous
in clinical research
Widely misunderstood and mistaught
Pedagogical argument
Is it important?
Can you handle it?

5
Example: Douglas Altman Definition of 95% Confidence Intervals

"A strictly correct definition of a 95% CI is, somewhat opaquely, that 95% of such intervals will contain the true population value.
Little is lost by the less pure interpretation of the CI as the range of values within which we can be 95% sure that the population value lies."

Quoted in Guyatt, G., D. Rennie, et al. (2002). Users' guides to the medical literature: essentials of evidence-based clinical practice. Chicago, IL, AMA Press.
6
Understanding P-values and confidence intervals
is important because

It explains things which otherwise do not make
sense, e.g. the need to state hypotheses in
advance and correction for multiple hypothesis
testing
You will be using them all the time
You are future leaders in clinical research

7
You can handle it because

We have already covered the important concepts at
length earlier in this course
Prior probability
Posterior probability
What you thought before + new information = what you think now
We will support you through the process

8
Review of traditional statistical significance
testing

State null (Ho) and alternative (Ha) hypotheses
Choose α
Calculate value of test statistic from your data
Calculate P-value from test statistic
If P-value < α, reject Ho
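The five steps above can be sketched in Python. This is only an illustration under stated assumptions: the data (15 heads in 20 coin flips), the null value of 0.5, and the function name are hypothetical and do not come from the course. The test is an exact two-sided binomial test.

```python
from math import comb

def two_sided_binomial_p(successes: int, n: int) -> float:
    """Exact two-sided P-value for Ho: p = 0.5.

    Under this symmetric null, the two-sided P-value is twice the
    probability of a result at least as far from n/2 as the one observed.
    """
    k = max(successes, n - successes)  # observed count, folded to the upper tail
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

# Step 1: Ho: p = 0.5, Ha: p != 0.5 (hypothetical coin-flip example)
# Step 2: choose alpha
alpha = 0.05
# Step 3: the test statistic is the number of heads observed
heads, flips = 15, 20
# Step 4: calculate the P-value from the test statistic
p_value = two_sided_binomial_p(heads, flips)
# Step 5: reject Ho if the P-value is below alpha
reject_null = p_value < alpha
print(f"P = {p_value:.4f}, reject Ho: {reject_null}")
```

Note that the code stops at "reject Ho" or "do not reject Ho"; it says nothing about the probability that Ho is true.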

9
Problem

Traditional statistical significance testing has
led to widespread misinterpretation of P-values

10
What P-values don't mean

If the P-value is 0.05, there is a 95% probability that:
The results did not occur by chance
The null hypothesis is false
There really is a difference between the groups

11
So if P = 0.05, what IS there a 95% probability of?
12
White board

2x2 tables and false positive confusion
Analogy with diagnostic tests
(This is covered step-by-step in the course book.)
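A minimal sketch of the analogy in Python, assuming the usual mapping (the white-board tables are not in this transcript, so the mapping is an assumption): prior probability that the research hypothesis is true plays the role of pretest probability, power plays the role of sensitivity, and 1 - α plays the role of specificity. The function name and the example numbers are hypothetical.

```python
def prob_true_given_significant(prior: float, power: float, alpha: float) -> float:
    """Probability the research hypothesis is true given P < alpha.

    Same arithmetic as the positive predictive value of a diagnostic test:
    true positives / (true positives + false positives).
    """
    true_positives = prior * power          # hypothesis true and study significant
    false_positives = (1 - prior) * alpha   # hypothesis false but study significant
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers: the same "positive" study means different things
# depending on how plausible the hypothesis was beforehand.
for prior in (0.10, 0.50):
    post = prob_true_given_significant(prior, power=0.80, alpha=0.05)
    print(f"prior = {prior:.2f} -> posterior = {post:.2f}")
```

With a prior of 0.10 the posterior is 0.64, not 0.95, which is the "false positive confusion" the 2x2 tables are meant to expose.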

13
Analogy between diagnostic tests and research
studies
14
Analogy between diagnostic tests and research
studies
15
Extending the Analogy

Intentionally ordered tests and hypotheses stated
in advance
Multiple tests and multiple hypotheses
Laboratory error and bias
Alternative diagnoses and confounding

16
Bonferroni

Inequality: If we do k different tests, each with significance level α, the probability that one or more will be significant is less than or equal to k × α
Correction: If we test k different hypotheses and want our total Type 1 error rate to be no more than α, then we should reject Ho only if P < α/k
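A short Python sketch of the correction, assuming a list of P-values from k hypotheses. The function name and the example P-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one True/False per hypothesis: reject Ho only if P < alpha / k."""
    k = len(p_values)
    threshold = alpha / k
    return [p < threshold for p in p_values]

# Hypothetical P-values from k = 4 hypotheses; the threshold is 0.05 / 4 = 0.0125.
p_values = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_reject(p_values))
```

Only the first hypothesis is rejected here, although three of the four P-values are below the uncorrected 0.05.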

17
Derivation

Let A and B be the events of a Type 1 error for hypotheses A and B
P(A or B) = P(A) + P(B) - P(A & B)
Under Ho, P(A) = P(B) = α
So P(A or B) = α + α - P(A & B) = 2α - P(A & B).
Of course, it is possible to falsely reject 2 different null hypotheses, so P(A & B) > 0.
Therefore, the probability of falsely rejecting either of the null hypotheses must be less than 2α.
Note that often A & B are not independent, in which case Bonferroni will be even more excessively conservative
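As a numeric check of the inequality, here is a sketch for the special case where all k tests are independent and every null hypothesis is true. Independence is an assumption made only to get a closed form; in that case the probability of at least one false rejection is 1 - (1 - α)^k. The function name is hypothetical.

```python
def familywise_error_independent(alpha: float, k: int) -> float:
    """P(at least one Type 1 error) for k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k

alpha = 0.05
for k in (1, 2, 5, 10, 20):
    exact = familywise_error_independent(alpha, k)
    bound = k * alpha  # the Bonferroni upper bound
    print(f"k = {k:2d}: exact = {exact:.4f}, bound k x alpha = {bound:.2f}")
```

For k = 2 the exact value is 2α - α², which is the 2α - P(A & B) of the derivation with P(A & B) = α².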

18
Problems with Bonferroni correction

Overly conservative (especially when hypotheses
are not independent)
Maintains specificity at the expense of
sensitivity
Does not take prior probability into account
Not clear when to use it
BUT can be useful if results still significant

19
CONFIDENCE INTERVALS
20
What Confidence Intervals don't mean

There is a 95% chance that the true value is within the interval
If you conclude that the true value is within the interval you have a 95% chance of being right
The range of values within which we can be 95% sure that the population value lies

21
One source of confusion: Statistical "confidence"

(Some) statisticians say: "You can be 95% confident that the population value is in the interval."
This is NOT the same as "There is a 95% probability that the population value is in the interval."
"Confidence" is tautologously defined by statisticians as what you get from a confidence interval

22
Illustration

If a 95% CI has a 95% chance of containing the true value, then a 90% CI should have a 90% chance and a 40% CI should have a 40% chance.
Study: 4 deaths in 10 subjects in each group
RR = 1.0 (95% CI: 0.34 to 2.9)
40% CI: 0.75 to 1.33
Conclude from this study that there is 60% chance that the true RR is < 0.75 or > 1.33?
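The two intervals on this slide can be reproduced with a short Python sketch. It assumes the usual large-sample interval on the log risk ratio; the slide does not say which method was used, although this one gives the same numbers. The function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def risk_ratio_ci(cases1, n1, cases2, n2, level=0.95):
    """Risk ratio with a large-sample CI computed on the log scale."""
    rr = (cases1 / n1) / (cases2 / n2)
    se_log_rr = sqrt(1 / cases1 - 1 / n1 + 1 / cases2 - 1 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. 1.96 for a 95% interval
    return rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)

# 4 deaths in 10 subjects in each group, at two confidence levels
for level in (0.95, 0.40):
    rr, lower, upper = risk_ratio_ci(4, 10, 4, 10, level)
    print(f"{level:.0%} CI: RR = {rr:.1f} ({lower:.2f} to {upper:.2f})")
```

Changing `level` only changes the multiplier z; nothing in the calculation uses what was believed about the true RR before the study.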

23
Confidence Intervals apply to a Process

Consider a bag with 19 white and 1 pink grapefruit
The process of selecting a grapefruit at random has a 95% probability of yielding a white one
But once I've selected one, does it still have a 95% chance of being white?
You may have prior knowledge that changes the probability (e.g., pink grapefruit have thinner peel, are denser, etc.)
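The "process" reading can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: a normally distributed measurement with known standard deviation, a hypothetical true mean of 10, and 2000 repeated studies of 25 subjects each.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95, reps=2000, seed=1):
    """Fraction of repeated studies whose CI for the mean contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)  # known-sigma interval: sample mean +/- half_width
    hits = 0
    for _ in range(reps):
        sample_mean = mean([rng.gauss(true_mean, sigma) for _ in range(n)])
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / reps

print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The share is close to 0.95 across many repetitions. That is a statement about the procedure, like drawing from the bag; it does not by itself give the probability for any one interval already in hand.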

24
Confidence Intervals for negative studies: 5 levels of sophistication

Example 1: Oral amoxicillin to treat possible occult bacteremia in febrile children
Randomized, double-blind trial
3-36 month old children with T ≥ 39º C (N = 955)
Treatment: Amox 125 mg tid (≤ 10 kg) or 250 mg tid (> 10 kg)
Outcome: major infectious morbidity

Jaffe et al., New Engl J Med 1987;317:1175-80
25
Amoxicillin for possible occult bacteremia 2: Results

Bacteremia in 19/507 (3.7%) with amox, vs 8/448 (1.8%) with placebo (P = 0.07)
"Major Infectious Morbidity": 2/19 (10.5%) with amox vs 1/8 (12.5%) with placebo (P = 0.9)
Conclusion: "Data do not support routine use of standard doses of amoxicillin"

26
5 levels of sophistication

Level 1: P > 0.05 = treatment does not work
Level 2: Look at power for study. (Authors reported power = 0.24 for OR = 4. Therefore, study underpowered and negative study uninformative.)

27
5 levels of sophistication, cont'd

Level 3: Look at 95% CI!
Authors calculated OR = 1.2 (95% CI: 0.02 to 30.4)
This is based on 1/8 (12.5%) with placebo vs 2/19 (10.5%) with amox
(They put placebo on top)
(Silly to use OR)
With amox on top, RR = 0.84 (95% CI: 0.09 to 8.0)
This was level of TBN in letter to the editor (1987)

28
5 levels of sophistication, cont'd

Level 4: Make sure you do an "intention to treat" analysis!
It is not OK to restrict attention to bacteremic patients
So it should be 2/507 (0.39%) with amox vs 1/448 (0.22%) with placebo
RR = 1.8 (95% CI: 0.16 to 19.4)

29
Level 5: the clinically relevant quantity is the Absolute Risk Reduction (ARR)!

2/507 (0.39%) with amox vs 1/448 (0.22%) with placebo
ARR = -0.17% (amoxicillin worse)
95% CI (-0.9% harm to +0.5% benefit)
Therefore, LOWER limit of 95% CI for NNT (i.e., best case for benefit) is NNT = 1/0.5% = 200
So this study suggests need to treat ≥ 200 children to prevent "Major Infectious Morbidity" in one
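The Level 5 arithmetic can be sketched in Python. The sketch assumes a simple large-sample (Wald) interval for the difference in risks, which agrees with the risk-difference interval in the Stata output in this presentation. The function name and the sign convention (positive ARR means benefit) are choices made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def absolute_risk_reduction_ci(cases_treated, n_treated,
                               cases_control, n_control, level=0.95):
    """ARR = control risk - treated risk (positive = benefit), with a Wald CI."""
    risk_t = cases_treated / n_treated
    risk_c = cases_control / n_control
    arr = risk_c - risk_t
    se = sqrt(risk_t * (1 - risk_t) / n_treated
              + risk_c * (1 - risk_c) / n_control)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return arr, arr - z * se, arr + z * se

# Intention-to-treat counts: 2/507 with amoxicillin vs 1/448 with placebo
arr, lower, upper = absolute_risk_reduction_ci(2, 507, 1, 448)
print(f"ARR = {arr:.2%} (95% CI {lower:.1%} to {upper:.1%})")

# Best case for treatment: the upper limit of the ARR gives the smallest NNT
best_case_nnt = 1 / upper
print(f"Best-case NNT is about {best_case_nnt:.0f}")
```

With the unrounded limit the best-case NNT comes out near 190; rounding the limit to 0.5% first gives the 200 quoted on the slide.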

30
Stata output

. csi 2 1 505 447

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |         2           1  |          3
        Noncases |       505         447  |        952
-----------------+------------------------+-----------
           Total |       507         448  |        955
                 |                        |
            Risk |  .0039448    .0022321  |   .0031414
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0017126       |   -.005278    .0087032
      Risk ratio |         1.767258       |   .1607894    19.42418
 Attr. frac. ex. |         .4341518       |  -5.219315    .9485178
 Attr. frac. pop |         .2894345       |
                 +-------------------------------------------------
                               chi2(1) =     0.22  Pr>chi2 = 0.6369
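For readers without Stata, the chi2(1) and Pr>chi2 values at the bottom of this output can be checked with a few lines of Python. This is a sketch that assumes the statistic is the ordinary Pearson chi-square without continuity correction, which reproduces the printed values. The function name is hypothetical, and the arguments are in the same order as the csi command.

```python
from math import erfc, sqrt

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed noncases.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

chi2, p_value = pearson_chi2_2x2(2, 1, 505, 447)
print(f"chi2(1) = {chi2:.2f}, Pr>chi2 = {p_value:.4f}")
```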

31
Example 2: Pyelonephritis and new renal scarring in the International Reflux Study in Children

RCT of ureteral reimplantation vs prophylactic antibiotics for children with vesicoureteral reflux
Overall result: surgery group fewer episodes of pyelonephritis (8% vs 22%; NNT = 7; P < 0.05) but more new scarring (31% vs 22%; P = .4)
This raises questions about whether new scarring is caused by pyelonephritis

Weiss et al. J Urol 1992; 148:1667-73
32
Within groups: no association between new pyelo and new scarring

Trend goes in the OPPOSITE direction

RR = 0.28; 95% CI (0.09-1.32); Weiss, J Urol 1992; 148:1672
33
Stata output to get 95% CI

. csi 2 18 28 58

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |         2          18  |         20
        Noncases |        28          58  |         86
-----------------+------------------------+-----------
           Total |        30          76  |        106
                 |                        |
            Risk |  .0666667    .2368421  |   .1886792
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.1701754       |  -.3009557   -.0393952
      Risk ratio |         .2814815       |    .069523     1.13965
 Prev. frac. ex. |         .7185185       |  -.1396499     .930477
 Prev. frac. pop |         .2033543       |
                 +-------------------------------------------------
                               chi2(1) =     4.07  Pr>chi2 = 0.0437
34
Conclusions

No evidence that new pyelonephritis causes
scarring
Some evidence that it does not
P-values and confidence intervals are
approximate, especially for small sample sizes
There is nothing magical about 0.05
Key concept: calculate 95% CI for negative studies
ARR for clinical questions (less generalizable)
RR for etiologic questions

35
Confidence intervals for small numerators
36
When P-values and Confidence Intervals Disagree

Usually P < 0.05 means 95% CI excludes null value.
But both 95% CI and P-values are based on approximations, so this may not be the case
Illustrated by IRSC slide above
If you want 95% CI and P-values to agree, use "test-based" confidence intervals; see next slide

37
Alternative Stata output: Test-based CI
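The Stata output for this slide is not in the transcript. As a stand-in, here is a Python sketch of one common test-based method, in which the limits are RR raised to the power (1 ± z/χ) and χ is the square root of the Pearson chi-square. Whether this matches the slide's output exactly is an assumption. The function name is hypothetical, and the counts are those of the IRSC csi command (2 18 28 58).

```python
from math import exp, log, sqrt
from statistics import NormalDist

def rr_wald_and_test_based(a, b, c, d, level=0.95):
    """Risk ratio with a log-scale Wald CI and a test-based CI.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed noncases.
    """
    n_exposed, n_unexposed = a + c, b + d
    rr = (a / n_exposed) / (b / n_unexposed)
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Wald interval on the log scale
    se_log_rr = sqrt(1 / a - 1 / n_exposed + 1 / b - 1 / n_unexposed)
    wald = (exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr))

    # Test-based interval: limits are rr ** (1 - z/chi) and rr ** (1 + z/chi)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * n_exposed * n_unexposed)
    chi = sqrt(chi2)
    test_based = tuple(sorted(rr ** (1 + sign * z / chi) for sign in (-1, 1)))
    return rr, wald, test_based

rr, wald, test_based = rr_wald_and_test_based(2, 18, 28, 58)
print(f"RR = {rr:.2f}")
print(f"Wald 95% CI:       {wald[0]:.2f} to {wald[1]:.2f}")
print(f"Test-based 95% CI: {test_based[0]:.2f} to {test_based[1]:.2f}")
```

The Wald interval includes 1 even though P = 0.0437, while the test-based interval excludes 1. A test-based interval is built from the same chi-square as the P-value, so it excludes the null value whenever P is below 0.05.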