A short introduction to epidemiology, Chapter 9: Data analysis

Title: A short introduction to epidemiology Chapter 9: Data analysis


1
A short introduction to epidemiology
Chapter 9: Data analysis
  • Neil Pearce
  • Centre for Public Health Research
  • Massey University
  • Wellington, New Zealand

2
Chapter 9: Data analysis
  • Basic principles
  • Basic analyses
  • Control of confounding

3
Basic principles
  • Effect estimation
  • Confidence intervals
  • P-values

4
Testing and estimation
  • The effect estimate provides an estimate of the
    effect (e.g. relative risk, risk difference) of
    exposure on the occurrence of disease
  • The confidence interval provides a range of
    values in which it is plausible that the true
    effect may lie
  • The p-value is the probability that differences
    as large as or larger than those observed could
    have arisen by chance if the null hypothesis (of
    no association between exposure and disease) is
    correct
  • The principal aim of an individual study should
    be to estimate the size of the effect (using the
    effect estimate and confidence interval) rather
    than just to decide whether or not an effect is
    present (using the p-value)

5
Problems of significance testing
  • The p-value depends on two factors: the size of
    the effect and the size of the study
  • A very small difference may be statistically
    significant if the study is very large, whereas a
    very large difference may not be significant if
    the study is very small.
  • The purpose of significance testing is to reach a
    decision based on a single study. However,
    decisions should be based on information from all
    available studies, as well as non-statistical
    considerations such as the plausibility and
    coherence of the effect in the light of current
    theoretical and empirical knowledge (see chapter
    10).
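The dependence of the p-value on study size can be illustrated with a small sketch. It assumes a Wald test on the log odds ratio (the slides do not name a specific test at this point) and uses hypothetical counts; the function name `wald_p_value` is illustrative only.

```python
import math

def wald_p_value(a, b, c, d):
    """Odds ratio and two-sided Wald p-value for a 2x2 table.

    a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log(odds_ratio) / se_log_or
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, p_value

# Hypothetical counts: the second table is the first multiplied by 10,
# so the effect estimate is identical and only the study size differs.
or_small, p_small = wald_p_value(12, 8, 8, 12)
or_large, p_large = wald_p_value(120, 80, 80, 120)

print(f"small study: OR = {or_small:.2f}, p = {p_small:.3f}")
print(f"large study: OR = {or_large:.2f}, p = {p_large:.5f}")
```

Both tables give the same odds ratio, but only the larger study yields a small p-value, which is why the effect estimate and confidence interval are more informative than the p-value alone.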

6
Chapter 9: Data analysis
  • Basic principles
  • Basic analyses
  • Control of confounding

7
Basic analyses
  • Measures of occurrence
      • Incidence proportion (risk)
      • Incidence rate
      • Incidence odds
  • Measures of effect
      • Risk ratio
      • Rate ratio
      • Odds ratio

8
Example
              Exposed (E)   Non-exposed   Total
Cases         a             b             M1
Controls      c             d             M0
Total         N1            N0            T
9
Example: Smoking and ovarian cancer
              Smokers (E)   Non-smokers   Total
Cases         24            36            60
Controls      58            40            98
Total         82            76            158
10
(No Transcript)
11
(No Transcript)
12
  • This $\chi^2$ is based on the assumptions that the
    marginal totals of the table (N1, N0, M1, M0) are
    fixed and that the proportion of exposed cases is
    the same as the proportion of exposed controls
    (i.e. that the overall proportion N1/T applies to
    both cases and controls)

13
The natural logarithm of the odds ratio has
(under a binomial model) an approximate standard
error of
$\mathrm{SE}[\ln(\mathrm{OR})] = (1/a + 1/b + 1/c + 1/d)^{0.5}$
An approximate 95% confidence interval for the
odds ratio is then given by
$\mathrm{OR}\, e^{\pm 1.96\, \mathrm{SE}}$
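As a worked illustration, the sketch below applies the standard error and confidence interval formulas to the smoking and ovarian cancer table. The cell assignment (a = 24, b = 36, c = 58, d = 40) is an assumption read off the table margins, and `odds_ratio_ci` is an illustrative name.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with its approximate SE (log scale) and confidence interval.

    a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se)
    upper = odds_ratio * math.exp(z * se)
    return odds_ratio, se, lower, upper

# Smoking and ovarian cancer example (crude table).
crude_or, crude_se, crude_lower, crude_upper = odds_ratio_ci(24, 36, 58, 40)
print(f"OR = {crude_or:.2f}, 95% CI {crude_lower:.2f} to {crude_upper:.2f}")
```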
14
Chapter 9: Data analysis
  • Basic principles
  • Basic analyses
  • Control of confounding

15
Control of confounding
  • There are two methods of calculating a summary
    effect estimate to control confounding:
      • Pooling
      • Standardisation

16
Example of pooling
The unadjusted (crude) findings indicate that
there is a strong association between smoking and
ovarian cancer. Suppose, however, that we
are concerned about the possibility that the
effect of smoking is confounded by use of oral
contraception (this would occur if oral
contraception caused ovarian cancer and if
oral contraception was associated with smoking).
We then need to stratify the data into those who
have used oral contraceptives and those who have
not.
17
              OC use: Yes              OC use: No
              Smoking                  Smoking
              Yes    No    Total       Yes    No    Total
Cases         15     4     19          9      32    41
Controls      50     12    62          8      28    36
Total         65     16    81          17     60    77
18
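In those who have used oral contraceptives, the
odds ratio for smoking is
$\mathrm{OR} = (15 \times 12)/(4 \times 50) = 0.90$
In those who have not used oral contraceptives,
the odds ratio for smoking is
$\mathrm{OR} = (9 \times 28)/(32 \times 8) = 0.98$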
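The stratum-specific calculation can be sketched as follows. The cell order (exposed cases, non-exposed cases, exposed controls, non-exposed controls) is an assumption read off the stratified table, and the names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = cases; c, d = controls."""
    return (a * d) / (b * c)

# One tuple per stratum of oral contraceptive (OC) use.
strata = {
    "OC use: yes": (15, 4, 50, 12),
    "OC use: no": (9, 32, 8, 28),
}

stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Collapsing over strata (summing cell by cell) recovers the crude table.
crude_cells = [sum(cells) for cells in zip(*strata.values())]
crude_or = odds_ratio(*crude_cells)

for name, value in stratum_ors.items():
    print(f"{name}: OR = {value:.2f}")
print(f"crude: OR = {crude_or:.2f}")
```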
19
Thus, the crude OR for smoking (0.46) was biased
away from 1.0 by confounding by OC use. When we
remove this problem (by stratifying on oral
contraceptive use) the odds ratios increase and
are close to 1.0
20
In this example, the odds ratios are not exactly
the same in each stratum. If they are very
different (e.g. 1.0 in one stratum and 4.0 in the
other stratum) then we would usually report the
findings separately for each stratum. However,
if the odds ratio estimates are reasonably
similar then we usually wish to summarize our
findings into a single summary odds ratio by
taking a weighted average of the OR estimates in
each stratum.
21
$\mathrm{OR} = \sum_i W_i \mathrm{OR}_i / \sum_i W_i$
where $\mathrm{OR}_i$ = OR in stratum i and
$W_i$ = weight given to stratum i
22
One obvious choice of weights would be to weight
each stratum by the inverse of its variance
(precision-based estimates). However, this
method of obtaining a summary odds ratio yields
estimates which are unstable and highly affected
by small numbers in particular strata.
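A minimal sketch of a precision-based summary, assuming the usual variant in which the stratum log odds ratios are averaged with inverse-variance weights (the slides do not spell out the exact form). The data are the two oral contraceptive strata, and `precision_weighted_or` is an illustrative name.

```python
import math

# (exposed cases, non-exposed cases, exposed controls, non-exposed controls)
strata = [(15, 4, 50, 12), (9, 32, 8, 28)]

def precision_weighted_or(strata):
    """Inverse-variance weighted average of the log odds ratios, back-transformed."""
    weighted_sum = 0.0
    weight_total = 0.0
    for a, b, c, d in strata:
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weight = 1 / variance
        weighted_sum += weight * log_or
        weight_total += weight
    return math.exp(weighted_sum / weight_total)

or_precision = precision_weighted_or(strata)
print(f"precision-weighted OR = {or_precision:.2f}")
```

A single zero cell makes the stratum variance undefined, which is one way the instability with small numbers shows up in practice.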
23
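A better set of weights was developed by
Mantel and Haenszel. These use the weights
$b_i c_i / T_i$, which gives the summary odds ratio
$\mathrm{OR}_{MH} = \sum_i (a_i d_i / T_i) / \sum_i (b_i c_i / T_i)$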
24
              Stratum 1                Stratum 2
              E      Non-E  Total      E      Non-E  Total
Cases         15     4      19         9      32     41
Controls      50     12     62         8      28     36
Total         65     16     81         17     60     77
25
This set of weights yields summary odds ratio
estimates which are very close to being
statistically optimal (they are very close to the
maximum likelihood estimates) and are very robust
in that they are not unduly affected by small
numbers in particular strata (provided that the
strata do not have any zero marginal totals).
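A minimal sketch of the Mantel-Haenszel summary odds ratio for the two oral contraceptive strata, written as the weighted average of stratum odds ratios with weights b_i c_i / T_i. The function name is illustrative.

```python
# (exposed cases a, non-exposed cases b, exposed controls c, non-exposed controls d)
strata = [(15, 4, 50, 12), (9, 32, 8, 28)]

def mantel_haenszel_or(strata):
    """Summary odds ratio: sum(a*d/T) / sum(b*c/T) over strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        numerator += a * d / t
        denominator += b * c / t   # this term is also the stratum weight
    return numerator / denominator

or_mh = mantel_haenszel_or(strata)
print(f"Mantel-Haenszel OR = {or_mh:.2f}")
```

Because a stratum contributes only products of cell counts divided by its total, a zero cell does not break the calculation, unlike a stratum-by-stratum odds ratio.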
26
We can calculate a corresponding chi-square
27
              Stratum 1                Stratum 2
              E      Non-E  Total      E      Non-E  Total
Cases         15     4      19         9      32     41
Controls      50     12     62         8      28     36
Total         65     16     81         17     60     77
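The chi-square formula itself is not in the transcript. The sketch below assumes the standard Mantel-Haenszel form, in which the observed exposed cases are compared with their expected number under fixed margins: $\chi^2_{MH} = [\sum_i a_i - \sum_i N_{1i} M_{1i}/T_i]^2 / \sum_i [M_{1i} M_{0i} N_{1i} N_{0i} / (T_i^2 (T_i - 1))]$. The function name is illustrative.

```python
# (exposed cases a, non-exposed cases b, exposed controls c, non-exposed controls d)
strata = [(15, 4, 50, 12), (9, 32, 8, 28)]

def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-square (1 df), assuming the standard fixed-margins form."""
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        m1, m0 = a + b, c + d      # total cases, total controls
        n1, n0 = a + c, b + d      # total exposed, total non-exposed
        t = m1 + m0
        observed += a
        expected += n1 * m1 / t
        variance += m1 * m0 * n1 * n0 / (t ** 2 * (t - 1))
    return (observed - expected) ** 2 / variance

chi2_mh = mantel_haenszel_chi2(strata)
# With a single stratum the same function gives the crude chi-square.
chi2_crude = mantel_haenszel_chi2([(24, 36, 58, 40)])
print(f"stratified chi-square = {chi2_mh:.3f}, crude chi-square = {chi2_crude:.2f}")
```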
28
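The natural logarithm of the Mantel-Haenszel odds
ratio has (under a binomial model) an approximate
standard error of
$\mathrm{SE} = \left[ \frac{\sum_i P_i R_i}{2 R_+^2} + \frac{\sum_i (P_i S_i + Q_i R_i)}{2 R_+ S_+} + \frac{\sum_i Q_i S_i}{2 S_+^2} \right]^{0.5}$
where $P_i = (a_i + d_i)/T_i$, $Q_i = (b_i + c_i)/T_i$,
$R_i = a_i d_i/T_i$, $S_i = b_i c_i/T_i$,
$R_+ = \sum_i R_i$ and $S_+ = \sum_i S_i$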
29
An approximate 95% confidence interval for the
odds ratio is then given by
$\mathrm{OR}_{MH}\, e^{\pm 1.96\, \mathrm{SE}}$
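The standard error and interval can be sketched for the two oral contraceptive strata as follows. The code assumes the variance expression built from P, Q, R and S is placed under a square root to give the standard error; `mh_or_with_ci` is an illustrative name.

```python
import math

# (exposed cases a, non-exposed cases b, exposed controls c, non-exposed controls d)
strata = [(15, 4, 50, 12), (9, 32, 8, 28)]

def mh_or_with_ci(strata, z=1.96):
    """Mantel-Haenszel OR with an approximate SE of ln(OR) and confidence interval."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        p, q = (a + d) / t, (b + c) / t
        r, s = a * d / t, b * c / t
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    variance = (sum_pr / (2 * sum_r ** 2)
                + sum_ps_qr / (2 * sum_r * sum_s)
                + sum_qs / (2 * sum_s ** 2))
    se = math.sqrt(variance)
    return or_mh, se, or_mh * math.exp(-z * se), or_mh * math.exp(z * se)

or_mh, se_mh, lower_mh, upper_mh = mh_or_with_ci(strata)
print(f"OR_MH = {or_mh:.2f}, 95% CI {lower_mh:.2f} to {upper_mh:.2f}")
```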
30
Rate ratios
                     Exposed (E)   Non-exposed   Total
Cases                a             b             M1
Person-years (PY)    Y1            Y0            T
31
E
350
125
Case PY Rate
10,000
10,000
0.00125
0.00350
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
The summary Mantel-Haenszel rate ratio uses the
weights $b_i Y_{1i}/T_i$ to yield
$\mathrm{RR}_{MH} = \sum_i (a_i Y_{0i}/T_i) / \sum_i (b_i Y_{1i}/T_i)$
36
(No Transcript)
37
The equivalent Mantel-Haenszel chi-square is
$\chi^2_{MH} = [\sum_i a_i - \sum_i M_{1i} Y_{1i}/T_i]^2 / \sum_i (M_{1i} Y_{1i} Y_{0i}/T_i^2)$
38
This is very similar to the $\chi^2_{MH}$ for
case-control studies, but it has some minor
modifications to take account of the fact that we
are using person-time data rather than binomial
data.
39
(No Transcript)
40
An approximate standard error for the natural log
of the rate ratio is
$\mathrm{SE} = \frac{[\sum_i M_{1i} Y_{1i} Y_{0i}/T_i^2]^{0.5}}{[(\sum_i a_i Y_{0i}/T_i)(\sum_i b_i Y_{1i}/T_i)]^{0.5}}$
41
An approximate 95% confidence interval for the
rate ratio is then given by
$\mathrm{RR}\, e^{\pm 1.96\, \mathrm{SE}}$
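A minimal sketch of the Mantel-Haenszel rate ratio with its standard error and confidence interval. The stratified person-time counts are hypothetical (the transcript gives no stratified rate data), and `mh_rate_ratio` is an illustrative name.

```python
import math

# Hypothetical person-time data, one tuple per stratum:
# (exposed cases a, non-exposed cases b, exposed person-years Y1, non-exposed person-years Y0)
strata = [(30, 10, 1000, 1000), (40, 20, 2000, 2000)]

def mh_rate_ratio(strata, z=1.96):
    """Mantel-Haenszel rate ratio, approximate SE of ln(RR), and confidence interval."""
    numerator = denominator = variance_numerator = 0.0
    for a, b, y1, y0 in strata:
        t = y1 + y0                # total person-years in the stratum
        m1 = a + b                 # total cases in the stratum
        numerator += a * y0 / t
        denominator += b * y1 / t
        variance_numerator += m1 * y1 * y0 / t ** 2
    rr = numerator / denominator
    se = math.sqrt(variance_numerator) / math.sqrt(numerator * denominator)
    return rr, se, rr * math.exp(-z * se), rr * math.exp(z * se)

rr_rate, se_rate, lower_rate, upper_rate = mh_rate_ratio(strata)
print(f"RR_MH = {rr_rate:.2f}, 95% CI {lower_rate:.2f} to {upper_rate:.2f}")
```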
42
Risk ratios
              Exposed (E)   Non-exposed   Total
Cases         a             b             M1
Non-cases     c             d             M0
Total         N1            N0            T
43
(No Transcript)
44
(No Transcript)
45
An approximate standard error for the natural log
of the risk ratio is
$\mathrm{SE} = \frac{[\sum_i (M_{1i} N_{1i} N_{0i}/T_i^2 - a_i b_i/T_i)]^{0.5}}{[(\sum_i a_i N_{0i}/T_i)(\sum_i b_i N_{1i}/T_i)]^{0.5}}$
46
An approximate 95% confidence interval for the
risk ratio is then given by
$\mathrm{RR}\, e^{\pm 1.96\, \mathrm{SE}}$
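A minimal sketch of the corresponding risk ratio calculation. The summary formula is not in the transcript; the code assumes the standard Mantel-Haenszel form $\sum_i (a_i N_{0i}/T_i) / \sum_i (b_i N_{1i}/T_i)$, which matches the two sums in the denominator of the standard error. The cohort counts are hypothetical and `mh_risk_ratio` is an illustrative name.

```python
import math

# Hypothetical cohort data, one tuple per stratum:
# (exposed cases a, non-exposed cases b, exposed total N1, non-exposed total N0)
strata = [(20, 10, 100, 100), (30, 15, 200, 200)]

def mh_risk_ratio(strata, z=1.96):
    """Mantel-Haenszel risk ratio, approximate SE of ln(RR), and confidence interval."""
    numerator = denominator = variance_numerator = 0.0
    for a, b, n1, n0 in strata:
        t = n1 + n0                # total persons in the stratum
        m1 = a + b                 # total cases in the stratum
        numerator += a * n0 / t
        denominator += b * n1 / t
        variance_numerator += m1 * n1 * n0 / t ** 2 - a * b / t
    rr = numerator / denominator
    se = math.sqrt(variance_numerator) / math.sqrt(numerator * denominator)
    return rr, se, rr * math.exp(-z * se), rr * math.exp(z * se)

rr_risk, se_risk, lower_risk, upper_risk = mh_risk_ratio(strata)
print(f"RR_MH = {rr_risk:.2f}, 95% CI {lower_risk:.2f} to {upper_risk:.2f}")
```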
47
Standardization, in contrast to pooling, involves
taking a weighted average of the rates in each
stratum (e.g. age-group) before taking the ratio of
the two standardized rates. Standardization has
many advantages in descriptive epidemiology
involving comparisons between countries, regions,
ethnic groups or gender groups. However,
pooling (when done appropriately) has some
superior statistical properties when comparing
exposed and non-exposed groups in a specific study.
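Direct standardization can be sketched as follows. The stratum rates and the standard population weights are hypothetical, and the function name `standardized_rate` is illustrative.

```python
def standardized_rate(stratum_rates, standard_weights):
    """Weighted average of stratum-specific rates, using a common set of weights."""
    weighted = sum(rate * weight for rate, weight in zip(stratum_rates, standard_weights))
    return weighted / sum(standard_weights)

# Hypothetical rates per person-year in two age-groups (younger, older).
rates_exposed = [0.002, 0.010]
rates_non_exposed = [0.001, 0.005]
# Hypothetical standard population: person-years in each age-group.
standard_weights = [60_000, 40_000]

std_exposed = standardized_rate(rates_exposed, standard_weights)
std_non_exposed = standardized_rate(rates_non_exposed, standard_weights)
# The ratio is taken only after each group's rates have been averaged.
standardized_rate_ratio = std_exposed / std_non_exposed

print(f"standardized rates: {std_exposed:.4f} vs {std_non_exposed:.4f}")
print(f"standardized rate ratio = {standardized_rate_ratio:.2f}")
```

The same weights are applied to both groups, so differences in age structure between them cannot distort the comparison.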
48
Summary of Stratified Analysis
  • If we are concerned about confounding by a factor
    such as age, gender or smoking, then we need to
    stratify on this factor (or on all factors
    simultaneously if there is more than one
    potential confounder) and calculate the exposure
    effect separately in each stratum.
  • If the effect is very different in different
    strata then we would report the findings
    separately for each stratum.

49
If the effect is similar in each stratum then we
can obtain a summary estimate by taking a
weighted average of the effect in each
stratum. If the adjusted effect is different from
the crude effect this means that the crude effect
was biased due to confounding.
50
Usually we need to adjust the findings for (i.e.
stratify on) age, gender, and some other
factors. If we have five age-groups and two
gender-groups then we need to divide the data
into ten age-gender-groups. If we have too many
strata then we begin to get strata with zero
marginal totals (e.g. with no cases or no
controls). The analysis then begins to break
down and we have to consider using mathematical
modelling.