Title: Biostatistics in Practice
1Biostatistics in Practice
Session 5: Associations and Confounding
Peter D. Christenson, Biostatistician
http://gcrc.LABioMed.org/Biostat
2Session 5 Preparation 1
1. We often hear news reports of "seasonally
adjusted unemployment rates". Can you think of a
logical way that this adjustment could be made?
3Session 5 Preparation 2
From Table 3: unadjusted and adjusted values.
What does "adjusted" mean? How is it done?
4Goal One of Session 5
Earlier: Compare means for a single measure among groups. Use t-test, ANOVA.
Session 5: Relate two or more measures. Use correlation or regression.
ΔY/ΔX
Qu et al. (2005), JCEM 90:1563-1569.
5Goal Two of Session 5
Try to isolate the effects of different characteristics on an outcome.
Previous slide: Gender, GH Peak, BMI.
6Correlation
Visualize Y (vertical) by X (horizontal) scatter plot.
Pearson correlation, r, is used to measure association between two measures X and Y.
- Ranges from -1 (perfect inverse association) to 1 (perfect direct association).
- Value of r does not depend on:
  - scales (units) of X and Y
  - which role X and Y assume, as in an X-Y plot
- Value of r does depend on:
  - the ranges of X and Y
  - values chosen for X, if X is fixed and Y is measured
7Graphs and Values of Correlations
8Logic for Value of Correlation
$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} $$
Statistical software gives r.
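As a rough illustration of the formula for r, the sketch below computes it directly from the sums of deviations. The data values are made up for illustration and do not come from the slides. It also checks two properties stated on the Correlation slide: r is unchanged when X and Y swap roles, and when X is expressed in different units.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the sums-of-deviations formula."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (an assumption, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                      # X and Y exchange roles
r_rescaled = pearson_r([100 * a for a in x], y)  # X in different units

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```

All three printed values should agree, since neither swapping roles nor rescaling changes r.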
9Correlation Depends on Ranges of X and Y
[Figure: scatter plots A and B]
Graph B contains only the graph A points in the
ellipse. Correlation is reduced in graph B. Thus
correlations for the same quantities X and Y may
be quite different in different study
populations.
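The effect of a restricted range can be sketched with simulated data. This is only an illustration under assumed settings: the data are random draws, and the subset is defined by a narrow band of X values rather than the ellipse drawn on the slide.

```python
import math
import random

random.seed(3)  # fixed seed so the sketch is repeatable

def pearson_r(x, y):
    """Pearson correlation from the sums-of-deviations formula."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# "Graph A": the full population, with Y related to X plus noise.
x_all = [random.gauss(0.0, 1.0) for _ in range(1000)]
y_all = [a + random.gauss(0.0, 1.0) for a in x_all]

# "Graph B": only subjects whose X falls in a narrow band (assumed cut-off).
kept = [(a, b) for a, b in zip(x_all, y_all) if abs(a) < 0.5]
x_sub = [a for a, _ in kept]
y_sub = [b for _, b in kept]

r_all = pearson_r(x_all, y_all)
r_sub = pearson_r(x_sub, y_sub)
print(round(r_all, 2), round(r_sub, 2))
```

The relation between X and Y is identical in both sets, yet the correlation in the restricted set should come out noticeably smaller.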
10Correlation and Measurement Precision
[Figure: panels A and B, showing the overall data and the subpopulation with 5<x<6, where r=0]
A lack of correlation for the subpopulation with 5<x<6 may be due to inability to measure x and y well.
Lack of evidence of association is not evidence of lack of association.
11Regression
Again: Y (vertical) by X (horizontal) scatterplot, as with correlation. See next slide.
X and Y now assume unique roles:
- Y is an outcome, response, output, dependent variable.
- X is an input, predictor, independent variable.
Regression analysis is used to:
- Measure X-Y association, as with correlation.
- Fit a straight line through the scatter plot, for:
  - Prediction of Y from X.
  - Estimation of Δ in Y for a unit change in X (slope = effect of X on Y).
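A minimal sketch of fitting the straight line by least squares, using a small made-up data set (an assumption, not data from the slides). The slope is the estimated change in Y per unit change in X, and the fitted line is used to predict Y at a new X. The last lines check that shifting the slope away from the fitted value increases the sum of squared residuals.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for Y = intercept + slope * X."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(x, y)
predicted_at_6 = intercept + slope * 6.0  # prediction of Y from X = 6

sse_fitted = sum_squared_residuals(x, y, intercept, slope)
sse_other = sum_squared_residuals(x, y, intercept, slope + 0.1)

print(intercept, slope, predicted_at_6)
print(sse_fitted, sse_other)
```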
12Regression Example
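[Figure: scatter plot with fitted line. The line minimizes Σ ei², where ei is the vertical distance of point i from the line. Bands show the range for individuals and the range for the mean.]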
Statistical software gives all this info.
13X-Y Association
If slope=0 then X and Y are not associated. But the slope measured from a sample will never be 0.
How different from 0 does a measured slope need to be in order to claim X and Y are associated?
Side note: It turns out that slope=0 is equivalent to correlation r=0.
14X-Y Association
Test slope=0 vs. slope≠0, with the rule:
Claim association (slope≠0) if |tc| = |slope/SE(slope)| > t ≈ 2.
There is a 5% chance of claiming an X-Y association that really does not exist.
Note similarity to t-test for means: tc = mean/SE(mean).
Formula for SE(slope) is in statistics books.
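The "5% chance" can be checked with a small simulation. This is only a sketch under assumed settings: 30 subjects per study, 2000 simulated studies, and Y generated with no relation to X. It uses the usual textbook formula SE(slope) = sqrt[(Σei²/(n-2)) / Σ(X-Xmean)²].

```python
import math
import random

random.seed(5)  # fixed seed so the sketch is repeatable

def slope_t_statistic(x, y):
    """Return tc = slope / SE(slope) for a least-squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope

n_subjects = 30   # assumed study size
n_studies = 2000  # assumed number of simulated studies
x = [float(i) for i in range(n_subjects)]

false_claims = 0
for _ in range(n_studies):
    # Y is pure noise, so the true slope is 0 in every simulated study.
    y = [random.gauss(0.0, 1.0) for _ in x]
    if abs(slope_t_statistic(x, y)) > 2:
        false_claims += 1

false_claim_rate = false_claims / n_studies
print(false_claim_rate)
```

The printed rate should land near 0.05. The exact value depends on the sample size and on the cut-off of 2 being an approximation.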
15Example Software Output
The regression equation is Y = 81.6 + 2.16 X

Predictor   Coeff     StdErr    T       P
Constant    81.64     11.47     7.12    <0.0001
X           2.1557    0.1122    19.21   <0.0001

S = 21.72   R-Sq = 79.0%

Predicted Values
X      Fit       SE(Fit)   95% CI              95% PI
100    297.21    2.17      292.89 - 301.52     253.89 - 340.52

Notes on the output:
- T for X: 19.21 = 2.16/0.112. It should be between -2 and 2 if true slope=0.
- "Constant" refers to the intercept.
- Fit: predicted y = 81.6 + 2.16(100).
- 95% CI and 95% PI: range of Ys with 95% assurance for:
  - the mean of all subjects with x=100 (CI)
  - an individual with x=100 (PI)
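The numbers in this output can be reproduced approximately by hand. The sketch below uses only the reported values. The multiplier of 2 is an assumption standing in for the exact t value, which depends on a sample size not shown here, so the intervals match the software output only roughly. The standard error for an individual prediction combines S and SE(Fit) as sqrt(S² + SE(Fit)²).

```python
import math

# Values copied from the software output.
intercept = 81.64
slope = 2.1557
se_slope = 0.1122
s = 21.72       # residual standard deviation (S)
se_fit = 2.17   # SE(Fit) at X = 100
x_new = 100

t_slope = slope / se_slope         # T column for X
fit = intercept + slope * x_new    # Fit at X = 100

# Standard error for predicting one individual at X = 100.
se_individual = math.sqrt(s ** 2 + se_fit ** 2)

multiplier = 2.0  # assumed rough stand-in for the exact t value
ci_mean = (fit - multiplier * se_fit, fit + multiplier * se_fit)
pi_individual = (fit - multiplier * se_individual,
                 fit + multiplier * se_individual)

print(round(t_slope, 2), round(fit, 2))
print([round(v, 1) for v in ci_mean], [round(v, 1) for v in pi_individual])
```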
16Goal Two of Session 5
Try to isolate the effects of different
characteristics on an outcome.
Ethnicity
Outcome
Age
17Another Study
Potential doping test for athletes.
J Clin Endocrin Metab 2006 Nov; 91(11):4424-32.
18Study Goals: Outcomes are IGF-1 and Collagen Markers
Determine the relative and combined explanatory
power of age, gender, BMI, ethnicity, and sport
type on the markers.
Figure 2.
One conclusion is lack of differences between
ethnic IGF-1 means, after adjustment for age,
gender, and BMI (Fig 2). How are these
adjustments made?
19Adjustment For a Single Continuous
Characteristic
We simulate data for Caucasians and Africans only, for simplicity, to demonstrate attenuation of a 155-140=15 µg/L ethnic difference to a 160-157=3 µg/L ethnic difference.
[Figure: simulated IGF-1 versus age for the two groups; plotted values 158, 160, 140, 155]
20Adjustment For a Single Continuous
Characteristic
Problem: Want to compare groups on IGF-1. Groups to be compared (ethnicities) have different mean ages, and IGF-1 tends to decrease with age.
Solution: Make groups appear to have the same mean age.
21Adjustment For a Single Continuous
Characteristic
Solution: Make groups appear to have the same mean age. To do this:
- Find regression line predicting IGF-1 from age.
- Move each subject parallel to the regression line to the mean age. This is the expected IGF-1 if this subject had been at the mean age.
- Adjusted means are means of these adjusted individual values.
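The steps above can be sketched with simulated data. Everything numeric here is an assumption for illustration (group sizes, mean ages, a true group difference of 3, a decline of 3 units per year of age), and the numbers are not the ones on the slides. The slope used is the pooled within-group slope, which is one common choice; the slides do not say which regression line was used.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def mean(values):
    return sum(values) / len(values)

def simulate_group(n, mean_age, group_effect):
    """Hypothetical subjects: IGF-1 declines with age, plus a group effect."""
    rows = []
    for _ in range(n):
        age = random.gauss(mean_age, 5.0)
        igf1 = 250.0 - 3.0 * age + group_effect + random.gauss(0.0, 10.0)
        rows.append((age, igf1))
    return rows

groups = {
    "group_a": simulate_group(200, 30.0, 3.0),  # younger on average
    "group_b": simulate_group(200, 35.0, 0.0),  # older on average
}

# Step 1: regression slope of IGF-1 on age, pooled within groups.
sxy = 0.0
sxx = 0.0
for rows in groups.values():
    age_mean = mean([a for a, _ in rows])
    igf_mean = mean([y for _, y in rows])
    sxy += sum((a - age_mean) * (y - igf_mean) for a, y in rows)
    sxx += sum((a - age_mean) ** 2 for a, _ in rows)
slope = sxy / sxx

# Step 2: move each subject along the slope to the overall mean age.
overall_mean_age = mean([a for rows in groups.values() for a, _ in rows])

unadjusted = {}
adjusted = {}
for name, rows in groups.items():
    unadjusted[name] = mean([y for _, y in rows])
    # Step 3: the adjusted mean is the mean of the moved individual values.
    adjusted[name] = mean([y - slope * (a - overall_mean_age) for a, y in rows])

unadjusted_diff = unadjusted["group_a"] - unadjusted["group_b"]
adjusted_diff = adjusted["group_a"] - adjusted["group_b"]
print(round(unadjusted_diff, 1), round(adjusted_diff, 1))
```

Most of the unadjusted difference in this simulation comes from the age gap between the groups, so the adjusted difference should be much smaller.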
23Adjustment For a Single Continuous
Characteristic
We have just described a special case of multiple
regression, in which an outcome is estimated by
multiple predictors.
Simple Regression: Estimated IGF-1 = intercept + slope(age)
Multiple Regression: Estimated IGF-1 = intercept + slope(age) + diff(indicator)
Indicator = 0 if African, 1 if Caucasian.
24Adjustment For a Single Continuous
Characteristic
Software: Select Regression or Analysis of Covariance. Usually a menu such as: [menu screenshot]
Output: Values of b0, b1, b2 for IGF1 = b0 + b1(age) + b2(indicator)
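For readers without a statistics package at hand, the values b0, b1, b2 can be computed directly when there are exactly two predictors. The sketch below uses the closed-form least-squares solution on a tiny made-up data set that follows IGF1 = 250 - 3(age) + 5(indicator) exactly, so the fit should recover those coefficients. The data and the helper name `fit_two_predictors` are assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def fit_two_predictors(x1, x2, y):
    """Least-squares b0, b1, b2 for y = b0 + b1*x1 + b2*x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Made-up data: the indicator = 1 group is younger on average.
age = [25, 30, 35, 40, 30, 35, 40, 45]
indicator = [1, 1, 1, 1, 0, 0, 0, 0]
igf1 = [250 - 3 * a + 5 * g for a, g in zip(age, indicator)]

b0, b1, b2 = fit_two_predictors(age, indicator, igf1)

# Unadjusted comparison: plain difference in group means.
mean_1 = mean([y for y, g in zip(igf1, indicator) if g == 1])
mean_0 = mean([y for y, g in zip(igf1, indicator) if g == 0])
unadjusted_diff = mean_1 - mean_0

print(b0, b1, b2, unadjusted_diff)
```

In this constructed example the unadjusted difference in group means is 20, while b2, the age-adjusted difference, is 5.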
25Multiple Regression
We have seen the logic of adjusting for a single
characteristic. The next few slides try to give
a geometric view of generalizing adjustment to
account for several factors simultaneously.
26Multiple Regression Geometric View
Multiple predictors may be continuous. Geometrically, this is fitting a slanted plane to a cloud of points.
www.StatisticalPractice.com
LHCY is the Y (homocysteine) to be predicted from the two Xs: LCLC (folate) and LB12 (B12).
LHCY = b0 + b1 LCLC + b2 LB12 is the equation of the plane.
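A hedged sketch of fitting such a plane by least squares, solving the normal equations with a small hand-written solver. The data are simulated; the assumed "true" coefficients (3.0, -0.3, -0.1) and the noise level are arbitrary and are not estimates from the homocysteine study.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

# Simulated cloud of points around an assumed plane.
n = 200
lclc = [random.gauss(2.0, 0.5) for _ in range(n)]
lb12 = [random.gauss(6.0, 0.4) for _ in range(n)]
lhcy = [3.0 - 0.3 * c - 0.1 * b + random.gauss(0.0, 0.2)
        for c, b in zip(lclc, lb12)]

# Normal equations (X'X) b = X'y with columns: 1, LCLC, LB12.
rows = [[1.0, c, b] for c, b in zip(lclc, lb12)]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, lhcy)) for i in range(3)]
b0, b1, b2 = solve_linear_system(xtx, xty)

residuals = [y - (b0 + b1 * c + b2 * b)
             for y, c, b in zip(lhcy, lclc, lb12)]
print(round(b0, 2), round(b1, 2), round(b2, 2))
```

The fitted b1 and b2 should be close to the assumed values, and the residuals from the fitted plane sum to approximately zero.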
27How Are Coefficients Interpreted?
LHCY = b0 + b1 LCLC + b2 LB12
Outcome: LHCY. Predictors: LCLC, LB12.
LB12 may have both an independent and an indirect (via LCLC) association with LHCY.
[Diagram: LCLC to LHCY (b1 = ?), LB12 to LHCY (b2 = ?), with a correlation between LCLC and LB12]
28Coefficients Meaning of their Values
LHCY = b0 + b1 LCLC + b2 LB12
Outcome: LHCY. Predictors: LCLC, LB12.
LHCY increases by b2 for a 1-unit increase in LB12 if other factors (LCLC) remain constant, or "adjusting for other factors in the model" (LCLC).
May be physiologically impossible to maintain one predictor constant while changing the other by 1 unit.
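The difference between the independent association (b2) and the total association of LB12 with LHCY can be sketched numerically. In the made-up, noise-free example below, LHCY follows the plane exactly with assumed coefficients, and LB12 is correlated with LCLC. The slope from a simple regression of LHCY on LB12 alone then equals b2 plus an indirect part, b1 times the slope of LCLC on LB12.

```python
def simple_slope(x, y):
    """Least-squares slope from regressing y on x alone."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    return sxy / sxx

# Assumed coefficients for illustration only.
b0, b1, b2 = 3.0, -0.5, -0.2

# Made-up predictor values; LCLC tends to rise with LB12.
lb12 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lclc = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
lhcy = [b0 + b1 * c + b2 * b for c, b in zip(lclc, lb12)]

# Total association of LB12 with LHCY, ignoring LCLC.
total_slope = simple_slope(lb12, lhcy)

# Indirect part: LB12 is associated with LCLC, which is associated with LHCY.
lclc_on_lb12 = simple_slope(lb12, lclc)
indirect_part = b1 * lclc_on_lb12

print(round(total_slope, 4), b2, round(indirect_part, 4))
```

Here the unadjusted slope for LB12 is larger in magnitude than b2, because part of the apparent LB12 effect runs through LCLC.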
29Another Example HDL Cholesterol
Output:

            Coefficient   Std Error   t       Pr > |t|
Intercept   1.16448       0.28804     4.04    <.0001
AGE         -0.00092      0.00125     -0.74   0.4602
BMI         -0.01205      0.00295     -4.08   <.0001
BLC         0.05055       0.02215     2.28    0.0239
PRSSY       -0.00041      0.00044     -0.95   0.3436
DIAST       0.00255       0.00103     2.47    0.0147
GLUM        -0.00046      0.00018     -2.50   0.0135
SKINF       0.00147       0.00183     0.81    0.4221
LCHOL       0.31109       0.10936     2.84    0.0051

The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol.
The equation is: Log(HDL) = 1.16 - 0.00092(Age) + ... + 0.311(LCHOL)
www.StatisticalPractice.com
30HDL Example Coefficients
- Interpretation of coefficients on previous slide:
- Need to use entire equation for making predictions.
- Each coefficient measures the difference in expected LHDL between 2 subjects if the factor differs by 1 unit between the two subjects, and if all other factors are the same. E.g., expected LHDL is 0.012 lower in a subject whose BMI is 1 unit greater, but is the same as the other subject on other factors.
Continued
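A small sketch of both points, using the coefficients from the HDL output. The two subjects and all of their predictor values are hypothetical, chosen only to show the arithmetic; the source does not give units or typical values for these variables.

```python
# Coefficients copied from the HDL regression output.
coefficients = {
    "Intercept": 1.16448,
    "AGE": -0.00092,
    "BMI": -0.01205,
    "BLC": 0.05055,
    "PRSSY": -0.00041,
    "DIAST": 0.00255,
    "GLUM": -0.00046,
    "SKINF": 0.00147,
    "LCHOL": 0.31109,
}

def predict_log_hdl(subject):
    """Predicted Log(HDL) using the entire equation."""
    total = coefficients["Intercept"]
    for name, value in subject.items():
        total += coefficients[name] * value
    return total

# Hypothetical subject; every value below is an assumption.
subject_1 = {"AGE": 50, "BMI": 25, "BLC": 1.0, "PRSSY": 120,
             "DIAST": 80, "GLUM": 90, "SKINF": 20, "LCHOL": 2.3}

# Second subject: same on every factor except BMI, which is 1 unit greater.
subject_2 = dict(subject_1, BMI=subject_1["BMI"] + 1)

difference = predict_log_hdl(subject_2) - predict_log_hdl(subject_1)
print(round(difference, 5))
```

Whatever values are chosen for the other factors, the difference between the two predictions equals the BMI coefficient, about -0.012.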
31HDL Example Coefficients
- Interpretation of coefficients two slides back:
- P-values measure the evidence for an association of a factor with Log(HDL), if other factors do not change.
- This is sometimes expressed as "after accounting for other factors" or "adjusting for other factors", and called its independent association.
- SKINF probably is associated with Log(HDL). Its p=0.42 says that it has no additional info to predict Log(HDL), after accounting for other factors such as BMI.