Title: From last time
1. From last time
2. Basic Biostats Topics
- Summary Statistics
  - mean, median, mode
  - standard deviation, standard error
- Confidence Intervals
- Hypothesis Tests
  - t-test (paired and unpaired)
  - Chi-square test
  - Fisher's exact test
3. More Advanced
- Linear Regression
- Logistic Regression
- Repeated Measures Analysis
- Survival Analysis
- Analyzing fMRI data
4. General Biostatistics References
- Practical Statistics for Medical Research. Altman. Chapman and Hall, 1991.
- Medical Statistics: A Common Sense Approach. Campbell and Machin. Wiley, 1993.
- Principles of Biostatistics. Pagano and Gauvreau. Duxbury Press, 1993.
- Fundamentals of Biostatistics. Rosner. Duxbury Press, 1993.
5. Lecture 3: Linear Regression
Child Psychiatry Research Methods Lecture Series
- Elizabeth Garrett
- esg_at_jhu.edu
6. Introduction
- Simple linear regression is most useful for looking at associations between continuous variables.
- We can evaluate whether two variables are linearly associated.
- We can evaluate how well we can predict one of the variables if we know the other.
7. Motivating Example (Tierney et al. 2001)
- Is there an association between total sterol level and ADI scores in autistic children?
- Hypothesis: Children with lower sterol levels will tend to have poorer performance (i.e. higher scores) on the following components of the ADI:
  - social
  - nonverbal
  - repetitive
8. Preliminary Data
- 9 individuals with autism
- Some have been on cholesterol supplementation (7 out of 9)
- Mean age: 14 years
- Age range: 8 - 32 years
- Sterol is a continuous variable
- ADI scores are continuous variables
9. Statistical Language
- Need to choose which variable is the predicted (Y) and which is the predictor (X).
- Y: outcome, dependent variable, endogenous variable
- X: covariate, predictor, regressor, explanatory variable, exogenous variable, independent variable
- Our example?
10. How can we conclude whether or not there is an association between sterol and the ADI scores?
11. One approach: Correlation
- Correlation is a measure of LINEAR association between two variables.
- It takes values from -1 to 1.
- Often notated r or ρ
- r = 1 → perfect positive correlation
- r = -1 → perfect negative correlation
- r = 0 → no correlation
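The correlation coefficient described above can be computed directly from its definition. The sketch below is one minimal way to do it in plain Python; the function name `pearson_r` and the input values are made up for illustration and are not the study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration values, NOT the study data
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # points on an increasing straight line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # points on a decreasing straight line
```

Points that fall exactly on an increasing line give r = 1, and points exactly on a decreasing line give r = -1.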
12. [Figure: four example scatterplots with r = 0.95, r = 0.77, r = -0.95, and r = 0.09]
13. Correlation between ADI measures and Sterol
[Figure: scatterplots of ADI scores against sterol, with r = -0.85, r = -0.70, and r = 0.06]
14. Related to r: R²
- R² = % of variation in Y explained by X.
- Example
  - Correlation between nonverbal score and sterol is -0.85.
  - R² is (-0.85)² ≈ 0.73 (the 0.73 comes from the unrounded r = -0.8541)
  - 73% of the variation in nonverbal score is explained by sterol
  - Gives a sense of the value of sterol in predicting nonverbal score
- Other examples
  - R² between sterol and social is 0.49
  - R² between sterol and repetitive is 0.004
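As a quick check of the arithmetic, R² in simple linear regression is the square of r, so each value can be recovered from the other up to sign. The sketch below uses the unrounded nonverbal correlation from the lecture's correlation matrix (-0.8541) and the two R² values quoted above; the variable names are illustrative only.

```python
import math

# Correlation between nonverbal score and sterol, from the lecture's correlation matrix
r_nonverbal = -0.8541
r2_nonverbal = r_nonverbal ** 2
print(round(r2_nonverbal, 2))  # 0.73

# Going the other way: |r| is the square root of R^2 (the sign is lost when squaring)
abs_r_social = math.sqrt(0.49)
abs_r_repetitive = math.sqrt(0.004)
print(round(abs_r_social, 2), round(abs_r_repetitive, 2))  # 0.7 0.06
```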
15. Simple Linear Regression (SLR) Approach
- (1) Fits best line to describe the association between Y and X (note: straight line)
- (2) Line can be described by two numbers
  - intercept
  - slope
- (3) By-product of regression: correlation measures how close the points fall to the line.
- (4) Why simple? Only one X variable.
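The "best line" in SLR is usually the least-squares line, which has a closed-form slope and intercept. The following is a minimal sketch assuming ordinary least squares; `fit_line` and the data values are hypothetical and chosen only to show the calculation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up illustration values, NOT the study data
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = fit_line(x, y)
print(round(intercept, 2), round(slope, 2))  # 1.06 1.97
```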
16. [Figure: fitted regression line] Intercept = 24.8, Slope = -0.01
17. SLR answers two questions.
- Association?
  - Does nonverbal score tend to decrease on average when sterol increases?
  - Is the slope different from zero?
- Prediction?
  - Can we predict nonverbal score if we know sterol level?
  - Is the correlation (or R²) high?
- You CAN have association with low correlation!
18. Equation of a line
- β0 = Intercept
  - β0 is the estimated nonverbal score if it were possible to have a sterol level of 0 (nonsensical in this case).
  - β0 calibrates the height of the line
- β1 = Slope
  - β1 is the estimated change in nonverbal score for a one unit change in sterol
  - β1 = the estimated difference in nonverbal score comparing two kids whose sterol levels differ by one.
  - We usually use β1 as our measure of association
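To make the roles of the intercept and slope concrete, the sketch below plugs the estimates reported in this lecture's regression output (intercept 24.84349, slope -0.0099066) into the line. The function name and the sterol values 500 and 501 are hypothetical inputs chosen for illustration.

```python
# Estimates reported in the lecture's ADI nonverbal vs. sterol regression output
INTERCEPT = 24.84349   # beta0
SLOPE = -0.0099066     # beta1

def predicted_nonverbal(sterol):
    """Estimated ADI nonverbal score at a given sterol level."""
    return INTERCEPT + SLOPE * sterol

# At sterol = 0 the prediction is just the intercept
print(predicted_nonverbal(0))

# Two hypothetical sterol levels one unit apart differ by exactly the slope
diff = predicted_nonverbal(501) - predicted_nonverbal(500)
print(round(diff, 7))  # -0.0099066
```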
19. The slope, β1
- Is β1 different from zero?
Is each of these reasonable given the data that we have observed?
20. Evaluating Association
- β1 is a statistic, similar to a sample mean, and as such has a precision estimate.
- The precision estimate is called the standard error of β1, denoted se(β1).
- We look at how large β1 is compared to its standard error.
- β1 is often called a regression coefficient or a slope.
21. General Rule
- If |β1 / se(β1)| > 2, then we say that β1 is statistically significantly different from zero.
- T-test interpretation
  - H0: β1 = 0
  - Ha: β1 ≠ 0
  - If |β1 / se(β1)| > 2 is true, then the p-value is less than 0.05.
- Intuition
  - β1 is large compared to its precision → not likely that β1 is 0.
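The comparison of a coefficient with its standard error is a one-line calculation. The sketch below applies it to the sterol coefficient and standard error reported in this lecture's ADI nonverbal regression output; the helper names are hypothetical, and the cutoff of 2 is taken as an assumption from the ± 2 se interval used elsewhere in the lecture.

```python
def t_statistic(coef, se):
    """Ratio of a regression coefficient to its standard error."""
    return coef / se

def exceeds_rule_of_thumb(coef, se, cutoff=2.0):
    """True when the coefficient is more than `cutoff` standard errors from zero."""
    return abs(t_statistic(coef, se)) > cutoff

# Coefficient and standard error for sterol from the lecture's regression output
t_sterol = t_statistic(-0.0099066, 0.0022804)
print(round(t_sterol, 3))                            # -4.344
print(exceeds_rule_of_thumb(-0.0099066, 0.0022804))  # True
```

The result matches the t value shown in the regression output for sterol.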
22. For large samples.
23. ADI Nonverbal and Sterol
------------------------------------------------------------------------------
  nonvrb |      Coef.   Std. Err.       t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totster |  -.0099066   .0022804    -4.344   0.003     -.0152988   -.0045144
   _cons |   24.84349   2.578369     9.635   0.000      18.74661    30.94036
------------------------------------------------------------------------------
R-squared = 0.73
Labels: Outcome = nonvrb; Predictor = totster; β1 = Coef. for totster; se(β1) = Std. Err. for totster; β0 = Coef. for _cons; pvalue = P>|t|
24. Interpretation
- Comparing two autistic kids whose sterol levels differ by 1, we estimate that the one with lower sterol will have an ADI nonverbal score that is higher by 0.01 points.
- Put it in real units
  - Comparing two autistic kids whose sterol levels differ by 200, we estimate that the child with the lower sterol level will have an ADI nonverbal score that is higher by 2 points.
    (Note: 200 × 0.01 = 2.0)
25. A few other details...
- 95% confidence interval interpretation
  - β1 ± 2se(β1) does not include zero.
- β1/se(β1) is called the
  - t-statistic
  - Z-statistic
- If you have a small sample (i.e. fewer than 50 individuals), you need to use a t-correction.
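With only 9 observations the ± 2 se interval is too narrow, which is where the t-correction comes in. The sketch below compares the rough interval with one that uses the t multiplier for 9 - 2 = 7 degrees of freedom. The multiplier 2.365 is hard-coded from a standard t table, and the function name is hypothetical; the coefficient and standard error are the sterol values from the lecture's regression output.

```python
# 97.5th percentile of the t distribution with 7 degrees of freedom (standard t table)
T_CRIT_DF7 = 2.365

def conf_interval(coef, se, multiplier):
    """Interval coef +/- multiplier * se."""
    half_width = multiplier * se
    return coef - half_width, coef + half_width

coef, se = -0.0099066, 0.0022804  # sterol row of the lecture's regression output

rough = conf_interval(coef, se, 2.0)         # large-sample rule of thumb
exact = conf_interval(coef, se, T_CRIT_DF7)  # small-sample t-correction

print([round(v, 4) for v in rough])
print([round(v, 4) for v in exact])  # [-0.0153, -0.0045]
```

The t-corrected interval agrees with the 95% interval printed in the regression output, and neither interval includes zero.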
26. Relationship between correlation and SLR
- Testing that correlation is equal to zero is equivalent to testing that the slope is equal to zero.
- Can have strong association and low correlation
[Figure: two scatterplots, one with r = 0.93, β1 = 1.86, p-value < 0.001 and one with r = 0.55, β1 = 1.88, p-value < 0.001]
27. Additional Points
- (1) Association measured is LINEAR
[Figure: scatterplot with r = 0.02]
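A small made-up example shows why "linear" matters: when Y depends on X perfectly but along a symmetric curve, the correlation can be zero. The data below are invented for illustration (y is exactly x squared), and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y is completely determined by x, but not linearly
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curve = pearson_r(x, y)
print(r_curve)  # 0.0
```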
28. Additional Points
- (2) Difference (i.e. distance) between observed data and fitted line is called a residual, ε.
Residuals for the nine observations: 1. 0.74; 2. -0.95; 3. -2.53; 4. 3.01; 5. 2.52; 6. 0.45; 7. -3.15; 8. -0.07; 9. 0.59
[Figure: fitted line with residuals ε3 and ε5 marked]
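Residuals are simply observed values minus fitted values. The sketch below fits a least-squares line to made-up data (not the study data) and lists the residuals; the helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up illustration values, NOT the study data
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = fit_line(x, y)

# Residual = observed y minus the value on the fitted line
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print([round(e, 2) for e in residuals])  # [0.04, -0.13, 0.2, -0.17, 0.06]
```

For a least-squares line with an intercept, the residuals sum to zero apart from rounding error.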
29. Additional Points
- (3) Often see model equation as
[Equations not transcribed: one form refers to the regression line, one refers to the observed data, and one is a generic form]
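The equations themselves did not survive in this transcript. A common way to write the simple linear regression model in the β0, β1, ε notation used in this lecture is sketched below. This is an assumption about the standard forms, not a transcription of the slide.

```latex
% Regression line: the mean of Y at a given x
\[ E(Y \mid x) = \beta_0 + \beta_1 x \]
% Observed data: the line plus the residual for observation i
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
```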
30. Additional Points
- (4) Spread of points around the line is assumed to be constant (i.e. variance of residuals is constant)
[Figure: example where the spread is not constant, labeled BAD!]
31. Multiple Linear Regression
- More than one X variable
- Generally the same, except
  - Can't make plots in multi-dimensions
  - Interpretation of βs is somewhat different
32. Other ADI and Sterol SLRs
- How is age when supplementation began related to sterol?
- How is age when supplementation began related to nonverbal score?
34. How might this change our previous result?
- What if age when cholesterol supplementation began is associated with both sterol level and nonverbal score?
- Is it correct to conclude that total sterol level is associated with nonverbal score?
[Diagram nodes: Sterol, Nonverbal Score, Supplementation Age]
35. We can adjust!
------------------------------------------------------------------------------
  nonvrb |      Coef.   Std. Err.       t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  sterol |  -.0105816   .0022118    -4.784   0.003     -.0159937   -.0051696
 agester |   .1570626   .1158509     1.356   0.224     -.1264143    .4405394
   _cons |   23.81569   2.551853     9.333   0.000      17.57153    30.05985
------------------------------------------------------------------------------
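Adjusting for a second variable means fitting both predictors in one least-squares model. The sketch below does this from scratch by solving the normal equations. The helper names and all data values are made up for illustration (the outcome is generated exactly from known coefficients so the fit can be checked), and this is not a reconstruction of how the output above was produced.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_mlr(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

# Made-up data generated exactly from y = 20 - 0.01*x1 + 0.5*x2 (NOT the study data)
x1 = [100, 300, 500, 700, 900, 1100]
x2 = [2, 5, 3, 8, 4, 6]
y = [20 - 0.01 * a + 0.5 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_mlr(x1, x2, y)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

Each fitted coefficient describes the change in the outcome for a one unit change in its predictor with the other predictor held fixed.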
36. Interpretation of Betas
- Now that we have adjusted for age at supplementation, we need to include that in our result
- Comparing two kids who began cholesterol supplementation at the same age and whose sterol levels differ by 250 units, we estimate that the child with the lower sterol level will have an ADI nonverbal score higher by about 2.6 points (250 × 0.0106 ≈ 2.6).
- Adjusting for age at supplementation, comparing two kids whose sterol levels differ by 250 units, we estimate ...
- Controlling for age at supplementation ...
- Holding age at supplementation constant ...
37. Collinearity
- If two variables are
  - correlated with each other
  - correlated with the outcome
- Then, when combined in a MLR model, it could happen that
  - neither is significant
  - only one is significant
  - both remain significant
38. ADI and Sterol
- Correlation Matrix
         |   nonvrb   sterol  agester
---------+---------------------------
  nonvrb |   1.0000
  sterol |  -0.8541   1.0000
 agester |   0.0531   0.2251   1.0000
We say that cholesterol time and sterol are collinear.
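A correlation matrix like the one above is just the pairwise correlation computed for every pair of variables. The sketch below builds one in plain Python; the variable names and values are invented for illustration and are not the study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(data):
    """Pairwise correlations for a dict mapping variable name -> list of values."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Made-up illustration values, NOT the study data
data = {
    "outcome": [10, 8, 9, 5, 4, 2],
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
}
matrix = correlation_matrix(data)

# Print the lower triangle, one row per variable
names = list(data)
for i, a in enumerate(names):
    print(a, [round(matrix[a][b], 4) for b in names[: i + 1]])
```

Checking the predictor-by-predictor entries of such a matrix before fitting an MLR model is one simple way to spot predictors that overlap heavily.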
39. Summing up example.
- After adjusting for age at supplementation, it appears that sterol is still a significant predictor of ADI nonverbal score.
- BUT!
  - Only NINE observations! Estimates from so few observations are imprecise, and a larger sample could show a weaker or a stronger association.
  - We haven't controlled for other potential confounders
    - length of time on supplementation
    - nonverbal score prior to supplementation