Title: Statistics 262: Intermediate Biostatistics
Regression Models for Longitudinal Data: Mixed Models
Example with a time-dependent, continuous predictor
Six patients with depression are given a drug that increases levels of a "happy chemical" in the brain. At baseline, all 6 patients have similar levels of this chemical and scores >14 on a depression scale. Researchers measure depression score and brain-chemical levels at three subsequent time points: 2, 3, and 6 months post-baseline. Here are the data in wide form:
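Turn the data to long form
data long4;
  set new4;
  /* one OUTPUT per visit: each wide record becomes four long records */
  time=0; score=time1; chem=chem1; output;
  time=2; score=time2; chem=chem2; output;
  time=3; score=time3; chem=chem3; output;
  time=6; score=time4; chem=chem4; output;
run;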
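For readers working outside SAS, the same wide-to-long step can be sketched in plain Python. This assumes each wide record carries an `id` plus the columns `time1`-`time4` (scores) and `chem1`-`chem4` (chemical levels) used in the data step; the function name `wide_to_long` and the numbers are hypothetical, not the study data.

```python
# Map each wide-format score/chem column pair to its visit time in months,
# mirroring the data step: time1/chem1 -> 0, time2/chem2 -> 2, and so on.
VISITS = [(0, "time1", "chem1"), (2, "time2", "chem2"),
          (3, "time3", "chem3"), (6, "time4", "chem4")]


def wide_to_long(wide_rows):
    """Return one dict per subject-visit (long form) from wide-form dicts."""
    long_rows = []
    for row in wide_rows:
        for time, score_col, chem_col in VISITS:
            # Like each OUTPUT statement: write one record per time point.
            long_rows.append({"id": row["id"], "time": time,
                              "score": row[score_col], "chem": row[chem_col]})
    return long_rows


# Hypothetical wide record for a single subject (made-up numbers).
example = [{"id": 1, "time1": 20, "time2": 18, "time3": 15, "time4": 12,
            "chem1": 1000, "chem2": 1100, "chem3": 1250, "chem4": 1400}]
long_data = wide_to_long(example)
print(len(long_data))
print(long_data[-1])
```

Six wide records would give the 24 long records analyzed below.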
Data in long form
Graphically, let's see what's going on. First, by subject.
All 6 subjects at once
Mean chemical levels compared with mean depression scores
Introduction to Mixed Models
Return to our chemical/score example.
Ignore the chemical for the moment and just ask whether there is a significant change in depression score over time.
Introduction to Mixed Models
A linear regression line for each person
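One way to produce these per-person lines is to run a separate least-squares fit for each subject. The sketch below assumes long-form `(id, time, score)` tuples; `fit_line`, `lines_by_subject` and the records are illustrative, not from the original analysis.

```python
from collections import defaultdict


def fit_line(ts, ys):
    """Ordinary least-squares intercept and slope for one subject."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def lines_by_subject(long_rows):
    """Group long-form (id, time, score) records and fit one line per id."""
    groups = defaultdict(lambda: ([], []))
    for subject, time, score in long_rows:
        groups[subject][0].append(time)
        groups[subject][1].append(score)
    return {s: fit_line(ts, ys) for s, (ts, ys) in groups.items()}


# Hypothetical records: subject 1 improves steadily, subject 2 stays flat.
rows = [(1, 0, 20), (1, 2, 16), (1, 3, 14), (1, 6, 8),
        (2, 0, 30), (2, 2, 30), (2, 3, 30), (2, 6, 30)]
res = lines_by_subject(rows)
print(res)
```

Each subject gets their own intercept and slope; a mixed model treats this spread of intercepts (and possibly slopes) as draws from a distribution instead of estimating each one freely.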
Introduction to Mixed Models
Mixed models have both fixed and random effects. For example:
Introduction to Mixed Models
What is a random effect?
- Rather than assuming there is a single intercept for the population, assume that there is a distribution of intercepts. Every person's intercept is a random variable drawn from a shared normal distribution.
- A random intercept for depression score means that there is some average depression score in the population, but there is variability between subjects.
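In symbols, a random-intercept model for depression score can be written as below. The notation is ours: $b_{0i}$ is subject $i$'s deviation from the population intercept $\beta_0$, and $b_{0i}$ is assumed independent of the errors $\varepsilon_{it}$. Because all of a subject's observations share the same $b_{0i}$, the model implies a common correlation between any two of that subject's scores.

```latex
Y_{it} = (\beta_0 + b_{0i}) + \beta_1\, t_{it} + \varepsilon_{it},
\qquad b_{0i} \sim N(0, \sigma^2_{b}), \quad \varepsilon_{it} \sim N(0, \sigma^2_{e}),

\operatorname{Var}(Y_{it}) = \sigma^2_{b} + \sigma^2_{e}, \qquad
\operatorname{Corr}(Y_{it}, Y_{is}) = \frac{\sigma^2_{b}}{\sigma^2_{b} + \sigma^2_{e}} \quad (t \neq s).
```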
Compare to OLS regression
Compare with ordinary least squares regression (no random effects):
Unexplained variability in Y: least-squares estimation finds the betas that minimize this variance (error), i.e., the sum of squared residuals.
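A small sketch of what least squares does, using hypothetical pooled data (the names `sse` and `ols` are ours): the closed-form estimates give the smallest sum of squared errors, so nudging either coefficient increases it, and dividing that minimum by the n - 2 error degrees of freedom gives the residual variance (MSE).

```python
def sse(b0, b1, ts, ys):
    """Sum of squared vertical distances from the points to y = b0 + b1*t."""
    return sum((y - (b0 + b1 * t)) ** 2 for t, y in zip(ts, ys))


def ols(ts, ys):
    """Closed-form least-squares estimates (intercept, slope)."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    b1 = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
          / sum((t - t_bar) ** 2 for t in ts))
    return y_bar - b1 * t_bar, b1


# Hypothetical pooled data (subject structure ignored, as in OLS).
ts = [0, 2, 3, 6, 0, 2, 3, 6]
ys = [22, 20, 21, 17, 28, 27, 25, 24]
b0, b1 = ols(ts, ys)
best = sse(b0, b1, ts, ys)
# Nudging either coefficient can only increase the sum of squared errors.
for d0, d1 in [(0.5, 0), (-0.5, 0), (0, 0.1), (0, -0.1)]:
    assert sse(b0 + d0, b1 + d1, ts, ys) > best
# Residual variance (MSE): SSE divided by the n - 2 error degrees of freedom.
mse = best / (len(ts) - 2)
print(round(b0, 3), round(b1, 3), round(mse, 3))
```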
Recall: simple linear regression
The standard error of Y given T (the root MSE) is the typical variability of Y around the regression line at any given value of T. It is assumed to be equal at all values of T.
(Figure: Y plotted against T, with the fitted regression line.)
All fixed effects
$$\widehat{\text{score}}_{it} = 24.909 - 0.558\,\text{time}_{it}, \qquad \hat{\sigma}^2_{e} = 59.483$$
The REG Procedure
Model: MODEL1
Dependent Variable: score

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         35.00056      35.00056      0.59   0.4512
Error              22       1308.62444      59.48293
Corrected Total    23       1343.62500

Root MSE            7.71252   R-Square   0.0260
Dependent Mean     23.37500   Adj R-Sq  -0.0182
Coeff Var          32.99473

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             24.90889          2.54500      9.79     <.0001
time         1             -0.55778          0.72714     -0.77     0.4512
Where to find these quantities in the OLS output from SAS (PROC REG)
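The summary quantities in this output are simple functions of the ANOVA table. A quick check in Python, using only numbers copied from the output above (the variable names are ours):

```python
import math

# Values copied from the PROC REG output above.
ss_model, ss_error, df_model, df_error = 35.00056, 1308.62444, 1, 22
dependent_mean = 23.37500
slope, slope_se = -0.55778, 0.72714

ms_error = ss_error / df_error            # Mean Square for Error (residual variance)
root_mse = math.sqrt(ms_error)            # SD of Y around the line ("Root MSE")
r_square = ss_model / (ss_model + ss_error)
f_value = (ss_model / df_model) / ms_error
coeff_var = 100 * root_mse / dependent_mean
t_time = slope / slope_se                 # t**2 equals F (up to rounding) with one predictor

print(round(ms_error, 5), round(root_mse, 5), round(r_square, 4),
      round(f_value, 2), round(coeff_var, 2), round(t_time, 2))
```

With a single predictor, the model F test and the t test for time are the same test, which is why both report p = 0.4512.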
Introduction to Mixed Models
Adding back the random intercept term
Meaning of the random intercept
Introduction to Mixed Models
Covariance Parameter Estimates
Cov Parm   Subject   Estimate
Variance   id         44.6121
Residual              18.9264

Fit Statistics
-2 Res Log Likelihood       146.7
AIC (smaller is better)     152.7
AICC (smaller is better)    154.1
BIC (smaller is better)     152.1

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    24.9089           3.0816    5      8.08     0.0005
time         -0.5578           0.4102   17     -1.36     0.1916
Where to find these quantities in the PROC MIXED output in SAS
Interpretation is the same as with GEE: a 0.5578-point decrease in depression score per month of time.
With a random effect for time, but a fixed intercept
Allowing time slopes to be random
Meaning of a random beta for time
With both random effects
With a random intercept and a random time slope
Meaning of a random beta for time and a random intercept
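With both random effects
With a random intercept and a random time slope
Estimates shown on the slide: fixed intercept 24.9089 and fixed time slope -0.5578 (the same fixed effects as before), and variance estimates 16.6311, 53.0068 and 0.4162.
Additionally, we have to estimate the covariance of the random intercept and the random slope here, -1.9943 (adding random time therefore cost us 2 degrees of freedom: one for the slope variance and one for this covariance).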
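Written out, the model with both random effects adds a random slope $b_{1i}$ whose variance, and whose covariance with $b_{0i}$, must be estimated (notation ours):

```latex
Y_{it} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{it} + \varepsilon_{it},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^2_{b0} & \sigma_{b01} \\ \sigma_{b01} & \sigma^2_{b1} \end{pmatrix} \right),
\quad \varepsilon_{it} \sim N(0, \sigma^2_{e}).
```

The two extra covariance parameters, $\sigma^2_{b1}$ and $\sigma_{b01}$, are the 2 degrees of freedom mentioned above. The implied variance, $\operatorname{Var}(Y_{it}) = \sigma^2_{b0} + 2t\,\sigma_{b01} + t^2\sigma^2_{b1} + \sigma^2_{e}$, now changes with time.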
Choosing the best model
Akaike Information Criterion (AIC): a fit statistic penalized by the number of parameters
- AIC = -2 log likelihood + 2(number of parameters)
- Smaller values indicate better fit and greater parsimony.
- Choose the model with the smallest AIC.
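As a sketch of how the fit statistics in the PROC MIXED outputs in these notes relate to -2 Res Log Likelihood, the function below assumes these formulas: q is the number of covariance parameters, the REML effective sample size used in AICC is the number of observations minus the number of fixed-effect parameters, and BIC uses the number of subjects. These are assumptions on our part, chosen because they reproduce the printed values for the model with chem and for Example 2 to one decimal; the function name is hypothetical.

```python
import math


def reml_fit_statistics(neg2_rll, q, n_obs, p, n_subjects):
    """Information criteria from a REML fit (assumed formulas, see above).

    neg2_rll   -- the '-2 Res Log Likelihood' value
    q          -- number of covariance parameters
    n_obs, p   -- observations and fixed-effect parameters; REML uses n_obs - p
    n_subjects -- number of subjects (used as the BIC sample size)
    """
    n_star = n_obs - p
    aic = neg2_rll + 2 * q
    aicc = neg2_rll + 2 * q * n_star / (n_star - q - 1)
    bic = neg2_rll + q * math.log(n_subjects)
    return aic, aicc, bic


# Random-intercept model with time and chem: 24 observations, 3 fixed effects,
# 2 covariance parameters (intercept variance, residual), 6 subjects.
print([round(v, 1) for v in reml_fit_statistics(143.7, 2, 24, 3, 6)])
# [147.7, 148.4, 147.3]
```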
AICs for the four models
Model                                     AIC
All fixed                               162.2
Intercept random, time slope fixed      150.7
Intercept fixed, time slope random      161.4
All random                              152.7
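Applying the smallest-AIC rule to this table is a one-liner; the dictionary keys below are just our labels for the four rows.

```python
# AIC values copied from the table above.
aics = {
    "All fixed": 162.2,
    "Intercept random, time slope fixed": 150.7,
    "Intercept fixed, time slope random": 161.4,
    "All random": 152.7,
}
best = min(aics, key=aics.get)
for model, aic in sorted(aics.items(), key=lambda kv: kv[1]):
    # Difference from the best (smallest-AIC) model.
    print(f"{model:36s} {aic:6.1f} {aic - aics[best]:+5.1f}")
print("Chosen:", best)
```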
In SAS, to get the model with a random intercept:
proc mixed data=long;
  class id;
  model score = time / s;     /* s: print the fixed-effects solution */
  random int / subject=id;    /* one random intercept per subject */
run;
quit;
Model with chem
proc mixed data=long;
  class id;
  model score = time chem / s;
  random int / subject=id;
run;
quit;
Typically, we take care of the repeated-measures problem by adding a random intercept and stop there, though you can try random effects for predictors and time.
Covariance Parameter Estimates
Cov Parm    Subject   Estimate
Intercept   id         35.5720
Residual               10.2504

Fit Statistics
-2 Res Log Likelihood       143.7
AIC (smaller is better)     147.7
AICC (smaller is better)    148.4
BIC (smaller is better)     147.3

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    38.1287           4.1727    5      9.14     0.0003
time        -0.08163           0.3234   16     -0.25     0.8039
chem        -0.01283         0.003125   16     -4.11     0.0008
The residual variance (10.2504 vs. 18.9264) and the AIC are reduced even further because of the strong explanatory power of the chemical.
Interpretation is the same as with GEE: we cannot separate between-subject and within-subject effects of the chemical.
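To read the chem coefficient, a hypothetical helper `predicted_score` can evaluate the fitted fixed-effects equation with the random intercept set to zero (the population-average line). The chemical levels used are made up, since their units are not given here; a 100-unit difference in chem changes the prediction by 100 * (-0.01283) at any time point. As noted above, this single coefficient mixes between- and within-subject differences.

```python
# Fixed-effect estimates from the PROC MIXED output for the model with chem.
INTERCEPT, B_TIME, B_CHEM = 38.1287, -0.08163, -0.01283


def predicted_score(time, chem):
    """Population-average predicted depression score (random intercept = 0)."""
    return INTERCEPT + B_TIME * time + B_CHEM * chem


# Two hypothetical chemical levels at the same time point (month 3).
diff = predicted_score(3, 1100) - predicted_score(3, 1000)
print(round(diff, 3))   # -1.283
```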
Example 2: time-independent, binary predictor
From GEE: strong effect of time, no group difference, and a non-significant group-by-time trend.
SAS code
proc mixed data=long;
  class id group;
  /* corrb: approximate correlation matrix of the fixed-effect estimates */
  model score = time group time*group / s corrb;
  random int / subject=id;
run;
quit;
Results (random intercept)
Fit Statistics
-2 Res Log Likelihood       138.4
AIC (smaller is better)     142.4
AICC (smaller is better)    143.1
BIC (smaller is better)     142.0

Solution for Fixed Effects
Effect       group   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept             40.8333           4.1934    4      9.74     0.0006
time                  -5.1667           1.5250   16     -3.39     0.0038
group        A         7.1667           5.9303   16      1.21     0.2444
group        B              0                .    .         .          .
time*group   A        -3.5000           2.1567   16     -1.62     0.1242
time*group   B              0                .    .         .          .
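Because SAS sets the parameters for the last CLASS level (group B) to 0, the group A line is obtained by adding its offsets to the reference line. A sketch using the estimates above (the helper name is ours; time is in whatever units Example 2 used):

```python
# Fixed effects from the Example 2 PROC MIXED output. Group B is the
# reference level, so its group and time*group terms are 0.
intercept, b_time, b_group_a, b_time_group_a = 40.8333, -5.1667, 7.1667, -3.5000


def group_line(group):
    """Return (intercept, slope per unit time) implied for one group."""
    if group == "A":
        return intercept + b_group_a, b_time + b_time_group_a
    return intercept, b_time  # group B: reference level


for g in ("A", "B"):
    a, b = group_line(g)
    print(g, round(a, 4), round(b, 4))
```

Group A starts higher (48.0 vs. 40.8333) and declines faster (-8.6667 vs. -5.1667 per unit time), but the slope difference of -3.5 is not significant (p = 0.1242).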
Compare to GEE results
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                          Standard    95% Confidence
Parameter      Estimate      Error        Limits            Z   Pr > |Z|
Intercept       40.8333     5.8516    29.3645   52.3022   6.98     <.0001
group A          7.1667     6.1974    -4.9800   19.3133   1.16     0.2475
group B          0.0000     0.0000     0.0000    0.0000      .          .
time            -5.1667     1.9461    -8.9810   -1.3523  -2.65     0.0079
time*group A    -3.5000     2.2885    -7.9853    0.9853  -1.53     0.1262
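The GEE confidence limits and p-values are normal-theory Wald quantities built from the empirical standard errors. A sketch using Python's standard `statistics.NormalDist` (the helper `wald_summary` is ours):

```python
from statistics import NormalDist

Z = NormalDist()
z_crit = Z.inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def wald_summary(estimate, se):
    """95% Wald confidence limits, Z statistic and two-sided p-value."""
    z = estimate / se
    p = 2 * (1 - Z.cdf(abs(z)))
    return estimate - z_crit * se, estimate + z_crit * se, z, p


# time row of the GEE output: estimate -5.1667, empirical SE 1.9461.
lo, hi, z, p = wald_summary(-5.1667, 1.9461)
print(round(lo, 4), round(hi, 4), round(z, 2), round(p, 4))
```

This reproduces the time row (-8.9810, -1.3523, Z = -2.65, p = 0.0079) to within rounding of the printed estimate and standard error.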
Same coefficient estimates; similar p-values.
A mixed model with a random intercept is equivalent to GEE with an exchangeable working correlation. The standard errors differ slightly here because the GEE output reports empirical (sandwich) standard errors, whereas PROC MIXED reports model-based ones.
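The equivalence rests on the covariance structure a random intercept implies: every pair of a subject's measurements has covariance $\sigma^2_b$ and every variance is $\sigma^2_b + \sigma^2_e$, an exchangeable (compound-symmetry) pattern. A sketch using the estimates from the first random-intercept model (44.6121 and 18.9264) and four visits; the function name is ours:

```python
def random_intercept_cov(var_between, var_within, n_times):
    """Marginal covariance matrix of one subject's repeated measurements."""
    return [[var_between + (var_within if j == k else 0.0)
             for k in range(n_times)] for j in range(n_times)]


# Estimates from the first random-intercept model (4 visits per subject).
V = random_intercept_cov(44.6121, 18.9264, 4)
# All diagonal entries are equal, so dividing by V[j][j] gives the correlation.
corr = [[V[j][k] / V[j][j] for k in range(4)] for j in range(4)]
for row in corr:
    print(" ".join(f"{c:.3f}" for c in row))
```

Every off-diagonal correlation is about 0.702 (= 44.6121 / 63.5385), the intraclass correlation; this is the quantity an exchangeable working correlation estimates in GEE.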
Summary
- GEE and mixed models correct for the dependency of observations within subjects:
  - in GEE analysis, by assuming a working correlation structure;
  - in random-coefficient analysis, by allowing the regression coefficients to vary between subjects.