Econometrics With Eviews Chapter 17 Version 4 Discrete and Limited Dependent Variable Models Part 1: - PowerPoint PPT Presentation

Provided by: HALL2

1
Longitudinal Data Analysis with Discrete and Continuous Responses using PROC MIXED, Part 2:
General Linear Mixed Model; Evaluating Covariance Structures
December 10, 2003, Charlie Hallahan
2

Introduction
  • These notes are based on a 3-day course taught by SAS Institute.
  • Part 1: Introduction to Longitudinal Analysis; Exploratory Analysis
  • Part 2 (this part): General Linear Mixed Model; Evaluating Covariance Structures
  • Part 3: Model Development and Interpretation
  • Part 4: Random Coefficient Models
  • Part 5: Model Assessment
  • Part 6: Generalized Linear Models; Generalized Linear Mixed Models

3

General Linear Model
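The equation on this slide did not survive extraction. As a sketch, the standard general linear model it refers to, written in the notation used later in these notes (y is the response vector, X the fixed-effects design matrix), is:

```latex
\[
y = X\beta + \varepsilon, \qquad \varepsilon \sim N\!\left(0,\ \sigma^2 I\right)
\]
```

The errors are independent with a common variance, which is exactly what OLS assumes.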
4
General Linear Mixed Model
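Likewise, a sketch of the general linear mixed model, written to match the recap V = ZGZ' + R given later in these notes (the symbol γ for the vector of random effects is an assumed notation):

```latex
\[
y = X\beta + Z\gamma + \varepsilon, \qquad
\gamma \sim N(0, G), \quad \varepsilon \sim N(0, R), \quad \operatorname{Cov}(\gamma, \varepsilon) = 0
\]
\[
\operatorname{Var}(y) = V = Z G Z^{\top} + R
\]
```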
5
Fixed and Random Effects
Fixed effects: A variable represents a fixed effect if the levels of that variable included in the study either represent all the possible levels or are the only levels for which inference is to be made. For example, if the variable drug is an explanatory variable with levels A, B, and C and drug is treated as a fixed effect, then any inferences derived from the analysis apply only to those three drug levels and not, for example, to drug D.

Random effects: If the levels of a variable included in a study represent a random sample from a larger population of possible levels, then that variable is treated as a random effect. For example, if four clinics are included in a study and these four clinics are a random sample from a population of clinics, then the variable clinic is treated as a random effect. Inferences derived from the study can then apply to the relevant population of clinics.
6
Model Assumptions in PROC Mixed
  1. Random effects and error terms are normally distributed with means of zero.
  2. Random effects and error terms are independent of each other.
  3. The relationship between the response variable and the predictor variables is linear.
  4. The variance-covariance matrices for the random effects and error terms have structures available in PROC MIXED.
  Note that assumption 1 implies that the response variable must be continuous for PROC MIXED.

7
Linear Mixed Models for Longitudinal Data
8
Linear Mixed Models for Longitudinal Data
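The content of these two slides was lost. A sketch of how the mixed model is usually written for longitudinal data, one block per subject i with n_i observations (the subscripted notation and the number of subjects m are assumptions, not taken from the slides):

```latex
\[
y_i = X_i\beta + Z_i\gamma_i + \varepsilon_i, \qquad
\gamma_i \sim N(0, G), \quad \varepsilon_i \sim N(0, R_i), \qquad i = 1, \dots, m
\]
\[
\operatorname{Var}(y_i) = V_i = Z_i G Z_i^{\top} + R_i
\]
```

Each subject contributes its own n_i × n_i covariance block V_i, so subjects may have different numbers of measurements.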
9
Estimation in Mixed Models
  • The variance-covariance matrix of the observations involves
      - the covariance structure of the random effects, denoted G;
      - the covariance structure of the random errors, denoted R.
  • OLS is no longer the best method, because the OLS assumptions no longer hold.
  • Generalized least squares (GLS)
      - takes into account the covariance structures G and R;
      - requires a reasonable estimate of G and R.
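For concreteness, the GLS estimator referred to above, with V = ZGZ' + R replaced by its estimate (estimated GLS), is:

```latex
\[
\hat\beta = \left(X^{\top}\hat V^{-1}X\right)^{-1} X^{\top}\hat V^{-1} y,
\qquad
\widehat{\operatorname{Var}}(\hat\beta) \approx \left(X^{\top}\hat V^{-1}X\right)^{-1}
\]
```

When V = σ²I this reduces to OLS. The approximate variance treats V̂ as if it were known, which is one reason PROC MIXED offers adjustments such as the Kenward-Roger method used later.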

10
Estimation in General Linear Models versus Mixed Models
GLM: Assumes errors are independent, normally distributed, and with common variance. Estimates parameters and standard errors using OLS.
Mixed Model: Estimates G and R using a likelihood-based method. Estimates parameters and standard errors using estimated GLS.
11
ML versus REML
  • Both are based on the likelihood principle, and the resulting estimators have the properties of consistency, asymptotic normality, and efficiency.
  • REML corrects for the downward bias in the ML estimates of the covariance parameters in R and G.
  • REML handles strong correlations among the responses more effectively.
  • The differences between ML and REML increase as the number of fixed effects in the model increases.
  • REML estimators are less sensitive to outliers than ML estimators.
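A minimal illustration of the downward bias, using the simplest textbook case of independent errors with variance σ² and p fixed-effect parameters estimated from n observations (a special case, not the general mixed-model formula):

```latex
\[
\hat\sigma^2_{\mathrm{ML}} = \frac{(y - X\hat\beta)^{\top}(y - X\hat\beta)}{n},
\qquad
E\!\left[\hat\sigma^2_{\mathrm{ML}}\right] = \frac{n-p}{n}\,\sigma^2 < \sigma^2
\]
\[
\hat\sigma^2_{\mathrm{REML}} = \frac{(y - X\hat\beta)^{\top}(y - X\hat\beta)}{n-p}
\]
```

ML ignores the p degrees of freedom spent estimating β; REML accounts for them. This is also why the ML-REML gap widens as the number of fixed effects p grows.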

12
Mixed Procedure
PROC MIXED DATA=SAS-data-set <options>;
   CLASS variables;
   MODEL response = <fixed-effects> </ options>;
   RANDOM random-effects </ options>;
   REPEATED <repeated-effect> </ options>;
RUN;

The variables listed as fixed effects determine the X matrix. The variables listed as random effects determine the Z matrix. A random intercept is indicated with the keyword INTERCEPT (or INT). The options on the RANDOM statement determine the covariance structure of the random effects, i.e., the matrix G. The REPEATED statement specifies the R matrix in the mixed model; it is the key statement for longitudinal data.
13
Block-Diagonal Covariance Matrix
For repeated measures data, each subject has several measures, usually over time. In this case, an ID variable identifies the subject associated with an observation, and a time variable indicates the various repeated measures for that subject. PROC MIXED is very flexible: it requires neither that each subject have the same number of repeated observations nor that the time points be the same for each subject. The SUBJECT= option in the REPEATED statement determines the block-diagonal structure of the covariance matrix R.
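A sketch of the resulting structure for m subjects, where R_i is the n_i × n_i covariance block for subject i (n_i may differ across subjects):

```latex
\[
R = \operatorname{diag}(R_1, R_2, \dots, R_m) =
\begin{pmatrix}
R_1 & 0 & \cdots & 0 \\
0 & R_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & R_m
\end{pmatrix}
\]
```

Observations from different subjects are uncorrelated; all within-subject correlation lives in the blocks R_i, and the covariance structures below describe the form of each R_i.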
14
Selecting the Appropriate Covariance Structure
Selecting a structure that is too simple inflates the Type I error rate. Selecting a structure that is too complex sacrifices power and efficiency. Therefore, it's important to select a reasonable covariance structure for R.
15
Variance Component (VC) or Simple
The covariance matrix for each subject is σ²I: every measurement has the same variance σ², and all within-subject covariances are zero. This structure assumes that there is no within-subject error correlation, which might be reasonable if the time periods between observations are long enough. This structure is the default for both the RANDOM and REPEATED statements.
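Written out for a subject with four measurements (T = 4 is chosen only for display), the simple structure is:

```latex
\[
R_i = \sigma^2 I_4 =
\begin{pmatrix}
\sigma^2 & 0 & 0 & 0\\
0 & \sigma^2 & 0 & 0\\
0 & 0 & \sigma^2 & 0\\
0 & 0 & 0 & \sigma^2
\end{pmatrix}
\]
```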
16
Compound Symmetry
Assumes equal error correlation regardless of the time distance between observations. This is the assumption in univariate ANOVA.
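A sketch of compound symmetry for T = 4, in the parameterization PROC MIXED reports: a common covariance σ₁ (labeled CS in the output) plus a residual variance σ² (labeled Residual):

```latex
\[
R_i =
\begin{pmatrix}
\sigma_1 + \sigma^2 & \sigma_1 & \sigma_1 & \sigma_1\\
\sigma_1 & \sigma_1 + \sigma^2 & \sigma_1 & \sigma_1\\
\sigma_1 & \sigma_1 & \sigma_1 + \sigma^2 & \sigma_1\\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 + \sigma^2
\end{pmatrix},
\qquad
\operatorname{corr} = \frac{\sigma_1}{\sigma_1 + \sigma^2}
\]
```

There are two parameters whatever the number of time points, and every pair of measurements has the same correlation.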
17
Unstructured Covariance
This is the structure used in multivariate ANOVA. Problems:
  - too many parameters, which can lead to computational difficulties;
  - does not account for any potential trends in the correlations;
  - assumes that all subjects are measured at the same set of occasions, i.e., for every subject i, the 1st, 2nd, 3rd, ... measurements refer to the same time points (the spacing between occasions need not be equal). This assumption is not valid for the CD4 data, for example.
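For T = 4, every variance and covariance is a separate parameter:

```latex
\[
R_i =
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14}\\
\sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24}\\
\sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34}\\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2
\end{pmatrix},
\qquad
\#\text{parameters} = \frac{T(T+1)}{2} = 10 \ \text{for } T = 4
\]
```

With up to 12 measurements per subject, as in the CD4 data, this would already be 12 · 13 / 2 = 78 covariance parameters.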
18
First-order Autoregressive AR(1)
This structure takes into account a common trend in the longitudinal data: the correlation decays as the number of time steps between two measurements grows. It also assumes the time points are equally spaced, which is not the case for the CD4 data. It requires estimating 2 parameters, as opposed to the unstructured assumption, which involves T(T+1)/2 parameters.
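For T = 4, the two parameters σ² and ρ generate the whole matrix:

```latex
\[
R_i = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3\\
\rho & 1 & \rho & \rho^2\\
\rho^2 & \rho & 1 & \rho\\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix},
\qquad
\operatorname{corr}(y_{ij}, y_{ik}) = \rho^{|j-k|}
\]
```

Here j and k index measurement occasions, not clock times, which is why equal spacing is needed: the correlation depends only on how many steps apart two measurements are.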
19
Toeplitz
Generalizes the AR(1) structure by assuming that
observations within a subject that are the same
time-distance apart have the same
correlation. Requires estimation of T
parameters. Assumes the observations are equally
spaced and the correlation structure does not
change appreciably over time.
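For T = 4, each band of the matrix gets its own parameter, with σ₀ the common variance:

```latex
\[
R_i =
\begin{pmatrix}
\sigma_0 & \sigma_1 & \sigma_2 & \sigma_3\\
\sigma_1 & \sigma_0 & \sigma_1 & \sigma_2\\
\sigma_2 & \sigma_1 & \sigma_0 & \sigma_1\\
\sigma_3 & \sigma_2 & \sigma_1 & \sigma_0
\end{pmatrix}
\]
```

That is T = 4 parameters (σ₀, …, σ₃); AR(1) is the special case σ_k = σ²ρ^k.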
20
Spatial Power
The most general structure here (other than unstructured); allows for unequally spaced observations. Mainly used in geostatistical models. Generalizes the AR(1) structure to unequally spaced data. We will see later how to handle heteroskedasticity with the RANDOM statement. Recall that V = ZGZ' + R; so far we have been dealing only with R, through the REPEATED statement.
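For spatial power, the exponent is the actual time distance between two measurements of subject i, rather than the number of steps apart (in TYPE=SP(POW)(time), the variable in parentheses supplies the times t):

```latex
\[
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma^2 \rho^{\,d_{jk}},
\qquad d_{jk} = \left|t_{ij} - t_{ik}\right|
\]
```

With equally spaced times one unit apart, d_jk = |j − k| and the formula collapses to AR(1), which is the sense in which spatial power generalizes it.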
21
Spatial Gaussian
Also allows for unequally spaced
observations. The sample variogram can be used
as a diagnostic tool to select a
covariance structure.
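As a sketch, the Gaussian form documented for TYPE=SP(GAU) in PROC MIXED, with d_jk = |t_ij − t_ik| the time distance between two measurements of subject i:

```latex
\[
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma^2 \exp\!\left(-\frac{d_{jk}^2}{\rho^2}\right)
\]
```

Because the distance enters squared, correlation stays close to 1 for nearby measurements and then drops off faster than under spatial power.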
22
Fitting a Longitudinal Model in PROC MIXED
Recall the data set used in the examples, the CD4 data set. HIV causes AIDS by attacking an immune cell called the CD4 cell, which facilitates the body's ability to fight infection. An uninfected person has approximately 1100 cells per milliliter of blood. Since CD4 cells decrease in number from the time of infection, a person's CD4 cell count can be used to monitor disease progression. A subset of the Multicenter AIDS Cohort Study (1987), with 369 infected men, is used to examine CD4 cell counts over time.
23
Fitting a Longitudinal Model in PROC MIXED
Objectives of the CD4 Cell Numbers Study
  1. Estimate the average time course of CD4 cell depletion.
  2. Estimate the time course for individual men.
  3. Characterize the degree of heterogeneity across men in the rate of progression.
  4. Identify factors that predict CD4 cell changes.
24
Fitting a Longitudinal Model in PROC MIXED
CD4 Data Set Variables
  CD4          CD4 cell count
  time         time in years since HIV became detectable (seroconversion)
  age          age in years relative to an arbitrary origin
  cigarettes   packs of cigarettes smoked per day
  drug         recreational drug use (1 = yes, 0 = no)
  partners     number of partners relative to an arbitrary origin
  depression   CES-D (a depression scale)
  id           subject identification number
The data are unbalanced because the measurements can occur at any time and the number of measurements can vary over subjects.
25
Summary of Exploratory Data Analysis
  1. There seems to be a cubic relationship between CD4 cell count and time.
  2. The group profile plots show a time-by-cigarette-usage interaction.
  3. The cross-sectional plots show a positive relationship between the baseline CD4 cell counts and the baseline cigarette usage.
  4. The longitudinal plots show a positive relationship between the change in CD4 cell counts and the change in the number of partners.
All these results will be helpful in specifying an initial model for PROC MIXED.
26
1st 25 Observations of CD4 Data
Line Listing of CD4 Data

Obs     id    CD4      time    age  cigarettes  drug  partners  depression
  1  10002    548  -0.74196   6.57           0     0         5           8
  2  10002    893  -0.24641   6.57           0     1         5           2
  3  10002    657   0.24367   6.57           0     1         5          -1
  4  10005    464  -2.72964   6.95           0     1         5           4
  5  10005    845  -2.25051   6.95           0     1         5          -4
  6  10005    752  -0.22177   6.95           0     1         5          -5
  7  10005    459   0.22177   6.95           0     1         5           2
  8  10005    181   0.77481   6.95           0     1         5          -3
  9  10005    434   1.25667   6.95           0     1         5          -7
 10  10029    846  -1.24025   2.64           0     1         5          18
 11  10029   1102  -0.74196   2.64           0     1         5          18
 12  10029    801  -0.25188   2.64           0     1         5          38
 13  10029    824   0.25188   2.64           0     1         5           7
 14  10029    866   0.76934   2.64           0     1         5          15
 15  10029    704   1.41273   2.64           0     1         5          21
 16  10029    757   1.80698   2.64           0     1         5          25
 17  10029    726   2.42026   2.64           0     1         5          29
 18  10039   1277  -1.39357  11.28           3     1        -4          -7
 19  10039   1132  -0.72006  11.28           3     0        -2          -5
 20  10039   1454  -0.26010  11.28           3     1        -3          -6
 21  10039    738   0.26010  11.28           3     0        -4          -7
 22  10048    994  -0.30664  17.99           0     1         5          -7
 23  10048    486   0.30664  17.99           0     1         5          -7
 24  10048    605   0.81314  17.99           0     1         5          -5
 25  10048    880   1.09514  17.99           0     1         5           7
27
Cubic Relationship with Time
CD4 count appears to remain constant around 1000
until seroconversion, then rapidly declines
before leveling off again around 500. The
relationship appears to be cubic in nature.
28
Time-Cigarette Interaction
Heavy cigarette smokers experience a more rapid
decline in CD4 count. The gap between the lines
may indicate a time-cigarette interaction.
29
Relating Baseline Counts & Baseline Cigarette Use
Strong positive relationship between baseline CD4 counts and baseline values of the number of packs of cigarettes smoked per day.
30
Relating Change in Counts & Change in Partners
Evidence of a positive relationship between
change in partners and change in baseline CD4
count.
31
Fitting a Longitudinal Model in PROC MIXED
It is a good idea to have all the variables on the same scale to avoid convergence problems.

/* rescale data to avoid computational problems */
data aids;
   set mixed.aids;
   cd4_scale=cd4/100;
run;

The scale of the original data is determined by the exploratory analysis done in Part 1. The next step after the preliminary exploratory analysis is to fit an initial complex linear model.
32
Fitting a Longitudinal Model in PROC MIXED
Because the data are sorted by time within each subject, it is not necessary to declare time as a class variable and include it after the REPEATED statement. In this case, since time has so many distinct values, declaring time as a class variable results in an out-of-memory message. The initial model includes all first-order and second-order interactions, and time is entered as a cubic term.

proc mixed data=aids;
   model cd4_scale=time age cigarettes drug partners depression
                   time*age time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time / solution ddfm=kr;
   repeated / type=cs subject=id r rcorr;
   title 'Longitudinal Model with Compound Symmetry Covariance Structure';
run;
33
Fitting a Longitudinal Model in PROC MIXED
Selected MODEL statement options
  • DDFM=KR performs the degrees-of-freedom calculations proposed by Kenward and Roger (1997).
  • SOLUTION requests estimates for all fixed effects in the model, together with standard errors, t-statistics, and p-values.
Selected REPEATED statement options
  • R requests that the residual covariance matrix, R, be displayed; by default, R is printed for the first subject.
  • RCORR is the same as R, but for the correlation matrix.
  • SUBJECT=ID identifies the subjects in the mixed model.
  • TYPE=CS specifies compound symmetry as the covariance structure.
34
Fitting a Longitudinal Model in PROC MIXED
Model Information Table

Longitudinal Model with Compound Symmetry Covariance Structure

The Mixed Procedure

Model Information
Data Set                     WORK.AIDS
Dependent Variable           cd4_scale
Covariance Structure         Compound Symmetry
Subject Effect               id
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method    Kenward-Roger
35
Fitting a Longitudinal Model in PROC MIXED
Dimensions Table

Dimensions
Covariance Parameters         2
Columns in X                 14
Columns in Z                  0
Subjects                    369
Max Obs Per Subject          12
Observations Used          2376
Observations Not Used         0
Total Observations         2376

Iteration History
Iteration   Evaluations   -2 Res Log Like    Criterion
        0             1    12668.04910184
        1             2    11846.03145506   0.00000217
        2             1    11846.02324942   0.00000000
36
Fitting a Longitudinal Model in PROC MIXED


Estimated R Matrix for Subject 1
Row      Col1      Col2      Col3
  1   12.0198    5.7939    5.7939
  2    5.7939   12.0198    5.7939
  3    5.7939    5.7939   12.0198

Estimated R Correlation Matrix for Subject 1
Row      Col1      Col2      Col3
  1    1.0000    0.4820    0.4820
  2    0.4820    1.0000    0.4820
  3    0.4820    0.4820    1.0000

Covariance Parameter Estimates
Cov Parm   Subject   Estimate
CS         id          5.7939
Residual               6.2259
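The printed matrices follow directly from the two covariance parameters: under compound symmetry the diagonal is CS + Residual, and the common correlation is CS divided by that total.

```latex
\[
\hat\sigma_1 + \hat\sigma^2 = 5.7939 + 6.2259 = 12.0198,
\qquad
\widehat{\operatorname{corr}} = \frac{5.7939}{12.0198} \approx 0.4820
\]
```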
37
Fitting a Longitudinal Model in PROC MIXED
The Mixed Procedure

Fit Statistics
-2 Res Log Likelihood       11846.0
AIC (smaller is better)     11850.0
AICC (smaller is better)    11850.0
BIC (smaller is better)     11857.8

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
 1       822.03       <.0001
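The null-model test compares the fitted structure with the independent-errors model whose -2 Res Log Like appears at iteration 0 of the iteration history. Its degrees of freedom are the 2 covariance parameters of compound symmetry minus the single residual variance of the null model:

```latex
\[
\chi^2 = 12668.04910184 - 11846.02324942 = 822.03 \qquad (1\ \text{df})
\]
```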
38
Fitting a Longitudinal Model in PROC MIXED
Solution for Fixed Effects

Effect             Estimate  Standard Error     DF  t Value  Pr > |t|
Intercept            8.0594          0.2362   1076    34.13    <.0001
time                -1.0430         0.08359   2249   -12.48    <.0001
age                 0.01554         0.01885    375     0.82    0.4104
cigarettes           0.4605         0.07203   1328     6.39    <.0001
drug                 0.1295          0.2017   2339     0.64    0.5209
partners            0.03450         0.02237   2360     1.54    0.1231
depression         -0.02638        0.008662   2326    -3.05    0.0024
time*age           -0.01598        0.004560   2258    -3.50    0.0005
time*depression    0.000784        0.003357   2234     0.23    0.8153
time*partners      -0.00584        0.009560   2230    -0.61    0.5410
time*drug          -0.04277         0.07641   2233    -0.56    0.5757
time*cigarettes     -0.1520         0.02454   2244    -6.19    <.0001
time*time           -0.1518         0.02400   2149    -6.32    <.0001
time*time*time      0.05458        0.006254   2119     8.73    <.0001
39
Fitting a Longitudinal Model in PROC MIXED
Type 3 Tests of Fixed Effects

Effect             Num DF   Den DF   F Value    Pr > F
time                    1     2249    155.67    <.0001
age                     1      375      0.68    0.4104
cigarettes              1     1328     40.88    <.0001
drug                    1     2339      0.41    0.5209
partners                1     2360      2.38    0.1231
depression              1     2326      9.27    0.0024
time*age                1     2258     12.27    0.0005
time*depression         1     2234      0.05    0.8153
time*partners           1     2230      0.37    0.5410
time*drug               1     2233      0.31    0.5757
time*cigarettes         1     2244     38.37    <.0001
time*time               1     2149     39.99    <.0001
time*time*time          1     2119     76.17    <.0001
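Every effect in this table has one numerator degree of freedom, so each Type 3 F statistic is the square of the corresponding t value in the Solution for Fixed Effects (with the same denominator DF and p-value); the small gaps below come from rounding the printed t values:

```latex
\[
(-12.48)^2 \approx 155.75 \ \ (\text{time: } F = 155.67),
\qquad
6.39^2 \approx 40.83 \ \ (\text{cigarettes: } F = 40.88),
\qquad
8.73^2 \approx 76.21 \ \ (\text{time*time*time: } F = 76.17)
\]
```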
40
Fitting a Longitudinal Model in PROC MIXED
Now refit the model using the spatial power covariance structure. Also request the covariance matrix and correlation matrix for the 13th subject.

proc mixed data=aids;
   model cd4_scale=time age cigarettes drug partners depression
                   time*age time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time / solution ddfm=kr;
   repeated / type=sp(pow)(time) local subject=id r=13 rcorr=13;
   title 'Longitudinal Model with Spatial Power Covariance Structure';
run;

Note the LOCAL option on the REPEATED statement. This will be explained later.
41
Fitting a Longitudinal Model in PROC MIXED

Model Information
Data Set                     WORK.AIDS
Dependent Variable           cd4_scale
Covariance Structure         Spatial Power
Subject Effect               id
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method    Kenward-Roger

Dimensions
Covariance Parameters         3   (because of the LOCAL option)
Columns in X                 14
Columns in Z                  0
Subjects                    369
Max Obs Per Subject          12
Observations Used          2376
Observations Not Used         0
Total Observations         2376
42
Fitting a Longitudinal Model in PROC MIXED

Iteration History
Iteration   Evaluations   -2 Res Log Like    Criterion
        0             1    12668.04910184
        1             3    11883.08815296   0.32992483
        2             1    11881.79852820   0.00348677
        3             2    11864.84042331   0.10490545
        4             2    11801.90993395   2.88713335
        5             2    11734.85393060   0.00204795
        6             2    11731.57580732   0.00054912
        7             1    11729.33587289   0.00001849
        8             1    11729.26578521   0.00000003
        9             1    11729.26567357   0.00000000

Convergence criteria met.
43
Fitting a Longitudinal Model in PROC MIXED

Estimated R Matrix for Subject 13
Row    Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9   Col10   Col11   Col12
  1 12.1853  7.2673  6.7060  6.2226  5.7297  5.2850  4.8914  4.5252  4.1739  3.8615  3.5831  3.3050
  2  7.2673 12.1853  7.2487  6.7261  6.1934  5.7126  5.2872  4.8914  4.5117  4.1739  3.8730  3.5724
44
Fitting a Longitudinal Model in PROC MIXED

Estimated R Matrix for Subject 13
Row    Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9   Col10   Col11   Col12
  3  6.7060  7.2487 12.1853  7.2891  6.7118  6.1908  5.7297  5.3008  4.8893  4.5233  4.1972  3.8714
  4  6.2226  6.7261  7.2891 12.1853  7.2332  6.6717  6.1749  5.7126  5.2692  4.8747  4.5233  4.1722
  5  5.7297  6.1934  6.7118  7.2332 12.1853  7.2456  6.7060  6.2040  5.7224  5.2940  4.9124  4.5310
  6  5.2850  5.7126  6.1908  6.6717  7.2456 12.1853  7.2704  6.7261  6.2040  5.7396  5.3258  4.9124
  7  4.8914  5.2872  5.7297  6.1749  6.7060  7.2704 12.1853  7.2673  6.7032  6.2013  5.7543  5.3076
  8  4.5252  4.8914  5.3008  5.7126  6.2040  6.7261  7.2673 12.1853  7.2456  6.7032  6.2199  5.7371
  9  4.1739  4.5117  4.8893  5.2692  5.7224  6.2040  6.7032  7.2456 12.1853  7.2673  6.7434  6.2199
 10  3.8615  4.1739  4.5233  4.8747  5.2940  5.7396  6.2013  6.7032  7.2673 12.1853  7.2891  6.7233
 11  3.5831  3.8730  4.1972  4.5233  4.9124  5.3258  5.7543  6.2199  6.7434  7.2891 12.1853  7.2456
 12  3.3050  3.5724  3.8714  4.1722  4.5310  4.9124  5.3076  5.7371  6.2199  6.7233  7.2456 12.1853
45
Fitting a Longitudinal Model in PROC MIXED
Covariance Parameter Estimates
Cov Parm   Subject   Estimate
Variance   id          7.8554
SP(POW)    id          0.8554
Residual               4.3300
(3 covariance parameters because of the LOCAL option)

Fit Statistics
-2 Res Log Likelihood       11729.3
AIC (smaller is better)     11735.3
AICC (smaller is better)    11735.3
BIC (smaller is better)     11747.0

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
 2       938.78       <.0001
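A sketch of how the three parameters fit together, assuming the parameterization PROC MIXED documents for the LOCAL option: a spatial-power process with variance σ₁² (labeled Variance) plus an independent measurement error with variance σ² (labeled Residual). The diagonal of the subject-13 matrix and the null-model test can then be checked by hand:

```latex
\[
R_i = \sigma_1^2 H_i + \sigma^2 I, \qquad (H_i)_{jk} = \rho^{\,|t_{ij} - t_{ik}|}
\]
\[
\text{diagonal: } 7.8554 + 4.3300 = 12.1854 \approx 12.1853,
\qquad
\chi^2 = 12668.04910184 - 11729.26567357 = 938.78 \ \ (2\ \text{df})
\]
```

The last-digit difference on the diagonal is rounding in the printed estimates; the 2 df are the 3 covariance parameters minus the one variance of the independent-errors null model.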
46
Fitting a Longitudinal Model in PROC MIXED

Solution for Fixed Effects

Effect             Estimate  Standard Error     DF  t Value  Pr > |t|
Intercept            8.0939          0.2434   1100    33.25    <.0001
time                -1.1385          0.1008    991   -11.29    <.0001
age                 0.01736         0.01917    385     0.91    0.3659
cigarettes           0.4203         0.07447   1297     5.64    <.0001
drug                 0.1522          0.2035   2331     0.75    0.4545
partners            0.04586         0.02291   2245     2.00    0.0455
depression         -0.02620        0.008672   2338    -3.02    0.0025
time*age           -0.01451        0.006080    617    -2.39    0.0174
time*depression    0.001513        0.003825   1644     0.40    0.6926
time*partners      -0.01312         0.01061   1790    -1.24    0.2163
time*drug           0.01618         0.08763   1616     0.18    0.8536
time*cigarettes     -0.1383         0.02986   1032    -4.63    <.0001
time*time           -0.1753         0.02760    966    -6.35    <.0001
time*time*time      0.06103        0.006933   1114     8.80    <.0001
47
Fitting a Longitudinal Model in PROC MIXED
48
Fitting a Longitudinal Model in PROC MIXED
49
Evaluating Covariance Structures
Objectives
  • Learn the concepts regarding the sample variogram.
  • Create a plot of the sample variogram.
  • Plot the goodness-of-fit statistics for the appropriate covariance structures.
50
Evaluating Covariance Structures
Importance of Covariance Structures
Covariance structures
  • model all the variability in the data that cannot be explained by the fixed effects;
  • represent the background variability against which the fixed effects are tested;
  • must be carefully selected to obtain valid inferences for the parameters of the fixed effects.
51
Evaluating Covariance Structures
Sources of Error
  • Random effects: reflect how much subject-specific profiles deviate from the average profile (between-subject variability).
  • Serial correlation: usually a decreasing function of the time separation between measurements (within-subject variability).
  • Measurement error: for some measurements, there may be a certain level of variation in the measurement itself.
The question of which is the main source of variation, random effects or serial correlation, is an important topic.
52
Evaluating Covariance Structures
Selecting the Appropriate Covariance Structures
  • Examine OLS residuals to detect serial correlation and random effects.
  • Examine information criteria statistics.
  • Create a scatter plot called the sample variogram.
The sample variogram can be used with irregularly spaced and non-stationary data, unlike the sample autocorrelation function.
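As a sketch of the quantities involved, following the definitions that the variogram and variance macros in this section compute: with residuals r_ij from a model containing only fixed effects, each within-subject pair of measurements (j, k) contributes one point of the sample variogram, and pairs from different subjects estimate the process variance:

```latex
\[
v_{ijk} = \tfrac{1}{2}\left(r_{ij} - r_{ik}\right)^2,
\qquad
u_{ijk} = \left|t_{ij} - t_{ik}\right|
\]
\[
\hat\sigma^2_{\text{process}} =
\operatorname{average}\left\{ \tfrac{1}{2}\left(r_{ij} - r_{lk}\right)^2 : i \neq l \right\}
\]
```

The sample variogram is the scatter of v against u, usually smoothed; because only time differences enter, irregular spacing poses no difficulty.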
53
Evaluating Covariance Structures
54
Evaluating Covariance Structures
55
Evaluating Covariance Structures
[Figure: process variance and serial correlation components of the variogram, plotted against time interval]
56
Evaluating Covariance Structures
[Figure: process variance, serial correlation, and measurement error components of the variogram, plotted against time interval]
57
Evaluating Covariance Structures
[Figure: process variance, random effects, serial correlation, and measurement error components of the variogram, plotted against time interval]
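One common way to formalize the components in these figures is Diggle's model for longitudinal data; the symbols below are assumptions, since only the figure labels survive. Each residual is the sum of a random effect with variance ν², a serially correlated process with variance σ² and correlation function ρ(u), and measurement error with variance τ²:

```latex
\[
\gamma(u) = \tau^2 + \sigma^2\left\{1 - \rho(u)\right\},
\qquad
\operatorname{Var}(Y_{ij}) = \nu^2 + \sigma^2 + \tau^2
\]
```

So the variogram starts near τ² as u approaches 0, rises as the serial correlation ρ(u) decays, and levels off at τ² + σ², leaving a gap of ν² below the process variance.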
58
Evaluating Covariance Structures
[Figure: variogram curves for φ = 1.0, 0.25, and 0.1, plotted against time interval]
59
Evaluating Covariance Structures
[Figure: variogram curves for φ = 1.0, 0.25, and 0.1, plotted against time interval]
60
Evaluating Covariance Structures
Sample Variogram Create a sample variogram
with the AIDS data set. First compile the
variogram and variance macros (listed below).
Then use the variogram macro to create the
dataset varioplot and use the variance macro to
estimate the process variance. PROC LOESS is
then used to fit a nonparametric curve to the
data values in varioplot and output a dataset
with the predicted values. Finally, use PROC
GPLOT to display the sample variogram.
61
Evaluating Covariance Structures
SAS Macro Variogram
%macro variogram(data=,resvar=,clsvar=,expvars=,id=,time=,maxtime=);
   ods listing close;
   /* fixed-effects-only fit; OUTPM= keeps the marginal residuals */
   proc mixed data=&data;
      class &clsvar;
      model &resvar=&expvars / outpm=residuals;
   run;
   ods listing;
   /* number the measurements within each subject */
   data residuals1;
      set residuals;
      by &id;
      if first.&id then timegrp=1;
      else timegrp+1;
   run;
   /* one row of residuals and one row of times per subject */
   proc transpose data=residuals1 out=subject prefix=time;
      var resid &time;
      by &id;
      id timegrp;
   run;
   data variogram_table(keep=variogram)
        time_interval_table(keep=time_interval);
      set subject;
      array time{*} time1-time&maxtime;
      array diff{%eval(&maxtime-1),%eval(&maxtime-1)};
      array timei{%eval(&maxtime-1),%eval(&maxtime-1)};

62
Evaluating Covariance Structures
SAS Macro Variogram (continued)

      if _name_='Resid' then do;
         /* half squared differences of residual pairs within a subject */
         do i=1 to %eval(&maxtime-1);
            do k=i+1 to &maxtime;
               if time(i) ne . and time(k) ne . then do;
                  diff(i,k-1)=((time(i)-time(k))**2)/2;
               end;
            end;
         end;
      end;
      else do;
         /* absolute time differences for the same pairs */
         do i=1 to %eval(&maxtime-1);
            do k=i+1 to &maxtime;
               if time(i) ne . and time(k) ne . then do;
                  timei(i,k-1)=abs(time(i)-time(k));
               end;
            end;
         end;
      end;
      do i=1 to %eval(&maxtime-1);
         do k=i to %eval(&maxtime-1);
            if diff(i,k) ne . then do;
               variogram=diff(i,k);
               output variogram_table;
            end;
            else if timei(i,k) ne . then do;
               time_interval=timei(i,k);
               output time_interval_table;
            end;
         end;
      end;
   run;
   /* pair each half squared difference with its time interval */
   data varioplot;
      merge variogram_table time_interval_table;
   run;
%mend variogram;
63
Evaluating Covariance Structures
SAS Macro Variance
%macro variance(data=,id=,resvar=,clsvar=,expvars=,subjects=,maxtime=);
   ods listing close;
   proc mixed data=&data;
      class &clsvar;
      model &resvar=&expvars / outpm=residuals;
   run;
   ods listing;
   data residuals1;
      set residuals;
      by &id;
      if first.&id then timegrp=1;
      else timegrp+1;
   run;
   /* one row of residuals per subject */
   proc transpose data=residuals1 out=varsubject prefix=time;
      var resid;
      by &id;
      id timegrp;
   run;
   data variance1(keep=diff1-diff%eval(&maxtime*&subjects));
      retain timepts1-timepts%eval(&maxtime*(&subjects+1))
             diff1-diff%eval(&maxtime*&subjects);
      set varsubject end=lastone;
      array time{&maxtime};
      array timepts{%eval(&subjects+1),&maxtime};
      array diff{&subjects,&maxtime};
64
Evaluating Covariance Structures
SAS Macro Variance (continued)
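      /* store each subject's residuals as one row of timepts */
      do i=1 to &maxtime;
         timepts(_n_,i)=time(i);
      end;
      /* after the last subject: half squared differences between residuals of different subjects */
      if lastone=1 then do;
         do i=1 to &subjects;
            do j=1 to &maxtime;
               do k=1 to %eval(&subjects+1)-i;
                  do l=1 to &maxtime;
                     diff(k,l)=((timepts(i,j)-timepts(k+i,l))**2)/2;
                  end;
               end;
               output;
               do k=1 to &subjects;
                  do l=1 to &maxtime;
                     diff(k,l)=.;
                  end;
               end;
            end;
         end;
      end;
   run;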
65
Evaluating Covariance Structures
SAS Macro Variance (continued)
   /* process variance = average of all between-subject half squared differences */
   data average_variance(keep=average total nonmissing);
      array diff{%eval(&maxtime*&subjects)};
      set variance1 end=lastone;
      nonmissing+n(of diff1-diff%eval(&maxtime*&subjects));
      total+sum(of diff1-diff%eval(&maxtime*&subjects));
      if lastone=1 then do;
         average=total/nonmissing;
         output;
      end;
   run;
   proc print data=average_variance;
   run;
%mend variance;
66
Evaluating Covariance Structures
Run the variogram and variance macros and PROC LOESS

%variogram(data=mixed.aids, resvar=cd4_scale, clsvar=,
           expvars=time age cigarettes drug partners depression time*age
                   time*cigarettes time*drug time*partners time*depression
                   time*time time*time*time,
           id=id, time=time, maxtime=12);
%variance(data=mixed.aids, id=id, resvar=cd4_scale, clsvar=,
          expvars=time age cigarettes drug partners depression time*age
                  time*cigarettes time*drug time*partners time*depression
                  time*time time*time*time,
          subjects=369, maxtime=12);
proc loess data=varioplot;
   model variogram=time_interval;
   ods output outputstatistics=stat;
run;

The variogram-based estimate of the process variance is 11.7106:

Obs   nonmissing         total   average
  1      2813683   32950045.18   11.7106
67
Evaluating Covariance Structures
Plot the Variogram
goptions reset=all;
proc gplot data=stat;
   plot depvar*time_interval / vaxis=axis1 haxis=axis2 vref=11.71;
   plot2 pred*time_interval / vaxis=axis1 haxis=axis2;
   symbol value=star color=cyan;
   symbol2 v=none i=sm90s color=blue width=3;
   axis1 order=0 to 30 by 2;
   axis2 order=0 to 6 by .5;
   label time_interval='Time Interval';
   format time_interval f3.1 depvar f4.1 pred f4.1;
   title 'Sample Variogram of CD4 Data';
run;
quit;
68
Evaluating Covariance Structures
Properties of the sample variogram
  1. It doesn't converge to 0 as the time interval approaches 0, implying that measurement error is present.
  2. The slope of the smoothed line is not zero, implying the presence of serial correlation.
  3. The line doesn't reach the asymptote (the process variance), implying the presence of random effects.
69
Evaluating Covariance Structures
Plot the Autocorrelation Function
data stat;
   set stat;
   autocorr=1-(pred/11.71);
run;
goptions reset=all;
proc gplot data=stat;
   plot autocorr*time_interval / vaxis=axis1 haxis=axis2;
   symbol v=none i=sm60s;
   axis1 order=0 to 1 by .1;
   axis2 order=0 to 6 by .5;
   label time_interval='Time Interval';
   format time_interval f3.1 autocorr f4.1;
   title 'Autocorrelation Plot of CD4 Data';
run;
quit;
70
Evaluating Covariance Structures
The Autocorrelation Plot shows that the
correlation within subject decreases from about
0.6 to 0.2 over the range of the data. Thus, the
covariance structure model for this data should
account for serial correlation.
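The autocorrelation in this plot is read off the smoothed variogram through the relation used in the code, ρ̂(u) = 1 − γ̂(u)/σ̂², with σ̂² = 11.71 the (rounded) process variance. The quoted range therefore corresponds to these smoothed variogram values:

```latex
\[
\hat\rho(u) = 1 - \frac{\hat\gamma(u)}{11.71}:
\qquad
\hat\rho = 0.6 \iff \hat\gamma \approx 0.4 \times 11.71 = 4.68,
\qquad
\hat\rho = 0.2 \iff \hat\gamma \approx 0.8 \times 11.71 = 9.37
\]
```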
71
Evaluating Covariance Structures
Information Criteria
  • The Akaike Information Criterion (AIC) tends to choose more complex models.
  • The Schwarz Bayesian Information Criterion (BIC) tends to choose simpler models.
  • Because excessively simple models have inflated Type I error rates, AIC appears to be the most desirable in practice.
Based on simulation studies by Guerin and Stroup (2000), the AIC is preferable, especially when used in conjunction with the Kenward-Roger (KR) method for adjusting the degrees of freedom. On the other hand, using too complex a model reduces power. Guerin and Stroup concluded that the AIC is the best compromise. For small samples, use the AICC, which adds a small-sample correction to the AIC.
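To make the comparison concrete, here is a sketch of how PROC MIXED forms these criteria under REML, assuming its documented definitions: −2ℓ_R is the printed −2 Res Log Likelihood, q the number of covariance parameters, s = 369 the number of subjects, and n* = N − p = 2376 − 14 = 2362 (X of full rank 14). Plugging in the compound symmetry and spatial power (LOCAL) fits from earlier reproduces their printed Fit Statistics:

```latex
\[
\mathrm{AIC} = -2\ell_R + 2q, \qquad
\mathrm{AICC} = -2\ell_R + \frac{2q\,n^*}{n^* - q - 1}, \qquad
\mathrm{BIC} = -2\ell_R + q \ln s
\]
\[
\text{CS } (q = 2):\quad \mathrm{AIC} = 11846.02 + 4 = 11850.0,
\qquad \mathrm{BIC} = 11846.02 + 2(5.911) = 11857.8
\]
\[
\text{SP(POW) with LOCAL } (q = 3):\quad \mathrm{AIC} = 11729.27 + 6 = 11735.3,
\qquad \mathrm{BIC} = 11729.27 + 3(5.911) = 11747.0
\]
```

With n* this large, the AICC correction is negligible (AICC equals AIC to one decimal), and the roughly 115-point drop in −2ℓ_R far outweighs the penalty for the extra parameter.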
72
Evaluating Covariance Structures
Selecting Covariance Structures
When selecting a covariance structure for a model, only structures that make sense for the data should be considered. The table below lists the criteria for each structure. Since the CD4 data are unbalanced and unequally spaced, only compound symmetry and the spatial covariance structures are appropriate.

Structure            Equal Spacing   Unequal Spacing   Different Time Points Across Subjects
Compound Symmetry    Yes             Yes               Yes
Unstructured         Yes             Yes               No
AR(1)                Yes             No                No
Toeplitz             Yes             No                No
Spatial Structures   Yes             Yes               Yes
73
Evaluating Covariance Structures
Selecting Covariance Structures
The following program plots the AIC, AICC, and BIC for the compound symmetry and spatial covariance structures for the CD4 data.

ods output clear;
ods listing close;
ods output fitstatistics(match_all persist=proc)=modstat;
proc mixed data=mixed.aids;
   model cd4_scale=time age cigarettes drug partners depression time*age
                   time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time;
   repeated / type=cs subject=id;
run;
proc mixed data=mixed.aids;
   model cd4_scale=time age cigarettes drug partners depression time*age
                   time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time;
   repeated / type=sp(pow)(time) local subject=id;
run;
74
Evaluating Covariance Structures
proc mixed data=mixed.aids;
   model cd4_scale=time age cigarettes drug partners depression time*age
                   time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time;
   repeated / type=sp(lin)(time) local subject=id;
run;
proc mixed data=mixed.aids;
   model cd4_scale=time age cigarettes drug partners depression time*age
                   time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time;
   repeated / type=sp(exp)(time) local subject=id;
run;
proc mixed data=mixed.aids;
   model cd4_scale=time age cigarettes drug partners depression time*age
                   time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time;
   repeated / type=sp(gau)(time) local subject=id;
run;
proc mixed data=mixed.aids;
   model cd4_scale=time age cigarettes drug partners depression time*age
                   time*depression time*partners time*drug time*cigarettes
                   time*time time*time*time;
   repeated / type=sp(sph)(time) local subject=id;
run;
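The five spatial structures fitted in this program differ only in how the covariance decays with the time distance d = |t_ij − t_ik|. As a sketch, using the forms given in the PROC MIXED documentation for TYPE=SP(...), where 1(·) is an indicator function:

```latex
\[
\text{SP(POW)}: \ \sigma^2\rho^{d}
\qquad
\text{SP(EXP)}: \ \sigma^2\exp(-d/\rho)
\qquad
\text{SP(GAU)}: \ \sigma^2\exp(-d^2/\rho^2)
\]
\[
\text{SP(LIN)}: \ \sigma^2(1-\rho d)\,1(\rho d \le 1)
\qquad
\text{SP(SPH)}: \ \sigma^2\left[1 - \frac{3d}{2\rho} + \frac{d^3}{2\rho^3}\right] 1(d \le \rho)
\]
```

With LOCAL, each structure also gets a separate measurement-error (nugget) variance added to the diagonal. Note that SP(POW) and SP(EXP) describe the same family, since ρ^d = exp(d ln ρ), which helps explain why they fit comparably.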
75
Evaluating Covariance Structures
ods listing;
data model_fit;
   length model $ 7 type $ 4;
   set modstat  (in=cs)  modstat1 (in=pow) modstat2 (in=lin)
       modstat3 (in=exp) modstat4 (in=gau) modstat5 (in=sph);
   /* keep only the AIC, AICC, and BIC rows */
   if substr(descr,1,1) in ('A','B');
   if substr(descr,1,3)='AIC' then type='AIC';
   if substr(descr,1,4)='AICC' then type='AICC';
   if substr(descr,1,3)='BIC' then type='BIC';
   if cs then model='CS';
   if pow then model='SpPow';
   if lin then model='SpLin';
   if exp then model='SpExp';
   if gau then model='SpGau';
   if sph then model='SpSph';
run;
76
Evaluating Covariance Structures
goptions reset=all;
proc gplot data=model_fit;
   plot value*model=type;
   symbol1 value=star color=blue;
   symbol2 value=circle color=red;
   symbol3 value=dot color=green;
   label model='Covariance Structure';
   title 'Model Fit Statistics by Covariance Structure';
run;
quit;
77
Evaluating Covariance Structures
CS and Spatial Gaussian seem to be inferior. Any
of the other spatial structures would be
appropriate in this case. Note that CS ignores
serial correlation.
78
Evaluating Covariance Structures
Summary of Selecting Covariance Structures
  • Results from the sample variogram indicate that measurement error, serial correlation, and error associated with random effects are evident in the model.
  • Spatial exponential, spatial linear, spatial power, and spatial spherical all seem to have the best model fit statistics.
  • Spatial power is the selected covariance structure.