Econometrics With Eviews Chapter 17 Version 4 Discrete and Limited Dependent Variable Models Part 1: - PowerPoint PPT Presentation

1
Longitudinal Data Analysis with Discrete and Continuous Responses using PROC MIXED, Part 1
Introduction to Longitudinal Data Analysis; Exploratory Analysis
October 15, 2003 Charlie Hallahan
2

Introduction
  • These notes are based on a 3-day course taught by SAS Institute.
  • Part 1: Introduction to Longitudinal Analysis; Exploratory Analysis
  • Part 2: General Linear Mixed Model; Evaluating Covariance Structures
  • Part 3: Model Development and Interpretation
  • Part 4: Random Coefficient Models
  • Part 5: Model Assessment
  • Part 6: Generalized Linear Models; Generalized Linear Mixed Models

3
Longitudinal Data Analysis Concepts
Goal: show how PROC MIXED can estimate panel data models. Longitudinal (or panel) data consist of measurements on subjects taken over time. Panel data analysis can distinguish changes over time within a subject from differences between subjects at their baseline (initial) levels. Typically there is a large number of cross-section units and fewer time points.
4
Longitudinal Data Analysis Concepts
Given two variables Y and X, the relationship between Y and X could appear to be positive in a cross-sectional study, while with panel data the relationship between Y and X for the i-th subject over time could vary dramatically across subjects.
Examples of economic panel datasets:
  • PSID: Panel Study of Income Dynamics, www.isr.umich.edu/src/psid/
  • NLS: National Longitudinal Surveys of Labor Market Experience, www.bls.gov/nlshome.htm
Other examples are given in Econometric Analysis of Panel Data by Badi Baltagi.

5
Longitudinal Data Analysis Concepts
The assumption of independent observations is usually not appropriate for longitudinal data:
  • Observations within a subject are usually correlated over time.
  • Observations within a subject are usually more similar than observations between subjects.
  • Variances within subjects can change over time.
6
Longitudinal Data Analysis Concepts
7
Longitudinal Data Analysis Concepts
Effect of using OLS variance assumptions on the standard error estimates for the coefficients of explanatory variables:
  • Time-independent predictor variables (e.g., gender, region): if observations within a subject are positively correlated (the usual case), standard error estimates are underestimated and Type I error rates are inflated.
  • Time-dependent predictor variables (e.g., income, blood pressure): if observations within a subject are positively correlated (the usual case), standard error estimates are overestimated and Type II error rates are inflated.
cf. Dunlop, The American Statistician, 1994, Vol. 48, 299-303
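A standard worked case shows why the two kinds of predictor err in opposite directions. Assume, as a simplification that is not part of the course notes, a simple regression on one centered predictor x, a balanced design with m measurements per subject, a common variance σ², and the same correlation ρ > 0 between any two measurements on a subject. The naive OLS formula reports approximately σ²/S_xx, but the actual sampling variance of the OLS slope is

$$
\operatorname{Var}\bigl(\hat\beta_{\mathrm{OLS}}\bigr)=
\begin{cases}
\dfrac{\sigma^2}{S_{xx}}\,\bigl[1+(m-1)\rho\bigr], & x_{ij}=x_i \text{ for all } j \text{ (time-independent)},\\[2ex]
\dfrac{\sigma^2}{S_{xx}}\,(1-\rho), & \sum_{j} x_{ij}=0 \text{ for every } i \text{ (purely within-subject)},
\end{cases}
\qquad S_{xx}=\sum_{i}\sum_{j} x_{ij}^2 .
$$

With ρ > 0 the factor 1 + (m − 1)ρ exceeds 1, so the reported standard error is too small; the factor 1 − ρ is below 1, so the reported standard error is too large. A predictor that varies both between and within subjects falls between these two extremes.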
8
Longitudinal Data Analysis Concepts
Statistical Methods for a Continuous Response in SAS
1. Univariate ANOVA in PROC GLM
2. Multivariate ANOVA in PROC GLM
3. Multiple Linear Regression in PROC REG
4. Linear mixed model in PROC MIXED
9
Longitudinal Data Analysis Concepts
1. Univariate ANOVA in PROC GLM
  • Assumes all measurements have equal variance at all times and that all pairs of measurements within a subject are equally correlated. Since correlations usually decline over time, this assumption is unlikely to hold for longitudinal data.
  • Data structure: the number of observations equals the total number of measurements taken on all subjects.
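The equal-variance, equal-correlation assumption is the compound-symmetry covariance structure. For a subject measured at four times (a size chosen only for illustration), it can be written as

$$
\Sigma_i = \sigma^2
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}.
$$

Every pair of times shares the same correlation ρ no matter how far apart the measurements are, whereas longitudinal data usually show correlations that decay as the time lag grows.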

10
Longitudinal Data Analysis Concepts
2. Multivariate ANOVA in PROC GLM
The data structure has the subjects as rows and the time points in the study as columns. For example, with five time periods there will be five dependent variables. Multivariate ANOVA assumes that the data are balanced, that is, that all subjects have the same number of measurements and that they all occur at the same times. This balance assumption is unreasonable for most observational longitudinal studies. Multivariate ANOVA also assumes an unstructured within-subject covariance matrix that allows a unique correlation between each pair of time points. For five time periods this leads to 10 distinct covariances (15 covariance parameters in all once the five variances are included). A simpler covariance structure is sufficient for most longitudinal models.
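Counting the parameters of an unstructured covariance matrix is simple arithmetic: with t time points there are t variances and one covariance for each distinct pair of times.

$$
\underbrace{t}_{\text{variances}} + \underbrace{\frac{t(t-1)}{2}}_{\text{covariances}} = \frac{t(t+1)}{2},
\qquad t = 5:\quad 5 + 10 = 15 .
$$

The count grows quadratically in t, which is why a more parsimonious structure is attractive when there are many time points.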
11
Longitudinal Data Analysis Concepts
4. Linear Mixed Models using PROC MIXED
The data structure is similar to that of univariate ANOVA, where the number of observations equals the number of measurements for all subjects. In other words, the data do not have to be balanced, and each subject can have measurements at different times. All the data are used in the analysis, unlike the complete-case analysis of multivariate ANOVA. The linear mixed model directly models the within-subject covariance structure and is very flexible and parsimonious.
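As a minimal sketch (not taken from the course; the mean model and covariance structure are illustrative assumptions), PROC MIXED can fit the long-format CD4 data directly, with one row per measurement and unequal numbers of rows per subject. The REPEATED statement models the within-subject covariance; TYPE=SP(POW)(time) is a spatial power structure in which the correlation between two measurements is ρ raised to the distance between their actual measurement times, which suits irregularly spaced data.

```sas
/* Illustrative only: the mean model and covariance structure are assumptions */
proc mixed data=mixed.aids method=reml;
  class id;
  /* Long format: one row per measurement, unequal counts per subject */
  model cd4 = time / solution;
  /* Within-subject covariance: corr(t_j, t_k) = rho**|t_j - t_k| */
  repeated / subject=id type=sp(pow)(time);
run;
```

The same data set is used without reshaping, which is the practical advantage over the multivariate ANOVA layout.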
12
Longitudinal Data Analysis Concepts
Model-building Strategies
1. Conduct an exploratory data analysis looking at the cross-sectional and longitudinal relationships in the data.
2. Fit a complex mean model and examine the OLS residuals.
3. Use the OLS residuals to construct a sample variogram and help select a covariance structure.
4. Simplify the complex mean model.
5. Evaluate model diagnostics and identify potential outliers.
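Steps 2 and 3 can be sketched in SAS as follows. This is an illustration under assumptions: the cubic-in-time mean model is only a placeholder for the "complex mean model" (the notes do not specify one), and the data set and variable names resids, resid, vario, lag, and halfsqdiff are hypothetical. The sample variogram plots half the squared difference between two residuals from the same subject against the time lag between them.

```sas
/* Step 2: fit a placeholder complex mean model by OLS and keep residuals */
proc glm data=mixed.aids;
  model cd4 = time time*time time*time*time
              age cigarettes drug partners depression;
  output out=resids r=resid;
run;
quit;

/* Step 3: every within-subject pair of residuals and its time lag */
proc sql;
  create table vario as
  select a.id,
         abs(a.time - b.time) as lag,
         0.5 * (a.resid - b.resid) * (a.resid - b.resid) as halfsqdiff
  from resids as a, resids as b
  where a.id = b.id and a.time < b.time;
quit;

/* Smoothed sample variogram: half squared differences against lag */
proc sgplot data=vario;
  loess x=lag y=halfsqdiff;
  xaxis label='Time lag (years)';
  yaxis label='Half squared difference of residuals';
run;
```

A smooth that rises with the lag suggests correlation that decays over time, pointing toward a structure such as spatial power rather than compound symmetry.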
13
Exploratory Data Analysis
Objectives
1. Graph individual and group profiles.
2. Identify cross-sectional and longitudinal patterns.
14
Exploratory Data Analysis
CD4 Data Set
HIV causes AIDS by attacking an immune cell called the CD4 cell, which facilitates the body's ability to fight infection. An uninfected person has approximately 1100 cells per milliliter of blood. Since CD4 cells decrease in number from the time of infection, a person's CD4 cell count can be used to monitor disease progression. A subset of the Multicenter AIDS Cohort Study (1987) with 369 infected men is used to examine CD4 cell counts over time.
15
Exploratory Data Analysis
Objectives of the CD4 Cell Numbers Study
1. Estimate the average time course of CD4 cell depletion.
2. Estimate the time course for individual men.
3. Characterize the degree of heterogeneity across men in the rate of progression.
4. Identify factors that predict CD4 cell changes.
16
Exploratory Data Analysis
CD4 Data Set Variables
  • CD4: CD4 cell count
  • time: time in years since HIV became detectable (seroconversion)
  • age: age in years relative to an arbitrary origin
  • cigarettes: packs of cigarettes smoked per day
  • drug: recreational drug use (1=yes, 0=no)
  • partners: number of partners relative to an arbitrary origin
  • depression: CES-D (a depression scale)
  • id: subject identification number
The data are unbalanced because the measurements can occur at any time and the number of measurements can vary across subjects.
17
Exploratory Data Analysis
Individual and Group Profiles
options nodate;
proc print data=mixed.aids(obs=25);
  var id cd4 time age cigarettes drug partners depression;
  title 'Line Listing of CD4 Data';
run;
18
Exploratory Data Analysis
Line Listing of CD4 Data

Obs     id    CD4      time     age  cigarettes  drug  partners  depression
  1  10002    548  -0.74196    6.57           0     0         5           8
  2  10002    893  -0.24641    6.57           0     1         5           2
  3  10002    657   0.24367    6.57           0     1         5          -1
  4  10005    464  -2.72964    6.95           0     1         5           4
  5  10005    845  -2.25051    6.95           0     1         5          -4
  6  10005    752  -0.22177    6.95           0     1         5          -5
  7  10005    459   0.22177    6.95           0     1         5           2
  8  10005    181   0.77481    6.95           0     1         5          -3
  9  10005    434   1.25667    6.95           0     1         5          -7
 10  10029    846  -1.24025    2.64           0     1         5          18
 11  10029   1102  -0.74196    2.64           0     1         5          18
 12  10029    801  -0.25188    2.64           0     1         5          38
 13  10029    824   0.25188    2.64           0     1         5           7
 14  10029    866   0.76934    2.64           0     1         5          15
 15  10029    704   1.41273    2.64           0     1         5          21
 16  10029    757   1.80698    2.64           0     1         5          25
 17  10029    726   2.42026    2.64           0     1         5          29
 18  10039   1277  -1.39357   11.28           3     1        -4          -7
 19  10039   1132  -0.72006   11.28           3     0        -2          -5
 20  10039   1454  -0.26010   11.28           3     1        -3          -6
 21  10039    738   0.26010   11.28           3     0        -4          -7
 22  10048    994  -0.30664   17.99           0     1         5          -7
 23  10048    486   0.30664   17.99           0     1         5          -7
 24  10048    605   0.81314   17.99           0     1         5          -5
 25  10048    880   1.09514   17.99           0     1         5           7
19
Exploratory Data Analysis
proc means data=mixed.aids n min max mean median std;
  var cd4 time age cigarettes drug partners depression;
  title 'Descriptive Statistics for CD4 Data';
run;
20
Exploratory Data Analysis

Descriptive Statistics for CD4 Data

The MEANS Procedure

Variable       N       Minimum       Maximum          Mean        Median       Std Dev
--------------------------------------------------------------------------------------
CD4         2376    10.0000000       3184.00   765.1313131   701.5000000   399.3715606
time        2376    -2.9897330     5.4592740     0.8284246     0.7296370     1.8782282
age         2376   -11.2900000    29.0800000     2.6359512     1.5100000     7.5039253
cigarettes  2376             0     4.0000000     0.9890572             0     1.4389639
drug        2376             0     1.0000000     0.7558923     1.0000000     0.4296474
partners    2376    -5.0000000     5.0000000    -0.0340909    -1.0000000     3.6588315
depression  2376    -7.0000000    49.0000000     2.4957912             0     9.5863051
--------------------------------------------------------------------------------------
CD4 has a range of 10 to 3184. Time ranges from about 3 years before seroconversion to about 5.5 years after. About 76% of the observations report some drug use.
21
Exploratory Data Analysis
Descriptive statistics by subject
proc means data=mixed.aids noprint nway;
  class id;
  var cd4 age cigarettes drug partners depression;
  output out=subject mean=avgid_cd4 avgid_age avgid_cigarettes avgid_drug
         avgid_partners avgid_depression;
run;

data subject;
  set subject;
  druguse = (avgid_drug gt 0);   /* 1 if the subject ever reported drug use */
run;

proc means data=subject n min max mean median std;
  var _freq_ avgid_cd4 avgid_age avgid_cigarettes avgid_drug
      druguse avgid_partners avgid_depression;
  title 'Descriptive Statistics for CD4 Data Aggregated by Subject';
run;
22
Exploratory Data Analysis

Descriptive Statistics for CD4 Data Aggregated by Subject

The MEANS Procedure

Variable            N       Minimum       Maximum          Mean        Median       Std Dev
--------------------------------------------------------------------------------------------
_FREQ_            369     1.0000000    12.0000000     6.4390244     6.0000000     2.7141327
avgid_cd4         369   245.6000000       1979.50   773.7727088   731.4000000   290.4222593
avgid_age         369   -11.2900000    29.0800000     2.3844444     1.3400000     7.4355005
avgid_cigarettes  369             0     4.0000000     1.0914632             0     1.3870508
avgid_drug        369             0     1.0000000     0.7728360     1.0000000     0.3373715
druguse           369             0     1.0000000     0.9024390     1.0000000     0.2971230
avgid_partners    369    -4.5555556     5.0000000     0.1284397    -0.4000000     2.7544843
avgid_depression  369    -6.8333333    40.5000000     2.5154360     1.0000000     7.7397510
--------------------------------------------------------------------------------------------
The 369 subjects have between 1 and 12 observations each. The average of the subject-level mean CD4 counts was 774, with a range of 246 to 1980 across subjects. About 90% of the subjects reported some drug use. The subjects show a wide range on the depression scale, from -6.8 to 40.5.
23
Exploratory Data Analysis
Plot individual profiles
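goptions reset=all;
proc gplot data=mixed.aids;
  plot cd4*time=id / haxis=-3 to 5.5 by 0.5
                     vaxis=0 to 3500 by 500;
  symbol v=none repeat=369 i=join color=blue;
  label time='Years since Seroconversion';
  title 'Individual Profiles of the CD4 Data';
run;
quit;
Because the SYMBOL statement specifies a color, it would otherwise apply to only one line; repeat=369 reuses it for all 369 subject profiles, so every profile is drawn in the same color instead of cycling through multiple colors.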
24
Exploratory Data Analysis
Hard to interpret, too many subjects
25
Exploratory Data Analysis
Superimpose a dark smoothing spline showing the average response on a light background of the individual responses (a suggestion of Tufte).
goptions reset=all;
proc gplot data=mixed.aids;
  plot cd4*time=id / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  plot2 cd4*time / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  symbol1 v=none repeat=369 i=join color=cyan;
  symbol2 v=none i=sm50s color=blue width=3;
  label time='Years since Seroconversion';
  title 'Individual Profiles with the Average Trend Line';
run;
quit;
26
Exploratory Data Analysis
CD4 count appears to remain constant around 1000
until seroconversion, then rapidly declines
before leveling off again around 500. The
relationship appears to be cubic in nature.
27
Exploratory Data Analysis
Individual and Group Profiles with Drug Usage Subgroups
proc format;
  value druggrp 0='no recreational drug use'
                1='recreational drug use';
run;
goptions reset=all;
proc gplot data=mixed.aids;
  plot cd4*time=id / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  plot2 cd4*time=drug / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  symbol1 v=none repeat=369 i=join color=cyan;
  symbol2 v=none i=sm50s color=blue width=3 line=1;
  symbol3 v=none i=sm50s color=red width=3 line=3;
  label time='Years since Seroconversion';
  format drug druggrp.;
  title 'Individual Profiles with Drug Usage Subgroups';
run;
quit;
28
Exploratory Data Analysis
Drug use does not affect CD4 count very much.
29
Exploratory Data Analysis
Individual and Group Profiles with Cigarette Usage Subgroups
First collapse the levels of cigarette usage into 3 groups.
data mixed.aids;
  set mixed.aids;
  /* 1 = non-smoker, 2 = 1 to 2 packs per day, 3 = more than 2 packs per day */
  ciggroup = 1*(cigarettes = 0)
           + 2*(1 le cigarettes le 2)
           + 3*(2 lt cigarettes le 4);
run;
proc freq data=mixed.aids;
  tables ciggroup;
  title 'Frequency Table of Cigarette Group';
run;
Frequency Table of Cigarette Group

                                  Cumulative    Cumulative
ciggroup   Frequency   Percent     Frequency       Percent
-----------------------------------------------------------
       1        1515     63.76          1515         63.76
       2         278     11.70          1793         75.46
       3         583     24.54          2376        100.00
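Because the grouping depends on the boundaries in the assignment statement, a quick cross-tabulation (a suggested check, not part of the original notes) shows exactly which cigarette values landed in each group. A ciggroup of 0 would flag a value missed by every range, and a value above 3 would flag overlapping ranges, since the indicator terms are added together.

```sas
/* Check which original cigarette values map to each ciggroup level */
proc freq data=mixed.aids;
  tables ciggroup*cigarettes / list missing;
  title 'Check of Cigarette Group Assignment';
run;
```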
30
Exploratory Data Analysis
proc format;
  value cgroup 1='non-smoker'
               2='light to moderate'
               3='heavy';
run;
goptions reset=all;
proc gplot data=mixed.aids;
  plot cd4*time=id / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  plot2 cd4*time=ciggroup / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  symbol1 v=none repeat=369 i=join color=cyan;
  symbol2 v=none i=sm50s color=black width=3 line=1;
  symbol3 v=none i=sm50s color=blue width=3 line=2;
  symbol4 v=none i=sm50s color=red width=3 line=3;
  format ciggroup cgroup.;
  label time='Years since Seroconversion';
  title 'Individual Profiles with Cigarette Usage Subgroups';
run;
quit;
31
Exploratory Data Analysis
Heavy cigarette smokers experience a more rapid decline in CD4 count. The changing gap between the lines may indicate a time-by-cigarette-usage interaction.
32
Exploratory Data Analysis
Individual and Group Profiles by Age Subgroups
First collapse age into 4 quartiles using PROC RANK.
proc rank data=mixed.aids groups=4 out=ageranks;
  var age;
  ranks agegroup;
run;
proc format;
  value quartile 0='1st quartile'
                 1='2nd quartile'
                 2='3rd quartile'
                 3='4th quartile';
run;
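PROC RANK with GROUPS=4 numbers the groups 0 to 3, which is why the QUARTILE format starts at 0. To see the age cutpoints it chose, one can (as an optional check, not part of the original notes) summarize age within each group. The ranking is over all 2,376 measurements rather than over the 369 subjects, so subjects with more visits carry more weight in the cutpoints; because age is constant within a subject, tied values keep all of a subject's rows in the same group.

```sas
/* Age range covered by each quartile group created by PROC RANK */
proc means data=ageranks n min max;
  class agegroup;
  var age;
  format agegroup quartile.;
  title 'Age Range within Each Quartile Group';
run;
```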
33
Exploratory Data Analysis
goptions reset=all;
proc gplot data=ageranks;
  plot cd4*time=id / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  plot2 cd4*time=agegroup / haxis=-3 to 5.5 by 0.5 vaxis=0 to 3500 by 500;
  symbol1 v=none repeat=369 i=join color=cyan;
  symbol2 v=none i=sm50s color=black width=3 line=1;
  symbol3 v=none i=sm50s color=blue width=3 line=2;
  symbol4 v=none i=sm50s color=red width=3 line=3;
  symbol5 v=none i=sm50s color=green width=3 line=4;
  format agegroup quartile.;
  label time='Years since Seroconversion';
  title 'Individual Profiles with Age Subgroups';
run;
quit;
34
Exploratory Data Analysis
There does not appear to be any effect of age on the CD4 count trend. Similar graphs show no effect of partners or depression on the CD4 count trend.
35
Exploratory Data Analysis
Cross-sectional versus Longitudinal Relationships
For a time-dependent explanatory variable X, the question is what kind of relationship (if any) exists between X and the response variable Y in both the cross-sectional and the time dimensions.
  • For cross-sectional relationships, graph the baseline (initial) value of Y against the baseline value of X.
  • For longitudinal relationships, graph the change in Y (value at time t minus the baseline value) against the change in X.
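In symbols, write y_ij and x_ij for the response and the covariate of subject i at its j-th measurement, with j = 1 the baseline and n_i the number of measurements on subject i. The two kinds of plot then use the points

$$
\text{cross-sectional: } \bigl(x_{i1},\; y_{i1}\bigr),\ i = 1,\dots,N;
\qquad
\text{longitudinal: } \bigl(x_{ij} - x_{i1},\; y_{ij} - y_{i1}\bigr),\ j = 2,\dots,n_i .
$$

For the CD4 data, N = 369 subjects contribute 2,376 measurements, so each cross-sectional plot has 369 points and each longitudinal plot has 2376 − 369 = 2007 points, the N values reported by PROC CORR for the two correlation analyses in these notes.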
36
Exploratory Data Analysis
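Create SAS data sets with baseline values for the cross-sectional plots and change values for the time plots.
/* BY-group processing assumes mixed.aids is sorted by id and, within id, by time */
proc sort data=mixed.aids;
  by id time;
run;

data aids1 aids2;
  set mixed.aids;
  by id;
  retain basecd4 basedrug basedepress basecig basepart;
  if first.id then do;            /* baseline record: one row per subject */
    basecd4     = cd4;
    basedrug    = drug;
    basedepress = depression;
    basecig     = cigarettes;
    basepart    = partners;
    output aids1;
  end;
  if not first.id then do;        /* later records: change from baseline */
    chngcd4     = cd4 - basecd4;
    chngdrug    = drug - basedrug;
    chngdepress = depression - basedepress;
    chngcig     = cigarettes - basecig;
    chngpart    = partners - basepart;
    output aids2;
  end;
run;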
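As a quick consistency check (not part of the original notes), the baseline data set should contain one row per subject and the change data set every remaining measurement. From the counts reported earlier (369 subjects, 2,376 measurements), that means 369 and 2,007 rows.

```sas
/* Row counts of the baseline and change data sets */
proc sql;
  select count(*) as n_baseline from aids1;
  select count(*) as n_change   from aids2;
quit;
```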
37
Exploratory Data Analysis
Compare baseline CD4 with baseline age.
goptions reset=all;
proc gplot data=aids1;
  plot basecd4*age / vaxis=0 to 3000 by 1000 haxis=-12 to 30 by 2;
  plot2 basecd4*age / vaxis=0 to 3000 by 1000 haxis=-12 to 30 by 2;
  symbol1 v=star color=cyan;
  symbol2 v=none i=sm70s color=blue width=3;
  label basecd4='Baseline CD4' age='Baseline Age';
  title 'Baseline CD4 Cells vs. Baseline Age';
run;
quit;
38
Exploratory Data Analysis
Slight positive relationship between baseline CD4 counts and baseline age.
39
Exploratory Data Analysis
Slight positive relationship between baseline CD4 counts and baseline drug use.
40
Exploratory Data Analysis
Essentially no relationship between baseline CD4 counts and baseline depression.
41
Exploratory Data Analysis
Strong positive relationship between baseline CD4 counts and the baseline number of packs of cigarettes smoked per day.
42
Exploratory Data Analysis
Essentially no relationship between baseline CD4 counts and the baseline number of partners.
43
Exploratory Data Analysis
Correlation analysis shows a strong positive relationship only between baseline CD4 count and baseline cigarette use.
proc corr data=aids1 rank nosimple;
  var age basedrug basedepress basecig basepart;
  with basecd4;
  title 'Correlation Analysis of Baseline CD4 Cells to Baseline Covariates';
run;

Correlation Analysis of Baseline CD4 Cells to Baseline Covariates

The CORR Procedure

1 With Variables:  basecd4
5 Variables:       age basedrug basedepress basecig basepart

Pearson Correlation Coefficients, N = 369
Prob > |r| under H0: Rho=0

            basecig       age   basedrug   basepart   basedepress
basecd4     0.37534   0.10144    0.07236   -0.02544       0.02282
             <.0001    0.0515     0.1654     0.6262        0.6621
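The p-values in this table come from the usual test of H0: Rho=0, t = r·sqrt(n − 2)/sqrt(1 − r²) on n − 2 degrees of freedom. The data step below is an illustration (the data set name corrp is hypothetical) that recomputes them from the printed correlations with n = 369; the results should agree with the Prob > |r| row up to rounding. The same formula applies to the change correlations later with n = 2007, but because each subject contributes several change values those rows are not independent, so their nominal p-values deserve the same caution about within-subject correlation raised at the start of these notes.

```sas
/* Recompute Prob > |r| from the printed Pearson correlations (n = 369) */
data corrp;
  input variable :$12. r;
  n = 369;
  t = r * sqrt(n - 2) / sqrt(1 - r**2);   /* t statistic on n-2 df */
  p = 2 * (1 - probt(abs(t), n - 2));     /* two-sided p-value     */
  datalines;
basecig      0.37534
age          0.10144
basedrug     0.07236
basepart    -0.02544
basedepress  0.02282
;
run;

proc print data=corrp noobs;
  var variable r t p;
  format p pvalue6.4;
run;
```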
44
Exploratory Data Analysis
Longitudinal plots now compare the change in CD4 count from baseline with the change from baseline in each explanatory variable. Prototype code for the change in drug use:
goptions reset=all;
proc gplot data=aids2;
  plot chngcd4*chngdrug / vref=0 vaxis=-2500 to 2500 by 500
                          haxis=-1 to 1 by .1;
  plot2 chngcd4*chngdrug / vaxis=-2500 to 2500 by 500
                           haxis=-1 to 1 by .1;
  symbol1 v=star color=cyan;
  symbol2 v=none i=sm70s color=blue width=3;
  label chngcd4='Change CD4'
        chngdrug='Change in Recreational Drug Use';
  title 'Change CD4 Cells vs. Change in Recreational Drug Use';
run;
quit;
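Since the same plot is repeated for each covariate, one option is to wrap the prototype in a macro. This is a sketch: the macro name CHNGPLOT and its parameters are hypothetical, and the horizontal axis is left to the SAS default because its range differs by covariate.

```sas
/* Hypothetical macro: reuse the prototype plot for any change variable */
%macro chngplot(xvar, xlabel);
  goptions reset=all;
  symbol1 v=star color=cyan;                   /* individual changes */
  symbol2 v=none i=sm70s color=blue width=3;   /* smoothed trend     */
  proc gplot data=aids2;
    plot  chngcd4*&xvar / vref=0 vaxis=-2500 to 2500 by 500;
    plot2 chngcd4*&xvar / vaxis=-2500 to 2500 by 500;
    label chngcd4='Change CD4' &xvar="&xlabel";
    title "Change CD4 Cells vs. &xlabel";
  run;
  quit;
%mend chngplot;

%chngplot(chngdepress, Change in Depression Score)
%chngplot(chngcig, Change in Packs of Cigarettes)
%chngplot(chngpart, Change in Number of Partners)
```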
45
Exploratory Data Analysis
No significant relationship for change in drug
use.
46
Exploratory Data Analysis
Some evidence of a negative relationship between the change in depression score and the change in CD4 count from baseline. As depression increases, CD4 count decreases.
47
Exploratory Data Analysis
Evidence of a positive relationship between the change in packs of cigarettes smoked and the change in CD4 count from baseline.
48
Exploratory Data Analysis
Evidence of a positive relationship between the change in partners and the change in CD4 count from baseline.
49
Exploratory Data Analysis
Correlation analysis supports the graphical
evidence.
proc corr data=aids2 rank nosimple;
  var chngdrug chngdepress chngcig chngpart;
  with chngcd4;
  title 'Correlation Analysis of Change in CD4 Cells to Change in Covariates';
run;

Correlation Analysis of Change in CD4 Cells to Change in Covariates

The CORR Procedure

1 With Variables:  chngcd4
4 Variables:       chngdrug chngdepress chngcig chngpart

Pearson Correlation Coefficients, N = 2007
Prob > |r| under H0: Rho=0

            chngpart   chngcig   chngdepress   chngdrug
chngcd4      0.18380   0.08236      -0.06615    0.02966
              <.0001    0.0002        0.0030     0.1841
50
Exploratory Data Analysis
Summary of Exploratory Data Analysis
1. There seems to be a cubic relationship between CD4 cell count and time.
2. The group profile plots show a time-by-cigarette-usage interaction.
3. The cross-sectional plots show a positive relationship between the baseline CD4 cell counts and the baseline cigarette usage.
4. The longitudinal plots show a positive relationship between the change in CD4 cell counts and the change in the number of partners.
All these results will be helpful in specifying an initial model for PROC MIXED.
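As a hedged illustration of how these findings might translate into a first PROC MIXED specification (the model actually developed in later parts of the course may differ), the sketch below puts a cubic in time, cigarette use with a time interaction, and partners in the mean model, and gives each subject a random intercept. The choice of terms and of the random intercept is an assumption made for illustration only.

```sas
/* Hypothetical starting model suggested by the exploratory findings */
proc mixed data=mixed.aids;
  class id;
  model cd4 = time time*time time*time*time   /* finding 1: cubic time trend  */
              cigarettes time*cigarettes      /* findings 2-3: cigarette use  */
              partners                        /* finding 4: partners          */
              / solution;
  random intercept / subject=id;              /* between-subject variation    */
run;
```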