Title: SC968: Panel Data Methods for Sociologists
1. SC968: Panel Data Methods for Sociologists
- Random coefficients models
2. Overview
- Random coefficients models
- Continuous data
- Binary data
- Growth curves
3. Random coefficients models
- Also known as
  - Multilevel models
    - MLwiN
    - http://www.cmm.bristol.ac.uk/
  - Hierarchical models
    - HLM
    - http://www.ssicentral.com/
4. Random coefficients models for continuous outcomes
5. Random coefficients models
- We started off with fixed effects models that pooled data across many waves of panel data
- Then we allowed intercepts to vary for each individual using random effects models
- We can also allow the coefficients for the independent variables to vary for each individual
- These models are called random coefficients or random slopes models
6. Possible combinations of slopes and intercepts
- Constant slopes, constant intercept: the assumptions required for this model are unlikely to hold
- Constant slopes, varying intercepts: the fixed effects model
- Varying slopes, constant intercept: unlikely to occur
- Varying slopes, varying intercepts: separate regression for each individual
7. Partitioning unexplained variance in a random coefficients model
8. Random coefficients model for continuous data
Equation labels: fixed coefficients, random coefficients, residual
9. Random coefficients model for continuous data
Equation labels: fixed intercept, fixed slope, random intercept, random slope, random error
10. Worked example
- Random 20% sample from BHPS
- Waves 1 - 15
- Ages 21 to 59
- Outcome: GHQ Likert scores
- Explanatory variable: household income last month (logged)
11. Random coefficients model example
- where
- y_ij = GHQ score for subject i in wave j, j = 1, ..., J
- x_ij = logged household income in month to wave j
- β_1 = mean slope
- b_i = subject-specific random deviation from mean slope
- u_i = subject-specific random intercept
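The displayed equation for this slide did not survive in the transcript. A reconstruction consistent with the definitions above and with the labels on slide 9 (fixed intercept, fixed slope, random intercept, random slope, random error) is sketched below. The symbols \beta_0 for the fixed intercept and \varepsilon_{ij} for the random error are assumptions, since the slide text does not name them.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_i + b_i x_{ij} + \varepsilon_{ij} \\
       &= (\beta_0 + u_i) + (\beta_1 + b_i)\, x_{ij} + \varepsilon_{ij}
\end{align}
```

The second line groups the terms so that each subject i has their own intercept \beta_0 + u_i and their own slope \beta_1 + b_i.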
12. Linear random coefficients model
13.
. xtmixed hlghq1 lnfihhmn || pid: lnfihhmn, mle cov(unstr) variance

Mixed-effects ML regression                     Number of obs      =     18541
Group variable: pid                             Number of groups   =      2508

                                                Obs per group: min =         1
                                                               avg =       7.4
                                                               max =        15

                                                Wald chi2(1)       =     28.11
Log likelihood = -55286.004                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      hlghq1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfihhmn |  -.4015666    .075741    -5.30   0.000    -.5500162    -.253117
       _cons |   14.40864   .5917387    24.35   0.000     13.24885    15.56843
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
pid: Unstructured            |
               var(lnfihhmn) |   2.073304   .3231129      1.527594    2.813961
                  var(_cons) |   144.5579   19.37199      111.1664    187.9793
         cov(lnfihhmn,_cons) |  -16.63265   2.488823     -21.51065   -11.75465
-----------------------------+------------------------------------------------
               var(Residual) |   18.10746   .2092684      17.70191     18.5223
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  5099.77   Prob > chi2 = 0.0000

- Fixed effect: lnfihhmn (before ||)
- Random slopes: lnfihhmn (after || pid:)
- cov(unstr): estimates covariance between all random effects; least restrictive model
14. Same command and output as slide 13: fixed part

------------------------------------------------------------------------------
      hlghq1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfihhmn |  -.4015666    .075741    -5.30   0.000    -.5500162    -.253117
       _cons |   14.40864   .5917387    24.35   0.000     13.24885    15.56843
------------------------------------------------------------------------------

- Fixed coefficient: lnfihhmn
- Fixed intercept: _cons
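As a quick arithmetic check on the fixed part of the output, the z statistic, the Wald chi2(1) and the 95% confidence interval for lnfihhmn can be recomputed from the reported coefficient and standard error. This is a minimal sketch in Python rather than Stata; the variable names are illustrative and the only inputs are numbers copied from the output.

```python
# Values copied from the xtmixed output for lnfihhmn.
coef = -0.4015666
se = 0.075741

z = coef / se              # z statistic for the fixed slope
wald_chi2 = z ** 2         # with a single covariate, Wald chi2(1) equals z squared
z_crit = 1.959964          # 97.5th percentile of the standard normal distribution
ci_low = coef - z_crit * se
ci_high = coef + z_crit * se

print(f"z = {z:.2f}, Wald chi2(1) = {wald_chi2:.2f}")
print(f"95% CI: [{ci_low:.7f}, {ci_high:.7f}]")
```

The recomputed values agree with the z, Wald chi2(1) and interval shown in the output to the reported precision.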
15. Same command and output as slide 13: random part

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
pid: Unstructured            |
               var(lnfihhmn) |   2.073304   .3231129      1.527594    2.813961
                  var(_cons) |   144.5579   19.37199      111.1664    187.9793
         cov(lnfihhmn,_cons) |  -16.63265   2.488823     -21.51065   -11.75465
-----------------------------+------------------------------------------------
               var(Residual) |   18.10746   .2092684      17.70191     18.5223
------------------------------------------------------------------------------

- Random slope: var(lnfihhmn)
- Random intercept: var(_cons)
- Covariation between random intercept and random slope: cov(lnfihhmn,_cons)
16. Post-estimation predictions
17. Post-estimation predictions: random coefficients
. predict re_slope re_int, reffects

     +----------------------------------+
     |      pid      re_int    re_slope |
     |----------------------------------|
  1. | 10019057   -4.833731    .4040526 |
  4. | 10028005   -5.241494    .3758182 |
 16. | 10042571   -1.442705    .1409662 |
 17. | 10051538   -2.836288     .305106 |
 35. | 10059377    5.487209   -.5189736 |
     +----------------------------------+
18. Predicted individual regression lines
. gen intercept = _b[_cons] + re_int
. gen slope = _b[lnfihhmn] + re_slope

     +---------------------------------+
     |      pid   intercept      slope |
     |---------------------------------|
     | 10019057    9.574909    .002486 |
     | 10028005    9.167147  -.0257484 |
     | 10042571    12.96594  -.2606004 |
     | 10051538    11.57235  -.0964606 |
     | 10059377    19.89585  -.9205403 |
     +---------------------------------+
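The individual lines in this table are the fixed estimates plus each person's predicted random effects. A minimal sketch of that arithmetic, written in Python rather than Stata, uses only numbers shown on the slides; the dictionary and function names are illustrative.

```python
# Fixed-part estimates from the xtmixed output.
b_cons = 14.40864
b_lnfihhmn = -0.4015666

# Predicted random effects (re_int, re_slope) for the five listed individuals.
random_effects = {
    10019057: (-4.833731, 0.4040526),
    10028005: (-5.241494, 0.3758182),
    10042571: (-1.442705, 0.1409662),
    10051538: (-2.836288, 0.3051060),
    10059377: (5.487209, -0.5189736),
}

def individual_line(re_int, re_slope):
    """Return (intercept, slope) of one person's predicted regression line."""
    return b_cons + re_int, b_lnfihhmn + re_slope

lines = {pid: individual_line(*re) for pid, re in random_effects.items()}

for pid, (intercept, slope) in lines.items():
    print(f"{pid}  {intercept:10.6f}  {slope:10.7f}")
```

Individuals with a low predicted intercept tend to have a less negative slope here, which is in line with the negative covariance between the random intercept and random slope in the output.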
19. Partitioning unexplained variance in a random coefficients model
20. Calculating the variance partition coefficient
VPC = Between variance / Total variance, i.e. Between / (Between + Within)
21. Calculating the variance partition coefficient
- Random slopes model
- At the intercept, x_ij = 0
- So, the VPC for the random slopes model reduces to the same as the random intercepts model
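The equations on this slide are missing from the transcript. The standard expression for a model with a random intercept u_i and one random slope b_i on x_ij is sketched below. The variance symbols \sigma_u^2, \sigma_b^2, \sigma_{ub} and \sigma_e^2 are assumed notation, not taken from the slides.

```latex
\begin{align}
\mathrm{VPC}(x_{ij}) &=
  \frac{\sigma_u^2 + 2\sigma_{ub}\,x_{ij} + \sigma_b^2\,x_{ij}^2}
       {\sigma_u^2 + 2\sigma_{ub}\,x_{ij} + \sigma_b^2\,x_{ij}^2 + \sigma_e^2} \\
\mathrm{VPC}(0) &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\end{align}
```

Setting x_ij = 0 removes the slope variance and covariance terms, which is why the expression collapses to the random intercepts form.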
22. Variance partition coefficient for our example
Tentative interpretation: least variability in GHQ for those on average incomes
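A rough way to see this pattern is to evaluate the between-subject variance, var(_cons) + 2 cov(lnfihhmn,_cons) x + var(lnfihhmn) x^2, at different values of logged income x, using the estimates from the xtmixed output. The sketch below is in Python; the grid of x values is an illustrative assumption, and the slides do not report the mean of lnfihhmn, so the location of "average income" is not checked here.

```python
# Variance components copied from the xtmixed output.
var_int = 144.5579         # var(_cons)
var_slope = 2.073304       # var(lnfihhmn)
cov_int_slope = -16.63265  # cov(lnfihhmn,_cons)
var_resid = 18.10746       # var(Residual)

def between_variance(x):
    """Between-subject variance at logged income x."""
    return var_int + 2 * cov_int_slope * x + var_slope * x ** 2

def vpc(x):
    """Variance partition coefficient at logged income x."""
    between = between_variance(x)
    return between / (between + var_resid)

# The quadratic in x is minimised where its derivative is zero.
x_min = -cov_int_slope / var_slope

# Illustrative grid of logged income values (an assumption, not from the slides).
for x in (0.0, 6.0, 7.0, x_min, 9.0, 10.0):
    print(f"x = {x:6.3f}  between = {between_variance(x):8.3f}  VPC = {vpc(x):.3f}")
```

Because the slope variance is positive and the covariance is negative, the between-subject variance is U-shaped in logged income, with its minimum at x = -cov / var(slope).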
23. Random coefficients models
24. Random coefficients model for binary data
- where
- β_k is the mean coefficient or fixed effect of covariate k
- b_ik is a subject-specific random deviation from mean coefficient
- u_i is a subject-specific random intercept with mean zero
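The model equation itself is missing from the transcript. One form consistent with the definitions above is a logit link with subject-specific coefficients, sketched below. The logit link is assumed from the later slides on the logistic model, and \beta_0, x_{ijk} and K are assumed notation.

```latex
\begin{equation}
\operatorname{logit}\,\Pr(y_{ij} = 1 \mid x_{ij}, u_i, b_i)
  = \beta_0 + u_i + \sum_{k=1}^{K} (\beta_k + b_{ik})\, x_{ijk}
\end{equation}
```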
25. Logistic random coefficients example
- where
- y_ij = binary GHQ score for subject i in wave j, j = 1, ..., J
- x_ij = employment status in wave j
- β_1 = mean slope
- b_i = subject-specific random deviation from mean slope
- u_i = subject-specific random intercept
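To make the roles of u_i and b_i concrete, the sketch below converts a linear predictor on the logit scale into a probability for two subjects. All numeric values are hypothetical and chosen only for illustration; they are not estimates from the BHPS example. The code also treats x as a single 0/1 indicator, which is an assumption, since the slides use a three-category employment status.

```python
import math

def inv_logit(eta):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def prob_case(x, beta0, beta1, u_i=0.0, b_i=0.0):
    """P(y_ij = 1) for a subject with random intercept u_i and slope deviation b_i."""
    eta = (beta0 + u_i) + (beta1 + b_i) * x
    return inv_logit(eta)

# Hypothetical fixed effects (not taken from the slides).
beta0, beta1 = -1.5, 0.8

# Hypothetical subjects: one at the population mean, one with higher baseline
# risk (u_i > 0) and a weaker response to x (b_i < 0).
average_subject = prob_case(1, beta0, beta1)
other_subject = prob_case(1, beta0, beta1, u_i=1.0, b_i=-0.5)

print(f"average subject, x = 1: {average_subject:.3f}")
print(f"other subject,   x = 1: {other_subject:.3f}")
```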
26. Worked example
- Random 20% sample
- 15 waves of BHPS
- Ages 21 to 59
- Outcome: GHQ binary scores (psychological morbidity cases, hlghq2 > 2)
- Explanatory variable: employment status (jbstat recoded to employed/unemployed/olf)
27. Logistic random coefficients model
29. No constant term with odds ratios
30. No random residual for logit model
32. Random coefficients models for development over time
33. Growth curve models
- Models change over time as a continuous trajectory
- Suitable for research questions such as
  - What is the trajectory for the population?
  - Are there distinct trajectories for each respondent?
  - If individuals have distinct trajectories, what variables predict these individual trajectories?
34. Linear growth curve model
- Individual growth curves
- t = 0 at baseline and 1, 2, 3, ..., T in successive waves
- Mean population growth curve
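The two equations on this slide are not in the transcript. A sketch consistent with the random coefficients notation used earlier in the deck, with time t as the covariate, is given below. The symbols \beta_0, \beta_1, u_i, b_i and \varepsilon_{it} are carried over as assumptions.

```latex
\begin{align}
\text{Individual growth curve:} \quad
  y_{it} &= (\beta_0 + u_i) + (\beta_1 + b_i)\, t + \varepsilon_{it} \\
\text{Mean population growth curve:} \quad
  \mathrm{E}[y_{it}] &= \beta_0 + \beta_1 t
\end{align}
```

With t = 0 at baseline, \beta_0 is the mean outcome at baseline and \beta_1 the mean change per wave.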
35. Worked example
- Random 20% sample from BHPS
- Waves 1 - 15
- All respondents over 16 years
- Outcome: self-rated health (hlstat)
- Linear growth: wave 1 to wave 15
38. Slope (change in health over time)
Intercept (mean health at baseline)
39. Individual differences in health change
Individual differences in baseline health
41. Adding time-invariant covariates
42. Interacting gender with time
43. Adding time-varying covariates
44. Beyond linear change
- Polynomial trajectories
  - Quadratic or cubic trajectories
- Piecewise linear trajectories
- Exponential trajectories
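In practice, polynomial and piecewise linear trajectories are usually specified by constructing extra time variables and entering them as covariates. The sketch below shows one way to build such terms in Python. The function names and the knot at wave 5 are illustrative assumptions, not part of the course material.

```python
def polynomial_terms(t, degree=2):
    """Time terms t, t^2, ..., t^degree for a polynomial trajectory."""
    return [t ** p for p in range(1, degree + 1)]

def piecewise_linear_terms(t, knot):
    """Two time terms: time accrued up to the knot, and time elapsed after it.

    The coefficient on the first term is the slope before the knot and the
    coefficient on the second term is the slope after the knot.
    """
    return [min(t, knot), max(t - knot, 0)]

# Illustrative use for waves coded t = 0, 1, ..., 7 with a hypothetical knot at t = 5.
for t in range(8):
    print(t, polynomial_terms(t, degree=3), piecewise_linear_terms(t, knot=5))
```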
45. Non-linear growth curves
46. Specifying time
- Metrics of time
  - Wave of assessment
  - Chronological age
  - Time before/after an event
- Individually varying values of time
47. Age-period-cohort
- A cohort is defined by their age in a particular period
- Impossible to separate age, period and cohort effects
- But any pair of the three factors is independent
- If wave is our metric of time, then it is usual to control for age at baseline (i.e. cohort)
- If age is our metric of time, then we can control for period (i.e. wave) effects using dummy variables
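The identification problem comes from an exact identity: once age and period are known, cohort is fixed. The sketch below illustrates this with hypothetical records, taking birth year as the cohort measure (an assumption made for illustration).

```python
# Hypothetical (survey year, age at interview) records; not BHPS data.
records = [(1991, 30), (1991, 45), (1995, 34), (1995, 49), (2005, 44)]

# Birth cohort follows exactly from period and age.
cohorts = [period - age for period, age in records]

for (period, age), cohort in zip(records, cohorts):
    print(f"period = {period}, age = {age}, cohort (birth year) = {cohort}")

# Any two of the three can vary freely: the same cohort appears at different
# ages in different periods, but the third value is always determined.
same_cohort = [(p, a) for (p, a), c in zip(records, cohorts) if c == 1961]
print("records for the 1961 cohort:", same_cohort)
```

Because the three variables are linearly dependent, a model that enters all three as linear terms cannot separate their effects without an additional constraint.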
48. Accelerated panel designs
- Respondents of varying age sampled at same time-point then followed for several years
- Period and age effects not completely confounded as in a birth cohort design
- Assumption: after controlling for period, growth curves for each cohort overlap and form a smooth curve
49. Example of an accelerated panel design
- Random 20% sample from BHPS
- Waves 1 - 15
- All respondents over 16 years
- Outcome: self-rated health (hlstat)
- Quadratic growth by age
- Controlling for period effects (wave)
53. When are random coefficients not necessary?
- When number of units at level 2 is small
  - Use dummy variables with OLS
- When you want to correct for correlation between observations but are not interested in variation between and within subjects
  - Use fixed effects model
- When correlation within subjects is small
  - Using OLS when the ICC is small will give similar estimates
54. Finally
- Random coefficients models can take a very long time to estimate
- Can speed things up by collapsing data and using frequency weights
- My personal recommendation is to use MLwiN
  - Excellent online training material
  - Easier to build up model step-by-step
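The idea behind collapsing is that identical records can be stored once with a count, and the count is then used as a frequency weight. The sketch below shows the principle in Python on hypothetical (y, x) records. It does not show how any particular package accepts frequency weights, and it only helps when covariates are discrete enough for rows to repeat.

```python
from collections import Counter

# Hypothetical (y, x) records with repeated patterns; not BHPS data.
rows = [(1, 0), (1, 0), (0, 1), (1, 0), (0, 1), (0, 0)]

# Collapse: one entry per distinct pattern, with its frequency as the weight.
collapsed = Counter(rows)

# A weighted statistic on the collapsed data matches the full-data statistic.
mean_full = sum(y for y, _ in rows) / len(rows)
mean_weighted = (
    sum(y * weight for (y, _), weight in collapsed.items())
    / sum(collapsed.values())
)

print(f"{len(rows)} rows collapsed to {len(collapsed)} patterns")
print(f"mean of y: full = {mean_full:.3f}, weighted = {mean_weighted:.3f}")
```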