Title: 4'0 Contextual effects
14.0 Contextual effects
In the previous sections we found that schools
vary in both their intercepts and slopes
resulting in crossing lines. The next question is
are there any school level variables that can
explain this variation?
- Interest lies in how the outcome of individuals
in a cluster is affected by their social contexts
(measures at the cluster level). Typical
questions are - Does school type effect students' exam results?
- Does area deprivation effect the health status
of individuals in the area?
In our data set we have a contextual school
ability measure, schav. The mean intake score is
formed for each school, these means are ranked
and the ranks are categorised into 3 groups
lowlt25,25gtmidlt75, highgt75
24.1 Exploring contextual effects and the tutorial
data
Does school gender effect exam score by gender?
Do boys in boys schools do better or worse or
the same compared with boys in mixed schools? Do
girls in girls schools do better or worse or the
same compared with girls in mixed schools?
Does peer group ability effect individual pupil
performance? That is given two pupils of equal
intake ability do they progress differently
depending on whether they are educated in a low,
mid or high ability peer group?
34.2 School gender effects
girl boysch girlsch 0 0
0 boy/mixed school
-0.189 1 0 0
girl/mixed school -0.1890.168 0
1 0 boy/boy school
-0.1890.180 1 0 1
girl/girl school
-0.1890.1680.175
44.3 Peer group ability effects
The effect of peer group ability is modelled as
being constant across gender, school gender and
standlrt. For example, comparing peer group
ability effects for boys in mixed schools and
boys in boys schools
-0.2650.552standlrtij boy,mixed
school,low(reference group)
54.4 Cross level interactions
There may be interactions between school gender,
peer group ability, gender and standlrt. An
interesting interaction is between peer group
ability and standlrt. This tests whether the
effect of peer group differs across the standlrt
intake spectrum. For example, being in a high
ability group may have a different effect for
pupils of different ability. This is a cross
level interaction because it is the interaction
between a pupil level variable(standlrt) and a
school level variable(schav).
64.5 Cross level interactions contd
Which leads to three lines for the low,mid and
high groupings. -0.3470.455standlrtij
low (-0.3470.144)(0.4550.092) standlrtij
mid (-0.3470.290)(0.4550.180) standlrtij
high
75.0 Variance functions or modelling
heteroscedasticity
Tabulating normexam by gender we see that the
means and variances for boys and girls are
(0.140 and 1.051) and (0.093 and 0.940). We may
want to fit a model that estimates separate
variances for boys and girls. The notation we
have been using so far assumes a common
intercept(?0) and a single set of student
residuals, ei, with a common variance ?e2. We
need to use a more flexible notation to build
this model.
85.1 Working with general notation in MLwiN
A model with no variables specified in general
notation looks like this.
A new first line is added stating that the
response variable follows a Normal distribution.
We now have the flexibility to specify
alternative distributions for our response. We
will explore these models later. The ?0
coefficient now has an explanatory x0 associated
with it. The values x0 takes determines the
meaning of the ?0 coefficient. If x0 is a vector
of 1s then ?0 will estimate an intercept common
to all individuals, in the absence of other
predictors this would be the overall mean. If x0
variable, say 1 for boys and 0 for girls, then ?0
will estimate the mean for boys.
95.2 A simple variance function
The new notation allows us to set up this simple
model where x0i is a dummy variable for boy and
x1i is a dummy variable for girl. This model
estimates separate means and variances for the
two groups. This is an example of a variance
function because the variance changes as a
function of explanatory variables. The function
is
105.3 Deriving the variance function
We arrive at the expression
(1)
115.4 Variance functions at level 2
The notion of variance functions is powerful and
not restricted to level 1 variances.
The random slopes model fitted earlier produces
the following school level predictions which show
school level variability increasing with intake
score.
The model
125.5 Two views of the level 2 variance
Given x0 1, we have
Which shows that the level 2 variance is
polynomial function of x1ij
- View 1 In terms of school lines predicted
intercepts and slopes varying across schools.
View 2 In terms of a variance function which
shows how the level 2 variance changes as a
function of 1 or more explanatory variables.
135.6 Elaborating the level 1 variance
Maybe the student level departures around their
schools summary lines are not constant.
Note at level 2 we have 2 interpretations of
level 2 random variation, random coefficients
(varying slopes and intercepts across level 2
units) and variance functions. In each level 1
unit, by definition, we only have one point,
therefore the first interpretation does not exist
because you cannot have a slope given a single
data point.
145.7 Variance functions at level 1
If we allow standlrt(x1ij) to have a random term
at level 1, we get
155.8 Modelling the mean and variance simultaneously
In our model
166.0 Multivariate response models
We may have data on two or more responses we wish
to analyse jointly. For example, we may have
english and maths scores on pupils within
schools. We can consider the response type as a
level below pupil.
176.1 Rearranging data
Often data comes like this with one row per person
For MLwiN to analyse the data we require the data
matrix to have one row per level 1 unit. That is
one row per response measurement
186.2 Writing down the model
Where y1j is the english score for student j and
y2j is the maths score for student j. The means
and variances for english and maths(?0,?1,?u02,?u1
2) are estimated. Also the covariance between
maths and english, ? u01is estimated.
Note there is no level 1(eij) variance. This can
be seen if we consider the picture for one pupil.
196.3 Advantages of framing a multivariate response
model as a multilevel model
The model has the following advantages over
traditional multivariate techniques It can
deal with missing responses-provided response
data is missing completely at random(MCAR) or
missing at random(MAR) that is missingness is
related to explanatory variables in the
model. Covariates can be added giving us the
conditional covariance matrix of the
responses. Further levels can be added to the
model
206.4 Example from MLwiN user guide
pupils have two responses written and coursework
mean for written 46.8 Variance(written)
178.7 mean for coursework 73.36 Variance(coursew
ork) 265.4 covariance(written, coursework)
102.3
That is we have two means and a covariance
matrix, which we could get from any stats
package. However, the data are unbalanced. Of the
1905 pupils 202 are missing a written response
and 180 are missing a coursework response.
216.5 Further extensions
We can add further explanatory variables. For
example, female. We see that females do better
for coursework than males and worse than males
on written exams males do better on written exams.
226.6 Repeated measures.
We may have repeated measurements on individuals,
for example a series of heights or test scores.
Often we want to model peoples growth. We can fit
this structure as a multilevel model with
repeated measurements nested within people. That
is
236.7 Advantages of fitting repeated measures
models in a multilevel framework
- Fitting these structures using a multilevel model
has the advantages that data can be - Unbalanced (individuals can have different
numbers of measurement occasions) - Unequally spaced (different individuals can be
measured at different ages) - As opposed to traditional multivariate techniques
which require data to be balanced and equally
spaced. - Again the multilevel model requires response
measurements are MCAR or MAR.
246.8 An example from the MLwiN user guide
Repeated measures model for childrens reading
scores
This (random intercepts model) models growth as a
linear process with individuals varying only in
their intercepts. That is for the 405 individuals
in the data set
256.9 Further possibilities for repeated measures
model
- We can go on and fit a random slope model. Which
in this case allows the model to deal with
children growing at different rates. -
- We can fit polynomials in age to allow for
curvilinear growth. -
- We can also try and explain between individual
variation in growth by introducing child level
variables. - If appropriate we can include further levels of
nesting. For example, if children are nested
within schools we could fit a 3 level model
occasionschildrenschools. We could then look
to see if childrens patterns of growth varied
across schools.
267.0 MCMC estimation in MlwiN
MCMC estimation is a big topic and is given a
pragmatic and cursory treatment here. Interested
students are referred to the manual MCMC
estimation in MLwiN available from http//multile
vel.ioe.ac.uk/beta/index.html
In the workshop so far you have been using IGLS
(Iterative Generalised Least Squares) algorithm
to estimate the models.
277.1 IGLS versus MCMC
MCMC
IGLS
287.2 Bayesian framework
MCMC estimation operates in a Bayesian framework.
A bayesian framework requires one to think about
prior information we have on the parameters we
are estimating and to formally include that
information in the model. We may make the
decision that we are in a state of complete
ignorance about the parameters we are estimating
in which case we must specify a so called
uninformative prior. The posterior
distribution for a paremeter ? given that we have
observed y is subject to the following rule
p(?y)? p(y ?)p(?)
Where p(?y) is the posterior distribution for ?
given we have observed y p(y ?) is the
likelihood of observing y given ? p(?) is the
probability distribution arising from some
statement of prior belief such as we believe
?N(1,0.01). Note that we believe ?N(1,1) is
a much weaker and therefore less influential
statement of prior belief.
297.3 Applying MCMC to multilevel models
In a two level variance components model we have
the following unknowns
There joint posterior is
307.4 Gibbs sampling
Evaluating the expression for the joint posterior
with all the parameters unknown is for most
models, virtually impossible. However, if we take
each unknown parameter in turn and temporarily
assume we know the values of the other
parameters, then we can simulate from the so
called conditional posterior distribution. The
Gibbs sampling algorithm cycles through the
following simulation steps. First we assume some
starting values for our unknown parameters
317.5 Gibbs sampling cntd
We now have updated all the unknowns in the
model. This process is repeated many times until
eventually we converge on the distribution of
each of the unknown parameters.
327.6 IGLS vs MCMC convergence
IGLS algorithm converges, deterministically to a
distribution.
MCMC algorithm converges on a distribution.
Parameter estimates and intervals are then
calculated from the simulation chains.
337.7 Other MCMC issues
By default MLwiN uses flat, uniformative priors
see page 5 of MCMC estimation in MLwiN (MEM) For
specifying informative priors see chapter 6 of
MEM. For model comparison in MCMC using the DIC
statistic see chapters 3 and 4 MEM. For
description of MCMC algorithms used in MLwiN see
chapter 2 of MEM.
347.8 When to consider using MCMC in MLwiN
If you have discrete response data binary,
binomial, multinomial or Poisson (chapters 11,
12, 20 and 21). Often PQL gives quick and
accurate estimates for these models. However, it
is a good idea to check against MCMC to test for
bias in the PQL estimates.
If you have few level 2 units and you want to
make accurate inferences about the distribution
of higher level variances.
Some of the more advanced models in MLwiN are
only available in MCMC. For example, factor
analysis (chapter 19), measurement error in
predictor variables (chapter 14) and CAR spatial
models (chapter 16)
Other models, can be fitted in IGLS but are
handled more easily in MCMC such as multiple
imputation (chapter 17), cross-classified(chapter
14) and multiple membership models (chapter 15).
All chapter references to MCMC estimation in
MLwiN.
358.0 Generalised Multilevel Models 1 Binary
Responses and Proportions
368.0 Generalised multilevel models
- So Far
- Response at level 1 has been a continuous
variable and - associated level 1 random term has been assumed
to have - a Normal distribution
- Now a range of other data types for the response
- All can be handled routinely by MLwiN
- Achieved by 2 aspects
- a non-linear link between response and
predictors - a non-Gaussian level 1 distribution
378.1 Typology of discrete responses
388.2 Focus on modelling proportions
- Proportions eg death rate employment rate can
be conceived as the underlying probability of
dying probability of being employed - Four important attributes of a proportion that
MUST be taken into account in modelling - (1)Closed range bounded between 0 and 1
- (2)Anticipated non-linearity between response and
predictors as predicted response approaches
bounds, greater and greater change in x is
required to achieve the same change in outcome
examination analogy - (3)Two numbers numerator subset of denominator
- (4)Heterogeneity variance is not homoscedastic
two aspects - (a) the variance depends on the mean
- as approach bound of 0 and 1, less
room to vary - ie Variance is a function of the predicted
probability - (b) the variance depends on the denominator
- small denominators result in highly
variable proportions
398.3 Modelling Proportions
- Linear probability model that is use standard
regression model with linear relationship and
Gaussian random term - But 3 problems
- (1) Nonsensical predictions predicted
proportions are unbounded, outside range of 0
and 1 - (2) Anticipated non-linearity as approach
bounds - (3) Heteogeneity inherent unequal variance
- dependent on mean and on denominator
- Logit model with Binomial random term resolves
all three problems (could use probit, clog-clog)
408.5 The logistic model resolves problems 1 2
- The relationship between the probability and
predictor(s) can be represented by a logistic
function, that resembles a S-shaped curve
- Models not the proportion but a non-linear
transformation of it (solves problems 12)
418.6 The Logit transformation
- L LOGe(p/ (1-p))
- L Logit the log of the odds
- p proportion having an attribute
- 1-p proportion not having the attribute
- p/(1-p) the odds of having an attribute
compared to not having an attribute - As p goes from 0 to 1, L goes from minus to plus
infinity, so if model L, cannot get predicted
proportions that lie outside 0 and 1 (ie solves
problem 1) - Easy to move between proportions, odds and logits
428.7 Proportions, Odds and Logits
438.8 The logistic model
- The underlying probability or proportion is
non-linearly related to the predictor
- where e is the base of the natural logarithm
- linearized by the logit transformation(log
natural logarithm)
448.9 The logistic model key characteristics
- The logit transformation produces a linear
function of the parameters. - Bounded between 0 and 1
- Thereby solving problems 1 and 2
458.10 Solving problem 3assume Binomial variation
- Variance of the response in logistic models is
presumed to be binomial
- Ie depends on underlying proportion and the
denominator -
- In practice this is achieved by replacing the
constant variable at level 1 by a binomial
weight, z, and constraining the level-1 variance
to 1 for exact binomial variation - The random (level-1) component can be written as
468.11 Multilevel Logistic Model
- Assume observed response comes from a Binomial
distribution with a denominator for each cell,
and an underlying probability/proportion
- Underlying proportions/probabilities, in turn,
are related to a set of individual and
neighborhood predictors by the logit link function
- Linear predictor of the fixed part and the
higher-level random part
478.12 Estimation 1
- Quasi-likelihood (MQL/PQL 1st and 2nd order)
- model linearised and IGLS applied.
- 1st or 2nd order Taylor series expansion (to
linearise the non-linear model) - MQL versus PQL are higher-level effects included
in the linearisation - MQL1 crudest approximation. Estimates may be
biased downwards (esp. if within cluster sample
size is small and between cluster variance is
large eg households). But stable. - PQL2 best approximation, but may not converge.
- Tip Start with MQL1 to get starting values for
PQL.
488.13 Estimation 2
- MCMC methods get deviance of model (DIC) for
sequential model testing, and good quality
estimates even where cluster size is small start
with MQL1 and then switch to MCMC
498.14 Variance Partition Coefficient
yijBinomial(pij,1) logit(pij xij, uj,) a
bx1ij uj Var(uj) su2 var(yij- pij) pij(1-
pij) Level 1 variance is function of
predicted probability
The level 2 variance su2 is on the logit scale
and the level 1 variance var(yij- pij) is on the
probability scale so they can not be directly
compared. Also level 1 variance depends on pij
and therefore x1ij. Possible solutions include i)
set the level 1 variance variance of a standard
logistic distribution ii) simulation method
508.15 VPC 1 Threshold Model
But this ignores the fact that the level 1
variance is not constant, but is function of the
mean probability which depends on the predictors
in the fixed part of the model
518.16 VPC 2 Simulation Method
528.17 Multilevel modelling of binary data
- Exactly the same as proportions except
- The response is either 1 or 0
- The denominator is a set of 1s
- So that a Yes is 1 out of 1 , while a No is 0
out of 1
538.18 Chapter 9 of Manual Contraceptive Use in
Bangladesh
- 2867 women nested in 60 districts
- y1 if using contraception at time of survey, y0
if not using contraception - Covariates age (mean centred), urban residence
(vs. rural)
548.19 Random Intercept Model PQL2
558.20 Variance Partition Coefficient
568.21 MLwiN Gives
- UNIT or (subject) SPECIFIC Estimates
- the fixed effects conditional on higher level
unit random effects, NOT the - POPULATION-AVERAGE estimatesiethe marginal
expectation of the dependent variables across the
population "averaged " across the random effects - In non-linear models these are different and the
PA will generally be smaller than US, especially
as size of random effects grows - Can derive PA fom US but not vice-versa (next
version give both)
578.22 Unit specific / Population average
- Probability of adverse reaction against dose
- Left subject-specific big differences between
subjects for middle dose (the between patient
variance is large), - Right is the population average dose response
curve, - Subject-specific curves have a steeper slope in
the middle range of the dose variable
589.0 Multilevel Multinomial Models
Logistic models handle the situation where we
have a binary response(two response categories eg
alive/dead or pass/fail.)
Where we have a response variable with more than
two categories we use multinomial models.
Two types of multinomial response
Unordered eg voting prerference(lab, tory,
libdem, other) or cause of death.
Ordered attitude scales(strongly
disagree...strongly agree) or exam grades.
First we deal with unordered multinomial responses
599.1 Extending a binary to a multinomial model
Take a binary variable (yi) which is 1 if an
individual votes tory 0 otherwise.
The underlying probability of individual i voting
tory is ?i .
We model the log odds of voting tory as a
function of explanatory variables
log?i / (1- ?i )b0 b1x1i.....
(1)
Lets call ?i ?1i prob of individual i
voting tory and ?2i (1- ?1i ) prob of
individual i not voting tory We can now write (1)
as
log?1i / ?2ib0 b1x1i.....
609.2 Moving to more than two response categories
Suppose now that yi can take three values 1,2,3
vote tory, vote labour, vote lib dem. Now
?1i is probability of individual i votes tory ?2i
is probability of individual i votes labour ?3i
is probability of individual i votes lib dem
Now we must choose a reference category, say vote
lib dem, and model the log odds of all remaining
categories against the reference category.
Therefore with t categories we need t-1 equations
to model this set of log odds ratios. In our case
log?1i / ?3ib0 b1x1i..... log?2i / ?3ib2
b3x1i.....
619.3 Notation
The MLwiN software uses the notation
log?1i / ?3ib0 b1x1i..... log?2i / ?3ib2
b3x1i..... .....
Often in papers you will see the more succinct
notational form
log?i(s) / ?i(t) b0(s) b1(s) x1i
s1,..,t-1
Which becomes
For s 1 log?i(1) / ?i(3) b0(1)
b1(1)x1i.....
For s 2 log?i(2) / ?i(3) b0(2)
b1(2)x1i.....
629.4 Interpretation(odds ratios)
We can interpret as with logistic regression. In
the political example, 1,2,3 vote tory, vote
labour, vote lib dem.
log?1i / ?3ib0 b1x1i..... log?2i / ?3ib2
b3x1i.....
b1 is the change in the log odds of voting tory
as opposed to lib dem for 1 unit increase in
x1i. b3 is the change in the log odds of voting
labour as opposed to lib dem for 1 unit increase
in x1i. and expo(b ) gives odds ratios
639.5 Interpretation(probabilities)
Or in general notation
log?1i / ?3ib0 b1x1i..... log?2i / ?3ib2
b3x1i.....
Probability of voting tory for individual i
Probability of voting labour for individual i
649.6 Multilevel Multinomial models
Suppose the individuals in the voting example are
clustered into constituencies and we wish to
include constituency effects in our model. We
include intercept level residuals for each log
odds equiation in our model
log?1ij / ?3ijb0 b1x1ij u0j log?2ij /
?3ijb2 b3x1ij u2j
u0j is the effect of the constituency j on the
log odds of voting tory as opposed to lib dem. So
if u0j is 1 the log odds of voting tory as
opposed to lib dem increase by 1 compared to u0j
where u0j 0 (the is average constituency)
Likewise u2j is the effect of the constituency j
on the log odds of voting labour as opposed to
lib dem.
659.7 Variance of level 2 random effects
log?1ij / ?3ijb0 u0j b1x1ij log?2ij /
?3ijb2 u2j b3x1ij
s 2u0 is the betwen constituency variance of the
vote torylib dem log odds ratio
s 2u2 is the between constituency variance of the
vote labourlib dem log odds ratio
s u02 is the constituency level covariance
between tory and labour constituency level
effects. A negative covariance means there is a
tendency for constituencies where labour do well
as opposed to libdems for tories to do badly as
opposed to the libdems and vice versa.
6610.0 Ordered categorical data
Where there is an underlying ordering to the
categories a convenient parameterisation is to
work with cumulative probabilities that an
individual crosses a threshold. For example, with
exam grades
With an ordered multinomialwe work with the set
of cumulative probabalities g. As before with t
categories in the the model has t-1 categories.
6710.1 Writing the ordered multinomial model
log(g1i/ (1-g1i) b0 log odds of ?
D log(g2i/ (1-g2i) b1 log odds of ?
C log(g3i/ (1-g3i) b2 log odds of ? B
The threshold probability gki are given by
antilogit(bk)
We must have b0 lt b1 lt b2 to ensure g1 lt g2
lt g3
6810.2 Adding covariates to the model
- log(g1i/ (1-g1i) b0 hi log odds
of ? D - log(g2i/ (1-g2i) b1 hi log odds
of ? C - log(g3i/ (1-g3i) b2 hi log
odds of ? B - hi b3x1i.....
Note that the covariates hi are the same for each
of the response threshold categories.
This means that the log odds ratios and odds
ratios for threshold category membership are
independent of the predictor variables.
6910.3 Proportional odds models
Sio far we have assumed that the odds ratios of
response category membership remains constant wrt
predictor variables. This is known as the
proportional odds assumption.
We can test the assumption that odds ratios of
response category membership being independent of
predictor variables by fitting
log(g1i/ (1-g1i) b0 b3x1i log odds of ?
D log(g2i/ (1-g2i) b1 b4x1i log odds of ?
C log(g3i/ (1-g3i) b2 b5x1i log odds of ? B
Now if our assumptions are correct b3, b4, b5
will be very similar. We can formally test b3
b4,b5 using the intervals and tests window
7010.4 Multilevel ordered multinomial models
- log(g1i/ (1-g1i) b0 hi log odds
of ? D - log(g2i/ (1-g2i) b1 hi log odds
of ? C - log(g3i/ (1-g3i) b2 hi log
odds of ? B - hi b3x1iu0j
u0j is a random effect for school j, which shifts
all the threshold probabilities equally for all
kids in school j. Again odds ratios for category
membership are independent of u0j
7110.5 Higher level variances
u0jN(0,s 2u0)
The greater s 2u0 The greater the variability in
the school level shifts in the response threshold
probabilities.
7211.0 Non-hierarchical multilevel models
- Two types
- Cross-classified models
- Multiple membership models
7311.1 Cross-classification
For example, hospitals by neighbourhoods.
Hospitals will draw patients from many different
neighbourhoods and the inhabitants of a
neighbourhood will go to many hospitals. No pure
hierarchy can be found and patients are said to
be contained within a cross-classification of
hospitals by neighbourhoods
7411.2 Other examples of cross-classifications
- pupils within primary schools by secondary
schools - patients within GPs by hospitals
- interviewees within interviewers by surveys
- repeated measures within raters by
individual(e.g. patients by nurses)
7511.3 Notation
With hirearchical models we have subscript
notation that has one subscript per level and
nesting is implied reading from left. For
example, subscript pattern ijk denotes the ith
level unit within the jth level 2 unit within
the kth level 3 unit. If models become
cross-classified we use the term classification
instead of level. With notation that has one
subscript per classification, that captures the
relationship between classifications, notation
can become very cumbersome. We propose an
alternative notation that only has a single
subscript no matter how many classifications are
in the model.
7611.4 Single subscript notation
We write the model as
Where classification 2 is nbhd and classification
3 is hospital. Classification 1 always
corresponds to the classification at which the
response measurements are made, in this case
patients. For patients 1 and 11 equation (1)
becomes
7711.5 Classification diagrams
In the single subscript notation we loose
informatin about the relationship(crossed or
nested) between classifications. A useful way of
conveying this informatin is with the
classification diagram. Which has one node per
classification and nodes linked by arrows have a
nested relationship and unlinked nodes have a
crossed relationship.
Hospital
Neighbourhood
Patient
Cross-classified structure where patients from a
hospital come from many neighbourhoods and people
from a neighbourhood attend several hospitals.
Nested structure where hospitals are contained
within neighbourhoods
7811.6 Data example Artificial insemination by
donor
1901 women 279 donors 1328 donations 12100
ovulatory cycles response is whether conception
occurs in a given cycle
In terms of a unit diagram
Or a classification diagram
7911.7 Model for artificial insemination data
artificial insemination
We can write the model as
Results
8011.8 Multiple membership models
- Where level 1 units are members of more than one
higher level unit. For example, - Pupils change schools/classes and each
school/class has an effect on pupil outcomes - Patients are seen by more than one nurse during
the course of their treatment
8111.9 Notation
Note that nurse(i) now indexes the set of nurses
that treat patient i and w(2)i,j is a weighting
factor relating patient i to nurse j. For
example, with four patients and three nurses, we
may have the following weights
8211.10 Classification diagrams for multiple
membership relationships
Double arrows indicate a multiple membership
relationship between classifications
We can mix multiple membership, crossed and
hierarchical structures in a single model
8311.11 Example involving, nesting, crossing and
multiple membership Danish chickens
Production hierarchy 10,127 child flocks
725
houses 304 farms
Breeding hierarchy 10,127 child flocks 200 parent
flocks
As a unit diagram
As a classification diagram
8410.12 Model and results
8510.13 Alspac data
All the children born in the Avon area in 1990
followed up longitudinally
Many measurements made including educational
attainment measures
Children span 3 school year cohorts(say
1994,1995,1996)
Suppose we wish to model development of numeracy
over the schooling period. We may have the
following attainment measures on a child
m1 m2 m3 m4 m5 m6 m7
m8 primary school secondary
school
8611.14 Structure for primary schools
- Measurement occasions within pupils
- At each occasion there may be a different teacher
- Pupils are nested within primary school cohorts
- All this structure is nested within primary school
- Pupils are nested within residential areas
8711.17 A mixture of nested and crossed
relationships
Nodes directly connected by a single arrow are
nested, otherwise nodes are cross-classified. For
example, measurement occasions are nested within
pupils. However, cohort are cross-classified with
primary teachers, that is teachers teach more
than one cohort and a cohort is taught by more
than one teacher.
8811.16 Multiple membership
It is reasonable to suppose the attainment of a
child in a particualr year is influenced not only
by the current teacher, but also by teachers in
previous years. That is measurements occasions
are multiple members of teachers.
We represent this in the classification diagram
by using a double arrow.
8911.17 What happens if pupils move area?
Classification diagram without pupils moving
residential areas
If pupils move area, then pupils are no longer
nested within areas. Pupils and areas are
cross-classified. Also it is reasonable to
suppose that pupils measured attainments are
effected by the areas they have previously lived
in. So measurement occasions are multiple members
of areas
Classification diagram where pupils move between
residential areas
BUT
9011.18 If pupils move area they will also move
schools
Classification diagram where pupils move between
areas but not schools
If pupils move schools they are no longer nested
within primary school or primary school cohort.
Also we can expect, for the mobile pupils, both
their previous and current cohort and school to
effect measured attainments
Classification diagram where pupils move between
schools and areas
9111.19 If pupils move area they will also move
schools cntd
And secondary schools
We could also extend the above model to take
account of Secondary school, secondary school
cohort and secondary school teachers.
9211.20 Other predictor variables
Remember we are partitioning the variability in
attainment over time between primary school,
residential area, pupil, p. school cohort,
teacher and occasion. We also have predictor
variables for these classifications, eg pupil
social class, teacher training, school budget and
so on. We can introduce these predictor variables
to see to what extent they explain the
partitioned variability.
93Multiprocess multilevel models, selection effects
and causal inference
9412.0 Causal inference with observational data
Ideally we would like to perform a randomised
control trial to test causal relationships. Often
for ethical, financial or other reasons only
observational data are available to us.
We may have a theory that x causes y that we wish
to test. In a regression of y on x we find a
significant relationship.
However there could be a further unmeasured
variable z that causes both y and x which if we
were to include in our regression would alter the
coefficient for our x variable.
Such variables are called confounders and study
designers exert a lot of effort in order to think
up likely confounders and measure them.
But how do we know we have not missed an
important confounder?
9512.1 Effect of missing confounder variables
But that there is an unobserved confounder zi
that is correlated(partially determine) both yi
and xi. So the true relationships are
yi b0 b1xi b2 zi ei(b) xi c0 c1zi
ei(c)
So the unmodelled effect of b2 zi biases both a1
and ei(a) and results in cov(xi , ei(a)) being
non-zero.
9612.2 Endogeneity
If x variables are correlated with
residuals(error terms) this violates an important
assumptions required for least squares to be
unbiased and consistent. Where corr(xi, ei) ? 0
xi is said to be endogenous. This can arise
through missing confounders or selection
effects(as we will see later).
A solution is to fit a multiprocess(simultaneous
equation, multivariate response) model for both
yi and xi.
However as we will see we also require an
instrumental variable. That is a variable
correlated with xi but not yi indentify the
model.
9712.3 Multiprocess model to handle endogeneity
Given we have observed yi and xi both partially
determined by an unobserved confounder zi
We also have observed an instrumental, vi,
variable that is correlated with xi but not yi.
Fitting yi a0 a1 xi ei(a) will give a
biassed estimate of the regression coefficient of
xi
will give an unbiassed estimate of the
regression coefficient of xi
9812.4 A simulation experiment to demonstrate
multiprocess models
yi b0 b1xi b2 zi ei(b) xi c0 c1zi
ei(c)
Given the true model
1. Sample ei(b) and zifrom univariate standard
Normal distributions
3. Form xi c0 c1zi ei (c) c0 0,
c12
4. Form yi b0 b1xi b2 zi ei(b) b0
1, b12, b22
Now forget about zi and analyse using univariate
regression and multiprocess model with
instruments and see which comes closer to true
value of b12.
9912.5 Simulation Results
yi b0 b1xi ei(b) (1)
10012.6 level 1 variance
The true model(2) yi b0 b1 (c0 c1zi
ei(c)) b2 zi ei(b)
In model(1) we only fit yi b0 b1 xi ei(b)
So the unmodelled effect of b2 zi biases
upwards both b1 and ei(b) and therefore ?2e(b)
10112.7 Model identifiability
Recall endogeneity results corr(xi, ei) ? 0.
Q So why not just fit the bivariate model
That is the multiprocess model without the the
instrumental variable(vi) which in practice can
be difficult to find?
A. The bivariate model without the instrumental
variable is not identified. This means that there
is no unique solution.
10212.8 Non-identifiability
Both models have the same likelihood(fit measure)
but give different answers. There is no unique
solution to this model.
10312.9 Selection effects and endogeneity
yi average pupil attainment school i xi is
funding school receives We want to look at the
effect of funding on average pupil attainment.
We can fit yi b0 b1xi ei(b)
BUT.. Funding is not allocated at random to
schools. Schools with low attainment may be given
greater funding. This is a selection effect which
will cause an underestimation of b1. Again
corr(xi, ei) ? 0. So if we can find an
instrumental variable( correlated with xi but not
yi. We can apply a multiprocess model to obtain
an unbiassed estimate of b1.
10412.10 Data Example The Effect of School Resources
on Pupil Attainment
- Fiona Steele
- (DfES project led by Ros Levacic and Anna
Vignoles)
10512.11 Simple 3-level Model
yijk Key Stage 3 score for pupil i in school j in
LEA k in 2002/03 xijk Characteristics of
pupils, school or LEA zjk School resources
(expenditure and staffing) vk Random LEA
effect ujk Random school effect eijk Pupil
residual
10612.12 EndogeneityProblem
Resources not allocated at random, and allocation
may be related to pupil attainment. E.g. school
may get more resources for less able pupils.
If ignored estimate of ? (resource effect) will
be biased.
10712.13 Solution
Model y and z jointly
Note responses at different levels.
10812.14 Identification
Need instruments, variables that predict resource
allocation but not pupil attainment.
- School size in 1999 (attainment in 2002/03)
- Education Formula Spending (amount received by
LEA) - Party political affiliations of council (number
of years controlled by each party)
10912.15 Random Effect Correlations from 2-Equation
Model
School level -0.096 LEA-level -0.355
11012.16 Results Effect of Expenditure per Pupil on
KS3 Maths