3: Multilevel residuals and variance partitioning coefficient

Slides: 108

Provided by: rcla7

1
3 Multilevel residuals and variance partitioning
coefficient
2
3.1 2-level random-intercept multilevel model
review

Combined model in full

Fixed part

Random part (Level 2)

Random part (Level 1)
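The slide's equations did not survive in this transcript. As a reference point, a standard way of writing the two-level random-intercept model is sketched below, assuming one pupil-level predictor $x_{1ij}$ (e.g. standlrt) for pupil i in school j; the slide's exact notation may differ.

```latex
\begin{aligned}
y_{ij} &= \underbrace{\beta_0 + \beta_1 x_{1ij}}_{\text{fixed part}}
        + \underbrace{u_j}_{\text{random part (level 2)}}
        + \underbrace{e_{ij}}_{\text{random part (level 1)}} \\
u_j &\sim N(0, \sigma^2_u), \qquad e_{ij} \sim N(0, \sigma^2_e)
\end{aligned}
```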

3
3.2 Residuals in a two-level random intercept model, simplified to three groups (Red, Blue and Other districts)
4
3.3 Residuals in a single-level model
5
3.4 Random intercepts model
6
3.5 Residuals in a two-level model: Level 2
7
3.6 Residuals in a two-level model: Level 1
8
3.7 Estimating Multilevel Residuals
9
3.8 AKA Posterior residuals
10
3.9 AKA shrunken residuals
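For a random-intercept model, the estimated (posterior, shrunken) level 2 residual for school j is the school's mean raw residual multiplied by a shrinkage factor, $\hat{u}_j = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e / n_j}\,\bar{r}_j$. The Python sketch below illustrates this, borrowing the multilevel variance estimates from the table in 3.10 (0.155 and 0.848) purely as illustrative inputs; the function name and the raw residual value of 0.5 are hypothetical.

```python
def shrunken_residual(mean_raw_residual, n_j, sigma_u2, sigma_e2):
    """Posterior (shrunken) level 2 residual for a random-intercept model.

    mean_raw_residual: average of (y_ij - fixed-part prediction) in group j
    n_j: number of level 1 units in group j
    Returns (shrunken residual, shrinkage factor).
    """
    shrinkage = sigma_u2 / (sigma_u2 + sigma_e2 / n_j)
    return shrinkage * mean_raw_residual, shrinkage


# Illustrative variance estimates (multilevel column of the table in 3.10)
SIGMA_U2, SIGMA_E2 = 0.155, 0.848

for n_j in (5, 20, 100):
    u_hat, k = shrunken_residual(0.5, n_j, SIGMA_U2, SIGMA_E2)
    print(f"n_j={n_j:>3}  shrinkage={k:.3f}  u_hat={u_hat:.3f}")
```

Small schools are pulled more strongly towards zero (the overall line), which is why these estimates are also called shrunken residuals.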
11
3.10 Characteristics of dependent data

Data are dependent according to some grouping, e.g. members of a household tend to be alike, and measurements on occasions within people tend to be very alike.
Consequently there is not as much information as appears: there are not as many degrees of freedom as we think.
If ignored, this leads to overstatement of significance and underestimation of the standard errors.
Multilevel models model this dependency and automatically correct the standard errors.

Parameter                                 Single level      Multilevel
Intercept                                 -0.098 (0.021)    -0.101 (0.070)
Boy school                                 0.122 (0.049)     0.120 (0.149)
Girl school                                0.244 (0.034)     0.258 (0.117)
Between school variance ($\sigma^2_u$)     -                 0.155 (0.030)
Between student variance ($\sigma^2_e$)    0.985 (0.022)     0.848 (0.019)
12
3.11 What is the degree of dependency in the
data?
13
(No Transcript)
14
3.14 Meaning of the variance partitioning coefficient ($\rho$)

When $\rho = 0$:
- No dependency: all information is at level 1 and there are no differences between groups
When $\rho = 1$:
- Complete dependency
- Maximum difference between groupings, equivalent to complete similarity within groupings
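For the random-intercept model the VPC is the share of the total variance that lies at level 2, $\rho = \sigma^2_u / (\sigma^2_u + \sigma^2_e)$. A short Python check using the multilevel estimates from the table in 3.10 (the function name is illustrative):

```python
def vpc(sigma_u2, sigma_e2):
    """Variance partitioning coefficient: share of total variance at level 2."""
    return sigma_u2 / (sigma_u2 + sigma_e2)

# Multilevel estimates from the table in 3.10
rho = vpc(0.155, 0.848)
print(f"VPC = {rho:.3f}")

# The two limiting cases described above
assert vpc(0.0, 0.848) == 0.0   # no between-group differences
assert vpc(0.155, 0.0) == 1.0   # no within-group variation
```

So roughly 15% of the unexplained variance in that model lies between schools.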

15
Covariance structure of single level model
Pupils within schools FULL covariance structure
16
Covariance structure of 2 level RI model
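Under the two-level random-intercept model, each pupil has variance $\sigma^2_u + \sigma^2_e$, two pupils in the same school have covariance $\sigma^2_u$, and pupils in different schools are uncorrelated, so the full covariance matrix is block diagonal. The sketch builds that matrix for a tiny hypothetical layout (two schools with 2 and 3 pupils), reusing the table's multilevel estimates as illustrative values; in the single-level model every off-diagonal element would be zero.

```python
def ri_covariance(school_of, sigma_u2, sigma_e2):
    """Covariance matrix of y under a 2-level random-intercept model.

    school_of[i] gives the school of pupil i.
    """
    n = len(school_of)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if i == k:
                cov[i][k] = sigma_u2 + sigma_e2   # total variance
            elif school_of[i] == school_of[k]:
                cov[i][k] = sigma_u2              # same school: shared u_j
    return cov


school_of = [0, 0, 1, 1, 1]  # hypothetical: pupils 0-1 in school 0, 2-4 in school 1
V = ri_covariance(school_of, 0.155, 0.848)
for row in V:
    print(" ".join(f"{v:6.3f}" for v in row))
```

The implied within-school correlation, $\sigma^2_u / (\sigma^2_u + \sigma^2_e)$, is exactly the VPC.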
17
3.17 Fitting models in MLwiN

Work through (at your own pace) Chapter 3 of the manual: multilevel residuals
Don't be afraid to ask!

18
4.0 Contextual effects
In the previous sections we found that schools vary in both their intercepts and slopes, resulting in crossing lines. The next question is: are there any school level variables that can explain this variation?

Interest lies in how the outcome of individuals in a cluster is affected by their social contexts (measures at the cluster level). Typical questions are:
Does school type affect students' exam results?
Does area deprivation affect the health status of individuals in the area?

In our data set we have a contextual school ability measure, schav. The mean intake score is formed for each school, these means are ranked, and the ranks are categorised into 3 groups: low (< 25), mid (25 to 75) and high (> 75).
19
4.1 Exploring contextual effects and the tutorial
data
Does school gender affect exam score by gender? Do boys in boys' schools do better, worse or the same compared with boys in mixed schools? Do girls in girls' schools do better, worse or the same compared with girls in mixed schools?
Does peer group ability affect individual pupil performance? That is, given two pupils of equal intake ability, do they progress differently depending on whether they are educated in a low, mid or high ability peer group?
20
4.2 School gender effects
girl  boysch  girlsch  group               fixed part
0     0       0        boy/mixed school    $-0.189$
1     0       0        girl/mixed school   $-0.189 + 0.168$
0     1       0        boy/boy school      $-0.189 + 0.180$
1     0       1        girl/girl school    $-0.189 + 0.168 + 0.175$
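Each group's value is just the sum of the coefficients whose dummies are switched on. A short Python check of the four combinations, using the coefficient values shown in the table (assumed here to be the intercept, girl, boysch and girlsch coefficients respectively):

```python
# Coefficients read from the table: intercept, girl, boysch, girlsch
INTERCEPT, GIRL, BOYSCH, GIRLSCH = -0.189, 0.168, 0.180, 0.175

def predicted_intercept(girl, boysch, girlsch):
    """Group value implied by the dummy-variable coding."""
    return INTERCEPT + GIRL * girl + BOYSCH * boysch + GIRLSCH * girlsch

groups = {
    "boy/mixed school": (0, 0, 0),
    "girl/mixed school": (1, 0, 0),
    "boy/boy school": (0, 1, 0),
    "girl/girl school": (1, 0, 1),
}
for label, dummies in groups.items():
    print(f"{label:<18} {predicted_intercept(*dummies):+.3f}")
```

So, other things equal, boys in boys' schools are 0.180 above boys in mixed schools, and girls in girls' schools 0.175 above girls in mixed schools.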
21
4.3 Peer group ability effects
The effect of peer group ability is modelled as being constant across gender, school gender and standlrt. For example, comparing peer group ability effects for boys in mixed schools and boys in boys' schools:
$-0.265 + 0.552\,\mathrm{standlrt}_{ij}$   (boy, mixed school, low: reference group)
22
4.4 Cross level interactions
There may be interactions between school gender,
peer group ability, gender and standlrt. An
interesting interaction is between peer group
ability and standlrt. This tests whether the
effect of peer group differs across the standlrt
intake spectrum. For example, being in a high ability group may have a different effect for pupils of different ability. This is a cross-level interaction because it is the interaction between a pupil level variable (standlrt) and a school level variable (schav).
23
4.5 Cross level interactions contd
This leads to three lines for the low, mid and high groupings:
low: $-0.347 + 0.455\,\mathrm{standlrt}_{ij}$
mid: $(-0.347 + 0.144) + (0.455 + 0.092)\,\mathrm{standlrt}_{ij}$
high: $(-0.347 + 0.290) + (0.455 + 0.180)\,\mathrm{standlrt}_{ij}$
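Collapsing the three lines gives intercepts and slopes of (-0.347, 0.455) for low, (-0.203, 0.547) for mid and (-0.057, 0.635) for high. The Python sketch below does that arithmetic and evaluates each line at a few intake scores; the coefficients come from the slide, while the function and variable names are illustrative.

```python
# Reference (low) line and the mid/high adjustments read from the slide
BASE_INTERCEPT, BASE_SLOPE = -0.347, 0.455
ADJUST = {"low": (0.0, 0.0), "mid": (0.144, 0.092), "high": (0.290, 0.180)}

def peer_group_line(group):
    """Return (intercept, slope on standlrt) for a peer-group ability category."""
    d_int, d_slope = ADJUST[group]
    return BASE_INTERCEPT + d_int, BASE_SLOPE + d_slope

for group in ("low", "mid", "high"):
    a, b = peer_group_line(group)
    for x in (-2.0, 0.0, 2.0):
        print(f"{group:<5} standlrt={x:+.1f}  predicted={a + b * x:+.3f}")
```

Because the slope grows with peer-group ability, the predicted gap between the high and low groups widens as intake score increases.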
24
5.0 Variance functions or modelling
heteroscedasticity
Tabulating normexam by gender we see that the means and variances for boys and girls are (0.140 and 1.051) and (0.093 and 0.940). We may want to fit a model that estimates separate variances for boys and girls. The notation we have been using so far assumes a common intercept ($\beta_0$) and a single set of student residuals, $e_i$, with a common variance $\sigma^2_e$. We need to use a more flexible notation to build this model.
25
5.1 Working with general notation in MLwiN
A model with no variables specified in general notation looks like this.
A new first line is added stating that the response variable follows a Normal distribution. We now have the flexibility to specify alternative distributions for our response; we will explore these models later. The $\beta_0$ coefficient now has an explanatory variable $x_0$ associated with it. The values $x_0$ takes determine the meaning of the $\beta_0$ coefficient. If $x_0$ is a vector of 1s then $\beta_0$ will estimate an intercept common to all individuals; in the absence of other predictors this would be the overall mean. If $x_0$ is a dummy variable, say 1 for boys and 0 for girls, then $\beta_0$ will estimate the mean for boys.
26
5.2 A simple variance function
The new notation allows us to set up this simple
model where $x_{0i}$ is a dummy variable for boy and
$x_{1i}$ is a dummy variable for girl. This model
estimates separate means and variances for the
two groups. This is an example of a variance
function because the variance changes as a
function of explanatory variables. The function
is
27
5.3 Deriving the variance function
We arrive at the expression
(1)
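The derivation itself is not in the transcript. Assuming the model in 5.2 is $y_i = \beta_0 x_{0i} + \beta_1 x_{1i} + e_{0i}x_{0i} + e_{1i}x_{1i}$, with $x_{0i}$ the boy dummy and $x_{1i}$ the girl dummy, the usual rule for the variance of a sum gives:

```latex
\begin{aligned}
\operatorname{var}(e_{0i}x_{0i} + e_{1i}x_{1i})
  &= \sigma^2_{e0}x_{0i}^2 + 2\sigma_{e01}x_{0i}x_{1i} + \sigma^2_{e1}x_{1i}^2 \\
  &= \sigma^2_{e0}x_{0i}^2 + \sigma^2_{e1}x_{1i}^2
  \qquad \text{since } x_{0i}x_{1i} = 0 \text{ for every pupil}
\end{aligned}
```

So boys have variance $\sigma^2_{e0}$ and girls $\sigma^2_{e1}$; the tabulation in 5.0 suggests values of roughly 1.051 and 0.940.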
28
5.4 Variance functions at level 2
The notion of variance functions is powerful and
not restricted to level 1 variances.
The random slopes model fitted earlier produces
the following school level predictions which show
school level variability increasing with intake
score.
The model
29
5.5 Two views of the level 2 variance
Given $x_0 = 1$, we have
This shows that the level 2 variance is a polynomial (here quadratic) function of $x_{1ij}$.

View 1: in terms of school lines, with predicted intercepts and slopes varying across schools.

View 2: in terms of a variance function, which shows how the level 2 variance changes as a function of one or more explanatory variables.
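For the random slopes model with $x_0 = 1$, View 2 corresponds to $\operatorname{var}(u_{0j} + u_{1j}x_{1ij}) = \sigma^2_{u0} + 2\sigma_{u01}x_{1ij} + \sigma^2_{u1}x_{1ij}^2$. The Python sketch evaluates this quadratic over a range of intake scores; the three variance parameters are hypothetical placeholders, not estimates from the tutorial data.

```python
def level2_variance(x, var_u0, cov_u01, var_u1):
    """Level 2 variance function of a random slopes model (x0 = 1)."""
    return var_u0 + 2.0 * cov_u01 * x + var_u1 * x ** 2

# Hypothetical parameter values for illustration only
VAR_U0, COV_U01, VAR_U1 = 0.09, 0.02, 0.015

for x in (-3, -2, -1, 0, 1, 2, 3):
    v = level2_variance(x, VAR_U0, COV_U01, VAR_U1)
    print(f"standlrt={x:+d}  level 2 variance={v:.4f}")
```

With a positive intercept-slope covariance the variance rises with intake score over most of the range, matching the fanning-out of school lines described in 5.4; its minimum lies at $x_{1ij} = -\sigma_{u01}/\sigma^2_{u1}$.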
30
5.6 Elaborating the level 1 variance
Maybe the student level departures around their school's summary line are not constant.
Note that at level 2 we have 2 interpretations of level 2 random variation: random coefficients (varying slopes and intercepts across level 2 units) and variance functions. In each level 1 unit, by definition, we only have one point, so the first interpretation does not exist, because you cannot have a slope given a single data point.
31
5.7 Variance functions at level 1
If we allow standlrt ($x_{1ij}$) to have a random term at level 1, we get
32
5.8 Modelling the mean and variance simultaneously
In our model
33
6.0 Multivariate response models
We may have data on two or more responses we wish to analyse jointly. For example, we may have English and maths scores on pupils within schools. We can consider the response type as a level below pupil.
34
6.1 Rearranging data
Often data come like this, with one row per person.
For MLwiN to analyse the data we require the data matrix to have one row per level 1 unit, that is, one row per response measurement.
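The rearrangement can be sketched in plain Python. The column names (`student`, `english`, `maths`) and the scores are hypothetical; each response becomes its own row, tagged with dummy indicators for which response it is, and a missing score simply produces no row.

```python
# Hypothetical wide data: one row per student, None marks a missing score
wide = [
    {"student": 1, "english": 62.0, "maths": 55.0},
    {"student": 2, "english": None, "maths": 71.0},
    {"student": 3, "english": 48.0, "maths": None},
]

RESPONSES = ("english", "maths")

def to_long(rows):
    """One row per response measurement (level 1) within student (level 2)."""
    long_rows = []
    for row in rows:
        for resp in RESPONSES:
            score = row[resp]
            if score is None:          # missing responses are simply dropped
                continue
            long_rows.append({
                "student": row["student"],
                "resp": resp,
                "y": score,
                "z_english": 1 if resp == "english" else 0,  # response indicators
                "z_maths": 1 if resp == "maths" else 0,
            })
    return long_rows

long_data = to_long(wide)
for r in long_data:
    print(r)
```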
35
6.2 Writing down the model
Where $y_{1j}$ is the English score for student j and $y_{2j}$ is the maths score for student j. The means and variances for English and maths ($\beta_0$, $\beta_1$, $\sigma^2_{u0}$, $\sigma^2_{u1}$) are estimated, as is the covariance between maths and English, $\sigma_{u01}$.
Note there is no level 1 ($e_{ij}$) variance. This can be seen if we consider the picture for one pupil.
36
6.3 Advantages of framing a multivariate response
model as a multilevel model
The model has the following advantages over
traditional multivariate techniques:
It can deal with missing responses, provided response data are missing completely at random (MCAR) or missing at random (MAR), that is, missingness is related to explanatory variables in the model.
Covariates can be added, giving us the conditional covariance matrix of the responses.
Further levels can be added to the model.
37
6.4 Example from MLwiN user guide
Pupils have two responses, written and coursework:
Mean for written: 46.8; Variance(written): 178.7
Mean for coursework: 73.36; Variance(coursework): 265.4
Covariance(written, coursework): 102.3
That is, we have two means and a covariance matrix, which we could get from any stats package. However, the data are unbalanced: of the 1905 pupils, 202 are missing a written response and 180 are missing a coursework response.
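The covariance is easier to interpret once converted to a correlation, $\sigma_{12} / \sqrt{\sigma^2_1 \sigma^2_2}$. Using the slide's figures:

```python
import math

var_written, var_coursework, cov = 178.7, 265.4, 102.3  # from the slide
corr = cov / math.sqrt(var_written * var_coursework)
print(f"correlation(written, coursework) = {corr:.3f}")
```

This gives a moderate positive correlation of about 0.47 between the two responses.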
38
6.5 Further extensions
We can add further explanatory variables, for example, female. We see that females do better than males on coursework and worse than males on written exams.
39
6.6 Repeated measures.
We may have repeated measurements on individuals, for example a series of heights or test scores. Often we want to model people's growth. We can fit this structure as a multilevel model with repeated measurements nested within people. That is
40
6.7 Advantages of fitting repeated measures
models in a multilevel framework

Fitting these structures using a multilevel model has the advantages that data can be
Unbalanced (individuals can have different numbers of measurement occasions)
Unequally spaced (different individuals can be measured at different ages)
as opposed to traditional multivariate techniques, which require data to be balanced and equally spaced.
Again, the multilevel model requires that missing response measurements are MCAR or MAR.

41
6.8 An example from the MLwiN user guide
Repeated measures model for children's reading scores
This (random intercepts model) models growth as a linear process with individuals varying only in their intercepts. That is, for the 405 individuals in the data set
42
6.9 Further possibilities for repeated measures
model

We can go on to fit a random slope model, which in this case allows the model to deal with children growing at different rates.
We can fit polynomials in age to allow for curvilinear growth.
We can also try to explain between-individual variation in growth by introducing child level variables.
If appropriate we can include further levels of nesting. For example, if children are nested within schools we could fit a 3 level model: occasions within children within schools. We could then look to see if children's patterns of growth varied across schools.

43
7 Significance testing and model comparison

Individual fixed part and random coefficients at each level
Simultaneous and complex comparisons
Comparing nested models: likelihood ratio test
Use of the Deviance Information Criterion

44
7.1 Individual coefficients

Akin to t tests in regression models
Either a specific fixed effect or a specific variance-covariance component
Fixed effect: H0: $\beta = 0$, H1: $\beta \neq 0$
Variance component: H0: $\sigma^2_u = 0$, H1: $\sigma^2_u \neq 0$
Procedure: divide the estimated coefficient by its standard error
Judge against a z distribution
If the absolute ratio exceeds 1.96, then significant at the 0.05 level
Approximate procedure: asymptotic test, small sample properties not well known
OK for fixed part coefficients but not for random ones (typically small numbers; the variance distribution is likely to be skewed)
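The z procedure is easy to reproduce. The sketch below applies it to the Boy school coefficient from the table in 3.10, which also shows how ignoring clustering overstates significance; the two-sided p-value uses the identity $2(1 - \Phi(|z|)) = \operatorname{erfc}(|z|/\sqrt{2})$, and the function name is illustrative.

```python
import math

def wald_z(estimate, se):
    """Ratio of estimate to standard error and its two-sided normal p-value."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Boy school coefficient from the table in 3.10
for label, est, se in (("single level", 0.122, 0.049), ("multilevel", 0.120, 0.149)):
    z, p = wald_z(est, se)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{label:<12} z={z:.2f}  p={p:.3f}  ({verdict} at 0.05)")
```

The single-level model declares the effect significant; once the school-level dependency is modelled, the larger standard error removes that conclusion.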

45
7.2 Simultaneous/complex comparisons
recommended for random part testing

Example: testing H0: $\beta_2 - \beta_3 = 0$ AND $\beta_3 = 5$
H0: $C\beta = k$
C is the contrast matrix (p by q) specifying the nature of the hypothesis (q is the number of parameters in the model; p is the number of simultaneous tests)
Fill the contrast matrix with
1 if the parameter is involved
-1 if it is involved as a difference
0 if it is not involved
$\beta$ is a vector of q parameters (fixed or random)
k is a vector of values that the parameters are contrasted against (usually the null); these have to be set

Example: testing H0: $\beta_2 - \beta_3 = 0$ AND $\beta_3 = 5$
q = 4 (intercept and 3 slope terms)
p = 2 (2 sets of tests)
so $C = \begin{pmatrix} 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$ and $k = \begin{pmatrix} 0 \\ 5 \end{pmatrix}$

Overall test against chi-square with p degrees of freedom
Output:
Result of the contrast
Chi-square statistic for each test separately
Chi-square statistic for the overall test (all contrasts simultaneously)
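A minimal sketch of the standard Wald form of such a test, $(C\hat\beta - k)^\top (C V C^\top)^{-1} (C\hat\beta - k)$ with V the covariance matrix of the estimates, written in plain Python. That MLwiN computes exactly this is an assumption, and the estimates and covariance matrix below are hypothetical, chosen only to exercise the example H0: $\beta_2 - \beta_3 = 0$ and $\beta_3 = 5$.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def contrast_test(C, beta, V, k):
    """Wald chi-square statistics for H0: C beta = k."""
    cb = [sum(cij * bj for cij, bj in zip(row, beta)) for row in C]
    d = [x - kk for x, kk in zip(cb, k)]
    Ct = [list(col) for col in zip(*C)]
    CVCt = matmul(matmul(C, V), Ct)
    separate = [d[i] ** 2 / CVCt[i][i] for i in range(len(C))]  # one per test
    overall = sum(di * xi for di, xi in zip(d, solve(CVCt, d)))
    return cb, separate, overall, len(C)  # overall ~ chi-square on p df

C = [[0, 0, 1, -1],   # beta2 - beta3 = 0
     [0, 0, 0, 1]]    # beta3 = 5
k = [0.0, 5.0]
beta = [0.5, 1.0, 4.6, 4.8]       # hypothetical estimates
V = [[0.04, 0.0, 0.0, 0.0],       # hypothetical covariance matrix of estimates
     [0.0, 0.01, 0.0, 0.0],
     [0.0, 0.0, 0.09, 0.0],
     [0.0, 0.0, 0.0, 0.16]]
cb, separate, overall, df = contrast_test(C, beta, V, k)
print("contrast results:", cb)
print("separate chi-squares:", separate)
print(f"overall chi-square = {overall:.3f} on {df} df")
```

The overall statistic accounts for the correlation between the two contrasts (both involve $\beta_3$), which is why it is not simply the sum of the separate statistics.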

47
Testing in fixed part: (1) slope for Standlrt; (2) BoySch different from mixed; (3) GirlSch different from mixed; (4) BoySch different from GirlSch
Model > Intervals and tests > Fixed coefficients, 4 tests
Basic Statistics > Tail Areas > Chi square
CPRObability 1.586 1
0.20790
48
Testing in random part: (1) school variance; (2) difference between school and student variance
Model > Intervals and tests > Random coefficients, 2 tests
Basic Statistics > Tail Areas > Chi square
CPRObability 25.019 1
5.6768e-007
49
7.6 Do we need a quadratic variance function at level 2?
-> CPRObability 32.126 3
4.9230e-007
Benchmarks: CPRO 4 1 gives 0.046; CPRO 6 2 gives 0.050; CPRO 8 3 gives 0.046
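The tail probabilities returned by CPRObability are upper-tail chi-square probabilities. For integer degrees of freedom they have closed forms, so a pure-Python stand-in can reproduce the slide values; the function name `chi2_sf` is illustrative, not an MLwiN command.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in sqrt(x)
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * k + 1)
    return p

# Reproduce the CPRObability calls on the preceding slides
for x, df in ((1.586, 1), (25.019, 1), (32.126, 3), (4, 1), (6, 2), (8, 3)):
    print(f"CPRO {x} {df} -> {chi2_sf(x, df):.5g}")
```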
50
7.7 Comparing nested models likelihood ratio test

Akin to F tests in regression models, i.e. is a more complex model a significantly better fit to the data, or is a simpler model a significantly worse fit?
Procedure
Calculate the difference in the deviance of the two models
Calculate the change in complexity as the difference in the number of parameters between models
Compare the difference in deviance with a chi-square distribution with df = difference in number of parameters
Example, tutorial data:
do we get a significant improvement in the fit if we move from a constant variance function for schools to a quadratic involving Standlrt?

51
-2log(lh) is 9305.78 (quadratic); -2log(lh) is 9349.42 (constant)
-> calc b3 = b2 - b1
43.644
-> cpro 43.410 2
3.7466e-010
NB: the constant model is a significantly worse fit, i.e. we need the quadratic
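The test can be recomputed from the two deviances shown. For df = 2 the chi-square upper tail is simply $e^{-x/2}$; the sketch below uses the general even-df closed form, takes df = 2 from the cpro call on the slide, and uses a hypothetical function name.

```python
import math

def lr_test(dev_simple, dev_complex, df):
    """Likelihood ratio test of nested models (even df only in this sketch)."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    diff = dev_simple - dev_complex        # drop in deviance, -2log(lh)
    half = diff / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):            # even-df chi-square tail series
        term *= half / k
        total += term
    return diff, math.exp(-half) * total

# -2log(lh) for the constant and quadratic level 2 variance functions
diff, p = lr_test(9349.42, 9305.78, df=2)
print(f"deviance difference = {diff:.2f}, p = {p:.3g}")
```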
52
7.9 Deviance Information Criterion

Diagnostic for model comparison
Goodness of fit criterion that is penalized for model complexity
Generalization of the Akaike Information Criterion (AIC, where df is known)
Used for comparing non-nested models (e.g. the same number of parameters but different variables)
Valuable in MLwiN for testing improved goodness of fit of non-linear models (e.g. logit), because the likelihood (and hence the deviance) from the quasi-likelihood estimates is incorrect
Estimated by MCMC sampling; on output we get the
Bayesian Deviance Information Criterion (DIC)
Dbar       D(thetabar)   pD      DIC
9763.54    9760.51       3.02    9766.56
Dbar: the average deviance from the complete set of iterations
D(thetabar): the deviance at the expected value of the unknown parameters
pD: the estimated degrees of freedom consumed in the fit, i.e. Dbar - D(thetabar)
DIC = Fit + Complexity = Dbar + pD
NB: lower values indicate a better, more parsimonious model
Somewhat controversial!
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583-640.
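The DIC arithmetic can be checked from the printed Dbar and D(thetabar). Recomputing gives pD of about 3.03 and DIC of about 9766.57, a 0.01 difference from the slide that is consistent with the printed values having been rounded.

```python
d_bar, d_theta_bar = 9763.54, 9760.51   # printed on the slide
p_d = d_bar - d_theta_bar               # effective number of parameters
dic = d_bar + p_d                       # fit + complexity (= D(thetabar) + 2 pD)
print(f"pD = {p_d:.2f}, DIC = {dic:.2f}")
```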

53
7.10 Some guidance

Any decrease in DIC suggests a better model.
But given the stochastic nature of MCMC, with a small difference in DIC you should confirm whether this is a real difference by checking the results with different seeds and/or starting values.
There is more experience with AIC, and common rules of thumb exist for it.

7.11 Example: tutorial dataset
Model 1: NULL model, a constant and level 1 variance
Model 2: additionally include slope for Standlrt
Model 3: 65 fixed school effects (64 dummies and constant)
Model 4: school as random effects
Model 5: 65 fixed school intercepts and slopes
Model 6: random slopes model, quadratic variance function

Best: Model 6. Note that the random models (4 and 6) have more nominal parameters than their fixed equivalents but fewer effective parameters and a lower DIC value (due to distributional assumptions)
55
8.0 Generalised Multilevel Models 1: Binary Responses and Proportions
56
8.0 Generalised multilevel models

So far:
the response at level 1 has been a continuous variable, and
the associated level 1 random term has been assumed to have a Normal distribution
Now: a range of other data types for the response
All can be handled routinely by MLwiN
Achieved by 2 aspects:
a non-linear link between response and predictors
a non-Gaussian level 1 distribution

57
8.1 Typology of discrete responses
58
8.2 Focus on modelling proportions

Proportions, e.g. death rate or employment rate, can be conceived as the underlying probability of dying or probability of being employed
Four important attributes of a proportion that MUST be taken into account in modelling:
(1) Closed range: bounded between 0 and 1
(2) Anticipated non-linearity between response and predictors: as the predicted response approaches the bounds, a greater and greater change in x is required to achieve the same change in outcome (examination analogy)
(3) Two numbers: the numerator is a subset of the denominator
(4) Heterogeneity: the variance is not homoscedastic; two aspects
(a) the variance depends on the mean: as we approach the bounds of 0 and 1 there is less room to vary, i.e. the variance is a function of the predicted probability
(b) the variance depends on the denominator: small denominators result in highly variable proportions
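Both aspects of heterogeneity follow from the binomial variance of an observed proportion, $p(1-p)/n$: it shrinks as p approaches 0 or 1, and as the denominator n grows. A quick Python illustration (the function name is illustrative):

```python
def proportion_variance(p, n):
    """Binomial variance of an observed proportion with denominator n."""
    return p * (1.0 - p) / n

for p in (0.5, 0.9, 0.99):
    small, large = proportion_variance(p, 10), proportion_variance(p, 1000)
    print(f"p={p:<4}  n=10: {small:.5f}   n=1000: {large:.7f}")
```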

59
8.3 Modelling Proportions

Linear probability model, that is, use a standard regression model with a linear relationship and a Gaussian random term
But 3 problems:
(1) Nonsensical predictions: predicted proportions are unbounded, outside the range of 0 and 1
(2) Anticipated non-linearity as we approach the bounds
(3) Heterogeneity: inherent unequal variance, dependent on the mean and on the denominator
A logit model with a Binomial random term resolves all three problems (could also use probit or complementary log-log)

60
8.5 The logistic model resolves problems 1 and 2

The relationship between the probability and predictor(s) can be represented by a logistic function that resembles an S-shaped curve

Models not the proportion but a non-linear transformation of it (solves problems 1 and 2)

61
8.6 The Logit transformation

$L = \log_e\left(\frac{p}{1-p}\right)$
L: the logit, the log of the odds
p: proportion having an attribute
1 - p: proportion not having the attribute
p/(1 - p): the odds of having an attribute compared to not having it
As p goes from 0 to 1, L goes from minus to plus infinity, so if we model L we cannot get predicted proportions that lie outside 0 and 1 (i.e. this solves problem 1)
Easy to move between proportions, odds and logits
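Moving between the three scales takes one line each. A small Python sketch (function names are illustrative):

```python
import math

def odds(p):
    """Odds of having the attribute versus not having it."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds (natural logarithm)."""
    return math.log(odds(p))

def inv_logit(L):
    """Back-transform a logit to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-L))

for p in (0.1, 0.5, 0.8):
    L = logit(p)
    print(f"p={p}  odds={odds(p):.3f}  logit={L:+.3f}  back={inv_logit(L):.3f}")
```

A proportion of 0.5 corresponds to odds of 1 and a logit of 0; large positive or negative logits map back to proportions close to, but inside, 0 and 1.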

62
8.7 Proportions, Odds and Logits
63
8.8 The logistic model

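The underlying probability or proportion is non-linearly related to the predictor:
$p_i = \frac{e^{\beta_0 + \beta_1 x_{1i}}}{1 + e^{\beta_0 + \beta_1 x_{1i}}}$
where e is the base of the natural logarithm,
linearized by the logit transformation (log = natural logarithm):
$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1i}$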

64
8.9 The logistic model key characteristics

The logit transformation produces a linear function of the parameters
Predicted probabilities are bounded between 0 and 1
Thereby solving problems 1 and 2

65
8.10 Solving problem 3: assume Binomial variation

The variance of the response in logistic models is presumed to be binomial,

i.e. it depends on the underlying proportion and the denominator: $\mathrm{var}(y_i \mid p_i) = p_i(1 - p_i)/n_i$ for an observed proportion with denominator $n_i$
In practice this is achieved by replacing the constant variable at level 1 by a binomial weight, z, and constraining the level-1 variance to 1 for exact binomial variation
The random (level-1) component can be written as
66
8.11 Multilevel Logistic Model

Assume observed response comes from a Binomial
distribution with a denominator for each cell,
and an underlying probability/proportion

Underlying proportions/probabilities, in turn,
are related to a set of individual and
neighbourhood predictors by the logit link function

Linear predictor of the fixed part and the
higher-level random part

67
8.12 Estimation 1

Quasi-likelihood (MQL/PQL, 1st and 2nd order): model linearised and IGLS applied
1st or 2nd order Taylor series expansion (to linearise the non-linear model)
MQL versus PQL: are higher-level effects included in the linearisation?
MQL1: crudest approximation. Estimates may be biased downwards (esp. if within-cluster sample size is small and between-cluster variance is large, e.g. households), but stable
PQL2: best approximation, but may not converge
Tip: start with MQL1 to get starting values for PQL

68
8.13 Estimation 2

MCMC methods: get the deviance of the model (DIC) for sequential model testing, and good quality estimates even where cluster size is small; start with MQL1 and then switch to MCMC

69
8.14 Variance Partition Coefficient
$y_{ij} \sim \mathrm{Binomial}(p_{ij}, 1)$
$\mathrm{logit}(p_{ij} \mid x_{1ij}, u_j) = \alpha + \beta x_{1ij} + u_j$
$\mathrm{Var}(u_j) = \sigma^2_u$
$\mathrm{var}(y_{ij} - p_{ij}) = p_{ij}(1 - p_{ij})$: the level 1 variance is a function of the predicted probability
The level 2 variance $\sigma^2_u$ is on the logit scale and the level 1 variance $\mathrm{var}(y_{ij} - p_{ij})$ is on the probability scale, so they cannot be directly compared. Also, the level 1 variance depends on $p_{ij}$ and therefore on $x_{1ij}$. Possible solutions include (i) setting the level 1 variance equal to the variance of a standard logistic distribution, $\pi^2/3 \approx 3.29$; (ii) a simulation method
70
8.15 VPC 1 Threshold Model
But this ignores the fact that the level 1 variance is not constant, but is a function of the mean probability, which depends on the predictors in the fixed part of the model
71
8.16 VPC 2 Simulation Method
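The two VPC approaches can be compared in a short sketch. The threshold version fixes the level 1 variance at $\pi^2/3$. The simulation version follows the usual recipe, assumed here since the slide's steps are not in the transcript: draw many $u_j$ from $N(0, \sigma^2_u)$, convert each to a probability, take the variance of those probabilities as the level 2 variance and the mean of $p(1-p)$ as the level 1 variance. The level 2 variance of 0.3 is hypothetical.

```python
import math
import random
import statistics

def vpc_threshold(sigma_u2):
    """Threshold (latent variable) VPC: level 1 variance fixed at pi^2 / 3."""
    return sigma_u2 / (sigma_u2 + math.pi ** 2 / 3)

def vpc_simulation(linear_predictor, sigma_u2, n_sims=50_000, seed=1):
    """Simulation VPC on the probability scale for one fixed-part value."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_sims):
        u = rng.gauss(0.0, math.sqrt(sigma_u2))              # simulate u_j
        probs.append(1.0 / (1.0 + math.exp(-(linear_predictor + u))))
    level2 = statistics.pvariance(probs)                      # variance of p_j
    level1 = statistics.fmean(p * (1 - p) for p in probs)     # mean binomial variance
    return level2 / (level2 + level1)

SIGMA_U2 = 0.3  # hypothetical level 2 variance on the logit scale
print(f"threshold VPC: {vpc_threshold(SIGMA_U2):.3f}")
for xb in (-2.0, 0.0, 2.0):
    print(f"simulation VPC at fixed part {xb:+.1f}: {vpc_simulation(xb, SIGMA_U2):.3f}")
```

Unlike the threshold version, the simulation VPC changes with the value of the fixed part, reflecting the point made in 8.15.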
72
8.17 Multilevel modelling of binary data

Exactly the same as proportions except
The response is either 1 or 0
The denominator is a set of 1s
So that a Yes is 1 out of 1, while a No is 0 out of 1

73
8.18 Chapter 9 of Manual Contraceptive Use in
Bangladesh

2867 women nested in 60 districts
y = 1 if using contraception at time of survey, y = 0 if not using contraception
Covariates: age (mean centred), urban residence (vs. rural)

74
8.19 Random Intercept Model PQL2
75
8.20 Variance Partition Coefficient
76
8.21 MLwiN Gives

UNIT (or SUBJECT) SPECIFIC estimates:
the fixed effects conditional on higher level unit random effects, NOT the POPULATION-AVERAGE estimates, i.e. the marginal expectation of the dependent variables across the population, "averaged" across the random effects
In non-linear models these are different, and the PA estimates will generally be smaller than the US estimates, especially as the size of the random effects grows
Can derive PA from US but not vice versa (the next version will give both)

77
8.22 Unit specific / Population average

Probability of adverse reaction against dose
Left: subject-specific curves, with big differences between subjects at the middle dose (the between-patient variance is large)
Right: the population average dose response curve
Subject-specific curves have a steeper slope in the middle range of the dose variable
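Why the population-average curve is flatter can be seen by averaging the subject-specific S-curves over the random effects. The sketch integrates $\mathrm{expit}(\beta_0 + \beta_1 x + u)$ over $u \sim N(0, \sigma^2_u)$ with a simple midpoint rule and compares the implied population-average logit slope with the unit-specific one; the model values are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def population_average(x, b0, b1, sigma_u, n_grid=2001, width=8.0):
    """Average the unit-specific probability over u ~ N(0, sigma_u^2).

    Midpoint-rule integration over +/- width standard deviations.
    """
    step = 2.0 * width / n_grid
    total = 0.0
    for k in range(n_grid):
        z = -width + (k + 0.5) * step                  # standard normal grid point
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += expit(b0 + b1 * x + sigma_u * z) * density * step
    return total

# Hypothetical unit-specific (subject-specific) model
B0, B1, SIGMA_U = -2.0, 1.0, 2.0
us_slope = B1
pa_slope = (logit(population_average(2.5, B0, B1, SIGMA_U))
            - logit(population_average(1.5, B0, B1, SIGMA_U)))
print(f"unit-specific slope: {us_slope:.3f}")
print(f"population-average slope near x=2: {pa_slope:.3f}")
```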

78
9.0 MCMC estimation in MLwiN
MCMC estimation is a big topic and is given a pragmatic and cursory treatment here. Interested students are referred to the manual MCMC estimation in MLwiN, available from http://multilevel.ioe.ac.uk/beta/index.html
In the workshop so far you have been using the IGLS (Iterative Generalised Least Squares) algorithm to estimate the models.
79
9.1 IGLS versus MCMC
MCMC
IGLS
80
9.2 Bayesian framework
MCMC estimation operates in a Bayesian framework. A Bayesian framework requires one to think about prior information we have on the parameters we are estimating and to formally include that information in the model. We may decide that we are in a state of complete ignorance about the parameters we are estimating, in which case we must specify a so-called uninformative prior. The posterior distribution for a parameter $\theta$ given that we have observed y is subject to the following rule:
$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$
where $p(\theta \mid y)$ is the posterior distribution for $\theta$ given we have observed y, $p(y \mid \theta)$ is the likelihood of observing y given $\theta$, and $p(\theta)$ is the probability distribution arising from some statement of prior belief, such as "we believe $\theta \sim N(1, 0.01)$". Note that "we believe $\theta \sim N(1, 1)$" is a much weaker and therefore less influential statement of prior belief.
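In the simplest case, a normal mean with known data variance and a normal prior, the posterior rule has a closed form, which makes the difference between the two priors concrete. Here N(1, 0.01) and N(1, 1) are read as (mean, variance); the data are hypothetical, and real multilevel models are not this simple.

```python
def normal_posterior(prior_mean, prior_var, data, sigma2):
    """Posterior of a normal mean with known data variance sigma2 (conjugate prior)."""
    n = len(data)
    ybar = sum(data) / n
    post_prec = 1.0 / prior_var + n / sigma2          # precisions add
    post_mean = (prior_mean / prior_var + n * ybar / sigma2) / post_prec
    return post_mean, 1.0 / post_prec

data = [2.1, 1.7, 2.4, 2.0, 1.8]   # hypothetical observations, mean 2.0
for prior_var in (0.01, 1.0):
    m, v = normal_posterior(1.0, prior_var, data, sigma2=1.0)
    print(f"prior N(1, {prior_var}): posterior mean={m:.3f}, variance={v:.4f}")
```

The tight prior keeps the posterior mean close to 1, whereas the weak prior lets the data pull it most of the way towards the sample mean of 2.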
81
9.3 Applying MCMC to multilevel models
In a two level variance components model we have
the following unknowns
Their joint posterior is
82
9.4 Gibbs sampling
Evaluating the expression for the joint posterior with all the parameters unknown is, for most models, virtually impossible. However, if we take each unknown parameter in turn and temporarily assume we know the values of the other parameters, then we can simulate from the so-called conditional posterior distribution. The Gibbs sampling algorithm cycles through the following simulation steps. First we assume some starting values for our unknown parameters
83
9.5 Gibbs sampling cntd
We have now updated all the unknowns in the model. This process is repeated many times until eventually we converge on the distribution of each of the unknown parameters.
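The cycle above can be sketched in plain Python for a two-level variance components model $y_{ij} = \beta_0 + u_j + e_{ij}$. This is a minimal illustration, not MLwiN's implementation: it assumes a flat prior on $\beta_0$ and Gamma(0.001, 0.001) priors on the two precisions, and fits simulated data with known true values.

```python
import random
import statistics

def gibbs_vc(y, groups, n_iter=800, burn_in=200, seed=1, a=0.001, b=0.001):
    """Gibbs sampler for y_ij = beta + u_j + e_ij; returns (beta, s2_u, s2_e) draws."""
    rng = random.Random(seed)
    J, N = max(groups) + 1, len(y)
    members = [[] for _ in range(J)]
    for i, g in enumerate(groups):
        members[g].append(i)
    beta, u = statistics.fmean(y), [0.0] * J          # starting values
    s2_u, s2_e = 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # 1. beta | u, s2_e (flat prior): Normal
        mean_b = sum(y[i] - u[groups[i]] for i in range(N)) / N
        beta = rng.gauss(mean_b, (s2_e / N) ** 0.5)
        # 2. each u_j | beta, s2_u, s2_e: Normal
        for j in range(J):
            d_j = sum(y[i] - beta for i in members[j])
            prec = len(members[j]) / s2_e + 1.0 / s2_u
            u[j] = rng.gauss(d_j / s2_e / prec, (1.0 / prec) ** 0.5)
        # 3. 1/s2_u | u: Gamma(a + J/2, rate = b + sum u^2 / 2)
        rate_u = b + sum(uj * uj for uj in u) / 2
        s2_u = 1.0 / rng.gammavariate(a + J / 2, 1.0 / rate_u)
        # 4. 1/s2_e | beta, u: Gamma(a + N/2, rate = b + sum e^2 / 2)
        rate_e = b + sum((y[i] - beta - u[groups[i]]) ** 2 for i in range(N)) / 2
        s2_e = 1.0 / rng.gammavariate(a + N / 2, 1.0 / rate_e)
        if it >= burn_in:
            draws.append((beta, s2_u, s2_e))
    return draws

# Simulated data: 40 groups of 15, beta = 1, s2_u = 0.5, s2_e = 1
sim = random.Random(42)
groups, y = [], []
for j in range(40):
    u_true = sim.gauss(0.0, 0.5 ** 0.5)
    for _ in range(15):
        groups.append(j)
        y.append(1.0 + u_true + sim.gauss(0.0, 1.0))

draws = gibbs_vc(y, groups)
post_mean = [statistics.fmean(d[k] for d in draws) for k in range(3)]
print("posterior means (beta, s2_u, s2_e):", [round(m, 3) for m in post_mean])
```

Note that `random.gammavariate` takes a shape and a scale, so the rate is inverted before the call.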
84
9.6 IGLS vs MCMC convergence
The IGLS algorithm converges, deterministically, to a point.
The MCMC algorithm converges on a distribution. Parameter estimates and intervals are then calculated from the simulation chains.
85
9.7 Other MCMC issues
By default MLwiN uses flat, uninformative priors; see page 5 of MCMC estimation in MLwiN (MEM). For specifying informative priors see chapter 6 of MEM. For model comparison in MCMC using the DIC statistic see chapters 3 and 4 of MEM. For a description of the MCMC algorithms used in MLwiN see chapter 2 of MEM.
86
9.8 When to consider using MCMC in MLwiN
If you have discrete response data: binary, binomial, multinomial or Poisson (chapters 11, 12, 20 and 21). Often PQL gives quick and accurate estimates for these models. However, it is a good idea to check against MCMC to test for bias in the PQL estimates.
If you have few level 2 units and you want to make accurate inferences about the distribution of higher level variances.
Some of the more advanced models in MLwiN are only available in MCMC. For example, factor analysis (chapter 19), measurement error in predictor variables (chapter 14) and CAR spatial models (chapter 16).
Other models can be fitted in IGLS but are handled more easily in MCMC, such as multiple imputation (chapter 17), cross-classified (chapter 14) and multiple membership models (chapter 15).
All chapter references are to MCMC estimation in MLwiN.
87
10.0 Non-hierarchical multilevel models

Two types
Cross-classified models
Multiple membership models

88
10.1 Cross-classification
For example, hospitals by neighbourhoods.
Hospitals will draw patients from many different
neighbourhoods and the inhabitants of a
neighbourhood will go to many hospitals. No pure
hierarchy can be found and patients are said to
be contained within a cross-classification of
hospitals by neighbourhoods

89
10.2 Other examples of cross-classifications

pupils within primary schools by secondary
schools
patients within GPs by hospitals
interviewees within interviewers by surveys
repeated measures within raters by individual (e.g. patients by nurses)

90
10.3 Notation
With hierarchical models we have subscript notation that has one subscript per level, and nesting is implied reading from the left. For example, the subscript pattern ijk denotes the ith level 1 unit within the jth level 2 unit within the kth level 3 unit. If models become cross-classified, we use the term classification instead of level. With notation that has one subscript per classification and that captures the relationship between classifications, the notation can become very cumbersome. We propose an alternative notation that only has a single subscript no matter how many classifications are in the model.
91
10.4 Single subscript notation
We write the model as
Where classification 2 is nbhd and classification
3 is hospital. Classification 1 always
corresponds to the classification at which the
response measurements are made, in this case
patients. For patients 1 and 11 equation (1)
becomes
92
10.5 Classification diagrams
In the single subscript notation we lose information about the relationship (crossed or nested) between classifications. A useful way of conveying this information is with the classification diagram, which has one node per classification; nodes linked by arrows have a nested relationship and unlinked nodes have a crossed relationship.
Hospital
Neighbourhood
Patient
Cross-classified structure where patients from a
hospital come from many neighbourhoods and people
from a neighbourhood attend several hospitals.
Nested structure where hospitals are contained
within neighbourhoods
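Whether two classifications are nested or crossed can be read off the data: classification A is nested in B if every A unit occurs with only one B unit. A small Python sketch with hypothetical patient records (the function name is illustrative):

```python
from collections import defaultdict

def is_nested(lower, upper):
    """True if every unit of `lower` belongs to exactly one unit of `upper`.

    lower[i] and upper[i] are the two classification units for record i.
    """
    parents = defaultdict(set)
    for a, b in zip(lower, upper):
        parents[a].add(b)
    return all(len(s) == 1 for s in parents.values())

# Hypothetical patient records: hospital and neighbourhood of each patient
hospital = ["H1", "H1", "H2", "H2", "H3"]
nbhd = ["N1", "N2", "N1", "N2", "N2"]
print("hospitals nested in neighbourhoods:", is_nested(hospital, nbhd))
print("neighbourhoods nested in hospitals:", is_nested(nbhd, hospital))
```

Neither direction is nested here, so hospitals and neighbourhoods are cross-classified; in a classification diagram the two nodes would be left unlinked.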
93
10.6 Data example Artificial insemination by
donor
1901 women, 279 donors, 1328 donations, 12100 ovulatory cycles; the response is whether conception occurs in a given cycle
In terms of a unit diagram
Or a classification diagram
94
10.7 Model for artificial insemination data
artificial insemination
We can write the model as
Results
95
10.8 Multiple membership models

Where level 1 units are members of more than one
higher level unit. For example,
Pupils change schools/classes and each
school/class has an effect on pupil outcomes
Patients are seen by more than one nurse during
the course of their treatment

96
10.9 Notation
Note that nurse(i) now indexes the set of nurses
that treat patient i and w(2)i,j is a weighting
factor relating patient i to nurse j. For
example, with four patients and three nurses, we
may have the following weights
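The slide's weight table is not in the transcript, so the values below are hypothetical. In a multiple membership model the nurse contribution for patient i is the weighted sum $\sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j} u_j$; the sketch assumes, as is common, that each patient's weights sum to 1, and the nurse effects are also hypothetical.

```python
# Hypothetical weights w[patient][nurse]: share of patient i's treatment by nurse j
weights = {
    1: {"A": 1.0},
    2: {"A": 0.5, "B": 0.5},
    3: {"B": 0.25, "C": 0.75},
    4: {"C": 1.0},
}

# Hypothetical nurse effects u_j
nurse_effect = {"A": 0.4, "B": -0.2, "C": 0.1}

def nurse_contribution(w, effects):
    """Weighted sum of the effects of the nurses who treated one patient."""
    return sum(w_ij * effects[j] for j, w_ij in w.items())

for patient, w in weights.items():
    assert abs(sum(w.values()) - 1.0) < 1e-12   # weights for each patient sum to 1
    print(f"patient {patient}: nurse contribution = {nurse_contribution(w, nurse_effect):+.3f}")
```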
97
10.10 Classification diagrams for multiple
membership relationships
Double arrows indicate a multiple membership
relationship between classifications
We can mix multiple membership, crossed and
hierarchical structures in a single model
98
10.11 Example involving nesting, crossing and multiple membership: Danish chickens
Production hierarchy: 10,127 child flocks, 725 houses, 304 farms
Breeding hierarchy: 10,127 child flocks, 200 parent flocks
As a unit diagram
As a classification diagram
99
10.12 Model and results
100
10.13 Alspac data
All the children born in the Avon area in 1990
followed up longitudinally
Many measurements made including educational
attainment measures
Children span 3 school year cohorts (say 1994, 1995, 1996)
Suppose we wish to model development of numeracy over the schooling period. We may have the following attainment measures on a child:
m1 m2 m3 m4 m5 m6 m7 m8 (measurements spanning primary school and secondary school)
101
10.14 Structure for primary schools

Measurement occasions within pupils

At each occasion there may be a different teacher

Pupils are nested within primary school cohorts

All this structure is nested within primary school

Pupils are nested within residential areas

102
10.15 A mixture of nested and crossed
relationships
Nodes directly connected by a single arrow are
nested, otherwise nodes are cross-classified. For
example, measurement occasions are nested within
pupils. However, cohorts are cross-classified with primary teachers; that is, teachers teach more than one cohort and a cohort is taught by more than one teacher.
103
10.16 Multiple membership
It is reasonable to suppose the attainment of a child in a particular year is influenced not only by the current teacher, but also by teachers in previous years. That is, measurement occasions are multiple members of teachers.
We represent this in the classification diagram
by using a double arrow.
104
10.17 What happens if pupils move area?
Classification diagram without pupils moving
residential areas
If pupils move area, then pupils are no longer nested within areas: pupils and areas are cross-classified. Also, it is reasonable to suppose that pupils' measured attainments are affected by the areas they have previously lived in. So measurement occasions are multiple members of areas.
Classification diagram where pupils move between
residential areas
BUT
105
10.18 If pupils move area they will also move
schools
Classification diagram where pupils move between
areas but not schools
If pupils move schools they are no longer nested within primary school or primary school cohort. Also we can expect, for the mobile pupils, both their previous and current cohort and school to affect measured attainments.
Classification diagram where pupils move between
schools and areas
106
10.19 If pupils move area they will also move
schools cntd
And secondary schools
We could also extend the above model to take
account of Secondary school, secondary school
cohort and secondary school teachers.
107
10.20 Other predictor variables
Remember we are partitioning the variability in
attainment over time between primary school,
residential area, pupil, p. school cohort,
teacher and occasion. We also have predictor
variables for these classifications, eg pupil
social class, teacher training, school budget and
so on. We can introduce these predictor variables
to see to what extent they explain the
partitioned variability.
