3: Multilevel residuals and variance partitioning coefficient

Provided by: rcla7
1
3 Multilevel residuals and variance partitioning
coefficient
2
3.1 2-level random-intercept multilevel model
review
  • Combined model in full
  • Fixed part
  • Random part (Level 2)
  • Random part (Level 1)

3
3.2 Residuals in a two-level random intercept
model, simplified to three groups (Red, Blue,
Other districts)
4
3.3 Residuals in a single-level model
5
3.4 Random intercepts model
6
3.5 Residuals in a two-level model: Level 2
7
3.6 Residuals in a two-level model: Level 1
8
3.7 Estimating Multilevel Residuals
9
3.8 AKA Posterior residuals
10
3.9 AKA shrunken residuals
11
3.10 Characteristics of dependent data
  • Observations are dependent according to some grouping, e.g. households tend to be alike, and measurements on occasions within people tend to be very alike
  • Consequently there is not as much information as appears: not as many degrees of freedom as we think
  • If ignored, this leads to under-estimated standard errors and hence overstatement of significance
  • Multilevel models model this dependency and automatically correct the standard errors

Parameter                         Single level      Multilevel
Intercept                         -0.098 (0.021)    -0.101 (0.070)
Boy school                         0.122 (0.049)     0.120 (0.149)
Girl school                        0.244 (0.034)     0.258 (0.117)
Between school variance (σ²u)                        0.155 (0.030)
Between student variance (σ²e)     0.985 (0.022)     0.848 (0.019)
12
3.11 What is the degree of dependency in the
data?
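One way to answer this question is the variance partitioning coefficient, the share of the total variance that lies between groups. The sketch below is illustrative only: it assumes the usual definition ρ = σ²u / (σ²u + σ²e) and plugs in the multilevel estimates from the table in 3.10. The function name `vpc` is hypothetical.

```python
# Variance estimates from the multilevel column of the table in 3.10
sigma2_u = 0.155  # between-school variance
sigma2_e = 0.848  # between-student (within-school) variance


def vpc(level2_var, level1_var):
    """Share of the total variance that lies at level 2."""
    return level2_var / (level2_var + level1_var)


rho = vpc(sigma2_u, sigma2_e)
print(f"VPC = {rho:.3f}")
```

On these figures roughly 15% of the variation in the response lies between schools.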
13
(No Transcript)
14
3.14 Meaning of ρ
  • When ρ = 0
  • - No dependency, all information is at level 1, no differences between groups
  • When ρ = 1
  • - Complete dependency
  • - Maximum difference between groups, equivalent to complete similarity within groups

15
Covariance structure of single level model
Pupils within schools FULL covariance structure
16
Covariance structure of 2 level RI model
17
3.17 Fitting models in MLwiN
  • Work through (at your own pace) Chapter 3 of the manual: multilevel residuals
  • Don't be afraid to ask!

18
4.0 Contextual effects
In the previous sections we found that schools vary in both their intercepts and slopes, resulting in crossing lines. The next question is: are there any school level variables that can explain this variation?
  • Interest lies in how the outcome of individuals in a cluster is affected by their social contexts (measures at the cluster level). Typical questions are:
  • Does school type affect students' exam results?
  • Does area deprivation affect the health status of individuals in the area?

In our data set we have a contextual school ability measure, schav. The mean intake score is formed for each school, these means are ranked, and the ranks are categorised into 3 groups: low (< 25), mid (25 to 75), high (> 75).
19
4.1 Exploring contextual effects and the tutorial
data
Does school gender affect exam score by gender? Do boys in boys' schools do better or worse or the same compared with boys in mixed schools? Do girls in girls' schools do better or worse or the same compared with girls in mixed schools?
Does peer group ability affect individual pupil performance? That is, given two pupils of equal intake ability, do they progress differently depending on whether they are educated in a low, mid or high ability peer group?
20
4.2 School gender effects
girl  boysch  girlsch  group               prediction
0     0       0        boy/mixed school    -0.189
1     0       0        girl/mixed school   -0.189 + 0.168
0     1       0        boy/boy school      -0.189 + 0.180
1     0       1        girl/girl school    -0.189 + 0.168 + 0.175
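The four predictions in 4.2 are sums of the intercept and whichever dummy coefficients are switched on. The sketch below assumes the operators lost from the transcript were plus signs, so that -0.189 is the intercept, 0.168 the girl effect, 0.180 the boys' school effect and 0.175 the girls' school effect. The function and variable names are hypothetical.

```python
# Fixed-part estimates as read from the table in 4.2 (plus signs assumed)
intercept = -0.189  # boy in a mixed school (reference group)
girl = 0.168
boysch = 0.180
girlsch = 0.175


def predicted(is_girl, in_boysch, in_girlsch):
    """Predicted score from the three 0/1 dummy variables."""
    return intercept + girl * is_girl + boysch * in_boysch + girlsch * in_girlsch


groups = {
    "boy/mixed school": (0, 0, 0),
    "girl/mixed school": (1, 0, 0),
    "boy/boy school": (0, 1, 0),
    "girl/girl school": (1, 0, 1),
}
for label, dummies in groups.items():
    print(f"{label}: {predicted(*dummies):.3f}")
```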
21
4.3 Peer group ability effects
The effect of peer group ability is modelled as being constant across gender, school gender and standlrt. For example, comparing peer group ability effects for boys in mixed schools and boys in boys' schools:
-0.265 + 0.552 standlrt_ij   boy, mixed school, low (reference group)
22
4.4 Cross level interactions
There may be interactions between school gender, peer group ability, gender and standlrt. An interesting interaction is between peer group ability and standlrt. This tests whether the effect of peer group differs across the standlrt intake spectrum. For example, being in a high ability group may have a different effect for pupils of different ability. This is a cross level interaction because it is the interaction between a pupil level variable (standlrt) and a school level variable (schav).
23
4.5 Cross level interactions contd
Which leads to three lines for the low, mid and high groupings.
low:  -0.347 + 0.455 standlrt_ij
mid:  (-0.347 + 0.144) + (0.455 + 0.092) standlrt_ij
high: (-0.347 + 0.290) + (0.455 + 0.180) standlrt_ij
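The three lines in 4.5 can be evaluated numerically. The sketch below assumes the lost operators were plus signs, so that 0.144 and 0.290 shift the intercept and 0.092 and 0.180 shift the standlrt slope relative to the low group. The names `line` and `predict` are hypothetical.

```python
# Reference (low ability peer group) intercept and standlrt slope from 4.5
base_intercept, base_slope = -0.347, 0.455

# Shifts in (intercept, slope) for each peer group relative to "low"
shifts = {"low": (0.0, 0.0), "mid": (0.144, 0.092), "high": (0.290, 0.180)}


def line(group):
    """Intercept and slope of the fitted line for one peer group."""
    d_intercept, d_slope = shifts[group]
    return base_intercept + d_intercept, base_slope + d_slope


def predict(group, standlrt):
    """Predicted score for a pupil with the given intake score."""
    a, b = line(group)
    return a + b * standlrt


for group in shifts:
    a, b = line(group)
    print(f"{group}: intercept {a:.3f}, slope {b:.3f}")
```

Under this reading the slope steepens from low to high, so the gap between peer groups widens as intake score rises.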
24
5.0 Variance functions or modelling
heteroscedasticity
Tabulating normexam by gender we see that the means and variances for boys and girls are (0.140 and 1.051) and (0.093 and 0.940). We may want to fit a model that estimates separate variances for boys and girls. The notation we have been using so far assumes a common intercept (β0) and a single set of student residuals, ei, with a common variance σ²e. We need to use a more flexible notation to build this model.
25
5.1 Working with general notation in MLwiN
A model with no variables specified in general notation looks like this.
A new first line is added stating that the response variable follows a Normal distribution. We now have the flexibility to specify alternative distributions for our response. We will explore these models later. The β0 coefficient now has an explanatory variable x0 associated with it. The values x0 takes determine the meaning of the β0 coefficient. If x0 is a vector of 1s then β0 will estimate an intercept common to all individuals; in the absence of other predictors this would be the overall mean. If x0 is a dummy variable, say 1 for boys and 0 for girls, then β0 will estimate the mean for boys.
26
5.2 A simple variance function
The new notation allows us to set up this simple
model where x0i is a dummy variable for boy and
x1i is a dummy variable for girl. This model
estimates separate means and variances for the
two groups. This is an example of a variance
function because the variance changes as a
function of explanatory variables. The function
is
27
5.3 Deriving the variance function
We arrive at the expression
(1)
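The equation images for 5.2 and 5.3 did not survive in the transcript. As a hedged sketch of the derivation, assume the model of 5.2 has the form below, with a separate level 1 residual attached to each dummy variable. The symbols σ²e0, σ²e1 and σe01 are assumed names for the two residual variances and their covariance.

```latex
y_i = \beta_0 x_{0i} + \beta_1 x_{1i} + e_{0i} x_{0i} + e_{1i} x_{1i}

\operatorname{var}(e_{0i} x_{0i} + e_{1i} x_{1i})
  = \sigma^2_{e0} x_{0i}^2 + 2 \sigma_{e01} x_{0i} x_{1i} + \sigma^2_{e1} x_{1i}^2
  = \sigma^2_{e0} x_{0i} + \sigma^2_{e1} x_{1i}
```

The last step uses the fact that x0i and x1i are 0/1 dummies for mutually exclusive groups: each equals its own square and their product is always 0, so the covariance term drops out. The variance is σ²e0 for boys and σ²e1 for girls.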
28
5.4 Variance functions at level 2
The notion of variance functions is powerful and
not restricted to level 1 variances.
The random slopes model fitted earlier produces
the following school level predictions which show
school level variability increasing with intake
score.
The model
29
5.5 Two views of the level 2 variance
Given x0 = 1, we have
Which shows that the level 2 variance is a polynomial function of x1ij
  • View 1: In terms of school lines, with predicted intercepts and slopes varying across schools.
  • View 2: In terms of a variance function which shows how the level 2 variance changes as a function of 1 or more explanatory variables.
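The expression for the level 2 variance is missing from the transcript. A sketch of what it would look like for a random slopes model, assuming random terms u0j (on x0) and u1j (on x1ij) with variances σ²u0, σ²u1 and covariance σu01:

```latex
\operatorname{var}(u_{0j} x_0 + u_{1j} x_{1ij})
  = \sigma^2_{u0} x_0^2 + 2 \sigma_{u01} x_0 x_{1ij} + \sigma^2_{u1} x_{1ij}^2
  = \sigma^2_{u0} + 2 \sigma_{u01} x_{1ij} + \sigma^2_{u1} x_{1ij}^2
  \quad \text{when } x_0 = 1
```

This is a quadratic in x1ij, which is why a random slope is described as a quadratic variance function at level 2.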
30
5.6 Elaborating the level 1 variance
Maybe the student level departures around their school's summary line are not constant.
Note that at level 2 we have two interpretations of level 2 random variation: random coefficients (varying slopes and intercepts across level 2 units) and variance functions. In each level 1 unit, by definition, we only have one point, therefore the first interpretation does not exist because you cannot have a slope given a single data point.
31
5.7 Variance functions at level 1
If we allow standlrt(x1ij) to have a random term
at level 1, we get
32
5.8 Modelling the mean and variance simultaneously
In our model
33
6.0 Multivariate response models
We may have data on two or more responses we wish to analyse jointly. For example, we may have English and maths scores on pupils within schools. We can consider the response type as a level below pupil.
34
6.1 Rearranging data
Often data come like this, with one row per person.
For MLwiN to analyse the data we require the data matrix to have one row per level 1 unit, that is, one row per response measurement.
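The rearrangement described above, from one row per person to one row per response measurement, can be sketched as follows. This is a plain Python illustration, not MLwiN's own procedure; the pupil records, the column names and the `to_long` helper are all hypothetical.

```python
# Hypothetical wide data: one row per pupil, one column per response
wide = [
    {"pupil": 1, "english": 60, "maths": 55},
    {"pupil": 2, "english": 72, "maths": None},  # maths response missing
]


def to_long(rows, responses):
    """Return one row per observed response measurement."""
    long_rows = []
    for row in rows:
        for indicator, name in enumerate(responses):
            if row[name] is None:
                continue  # a missing response simply contributes no row
            long_rows.append({
                "pupil": row["pupil"],
                "response": name,
                "indicator": indicator,  # 0 = first response, 1 = second
                "y": row[name],
            })
    return long_rows


long_data = to_long(wide, ["english", "maths"])
for record in long_data:
    print(record)
```

Pupil 2 contributes a single row, which is how the long layout accommodates unbalanced data.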
35
6.2 Writing down the model
Where y1j is the English score for student j and y2j is the maths score for student j. The means and variances for English and maths (β0, β1, σ²u0, σ²u1) are estimated. Also the covariance between maths and English, σu01, is estimated.
Note there is no level 1 (eij) variance. This can be seen if we consider the picture for one pupil.
36
6.3 Advantages of framing a multivariate response
model as a multilevel model
The model has the following advantages over traditional multivariate techniques:
  • It can deal with missing responses, provided response data are missing completely at random (MCAR) or missing at random (MAR), that is, missingness is related to explanatory variables in the model.
  • Covariates can be added, giving us the conditional covariance matrix of the responses.
  • Further levels can be added to the model.
37
6.4 Example from MLwiN user guide
Pupils have two responses: written and coursework.
mean for written = 46.8, variance(written) = 178.7
mean for coursework = 73.36, variance(coursework) = 265.4
covariance(written, coursework) = 102.3
That is, we have two means and a covariance matrix, which we could get from any stats package. However, the data are unbalanced. Of the 1905 pupils, 202 are missing a written response and 180 are missing a coursework response.
38
6.5 Further extensions
We can add further explanatory variables, for example female. We see that females do better than males on coursework and worse than males on written exams.
39
6.6 Repeated measures.
We may have repeated measurements on individuals, for example a series of heights or test scores. Often we want to model people's growth. We can fit this structure as a multilevel model with repeated measurements nested within people. That is
40
6.7 Advantages of fitting repeated measures
models in a multilevel framework
  • Fitting these structures using a multilevel model has the advantage that data can be:
  • Unbalanced (individuals can have different numbers of measurement occasions)
  • Unequally spaced (different individuals can be measured at different ages)
  • This is in contrast to traditional multivariate techniques, which require data to be balanced and equally spaced.
  • Again, the multilevel model requires that response measurements are MCAR or MAR.

41
6.8 An example from the MLwiN user guide
Repeated measures model for children's reading scores
This (random intercepts) model treats growth as a linear process with individuals varying only in their intercepts. That is, for the 405 individuals in the data set
42
6.9 Further possibilities for repeated measures
model
  • We can go on and fit a random slope model, which in this case allows the model to deal with children growing at different rates.
  • We can fit polynomials in age to allow for curvilinear growth.
  • We can also try to explain between-individual variation in growth by introducing child level variables.
  • If appropriate we can include further levels of nesting. For example, if children are nested within schools we could fit a 3 level model (occasions within children within schools). We could then look to see if children's patterns of growth varied across schools.

43
7 Significance testing and model comparison
  • Individual fixed part and random coefficients at each level
  • Simultaneous and complex comparisons
  • Comparing nested models: likelihood ratio test
  • Use of the Deviance Information Criterion

44
7.1 Individual coefficients
  • Akin to t tests in regression models
  • Either a specific fixed effect or a specific variance-covariance component
  • H0: β = 0; H1: β ≠ 0
  • H0: σ² = 0; H1: σ² ≠ 0
  • Procedure: divide the estimated coefficient by its standard error
  • Judge against a z distribution
  • If the ratio exceeds 1.96 in absolute value then significant at the 0.05 level
  • Approximate procedure: asymptotic test, small sample properties not well known.
  • OK for fixed part coefficients but not for random ones (typically small numbers; the distribution of a variance estimate is likely to be skewed)
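The procedure above amounts to a few lines of arithmetic. The sketch below applies it to the boys' school coefficient from the table in 3.10, once with the single-level standard error and once with the multilevel one. The function name `z_test` is hypothetical, and the two-sided p-value comes from the standard Normal via `math.erfc`.

```python
import math


def z_test(estimate, standard_error):
    """Ratio of estimate to standard error and its two-sided Normal p-value."""
    z = estimate / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Boys' school effect: estimate (standard error) from the table in 3.10
z_single, p_single = z_test(0.122, 0.049)  # single-level model
z_multi, p_multi = z_test(0.120, 0.149)    # multilevel model

print(f"single-level: z = {z_single:.2f}, p = {p_single:.3f}")
print(f"multilevel:   z = {z_multi:.2f}, p = {p_multi:.3f}")
```

The single-level ratio exceeds 1.96 while the multilevel one does not, which illustrates the overstatement of significance that results from ignoring dependency.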

45
7.2 Simultaneous/complex comparisons
recommended for random part testing
  • Example: Testing H0: b2 - b3 = 0 AND b3 = 5
  • H0: Cb = k
  • C is the contrast matrix (p by q) specifying the nature of the hypothesis (q is the number of parameters in the model, p is the number of simultaneous tests)
  • FILL the contrast matrix with
  • 1 if the parameter is involved
  • -1 if involved as a difference
  • 0 otherwise (not involved)
  • b is a vector of parameters (fixed or random) of length q
  • k is a vector of values that the parameters are contrasted against (usually the null); these have to be set

46
  • Example: Testing H0: b2 - b3 = 0 AND b3 = 5
  • q = 4 (intercept and 3 slope terms)
  • p = 2 (2 sets of tests)
  • Overall test against chi square with p degrees of freedom
  • Output
  • Result of the contrast
  • Chi-square statistic for each test separately
  • Chi-square statistic for the overall test, all contrasts simultaneously

47
Testing in fixed part
  1. slope for Standlrt
  2. BoySch from mixed
  3. GirlSch from mixed
  4. BoySch from GirlSch
Model > Intervals and tests > Fixed coefficients: 4 tests
Basic Statistics > Tail Areas > Chi square
CPRObability 1.586 1
0.20790
48
Testing in random part
  1. school variance
  2. difference between school and student variance
Model > Intervals and tests > Random coefficients: 2 tests
Basic Statistics > Tail Areas > Chi square
CPRObability 25.019 1
5.6768e-007
49
7.6 Do we need a quadratic variance function at
level 2?
->CPRObability 32.126 3
4.9230e-007
Benchmarks:
CPRO 4 1   0.046
CPRO 6 2   0.050
CPRO 8 3   0.046
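The tail areas returned by CPRObability can be checked outside MLwiN. The sketch below is not MLwiN's implementation; it uses the closed-form upper tail of the chi-square distribution, which exists for 1, 2 and 3 degrees of freedom, and the function name `chi2_tail` is hypothetical.

```python
import math


def chi2_tail(x, df):
    """Upper tail area P(X > x) of a chi-square distribution, df in {1, 2, 3}."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("this sketch only handles df = 1, 2 or 3")


# Benchmark values and test statistics quoted on the slides
for x, df in [(4, 1), (6, 2), (8, 3), (1.586, 1), (32.126, 3)]:
    print(f"CPRO {x} {df}: {chi2_tail(x, df):.4g}")
```

The three benchmarks show that a chi-square value of about twice the degrees of freedom plus 2 sits close to the 0.05 level.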
50
7.7 Comparing nested models likelihood ratio test
  • Akin to F tests in regression models, i.e. is a more complex model a significantly better fit to the data, or is a simpler model a significantly worse fit?
  • Procedure
  • Calculate the difference in the deviance of the two models
  • Calculate the change in complexity as the difference in the number of parameters between models
  • Compare the difference in deviance with a chi-square distribution with df = difference in number of parameters
  • Example: tutorial data
  • Do we get a significant improvement in the fit if we move from a constant variance function for schools to a quadratic involving Standlrt?

51
-2log(lh) is 9305.78 (quadratic)
-2log(lh) is 9349.42 (constant)
->calc b3 = b2-b1
43.644
->cpro 43.410 2
3.7466e-010
NB significantly worse fit, i.e. need quadratic
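The likelihood ratio test for this example can be reproduced by hand. The sketch below recomputes the deviance difference from the two -2log(lh) values rather than reusing the statistic printed in the transcript, which quotes slightly different figures (43.644 and 43.410). It assumes the quadratic variance function adds 2 parameters (a slope variance and an intercept-slope covariance), and uses the fact that the chi-square upper tail with 2 df is exp(-x/2).

```python
import math

deviance_quadratic = 9305.78  # -2 log likelihood, quadratic variance function
deviance_constant = 9349.42   # -2 log likelihood, constant variance function
extra_parameters = 2          # assumed: slope variance and covariance

# Likelihood ratio statistic: fall in deviance from adding the extra terms
lr_statistic = deviance_constant - deviance_quadratic

# Exact chi-square upper tail for 2 degrees of freedom
p_value = math.exp(-lr_statistic / 2)

print(f"LR statistic = {lr_statistic:.2f} on {extra_parameters} df")
print(f"p-value = {p_value:.3g}")
```

Either version of the statistic gives a p-value far below 0.05, so the conclusion that the quadratic variance function is needed does not depend on which figure is used.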
52
7.9 Deviance Information Criterion
  • Diagnostic for model comparison
  • Goodness of fit criterion that is penalized for model complexity
  • Generalization of the Akaike Information Criterion (AIC, where df is known)
  • Used for comparing non-nested models (e.g. same number of variables but different variables)
  • Valuable in MLwiN for testing improved goodness of fit of a non-linear model (e.g. logit) because the likelihood (and hence the deviance) is incorrect
  • Estimated by MCMC sampling; on output get
  • Bayesian Deviance Information Criterion (DIC)
  • Dbar      D(thetabar)   pD     DIC
  • 9763.54   9760.51       3.02   9766.56
  • Dbar: the average deviance from the complete set of iterations
  • D(thetabar): the deviance at the expected value of the unknown parameters
  • pD: the estimated degrees of freedom consumed in the fit, i.e. Dbar - D(thetabar)
  • DIC = Fit + Complexity = Dbar + pD
  • NB lower values = better, more parsimonious model
  • Somewhat controversial!
  • Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64, 583-640.
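The DIC arithmetic on this slide can be reproduced directly from the two deviances. The sketch below uses the definitions given above (pD = Dbar - D(thetabar), DIC = Dbar + pD); the variable names are hypothetical.

```python
dbar = 9763.54        # average deviance over the MCMC iterations
d_thetabar = 9760.51  # deviance at the posterior mean of the parameters

p_d = dbar - d_thetabar  # effective number of parameters
dic = dbar + p_d         # fit plus complexity

print(f"pD = {p_d:.2f}, DIC = {dic:.2f}")
```

The results differ from the 3.02 and 9766.56 on the slide only in the last decimal place, presumably because the slide's figures were computed before rounding.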

53
7.10 Some guidance
  • Any decrease in DIC suggests a better model
  • But MCMC is stochastic, so with a small difference in DIC you should confirm that it is a real difference by checking the results with different seeds and/or starting values.
  • There is more experience with AIC, and there are common rules of thumb for it

54
7.11 Example: Tutorial dataset
  • Model 1: NULL model, a constant and level 1 variance
  • Model 2: additionally include slope for Standlrt
  • Model 3: 65 fixed school effects (64 dummies and constant)
  • Model 4: school as random effects
  • Model 5: 65 fixed school intercepts and slopes
  • Model 6: random slopes model, quadratic variance function

Best: Model 6. Note: random models (4 and 6) have more nominal parameters than their fixed equivalents but fewer effective parameters and a lower DIC value (due to distributional assumptions)
55
8.0 Generalised Multilevel Models 1: Binary
Responses and Proportions
56
8.0 Generalised multilevel models
  • So far
  • Response at level 1 has been a continuous variable, and the associated level 1 random term has been assumed to have a Normal distribution
  • Now: a range of other data types for the response
  • All can be handled routinely by MLwiN
  • Achieved by 2 aspects
  • a non-linear link between response and predictors
  • a non-Gaussian level 1 distribution

57
8.1 Typology of discrete responses
58
8.2 Focus on modelling proportions
  • Proportions, e.g. death rate or employment rate, can be conceived as the underlying probability of dying or of being employed
  • Four important attributes of a proportion that MUST be taken into account in modelling
  • (1) Closed range: bounded between 0 and 1
  • (2) Anticipated non-linearity between response and predictors: as the predicted response approaches the bounds, a greater and greater change in x is required to achieve the same change in outcome (examination analogy)
  • (3) Two numbers: the numerator is a subset of the denominator
  • (4) Heterogeneity: variance is not homoscedastic, in two respects
  • (a) the variance depends on the mean: as the proportion approaches the bounds of 0 and 1 there is less room to vary, i.e. the variance is a function of the predicted probability
  • (b) the variance depends on the denominator: small denominators result in highly variable proportions

59
8.3 Modelling Proportions
  • Linear probability model: that is, use a standard regression model with a linear relationship and a Gaussian random term
  • But 3 problems
  • (1) Nonsensical predictions: predicted proportions are unbounded and can fall outside the range of 0 to 1
  • (2) Anticipated non-linearity as the bounds are approached
  • (3) Heterogeneity: inherent unequal variance, dependent on the mean and on the denominator
  • A logit model with a Binomial random term resolves all three problems (could also use probit or complementary log-log)

60
8.5 The logistic model resolves problems 1 and 2
  • The relationship between the probability and predictor(s) can be represented by a logistic function, which resembles an S-shaped curve
  • Models not the proportion but a non-linear transformation of it (solves problems 1 and 2)

61
8.6 The Logit transformation
  • L = log_e(p / (1 - p))
  • L = logit = the log of the odds
  • p = proportion having an attribute
  • 1 - p = proportion not having the attribute
  • p / (1 - p) = the odds of having an attribute compared to not having it
  • As p goes from 0 to 1, L goes from minus to plus infinity, so if we model L we cannot get predicted proportions that lie outside 0 and 1 (i.e. solves problem 1)
  • Easy to move between proportions, odds and logits
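Moving between proportions, odds and logits takes only a few lines. The sketch below follows the definitions in the list above; the function names are hypothetical.

```python
import math


def odds(p):
    """Odds of having the attribute: p / (1 - p)."""
    return p / (1 - p)


def logit(p):
    """Log of the odds; maps (0, 1) onto the whole real line."""
    return math.log(odds(p))


def inv_logit(l):
    """Back-transform a logit to a proportion, always strictly inside (0, 1)."""
    return 1 / (1 + math.exp(-l))


for p in (0.1, 0.5, 0.75, 0.9):
    print(f"p = {p}: odds = {odds(p):.3f}, logit = {logit(p):.3f}")

# Any value on the logit scale back-transforms to a valid proportion
print(inv_logit(-5), inv_logit(0), inv_logit(5))
```

Because the back-transformation can never leave the interval from 0 to 1, a model that is linear on the logit scale cannot produce nonsensical predicted proportions.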

62
8.7 Proportions, Odds and Logits
63
8.8 The logistic model
  • The underlying probability or proportion is
    non-linearly related to the predictor
  • where e is the base of the natural logarithm
  • linearized by the logit transformation (log = natural logarithm)

64
8.9 The logistic model key characteristics
  • The logit transformation produces a linear
    function of the parameters.
  • Bounded between 0 and 1
  • Thereby solving problems 1 and 2

65
8.10 Solving problem 3: assume Binomial variation
  • Variance of the response in logistic models is presumed to be binomial
  • i.e. it depends on the underlying proportion and the denominator
  • In practice this is achieved by replacing the constant variable at level 1 by a binomial weight, z, and constraining the level-1 variance to 1 for exact binomial variation
  • The random (level-1) component can be written as

66
8.11 Multilevel Logistic Model
  • Assume observed response comes from a Binomial
    distribution with a denominator for each cell,
    and an underlying probability/proportion
  • Underlying proportions/probabilities, in turn,
    are related to a set of individual and
    neighborhood predictors by the logit link function
  • Linear predictor of the fixed part and the
    higher-level random part

67
8.12 Estimation 1
  • Quasi-likelihood (MQL/PQL, 1st and 2nd order)
  • The model is linearised and IGLS applied.
  • 1st or 2nd order Taylor series expansion (to linearise the non-linear model)
  • MQL versus PQL: are higher-level effects included in the linearisation?
  • MQL1: crudest approximation. Estimates may be biased downwards (especially if within-cluster sample size is small and between-cluster variance is large, e.g. households). But stable.
  • PQL2: best approximation, but may not converge.
  • Tip: start with MQL1 to get starting values for PQL.

68
8.13 Estimation 2
  • MCMC methods: get the deviance of the model (DIC) for sequential model testing, and good quality estimates even where cluster size is small; start with MQL1 and then switch to MCMC

69
8.14 Variance Partition Coefficient
y_ij ~ Binomial(p_ij, 1)
logit(p_ij | x_ij, u_j) = a + b x_1ij + u_j
Var(u_j) = σ²u
var(y_ij - p_ij) = p_ij (1 - p_ij)
Level 1 variance is a function of the predicted probability
The level 2 variance σ²u is on the logit scale and the level 1 variance var(y_ij - p_ij) is on the probability scale, so they cannot be directly compared. Also the level 1 variance depends on p_ij and therefore on x_1ij. Possible solutions include: i) set the level 1 variance equal to the variance of a standard logistic distribution; ii) a simulation method.
70
8.15 VPC 1 Threshold Model
But this ignores the fact that the level 1 variance is not constant, but is a function of the mean probability, which depends on the predictors in the fixed part of the model
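The threshold (latent variable) approach fixes the level 1 variance at the variance of a standard logistic distribution, π²/3 ≈ 3.29, so that both variances are on the logit scale. A minimal sketch under that assumption follows; the function name and the example variance of 0.3 are hypothetical and not taken from the slides.

```python
import math

# Variance of the standard logistic distribution, about 3.29
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def vpc_threshold(sigma2_u):
    """VPC on the latent logit scale: level 2 variance over total variance."""
    return sigma2_u / (sigma2_u + LOGISTIC_VARIANCE)


# Hypothetical level 2 variance on the logit scale
print(f"{vpc_threshold(0.3):.3f}")
```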
71
8.16 VPC 2 Simulation Method
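The slide content for the simulation method is missing from the transcript. One common version of the idea is sketched below, as an assumption rather than a record of what the slide showed: for a chosen value of the fixed-part linear predictor, draw many level 2 residuals, convert each to a probability, and compare the variance of those probabilities (level 2) with the average binomial variance p(1 - p) (level 1). The function name and arguments are hypothetical.

```python
import math
import random


def vpc_simulation(linear_predictor, sigma2_u, n_draws=20000, seed=1):
    """Simulation-based VPC on the probability scale for one covariate pattern."""
    rng = random.Random(seed)
    sd_u = math.sqrt(sigma2_u)
    # Simulate level 2 residuals and turn each into a predicted probability
    probs = [
        1 / (1 + math.exp(-(linear_predictor + rng.gauss(0, sd_u))))
        for _ in range(n_draws)
    ]
    mean_p = sum(probs) / n_draws
    level2_var = sum((p - mean_p) ** 2 for p in probs) / n_draws
    level1_var = sum(p * (1 - p) for p in probs) / n_draws
    return level2_var / (level2_var + level1_var)


# Hypothetical values: the VPC changes with the linear predictor
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(vpc_simulation(eta, sigma2_u=0.3), 3))
```

Unlike the threshold version, this VPC differs across values of the predictors, which reflects the point that the level 1 variance depends on the predicted probability.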
72
8.17 Multilevel modelling of binary data
  • Exactly the same as proportions except
  • The response is either 1 or 0
  • The denominator is a set of 1s
  • So that a Yes is 1 out of 1 , while a No is 0
    out of 1

73
8.18 Chapter 9 of Manual: Contraceptive Use in
Bangladesh
  • 2867 women nested in 60 districts
  • y = 1 if using contraception at time of survey, y = 0 if not using contraception
  • Covariates: age (mean centred), urban residence (vs. rural)

74
8.19 Random Intercept Model PQL2
75
8.20 Variance Partition Coefficient
76
8.21 MLwiN Gives
  • UNIT (or subject) SPECIFIC estimates
  • the fixed effects conditional on higher level unit random effects, NOT the
  • POPULATION-AVERAGE estimates, i.e. the marginal expectation of the dependent variable across the population, "averaged" across the random effects
  • In non-linear models these are different and the PA estimates will generally be smaller than the US estimates, especially as the size of the random effects grows
  • Can derive PA from US but not vice versa (next version will give both)

77
8.22 Unit specific / Population average
  • Probability of adverse reaction against dose
  • Left: subject-specific curves, with big differences between subjects for middle doses (the between-patient variance is large)
  • Right: the population average dose response curve
  • Subject-specific curves have a steeper slope in the middle range of the dose variable

78
9.0 MCMC estimation in MLwiN
MCMC estimation is a big topic and is given a pragmatic and cursory treatment here. Interested students are referred to the manual MCMC estimation in MLwiN, available from http://multilevel.ioe.ac.uk/beta/index.html
In the workshop so far you have been using the IGLS (Iterative Generalised Least Squares) algorithm to estimate the models.
79
9.1 IGLS versus MCMC
MCMC
IGLS
80
9.2 Bayesian framework
MCMC estimation operates in a Bayesian framework. A Bayesian framework requires one to think about prior information we have on the parameters we are estimating and to formally include that information in the model. We may make the decision that we are in a state of complete ignorance about the parameters we are estimating, in which case we must specify a so-called uninformative prior. The posterior distribution for a parameter θ given that we have observed y is subject to the following rule:
p(θ|y) ∝ p(y|θ) p(θ)
Where p(θ|y) is the posterior distribution for θ given we have observed y, p(y|θ) is the likelihood of observing y given θ, and p(θ) is the probability distribution arising from some statement of prior belief such as "we believe θ ~ N(1, 0.01)". Note that "we believe θ ~ N(1, 1)" is a much weaker and therefore less influential statement of prior belief.
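The difference between the two priors mentioned above can be seen in a case where the posterior has a closed form. The sketch below assumes Normal data with known variance 1 and a Normal prior on the mean, and reads the second argument of N(., .) as a variance. The data values and the function name are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, data, data_var):
    """Posterior mean and variance of theta when y_i ~ N(theta, data_var)
    and theta ~ N(prior_mean, prior_var), with data_var treated as known."""
    n = len(data)
    precision = 1 / prior_var + n / data_var
    mean = (prior_mean / prior_var + sum(data) / data_var) / precision
    return mean, 1 / precision


data = [2.1, 1.8, 2.4, 2.0, 1.7]  # hypothetical observations, sample mean 2.0

strong = normal_posterior(1.0, 0.01, data, data_var=1.0)  # theta ~ N(1, 0.01)
weak = normal_posterior(1.0, 1.0, data, data_var=1.0)     # theta ~ N(1, 1)

print(f"strong prior: posterior mean {strong[0]:.3f}")
print(f"weak prior:   posterior mean {weak[0]:.3f}")
```

With the tight prior the posterior mean stays close to the prior value of 1. With the weaker prior it moves most of the way towards the sample mean of 2.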
81
9.3 Applying MCMC to multilevel models
In a two level variance components model we have
the following unknowns
Their joint posterior is
82
9.4 Gibbs sampling
Evaluating the expression for the joint posterior
with all the parameters unknown is for most
models, virtually impossible. However, if we take
each unknown parameter in turn and temporarily
assume we know the values of the other
parameters, then we can simulate from the so
called conditional posterior distribution. The
Gibbs sampling algorithm cycles through the
following simulation steps. First we assume some
starting values for our unknown parameters
83
9.5 Gibbs sampling cntd
We now have updated all the unknowns in the
model. This process is repeated many times until
eventually we converge on the distribution of
each of the unknown parameters.
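The cycle of updates described in 9.4 and 9.5 can be sketched for a two-level variance components model y_ij = β0 + u_j + e_ij. This is an illustrative sampler, not MLwiN's implementation: it assumes a flat prior on β0 and inverse-gamma(eps, eps) priors on both variances, and all function and variable names are hypothetical. The data are simulated so that the example is self-contained.

```python
import math
import random


def gibbs_variance_components(groups, n_iter=1000, burn_in=200, seed=1, eps=0.001):
    """Gibbs sampler for y_ij = beta0 + u_j + e_ij (assumed priors, see text)."""
    rng = random.Random(seed)
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    # Starting values for the unknowns
    beta0, sigma2_u, sigma2_e = 0.0, 1.0, 1.0
    u = [0.0] * n_groups
    chain = []
    for it in range(n_iter):
        # 1. beta0 given the u_j and sigma2_e
        resid_mean = sum(y - u[j] for j, g in enumerate(groups) for y in g) / n_total
        beta0 = rng.gauss(resid_mean, math.sqrt(sigma2_e / n_total))
        # 2. each u_j given beta0 and the two variances
        for j, g in enumerate(groups):
            precision = len(g) / sigma2_e + 1 / sigma2_u
            mean = sum(y - beta0 for y in g) / sigma2_e / precision
            u[j] = rng.gauss(mean, math.sqrt(1 / precision))
        # 3. sigma2_u given the u_j (inverse gamma drawn as 1 / gamma)
        rate_u = eps + sum(v * v for v in u) / 2
        sigma2_u = 1 / rng.gammavariate(eps + n_groups / 2, 1 / rate_u)
        # 4. sigma2_e given the level 1 residuals
        sse = sum((y - beta0 - u[j]) ** 2 for j, g in enumerate(groups) for y in g)
        sigma2_e = 1 / rng.gammavariate(eps + n_total / 2, 1 / (eps + sse / 2))
        if it >= burn_in:
            chain.append((beta0, sigma2_u, sigma2_e))
    return chain


# Simulated data: 30 groups of 20, true beta0 = 0, variances 0.15 and 0.85
data_rng = random.Random(42)
groups = []
for _ in range(30):
    u_j = data_rng.gauss(0, math.sqrt(0.15))
    groups.append([u_j + data_rng.gauss(0, math.sqrt(0.85)) for _ in range(20)])

chain = gibbs_variance_components(groups)
post_mean_beta0 = sum(draw[0] for draw in chain) / len(chain)
post_mean_sigma2_u = sum(draw[1] for draw in chain) / len(chain)
print(f"beta0: {post_mean_beta0:.3f}, sigma2_u: {post_mean_sigma2_u:.3f}")
```

Summaries such as posterior means and intervals are computed from the retained draws in `chain`, after discarding the burn-in iterations.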
84
9.6 IGLS vs MCMC convergence
IGLS algorithm converges deterministically to a point estimate.
MCMC algorithm converges on a distribution. Parameter estimates and intervals are then calculated from the simulation chains.
85
9.7 Other MCMC issues
By default MLwiN uses flat, uninformative priors; see page 5 of MCMC estimation in MLwiN (MEM). For specifying informative priors see chapter 6 of MEM. For model comparison in MCMC using the DIC statistic see chapters 3 and 4 of MEM. For a description of the MCMC algorithms used in MLwiN see chapter 2 of MEM.
86
9.8 When to consider using MCMC in MLwiN
If you have discrete response data: binary, binomial, multinomial or Poisson (chapters 11, 12, 20 and 21). Often PQL gives quick and accurate estimates for these models. However, it is a good idea to check against MCMC to test for bias in the PQL estimates.
If you have few level 2 units and you want to make accurate inferences about the distribution of higher level variances.
Some of the more advanced models in MLwiN are only available in MCMC. For example, factor analysis (chapter 19), measurement error in predictor variables (chapter 14) and CAR spatial models (chapter 16).
Other models can be fitted in IGLS but are handled more easily in MCMC, such as multiple imputation (chapter 17), cross-classified (chapter 14) and multiple membership models (chapter 15).
All chapter references are to MCMC estimation in MLwiN.
87
10.0 Non-hierarchical multilevel models
  • Two types
  • Cross-classified models
  • Multiple membership models

88
10.1 Cross-classification
For example, hospitals by neighbourhoods.
Hospitals will draw patients from many different
neighbourhoods and the inhabitants of a
neighbourhood will go to many hospitals. No pure
hierarchy can be found and patients are said to
be contained within a cross-classification of
hospitals by neighbourhoods
 
89
10.2 Other examples of cross-classifications
  • pupils within primary schools by secondary
    schools
  • patients within GPs by hospitals
  • interviewees within interviewers by surveys
  • repeated measures within raters by
    individual(e.g. patients by nurses)

90
10.3 Notation
With hierarchical models we have subscript notation that has one subscript per level, and nesting is implied reading from the left. For example, subscript pattern ijk denotes the ith level 1 unit within the jth level 2 unit within the kth level 3 unit. If models become cross-classified we use the term classification instead of level. With notation that has one subscript per classification and that captures the relationship between classifications, notation can become very cumbersome. We propose an alternative notation that only has a single subscript no matter how many classifications are in the model.
91
10.4 Single subscript notation
We write the model as
Where classification 2 is nbhd and classification
3 is hospital. Classification 1 always
corresponds to the classification at which the
response measurements are made, in this case
patients. For patients 1 and 11 equation (1)
becomes
92
10.5 Classification diagrams
In the single subscript notation we lose information about the relationship (crossed or nested) between classifications. A useful way of conveying this information is with the classification diagram, which has one node per classification. Nodes linked by arrows have a nested relationship and unlinked nodes have a crossed relationship.
Hospital
Neighbourhood
Patient
Cross-classified structure where patients from a
hospital come from many neighbourhoods and people
from a neighbourhood attend several hospitals.
Nested structure where hospitals are contained
within neighbourhoods
93
10.6 Data example: Artificial insemination by
donor
1901 women, 279 donors, 1328 donations, 12100 ovulatory cycles. The response is whether conception occurs in a given cycle.
In terms of a unit diagram
Or a classification diagram
94
10.7 Model for artificial insemination data
We can write the model as
Results
95
10.8 Multiple membership models
  • Where level 1 units are members of more than one
    higher level unit. For example,
  •  Pupils change schools/classes and each
    school/class has an effect on pupil outcomes
  • Patients are seen by more than one nurse during
    the course of their treatment

 
96
10.9 Notation
Note that nurse(i) now indexes the set of nurses
that treat patient i and w(2)i,j is a weighting
factor relating patient i to nurse j. For
example, with four patients and three nurses, we
may have the following weights
97
10.10 Classification diagrams for multiple
membership relationships
Double arrows indicate a multiple membership
relationship between classifications
We can mix multiple membership, crossed and
hierarchical structures in a single model
98
10.11 Example involving nesting, crossing and
multiple membership: Danish chickens
Production hierarchy: 10,127 child flocks, 725 houses, 304 farms
Breeding hierarchy: 10,127 child flocks, 200 parent flocks
As a unit diagram
As a classification diagram
99
10.12 Model and results
100
10.13 ALSPAC data
All the children born in the Avon area in 1990, followed up longitudinally
Many measurements made, including educational attainment measures
Children span 3 school year cohorts (say 1994, 1995, 1996)
Suppose we wish to model development of numeracy over the schooling period. We may have the following attainment measures on a child:
m1 m2 m3 m4 m5 m6 m7 m8 (spanning primary school and secondary school)
101
10.14 Structure for primary schools
  • Measurement occasions within pupils
  • At each occasion there may be a different teacher
  • Pupils are nested within primary school cohorts
  • All this structure is nested within primary school
  • Pupils are nested within residential areas

102
10.15 A mixture of nested and crossed
relationships
Nodes directly connected by a single arrow are nested, otherwise nodes are cross-classified. For example, measurement occasions are nested within pupils. However, cohorts are cross-classified with primary teachers, that is, teachers teach more than one cohort and a cohort is taught by more than one teacher.
103
10.16 Multiple membership
It is reasonable to suppose the attainment of a child in a particular year is influenced not only by the current teacher, but also by teachers in previous years. That is, measurement occasions are multiple members of teachers.
We represent this in the classification diagram
by using a double arrow.
104
10.17 What happens if pupils move area?
Classification diagram without pupils moving
residential areas
If pupils move area, then pupils are no longer nested within areas. Pupils and areas are cross-classified. Also it is reasonable to suppose that pupils' measured attainments are affected by the areas they have previously lived in. So measurement occasions are multiple members of areas.
Classification diagram where pupils move between
residential areas
BUT
105
10.18 If pupils move area they will also move
schools
Classification diagram where pupils move between
areas but not schools
If pupils move schools they are no longer nested within primary school or primary school cohort. Also we can expect, for the mobile pupils, both their previous and current cohort and school to affect measured attainments.
Classification diagram where pupils move between
schools and areas
106
10.19 If pupils move area they will also move
schools cntd
And secondary schools
We could also extend the above model to take
account of Secondary school, secondary school
cohort and secondary school teachers.
107
10.20 Other predictor variables
Remember we are partitioning the variability in attainment over time between primary school, residential area, pupil, primary school cohort, teacher and occasion. We also have predictor variables for these classifications, e.g. pupil social class, teacher training, school budget and so on. We can introduce these predictor variables to see to what extent they explain the partitioned variability.