1
Introduction to Hierarchical Models
  • Intuitions of Hierarchical Modeling
  • The hierarchical setup and the concept of
    exchangeability
  • The hierarchical Poisson model with gamma priors
  • The hierarchical normal model with normal priors

2
The Concept of Hierarchical Data Structures
  • Hierarchical data is ubiquitous in the social
    sciences where measurement occurs at different
    levels of aggregation.
  • e.g. we collect measurements of individuals who
    live in a certain locality or belong to a
    particular race or social group.
  • When this occurs, standard techniques either
    assume that these groups belong to entirely
    different populations or ignore the aggregate
    information entirely.
  • Hierarchical models provide a way of pooling the
    information for the disparate groups without
    assuming that they belong to precisely the same
    population.

3
The basic Bayesian setup for hierarchical data
structures
  • Suppose we have collected data about some random
    variable Y from m different populations with n
    observations for each population.
  • Let y_ij represent observation j from population
    i.
  • Suppose y_ij ~ f(θ_i), where θ_i is a vector of
    parameters for population i.
  • Further, θ_i ~ f(φ), where φ may also be a vector.
  • → Note: up to this point this is just a standard
    Bayesian setup in which we assign some prior
    distribution to the parameters θ that govern the
    distribution of y.
  • Now we extend the model and assume that the
    parameters φ_1, φ_2 that govern the distribution
    of the θs are themselves random variables, and we
    assign a prior distribution to them as well:
  • φ ~ f(a, b),
  • where the distribution assigned to φ is called the
    hyperprior. The parameters a, b, c, d of the
    hyperprior may be known and represent our prior
    beliefs about φ, or, in theory, we can assign a
    probability distribution to these quantities as
    well and proceed to another layer of hierarchy.
    (The three stages are collected compactly below.)
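Collecting the three stages just described (a compact LaTeX restatement, with a and b standing in for the known hyperprior parameters):

  $$ y_{ij} \mid \theta_i \sim f(y \mid \theta_i), \qquad
     \theta_i \mid \phi \sim p(\theta \mid \phi), \qquad
     \phi \sim p(\phi \mid a, b). $$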

4
Graphical Illustration of a hierarchical model
(Figure showing the three levels of the model:
hyperpriors for the full sample, priors for each
sub-population, and the data.)
5
Exchangeability
  • Exchangeability (formal): The parameters θ_1,
    θ_2, …, θ_n are exchangeable in their joint
    distribution if p(θ_1, θ_2, …, θ_n) is invariant
    to permutations of the indices 1, 2, …, n
    (written out symbolically below).
  • Exchangeability (informal): If no information
    other than the data is available to distinguish
    any of the θ_js from any of the others, and no
    ordering of the parameters can be made, one must
    assume symmetry among the parameters in the prior
    distribution.
  • This concept is closely related to that of
    independent and identically distributed random
    variables where, conditional on the data, each
    observation is treated the same.
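Stated symbolically, the formal definition requires that for every permutation π of the indices,

  $$ p(\theta_1, \theta_2, \ldots, \theta_n) \;=\; p(\theta_{\pi(1)}, \theta_{\pi(2)}, \ldots, \theta_{\pi(n)}). $$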

6
Example: A Gamma-Poisson Model
  • Gill (2002) examines data on the number of
    marriages per 1,000 people in Italy from 1936 to
    1951. He asks, did the marriage rate decline
    during the war years?
  • To model this process, he assumes that the number
    of marriages per 1000 people follows a Poisson
    distribution.
  • marriages_t ~ Poisson(λ_t)
  • How would we have addressed this question before
    now?
  • Why might we model this as a hierarchical process?

7
Exchangeability continued
  • Exchangeability means that we can treat the
    parameters for each sub-population as
    exchangeable units.
  • In its simplest form, each parameter θ_j is
    treated as an independent sample from a
    distribution governed by the unknown parameter
    vector φ:
  • p(θ_1, θ_2, …, θ_n | φ) = ∏_j p(θ_j | φ)
  • → In a more general form, we may also condition
    on data that we have about the different
    sub-populations.
  • Further, we can write the joint prior
    distribution as
  • p(θ_1, θ_2, …, θ_n, φ) = p(θ_1, θ_2, …, θ_n | φ) p(φ).
  • By Bayes' rule,
  • p(θ_1, θ_2, …, θ_n, φ | Y) ∝ prior × likelihood
    for Y (combined explicitly below).
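Combining the factored prior with the likelihood (and assuming, as the exchangeable setup implies, that the y_i are conditionally independent given the θ_i), the joint posterior is

  $$ p(\theta_1, \ldots, \theta_n, \phi \mid Y) \;\propto\;
     p(\phi) \prod_i p(\theta_i \mid \phi) \prod_i p(y_i \mid \theta_i). $$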

8
Italian Marriages cont.
  • marriages_t ~ Poisson(λ_t) for t = 1936, …, 1951.
  • To model this as a hierarchical process, we
    assume that each of the annual means λ_t is an
    exchangeable draw from a common distribution.
  • → In this case, the gamma distribution has
    desirable properties.
  • Thus,
  • λ_t ~ Gamma(α, β) for t = 1936, …, 1951
  • → Note that α and β are unknown parameters.
  • To satisfy the requirement of exchangeability,
    what must we assume about the data-generating
    process?
  • Finally, to complete the hierarchical structure,
    we must assign hyperpriors for the parameters α
    and β. Again, the gamma distribution has nice
    properties, so we assume that
  • α ~ Gamma(A, B) and β ~ Gamma(C, D).
  • → Note: we pick real numbers for A, B, C, and D
    to represent our prior beliefs (which in the
    usual case we shall assume are flat). The full
    hierarchy is collected below.

9
Graphical Representation of the Hierarchical
Gamma-Poisson Model
The prior parameters α and β are unknown. Both α
and β are assumed to be drawn from gamma
distributions:
  α ~ Gamma(A, B)        β ~ Gamma(C, D)
The year-specific means λ_t are random draws from a
gamma distribution:
  λ_1936 ~ Gamma(α, β), …, λ_t ~ Gamma(α, β), …, λ_1951 ~ Gamma(α, β)
The data observed for any given year, y_t, is a
random draw from a Poisson distribution with the
year-specific mean:
  y_1936 ~ Poisson(λ_1936), …, y_t ~ Poisson(λ_t), …, y_1951 ~ Poisson(λ_1951)
In this model we have more unknown parameters
than observations! There are t parameters λ_t, plus
α and β, and only t observations. Why is this okay?
10
The conditional distributions of λ_i, α, and β in the
Gamma-Poisson Hierarchical Model
  • To implement this model in a Gibbs sampler, it is
    necessary to derive the conditional distributions
    of λ_i, α, and β. WinBUGS knows what these are,
    but sometimes it is informative to derive them
    ourselves. In this case,
  • p(λ, α, β | y) ∝ [∏_i Poisson(y_i | λ_i)]
    [∏_i p(λ_i | α, β)] p(α) p(β)
  • Using our trick for conditional distributions, we
    know that
  • p(λ_i | α, β, y) ∝ Poisson(y_i | λ_i) p(λ_i | α, β)
    ∝ Gamma(y_i + α, 1 + β)
  • and
  • p(α | λ_1, …, λ_n, β, y) ∝ p(α) ∏_i p(λ_i | α, β)
    → not a standard distribution
  • and
  • p(β | λ_1, …, λ_n, α, y) ∝ p(β) ∏_i p(λ_i | α, β)
    → a gamma distribution (explicit forms below)
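Carrying the algebra one step further gives the explicit forms (a sketch assuming the shape-rate parameterization of the gamma with rate β, the hyperpriors α ~ Gamma(A, B) and β ~ Gamma(C, D) from the previous slide, and n years of data):

  $$ \lambda_i \mid \alpha, \beta, y_i \sim \text{Gamma}(y_i + \alpha,\; 1 + \beta), \qquad
     \beta \mid \lambda, \alpha \sim \text{Gamma}\Big(n\alpha + C,\; D + \textstyle\sum_i \lambda_i\Big), $$

while

  $$ p(\alpha \mid \lambda, \beta) \;\propto\;
     \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\Big(\textstyle\prod_i \lambda_i\Big)^{\alpha-1}\,\alpha^{A-1} e^{-B\alpha} $$

has no standard form, so it is usually updated with, for example, a Metropolis step.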

11
WinBUGS Implementation of the Italian Marriage
Rates Example
  model {
    for (i in 1:16) {
      marriages[i] ~ dpois(lambda[i])
      lambda[i] ~ dgamma(alpha, beta)
    }
    alpha ~ dgamma(1, 1)    # A = 1, B = 1 (diffuse prior)
    beta ~ dgamma(1, 1)     # C = 1, D = 1 (diffuse prior)
  }
  # Data
  list(marriages = c(7, 9, 8, 7, 7, 6, 6, 5, 5, 7, 9, 10, 8, 8, 8, 7))
  • Use the boxplot function in WinBUGS.
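For comparison with the WinBUGS version, here is a minimal hand-coded Gibbs sampler in Python built on the conditional distributions from slide 10. It is an illustrative sketch, not part of the original slides: the random-walk Metropolis step for alpha (whose conditional is non-standard), the proposal scale of 0.25, the starting values, and the burn-in of 1,000 draws are all assumptions.

  # Gibbs sampler for the hierarchical Gamma-Poisson model of the marriage data.
  # lambda_t and beta have the gamma full conditionals derived on slide 10;
  # alpha is updated with a random-walk Metropolis step.
  import numpy as np
  from scipy.special import gammaln

  y = np.array([7, 9, 8, 7, 7, 6, 6, 5, 5, 7, 9, 10, 8, 8, 8, 7], dtype=float)
  n = len(y)
  A, B, C, D = 1.0, 1.0, 1.0, 1.0          # diffuse gamma hyperprior parameters
  rng = np.random.default_rng(0)

  def log_p_alpha(alpha, lam, beta):
      # log p(alpha | lambda, beta), up to a constant (shape-rate gamma)
      if alpha <= 0:
          return -np.inf
      return (n * alpha * np.log(beta) - n * gammaln(alpha)
              + (alpha - 1.0) * np.log(lam).sum()
              + (A - 1.0) * np.log(alpha) - B * alpha)

  n_iter = 5000
  alpha, beta = 1.0, 1.0                   # starting values
  draws = np.empty((n_iter, n + 2))
  for s in range(n_iter):
      # lambda_t | alpha, beta, y_t ~ Gamma(y_t + alpha, rate = 1 + beta)
      lam = rng.gamma(y + alpha, 1.0 / (1.0 + beta))
      # beta | lambda, alpha ~ Gamma(n*alpha + C, rate = D + sum(lambda))
      beta = rng.gamma(n * alpha + C, 1.0 / (D + lam.sum()))
      # alpha | lambda, beta is non-standard: random-walk Metropolis update
      prop = alpha + 0.25 * rng.standard_normal()
      if np.log(rng.uniform()) < log_p_alpha(prop, lam, beta) - log_p_alpha(alpha, lam, beta):
          alpha = prop
      draws[s] = np.concatenate([lam, [alpha, beta]])

  # posterior means of the year-specific marriage rates, after burn-in
  print(draws[1000:, :n].mean(axis=0).round(2))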

12
Example 2: Political Ideologies
  • Previously, we examined individuals' responses to
    a 7-point liberal-conservative ideology survey
    question in two different ways.
  • Method 1) Assume that all respondents are drawn
    from the same pool and examine the overall mean
    and variance.
  • Method 2) Break respondents into categories based
    on their self-reported partisan identities and
    estimate the mean and variance of Democrats,
    Republicans, and Independents separately.
  • Using a hierarchical approach, individuals are
    treated as independent draws from a
    party-specific distribution, but the mean of each
    of the party-specific distributions is itself a
    draw from a hyper-distribution with some unknown
    mean and variance.
  • → If the hyper-distribution has zero variance,
    then Method 1 above is a special case.
  • → If the hyper-distribution has infinite
    variance, then Method 2 above is a special case.
  • → Typically, however, we find that if we borrow
    strength across populations by including the
    hyper-distribution, the separate population means
    shrink toward a common mean.
  • Note: I am using the term "population"
    colloquially, not in a technical sense.

13
The ideology example
  • We assume that the random variable respondent
    ideology (denoted y) follows a normal
    distribution with a mean and variance specific to
    the respondent's party:
  • y_dem,j ~ N(μ_dem, τ_dem)
  • y_ind,j ~ N(μ_ind, τ_ind)
  • y_rep,j ~ N(μ_rep, τ_rep) for all j in the sample.
  • Further, assume that each μ_p is also a normal
    random variable with unknown mean and variance.
  • Thus, μ_p ~ N(M, T) for p ∈ {Dems, Inds, Reps}.
  • But we shall model the precision terms in the
    standard way, with non-informative gamma priors.
  • Thus, τ_p ~ Gamma(.1, .1) for p ∈ {Dems, Inds, Reps}.
  • Finally, we need to assign pdfs for the
    hyperpriors as well.
  • What would be a reasonable distribution to
    choose?

14
Ideology example continued
  • If μ_p ~ N(M, T) for p ∈ {Dems, Inds, Reps},
  • then the obvious choice for the hyperpriors for M
    and T is to assume that the mean is normally
    distributed and the precision follows a gamma
    distribution.
  • Thus, M ~ N(4, .01) and T ~ Gamma(.1, .1).
  • Note: if we assume that T → ∞, then we are
    imposing the condition that μ_Dem = μ_Ind = μ_Rep.
    This is equivalent to assuming that all
    observations are drawn from a distribution with
    the same overall mean.
  • If we assume that T → 0, then we are assuming
    that there is no underlying structure to the
    data. This is equivalent to assuming that there
    is no hierarchical structure in the data.

15
Derivation of the conditional distributions of μ,
τ, and M for the normal hierarchical model
  • y_p,j ~ N(μ_p, τ_p);  μ_p ~ N(M, T);  τ_p ~ Gamma(α, β);
    M ~ N(m, t);  T ~ Gamma(a, b)
  • By the conditional distribution trick,
  • p(μ_p | y, τ_p, M, T) ∝ [∏_j N(y_p,j | μ_p, τ_p)] × N(μ_p | M, T)

This is the kernel of a normal distribution. The
mean of this distribution is a weighted average
of the sample mean for sub-population p and the
parent population mean. The weights are provided by
the precision of the parent population and the
precision of the sub-population. The resulting
kernel is written out below.
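Written out (a sketch introducing n_p for the number of respondents in party p and ȳ_p for their sample mean, both pieces of notation not on the original slide, with normals parameterized by precision as in WinBUGS):

  $$ \mu_p \mid \cdot \;\sim\; N\!\left(\frac{T\,M + n_p\,\tau_p\,\bar{y}_p}{T + n_p\,\tau_p},\;\; \text{precision} = T + n_p\,\tau_p\right). $$

As T grows relative to n_p τ_p the estimate is pulled toward the parent mean M; as it shrinks, the estimate stays near the sub-population sample mean.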
16
Derivation of the conditional distributions of μ,
τ, and M for the normal hierarchical model
  • y_p,j ~ N(μ_p, τ_p);  μ_p ~ N(M, T);  τ_p ~ Gamma(α, β);
    M ~ N(m, t);  T ~ Gamma(a, b)
  • By the conditional distribution trick,
  • p(M | y, μ_p, τ_p, T) ∝ [∏_p N(μ_p | M, T)] × N(M | m, t)

This is the kernel of a normal distribution. The
mean of this distribution is a weighted average
of the prior population mean and the average
sub-population mean. The weights are provided by
the prior precision and the precision of the
parent population. The corresponding kernel is
written out below.
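Written out (a sketch, with P = 3 sub-populations and μ̄ denoting the average of the μ_p, notation introduced here):

  $$ M \mid \cdot \;\sim\; N\!\left(\frac{t\,m + P\,T\,\bar{\mu}}{t + P\,T},\;\; \text{precision} = t + P\,T\right). $$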
17
WinBUGS Implementation of the Ideology Example
  model {
    for (i in 1:669) {
      ideologyR[i] ~ dnorm(mu[pid3[i]], tau[pid3[i]])
      temp1[i] <- pid3[i]
      temp2[i] <- pid7[i]
    }
    for (j in 1:3) {
      mu[j] ~ dnorm(M, T)
      tau[j] ~ dgamma(.1, .1)
    }
    M ~ dnorm(4, .01)
    T ~ dgamma(.1, .1)
  }
18
Final Comments
  • In the case of political ideologies, we find that
    there is very little shrinkage toward the
    overall mean. Why?
  • One of the properties of hierarchical models is
    that if the posterior precision of the hyperprior
    is very large, then we are essentially finding
    that each of the sub-populations is drawn from a
    common distribution with zero variance. This
    means that we have endogenously estimated that
    the means of all of the sub-populations are
    identical. In this case, there may be a great
    deal of shrinkage toward the overall population
    mean due to the fact that variation from the
    population mean is random noise.
  • On the other hand, if the posterior precision of
    the hyperprior is very small, then we are
    essentially finding that each of the
    sub-populations is drawn from a distribution with
    a very different mean, in which case there is
    little shrinkage toward the population mean.
    This is desirable because we don't want to
    impose structure where none exists.
  • The absence of shrinkage in the example is due to
    the fact that Democrats, Republicans, and
    Independents actually do have significantly
    different ideologies.