Bayesian General Linear Model - PowerPoint PPT Presentation

1 / 21
About This Presentation
Title:

Bayesian General Linear Model

Description:

General linear models extend the above setup to the case where: ... The Bayesian setup for the GLM is a very natural extension of the framework we ... – PowerPoint PPT presentation

Number of Views:337
Avg rating:3.0/5.0
Slides: 22
Provided by: jeffgry
Category:

less

Transcript and Presenter's Notes

Title: Bayesian General Linear Model


1
Bayesian General Linear Model
  • Review of the General Linear Model
  • Bayesian setup of the GLM
  • WinBugs Implementation of the GLM
  • Logit, Poisson, etc.

2
Review of GLMS
  • In general, statistical models contain both
    systematic and random components.
  • For the standard linear model, we assume that Y
    (the dep. var) is a vector of random variables
    whose components are independently distributed
    with mean ?
  • ? represents the systematic component of the
    model (the expected value of Y) and is assumed to
    be a linear function of explanatory variables X
    and parameters b.
  • The random part of the model (the unexplainable
    error terms) are assumed to be independent with
    constant error variance.
  • In the classical model, we assume that the random
    component follows a normal distribution. Thus,
    components of Y are independent normal variables
    with constant variance.
  • Thus, Y ? e XB e

3
Review of GLMs
  • To segue to the general linear model, note that
    the classical model has three important
    components
  • the random componenteach observation or
    component of Y has an independent normal
    distribution with E(Y) ? and constant variance
    ??2.
  • 2) the systematic componentcovariates X produce
    a linear predictor ? XB
  • 3) The link between the random and the systematic
    components ? ?.
  • For the normal linear model, the link states that
    the linear predictor ? XB is identical to the
    expected value of the random component.
  • However, more generally, ?i g(?i), where g( )
    is called the link function.
  • General linear models extend the above setup to
    the case where
  • 1) the random component follows a distribution
    other than the normal
  • 2) the link function is a function other than
    that given above.

4
Review of GLMs
  • Common distributions other than the normal for
    the random component include
  • Poisson
  • Bernoulli (or binomial)
  • Weibull (for duration models)
  • Multinomial

5
Review of GLMs
  • The link function relates the linear predictor ?
    to an observation y.
  • In classical linear models, where the dependent
    variable is normal, we use the identity link.
    Since the expected value of the linear predictor
    can take any value for the normal distribution,
    the identity link makes sense.
  • For a Poisson (count) model, ? gt 0, so the
    identity link is less attractive because the
    linear predictor ? can be negative while ?
    cannot.
  • So, for Poisson models we typically use the log
    link, ? log(?), or its inverse ? exp?.
  • This function maps the linear predictor which can
    take any value along the real line to a set of
    plausible expected values.
  • For a Bernoulli model, 0 lt ? lt 1, so the identity
    link is unattractive.
  • So, for Bernoulli models, we typically use the
    logit link, ? log(?/(1- ?))

6
Bayesian Setup of the GLM
  • The Bayesian setup for the GLM is a very natural
    extension of the framework we have used for
    regression models.
  • Step 1. Specify the probability distribution for
    the dependent variable in your model.
  • i.e. yi N(?i, t)
  • Step 2. Define the linear predictor for your
    model.
  • i.e. b1 b2x2i
  • Step 3. Choose the link function that maps from
    the linear predictor ? to a set of plausible
    expected values for ?.
  • i.e. ?i b1 b2x2i
  • Step 4. Choose priors for all of the parameters
    in your model.
  • i.e. bj N(0, .001) and t G(.1 , .1)

7
General Comments on WinBugs Implementation for
GLMs
  • In most cases, you will need to specify a set of
    initial values for the regression coefficients
    for the logit model.
  • You may need to specify a more reasonable set of
    priors for the precision terms for your
    parameters than the massively diffuse priors we
    used for the normal linear model.
  • Failure to follow 1) and 2) may cause WinBugs to
    bomb. To see why, suppose we chose diffuse priors
    for our regression coefficients and did not
    specify a set of initial values. WinBugs will
    choose a reasonable set of initial values for us
    by basically sampling from our priors. It is
    likely that the coefficients will be either very
    big or very small numbers. Consequently, when
    these numbers are used to sample from the linear
    predictor, the expected value will be a very
    large or very small number, which when
    substituted into the link function will yield an
    impossibly large number or a number so close to
    zero that the link function will be undefined.
  • 3) GLMs will take a longer time to run than the
    linear model (not iterations of the Gibbs sampler
    necessarily, but cpu time). Tricks to speed
    convergence will therefore become increasingly
    important because each iteration is more costly.

8
Logistic Regression
  • Suppose that yi Bernoulli(pi),
  • To ensure that 0 lt pi lt 1, we use the logit
    transformation so,
  • logit(pi) lt- b1 b2x2i bkxki
  • We assume that bj N(0, .001) for all j
  • noticethere is no prior distribution for the
    variance of y because there is not a parameter in
    the Bernoulli distribution for variance
  • Thus, the joint posterior distribution of the
    parameters is given by the following expression
  • p(b1,,bky,x) ?? p(b1,,bk)?ip(yi b1 b2x2i
    bkxki)
  • ? p(b1,,bk) ?i piy (1-pi)1-y
  • p(b1,,bk)?I logit-1(b1 b2x2i bkxki)y
  • ? logit-1(1-(b1 b2x2i bkxki))1-y

9
ApplicationVoter perceptions of party
differences
  • Dependent Variable whether voters indicate that
    Republicans are more conservative than Democrats
    on a liberal-conservative ideology scale
    (1successful classification, 0failure)
  • Independent Variables
  • - Respondent level data
  • race, gender, education, party identification,
    strength of respondents ideology (folded
    self-placement)
  • - Contextual data
  • degree of elite-level polarization, whether
    there was divided government, and whether there
    was a presidential election year

10
WinBugs Implementation of Example
  • model
  • for (i in 129022)
  • placementi dbern(pi)
  • logit(pi) lt- b1 b2racei
    b3genderi b4educi b5pidi
    b6strideoli b7elitepoli
    b8divgovi b9offyeari
  • for (j in 19)
  • bj dnorm(0,.001)
  • Comment 1if all independent variables are
    standardized the model converges quickly even
    without the multivariate normal prior
    distribution for bj, though with 30,000
    observations, things still take awhile.

11
Model Interpretation in WinBugs
  • Ive never tried this, but in principle the
    generation of predicted values with confidence
    bands should be quite simple.
  • Set it up as follows Let logit(mustar) the
    expected value for the linear component of your
    model. Then
  • logit(mustar) lt- b1 b2mean(race)
    b3mean(gender)
  • bk(mean(educ) stdev(educ))
  • Pred dbern(mustar)
  • Set sample monitor tools to monitor pred, which
    generate a sample summary of predicted values
    from the posterior predictive distribution
    corresponding to the stated values of X.

12
Hierarchical Logit
  • The hierarchical logistic regression model is a
    very easy extension of standard logit.
  • Likelihood
  • yij Bernoulli(pij),
  • logit(pij) lt- b1j b2jx2ij bkjxkij
  • Priors
  • bjk N(Bjk, Tk) for all j,k
  • Bjk lt- ?k1 ?k2 zj2 ?km zjm
  • ?qr N(0, .001) for all q,r
  • Tk Gamma(.01, .01)

13
Poisson Regression
  • Suppose that yi Poisson(?i)
  • To ensure that ?i gt 0, we use the log link
    function,
  • log(?i) lt- b1 b2x2i bkxki
  • We assume that bj N(0, .001) for all j

14
ApplicationSlave Revolts
  • Dependent Variable the number of national slave
    revolts in year t (1805-1860).
  • Independent Variables
  • nominal slave prices
  • whether the U.S. was at war
  • whether it was a presidential election year
  • lagged revolts
  • interaction between war and election, war and
    lagged revolts, and election and lagged revolts
  • Key variables of interest were the interaction
    terms with elections because the theory predicted
    that elections would communicate weakness of the
    police state to the slaves.

15
WinBugs Implementation
  • model
  • for (i in 156)
  • natrvlti dpois(lambdai)
  • log(lambdai) lt- b1 b2nlslvpri
  • b3Wari b4Electioni
    b5natrvltL1i b6ElectioninatrvltL1i
    b7WariElectioni
  • b8WarinatrvltL1i
  • for (j in 18) bj dnorm(0,.1)

16
Multinomial Logit
  • Suppose that yi multinomial(pi1,,pim, 1)
  • Let xij represent the set of factors that might
    induce actor i to choose outcome j.
  • Then the probability that individual i chooses
    outcome j can be represented
  • pij exp(b1j b2jxij)/ ?k1 to m exp(b1k
    b2kxik)
  • Let log(phiij) b1j b2jxij such that
  • pij phiij / (?k1 to m phiik)
  • To implement the multinomial logit, it is
    necessary to fix one of latent utilities in the
    model (the log(phis)) to sum value such that the
    predictors then influence how the attributes of
    one variables influence the change in probability
    of choosing one outcome versus the baseline.
  • Thus, for the fixed outcome j, we let b1j b2j
    0 and assume diffuse normal priors for everything
    else.

17
WinBugs Implementation of MNL
  • model
  • for (i in 1numberofobservations)
  • yi, 1numberofchoices dmulti( pi,
    1numberofchoices , 1 )
  • for (j in 1 numberofchoices)
  • pi,j lt- phii,j / sum(phii,)
  • log(phii,j) lt- bj,1 bj,2x2i
  • priors - fix the values for the first outcome
    variable to be zero to establish a baseline
  • b1,1 lt- 0
  • b1,2 lt- 0
  • all other parameters influence the probability
    of an outcome relative to the baseline
  • for (j in 2numberofchoices)
  • bj,1 dnorm(0, .01)
  • bj,2 dnorm(0, .01)
  • Data--the trick is to create a matrix for the
    dependent variable here that is consistent with
    the number of observations and the number of
    choices.
  • Suppose the number of observations 10 and the
    number of choices ). Then

18
Autoregressive Model
  • Suppose that you have a model with serial
    correlation. That is, you have a time-series
    model such that
  • Yt b1 b2x2t bkxkt et,
  • where et ?et-1 vt and 0 ? ? ? 1 and
    vt N(0 , ??2v)
  • It can be shown that
  • Var(et) ??2v / (1- ?2) and ? cov(et ,
    et-1) /??2e
  • Recall that serial correlation decreases our
    models efficiency, but may mask that loss
    because standard errors are biased (positive
    serial correlation decreases standard errors,
    negative serial correlation increases standard
    errors).

19
The standard correction for serial correlation
  • By Definition et ?et-1 vt
  • where et-1 Yt-1 - b1 b2x2t-1 bkxkt-1
    et-1
  • To correct for serial correlation, we leverage
    this expression for et into our original
    expression for Yt such that for all t gt 1
  • Yt b1 b2x2t bkxkt ?et-1 vt
  • b1 b2x2t bkxkt ?(Yt-1 - b1
    b2x2t-1 bkxkt-1 et-1) vt
  • Which can be rewritten for t gt 1
  • Yt - ?Yt-1 b1 (1-?) b2(x2t - ?x2t-1)
    bk(xkt - ?xkt-1) vt
  • This transformed equation has a normally
    distributed error term with mean zero and
    constant variance.

20
ApplicationReagans Presidential Approval
(Example brazenly lifted from Jackmans MCMC page)
  • Dependent variable Reagans aggregate public
    approval score in month t.
  • Independent variables
  • - Inflation rate
  • - Unemployment rate

21
WinBugs Implementation
  • model
  • mu1 lt- b1 b2infl1 b3unemp1
  • app1 dnorm(mu1,tau.u)
  • for (t in 296) loop over
    obs 2 to T
  • mut lt- b1(1-rho)
  • b2(inflt - rhoinflt-1)
  • b3(unempt - rhounempt-1)
  • rhoappt-1
  • appt dnorm(mut, tau.e)
  • sigma.e lt- 1/tau.e convert
    precision to variance for transformed equation
  • sigma.u lt- sigma.e/(1pow(rho,2)) regression
    error variance for original equation
  • tau.u lt- 1/sigma.u see definition of
    Var(et) from above
  • priors
  • rho dunif(-1,1) uniform
    prior on stationary interval
  • b13 dmnorm(b0, B0 , )
    multivariate Normal prior
Write a Comment
User Comments (0)
About PowerShow.com