Bayesian General Linear Model

Slides: 22

Provided by: jeffgry

1
Bayesian General Linear Model

Review of the General Linear Model
Bayesian setup of the GLM
WinBugs Implementation of the GLM
Logit, Poisson, etc.

2
Review of GLMs

In general, statistical models contain both systematic and random components.
For the standard linear model, we assume that Y (the dependent variable) is a vector of random variables whose components are independently distributed with mean $\mu$.
$\mu$ represents the systematic component of the model (the expected value of Y) and is assumed to be a linear function of explanatory variables X and parameters $\beta$.
The random part of the model (the unexplainable error terms) is assumed to be independent with constant error variance.
In the classical model, we assume that the random component follows a normal distribution. Thus, the components of Y are independent normal variables with constant variance.
Thus, $Y = \mu + e = X\beta + e$.

3
Review of GLMs

To segue to the general linear model, note that the classical model has three important components:
1) the random component: each observation or component of Y has an independent normal distribution with $E(Y) = \mu$ and constant variance $\sigma^2$.
2) the systematic component: covariates X produce a linear predictor $\eta = X\beta$.
3) the link between the random and the systematic components: $\eta = \mu$.
For the normal linear model, the link states that the linear predictor $\eta = X\beta$ is identical to the expected value of the random component.
However, more generally, $\eta_i = g(\mu_i)$, where $g(\cdot)$ is called the link function.
General linear models extend the above setup to the case where
1) the random component follows a distribution other than the normal
2) the link function is a function other than the identity given above.

4
Review of GLMs

Common distributions other than the normal for
the random component include
Poisson
Bernoulli (or binomial)
Weibull (for duration models)
Multinomial

5
Review of GLMs

The link function relates the linear predictor $\eta$ to the expected value $\mu$ of an observation y.
In classical linear models, where the dependent variable is normal, we use the identity link.
Since the expected value of a normal variable can take any value on the real line, just like the linear predictor, the identity link makes sense.
For a Poisson (count) model, $\mu > 0$, so the identity link is less attractive because the linear predictor $\eta$ can be negative while $\mu$ cannot.
So, for Poisson models we typically use the log link, $\eta = \log(\mu)$, or its inverse $\mu = \exp(\eta)$.
This function maps the linear predictor, which can take any value along the real line, to a set of plausible expected values.
For a Bernoulli model, $0 < \mu < 1$, so the identity link is unattractive.
So, for Bernoulli models, we typically use the logit link, $\eta = \log(\mu/(1-\mu))$.
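
As a small illustration of the three links discussed on this slide, the following Python sketch writes each link next to its inverse. The function names are made up for this example and are not part of WinBugs; the point is only that the inverse log and inverse logit map any real-valued linear predictor into the admissible range for the mean.

```python
import math

# Illustrative helper names (not from the slides or from WinBugs).
def identity_link(mu):
    return mu                      # normal model: eta = mu

def log_link(mu):
    return math.log(mu)            # Poisson model: requires mu > 0

def logit_link(mu):
    return math.log(mu / (1.0 - mu))   # Bernoulli model: requires 0 < mu < 1

def inv_log(eta):
    return math.exp(eta)           # always positive

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))   # always strictly between 0 and 1

for eta in (-3.0, 0.0, 3.0):
    print(eta, identity_link(eta), inv_log(eta), inv_logit(eta))
```

A negative linear predictor such as -3 is a valid normal mean, but only after passing through exp or the inverse logit does it become a valid Poisson rate or Bernoulli probability.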

6
Bayesian Setup of the GLM

The Bayesian setup for the GLM is a very natural extension of the framework we have used for regression models.
Step 1. Specify the probability distribution for the dependent variable in your model.
e.g. $y_i \sim N(\mu_i, \tau)$
Step 2. Define the linear predictor for your model.
e.g. $\beta_1 + \beta_2 x_{2i}$
Step 3. Choose the link function that maps from the linear predictor $\eta$ to a set of plausible expected values $\mu$.
e.g. $\mu_i = \beta_1 + \beta_2 x_{2i}$ (the identity link)
Step 4. Choose priors for all of the parameters in your model.
e.g. $\beta_j \sim N(0, .001)$ and $\tau \sim \text{Gamma}(.1, .1)$ (WinBugs parameterizes the normal by its precision, so .001 is a very diffuse prior)

7
General Comments on WinBugs Implementation for
GLMs

1) In most cases, you will need to specify a set of initial values for the regression coefficients for the logit model.
2) You may need to specify a more reasonable set of priors for the precision terms for your parameters than the massively diffuse priors we used for the normal linear model.
Failure to follow 1) and 2) may cause WinBugs to bomb. To see why, suppose we chose diffuse priors for our regression coefficients and did not specify a set of initial values. WinBugs will then generate initial values for us by basically sampling from our priors. It is likely that the coefficients will be very large positive or very large negative numbers. Consequently, when these numbers are used to compute the linear predictor, its value will be very large in magnitude, which when passed through the inverse link function will yield an impossibly large expected value or a value so close to zero (or one) that the likelihood will be undefined.
3) GLMs will take longer to run than the linear model (not necessarily more iterations of the Gibbs sampler, but more cpu time per iteration). Tricks to speed convergence will therefore become increasingly important because each iteration is more costly.
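
The failure described above can be reproduced outside WinBugs. The sketch below is plain Python, not WinBugs code, and the covariate values are hypothetical. It draws starting values from a normal prior with precision .001 (standard deviation of about 31.6), computes the linear predictor, and evaluates a Bernoulli log-likelihood with a deliberately naive inverse logit.

```python
import math
import random

def inv_logit(eta):
    # Deliberately naive: math.exp raises OverflowError when -eta is very large.
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_loglik(y, eta):
    """Log-likelihood of one 0/1 observation given the linear predictor eta."""
    try:
        p = inv_logit(eta)
    except OverflowError:
        p = 0.0                      # exp(-eta) overflowed, so p is numerically zero
    lik = p if y == 1 else 1.0 - p
    return math.log(lik) if lik > 0.0 else float("-inf")

prior_sd = 1.0 / math.sqrt(0.001)    # precision .001 corresponds to sd of about 31.6
random.seed(1)
x = [1.0, 45.0, 16.0]                # hypothetical unstandardized covariates
b_init = [random.gauss(0.0, prior_sd) for _ in x]
eta = sum(b * xi for b, xi in zip(b_init, x))

print("prior sd:", round(prior_sd, 1))
print("linear predictor at the random start:", eta)
print("log-likelihood if y = 1:", bernoulli_loglik(1, eta))
print("log-likelihood if y = 0:", bernoulli_loglik(0, eta))
```

Once the linear predictor is a few dozen units away from zero, the fitted probability rounds to exactly 0 or 1 in floating point, and any observation on the "wrong" side has a log-likelihood of minus infinity. Standardizing the covariates or supplying initial values near zero keeps the sampler away from that region.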

8
Logistic Regression

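Suppose that $y_i \sim \text{Bernoulli}(p_i)$.
To ensure that $0 < p_i < 1$, we use the logit transformation, so
$\text{logit}(p_i) = \beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki}$
We assume that $\beta_j \sim N(0, .001)$ for all j.
Notice: there is no prior distribution for the variance of y because the Bernoulli distribution has no separate variance parameter.
Thus, the joint posterior distribution of the parameters is given by the following expression:
$p(\beta_1,\dots,\beta_k \mid y, x) \propto p(\beta_1,\dots,\beta_k) \prod_i p(y_i \mid \beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki})$
$\propto p(\beta_1,\dots,\beta_k) \prod_i p_i^{y_i} (1-p_i)^{1-y_i}$
$= p(\beta_1,\dots,\beta_k) \prod_i \text{logit}^{-1}(\beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki})^{y_i} \, [1 - \text{logit}^{-1}(\beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki})]^{1-y_i}$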
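
To make the posterior expression concrete, here is a minimal Python sketch of the unnormalized log posterior for the logistic model, assuming independent normal priors with mean 0 and precision .001 on every coefficient. The function names and the tiny data set are invented for illustration; WinBugs never asks you to write this function, it samples from the posterior directly.

```python
import math

def log1pexp(eta):
    """Stable evaluation of log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))

def log_posterior(beta, X, y, prior_precision=0.001):
    """Unnormalized log posterior: log prior plus Bernoulli log-likelihood."""
    # Independent N(0, 1/prior_precision) priors, constants dropped.
    log_prior = sum(-0.5 * prior_precision * b * b for b in beta)
    log_lik = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))   # linear predictor
        # y*log(p) + (1-y)*log(1-p) with p = inverse logit of eta
        log_lik += y_i * eta - log1pexp(eta)
    return log_prior + log_lik

# Hypothetical data: first column is the intercept.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1, 0, 1]
print(log_posterior([0.0, 0.0], X, y))
print(log_posterior([0.2, 1.0], X, y))
```

The likelihood term uses the identity y log p + (1 - y) log(1 - p) = y eta - log(1 + exp(eta)), which avoids computing p explicitly and so sidesteps the overflow problems that extreme linear predictors cause.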

9
Application: Voter perceptions of party differences

Dependent Variable: whether voters indicate that Republicans are more conservative than Democrats on a liberal-conservative ideology scale (1 = successful classification, 0 = failure)
Independent Variables:
- Respondent-level data: race, gender, education, party identification, strength of respondent's ideology (folded self-placement)
- Contextual data: degree of elite-level polarization, whether there was divided government, and whether it was a presidential election year

10
WinBugs Implementation of Example

model {
  for (i in 1:29022) {
    placement[i] ~ dbern(p[i])
    logit(p[i]) <- b[1] + b[2]*race[i]
      + b[3]*gender[i] + b[4]*educ[i] + b[5]*pid[i]
      + b[6]*strideol[i] + b[7]*elitepol[i]
      + b[8]*divgov[i] + b[9]*offyear[i]
  }
  for (j in 1:9) {
    b[j] ~ dnorm(0, .001)
  }
}
Comment 1: if all independent variables are standardized, the model converges quickly even without the multivariate normal prior distribution for b[j], though with 30,000 observations things still take a while.

11
Model Interpretation in WinBugs

I've never tried this, but in principle the generation of predicted values with confidence bands should be quite simple.
Set it up as follows: let logit(mustar) be the value of the linear component of your model at the chosen covariate values. Then
  logit(mustar) <- b[1] + b[2]*mean(race[])
    + b[3]*mean(gender[]) + ...
    + b[k]*(mean(educ[]) + sd(educ[]))
  pred ~ dbern(mustar)
Use the sample monitor tool to monitor pred, which generates a sample summary of predicted values from the posterior predictive distribution corresponding to the stated values of X.
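
The same idea can be sketched after the fact from saved posterior draws. The Python below is an assumption-laden illustration: the draws are hypothetical, the function name is invented, and it summarizes the predicted probability mustar itself (its posterior mean equals the posterior mean of the monitored 0/1 node) using a crude order-statistic interval.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def predictive_summary(draws, x_star, level=0.95):
    """Posterior mean and a crude interval for P(y = 1 | x_star)."""
    probs = sorted(
        inv_logit(sum(b * x for b, x in zip(draw, x_star))) for draw in draws
    )
    n = len(probs)
    lower = probs[int((1.0 - level) / 2.0 * (n - 1))]
    upper = probs[int((1.0 + level) / 2.0 * (n - 1))]
    return sum(probs) / n, lower, upper

# Hypothetical posterior draws of (b1, b2); real draws would come from the sampler.
draws = [(-0.2, 0.9), (0.0, 1.0), (0.1, 1.1), (0.3, 1.2), (-0.1, 0.8)]
x_star = (1.0, 0.5)   # intercept and one covariate set to a chosen value
mean_p, lower, upper = predictive_summary(draws, x_star)
print(mean_p, lower, upper)
```

With only a handful of draws the interval is meaningless; in practice one would use the thousands of draws kept after burn-in.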

12
Hierarchical Logit

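The hierarchical logistic regression model is a very easy extension of standard logit.
Likelihood:
$y_{ij} \sim \text{Bernoulli}(p_{ij})$
$\text{logit}(p_{ij}) = \beta_{1j} + \beta_{2j} x_{2ij} + \dots + \beta_{kj} x_{kij}$
Priors:
$\beta_{kj} \sim N(B_{kj}, T_k)$ for all j, k
$B_{kj} = \gamma_{k1} + \gamma_{k2} z_{j2} + \dots + \gamma_{km} z_{jm}$
$\gamma_{qr} \sim N(0, .001)$ for all q, r
$T_k \sim \text{Gamma}(.01, .01)$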

13
Poisson Regression

Suppose that $y_i \sim \text{Poisson}(\lambda_i)$.
To ensure that $\lambda_i > 0$, we use the log link function,
$\log(\lambda_i) = \beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki}$
We assume that $\beta_j \sim N(0, .001)$ for all j.
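
For readers who want to see what the Poisson likelihood with a log link looks like as a computation, here is a short Python sketch. The function name and data are hypothetical; adding the log of the normal prior density for each coefficient would give the unnormalized log posterior.

```python
import math

def poisson_log_likelihood(beta, X, y):
    """Sum of Poisson log-probabilities with log(lambda_i) = x_i . beta."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))   # linear predictor
        lam = math.exp(eta)                           # inverse log link, always > 0
        total += y_i * eta - lam - math.lgamma(y_i + 1)
    return total

# Hypothetical counts with an intercept and one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 2, 3]
print(poisson_log_likelihood([0.0, 0.5], X, y))
```

Because the rate is computed as exp(eta), no value of the coefficients can produce a negative rate, which is exactly the reason for preferring the log link over the identity link here.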

14
Application: Slave Revolts

Dependent Variable: the number of national slave revolts in year t (1805-1860).
Independent Variables:
nominal slave prices
whether the U.S. was at war
whether it was a presidential election year
lagged revolts
interactions between war and election, war and lagged revolts, and election and lagged revolts
The key variables of interest were the interaction terms with elections, because the theory predicted that elections would communicate weakness of the police state to the slaves.

15
WinBugs Implementation

model {
  for (i in 1:56) {
    natrvlt[i] ~ dpois(lambda[i])
    log(lambda[i]) <- b[1] + b[2]*nlslvpr[i]
      + b[3]*War[i] + b[4]*Election[i]
      + b[5]*natrvltL1[i] + b[6]*Election[i]*natrvltL1[i]
      + b[7]*War[i]*Election[i]
      + b[8]*War[i]*natrvltL1[i]
  }
  for (j in 1:8) { b[j] ~ dnorm(0, .1) }
}

16
Multinomial Logit

Suppose that $y_i \sim \text{multinomial}(p_{i1},\dots,p_{im}, 1)$.
Let $x_{ij}$ represent the set of factors that might induce actor i to choose outcome j.
Then the probability that individual i chooses outcome j can be represented as
$p_{ij} = \exp(\beta_{1j} + \beta_{2j} x_{ij}) / \sum_{k=1}^{m} \exp(\beta_{1k} + \beta_{2k} x_{ik})$
Let $\log(\phi_{ij}) = \beta_{1j} + \beta_{2j} x_{ij}$ such that
$p_{ij} = \phi_{ij} / \sum_{k=1}^{m} \phi_{ik}$
To implement the multinomial logit, it is necessary to fix one of the latent utilities in the model (the $\log(\phi)$s) to some value, so that the coefficients describe how the predictors change the probability of choosing one outcome versus the baseline.
Thus, for the fixed outcome j, we let $\beta_{1j} = \beta_{2j} = 0$ and assume diffuse normal priors for everything else.
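
A minimal Python sketch of the choice probabilities, assuming the first outcome is the baseline with both of its coefficients fixed at zero. The coefficient values and the function name are hypothetical and only serve to show the arithmetic.

```python
import math

def choice_probabilities(b, x):
    """Multinomial logit probabilities for one individual.

    b is a list of (intercept, slope) pairs, one per outcome;
    the baseline outcome carries (0, 0).
    """
    phi = [math.exp(b1 + b2 * x) for b1, b2 in b]
    total = sum(phi)
    return [f / total for f in phi]

# Hypothetical coefficients; outcome 1 is the baseline.
b = [(0.0, 0.0), (0.5, -1.0), (-0.3, 0.8)]
p = choice_probabilities(b, x=1.2)
print(p, sum(p))

# Log odds of outcome 2 versus the baseline equal that outcome's linear predictor.
print(math.log(p[1] / p[0]), 0.5 - 1.0 * 1.2)
```

Fixing the baseline at zero is what identifies the model: adding the same constant to every outcome's linear predictor would leave all the probabilities unchanged, so one set of coefficients has to be pinned down.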

17
WinBugs Implementation of MNL

model {
  for (i in 1:numberofobservations) {
    y[i, 1:numberofchoices] ~ dmulti(p[i, 1:numberofchoices], 1)
    for (j in 1:numberofchoices) {
      p[i,j] <- phi[i,j] / sum(phi[i,])
      log(phi[i,j]) <- b[j,1] + b[j,2]*x2[i]
    }
  }
  # priors - fix the values for the first outcome
  # variable to be zero to establish a baseline
  b[1,1] <- 0
  b[1,2] <- 0
  # all other parameters influence the probability
  # of an outcome relative to the baseline
  for (j in 2:numberofchoices) {
    b[j,1] ~ dnorm(0, .01)
    b[j,2] ~ dnorm(0, .01)
  }
}
Data: the trick is to create a matrix for the dependent variable that is consistent with the number of observations and the number of choices.
Suppose the number of observations = 10 and the number of choices = ... Then

18
Autoregressive Model

Suppose that you have a model with serial correlation. That is, you have a time-series model such that
$Y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + e_t$,
where $e_t = \rho e_{t-1} + v_t$, $|\rho| < 1$, and $v_t \sim N(0, \sigma^2_v)$.
It can be shown that
$\mathrm{Var}(e_t) = \sigma^2_v / (1 - \rho^2)$ and $\rho = \mathrm{cov}(e_t, e_{t-1}) / \sigma^2_e$
Recall that serial correlation decreases our model's efficiency, but may mask that loss because standard errors are biased (positive serial correlation decreases standard errors, negative serial correlation increases standard errors).
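
The variance and correlation results quoted on this slide follow in two lines, assuming the error process is stationary (so that the variance of e_t does not depend on t, which requires |rho| < 1) and that v_t is independent of e_{t-1}:

```latex
\begin{aligned}
\mathrm{Var}(e_t) &= \mathrm{Var}(\rho e_{t-1} + v_t)
                   = \rho^2 \,\mathrm{Var}(e_{t-1}) + \sigma^2_v \\
\sigma^2_e &= \rho^2 \sigma^2_e + \sigma^2_v
  \quad\Longrightarrow\quad
  \sigma^2_e = \frac{\sigma^2_v}{1 - \rho^2} \\
\mathrm{cov}(e_t, e_{t-1}) &= \mathrm{cov}(\rho e_{t-1} + v_t,\; e_{t-1})
  = \rho\, \sigma^2_e
  \quad\Longrightarrow\quad
  \rho = \frac{\mathrm{cov}(e_t, e_{t-1})}{\sigma^2_e}
\end{aligned}
```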

19
The standard correction for serial correlation

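By definition, $e_t = \rho e_{t-1} + v_t$,
where $e_{t-1} = Y_{t-1} - \beta_1 - \beta_2 x_{2,t-1} - \dots - \beta_k x_{k,t-1}$.
To correct for serial correlation, we substitute this expression for $e_{t-1}$ into our original expression for $Y_t$ such that for all $t > 1$
$Y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + \rho e_{t-1} + v_t$
$= \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + \rho (Y_{t-1} - \beta_1 - \beta_2 x_{2,t-1} - \dots - \beta_k x_{k,t-1}) + v_t$
which can be rewritten for $t > 1$ as
$Y_t - \rho Y_{t-1} = \beta_1 (1-\rho) + \beta_2 (x_{2t} - \rho x_{2,t-1}) + \dots + \beta_k (x_{kt} - \rho x_{k,t-1}) + v_t$
This transformed equation has a normally distributed error term with mean zero and constant variance.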
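
The transformation can be checked numerically. The Python sketch below builds a short series with AR(1) errors from made-up coefficients and innovations, applies the quasi-differencing above, and confirms that what is left over is exactly the innovation v_t. All numbers and the helper name are hypothetical.

```python
def quasi_difference(series, rho):
    """Return series[t] - rho * series[t-1] for t = 1, ..., T-1."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

# Made-up parameters and data for a one-covariate model.
b1, b2, rho = 2.0, 0.5, 0.6
x = [1.0, 2.0, 0.5, 3.0, 2.5]
v = [0.0, 0.3, -0.2, 0.1, -0.4]      # innovations; v[0] is unused

# Build serially correlated errors e_t = rho * e_{t-1} + v_t, starting from e_0 = 0.5.
e = [0.5]
for t in range(1, len(x)):
    e.append(rho * e[t - 1] + v[t])

y = [b1 + b2 * x[t] + e[t] for t in range(len(x))]

# Transformed equation: y*_t = b1 * (1 - rho) + b2 * x*_t + v_t
y_star = quasi_difference(y, rho)
x_star = quasi_difference(x, rho)
recovered_v = [ys - b1 * (1.0 - rho) - b2 * xs for ys, xs in zip(y_star, x_star)]
print(recovered_v)
```

The first observation is lost in the differencing, which is why a model built on this transformation has to treat observation 1 separately from observations 2 to T.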

20
Application: Reagan's Presidential Approval
(Example brazenly lifted from Jackman's MCMC page)

Dependent variable: Reagan's aggregate public approval score in month t.
Independent variables:
- Inflation rate
- Unemployment rate

21
WinBugs Implementation

model {
  mu[1] <- b[1] + b[2]*infl[1] + b[3]*unemp[1]
  app[1] ~ dnorm(mu[1], tau.u)
  for (t in 2:96) {                        # loop over obs 2 to T
    mu[t] <- b[1]*(1-rho)
      + b[2]*(infl[t] - rho*infl[t-1])
      + b[3]*(unemp[t] - rho*unemp[t-1])
      + rho*app[t-1]
    app[t] ~ dnorm(mu[t], tau.e)
  }
  sigma.e <- 1/tau.e                       # convert precision to variance for transformed equation
  sigma.u <- sigma.e/(1-pow(rho,2))        # regression error variance for original equation
  tau.u <- 1/sigma.u                       # see definition of Var(et) from above
  # priors
  rho ~ dunif(-1,1)                        # uniform prior on stationary interval
  b[1:3] ~ dmnorm(b0[], B0[,])             # multivariate Normal prior
  tau.e ~ dgamma(.1, .1)                   # assumed: the transcript shows no prior for tau.e; values borrowed from the Bayesian setup slide
}
