Title: Bayesian analysis of a one-parameter model
1. Bayesian analysis of a one-parameter model
- I. The binomial distribution with a uniform prior
- Integration tricks
- II. Posterior interpretation
- III. The binomial distribution with a beta prior
- Conjugate priors and sufficient statistics
2. Review of the Bayesian Set-Up
- From the Bayesian perspective, there are known and unknown quantities.
- The known quantity is the data, denoted D.
- The unknown quantities are the parameters (e.g. mean, variance, missing data), denoted θ.
- To make inferences about the unknown quantities, we stipulate a joint probability function that describes how we believe these quantities behave in conjunction, p(θ, D).
- Using Bayes' Rule, this joint probability function can be rearranged to make inferences about θ:
- p(θ | D) = p(θ) p(D | θ) / p(D)
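A minimal sketch in R of this rearrangement via grid approximation (the grid size is an illustrative choice, and the binomial data Y = 17, n = 24 come from the example later in the deck):
# grid approximation of p(theta | D) for a binomial likelihood with a uniform prior
theta <- seq(0.001, 0.999, length = 999)        # grid over (0, 1), spacing .001
prior <- dunif(theta)                           # p(theta)
lik <- dbinom(17, size = 24, prob = theta)      # p(D | theta)
post <- prior * lik / sum(prior * lik * 0.001)  # divide by the grid estimate of p(D)
plot(theta, post, type = "l")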
3. Review of the Bayesian Set-Up cont.
- L(θ | D) is the likelihood function for θ.
- p(D) = ∫ p(θ) p(D | θ) dθ is the normalizing constant or the prior predictive distribution.
- It is the normalizing constant because it ensures that the posterior distribution of θ integrates to one.
- It is the prior predictive distribution because it is not conditional on a previous observation of the data-generating process (prior) and because it is the distribution of an observable quantity (predictive).
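For this one-parameter model the prior predictive can also be evaluated directly. A minimal sketch in R, again borrowing the Y = 17, n = 24 data from the example later in the deck:
# prior predictive p(D): integrate p(theta) * p(D | theta) over (0, 1)
p_D <- integrate(function(theta) dunif(theta) * dbinom(17, 24, theta), 0, 1)$value
p_D  # equals 1/(n + 1) = 1/25 under a uniform prior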
4. Review of the Bayesian Set-Up cont.
Why are we allowed to do this? Why might it not
be as useful?
5. Example: The Binomial Distribution
- Suppose X1, X2, ..., Xn are independent random draws from the same Bernoulli distribution with parameter θ.
- Thus, Xi ~ Bernoulli(θ) for i = 1, ..., n,
- or equivalently, Y = Σ Xi ~ Binomial(θ, n) (see the simulation sketch after this list).
- The joint distribution of Y and θ is the product of the conditional distribution of Y given θ and the prior distribution of θ.
- What distribution might be a reasonable choice for the prior distribution of θ? Why?
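A quick simulation sketch of the Bernoulli-Binomial equivalence claimed above (θ = 0.7 is an arbitrary illustrative value):
# a sum of n Bernoulli(theta) draws has the same distribution as one Binomial(theta, n) draw
theta <- 0.7; n <- 24
y_bern <- replicate(5000, sum(rbinom(n, size = 1, prob = theta)))
y_binom <- rbinom(5000, size = n, prob = theta)
c(mean(y_bern), mean(y_binom))  # both near n * theta = 16.8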
6. Binomial Distribution cont.
- If Y ~ Bin(θ, n), a reasonable prior distribution for θ must be bounded between zero and one.
- One option is the uniform distribution: θ ~ Unif(0, 1). As it happens, the resulting posterior is a proper density function. How can you tell?
7. Binomial Distribution cont.
- Let Y ~ Bin(θ, n) and θ ~ Unif(0, 1). Then
- p(θ | Y, n) ∝ p(θ) p(Y | θ) = C(n, Y) θ^Y (1 - θ)^(n-Y) ∝ θ^Y (1 - θ)^(n-Y)
- You cannot just call the posterior a binomial distribution, because you are conditioning on Y, and θ is the random variable, not the other way around.
- The constant Γ(n + 2) / [Γ(Y + 1) Γ(n - Y + 1)] is the normalization constant that transforms θ^Y (1 - θ)^(n-Y) into a beta distribution: p(θ | Y, n) = Beta(Y + 1, n - Y + 1).
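A numeric sanity check of that constant (a sketch in R, using the Y = 17, n = 24 values from the next slide):
# check that gamma(n + 2) / (gamma(y + 1) * gamma(n - y + 1)) normalizes the kernel
y <- 17; n <- 24
const <- gamma(n + 2) / (gamma(y + 1) * gamma(n - y + 1))
integrate(function(th) const * th^y * (1 - th)^(n - y), 0, 1)$value  # returns 1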
8. Application: The Cultural Consensus Model
- A researcher examined the level of consensus, denoted θ, among n = 24 Guatemalan women about whether or not polio (as well as other diseases) was thought to be contagious. In this case, 17 women said polio was contagious.
- Let Xi = 1 if respondent i thought polio was contagious and Xi = 0 otherwise.
- Let Y = Σ Xi ~ Bin(θ, 24) and let θ ~ Unif(0, 1).
- Based on the previous slide, p(θ | Y, n) = Beta(Y + 1, n - Y + 1).
- Substituting n = 24 and Y = 17 into the posterior distribution: p(θ | Y, n) = Beta(18, 8).
9. The Posterior Distribution
- The posterior distribution summarizes all that we know after analyzing the data.
- How do we interpret the posterior distribution p(θ | Y, n) = Beta(18, 8)?
- One option is graphically.
10. Posterior Summaries
- The full posterior contains too much information, especially in multi-parameter models. So, we use summary statistics (e.g. mean, variance, HDR).
- There are two methods for generating summary statistics:
- 1) Analytical solutions: use the well-known analytic solutions for the mean, variance, etc. of the various posterior distributions.
- 2) Numerical solutions: use a random number generator to draw a large number of values from the posterior distribution, then compute summary statistics from those random draws.
11. Analytic Summaries of the Posterior
- Analytic summaries are based on standard results from probability theory (see the handout from Gill's text).
- Continuing our example, p(θ | Y, n) = Beta(18, 8).
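The standard beta-distribution identities (textbook results, applied here to Beta(18, 8)) give:
E(θ | Y, n) = α / (α + β) = 18/26 ≈ .69
Var(θ | Y, n) = αβ / [(α + β)² (α + β + 1)] = (18 · 8) / (26² · 27) ≈ .008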
12. Numerical Summaries of the Posterior
- To create numerical summaries from the posterior, you need a random number generator.
- To summarize p(θ | Y, n) = Beta(18, 8):
- Draw a large number of random samples from a Beta(18, 8) distribution.
- Calculate the sample statistics from that set of random samples.
13. Numerical Summaries of the Posterior cont.
- S-Plus code (should also work in R) for the Beta(18, 8) summary:
# true posterior plot (see before)
x <- 0:1000/1000
post <- dbeta(x, 18, 8)
plot(x, post)
# take 1000 draws from the posterior
rands <- rbeta(1000, 18, 8)
# create summaries of those draws
mean(rands)
median(rands)
var(rands)
hist(rands, 20)
- Mean(θ) ≈ .70, Median(θ) ≈ .70, Var(θ) ≈ .01
14. Highest Posterior Density Regions (also known as Bayesian confidence or credible intervals)
- Highest Density Regions (HDRs) are intervals containing a specified posterior probability. The figure below plots the 95% highest posterior density region.
[Figure: Beta(18, 8) posterior density; the 95% HDR is approximately (.51, .84)]
15. Identification of the HDR
- It is easiest to find the Highest Density Region numerically.
- In S-Plus, to find the 95% HDR:
# take 1000 draws from the posterior
rands <- rbeta(1000, 18, 8)
# identify the .025 and .975 quantiles of the draws,
# the thresholds for the central 95% credible interval
quantile(rands, c(.025, .975))
16. An alternative HDR
- With asymmetric posterior distributions, it makes more sense to identify regions of equal height, rather than of equal mass.
- (I haven't figured out a cute way to do this numerically; one possible approach is sketched after the figure.)
[Figure: Beta(18, 8) posterior density with an equal-height HDR shaded]
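One possible numerical approach (a sketch, not from the original slides; hdr() is a hypothetical helper): among all intervals containing 95% of the sorted posterior draws, take the shortest. For a unimodal posterior, the shortest interval of a given mass approximates the equal-height region.
# shortest interval containing `mass` of the draws (assumes a unimodal posterior)
hdr <- function(draws, mass = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- floor(mass * n)
  widths <- sorted[(k + 1):n] - sorted[1:(n - k)]
  i <- which.min(widths)
  c(sorted[i], sorted[i + k])
}
rands <- rbeta(10000, 18, 8)
hdr(rands)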
17. Confidence Intervals vs. Bayesian Credible Intervals
- Differing interpretations:
- The Bayesian credible interval gives the probability, conditional on the data, that the true value of θ lies in the interval.
- Technically, P(θ ∈ Interval | X) = ∫_Interval p(θ | X) dθ
- The frequentist 100(1 - α)% confidence interval is the region of the sampling distribution for θ such that, given the observed data, one would expect 100α percent of future estimates of θ to fall outside that interval.
- Technically, 1 - α = ∫_a^b g(u | θ) du, where u is a dummy variable of integration for the estimated value of θ, and the limits a and b are functions of the data.
18. Confidence Intervals vs. Bayesian Credible Intervals cont.
- But often the results appear similar.
- If Bayesians use non-informative priors and there is a large number of observations (often several dozen will do), HDRs and frequentist confidence intervals will coincide numerically.
- We will talk more about this when we cover the great p-value debate, but this is only a coincidence.
- The interpretation of the two quantities is entirely different.
19. Returning to the Binomial Distribution
- If Y ~ Bin(n, θ), the uniform prior is just one of an infinite number of possible prior distributions.
- What other distributions could we use?
- A reasonable alternative to the Unif(0, 1) distribution is the beta distribution.
- Can you show that Beta(1, 1) is a Uniform(0, 1) distribution?
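A sketch of the answer, using the standard beta density (a textbook identity, not worked on the slide): the Beta(α, β) density is θ^(α-1) (1 - θ)^(β-1) / B(α, β); with α = β = 1 it reduces to 1 / B(1, 1) = 1 on (0, 1), since B(1, 1) = Γ(1)Γ(1)/Γ(2) = 1. A constant density of one on (0, 1) is exactly the Unif(0, 1) density.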
20. Prior Consequences: Plots of 4 Different Beta Distributions
[Figure: density plots of Beta(5, 5), Beta(3, 10), Beta(10, 3), and Beta(100, 30)]
21. The Binomial Distribution with a Beta Prior
- If Y ~ Bin(n, θ) and θ ~ Beta(α, β), then
- p(y) = ∫ p(y | θ) p(θ) dθ = ∫ C(n, y) θ^y (1 - θ)^(n-y) · [Γ(α + β) / (Γ(α)Γ(β))] θ^(α-1) (1 - θ)^(β-1) dθ
- This is a very nasty-looking integral. Rather than computing it directly, we shall use a standard trick in the Bayesian toolbox:
- 1) Find some multiplicative constant c such that c times the integrand integrates to one, i.e. try to transform the integrand into a well-known pdf.
- 2) Multiply by c and c⁻¹.
- 3) Since the c part integrates to one, the integral equals c⁻¹ times the constants out front, and the original numerator multiplied by c is the posterior distribution.
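Carrying the trick through (standard beta-function algebra, reconstructed here): take c = Γ(n + α + β) / [Γ(y + α) Γ(n - y + β)]. Multiplying the integrand by c turns it into a Beta(y + α, n - y + β) density, which integrates to one, so
p(y) = C(n, y) · [Γ(α + β) / (Γ(α)Γ(β))] · [Γ(y + α) Γ(n - y + β) / Γ(n + α + β)]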
22. The prior predictive distribution
- Inside the integral, θ^(y+α-1) (1 - θ)^(n-y+β-1) is the kernel of a Beta(y + α, n - y + β) distribution.
- The resulting marginal distribution of Y, p(y) = C(n, y) B(y + α, n - y + β) / B(α, β), is called a beta-binomial distribution.
23. The posterior of the binomial model with a beta prior
- p(θ | y, n) ∝ θ^(y+α-1) (1 - θ)^(n-y+β-1). This is a Beta(y + α, n - y + β) distribution.
- Beautifully, it worked out that the posterior distribution is a form of the prior distribution updated by the new data. In general, when this occurs we say the prior is conjugate.
24. Continuing the earlier example
- If 17 of 24 women say polio is contagious (so Y = 17 and n = 24, where Y is binomial) and you use a Beta(5, 5) prior, the posterior distribution is Beta(17 + 5, 24 - 17 + 5) = Beta(22, 12).
- Posterior mean ≈ .65; posterior variance ≈ .01
[Figure: prior Beta(5, 5) and posterior Beta(22, 12) densities]
25. What is the MLE for this likelihood?
- Have the students derive the maximum likelihood estimate to serve as a basis of comparison.
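For reference, the standard derivation (worked here, not on the original slide):
ℓ(θ) = y log θ + (n - y) log(1 - θ) + const
dℓ/dθ = y/θ - (n - y)/(1 - θ) = 0  ⇒  θ̂_MLE = y/n = 17/24 ≈ .71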
26. Prior Consequences: Plots of 4 Different Beta Distributions
[Figure: density plots of Beta(5, 5), Beta(3, 10), Beta(10, 3), and Beta(100, 30)]
27. Comparison of four different posterior distributions (in red) for the four different priors (in black)
- Prior Beta(5, 5) → Posterior Beta(22, 12)
- Prior Beta(10, 3) → Posterior Beta(27, 10)
- Prior Beta(3, 10) → Posterior Beta(20, 17)
- Prior Beta(100, 30) → Posterior Beta(117, 37)
28. Summary Statistics of the Findings for Different Priors
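A sketch in R (not from the original slides) that computes the posterior summary statistics for the four priors via the conjugate update Beta(α + Y, β + n - Y), with Y = 17 and n = 24:
# posterior mean and variance for each prior
Y <- 17; n <- 24
priors <- rbind(c(5, 5), c(3, 10), c(10, 3), c(100, 30))
a <- priors[, 1] + Y
b <- priors[, 2] + n - Y
data.frame(prior_a = priors[, 1], prior_b = priors[, 2], post_a = a, post_b = b,
           mean = round(a / (a + b), 3),
           var = round(a * b / ((a + b)^2 * (a + b + 1)), 4))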