Title: Bayesian analysis of a one-parameter model
1. Bayesian analysis of a one-parameter model
- I. The binomial distribution with a uniform prior
- Integration tricks
- II. Posterior interpretation
- III. The binomial distribution with a beta prior
- Conjugate priors and sufficient statistics
2. Review of the Bayesian Set-Up
- From the Bayesian perspective, there are known and unknown quantities.
- The known quantity is the data, denoted D.
- The unknown quantities are the parameters (e.g. mean, variance, missing data), denoted θ.
- To make inferences about the unknown quantities, we stipulate a joint probability function that describes how we believe these quantities behave in conjunction, p(θ, D).
- Using Bayes' Rule, this joint probability function can be rearranged to make inferences about θ:
- p(θ | D) = p(θ) p(D | θ) / p(D)
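A minimal sketch in R of this rearrangement via grid approximation (the grid size is an illustrative choice, and the binomial data Y = 17, n = 24 come from the example later in the deck):
# grid approximation of p(theta | D) for a binomial likelihood with a uniform prior
theta <- seq(0.001, 0.999, length = 999)        # grid over (0, 1), spacing .001
prior <- dunif(theta)                           # p(theta)
lik <- dbinom(17, size = 24, prob = theta)      # p(D | theta)
post <- prior * lik / sum(prior * lik * 0.001)  # divide by the grid estimate of p(D)
plot(theta, post, type = "l")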
3. Review of the Bayesian Set-Up cont.
- L(θ | D) is the likelihood function for θ.
- p(D) = ∫ p(θ) p(D | θ) dθ is the normalizing constant or the prior predictive distribution.
- It is the normalizing constant because it ensures that the posterior distribution of θ integrates to one.
- It is the prior predictive distribution because it is not conditional on a previous observation of the data-generating process (prior) and because it is the distribution of an observable quantity (predictive).
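For this one-parameter model the prior predictive can also be evaluated directly. A minimal sketch in R, again borrowing the Y = 17, n = 24 data from the example later in the deck:
# prior predictive p(D): integrate p(theta) * p(D | theta) over (0, 1)
p_D <- integrate(function(theta) dunif(theta) * dbinom(17, 24, theta), 0, 1)$value
p_D  # equals 1/(n + 1) = 1/25 under a uniform prior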
4. Review of the Bayesian Set-Up cont.
Why are we allowed to do this? Why might it not
be as useful?
5. Example: The Binomial Distribution
- Suppose X1, X2, ..., Xn are independent random draws from the same Bernoulli distribution with parameter θ.
- Thus, Xi ~ Bernoulli(θ) for i = 1, ..., n,
- or equivalently, Y = Σ Xi ~ Binomial(θ, n) (see the simulation sketch after this list).
- The joint distribution of Y and θ is the product of the conditional distribution of Y given θ and the prior distribution of θ.
- What distribution might be a reasonable choice for the prior distribution of θ? Why?
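A quick simulation sketch of the Bernoulli-Binomial equivalence claimed above (θ = 0.7 is an arbitrary illustrative value):
# a sum of n Bernoulli(theta) draws has the same distribution as one Binomial(theta, n) draw
theta <- 0.7; n <- 24
y_bern <- replicate(5000, sum(rbinom(n, size = 1, prob = theta)))
y_binom <- rbinom(5000, size = n, prob = theta)
c(mean(y_bern), mean(y_binom))  # both near n * theta = 16.8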
6. Binomial Distribution cont.
- If Y ~ Bin(θ, n), a reasonable prior distribution for θ must be bounded between zero and one.
- One option is the uniform distribution: θ ~ Unif(0, 1). As it happens, the resulting posterior is a proper density function. How can you tell?
7. Binomial Distribution cont.
- Let Y ~ Bin(θ, n) and θ ~ Unif(0, 1). Then
- p(θ | Y, n) ∝ p(θ) p(Y | θ) = C(n, Y) θ^Y (1 - θ)^(n-Y) ∝ θ^Y (1 - θ)^(n-Y)
- You cannot just call the posterior a binomial distribution, because you are conditioning on Y, and θ is the random variable, not the other way around.
- The constant Γ(n + 2) / [Γ(Y + 1) Γ(n - Y + 1)] is the normalization constant that transforms θ^Y (1 - θ)^(n-Y) into a beta distribution: p(θ | Y, n) = Beta(Y + 1, n - Y + 1).
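A numeric sanity check of that constant (a sketch in R, using the Y = 17, n = 24 values from the next slide):
# check that gamma(n + 2) / (gamma(y + 1) * gamma(n - y + 1)) normalizes the kernel
y <- 17; n <- 24
const <- gamma(n + 2) / (gamma(y + 1) * gamma(n - y + 1))
integrate(function(th) const * th^y * (1 - th)^(n - y), 0, 1)$value  # returns 1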
8. Application: The Cultural Consensus Model
- A researcher examined the level of consensus, denoted θ, among n = 24 Guatemalan women about whether or not polio (as well as other diseases) was thought to be contagious. In this case, 17 women said polio was contagious.
- Let Xi = 1 if respondent i thought polio was contagious and Xi = 0 otherwise.
- Let Y = Σ Xi ~ Bin(θ, 24) and let θ ~ Unif(0, 1).
- Based on the previous slide, p(θ | Y, n) = Beta(Y + 1, n - Y + 1).
- Substituting n = 24 and Y = 17 into the posterior distribution: p(θ | Y, n) = Beta(18, 8).
9. The Posterior Distribution
- The posterior distribution summarizes all that we know after analyzing the data.
- How do we interpret the posterior distribution p(θ | Y, n) = Beta(18, 8)?
- One option is graphically.
10. Posterior Summaries
- The full posterior contains too much information, especially in multi-parameter models. So, we use summary statistics (e.g. mean, variance, HDR).
- There are two methods for generating summary statistics:
- 1) Analytical solutions: use the well-known analytic solutions for the mean, variance, etc. of the various posterior distributions.
- 2) Numerical solutions: use a random number generator to draw a large number of values from the posterior distribution, then compute summary statistics from those random draws.
11. Analytic Summaries of the Posterior
- Analytic summaries are based on standard results from probability theory (see the handout from Gill's text).
- Continuing our example, p(θ | Y, n) = Beta(18, 8).
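The standard beta-distribution identities (textbook results, applied here to Beta(18, 8)) give:
E(θ | Y, n) = α / (α + β) = 18/26 ≈ .69
Var(θ | Y, n) = αβ / [(α + β)² (α + β + 1)] = (18 · 8) / (26² · 27) ≈ .008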
12. Numerical Summaries of the Posterior
- To create numerical summaries from the posterior, you need a random number generator.
- To summarize p(θ | Y, n) = Beta(18, 8):
- Draw a large number of random samples from a Beta(18, 8) distribution.
- Calculate the sample statistics from that set of random samples.
13. Numerical Summaries of the Posterior cont.
- S-Plus code (should also work in R) for the Beta(18, 8) summary:
# true posterior plot (see before)
x <- 0:1000/1000
post <- dbeta(x, 18, 8)
plot(x, post)
# take 1000 draws from the posterior
rands <- rbeta(1000, 18, 8)
# create summaries of those draws
mean(rands)
median(rands)
var(rands)
hist(rands, 20)
- Mean(θ) ≈ .70, Median(θ) ≈ .70, Var(θ) ≈ .01
14. Highest Posterior Density Regions (also known as Bayesian confidence or credible intervals)
- Highest Density Regions (HDRs) are intervals containing a specified posterior probability. The figure below plots the 95% highest posterior density region.
[Figure: Beta(18, 8) posterior density; the 95% HDR is approximately (.51, .84)]
15. Identification of the HDR
- It is easiest to find the Highest Density Region numerically.
- In S-Plus, to find the 95% HDR:
# take 1000 draws from the posterior
rands <- rbeta(1000, 18, 8)
# identify the .025 and .975 quantiles of the draws,
# the thresholds for the central 95% credible interval
quantile(rands, c(.025, .975))
16. An alternative HDR
- With asymmetric posterior distributions, it makes more sense to identify regions of equal height, rather than of equal mass.
- (I haven't figured out a cute way to do this numerically; one possible approach is sketched after the figure.)
[Figure: Beta(18, 8) posterior density with an equal-height HDR shaded]
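One possible numerical approach (a sketch, not from the original slides; hdr() is a hypothetical helper): among all intervals containing 95% of the sorted posterior draws, take the shortest. For a unimodal posterior, the shortest interval of a given mass approximates the equal-height region.
# shortest interval containing `mass` of the draws (assumes a unimodal posterior)
hdr <- function(draws, mass = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- floor(mass * n)
  widths <- sorted[(k + 1):n] - sorted[1:(n - k)]
  i <- which.min(widths)
  c(sorted[i], sorted[i + k])
}
rands <- rbeta(10000, 18, 8)
hdr(rands)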
17. Confidence Intervals vs. Bayesian Credible Intervals
- Differing interpretations:
- The Bayesian credible interval gives the probability, conditional on the data, that the true value of θ lies in the interval.
- Technically, P(θ ∈ Interval | X) = ∫_Interval p(θ | X) dθ
- The frequentist 100(1 - α)% confidence interval is the region of the sampling distribution for θ such that, given the observed data, one would expect 100α percent of future estimates of θ to fall outside that interval.
- Technically, 1 - α = ∫_a^b g(u | θ) du, where u is a dummy variable of integration for the estimated value of θ, and the limits a and b are functions of the data.
18. Confidence Intervals vs. Bayesian Credible Intervals cont.
- But often the results appear similar.
- If Bayesians use non-informative priors and there is a large number of observations (often several dozen will do), HDRs and frequentist confidence intervals will coincide numerically.
- We will talk more about this when we cover the great p-value debate, but this is only a coincidence.
- The interpretation of the two quantities is entirely different.
19. Returning to the Binomial Distribution
- If Y ~ Bin(n, θ), the uniform prior is just one of an infinite number of possible prior distributions.
- What other distributions could we use?
- A reasonable alternative to the Unif(0, 1) distribution is the beta distribution.
- Can you show that Beta(1, 1) is a Uniform(0, 1) distribution?
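A sketch of the answer, using the standard beta density (a textbook identity, not worked on the slide): the Beta(α, β) density is θ^(α-1) (1 - θ)^(β-1) / B(α, β); with α = β = 1 it reduces to 1 / B(1, 1) = 1 on (0, 1), since B(1, 1) = Γ(1)Γ(1)/Γ(2) = 1. A constant density of one on (0, 1) is exactly the Unif(0, 1) density.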
20. Prior Consequences: Plots of 4 Different Beta Distributions
[Figure: density plots of Beta(5, 5), Beta(3, 10), Beta(10, 3), and Beta(100, 30)]
21. The Binomial Distribution with a Beta Prior
- If Y ~ Bin(n, θ) and θ ~ Beta(α, β), then
- p(y) = ∫ p(y | θ) p(θ) dθ = ∫ C(n, y) θ^y (1 - θ)^(n-y) · [Γ(α + β) / (Γ(α)Γ(β))] θ^(α-1) (1 - θ)^(β-1) dθ
- This is a very nasty-looking integral. Rather than computing it directly, we shall use a standard trick in the Bayesian toolbox:
- 1) Find some multiplicative constant c such that c times the integrand integrates to one, i.e. try to transform the integrand into a well-known pdf.
- 2) Multiply by c and c⁻¹.
- 3) Since the c part integrates to one, the integral equals c⁻¹ times the constants out front, and the original numerator multiplied by c is the posterior distribution.
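Carrying the trick through (standard beta-function algebra, reconstructed here): take c = Γ(n + α + β) / [Γ(y + α) Γ(n - y + β)]. Multiplying the integrand by c turns it into a Beta(y + α, n - y + β) density, which integrates to one, so
p(y) = C(n, y) · [Γ(α + β) / (Γ(α)Γ(β))] · [Γ(y + α) Γ(n - y + β) / Γ(n + α + β)]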
22. The prior predictive distribution
- Inside the integral, θ^(y+α-1) (1 - θ)^(n-y+β-1) is the kernel of a Beta(y + α, n - y + β) distribution.
- The resulting marginal distribution of Y, p(y) = C(n, y) B(y + α, n - y + β) / B(α, β), is called a beta-binomial distribution.
23. The posterior of the binomial model with a beta prior
- p(θ | y, n) ∝ θ^(y+α-1) (1 - θ)^(n-y+β-1). This is a Beta(y + α, n - y + β) distribution.
- Beautifully, it worked out that the posterior distribution is a form of the prior distribution updated by the new data. In general, when this occurs we say the prior is conjugate.
24. Continuing the earlier example
- If 17 of 24 women say polio is contagious (so Y = 17 and n = 24, where Y is binomial) and you use a Beta(5, 5) prior, the posterior distribution is Beta(17 + 5, 24 - 17 + 5) = Beta(22, 12).
- Posterior mean ≈ .65; posterior variance ≈ .01
[Figure: prior Beta(5, 5) and posterior Beta(22, 12) densities]
25. What is the MLE for this likelihood?
- Have the students derive the maximum likelihood estimate to serve as a basis of comparison.
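For reference, the standard derivation (worked here, not on the original slide):
ℓ(θ) = y log θ + (n - y) log(1 - θ) + const
dℓ/dθ = y/θ - (n - y)/(1 - θ) = 0  ⇒  θ̂_MLE = y/n = 17/24 ≈ .71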
26. Prior Consequences: Plots of 4 Different Beta Distributions
[Figure: density plots of Beta(5, 5), Beta(3, 10), Beta(10, 3), and Beta(100, 30)]
27. Comparison of four different posterior distributions (in red) for the four different priors (in black)
- Prior Beta(5, 5) → Posterior Beta(22, 12)
- Prior Beta(10, 3) → Posterior Beta(27, 10)
- Prior Beta(3, 10) → Posterior Beta(20, 17)
- Prior Beta(100, 30) → Posterior Beta(117, 37)
28. Summary Statistics of the Findings for Different Priors
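A sketch in R (not from the original slides) that computes the posterior summary statistics for the four priors via the conjugate update Beta(α + Y, β + n - Y), with Y = 17 and n = 24:
# posterior mean and variance for each prior
Y <- 17; n <- 24
priors <- rbind(c(5, 5), c(3, 10), c(10, 3), c(100, 30))
a <- priors[, 1] + Y
b <- priors[, 2] + n - Y
data.frame(prior_a = priors[, 1], prior_b = priors[, 2], post_a = a, post_b = b,
           mean = round(a / (a + b), 3),
           var = round(a * b / ((a + b)^2 * (a + b + 1)), 4))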