Title: Bayesian Data Analysis
Slide 1: Bayesian Data Analysis
- Eric T. Bradlow
- Associate Professor of Marketing and Statistics
- The Wharton School
- Lecture 1
Slide 2: Goals of this course
- Familiarize you with the Bayesian paradigm for estimation and inference
- Apply this to problems in the educational testing arena (Bayesian psychometrics)
- Teach you, via in-class demonstrations and assignments, Bayesian computing via the BUGS program
Slide 3: Course Details
- Meetings: NBME, every other week, Monday 3:30-4:30
- Textbook: Bayesian Data Analysis, Gelman, Carlin, Stern, and Rubin; Chapman and Hall, 1995
- Software: BUGS (Bayesian Inference Using Gibbs Sampling)
- Course Website: mktgweb.wharton.upenn.edu/ebradlow/bayesian_data_analysis_homepage.htm
  - BUGS information, lecture notes, assignments
- Instructor: Eric T. Bradlow, (215) 898-8255
  - 761 JMHH, The Wharton School
  - Ebradlow_at_wharton.upenn.edu
Slide 4: Lecture 1 Outline
- A review of frequentist and Bayesian jargon
- Frequentist and Bayesian inference
- An example using coin flips and the normal distribution

Loosely speaking, this is Chapter 1 and the beginning of Chapter 2 of GCSR.
Slide 5: The Frequentist and Bayesian Paradigms
- Observed dependent variable: $y_1, \ldots, y_n$. Covariates: unit-specific $x_1, \ldots, x_n$ and general $Z$.
- Example: $y_i$ = test scores, $x_i$ = person-level characteristics, $Z$ = item characteristics.
- Sampling distribution / likelihood: $y_1, \ldots, y_n \sim p(y \mid \theta)$.

Frequentist interpretation: $\theta$ is fixed but unknown. The $y$'s are noisy manifestations of $\theta$ (sampling error), and we estimate $\theta$ using $y_1, \ldots, y_n$, $x_1, \ldots, x_n$, $Z$ to maximize the likelihood of the data.

Bayesian interpretation: $\theta$ is a random variable with its own probability distribution $\pi(\theta)$. The observed data $y_1, \ldots, y_n$, $x_1, \ldots, x_n$, $Z$ inform about the likelihood of one value of $\theta$ over another; they are used to update $\pi(\theta)$ to yield a posterior distribution.
Slide 6: Graphically
[Figure: the frequentist and Bayesian pictures]
- Frequentist: hypothetical repeated samples $y_1^{(1)}, \ldots, y_n^{(1)}$; $y_1^{(2)}, \ldots, y_n^{(2)}$; ... each yield an estimate $\hat{\theta}$, and the histogram of these estimates is the sampling distribution of $\hat{\theta}$. Asymptotically, $\hat{\theta} \sim N(\theta, \mathrm{Var}(\hat{\theta}))$ with $\mathrm{Var}(\hat{\theta}) = FI^{-1}(\theta)$, the inverse Fisher information.
- Bayesian: point-wise multiplication of the prior $\pi(\theta)$ and the likelihood $p(y \mid \theta)$ yields the posterior $\pi(\theta \mid y)$.
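To make the point-wise multiplication concrete, here is a minimal sketch (mine, not from the lecture) that discretizes the parameter on a grid, multiplies a prior by a likelihood, and renormalizes to get the posterior. The coin-flip data, grid size, and Beta(2, 2) prior are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Illustrative data: 7 heads out of n = 10 coin flips (assumed, not from the lecture)
n, heads = 10, 7

grid = np.linspace(0.001, 0.999, 999)                 # grid of candidate values for p
width = grid[1] - grid[0]
prior = stats.beta.pdf(grid, 2, 2)                    # assumed Beta(2, 2) prior pi(p)
likelihood = grid**heads * (1 - grid)**(n - heads)    # p(y | p), up to a constant

posterior = prior * likelihood                        # point-wise multiplication
posterior /= posterior.sum() * width                  # renormalize to integrate to 1

print("posterior mean of p:", np.sum(grid * posterior) * width)
```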
Slide 7: Inference
Frequentist
- Constructs a $(1-\alpha)$ confidence interval.
- Interpretation: $(1-\alpha)$ of all values of $\hat{\theta}$ will be contained in this interval; the statement refers to the sampling distribution of the estimator.
- Issues:
  - (1) What if asymptotic normality does not hold?
  - (2) Computationally fast if the MLE is in closed form; the MLE is the solution, in $\theta$, of $dp(Y \mid \theta)/d\theta = 0$.
  - (3) Very standard algorithms if not: Newton's method, gradient search (steepest descent); see the numerical sketch after this list.
  - (4) Ignores prior information.
  - (5) Difficult to implement with sparse data.

Bayesian
- Constructs a $(1-\alpha)$ Bayesian posterior interval.
- Interpretation: $\Pr(\theta \in \text{interval}) = 1-\alpha$, its natural meaning.
- Issues:
  - (1) Where to obtain the prior? Diffuse priors, Jeffreys priors, meta-analysis.
  - (2) The Bayesian can compute the posterior MODE, but typically wants to infer something about the entire posterior distribution (mean, median, quantiles, sd, etc.). Since the posterior is typically not attainable in closed form, the Bayesian recourse is to obtain a posterior sample.
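As a concrete illustration of item (3) on the frequentist side, here is a minimal numerical-maximization sketch (mine, not from the lecture): the Cauchy location model has no closed-form MLE, so we minimize the negative log-likelihood with a standard optimizer, which stands in for Newton's method or gradient search. The simulated data and true location are assumptions.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
y = stats.cauchy.rvs(loc=3.0, size=200, random_state=rng)   # simulated data

# Negative log-likelihood of a Cauchy location model: no closed-form maximizer
def negloglik(theta):
    return -np.sum(stats.cauchy.logpdf(y, loc=theta))

# Numerical maximization (stands in for Newton's method / gradient search)
result = optimize.minimize_scalar(negloglik, bracket=(np.median(y) - 1, np.median(y) + 1))
print("MLE of the location parameter:", result.x)
```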
Slide 8: Example: Coin-flipping model, maximum likelihood (one observation per unit, common p)
- $y_1, \ldots, y_n \sim$ Binomial$(n, p)$:
  - $y_i = 1$ with probability $p$, $0$ with probability $1-p$; independent flips, constant $p$ across trials.
- $p(y_1, \ldots, y_n \mid p) = \prod_i p^{y_i}(1-p)^{1-y_i} = p^{\sum y_i}(1-p)^{n-\sum y_i}$
- Then, taking logs (a monotone transform, so maximizing $p(\cdot)$ is the same as maximizing $\log p(\cdot)$), we solve the equation $d\log p(y_1, \ldots, y_n \mid p)/dp = 0$, which yields $\hat{p} = \sum y_i / n$.
- $\mathrm{Var}(\hat{p}) = \hat{p}(1-\hat{p})/n$, and now the CI is constructed as $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.
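A minimal sketch of this slide's calculation (the simulated flips and the 95% level are my assumptions): compute $\hat{p}$, its estimated variance, and the resulting confidence interval.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.6, size=100)        # simulated coin flips, true p = 0.6

n = len(y)
p_hat = y.sum() / n                       # MLE: number of successes over n
var_hat = p_hat * (1 - p_hat) / n         # estimated Var(p_hat)
z = 1.96                                  # assumed 95% level
ci = (p_hat - z * np.sqrt(var_hat), p_hat + z * np.sqrt(var_hat))
print("p_hat:", p_hat, "95% CI:", ci)
```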
Slide 9: Example: Coin-flipping model (Bayesian)
- $y_1, \ldots, y_n \sim$ Binomial$(n, p)$
- $p(y_1, \ldots, y_n \mid p) = \prod_i p^{y_i}(1-p)^{1-y_i} = p^{\sum y_i}(1-p)^{n-\sum y_i}$
- A prior $\pi(p)$ is asserted for $p$, the unknown probability: $\pi(p) =$ Beta$(a, b)$, a beta distribution $\propto p^{a-1}(1-p)^{b-1}$.
- Notice how these combine naturally: $\pi(p \mid y_1, \ldots, y_n, a, b) =$ Beta$(a + \sum y_i,\; b + n - \sum y_i)$, with posterior mean $(a + \sum y_i)/(a + b + n)$.
- In this manner, $a$ can be thought of as a prior number of successes and $b$ as a prior number of failures.
- A prior distribution is said to be conjugate for a parameter if the prior and the posterior belong to the same family of distributions.
- What is the problem here, though?
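A minimal sketch of the conjugate update (the Beta(2, 2) prior and the simulated flips are my assumptions): the posterior is available in closed form, so no simulation is needed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.binomial(1, 0.6, size=100)        # simulated coin flips

a, b = 2, 2                               # assumed prior: Beta(2, 2)
a_post = a + y.sum()                      # a + number of successes
b_post = b + len(y) - y.sum()             # b + number of failures

posterior = stats.beta(a_post, b_post)    # conjugate posterior Beta(a_post, b_post)
print("posterior mean:", posterior.mean())
print("95% posterior interval:", posterior.ppf([0.025, 0.975]))
```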
Slide 10: Example (normal distribution, one obs. per unit), Case 1: ML inference assuming conditional independence (CI) between observations
- $Y_1, \ldots, Y_n \sim N(\mu, \sigma^2)$
- $p(y_i \mid \mu, \sigma^2) = (1/(2\pi\sigma^2))^{0.5}\exp(-0.5((y_i-\mu)/\sigma)^2)$
- $p(y_1, \ldots, y_n \mid \mu, \sigma^2) = \prod_i (1/(2\pi\sigma^2))^{0.5}\exp(-0.5((y_i-\mu)/\sigma)^2) = (1/(2\pi\sigma^2))^{n/2}\exp(-0.5\sum_i((y_i-\mu)/\sigma)^2)$
- $d\log p(y_1, \ldots, y_n \mid \mu, \sigma^2)/d\mu = 0$, which yields $\hat{\mu} = \bar{y} = \sum y_i/n$, and
- $d\log p(y_1, \ldots, y_n \mid \mu, \sigma^2)/d\sigma = 0$, which yields $\hat{\sigma}^2 = \sum_i (y_i - \bar{y})^2/n$.
- Then, utilizing the central limit theorem and the fact that $\mathrm{Var}(\bar{y}) = \sigma^2/n$, we obtain the frequentist interval estimate $\bar{y} \pm z_{\alpha/2}\,\hat{\sigma}/\sqrt{n}$.
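A minimal numerical sketch of this slide (simulated data and the 95% level are assumptions): the ML estimates and the CLT-based interval for the normal mean.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(100.0, 15.0, size=60)      # simulated test scores (assumed values)

n = len(y)
mu_hat = y.mean()                         # MLE of mu: the sample mean
sigma2_hat = np.sum((y - mu_hat)**2) / n  # MLE of sigma^2 (divides by n, not n-1)
se = np.sqrt(sigma2_hat / n)              # estimated standard error of the mean
print("mu_hat:", mu_hat)
print("95% CI:", (mu_hat - 1.96 * se, mu_hat + 1.96 * se))
```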
Slide 11: Example: Normal distribution, Case 2: Bayesian inference
- $Y_1, \ldots, Y_n \sim N(\mu, \sigma^2)$
- $p(y_1, \ldots, y_n \mid \mu, \sigma^2) = (1/(2\pi\sigma^2))^{n/2}\exp(-0.5\sum_i((y_i-\mu)/\sigma)^2)$
- Now, though, we assert prior distributions for $\mu$, $\sigma^2$.
- Conjugate prior: $\mu \sim N(\mu_0, \sigma_0^2)$.
- Posterior distribution for $\mu$, treating $\sigma^2$ and the hyperparameters $\mu_0$, $\sigma_0^2$ as known and fixed:
  $P(\mu \mid y_1, \ldots, y_n, \mu_0, \sigma_0^2) = N\left(\frac{\mu_0/\sigma_0^2 + n\bar{y}/\sigma^2}{1/\sigma_0^2 + n/\sigma^2},\; \frac{1}{1/\sigma_0^2 + n/\sigma^2}\right)$
- Shrinkage: the posterior mean is a precision (1/variance) weighted average of the prior mean $\mu_0$ and the likelihood mean $\bar{y}$.
- Posterior precision = prior precision + likelihood precision (information addition).
- Upshot: (1) If the data are very informative about $\mu$, then $n/\sigma^2$ is large and the posterior mean is close to the MLE (the nice part is that the data decide this). (2) If your prior is diffuse, $\sigma_0^2 \rightarrow \infty$, then you get back the MLE solution.
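A minimal sketch of this precision-weighted update (the prior hyperparameters, the known $\sigma^2$, and the simulated data are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 15.0**2                                  # assumed known data variance
y = rng.normal(100.0, np.sqrt(sigma2), size=60)   # simulated data
n, ybar = len(y), y.mean()

mu0, sigma0sq = 80.0, 400.0                       # assumed prior: mu ~ N(80, 20^2)

prior_prec = 1.0 / sigma0sq                       # prior precision
lik_prec = n / sigma2                             # likelihood precision
post_prec = prior_prec + lik_prec                 # precisions add
post_mean = (prior_prec * mu0 + lik_prec * ybar) / post_prec  # precision-weighted average

print("posterior mean:", post_mean, "posterior sd:", np.sqrt(1.0 / post_prec))
```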
Slide 12: General Inference Problem
- In cases where the prior and likelihood are conjugate, Bayesian inference can usually be done in a straightforward manner; computation is straightforward.
- However, when they are not conjugate (which is the case for most problems), this is where Markov chain simulation comes in, e.g., Gibbs sampling.
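To illustrate the Gibbs idea before we get to BUGS, here is a minimal hand-rolled sketch (not the course's BUGS code) for the normal model with unknown mean and variance, alternately drawing each parameter from its full conditional. The semi-conjugate priors (normal for $\mu$, inverse-gamma for $\sigma^2$) and all hyperparameter values are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.normal(1.5, 2.0, size=50)               # simulated data
n, ybar = len(y), y.mean()

mu0, sigma0sq = 0.0, 100.0                      # assumed prior: mu ~ N(0, 100)
a0, b0 = 0.01, 0.01                             # assumed prior: sigma^2 ~ Inv-Gamma(0.01, 0.01)

mu, sigma2 = ybar, y.var()                      # starting values
draws = []
for t in range(5000):
    # Full conditional of mu: precision-weighted normal
    prec = 1 / sigma0sq + n / sigma2
    mean = (mu0 / sigma0sq + n * ybar / sigma2) / prec
    mu = rng.normal(mean, np.sqrt(1 / prec))
    # Full conditional of sigma^2: inverse-gamma (draw a Gamma precision, then invert)
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    sigma2 = 1 / rng.gamma(a_n, 1 / b_n)
    draws.append((mu, sigma2))

post = np.array(draws[1000:])                   # discard burn-in
print("posterior mean of mu:", post[:, 0].mean())
print("95% posterior interval for mu:", np.percentile(post[:, 0], [2.5, 97.5]))
```

BUGS automates this kind of conditional-sampling scheme from a declarative model specification, which is why we will use it for the less tidy models later in the course.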
Slide 13: An example, close to home, where closed-form inference does not exist
- Bayesian IRT model (2-PL):
  - $Y_{ij} = 1$ w.p. $p_{ij}$, $0$ w.p. $1 - p_{ij}$, with $\mathrm{logit}(p_{ij}) = a_j(\theta_i - b_j)$
  - $\theta_i \sim N(0, 1)$
  - $\theta_i$ = examinee ability, $a_j$ = item slope, $b_j$ = item difficulty
  - The mean of $\theta$ is set to 0 to shift-identify the model; the sd is set to 1 to scale-identify it.
- $P(Y_{11}, \ldots, Y_{IJ} \mid \{p_{ij}\}) = \prod_{i}\prod_{j} p_{ij}^{Y_{ij}}(1-p_{ij})^{1-Y_{ij}}$
- $\pi(\theta_i) = (2\pi)^{-0.5}\exp(-0.5\,\theta_i^2)$
- The product of these two functions is not a known distribution; hence it cannot be maximized directly, it cannot be sampled from directly, and its moments are not known directly.
Slide 14: Bayes, Empirical Bayes, and Bayes Empirical Bayes
- In Bayesian inference, all parameters have distributions, and inference is made by integrating over distributions to obtain marginal posterior distributions.
- In Bayes Empirical Bayes methods (which is what ALL people call Empirical Bayes methods), the parameters of the prior distribution are estimated and then treated as known and fixed.
- In Empirical Bayes methods (see Maritz and Lwin, the seminal reference), the entire prior distribution is estimated, the so-called inverse-integration problem.
- We will discuss this in detail next time.
Slide 15: Summary of Today's Lecture
- Frequentist v. Bayesian inference
- Conjugate distributions
- Inference via simulation in non-conjugate situations
- Assignment: (1) Read Chapters 1 and 2 of GCSR. (2) Come in with one critical question you had about each chapter.