1
Bayesian Data Analysis
  • Eric T. Bradlow
  • Associate Professor of Marketing and Statistics
  • The Wharton School
  • Lecture 1

2
Goals of this course
  • Familiarize you with the Bayesian paradigm for
    estimation and inference
  • Apply this to problems in the educational testing
    arena (Bayesian Psychometrics)
  • Teach you, via in-class demonstrations and
    assignments, Bayesian computing via the BUGS
    program

3
Course Details
  • Meetings: NBME, every other week, Monday 3:30-4:30
  • Textbook: Bayesian Data Analysis, Gelman, Carlin, Stern, and Rubin, Chapman and Hall, 1995
  • Software: BUGS (Bayesian Inference Using Gibbs Sampling)
  • Course Website:
    mktgweb.wharton.upenn.edu/ebradlow/bayesian_data_analysis_homepage.htm
    (BUGS information, Lecture Notes, Assignments)
  • Instructor: Eric T. Bradlow, (215) 898-8255
    761 JMHH, The Wharton School
    ebradlow@wharton.upenn.edu

4
Lecture 1 Outline
  • A Review of Frequentist and Bayesian jargon
  • Frequentist and Bayesian Inference
  • An example using coin flips and the Normal
    distribution

Loosely speaking, this is Chapter 1 and the beginning of Chapter 2 of GCSR.
5
Frequentist and Bayesian Paradigm
  • Frequentist
    - Observed: dependent variable y1,…, yn; covariates: unit-specific x1,…, xn and general Z
      (Ex: yi test scores, xi person-level characteristics, Z item characteristics)
    - Sampling distribution: y1,…, yn ~ p(y|θ)
    - Interpretation: θ is fixed, but unknown. The y's are noisy manifestations of θ (sampling
      error), and we estimate θ using y1,…, yn, x1,…, xn, Z to maximize the likelihood of the
      data.
  • Bayesian
    - Observed: dependent variable y1,…, yn; covariates: unit-specific x1,…, xn and general Z
    - Likelihood: y1,…, yn ~ p(y|θ)
    - Interpretation: θ is a random variable which has its own probability distribution π(θ).
      The observed data y1,…, yn, x1,…, xn, Z inform about the likelihood of one value of θ
      over another and are used to update π(θ) to yield a posterior distribution.

6
Graphically
[Diagram] Frequentist: repeated samples y1,…, yn (1), y1,…, yn (2), …, y1,…, yn (inf) each
yield an estimate θ̂; the histogram of these θ̂ is asymptotically N(θ, Var(θ̂)), with
Var(θ̂) = FI⁻¹(θ̂).
Bayesian: the prior π(θ) is point-wise multiplied by the likelihood p(y|θ) to yield the
posterior π(θ|y).
7
Inference
  • Frequentist
    - Constructs a (1-α) confidence interval.
    - Interpretation: (1-α) of all values of θ̂, over repeated samples, will be contained in
      this interval; it is a statement about the sampling distribution of the estimator.
  • Bayesian
    - Constructs a (1-α) Bayesian posterior interval.
    - Interpretation: Pr(θ in interval) = 1-α, its natural meaning.

Issues (Frequentist)
  • (1) Computationally fast if the MLE is in closed form; the MLE is the value of θ that
    solves dp(Y|θ)/dθ = 0.
  • (2) What if asymptotic normality does not hold?
  • (3) Very standard algorithms if the MLE is not in closed form: Newton's method, gradient
    search (steepest descent); a sketch follows below.
  • (4) Ignores prior information.
  • (5) Difficult to implement with sparse data.

Issues (Bayesian)
  • (1) Where to obtain the prior? (diffuse priors, Jeffreys priors, meta-analysis)
  • (2) The Bayesian can compute the posterior MODE, but typically wants to infer something
    about the entire posterior distribution (mean, median, quantiles, sd, etc.). Since the
    posterior is typically not attainable in closed form, the Bayesian recourse is to obtain
    a posterior sample.
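As a concrete illustration of issue (3), here is a minimal sketch, not taken from the lecture,
of Newton's method applied to a likelihood whose MLE has no closed form. The one-parameter
logistic model, the simulated data, and the function name are assumptions made purely for
illustration.

```python
# Minimal sketch (not from the lecture): Newton's method for an MLE with no
# closed form, using a one-parameter logistic model with logit P(y=1) = b*x.
import numpy as np

def newton_mle(x, y, b=0.0, tol=1e-8, max_iter=50):
    """Maximize the Bernoulli log-likelihood in b by Newton-Raphson."""
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-b * x))          # fitted probabilities
        score = np.sum(x * (y - p))               # d loglik / db
        hess = -np.sum(x**2 * p * (1.0 - p))      # d2 loglik / db2
        step = score / hess
        b -= step                                  # Newton update
        if abs(step) < tol:
            break
    return b, -1.0 / hess                          # MLE and inverse observed information

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (rng.random(200) < 1 / (1 + np.exp(-1.5 * x))).astype(float)
b_hat, var_b = newton_mle(x, y)
print(b_hat, var_b)
```

The returned -1/hessian is the inverse observed Fisher information, i.e. the asymptotic
variance that feeds the AN(θ, FI⁻¹(θ̂)) picture of slide 6.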

8
Example: Coin-flipping model, Maximum Likelihood
(one observation per unit, common p)
  • y1,…, yn ~ Binomial(n, p)
  • yi = 1 with prob p (independent flips)
    yi = 0 with prob 1-p (constant p across trials)
  • p(y1,…,yn | p) ∝ Πi p^yi (1-p)^(1-yi) = p^Σyi (1-p)^(n-Σyi)
  • Then, taking logs (a monotone transform, so maximizing p(·) is the same as maximizing
    log p(·)), we solve the equation
  • d log p(y1,…,yn | p)/dp = 0, which yields p̂ = Σyi / n
  • Var(p̂) = p̂(1-p̂)/n, and now the CI is constructed (a numerical sketch follows below)
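A minimal numerical sketch of this slide; the simulated flips and the 1.96 multiplier for a
95% interval are illustrative assumptions, not from the lecture.

```python
# Minimal sketch of slide 8: MLE and Wald CI for the coin-flipping model.
import numpy as np

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.3, size=50)      # n independent flips with common p

n = len(y)
p_hat = y.sum() / n                    # MLE: solves d log p(y|p)/dp = 0
var_hat = p_hat * (1 - p_hat) / n      # estimated Var(p_hat)
ci = (p_hat - 1.96 * np.sqrt(var_hat), p_hat + 1.96 * np.sqrt(var_hat))
print(p_hat, ci)                       # 95% frequentist confidence interval
```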

9
Example: Coin-flipping model
  • y1,…, yn ~ Binomial(n, p)
  • p(y1,…,yn | p) ∝ Πi p^yi (1-p)^(1-yi) = p^Σyi (1-p)^(n-Σyi)
  • A prior π(p) is asserted for p, the unknown probability:
    π(p) = Beta(a, b), a beta distribution ∝ p^(a-1) (1-p)^(b-1)
  • Notice how these combine naturally:
    π(p | y1,…,yn, a, b) = Beta(a + Σyi, b + n - Σyi)
    Posterior mean = (a + Σyi) / (a + Σyi + b + n - Σyi) = (a + Σyi) / (a + b + n)
  • In this manner, a can be thought of as a prior number of successes and b as a prior
    number of failures.
  • A prior distribution is said to be conjugate for a parameter if the prior and posterior
    are of the same distribution (a numerical sketch follows below).
  • What is the problem here, though?
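A minimal sketch of the conjugate update; the Beta(2, 2) prior and the simulated flips are
assumptions for illustration only, and scipy is used just to summarize the resulting Beta
posterior.

```python
# Minimal sketch of slide 9: conjugate Beta prior updated by coin-flip data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.3, size=50)

a, b = 2.0, 2.0                         # prior hyperparameters (assumed here)
a_post = a + y.sum()                    # a + sum(y_i): prior + observed successes
b_post = b + len(y) - y.sum()           # b + n - sum(y_i): prior + observed failures

posterior = stats.beta(a_post, b_post)
print(posterior.mean())                 # (a + sum y) / (a + b + n)
print(posterior.ppf([0.025, 0.975]))    # 95% posterior interval
```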
10
Example (Normal Distribution, one obs. per unit)
Case 1: ML inference, assuming conditional independence (CI) between observations
  • Y1,…, Yn ~ N(μ, σ²)
  • p(yi | μ, σ²) = (1/(2πσ²))^0.5 exp(-0.5 ((yi - μ)/σ)²)
  • p(y1,…,yn | μ, σ²) = Πi (1/(2πσ²))^0.5 exp(-0.5 ((yi - μ)/σ)²)
    = (1/(2πσ²))^(n/2) exp(-0.5 Σi ((yi - μ)/σ)²)
  • d log p(y1,…,yn | μ, σ²)/dμ = 0, which yields μ̂ = ȳ = Σyi / n, and
  • d log p(y1,…,yn | μ, σ²)/dσ = 0, which yields σ̂² = Σi (yi - ȳ)² / n
  • Then, utilizing the central limit theorem and the fact that Var(ȳ) = σ²/n, we obtain the
    frequentist interval estimate (a numerical sketch follows below).
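A minimal numerical sketch of Case 1 on simulated data; the true μ and σ and the 95% level
are assumptions for illustration.

```python
# Minimal sketch of slide 10: ML estimates and the frequentist interval for the Normal model.
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=5.0, scale=2.0, size=100)

n = len(y)
mu_hat = y.mean()                         # MLE of mu (the sample mean)
sigma2_hat = ((y - mu_hat) ** 2).mean()   # MLE of sigma^2 (divides by n)
se = np.sqrt(sigma2_hat / n)              # sqrt(Var(ybar)) = sigma / sqrt(n)
print(mu_hat, (mu_hat - 1.96 * se, mu_hat + 1.96 * se))
```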

11
Example: Normal Distribution, Case 2: Bayesian Inference
  • Y1,…, Yn ~ N(μ, σ²)
  • p(y1,…,yn | μ, σ²) = (1/(2πσ²))^(n/2) exp(-0.5 Σi ((yi - μ)/σ)²)
  • Now, though, we assert prior distributions for μ and σ².
  • Conjugate prior: μ ~ N(μ0, σ0²)
  • Posterior distribution for μ, treating μ0, σ0² (and σ²) as fixed and known:
    p(μ | y1,…,yn, μ0, σ0²) = N( (μ0/σ0² + nȳ/σ²) / (1/σ0² + n/σ²), 1 / (1/σ0² + n/σ²) )

Shrinkage
  • The posterior mean is a precision (1/variance) weighted average of the prior mean μ0 and
    the likelihood mean ȳ.
  • Posterior precision = prior precision + likelihood precision (information addition):
    1/σ0² + n/σ².
  • Upshot: (1) If the data are very informative about μ, then n/σ² is large and the posterior
    mean is close to the MLE (the nice part is that the data decide this). (2) If your prior
    is diffuse, σ0² -> infinity, then you get back the MLE solution (a sketch follows below).
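A minimal sketch of the precision-weighted update, treating σ² as known; the prior settings
μ0 = 0, σ0² = 10 and the simulated data are assumptions for illustration.

```python
# Minimal sketch of slide 11: posterior for mu with a conjugate Normal prior.
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(loc=5.0, scale=2.0, size=25)
n, sigma2 = len(y), 4.0                  # sigma^2 treated as known here
mu0, sigma0_2 = 0.0, 10.0                # prior mean and variance (assumed)

prior_prec = 1.0 / sigma0_2
lik_prec = n / sigma2                    # precision carried by ybar
post_prec = prior_prec + lik_prec        # precisions add (information addition)
post_mean = (prior_prec * mu0 + lik_prec * y.mean()) / post_prec
print(post_mean, 1.0 / post_prec)        # posterior mean shrinks ybar toward mu0
```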
12
General Inference Problem
  • In cases where the prior and likelihood are conjugate, Bayesian inference can usually be
    done in a straightforward manner.
  • Computation is straightforward.
  • However, if they are not conjugate (which is the case for most problems), this is where
    Markov chain simulation, e.g. Gibbs sampling, comes in (a minimal sketch follows below).
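A minimal sketch of Gibbs sampling for the Normal model with unknown μ and σ², alternating
draws from the two full conditionals. All prior settings and data are assumptions for
illustration; in this course the same model would be written in BUGS rather than coded by
hand.

```python
# Minimal sketch of slide 12: a Gibbs sampler for the Normal model, semi-conjugate priors.
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(loc=5.0, scale=2.0, size=50)
n, ybar = len(y), y.mean()

mu0, tau0_2 = 0.0, 100.0        # prior: mu ~ N(mu0, tau0_2)
a0, b0 = 2.0, 2.0               # prior: sigma^2 ~ Inv-Gamma(a0, b0)

mu, sigma2 = ybar, y.var()
draws = []
for t in range(5000):
    # mu | sigma^2, y : Normal with precision-weighted mean (as on slide 11)
    post_prec = 1.0 / tau0_2 + n / sigma2
    post_mean = (mu0 / tau0_2 + n * ybar / sigma2) / post_prec
    mu = rng.normal(post_mean, np.sqrt(1.0 / post_prec))
    # sigma^2 | mu, y : Inv-Gamma(a0 + n/2, b0 + 0.5 * sum((y - mu)^2))
    sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * np.sum((y - mu) ** 2)))
    draws.append((mu, sigma2))

mu_draws = np.array([d[0] for d in draws[1000:]])   # drop burn-in draws
print(mu_draws.mean(), np.quantile(mu_draws, [0.025, 0.975]))
```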

13
An example where closed-form inference does not exist that is close to home
  • Bayesian IRT model (2-PL)
    Yij = 1 w.p. pij, Yij = 0 w.p. 1 - pij, with logit(pij) = aj(θi - bj)
    θi ~ N(0, 1)
  • aj: item slope; bj: item difficulty; θi: examinee ability
  • The mean of θ is set to 0 to shift-identify the model, and the sd is set to 1 to
    scale-identify it.
  • P(Y11,…, YIJ | {pij}) = Πi Πj pij^Yij (1 - pij)^(1-Yij)
  • π(θi) = (2π)^(-0.5) exp(-0.5 θi²)
  • The product of these two functions is not a known distribution, and hence it cannot be
    maximized directly, nor can it be sampled from directly, nor are its moments known
    directly (a sampling sketch follows below).
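A minimal sketch of why this slide forces simulation: the unnormalized posterior for a single
examinee's θi (2-PL likelihood times the N(0, 1) prior) is evaluated point-wise and sampled
with a random-walk Metropolis step. The item parameters, responses, and proposal scale below
are made-up assumptions for illustration.

```python
# Minimal sketch of slide 13: Metropolis sampling of one examinee's ability theta_i.
import numpy as np

a = np.array([1.0, 1.5, 0.8, 2.0])      # item slopes a_j (assumed known here)
b = np.array([-0.5, 0.0, 0.5, 1.0])     # item difficulties b_j (assumed known)
y = np.array([1, 1, 0, 0])              # this examinee's responses Y_ij

def log_post(theta):
    """log of likelihood(theta) times the N(0,1) prior, up to a constant."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))       # logit(p_ij) = a_j (theta - b_j)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loglik - 0.5 * theta**2                   # add log N(0,1) prior

rng = np.random.default_rng(6)
theta, draws = 0.0, []
for t in range(10000):
    prop = theta + rng.normal(scale=0.8)             # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop                                 # accept the proposal
    draws.append(theta)

draws = np.array(draws[2000:])                       # drop burn-in draws
print(draws.mean(), np.quantile(draws, [0.025, 0.975]))
```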
14
Bayes, Empirical Bayes, and Bayes Empirical Bayes
  • In Bayesian inference, all parameters have distributions, and inference is made by
    integrating over distributions to obtain marginal posterior distributions.
  • In Bayes Empirical Bayes methods (which is what ALL people call Empirical Bayes methods),
    the parameters of the prior distribution are estimated and then treated as known and
    fixed.
  • In Empirical Bayes methods (see Maritz and Lwin, the seminal reference), the entire prior
    distribution is estimated: the so-called inverse-integration problem.
  • We will discuss this in detail next time.

15
Summary of Today's Lecture
  • Frequentist v. Bayesian Inference
  • Conjugate Distribution
  • Inference via simulation in non-conjugate
    situations
  • Assignment: (1) Read Chapters 1 and 2 of GCSR.
  • (2) Come in with one critical question you had about each chapter.