Title: Large Sample Bayesian Inference
1. Large Sample Bayesian Inference
2. Similarity between Bayesian and Frequentist Results
- When using a non-informative prior:
- Posterior Normal distribution for a Normal mean with known variance
- Posterior t distribution for a Normal mean with unknown variance
3. Bayesian Asymptotics
- The properties of the posterior distribution as the sample size n → ∞
- Not necessary for Bayesian inference, but useful in approximations.
4. Normal Approximation to the Posterior
- If the posterior is unimodal and roughly symmetric, it is often convenient to approximate it with a Normal distribution.
5. Rationale: Taylor Expansion at the Posterior Mode
log p(θ|y) ≈ log p(θ̂|y) − (1/2)(θ − θ̂)ᵀ I(θ̂)(θ − θ̂), where I(θ) = −(d²/dθ²) log p(θ|y).
I(θ̂) represents the curvature of the posterior at its mode.
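As a small sketch of this Taylor-expansion idea, the code below uses the Binomial data that appears later in these slides (y = 8 cases among n = 145) with a uniform prior, which is an assumption made here purely for illustration; the curvature I(θ̂) is computed by a numerical second difference.

```python
import math

# Assumed setting for illustration: y = 8 successes in n = 145 trials,
# uniform prior, so log p(theta|y) = const + y*log(theta) + (n-y)*log(1-theta)
y, n = 8, 145

def log_post(t):
    return y * math.log(t) + (n - y) * math.log(1 - t)

theta_hat = y / n  # posterior mode (maximizes log_post)

# Curvature I(theta_hat) = -d^2/dtheta^2 log p(theta|y), central differences
h = 1e-5
curvature = -(log_post(theta_hat + h) - 2 * log_post(theta_hat)
              + log_post(theta_hat - h)) / h**2

# Normal approximation: p(theta|y) is approximately N(theta_hat, 1/I(theta_hat))
approx_sd = curvature ** -0.5
```

The approximate standard deviation comes directly from the curvature at the mode: a sharper peak (larger I(θ̂)) gives a tighter Normal approximation.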
6. Example: approximate p(µ, log σ | y) in the Normal model
7. Example: approximate p(µ, σ² | y) in the Normal model
8. When n → ∞: Bayesian Central Limit Theorem
- Under regularity conditions:
- the likelihood function is continuous
- the true parameter value θ0 is not on the boundary
- As n → ∞:
- p(θ|y) converges to N(θ0, [nJ(θ0)]⁻¹)
- where J(·) is the Fisher information
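To see the theorem numerically, the sketch below uses a Bernoulli model (an assumption, not from the slides), for which the Fisher information is J(θ) = 1/(θ(1−θ)); the exact posterior standard deviation under a uniform prior is compared with the CLT value √([nJ(θ0)]⁻¹).

```python
import math

# Assumed Bernoulli illustration: J(theta) = 1/(theta*(1-theta))
theta0, n = 0.3, 10_000
y = int(theta0 * n)  # data generated exactly at the "true" rate

# Exact posterior under a uniform prior is Beta(y + 1, n - y + 1)
a, b = y + 1, n - y + 1
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Bayesian CLT approximation: sd of N(theta0, [n J(theta0)]^{-1})
clt_sd = math.sqrt(theta0 * (1 - theta0) / n)
```

For this n the two standard deviations agree to several decimal places, illustrating the convergence the theorem asserts.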
9. Outline of proof
- Taylor-expand the log posterior around the mode; the curvature term is I(θ0).
10. Summary: Bayesian CLT
11. Likelihood-based methods
- The MLE is the posterior mode under a uniform prior.
12. Counter-examples
13. Bayesian Hypothesis Testing
14. Example: Cancers at Slater School
- Teachers and staff were concerned about the two high-voltage transmission lines that ran past the school.
- Observed: 8 cases of cancer
- Expected: 4.2 cases (based on the employment records of the 145 teachers and staff, and National Cancer Institute statistics)
- -- The New Yorker, Dec 7, 1992
15. Classical Hyp. Testing
- H0: cancer rate θ = 4.2/145 ≈ 0.03
- Y ~ Binomial(145, θ)
- y = 8
- p-value = Pr(Y ≥ y | θ = 0.03) = Pr(Y = 8 | θ = 0.03) + Pr(Y = 9 | θ = 0.03) + Pr(Y = 10 | θ = 0.03) + ... ≈ 0.07
- At the α = 0.05 level, we cannot reject the null. But we cannot accept the null either!
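The tail sum above is easy to reproduce; this sketch computes the one-sided binomial p-value directly with the standard library.

```python
from math import comb

# Slide 15's classical test: Y ~ Binomial(145, 0.03), observed y = 8
n, theta0, y_obs = 145, 0.03, 8

# One-sided p-value: Pr(Y >= 8 | theta = 0.03)
p_value = sum(comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
              for k in range(y_obs, n + 1))
# p_value comes out near 0.07, matching the slide
```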
16. Consider 4 competing theories
- HA: θ = 0.03
- HB: θ = 0.04
- HC: θ = 0.05
- HD: θ = 0.06
17. Likelihood
- Pr(Y = y | HA) = Pr(Y = 8 | θ = 0.03) = 0.036
- Pr(Y = y | HB) = Pr(Y = 8 | θ = 0.04) = 0.096
- Pr(Y = y | HC) = Pr(Y = 8 | θ = 0.05) = 0.134
- Pr(Y = y | HD) = Pr(Y = 8 | θ = 0.06) = 0.136
- "Theory B explains the data about 3 times as well as Theory A."
- Theories C and D explain the data about equally well.
- -- Bayes factor = Pr(y|Hi)/Pr(y|Hj) measures evidence.
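These likelihoods can be recomputed from the binomial pmf; the exact values come out close to, though not identical with, the rounded figures printed on the slide.

```python
from math import comb

# Likelihood of y = 8 cases in n = 145 under each theory from slide 16
n, y = 145, 8
theta = {"HA": 0.03, "HB": 0.04, "HC": 0.05, "HD": 0.06}
lik = {h: comb(n, y) * t**y * (1 - t) ** (n - y) for h, t in theta.items()}

# Bayes factor of HB versus HA: how much better B explains the data than A
bf_BA = lik["HB"] / lik["HA"]
```

As on the slide, HC and HD explain the data about equally well, and HB explains it roughly three times as well as HA.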
18. Jeffreys' scale of evidence for BF
- Bayes factor(i, j) = Pr(y|Hi) / Pr(y|Hj)
- B(i,j) < 1/10: strong evidence against Hi
- 1/10 < B(i,j) < 1/3: moderate evidence against Hi
- 1/3 < B(i,j) < 1: weak evidence against Hi
- Similar to a likelihood ratio test
19. Prior Belief
- Assume HA is equally likely to be true or false, and HB, HC, HD are equally likely to be true.
- Pr(HA) = 1/2, Pr(HB) = Pr(HC) = Pr(HD) = 1/6
20. Posterior belief about the theories
- Pr(HA|y) = P(y|HA)P(HA)/P(y) = 0.23
- Pr(HB|y) = 0.21
- Pr(HC|y) = 0.28
- Pr(HD|y) = 0.28
- "The four theories seem about equally likely."
- "The odds are about 3 to 1 that the underlying cancer rate at Slater is higher than 0.03."
21. Posterior odds = Prior odds × Bayes factor
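The posterior probabilities on slide 20 follow from Bayes' rule applied to the priors of slide 19 and the likelihood values of slide 17, as this short sketch shows.

```python
# Priors from slide 19 and likelihood values from slide 17
prior = {"HA": 1/2, "HB": 1/6, "HC": 1/6, "HD": 1/6}
lik = {"HA": 0.036, "HB": 0.096, "HC": 0.134, "HD": 0.136}

unnorm = {h: prior[h] * lik[h] for h in prior}      # P(y|H) P(H)
total = sum(unnorm.values())                        # P(y)
post = {h: u / total for h, u in unnorm.items()}    # Bayes' rule

# Posterior odds that the cancer rate exceeds 0.03 (i.e., against HA)
odds = (1 - post["HA"]) / post["HA"]   # about 3 to 1, as the slide says
```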
22. Nested Hypotheses
- H0: θ = 0.03
- H1: θ ≠ 0.03
- Prior belief:
- Pr(H0) = 1/2, Pr(H1) = 1/2
23. Calculation of the Bayes factor requires integration
- P(y|H0) = Bin(y | n = 145, θ = 0.03)
- P(y|H1) = ∫ P(y|θ) p(θ|H1) dθ
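The slide leaves p(θ|H1) unspecified; the sketch below assumes a Uniform(0,1) prior purely for illustration and approximates the integral with a midpoint rule.

```python
from math import comb

# Assumption (not on the slide): p(theta|H1) = Uniform(0, 1)
n, y, theta0 = 145, 8, 0.03

def binom_pmf(k, n, t):
    return comb(n, k) * t**k * (1 - t) ** (n - k)

p_y_H0 = binom_pmf(y, n, theta0)

# Midpoint-rule approximation of P(y|H1) = integral of P(y|theta) dtheta
m = 20_000
p_y_H1 = sum(binom_pmf(y, n, (i + 0.5) / m) for i in range(m)) / m

bf_01 = p_y_H0 / p_y_H1   # Bayes factor of H0 versus H1
```

Under this uniform prior the integral has the closed form 1/(n+1) = 1/146, which the numerical result matches; with a different prior on θ the Bayes factor would change.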
24. False Discovery Rate
25. Example
Journal of the National Cancer Institute (6 Dec 1995):
"Of 46 vegetables and fruits or related products, four were significantly associated with lower prostate cancer risk; of the four, tomato sauce (P for trend = 0.001), tomatoes (P for trend = 0.03), and pizza (P for trend = 0.05), but not strawberries, were primary sources of lycopene."
p-values:
- Tomato sauce: 0.001
- Tomatoes: 0.03
- Tomato juice: 0.67
- Pizza: 0.05
26. In a single-test setting: Type I error
- A Type I error is making a discovery when there is none.
27. When testing m hypotheses
- One may wish to control for other errors.
28. False Discovery Rate = V/R (V = number of false discoveries, R = number of discoveries)
- Current science is discovery-driven, not substance-driven. In other words, tomatoes didn't lose anything; but the epidemiologist who spent 12 years on tomatoes lost the 12 years he could have spent on strawberries.
29. False discovery rate: Bayesian interpretation
FDR = Pr(H0 | T(y) ∈ rejection region)
30. p-value and FDR generally have a monotonic relationship
- The more discoveries you make, the more likely they are to be false.
31. Procedures to control FDR at q
Find a p-value cutoff u such that rejecting all p < u ensures FDR < q.
32. Benjamini-Hochberg procedure
Order the p-values so that p(1) ≤ p(2) ≤ ... ≤ p(m). Let r be the largest i such that p(i) ≤ q · i/m.
Reject all p ≤ p(r).
[Figure: ordered p-values plotted against the i/m quantile]
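Applied to the four p-values of the JNCI example on slide 25, the procedure can be sketched as follows.

```python
# Benjamini-Hochberg on the slide-25 p-values:
# tomato sauce, tomatoes, tomato juice, pizza
def benjamini_hochberg(pvals, q):
    """Return the rejection cutoff p_(r), or None if nothing is rejected."""
    m = len(pvals)
    cutoff = None
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= q * i / m:   # r = largest i with p_(i) <= q * i / m
            cutoff = p
    return cutoff

pvals = [0.001, 0.03, 0.67, 0.05]
cut = benjamini_hochberg(pvals, q=0.05)
rejected = [p for p in pvals if cut is not None and p <= cut]
# At q = 0.05 only p = 0.001 (tomato sauce) is rejected; at q = 0.10 the
# cutoff rises to 0.05, so tomatoes and pizza are rejected as well.
```

Note that the threshold depends on the whole set of p-values, not on each one in isolation, which is the point of the final slide.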
33. Thresholding is driven by the full distribution of the test statistic.