Title: Introduction to Bayesian Methods
1. Introduction to Bayesian Methods
2. What is probability?
- Frequentist
  - the long-run frequency of an event when a random experiment is repeated infinitely many times
  - Examples
    1. the probability of heads in a coin-toss experiment: defined
    2. the probability that it will snow on 4/20/2009: not defined
    3. the probability of me getting cancer by 70 if smoking a pack of cigarettes a day: not defined
- Bayesian
  - the quantification of uncertainty (de Finetti)
  - this uncertainty can be equated with personal belief about a quantity
  - Examples
    1. belief (can be updated by observation)
    2. belief (can be refined by modeling)
    3. belief (as above)
3. What do Bayesians do?
- Bayesians study the uncertainty of a parameter θ, conditional on data X
- Bayesians use priors (non-experimental knowledge, knowledge from other sources, the state of ignorance, etc.)
  - A music expert distinguishes Mozart from Haydn.
  - A lady tastes tea: was the tea poured into the milk, or vice versa?
  - A man predicts coin tosses.
  - Each of them got it right all 10 times.
  - Their inferences may differ due to different prior beliefs.
4. Bayesian Machinery
- data X
- prior belief about θ  →  posterior belief about θ
- likelihood: L(θ) = P(X|θ)
- prior distribution p(θ)  →  posterior distribution p(θ|X)
5. Bayes Rule
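The body of this slide was not transcribed; in the notation of the surrounding slides, the rule it shows is:

  p(θ|X) = P(X|θ) p(θ) / P(X),   where P(X) = ∫ P(X|θ) p(θ) dθ

i.e. posterior ∝ likelihood × prior, with P(X) as the normalizing constant.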
6. Question: what is the proportion of A's in chromosome 2?
- Data: pieces of sequence from shotgun sequencing
- Parameter θ: proportion of A's in chr 2
7. Prior
- What is your prior belief about θ?
  - θ ~ U(0,1)
  - θ ~ Beta(4,12)
  - θ ~ Beta(20,60)
8. Beta Distribution for Prior of a Probability
- Beta(θ | α, β)
- mean = α/(α+β)
- var = mean(1−mean)/(1+α+β)
- Uniform = Beta(1,1)
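The moment formulas above can be checked in a few lines; the helper names here are just for illustration:

```python
# Mean and variance of a Beta(a, b) distribution, using the slide's formulas.
def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    m = beta_mean(a, b)
    return m * (1 - m) / (1 + a + b)

# The Beta(4, 12) prior suggested on the previous slide:
print(beta_mean(4, 12))           # 0.25
print(round(beta_var(4, 12), 4))  # 0.011
```

Note that Beta(1,1) recovers the uniform prior, with mean 0.5.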
9. Likelihood
- Data: 13 A's out of a sequence of 50
- The model (for the data) determines the form of the likelihood. It is also a subjective choice.
- Binomial
10. Posterior
- prior (beta)
- likelihood (binomial)
- posterior (beta)
- posterior mean
- posterior 95% credible interval: (0.16, 0.37)
11. Posterior is a compromise between prior and data
- Prior mean: α/(α+β)
- Sample mean: y/n
- Posterior mean: (y+α)/(n+α+β)
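With the running example's numbers (Beta(4,12) prior, 13 A's in 50 bases), a short sketch shows the posterior mean sitting between the prior mean and the sample mean:

```python
# Posterior mean of the Beta-Binomial model: (y + a) / (n + a + b).
a, b = 4, 12    # Beta(4, 12) prior
y, n = 13, 50   # 13 A's out of 50 bases

prior_mean = a / (a + b)                # 0.25
sample_mean = y / n                     # 0.26
posterior_mean = (y + a) / (n + a + b)  # 17/66

print(round(posterior_mean, 3))         # 0.258, between prior and sample mean
```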
12. Conjugate Priors (pseudo-counts)
- Prior: θ ~ Beta(α, β)
- Likelihood: y|θ ∝ θ^y (1−θ)^(n−y)
- Posterior: θ|y ~ Beta(α+y, β+n−y)
- Conjugacy: the posterior distribution has the same parametric form as the prior.
- The Beta distribution is conjugate to the binomial likelihood.
13. Sequential update with conjugate prior
- θ ~ Beta(α, β)
- data: y1, n1−y1
- θ|y1 ~ Beta(α+y1, β+n1−y1)
- data: y2, n2−y2
- θ|y1,y2 ~ Beta(α+y1+y2, β+n1+n2−y1−y2)
- Informative conjugate priors act like pseudo-counts
14. Java applet demonstrating the updating of θ with data
- http://www.bolderstats.com/gallery/bayes/binomialRevision.html
15. Sequential update with conjugate prior
- θ ~ Beta(4,12)
- data: 13, 37
- θ|y1 ~ Beta(17, 49)
- data: 12, 38
- θ|y1,y2 ~ Beta(29, 87)
- θ|entire chr2 ~ Beta(71e6, 167e6)
- E(θ|chr2) = 0.30, sd(θ|chr2) ≈ 2.6e-5
- A large amount of data washes out the initial prior
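Sequential updating can be sketched directly from the update rule on slide 13: feeding the data in batches gives the same Beta posterior as pooling all the data at once.

```python
# Sequential vs. batch conjugate updating for the Beta-Binomial model.
def update(a, b, successes, failures):
    """One conjugate Beta update: add the counts to the pseudo-counts."""
    return a + successes, b + failures

# Beta(4, 12) prior, then the two batches of data above.
seq = update(*update(4, 12, 13, 37), 12, 38)
pooled = update(4, 12, 13 + 12, 37 + 38)

print(seq)            # (29, 87)
print(seq == pooled)  # True
```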
16. Ex. Multinomial Distribution
- Example: proportions of all 4 nucleotides in human chromosome 2.
- Prior belief about θ = (pA, pT, pG, pC)
- with the constraint pA + pT + pG + pC = 1
- The Dirichlet distribution is a generalization of the beta distribution, e.g. θ ~ Dir(4,4,4,4)
17. Likelihood
- Data: 13 A's, 24 T's, 6 G's, 7 C's
- The model (for the data) determines the form of the likelihood. It is also a subjective choice.
- The multinomial is a multi-class version of the binomial.
18. Posterior
- E(pA|X) = 0.26
- E(pT|X) = 0.42
- E(pG|X) = 0.15
- E(pC|X) = 0.17
19. Multinomial with conjugate prior
- θ ~ Dirichlet(4,4,4,4)
- data: 13, 24, 6, 7
- θ|y1 ~ Dirichlet(17, 28, 10, 11)
- data: y21, y22, y23,
- θ|chr2 ~ Dirichlet(71e6, 71e6, 48e6, 48e6)
- Informative conjugate priors have a pseudo-count interpretation
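The Dirichlet-multinomial update mirrors the Beta-binomial one: add the counts to the prior pseudo-counts, and each posterior mean is the updated pseudo-count over the total. A few lines reproduce the posterior means from slide 18:

```python
# Dirichlet(4,4,4,4) prior + counts (13, 24, 6, 7)
# -> Dirichlet(17, 28, 10, 11) posterior; means are alpha_k / sum(alpha).
prior = [4, 4, 4, 4]
counts = [13, 24, 6, 7]        # A, T, G, C

posterior = [a + c for a, c in zip(prior, counts)]
total = sum(posterior)
means = [a / total for a in posterior]

print(posterior)                     # [17, 28, 10, 11]
print([round(m, 2) for m in means])  # [0.26, 0.42, 0.15, 0.17]
```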
20. Several parameters in one problem
- Example 1: proportions in a multinomial model
  - pA, pT, pG, pC
- Example 2: what is the proportion of CpG islands in chr2 (of a Martian)? (λ)
  - The amount of CpG in the non-island area (p1)
  - The amount of CpG in CpG islands (p2)
21. Prior
- λ ~ Uniform(0,1)
- p1 and p2 are uniform in [0,1) × [0,1), with p1 < p2
22. Data
- Again shotgun sequence data. Instead of counting nucleotides, count the number of transitions and how many of them are CG.
23. Likelihood
24. Posterior ∝ likelihood × prior
25. Joint, Conditional, Marginal Distribution
- Two random variables: θ1, θ2
- Joint distribution: P(θ1, θ2)
- Conditional: P(θ1|θ2), P(θ2|θ1)
- Marginal distribution: p(θ1), p(θ2)
26. Joint, Marginal, Conditional
[Figure: the joint P(θ1, θ2), the marginal P(θ1), and the conditional P(θ1|θ2 = t) at a fixed value t]
27. Joint → Marginal by integration
- P(θ1) = ∫ p(θ1, θ2) dθ2
- P(θ2) = ∫ p(θ1, θ2) dθ1
- Integration: http://homepage2.nifty.com/hashimoto-t/misc/margin-e.html
28. Joint → Marginal by Simulation
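Simulation replaces the integral: draw (θ1, θ2) from the joint and simply keep the θ1 coordinate. A toy joint (an assumption for illustration only): θ2 ~ Uniform(0,1) and θ1 | θ2 ~ Uniform(0, θ2), so the marginal mean of θ1 is E[θ2]/2 = 0.25.

```python
# Marginalizing by simulation: sample from the joint, keep one coordinate.
import random

random.seed(0)
theta1_draws = []
for _ in range(100_000):
    theta2 = random.random()              # theta2 ~ Uniform(0, 1)
    theta1 = random.uniform(0, theta2)    # theta1 | theta2 ~ Uniform(0, theta2)
    theta1_draws.append(theta1)           # keep theta1: a marginal draw

marginal_mean = sum(theta1_draws) / len(theta1_draws)
print(marginal_mean)                      # close to 0.25
```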
29. Posterior Inference by Computation
30. Goal: obtain a random sample from the posterior
- Two main approaches
- Independence sampling
  - draw θ(1), θ(2), …, θ(M) ~ p(θ|y)
  - θ(1) ⊥ θ(2) ⊥ … ⊥ θ(M)
- Iterative sampling
  - draw θ(i+1) ~ p(θ(i+1) | θ(i))
  - such that p(θ(i)) → p(θ|y)
31. Posterior inference using Simulation
- Posterior mean for λ
  - E(λ) = ∫∫∫ λ p(λ, p1, p2) dλ dp1 dp2
- Posterior variance
  - var(λ) = ∫∫∫ (λ − E(λ))² p(λ, p1, p2) dλ dp1 dp2
- Credible interval
- E(log(p1/(1−p1)) − log(p2/(1−p2)))
- From simulated draws, in R:
  - mean(lambda.simu)
  - var(lambda.simu)
  - quantile(lambda.simu, c(0.025, 0.975))
  - mean(log(p1.simu/(1-p1.simu)) - log(p2.simu/(1-p2.simu)))
- No need for integration, ever!
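The point of the slide is that each integral above collapses to a sample statistic once posterior draws are available. A self-contained sketch (the Beta(2,3) draws here are a stand-in for posterior draws of λ, not the lecture's actual posterior):

```python
# Posterior summaries as sample statistics over simulated draws.
import random

random.seed(1)
draws = sorted(random.betavariate(2, 3) for _ in range(50_000))

post_mean = sum(draws) / len(draws)
post_var = sum((d - post_mean) ** 2 for d in draws) / (len(draws) - 1)
# Approximate 95% credible interval from the empirical 2.5% / 97.5% quantiles:
ci = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])

print(post_mean)   # close to the exact mean 2/5 = 0.4
print(ci)          # an approximate 95% credible interval
```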
32. Simulation vs Exact
- Difficulties with exact inference
  - Some posteriors have no closed form
  - Some likelihoods are too complex, high-dimensional, etc.
  - It needs strong math skills
- Limitations of simulation
  - Simulation has stochastic error
  - Iterative algorithms may not converge
  - Sometimes it is too easy: beware!
  - It requires programming
33. Markov chain Monte Carlo (MCMC)
- Markov chain
  - The (n+1)-th step only depends on the n-th step.
  - Transition probability
- Monte Carlo = simulation
34. Transition Distribution & Stationary Distribution of a Markov chain
- A Markov chain is constructed by specifying a transition probability
  - → the dance steps
- A Markov chain (aperiodic, irreducible) has a stationary distribution
  - → the map of footprints
- Idea: construct a Markov chain for θ whose stationary distribution is p(θ|y)
35. (No Transcript)
36. Metropolis Algorithm
- proposal step
- acceptance/rejection step
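The two steps above can be sketched with a minimal random-walk Metropolis sampler (an illustration in Python, not the lecture's R code). The target is the Beta(17, 49) posterior from the A-proportion example, whose log density is 16·log(θ) + 48·log(1−θ) up to a constant.

```python
# Random-walk Metropolis: propose a move, accept with prob min(1, ratio).
import math
import random

def log_post(t):
    # Log density of Beta(17, 49), up to an additive constant.
    return 16 * math.log(t) + 48 * math.log(1 - t)

random.seed(2)
theta, draws = 0.5, []
for _ in range(20_000):
    cand = theta + random.gauss(0, 0.05)             # proposal step
    if 0 < cand < 1:                                 # stay inside the support
        log_r = log_post(cand) - log_post(theta)
        if random.random() < math.exp(min(0.0, log_r)):  # accept/reject step
            theta = cand
    draws.append(theta)

kept = draws[2_000:]             # discard burn-in
print(sum(kept) / len(kept))     # close to the exact mean 17/66 ~ 0.258
```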
37. Relationship to Optimization
- Metropolis is a stochastic version of a stepwise mode-finding algorithm: it always accepts upward steps, and only sometimes accepts downward steps.
38. An illustration
- http://www.probability.ca/jeff/java/rwm.html
39. Back to the CpG island problem

    # y, n: data vectors (loaded elsewhere); theta = c(lambda, p1, p2)
    logposterior <- function(theta, y, n)
      sum(log(theta[1] * theta[2]^y * (1 - theta[2])^(n - y) +
              (1 - theta[1]) * theta[3]^y * (1 - theta[3])^(n - y)))

    # jumping distribution
    proposal <- function(theta)
      theta + rnorm(3, 0, 1) * c(0.05, 0.05, 0.05)

    mc <- matrix(NA, 100000, 3)
    colnames(mc) <- c("lambda", "p1", "p2")
    # starting values
    mc[1, ] <- c(0.7, 0.1, 0.2)
    for (i in 2:100000) {
      for (j in 1:100) {
        Candidate <- proposal(mc[i - 1, ])
        if (j == 100) { print("too many proposals outside of boundary!\n"); break }
        if (prod(Candidate < 1) * prod(Candidate > 0) * (Candidate[3] > Candidate[2]) > 0) { cat(j); break }
      }
      r <- logposterior(Candidate, y, n) - logposterior(mc[i - 1, ], y, n)
      if (runif(1) < exp(r)) mc[i, ] <- Candidate else mc[i, ] <- mc[i - 1, ]
    }
40. Results (lambda): Metropolis has not converged in 1000 iterations
41. p1, p2: first 1000 iterations
42. Results (lambda): converged with 100,000 iterations
- mean(lambda) = 0.63, sd = 0.25
43. p1, p2 for 100,000 iterations
- mean(p1) = 0.10, sd = 0.024; mean(p2) = 0.24, sd = 0.16
44. Gibbs Sampler
45. Gibbs
46. The Gibbs Sampler
47. (No Transcript)
48. Back to the mixture problem
- Augment the data with group memberships zi, i = 1, …, 10
49. Now the full conditional distributions are all known distributions
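Written out explicitly (as implied by the R code on slide 50, with flat Beta(1,1)/Uniform priors, 10 fragments, and zi = 1 marking membership in the p1 group), the full conditionals are:

  λ | z ~ Beta(Σi zi + 1, 10 − Σi zi + 1)
  p1 | y, z ~ Beta(Σi yi zi + 1, Σi (ni − yi) zi + 1)
  p2 | y, z ~ Beta(Σi yi (1 − zi) + 1, Σi (ni − yi)(1 − zi) + 1)
  zi | λ, p1, p2, y ~ Bernoulli(f1i / (f1i + f2i)),
    where f1i = λ p1^yi (1 − p1)^(ni − yi) and f2i = (1 − λ) p2^yi (1 − p2)^(ni − yi)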
50. Gibbs code in R

    dat <- read.table("D:/guestlecture/CpG.dat", header = T)
    y <- dat$y; n <- dat$n
    lambda <- 0.3; p1 <- 0.1; p2 <- 0.2; z <- rep(0, 10)
    mc <- matrix(NA, 10000, 13)
    mc[1, ] <- c(lambda, p1, p2, z)
    for (i in 2:10000) {
      lambda <- rbeta(1, sum(z) + 1, 10 - sum(z) + 1)
      for (j in 1:100) {
        p1 <- rbeta(1, sum(y * z) + 1, sum((n - y) * z) + 1)
        p2 <- rbeta(1, sum(y * (1 - z)) + 1, sum((n - y) * (1 - z)) + 1)
        if (p2 > p1) break
      }
      ff1 <- lambda * p1^y * (1 - p1)^(n - y)
      ff2 <- (1 - lambda) * p2^y * (1 - p2)^(n - y)
      z <- rbinom(10, rep(1, 10), ff1 / (ff1 + ff2))
      mc[i, ] <- c(lambda, p1, p2, z)
    }
51. lambda: Gibbs at 1000 iterations
- mean(lambda) = 0.66, sd = 0.25
52. p1, p2 at 1000 iterations
- mean(p1) = 0.1, sd(p1) = 0.025; mean(p2) = 0.27, sd(p2) = 0.20
53. Result (lambda) at 10k iterations
- mean(lambda) = 0.64, sd = 0.25
54. p1, p2 at 10k iterations
- mean(p1) = 0.1, sd(p1) = 0.026; mean(p2) = 0.26, sd(p2) = 0.19
55. Compared with Metropolis
- Gibbs converged within the first 1000 iterations, comparable to Metropolis at around 20,000.
- The convergence of Gibbs at 10k looks better than that of Metropolis at 100k.
56. Explaining the Gibbs Sampler
- http://www.jstor.org/pss/2685208
- "Explaining the Gibbs Sampler," Casella & George, The American Statistician 46(3)
57. Difficulties with Iterative Sampling
- Burn-in: the earlier iterations are not representative of the target distribution; they are influenced by the starting values.
- Dependence: the effective number of independent draws is < N. → Thinning
58. Methods to assess convergence
- CODA
- Eye-balling
- Galin Jones's work, Department of Statistics, University of Minnesota
59. Summary
- The Bayesian approach
- Prior + likelihood → posterior
- Beta + binomial → Beta
- Dirichlet + multinomial → Dirichlet
- Multi-parameter problems
- Simulation from the posterior
- MCMC
  - Gibbs
60. Additional material
61. Why does Metropolis work?
- Outline of proof
  - show that the simulated sequence θ1, θ2, … is a Markov chain
  - show that the Markov chain has a unique stationary distribution
  - show that this distribution is the target distribution p(θ|y)
- "Understanding the Metropolis-Hastings Algorithm," Chib and Greenberg, The American Statistician, 49(4)
62. A generalization: Metropolis-Hastings
- In Metropolis, the jumping distribution has to be symmetric
- Hastings got rid of that constraint
63. Properties of a good jumping rule
- Theoretically you can use any jumping distribution, as long as it can reach all areas of the parameter space (irreducibility)
64. Gibbs as a special case of M-H
- The acceptance ratio is always r = 1