1
Introduction to Bayesian Methods
  • Sining Chen

2
What is probability?
  • Frequentist
  • the frequency of an event when a random experiment is repeated infinitely many times
  • Examples
  • 1. probability of heads in a coin-toss experiment: defined
  • 2. probability that it will snow on 4/20/2009: not defined
  • 3. probability of me getting cancer by age 70 if I smoke a pack of cigarettes a day: not defined
  • Bayesian
  • the quantification of uncertainty (de Finetti)
  • this uncertainty can be equated with personal belief about a quantity
  • Examples
  • 1. belief (can be updated by observation)
  • 2. belief (can be refined by modeling)
  • 3. belief (as above)

3
What do Bayesians do?
  • Bayesians study the uncertainty of a parameter θ, conditional on data X
  • Bayesians use priors (non-experimental knowledge, knowledge from other sources, the state of ignorance, etc.)
  • A music expert distinguishes Mozart from Haydn
  • A lady tastes tea: was the tea poured into the milk, or vice versa?
  • A man predicts coin tosses
  • Each of the 10 times, they got it right
  • Inference may differ due to different prior beliefs

4
Bayesian Machinery
  • data X
  • prior belief about θ ------------> posterior belief about θ
  • likelihood: L(θ) ∝ P(X|θ)
  • prior distribution p(θ) ------------> posterior distribution p(θ|X)

5
Bayes Rule
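The rule appears on this slide only as an image; in the notation used in the rest of the slides it is

  p(θ|X) = p(X|θ) p(θ) / p(X) ∝ p(X|θ) p(θ)

i.e. posterior ∝ likelihood × prior.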
6
Question: what's the proportion of A's in chromosome 2?
  • Data: pieces of sequence from shotgun sequencing
  • Parameter θ: the proportion of A's in chr 2

7
Prior
  • What is your prior belief about θ?

Three candidate priors (plotted on the slide): θ ~ U[0,1], θ ~ Beta(4,12), θ ~ Beta(20,60)
8
Beta Distribution for Prior of a Probability
  • Beta(θ | α, β)
  • mean = α/(α+β)
  • var = mean(1-mean)/(1+α+β)
  • Uniform = Beta(1,1)
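A minimal R sketch checking these formulas against simulation for the Beta(4,12) prior from the previous slide (variable names are illustrative):

a <- 4; b <- 12
prior.mean <- a/(a + b)                               # alpha/(alpha+beta) = 0.25
prior.var  <- prior.mean*(1 - prior.mean)/(1 + a + b) # formula on this slide
draws <- rbeta(1e5, a, b)                             # large sample from the prior
c(prior.mean, mean(draws))                            # should agree closely
c(prior.var,  var(draws))
# curve(dbeta(x, a, b), 0, 1)                         # plot of the prior density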

9
Likelihood
  • Data: 13 A's out of a sequence of 50
  • The model (for the data) determines the form of the likelihood. This is also a subjective choice.
  • Binomial: y ~ Bin(50, θ)

10
Posterior
prior (Beta(4,12)) × likelihood (binomial, y = 13 of n = 50) → posterior Beta(17, 49)
posterior mean = 17/66 ≈ 0.26
posterior 95% credible interval: (0.16, 0.37)
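A minimal R sketch of this computation (assuming the Beta(4,12) prior and y = 13 A's out of n = 50):

a <- 4; b <- 12; y <- 13; n <- 50
post.mean <- (a + y)/(a + b + n)           # 17/66, about 0.26
qbeta(c(0.025, 0.975), a + y, b + n - y)   # 95% credible interval, about (0.16, 0.37)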
11
Posterior is a compromise between prior and data
  • Prior mean: α/(α+β)
  • Sample mean: y/n
  • Posterior mean: (y+α)/(n+α+β)

12
Conjugate Priors (pseudo-counts)
  • Prior: θ ~ Beta(α, β)
  • Likelihood: p(y|θ) ∝ θ^y (1-θ)^(n-y)
  • Posterior: θ|y ~ Beta(α+y, β+n-y)
  • Conjugacy: the posterior distribution has the same parametric form as the prior.
  • The Beta distribution is conjugate to the binomial likelihood.

13
Sequential update with conjugate prior
  • θ ~ Beta(α, β)
  • data: y1, n1-y1
  • θ|y1 ~ Beta(α+y1, β+n1-y1)
  • data: y2, n2-y2
  • θ|y1,y2 ~ Beta(α+y1+y2, β+n1+n2-y1-y2)
  • Informative conjugate priors act like pseudo-counts

14
  • Java applet demonstrating the updating of θ with data:
  • http://www.bolderstats.com/gallery/bayes/binomialRevision.html

15
Sequential update with conjugate prior
  • θ ~ Beta(4,12)
  • data: 13, 37
  • θ|y1 ~ Beta(17, 49)
  • data: 12, 38
  • θ|y1,y2 ~ Beta(29, 87)
  • θ|entire chr2 ~ Beta(71e6, 167e6)
  • ⇒ E(θ|chr2) ≈ 0.30, sd(θ|chr2) ≈ 2.6e-5
  • A large amount of data washes out the initial prior
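The same updates in R, as a sketch (batch counts as on this slide, prior Beta(4,12)):

a <- 4; b <- 12               # prior Beta(4,12)
a <- a + 13; b <- b + 37      # after batch 1: Beta(17, 49)
a <- a + 12; b <- b + 38      # after batch 2: Beta(29, 87)
c(a, b)                       # identical to a single update with the pooled counts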

16
Ex. Multinomial Distribution
  • Example: proportions of all 4 nucleotides in human chromosome 2.
  • Prior belief about θ = (pA, pT, pG, pC)
  • with the constraint pA + pT + pG + pC = 1
  • The Dirichlet distribution is a generalization of the Beta distribution, e.g. θ ~ Dir(4,4,4,4)

17
Likelihood
  • Data: 13 A's, 24 T's, 6 G's, 7 C's
  • The model (for the data) determines the form of the likelihood. This is also a subjective choice.
  • The multinomial is the multi-class version of the binomial

18
Posterior
  • E(pA|X) = 0.26
  • E(pT|X) = 0.42
  • E(pG|X) = 0.15
  • E(pC|X) = 0.17
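These follow from the Dirichlet-multinomial update; a short R check, using the Dir(4,4,4,4) prior and the counts from the previous slide:

prior  <- c(4, 4, 4, 4)
counts <- c(A = 13, T = 24, G = 6, C = 7)
post   <- prior + counts          # posterior Dirichlet parameters
round(post/sum(post), 2)          # 0.26 0.42 0.15 0.17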

19
Multinomial with conjugate prior
  • θ ~ Dirichlet(4,4,4,4)
  • data: 13, 24, 6, 7
  • θ|y1 ~ Dirichlet(17, 28, 10, 11)
  • data: y21, y22, y23, ...
  • θ|chr2 ~ Dirichlet(71e6, 71e6, 48e6, 48e6)
  • Informative conjugate priors have a pseudo-count interpretation

20
Several parameters in one problem
  • Example 1: proportions in a multinomial model: pA, pT, pG, pC
  • Example 2: what is the proportion of CpG islands in Chr2 (of a Martian)? (λ)
  • The proportion of CpGs in non-island regions (p1)
  • The proportion of CpGs in CpG islands (p2)

21
Prior
  • λ ~ Uniform(0,1)
  • p1 and p2 are uniform on [0,1) × [0,1), with p1 < p2

22
Data
  • Again, shotgun sequence data. Instead of counting nucleotides, count the number of dinucleotide transitions in each fragment and how many of them are CG

23
Likelihood
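Written to match the logposterior function in the R code on slide 39, the likelihood contribution of fragment i (with yi CG transitions out of ni) is a two-component binomial mixture,

  p(yi | λ, p1, p2) ∝ λ p1^yi (1-p1)^(ni-yi) + (1-λ) p2^yi (1-p2)^(ni-yi)

and the full likelihood is the product over the 10 fragments.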
24
Posterior ∝ likelihood × prior
25
Joint, Conditional, Marginal Distribution
  • Two random variables: θ1, θ2
  • Joint distribution: P(θ1, θ2)
  • Conditional: P(θ1|θ2), P(θ2|θ1)
  • Marginal distributions: p(θ1), p(θ2)

26
Joint, Marginal, Conditional
[Plot: joint P(θ1,θ2), marginal P(θ1), and conditional P(θ1|θ2 = t) at a fixed value t]
27
Joint → Marginal by integration
  • P(θ1) = ∫ p(θ1,θ2) dθ2
  • P(θ2) = ∫ p(θ1,θ2) dθ1
  • Integration

http://homepage2.nifty.com/hashimoto-t/misc/margin-e.html
28
Joint → Marginal by Simulation
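A minimal R sketch of the idea (the bivariate normal joint is an illustrative assumption): draws from the joint give draws from any marginal simply by ignoring the other coordinates.

library(MASS)
joint  <- mvrnorm(10000, mu = c(0, 0), Sigma = matrix(c(1, 0.6, 0.6, 1), 2))
theta1 <- joint[, 1]          # keep theta1, drop theta2
hist(theta1, breaks = 50)     # histogram approximates the marginal p(theta1)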
29
Posterior Inference by Computation
  • MCMC
  • M-H, Gibbs

30
Goal: obtain a random sample from the posterior
  • Two main approaches
  • Independence sampling
  • draw θ(1), θ(2), ..., θ(M) ~ p(θ|y)
  • with θ(1), θ(2), ..., θ(M) mutually independent
  • Iterative sampling
  • draw θ(i+1) ~ p(θ(i+1)|θ(i))
  • such that p(θ(i)) → p(θ|y)

31
Posterior inference using Simulation
  • Posterior mean for λ:
  • E(λ|y) = ∫∫∫ λ p(λ,p1,p2|y) dλ dp1 dp2
  • Posterior variance:
  • var(λ|y) = ∫∫∫ (λ - E(λ|y))^2 p(λ,p1,p2|y) dλ dp1 dp2
  • Credible interval
  • E(log(p1/(1-p1)) - log(p2/(1-p2)))
  • Using the simulated draws (lambda.simu, p1.simu, p2.simu), these are just:
  • mean(lambda.simu)
  • var(lambda.simu)
  • quantile(lambda.simu, c(0.025, 0.975))
  • mean(log(p1.simu/(1-p1.simu)) - log(p2.simu/(1-p2.simu)))
  • No need for integration, ever!

32
Simulation vs Exact
  • Difficulties with the exact approach
  • Some posteriors have no closed form
  • Some likelihoods are too complex, high-dimensional, etc.
  • Needs strong math skills
  • Limitations of simulation
  • Simulation has stochastic error
  • Iterative algorithms may not converge
  • Sometimes too easy: beware!
  • Requires programming

33
Markov chain Monte Carlo (MCMC)
  • Markov chain
  • The (n+1)-th step only depends on the n-th step.
  • Transition probability
  • Monte Carlo = simulation

34
Transition Distribution and Stationary Distribution of a Markov chain
  • A Markov chain is constructed by specifying a transition probability
  • → the dance steps
  • A Markov chain (aperiodic, irreducible) has a stationary distribution
  • → the map of footprints
  • Idea: construct a Markov chain for θ whose stationary distribution is p(θ|y)

35
(No Transcript)
36
Metropolis Algorithm
  • Start with θ0
  • proposal step
  • acceptance/rejection step
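A minimal random-walk Metropolis sketch in R, with an illustrative one-dimensional target and step size (not the CpG problem, which follows on slide 39):

log.target <- function(theta) dnorm(theta, mean = 2, sd = 1, log = TRUE)
n.iter <- 5000
theta  <- numeric(n.iter)
theta[1] <- 0                                  # start with theta0
for (i in 2:n.iter) {
  cand <- theta[i-1] + rnorm(1, 0, 0.5)        # proposal step (symmetric jump)
  r <- log.target(cand) - log.target(theta[i-1])
  if (log(runif(1)) < r) theta[i] <- cand      # acceptance/rejection step
  else theta[i] <- theta[i-1]
}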
37
Relationship to Optimization
  • Metropolis is a stochastic version of a stepwise mode-finding algorithm: it always accepts upward steps and only sometimes accepts downward steps


38
An illustration
  • http://www.probability.ca/jeff/java/rwm.html

39
Back to the CpG island problem
# log posterior for theta = (lambda, p1, p2); y, n are the per-fragment CG counts
# and transition totals (read from the data as on slide 50)
logposterior <- function(theta, y, n)
  sum(log(theta[1]*theta[2]^y*(1-theta[2])^(n-y) +
          (1-theta[1])*theta[3]^y*(1-theta[3])^(n-y)))

# jumping distribution
proposal <- function(theta)
  theta + rnorm(3, 0, 1)*c(0.05, 0.05, 0.05)

mc <- matrix(NA, 100000, 3)
colnames(mc) <- c("lambda", "p1", "p2")

# starting values
mc[1,] <- c(0.7, 0.1, 0.2)

for(i in 2:100000) {
  # propose until the candidate is inside the parameter space (all in (0,1), p1 < p2)
  for(j in 1:100) {
    Candidate <- proposal(mc[i-1,])
    if(j == 100) { print("too many proposals outside of boundary!\n"); break }
    if(prod(Candidate < 1)*prod(Candidate > 0)*(Candidate[3] > Candidate[2]) > 0) { cat(j); break }
  }
  # acceptance/rejection step
  r <- logposterior(Candidate, y, n) - logposterior(mc[i-1,], y, n)
  if(runif(1) < exp(r)) mc[i,] <- Candidate else mc[i,] <- mc[i-1,]
}
40
Results (lambda): Metropolis has not converged in 1000 iterations
41
p1, p2: first 1000 iterations
42
Results (lambda): converged with 100,000 iterations
mean(lambda) = 0.63, sd = 0.25
43
p1, p2 for 100,000 iterations
mean(p1) = 0.10, sd = 0.024; mean(p2) = 0.24, sd = 0.16
44
Gibbs Sampler
45
Gibbs
46
The Gibbs Sampler
47
(No Transcript)
48
Back to the mixture problem
  • Augment the data with group membership indicators zi, i = 1, ..., 10

49
  • Now the full conditional distributions are all
    known distributions
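Read off from the R code on the next slide (flat priors on λ, p1, p2, with the constraint p1 < p2 imposed by re-drawing), the full conditionals are

  λ | z ~ Beta(1 + Σ zi, 1 + 10 - Σ zi)
  p1 | y, z ~ Beta(1 + Σ zi yi, 1 + Σ zi (ni - yi))
  p2 | y, z ~ Beta(1 + Σ (1-zi) yi, 1 + Σ (1-zi)(ni - yi))
  zi | λ, p1, p2, yi ~ Bernoulli( λ p1^yi (1-p1)^(ni-yi) / [λ p1^yi (1-p1)^(ni-yi) + (1-λ) p2^yi (1-p2)^(ni-yi)] )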

50
Gibbs code in R
# read the CpG data: y = CG count and n = transition count for each of the 10 fragments
dat <- read.table("D:/guestlecture/CpG.dat", header = T)
y <- dat$y; n <- dat$n

# starting values
lambda <- 0.3; p1 <- 0.1; p2 <- 0.2; z <- rep(0, 10)
mc <- matrix(NA, 10000, 13)
mc[1,] <- c(lambda, p1, p2, z)

for(i in 2:10000) {
  lambda <- rbeta(1, sum(z)+1, 10-sum(z)+1)
  # redraw p1, p2 until the constraint p1 < p2 holds
  for(j in 1:100) {
    p1 <- rbeta(1, sum(y*z)+1, sum((n-y)*z)+1)
    p2 <- rbeta(1, sum(y*(1-z))+1, sum((n-y)*(1-z))+1)
    if(p2 > p1) break
  }
  # update the group memberships z
  ff1 <- lambda*p1^y*(1-p1)^(n-y)
  ff2 <- (1-lambda)*p2^y*(1-p2)^(n-y)
  z <- rbinom(10, rep(1,10), ff1/(ff1+ff2))
  mc[i,] <- c(lambda, p1, p2, z)
}

51
lambda: Gibbs at 1000 iterations
mean(lambda) = 0.66, sd = 0.25
52
p1, p2 at 1000 iterations
mean(p1) = 0.1, sd(p1) = 0.025; mean(p2) = 0.27, sd(p2) = 0.20
53
Results (lambda) at 10k iterations
mean(lambda) = 0.64, sd = 0.25
54
p1, p2 at 10k iterations
mean(p1) = 0.1, sd(p1) = 0.026; mean(p2) = 0.26, sd(p2) = 0.19
55
Compared with Metropolis
  • Gibbs converged within the first 1000 iterations, comparable to Metropolis at around 20,000.
  • The convergence of Gibbs at 10k looks better than that of Metropolis at 100k.

56
Explaining the Gibbs Sampler
  • http://www.jstor.org/pss/2685208
  • Explaining the Gibbs Sampler, Casella and George, The American Statistician 46(3)

57
Difficulties with Iterative Sampling
  • Burn-in: earlier iterations are not representative of the target distribution; they are influenced by the starting values.
  • Dependence: the effective number of independent draws is less than N. → Thinning (see the sketch below)
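A minimal R sketch of both remedies, applied to the mc matrix of draws from the earlier code (burn-in length and thinning interval are illustrative choices):

burn.in <- 1000
keep    <- mc[-(1:burn.in), ]                   # drop the non-representative early draws
thinned <- keep[seq(1, nrow(keep), by = 10), ]  # keep every 10th draw to reduce dependence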

58
Methods to assess convergence
  • CODA
  • Eye-balling
  • Galin Jones' work, Department of Statistics, University of Minnesota

59
Summary
  • The Bayesian approach
  • Prior × likelihood → posterior
  • Beta × binomial → Beta
  • Dirichlet × multinomial → Dirichlet
  • Multi-parameter problem
  • Simulation from posterior
  • MCMC
  • Gibbs

60
Additional material
61
Why does Metropolis work?
  • Outline of proof
  • show that the simulated sequence θ1, θ2, ... is a Markov chain
  • show that the Markov chain has a unique stationary distribution
  • show that this distribution is the target distribution p(θ|y)
  • Understanding the Metropolis-Hastings Algorithm, Chib and Greenberg, The American Statistician 49(4)

62
A generalization: Metropolis-Hastings
  • In Metropolis, the jumping distribution has to be
    symmetric
  • Hastings got rid of that constraint
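With a jumping (proposal) distribution J, the Metropolis-Hastings acceptance ratio is

  r = [ p(θ*|y) J(θt|θ*) ] / [ p(θt|y) J(θ*|θt) ]

and when J is symmetric the J terms cancel, recovering the Metropolis rule.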

63
Properties of a good jumping rule
  • Theoretically you can use any jumping distribution, as long as it can get you to all areas of the parameter space (irreducibility)

64
Gibbs as a special case of M-H
  • The acceptance ratio is always r = 1