Title: BSTA 670 – Statistical Computing
1 BSTA 670 Statistical Computing
Lecture 12, Part 2: Numerical Integration and Monte Carlo Methods
Partially adapted from http://www.busim.ee.boun.edu.tr/busim/bayes/5MCMCv2.ppt
2 Numerical Integration
- Newton-Cotes quadrature (replaces the integrand with a polynomial approximation on each sub-interval, where each sub-interval has equal length):
- Riemann rule
- Trapezoidal rule (linear approximation on each sub-interval)
- Simpson's rule (quadratic approximation on each sub-interval)
- Midpoint rule (uses the function value at the midpoint of each sub-interval)
- Gaussian quadrature (unequal sub-interval lengths, with emphasis on regions where the integrand is largest)
- Monte Carlo methods
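As a minimal sketch (in Python, not part of the original slides; the integrand sin(x) and the function names are illustrative), the trapezoidal and Simpson's rules on equal-length sub-intervals look like this:

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoidal rule: linear approximation on each of n equal sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def simpson(f, a, b, n):
    """Simpson's rule: quadratic approximation; n must be even."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Example: the integral of sin(x) over [0, pi] is exactly 2.
print(trapezoid(math.sin, 0.0, math.pi, 100))  # close to 2
print(simpson(math.sin, 0.0, math.pi, 100))    # even closer to 2
```

For the same number of sub-intervals, Simpson's rule is far more accurate on smooth integrands, reflecting its higher-order polynomial approximation.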
3
- In many cases it is difficult to evaluate integrals analytically.
- Examples: computing expectations, Bayesian analysis
4 Marginal Distribution
- Marginalisation integral
- Suppose θ = (θ₁, ..., θ_k) (multidimensional parameter)
- We have calculated the joint posterior p(θ | y) and we want to calculate the posterior for one parameter only, where
  p(θ₁ | y) = ∫ p(θ | y) dθ₂ ... dθ_k
- This is a (k-1)-dimensional integration
5
- Expectation integral
- Evaluate a point estimate, e.g. the posterior mean E[θ | y] = ∫ θ p(θ | y) dθ
- Often it is not possible to evaluate the integral analytically
- This is a common problem with the Bayesian approach
6 Example: shape parameter of the Gamma distribution
- The other parameter a is known and we observe data x₁, ..., x_n
- Take the prior on the shape parameter to be a uniform distribution
- The posterior is proportional to the likelihood times the (constant) prior
- Very difficult to obtain closed-form integrals over this complicated integrand
7 One possible solution is numerical integration
- Suppose we wish to integrate the (unnormalized) posterior over its infinite support
- Any of the methods we covered will work, with some modification for the infinite limits.
8 Approximation of the Integral
- 1. Find values a and b beyond which the integrand f is negligible
- 2. Split [a, b] into N equal intervals. Then
  ∫ f(θ) dθ ≈ Σ_{i=1}^{N} f(θ_i) Δθ, with Δθ = (b - a)/N and θ_i the grid points
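These two steps can be sketched as follows (a Python illustration, not from the slides; the unnormalized N(0,1) kernel stands in for a posterior, and ±8 is an assumed cutoff beyond which it is negligible):

```python
import math

def grid_integral(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] on n equal sub-intervals."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Stand-in for an unnormalized posterior: the N(0,1) kernel exp(-t^2/2).
kernel = lambda t: math.exp(-0.5 * t * t)

# Step 1: beyond +/-8 the kernel is negligible (below 1e-14).
# Step 2: split [-8, 8] into N equal intervals and sum.
z = grid_integral(kernel, -8.0, 8.0, 1000)
print(z)                       # close to the true normalizing constant
print(math.sqrt(2 * math.pi)) # which is sqrt(2*pi)
```

With one parameter this grid needs only N evaluations; with k parameters the same resolution needs N^k, which is the inefficiency the next slide describes.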
9
- With multiple parameters this is numerically inefficient, since we need a very large N for a good approximation
- When there are multiple parameters, you need to form grids on which to evaluate the function
- Where the distributions are peaked, finer grids are needed, and unequal grids are best
- Methods become quite complicated in practice
- Alternative: Monte Carlo methods
10 Monte Carlo
- Monte Carlo, founded in 1866 and named in honor of Prince Charles III, is a district of the Principality of Monaco. It hosts an internationally famous casino, luxury hotels, and leisure facilities.
11 Monte Carlo
- Monte Carlo is the use of experiments with random numbers to evaluate mathematical expressions.
- The experimental units are the random numbers.
- The expressions can be definite integrals, systems of equations, or complicated mathematical models.
12 Monte Carlo
- The term Monte Carlo was coined by John von Neumann and S.M. Ulam as a code word for their secret work at Los Alamos Labs, which applied these methods to problems in atomic physics as part of the development of the atomic bomb during World War II. The name was suggested by the gambling casinos in the city of Monte Carlo in Monaco.
- The use of these methods increased rapidly with the availability of the computer.
13 Monte Carlo
- Standard approximations from numerical analysis are preferred. However, Monte Carlo methods provide an alternative when the numerical approaches are intractable (do not work).
- Monte Carlo methods are typically preferred for the evaluation of integrals over high-dimensional domains.
14 Monte Carlo
- Monte Carlo methods can be used to solve very
large systems of equations. In this application,
there is no inherent randomness. The randomness
is introduced through RVs, which are defined and
simulated for the purpose of solving a
deterministic problem.
15 Monte Carlo
- In some applications, there is an inherent
randomness in the model, and the objective of the
Monte Carlo study is to estimate or evaluate the
distribution of a variable or statistic. The
Monte Carlo method is used to generate data from
the underlying distribution of interest. Methods
for probability density estimation are then
applied using this data, as well as a graphical
presentation of the density.
16 Monte Carlo Studies: Some Practical Considerations
- Software
- Most statistical software programs have very limited built-in simulation functions, but include the basic tools needed to write programs to perform simulations.
- STATA, Splus, SAS, Matlab (?)
17 Monte Carlo Studies: Some Practical Considerations
- Software
- Matlab has a simulation module, Simulink.
- But this is written specifically to design, simulate, implement, and test a variety of time-varying systems, including communications, controls, signal processing, video processing, and image processing.
- Thus, it is not generally applicable to the problems you will encounter.
18 Monte Carlo Studies: Some Practical Considerations
- Successive sets of Monte Carlo generated data should be based on non-overlapping sequences of random numbers.
- One uninterrupted sequence of random numbers should be used for the complete Monte Carlo study.
19 Monte Carlo Studies: Some Practical Considerations
- The results of any scientific experiment should be reproducible.
- This requirement is stricter for a Monte Carlo study.
- The experimental units should be strictly reproducible. This means that the complete sequence of random numbers can be repeated, given the same initial conditions (up to machine precision).
20 What is Monte Carlo used for?
- Monte Carlo techniques are used for
- Integration problems
- Marginalization (primarily a Bayesian problem)
- Computing expectations
- Optimization problems
- Model selection
- Modeling
- Simulations
21 In the case of Bayesian Analysis
- We do not need to numerically calculate the posterior density function. Rather, we generate values from its distribution.
- These values are then used to approximate the density function or estimates such as posterior means, variances, etc.
- Various ways:
- Rejection sampling
- Importance sampling
22 Monte Carlo principle
- Given a very large set X and a distribution p(x) over it
- We draw a set of N i.i.d. samples x^(1), ..., x^(N) from p(x)
- We can then approximate the distribution using these samples:
  p_N(x) = (1/N) Σ_{i=1}^{N} 1(x^(i) = x)
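A small Python sketch of this principle (not from the slides; the three-point distribution is illustrative): the sample frequencies converge to the true probabilities as N grows.

```python
import random
from collections import Counter

random.seed(1)

# A small discrete distribution p(x) over X = {0, 1, 2}.
p = {0: 0.2, 1: 0.5, 2: 0.3}
N = 100_000

# Draw N i.i.d. samples from p and tabulate their frequencies.
samples = random.choices(list(p), weights=list(p.values()), k=N)
counts = Counter(samples)
p_hat = {x: counts[x] / N for x in p}
print(p_hat)  # each frequency close to the true probability
```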
23 Monte Carlo principle
24 Monte Carlo principle
- These samples can also be used to compute expectations:
  E_p[f(x)] ≈ (1/N) Σ_{i=1}^{N} f(x^(i))
- The samples can be used to determine the maximum or minimum. (Note: "arg max" stands for "argument of the maximum", i.e. the value of the argument for which the expression attains its maximum.)
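The sample-average estimate of an expectation can be sketched as follows (Python, not from the slides; the choice f(x) = x² with x ~ N(0,1), whose expectation is 1, is illustrative):

```python
import random

random.seed(7)
N = 200_000

# Estimate E[f(x)] for x ~ N(0,1), f(x) = x^2, by a sample average.
total = 0.0
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    total += x * x
estimate = total / N
print(estimate)  # close to E[x^2] = Var(x) = 1
```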
25 Sampling X: Inverse Sampling Method (inverse-transform sampling)
- Simplest method
- Used frequently for non-uniform random number generation
- Sample a random number u from U[0, 1)
- Evaluate x = F⁻¹(u)
- Simple, but you need the functional form of F (and its inverse).
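A concrete case where F⁻¹ is available in closed form is the exponential distribution, with F(x) = 1 - exp(-rate·x). A minimal Python sketch (not from the slides; the function name is illustrative):

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse-transform sample from Exponential(rate)."""
    u = rng.random()                  # u ~ U[0, 1)
    return -math.log(1.0 - u) / rate  # x = F^{-1}(u)

rng = random.Random(42)
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(mean)  # close to the true mean 1/rate = 0.5
```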
26 Sampling X: Rejection Sampling
- More generally, we would like to sample from p(x), but it is easier to sample from a proposal distribution q(x)
- q(x) satisfies p(x) ≤ M q(x) for some M < ∞
- Procedure:
- Sample x^(i) from q(x)
- Accept with probability p(x^(i)) / (M q(x^(i)))
- Reject otherwise
- The accepted x^(i) are sampled from p(x)!
- Problem: if M is too large, we will rarely accept samples
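The procedure above can be sketched in Python (not from the slides; the target Beta(2,2) with density p(x) = 6x(1-x) on [0,1] and the uniform proposal q(x) = 1 are illustrative, with p(x) ≤ M q(x) for M = 1.5):

```python
import random

def target(x):
    """Unnormalized-free Beta(2,2) density on [0, 1]; its maximum is 1.5."""
    return 6.0 * x * (1.0 - x)

rng = random.Random(0)
M = 1.5
proposals = 200_000
accepted = []
for _ in range(proposals):
    x = rng.random()                   # sample x from q (uniform on [0,1])
    if rng.random() < target(x) / M:   # accept with probability p(x)/(M q(x))
        accepted.append(x)

print(len(accepted) / proposals)  # acceptance rate close to 1/M = 2/3
mean = sum(accepted) / len(accepted)
print(mean)                       # close to the Beta(2,2) mean 0.5
```

Note the overall acceptance rate is 1/M, which is why a loose bound M wastes most proposals, as in the Gamma-shape example on the following slides.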
27 Rejection Sampling
28 Example: Rejection Sampling
- Shape parameter of a Gamma distribution
- Choosing a uniform prior, the prior is bounded and on a finite interval
- We need to find a constant c such that the target density is bounded by c times the proposal density
- In this case c can be taken to be the maximum of the target over the prior's support
29
Note that only about 0.11 of the proposals are accepted in this example. A lot of waste!
30 Importance Sampling
- The problem with rejection sampling is that we have to be clever in choosing the proposal density!
- Importance sampling avoids difficult choices and generates random numbers economically.
31 Importance Sampling
- Also called biased sampling
- It is a variance-reduction sampling technique
- Introduced by Metropolis in 1953
- Instead of choosing points from a uniform distribution, they are chosen from a distribution that concentrates the points where the function being integrated is large. (It encourages "important" samples.)
32 Importance Sampling
- Nicholas Constantine Metropolis
- Worked at Los Alamos on the Manhattan Project
- Worked with Enrico Fermi (Nobel laureate in physics) and Edward Teller ("father of the hydrogen bomb") on the first nuclear reactors
33 Importance Sampling
- Sample from g(x) and evaluate f(x)/g(x)
- If g is chosen to resemble f, the new integrand f/g is close to unity, so its variance is much smaller than that obtained when evaluating the function by sampling from a uniform distribution. Sampling from a well-chosen non-uniform distribution should therefore be more efficient than a crude Monte Carlo calculation without importance sampling.
34 Importance Sampling
To account for the fact that we sampled from the wrong distribution, we introduce weights w(x) = p(x) / q(x).
Then, for x^(1), ..., x^(N) drawn from q,
  Î_N(f) = (1/N) Σ_{i=1}^{N} f(x^(i)) w(x^(i))
is the Monte Carlo estimate of I(f) = ∫ f(x) p(x) dx.
Choose the proposal distribution to minimize the variance of the estimator.
35 Markov Chain sampling
- Recall again the set X and the distribution p(x) we wish to sample from
- Suppose that it is hard to sample from p(x) and difficult to suggest a proposal distribution, but that it is possible to walk around in X using only local state transitions
- Insight: we can use a random walk to help us draw random samples from p(x)
36 Andrei A. Markov (1856-1922)
Russian mathematician
37 Markov Chain sampling
38 Markov Chain sampling
- That is, if our sampling follows a well-defined structure, i.e. if our sample at one instant depends on our sample at the previous step, we can use this information.
- This is Markov Chain Monte Carlo (MCMC) sampling, and we can benefit from a random walk.
- MCMC theory basically says that if you sample using a Markov chain, eventually your samples will be coming from the stationary distribution of the Markov chain.
39 Markov Chain sampling
- The key to the success of Markov chain sampling is the fact that at every iteration you sample from a different distribution in a sequence of distributions, which eventually converges to the target distribution.
- In importance sampling, you always sample from the same (and wrong) distribution, and then make an adjustment for this.
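The random-walk idea can be sketched with the random-walk Metropolis algorithm, the simplest MCMC method (a Python illustration, not from the slides; the target N(0,1), step size, and burn-in length are illustrative choices):

```python
import math
import random

def log_target(x):
    """Unnormalized log density of the target p(x) ~ N(0,1)."""
    return -0.5 * x * x

rng = random.Random(11)
x = 0.0
burn_in, n_keep, step = 1_000, 100_000, 1.0
samples = []
for i in range(burn_in + n_keep):
    proposal = x + rng.gauss(0.0, step)            # local state transition
    log_alpha = log_target(proposal) - log_target(x)
    if rng.random() < math.exp(min(0.0, log_alpha)):  # Metropolis accept step
        x = proposal
    if i >= burn_in:
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # approach 0 and 1 as the chain reaches its stationary distribution
```

Each move depends only on the current state, so the samples form a Markov chain; discarding the burn-in approximates draws from the stationary distribution.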