1
Introduction to Monte Carlo Methods
D.J.C. MacKay
Rasit Onur Topaloglu, Ph.D. candidate, rot@ucsd.edu
2
Outline
  • Problems Monte Carlo (MC) deals with
  • Uniform Sampling
  • Importance Sampling
  • Rejection Sampling
  • Metropolis
  • Gibbs Sampling
  • Speeding up MC: Hybrid MC and Over-relaxation

3
Definition of Problems
  • Problem 1: Generate samples {x^(r)} from a given probability distribution P(x)
  • Problem 2: Estimate the expectation of a function φ(x) under this distribution:
    Φ = ⟨φ(x)⟩ = ∫ φ(x) P(x) dx
4
Monte Carlo for High Dimensions
  • Solving the first problem ⇒ solving the second one
  • Just evaluate the function at the samples and average the values, as sketched below
  • The accuracy is independent of the dimensionality of the space sampled
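A minimal sketch of this estimator (not from the original slides), assuming samples from P are already available; the names are illustrative, with P a standard normal and φ(x) = x² so the true expectation is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(phi, sampler, n_samples=10_000):
    """Estimate E_P[phi(x)] by averaging phi over samples drawn from P."""
    samples = sampler(n_samples)
    return np.mean(phi(samples))

# Example: E[x^2] under a standard normal is 1.
print(mc_expectation(lambda x: x**2, rng.standard_normal))
```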

5
Sampling from P(x)
  • Assume we can evaluate P(x) within a multiplicative constant: P(x) = P*(x)/Z
  • Even if P*(x) can be evaluated, sampling is still a hard problem, as:
  • Z is not known
  • It is not easy to draw samples at high dimensions (except for a Gaussian)
  • Ex: a sample from a univariate Gaussian can be calculated as
    x = √(−2 ln u1) · cos(2π u2)
    where u1 and u2 are uniform in [0, 1] (a runnable version follows below)
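A runnable version of this transform (the Box-Muller method); the function name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def box_muller(n):
    """Turn pairs of uniform samples into standard normal samples."""
    u1 = 1.0 - rng.random(n)             # in (0, 1], so log(u1) is finite
    u2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log(u1))       # radius determined by u1
    return r * np.cos(2.0 * np.pi * u2)  # sin(2*pi*u2) would give a second sample

print(box_muller(5))
```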
6
Difficulty of Sampling at High Dimensions
  • Can discretize the function (figure on right) and sample from the discrete distribution
  • This is costly at high dimensions: B^N bins for B bins per dimension and N dimensions

7
Uniform Sampling
  • Tries to solve the 2nd problem
  • Draw samples {x^(r)} uniformly from the state space
  • Z_R = Σ_r P*(x^(r)) is the normalizing constant
  • Estimate by Φ̂ = Σ_r φ(x^(r)) · P*(x^(r)) / Z_R
  • For distributions that peak in a small region, many points must be sampled before P*(x) is evaluated in the high-probability region ⇒ requires lots of samples (see the sketch below)
  • Thus, uniform sampling is seldom useful
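A minimal sketch of the uniform-sampling estimator, with illustrative names; the example target P* is an unnormalized Gaussian restricted to [−5, 5]:

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_sampling_estimate(phi, p_star, lo, hi, n=10_000):
    """Estimate E_P[phi] by drawing x uniformly and weighting by P*(x)."""
    x = rng.uniform(lo, hi, n)
    w = p_star(x)                          # unnormalized density at each sample
    return np.sum(phi(x) * w) / np.sum(w)  # dividing by Z_R = sum(w) normalizes

# Example: the mean of a Gaussian centred at 1, restricted to [-5, 5].
print(uniform_sampling_estimate(lambda x: x,
                                lambda x: np.exp(-0.5 * (x - 1)**2), -5.0, 5.0))
```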

8
Importance Sampling
  • Tries to solve the 2nd problem
  • Introduce a simpler density Q that we can sample from
  • Values of x where Q(x) > P(x) are over-represented; values of x where Q(x) < P(x) are under-represented ⇒
  • introduce weights w_r = P*(x^(r)) / Q*(x^(r)) and estimate Φ̂ = Σ_r w_r φ(x^(r)) / Σ_r w_r, as sketched below
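A sketch of importance sampling under these definitions; all names are illustrative, with target P* a Gaussian centred at 1 and proposal Q = N(0, 2²):

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_sampling(phi, p_star, q_star, q_sampler, n=10_000):
    """Estimate E_P[phi] from samples of Q, reweighted by w = P*(x)/Q*(x)."""
    x = q_sampler(n)
    w = p_star(x) / q_star(x)             # importance weights
    return np.sum(w * phi(x)) / np.sum(w)

# Example: target P* is a Gaussian centred at 1; proposal Q is N(0, 2^2).
p_star = lambda x: np.exp(-0.5 * (x - 1)**2)
q_star = lambda x: np.exp(-x**2 / 8.0)
print(importance_sampling(lambda x: x, p_star, q_star,
                          lambda n: 2.0 * rng.standard_normal(n)))
```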

9
Reliability of Importance Sampling
(Figure: the estimate Φ̂ vs. the number of samples, for a Gaussian sampler and a Cauchy sampler.)
  • An importance sampler should have heavy tails in problems where infrequent samples might be influential

10
Rejection Sampling
  • Again, a proposal density Q is assumed, known up to a constant: Q*(x)
  • Also assume we know a constant c s.t. c·Q*(x) ≥ P*(x) for all x
  • Generate x using Q(x)
  • Draw u uniformly from the interval [0, c·Q*(x)]
  • If u < P*(x), accept and add x to the random number list (see the sketch below)
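A sketch of rejection sampling; here the bound c·Q*(x) ≥ P*(x) holds exactly with c = 2 for the illustrative densities chosen:

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(p_star, q_star, q_sampler, c, n=10_000):
    """Draw samples from P given a proposal Q and c with c*Q*(x) >= P*(x)."""
    x = q_sampler(n)
    u = rng.uniform(0.0, c * q_star(x))  # uniform height in [0, c*Q*(x)]
    return x[u < p_star(x)]              # keep only points falling under P*(x)

# Example: P*(x) = exp(-x^2/2) * (1 + cos(x)^2) <= 2 * exp(-x^2/2) = c * Q*(x).
p_star = lambda x: np.exp(-0.5 * x**2) * (1.0 + np.cos(x)**2)
q_star = lambda x: np.exp(-0.5 * x**2)
samples = rejection_sample(p_star, q_star, lambda n: rng.standard_normal(n), c=2.0)
```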

11
Transition to Markov Chain MC
  • Importance and rejection sampling work well only if Q is similar to P
  • In complex problems, it is difficult to find a single such Q over the whole state space

12
Metropolis Method
  • The proposal density Q depends on the current state x^(t): Q(x'; x^(t))
  • Compute the acceptance ratio a = [P*(x') · Q(x^(t); x')] / [P*(x^(t)) · Q(x'; x^(t))]
  • Accept the new state if a ≥ 1, else accept with probability a
  • In comparison to rejection sampling, rejected points are not discarded: the current state is repeated and hence influences subsequent samples ⇒ samples are correlated
  ⇒ Metropolis may have to be run longer to generate independent samples from P(x)! (a sketch follows below)
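A minimal sketch of random-walk Metropolis with a symmetric Gaussian proposal (so Q cancels in the ratio); names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(p_star, x0, step=1.0, n=10_000):
    """Random-walk Metropolis with a symmetric Gaussian proposal of width step."""
    x = x0
    chain = np.empty(n)
    for t in range(n):
        x_new = x + step * rng.standard_normal()
        a = p_star(x_new) / p_star(x)  # acceptance ratio (symmetric Q cancels)
        if a >= 1.0 or rng.random() < a:
            x = x_new                  # accept; otherwise the old state is repeated
        chain[t] = x
    return chain

chain = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0)
```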
13
Disadvantages of Large Step Size
  • Metropolis is useful in high dimensions
  • It uses a length scale ε much smaller than the size L of the state space
  • Because:
  • Large steps are highly unlikely to be accepted
  ⇒ limited or no movement in space! ⇒ biased estimates

14
A Lower Bound for Independent Samples
  • Metropolis explores the space by a random walk
  • Random walks take a long time to cover the space
  • After T steps of size ε, the state only moves a distance of about √T·ε
  • Since Monte Carlo is trying to achieve independent samples, roughly (L/ε)² steps are required to get a sample independent of the initial condition
  • This rule of thumb can be used as a lower bound

15
An Example using this Bound
(Figure: the state of the chain vs. time, after the 100th, 400th, and 1200th iterations.)
  • Takes about 10² = 100 steps to reach an end state
  • Metropolis still provides biased estimates even after a large number of iterations

16
Gibbs Sampling
  • As opposed to the previous methods, distributions of at least 2 dimensions are required
  • Q is defined in terms of the conditional distributions of the joint distribution P(x)
  • The assumption is that P(x) is too complex to evaluate directly, but the conditionals P(xi | {xj}, j ≠ i) are tractable

17
Gibbs Sampling on an Example
  • Start with x = (x1, x2); fix x2^(t) and sample x1 from P(x1 | x2)
  • Fix x1 and sample x2 from P(x2 | x1) (a sketch follows below)
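A sketch of this two-variable scheme for a standard bivariate Gaussian with correlation ρ, whose conditionals are known in closed form; names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_gaussian(rho, n=5_000):
    """Gibbs sampling for a standard bivariate Gaussian with correlation rho;
    each conditional P(x1 | x2) is N(rho * x2, 1 - rho**2), and symmetrically."""
    x1, x2 = 0.0, 0.0
    out = np.empty((n, 2))
    sd = np.sqrt(1.0 - rho**2)
    for t in range(n):
        x1 = rho * x2 + sd * rng.standard_normal()  # sample x1 | x2
        x2 = rho * x1 + sd * rng.standard_normal()  # sample x2 | x1
        out[t] = x1, x2
    return out

samples = gibbs_bivariate_gaussian(rho=0.9)
```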

18
Gibbs Sampling with K Variables
  • In comparison to Metropolis, every proposal is always accepted
  • In bigger models, groups of variables are jointly sampled

19
Comparison of MC Methods in High Dimensions
  • Importance and rejection sampling result in high weights and constants respectively, leading to inaccurate or lengthy simulations ⇒ not practical
  • Metropolis requires at least (σmax/σmin)² samples to acquire independent samples ⇒ might be lengthy
  • Gibbs sampling has similar properties to Metropolis, but has no adjustable parameters ⇒ most practical

20
Practical Questions About Monte Carlo
  • Can we predict how long it takes to reach equilibrium? Use the simple bound proposed above
  • Can we determine convergence in a running simulation? Yet another difficult problem
  • Can we speed up convergence time and the time between independent samples?

21
Reducing Random Walk in Metropolis: Hybrid MC
  • Most probabilities can be written in the form P(x) = e^(−E(x)) / Z
  • Introduce a momentum variable p with kinetic energy K(p) = pᵀp / 2
  • Create samples asymptotically from the joint distribution P(x, p) ∝ e^(−E(x) − K(p))
  • Pick p randomly from the Gaussian ∝ e^(−K(p))
  • Update x and p by simulating the Hamiltonian dynamics, then accept or reject
  • The distance traversed grows linearly in the number of steps, rather than as √T in a random walk (see the sketch below)
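A minimal sketch of one hybrid MC step under simplifying assumptions (scalar x, unit-mass kinetic energy, a leapfrog integrator); E and grad_E are supplied by the caller and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc_step(x, E, grad_E, eps=0.1, n_leapfrog=20):
    """One hybrid MC step for P(x) ∝ exp(-E(x)) with K(p) = p^2/2 (scalar x)."""
    p = rng.standard_normal()             # fresh momentum drawn from exp(-K(p))
    x_new, p_new = x, p
    p_new -= 0.5 * eps * grad_E(x_new)    # leapfrog: initial half step in p
    for _ in range(n_leapfrog):
        x_new += eps * p_new              # full step in position
        p_new -= eps * grad_E(x_new)      # full step in momentum
    p_new += 0.5 * eps * grad_E(x_new)    # correct the final update to a half step
    dH = (E(x_new) + 0.5 * p_new**2) - (E(x) + 0.5 * p**2)
    return x_new if rng.random() < np.exp(-dH) else x  # Metropolis accept/reject

x = 3.0
for _ in range(100):
    x = hmc_step(x, E=lambda x: 0.5 * x**2, grad_E=lambda x: x)
```

The leapfrog integrator keeps the discretized dynamics reversible and volume-preserving, which is why a simple Metropolis accept/reject on the change in H suffices.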

22
An Illustrative Example for Hybrid MC
Hybrid
Hybrid
Random walk
Random walk
  • Over a number of iterations, hybrid trajectories
    indicate less
  • correlated samples

23
Reducing Random Walk in Gibbs: Over-relaxation
  • Use the former value x^(t) as well to calculate x^(t+1), as sketched below
  • Suitable to speed up the process when variables are highly correlated
  • Useful for Gaussians; not straightforward for other types of conditional distributions
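A sketch of Adler's over-relaxed update for a single Gaussian conditional N(mu, sigma²); with α near −1 the new value is reflected to the other side of the conditional mean, and α = 0 recovers ordinary Gibbs:

```python
import numpy as np

rng = np.random.default_rng(0)

def overrelaxed_update(x, mu, sigma, alpha=-0.98):
    """Adler's over-relaxation for a Gaussian conditional N(mu, sigma^2);
    alpha in (-1, 0), and alpha = 0 recovers the ordinary Gibbs update."""
    nu = rng.standard_normal()
    return mu + alpha * (x - mu) + sigma * np.sqrt(1.0 - alpha**2) * nu
```

This update leaves the conditional distribution invariant while pushing successive samples to opposite sides of the mean, which suppresses random-walk behaviour for highly correlated variables.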

24
An Illustrative Example for Over-Relaxation
  • State space of a bivariate Gaussian over 40 iterations
  • Over-relaxation samples cover the state space better

25
Reducing Random Walk: Simulated Annealing
  • Introduce a parameter T (temperature) and gradually reduce it to 1, targeting P_T(x) ∝ P*(x)^(1/T)
  • High T corresponds to being able to make transitions more easily
  • As opposed to its use in optimization, T is not reduced to 0 but to 1 (a sketch follows below)
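A sketch of Metropolis with a temperature schedule, assuming the energy form P_T(x) ∝ exp(−E(x)/T) and a linear cooling from T0 down to 1; names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def annealed_metropolis(E, x0, T0=10.0, n=20_000, step=1.0):
    """Metropolis targeting P_T(x) ∝ exp(-E(x)/T), cooling T from T0 to 1 (not 0)."""
    x = x0
    for t in range(n):
        T = 1.0 + (T0 - 1.0) * (1.0 - t / n)         # linear cooling schedule
        x_new = x + step * rng.standard_normal()
        a = np.exp(min(0.0, (E(x) - E(x_new)) / T))  # easier transitions at high T
        if rng.random() < a:
            x = x_new
    return x

x_final = annealed_metropolis(lambda x: 0.5 * x**2, x0=5.0)
```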

26
Applications of Monte Carlo: Differential Equations
Ex: steady-state temperature distribution of an annulus
The finite difference approximation using grid dimension h:
T(x, y) ≈ ¼ [T(x+h, y) + T(x−h, y) + T(x, y+h) + T(x, y−h)]
  • With probability ¼ each, one of the four neighbouring states is selected
  • The value found when a boundary is reached is kept; the mean of many such runs gives the solution (see the sketch below)
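A sketch of this random-walk scheme; the grid, boundary handling, and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mc(boundary_value, is_boundary, start, n_walks=2_000):
    """Estimate the solution of Laplace's equation at `start`: walk to one of
    the 4 grid neighbours with probability 1/4 each, record the boundary value
    where the walk exits, and average over many walks."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    total = 0.0
    for _ in range(n_walks):
        i, j = start
        while not is_boundary(i, j):
            di, dj = steps[rng.integers(4)]
            i, j = i + di, j + dj
        total += boundary_value(i, j)
    return total / n_walks

# Example: N x N square grid, value 1 on the top edge and 0 elsewhere.
N = 10
u = laplace_mc(lambda i, j: 1.0 if j == N else 0.0,
               lambda i, j: i in (0, N) or j in (0, N), start=(5, 5))
```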

27
Applications of Monte Carlo: Integration
To evaluate an integral ∫ f(x) dx:
  • Bound the function with a box
  • The ratio of points under the function to all points within the box gives the ratio of the area under the function to the area of the box (sketched below)
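A sketch of this hit-or-miss estimate, assuming f_max bounds f on [a, b]; names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def hit_or_miss(f, a, b, f_max, n=100_000):
    """Integrate f on [a, b]: (fraction of points under the curve) * box area."""
    x = rng.uniform(a, b, n)
    y = rng.uniform(0.0, f_max, n)   # points thrown uniformly into the box
    hit_ratio = np.mean(y < f(x))    # ratio of the two areas
    return hit_ratio * (b - a) * f_max

print(hit_or_miss(np.sin, 0.0, np.pi, 1.0))  # true value is 2
```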

28
Applications of Monte Carlo: Image Processing
Automatic eyeglass removal:
  • MCMC is used instead of gradient-based methods to optimize the MAP criterion that locates the points of the eyeglasses

C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu and Z. Zhang, "Automatic Eyeglasses Removal from Face Images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, pp. 322-336, Mar. 2004.
29
Applications of Monte Carlo: Image Segmentation
  • Data-driven techniques such as edge detection, tracing, and clustering are combined with MCMC to speed up the search

Z. Tu and S.-C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657-673, May 2002.
30
Forward Discrete Probability Propagation
Rasit Onur Topaloglu, Ph.D. candidate, rot@ucsd.edu
31
The Problem
(Figure: hierarchy tree from lowest-level physical parameters to high-level circuit parameters, e.g. gm = √(2k·Id).)
  • The tree relates physical parameters to circuit parameters
  • Structured according to the SPICE formula hierarchy
  • Given pdfs for the lowest-level parameters, find the pdfs at the highest level

32
Motivation for Probability Propagation
  • Estimation of the distributions of high-level parameters is needed to examine the effects of process variations
  • The Gaussian assumption attributed to these parameters is no longer accurate in current technologies

Find a novel propagation method with these GOALS:
  • Determinism: a stochastic output using known formulas
  • Algebraic tractability: enabling manual applicability
  • Speed and accuracy: comparable to or outperforming Monte Carlo and other parametric methods
33
Parametric Belief Propagation
Calculations are handled at each node
  • Each node receives and sends messages to its parents and children until equilibrium
  • Parent to child (π): causal information
  • Child to parent (λ): diagnostic information

34
Parametric Belief Propagation
  • When arrows in the hierarchy tree indicate linear addition operations on Gaussians, analytic formulations are possible
  • Not straightforward for other or non-standard distributions

35
Shortcomings of Monte Carlo
  • Non-determinism: not manually applicable
  • Limited to certain distributions: random number generators in most CAD packages only provide certain distributions
  • Accuracy: may miss points that are less likely to occur, due to random sampling, unless a very large number of samples is used; limited by the performance of the random number generator

36
Monte Carlo vs. FDPP Comparison: one-to-many relationships and custom pdfs
(Figure: a one-to-many hierarchy over parameters P1, P2, P3, P4.)
  • Non-standard pdfs are not possible without a custom random number generator
  • Monte Carlo overestimates in one-to-many relationships, as the same sample is used
37
Operations Necessary to Implement FDPP
Analytic operations on continuous distributions are difficult; instead, operations on discrete distributions are implemented:
  • F (Forward): given a function, estimates the distribution of the next node in the formula hierarchy
  • Q (Quantize): discretizes a pdf to operate on its samples
  • B (Band-pass): eliminates low-probability samples for computational efficiency
  • R (Re-bin): reduces the number of samples for computational efficiency

38
Necessary Operators (Q, F, B, R) on a
Hierarchical Tree
  • Repeated until we acquire the high-level distribution (ex. G)

39
Probability Discretization Theory: the QN Operator
(Figure: a continuous pdf(X) in the p-domain and its quantized counterpart in the r-domain.)
Certain operators are easier to apply in the r-domain.
40
Characterizing an spdf
(Figure: an spdf(X), the sampled (discretized) pdf, in the r-domain.)
41
F Operator
  • The F operator implements a function over spdfs
  • The function is applied to individual impulses
  • The individual probabilities are multiplied

42
Band-pass, Be, Operator
43
Re-bin, RN, Operator
Resulting spdf(X)
44
Error Analysis
  • If the quantizer is uniform and the bin width Δ is small, the quantization error random variable Q is uniformly distributed

45
Algorithm Implementing the F Operator
While some random variable does not yet have its spdf computed:
    For each r.v. which has all ancestor spdfs computed:
        For each sample in X1:
            ...
            For each sample in Xr:
                Place an impulse with height p1·...·pr at x = f(v1, ..., vr)
        Apply the B and R algorithms to this r.v.
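A compact sketch of these loops, assuming each spdf is represented as a pair of arrays (impulse positions, impulse probabilities); f_operator is an illustrative name, and the B and R operators would be applied to the result afterwards:

```python
import itertools
import numpy as np

def f_operator(f, *spdfs):
    """F operator: for every combination of input impulses, place an output
    impulse at f(v1, ..., vr) with height p1 * ... * pr (the product of the
    input probabilities). Each spdf is a (values, probs) pair of arrays."""
    out_vals, out_probs = [], []
    for combo in itertools.product(*[list(zip(v, p)) for v, p in spdfs]):
        vals, probs = zip(*combo)
        out_vals.append(f(*vals))
        out_probs.append(float(np.prod(probs)))
    return np.array(out_vals), np.array(out_probs)

# Example: spdf of X + Y from two 3-impulse spdfs.
x = (np.array([0.0, 1.0, 2.0]), np.array([0.2, 0.5, 0.3]))
y = (np.array([0.0, 1.0, 2.0]), np.array([0.3, 0.4, 0.3]))
vals, probs = f_operator(lambda a, b: a + b, x, y)
```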
46
Algorithm for the B and R Operators
B operator:
    Find the maximum and minimum values wi within the impulses
    Divide this range into M bins
    For each bin:
        Place a quantizing impulse at the centre of the bin, with height pi equal to the sum of all impulses within the bin
    Find the maximum probability, pi-max, of the quantized impulses within the bins
    Eliminate impulses within bins whose quantized impulse has probability smaller than error-rate · pi-max

R operator:
    Find the new maximum and minimum values wi within the remaining impulses
    Divide this range into N bins
    For each bin:
        Place an impulse at the centre of the bin, with height equal to the sum of all impulses within the bin
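A simplified sketch of the two operators in the same representation; rebin and bandpass are illustrative names, and this band-pass prunes whole low-probability bins rather than the individual impulses described above:

```python
import numpy as np

def rebin(vals, probs, n_bins):
    """R operator: merge impulses into n_bins equal-width bins, one impulse per
    bin centre with the summed probability of its members."""
    edges = np.linspace(vals.min(), vals.max(), n_bins + 1)
    idx = np.clip(np.digitize(vals, edges) - 1, 0, n_bins - 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    heights = np.bincount(idx, weights=probs, minlength=n_bins)
    keep = heights > 0
    return centres[keep], heights[keep]

def bandpass(vals, probs, n_bins, error_rate=1e-3):
    """B operator (simplified): re-bin, then drop bins whose probability falls
    below error_rate times the maximum bin probability."""
    centres, heights = rebin(vals, probs, n_bins)
    keep = heights >= error_rate * heights.max()
    return centres[keep], heights[keep]
```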
47
Monte Carlo vs. FDPP Comparison
(Figure: pdfs of Vth and ID; solid: FDPP, dotted: Monte Carlo.)
  • A close match is observed after interpolation

48
Monte Carlo vs. FDPP Comparison with a Low Sample Number
(Figure: pdfs of φF; solid: FDPP with 100 samples; noisy: Monte Carlo with 1000 and 100000 samples respectively.)
  • Monte Carlo is inaccurate for a moderate number of samples
  • Indicates FDPP can be applied manually without major accuracy degradation

49
Monte Carlo vs. FDPP Comparison
(Figure: pdf of n7 for a benchmark example; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation.)
  • Edges define a linear sum, ex: n5 = n2 + n3

50
Faulty Application of Monte Carlo
(Figure: pdf of n7 for the benchmark example; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation.)
  • Distributions at the internal nodes n4, n5, n6 should be re-sampled using Monte Carlo
  • Not optimal for internal nodes with non-standard distributions

51
Conclusions
  • Forward Discrete Probability Propagation is introduced as an alternative to Monte Carlo based methods
  • FDPP should be preferred when low-probability samples are important, algebraic intuition is needed, non-standard pdfs are present, or one-to-many relationships are present