Title: Mathematical Foundations of Markov Chain Monte Carlo Algorithms
1. Mathematical Foundations of Markov Chain Monte Carlo Algorithms

Based on lectures given by Alistair Sinclair, Computer Science Division, U.C. Berkeley
2. Overview

- Random Sampling
- The Markov Chain Monte Carlo Paradigm
- Mixing Time
- Techniques for Bounding the Mixing Time:
  - Coupling
  - Flow
  - Geometry
3. Random Sampling

- Ω – a very large sample space.
- π – a probability distribution over Ω.

Goal: sample points x ∈ Ω at random from distribution π.
4. The Probability Distribution

π(x) = w(x) / Z, where:

- w: Ω → R⁺ is an easily computed weight function
- Z = Σ_{x∈Ω} w(x) is an unknown normalization factor
5. Application 1: Card Shuffling

- Ω – all 52! permutations of a deck of cards.
- π – the uniform distribution: ∀x, w(x) = 1.

Goal: pick a permutation uniformly at random.
6. Application 2: Counting

- How many ways can we tile a given region with dominoes?
7. Application 2: Counting (cont.)

- Split the tilings into two classes (say, by the first domino placement), so that N = N₁ + N₂.
- Sample tilings uniformly at random.
- Let p₁ = proportion of the sample of type 1.
- Compute an estimate Ñ₁ of N₁ recursively.
- Output Ñ = Ñ₁ / p₁.

Sample size O(n) per level, O(n) levels ⇒ O(n²) samples total.
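A runnable toy sketch of this recursive estimator (my own illustration, not from the slides; Python): it estimates n! by treating "permutations fixing the last element" as type 1, so N₁ = (n−1)! and p₁ ≈ 1/n. Here random.sample plays the role of the uniform sampler; in the applications above, that sampler is exactly what MCMC provides.

    import random

    def estimate_num_perms(n, samples_per_level=2000):
        # Self-reducibility estimator: N = N1 / p1, recursing on N1.
        # Omega = permutations of {0..n-1}; "type 1" = those fixing the
        # last element, so N1 = (n-1)! and p1 should be about 1/n.
        if n <= 1:
            return 1.0
        hits = 0
        for _ in range(samples_per_level):
            perm = random.sample(range(n), n)   # uniform random permutation
            if perm[-1] == n - 1:               # type 1?
                hits += 1
        p1 = hits / samples_per_level
        return estimate_num_perms(n - 1, samples_per_level) / p1

    print(estimate_num_perms(6))   # should be close to 6! = 720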
8. Application 3: Volume Integration [Dyer/Frieze/Kannan]

- Ω – a convex body in Rᵈ (d large).
- Problem: estimate vol(Ω).
- Take a sequence of concentric balls B₀ ⊆ B₁ ⊆ … ⊆ B_r with B₀ ⊆ Ω ⊆ B_r.
- Estimate each ratio vol(Ω ∩ B_{i−1}) / vol(Ω ∩ B_i) by sampling uniformly from Ω ∩ B_i.

Generalization: integration of a log-concave function over a cube A ⊆ Rᵈ.
9. Application 4: Statistical Physics

- Ω – set of configurations of a physical system.
- π – the Gibbs distribution:
  π(x) = Pr[system in config. x] = w(x)/Z,
  where w(x) = e^{−H(x)/kT}, H(x) is the energy of configuration x, and T is the temperature.
10. The Ising Model

- n atomic magnets arranged on a grid.
- Configuration x ∈ {−,+}ⁿ.
- H(x) = −(# aligned neighboring pairs).

[Figure: a grid of ± spins]
11. Why Sampling?

- Statistics of typical configurations.
- Mean energy E = E_π[H(x)], specific heat, …
- An estimate of the partition function Z = Z(T) = Σ_{x∈Ω} w(x).
12. Estimating the Partition Function

- Let λ = e^{1/kT}, so Z = Z(λ) = Σ_{x∈Ω} λ^{−H(x)}.
- Define 1 = λ₀ < λ₁ < … < λ_r = λ, with λᵢ = λᵢ₋₁(1 + 1/n); then r ≈ n log λ = O(n²).
- Write Z(λ) = Z(λ₀) · Π_{i=1}^{r} Z(λᵢ)/Z(λᵢ₋₁), where Z(λ₀) = |Ω| is known.
- Each ratio Z(λᵢ₋₁)/Z(λᵢ) = E_{π_{λᵢ}}[(λᵢ₋₁/λᵢ)^{−H(x)}] can be estimated by random sampling from π_{λᵢ}.
- The choice λᵢ = λᵢ₋₁(1 + 1/n) ensures small variance ⇒ O(n) samples suffice for each ratio.
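A runnable toy sketch of this telescoping-product estimator (Python, my own illustration): Ω is a tiny 1D Ising ring, and exact enumeration stands in for the MCMC sampler, since only the estimator itself is being demonstrated.

    import math, random

    n = 8                                     # sites on a ring
    states = [[(s >> i & 1) * 2 - 1 for i in range(n)] for s in range(2 ** n)]
    def H(x):                                 # H(x) = -(# aligned neighbor pairs)
        return -sum(x[i] == x[(i + 1) % n] for i in range(n))

    def sample_from(lam):
        # Exact sampler from pi_lam (in real applications: an MCMC sampler).
        return random.choices(states, [lam ** (-H(x)) for x in states])[0]

    lam = math.e                              # lambda = e^{1/kT} with kT = 1
    lams = [1.0]
    while lams[-1] < lam:                     # schedule lam_i = lam_{i-1}(1 + 1/n)
        lams.append(min(lams[-1] * (1 + 1.0 / n), lam))

    Z = 2.0 ** n                              # Z(lambda_0) = |Omega| is known
    for prev, cur in zip(lams, lams[1:]):
        # est ~ E_{pi_cur}[(prev/cur)^{-H}] = Z(prev)/Z(cur)
        est = sum((prev / cur) ** (-H(sample_from(cur))) for _ in range(200)) / 200
        Z /= est                              # multiply in the ratio Z(cur)/Z(prev)

    exact = sum(lam ** (-H(x)) for x in states)
    print(Z, exact)                           # estimate vs exact partition function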
13. Application 5: Optimization

- Ω – set of feasible solutions to an optimization problem.
- f(x) – value of solution x.
- Goal: maximize f(x).
- Idea: sample solutions where w(x) = λ^{f(x)}.
14. Application 5: Optimization (cont.)

- Idea: sample solutions where w(x) = λ^{f(x)}.
- Large λ ⇒ concentration on good solutions (large values of f(x)).
- Small λ ⇒ greater mobility (local optima are less high).

Simulated annealing heuristic: slowly increase λ.
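A minimal simulated-annealing sketch (Python, my own illustration; the geometric schedule for λ and the toy objective are arbitrary choices, not from the slides):

    import random

    def anneal(f, n, steps=20000):
        # Metropolis moves w.r.t. w(x) = lam^f(x) on {0,1}^n, flipping one
        # bit at a time, while lam is slowly increased.
        x = [random.randint(0, 1) for _ in range(n)]
        best, best_val = x[:], f(x)
        for t in range(steps):
            lam = 1.01 ** (t / 10)           # slowly increasing lambda
            i = random.randrange(n)
            y = x[:]; y[i] ^= 1              # local move: flip one coordinate
            # accept w.p. min{w(y)/w(x), 1} = min{lam^(f(y)-f(x)), 1}
            if random.random() < min(lam ** (f(y) - f(x)), 1.0):
                x = y
            if f(x) > best_val:
                best, best_val = x[:], f(x)
        return best_val

    wts = [random.uniform(-1, 1) for _ in range(30)]
    f = lambda x: sum(w * b for w, b in zip(wts, x))    # toy objective
    print(anneal(f, 30), sum(w for w in wts if w > 0))  # found vs true optimum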
15. Application 6: Hypothesis Verification in Statistical Models

- Ω – set of hypotheses θ.
- X – observed data.
- Let w(θ) = P(θ) · P(X|θ), where P(θ) is the prior and P(X|θ) is easy to compute.
16. Application 6: Hypothesis Verification in Statistical Models (cont.)

- Sampling from π(θ) = P(θ|X) gives:
  - a statistical estimate of the hypothesis θ,
  - prediction,
  - model comparison.
- The normalization factor is Z = P(X), the probability that the model generated X.
17. Markov Chains

- Sample space Ω.
- Random variables (r.v.) X₁, X₂, …, X_t, … over Ω.
- Memoryless: ∀t > 0, ∀x₁, …, x_{t+1} ∈ Ω,
  Pr[X_{t+1} = x_{t+1} | X₁ = x₁, …, X_t = x_t] = Pr[X_{t+1} = x_{t+1} | X_t = x_t].
18. Sampling Algorithm

- Start at an arbitrary state X₀.
- Simulate the MC for sufficiently many steps t.
- Output X_t.
- Then, ∀x ∈ Ω: Pr[X_t = x] ≈ π(x).
19. Transition Matrix

P(x,y) = Pr[X_{t+1} = y | X_t = x]

- P is non-negative.
- P is stochastic: ∀x, Σ_y P(x,y) = 1.
- Pr[X_t = y | X₀ = x] = Pᵗ(x,y); the distribution at time t is p_xᵗ = p_x⁰ Pᵗ.
- Definition: π is a stationary distribution if πP = π.
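A small numerical illustration (Python with numpy; the 3-state chain is my own toy example): evolve p_xᵗ = p_x⁰Pᵗ and recover the stationary π as the left eigenvector of P with eigenvalue 1.

    import numpy as np

    P = np.array([[0.5, 0.25, 0.25],
                  [0.2, 0.6,  0.2 ],
                  [0.3, 0.3,  0.4 ]])
    assert np.allclose(P.sum(axis=1), 1)   # stochastic: every row sums to 1

    p = np.array([1.0, 0.0, 0.0])          # start deterministically at state 0
    for _ in range(50):
        p = p @ P                          # p^t = p^0 P^t
    print(p)

    vals, vecs = np.linalg.eig(P.T)        # left eigenvectors of P
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()
    print(pi, np.allclose(pi @ P, pi))     # stationary: pi P = pi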
20. Irreducibility

- Definition: P is irreducible if for all x, y ∈ Ω there exists t such that Pᵗ(x,y) > 0.
21. Aperiodicity

- Definition: P is aperiodic if for all x ∈ Ω, gcd{t : Pᵗ(x,x) > 0} = 1.
22. Note on Irreducibility and Aperiodicity

- If P is irreducible, we can always make it aperiodic by adding self-loops: P′ = ½(P + I).
- P′ has the same stationary distribution as P.
- Call P′ a lazy MC.
23. Fundamental Theorem

- Theorem: If P is irreducible and aperiodic, then it is ergodic, i.e.,
  ∀x, y: Pᵗ(x,y) → π(y) as t → ∞,
  where π is the (unique) stationary distribution of P, i.e., πP = π.
24. Main Idea (The MCMC Paradigm)

- An ergodic MC provides an effective algorithm for sampling from π.
25. Examples

- Random Walks on Graphs
- Ehrenfest Urn
- Card Shuffling
- Colorings of a Graph
- The Ising Model
26. Example 1: Random Walk on Undirected Graphs

- At each node, choose a neighbor u.a.r. and jump to it.
27. Random Walk on Undirected Graph G = (V,E)

- Ω = V.
- P(x,y) = 1/d(x) if {x,y} ∈ E, and 0 otherwise, where d(x) is the degree of x.
- Irreducible ⇔ G is connected.
- Aperiodic ⇔ G is not bipartite.
28. Random Walk: The Stationary Distribution

- Claim: If G is connected and not bipartite (the latter is not essential: make the walk lazy), then the probability distribution induced by a random walk on it converges to π(x) = d(x) / Σ_y d(y) = d(x) / 2|E|.
- Proof: Σ_x π(x)P(x,y) = Σ_{x : {x,y}∈E} (d(x)/2|E|) · (1/d(x)) = d(y)/2|E| = π(y), so πP = π.
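The claim is easy to check numerically (Python with numpy; the small graph is my own example, a triangle with a pendant vertex, hence connected and non-bipartite):

    import numpy as np

    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    n = 4
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    d = A.sum(axis=1)
    P = A / d[:, None]            # P(x,y) = 1/d(x) for each neighbor y of x

    p = np.ones(n) / n
    for _ in range(200):
        p = p @ P
    print(p)                      # distribution after many steps
    print(d / d.sum())            # pi(x) = d(x)/2|E| - should match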
29. Example 2: Ehrenfest Urn

- Two urns hold n balls in total: j in the first, (n − j) in the second.
- At each step: pick a ball u.a.r. and move it to the other urn.
30. Example 2: Ehrenfest Urn (cont.)

- X_t = number of balls in the first urn.
- The MC is a non-uniform random walk on Ω = {0, 1, …, n}:
  P(j, j−1) = j/n and P(j, j+1) = 1 − j/n.
- Irreducible, but periodic (the parity of X_t alternates), so make it lazy.
- Stationary distribution: binomial, π(j) = C(n,j) / 2ⁿ.
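A quick simulation of the (lazy) urn chain against the binomial stationary distribution (Python; the laziness and parameters are my choices for the demo):

    import random
    from math import comb

    n, steps = 10, 200000
    j = 0                                  # balls currently in the first urn
    counts = [0] * (n + 1)
    for _ in range(steps):
        if random.random() < 0.5:          # lazy step (kills periodicity)
            pass
        elif random.random() < j / n:      # the chosen ball was in urn 1
            j -= 1
        else:                              # the chosen ball was in urn 2
            j += 1
        counts[j] += 1

    for k in range(n + 1):                 # empirical vs pi(j) = C(n,j)/2^n
        print(k, counts[k] / steps, comb(n, k) / 2 ** n)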
31. Example 3: Card Shuffling

- Top-in-at-random: take the top card and insert it at a position chosen u.a.r.
- Irreducible.
- Aperiodic.
- P is doubly stochastic: ∀y, Σ_x P(x,y) = 1.
- ⇒ π is uniform: ∀x, π(x) = 1/n!.
32. Example 3: Card Shuffling (cont.)

- Random transpositions: pick two positions u.a.r. and swap the cards there.
- Irreducible.
- Aperiodic.
- P is symmetric: ∀x, y, P(x,y) = P(y,x).
- ⇒ π is uniform.
33. Example 3: Card Shuffling (cont.)

- Riffle shuffle [Gilbert/Shannon/Reeds]: cut the deck into two packets and interleave them at random.
34. Example 3: Card Shuffling (cont.)

- Riffle shuffle [Gilbert/Shannon/Reeds]:
- Irreducible.
- Aperiodic.
- P is doubly stochastic.
- ⇒ π is uniform.
35. Example 4: Colorings of a Graph

- G = (V,E) connected, undirected.
- q = number of colors.
- Ω = set of proper q-colorings of G.
- π = uniform.
36. Colorings: Markov Chain

- Pick v ∈ V and c ∈ {1, …, q} u.a.r.
- Recolor v with c if possible (i.e., if no neighbor of v has color c).

Let Δ = G's max degree.

- Irreducible if q ≥ Δ + 2.
- Aperiodic.
- P is symmetric.
- ⇒ π is uniform.
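A direct implementation of this chain (Python, my own illustration) on a cycle, where Δ = 2 and q = 4 = Δ + 2 suffices for irreducibility:

    import random

    n, q, steps = 8, 4, 100000
    nbrs = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
    col = {v: v % 2 for v in range(n)}     # a proper 2-coloring of an even cycle

    for _ in range(steps):
        v = random.randrange(n)            # pick v in V u.a.r.
        c = random.randrange(q)            # pick c in {0,...,q-1} u.a.r.
        if all(col[u] != c for u in nbrs[v]):
            col[v] = c                     # recolor v with c if possible
    print(col)                             # an (approximately uniform) proper coloring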
37. Example 5: The Ising Model

- n sites; Ω = {−,+}ⁿ; w(x) = λ^{# aligned neighbors(x)}.
- Markov chain (heat bath):
  - pick a site i u.a.r.;
  - replace the spin x(i) by a random spin x′(i) drawn from the conditional distribution given the spins of the neighbors of i.
- Irreducible, aperiodic, reversible w.r.t. π ⇒ converges to π.

[Figure: a grid of ± spins]
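A heat-bath sketch for this model (Python, my own illustration; torus boundary, λ and grid size are arbitrary demo parameters). The conditional update gives spin s probability proportional to λ^{# neighbors aligned with s}:

    import random

    L, lam, steps = 8, 1.5, 200000
    spin = {(r, c): random.choice([-1, 1]) for r in range(L) for c in range(L)}

    def nbrs(r, c):                        # 4 neighbors on an L x L torus
        return [((r + dr) % L, (c + dc) % L)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]

    for _ in range(steps):
        r, c = random.randrange(L), random.randrange(L)  # pick a site u.a.r.
        a_plus = sum(spin[p] == 1 for p in nbrs(r, c))
        w_plus, w_minus = lam ** a_plus, lam ** (4 - a_plus)
        # resample the spin from its conditional distribution given its neighbors
        spin[(r, c)] = 1 if random.random() < w_plus / (w_plus + w_minus) else -1

    print(sum(spin.values()) / L ** 2)     # magnetization of the sampled configuration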
38. Designing Markov Chains

- What do we want? Given Ω and π, an MC over Ω which converges to π.
39. The Metropolis Rule

- Define any connected undirected graph on Ω (a neighborhood structure of (local) moves).
40. The Metropolis Rule (cont.)

- Transitions from state x ∈ Ω:
  - pick a neighbor y of x w.p. γ(x,y);
  - move to y w.p. min{w(y)/w(x), 1} (else stay at x);
  - here γ(x,y) = γ(y,x) and γ(x,x) = 1 − Σ_{y≠x} γ(x,y).
- Irreducible.
- Aperiodic (make lazy if necessary).
- Reversible w.r.t. w ⇒ converges to π.
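A generic Metropolis sampler along these lines (Python, my own illustration; the target w and the cycle move-graph, which is 2-regular so γ(x,y) = γ(y,x) automatically, are toy choices):

    import random

    def metropolis(w, neighbors, x0, steps):
        x = x0
        for _ in range(steps):
            y = random.choice(neighbors(x))          # propose a neighbor u.a.r.
            if random.random() < min(w(y) / w(x), 1.0):
                x = y                                # accept; else stay at x
        return x

    m = 20                                           # states 0..m-1 on a cycle
    w = lambda x: 10.0 if x in (3, 4, 5) else 1.0    # states 3,4,5 are 10x heavier
    neighbors = lambda x: [(x - 1) % m, (x + 1) % m]
    samples = [metropolis(w, neighbors, 0, 500) for _ in range(2000)]
    print(sum(s in (3, 4, 5) for s in samples) / 2000)  # expect ~30/47 = 0.64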
41. The Mixing Time

- Key question: how long until p_xᵗ looks like π?
- We will use the variation distance: ‖μ − ν‖ = ½ Σ_{y∈Ω} |μ(y) − ν(y)| = max_{A⊆Ω} |μ(A) − ν(A)|.
42. The Mixing Time (cont.)

- Define:
  - Δ_x(t) = ‖p_xᵗ − π‖
  - Δ(t) = max_x Δ_x(t)
- The mixing time is τ_mix = min{t : Δ(t) ≤ 1/2e}.
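For chains small enough to hold P in memory, Δ(t) and τ_mix can be computed exactly (Python with numpy; the lazy cycle walk is my own example):

    import numpy as np

    def mixing_time(P, eps=1 / (2 * np.e)):
        # Delta(t) = max_x (1/2) * sum_y |P^t(x,y) - pi(y)|
        vals, vecs = np.linalg.eig(P.T)
        pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
        pi /= pi.sum()
        Pt, t = np.eye(P.shape[0]), 0
        while True:
            t += 1
            Pt = Pt @ P
            if 0.5 * np.abs(Pt - pi).sum(axis=1).max() <= eps:
                return t

    n = 8                                   # lazy random walk on an n-cycle
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] = 0.5
        P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.25
    print(mixing_time(P))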
43. Toy Example: Top-In-At-Random

- Let T = the time just after the initial bottom card reaches the top (and is inserted).
- T is a strong stationary time, i.e., Pr[X_t = x | t ≥ T] = π(x).
- Claim: Δ(t) ≤ Pr[T > t].
- Thus, it remains to estimate T.
44. The Coupon Collector Problem

- Each pack contains one coupon.
- The goal is to complete the series of all N coupons.
- How many packs must we buy?
45. The Coupon Collector Problem (cont.)

- N = total number of different coupons.
- X_i = time to get the i-th new coupon; E[X_i] = N/(N − i + 1).
- E[T] = Σᵢ E[X_i] = N·H_N ≈ N ln N.
46. Toy Example: Top-In-At-Random (cont.)

- By the coupon collector argument:
- the i-th coupon is a ticket to advance from the (n − i + 1)-th level to the next one.
- Pr[T > n ln n + cn] ≤ e^{−c}
- ⇒ τ_mix ≤ n ln n + cn.
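The tail bound is easy to check by simulation (Python, my own illustration): track the original bottom card; it moves up one place exactly when the top card is inserted at or below it.

    import math, random

    def T_sample(n):
        pos = n - 1                   # position of the original bottom card (0 = top)
        t = 0
        while pos > 0:
            t += 1
            j = random.randrange(n)   # insertion position of the top card
            if j >= pos:              # inserted at or below our card: it moves up
                pos -= 1
        return t + 1                  # one more shuffle randomizes the top card

    n, c, trials = 52, 2, 2000
    cutoff = n * math.log(n) + c * n
    tail = sum(T_sample(n) > cutoff for _ in range(trials)) / trials
    print(tail, math.exp(-c))         # empirical tail vs the e^{-c} bound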
47. Example: Riffle Shuffle
48. Example: Riffle Shuffle (cont.)

- Inverse shuffle (same mixing time): label each card with a bit 0/1 chosen u.a.r., then sort the cards stably by label.

[Figure: a deck of 8 cards with random 0/1 labels, before and after the stable sort]
49. Inverse Shuffle

- After t steps, each card is labeled with t bits.
- Cards are sorted by their labels.
- Cards with different labels are in random order.
- Cards with the same label are in their original order.

[Figure: cards carrying 3-bit labels after t = 3 steps]
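A direct implementation of one inverse-shuffle step (Python; deck size and step count are arbitrary demo choices):

    import random

    def inverse_riffle(deck):
        # Label each card 0/1 u.a.r., then sort stably by label:
        # 0-labeled cards move to the top, relative orders are preserved.
        labels = [random.randint(0, 1) for _ in deck]
        return ([card for card, b in zip(deck, labels) if b == 0] +
                [card for card, b in zip(deck, labels) if b == 1])

    deck = list(range(8))
    for t in range(5):
        deck = inverse_riffle(deck)
        print(t + 1, deck)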
50. Riffle Shuffle (cont.)

- Let T = time until all cards have distinct labels.
- T is a strong stationary time.
- Again we need to estimate T.
51. Birthday Paradox

- k people in a room: with what probability do two of them have the same birthday?
52. Birthday Paradox (cont.)

- k people, n days (n > k > 1).
- The probability that all birthdays are distinct is
  Π_{i=1}^{k−1} (1 − i/n) ≈ e^{−Σᵢ i/n} = e^{−k(k−1)/2n}
  (the exponent is an arithmetic sum).
53. Riffle Shuffle (cont.)

- By the birthday paradox:
- each card (1..n) picks a random label;
- there are 2ᵗ possible labels;
- we want all labels to be distinct, which happens w.h.p. once 2ᵗ ≫ n², i.e., t ≈ 2 log₂ n.
- ⇒ τ_mix = O(log n).
54. General Techniques for Bounding the Mixing Time

- Probabilistic: Coupling
- Combinatorial: Flows
- Geometric: Conductance
55. Coupling
56. Mixing Time via Coupling

- Let P be an ergodic MC. A coupling for P is a pair process (X_t, Y_t) s.t.:
  - X_t and Y_t are each (marginally) copies of P;
  - X_t = Y_t ⇒ X_{t+1} = Y_{t+1}.
- Define T_{xy} = min{t : X_t = Y_t | X₀ = x, Y₀ = y}.
57. Coupling Theorem

- Theorem [Aldous et al.]: Δ(t) ≤ max_{x,y} Pr[T_{x,y} > t].

Strategy: design a coupling that brings X and Y together fast.
58. Example 1: Random Walk on the Cube

- Ω = {0,1}ⁿ, π uniform.
- Markov chain:
  - pick a coordinate i ∈_R {1, …, n};
  - pick a value b ∈_R {0,1};
  - set x(i) = b.

(Self-loop w.p. 1/2; each of the n neighbors reached w.p. 1/2n — on the 3-cube pictured, 1/6.)
59. Coupling for Random Walk

- Pick the same i, b for both X and Y.
- Once a coordinate has been picked, it agrees in X and Y forever, so T_{xy} ≤ time to hit all n coordinates.
- By coupon collecting, Pr[T_{xy} > n ln n + cn] < e^{−c}
- ⇒ τ_mix ≤ n ln n + cn.

Example move (n = 6, picking i = 6, b = 1):
X = (0,0,1,0,1,1) → (0,0,1,0,1,1)
Y = (1,1,0,0,1,0) → (1,1,0,0,1,1)
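A simulation of this coupling (Python, my own illustration), starting the two walks from antipodal corners and recording the meeting time:

    import math, random

    def meeting_time(n):
        X = [random.randint(0, 1) for _ in range(n)]
        Y = [1 - x for x in X]              # start from antipodal states
        t = 0
        while X != Y:
            t += 1
            i = random.randrange(n)         # the SAME (i, b) for both chains,
            b = random.randint(0, 1)        # so a touched coordinate agrees forever
            X[i] = Y[i] = b
        return t

    n, trials = 20, 1000
    times = [meeting_time(n) for _ in range(trials)]
    print(max(times), n * math.log(n) + 2 * n)   # compare with n ln n + cn, c = 2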
60. Flow

- Capacity of edge e = (z, z′): C(e) = π(z)P(z, z′).
- Flow along e: denoted f(e).
- A flow routes π(x)π(y) units from x to y, for every pair x, y.
- ℓ(f) = length of the longest flow-carrying path (at least the diameter).
- ρ(f) = max_e f(e)/C(e), the congestion of the flow.
61. Flow Theorem

- Theorem [Diaconis/Stroock, Jerrum/Sinclair]: For a lazy ergodic MC and any flow f,
  τ_x(ε) ≤ 2ρ(f)ℓ(f)(ln π(x)⁻¹ + 2 ln ε⁻¹),
  where τ_x(ε) = min{t : Δ_x(t) ≤ ε}.
62. Example 1: Random Walk on the Cube

- Ω = {0,1}ⁿ, |Ω| = 2ⁿ = N, ∀x π(x) = 1/N.
- (Transitions: self-loop w.p. 1/2, each of the n neighbors w.p. 1/2n.)
- Flow f: route the (x,y) flow evenly along all shortest paths x → y.
- Then ρ(f) = O(n) and ℓ(f) ≤ n,
- ⇒ τ_mix ≤ const · ρ(f)ℓ(f) · ln π_min⁻¹ = O(n³).
63. Conductance

- Φ = min over S ⊆ Ω with π(S) ≤ 1/2 of [Σ_{x∈S, y∉S} π(x)P(x,y)] / π(S).
- Φ measures the worst bottleneck of the chain.
64. Conductance (cont.)

[Figure: a bottleneck set S and its complement Ω − S]
65. Conductance Theorem

- Theorem [Jerrum/Sinclair, Lawler/Sokal, Alon, Cheeger]: For a lazy reversible MC,
  τ_x(ε) ≤ (2/Φ²)(ln π(x)⁻¹ + ln ε⁻¹).
66. Example 1: Random Walk on the Cube

- The sketched S (a half-cube, e.g. S = {x : x(1) = 0}, with Φ = 1/2n) is (essentially) the worst S.
- ⇒ τ_mix = O(Φ⁻² log π_min⁻¹) = O(n³). A brute-force check of Φ follows.
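A brute-force check of the conductance on a tiny cube (Python with numpy, my own illustration; exponential in |Ω|, so only for toy sizes):

    import itertools
    import numpy as np

    def conductance(P, pi):
        # Phi = min over S with pi(S) <= 1/2 of flow(S, complement) / pi(S)
        m = len(pi)
        best = float("inf")
        for r in range(1, m):
            for S in itertools.combinations(range(m), r):
                S = list(S)
                if pi[S].sum() > 0.5:
                    continue
                notS = [x for x in range(m) if x not in S]
                flow = (pi[S][:, None] * P[np.ix_(S, notS)]).sum()
                best = min(best, flow / pi[S].sum())
        return best

    n = 3                                    # lazy walk on the cube {0,1}^3
    N = 2 ** n
    P = np.zeros((N, N))
    for x in range(N):
        P[x, x] = 0.5
        for i in range(n):
            P[x, x ^ (1 << i)] += 0.5 / n
    pi = np.ones(N) / N
    print(conductance(P, pi), 1 / (2 * n))   # the half-cube achieves 1/2n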