Mathematical Foundations of Markov Chain Monte Carlo Algorithms


1
Mathematical Foundations of Markov Chain Monte Carlo Algorithms

Based on lectures given by
Alistair Sinclair
Computer Science Division
U.C. Berkeley

2
Overview

Random Sampling
The Markov Chain Monte Carlo Paradigm
Mixing Time
Techniques for Bounding the Mixing Time:
    Coupling
    Flow
    Geometry
3
Random Sampling

$\Omega$ - a very large sample set.
$\pi$ - a probability distribution over $\Omega$.

Goal: sample points $x \in \Omega$ at random from the distribution $\pi$.
4
The Probability Distribution

Typically, $\pi(x) = w(x)/Z$, where

$w : \Omega \to \mathbb{R}^{+}$ is an easily computed weight function;
$Z = \sum_{x} w(x)$ is an unknown normalization factor.
5
Application 1: Card Shuffling

$\Omega$ - all 52! permutations of a deck of cards.
$\pi$ - uniform distribution: $\forall x$: $w(x) = 1$.

Goal: pick a permutation uniformly at random.
6
Application 2: Counting

How many ways can we tile some given pattern with dominoes?

7
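Application 2: Counting (cont.)

Sample tilings uniformly at random.
Let $P_1$ = the proportion of the sample that is of type 1.
Compute an estimate $\hat{N}_1$ of $N_1$ recursively.
Output $\hat{N} = \hat{N}_1 / P_1$.

Here $N_1$ and $N_2$ are the numbers of tilings of type 1 and type 2, so $N = N_1 + N_2$.
Sample size $O(n)$ per level, $O(n)$ levels, hence $O(n^2)$ samples in total.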
8
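Application 3: Volume and Integration [Dyer/Frieze/Kannan]

$\Omega$ - a convex body in $\mathbb{R}^d$ ($d$ large).
Problem: estimate $\mathrm{vol}(\Omega)$.

Take a sequence of concentric balls $B_0 \subseteq \dots \subseteq B_r$.
Estimate each ratio $\mathrm{vol}(\Omega \cap B_{i-1}) / \mathrm{vol}(\Omega \cap B_i)$ by sampling uniformly from $\Omega \cap B_i$.
Generalization: integration of a log-concave function over a cube $A \subseteq \mathbb{R}^d$.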
9
Application 4: Statistical Physics

$\Omega$ - set of configurations of a physical system.
$\pi$ - Gibbs distribution:
$\pi(x) = \Pr[\text{system in configuration } x] = w(x)/Z$,
where $w(x) = e^{-H(x)/KT}$, $H(x)$ is the energy of $x$, and $T$ is the temperature.
10
The Ising Model

$n$ atomic magnets.
Configuration $x \in \{-,+\}^n$.
$H(x) = -(\text{number of aligned neighbors})$.

[Figure: a grid of + and - spins]

11
Why Sampling?

Statistics of typical configurations.
Mean energy ($E_\pi[H(x)]$), specific heat, ...
Estimate of the partition function $Z = Z(T) = \sum_{x \in \Omega} w(x)$.

12
Estimating the Partition Function

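Let $\lambda = e^{1/KT}$, so that $Z = Z(\lambda) = \sum_{x \in \Omega} \lambda^{-H(x)}$.
Define $1 = \lambda_0 < \lambda_1 < \dots < \lambda_r = \lambda$. Then

$Z(\lambda) = Z(\lambda_0) \prod_{i=1}^{r} \frac{Z(\lambda_i)}{Z(\lambda_{i-1})}$, with $Z(\lambda_0) = Z(1) = |\Omega|$.

Each ratio $Z(\lambda_i)/Z(\lambda_{i-1})$ can be estimated by random sampling from $\pi_{\lambda_{i-1}}$.
Choosing $\lambda_i \le \lambda_{i-1}(1 + 1/n)$ ensures small variance, so $O(n)$ samples suffice for each ratio.
This gives $r \approx n \log \lambda = O(n^2)$.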
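
The reason a ratio of partition functions can be estimated by sampling is that it is an expectation under the Gibbs distribution at the lower parameter. The derivation below is a sketch; the notation $\pi_{\lambda}(x) = \lambda^{-H(x)} / Z(\lambda)$ is introduced here for illustration.

```latex
\frac{Z(\lambda_i)}{Z(\lambda_{i-1})}
  = \sum_{x \in \Omega} \frac{\lambda_{i-1}^{-H(x)}}{Z(\lambda_{i-1})}
    \left(\frac{\lambda_i}{\lambda_{i-1}}\right)^{-H(x)}
  = \mathbb{E}_{x \sim \pi_{\lambda_{i-1}}}
    \left[ \left(\frac{\lambda_i}{\lambda_{i-1}}\right)^{-H(x)} \right]
```

Averaging $(\lambda_i/\lambda_{i-1})^{-H(x)}$ over samples $x$ drawn from $\pi_{\lambda_{i-1}}$ therefore gives an unbiased estimate of the ratio.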
13
Application 5: Optimization

$\Omega$ - set of feasible solutions to an optimization problem.
$f(x)$ - value of solution $x$.
Goal: maximize $f(x)$.
Idea: sample solutions where $w(x) = \lambda^{f(x)}$.

14
Application 5: Optimization

Idea: sample solutions where $w(x) = \lambda^{f(x)}$.

Large $\lambda$: concentration on good solutions (large values of $f(x)$).
Small $\lambda$: greater mobility (local optima are less high).
Simulated annealing heuristic: slowly increase $\lambda$.
15
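Application 6: Hypothesis Verification in Statistical Models

$\Theta$ - set of hypotheses.
$X$ - observed data.

Let $w(\theta) = P(\theta) P(X \mid \theta)$, where $P(\theta)$ is the prior and $P(X \mid \theta)$ is easy to compute.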
16
Application 6: Hypothesis Verification in Statistical Models (cont.)

Sampling from $\pi(\theta) = P(\theta \mid X)$ gives:
a statistical estimate of the hypothesis $\theta$;
prediction;
model comparison: the normalization factor is $P(X) = \Pr[\text{model generated } X]$.

17
Markov Chains

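Sample space $\Omega$.
Random variables (r.v.) over $\Omega$: $X_1, X_2, \dots, X_t, \dots$
Memoryless: $\forall t > 0$, $\forall x_1, \dots, x_{t+1} \in \Omega$,
$\Pr[X_{t+1} = x_{t+1} \mid X_t = x_t, \dots, X_1 = x_1] = \Pr[X_{t+1} = x_{t+1} \mid X_t = x_t]$.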

18
Sampling Algorithm

Start at an arbitrary state $X_0$.
Simulate the MC for sufficiently many steps $t$.
Output $X_t$.
Then, $\forall x \in \Omega$: $\Pr[X_t = x] \approx \pi(x)$.
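
The three steps of the sampling algorithm can be sketched in a few lines of Python. This is a minimal illustration, not part of the lectures: the 3-state matrix `P`, the helper name `simulate_chain`, and the step and sample counts are all assumptions chosen so that the stationary distribution (0.25, 0.5, 0.25) is easy to check by hand.

```python
import random

def simulate_chain(P, x0, t, rng):
    """Run the chain with transition matrix P for t steps from x0 and return X_t."""
    x = x0
    states = range(len(P))
    for _ in range(t):
        # Draw the next state from row x of P.
        x = rng.choices(states, weights=P[x])[0]
    return x

# Hypothetical 3-state ergodic chain (each row sums to 1).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

rng = random.Random(0)
samples = [simulate_chain(P, x0=0, t=30, rng=rng) for _ in range(5000)]
# Empirical frequencies of X_t; these should be close to (0.25, 0.5, 0.25).
freq = [samples.count(s) / len(samples) for s in range(3)]
print(freq)
```

How large $t$ must be for the output to be close to $\pi$ is exactly the mixing-time question.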
19
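Transition Matrix

$P(x, y) = \Pr[X_{t+1} = y \mid X_t = x]$

$P$ is non-negative.
$P$ is stochastic ($\forall x$: $\sum_{y} P(x, y) = 1$).
$\Pr[X_t = y \mid X_0 = x] = P^t(x, y)$.
$p_x^t = p_x^0 P^t$, where $p_x^t$ is the distribution of $X_t$ given $X_0 = x$.
Definition: $\pi$ is a stationary distribution if $\pi P = \pi$.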
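
These matrix identities are easy to check numerically on a small chain. The sketch below assumes a hypothetical 3-state matrix `P` and a hand-picked candidate `pi`; `vec_mat` is an illustrative helper for the row-vector product.

```python
def vec_mat(p, P):
    """Row vector times matrix: (pP)(y) = sum_x p(x) P(x, y)."""
    n = len(P)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

# Hypothetical 3-state chain.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

# Stochastic: every row sums to 1.
row_sums = [sum(row) for row in P]

# Distribution after 20 steps, starting from the point mass at state 0,
# computed by repeated right-multiplication by P.
p = [1.0, 0.0, 0.0]
for _ in range(20):
    p = vec_mat(p, P)

# Stationarity check: pi P should equal pi.
pi = [0.25, 0.5, 0.25]
pi_P = vec_mat(pi, P)
print(p, pi_P)
```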
20
Irreducibility

Definition: $P$ is irreducible if $\forall x, y \in \Omega$, $\exists t$: $P^t(x, y) > 0$.

21
Aperiodicity

Definition: $P$ is aperiodic if $\forall x, y \in \Omega$: $\gcd\{t : P^t(x, y) > 0\} = 1$.

22
Note on Irreducibility and Aperiodicity

If $P$ is irreducible, we can always make it aperiodic by adding self-loops:
$P' = \frac{1}{2}(P + I)$.
$P'$ has the same stationary distribution as $P$.
Call $P'$ a lazy MC.
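
The effect of adding self-loops can be seen on a periodic example. The sketch below uses the random walk on a 4-cycle as an assumed illustration (it is irreducible but periodic); the helper names `lazy`, `vec_mat`, and `run` are hypothetical.

```python
def lazy(P):
    """Return the lazy chain (P + I) / 2."""
    n = len(P)
    return [[(P[x][y] + (1.0 if x == y else 0.0)) / 2 for y in range(n)]
            for x in range(n)]

def vec_mat(p, P):
    """Row vector times matrix."""
    n = len(P)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

def run(P, t):
    """Distribution after t steps, started from state 0."""
    p = [1.0, 0.0, 0.0, 0.0]
    for _ in range(t):
        p = vec_mat(p, P)
    return p

# Random walk on a 4-cycle: irreducible, but periodic with period 2.
P = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]
P_lazy = lazy(P)
pi = [0.25, 0.25, 0.25, 0.25]

p_periodic = run(P, 51)    # odd time: all mass sits on states 1 and 3
p_lazy = run(P_lazy, 51)   # the lazy chain converges to uniform
print(p_periodic, p_lazy)
```

Both chains have the uniform distribution as stationary distribution, but only the lazy one converges to it.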

23
Fundamental Theorem

Theorem: If $P$ is irreducible and aperiodic, then it is ergodic, i.e.,
$\forall x, y \in \Omega$: $\lim_{t \to \infty} P^t(x, y) = \pi(y)$,
where $\pi$ is the (unique) stationary distribution of $P$, i.e., $\pi P = \pi$.

24
Main Idea (The MCMC Paradigm)

An ergodic MC provides an effective algorithm for sampling from $\pi$.

25
Examples

Random Walks on Graphs
Ehrenfest Urn
Card Shuffling
Coloring of a Graph
The Ising Model

26
1. Random Walk on Undirected Graphs
At each node, choose a neighbor u.a.r. and jump to it.
27
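Random Walk on an Undirected Graph $G = (V, E)$

$\Omega = V$
$P(x, y) = 1/d(x)$ if $(x, y) \in E$, and $0$ otherwise, where $d(x)$ is the degree of $x$.

Irreducible $\Leftrightarrow$ $G$ is connected.
Aperiodic $\Leftrightarrow$ $G$ is not bipartite.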

28
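Random Walk: The Stationary Distribution

Claim: If $G$ is connected and not bipartite, then the probability distribution induced by a random walk on it converges to $\pi(x) = d(x) / \sum_{x'} d(x') = d(x) / 2|E|$.
(Non-bipartiteness is not essential for stationarity: $\pi P = \pi$ holds on any connected graph, but a bipartite graph makes the walk periodic.)
Proof: $(\pi P)(y) = \sum_{x : (x, y) \in E} \frac{d(x)}{2|E|} \cdot \frac{1}{d(x)} = \frac{d(y)}{2|E|} = \pi(y)$, so $\pi$ is stationary; convergence follows because the walk is irreducible and aperiodic.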
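
The claim that the stationary probability of a vertex is its degree divided by twice the number of edges can be checked exactly on a small graph. The graph `G` below is a hypothetical example (connected, and it contains a triangle, so it is not bipartite); exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

# Hypothetical graph as an adjacency list: a triangle 0-1-2 with a pendant vertex 3.
G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

num_edges = sum(len(nbrs) for nbrs in G.values()) // 2
pi = {x: Fraction(len(G[x]), 2 * num_edges) for x in G}

# (pi P)(y) = sum over neighbors x of y of pi(x) * (1 / d(x)).
pi_P = {y: sum(pi[x] * Fraction(1, len(G[x])) for x in G[y]) for y in G}
print(pi, pi_P)
```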
29
2. Ehrenfest Urn
First urn: $j$ balls; second urn: $(n - j)$ balls.

Pick a ball u.a.r.
Move the ball to the other urn.

30
2. Ehrenfest Urn

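$X_t$ = number of balls in the first urn.
The MC is a non-uniform random walk on $\Omega = \{0, 1, \dots, n\}$:
from state $j$, move to $j - 1$ w.p. $j/n$ and to $j + 1$ w.p. $1 - j/n$.

Irreducible.
Periodic.
Stationary distribution: $\pi(j) = \binom{n}{j} / 2^n$.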
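
The stationary distribution of the Ehrenfest urn is the binomial distribution $\binom{n}{j}/2^n$, a standard fact that the sketch below verifies exactly for a small $n$. The function name `ehrenfest_matrix` and the choice $n = 6$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def ehrenfest_matrix(n):
    """Transition matrix on {0, ..., n}: j -> j-1 w.p. j/n, j -> j+1 w.p. 1 - j/n."""
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        if j > 0:
            P[j][j - 1] = Fraction(j, n)
        if j < n:
            P[j][j + 1] = 1 - Fraction(j, n)
    return P

n = 6
P = ehrenfest_matrix(n)
pi = [Fraction(comb(n, j), 2 ** n) for j in range(n + 1)]
# Stationarity check: (pi P)(y) for every y.
pi_P = [sum(pi[x] * P[x][y] for x in range(n + 1)) for y in range(n + 1)]
print(pi_P == pi)
```

Because the chain is periodic, $\pi$ is stationary but the chain started from a fixed state does not converge to it unless it is made lazy.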

31
3. Card Shuffling

Top-in-at-random

Irreducible
Aperiodic
$P$ is doubly stochastic: $\forall y$: $\sum_{x} P(x, y) = 1$
$\Rightarrow \pi$ is uniform: $\forall x$: $\pi(x) = 1/n!$

32
3. Card Shuffling

Random Transpositions

Irreducible
Aperiodic
$P$ is symmetric: $\forall x, y$: $P(x, y) = P(y, x)$
$\Rightarrow \pi$ is uniform

33
3. Card Shuffling

Riffle shuffle [Gilbert/Shannon/Reeds]

34
3. Card Shuffling

Riffle shuffle [Gilbert/Shannon/Reeds]

Irreducible
Aperiodic
$P$ is doubly stochastic
$\Rightarrow \pi$ is uniform

35
4. Colorings of a graph

$G = (V, E)$: connected, undirected.
$q$ = number of colors.
$\Omega$ = set of proper $q$-colorings of $G$.
$\pi$ = uniform.

36
Colorings Markov Chain

Pick $v \in V$ and $c \in \{1, \dots, q\}$ u.a.r.
Recolor $v$ with $c$ if possible.

$\Delta$ = maximum degree of $G$.

Irreducible if $q \ge \Delta + 2$
Aperiodic
$P$ is symmetric
$\Rightarrow \pi$ is uniform
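
One step of the colorings chain can be sketched as follows. The example graph (a 4-cycle), the starting coloring, and the function names are assumptions made for illustration; colors are numbered from 0 to q-1 here rather than from 1 to q.

```python
import random

def coloring_step(G, coloring, q, rng):
    """Pick v and c u.a.r.; recolor v with c if no neighbor of v has color c."""
    v = rng.choice(list(G))
    c = rng.randrange(q)
    if all(coloring[u] != c for u in G[v]):
        coloring[v] = c
    return coloring

def is_proper(G, coloring):
    """True if no edge has both endpoints in the same color."""
    return all(coloring[u] != coloring[v] for u in G for v in G[u])

# Hypothetical example: a 4-cycle (maximum degree 2) with q = 4 colors.
G = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
q = 4
coloring = {0: 0, 1: 1, 2: 0, 3: 1}

rng = random.Random(1)
for _ in range(1000):
    coloring_step(G, coloring, q, rng)
print(coloring, is_proper(G, coloring))
```

Every step keeps the coloring proper, so the chain never leaves the set of proper colorings.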

37
5. The Ising Model

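$n$ sites, $\Omega = \{-,+\}^n$, $w(x) = \lambda^{\#\text{aligned neighbors}(x)}$.

Markov chain (heat bath):
pick a site $i$ u.a.r.;
replace the spin $x(i)$ by a random spin $x'(i)$ such that
$\Pr[x'(i) = +] = \frac{\lambda^{\#\{+\text{ neighbors of } i\}}}{\lambda^{\#\{+\text{ neighbors of } i\}} + \lambda^{\#\{-\text{ neighbors of } i\}}}$.

[Figure: a grid of + and - spins]

Irreducible, aperiodic, and reversible w.r.t. $\pi$, hence converges to $\pi$.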
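
A heat-bath update resamples one spin from its conditional distribution given its neighbors. The sketch below assumes the weight is lambda raised to the number of aligned neighboring pairs, so the conditional probability of a plus spin depends only on the counts of plus and minus neighbors; the 4-site cycle and the names `prob_plus` and `heat_bath_step` are hypothetical.

```python
import random

def prob_plus(plus, minus, lam):
    """Conditional probability of spin + given the numbers of + and - neighbors."""
    return lam ** plus / (lam ** plus + lam ** minus)

def heat_bath_step(spins, neighbors, lam, rng):
    """Pick a site u.a.r. and resample its spin given its neighbors."""
    i = rng.randrange(len(spins))
    plus = sum(1 for j in neighbors[i] if spins[j] == +1)
    minus = len(neighbors[i]) - plus
    spins[i] = +1 if rng.random() < prob_plus(plus, minus, lam) else -1
    return spins

# Hypothetical example: 4 sites arranged on a cycle.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
spins = [+1, -1, +1, -1]

rng = random.Random(0)
for _ in range(500):
    heat_bath_step(spins, neighbors, lam=2.0, rng=rng)
print(spins)
```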
38
Designing Markov Chains

What do we want?
Given $\Omega$ and $\pi$:
an MC over $\Omega$ which converges to $\pi$.

39
The Metropolis Rule

Define any connected undirected graph on $\Omega$
(neighborhood structure / (local) moves).

40
The Metropolis Rule

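Transitions from state $x \in \Omega$:
pick a neighbor $y$ of $x$ w.p. $\kappa(x, y)$;
move to $y$ w.p. $\min\{w(y)/w(x), 1\}$
(else stay at $x$).

$\kappa(x, y) = \kappa(y, x)$, $\kappa(x, x) = 1 - \sum_{y \ne x} \kappa(x, y)$

Irreducible
Aperiodic (make lazy if necessary)
Reversible w.r.t. $w$
$\Rightarrow$ converges to $\pi$.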
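
The Metropolis rule can be checked exactly on a small state space by building its transition matrix and testing reversibility. The sketch below assumes a path graph on four states, integer weights, and a uniform symmetric proposal probability per neighbor; all names and numbers are hypothetical.

```python
from fractions import Fraction

def metropolis_matrix(neighbors, w, kappa):
    """Transition matrix of the Metropolis rule with proposal probability kappa per neighbor."""
    states = list(neighbors)
    P = {x: {y: Fraction(0) for y in states} for x in states}
    for x in states:
        for y in neighbors[x]:
            # Propose y w.p. kappa, accept w.p. min(w(y) / w(x), 1).
            P[x][y] = kappa * min(Fraction(w[y], w[x]), Fraction(1))
        # Remaining probability: stay at x (rejections plus the self-proposal).
        P[x][x] = 1 - sum(P[x][y] for y in neighbors[x])
    return P

# Hypothetical example: the path 0 - 1 - 2 - 3 with integer weights.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
w = {0: 1, 1: 4, 2: 2, 3: 3}
P = metropolis_matrix(neighbors, w, kappa=Fraction(1, 4))

Z = sum(w.values())
pi = {x: Fraction(w[x], Z) for x in w}
pi_P = {y: sum(pi[x] * P[x][y] for x in P) for y in P}
# Detailed balance: pi(x) P(x, y) = pi(y) P(y, x) for all x, y.
reversible = all(pi[x] * P[x][y] == pi[y] * P[y][x] for x in P for y in P)
print(reversible, pi_P == pi)
```

Note that the rule only ever uses ratios of weights, so the normalization factor $Z$ is needed here for the check but not for running the chain.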

41
The Mixing Time

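Key question: how long until $p_x^t$ looks like $\pi$?
We will use the variation distance:
$\|\mu - \nu\| = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)| = \max_{S \subseteq \Omega} |\mu(S) - \nu(S)|$.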

42
The Mixing Time

Define
$\Delta_x(t) = \|p_x^t - \pi\|$
$\Delta(t) = \max_x \Delta_x(t)$
The mixing time is
$\tau_{mix} = \min\{t : \Delta(t) \le 1/2e\}$
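
For a chain small enough to store explicitly, the mixing time can be computed directly from this definition. The sketch below assumes the variation distance is half the L1 distance, and uses a hypothetical 3-state chain with stationary distribution (0.25, 0.5, 0.25); the helper names are illustrative.

```python
import math

def vec_mat(p, P):
    """Row vector times matrix."""
    n = len(P)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

def tv_distance(mu, nu):
    """Variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def mixing_time(P, pi, threshold=1 / (2 * math.e), t_max=10_000):
    """Smallest t such that the worst-case distance to pi is at most the threshold."""
    n = len(P)
    # dists[x] holds the distribution after t steps started from state x.
    dists = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv_distance(p, pi) for p in dists) <= threshold:
            return t
        dists = [vec_mat(p, P) for p in dists]
    raise RuntimeError("did not mix within t_max steps")

# Hypothetical 3-state chain.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
tau = mixing_time(P, pi)
print(tau)
```

This brute-force approach is only feasible for tiny state spaces, which is why the analytic techniques (coupling, flows, conductance) are needed in practice.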

43
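Toy Example: Top-In-At-Random

Let $T$ = the time just after the initial bottom card reaches the top (i.e., once it has been reinserted at random).
$T$ is a strong stationary time, i.e.,
$\Pr[X_t = x \mid t = T] = \pi(x)$.
Claim: $\Delta(t) \le \Pr[T > t]$.
Thus, it remains to estimate $T$.

[Figure: a deck of $n$ cards]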
44
The Coupon Collector Problem

Each pack contains one coupon.
The goal is to complete the series.
How many packs must we buy?

45
The Coupon Collector Problem

$N$ = total number of different coupons.
$X_i$ = time to get the $i$-th coupon.
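
The expected number of packs can be computed exactly. The sketch below assumes that $X_i$ is the waiting time for the $i$-th new coupon once $i - 1$ distinct coupons are held, so that $X_i$ is geometric with success probability $(N - i + 1)/N$; the function name and the choice $N = 52$ are illustrative.

```python
from fractions import Fraction
import math

def expected_collection_time(N):
    """Sum of E[X_i] = N / (N - i + 1) for i = 1, ..., N, i.e. N times the N-th harmonic number."""
    return sum(Fraction(N, N - i + 1) for i in range(1, N + 1))

N = 52
exact = expected_collection_time(N)
approx = N * math.log(N)  # leading-order term N ln N
print(float(exact), approx)
```

The exact expectation exceeds $N \ln N$ by roughly a constant times $N$, which is consistent with tail bounds of the form $N \ln N + cN$.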

46
Toy Example: Top-In-At-Random

By the coupon collector, the $i$-th coupon is a ticket to advance from level $(n - i + 1)$ to the next one.
$\Pr[T > n \ln n + cn] \le e^{-c}$
$\Rightarrow \tau_{mix} \le n \ln n + cn$ for a suitable constant $c$.
47
Example: Riffle Shuffle
48
Example: Riffle Shuffle

Inverse shuffle (same mixing time): label each card 0/1 u.a.r., then sort the cards stably by label.

[Figure: cards labeled 1 0 1 1 1 0 1 0 u.a.r., sorted stably to 0 0 0 1 1 1 1 1]
49
Inverse Shuffle

After $t$ steps, each card is labeled with $t$ digits.
Cards are sorted by their labels.
Cards with different labels are in random order.
Cards with the same label are in their original order.

[Figure: eight cards with 3-digit labels in sorted order: 000, 001, 010, 010, 011, 100, 101, 111]
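
The inverse shuffle and its accumulated labels can be sketched as below. This is an illustration under assumptions: cards are the integers 0 to n-1 in their original order, each new digit is prepended to the card's label so that the most recent digit is the most significant, and the function names are hypothetical.

```python
import random

def inverse_riffle(deck, rng):
    """Label each card 0/1 u.a.r., then sort the deck stably by that label."""
    bit = {card: rng.randrange(2) for card in deck}
    new_deck = sorted(deck, key=lambda card: bit[card])  # sorted() is stable
    return new_deck, bit

def shuffle_until_distinct(n, rng):
    """Repeat inverse shuffles until all labels are distinct; return deck, labels, steps."""
    deck = list(range(n))
    labels = {card: "" for card in deck}
    t = 0
    while len(set(labels.values())) < n:
        deck, bit = inverse_riffle(deck, rng)
        for card in deck:
            # The newest digit becomes the most significant one.
            labels[card] = str(bit[card]) + labels[card]
        t += 1
    return deck, labels, t

rng = random.Random(0)
deck, labels, T = shuffle_until_distinct(8, rng)
print(deck, labels, T)
```

When the loop stops, the deck is exactly the cards sorted by their labels, and the labels are distinct uniformly random strings, which is why the stopping time is a strong stationary time.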
50
Riffle Shuffle (Cont.)

Let $T$ = time until all cards have distinct labels.
$T$ is a strong stationary time.
Again we need to estimate $T$.

51
Birthday Paradox

What is the probability that two people in a group share the same birthday?

52
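Birthday Paradox (cont.)

$k$ people, $n$ days ($n > k > 1$).
The probability that all birthdays are distinct is
$\prod_{i=1}^{k-1} \left(1 - \frac{i}{n}\right) \le \exp\left(-\sum_{i=1}^{k-1} \frac{i}{n}\right) = \exp\left(-\frac{k(k-1)}{2n}\right)$,
where the exponent is an arithmetic sum.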
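
The probability that all birthdays are distinct, and the exponential upper bound obtained from $1 - x \le e^{-x}$ together with the arithmetic sum $1 + \dots + (k-1) = k(k-1)/2$, can be compared numerically. The function names below are hypothetical.

```python
import math

def prob_all_distinct(k, n):
    """Exact probability that k birthdays among n equally likely days are all distinct."""
    p = 1.0
    for i in range(1, k):
        p *= 1 - i / n
    return p

def upper_bound(k, n):
    """exp(-k(k-1) / (2n)), an upper bound on prob_all_distinct(k, n)."""
    return math.exp(-k * (k - 1) / (2 * n))

p22 = prob_all_distinct(22, 365)
p23 = prob_all_distinct(23, 365)
print(p22, p23, upper_bound(23, 365))
```

With 365 days, 23 people are the fewest for which a shared birthday is more likely than not.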
53
Riffle Shuffle (Cont.)

By the birthday paradox:
each card ($1, \dots, n$) picks a random label;
there are $2^t$ possible labels;
we want all labels to be distinct.
$\Rightarrow \tau_{mix} = O(\log n)$

54
General Techniques for Mixing Time

Probabilistic Coupling
Combinatorial Flows
Geometric - Conductance

55
Coupling
56
Mixing Time Via Coupling

Let $P$ be an ergodic MC. A coupling for $P$ is a pair process $(X_t, Y_t)$ such that:
$X_t$ and $Y_t$ are each copies of $P$;
$X_t = Y_t \Rightarrow X_{t+1} = Y_{t+1}$.
Define $T_{xy} = \min\{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}$.

57
Coupling Theorem

Theorem [Aldous et al.]:
$\Delta(t) \le \max_{x, y} \Pr[T_{xy} > t]$

Design a coupling that brings $X$ and $Y$ together fast.
58
1. Random Walk On Cube
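$\Omega = \{0,1\}^n$, $\pi$ is uniform.

Markov chain:
pick a coordinate $i \in_R \{1, \dots, n\}$;
pick a value $b \in_R \{0, 1\}$;
set $x(i) = b$.

[Figure: the cube for $n = 3$, with self-loop probability 1/2 and probability 1/6 on each edge]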
59
Coupling For Random Walk

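Pick the same $i, b$ for both $X$ and $Y$.
$T_{xy} \le$ time to hit all $n$ coordinates.
By coupon collecting,
$\Pr[T_{xy} > n \ln n + cn] < e^{-c}$
$\Rightarrow \tau_{mix} \le n \ln n + cn$ for a suitable constant $c$.

Example with $n = 6$: from $X = (0,0,1,0,1,1)$, $Y = (1,1,0,0,1,0)$, choosing $i = 6$, $b = 1$ gives $X = (0,0,1,0,1,1)$, $Y = (1,1,0,0,1,1)$.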
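
The coupling for the walk on the cube is simple to simulate: both copies use the same coordinate and the same value at every step. The sketch below estimates the tail of the coupling time from the worst-case start, where the two copies differ in every coordinate; the dimension, the constant `c`, and the number of trials are assumptions made for illustration.

```python
import math
import random

def coupling_time(x, y, rng):
    """Run the coupled walks (same i and b for both) until they meet; return that time."""
    x, y = list(x), list(y)
    n = len(x)
    t = 0
    while x != y:
        i = rng.randrange(n)
        b = rng.randrange(2)
        x[i] = b
        y[i] = b
        t += 1
    return t

n = 20
rng = random.Random(0)
x0 = [0] * n
y0 = [1] * n  # the two copies disagree in every coordinate
times = [coupling_time(x0, y0, rng) for _ in range(2000)]

c = 2.0
# Empirical estimate of Pr[T_xy > n ln n + cn], to compare with e^{-c}.
tail = sum(t > n * math.log(n) + c * n for t in times) / len(times)
print(tail, math.exp(-c))
```

From this start the copies meet exactly when every coordinate has been picked at least once, so the coupling time is a coupon-collector time.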
60
Flow
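Capacity of an edge $e = (z, z')$: $C(e) = \pi(z) P(z, z')$.
Flow along $e$: denoted $f(e)$.
A flow routes $\pi(x)\pi(y)$ units from $x$ to $y$, for every $x, y$.
Cost of a flow: $\rho(f) = \max_e f(e)/C(e)$.
Length of a flow: $\ell(f)$ = length of the longest flow-carrying path (compare the diameter).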
61
Flow Theorem

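Theorem [Diaconis/Stroock, Jerrum/Sinclair]:
For a lazy ergodic MC and any flow $f$,
$\tau_x(\varepsilon) \le 2 \rho(f) \ell(f) \left[ \ln \pi(x)^{-1} + 2 \ln \varepsilon^{-1} \right]$,
where $\tau_x(\varepsilon) = \min\{t : \Delta_x(t) \le \varepsilon\}$.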

62
1. Random Walk On Cube

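$\Omega = \{0,1\}^n$, $|\Omega| = 2^n = N$, $\forall x$: $\pi(x) = 1/N$.
Flow $f$: route the $(x, y)$ flow evenly along all shortest paths from $x$ to $y$.
$\Rightarrow \tau_{mix} \le \mathrm{const} \cdot \rho(f) \ell(f) \log \pi(x)^{-1} = O(n^3)$

[Figure: the cube, with self-loop probability 1/2 and probability 1/2n on each edge]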
63
Conductance
[Figure: a state space with a bottleneck]
64
Conductance
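For a set $S \subseteq \Omega$ with complement $\Omega - S$:
$\Phi = \min_{S : 0 < \pi(S) \le 1/2} \frac{\sum_{x \in S,\, y \in \Omega - S} \pi(x) P(x, y)}{\pi(S)}$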
65
Conductance Theorem

Theorem [Jerrum/Sinclair, Lawler/Sokal, Alon, Cheeger]:
For a lazy reversible MC,
$\tau_x(\varepsilon) \le \frac{2}{\Phi^2} \left[ \ln \pi(x)^{-1} + \ln \varepsilon^{-1} \right]$

66
1. Random Walk On Cube

The sketched $S$ is (essentially) the worst $S$.
$\Rightarrow \tau_{mix} = O(\Phi^{-2} \log \pi_{\min}^{-1}) = O(n^3)$
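
For a very small cube the conductance can be found by brute force over all subsets. The sketch below assumes the standard definition (the minimum, over sets of stationary mass at most 1/2, of the stationary flow leaving the set divided by the mass of the set) and the lazy chain that picks a coordinate and a value u.a.r.; function names and $n = 3$ are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

def cube_chain(n):
    """Walk on {0,1}^n: pick i u.a.r., pick b u.a.r., set x(i) = b."""
    states = list(product((0, 1), repeat=n))
    P = {x: {} for x in states}
    for x in states:
        P[x][x] = Fraction(1, 2)
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            P[x][y] = Fraction(1, 2 * n)
    return states, P

def conductance(states, P, pi):
    """Brute-force minimum of (flow out of S) / pi(S) over sets S with 0 < pi(S) <= 1/2."""
    best = None
    for size in range(1, len(states)):
        for subset in combinations(states, size):
            S = set(subset)
            pi_S = sum(pi[x] for x in S)
            if pi_S > Fraction(1, 2):
                continue
            flow = sum(pi[x] * p for x in S for y, p in P[x].items() if y not in S)
            phi_S = flow / pi_S
            if best is None or phi_S < best:
                best = phi_S
    return best

n = 3
states, P = cube_chain(n)
pi = {x: Fraction(1, len(states)) for x in states}
phi = conductance(states, P, pi)
print(phi)
```

The minimum is attained by a half-cube (all states with a fixed value in one coordinate), giving conductance $1/2n$, which matches the $O(n^3)$ bound when combined with $\log \pi_{\min}^{-1} = O(n)$.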
