Mathematical Foundations of Markov Chain Monte Carlo Algorithms - PowerPoint PPT Presentation

About This Presentation
Title:

Mathematical Foundations of Markov Chain Monte Carlo Algorithms

Description:

Dana Moshkovitz ... Dana Moshkovitz. Why Sampling? statistics of 'typical' ... Dana Moshkovitz. Application 6 : Hypothesis Verification in ... – PowerPoint PPT presentation

Slides: 67
Provided by: csPrin

Transcript and Presenter's Notes



1
Mathematical Foundations of Markov Chain Monte Carlo Algorithms
  • Based on lectures given by
  • Alistair Sinclair
  • Computer Science Division
  • U.C. Berkeley

2
Overview
  • Random Sampling
  • The Markov Chain Monte-Carlo Paradigm
  • Mixing Time
  • Techniques for Bounding the Mixing Time:
  • Coupling
  • Flow
  • Geometry

3
Random Sampling
  • $\Omega$ - very large sample set.
  • $\pi$ - probability distribution over $\Omega$.

Goal: sample points $x \in \Omega$ at random from distribution $\pi$.
4
The Probability Distribution
  • Typically, $\pi(x) = w(x)/Z$, where

$w: \Omega \to \mathbb{R}^+$ is an easily-computed weight function
$Z = \sum_x w(x)$ is an unknown normalization factor
5
Application 1: Card Shuffling
  • $\Omega$ - all 52! permutations of a deck of cards.
  • $\pi$ - uniform distribution: $\forall x$, $w(x) = 1$.

Goal: pick a permutation uniformly at random.
6
Application 2: Counting
  • How many ways can we tile some given pattern with
    dominoes?

7
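Application 2: Counting (cont.)
  • Sample tilings uniformly at random.
  • Let $P_1$ = proportion of the sample that is of type 1.
  • Compute estimate $\hat{N}_1$ of $N_1$ recursively.
  • Output $\hat{N} = \hat{N}_1 / P_1$.

$N = N_1 + N_2$, where $N_1$ and $N_2$ count the tilings of type 1 and type 2.
Sample size $O(n)$, levels $O(n)$ $\Rightarrow$ $O(n^2)$ samples total.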
8
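Application 3: Volume Integration
[Dyer/Frieze/Kannan]
  • $\Omega$ - a convex body in $\mathbb{R}^d$ ($d$ large)
  • Problem: estimate $\mathrm{vol}(\Omega)$

Sequence of concentric balls $B_0 \subseteq \dots \subseteq B_r$.
Estimate each ratio $\mathrm{vol}(\Omega \cap B_{i-1}) / \mathrm{vol}(\Omega \cap B_i)$ by sampling uniformly from $\Omega \cap B_i$.
Generalization: integration of a log-concave function over a cube $A \subseteq \mathbb{R}^d$.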
9
Application 4: Statistical Physics
  • $\Omega$ - set of configurations of a physical system
  • $\pi$ - Gibbs distribution
  • $\pi(x) = \Pr[\text{system in config. } x] = w(x)/Z$
  • where $w(x) = e^{-H(x)/KT}$ ($H$ - energy, $T$ - temperature)

10
The Ising Model
  • $n$ atomic magnets
  • configuration $x \in \{-,+\}^n$
  • $H(x) = -(\#\text{aligned neighbors})$

[Figure: grid of + and - spins]

11
Why Sampling?
  • statistics of typical configurations.
  • mean energy ($E_\pi[H(x)]$), specific heat, ...
  • estimate of partition function $Z = Z(T) = \sum_{x \in \Omega} w(x)$

12
Estimating the Partition Function
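  • Let $\lambda = e^{1/KT}$ $\Rightarrow$ $Z = Z(\lambda) = \sum_{x \in \Omega} \lambda^{-H(x)}$.
  • Define $1 = \lambda_0 < \lambda_1 < \dots < \lambda_r = \lambda$.

$Z(\lambda) = Z(\lambda_0) \prod_{i=1}^{r} Z(\lambda_i)/Z(\lambda_{i-1})$, with $r \approx n \log \lambda = O(n^2)$ ratios.
Each ratio $Z(\lambda_i)/Z(\lambda_{i-1})$ can be estimated by random sampling from $\pi_{\lambda_{i-1}}$.
$\lambda_i = \lambda_{i-1}(1 + 1/n)$ ensures small variance $\Rightarrow$ $O(n)$ samples suffice for each ratio.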
13
Application 5: Optimization
  • $\Omega$ - set of feasible solutions to an optimization
    problem
  • $f(x)$ - value of solution $x$.
  • Goal: maximize $f(x)$.
  • Idea: sample solutions where $w(x) = \lambda^{f(x)}$.

14
Application 5: Optimization (cont.)
  • Idea: sample solutions where $w(x) = \lambda^{f(x)}$.

Large $\lambda$ $\Rightarrow$ concentration on good solutions (large values of $f(x)$).
Small $\lambda$ $\Rightarrow$ greater mobility (local optima are less high).
Simulated Annealing heuristic: slowly increase $\lambda$.
15
Application 6: Hypothesis Verification in Statistical Models
  • $\Theta$ - set of hypotheses
  • $X$ - observed data

Let $w(\theta) = P(\theta) P(X \mid \theta)$,
where $P(\theta)$ is the prior and $P(X \mid \theta)$ is easy to compute.
16
Application 6: Hypothesis Verification in Statistical Models (cont.)
  • Sampling from $\pi(\theta) = P(\theta \mid X)$ gives
  • Statistical estimate of hypotheses $\theta$.
  • Prediction
  • Model comparison:
  • normalization factor $P(X) = \Pr[\text{model generated } X]$

17
Markov Chains
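  • Sample space $\Omega$
  • Random variables (r.v.) over $\Omega$:
  • $X_1, X_2, \dots, X_t, \dots$
  • Memoryless: $\forall t > 0$, $\forall x_1, \dots, x_{t+1} \in \Omega$,
    $\Pr[X_{t+1} = x_{t+1} \mid X_t = x_t, \dots, X_1 = x_1] = \Pr[X_{t+1} = x_{t+1} \mid X_t = x_t]$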

18
Sampling Algorithm
  • Start at an arbitrary state X0.
  • Simulate MC for sufficiently many steps t.
  • Output Xt.
  • Then, $\forall x \in \Omega$: $\Pr[X_t = x] \approx \pi(x)$

[Figure: a walk through $\Omega$ from $X_0$ to $X_t$]
19
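Transition Matrix
$P(x,y) = \Pr[X_{t+1} = y \mid X_t = x]$
  • $P$ is non-negative
  • $P$ is stochastic ($\forall x$: $\sum_y P(x,y) = 1$)
  • $\Pr[X_t = y \mid X_0 = x] = P^t(x,y)$
  • $p_x^{(t)} = p_x^{(0)} P^t$, where $p_x^{(t)}$ is the distribution of $X_t$ given $X_0 = x$
  • Definition: $\pi$ is a stationary distribution if
    $\pi P = \pi$.
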
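The relation between the $t$-step distribution and powers of $P$ can be checked numerically. The following is a minimal sketch, assuming a hypothetical 3-state chain chosen only for illustration (it does not come from the lecture); the helper names `step` and `distribution_after` are likewise illustrative. Exact fractions are used so that stationarity can be tested with equality.

```python
from fractions import Fraction as F

# Hypothetical 3-state transition matrix (each row sums to 1); illustration only.
P = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 4), F(1, 2), F(1, 4)],
    [F(0),    F(1, 2), F(1, 2)],
]

def step(p, P):
    """One step of the chain: returns the row vector p P."""
    n = len(P)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

def distribution_after(p0, P, t):
    """Distribution of X_t when X_0 is distributed as p0, i.e. p0 P^t."""
    p = list(p0)
    for _ in range(t):
        p = step(p, P)
    return p

# Start deterministically in state 0 and run 20 steps.
p0 = [F(1), F(0), F(0)]
p20 = distribution_after(p0, P, 20)

# pi = (1/4, 1/2, 1/4) satisfies pi P = pi for this particular matrix.
pi = [F(1, 4), F(1, 2), F(1, 4)]
print(step(pi, P) == pi)
print([float(v) for v in p20])
```

For this matrix the distribution after 20 steps is already very close to the stationary one, which is the behavior the sampling algorithm relies on.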
20
Irreducibility
  • Definition: $P$ is irreducible if $\forall x, y$ $\exists t$: $P^t(x,y) > 0$.

21
Aperiodicity
  • Definition: $P$ is aperiodic if $\forall x$: $\gcd\{t : P^t(x,x) > 0\} = 1$.

22
Note on Irreducibility and Aperiodicity
  • If $P$ is irreducible, we can always make it
    aperiodic by adding self-loops:
  • $P' = \frac{1}{2}(P + I)$
  • $P'$ has the same stationary distribution as $P$.
  • Call $P'$ a lazy MC.

23
Fundamental Theorem
  • Theorem: If $P$ is irreducible and aperiodic,
  • then it is ergodic, i.e., $\forall x, y$: $P^t(x,y) \to \pi(y)$ as $t \to \infty$,
  • where $\pi$ is the (unique) stationary distribution
    of $P$, i.e., $\pi P = \pi$.

24
Main Idea (The MCMC Paradigm)
  • An ergodic MC provides an effective algorithm for
    sampling from $\pi$.

25
Examples
  • Random Walks on Graphs
  • Ehrenfest Urn
  • Card Shuffling
  • Coloring of a Graph
  • The Ising Model

26
1. Random Walk on Undirected Graphs
At each node, choose a neighbor u.a.r. and jump to it.
27
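Random Walk on Undirected Graph $G = (V,E)$
$\Omega = V$; $P(x,y) = 1/d(x)$ if $(x,y) \in E$ and 0 otherwise, where $d(x)$ is the degree of $x$
  • Irreducible $\Leftrightarrow$ $G$ is connected
  • Aperiodic $\Leftrightarrow$ $G$ is not bipartite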

28
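Random Walk: The Stationary Distribution
  • Claim: If $G$ is connected and not bipartite (the latter is not essential), then
    the probability distribution induced by a random
    walk on it converges to $\pi(x) = d(x)/\sum_x d(x) = d(x)/2|E|$.
  • Proof: $(\pi P)(y) = \sum_{x : (x,y) \in E} \frac{d(x)}{2|E|} \cdot \frac{1}{d(x)} = \frac{d(y)}{2|E|} = \pi(y)$.
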
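The claim that the degree-proportional distribution is stationary for the random walk can be verified exactly on a small graph. This is a minimal sketch, assuming a hypothetical 4-vertex graph (a triangle with one pendant vertex) picked only for illustration.

```python
from fractions import Fraction as F

# Hypothetical graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4

# Adjacency lists and degrees.
nbrs = {v: [] for v in range(n)}
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)
deg = {v: len(nbrs[v]) for v in range(n)}

def P(x, y):
    """Random-walk transition probability: 1/d(x) for each neighbor y of x."""
    return F(1, deg[x]) if y in nbrs[x] else F(0)

# Candidate stationary distribution: degree divided by 2|E|.
pi = {v: F(deg[v], 2 * len(edges)) for v in range(n)}

# (pi P)(y) = sum_x pi(x) P(x, y) should equal pi(y) for every y.
pi_P = {y: sum(pi[x] * P(x, y) for x in range(n)) for y in range(n)}
print(pi_P == pi)
```

Note that this graph contains a triangle, so it is not bipartite; the stationarity check itself would also pass on a bipartite graph, since only convergence needs aperiodicity.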
29
2. Ehrenfest Urn
[Figure: two urns, one holding $j$ balls and the other $n-j$ balls]
  • Pick a ball u.a.r.
  • Move the ball to the other urn

30
2. Ehrenfest Urn (cont.)
  • $X_t$ = number of balls in first urn.
  • MC is a non-uniform random walk on $\Omega = \{0, 1, \dots, n\}$:
    $P(j, j-1) = j/n$, $P(j, j+1) = 1 - j/n$.
  • Irreducible, periodic
  • Stationary distribution: $\pi(j) = \binom{n}{j} / 2^n$
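The stationary distribution of the Ehrenfest urn is the binomial distribution $\binom{n}{j}/2^n$, and this can be confirmed exactly for a small number of balls. A minimal sketch, assuming $n = 6$ purely for illustration:

```python
from fractions import Fraction as F
from math import comb

n = 6  # number of balls; a small value chosen for illustration

def P(j, k):
    """Ehrenfest transition probability from j to k balls in the first urn."""
    if k == j - 1:
        return F(j, n)        # the picked ball was in the first urn
    if k == j + 1:
        return F(n - j, n)    # the picked ball was in the second urn
    return F(0)

# Binomial distribution pi(j) = C(n, j) / 2^n.
pi = [F(comb(n, j), 2 ** n) for j in range(n + 1)]

# Check stationarity: (pi P)(k) = pi(k) for every k.
pi_P = [sum(pi[j] * P(j, k) for j in range(n + 1)) for k in range(n + 1)]
print(pi_P == pi)
```

Because the chain is periodic (the parity of $X_t$ alternates), $\pi$ is stationary but the chain does not converge to it unless it is made lazy.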

31
3. Card Shuffling
  • Top-in-at-random:
  • Irreducible
  • Aperiodic
  • $P$ is doubly stochastic: $\forall y$, $\sum_x P(x,y) = 1$
  • $\Rightarrow$ $\pi$ is uniform: $\forall x$, $\pi(x) = 1/n!$

32
3. Card Shuffling
  • Random Transpositions
  • Irreducible
  • Aperiodic
  • $P$ is symmetric: $\forall x, y$, $P(x,y) = P(y,x)$
  • $\Rightarrow$ $\pi$ is uniform

33
3. Card Shuffling
  • Riffle shuffle [Gilbert/Shannon/Reeds]

34
3. Card Shuffling
  • Riffle shuffle [Gilbert/Shannon/Reeds]
  • Irreducible
  • Aperiodic
  • $P$ is doubly stochastic
  • $\Rightarrow$ $\pi$ is uniform

35
4. Colorings of a graph
  • $G = (V,E)$ connected, undirected
  • $q$ - number of colors
  • $\Omega$ - set of proper $q$-colorings of $G$
  • $\pi$ - uniform

36
Colorings: Markov Chain
  • pick $v \in V$ and $c \in \{1, \dots, q\}$ u.a.r.
  • recolor $v$ with $c$ if possible.

$\Delta$ = max degree of $G$
  • Irreducible if $q \ge \Delta + 2$
  • Aperiodic
  • $P$ is symmetric
  • $\Rightarrow$ $\pi$ is uniform

37
5. The Ising Model
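  • Markov chain (heat bath):
  • pick a site $i$ u.a.r.
  • replace spin $x(i)$ by a random spin $x'(i)$ s.t.
    $\Pr[x'(i) = +] = \frac{\lambda^{\#\{+ \text{ neighbors of } i\}}}{\lambda^{\#\{+ \text{ neighbors of } i\}} + \lambda^{\#\{- \text{ neighbors of } i\}}}$
  • $n$ sites
  • $\Omega = \{-,+\}^n$
  • $w(x) = \lambda^{\#\text{aligned neighbors}(x)}$

[Figure: grid of + and - spins]
Irreducible, aperiodic, reversible w.r.t. $\pi$ $\Rightarrow$ converges to $\pi$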
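A single heat-bath update can be sketched in a few lines. The code below is an illustration under stated assumptions: spins are stored as +1/-1, the weight is taken to be $\lambda$ raised to the number of aligned neighboring pairs, and the 4-site cycle, the value `lam = 2.0`, and the function names are all hypothetical choices rather than anything fixed by the slides.

```python
import random

def prob_plus(x, i, nbrs, lam):
    """Heat-bath probability that site i is set to +1, given its neighbors' spins."""
    plus = sum(1 for j in nbrs[i] if x[j] == +1)
    minus = len(nbrs[i]) - plus
    # Setting x(i) = + aligns i with its + neighbors; setting it to - aligns it with the rest.
    return lam ** plus / (lam ** plus + lam ** minus)

def heat_bath_step(x, nbrs, lam, rng):
    """Pick a site u.a.r. and resample its spin from the conditional distribution."""
    i = rng.randrange(len(x))
    x[i] = +1 if rng.random() < prob_plus(x, i, nbrs, lam) else -1
    return x

# Hypothetical instance: 4 sites arranged on a cycle, lam = 2.0.
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
x = [+1, -1, +1, -1]
rng = random.Random(0)
for _ in range(100):
    heat_bath_step(x, nbrs, 2.0, rng)
print(x)
```

Only the spins of the neighbors of the chosen site enter the update, which is why each step is cheap even though the normalization factor $Z$ is unknown.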
38
Designing Markov Chains
  • What do we want?
  • Given $\Omega$, $\pi$:
  • MC over $\Omega$ which converges to $\pi$

39
The Metropolis Rule
  • Define any connected undirected graph on $\Omega$
    (neighborhood structure / (local) moves)

40
The Metropolis Rule
  • Transitions from state $x \in \Omega$:
  • pick a neighbor $y$ of $x$ w.p. $\kappa(x,y)$
  • move to $y$ w.p. $\min\{w(y)/w(x), 1\}$
  • (else stay at $x$)

$\kappa(x,y) = \kappa(y,x)$, $\kappa(x,x) = 1 - \sum_{y \ne x} \kappa(x,y)$
  • Irreducible
  • Aperiodic (make lazy if nec.)
  • reversible w.r.t. $w$
  • $\Rightarrow$ converges to $\pi$.
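The Metropolis rule can be turned into an explicit transition matrix on a small state space and checked for reversibility. This is a minimal sketch, assuming a hypothetical path graph on four states with made-up weights and a proposal that picks each path neighbor with probability 1/2; the symbol `kappa` stands for the symmetric proposal probability.

```python
from fractions import Fraction as F

# Hypothetical weights on the path 0 - 1 - 2 - 3 (illustration only).
w = [F(1), F(3), F(2), F(6)]
n = len(w)
Z = sum(w)
pi = [wx / Z for wx in w]

def kappa(x, y):
    """Symmetric proposal: each path neighbor is proposed w.p. 1/2 (hypothetical choice)."""
    return F(1, 2) if abs(x - y) == 1 else F(0)

def metropolis_P(x, y):
    """Transition probability of the Metropolis chain built from kappa and w."""
    if x != y:
        return kappa(x, y) * min(w[y] / w[x], F(1))
    # Remaining mass stays at x (rejected moves plus unused proposal probability).
    return 1 - sum(metropolis_P(x, z) for z in range(n) if z != x)

P = [[metropolis_P(x, y) for y in range(n)] for x in range(n)]

# Detailed balance: pi(x) P(x, y) = pi(y) P(y, x) for all x, y.
reversible = all(pi[x] * P[x][y] == pi[y] * P[y][x] for x in range(n) for y in range(n))
print(reversible)
```

The chain only ever uses ratios $w(y)/w(x)$, so the normalization factor $Z$ is needed here just to verify the result, not to run the chain.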

41
The Mixing Time
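  • Key Question: How long until $p_x^{(t)}$ looks like $\pi$?
  • We will use the variation distance:
    $\|\mu - \eta\| = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \eta(x)| = \max_{S \subseteq \Omega} |\mu(S) - \eta(S)|$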

42
The Mixing Time
  • Define
  • $\Delta_x(t) = \|p_x^{(t)} - \pi\|$
  • $\Delta(t) = \max_x \Delta_x(t)$
  • The mixing time is
  • $\tau_{mix} = \min\{t : \Delta(t) \le 1/2e\}$
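For a chain small enough to write down, the mixing time can be computed directly from the definition by tracking the distribution from every start state. The following is a sketch under assumptions: the 3-state matrix is hypothetical, the threshold defaults to $1/2e$, and `t_max` is an arbitrary cutoff added so the loop always terminates.

```python
import math

def variation_distance(mu, eta):
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, eta))

def step(p, P):
    """One step of the chain: returns the row vector p P."""
    n = len(P)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

def mixing_time(P, pi, threshold=1 / (2 * math.e), t_max=10_000):
    """Smallest t with max over start states of the distance to pi <= threshold.

    Returns None if the threshold is not reached within t_max steps.
    """
    n = len(P)
    # rows[x] holds the distribution after t steps when started at state x.
    rows = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(variation_distance(r, pi) for r in rows) <= threshold:
            return t
        rows = [step(r, P) for r in rows]
    return None

# Hypothetical 3-state chain with stationary distribution (1/4, 1/2, 1/4).
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
tau = mixing_time(P, pi)
print(tau)
```

This brute-force approach costs time proportional to the square of the number of states per step, which is exactly why the bounding techniques in the rest of the lecture are needed for large $\Omega$.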

43
Toy Example: Top-In-At-Random
  • Let $T$ = time after initial bottom card reaches
    top
  • $T$ is a strong stationary time, i.e.,
  • $\Pr[X_t = x \mid t = T] = \pi(x)$
  • Claim: $\Delta(t) \le \Pr[T > t]$
  • Thus, it remains to estimate $T$.

44
The Coupon Collector Problem
  • Each pack contains one coupon.
  • The goal is to complete the series.
  • How many packs must we buy?

45
The Coupon Collector Problem
  • $N$ - total number of different coupons.
  • $X_i$ - time to get the $i$-th coupon.
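The standard coupon collector calculation can be written out as follows. This is a sketch of the usual argument, with $X_i$ read as the waiting time for the $i$-th new coupon once $i-1$ distinct coupons are held, $T = \sum_i X_i$ the total time, and $t$ a symbol introduced here for the rounded-up number of packs.

```latex
% X_i is geometric with success probability (N - i + 1)/N.
\begin{align*}
\mathbb{E}[X_i] &= \frac{N}{N-i+1}, \\
\mathbb{E}[T] &= \sum_{i=1}^{N} \frac{N}{N-i+1}
               = N \sum_{j=1}^{N} \frac{1}{j} \approx N \ln N. \\
\intertext{For the tail, let $t = \lceil N \ln N + cN \rceil$ and take a union bound
over the $N$ coupons, each of which is still missing after $t$ packs
with probability $(1 - 1/N)^t$:}
\Pr[T > t] &\le N \Big(1 - \frac{1}{N}\Big)^{t}
            \le N e^{-t/N} \le N e^{-\ln N - c} = e^{-c}.
\end{align*}
```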

46
Toy Example: Top-In-At-Random (cont.)
  • By the coupon collector,
  • the $i$-th coupon is a ticket to advance from the
    $(n-i+1)$ level to the next one.
  • $\Pr[T > n \ln n + cn] \le e^{-c}$
  • $\Rightarrow$ $\tau_{mix} \le n \ln n + cn$ for a suitable constant $c$

47
Example: Riffle Shuffle
48
Example: Riffle Shuffle
  • Inverse shuffle (same mixing time):
  • label each card 0/1 u.a.r.
  • sort the cards stably by label

[Figure: eight cards labeled 1 0 1 1 1 0 1 0 are stably sorted into 0 0 0 1 1 1 1 1]
49
Inverse Shuffle
  • After t steps, each card is labeled with t
    digits.
  • Cards are sorted by their labels.
  • Cards with different labels are in random order.
  • Cards with the same label are in original order.

[Figure: eight cards sorted by 3-digit labels 000 001 010 010 011 100 101 111]
50
Riffle Shuffle (Cont.)
  • Let $T$ = time until all cards have distinct labels
  • T is a strong stationary time.
  • Again we need to estimate T.
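The inverse shuffle and the stopping time $T$ are easy to simulate. The sketch below is illustrative only: the function names are hypothetical, and each card's accumulated label is stored as a tuple of its 0/1 digits, one per step.

```python
import random

def inverse_riffle(deck, rng):
    """One inverse riffle shuffle: label each card 0/1 u.a.r., then sort stably by label."""
    labels = [rng.randrange(2) for _ in deck]
    # sorted() is stable, so cards with equal labels keep their relative order.
    order = sorted(range(len(deck)), key=lambda i: labels[i])
    return [deck[i] for i in order], labels

def steps_until_distinct(n, rng):
    """Number of inverse shuffles until all n cards carry distinct digit strings."""
    history = [() for _ in range(n)]   # digits accumulated so far by each card
    deck = list(range(n))
    t = 0
    while len(set(history)) < n:
        old = deck
        deck, labels = inverse_riffle(old, rng)
        # labels[pos] is the digit given to the card that sat at position pos.
        for pos, card in enumerate(old):
            history[card] = history[card] + (labels[pos],)
        t += 1
    return t

rng = random.Random(0)
shuffled, labels = inverse_riffle(list(range(8)), rng)
print(shuffled, labels)
print(steps_until_distinct(52, random.Random(1)))
```

Since there are only $2^t$ possible labels after $t$ steps, a 52-card deck cannot have all labels distinct before step 6, so the simulated $T$ is always at least 6.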

51
Birthday Paradox
  • What is the probability that two people in a group have the same
    birthday?

52
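Birthday Paradox (Cont.)
  • $k$ people, $n$ days ($n > k > 1$)
  • The probability all birthdays are distinct:
    $\prod_{i=1}^{k-1} (1 - i/n) \le \exp\left(-\sum_{i=1}^{k-1} i/n\right) = \exp\left(-\frac{k(k-1)}{2n}\right)$
    (the exponent is an arithmetic sum)
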
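The exact probability that all birthdays are distinct and the exponential upper bound obtained from $1 - x \le e^{-x}$ can be compared numerically. A minimal sketch, with the classic values $k = 23$ and $n = 365$ chosen as an example:

```python
import math

def prob_all_distinct(k, n):
    """Exact probability that k u.a.r. birthdays among n days are all distinct."""
    p = 1.0
    for i in range(1, k):
        p *= 1 - i / n
    return p

def upper_bound(k, n):
    """Bound from 1 - x <= e^{-x} and the arithmetic sum 1 + ... + (k-1) = k(k-1)/2."""
    return math.exp(-k * (k - 1) / (2 * n))

p = prob_all_distinct(23, 365)
b = upper_bound(23, 365)
print(p, b)
```

For the riffle shuffle the same bound is applied with $n$ cards in the role of people and $2^t$ labels in the role of days.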
53
Riffle Shuffle (Cont.)
  • By the birthday paradox,
  • each card (1..n) picks a random label
  • there are $2^t$ possible labels
  • we want all labels to be distinct
  • $\Rightarrow$ $\tau_{mix} = O(\log n)$

54
General Techniques for Mixing Time
  • Probabilistic Coupling
  • Combinatorial Flows
  • Geometric - Conductance

55
Coupling
56
Mixing Time Via Coupling
  • Let $P$ be an ergodic MC. A coupling for $P$ is a
    pair process $(X_t, Y_t)$ s.t.
  • $X_t$, $Y_t$ are each copies of $P$
  • $X_t = Y_t \Rightarrow X_{t+1} = Y_{t+1}$
  • Define $T_{xy} = \min\{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}$

57
Coupling Theorem
  • Theorem [Aldous et al.]:
  • $\Delta(t) \le \max_{x,y} \Pr[T_{xy} > t]$

Design a coupling that brings $X$ and $Y$ together fast.
58
1. Random Walk On Cube
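$\Omega = \{0,1\}^n$; $\pi$ is uniform
  • Markov Chain:
  • pick coordinate $i \in_R \{1, \dots, n\}$
  • pick value $b \in_R \{0,1\}$
  • set $x(i) = b$

[Figure: the cube for $n = 3$; self-loop probability 1/2, each of the three neighbors with probability 1/6]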
59
Coupling For Random Walk
  • pick same i,b for both X and Y
  • $T_{xy} \le$ time to hit all $n$ coordinates
  • By coupon collecting,
  • $\Pr[T_{xy} > n \ln n + cn] < e^{-c}$
  • $\Rightarrow$ $\tau_{mix} \le n \ln n + cn$

[Figure: one coupled step with $i = 6$, $b = 1$: $X = (0,0,1,0,1,1) \to (0,0,1,0,1,1)$ and $Y = (1,1,0,0,1,0) \to (1,1,0,0,1,1)$]
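The coupling on the cube can be simulated to see that the two copies meet no later than the moment every coordinate has been picked. This is an illustrative sketch: the function name `coupled_walk` is hypothetical, coordinates are indexed from 0, and the start states are the two vectors used in the example on this slide.

```python
import random

def coupled_walk(x, y, rng):
    """Run the coupling (same i and b for both copies) until every coordinate is hit.

    Returns (meeting time of the two copies, time at which all coordinates were hit).
    """
    x, y = list(x), list(y)
    n = len(x)
    hit = set()
    t = 0
    t_meet = 0 if x == y else None
    while t_meet is None or len(hit) < n:
        i = rng.randrange(n)   # same coordinate for both copies
        b = rng.randrange(2)   # same value for both copies
        x[i] = b
        y[i] = b
        hit.add(i)
        t += 1
        if t_meet is None and x == y:
            t_meet = t
    return t_meet, t

x0 = (0, 0, 1, 0, 1, 1)
y0 = (1, 1, 0, 0, 1, 0)
results = [coupled_walk(x0, y0, random.Random(seed)) for seed in range(20)]
print(results[:5])
```

Once a coordinate has been picked it agrees in both copies forever, which is why the meeting time is bounded by the coupon collector time for the $n$ coordinates.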
60
Flow
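capacity of edge $e = (z,z')$: $C(e) = \pi(z) P(z,z')$
flow along $e$ is denoted $f(e)$
flow routes $\pi(x)\pi(y)$ units from $x$ to $y$, for every $x, y$
cost $\rho(f) = \max_e f(e)/C(e)$
length $l(f)$ = length of the longest flow-carrying path (compare: diameter)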
61
Flow Theorem
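  • Theorem [Diaconis/Stroock, Jerrum/Sinclair]:
  • For a lazy ergodic MC and any flow $f$,
  • $\tau_x(\varepsilon) \le 2 \rho(f) l(f) [\ln \pi(x)^{-1} + 2 \ln \varepsilon^{-1}]$
  • where $\tau_x(\varepsilon)$ is the time for the chain started at $x$ to come within variation distance $\varepsilon$ of $\pi$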

62
1. Random Walk On Cube
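  • Flow $f$: route $(x,y)$ flow evenly along all
    shortest paths $x \to y$
  • $\Rightarrow$ $\tau_{mix} \le \text{const} \cdot \rho(f) l(f) \log \pi(x)^{-1} = O(n^3)$

$\Omega = \{0,1\}^n$, $|\Omega| = 2^n = N$, $\forall x$: $\pi(x) = 1/N$
[Figure: the cube; self-loop probability 1/2, each neighbor with probability 1/2n]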
63
Conductance
[Figure: a state space with a bottleneck]
64
Conductance
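[Figure: a cut separating $S$ from $\Omega - S$]
$\Phi = \min_{S : 0 < \pi(S) \le 1/2} \frac{\sum_{x \in S, y \notin S} \pi(x) P(x,y)}{\pi(S)}$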
65
Conductance Theorem
  • Theorem [Jerrum/Sinclair, Lawler/Sokal, Alon,
    Cheeger]: For a lazy reversible MC,
  • $\tau_x(\varepsilon) \le \frac{2}{\Phi^2} [\ln \pi(x)^{-1} + \ln \varepsilon^{-1}]$

66
1. Random Walk On Cube
  • The sketched S is (essentially) the worst S.
  • $\Rightarrow$ $\tau_{mix} = O(\Phi^{-2} \log \pi_{min}^{-1}) = O(n^3)$
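For a very small cube the conductance can be found by brute force over all cuts. The sketch below assumes the standard definition (flow out of $S$ divided by $\pi(S)$, minimized over sets with $\pi(S) \le 1/2$) and the lazy walk that picks a coordinate and a bit u.a.r.; $n = 3$ is chosen only so that all subsets can be enumerated.

```python
from fractions import Fraction as F
from itertools import combinations, product

n = 3
states = list(product([0, 1], repeat=n))
N = len(states)
pi = F(1, N)  # uniform stationary probability of each state

def P(x, y):
    """Lazy walk on the cube: pick i and b u.a.r., set x(i) = b."""
    diff = sum(a != b for a, b in zip(x, y))
    if diff == 0:
        return F(1, 2)
    if diff == 1:
        return F(1, 2 * n)
    return F(0)

def phi(S):
    """Conductance of a set S: flow out of S divided by pi(S)."""
    S = set(S)
    flow = sum(pi * P(x, y) for x in S for y in states if y not in S)
    return flow / (pi * len(S))

# Minimize over all nonempty S with pi(S) <= 1/2.
Phi = min(phi(S) for k in range(1, N // 2 + 1) for S in combinations(states, k))

# The half-cube {x : x(1) = 0} is one candidate for the worst cut.
half_cube = [x for x in states if x[0] == 0]
print(Phi, phi(half_cube))
```

In this small case the half-cube attains the minimum, with $\Phi = 1/2n$, consistent with the $O(n^3)$ bound obtained from $\Phi^{-2} \log \pi_{min}^{-1}$.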