Title: CHAPTER 15 SIMULATION-BASED OPTIMIZATION II: STOCHASTIC GRADIENT AND SAMPLE PATH METHODS
1 CHAPTER 15 SIMULATION-BASED OPTIMIZATION II: STOCHASTIC GRADIENT AND SAMPLE PATH METHODS
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
- Organization of chapter in ISSO
  - Introduction to gradient estimation
  - Interchange of derivative and integral
  - Gradient estimation techniques
    - Likelihood ratio/score function (LR/SF)
    - Infinitesimal perturbation analysis (IPA)
  - Optimization with gradient estimates
  - Sample path method
2 Issues in Gradient Estimation
- Estimate the gradient of the loss function with respect to the parameters, for use in optimization, from simulation outputs:
  $$g(\theta) \equiv \frac{\partial L(\theta)}{\partial \theta}$$
  where $L(\theta)$ is a scalar-valued loss function to minimize and $\theta$ is a $p$-dimensional vector of parameters
- Essential properties of gradient estimates
  - Unbiased
  - Small variance
3 Two Types of Parameters
- $$L(\theta) = E[Q(\theta, V)] = \int Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)\, d\upsilon$$
  where $V$ is the random effect in the system and $p_V(\upsilon \mid \theta)$ is the probability density function of $V$
- Distributional parameters $\theta_D$: Elements of $\theta$ that enter via their effect on the probability distribution of $V$. For example, if scalar $V$ has distribution $N(\mu, \sigma^2)$, then $\mu$ and $\sigma^2$ are distributional parameters
- Structural parameters $\theta_S$: Elements of $\theta$ that have effects directly on the loss function (via $Q$)
- Distinction not always obvious
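4 Interchange of Derivative and Integral
- Unbiased gradient estimates using only one simulation require the interchange of derivative and integral:
  $$\frac{\partial}{\partial \theta} \int Q(\theta,\upsilon)\, p_V(\upsilon \mid \theta)\, d\upsilon = \int \frac{\partial\, [Q(\theta,\upsilon)\, p_V(\upsilon \mid \theta)]}{\partial \theta}\, d\upsilon$$
- Above generally not true. Technical conditions needed for validity:
- $Q \cdot p_V$ and $\partial (Q \cdot p_V)/\partial \theta$ are continuous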
-
-
- Above has implications in practical applications
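For orientation, one standard set of sufficient conditions of the dominated-convergence type is sketched here. It is offered as an assumption about the kind of conditions meant, not as a transcription of the slide; the dominating functions $q_0$ and $q_1$ are names introduced only for this illustration.

```latex
% Dominated-convergence-type sufficient conditions (illustrative; q_0, q_1 are assumed names)
\begin{align*}
&\bigl| Q(\theta,\upsilon)\, p_V(\upsilon \mid \theta) \bigr| \le q_0(\upsilon), \qquad
\left\| \frac{\partial\, [Q(\theta,\upsilon)\, p_V(\upsilon \mid \theta)]}{\partial \theta} \right\| \le q_1(\upsilon)
\quad \text{for all } \theta \text{ in the domain}, \\
&\int q_0(\upsilon)\, d\upsilon < \infty, \qquad \int q_1(\upsilon)\, d\upsilon < \infty .
\end{align*}
```

Together with continuity of the product and of its derivative, bounds of this form let the derivative pass inside the integral. A discontinuous $Q$ (for example, an indicator function of $\theta$) is a typical way for such conditions to fail in practice.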
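5 A General Form of Gradient Estimate
- Assume that all the conditions required for the exchange of derivative and integral are satisfied. Then, using $\partial p_V/\partial\theta = p_V\, \partial \log p_V/\partial\theta$,
  $$g(\theta) = \int \left[ Q\, \frac{\partial p_V}{\partial \theta} + \frac{\partial Q}{\partial \theta}\, p_V \right] d\upsilon = E\left[ Q(\theta,V)\, \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta,V)}{\partial \theta} \right]$$
- Hence, an unbiased gradient estimate can be obtained as
  $$Q(\theta,V)\, \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta,V)}{\partial \theta}$$
- Output from one simulation!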
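6 Two Gradient Estimates: LR/SF and IPA
- $$\underbrace{Q(\theta,V)\, \frac{\partial \log p_V(V \mid \theta)}{\partial \theta}}_{\text{pure LR/SF}} + \underbrace{\frac{\partial Q(\theta,V)}{\partial \theta}}_{\text{pure IPA}}$$
- Likelihood ratio/score function (LR/SF): only distributional parameters, so $\partial Q/\partial\theta = 0$
- Infinitesimal perturbation analysis (IPA): only structural parameters, so $\partial \log p_V/\partial\theta = 0$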
7 Comparison of Pure LR/SF and IPA
- In practice, neither extreme (LR/SF or IPA) may provide a framework for reasonable implementation
  - LR/SF may require deriving a complex distribution function starting from $U(0,1)$
  - IPA may lead to an intractable $\partial Q/\partial\theta$ with a complex $Q(\theta, V)$
- Pure LR/SF gradient estimates tend to suffer from large variance (variance can grow with the number of components in $V$)
- Pure IPA may result in a $Q(\theta, V)$ that fails to meet the conditions for valid interchange of derivative and integral, and hence can lead to a biased gradient estimate
- In many cases where IPA is feasible, it leads to a low-variance gradient estimate
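To make the bias risk concrete, here is a minimal sketch under assumed choices that are not taken from the slides: V ~ U(0,1) and a hypothetical Q(θ, V) equal to 1 when V ≤ θ and 0 otherwise, so that L(θ) = θ and the true derivative is 1. The pathwise derivative of this Q is zero wherever it exists, so a pure IPA estimate averages to 0, while a finite difference of the sample mean (using the same random draws on both sides) lands near 1.

```python
import random


def Q(theta, v):
    """Hypothetical discontinuous loss: indicator that v <= theta."""
    return 1.0 if v <= theta else 0.0


def ipa_estimate(theta, v):
    """Pathwise derivative dQ/dtheta, which is 0 wherever it exists (v != theta)."""
    return 0.0


def compare(theta=0.4, n=200_000, h=0.05, seed=0):
    rng = random.Random(seed)
    vs = [rng.random() for _ in range(n)]
    # Average of the pure IPA estimate over the n draws.
    ipa_mean = sum(ipa_estimate(theta, v) for v in vs) / n
    # Central finite difference of the sample mean, same draws on both sides.
    fd = sum(Q(theta + h, v) - Q(theta - h, v) for v in vs) / (2.0 * h * n)
    return ipa_mean, fd


ipa_mean, fd = compare()
print("IPA average:", ipa_mean)
print("finite-difference estimate:", fd)
```

The jump in Q as a function of θ is what breaks the interchange here; the finite difference is shown only as a reference value and is not itself one of the estimators discussed on the slide.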
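8 A Simple Example: Exponential Distribution
- Let $Z$ be an exponential random variable with mean $\theta$. That is, $p_Z(z \mid \theta) = e^{-z/\theta}/\theta$. Define $L = E(Z) = \theta$. Then $\partial L/\partial\theta = 1$.
- LR/SF estimate: $V = Z$, $Q(\theta, V) = V$, giving
  $$Q(\theta,V)\, \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} = V \left( \frac{V}{\theta^2} - \frac{1}{\theta} \right)$$
- IPA estimate: $V \sim U(0,1)$, $Q(\theta, V) = -\theta \log V$ (i.e., $Z = -\theta \log V$), giving
  $$\frac{\partial Q(\theta,V)}{\partial \theta} = -\log V$$
- Both the LR/SF and IPA estimators are unbiased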
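The two estimators in this example can be compared by Monte Carlo. The sketch below assumes the exponential setup just described (mean θ, density exp(-z/θ)/θ); the function names, sample size, and seed are illustrative choices, not part of the slides.

```python
import math
import random


def lr_sf_estimate(theta, z):
    """LR/SF: Q * dlog p/dtheta with Q = z and p(z|theta) = exp(-z/theta)/theta."""
    return z * (z / theta**2 - 1.0 / theta)


def ipa_estimate(u):
    """IPA: derivative w.r.t. theta of Q = -theta*log(u), with u ~ U(0,1)."""
    return -math.log(u)


def mean_and_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def compare(theta=2.0, n=200_000, seed=1):
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which keeps log() finite.
    us = [1.0 - rng.random() for _ in range(n)]
    zs = [-theta * math.log(u) for u in us]  # exponential draws with mean theta
    lr = mean_and_var([lr_sf_estimate(theta, z) for z in zs])
    ipa = mean_and_var([ipa_estimate(u) for u in us])
    return lr, ipa


(lr_mean, lr_var), (ipa_mean, ipa_var) = compare()
print("LR/SF mean, variance:", lr_mean, lr_var)
print("IPA   mean, variance:", ipa_mean, ipa_var)
```

Both sample means should sit near the true derivative of 1. Working out the moments for this setup gives a variance of 13 for the LR/SF estimator and 1 for the IPA estimator, independent of θ, which matches the general remark that IPA tends to have lower variance when it applies.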
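9 Stochastic Optimization with Gradient Estimate
- Use the gradient estimates in the root-finding stochastic approximation (SA) algorithm to minimize the loss function $L(\theta) = E[Q(\theta, V)]$: Find $\theta^*$ such that $g(\theta^*) = 0$ based on simulation outputs
- A general root-finding SA algorithm:
  $$\hat\theta_{k+1} = \hat\theta_k - a_k Y_k(\hat\theta_k)$$
  where $Y_k(\hat\theta_k)$ is an estimate of $g(\hat\theta_k)$ and $a_k$ is the step size with $a_k > 0$, $\sum_k a_k = \infty$, $\sum_k a_k^2 < \infty$
- If $Y_k$ is unbiased and has bounded variance (and other appropriate assumptions hold), then $\hat\theta_k \to \theta^*$ (a.s.)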
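10 Simulation-Based Optimization
- Use gradient estimate derived from one simulation run in the iteration of SA:
  $$Y_k(\hat\theta_k) = \left. \left[ Q(\theta,V_k)\, \frac{\partial \log p_V(V_k \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta,V_k)}{\partial \theta} \right] \right|_{\theta = \hat\theta_k}$$
  where $V_k$ is the realization of $V$ from a simulation run with parameter $\theta$ set at $\hat\theta_k$
- Iteration cycle:
  - Run one simulation with $\hat\theta_k$ to obtain $V_k$
  - Derive gradient estimate from $V_k$
  - Iterate SA with the gradient estimate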
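The three-step cycle can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the hypothetical loss Q(θ, V) = (Z - c)^2 with Z = -θ log V and V ~ U(0,1), the gain sequence a_k = a/(k + 1 + A), and the projection interval [0.01, 10]. For this loss L(θ) = 2θ² - 2cθ + c², so the minimizer is c/2.

```python
import math
import random


def ipa_gradient(theta, u, c):
    """dQ/dtheta for the hypothetical Q = (Z - c)^2 with Z = -theta*log(u)."""
    e = -math.log(u)  # unit-mean exponential draw
    return 2.0 * (theta * e - c) * e


def run_sa(theta0=3.0, c=2.0, n_iter=20_000, a=0.5, A=50.0, seed=2):
    rng = random.Random(seed)
    theta = theta0
    for k in range(n_iter):
        u = 1.0 - rng.random()         # step 1: one "simulation run" gives V_k in (0, 1]
        y = ipa_gradient(theta, u, c)  # step 2: gradient estimate from V_k
        a_k = a / (k + 1 + A)          # gains with sum a_k = inf, sum a_k^2 < inf
        theta = theta - a_k * y        # step 3: SA update
        # Projection onto an assumed feasible interval (an illustrative safeguard).
        theta = min(max(theta, 0.01), 10.0)
    return theta


theta_hat = run_sa()
print("final SA iterate:", theta_hat, "(minimizer of L is c/2 = 1.0)")
```

Because θ enters only through the transformation of V, this toy problem is a pure IPA case; a problem with distributional parameters would add the score-function term to `ipa_gradient`.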
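11 Example: Experimental Response (Examples 15.4 and 15.5 in ISSO)
- Let $V_k$ be i.i.d. randomly generated binary (on-off) stimuli with "on" probability $\lambda$. Assume $Q(\lambda, \beta, V_k)$ represents the negative of the specimen response, where $\beta$ is a design parameter. Objective is to design the experiment to maximize the response (i.e., minimize $Q$) by selecting values for $\lambda$ and $\beta$.
- Gradient estimate, with $\theta = [\lambda, \beta]^T$:
  $$Y_k = \begin{bmatrix} Q\, \dfrac{V_k - \lambda}{\lambda (1 - \lambda)} + Q'_\lambda \\ Q'_\beta \end{bmatrix}$$
  where $Q = Q(\lambda, \beta, V_k)$ and $Q'_x$ denotes the derivative of $Q$ w.r.t. $x$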
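Because the stimulus is binary, the expectation of a mixed LR/SF and IPA gradient estimate can be computed exactly by summing over the two outcomes. The sketch below does this for a made-up response function, which is an assumption for illustration and not the response function used in ISSO: Q(λ, β, v) = (β - 2λ)² - βv, for which L(λ, β) = (β - 2λ)² - βλ. The score term uses the Bernoulli result d log p_V/dλ = (v - λ)/(λ(1 - λ)).

```python
def Q(lam, beta, v):
    """Hypothetical response function (NOT the one in ISSO)."""
    return (beta - 2.0 * lam) ** 2 - beta * v


def dQ_dlam(lam, beta, v):
    return -4.0 * (beta - 2.0 * lam)


def dQ_dbeta(lam, beta, v):
    return 2.0 * (beta - 2.0 * lam) - v


def gradient_estimate(lam, beta, v):
    """Score-function term for lambda plus pathwise terms for lambda and beta."""
    score = (v - lam) / (lam * (1.0 - lam))  # d log p_V / d lambda for Bernoulli(lam)
    return (Q(lam, beta, v) * score + dQ_dlam(lam, beta, v),
            dQ_dbeta(lam, beta, v))


def expected_estimate(lam, beta):
    """Exact expectation of the estimate over v in {0, 1}."""
    g_on = gradient_estimate(lam, beta, 1)
    g_off = gradient_estimate(lam, beta, 0)
    return tuple(lam * a + (1.0 - lam) * b for a, b in zip(g_on, g_off))


def true_gradient(lam, beta):
    """Gradient of L(lam, beta) = (beta - 2*lam)^2 - beta*lam."""
    return (-4.0 * (beta - 2.0 * lam) - beta,
            2.0 * (beta - 2.0 * lam) - lam)


est = expected_estimate(0.3, 1.5)
exact = true_gradient(0.3, 1.5)
print("E[estimate]  :", est)
print("true gradient:", exact)
```

The two printed vectors should agree up to rounding, which shows that the λ component needs both terms: λ acts here as a distributional parameter (through the "on" probability) and as a structural one (through Q).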
12 Experimental Response (continued)
- Specific response function:
- where $\beta$ is a structural parameter, but $\lambda$ is both a distributional and a structural parameter. Then
13 Search Path in Experimental Response Problem
14 Sample Path Method
- Sample path method based on reusing a fixed set of simulation runs
- Method based on minimizing $\bar L_N(\theta)$ rather than $L(\theta)$
  - $\bar L_N(\theta)$ represents sample mean of $N$ simulation runs
- If $N$ is large, then minimum of $\bar L_N(\theta)$ is close to minimum of $L(\theta)$ (under conditions)
- Optimization problem with $\bar L_N(\theta)$ is effectively deterministic
  - Can use standard nonlinear programming
  - IPA and/or LR/SF methods of gradient estimation still relevant
- Generally need to choose a fixed value of $\theta$ (reference value) to produce the $N$ simulation runs
  - Choice of reference value has impact on $\bar L_N(\theta)$ for finite $N$
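A minimal sketch of the idea, under assumptions chosen for illustration: the hypothetical loss Q(θ, V) = (Z - c)² with Z = -θ log V and V ~ U(0,1), a fixed set of N draws reused for every θ, and a hand-written golden-section search standing in for "standard nonlinear programming". Since the randomness here does not depend on θ, no reference value is needed in this toy case.

```python
import math
import random


def make_sample_path_loss(n=5000, c=2.0, seed=3):
    """Build the sample-mean loss from n fixed draws that are reused for every theta."""
    rng = random.Random(seed)
    es = [-math.log(1.0 - rng.random()) for _ in range(n)]  # unit-mean exponentials

    def loss(theta):
        return sum((theta * e - c) ** 2 for e in es) / n

    return loss, es


def golden_section_min(f, lo, hi, tol=1e-6):
    """Deterministic 1-D minimizer for a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:        # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)


c = 2.0
loss, es = make_sample_path_loss(c=c)
theta_hat = golden_section_min(loss, 0.01, 10.0)
# Closed-form minimizer of this quadratic sample-mean loss, used as a check.
theta_closed = c * sum(es) / sum(e * e for e in es)
print("golden-section solution:", theta_hat)
print("closed-form minimizer  :", theta_closed)
```

Once the draws are fixed, `loss` is an ordinary deterministic function of θ, which is why a deterministic search applies. The minimizer of the true L for this toy loss is c/2 = 1, and the sample-path solution approaches it as N grows.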
15 Accuracy of Sample Path Method
- Interested in accuracy of sample path method in seeking true optimal $\theta^*$ (minimum of $L(\theta)$)
- Let $\theta^*_N$ represent minimum of surrogate loss $\bar L_N(\theta)$
- Let $\hat\theta_N$ denote final solution from nonlinear programming method
- Hence, error in estimate is due to two sources:
  - Error in nonlinear programming solution to finding $\theta^*_N$
  - Difference in $\theta^*$ and $\theta^*_N$
- Triangle inequality can be used to provide bound to overall error:
  $$\| \hat\theta_N - \theta^* \| \le \| \hat\theta_N - \theta^*_N \| + \| \theta^*_N - \theta^* \|$$
- Sometimes numerical values can be assigned to two right-hand terms in triangle inequality
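The two error sources can be separated numerically on a toy problem where both the true minimizer and the surrogate minimizer are known. The setup is assumed for illustration only: Q(θ, V) = (θE - c)² with E a unit-mean exponential draw, so the true minimizer is c/2 and the surrogate minimizer has a closed form. A deliberately short gradient descent plays the role of an imperfect nonlinear programming solver.

```python
import math
import random


def error_terms(n=2000, c=2.0, n_gd_steps=5, step=0.1, seed=4):
    rng = random.Random(seed)
    es = [-math.log(1.0 - rng.random()) for _ in range(n)]  # fixed sample path
    s1 = sum(es) / n
    s2 = sum(e * e for e in es) / n

    theta_true = c / 2.0        # minimizer of the true loss L
    theta_star_n = c * s1 / s2  # exact minimizer of the sample-mean loss

    # A few gradient-descent steps on the sample-mean loss (stopped early on purpose).
    theta_hat = 3.0
    for _ in range(n_gd_steps):
        grad = 2.0 * (theta_hat * s2 - c * s1)
        theta_hat -= step * grad

    total = abs(theta_hat - theta_true)            # overall error
    optim_err = abs(theta_hat - theta_star_n)      # solver error
    sampling_err = abs(theta_star_n - theta_true)  # finite-N error
    return total, optim_err, sampling_err


total, optim_err, sampling_err = error_terms()
print("overall error :", total)
print("solver error  :", optim_err)
print("sampling error:", sampling_err)
```

Running more solver iterations shrinks only the solver term, and increasing `n` shrinks only the sampling term, which is the practical reading of the bound.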