1
Lecture 2: Parameter Estimation and Evaluation of Support
2
Parameter Estimation
"The problem of estimation is of more central importance (than hypothesis testing)... for in almost all situations we know that the effect whose significance we are measuring is perfectly real, however small; what is at issue is its magnitude." (Edwards, 1992, pg. 2)
"An insignificant result, far from telling us that the effect is non-existent, merely warns us that the sample was not large enough to reveal it." (Edwards, 1992, pg. 2)
3
Parameter Estimation
  • Finding Maximum Likelihood Estimates (MLEs)
    • Local optimization (optim)
      • Gradient methods
      • Simplex (Nelder-Mead)
    • Global optimization
      • Simulated Annealing (anneal)
      • Genetic Algorithms (rgenoud)
  • Evaluating the strength of evidence (support) for different parameter estimates
    • Support Intervals
      • Asymptotic Support Intervals
      • Simultaneous Support Intervals
    • The shape of likelihood surfaces around MLEs

4
Parameter estimation: finding peaks on likelihood surfaces...
The variation in likelihood across different sets of parameter values defines a likelihood surface...
The goal of parameter estimation is to find the peak of the likelihood surface... (optimization)
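For concreteness, here is a minimal sketch in R of what a likelihood surface is, assuming a toy linear model with parameters a, b, and sd (the lecture's own models and data are not shown in the transcript); the later sketches reuse this loglik() function.

  # Toy data: y = a + b*x + normal error with standard deviation sd
  set.seed(1)
  x <- runif(100, 0, 10)
  y <- 2 + 0.5 * x + rnorm(100, sd = 1)
  # Log-likelihood as a function of the parameter vector p = c(a, b, sd);
  # viewed over all possible p, this defines the likelihood surface
  loglik <- function(p) {
    if (p[3] <= 0) return(-Inf)                  # sd must be positive
    sum(dnorm(y, mean = p[1] + p[2] * x, sd = p[3], log = TRUE))
  }
  loglik(c(2, 0.5, 1))                           # height of the surface at one point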
5
Local vs Global Optimization
  • Fast local optimization methods
    • Large family of methods, widely used for nonlinear regression in commercial software packages
  • Brute force global optimization methods
    • Grid search
    • Genetic algorithms
    • Simulated annealing

(Figure: a likelihood surface with both a local optimum and the global optimum)
6
Local Optimization: Gradient Methods
  • Derivative-based (Newton-Raphson) methods

(Figure: a likelihood surface)

General approach: vary the parameter estimate systematically and search for a zero slope in the first derivative of the likelihood function... (using numerical methods to estimate the derivative, and checking the second derivative to make sure it is a maximum, not a minimum)
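As a 1-D illustration of the gradient idea (a sketch only, reusing loglik() and the toy data from the earlier example), hold b and sd fixed and solve for the zero of the first derivative with respect to a by Newton-Raphson, estimating the derivatives numerically:

  newton_a <- function(a, b = 0.5, sd = 1, h = 1e-4, tol = 1e-8) {
    f <- function(a) loglik(c(a, b, sd))
    repeat {
      d1 <- (f(a + h) - f(a - h)) / (2 * h)           # numerical first derivative
      d2 <- (f(a + h) - 2 * f(a) + f(a - h)) / h^2    # numerical second derivative
      a_new <- a - d1 / d2                            # Newton-Raphson step
      if (abs(a_new - a) < tol) break
      a <- a_new
    }
    stopifnot(d2 < 0)      # negative curvature: a maximum, not a minimum
    a
  }
  newton_a(a = 0)          # converges to the conditional MLE of a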
7
Local Optimization: No Gradient
  • The Simplex (Nelder-Mead) method
    • Much simpler to program
    • Does not require calculation or estimation of a derivative
    • No general theoretical proof that it works (but lots of happy practitioners)
    • Implemented as method "Nelder-Mead" in the optim() function in R (see the sketch below)

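A minimal sketch of the call, reusing loglik() from above; optim() minimizes by default, so fnscale = -1 makes it maximize the log-likelihood (the starting values below are arbitrary assumptions):

  fit_nm <- optim(par = c(a = 0, b = 0, sd = 2),     # starting values
                  fn = loglik,
                  method = "Nelder-Mead",
                  control = list(fnscale = -1, maxit = 5000))
  fit_nm$par           # the MLEs
  fit_nm$value         # log-likelihood at the peak
  fit_nm$convergence   # 0 indicates the simplex converged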
8
Global Optimization: Grid Searches
  • Simplest form of optimization (and rarely used in practice)
  • Systematically search parameter space at a grid of points
  • Can be useful for visualization of the broad features of a likelihood surface (see the sketch below)

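A grid-search sketch, again reusing loglik(); sd is held at 1 (an assumption made only to keep the grid two-dimensional) so the broad features of the surface over a and b can be visualized:

  grid <- expand.grid(a = seq(0, 4, length = 50),
                      b = seq(0, 1, length = 50))
  grid$ll <- apply(grid, 1, function(p) loglik(c(p["a"], p["b"], 1)))
  grid[which.max(grid$ll), ]                     # best point on the grid
  contour(x = unique(grid$a), y = unique(grid$b),
          z = matrix(grid$ll, nrow = 50),
          xlab = "a", ylab = "b")                # broad view of the surface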
9
Global Optimization: Genetic Algorithms
  • Based on a fairly literal analogy with evolution
    • Start with a reasonably large population of parameter sets
    • Calculate the fitness (likelihood) of each individual set of parameters
    • Create the next generation of parameter sets based on the fitness of the parents, and various rules for recombination of subsets of parameters ("genes")
    • Let the population evolve until fitness reaches a maximum asymptote
  • Implemented in the rgenoud package in R: cool, but slow for large datasets with large numbers of parameters (see the sketch below)

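A hedged sketch of the genoud() call from the rgenoud package, reusing loglik(); the population size, bounds, and other settings below are illustrative assumptions rather than recommendations:

  library(rgenoud)
  fit_ga <- genoud(fn = loglik,
                   nvars = 3,                        # a, b, sd
                   max = TRUE,                       # maximize, not minimize
                   pop.size = 500,                   # population of parameter sets
                   Domains = cbind(c(-10, -5, 0.01), # lower bounds
                                   c( 10,  5, 10)),  # upper bounds
                   boundary.enforcement = 2)         # keep individuals inside the bounds
  fit_ga$par      # fittest parameter set found
  fit_ga$value    # its log-likelihood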
10
Global optimization - Simulated Annealing
  • Analogy with the physical process of annealing
  • Start the process at a high temperature
  • Gradually reduce the temperature according to an
    annealing schedule
  • Always accept uphill moves (i.e. an increase in
    likelihood)
  • Accept downhill moves according to the Metropolis
    algorithm

p = exp(Δlh / t), where p = probability of accepting a downhill move, Δlh = magnitude of the change in likelihood, and t = temperature
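As a sketch, the acceptance rule in R (Δlh written as dlh; reused in the annealing sketch later):

  accept_move <- function(dlh, t) {
    if (dlh > 0) TRUE else runif(1) < exp(dlh / t)   # downhill moves: Metropolis rule
  }
  accept_move(dlh = -1, t = 3.0)    # often accepted at a high temperature
  accept_move(dlh = -1, t = 0.1)    # almost never accepted once t is low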
11
Effect of temperature (t)
12
Simulated Annealing in practice...
A version with automatic adjustment of range... (Figure: the search range (step size) around the current parameter value, bounded by the lower and upper bounds)
REFERENCES
Goffe, W. L., G. D. Ferrier, and J. Rogers. 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60:65-99.
Corana et al. 1987. Minimizing multimodal functions of continuous variables with the simulated annealing algorithm. ACM Transactions on Mathematical Software 13:262-280.
13
Effect of C on Adjusting Range...
14
Constraints: setting limits for the search...
  • Biological limits
    • Values that make no sense biologically (be careful...)
  • Algebraic limits
    • Values for which the model is undefined (e.g. dividing by zero...)

Bottom line: global optimization methods let you cast your net widely, at the cost of computer time...
15
Simulated Annealing - Initialization
  • Set the annealing schedule
    • Initial temperature (t) (3.0)
    • Rate of reduction in temperature (rt) (0.95)
    • Interval between drops in temperature (nt) (100)
    • Interval between changes in range (ns) (20)
  • Set the parameter values
    • Initial values (x)
    • Upper and lower bounds (lb, ub)
    • Initial range (vm)

Typical values are shown in parentheses... (a generic sketch of this schedule follows)
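The following is a generic sketch of this schedule in R (it is not the anneal() function itself), reusing loglik() and accept_move() from the earlier sketches; the toy bounds and the shortened run length are assumptions made so that the example runs quickly.

  sim_anneal <- function(f, x, lb, ub, vm = (ub - lb) / 10,
                         t = 3.0, rt = 0.95, nt = 100, ns = 20,
                         n_temp_drops = 10) {
    best <- list(par = x, ll = f(x))
    cur  <- best
    for (drop in 1:n_temp_drops) {        # each pass ends with a temperature drop
      for (range_adj in 1:nt) {           # nt range adjustments per temperature
        n_acc <- numeric(length(x))       # acceptances per parameter
        for (step in 1:ns) {              # ns candidate steps per range adjustment
          for (i in seq_along(x)) {
            cand <- cur$par
            cand[i] <- cand[i] + runif(1, -vm[i], vm[i])   # random step within range
            cand[i] <- min(max(cand[i], lb[i]), ub[i])     # respect the bounds
            ll <- f(cand)
            if (accept_move(ll - cur$ll, t)) {
              cur <- list(par = cand, ll = ll)
              n_acc[i] <- n_acc[i] + 1
              if (ll > best$ll) best <- cur
            }
          }
        }
        # widen the range where most moves are accepted, narrow it where most
        # are rejected (cf. Corana et al. 1987), and cap it at the bounds
        vm <- ifelse(n_acc / ns > 0.6, vm * 1.5,
                     ifelse(n_acc / ns < 0.4, vm / 1.5, vm))
        vm <- pmin(vm, ub - lb)
      }
      t <- t * rt                         # reduce the temperature
    }
    best
  }
  # a deliberately short (and therefore rough) run on the toy model
  sim_anneal(loglik, x = c(0, 0, 2), lb = c(-10, -5, 0.01), ub = c(10, 5, 10),
             nt = 10, n_temp_drops = 20)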
16
How many iterations?...
Logistic regression of windthrow susceptibility (188 parameters): 5 million is not enough!
Red maple leaf litterfall (6 parameters): 500,000 is way more than necessary!
What would constitute convergence?...
17
Optimization - Summary
  • No hard and fast rules for any optimization; be willing to explore alternate options.
  • Be wary of the initial values used in local optimization when the model is at all complicated.
  • How about a hybrid approach? Start with simulated annealing, then switch to a local optimization.

18
Evaluating the strength of evidence for the MLE
Now that you have an MLE, how should you evaluate it? (Hint: think about the shape of the likelihood function, not just the MLE.)
19
Strength of evidence for particular parameter estimates: Support
Log-likelihood = Support (Edwards 1992)
  • Likelihood provides an objective measure of the strength of evidence for different parameter estimates...
20
Profile Likelihood
  • Evaluate support (information) for a range of values of a given parameter by treating all other parameters as nuisance parameters and holding them at their MLEs (see the sketch below)
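A sketch of this, reusing loglik() and the Nelder-Mead fit (fit_nm) from the earlier sketches: vary a over a grid while b and sd are held at their MLEs.

  mle     <- fit_nm$par
  a_grid  <- seq(mle["a"] - 1, mle["a"] + 1, length = 200)
  prof_ll <- sapply(a_grid, function(a) loglik(c(a, mle["b"], mle["sd"])))
  plot(a_grid, prof_ll, type = "l",
       xlab = "a", ylab = "log-likelihood (support)")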
21
Asymptotic vs. Simultaneous M-Unit Support Limits
  • Asymptotic Support Limits (based on the Profile Likelihood)
    • Hold all other parameters at their MLE values, and systematically vary the remaining parameter until the likelihood declines by a chosen amount (m)...

What should m be? (2 is a good number, and is roughly analogous to a 95% CI)
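Continuing the profile sketch above, the asymptotic 2-unit support limits for a can be read off directly:

  m <- 2
  supported <- a_grid[max(prof_ll) - prof_ll <= m]
  range(supported)     # lower and upper asymptotic m-unit support limits for a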
22
Asymptotic vs. Simultaneous M-Unit Support Limits
  • Simultaneous Support Limits
    • Resampling method: draw a very large number of random sets of parameters and calculate the log-likelihood. M-unit simultaneous support limits for parameter xi are the upper and lower limits that don't differ by more than m units of support...

In practice, it can require an enormous number of iterations to do this if there are more than a few parameters (see the sketch below).
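A sketch of the resampling idea, reusing loglik(), mle, and fit_nm from the earlier sketches; the number of draws and the sampling ranges below are arbitrary assumptions (real problems typically need far more draws):

  m      <- 2
  n_draw <- 1e5
  draws  <- cbind(a  = runif(n_draw, mle["a"]  - 1,   mle["a"]  + 1),
                  b  = runif(n_draw, mle["b"]  - 0.3, mle["b"]  + 0.3),
                  sd = runif(n_draw, mle["sd"] - 0.5, mle["sd"] + 0.5))
  ll   <- apply(draws, 1, loglik)                 # support for each random set
  keep <- draws[fit_nm$value - ll <= m, , drop = FALSE]
  apply(keep, 2, range)   # simultaneous m-unit support limits, one column per parameter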
23
Asymptotic vs. Simultaneous Support Limits
(Figure: a hypothetical likelihood surface for 2 parameters, Parameter 1 vs. Parameter 2, showing the 2-unit drop in support and both the asymptotic and the simultaneous 2-unit support limits for P1)
24
Other measures of strength of evidence for
different parameter estimates
  • Edwards (1992, Chapter 5)
  • Various measures of the shape of the likelihood
    surface in the vicinity of the MLE...

How pointed is the peak?...
25
Evaluating Support for Parameter Estimates: A Frequentist Approach
  • Traditional confidence intervals and standard errors of the parameter estimates can be generated from the Hessian matrix
    • Hessian matrix: the matrix of second partial derivatives of the likelihood function with respect to the parameters, evaluated at the maximum likelihood estimates
    • Also called Fisher's Information Matrix
    • Provides a measure of the steepness of the likelihood surface in the region of the optimum
    • Can be generated in R using optim() and fdHess() (see the sketch below)

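A sketch of this, reusing loglik() and fit_nm from the earlier sketches; fdHess() (in the nlme package) numerically differentiates the log-likelihood at the MLEs, and the negative Hessian is then inverted as shown on the next two slides:

  library(nlme)
  hess     <- fdHess(fit_nm$par, loglik)$Hessian   # second partial derivatives at the MLEs
  vcov_mat <- solve(-1 * hess)                     # variance-covariance matrix
  sqrt(diag(vcov_mat))                             # standard errors of a, b, and sd
  # optim() can also return a Hessian directly via the argument hessian = TRUE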
26
Example from R
The Hessian matrix (when maximizing a log likelihood) is a numerical approximation for Fisher's Information Matrix (i.e. the matrix of second partial derivatives of the likelihood function), evaluated at the point of the maximum likelihood estimates. Thus, it's a measure of the steepness of the drop in the likelihood surface as you move away from the MLE.

  > res$hessian
               a           b        sd
  a     -150.182   -2758.360    -0.201
  b    -2758.360  -67984.416    -5.925
  sd      -0.202      -5.926  -299.422

(sample output from an analysis that estimates two parameters and a variance term)
27
More from R
Now invert ("solve" in R parlance) the negative of the Hessian matrix to get the matrix of parameter variances and covariances:

  > solve(-1*res$hessian)
                  a             b            sd
  a    2.613229e-02 -1.060277e-03  3.370998e-06
  b   -1.060277e-03  5.772835e-05 -4.278866e-07
  sd   3.370998e-06 -4.278866e-07  3.339775e-03

The square roots of the diagonals of the inverted negative Hessian are the standard errors:

  > sqrt(diag(solve(-1*res$hessian)))
        a        b       sd
   0.1616 0.007597  0.05779

(and ±1.96 S.E. gives a 95% C.I.)