Title: Lecture 2: Parameter Estimation and Evaluation of Support
Slide 1: Lecture 2: Parameter Estimation and Evaluation of Support
Slide 2: Parameter Estimation
"The problem of estimation is of more central importance (than hypothesis testing)... for in almost all situations we know that the effect whose significance we are measuring is perfectly real, however small; what is at issue is its magnitude." (Edwards, 1992, pg. 2)
"An insignificant result, far from telling us that the effect is non-existent, merely warns us that the sample was not large enough to reveal it." (Edwards, 1992, pg. 2)
Slide 3: Parameter Estimation
- Finding Maximum Likelihood Estimates (MLEs)
  - Local optimization (optim)
    - Gradient methods
    - Simplex (Nelder-Mead)
  - Global optimization
    - Simulated Annealing (anneal)
    - Genetic Algorithms (rgenoud)
- Evaluating the strength of evidence (support) for different parameter estimates
  - Support Intervals
    - Asymptotic Support Intervals
    - Simultaneous Support Intervals
  - The shape of likelihood surfaces around MLEs
Slide 4: Parameter estimation: finding peaks on likelihood surfaces...
The variation in likelihood for any given set of parameter values defines a likelihood surface...
The goal of parameter estimation is to find the peak of the likelihood surface... (optimization)
Slide 5: Local vs. Global Optimization
- Fast local optimization methods
  - Large family of methods, widely used for nonlinear regression in commercial software packages
- Brute force global optimization methods
  - Grid search
  - Genetic algorithms
  - Simulated annealing
[Figure: a likelihood surface with both a global optimum and a local optimum]
Slide 6: Local Optimization: Gradient Methods
- Derivative-based (Newton-Raphson) methods
[Figure: a likelihood surface]
General approach: vary the parameter estimate systematically and search for zero slope in the first derivative of the likelihood function... (using numerical methods to estimate the derivative, and checking the second derivative to make sure it is a maximum, not a minimum). A sketch in R follows below.
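As an illustration that is not part of the original slides, here is a minimal sketch of a gradient-based local search in R using optim() with the "BFGS" quasi-Newton method; the simulated data, the normal model, and the starting values are all invented for the example.

## Minimal sketch: gradient-based (quasi-Newton) local optimization with
## optim(method = "BFGS"). Data, model, and starting values are invented.
set.seed(42)
y <- rnorm(100, mean = 5, sd = 2)                # fake data for illustration

## Negative log-likelihood for a normal model; sd is kept positive by
## estimating it on the log scale
nll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))

fit <- optim(par = c(mu = 0, log_sd = 0), fn = nll, method = "BFGS")
c(mu = fit$par[["mu"]], sd = exp(fit$par[["log_sd"]]))   # MLEs on the natural scale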
Slide 7: Local Optimization: No Gradient
- The Simplex (Nelder-Mead) method
  - Much simpler to program
  - Does not require calculation or estimation of a derivative
  - No general theoretical proof that it works (but lots of happy practitioners)
  - Implemented as method "Nelder-Mead" in the optim function in R (see the sketch below)
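A minimal sketch of a Nelder-Mead fit with optim() is below; the linear model, the simulated data, and the starting values are assumptions made purely for illustration.

## Minimal sketch: derivative-free local optimization with the Nelder-Mead
## simplex (the default method in optim). Data and model are invented.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

## Negative log-likelihood for y ~ Normal(a + b*x, sd), with sd on the log scale
nll <- function(p) {
  mu <- p["a"] + p["b"] * x
  -sum(dnorm(y, mean = mu, sd = exp(p["log_sd"]), log = TRUE))
}

fit <- optim(par = c(a = 0, b = 0, log_sd = 0), fn = nll, method = "Nelder-Mead")
fit$par          # MLEs of a, b, and log(sd)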
Slide 8: Global Optimization: Grid Searches
- Simplest form of optimization (and rarely used in practice)
- Systematically search parameter space at a grid of points
- Can be useful for visualization of the broad features of a likelihood surface (see the sketch below)
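A minimal grid-search sketch in R, again using an invented linear model and simulated data; the grid ranges, and the choice to hold sd fixed so the grid stays two-dimensional, are illustrative assumptions.

## Minimal sketch: brute-force grid search over two parameters (a and b),
## with sd held fixed at 1 for simplicity. Data are invented.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
loglik <- function(a, b) sum(dnorm(y, mean = a + b * x, sd = 1, log = TRUE))

grid <- expand.grid(a = seq(0, 4, by = 0.1), b = seq(0, 1, by = 0.02))
grid$loglik <- mapply(loglik, grid$a, grid$b)      # evaluate at every grid point

grid[which.max(grid$loglik), ]                     # best grid point
## The grid can also be passed to contour() or image() to visualize the surface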
Slide 9: Global Optimization: Genetic Algorithms
- Based on a fairly literal analogy with evolution
  - Start with a reasonably large population of parameter sets
  - Calculate the fitness (likelihood) of each individual set of parameters
  - Create the next generation of parameter sets based on the fitness of the parents, and various rules for recombination of subsets of parameters (genes)
  - Let the population evolve until fitness reaches a maximum asymptote
- Implemented as genoud in the rgenoud package in R: cool, but slow for large datasets with large numbers of parameters (see the sketch below)
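A minimal sketch of a genoud() call from the rgenoud package is below; the model, the data, the population size, and the parameter bounds (Domains) are invented for illustration and would need tuning for a real problem.

## Minimal sketch: genetic-algorithm optimization with rgenoud::genoud().
## Model, data, population size, and bounds are invented for illustration.
library(rgenoud)
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
loglik <- function(p) sum(dnorm(y, mean = p[1] + p[2] * x, sd = exp(p[3]), log = TRUE))

fit <- genoud(fn = loglik, nvars = 3,
              max = TRUE,                                    # maximize the likelihood
              pop.size = 500,                                # population of parameter sets
              Domains = cbind(c(-10, -10, -5),               # lower bounds
                              c( 10,  10,  5)))              # upper bounds
fit$par      # best parameter set found
fit$value    # log-likelihood of that set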
Slide 10: Global Optimization: Simulated Annealing
- Analogy with the physical process of annealing
  - Start the process at a high temperature
  - Gradually reduce the temperature according to an annealing schedule
  - Always accept uphill moves (i.e. an increase in likelihood)
  - Accept downhill moves according to the Metropolis algorithm:
    p = exp(-Δlh / t)
    where p = probability of accepting a downhill move, Δlh = magnitude of the change in likelihood, and t = temperature (see the sketch below)
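Here is a minimal hand-rolled sketch (not the full Goffe et al. algorithm, and not the anneal function used later in the course) showing the Metropolis acceptance rule inside a simple annealing loop for a one-parameter problem; the data, proposal step size, and cooling schedule are invented.

## Minimal sketch: the Metropolis rule inside a simulated-annealing loop.
## Data, step size, and the annealing schedule are invented for illustration.
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
loglik <- function(mu) sum(dnorm(y, mean = mu, sd = 2, log = TRUE))

mu_cur <- 0                        # current parameter value
ll_cur <- loglik(mu_cur)
temp   <- 3.0                      # initial temperature
for (step in 1:5000) {
  mu_new <- mu_cur + runif(1, -0.5, 0.5)        # propose a move
  ll_new <- loglik(mu_new)
  d      <- ll_new - ll_cur                     # change in log-likelihood
  ## Uphill moves (d > 0) are always accepted; downhill moves are accepted
  ## with probability p = exp(-|d| / temp) (the Metropolis criterion)
  if (d > 0 || runif(1) < exp(d / temp)) {
    mu_cur <- mu_new
    ll_cur <- ll_new
  }
  if (step %% 100 == 0) temp <- 0.95 * temp     # cool every 100 steps
}
mu_cur     # should end up near the MLE (the sample mean of y)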
Slide 11: Effect of temperature (t)
Slide 12: Simulated Annealing in practice...
A version with automatic adjustment of range...
[Figure: search range (step size) shown between a lower bound and an upper bound, with the current parameter value]
REFERENCES:
Goffe, W. L., G. D. Ferrier, and J. Rogers. 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60:65-99.
Corana et al. 1987. Minimizing multimodal functions of continuous variables with the simulated annealing algorithm. ACM Transactions on Mathematical Software 13:262-280.
Slide 13: Effect of C on Adjusting Range...
Slide 14: Constraints: setting limits for the search...
- Biological limits
  - Values that make no sense biologically (be careful...)
- Algebraic limits
  - Values for which the model is undefined (i.e. dividing by zero...)
Bottom line: global optimization methods let you cast your net widely, at the cost of computer time... (a sketch of a bounded search in R follows below)
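As one illustration of imposing such limits in R (an assumption, not a method prescribed by the slides), optim()'s "L-BFGS-B" method accepts explicit lower and upper bounds; the model, data, and bounds here are invented. The global methods discussed above also take explicit bounds (lb/ub for anneal on the next slide, Domains for genoud).

## Minimal sketch: box constraints on a search using optim(method = "L-BFGS-B").
## Data, model, and the bounds themselves are invented for illustration.
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
nll <- function(p) -sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE))

fit <- optim(par = c(mu = 0, sd = 1), fn = nll, method = "L-BFGS-B",
             lower = c(-100, 1e-6),     # algebraic limit: sd must stay positive
             upper = c( 100, 50))       # loose "biological" limits on the values
fit$par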
Slide 15: Simulated Annealing: Initialization
- Set:
  - Annealing schedule
    - Initial temperature (t) (3.0)
    - Rate of reduction in temperature (rt) (0.95)
    - Interval between drops in temperature (nt) (100)
    - Interval between changes in range (ns) (20)
  - Parameter values
    - Initial values (x)
    - Upper and lower bounds (lb, ub)
    - Initial range (vm)
Typical values are shown in parentheses.
Slide 16: How many iterations?...
Logistic regression of windthrow susceptibility (188 parameters): 5 million iterations is not enough!
Red maple leaf litterfall (6 parameters): 500,000 iterations is way more than necessary!
What would constitute convergence?...
Slide 17: Optimization: Summary
- There are no hard and fast rules for any optimization; be willing to explore alternate options.
- Be wary of the initial values used in local optimization when the model is at all complicated.
- How about a hybrid approach? Start with simulated annealing, then switch to a local optimization (see the sketch below).
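A minimal sketch of one such hybrid in base R: a short run of optim()'s built-in "SANN" simulated-annealing method (used here purely as a stand-in for the anneal function from the slides) supplies starting values for a local Nelder-Mead polish; the model and data are invented.

## Minimal sketch of a hybrid strategy: a coarse simulated-annealing run,
## then a local refinement starting where the annealing run ended.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
nll <- function(p) -sum(dnorm(y, mean = p[1] + p[2] * x, sd = exp(p[3]), log = TRUE))

## Stage 1: global-ish search with simulated annealing
stage1 <- optim(par = c(0, 0, 0), fn = nll, method = "SANN",
                control = list(maxit = 20000, temp = 3))

## Stage 2: local polish with Nelder-Mead from the annealing result
stage2 <- optim(par = stage1$par, fn = nll, method = "Nelder-Mead")
stage2$par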
Slide 18: Evaluating the strength of evidence for the MLE
Now that you have an MLE, how should you evaluate it? (Hint: think about the shape of the likelihood function, not just the MLE.)
Slide 19: Strength of evidence for particular parameter estimates: Support
Log-likelihood = Support (Edwards 1992)
- Likelihood provides an objective measure of the strength of evidence for different parameter estimates...
Slide 20: Profile Likelihood
- Evaluate support (information) for a range of values of a given parameter by treating all other parameters as "nuisance" parameters and holding them at their MLEs (see the sketch below)
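Below is a minimal sketch of the procedure as described on this slide (one parameter is varied over a range while the others are held at their joint MLEs); the linear model, the simulated data, and the grid of values are invented for illustration.

## Minimal sketch: support for a range of values of one parameter (b),
## with the other parameters held at their MLEs. Data and model are invented.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
loglik <- function(a, b, sd) sum(dnorm(y, mean = a + b * x, sd = sd, log = TRUE))
nll    <- function(p) -loglik(p[1], p[2], exp(p[3]))

mle <- optim(c(0, 0, 0), nll)$par                 # joint MLEs of a, b, log(sd)

b_grid  <- seq(0.3, 0.7, by = 0.01)               # range of values for parameter b
support <- sapply(b_grid, function(b)
  loglik(a = mle[1], b = b, sd = exp(mle[3])))    # other parameters fixed at MLEs

plot(b_grid, support, type = "l",
     xlab = "b", ylab = "log-likelihood (support)")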
Slide 21: Asymptotic vs. Simultaneous M-Unit Support Limits
- Asymptotic Support Limits (based on profile likelihood)
  - Hold all other parameters at their MLE values, and systematically vary the remaining parameter until the likelihood declines by a chosen amount (m)... (see the sketch below)
What should m be? (2 is a good number, and is roughly analogous to a 95% CI)
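A minimal sketch of finding 2-unit asymptotic support limits for one parameter by root-finding with uniroot(); the model, the data, and the search brackets are invented for illustration.

## Minimal sketch: m = 2 unit asymptotic support limits for parameter b,
## found where support falls 2 units below its maximum while the other
## parameters stay at their MLEs. Data, model, and brackets are invented.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
loglik <- function(a, b, sd) sum(dnorm(y, mean = a + b * x, sd = sd, log = TRUE))
nll    <- function(p) -loglik(p[1], p[2], exp(p[3]))

mle    <- optim(c(0, 0, 0), nll)$par
ll_max <- loglik(mle[1], mle[2], exp(mle[3]))
m      <- 2

## Drop in support relative to the maximum; zero at the m-unit limits
drop_in_support <- function(b) ll_max - loglik(mle[1], b, exp(mle[3])) - m

lower <- uniroot(drop_in_support, c(mle[2] - 1, mle[2]))$root
upper <- uniroot(drop_in_support, c(mle[2], mle[2] + 1))$root
c(lower = lower, MLE = mle[2], upper = upper)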
Slide 22: Asymptotic vs. Simultaneous M-Unit Support Limits
- Simultaneous Support Limits
  - Resampling method: draw a very large number of random sets of parameters and calculate the log-likelihood of each. The m-unit simultaneous support limits for parameter xi are the upper and lower limits of xi among the sets whose support does not differ from the maximum by more than m units... (see the sketch below)
In practice, it can require an enormous number of iterations to do this if there are more than a few parameters.
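A minimal sketch of that resampling idea; the model, the data, the number of draws, and the sampling ranges for each parameter are invented, and a real problem would typically need far more draws.

## Minimal sketch: simultaneous 2-unit support limits by random resampling
## of parameter sets. Data, model, draw count, and ranges are invented.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
loglik <- function(p) sum(dnorm(y, mean = p[1] + p[2] * x, sd = p[3], log = TRUE))

n_draws <- 1e5
draws <- cbind(a  = runif(n_draws, 0, 4),      # random parameter sets drawn
               b  = runif(n_draws, 0, 1),      # uniformly within broad ranges
               sd = runif(n_draws, 0.5, 3))
ll <- apply(draws, 1, loglik)

keep <- ll > max(ll) - 2                       # sets within 2 units of the best
apply(draws[keep, ], 2, range)                 # simultaneous 2-unit limits per parameter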
Slide 23: Asymptotic vs. Simultaneous Support Limits
A hypothetical likelihood surface for 2 parameters...
[Figure: contour plot of a hypothetical likelihood surface over Parameter 1 and Parameter 2, showing the 2-unit drop in support and comparing the asymptotic and simultaneous 2-unit support limits for P1]
Slide 24: Other measures of strength of evidence for different parameter estimates
- Edwards (1992, Chapter 5)
  - Various measures of the shape of the likelihood surface in the vicinity of the MLE...
How pointed is the peak?...
Slide 25: Evaluating Support for Parameter Estimates: A Frequentist Approach
- Traditional confidence intervals and standard errors of the parameter estimates can be generated from the Hessian matrix
  - Hessian matrix: the matrix of second partial derivatives of the likelihood function with respect to the parameters, evaluated at the maximum likelihood estimates
  - Also called the Information Matrix by Fisher
  - Provides a measure of the steepness of the likelihood surface in the region of the optimum
  - Can be generated in R using optim and fdHess (see the sketch after this list)
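A minimal sketch of the optim() route (the fdHess alternative, from the nlme package, is not shown): here the negative log-likelihood is minimized with hessian = TRUE, so the variance-covariance matrix is the inverse of the returned Hessian. The output on the next two slides maximizes the log-likelihood instead, which is why it inverts the negative Hessian. The model and data below are invented.

## Minimal sketch: standard errors from the Hessian returned by optim().
## The *negative* log-likelihood is minimized here, so use solve(hessian);
## when maximizing the log-likelihood, use solve(-hessian) as on the next slides.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
nll <- function(p) -sum(dnorm(y, mean = p[1] + p[2] * x, sd = p[3], log = TRUE))

fit <- optim(par = c(a = 1, b = 1, sd = 1), fn = nll, method = "L-BFGS-B",
             lower = c(-Inf, -Inf, 1e-6),    # keep sd strictly positive
             hessian = TRUE)

vcov_mat <- solve(fit$hessian)               # variance-covariance matrix
se       <- sqrt(diag(vcov_mat))             # standard errors of a, b, and sd
cbind(estimate = fit$par, se = se,
      lower95 = fit$par - 1.96 * se, upper95 = fit$par + 1.96 * se)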
Slide 26: Example from R
The Hessian matrix (when maximizing a log-likelihood) is a numerical approximation of Fisher's Information Matrix (i.e. the matrix of second partial derivatives of the likelihood function), evaluated at the point of the maximum likelihood estimates. Thus, it's a measure of the steepness of the drop in the likelihood surface as you move away from the MLE.

> res$hessian
            a          b       sd
a    -150.182  -2758.360   -0.201
b   -2758.360 -67984.416   -5.925
sd     -0.202     -5.926 -299.422

(sample output from an analysis that estimates two parameters and a variance term)
Slide 27: More from R
Now invert ("solve" in R parlance) the negative of the Hessian matrix to get the matrix of parameter variances and covariances:

> solve(-1 * res$hessian)
              a             b            sd
a  2.613229e-02 -1.060277e-03  3.370998e-06
b -1.060277e-03  5.772835e-05 -4.278866e-07
sd 3.370998e-06 -4.278866e-07  3.339775e-03

The square roots of the diagonals of the inverted negative Hessian are the standard errors:

> sqrt(diag(solve(-1 * res$hessian)))
       a        b       sd
  0.1616 0.007597  0.05779

(and ±1.96 S.E. is a 95% C.I.)