Title: Applied Econometrics
1. Applied Econometrics
- William Greene
- Department of Economics
- Stern School of Business
2. Applied Econometrics
- 22. Simulation Based Estimation
3. Settings
- Conditional and unconditional log likelihoods
- Likelihood function to be maximized contains unobservables
  - Integration techniques
- Bayesian estimation
  - Prior times likelihood is intractable
  - How to obtain posterior means, which are open-form integrals
- The problem in both cases is how to do the integration.
4. A Conditional Log Likelihood
5. Application: Innovation
- Sample: 1,270 German manufacturing firms
- Panel, 5 years, 1984-1988
- Response: Process or product innovation in the survey year? (yes or no)
- Inputs:
  - Imports of products in the industry
  - Pressure from foreign direct investment
  - Other covariates
- Model: Probit with common firm effects
- (Irene Bertschek, doctoral thesis; Journal of Econometrics, 1998)
6. Likelihood Function
- Joint conditional (on u_i) density for observation i
- Unconditional likelihood for observation i
- How do we do the integration to get rid of the heterogeneity in the conditional likelihood? (The formulas are reconstructed below.)
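A plausible reconstruction of the formulas this slide displays, for the random effects probit with normal heterogeneity u_i ~ N[0,1]:

    L_i \mid u_i = \prod_{t=1}^{T} \Phi\big[(2y_{it}-1)(\beta' x_{it} + \sigma u_i)\big]

    L_i = \int_{-\infty}^{\infty} \left\{ \prod_{t=1}^{T} \Phi\big[(2y_{it}-1)(\beta' x_{it} + \sigma u_i)\big] \right\} \phi(u_i)\, du_i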
7. Obtaining the Unconditional Likelihood
- The Butler and Moffitt (1982) method is used by most current software
- Quadrature (e.g., Stata's GLLAMM)
- Works only for normally distributed heterogeneity
8. Hermite Quadrature
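The equations on this slide are not in the transcript; the standard Gauss-Hermite rule it relies on is

    \int_{-\infty}^{\infty} e^{-z^2} f(z)\, dz \approx \sum_{h=1}^{H} w_h f(z_h),

so that for u ~ N[0,1], E[f(u)] \approx (1/\sqrt{\pi}) \sum_{h=1}^{H} w_h f(\sqrt{2}\, z_h).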
9. Example: 8-Point Quadrature
Nodes for 8-point Hermite quadrature (use both signs, + and -):
  0.381186990207322000, 1.15719371244677990,
  1.98165675669584300, 2.93063742025714410
Weights for 8-point Hermite quadrature:
  0.661147012558199960, 0.20780232581489999,
  0.0170779830074100010, 0.000199604072211400010
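A quick numerical check of the table above, a minimal sketch using NumPy's hermgauss; the integrand exp(u) is an illustrative choice with known answer E[exp(u)] = e^0.5 for u ~ N[0,1]:

    import numpy as np

    # 8-point Gauss-Hermite nodes and weights for the weight function exp(-z^2).
    nodes, weights = np.polynomial.hermite.hermgauss(8)
    print(nodes[4:])    # 0.38118699  1.15719371  1.98165676  2.93063742
    print(weights[4:])  # 0.66114701  0.20780233  0.01707798  0.00019960

    # For u ~ N[0,1], E[f(u)] ~ (1/sqrt(pi)) * sum_h w_h * f(sqrt(2)*z_h).
    approx = (weights * np.exp(np.sqrt(2.0) * nodes)).sum() / np.sqrt(np.pi)
    print(approx, np.exp(0.5))  # both approximately 1.64872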
10. Butler and Moffitt's Approach: Random Effects Log Likelihood Function
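The formula itself is missing from the transcript; a plausible reconstruction of the Butler-Moffitt quadrature approximation for this model, using the nodes z_h and weights w_h above, is

    \ln L \approx \sum_{i=1}^{N} \ln \left[ \frac{1}{\sqrt{\pi}} \sum_{h=1}^{H} w_h \prod_{t=1}^{T} \Phi\big((2y_{it}-1)(\beta' x_{it} + \sqrt{2}\,\sigma z_h)\big) \right].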
11. Monte Carlo Integration
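The principle this slide invokes (formula reconstructed, not from the transcript): an integral against a density can be estimated by averaging over draws from that density,

    \int g(u) f(u)\, du = E_u[g(u)] \approx \frac{1}{R} \sum_{r=1}^{R} g(u_r), \qquad u_r \sim f(u),

which converges by the law of large numbers as R grows.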
12. The Simulated Log Likelihood
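A plausible reconstruction of the simulated counterpart of the log likelihood from slide 6, replacing the integral with an average over R draws u_ir ~ N[0,1]:

    \ln L_S = \sum_{i=1}^{N} \ln \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} \Phi\big((2y_{it}-1)(\beta' x_{it} + \sigma u_{ir})\big).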
13. Quasi-Monte Carlo Integration Based on Halton Sequences
For example, using base p = 5, the integer r = 37 has digits b_0 = 2, b_1 = 2, and b_2 = 1 (since 37 = 2 + 2×5 + 1×25). Then H_37(5) = 2×5^-1 + 2×5^-2 + 1×5^-3 = 0.488.
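A minimal sketch of a Halton draw generator that reproduces the worked example; the function name halton is mine, not from the slides:

    def halton(r, base):
        # Radical inverse of the integer r in the given prime base:
        # reflect the base-p digits of r about the decimal point.
        h, f = 0.0, 1.0 / base
        while r > 0:
            r, digit = divmod(r, base)
            h += digit * f
            f /= base
        return h

    print(halton(37, 5))  # 0.488 = 2/5 + 2/25 + 1/125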
14. Panel Data Estimation: A Random Effects Probit Model
15. Log Likelihood
16. (ρ = 1.17072² / (1 + 1.17072²) = 0.578)
17. Quadrature vs. Simulation
- Computationally, comparably difficult
- Numerically, essentially the same answer. MSL is consistent in R.
- Advantages of simulation:
  - Can integrate over any distribution, not just normal
  - Can integrate over multiple random variables. Quadrature is largely unable to do this.
  - Models based on simulation are being extended in many directions.
  - Simulation-based estimation allows estimation of conditional means, essentially the same as Bayesian posterior means.
18. A Random Parameters Model
19. Estimates of a Random Parameters Model
----------------------------------------------------------------------
Probit Regression: Start Values for IP
Dependent variable             IP
Log likelihood function        -4134.84707
Estimation based on N = 6350, K = 6
Information Criteria: Normalization = 1/N
              Normalized     Unnormalized
AIC              1.30420       8281.69414
----------------------------------------------------------------------
Variable   Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
----------------------------------------------------------------------
Constant      -2.34718       .21381      -10.978    .0000
FDIUM          3.39290       .39359        8.620    .0000      .04581
IMUM            .90941       .14333        6.345    .0000      .25275
LOGSALES        .24292       .01937       12.538    .0000     10.5401
SP             1.16687       .14072        8.292    .0000      .07428
PROD          -4.71078       .55278       -8.522    .0000      .08962
----------------------------------------------------------------------
20. RPM
----------------------------------------------------------------------
Random Coefficients Probit Model
Dependent variable             IP
Log likelihood function        -3778.66358
Restricted log likelihood      -4134.84707
Chi squared [3 d.f.]           712.36699
Significance level             .00000
McFadden Pseudo R-squared      .0861419
Estimation based on N = 6350, K = 9
Sample is 5 pds and 1270 individuals
PROBIT (normal) probability model
Simulation based on 100 Halton draws
----------------------------------------------------------------------
Variable   Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
----------------------------------------------------------------------
        Nonrandom parameters
Constant      -2.27025       .22690      -10.006    .0000
FDIUM          3.47186       .45540        7.624    .0000      .04581
IMUM           1.14380       .15923        7.183    .0000      .25275
LOGSALES        .22455       .02061       10.894    .0000     10.5401
        Means for random parameters
SP             3.26505       .20589       15.858    .0000      .07428
PROD          -5.04105       .65950       -7.644    .0000      .08962
        Diagonal elements of Cholesky matrix
SP             3.56006      1.34728        2.642    .0082
PROD            .01483       .18199         .082    .9350
        Below diagonal elements of Cholesky matrix
lPRO_SP        3.13827       .27013       11.618    .0000
----------------------------------------------------------------------
21. RPM
Implied covariance matrix of random parameters
Matrix Var_Beta has 2 rows and 2 columns.
            1          2
   --------------------------
1    12.67402   11.17243
2    11.17243    9.84897
   --------------------------

Implied standard deviations of random parameters
Matrix S.D_Beta has 2 rows and 1 column.
            1
   -------------
1     3.56006
2     3.13831
   -------------
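The covariance matrix above can be reproduced from the Cholesky elements reported on slide 20; a minimal sketch with the values copied from that output:

    import numpy as np

    # Lower-triangular Cholesky factor from slide 20: diagonal elements
    # for SP and PROD, plus the below-diagonal element lPRO_SP.
    L = np.array([[3.56006, 0.0],
                  [3.13827, 0.01483]])
    var_beta = L @ L.T
    print(var_beta)                    # [[12.674 11.172] [11.172  9.849]]
    print(np.sqrt(np.diag(var_beta)))  # [3.56006 3.13831]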
22. (No transcript)
23. Movie Model
24. Parameter Heterogeneity
25. Bayesian Estimators
- Random Parameters vs. Randomly Distributed Parameters
- Models of Individual Heterogeneity
  - Random Effects: Consumer Brand Choice
  - Fixed Effects: Hospital Costs
26. Bayesian Estimation
- Specification of conditional likelihood: f(data | parameters)
- Specification of priors: g(parameters)
- Posterior density of parameters (written out below)
- Posterior mean: E[parameters | data]
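The objects listed above, written out as the standard Bayes formulas (reconstructed, not from the transcript):

    g(\theta \mid \text{data}) = \frac{f(\text{data} \mid \theta)\, g(\theta)}{\int f(\text{data} \mid \theta)\, g(\theta)\, d\theta},
    \qquad
    E[\theta \mid \text{data}] = \int \theta\, g(\theta \mid \text{data})\, d\theta.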
27. The Marginal Density for the Data Is Irrelevant
28. Computing Bayesian Estimators
- First generation: Do the integration (math)
- Contemporary: Simulation
  - (1) Deduce the posterior
  - (2) Draw random samples from the posterior and compute the sample means and variances of the samples. (Relies on the law of large numbers.)
29. Modeling Issues
- As N → ∞, the likelihood dominates and the prior disappears → Bayesian and classical MLE converge. (Needs the mode of the posterior to converge to the mean.)
- Priors
  - Diffuse → large variances imply little prior information. (NONINFORMATIVE)
  - INFORMATIVE priors: finite variances that appear in the posterior, tainting any final results.
30. A Random Effects Approach
- Allenby and Rossi, "Marketing Models of Consumer Heterogeneity"
- Discrete Choice Model: Brand Choice
- Hierarchical Bayes
- Multinomial Probit
- Panel Data: Purchases of 4 brands of ketchup
31. Structure
32. Bayesian Priors
33. Bayesian Estimator
- Joint posterior
- Integral does not exist in closed form.
- Estimate by random samples from the joint posterior.
- Full joint posterior is not known, so it is not possible to sample from the joint posterior.
34. Gibbs Sampling
- Target: Sample from the joint distribution f(x1, x2)
- The joint distribution is unknown, or it is not possible to sample from it.
- Assume f(x1 | x2) and f(x2 | x1) are both known, and samples can be drawn from both.
- Gibbs sampling: Obtain one draw from (x1, x2) by many cycles between x1 | x2 and x2 | x1.
  - Start x1,0 anywhere in the right range.
  - Draw x2,0 from x2 | x1,0.
  - Return to x1,1 from x1 | x2,0, and so on.
- Several thousand cycles produces a draw.
- Repeat several thousand times to produce a sample.
- Average the draws to estimate the marginal means. (A toy example follows below.)
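A toy version of the cycle described above, a minimal sketch assuming a bivariate normal target with correlation rho, for which both conditionals are normal and easy to draw from; rho and the cycle counts are illustrative, not from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    rho, n_burn, n_keep = 0.8, 1_000, 5_000
    sd = np.sqrt(1.0 - rho**2)         # conditional standard deviation
    x1 = x2 = 0.0                      # start anywhere in the right range
    draws = []
    for i in range(n_burn + n_keep):
        x1 = rng.normal(rho * x2, sd)  # draw from x1 | x2
        x2 = rng.normal(rho * x1, sd)  # draw from x2 | x1
        if i >= n_burn:
            draws.append((x1, x2))
    draws = np.array(draws)
    print(draws.mean(axis=0))          # marginal means, both near 0
    print(np.corrcoef(draws.T)[0, 1])  # near rho = 0.8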
35. Gibbs Cycles for the MNP Model
- Samples from the marginal posteriors
36. Results
- Individual parameter vectors and disturbance variances
- Individual estimates of choice probabilities
- The same as the random parameters model with slightly different weights.
- Allenby and Rossi call the classical method an "approximate Bayesian" approach.
- (Greene calls the Bayesian estimator an "approximate random parameters model.")
- Who's right?
  - Bayesian: layers on implausible uninformative priors and calls the maximum likelihood results "exact" Bayesian estimators.
  - Classical: strongly parametric and a slave to the distributional assumptions.
  - Bayesian is even more strongly parametric than classical.
  - Neither is right; both are right.
37. Comparison of Maximum Simulated Likelihood and Hierarchical Bayes
- Ken Train, "A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit"
- Mixed Logit
38. Stochastic Structure: Conditional Likelihood
Note the individual-specific parameter vector, β_i (a reconstruction is sketched below).
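A plausible reconstruction of the mixed logit structure in Train's comparison: conditioned on β_i, the choice probabilities are standard logit,

    \text{Prob}(y_{it} = j \mid \beta_i) = \frac{\exp(\beta_i' x_{itj})}{\sum_{q=1}^{J} \exp(\beta_i' x_{itq})},
    \qquad \beta_i \sim N[b, \Omega],

and the unconditional likelihood integrates this product of logit probabilities over the distribution of β_i.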
39. Classical Approach
40. Bayesian Approach: Gibbs Sampling and Metropolis-Hastings
41. Gibbs Sampling from Posteriors: b
42. Gibbs Sampling from Posteriors: Ω
43. Gibbs Sampling from Posteriors: β_i
44. Metropolis-Hastings Method
45. Metropolis-Hastings: A Draw of β_i
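A generic random-walk Metropolis-Hastings step of the kind used to draw β_i from its conditional posterior; a sketch only, with log_post and the proposal scale as illustrative placeholders rather than Train's settings:

    import numpy as np

    rng = np.random.default_rng(0)

    def mh_step(beta, log_post, scale):
        # Propose a random-walk move; accept with probability
        # min(1, posterior ratio), else keep the current draw.
        proposal = beta + scale * rng.standard_normal(beta.shape)
        if np.log(rng.uniform()) < log_post(proposal) - log_post(beta):
            return proposal
        return beta

    # Stand-in posterior: a standard bivariate normal.
    log_post = lambda b: -0.5 * (b @ b)
    beta, draws = np.zeros(2), []
    for _ in range(5_000):
        beta = mh_step(beta, log_post, scale=0.5)
        draws.append(beta)
    print(np.mean(draws, axis=0))  # near [0, 0]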
46. Application: Energy Suppliers
- N = 361 individuals, 2 to 12 hypothetical suppliers
- X = (1) fixed rates, (2) contract length, (3) local (0,1), (4) well known company (0,1), (5) offer TOD rates (0,1), (6) offer seasonal rates
47. Estimates: Mean of Individual β_i
48. Reconciliation: A Theorem (Bernstein-von Mises)
- The posterior distribution converges to normal with covariance matrix equal to 1/N times the information matrix (the same as classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.)
- The posterior mean (empirical) converges to the mode of the likelihood function, the same as the MLE. A proper prior disappears asymptotically.
- The asymptotic sampling distribution of the posterior mean is the same as that of the MLE.