Title: Applied Econometrics
1. Applied Econometrics
- William Greene
- Department of Economics
- Stern School of Business
2. Applied Econometrics
- 18. Maximum Likelihood Estimation
3. Maximum Likelihood Estimation
- This defines a class of estimators based on the particular distribution assumed to have generated the observed random variable.
- The main advantage of ML estimators is that among all consistent asymptotically normal estimators, MLEs have optimal asymptotic properties.
- The main disadvantage is that they are not necessarily robust to failures of the distributional assumptions. They are very dependent on the particular assumptions.
- The oft-cited disadvantage of their mediocre small sample properties is probably overstated in view of the usual paucity of viable alternatives.
4. Setting up the MLE
- The distribution of the observed random variable is written as a function of the parameters to be estimated: $P(y_i \mid \text{data}, \beta)$, the probability density given the parameters.
- The likelihood function is constructed from the density.
- Construction: the joint probability density function of the observed sample of data, generally the product of the individual densities when the data are a random sample.
5. Regularity Conditions
- What they are:
- 1. $\log f(\cdot)$ has three continuous derivatives with respect to the parameters.
- 2. Conditions needed to obtain expectations of derivatives are met. (E.g., the range of the variable is not a function of the parameters.)
- 3. The third derivative has finite expectation.
- What they mean:
- Moment conditions and convergence. We need to be able to obtain expectations of derivatives.
- We need to be able to truncate Taylor series.
- We will use central limit theorems.
6. The MLE
- The log-likelihood function: $\log L(\theta \mid \text{data})$.
- The likelihood equation(s): first derivatives of log-L equal zero at the MLE.
- $(1/n)\sum_i \partial \log f(y_i \mid \theta)/\partial\theta \,\big|_{\hat\theta_{MLE}} = 0$.
- (A sample statistic. The 1/n is irrelevant.)
- First order conditions for maximization.
- A moment condition; its counterpart is the fundamental result $E[\partial \log L/\partial\theta] = 0$.
- How do we use this result? An analogy principle.
7. Average Time Until Failure
- Estimating the average time until failure, $\theta$, of light bulbs; $y_i$ = observed life until failure.
- $f(y_i \mid \theta) = (1/\theta)\exp(-y_i/\theta)$
- $L(\theta) = \prod_i f(y_i \mid \theta) = \theta^{-N}\exp(-\sum_i y_i/\theta)$
- $\log L(\theta) = -N\log\theta - \sum_i y_i/\theta$
- Likelihood equation: $\partial \log L(\theta)/\partial\theta = -N/\theta + \sum_i y_i/\theta^2 = 0$
- Note, $\partial \log f(y_i \mid \theta)/\partial\theta = -1/\theta + y_i/\theta^2$.
- Since $E[y_i] = \theta$, $E[\partial \log f(y_i \mid \theta)/\partial\theta] = 0$. (Regular)
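Solving the likelihood equation gives $\hat\theta = \sum_i y_i/N = \bar y$, the sample mean. A minimal numerical check of this example (my addition, not from the slides; it assumes numpy and scipy are available):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)   # simulated lifetimes, true theta = 2

# negative of log L(theta) = -N log(theta) - sum(y)/theta
def neg_loglik(theta):
    return len(y) * np.log(theta) + y.sum() / theta

res = minimize_scalar(neg_loglik, bounds=(0.01, 100.0), method="bounded")
print(res.x, y.mean())   # numerical MLE agrees with the analytic MLE, y-bar
```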
8. Properties of the Maximum Likelihood Estimator
- We will sketch formal proofs of these results:
- The log-likelihood function, again.
- The likelihood equation and the information matrix.
- A linear Taylor series approximation to the first order conditions: $g(\hat\theta_{ML}) = 0 \approx g(\theta) + H(\theta)(\hat\theta_{ML} - \theta)$ (under regularity, higher order terms will vanish in large samples).
- Our usual approach: large sample behavior of the left and right hand sides is the same.
- A proof of consistency. (Property 1)
- The limiting variance of $\sqrt{n}(\hat\theta_{ML} - \theta)$. We are using the central limit theorem here.
- Leads to asymptotic normality (Property 2). We will derive the asymptotic variance of the MLE.
- Efficiency (we have not developed the tools to prove this). The Cramér-Rao lower bound for efficient estimation (an asymptotic version of Gauss-Markov).
- Estimating the variance of the maximum likelihood estimator.
- Invariance. (A VERY handy result.) Coupled with the Slutsky theorem and the delta method, the invariance property makes estimation of nonlinear functions of parameters very easy.
9. Testing Hypotheses: A Trinity of Tests
- The likelihood ratio test:
- Based on the proposition (Greene's) that restrictions always make life worse.
- Is the reduction in the criterion (log-likelihood) large? Leads to the LR test.
- The Lagrange multiplier test:
- Underlying basis: reexamine the first order conditions.
- Form a test of whether the gradient is significantly nonzero at the restricted estimator.
- The Wald test: the usual.
10. The Linear (Normal) Model
- Definition of the likelihood function: the joint density of the observed data, written as a function of the parameters we wish to estimate.
- Definition of the maximum likelihood estimator as that function of the observed data that maximizes the likelihood function, or its logarithm.
- For the model $y_i = x_i'\beta + \varepsilon_i$, where $\varepsilon_i \sim N[0, \sigma^2]$, the maximum likelihood estimators of $\beta$ and $\sigma^2$ are $b = (X'X)^{-1}X'y$ and $\hat\sigma^2 = e'e/n$.
- That is, least squares is ML for the slopes, but the variance estimator makes no degrees of freedom correction, so the MLE is biased.
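A small simulation sketch of that last point (my addition, assuming numpy; the design and parameter values are arbitrary illustrations): the ML variance estimator $e'e/n$ centers on $\sigma^2(n-K)/n$, not on $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, sigma2 = 50, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

s2_ml = []
for _ in range(5000):
    y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # ML slopes = least squares
    e = y - X @ b
    s2_ml.append(e @ e / n)                    # ML variance estimator, no df correction

print(np.mean(s2_ml), sigma2 * (n - K) / n)    # both about 3.76, not 4.0
```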
11. Normal Linear Model
- The log-likelihood function: $\sum_i \log f(y_i \mid \theta)$, the sum of logs of densities.
- For the linear regression model with normally distributed disturbances:
- $\log L = \sum_i \left[-\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}\log \sigma^2 - \tfrac{1}{2}(y_i - x_i'\beta)^2/\sigma^2\right]$.
12. Likelihood Equations
- The estimator is defined by the function of the data that equates $\partial \log L/\partial\theta$ to 0. (Likelihood equation)
- The derivative vector of the log-likelihood function is the score function. For the regression model:
- $g = \left[\partial \log L/\partial\beta,\ \partial \log L/\partial\sigma^2\right]$
- $\partial \log L/\partial\beta = \sum_i (1/\sigma^2)\,x_i(y_i - x_i'\beta)$
- $\partial \log L/\partial\sigma^2 = \sum_i \left[-1/(2\sigma^2) + (y_i - x_i'\beta)^2/(2\sigma^4)\right]$
- For the linear regression model, the first derivative vector of log-L is $(1/\sigma^2)X'(y - X\beta)$ (K×1) and $(1/(2\sigma^2))\sum_i\left[(y_i - x_i'\beta)^2/\sigma^2 - 1\right]$ (1×1).
- Note that we could compute these functions at any $\beta$ and $\sigma^2$. If we compute them at $b$ and $e'e/n$, the functions will be identically zero.
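A quick verification of that claim (my addition, assuming numpy; simulated data stand in for a real sample):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / n                         # ML estimator of sigma^2

g_beta = X.T @ e / s2                  # (1/s2) X'(y - Xb); zero by the normal equations
g_s2 = (e @ e / s2 - n) / (2 * s2)     # score wrt sigma^2; zero exactly at s2 = e'e/n
print(g_beta, g_s2)                    # both numerically zero at the MLE
```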
13. Moment Equations
- Note that $g = \sum_i g_i$ is a random vector and that each term in the sum has expectation zero. It follows that $E[(1/n)g] = 0$. Our estimator is found by finding the $\theta$ that sets the sample mean of the $g$'s to 0. That is, theoretically, $E[g_i(\beta, \sigma^2)] = 0$. We find the estimator as that function which produces $(1/n)\sum_i g_i(b, \hat\sigma^2) = 0$.
- Note the similarity to the way we would estimate any mean. If $E[x_i] = \mu$, then $E[x_i - \mu] = 0$. We estimate $\mu$ by finding the function of the data that produces $(1/n)\sum_i (x_i - m) = 0$, which is, of course, the sample mean.
- There are two main components to the regularity conditions for maximum likelihood estimation. The first is that the first derivative has expected value 0. That moment equation motivates the MLE.
14. Information Matrix
- The negative of the second derivatives matrix of the log-likelihood, $-H$, is called the information matrix. It is usually a random matrix, also. For the linear regression model:
15. Hessian for the Linear Model
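The matrix on this slide did not survive the transcript. For this model the standard second derivatives matrix (my reconstruction, writing $\varepsilon = y - X\beta$) is:

$$
H = \begin{bmatrix}
\partial^2 \log L/\partial\beta\,\partial\beta' & \partial^2 \log L/\partial\beta\,\partial\sigma^2 \\
\partial^2 \log L/\partial\sigma^2\,\partial\beta' & \partial^2 \log L/\partial(\sigma^2)^2
\end{bmatrix}
= \begin{bmatrix}
-X'X/\sigma^2 & -X'\varepsilon/\sigma^4 \\
-\varepsilon'X/\sigma^4 & n/(2\sigma^4) - \varepsilon'\varepsilon/\sigma^6
\end{bmatrix}
$$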
Note that the off diagonal elements have
expectation zero.
16. Estimated Information Matrix
This can be computed at any vector $\beta$ and scalar $\sigma^2$. You can take expected values of the parts of the matrix to get the expected information matrix (which should look familiar). The off-diagonal terms go to zero (one of the assumptions of the model).
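The slide's matrix is likewise missing. Taking expectations in the Hessian above, using $E[\varepsilon] = 0$ and $E[\varepsilon'\varepsilon] = n\sigma^2$, gives (my reconstruction):

$$
-E[H] = \begin{bmatrix}
X'X/\sigma^2 & 0 \\
0 & n/(2\sigma^4)
\end{bmatrix}
$$

The upper-left block inverts to the familiar least squares covariance matrix $\sigma^2(X'X)^{-1}$.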
17. Deriving the Properties of the Maximum Likelihood Estimator
18. The MLE
19. Consistency
20. Consistency Proof
21. Asymptotic Variance
22. Asymptotic Variance
23. Asymptotic Distribution
24. Other Results 1: Variance Bound
25. Invariance
- The maximum likelihood estimator of a function of $\theta$, say $h(\theta)$, is $h(\text{MLE})$. This is not always true of other kinds of estimators. To get the variance of this function, we would use the delta method. E.g., the MLE of $\gamma = \beta/\sigma$ is $b/\sqrt{e'e/n}$.
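A sketch of that delta-method calculation for a single coefficient (my addition, assuming numpy; the estimates and covariance values are illustrative placeholders, not output from a fitted model):

```python
import numpy as np

# hypothetical estimates: one slope b_k, s2 = e'e/n, and their 2x2 covariance
b_k, s2 = 0.8, 2.5
vcov = np.array([[0.04, 0.001],
                 [0.001, 0.10]])        # Var[(b_k, s2)], illustrative values

s = np.sqrt(s2)
gamma = b_k / s                          # invariance: the MLE of beta/sigma
grad = np.array([1.0 / s,                # d(gamma)/d(b_k)
                 -b_k / (2.0 * s**3)])   # d(gamma)/d(s2)
var_gamma = grad @ vcov @ grad           # delta method: grad' V grad
print(gamma, np.sqrt(var_gamma))
```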
26. Invariance
27. Reparameterizing the Log Likelihood
28. Estimating the Tobit Model
29. Computing the Asymptotic Variance
- We want to estimate $\{-E[H]\}^{-1}$. Three ways:
- (1) Just compute the negative of the actual second derivatives matrix and invert it.
- (2) Insert the maximum likelihood estimates into the known expected values of the second derivatives matrix. Sometimes (1) and (2) give the same answer (for example, in the linear regression model).
- (3) Since $-E[H]$ is the variance of the first derivatives, estimate this with the sample variance (i.e., mean square) of the first derivatives. This will almost always be different from (1) and (2).
- Since they are estimating the same thing, in large samples all three will give the same answer. Current practice in econometrics often favors (3). Stata rarely uses (3). Others do.
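A compact illustration of the three estimators (my addition, assuming numpy), using the exponential failure-time model from earlier, where everything can be written out by hand:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=500)
n, theta = len(y), y.mean()                   # MLE of theta is y-bar

g = -1.0 / theta + y / theta**2               # individual scores at the MLE
H = n / theta**2 - 2.0 * y.sum() / theta**3   # actual summed second derivative

v1 = -1.0 / H                                 # (1) negative inverse Hessian
v2 = theta**2 / n                             # (2) inverse expected information
v3 = 1.0 / (g @ g)                            # (3) mean square of first derivatives (BHHH)
print(v1, v2, v3)                             # (1) = (2) at the MLE here; (3) differs
```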
33. Linear Regression Model
- Example: different estimators of the variance of the MLE.
- Consider, again, the gasoline data. We use a simple equation: $G_t = \beta_1 + \beta_2 Y_t + \beta_3 Pg_t + \varepsilon_t$.
34. Linear Model
36. BHHH Estimator
37. Newton's Method
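The slide body is an image; what it describes is the iteration $\hat\theta_{k+1} = \hat\theta_k - H^{-1}g$. A minimal sketch (my addition, assuming numpy), applied to the exponential failure-time log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.exponential(scale=2.0, size=1000)
n, s = len(y), y.sum()

theta = 1.0                                   # start value
for it in range(100):
    g = -n / theta + s / theta**2             # gradient of log L
    H = n / theta**2 - 2.0 * s / theta**3     # second derivative of log L
    step = -g / H                             # Newton step
    theta += step
    print(f"Itr {it + 1}  theta = {theta:.6f}  g = {g:.3e}")
    if abs(step) < 1e-10:                     # convergence check on the step size
        break

print(theta, y.mean())                        # converges to the sample mean
```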
38. Poisson Regression
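The model definition on this slide is also an image. The standard Poisson regression setup used in the application below (my reconstruction) is:

$$
\text{Prob}[y_i = j] = \frac{e^{-\lambda_i}\lambda_i^{\,j}}{j!}, \qquad
\lambda_i = e^{x_i'\beta}, \qquad
\log L = \sum_i \left[-\lambda_i + y_i\, x_i'\beta - \log y_i!\right].
$$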
40. Asymptotic Variance of the MLE
42. Estimators of the Asymptotic Covariance Matrix
43. Robust Estimation
- Sandwich Estimator:
- $H^{-1}(G'G)H^{-1}$, where $G$ is the matrix whose rows are the individual first derivative vectors.
- Is this appropriate? Why do we do this?
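A sketch of the computation (my addition, assuming numpy; H and G are placeholders for the summed Hessian and the n-by-K matrix of individual scores from any fitted MLE):

```python
import numpy as np

def sandwich(H, G):
    """Robust covariance estimator [-H]^-1 (G'G) [-H]^-1.

    H : summed Hessian of the log-likelihood at the MLE (K x K)
    G : individual score vectors, one row per observation (n x K)
    """
    A_inv = np.linalg.inv(-H)          # the conventional Hessian-based covariance
    return A_inv @ (G.T @ G) @ A_inv   # "bread - meat - bread"
```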
44. Application: Doctor Visits
- German Individual Health Care data: N = 27,326
- Model for number of visits to the doctor
- Poisson regression (fit by maximum likelihood)
- Income, Education, Gender
45. Poisson Regression Iterations

POISSON ; Lhs = doctor ; Rhs = one,female,hhninc,educ ; mar ; output = 3 $
Method = Newton; Maximum iterations = 100
Convergence criteria: gtHg = .1000D-05, chg.F = .0000D+00, max|db| = .0000D+00
Start values:  .00000D+00  .00000D+00  .00000D+00  .00000D+00
1st derivs.   -.13214D+06 -.61899D+05 -.43338D+05 -.14596D+07
Parameters:    .28002D+01  .72374D-01 -.65451D+00 -.47608D-01
Itr  2 F= -.1587D+06  gtHg= .2832D+03  chg.F= .1587D+06  max|db|= .1346D+01
1st derivs.   -.33055D+05 -.14401D+05 -.10804D+05 -.36592D+06
Parameters:    .21404D+01  .16980D+00 -.60181D+00 -.48527D-01
Itr  3 F= -.1115D+06  gtHg= .9725D+02  chg.F= .4716D+05  max|db|= .6348D+00
1st derivs.   -.42953D+04 -.15074D+04 -.13927D+04 -.47823D+05
Parameters:    .17997D+01  .27758D+00 -.54519D+00 -.49513D-01
Itr  4 F= -.1063D+06  gtHg= .1545D+02  chg.F= .5162D+04  max|db|= .1437D+00
1st derivs.   -.11692D+03 -.22248D+02 -.37525D+02 -.13159D+04
Parameters:    .17276D+01  .31746D+00 -.52565D+00 -.49852D-01
Itr  5 F= -.1062D+06  gtHg= .5006D+00  chg.F= .1218D+03  max|db|= .6542D-02
1st derivs.   -.12522D+00 -.54690D-02 -.40254D-01 -.14232D+01
Parameters:    .17249D+01  .31954D+00 -.52476D+00 -.49867D-01
Itr  6 F= -.1062D+06  gtHg= .6215D-03  chg.F= .1254D+00  max|db|= .9678D-05
1st derivs.   -.19317D-06 -.94936D-09 -.62872D-07 -.22029D-05
Parameters:    .17249D+01  .31954D+00 -.52476D+00 -.49867D-01
Itr  7 F= -.1062D+06  gtHg= .9957D-09  chg.F= .1941D-06  max|db|= .1602D-10
Converged
46. Regression and Partial Effects

-------------------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
-------------------------------------------------------------------------
Constant     1.72492985     .02000568       86.222    .0000
FEMALE        .31954440     .00696870       45.854    .0000     .47877479
HHNINC       -.52475878     .02197021      -23.885    .0000     .35208362
EDUC         -.04986696     .00172872      -28.846    .0000    11.3206310
-------------------------------------------------------------------------
Partial derivatives of expected val. with respect to the vector of
characteristics. Effects are averaged over individuals. Observations
used for means are All Obs.
Conditional Mean at Sample Point   3.1835
Scale Factor for Marginal Effects  3.1835
-------------------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
-------------------------------------------------------------------------
Constant     5.49135704     .07890083       69.598    .0000
FEMALE       1.01727755     .02427607       41.905    .0000     .47877479
HHNINC      -1.67058263     .07312900      -22.844    .0000     .35208362
EDUC         -.15875271     .00579668      -27.387    .0000    11.3206310
-------------------------------------------------------------------------
47. Comparison of Standard Errors

Negative Inverse of Second Derivatives
-------------------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
-------------------------------------------------------------------------
Constant     1.72492985     .02000568       86.222    .0000
FEMALE        .31954440     .00696870       45.854    .0000     .47877479
HHNINC       -.52475878     .02197021      -23.885    .0000     .35208362
EDUC         -.04986696     .00172872      -28.846    .0000    11.3206310

BHHH
------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]
------------------------------------------------------------
Constant     1.72492985     .00677787      254.495    .0000
FEMALE        .31954440     .00217499      146.918    .0000
HHNINC       -.52475878     .00733328      -71.559    .0000
EDUC         -.04986696     .00062283      -80.065    .0000

Why are they so different? Model failure. This is a panel. There is autocorrelation.
48. Testing Hypotheses
- Wald tests, using the familiar distance measure.
- Likelihood ratio tests:
- $\log L_U$ = log likelihood without restrictions
- $\log L_R$ = log likelihood with restrictions
- $\log L_U \ge \log L_R$ for any nested restrictions
- $2(\log L_U - \log L_R) \to$ chi-squared[J]
- The Lagrange multiplier test: a Wald test of the hypothesis that the score of the unrestricted log likelihood is zero when evaluated at the restricted estimator.
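A schematic of the three statistics (my addition, assuming numpy; all arguments are placeholders for quantities produced by restricted and unrestricted fits), mirroring the computations on the next slides:

```python
import numpy as np

def lr_stat(loglU, loglR):
    """Likelihood ratio: 2(logL_U - logL_R), chi-squared[J] under H0."""
    return 2.0 * (loglU - loglR)

def wald_stat(b1, v11):
    """Wald: distance of the unrestricted estimates from 0, b1' V11^-1 b1."""
    return b1 @ np.linalg.solve(v11, b1)

def lm_stat(g0, h0):
    """Lagrange multiplier: score at the restricted MLE, g0' H0^-1 g0."""
    return g0 @ np.linalg.solve(h0, g0)
```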
49. Testing the Model

---------------------------------------------
Poisson Regression
Maximum Likelihood Estimates
Dependent variable              DOCVIS
Number of observations           27326
Iterations completed                 7
Log likelihood function      -106215.1   <= log likelihood
Number of parameters                 4
Restricted log likelihood    -108662.1   <= log likelihood with only a constant term
McFadden Pseudo R-squared     .0225193
Chi squared                   4893.983   <= 2[logL - logL(0)]
Degrees of freedom                   3
Prob[ChiSqd > value]          .0000000
---------------------------------------------
Likelihood ratio test that all three slopes are zero.
50. Wald Test

--> MATRIX ; List ; b1 = b(2:4) ; v11 = varb(2:4,2:4) ; b1'<v11>b1 $
Matrix B1 has 3 rows and 1 columns.
              1
  1      .31954
  2     -.52476
  3     -.04987
Matrix V11 has 3 rows and 3 columns.
              1              2              3
  1   .4856275D-04  -.4556076D-06   .2169925D-05
  2  -.4556076D-06   .00048        -.9160558D-05
  3   .2169925D-05  -.9160558D-05   .2988465D-05
Matrix Result has 1 rows and 1 columns.
              1
  1   4682.38779
LR statistic was 4893.983.
51. LM Test
- Hypothesis: the 3 slopes = 0. With all 3 slopes equal to 0, $\lambda = \bar y = \exp(\beta_1)$, so the MLE of $\beta_1$ is $\log(\bar y)$. The constrained MLEs of the other 3 coefficients (the slopes) are zero.
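Why $\log(\bar y)$? With only a constant, $\lambda_i = e^{\beta_1}$ for every observation, and the likelihood equation forces the fitted mean to equal the sample mean (a one-line derivation, my addition):

$$
\log L = \sum_i \left[-e^{\beta_1} + y_i\beta_1 - \log y_i!\right], \qquad
\frac{\partial \log L}{\partial \beta_1} = \sum_i (y_i - e^{\beta_1}) = 0
\;\Rightarrow\; e^{\hat\beta_1} = \bar y.
$$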
52. LM Statistic

--> CALC   ; beta1 = log(xbr(docvis)) $
--> MATRIX ; bmle0 = [beta1 / 0 / 0 / 0] $
--> CREATE ; lambda0 = exp(x'bmle0) ; res0 = docvis - lambda0 $
--> MATRIX ; List ; g0 = x'res0 ; h0 = x'[lambda0]x ; lm = g0'<h0>g0 $
Matrix G0 has 4 rows and 1 columns.
              1
  1   .2664385D-08
  2   7944.94441
  3  -1781.12219
  4  -.3062440D+05
Matrix H0 has 4 rows and 4 columns.
              1              2              3              4
  1   .8699300D+05   .4165006D+05   .3062881D+05   .9848157D+06
  2   .4165006D+05   .4165006D+05   .1434824D+05   .4530019D+06
  3   .3062881D+05   .1434824D+05   .1350638D+05   .3561238D+06
  4   .9848157D+06   .4530019D+06   .3561238D+06   .1161892D+08
Matrix LM has 1 rows and 1 columns.
              1
  1   4715.41008
Wald was 4682.38779. LR statistic was 4893.983.
53. Chow-Style Test for Structural Change
54. Poisson Regressions

----------------------------------------------------------------------
Poisson Regression    Dependent variable = DOCVIS
Log likelihood function  -90878.20153  (Pooled, N = 27326)
Log likelihood function  -43286.40271  (Male,   N = 14243)
Log likelihood function  -46587.29002  (Female, N = 13083)
----------------------------------------------------------------------
Variable   Coefficient   Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
----------------------------------------------------------------------
Pooled
Constant      2.54579        .02797        91.015    .0000
AGE            .00791        .00034        23.306    .0000    43.5257
EDUC          -.02047        .00170       -12.056    .0000    11.3206
HSAT          -.22780        .00133      -171.350    .0000    6.78543
HHNINC        -.26255        .02143       -12.254    .0000     .35208
HHKIDS        -.12304        .00796       -15.464    .0000     .40273
----------------------------------------------------------------------
Males
Constant      2.38138        .04053        58.763    .0000
AGE            .01232        .00050        24.738    .0000    42.6528
EDUC          -.02962        .00253       -11.728    .0000    11.7287
HSAT          -.23754        .00202      -117.337    .0000    6.92436
HHNINC        -.33562        .03357        -9.998    .0000     .35905
HHKIDS        -.10728        .01166        -9.204    .0000     .41297
----------------------------------------------------------------------
Females
Constant      2.48647        .03988        62.344    .0000
AGE            .00379        .00048         7.940    .0000    44.4760
EDUC           .00893        .00234         3.821    .0001    10.8764
HSAT          -.21724        .00177      -123.029    .0000    6.63417
HHNINC        -.22371        .02767        -8.084    .0000     .34450
HHKIDS        -.14906        .01107       -13.463    .0000     .39158
----------------------------------------------------------------------
55. Chi-Squared Test

NAMELIST ; X = one,age,educ,hsat,hhninc,hhkids $
SAMPLE   ; All $
POISSON  ; Lhs = Docvis ; Rhs = X $
CALC     ; Lpool = logl $
POISSON  ; For [female = 0] ; Lhs = Docvis ; Rhs = X $
CALC     ; Lmale = logl $
POISSON  ; For [female = 1] ; Lhs = Docvis ; Rhs = X $
CALC     ; Lfemale = logl $
CALC     ; K = Col(X) $
CALC     ; List ; Chisq = 2*(Lmale + Lfemale - Lpool) ; Ctb(.95,K) $
------------------------------------
Listed Calculator Results
------------------------------------
CHISQ  = 2009.017601
Result =   12.591587
The hypothesis that the same model applies to men and women is rejected.
and women is rejected.