Title: Applied Econometrics
1. Applied Econometrics
- William Greene
- Department of Economics
- Stern School of Business
2. Applied Econometrics
- 18. Maximum Likelihood Estimation
3. Maximum Likelihood Estimation
- This defines a class of estimators based on the particular distribution assumed to have generated the observed random variable.
- The main advantage of ML estimators is that among all consistent asymptotically normal estimators, MLEs have optimal asymptotic properties.
- The main disadvantage is that they are not necessarily robust to failures of the distributional assumptions. They are very dependent on the particular assumptions.
- The oft-cited disadvantage of their mediocre small sample properties is probably overstated in view of the usual paucity of viable alternatives.
4. Setting up the MLE
- The distribution of the observed random variable is written as a function of the parameters to be estimated: $P(y_i \mid \text{data}, \beta)$, the probability density given the parameters.
- The likelihood function is constructed from the density.
- Construction: the joint probability density function of the observed sample of data, generally the product of the individual densities when the data are a random sample.
5. Regularity Conditions
- What they are:
- 1. $\log f(\cdot)$ has three continuous derivatives with respect to the parameters.
- 2. Conditions needed to obtain expectations of derivatives are met. (E.g., the range of the variable is not a function of the parameters.)
- 3. The third derivative has finite expectation.
- What they mean:
- Moment conditions and convergence. We need to be able to obtain expectations of derivatives.
- We need to be able to truncate Taylor series.
- We will use central limit theorems.
6. The MLE
- The log-likelihood function: $\log L(\theta \mid \text{data})$.
- The likelihood equation(s): first derivatives of log-L equal zero at the MLE.
- $(1/n)\sum_i \partial \log f(y_i \mid \theta)/\partial\theta \,\big|_{\hat\theta_{MLE}} = 0$.
- (A sample statistic. The 1/n is irrelevant.)
- First order conditions for maximization.
- A moment condition; its counterpart is the fundamental result $E[\partial \log L/\partial\theta] = 0$.
- How do we use this result? An analogy principle.
7. Average Time Until Failure
- Estimating the average time until failure, $\theta$, of light bulbs; $y_i$ = observed life until failure.
- $f(y_i \mid \theta) = (1/\theta)\exp(-y_i/\theta)$
- $L(\theta) = \prod_i f(y_i \mid \theta) = \theta^{-N}\exp(-\sum_i y_i/\theta)$
- $\log L(\theta) = -N\log\theta - \sum_i y_i/\theta$
- Likelihood equation: $\partial \log L(\theta)/\partial\theta = -N/\theta + \sum_i y_i/\theta^2 = 0$
- Note, $\partial \log f(y_i \mid \theta)/\partial\theta = -1/\theta + y_i/\theta^2$.
- Since $E[y_i] = \theta$, $E[\partial \log f(y_i \mid \theta)/\partial\theta] = 0$. (Regular)
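Solving the likelihood equation gives $\hat\theta = \sum_i y_i/N = \bar y$, the sample mean. A minimal numerical check of this example (my addition, not from the slides; it assumes numpy and scipy are available):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)   # simulated lifetimes, true theta = 2

# negative of log L(theta) = -N log(theta) - sum(y)/theta
def neg_loglik(theta):
    return len(y) * np.log(theta) + y.sum() / theta

res = minimize_scalar(neg_loglik, bounds=(0.01, 100.0), method="bounded")
print(res.x, y.mean())   # numerical MLE agrees with the analytic MLE, y-bar
```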
8. Properties of the Maximum Likelihood Estimator
- We will sketch formal proofs of these results:
- The log-likelihood function, again.
- The likelihood equation and the information matrix.
- A linear Taylor series approximation to the first order conditions: $g(\hat\theta_{ML}) = 0 \approx g(\theta) + H(\theta)(\hat\theta_{ML} - \theta)$ (under regularity, higher order terms will vanish in large samples).
- Our usual approach: large sample behavior of the left and right hand sides is the same.
- A proof of consistency. (Property 1)
- The limiting variance of $\sqrt{n}(\hat\theta_{ML} - \theta)$. We are using the central limit theorem here.
- Leads to asymptotic normality (Property 2). We will derive the asymptotic variance of the MLE.
- Efficiency (we have not developed the tools to prove this). The Cramér-Rao lower bound for efficient estimation (an asymptotic version of Gauss-Markov).
- Estimating the variance of the maximum likelihood estimator.
- Invariance. (A VERY handy result.) Coupled with the Slutsky theorem and the delta method, the invariance property makes estimation of nonlinear functions of parameters very easy.
9. Testing Hypotheses: A Trinity of Tests
- The likelihood ratio test:
- Based on the proposition (Greene's) that restrictions always make life worse.
- Is the reduction in the criterion (log-likelihood) large? Leads to the LR test.
- The Lagrange multiplier test:
- Underlying basis: reexamine the first order conditions.
- Form a test of whether the gradient is significantly nonzero at the restricted estimator.
- The Wald test: the usual.
10. The Linear (Normal) Model
- Definition of the likelihood function: the joint density of the observed data, written as a function of the parameters we wish to estimate.
- Definition of the maximum likelihood estimator as that function of the observed data that maximizes the likelihood function, or its logarithm.
- For the model $y_i = x_i'\beta + \varepsilon_i$, where $\varepsilon_i \sim N[0, \sigma^2]$, the maximum likelihood estimators of $\beta$ and $\sigma^2$ are $b = (X'X)^{-1}X'y$ and $\hat\sigma^2 = e'e/n$.
- That is, least squares is ML for the slopes, but the variance estimator makes no degrees of freedom correction, so the MLE is biased.
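A small simulation sketch of that last point (my addition, assuming numpy; the design and parameter values are arbitrary illustrations): the ML variance estimator $e'e/n$ centers on $\sigma^2(n-K)/n$, not on $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, sigma2 = 50, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

s2_ml = []
for _ in range(5000):
    y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # ML slopes = least squares
    e = y - X @ b
    s2_ml.append(e @ e / n)                    # ML variance estimator, no df correction

print(np.mean(s2_ml), sigma2 * (n - K) / n)    # both about 3.76, not 4.0
```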
11. Normal Linear Model
- The log-likelihood function: $\sum_i \log f(y_i \mid \theta)$, the sum of logs of densities.
- For the linear regression model with normally distributed disturbances:
- $\log L = \sum_i \left[-\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}\log \sigma^2 - \tfrac{1}{2}(y_i - x_i'\beta)^2/\sigma^2\right]$.
12. Likelihood Equations
- The estimator is defined by the function of the data that equates $\partial \log L/\partial\theta$ to 0. (Likelihood equation)
- The derivative vector of the log-likelihood function is the score function. For the regression model:
- $g = \left[\partial \log L/\partial\beta,\ \partial \log L/\partial\sigma^2\right]$
- $\partial \log L/\partial\beta = \sum_i (1/\sigma^2)\,x_i(y_i - x_i'\beta)$
- $\partial \log L/\partial\sigma^2 = \sum_i \left[-1/(2\sigma^2) + (y_i - x_i'\beta)^2/(2\sigma^4)\right]$
- For the linear regression model, the first derivative vector of log-L is $(1/\sigma^2)X'(y - X\beta)$ (K×1) and $(1/(2\sigma^2))\sum_i\left[(y_i - x_i'\beta)^2/\sigma^2 - 1\right]$ (1×1).
- Note that we could compute these functions at any $\beta$ and $\sigma^2$. If we compute them at $b$ and $e'e/n$, the functions will be identically zero.
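A quick verification of that claim (my addition, assuming numpy; simulated data stand in for a real sample):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / n                         # ML estimator of sigma^2

g_beta = X.T @ e / s2                  # (1/s2) X'(y - Xb); zero by the normal equations
g_s2 = (e @ e / s2 - n) / (2 * s2)     # score wrt sigma^2; zero exactly at s2 = e'e/n
print(g_beta, g_s2)                    # both numerically zero at the MLE
```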
13. Moment Equations
- Note that $g = \sum_i g_i$ is a random vector and that each term in the sum has expectation zero. It follows that $E[(1/n)g] = 0$. Our estimator is found by finding the $\theta$ that sets the sample mean of the $g$'s to 0. That is, theoretically, $E[g_i(\beta, \sigma^2)] = 0$. We find the estimator as that function which produces $(1/n)\sum_i g_i(b, \hat\sigma^2) = 0$.
- Note the similarity to the way we would estimate any mean. If $E[x_i] = \mu$, then $E[x_i - \mu] = 0$. We estimate $\mu$ by finding the function of the data that produces $(1/n)\sum_i (x_i - m) = 0$, which is, of course, the sample mean.
- There are two main components to the regularity conditions for maximum likelihood estimation. The first is that the first derivative has expected value 0. That moment equation motivates the MLE.
14. Information Matrix
- The negative of the second derivatives matrix of the log-likelihood, $-H$, is called the information matrix. It is usually a random matrix, also. For the linear regression model:
15. Hessian for the Linear Model
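The matrix on this slide did not survive the transcript. For this model the standard second derivatives matrix (my reconstruction, writing $\varepsilon = y - X\beta$) is:

$$
H = \begin{bmatrix}
\partial^2 \log L/\partial\beta\,\partial\beta' & \partial^2 \log L/\partial\beta\,\partial\sigma^2 \\
\partial^2 \log L/\partial\sigma^2\,\partial\beta' & \partial^2 \log L/\partial(\sigma^2)^2
\end{bmatrix}
= \begin{bmatrix}
-X'X/\sigma^2 & -X'\varepsilon/\sigma^4 \\
-\varepsilon'X/\sigma^4 & n/(2\sigma^4) - \varepsilon'\varepsilon/\sigma^6
\end{bmatrix}
$$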
Note that the off diagonal elements have
expectation zero.
16. Estimated Information Matrix
This can be computed at any vector $\beta$ and scalar $\sigma^2$. You can take expected values of the parts of the matrix to get the expected information matrix (which should look familiar). The off-diagonal terms go to zero (one of the assumptions of the model).
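The slide's matrix is likewise missing. Taking expectations in the Hessian above, using $E[\varepsilon] = 0$ and $E[\varepsilon'\varepsilon] = n\sigma^2$, gives (my reconstruction):

$$
-E[H] = \begin{bmatrix}
X'X/\sigma^2 & 0 \\
0 & n/(2\sigma^4)
\end{bmatrix}
$$

The upper-left block inverts to the familiar least squares covariance matrix $\sigma^2(X'X)^{-1}$.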
17. Deriving the Properties of the Maximum Likelihood Estimator
18. The MLE
19. Consistency
20. Consistency Proof
21. Asymptotic Variance
22. Asymptotic Variance
23. Asymptotic Distribution
24. Other Results 1: Variance Bound
25. Invariance
- The maximum likelihood estimator of a function of $\theta$, say $h(\theta)$, is $h(\text{MLE})$. This is not always true of other kinds of estimators. To get the variance of this function, we would use the delta method. E.g., the MLE of $\gamma = \beta/\sigma$ is $b/\sqrt{e'e/n}$.
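A sketch of that delta-method calculation for a single coefficient (my addition, assuming numpy; the estimates and covariance values are illustrative placeholders, not output from a fitted model):

```python
import numpy as np

# hypothetical estimates: one slope b_k, s2 = e'e/n, and their 2x2 covariance
b_k, s2 = 0.8, 2.5
vcov = np.array([[0.04, 0.001],
                 [0.001, 0.10]])        # Var[(b_k, s2)], illustrative values

s = np.sqrt(s2)
gamma = b_k / s                          # invariance: the MLE of beta/sigma
grad = np.array([1.0 / s,                # d(gamma)/d(b_k)
                 -b_k / (2.0 * s**3)])   # d(gamma)/d(s2)
var_gamma = grad @ vcov @ grad           # delta method: grad' V grad
print(gamma, np.sqrt(var_gamma))
```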
26. Invariance
27. Reparameterizing the Log Likelihood
28. Estimating the Tobit Model
29. Computing the Asymptotic Variance
- We want to estimate $\{-E[H]\}^{-1}$. Three ways:
- (1) Just compute the negative of the actual second derivatives matrix and invert it.
- (2) Insert the maximum likelihood estimates into the known expected values of the second derivatives matrix. Sometimes (1) and (2) give the same answer (for example, in the linear regression model).
- (3) Since $-E[H]$ is the variance of the first derivatives, estimate this with the sample variance (i.e., mean square) of the first derivatives. This will almost always be different from (1) and (2).
- Since they are estimating the same thing, in large samples all three will give the same answer. Current practice in econometrics often favors (3). Stata rarely uses (3). Others do.
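A compact illustration of the three estimators (my addition, assuming numpy), using the exponential failure-time model from earlier, where everything can be written out by hand:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=500)
n, theta = len(y), y.mean()                   # MLE of theta is y-bar

g = -1.0 / theta + y / theta**2               # individual scores at the MLE
H = n / theta**2 - 2.0 * y.sum() / theta**3   # actual summed second derivative

v1 = -1.0 / H                                 # (1) negative inverse Hessian
v2 = theta**2 / n                             # (2) inverse expected information
v3 = 1.0 / (g @ g)                            # (3) mean square of first derivatives (BHHH)
print(v1, v2, v3)                             # (1) = (2) at the MLE here; (3) differs
```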
33. Linear Regression Model
- Example: different estimators of the variance of the MLE.
- Consider, again, the gasoline data. We use a simple equation: $G_t = \beta_1 + \beta_2 Y_t + \beta_3 Pg_t + \varepsilon_t$.
34. Linear Model
36. BHHH Estimator
37. Newton's Method
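The slide body is an image; what it describes is the iteration $\hat\theta_{k+1} = \hat\theta_k - H^{-1}g$. A minimal sketch (my addition, assuming numpy), applied to the exponential failure-time log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.exponential(scale=2.0, size=1000)
n, s = len(y), y.sum()

theta = 1.0                                   # start value
for it in range(100):
    g = -n / theta + s / theta**2             # gradient of log L
    H = n / theta**2 - 2.0 * s / theta**3     # second derivative of log L
    step = -g / H                             # Newton step
    theta += step
    print(f"Itr {it + 1}  theta = {theta:.6f}  g = {g:.3e}")
    if abs(step) < 1e-10:                     # convergence check on the step size
        break

print(theta, y.mean())                        # converges to the sample mean
```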
38. Poisson Regression
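The model definition on this slide is also an image. The standard Poisson regression setup used in the application below (my reconstruction) is:

$$
\text{Prob}[y_i = j] = \frac{e^{-\lambda_i}\lambda_i^{\,j}}{j!}, \qquad
\lambda_i = e^{x_i'\beta}, \qquad
\log L = \sum_i \left[-\lambda_i + y_i\, x_i'\beta - \log y_i!\right].
$$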
40. Asymptotic Variance of the MLE
42. Estimators of the Asymptotic Covariance Matrix
43. Robust Estimation
- Sandwich Estimator:
- $H^{-1}(G'G)H^{-1}$, where $G$ is the matrix whose rows are the individual first derivative vectors.
- Is this appropriate? Why do we do this?
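A sketch of the computation (my addition, assuming numpy; H and G are placeholders for the summed Hessian and the n-by-K matrix of individual scores from any fitted MLE):

```python
import numpy as np

def sandwich(H, G):
    """Robust covariance estimator [-H]^-1 (G'G) [-H]^-1.

    H : summed Hessian of the log-likelihood at the MLE (K x K)
    G : individual score vectors, one row per observation (n x K)
    """
    A_inv = np.linalg.inv(-H)          # the conventional Hessian-based covariance
    return A_inv @ (G.T @ G) @ A_inv   # "bread - meat - bread"
```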
44. Application: Doctor Visits
- German Individual Health Care data: N = 27,326
- Model for number of visits to the doctor
- Poisson regression (fit by maximum likelihood)
- Income, Education, Gender
45. Poisson Regression Iterations

POISSON ; Lhs = doctor ; Rhs = one,female,hhninc,educ ; mar ; output = 3 $
Method = Newton; Maximum iterations = 100
Convergence criteria: gtHg = .1000D-05, chg.F = .0000D+00, max|db| = .0000D+00
Start values:  .00000D+00  .00000D+00  .00000D+00  .00000D+00
1st derivs.   -.13214D+06 -.61899D+05 -.43338D+05 -.14596D+07
Parameters:    .28002D+01  .72374D-01 -.65451D+00 -.47608D-01
Itr  2 F= -.1587D+06  gtHg= .2832D+03  chg.F= .1587D+06  max|db|= .1346D+01
1st derivs.   -.33055D+05 -.14401D+05 -.10804D+05 -.36592D+06
Parameters:    .21404D+01  .16980D+00 -.60181D+00 -.48527D-01
Itr  3 F= -.1115D+06  gtHg= .9725D+02  chg.F= .4716D+05  max|db|= .6348D+00
1st derivs.   -.42953D+04 -.15074D+04 -.13927D+04 -.47823D+05
Parameters:    .17997D+01  .27758D+00 -.54519D+00 -.49513D-01
Itr  4 F= -.1063D+06  gtHg= .1545D+02  chg.F= .5162D+04  max|db|= .1437D+00
1st derivs.   -.11692D+03 -.22248D+02 -.37525D+02 -.13159D+04
Parameters:    .17276D+01  .31746D+00 -.52565D+00 -.49852D-01
Itr  5 F= -.1062D+06  gtHg= .5006D+00  chg.F= .1218D+03  max|db|= .6542D-02
1st derivs.   -.12522D+00 -.54690D-02 -.40254D-01 -.14232D+01
Parameters:    .17249D+01  .31954D+00 -.52476D+00 -.49867D-01
Itr  6 F= -.1062D+06  gtHg= .6215D-03  chg.F= .1254D+00  max|db|= .9678D-05
1st derivs.   -.19317D-06 -.94936D-09 -.62872D-07 -.22029D-05
Parameters:    .17249D+01  .31954D+00 -.52476D+00 -.49867D-01
Itr  7 F= -.1062D+06  gtHg= .9957D-09  chg.F= .1941D-06  max|db|= .1602D-10
Converged
46. Regression and Partial Effects

-------------------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
-------------------------------------------------------------------------
Constant     1.72492985     .02000568       86.222    .0000
FEMALE        .31954440     .00696870       45.854    .0000     .47877479
HHNINC       -.52475878     .02197021      -23.885    .0000     .35208362
EDUC         -.04986696     .00172872      -28.846    .0000    11.3206310
-------------------------------------------------------------------------
Partial derivatives of expected val. with respect to the vector of
characteristics. Effects are averaged over individuals. Observations
used for means are All Obs.
Conditional Mean at Sample Point   3.1835
Scale Factor for Marginal Effects  3.1835
-------------------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
-------------------------------------------------------------------------
Constant     5.49135704     .07890083       69.598    .0000
FEMALE       1.01727755     .02427607       41.905    .0000     .47877479
HHNINC      -1.67058263     .07312900      -22.844    .0000     .35208362
EDUC         -.15875271     .00579668      -27.387    .0000    11.3206310
-------------------------------------------------------------------------
47. Comparison of Standard Errors

Negative Inverse of Second Derivatives
-------------------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
-------------------------------------------------------------------------
Constant     1.72492985     .02000568       86.222    .0000
FEMALE        .31954440     .00696870       45.854    .0000     .47877479
HHNINC       -.52475878     .02197021      -23.885    .0000     .35208362
EDUC         -.04986696     .00172872      -28.846    .0000    11.3206310

BHHH
------------------------------------------------------------
Variable    Coefficient   Standard Error   b/St.Er.  P[|Z|>z]
------------------------------------------------------------
Constant     1.72492985     .00677787      254.495    .0000
FEMALE        .31954440     .00217499      146.918    .0000
HHNINC       -.52475878     .00733328      -71.559    .0000
EDUC         -.04986696     .00062283      -80.065    .0000

Why are they so different? Model failure. This is a panel. There is autocorrelation.
48. Testing Hypotheses
- Wald tests, using the familiar distance measure.
- Likelihood ratio tests:
- $\log L_U$ = log likelihood without restrictions
- $\log L_R$ = log likelihood with restrictions
- $\log L_U \ge \log L_R$ for any nested restrictions
- $2(\log L_U - \log L_R) \to$ chi-squared[J]
- The Lagrange multiplier test: a Wald test of the hypothesis that the score of the unrestricted log likelihood is zero when evaluated at the restricted estimator.
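A schematic of the three statistics (my addition, assuming numpy; all arguments are placeholders for quantities produced by restricted and unrestricted fits), mirroring the computations on the next slides:

```python
import numpy as np

def lr_stat(loglU, loglR):
    """Likelihood ratio: 2(logL_U - logL_R), chi-squared[J] under H0."""
    return 2.0 * (loglU - loglR)

def wald_stat(b1, v11):
    """Wald: distance of the unrestricted estimates from 0, b1' V11^-1 b1."""
    return b1 @ np.linalg.solve(v11, b1)

def lm_stat(g0, h0):
    """Lagrange multiplier: score at the restricted MLE, g0' H0^-1 g0."""
    return g0 @ np.linalg.solve(h0, g0)
```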
49. Testing the Model

---------------------------------------------
Poisson Regression
Maximum Likelihood Estimates
Dependent variable              DOCVIS
Number of observations           27326
Iterations completed                 7
Log likelihood function      -106215.1   <= log likelihood
Number of parameters                 4
Restricted log likelihood    -108662.1   <= log likelihood with only a constant term
McFadden Pseudo R-squared     .0225193
Chi squared                   4893.983   <= 2[logL - logL(0)]
Degrees of freedom                   3
Prob[ChiSqd > value]          .0000000
---------------------------------------------
Likelihood ratio test that all three slopes are zero.
50. Wald Test

--> MATRIX ; List ; b1 = b(2:4) ; v11 = varb(2:4,2:4) ; b1'<v11>b1 $
Matrix B1 has 3 rows and 1 columns.
              1
  1      .31954
  2     -.52476
  3     -.04987
Matrix V11 has 3 rows and 3 columns.
              1              2              3
  1   .4856275D-04  -.4556076D-06   .2169925D-05
  2  -.4556076D-06   .00048        -.9160558D-05
  3   .2169925D-05  -.9160558D-05   .2988465D-05
Matrix Result has 1 rows and 1 columns.
              1
  1   4682.38779
LR statistic was 4893.983.
51. LM Test
- Hypothesis: the 3 slopes = 0. With all 3 slopes equal to 0, $\lambda = \bar y = \exp(\beta_1)$, so the MLE of $\beta_1$ is $\log(\bar y)$. The constrained MLEs of the other 3 coefficients (the slopes) are zero.
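Why $\log(\bar y)$? With only a constant, $\lambda_i = e^{\beta_1}$ for every observation, and the likelihood equation forces the fitted mean to equal the sample mean (a one-line derivation, my addition):

$$
\log L = \sum_i \left[-e^{\beta_1} + y_i\beta_1 - \log y_i!\right], \qquad
\frac{\partial \log L}{\partial \beta_1} = \sum_i (y_i - e^{\beta_1}) = 0
\;\Rightarrow\; e^{\hat\beta_1} = \bar y.
$$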
52. LM Statistic

--> CALC   ; beta1 = log(xbr(docvis)) $
--> MATRIX ; bmle0 = [beta1 / 0 / 0 / 0] $
--> CREATE ; lambda0 = exp(x'bmle0) ; res0 = docvis - lambda0 $
--> MATRIX ; List ; g0 = x'res0 ; h0 = x'[lambda0]x ; lm = g0'<h0>g0 $
Matrix G0 has 4 rows and 1 columns.
              1
  1   .2664385D-08
  2   7944.94441
  3  -1781.12219
  4  -.3062440D+05
Matrix H0 has 4 rows and 4 columns.
              1              2              3              4
  1   .8699300D+05   .4165006D+05   .3062881D+05   .9848157D+06
  2   .4165006D+05   .4165006D+05   .1434824D+05   .4530019D+06
  3   .3062881D+05   .1434824D+05   .1350638D+05   .3561238D+06
  4   .9848157D+06   .4530019D+06   .3561238D+06   .1161892D+08
Matrix LM has 1 rows and 1 columns.
              1
  1   4715.41008
Wald was 4682.38779. LR statistic was 4893.983.
53. Chow-Style Test for Structural Change
54. Poisson Regressions

----------------------------------------------------------------------
Poisson Regression    Dependent variable = DOCVIS
Log likelihood function  -90878.20153  (Pooled, N = 27326)
Log likelihood function  -43286.40271  (Male,   N = 14243)
Log likelihood function  -46587.29002  (Female, N = 13083)
----------------------------------------------------------------------
Variable   Coefficient   Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
----------------------------------------------------------------------
Pooled
Constant      2.54579        .02797        91.015    .0000
AGE            .00791        .00034        23.306    .0000    43.5257
EDUC          -.02047        .00170       -12.056    .0000    11.3206
HSAT          -.22780        .00133      -171.350    .0000    6.78543
HHNINC        -.26255        .02143       -12.254    .0000     .35208
HHKIDS        -.12304        .00796       -15.464    .0000     .40273
----------------------------------------------------------------------
Males
Constant      2.38138        .04053        58.763    .0000
AGE            .01232        .00050        24.738    .0000    42.6528
EDUC          -.02962        .00253       -11.728    .0000    11.7287
HSAT          -.23754        .00202      -117.337    .0000    6.92436
HHNINC        -.33562        .03357        -9.998    .0000     .35905
HHKIDS        -.10728        .01166        -9.204    .0000     .41297
----------------------------------------------------------------------
Females
Constant      2.48647        .03988        62.344    .0000
AGE            .00379        .00048         7.940    .0000    44.4760
EDUC           .00893        .00234         3.821    .0001    10.8764
HSAT          -.21724        .00177      -123.029    .0000    6.63417
HHNINC        -.22371        .02767        -8.084    .0000     .34450
HHKIDS        -.14906        .01107       -13.463    .0000     .39158
----------------------------------------------------------------------
55. Chi-Squared Test

NAMELIST ; X = one,age,educ,hsat,hhninc,hhkids $
SAMPLE   ; All $
POISSON  ; Lhs = Docvis ; Rhs = X $
CALC     ; Lpool = logl $
POISSON  ; For [female = 0] ; Lhs = Docvis ; Rhs = X $
CALC     ; Lmale = logl $
POISSON  ; For [female = 1] ; Lhs = Docvis ; Rhs = X $
CALC     ; Lfemale = logl $
CALC     ; K = Col(X) $
CALC     ; List ; Chisq = 2*(Lmale + Lfemale - Lpool) ; Ctb(.95,K) $
------------------------------------
Listed Calculator Results
------------------------------------
CHISQ  = 2009.017601
Result =   12.591587
The hypothesis that the same model applies to men and women is rejected.
and women is rejected.