Transcript and Presenter's Notes

Title: Applied Econometrics
1
Applied Econometrics
  • William Greene
  • Department of Economics
  • Stern School of Business

2
Applied Econometrics
  • 18. Maximum Likelihood Estimation

3
Maximum Likelihood Estimation
  • This defines a class of estimators based on the
    particular distribution assumed to have generated
    the observed random variable.
  • The main advantage of ML estimators is that among
    all consistent and asymptotically normal
    estimators, MLEs have optimal asymptotic
    properties: they are asymptotically efficient.
  • The main disadvantage is that they are not
    necessarily robust to failures of the
    distributional assumptions; they are very
    dependent on the particular assumptions.
  • The oft-cited disadvantage of their mediocre
    small-sample properties is probably overstated in
    view of the usual paucity of viable alternatives.

4
Setting up the MLE
  • The distribution of the observed random variable
    is written as a function of the parameters to be
    estimated
  • P(yi | data, β) = probability density, given
    the parameters.
  • The likelihood function is constructed from the
    density.
  • Construction: the joint probability density
    function of the observed sample of data;
    generally the product of the individual densities
    when the data are a random sample.

5
Regularity Conditions
  • What they are
  • 1. log f(·) has three continuous derivatives with
    respect to the parameters.
  • 2. Conditions needed to obtain expectations of
    derivatives are met. (E.g., range of the
    variable is not a function of the parameters.)
  • 3. Third derivative has finite expectation.
  • What they mean
  • Moment conditions and convergence. We need to
    obtain expectations of derivatives.
  • We need to be able to truncate Taylor series.
  • We will use central limit theorems.

6
The MLE
  • The log-likelihood function: log-L(θ | data).
  • The likelihood equation(s):
  • First derivatives of log-L equal zero at the
    MLE.
  • (1/n) Σi ∂log f(yi | θ)/∂θ, evaluated at the
    MLE, = 0.
  • (Sample statistic.) (The 1/n is irrelevant.)
  • First order conditions for maximization.
  • A moment condition; its counterpart is the
    fundamental result E[∂log-L/∂θ] = 0.
  • How do we use this result? An analogy principle.

7
Average Time Until Failure
  • Estimating the average time until failure, θ, of
    light bulbs. yi = observed life until failure.
  • f(yi | θ) = (1/θ) exp(-yi/θ)
  • L(θ) = Πi f(yi | θ) = θ^(-N) exp(-Σyi/θ)
  • log L(θ) = -N log(θ) - Σyi/θ
  • Likelihood equation:
  • ∂log L(θ)/∂θ = -N/θ + Σyi/θ² = 0,
    so the MLE is the sample mean, ȳ.
  • Note ∂log f(yi | θ)/∂θ = -1/θ + yi/θ².
  • Since E[yi] = θ, E[∂log f(θ)/∂θ] = 0.
    (Regular.)
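
A minimal numerical sketch of this example (the simulated data and numpy are additions, not part of the slides): solving the likelihood equation gives θ_ML = ȳ, the sample mean.

import numpy as np

rng = np.random.default_rng(seed=1)
theta_true = 1000.0                        # hypothetical true mean life (hours)
y = rng.exponential(theta_true, size=500)  # simulated failure times

# Likelihood equation: -N/theta + Sum(y)/theta^2 = 0  =>  theta_ML = mean(y)
theta_ml = y.mean()

# The score evaluated at the MLE is (numerically) zero, as the slide notes
score = -len(y) / theta_ml + y.sum() / theta_ml**2
print(theta_ml, score)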

8
Properties of the Maximum Likelihood Estimator
  • We will sketch formal proofs of these results
  • The log-likelihood function, again
  • The likelihood equation and the information
    matrix.
  • A linear Taylor series approximation to the first
    order conditions
  • g(θML) = 0 ≈ g(θ) + H(θ)(θML - θ)
  • (under regularity, higher order terms will
    vanish in large samples.)
  • Our usual approach. Large sample behavior of the
    left and right hand sides is the same.
  • A Proof of consistency. (Property 1)
  • The limiting variance of √n(θML - θ). We are
    using the central limit theorem here.
  • Leads to asymptotic normality (Property 2). We
    will derive the asymptotic variance of the MLE.
  • Efficiency (we have not developed the tools to
    prove this). The Cramér-Rao lower bound for
    efficient estimation (an asymptotic version of
    Gauss-Markov).
  • Estimating the variance of the maximum likelihood
    estimator.
  • Invariance. (A VERY handy result.) Coupled with
    the Slutsky theorem and the delta method, the
    invariance property makes estimation of nonlinear
    functions of parameters very easy.

9
Testing Hypotheses: A Trinity of Tests
  • The likelihood ratio test
  • Based on the proposition (Greene's) that
    restrictions always make life worse.
  • Is the reduction in the criterion
    (log-likelihood) large? Leads to the LR test.
  • The Lagrange multiplier test
  • Underlying basis: reexamine the first order
    conditions.
  • Form a test of whether the gradient is
    significantly nonzero at the restricted
    estimator.
  • The Wald test: the usual, based on the familiar
    distance measure.

10
The Linear (Normal) Model
  • Definition of the likelihood function - joint
    density of the observed data, written as a
    function of the parameters we wish to estimate.
  • Definition of the maximum likelihood estimator as
    that function of the observed data that maximizes
    the likelihood function, or its logarithm.
  • For the model yi = β′xi + εi, where εi ~
    N[0, σ²],
  • the maximum likelihood estimators of β and σ²
    are
  • b = (X′X)⁻¹X′y and s² = e′e/n.
  • That is, least squares is ML for the slopes, but
    the variance estimator makes no degrees of
    freedom correction, so the MLE is biased.
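
A quick simulation sketch of this result (the data-generating numbers are invented for illustration): the ML estimator of the slopes is exactly least squares, while the ML variance estimator divides by n rather than n - K.

import numpy as np

rng = np.random.default_rng(seed=2)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(scale=2.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # MLE of beta = least squares
e = y - X @ b
s2_mle = e @ e / n                      # MLE of sigma^2: no df correction, biased
s2_unb = e @ e / (n - K)                # the usual unbiased estimator
print(b, s2_mle, s2_unb)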

11
Normal Linear Model
  • The log-likelihood function:
  • log-L = Σi log f(yi | θ)
  • = sum of logs of densities.
  • For the linear regression model with normally
    distributed disturbances:
  • log-L = Σi [ -½ log 2π - ½ log σ²
    - ½ (yi - β′xi)²/σ² ].

12
Likelihood Equations
  • The estimator is defined by the function of the
    data that equates ∂log-L/∂θ to 0. (Likelihood
    equation)
  • The derivative vector of the log-likelihood
    function is the score function. For the
    regression model,
  • g = [∂log-L/∂β, ∂log-L/∂σ²]′
  • ∂log-L/∂β = Σi (1/σ²) xi (yi - β′xi)
  • ∂log-L/∂σ² = Σi [ -1/(2σ²)
    + (yi - β′xi)²/(2σ⁴) ]
  • For the linear regression model, the first
    derivative vector of log-L is
  • (1/σ²) X′(y - Xβ)   (K×1)
    and (1/(2σ²)) Σi [ (yi - β′xi)²/σ² - 1 ]   (1×1)
  • Note that we could compute these functions at any
    β and σ². If we compute them at b and e′e/n, the
    functions will be identically zero.
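
A short sketch verifying that last point numerically (simulated data, an illustration rather than the original's example): both score components are identically zero at b and e′e/n.

import numpy as np

rng = np.random.default_rng(seed=3)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / n                               # MLE of sigma^2

g_beta = X.T @ e / s2                        # (1/s2) X'(y - Xb): zero by orthogonality
g_s2 = (e**2 / s2 - 1.0).sum() / (2.0 * s2)  # Sum(e_i^2)/s2 = n, so this is zero too
print(g_beta, g_s2)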

13
Moment Equations
  • Note that g = Σi gi is a random vector and that
    each term in the sum has expectation zero. It
    follows that E[(1/n)g] = 0. Our estimator is
    found by finding the θ that sets the sample mean
    of the gi's to 0. That is, theoretically,
    E[gi(β, σ²)] = 0. We find the estimator as that
    function which produces (1/n) Σi gi(b, s²) = 0.
  • Note the similarity to the way we would estimate
    any mean. If E[xi] = μ, then E[xi - μ] = 0. We
    estimate μ by finding the function of the data
    that produces (1/n) Σi (xi - m) = 0, which is, of
    course, the sample mean.
  • There are two main components to the regularity
    conditions for maximum likelihood estimation.
    The first is that the first derivative has
    expected value 0. That moment equation motivates
    the MLE. The second concerns the variance of the
    first derivative, taken up next as the
    information matrix.

14
Information Matrix
  • The negative of the second derivatives matrix of
    the log-likelihood,
  • -H
  • is called the information matrix. It is usually
    a random matrix, also. For the linear regression
    model,

15
Hessian for the Linear Model
Note that the off-diagonal elements have
expectation zero.
16
Estimated Information Matrix
This can be computed at any vector β and scalar
σ². You can take expected values of the parts
of the matrix to get
  • (which should look familiar). The off-diagonal
    terms go to zero (one of the assumptions
    of the model).
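
A small sketch of that expected-value result (my construction from the formulas above, not code from the slides): the expected information is block diagonal, with X′X/σ² for β and n/(2σ⁴) for σ².

import numpy as np

def expected_information(X, s2):
    # Block diagonal: the off-diagonal block has expectation zero
    n, K = X.shape
    info = np.zeros((K + 1, K + 1))
    info[:K, :K] = X.T @ X / s2       # beta block: X'X / sigma^2
    info[K, K] = n / (2.0 * s2**2)    # sigma^2 block: n / (2 sigma^4)
    return info

# The estimated asymptotic covariance matrix of the MLE is the inverse,
# e.g. np.linalg.inv(expected_information(X, s2)).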

17
Deriving the Properties of the Maximum Likelihood
Estimator
18
The MLE
19
Consistency
20
Consistency Proof
21
Asymptotic Variance
22
Asymptotic Variance
23
Asymptotic Distribution
24
Other Results 1: Variance Bound
25
Invariance
  • The maximum likelihood estimator of a function of
    θ, say h(θ), is h(MLE). This is not always true
    of other kinds of estimators. To get the
    variance of this function, we would use the delta
    method. E.g., the MLE of γ = β/σ is b/√(e′e/n).
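
A sketch of the delta-method step for γ = β_k/σ (the function and example numbers are mine; it assumes b and s² are asymptotically uncorrelated, which holds here because the off-diagonal information is zero):

import numpy as np

def delta_var(b_k, var_bk, s2, var_s2):
    # gamma = beta_k / sigma, with gradient taken wrt (beta_k, sigma^2)
    d_b = 1.0 / np.sqrt(s2)          # d gamma / d beta_k = 1/sigma
    d_s2 = -b_k / (2.0 * s2**1.5)    # d gamma / d sigma^2 = -beta_k/(2 sigma^3)
    # No covariance term: b and s2 are asymptotically uncorrelated here
    return d_b**2 * var_bk + d_s2**2 * var_s2

print(delta_var(0.5, 0.01, 4.0, 0.08))   # illustrative numbers only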

26
Invariance
27
Reparameterizing the Log Likelihood
28
Estimating the Tobit Model
29
Computing the Asymptotic Variance
  • We want to estimate {-E[H]}⁻¹. Three ways:
  • (1) Just compute the negative of the actual
    second derivatives matrix and invert it.
  • (2) Insert the maximum likelihood estimates into
    the known expected values of the second
    derivatives matrix. Sometimes (1) and (2) give
    the same answer (for example, in the linear
    regression model).
  • (3) Since the variance of the first derivatives
    is -E[H], estimate this with the sample variance
    (i.e., mean square) of the first derivatives.
    This will almost always be different from (1)
    and (2).
  • Since they are estimating the same thing, in
    large samples, all three will give the same
    answer. Current practice in econometrics often
    favors (3). Stata rarely uses (3). Others do.
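
A sketch of the three computations for the Poisson model used in the application that follows (my code, not the original's). For the Poisson log likelihood the Hessian does not involve y, so (1) and (2) coincide; (3) is the BHHH estimator.

import numpy as np

def poisson_avar(X, y, b):
    lam = np.exp(X @ b)
    negH = (X * lam[:, None]).T @ X    # -H = X'[lam]X; also its expected value
    v1 = np.linalg.inv(negH)           # (1) actual second derivatives
    v2 = v1                            # (2) expected second derivatives: same here
    g = X * (y - lam)[:, None]         # rows are the scores g_i' = (y_i - lam_i)x_i'
    v3 = np.linalg.inv(g.T @ g)        # (3) BHHH: sum of outer products of scores
    return v1, v2, v3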

30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
Linear Regression Model
  • Example: Different Estimators of the Variance of
    the MLE
  • Consider, again, the gasoline data. We use a
    simple equation:
  • Gt = β1 + β2Yt + β3Pgt + εt.

34
Linear Model
35
(No Transcript)
36
BHHH Estimator
37
Newton's Method
38
Poisson Regression
39
(No Transcript)
40
Asymptotic Variance of the MLE
41
(No Transcript)
42
Estimators of the Asymptotic Covariance Matrix
43
ROBUST ESTIMATION
  • Sandwich Estimator
  • [H⁻¹][G′G][H⁻¹]
  • Is this appropriate? Why do we do this?
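
A sketch of the computation (the names are assumptions: negH is the negative Hessian at the MLE, G is the n×K matrix whose rows are the individual scores):

import numpy as np

def sandwich(negH, G):
    # Robust covariance: H^{-1} (G'G) H^{-1}; the signs in negH cancel
    Hinv = np.linalg.inv(negH)
    return Hinv @ (G.T @ G) @ Hinv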

44
Application Doctor Visits
  • German Individual Health Care data, N = 27,326
  • Model for number of visits to the doctor
  • Poisson regression (fit by maximum likelihood)
  • Income, Education, Gender

45
Poisson Regression Iterations
poisson ; lhs = doctor ; rhs = one,female,hhninc,educ ; mar ; output = 3 $
Method = Newton ; Maximum iterations = 100
Convergence criteria: gtHg = .1000D-05, chg.F = .0000D+00, max|db| = .0000D+00
Start values:   .00000D+00  .00000D+00  .00000D+00  .00000D+00
1st derivs.    -.13214D+06 -.61899D+05 -.43338D+05 -.14596D+07
Parameters:     .28002D+01  .72374D-01 -.65451D+00 -.47608D-01
Itr 2 F= -.1587D+06  gtHg= .2832D+03  chg.F= .1587D+06  max|db|= .1346D+01
1st derivs.    -.33055D+05 -.14401D+05 -.10804D+05 -.36592D+06
Parameters:     .21404D+01  .16980D+00 -.60181D+00 -.48527D-01
Itr 3 F= -.1115D+06  gtHg= .9725D+02  chg.F= .4716D+05  max|db|= .6348D+00
1st derivs.    -.42953D+04 -.15074D+04 -.13927D+04 -.47823D+05
Parameters:     .17997D+01  .27758D+00 -.54519D+00 -.49513D-01
Itr 4 F= -.1063D+06  gtHg= .1545D+02  chg.F= .5162D+04  max|db|= .1437D+00
1st derivs.    -.11692D+03 -.22248D+02 -.37525D+02 -.13159D+04
Parameters:     .17276D+01  .31746D+00 -.52565D+00 -.49852D-01
Itr 5 F= -.1062D+06  gtHg= .5006D+00  chg.F= .1218D+03  max|db|= .6542D-02
1st derivs.    -.12522D+00 -.54690D-02 -.40254D-01 -.14232D+01
Parameters:     .17249D+01  .31954D+00 -.52476D+00 -.49867D-01
Itr 6 F= -.1062D+06  gtHg= .6215D-03  chg.F= .1254D+00  max|db|= .9678D-05
1st derivs.    -.19317D-06 -.94936D-09 -.62872D-07 -.22029D-05
Parameters:     .17249D+01  .31954D+00 -.52476D+00 -.49867D-01
Itr 7 F= -.1062D+06  gtHg= .9957D-09  chg.F= .1941D-06  max|db|= .1602D-10
Converged
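
A sketch of the Newton iterations behind this log (my reimplementation, not the package's code): update b by (-H)⁻¹g and stop when the criterion g′(-H)⁻¹g falls below the tolerance, as in the gtHg column above.

import numpy as np

def poisson_newton(X, y, tol=1e-6, max_iter=100):
    b = np.zeros(X.shape[1])               # start values = 0, as above
    for itr in range(1, max_iter + 1):
        lam = np.exp(X @ b)
        g = X.T @ (y - lam)                # 1st derivs of log-L
        negH = (X * lam[:, None]).T @ X    # -Hessian = X'[lam]X
        step = np.linalg.solve(negH, g)
        gtHg = g @ step                    # convergence criterion g'(-H)^{-1}g
        b = b + step                       # Newton update: b - H^{-1}g
        if gtHg < tol:
            break
    return b, itr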
46
Regression and Partial Effects
-----------------------------------------------------------------------
Variable   Coefficient    Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
-----------------------------------------------------------------------
Constant    1.72492985     .02000568        86.222    .0000
FEMALE       .31954440     .00696870        45.854    .0000    .47877479
HHNINC      -.52475878     .02197021       -23.885    .0000    .35208362
EDUC        -.04986696     .00172872       -28.846    .0000   11.3206310
-----------------------------------------------------------------------
Partial derivatives of expected val. with respect to the vector of
characteristics. Effects are averaged over individuals. Observations
used for means are All Obs.
Conditional Mean at Sample Point      3.1835
Scale Factor for Marginal Effects     3.1835
-----------------------------------------------------------------------
Variable   Coefficient    Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
-----------------------------------------------------------------------
Constant    5.49135704     .07890083        69.598    .0000
FEMALE      1.01727755     .02427607        41.905    .0000    .47877479
HHNINC     -1.67058263     .07312900       -22.844    .0000    .35208362
EDUC        -.15875271     .00579668       -27.387    .0000   11.3206310
-----------------------------------------------------------------------
47
Comparison of Standard Errors
Negative Inverse of Second Derivatives
-----------------------------------------------------------------------
Variable   Coefficient    Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
-----------------------------------------------------------------------
Constant    1.72492985     .02000568        86.222    .0000
FEMALE       .31954440     .00696870        45.854    .0000    .47877479
HHNINC      -.52475878     .02197021       -23.885    .0000    .35208362
EDUC        -.04986696     .00172872       -28.846    .0000   11.3206310

BHHH
-----------------------------------------------------------
Variable   Coefficient    Standard Error   b/St.Er.  P[|Z|>z]
-----------------------------------------------------------
Constant    1.72492985     .00677787       254.495    .0000
FEMALE       .31954440     .00217499       146.918    .0000
HHNINC      -.52475878     .00733328       -71.559    .0000
EDUC        -.04986696     .00062283       -80.065    .0000

Why are they so different? Model failure. This is a panel.
There is autocorrelation.
48
Testing Hypotheses
  • Wald tests, using the familiar distance measure.
  • Likelihood ratio tests:
  • LogL_U = log likelihood without restrictions
  • LogL_R = log likelihood with restrictions
  • LogL_U > LogL_R for any nested restrictions
  • 2(LogL_U - LogL_R) → chi-squared[J]
  • The Lagrange multiplier test: Wald test of the
    hypothesis that the score of the unrestricted log
    likelihood is zero when evaluated at the
    restricted estimator.
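
A sketch of the LR calculation, using the log likelihoods reported on the next slide (scipy is assumed for the chi-squared tail probability):

from scipy.stats import chi2

def lr_test(logl_u, logl_r, J):
    lr = 2.0 * (logl_u - logl_r)    # 2(LogL_U - LogL_R) ~ chi-squared[J] under H0
    return lr, chi2.sf(lr, J)

print(lr_test(-106215.1, -108662.1, 3))   # about 4894, p-value near 0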

49
Testing the Model
---------------------------------------------
Poisson Regression
Maximum Likelihood Estimates
Dependent variable               DOCVIS
Number of observations            27326
Iterations completed                  7
Log likelihood function       -106215.1   (log likelihood)
Number of parameters                  4
Restricted log likelihood     -108662.1   (log likelihood with only
                                           a constant term)
McFadden Pseudo R-squared      .0225193
Chi squared                    4893.983   2[logL - logL(0)]
Degrees of freedom                    3
Prob[ChiSqd > value]           .0000000
---------------------------------------------
Likelihood ratio test that all three slopes are zero.
50
Wald Test
--> MATRIX ; List ; b1 = b(2:4) ; v11 = varb(2:4,2:4) ; B1'<V11>B1 $

Matrix B1 has 3 rows and 1 columns.
              1
--------------
1        .31954
2       -.52476
3       -.04987

Matrix V11 has 3 rows and 3 columns.
              1              2              3
---------------------------------------------
1   .4856275D-04  -.4556076D-06   .2169925D-05
2  -.4556076D-06   .00048         -.9160558D-05
3   .2169925D-05  -.9160558D-05   .2988465D-05

Matrix Result has 1 rows and 1 columns.
              1
--------------
1    4682.38779

LR statistic was 4893.983
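
The same calculation as a numpy sketch (the matrices are copied from the printout above, so the result can match the listed value only up to the rounding of the displayed digits):

import numpy as np

b1 = np.array([.31954, -.52476, -.04987])
v11 = np.array([[ 4.856275e-05, -4.556076e-07,  2.169925e-06],
                [-4.556076e-07,  4.8e-04,      -9.160558e-06],
                [ 2.169925e-06, -9.160558e-06,  2.988465e-06]])

wald = b1 @ np.linalg.solve(v11, b1)   # B1' V11^{-1} B1
print(wald)                            # near the listed 4682.38779, up to rounding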
51
LM Test
  • Hypothesis: the 3 slopes = 0. With all 3 slopes
    equal to 0, λ = ȳ = exp(β1), so the MLE of β1 is
    log(ȳ). The constrained MLEs of the 3 slopes are
    zero.

52
LM Statistic
--> calc   ; beta1 = log(xbr(docvis)) $
--> matrix ; bmle0 = [beta1 / 0 / 0 / 0] $
--> create ; lambda0 = exp(x'bmle0) ; res0 = docvis - lambda0 $
--> matrix ; list ; g0 = x'res0 ; h0 = x'[lambda0]x ; lm = g0'<h0>g0 $

Matrix G0 has 4 rows and 1 columns.
--------------
1   .2664385D-08
2    7944.94441
3   -1781.12219
4  -.3062440D+05

Matrix H0 has 4 rows and 4 columns.
--------------------------------------------------------
1   .8699300D+05  .4165006D+05  .3062881D+05  .9848157D+06
2   .4165006D+05  .4165006D+05  .1434824D+05  .4530019D+06
3   .3062881D+05  .1434824D+05  .1350638D+05  .3561238D+06
4   .9848157D+06  .4530019D+06  .3561238D+06  .1161892D+08

Matrix LM has 1 rows and 1 columns.
--------------
1    4715.41008

Wald was 4682.38779. LR statistic was 4893.983.
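
A numpy sketch of the same LM computation (my translation of the commands above; X is assumed to include the constant column):

import numpy as np

def poisson_lm(X, y):
    lam0 = np.full(len(y), y.mean())      # restricted fit: lambda_i = ybar for all i
    g0 = X.T @ (y - lam0)                 # score at the restricted MLE
    H0 = (X * lam0[:, None]).T @ X        # X'[lambda0]X
    return g0 @ np.linalg.solve(H0, g0)   # LM = g0' H0^{-1} g0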
53
Chow-Style Test for Structural Change
54
Poisson Regressions
----------------------------------------------------------------------
Poisson Regression    Dependent variable = DOCVIS
Log likelihood function  -90878.20153   (Pooled, N = 27326)
Log likelihood function  -43286.40271   (Male,   N = 14243)
Log likelihood function  -46587.29002   (Female, N = 13083)
----------------------------------------------------------------------
Variable   Coefficient   Standard Error   b/St.Er.  P[|Z|>z]  Mean of X
----------------------------------------------------------------------
Pooled
Constant    2.54579       .02797           91.015    .0000
AGE          .00791       .00034           23.306    .0000    43.5257
EDUC        -.02047       .00170          -12.056    .0000    11.3206
HSAT        -.22780       .00133         -171.350    .0000    6.78543
HHNINC      -.26255       .02143          -12.254    .0000    .35208
HHKIDS      -.12304       .00796          -15.464    .0000    .40273
----------------------------------------------------------------------
Males
Constant    2.38138       .04053           58.763    .0000
AGE          .01232       .00050           24.738    .0000    42.6528
EDUC        -.02962       .00253          -11.728    .0000    11.7287
HSAT        -.23754       .00202         -117.337    .0000    6.92436
HHNINC      -.33562       .03357           -9.998    .0000    .35905
HHKIDS      -.10728       .01166           -9.204    .0000    .41297
----------------------------------------------------------------------
Females
Constant    2.48647       .03988           62.344    .0000
AGE          .00379       .00048            7.940    .0000    44.4760
EDUC         .00893       .00234            3.821    .0001    10.8764
HSAT        -.21724       .00177         -123.029    .0000    6.63417
HHNINC      -.22371       .02767           -8.084    .0000    .34450
HHKIDS      -.14906       .01107          -13.463    .0000    .39158
----------------------------------------------------------------------
55
Chi Squared Test
Namelist ; X = one,age,educ,hsat,hhninc,hhkids $
Sample   ; All $
Poisson  ; Lhs = Docvis ; Rhs = X $
Calc     ; Lpool = logl $
Poisson  ; For [female = 0] ; Lhs = Docvis ; Rhs = X $
Calc     ; Lmale = logl $
Poisson  ; For [female = 1] ; Lhs = Docvis ; Rhs = X $
Calc     ; Lfemale = logl $
Calc     ; K = Col(X) $
Calc     ; List ; Chisq = 2*(Lmale + Lfemale - Lpool)
         ; Ctb(.95,k) $
------------------------------------
Listed Calculator Results
------------------------------------
CHISQ    =  2009.017601
Result   =    12.591587
The hypothesis that the same model applies to men
and women is rejected.