Title: Econometric Analysis
1 Econometric Analysis
Week 9: Limited dependent variable models
2 Lecture outline
- Binary choices and limited dependent variable models
- The linear probability model
- Logit and probit models
- Examples and results using PcGive
- Further comments on censored and truncated regression models (Tobit analysis), sample selection bias and multinomial choice models
3 References and recommended reading
- Wooldridge, J M (2006) Introductory Econometrics: A Modern Approach (Third Edition), Chapter 17, pp 582-595
- Greene, W H (2000) Econometric Analysis (Fourth Edition), Chapter 19, pp 811-837
- Kennedy, P (2003) A Guide to Econometrics (Fifth Edition), Chapters 15 and 16
- Dougherty, C (2007) Introduction to Econometrics (Third Edition), Chapter 10
- Spector, L C and Mazzeo, M (1980) Probit analysis and Economic Education. Journal of Economic Education, Spring, pp 37-44
- Doornik, J A and Hendry, D F (2006) Empirical Econometric Modelling: PcGive, Vol III, Chapters 5 and 6
4 Basics
- on occasions the variable that we are trying to explain may be discrete rather than continuous
- in the most basic case it is a binary, dichotomous, dummy or qualitative variable; in other words, it can take only one of two values, 0 or 1
- examples: (1) in employment/out of employment; (2) university educated/not university educated; (3) pass test/fail test; (4) owns home/does not own home
- we might wish to explain how observations fall into each category, for example in the labour market case by linking the dependent variable to explanatory variables like age, education, marital status etc.
- simple OLS regression is not really appropriate here, although an early approach was the linear probability model, which is based on OLS regression
- today you are more likely to use either the logit (sometimes called logistic) or probit models, which make use respectively of the cumulative logistic distribution or the cumulative normal distribution to provide an S-shaped curve linking the two sets of points
- more advanced work can extend the number of values that the limited dependent variable can take beyond two, for example the five categories on a Likert scale (so-called multinomial choice variables)
- we won't cover these on this unit
5 The Linear Probability Model (LPM)
- consider the simple case with one explanatory variable X
- in this model the predicted Y value denotes the probability that the dependent variable takes a value of 1
- so the probability of success (Y = 1) is linearly related to the explanatory variable X
- the Y values can only be 0 or 1, so a straight line fitted through the points, as shown in figure 1, can produce predicted Y values outside the range 0-1
- the residuals will also be heteroskedastic, so if we do use OLS we should use robust standard errors to calculate t values
- R squared has little meaning here: whereas in the continuous OLS case it is possible for all points to lie on the regression line, here they cannot, as they must lie along one of the horizontal lines at 0 and 1
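The out-of-range problem can be seen in a minimal sketch. The eight observations below are hypothetical illustration data, not taken from the lecture or from any of the references, and `ols_simple` is just a helper name chosen for this example.

```python
# Hypothetical illustration data (not from the lecture): one regressor X, binary Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


b0, b1 = ols_simple(x, y)

# In the LPM the fitted value is read as an estimate of P(Y = 1 | X).
fitted = [b0 + b1 * xi for xi in x]

# Fitted "probabilities" that fall outside the 0-1 range.
out_of_range = [p for p in fitted if p < 0 or p > 1]

print("intercept, slope:", b0, b1)
print("fitted values:", fitted)
print("outside 0-1:", out_of_range)
```

With these made-up points the fitted line dips below 0 at the smallest X and rises above 1 at the largest X, which is the difficulty that the logit and probit models are designed to avoid.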
6 Binary Response Models: logit and probit models
- These models make use of a squash function G to ensure that the fitted values lie strictly between 0 and 1
- Logit Model: G is the cumulative logistic distribution function
- Probit Model: G is the cumulative standard normal distribution function
- (see Wooldridge or Greene for the full algebraic details)
- The models are intrinsically non-linear and so they are estimated using Maximum Likelihood procedures
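As a rough sketch of the two squash functions and of the quantity that Maximum Likelihood maximises, the following uses only the Python standard library. The function names, the data and the trial coefficient values are all hypothetical choices for illustration; a real package such as PcGive searches numerically for the coefficients that give the largest log-likelihood.

```python
import math


def logit_G(z):
    """Cumulative logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_G(z):
    """Cumulative standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(G, b0, b1, x, y):
    """Binary-response log-likelihood for the index b0 + b1*x and squash function G."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = G(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


# Both curves are S-shaped and stay between 0 and 1 over this range of z.
for z in (-3, -1, 0, 1, 3):
    print(z, logit_G(z), probit_G(z))

# Hypothetical data and two trial coefficient pairs (assumed values, not estimates).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]
ll_flat = log_likelihood(logit_G, 0.0, 0.0, x, y)     # no effect of X
ll_slope = log_likelihood(logit_G, -4.5, 1.0, x, y)   # positive effect of X
print(ll_flat, ll_slope)
```

The second trial pair gives the higher log-likelihood on these made-up data, so Maximum Likelihood would prefer it to the flat model. The estimation routine simply continues this comparison until no further improvement can be found.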
7 The shape of the logit and probit curves
8 Partial responses in binary response models
- whereas in the LPM the marginal or partial effect of a change in one of the Xs on Y ($\partial Y / \partial X_j$) is constant, for binary response models it will vary over the curve. I will give a detailed derivation for the logistic function later
- it is sometimes given in results tables as the slope, calculated at the mean values of the X variables
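A small sketch of how the slope varies over the curve, assuming a single regressor and hypothetical coefficient values (b0 = -4.5, b1 = 1.0 are made up for illustration). For the logit the slope is b1 times G(1 - G); for the probit it is b1 times the standard normal density at the index.

```python
import math


def logit_slope(b0, b1, x):
    """Partial effect of x on P(Y = 1) in a logit model, evaluated at x."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return b1 * p * (1.0 - p)


def probit_slope(b0, b1, x):
    """Partial effect of x on P(Y = 1) in a probit model, evaluated at x."""
    z = b0 + b1 * x
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return b1 * density


# Hypothetical coefficients and sample values of X (assumptions, not estimates).
b0, b1 = -4.5, 1.0
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x_mean = sum(x_values) / len(x_values)

# The "slope" reported in results tables: the partial effect at the mean of X.
slope_at_mean = logit_slope(b0, b1, x_mean)
print("logit slope at mean X:", slope_at_mean)
print("probit slope at mean X:", probit_slope(b0, b1, x_mean))

# Unlike the LPM, the effect differs from point to point along the curve.
for xi in x_values:
    print(xi, logit_slope(b0, b1, xi))
```

Here the index is zero at the mean of X, where the curve is steepest, so the slope at the mean is the largest partial effect and the effect shrinks towards both tails.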
9 Goodness of fit in binary response models
- You sometimes see count R2, which counts the proportion of cases correctly predicted. This is not very helpful, particularly if the split between 0 and 1 values for Y in the sample is very uneven, where even a naïve model that predicts the more frequent outcome for every case would come out well.
- An alternative measure called pseudo R-squared is often given. This is calculated as $1 - L_{ur}/L_0$, where $L_{ur}$ is the log-likelihood for the estimated model and $L_0$ is the log-likelihood for a model with an intercept only (see Wooldridge pp 589-590 and Kennedy p 267)
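Both measures can be sketched in a few lines. The sample below (nine successes and one failure) and the fitted probabilities are hypothetical, and the 0.5 cutoff for classing a case as a predicted success is an assumption of this sketch rather than something fixed by the lecture.

```python
import math


def log_likelihood(y, p_hat):
    """Log-likelihood of binary outcomes y given fitted probabilities p_hat."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
        for yi, pi in zip(y, p_hat)
    )


def count_r2(y, p_hat, cutoff=0.5):
    """Proportion of cases correctly predicted, using an assumed cutoff."""
    hits = sum(1 for yi, pi in zip(y, p_hat) if (pi >= cutoff) == (yi == 1))
    return hits / len(y)


def pseudo_r2(y, p_hat):
    """1 - Lur/L0, where L0 comes from an intercept-only model (p = sample mean of y)."""
    p0 = sum(y) / len(y)
    l_ur = log_likelihood(y, p_hat)
    l_0 = log_likelihood(y, [p0] * len(y))
    return 1.0 - l_ur / l_0


# Hypothetical, very uneven sample: nine successes and one failure.
y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

# Naive model: the same fitted probability (the sample mean) for every case.
naive = [sum(y) / len(y)] * len(y)
naive_count = count_r2(y, naive)
naive_pseudo = pseudo_r2(y, naive)

# Hypothetical fitted probabilities from a model that separates the cases better.
better = [0.95] * 9 + [0.2]
better_pseudo = pseudo_r2(y, better)

print(naive_count, naive_pseudo, better_pseudo)
```

The naive model scores well on count R2 simply because successes dominate the sample, while its pseudo R-squared is zero because it is the intercept-only model itself.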
10 More detail on the logit model
- Let's look at a simple case with just one explanatory variable
- the fitted values are kept between the limits of 0 and 1 if we write
  $Y = \dfrac{1}{1 + e^{-Z}}$, where $Z = \beta_0 + \beta_1 X_1$
- then $Y \to 1$ as $Z \to \infty$ and $Y \to 0$ as $Z \to -\infty$
11 Yet more on the logit model
- For this function the partial response of Y to a change in X1 turns out to be $\beta_1 Y(1-Y)$
- see proof on separate sheet
- The model is sometimes reformulated as the log-odds model, with $\ln\left(\dfrac{Y}{1-Y}\right) = \beta_0 + \beta_1 X_1$
- see derivation on separate sheet
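The separate sheet is not reproduced here, but both results can be sketched in a few lines, assuming the standard logistic form $Y = 1/(1 + e^{-Z})$ with linear index $Z = \beta_0 + \beta_1 X_1$ (the notation is an assumption chosen to match the partial response quoted above).

```latex
\begin{align*}
Y &= \frac{1}{1 + e^{-Z}}, \qquad Z = \beta_0 + \beta_1 X_1, \qquad
1 - Y = \frac{e^{-Z}}{1 + e^{-Z}} \\[4pt]
\frac{dY}{dZ} &= \frac{e^{-Z}}{\left(1 + e^{-Z}\right)^{2}}
  = \frac{1}{1 + e^{-Z}} \cdot \frac{e^{-Z}}{1 + e^{-Z}} = Y(1 - Y) \\[4pt]
\frac{\partial Y}{\partial X_1} &= \frac{dY}{dZ} \cdot \frac{\partial Z}{\partial X_1}
  = \beta_1 \, Y(1 - Y) \\[4pt]
\frac{Y}{1 - Y} &= \frac{1}{e^{-Z}} = e^{Z}
  \quad\Longrightarrow\quad
  \ln\!\left(\frac{Y}{1 - Y}\right) = Z = \beta_0 + \beta_1 X_1
\end{align*}
```

The partial response is largest at Y = 0.5, where Y(1 - Y) = 0.25, and shrinks towards zero as Y approaches either limit.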
12 Example
- Greene (2000) illustrates the use of logit and probit models with a data set from Spector and Mazzeo (1980), which concerns the effectiveness of a new method of teaching economics. The Spector and Mazzeo data have information on the performance of 32 students on the principles of macroeconomics courses at Iowa University in the spring semesters of 1974 and 1975.
- The dependent variable, GRADE, is an indicator of whether or not students passed a test in principles of macroeconomics
- The independent variables are:
  - GPA: the student's Grade Point Average prior to taking the course
  - TUCE: the result on a pre-entry Test of Understanding in College Economics
  - PSI: an indicator of whether or not the student was taught using the new Personalised System of Instruction rather than just in lectures
13 Model formulation in PcGive (1)
Category: Models for discrete data. Model class: Binary Discrete Choice using PcGive
14 Model formulation in PcGive (2)
15 Model formulation in PcGive (3)
Now choose the model: logit or probit