Title: Limited Dependent Variables
1Limited Dependent Variables
- Lecture 2
- Main Reading Gujarati Chapter 15
2The goal of today's lecture
- To introduce binary choice models
- To discuss the problems with using OLS to estimate binary choice models
- To examine two binary models
- Logit Model
- Probit Model
- To work through some examples
3Introduction Levels of Measurement
4Categorical and Limited Dependent Variables
- Binary Variables
- Ordinal Variables
- Nominal Variables
- Censored Variables
- Count Variables
5Introduction 2
- Many economic variables are discrete (e.g. employment status, marital status) rather than continuous (e.g. income, government expenditure).
- Discrete variables can be modeled by including a dummy variable in the regression
- Example: a model of sexual discrimination
6- This model can be estimated using OLS. The presence of a discrete variable on the right hand side of the equation makes no difference to the estimation procedure
- The discrete variable can also appear on the left of the equation (i.e. as the dependent variable), as in the following labour force participation model
7There are three ways to estimate a model of this
type
- Linear Probability Model (use OLS)
- Probit Model
- Logit Model
8Linear Probability Model (LPM)
- Transport Decisions
- Estimate by OLS
9Linear Probability Model
- A regression model with a dummy dependent variable is a linear probability model
- To see its properties note the following
- Since the mean error is zero, $E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki}$
- If $P_i = \text{Prob}(Y_i = 1)$ and $1 - P_i = \text{Prob}(Y_i = 0)$, then $E(Y_i) = 1 \cdot P_i + 0 \cdot (1 - P_i) = P_i$
- The model is $P_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki}$
- The estimated slope coefficients tell the impact of a unit change in that explanatory variable on the probability that $Y = 1$
10- We interpret the fitted value $\hat{Y}_i$ as being the predicted probability that person i drives to work and b as being the marginal probability
- Note that individual i either drives or does not drive (i.e. $Y_i$ is either zero or one)
- The fitted value is interpreted as being either
- the probability that a person with the same value for X as person i would drive
- or
- the proportion of people with the same X as person i that would drive
12There are a number of problems with the LPM
- First, there is no guarantee that all of the predicted probabilities will be between zero and one.
- Second, the marginal effects don't make sense: b cannot really be interpreted as the marginal effect of cost on the probability of driving.
- This is a serious problem for policy evaluation
13- Suppose that the coefficients estimated by OLS are a = 0.5 and b = 0.2
- If the cost of public transport is 2, the fitted value is 0.5 + 0.2 × 2 = 0.9, i.e. 90% of people will drive to work
- Evaluate the effect of increasing the cost of public transport to 4.
- b = 0.2 implies that each extra 1 in cost will increase the proportion of individuals driving by 20 percentage points
- Thus the predicted proportion driving is 130%, which is nonsense.
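The arithmetic on this slide can be checked in a few lines. This is only a sketch of the calculation, using the slide's coefficients a = 0.5 and b = 0.2; the function name `lpm_fitted` is made up for illustration.

```python
# Linear probability model: fitted value = a + b * cost.
# A and B are the coefficients quoted on the slide.
A, B = 0.5, 0.2

def lpm_fitted(cost, a=A, b=B):
    """Fitted 'probability' of driving from the linear probability model."""
    return a + b * cost

for cost in (2, 4):
    p = lpm_fitted(cost)
    # Flag fitted values that cannot be probabilities.
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"cost = {cost}: fitted value = {p:.2f}{flag}")
```

Because the fitted value is a straight line in cost, nothing stops it from leaving the unit interval once cost is large enough.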
14Both problems are caused by the linear nature of
the model.
15Further problems with LPM
- R-squared is no longer a good measure of fit
- Since the dependent variable takes only two values, the error term takes only two values
- This implies that the errors can no longer be viewed as normal
- The errors are also heteroscedastic
- See Gujarati exercise 15.10
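A short derivation shows why the LPM errors are heteroscedastic. It is a sketch written in the $P_i$ notation used earlier; the symbol $u_i$ for the error term is an assumption, since the transcript does not name it.

```latex
% Error term of the LPM: u_i = Y_i - P_i, where P_i = Prob(Y_i = 1).
% It takes only two values:
%   u_i = 1 - P_i   with probability P_i        (when Y_i = 1)
%   u_i = -P_i      with probability 1 - P_i    (when Y_i = 0)
\begin{align*}
E(u_i)   &= (1 - P_i)\,P_i + (-P_i)(1 - P_i) = 0 \\
Var(u_i) &= (1 - P_i)^2 P_i + P_i^2 (1 - P_i) = P_i (1 - P_i)
\end{align*}
```

Since $P_i$ depends on the explanatory variables, the error variance differs across observations.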
16We need a model where the probability never goes above 1. The slope of the curve must diminish as the probability gets closer to one, i.e. a non-linear model.
17Estimation
- Cannot use OLS, because the model is non-linear
- Use Maximum Likelihood (ML)
- The likelihood function is the probability that an econometric model could generate the actual data seen by the econometrician
- By choosing the parameters of the model so that the likelihood is as large as possible, we get the ML estimates of the parameters of the model.
18- As a simple example, suppose there is a sample of just three observations: Drive, Drive, Bus.
- i.e. $Y_1 = 1$, $Y_2 = 1$, $Y_3 = 0$
- The probability that such an outcome could be generated by the econometric model (i.e. the likelihood) is
- $L = \text{Prob}(Y_1 = 1 \text{ AND } Y_2 = 1 \text{ AND } Y_3 = 0)$
- $L = \text{Prob}(Y_1 = 1)\,\text{Prob}(Y_2 = 1)\,\text{Prob}(Y_3 = 0)$
19- Usually we take the log of the likelihood (same max) to get
- $\ln L = \ln P(Y_1 = 1) + \ln P(Y_2 = 1) + \ln P(Y_3 = 0)$
- We can substitute in the expression for probability in the Probit model that we got earlier
- Then we get the computer to try different values for a and b. The values that maximize $\ln L$ are the ML estimates of a and b.
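For concreteness, the three-observation likelihood can be evaluated at one trial pair (a, b). The costs and the trial values below are hypothetical, since the slides give no numbers for them. F is taken to be the standard normal c.d.f., as in the Probit model.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f., used as F in the Probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs (not from the slides).
costs = [4.0, 3.0, 1.0]   # cost of public transport faced by each person
y = [1, 1, 0]             # Drive, Drive, Bus
a, b = -1.0, 0.5          # one trial pair of parameter values

# Probability of each observed outcome: F(a + bX) if Y = 1, else 1 - F(a + bX).
probs = [
    norm_cdf(a + b * x) if yi == 1 else 1.0 - norm_cdf(a + b * x)
    for x, yi in zip(costs, y)
]

likelihood = math.prod(probs)                      # L = product of the three probabilities
log_likelihood = sum(math.log(p) for p in probs)   # lnL = sum of their logs

print(f"L = {likelihood:.4f}, lnL = {log_likelihood:.4f}")
```

Trying other values of a and b and keeping the pair with the highest lnL is the search the computer carries out.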
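20- In general, for a sample of N observations, the log-likelihood of the sample will be
- $\ln L = \sum_{i=1}^{N} \left\{ y_i \ln F(a + bX_i) + (1 - y_i) \ln\left[1 - F(a + bX_i)\right] \right\}$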
21Algorithm
- Choose starting values for the parameters of the model, i.e. a and b in this simple example. (Could start with the LPM estimates or with zeros.)
- Calculate the value of the log likelihood
- Evaluate $F(a + bX_i)$ for every observation
- If $y_i = 1$ then $\ln P(y_i) = \ln F(a + bX_i)$
- If $y_i = 0$ then $\ln P(y_i) = \ln\left[1 - F(a + bX_i)\right]$
- Calculate $\ln L$ by adding up all the $\ln P(y_i)$
22- Try another combination of parameters
- There are various different methods of deciding which should be the next set of parameters to try.
- Keep repeating the procedure until you cannot get a higher $\ln L$ by choosing new parameters.
- The set of parameters that generates the largest $\ln L$ is known as the Maximum Likelihood estimates of the model.
- A PC will run through this algorithm in seconds even with thousands of observations and many variables.
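The algorithm can be sketched in plain Python. The rule for choosing the next parameters here is a simple compass search, picked only because it is easy to read: step a or b up or down, and halve the step when nothing improves. Statistical packages normally use faster Newton-type methods. The data and the names `fit_probit`, `costs` and `drives` are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f., playing the role of F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(a, b, xs, ys):
    """lnL: add up lnP(y_i) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = norm_cdf(a + b * x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_probit(xs, ys, step=1.0, tol=1e-6, max_iter=100_000):
    """Compass search for the (a, b) that maximizes lnL, starting from zeros."""
    a, b = 0.0, 0.0
    best = log_likelihood(a, b, xs, ys)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        # Try moving each parameter up or down by the current step.
        for da, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = log_likelihood(a + da, b + db, xs, ys)
            if trial > best:
                a, b, best = a + da, b + db, trial
                improved = True
        if not improved:
            step /= 2.0   # no higher lnL nearby: search on a finer grid
    return a, b, best

# Hypothetical sample: cost of public transport and whether the person drives.
costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
drives = [0, 0, 1, 0, 1, 1]

a_hat, b_hat, lnL_hat = fit_probit(costs, drives)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, lnL = {lnL_hat:.3f}")
```

The sample is deliberately not perfectly separated by cost. With perfect separation the likelihood keeps rising as b grows and no finite maximum exists.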
23Latent Variable Model
24Latent Variable Models
- The probit and logit can be motivated by basic
utility theoretic models
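The slide's own derivation did not survive in this transcript. What follows is the standard textbook form of the latent variable argument, offered as a sketch. The symbols $Y_i^*$ and $\varepsilon_i$ and the threshold at zero are assumptions.

```latex
% Latent (unobserved) utility index and the observed binary choice:
\begin{align*}
Y_i^* &= a + bX_i + \varepsilon_i \\
Y_i   &= \begin{cases} 1 & \text{if } Y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}
\end{align*}
% If F is the c.d.f. of the error and the error distribution is symmetric about zero:
\begin{align*}
\text{Prob}(Y_i = 1) &= \text{Prob}\left(\varepsilon_i > -(a + bX_i)\right) \\
                     &= 1 - F\left(-(a + bX_i)\right) = F(a + bX_i)
\end{align*}
```

A normal error gives the probit model and a logistic error gives the logit model. Both distributions are symmetric about zero, so the last step applies to each.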
25Logit and Probit Models
- The likelihood function is $L = \prod_{i=1}^{N} P_i^{Y_i} (1 - P_i)^{1 - Y_i}$, where $P_i = \text{Prob}(Y_i = 1)$
26Logit Model
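- For the logit model we specify $\text{Prob}(Y_i = 1) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X_{1i})}}$
- $\text{Prob}(Y_i = 1) \to 0$ as $\beta_0 + \beta_1 X_{1i} \to -\infty$
- $\text{Prob}(Y_i = 1) \to 1$ as $\beta_0 + \beta_1 X_{1i} \to +\infty$
- Thus, probabilities from the logit model will be between 0 and 1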
27Logit Model
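- A complication arises in interpreting the estimated $\beta$s
- With a linear probability model, a $\beta$ estimate measures the ceteris paribus effect of a change in the explanatory variable on the probability that Y equals 1
- In the logit model the effect depends on the level of the probability: $\dfrac{\partial\,\text{Prob}(Y_i = 1)}{\partial X_{1i}} = \beta_1 P_i (1 - P_i)$, where $P_i = \text{Prob}(Y_i = 1)$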
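A small numerical sketch makes the contrast with the LPM concrete. In the logit model the effect of X on Prob(Y = 1) is β1·P·(1 − P), so it varies with X even though β1 is fixed. The coefficient values below are hypothetical.

```python
import math

def logit_prob(x, b0, b1):
    """Logit probability that Y = 1 given a single regressor x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logit_marginal_effect(x, b0, b1):
    """Derivative of the logit probability with respect to x: b1 * P * (1 - P)."""
    p = logit_prob(x, b0, b1)
    return b1 * p * (1.0 - p)

# Hypothetical coefficients, for illustration only.
b0, b1 = -2.0, 0.5

for x in (0.0, 4.0, 8.0, 12.0):
    p = logit_prob(x, b0, b1)
    me = logit_marginal_effect(x, b0, b1)
    print(f"x = {x:4.1f}: Prob(Y=1) = {p:.3f}, marginal effect = {me:.3f}")
```

The marginal effect is largest where the probability is 0.5 and shrinks toward zero as the probability approaches 0 or 1.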
28Probit Model
- Probit is a non-linear model
- The fitted value is guaranteed to be between zero and one
- The marginal effect is such that the model will never predict a probability above 1. The marginal effect of increasing the cost of transport will be smaller when the probability of driving is close to 1
- Non-linear implies that we cannot use OLS
29- More formally, we say that the probability that Y = 1 (i.e. that an individual drives) is a non-linear function, F, of the variables.
- We choose the function to ensure that it has the desired shape, as in the previous diagram
- In the case of Probit we use F, the cumulative distribution function of a normal random variable.
30Probit Model
- In the probit model, we assume the error in the utility index model is normally distributed
- $\varepsilon_i \sim N(0, \sigma^2)$
- Where F is the standard normal cumulative distribution function (c.d.f.)
31Probit Model
- The c.d.f. of the logit and the probit look quite similar
- Calculating the derivative
- Where $\phi$ is the density function of the normal distribution
32Probit Model
- The derivative is nonlinear
- Often evaluated at the mean of the explanatory variables
- For a dummy explanatory variable, it is common to estimate the effect as the probability that Y = 1 when the dummy variable is 1 minus the probability that Y = 1 when the dummy variable is 0
- That is, calculate how the predicted probability changes when the dummy variable switches from 0 to 1
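Both calculations can be sketched as follows, assuming the usual normalization in which the probit index enters the standard normal c.d.f. directly (error variance set to 1). The coefficients, the sample means and the dummy `owns_car` are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical probit estimates: index = b0 + b1 * cost + b2 * owns_car
b0, b1, b2 = -1.0, 0.3, 0.8
mean_cost = 3.0        # assumed sample mean of the continuous regressor
mean_owns_car = 0.6    # assumed sample mean of the dummy regressor

# Continuous regressor: derivative evaluated at the means, density * coefficient.
index_at_means = b0 + b1 * mean_cost + b2 * mean_owns_car
me_cost = norm_pdf(index_at_means) * b1

# Dummy regressor: change in predicted probability when the dummy goes 0 -> 1,
# holding cost at its mean.
p_dummy_1 = norm_cdf(b0 + b1 * mean_cost + b2 * 1)
p_dummy_0 = norm_cdf(b0 + b1 * mean_cost + b2 * 0)
effect_dummy = p_dummy_1 - p_dummy_0

print(f"marginal effect of cost at the means: {me_cost:.3f}")
print(f"effect of the dummy switching 0 -> 1: {effect_dummy:.3f}")
```

The discrete difference is used for the dummy because a derivative describes an infinitesimal change, which a 0/1 variable cannot make.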
33- The mathematical expression for F is $F(z) = \int_{-\infty}^{z} \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$
- i.e. F(z) is the area under the standard normal density curve to the left of z
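The "area under the curve" description can be checked numerically. The sketch below compares a closed form for F built from the error function in Python's standard library with a direct trapezoid-rule integration of the normal density. The lower limit of -10 stands in for minus infinity and is an approximation choice.

```python
import math

def norm_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """F(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_cdf_numeric(z, lower=-10.0, n=20_000):
    """F(z) as the area under the density from `lower` to z (trapezoid rule)."""
    h = (z - lower) / n
    total = 0.5 * (norm_pdf(lower) + norm_pdf(z))
    for k in range(1, n):
        total += norm_pdf(lower + k * h)
    return total * h

for z in (-1.0, 0.0, 1.96):
    print(f"z = {z:5.2f}: erf form = {norm_cdf(z):.6f}, area = {norm_cdf_numeric(z):.6f}")
```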
34- Because F is itself a probability distribution function, all its values will be between zero and one.
- Therefore the estimated probabilities are guaranteed to be between zero and one.
- The function F has a shape similar to that in the previous diagram
- The marginal effect is smaller when the probability is close to 1, i.e. the curve is flat close to 1
- Probit solves these two problems but creates two others: i) an inconvenient expression for marginal probabilities, ii) it cannot be estimated using OLS
35Marginal Probability
- Tempting to think that b is equal to the marginal probability
- This is not true for Probit, precisely because it is a non-linear model
- Because of the shape of the function F, the marginal probability will diminish as X increases and the probability approaches 1.
37- The marginal probability is affected by b, but it is a non-linear function and is not equal to b.
- It is wrong to say that b equal to 0.2 implies a 20% increase in travel by car for every 1 increase in bus cost
- This is a consequence of ensuring that the marginal probability is low when the probability is high and vice-versa, i.e. as in the diagram.
- The marginal probability will have the same sign as b. This is often all that we want.
- We often report the marginal probability evaluated at the means
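To see how far the marginal probability is from b, it can be tabulated for a probit with one regressor, where it equals b times the normal density at a + bX. The values a = 0 and b = 0.2 are hypothetical, and the error variance is assumed to be normalized to 1.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_marginal_probability(x, a, b):
    """Derivative of F(a + b*x) with respect to x: b * density at (a + b*x)."""
    return b * norm_pdf(a + b * x)

# Hypothetical probit coefficients, for illustration only.
a, b = 0.0, 0.2
xs = [0.0, 2.0, 4.0, 8.0, 12.0]

marginals = [probit_marginal_probability(x, a, b) for x in xs]
for x, m in zip(xs, marginals):
    print(f"X = {x:4.1f}: Prob(Y=1) = {norm_cdf(a + b * x):.3f}, marginal probability = {m:.4f}")
```

Over this range the marginal probability is positive like b, always well below b, and falls as the predicted probability rises toward 1.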
38Likelihood Ratio Test
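- Cannot use the F-test because there is no sum of squared residuals (SSR)
- The LR test is the equivalent
- Intuition: see if the restriction changes the likelihood significantly
- Test statistic: $LR = 2(\ln L_U - \ln L_R)$, where $\ln L_U$ and $\ln L_R$ are the maximized log-likelihoods of the unrestricted and restricted models
- Critical value: $\chi^2$ with d.f. equal to the number of restrictions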
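A minimal sketch of the test for a single restriction follows, using the standard form of the statistic: twice the gap between the unrestricted and restricted maximized log-likelihoods. The unrestricted log-likelihood is a made-up number. The restricted one is for an intercept-only model on a hypothetical sample of six observations with three ones. The p-value helper is valid only for 1 degree of freedom, and 3.841 is the 5% critical value of the chi-square distribution with 1 d.f.

```python
import math

def lr_statistic(lnL_unrestricted, lnL_restricted):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (lnL_unrestricted - lnL_restricted)

def chi2_sf_1df(x):
    """P(chi-square with 1 d.f. > x). Valid for one restriction only."""
    return math.erfc(math.sqrt(x / 2.0))

# Restricted model: intercept only, so every observation gets probability 0.5.
lnL_restricted = 6 * math.log(0.5)
# Unrestricted model: hypothetical maximized log-likelihood.
lnL_unrestricted = -3.0

lr = lr_statistic(lnL_unrestricted, lnL_restricted)
p_value = chi2_sf_1df(lr)
reject_at_5pct = lr > 3.841   # 5% critical value, chi-square with 1 d.f.

print(f"LR = {lr:.3f}, p-value = {p_value:.3f}, reject at 5%: {reject_at_5pct}")
```

With more than one restriction the statistic is computed the same way, but the critical value must come from the chi-square distribution with the matching degrees of freedom.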
39Empirical Examples
- Discrimination in loan approvals in the United States
- Non-Voting in Ireland