Title: Dummy Variables and Truncated Variables
Chapter 8
- Dummy Variables and Truncated Variables
What is in this Chapter?
- This chapter relaxes the assumption made in Chapter 4 that the variables in the regression are observed as continuous variables.
- Differences in intercepts and/or slope coefficients.
- The linear probability model and the logit and probit models.
- Truncated variables and tobit models.
8.1 Introduction
- The variables we will be considering are:
- 1. Dummy variables.
- 2. Truncated variables.
- They can be used to:
- 1. Allow for differences in intercept terms.
- 2. Allow for differences in slopes.
- 3. Estimate equations with cross-equation restrictions.
- 4. Test for stability of regression coefficients.
8.2 Dummy Variables for Changes in the Intercept Term
- Note that the slopes of the regression lines for both groups are roughly the same but the intercepts are different.
- Hence the regression equations we fit will be y = α1 + βx + u for the first group and y = α2 + βx + u for the second group.
8.2 Dummy Variables for Changes in the Intercept Term
- These equations can be combined into a single equation y = α1 + (α2 − α1)D + βx + u,
- where D = 0 for observations in the first group and D = 1 for observations in the second group.
- The variable D is the dummy variable.
- The coefficient of the dummy variable measures the difference in the two intercept terms, as the sketch below illustrates.
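To make the combined equation concrete, here is a minimal sketch (not from the text; the data, group sizes, and parameter values 2.0, 1.5, and 0.8 are invented for illustration) fitting it by ordinary least squares with numpy:

```python
# Minimal sketch: y = alpha1 + (alpha2 - alpha1)*D + beta*x + u,
# where D = 0 for the first group and D = 1 for the second group.
import numpy as np

rng = np.random.default_rng(0)
n = 100
D = np.repeat([0.0, 1.0], n // 2)               # group indicator
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 1.5 * D + 0.8 * x + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), D, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef[0] estimates alpha1, coef[1] estimates (alpha2 - alpha1),
# coef[2] estimates the common slope beta.
print(coef)
```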
8.2 Dummy Variables for Changes in the Intercept Term
- If there are more groups, we have to introduce more dummies.
- For three groups we have y = α1 + βx + u, y = α2 + βx + u, and y = α3 + βx + u.
- These can be written as y = α1 + (α2 − α1)D1 + (α3 − α1)D2 + βx + u,
- where D1 = 1 for observations in the second group (0 otherwise) and D2 = 1 for observations in the third group (0 otherwise).
8.2 Dummy Variables for Changes in the Intercept Term
- As yet another example, suppose that we have data on consumption C and income Y for a number of households.
- In addition, we have data on:
- S: the sex of the head of the household.
- A: the age of the head of the household, which is given in three categories: under 25 years, 25 to 50 years, and over 50 years.
- E: the education of the head of the household, also in three categories: less than high school, at least high school but less than a college degree, and at least a college degree.
8.2 Dummy Variables for Changes in the Intercept Term
- We include these qualitative variables in the form of dummy variables:
- D1 = 1 if male, 0 if female.
- D2 = 1 if age under 25, 0 otherwise; D3 = 1 if age 25 to 50, 0 otherwise.
- D4 = 1 if less than high school, 0 otherwise; D5 = 1 if at least high school but less than a college degree, 0 otherwise.
8.2 Dummy Variables for Changes in the Intercept Term
- For each category the number of dummy variables is one less than the number of classifications.
- Then we run the regression equation C = α + βY + γ1D1 + γ2D2 + γ3D3 + γ4D4 + γ5D5 + u (a coding sketch follows below).
- The assumption made in the dummy variable method is that it is the intercept that changes for each group but not the slope coefficients (i.e., the coefficients of Y).
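A hypothetical coding sketch (the data are invented; the column names S, A, and E follow the text, everything else is an assumption) showing how the "one less than the number of classifications" dummies can be built and the regression run:

```python
# Sketch: dummies for sex (2 classes -> 1 dummy), age (3 -> 2), education (3 -> 2).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "C": [10.1, 12.3, 9.8, 15.2, 11.0, 13.4, 12.8, 14.1],
    "Y": [12.0, 15.0, 11.0, 19.0, 13.0, 16.0, 15.5, 17.0],
    "S": ["male", "female", "male", "female", "male", "female", "male", "female"],
    "A": ["<25", "25-50", ">50", "<25", "25-50", ">50", "<25", "25-50"],
    "E": ["<HS", "HS", "college", "college", "<HS", "HS", "college", "HS"],
})
# drop_first=True omits one category per variable; the omitted categories
# are absorbed into the intercept alpha.
dummies = pd.get_dummies(df[["S", "A", "E"]], drop_first=True, dtype=float)
X = np.column_stack([np.ones(len(df)), df["Y"], dummies])
coef, *_ = np.linalg.lstsq(X, df["C"], rcond=None)
print(dict(zip(["alpha", "beta"] + list(dummies.columns), coef)))
```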
8.2 Dummy Variables for Changes in the Intercept Term
- The intercept term for each individual is obtained by substituting the appropriate values for D1 through D5.
- For instance, for a male, age under 25, with a college degree, we have D1 = 1, D2 = 1, D3 = D4 = D5 = 0, and hence the intercept is α + γ1 + γ2.
- For a female, age over 50, with a college degree, we have D1 = D2 = D3 = D4 = D5 = 0, and hence the intercept is α.
8.2 Dummy Variables for Changes in the Intercept Term
- The dummy variable method is also used if one has to take care of seasonal factors.
- For example, if we have quarterly data on C and Y, we fit the regression equation C = α + βY + λ1D1 + λ2D2 + λ3D3 + u, where D1, D2, D3 are seasonal dummies for three of the four quarters.
8.2 Dummy Variables for Changes in the Intercept Term
- If we have monthly data, we use 11 seasonal dummies.
- If we feel that, say, December (because of Christmas shopping) is the only month with a strong seasonal effect, we use only one dummy variable.
8.2 Dummy Variables for Changes in the Intercept Term
- Two More Illustrative Examples
- We will discuss two more examples using dummy variables.
- They are meant to illustrate two points worth noting, which are as follows:
- 1. In some studies with a large number of dummy variables it becomes somewhat difficult to interpret the signs of the coefficients because they seem to have the wrong signs. (The first example.)
- 2. Sometimes the introduction of dummy variables produces a drastic change in the slope coefficient. (The second example.)
8.2 Dummy Variables for Changes in the Intercept Term
- The first example is a study of the determinants of automobile prices.
- Griliches regressed the logarithm of new passenger car prices on various specifications. The results are shown in Table 8.1.
- Since the dependent variable is the logarithm of price, the regression coefficients can be interpreted as the estimated percentage change in the price for a unit change in a particular quality, holding other qualities constant.
- For example, the coefficient of H indicates that an increase of 10 units of horsepower results in a 1.2% increase in price.
8.2 Dummy Variables for Changes in the Intercept Term
- However, some of the coefficients have to be interpreted with caution.
- For example, the coefficient of P in the equation for 1960 says that the presence of power steering as "standard equipment" led to a 22.5% higher price in 1960.
- In this case the variable P is obviously not measuring the effect of power steering alone but is measuring the effect of "luxuriousness" of the car.
- It is also picking up the effects of A and B. This explains why the coefficient of A is so low in 1960. In fact, A, P, and B together can perhaps be replaced by a single dummy that measures "luxuriousness." These variables appear to be highly intercorrelated.
8.2 Dummy Variables for Changes in the Intercept Term
- Another coefficient, at first sight puzzling, is the coefficient of V, which, though not significant, is consistently negative.
- Though a V-8 costs more than a six-cylinder engine on a "comparable" car, what this coefficient says is that, holding horsepower and other variables constant, a V-8 is cheaper by about 4%.
- Since the V-8's have higher horsepower, what this coefficient is saying is that higher horsepower can be achieved more cheaply by shifting to a V-8 than by using the six-cylinder engine.
8.2 Dummy Variables for Changes in the Intercept Term
- It measures the decline in price per horsepower as one shifts to V-8's, even though the total expenditure on horsepower goes up.
- This example illustrates the use of dummy variables and the interpretation of seemingly wrong coefficients.
8.2 Dummy Variables for Changes in the Intercept Term
- As another example, consider the estimates of liquid-asset demand by manufacturing corporations.
- Vogel and Maddala computed regressions of the form log C = α + β log S, where C is cash and S is sales, on the basis of data from the Internal Revenue Service, "Statistics of Income," for the year 1960-1961.
- The data consisted of 16 industry subgroups and 14 size classes, size being measured by total assets.
8.2 Dummy Variables for Changes in the Intercept Term
- When the equations were estimated separately for each industry, the estimates of β ranged from 0.929 to 1.077.
- The R²'s were uniformly high, ranging from 0.985 to 0.998.
- Thus one might conclude that the sales elasticity of demand for cash is close to 1.
- Also, when the data were pooled and a single equation estimated for the entire set of 224 observations, the estimate of β was 0.992 and R² = 0.897.
8.2 Dummy Variables for Changes in the Intercept Term
- When industry dummies were added, the estimate of β was 0.995 and R² = 0.992.
- From the high R²'s and the relatively constant estimate of β one might be reassured that the sales elasticity is very close to 1.
- However, when asset-size dummies were introduced, the estimate of β fell to 0.334 with an R² of 0.996.
- Also, all the asset-size dummies were highly significant.
8.2 Dummy Variables for Changes in the Intercept Term
- The situation is described in Figure 8.2.
- That the sales elasticity is significantly less than 1 is also confirmed by other evidence.
- This example illustrates how one can be very easily misled by high R²'s and apparent constancy of the coefficients.
8.3 Dummy Variables for Changes in Slope Coefficients
- Suppose now that the intercepts and the slopes both differ between the two groups, so that y = α1 + β1x + u for the first group and y = α2 + β2x + u for the second group.
- We can write these equations together as y = α1 + (α2 − α1)D1 + β1x + (β2 − β1)D2 + u, (8.4)
8.3 Dummy Variables for Changes in Slope Coefficients
- where D1 = 0 for all observations in the first group and D1 = 1 for all observations in the second group,
- and D2 = 0 for all observations in the first group and D2 = xi for observations in the second group, i.e., the respective value of x for the second group.
- The coefficient of D1 measures the difference in the intercept terms and the coefficient of D2 measures the difference in the slopes.
8.3 Dummy Variables for Changes in Slope Coefficients
- Suitable dummy variables can be defined when there are changes in slopes and intercepts at different times.
- Suppose that we have data for three periods and in the second period only the intercept changed (there was a parallel shift).
- In the third period the intercept and the slope changed.
8.3 Dummy Variables for Changes in Slope Coefficients
- Then we write y = α1 + β1x + u in period 1, y = α2 + β1x + u in period 2, and y = α3 + β2x + u in period 3. (8.5)
- Then we can combine these equations and write the model as y = α1 + (α2 − α1)D1 + (α3 − α1)D2 + β1x + (β2 − β1)D3 + u, (8.6)
- where D1 = 1 in period 2 (0 otherwise), D2 = 1 in period 3 (0 otherwise), and D3 = x in period 3 (0 otherwise).
8.3 Dummy Variables for Changes in Slope Coefficients
- Note that in all these examples we are assuming that the error terms in the different groups all have the same distribution.
- That is why we combine the data from the different groups, write an error term u as in (8.4) or (8.6), and estimate the equation by least squares.
8.3 Dummy Variables for Changes in Slope Coefficients
- An alternative way of writing the equations (8.5), which is very general, is to stack the y variables and the error terms in columns.
- Then write all the parameters α1, α2, α3, β1, β2 down with their multiplicative factors stacked in columns as follows:
- y = α1(1) + α2(0) + α3(0) + β1(x) + β2(0) + u for period 1,
- y = α1(0) + α2(1) + α3(0) + β1(x) + β2(0) + u for period 2,
- y = α1(0) + α2(0) + α3(1) + β1(0) + β2(x) + u for period 3. (8.7)
8.3 Dummy Variables for Changes in Slope Coefficients
- What this says is y = α1D1 + α2D2 + α3D3 + β1D4 + β2D5 + u, (8.8)
- where ( ) is used for multiplication, e.g., α3(0) = α3 · 0 = 0.
8.3 Dummy Variables for Changes in Slope Coefficients
- where the definitions of D1, D2, D3, D4, D5 are clear from equation (8.7).
- For instance, D1 = 1 for observations in period 1 and 0 otherwise, and D4 = x for observations in periods 1 and 2 and 0 otherwise.
8.3 Dummy Variables for Changes in Slope Coefficients
- Note that equation (8.8) has to be estimated without a constant term.
- In this method we define as many dummy variables as there are parameters to estimate and we estimate the regression equation with no constant term.
- Note that equations (8.6) and (8.8) are equivalent, as the sketch below confirms numerically.
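The equivalence can be checked numerically. The sketch below (synthetic data; the three-period design and parameter values are assumptions) fits both forms and compares fitted values:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 30
period = np.repeat([1, 2, 3], T // 3)
x = rng.uniform(0.0, 5.0, T)
intercept = np.where(period == 1, 1.0, np.where(period == 2, 2.0, 3.0))
slope = np.where(period == 3, 1.2, 0.7)
y = intercept + slope * x + rng.normal(0.0, 0.1, T)

# Form (8.6): constant, shift dummies, x, and the slope-shift dummy D3 = x in period 3.
X6 = np.column_stack([np.ones(T), period == 2, period == 3,
                      x, x * (period == 3)]).astype(float)
# Form (8.8): no constant term, one dummy per parameter a1, a2, a3, beta1, beta2.
X8 = np.column_stack([period == 1, period == 2, period == 3,
                      x * (period != 3), x * (period == 3)]).astype(float)

b6, *_ = np.linalg.lstsq(X6, y, rcond=None)
b8, *_ = np.linalg.lstsq(X8, y, rcond=None)
print(np.allclose(X6 @ b6, X8 @ b8))  # True: the two forms give identical fits
```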
8.7 Dummy Dependent Variables
- Until now we have been considering models where the explanatory variables are dummy variables.
- We now discuss models where the explained variable is a dummy variable.
- This dummy variable can take on two or more values, but we consider here the case where it takes on only two values, 0 or 1.
- Considering the other cases is beyond the scope of this book. Since the dummy variable takes on two values, it is called a dichotomous variable.
- There are numerous examples of dichotomous explained variables.
8.7 Dummy Dependent Variables
- There are several methods to analyze regression models where the dependent variable is a 0 or 1 variable.
- The simplest procedure is to just use the usual least squares method.
- In this case the model is called the linear probability model.
8.7 Dummy Dependent Variables
- Another method, called the linear discriminant function, is related to the linear probability model.
- The other alternative is to say that there is an underlying or latent variable y* which we do not observe.
- What we observe is y = 1 if y* > 0 and y = 0 otherwise.
- This is the idea behind the logit and probit models.
- First we discuss these methods and then give an illustrative example.
8.8 The Linear Probability Model and the Linear Discriminant Function
- The Linear Probability Model
- In an analysis of bankruptcy of firms, for example, we define yi = 1 if the firm is bankrupt and yi = 0 otherwise.
- We write the model in the usual regression framework as yi = βxi + ui, (8.11)
- with E(ui) = 0.
8.8 The Linear Probability Model and the Linear Discriminant Function
- The conditional expectation E(yi | xi) is equal to βxi.
- This has to be interpreted in this case as the probability that the event will occur given the xi.
- The calculated value of y from the regression equation (i.e., ŷi = β̂xi) will then give the estimated probability that the event occurs, given the particular value of xi.
8.8 The Linear Probability Model and the Linear Discriminant Function
- Since yi takes the value 1 or 0, the errors in equation (8.11) can take only two values, (1 − βxi) and (−βxi).
- Also, with the interpretation we have given equation (8.11), and the requirement that E(ui) = 0, the respective probabilities of these events are βxi and (1 − βxi).
- Thus we have var(ui) = βxi(1 − βxi)² + (1 − βxi)(βxi)² = βxi(1 − βxi), which depends on xi, so the errors are heteroskedastic.
8.8 The Linear Probability Model and the Linear Discriminant Function
- Because of this heteroskedasticity problem the OLS estimates of β from equation (8.11) will not be efficient.
- We use the following two-step procedure:
- First estimate (8.11) by least squares.
- Next compute ŷi(1 − ŷi) and use weighted least squares, that is, defining wi = ŷi(1 − ŷi),
- we regress yi/√wi on xi/√wi. (A small sketch of this procedure follows.)
8.8 The Linear Probability Model and the Linear Discriminant Function
- The problems with this procedure are:
- In practice ŷi(1 − ŷi) may be negative, although in large samples this will be so with a very small probability, since ŷi is a consistent estimator of βxi.
- Since the errors ui are obviously not normally distributed, there is a problem with the application of the usual tests of significance. As we will see in the next section, on the linear discriminant function, they can be justified only under the assumption that the explanatory variables have a multivariate normal distribution.
8.8 The Linear Probability Model and the Linear Discriminant Function
- The most important criticism is with the formulation itself: that the conditional expectation E(yi | xi) = βxi be interpreted as the probability that the event will occur. In many cases β̂xi can lie outside the limits (0, 1).
- The limitations of the linear probability model are shown in Figure 8.3, which shows the bunching up of points along y = 0 and y = 1.
- The predicted values can easily lie outside the interval (0, 1) and the prediction errors can be very large.
8.8 The Linear Probability Model and the Linear Discriminant Function
- The Linear Discriminant Function
- Suppose that we have n individuals for whom we have observations on k explanatory variables, and we observe that n1 of them belong to a first group and n2 of them belong to a second group, where n1 + n2 = n.
- We want to construct a linear function of the k variables that we can use to predict that a new observation belongs to one of the two groups.
- This linear function is called the linear discriminant function.
8.8 The Linear Probability Model and the Linear Discriminant Function
- As an example, suppose that we have data on a number of loan applicants and we observe that n1 of them were granted loans and n2 of them were denied loans.
- We also have the socioeconomic characteristics of the applicants.
8.8 The Linear Probability Model and the Linear Discriminant Function
- Let us define a linear function Z = λ1x1 + λ2x2 + ··· + λkxk.
- Then it is intuitively clear that to get the best discrimination between the two groups, we would want to choose the λ's so that the ratio of the between-group variation in Z to the within-group variation in Z is as large as possible.
8.8 The Linear Probability Model and the Linear Discriminant Function
- Fisher suggested an analogy between this problem and multiple regression analysis.
- He suggested that we define a dummy variable yi = n2/(n1 + n2) if the observation belongs to the first group and yi = −n1/(n1 + n2) if it belongs to the second group.
8.8 The Linear Probability Model and the Linear Discriminant Function
- Now estimate the multiple regression equation of yi on x1, x2, . . . , xk.
- Get the residual sum of squares RSS.
- Then the discriminant function coefficients λj are proportional to the estimated regression coefficients (the factor of proportionality involves only RSS and the sample sizes).
- Thus, once we have the regression coefficients and residual sum of squares from the dummy dependent variable regression, we can very easily obtain the discriminant function coefficients. (A sketch of this route appears below.)
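The following sketch (synthetic two-variable data; Fisher's dummy coding as given above) illustrates the regression route. Only the proportionality of the coefficients matters for classification, not their scale:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 40, 60
g1 = rng.normal([0.0, 0.0], 1.0, size=(n1, 2))   # first group
g2 = rng.normal([2.0, 1.0], 1.0, size=(n2, 2))   # second group
X = np.vstack([g1, g2])
# Fisher's dummy coding for the dependent variable:
y = np.concatenate([np.full(n1, n2 / (n1 + n2)),
                    np.full(n2, -n1 / (n1 + n2))])

Z = np.column_stack([np.ones(n1 + n2), X])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
# The slopes b[1:] are proportional to the discriminant function coefficients;
# the score for observation i is b[1]*x1 + b[2]*x2.
scores = X @ b[1:]
print(scores[:n1].mean(), scores[n1:].mean())    # the group means separate
```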
Discriminant Analysis
- Discriminant analysis attempts to classify customers into two groups:
- those that will default
- those that will not
- It does this by assigning a score to each customer.
- The score is the weighted sum of the customer data: Score(c) = Σi wi Xi,c.
Discriminant Analysis
- Here, wi is the weight on data type i, and Xi,c is one piece of customer data.
- The values for the weights are chosen to maximize the difference between the average score of the customers that later defaulted and the average score of the customers who did not default.
Discriminant Analysis
- The actual optimization process to find the weights is quite complex.
- The most famous discriminant scorecard is Altman's Z Score.
- For publicly owned manufacturing firms, the Z Score was found to be as follows:
Discriminant Analysis
- Z = 1.2X1 + 1.4X2 + 3.3X3 + 0.6X4 + 1.0X5, where X1 = working capital/total assets, X2 = retained earnings/total assets, X3 = EBIT/total assets, X4 = market value of equity/book value of total liabilities, and X5 = sales/total assets.
Discriminant Analysis
- A company scoring less than 1.81 was "very likely" to go bankrupt later.
- A company scoring more than 2.99 was "unlikely" to go bankrupt.
- The scores in between were considered inconclusive. (A small classification sketch follows.)
Discriminant Analysis
- This approach has been adopted by many banks.
- Some banks use the equation exactly as it was created by Altman.
- But most use Altman's approach on their own customer data to get scoring models that are tailored to the bank.
- To obtain the probability of default from the scores, we group companies according to their scores at the beginning of a year, and then calculate the percentage of companies within each group who defaulted by the end of the year (see the sketch below).
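A sketch of this grouping calculation (the scores and default flags are illustrative, with the bands taken from Altman's cutoffs):

```python
import numpy as np

scores = np.array([1.2, 1.5, 1.7, 2.0, 2.4, 2.6, 3.1, 3.4, 3.8, 4.0])
defaulted = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])   # 1 = defaulted within the year
bands = [(0.0, 1.81), (1.81, 2.99), (2.99, np.inf)]

for lo, hi in bands:
    in_band = (scores >= lo) & (scores < hi)
    if in_band.any():
        # realized default rate within the band = estimated PD for that band
        print(f"[{lo}, {hi}): PD = {defaulted[in_band].mean():.2f}")
```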
8.9 The Probit and Logit Models
- An alternative approach is to assume that we have a regression model yi* = βxi + ui, (8.12)
- where yi* is not observed.
- It is commonly called a latent variable.
- What we observe is a dummy variable yi defined by yi = 1 if yi* > 0 and yi = 0 otherwise. (8.13)
8.9 The Probit and Logit Models
- The probit and logit models differ in the specification of the distribution of the error term u in equation (8.12).
- The difference between the specification (8.12) and the linear probability model is that in the linear probability model we analyze the dichotomous variables as they are, whereas in (8.12) we assume the existence of an underlying latent variable for which we observe a dichotomous realization.
8.9 The Probit and Logit Models
- For instance, if the observed dummy variable is whether or not the person is employed, yi* would be defined as the propensity or ability to find employment.
- Similarly, if the observed dummy variable is whether or not the person has bought a car, then yi* would be defined as the desire or ability to buy a car.
- Note that in both the examples we have given, there is both desire and ability involved.
- Thus the explanatory variables in (8.12) would contain variables that explain both these elements.
8.9 The Probit and Logit Models
- Note from system (8.13) that multiplying yi* by any positive constant does not change yi.
- Hence, if we observe yi, we can estimate the β's in (8.12) only up to a positive multiple.
- Hence it is customary to assume var(ui) = 1.
- This fixes the scale of yi*.
- From the relationships (8.12) and (8.13) we get
8.9 The Probit and Logit Models
- P(yi = 1) = P(ui > −βxi) = 1 − F(−βxi),
- where F is the cumulative distribution function of u.
8.9 The Probit and Logit Models
- If the distribution of u is symmetric, since F(z) = 1 − F(−z), we can write P(yi = 1) = F(βxi). (8.14)
- Since the observed yi are just realizations of a binomial process with probabilities given by equation (8.14) and varying from trial to trial (depending on xi), we can write the likelihood function as L = Π over yi=1 of F(βxi) · Π over yi=0 of [1 − F(βxi)]. (8.15)
8.9 The Probit and Logit Models
- If the cumulative distribution of ui is logistic, we have what is known as the logit model.
- In this case F(βxi) = exp(βxi) / [1 + exp(βxi)].
- Hence 1 − F(βxi) = 1 / [1 + exp(βxi)].
- Note that for the logit model, log[Pi/(1 − Pi)] = βxi: the log-odds are a linear function of the explanatory variables.
8.9 The Probit and Logit Models
- If the errors ui in (8.12) follow a normal distribution, we have the probit model (it should more appropriately be called the normit model, but the word probit was used in the biometrics literature).
- In this case F(βxi) = Φ(βxi), the standard normal cumulative distribution function evaluated at βxi.
8.9 The Probit and Logit Models
- Maximization of the likelihood function (8.15) for either the probit or the logit model is accomplished by nonlinear estimation methods.
- There are now several computer programs available for probit and logit analysis, and these programs are very inexpensive to run. (A modern sketch follows.)
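Today this estimation is routine. A minimal sketch on synthetic data (not the mortgage data of the following slides) using statsmodels, whose Logit and Probit classes maximize (8.15) numerically:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
ystar = 0.5 + 1.2 * x + rng.logistic(size=n)   # latent variable (8.12)
y = (ystar > 0).astype(int)                    # observed dummy (8.13)

X = sm.add_constant(x)
logit_res = sm.Logit(y, X).fit(disp=False)
probit_res = sm.Probit(y, X).fit(disp=False)
print("logit:", logit_res.params, "probit:", probit_res.params)
```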
8.9 The Probit and Logit Models
- Illustrative Example
- As an illustration, we consider data on a sample of 750 mortgage applications in the Columbia, SC, metropolitan area.
- There were 500 loan applications accepted and 250 loan applications rejected.
- We define yi = 1 if the loan application was accepted and yi = 0 if it was rejected.
8.9 The Probit and Logit Models
- Three models were estimated: the linear probability model, the logit model, and the probit model.
- The explanatory variables were:
- AI = applicant's and coapplicant's income (10³ dollars)
- XMD = debt minus mortgage payment (10³ dollars)
- DF = dummy variable, 1 for female, 0 for male
- DR = dummy variable, 1 for nonwhite, 0 for white
- DS = dummy variable, 1 for single, 0 otherwise
- DA = age of house (10² years)
8.9 The Probit and Logit Models
- NNWP = percent nonwhite in the neighborhood (10³)
- NMFI = neighborhood mean family income (10⁵ dollars)
- NA = neighborhood average age of house (10² years)
- The results are presented in Table 8.3.
8.9 The Probit and Logit Models
- Measuring Goodness of Fit
- There is a problem with the use of conventional R²-type measures when the explained variable y takes on only two values.
- The predicted values ŷ are probabilities and the actual values y are either 0 or 1.
- We can also think of R² in terms of the proportion of correct predictions.
8.9 The Probit and Logit Models
- Since the dependent variable is a 0 or 1 variable, after we compute the ŷi we classify the i-th observation as belonging to group 1 if ŷi < 0.5 and group 2 if ŷi > 0.5.
- We can then count the number of correct predictions.
- We can define a predicted value ŷi, which is also a zero-one variable, such that ŷi = 1 if the predicted probability exceeds 0.5 and ŷi = 0 otherwise.
8.9 The Probit and Logit Models
- (Provided that we calculate ŷi to enough decimals, ties will be very unlikely.)
- Now define the count R² = (number of correct predictions) / (total number of observations), as in the sketch below.
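A minimal sketch of the count R² (the probabilities and outcomes here are invented):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
yhat_prob = np.array([0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9, 0.1])
yhat = (yhat_prob >= 0.5).astype(int)   # zero-one predicted values
count_r2 = (yhat == y).mean()           # proportion of correct predictions
print(count_r2)
```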
Type I Error vs. Type II Error
- It should be noted that the above count R² values, like the accuracy measures obtained with CAP and ROC curves, treat the costs of a type I error (classifying a subsequently failing firm as non-failed) and a type II error (classifying a subsequently non-failed firm as failed) as the same.
- However, in the credit market, the costs of misclassifying a firm that subsequently fails are much more serious than the costs of misclassifying a firm that does not fail.
Type I Error vs. Type II Error
- In particular, in the first case the lender can lose up to 100% of the loan amount, while in the latter case the loss is just the opportunity cost of not lending to that firm.
- Accordingly, in assessing the practical utility of failure prediction models, banks pay more attention to the misclassification costs involved in type I rather than type II errors.
Type I Error vs. Type II Error
- In particular, for every cutoff probability, the type I error is defined as the percentage of defaults that the model mistakenly classifies as non-defaults, and the type II error is the percentage of non-defaults that are mistakenly classified as defaults.
- We can consider nineteen cutoff probabilities 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, and 0.95, and use the average value of the two errors as a criterion (see the sketch below).
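A sketch of this calculation (the default flags and predicted probabilities are invented):

```python
import numpy as np

y = np.array([1, 1, 1, 0, 0, 0, 0, 0])                         # 1 = defaulted
p = np.array([0.90, 0.60, 0.30, 0.40, 0.20, 0.10, 0.55, 0.05])  # predicted PD

cutoffs = np.arange(0.05, 1.00, 0.05)            # 0.05, 0.10, ..., 0.95
avg_errors = []
for c in cutoffs:
    type1 = (p[y == 1] < c).mean()   # defaults classified as non-defaults
    type2 = (p[y == 0] >= c).mean()  # non-defaults classified as defaults
    avg_errors.append((type1 + type2) / 2)
print(np.round(avg_errors, 3))       # average error at each cutoff
```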
8.11 Truncated Variables: The Tobit Model
- In our discussion of the logit and probit models we talked about a latent variable yi* which was not observed, for which we could specify the regression model yi* = βxi + ui.
- For simplicity of exposition we assume that there is only one explanatory variable.
- In the logit and probit models, what we observe is the dummy variable yi = 1 if yi* > 0 and yi = 0 otherwise.
8.11 Truncated Variables: The Tobit Model
- Suppose, however, that yi* is observed if yi* > 0 and is not observed if yi* ≤ 0.
- Then the observed yi will be defined as yi = yi* = βxi + ui if yi* > 0, and yi = 0 if yi* ≤ 0. (8.19)
8.11 Truncated Variables: The Tobit Model
- This is known as the tobit model (Tobin's probit) and was first analyzed in the econometrics literature by Tobin.
- It is also known as a censored normal regression model because some observations on y* (those for which y* ≤ 0) are censored (we are not allowed to see them).
- Our objective is to estimate the parameters β and σ.
8.11 Truncated Variables: The Tobit Model
- Some Examples
- The example that Tobin considered was that of automobile expenditures.
- Let y denote expenditures on automobiles and x denote income, and we postulate the regression equation y = βx + u.
8.11 Truncated Variables: The Tobit Model
- However, in the sample we would have a large number of observations for which the expenditures on automobiles are zero.
- Tobin argued that we should use the censored regression model.
- We can specify the model as y = βx + u if the right-hand side is positive, and y = 0 otherwise.
- The structure of this model thus appears to be the same as that in (8.19).
8.11 Truncated Variables: The Tobit Model
- Another example: hours worked (H) or wages (W).
- If we have observations on a number of individuals, some of whom are employed and others not, we can specify the model for hours worked as H = β1x + u if the right-hand side is positive, and H = 0 otherwise.
8.11 Truncated Variables: The Tobit Model
- Similarly, for wages we can specify the model W = β2x + v if the right-hand side is positive, and W = 0 otherwise.
- The structure of these models again appears to be the same as in (8.19).
8.11 Truncated Variables: The Tobit Model
- Method of Estimation
- Let us consider the estimation of β and σ by the use of ordinary least squares.
- We cannot use OLS with the positive observations yi, because when we write the model yi = βxi + ui for yi > 0,
- the error term ui does not have a zero mean.
- Since observations with yi* ≤ 0 are omitted, it implies that only observations for which ui > −βxi are included in the sample.
8.11 Truncated Variables: The Tobit Model
- Thus, the distribution of ui is the truncated normal distribution shown in Figure 8.4, and its mean is not zero.
- In fact, it depends on β, σ, and xi, and is thus different for each observation.
- A method of estimation commonly suggested is the maximum likelihood method, which is as follows.
8.11 Truncated Variables: The Tobit Model
- 1. The positive values of y, for which we can write down the normal density function as usual. We note that (yi − βxi)/σ has a standard normal distribution.
- 2. The zero observations of y, for which all we know is that yi* ≤ 0, i.e., βxi + ui ≤ 0. Since ui/σ has a standard normal distribution, we will write this as ui/σ ≤ −βxi/σ. The probability of this can be written as F(−βxi/σ), where F(z) is the cumulative distribution function of the standard normal.
8.11 Truncated Variables: The Tobit Model
- Let us denote the density function of the standard normal by φ(z) and the cumulative distribution function by Φ(z).
- Thus φ(z) = (1/√(2π)) exp(−z²/2)
- and Φ(z) = ∫ from −∞ to z of φ(t) dt.
8.11 Truncated Variables: The Tobit Model
- Using this notation we can write the likelihood function for the tobit model as L = Π over yi > 0 of (1/σ)φ((yi − βxi)/σ) · Π over yi = 0 of Φ(−βxi/σ).
- Maximizing this likelihood function with respect to β and σ, we get the ML estimates of these parameters.
- We will not go through the algebraic details of the ML method here; a numerical sketch is given below.
- Instead, we discuss the situations under which the tobit model is applicable and its relationship to other models with truncated variables.
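For completeness, here is a minimal numerical sketch (synthetic data; parameterizing with log σ to keep σ positive is an implementation choice) that maximizes the tobit likelihood above with scipy:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(0.0, 4.0, n)
ystar = -1.0 + 0.8 * x + rng.normal(0.0, 1.0, n)   # latent variable
y = np.where(ystar > 0, ystar, 0.0)                # censored at zero, as in (8.19)

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    mu = b0 + b1 * x
    pos = y > 0
    ll = stats.norm.logpdf(y[pos], mu[pos], s).sum()   # density part for y > 0
    ll += stats.norm.logcdf(-mu[~pos] / s).sum()       # Phi(-beta*x/sigma) for y = 0
    return -ll

res = optimize.minimize(neg_loglik, x0=[0.0, 0.5, 0.0], method="BFGS")
print("beta:", res.x[:2], "sigma:", np.exp(res.x[2]))
```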