1
Chapter 8
  • Dummy Variables and Truncated Variables

2
What is in this Chapter?
  • This chapter relaxes the assumption made in
    Chapter 4 that the variables in the regression
    are observed as continuous variables.
  • Differences in intercepts and/or slope
    coefficients
  • The linear probability model and the logit and
    probit models.
  • Truncated variables, Tobit models

3
8.1 Introduction
  • The variables we will be considering are:
  • 1. Dummy variables.
  • 2. Truncated variables.
  • They can be used to:
  • 1. Allow for differences in intercept terms.
  • 2. Allow for differences in slopes.
  • 3. Estimate equations with cross-equation
    restrictions.
  • 4. Test for stability of regression coefficients.

4
8.2 Dummy Variables for Changes in the Intercept
Term
  • Note that the slopes of the regression lines for
    both groups are roughly the same but the
    intercepts are different.
  • Hence the regression equations we fit will be
  • y = α1 + βx + u for the first group
  • y = α2 + βx + u for the second group

5
8.2 Dummy Variables for Changes in the Intercept
Term
  • These equations can be combined into a single
    equation
  • y = α2 + (α1 − α2)D + βx + u
  • where D = 1 for observations in the first group
    and D = 0 for observations in the second group.
  • The variable D is the dummy variable.
  • The coefficient of the dummy variable measures
    the difference in the two intercept terms (a
    numerical sketch follows below).
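  • A minimal numerical sketch of this dummy variable
    regression on hypothetical data (all names below
    are ours, not the text's). The coefficient on D
    estimates the intercept difference α1 − α2:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    x = rng.uniform(0, 10, n)
    D = (np.arange(n) < n // 2).astype(float)     # 1 for group 1, 0 for group 2
    # True model: alpha2 = 2.0, alpha1 - alpha2 = 1.5, beta = 0.8
    y = 2.0 + 1.5 * D + 0.8 * x + rng.normal(0, 1, n)

    X = sm.add_constant(np.column_stack([D, x]))  # columns: const, D, x
    res = sm.OLS(y, X).fit()
    print(res.params)  # approx [2.0, 1.5, 0.8]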

6
8.2 Dummy Variables for Changes in the Intercept
Term
7
8.2 Dummy Variables for Changes in the Intercept
Term
  • If there are more groups, we have to introduce
    more dummies.
  • For three groups we have
  • y = α1 + βx + u, y = α2 + βx + u, y = α3 + βx + u
  • These can be written as
  • y = α3 + (α1 − α3)D1 + (α2 − α3)D2 + βx + u
  • where D1 = 1 for observations in the first group
    and 0 otherwise, and D2 = 1 for observations in
    the second group and 0 otherwise.

8
8.2 Dummy Variables for Changes in the Intercept
Term
  • As yet another example, suppose that we have data
    on consumption C and income Y for a number of
    households.
  • In addition, we have data on
  • S = the sex of the head of the household.
  • A = the age of the head of the household, which is
    given in three categories: < 25 years, 25 to 50
    years, and > 50 years.
  • E = the education of the head of the household,
    also in three categories: < high school, ≥ high
    school but < college degree, ≥ college degree.

9
8.2 Dummy Variables for Changes in the Intercept
Term
  • We include these qualitative variables in the form
    of dummy variables
  • D1 = 1 if male, 0 if female
  • D2 = 1 if age < 25 years, 0 otherwise
  • D3 = 1 if age 25 to 50 years, 0 otherwise
  • D4 = 1 if education < high school, 0 otherwise
  • D5 = 1 if education ≥ high school but < college
    degree, 0 otherwise

10
8.2 Dummy Variables for Changes in the Intercept
Term
  • For each category the number of dummy variables
    is one less than the number of classifications.
  • Then we run the regression equation
  • C = α + β1D1 + β2D2 + β3D3 + β4D4 + β5D5 + γY + u
  • The assumption made in the dummy variable method
    is that it is the intercept that changes for each
    group but not the slope coefficients (i.e., the
    coefficients of Y).

11
8.2 Dummy Variables for Changes in the Intercept
Term
  • The intercept term for each individual is
    obtained by substituting the appropriate values
    for D1 through D5.
  • For instance, for a male, age < 25, with a
    college degree, we have D1 = 1, D2 = 1, D3 = 0,
    D4 = 0, D5 = 0 and hence the intercept is
    α + β1 + β2.
  • For a female, age > 50, with a college degree, we
    have D1 = 0, D2 = 0, D3 = 0, D4 = 0, D5 = 0 and
    hence the intercept is α.

12
8.2 Dummy Variables for Changes in the Intercept
Term
  • The dummy variable method is also used if one has
    to take care of seasonal factors.
  • For example, if we have quarterly data on C and
    Y, we fit the regression equation
  • C = α + β1D1 + β2D2 + β3D3 + γY + u
  • where D1, D2, D3 are seasonal dummies for the
    first three quarters.

13
8.2 Dummy Variables for Changes in the Intercept
Term
  • If we have monthly data, we use 11 seasonal
    dummies.
  • If we feel that, say, December (because of
    Christmas shopping) is the only month with a
    strong seasonal effect, we use only one dummy
    variable, D = 1 for December and D = 0 for all
    other months (a sketch of a seasonal-dummy
    regression follows below).
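  • A short sketch of a seasonal-dummy regression on
    hypothetical quarterly data. The C(quarter) term
    in the statsmodels formula expands into three
    quarterly dummies, dropping one quarter as the
    base category:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"Y": rng.uniform(50, 150, 40),
                       "quarter": np.tile([1, 2, 3, 4], 10)})
    # Hypothetical consumption with a fourth-quarter (Christmas) effect
    df["cons"] = 10 + 0.7 * df["Y"] + 5.0 * (df["quarter"] == 4) + rng.normal(0, 2, 40)

    res = smf.ols("cons ~ Y + C(quarter)", data=df).fit()
    print(res.params)  # dummy coefficients are shifts relative to quarter 1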

14
8.2 Dummy Variables for Changes in the Intercept
Term
  • Two More Illustrative Examples
  • We will discuss two more examples using dummy
    variables.
  • They are meant to illustrate two points worth
    noting, which are as follows
  • 1. In some studies with a large number of dummy
    variables it becomes somewhat difficult to
    interpret the signs of the coefficients because
    they seem to have the wrong signs. (The first
    example)
  • 2. Sometimes the introduction of dummy variables
    produces a drastic change in the slope
    coefficient. (The second example)

15
8.2 Dummy Variables for Changes in the Intercept
Term
  • The first example is a study of the determinants
    of automobile prices.
  • Griliches regressed the logarithm of new
    passenger car prices on various specifications.
    The results are shown in Table 8.1.
  • Since the dependent variable is the logarithm of
    price, the regression coefficients can be
    interpreted as the estimated percentage change in
    the price for a unit change in a particular
    quality, holding other qualities constant.
  • For example, the coefficient of H indicates that
    an increase of 10 units of horsepower results in
    a 1.2% increase in price.

16
(Table 8.1 — regression results, shown as an image in
the original presentation)
17
8.2 Dummy Variables for Changes in the Intercept
Term
  • However, some of the coefficients have to be
    interpreted with caution.
  • For example, the coefficient of P in the
    equation for 1960 says that the presence of power
    steering as "standard equipment" led to a 22.5%
    higher price in 1960.
  • In this case the variable P is obviously not
    measuring the effect of power steering alone but
    is measuring the effect of "luxuriousness" of the
    car.
  • It is also picking up the effects of A and B.
    This explains why the coefficient of A is so low
    in 1960. In fact, A, P, and B together can
    perhaps be replaced by a single dummy that
    measures "luxuriousness." These variables appear
    to be highly intercorrelated.

18
8.2 Dummy Variables for Changes in the Intercept
Term
  • Another coefficient, at first sight puzzling, is
    the coefficient of V, which, though not
    significant, is consistently negative.
  • Though a V-8 costs more than a six-cylinder
    engine on a "comparable" car, what this
    coefficient says is that, holding horsepower and
    other variables constant, a V-8 is cheaper by
    about 4%.
  • Since the V-8's have higher horsepower, what this
    coefficient is saying is that higher horsepower
    can be achieved more cheaply by shifting to a V-8
    than by using a six-cylinder engine.

19
8.2 Dummy Variables for Changes in the Intercept
Term
  • It measures the decline in price per horsepower
    as one shifts to V-8's even though the total
    expenditure on horsepower goes up
  • This example illustrates the use of dummy
    variables and the interpretation of seemingly
    wrong coefficients

20
8.2 Dummy Variables for Changes in the Intercept
Term
  • As another example, consider the estimates of
    liquid-asset demand by manufacturing
    corporations.
  • Vogel and Maddala computed regressions of the
    form log C = α + β log S, where C is cash and S
    is sales, on the basis of data from the Internal
    Revenue Service, "Statistics of Income," for the
    year 1960-1961.
  • The data consisted of 16 industry subgroups and
    14 size classes, size being measured by total
    assets.

21
8.2 Dummy Variables for Changes in the Intercept
Term
  • The equations were estimated separately for each
    industry; the estimates of β ranged from 0.929 to
    1.077.
  • The R²'s were uniformly high, ranging from 0.985
    to 0.998.
  • Thus one might conclude that the sales elasticity
    of demand for cash is close to 1.
  • Also, when the data were pooled and a single
    equation estimated for the entire set of 224
    observations, the estimate of β was 0.992 and
    R² = 0.897.

22
8.2 Dummy Variables for Changes in the Intercept
Term
  • When industry dummies were added, the estimate of
    β was 0.995 and R² = 0.992.
  • From the high R²'s and relatively constant
    estimate of β one might be reassured that the
    sales elasticity is very close to 1.
  • However, when asset-size dummies were introduced,
    the estimate of β fell to 0.334 with an R² of
    0.996.
  • Also, all asset-size dummies were highly
    significant.

23
8.2 Dummy Variables for Changes in the Intercept
Term
  • The situation is described in Figure 8.2.
  • That the sales elasticity is significantly less
    than 1 is also confirmed by other evidence.
  • This example illustrates how one can be very
    easily misled by high R²'s and apparent constancy
    of the coefficients.

24
8.2 Dummy Variables for Changes in the Intercept
Term
25
8.3 Dummy Variables for Changes in Slope
Coefficients
  • Suppose that the two groups differ in both their
    intercepts and their slopes
  • y = α1 + β1x + u for the first group
  • and
  • y = α2 + β2x + u for the second group
  • We can write these equations together as
  • y = α2 + (α1 − α2)D1 + β1x + (β2 − β1)D2 + u

26
8.3 Dummy Variables for Changes in Slope
Coefficients
  • where D1 = 1 for all observations in the first
    group and D1 = 0 for all observations in the
    second group
  • and D2 = 0 for all observations in the first
    group and D2 = xi for all observations in the
    second group, i.e., the respective value of x for
    the second group.
  • The coefficient of D1 measures the difference in
    the intercept terms and the coefficient of D2
    measures the difference in the slopes (a
    numerical sketch follows below).
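  • A numerical sketch of this setup on hypothetical
    data (names ours). D1 is the intercept dummy and
    D2 carries x for the second group only, so the
    fitted coefficients on D1 and D2 estimate α1 − α2
    and β2 − β1:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 120
    x = rng.uniform(0, 10, n)
    D1 = (np.arange(n) < 60).astype(float)  # 1 for the first group, 0 for the second
    D2 = np.where(D1 == 0, x, 0.0)          # x for the second group, 0 for the first
    # Group 1: y = 1.0 + 0.5x;  group 2: y = 3.0 + 1.2x
    y = np.where(D1 == 1, 1.0 + 0.5 * x, 3.0 + 1.2 * x) + rng.normal(0, 1, n)

    X = sm.add_constant(np.column_stack([D1, x, D2]))
    res = sm.OLS(y, X).fit()
    print(res.params)  # approx [3.0, -2.0, 0.5, 0.7]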

27
8.3 Dummy Variables for Changes in Slope
Coefficients
  • Suitable dummy variables can be defined when
    there are changes in slopes and intercepts at
    different times.
  • Suppose that we have data for three periods and
    in the second period only the intercept changed
    (there was a parallel shift).
  • In the third period both the intercept and the
    slope changed.

28
8.3 Dummy Variables for Changes in Slope
Coefficients
  • Then we write
  • y = α1 + β1x + u (period 1)
  • y = α2 + β1x + u (period 2)
  • y = α3 + β2x + u (period 3)
  • Then we can combine these equations and write the
    model as
  • y = α1 + (α2 − α1)D1 + (α3 − α1)D2 + β1x
    + (β2 − β1)D3 + u
  • where D1 = 1 for observations in period 2 and 0
    otherwise, D2 = 1 for observations in period 3
    and 0 otherwise, and D3 = x for observations in
    period 3 and 0 otherwise.

29
8.3 Dummy Variables for Changes in Slope
Coefficients
30
8.3 Dummy Variables for Changes in Slope
Coefficients
  • Note that in all these examples we are assuming
    that the error terms in the different groups all
    have the same distribution.
  • That is why we combine the data from the
    different groups and write an error term u as in
    (8.4) or (8.6) and estimate the equation by least
    squares.

31
8.3 Dummy Variables for Changes in Slope
Coefficients
  • An alternative way of writing the equations
    (8.5), which is very general, is to stack the y
    variables and the error terms in columns.
  • Then write all the parameters α1, α2, α3, β1, β2
    down with their multiplicative factors stacked in
    columns as follows

32
8.3 Dummy Variables for Changes in Slope
Coefficients
  • What this says is
  • where (·) is used for multiplication, e.g.,
    α3(0) = α3 × 0.

33
8.3 Dummy Variables for Changes in Slope
Coefficients
  • where the definitions of D1, D2, D3, D4, D5 are
    clear from equation (8.7).
  • For instance, D1 = 1 for all observations in the
    first period and D1 = 0 otherwise, while D4 = x
    for observations in the first two periods and
    D4 = 0 for the third period.

34
8.3 Dummy Variables for Changes in Slope
Coefficients
  • Note that equation (8.8) has to be estimated
    without a constant term.
  • In this method we define as many dummy variables
    as there are parameters to estimate and we
    estimate the regression equation with no constant
    term.
  • Note that equations (8.6) and (8.8) are
    equivalent.

35
8.7 Dummy Dependent Variables
  • Until now we have been considering models where
    the explanatory variables are dummy variables.
  • We now discuss models where the explained
    variable is a dummy variable.
  • This dummy variable can take on two or more
    values, but we consider here the case where it
    takes on only two values, zero or 1.
  • Considering the other cases is beyond the scope
    of this book. Since the dummy variable takes on
    two values, it is called a dichotomous variable.
  • There are numerous examples of dichotomous
    explained variables.

36
8.7 Dummy Dependent Variables
  • There are several methods to analyze regression
    models where the dependent variable is a zero-one
    variable.
  • The simplest procedure is to just use the usual
    least squares method.
  • In this case the model is called the linear
    probability model.

37
8.7 Dummy Dependent Variables
  • Another method, called the linear discriminant
    function, is related to the linear probability
    model.
  • The other alternative is to say that there is an
    underlying or latent variable y* which we do not
    observe.
  • What we observe is
  • y = 1 if y* > 0 and y = 0 otherwise
  • This is the idea behind the logit and probit
    models.
  • First we discuss these methods and then give an
    illustrative example.

38
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • The Linear Probability Model
  • For example, in an analysis of bankruptcy of
    firms, we define
  • yi = 1 if the firm is bankrupt and yi = 0
    otherwise
  • We write the model in the usual regression
    framework as
  • yi = βxi + ui
  • with E(ui) = 0.

39
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • The conditional expectation E(yi | xi) is equal
    to βxi.
  • This has to be interpreted in this case as the
    probability that the event will occur given the
    xi.
  • The calculated value of y from the regression
    equation (i.e., ŷi, the estimated value of βxi)
    will then give the estimated probability that the
    event occurs.

40
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • Since yi takes the value 1 or zero, the errors in
    equation (8.11) can take only two values,
    (1 − βxi) and (−βxi).
  • Also, with the interpretation we have given
    equation (8.11), and the requirement that
    E(ui) = 0, the respective probabilities of these
    events are βxi and (1 − βxi).
  • Thus we have
  • var(ui) = βxi(1 − βxi)² + (1 − βxi)(βxi)²

41
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • Hence
  • var(ui) = βxi(1 − βxi) = E(yi)[1 − E(yi)]
  • which depends on xi, so the errors are
    heteroskedastic.

42
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • Because of this heteroskedasticity problem the
    OLS estimates of β from equation (8.11) will not
    be efficient.
  • We use the following two-step procedure (a sketch
    follows below).
  • First estimate (8.11) by least squares.
  • Next compute ŷi(1 − ŷi) and use weighted least
    squares, that is, defining
  • wi = √[ŷi(1 − ŷi)]
  • we regress yi/wi on xi/wi.
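  • A sketch of this two-step procedure (assumptions:
    X already contains a constant column, y is a 0/1
    array). Clipping the fitted probabilities away
    from 0 and 1 is our own guard against the
    negative-variance problem noted on the next
    slide, not part of the text's recipe:

    import numpy as np
    import statsmodels.api as sm

    def lpm_two_step(y, X):
        """Linear probability model: OLS, then WLS with weights 1/[p(1 - p)]."""
        ols = sm.OLS(y, X).fit()                 # step 1: ordinary least squares
        p = np.clip(ols.predict(X), 0.01, 0.99)  # guard: keep p(1 - p) positive
        wls = sm.WLS(y, X, weights=1.0 / (p * (1 - p))).fit()  # step 2
        return ols, wls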

43
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • The problems with this procedure are:
  • ŷi(1 − ŷi) in practice may be negative, although
    in large samples this will be so with a very
    small probability, since ŷi is a consistent
    estimator of E(yi) = βxi.
  • Since the errors ui are obviously not normally
    distributed, there is a problem with the
    application of the usual tests of significance.
    As we will see in the next section, on the linear
    discriminant function, they can be justified only
    under the assumption that the explanatory
    variables have a multivariate normal
    distribution.

44
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • The most important criticism is with the
    formulation itself: that the conditional
    expectation E(yi | xi) be interpreted as the
    probability that the event will occur. In many
    cases ŷi can lie outside the limits (0, 1).
  • The limitations of the linear probability model
    are shown in Figure 8.3, which shows the bunching
    up of points along y = 0 and y = 1.
  • The predicted values can easily lie outside the
    interval (0, 1) and the prediction errors can be
    very large.

45
8.8 The Linear Probability Model and the Linear
Discriminant Function
46
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • The Linear Discriminant Function
  • Suppose that we have n individuals for whom we
    have observations on k explanatory variables, and
    we observe that n1 of them belong to the first
    group and n2 of them to the second group, where
    n1 + n2 = n.
  • We want to construct a linear function of the k
    variables that we can use to predict that a new
    observation belongs to one of the two groups.
  • This linear function is called the linear
    discriminant function.

47
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • As an example, suppose that we have data on a
    number of loan applicants and we observe that n1
    of them were granted loans and n2 of them were
    denied loans.
  • We also have the socioeconomic characteristics of
    the applicants, say x1, x2, ..., xk.

48
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • Let us define a linear function
  • Z = λ1x1 + λ2x2 + ... + λkxk
  • Then it is intuitively clear that to get the best
    discrimination between the two groups, we would
    want to choose the λ's so that the ratio of the
    between-group variance of Z to the within-group
    variance of Z is maximized.

49
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • Fisher suggested an analogy between this problem
    and multiple regression analysis.
  • He suggested that we define a dummy variable
  • yi = n2/(n1 + n2) if the individual belongs to
    the first group
  • yi = −n1/(n1 + n2) if the individual belongs to
    the second group
50
8.8 The Linear Probability Model and the Linear
Discriminant Function
  • Now estimate the multiple regression equation of
    y on x1, x2, ..., xk.
  • Get the residual sum of squares RSS.
  • Then the discriminant-function coefficients λj
    are proportional to the estimated regression
    coefficients, with a proportionality factor that
    depends only on RSS.
  • Thus, once we have the regression coefficients
    and residual sum of squares from the dummy
    dependent variable regression, we can very easily
    obtain the discriminant function coefficients (a
    sketch follows below).
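  • A sketch of Fisher's device under the coding just
    described (function name ours). The slope
    coefficients of the dummy dependent variable
    regression are returned up to scale, which is
    arbitrary for classification purposes:

    import numpy as np
    import statsmodels.api as sm

    def discriminant_via_ols(X1, X2):
        """X1: (n1, k) array for group 1; X2: (n2, k) array for group 2.
        Returns coefficients proportional to the lambdas."""
        n1, n2 = len(X1), len(X2)
        n = n1 + n2
        y = np.concatenate([np.full(n1, n2 / n), np.full(n2, -n1 / n)])
        X = sm.add_constant(np.vstack([X1, X2]))
        res = sm.OLS(y, X).fit()
        return res.params[1:]   # drop the intercept; scale is arbitrary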

51
Discriminant Analysis
  • Discriminant analysis attempts to classify
    customers into two groups
  • those that will default
  • those that will not
  • It does this by assigning a score to each
    customer
  • The score is the weighted sum of the customer
    data

52
Discriminant Analysis
  • Scorec = Σi wi Xi,c
  • Here, wi is the weight on data type i, and Xi,c
    is one piece of customer data.
  • The values for the weights are chosen to maximize
    the difference between the average score of the
    customers that later defaulted and the average
    score of the customers who did not default.

53
Discriminant Analysis
  • The actual optimization process to find the
    weights is quite complex
  • The most famous discriminant scorecard is
    Altman's Z Score.
  • For publicly owned manufacturing firms, the Z
    Score was found to be as follows

54
Discriminant Analysis
  • Z = 1.2X1 + 1.4X2 + 3.3X3 + 0.6X4 + 1.0X5
  • where X1 = working capital/total assets, X2 =
    retained earnings/total assets, X3 = earnings
    before interest and taxes/total assets, X4 =
    market value of equity/book value of total
    liabilities, and X5 = sales/total assets.
55
Discriminant Analysis
  • A company scoring less than 1.81 was "very
    likely" to go bankrupt later.
  • A company scoring more than 2.99 was "unlikely"
    to go bankrupt.
  • The scores in between were considered
    inconclusive (a classification sketch follows
    below).
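  • A small sketch of the Z score as usually quoted
    and the classification zones from this slide (a
    hypothetical firm is scored at the end):

    def altman_z(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta):
        """Altman Z score for publicly owned manufacturing firms.
        Inputs: working capital/TA, retained earnings/TA, EBIT/TA,
        market value of equity/book value of liabilities, sales/TA."""
        return (1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta
                + 0.6 * mve_tl + 1.0 * sales_ta)

    def zone(z):
        if z < 1.81:
            return "very likely to go bankrupt"
        if z > 2.99:
            return "unlikely to go bankrupt"
        return "inconclusive"

    print(zone(altman_z(0.1, 0.2, 0.15, 1.5, 1.1)))  # hypothetical ratios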

56
Discriminant Analysis
  • This approach has been adopted by many banks.
  • Some banks use the equation exactly as it was
    created by Altman.
  • But most use Altman's approach on their own
    customer data to get scoring models that are
    tailored to the bank.
  • To obtain the probability of default from the
    scores, we group companies according to their
    scores at the beginning of a year, and then
    calculate the percentage of companies within each
    group who defaulted by the end of the year (a
    sketch follows below).
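  • A pandas sketch of this grouping step; the score
    distribution and the default process below are
    simulated purely for illustration:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    score = rng.normal(2.5, 1.0, 1000)                 # scores at start of year
    p_def = 1.0 / (1.0 + np.exp(2.0 * (score - 1.8)))  # lower score -> riskier
    df = pd.DataFrame({"score": score,
                       "defaulted": rng.random(1000) < p_def})

    df["group"] = pd.qcut(df["score"], q=10)           # ten score groups
    # Empirical default rate per score group by year end
    print(df.groupby("group", observed=True)["defaulted"].mean())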

57
8.9 The Probit and Logit Models
  • An alternative approach is to assume that we have
    a regression model
  • yi* = βxi + ui    (8.12)
  • where yi* is not observed.
  • It is commonly called a latent variable.
  • What we observe is a dummy variable yi defined by
  • yi = 1 if yi* > 0, yi = 0 otherwise    (8.13)

58
8.9 The Probit and Logit Models
  • The probit and logit models differ in the
    specification of the distribution of the error
    term u in equation (8.12).
  • The difference between the specification (8.12)
    and the linear probability model is that in the
    linear probability model we analyze the
    dichotomous variables as they are, whereas in
    (8.12) we assume the existence of an underlying
    latent variable for which we observe a
    dichotomous realization.

59
8.9 The Probit and Logit Models
  • For instance, if the observed dummy variable is
    whether or not the person is employed, yi* would
    be defined as the propensity or ability to find
    employment.
  • Similarly, if the observed dummy variable is
    whether or not the person has bought a car, then
    yi* would be defined as the desire or ability to
    buy a car.
  • Note that in both the examples we have given,
    there is both desire and ability involved.
  • Thus the explanatory variables in (8.12) would
    contain variables that explain both these
    elements.

60
8.9 The Probit and Logit Models
  • Note from (8.13) that multiplying yi* by any
    positive constant does not change yi.
  • Hence if we observe yi, we can estimate the β's
    in (8.12) only up to a positive multiple.
  • Hence it is customary to assume var(ui) = 1.
  • This fixes the scale of yi*.
  • From the relationships (8.12) and (8.13) we get
  • Pi = P(yi = 1) = P(ui > −βxi) = 1 − F(−βxi)

61
8.9 The Probit and Logit Models
  • where F(·) is the cumulative distribution
    function of u.

62
8.9 The Probit and Logit Models
  • If the distribution of u is symmetric, so that
    1 − F(−z) = F(z), we can write
  • Pi = F(βxi)    (8.14)
  • Since the observed yi are just realizations of a
    binomial process with probabilities given by
    equation (8.14) and varying from trial to trial
    (depending on xi), we can write the likelihood
    function as
  • L = Π(yi=1) F(βxi) · Π(yi=0) [1 − F(βxi)]    (8.15)

63
8.9 The Probit and Logit Models
  • If the cumulative distribution of ui is logistic,
    we have what is known as the logit model.
  • In this case
  • F(z) = e^z / (1 + e^z)
  • Hence
  • Pi = e^(βxi) / (1 + e^(βxi))
  • Note that for the logit model
  • log[Pi / (1 − Pi)] = βxi

64
8.9 The Probit and Logit Models
  • If the errors ui in (8.12) follow a normal
    distribution, we have the probit model (it should
    more appropriately be called the normit model,
    but the word probit was used in the biometrics
    literature).
  • In this case
  • F(z) = Φ(z), the standard normal cumulative
    distribution function, so that Pi = Φ(βxi).

65
8.9 The Probit and Logit Models
  • Maximization of the likelihood function (8.15)
    for either the probit or the logit model is
    accomplished by nonlinear estimation methods.
  • There are now several computer programs available
    for probit and logit analysis, and these programs
    are very inexpensive to run (a sketch using one
    such library follows below).
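  • For instance, with the statsmodels library both
    models can be fit in a few lines; the data here
    are simulated from a latent-variable model of the
    form (8.12)-(8.13):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    x = rng.normal(0, 1, (n, 2))
    ystar = 0.5 + 1.0 * x[:, 0] - 0.8 * x[:, 1] + rng.logistic(0, 1, n)
    y = (ystar > 0).astype(int)                # observed dichotomous realization

    X = sm.add_constant(x)
    print(sm.Logit(y, X).fit(disp=0).params)   # ML estimates, logistic errors
    print(sm.Probit(y, X).fit(disp=0).params)  # ML estimates, normal errors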

66
8.9 The Probit and Logit Models
  • Illustrative Example
  • As an illustration, we consider data on a sample
    of 750 mortgage applications in the Columbia, SC,
    metropolitan area.
  • There were 500 loan applications accepted and 250
    loan applications rejected.
  • We define
  • yi = 1 if the loan application was accepted, 0 if
    it was rejected

67
8.9 The Probit and Logit Models
  • Three models were estimated: the linear
    probability model, the logit model, and the
    probit model.
  • The explanatory variables were
  • AI = applicant's and coapplicant's income (10³
    dollars)
  • XMD = debt minus mortgage payment (10³ dollars)
  • DF = dummy variable, 1 for female, 0 for male
  • DR = dummy variable, 1 for nonwhite, 0 for white
  • DS = dummy variable, 1 for single, 0 otherwise
  • DA = age of house (10² years)

68
8.9 The Probit and Logit Models
  • NNWP = percent nonwhite in the neighborhood (10³)
  • NMFI = neighborhood mean family income (10⁵
    dollars)
  • NA = neighborhood average age of house (10²
    years)
  • The results are presented in Table 8.3.

69
8.9 The Probit and Logit Models
70
8.9 The Probit and Logit Models
  • Measuring Goodness of Fit
  • There is a problem with the use of conventional
    R²-type measures when the explained variable y
    takes on only two values.
  • The predicted values ŷi are probabilities and the
    actual values yi are either 0 or 1.
  • We can also think of R² in terms of the
    proportion of correct predictions.

71
8.9 The Probit and Logit Models
  • Since the dependent variable is a zero-one
    variable, after we compute ŷi we classify the
    i-th observation as belonging to group 1 if
    ŷi < 0.5 and to group 2 if ŷi > 0.5.
  • We can then count the number of correct
    predictions.
  • We can define a predicted value ỹi, which is also
    a zero-one variable, such that
  • ỹi = 1 if ŷi > 0.5 and ỹi = 0 if ŷi < 0.5

72
8.9 The Probit and Logit Models
  • (Provided that we calculate ŷi to enough
    decimals, ties will be very unlikely.)
  • Now define (see the sketch below)
  • count R² = (number of correct predictions) /
    (total number of observations)
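  • A sketch of the count R² computed from fitted
    probabilities at the 0.5 cutoff (function name
    ours):

    import numpy as np

    def count_r2(y, p, cutoff=0.5):
        """Proportion of correct 0/1 predictions: predict 1 when p > cutoff."""
        yhat = (p > cutoff).astype(int)
        return float(np.mean(yhat == y))

    # usage: count_r2(y, fitted_probabilities) with any of the three models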

73
Type I error vs. type II error
  • It should be noted that the above count R² values,
    like those obtained with the CAP and ROC curves,
    treat the costs of a type I error (classifying a
    subsequently failing firm as non-failed) and a
    type II error (classifying a subsequently
    non-failed firm as failed) as the same.
  • However, in the credit market, the costs of
    misclassifying a firm that subsequently fails are
    much more serious than the costs of
    misclassifying a firm that does not fail.

74
Type I error vs. type II error
  • In particular, in the first case, the lender can
    lose up to 100% of the loan amount while, in the
    latter case, the loss is just the opportunity
    cost of not lending to that firm.
  • Accordingly, in assessing the practical utility
    of failure prediction models, banks pay more
    attention to the misclassification costs involved
    in type I rather than type II errors.

75
Type I error vs. type II error
  • In particular, for every cutoff probability, the
    type I error is defined as the percentage of
    defaults that the model mistakenly classifies as
    non-defaults, and the type II error as the
    percentage of non-defaults that are mistakenly
    classified as defaults.
  • We can consider nineteen cutoff probabilities:
    0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40,
    0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80,
    0.85, 0.90 and 0.95, and use the average value of
    the errors as a criterion (a sketch follows
    below).
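  • A sketch of that computation (names ours; y = 1
    marks a default and p is the model's estimated
    default probability):

    import numpy as np

    def type_error_table(y, p):
        """Type I and type II error rates at cutoffs 0.05, 0.10, ..., 0.95."""
        rows = []
        for c in np.arange(0.05, 1.00, 0.05):
            pred_default = p >= c
            type1 = np.mean(~pred_default[y == 1])  # defaults called non-defaults
            type2 = np.mean(pred_default[y == 0])   # non-defaults called defaults
            rows.append((round(float(c), 2), type1, type2))
        return rows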

76
8.11 Truncated Variables The Tobit Model
  • In our discussion of the logit and probit models
    we talked about a latent variable yi* which was
    not observed, for which we could specify the
    regression model
  • yi* = βxi + ui
  • For simplicity of exposition we assume that there
    is only one explanatory variable.
  • In the logit and probit models, what we observe
    is a dummy variable
  • yi = 1 if yi* > 0, yi = 0 otherwise

77
8.11 Truncated Variables The Tobit Model
  • Suppose, however, that yi* is observed if
    yi* > 0 and is not observed if yi* ≤ 0.
  • Then the observed yi will be defined as
  • yi = yi* = βxi + ui if yi* > 0
  • yi = 0 if yi* ≤ 0    (8.19)

78
8.11 Truncated Variables The Tobit Model
  • This is known as the tobit model (Tobin's probit)
    and was first analyzed in the econometrics
    literature by Tobin.
  • It is also known as a censored normal regression
    model because some observations on y* (those for
    which y* ≤ 0) are censored (we are not allowed to
    see them).
  • Our objective is to estimate the parameters β
    and σ.

79
8.11 Truncated Variables The Tobit Model
  • Some Examples
  • The example that Tobin considered was that of
    automobile expenditures.
  • Let y denote expenditures on automobiles and x
    denote income, and we postulate the regression
    equation
  • y = βx + u

80
8.11 Truncated Variables The Tobit Model
  • However, in the sample we would have a large
    number of observations for which the expenditures
    on automobiles are zero.
  • Tobin argued that we should use the censored
    regression model.
  • We can specify the model as
  • y = βx + u if the right-hand side is positive,
    and y = 0 otherwise
  • The structure of this model thus appears to be
    the same as that in (8.19).

81
8.11 Truncated Variables The Tobit Model
  • Another example: hours worked (H) or wages (W).
  • If we have observations on a number of
    individuals, some of whom are employed and others
    not, we can specify the model for hours worked as
  • H = βx + u if the individual is employed, and
    H = 0 otherwise

82
8.11 Truncated Variables The Tobit Model
  • Similarly, for wages we can specify the model
  • W = βx + u if the individual is employed, and
    W = 0 otherwise
  • The structure of these models again appears to be
    the same as in (8.19).

83
8.11 Truncated Variables The Tobit Model
  • Method of Estimation
  • Let us consider the estimation of β and σ by the
    use of ordinary least squares.
  • We cannot use OLS with the positive observations
    yi because when we write the model
  • yi = βxi + ui
  • the error term ui does not have a zero mean.
  • Since observations with yi* ≤ 0 are omitted, it
    implies that only observations for which
    ui > −βxi are included in the sample.

84
8.11 Truncated Variables The Tobit Model
  • Thus, the distribution of ui is a truncated
    normal distribution, shown in Figure 8.4, and its
    mean is not zero.
  • In fact, it depends on β, σ, and xi and is thus
    different for each observation.
  • A method of estimation commonly suggested is the
    maximum likelihood method, which is as follows.

85
8.11 Truncated Variables The Tobit Model
86
8.11 Truncated Variables The Tobit Model
  • 1. The positive values of y, for which we can
    write down the normal density function as usual.
    We note that (yi − βxi)/σ has a standard normal
    distribution.
  • 2. The zero observations of y, for which all we
    know is that yi* ≤ 0, that is, βxi + ui ≤ 0.
    Since ui/σ has a standard normal distribution, we
    will write this as ui/σ ≤ −βxi/σ. The probability
    of this can be written as Φ(−βxi/σ), where Φ(z)
    is the cumulative distribution function of the
    standard normal.

87
8.11 Truncated Variables The Tobit Model
  • Let us denote the density function of the
    standard normal by φ(·) and the cumulative
    distribution function by Φ(·).
  • Thus
  • φ(z) = (1/√(2π)) e^(−z²/2)
  • and
  • Φ(z) = ∫ from −∞ to z of φ(t) dt

88
8.11 Truncated Variables The Tobit Model
  • Using this notation we can write the likelihood
    function for the tobit model as
  • L = Π(yi>0) (1/σ) φ((yi − βxi)/σ) · Π(yi=0) Φ(−βxi/σ)
  • Maximizing this likelihood function with respect
    to β and σ, we get the ML estimates of these
    parameters (a sketch follows below).
  • We will not go through the algebraic details of
    the ML method here.
  • Instead, we discuss the situations under which
    the tobit model is applicable and its
    relationship to other models with truncated
    variables.
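  • A sketch of that ML estimation, coded directly
    from the likelihood above on simulated data;
    parameterizing σ as exp(·) to keep it positive is
    our choice, not the text's:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def tobit_negloglik(params, y, x):
        beta, log_sigma = params
        sigma = np.exp(log_sigma)
        pos = y > 0
        # positive observations: log[(1/sigma) * phi((y - beta*x)/sigma)]
        ll_pos = norm.logpdf((y[pos] - beta * x[pos]) / sigma) - np.log(sigma)
        # zero (censored) observations: log Phi(-beta*x/sigma)
        ll_zero = norm.logcdf(-beta * x[~pos] / sigma)
        return -(ll_pos.sum() + ll_zero.sum())

    rng = np.random.default_rng(5)
    x = rng.uniform(-3, 3, 400)
    y = np.maximum(1.0 * x + rng.normal(0, 1.0, 400), 0.0)  # censored at zero

    res = minimize(tobit_negloglik, x0=np.array([0.5, 0.0]), args=(y, x))
    print(res.x[0], np.exp(res.x[1]))  # approx beta = 1.0, sigma = 1.0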