Models with limited dependent variables

Slides: 79

Provided by: cam8150


1
Models with limited dependent variables

Doctoral Program 2006-2007
Katia Campo

2
Introduction
3
Limited dependent variables
Discrete dependent variable
Continuous dependent variable
Truncated, Censored
4
Discrete choice models

Choice between different options (j)
Single choice (binary choice models)
e.g. buy a product or not, pursue higher
education or not, ...
j = 1 (yes/accept) or 0 (no/reject)
Multiple choice (multinomial choice models),
e.g. cars, stores, transportation modes
j = 1 (opt. 1), 2 (opt. 2), ..., J (opt. J)

5
Truncated/censored regression models

Truncated variable
observed only beyond a certain threshold level
(truncation point)
e.g. store expenditures, income
Censored variables
values in a certain range are all transformed to
(or reported as) a single value (Greene, p.761)
e.g. demand (stockouts, unfulfilled demand),
hours worked

6
Duration/Hazard models

Time between two events, e.g.
Time between two purchases
Time until a consumer becomes inactive/cancels a
subscription
Time until a consumer responds to direct mail or a
questionnaire
...

7
Need to use adjusted models: illustration
Franses and Paap (2001)
8
Overview

Part I. Discrete Choice Models
Part II. Censored and Truncated Regression Models
Part III. Duration Models

9
Recommended Literature

Kenneth Train, Discrete Choice Methods with
Simulation, Cambridge University Press, 2003
(Part I)
Ph.H. Franses and R. Paap, Quantitative Models in
Market Research, Cambridge University Press, 2001
(Part I-II-III; data: www.few.eur.nl/few/people/paap)
D.A. Hensher, J.M. Rose and W.H. Greene, Applied
Choice Analysis, Cambridge University Press, 2005
(Part I)

10
Part I. Discrete Choice Models
11
Overview Part I, DCM

Properties of DCM
Estimation of DCM
Types of Discrete Choice Models
Binary Logit Model
Multinomial Logit Model
Nested logit model
Probit Model
Ordered Logit Model
Heterogeneity

12
Notation

n: decision maker
i, j: choice options
y: decision outcome
x: explanatory variables
$\beta$: parameters
$\varepsilon$: error term
$I[\cdot]$: indicator function, equal to 1 if the
expression within brackets is true, 0 otherwise
e.g. $I[y = j \mid x] = 1$ if j was selected (given x),
equal to 0 otherwise

13
A. Properties of DCM
Kenneth Train

Characteristics of the choice set
Alternatives must be mutually exclusive
no combination of choice alternatives
(e.g. different brands, combination of different
transportation modes)
Choice set must be exhaustive
i.e., include all relevant alternatives
Finite number of alternatives

14
A. Properties of DCM
Kenneth Train

Random utility maximization
Assumption: the decision maker selects the alternative that
provides the highest utility,
i.e. selects i if $U_{ni} > U_{nj}\ \forall j \neq i$
Decomposition of utility into a deterministic
(observed) and random (unobserved) part
$U_{nj} = V_{nj} + \varepsilon_{nj}$
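The RUM rule can be made concrete with a frequency simulator: draw the unobserved part of utility many times and count how often each alternative has the highest utility. A minimal sketch, assuming iid standard normal errors (any error distribution could be plugged in) and hypothetical values for the deterministic utilities $V_{nj}$:

```python
import numpy as np


def simulate_choice_probs(v, n_draws=200_000, seed=0):
    """Frequency simulator for RUM choice probabilities.

    v: deterministic utilities V_nj of the J alternatives for one decision maker.
    Errors eps_nj are drawn iid standard normal (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    eps = rng.standard_normal((n_draws, v.size))  # unobserved part of utility
    u = v + eps                                   # U_nj = V_nj + eps_nj
    chosen = u.argmax(axis=1)                     # per draw: alternative with max utility
    return np.bincount(chosen, minlength=v.size) / n_draws


probs = simulate_choice_probs([1.0, 0.5, 0.0])    # hypothetical V_n1, V_n2, V_n3
print(probs.round(3))
```

With two alternatives and normal errors, the simulated share approaches the binary probit probability $\Phi((V_{n1} - V_{n2})/\sqrt{2})$, since the difference of two iid standard normal errors has variance 2.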

15
A. Properties of DCM
Kenneth Train

Random utility maximization

16
A. Properties of DCM
Kenneth Train

Identification problems
Only differences in utility matter
Choice probabilities do not change when a
constant is added to each alternative's utility
Implication
Some parameters cannot be identified/estimated:
alternative-specific constants; coefficients of
variables that change over decision makers but
not over alternatives
Normalization of parameter(s)
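A quick numerical check of why only utility differences matter, sketched under the assumption of logit choice probabilities (introduced later in Part I). All numbers are hypothetical: a term $\gamma \cdot income$ that is identical for every alternative shifts all utilities by the same constant, so $\gamma$ drops out of the probabilities and cannot be estimated.

```python
import numpy as np


def logit_probs(v):
    """Logit choice probabilities exp(V_j) / sum_k exp(V_k)."""
    v = np.asarray(v, dtype=float)
    e = np.exp(v - v.max())  # subtracting the max is itself a constant shift
    return e / e.sum()


asc = np.array([0.0, 0.4, -0.2])  # hypothetical alternative-specific constants
income = 3.0                      # varies over decision makers, not over alternatives

for gamma in (0.0, 0.7, -2.5):    # hypothetical coefficients of income
    print(gamma, logit_probs(asc + gamma * income).round(4))
```

Every line prints the same probabilities, and adding any constant to all three constants leaves them unchanged as well; this is why one alternative-specific constant is normalized (e.g. to 0).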

17
A. Properties of DCM
Kenneth Train

Identification problems
Overall scale of utility is irrelevant
Choice probabilities do not change when the
utilities of all alternatives are multiplied by the
same factor
Implication
Coefficients of different models (data sets) are not
directly comparable
Normalization (variance of the error terms)

18
A. Properties of DCM
Kenneth Train

Aggregation
Biased estimates when aggregate values of the
explanatory variables are used as input
Consistent estimates can be obtained by sample
enumeration
- compute the probability/elasticity for each decision maker
- compute the (weighted) average of these values

Swait and Louvière (1993), Andrews and Currim
(2002)
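Sample enumeration versus plugging in aggregate values, sketched for a binary logit with one explanatory variable. The coefficients and the five decision makers' values of x are hypothetical; because the logistic function is nonlinear, the probability evaluated at the average x differs from the average of the individual probabilities.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


beta0, beta1 = -1.0, 0.8                  # hypothetical coefficients
x = np.array([0.0, 0.5, 1.0, 4.0, 6.0])   # hypothetical x for five decision makers
w = np.full(x.size, 1.0 / x.size)         # equal weights (sampling weights could be used)

# Sample enumeration: probability per decision maker, then a weighted average
p_enum = np.sum(w * logistic(beta0 + beta1 * x))
# Aggregate input: probability evaluated at the weighted average of x
p_agg = logistic(beta0 + beta1 * np.sum(w * x))

print(round(p_enum, 4), round(p_agg, 4))
```

Here the enumerated probability is about 0.59 while the aggregate-input shortcut gives about 0.70.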
19
Properties of DCM
Kenneth Train

Aggregation

20
B. Estimation DCM

Numerical maximization (ML-estimation)
Simulation-assisted estimation
Bayesian estimation

(see Train)
21
B. ML-estimation

Objective: find those parameter values most
likely to have produced the sample observations
(Judge et al.)
Likelihood for one observation: $P_n(X, \beta)$
Likelihood function:
$L(\beta) = \prod_n P_n(X, \beta)$
Log-likelihood:
$LL(\beta) = \sum_n \ln P_n(X, \beta)$

22
B. ML Estimation

Determine $\beta$ for which $LL(\beta)$ reaches its maximum
Setting the first derivative equal to 0 gives no closed-form solution
Iterative procedure:
(i) Starting values $\beta_0$
(ii) Determine a new value $\beta_{t+1}$ for which $LL(\beta_{t+1}) > LL(\beta_t)$
(iii) Repeat step (ii) until convergence (small
change in $LL(\beta)$)

23
B. ML Estimation
24
B. ML Estimation

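- Direction and step size: $\beta_t \rightarrow \beta_{t+1}$?
based on a Taylor approximation of $LL(\beta_{t+1})$ around $\beta_t$:
$LL(\beta_{t+1}) \approx LL(\beta_t) + (\beta_{t+1} - \beta_t)' g_t + \tfrac{1}{2} (\beta_{t+1} - \beta_t)' H_t (\beta_{t+1} - \beta_t)$   (1)
with $g_t = \partial LL(\beta)/\partial \beta$ (gradient) and $H_t = \partial^2 LL(\beta)/\partial \beta\, \partial \beta'$ (Hessian), both evaluated at $\beta_t$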

25
B. ML Estimation

- Direction and step size: $\beta_t \rightarrow \beta_{t+1}$?
Optimization of (1) with respect to $\beta_{t+1}$ leads to
$\beta_{t+1} = \beta_t + (-H_t)^{-1} g_t$
→ Computation of the Hessian may cause problems
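The update $\beta_{t+1} = \beta_t + (-H_t)^{-1} g_t$ is the Newton-Raphson step. A minimal sketch applying it to the binary logit log-likelihood (Part I.C.1) with simulated, hypothetical data; the gradient and Hessian use the standard logit expressions $g_t = \sum_n (y_n - P_n) x_n$ and $H_t = -\sum_n P_n (1 - P_n) x_n x_n'$, and iteration stops when the change in $LL(\beta)$ is small.

```python
import numpy as np


def logit_loglik(beta, X, y):
    """Binary logit LL(beta) = sum_n [y_n z_n - ln(1 + exp(z_n))], z_n = x_n'beta."""
    z = X @ beta
    return np.sum(y * z - np.logaddexp(0.0, z))


def logit_newton_raphson(X, y, tol=1e-10, max_iter=50):
    """Maximize LL(beta) with beta_{t+1} = beta_t + (-H_t)^{-1} g_t."""
    beta = np.zeros(X.shape[1])                      # starting values beta_0
    ll = logit_loglik(beta, X, y)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # P(y_n = 1 | x_n, beta_t)
        g = X.T @ (y - p)                            # gradient g_t
        H = -(X * (p * (1.0 - p))[:, None]).T @ X    # Hessian H_t
        beta = beta + np.linalg.solve(-H, g)         # Newton-Raphson step
        ll_new = logit_loglik(beta, X, y)
        converged = abs(ll_new - ll) < tol           # small change in LL(beta)
        ll = ll_new
        if converged:
            break
    return beta, ll


# Hypothetical simulated data: intercept plus one explanatory variable
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat, ll_hat = logit_newton_raphson(X, y)
print(beta_hat.round(3), round(ll_hat, 2))
```

For the logit the Hessian is cheap to compute and always negative definite, so plain Newton-Raphson works well; for models where $H_t$ is costly or not negative definite, the alternatives on the next slide are used.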

26
B. ML Estimation

Alternative procedures
Approximations to the Hessian
Other procedures, such as steepest ascent

See e.g. Train, Judge et al. (1985)
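One common approximation to the Hessian is BHHH (Berndt, Hall, Hall and Hausman), which replaces $-H_t$ by the sum of outer products of the observation-level scores $s_n$. A minimal sketch for the binary logit with hypothetical simulated data; step halving is added (an implementation choice, not part of the slide) so that each iteration does not decrease $LL(\beta)$.

```python
import numpy as np


def logit_loglik(beta, X, y):
    z = X @ beta
    return np.sum(y * z - np.logaddexp(0.0, z))


def logit_bhhh(X, y, tol=1e-10, max_iter=1000):
    """Binary logit ML with the BHHH approximation B_t = sum_n s_n s_n' to -H_t."""
    beta = np.zeros(X.shape[1])
    ll = logit_loglik(beta, X, y)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        scores = X * (y - p)[:, None]       # row n: score s_n = (y_n - P_n) x_n
        g = scores.sum(axis=0)              # gradient g_t
        B = scores.T @ scores               # outer-product approximation to -H_t
        step = np.linalg.solve(B, g)
        lam = 1.0
        while logit_loglik(beta + lam * step, X, y) < ll and lam > 1e-8:
            lam /= 2.0                      # halve the step until LL does not decrease
        beta = beta + lam * step
        ll_new = logit_loglik(beta, X, y)
        converged = abs(ll_new - ll) < tol
        ll = ll_new
        if converged:
            break
    return beta, ll


# Hypothetical simulated data
rng = np.random.default_rng(2)
n = 3000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.3, -0.8])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat, ll_hat = logit_bhhh(X, y)
print(beta_hat.round(3), round(ll_hat, 2))
```

BHHH needs only first derivatives; steepest ascent would instead use $\beta_{t+1} = \beta_t + \lambda g_t$, which is simpler but usually converges more slowly.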
27
B. ML Estimation

Properties of the ML estimator
Consistency
Asymptotic Normality
Asymptotic Efficiency

See e.g. Greene (ch.17), Judge et al.
28
B. Diagnostics and Model Selection

Goodness-of-Fit
Joint significance of explanatory variables
LR test: $LR = -2\,(LL(\beta_0) - LL(\hat\beta))$
$LR \sim \chi^2(k)$
Pseudo R²: $1 - LL(\hat\beta) / LL(\beta_0)$
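Both measures are simple functions of two log-likelihoods. A short sketch that plugs in the values reported in the Franses and Paap binary logit output later in these slides (full model -696.1344, intercept-only model -967.918); the results agree with the reported LR statistic (543.5673) and McFadden R-squared (0.280792) up to rounding.

```python
def lr_statistic(ll_restricted, ll_full):
    """LR = -2 (LL(beta_0) - LL(beta_hat)); chi-square with k df under H0."""
    return -2.0 * (ll_restricted - ll_full)


def mcfadden_pseudo_r2(ll_restricted, ll_full):
    """Pseudo R^2 = 1 - LL(beta_hat) / LL(beta_0)."""
    return 1.0 - ll_full / ll_restricted


# Log-likelihoods from the Franses and Paap binary logit output (slide 48)
ll_full, ll_null = -696.1344, -967.918
print(round(lr_statistic(ll_null, ll_full), 2))        # 543.57
print(round(mcfadden_pseudo_r2(ll_null, ll_full), 4))  # 0.2808
```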

29
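B. Diagnostics and Model Selection

Goodness-of-Fit
Information criteria (Akaike, consistent Akaike, Bayesian)
AIC $= \frac{1}{N}\,(-2\,LL(\hat\beta) + 2k)$
CAIC $= -2\,LL(\hat\beta) + k\,(\log(N) + 1)$
BIC $= \frac{1}{N}\,(-2\,LL(\hat\beta) + k \log(N))$
(k = number of parameters, N = number of observations)
These criteria sometimes give conflicting results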
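The information criteria, sketched with natural logarithms (an assumption about the log base). With the Franses and Paap output reported later (LL = -696.1344, N = 2798 and k = 9: eight explanatory variables plus an intercept), AIC and BIC reproduce the reported "Akaike info criterion" (0.504027) and "Schwarz criterion" (0.523123).

```python
import math


def aic(ll, k, n):
    return (-2.0 * ll + 2 * k) / n


def bic(ll, k, n):
    return (-2.0 * ll + k * math.log(n)) / n


def caic(ll, k, n):
    # Not divided by N, following the formula on the slide
    return -2.0 * ll + k * (math.log(n) + 1)


ll, k, n = -696.1344, 9, 2798   # Franses and Paap output: 8 slopes + intercept
print(round(aic(ll, k, n), 6), round(bic(ll, k, n), 6))
print(round(caic(ll, k, n), 2))
```

When comparing models, all criteria must be computed on the same sample, and the model with the lowest value is preferred.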

30
B. Diagnostics and Model Selection

Model selection based on GoF
Nested models: LR test
$LR = -2\,(LL(\hat\beta_r) - LL(\hat\beta_{ur}))$
r = restricted model, ur = unrestricted (full) model
$LR \sim \chi^2(k)$ (k = difference in number of parameters)
Non-nested models
AIC, CAIC, BIC → select the model with the lowest value

31
C. Discrete Choice Models

Binary Logit Model
Multinomial Logit Model
Nested logit model
Probit Model
Ordered Logit Model

32
1. Binary Logit Model

Choice between 2 alternatives
Often accept/reject or yes/no decisions
E.g. purchase incidence: make a purchase in the
category or not
Dep. var. $y_n = 1$ if the option is selected,
$y_n = 0$ if the option is not selected
Model $P(y_n = 1 \mid x_n)$

33
1. Binary Logit Model

Based on the general RUM-model
Assumption: error terms are iid and follow an extreme
value (Gumbel) distribution

34
1. Binary Logit Model

Based on the general RUM-model
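$P_n = \int I[\beta' x_n + \varepsilon_n > 0]\, f(\varepsilon)\, d\varepsilon$
$= \int I[\varepsilon_n > -\beta' x_n]\, f(\varepsilon)\, d\varepsilon$
$= \int_{-\beta' x_n}^{\infty} f(\varepsilon)\, d\varepsilon$
$= 1 - F(-\beta' x_n)$
$= 1 - 1/(1 + \exp(\beta' x_n))$
$= \exp(\beta' x_n) / (1 + \exp(\beta' x_n))$
Assumption: error terms are iid and follow an extreme
value (Gumbel) distribution; $\varepsilon_n$ is then the difference of two iid
Gumbel errors, which is logistic with CDF $F(z) = 1/(1 + \exp(-z))$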
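The derivation can be checked by simulation: take the difference of two iid standard Gumbel draws as $\varepsilon_n$ and count how often $\beta' x_n + \varepsilon_n > 0$. A minimal sketch with a hypothetical value of $\beta' x_n$:

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 400_000
eps1 = rng.gumbel(size=n_draws)        # iid standard Gumbel (extreme value type I)
eps0 = rng.gumbel(size=n_draws)
eps = eps1 - eps0                      # difference of two iid Gumbels: logistic

bx = 0.7                               # hypothetical value of beta'x_n
p_sim = np.mean(bx + eps > 0)          # share of draws with I[beta'x_n + eps_n > 0] = 1
p_logit = np.exp(bx) / (1 + np.exp(bx))
print(round(p_sim, 3), round(p_logit, 3))
```

The simulated share and the closed-form logit probability agree up to simulation noise.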

35
1. Binary Logit Model

Leads to the following expression for the logit
choice probability

36
1. Binary Logit Model

Properties
Nonlinear effect of explanatory vars on
dependent variable
Logistic curve with inflection point at P = 0.5

37
1. Binary Logit Model
38
1. Binary Logit Model

Effect of explanatory variables ?
For
Quasi-elasticity
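For the binary logit the marginal effect of $x_k$ is $\partial P/\partial x_k = \beta_k P (1 - P)$, and the quasi-elasticity is this marginal effect multiplied by $x_k$, i.e. $\beta_k x_k P (1 - P)$. A minimal sketch with hypothetical values of $x$ and $\beta$:

```python
import numpy as np


def logit_effects(x, beta, k):
    """Probability, marginal effect and quasi-elasticity of x_k in a binary logit."""
    p = 1.0 / (1.0 + np.exp(-np.dot(x, beta)))   # P(y = 1 | x)
    marginal = beta[k] * p * (1.0 - p)           # dP/dx_k
    quasi_elasticity = marginal * x[k]           # (dP/dx_k) * x_k
    return p, marginal, quasi_elasticity


x = np.array([1.0, 2.0])        # hypothetical: intercept and one explanatory variable
beta = np.array([-1.0, 0.5])    # hypothetical coefficients
print(logit_effects(x, beta, k=1))
```

Here $x'\beta = 0$, so P = 0.5: the inflection point of the logistic curve, where the marginal effect reaches its maximum $\beta_k/4$.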

39
1. Binary Logit Model

Effect of explanatory variables ?
For
Odds ratio is equal to
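In the binary logit the odds are $P/(1-P) = \exp(x'\beta)$, so a one-unit increase in $x_k$ multiplies the odds by $\exp(\beta_k)$, whatever the starting value of $x$. A quick check with hypothetical values:

```python
import numpy as np


def odds(x, beta):
    """Odds P / (1 - P) of y = 1 in a binary logit; equals exp(x'beta)."""
    p = 1.0 / (1.0 + np.exp(-np.dot(x, beta)))
    return p / (1.0 - p)


beta = np.array([-1.0, 0.5, -0.3])       # hypothetical coefficients
x = np.array([1.0, 2.0, 4.0])            # hypothetical: intercept and two variables
x_plus = x + np.array([0.0, 1.0, 0.0])   # one-unit increase in the second variable

ratio = odds(x_plus, beta) / odds(x, beta)
print(ratio, np.exp(beta[1]))            # both equal exp(0.5)
```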

40
1. Binary Logit Model

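Estimation: ML
Likelihood function
$L(\beta) = \prod_n P(y_n = 1 \mid x_n, \beta)^{y_n}\, (1 - P(y_n = 1 \mid x_n, \beta))^{1 - y_n}$
Log-likelihood
$LL(\beta) = \sum_n \left[ y_n \ln P(y_n = 1 \mid x_n, \beta) + (1 - y_n) \ln(1 - P(y_n = 1 \mid x_n, \beta)) \right]$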
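A numerically stable sketch of this log-likelihood. As a check, it is evaluated for the intercept-only model of the Franses and Paap data (2,491 observations with y = 1 and 307 with y = 0, as reported in the output on slide 48), where the ML intercept is the log of the ratio of the two counts; the result reproduces the reported restricted log likelihood of -967.918.

```python
import numpy as np


def binary_logit_loglik(beta, X, y):
    """LL(beta) = sum_n [y_n ln P_n + (1 - y_n) ln(1 - P_n)], P_n = logistic(x_n'beta)."""
    z = X @ beta
    # ln P_n = z_n - ln(1 + e^z_n) and ln(1 - P_n) = -ln(1 + e^z_n);
    # logaddexp(0, z) computes ln(1 + e^z) without overflow
    log1pexp = np.logaddexp(0.0, z)
    return np.sum(y * (z - log1pexp) - (1.0 - y) * log1pexp)


# Intercept-only model: 2491 ones and 307 zeros
y = np.r_[np.ones(2491), np.zeros(307)]
X = np.ones((y.size, 1))
beta_hat = np.array([np.log(2491 / 307)])   # ML intercept: log of (#ones / #zeros)
print(round(binary_logit_loglik(beta_hat, X, y), 3))
```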

41
1. Binary Logit Model

Forecasting accuracy
Predictions: $\hat y_n = 1$ if $F(X_n \beta) > c$ (e.g. c = 0.5)
$\hat y_n = 0$ if $F(X_n \beta) \le c$
Compute the hit rate (share of correct predictions)
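A minimal sketch of the hit rate, with hypothetical fitted probabilities and outcomes:

```python
import numpy as np


def hit_rate(p_hat, y, c=0.5):
    """Share of observations where the prediction (1 if F(X_n beta) > c, else 0) equals y_n."""
    y_pred = (np.asarray(p_hat) > c).astype(int)
    return np.mean(y_pred == np.asarray(y))


p_hat = np.array([0.9, 0.2, 0.6, 0.4, 0.7])   # hypothetical fitted probabilities
y = np.array([1, 0, 0, 0, 1])                 # hypothetical observed outcomes
print(hit_rate(p_hat, y))                     # 4 of 5 predictions correct
```

The hit rate is best judged against a naive benchmark: in the Franses and Paap data, 2,491 of 2,798 observations have y = 1, so always predicting y = 1 already gives a hit rate of about 89%.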

42
1. Binary Logit Model

Example: Purchase Incidence Model
$p_{tn}(inc)$: probability that household n engages
in a category purchase in the store
on purchase occasion t;
$W_{tn}$: the utility of the purchase option.

Bucklin and Gupta (1992)
43
1. Binary Logit Model

Example: Purchase Incidence Model

With $CR_n$: rate of consumption for household n,
$INV_{nt}$: inventory level for household n, time t,
$CV_{nt}$: category value for household n, time t
Bucklin and Gupta (1992)
44
1. Binary Logit Model

Data
A.C. Nielsen scanner panel data
117 weeks: 65 for initialization, 52 for
estimation
565 households: 300 selected randomly for
estimation; the remaining households form a holdout
sample for validation
Data set for estimation: 30,966 shopping trips,
2,275 purchases in the category (liquid laundry
detergent)
Estimation limited to the 7 top-selling brands
(80% of category purchases), representing 28
brand-size combinations (= level of analysis for
the choice model)

Bucklin and Gupta (1992)
45
1. Binary Logit Model
Goodness-of-Fit
Model        param.   LL          U² (pseudo R²)   BIC
Null model   1        -13614.4    -                13619.6
Full model   4        -11234.5    .175             11255.2
46
1. Binary Logit Model
Parameter estimates
Parameter              Estimate (t-statistic)
Intercept $\beta_0$    -4.521 (-27.70)
CR $\beta_1$           .549 (4.18)
INV $\beta_2$          -.520 (-8.91)
CV $\beta_3$           .410 (8.00)
47
Variable          Coefficient   Std. Error   z-Statistic   Prob.
C                 0.222121      0.668483     0.332277      0.7397
DISPLHEINZ        0.573389      0.239492     2.394186      0.0167
DISPLHUNTS        -0.557648     0.247440     -2.253674     0.0242
FEATHEINZ         0.505656      0.313898     1.610896      0.1072
FEATHUNTS         -1.055859     0.349108     -3.024445     0.0025
FEATDISPLHEINZ    0.428319      0.438248     0.977344      0.3284
FEATDISPLHUNTS    -1.843528     0.468883     -3.931748     0.0001
PRICEHEINZ        -135.1312     10.34643     -13.06066     0.0000
PRICEHUNTS        222.6957      19.06951     11.67810      0.0000

Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
48
Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
Mean dependent var      0.890279    S.D. dependent var      0.312598
S.E. of regression      0.271955    Akaike info criterion   0.504027
Sum squared resid       206.2728    Schwarz criterion       0.523123
Log likelihood          -696.1344   Hannan-Quinn criter.    0.510921
Restr. log likelihood   -967.918    Avg. log likelihood     -0.248797
LR statistic (8 df)     543.5673    McFadden R-squared      0.280792
Probability(LR stat)    0.000000
Obs with Dep=0          307         Total obs               2798
Obs with Dep=1          2491
49
Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
50
Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
51
2. Multinomial Logit Model

Choice between J > 2 categories
Dependent variable $y_n = 1, 2, 3, \dots, J$
Explanatory variables
Different across individuals, not across
categories (standard MNL model)
Different across (individuals and) categories
(conditional MNL model)
Model $P(y_n = j \mid X_n)$

52
2. Multinomial Logit Model

Based on the general RUM-model
Assumption: error terms are iid following an extreme
value (Gumbel) distribution

53
2. Multinomial Logit Model

Identification problem → select a reference
category and set its coefficients equal to 0
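A minimal sketch of standard MNL probabilities with a reference category, assuming the usual form $P(y_n = j \mid x_n) = \exp(x_n'\beta_j) / \sum_l \exp(x_n'\beta_l)$; all coefficients are hypothetical. Adding the same vector to every category's coefficients leaves the probabilities unchanged, which is exactly why one category's coefficients are fixed at 0.

```python
import numpy as np


def mnl_probs(x, betas):
    """Standard MNL: P(y_n = j | x_n) = exp(x_n'beta_j) / sum_l exp(x_n'beta_l).

    betas: (J, K) array, one row per category; the reference row is all zeros.
    """
    v = betas @ x
    e = np.exp(v - v.max())
    return e / e.sum()


x = np.array([1.0, 2.5])            # hypothetical: intercept and one individual characteristic
betas = np.array([[0.0, 0.0],       # category 1 = reference, coefficients fixed at 0
                  [0.5, -0.2],
                  [-1.0, 0.4]])     # hypothetical coefficients for categories 2 and 3
p = mnl_probs(x, betas)
print(p.round(3))

# Shifting every row by the same vector leaves all probabilities unchanged
shifted = betas + np.array([0.3, -1.2])
print(np.allclose(mnl_probs(x, shifted), p))
```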

54
2. Multinomial Logit Model

Conditional MNL model

55
2. Multinomial Logit Model

Interpretation of parameters
Derivative (marginal effect)
Cross-effects

(Traditional MNL model, see Franses and Paap p.80)
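For the standard MNL, the derivative of $P_{nj}$ with respect to an individual characteristic is $P_{nj}(\beta_j - \sum_l P_{nl}\beta_l)$: its sign depends on how $\beta_j$ compares with the probability-weighted average coefficient, not on $\beta_j$ alone, and the effects sum to zero over the categories. A sketch with hypothetical coefficients (reference category first), checked against a finite difference in the test:

```python
import numpy as np


def mnl_probs(x, betas):
    """Standard MNL probabilities; betas has one row of coefficients per category."""
    v = betas @ x
    e = np.exp(v - v.max())
    return e / e.sum()


def mnl_marginal_effects(x, betas):
    """dP_j/dx_k = P_j (beta_jk - sum_l P_l beta_lk); rows j, columns k."""
    p = mnl_probs(x, betas)
    beta_bar = p @ betas                      # probability-weighted average coefficients
    return p[:, None] * (betas - beta_bar)


x = np.array([1.0, 2.5])                      # hypothetical: intercept and one characteristic
betas = np.array([[0.0, 0.0],                 # reference category: coefficients set to 0
                  [0.5, -0.2],
                  [-1.0, 0.4]])               # hypothetical coefficients
me = mnl_marginal_effects(x, betas)
print(me[:, 1].round(4))                      # effects of the characteristic on P_1, P_2, P_3
```

Note that the reference category also has a nonzero marginal effect, even though its coefficients are fixed at 0.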
56
2. Multinomial Logit Model

Interpretation of parameters
Overall effect

57
2. Multinomial Logit Model

Interpretation of parameters
Probability-ratio
Does not depend on the other alternatives!

58
2. Multinomial Logit Model

Estimation
ML estimation

($z_{nj} = 1$ if j is selected, 0 otherwise)
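With the indicator $z_{nj}$, the MNL log-likelihood is $LL = \sum_n \sum_j z_{nj} \ln P_{nj}$. A minimal, numerically stable sketch that takes a matrix of representative utilities $V_{nj}$ (however they are parameterized) and the index of the chosen alternative for each decision maker:

```python
import numpy as np


def mnl_loglik(V, choice):
    """LL = sum_n sum_j z_nj ln P_nj, with z_nj = 1 if n chose j, 0 otherwise.

    V: (N, J) representative utilities V_nj; choice: (N,) index of the chosen alternative.
    """
    V = np.asarray(V, dtype=float)
    # ln P_nj = V_nj - ln sum_l exp(V_nl), computed stably with logaddexp
    log_p = V - np.logaddexp.reduce(V, axis=1, keepdims=True)
    z = np.zeros_like(V)
    z[np.arange(V.shape[0]), choice] = 1.0   # indicator z_nj
    return np.sum(z * log_p)


# Check: with equal utilities every P_nj = 1/J, so LL = N ln(1/J)
print(mnl_loglik(np.zeros((4, 3)), [0, 1, 2, 0]), 4 * np.log(1 / 3))
```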
59
2. Multinomial Logit Model

Estimation
Alternative estimation procedures
Simulation-assisted estimation (Train, Ch.10)
Bayesian estimation (Train, Ch.12)

60
2. Multinomial Logit Model

Example

Bucklin and Gupta (1992)
61
2. Multinomial Logit Model

Variables
$U_i$: constant for brand-size i
$BL_{hi}$: loyalty of household h to the brand of
brand-size i
$LBP_{hit}$: 1 if i was the last brand purchased, 0
otherwise
$SL_{hi}$: loyalty of household h to the size of
brand-size i
$LSP_{hit}$: 1 if i was the last size purchased, 0
otherwise
$Price_{it}$: actual shelf price of brand-size i at
time t
$Promo_{it}$: promotional status of brand-size i at
time t

Bucklin and Gupta (1992)
62
2. Multinomial Logit Model

Data
A.C. Nielsen scanner panel data
117 weeks: 65 for initialization, 52 for
estimation
565 households: 300 selected randomly for
estimation; the remaining households form a holdout
sample for validation
Data set for estimation: 30,966 shopping trips,
2,275 purchases in the category (liquid laundry
detergent)
Estimation limited to the 7 top-selling brands
(80% of category purchases), representing 28
brand-size combinations (= level of analysis for
the choice model)

Bucklin and Gupta (1992)
63
2. Multinomial Logit Model
Goodness-of-Fit
Model param. LL U² (pseudo R²) BIC
Null model Full model 27 33 -5957.3 -3786.9 - .364 6061.6 3914.3
Bucklin and Gupta (1992)
64
2. Multinomial Logit Model
Estimation Results
Parameter          Estimate (t-statistic)
BL $\beta_1$       3.499 (22.74)
LBP $\beta_2$      .548 (6.50)
SL $\beta_3$       2.043 (13.67)
LSP $\beta_4$      .512 (7.06)
Price $\beta_5$    -.696 (-13.66)
Promo $\beta_6$    2.016 (21.33)
Bucklin and Gupta (1992)
65
2. Multinomial Logit Model

Scale parameter
Variance of the (standard) extreme value distribution: $\pi^2/6$
If true utility is $U^*_{nj} = \beta^{*\prime} x_{nj} + \varepsilon^*_{nj}$ with
$\mathrm{var}(\varepsilon^*_{nj}) = \sigma^2 (\pi^2/6)$, the estimated
representative utility $V_{nj} = \beta' x_{nj}$ involves a
rescaling of $\beta^*$: $\beta = \beta^*/\sigma$
$\beta^*$ and $\sigma$ cannot be estimated separately
→ take into account that the estimated coefficients
indicate the variable's effect relative to the
variance of unobserved factors
Include scale parameters if subsamples in a
pooled estimation (may) have different error
variances
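A simulation sketch of the rescaling: choices are generated from $U^*_{nj} = \beta^* x_{nj} + \sigma \varepsilon_{nj}$ with standard Gumbel $\varepsilon_{nj}$, and the resulting choice shares coincide with logit probabilities using $\beta = \beta^*/\sigma$. The values of $\beta^*$, $\sigma$ and $x$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
beta_star, sigma = 1.2, 2.0                  # hypothetical "true" coefficient and error scale
x = np.array([0.0, 1.0, 2.0])                # hypothetical attribute of three alternatives
n_draws = 300_000

eps = sigma * rng.gumbel(size=(n_draws, x.size))    # var(eps*) = sigma^2 * pi^2 / 6
chosen = (beta_star * x + eps).argmax(axis=1)       # utility-maximizing alternative per draw
shares = np.bincount(chosen, minlength=x.size) / n_draws

beta = beta_star / sigma                     # the coefficient a logit model recovers
v = beta * x
p_logit = np.exp(v) / np.exp(v).sum()
print(shares.round(3), p_logit.round(3))
```

Doubling both $\beta^*$ and $\sigma$ would generate the same choice shares, so only their ratio is identified.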

66
2. Multinomial Logit Model

Scale parameter in case of pooled estimation of
subsamples with different error variance
For each subsample s, multiply utility by $\mu_s$,
which is estimated simultaneously with $\beta$
Normalization: set $\mu_s = 1$ for one (reference) subsample
Values of $\mu_s$ reflect differences in error variation
$\mu_s > 1$: error variance is smaller in s than in the
reference subsample
$\mu_s < 1$: error variance is larger in s than in the
reference subsample

Swait and Louviere (1993), Andrews and Currim
(2002)
67
2. Multinomial Logit Model

Example
Data from an online experiment, 2 product categories
Three different assortments, assigned to
different respondent groups
Assortment 1: small assortment
Assortment 2: assortment 1 extended with additional brands
Assortment 3: assortment 1 extended with additional types
Explanatory variables are the same (household characteristics,
MM), with the exception of the constants
A scale factor is introduced for assortments 2 and
3 (assortment 1 is the reference, with scale factor 1)

Breugelmans et al. (2005)
68
2. Multinomial Logit Model
Table 1: Descriptives for each assortment
(margarine and cereals)
Breugelmans et al. (2005)
(a) "Common" refers to attribute levels that are
present in all three assortments
69
2. Multinomial Logit Model

MNL model: pooled estimation
$P_{hit,a}$: the probability that household h chooses
item i at time t, facing assortment a
$u_{hit,a}$: the choice utility of item i for
household h facing assortment a
= f(household variables, MM variables)
$C_{ha}$: set of category items available to household
h within assortment a
$\mu_a$: Gumbel scale factor

Breugelmans et al., based on Andrews and Currim
(2002) and Swait and Louvière (1993)
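A sketch of how a Gumbel scale factor $\mu_a$ enters pooled logit probabilities, assuming the usual scaled form $P_{hit,a} = \exp(\mu_a u_{hit,a}) / \sum_{k \in C_{ha}} \exp(\mu_a u_{hkt,a})$, so that the error variance in assortment a is $\pi^2/(6\mu_a^2)$. The utilities are hypothetical; the scale factors are the margarine estimates reported in the estimation results (1.2498 for assortment 2, 1.2627 for assortment 3).

```python
import numpy as np


def scaled_logit_probs(u, mu):
    """Logit probabilities with Gumbel scale factor mu: exp(mu*u_i) / sum_k exp(mu*u_k)."""
    v = mu * np.asarray(u, dtype=float)
    e = np.exp(v - v.max())
    return e / e.sum()


def relative_error_variance(mu):
    """Error variance pi^2 / (6 mu^2) relative to the reference subsample (mu = 1)."""
    return 1.0 / mu**2


u = np.array([1.0, 0.5, 0.0])                   # hypothetical utilities of three items
for mu in (1.0, 1.2498, 1.2627):                # reference, margarine assortments 2 and 3
    print(mu, scaled_logit_probs(u, mu).round(3), round(relative_error_variance(mu), 3))
```

The same u is reused for every assortment only to isolate the effect of $\mu_a$: a larger scale factor (smaller error variance) makes the choice probabilities more extreme. In a real pooled estimation the item sets and utilities differ across assortments.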
70
2. Multinomial Logit Model

Estimation results
Goodness-of-Fit (M = margarine, C = cereals)
(average) LL: -0.045 (M), -0.040 (C)
BIC: 2929 (M), 4763 (C)
CAIC: 2871 (M), 4699 (C)
Scale factors
M: 1.2498 (ass. 2), 1.2627 (ass. 3)
C: 1.0562 (ass. 2), 0.7573 (ass. 3)

Breugelmans et al. (2005)
71
2. Multinomial Logit Model
Margarine
Variable          Assortment 1   Assortment 2   Assortment 3
Scale factor      1.00b          1.2498         1.2627
Mean
Last purchase     2.0675         2.5840c        2.6106c
Item preference   2.8310         3.5382c        3.5747c
Brand asymmetry   0.2805         0.4228         0.5400
Size asymmetry    -0.0841        -0.0880        0.0169
Sequence          -d             0.3672         -0.1190
Proximity         0.8332         1.0303         0.6235

Cereals
Variable          Assortment 1   Assortment 2   Assortment 3
Scale factor      1.00b          1.0562         0.7573
Mean
Last purchase     0.6441         0.6803c        0.4888c
Item preference   5.2011         5.4934c        3.9109c
Brand asymmetry   0.0077         0.6130         0.0969
Taste asymmetry   -0.0260        0.2938         -0.1596
Type asymmetry    0.3119         -0.0614        0.3816
Sequence          -0.3311        -0.0695        0.6190
Proximity         2.0041         0.7214         4.1140
(Excluding brand/size constants)
Breugelmans et al (2005)
72
2. Multinomial Logit Model

Limitations of the MNL model
Independence of Irrelevant Alternatives
(proportional substitution pattern)
Order (where relevant) is not taken into account
Systematic taste variation can be represented,
not random taste variation
No correlation between error terms (iid errors)

73
2. Multinomial Logit Model

Independence of irrelevant alternatives
Ratio of choice probabilities for 2 alternatives
i and j does not depend on other alternatives
(see above)
Implication: proportional substitution patterns
Cf. Blue Bus / Red Bus example
T1: Blue bus (P = 50%), Car (P = 50%)
T2: Blue bus (P = 33%), Car (P = 33%), Red bus (P = 33%)
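The Blue Bus / Red Bus example reproduced with logit probabilities, assuming all alternatives have the same representative utility (set to 0 for illustration):

```python
import numpy as np


def logit_probs(v):
    e = np.exp(np.asarray(v, dtype=float))
    return e / e.sum()


t1 = logit_probs([0.0, 0.0])          # T1: blue bus, car (equal utility)
t2 = logit_probs([0.0, 0.0, 0.0])     # T2: blue bus, car, red bus identical to the blue bus
print(t1.round(3), t2.round(3))
print(t1[1] / t1[0], t2[1] / t2[0])   # car / blue bus ratio is unchanged
```

Because the car/blue-bus ratio cannot change, the red bus draws share from the car as well as from the blue bus; intuitively one would expect car 50%, blue bus 25%, red bus 25%.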

74
2. Multinomial Logit Model

Independence of irrelevant alternatives
New alternatives, or alternatives whose
utility has increased, draw proportionally from
all other alternatives
Elasticity of $P_{ni}$ with respect to variable $x_{nj}$
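For a conditional logit with $V_{nj} = \beta x_{nj}$, the cross-elasticity of $P_{ni}$ with respect to $x_{nj}$ is $-\beta x_{nj} P_{nj}$, the same for every $i \neq j$: that is the proportional substitution pattern. A numerical sketch with a hypothetical price coefficient and prices:

```python
import numpy as np


def logit_probs(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def elasticities_wrt_x_j(x, beta, j, h=1e-6):
    """Numerical elasticities of every P_ni with respect to x_nj, with V_nj = beta * x_nj."""
    p = logit_probs(beta * x)
    x_up = x.copy()
    x_up[j] += h
    dp = (logit_probs(beta * x_up) - p) / h     # forward-difference dP_ni / dx_nj
    return dp * x[j] / p                        # elasticity (dP_ni / dx_nj) * x_nj / P_ni


beta = -0.5                                     # hypothetical price coefficient
x = np.array([2.0, 3.0, 4.0])                   # hypothetical prices of three alternatives
e = elasticities_wrt_x_j(x, beta, j=2)
print(e.round(4))                               # entries 0 and 1: cross-elasticities
print(-beta * x[2] * logit_probs(beta * x)[2])  # closed form -beta * x_nj * P_nj
```

The last entry of `e` is the own-elasticity $\beta x_{nj}(1 - P_{nj})$.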

75
2. Multinomial Logit Model

Independence of irrelevant alternatives
Hausman-McFadden specification test

Basic idea: if a subset of the choice set is
truly irrelevant, omitting it should not
significantly affect the estimates.
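A sketch of the Hausman-McFadden statistic, $HM = (\hat\beta_s - \hat\beta_f)'\,[\hat V_s - \hat V_f]^{-1}\,(\hat\beta_s - \hat\beta_f)$, where s denotes estimation on the reduced choice set, f estimation on the full set, and only coefficients common to both are compared; under IIA it is asymptotically $\chi^2(K)$, with K the number of compared coefficients. The estimates and covariance matrices below are hypothetical.

```python
import numpy as np


def hausman_mcfadden(b_sub, V_sub, b_full, V_full):
    """HM = (b_s - b_f)' (V_s - V_f)^{-1} (b_s - b_f); chi-square(K) under IIA."""
    d = np.asarray(b_sub) - np.asarray(b_full)
    return float(d @ np.linalg.solve(np.asarray(V_sub) - np.asarray(V_full), d))


# Hypothetical estimates of two common coefficients
b_full, V_full = np.array([0.40, -1.10]), np.diag([0.010, 0.020])
b_sub, V_sub = np.array([0.50, -1.00]), np.diag([0.020, 0.040])
print(hausman_mcfadden(b_sub, V_sub, b_full, V_full))
```

Here HM = 1.5, below the 5% critical value of $\chi^2(2)$ (5.99), so IIA would not be rejected. In applications $\hat V_s - \hat V_f$ need not be positive definite, which can produce a negative statistic.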
76
2. Multinomial Logit Model