Title: Models with limited dependent variables
1Models with limited dependent variables
- Doctoral Program 2006-2007
- Katia Campo
2Introduction
3Limited dependent variables
- Discrete dependent variable
- Continuous dependent variable: truncated, censored
4Discrete choice models
- Choice between different options (j)
- Single choice (binary choice models)
- e.g. buy a product or not, follow higher education or not, ...
- j = 1 (yes/accept) or 0 (no/reject)
- Multiple choice (multinomial choice models)
- e.g. cars, stores, transportation modes
- j = 1 (opt. 1), 2 (opt. 2), ..., J (opt. J)
5Truncated/censored regression models
- Truncated variable
- observed only beyond a certain threshold level (truncation point)
- e.g. store expenditures, income
- Censored variables
- values in a certain range are all transformed to (or reported as) a single value (Greene, p.761)
- e.g. demand (stockouts, unfulfilled demand), hours worked
6Duration/Hazard models
- Time between two events, e.g.
- Time between two purchases
- Time until a consumer becomes inactive/cancels a subscription
- Time until a consumer responds to direct mail/a questionnaire
- ...
7Need to use adjusted models Illustration
Franses and Paap (2001)
8Overview
- Part I. Discrete Choice Models
- Part II. Censored and Truncated Regression Models
- Part III. Duration Models
9Recommended Literature
- Kenneth Train, Discrete Choice Methods with Simulation, Cambridge University Press, 2003 (Part I)
- Ph.H. Franses and R. Paap, Quantitative Models in Marketing Research, Cambridge University Press, 2001 (Part I-II-III; data: www.few.eur.nl/few/people/paap)
- D.A. Hensher, J.M. Rose and W.H. Greene, Applied Choice Analysis, Cambridge University Press, 2005 (Part I)
10Part I. Discrete Choice Models
11Overview Part I, DCM
- Properties of DCM
- Estimation of DCM
- Types of Discrete Choice Models
- Binary Logit Model
- Multinomial Logit Model
- Nested logit model
- Probit Model
- Ordered Logit Model
- Heterogeneity
12Notation
- $n$: decision maker
- $i, j$: choice options
- $y$: decision outcome
- $x$: explanatory variables
- $\beta$: parameters
- $\varepsilon$: error term
- $I[\cdot]$: indicator function, equal to 1 if the expression within brackets is true, 0 otherwise
- e.g. $I[y = j \mid x] = 1$ if j was selected (given x), equal to 0 otherwise
13A. Properties of DCM
Kenneth Train
- Characteristics of the choice set
- Alternatives must be mutually exclusive
- no combination of choice alternatives
- (e.g. different brands, combination of diff. transportation modes)
- Choice set must be exhaustive
- i.e., include all relevant alternatives
- Finite number of alternatives
14A. Properties of DCM
Kenneth Train
- Random utility maximization
- Assumption: the decision maker selects the alternative that provides the highest utility,
- i.e. selects $i$ if $U_{ni} > U_{nj} \;\; \forall j \neq i$
- Decomposition of utility into a deterministic (observed) and random (unobserved) part
- $U_{nj} = V_{nj} + \varepsilon_{nj}$
15A. Properties of DCM
Kenneth Train
- Random utility maximization
-
16A. Properties of DCM
Kenneth Train
- Identification problems
- Only differences in utility matter
- Choice probabilities do not change when a constant is added to each alternative's utility
- Implication
- Some parameters cannot be identified/estimated:
- Alternative-specific constants
- Coefficients of variables that change over decision makers but not over alternatives
- Normalization of parameter(s)
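A minimal numerical sketch of the point that only utility differences matter. It assumes logit choice probabilities (introduced later in Part I) and purely hypothetical utility values; the helper name `logit_probs` is illustrative and not taken from the slides.

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities for one decision maker (assumed model)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical representative utilities for three alternatives.
v = [1.0, 0.5, -0.2]

# Add the same constant to every alternative's utility.
shifted = [u + 10.0 for u in v]

# Normalize on the last alternative (its utility becomes 0).
normalized = [u - v[-1] for u in v]

p = logit_probs(v)
p_shifted = logit_probs(shifted)
p_normalized = logit_probs(normalized)
```

All three probability vectors coincide, which is why one alternative-specific constant has to be fixed (for instance at 0) before the others can be estimated.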
17A. Properties of DCM
Kenneth Train
- Identification problems
- Overall scale of utility is irrelevant
- Choice probabilities do not change when the utilities of all alternatives are multiplied by the same (positive) factor
- Implication
- Coefficients of different models (data sets) are not directly comparable
- Normalization (variance of the error terms)
18A. Properties of DCM
Kenneth Train
- Aggregation
- Biased estimates when aggregate values of the explanatory variables are used as input
- Consistent estimates can be obtained by sample enumeration
- compute prob./elasticity for each dec. maker
- compute (weighted) average of these values
Swait and Louviere (1993), Andrews and Currim (2002)
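The difference between sample enumeration and plugging in aggregate values can be sketched as follows. This is an illustration under stated assumptions: a binary logit probability, hypothetical coefficients `beta0` and `beta1`, and five made-up decision makers.

```python
import math

def logit(v):
    """Binary logit probability exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical coefficients and explanatory values (not from the slides).
beta0, beta1 = -1.0, 2.0
x = [-2.0, -1.0, 0.0, 3.0, 5.0]

# Sample enumeration: compute the probability for each decision maker,
# then take the (here unweighted) average of these values.
probs = [logit(beta0 + beta1 * xi) for xi in x]
share_enumeration = sum(probs) / len(probs)

# Naive aggregation: evaluate the probability at the average of x.
x_mean = sum(x) / len(x)
share_at_mean = logit(beta0 + beta1 * x_mean)
```

Because the probability is nonlinear in x, `share_at_mean` and `share_enumeration` differ, here by a wide margin.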
19Properties of DCM
Kenneth Train
20B. Estimation DCM
- Numerical maximization (ML-estimation)
- Simulation-assisted estimation
- Bayesian estimation
(see Train)
21B. ML-estimation
- Objective: find those parameter values most likely to have produced the sample observations (Judge et al.)
- Likelihood for one observation: $P_n(X, \beta)$
- Likelihood function
- $L(\beta) = \prod_n P_n(X, \beta)$
- Log-likelihood
- $LL(\beta) = \sum_n \ln P_n(X, \beta)$
22B. ML Estimation
- Determine $\beta$ for which $LL(\beta)$ reaches its maximum
- First derivative = 0 $\Rightarrow$ no closed-form solution
- Iterative procedure
- (i) Starting values $\beta_0$
- (ii) Determine new value $\beta_{t+1}$ for which $LL(\beta_{t+1}) > LL(\beta_t)$
- (iii) Repeat step (ii) until convergence (small change in $LL(\beta)$)
23B. ML Estimation
24B. ML Estimation
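- Direction and step size: $\beta_t \rightarrow \beta_{t+1}$?
- based on a Taylor approximation of $LL(\beta_{t+1})$ (with base $\beta_t$)
- $LL(\beta_{t+1}) \approx LL(\beta_t) + (\beta_{t+1} - \beta_t)' g_t + \tfrac{1}{2} (\beta_{t+1} - \beta_t)' H_t (\beta_{t+1} - \beta_t)$ [1]
- with $g_t$ the gradient and $H_t$ the Hessian of $LL$, both evaluated at $\beta_t$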
25B. ML Estimation
- Direction and step size: $\beta_t \rightarrow \beta_{t+1}$?
- Optimization of [1] leads to $\beta_{t+1} = \beta_t + (-H_t)^{-1} g_t$
- $\Rightarrow$ Computation of the Hessian may cause problems
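The iterative procedure can be sketched for a small case. The code below is a minimal illustration, assuming a binary logit with an intercept and one explanatory variable, a Newton-Raphson step of the form beta + (-H)^(-1) g, and a made-up data set; function names and data are hypothetical.

```python
import math

def logit(v):
    return 1.0 / (1.0 + math.exp(-v))

def loglik(b0, b1, xs, ys):
    """Binary logit log-likelihood for intercept b0 and slope b1."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = logit(b0 + b1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

def newton_raphson(xs, ys, tol=1e-10, max_iter=50):
    b0, b1 = 0.0, 0.0                    # starting values
    ll_old = loglik(b0, b1, xs, ys)
    for _ in range(max_iter):
        # Gradient g and Hessian H of LL at the current parameter values.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logit(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 -= w
            h01 -= w * x
            h11 -= w * x * x
        # Step = (-H)^(-1) g, using the explicit inverse of a 2x2 matrix.
        a, b, d = -h00, -h01, -h11
        det = a * d - b * b
        b0 += (d * g0 - b * g1) / det
        b1 += (-b * g0 + a * g1) / det
        ll_new = loglik(b0, b1, xs, ys)
        if abs(ll_new - ll_old) < tol:   # convergence: small change in LL
            break
        ll_old = ll_new
    return b0, b1, ll_new

# Hypothetical data (outcomes overlap in x, so the ML estimate exists).
xs = [-2, -1, -1, 0, 0, 1, 1, 2, 2, 3]
ys = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
b0_hat, b1_hat, ll_hat = newton_raphson(xs, ys)
```

For the logit the Hessian is cheap and negative definite, so the plain step works; the approximations to the Hessian mentioned in these slides matter for models where that is not the case.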
26B. ML Estimation
- Alternative procedures
- Approximations to the Hessian
- Other procedures, such as steepest-ascent
See e.g. Train, Judge et al.(1985)
27B. ML Estimation
- Properties ML estimator
- Consistency
- Asymptotic Normality
- Asymptotic Efficiency
See e.g. Greene (ch.17), Judge et al.
28B.Diagnostics and Model Selection
- Goodness-of-Fit
- Joint significance of explanatory vars
- LR test: $LR = -2\,(LL(\beta_0) - LL(\beta))$
- $LR \sim \chi^2(k)$
- Pseudo $R^2 = 1 - LL(\beta) / LL(\beta_0)$
29B.Diagnostics and Model Selection
- Goodness-of-Fit
- Akaike Information Criterion
- $AIC = \frac{1}{N}\,(-2\,LL(\beta) + 2k)$
- $CAIC = -2\,LL(\beta) + k\,(\log(N) + 1)$
- $BIC = \frac{1}{N}\,(-2\,LL(\beta) + k \log(N))$
- sometimes conflicting results
30B.Diagnostics and Model Selection
- Model selection based on GoF
- Nested models: LR test
- $LR = -2\,(LL(\beta_r) - LL(\beta_{ur}))$
- r = restricted model, ur = unrestricted (full) model
- $LR \sim \chi^2(k)$ (k = difference in number of parameters)
- Non-nested models
- AIC, CAIC, BIC $\Rightarrow$ lowest value
31C. Discrete Choice Models
- Binary Logit Model
- Multinomial Logit Model
- Nested logit model
- Probit Model
- Ordered Logit Model
321. Binary Logit Model
- Choice between 2 alternatives
- Often accept/reject or yes/no decisions
- E.g. purchase incidence: make a purchase in the category or not
- Dep. var.: $y_n = 1$ if the option is selected, $y_n = 0$ if the option is not selected
- Model: $P(y_n = 1 \mid x_n)$
331. Binary Logit Model
- Based on the general RUM-model
- Ass. error terms are iid and follow an extreme
value or Gumbel distribution
341. Binary Logit Model
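- Based on the general RUM-model
- $P_n = \int I[\beta x_n + \varepsilon_n > 0]\, f(\varepsilon)\, d\varepsilon$
- $= \int I[\varepsilon_n > -\beta x_n]\, f(\varepsilon)\, d\varepsilon$
- $= \int_{\varepsilon = -\beta x_n}^{\infty} f(\varepsilon)\, d\varepsilon$
- $= 1 - F(-\beta x_n)$
- $= 1 - 1/(1 + \exp(\beta x_n))$
- $= \exp(\beta x_n)/(1 + \exp(\beta x_n))$
- Ass. error terms are iid and follow an extreme value/Gumbel distr., so that the difference between the two error terms, $\varepsilon_n$, follows a logistic distribution with cdf $F$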
351. Binary Logit Model
- Leads to the following expression for the logit choice probability: $P(y_n = 1 \mid x_n) = \exp(\beta x_n)/(1 + \exp(\beta x_n))$
361. Binary Logit Model
- Properties
- Nonlinear effect of explanatory vars on dependent variable
- Logistic curve with inflection point at P = 0.5
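The nonlinearity can be made concrete with a short sketch: the same one-unit change in x moves the probability a lot near P = 0.5 and very little in the tails. The coefficient value is hypothetical.

```python
import math

def logistic(v):
    """Numerically stable logistic cdf exp(v) / (1 + exp(v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    ev = math.exp(v)
    return ev / (1.0 + ev)

beta = 1.5  # hypothetical coefficient, no intercept

# Effect of a one-unit increase in x, starting at the inflection point ...
delta_mid = logistic(beta * 1.0) - logistic(beta * 0.0)

# ... and starting far out in the upper tail.
delta_tail = logistic(beta * 4.0) - logistic(beta * 3.0)
```

`delta_mid` is much larger than `delta_tail`, so a coefficient cannot be read as a constant effect on the probability.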
371. Binary Logit Model
381. Binary Logit Model
- Effect of explanatory variables ?
- For
- Quasi-elasticity
391. Binary Logit Model
- Effect of explanatory variables ?
- For
- Odds ratio is equal to
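The effect measures named on these slides can be sketched numerically. The formulas used are the standard ones for a binary logit and are stated here as assumptions, since the slide equations did not survive: marginal effect beta * P * (1 - P), quasi-elasticity as marginal effect times x, and odds P / (1 - P) = exp(beta0 + beta1 * x). Coefficients and x are hypothetical.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

beta0, beta1 = -0.5, 0.8   # hypothetical coefficients
x = 2.0                    # hypothetical value of the explanatory variable

p = logistic(beta0 + beta1 * x)

# Marginal effect of x on the probability (assumed standard formula).
marginal_effect = beta1 * p * (1.0 - p)

# Quasi-elasticity: change in probability for a relative change in x.
quasi_elasticity = marginal_effect * x

# Odds of choosing the option, and the factor by which the odds
# change when x increases by one unit.
odds = p / (1.0 - p)
p_next = logistic(beta0 + beta1 * (x + 1.0))
odds_factor = (p_next / (1.0 - p_next)) / odds
```

The marginal effect depends on where it is evaluated, whereas `odds_factor` equals exp(beta1) at every value of x.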
401. Binary Logit Model
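- Estimation: ML
- Likelihood function: $L(\beta) = \prod_n P(y_n = 1 \mid x, \beta)^{y_n}\, (1 - P(y_n = 1 \mid x, \beta))^{1 - y_n}$
- Log-likelihood: $LL(\beta) = \sum_n \left[ y_n \ln P(y_n = 1 \mid x, \beta) + (1 - y_n) \ln(1 - P(y_n = 1 \mid x, \beta)) \right]$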
411. Binary Logit Model
- Forecasting accuracy
- Predictions: $\hat{y}_n = 1$ if $F(X_n \beta) > c$ (e.g. 0.5)
- $\hat{y}_n = 0$ if $F(X_n \beta) \le c$
- Compute hit rate (share of correct predictions)
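A short sketch of the hit-rate computation, using made-up predicted probabilities and outcomes; the function name `hit_rate` is illustrative.

```python
def hit_rate(probs, ys, c=0.5):
    """Share of observations whose predicted outcome matches the observed one."""
    preds = [1 if p > c else 0 for p in probs]   # predict 1 only if F(X beta) > c
    hits = sum(1 for y_hat, y in zip(preds, ys) if y_hat == y)
    return hits / len(ys)

# Hypothetical predicted probabilities and observed outcomes.
probs = [0.9, 0.2, 0.6, 0.4, 0.5]
ys = [1, 0, 0, 0, 1]
rate = hit_rate(probs, ys)
```

When one outcome dominates the sample, it helps to compare the hit rate with the share of the most frequent outcome, which a model that always predicts that outcome would reach.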
421. Binary Logit Model
- Example: Purchase Incidence Model
- $p_{tn}(inc)$: probability that household n engages in a category purchase in the store on purchase occasion t,
- $W_{tn}$: the utility of the purchase option.
Bucklin and Gupta (1992)
431. Binary Logit Model
- Example: Purchase Incidence Model
With
- $CR_n$: rate of consumption for household n
- $INV_{nt}$: inventory level for household n, time t
- $CV_{nt}$: category value for household n, time t
Bucklin and Gupta (1992)
441. Binary Logit Model
- Data
- A.C. Nielsen scanner panel data
- 117 weeks: 65 for initialization, 52 for estimation
- 565 households: 300 selected randomly for estimation, remaining hh = holdout sample for validation
- Data set for estimation: 30,966 shopping trips, 2,275 purchases in the category (liquid laundry detergent)
- Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)
Bucklin and Gupta (1992)
451. Binary Logit Model
Goodness-of-Fit
Model        # param.   LL          U² (pseudo R²)   BIC
Null model   1          -13614.4    -                13619.6
Full model   4          -11234.5    .175             11255.2
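The table entries can be recomputed from the two log-likelihoods. The sketch below uses the pseudo R² and LR formulas from the diagnostics slides. The BIC variant, -LL + (k/2) ln(N) with N = 30,966 shopping trips, is an assumption: it is not the per-observation form given earlier in these slides, but it appears to reproduce the tabulated values.

```python
import math

n_obs = 30966                      # shopping trips in the estimation sample
ll_null, k_null = -13614.4, 1      # null model
ll_full, k_full = -11234.5, 4      # full model

# Pseudo R2 (U2) and LR statistic for joint significance.
u2 = 1.0 - ll_full / ll_null
lr = -2.0 * (ll_null - ll_full)

def bic_ll_scale(ll, k, n):
    """Assumed BIC variant on the log-likelihood scale: -LL + (k/2) ln(n)."""
    return -ll + 0.5 * k * math.log(n)

bic_null = bic_ll_scale(ll_null, k_null, n_obs)
bic_full = bic_ll_scale(ll_full, k_full, n_obs)
```

With 3 extra parameters the LR statistic is far above any conventional chi-square critical value, in line with the drop in BIC.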
461. Binary Logit Model
Parameter estimates
Parameter             Estimate   (t-statistic)
Intercept $\beta_0$   -4.521     (-27.70)
CR $\beta_1$          .549       (4.18)
INV $\beta_2$         -.520      (-8.91)
CV $\beta_3$          .410       (8.00)
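How the estimates translate into a purchase incidence probability can be sketched as follows. The linear form of the utility (intercept plus CR, INV and CV terms) is an assumption, because the model equation is missing from the slide, and the input values are purely hypothetical since the scaling of the variables is not given.

```python
import math

# Estimates from the table above (Bucklin and Gupta 1992).
INTERCEPT, B_CR, B_INV, B_CV = -4.521, 0.549, -0.520, 0.410

def incidence_prob(cr, inv, cv):
    """Binary logit purchase incidence probability (assumed linear utility)."""
    w = INTERCEPT + B_CR * cr + B_INV * inv + B_CV * cv
    return 1.0 / (1.0 + math.exp(-w))

# Hypothetical household: same consumption rate and category value,
# once with an empty pantry and once with two units in inventory.
p_low_inventory = incidence_prob(cr=1.0, inv=0.0, cv=1.0)
p_high_inventory = incidence_prob(cr=1.0, inv=2.0, cv=1.0)
```

The negative INV coefficient shows up as a lower purchase probability for the household with more inventory.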
47
Variable          Coefficient   Std. Error   z-Statistic   Prob.
C                 0.222121      0.668483     0.332277      0.7397
DISPLHEINZ        0.573389      0.239492     2.394186      0.0167
DISPLHUNTS        -0.557648     0.247440     -2.253674     0.0242
FEATHEINZ         0.505656      0.313898     1.610896      0.1072
FEATHUNTS         -1.055859     0.349108     -3.024445     0.0025
FEATDISPLHEINZ    0.428319      0.438248     0.977344      0.3284
FEATDISPLHUNTS    -1.843528     0.468883     -3.931748     0.0001
PRICEHEINZ        -135.1312     10.34643     -13.06066     0.0000
PRICEHUNTS        222.6957      19.06951     11.67810      0.0000
Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
48Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
Mean dependent var       0.890279     S.D. dependent var       0.312598
S.E. of regression       0.271955     Akaike info criterion    0.504027
Sum squared resid        206.2728     Schwarz criterion        0.523123
Log likelihood           -696.1344    Hannan-Quinn criter.     0.510921
Restr. log likelihood    -967.918     Avg. log likelihood      -0.248797
LR statistic (8 df)      543.5673     McFadden R-squared       0.280792
Probability(LR stat)     0.000000
Obs with Dep=0           307          Total obs                2798
Obs with Dep=1           2491
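The reported statistics can be recomputed from the two log-likelihoods with the formulas from the diagnostics slides. In this sketch k = 9 (eight regressors plus the constant) is read off the coefficient table, and the chi-square tail probability uses a closed form that only holds for an even number of degrees of freedom.

```python
import math

n, k = 2798, 9                       # observations, estimated parameters
ll, ll_restr = -696.1344, -967.918   # full and restricted log-likelihood

lr = -2.0 * (ll_restr - ll)                   # LR statistic (8 df)
mcfadden_r2 = 1.0 - ll / ll_restr             # McFadden R-squared
aic = (-2.0 * ll + 2.0 * k) / n               # Akaike info criterion
schwarz = (-2.0 * ll + k * math.log(n)) / n   # Schwarz criterion (BIC)
avg_ll = ll / n                               # Avg. log likelihood

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    if df % 2 != 0:
        raise ValueError("closed form only valid for even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(lr, 8)
```

The p-value is positive but far below any display precision, consistent with the reported 0.000000.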
49Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
50Binary Logit Model (Franses and Paap
www.few.eur.nl/few/people/paap)
512. Multinomial Logit Model
- Choice between J > 2 categories
- Dependent variable: $y_n = 1, 2, 3, \ldots, J$
- Explanatory variables
- Different across individuals, not across categories (standard MNL model)
- Different across (individuals and) categories (conditional MNL model)
- Model: $P(y_n = j \mid X_n)$
522. Multinomial Logit Model
- Based on the general RUM-model
- Ass. error terms are iid following an extreme
value or Gumbel distribution
532. Multinomial Logit Model
- Identification problem $\Rightarrow$ select a reference category and set its coefficients equal to 0
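A minimal sketch of the standard MNL probabilities with a reference category. It assumes the usual form exp(x beta_j) / sum over l of exp(x beta_l) with the last category's coefficients fixed at 0; all coefficient values are hypothetical.

```python
import math

def mnl_probs(x, betas):
    """Standard MNL probabilities: one coefficient vector per category."""
    v = [sum(b * xi for b, xi in zip(beta_j, x)) for beta_j in betas]
    m = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

# One individual: a constant and one individual characteristic.
x = [1.0, 0.5]

# Hypothetical coefficients; category 3 is the reference (all zeros).
betas = [[0.2, 1.0], [-0.4, 0.3], [0.0, 0.0]]
p = mnl_probs(x, betas)

# Adding the same vector to every category's coefficients changes nothing,
# which is why one set of coefficients has to be fixed.
shift = [5.0, -2.0]
betas_shifted = [[b + s for b, s in zip(beta_j, shift)] for beta_j in betas]
p_shifted = mnl_probs(x, betas_shifted)
```

With the reference category at 0, each remaining coefficient vector describes the log of the probability ratio relative to that category.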
542. Multinomial Logit Model
552. Multinomial Logit Model
- Interpretation of parameters
- Derivative (marginal effect)
- Cross-effects
(Traditional MNL model, see Franses and Paap p.80)
562. Multinomial Logit Model
- Interpretation of parameters
- Overall effect
572. Multinomial Logit Model
- Interpretation of parameters
- Probability-ratio
- Does not depend on the other alternatives!
582. Multinomial Logit Model
($z_{nj} = 1$ if j is selected, 0 otherwise)
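The indicator z_nj enters the MNL log-likelihood as a selector of the chosen alternative's probability. The sketch assumes the usual form, LL = sum over n and j of z_nj * ln(P_nj), and uses made-up probabilities.

```python
import math

def mnl_loglik(prob_rows, choices):
    """LL = sum_n sum_j z_nj * ln(P_nj), with z_nj = 1 if n chose j."""
    ll = 0.0
    for probs, chosen in zip(prob_rows, choices):
        for j, p in enumerate(probs):
            z = 1 if j == chosen else 0
            ll += z * math.log(p)
    return ll

# Hypothetical choice probabilities for two decision makers, three options.
prob_rows = [[0.5, 0.3, 0.2],
             [0.1, 0.6, 0.3]]
choices = [0, 1]          # index of the selected alternative
ll = mnl_loglik(prob_rows, choices)
```

Only the probabilities of the selected alternatives contribute, so `ll` equals ln(0.5) + ln(0.6).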
592. Multinomial Logit Model
- Estimation
- Alternative estimation procedures
- Simulation-assisted estimation (Train, Ch.10)
- Bayesian estimation (Train, Ch.12)
602. Multinomial Logit Model
Bucklin and Gupta (1992)
612. Multinomial Logit Model
- Variables
- $U_i$: constant for brand-size i
- $BL_{hi}$: loyalty of household h to brand of brand-size i
- $LBP_{hit}$: 1 if i was last brand purchased, 0 otherwise
- $SL_{hi}$: loyalty of household h to size of brand-size i
- $LSP_{hit}$: 1 if i was last size purchased, 0 otherwise
- $Price_{it}$: actual shelf price of brand-size i at time t
- $Promo_{it}$: promotional status of brand-size i at time t
Bucklin and Gupta (1992)
622. Multinomial Logit Model
- Data
- A.C. Nielsen scanner panel data
- 117 weeks: 65 for initialization, 52 for estimation
- 565 households: 300 selected randomly for estimation, remaining hh = holdout sample for validation
- Data set for estimation: 30,966 shopping trips, 2,275 purchases in the category (liquid laundry detergent)
- Estimation limited to the 7 top-selling brands (80% of category purchases), representing 28 brand-size combinations (= level of analysis for the choice model)
Bucklin and Gupta (1992)
632. Multinomial Logit Model
Goodness-of-Fit
Model        # param.   LL         U² (pseudo R²)   BIC
Null model   27         -5957.3    -                6061.6
Full model   33         -3786.9    .364             3914.3
Bucklin and Gupta (1992)
642. Multinomial Logit Model
Estimation Results
Parameter           Estimate   (t-statistic)
BL $\beta_1$        3.499      (22.74)
LBP $\beta_2$       .548       (6.50)
SL $\beta_3$        2.043      (13.67)
LSP $\beta_4$       .512       (7.06)
Price $\beta_5$     -.696      (-13.66)
Promo $\beta_6$     2.016      (21.33)
Bucklin and Gupta (1992)
652. Multinomial Logit Model
- Scale parameter
- Variance of the extreme value distribution = $\pi^2/6$
- If true utility is $U^*_{nj} = \beta^* x_{nj} + \varepsilon^*_{nj}$ with $var(\varepsilon^*_{nj}) = \sigma^2 (\pi^2/6)$, the estimated representative utility $V_{nj} = \beta x_{nj}$ involves a rescaling of $\beta^*$: $\beta = \beta^* / \sigma$
- $\beta^*$ and $\sigma$ cannot be estimated separately
- take into account that the estimated coefficients indicate the variable's effect relative to the variance of unobserved factors
- Include scale parameters if subsamples in a pooled estimation (may) have different error variances
662. Multinomial Logit Model
- Scale parameter in case of pooled estimation of subsamples with different error variance
- For each subsample s, multiply utility by $\mu_s$, which is estimated simultaneously with $\beta$
- Normalization: set $\mu_s$ equal to 1 for 1 subsample
- Values of $\mu_s$ reflect differences in error variation
- $\mu_s > 1$: error variance is smaller in s than in the reference subsample
- $\mu_s < 1$: error variance is larger in s than in the reference subsample
Swait and Louviere (1993), Andrews and Currim (2002)
672. Multinomial Logit Model
- Example
- Data from online experiment, 2 product categories
- Three diff. assortments, assigned to diff. respondent groups
- Assortment 1: small assortment
- Assortment 2: ass. 1 extended with add. brands
- Assortment 3: ass. 1 extended with add. types
- Explanatory variables are the same (hh chars, MM), with exception of the constants
- A scale factor is introduced for assortment 2 and 3 (assortment 1 is reference with scale factor 1)
Breugelmans et al (2005)
682. Multinomial Logit Model
Table 1 Descriptives for each assortment
(margarine and cereals)
Breugelmans et al (2005)
a common refers to attribute levels that are
present in all three assortments
692. Multinomial Logit Model
- MNL-model: pooled estimation
- $P_{hit,a}$: the probability that household h chooses item i at time t, facing assortment a
- $u_{hit,a}$: the choice utility of item i for household h facing assortment a = f(household variables, MM-variables)
- $C_{ha}$: set of category items available to household h within assortment a
- $\mu_a$: Gumbel scale factor
Breugelmans et al, based on Andrews and Currim 2002; Swait and Louviere 1993
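The role of the scale factor can be sketched with a logit in which every utility is multiplied by mu before the probabilities are formed. That functional form is an assumption consistent with the description of the scale parameter in these slides, and the utilities and scale values are hypothetical.

```python
import math

def scaled_logit_probs(utilities, mu):
    """Logit probabilities with all utilities multiplied by the scale factor mu."""
    scaled = [mu * u for u in utilities]
    m = max(scaled)  # subtract the maximum for numerical stability
    expu = [math.exp(s - m) for s in scaled]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical utilities of three items (item 0 is the most attractive).
u = [1.0, 0.4, 0.0]

p_reference = scaled_logit_probs(u, 1.0)    # reference assortment, mu = 1
p_less_noise = scaled_logit_probs(u, 1.25)  # mu > 1: smaller error variance
p_more_noise = scaled_logit_probs(u, 0.75)  # mu < 1: larger error variance
```

A larger scale factor concentrates the probabilities on the item with the highest utility, while a smaller one pulls them toward equal shares.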
702. Multinomial Logit Model
- Estimation results
- Goodness-of-Fit
- (average) LL: -0.045 (M), -0.040 (C)
- BIC: 2929 (M), 4763 (C)
- CAIC: 2871 (M), 4699 (C)
- Scale factors
- M: 1.2498 (ass. 2), 1.2627 (ass. 3)
- C: 1.0562 (ass. 2), 0.7573 (ass. 3)
Breugelmans et al (2005)
712. Multinomial Logit Model
Margarine
Variable           Assortment 1   Assortment 2   Assortment 3
Scale factor       1.00b          1.2498         1.2627
Mean
Last purchase      2.0675         2.5840c        2.6106c
Item preference    2.8310         3.5382c        3.5747c
Brand asymmetry    0.2805         0.4228         0.5400
Size asymmetry     -0.0841        -0.0880        0.0169
Sequence           - d            0.3672         -0.1190
Proximity          0.8332         1.0303         0.6235

Cereals
Variable           Assortment 1   Assortment 2   Assortment 3
Scale factor       1.00b          1.0562         0.7573
Mean
Last purchase      0.6441         0.6803c        0.4888c
Item preference    5.2011         5.4934c        3.9109c
Brand asymmetry    0.0077         0.6130         0.0969
Taste asymmetry    -0.0260        0.2938         -0.1596
Type asymmetry     0.3119         -0.0614        0.3816
Sequence           -0.3311        -0.0695        0.6190
Proximity          2.0041         0.7214         4.1140
(Excluding brand/size constants)
Breugelmans et al (2005)
722. Multinomial Logit Model
- Limitations of the MNL model
- Independence of Irrelevant Alternatives (proportional substitution pattern)
- Order (where relevant) is not taken into account
- Systematic taste variation can be represented, not random taste variation
- No correlation between error terms (iid errors)
732. Multinomial Logit Model
- Independence of irrelevant alternatives
- Ratio of choice probabilities for 2 alternatives i and j does not depend on other alternatives (see above)
- Implication: proportional substitution patterns
- Cf. Blue Bus - Red Bus example
- T1: Blue bus (P = 50%), Car (P = 50%)
- T2: Blue bus (P = 33%), Car (P = 33%), Red bus (P = 33%)
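The example can be reproduced with logit probabilities by giving all alternatives the same representative utility (an assumption made here so that the T1 shares are 50/50).

```python
import math

def logit_probs(utilities):
    """Logit probabilities for a dict mapping alternative name to utility."""
    m = max(utilities.values())
    expu = {name: math.exp(v - m) for name, v in utilities.items()}
    total = sum(expu.values())
    return {name: e / total for name, e in expu.items()}

# T1: two alternatives with equal utility.
t1 = logit_probs({"blue bus": 0.0, "car": 0.0})

# T2: a red bus is added that is identical to the blue bus.
t2 = logit_probs({"blue bus": 0.0, "car": 0.0, "red bus": 0.0})

ratio_t1 = t1["blue bus"] / t1["car"]
ratio_t2 = t2["blue bus"] / t2["car"]
```

The blue bus to car ratio stays at 1, so the car falls to one third. Since the two buses are effectively the same alternative, a split closer to 25/50/25 would be more plausible, which is the pattern the logit model cannot produce.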
742. Multinomial Logit Model
- Independence of irrelevant alternatives
- New alternatives, or alternatives for which utility has increased, draw proportionally from all other alternatives
- Elasticity of $P_{ni}$ wrt variable $x_{nj}$
752. Multinomial Logit Model
- Independence of irrelevant alternatives
- Hausman-McFadden specification test
Basic idea: if a subset of the choice set is truly irrelevant, omitting it should not significantly affect the estimates.
762. Multinomial Logit Model
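- Independence of irrelevant alternatives
- Hausman-McFadden specification test
- Procedure
- Estimate logit model twice
- a. on full set of alternatives
- b. on subset of alternatives (and subsample with choices from this set)
- When IIA is true, $(\hat\beta_b - \hat\beta_a)'\,[\hat V_b - \hat V_a]^{-1}\,(\hat\beta_b - \hat\beta_a)$ is approximately $\chi^2$ distributed, with $\hat V_a$ and $\hat V_b$ the estimated covariance matrices of the two sets of estimates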
772. Multinomial Logit Model
- Independence of irrelevant alternatives
- Alternative procedure
- Estimate logit model twice
- a. on full set of alternatives
- b. on subset of alternatives (and subsample with choices from this set)
- Compute LL for subset b with parameters obtained for set a
- Compare with LLb: GoF should be similar
782. Multinomial Logit Model
- Solutions to IIA
- Model with attribute-specific constants (intrinsic preferences)
- Nested Logit Model
- Models that allow for correlation among the error terms, such as Probit Models