Title: Discrete Choice Modeling
1Discrete Choice Modeling
- William Greene
- Stern School of Business
- New York University
2Part 11
- Modeling Heterogeneity
- Latent Class Models
- The Mixed Logit Model
3Heterogeneity
- Observational: Observable differences across choice makers
- Choice strategy: How consumers make decisions (omitted attributes)
- Structure: Model frameworks
- Preferences: Model parameters
4Accommodating Heterogeneity
- Observed? Enter in the model in familiar (and unfamiliar) ways.
- Unobserved?
5Observable (Quantifiable) Heterogeneity in
Utility Levels
Choice, e.g., among brands of cars
- x_itj = attributes: price, features
- z_it = observable characteristics: age, sex, income
6Observable Heterogeneity in Preference Weights
7Quantifiable Heterogeneity in Scaling
w_it = observable characteristics: age, sex, income, etc.
8Attention to Heterogeneity
- Modeling heterogeneity is important
- Scaling is extremely important
- Attention to heterogeneity: an informal survey of four literatures
9Heterogeneity in Choice Strategy
- Consumers avoid complexity
- Lexicographic preferences eliminate certain choices ⇒ choice set may be endogenously determined
- Simplification strategies may eliminate certain attributes
- Information processing strategy is a source of heterogeneity in the model.
10Structural Heterogeneity
- Marketing literature
- Latent class structures
- Yang/Allenby: latent class random parameters models
- Kamakura et al.: latent class nested logit models with fixed parameters
11Heteroscedasticity in the MNL Model
- Motivation: Scaling in utility functions
- If ignored, distorts coefficients
- Random utility basis
- U_ij = α_j + β'x_ij + γ'z_i + σ_j ε_ij
- i = 1,...,N; j = 1,...,J(i)
- F(ε_ij/σ_j) = 1 - Exp(-Exp(ε_ij/σ_j)), now scaled
- Extensions: Relaxes IIA
- Allows heteroscedasticity
12Latent Heterogeneity
- Limitation of the MNL Model: Fundamental tastes are the same across all individuals
- How to adjust the model to allow variation across individuals?
- Full random variation
- Latent clustering: allow some variation
13Heterogeneity
- Modeling individual heterogeneity
- Latent class: Discrete approximation
- Mixed logit: Continuous
- The mixed logit model (generalities)
- Data structure: RP and SP data
- Induces heterogeneity
- Induces heteroscedasticity: scaling problem
14A Latent Class Model
- Within a class
- Class sorting is probabilistic (to the analyst)
determined by individual characteristics
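The formulas for this slide did not survive in the transcript. As a hedged sketch, the standard latent class logit structure is written out below, using β_q for the class-specific taste parameters and θ_q for the class probability parameters (the symbols that appear in the estimation output later in the deck). The index ranges J, Q and T_i are notational assumptions.

```latex
\begin{align*}
P(j \mid i,t,q) &= \frac{\exp(\beta_q' x_{itj})}{\sum_{m=1}^{J} \exp(\beta_q' x_{itm})}
  && \text{(choice within class } q\text{)} \\
\pi_{iq} &= \frac{\exp(\theta_q' z_i)}{\sum_{s=1}^{Q} \exp(\theta_s' z_i)}, \quad \theta_Q = 0
  && \text{(class sorting, given characteristics } z_i\text{)} \\
L_i &= \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} P(j_{it} \mid i,t,q)
  && \text{(likelihood contribution of person } i\text{)}
\end{align*}
```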
15Latent Classes and Random Parameters
16Latent Class Probabilities
- Ambiguous at face value: classical or Bayesian model?
- Equivalent to random parameters models with discrete parameter variation
- Using nested logits, etc. does not change this
- Precisely analogous to continuous random parameter models
- Not always equivalent: zero inflation models
17Estimates from the LCM
- Taste parameters within each class: β_q
- Parameters of the class probability model: θ_q
- For each person:
- Posterior estimates of the class they are in: q|i
- Posterior estimates of their taste parameters: E[β_q|i]
- Posterior estimates of their behavioral parameters, elasticities, marginal effects, etc.
18Using the Latent Class Model
- Computing posterior (individual specific) class probabilities
- Computing posterior (individual specific) taste parameters
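A minimal Python sketch of both posterior computations, assuming the usual Bayes-rule form (the slide's own formulas are missing from this transcript): the posterior class probability is the prior class probability times the person's joint choice likelihood within that class, normalized over classes, and the posterior taste parameters are the class parameters weighted by those probabilities. The function names and all numbers in the example are hypothetical.

```python
import math


def posterior_class_probs(prior, class_loglik):
    """Bayes rule over classes, computed on the log scale for stability.

    prior        : prior class probabilities for one person
    class_loglik : log of that person's joint choice likelihood in each class
    """
    logs = [math.log(p) + ll for p, ll in zip(prior, class_loglik)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_taste_params(post, class_betas):
    """Probability-weighted average of the class-specific parameter vectors."""
    n_params = len(class_betas[0])
    return [sum(w * beta[k] for w, beta in zip(post, class_betas))
            for k in range(n_params)]


# Hypothetical inputs for one person in a three-class model.
prior = [0.50, 0.25, 0.25]
class_loglik = [-9.0, -11.0, -12.0]
class_betas = [[3.0, -0.1], [1.2, 1.1], [-0.2, 2.7]]

post = posterior_class_probs(prior, class_loglik)
beta_i = posterior_taste_params(post, class_betas)
```

The person is not assigned to a class with certainty; the posterior weights simply concentrate on the class whose parameters best explain the observed choices.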
19Application Brand Choice
- True underlying model is a three class LCM
- NLOGIT
- ; Lhs = choice
- ; Choices = Brand1,Brand2,Brand3,None
- ; Rhs = Fash,Qual,Price,ASC4
- ; LCM = Male,Age25,Age39
- ; Pts = 3 ; Pds = 8 ; Par $
20MNL Starting Values and Basis
Normal exit from iterations. Exit status=0.
---------------------------------------------
Discrete choice (multinomial logit) model
Log likelihood function        -4158.503
Number of parameters                   4
Akaike IC =  8325.006   Bayes IC =  8349.289
Finite sample corrected AIC =  8325.018
R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj
Constants only   -4391.1804  .05299  .05101
Response data are given as ind. choice.
Number of obs.=  3200, skipped   0 bad obs.
---------------------------------------------
Notes: No coefficients => P(i,j)=1/J(i).
       Constants only  => P(i,j) uses ASCs only.
       N(j)/N if fixed choice set.
       N(j) = total sample frequency for j
       N    = total sample frequency.
       These 2 models are simple MNL models.
       R-sqrd = 1 - LogL(model)/logL(other)
       RsqAdj = 1 - [nJ/(nJ-nparm)]*(1-R-sqrd)
       nJ     = sum over i, choice set sizes
---------------------------------------------
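The information criteria and the pseudo R-squared in this output can be reproduced from the log likelihood alone. The sketch below assumes the textbook definitions (AIC = -2 logL + 2K, BIC = -2 logL + K ln N, and the usual small-sample correction for AIC); these definitions are not stated on the slide, but they match the printed values to rounding. The function names are hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC, finite-sample corrected AIC) for a fitted model."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic, bic, aicc


def pseudo_r2(loglik, loglik_base):
    """R-sqrd = 1 - LogL(model) / LogL(base model)."""
    return 1.0 - loglik / loglik_base


# Values taken from the one class MNL output.
aic, bic, aicc = information_criteria(-4158.503, 4, 3200)
r2_constants_only = pseudo_r2(-4158.503, -4391.1804)
```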
21One Class MNL Estimates
--------------------------------------------------------------
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
--------------------------------------------------------------
FASH1       1.47890473     .06776814        21.823     .0000
QUAL1       1.01372755     .06444532        15.730     .0000
PRICE1    -11.8023376      .80406103       -14.678     .0000
ASC41        .03679254     .07176387          .513     .6082
22Three Class LCM
Normal exit from iterations. Exit status=0.
---------------------------------------------
Latent Class Logit Model
Log likelihood function        -3649.132
Number of parameters                  20
Restricted log likelihood      -4436.142
Chi squared                     1574.019
Degrees of freedom                    20
Prob[ChiSqd > value] =          .0000000
R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj
No coefficients  -4436.1420  .17741  .17569
Constants only   -4391.1804  .16899  .16725
At start values  -4158.5428  .12250  .12067
Response data are given as ind. choice.
---------------------------------------------
Latent Class Logit Model
Number of latent classes =   3
LCM model with panel has   400 groups.
Fixed number of obsrvs./group =   8
Discrete parameter variation specified.
---------------------------------------------
Number of obs.=  3200, skipped   0 bad obs.
---------------------------------------------
LogL for one class MNL =       -4158.503
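As a quick check on this output, the "Chi squared" line is twice the gap between the model log likelihood and the restricted (no coefficients) log likelihood. The Python sketch below reproduces it and applies the same calculation to the one class MNL; the helper name is hypothetical. Treat the second statistic as descriptive only: the usual chi-squared reference distribution is not reliable for testing the number of classes, because the one class model sits on the boundary of the parameter space.

```python
def lr_statistic(loglik_model, loglik_restricted):
    """Likelihood ratio statistic: 2 * (logL_model - logL_restricted)."""
    return 2.0 * (loglik_model - loglik_restricted)


# Three class LCM against the "no coefficients" model (the reported Chi squared).
chi_squared = lr_statistic(-3649.132, -4436.142)

# Three class LCM against the one class MNL (4 parameters vs. 20).
lcm_vs_mnl = lr_statistic(-3649.132, -4158.503)
extra_params = 20 - 4
```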
23Estimated LCM Utilities
--------------------------------------------------------------
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
--------------------------------------------------------------
Utility parameters in latent class -->> 1
FASH1       3.02569837     .14335927        21.106     .0000
QUAL1       -.08781664     .12271563         -.716     .4742
PRICE1     -9.69638056    1.40807055        -6.886     .0000
ASC41       1.28998874     .14533927         8.876     .0000
Utility parameters in latent class -->> 2
FASH2       1.19721944     .10652336        11.239     .0000
QUAL2       1.11574955     .09712630        11.488     .0000
PRICE2    -13.9345351     1.22424326       -11.382     .0000
ASC42       -.43137842     .10789864        -3.998     .0001
Utility parameters in latent class -->> 3
FASH3       -.17167791     .10507720        -1.634     .1023
QUAL3       2.71880759     .11598720        23.441     .0000
PRICE3     -8.96483046    1.31314897        -6.827     .0000
ASC43        .18639318     .12553591         1.485     .1376
24Estimated LCM Class Probability Model
--------------------------------------------------------------
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
--------------------------------------------------------------
This is THETA(1) in class probability model.
Constant    -.90344530     .34993290        -2.582     .0098
_MALE1       .64182630     .34107555         1.882     .0599
_AGE251     2.13320852     .31898707         6.687     .0000
_AGE391      .72630019     .42693187         1.701     .0889
This is THETA(2) in class probability model.
Constant     .37636493     .33156623         1.135     .2563
_MALE2     -2.76536019     .68144724        -4.058     .0000
_AGE252     -.11945858     .54363073         -.220     .8261
_AGE392     1.97656718     .70318717         2.811     .0049
This is THETA(3) in class probability model.
Constant     .000000    ......(Fixed Parameter).......
_MALE3       .000000    ......(Fixed Parameter).......
_AGE253      .000000    ......(Fixed Parameter).......
_AGE393      .000000    ......(Fixed Parameter).......
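To see what these THETA estimates imply, the sketch below turns them into prior class probabilities for a given person, assuming the class probability model is a multinomial logit in (Constant, MALE, AGE25, AGE39) with THETA(3) normalized to zero, as the fixed parameters in the output suggest. Reading AGE25 and AGE39 as 0/1 indicators for the two younger age groups is an assumption, and the function name is hypothetical.

```python
import math

# Estimates copied from the class probability model output.
# Order within each row: Constant, MALE, AGE25, AGE39.
THETA = [
    [-0.90344530, 0.64182630, 2.13320852, 0.72630019],   # class 1
    [0.37636493, -2.76536019, -0.11945858, 1.97656718],  # class 2
    [0.0, 0.0, 0.0, 0.0],                                # class 3 (normalized)
]


def prior_class_probs(male, age25, age39, theta=THETA):
    """Multinomial logit class probabilities for one person's characteristics."""
    z = [1.0, float(male), float(age25), float(age39)]
    scores = [sum(t * v for t, v in zip(row, z)) for row in theta]
    top = max(scores)                      # subtract the max for stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]


# Reference person (all indicators zero) and a male in the AGE25 group.
base = prior_class_probs(0, 0, 0)
young_male = prior_class_probs(1, 1, 0)
```

These are prior probabilities, based on characteristics only; the posterior probabilities also use the person's observed choices.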
25Estimated LCM Conditional Parameter Estimates
26Estimated LCM Conditional Class Probabilities
27Average Estimated Class Probabilities
- MATRIX ; List ; 1/400 * classp_i'1 $
- Matrix Result has 3 rows and 1 columns.
            1
     --------------
   1    .50555
   2    .23853
   3    .25593
- This is how the data were simulated. Class probabilities are .5, .25, .25. The model worked.
28Application Long Distance Drivers Preference
for Road Environments
- New Zealand survey, 2000, 274 drivers
- Mixed revealed and stated choice experiment
- 4 Alternatives in choice set
- The current road the respondent is/has been using
- A hypothetical 2-lane road
- A hypothetical 4-lane road with no median
- A hypothetical 4-lane road with a wide grass median.
- 16 stated choice situations for each with 2 choice profiles
- choices involving all 4 choices
- choices involving only the last 3 (hypothetical)
Hensher and Greene, "A Latent Class Model for Discrete Choice Analysis: Contrasts with Mixed Logit," Transportation Research B, 2003
29Attributes
- Time on the open road which is free flow (in minutes)
- Time on the open road which is slowed by other traffic (in minutes)
- Percentage of total time on open road spent with other vehicles close behind (i.e., tailgating) (%)
- Curviness of the road (a four-level attribute: almost straight, slight, moderate, winding)
- Running costs (in dollars)
- Toll cost (in dollars).
30Experimental Design
- The four levels of the six attributes that were chosen are as follows:
- Free Flow Travel Time: -20, -10, 10, 20
- Time Slowed Down: -20, -10, 10, 20
- Percent of time with vehicles close behind: -50, -25, 25, 50
- Curviness: almost straight, slight, moderate, winding
- Running Costs: -10, -5, 5, 10
- Toll cost for car and double for truck if trip duration is:
- 1 hour or less: 0, 0.5, 1.5, 3
- between 1 hour and 2 hours 30 minutes: 0, 1.5, 4.5, 9
- more than 2 and a half hours: 0, 2.5, 7.5, 15
31Survey
32Estimated Latent Class Model
33Estimated Value of Time Saved
34Distribution of Parameters Value of Time on 2
Lane Road
35Continuous Random Variation in Preference Weights
36Classical Estimation Platform The Likelihood
Expected value over all possible realizations of β_i (according to the estimated asymptotic distribution). I.e., over all possible samples.
37Maximum Simulated Likelihood
True log likelihood
Simulated log likelihood
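The two expressions on this slide are missing from the transcript. A hedged reconstruction in the standard form for a random parameters model is given below; the notation is an assumption: f(β_i | Ω) is the population density of the random parameters with structural parameters Ω, T_i is the number of choice situations for person i, and M is the number of simulation draws (M is used here to avoid a clash with the autocorrelation matrix R that appears elsewhere in the deck).

```latex
\begin{align*}
\log L &= \sum_{i=1}^{N} \log \int_{\beta_i} \prod_{t=1}^{T_i}
          P(y_{it} \mid x_{it}, \beta_i)\, f(\beta_i \mid \Omega)\, d\beta_i \\
\log L_S &= \sum_{i=1}^{N} \log \frac{1}{M} \sum_{r=1}^{M} \prod_{t=1}^{T_i}
          P(y_{it} \mid x_{it}, \beta_{ir}),
          \qquad \beta_{ir} \text{ drawn from } f(\beta_i \mid \Omega)
\end{align*}
```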
38Computational Difficulty?
- Outside of normal linear models with normal random coefficient distributions, performing the integral can be computationally challenging. (AR, p. 62)
- (No longer even remotely true)
- MSL with dozens of parameters is simple
- Multivariate normal (multinomial probit) is no longer the benchmark alternative. (See McFadden and Train)
- Intelligent methods of integration (Halton sequences) speed up integration by factors of as much as 10. (These could be used by Bayesians.)
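For readers who have not met Halton sequences, the following is a minimal Python sketch of the standard radical-inverse construction, with the points mapped to standard normal draws through the inverse normal CDF. It illustrates the idea only and is not the implementation used in any particular package; the function names and the choice to skip the first few points are assumptions.

```python
from statistics import NormalDist


def halton(index, base):
    """Return the index-th element (index >= 1) of the Halton sequence in a prime base."""
    result, fraction = 0.0, 1.0
    i = index
    while i > 0:
        fraction /= base
        result += fraction * (i % base)
        i //= base
    return result


def halton_normal_draws(n_draws, base, skip=10):
    """Standard normal draws obtained by inverting the normal CDF at Halton points.

    Discarding the first few points (skip) is a common precaution, assumed here.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf(halton(i, base))
            for i in range(skip + 1, skip + n_draws + 1)]


# Base 2 gives 1/2, 1/4, 3/4, ...; a different prime is used for each random parameter.
first_points = [halton(i, 2) for i in range(1, 4)]
draws = halton_normal_draws(100, 3)
```

Because the points fill the unit interval evenly rather than at random, far fewer of them are needed for a given accuracy of the simulated integral.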
39Random Parameters Model
- Allow model parameters as well as constants to be random
- Allow multiple observations with persistent effects
- Allow a hierarchical structure for parameters: not completely random
- U_itj = β_1'x_i1tj + β_2it'x_i2tj + γ_i'z_it + ε_ijt
- Random parameters in multinomial logit model
- β_1 = nonrandom (fixed) parameters
- β_2it = random parameters that may vary across individuals and across time
- Maintain I.I.D. assumption for ε_ijt (given β)
40Random Parameters Logit Model
Multiple choice situations Independent
conditioned on the individual specific parameters
41Random Parameters Specification
β_2it(k) = parameter on kth attribute = β_2k + δ_k'z_i + Γ_k v_it (Γ_k = kth row of Γ)
Mean = β_2k + δ_k'z_i; may depend on characteristics
Variance = Γ_k Γ_k'; may be correlated with other parameters
Distribution: depends on specification of v_it
v_it may be a random effect or correlated across time to capture persistence of preferences across choice settings
Elements of γ and/or choice specific constants α may also be random
42Modeling Variations
- Parameter specification
- Nonrandom variance 0
- Fixed mean not to be estimated. Free variance
- Fixed range mean estimated, triangular from 0
to 2? - Hierarchical structure - ?i ? ?(k)zi
- Stochastic specification
- Normal, uniform, triangular (tent) distributions
- Strictly positive lognormal parameters (e.g.,
on income) - Autoregressive v(i,t,k) u(i,t,k)
r(k)v(i,t-1,k) this picks up time effects in
multiple choice situations, e.g., fatigue.
43Estimating the Model
Denote by β_1 all fixed parameters in the model
Denote by β_2i,t all random and hierarchical parameters in the model
44Estimating the RPL Model
- Denote by β_1 all fixed parameters in the model
- Denote by β_2i,t all random and hierarchical parameters in the model
- Estimation: β_1
- β_2it = β_2 + Δz_i + Γv_i,t
- Uncorrelated: Γ is diagonal
- Autocorrelated: v_i,t = Rv_i,t-1 + u_i,t
- (1) Estimate structural parameters
- (2) Estimate individual specific utility parameters
- (3) Estimate elasticities, etc.
45Simulation Based Estimation
- Choice probability: P[data | (β_1, β_2, Δ, Γ, R, v_i,t)]
- Need to integrate out the unobserved random term
- E{P[data | (β_1, β_2, Δ, Γ, R, v_i,t)]} = ∫ P[data | (β_1, β_2, Δ, Γ, R, v_i,t)] f(v_i,t) dv_i,t
- Integration is done by simulation
- Draw values of v and compute β, then probabilities
- Average many draws
- Maximize the sum of the logs of the averages
- (See Train [Cambridge, 2003] on simulation methods.)
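The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the estimator of any particular package: all parameters are random and uncorrelated (β_i = mean + sd * v_i, with no z_i and no autocorrelation), the draws are held fixed per person, and the data layout and function names are hypothetical.

```python
import math


def logit_prob(beta, X, chosen):
    """MNL probability of the chosen alternative; X holds one attribute row per alternative."""
    utils = [sum(b * x for b, x in zip(beta, row)) for row in X]
    top = max(utils)
    expu = [math.exp(u - top) for u in utils]
    return expu[chosen] / sum(expu)


def simulated_loglik(mean, sd, people, draws):
    """Sum over people of the log of the average, over draws, of the joint choice probability.

    people : one list per person of (X, chosen) choice situations
    draws  : one list per person of draw vectors v, each the same length as mean
    """
    total = 0.0
    for situations, v_draws in zip(people, draws):
        average = 0.0
        for v in v_draws:
            beta = [m + s * e for m, s, e in zip(mean, sd, v)]   # compute beta from v
            joint = 1.0
            for X, chosen in situations:                        # situations independent given beta
                joint *= logit_prob(beta, X, chosen)
            average += joint
        total += math.log(average / len(v_draws))               # log of the average
    return total


# Toy data: one person, two choice situations, two alternatives, two attributes.
people = [[([[1.0, 0.0], [0.0, 1.0]], 0), ([[0.5, 1.0], [1.0, 0.5]], 1)]]
draws = [[[0.1, -0.2], [-0.3, 0.4]]]
value = simulated_loglik([1.0, -1.0], [0.5, 0.5], people, draws)
```

An optimizer would then maximize `simulated_loglik` over `mean` and `sd`. With `sd` set to zero the function collapses to the ordinary MNL log likelihood, which is a convenient sanity check.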
46Customers Choice of Energy Supplier
- California, Stated Preference Survey
- 361 customers presented with 8-12 choice situations each
- Supplier attributes
- Fixed price: cents per kWh
- Length of contract
- Local utility
- Well-known company
- Time-of-day rates (11¢ in day, 5¢ at night)
- Seasonal rates (10¢ in summer, 8¢ in winter, 6¢ in spring/fall)
47Population Distribution
- Normal for
- Contract length
- Local utility
- Well-known company
- Log-normal for
- Time-of-day rates
- Seasonal rates
- Price coefficient held fixed
48Estimated Model
                        Estimate   Std error
Price                    -.883      0.050
Contract   mean          -.213      0.026
           std dev        .386      0.028
Local      mean          2.23       0.127
           std dev       1.75       0.137
Known      mean          1.59       0.100
           std dev        .962      0.098
TOD        mean          2.13       0.054
           std dev        .411      0.040
Seasonal   mean          2.16       0.051
           std dev        .281      0.022
Parameters of underlying normal.
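The brand values and rate effects on the slides that follow appear to be these coefficients expressed in price units. The Python sketch below works under that assumption, which the deck does not state: each normal coefficient is divided by the absolute price coefficient, and each log-normal coefficient is converted to its mean, exp(m + s²/2), given a negative sign, and scaled the same way. This reproduces the reported means (-0.24, 2.5, 1.8, -10.4, -10.2) to rounding; it does not reproduce the 0.55 standard deviation shown for contract length.

```python
import math

PRICE = -0.883  # fixed price coefficient from the table

# (mean, std dev) of the normally distributed coefficients
NORMAL = {"contract": (-0.213, 0.386), "local": (2.23, 1.75), "known": (1.59, 0.962)}
# (mean, std dev) of the underlying normal for the log-normal coefficients
LOGNORMAL = {"tod": (2.13, 0.411), "seasonal": (2.16, 0.281)}


def in_price_units(coef, price=PRICE):
    """Express a utility coefficient in cents per kWh by dividing by |price coefficient|."""
    return coef / abs(price)


def lognormal_mean(mu, sigma):
    """Mean of exp(N(mu, sigma^2))."""
    return math.exp(mu + 0.5 * sigma ** 2)


brand_value = {name: (in_price_units(m), in_price_units(s))
               for name, (m, s) in NORMAL.items()}

# Assumption: the log-normal coefficients enter utility with a negative sign.
rate_value = {name: -in_price_units(lognormal_mean(mu, sigma))
              for name, (mu, sigma) in LOGNORMAL.items()}
```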
49Distribution of Brand Value
- Brand value of local utility
[Figure: normal density with mean 2.5 and standard deviation 2.0; 10% dislike local utility]
50
[Figure: three normal densities with marked tail shares]
- Mean -0.24, standard deviation 0.55: 29%
- Mean 2.5, standard deviation 2.0: 10%
- Mean 1.8, standard deviation 1.1: 5%
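The 10% and 5% labels can be checked directly: for a normally distributed value with mean m and standard deviation s, the share of the population with a negative value is Φ(-m/s). A small Python sketch, using only the means and standard deviations shown on these slides (the function names are hypothetical):

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def share_below_zero(mean, sd):
    """Share of a N(mean, sd^2) population with a negative value."""
    return normal_cdf(-mean / sd)


dislike_local = share_below_zero(2.5, 2.0)   # brand value of the local utility
dislike_known = share_below_zero(1.8, 1.1)   # brand value of a well-known company
```

The two shares come out at roughly 0.106 and 0.051, in line with the 10% and 5% shown.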
51Time of Day Rates (Customers do not like.)
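[Figure: densities of the time-of-day and seasonal rate effects, lying entirely below 0, with means -10.4 and -10.2]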
52Expected Preferences of Each Customer
                       Population Mean   Customer A's Conditional Mean
Contract length             -0.24               2.20
Local utility                2.50               3.30
Well-known company           1.80               2.00
Time-of-day rates          -10.40              -6.30
Seasonal rates             -10.20              -6.60
Customer likes long-term contract, local utility, and non-fixed rates. Local utility can retain and make profit from this customer by offering a long-term contract with time-of-day or seasonal rates.
53A General Extension of the RPL
54Other Model extensions
- AR(1): w_i,k,t = ρ_k w_i,k,t-1 + v_i,k,t
- Dynamic effects in the model
- Restricting Sign
- Restricting Range and Sign: Using triangular distribution and range = 0 to 2β.
55Heteroscedasticity and Heterogeneity
Why is heteroscedasticity important? Why should
only the means of the random parameters be
heterogeneous?
56Estimating Individual Parameters
- Model estimates structural parameters
- Objective: model of individual specific parameters
- Can individual specific parameters be estimated?
57Estimating Individual Distributions
- Posterior estimates of E[β_i]
- Use the same methodology to estimate E[β_i²] and Var[β_i].
- Plot individual confidence intervals (assuming near normality)
- Sample from the distribution and plot kernel density estimates
58Posterior Estimation of β_i
Estimate by simulation
59Application Shoe Brand Choice
- Simulated Data: Stated Choice, 400 respondents, 8 choice situations
- 3 choice/attributes + NONE
- Fashion = High / Low
- Quality = High / Low
- Price = 25/50/75,100 coded 1,2,3,4
- Heterogeneity: Sex, Age (<25, 25-39, 40+)
- Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)
- Thanks to www.statisticalinnovations.com (Latent Gold and Jordan Louviere)
60Error Components Logit Modeling
- Alternative approach to building cross choice correlation
- Common effects
- Example
61Implied Covariance Matrix
62Error Components Logit Model
Correlation = 0.2837 / (1.6449 + 0.2837) ≈ 0.1468
63Extending the Basic MNL Model
64Error Components Logit Model
65Random Parameters Model
66Heterogeneous (in the Means) Random Parameters
Model
67Heterogeneity in Both Means and Variances
68Individual Effects Model
70Individual E[β_i | data_i] Estimates
The intervals could be made wider to account for
the sampling variability of the underlying
(classical) parameter estimators.
71What is the Individual Estimate?
- Point estimate of mean, variance and range of random variable β_i | data_i.
- Value is NOT an estimate of β_i; it is an estimate of E[β_i | data_i]
- What would be the best estimate of the actual realization β_i | data_i?
- An interval estimate would account for the sampling variation in the estimator of Ω that enters the computation.
- Bayesian counterpart to the preceding? Posterior mean and variance? Same kind of plot could be done.
72WTP Application (Value of Time Saved)
- Estimating Willingness to Pay for Increments to
an Attribute in a Discrete Choice Model
Random
73Extending the RP Model to WTP
- Use the model to estimate conditional distributions for any function of parameters
- Willingness to pay = β_i,time / β_i,cost
- Use same method
74Estimation of WTP from β_i
Estimate by simulation
75Stated Choice Experiment Travel Mode by Sydney
Commuters
76Would You Use a New Mode?
77Value of Travel Time Saved