Title: Consumer Behavior Prediction using Parametric and Nonparametric Methods
1. Consumer Behavior Prediction using Parametric and Nonparametric Methods
- Elena Eneva
- CALD Masters Presentation
- 19 August 2002
- Advisors: Alan Montgomery, Rich Caruana, Christos Faloutsos
2. Outline
- Introduction
- Data
- Economics Overview
- Baseline Models
- New Hybrid Models
- Results
- Conclusions and Future Work
3. Background
- Retail chains are aiming to customize prices in individual stores
- Pricing strategies should adapt to the neighborhood demand
- Stores can increase operating profit margins by 33% to 83%
4. Price Elasticity
- Q is the quantity purchased; P is the price of the product
- Elasticity measures the consumers' response to a price change
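Price elasticity of demand is standardly defined as the ratio of the relative change in quantity to the relative change in price:

$$E = \frac{\Delta Q / Q}{\Delta P / P} \approx \frac{\partial \ln Q}{\partial \ln P}$$

In the log-log demand model used throughout this talk, the fitted coefficients $b_i$ are exactly these elasticities.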
5. Data Example
6. Data Example (Log Space)
7. Assumptions
- Independence
  - Substitutes: fresh fruit, other juices
  - Other stores
- Stationarity
  - Change over time
  - Holidays
8. The Model
This model needs to be replicated across many stores and many categories.
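The model in question, as defined by the symbols on the Linear Regression slide (16), is presumably the log-log demand specification:

$$\ln q = a + \sum_{i=1}^{K} b_i \ln p_i$$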
9. Converting to Original Space
10. Existing Methods
- Traditionally: parametric models (linear regression)
- More recently: nonparametric models (neural networks)
11. Our Goal
- Advantage of LR: known functional form (linear in log space), ability to extrapolate
- Advantage of NN: flexibility, accuracy
- Goal: combine the advantages of both
12. Datasets
- Weekly store-level cash register data at the product level
- Chilled orange juice category
- 2 years
- 12 products
- 10 randomly selected stores
13. Evaluation Measure
- Root Mean Squared (RMS) error
- The average deviation between the predicted quantity and the true quantity
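Written out, with $\hat{q}_t$ the predicted and $q_t$ the true quantity in week $t$:

$$\mathrm{RMS} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(\hat{q}_t - q_t\right)^2}$$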
14. Models
- Hybrids
- Smart Prior
- MultiTask Learning
- Jumping Connections
- Frozen Jumping Connections
- Baselines
- Linear Regression
- Neural Networks
15. Baselines
- Linear Regression
- Neural Networks
16. Linear Regression

$$\ln q = a + \sum_{i=1}^{K} b_i \ln p_i$$

- q is the quantity demanded
- p_i is the price of the i-th product
- K products overall
- The coefficients a and b_i are determined by the condition that the sum of the squared residuals is as small as possible.
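A minimal sketch of this baseline in Python; the synthetic data stands in for the proprietary scanner data (all variable names and the generating coefficients are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_weeks, K = 104, 12                          # 2 years of weeks, 12 products
prices = rng.uniform(1.0, 4.0, (n_weeks, K))  # stand-in price data
quantity = np.exp(5.0 - 2.0 * np.log(prices[:, 0])     # own-price elasticity -2
                  + rng.normal(0.0, 0.1, n_weeks))     # demand noise

# Fit ln(q) = a + sum_i b_i * ln(p_i) by ordinary least squares.
log_p, log_q = np.log(prices), np.log(quantity)
lr = LinearRegression().fit(log_p, log_q)
a, b = lr.intercept_, lr.coef_                # the b_i are (cross-)price elasticities
```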
17. Linear Regression
18. Results (RMS)
19. Neural Networks
- Generic nonlinear function approximators
- A collection of basic units (neurons), each computing a (non)linear function of its input
- Trained by backpropagation
20. Neural Networks
One hidden layer, 100 units, sigmoid activation function
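Continuing the sketch above, the same architecture in scikit-learn (the talk's own implementation and training hyperparameters are unknown; everything beyond the stated architecture is an assumption):

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer of 100 sigmoid ("logistic") units, trained by
# backpropagation on the same log-space data as the linear baseline.
nn = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                  max_iter=2000, random_state=0)
nn.fit(log_p, log_q)
```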
21. Results (RMS)
22. Hybrids
- Smart Prior
- MultiTask Learning
- Jumping Connections
- Frozen Jumping Connections
23. Smart Prior
- Idea: start the NN at a good set of weights, i.e. from a smart prior
- Take this prior from the known linearity
- The NN is first trained on synthetic data generated by the LR model
- The NN is then trained on the real data
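A sketch of the two-stage training, continuing the code above; with `warm_start=True` the second `fit` call continues from the pretrained weights instead of reinitializing (the synthetic-data range and size are assumptions):

```python
# Stage 1: pretrain on synthetic data labeled by the fitted LR model,
# so the net starts from (approximately) the linear solution.
synth_log_p = rng.uniform(np.log(1.0), np.log(4.0), (5000, K))
synth_log_q = lr.predict(synth_log_p)

prior_nn = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                        warm_start=True, max_iter=2000, random_state=0)
prior_nn.fit(synth_log_p, synth_log_q)

# Stage 2: continue training from those weights on the real data.
prior_nn.fit(log_p, log_q)
```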
24. Smart Prior
25. Results (RMS)
26. MultiTask Learning
- Idea: learn an additional related task in parallel, using a shared representation
- Add the output of the LR model (built over the same inputs) as an extra output of the NN
- Make the net share its hidden nodes between both tasks
- Custom halting function
- Custom RMS function
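A rough sketch of the shared-representation setup, continuing the code above. Note scikit-learn weights both outputs equally in its loss and has no hook for the custom halting and RMS functions the slide mentions, so this only approximates the setup:

```python
# Column 0 is the real (main) task; column 1 is the LR model's prediction,
# added as an auxiliary output that shares the hidden layer.
y_multi = np.column_stack([log_q, lr.predict(log_p)])

mtl_nn = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                      max_iter=2000, random_state=0)
mtl_nn.fit(log_p, y_multi)

# Only the main-task output is used at prediction time.
pred_main = mtl_nn.predict(log_p)[:, 0]
```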
27. MultiTask Learning
28. Results (RMS)
29. Jumping Connections
- Idea: fuse LR and NN by changing the architecture
- Add connections which jump over the hidden layer
- Gives the effect of simulating an LR and an NN together
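The architecture change is easiest to show with explicit wiring; a minimal PyTorch sketch of the idea (an illustration, not the original implementation):

```python
import torch
import torch.nn as nn

class JumpingNet(nn.Module):
    """Sigmoid hidden layer plus a linear 'jumping' connection that
    skips it, so the net embeds an LR model alongside the NN."""
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        self.hidden = nn.Linear(n_inputs, n_hidden)
        self.out = nn.Linear(n_hidden, 1)
        self.jump = nn.Linear(n_inputs, 1, bias=False)  # jumps over the hidden layer

    def forward(self, x):
        h = torch.sigmoid(self.hidden(x))
        return self.out(h) + self.jump(x)
```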
30. Jumping Connections
31. Results (RMS)
32. Frozen Jumping Connections
- Idea: you have the linearity, now use it!
- Same architecture as Jumping Connections, but really emphasizing the linearity
- Freeze the weights of the jumping layer, so the network can't forget about the linearity
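Continuing the PyTorch sketch: pinning the jump weights to the LR coefficients `b` from the earlier code and freezing them is one plausible reading of the slide (the initialization is an assumption):

```python
net = JumpingNet(n_inputs=12)

# Initialize the jumping connection with the fitted LR coefficients,
# then freeze it so gradient updates cannot overwrite the linearity.
with torch.no_grad():
    net.jump.weight.copy_(torch.as_tensor(b, dtype=torch.float32).reshape(1, -1))
net.jump.weight.requires_grad_(False)

# Train only the remaining (hidden and output) layers.
optimizer = torch.optim.Adam([p for p in net.parameters() if p.requires_grad])
```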
33. Frozen Jumping Connections
34. Frozen Jumping Connections
35. Frozen Jumping Connections
36. Results (RMS)
37. Models
- Hybrids
- Smart Prior
- MultiTask Learning
- Jumping Connections
- Frozen Jumping Connections
- Baselines
- Linear Regression
- Neural Networks
- Combinations
- Voting
- Weighted Average
38. Combining Models
- Idea: ensemble learning
- Committee Voting: equal weights for each model's prediction
- Weighted Average: optimal weights determined by a linear regression model
- 2 baseline and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections)
39. Committee Voting
- Average the predictions of the models
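A minimal sketch of the vote, assuming the fitted models are collected in a list and each returns main-task predictions from `predict` (the MTL net would need its first output column selected):

```python
models = [lr, nn, prior_nn]    # plus the MTL and frozen-jump nets in the talk
preds = np.stack([m.predict(log_p) for m in models])  # (n_models, n_weeks)
vote = preds.mean(axis=0)                             # equal-weight committee
```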
40. Results (RMS)
41. Weighted Average (Model Regression)
- Linear regression on the baseline and hybrid models' predictions to determine the vote weights
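The corresponding sketch, learning the vote weights with a linear regression over the stacked predictions (a form of linear stacking):

```python
# Learn one weight per model by regressing the true values on the models'
# predictions (ideally on held-out data, to avoid favoring overfit models).
stacker = LinearRegression().fit(preds.T, log_q)
weights, weighted = stacker.coef_, stacker.predict(preds.T)
```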
42. Results (RMS)
43. Normalized RMS Error
- Compare model performance across stores
- Stores have different sizes, ages, locations, etc.
- Need to normalize
- Compare to baselines
- Take the error of the LR benchmark as the unit error
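That is, for each store the model's error is divided by the linear-regression error on the same store, so values below 1 indicate an improvement over the LR benchmark:

$$\mathrm{NRMS}_{\text{model}} = \frac{\mathrm{RMS}_{\text{model}}}{\mathrm{RMS}_{\text{LR}}}$$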
44. Normalized RMS Error
45. Conclusions
- Clearly improved models for customer choice prediction
- Will allow stores to price products more strategically and optimize profits
- Maintain better inventories
- Understand product interactions
46. Future Work Ideas
- Analyze the Weighted Average model
- Compare the extrapolation ability of the new models
- Use other domain knowledge
- Shrinkage: model a "super store" with data pooled across all stores
47. Acknowledgements
- I would like to thank my advisors and my CALDling friends and colleagues
48. The Most Important Slide
- For this presentation and the paper:
- www.cs.cmu.edu/eneva/research.htm
- eneva_at_cs.cmu.edu
49. References
- Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.
- West, P., Brockett, P., and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
- Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
- Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.
50. Error Measure, Unbiased Model (Details)
The models predict $\hat{y} \approx \ln q$. By computing the integral over the distribution, $e^{\hat{y}}$ is a biased estimator for $q$, so we correct the bias by using $e^{\hat{y} + \sigma^2/2}$, which is an unbiased estimator for $q$.
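The correction follows from the standard lognormal mean identity (the $\sigma^2/2$ term above is a reconstruction on that assumption):

$$\ln q = \hat{y} + \varepsilon,\quad \varepsilon \sim \mathcal{N}(0, \sigma^2) \;\Rightarrow\; \mathbb{E}[q] = e^{\hat{y}}\,\mathbb{E}[e^{\varepsilon}] = e^{\hat{y} + \sigma^2/2}$$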
51. On One Hand
In log space, the price-quantity relationship is fairly linear.
52. On the Other Hand
- Nonparametric methods allow the derivation of consumers' demand responses to price changes without the need to write down and rely upon particular mathematical models of demand.