Title: MARK2039
1Lecture 8
- MARK2039
- Summer 2006
- George Brown College
- Wednesday 9-12
2Assignment 6
Backend: H4B2E5STRUGER
Marketing list: H4B2E5STRUGERJOHN4849MAYFAIR
Unaddressed Campaign: H4B2E5
3Assignment 6
Id     Total Amount   Number of months since last trans.
456    1280           6 months
123    300            5 months
789    76             8 months
12     10             10 months
4Assignment 6
Data needs to be standardized such that we have
one value for each gender outcome
5Assignment 6
Use the purchase behaviour field and look at a purchase
window (say 3 months, April 2006 to June 2006). No
purchase in the window means the customer is a
defector (1), while a purchase in the window means the
customer is a non-defector (0). I would use the other
information (income, region, age, and tenure) as
potential variables to help predict defection.
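The window rule above can be sketched in a few lines. This is only an illustration: the customer ids and purchase dates are made up, and it assumes a customer with no purchase between April and June 2006 is coded as a defector (1).

```python
from datetime import date

# Hypothetical purchase history: customer id -> list of purchase dates.
purchases = {
    "A": [date(2006, 1, 15), date(2006, 5, 2)],
    "B": [date(2005, 11, 30), date(2006, 3, 20)],
    "C": [],
}

# Purchase window taken from the slide (April 2006 to June 2006).
WINDOW_START = date(2006, 4, 1)
WINDOW_END = date(2006, 6, 30)

def defector_flag(dates, start=WINDOW_START, end=WINDOW_END):
    """Return 1 (defector) if no purchase falls inside the window, else 0."""
    bought_in_window = any(start <= d <= end for d in dates)
    return 0 if bought_in_window else 1

flags = {cust: defector_flag(dates) for cust, dates in purchases.items()}
print(flags)
```

Customer A bought in May 2006 and is flagged 0; B and C have no purchase in the window and are flagged 1.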
6Classification/Profiling vs. Predictive Modelling
[Diagram] Profiling (classification): defectors and non-defectors are compared on age, tenure, income and transaction behaviour.
[Diagram] Predictive modelling (predict): independent variables from the pre period (age, tenure, income, transaction behaviour) are used to predict the dependent variable in the post period (defector / non-defector).
7Predictive Modelling
- Examples: Discrete Models
- Response Models
- Cross Sell
- Upsell
- Acquisition
- Attrition Models
- Product Affinity Models
- Risk Models
8Predictive Modelling
- Examples: Continuous Models
- Profitability/Value Models
- Spending Models
9Types of Predictive Models
- An acquisition campaign with no targeting was conducted in January. The available information is as follows:
- Mail files containing name and address
- Responder files containing name and address
- 2001 Stats Can Census data available at the enumeration area level
- A conversion table which maps enumeration areas to postal codes
- How would you use the above information to better target prospects to become new customers?
- Describe how the analytical file would be created.
- 1) Define the objective function by creating the response variable.
- 2) Create the response variable by matching the responder file to the mail file using a match key of postal code and last name. Assign a value of 1 for matches (responders) and 0 for non-matches (non-responders). This field will be created on the mail file or analytical file.
- 3) Match the analytical file to the Stats Can conversion file (which contains enumeration area) by postal code. Match the new output file to the Stats Can file by enumeration area; this file contains the very rich demographic information.
- Remember the end deliverable is to create a table with the dependent variable or objective function and examples of other independent or predictor variables.
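The three matching steps can be sketched as follows. All records, field names and census values here are hypothetical; the only thing taken from the slides is the match key of postal code plus last name and the chain postal code, then enumeration area, then census data.

```python
# Hypothetical records; field names are illustrative only.
mail_file = [
    {"last_name": "Struger", "postal_code": "H4B 2E5"},
    {"last_name": "Smith", "postal_code": "M5V 1A1"},
    {"last_name": "Lee", "postal_code": "M5V 1A1"},
]
responder_file = [{"last_name": "STRUGER", "postal_code": "h4b2e5"}]
postal_to_ea = {"H4B2E5": "EA001", "M5V1A1": "EA002"}  # conversion table
census_by_ea = {                                        # census data by enumeration area
    "EA001": {"avg_income": 52000},
    "EA002": {"avg_income": 61000},
}

def clean(text):
    """Upper-case and strip spaces so keys from different files line up."""
    return text.replace(" ", "").upper()

def match_key(rec):
    """Postal code followed by last name, both standardized."""
    return clean(rec["postal_code"]) + clean(rec["last_name"])

# Step 2: response = 1 when the mail record's key appears in the responder file.
responder_keys = {match_key(r) for r in responder_file}

analytical_file = []
for rec in mail_file:
    ea = postal_to_ea.get(clean(rec["postal_code"]))  # step 3a: postal code -> EA
    row = {
        "match_key": match_key(rec),
        "response": 1 if match_key(rec) in responder_keys else 0,
        "enumeration_area": ea,
    }
    row.update(census_by_ea.get(ea, {}))              # step 3b: EA -> demographics
    analytical_file.append(row)

for row in analytical_file:
    print(row)
```

Each output row carries the dependent variable (response) alongside candidate predictors from the census file, which is the table the slide describes as the end deliverable.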
10Types of Predictive Models
- You have been asked to create programs that better target existing customers for insurance products. You have the following info
- What would you do, and how would you create the analytical file?
- 1) Define the objective function and create the insurance response variable.
- 2) Create the insurance response variable by looking at the amount spent in a certain transaction type and within a certain timeframe. Assign a value of 1 if this condition is met and 0 if not. This field will be created on the analytical file.
- 3) Create independent model predictors by creating recency, frequency, and amount variables, also broken out by type, from the transaction file. Create demographic variables from the customer file such as region of country, tenure, age, income, etc.
- Remember the end deliverable is to create a table with the dependent variable or objective function and examples of other independent or predictor variables.
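A minimal sketch of deriving recency, frequency and amount variables from a transaction file is shown below. The transaction rows, the type labels and the as-of date are assumptions made for the example; they do not come from the course data.

```python
from datetime import date

# Hypothetical transaction file rows: (customer id, date, type, amount).
transactions = [
    ("A", date(2005, 1, 10), "insurance", 120.0),
    ("A", date(2005, 3, 5), "retail", 40.0),
    ("B", date(2004, 12, 1), "retail", 75.0),
]
AS_OF = date(2005, 4, 30)  # assumed end of the observation period

def months_between(earlier, later):
    """Whole calendar months between two dates (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def rfm(rows, as_of=AS_OF):
    """Recency (months since last transaction), frequency and total amount per customer."""
    out = {}
    for cust, when, _type, amount in rows:
        rec = out.setdefault(
            cust, {"recency_months": None, "frequency": 0, "amount": 0.0}
        )
        age = months_between(when, as_of)
        if rec["recency_months"] is None or age < rec["recency_months"]:
            rec["recency_months"] = age
        rec["frequency"] += 1
        rec["amount"] += amount
    return out

features = rfm(transactions)
# The same variables by type: filter the rows first, then reuse rfm().
insurance_only = rfm([row for row in transactions if row[2] == "insurance"])
print(features)
print(insurance_only)
```

Demographic fields from the customer file would then be joined to these rows by customer id to complete the analytical file.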
11Types of Predictive Models
- You have been asked to build a targeting tool for a cross-sell campaign to get existing customers to purchase an insurance policy. A campaign was conducted in May of 2005. What questions do you need to ask in order to help design a proper tool?
- Was the campaign data captured? Are responders clearly identified, or do we have to impute them through the database based on the transaction data that occurred within a certain time frame of the campaign?
12Types of Predictive Models
- You have been asked to target customers that will not only purchase insurance but will also purchase the largest premiums.
- What type of model would be built here?
- A two-stage model, whereby we are targeting both insurance response and premium.
Objective function is the expected value of premium:
Expected Premium = Pr(Response) X Premium
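As a rough illustration of the two-stage idea, the sketch below multiplies a response probability by a predicted premium and ranks customers on the product. The scores are invented; in practice each would come from its own model.

```python
# Hypothetical scores from two separate models:
# p_response from a response model, predicted_premium from a premium model.
customers = [
    {"id": "A", "p_response": 0.10, "predicted_premium": 400.0},
    {"id": "B", "p_response": 0.02, "predicted_premium": 1500.0},
    {"id": "C", "p_response": 0.05, "predicted_premium": 900.0},
]

# Objective function: expected premium = Pr(Response) x Premium.
for c in customers:
    c["expected_premium"] = c["p_response"] * c["predicted_premium"]

# Target the customers with the highest expected premium first.
ranked = sorted(customers, key=lambda c: c["expected_premium"], reverse=True)
for c in ranked:
    print(c["id"], round(c["expected_premium"], 2))
```

Customer B has the largest predicted premium and A the highest response probability, yet C ranks first on expected premium, which is why neither stage is used alone.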
13Types of Predictive Models
- Creating The Analytical File
- Defining the objective function
- Defining the Model predictors
- Once this is done, the first diagnostic that can
be done is the correlation matrix.
14Correlation
- Want to determine which variables have the greatest relationship with response
- Run the correlation of the dependent variable with all the independents (in your reduced set).
- Based on the highest correlation coefficient (in absolute value), select the best variables (usually select those meeting a statistical significance criterion of at least 95%)
- Correlation can be negative or positive
- Serves as a great pre-screening tool.
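The pre-screening step might look like the sketch below, which computes the Pearson correlation of each predictor with response and ranks by absolute value. The eight-row analytical file is made up, and the significance test mentioned on the slide is not shown.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

# Hypothetical analytical file: one dependent variable, three predictors.
response = [1, 0, 1, 0, 0, 1, 0, 0]
predictors = {
    "age": [25, 60, 30, 55, 48, 28, 62, 51],
    "tenure": [2, 10, 6, 3, 8, 5, 4, 9],
    "household_size": [2, 3, 3, 2, 4, 4, 3, 3],
}

correlations = {name: pearson(vals, response) for name, vals in predictors.items()}

# Rank by absolute value: a strong negative relationship is as useful as a positive one.
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, r in ranked:
    print(f"{name:15s} {r:+.3f}")
```

In this toy data age is strongly negative, tenure mildly negative and household size shows no relationship, so age would be screened in first.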
15The Concept of Correlation
- Using correlation analysis for selecting variables for our response model.
- Analytical file contains six independent variables
Dependent Variable / Modelled Variable: Response
Independent Variables:
- Age
- Tenure
- Number of Products
- Number of Promotions
- Income
- Household Size
- The key diagnostics in this routine are
- Correlation coefficient
- Confidence level
16Correlation Coefficient
17Correlation Analysis
- The male gender variable has a perfect correlation of 1.
- The female gender variable has a perfect correlation of -1.
- Household size has no correlation with response, hence the correlation coefficient is 0.
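A toy data set reproducing these three outcomes is sketched below. The six records are invented so that every male responds and no female does; real data would never be this clean.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical records: response matches the male flag exactly.
response = [1, 0, 1, 1, 0, 0]
male = [1, 0, 1, 1, 0, 0]
female = [1 - m for m in male]        # the complement of the male flag
household_size = [2, 4, 3, 4, 3, 2]   # same average for responders and non-responders

r_male = pearson(male, response)
r_female = pearson(female, response)
r_household = pearson(household_size, response)
print(r_male, r_female, r_household)
```

Because female is simply 1 minus male, its correlation with response is always the male correlation with the sign reversed, so only one of the two flags carries information.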
18Correlation Results
- Shows the level of confidence that a given
variable's relationship with the modelled behaviour,
i.e. response, is real
Correlation coefficient
Confidence Interval
19Correlation
- Why couldn't we just use the results of correlation to create the model and create index values for each significant variable?
- Age
- Tenure
- Number of products purchased
- Number of promotions since last purchase
Because there is interaction between variables
that needs to be accounted for in the modelling
exercise (multicollinearity). You can review this
concept in more detail in any introductory stats
textbook.
20Examples-Correlation-Response Model
- Listed below is an example of a correlation matrix
- Answer the following:
- Is each variable relevant?
- All, with the exception of live in Quebec, number in household and number of months since last purchase
- What is the relationship or impact of each variable with response?
- The sign of the variable tells you the relationship, whereas the corr. coeff. tells you the impact
- What is the strongest variable and what is the weakest variable?
- Strongest var: number of months since last promoted. Weakest var: live in Quebec
21More examples of correlation
- Younger people are more likely to respond
- Higher income people are more likely to respond
- Males are less likely to respond
Would the correlation values against response for the above variables be highly positive, close to zero or negative for age, income, and females?
- Age: highly negative
- Income: highly positive
- Females: highly positive
- People who live in Quebec exhibit no impact on response; people with high tenure and a high number of months since last promotion are less likely to respond. Would the correlation values against response for each variable be highly positive, close to zero or negative?
- Quebec: close to zero
- Tenure: highly negative
- Number of months since last promotion: highly negative
22More examples of correlation
- Previous analysis has indicated the following
trends
- Would the correlations be closer to 1, -1, or 0 here for both variables?
Spending: close to 0
Tenure: close to -1
23More examples of correlation
- Would the correlations be closer to 1, -1, or 0 here for both variables?
Spending: close to 1
Tenure: close to 0
- What is the learning here vs. the previous slide? The variables have changed in their impact on response
24Exploratory Data Analysis Reports(EDA)
- After looking at the correlation reports, we also need to create EDA reports, which help to better understand the relationship of a given variable with the desired marketing behaviour.
- They help the business people and marketers to get inside the so-called black box of modelling.
25Exploratory Data Analysis Reports(EDA)
26Exploratory Data Analysis Reports(EDA)
- Let's take a look at an example of a binary variable
Male      Number of Observations   Response Rate
Yes       50000                    2.00%
No        50000                    2.60%
Average   100000                   2.30%
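The EDA report for the male variable can be rebuilt from counts, as sketched below. The responder counts are implied by the table (2.00% and 2.60% of 50,000); the index column, defined here as group rate divided by overall rate times 100, is an assumed convention rather than something the slide specifies.

```python
# Responder counts implied by the table: 2.00% and 2.60% of 50,000 observations.
groups = {
    "Yes": {"observations": 50000, "responders": 1000},
    "No": {"observations": 50000, "responders": 1300},
}

total_obs = sum(g["observations"] for g in groups.values())
total_resp = sum(g["responders"] for g in groups.values())
overall_rate = total_resp / total_obs

report = {}
for level, g in groups.items():
    rate = g["responders"] / g["observations"]
    report[level] = {
        "response_rate_pct": round(rate * 100, 2),
        # Assumed index convention: 100 means the group responds at the average rate.
        "index": round(rate / overall_rate * 100),
    }

print(f"Overall response rate: {overall_rate * 100:.2f}%")
for level, row in report.items():
    print(level, row)
```

An index below 100 for males and above 100 for non-males says the same thing as a negative correlation between the male flag and response, but in terms a marketer can read directly.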
On the next page are some examples of EDA reports
of variables that are not statistically
significant according to the correlation matrix.
27Exploratory Data Analysis Reports(EDA)
- EDAs of non-stat.sign. variables
28Exploratory Data Analysis Reports
What does this tell us?
29Exploratory Data Analysis Reports
What does this mean?
30Creating the Final Model
- Why couldn't we just use the results of correlation to create the model and create index values for each significant variable?
- Age
- Tenure
- Number of products purchased
- Number of promotions since last purchase
Think Statistics here?
31The Data Mining Process: Application of Data Mining Techniques - Creating the Final Model
- Problems with Multicollinearity
- Example: Years of Education and Income on Response Rate
- Regression Equation is
- Response = .5 + .00001 X income - .03 X yrs. of education
Correlation with Response   Income   Years of Education
Correlation Coefficient     0.11     0.12
Confidence Interval         99%      99.50%
What is the problem here and what do you do?
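One way to see how multicollinearity can produce this pattern is the synthetic sketch below, where a predictor that is positively correlated with response still receives a negative regression coefficient because it moves almost in step with another predictor. The six data points are invented and in arbitrary units; the two-predictor least-squares fit is written out by hand to keep the example dependency-free.

```python
def centered(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Synthetic data: education tracks income closely (multicollinearity).
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
education = [1.5, 1.5, 2.5, 4.5, 5.5, 5.5]
# Response is built as 2 x income - 1 x education, so the true education effect is negative.
response = [2 * i - e for i, e in zip(income, education)]

di, de, dr = centered(income), centered(education), centered(response)

# Simple correlations, as a correlation matrix would report them.
corr_income_education = dot(di, de) / (dot(di, di) * dot(de, de)) ** 0.5
corr_education_response = dot(de, dr) / (dot(de, de) * dot(dr, dr)) ** 0.5

# Least-squares coefficients for two predictors (normal equations on centered data).
det = dot(di, di) * dot(de, de) - dot(di, de) ** 2
b_income = (dot(de, de) * dot(di, dr) - dot(di, de) * dot(de, dr)) / det
b_education = (dot(di, di) * dot(de, dr) - dot(di, de) * dot(di, dr)) / det

print(f"corr(income, education)    = {corr_income_education:+.3f}")
print(f"corr(education, response)  = {corr_education_response:+.3f}")
print(f"regression coef, income    = {b_income:+.3f}")
print(f"regression coef, education = {b_education:+.3f}")
```

Education looks like a positive driver when examined alone, yet its coefficient is negative once income is in the equation. A sign disagreement of this kind between the correlation report and the regression equation is a common symptom of two predictors carrying overlapping information.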
32Continuing to build the model
- Multivariate analytical techniques such as multiple regression, logistic regression, etc. may be employed to produce the final model
- Final equation: Predicted Response Rate = A + B1 X Age + B2 X Tenure
- What is the problem here?
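One issue worth checking with a straight-line equation of this form is whether its predictions stay inside the 0 to 1 range that a response rate must respect. The sketch below uses made-up coefficients (they are not from the course) and contrasts the linear score with the logistic transform used by logistic regression; it is not necessarily the answer the slide has in mind.

```python
from math import exp

# Hypothetical coefficients for illustration only.
A, B1, B2 = 0.10, -0.004, 0.002

def linear_rate(age, tenure):
    """Predicted response rate from the straight-line equation."""
    return A + B1 * age + B2 * tenure

def logistic_rate(age, tenure):
    """Same linear score passed through the logistic function, bounded in (0, 1)."""
    return 1.0 / (1.0 + exp(-linear_rate(age, tenure)))

for age, tenure in [(25, 5), (60, 0)]:
    print(age, tenure, round(linear_rate(age, tenure), 3), round(logistic_rate(age, tenure), 3))
```

With these coefficients a 60-year-old with no tenure gets a negative predicted rate from the linear form, which cannot be a real response rate. In practice the coefficients would be re-estimated under the logistic form rather than reused as done here.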
33Continuing to build the model
34Continuing to build the model
- After observing correlation results and EDAs, what can we begin to do at this point?
- Derive new variables-EDAs
- Derive new variables-multicollinearity
- Derive new variables-Factor Analysis
- Derive new variables-CHAID (will explore later)
Reference Material:
Factor Analysis - look up in any Statistics Handbook
Regression - look up in textbook under Regression and Statistics Regression
35Continuing to build the model
- Running further statistical routines, we are able
to develop a final model. The marketer or
business person should receive a report that
looks as follows:
For those of you that have statistics training,
how is the Contribution to model derived?
36Continuing to Build the Model
Variable Entered   Partial R-Square   Model R-Square
var 4              0.0036             0.0036
var 3              0.0034             0.0070
var 1              0.0016             0.0086
var 2              0.0007             0.0092
var 6              0.0009             0.0102
var 5              0.0003             0.0105
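One common way to turn a stepwise table like this into a "contribution to model" figure is to express each variable's partial R-square as a share of the total. The sketch below applies that convention to the values in the table; whether the course report uses exactly this calculation is an assumption.

```python
# (variable entered, partial R-square) taken from the stepwise table.
steps = [
    ("var 4", 0.0036),
    ("var 3", 0.0034),
    ("var 1", 0.0016),
    ("var 2", 0.0007),
    ("var 6", 0.0009),
    ("var 5", 0.0003),
]

# The partial R-squares add up to the final model R-square.
total_r_square = sum(partial for _, partial in steps)

# Assumed convention: contribution = partial R-square as a percentage of the total.
contribution = {
    name: round(100 * partial / total_r_square, 1) for name, partial in steps
}

print(f"Final model R-square: {total_r_square:.4f}")
for name, share in contribution.items():
    print(f"{name}: {share}%")
```

Under this convention var 4 and var 3 together account for roughly two thirds of the explained variation, while var 5 adds very little.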
37Continuing to Build the Model
What would be the final equation in terms of the
sign?
38Continuing to build the model
39Continuing to build the model
- Suppose we have the following equation:
Response = .09 + .05 X Income + .06 X Tenure + .08 X Product Spend - .04 X Male
- What is the problem here?