MARK2039 - PowerPoint PPT Presentation

About This Presentation
Title:

MARK2039

Description:

Lecture 8 MARK2039 Summer 2006 George Brown College Wednesday 9-12 – PowerPoint PPT presentation

Slides: 40
Provided by: LukeF5
Category:
Tags: data | define | mark2039 | mining


Transcript and Presenter's Notes

Title: MARK2039


1
Lecture 8
  • MARK2039
  • Summer 2006
  • George Brown College
  • Wednesday 9-12

2
Assignment 6
Backend: H4B2E5STRUGER
Marketing list: H4B2E5STRUGERJOHN4849MAYFAIR
Unaddressed campaign: H4B2E5
3
Assignment 6
Id     Total Amount   # of months since last trans.
456    1280           6 months
123    300            5 months
789    76             8 months
12     10             10 months
4
Assignment 6
Data needs to be standardized such that we have
one value for each gender outcome
5
Assignment 6
Use the purchase behaviour field and look at a purchase
window (say 3 months, April 2006 to June 2006). No
purchase in the window means the customer is a
defector (1), while a purchase in the window means the
customer is a non-defector (0). I would use the other
information (income, region, age, and tenure) as
potential variables to help predict defection.
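The window rule can be sketched in a few lines of Python. This is only an illustration: the customer ids, purchase dates, and the `flag_defectors` helper are hypothetical, and the sketch assumes a defector is a customer with no purchase inside the window.

```python
from datetime import date

# Hypothetical purchase history: customer id -> list of purchase dates.
purchases = {
    "C001": [date(2006, 1, 15), date(2006, 5, 2)],
    "C002": [date(2005, 11, 30), date(2006, 2, 10)],
    "C003": [],
}

# Purchase window from the slide: April 2006 to June 2006 (3 months).
WINDOW_START = date(2006, 4, 1)
WINDOW_END = date(2006, 6, 30)


def flag_defectors(history, start, end):
    """Return {customer_id: 1 if no purchase in [start, end], else 0}."""
    flags = {}
    for customer_id, dates in history.items():
        bought_in_window = any(start <= d <= end for d in dates)
        # Assumption: no purchase in the window marks a defector (1).
        flags[customer_id] = 0 if bought_in_window else 1
    return flags


defector_flag = flag_defectors(purchases, WINDOW_START, WINDOW_END)
print(defector_flag)
```

The resulting flag would serve as the dependent variable, with income, region, age, and tenure joined on as candidate predictors.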
6
Classification/Profiling vs. Predictive Modelling
[Diagram: two panels]
Profiling (Classification): Defectors and Non Defectors are compared on Age, Tenure, Income, and Transaction Behaviour.
Predictive Modelling (Predict): Age, Tenure, Income, and Transaction Behaviour from the Pre period are the independent variables; Defector / Non Defector in the Post period is the dependent variable.

7
Predictive Modelling
  • Examples: Discrete Models
  • Response Models
  • Cross Sell
  • Upsell
  • Acquisition
  • Attrition Models
  • Product Affinity Models
  • Risk Models

8
Predictive Modelling
  • Examples: Continuous Models
  • Profitability/Value Models
  • Spending Models

9
Types of Predictive Models
  • An acquisition campaign with no targeting was
    conducted in January. The available information
    is as follows:
  • Mail files containing name and address
  • Responder files containing name and address
  • 2001 Stats Can Census data available at the
    enumeration area level
  • A conversion table which maps enumeration areas
    to postal codes
  • How would you use the above information to better
    target prospects to become new customers?
  • Describe how the analytical file would be created:
  • 1) Define the objective function by creating a
    response variable.
  • 2) Create the response variable by matching the
    responder file to the mail file using a match key
    of postal code and last name. Assign a value of 1
    for matches (responders) and 0 for non-matches
    (non-responders). This field will be created on
    the mail file or analytical file.
  • 3) Match the analytical file to the Stats Can
    conversion file (which contains enumeration area)
    by postal code. Match the new output file to the
    Stats Can file by enumeration area, which contains
    the very rich demographic information.
  • Remember the end deliverable is to create a table
    with the dependent variable or objective function
    and examples of other independent or predictor
    variables.
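The matching steps for the acquisition campaign can be sketched as follows. This is an illustration under assumptions: all records, field names, enumeration-area codes, and census values are made up, and the match key simply concatenates the standardized postal code and last name.

```python
# Hypothetical mail file and responder file (name and address only).
mail_file = [
    {"last_name": "Struger", "postal_code": "H4B 2E5"},
    {"last_name": "Smith", "postal_code": "M5V 1A1"},
    {"last_name": "Lee", "postal_code": "M5V 1A1"},
]
responder_file = [
    {"last_name": "STRUGER", "postal_code": "h4b2e5"},
]

# Hypothetical conversion table (postal code -> enumeration area)
# and census table (enumeration area -> demographics).
postal_to_ea = {"H4B2E5": "EA001", "M5V1A1": "EA002"}
census_by_ea = {
    "EA001": {"avg_income": 52000, "avg_household_size": 2.4},
    "EA002": {"avg_income": 61000, "avg_household_size": 1.9},
}


def clean_postal(code):
    """Standardize a postal code: upper case, no spaces."""
    return code.replace(" ", "").upper()


def match_key(record):
    """Match key = postal code + last name, both standardized."""
    return clean_postal(record["postal_code"]) + record["last_name"].strip().upper()


responder_keys = {match_key(r) for r in responder_file}

analytical_file = []
for record in mail_file:
    postal = clean_postal(record["postal_code"])
    row = {
        "match_key": match_key(record),
        # Response variable: 1 for responders, 0 for non-responders.
        "response": 1 if match_key(record) in responder_keys else 0,
    }
    # Postal code -> enumeration area -> census demographics.
    ea = postal_to_ea.get(postal)
    row.update(census_by_ea.get(ea, {}))
    analytical_file.append(row)

for row in analytical_file:
    print(row)
```

Each row of the result holds the dependent variable (`response`) alongside area-level demographics that can be tested as predictors.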

10
Types of Predictive Models
  • You have been asked to create programs that
    better target existing customers for insurance
    products. You have the following info

What would you do and how would you create the
analytical file?
1) Define the objective function and create the
insurance response variable.
2) Create the insurance response variable by looking
at the amount spent in a certain transaction type and
within a certain timeframe. Assign a value of 1 if
this condition is met and 0 if not. This field will
be created on the analytical file.
3) Create independent model predictors by creating
recency, frequency, and amount variables, overall and
by type, from the transaction file. Create
demographic variables from the customer file such
as region of country, tenure, age, income, etc.
Remember the end deliverable is to create a
table with the dependent variable or objective
function and examples of other independent or
predictor variables.
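A minimal sketch of how the response flag and the recency, frequency, and amount variables might be derived from a transaction file. The transactions, the "banking" type label, and the cutoff and window dates are all hypothetical. The sketch also assumes that predictors are built from transactions before the cutoff and that the response is taken from the window after it.

```python
from datetime import date

# Hypothetical transaction file: (customer_id, date, type, amount).
transactions = [
    ("C001", date(2005, 11, 3), "banking", 120.0),
    ("C001", date(2006, 2, 20), "banking", 80.0),
    ("C001", date(2006, 5, 9), "insurance", 300.0),
    ("C002", date(2005, 9, 14), "banking", 50.0),
    ("C003", date(2006, 3, 1), "insurance", 200.0),
    ("C003", date(2006, 4, 18), "banking", 40.0),
]

# Assumed design: predictors come from transactions before CUTOFF,
# the response comes from insurance spend in the window after it.
CUTOFF = date(2006, 4, 1)
WINDOW_END = date(2006, 6, 30)


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


analytical_file = {}
for customer_id, when, kind, amount in transactions:
    row = analytical_file.setdefault(customer_id, {
        "response": 0, "recency_months": None,
        "frequency": 0, "amount": 0.0, "insurance_amount": 0.0,
    })
    if when < CUTOFF:
        # Predictors: recency, frequency, amount (overall and by type).
        recency = months_between(when, CUTOFF)
        if row["recency_months"] is None or recency < row["recency_months"]:
            row["recency_months"] = recency
        row["frequency"] += 1
        row["amount"] += amount
        if kind == "insurance":
            row["insurance_amount"] += amount
    elif when <= WINDOW_END and kind == "insurance" and amount > 0:
        # Dependent variable: insurance spend inside the window.
        row["response"] = 1

for customer_id, row in sorted(analytical_file.items()):
    print(customer_id, row)
```

Demographic variables from the customer file (region, tenure, age, income) would be joined to these rows by customer id.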
11
Types of Predictive Models
  • You have been asked to build a targeting tool
    for a cross-sell campaign to get existing
    customers to purchase an insurance policy. A
    campaign was conducted in May of 2005. What
    questions do you need to ask in order to help
    design a proper tool?
  • Was the campaign data captured? Are responders
    clearly identified, or do we have to impute them
    through the database based on the transaction
    data that occurred within a certain time frame of
    the campaign?

12
Types of Predictive Models
  • You have been asked to target customers who will
    not only purchase insurance but will also
    purchase the largest premiums.
  • What type of model would be built here?
  • A two-stage model, whereby we are targeting
    both insurance response and premium.
    The objective function is the expected value of premium:
    Expected value of premium = Pr(Response) X Premium
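As a small worked example of the two-stage objective function, the sketch below multiplies a response probability by a predicted premium and ranks customers on the product. The four customers and their scores are invented for illustration; in practice each score would come from its own model.

```python
# Hypothetical scores from two separate models for four customers:
# stage 1 gives Pr(Response), stage 2 gives the predicted premium.
customers = [
    {"id": "C001", "p_response": 0.10, "premium": 400.0},
    {"id": "C002", "p_response": 0.02, "premium": 1500.0},
    {"id": "C003", "p_response": 0.25, "premium": 100.0},
    {"id": "C004", "p_response": 0.05, "premium": 900.0},
]

for c in customers:
    # Objective function: expected premium = Pr(Response) x Premium.
    c["expected_premium"] = c["p_response"] * c["premium"]

# Rank customers from highest to lowest expected premium.
ranked = sorted(customers, key=lambda c: c["expected_premium"], reverse=True)
for c in ranked:
    print(c["id"], round(c["expected_premium"], 2))
```

In this made-up data the customer most likely to respond (C003) ranks last, because the premium is small. That is the reason for combining the two stages rather than ranking on response alone.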

13
Types of Predictive Models
  • Creating The Analytical File
  • Defining the objective function
  • Defining the Model predictors
  • Once this is done, the first diagnostic that can
    be done is the correlation matrix.

14
Correlation
  • Want to determine which variables have the
    greatest relationship with response
  • Run the correlation of the dependent variable
    with all the independents (in your reduced set).
  • Select the best variables based on the highest
    correlation coefficients in absolute value
    (usually select those meeting a statistical
    significance criterion of at least 95%)
  • Correlation can be negative or positive
  • Serves as a great pre-screening tool.
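The pre-screening routine can be sketched as below. The analytical file is hypothetical (600 customers stored as grouped counts), and the confidence level uses a normal approximation to the t statistic, which is an assumption that is only reasonable for large files. A statistics package would normally report this directly.

```python
import math
from statistics import NormalDist

# Hypothetical analytical file, stored as grouped counts:
# (age, household_size, responders, non_responders).
cells = [
    (25, 1, 15, 85), (25, 4, 15, 85),
    (45, 1, 10, 90), (45, 4, 10, 90),
    (65, 1, 5, 95), (65, 4, 5, 95),
]

# Expand the counts into one record per customer.
records = []
for age, household_size, responders, non_responders in cells:
    base = {"age": age, "household_size": household_size}
    records += [dict(base, response=1)] * responders
    records += [dict(base, response=0)] * non_responders


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def confidence_level(r, n):
    """Approximate two-sided confidence (in %) that the correlation is non-zero.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with a normal approximation,
    which is only reasonable for large n.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return 100 * (1 - p_value)


response = [rec["response"] for rec in records]
results = {}
for name in ("age", "household_size"):
    r = pearson([rec[name] for rec in records], response)
    results[name] = (r, confidence_level(r, len(records)))

# Pre-screen: keep variables that meet the 95% criterion.
selected = [name for name, (r, conf) in results.items() if conf >= 95]
print(results)
print(selected)
```

In this constructed file the response rate falls with age and is identical across household sizes, so age comes out with a negative coefficient and passes the screen, while household size has a coefficient of essentially zero and is dropped.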

15
The Concept of Correlation
  • Using correlation analysis for selecting
    variables for our response model.
  • Analytical file contains six variables

Dependent Variable/Modelled Variable: Response

Independent Variables
  • Age
  • Tenure
  • # of Products
  • # of Promotions
  • Income
  • Household Size

  • The key diagnostics in this routine are:
  • Correlation coefficient
  • Confidence level

16
Correlation Coefficient
17
Correlation Analysis
  • The male gender variable has a perfect
    correlation of 1.
  • The female gender variable has a perfect
    correlation of -1.
  • Household size has no correlation with response,
    hence the correlation coefficient is 0.

18
Correlation Results
  • Show the level of confidence in the relationship
    between a given variable and the modelled
    behaviour, i.e. response

Correlation coefficient
Confidence Interval
19
Correlation
  • Why couldn't we just use the results of correlation
    to create the model and create index values for each
    significant variable?
  • Age
  • Tenure
  • # of products purchased
  • # of promotions since last purchase

Because there is interaction between variables
that needs to be accounted for in the modelling
exercise (multicollinearity). You can review this
concept in more detail in any introductory stats
textbook.
20
Examples-Correlation-Response Model
  • Listed below is an example of a correlation matrix
  • Answer the following:
  • Is each variable relevant?
  • - All, with the exception of live in Quebec, # in
    household, and # of months since last purchase
  • What is the relationship or impact of each
    variable with response?
  • - The sign of the coefficient tells you the
    relationship, while the size of the corr. coeff.
    tells you the impact
  • What is the strongest variable and what is the
    weakest variable?
  • Strongest var: # of months since last promoted.
    Weakest var: live in Quebec

21
More examples of correlation
  • - Younger people are more likely to respond
  • - Higher income people are more likely to respond
  • - Males are less likely to respond
  • Would the correlation values against response for
    the above variables be highly positive, close to
    zero, or negative for age, income, and females?
  • Age: highly negative
  • Income: highly positive
  • Females: highly positive
  • People who live in Quebec exhibit no impact on
    response; people with high tenure and a high number
    of months since last promotion are less likely to
    respond. Would the correlation values against
    response for each variable be highly
    positive, close to zero, or negative?
  • Quebec: close to zero
  • Tenure: highly negative
  • Number of months since last promotion: highly
    negative

22
More examples of correlation
  • Previous analysis has indicated the following
    trends
  • Would the correlations be closer to 1, -1, or 0
    here for both variables?

Spending: close to 0
Tenure: close to -1
23
More examples of correlation
  • Would the correlations be closer to 1, -1, or 0
    here for both variables?

Spending: close to 1
Tenure: close to 0
  • What is the learning here vs. the previous
    slide? The variables have changed in their
    impact on response

24
Exploratory Data Analysis Reports(EDA)
  • After looking at the correlation reports, we also
    need to create EDA reports which help to better
    understand the relationship of a given variable
    with the desired marketing behaviour.
  • It helps the business people and marketers to get
    inside the so-called black box of modelling.

25
Exploratory Data Analysis Reports(EDA)
26
Exploratory Data Analysis Reports(EDA)
  • Let's take a look at an example of a binary variable

Male      # of Observations   Response Rate (%)
Yes       50000               2.00
No        50000               2.60
Average   100000              2.30
On the next page are some examples of EDA reports
of variables that are not statistically
significant according to the correlation matrix.
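A small sketch of how an EDA report for the binary Male variable could be computed from the counts in the example. It assumes the response rates are percentages. The index column (group rate relative to the average, with 100 meaning average) is a common convention rather than something the slides define.

```python
# From the binary-variable example: (group, observations, response rate in %).
report = [("Male = Yes", 50000, 2.00), ("Male = No", 50000, 2.60)]

total_obs = sum(obs for _, obs, _ in report)
total_responders = sum(obs * rate / 100 for _, obs, rate in report)
average_rate = 100 * total_responders / total_obs  # overall response rate in %

rows = []
for group, obs, rate in report:
    # Assumed index definition: group rate relative to the average (100 = average).
    index = 100 * rate / average_rate
    rows.append((group, obs, rate, round(index)))

for row in rows:
    print(row)
print("Average", total_obs, round(average_rate, 2))
```

An index below 100 for males and above 100 for non-males expresses the same pattern as the response rates, in a form that is easy to compare across variables.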
27
Exploratory Data Analysis Reports(EDA)
  • EDAs of variables that are not statistically significant

28
Exploratory Data Analysis Reports

What does this tell us?
29
Exploratory Data Analysis Reports
What does this mean?
30
Creating the Final Model
  • Why couldn't we just use the results of correlation
    to create the model and create index values for each
    significant variable?
  • Age
  • Tenure
  • # of products purchased
  • # of promotions since last purchase

Think Statistics here?
31
The Data Mining Process Application of Data
Mining Techniques-Creating the Final Model
  • Problems with Multicollinearity
  • Example: Years of Education and Income on
    Response Rate
  • Regression equation is:
  • Response = .5 + .00001 X income - .03 X yrs. of
    education

Correlation with Response
                          Income   Years of Education
Correlation Coefficient   0.11     0.12
Confidence Interval       99%      99.50%
What is the problem here and what do you do?
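One way to see how a variable can correlate positively with response yet carry a negative regression coefficient is the sketch below. The six data points are hypothetical and constructed so that income and education move together. The response is generated from the slide's equation, and the two-predictor fit is plain least squares via the normal equations.

```python
import math

# Hypothetical data, built so that income and education move together.
income = [30000, 40000, 50000, 60000, 70000, 80000]
education = [10, 12, 11, 14, 15, 16]
# Response generated exactly from the slide's equation.
response = [0.5 + 0.00001 * inc - 0.03 * edu for inc, edu in zip(income, education)]


def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def corr(a, b):
    """Pearson correlation of two already-centered lists."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


inc_c, edu_c, resp_c = centered(income), centered(education), centered(response)

# Simple correlations: each predictor looks positive on its own.
r_income = corr(inc_c, resp_c)
r_education = corr(edu_c, resp_c)
r_predictors = corr(inc_c, edu_c)

# Two-predictor least squares via the normal equations.
s11, s22, s12 = dot(inc_c, inc_c), dot(edu_c, edu_c), dot(inc_c, edu_c)
s1y, s2y = dot(inc_c, resp_c), dot(edu_c, resp_c)
det = s11 * s22 - s12 ** 2
b_income = (s22 * s1y - s12 * s2y) / det
b_education = (s11 * s2y - s12 * s1y) / det

print(round(r_income, 3), round(r_education, 3), round(r_predictors, 3))
print(b_income, b_education)
```

Here education is positively correlated with response on its own, but its coefficient is negative once income is in the equation, because the two predictors are themselves highly correlated. A typical response is to keep only one of the two, or to derive a single combined variable.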
32
Continuing to build the model
  • Multivariate analytical techniques such as
    multiple regression, logistic regression, etc. may
    be employed to produce the final model
  • Final equation: Predicted Response Rate = A +
    B1 X Age + B2 X Tenure
  • What is the problem here?

33
Continuing to build the model
34
Continuing to build the model
  • After observing correlation results and EDAs,
    what can we begin to do at this point?
  • Derive new variables: EDAs
  • Derive new variables: multicollinearity
  • Derive new variables: Factor Analysis
  • Derive new variables: CHAID (will explore later)

Reference Material
Factor Analysis: look up in any Statistics Handbook
Regression: look up in textbook under Regression
and Statistics
35
Continuing to build the model
  • Running further statistical routines, we are able
    to develop a final model. The marketer or
    business person should receive a report that
    looks as follows

For those of you who have statistics training,
how is the Contribution to Model calculated?
36
Continuing to Build the Model
Variable Entered   Partial R-Square   Model R-Square
var 4              0.0036             0.0036
var 3              0.0034             0.007
var 1              0.0016             0.0086
var 2              0.0007             0.0092
var 6              0.0009             0.0102
var 5              0.0003             0.0105
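One common way to turn a stepwise table like this into a "contribution to model" figure is to divide each variable's partial R-square by the final model R-square. The sketch below applies that convention to the values in the table. It is an assumption about how the report was built, not something the slides state.

```python
# Partial R-square values in the order the variables entered the model.
partial_r2 = {
    "var 4": 0.0036, "var 3": 0.0034, "var 1": 0.0016,
    "var 2": 0.0007, "var 6": 0.0009, "var 5": 0.0003,
}

# The partial values add up to the final model R-square (0.0105).
model_r2 = sum(partial_r2.values())

# Assumed convention: contribution = partial R-square / final model R-square.
contribution = {var: 100 * r2 / model_r2 for var, r2 in partial_r2.items()}

for var, pct in contribution.items():
    print(f"{var}: {pct:.1f}%")
```

Under this convention var 4 and var 3 together account for about two thirds of the model's explanatory power, and the contributions sum to 100%.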
37
Continuing to Build the Model
What would be the final equation in terms of the
sign?
38
Continuing to build the model
  • What would you do here

39
Continuing to build the model
  • Suppose we have the following equation:
  • Response = .09
  • + .05 X Income
  • + .06 X Tenure
  • + .08 X Product Spend
  • - .04 X Male
  • What is the problem here?