Transcript and Presenter's Notes

Title: MKT 700 Business Intelligence and Decision Models


1
MKT 700 Business Intelligence and Decision Models
  • Algorithms and
  • Customer Profiling (1)

2
Classification and Prediction
3
Classification: Unsupervised Learning
4
Predicting: Supervised Learning
5
SPSS Direct Marketing
                       Classification                                  Predictive
Unsupervised Learning  RFM, Cluster Analysis, Postal Code Responses    NA
Supervised Learning    Customer Profiling                              Propensity to Buy
6
SPSS Analysis
                       Classification                                             Predictive
Unsupervised Learning  Hierarchical Cluster, Two-Step Cluster, K-Means Cluster    NA
Supervised Learning    Classification Trees (CHAID, CART)                         Linear Regression, Logistic Regression, Artificial Neural Nets
7
Major Algorithms
                       Classification                                  Predictive
Unsupervised Learning  Euclidean Distance, Log Likelihood              NA
Supervised Learning    Chi-Square Statistics, Log Likelihood,          Log Likelihood,
                       GINI Impurity Index, F-Statistics (ANOVA)       F-Statistics (ANOVA)

Nominal variables: Chi-Square, Log Likelihood
Continuous variables: F-Statistics, Log Likelihood
8
Euclidean Distance
9
Euclidean Distance for Continuous Variables
  • Pythagorean distance: d = √(a² + b²)
  • Euclidean space: d = √(a² + b² + c²)
  • Euclidean distance: d = (Σ dᵢ²)^½ (cluster
    analysis with continuous variables)
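The distance formula above can be checked with a short Python snippet; the two points are made-up illustrations (e.g., two customers described by two standardized attributes):

```python
import math

# Two illustrative points in 2-D; the coordinates are invented for the example.
a = (2.0, 3.0)
b = (5.0, 7.0)

# Euclidean distance: square root of the sum of squared coordinate differences.
d = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
print(d)  # 5.0, since sqrt(3^2 + 4^2) = 5
```

Python 3.8+ also provides `math.dist(a, b)`, which computes the same quantity.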

10
Pearson's Chi-Square
11
Contingency Table
       North   South   East   West   Tot.
Yes      68      75     57     79     279
No       32      45     33     31     141
Tot.    100     120     90    110     420
12
Observed and Theoretical Frequencies
        North     South     East      West      Tot.
Yes     68 (66)   75 (80)   57 (60)   79 (73)   279 (66%)
No      32 (34)   45 (40)   33 (30)   31 (37)   141 (34%)
Tot.    100       120       90        110       420
(Expected frequencies in parentheses.)
13
Chi-Square
Obs.   fo   fe   fo - fe   (fo - fe)²   (fo - fe)²/fe
1,1    68   66      2          4           .0606
1,2    75   80     -5         25           .3125
1,3    57   60     -3          9           .1500
1,4    79   73      6         36           .4932
2,1    32   34     -2          4           .1176
2,2    45   40      5         25           .6250
2,3    33   30      3          9           .3000
2,4    31   37     -6         36           .9730
                                      X² = 3.032
14
Statistical Inference
  • DF = (4 columns - 1) × (2 rows - 1) = 3
  • X² = 3.032 < 7.815 (critical value at p = .05, DF = 3),
    so independence cannot be rejected
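The same test can be run with SciPy. Note that `chi2_contingency` works from unrounded expected frequencies, so its statistic (about 2.76) differs slightly from the slide's 3.032, which was computed from expected counts rounded to integers:

```python
from scipy.stats import chi2_contingency

# Observed frequencies from the region-by-response contingency table.
observed = [[68, 75, 57, 79],   # Yes
            [32, 45, 33, 31]]   # No

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.3f}")
# df = (4 - 1) * (2 - 1) = 3; chi2 stays well below the 7.815
# critical value at p = .05 (p is roughly 0.43), so independence
# is not rejected.
```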

15
Log Likelihood Chi-Square
16
Log Likelihood
  • Based on probability distributions rather than
    contingency (frequency) tables.
  • Applicable to both categorical and continuous
    variables, unlike chi-square, which requires
    variables to be discretized.

17
Contingency Table (Observed Frequencies)
         Cluster 1   Cluster 2   Total
Male        10          30         40
18
Contingency Table (Expected Frequencies)
         Cluster 1   Cluster 2   Total
Male     10 (20)     30 (20)      40
(Expected frequencies in parentheses.)
19
Chi-Square
Obs.   fo   fe   fo - fe   (fo - fe)²   (fo - fe)²/fe
1,1    10   20    -10         100          5.00
1,2    30   20     10         100          5.00
                                     X² = 10.00
p < 0.05; DF = 1; critical value = 3.84
20
Log Likelihood Distance / Probability
              Cluster 1              Cluster 2
Male          O = 10, E = 20         O = 30, E = 20

O/E           10/20 = 0.50           30/20 = 1.50
Ln(O/E)       -.693                  .405
O × Ln(O/E)   10 × -.693 = -6.93     30 × .405 = 12.164

2 × Σ O × Ln(O/E) = 2 × (-6.93 + 12.164) = 10.46
p < 0.05; critical value = 3.84
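The log-likelihood (G) statistic above can be reproduced directly from the observed and expected counts:

```python
import math

# Observed and expected male counts in the two clusters (from the slide).
observed = [10, 30]
expected = [20, 20]

# G = 2 * sum(O * ln(O/E)); compared against the same chi-square
# critical value (3.84 for DF = 1, p = .05).
g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
print(round(g, 2))  # 10.46, above the 3.84 critical value
```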
21
Variance, ANOVA, and F-Statistics
22
F-Statistics
  • For metric or continuous variables
  • Compares the explained variance (captured by the
    model) with the unexplained variance (the errors)

23
Variance
VALUE   MEAN   SQUARED DIFFERENCE
 20     43.6        557
 34     43.6         92.16
 34     43.6         92.16
 38     43.6         31.36
 38     43.6         31.36
 40     43.6         12.96
 41     43.6          6.76
 41     43.6          6.76
 41     43.6          6.76
 42     43.6          2.56
 43     43.6          0.36
 47     43.6         11.56
 47     43.6         11.56
 48     43.6         19.36
 49     43.6         29.16
 49     43.6         29.16
 55     43.6        130
 55     43.6        130
 55     43.6        130
 55     43.6        130

COUNT   20      SS    1461
MEAN    43.6    DF    19
                VAR   76.88
                SD    8.768

SS = Sum of Squares; DF = N - 1; VAR = SS/DF; SD = √VAR
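These summary statistics can be reproduced with Python's standard library as a quick check of the slide's arithmetic (the slide rounds SS to 1461 and some squared differences to whole numbers):

```python
import statistics

# The 20 values from the variance slide.
values = [20, 34, 34, 38, 38, 40, 41, 41, 41, 42, 43,
          47, 47, 48, 49, 49, 55, 55, 55, 55]

mean = statistics.mean(values)             # MEAN
ss = sum((x - mean) ** 2 for x in values)  # SS, sum of squared differences
var = ss / (len(values) - 1)               # VAR = SS / DF, with DF = N - 1
sd = var ** 0.5                            # SD = sqrt(VAR)
print(f"mean={mean}, SS={ss:.1f}, VAR={var:.2f}, SD={sd:.3f}")
# mean=43.6, SS=1460.8, VAR=76.88, SD=8.768
```

`statistics.variance(values)` and `statistics.stdev(values)` give the same VAR and SD in one call each.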
24
ANOVA
  • Two groups: t-test
  • Three or more groups: are the errors
    (discrepancies between observations and the
    overall mean) explained by group membership or by
    some other (random) effect?

25
One-way ANOVA

Group 1   Group 2   Group 3
   6         8         3
   5         9         2
   4         7         1
   5         8         3
   4         9         2
   6         7         1
   5         8         3
   4         9         2

Group means: 4.875, 8.125, 2.125; grand mean: 5.042

Squared deviations of each observation from its group mean
sum to SS Within = 14.625; squared deviations from the grand
mean sum to Total SS = 158.958.
26
MSS(Between)/MSS(Within)

           Within Groups   Between Groups   Total Errors
SS            14.625          144.333         158.958
DF          24 - 3 = 21      3 - 1 = 2       24 - 1 = 23
Mean SS        0.696          72.167            6.911

F = Between Groups Mean SS / Within Groups Mean SS
  = 72.167 / 0.696 = 103.624, p-value < .05
27
ONEWAY (Excel or SPSS)

Anova: Single Factor

SUMMARY
Groups     Count   Sum   Average   Variance
Group 1      8      39    4.875     0.696
Group 2      8      65    8.125     0.696
Group 3      8      17    2.125     0.696

ANOVA
Source of Variation      SS      df     MS        F       P-value     F crit
Between Groups        144.333     2   72.167   103.624   1.318E-11    3.467
Within Groups          14.625    21    0.696
Total                 158.958    23
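Beyond Excel and SPSS, the same one-way ANOVA can be reproduced with SciPy; a minimal sketch using the three groups from the slides:

```python
from scipy.stats import f_oneway

# The three groups from the one-way ANOVA slides.
group1 = [6, 5, 4, 5, 4, 6, 5, 4]
group2 = [8, 9, 7, 8, 9, 7, 8, 9]
group3 = [3, 2, 1, 3, 2, 1, 3, 2]

# f_oneway returns the F statistic and its p-value.
f, p = f_oneway(group1, group2, group3)
print(f"F = {f:.3f}, p = {p:.3e}")
# F ≈ 103.624 with p ≈ 1.3e-11, matching the table above.
```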



28
Profiling
29
Customer Profiling: Documenting or Describing
  • Who is likely to buy or not respond?
  • Who is likely to buy what product or service?
  • Who is in danger of lapsing?

30
CHAID or CART
  • CHAID: Chi-Square Automatic Interaction Detector
  • Based on chi-square
  • All variables discretized
  • Dependent variable: nominal
  • CART: Classification and Regression Tree
  • Variables can be discrete or continuous
  • Based on GINI or F-test
  • Dependent variable: nominal or continuous

31
Use of Decision Trees
  • Classify observations from a target binary or
    nominal variable → Segmentation
  • Predictive response analysis from a target
    numerical variable → Behaviour
  • Decision support rules → Processing

32
Decision Tree
33
Example: dmdata.sav
  • Underlying theory: χ²

34
CHAID Algorithm: Selecting Variables
  • Example: Region (4), Gender (3, including Missing),
    Age (6, including Missing)
  • For each variable, collapse categories to
    maximize the chi-square test of independence,
    e.g., Region (N, S, E, W) → (WSE, N)
  • Select the most significant variable
  • Go to the next branch and the next level
  • Stop growing if estimated χ² < theoretical χ²
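The category-collapsing step can be sketched by brute force for one variable: try every two-way grouping of the categories and keep the one with the largest chi-square. The counts below reuse the region-by-response contingency table from the chi-square slides; the (WSE, N) grouping on the slide is purely illustrative, and with these particular counts the maximizing split happens to pair S and E against N and W.

```python
from itertools import combinations
from scipy.stats import chi2_contingency

# (Yes, No) counts per region, from the contingency-table slide.
counts = {"N": (68, 32), "S": (75, 45), "E": (57, 33), "W": (79, 31)}

def collapse(groups):
    """Sum the (Yes, No) counts over each group of regions -> 2x2 table."""
    table = [[0, 0], [0, 0]]
    for j, group in enumerate(groups):
        for r in group:
            yes, no = counts[r]
            table[0][j] += yes
            table[1][j] += no
    return table

regions = sorted(counts)
best = None
# Try every way of splitting the four regions into two non-empty groups
# (each split is visited twice, left/right swapped; that is harmless here).
for size in range(1, len(regions)):
    for left in combinations(regions, size):
        right = tuple(r for r in regions if r not in left)
        stat, _, _, _ = chi2_contingency(collapse((left, right)),
                                         correction=False)
        if best is None or stat > best[0]:
            best = (stat, left, right)

chi2, left, right = best
print(f"best split: {left} vs {right}, chi2 = {chi2:.2f}")
```

A real CHAID implementation merges categories iteratively and applies Bonferroni-adjusted significance tests rather than exhaustive search; this sketch only shows the "collapse to maximize chi-square" idea.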

35
CART (Nominal Target)
  • Nominal targets
  • GINI (impurity reduction or entropy)
  • Based on the squared probabilities of node membership
  • Gini = 0 when targets are perfectly classified
  • Gini index = 1 - Σ pᵢ²
  • Example
  • P(Bus) = 0.4, P(Car) = 0.3, P(Train) = 0.3
  • Gini = 1 - (0.4² + 0.3² + 0.3²) = 0.66
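The Gini example above reduces to a one-liner:

```python
# Gini impurity for a node, using the slide's example probabilities.
probs = {"Bus": 0.4, "Car": 0.3, "Train": 0.3}

# Gini = 1 - sum of squared class probabilities.
gini = 1 - sum(p ** 2 for p in probs.values())
print(round(gini, 2))  # 0.66 = 1 - (0.16 + 0.09 + 0.09)
```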

36
CART (Metric Target)
  • Continuous Variables
  • Variance Reduction (F-test)

37
Comparative Advantages (from Wikipedia)
  • Simple to understand and interpret
  • Requires little data preparation
  • Able to handle both numerical and categorical data
  • Uses a white-box model easily explained by
    Boolean logic
  • Possible to validate a model using statistical
    tests
  • Robust
