Logistic Regression

Transcript and Presenter's Notes
1
Logistic Regression
2
What is Logistic Regression?
  • Also known as the logit model
  • Used when you have a binary dependent variable
  • Response: yes / no
  • Good credit risk / bad credit risk
  • Buyer / non-buyer
  • Left / stayed (attrition models)
  • Like regression, can include metric and
    categorical variables, interactions, non-linear
    terms
  • Can be extended to dependent variables that take
    more than 2 values (e.g. brand choice models),
    but these models get complicated

3
What's wrong with OLS (ordinary least squares
regression)?
  • Although the actual Y values are all 0 or 1, the
    fitted values are not and are interpreted as
    probabilities
  • There are 3 basic problems with this approach
  • Violates the OLS assumption that the error terms
    are normally distributed
  • Violates the OLS assumption of homoscedasticity
  • And, perhaps most importantly, leads to
    predicted probabilities that are negative or
    greater than one (illustrated in the sketch
    below)
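
A minimal sketch of that third problem, using a small made-up
income/home-ownership dataset (the data and variable names are
illustrative, not from the slides): an OLS fit to a 0/1 outcome can
return fitted "probabilities" outside [0, 1].

    # OLS on a binary outcome: fitted values can escape the [0, 1] range.
    import numpy as np

    income = np.array([10, 20, 30, 40, 60, 80, 120, 200], dtype=float)  # in $000s
    owns_home = np.array([0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

    X = np.column_stack([np.ones_like(income), income])   # intercept + income
    beta, *_ = np.linalg.lstsq(X, owns_home, rcond=None)  # ordinary least squares
    fitted = X @ beta

    print(fitted.round(3))   # the largest income gets a fitted "probability" > 1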

4
Out-of-range predicted probabilities
  • Is a linear model appropriate?
  • Consider the probability of owning a home as a
    function of income

5
What about a non-linear model?
  • Note that the effect of an increase of, say,
    $10,000 in income on the probability of owning a
    home (or defaulting on a loan) depends on the
    starting point. That is, an increase from
    $40,000 to $50,000 will not have the same effect
    as an increase from $1,000,000 to $1,010,000
    (see the sketch below)
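
A minimal numeric illustration of this point, using hypothetical
coefficients chosen only for the example (they are not estimates from
the slides):

    # Under a logistic curve, the same $10,000 increase in income moves
    # P(own home) a lot near the middle of the curve and almost not at
    # all once the probability is already near 1.
    import math

    def p_own_home(income, b0=-3.0, b1=0.00005):   # assumed illustrative parameters
        return 1 / (1 + math.exp(-(b0 + b1 * income)))

    print(p_own_home(50_000) - p_own_home(40_000))         # roughly +0.11
    print(p_own_home(1_010_000) - p_own_home(1_000_000))   # essentially 0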

6
Regression Model vs. Logit Model
  • Logit model: P(Y=1) = 1/(1 + exp(-bX))
  • Try some values of bX in this function
    (reproduced in the sketch after this list):
    bX = -5, P(Y=1) = .01;  bX = -1, P(Y=1) = .27;
    bX = 0, P(Y=1) = .5;  bX = 1, P(Y=1) = .73;
    bX = 5, P(Y=1) = .99
  • Always between 0 and 1. (+)
  • Can be rewritten as ln(p/(1-p)) = Σ biXi, so now
    the log of the odds is a function of the Xs,
    which complicates the interpretation. (-)
  • Regression model: Y = b0 + b1X1 + b2X2 + ... = ΣbX
  • Depending on the values of the predictor
    variables, the predicted values for Y are
    unbounded. (-)
  • But coefficients are simple to interpret: for
    every unit increase in X1, the predicted value
    of Y changes by b1. (+)
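
A minimal sketch reproducing the values above (rounded to two decimals):

    # The logistic response 1/(1 + exp(-bX)) is bounded between 0 and 1
    # for any value of bX.
    import math

    for bx in (-5, -1, 0, 1, 5):
        p = 1 / (1 + math.exp(-bx))
        print(f"bX = {bx:+d}  ->  P(Y=1) = {p:.2f}")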

7
A Simple Logit Example: P(redeem coupon) =
f(coupon value in $)
  • Collected data on whether coupons of differing
    values were redeemed and estimated a logistic
    regression model
  • Constant: -2.18506
  • Coefficient for coupon value: .1087
  • P(redeem) = 1/(1 + exp(2.18506 - .1087X)), or
  • ln(odds) = ln[p(redeem)/p(not redeem)]
    = -2.18506 + .1087X
  • Interpretation of .1087?
  • exp(.1087) = 1.1148
  • Thus, an additional $1 off is estimated to
    increase the odds that the coupon will be
    redeemed by 11.48% (i.e. multiply the odds by
    1.1148) (see the sketch below)
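
A minimal sketch plugging the reported constant and coefficient into
the model (only the two estimates above are taken from the slide):

    import math

    b0, b1 = -2.18506, 0.1087          # constant and coupon-value coefficient

    def p_redeem(value):               # coupon value in dollars
        return 1 / (1 + math.exp(-(b0 + b1 * value)))

    print(p_redeem(10))                # about 0.25 for a $10 coupon
    print(math.exp(b1))                # about 1.1148: each extra $1 multiplies the odds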

8
Probabilities, odds and odds ratios
  • Probability is the likelihood of an event and
    is bounded between 0 and 1
  • Odds is the ratio of two probabilities: here,
    the probability of a coupon being redeemed to
    the probability of it not being redeemed
  • Odds ratio is the ratio of two odds
  • p(redeem) = 1/(1 + exp(-Σbx))
    = 1/(1 + exp(2.18506 - .10870x))
  • $10: p(y=1) = 1/(1 + exp(2.18506 - .10870(10)))
    = .2501
  • $11: p(y=1) = 1/(1 + exp(2.18506 - .10870(11)))
    = .2710

          p(y=1)   p(y=0)   odds     odds ratio
    $10   .2501    .7499    .3335
    $11   .2710    .7290    .3717    .3717/.3335 = 1.114
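
A minimal sketch reproducing the table above from the same fitted
constant and coefficient (small differences in the last decimal come
from rounding):

    import math

    b0, b1 = -2.18506, 0.10870
    odds = {}
    for value in (10, 11):
        p = 1 / (1 + math.exp(-(b0 + b1 * value)))
        odds[value] = p / (1 - p)
        print(f"${value}: p(y=1) = {p:.4f}, p(y=0) = {1 - p:.4f}, odds = {p / (1 - p):.4f}")

    print("odds ratio:", round(odds[11] / odds[10], 4))   # about 1.1148 = exp(b1)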

9
Typical CRM applications of logistic regression
10
Cross-Selling Models
  • Market Basket Approach or Affinity Grouping
  • Look at historical patterns of which products
    people own/buy
  • Customers who have some but not all of a common
    set are assumed to be good prospects
  • Individual Propensity Models that vote
  • Build a propensity-to-buy model for each product
    individually (e.g. credit card, money market
    account, CDs, etc.)
  • P(own CD) = f(income, age, assets, ...)
  • P(home equity loan) = f(own home, income, ...)
  • Score each customer on each product
  • Best next offer is the one with the highest
    propensity to purchase

11
Attrition Models
  • Increasingly used by banks, financial
    institutions, telephone companies, clubs and
    continuity programs
  • Use historical data on those who attrited (i.e.
    left) and those who stayed, score current
    customers on probability of attriting and decide
    whether to act or not (depending on LTV)
  • Did you know? California AAA was the first of 90
    AAA clubs to build an attrition model and to
    have a lifetime value score in each member's
    data record

12
Geometric Interpretation
  • LR can be thought of as finding a hyperplane to
    separate positive and negative data points
  • Suppose a classification problem with two input
    variables
  • Consider the data points displayed on the
    2-dimensional input space

13
Geometric Interpretation
[Figure: data points labeled "o" and "x" plotted in the (x1, x2) input
space, separated by a hyperplane that acts as the classifier]
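
A short standard derivation (not shown on the slides) of why the
boundary is a hyperplane: the classifier assigns the positive class
when the predicted probability is at least 1/2, and because the
logistic function is monotone this happens exactly on one side of a
linear boundary.

    P(y = 1 \mid x) \;=\; \frac{1}{1 + e^{-\beta^{\top} x}} \;\ge\; \tfrac{1}{2}
    \quad\Longleftrightarrow\quad
    \beta^{\top} x \;\ge\; 0

So the set of points with βᵀx = 0 is the separating hyperplane sketched
above.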
14
Learning: Parameter Estimation
  • Maximum Likelihood Estimation (MLE)
  • Consider each record in X as a Bernoulli trial
    with mean P and variance P(1-P)
  • We may interpret the expectation function as the
    probability that y=1, or equivalently that xi
    belongs to the positive class. Let
    σ(x, β) = P = 1/(1 + exp(-βᵀx))

15
Learning: Parameter Estimation
  • Likelihood and log-likelihood of the data (X, y)
    under the LR model with parameter β:
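
In standard form, with pi = σ(xi, β) as defined on the previous slide,
these are:

    L(\beta) \;=\; \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{1 - y_i},
    \qquad
    \ell(\beta) \;=\; \log L(\beta)
    \;=\; \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr]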

16
Learning: Parameter Estimation
  • Maximum Likelihood Estimation (MLE)
  • The likelihood and log-likelihood functions are
    nonlinear in β and cannot be maximized
    analytically
  • Numerical methods are typically used to find the
    MLE of β
  • Conjugate gradient is a popular choice (see the
    sketch below)
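
A minimal sketch (with made-up synthetic data) of the numerical
approach: minimize the negative log-likelihood with SciPy's
conjugate-gradient method.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + one predictor
    true_beta = np.array([-1.0, 2.0])
    y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

    def neg_log_lik(beta):
        z = X @ beta
        # -log L = sum over records of y*log(1+e^(-z)) + (1-y)*log(1+e^(z))
        return np.sum(y * np.log1p(np.exp(-z)) + (1 - y) * np.log1p(np.exp(z)))

    def gradient(beta):
        p = 1 / (1 + np.exp(-(X @ beta)))
        return X.T @ (p - y)             # gradient of the negative log-likelihood

    fit = minimize(neg_log_lik, x0=np.zeros(2), jac=gradient, method="CG")
    print(fit.x)                         # estimated coefficients, close to true_beta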

17
Advantages
  • De facto classifier in many fields, including
    marketing
  • Simple to understand, build and use (cf.
    decision trees)
  • Similar to regression: interpretable results
  • Probabilities are easily interpreted
  • Can assess which variables are significant and
    important
  • Can assess which variables have positive and
    negative effects on the dependent variable

18
Limitations
  • Linear: cannot handle nonlinear decision
    boundaries
  • Requires knowledge of which variables to include
  • Solution: stepwise regression (forward,
    backward, etc.) or automatic variable selection
    (see the sketch below)
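
A minimal sketch of automatic variable selection (made-up synthetic
data; scikit-learn's SequentialFeatureSelector scores candidate subsets
by cross-validated accuracy rather than the p-value tests of classical
stepwise regression, but the idea is the same):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    # Synthetic binary-outcome data with 10 candidate predictors, 3 informative.
    X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                               random_state=0)

    selector = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                         n_features_to_select=3,
                                         direction="forward")   # or "backward"
    selector.fit(X, y)
    print(selector.get_support(indices=True))   # indices of the selected predictors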