Linear and Logistic Regression - PowerPoint PPT Presentation

Slides: 35
Provided by: jeffbr
Transcript and Presenter's Notes


1
Linear and Logistic Regression
2
Where Are We Going Today?
  • A linear regression example
  • Data: how to obtain and manipulate it
  • Cleaning the data - Splus
  • Analysis
  • Issues
  • Interpretation
  • How to present the results meaningfully
  • Application
  • Description, forecasting/prediction
  • Traps for the unwary
  • Logistic regression
  • Conclusions

3
An example?
  • Insurance company claims satisfaction

4
Background
  • Top secret company - insurance
  • Claims satisfaction
  • 546 persons asked to rate aspects of service and
    then overall satisfaction/likelihood to recommend
    on a 5-point scale
  • We recommend 10 point scale - as more natural to
    respondents (1-10)
  • Major storm in a teacup

5
Questionnaire: explanatory variables
  • Thinking firstly about the service you received
    from (top secret). I am going to read you some
    statements about this service and as I read you
    each statement, please give your opinion using a
    five-point scale where 1 is extremely
    dissatisfied and 5 extremely satisfied
  • (read, rotate (start at x). write in (one digit)
    per statement)
  • How satisfied or dissatisfied are you with ...
  • ... everything being kept straightforward
  • ... being kept in touch while the claim was being
    processed
  • ... the general manner and attitude of the staff
    you dealt with
  • ... your claim being dealt with promptly
  • ... being treated fairly

6
Questionnaire: dependent variables
  • 4a Using the same five-point scale as previously
    where 1 is extremely dissatisfied and 5
    extremely satisfied, how satisfied or
    dissatisfied were you with the overall service
    you received from (Top secret) ?
  • write in (one digit)
  • 4b And, using a five-point scale where 1 is
    extremely unlikely and 5 extremely likely, how
    likely or unlikely are you to recommend (Top
    secret) insurance to others?
  • write in (one digit)

7
Data
  • Get DP to create an Excel file with all the data
  • Make yourself familiar with Excel formats
  • Clean data
  • Then start analysing the data
  • Use data to describe each aspect of service
  • the time taken to get an appointment with the
    loss adjustor
  • the convenience of meeting with the loss
    adjustor
  • the general manner and attitude of the loss
    adjustor you dealt with
  • being kept in touch while your claim was
    processed...
  • the time taken for repairs to be completed

8
Data
9
Some Code for cleaning / inspecting
  • # cleaning the data
  • Regress.eg[,-1][Regress.eg[,-1]==6] <- NA
  • sum(is.na(Regress.eg))
  • [1] 49
  • mn <- apply(Regress.eg, 2, mean, na.rm=T)
  • for (i in 2:ncol(Regress.eg)) {
  •   id <- is.na(Regress.eg[,i])
  •   Regress.eg[id,i] <- mn[i]
  • }
  • # let's look at this with a bit of jitter
  • Regress.eg2 <- Regress.eg2 + matrix(rnorm(nrow(Regress.eg2)*ncol(Regress.eg2), 0, .1), ncol=ncol(Regress.eg2))
  • Regress.eg2 <- Regress.eg2[,-1]
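
The cleaning steps above (fill each missing rating with its column mean, then add a little noise so overlapping points separate in a plot) can be sketched in plain Python. This is only an illustration: the ratings are made up, `None` stands in for a missing value, and the function names are hypothetical rather than taken from the slides.

```python
import random

def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def jitter(rows, sd=0.1, seed=0):
    """Add normal noise (mean 0, standard deviation sd) to every value."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0, sd) for v in r] for r in rows]

# Made-up ratings on a 5-point scale; None marks a missing answer.
ratings = [[5, 4], [None, 5], [3, None], [4, 3]]
clean = impute_column_means(ratings)
jittered = jitter(clean)
print(clean)
```

Mean imputation keeps all respondents in the analysis but shrinks the variance of the imputed columns, so it is best treated as a convenience rather than a neutral choice.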

10
Matrix plot (with jitter)
11
More Code
  • # let's analyse this data
  • apply(Regress.eg, 2, mean)
  • cor(Regress.eg2)
  • Regress.eg.coeff <- NULL
  • for (i in 2:6)
  •   Regress.eg.coeff <- c(Regress.eg.coeff, lm(Regress.eg[,7] ~ Regress.eg[,i])$coeff[2])
  • Regress.eg.mlr.coeff <- lm(formula = Satisfaction ~ Straightforward + kept.in.touch + manner.attitude + prompt + fairly, data = Regress.eg, na.action = na.exclude)$coeff
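
The loop above fits one straight line per aspect of service and keeps only the slope. A minimal Python sketch of that idea follows; it uses the textbook least-squares formulas, with made-up ratings and hypothetical names (`fit_line`, `aspects`), and is not the S-Plus code from the slides.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up ratings for two aspects and for overall satisfaction.
aspects = {
    "Straightforward": [5, 4, 3, 5, 2, 4],
    "prompt": [5, 5, 3, 4, 2, 4],
}
satisfaction = [5, 4, 3, 5, 1, 4]

# One simple regression per aspect; keep the slope only.
slopes = {name: fit_line(scores, satisfaction)[1]
          for name, scores in aspects.items()}
print(slopes)
```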

12
Output Code
  • > Regress.eg.mlr.coeff
  •  (Intercept) Straightforward kept.in.touch
  •  -0.08951399       0.3802814     0.1624232
  •  manner.attitude    prompt    fairly
  •       0.08986848 0.2199223 0.1567801
  • > cbind(apply(Regress.eg, 2, mean)[2:6], cor(Regress.eg)[2:6, 7], Regress.eg.coeff, Regress.eg.mlr.coeff[-1])

  •                                    Regress.eg.coeff
  • Straightforward 4.329650 0.7982008        0.8010022
  • kept.in.touch   4.394834 0.7280380        0.7185019
  • manner.attitude 4.021359 0.6524997        0.5399704
  • prompt          4.544280 0.6774585        0.8653943
  • fairly          4.417440 0.7017079        0.6902109
  • Straightforward 0.38031150
  • kept.in.touch   0.16243157

13
Some issues
  • 5 point scale so definitely not normal
  • Note that the data is very left-skewed
  • Regression/correlation assumptions may not hold,
    except
  • CLT may kick in (546 observations)
  • Probably not the best approach, but still useful
  • Challenge: can anyone transform y (satisfaction)
    so it looks vaguely normal?
  • If so, how do we interpret these results?
  • Any other solutions?

14
Questions
  • With respect to overall satisfaction
  • What are the relationships, if any ?
  • Which are the most important?
  • What can I tell management?
  • Can I predict future scores?

15
Modelling is the answer. So what is modelling?
16
Essence of Modelling
  • Relationships
  • Understanding causation
  • Understanding the past
  • Predicting the future

A correlation does not imply causation
17
A relationship
  • See Excel spreadsheet

18
Interpretation
  • Correlation/R^2/Straight line equation
  • For one aspect of service (variable) at a time,
    correlation measures the strength of the straight line
    relationship (slr)
  • between -1 and 1
  • 0: no straight line relationship
  • NB this does not imply no relationship, just no slr!!
  • -1: perfect -ve slr, +1: perfect +ve slr
  • R^2 = corr. squared: 0.798201^2 = 0.6371
  • 100 x R^2 = % VARIATION EXPLAINED BY SLR
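
Two of the points above can be checked numerically: squaring the Straightforward correlation gives the quoted R^2, and a correlation of 0 only rules out a straight line, not every relationship. A small Python sketch follows; the `pearson` helper and the curved example data are illustrative assumptions, not part of the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Correlation between Straightforward and Satisfaction, as quoted on the slide.
r_slide = 0.7982008
r_squared = r_slide ** 2
print(round(100 * r_squared, 1), "% of variation explained")

# A perfect curved relationship (y = x^2) with zero straight-line correlation.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
r_curve = pearson(x, y)
print(r_curve)
```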

19
Interpretation...
  • Correlation/R^2 measure the strength of the slr,
  • not the actual relationship
  • The regression equation measures the size of the slr
    relationship
  • Satis = 0.8561
  •   + 0.801 x (Straightforward score)
  • e.g. if a respondent gives a 3 we predict
  • satis = 0.8561 + 0.801 x 3 = 3.3
  • Can use this to predict and set targets for KPIs
    (key performance indicators)
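
The prediction above is simple arithmetic, and the same equation can be turned around to set a target. A minimal Python sketch using the intercept and slope quoted on the slide; the function names and the 4.5 target are hypothetical.

```python
INTERCEPT = 0.8561  # from the slide
SLOPE = 0.801       # from the slide

def predict_satisfaction(straightforward_score):
    """Predicted overall satisfaction for a given Straightforward score."""
    return INTERCEPT + SLOPE * straightforward_score

def score_needed(target_satisfaction):
    """Straightforward score at which the line reaches the target."""
    return (target_satisfaction - INTERCEPT) / SLOPE

predicted = predict_satisfaction(3)
needed = score_needed(4.5)  # 4.5 is an arbitrary illustrative target
print(round(predicted, 2), round(needed, 2))
```

The target calculation only makes sense inside the 1 to 5 range that was actually observed.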

20
Multiple linear regression
  • Like SLR, except with more than one input
  • Correlation not applicable
  • R^2 has the same interpretation
  • e.g. 72% versus 64% for Straightforward alone
    as an input
  • Can predict in the same way, with more inputs
  • satis = -0.08951399
  •   + 0.3802814 x Straightforward
  •   + 0.1624232 x kept in touch
  •   + 0.08986848 x manner/attitude
  •   + 0.2199223 x prompt
  •   + 0.1567801 x fairly
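
Prediction with several inputs is the intercept plus a weighted sum of the ratings. The Python sketch below uses the coefficients listed above; the respondent's ratings are made up and `predict_mlr` is a hypothetical helper name.

```python
MLR_INTERCEPT = -0.08951399
MLR_COEFF = {
    "Straightforward": 0.3802814,
    "kept.in.touch": 0.1624232,
    "manner.attitude": 0.08986848,
    "prompt": 0.2199223,
    "fairly": 0.1567801,
}

def predict_mlr(ratings):
    """Intercept plus the sum of coefficient * rating over all aspects."""
    return MLR_INTERCEPT + sum(MLR_COEFF[name] * ratings[name]
                               for name in MLR_COEFF)

# A made-up respondent who rates every aspect 4 out of 5.
all_fours = {name: 4 for name in MLR_COEFF}
pred_fours = predict_mlr(all_fours)
print(round(pred_fours, 2))
```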

21
Traps for young players
  • All models are wrong, some are just more useful
    than others
  • Don't always assume it is a slr
  • Multiple regression may not help you much more
  • problems of multicollinearity (MC): redundancy
    of variables
  • Correlation does not imply causality
  • Predicting away from region you have analysed
    will probably be crapola!!
  • Anyone thought of a solution(s) yet?

22
Output Code
  • > Regress.eg.mlr.coeff
  •  (Intercept) Straightforward kept.in.touch
  •  -0.08951399       0.3802814     0.1624232
  •  manner.attitude    prompt    fairly
  •       0.08986848 0.2199223 0.1567801
  • > cbind(apply(Regress.eg, 2, mean)[2:6], cor(Regress.eg)[2:6, 7], Regress.eg.coeff, Regress.eg.mlr.coeff[-1])

  •                                    Regress.eg.coeff
  • Straightforward 4.329650 0.7982008        0.8010022
  • kept.in.touch   4.394834 0.7280380        0.7185019
  • manner.attitude 4.021359 0.6524997        0.5399704
  • prompt          4.544280 0.6774585        0.8653943
  • fairly          4.417440 0.7017079        0.6902109
  • Straightforward 0.38031150
  • kept.in.touch   0.16243157

23
More code
  • > summary(lm(formula = Satisfaction ~ Straightforward + kept.in.touch + manner.attitude + prompt + fairly, data = Regress.eg, na.action = na.exclude))
  • Call: lm(formula = Satisfaction ~ Straightforward + kept.in.touch + manner.attitude + prompt + fairly, data = Regress.eg, na.action = na.exclude)
  • Residuals:
  •     Min       1Q  Median    3Q   Max
  •  -3.687 -0.08301 0.04314 0.133 1.924
  • Coefficients:
  •                   Value Std. Error t value Pr(>|t|)
  • (Intercept)     -0.0895     0.1369 -0.6540   0.5134
  • Straightforward  0.3803     0.0404  9.4127   0.0000
  • kept.in.touch    0.1624     0.0370  4.3937   0.0000
  • manner.attitude  0.0899     0.0270  3.3274   0.0009

24
So what do we conclude?
  • Note in this case all the MLR estimates are +ve
  • Not always the case because of MC
  • Using the KISS approach, SLR is still useful
  • but note that there is not much difference between these
    values
  • So stretch out the differences by looking at
  • Index = slr coeff. x corr. coeff.
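
The index defined above multiplies each aspect's simple-regression slope by its correlation with overall satisfaction. A short Python sketch using the correlations and slopes from the output slides; the dictionary layout and variable names are my own.

```python
# (correlation with Satisfaction, simple-regression slope) per aspect,
# copied from the cbind output on the earlier slide.
aspects = {
    "Straightforward": (0.7982008, 0.8010022),
    "kept.in.touch": (0.7280380, 0.7185019),
    "manner.attitude": (0.6524997, 0.5399704),
    "prompt": (0.6774585, 0.8653943),
    "fairly": (0.7017079, 0.6902109),
}

# Index = slr coefficient x correlation coefficient.
importance = {name: corr * slope for name, (corr, slope) in aspects.items()}

# Aspects ordered from most to least important under this index.
ranked = sorted(importance, key=importance.get, reverse=True)
for name in ranked:
    print(f"{name:16s} {importance[name]:.3f}")
```

Multiplying the two figures widens the gaps between aspects whose slopes or correlations are individually close together.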

25
Presentation of results
  • Invented the Importance Index
  • individual regressions
  • avoids problems that can occur with
    multi-collinearity
  • adjusted by correlation
  • allows for level of explanation
  • produce performance by importance matrix

26
Presentation of results
[Plot: performance by importance matrix, with quadrants labelled Strengths, Concern, Maintain or divert, and Secondary drivers]
27
Interpretation of plot
  • Four quadrants
  • Strengths: high performance/high importance;
    keep up the good work
  • Maintain: high performance/low importance;
    don't let down your guard, maintain where
    possible
  • Secondary drivers: low performance/low
    importance; keep an eye on these, but they are
    not too important
  • Concern: low performance/high importance; this
    should be the priority area of improvement
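
The four quadrants amount to two yes/no questions: is performance high, and is importance high? The Python sketch below makes that explicit. The cut-off values and the example aspects are hypothetical; the slides do not say where the quadrant boundaries were drawn.

```python
def quadrant(performance, importance, perf_cut, imp_cut):
    """Name the quadrant for one aspect, given cut-offs for 'high'."""
    high_perf = performance >= perf_cut
    high_imp = importance >= imp_cut
    if high_perf and high_imp:
        return "Strengths"
    if high_perf:
        return "Maintain"
    if high_imp:
        return "Concern"
    return "Secondary drivers"

# Hypothetical (mean score, importance index) pairs and cut-offs.
examples = {
    "aspect A": (4.6, 0.70),
    "aspect B": (4.6, 0.30),
    "aspect C": (3.9, 0.70),
    "aspect D": (3.9, 0.30),
}
labels = {name: quadrant(perf, imp, perf_cut=4.3, imp_cut=0.5)
          for name, (perf, imp) in examples.items()}
print(labels)
```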

28
Logistic Regression
29
Logistic regression
  • Suppose we wish to look at the proportion of people
    who give a top box score for satisfaction
  • Here we have a variable that is binary. Let 0 = a
    score of 1-4 and 1 = top box (a score of 5)
  • The natural regression is now logistic, as we have
    a binary response
  • We are now in the wonderful world of generalised
    linear models
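
The recoding described above turns each 1 to 5 score into a 0/1 indicator, and the mean of that indicator is the top box proportion. A tiny Python illustration with made-up scores:

```python
# Made-up overall satisfaction scores on the 5-point scale.
satisfaction = [5, 4, 5, 3, 5, 2, 5, 1]

# 1 = top box (score of 5), 0 = any score from 1 to 4.
top_box = [1 if score == 5 else 0 for score in satisfaction]

# The mean of a 0/1 variable is the proportion of ones.
proportion = sum(top_box) / len(top_box)
print(top_box, proportion)
```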

30
Logistic regression
  • With linear regression the mean depends linearly
    on the explanatory variables
  • $\mu = X^T \beta$
  • With logistic regression we have a non-linear
    response
  • $\mu = \exp(X^T \beta) / (1 + \exp(X^T \beta))$
  • Note that this is a good way of getting around
    the left skewness of the data
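
The logistic mean function maps any value of the linear predictor to a number strictly between 0 and 1, which is what makes it suitable for a proportion. A minimal Python sketch; `inv_logit` is a common name for this function and is used here only as a label.

```python
import math

def inv_logit(eta):
    """exp(eta) / (1 + exp(eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# The linear predictor can be any real number; the mean stays inside (0, 1).
for eta in (-10, -1, 0, 1, 10):
    print(eta, round(inv_logit(eta), 4))
```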

31
Let's analyse this data again
  • # Logistic regression code
  • Regress.eg.logistic.coeff <- glm(formula = 1*(Satisfaction==5) ~ Straightforward + kept.in.touch + manner.attitude + prompt + fairly, data = Regress.eg, na.action = na.exclude, family = binomial)$coeff

32
Let's analyse this data again
  • > cbind(Regress.eg.coeff, Regress.eg.mlr.coeff[-1], Regress.eg.logistic.coeff[-1])
  • Columns: simple regression slope, multiple regression coefficient, logistic regression coefficient
  • Straightforward 0.8010022 0.38028138 1.1928456
  • kept.in.touch   0.7185019 0.16242318 0.6297301
  • manner.attitude 0.5399704 0.08986848 0.4143086
  • prompt          0.8653943 0.21992225 1.0494582
  • fairly          0.6902109 0.15678007 1.0760604
  • Note that fairly comes up as being more
    important, i.e. it is more highly associated
    with top box figures.

33
More details
  • > summary(glm(formula = 1 * (Satisfaction == 5) ~ Straightforward + kept.in.touch + manner.attitude + prompt + fairly, data = Regress.eg, na.action = na.exclude, family = binomial))
  • Deviance Residuals:
  •        Min         1Q    Median        3Q      Max
  •  -2.252605 -0.3172882 0.4059497 0.4059497 2.825783
  • Coefficients:
  •                       Value Std. Error    t value
  • (Intercept)     -19.3572967  1.7395651 -11.127665
  • Straightforward   1.1928456  0.2674028   4.460857
  • kept.in.touch     0.6297301  0.2404842   2.618593
  • manner.attitude   0.4143086  0.1567237   2.643560
  • prompt            1.0494582  0.2813209   3.730467
  • fairly            1.0760604  0.2524477   4.262509
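
To read these coefficients as probabilities, put the intercept and the weighted ratings through the logistic function. The Python sketch below uses the values from the output above; the two respondents (all 5s and all 4s) are made up, and `top_box_probability` is a hypothetical helper name.

```python
import math

LOGIT_INTERCEPT = -19.3572967
LOGIT_COEFF = {
    "Straightforward": 1.1928456,
    "kept.in.touch": 0.6297301,
    "manner.attitude": 0.4143086,
    "prompt": 1.0494582,
    "fairly": 1.0760604,
}

def top_box_probability(ratings):
    """Fitted probability of an overall satisfaction score of 5."""
    eta = LOGIT_INTERCEPT + sum(LOGIT_COEFF[k] * ratings[k] for k in LOGIT_COEFF)
    return 1.0 / (1.0 + math.exp(-eta))

p_all_fives = top_box_probability({k: 5 for k in LOGIT_COEFF})
p_all_fours = top_box_probability({k: 4 for k in LOGIT_COEFF})

# exp(coefficient) is the multiplicative change in the odds of a top box
# score for a one-point increase in that aspect, others held fixed.
odds_ratio_straightforward = math.exp(LOGIT_COEFF["Straightforward"])
print(round(p_all_fives, 2), round(p_all_fours, 2),
      round(odds_ratio_straightforward, 2))
```

Under these coefficients a respondent rating every aspect 5 has a fitted top box probability of roughly 0.92, against roughly 0.13 for one rating every aspect 4.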

34
(No Transcript)