Jennifer Umlauf's Logistic and Poisson Regression Notes - PowerPoint Presentation

Provided by: Jul756

1
Jennifer Umlauf's Logistic and Poisson Regression Notes
  • By Jennifer Umlauf, M.Sc. Statistics

2
Logistic Regression
  • Logistic regression is a regression model for
    response variables that have a binomial
    distribution.
  • It is useful for modeling the probability of an
    event as a function of one or more predictor
    variables.
  • It is a generalized linear model that uses the
    logit as its link function.

3
Logistic Regression (continued)
  • We wish to analyze data of the form
    $Y_i \sim B(p_i, n_i)$, for $i = 1, \ldots, m$,
    where the numbers of Bernoulli trials $n_i$ are
    known and the probabilities of success $p_i$ are
    unknown.

4
Simple Logistic Regression
  • For simple logistic regression, we can model the
    probability of a success as

$$p_i(x) = \frac{\exp(b_0 + b_1 X_i)}{1 + \exp(b_0 + b_1 X_i)}$$
5
Multiple Logistic Regression
  • For multiple logistic regression, we can model the
    probability of a success as

$$p(x) = \frac{\exp(\mathbf{X}\mathbf{b})}{1 + \exp(\mathbf{X}\mathbf{b})}$$

where $\mathbf{X}\mathbf{b} = b_0 + b_1 X_1 + \cdots + b_{p-1} X_{p-1}$.
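The formula above can be sketched in a few lines of Python. This is a minimal illustration: the function name `logistic_probability` and the coefficients are hypothetical, and simple logistic regression is just the case with one predictor.

```python
import math

def logistic_probability(b, x):
    """p(x) = exp(Xb) / (1 + exp(Xb)), with Xb = b0 + b1*x1 + ... + b_{p-1}*x_{p-1}.

    b: coefficients [b0, b1, ..., b_{p-1}] (b0 is the intercept)
    x: predictor values [x1, ..., x_{p-1}]
    """
    if len(b) != len(x) + 1:
        raise ValueError("need one more coefficient than predictors (the intercept)")
    xb = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    # Numerically stable evaluation of exp(xb) / (1 + exp(xb)).
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    z = math.exp(xb)
    return z / (1.0 + z)

# Hypothetical coefficients for two predictors.
b = [-1.0, 0.5, 0.25]
print(logistic_probability(b, [2.0, 0.0]))  # Xb = 0, so p = 0.5
```

The two-branch evaluation avoids overflow in `math.exp` when `Xb` is large in magnitude. Either branch gives the same probability.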
6
Some Extensions
  • Ordinal and multinomial logistic regression are
    extensions of binary logistic regression to
    responses with more than two categories.
  • They allow more than one contrast to be compared
    simultaneously.

7
Logistic Regression (continued)
  • The logits of the unknown binomial probabilities
    are modelled as a linear function of the
    predictor variables, that is,
    $\log_e\{p_i/(1 - p_i)\} = b_0 + b_1 X_1 + \cdots + b_{p-1} X_{p-1}$.
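The logit and the logistic function are inverses of each other. The model is therefore linear on the log-odds scale and S-shaped on the probability scale. The sketch below uses hypothetical coefficients `b0`, `b1` to check the round trip.

```python
import math

def logit(p):
    """Log-odds: log(p / (1 - p)), defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Maps a linear predictor back to a probability: exp(eta) / (1 + exp(eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

b0, b1 = -2.0, 0.3  # hypothetical coefficients
for x in (0.0, 5.0, 10.0):
    eta = b0 + b1 * x          # linear in x on the logit scale
    p = inverse_logit(eta)     # S-shaped in x on the probability scale
    print(x, round(p, 4), round(logit(p), 4))
```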

8
A SAS Example Using Logistic Regression
X = months of on-the-job experience
Y = programming task success (1 = success, 0 = failure)
Does experience matter?
  • data logistic_example;
  •   input x y;  /* the third value on each data line is not read */
  •   label x = 'Experience'
  •         y = 'Success';
  •   cards;
  • 14 0 0.310262
  • 29 0 0.835263
  • 6 0 0.109996
  • 25 1 0.726602
  • 18 1 0.461837
  • 4 0 0.082130
  • 18 0 0.461837
  • 12 0 0.245666
  • 22 1 0.620812
  • 6 0 0.109996
  • 30 1 0.856299
  • 11 0 0.216980
  • 30 1 0.856299
  • 5 0 0.095154
  • ;
  • run;
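The third column of the cards appears to hold fitted probabilities. That is an assumption: the INPUT statement never reads it. The sketch below checks the assumption against the slope 0.1615 reported on the odds-ratio slide. It backs out the intercept implied by the first row, then predicts every other row. The name `B1` is ours.

```python
import math

# (x, y, third column) rows copied from the cards above.
rows = [
    (14, 0, 0.310262), (29, 0, 0.835263), (6, 0, 0.109996), (25, 1, 0.726602),
    (18, 1, 0.461837), (4, 0, 0.082130), (18, 0, 0.461837), (12, 0, 0.245666),
    (22, 1, 0.620812), (6, 0, 0.109996), (30, 1, 0.856299), (11, 0, 0.216980),
    (30, 1, 0.856299), (5, 0, 0.095154),
]
B1 = 0.1615  # slope reported on the odds-ratio slide

def logit(p):
    return math.log(p / (1.0 - p))

# Intercept implied by the first row, assuming column 3 is the fitted probability.
b0 = logit(rows[0][2]) - B1 * rows[0][0]
max_gap = 0.0
for x, _, fitted in rows:
    p_hat = 1.0 / (1.0 + math.exp(-(b0 + B1 * x)))
    max_gap = max(max_gap, abs(p_hat - fitted))
print(f"implied b0 = {b0:.4f}, largest gap = {max_gap:.6f}")
```

A small largest gap (only the rounding of 0.1615) supports reading column 3 as fitted probabilities from the same model.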

9
  • ods rtf style=journal;
  • ods graphics on;
  • proc logistic data=logistic_example descending;  /* descending: model P(y = 1) */
  •   model y = x;
  •   output out=temp resdev=devresidual p=fittedp;  /* deviance residuals, fitted probabilities */
  • run;
  • proc print data=temp;
  •   var x y fittedp devresidual;
  • run;
  • ods graphics off;
  • ods rtf close;
  • proc sort data=temp;
  •   by x;
  • run;
  • goptions reset=all;
  • symbol1 c=red v=dot h=.8;
  • symbol2 c=blue v=dot h=.8 i=join;
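The `resdev=` variable above holds deviance residuals. For a 0/1 response, the standard definition is sign(y - p) * sqrt(-2 [y log p + (1 - y) log(1 - p)]). We assume this matches what PROC LOGISTIC reports for binary data. The sketch applies it to two rows of the cards, using their third column as the fitted probability. The function name is hypothetical.

```python
import math

def bernoulli_deviance_residual(y, p):
    """sign(y - p) * sqrt(-2 * [y*log(p) + (1 - y)*log(1 - p)]) for y in {0, 1}."""
    if y not in (0, 1) or not 0.0 < p < 1.0:
        raise ValueError("y must be 0 or 1 and p must lie in (0, 1)")
    loglik = math.log(p) if y == 1 else math.log(1.0 - p)  # this row's log-likelihood
    return math.copysign(math.sqrt(-2.0 * loglik), y - p)

# Rows (x, y, fitted probability) taken from the cards.
for x, y, p in [(14, 0, 0.310262), (25, 1, 0.726602)]:
    print(x, y, round(bernoulli_deviance_residual(y, p), 4))
```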

10
AIC and SC are two criteria used for logistic
regression model selection (the lower the value,
the better). Here p = 2 parameters and n = 25:
$\text{AIC} = -2\log_e L + 2p = 25.425 + 4 = 29.425$
$\text{SC} = -2\log_e L + p\log_e(n) = 25.425 + 2\log_e(25) = 31.863$
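The two criteria are simple arithmetic on -2 log L. The sketch below reproduces the slide's values: -2 log L = 25.425, p = 2 parameters (b0 and b1), n = 25. The function names are ours.

```python
import math

def aic(neg2loglik, p):
    """Akaike information criterion: -2 log L + 2p."""
    return neg2loglik + 2 * p

def sc(neg2loglik, p, n):
    """Schwarz criterion: -2 log L + p log(n)."""
    return neg2loglik + p * math.log(n)

print(round(aic(25.425, 2), 3))      # 29.425
print(round(sc(25.425, 2, 25), 3))   # 31.863
```

SC penalizes each parameter by log(n) rather than 2. For n = 25 that is about 3.22 per parameter, so SC favors smaller models more strongly than AIC does.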
11
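Odds ratio: $\widehat{OR} = \exp(0.1615) = 1.175$
Approx. upper 95% CI for the odds ratio: $\exp(0.1615 + 1.96 \times 0.065) = 1.335$
Approx. lower 95% CI for the odds ratio: $\exp(0.1615 - 1.96 \times 0.065) \approx 1.035$
where 0.065 is the standard error of $b_1$.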
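The odds ratio and its approximate (Wald) interval can be computed by exponentiating b1 and b1 ± 1.96·SE, as on this slide. The helper name `odds_ratio_wald_ci` is hypothetical.

```python
import math

def odds_ratio_wald_ci(b, se, z=1.96):
    """Odds ratio exp(b) with an approximate (Wald) CI exp(b -/+ z*se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

or_hat, lower, upper = odds_ratio_wald_ci(0.1615, 0.065)
print(f"OR = {or_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")  # OR = 1.175, 95% CI = (1.035, 1.335)
print(f"change in odds per month: {100 * (or_hat - 1):.1f}%")     # 17.5%
```

The interval is built on the log-odds scale and then exponentiated. That is why it is not symmetric around 1.175.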
12
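Interpreting the Odds Ratio
So the odds of successfully completing the task
increase by about 17.5% with each additional month
of experience. Because the approximate confidence
interval for the odds ratio lies entirely above 1,
experience does appear to matter.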
13
Poisson Regression
  • The Poisson distribution describes the number of
    times a random event occurs in a time or space
    interval when the probability of the event
    occurring on any one trial is very small but the
    number of trials is very large.
  • It is the limit of a binomial process in which
    $p \to 0$ and $n \to \infty$ while $np \to \mu$.
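The limit can be checked numerically. Hold np fixed at μ while n grows, and the binomial probabilities approach the Poisson ones. The choice μ = 3 and the helper names are illustrative only.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def max_gap(n, mu, k_max=10):
    """Largest pmf difference over k = 0..k_max with p = mu / n, so that n*p = mu."""
    p = mu / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(k_max + 1))

for n in (10, 100, 10_000):
    print(n, f"{max_gap(n, 3.0):.2e}")
```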

14
Poisson Regression (continued)
  • Poisson regression models are generalized linear
    models with a Poisson response distribution.
    The log link function is commonly used.

Poisson probability distribution: $P(Y = y) = \dfrac{e^{-\mu}\mu^{y}}{y!}$, for $y = 0, 1, 2, \ldots$
15
Poisson Regression
  • Our Poisson response variable may be modeled as

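Sometimes the count responses pertain to unequal
units of time or space. In such cases we model the
rate $\lambda = \mu / t$, so that
$\log_e \mu = \log_e t + \log_e \lambda$. In SAS, use
an offset of $\log_e(t)$, a term whose coefficient is
fixed at 1 (the OFFSET= option of the MODEL statement
in PROC GENMOD).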
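The effect of the offset is easiest to see in an intercept-only rate model, log(μ_i) = log(t_i) + b0. The sketch fits b0 by Newton-Raphson on hypothetical counts and exposure times. The fitted rate exp(b0) comes out as total count divided by total time. The function name is ours, and this is not GENMOD's own algorithm.

```python
import math

def fit_rate_intercept(y, t, iters=25):
    """Poisson model log(mu_i) = log(t_i) + b0, fitted by Newton-Raphson.

    log(t_i) is an offset: it enters the linear predictor with its
    coefficient fixed at 1, so exp(b0) is a rate per unit of t.
    """
    b0 = 0.0
    for _ in range(iters):
        mu = [ti * math.exp(b0) for ti in t]           # fitted means
        score = sum(yi - mi for yi, mi in zip(y, mu))  # d logL / d b0
        info = sum(mu)                                 # -d^2 logL / d b0^2
        b0 += score / info
    return b0

# Hypothetical counts observed over unequal exposure times.
y = [3, 10, 7]
t = [1.0, 4.0, 2.5]
rate = math.exp(fit_rate_intercept(y, t))
print(round(rate, 4), round(sum(y) / sum(t), 4))  # both ~ total count / total time
```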
16
Poisson Regression
  • Using the log link, we obtain
    $\log_e(\mu_i) = \mathbf{X}_i\mathbf{b}$, that is,
    $\mu_i = \exp(\mathbf{X}_i\mathbf{b})$.
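With the log link, the log of each expected count is linear in the predictors. The sketch fits such a model with one predictor by Newton-Raphson. For this canonical link, Newton-Raphson coincides with Fisher scoring/IRLS. It then checks the maximum-likelihood score equations. The data and function name are hypothetical, and this is not PROC GENMOD's implementation.

```python
import math

def fit_poisson_log_link(x, y, iters=50):
    """Fit log(mu_i) = b0 + b1*x_i by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X'(y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix I = X' diag(mu) X (2 x 2, symmetric)
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * delta = U with the 2 x 2 inverse
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical data: counts that fall as distance grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12, 9, 7, 4, 4, 2]
b0, b1 = fit_poisson_log_link(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, multiplicative effect per unit x = {math.exp(b1):.3f}")
```

At convergence the fitted means reproduce the observed totals: sum(y) = sum(mu) and sum(x*y) = sum(x*mu). The checks below test exactly that.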

17
Overdispersion
  • A characteristic of the Poisson distribution is
    that its mean is equal to its variance.
  • Sometimes the observed variance is greater than
    the mean; this is known as overdispersion. It
    indicates that the Poisson model may not be
    appropriate.
  • A common reason is the exclusion of relevant
    explanatory variables.

Another common issue is fitting a Poisson model to
data with many zeros. This occurs especially with
data subject to some non-response bias (e.g.,
modeling the number of positive comments made by a
subject).
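A common check for overdispersion divides the Pearson chi-square by its residual degrees of freedom. Values near 1 are consistent with Poisson variation, and values well above 1 suggest overdispersion. We believe this corresponds to the Value/DF column in GENMOD's goodness-of-fit table, but treat that as an assumption. The zero-heavy counts below are hypothetical and are compared with an intercept-only fit.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual df: sum((y - mu)^2 / mu) / (n - p)."""
    n = len(y)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (n - n_params)

# Hypothetical counts with many zeros and a few large values.
y = [0, 0, 0, 0, 0, 0, 1, 2, 9, 12]
mu = [sum(y) / len(y)] * len(y)   # intercept-only fit: every mu_i = mean(y) = 2.4
print(round(pearson_dispersion(y, mu, n_params=1), 2))
```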
18
A SAS Example Using Poisson Regression
X1 = number of housing units
X2 = average income, in dollars
X3 = average housing unit age, in years
X4 = distance to nearest competitor, in miles
X5 = distance to store, in miles
Y = number of customers who visited the store
  • data poisson_example;
  •   input y x1 x2 x3 x4 x5;
  •   label x1 = 'Housing'
  •         x2 = 'Income'
  •         x3 = 'Age'
  •         x4 = 'Competitor Distance'
  •         x5 = 'Store Distance'
  •         y = 'Customers';
  •   cards;
  • 9 606 41393 3 3.04 6.32
  • 6 641 23635 18 1.95 8.89
  • 28 505 55475 27 6.54 2.05
  • 11 866 64646 31 1.67 5.81
  • 4 599 31972 7 0.72 8.11
  • 4 520 41755 23 2.24 6.81
  • 0 354 46014 26 0.77 9.27
  • 14 483 34626 1 3.51 7.92
  • 16 1034 85207 13 4.23 4.40
  • 13 456 33021 32 3.07 6.03

19
  • 22 898 46027 44 3.03 5.60
  • 8 731 32202 43 5.15 9.67
  • 3 584 32871 13 1.47 8.02
  • 11 439 29564 18 3.67 5.10
  • 2 153 46806 21 0.84 9.18
  • 6 1069 59805 22 2.50 9.43
  • 11 443 42555 53 2.62 5.75
  • 10 392 36998 7 1.03 7.74
  • 0 828 85664 4 1.30 9.66
  • 15 159 21238 4 2.98 8.66
  • 9 830 47972 40 2.28 9.26
  • 16 234 33246 26 3.95 4.61
  • 29 1004 45927 24 4.90 2.69
  • 6 643 58315 8 0.78 6.26
  • 26 741 69177 9 6.61 0.87
  • 13 306 40886 27 4.53 2.68
  • 0 180 44588 14 0.88 9.38
  • 8 644 47347 35 2.94 7.69
  • 8 109 31791 9 4.37 9.31

20
  • 9 669 34595 38 4.06 8.78
  • 8 582 30878 58 1.91 6.86
  • 6 872 39366 52 0.73 8.67
  • 6 758 61563 31 3.08 8.33
  • 15 782 38412 26 2.72 6.71
  • 15 551 41045 2 3.62 7.45
  • 12 201 23864 43 4.80 8.74
  • 10 730 38647 9 0.67 7.92
  • 8 738 58387 13 2.01 6.60
  • 3 469 37242 40 1.42 8.37
  • 10 898 38337 32 2.63 9.56
  • 10 780 68201 5 4.12 6.69
  • 15 622 41066 46 4.48 4.10
  • 6 391 40873 19 1.67 6.90
  • 9 531 54655 40 2.32 5.69
  • 21 566 49826 1 3.06 4.03
  • 13 410 29013 50 2.68 7.58
  • 8 719 78082 31 2.70 4.89
  • 6 684 57506 51 2.13 8.31

21
Lots of data
  • 8 312 43393 41 2.25 6.43
  • 16 787 61765 53 5.39 3.37
  • 5 416 33348 48 1.48 7.66
  • 8 528 44541 31 4.91 9.67
  • 11 919 40795 8 2.97 7.79
  • 12 482 55972 9 2.91 5.85
  • 14 781 33140 30 1.42 5.71
  • 17 120 19673 21 2.65 6.25
  • 17 693 36190 6 4.70 9.54
  • 6 348 25768 42 1.43 7.11
  • 15 780 53974 47 4.21 6.41
  • 10 752 71814 1 3.13 5.47
  • 6 817 54429 47 1.90 9.90
  • 4 268 34022 54 1.20 9.51
  • 6 519 52850 43 2.92 8.62
  • ;
  • run;

22
  • proc genmod data=poisson_example;
  •   model y = x1-x5 / dist=poisson link=log;
  •   output out=temp p=muhati resdev=devi;  /* fitted means, deviance residuals */
  • run;
  • proc print data=temp (obs=10);
  •   var y muhati devi;
  • run;
  • data temp;
  •   set temp;
  •   id = _n_;  /* observation number for the horizontal axis */
  • run;
  • symbol1 v=dot i=join c=blue h=.8;
  • axis1 label=(angle=90);
  • proc gplot data=temp;
  •   plot devi*id / vaxis=axis1;
  • run;
  • quit;

23
An Interpretation of the Parameter for Store
Distance: If the distance to the store were to
increase by one mile, the log of the expected
number of customers visiting the store would
decrease by 0.1288, holding all other variables in
the model constant.
An Interpretation for Store Distance on the Count
Scale: Since $\exp(-0.1288) = 0.879$, if the distance
to the store were to increase by one mile, the
expected number of customers visiting the store
would be multiplied by 0.879, that is, it would
decrease by about 12.1%, holding all other
variables in the model constant.
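Under a log link, a coefficient acts multiplicatively on the expected count. The sketch converts the store-distance estimate −0.1288 from this slide into a multiplier and a percent change. The helper name and the 5-mile case are illustrative only.

```python
import math

b_store_distance = -0.1288  # estimate quoted on this slide

def percent_change_in_mean(b, delta=1.0):
    """Percent change in the expected count when a predictor rises by delta,
    other predictors fixed, under a log link: 100 * (exp(b * delta) - 1)."""
    return 100.0 * (math.exp(b * delta) - 1.0)

print(round(math.exp(b_store_distance), 3))                # 0.879
print(round(percent_change_in_mean(b_store_distance), 1))  # -12.1
print(round(percent_change_in_mean(b_store_distance, 5.0), 1))
```

The effect compounds: a 5-mile increase multiplies the mean by 0.879 raised to the 5th power. It does not subtract five times 12.1%.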
24
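Poisson Regression (continued)
[Table: first 10 fitted values and deviance residuals]

The deviance residual for observation i is
$$\text{dev}_i = \operatorname{sign}(Y_i - \hat{\mu}_i)\left\{-2\left[Y_i \log_e\!\left(\frac{\hat{\mu}_i}{Y_i}\right) + (Y_i - \hat{\mu}_i)\right]\right\}^{1/2}$$
For the first observation ($Y_1 = 9$, $\hat{\mu}_1 = 12.3378$):
$$\text{dev}_1 = -\left\{-2\left[9\log_e\!\left(\frac{12.3378}{9}\right) + (9 - 12.3378)\right]\right\}^{1/2} = -0.99881$$
We use a similar equation for logistic regression
as well.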
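The formula translates directly into code. The sketch reproduces dev₁ = −0.99881 from Y₁ = 9 and fitted mean 12.3378. For Y = 0 it uses the usual convention that Y log(μ/Y) is 0. The function name is ours.

```python
import math

def poisson_deviance_residual(y, mu):
    """sign(y - mu) * sqrt(-2 * [y*log(mu/y) + (y - mu)]), with y*log(mu/y) = 0 when y = 0."""
    term = y * math.log(mu / y) if y > 0 else 0.0
    inside = max(0.0, -2.0 * (term + (y - mu)))  # guard against tiny negative rounding error
    return math.copysign(math.sqrt(inside), y - mu)

print(round(poisson_deviance_residual(9, 12.3378), 5))  # -0.99881
```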
25
Deviance Residual Plot