Logistic Regression - PowerPoint PPT Presentation

Slides: 13
Provided by: www2War
1
Logistic Regression
  • Monday 13th February

2
Outline
  • Final Exam initial questions
  • Research Projects
  • Action Plan
  • Format (what are they expected to look like?)
  • When and Why do we Use Logistic Regression?
  • Transforming the dependent variable
  • Interpreting the coefficients

3
Research Project: Action Plan
  • Read literature on your topic
  • Find dataset
  • Find variables
  • Are variables appropriate measures of the
    concepts in your hypotheses?
  • Propose hypotheses
  • Decide the level of measurement of your variables
  • Recode variables
  • Conduct statistical tests
  • Decide what statistical tests you will do for
    each hypothesis / combination of variables
  • Examine each variable you intend to use
    (descriptive statistics)
  • Interpret statistical tests
  • Have you proved or disproved each hypothesis?
  • Conduct additional tests
  • Are there things you want to follow up or
    investigate further?
  • Write up project

4
Logistic Regression: When and Why
  • To predict an outcome variable that is a categorical
    dichotomy from one or more categorical or
    continuous predictor variables.
  • Used because having a categorical dichotomy as an
    outcome variable violates the assumption of
    linearity in ordinary linear regression.

5
The Problem with Dichotomous Dependent Variables
  • These are variables that take only two values,
    usually coded 1 and 0
  • e.g. being unemployed or not, being pregnant or
    not, smoking or not
  • Linear regression will not work because there is
    no range of scores: everyone will be either 1 or
    0.
  • Therefore, instead of regressing the values, we
    regress the probability of taking one of the two
    values (or of being in the 1 category)
  • e.g. the probability of being unemployed, of
    being pregnant, of smoking
  • This gives us a range of values, as probabilities
    range from 0 to 1.
  • However, this still presents a problem for linear
    regression, as it will sometimes predict an
    impossible probability (below 0 or above 1).
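The problem of impossible probabilities can be seen in a minimal sketch. The stress scores and smoker flags below are invented purely for illustration (they are not from the lecture), and the straight line is fitted with ordinary least squares written out by hand.

```python
# Toy data, invented for illustration: stress score (0-10) and smoker flag (0/1).
stress = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smoker = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(stress)
mean_x = sum(stress) / n
mean_y = sum(smoker) / n

# Ordinary least squares for a single predictor:
# slope = covariance(x, y) / variance(x)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(stress, smoker))
variance = sum((x - mean_x) ** 2 for x in stress)
slope = covariance / variance
intercept = mean_y - slope * mean_x

# Fitted "probabilities" from the straight line.
predicted = [intercept + slope * x for x in stress]
print(f"lowest fitted value:  {min(predicted):.2f}")
print(f"highest fitted value: {max(predicted):.2f}")
```

With these made-up values the fitted line dips below 0 at the lowest stress score and rises above 1 at the highest, so it cannot be read as a probability at either end.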

6
Going from a Probability to a Logit (1)
  • Probabilities (that Y = 1) range from 0 to 1.
  • For example, suppose the probability of smoking is 0.48.
  • The probability of an event not happening is 1
    minus the probability of the event: 1 - P(Y = 1).
  • The probability of not smoking would be 1 - 0.48 = 0.52.
  • The odds of an event happening is the probability
    of it happening divided by the probability of it
    not happening.
  • So the odds of smoking is 0.48 / 0.52 = 0.92.
  • Odds range from 0 to infinity (∞).
  • Probabilities greater than 0.5 produce odds
    between 1 and ∞.
  • Probabilities less than 0.5 produce odds between 0
    and 1.
  • This means that a linear equation for the odds
    cannot predict a number larger than is reasonable,
    but can still predict one that is smaller than is
    reasonable (i.e. less than 0).
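The probability-to-odds step above can be sketched in a few lines. The function name `odds_from_probability` is a hypothetical helper chosen for this illustration.

```python
def odds_from_probability(p):
    """Odds of an event = P(event) / P(no event)."""
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and strictly less than 1")
    return p / (1 - p)


# Probabilities below 0.5 give odds below 1; above 0.5 give odds above 1.
for p in (0.10, 0.48, 0.50, 0.90):
    print(f"p = {p:.2f}  ->  odds = {odds_from_probability(p):.2f}")
```

For p = 0.48 this reproduces the odds of about 0.92 used in the smoking example.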

7
Going from a Probability to a Logit (2)
  • In order to deal with this problem we take the
    natural logarithm of the odds that Y = 1. This is
    referred to as logit(Y). If we use ln to stand
    for natural log, the equation for logit(Y)
    is
  • $\text{logit}(Y) = \ln\left[\frac{P(Y=1)}{1 - P(Y=1)}\right]$
  • Note: the natural logarithm expresses numbers to
    base e, approximately 2.72 (e has an infinite
    number of decimal places). This means that the
    natural log of 2.72 is 1 and the natural log of
    $2.72^2$ is 2, etc.
  • This transformation stretches the lower values of
    the odds that Y = 1 so that the linear equation
    does not predict impossibly low values. (As the
    odds decrease from 1 to 0 the logit value becomes
    negative and increasingly large, going to -∞.)
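As a rough check of the points above, the sketch below defines a hypothetical `logit` helper and prints the natural log of e and of e squared, followed by the logit for a few shrinking probabilities.

```python
import math


def logit(p):
    """Natural log of the odds that Y = 1, for 0 < p < 1."""
    return math.log(p / (1 - p))


print(math.log(math.e))       # natural log of e
print(math.log(math.e ** 2))  # natural log of e squared

# As p (and so the odds) shrinks towards 0, the logit becomes
# negative and increasingly large in magnitude.
for p in (0.48, 0.10, 0.01, 0.001):
    print(f"p = {p}  ->  logit = {logit(p):.3f}")
```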

8
Relationship between Probability and Logit
There is a non-linear relationship between p and
its logit. In the mid range of p the relationship
is approximately linear, but as p approaches the 0
or 1 extremes it becomes non-linear, with
increasingly larger changes in logit for the
same change in p. This means that instead of
having a dependent variable that has a minimum of
0 and a maximum of 1, we have a dependent
variable with a minimum of -∞ and a maximum of ∞.
As a result, the linear equation cannot predict a
value that is beyond the range of possible
values.

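To illustrate the stretching near the extremes, the sketch below compares how much the logit changes for the same 0.05 step in p at different starting points. The starting values are arbitrary choices for illustration.

```python
import math


def logit(p):
    """Natural log of the odds, for 0 < p < 1."""
    return math.log(p / (1 - p))


step = 0.05
for start in (0.50, 0.70, 0.90):
    change = logit(start + step) - logit(start)
    print(f"p from {start:.2f} to {start + step:.2f}: logit changes by {change:.3f}")
```

The same step in p produces a noticeably larger change in logit near 0.9 than near 0.5.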
9
The logistic regression equation
  • The logistic regression equation can be arranged
    in a linear form (like a regression equation)
  • $\ln\left[\frac{\text{Prob(event)}}{\text{Prob(no event)}}\right] = a + b_1 x_1 + b_2 x_2 + \dots + b_P x_P$
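The right-hand side of this equation is an ordinary linear predictor. A minimal sketch follows; the function name `linear_predictor` and the intercept, coefficients and predictor values are all made up for illustration.

```python
def linear_predictor(intercept, coefficients, values):
    """Return a + b1*x1 + b2*x2 + ... + bP*xP, i.e. the predicted logit."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical model with two predictors (numbers invented for illustration).
logit_y = linear_predictor(-1.0, [0.5, 2.0], [2.0, 0.25])
print(logit_y)
```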

10
Converting back to probabilities
  • Since the coefficients in logistic regression are
    not easily interpretable (unlike linear
    regression) we convert values of logit(Y) back
    to the more meaningful values of odds and
    probabilities.
  • To obtain the odds that Y = 1 we unlog logit(Y).
    This is done by taking the anti-log (or
    exponent, written as e). The equation is
  • $\text{Odds}(Y=1) = e^{a + bX}$
  • To get back to the probability that Y = 1 we can
    reverse the calculation that turned the
    probability into odds
  • $P(Y=1) = \frac{e^{a + bX}}{1 + e^{a + bX}}$
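The two conversions above can be sketched as follows. The helper names are hypothetical, and the round trip starts from the probability of 0.48 used in the earlier smoking example.

```python
import math


def odds_from_logit(logit_y):
    """Anti-log of the logit: odds(Y = 1) = e ** logit(Y)."""
    return math.exp(logit_y)


def probability_from_logit(logit_y):
    """P(Y = 1) = odds / (1 + odds)."""
    odds = odds_from_logit(logit_y)
    return odds / (1 + odds)


# Round trip: probability -> logit -> probability.
p = 0.48
logit_y = math.log(p / (1 - p))
print(f"logit = {logit_y:.3f}")
print(f"odds  = {odds_from_logit(logit_y):.2f}")
print(f"p     = {probability_from_logit(logit_y):.2f}")
```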

11
Example
  • If we look at the effects of stress on smoking
  • We have an equation for which we get the results
  • $\text{Logit}(Y) = a + bx$
  • $\text{Logit}(Y) = -0.8987 + 0.1638x$
  • If x (stress) is low it would be scored as 1.
    Therefore
  • $\text{Logit}(Y) = -0.8987 + (1 \times 0.1638) = -0.735$
  • Therefore the odds of smoking will be
  • $\text{Odds}(\text{smoker} = 1) = e^{-0.735} = 0.48$
  • This can be interpreted as saying that the
    respondents reporting very low stress are about
    half as likely to smoke as not to smoke.
  • The probability that they smoke will be
  • Probability of smoking = odds of smoking / (1 +
    odds of smoking) = 0.48 / 1.48 = 0.32, or 32%
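The low-stress calculation can be checked with a short sketch. It assumes the intercept is -0.8987, the value that reproduces the logit of -0.735 shown on this slide, and it carries no intermediate rounding.

```python
import math

A = -0.8987  # intercept (assumed; the value consistent with logit = -0.735)
B = 0.1638   # coefficient for stress, from the slide

stress = 1  # very low stress
logit_y = A + B * stress
odds = math.exp(logit_y)
probability = odds / (1 + odds)

print(f"logit       = {logit_y:.3f}")
print(f"odds        = {odds:.2f}")
print(f"probability = {probability:.2f}")
```

Without intermediate rounding, the odds come out at roughly 0.48 and the probability at roughly 0.32.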

12
Example contd.
  • On the other hand, if the respondent reported a
    very high level of stress (i.e. 10), his or her
    estimated logit and odds of smoking will be
  • $\text{Logit}(\text{smoker}) = -0.8987 + (10 \times 0.1638) = 0.739$
  • $\text{Odds}(\text{smoker} = 1) = e^{0.739} = 2.09$
  • This indicates that the odds of being a smoker
    are just over twice as high as those of not being
    a smoker.
  • And the probability that a highly stressed person
    will smoke is
  • Probability of smoking = odds of smoking / (1 +
    odds of smoking) = 2.09 / 3.09 = 0.68, or 68%
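To see how the estimate moves between the two worked cases, the sketch below applies the same equation to every stress score from 1 to 10. The 1 to 10 scoring range is an assumption based on the two examples, and `probability_of_smoking` is a hypothetical helper.

```python
import math

A = -0.8987  # intercept used in the example
B = 0.1638   # coefficient for stress used in the example


def probability_of_smoking(stress):
    """Convert the predicted logit for a stress score into a probability."""
    logit_y = A + B * stress
    odds = math.exp(logit_y)
    return odds / (1 + odds)


for stress in range(1, 11):
    print(f"stress = {stress:2d}  ->  P(smoker) = {probability_of_smoking(stress):.2f}")
```

The estimated probability rises steadily with stress, from about a third at the lowest score to about two thirds at the highest.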