Title: Logistic Regression
1 Logistic Regression
2 Outline
- Final Exam initial questions
- Research Projects
- Action Plan
- Format (what are they expected to look like?)
- When and Why do we Use Logistic Regression?
- Transforming the dependent variable
- Interpreting the coefficients
3 Research Project Action Plan
- Read literature on your topic
- Find dataset
- Find variables
  - Are the variables appropriate measures of the concepts in your hypotheses?
- Propose hypotheses
- Decide the level of measurement of your variables
- Recode variables
- Conduct statistical tests
  - Decide what statistical tests you will do for each hypothesis / combination of variables
  - Examine each variable you intend to use (descriptive statistics)
- Interpret statistical tests
  - Have you proved or disproved each hypothesis?
- Conduct additional tests
  - Are there things you want to follow up or investigate further?
- Write up project
4 Logistic Regression: When and Why
- To predict an outcome variable that is a categorical dichotomy from one or more categorical or continuous predictor variables.
- Used because having a categorical dichotomy as an outcome variable violates the assumption of linearity in normal regression.
5 The Problem with Dichotomous Dependent Variables
- Dichotomous variables take two values, usually coded 1 and 0 - i.e. being unemployed or not, being pregnant or not, smoking or not.
- Linear regression will not work because there is no range of scores: everyone will be either 1 or 0.
- Therefore, instead of regressing the values themselves, we regress the probability of taking one of the two values (of being in the 1 category) - i.e. the probability of being unemployed, of being pregnant, of smoking.
- This gives us a range of values, as probabilities range from 0 to 1.
- However, this still presents a problem for linear regression, as we are likely sometimes to get an impossible probability (below 0 or above 1).
6 Going from a Probability to a Logit 1
- Probabilities (that Y = 1) range from 0 to 1.
- For example, suppose the probability of smoking is 0.48.
- The probability of an event not happening is 1 minus the probability of the event: 1 - p(Y = 1).
- The probability of not smoking would be 1 - 0.48 = 0.52.
- The odds of an event happening are the probability of it happening divided by the probability of it not happening.
- So the odds of smoking are 0.48 / 0.52 = 0.92.
- Odds range from 0 to infinity (∞).
- Probabilities greater than .5 produce odds between 1 and ∞.
- Probabilities less than .5 produce odds between 0 and 1.
- This means that we cannot get a number larger than is reasonable, but can still get one that is smaller than is reasonable (i.e. less than 0).
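The probability-to-odds arithmetic above can be checked directly. A minimal sketch (the 0.48 smoking probability comes from the example; the `odds` helper is just an illustrative name):

```python
def odds(p):
    """Odds of an event: the probability of it happening divided by
    the probability of it not happening."""
    return p / (1 - p)

p_smoke = 0.48                    # example probability of smoking
print(round(1 - p_smoke, 2))      # probability of not smoking: 0.52
print(round(odds(p_smoke), 2))    # odds of smoking: 0.92

# Probabilities above .5 give odds greater than 1;
# probabilities below .5 give odds between 0 and 1.
print(odds(0.75))                 # 3.0
print(round(odds(0.25), 2))       # 0.33
```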
7 Going from a Probability to a Logit 2
- In order to deal with this problem we take the natural logarithm of the odds that Y = 1. This is referred to as logit(Y). If we use ln to stand for natural log, the equation for logit(Y) is:
- logit(Y) = ln[ p(Y = 1) / (1 - p(Y = 1)) ]
- Note the natural logarithm expresses numbers to base e ≈ 2.72 (to an infinite number of decimal places). This means that the natural log of 2.72 is 1 and the natural log of 2.72² (about 7.39) is 2, etc.
- This transformation stretches the lower values of the odds that Y = 1 so that the linear equation does not predict impossibly low values. (As the odds decrease from 1 to 0 the logit value becomes negative and increasingly large in magnitude, going to -∞.)
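These properties of the natural log and the logit can be confirmed numerically (a minimal sketch using Python's `math` module; the `logit` function name is mine):

```python
import math

# Natural logs are to base e (about 2.72), so ln(e) = 1, ln(e^2) = 2, etc.
print(round(math.e, 2))                  # 2.72
print(round(math.log(math.e), 10))       # 1.0
print(round(math.log(math.e ** 2), 10))  # 2.0

def logit(p):
    """logit(Y) = ln( p(Y=1) / (1 - p(Y=1)) ): natural log of the odds."""
    return math.log(p / (1 - p))

# The transformation stretches small odds toward minus infinity:
print(round(logit(0.5), 4))    # 0.0  (odds of exactly 1)
print(round(logit(0.48), 2))   # -0.08
print(round(logit(0.01), 2))   # -4.6
```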
8 Relationship between Probability and Logit
There is a non-linear relationship between p and its logit. In the mid range of p there is a roughly linear relationship, but as p approaches the extremes of 0 or 1 the relationship becomes non-linear, with increasingly larger changes in logit for the same change in p. This means that instead of having a dependent variable that has a minimum of 0 and a maximum of 1, we have a dependent variable with a minimum of -∞ and a maximum of ∞. This means that it will not be possible to have a result that is beyond the range of possible values.
9 The Logistic Regression Equation
- The logistic regression equation can be arranged in a linear form (like a regression equation):
- ln[ Prob(event) / Prob(no event) ] = a + b1x1 + b2x2 + ... + bpxp
10 Converting Back to Probabilities
- Since the coefficients in logistic regression are not easily interpretable (unlike linear regression), we convert values of logit(Y) back to the more meaningful values of odds and probabilities.
- To obtain the odds that Y = 1 we unlog logit(Y). This is done by taking the anti-log (or exponent, written as e). The equation is:
- Odds(Y = 1) = e^(a + bX)
- To get back to the probability that Y = 1 we can reverse the calculation that turned the probability into odds:
- Probability that Y = 1 = e^(a + bX) / (1 + e^(a + bX))
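These two back-conversions can be sketched in a few lines (function names are illustrative; the round trip uses the 0.48 smoking probability from the earlier slides):

```python
import math

def odds_from_logit(logit_y):
    """Odds(Y=1) = e^(a + bX): take the anti-log (exponent) of logit(Y)."""
    return math.exp(logit_y)

def prob_from_logit(logit_y):
    """Probability that Y = 1: e^(a+bX) / (1 + e^(a+bX))."""
    return math.exp(logit_y) / (1 + math.exp(logit_y))

# Round trip: probability -> logit -> odds -> probability.
p = 0.48
logit_y = math.log(p / (1 - p))
print(round(odds_from_logit(logit_y), 2))  # 0.92, the odds of smoking
print(round(prob_from_logit(logit_y), 2))  # 0.48, the original probability
```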
11 Example
- If we look at the effects of stress on smoking, we have an equation for which we get the results:
- Logit(Y) = a + bx
- Logit(Y) = -0.8987 + 0.1638x
- If x (stress) is low it would be scored as 1. Therefore:
- Logit(Y) = -0.8987 + (1 × 0.1638) = -0.735
- Therefore the odds of smoking will be:
- Odds(smoker = 1) = e^(-0.735) = 0.48
- This can be interpreted as saying that respondents reporting very low stress are about half as likely to smoke as not to smoke.
- The probability that they smoke will be:
- Probability of smoking = odds of smoking ÷ (1 + odds of smoking) = 0.32, or 32%
12 Example (contd.)
- On the other hand, if the respondent reported a very high level of stress (i.e. 10), his or her estimated probability of smoking will be:
- Logit(smoker) = -0.8987 + (10 × 0.1638) = 0.739
- Odds(smoker = 1) = e^(0.739) = 2.09
- This indicates that the odds of being a smoker are just over twice as high as those of not being a smoker.
- And the probability that a highly stressed person will smoke is:
- Probability of smoking = odds of smoking ÷ (1 + odds of smoking) = 0.68, or 68%
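The whole worked example can be reproduced in a short script (a sketch; the intercept -0.8987 and coefficient 0.1638 are the values from the slides, and `predict_smoking` is an illustrative name):

```python
import math

a, b = -0.8987, 0.1638   # intercept and stress coefficient from the example

def predict_smoking(stress):
    """Return (logit, odds, probability) of smoking for a stress score."""
    logit_y = a + b * stress          # the linear equation
    odds = math.exp(logit_y)          # anti-log gives the odds
    prob = odds / (1 + odds)          # odds back to a probability
    return logit_y, odds, prob

for stress in (1, 10):               # very low and very high stress
    logit_y, odds, prob = predict_smoking(stress)
    print(f"stress={stress}: logit={logit_y:.3f}, odds={odds:.2f}, p={prob:.2f}")
# stress=1:  logit=-0.735, odds=0.48, p=0.32
# stress=10: logit=0.739,  odds=2.09, p=0.68
```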