Title: Chap 8: Introduction to Logistic Regression
1 Chap 8: Introduction to Logistic Regression
2 Logistic regression
- Models the relationship between a set of variables xi, which may be
- dichotomous (eat: yes/no)
- categorical (social class, ...)
- continuous (age, ...)
- and a dichotomous variable Y
- A dichotomous (binary) outcome is the most common situation in biology and epidemiology
3 Logistic regression (1)
Table 2: Age and signs of coronary heart disease (CD)
4 How can we analyse these data?
- Comparison of the mean age of diseased and non-diseased women
- Non-diseased: 38.6 years
- Diseased: 58.7 years (p < 0.0001)
- Linear regression?
5 Dot-plot: Data from Table 2
6 Logistic regression (2)
- Table 3: Prevalence (%) of signs of CD according to age group
7 Dot-plot: Data from Table 3
[Figure: dot-plot; y-axis: Diseased, x-axis: Age (years)]
8 The logistic function (1)
[Figure: probability of disease (y-axis) plotted against x (x-axis)]
9 The logistic function (2)
Log-odds (logit)
10 The logistic function (3)
- Advantages of the logit
- Simple transformation of P(y|x)
- Linear relationship with x
- Can be continuous (logit between -∞ and +∞)
- Known binomial distribution (P between 0 and 1)
- Directly related to the notion of odds of disease
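The properties of the logit listed above can be checked numerically. The sketch below is only an illustration: the function names `logistic` and `logit` and the coefficients `a` and `b` are assumptions chosen for the example, not values from the slides. It shows that the probability stays between 0 and 1 while the logit is unbounded and linear in x.

```python
import math

def logistic(z):
    """Map a log-odds value z (any real number) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability p in (0, 1) to its log-odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and slope, chosen only for illustration.
a, b = -3.0, 0.5

for x in (0, 2, 4, 6, 8):
    p = logistic(a + b * x)   # probability of the outcome at x
    # logit(p) recovers the linear term a + b*x
    print(x, round(p, 3), round(logit(p), 3))
```

Because `logit` is the inverse of `logistic`, the last printed column increases by the same amount for every equal step in x, which is the linear relationship referred to above.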
11 Interpretation of b (1)
12 Interpretation
- Intercept is the point on the Y-axis (log odds) crossed by the regression line when X = 0.
- Slope is the rate at which the predicted log odds increases (or, in some cases, decreases) with each successive unit of X.
- Within the context of logistic regression, you will usually find the intercept of the log odds regression line referred to as the "constant" and the slope as the "coefficient."
- The exponent of the slope, exp(slope), describes the proportionate rate at which the predicted odds changes with each successive unit of X.
13 Example
- If exp(slope) is 1.81, then we say that
- the predicted odds for X = 29 is 1.81 times as large as the one for X = 28,
- the one for X = 30 is 1.81 times as large as the one for X = 29, and so on.
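The multiplicative reading of this example can be sketched in a few lines. The factor 1.81 is taken from the slide; the baseline odds at X = 28 (`odds_at_28`) and the function name `predicted_odds` are hypothetical, introduced only so that the ratios can be computed.

```python
import math

odds_ratio = 1.81               # factor per unit of X, as in the example
slope = math.log(odds_ratio)    # the same effect on the log-odds scale

def predicted_odds(x, odds_at_28=1.0):
    """Predicted odds at x, given a hypothetical baseline odds at X = 28."""
    return odds_at_28 * math.exp(slope * (x - 28))

for x in (28, 29, 30):
    print(x, round(predicted_odds(x), 4))

# Each one-unit step multiplies the odds by the same factor.
ratio_29_28 = predicted_odds(29) / predicted_odds(28)
ratio_30_29 = predicted_odds(30) / predicted_odds(29)
print(round(ratio_29_28, 2), round(ratio_30_29, 2))
```

The ratio between successive values of X does not depend on the baseline chosen, which is why the statement holds "and so on" for any pair of adjacent values.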
14 Interpretation of b (2)
- b = increase in log-odds for a one-unit increase in x
- Test of the hypothesis that b = 0 (Wald test)
- Interval testing
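A minimal sketch of the Wald test and the matching interval follows. It assumes the common z-statistic form (the estimate divided by its standard error) with a normal approximation. The function name `wald_test` and the estimate and standard error passed to it are hypothetical, not results from the slides.

```python
import math

def wald_test(b, se, z_crit=1.96):
    """Wald z-test of b = 0 with an approximate 95% interval.

    Returns the z statistic, the two-sided p-value, the interval for b
    (log-odds scale) and the interval for exp(b) (odds-ratio scale).
    """
    z = b / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci_b = (b - z_crit * se, b + z_crit * se)
    ci_odds_ratio = (math.exp(ci_b[0]), math.exp(ci_b[1]))
    return z, p_value, ci_b, ci_odds_ratio

# Hypothetical estimate and standard error, for illustration only.
z, p_value, ci_b, ci_odds_ratio = wald_test(b=0.5, se=0.2)
print(round(z, 2), round(p_value, 4), ci_b, ci_odds_ratio)
```

Some software reports the squared statistic (b/se)^2 and compares it with a chi-square distribution with one degree of freedom; for a single coefficient the two forms lead to the same conclusion.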
16 The slope, -0.123, is the rate at which the predicted log odds of CD changes (here, decreases) with each successive unit of X. It also means that the predicted CD odds for age = 30 is exp(-0.123) ≈ 0.9 times as large as the one for age = 29, the one for age = 31 is 0.9 times as large as the one for age = 30, and so on.
17 Results of fitting the logistic regression model
- $\log O = 6.43 + (-0.121 \times 31) = 2.67$
- The corresponding predicted odds would be $\exp(\log O) = \exp(2.67) = 14.43$
- And the corresponding predicted probability would be $O/(1 + O) = 14.43/(1 + 14.43) = 0.93$
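The three steps of this calculation (log odds, then odds, then probability) can be reproduced as below. The intercept 6.43, slope -0.121 and X = 31 are the values on the slide; the variable names are assumptions. The printed values may differ slightly from the slide's figures in the last digits, because the slide appears to round the intermediate log odds before exponentiating.

```python
import math

intercept = 6.43   # fitted intercept from the slide
slope = -0.121     # fitted slope from the slide
x = 31             # value of X used on the slide

log_odds = intercept + slope * x       # linear predictor on the log-odds scale
odds = math.exp(log_odds)              # back-transform to odds
probability = odds / (1.0 + odds)      # convert odds to probability

print(round(log_odds, 3), round(odds, 2), round(probability, 3))
```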
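18 Link between logistic regression and threshold variable
- When $y^*$ is continuous we may use
- $y_i^* = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + e_i$
- If we create a new variable $y$, called the threshold variable, such that
- $y_i = 1$ when $y_i^* \ge 0$
- $y_i = 0$ when $y_i^* < 0$
- and if we use a logistic distribution for $e_i$ instead of a normal distribution, defined such that
- $p(e_i \le a) = 1/(1 + e^{-a})$,
- then, writing $p_i$ for the probability that $y_i = 1$, we have
- $p_i = 1/(1 + \exp(-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})))$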
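The link between the threshold variable and the logistic formula can be checked by simulation. The sketch below is an illustration only: the value of the linear predictor (`eta`), the sample size and the function names are hypothetical. It adds logistic noise to the linear predictor, records how often the latent value is at least 0, and compares that share with 1/(1 + exp(-eta)).

```python
import math
import random

def logistic_noise(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:            # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulated_share(eta, n, rng):
    """Share of n draws in which the latent variable eta + e is >= 0."""
    hits = 0
    for _ in range(n):
        if eta + logistic_noise(rng) >= 0:
            hits += 1
    return hits / n

rng = random.Random(0)         # fixed seed so the run is repeatable
eta = 0.8                      # hypothetical value of the linear predictor
simulated = simulated_share(eta, 200_000, rng)
theoretical = 1.0 / (1.0 + math.exp(-eta))
print(round(simulated, 3), round(theoretical, 3))
```

The two numbers agree up to sampling error, reflecting the symmetry of the logistic distribution: the probability that the noise is at least -eta equals the probability that it is at most eta.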
19 Discrimination and classification rules
- The aim of this section is to classify an observation into a given group.
- We need to find a rule which allows us to determine whether or not an observation falls into a certain group/category/class.
- We define a categorical variable and use this indicator as the response variable.
- We need to establish a classification rule for discriminating between our populations.
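One simple rule of this kind, sketched under stated assumptions, assigns an observation to group 1 when its predicted probability from a logistic model reaches a cutoff. The cutoff of 0.5, the coefficients and the function names below are hypothetical choices for illustration; the slides do not specify a particular rule.

```python
import math

def predicted_probability(x, intercept, slope):
    """Predicted probability of group 1 from a one-variable logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def classify(x, intercept, slope, cutoff=0.5):
    """Return 1 if the predicted probability reaches the cutoff, else 0."""
    return 1 if predicted_probability(x, intercept, slope) >= cutoff else 0

# Hypothetical coefficients: the probability equals 0.5 at x = 50.
intercept, slope = -5.0, 0.1

ages = [25, 40, 60, 75]
labels = [classify(age, intercept, slope) for age in ages]
print(list(zip(ages, labels)))
```

With a cutoff of 0.5 the rule is equivalent to checking whether the log odds is non-negative. A different cutoff trades false positives against false negatives.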