Binary Logistic Regression - PowerPoint PPT Presentation

Slides: 93
Provided by: JohnO150
Transcript and Presenter's Notes

Title: Binary Logistic Regression


1
Binary Logistic Regression
"To be or not to be, that is the question." (William Shakespeare, Hamlet)
2
Binary Logistic Regression
  • Also known as logistic or sometimes logit
    regression
  • Foundation from which more complex models derived
  • e.g., multinomial regression and ordinal logistic
    regression

3
Dichotomous Variables
  • Two categories indicating whether an event has
    occurred or some characteristic is present
  • Sometimes called binary or binomial variables

4
Dichotomous DVs
  • Placed in foster care or not
  • Diagnosed with a disease or not
  • Abused or not
  • Pregnant or not
  • Service provided or not

5
Single (Dichotomous) IV Example
  • DV = continue fostering, 0 = no, 1 = yes
  • Customary to code category of interest 1 and the
    other category 0
  • IV = married, 0 = not married, 1 = married
  • N = 131 foster families
  • Are two-parent families more likely to continue
    fostering than one-parent families?

6
Crosstabulation
  • Table 2.1
  • Relationship between marital status and
    continuation is statistically significant, χ²(1,
    N = 131) = 5.65, p = .017
  • A higher percentage of two-parent families
    (62.20%) than single-parent families (40.82%)
    planned to continue fostering

7
Strength & Direction of Relationships
  • Different ways to quantify the relationship
    between IV(s) and DV
  • Probabilities
  • Odds
  • Odds Ratio (OR)
  • Also abbreviated as e^B, Exp(B) (on SPSS output),
    or exp(B)
  • % change

8
Roadmap to Computations
9
Probabilities
  • Percentages in Table 2.1 as probabilities (e.g.,
    62.20% as .6220)
  • p
  • Probability that event will occur (continue)
  • e.g., probability that one-parent families plan
    to continue is .4082
  • 1 - p
  • Probability that event will not occur (not
    continue)
  • e.g., probability that one-parent families do not
    plan to continue is .5918 (1 - .4082)

10
Odds
  • Ratio of probability that event will occur to
    probability that it will not
  • e.g., odds of continuation for one-parent
    families are .69 (.4082 / .5918)
  • Can range from 0 to positive infinity
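
A minimal sketch of the probability-to-odds conversion described on this slide. The function name `odds` is illustrative and does not come from the slides.

```python
def odds(p):
    """Odds of an event that occurs with probability p (0 <= p < 1)."""
    return p / (1 - p)

# One-parent families: probability of planning to continue is .4082
print(round(odds(0.4082), 2))  # 0.69

# Equal chances of occurring and not occurring give odds of 1
print(odds(0.5))  # 1.0
```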

11
Probabilities and Odds
  • Table 2.2
  • Odds = 1
  • Both outcomes equally likely
  • Odds > 1
  • Probability that event will occur greater than
    probability that it will not
  • Odds < 1
  • Probability that event will occur less than
    probability that it will not

12
Odds Ratio (OR)
  • Odds of the event for one value of the IV
    (two-parent families) divided by the odds for a
    different value of the IV, usually a value one
    unit lower (one-parent families)
  • e.g., odds of continuing for two-parent families
    are more than double the odds for one-parent families
  • OR = 1.6455 / .6898 = 2.39
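
The odds ratio above can be reproduced from the two group probabilities reported in the crosstabulation. This is a small sketch; the function names are illustrative.

```python
def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1 - p)

def odds_ratio(p_group, p_reference):
    """Odds for one group divided by the odds for the reference group."""
    return odds(p_group) / odds(p_reference)

p_two_parent = 0.6220  # probability of continuing, two-parent families
p_one_parent = 0.4082  # probability of continuing, one-parent families

print(round(odds(p_two_parent), 4))                      # 1.6455
print(round(odds_ratio(p_two_parent, p_one_parent), 2))  # 2.39
```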

13
OR (cont'd)
  • Plays a central role in quantifying the strength
    and direction of relationships between IVs and
    DVs in binary, multinomial, and ordinal logistic
    regression
  • OR < 1 indicates a negative relationship
  • OR > 1 indicates a positive relationship
  • OR = 1 indicates no linear relationship

14
ORs > 1
  • e.g., OR of 2.39
  • A one-unit increase in the independent variable
    increases the odds of continuing by a factor of
    2.39
  • The odds of continuing are 2.39 times higher for
    two-parent compared to one-parent families

15
ORs < 1
  • e.g., OR = .50
  • A one-unit increase in the independent variable
    decreases the odds of continuing by a factor of
    .50
  • The odds that two-parent families will continue
    are .50 (or one-half) of the odds that one-parent
    families will continue

16
ORs < 1 (cont'd)
  • Compute reciprocal (i.e., 1 / .50 = 2.00)
  • Express relationship as opposite event of
    interest (e.g., discontinuing)
  • A one-unit increase in the independent variable
    increases the odds of discontinuing by a factor
    of 2.00
  • The odds that two-parent families will
    discontinue are 2.00 times (or twice) the odds of
    one-parent families

17
OR to Percentage Change
  • % change = 100(OR - 1)
  • Alternative way to express OR
  • e.g., A one-unit increase in the independent
    variable increases the odds of continuing by
    139.00%
  • 100(2.39 - 1) = 139.00
  • e.g., A one-unit increase in the independent
    variable decreases the odds of continuing by
    50.00%
  • 100(.50 - 1) = -50.00

18
Comparing OR > 1 and OR < 1
  • Compute reciprocal of one of the ORs
  • e.g., OR of 2.00 and an OR of .50
  • Reciprocal of .50 is 2.00 (1 / .50 = 2.00)
  • ORs are equal in size (but not in direction of
    the relationship)
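
The percentage-change and reciprocal conversions from the preceding slides, as a short sketch. The helper names `pct_change` and `reciprocal` are illustrative.

```python
def pct_change(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return 100 * (odds_ratio - 1)

def reciprocal(odds_ratio):
    """Odds ratio re-expressed for the opposite event."""
    return 1 / odds_ratio

print(round(pct_change(2.39), 2))  # 139.0
print(round(pct_change(0.50), 2))  # -50.0
print(reciprocal(0.50))            # 2.0
```

An OR of .50 and an OR of 2.00 therefore describe effects of the same size in opposite directions.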

19
Qualitative Descriptors for OR
  • Table 2.3
  • Use cautiously with IVs that aren't dichotomous

20
Question & Answer
  • Are two-parent families more likely to continue
    fostering than one-parent families?
  • Yes. The odds of continuing are 2.39 times (139%)
    higher for two-parent compared to one-parent
    families. The probability of continuing is .41
    for one-parent families and .62 for two-parent
    families.

21
Binary Logistic Regression Example
  • DV = continue fostering, 0 = no, 1 = yes
  • Customary to code category of interest 1 and the
    other category 0
  • IV = married, 0 = not married, 1 = married
  • N = 131 foster families
  • Are two-parent families more likely to continue
    fostering than one-parent families?

22
Statistical Significance
  • Table 2.4
  • Relationship between marital status and
    continuation is statistically significant (Wald
    χ² = 5.544, p = .019)

23
Direction of Relationship
  • B = slope
  • Positive slope, positive relationship
  • OR > 1
  • Negative slope, negative relationship
  • OR < 1
  • 0 slope, no linear relationship
  • OR = 1

24
Direction/Strength of Relationship
  • Positive relationship between marital status and
    continuation
  • Two-parent families more likely to continue
  • B = .869
  • Exp(B) = OR = 2.385
  • % change = 100(2.385 - 1) = 139%
  • The odds of continuing are 2.39 times (139%)
    higher for two-parent compared to one-parent
    families

25
Roadmap to Computations
26
Binary Logistic Regression Model
  • ln(p / (1 - p)) = α + β1X1 + β2X2 + ... + βkXk, or
  • ln(p / (1 - p)) = η
  • p is the probability of the event
  • η (eta) is the abbreviation for the linear
    predictor (right-hand side of this equation)
  • k = number of independent variables

27
Logit Link
  • ln(p / (1 - p))
  • Log of the odds that the DV equals 1 (event
    occurs)
  • Connects (i.e., links) DV to linear combination
    of IVs
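
A minimal sketch of the logit link and its inverse, using only the Python standard library. The function names are illustrative.

```python
import math

def logit(p):
    """Log of the odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def inverse_logit(eta):
    """Maps a linear predictor value back to a probability."""
    return 1 / (1 + math.exp(-eta))

print(logit(0.5))          # 0.0  (odds of 1, so log odds of 0)
print(inverse_logit(0.0))  # 0.5
```

The two functions undo each other, which is what lets the model be estimated on the logit scale and reported as probabilities.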

28
Estimated Logits (L)
  • ln(p / (1 - p)) = a + B1X1 + B2X2 + ... + BkXk
  • ln(p / (1 - p))
  • Log of the odds that the DV equals 1 (event
    occurs)
  • Estimated logit, L
  • Does not have intuitive or substantive meaning
  • Useful for examining curvilinear relationships
    and interaction effects
  • Primarily useful for estimating probabilities,
    odds, and ORs

29
Estimated Logits (L)
  • L(Continue) = a + (BMarried)(XMarried)
  • L(Continue) = -.372 + (.869)(XMarried)
  • a = intercept
  • B = slope

30
Logit to Odds
  • If L = 0
  • Odds = e^L = e^0 = 1.00
  • If L = .50
  • Odds = e^L = e^.50 = 1.65
  • If L = 1.00
  • Odds = e^L = e^1.00 = 2.72

31
Logits to Odds (cont'd)
  • Table 2.4
  • One-parent families
  • L(Continue) = -.372 = -.372 + (.869)(0)
  • Odds of continuing = e^-.372 = .69
  • Two-parent families
  • L(Continue) = .497 = -.372 + (.869)(1)
  • Odds of continuing = e^.497 = 1.65

32
Odds to OR
  • OR = 1.65 / .69 = 2.39, or
  • e^.869 = 2.39, labeled Exp(B)
  • Table 2.4
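
The chain from estimated logits to odds to the OR can be sketched as follows, using the intercept (-.372) and slope (.869) reported on the slides. Because those coefficients are rounded to three decimals, the results agree with the slide values only to within rounding.

```python
import math

a = -0.372         # intercept from the slides
b_married = 0.869  # slope for marital status from the slides

def logit_continue(married):
    """Estimated logit of continuing; married is 0 (one-parent) or 1 (two-parent)."""
    return a + b_married * married

odds_one_parent = math.exp(logit_continue(0))
odds_two_parent = math.exp(logit_continue(1))

print(odds_one_parent)                    # close to the slide's .69
print(odds_two_parent)                    # close to the slide's 1.65
print(odds_two_parent / odds_one_parent)  # equals exp(.869), the OR
```

Dividing the two odds cancels the intercept, which is why the OR depends only on the slope.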

33
OR to Percentage Change
  • % change = 100(OR - 1)
  • e.g., A one-unit increase in the independent
    variable increases the odds of continuing by
    139.00%
  • 100(2.39 - 1) = 139.00
  • e.g., A one-unit increase in the independent
    variable decreases the odds of continuing by
    50.00%
  • 100(.50 - 1) = -50.00

34
Logits to Probabilities
  • One-parent families, L(Continue) = -.372
  • Two-parent families, L(Continue) = .497
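
The conversion formula itself did not survive in this transcript. A sketch using the standard inverse-logit identity, p = e^L / (1 + e^L), reproduces the probabilities reported elsewhere in the deck. The function name is illustrative.

```python
import math

def probability_from_logit(logit_value):
    """Standard inverse logit: p = e^L / (1 + e^L), written as 1 / (1 + e^-L)."""
    return 1 / (1 + math.exp(-logit_value))

print(round(probability_from_logit(-0.372), 2))  # 0.41 (one-parent families)
print(round(probability_from_logit(0.497), 2))   # 0.62 (two-parent families)
```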

35
Question & Answer
  • Are two-parent families more likely to continue
    fostering than one-parent families?
  • Yes. The odds of continuing are 2.39 times (139%)
    higher for two-parent compared to one-parent
    families. The probability of continuing is .41
    for one-parent families and .62 for two-parent
    families.

36
Single (Quantitative) IV Example
  • DV = continue fostering, 0 = no, 1 = yes
  • Customary to code category of interest 1 and
    other category 0
  • IV = number of resources
  • N = 131 foster families
  • Are foster families with more resources more
    likely to continue fostering?

37
Statistical Significance
  • Table 2.5
  • Relationship between resources and continuation
    is statistically significant (Wald χ² = 4.924,
    p = .026)
  • H0 ? 0, ? ? 0, ? 0, same as
  • H0 OR 1, OR ? 1, OR 1
  • Likelihood ratio χ² better than Wald

38
Direction/Strength of Relationship
  • Positive relationship between resources and
    continuation
  • Families with more resources are more likely to
    continue
  • B = .212
  • Exp(B) = OR = 1.237
  • % change = 100(1.237 - 1) = 24%
  • The odds of continuing are 1.24 times (24%)
    higher for each additional resource

39
Estimated Logits
  • L(Continue) = -1.227 + (.212)(X)

40
Figures
  • Resources.xls

41
Effect of Resources on Continuation (Logits)
42
Effect of Resources on Continuation (Odds)
43
Effect of Resources on Continuation
(Probabilities)
44
Question & Answer
  • Are foster families with more resources more
    likely to continue fostering?
  • Yes. The odds of continuing are 1.24 times (24%)
    higher for each additional resource. The
    probability of continuing is .31 for families
    with two resources, .51 for families with 6
    resources, and .71 for families with 10 resources.
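
The probabilities quoted above follow from the estimated logit for resources (intercept -1.227, slope .212). A short sketch, with an illustrative function name:

```python
import math

a = -1.227           # intercept from the slides
b_resources = 0.212  # slope per additional resource from the slides

def p_continue(resources):
    """Estimated probability of continuing for a given number of resources."""
    logit_value = a + b_resources * resources
    return 1 / (1 + math.exp(-logit_value))

for n in (2, 6, 10):
    print(n, round(p_continue(n), 2))
# 2 0.31
# 6 0.51
# 10 0.71
```

Each step of four resources adds the same amount to the logit; that the probability steps also come out equal here is a coincidence of these particular values, not a general property.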

45
Relationship of Linear Predictor to Logits, Odds,
and p
  • Relationship between linear predictor and logits
    is linear
  • Relationship between linear predictor and odds is
    non-linear
  • Relationship between linear predictor and p is
    non-linear
  • Challenge is to summarize changes in odds and
    probabilities associated with changes in IVs in
    the most meaningful and parsimonious way

46
Logit as Function of Linear Predictor
47
Odds as Function of Linear Predictor
48
Probabilities as Function of Linear Predictor
49
IVs to z-scores
  • z-scores (standard scores)
  • Only the IV (not DV)--semi-standardized slopes
  • One-unit increase in the IV refers to a
    one-standard-deviation increase
  • OR interpreted as expected change in the odds
    associated with a one standard deviation increase
    in the IV
  • Conversion to z-scores changes intercept, slope,
    and OR, but not associated test statistics
  • Table 2.6 (compare to Table 2.5)

50
Figures
  • zResources.xls

51
Effect of zResources on Continuation
(Probabilities)
52
Question & Answer
  • Are foster families with more resources more
    likely to continue fostering?
  • Yes. The odds of continuing are 1.51 times (51%)
    higher for each one standard deviation (1.93)
    increase in resources. The probability of
    continuing is .34 for families with resources two
    standard deviations below the mean, .54 for
    families with the mean number of resources
    (6.60), and .73 for families with resources two
    standard deviations above the mean.
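
When only the IV is converted to z-scores, the slope is the raw slope multiplied by the IV's standard deviation. This sketch assumes the raw slope of .212 from the earlier resources model and the standard deviation of 1.93 quoted above.

```python
import math

b_raw = 0.212        # slope per additional resource (raw units)
sd_resources = 1.93  # standard deviation of resources, from the slides

# Semi-standardized slope: change in the logit per one-SD increase in the IV
b_z = b_raw * sd_resources
or_per_sd = math.exp(b_z)

print(round(or_per_sd, 2))              # 1.51
print(round(100 * (or_per_sd - 1)))     # 51 (% change in odds per SD)
```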

53
IVs Centered
  • Centering
  • Typically center on mean
  • Useful when testing interactions, curvilinear
    relationships, or when no meaningful 0 point
    (e.g., no family with 0 resources)
  • Centering doesn't change slope, OR, or associated
    test statistics, but does change the intercept
  • Table 2.7 (compare to Table 2.5)

54
Figures
  • cResources.xls

55
Effect of cResources on Continuation
(Probabilities)
56
Question & Answer
  • Are foster families with more resources more
    likely to continue fostering?
  • Yes. The odds of continuing are 1.24 times (24%)
    higher for each additional resource. The
    probability of continuing is .34 for families
    with 4 resources below the mean, .54 for families
    with the mean number of resources (6.60), and .74
    for families with 4 resources above the mean.
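
A sketch of why centering changes only the intercept. It assumes the uncentered coefficients from the earlier resources model (intercept -1.227, slope .212) and the mean of 6.60 quoted above. The centered intercept here is derived from those rounded values, so it may differ slightly from the one in Table 2.7.

```python
import math

a_raw = -1.227         # intercept of the uncentered model
b = 0.212              # slope, unchanged by centering
mean_resources = 6.60  # mean number of resources, from the slides

# Centered intercept: the logit evaluated at the mean of the IV
a_centered = a_raw + b * mean_resources

def logit_raw(x):
    return a_raw + b * x

def logit_centered(x_centered):
    return a_centered + b * x_centered

# Same family, two parameterizations, same logit
print(logit_raw(8), logit_centered(8 - mean_resources))

# The centered intercept gives the probability at the mean directly
print(round(1 / (1 + math.exp(-a_centered)), 2))  # 0.54
```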

57
Multiple IV Example
  • DV = continue fostering, 0 = no, 1 = yes
  • Customary to code the category of interest as 1
    and the other category as 0
  • IV = married, 0 = not married, 1 = married
  • IV = number of resources (z-scores)
  • N = 131 foster families
  • Are foster families with more resources more
    likely to continue fostering, controlling for
    marital status?

58
Statistical Significance
  • Table 2.12
  • Relationship between set of IVs and continuation
    is statistically significant (χ² = 6.58, p =
    .037)
  • H0: β1 = β2 = ... = βk = 0, same as
  • H0: ψ1 = ψ2 = ... = ψk = 1
  • ψ (psi) is symbol for population value of OR

59
Statistical Significance (cont'd)
  • Table 2.13
  • Relationship between resources and continuation
    is not statistically significant, controlling for
    marital status (χ² = .92, p = .338)
  • Relationship between marital status and
    continuation is not statistically significant,
    controlling for resources (χ² = 1.42, p = .234)
  • H0 ? 0, ? ? 0, ? 0, same as
  • H0 ? 1, ? ? 1, ? 1
  • ψ (psi) is symbol for population value of OR
  • Likelihood ratio χ² better than Wald

60
Statistical Significance (cont'd)
  • Table 2.9
  • Relationship between resources and continuation
    is not statistically significant, controlling for
    marital status (χ² = .91, p = .340)
  • Relationship between marital status and
    continuation is not statistically significant,
    controlling for resources (χ² = 1.41, p = .235)
  • H0 ? 0, ? ? 0, ? 0, same as
  • H0 ? 1, ? ? 1, ? 1
  • ψ (psi) is symbol for population value of OR
  • Wald χ², but likelihood ratio χ² better

61
Estimated Logits
  • L(Continue) = -.183 + (.228)(XzResources) +
    (.570)(XMarried)

62
ORs & Percentage Change
  • ORzResources = 1.256 (ns)
  • The odds of continuing are 1.26 times (26%)
    higher for each one standard deviation (1.93)
    increase in resources, controlling for marital
    status
  • ORMarried = 1.769 (ns)
  • The odds of continuing are 1.77 times (77%)
    higher for two-parent compared to one-parent
    families, controlling for resources

63
Figures
  • Married zResources.xls

64
Effect of Resources and Marital Status on Plans
to Continue Fostering (Odds)
65
Effect of Resources and Marital Status on Plans
to Continue Fostering (Probabilities)
66
Presenting Odds and Probabilities in Tables
  • Tables 2.10 and 2.11

67
Question & Answer
  • Are foster families with more resources more
    likely to continue fostering, controlling for
    marital status?
  • No (ns). The odds of continuing are 1.26 times
    (26%) higher for each one standard deviation
    (1.93) increase in resources, controlling for
    marital status.
  • Cont'd

68
Question & Answer (cont'd)
  • For one-parent families the probability of
    continuing is .35 for families with resources two
    standard deviations below the mean, .45 for
    families with the mean number of resources, and
    .57 for families with resources two standard
    deviations above the mean. For two-parent
    families the probability of continuing is .48 for
    families with resources two standard deviations
    below the mean, .60 for families with the mean
    number of resources, and .70 for families with
    resources two standard deviations above the mean.
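
The six probabilities above can be reproduced from the two-IV estimated logit (intercept -.183, slope .228 for z-scored resources, slope .570 for marital status). A short sketch with an illustrative function name:

```python
import math

a = -0.183            # intercept from the slides
b_z_resources = 0.228  # slope per one-SD increase in resources
b_married = 0.570      # slope for married (1) vs. not married (0)

def p_continue(z_resources, married):
    """Estimated probability of continuing for given z-scored resources and marital status."""
    logit_value = a + b_z_resources * z_resources + b_married * married
    return 1 / (1 + math.exp(-logit_value))

for married, label in ((0, "one-parent"), (1, "two-parent")):
    probs = [round(p_continue(z, married), 2) for z in (-2, 0, 2)]
    print(label, probs)
# one-parent [0.35, 0.45, 0.57]
# two-parent [0.48, 0.6, 0.7]
```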

69
Comparing the Relative Strength of IVs
  • Size of slope and OR depend on how the IV is
    measured
  • When IVs measured the same way (e.g., two
    dichotomous IVs or two continuous IVs transformed
    to z-scores) relative strength can be compared
  • Nothing comparable to standardized slope (Beta)

70
Nested Models
71
Nested Models (cont'd)
  • One regression model is nested within another if
    it contains a subset of variables included in the
    model within which it's nested, and the same cases
    are analyzed in both models
  • The more complex model called the full model
  • The nested model called the reduced model.
  • Comparison of full and reduced models allows you
    to examine whether one or more variable(s) in the
    full model contribute to explanation of the DV

72
Sequential Entry of IVs
  • Used to compare full and reduced models
  • e.g., family resources entered first, and then
    marital status
  • F change used in linear regression

73
Sequential Entry of IVs (cont'd)
  • SPSS GZLM doesn't allow sequential entry of IVs
  • Estimate models separately and compare omnibus
    likelihood ratio χ² values
  • Reduced model χ²(1) = 5.168
  • Full model χ²(2) = 6.585
  • χ² difference = 6.585 - 5.168 = 1.417
  • df difference = 2 - 1 = 1
  • p = .234
  • Chi-square Difference.xls
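
The chi-square difference test above can be checked by hand. This sketch uses a standard-library shortcut that is valid only when the df difference is 1: the upper-tail chi-square probability then equals erfc(sqrt(x / 2)). For other df values a statistics library would be needed (for example `scipy.stats.chi2.sf`). The function name is illustrative.

```python
import math

def chi2_upper_tail_df1(x):
    """Upper-tail p-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(x / 2))

chi2_reduced, df_reduced = 5.168, 1  # resources only
chi2_full, df_full = 6.585, 2        # resources and marital status

chi2_diff = chi2_full - chi2_reduced
df_diff = df_full - df_reduced

print(round(chi2_diff, 3), df_diff)              # 1.417 1
print(round(chi2_upper_tail_df1(chi2_diff), 3))  # 0.234
```

The non-significant difference indicates that adding marital status does not improve on the model that contains resources alone.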

74
Assumptions Necessary for Testing Hypotheses
  • No assumptions unique to binary logistic
    regression other than ones discussed in GZLM
    lecture

75
Model Evaluation
  • Evaluate your model before you test hypotheses or
    interpret substantive results
  • Outliers
  • Analogs of R²

76
Outliers
  • Atypical cases
  • Can lead to flawed conclusions
  • Can provide theoretical insights
  • Common causes
  • Data entry errors
  • Model misspecification
  • Rare events

77
Outliers (cont'd)
  • Leverage
  • Residuals
  • Standardized or unstandardized deviance residuals
  • Influence
  • Cook's D

78
Leverage
  • Think of a seesaw
  • Leverage value for each case
  • Cases with greater leverage can exert a
    disproportionately large influence
  • No clear benchmarks
  • Identify cases with substantially different
    leverage values than those of other cases

79
Residuals
  • Difference between actual and estimated values of
    the DV for a case
  • Residual for each case
  • Large residual indicates a case for which model
    fits poorly

80
Residuals (cont'd)
  • Standardized or unstandardized deviance residuals
  • Not normally distributed
  • Values less than -2 or greater than 2 warrant
    some concern
  • Values less than -3 or greater than 3 merit
    close inspection
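
A sketch of how unstandardized deviance residuals for a 0/1 outcome can be computed and screened against the benchmarks above. The formula is the standard one for binary data; the cases are hypothetical and the function names are illustrative. Standardized residuals would additionally adjust for leverage, which is not shown here.

```python
import math

def deviance_residual(y, p):
    """Unstandardized deviance residual for outcome y (0 or 1) and fitted probability p."""
    log_likelihood = math.log(p) if y == 1 else math.log(1 - p)
    sign = 1 if y == 1 else -1
    return sign * math.sqrt(-2 * log_likelihood)

def screen(residual):
    """Apply the rough benchmarks from the slide."""
    size = abs(residual)
    if size > 3:
        return "inspect closely"
    if size > 2:
        return "some concern"
    return "ok"

# Hypothetical (observed outcome, fitted probability) pairs
cases = [(1, 0.62), (0, 0.41), (1, 0.05), (0, 0.995)]
for y, p in cases:
    d = deviance_residual(y, p)
    print(y, p, round(d, 2), screen(d))
```

Large residuals arise when the observed outcome was assigned a very low fitted probability, as in the last two hypothetical cases.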

81
Influence
  • Cases whose deletion results in substantial
    changes to regression coefficients
  • Cook's D for each case
  • Approximate aggregate change in regression
    parameters resulting from deletion of a case
  • Values of 1.0 or more indicate a problematic
    degree of influence for an individual case

82
Index Plot
  • Scatterplot
  • Horizontal axis (X)
  • Case id
  • Vertical axis (Y)
  • Leverage values, or
  • Residuals, or
  • Cook's D

83
Index Plot: Leverage Values
84
Index Plot: Standardized Deviance Residuals
85
Index Plot: Cook's D
86
Analogs of R²
  • None in standard use and each may give different
    results
  • Typically much smaller than R² values in linear
    regression
  • Difficult to interpret

87
Multicollinearity
  • SPSS GZLM doesn't compute multicollinearity
    statistics
  • Use SPSS linear regression
  • Problematic levels
  • Tolerance < .10 or
  • VIF > 10
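
The two benchmarks above are the same rule stated two ways, since VIF is the reciprocal of tolerance. In this sketch `r_squared` is the R² from regressing one IV on all the other IVs; the input values are hypothetical and the function names are illustrative.

```python
def tolerance(r_squared):
    """Share of an IV's variance not explained by the other IVs."""
    return 1 - r_squared

def vif(r_squared):
    """Variance inflation factor: reciprocal of tolerance."""
    return 1 / tolerance(r_squared)

def is_problematic(r_squared):
    """Tolerance below .10, equivalently VIF above 10."""
    return tolerance(r_squared) < 0.10

# Hypothetical R-squared values for two IVs
for r2 in (0.50, 0.95):
    print(r2, round(tolerance(r2), 2), round(vif(r2), 1), is_problematic(r2))
# 0.5 0.5 2.0 False
# 0.95 0.05 20.0 True
```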

88
Additional Topics
  • Polychotomous IVs
  • Curvilinear relationships
  • Interactions

89
Overview of the Process
  • Select IVs and decide whether to test curvilinear
    relationships or interactions
  • Carefully screen and clean data
  • Transform and code variables as needed
  • Estimate regression model
  • Examine assumptions necessary to estimate binary
    regression model, examine model fit, and revise
    model as needed

90
Overview of the Process (cont'd)
  • Test hypotheses about the overall model and
    specific model parameters, such as ORs
  • Create tables and graphs to present results in
    the most meaningful and parsimonious way
  • Interpret results of the estimated model in terms
    of logits, probabilities, odds, and odds ratios,
    as appropriate

91
Additional Regression Models for Dichotomous DVs
  • Binary probit regression
  • Substantive results essentially indistinguishable
    from binary logistic regression
  • Choice between this and binary logistic
    regression largely one of convenience and
    discipline-specific convention
  • Many researchers prefer binary logistic
    regression because it provides odds ratios
    whereas probit regression does not, and binary
    logistic regression comes with a wider variety of
    fit statistics

92
Additional Regression Models for Dichotomous DVs
(cont'd)
  • Complementary log-log (clog-log) and log-log
    models
  • Probability of the event is very small or large
  • Loglinear regression
  • Limited to categorical IVs
  • Discriminant analysis
  • Limited to continuous IVs