Logistic Regression - PowerPoint PPT Presentation

1
Logistic Regression
  • Laura Heath

2
What is linear regression?
  • Begin with a series of measurements
  • Often have a training set (supervised learning)
  • Measurements may be noisy
  • Use them to estimate a linear, stochastic
    function to model the real-world process
  • Use the model to predict future outputs or to
    assign future data to classes

3
Why use linear models?
  • A variety of models is available
  • Least squares
  • Nearest neighbor
  • Logistic regression
  • Computationally tractable to fit
  • Can be used with transforms of the inputs
  • Often more robust than non-linear models with
    noisy observations, sparse data or small training
    sets

4
Terminology
  • X = (X1, X2, ..., Xp) is a vector of inputs
    (observations)
  • G takes values in {1, 2, ..., K}, the set of classes
    the data is assigned to
  • Y = (y0, y1, ..., yp) is the vector of predicted
    values

5
The model
  • The model we are forming is
  •   f(X) = β0 + Σ (j = 1 to p) Xj βj
  • Choose the maximum-likelihood parameters to model
    the posterior probabilities of the K classes

6
The model (cont.)
  • log[ Pr(G = 1 | X = x) / Pr(G = K | X = x) ]
      = β10 + β1ᵀx
  • log[ Pr(G = 2 | X = x) / Pr(G = K | X = x) ]
      = β20 + β2ᵀx
  •   ...
  • log[ Pr(G = K-1 | X = x) / Pr(G = K | X = x) ]
      = β(K-1)0 + β(K-1)ᵀx
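For the binary (K = 2) case used in the rest of the development, inverting the log-odds gives the logistic (sigmoid) function. A minimal Python/NumPy sketch; the coefficient values below are illustrative, not from the slides:

```python
import numpy as np

def posterior_prob(x, beta0, beta):
    """Posterior probability Pr(G = 1 | X = x) for the binary case.

    The model says log[Pr(G=1|x) / Pr(G=2|x)] = beta0 + beta^T x;
    solving for Pr(G=1|x) gives the sigmoid of the log-odds.
    """
    eta = beta0 + np.dot(beta, x)       # linear predictor (log-odds)
    return 1.0 / (1.0 + np.exp(-eta))   # sigmoid, always in (0, 1)

# Illustrative coefficients: log-odds = -1 + 0.5*x1 + 0.25*x2
p = posterior_prob(np.array([1.0, 2.0]), beta0=-1.0,
                   beta=np.array([0.5, 0.25]))
# eta = -1 + 0.5 + 0.5 = 0, so p = sigmoid(0) = 0.5
```

Note that the two posterior probabilities automatically lie in [0, 1] and sum to one, as the next slide states.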

7
Properties
  • The probabilities all lie in [0, 1]
  • They all sum to one
  • Each depends on the entire set of parameters
    θ = {β10, β1, β20, β2, ..., β(K-1)0, β(K-1)}
  • Pr(G | X) is a multinomial distribution
  • Binomial if G ∈ {0, 1}
  • Will use this case for the following development

8
Determining the parameters
  • Define p1(x) = p(x), p2(x) = 1 - p(x)
  • The log-likelihood is
  •   l(β) = Σi [ yi log p(xi) + (1 - yi) log(1 - p(xi)) ]
  •        = Σi [ yi βᵀxi - log(1 + exp(βᵀxi)) ]
    (here each xi includes a constant 1, so β absorbs the
    intercept)
  • To maximize, set its derivative to zero:
  •   ∂l(β)/∂β = Σi xi (yi - p(xi)) = 0
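The log-likelihood and its gradient translate directly into code. A sketch in Python/NumPy, assuming each row of X carries a leading constant 1 so that β includes the intercept (the demo data below is made up):

```python
import numpy as np

def log_likelihood(beta, X, y):
    """l(beta) = sum_i [ y_i * beta^T x_i - log(1 + exp(beta^T x_i)) ].

    X: N x p matrix of inputs (first column all ones for the intercept),
    y: length-N vector of 0/1 labels.
    """
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def gradient(beta, X, y):
    """dl/dbeta = sum_i x_i (y_i - p(x_i)); zero at the maximum."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

# Tiny check: with beta = 0, every p(x_i) = 1/2
X_demo = np.array([[1.0, 0.0],
                   [1.0, 1.0]])      # first column is the constant 1
y_demo = np.array([0.0, 1.0])
ll0 = log_likelihood(np.zeros(2), X_demo, y_demo)   # = -2 log 2
g0 = gradient(np.zeros(2), X_demo, y_demo)          # = [0.0, 0.5]
```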

9
Solving for the parameters
  • Most commonly, use the Newton-Raphson algorithm (an
    iterated least squares approach) to solve for the
    parameters iteratively. Take the Hessian:
  •   ∂²l(β)/∂β∂βᵀ = -Σi xi xiᵀ p(xi) (1 - p(xi))
  • Iterate the parameters:
  •   βnew = βold - [∂²l(β)/∂β∂βᵀ]⁻¹ ∂l(β)/∂β

10
Solving for the parameters
  • Defining
  • an NxN matrix W with the ith diagonal element
    p(xi)1- p(xi) of ßold
  • y the vector of yis
  • p the vector of fitted probabilities
  • Then
  • ?l(ß)/?ß XT(y-p)
  • ?2l(ß)/?ß?ßT-XTWX

11
Solving for the parameters
  • Finally
  •   βnew = βold + (XᵀWX)⁻¹ Xᵀ(y - p)
  •        = (XᵀWX)⁻¹ XᵀW [X βold + W⁻¹(y - p)]
  •        = (XᵀWX)⁻¹ XᵀW z
  •   where z = X βold + W⁻¹(y - p)
  • At each iteration p, W, and z change
  • Repeat the iteration until all three converge
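The full iteration above — recompute p, W, and z, then solve the weighted least-squares problem — can be sketched as follows (Python/NumPy; the toy data, iteration cap, and tolerance are my own choices, not from the slides):

```python
import numpy as np

def irls_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit binary logistic regression by iteratively reweighted
    least squares: beta <- (X^T W X)^{-1} X^T W z, with adjusted
    response z = X beta + W^{-1}(y - p), repeated until convergence.
    """
    beta = np.zeros(X.shape[1])               # usual start: all betas zero
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                     # diagonal of W
        z = X @ beta + (y - p) / w            # adjusted response
        # Weighted least squares, without forming the N x N matrix W
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X),
                                   X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy, non-separable data, so the maximum-likelihood solution exists
X_fit = np.column_stack([np.ones(5), np.arange(5.0)])
y_fit = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
beta_hat = irls_logistic(X_fit, y_fit)
```

At the solution the gradient Xᵀ(y - p) vanishes, which is a convenient convergence check. Note the sketch has no safeguard for perfectly separable data, where the weights w shrink toward zero and the iteration diverges.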

12
Notes on finding the coefficients
  • Convergence is not guaranteed: the initial conditions
    must be close enough to the solution
  • Usually start with all β = 0
  • If a step overshoots, halve the step size
  • If it converges, it does so exponentially fast
  • Can use other optimization methods as well

13
Example
  • South African heart disease statistics
                     Coefficient   Std. error   Wald test
    (Intercept)         -4.130        0.964       -4.285
    Blood pressure       0.006        0.006        1.023
    Tobacco              0.080        0.026        3.034
    LDL                  0.185        0.057        3.219
    Family History       0.939        0.225        4.178
    Obesity             -0.035        0.029       -1.187
    Alcohol              0.001        0.004        0.136
    Age                  0.043        0.010        4.184
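Each Wald statistic in the table is the coefficient divided by its standard error, and the standard errors come from the diagonal of the estimated covariance (XᵀWX)⁻¹ at the fitted β. A sketch of that computation (the helper name and the toy inputs are my own, not the heart-disease data):

```python
import numpy as np

def wald_table(X, beta, names):
    """Standard errors and Wald z-statistics for a fitted binary
    logistic model: Cov(beta-hat) is estimated by (X^T W X)^{-1}
    at the fit, se_j = sqrt(cov_jj), and z_j = beta_j / se_j.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)                          # diagonal of W at the fit
    cov = np.linalg.inv(X.T @ (w[:, None] * X))
    se = np.sqrt(np.diag(cov))
    return [(name, b, s, b / s) for name, b, s in zip(names, beta, se)]

# Illustrative fitted coefficients on made-up data
X_ex = np.column_stack([np.ones(6), np.arange(6.0)])
beta_ex = np.array([-1.0, 0.4])
rows = wald_table(X_ex, beta_ex, ["(Intercept)", "x"])
```

A |z| above roughly 2 marks a coefficient as significant at about the 5% level, which is how the weaker predictors in the table can be screened out.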

14
Example
  • Refit with only the stronger predictors retained:

                     Coefficient   Std. error   Wald test
    (Intercept)         -4.204        0.498       -8.45
    Tobacco              0.081        0.026        3.16
    LDL                  0.168        0.054        3.09
    Family History       0.924        0.223        4.14
    Age                  0.044        0.010        4.52

15
Final Comments on this Method
  • The weighted residual sum-of-squares is the Pearson
    chi-square statistic
  • If the model is correct, the maximum-likelihood
    parameter estimates are consistent (they converge to
    the true values as the sample size grows)
  • Model-building is costly for this method, because
    the model must be entirely re-fitted whenever a
    predictor is added or removed
  • It makes no assumptions about the joint density of
    the inputs, which gives it some robustness against
    noise and error