Lecture 4, part 1: Linear Regression Analysis: Two Advanced Topics

1
Lecture 4, part 1: Linear Regression Analysis:
Two Advanced Topics
Karen Bandeen-Roche, PhD
Department of Biostatistics, Johns Hopkins University
  • July 14, 2011

Introduction to Statistical Measurement and Modeling
2
Data examples
  • Boxing and neurological injury
  • Scientific question: Does amateur boxing lead to
    decline in neurological performance?
  • Some related statistical questions:
  • Is there a dose-response increase in the rate of
    cognitive decline with increased boxing exposure?
  • Is boxing-associated decline independent of
    initial cognition and age?
  • Is there a threshold of boxing that initiates
    harm?

3
Boxing data
4
Outline
  • Topic 1: Confounding
  • Handling this is crucial if we are to draw
    correct conclusions about risk factors
  • Topic 2: Signal/noise decomposition
  • Signal: Regression model predictions
  • Noise: Residual variation
  • Another way of approaching inference, precision
    of prediction

5
Topic 1: Confounding
  • "Confound" means to confuse
  • Confounding arises when the comparison is between groups that are
    otherwise not similar in ways that affect the
    outcome
  • Also known as "lurking variables"

6
Confounding Example: Drowning and Eating Ice Cream

[Figure: scatterplot of drowning rate (y-axis) versus ice cream eaten (x-axis)]
7
Confounding
Epidemiology definition: A characteristic C is
a confounder if it is associated (related) with
both the outcome (Y = drowning) and the risk
factor (X = ice cream) and is not causally in
between
8
Confounding
Statistical definition: A characteristic C is
a confounder if the strength of relationship
between the outcome (Y = drowning) and the risk
factor (X = ice cream) differs with, versus
without, adjustment for C
Here: C = outdoor temperature
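The statistical definition can be illustrated with simulated data (the mechanism and coefficients below are invented for illustration, as are the names `temp`, `ice_cream`, and `drown`): the ice-cream coefficient is large without adjustment for temperature and shrinks toward zero with it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
temp = rng.normal(25, 5, n)                    # C: outdoor temperature
ice_cream = 0.3 * temp + rng.normal(0, 1, n)   # X depends only on C
drown = 0.2 * temp + rng.normal(0, 1, n)       # Y depends only on C

def slopes(y, cols):
    """Least-squares coefficients (excluding intercept) of y on cols."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

unadjusted = slopes(drown, [ice_cream])[0]      # inherits C's effect
adjusted = slopes(drown, [ice_cream, temp])[0]  # near zero once C is included
print(unadjusted, adjusted)
```

Even though ice cream has no effect on drowning in this simulation, the unadjusted slope is clearly positive because both variables track temperature.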
9
Confounding Example: Drowning and Eating Ice Cream

[Figure: drowning rate versus ice cream eaten, with points separated into warm-temperature and cool-temperature groups]
10
Effect modification
A characteristic E is an effect modifier if the
strength of relationship between the outcome (Y =
drowning) and the risk factor (X = ice cream)
differs within levels of E
Here: E = outdoor temperature
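Effect modification corresponds to an interaction term in the regression model. A minimal simulated sketch (all coefficients and names invented for illustration), in which the ice-cream slope differs between warm and cool days:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
warm = rng.integers(0, 2, n)       # E: 1 = warm day, 0 = cool day
ice = rng.normal(2, 1, n)          # X: ice cream eaten
# Y: ice cream "matters" only on warm days (slope 0.5 vs 0)
drown = 0.5 * ice * warm + rng.normal(0, 0.5, n)

# Model with an interaction term: the X slope differs by level of E
X = np.column_stack([np.ones(n), ice, warm, ice * warm])
beta, *_ = np.linalg.lstsq(X, drown, rcond=None)
slope_cool = beta[1]               # X slope when warm = 0
slope_warm = beta[1] + beta[3]     # X slope when warm = 1
print(slope_cool, slope_warm)
```

A nonzero interaction coefficient (`beta[3]`) is the regression signature of effect modification.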
11
Effect Modification: Drowning and Eating Ice Cream

[Figure: drowning rate versus ice cream eaten, with warm-temperature and cool-temperature groups showing different slopes]
12
Topic 2: Signal/Noise Decomposition
  • Lovely due to geometry of least squares
  • Facilitates testing involving multiple parameters
    at once
  • Provides insight into R-squared

13
Signal/Noise Decomposition
  • First step: decomposition of variance
  • Regression part: Variance of the fitted values (Y-hats)
  • Error or Residual part: Variance of the residuals (e's)
  • Together: These determine the total variance of the Ys
  • Sums of Squares (SS) rather than variance per se:
  • Regression SS (SSR)
  • Error SS (SSE)
  • Total SS (SST)

14
Signal/Noise Decomposition
  • Properties
  • SST = SSR + SSE
  • SSR/SST = proportion of variance explained by
    regression = R-squared
  • Follows from geometry
  • SSR and SSE are independent (assuming A1-A5) and
    have easily characterized probability
    distributions
  • Provides convenient testing methods
  • Follows from geometry plus assumptions

15
Signal/Noise Decomposition
  • SSR and SSE are independent
  • Define M = span(X) and take Y as centered at its mean
  • It is possible to orthogonally rotate the
    coordinate axes so that the first p axes lie in M and the
    remaining n-p-1 axes lie in M-perp (the orthogonal complement of M)
  • Gram-Schmidt orthogonalization
  • Doing this transforms Y into TY = Z, for some
    orthonormal matrix T with columns e1,...,en-1
  • Distribution of Z: N(T E[Y|X], σ²I)

16
Signal/Noise Decomposition
  • SSR and SSE are independent - continued
  • TY = Z implies Y = T'Z
  • SSE = squared length of (Zp+1,...,Zn-1)
  • SSR = squared length of (Z1,...,Zp)
  • Claim now follows: SSR and SSE are independent
    because (Z1,...,Zp) and (Zp+1,...,Zn-1) are
    independent

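The rotation argument can be checked numerically. A sketch (simulated data, invented coefficients) using numpy's complete QR factorization, whose first p columns form an orthonormal basis of M = span(X):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

Xc = X - X.mean(axis=0)   # center the predictors
yc = y - y.mean()         # take Y as centered at its mean

# Complete QR gives an orthonormal basis of R^n whose first p
# columns span M = span(Xc); the rest span the orthogonal complement
Q, _ = np.linalg.qr(Xc, mode="complete")
Z = Q.T @ yc              # rotated coordinates: Z = T Y

SSR = np.sum(Z[:p] ** 2)  # squared length of (Z1,...,Zp)
SSE = np.sum(Z[p:] ** 2)  # squared length of the remaining coordinates

# Agrees with the usual least-squares quantities
beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
SSE_direct = np.sum((yc - Xc @ beta) ** 2)
print(SSR, SSE, SSE_direct)
```

The rotation preserves squared length, so SSR + SSE equals the total (centered) sum of squares, and the SSE coordinates agree with the residual sum of squares from the fit.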
17
Signal/Noise Decomposition
  • Under A1-A5, SSE, SSR and their scaled ratio have
    convenient distributions
  • Under A1-A2: E[Y|X] ∈ M, so E[Zj|X] = 0 for all j > p
  • Recall Z1,...,Zn-1 are mutually independent
    normal with variance σ²
  • Thus SSE/σ² ~ χ²(n-p-1) under A1-A5
  • (a sum of k independent squared N(0,1) variables
    is χ²(k))

18
Signal/Noise Decomposition
  • Under A1-A5, SSE, SSR and their scaled ratio have
    convenient distributions
  • For j ≤ p: E[Zj|X] ≠ 0 in general
  • Exception: H0: β1 = ... = βp = 0
  • Then SSR/σ² ~ χ²(p) under A1-A5,
    and

  • [SSR/p] / [SSE/(n-p-1)] ~ F(p, n-p-1)
  • with numerator and denominator independent.

19
Signal/Noise Decomposition
  • An organizational tool The analysis of variance
    (ANOVA) table

SOURCE     | Sum of Squares (SS) | Degrees of freedom (df) | Mean square (SS/df)
Regression | SSR                 | p                       | MSR = SSR/p
Error      | SSE                 | n-p-1                   | MSE = SSE/(n-p-1)
Total      | SST = SSR + SSE     | n-1                     |

F = MSR/MSE
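The ANOVA table entries and the overall F statistic can be computed directly from a least-squares fit. A sketch with simulated data (coefficients invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])          # design with intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
fitted = Xd @ beta

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((fitted - y.mean()) ** 2)         # regression SS
SSE = np.sum((y - fitted) ** 2)                # error SS

MSR, MSE = SSR / p, SSE / (n - p - 1)
F = MSR / MSE      # ~ F(p, n-p-1) under H0: beta1 = ... = betap = 0

print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}")
print(f"{'Regression':<12}{SSR:>10.2f}{p:>5}{MSR:>10.2f}")
print(f"{'Error':<12}{SSE:>10.2f}{n - p - 1:>5}{MSE:>10.2f}")
print(f"{'Total':<12}{SST:>10.2f}{n - 1:>5}")
```

With a genuine signal in the data, F lands far in the upper tail of F(p, n-p-1), and the printed table satisfies SST = SSR + SSE.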
20
Global hypothesis tests
  • These involve sets of parameters
  • Hypotheses of the form
  • H0: βj = 0 for all j in a defined subset of
    {1,...,p} vs. H1: βj ≠ 0 for at least one of
    the j
  • Example 1: H0: βLATITUDE = 0 and βLONGITUDE = 0
  • Example 2: H0: all polynomial or spline
    coefficients involving a given variable = 0
  • Example 3: H0: all coefficients involving a
    variable = 0

21
Global hypothesis tests
  • Testing method: Sequential decomposition of sums
    of squares
  • Hypothesis to be tested is H0: βj1 = ... = βjk = 0 in the
    full model
  • Fit the model excluding xj1,...,xjk; save its error sum of
    squares, SSE_S
  • Fit the full (or larger) model adding xj1,...,xjk
    to the smaller model; save its error sum of squares,
    SSE_L (often the overall SSE)
  • Test statistic: S = [(SSE_S - SSE_L)/k] / [SSE_L/(n-p-1)]
  • Distribution under the null: F(k, n-p-1)
  • Define the rejection region based on this
    distribution
  • Compute S
  • Reject or not as S is in the rejection region or not

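The steps above can be sketched with simulated data (variable names and coefficients invented for illustration), testing a single added predictor:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)   # x2 truly has no effect

def fit_sse(y, cols):
    """Error sum of squares from least squares on an intercept plus cols."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

SSE_S = fit_sse(y, [x1])        # smaller model: excludes x2
SSE_L = fit_sse(y, [x1, x2])    # larger (full) model
k, p = 1, 2                     # k coefficients tested; p predictors in full model

S = ((SSE_S - SSE_L) / k) / (SSE_L / (n - p - 1))  # ~ F(k, n-p-1) under H0
print(SSE_S, SSE_L, S)
```

Because the models are nested, SSE_S ≥ SSE_L always holds, so S is nonnegative; the null is rejected only when S exceeds the chosen F(k, n-p-1) critical value.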
22
Signal/Noise Decomposition
  • An augmented version for global testing

SOURCE     | Sum of Squares (SS) | Degrees of freedom (df) | Mean square (SS/df)
Regression | SSR                 | p                       | SSR/p
  X1       | SST - SSE_S         | p1                      | (SST - SSE_S)/p1
  X2 | X1  | SSE_S - SSE_L       | p2                      | (SSE_S - SSE_L)/p2
Error      | SSE_L               | n-p-1                   | SSE_L/(n-p-1)
Total      | SST = SSR + SSE     | n-1                     |

F = MS(X2 | X1)/MSE
23
R-squared: Another view
  • From last lecture: R² = [Corr(Y, Ŷ)]²
  • More conventional: R² = SSR/SST
  • Geometry justifies why they are the same
  • Cov(Y, Ŷ) = Cov(Y - Ŷ, Ŷ) + Var(Ŷ) = Cov(e, Ŷ) + Var(Ŷ) = Var(Ŷ)
  • Covariance = inner product, and the first
    term = 0 by orthogonality of residuals and fitted values
  • A measure of the precision with which the regression
    model describes individual responses

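The equivalence of the two views of R² can be checked numerically (simulated data, invented coefficients):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
yhat = Xd @ beta                               # fitted values

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - yhat) ** 2)
r2_ss = 1 - SSE / SST                          # conventional SSR/SST
r2_corr = np.corrcoef(y, yhat)[0, 1] ** 2      # squared Corr(Y, Y-hat)
print(r2_ss, r2_corr)
```

With an intercept in the model the two quantities agree to machine precision, exactly as the geometric argument predicts.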
24
Outline: A few more topics
  • Collinearity
  • Overfitting
  • Influence
  • Mediation
  • Multiple comparisons

25
Main points
  • Confounding occurs when an apparent association
    between a predictor and outcome reflects the
    association of each with a third variable
  • A primary goal of regression is to adjust for
    confounding
  • The least squares decomposition of Y into fit and
    residual provides an appealing statistical
    testing framework
  • An association of an outcome with predictors is
    evidenced if the SS due to regression is large
    relative to the SSE
  • Geometry: orthogonal decomposition provides a
    convenient sampling distribution and a view of R²
  • ANOVA