Title: Lecture 4, part 1: Linear Regression Analysis: Two Advanced Topics
1. Lecture 4, part 1: Linear Regression Analysis: Two Advanced Topics
Karen Bandeen-Roche, PhD
Department of Biostatistics, Johns Hopkins University
Introduction to Statistical Measurement and Modeling
2. Data examples
- Boxing and neurological injury
- Scientific question: Does amateur boxing lead to decline in neurological performance?
- Some related statistical questions:
  - Is there a dose-response increase in the rate of cognitive decline with increased boxing exposure?
  - Is boxing-associated decline independent of initial cognition and age?
  - Is there a threshold of boxing that initiates harm?
3. Boxing data
4. Outline
- Topic 1: Confounding
  - Handling this is crucial if we are to draw correct conclusions about risk factors
- Topic 2: Signal/noise decomposition
  - Signal: regression model predictions
  - Noise: residual variation
  - Another way of approaching inference and the precision of prediction
5. Topic 1: Confounding
- "Confound" means to confuse
- Confounding arises when the comparison is between groups that are otherwise not similar in ways that affect the outcome
6. Confounding Example: Drowning and Eating Ice Cream
[Figure: drowning rate plotted against ice cream eaten]
7. Confounding
Epidemiology definition: A characteristic C is a confounder if it is associated (related) with both the outcome (Y = drowning) and the risk factor (X = ice cream) and is not causally in between.
8. Confounding
Statistical definition: A characteristic C is a confounder if the strength of the relationship between the outcome (Y = drowning) and the risk factor (X = ice cream) differs with, versus without, adjustment for C.
Here, C = outdoor temperature.
9. Confounding Example: Drowning and Eating Ice Cream
[Figure: drowning rate vs. ice cream eaten, with separate trends at warm and cool temperatures]
10. Effect modification
A characteristic E is an effect modifier if the strength of the relationship between the outcome (Y = drowning) and the risk factor (X = ice cream) differs within levels of E.
Here, E = outdoor temperature.
11. Effect Modification: Drowning and Eating Ice Cream
[Figure: drowning rate vs. ice cream eaten, with different slopes at warm and cool temperatures]
12. Topic 2: Signal/Noise Decomposition
- Lovely due to the geometry of least squares
- Facilitates testing involving multiple parameters at once
- Provides insight into R-squared
13. Signal/Noise Decomposition
- First step: decomposition of variance
  - Regression part: variance of the fitted values Ŷ
  - Error or residual part: variance of the residuals e
  - Together, these determine the total variance of the Ys
- In practice we work with sums of squares (SS) rather than variances per se:
  - Regression SS (SSR)
  - Error SS (SSE)
  - Total SS (SST)
14. Signal/Noise Decomposition
- Properties
  - SST = SSR + SSE
  - SSR/SST = proportion of variance explained by the regression = R-squared
    - Follows from the geometry
  - SSR and SSE are independent (assuming A1-A5) and have easily characterized probability distributions
    - Provides convenient testing methods
    - Follows from the geometry plus the assumptions
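These properties are easy to verify numerically. The following is a minimal numpy sketch with simulated (hypothetical) data; the variable names and the simulated model are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): n observations, p = 2 predictors
n, p = 100, 2
X = rng.normal(size=(n, p))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.8, size=n)

# Least-squares fit with an intercept column
Xd = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta_hat

# Sums of squares around the mean of y
SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

print(np.isclose(SST, SSR + SSE))   # True: the decomposition holds
print(SSR / SST)                    # R-squared
```

The identity SST = SSR + SSE holds exactly (up to floating point) because the residual vector is orthogonal to the fitted values.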
15. Signal/Noise Decomposition
- SSR and SSE are independent
  - Define M = span(X) and take Y as centered at its mean Ȳ
  - It is possible to orthogonally rotate the coordinate axes so that the first p axes ∈ M and the remaining n-p-1 axes ∈ M⊥ (Gram-Schmidt orthogonalization)
  - Doing this transforms Y into Z = T'Y, for some orthonormal matrix T with columns e1, ..., e_{n-1}
  - Distribution of Z: N(T'E[Y|X], σ²I)
16. Signal/Noise Decomposition
- SSR and SSE are independent, continued
  - Z = T'Y ⇔ Y = TZ
  - SSE = squared length of (Z_{p+1}, ..., Z_{n-1})
  - SSR = squared length of (Z_1, ..., Z_p)
  - The claim now follows: SSR and SSE are independent because (Z_1, ..., Z_p) and (Z_{p+1}, ..., Z_{n-1}) are independent
17. Signal/Noise Decomposition
- Under A1-A5, SSE, SSR, and their scaled ratio have convenient distributions
  - Under A1-A2, E[Y|X] ∈ M, so E[Zj|X] = 0 for all j > p
  - Recall Z_1, ..., Z_{n-1} are mutually independent normal with variance σ²
  - Thus SSE/σ² ~ χ²_{n-p-1} under A1-A5
    (a sum of k independent squared N(0,1) variables is χ²_k)
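A quick Monte Carlo check of this χ² result: if SSE/σ² ~ χ²_{n-p-1}, its mean should be n-p-1. This is a sketch with a made-up design matrix and coefficients (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 30, 3, 2.0
Xd = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 0.5, -0.5, 0.2])

# Hat matrix projects onto M = span(X); residuals live in M-perp
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T

draws = []
for _ in range(5000):
    y = Xd @ beta + rng.normal(scale=sigma, size=n)
    e = y - H @ y                       # residual vector
    draws.append(np.sum(e ** 2) / sigma ** 2)

# A chi-square with n-p-1 df has mean n-p-1 = 26
print(np.mean(draws))
```

The simulated mean lands close to n-p-1 = 26, as the theory predicts.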
18. Signal/Noise Decomposition
- Under A1-A5, SSE, SSR, and their scaled ratio have convenient distributions
  - For j ≤ p, E[Zj|X] ≠ 0 in general
    - Exception: H0: β1 = ... = βp = 0
    - Then SSR/σ² ~ χ²_p under A1-A5
    - and F = (SSR/p) / [SSE/(n-p-1)] ~ F_{p,n-p-1}, with numerator and denominator independent
19. Signal/Noise Decomposition
- An organizational tool: the analysis of variance (ANOVA) table

SOURCE       Sum of Squares (SS)   Degrees of freedom (df)   Mean square (SS/df)
Regression   SSR                   p                         MSR = SSR/p
Error        SSE                   n-p-1                     MSE = SSE/(n-p-1)
Total        SST = SSR + SSE       n-1

F = MSR/MSE
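Given the SS and df entries of such a table, the F statistic and its p-value follow directly. A sketch with made-up sums of squares (the numbers are illustrative only; scipy is assumed available):

```python
import numpy as np
from scipy import stats

# Hypothetical ANOVA-table entries: n observations, p predictors
n, p = 100, 2
SSR, SSE = 40.0, 60.0

MSR = SSR / p                  # regression mean square
MSE = SSE / (n - p - 1)        # error mean square
F = MSR / MSE
p_value = stats.f.sf(F, p, n - p - 1)   # upper-tail F probability

print(F, p_value)
```

`stats.f.sf` gives the upper-tail probability, i.e., the p-value for the global test that all slope coefficients are zero.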
20. Global hypothesis tests
- These involve sets of parameters
- Hypotheses of the form
  H0: βj = 0 for all j in a defined subset of {1, ..., p}
  vs. H1: βj ≠ 0 for at least one such j
- Example 1: H0: βLATITUDE = 0 and βLONGITUDE = 0
- Example 2: H0: all polynomial or spline coefficients involving a given variable = 0
- Example 3: H0: all coefficients involving a variable = 0
21. Global hypothesis tests
- Testing method: sequential decomposition of sums of squares
  - The hypothesis to be tested is H0: β_{j1} = ... = β_{jk} = 0 in the full model
  - Fit the model excluding x_{j1}, ..., x_{jk}; save its error sum of squares as SSE_S
  - Fit the full (or larger) model, adding x_{j1}, ..., x_{jk} back; save its error sum of squares as SSE_L (often the overall SSE)
  - Test statistic: S = [(SSE_S - SSE_L)/k] / [SSE_L/(n-p-1)]
  - Distribution under the null: F(k, n-p-1)
  - Define the rejection region based on this distribution
  - Compute S
  - Reject or not according to whether S falls in the rejection region
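The steps above can be sketched as a nested-model F test. This is a minimal illustration with simulated (hypothetical) data, testing whether two added predictors improve on a one-predictor model; scipy is assumed available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical data: do x2 and x3 add signal beyond x1?
n = 80
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.0 * x1 + 0.8 * x2 + rng.normal(size=n)  # x3 truly has no effect

def sse(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
SSE_S = sse(np.column_stack([ones, x1]), y)           # smaller model
SSE_L = sse(np.column_stack([ones, x1, x2, x3]), y)   # larger (full) model

k, p = 2, 3                  # k coefficients tested, p in the full model
S = ((SSE_S - SSE_L) / k) / (SSE_L / (n - p - 1))
p_value = stats.f.sf(S, k, n - p - 1)
print(S, p_value)
```

Because x2 carries real signal here, the test rejects; excluding predictors can only increase the SSE, so S is always nonnegative.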
22. Signal/Noise Decomposition
- An augmented version for global testing

SOURCE       Sum of Squares (SS)   Degrees of freedom (df)   Mean square (SS/df)
Regression   SSR                   p                         SSR/p
  X1         SST - SSE_S           p1
  X2 | X1    SSE_S - SSE_L         p2                        (SSE_S - SSE_L)/p2
Error        SSE_L                 n-p-1                     SSE_L/(n-p-1)
Total        SST = SSR + SSE       n-1

F = MS(2|1)/MSE
23. R-squared: Another view
- From last lecture: R² = [Corr(Y, Ŷ)]²
- More conventional: R² = SSR/SST
- The geometry justifies why these are the same
  - Cov(Y, Ŷ) = Cov(Y - Ŷ, Ŷ) + Cov(Ŷ, Ŷ) = Cov(e, Ŷ) + Var(Ŷ) = Var(Ŷ)
  - Covariance is an inner product; the first term = 0
- A measure of the precision with which the regression model describes individual responses
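The equivalence of the two R² definitions is easy to confirm numerically. A small sketch with simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Least-squares fit with intercept
Xd = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta

# R-squared two ways: SSR/SST and squared correlation of Y with Y-hat
R2_ss = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
R2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2

print(np.isclose(R2_ss, R2_corr))   # True: the two definitions agree
```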
24. Outline: A few more topics
- Collinearity
- Overfitting
- Influence
- Mediation
- Multiple comparisons
25. Main points
- Confounding occurs when an apparent association between a predictor and the outcome reflects the association of each with a third variable
- A primary goal of regression is to adjust for confounding
- The least squares decomposition of Y into fit and residual provides an appealing statistical testing framework
- An association of an outcome with predictors is evidenced if the SS due to regression is large relative to the SSE
- Geometry: the orthogonal decomposition provides convenient sampling distributions and a view of R², organized in the ANOVA table