Regression: (1) Simple Linear Regression

1
Regression: (1) Simple Linear Regression
  • Hal Whitehead
  • BIOL4062 / 5062

2
Regression
  • Purposes of regression
  • Simple linear regression
  • Formula
  • Assumptions
  • If assumptions hold, what can we do?
  • Testing assumptions
  • When assumptions do not hold

3
Regression
  • One dependent variable: Y
  • Independent variables: X1, X2, X3, ...

4
Purposes of Regression
  • 1. Relationship between Y and X's
  • 2. Quantitative prediction of Y
  • 3. Relationship between Y and X controlling for
    C
  • 4. Which of X's are most important?
  • 5. Best mathematical model
  • 6. Compare regression relationships Y1 on X,
    Y2 on X
  • 7. Assess interactive effects of X's

5
  • Simple regression: one X
  • Multiple regression: two or more X's

6
Simple linear regression
  • Y = β0 + β1·X + Error
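A minimal sketch of estimating β0 and β1 by least squares, on simulated data (all numbers here are illustrative, not from the lecture):

```python
import numpy as np

# Simulate Y = b0 + b1*X + error with b0 = 2.0, b1 = 0.5 (illustrative values)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Closed-form least-squares estimates of intercept and slope
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
```

The estimates should land close to the simulated values 2.0 and 0.5.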

7
Assumptions of simple linear regression
  • 1. Existence
  • 2. Independence
  • 3. Linearity
  • 4. Homoscedasticity
  • 5. Normality
  • 6. X measured without error

8
Assumptions of simple linear regression
  • 1. For any fixed value of X, Y is a random
    variable with a certain probability distribution
    having finite mean and variance
  • (Existence)

[Figure: probability distribution of Y at a fixed X; axes X and Y]
9
Assumptions of simple linear regression
  • 2. The Y values are statistically independent of
    one another
  • (Independence)

10
Assumptions of simple linear regression
  • 3. The mean value of Y given X is a straight
    line function of X
  • (Linearity)

[Figure: mean of Y is a straight-line function of X; axes X and Y]
11
Assumptions of simple linear regression
  • 4. The variance of Y is the same for all X
  • (Homoscedasticity)

[Figure: equal variance of Y at every X; axes X and Y]
12
Assumptions of simple linear regression
  • 5. For any fixed value of X, Y has a normal
    distribution
  • (Normality)

[Figure: normal distribution of Y at each fixed X; axes X and Y]
13
Assumptions of simple linear regression
  • 6. There are no measurement errors in X
  • (X measured without error)

14
Assumptions of simple linear regression
  • 1. Existence
  • 2. Independence
  • 3. Linearity
  • 4. Homoscedasticity
  • 5. Normality
  • 6. X measured without error

15
If assumptions hold, what can we do?
  • 1. Estimate β0 (intercept) and β1 (slope), together
    with measures of uncertainty
  • 2. Describe quality of fit (variation of data
    around straight line) by estimate of s² or r²
  • 3. Tests of slope and intercept
  • 4. Prediction and prediction bands
  • 5. ANOVA Table

16
Parameters estimated using least-squares
  • Age-specific pregnancy rates of female sperm
    whales (from Best et al. 1984 Rep. int. Whal.
    Commn. Spec. Issue)

Find line which minimizes squares of residuals
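The least-squares criterion can be checked numerically: the fitted line has a smaller sum of squared residuals than any nearby line. A sketch on simulated data (the "true" intercept 0.23 and slope -0.0035 are borrowed from the pregnancy-rate estimates later in the lecture; everything else is made up):

```python
import numpy as np

# Simulated ages and pregnancy rates (illustrative, not the whale data)
rng = np.random.default_rng(1)
x = rng.uniform(0, 40, 30)
y = 0.23 - 0.0035 * x + rng.normal(scale=0.05, size=x.size)

b1, b0 = np.polyfit(x, y, 1)        # least-squares slope and intercept

def ss(a0, a1):
    """Sum of squared residuals for the line Y = a0 + a1*X."""
    return np.sum((y - (a0 + a1 * x)) ** 2)
```

Perturbing either coefficient away from the least-squares solution can only increase `ss`.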
17
1. Estimate β0 (intercept) and β1 (slope), together
with measures of uncertainty
  • Age-specific pregnancy rates of female sperm
    whales (from Best et al. 1984 Rep. int. Whal.
    Commn. Spec. Issue)

18
1. Estimate β0 (intercept) and β1 (slope), together
with measures of uncertainty
  • β0 = 0.230 (SE 0.028)
  • 95% c.i.: 0.164 to 0.296
  • β1 = -0.0035 (SE 0.0009)
  • 95% c.i.: -0.0056 to -0.0013
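Standard errors and 95% confidence intervals follow from the residual variance estimate. A sketch using the standard formulas on simulated data (the slide's values come from Best et al. 1984, which we don't have here):

```python
import numpy as np
from scipy import stats

# Simulated ages and pregnancy rates (illustrative)
rng = np.random.default_rng(2)
x = rng.uniform(10, 50, 30)
y = 0.23 - 0.0035 * x + rng.normal(scale=0.03, size=x.size)

n = x.size
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)            # estimate of the error variance
sxx = np.sum((x - x.mean()) ** 2)

se_b1 = np.sqrt(s2 / sxx)                    # standard error of the slope
se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2 / sxx))  # SE of the intercept

tcrit = stats.t.ppf(0.975, df=n - 2)         # two-sided 95% critical value
ci_b1 = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)
```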

19
2. Describe quality of fit by estimate of s² or
r²
  • s² = 0.0195
  • r² = 0.679
  • r² (adjusted) = 0.633
  • (Proportion of variance accounted for by the regression)

20
3. Tests of slope and intercept
  • a) Slope = 0 (equivalent to r = 0)
  • b) Slope = predetermined constant
  • c) Intercept = 0
  • d) Intercept = predetermined constant
  • e) Compare slopes
  • f) Compare intercepts (assuming the same slope)
  • (tests use t-distribution)

21
3a) Slope = 0 (equivalent to r = 0)
  • Does pregnancy rate change with age?
  • H0: β1 = 0
  • H1: β1 ≠ 0
  • P = 0.006
  • Does pregnancy rate decline with age?
  • H0: β1 ≥ 0
  • H1: β1 < 0
  • P = 0.003

22
3b) Slope = predetermined constant
  • β1 = 2.868 (SE 0.058)
  • 95% c.i.: 2.752 to 2.984
  • Does shape change with length?
  • H0: β1 = 3
  • H1: β1 ≠ 3
  • P < 0.05

weight ∝ length³
Weights and Lengths of Cetacean Species (Whitehead & Mann, in Cetacean Societies, 2000)
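The test of H0: β1 = 3 can be reproduced from the reported estimate and standard error alone. A sketch, where the sample size n is an assumption (it is not given on the slide):

```python
import numpy as np
from scipy import stats

# Values mirroring the slide: b1 = 2.868, SE = 0.058; n = 48 is assumed
b1, se_b1, n = 2.868, 0.058, 48

t_stat = (b1 - 3.0) / se_b1                  # t-statistic for H0: beta1 = 3
p = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-sided P-value
```

With these numbers the two-sided P-value falls below 0.05, matching the slide's conclusion.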
23
3c) Intercept = 0
  • β0 = 0.436 (SE 0.080)
  • 95% c.i.: 0.276 to 0.596
  • Is birth length proportional to length?
  • H0: β0 = 0
  • H1: β0 ≠ 0
  • P < 0.001

24
3d) Intercept = predetermined constant
  • ?

25
3e) Compare slopes
  • β1(m) = 2.528 (SE 0.409)
  • β1(o) = 2.962 (SE 0.094)
  • Does shape change differently with length for
    odontocetes and mysticetes?
  • H0: β1(m) = β1(o)
  • H1: β1(m) ≠ β1(o); P = 0.146

Weights and Lengths of Cetacean Species (Whitehead & Mann 2000)
26
3f) Compare intercepts (assuming the same slope)
  • β0(m) = 2.528 (SE 0.409)
  • β0(o) = 2.962 (SE 0.094)
  • Are odontocetes and mysticetes equally fat?
  • H0: β0(m) = β0(o)
  • H1: β0(m) ≠ β0(o); P = 0.781

[Figure: Log(Weight) vs. Log(Length) by ORDER (m = mysticetes, o = odontocetes)]
27
4. Prediction and prediction bands
95% Confidence Bands for Regression Line
95% Prediction Bands
From http://www.tufts.edu/gdallal/slr.htm
28
5. ANOVA Table
  • Analysis of Variance

    Source       Sum-of-Squares   df   Mean-Square   F-ratio      P
    Regression       286.27        1      286.27     2475.07   0.00
    Residual           5.32       46        0.12
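The ANOVA decomposition splits the total sum of squares into a regression part (df = 1) and a residual part (df = n - 2); the F-ratio is the ratio of their mean squares. A sketch on simulated data (not the data behind the slide's table):

```python
import numpy as np

# Illustrative data
rng = np.random.default_rng(3)
x = rng.uniform(0, 4, 48)
y = 0.44 + 2.87 * x + rng.normal(scale=0.35, size=x.size)

n = x.size
b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x

ss_reg = np.sum((yhat - y.mean()) ** 2)       # Regression SS, df = 1
ss_res = np.sum((y - yhat) ** 2)              # Residual SS, df = n - 2
f_ratio = (ss_reg / 1) / (ss_res / (n - 2))   # ratio of mean squares
```

By construction the two components add up exactly to the total sum of squares.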

29
If assumptions hold, what can we do?
  • 1. Estimate β0 (intercept) and β1 (slope), together
    with measures of uncertainty
  • 2. Describe quality of fit (variation of data
    around straight line) by estimate of s² or r²
  • 3. Tests of slope and intercept
  • 4. Prediction and prediction bands
  • 5. ANOVA Table

30
Testing assumptions: diagnostics
  • Use residuals to look at the assumptions of regression
  • e(i) = Y(i) - (β0 + β1·X(i))

31
Residuals
  • Residual: e(i) = Y(i) - (β0 + β1·X(i))
  • Standardized residuals: e(i)/S
  • S is the standard deviation of the residuals,
    with adjusted degrees of freedom
  • Studentized residuals: e(i) / (S·√(1 - h(i)))
  • h(i) is the "leverage value" of observation i:
  • h(i) = 1/n + (X(i) - X̄)² / ((n-1)·S(X)²)
  • Jackknifed residuals: e(i) / (S(-i)·√(1 - h(i)))
  • The residual variance S(-i) is calculated
    separately with each observation i deleted
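The three flavours of scaled residuals can be computed directly from these formulas. A sketch on simulated data (illustrative only):

```python
import numpy as np

# Illustrative data
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 25)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=x.size)

n = x.size
b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)                       # raw residuals
s = np.sqrt(np.sum(e ** 2) / (n - 2))       # residual SD, adjusted df

# Leverage h(i) = 1/n + (x_i - xbar)^2 / Sxx; the h(i) sum to 2
h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

standardized = e / s
studentized = e / (s * np.sqrt(1 - h))

# Jackknifed: residual SD recomputed with observation i deleted
s_minus = np.empty(n)
for i in range(n):
    xi, yi = np.delete(x, i), np.delete(y, i)
    c1, c0 = np.polyfit(xi, yi, 1)
    s_minus[i] = np.sqrt(np.sum((yi - (c0 + c1 * xi)) ** 2) / (n - 3))
jackknifed = e / (s_minus * np.sqrt(1 - h))
```

In simple linear regression the leverages always sum to 2 (the number of fitted parameters), which makes a handy sanity check.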

32
Use Residuals to
  • a) look for outliers which we may wish to remove
  • b) examine normality
  • c) check for linearity
  • d) check for homoscedasticity
  • e) check for some kinds of non-independence

33
a) Using residuals to look for outliers
34
Should outliers be removed?
  • Yes
  • if outlier was probably not produced by the
    process being studied
  • measurement error
  • different species
  • ...
  • No
  • if outlier was probably produced by the process
    being studied
  • extreme specimen

35
b) Using residuals to examine normality
  • Lilliefors test for normality
  • P = 0.62
  • Lilliefors test for normality (excluding Bowhead
    whale)
  • P = 0.68

36
c) Using residuals to check for linearity
37
d) Use residuals to check for homoscedasticity
38
e) Use residuals to check for some kinds of
non-independence
Days spent following sperm whales
  • Durbin-Watson D statistic = 1.48
  • Low values (< 2) indicate autocorrelation
  • First-order autocorrelation = 0.26
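The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares; it sits near 2 for independent residuals and drops below 2 under positive autocorrelation. A sketch on simulated residual series (not the sperm-whale data):

```python
import numpy as np

def durbin_watson(e):
    """D = sum((e_t - e_{t-1})^2) / sum(e_t^2); ~2 under independence."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(5)
white = rng.normal(size=200)                 # independent residuals

# AR(1) residuals with lag-1 correlation ~0.6 (simulated)
auto = np.empty(200)
auto[0] = white[0]
for t in range(1, 200):
    auto[t] = 0.6 * auto[t - 1] + white[t]

d_indep = durbin_watson(white)               # near 2
d_auto = durbin_watson(auto)                 # well below 2
r1 = np.corrcoef(auto[:-1], auto[1:])[0, 1]  # first-order autocorrelation
```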

39
Use Residuals to
  • a) look for outliers which we may wish to remove
  • b) examine normality
  • c) check for linearity
  • d) check for homoscedasticity
  • e) check for some kinds of non-independence

40
Assumptions of simple linear regression
  • 1. Existence
  • 2. Independence
  • 3. Linearity
  • 4. Homoscedasticity
  • 5. Normality
  • 6. X measured without error

41
When assumptions do not hold
  • 1. Existence
  • Forget it!

42
When assumptions do not hold
  • 2. Independence
  • collect data differently
  • reduce the size of the data set
  • add additional terms to the regression model
  • (e.g. autocorrelation term, species effect)
  • More a problem for testing than prediction

43
When assumptions do not hold
  • 3. Linearity
  • Transform X, Y, or both variables, e.g.
  • Log(Y) = β0 + β1·Log(X) + E
  • Polynomial regression:
  • Y = β0 + β1·X + β2·X² + ... + E
  • Non-linear regression, e.g.
  • Y = c·EXP(β0 + β1·X) + E
  • Piecewise linear regression:
  • Y = β0 + β1·X·[X > XK] + E
  • where [X > XK] = 0 if X < XK and [X > XK] = 1 if X > XK
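Two of these remedies are one-liners with ordinary least squares. A sketch on simulated data (the exponent 3 echoes the weight-length example; all data are made up): a log-log transform recovers a power-law exponent as the slope, and polynomial regression is just least squares on added X powers.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 60)

# Log-log: if Y = c * X**b * error, then log(Y) is linear in log(X),
# so the fitted slope estimates the exponent b (here b = 3)
y = 2.0 * x ** 3 * np.exp(rng.normal(scale=0.1, size=x.size))
b1, b0 = np.polyfit(np.log(x), np.log(y), 1)

# Polynomial regression: Y = b0 + b1*X + b2*X^2 + E, fit by least squares
y2 = 1 + 0.5 * x - 0.2 * x ** 2 + rng.normal(scale=0.3, size=x.size)
b2_hat, b1_hat, b0_hat = np.polyfit(x, y2, 2)
```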

45
Transformation to improve linearity
46
When assumptions do not hold
  • 4. Homoscedasticity
  • Transformations of the Y variable
  • Weighted regressions (if we know that some
    observations are more accurate than others)

47
Y - transformation to improve homoscedasticity
48
When assumptions do not hold
  • 5. Normality
  • Transformations of the Y variable
  • Non-normal error structures (e.g. Poisson)
  • Small departures from normality are not
    especially important, unless doing a test

49
When assumptions do not hold
  • 6. X measured without error
  • Major axis regression
  • Reduced major axis, or geometric mean, regression

50
Major axis regression
  • Minimize sum of squares of perpendicular
    distances from observations to regression line
  • Only if variables are in same units
  • First principal component of covariance matrix
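Since the major axis is the first principal component of the covariance matrix, its slope can be read off the eigenvector with the largest eigenvalue. A minimal sketch (illustrative data; on noise-free y = 2x the recovered slope is exactly 2):

```python
import numpy as np

def major_axis_slope(x, y):
    """Major-axis slope: direction of the first principal component
    (largest-eigenvalue eigenvector) of the 2x2 covariance matrix."""
    cov = np.cov(x, y)
    evals, evecs = np.linalg.eigh(cov)
    v = evecs[:, np.argmax(evals)]
    return v[1] / v[0]

# Noise-free check: all points on y = 2x
x = np.linspace(0.0, 5.0, 20)
y = 2.0 * x
```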

51
Reduced major axis regression
  • Each of the two variables is transformed to have
    a mean of zero and a standard deviation of 1
  • Then, minimize sum of squares of perpendicular
    distances from observations to regression line
  • Its slope cannot be sensibly tested against zero
  • first principal component using the correlation
    matrix
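Equivalently, after standardizing both variables the perpendicular-fit slope is ±1, and back-transforming gives the familiar closed form: the RMA slope is sign(r) times the ratio of standard deviations. A minimal sketch (illustrative data):

```python
import numpy as np

def rma_slope(x, y):
    """Reduced major axis (geometric mean) slope: sign(r) * SD(y)/SD(x)."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)

# Noise-free check: all points on y = 3x, so the RMA slope is exactly 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x
```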

52
Regression
  • Extremely useful technique!
  • Check assumptions using residuals
  • Can be extended in several ways
  • multiple regression
  • non-linear regression
  • non-normal errors
  • piecewise regression
  • ...