Title: Regression: (1) Simple Linear Regression
Regression: (1) Simple Linear Regression
- Hal Whitehead
- BIOL4062 / 5062
Regression
- Purposes of regression
- Simple linear regression
- Formula
- Assumptions
- If assumptions hold, what can we do?
- Testing assumptions
- When assumptions do not hold
Regression
- One dependent variable: Y
- Independent variables: X1, X2, X3, ...
Purposes of Regression
- 1. Relationship between Y and the X's
- 2. Quantitative prediction of Y
- 3. Relationship between Y and X, controlling for C
- 4. Which of the X's are most important?
- 5. Best mathematical model
- 6. Compare regression relationships: Y1 on X, Y2 on X
- 7. Assess interactive effects of the X's
- Simple regression: one X
- Multiple regression: two or more X's
Simple linear regression
- Model: Y = β0 + β1·X + E
Assumptions of simple linear regression
- 1. Existence
- 2. Independence
- 3. Linearity
- 4. Homoscedasticity
- 5. Normality
- 6. X measured without error
Assumptions of simple linear regression
- 1. For any fixed value of X, Y is a random variable with a certain probability distribution having finite mean and variance (Existence)
(Figure: probability distribution of Y at each fixed X)
Assumptions of simple linear regression
- 2. The Y values are statistically independent of one another (Independence)
Assumptions of simple linear regression
- 3. The mean value of Y, given X, is a straight-line function of X (Linearity)
(Figure: distributions of Y centred on a straight line in X)
Assumptions of simple linear regression
- 4. The variance of Y is the same for all X (Homoscedasticity)
(Figure: distributions of Y with equal spread at every X)
Assumptions of simple linear regression
- 5. For any fixed value of X, Y has a normal distribution (Normality)
(Figure: normal distributions of Y at each fixed X)
Assumptions of simple linear regression
- 6. There are no measurement errors in X (X measured without error)
Assumptions of simple linear regression
- 1. Existence
- 2. Independence
- 3. Linearity
- 4. Homoscedasticity
- 5. Normality
- 6. X measured without error
If assumptions hold, what can we do?
- 1. Estimate β0 (intercept) and β1 (slope), together with measures of uncertainty
- 2. Describe quality of fit (variation of data around the straight line) by estimates of s² or r²
- 3. Tests of slope and intercept
- 4. Prediction and prediction bands
- 5. ANOVA table
Parameters estimated using least-squares
- Age-specific pregnancy rates of female sperm whales (from Best et al. 1984, Rep. Int. Whal. Commn. Spec. Issue)
- Find the line which minimizes the sum of squared residuals
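The least-squares line can be computed in closed form from the sums of squares; a minimal sketch with numpy, using hypothetical illustrative data (not the sperm-whale values):

```python
import numpy as np

# Hypothetical illustrative data (not the sperm-whale pregnancy rates)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares: choose b0, b1 to minimize the sum of squared residuals
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
b1 = Sxy / Sxx              # slope estimate
b0 = ybar - b1 * xbar       # intercept estimate

# Measure of uncertainty: standard error of the slope
n = len(x)
e = y - (b0 + b1 * x)                 # residuals
s2 = np.sum(e ** 2) / (n - 2)         # residual variance
se_b1 = np.sqrt(s2 / Sxx)

print(b0, b1, se_b1)
```

A 95% confidence interval for the slope is then b1 ± t(0.975, n−2)·se_b1, as reported on the next slide for the whale data.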
1. Estimate β0 (intercept) and β1 (slope), together with measures of uncertainty
- Age-specific pregnancy rates of female sperm whales (from Best et al. 1984, Rep. Int. Whal. Commn. Spec. Issue)
1. Estimate β0 (intercept) and β1 (slope), together with measures of uncertainty
- β0 = 0.230 (SE 0.028); 95% c.i.: 0.164 to 0.296
- β1 = −0.0035 (SE 0.0009); 95% c.i.: −0.0056 to −0.0013
2. Describe quality of fit by estimates of s² or r²
- s² = 0.0195
- r² = 0.679
- r² (adjusted) = 0.633
- (Proportion of variance accounted for by the regression)
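These fit summaries follow directly from the residual and total sums of squares; a sketch with numpy on hypothetical data (not the whale values):

```python
import numpy as np

# Hypothetical data; fit by least squares first
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                      # residuals

SSE = np.sum(e ** 2)                       # residual sum of squares
SST = np.sum((y - y.mean()) ** 2)          # total sum of squares
s2 = SSE / (n - 2)                         # residual variance estimate
r2 = 1 - SSE / SST                         # proportion of variance explained
r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)  # penalizes extra parameters

print(s2, r2, r2_adj)
```

The adjusted r² is always at most r², and the gap widens as parameters are added relative to n.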
3. Tests of slope and intercept
- a) Slope = 0 (equivalent to r = 0)
- b) Slope = predetermined constant
- c) Intercept = 0
- d) Intercept = predetermined constant
- e) Compare slopes
- f) Compare intercepts (assuming the same slope)
- (tests use the t-distribution)
3a) Slope = 0 (equivalent to r = 0)
- Does pregnancy rate change with age?
- H0: β1 = 0
- H1: β1 ≠ 0
- P = 0.006
- Does pregnancy rate decline with age?
- H0: β1 = 0
- H1: β1 < 0
- P = 0.003
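The two-sided test of H0: β1 = 0 is what `scipy.stats.linregress` reports; the one-sided P for a decline is half the two-sided P when the estimated slope is in the hypothesized direction. A sketch on hypothetical declining data (not the whale values):

```python
import numpy as np
from scipy import stats

# Hypothetical data with a clear downward trend
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
y = np.array([0.30, 0.26, 0.21, 0.18, 0.12, 0.08])

res = stats.linregress(x, y)     # slope, intercept, r, two-sided P, SE
print(res.slope, res.pvalue)

# One-sided test of decline (H1: slope < 0): halve the two-sided P
# when the estimated slope is negative
p_one_sided = res.pvalue / 2 if res.slope < 0 else 1 - res.pvalue / 2
```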
3b) Slope = predetermined constant
- β1 = 2.868 (SE 0.058); 95% c.i.: 2.752 to 2.984
- Does shape change with length (isometry: weight ∝ length³)?
- H0: β1 = 3
- H1: β1 ≠ 3
- P < 0.05
- Weights and lengths of cetacean species (Whitehead & Mann, in Cetacean Societies, 2000)
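Testing the slope against a predetermined constant c uses t = (b1 − c)/SE(b1) with n − 2 degrees of freedom; a sketch with scipy on hypothetical log-log data (not the cetacean values):

```python
import numpy as np
from scipy import stats

# Hypothetical log(weight) vs log(length) data
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.40, 2.95, 4.32, 5.86, 7.35, 8.82])

res = stats.linregress(x, y)
n = len(x)

c = 3.0                                   # predetermined constant (isometry)
t_stat = (res.slope - c) / res.stderr     # t statistic with n - 2 df
p_value = 2 * stats.t.sf(abs(t_stat), n - 2)
print(res.slope, t_stat, p_value)
```

With c = 0 this reduces to the ordinary slope test of 3a).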
3c) Intercept = 0
- β0 = 0.436 (SE 0.080); 95% c.i.: 0.276 to 0.596
- Is birth length proportional to length?
- H0: β0 = 0
- H1: β0 ≠ 0
- P = 0.000
3d) Intercept = predetermined constant
3e) Compare slopes
- β1(m) = 2.528 (SE 0.409)
- β1(o) = 2.962 (SE 0.094)
- Does shape change differently with length for odontocetes and mysticetes?
- H0: β1(m) = β1(o)
- H1: β1(m) ≠ β1(o); P = 0.146
- Weights and lengths of cetacean species (Whitehead & Mann 2000)
3f) Compare intercepts (assuming the same slope)
- β0(m) = 2.528 (SE 0.409)
- β0(o) = 2.962 (SE 0.094)
- Are odontocetes and mysticetes equally fat?
- H0: β0(m) = β0(o)
- H1: β0(m) ≠ β0(o); P = 0.781
(Figure: Log(Weight) against Log(Length), by order: m = mysticetes, o = odontocetes)
4. Prediction and prediction bands
- 95% confidence bands for the regression line
- 95% prediction bands
- From http://www.tufts.edu/gdallal/slr.htm
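The two bands differ only by one term: the confidence band for the mean response at x0 uses s·√(1/n + (x0 − x̄)²/Sxx), while the prediction band for a new observation adds 1 under the square root, so it is always wider. A sketch with numpy/scipy on hypothetical data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x0 = 3.5                                   # point at which to predict
yhat = b0 + b1 * x0
tcrit = stats.t.ppf(0.975, n - 2)          # 95% two-sided critical value
half_conf = tcrit * s * np.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)
half_pred = tcrit * s * np.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / Sxx)
print(yhat, half_conf, half_pred)
```

Both bands flare away from x̄, since the (x0 − x̄)² term grows at the extremes of X.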
5. ANOVA Table
- Analysis of Variance

Source       Sum-of-Squares    df    Mean-Square    F-ratio    P
Regression   286.27             1    286.27         2475.07    0.00
Residual     5.32              46    0.12
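The table rests on the decomposition SST = SSR + SSE, with F = MSR/MSE on (1, n − 2) degrees of freedom; a sketch with numpy/scipy on hypothetical data (not the table above):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

SST = np.sum((y - y.mean()) ** 2)         # total sum of squares
SSE = np.sum((y - b0 - b1 * x) ** 2)      # residual sum of squares
SSR = SST - SSE                           # regression sum of squares

MSR = SSR / 1                             # regression df = 1
MSE = SSE / (n - 2)                       # residual df = n - 2
F = MSR / MSE
P = stats.f.sf(F, 1, n - 2)
print(F, P)
```

In simple regression the F-ratio equals the square of the t statistic for the slope, so the ANOVA P matches the two-sided slope test.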
If assumptions hold, what can we do?
- 1. Estimate β0 (intercept) and β1 (slope), together with measures of uncertainty
- 2. Describe quality of fit (variation of data around the straight line) by estimates of s² or r²
- 3. Tests of slope and intercept
- 4. Prediction and prediction bands
- 5. ANOVA table
Testing assumptions: diagnostics
- Use residuals to look at the assumptions of regression
- e(i) = Y(i) − (β0 + β1·X(i)) = observed − predicted
Residuals
- Residual: e(i) = Y(i) − (β0 + β1·X(i))
- Standardized residuals: e(i) / S
  - S is the standard deviation of the residuals, with adjusted degrees of freedom
- Studentized residuals: e(i) / (S·√(1 − h(i)))
  - h(i) is the "leverage value" of observation i:
  - h(i) = 1/n + (X(i) − X̄)² / ((n − 1)·S(X)²), where X̄ = ΣX(i)/n
- Jackknifed residuals: e(i) / (S(−i)·√(1 − h(i)))
  - The residual standard deviation S(−i) is calculated separately with each observation i deleted
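The residual variants above can be computed directly; a sketch with numpy on hypothetical data (note that (n − 1)·S(X)² in the leverage formula equals Σ(X(i) − X̄)²):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.0, 4.1, 5.9, 8.3, 9.8, 12.2])
n = len(x)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar

e = y - (b0 + b1 * x)                     # raw residuals
h = 1 / n + (x - xbar) ** 2 / Sxx         # leverage values
S = np.sqrt(np.sum(e ** 2) / (n - 2))     # residual SD
standardized = e / S
studentized = e / (S * np.sqrt(1 - h))
# Jackknifed: residual SD recomputed with observation i deleted,
# via the shortcut SSE(-i) = SSE - e(i)^2 / (1 - h(i))
S_minus = np.sqrt((np.sum(e ** 2) - e ** 2 / (1 - h)) / (n - 3))
jackknifed = e / (S_minus * np.sqrt(1 - h))
print(studentized)
```

Since 0 < h(i) < 1, each studentized residual is at least as large in magnitude as the corresponding standardized one; high-leverage points are inflated most.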
Use residuals to
- a) look for outliers which we may wish to remove
- b) examine normality
- c) check for linearity
- d) check for homoscedasticity
- e) check for some kinds of non-independence
a) Using residuals to look for outliers
Should outliers be removed?
- Yes, if the outlier was probably not produced by the process being studied
  - measurement error
  - different species
  - ...
- No, if the outlier was probably produced by the process being studied
  - extreme specimen
b) Using residuals to examine normality
- Lilliefors test for normality: P = 0.62
- Lilliefors test for normality (excluding the bowhead whale): P = 0.68
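The Lilliefors test itself is available in statsmodels (`statsmodels.stats.diagnostic.lilliefors`); as a comparable normality check using only scipy, a sketch with the Shapiro-Wilk test on hypothetical residuals:

```python
import numpy as np
from scipy import stats

# Hypothetical residuals from a fitted regression
resid = np.array([0.12, -0.05, 0.08, -0.11, 0.03, -0.07, 0.01, -0.01])

stat, p = stats.shapiro(resid)   # Shapiro-Wilk test of normality
print(p)                         # large P: no evidence against normality
```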
c) Using residuals to check for linearity
d) Using residuals to check for homoscedasticity
e) Using residuals to check for some kinds of non-independence
Days spent following sperm whales
- Durbin-Watson D statistic: 1.48
- low values (< 2) indicate positive autocorrelation
- First-order autocorrelation: 0.26
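The Durbin-Watson statistic is just a ratio of sums over the time-ordered residuals; a minimal sketch:

```python
import numpy as np

def durbin_watson(e):
    """D = sum of squared successive differences / sum of squares.
    D near 2: no autocorrelation; D < 2: positive; D > 2: negative."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # alternating residuals
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))     # constant residuals: D = 0
```

D always lies between 0 and 4; perfectly alternating residuals push it toward 4, runs of same-signed residuals push it toward 0.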
Use residuals to
- a) look for outliers which we may wish to remove
- b) examine normality
- c) check for linearity
- d) check for homoscedasticity
- e) check for some kinds of non-independence
Assumptions of simple linear regression
- 1. Existence
- 2. Independence
- 3. Linearity
- 4. Homoscedasticity
- 5. Normality
- 6. X measured without error
When assumptions do not hold
When assumptions do not hold
- 2. Independence
- collect data differently
- reduce the size of the data set
- add additional terms to the regression model
- (e.g. autocorrelation term, species effect)
- More a problem for testing than prediction
When assumptions do not hold
- 3. Linearity
- Transform either X or Y or both variables, e.g.
  - Log(Y) = β0 + β1·Log(X) + E
- Polynomial regression:
  - Y = β0 + β1·X + β2·X² + ... + E
- Non-linear regression, e.g.
  - Y = c + exp(β0 + β1·X) + E
- Piecewise linear regression:
  - Y = β0 + β1·X·(X > XK) + E
  - where (X > XK) = 0 if X < XK, and (X > XK) = 1 if X > XK
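A log-log transformation turns a power law into a straight line, so ordinary least squares recovers the exponent; a sketch with numpy, using exact hypothetical power-law data y = 2·x³:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x ** 3                 # hypothetical exact power-law data

# Fit Log(Y) = b0 + b1*Log(X); np.polyfit returns [slope, intercept]
b1, b0 = np.polyfit(np.log(x), np.log(y), 1)
print(b1, np.exp(b0))            # recovers exponent 3 and coefficient 2
```

Polynomial regression fits the same way: `np.polyfit(x, y, 2)` for the quadratic model above.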
Transformation to improve linearity
When assumptions do not hold
- 4. Homoscedasticity
- Transformations of the Y variable
- Weighted regressions (if we know that some observations are more accurate than others)
Y-transformation to improve homoscedasticity
When assumptions do not hold
- 5. Normality
- Transformations of the Y variable
- Non-normal error structures (e.g. Poisson)
- Small departures from normality are not especially important, unless doing a test
When assumptions do not hold
- 6. X measured without error
- Major axis regression
- Reduced major axis (or geometric mean) regression
Major axis regression
- Minimize the sum of squared perpendicular distances from observations to the regression line
- Only appropriate if the variables are in the same units
- Equivalent to the first principal component of the covariance matrix
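Since the major axis is the first principal component of the 2×2 covariance matrix, its slope falls out of an eigendecomposition; a sketch with numpy:

```python
import numpy as np

def major_axis_slope(x, y):
    """Slope of the major axis: direction of the first principal
    component (largest eigenvalue) of the covariance matrix."""
    cov = np.cov(x, y)                     # 2x2 covariance matrix
    evals, evecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    v = evecs[:, -1]                       # eigenvector of largest eigenvalue
    return v[1] / v[0]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(major_axis_slope(x, 2.0 * x))        # exact line y = 2x
```

Unlike ordinary least squares, this treats X and Y symmetrically, which is why it suits the errors-in-X case.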
Reduced major axis regression
- Each of the two variables is transformed to have a mean of zero and a standard deviation of 1
- Then, minimize the sum of squared perpendicular distances from observations to the regression line
- Its slope cannot be sensibly tested against zero
- Equivalent to the first principal component using the correlation matrix
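Because both variables are standardized first, the RMA slope on the original scale reduces to sign(r)·SD(Y)/SD(X); a sketch with numpy:

```python
import numpy as np

def rma_slope(x, y):
    """Reduced major axis (geometric mean) slope:
    sign of the correlation times SD(Y)/SD(X)."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(rma_slope(x, -3.0 * x))   # exact line y = -3x
```

This also shows why the slope cannot be sensibly tested against zero: SD(Y)/SD(X) is never zero, whatever the strength of the relationship.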
Regression
- Extremely useful technique!
- Check assumptions using residuals
- Can be extended in several ways
- multiple regression
- non-linear regression
- non-normal errors
- piecewise regression
- ...