Correlation and Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Correlation and Regression

Description:

Least-squares Regression. Draw the line that minimizes the sum of the squared residuals from the line. Residual is (Yi - Ŷi) ... Formulae for Least-Squares Regression ... – PowerPoint PPT presentation

Slides: 58
Provided by: lukeh4

Transcript and Presenter's Notes

Title: Correlation and Regression


1
Correlation and Regression
2
Spearman's rank correlation
  • An alternative to correlation that does not make
    so many assumptions
  • Still measures the strength and direction of
    association between two variables
  • Uses the ranks instead of the raw data
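The idea of using ranks instead of raw data can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own method: the helper names `ranks`, `pearson_r`, and `spearman_rs` and the data are hypothetical, and tied values are assumed to share the average of their ranks.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Ordinary linear correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / (ss_x * ss_y) ** 0.5


def spearman_rs(x, y):
    """Spearman's rs: the linear correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rs(x, y))
```

Because only the ordering matters, a curved but strictly increasing relationship like this one still gives the largest possible rank correlation.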

3
Example: Spearman's rs
VERSIONS
  1. Boy climbs up rope, climbs down again
  2. Boy climbs up rope, seems to vanish, re-appears at top, climbs down again
  3. Boy climbs up rope, seems to vanish at top
  4. Boy climbs up rope, vanishes at top, reappears somewhere the audience was not looking
  5. Boy climbs up rope, vanishes at top, reappears in a place which has been in full view
4
Hypotheses
H0: The difficulty of the described trick is not
correlated with the time elapsed since it was
observed.
HA: The difficulty of the described trick is
correlated with the time elapsed since it was
observed.
5
East-Indian Rope Trick
6
East-Indian Rope Trick
[Table columns: Years elapsed, Impressiveness score, Rank of years, Rank of impressiveness]
7
East-Indian Rope Trick
TABLE H
n = 21, α = 0.05
Critical value = 0.435
P < 0.05, reject H0
8
Spearman's Rank Correlation - large n
  • For large n (> 100), you can use the normal
    correlation coefficient test for the ranks

Under H0, t has a t-distribution with n-2 d.f.
9
Measurement Error and Correlation
  • Measurement error decreases the apparent
    correlation between two variables

You can correct for this effect - see text
10
Species are not independent data points
11
Independent contrasts
12
Independent contrasts
13
(No Transcript)
14
Quick Reference Guide - Correlation Coefficient
  • What is it for? Measuring the strength of a
    linear association between two numerical
    variables
  • What does it assume? Bivariate normality and
    random sampling
  • Parameter ρ
  • Estimate r
  • Formulae
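The formula itself did not survive in this transcript. As a sketch, the standard sample correlation coefficient is the sum of products divided by the square root of the product of the two sums of squares. The function name `correlation_r` and the data below are hypothetical.

```python
def correlation_r(x, y):
    """Sample correlation coefficient r between two numerical variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and the two sums of squares.
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / (ss_x * ss_y) ** 0.5


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation_r(x, y)
print(round(r, 3))
```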

15
Quick Reference Guide - t-test for zero linear
correlation
  • What is it for? To test the null hypothesis that
    the population parameter, ρ, is zero
  • What does it assume? Bivariate normality and
    random sampling
  • Test statistic t
  • Null distribution t with n-2 degrees of freedom
  • Formulae
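The formula is missing from the transcript. A minimal sketch, assuming the usual textbook form of the test: t is r divided by its standard error, sqrt((1 - r²) / (n - 2)). The function name and the values of `r` and `n` are hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for testing the null hypothesis that rho is zero."""
    se_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
    return r / se_r


# Hypothetical sample: r = 0.5 from n = 27 pairs, so n - 2 = 25 d.f.
r, n = 0.5, 27
t = correlation_t(r, n)
print(round(t, 3))
```

The resulting t is then compared with the critical value of the t-distribution with n-2 degrees of freedom.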

16
T-test for correlation
Null hypothesis ρ = 0
Sample
Test statistic
Null distribution t with n-2 d.f.
Compare: How unusual is this test statistic?
P > 0.05: Fail to reject H0
P < 0.05: Reject H0
17
Quick Reference Guide - Spearman's Rank
Correlation
  • What is it for? To test zero correlation between
    the ranks of two variables
  • What does it assume? Linear relationship between
    ranks and random sampling
  • Test statistic rs
  • Null distribution See table; if n > 100, use
    t-distribution
  • Formulae Same as linear correlation but based on
    ranks

18
Spearman's rank correlation
Null hypothesis ρ = 0
Sample
Test statistic rs
Null distribution Spearman's rank, Table H
Compare: How unusual is this test statistic?
P > 0.05: Fail to reject H0
P < 0.05: Reject H0
19
Quick Reference Guide - Independent Contrasts
  • What is it for? To test for correlation between
    two variables when data points come from related
    species
  • What does it assume? Linear relationship between
    variables, correct phylogeny, difference between
    pairs of species in both X and Y has a normal
    distribution with zero mean and variance
    proportional to the time since divergence

20
Regression
  • The method to predict the value of one numerical
    variable from that of another
  • Predict the value of Y from the value of X
  • Example: predict the size of a dinosaur from the
    length of one tooth

21
Linear Regression
  • Draw a straight line through a scatter plot
  • Use the line to predict Y from X

22
Linear Regression Formula
  • Y = α + βX
  • α = intercept
  • The predicted value of Y when X is zero
  • β = slope
  • the rate of change in Y per unit of change in X

Parameters: α and β
23
Interpretations of α and β
[Figure: plots of Y against X comparing higher α with lower α, and negative β, β = 0, and positive β]
24
Linear Regression Formula
  • Y = a + bX
  • a = estimated intercept
  • The predicted value of Y when X is zero
  • b = estimated slope
  • the rate of change in Y per unit of change in X

25
How to draw the line?

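[Figure: scatter plot of Y against X with a fitted line, showing observed values Y1 to Y4, fitted values Ŷ1 to Ŷ4, and residuals such as (Y1 - Ŷ1)]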
26
Least-squares Regression
  • Draw the line that minimizes the sum of the
    squared residuals from the line
  • Residual is (Yi - Ŷi)
  • Minimize the sum SSresiduals = Σ(Yi - Ŷi)²
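To make the criterion concrete, the sketch below computes the sum of squared residuals for a few candidate lines. The data and the function name `ss_residuals` are hypothetical. For this particular data set the least-squares line works out to a = 2.2, b = 0.6, and any other line gives a larger sum.

```python
def ss_residuals(x, y, a, b):
    """Sum of squared residuals (Yi - Yhat_i)^2 for the line Yhat = a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

least_squares = ss_residuals(x, y, a=2.2, b=0.6)   # the least-squares line
shifted = ss_residuals(x, y, a=2.0, b=0.6)         # same slope, lower intercept
steeper = ss_residuals(x, y, a=2.2, b=0.7)         # same intercept, steeper slope

print(round(least_squares, 2), round(shifted, 2), round(steeper, 2))
```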



27
Formulae for Least-Squares Regression
  • The slope and intercept that minimize the sum of
    squared residuals are

b = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²   (sum of products / sum of squares for X)
a = Ȳ - bX̄
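As a sketch of these formulae in code: the slope is the sum of products divided by the sum of squares for X, and the intercept is chosen so that the line passes through the point of means. The function name `least_squares_fit` and the data are hypothetical.

```python
def least_squares_fit(x, y):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_products / ss_x       # slope
    a = mean_y - b * mean_x       # intercept: line passes through the means
    return a, b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_fit(x, y)
print(round(a, 2), round(b, 2))
```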
28
Example: How old is that lion?
X = proportion black, Y = age in years
29
Example: How old is that lion?
30
Example: How old is that lion?
X = proportion black, Y = age in years
X̄ = 0.322, Ȳ = 4.309
Σ(X - X̄)² = 1.222, Σ(Y - Ȳ)² = 222.087
Σ(X - X̄)(Y - Ȳ) = 13.012
31
(No Transcript)
32
(No Transcript)
33
A certain lion has a nose with 0.4 proportion of
black. Estimate the age of that lion.
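The slides with the worked answer have no transcript, so the following is a recomputation from the summary statistics given earlier (mean X 0.322, mean Y 4.309, sum of squares for X 1.222, sum of products 13.012). Variable names are illustrative.

```python
# Summary statistics from the lion example.
mean_x, mean_y = 0.322, 4.309
ss_x = 1.222           # sum of squares for X
sum_products = 13.012  # sum of products of X and Y deviations

b = sum_products / ss_x    # estimated slope
a = mean_y - b * mean_x    # estimated intercept

nose_black = 0.4
predicted_age = a + b * nose_black

print(f"age = {a:.2f} + {b:.2f} * proportion_black")
print(f"predicted age at {nose_black}: {predicted_age:.2f} years")
```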
34
Standard error of the slope
Sum of squares Sum of products
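The formula on this slide was lost. A minimal sketch, assuming the usual textbook form: the standard error of the slope is the square root of the residual mean square divided by the sum of squares for X. The numbers are the lion summary statistics, with n = 32 as used in the later slope test.

```python
import math

# Summary statistics from the lion example.
n = 32
ss_x = 1.222           # sum of squares for X
ss_y = 222.087         # sum of squares for Y (total)
sum_products = 13.012  # sum of products

b = sum_products / ss_x
ss_residual = ss_y - b * sum_products   # total minus regression sum of squares
ms_residual = ss_residual / (n - 2)     # residual mean square
se_b = math.sqrt(ms_residual / ss_x)    # standard error of the slope

print(round(se_b, 2))
```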
35
Lion Example, continued
36
Confidence interval for the slope
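The interval formula is not in the transcript. As a sketch, assuming a 95% interval of the usual form b ± t × SE(b), and taking the critical value 2.04 for 30 d.f. quoted in the slope test later in the deck, the lion data give:

```python
import math

# Summary statistics from the lion example.
n = 32
ss_x = 1.222
ss_y = 222.087
sum_products = 13.012
t_critical = 2.04   # two-sided, alpha = 0.05, 30 d.f. (value quoted in the deck)

b = sum_products / ss_x
ms_residual = (ss_y - b * sum_products) / (n - 2)
se_b = math.sqrt(ms_residual / ss_x)

lower = b - t_critical * se_b
upper = b + t_critical * se_b
print(f"{lower:.2f} < slope < {upper:.2f}")
```

The interval excludes zero, which agrees with the rejection of the null hypothesis of zero slope for the same data.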
37
Lion Example, continued
38
Predicting Y from X
  • What is our confidence for predicting Y from X?
  • Two types of predictions
  • What is the mean Y for each value of X?
  • Confidence bands
  • What is a particular individual Y at each value
    of X?
  • Prediction intervals

39
Predicting Y from X
  • Confidence bands measure the precision of the
    predicted mean Y for each value of X
  • Prediction intervals measure the precision of
    predicted single Y values for each value of X
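The difference between the two can be sketched with the standard textbook standard errors, which are an assumption here because the transcript does not show them. The single-individual version has an extra "1 +" term for the scatter of individuals around the line. The values come from the lion example (n = 32, residual mean square 2.785); the function names are hypothetical.

```python
import math

# Values from the lion example.
n = 32
mean_x = 0.322
ss_x = 1.222
ms_residual = 2.785


def se_mean_prediction(x0):
    """Standard error of the predicted mean Y at x0 (confidence band)."""
    return math.sqrt(ms_residual * (1 / n + (x0 - mean_x) ** 2 / ss_x))


def se_single_prediction(x0):
    """Standard error of a single predicted Y at x0 (prediction interval)."""
    return math.sqrt(ms_residual * (1 + 1 / n + (x0 - mean_x) ** 2 / ss_x))


se_mean = se_mean_prediction(0.4)
se_single = se_single_prediction(0.4)
print(round(se_mean, 3), round(se_single, 3))
```

Prediction intervals are always wider than confidence bands at the same X, and both widen as X moves away from the mean of X.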

40
Predicting Y from X
Confidence bands
Prediction interval
41
Predicting Y from X
Confidence bands
Prediction interval
How confident can we be about the regression
line?
How confident can we be about the predicted
values?
42
Testing Hypotheses about a Slope
  • t-test for regression slope
  • H0: There is no linear relationship between X and
    Y (β = 0)
  • HA: There is a linear relationship between X and
    Y (β ≠ 0)

43
Testing Hypotheses about a Slope
  • Test statistic t
  • Null distribution t with n-2 d.f.

44
Lion Example, continued
df = n - 2 = 32 - 2 = 30
Critical value = 2.04
7.05 > 2.04, so we reject the null hypothesis
Conclude that β ≠ 0
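The test statistic of 7.05 can be reproduced from the lion summary statistics, assuming the usual form t = b / SE(b). Variable names are illustrative.

```python
import math

# Summary statistics from the lion example.
n = 32
ss_x = 1.222
ss_y = 222.087
sum_products = 13.012
t_critical = 2.04   # critical value for 30 d.f. quoted above

b = sum_products / ss_x
ms_residual = (ss_y - b * sum_products) / (n - 2)
se_b = math.sqrt(ms_residual / ss_x)

t = b / se_b   # test statistic for H0: slope = 0
reject_null = abs(t) > t_critical
print(round(t, 2), reject_null)
```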
45
Testing Hypotheses about a Slope - ANOVA approach
Source of variation   Sum of squares   df    Mean squares   F   P
Regression                             1
Residual                               n-2
Total                                  n-1
46
Lion Example, continued
Source of variation   Sum of squares   df    Mean squares   F      P
Regression            138.54           1     138.54         49.7   <0.001
Residual              83.55            30    2.785
Total                 222.09           31
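The entries in this table can be rebuilt from the lion summary statistics. This is a sketch using the standard decomposition (regression sum of squares = slope × sum of products); small differences from the table in the last digit come from rounding.

```python
# Summary statistics from the lion example.
n = 32
ss_x = 1.222
ss_total = 222.087     # sum of squares for Y
sum_products = 13.012

b = sum_products / ss_x
ss_regression = b * sum_products
ss_residual = ss_total - ss_regression

ms_regression = ss_regression / 1       # regression has 1 d.f.
ms_residual = ss_residual / (n - 2)     # residual has n - 2 d.f.
f_ratio = ms_regression / ms_residual

print(f"Regression {ss_regression:.2f}  1   {ms_regression:.2f}  F = {f_ratio:.1f}")
print(f"Residual   {ss_residual:.2f}   {n - 2}  {ms_residual:.3f}")
print(f"Total      {ss_total:.2f}  {n - 1}")
```

With a single predictor, F is the square of the t statistic for the slope, so the two approaches give the same P-value.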
47
Testing Hypotheses about a Slope - R²
  • R² measures the fit of a regression line to the
    data
  • Gives the proportion of variation in Y that is
    explained by variation in X

R² = SSregression / SStotal
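As a quick check with the sums of squares from the lion ANOVA table (138.54 for regression, 222.09 in total), the ratio can be computed directly:

```python
# Sums of squares from the lion ANOVA table.
ss_regression = 138.54
ss_total = 222.09

r_squared = ss_regression / ss_total
print(f"R^2 = {r_squared:.2f}")
```

On these numbers, a little over 60% of the variation in age is explained by the proportion of black on the nose.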
48
Lion Example, Continued
49
Assumptions of Regression
  • At each value of X, there is a population of Y
    values whose mean lies on the true regression
    line
  • At each value of X, the distribution of Y values
    is normal
  • The variance of Y values is the same at all
    values of X
  • At each value of X the Y measurements represent a
    random sample from the population of Y values

50
Detecting Linearity
  • Make a scatter plot
  • Does it look like a curved line would fit the
    data better than a straight one?

51
Non-linear relationship: Number of fish species
vs. Size of desert pool
52
Taking the log of area
53
Detecting non-normality and unequal variance
  • These are best detected with a residual plot
  • Plot the residuals (Yi - Ŷi) against X
  • Look for
  • symmetric cloud of points
  • Little noticeable curvature
  • Equal variance above and below the line
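The values for a residual plot can be produced as below. This is a minimal sketch on hypothetical data: it fits the least-squares line and returns each residual paired with its X value. The plotting itself is left out, and the function names are illustrative.

```python
def fit_line(x, y):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_products / ss_x
    return mean_y - b * mean_x, b


def residuals(x, y):
    """Observed Y minus fitted Y for every data point."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y)
for xi, ri in zip(x, res):
    print(xi, round(ri, 2))   # points to plot: X on one axis, residual on the other
```

Residuals from a least-squares line always sum to zero, so the plot is centered on the horizontal line at zero. What matters is the shape of the cloud around that line.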


54
Residual plots help assess assumptions
Original
Residual plot
55
Transformed data
Logs
Residual plot
56
What if the relationship is not a straight line?
  • Transformations
  • Non-linear regression

57
Transformations
  • Some (but not all) nonlinear relationships can be
    made linear with a suitable transformation
  • Most common: log transform Y, X, or both
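A small sketch of why a log transform can help, using made-up data that follow Y = 3X² exactly. The relationship is curved on the raw scale but becomes a straight line after taking logs of both variables, with slope equal to the exponent. The function name `fit_line` is hypothetical.

```python
import math


def fit_line(x, y):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_products / ss_x
    return mean_y - b * mean_x, b


# Hypothetical curved data: Y = 3 * X^2.
x = [1, 2, 4, 8]
y = [3 * xi ** 2 for xi in x]

# Log transform both variables: log Y = log 3 + 2 * log X is linear.
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]
intercept, slope = fit_line(log_x, log_y)

print(round(slope, 3), round(math.exp(intercept), 3))
```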