Correlation and Regression - PowerPoint PPT Presentation

About This Presentation
Title: Correlation and Regression

Slides: 22
Provided by: chrishol

Transcript and Presenter's Notes



1
Correlation and Regression
2
Correlation
A quantitative relationship between two interval or ratio level variables

Explanatory (Independent) Variable, x | Response (Dependent) Variable, y
Hours of Training | Number of Accidents
Shoe Size | Height
Cigarettes smoked per day | Lung Capacity
Score on SAT | Grade Point Average
Height | IQ
What type of relationship exists between the two
variables and is the correlation significant?
3
Correlation
  • measures and describes the strength and direction
    of the relationship
  • requires two scores from the same individuals
    (dependent and independent variables)
  • is denoted by the correlation coefficient r

4
Scatter Plots and Types of Correlation
x = hours of training, y = number of accidents
[Scatter plot: Accidents vs. Hours of Training]
Negative correlation: as x increases, y decreases
5
Scatter Plots and Types of Correlation
x = SAT score, y = GPA
[Scatter plot: GPA vs. Math SAT]
Positive correlation: as x increases, y increases
6
Scatter Plots and Types of Correlation
x = height, y = IQ
[Scatter plot: IQ vs. Height]
No linear correlation
7
Scatter Plots and Types of Correlation
Strong negative relationship, but non-linear!
Pearson's correlation coefficient, which measures only linear association, is not appropriate.
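A minimal Python sketch of why r can miss a strong non-linear pattern. The data are hypothetical and `pearson_r` is a helper written here, not a library function. Here y = x² is completely determined by x, yet r comes out as 0 because the relationship is not linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends on x exactly, but along a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # r = 0.000
```

A scatter plot would show a perfect curve, so always plot the data before relying on r.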
8
Correlation Coefficient
A measure of the strength and direction of a linear relationship between two variables
The range of $r$ is from $-1$ to $1$.
If $r$ is close to $1$, there is a strong positive correlation.
If $r$ is close to $-1$, there is a strong negative correlation.
If $r$ is close to $0$, there is no linear correlation.
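To see the two endpoints of the range, this sketch computes r for points lying exactly on an increasing line and exactly on a decreasing line. The data are hypothetical and `pearson_r` is a helper defined here, not a library call.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
up = [2 * x + 1 for x in xs]     # hypothetical: exactly on an increasing line
down = [10 - 3 * x for x in xs]  # hypothetical: exactly on a decreasing line

print(round(pearson_r(xs, up), 6))    # 1.0
print(round(pearson_r(xs, down), 6))  # -1.0
```

Only the sign of the slope matters: the steepness of the line does not change r, which stays at exactly 1 or -1 whenever every point lies on the line.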
9
Outliers.....
Outliers are dangerous. Here we have a spurious correlation of $r = 0.68$; without IBM, $r = 0.48$; without IBM and GE, $r = 0.21$.
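The IBM and GE data behind this slide did not survive extraction, so the sketch below uses small hypothetical data to show the same effect: one extreme point can turn a weak correlation into a seemingly strong one. `pearson_r` is a helper defined here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with only a weak linear pattern.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 1, 5, 3]
print(f"without outlier: r = {pearson_r(xs, ys):.3f}")  # r = 0.300

# Add one hypothetical extreme point far from the rest of the data.
xs_out = xs + [20]
ys_out = ys + [20]
print(f"with outlier:    r = {pearson_r(xs_out, ys_out):.3f}")  # r = 0.972
```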
10
Application
x (Absences) | y (Final Grade)
8 | 78
2 | 92
5 | 90
12 | 58
15 | 43
9 | 74
6 | 81

[Scatter plot: Final Grade vs. Absences]
11
Computation of r

Row | x | y | xy | x² | y²
1 | 8 | 78 | 624 | 64 | 6084
2 | 2 | 92 | 184 | 4 | 8464
3 | 5 | 90 | 450 | 25 | 8100
4 | 12 | 58 | 696 | 144 | 3364
5 | 15 | 43 | 645 | 225 | 1849
6 | 9 | 74 | 666 | 81 | 5476
7 | 6 | 81 | 486 | 36 | 6561
Σ | 57 | 516 | 3751 | 579 | 39898
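The column sums feed the standard computational formula for Pearson's r, $r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$. The slide's own formula did not survive extraction, so treat this formula as an assumption, although it does reproduce the $r = -0.975$ used later. A minimal Python sketch (variable names are illustrative):

```python
import math

# Paired data from the "Application" slide: x = absences, y = final grade.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

n = len(absences)
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 57 516 3751 579 39898

# Computational formula for Pearson's r using the column sums.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))  # -0.975
```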
12
Hypothesis Test for Significance
r is the correlation coefficient for the sample. The correlation coefficient for the population is $\rho$ (rho).
For a two-tailed test for significance:
$H_0: \rho = 0$ (The correlation is not significant)
$H_a: \rho \neq 0$ (The correlation is significant)
When $H_0$ is true, the standardized test statistic follows a t-distribution with $n - 2$ degrees of freedom.
Standardized test statistic: $t = \dfrac{r}{\sqrt{(1 - r^2)/(n - 2)}}$
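A minimal Python sketch of the standardized test statistic $t = r / \sqrt{(1 - r^2)/(n - 2)}$ and the two-tailed decision rule. The function names are hypothetical. The critical value is passed in rather than computed, because computing t-quantiles would need a statistics library. The slides use 4.032 for $\alpha = 0.01$ with 5 d.f.

```python
import math

def correlation_t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def reject_h0(t, critical_value):
    """Two-tailed decision: reject H0 when |t| exceeds the critical value t0."""
    return abs(t) > critical_value

t = correlation_t_statistic(-0.975, 7)
print(round(t, 2))          # -9.81
print(reject_h0(t, 4.032))  # True
```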
13
Test of Significance
You found the correlation between the number of times absent and final grade to be $r = -0.975$. There were seven pairs of data. Test the significance of this correlation. Use $\alpha = 0.01$.
1. Write the null and alternative hypotheses.
$H_0: \rho = 0$ (The correlation is not significant)
$H_a: \rho \neq 0$ (The correlation is significant)
2. State the level of significance.
$\alpha = 0.01$
3. Identify the sampling distribution.
A t-distribution with $n - 2 = 5$ degrees of freedom
14
Rejection Regions
[Figure: t-distribution with critical values $-t_0$ and $t_0$ bounding the two-tailed rejection regions]
4. Find the critical value.
5. Find the rejection region.
6. Find the test statistic.
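Steps 4 to 6 can be worked out as follows. The critical value 4.032 is the one the slides use; it matches the standard t-table entry for a two-tailed test with $\alpha = 0.01$ and 5 degrees of freedom.

```latex
\begin{aligned}
&\text{4. Critical values } (\alpha = 0.01,\ \text{two-tailed},\ 5\ \text{d.f.}):\ \pm t_0 = \pm 4.032 \\
&\text{5. Rejection region: } t < -4.032 \ \text{or}\ t > 4.032 \\
&\text{6. } t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}
          = \frac{-0.975}{\sqrt{\frac{1 - (-0.975)^2}{7 - 2}}}
          = \frac{-0.975}{\sqrt{0.009875}} \approx -9.81
\end{aligned}
```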
15
[Figure: t-distribution with critical values $-4.032$ and $4.032$]
7. Make your decision.
$t = -9.811$ falls in the rejection region. Reject the null hypothesis.
8. Interpret your decision.
There is a significant negative correlation
between the number of times absent and final
grades.
16
The Line of Regression
Regression indicates the degree to which the variation in one variable, y, is related to or can be explained by the variation in another variable, x.
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.
The equation of a line may be written as $y = mx + b$, where $m$ is the slope of the line and $b$ is the y-intercept.
The line of regression is $\hat{y} = mx + b$.
The slope $m$ is $m = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$.
The y-intercept is $b = \bar{y} - m\bar{x} = \dfrac{\sum y}{n} - m\dfrac{\sum x}{n}$.
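The least squares formulas $m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ and $b = \bar{y} - m\bar{x}$ translate directly into code. The sketch below applies them to the absences data. The function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return slope m and y-intercept b of the least squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = sum_y / n - m * sum_x / n  # b = y-bar - m * x-bar
    return m, b

# Paired data from the "Application" slide: x = absences, y = final grade.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
m, b = least_squares_line(absences, grades)
print(round(m, 3), round(b, 3))  # -3.924 105.668
```

The intercept prints as 105.668 rather than the slides' 105.667. This is because the code keeps the unrounded slope -3155/804, while the slides appear to plug the rounded slope -3.924 into $b = \bar{y} - m\bar{x}$.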
17
[Figure: Best fitting straight line for revenue vs. Ad. Labels mark a data point $(x_i, y_i)$, a point on the line with the same x-value, and the residual between them.]
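The revenue and advertising data in this figure did not survive extraction, so the sketch below reuses the absences data. It computes each residual (observed y minus the point on the line with the same x-value) and the sum of squared residuals (SSE), which is the quantity the least squares line minimizes. The function names are hypothetical, and the coefficients are the rounded ones reported in the slides.

```python
# Paired data from the "Application" slide: x = absences, y = final grade.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

def residuals(slope, intercept):
    """Residual e_i = y_i - y_hat_i for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(absences, grades)]

def sse(slope, intercept):
    """Sum of squared residuals for the line y = slope * x + intercept."""
    return sum(e ** 2 for e in residuals(slope, intercept))

# Regression line reported in the slides (rounded coefficients).
M, B = -3.924, 105.667
for x, y, e in zip(absences, grades, residuals(M, B)):
    print(f"x = {x:2d}, y = {y}, residual = {e:+.3f}")

# Moving the slope away from the fitted value (here by 0.5) increases the SSE.
print(f"SSE at fitted line:   {sse(M, B):.2f}")
print(f"SSE with slope + 0.5: {sse(M + 0.5, B):.2f}")
```

For the least squares line the residuals sum to (nearly) zero; the small leftover here comes from rounding the coefficients.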
18
Write the equation of the line of regression with x = number of absences and y = final grade.

Row | x | y | xy | x² | y²
1 | 8 | 78 | 624 | 64 | 6084
2 | 2 | 92 | 184 | 4 | 8464
3 | 5 | 90 | 450 | 25 | 8100
4 | 12 | 58 | 696 | 144 | 3364
5 | 15 | 43 | 645 | 225 | 1849
6 | 9 | 74 | 666 | 81 | 5476
7 | 6 | 81 | 486 | 36 | 6561
Σ | 57 | 516 | 3751 | 579 | 39898

Calculate m and b.
The line of regression is $\hat{y} = -3.924x + 105.667$.
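Plugging the column sums ($n = 7$) into the slope and intercept formulas gives the reported values. As the slides appear to do, the intercept below uses the slope rounded to $-3.924$.

```latex
\begin{aligned}
m &= \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}
   = \frac{7(3751) - (57)(516)}{7(579) - 57^2}
   = \frac{26257 - 29412}{4053 - 3249}
   = \frac{-3155}{804} \approx -3.924 \\
b &= \bar{y} - m\bar{x}
   = \frac{516}{7} - (-3.924)\frac{57}{7}
   \approx 73.714 + 31.953 = 105.667
\end{aligned}
```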
19
The Line of Regression
$m = -3.924$ and $b = 105.667$
The line of regression is $\hat{y} = -3.924x + 105.667$.
[Figure: Final Grade vs. Absences with the regression line]
Note that the point $(\bar{x}, \bar{y}) = (8.143, 73.714)$ is on the line.
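The point of means always lies on the least squares line, because the intercept is defined as $b = \bar{y} - m\bar{x}$. Substituting $x = \bar{x}$ shows this, and the numbers here confirm it:

```latex
\begin{aligned}
\hat{y}\big|_{x=\bar{x}} &= m\bar{x} + b = m\bar{x} + (\bar{y} - m\bar{x}) = \bar{y} \\
\bar{x} &= \tfrac{57}{7} \approx 8.143, \qquad \bar{y} = \tfrac{516}{7} \approx 73.714 \\
-3.924(8.143) + 105.667 &\approx 73.714
\end{aligned}
```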
20
Predicting y Values
The regression line can be used to predict values of y for values of x falling within the range of the data.
The regression equation for number of times absent and final grade is $\hat{y} = -3.924x + 105.667$.
Use this equation to predict the expected grade for a student with (a) 3 absences and (b) 12 absences.
(a) $\hat{y} = -3.924(3) + 105.667 = 93.895$
(b) $\hat{y} = -3.924(12) + 105.667 = 58.579$
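The "within the range of the data" rule can be enforced in code. In this sketch the bounds 2 and 15 are the smallest and largest absences in the sample table, and the function name `predict_grade` is hypothetical.

```python
# Regression line from the slides: y-hat = -3.924x + 105.667 (x = absences).
SLOPE, INTERCEPT = -3.924, 105.667
X_MIN, X_MAX = 2, 15  # smallest and largest absences in the sample data

def predict_grade(absences):
    """Predict the final grade, refusing to extrapolate beyond the observed x-range."""
    if not X_MIN <= absences <= X_MAX:
        raise ValueError(f"x = {absences} is outside the data range [{X_MIN}, {X_MAX}]")
    return SLOPE * absences + INTERCEPT

print(round(predict_grade(3), 3))   # 93.895
print(round(predict_grade(12), 3))  # 58.579
```

Calling `predict_grade(20)` raises a `ValueError`, because the data give no evidence that the linear pattern continues past 15 absences.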
21
Strength of the Association
The coefficient of determination, $r^2$, measures the strength of the association and is the ratio of the explained variation in y to the total variation in y.
The correlation coefficient of number of times absent and final grade is $r = -0.975$. The coefficient of determination is $r^2 = (-0.975)^2 \approx 0.9506$.
Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables, such as intelligence, amount of time studied, etc.
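The ratio definition can be checked directly on the absences data. This sketch fits the unrounded least squares line and divides the explained variation by the total variation. Variable names are illustrative.

```python
# Paired data from the "Application" slide: x = absences, y = final grade.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
n = len(absences)
mean_x = sum(absences) / n
mean_y = sum(grades) / n

# Unrounded least squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(absences, grades))
sxx = sum((x - mean_x) ** 2 for x in absences)
m = sxy / sxx
b = mean_y - m * mean_x

predicted = [m * x + b for x in absences]
total_variation = sum((y - mean_y) ** 2 for y in grades)         # total variation in y
explained_variation = sum((p - mean_y) ** 2 for p in predicted)  # explained by the line
r_squared = explained_variation / total_variation
print(round(r_squared, 4))  # 0.9502
```

The result is 0.9502 rather than 0.9506 because the slide squares the rounded $r = -0.975$, while the unrounded correlation is about $-0.9748$. Both round to "about 95%".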