Title: Regression and Prediction
Linear Regression
- Closely related to correlation, but goes a step further.
- Regression refers to a type of analysis in which we use an individual's score on one variable to predict his/her score on the other.
- This prediction is obtained by applying a best-fitting straight line to the data set.
- Theoretically, this is how it works. First, we are given a data set with a correlation coefficient r. Next, we must obtain a line of best fit.
Once a line of best fit is obtained, we can predict a score on y for any score on x.
By using regression, we match the score obtained on x to a point on the regression line. We then read off the y value of that point, which is the predicted score on y. Note that our predictions are not perfect, because the correlation between x and y is not perfect: extraneous factors also influence the relationship between x and y.
Linear Regression Formula
However, we don't simply draw in this regression line. Rather, the line is defined mathematically by the following formula:
$\hat{Y} = a + bX$
where $\hat{Y}$ represents the predicted score on y, a represents the Y-intercept of the regression line, and b represents the slope of the regression line.
More Formulas
An Example
Now, given the regression equation, find the predicted value of y for each x value.
To do this, we simply plug each x value into the derived regression equation.
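As a small sketch of this plug-in step, the Python below evaluates $\hat{Y} = a + bX$ for a list of x values. Because the equation for this example is not reproduced in these notes, the coefficients a = -0.1 and b = 0.7 are borrowed from the Drug A example later on; the function name `predict` is our own.

```python
# Predicted scores from the regression equation Y-hat = a + bX.
# ASSUMPTION: a = -0.1 and b = 0.7 are the Drug A coefficients from later
# in these notes; substitute the intercept and slope you actually derived.

def predict(a: float, b: float, x: float) -> float:
    """Return the predicted score Y-hat = a + b * x."""
    return a + b * x

a, b = -0.1, 0.7
xs = [1, 2, 3, 4, 5]
# Round to 2 decimals to hide floating-point noise such as 1.9999999999999996
predictions = [round(predict(a, b, x), 2) for x in xs]
print(predictions)  # [0.6, 1.3, 2.0, 2.7, 3.4]
```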
Another Formula (Formula 7.6)
Here's another formula that allows us to predict Y:
$\hat{Y} = \bar{Y} + r\frac{s_Y}{s_X}(X - \bar{X})$
where $s_Y$ = standard deviation of Y and $s_X$ = standard deviation of X. If we work out the means and standard deviations of X and Y from the previous example, we can apply this formula to predict y scores from x.
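A minimal Python sketch of Formula 7.6, assuming the summary statistics have already been computed. For illustration it uses the rounded values reported in the Drug A example later in these notes (means 3 and 2, standard deviations 1.41 and 1.10, r = 0.90); the function name `predict_7_6` is hypothetical.

```python
def predict_7_6(x, mean_x, mean_y, s_x, s_y, r):
    """Formula 7.6: Y-hat = mean_y + r * (s_y / s_x) * (x - mean_x)."""
    return mean_y + r * (s_y / s_x) * (x - mean_x)

# Rounded summary statistics from the Drug A example (used here for illustration)
mean_x, mean_y, s_x, s_y, r = 3, 2, 1.41, 1.10, 0.90

slope = r * s_y / s_x   # the slope implied by Formula 7.6
print(round(slope, 2))                                        # 0.7
print(round(predict_7_6(5, mean_x, mean_y, s_x, s_y, r), 2))  # 3.4
```

Because only the ratio $s_Y/s_X$ enters the formula, it does not matter whether both standard deviations divide by N or by N - 1, as long as both use the same denominator.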
An Example
Note that this is exactly the same equation as the one obtained from the previous regression formula.
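Another Formula (7.7)
$\hat{Y} = \bar{Y} + \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}(X - \bar{X})$
Applied to the previous example (N = 8, $\sum X = 20$, $\sum Y = 884$, $\sum X^2 = 64$, $\sum XY = 2113$):
$\hat{Y} = 110.5 + \frac{8(2113) - 20(884)}{8(64) - (20)^2}(X - 2.5) = 110.5 + \frac{-776}{112}(X - 2.5)$
$\hat{Y} = 110.5 + (-6.93)(X - 2.5) = 110.5 - 6.93X + 17.33$
$\hat{Y} = 127.83 - 6.93X$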
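The same arithmetic in Python, as a sketch: `slope_from_sums` (a hypothetical helper name) computes the fraction in Formula 7.7 from the raw sums on this slide, and the intercept follows from expanding $\bar{Y} + b(X - \bar{X})$.

```python
def slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope in Formula 7.7: [N*sum(XY) - sum(X)*sum(Y)] / [N*sum(X^2) - sum(X)^2]."""
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Sums used on this slide
n, sum_x, sum_y, sum_xy, sum_x2 = 8, 20, 884, 2113, 64

b = slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2)
mean_x, mean_y = sum_x / n, sum_y / n   # 2.5 and 110.5
a = mean_y - b * mean_x                 # intercept of Y-hat = a + bX
print(round(b, 2), round(a, 2))         # -6.93 127.82
```

Carrying b unrounded gives an intercept of about 127.82; the slide's 127.83 comes from rounding b to -6.93 before multiplying by 2.5.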
Another Example
A researcher believes that Drug A has the potential to reduce human reaction time. Five subjects are given different doses of Drug A and then told to press a button each time they hear a tone. Their reaction times are measured. Calculate the regression equation using each of the previous three formulas.

Drug Dose X (ml/kg)    Reaction Time Y (msec)
1                      1
2                      1
3                      2
4                      2
5                      4
$\bar{X} = 3$, $\sum X = 15$, $\sum X^2 = 55$, $s_X = 1.41$
$\bar{Y} = 2$, $\sum Y = 10$, $\sum Y^2 = 26$, $s_Y = 1.10$
$\sum XY = 37$, $r = 0.90$
$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{N} = 37 - \frac{(15)(10)}{5} = 7$
$SS_X = \sum X^2 - \frac{(\sum X)^2}{N} = 55 - \frac{(15)^2}{5} = 10$
$b = \frac{SS_{XY}}{SS_X} = \frac{7}{10} = 0.7$
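$a = \bar{Y} - b\bar{X} = 2 - 0.7(3) = -0.1$
$\hat{Y} = a + bX = -0.1 + 0.7X$
X = 1: $\hat{Y} = -0.1 + 0.7(1) = 0.6$
X = 2: $\hat{Y} = -0.1 + 0.7(2) = 1.3$
X = 3: $\hat{Y} = -0.1 + 0.7(3) = 2.0$
X = 4: $\hat{Y} = -0.1 + 0.7(4) = 2.7$
X = 5: $\hat{Y} = -0.1 + 0.7(5) = 3.4$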
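The hand calculation above can be checked with a short Python sketch that starts from the raw Drug A data and follows the same steps: sums, $SS_{XY}$, $SS_X$, b, a, then the predicted values. Only the standard library is used; the variable names are our own.

```python
# Drug A data: dose X (ml/kg) and reaction time Y (msec) for five subjects
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 37
sum_x2 = sum(xi ** 2 for xi in x)               # 55

ss_xy = sum_xy - sum_x * sum_y / n   # 37 - 150/5 = 7
ss_x = sum_x2 - sum_x ** 2 / n       # 55 - 225/5 = 10
b = ss_xy / ss_x                     # slope
a = sum_y / n - b * sum_x / n        # intercept: a = mean_y - b * mean_x

# Round to hide floating-point noise in the predicted values
predicted = [round(a + b * xi, 2) for xi in x]
print(round(b, 2), round(a, 2))  # 0.7 -0.1
print(predicted)                 # [0.6, 1.3, 2.0, 2.7, 3.4]
```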
Another Example (7.6)
$\hat{Y} = 2 + (0.90)\left(\frac{1.10}{1.41}\right)(X - 3)$
$\hat{Y} = 2 + 0.7(X - 3) = 2 + 0.7X - 2.1$
$\hat{Y} = -0.1 + 0.7X$
Another Example (7.7)
$\hat{Y} = 2 + \frac{5(37) - (15)(10)}{5(55) - (15)^2}(X - 3) = 2 + \frac{35}{50}(X - 3)$
$\hat{Y} = 2 + 0.7(X - 3)$
$\hat{Y} = 2 + 0.7X - 2.1$
$\hat{Y} = -0.1 + 0.7X$
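As a sketch of why all three formulas give the same line, the Python below computes the slope three ways from the raw Drug A data. It assumes the standard deviations divide by N, which matches the reported $s_X = 1.41$ and $s_Y = 1.10$. It also computes r as $SS_{XY}/\sqrt{SS_X SS_Y}$, a standard form of Pearson's r that these notes do not show explicitly.

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Deviation-score sums of squares and cross-products
ss_x = sum((xi - mean_x) ** 2 for xi in x)                          # 10
ss_y = sum((yi - mean_y) ** 2 for yi in y)                          # 6
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # 7

# ASSUMPTION: SDs divide by N, which reproduces s_x = 1.41 and s_y = 1.10
s_x = math.sqrt(ss_x / n)
s_y = math.sqrt(ss_y / n)
r = ss_xy / math.sqrt(ss_x * ss_y)   # Pearson's r, about 0.90

b_ss = ss_xy / ss_x                  # b = SSxy / SSx
b_76 = r * s_y / s_x                 # slope inside Formula 7.6
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
b_77 = (n * sum_xy - sum(x) * sum(y)) / (n * sum_x2 - sum(x) ** 2)  # Formula 7.7

print(round(b_ss, 4), round(b_76, 4), round(b_77, 4))  # 0.7 0.7 0.7
```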
Standard Error
- We now know that our predictions of y based on x will not be perfect; however, we can calculate approximately how far off our predictions are likely to be.
- This quantity is called the standard error of the estimate.
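These notes do not give a formula for the standard error of the estimate, so the sketch below rests on an assumption: it uses one common definition, the square root of the sum of squared prediction errors $\sum (Y - \hat{Y})^2$ divided by N - 2. Some textbooks divide by N instead, so check which convention your course uses. It is applied to the Drug A data and equation $\hat{Y} = -0.1 + 0.7X$.

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7   # regression equation from the Drug A example

# Prediction errors Y - Y-hat for each subject
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_residual = sum(e ** 2 for e in residuals)   # sum of (Y - Y-hat)^2

# ASSUMPTION: divide by N - 2; some textbooks divide by N instead
se_estimate = math.sqrt(ss_residual / (len(x) - 2))
print(round(ss_residual, 2), round(se_estimate, 3))  # 1.1 0.606
```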
Explained and Unexplained Variation
This diagram represents each score's variation from the mean, i.e., it shows how far each score is from the mean. We can add up this variation using $\sum (Y - \bar{Y})^2$ to get what is referred to as the total variation.
This diagram represents each predicted score's variation from the mean, i.e., it shows how far each predicted score is from the mean. We can add this up using $\sum (\hat{Y} - \bar{Y})^2$ to get the explained variation. The explained variation in Y is attributed to X.
This diagram represents each score's variation from the regression line, i.e., it shows how far each score is from the regression line. We can add up this variation using $\sum (Y - \hat{Y})^2$ to get what is referred to as the unexplained variation.
- The total variation is made up of the explained variation and the unexplained variation. Thus:
Total variation = explained variation + unexplained variation
$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$
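To make the decomposition concrete, the Python sketch below computes the three sums for the Drug A data and equation ($\hat{Y} = -0.1 + 0.7X$). The identity holds exactly for a least-squares regression line, so the check at the end should print True up to floating-point rounding.

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7   # Drug A regression equation
mean_y = sum(y) / len(y)
y_hat = [a + b * xi for xi in x]   # predicted scores

total = sum((yi - mean_y) ** 2 for yi in y)                    # sum (Y - Ybar)^2
explained = sum((yh - mean_y) ** 2 for yh in y_hat)            # sum (Yhat - Ybar)^2
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum (Y - Yhat)^2

print(round(total, 2), round(explained, 2), round(unexplained, 2))  # 6.0 4.9 1.1
print(abs(total - (explained + unexplained)) < 1e-9)                # True
```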
- What's important about all of this is that, because we can break up the total variation in a variable into explained and unexplained variation, we can calculate an important ratio:
$r^2 = \frac{\text{explained variation}}{\text{total variation}}$
$r^2$ is called the coefficient of determination. It tells us what proportion of the total variation in Y can be attributed to X (multiply by 100 to express it as a percentage).
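A short Python sketch, again using the Drug A data, shows that the ratio of explained to total variation equals the square of the correlation coefficient. Here r is computed as $SS_{XY}/\sqrt{SS_X SS_Y}$, a standard form of Pearson's r that these notes do not show explicitly.

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = ss_xy / ss_x
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

total = sum((yi - mean_y) ** 2 for yi in y)          # also SSy
explained = sum((yh - mean_y) ** 2 for yh in y_hat)
r = ss_xy / math.sqrt(ss_x * total)                  # Pearson's r

# Coefficient of determination computed two ways
print(round(explained / total, 3), round(r ** 2, 3))  # 0.817 0.817
```

For the Drug A data, then, roughly 82% of the variation in reaction time can be attributed to dose.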
Example
- How much of the variation in IQ scores from our previous example can be attributed to the number of children?
- Recall that we calculated r to be -0.82. Simply calculate $r^2$:
$r^2 = (-0.82)^2 = 0.67$
So 67% of the variation in IQ scores can be attributed to the number of children in the family.