Correlation and Regression - PowerPoint PPT Presentation

Slides: 83
Provided by: buddy5

Transcript and Presenter's Notes

Title: Correlation and Regression


1
Chapter 15
  • Correlation and Regression

2
Introduction
When one does correlational research, he or she
is interested in the relationship between two
variables.
3
Some examples of questions that would be answered
with a correlational study
  • Do taller people tend to weigh more than shorter
    people?
  • Do people with higher IQ scores tend to do better
    in school than others?
  • Do children who eat more sugar in their diet tend
    to be more active than other children?

4
In each case, the researcher would obtain a pair
of observations from each member of the sample.
5
Examples
  • To determine if taller people tended to weigh
    more than shorter people, the researcher would
    have to obtain the height and weight of each
    member in a sample.
  • To determine if people with higher IQ scores tend
    to do better than others in school, the
    researcher would have to obtain the IQ score and
    some index of academic performance (e.g., GPA)
    from each member of the sample.
  • To determine if children who eat more sugar are
    more active than other children, a researcher
    would have to record the amount of sugar consumed
    and the activity level of each child in the
    sample.

6
Bivariate Distributions
Because correlational research involves getting
pairs of scores, what results is a bivariate
distribution.
7
Bivariate distributions should be distinguished
from univariate distributions. A univariate
distribution contains a single score for each
member of the sample. A bivariate distribution
contains a pair of scores for each member.
8
When we do correlational research, we need ways
to statistically describe the nature of the
relationship between the two variables.
9
Scatter Plots
One way we can get a sense of the relationship
between the two variables is to construct a
scatter plot. In a scatter plot, each pair of
scores is represented by a point in a
two-dimensional space. The horizontal position
of each point is determined by its value on one
of the variables. The vertical position is
determined by its value on the other variable.
10
Example
Suppose we have the following pairs of scores:
(1, 2), (1, 3), (2, 2), (3, 4). What would the
scatter plot look like for these four pairs?
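The sketch below builds a rough text-only scatter plot for small integer scores. It places X along the horizontal axis and Y along the vertical axis. The function name `ascii_scatter` is hypothetical. The function assumes single-digit integer scores, so that the columns line up.

```python
def ascii_scatter(pairs):
    """Render (x, y) integer pairs as a text scatter plot.

    X runs along the horizontal axis and Y up the vertical axis.
    A single pair is marked '*'; coincident pairs show their count.
    Assumes small (single-digit) integer scores so columns align.
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):  # top row = largest Y
        cells = []
        for x in range(min(xs), max(xs) + 1):
            n = counts.get((x, y), 0)
            cells.append("." if n == 0 else ("*" if n == 1 else str(n)))
        rows.append(f"{y:>2} | " + " ".join(cells))
    # Horizontal axis line and X labels underneath the grid.
    rows.append("     " + "-" * (2 * (max(xs) - min(xs)) + 1))
    rows.append("     " + " ".join(str(x) for x in range(min(xs), max(xs) + 1)))
    return "\n".join(rows)

print(ascii_scatter([(1, 2), (1, 3), (2, 2), (3, 4)]))
```

For the four pairs above, the top row shows a single point at X = 3, Y = 4. The bottom row of points shows X = 1 and X = 2, both at Y = 2.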
12
Which variable you designate as X and which you
designate as Y is largely arbitrary. The only
time when it might make a difference is when one
of the two variables might be logically used to
predict the other.
13
When that is true, we should plot the variable
used to make the prediction (the predictor
variable) along the X-axis, and the variable we
are trying to predict (the criterion variable)
along the Y-axis.
14
For example, when it comes to height and weight,
we would probably be more likely to use one's
height (the predictor variable) to predict one's
weight (the criterion variable). Therefore, it
would make sense to plot height along the X-axis
and weight along the Y-axis.
15
Interpreting Scatter Plots
16
This scatter plot depicts a perfect, positive
relationship between the two variables. By
perfect, positive we mean that there is perfect
consistency between the two variables: a given
increase in X is always accompanied by the same
amount of increase in Y, so all of the points
fall exactly on a straight line.
17
This scatter plot depicts a strong, positive
relationship. As X increases, so does Y.
However, the increase is not perfectly consistent.
18
This scatter plot depicts a weak, positive
relationship. As X increases, there is only a
slight tendency for Y to increase.
19
This scatter plot depicts an instance where there
is no relationship between X and Y. As X
increases, Y shows no consistent tendency to
increase or decrease.
20
This scatter plot depicts a perfect, negative
(inverse) relationship. As X increases, Y
decreases by a perfectly consistent amount.
21
This is a strong, negative (inverse) relationship.
22
This is a weak, negative (inverse) relationship.
23
In other words, relationships can vary from
perfect positive to perfect negative. The full
continuum is depicted below.
[Figure: a continuum running from Perfect, negative, through No relationship, to Perfect, positive]
24
How would you describe the relationship between
height and weight?
25
The Pearson Correlation Coefficient
The Pearson correlation coefficient is a
statistic that quite precisely describes the
relationship between two variables. Specifically,
it indicates whether the relationship is positive
or negative and how strong the relationship is.
26
The conceptual formula
or
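The formula itself appeared only as an image on the original slide. A standard way to write the conceptual (definitional) formula is shown below, and it agrees with the description of the numerator that follows. SP is the sum of the products of the deviation scores. SS_X and SS_Y are the sums of squared deviations for X and Y. The slide's own notation may differ.

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
  = \frac{SP}{\sqrt{SS_X \, SS_Y}}
```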
27
The numerator is the interesting part of this
formula.
First, it determines how the X and Y members of a
pair deviate from their respective means.
28
Then, by multiplying these deviations, it
determines if they deviate in the same direction
(in which case the product will be positive) or
in opposite directions (in which case the product
will be negative).
29
Finally, by summing the products of these
deviations, it determines if there is a
consistent pattern across all pairs.
The sum of the products of the deviation scores
is frequently represented by the symbol SP.
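The three steps described above can be sketched in code. First find each score's deviation from its mean. Then multiply the X and Y deviations within each pair. Finally, sum those products to get SP. The sketch assumes the standard definitional formula r = SP / sqrt(SS_X * SS_Y). The function names are hypothetical.

```python
from math import sqrt

def sum_of_products(xs, ys):
    """SP: sum of the products of each pair's deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Same-direction deviations give positive products; opposite give negative.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def pearson_r_conceptual(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y), with SS = sum of squared deviations."""
    ss_x = sum_of_products(xs, xs)  # SP of a variable with itself is its SS
    ss_y = sum_of_products(ys, ys)
    return sum_of_products(xs, ys) / sqrt(ss_x * ss_y)

xs = [1, 1, 2, 3]
ys = [2, 3, 2, 4]
print(sum_of_products(xs, ys))  # 1.75
```

For the pairs (1, 2), (1, 3), (2, 2), (3, 4), SP is 1.75 and both SS_X and SS_Y are 2.75. That gives r = 1.75 / 2.75, or about .64.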
30
Some examples
31
X and Y scores consistently deviate in the same
direction from their respective means. This
results in a large positive value for the sum of
the products of the deviations.
32
In this case, the X and Y values consistently
deviate in opposite directions. This results in
a large negative value for the sum of the
products of the deviations.
33
What will happen in this case?
34
Or in these two cases?
35
Let's take a look at the height and weight data.
36
Calculating the Pearson correlation coefficient
37
While you can use the conceptual formula to
compute r, it is easier to use the raw-score or
computational formula.
38
A good strategy is to break the formula into
three components and compute the value for each
component and then insert them into the formula.
39
It's also a good strategy to compute the
following quantities before you begin.
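The formula and the list of quantities were images on the original slides. The usual raw-score form is assumed here. In that form, the quantities to compute first are N, ΣX, ΣY, ΣX², ΣY², and ΣXY. The three components are the numerator and the two bracketed terms under the square root. This form equals SP / sqrt(SS_X SS_Y), because N·SP = NΣXY − ΣXΣY and N·SS_X = NΣX² − (ΣX)².

```latex
r = \frac{N\sum XY - \sum X \sum Y}
         {\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]
                \left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}
```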
40
Example: What is r for the following pairs of
scores?
(1, 2), (1, 3), (2, 2), (3, 4)
42
What is r for the following pairs of scores?
(1, 4), (1, 5), (2, 2), (4, 1)
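The sketch below works both exercises with the raw-score formula, following the strategy of computing the sums first and then the three components. It assumes the standard computational formula. The function name `pearson_r_raw` is hypothetical.

```python
from math import sqrt

def pearson_r_raw(pairs):
    """Pearson r via the raw-score (computational) formula."""
    n = len(pairs)
    # Quantities to compute before plugging into the formula.
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    # The three components of the formula.
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    return numerator / sqrt(x_term * y_term)

print(round(pearson_r_raw([(1, 2), (1, 3), (2, 2), (3, 4)]), 2))  # 0.64
print(round(pearson_r_raw([(1, 4), (1, 5), (2, 2), (4, 1)]), 2))  # -0.9
```

For the first set, the three components are 7, 11, and 11, so r = 7/√121 = 7/11 ≈ .64. For the second set, they are −28, 24, and 40, so r = −28/√960 ≈ −.90.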
44
What's the correlation between height and weight?
45
Interpreting correlation coefficients
  • The correlation coefficient can take any value
    from -1 to +1.
  • The sign of the correlation coefficient indicates
    whether the relationship is positive (as X
    increases, Y also increases) or negative (as X
    increases, Y decreases).
  • Its absolute value indicates how strong the
    relationship is. Values close to 0 indicate a
    weak or nonexistent relationship. Values close
    to either 1 or -1 indicate a very strong one.

46
[Figure: example scatter plots for r = -1, -1 < r < 0, r = 0, 0 < r < 1, and r = 1]
47
While the correlation coefficient conveys
information about the strength of a relationship
in a precise way, it doesn't do so in a manner
that is particularly meaningful.
48
For example, a correlation coefficient of .8
indicates a stronger relationship than .7. Just
how strong is a relationship of .8 or .7, however?
49
The Coefficient of Determination
The coefficient of determination conveys
information about the strength of a relationship
in a manner that is quite meaningful.
50
Specifically, it tells you how strong a
relationship is by indicating the proportion of
variance that is shared by the two variables.
51
To illustrate:
Suppose that the circle below labeled X
represents the variance of X and that the circle
labeled Y represents the variance in Y.
52
If X and Y are correlated, that means that the
two variables co-vary to some extent. In other
words, they share some variance. The shared
variance is represented by the portion of the
circles that overlap.
53
The stronger the relationship between X and Y,
the more overlap, or shared variance, there will
be.
54
Calculating the Coefficient of Determination
To calculate the coefficient of determination,
simply square the correlation coefficient (r²).
This will tell you exactly what proportion of the
variance in one variable is shared with the
second variable. The unique (unshared) variance
is simply 1 − r².
56
Example
If the correlation between height and weight is
.64, then the coefficient of determination would
be .41 (.64² = .4096). That indicates that the
two variables share 41% of their variance. That
also means that 59% of the variance in each
variable is unique (i.e., not shared with the
other).
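The arithmetic in the height and weight example can be written as a small helper that returns both the shared and the unique proportions of variance. The function name is hypothetical. The second call shows that a negative r of the same size gives the same result.

```python
def coefficient_of_determination(r):
    """Return (shared, unique) proportions of variance: r**2 and 1 - r**2."""
    shared = r ** 2
    return shared, 1 - shared

shared, unique = coefficient_of_determination(0.64)
print(f"shared = {shared:.2f}, unique = {unique:.2f}")  # shared = 0.41, unique = 0.59

# Squaring discards the sign, so r = -.64 yields the same proportions.
print(coefficient_of_determination(-0.64) == coefficient_of_determination(0.64))  # True
```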
58
While the coefficient of determination conveys
information about the strength of a relationship,
it does not convey information about the type of
relationship (positive vs. negative). That is
because the sign of the correlation coefficient
is lost when it is squared.
59
Linear Regression
When we calculate a correlation coefficient, we
are really determining the extent to which the
relationship can be described by a straight line.
60
In this case, a straight line provides a fairly
good description of the relationship (i.e., the
points tend to fall close to the line).
Consequently the correlation coefficient would be
relatively large.
61
In contrast, a straight line does a poorer job
describing this relationship, since the points
tend to fall well away from the line. In this
case, we would expect a small correlation
coefficient.
62
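The line that best describes the relationship
between two variables (i.e., comes closest to all
of the points) is referred to as the regression
line. "Closest" is usually defined by the
least-squares criterion: the regression line
minimizes the sum of the squared vertical
distances between the points and the line.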
63
By obtaining the formula for the regression line,
we can more precisely describe how X and Y are
related.
64
Example
The relationships depicted below are both strong
and positive, yet they are not the same
relationship.
The difference is reflected in the slope of the
regression lines.
65
We can also use the regression line to predict a
Y value given any value of X.
66
Obtaining the Regression Equation
67
All straight lines have a common formula:
Y = b(X) + a
b is referred to as the slope. It indicates how
much Y will change for every unit of change in
X. a is referred to as the Y intercept. It
indicates the point at which the line crosses
the Y-axis (i.e., the value of Y when X = 0).
Different lines will have different values for
b and a.
68
To obtain the regression equation, we must
calculate values for the slope (b) and intercept
(a). Here are the formulas:
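The formulas themselves are not in the transcript. The sketch below assumes the standard least-squares formulas, b = SP / SS_X and a = Ȳ − bX̄. It also shows how a prediction is made by substituting an X value into Y = bX + a. The names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for predicting Y from X.

    Assumes the standard formulas b = SP / SS_X and a = mean_y - b * mean_x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a

def predict(b, a, x):
    """Apply the regression equation Y = bX + a to a new X value."""
    return b * x + a

b, a = fit_line([1, 1, 2, 3], [2, 3, 2, 4])
print(f"Y = {b:.3f}X + {a:.3f}")      # Y = 0.636X + 1.636
print(f"{predict(b, a, 2):.2f}")      # 2.91
```

For the pairs (1, 2), (1, 3), (2, 2), (3, 4), b = 7/11 and a = 18/11. Substituting X = 2 therefore predicts Y = 32/11 ≈ 2.91.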
69
Example
What would the regression equation be for
predicting weight from height given the following
information:
71
The complete regression equation would be
You can predict a weight for any height by
substituting that height for X in the regression
equation.
72
The stronger the correlation is, the more
accurate the prediction will be.
73
It's important to remember that the regression
equation for predicting Y from X is not the same
as the regression equation for predicting X from
Y.
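To see the difference, fit both lines to the pairs (1, 4), (1, 5), (2, 2), (4, 1) from the earlier exercise. The sketch assumes the standard least-squares formulas. The function name `slope_and_intercept` is hypothetical.

```python
def slope_and_intercept(predictor, criterion):
    """Least-squares b and a for predicting `criterion` from `predictor`."""
    n = len(predictor)
    mp = sum(predictor) / n
    mc = sum(criterion) / n
    sp = sum((p - mp) * (c - mc) for p, c in zip(predictor, criterion))
    ss_p = sum((p - mp) ** 2 for p in predictor)
    b = sp / ss_p
    return b, mc - b * mp

xs = [1, 1, 2, 4]
ys = [4, 5, 2, 1]
b_yx, a_yx = slope_and_intercept(xs, ys)  # predict Y from X
b_xy, a_xy = slope_and_intercept(ys, xs)  # predict X from Y
print(f"Y = {b_yx:.3f}X + {a_yx:.3f}")    # Y = -1.167X + 5.333
print(f"X = {b_xy:.3f}Y + {a_xy:.3f}")    # X = -0.700Y + 4.100
```

Solving the second equation for Y gives Y ≈ 5.857 − 1.429X, which is a different line from the first. The two regression lines coincide only when r = ±1. Here, the product of the two slopes (49/60 ≈ .82) equals r².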
74
A few remaining points about correlation and
regression
75
Non-linear relationships
A correlation only determines the extent to which
a straight line describes the relationship
between two variables. Sometimes the relationship
isn't linear.
76
In this case, there is a curvilinear
relationship. Since the relationship isn't
linear, r might be close to 0 even though X and
Y are strongly related.
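The sketch below shows r missing a curvilinear relationship. In this small made-up example, Y is completely determined by X (Y = X²), yet r comes out to exactly 0. The values are illustrative only, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # perfectly predictable from X, but not linearly
print(pearson_r(xs, ys))    # 0.0
```

The positive products on one side of the mean of X cancel the negative products on the other side, so SP is 0.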
77
The problem of restricted range
Sometimes the true relationship between two
variables is masked because the range of values
for one or both of the variables has been
restricted.

78
Consider this scatter plot. What is the
relationship between X and Y?
79
It might appear to be weak only because X and Y
vary only over a restricted range. If allowed to
vary over a wider range, a stronger relationship
might emerge.
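The effect of a restricted range can be seen with a small invented data set. The sketch computes r over the full range of X. It then recomputes r using only the cases with X between 3 and 6. The data are hypothetical and chosen only for illustration, and the function name `pearson_r` is likewise hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical scores over the full range of X (1 to 8).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3, 0, 1, 6, 7, 4, 5, 10]

# Keep only the cases whose X falls in a narrow band (3 to 6).
restricted = [(x, y) for x, y in zip(xs, ys) if 3 <= x <= 6]
rx = [x for x, _ in restricted]
ry = [y for _, y in restricted]

print(f"full range:       r = {pearson_r(xs, ys):.2f}")  # 0.75
print(f"restricted range: r = {pearson_r(rx, ry):.2f}")  # 0.49
```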
80
Multiple correlation and regression
Often we are interested in how much of the
variance we can account for in a criterion
(dependent) variable. Typically, we can account
for more variance if we take into account
multiple predictor (independent) variables.
81
For example, we might be able to account for more
of the variance in weight if we considered age
and gender in addition to height. This would
also lead to better prediction.
82
This is the idea behind multiple correlation and
multiple regression.
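With exactly two predictors, the proportion of variance accounted for (R²) can be computed from the three pairwise correlations using the standard formula R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). The sketch below compares R² with the r² from height alone. The height, age, and weight values are invented for illustration, and the function names are hypothetical. With more than two predictors, one would normally use a least-squares solver instead.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def multiple_r_squared(y, x1, x2):
    """R^2 for predicting y from two predictors, via the two-predictor
    formula built from the three pairwise correlations."""
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical data, invented for illustration only.
height = [60, 62, 65, 68, 70, 72]        # inches
age = [20, 35, 25, 40, 30, 45]           # years
weight = [120, 140, 138, 170, 160, 185]  # pounds

r2_height = pearson_r(height, weight) ** 2
r2_both = multiple_r_squared(weight, height, age)
print(f"height alone:   r^2 = {r2_height:.3f}")
print(f"height and age: R^2 = {r2_both:.3f}")
```

Adding a second predictor can never lower the proportion of variance accounted for, so R² is always at least as large as either predictor's r² alone.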