Title: Psychology 820
1. Psychology 820
- Correlation
- Regression and Prediction
2. Concept of Correlation
- A coefficient of correlation (r or ρ, rho) is a
statistical summary of the degree and direction
of relationship or association between two
variables (X and Y)
- Degree of Relationship
- In absolute value, correlations range from 0 to 1.00
(from -1.00 to +1.00 when the sign is included)
- Direction of Relationship
- Positive (+) relationship: a high score on X goes
with a high score on Y
- Negative (-) relationship: a high score on X goes
with a low score on Y
3. The Bivariate Normal Distribution
- A family of three-dimensional surfaces
4. Scatterplots
- The chief purpose of the scatterplot is to
study the nature of the relationship between
two variables.
- Components of r
- Pearson Product Moment Correlation
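The components of r can be sketched in code: the sum of cross-products of deviations, divided by the square root of the product of the two sums of squared deviations. This is a minimal sketch; the function name `pearson_r` and the scores are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator parts: sums of squared deviations for X and for Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores (not from the course materials).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 77]   # tends to rise with hours studied
errors_made = [9, 8, 6, 5, 2]       # tends to fall with hours studied

r_positive = pearson_r(hours_studied, exam_score)
r_negative = pearson_r(hours_studied, errors_made)
print(round(r_positive, 3), round(r_negative, 3))
```

The sign of the result gives the direction of the relationship and its absolute value gives the degree.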
5. Additional Measures of Relationships
- Spearman Rank Correlation
- Both X and Y are ranks
- Phi Coefficient
- Both X and Y are dichotomies
- Point-Biserial Coefficient
- One dichotomous variable and one continuous
measure
- Biserial Correlation
- One artificial dichotomy and one continuous
measure
- Tetrachoric Coefficient
- Both X and Y are artificial dichotomies
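Two of these measures are the Pearson formula applied to specially coded data: Spearman's coefficient is Pearson r computed on ranks, and phi is Pearson r computed on two 0/1 variables. The sketch below illustrates both. The helper names and data are hypothetical, and the ranking helper assumes no tied scores.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Hypothetical data: Y always increases with X, but not linearly.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]
r_raw = pearson_r(x, y)                    # Pearson r on the raw scores
rho = pearson_r(ranks(x), ranks(y))        # Spearman: Pearson r on ranks

# Hypothetical dichotomies coded 0/1 (for example, fail/pass on two items).
item_a = [0, 0, 1, 1, 1, 0, 1, 0]
item_b = [0, 0, 1, 1, 0, 0, 1, 1]
phi = pearson_r(item_a, item_b)            # phi: Pearson r on two dichotomies

print(round(r_raw, 3), round(rho, 3), round(phi, 3))
```

The point-biserial coefficient follows the same pattern, with one 0/1 variable and one continuous measure. The biserial and tetrachoric coefficients require additional normality assumptions and are not covered by this sketch.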
6. Linear and Curvilinear Relationships
- Only the degree of linear relationship is
described by r or ρ
- If there is a substantial nonlinear relationship
between two variables, a different correlation
coefficient (such as eta, η) should be used
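A small sketch of why r can miss a curvilinear relationship: in the hypothetical U-shaped data below, r is zero while eta is large. Eta is computed here as the square root of the between-groups sum of squares over the total sum of squares, with one group per X value. The function names and data are assumptions for illustration.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_ratio(xs, ys):
    """Eta: sqrt(SS between X groups / SS total) for Y grouped by X value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    grand_mean = sum(ys) / len(ys)
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return math.sqrt(ss_between / ss_total)

# Hypothetical U-shaped data: two Y scores at each X, centered on X squared.
x = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
y = [4.5, 3.5, 1.5, 0.5, 0.5, -0.5, 1.5, 0.5, 4.5, 3.5]

r = pearson_r(x, y)
eta = correlation_ratio(x, y)
print(round(r, 3), round(eta, 3))
```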
7. Linear Transformations and Correlation
- Any transformation of X or Y that is linear does
not affect the size of the correlation coefficient
- This includes transformations to z-scores and
T-scores, adding or subtracting a constant, and
multiplying or dividing by a nonzero constant
(a negative multiplier or divisor reverses the sign of r)
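The invariance can be checked numerically. The sketch below uses made-up scores and assumes the usual T-score definition (mean 50, standard deviation 10); all names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Hypothetical scores.
x = [2, 4, 5, 7, 9]
y = [1, 3, 2, 6, 8]

r_raw = pearson_r(x, y)
r_z = pearson_r(z_scores(x), y)                        # X as z-scores
r_t = pearson_r([50 + 10 * z for z in z_scores(x)], y) # X as T-scores
r_shift_scale = pearson_r([3 * v + 7 for v in x], y)   # multiply, then add
r_negated = pearson_r([-2 * v for v in x], y)          # negative multiplier

print(round(r_raw, 4), round(r_z, 4), round(r_t, 4),
      round(r_shift_scale, 4), round(r_negated, 4))
```

The first four values agree; the last has the same size with the opposite sign.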
8. Effects of Variability on Correlation
- The variability (heterogeneity) of the sample has
an important influence on r
- Range restriction
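Range restriction can be illustrated by computing r on a full sample and again on only the middle of the X range. The data below are constructed for illustration (a linear trend plus a fixed amount of scatter), and the cutoffs 5 and 14 are arbitrary assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Y follows X, with scatter of +3 or -3 around the trend.
xs = list(range(20))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

# Full, heterogeneous sample.
r_full = pearson_r(xs, ys)

# Restricted sample: keep only cases with X between 5 and 14.
kept = [(x, y) for x, y in zip(xs, ys) if 5 <= x <= 14]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(round(r_full, 3), round(r_restricted, 3))
```

The scatter around the trend is the same in both samples, but the restricted sample has less variability in X, so its r is smaller.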
9. Causation and Correlation
- Correlation must be carefully distinguished from
causation.
- Third Variable Factor
- Effect of Outliers
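The effect of a single outlier can be sketched with a tiny made-up data set: four points with no linear relationship, then the same points plus one extreme case.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: the four corners of a square (no linear relationship).
x = [1, 1, 2, 2]
y = [1, 2, 1, 2]
r_without = pearson_r(x, y)

# The same data with one extreme case added at (10, 10).
r_with = pearson_r(x + [10], y + [10])

print(round(r_without, 3), round(r_with, 3))
```

One case moves r from zero to nearly 1.00, which is one reason to inspect the scatterplot before interpreting r.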
10. Regression and Prediction
- Prediction and correlation are opposite sides of
the same coin
- Regression is usually the statistical method of
choice when the predicted variable is an ordinal,
interval, or ratio scale.
- Simple linear regression (1 IV, 1 DV) extends to
multiple regression (more than 1 IV)
11. The Regression Effect
- The sons of tall fathers tend to be taller than
average, but shorter than their fathers.
- The sons of short fathers tend to be shorter than
average, but taller than their fathers.
- Regression to the Mean
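In standard-score form, the predicted z-score on Y equals r times the z-score on X, so whenever r is less than 1.00 the prediction lies closer to the mean than the predictor does. The sketch below assumes, purely for illustration, r = 0.5 and the same mean (68 inches) and standard deviation (3 inches) for fathers and sons; none of these values come from the course materials.

```python
def predict_son_height(father_height, r=0.5, mean=68.0, sd=3.0):
    """Predict a son's height from his father's height.

    Assumes (hypothetically) equal means and SDs for fathers and sons.
    """
    z_father = (father_height - mean) / sd
    z_son = r * z_father          # predicted z is pulled toward the mean
    return mean + z_son * sd

tall_father = 74.0    # 2 SDs above the assumed mean
short_father = 62.0   # 2 SDs below the assumed mean

son_of_tall = predict_son_height(tall_father)
son_of_short = predict_son_height(short_father)
print(son_of_tall, son_of_short)
```

Under these assumptions the son of the tall father is predicted to be above average but shorter than his father, and the son of the short father below average but taller than his father.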
12. Regression Equation
- Y = bX + c (the equation of a straight line)
- Line of best fit
- Line of least-squares
- Prediction equation
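A minimal sketch of finding the least-squares line: the slope b is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept c makes the line pass through the point of means. The function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Return slope b and intercept c of the least-squares line Y = bX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    c = mean_y - b * mean_x   # the line passes through (mean_x, mean_y)
    return b, c

# Hypothetical scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, c = fit_line(x, y)
predicted_at_6 = b * 6 + c   # using the line as a prediction equation
print(round(b, 2), round(c, 2), round(predicted_at_6, 2))
```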
13. Proportion of Variance Interpretation of
Correlation
- The coefficient of determination (r²) is the
proportion of variance in Y that can be accounted
for by knowing X and, conversely, the proportion
of variance in X that can be accounted for by
knowing Y.
- The coefficient of nondetermination (k² = 1 - r²) is the
proportion of variance not accounted for
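The variance interpretation can be checked numerically: for a least-squares line, the squared correlation equals 1 minus the ratio of the residual sum of squares to the total sum of squares of Y. The sketch below uses made-up scores and illustrative variable names.

```python
import math

# Hypothetical scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)   # total sum of squares of Y

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2                        # coefficient of determination
k_squared = 1 - r_squared                 # coefficient of nondetermination

# Residual sum of squares around the least-squares line.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))

print(round(r_squared, 3), round(k_squared, 3), round(ss_residual / syy, 3))
```

The last two printed values agree: the proportion of variance not accounted for is the residual variation as a share of the total.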
14. Homoscedasticity
- In a bivariate normal distribution the variance
of scores on Y is the same for all values of
X. This property (equal variance of Y scores for
each value of X) is known as homoscedasticity.
- This assumption means that the variance around
the regression line is the same for all values of
the predictor variable (X). The plot on the right
shows a violation of this assumption. For the
lower values on the X-axis, the points are all
very near the regression line. For the higher
values on the X-axis, there is much more
variability around the regression line.
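One rough way to look for this kind of violation is to compare the spread of the residuals for low and high values of X. The sketch below is an informal check on constructed data in which the scatter grows with X; it is not a formal test, and all names and values are assumptions.

```python
def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_square(values):
    """Average squared value, used here as a rough measure of spread."""
    return sum(v * v for v in values) / len(values)

# Hypothetical data: scatter around the trend is proportional to X.
xs = list(range(1, 21))
ys = [2 * x + (0.2 * x if x % 2 == 0 else -0.2 * x) for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Split the residuals at the middle of the X range.
spread_low = mean_square([e for x, e in zip(xs, residuals) if x <= 10])
spread_high = mean_square([e for x, e in zip(xs, residuals) if x > 10])

print(round(spread_low, 2), round(spread_high, 2))
```

A much larger spread in the upper half than in the lower half suggests the equal-variance assumption does not hold.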
15. Part Correlation
- A part correlation is, for example, the correlation
of X1 (IQ) with X2 (achievement posttest) after the
portion of the posttest that can be predicted from a
third variable, X3 (the pretest), has been removed.
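A sketch of the part correlation computed two ways: by correlating X1 with the residuals of X2 after predicting X2 from X3, and by the standard formula (r12 - r13·r23) / sqrt(1 - r23²). Treating the pretest as X3 is an assumption here, and the scores are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, xs):
    """Residuals of Y after least-squares prediction from X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical scores for six students.
iq = [95, 100, 105, 110, 115, 120]        # X1
posttest = [60, 66, 63, 75, 72, 84]       # X2
pretest = [55, 58, 62, 66, 70, 75]        # X3 (assumed label)

# Method 1: remove the pretest-predictable portion from the posttest only.
part_by_residuals = pearson_r(iq, residuals(posttest, pretest))

# Method 2: formula based on the three simple correlations.
r12 = pearson_r(iq, posttest)
r13 = pearson_r(iq, pretest)
r23 = pearson_r(posttest, pretest)
part_by_formula = (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)

print(round(part_by_residuals, 4), round(part_by_formula, 4))
```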
16. Partial Correlation
- Simple extension of part correlation
- The correlation of X1 and X2 with X3 held
constant, removed, or partialed out is a partial
correlation.
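The first-order partial correlation can be computed from the three simple correlations with the standard formula r12.3 = (r12 - r13·r23) / sqrt((1 - r13²)(1 - r23²)). The correlation values below are hypothetical.

```python
import math

def partial_r(r12, r13, r23):
    """Correlation of X1 and X2 with X3 partialed out of both."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return numerator / denominator

# Hypothetical simple correlations among X1, X2, and X3.
r12, r13, r23 = 0.6, 0.5, 0.7

r12_3 = partial_r(r12, r13, r23)
print(round(r12_3, 3))
```

The only difference from the part correlation is the denominator: here X3 is removed from both X1 and X2, not from X2 alone.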
17. Multiple Regression
- Multiple regression is the statistical method
most commonly employed for predicting Y from two
or more independent variables.
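For the two-predictor case, the least-squares weights can be written in closed form from sums of squares and cross-products. The sketch below is limited to two independent variables and uses constructed data in which Y is exactly 1 + 2·X1 + 3·X2, so the recovered weights are easy to check. The function name is hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares intercept and weights for Y = a + b1*X1 + b2*X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # Solve the two normal equations (fails if X1 and X2 are collinear).
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    intercept = my - b1 * m1 - b2 * m2
    return intercept, b1, b2

# Constructed data: Y = 1 + 2*X1 + 3*X2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

intercept, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(intercept, 3), round(b1, 3), round(b2, 3))
```

With more than two predictors the same idea is normally handled with matrix methods in a statistics package.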
18. Multiple Correlation
- The correlation between Y and predicted Y (Ŷ) when the
prediction is based on two or more independent
variables is termed multiple correlation.
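With two independent variables and a least-squares prediction equation, the multiple correlation can be obtained from the simple correlations: R² = (ry1² + ry2² - 2·ry1·ry2·r12) / (1 - r12²). The sketch below uses hypothetical correlation values and the conventional symbol R.

```python
import math

def multiple_r(ry1, ry2, r12):
    """Multiple correlation of Y with two predictors, X1 and X2."""
    r_squared = (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)

# Hypothetical correlations: Y with X1, Y with X2, and X1 with X2.
ry1, ry2, r12 = 0.5, 0.4, 0.3

R = multiple_r(ry1, ry2, r12)
print(round(R, 3))
```

In this example R is larger than either simple correlation with Y, because the second predictor adds information the first does not carry.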