Title: Correlations
1Correlations
- Bernardo Aguilar-Gonzalez
2Describing Relationships
- Positive relationship: high values are paired with high values, low with low.
- Negative relationship: high values are paired with low values, low with high.
- No relationship: no regularity appears between pairs of scores in two distributions.
3Scatterplots
- One variable is measured on the x-axis, the other on the y-axis.
- Positive relationship: a cluster of dots sloping upward from the lower left to the upper right.
- Negative relationship: a cluster of dots sloping down from the upper left to the lower right.
- No relationship: no apparent slope.
4Strength of Relationship
- The more closely the dots approximate a straight line, the stronger the relationship.
- A perfect relationship forms a straight line.
- Dots forming a line reflect a linear relationship.
- Dots forming a curved or bent line reflect a curvilinear relationship.
6Correlation Coefficient
- Pearson's r: a measure of how well a straight line describes the cluster of dots in a plot.
- Ranges from -1 to 1.
- The sign indicates a positive or negative relationship.
- The value of r indicates the strength of the relationship.
- Pearson's r is independent of units of measure.
"1900, I suggested this when studying natural selection, remember?"
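The properties listed on this slide can be checked with a short calculation. The following is a minimal Python sketch, not taken from the slides: the function name `pearson_r` and all data values are hypothetical and chosen only for illustration. It computes r from deviations about the means, then shows that converting units leaves r unchanged and that a perfect curvilinear pattern can still give r = 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: the sum of cross-products of deviations divided by
    the square root of the product of the two sums of squared deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical heights (inches) and weights (pounds) for five people.
height_in = [60, 62, 65, 68, 72]
weight_lb = [110, 125, 130, 155, 170]
r_original = pearson_r(height_in, weight_lb)

# The same people measured in centimetres and kilograms.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.4536 for w in weight_lb]
r_metric = pearson_r(height_cm, weight_kg)

# A perfect curvilinear relationship (y = x squared) that no straight line describes.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [x ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

print(f"r (inches, pounds) = {r_original:.4f}")
print(f"r (cm, kg)         = {r_metric:.4f}")
print(f"r (y = x^2)        = {r_curve:.4f}")
```

The first two printed values agree because rescaling a variable rescales the numerator and denominator by the same factor. The third is zero even though y is completely determined by x, which is why a scatterplot should be inspected before relying on r.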
7Interpreting Pearsons r
- The value of r needed to assert a strong relationship depends on:
  - The size of n.
  - What is being measured.
- Pearson's r is NOT the percent or proportion of a perfect relationship.
- Correlation is not causation.
- Experimentation is used to confirm a suspected causal relationship.
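One place where the size of n matters is the significance test for r. The sketch below is an illustration, not part of the slides: it assumes the standard t test for a correlation (df = n - 2, which itself assumes roughly bivariate normal data), the helper name `t_for_r` is hypothetical, and r = 0.50 is an arbitrary example value. The critical values are the two-tailed .05 entries from a standard t table.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a sample Pearson r differs from 0.
    Uses t = r * sqrt((n - 2) / (1 - r^2)) with df = n - 2."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-tailed .05 critical values of t, keyed by degrees of freedom.
T_CRIT_05 = {8: 2.306, 28: 2.048}

results = {}
for n in (10, 30):
    df = n - 2
    t = t_for_r(0.50, n)
    results[n] = (t, abs(t) > T_CRIT_05[df])
    verdict = "significant" if results[n][1] else "not significant"
    print(f"n = {n:2d}, df = {df:2d}, t = {t:.3f}: {verdict} at .05")
```

The same r of 0.50 falls short of the critical value with 10 pairs but exceeds it with 30, so the sample size has to be reported alongside the coefficient.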
8Other Correlation Coefficients
- Spearman's rho (ρ): based on ranks rather than values.
- Used with ordinal data (qualitative data that can be ordered least to most).
- Point biserial correlation: correlations between quantitative data and two coded categories.
- Cramer's phi: correlation between two ordered qualitative categories.
"In 1904! I'm Spearman, remember?"
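To make "based on ranks rather than values" concrete, here is a minimal Python sketch, assuming the common definition of Spearman's rho as Pearson's r computed on ranks, with tied values given the average of their ranks. The function names and the data are hypothetical and not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because only the ordering matters, rho reaches 1 for any strictly increasing pattern, while r stays below 1 unless the points fall on a straight line.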
9Chapter 14 of the text
- Do the problem on the correlation between the
physical and intellectual effects of exercising
in Ch. 14 of the book
10Procedure and Output
12Example 1
- Do Exercise 1 in the handout
- Hypothesis: membership growth in large-city churches is positively correlated with distance from the central business district.
13Results
14Example 2
- Do Exercise 18 in the handout
- Hypothesis: there is a positive correlation between the rankings of counties by their schools produced by two different consulting agencies.
15Results
16SPSS can produce a scatter plot too
18The GSS
- The GSS (General Social Survey) is an almost
annual, "omnibus," personal
interview survey of U.S. households conducted by
the National Opinion Research Center (NORC) with
James A. Davis, Tom W. Smith, and Peter V.
Marsden as principal investigators (PIs). The
first survey took place in 1972 and since then
more than 38,000 respondents have answered over
3,260 different questions. The special features
of the GSS follow from its unique origin as the
first, perhaps only, social science data set
designed to be analyzed by "users," rather than
the PIs and project staff.
- The mission of the GSS is to make timely,
high-quality, scientifically relevant data
available to the social science research
community.
- Key features of the GSS are its broad coverage,
its use of replication, its cross-national
perspective, and its attention to data quality.
19Example 3
- How does education influence the types of
occupations that people enter? One way to think
about occupations is in terms of occupational
prestige. Load the data set gss00a.sav. Your
data set includes a variable, PRESTG80, in which
a prestige score was assigned to respondents'
occupations, where higher numbers indicate
greater prestige. (To get more information about
how the occupational prestige scale was
constructed, you can go to
http://www.csub.edu/ssric-trd/SPSS/xtras.html) Let's hypothesize that
as education increases, the level of prestige of
one's occupation also increases. To test this
hypothesis, click on "Analyze," "Correlate," and
"Bivariate." The following dialog box will
appear on your screen. Click on EDUC, and then
click the arrow to move it into the box. Do the
same with PRESTG80.
20Variable Code
21- The most widely used bivariate test is the
Pearson correlation. It is intended to be used
when both variables are measured at either the
interval or ratio level, and each variable is
normally distributed. However, sometimes we do
violate these assumptions. If you do a histogram
(see chapter 4) of both EDUC and PRESTG80, you will
notice that neither is actually normally
distributed. Furthermore, if you noted that
PRESTG80 is really an ordinal measure, not an
interval one, you would be correct.
22- Nevertheless, most analysts would use the Pearson
correlation because the variables are close to
being normally distributed, the ordinal variable
has many ranks, and because the Pearson
correlation is the one they are used to. SPSS
includes another correlation test, Spearman's
rho, that is designed to analyze variables that
are not normally distributed, or are ranked, as
is PRESTG80. We will conduct both tests to see
if our hypothesis is supported, and also to see
how much the results differ depending on the test
used; in other words, whether those who use the
Pearson correlation on these types of variables
are seriously off base.
23In the dialog box, the box next to Pearson is
already checked, as this is the default. Click
in the box next to Spearman. Your dialog box
should now look like the following figure. Click
OK to run the tests.
- Your output screen will show two tables: one for
the Pearson correlation, and one for
Spearman's rho. The results of the Pearson
correlation, which are presented as a correlation
matrix, should look like the following one:
24(No Transcript)
25Notice that the Pearson coefficient for the
relationship between education and occupational
prestige is .520, and it is positive. This tells
us that, just as we predicted, as education
increases, occupational prestige increases. But
should we consider the relationship strong? At
.520, the coefficient is only about half as large
as is possible. It should not surprise us,
however, that the relationship is not perfect
(a coefficient of 1). Education appears to be an
important predictor of occupational prestige, but
no doubt you can think of other reasons why
people might enter a particular occupation. For
example, someone with a college degree may decide
that they really want to be a cheese-maker,
which has an occupational prestige score of only
29, while a high-school dropout may one day
become the owner of a bowling alley, which has a
prestige score of 44. Given the variety of
factors that may influence one's occupational
choice, a coefficient of .520 suggests that the
relationship between education and occupational
prestige is actually quite strong.
The correlation matrix also gives, in the row
labeled Sig. (2-tailed), the probability
of being wrong if we assume that the relationship
we find in our sample accurately reflects the
relationship between education and occupational
prestige that exists in the total population from
which the sample was drawn. The probability value is .000
(remember that the value is rounded to three
digits, so it is below .0005 rather than exactly zero),
which is well below the conventional
threshold of p < .05. Thus, our hypothesis is
supported. There is a relationship (the
coefficient is not 0), it is in the predicted
direction (positive), and we can generalize the
results to the population (p < .05).
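A standard companion to r, not mentioned in the slides, is its square, which is read as the proportion of variance in one variable that is linearly accounted for by the other. As a hedged worked example using the coefficient of .520 reported above:

```latex
r^2 = (0.520)^2 = 0.2704 \approx 0.27
```

On this reading, education accounts for roughly 27% of the variance in occupational prestige scores in the sample. This is one reason r itself should not be read as a percentage: a coefficient about half as large as is possible corresponds to about a quarter of the variance, not half.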
26Recall that we had some concerns about using the
Pearson coefficient, given that PRESTG80 is
measured as an ordinal variable. The following
figure shows the results using Spearman's rho.
Notice that the coefficient, .523, is nearly
identical to the coefficient obtained using the
Pearson correlation. What do you conclude?
27Example 4
- The sizes of breeding pairs of penguins were measured to see if there was a correlation between the sizes of the two sexes.
- Calculate both the parametric and the nonparametric correlation between the sizes of the two sexes.
28Data Procedure
29Parametric Result
30Non Parametric
What happened?