Title: Describing Relationships: Scatterplots and Correlation
Chapter 14
- Describing Relationships: Scatterplots and Correlation
Thought Question 1
From a scatterplot of college students, there is
a positive correlation between verbal SAT score
and GPA. For used cars, there is a negative
correlation between the age of the car and the
selling price. Explain what it means for two
variables to have a positive correlation or a
negative correlation.
Thought Question 2
Do you think each of the following pairs of
variables would have a positive correlation, a
negative correlation or no correlation?
- Calories eaten per day and weight
- Calories eaten per day and IQ
- Amount of alcohol consumed and accuracy on a
manual dexterity test
- Number of ministers and number of liquor stores
in cities in Pennsylvania
- Height of husband and height of wife
Thought Question 3
Use the following two scatterplots to speculate
on what influences outliers have on correlation.
For each scatterplot, do you think the
correlation is higher or lower than it would be
without the outlier?
Statistical versus Deterministic Relationships
- Distance versus Speed (when travel time is
constant).
- Income (in millions of dollars) versus total
assets of banks (in billions of dollars).
Distance versus Speed
- Distance = Speed × Time
- Suppose time = 1.5 hours
- Speed varies from 10 mph to 50 mph
- Deterministic relationship
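A minimal sketch of the deterministic case, using only the numbers on the slide (time fixed at 1.5 hours, speed from 10 to 50 mph). The function name `distance_miles` is an illustrative choice. Once speed is known, distance is known exactly, with no scatter.

```python
# Travel time is held fixed, as on the slide.
TIME_HOURS = 1.5


def distance_miles(speed_mph, time_hours=TIME_HOURS):
    """Distance = Speed x Time: the same speed always gives the same distance."""
    return speed_mph * time_hours


speeds = [10, 20, 30, 40, 50]
distances = [distance_miles(s) for s in speeds]

for s, d in zip(speeds, distances):
    print(f"{s} mph for {TIME_HOURS} h -> {d} miles")
```

Plotted, these points would fall exactly on a straight line.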
Income versus Assets
- Income = a + b × Assets
- Assets vary from 3.4 billion to 49 billion dollars
- Income varies from bank to bank, even among those
with similar assets
- Statistical relationship
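A rough simulation of what "statistical relationship" means here. The coefficients `A` and `B` and the spread `NOISE_SD` are hypothetical values picked only for illustration; the slides do not give the fitted line. Two banks with identical assets still end up with different incomes.

```python
import random

# Hypothetical values, chosen only for illustration (not from the slides).
A = 5.0          # intercept, in millions of dollars
B = 7.0          # millions of dollars of income per billion dollars of assets
NOISE_SD = 20.0  # assumed bank-to-bank variation, in millions of dollars


def simulated_income(assets_billions, rng):
    """Income = a + b * Assets, plus random bank-to-bank variation."""
    return A + B * assets_billions + rng.gauss(0.0, NOISE_SD)


rng = random.Random(0)  # fixed seed so the run is repeatable
same_assets = 10.0      # two banks with identical assets (billions of dollars)
income_1 = simulated_income(same_assets, rng)
income_2 = simulated_income(same_assets, rng)

print(f"Bank 1: {income_1:.1f} million, Bank 2: {income_2:.1f} million")
```

The line a + b × Assets describes only the average trend. Individual banks scatter around it.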
Strength and Statistical Significance
- A strong relationship seen in the sample may
indicate a strong relationship in the population.
- The sample may exhibit a strong relationship
simply by chance, even though the relationship in the
population is not strong or is zero.
- The observed relationship is considered to be
statistically significant if it is stronger than
a large proportion of the relationships we could
expect to see just by chance.
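One way to make "stronger than a large proportion of the relationships we could expect to see just by chance" concrete is a permutation test. This is a sketch of that technique, not necessarily the method the chapter uses, and the data are made up. The y values are shuffled many times to break any real link with x, and we count how often a shuffled correlation is at least as strong as the observed one.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Share of shuffled data sets whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_strong += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_strong + 1) / (n_perm + 1)


# Made-up, perfectly linear data for illustration.
xs = list(range(1, 11))
ys = [2 * x for x in xs]
p = permutation_p_value(xs, ys)
print(f"observed r = {pearson_r(xs, ys):.2f}, permutation p-value = {p:.4f}")
```

A small p-value means that chance alone rarely produces a relationship this strong, which is what the slide calls statistically significant.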
Warnings about Statistical Significance
- Statistical significance does not imply that the
relationship is a strong one or even one of
practical importance.
- Even weak relationships may be labeled
statistically significant if the sample size is
very large.
- Even very strong relationships may not be labeled
statistically significant if the sample size is
very small.
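The effect of sample size can be illustrated with the standard t statistic for testing whether a correlation is zero, t = r·sqrt(n − 2)/sqrt(1 − r²), with n − 2 degrees of freedom. The slides do not show this formula. It is brought in here as an assumption, and it relies on the usual conditions for the test (roughly bivariate normal data). The values of r and n below are hypothetical.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical cases: a weak correlation in a huge sample,
# and a strong correlation in a tiny sample.
weak_large = t_statistic(0.10, 10_000)
strong_small = t_statistic(0.80, 5)

# Two-sided 5% critical values from a t table:
# about 1.96 for very large df, 3.182 for df = 3.
print(f"r = 0.10, n = 10000: t = {weak_large:.2f} (critical value about 1.96)")
print(f"r = 0.80, n = 5:     t = {strong_small:.2f} (critical value 3.182)")
```

Under these assumptions the weak correlation clears its threshold easily, while the strong one does not.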
Linear Relationship
- Some relationships are such that the points of a
scatterplot tend to fall along a straight line:
a linear relationship
Examples of Relationships
Measuring Strength and Direction of a Linear
Relationship
- How closely does a non-horizontal straight line
fit the points of a scatterplot?
- The correlation coefficient (often referred to as
just correlation): r
  - measure of the strength of the relationship:
  the stronger the relationship, the larger the
  magnitude of r.
  - measure of the direction of the relationship:
  positive r indicates a positive relationship,
  negative r indicates a negative relationship.
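The slides do not state how r is computed. For reference, the usual Pearson product-moment definition, for n pairs (x_i, y_i) with sample means x̄ and ȳ, can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when x and y tend to lie on the same side of their means together, which gives the sign of r. The denominator rescales the result so that its magnitude never exceeds 1.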
Correlation Coefficient
- special values for r
  - a perfect positive linear relationship would have
  r = 1
  - a perfect negative linear relationship would have
  r = -1
  - if there is no linear relationship, or if the
  scatterplot points are best fit by a horizontal
  line, then r = 0
- Note: r must be between -1 and 1, inclusive
- r > 0: as one variable changes, the other
variable tends to change in the same direction
- r < 0: as one variable changes, the other
variable tends to change in the opposite direction
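The special values can be checked on small made-up data sets. The sketch below computes r from the usual Pearson formula, which the slides do not give, so treat it as an assumption. It uses three hypothetical sets of y values: one rising exactly with x, one falling exactly, and one with no overall trend.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative line
r_none = pearson_r(x, [1, 2, 1, 2, 1])   # zigzag with no overall trend

print(f"perfect positive: r = {r_pos:.2f}")
print(f"perfect negative: r = {r_neg:.2f}")
print(f"no linear trend:  r = {r_none:.2f}")
```

If every y value were identical (an exactly horizontal line), this formula would divide by zero. The r = 0 case on the slide refers to scattered points whose best-fitting line is flat.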
Examples of Correlations
- Husbands' versus wives' ages
  - r = .94
- Husbands' versus wives' heights
  - r = .36
- Professional golfers' putting success: distance
of putt in feet versus percent success
  - r = -.94
Not all Relationships are Linear: Miles per
Gallon versus Speed
- Linear relationship? MPG = a + b × Speed
- Speed varies from 20 mph to 60 mph
- MPG varies from trial to trial, even at the same
speed
- Statistical relationship
Not all Relationships are Linear: Miles per
Gallon versus Speed
- Curved relationship (r is misleading)
- Speed varies from 20 mph to 60 mph
- MPG varies from trial to trial, even at the same
speed
- Statistical relationship
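A small sketch of why r can mislead for a curved relationship. The MPG values below are hypothetical, not the chapter's data. They rise to a peak at 40 mph and fall symmetrically, so MPG clearly depends on speed, yet the correlation coefficient is essentially zero because r measures only the straight-line part of a relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, illustrative values: an inverted U peaking at 40 mph.
speed_mph = [20, 30, 40, 50, 60]
mpg = [24, 29, 30, 29, 24]

r_curved = pearson_r(speed_mph, mpg)
print(f"r = {r_curved:.2f} even though MPG clearly depends on speed")
```

Looking at the scatterplot before computing r is the simplest safeguard against this.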
Problems with Correlations
- Outliers can inflate or deflate correlations
- Groups combined inappropriately may mask
relationships (a third variable)
  - groups may have different relationships when
  separated
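Both problems can be demonstrated on tiny made-up data sets. All values below are hypothetical and chosen only to make the effects easy to see. One outlier turns r from 0 into a value near 1, another pulls a perfect r = 1 down close to 0, and two groups that each have r = 1 show a negative correlation when pooled.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Outlier that inflates: no trend, then one extreme point is added.
x_flat, y_flat = [1, 2, 3, 4, 5], [2, 1, 3, 1, 2]
r_without = pearson_r(x_flat, y_flat)
r_inflated = pearson_r(x_flat + [20], y_flat + [20])

# Outlier that deflates: a perfect line, then one point far off the line.
x_line, y_line = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
r_line = pearson_r(x_line, y_line)
r_deflated = pearson_r(x_line + [6], y_line + [0])

# Groups combined inappropriately: each group is perfectly positive,
# but the pooled data show a negative correlation.
xa, ya = [1, 2, 3], [5, 6, 7]
xb, yb = [5, 6, 7], [1, 2, 3]
r_a, r_b = pearson_r(xa, ya), pearson_r(xb, yb)
r_combined = pearson_r(xa + xb, ya + yb)

print(f"no outlier: r = {r_without:.2f}; with outlier: r = {r_inflated:.2f}")
print(f"perfect line: r = {r_line:.2f}; with outlier: r = {r_deflated:.2f}")
print(f"group A: r = {r_a:.2f}, group B: r = {r_b:.2f}, combined: r = {r_combined:.2f}")
```

In the last case, group membership plays the role of the third variable: ignoring it reverses the apparent direction of the relationship.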
Key Concepts
- Statistical vs. Deterministic Relationships
- Statistically Significant Relationship
- Strength of Linear Relationship
- Direction of Linear Relationship
- Correlation Coefficient
- Problems with Correlations