Scatterplots and Correlation - PowerPoint PPT Presentation

About This Presentation
Title:

Scatterplots and Correlation

Slides: 27
Provided by: jamesmaysm2
Learn more at: https://www.sjsu.edu
Transcript and Presenter's Notes

Title: Scatterplots and Correlation


1
Chapter 4
  • Scatterplots and Correlation

2
Variable (X) and Variable (Y)
  • Prior chapters → one variable at a time
  • This chapter → relationship between two variables
  • One variable is the outcome, or response variable
    (Y)
  • The other variable is the predictor, or explanatory
    variable (X)
  • Are X and Y related? X → Y?

3
Question
A study investigates whether there is a
relationship between gross domestic product and
life expectancy. Which is the explanatory
variable (X)? Which is the response variable
(Y)? All other variables that may influence life
expectancy are lurking variables and may confound
the relation between X and Y. Are there lurking
variables in this analysis?
4
Scatterplot
  • This chapter considers the case in which both X
    and Y are quantitative variables
  • Bivariate data points (xi, yi) are plotted on
    graph paper to form a scatterplot

5
Example of a scatterplot
  • X = percent of students taking SAT
  • Y = mean SAT verbal score
  • What is the relationship between X and Y?

6
Interpreting scatterplots
  • Form: Can the data be described by a straight
    line (linearity)?
  • Direction: Does the line slope upward or downward?
  • Positive association: above-average values of Y
    accompany above-average values of X (and vice
    versa)
  • Negative association: above-average values of Y
    accompany below-average values of X (and vice
    versa)
  • Strength: Do the data points adhere to an
    imaginary line?
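As a rough illustration of the positive/negative association definitions, the sketch below counts how many points have X and Y on the same side of their means and how many have them on opposite sides. It uses the per capita GDP and life expectancy values tabulated on slide 15 of this presentation; the function name `count_sides` is a hypothetical helper, not something from the slides.

```python
from statistics import mean

# Per capita GDP (X) and life expectancy (Y), copied from the slide 15 table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def count_sides(xs, ys):
    """Count points whose X and Y lie on the same / opposite side of their means.

    Hypothetical helper for illustration only.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:      # both above average or both below average
            same += 1
        elif product < 0:    # one above average, the other below
            opposite += 1
    return same, opposite


same, opposite = count_sides(gdp, life)
print(same, opposite)  # prints "8 2"
```

Most points fall on the same side of both means, which is what a positive association looks like in the data.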

7
Form: discuss
8
Strength and direction
  • Direction: positive, negative, or flat
  • Strength: How closely does a non-horizontal
    straight line fit the points of a scatterplot?
  • Close fitting → strong
  • Loose fitting → weak

9
Strength cannot be reliably judged visually
  • These two scatterplots are of the same data (they
    have the exact same correlation)
  • The second scatterplot appears to show a stronger
    correlation, but this is an artifact of the axis
    scaling
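One way to see why axis scaling cannot change the correlation: r is unaffected when either variable is multiplied by a positive constant or shifted. The sketch below checks this on made-up numbers (not data from the slides). The helper `pearson_r` is a hypothetical name and uses the sum-of-deviations form of the correlation coefficient.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations (illustrative helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up values, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.5, 3.5, 3.0, 5.0, 4.5]

r_original = pearson_r(x, y)

# Stretch the x axis by a factor of 100; compress and shift the y axis.
x_rescaled = [100 * v for v in x]
y_rescaled = [0.1 * v + 50 for v in y]
r_rescaled = pearson_r(x_rescaled, y_rescaled)

# The two values agree up to floating-point rounding.
print(r_original, r_rescaled)
```

The plot changes shape when the axes are rescaled, but the computed r does not, so the number is a safer guide to strength than the picture.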

10
Correlation coefficient (r)
  • Let r denote the correlation coefficient
  • r is always between -1 and 1, inclusive
  • Sign of r denotes direction of association
  • Special values for r
  • r = 1 → all points on upward-sloping line
  • r = -1 → all points on downward-sloping line
  • r = 0 → no line or horizontal line
  • The closer r is to -1 or 1, the better the fit
    of points to the line
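The special values of r can be checked numerically. This is a minimal sketch on made-up points; `pearson_r` is a hypothetical helper, and the three data sets are invented purely to illustrate an exact upward line, an exact downward line, and a scattered pattern.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations (illustrative helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
upward = [2 * x + 1 for x in xs]     # every point on an upward-sloping line
downward = [10 - 3 * x for x in xs]  # every point on a downward-sloping line
scattered = [3, 1, 4, 1, 5]          # made-up values with no exact line

r_up = pearson_r(xs, upward)
r_down = pearson_r(xs, downward)
r_scattered = pearson_r(xs, scattered)

print(r_up, r_down, r_scattered)
```

The first two results are 1 and -1 up to floating-point rounding, and the scattered set gives a value strictly between them.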

11
Examples of Correlations
  • Husbands' versus wives' ages
  • r = 0.94
  • Husbands' versus wives' heights
  • r = 0.36
  • Professional golfers' putting success: distance
    of putt in feet versus percent success
  • r = -0.94

12
Correlation Coefficient r
  • Data on variables X and Y for n individuals
  • x1, x2, …, xn and y1, y2, …, yn
  • Each variable has a mean and standard deviation

13
Correlation coefficient r
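The formula for r can be understood by converting
data points to standardized scores:
$$r = \frac{1}{n-1} \sum_{i=1}^{n} z_x z_y$$
where
$$z_x = \frac{x_i - \bar{x}}{s_x}, \qquad z_y = \frac{y_i - \bar{y}}{s_y}$$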
14
Illustrative example (gdp_life.sav)
Per Capita Gross Domestic Product and Average
Life Expectancy for Countries in Western Europe.
Does GDP predict life expectancy?
15
Illustrative example (gdp_life.sav)
Country          Per Capita GDP (X)   Life Expectancy (Y)
Austria                21.4                 77.48
Belgium                23.2                 77.53
Finland                20.0                 77.32
France                 22.7                 78.63
Germany                20.8                 77.17
Ireland                18.6                 76.39
Italy                  21.5                 78.51
Netherlands            22.0                 78.15
Switzerland            23.8                 78.99
United Kingdom         21.2                 77.37
16
Illustrative example (gdp_life.sav): scatterplot
17
Illustrative example (gdp_life.sav)
   x        y       z_x      z_y    z_x·z_y
 21.4     77.48   -0.078   -0.345    0.027
 23.2     77.53    1.097   -0.282   -0.309
 20.0     77.32   -0.992   -0.546    0.542
 22.7     78.63    0.770    1.102    0.849
 20.8     77.17   -0.470   -0.735    0.345
 18.6     76.39   -1.906   -1.716    3.271
 21.5     78.51   -0.013    0.951   -0.012
 22.0     78.15    0.313    0.498    0.156
 23.8     78.99    1.489    1.555    2.315
 21.2     77.37   -0.209   -0.483    0.101
Means: x̄ = 21.52, ȳ = 77.754
Standard deviations: sx = 1.532, sy = 0.795
Sum of z_x·z_y = 7.285
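The table can be reproduced with a few lines of Python. This is a sketch under one assumption: that r is the sum of the products of standardized scores divided by n − 1, which is consistent with the sum of 7.285 above and the r = 0.809 reported later in the slides. The variable names are illustrative, and `statistics.stdev` is used because it computes the sample standard deviation (n − 1 denominator).

```python
from statistics import mean, stdev

# Per capita GDP (X) and life expectancy (Y) from the slide 15 table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)  # sample standard deviations

# Standardized scores and their products, one per country.
z_x = [(x - x_bar) / s_x for x in gdp]
z_y = [(y - y_bar) / s_y for y in life]
products = [zx * zy for zx, zy in zip(z_x, z_y)]

total = sum(products)
r = total / (n - 1)  # assumed form of the formula, see the note above

for x, y, zx, zy, p in zip(gdp, life, z_x, z_y, products):
    print(f"{x:5.1f} {y:6.2f} {zx:7.3f} {zy:7.3f} {p:7.3f}")
print(f"means: {x_bar:.2f}, {y_bar:.3f}")
print(f"std devs: {s_x:.3f}, {s_y:.3f}")
print(f"sum = {total:.3f}, r = {r:.3f}")  # sum = 7.285, r = 0.809
```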
18
Illustrative example (gdp_life.sav)
19
Interpretation of r
Direction of association: positive or negative.
Strength of association: the closer |r| is to 1,
the stronger the correlation. Here are guidelines:
0.0 ≤ |r| < 0.3 → weak correlation
0.3 ≤ |r| < 0.7 → moderate correlation
0.7 ≤ |r| < 1.0 → strong correlation
|r| = 1.0 → perfect correlation
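These guidelines translate directly into a small lookup. The sketch below applies the cutoffs to the absolute value of r, which is an assumption made here so that negative correlations are graded the same way as positive ones. The function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Label a correlation coefficient using the guideline cutoffs.

    Hypothetical helper; cutoffs are applied to |r| (an assumption).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1, inclusive")

    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "flat"
    return f"{strength} {direction}"


# Correlations quoted elsewhere in this presentation.
print(describe_r(0.809))  # strong positive
print(describe_r(0.36))   # moderate positive
print(describe_r(-0.94))  # strong negative
```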
20
Interpretation of r
For the GDP / life expectancy example, r = 0.809.
This indicates a strong positive correlation.
21
Problems with Correlations
  • Not all relations are linear
  • Outliers can have large influence on r
  • Lurking variables confound relations

22
Not all Relationships are Linear: Miles per
Gallon versus Speed
  • r ≈ 0 (flat line)
  • But there is a non-linear relation

23
Not all Relationships are Linear: Miles per
Gallon versus Speed
  • Curved relationship.
  • r was misleading.
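A small numerical sketch shows how a clear curved relationship can still give r near 0. The speeds and fuel economy figures below are hypothetical, chosen to rise and then fall symmetrically; they are not the data plotted on the slide. `pearson_r` is an illustrative helper.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations (illustrative helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values, NOT the slide's data: fuel economy peaks at a
# middle speed and drops off symmetrically on either side.
speed = [20.0, 30.0, 40.0, 50.0, 60.0]  # miles per hour
mpg = [24.0, 28.0, 30.0, 28.0, 24.0]    # miles per gallon

r = pearson_r(speed, mpg)
print(abs(r) < 1e-9)  # True: r is zero up to floating-point rounding
```

Speed clearly influences fuel economy in these numbers, yet the upward and downward halves cancel in the calculation, so r alone would suggest no relationship.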

24
Outliers and Correlation
The outlier in the above graph decreases r. If we
remove the outlier → strong relation.
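The effect of a single outlier on r can be sketched with made-up numbers. The points below are hypothetical (not the data in the slide's graph): eight points close to a straight line plus one point far below it. `pearson_r` is an illustrative helper.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations (illustrative helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: eight points close to the line y = 2x ...
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]

# ... plus one made-up outlier far below that line.
x_with = x + [9.0]
y_with = y + [2.0]

r_without = pearson_r(x, y)
r_with = pearson_r(x_with, y_with)

print(f"with outlier:    r = {r_with:.2f}")
print(f"without outlier: r = {r_without:.2f}")
```

One stray point is enough to pull a near-perfect correlation down to a moderate one in this example, which is why outliers should be inspected before r is interpreted.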
25
Exercise 4.15: Calories and sodium content of hot
dogs
  1. What are the lowest and highest calorie counts?
    What are the lowest and highest sodium levels?
  2. Positive or negative association?
  3. Any outliers? If we ignore the outlier, is the
    relation still linear? Does the correlation become
    stronger?

26
Exercise 4.13: IQ and school grades
  1. Positive or negative association?
  2. Is the form linear? Does it appear strong?
  3. What are the IQ and GPA for the outlier at the
    bottom?