Chapters 8 and 9: Correlations Between Data Sets

Slides: 39
Provided by: Evan167
Learn more at: http://www.math.unt.edu

1
Chapters 8 and 9 Correlations Between Data Sets
  • Math 1680

2
Overview
  • Scatter Plots
  • Associations
  • The Correlation Coefficient
  • Sketching Scatter Plots
  • Changes of Scale
  • Summary

3
Scatter Plots
  • Often, we are interested in comparing two related
    data sets
  • Heights and weights of students
  • SAT scores and freshman GPA
  • Age and fuel efficiency of vehicles
  • We can draw a scatter plot of the data set
  • Plot paired data points on a Cartesian plane

4
Scatter Plots
  • Scatter plot for the heights of 1,078 fathers and
    their adult sons
  • From HANES study

5
Scatter Plots
  • What does the dashed diagonal line represent?
  • Find the point representing a 5'3¼" father who
    has a 5'6½" son

6
Scatter Plots
  • What does the vertical dashed column represent?
  • Consider the families where the father was 72"
    tall, to the nearest inch
  • How tall was the tallest son?
  • Shortest?

7
Scatter Plots
  • Was the average height of the fathers around 64",
    68" or 72"?
  • Was the SD of the fathers' heights around 3", 6"
    or 9"?

8
Scatter Plots
  • The points form a swarm that is more or less
    football-shaped
  • This indicates that there is a linear association
    between the fathers' heights and the sons' heights

9
Scatter Plots
  • Short fathers tend to have short sons, and tall
    fathers tend to have tall sons
  • We say there is a positive association between
    the heights of fathers and sons
  • What would it mean for there to be a negative
    association between the heights?

10
Scatter Plots
  • Does knowing the father's height give a precise
    prediction of his son's height?
  • Does knowing the father's height let you better
    predict his son's height?

11
Scatter Plots
  • We will generally assume the scatter plots are
    football-shaped
  • Association is linear in nature
  • Each data set is approximately normal

12
Scatter Plots
  • Key features of scatter plots
  • Given two data sets X and Y,
  • The point of averages is the point (µX, µY)
  • The average of a data set is denoted by µ (Greek
    mu, for mean)
  • The subscript indicates which set is being
    referenced
  • It will be in the center of the cloud
  • Due to the normal approximation, the vast
    majority (95%) of the cloud should fall within 2
    SDs of the average, above or below, for both
    X and Y
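
The two key features above can be computed directly. The following is a minimal sketch, not part of the original slides: the function names `point_of_averages` and `two_sd_window` are hypothetical, the data values are illustrative, and the SD is assumed to be the population SD (dividing by n).

```python
from statistics import mean, pstdev

def point_of_averages(xs, ys):
    """Return (average of X, average of Y), the center of the cloud."""
    return mean(xs), mean(ys)

def two_sd_window(values):
    """Return the interval from 2 SDs below to 2 SDs above the average.

    Assumption: SD means the population SD (divide by n), not n - 1.
    """
    mu, sd = mean(values), pstdev(values)
    return mu - 2 * sd, mu + 2 * sd

# Illustrative paired data (an assumption for this sketch)
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

print("point of averages:", point_of_averages(xs, ys))
print("2-SD window for X:", two_sd_window(xs))
print("2-SD window for Y:", two_sd_window(ys))
```

For a football-shaped cloud, most points should land inside the rectangle formed by these two windows.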

13
Scatter Plots
14
Associations
  • When given a value in one data set, we often want
    to make a prediction for the other data set
  • We call our given value the independent variable
  • We call the value we are trying to predict the
    dependent variable

15
Associations
  • If there is indeed a relationship between the two
    data sets, we can say various things about their
    association
  • Strong: Knowing X helps you a lot in predicting
    Y, and vice versa
  • Weak: Knowing X doesn't really help you predict
    Y, and vice versa
  • Positive: X and Y tend to rise together
  • The higher in one you look, the higher in the
    other you should be
  • Negative: As one of X and Y rises, the other
    tends to fall
  • The higher in one you look, the lower in the
    other you should be

16
Associations
  • Positive associations
  • Study time/final grade
  • Height/weight
  • SAT score/GPA
  • Clouds in sky/chance of rain
  • Bowling practice/bowling score
  • Age of husband/age of wife
  • Negative associations
  • Age of car/fuel efficiency
  • Golfing practice/golf score
  • Dental hygiene/cavities formed
  • Pollution/air quality
  • Speed/mile time

17
Associations
  • What kind of association is this?

18
Associations
  • What kind of association is this?

19
Associations
  • Remember that even a very strong association does
    not necessarily imply a causal relationship
  • There may be a confounding influence at play

20
The Correlation Coefficient
  • While strong/weak and positive/negative give a
    sense of the association, we want a way to
    quantify the strength and direction of the
    association
  • The correlation coefficient (r) is the statistic
    which accomplishes this

21
The Correlation Coefficient
  • The correlation coefficient is always between -1
    and 1
  • A positive r means that there is a positive
    association between the sets
  • A negative r means that there is a negative
    association between the sets
  • If r is close to 0, then there is only a weak
    association between the sets
  • If r is close to -1 or 1, then there is a strong
    association between the sets

22
The Correlation Coefficient
  • The following plots have the same averages and
    SDs, with 50 points in each
  • The only difference between them is the
    correlation coefficient
  • Note how the points fall into a line as r
    approaches -1 or 1

24
The Correlation Coefficient
  • To calculate r
  • Find the average and SD of each data set
  • Multiply the data sets pairwise and find the
    average
  • The correlation is the average of the product
    minus the product of the averages, all divided by
    the product of the SDs
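
Written as a formula, the recipe above reads roughly as follows. The notation avg(XY), SD_X and SD_Y is an assumption of this note rather than the slides' own, and the SDs are assumed to be population SDs (dividing by n).

```latex
r = \frac{\operatorname{avg}(XY) - \mu_X \, \mu_Y}{\mathrm{SD}_X \cdot \mathrm{SD}_Y},
\qquad
\operatorname{avg}(XY) = \frac{1}{n} \sum_{i=1}^{n} x_i \, y_i
```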

25
The Correlation Coefficient
X Y
1 5
3 9
4 7
5 1
7 13
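
As a worked check, the sketch below applies the slide's recipe (average of the products, minus the product of the averages, divided by the product of the SDs) to the five pairs in the table above. The function name `correlation` is hypothetical, and the SDs are assumed to be population SDs (dividing by n).

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """Correlation coefficient via the slide's recipe.

    Assumption: SDs are population SDs (divide by n).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Step 1: multiply the data sets pairwise and average the products
    avg_product = mean([x * y for x, y in zip(xs, ys)])
    # Step 2: subtract the product of the averages
    numerator = avg_product - mean(xs) * mean(ys)
    # Step 3: divide by the product of the SDs
    return numerator / (pstdev(xs) * pstdev(ys))

xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]
print(round(correlation(xs, ys), 4))  # 0.4
```

Here the averages are 4 and 7, the SDs are 2 and 4, and the average product is 31.2, so r = (31.2 - 28) / 8 = 0.4.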
26
The Correlation Coefficient
  • Compute r for the following data

X Y
1 2
2 1
3 4
4 3
5 7
6 5
7 6
r ≈ 0.8214

X Y
1 3
3 7
4 9
5 11
7 15
r = 1
27
The Correlation Coefficient
  • Estimate the correlation

28
The Correlation Coefficient
  • Estimate the correlation

29
Sketching Scatter Plots
  • The SD line is the line consisting of all the
    points where the standard score in X equals the
    standard score in Y
  • zX = zY
  • To sketch the SD line, draw a line along the
    long axis of the football shape
  • Note that the SD line always goes through the
    point of averages

30
Sketching Scatter Plots
  • Given the five-statistic summary (averages, SDs,
    and correlation) for a pair of data sets, we can
    sketch the scatter plot
  • Plot the point of averages in the center
  • Mark two SDs in both directions, on both axes
  • Plot the point 1 SD above average for both data
    sets
  • Draw a line connecting this point and the point
    of averages
  • This is the SD line
  • Draw an ellipse with the SD line as its long axis
  • Ellipse should go just beyond the 2 SD marks in
    all directions
  • The value of r determines how oblong the ellipse
    is

31
Sketching Scatter Plots
  • A study of the IQs of husbands and wives obtained
    the following results
  • Husbands: average IQ = 100, SD = 15
  • Wives: average IQ = 100, SD = 15
  • r = 0.6
  • Sketch the scatter plot
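
The reference points for this sketch can be worked out before drawing anything. The helper below is a rough illustration only: `sketch_guide` is a hypothetical name, and it covers just the placement steps (point of averages, 2-SD marks, and the second point on the SD line). How narrow to draw the ellipse for r = 0.6 is still a judgment call made by eye.

```python
def sketch_guide(avg_x, sd_x, avg_y, sd_y):
    """Return the reference points used when sketching a football-shaped plot."""
    return {
        # Center of the cloud
        "point_of_averages": (avg_x, avg_y),
        # 1 SD above average in both variables; joining this to the
        # point of averages gives the SD line
        "sd_line_point": (avg_x + sd_x, avg_y + sd_y),
        # 2-SD marks on each axis; the ellipse reaches just beyond them
        "x_marks": (avg_x - 2 * sd_x, avg_x + 2 * sd_x),
        "y_marks": (avg_y - 2 * sd_y, avg_y + 2 * sd_y),
    }

# Husbands on the x-axis, wives on the y-axis (axis choice is an assumption)
guide = sketch_guide(100, 15, 100, 15)
for name, value in guide.items():
    print(name, value)
```

With these numbers the SD line runs through (100, 100) and (115, 115), and the axis marks sit at 70 and 130.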

32
Changes of Scale
  • The correlation coefficient is not affected by
    changes of scale
  • Moving: adding the same number to all of the
    values of one variable
  • Stretching: multiplying all of the values of one
    variable by the same positive number
  • Would r change if we multiplied by a negative
    number?
  • The correlation coefficient is also unaffected by
    interchanging the two data sets
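
These claims can be checked numerically. The sketch below is an illustration under stated assumptions: `correlation` is a hypothetical helper using population SDs, the data are the five pairs from the earlier worked table, and the shift of 10 and the factors 3 and -3 are arbitrary choices. It also tries a negative multiplier, so the question above can be tested rather than guessed.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """(average of products - product of averages) / (product of SDs)."""
    avg_product = mean([x * y for x, y in zip(xs, ys)])
    return (avg_product - mean(xs) * mean(ys)) / (pstdev(xs) * pstdev(ys))

xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

r = correlation(xs, ys)
r_moved = correlation([x + 10 for x in xs], ys)      # add the same number
r_stretched = correlation([3 * x for x in xs], ys)   # positive multiplier
r_swapped = correlation(ys, xs)                      # interchange X and Y
r_negated = correlation([-3 * x for x in xs], ys)    # negative multiplier

print("original: ", round(r, 4))
print("moved:    ", round(r_moved, 4))
print("stretched:", round(r_stretched, 4))
print("swapped:  ", round(r_swapped, 4))
print("negated:  ", round(r_negated, 4))
```

The first four values agree; the last has the same size but the opposite sign.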

33
Changes of Scale
34
Changes of Scale
35
Changes of Scale
  • Compute r for each of the following data sets

X Y
0 8
4 9
6 10
8 12
12 6

X Y
0 2
2 3
3 4
4 6
6 0

r = -0.15 for both data sets
36
Summary
  • The relationship between two variables, X and Y,
    can be graphed in a scatter plot
  • When the scatter plot is tightly clustered around
    a line, there is a strong linear association
    between X and Y
  • A scatter plot can be characterized by its
    five-statistic summary
  • Average and SD of the X values
  • Average and SD of the Y values
  • Correlation coefficient

37
Summary
  • When the correlation coefficient gets closer to -1
    or 1, the points cluster more tightly around a
    line
  • Positive association has a positive r-value
  • Negative association has a negative r-value
  • Calculating the correlation coefficient
  • Take the average of the product
  • Subtract the product of the averages
  • Divide the difference by the product of the SDs

38
Summary
  • The correlation coefficient is not affected by
    changes of scale or transposing the variables
  • Correlation does not measure causation!