Scatterplots, Association, and Correlation - PowerPoint PPT Presentation

About This Presentation
Title:

Scatterplots, Association, and Correlation

Description:

Chapter 7 Scatterplots, Association, and Correlation – PowerPoint PPT presentation

Slides: 39
Provided by: Addi65

Transcript and Presenter's Notes


1
Chapter 7
  • Scatterplots, Association, and Correlation

2
Looking at Scatterplots
  • Scatterplots may be the most common and most
    effective display for data.
  • Look for patterns, trends, relationships, and
    possible outliers
  • Best way to picture associations between two
    variables

3
Looking at Scatterplots (cont.)
  • When looking at scatterplots, we will look for
    direction, form, strength, and unusual features
  • Direction
  • A pattern that runs from the upper left to the
    lower right is said to have a negative direction
  • A trend running the other way has a positive
    direction

4
Looking at Scatterplots (cont.)
  • This example shows a negative association between
    central pressure and maximum hurricane wind speed
  • As the central pressure increases, the maximum
    wind speed decreases

5
Looking at Scatterplots (cont.)
  • Form
  • If there is a straight-line (linear) relationship,
    it will appear as a cloud or swarm of points
    stretched out in a generally consistent, straight
    form.

6
Looking at Scatterplots (cont.)
  • Form
  • If the relationship isn't straight but still
    increases or decreases steadily, we can usually
    find ways to make it straighter

7
Looking at Scatterplots (cont.)
  • Form
  • If the relationship curves sharply, the methods
    of this book cannot really help us

8
Looking at Scatterplots (cont.)
  • Strength
  • At one extreme, the points appear to follow a
    single stream
  • At the other extreme, the points appear as a
    vague cloud with no discernible trend or pattern

9
Looking at Scatterplots (cont.)
  • Unusual features
  • Look for the unexpected
  • Look for any outliers standing away from the
    overall pattern of the scatterplot
  • Clusters or subgroups should also raise questions

10
Roles for Variables
  • Need to determine which of the two quantitative
    variables goes on the x-axis and which on the
    y-axis
  • This determination is made based on the roles
    played by the variables
  • When the roles are clear, the explanatory or
    predictor variable goes on the x-axis, and the
    response variable goes on the y-axis

11
Correlation
  • Data collected from students in Statistics
    classes included their heights (in inches) and
    weights (in pounds)
  • There is a positive association
  • Fairly straight form
  • There seems to be a high outlier

12
Correlation (cont.)
  • How strong is the association between weight and
    height of Statistics students?
  • Units should not matter when quantifying strength
  • A scatterplot of heights (in centimeters) and
    weights (in kilograms) doesn't change the
    direction, form, or strength

13
Correlation (cont.)
  • Standardize both variables and write the
    coordinates of a point as (zx, zy)
  • Removes the units from the data
  • Here is a scatterplot of the standardized weights
    and heights
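A minimal Python sketch of standardizing, assuming z = (value - mean) / s with s the sample standard deviation (dividing by n - 1). The helper name `standardize` and the heights below are hypothetical illustration values, not the class data. Converting the same heights from inches to centimeters gives the same z-scores, which is what "removes the units" means.

```python
import math

def standardize(values):
    """Return z-scores (value - mean) / s, where s is the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

# Hypothetical heights in inches (illustration only, not the class data)
heights_in = [62, 65, 68, 70, 73]
heights_cm = [2.54 * h for h in heights_in]  # same heights, different units

z_in = standardize(heights_in)
z_cm = standardize(heights_cm)

# The z-scores agree (up to floating-point rounding) whatever the units
print([round(z, 3) for z in z_in])
print([round(z, 3) for z in z_cm])
```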

14
Correlation (cont.)
  • The linear pattern seems steeper in the
    standardized plot than in the original scatterplot
  • That's because we made the scales of the axes the
    same
  • Equal scaling gives a neutral way of drawing the
    scatterplot and a fairer impression of the
    strength of the association

15
Correlation (cont.)
  • The correlation coefficient (r) is a numerical
    measure of the strength of the linear relationship
    between the explanatory and response variables
  • For the students' heights and weights, the
    correlation is 0.644
  • Formula: r = Σ zx·zy / (n - 1)
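The formula can be sketched directly in Python: standardize each variable with its sample standard deviation, multiply the paired z-scores, add them up, and divide by n - 1. The `correlation` helper and the height/weight pairs are hypothetical illustration values, not the class data behind r = 0.644.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    # Sum of products of paired z-scores, divided by n - 1
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical height (in) and weight (lb) pairs, for illustration only
heights = [62, 64, 66, 68, 70, 72, 74]
weights = [120, 140, 130, 160, 150, 180, 175]
print(round(correlation(heights, weights), 3))
```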

16
Correlation Conditions
  • Correlation measures the strength of the linear
    association between two quantitative variables.
  • Before you use correlation, you must check
    several conditions
  • Quantitative Variables Condition
  • Straight Enough Condition
  • Outlier Condition

17
Correlation Conditions (cont.)
  • Quantitative Variables Condition
  • Correlation applies only to quantitative variables
  • Don't apply correlation to categorical data
  • Need to know the variables' units and what they
    measure

18
Correlation Conditions (cont.)
  • Straight Enough Condition
  • You can calculate a correlation coefficient for
    any pair of variables
  • Correlation measures the strength only of the
    linear association
  • Results will be misleading if the relationship is
    not linear

19
Correlation Conditions (cont.)
  • Outlier Condition
  • Outliers can distort the correlation
  • It can even change the direction of the
    correlation coefficient
  • Switch from negative to positive, or vice versa
  • When you see an outlier, report the correlations
    with and without that point
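A small Python sketch of how one outlier can flip the sign of r, following the advice to report the correlation with and without the point. The data are hypothetical: five points on a perfectly decreasing line, then the same points plus one far-away point at (20, 20). With the outlier included, r goes from -1 to about +0.92.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data lying exactly on a decreasing line
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]

# Same data plus one hypothetical outlier far from the pattern
xs_out = xs + [20]
ys_out = ys + [20]

print("without outlier:", round(correlation(xs, ys), 3))
print("with outlier:   ", round(correlation(xs_out, ys_out), 3))
```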

20
Correlation Properties
  • The sign of a correlation coefficient gives the
    direction of the association
  • Correlation is always between -1 and +1
  • Correlation close or equal to -1 or +1 indicates
    a strong linear relationship (exactly -1 or +1
    means all points lie on one straight line)
  • A correlation near zero corresponds to a weak
    linear association
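A quick Python check of the sign and range properties, using hypothetical values: points on an increasing line give r = +1, points on a decreasing line give r = -1, and scattered values land somewhere in between. Tiny floating-point differences from exactly ±1 are possible.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5, 6]
up = [2 * x + 3 for x in xs]   # exact increasing line
down = [10 - x for x in xs]    # exact decreasing line
noisy = [3, 1, 4, 1, 5, 9]     # hypothetical scattered values

r_up = correlation(xs, up)
r_down = correlation(xs, down)
r_noisy = correlation(xs, noisy)
for name, r in [("up", r_up), ("down", r_down), ("noisy", r_noisy)]:
    print(name, round(r, 3))
```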

21
Correlation Properties (cont.)
  • Correlation treats x and y symmetrically
  • The correlation of x with y is the same as the
    correlation of y with x
  • Correlation has no units
  • Correlation is not affected by changes in the
    center or scale of either variable
  • Correlations depend only on the z-scores, which
    are unaffected by changes in center or scale
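A Python sketch checking these properties on hypothetical height/weight pairs: swapping x and y, converting to metric units, and shifting one variable all leave r unchanged. The conversion factors (2.54 cm per inch, roughly 0.4536 kg per pound) are positive rescalings; multiplying a variable by a negative number would flip the sign of r.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, for illustration only
heights_in = [62, 64, 66, 68, 70, 72, 74]
weights_lb = [120, 140, 130, 160, 150, 180, 175]

heights_cm = [2.54 * h for h in heights_in]     # change of scale
weights_kg = [0.4536 * w for w in weights_lb]   # approximate lb -> kg factor

r_xy = correlation(heights_in, weights_lb)
r_yx = correlation(weights_lb, heights_in)                     # swapped roles
r_metric = correlation(heights_cm, weights_kg)                 # new units
r_shifted = correlation([h - 60 for h in heights_in], weights_lb)  # new center

print(round(r_xy, 4), round(r_yx, 4), round(r_metric, 4), round(r_shifted, 4))
```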

22
Correlation Properties (cont.)
  • Correlation measures the strength of the linear
    association between the two variables
  • Variables can have a strong association but still
    have a small correlation if the association isn't
    linear
  • Correlation is sensitive to outliers
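A Python sketch of a strong but non-linear association with essentially zero correlation. In this hypothetical example y = x² on x values symmetric about zero, so y is completely determined by x, yet the U-shaped form makes r come out at (numerically) zero.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # perfectly predictable from x, but U-shaped

r_u = correlation(xs, ys)
print(round(r_u, 6))
```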

23
Correlation ≠ Causation
  • Whenever we have a strong correlation, it is
    tempting to explain it by imagining that the
    predictor variable has caused the response
  • Scatterplots and correlation coefficients never
    prove causation
  • Watch out for lurking variables
  • A lurking variable is a hidden variable that
    stands behind a relationship and determines it by
    simultaneously affecting the other two variables

24
Correlation Tables
  • Compute the correlations between every pair of
    variables in a dataset and arrange these
    correlations in a table
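A Python sketch of building a correlation table: compute r for every pair of variables and print the results as a grid. The variable names and values are hypothetical. The diagonal is always 1, and the table is symmetric because r treats x and y the same way.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical dataset: one list per quantitative variable, same cases in order
data = {
    "height": [62, 64, 66, 68, 70, 72, 74],
    "weight": [120, 140, 130, 160, 150, 180, 175],
    "age": [19, 22, 20, 21, 25, 19, 23],
}
names = list(data)
table = {(a, b): correlation(data[a], data[b]) for a in names for b in names}

# Print the table with row and column labels
print(" " * 8 + "".join(f"{name:>9}" for name in names))
for a in names:
    print(f"{a:<8}" + "".join(f"{table[(a, b)]:>9.3f}" for b in names))
```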

25
Straightening Scatterplots
  • If a scatterplot shows a bent form that
    consistently increases or decreases, we can often
    straighten the form of the plot by re-expressing
    (transforming) one or both variables
  • Transforming the data can straighten the
    scatterplot's form
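A Python sketch of straightening by re-expression, assuming a hypothetical response that doubles at each step (a steadily increasing but bent form). Taking the logarithm of y turns the curve into a straight line, so r for (x, log y) is essentially 1 while r for (x, y) is noticeably lower. The log is only one possible re-expression; which one works depends on the shape of the bend.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

xs = list(range(1, 9))
ys = [2 ** x for x in xs]            # hypothetical: doubles each step, bends upward
log_ys = [math.log(y) for y in ys]   # re-expressed response

r_raw = correlation(xs, ys)
r_log = correlation(xs, log_ys)
print("r(x, y)     =", round(r_raw, 3))
print("r(x, log y) =", round(r_log, 3))
```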

26
What Can Go Wrong?
  • Don't say "correlation" when you mean
    "association"
  • Don't confuse correlation with causation
  • Don't correlate categorical variables
  • Be sure the association is linear
  • Beware of outliers

27
What Can Go Wrong? (cont.)
  • Don't assume the relationship is linear just
    because the correlation coefficient is high
  • Here r = 0.979, but the relationship is actually
    bent
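A Python sketch of a high correlation on a bent form, using hypothetical data y = x² for x = 1 to 10 (not the data plotted on the slide). r comes out at about 0.975, yet the slopes between consecutive points keep increasing, which is the numerical signature of a curve. Looking at the scatterplot catches this; r alone does not.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]   # hypothetical curved relationship

r_bent = correlation(xs, ys)
print(round(r_bent, 3))   # about 0.975

# Slopes between neighboring points: a straight line would keep these constant
slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
print(slopes)
```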

28
What have we learned?
  • We examine scatterplots for direction, form,
    strength, and unusual features
  • Although not every relationship is linear, when
    the scatterplot is straight enough, the
    correlation coefficient is a useful numerical
    summary
  • The sign of the correlation tells us the
    direction of the association
  • The magnitude of the correlation tells us the
    strength of a linear association
  • Shifting or scaling the data, standardizing, or
    swapping the variables has no effect on the
    numerical value of the correlation

29
Exercise 7.38
  • Fast food is often considered unhealthy because
    much of it is high in both fat and calories. But
    are the two related? Here are the fat contents
    and calories of several brands of burgers.
    Analyze the association between fat content and
    calories.

Fat (g)     19   31   34   35   39   39   43
Calories   410  580  590  570  640  680  660

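A short Python sketch computing r for the burger data in the table above, using the z-score formula for correlation (the `correlation` helper is a name introduced here for illustration). Listing fat first is just a choice for the code; r is the same whichever variable goes on which axis. The number is only worth reporting after the scatterplot shows that the Straight Enough and Outlier conditions are reasonable.

```python
import math

def correlation(xs, ys):
    """Pearson r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Data from Exercise 7.38
fat = [19, 31, 34, 35, 39, 39, 43]                  # grams
calories = [410, 580, 590, 570, 640, 680, 660]

r = correlation(fat, calories)
print(round(r, 2))   # 0.96
```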
30
Exercise 7.38 (cont.)
  • Let us first plot the data
  • Y-axis?
  • X-axis?

31
Exercise 7.38 (cont.)
32
Exercise 7.38 (cont.)
33
Exercise 7.38 (cont.)
34
Exercise 7.38 (cont.)
  • From what we learned in Chapter 4

35
Exercise 7.38 (cont.)
36
Exercise 7.38 (cont.)
37
Exercise 7.38 (cont.)
38
(No Transcript)