Scatterplots, Association, and Correlation - PowerPoint PPT Presentation

Slides: 21
Provided by: catheri80
Transcript and Presenter's Notes

Title: Scatterplots, Association, and Correlation


1
Chapter 7
  • Scatterplots, Association, and Correlation

2
From single variables to 2
  • We have been interested so far in the
    distribution of single variables, or perhaps
    comparing the distributions of 2 similar
    variables
  • top speed for males and females
  • Now: how do we look at the relationship between
    2 quantitative variables?
  • Is there a trend or pattern between variables?
  • Does one predict another?
  • Are there outlying cases?
  • Are the data grouped in an interesting way?

3
A Plot between 2 Variables.
  • To see associations between variables, we use a
    type of plot you're probably very familiar with
    already.
  • What is it?
  • A Scatterplot!
  • The best way to start observing the relationship
    between two quantitative variables.

4
Keys to the Scatterplot
  • Shows relationship between 2 Quantitative
    Variables.
  • Can I make a scatterplot of car model and number
    stolen?
  • Both variables must be measured on the same
    individuals
  • Will a scatterplot of height of 100 women vs
    weight of 100 men work?
  • One variable per axis
  • Each individual is represented by a single point
    on the plot.
  • Centered on data (remember the midrange?)
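
The "same individuals" and "centered on data" points above can be sketched in a few lines of Python. This is only an illustration: the student names and numbers are made up, and `paired_points` and `midrange` are hypothetical helper names, not part of DataDesk or any library.

```python
# Hypothetical class data keyed by student; names and values are made up.
travel_distance_miles = {"Ana": 2.0, "Ben": 5.5, "Cy": 12.0, "Dee": 8.0}
travel_time_minutes = {"Ana": 8, "Ben": 15, "Dee": 22, "Eli": 30}


def paired_points(x_by_id, y_by_id):
    """Return (x, y) pairs only for individuals that have both measurements."""
    shared = sorted(x_by_id.keys() & y_by_id.keys())
    return [(x_by_id[i], y_by_id[i]) for i in shared]


def midrange(values):
    """Midrange = (min + max) / 2, a simple center for an axis."""
    return (min(values) + max(values)) / 2


points = paired_points(travel_distance_miles, travel_time_minutes)
xs = [x for x, _ in points]
ys = [y for _, y in points]

print(points)                      # one point per individual measured on both
print(midrange(xs), midrange(ys))  # where each axis would be centered
```

Cy and Eli drop out because each has only one of the two measurements, which is why heights of 100 women cannot be plotted against weights of 100 different men.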

5
DataDesk and Scatterplots
  • Let's look at how to make a scatterplot on
    DataDesk.
  • But first, ALWAYS ask
  • Who? (individuals)
  • What? (vars, units)
  • Class Data, Travel Distance / Travel Time

6
What goes where?
  • Which variable should go on which axis?
  • Vocabulary
  • Explanatory/Predictor variable (EV): if one
    variable explains or predicts the other, it is
    the EV
  • Place on X axis.
  • Response variable (RV): if one variable responds
    to the other, it is the RV
  • Place on Y axis.
  • What if no clear influence?
  • Your choice!

7
Examples
  • Which is the EV, which RV?
  • Daily high temp, Cooling costs
  • Degree of pain relief, Drug dosage
  • Score on test, Time spent studying
  • Height, Weight
  • Shoe size, Grade-point average

8
What's important in a Scatterplot?
  • What do you think? What do you see?
  • Let's look at Monopoly data.
  • Price, Rent
  • Which one X, which one Y?

9
Describing Scatterplots
  • 3 Things to look for in a scatterplot
  • Direction
  • Positive: increasing; as one variable increases,
    the other one does too.
  • Negative: decreasing; as one variable increases,
    the other decreases.
  • Form: what is the overall pattern of the plot?
  • Linear
  • Curved (Parabolic, Exponential, Bent)
  • Others (Fan, etc)
  • NO DATA RE-EXPRESSION IN THIS CLASS!

10
Describing Scatterplots
  • Strength: how strong is the relationship?
  • How closely do the variables conform to the
    described form?
  • Good strength (low scatter): follows form
    closely, tightly clustered in a single stream
  • Poor strength (high scatter): points form a
    cloud with barely any visible pattern.
  • Also, look for the unusual!
  • Outliers: points well away from the pattern
  • Clusters: maybe you have groups, and need to
    split data.

11
Some examples
  • Let's look at some datafiles, and describe the
    relations.
  • Before graphing: what do you expect to see?
  • CEO Salaries and Ages
  • Airfares
  • Oil Change
  • Colleges

12
How strong?
  • Assessing strength of linear association by eye
    is difficult; eyes are easily misled.
  • Back to Colleges.
  • Scaling can change how an association appears.
  • We also want to be able to measure strength
    independent of units. (Dollars or pesos? Years
    or months?)
  • What do we know that is independent of units?
  • Z-scores

13
Z-scores and Scatterplots
  • If we plot our variables as z-scores, the new
    scatterplot will be
  • Centered at (0,0)
  • Axes will be same scale
  • Independent of original units of the data
  • Gives a neutral way of drawing the scatterplot,
    fairer impression of strength.
  • Class Data again
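
A minimal sketch of the z-score re-plotting idea, assuming the usual definition z = (value - mean) / SD with the sample standard deviation. The travel numbers are invented for illustration and `z_scores` is a hypothetical helper name.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]


# Made-up travel data, in two different units.
distance_miles = [2.0, 5.5, 8.0, 12.0, 3.5]
time_minutes = [8, 15, 22, 35, 10]

zx = z_scores(distance_miles)
zy = z_scores(time_minutes)

# Both standardized variables have mean 0 and SD 1, so a plot of (zx, zy)
# is centered at (0, 0) with the same scale on both axes.
print(round(mean(zx), 10), round(stdev(zx), 10))
print(round(mean(zy), 10), round(stdev(zy), 10))
```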

14
Z-scores and Correlation
  • Correlation, r, is the statistic used to give a
    numerical indication of strength, independent of
    scale.
  • Points in Quadrants I and III will have a
    positive product of z-scores, points in II and IV
    will have a negative product. Points on axes
    don't get to vote (0 product).
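
The "product of z-scores" idea can be turned into a number. The sketch below assumes the usual z-score form of Pearson's r, r = sum(z_x * z_y) / (n - 1), with sample standard deviations. The data and the helper names `z_scores` and `correlation` are illustrative only.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both variables must be measured on the same individuals")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Made-up data for five individuals.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

Points in Quadrants I and III of the z-score plot contribute positive terms to the sum, and points in II and IV contribute negative terms, so the sign of r follows whichever group dominates.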

Do not calculate by hand; use technology!
15
About Correlation
  • Does Correlation (r) have any units?
  • Will changing the units on a variable change the
    correlation?
  • What are the requirements for finding a
    correlation?
  • 2 quantitative variables
  • Approximately linear form
  • Be aware of outliers; they can affect r a lot.
  • I.e., check the scatterplot FIRST!
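
One way to check the units questions above is to convert both variables and recompute r. This is a sketch with made-up travel data. `correlation` is a hypothetical helper that uses the z-score form of Pearson's r, and the conversions (miles to kilometers, minutes to hours) are just examples of a change of units.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


# Made-up travel data.
distance_miles = [2.0, 5.5, 8.0, 12.0, 3.5]
time_minutes = [8, 15, 22, 35, 10]

# Same individuals, different units.
distance_km = [d * 1.609344 for d in distance_miles]
time_hours = [t / 60 for t in time_minutes]

r_original = correlation(distance_miles, time_minutes)
r_converted = correlation(distance_km, time_hours)

# Rescaling changes the mean and SD together, so the z-scores are unchanged.
print(abs(r_original - r_converted) < 1e-9)
```

Because r is built from z-scores, it has no units and does not change when a variable is rescaled by a positive factor.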

16
All about Correlation
  • r can only take values between -1.0 and 1.0.
  • Sign of r gives the direction
  • The closer r is to -1 or 1, the stronger the
    linear relation
  • r = -1.0: perfect linear association, negative
    slope
  • r = 1.0: perfect linear association, positive slope
  • r = 0: no LINEAR association
  • There is no distinction between X and Y in
    calculating r.
  • Like Mean and SD, sensitive to outliers.
  • Outliers can create a false good correlation
  • Can hide a true good correlation
  • Go to ActivStats, Chapter 7 Scatterplot Tool
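
Three of the claims above can be checked on small made-up data sets: r = 0 does not rule out a curved pattern, one outlier can create a strong-looking correlation, and swapping X and Y leaves r unchanged. `correlation` is a hypothetical helper using the z-score form of Pearson's r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


# 1. A perfect parabola: strong association, but not a LINEAR one.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x * x for x in curve_x]
r_curve = correlation(curve_x, curve_y)

# 2. A patternless cloud, then the same cloud plus one far-away point.
cloud_x, cloud_y = [1, 2, 1, 2], [2, 1, 1, 2]
r_cloud = correlation(cloud_x, cloud_y)
r_with_outlier = correlation(cloud_x + [10], cloud_y + [10])

# 3. Swapping the roles of X and Y does not change r.
r_swapped = correlation(cloud_y + [10], cloud_x + [10])

print(round(r_curve, 6), round(r_cloud, 6))
print(round(r_with_outlier, 3), round(r_swapped, 3))
```

Here the parabola and the cloud both give r of essentially 0, while the single added point pushes r above 0.95, which is why the scatterplot should be checked before trusting the number.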

17
Data Desk and Correlation
  • Estimating the correlation is hard.
  • Even Statisticians have a hard time doing it.
  • We will always let Technology calculate r for us.
  • Class Data

18
How strong is strong?
  • How high does r have to be to be good?
  • It depends!
  • Education study might be thrilled with r = 0.6
  • Engineering study might find r = 0.9 too low!

19
Correlation or Causation?
  • Correlation shows an association.
  • We can use one variable to predict another.
  • Does this mean that one variable causes the
    other?
  • Shoe size, reading level
  • Be careful- there might be lurking variables!
  • A variable, hidden or not considered, that
    influences both variables simultaneously.
  • Age
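
The shoe size and reading level example can be mimicked with invented data in which age drives both variables. Everything below is an assumption made for illustration: the numbers are not real measurements, and `correlation` is a hypothetical helper using the z-score form of Pearson's r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


# Made-up children: within each age, shoe size and reading level vary
# independently of each other; only age moves both of them together.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
children = [
    (age, age + ds, (age - 5) + dr)  # (age, shoe size, reading level)
    for age in range(6, 11)
    for ds, dr in offsets
]

shoe = [c[1] for c in children]
reading = [c[2] for c in children]
r_all = correlation(shoe, reading)

# Hold the lurking variable fixed: look only at the 8-year-olds.
eight = [c for c in children if c[0] == 8]
r_age_8 = correlation([c[1] for c in eight], [c[2] for c in eight])

print(round(r_all, 3), round(r_age_8, 3))
```

Across all ages the correlation is strong, but among children of the same age it vanishes, so bigger shoes do not cause better reading in this made-up data.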

20
Example 18
  • Drug Use
  • Find Correlation
  • Describe association
  • Do results confirm that marijuana is a gateway
    drug, that it leads to the use of other drugs?