Chapter 8 Correlation/Linear Regression - PowerPoint PPT Presentation

Slides: 48
Provided by: eli7151

Transcript and Presenter's Notes



1
Chapter 8 Correlation/Linear Regression
  • Linear relationships: If the explanatory and
    response variables show a straight-line pattern,
    then we say they follow a linear relationship.
  • Curved relationships and clusters are other forms
    to watch for.

2
Chapter 8 Correlation/Linear Regression

3
Chapter 8 Correlation/Linear Regression
  • Direction: If the relationship has a clear
    direction, we speak of either positive
    association or negative association.
  • Positive association: high values of the two
    variables tend to occur together.
  • Negative association: high values of one variable
    tend to occur with low values of the other
    variable.

4
Chapter 8 Correlation/Linear Regression
  • Correlation is a number that measures the
    strength of a linear relationship between two
    quantitative variables.
  • Correlation is always between -1 and 1,
    inclusive.
  • The sign of the correlation coefficient tells
    whether the association between the variables
    is positive or negative.
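One common textbook formula computes r as the sum of the products of z-scores divided by n - 1. The sketch below is a minimal Python illustration of that formula, assuming sample standard deviations (dividing by n - 1); `pearson_r` is a hypothetical helper name, and the fat/calorie values are the example data used later in these slides.

```python
from statistics import fmean, stdev

def pearson_r(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Fat (x) and calories (y) from the Chapter 8 example in these slides
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]
r = pearson_r(fat, calories)
print(round(r, 3))  # 0.961: strong and positive
```

Perfectly increasing data such as `pearson_r([1, 2, 3], [2, 4, 6])` give r = 1, and reversing one list gives r = -1, matching the sign rule above.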

5
Chapter 8 Correlation/Linear Regression
  • Strong correlation: r between 0.8 and 1, or
    between -1 and -0.8
  • Moderate correlation: r between 0.5 and 0.8, or
    between -0.8 and -0.5
  • Weak correlation: r between 0 and 0.5, or
    between -0.5 and 0
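These cutoffs can be turned into a small labeling function. This is a sketch under one stated assumption: the slide's ranges share their endpoints, so here a value exactly on a boundary (|r| = 0.5 or 0.8) gets the stronger label. `describe_strength` is a hypothetical name.

```python
def describe_strength(r):
    """Return (strength, direction) for a correlation r, using the slide's cutoffs.

    Assumption: boundary values (|r| = 0.8 or 0.5) get the stronger label.
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    if size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return strength, direction


# The candidate values from the multiple-choice slides that follow
for value in (-0.67, -0.10, 0.71, 0.96, 1.00):
    print(value, describe_strength(value))
```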

6
Chapter 8 Correlation/Linear Regression
  • Correlation does not distinguish between X and Y
  • Correlation is unitless
  • Correlation measures the strength of the linear
    relationship between two quantitative variables

7
Chapter 8 Correlation/Linear Regression
8
Choose the best description of the scatter plot
  • Moderate, negative, linear association
  • Strong, curved association
  • Moderate, positive, linear association
  • Strong, negative, non-linear association
  • Weak, positive, linear association

9
Which of the following values is most likely to
represent the correlation coefficient for the
data shown in this scatterplot?
  • r = -0.67
  • r = -0.10
  • r = 0.71
  • r = 0.96
  • r = 1.00

10
Which of the following values is most likely to
represent the correlation coefficient for the
data shown in this scatterplot?
  • r = -0.67
  • r = -0.10
  • r = 0.71
  • r = 0.96
  • r = 1.00

11
Which of the following values is most likely to
represent the correlation coefficient for the
data shown in this scatterplot?
  • r = -0.67
  • r = -0.10
  • r = 0.71
  • r = 0.96
  • r = 1.00

12
Cautions about Correlation
  • It should only be used:
    • To describe the relationship between 2
      QUANTITATIVE variables
    • When the association is linear enough
    • When there are no outliers
  • Correlation does NOT imply causation

13
  • A teacher at an elementary school measures the
    heights of children on the playground and then
    makes a scatterplot of the children's heights and
    reading test scores. The data meet the conditions
    for correlation, so she calculates r = .79. Which
    conclusion is most accurate?
  • Being taller causes students to read better
  • Being shorter causes students to read better
  • Taller students tend to have better reading
    scores
  • Shorter students tend to have better reading
    scores

14
Chapter 8 Linear Models
  • Easiest to understand and analyze
  • Relationships are often linear
  • Variables with a non-linear relationship can
    often be re-expressed (transformed) so that the
    relationship becomes linear
  • Even when a relationship is non-linear, a linear
    model may provide an accurate approximation for a
    limited range of values.
  • Strength: The strength of a linear relationship
    is determined by how closely the points in the
    scatterplot lie to a straight line

15
Least Squares Regression Line: Calculations
16
Chapter 8 Linear Models
  • Not all data fall on a straight line!
  • Residual = Data - Model, or
  • Residual = Observed y - Predicted y

17
Chapter 8 Linear Models
  • Example: X = Fat, Y = Calories

      X (Fat)   Y (Calories)
      19        410
      31        580
      34        590
      35        570
      39        640
      39        680
      43        660

18
Chapter 8 Linear Models
19
Chapter 8 Linear Models
20
Chapter 8 Linear Models
  • S = 27.3340, R-Sq = 92.3%, R-Sq(adj) = 90.7%
  • Residual Plot

21
Chapter 9 Regression Wisdom
  • Extrapolation: reaching beyond the data
  • Outliers: regression models are sensitive to
    outliers
  • Leverage: a data point whose x-value is far from
    the mean of the x-values has high leverage
  • A point with high leverage has the potential to
    change the regression line.
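For a regression with one explanatory variable, leverage is commonly measured by h_i = 1/n + (x_i - mean of x)^2 / Σ(x_j - mean of x)^2, so it grows with the distance of x_i from the mean. This formula is a standard addition rather than something stated on the slide; the sketch applies it to the fat/calorie example data, and `leverages` is a hypothetical helper name.

```python
from statistics import fmean

def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(xs)
    x_bar = fmean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

fat = [19, 31, 34, 35, 39, 39, 43]
h = leverages(fat)
for x, h_i in zip(fat, h):
    print(f"x = {x}: leverage = {h_i:.3f}")
# The point at x = 19, farthest from the mean (about 34.3), has the largest leverage.
```

In simple regression the leverages always add up to 2 (the number of fitted coefficients), so their average is 2/n.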

22
Chapter 9 Regression Wisdom
  • Influential: a point is influential if omitting
    it from the analysis gives a very different
    model.
  • Influence depends on both leverage and the size
    of the residual
  • Lurking variable: a variable that is not
    included in the model or study but affects the
    variables being studied.

23
Chapter 9 Regression Wisdom
  • Lurking variables may influence correlation and
    regression models.
  • Association is not causation!!

24
Summary
  • r is a number between -1 and 1
  • r = 1 or r = -1 indicates perfect correlation:
    all data points lie exactly on a straight line
  • r > 0 indicates positive association
  • r < 0 indicates negative association
  • r value does not change when units of measurement
    are changed (correlation has no units!)
  • Correlation treats X and Y symmetrically. The
    correlation of X with Y is the same as the
    correlation of Y with X

25
Summary
  • Quantitative variable condition: do not apply
    correlation to categorical variables
  • Correlation can be misleading if the relationship
    is not linear
  • Outliers distort correlation dramatically. Report
    correlation with/without outliers.

26
More Examples for Checking the "Linear Enough"
Condition: All four data sets have r = .82
27
In which case is a linear model appropriate?
[Figure: panels A, B, C, D, one for each of the four data sets]
28
A. Linear model appropriate: the residual plot shows
no pattern
B. Linear model not appropriate: clear pattern in the
residuals
29
C. Graph has an outlier: the outlier is clear on the
residual plot
D. Linear model not appropriate: clear pattern in the
residuals
30
Calculating r with the TI-83/84
  • The first time you do this:
    • Press 2nd, CATALOG (above 0)
    • Scroll down to DiagnosticOn
    • Press ENTER, ENTER
    • The screen reads Done
  • Your calculator will remember this setting even
    when turned off

31
Calculating r with the TI-83/84
  • Press STAT, ENTER
  • If there are old values in L1:
    • Highlight L1, press CLEAR, then ENTER
  • If there are old values in L2:
    • Highlight L2, press CLEAR, then ENTER
  • Enter predictor (x) values in L1
  • Enter response (y) values in L2
    • Pairs must line up
    • There must be the same number of predictor and
      response values

32
Calculating r with the TI-83/84
  • Press STAT, then the right arrow (to CALC)
  • Scroll down to LinReg(ax+b), press ENTER, ENTER
  • Read r at the bottom of the screen

33
Re-Expression with the TI-83/84
  • Most common re-expressions are built in.
  • To see what's available, press STAT, go to CALC,
    and scroll down to see:
    • 5: QuadReg
    • 6: CubicReg
    • 9: LnReg
    • 0: ExpReg
    • A: PwrReg

34
Example
  • X = Age (months)
  • Y = Height (inches)

      X (months)   18    19    20    21    22     23     24
      Y (inches)   29.9  30.3  30.7  31    31.38  31.45  31.9

35
Chapter 9 Prediction, Residuals, Influence
  • Linear model: Height = 24.212 + 0.321 Age
  • Correlation: r = .992
  • Examples
  • Age = 24 months, observed height = 31.9 inches
  • Predicted height = 24.212 + 0.321(24) = 31.916
  • Residual = 31.9 - 31.916 = -0.016

36
Chapter 9 Prediction, Residuals, Influence
  • Age = 20 years (20 × 12 = 240 months)
  • Predicted height = 24.212 + 0.321(240) ≈ 101.3
    inches, about 8.4 ft!!
  • Residual BIG!
  • Be aware of Extrapolation!
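The predictions and residual above come from plugging ages into the rounded model. A small sketch, where `predict_height` and `is_extrapolation` are hypothetical helpers: the first wraps Height = 24.212 + 0.321 Age, and the second flags ages outside the observed range (18 to 24 months, taken from the example data) as extrapolation.

```python
def predict_height(age_months):
    """Slide 35's rounded model: Height (inches) = 24.212 + 0.321 * Age (months)."""
    return 24.212 + 0.321 * age_months

OBSERVED_AGES = (18, 24)  # smallest and largest ages in the example data

def is_extrapolation(age_months):
    """True when the age lies outside the range of the data used to fit the model."""
    low, high = OBSERVED_AGES
    return not (low <= age_months <= high)

residual_24 = 31.9 - predict_height(24)  # observed - predicted
print(f"age 24 months: predicted {predict_height(24):.3f} in, residual {residual_24:.3f}")

for age in (24, 240):
    height = predict_height(age)
    warning = "  <- extrapolation!" if is_extrapolation(age) else ""
    print(f"age {age} months: {height:.1f} in ({height / 12:.1f} ft){warning}")
```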

37
Example
  • 4. Relationship between calories and sugar
    content: A researcher tracked the sugar content
    and calories of 15 baked goods and found the
    following information:
  • Average sugar content: 7.0 grams
  • Standard deviation of sugar content: 4.4 grams
  • Average calories: 107.0 calories
  • Standard deviation of calories: 19.5 calories
  • Correlation between sugar content and calories:
    r = 0.564

38
Solution to Example
  • a) Find a linear model that describes this
    example.
  • b_1 = r S_y/S_x = 0.564(19.5)/4.4 ≈ 2.50
    calories per gram of sugar
  • b_0 = mean of (Y) - b_1 mean of (X) = 107 - 2.50(7)
    = 89.5
  • Linear model: ŷ = b_0 + b_1 x
  • ŷ = 89.5 + 2.50x, or better:
  • predicted calories = 89.5 + 2.50 sugar
  • b) How many calories are there in a muffin with
    6.5 grams of sugar?
  • predicted calories = 89.5 + 2.50(6.5) = 105.75
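The same two formulas work directly from summary statistics, so the raw data are not needed. A sketch with a hypothetical helper `line_from_summary`; the slide rounds b_1 to 2.50 before computing b_0, while the code keeps full precision, so its results differ slightly in later decimals.

```python
def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Least squares intercept and slope computed from summary statistics."""
    b1 = r * sd_y / sd_x       # slope: b_1 = r * S_y / S_x
    b0 = mean_y - b1 * mean_x  # intercept: b_0 = mean(Y) - b_1 * mean(X)
    return b0, b1

# Sugar (x, grams) and calories (y) summaries from the example
b0, b1 = line_from_summary(mean_x=7.0, sd_x=4.4, mean_y=107.0, sd_y=19.5, r=0.564)
muffin = b0 + b1 * 6.5  # predicted calories for 6.5 grams of sugar

print(f"predicted calories = {b0:.2f} + {b1:.3f} * sugar")
print(f"muffin with 6.5 g of sugar: about {muffin:.2f} calories")
```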

39
Chapter 10 Re-expressing Data
  • Example: The data show the number of academic
    journals published on the Internet during the
    last decade.

40
Chapter 10 Re-expressing Data
41
Chapter 10 Re-expressing Data
  • Re-express data to linearize

42
Chapter 10 Re-expressing Data
43
(No Transcript)
44
Chapter 10 Re-expressing Data
  • The least squares regression line has the
    following equation:
  • log(journals) = 1.22 + 0.346 Year
  • Problem
  • How many journals will be published online in
    year 2000?

45
Chapter 10 Re-expressing Data
  • Answer
  • log(journals) = 1.22 + 0.346(9) = 4.334
    (here Year = 9 for 2000)
  • journals = 10^4.334 ≈ 21577.44
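Because the model predicts log(journals), the prediction has to be back-transformed by raising 10 to that power (the slide's answer uses 10^4.334, so these are base-10 logs). A minimal sketch; `predict_journals` is a hypothetical helper, and Year = 9 for 2000 follows the slide's worked answer.

```python
def predict_journals(year_code):
    """Back-transform the slide's model: log10(journals) = 1.22 + 0.346 * Year."""
    log_journals = 1.22 + 0.346 * year_code
    return 10 ** log_journals  # undo the base-10 log

prediction_2000 = predict_journals(9)  # the slide's answer uses Year = 9 for 2000
print(f"{prediction_2000:.2f} journals")
```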

46
Chapter 10 Re-expressing Data
  • Why re-express data?
  • Make a distribution of a variable more symmetric
  • Make the spread of several groups more alike,
    even if their centers differ
  • Make the form of a scatterplot more nearly linear
  • Make the scatter in a scatterplot spread out more
    evenly rather than thickening at one end.

47
Chapter 10 Re-expressing Data
  • The Ladder of Powers
  • Power 2: the square of the data values, y^2
    • Try this for unimodal distributions that are
      skewed to the left.
  • Power 1: no change at all
  • Power 1/2: the square root of the data values,
    y^(1/2)
    • Try this for counted data.
  • Power 0: the logarithm of the data values,
    log(y)
    • Try this for measurements that cannot be
      negative, especially those that grow by
      percentage increases.
    • Salaries and populations are good examples.