Modeling a Linear Relationship - PowerPoint PPT Presentation

About This Presentation
Title:

Modeling a Linear Relationship

Description:

Title: Understanding Observational Studies Author: Robb Koether Last modified by: Robb Koether Created Date: 1/20/2004 2:58:15 PM Document presentation format – PowerPoint PPT presentation

Slides: 65
Provided by: Robb130
Learn more at: https://people.hsc.edu

Transcript and Presenter's Notes

Title: Modeling a Linear Relationship


1
Modeling a Linear Relationship
  • Lecture 44
  • Secs. 13.1 - 13.3.1
  • Tue, Apr 24, 2007

2
Bivariate Data
  • Data is called bivariate if each observation
    consists of a pair of values (x, y).
  • x is the explanatory variable.
  • y is the response variable.
  • x is also called the independent variable.
  • y is also called the dependent variable.

3
Scatterplots
  • Scatterplot: A display in which each observation
    (x, y) is plotted as a point in the xy plane.

4
Example
  • Draw a scatterplot of the following data of
    calories vs. cholesterol in Subway sandwiches.

Calories (x)     350  290  330  290  320  370  280  290  310  230
Cholesterol (y)   50   20   45   15   35   50   20   25   20    0
5
Example
6
Example
  • Does there appear to be a relationship?
  • How can we tell?

7
TI-83 - Scatterplots
  • To set up a scatterplot,
  • Enter the x values in L1.
  • Enter the y values in L2.
  • Press 2nd STAT PLOT.
  • Select Plot1 and press ENTER.

8
TI-83 - Scatterplots
  • The Stat Plot display appears.
  • Select On and press ENTER.
  • Under Type, select the first icon (a small image
    of a scatterplot) and press ENTER.
  • For XList, enter L1.
  • For YList, enter L2.
  • For Mark, select the one you want and press ENTER.

9
TI-83 - Scatterplots
  • To draw the scatterplot,
  • Press ZOOM. The Zoom menu appears.
  • Select ZoomStat (9) and press ENTER. The
    scatterplot appears.
  • Press TRACE and use the arrow keys to inspect the
    individual points.

10
Describing a Linear Relationship
  • How would we describe this relationship?

11
Linear Association
  • Draw (or imagine) an oval around the data set.
  • If the oval is tilted, then there is some linear
    association.
  • If the oval is tilted upwards from left to right,
    then there is positive association.
  • If the oval is tilted downwards from left to
    right, then there is negative association.
  • If the oval is not tilted at all, then there is
    no association.

12
Positive Linear Association
  [Figure: scatterplot of y against x]
13
Positive Linear Association
  [Figure: scatterplot of y against x]
14
Negative Linear Association
  [Figure: scatterplot of y against x]
15
Negative Linear Association
16
No Linear Association
17
No Linear Association
18
Strong vs. Weak Association
  • The association is strong if the oval is narrow.
  • The association is weak if the oval is wide.

19
Strong Positive Linear Association
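  [Figure: scatterplot of y against x]
20
Strong Positive Linear Association
  [Figure: scatterplot of y against x]
21
Weak Positive Linear Association
  [Figure: scatterplot of y against x]
22
Weak Positive Linear Association
  [Figure: scatterplot of y against x]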
23
Example
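  [Figure: scatterplot of Cholesterol (vertical axis, 0 to 50) against Calories (horizontal axis, 200 to 400)]
24
Describing the Relationship
  [Figure: scatterplot of Cholesterol (vertical axis, 0 to 50) against Calories (horizontal axis, 200 to 400)]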
25
Describing the Relationship
  • There appears to be a strong positive linear
    association between calories and cholesterol in
    Subway sandwiches.
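
A rough numerical check on the direction of the tilt, which the slides do not use, is the sign of the sum of the products of deviations from the two means. The sketch below applies it to the Subway data; the helper name `co_deviation_sum` is hypothetical, and the result indicates direction only, not strength.

```python
# Subway data from the example: calories (x) and cholesterol (y).
calories = [350, 290, 330, 290, 320, 370, 280, 290, 310, 230]
cholesterol = [50, 20, 45, 15, 35, 50, 20, 25, 20, 0]


def co_deviation_sum(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over all observations.

    Hypothetical helper, not from the slides: a positive result
    suggests an upward tilt, a negative result a downward tilt.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


total = co_deviation_sum(calories, cholesterol)
direction = "positive" if total > 0 else "negative" if total < 0 else "none"
print(direction)
```

For this data the sum is positive, which agrees with the upward tilt seen in the scatterplot.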

26
Example
  • Draw a scatterplot of the following data.

x y
2 3
3 5
5 9
6 12
9 16
27
Simple Linear Regression
  • To quantify the linear relationship between x and
    y, we wish to find the equation of the line that
    best fits the data.
  • Typically, there will be many lines that all look
    pretty good.
  • How do we measure how well a line fits the data?

28
Measuring the Goodness of Fit
  • Which line better fits the data?

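  [Figure: scatterplot of y against x]
29
Measuring the Goodness of Fit
  • Which line better fits the data?

  [Figure: scatterplot of y against x]
30
Measuring the Goodness of Fit
  • Which line better fits the data?

  [Figure: scatterplot of y against x]
31
Measuring the Goodness of Fit
  • Which line better fits the data?

  [Figure: scatterplot of y against x]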
32
Measuring the Goodness of Fit
  • Start with the scatterplot.

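  [Figure: scatterplot of y against x]
33
Measuring the Goodness of Fit
  • Draw any line through the scatterplot.

  [Figure: scatterplot of y against x]
34
Measuring the Goodness of Fit
  • Measure the vertical distances from every point
    to the line.

  [Figure: scatterplot of y against x]
35
Measuring the Goodness of Fit
  • Each of these represents a deviation, called a
    residual, from the line.

  [Figure: scatterplot of y against x, with a residual labeled e]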
36
Residuals
  • The ith residual: The difference between the
    observed value of $y_i$ and the predicted, or
    expected, value of $y_i$.
  • Use $\hat{y}_i$ for the predicted $y_i$.
  • The formula for the ith residual is
    $e_i = y_i - \hat{y}_i$.

37
Residuals
  • Notice that the residual is positive if the data
    point is above the line and it is negative if the
    data point is below the line.
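
A minimal Python sketch of this sign convention follows. It uses the five-point data set from the earlier example and the candidate line $\hat{y} = -1 + 2x$ that later slides test; the function name `residuals` and its `intercept` and `slope` parameters are illustrative choices, not something the slides define.

```python
# Five-point data set from the slides' worked example.
xs = [2, 3, 5, 6, 9]
ys = [3, 5, 9, 12, 16]


def residuals(xs, ys, intercept, slope):
    """Return observed y minus predicted y for each point.

    The predicted value is intercept + slope * x.
    """
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Candidate line: y-hat = -1 + 2x.
for x, e in zip(xs, residuals(xs, ys, intercept=-1, slope=2)):
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x = {x}: residual {e}, point is {position} the line")
```

The point at x = 6 has residual 1 and lies above the line, while the point at x = 9 has residual -1 and lies below it.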

38
Measuring the Goodness of Fit
  • The ith residual.

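  [Figure: the ith residual $e_i$ at $x_i$, between the observed $y_i$ and the predicted $\hat{y}_i$]
39
Measuring the Goodness of Fit
  • Find the sum of the squared residuals.

  [Figure: the ith residual $e_i$ at $x_i$, between the observed $y_i$ and the predicted $\hat{y}_i$]
40
Measuring the Goodness of Fit
  • The smaller the sum of squared residuals, the
    better the fit.

  [Figure: the ith residual $e_i$ at $x_i$, between the observed $y_i$ and the predicted $\hat{y}_i$]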
41
Example
  • Consider the data points

x y
2 3
3 5
5 9
6 12
9 16
42
Example
  [Figure: scatterplot of the data points; y axis marked at 5, 10, 15 and x axis marked from 2 to 9]
43
Least Squares Line
  • Let's see how good the fit is for the line
  • $\hat{y} = -1 + 2x$,
  • where $\hat{y}$ represents the predicted value of y,
    not the observed value.

44
Sum of Squared Residuals
  • Begin with the data set.

x y
2 3
3 5
5 9
6 12
9 16
45
Sum of Squared Residuals
  • Compute the predicted y, using $\hat{y} = -1 + 2x$.

x   y    $\hat{y}$
2   3    3
3   5    5
5   9    9
6   12   11
9   16   17
46
Sum of Squared Residuals
  • Compute the residuals, $y - \hat{y}$.

x   y    $\hat{y}$   $y - \hat{y}$
2   3    3           0
3   5    5           0
5   9    9           0
6   12   11          1
9   16   17          -1
47
Sum of Squared Residuals
  • Square the residuals.

x   y    $\hat{y}$   $y - \hat{y}$   $(y - \hat{y})^2$
2   3    3           0               0
3   5    5           0               0
5   9    9           0               0
6   12   11          1               1
9   16   17          -1              1
48
Sum of Squared Residuals
  • Find the sum of the squared residuals.

x   y    $\hat{y}$   $y - \hat{y}$   $(y - \hat{y})^2$
2   3    3           0               0
3   5    5           0               0
5   9    9           0               0
6   12   11          1               1
9   16   17          -1              1
$\mathrm{SSE} = \sum (y - \hat{y})^2 = 2.00$
49
Least Squares Line
  • Least squares line: The line for which the sum
    of the squares of the residuals is as small as
    possible.
  • The least squares line is also called the line of
    best fit or the regression line.

50
Regression Line
  • We will write the regression line as
    $\hat{y} = a + bx$.
  • a is the y-intercept.
  • b is the slope.
  • This is the usual slope-intercept form
    $y = mx + b$
  • with the two terms rearranged and relabeled.

51
TI-83 - Computing Residuals
  • It is not hard to compute the residuals and the
    sum of their squares on the TI-83.
  • (Later, we will see a faster method.)
  • Enter the x-values in list L1 and the y-values in
    list L2.
  • Compute a + b*L1 and store it in list L3 (the
    $\hat{y}$ values).
  • Compute (L2 - L3)^2. This is a list of the
    squared residuals.
  • Compute sum(Ans). This is the sum of the squared
    residuals.
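
The same list operations can be mirrored off the calculator. The Python sketch below borrows the names L1, L2, and L3 from the steps above and assumes the line with a = -1 and b = 2 as the one being tested; it is an illustration of the procedure, not a TI-83 program.

```python
# Lists named after the TI-83 lists in the steps above.
L1 = [2, 3, 5, 6, 9]      # x-values
L2 = [3, 5, 9, 12, 16]    # y-values

# Intercept a and slope b of the line being tested (assumed: -1 and 2).
a, b = -1, 2

# L3 = a + b*L1: the predicted values.
L3 = [a + b * x for x in L1]

# (L2 - L3)^2: the list of squared residuals.
squared_residuals = [(y - y_hat) ** 2 for y, y_hat in zip(L2, L3)]

# sum(Ans): the sum of the squared residuals.
sse = sum(squared_residuals)
print(L3, squared_residuals, sse)
```

Changing `a` and `b` and rerunning the last three steps gives the sum of squared residuals for any other candidate line.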

52
Sum of Squared Residuals
  • Now let's see how good the fit is for the line
  • $\hat{y} = -0.5 + 1.9x$.
  • We will compute the sum of squared residuals, SSE.

53
Sum of Squared Residuals
  • Begin with the data set.

x y
2 3
3 5
5 9
6 12
9 16
54
Sum of Squared Residuals
  • Compute the predicted y, using $\hat{y} = -0.5 + 1.9x$.

x   y    $\hat{y}$
2   3    3.3
3   5    5.2
5   9    9.0
6   12   10.9
9   16   16.6
55
Sum of Squared Residuals
  • Compute the residuals, $y - \hat{y}$.

x   y    $\hat{y}$   $y - \hat{y}$
2   3    3.3         -0.3
3   5    5.2         -0.2
5   9    9.0         0.0
6   12   10.9        1.1
9   16   16.6        -0.6
56
Sum of Squared Residuals
  • Compute the squared residuals.

x   y    $\hat{y}$   $y - \hat{y}$   $(y - \hat{y})^2$
2   3    3.3         -0.3            0.09
3   5    5.2         -0.2            0.04
5   9    9.0         0.0             0.00
6   12   10.9        1.1             1.21
9   16   16.6        -0.6            0.36
57
Sum of Squared Residuals
  • Find the sum of the squared residuals.

x   y    $\hat{y}$   $y - \hat{y}$   $(y - \hat{y})^2$
2   3    3.3         -0.3            0.09
3   5    5.2         -0.2            0.04
5   9    9.0         0.0             0.00
6   12   10.9        1.1             1.21
9   16   16.6        -0.6            0.36
$\mathrm{SSE} = \sum (y - \hat{y})^2 = 1.70$
58
Sum of Squared Residuals
  • We conclude that $\hat{y} = -0.5 + 1.9x$ is a better fit
    than $\hat{y} = -1 + 2x$.
  • Is it the best fit?
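
The comparison between the two candidate lines can be sketched in Python as follows. The function name `sse` is an illustrative choice; the data and both lines come from the slides.

```python
# Five-point data set from the slides.
xs = [2, 3, 5, 6, 9]
ys = [3, 5, 9, 12, 16]


def sse(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


sse_first = sse(-1, 2)         # y-hat = -1 + 2x
sse_second = sse(-0.5, 1.9)    # y-hat = -0.5 + 1.9x

print(f"SSE for -1 + 2x:     {sse_first:.2f}")
print(f"SSE for -0.5 + 1.9x: {sse_second:.2f}")
print("second line fits better:", sse_second < sse_first)
```

A smaller value for one line shows only that it beats the other candidate. It does not show on its own that no third line does better.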

59
Sum of Squared Residuals
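  [Figure: the data points with the line $\hat{y} = -1 + 2x$; y axis marked at 5, 10, 15 and x axis marked from 2 to 9]
60
Sum of Squared Residuals
  [Figure: the data points with the line $\hat{y} = -0.5 + 1.9x$; y axis marked at 5, 10, 15 and x axis marked from 2 to 9]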
61
Example
  • For all the lines that one could draw through
    this data set,
  • it turns out that 1.70 is the smallest possible
    value for the sum of the squares of the residuals.

x y
2 3
3 5
5 9
6 12
9 16
62
Example
  • Therefore,
  • $\hat{y} = -0.5 + 1.9x$
  • is the regression line for this data set.
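
The slides state this result without deriving it. One way to check it is with the standard closed-form formulas for the least squares slope and intercept, which are an addition here and not taken from the slides. In the sketch below the function name `least_squares_line` is hypothetical.

```python
# Five-point data set from the slides.
xs = [2, 3, 5, 6, 9]
ys = [3, 5, 9, 12, 16]


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x.

    Standard formulas (not stated in the slides):
        b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
        a = mean_y - b * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares_line(xs, ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {sse:.2f}")
```

For this data set the formulas give a = -0.5 and b = 1.9, with a sum of squared residuals of 1.70, matching the values on the slides.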

63
Prediction
  • Use the regression line to predict y when
  • x = 4
  • x = 7
  • x = 20
  • Interpolation: Using an x value within the
    observed extremes of x values to predict y.
  • Extrapolation: Using an x value beyond the
    observed extremes of x values to predict y.

64
Interpolation vs. Extrapolation
  • Interpolated values are more reliable than
    extrapolated values.
  • The farther out the values are extrapolated, the
    less reliable they are.
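
As a sketch of the three predictions asked for earlier, the Python below evaluates the regression line $\hat{y} = -0.5 + 1.9x$ at x = 4, 7, and 20 and labels each as interpolation or extrapolation according to the observed x range (2 to 9). The function name `predict` is illustrative.

```python
# Observed x-values and the regression line y-hat = -0.5 + 1.9x from the slides.
xs = [2, 3, 5, 6, 9]
a, b = -0.5, 1.9


def predict(x):
    """Return (predicted y, kind) for a given x.

    kind is "interpolation" when x lies within the observed
    extremes of the x-values, and "extrapolation" otherwise.
    """
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return a + b * x, kind


for x in (4, 7, 20):
    y_hat, kind = predict(x)
    print(f"x = {x}: y-hat = {y_hat:.1f} ({kind})")
```

The predictions at x = 4 and x = 7 are interpolated. The one at x = 20 lies far beyond the largest observed x of 9, so it is the least reliable of the three.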