Title: Least Squares Regression
1Least Squares Regression
- Fitting a Line to Bivariate Data
2Linear Relationships
- Avg. occupants per car
- 1980 6/car
- 1990 3/car
- 2000 1.5/car
- By the year 2010 every fourth car will have
nobody in it!
- Food for Thought
- What kind of mathematical relationship holds between year and avg. no. of occupants per car?
- Why might the relationship break down by 2010?
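One reading of the figures above, offered as a sketch rather than the slide's own answer: the average halves every decade, which is a geometric (exponential) pattern rather than a linear one. The function name `occupants_per_car` and the halving model are assumptions made for illustration.

```python
def occupants_per_car(year, base_year=1980, base_value=6.0, halving_years=10):
    """Project average occupants per car, assuming it halves every decade."""
    return base_value * 0.5 ** ((year - base_year) / halving_years)


# Reproduce the three quoted values and extend the pattern to 2010.
for year in (1980, 1990, 2000, 2010):
    print(year, occupants_per_car(year))
```

Under this assumed model the 2010 value is 0.75 occupants per car, or three occupants for every four cars, which is the joke behind "every fourth car will have nobody in it". An average below one occupant per moving car is not physically sensible, which hints at why the relationship must break down.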
3Basic Terminology
- Scatterplots, correlation: interested in association between 2 variables (assign x and y arbitrarily)
- Least squares regression: does one quantitative variable explain or cause changes in another variable?
4Basic Terminology (cont.)
- Explanatory variable: explains or causes changes in the other variable; the x-variable (independent variable).
- Response variable: the y-variable; it responds to changes in the x-variable (dependent variable).
5Examples
- Fertilizer (x), corn yield (y)
- Advertising (x), store income (y)
- Drug dose (x), blood pressure (y)
- Daily temperature (x), natural gas demand (y)
- Change in min wage (x), unemployment rate (y)
6Simplest Relationship
- Simplest equation that describes the dependence of variable y on variable x:
- $y = a + bx$
- linear equation
- graph is a line with slope b and y-intercept a
7Graph
[Figure: graph of the line $y = a + bx$ on x and y axes, showing y-intercept a and slope b = rise/run]
8Notation
- $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
- draw the line $y = a + bx$ through the scatterplot; the point on the line corresponding to $x_i$ is $\hat{y}_i = a + b x_i$
9Observed y, Predicted y
predicted y when x = 2.7: $\hat{y} = a + bx = a + b(2.7)$
10Scatterplot Fuel Consumption vs Car Weight
Best line?
11Scatterplot with least squares prediction line
12How do we draw the line? Residuals
13Residuals graphically
14Criterion for choosing what line to draw: method of least squares
- The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible
- This line has slope b and intercept a that minimize $\sum_{i=1}^{n} (y_i - a - b x_i)^2$
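The criterion can be checked numerically. The sketch below uses the car weight and fuel consumption values from the residual table later in this deck, fits the line with the standard closed-form slope and intercept, and compares its sum of squared residuals with a few other lines. The helper names and the competing lines are hypothetical choices for illustration.

```python
# Car weight (thousands of lb) and fuel consumption, copied from the
# residual table later in this deck.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line yhat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) from the standard closed-form least squares solution."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a_hat, b_hat = least_squares_line(weights, fuel)
best = sum_squared_residuals(a_hat, b_hat, weights, fuel)

# Hypothetical competing lines: nudge the intercept, nudge the slope,
# or pick a different line altogether.
candidates = [(a_hat + 0.1, b_hat), (a_hat, b_hat - 0.05), (0.0, 1.5)]
for a, b in candidates:
    sse = sum_squared_residuals(a, b, weights, fuel)
    print(f"a={a:.3f}, b={b:.3f}: SSE={sse:.4f} (least squares: {best:.4f})")
```

Every competing line should report a larger sum of squared residuals than the fitted one.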
15Least Squares Line $\hat{y} = a + bx$: Slope b and Intercept a
- Another way to calculate the slope
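For reference, the standard closed-form expressions for the least squares slope and intercept, together with the usual alternative form of the slope, are given below. These are the textbook formulas and may differ in layout from the slide's own. The symbols $\bar{x}$, $\bar{y}$ (sample means), $s_x$, $s_y$ (sample standard deviations), and $r$ (correlation coefficient) are assumed notation.

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x},
\qquad
\text{or equivalently}\quad b = r\,\frac{s_y}{s_x}
```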
16Example Income vs Consumption Expenditure
17Questions
- Construct scatterplot; determine if linear model is appropriate. If so,
- find the least squares prediction line
- Estimate consumption expenditure in a household with an income of (i) 6,000 (ii) 25,000. Comfortable with estimates?
- Compute the residuals
18Scatterplot
19Solution
20Calculations
21least squares prediction line
22Least Squares Prediction Line
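23Consumption Expenditure Prediction When x = 6,000
[Figure: least squares line with the predicted consumption expenditure 7.4 marked at x = 6 (income in thousands)]
24Consumption Expenditure Prediction When x = 25,000
[Figure: least squares line with the predicted consumption expenditure 11.2 marked at x = 25 (income in thousands)]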
25The least squares line always goes through the point with coordinates $(\bar{x}, \bar{y})$
$(\bar{x}, \bar{y}) = (9, 8)$
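The two predictions and the mean-point property can be checked with a few lines of arithmetic. This is a sketch that takes the intercept 6.2 and slope 0.2 from the SSE calculation on slide 29 and assumes income is recorded in thousands; the function name `predict_consumption` is hypothetical.

```python
# Intercept and slope of the income/consumption line, as they appear in
# the SSE calculation on slide 29. Assumption: x is income in thousands.
A, B = 6.2, 0.2


def predict_consumption(income_thousands):
    """Predicted consumption expenditure: yhat = a + b*x."""
    return A + B * income_thousands


pred_6 = predict_consumption(6)     # income of 6,000
pred_25 = predict_consumption(25)   # income of 25,000
at_mean = predict_consumption(9)    # x-bar = 9 should give y-bar = 8

print(round(pred_6, 2), round(pred_25, 2), round(at_mean, 2))
```

The values agree with 7.4 and 11.2 on slides 23 and 24, and the line passes through (9, 8).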
26C. Compute the Residuals
27Residuals
28Income Residual Plot
29$\sum$ residuals, $\sum (\text{residuals})^2$
- Note that
- $\sum \text{residuals} = 0$
- $\sum (\text{residuals})^2 = 3.6$
- From formula in box on p. 7:
- $\text{SSE} = \sum y_i^2 - a \sum y_i - b \sum x_i y_i$
- $= 330 - 6.2(40) - 0.2(392)$
- $= 330 - 248 - 78.4 = 3.6$
- Any other line drawn through the scatterplot will have
- $\sum (\text{residuals})^2 > 3.6$
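The shortcut formula can be evaluated directly from the sums quoted on this slide. A minimal sketch, with variable names chosen here for illustration:

```python
# Sums and coefficients quoted on this slide for the income example.
sum_y_sq = 330.0   # sum of y_i squared
sum_y = 40.0       # sum of y_i
sum_xy = 392.0     # sum of x_i * y_i
a, b = 6.2, 0.2    # intercept and slope of the least squares line

# SSE = sum(y^2) - a*sum(y) - b*sum(x*y); this shortcut is valid only
# when a and b are the least squares coefficients.
sse = sum_y_sq - a * sum_y - b * sum_xy
print(round(sse, 4))
```

The shortcut relies on a and b being the least squares values. For any other line the sum of squared residuals has to be computed from the individual residuals.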
30Car Weight, Fuel Consumption Example, cont.
32Calculations
33Scatterplot with least squares prediction line
34The Least Squares Line Always Goes Through $(\bar{x}, \bar{y})$
$(\bar{x}, \bar{y}) = (2.9, 4.39)$
35Using the least squares line for prediction. Fuel consumption of 3,000 lb car? (x = 3)
36Be Careful!
Fuel consumption of 500 lb car? (x = 0.5)
x = 0.5 is outside the range of the x-data that we used to determine the least squares line
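The warning above can be built into a prediction helper. The sketch below fits the line to the car weight and fuel consumption values from the residual table later in this deck, then flags any x outside the observed range. The function names and the flagging approach are assumptions for illustration, not something the slides prescribe.

```python
# Car weight (thousands of lb) and fuel consumption, copied from the
# residual table later in this deck.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def predict_with_range_check(x, a, b, x_min, x_max):
    """Return (prediction, inside_range); inside_range is False when
    x lies outside the x-values used to fit the line (extrapolation)."""
    return a + b * x, x_min <= x <= x_max


a, b = fit_line(weights, fuel)
x_min, x_max = min(weights), max(weights)

results = {}
for x in (3.0, 0.5):
    y_hat, inside = predict_with_range_check(x, a, b, x_min, x_max)
    results[x] = (y_hat, inside)
    note = "" if inside else "  <- extrapolation, treat with caution"
    print(f"x = {x}: predicted fuel consumption = {y_hat:.3f}{note}")
```

With these data the observed weights run from 1.9 to 4.1, so x = 3 is an interpolation while x = 0.5 is flagged.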
37Avoid GIGO! Evaluating the least squares line
- Create scatterplot. Approximately linear?
- Calculate $r^2$, the square of the correlation coefficient
- Examine residual plot
38$r^2$: The Variation Accounted For
- The square of the correlation coefficient r gives important information about the usefulness of the least squares line
39$r^2$: important information for evaluating the usefulness of the least squares line
$-1 \le r \le 1$ implies $0 \le r^2 \le 1$
The square of the correlation coefficient, $r^2$, is the fraction of the variation in y that is explained by the least squares regression of y on x.
The square of the correlation coefficient, $r^2$, is the fraction of the variation in y that is explained by the variation in x.
40Example: car weight, fuel consumption
- x = car weight, y = fuel consumption
- $r^2 = (0.9766)^2 \approx 0.95$
- About 95% of the variation in fuel consumption (y) is explained by the linear relationship between car weight (x) and fuel consumption (y).
- What else affects fuel consumption?
- Driver, size of engine, tires, road, etc.
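The value of r and the "fraction of variation explained" reading of $r^2$ can both be checked on the car data from the residual table later in this deck. The sketch below computes r from its definition and, separately, 1 - SSE/SST for the fitted line; the variable names are chosen here for illustration.

```python
import math

# Car weight (thousands of lb) and fuel consumption, copied from the
# residual table later in this deck.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n
s_xx = sum((x - x_bar) ** 2 for x in weights)
s_yy = sum((y - y_bar) ** 2 for y in fuel)   # total variation in y (SST)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel))

# Correlation coefficient and its square.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Fraction of variation explained by the least squares line.
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(weights, fuel))
explained_fraction = 1 - sse / s_yy

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, 1 - SSE/SST = {explained_fraction:.4f}")
```

The two quantities agree, which is what the statement "$r^2$ is the fraction of the variation in y explained by the regression" means in practice.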
41Example SAT scores
42SAT scores calculations
43SAT scores result
$r^2 = (-0.868)^2 = 0.7534$
If 57% of NC seniors take the SAT, the predicted mean score is
44Avoid GIGO! Evaluating the least squares line
- Create scatterplot. Approximately linear?
- Calculate $r^2$, the square of the correlation coefficient
- Examine residual plot
45Residuals
- residual = observed y - predicted y
- $= y - \hat{y}$
- Properties of residuals
- The residuals always sum to 0 (therefore the mean of the residuals is 0)
- The least squares line always goes through the point $(\bar{x}, \bar{y})$
46Graphically: residual $= y - \hat{y}$
47Residual Plot
- Residuals help us determine if fitting a least squares line to the data makes sense
- When a least squares line is appropriate, it should model the underlying relationship; nothing interesting should be left behind
- We make a scatterplot of the residuals in the hope of finding
- NOTHING!
48Car Wt/ Fuel Consump Residuals
CAR WT.   FUEL CONSUMP.   Pred FUEL CONSUMP.   Residual
3.4       5.5             5.209498069           0.290501931
3.8       5.9             5.865096525           0.034903475
4.1       6.5             6.356795367           0.143204633
2.2       3.3             3.242702703           0.057297297
2.6       3.6             3.898301158          -0.298301158
2.9       4.6             4.39                  0.21
2.0       2.9             2.914903475          -0.014903475
2.7       3.6             4.062200772          -0.462200772
1.9       3.1             2.751003861           0.348996139
3.4       4.9             5.209498069          -0.309498069
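The predicted values and residuals in this table can be reproduced from the two data columns. The sketch below also checks the two properties of residuals stated on slide 45. It uses the standard closed-form least squares fit, and the helper name `fit_line` is hypothetical.

```python
# The two data columns from the table above.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


a, b = fit_line(weights, fuel)
predicted = [a + b * x for x in weights]
# residual = observed y - predicted y
residuals = [y - y_hat for y, y_hat in zip(fuel, predicted)]

print("wt    obs   pred         residual")
for x, y, y_hat, e in zip(weights, fuel, predicted, residuals):
    print(f"{x:<5} {y:<5} {y_hat:<12.9f} {e: .9f}")

# Property 1: the residuals sum to zero (up to floating-point rounding).
residual_sum = sum(residuals)
# Property 2: the line passes through (x-bar, y-bar) = (2.9, 4.39).
value_at_mean = a + b * (sum(weights) / len(weights))
print("sum of residuals is zero:", abs(residual_sum) < 1e-9)
print("value of line at x-bar:", round(value_at_mean, 6))
```

Plotting `residuals` against `weights` would give the residual plot discussed on the following slides.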
49Example Car wt/fuel consump. residual plot page
13
50SAT Residuals p. 14
51Linear Relationship?
52Garbage In Garbage Out
53Residual Plot Clue to GIGO