Title: ASSESSING THE STRENGTH
1 ASSESSING THE STRENGTH OF THE REGRESSION MODEL
2 Assessing the Model's Strength
- Although the best straight line through a set of
points may have been found and the assumptions
for ε may appear valid, is the resulting
regression line useful in predicting y?
3 STEP 4: HOW GOOD IS THE MODEL?
- Can we conclude that there is a linear relation
between y and x?
  - This is a hypothesis test (t-test)
- What proportion of the overall variability in y
(from its mean) can be explained by changes in x?
  - This is a performance measure called the
  coefficient of determination (denoted by r²)
4 Can we conclude a linear relation exists between
y and x?
- We are hypothesizing that y changes linearly with
x: y = β0 + β1x. That is, if x goes up by 1, y
will change by β1.
- But if no linear relation exists, then
if x goes up by 1, y will not change, i.e. β1 = 0.
5 The Hypothesis Test
- To test whether or not a linear relation exists:
- H0: β1 = 0 (No linear relation exists)
- HA: β1 ≠ 0 (A linear relation does exist)
- α = the significance level
- Reject H0 (Accept HA) if t > tα/2 or if t < -tα/2
  - with Degrees of Freedom = n - (number of betas) = n - 2
6 The t statistic for the test of β1 = 0
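The formula for this slide did not survive in the transcript. A standard form of the test statistic is sketched below; the symbols s_e (standard error of the estimate) and s_{b1} (standard error of the slope) are notation assumed here, not taken from the original slide.

```latex
t = \frac{b_1 - 0}{s_{b_1}},
\qquad
s_{b_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s_e = \sqrt{\frac{SSE}{n - 2}}
```

Here b1 is the estimated slope, and the statistic is compared with a t distribution with n - 2 degrees of freedom.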
7 HAND CALCULATIONS
Test: Reject H0 if t > t.025,8 = 2.306 or t <
-t.025,8 = -2.306
5.123 > 2.306: Can conclude β1 ≠ 0, i.e. a linear
relation exists.
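The hand calculation can be checked with a short Python sketch. It assumes the ten (x, y) observations are the second and third columns of the SSR/SSE/SST table later in the deck, and it hard-codes the critical value 2.306 quoted on this slide rather than computing it. The function name `slope_t_statistic` is illustrative only.

```python
import math

# Assumed data: x and y columns from the hand-calculation table (n = 10).
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

def slope_t_statistic(x, y):
    """Return (b1, s_b1, t) for the two-tailed test of beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))      # standard error of the estimate
    s_b1 = s_e / math.sqrt(sxx)         # standard error of the slope
    return b1, s_b1, b1 / s_b1

b1, s_b1, t = slope_t_statistic(x, y)
T_CRIT = 2.306  # t.025,8 as quoted on the slide

print(f"b1 = {b1:.5f}, t = {t:.3f}")
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")
```

With these data the computed t agrees with the 5.123 shown above, so H0 is rejected.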
8 95% Confidence Interval for β1
- (Point Estimate) ± t.025,n-2 (Appropriate std.
dev.)
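As a rough illustration, the interval b1 ± t.025,n-2 · s_b1 can be computed as follows. The data are assumed to be the x and y columns of the SSR/SSE/SST table, the critical value 2.306 is the one quoted for 8 degrees of freedom, and `slope_confidence_interval` is a hypothetical helper name.

```python
import math

# Assumed data: x and y columns from the hand-calculation table (n = 10).
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) limits of b1 +/- t_crit * s_b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # std. error of slope
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

lower, upper = slope_confidence_interval(x, y, t_crit=2.306)
print(f"95% CI for beta1: ({lower:.2f}, {upper:.2f})")
```

The interval does not contain 0, which is consistent with rejecting H0: β1 = 0 at α = .05.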
9 Coefficient of Determination: r²
The proportion of the total variation in y that can
be explained by changes in the x values is called
the coefficient of determination, denoted r².
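In terms of the sums of squares, the definition can be written as below. This is the standard decomposition; the index notation is an assumption, since the slide's own formula is not in the transcript.

```latex
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \quad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

SST = SSR + SSE,
\qquad
r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
```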
10 Hand Calculation of SSR, SSE, SST
Obs     x        y          ŷ          (ŷ - ȳ)²         (y - ŷ)²       (y - ȳ)²
  1   1200   101000   109567.57    186802403.21     73403214.02       26010000
  2    800    92000    88540.54     54161643.54     11967859.75       15210000
  3   1000   110000    99054.05      9948056.98    119813732.7       198810000
  4   1300   120000   114824.32    358130051.13     26787618.7       580810000
  5    700    90000    83283.78    159168911.61     45107560.26       34810000
  6    800    82000    88540.54     54161643.54     42778670.56      193210000
  7   1000    93000    99054.05      9948056.98     36651570.49        8410000
  8    600    75000    78027.03    319443162.89      9162892.622     436810000
  9    900    91000    93797.30      4421358.66      7824872.169      24010000
 10   1100   105000   104310.81     70741738.50       474981.7385     82810000
SUM                               1226927027.03    373972972.97     1600900000
                                      (SSR)            (SSE)           (SST)
11 Hand Calculation of r²
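The calculation itself is missing from the transcript. A minimal sketch that reproduces the three sums and r² = SSR/SST is given below, assuming the second and third columns of the table are x and y; `sums_of_squares` is an illustrative name.

```python
# Assumed data: x and y columns from the hand-calculation table (n = 10).
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

def sums_of_squares(x, y):
    """Return (SSR, SSE, SST) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]                     # fitted values
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    return ssr, sse, sst

ssr, sse, sst = sums_of_squares(x, y)
r2 = ssr / sst
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"r^2 = {r2:.6f}")
```

The sums match the SUM row of the table, and the ratio gives r² of about .766398.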
12 Interpretation of r²
- r² = 1: perfect (positive or negative) relation
  - i.e. points fit exactly along the regression
  line
- r² close to 0: very little relation
- The higher the value of r², the better the model
fits the data
13 Pearson Correlation Coefficient, r
- r = ±√r², which can also be calculated by
cov(x,y)/(sx sy), is called the Pearson correlation
coefficient.
- This is also used to measure the strength of the
relation between y and x.
- r = -1 means perfect negative correlation (i.e.
all points fit exactly on a line with negative
slope).
- r = 1 means perfect positive correlation (i.e.
all points fit exactly on a line with positive
slope).
- r = 0 means no linear correlation.
- Other values give relative strength, but have no
exact meaning as r² does, so we usually use r²
- When we take the square root of r² to get r, the
sign in front of r is the sign of b1 (positive
or negative slope)
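The cov(x,y)/(sx sy) form can be sketched in Python as follows. The data are assumed to be the x and y columns of the SSR/SSE/SST table, and `pearson_r` is an illustrative name. Sample (n - 1) divisors are used throughout; they cancel in the ratio.

```python
import math

# Assumed data: x and y columns from the hand-calculation table (n = 10).
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

def pearson_r(x, y):
    """Return the Pearson correlation coefficient cov(x, y) / (s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return cov / (s_x * s_y)

r = pearson_r(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.6f}")
```

For these data r is positive, matching the positive slope b1, and squaring it recovers r².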
14 EXCEL
15 Steps Using Excel
- Determine regression equation
  - Equation: ŷ = 46486.49 + 52.56757x
- Can you conclude a linear relation exists between
y and x?
  - The p-value for the test is .000904 < α = .05: YES
- What proportion of the overall variation in y is
explained by changes to x?
  - This is r² = .766398, a high r²
- CONCLUSION: Overall a good model!
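For readers without Excel, the regression equation can be reproduced with a small least-squares sketch in Python. It assumes the same ten (x, y) observations as the hand-calculation table and does not compute the p-value; `fit_line` is a hypothetical helper, not part of the Excel output.

```python
# Assumed data: x and y columns from the hand-calculation table (n = 10).
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

def fit_line(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(x, y)
prediction = b0 + b1 * 1000  # fitted value at x = 1000

print(f"y-hat = {b0:.2f} + {b1:.5f}x")
print(f"y-hat at x = 1000: {prediction:.2f}")
```

The fitted value at x = 1000 agrees with the ŷ listed for observations 3 and 7 in the table.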
16 Review
- Can we conclude a linear relation exists?
  - Two-tailed t-test of β1 ≠ 0
  - Look at p-value for the x-variable on Excel
- Computation of a confidence interval for the
amount y will change per unit increase in x (i.e.
for β1)
  - By hand
  - Printed on Excel Output
- What proportion of the overall variation in y is
explained by changes in x? r²
  - By hand
  - Printed on Excel
- Pearson correlation coefficient r
  - Square root of r²
  - Sign is same as b1