Title: Multiple Regression Models
1. Multiple Regression Models
2. General Form of the Multiple Regression Model
- y = b0 + b1x1 + b2x2 + ... + bkxk + e
- where y is the dependent variable
- x1, x2, ..., xk are the independent variables
- E(y) = b0 + b1x1 + b2x2 + ... + bkxk is the deterministic portion of the model
- bi determines the contribution of the independent variable xi
- Note: The symbols x1, x2, ..., xk may represent higher-order terms. For example, x1 might represent the current interest rate, x2 might represent x1², and so forth.
3. Analyzing a Multiple Regression Model
- STEP 1: Collect the sample data, i.e., the values of y, x1, x2, ..., xk, for each experimental unit in the sample.
- STEP 2: Hypothesize the form of the model, i.e., the deterministic component, E(y). This involves choosing which independent variables to include in the model.
- STEP 3: Use the method of least squares to estimate the unknown parameters b0, b1, ..., bk.
- STEP 4: Specify the probability distribution of the random error component e and estimate its variance s².
4. Continued
- STEP 5: Statistically evaluate the utility of the model.
- STEP 6: Check that the assumptions on e are satisfied and make model modifications, if necessary.
- STEP 7: Finally, if the model is deemed adequate, use the fitted model to estimate the mean value of y or to predict a particular value of y for given values of the independent variables, and to make other inferences.
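The seven steps above can be sketched numerically. The following is a minimal illustration using NumPy's least squares solver; the two-predictor model and all data values are fabricated for demonstration:

```python
import numpy as np

# Step 1: sample data -- y and two independent variables (fabricated)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 4.9, 9.2, 10.8, 15.1, 16.9])

# Step 2: hypothesize a first-order model E(y) = b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept column

# Step 3: least squares estimates of b0, b1, b2
b_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Step 4: estimate the error variance s² = SSE / (n - (k + 1))
n, k_plus_1 = X.shape
y_hat = X @ b_hat
sse = np.sum((y - y_hat) ** 2)
s2 = sse / (n - k_plus_1)

print(b_hat)  # estimated coefficients [b0, b1, b2]
print(s2)     # estimated error variance (MSE)
```

Steps 5-7 (utility tests, assumption checks, and prediction) build on these quantities, as the later slides show.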
5. Assumptions About the Random Error e
- For any given set of values of x1, x2, ..., xk, e has a normal probability distribution with mean equal to 0 (i.e., E(e) = 0) and variance equal to s² (i.e., Var(e) = s²).
- The random errors are independent (in a probabilistic sense).
6. A First-Order Model in Five Quantitative Independent Variables
- E(y) = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5
- where x1, x2, ..., x5 are all quantitative variables that are not functions of other independent variables.
- Note: bi represents the slope of the line relating y to xi when all the other x's are held fixed.
7. Graphs of E(y) = 1 + 2x1 + x2 for x2 = 0, 1, 2
8. The Method of Least Squares
- That is, we choose the estimated model
  ŷ = b̂0 + b̂1x1 + b̂2x2 + ... + b̂kxk
- that minimizes
  SSE = Σ(yi - ŷi)²
- This is the least squares prediction equation.
9. Scatterplots for the Data of Table 4.1
10. Estimator of s² for the Multiple Regression Model with k Independent Variables
- s² = SSE / [n - (k + 1)] = MSE
- s² is called the mean square for error (MSE).
11. Test of an Individual Parameter Coefficient in the Multiple Regression Model
- TWO-TAILED TEST: H0: bi = 0 versus Ha: bi ≠ 0
- Test statistic: t = b̂i / s(b̂i)
- Rejection region: |t| > ta/2
- where ta/2 is based on n - (k + 1) degrees of freedom.
12. A 100(1 - a)% Confidence Interval for a b Parameter
- b̂i ± ta/2 · s(b̂i)
- where ta/2 is based on n - (k + 1) degrees of freedom and
- n = number of observations
- k + 1 = number of b parameters in the model
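The t-test and confidence interval above can be computed directly from the least squares fit. This sketch uses NumPy and SciPy on simulated data; the model, seed, and coefficient values are invented for illustration:

```python
import numpy as np
from scipy import stats

# Fabricated data: n = 30 observations, k = 2 predictors, true b = (1, 2, 0)
rng = np.random.default_rng(0)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=0.5, size=n)

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_hat
df = n - (k + 1)
s2 = resid @ resid / df                    # MSE
cov = s2 * np.linalg.inv(X.T @ X)          # estimated covariance matrix of b_hat
se = np.sqrt(np.diag(cov))                 # standard errors s(b_hat_i)

i = 1                                      # test H0: b1 = 0 (two-tailed)
t_stat = b_hat[i] / se[i]
p_value = 2 * stats.t.sf(abs(t_stat), df)

alpha = 0.05                               # 95% confidence interval for b1
t_crit = stats.t.ppf(1 - alpha / 2, df)
ci = (b_hat[i] - t_crit * se[i], b_hat[i] + t_crit * se[i])
```

Since the simulated b1 is far from zero relative to the noise, the test should reject H0 here.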
13. Caution
- Extreme care should be exercised when conducting t-tests on the individual b parameters in a first-order linear model for the purpose of determining which independent variables are useful for predicting y and which are not. If you fail to reject H0: bi = 0, several conclusions are possible:
- 1. There is no relationship between y and xi.
14. Continued
- 2. A straight-line relationship between y and xi exists (holding the other x's in the model fixed), but a Type II error occurred.
- 3. A relationship between y and xi (holding the other x's in the model fixed) exists, but it is more complex than a straight-line relationship (e.g., a curvilinear relationship may be appropriate). The most you can say about a b parameter test is that there is either sufficient (if you reject H0: bi = 0) or insufficient (if you do not reject H0: bi = 0) evidence of a linear (straight-line) relationship between y and xi.
15. Definition 4.1
- The multiple coefficient of determination, R², is defined as
  R² = 1 - SSE/SSyy
- where SSE = Σ(yi - ŷi)², SSyy = Σ(yi - ȳ)², and ŷi is the predicted value of yi for the multiple regression model.
16. Adjusted Multiple Coefficient of Determination
- Ra² = 1 - [(n - 1)/(n - (k + 1))](1 - R²)
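Both R² and its adjusted version follow directly from the definitions above. A minimal sketch with NumPy on simulated data (the model, seed, and coefficients are fabricated for illustration):

```python
import numpy as np

# Fabricated data: n = 25 observations, k = 3 predictors
rng = np.random.default_rng(1)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=n)

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b_hat
sse = np.sum((y - y_hat) ** 2)           # SSE = sum of squared errors
ss_yy = np.sum((y - y.mean()) ** 2)      # SSyy = total variation in y

r2 = 1 - sse / ss_yy                     # multiple coefficient of determination
r2_adj = 1 - ((n - 1) / (n - (k + 1))) * (1 - r2)  # adjusted R²
```

The adjustment penalizes extra b parameters, so Ra² never exceeds R².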
17. Global Test
- H0: b1 = b2 = b3 = 0
- Ha: At least one of the coefficients is nonzero
- Test statistic: F = MS(Model)/MSE
- Rejection region: F > Fa, where Fa is based on k numerator and n - (k + 1) denominator degrees of freedom.
18. Testing Global Usefulness of the Model: The Analysis of Variance F-Test
- H0: b1 = b2 = ... = bk = 0 (All model terms are unimportant for predicting y)
- Ha: At least one bi ≠ 0 (At least one model term is useful for predicting y)
- Test statistic:
  F = [(SSyy - SSE)/k] / [SSE/(n - (k + 1))] = (R²/k) / [(1 - R²)/(n - (k + 1))]
- where n is the sample size and k is the number of terms in the model.
19. Continued
- Rejection region: F > Fa, with k numerator degrees of freedom and n - (k + 1) denominator degrees of freedom.
- Assumptions: The standard regression assumptions about the random error component (Section 4.2).
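The global F statistic can be computed from R² exactly as in the formula above. A sketch with NumPy and SciPy; the data, seed, and coefficients are invented for illustration:

```python
import numpy as np
from scipy import stats

# Fabricated data: n = 30, k = 2 predictors with clearly nonzero coefficients
rng = np.random.default_rng(2)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 1.5, -2.0]) + rng.normal(size=n)

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - X @ b_hat) ** 2)
ss_yy = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / ss_yy

# F = (R²/k) / [(1 - R²)/(n - (k + 1))], with k and n - (k + 1) df
df1, df2 = k, n - (k + 1)
F = (r2 / k) / ((1 - r2) / df2)
p_value = stats.f.sf(F, df1, df2)        # P(F > observed value)
```

With the strong simulated signal, the test should reject H0: b1 = b2 = 0 at a = 0.05.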
20. Caution
- A rejection of the null hypothesis H0: b1 = b2 = ... = bk = 0 in the global F-test leads to the conclusion, with 100(1 - a)% confidence, that the model is statistically useful. However, statistically useful does not necessarily mean best. Another model may prove even more useful in terms of providing more reliable estimates and predictions. This global F-test is usually regarded as a test that the model must pass to merit further consideration.
21. Recommendation for Checking the Utility of a Multiple Regression Model
- First, conduct a test of overall model adequacy using the F-test, that is, test H0: b1 = b2 = ... = bk = 0. If the model is deemed adequate (that is, if you reject H0), then proceed to step 2. Otherwise, you should hypothesize and fit another model. The new model may include more independent variables or higher-order terms.
- Second, conduct t-tests on those b parameters in which you are particularly interested (that is, the most important b's). These usually involve only the b's associated with higher-order terms (x², x1x2, etc.). However, it is a safe practice to limit the number of b's that are tested. Conducting a series of t-tests leads to a high overall Type I error rate a.
22. An Interaction Model Relating E(y) to Two Quantitative Independent Variables
- E(y) = b0 + b1x1 + b2x2 + b3x1x2
- where
- (b1 + b3x2) represents the change in E(y) for every 1-unit increase in x1, holding x2 fixed
- (b2 + b3x1) represents the change in E(y) for every 1-unit increase in x2, holding x1 fixed
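The interpretation above can be verified numerically: under an interaction model, the slope in x1 depends on the level of x2. A small sketch using hypothetical coefficient values (all four b's are made up for illustration):

```python
# Hypothetical fitted coefficients for E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -0.4

def e_y(x1, x2):
    """Mean response under the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# The change in E(y) per 1-unit increase in x1 is (b1 + b3*x2),
# so it shifts as x2 changes -- the lines for x2 = 0, 1, 2 are not parallel
slopes = [e_y(1.0, v) - e_y(0.0, v) for v in (0.0, 1.0, 2.0)]
print(slopes)  # slope in x1 at x2 = 0, 1, 2
```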
23. Caution
- Once interaction has been deemed important in the model, do not conduct t-tests on the b coefficients of the first-order terms x1 and x2. These terms should be kept in the model regardless of the magnitude of their associated p-values shown on the printout.
24. A Quadratic (Second-Order) Model in a Single Quantitative Independent Variable
- E(y) = b0 + b1x + b2x²
- where b0 is the y-intercept of the curve
- b1 is a shift parameter
- b2 is the rate of curvature
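Fitting the quadratic model is ordinary least squares on a design matrix that includes an x² column. A sketch with NumPy on fabricated curvilinear data (the true curve and noise level are invented for illustration):

```python
import numpy as np

# Fabricated data: y roughly 2 + 3x - 0.5x² plus noise
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)
y = 2 + 3 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Design matrix for E(y) = b0 + b1*x + b2*x²
X = np.column_stack([np.ones_like(x), x, x**2])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_hat)  # [b0, b1, b2]; a negative b2 indicates downward curvature
```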
25. Global F-test
- H0: b1 = b2 = 0
- Ha: At least one of the above coefficients is nonzero
26. Using the Model for Estimation and Prediction
27. A First-Order Model Relating E(y) to Five Quantitative x's
- E(y) = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5
28. A Quadratic (Second-Order) Model Relating E(y) to One Quantitative x
- E(y) = b0 + b1x + b2x²
29. An Interaction Model Relating E(y) to Two Quantitative x's
- E(y) = b0 + b1x1 + b2x2 + b3x1x2
30. A Complete Second-Order Model with Two Quantitative x's
- E(y) = b0 + b1x1 + b2x2 + b3x1x2 + b4x1² + b5x2²
31. A Model Relating E(y) to a Qualitative Independent Variable with Two Levels
- E(y) = b0 + b1x
- where x = 1 if level A, x = 0 if level B (the base level)
- Interpretation of b's: b0 = mean of y at the base level B; b1 = difference between the mean of y at level A and the mean at level B.
32. A Model Relating E(y) to a Qualitative Independent Variable with Three Levels
- E(y) = b0 + b1x1 + b2x2
- where x1 = 1 if level A, 0 if not; x2 = 1 if level B, 0 if not (level C is the base level)
- Interpretation of b's: b0 = mean of y at the base level C; b1 = difference between the means of y at levels A and C; b2 = difference between the means of y at levels B and C.
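The dummy-variable coding above can be checked numerically: with 0-1 dummies, the least squares estimates reproduce the group means exactly. A sketch with NumPy on made-up data for three levels A, B, C:

```python
import numpy as np

# Hypothetical data: y observed at three levels of a qualitative variable
levels = np.array(["A", "A", "B", "B", "C", "C", "A", "B", "C"])
y = np.array([10.0, 12.0, 20.0, 22.0, 30.0, 28.0, 11.0, 21.0, 29.0])

# Dummy coding with C as the base level: x1 = 1 iff A, x2 = 1 iff B
x1 = (levels == "A").astype(float)
x2 = (levels == "B").astype(float)
X = np.column_stack([np.ones(len(y)), x1, x2])

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
# b0 estimates the mean of y at level C; b1 and b2 estimate the
# A-minus-C and B-minus-C mean differences, respectively
mean_c = y[levels == "C"].mean()
print(b_hat, mean_c)
```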
33. A Multiplicative (Log) Model Relating y to Several Independent Variables
- ln(y) = b0 + b1x1 + b2x2 + ... + bkxk + e
- where ln(y) = natural logarithm of y
- Interpretation of b's:
- (e^bi - 1) × 100 = percentage change in y for every 1-unit increase in xi, holding all other x's fixed
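The percentage interpretation follows because a 1-unit increase in xi multiplies y by e^bi. A minimal check using a hypothetical coefficient value (b1 = 0.05 is made up for illustration):

```python
import math

# Hypothetical fitted coefficient in a log model ln(y) = b0 + b1*x1 + ...
b1 = 0.05

# (e^b1 - 1) * 100 = percentage change in y per 1-unit increase in x1
pct_change = (math.exp(b1) - 1) * 100

# Equivalent multiplicative view: y is multiplied by e^b1
ratio = math.exp(b1)   # y_new / y_old
print(pct_change)      # about 5.13%, noticeably more than b1*100 = 5%
```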
34. Definition 4.3
- Two models are nested if one model contains all the terms of the other model and at least one additional term. The more complex of the two models is called the complete (or full) model. The simpler of the two models is called the reduced (or restricted) model.
35. F Test for Comparing Nested Models
- Reduced model: E(y) = b0 + b1x1 + ... + bgxg
- Complete model: E(y) = b0 + b1x1 + ... + bgxg + bg+1xg+1 + ... + bkxk
- H0: bg+1 = bg+2 = ... = bk = 0
- Ha: At least one of the b parameters being tested is nonzero.
- Test statistic:
  F = [(SSER - SSEC)/(k - g)] / MSEC
36. Continued
- where SSER = sum of squared errors for the reduced model
- SSEC = sum of squared errors for the complete model
- MSEC = mean square error for the complete model = SSEC/[n - (k + 1)]
- k - g = number of b parameters specified in H0 (i.e., number of b's tested)
- k + 1 = number of b parameters in the complete model (including b0)
- n = total sample size
- Rejection region: F > Fa, where
- ν1 = k - g = degrees of freedom for the numerator
- ν2 = n - (k + 1) = degrees of freedom for the denominator
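The nested-model F statistic compares the drop in SSE against the complete model's MSE. A sketch with NumPy and SciPy; the reduced model (x1 only) and the complete model (adding x2 and x1x2) are invented for illustration, with the extra terms truly zero in the simulated data:

```python
import numpy as np
from scipy import stats

# Fabricated data: only x1 matters; x2 and x1*x2 have true coefficients 0
rng = np.random.default_rng(4)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)

def sse(X, y):
    """Sum of squared errors from a least squares fit."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1])                # g = 1 term
X_complete = np.column_stack([ones, x1, x2, x1 * x2])  # k = 3 terms

sse_r = sse(X_reduced, y)   # SSER
sse_c = sse(X_complete, y)  # SSEC (never larger than SSER)

k, g = 3, 1
df2 = n - (k + 1)
F = ((sse_r - sse_c) / (k - g)) / (sse_c / df2)
p_value = stats.f.sf(F, k - g, df2)
print(F, p_value)
```

A large p-value here would favor the reduced model, in line with Definition 4.4's parsimony principle.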
37. Definition 4.4
- A parsimonious model is a model with a small number of b parameters. In situations where two competing models have essentially the same predictive power (as determined by an F-test), choose the more parsimonious of the two.