1
Multiple Regression Models
  • Chapter 4

2
General Form of the Multiple Regression Model
  • y = β0 + β1x1 + β2x2 + ... + βkxk + ε
  • where y is the dependent variable
  • x1, x2, ..., xk are the independent variables
  • E(y) = β0 + β1x1 + β2x2 + ... + βkxk is the
    deterministic portion of the model
  • βi determines the contribution of the
    independent variable xi
  • Note: The symbols x1, x2, ..., xk may represent
    higher-order terms. For example, x1 might
    represent the current interest rate, x2 might
    represent x1², and so forth.
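As a concrete illustration of the model's two components, the sketch below simulates data from a two-variable first-order model. The coefficient values, error standard deviation, sample size, and seed are illustrative assumptions, not values from the presentation.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the slides)
beta = np.array([1.0, 2.0, 1.0])   # beta0, beta1, beta2
sigma = 0.5                        # standard deviation of epsilon
n = 30                             # sample size

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)

# Deterministic portion: E(y) = beta0 + beta1*x1 + beta2*x2
E_y = beta[0] + beta[1] * x1 + beta[2] * x2

# Observed response: y = E(y) + epsilon, with epsilon ~ N(0, sigma^2)
y = E_y + rng.normal(0.0, sigma, n)
```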

3
Analyzing a Multiple Regression Model
  • STEP 1 Collect the sample data, i.e., the values
    of y, x1, x2, ..., xk, for each experimental unit
    in the sample.
  • STEP 2 Hypothesize the form of the model, i.e.,
    the deterministic component, E(y). This involves
    choosing which independent variables to include
    in the model.
  • STEP 3 Use the method of least squares to
    estimate the unknown parameters β0, β1, ..., βk.
  • STEP 4 Specify the probability distribution of
    the random error component ε and estimate its
    variance σ².

4
Continued
  • STEP 5 Statistically evaluate the utility of the
    model.
  • STEP 6 Check that the assumptions on the random
    error ε are satisfied and make model
    modifications, if necessary.
  • STEP 7 Finally, if the model is deemed adequate,
    use the fitted model to estimate the mean value
    of y or to predict a particular value of y for
    given values of the independent variables, and to
    make other inferences.
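Steps 3 through 7 are normally carried out with statistical software. Below is a minimal sketch of one such workflow using Python's statsmodels on simulated data; the data, variable names, and model are illustrative assumptions, not the presentation's example.

```python
import numpy as np
import statsmodels.api as sm

# STEP 1 (simulated stand-in for sample data)
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

# STEP 2: hypothesize a first-order model E(y) = b0 + b1*x1 + b2*x2
X = sm.add_constant(np.column_stack([x1, x2]))

# STEP 3: least squares estimates of beta0, beta1, beta2
results = sm.OLS(y, X).fit()
print(results.params)

# STEP 4: estimate of sigma^2 (mean square for error)
print(results.mse_resid)

# STEP 5: model utility (t-tests, global F-test, R^2)
print(results.summary())

# STEP 6: check residual assumptions, e.g., plots of results.resid (omitted here)

# STEP 7: predict y for given values of the x's (constant, x1, x2)
print(results.predict([[1.0, 5.0, 2.0]]))
```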

5
Assumptions About the Random Error ε
  • For any given set of values of x1, x2, ..., xk, ε
    has a normal probability distribution with mean
    equal to 0 (i.e., E(ε) = 0) and variance equal to
    σ² (i.e., Var(ε) = σ²).
  • The random errors are independent (in a
    probabilistic sense).

6
A First-Order Model in Five Quantitative
Independent Variables
  • E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
  • where x1, x2, ..., x5 are all quantitative
    variables that are not functions of other
    independent variables.
  • Note: βi represents the slope of the line
    relating y to xi when all the other x's are held
    fixed.

7
Graphs of E(y) = 1 + 2x1 + x2 for x2 = 0, 1, 2
8
The Method of Least Squares
  • That is, we choose the estimated model
    ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk
  • that minimizes
    SSE = Σ(yi - ŷi)²
  • ŷ = β̂0 + β̂1x1 + ... + β̂kxk is the
    least squares prediction equation
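A minimal numpy sketch of the least squares computation on simulated, illustrative data: the estimates are chosen to minimize SSE, and the fitted values come from the least squares prediction equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

# Design matrix with a leading column of 1's for beta0
X = np.column_stack([np.ones(n), x1, x2])

# Least squares estimates minimize SSE = sum((y - X @ b)**2)
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b_hat                 # least squares prediction equation
sse = np.sum((y - y_hat) ** 2)    # minimized sum of squared errors
print(b_hat, sse)
```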

9
Scatterplots for the Data of Table 4.1
10
Estimator of σ² for a Multiple Regression Model
with k Independent Variables
s² = SSE / [n - (k + 1)]
   = SSE / (n - Number of estimated β parameters)
s² is called the mean square for error (MSE)
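A short sketch of the estimator on simulated, illustrative data: s² divides the minimized SSE by n - (k + 1) degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 2                      # sample size and number of x's
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = np.sum((y - X @ b_hat) ** 2)
s2 = sse / (n - (k + 1))          # MSE, the estimator of sigma^2
s = np.sqrt(s2)                   # estimated standard deviation of epsilon
print(s2, s)
```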
11
Test of an Individual Parameter Coefficient in
the Multiple Regression Model
  • TWO-TAILED TEST
    H0: βi = 0    Ha: βi ≠ 0
  • Test statistic: t = β̂i / s(β̂i)
  • Rejection region: |t| > t_α/2
  • where t_α/2 is based on n - (k + 1) degrees of
    freedom
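A sketch of the individual t-tests computed from the formulas above, on simulated, illustrative data; scipy supplies the t distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

df = n - (k + 1)
s2 = np.sum((y - X @ b_hat) ** 2) / df                # MSE
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))    # s(beta_hat_i)

t_stat = b_hat / se                                   # H0: beta_i = 0
p_two_tailed = 2 * stats.t.sf(np.abs(t_stat), df)
reject = np.abs(t_stat) > stats.t.ppf(1 - 0.05 / 2, df)  # alpha = 0.05
print(t_stat, p_two_tailed, reject)
```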

12
A 100(1 - α)% Confidence Interval for a β Parameter
  • β̂i ± t_α/2 s(β̂i)
  • where t_α/2 is based on n - (k + 1) degrees of
    freedom and
  • n = Number of observations
  • k + 1 = Number of β parameters in the model
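A sketch of the confidence intervals computed from the same quantities, again on simulated, illustrative data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

df = n - (k + 1)
s2 = np.sum((y - X @ b_hat) ** 2) / df
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)
lower, upper = b_hat - t_crit * se, b_hat + t_crit * se  # 95% CIs
print(np.column_stack([lower, upper]))
```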

13
Caution
  • Extreme care should be exercised when conducting
    t-tests on the individual β parameters in a
    first-order linear model for the purpose of
    determining which independent variables are
    useful for predicting y and which are not. If you
    fail to reject H0: βi = 0, several conclusions
    are possible:
  • 1. There is no relationship between y and xi.

14
Continued
  • 2. A straight-line relationship between y and xi
    exists (holding the other x's in the model
    fixed), but a Type II error occurred.
  • 3. A relationship between y and xi (holding the
    other x's in the model fixed) exists, but is more
    complex than a straight-line relationship (e.g.,
    a curvilinear relationship may be appropriate).
    The most you can say about a β parameter test is
    that there is either sufficient (if you reject
    H0: βi = 0) or insufficient (if you do not reject
    H0: βi = 0) evidence of a linear (straight-line)
    relationship between y and xi.

15
Definition 4.1
  • The multiple coefficient of determination, R²,
    is defined as
    R² = 1 - SSE / SS_yy
  • where SSE = Σ(yi - ŷi)², SS_yy = Σ(yi - ȳ)²,
    and ŷi is the predicted value of yi for the
    multiple regression model.

16
Adjusted Multiple Coefficient of Determination
  • Ra² = 1 - [(n - 1) / (n - (k + 1))] (1 - R²)
  • Note: Ra² ≤ R²
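A sketch computing R² (Definition 4.1) and the adjusted version from their definitions, on simulated, illustrative data.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat

sse = np.sum((y - y_hat) ** 2)        # SSE
ss_yy = np.sum((y - y.mean()) ** 2)   # SS_yy

r2 = 1 - sse / ss_yy                                 # Definition 4.1
r2_adj = 1 - ((n - 1) / (n - (k + 1))) * (1 - r2)    # adjusted R², <= R²
print(r2, r2_adj)
```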

17
Global Test
  • H0: β1 = β2 = β3 = 0
  • Ha: At least one of the coefficients is nonzero
  • Test statistic:
    F = (R²/k) / [(1 - R²) / (n - (k + 1))]
  • Rejection region: F > F_α, where F_α is based on
    k numerator and n - (k + 1) denominator degrees
    of freedom.

18
Testing Global Usefulness of the Model: The
Analysis of Variance F-Test
  • H0: β1 = β2 = ... = βk = 0 (All model terms are
    unimportant for predicting y)
  • Ha: At least one βi ≠ 0 (At least one model
    term is useful for predicting y)
  • Test statistic:
    F = (R²/k) / [(1 - R²) / (n - (k + 1))]
      = Mean Square (Model) / MSE
  • where n is the sample size and k is the number of
    terms in the model.

19
Continued
  • Rejection region: F > F_α, with k numerator
    degrees of freedom and n - (k + 1) denominator
    degrees of freedom.
  • Assumptions: The standard regression assumptions
    about the random error component (Section 4.2)
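The global F-test can be computed directly from R²; a sketch on simulated, illustrative data follows, using scipy for the F distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat

sse = np.sum((y - y_hat) ** 2)
ss_yy = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / ss_yy

# Global F-test of H0: beta1 = ... = betak = 0
F = (r2 / k) / ((1 - r2) / (n - (k + 1)))
p_value = stats.f.sf(F, k, n - (k + 1))
alpha = 0.05
reject = F > stats.f.ppf(1 - alpha, k, n - (k + 1))
print(F, p_value, reject)
```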

20
Caution
  • A rejection of the null hypothesis H0: β1 = β2 =
    ... = βk = 0 in the global F-test leads to the
    conclusion with 100(1 - α)% confidence that the
    model is statistically useful. However,
    statistically useful does not necessarily mean
    best. Another model may prove even more useful
    in terms of providing more reliable estimates and
    predictions. This global F-test is usually
    regarded as a test that the model must pass to
    merit further consideration.

21
Recommendation for Checking the Utility of a
Multiple Regression Model
  • First, conduct a test of overall model adequacy
    using the F-test, that is, test H0: β1 = β2 = ...
    = βk = 0. If the model is deemed adequate (that
    is, if you reject H0), then proceed to step 2.
    Otherwise, you should hypothesize and fit another
    model. The new model may include more independent
    variables or higher-order terms.
  • Conduct t-tests on those β parameters in which
    you are particularly interested (that is, the
    most important β's). These usually involve only
    the β's associated with higher-order terms (x²,
    x1x2, etc.). However, it is a safe practice to
    limit the number of β's that are tested.
    Conducting a series of t-tests leads to a high
    overall Type I error rate α.

22
An Interaction Model Relating E(y) to Two
Quantitative Independent Variables
  • E(y) = β0 + β1x1 + β2x2 + β3x1x2
  • where
  • (β1 + β3x2) represents the change in E(y) for
    every 1-unit increase in x1, holding x2 fixed
  • (β2 + β3x1) represents the change in E(y) for
    every 1-unit increase in x2, holding x1 fixed
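A sketch of fitting the interaction model by appending an x1·x2 column to the design matrix; the data, coefficients, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(0, 0.5, n)

# E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2: interaction is the x1*x2 column
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Estimated change in E(y) per 1-unit increase in x1 depends on x2
for x2_fixed in (0.0, 2.0, 4.0):
    print(f"x2 = {x2_fixed}: slope for x1 = {b1 + b3 * x2_fixed:.3f}")
```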

23
Caution
  • Once interaction has been deemed important in the
    model E(y) = β0 + β1x1 + β2x2 + β3x1x2, do not
    conduct t-tests on the β coefficients of the
    first-order terms x1 and x2. These terms should
    be kept in the model regardless of the magnitude
    of their associated p-values shown on the
    printout.

24
A Quadratic (Second-Order) Model in a Single
Quantitative Independent Variable
  • E(y) = β0 + β1x + β2x²
  • where β0 is the y-intercept of the curve
  • β1 is a shift parameter
  • β2 is the rate of curvature
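A sketch of fitting the quadratic model by adding an x² column to the design matrix, on simulated, illustrative data.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 40
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x - 0.2 * x ** 2 + rng.normal(0, 0.3, n)

# E(y) = b0 + b1*x + b2*x^2: the squared term is just another column
X = np.column_stack([np.ones(n), x, x ** 2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat)   # a negative b2_hat suggests downward (concave) curvature
```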

25
Global F-test
  • H0: β1 = β2 = 0
  • Ha: At least one of the above coefficients is
    nonzero

26
Using the Model for Estimation and Prediction
27
A First-Order Model Relating E(y) to Five
Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
28
A Quadratic (Second-Order) Model Relating E(y) to
One Quantitative x
E(y) = β0 + β1x + β2x²
29
An Interaction Model Relating E(y) to Two
Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x1x2
30
A Complete Second-Order Model with Two
Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
31
A Model Relating E(y) to a Qualitative
Independent Variable with Two Levels
  • E(y) = β0 + β1x
  • where x = 1 if level A, x = 0 if level B (the
    base level)
  • Interpretation of β's:
    β0 = μB = mean of y at the base level
    β1 = μA - μB = difference between the level means

32
A Model Relating E(y) to a Qualitative
Independent Variable with Three Levels
  • E(y) = β0 + β1x1 + β2x2
  • where x1 = 1 if level B, 0 if not
    x2 = 1 if level C, 0 if not
    (level A is the base level)
  • Interpretation of β's:
    β0 = μA, β1 = μB - μA, β2 = μC - μA
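A sketch of dummy (0-1) coding for a three-level qualitative variable with level A as the base level; the level names, data values, and coding are illustrative assumptions.

```python
import numpy as np

# Illustrative data: y observations grouped by a three-level factor
levels = np.array(["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"])
y = np.array([10.1, 12.3, 15.2, 9.8, 12.9, 14.7, 10.4, 11.8, 15.5, 10.0])

# Dummy variables: x1 = 1 if level B, x2 = 1 if level C (A is the base)
x1 = (levels == "B").astype(float)
x2 = (levels == "C").astype(float)

X = np.column_stack([np.ones(len(y)), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# b0 estimates mean(y | A); b1 and b2 estimate differences from level A
print(b0, b0 + b1, b0 + b2)   # estimated means for levels A, B, C
```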

33
A Multiplicative (Log) Model Relating y to
Several Independent Variables
  • E(ln y) = β0 + β1x1 + β2x2 + ... + βkxk
  • where ln(y) = natural logarithm of y
  • Interpretation of β's:
  • (e^βi - 1) × 100% = Percentage change in y for
    every 1-unit increase in xi, holding all other
    x's fixed
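A small sketch of the (e^βi - 1) × 100% interpretation; the coefficient value 0.12 is an illustrative assumption, not a value from the presentation.

```python
import numpy as np

beta_i = 0.12   # illustrative coefficient from a fitted ln(y) model

# Percentage change in y for a 1-unit increase in x_i, other x's fixed
pct_change = (np.exp(beta_i) - 1) * 100
print(f"{pct_change:.1f}% change in y per 1-unit increase in x_i")
# about 12.7%, not simply 12%: the multiplicative effect is e^beta_i
```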

34
Definition 4.3
  • Two models are nested if one model contains all
    the terms in the second model and at least one
    additional term. The more complex of the two
    models is called the complete (or full) model.
    The simpler of the two models is called the
    reduced (or restricted) model.

35
F Test for Comparing Nested Models
  • Reduced model: E(y) = β0 + β1x1 + ... + βgxg
  • Complete model: E(y) = β0 + β1x1 + ... + βgxg
    + βg+1xg+1 + ... + βkxk
  • H0: βg+1 = βg+2 = ... = βk = 0
  • Ha: At least one of the β parameters being tested
    is nonzero.
  • Test statistic:
    F = [(SSER - SSEC) / (k - g)] / MSEC

36
Continued
  • where SSER = Sum of squared errors for the
    reduced model
  • SSEC = Sum of squared errors for the complete
    model
  • MSEC = Mean square error (MSE) for the complete
    model
  • k - g = Number of β parameters specified in H0
    (i.e., number of β's tested)
  • k + 1 = Number of β parameters in the complete
    model (including β0)
  • n = Total sample size
  • Rejection region: F > F_α, where
  • ν1 = k - g = Degrees of freedom for the
    numerator
  • ν2 = n - (k + 1) = Degrees of freedom for the
    denominator
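A sketch of the nested-model F-test on simulated, illustrative data: the complete model adds an interaction term and a quadratic term to a first-order reduced model (the particular added terms, data, and seed are assumptions for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + 0.5 * x1 * x2 + rng.normal(0, 0.6, n)

def sse(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1, x2])                     # g = 2 terms
X_complete = np.column_stack([ones, x1, x2, x1 * x2, x1 ** 2])  # k = 4 terms

g, k = 2, 4
sse_r, sse_c = sse(X_reduced, y), sse(X_complete, y)
mse_c = sse_c / (n - (k + 1))

# H0: the k - g extra betas are all 0
F = ((sse_r - sse_c) / (k - g)) / mse_c
p_value = stats.f.sf(F, k - g, n - (k + 1))
print(F, p_value)
```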

37
Definition 4.4
  • A parsimonious model is a model with a small
    number of β parameters. In situations where two
    competing models have essentially the same
    predictive power (as determined by an F-test),
    choose the more parsimonious of the two.