1
Multiple Regression Models
  • Chapter 4

2
General Form of the Multiple Regression Model
  • y = β0 + β1x1 + β2x2 + ... + βkxk + ε
  • where y is the dependent variable
  • x1, x2, ..., xk are the independent variables
  • E(y) = β0 + β1x1 + β2x2 + ... + βkxk is the
    deterministic portion of the model
  • βi determines the contribution of the
    independent variable xi
  • Note: The symbols x1, x2, ..., xk may represent
    higher-order terms. For example, x1 might
    represent the current interest rate, x2 might
    represent x1², and so forth.
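As a concrete illustration of the model's two components, the sketch below simulates data from a two-variable first-order model. The coefficient values, error standard deviation, sample size, and seed are illustrative assumptions, not values from the presentation.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the slides)
beta = np.array([1.0, 2.0, 1.0])   # beta0, beta1, beta2
sigma = 0.5                        # standard deviation of epsilon
n = 30                             # sample size

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)

# Deterministic portion: E(y) = beta0 + beta1*x1 + beta2*x2
E_y = beta[0] + beta[1] * x1 + beta[2] * x2

# Observed response: y = E(y) + epsilon, with epsilon ~ N(0, sigma^2)
y = E_y + rng.normal(0.0, sigma, n)
```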

3
Analyzing a Multiple Regression Model
  • STEP 1 Collect the sample data, i.e., the values
    of y, x1, x2, ..., xk, for each experimental unit
    in the sample.
  • STEP 2 Hypothesize the form of the model, i.e.,
    the deterministic component, E(y). This involves
    choosing which independent variables to include
    in the model.
  • STEP 3 Use the method of least squares to
    estimate the unknown parameters β0, β1, ..., βk.
  • STEP 4 Specify the probability distribution of
    the random error component ε and estimate its
    variance σ².

4
Continued
  • STEP 5 Statistically evaluate the utility of the
    model.
  • STEP 6 Check that the assumptions on the random
    error ε are satisfied and make model
    modifications, if necessary.
  • STEP 7 Finally, if the model is deemed adequate,
    use the fitted model to estimate the mean value
    of y or to predict a particular value of y for
    given values of the independent variables, and to
    make other inferences.
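Steps 3 through 7 are normally carried out with statistical software. Below is a minimal sketch of one such workflow using Python's statsmodels on simulated data; the data, variable names, and model are illustrative assumptions, not the presentation's example.

```python
import numpy as np
import statsmodels.api as sm

# STEP 1 (simulated stand-in for sample data)
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

# STEP 2: hypothesize a first-order model E(y) = b0 + b1*x1 + b2*x2
X = sm.add_constant(np.column_stack([x1, x2]))

# STEP 3: least squares estimates of beta0, beta1, beta2
results = sm.OLS(y, X).fit()
print(results.params)

# STEP 4: estimate of sigma^2 (mean square for error)
print(results.mse_resid)

# STEP 5: model utility (t-tests, global F-test, R^2)
print(results.summary())

# STEP 6: check residual assumptions, e.g., plots of results.resid (omitted here)

# STEP 7: predict y for given values of the x's (constant, x1, x2)
print(results.predict([[1.0, 5.0, 2.0]]))
```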

5
Assumptions About the Random Error ε
  • For any given set of values of x1, x2, ..., xk, ε
    has a normal probability distribution with mean
    equal to 0 (i.e., E(ε) = 0) and variance equal to
    σ² (i.e., Var(ε) = σ²).
  • The random errors are independent (in a
    probabilistic sense).

6
A First-Order Model in Five Quantitative
Independent Variables
  • E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
  • where x1, x2, ..., x5 are all quantitative
    variables that are not functions of other
    independent variables.
  • Note: βi represents the slope of the line
    relating y to xi when all the other x's are held
    fixed.

7
Graphs of E(y) = 1 + 2x1 + x2 for x2 = 0, 1, 2
8
The Method of Least Squares
  • That is, we choose the estimated model
    ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk
  • that minimizes
    SSE = Σ(yi - ŷi)²
  • ŷ = β̂0 + β̂1x1 + ... + β̂kxk is the
    least squares prediction equation
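A minimal numpy sketch of the least squares computation on simulated, illustrative data: the estimates are chosen to minimize SSE, and the fitted values come from the least squares prediction equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

# Design matrix with a leading column of 1's for beta0
X = np.column_stack([np.ones(n), x1, x2])

# Least squares estimates minimize SSE = sum((y - X @ b)**2)
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b_hat                 # least squares prediction equation
sse = np.sum((y - y_hat) ** 2)    # minimized sum of squared errors
print(b_hat, sse)
```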

9
Scatterplots for the Data of Table 4.1
10
Estimator of σ² for a Multiple Regression Model
with k Independent Variables
s² = SSE / [n - (k + 1)]
   = SSE / (n - Number of estimated β parameters)
s² is called the mean square for error (MSE)
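A short sketch of the estimator on simulated, illustrative data: s² divides the minimized SSE by n - (k + 1) degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 2                      # sample size and number of x's
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = np.sum((y - X @ b_hat) ** 2)
s2 = sse / (n - (k + 1))          # MSE, the estimator of sigma^2
s = np.sqrt(s2)                   # estimated standard deviation of epsilon
print(s2, s)
```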
11
Test of an Individual Parameter Coefficient in
the Multiple Regression Model
  • TWO-TAILED TEST
    H0: βi = 0    Ha: βi ≠ 0
  • Test statistic: t = β̂i / s(β̂i)
  • Rejection region: |t| > t_α/2
  • where t_α/2 is based on n - (k + 1) degrees of
    freedom
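A sketch of the individual t-tests computed from the formulas above, on simulated, illustrative data; scipy supplies the t distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

df = n - (k + 1)
s2 = np.sum((y - X @ b_hat) ** 2) / df                # MSE
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))    # s(beta_hat_i)

t_stat = b_hat / se                                   # H0: beta_i = 0
p_two_tailed = 2 * stats.t.sf(np.abs(t_stat), df)
reject = np.abs(t_stat) > stats.t.ppf(1 - 0.05 / 2, df)  # alpha = 0.05
print(t_stat, p_two_tailed, reject)
```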

12
A 100(1 - α)% Confidence Interval for a β Parameter
  • β̂i ± t_α/2 s(β̂i)
  • where t_α/2 is based on n - (k + 1) degrees of
    freedom and
  • n = Number of observations
  • k + 1 = Number of β parameters in the model
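A sketch of the confidence intervals computed from the same quantities, again on simulated, illustrative data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

df = n - (k + 1)
s2 = np.sum((y - X @ b_hat) ** 2) / df
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)
lower, upper = b_hat - t_crit * se, b_hat + t_crit * se  # 95% CIs
print(np.column_stack([lower, upper]))
```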

13
Caution
  • Extreme care should be exercised when conducting
    t-tests on the individual β parameters in a
    first-order linear model for the purpose of
    determining which independent variables are
    useful for predicting y and which are not. If you
    fail to reject H0: βi = 0, several conclusions
    are possible:
  • 1. There is no relationship between y and xi.

14
Continued
  • 2. A straight-line relationship between y and xi
    exists (holding the other x's in the model
    fixed), but a Type II error occurred.
  • 3. A relationship between y and xi (holding the
    other x's in the model fixed) exists, but is more
    complex than a straight-line relationship (e.g.,
    a curvilinear relationship may be appropriate).
    The most you can say about a β parameter test is
    that there is either sufficient (if you reject
    H0: βi = 0) or insufficient (if you do not reject
    H0: βi = 0) evidence of a linear (straight-line)
    relationship between y and xi.

15
Definition 4.1
  • The multiple coefficient of determination, R²,
    is defined as
    R² = 1 - SSE / SS_yy
  • where SSE = Σ(yi - ŷi)², SS_yy = Σ(yi - ȳ)²,
    and ŷi is the predicted value of yi for the
    multiple regression model.

16
Adjusted Multiple Coefficient of Determination
  • Ra² = 1 - [(n - 1) / (n - (k + 1))] (1 - R²)
  • Note: Ra² ≤ R²
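A sketch computing R² (Definition 4.1) and the adjusted version from their definitions, on simulated, illustrative data.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat

sse = np.sum((y - y_hat) ** 2)        # SSE
ss_yy = np.sum((y - y.mean()) ** 2)   # SS_yy

r2 = 1 - sse / ss_yy                                 # Definition 4.1
r2_adj = 1 - ((n - 1) / (n - (k + 1))) * (1 - r2)    # adjusted R², <= R²
print(r2, r2_adj)
```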

17
Global Test
  • H0: β1 = β2 = β3 = 0
  • Ha: At least one of the coefficients is nonzero
  • Test statistic:
    F = (R²/k) / [(1 - R²) / (n - (k + 1))]
  • Rejection region: F > F_α, where F_α is based on
    k numerator and n - (k + 1) denominator degrees
    of freedom.

18
Testing Global Usefulness of the Model: The
Analysis of Variance F-Test
  • H0: β1 = β2 = ... = βk = 0 (All model terms are
    unimportant for predicting y)
  • Ha: At least one βi ≠ 0 (At least one model
    term is useful for predicting y)
  • Test statistic:
    F = (R²/k) / [(1 - R²) / (n - (k + 1))]
      = Mean Square (Model) / MSE
  • where n is the sample size and k is the number of
    terms in the model.

19
Continued
  • Rejection region: F > F_α, with k numerator
    degrees of freedom and n - (k + 1) denominator
    degrees of freedom.
  • Assumptions: The standard regression assumptions
    about the random error component (Section 4.2)
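The global F-test can be computed directly from R²; a sketch on simulated, illustrative data follows, using scipy for the F distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 30, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat

sse = np.sum((y - y_hat) ** 2)
ss_yy = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / ss_yy

# Global F-test of H0: beta1 = ... = betak = 0
F = (r2 / k) / ((1 - r2) / (n - (k + 1)))
p_value = stats.f.sf(F, k, n - (k + 1))
alpha = 0.05
reject = F > stats.f.ppf(1 - alpha, k, n - (k + 1))
print(F, p_value, reject)
```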

20
Caution
  • A rejection of the null hypothesis H0: β1 = β2 =
    ... = βk = 0 in the global F-test leads to the
    conclusion with 100(1 - α)% confidence that the
    model is statistically useful. However,
    statistically useful does not necessarily mean
    best. Another model may prove even more useful
    in terms of providing more reliable estimates and
    predictions. This global F-test is usually
    regarded as a test that the model must pass to
    merit further consideration.

21
Recommendation for Checking the Utility of a
Multiple Regression Model
  • First, conduct a test of overall model adequacy
    using the F-test, that is, test H0: β1 = β2 = ...
    = βk = 0. If the model is deemed adequate (that
    is, if you reject H0), then proceed to step 2.
    Otherwise, you should hypothesize and fit another
    model. The new model may include more independent
    variables or higher-order terms.
  • Conduct t-tests on those β parameters in which
    you are particularly interested (that is, the
    most important β's). These usually involve only
    the β's associated with higher-order terms (x²,
    x1x2, etc.). However, it is a safe practice to
    limit the number of β's that are tested.
    Conducting a series of t-tests leads to a high
    overall Type I error rate α.

22
An Interaction Model Relating E(y) to Two
Quantitative Independent Variables
  • E(y) = β0 + β1x1 + β2x2 + β3x1x2
  • where
  • (β1 + β3x2) represents the change in E(y) for
    every 1-unit increase in x1, holding x2 fixed
  • (β2 + β3x1) represents the change in E(y) for
    every 1-unit increase in x2, holding x1 fixed
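A sketch of fitting the interaction model by appending an x1·x2 column to the design matrix; the data, coefficients, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(0, 0.5, n)

# E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2: interaction is the x1*x2 column
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Estimated change in E(y) per 1-unit increase in x1 depends on x2
for x2_fixed in (0.0, 2.0, 4.0):
    print(f"x2 = {x2_fixed}: slope for x1 = {b1 + b3 * x2_fixed:.3f}")
```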

23
Caution
  • Once interaction has been deemed important in the
    model E(y) = β0 + β1x1 + β2x2 + β3x1x2, do not
    conduct t-tests on the β coefficients of the
    first-order terms x1 and x2. These terms should
    be kept in the model regardless of the magnitude
    of their associated p-values shown on the
    printout.

24
A Quadratic (Second-Order) Model in a Single
Quantitative Independent Variable
  • E(y) = β0 + β1x + β2x²
  • where β0 is the y-intercept of the curve
  • β1 is a shift parameter
  • β2 is the rate of curvature
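A sketch of fitting the quadratic model by adding an x² column to the design matrix, on simulated, illustrative data.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 40
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x - 0.2 * x ** 2 + rng.normal(0, 0.3, n)

# E(y) = b0 + b1*x + b2*x^2: the squared term is just another column
X = np.column_stack([np.ones(n), x, x ** 2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat)   # a negative b2_hat suggests downward (concave) curvature
```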

25
Global F-test
  • H0: β1 = β2 = 0
  • Ha: At least one of the above coefficients is
    nonzero

26
Using the Model for Estimation and Prediction
27
A First-Order Model Relating E(y) to Five
Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
28
A Quadratic (Second-Order) Model Relating E(y) to
One Quantitative x
E(y) = β0 + β1x + β2x²
29
An Interaction Model Relating E(y) to Two
Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x1x2
30
A Complete Second-Order Model with Two
Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
31
A Model Relating E(y) to a Qualitative
Independent Variable with Two Levels
  • E(y) = β0 + β1x
  • where x = 1 if level A, x = 0 if level B (the
    base level)
  • Interpretation of β's:
    β0 = μB = mean of y at the base level
    β1 = μA - μB = difference between the level means

32
A Model Relating E(y) to a Qualitative
Independent Variable with Three Levels
  • E(y) = β0 + β1x1 + β2x2
  • where x1 = 1 if level B, 0 if not
    x2 = 1 if level C, 0 if not
    (level A is the base level)
  • Interpretation of β's:
    β0 = μA, β1 = μB - μA, β2 = μC - μA
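A sketch of dummy (0-1) coding for a three-level qualitative variable with level A as the base level; the level names, data values, and coding are illustrative assumptions.

```python
import numpy as np

# Illustrative data: y observations grouped by a three-level factor
levels = np.array(["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"])
y = np.array([10.1, 12.3, 15.2, 9.8, 12.9, 14.7, 10.4, 11.8, 15.5, 10.0])

# Dummy variables: x1 = 1 if level B, x2 = 1 if level C (A is the base)
x1 = (levels == "B").astype(float)
x2 = (levels == "C").astype(float)

X = np.column_stack([np.ones(len(y)), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# b0 estimates mean(y | A); b1 and b2 estimate differences from level A
print(b0, b0 + b1, b0 + b2)   # estimated means for levels A, B, C
```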

33
A Multiplicative (Log) Model Relating y to
Several Independent Variables
  • E(ln y) = β0 + β1x1 + β2x2 + ... + βkxk
  • where ln(y) = natural logarithm of y
  • Interpretation of β's:
  • (e^βi - 1) × 100% = Percentage change in y for
    every 1-unit increase in xi, holding all other
    x's fixed
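A small sketch of the (e^βi - 1) × 100% interpretation; the coefficient value 0.12 is an illustrative assumption, not a value from the presentation.

```python
import numpy as np

beta_i = 0.12   # illustrative coefficient from a fitted ln(y) model

# Percentage change in y for a 1-unit increase in x_i, other x's fixed
pct_change = (np.exp(beta_i) - 1) * 100
print(f"{pct_change:.1f}% change in y per 1-unit increase in x_i")
# about 12.7%, not simply 12%: the multiplicative effect is e^beta_i
```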

34
Definition 4.3
  • Two models are nested if one model contains all
    the terms in the second model and at least one
    additional term. The more complex of the two
    models is called the complete (or full) model.
    The simpler of the two models is called the
    reduced (or restricted) model.

35
F Test for Comparing Nested Models
  • Reduced model: E(y) = β0 + β1x1 + ... + βgxg
  • Complete model: E(y) = β0 + β1x1 + ... + βgxg
    + βg+1xg+1 + ... + βkxk
  • H0: βg+1 = βg+2 = ... = βk = 0
  • Ha: At least one of the β parameters being tested
    is nonzero.
  • Test statistic:
    F = [(SSER - SSEC) / (k - g)] / MSEC

36
Continued
  • where SSER = Sum of squared errors for the
    reduced model
  • SSEC = Sum of squared errors for the complete
    model
  • MSEC = Mean square error (MSE) for the complete
    model
  • k - g = Number of β parameters specified in H0
    (i.e., number of β's tested)
  • k + 1 = Number of β parameters in the complete
    model (including β0)
  • n = Total sample size
  • Rejection region: F > F_α, where
  • ν1 = k - g = Degrees of freedom for the
    numerator
  • ν2 = n - (k + 1) = Degrees of freedom for the
    denominator
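A sketch of the nested-model F-test on simulated, illustrative data: the complete model adds an interaction term and a quadratic term to a first-order reduced model (the particular added terms, data, and seed are assumptions for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + 0.5 * x1 * x2 + rng.normal(0, 0.6, n)

def sse(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1, x2])                     # g = 2 terms
X_complete = np.column_stack([ones, x1, x2, x1 * x2, x1 ** 2])  # k = 4 terms

g, k = 2, 4
sse_r, sse_c = sse(X_reduced, y), sse(X_complete, y)
mse_c = sse_c / (n - (k + 1))

# H0: the k - g extra betas are all 0
F = ((sse_r - sse_c) / (k - g)) / mse_c
p_value = stats.f.sf(F, k - g, n - (k + 1))
print(F, p_value)
```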

37
Definition 4.4
  • A parsimonious model is a model with a small
    number of β parameters. In situations where two
    competing models have essentially the same
    predictive power (as determined by an F-test),
    choose the more parsimonious of the two.