Simple Linear Regression Populations and Parameters
1
Simple Linear Regression Populations and
Parameters
2
Simple Regression
  • Single independent variable x
  • Dependent variable y
  • Linear function relating y to x

3
Assumption of Linearity
  • Assumption of Linearity: the slope of the
    equation does not change as x changes
  • Linearity is not always a reasonable assumption

4
Random Error Term
  • The model: y = β0 + β1x + ε
  • When we assume that the x's are constants, the
    only random portion of the model for y is the
    random error term ε.
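The role of the error term can be sketched numerically. In this hypothetical simulation (β0 = 2, β1 = 0.5, and the error standard deviation are made-up values), the x's are fixed and ε supplies all of the randomness:

```python
import numpy as np

# Hypothetical illustration: simulate y_i = beta0 + beta1*x_i + eps_i.
# The x's are treated as fixed constants; eps supplies all the randomness.
rng = np.random.default_rng(0)

beta0, beta1 = 2.0, 0.5                     # assumed "true" parameters
x = np.linspace(0, 10, 50)                  # fixed x values
eps = rng.normal(0.0, 1.0, size=x.size)     # random error term, mean 0
y = beta0 + beta1 * x + eps                 # observed responses
```

Subtracting the errors back out recovers the fixed systematic part β0 + β1x exactly, which is the sense in which ε is the only random piece of the model.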

5
Formal Assumptions of Regression Analysis
  • The relation is, in fact, linear, so that the
    errors all have expected value zero: E(εi) = 0
    for all i.
  • The errors all have the same variance: Var(εi) =
    σε² for all i.
  • The errors are independent of each other.
  • The errors are all normally distributed: εi is
    normally distributed for all i.

6
Figure 11.2 Theoretical Distribution of y in
Regression
7
Estimating Model Parameters
  • Intercept β0
  • Slope β1
  • The error variance σε² is another population
    parameter
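A minimal sketch of how sample estimates replace the population intercept and slope, using the usual least-squares formulas b1 = Sxy/Sxx and b0 = ȳ − b1x̄ (the five data points are made up):

```python
import numpy as np

# Least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up illustration data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

xbar, ybar = x.mean(), y.mean()
Sxy = np.sum((x - xbar) * (y - ybar))      # sum of cross-products
Sxx = np.sum((x - xbar) ** 2)              # sum of squares of x

b1 = Sxy / Sxx                             # estimated slope
b0 = ybar - b1 * xbar                      # estimated intercept
```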

8
Multiple Regression Models
  • Chapter 4

9
The Method of Least Squares
  • That is, we choose the estimated model
    ŷ = b0 + b1x1 + … + bkxk
  • that minimizes SSE = Σ(yi − ŷi)²
  • The result is the least squares prediction
    equation.

10
Scatterplots for the Data of Table 4.1
11
Estimator of σ² for the Multiple Regression Model
with k Independent Variables
s² = SSE / [n − (k + 1)] is called the mean square for error (MSE)
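A sketch of the estimator on simulated data (n, k, and the coefficients below are made up): fit by least squares, then divide SSE by the error degrees of freedom n − (k + 1):

```python
import numpy as np

# s^2 = SSE / (n - (k + 1)), the mean square for error (MSE).
rng = np.random.default_rng(1)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k x's
beta = np.array([1.0, 2.0, -1.0])                           # made-up parameters
y = X @ beta + rng.normal(0.0, 0.5, size=n)

bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ bhat                    # residuals from the fitted model
SSE = np.sum(resid ** 2)
s2 = SSE / (n - (k + 1))                # estimator of sigma^2
```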
12
Definition 4.1
  • The multiple coefficient of determination, R²,
    is defined as R² = 1 − SSE/SSyy
  • where SSE = Σ(yi − ŷi)², SSyy = Σ(yi − ȳ)²,
    and ŷi is the predicted value of yi for the
    multiple regression model.
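The definition is a two-line computation; the observed and fitted values below are made up for illustration:

```python
import numpy as np

# R^2 = 1 - SSE/SSyy (Definition 4.1), on made-up data.
y    = np.array([3.0, 5.0, 7.0, 6.0, 9.0])   # observed values
yhat = np.array([3.2, 4.8, 6.9, 6.3, 8.8])   # fitted values from some model

SSE  = np.sum((y - yhat) ** 2)       # unexplained variation
SSyy = np.sum((y - y.mean()) ** 2)   # total variation in y
R2 = 1.0 - SSE / SSyy
```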

13
Adjusted Multiple Coefficient of Determination
  • Ra² = 1 − (1 − R²)(n − 1)/[n − (k + 1)]
  • Note: Ra² ≤ R²
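The adjustment penalizes extra predictors, so Ra² can only fall at or below R². A pure-arithmetic sketch (n, k, and R² are made-up illustration values):

```python
# Ra^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1)); values below are made up.
n, k, R2 = 25, 3, 0.90            # sample size, predictors, unadjusted R^2
Ra2 = 1.0 - (1.0 - R2) * (n - 1) / (n - (k + 1))
```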

14
Global Test
  • H0: β1 = β2 = β3 = 0
  • Ha: At least one of the coefficients is nonzero
  • Test statistic: F = (R²/k) / [(1 − R²)/(n − (k + 1))]
  • Rejection region: F > Fα, where Fα is based on k
    numerator and n − (k + 1) denominator degrees of
    freedom.
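A sketch of the global test from summary quantities (n, k, R², and α = 0.05 are made-up illustration values), using SciPy for the F critical value:

```python
from scipy import stats

# Global F-test via R^2: F = (R^2/k) / ((1 - R^2)/(n - (k + 1))).
n, k, R2 = 30, 3, 0.65
F = (R2 / k) / ((1.0 - R2) / (n - (k + 1)))
Falpha = stats.f.ppf(0.95, dfn=k, dfd=n - (k + 1))  # critical value F_alpha
reject = F > Falpha                                  # reject H0 if True
```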

15
Caution
  • A rejection of the null hypothesis H0: β1 = β2 =
    … = βk = 0 in the global F-test leads to the
    conclusion, with 100(1 − α)% confidence, that the
    model is statistically useful. However,
    statistically useful does not necessarily mean
    best. Another model may prove even more useful in
    terms of providing more reliable estimates and
    predictions. This global F-test is usually
    regarded as a test that the model must pass to
    merit further consideration.

16
Using the Model for Estimation and Prediction
17
Definition 4.4
  • A parsimonious model is a model with a small
    number of β parameters. In situations where two
    competing models have essentially the same
    predictive power (as determined by an F-test),
    choose the more parsimonious of the two.

18
Variable Screening Methods
  • Chapter 6

19
Why use a Variable Screening Method?
  • In this chapter, we consider two systematic
    methods designed to reduce a large list of
    potential predictors to a more manageable one.
    These techniques, known as variable screening
    procedures, objectively determine which
    independent variables in the list are the most
    important predictors of y and which are the least
    important predictors.

20
Stepwise Regression
  • One of the most widely used variable screening
    methods is known as stepwise regression. To run
    a stepwise regression, the user first identifies
    the dependent variable (response) y, and the set
    of potentially important independent variables,
    x1, x2, …, xk, where k is generally large.
    Note: This set of variables could include both
    first-order and higher-order terms as well as
    interactions. The data are entered into the
    computer software, and the stepwise procedure
    begins.
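Software implements several variants of the procedure. The sketch below shows forward selection (one flavor of stepwise regression) on simulated data; the data and the fixed two-step stopping rule are made up for illustration, whereas real packages add and drop variables using p-value entry/removal thresholds:

```python
import numpy as np

def sse(X, y):
    """SSE of the least-squares fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

rng = np.random.default_rng(2)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))         # x3 is pure noise
y = 3.0 * x1 + 1.0 * x2 + rng.normal(0.0, 0.5, size=n)

candidates = {"x1": x1, "x2": x2, "x3": x3}
selected = []
X = np.ones((n, 1))                          # start with intercept only
for _ in range(2):                           # two forward steps (made-up rule)
    best = min(candidates,
               key=lambda name: sse(np.column_stack([X, candidates[name]]), y))
    X = np.column_stack([X, candidates.pop(best)])
    selected.append(best)
```

At each step the procedure greedily adds whichever remaining candidate most reduces SSE, so the strong predictor x1 enters first.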

21
All-Possible Regression Selection Procedure
  • R2 Criterion
  • Adjusted R2 or MSE Criterion
  • Cp Criterion

22
Caveats
  • Both stepwise regression and the
    all-possible-regressions selection procedure are
    useful variable screening methods. Many
    regression analysts, however, tend to apply these
    procedures as model building methods. Why? The
    stepwise (or best subset) model will often have a
    high value of R2 and all the ß coefficients in
    the model will be significantly different from 0
    with small p-values. And, with very little work
    (other than collecting the data and entering it
    into the computer), you can obtain the model
    using a statistical software package.
    Consequently, it is extremely tempting to use the
    stepwise model as the final model for predicting
    and making inferences about the dependent
    variable, y.

23
Stepwise
  • Forward
  • Backward
  • True Stepwise

24
All Possible Subsets
  • If we have 4 independent variables, how many
    possible subsets do we have? 2^4 = 16 (15 of
    them nonempty)
  • Mallows' Cp = (SSEp / MSEk) + 2(p + 1) − n
  • Choose p when Cp ≈ p and Cp is small
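The all-possible-subsets search with the Cp criterion can be sketched directly; the data below are made up, with only the first two of the k = 4 variables actually influencing y:

```python
import numpy as np
from itertools import combinations

def sse(X, y):
    """SSE of the least-squares fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

# Mallows' Cp = SSEp/MSEk + 2(p + 1) - n, with MSEk from the full model.
rng = np.random.default_rng(3)
n, k = 50, 4
Xfull = rng.normal(size=(n, k))
y = 2.0 * Xfull[:, 0] - 1.5 * Xfull[:, 1] + rng.normal(0.0, 1.0, size=n)

ones = np.ones((n, 1))
MSEk = sse(np.column_stack([ones, Xfull]), y) / (n - (k + 1))

Cp = {}
for p in range(1, k + 1):                 # subsets of each size p
    for subset in combinations(range(k), p):
        SSEp = sse(np.column_stack([ones, Xfull[:, subset]]), y)
        Cp[subset] = SSEp / MSEk + 2 * (p + 1) - n

best = min(Cp, key=Cp.get)                # subset with smallest Cp
```

The loop visits all 15 nonempty subsets, and the Cp winner picks out the two variables that actually drive y.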

25
P-Values
  • Ignore p-values and hypothesis testing when using
    these procedures.

26
Transformations
  • A Summary

27
Regression
Hypothesis Testing
Prediction
28
Hypothesis Testing
We can now ask and answer questions about the
unknown parameters.
29
Hypothesis Testing Example
Thus x1 and x3 are important in explaining y, but
x2 is not. We cannot say much about the original
unknown parameters.
30
Prediction
31
Prediction - Continued
We cannot say much about the original unknown
parameters.
32
Which Model is the Best?
33
Which Model is the Best?
Can you compare bell peppers and apples?
34
Which Model is the Best?
  • To compare models using R², both models must have
    the same dependent variable
  • To compare models with different dependent
    variables, we use Predicted Mean Squares (PREDMS)

35
PREDMS
For the original model, we use
36
PREDMS1
37
Example
38
Problem Points
  • High Leverage Point
  • High Influence Point

39
Figure 11.11(a) High Influence Points
40
Figure 11.11(b) Low Influence Points
41
Diagnostic Measures
  • Residuals
  • Residual Standard Deviation: the sample standard
    deviation around the regression line, also called
    the standard error of estimate

42
SPSS
  • Cook's D
  • Leverage Points
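SPSS reports both diagnostics automatically; as a cross-check, they can be computed by hand. A sketch in Python on made-up data, with one deliberately extreme x value to create a high-leverage point:

```python
import numpy as np

# Leverage h_ii is the diagonal of the hat matrix H = X (X'X)^-1 X';
# Cook's D combines each residual with its leverage.
rng = np.random.default_rng(4)
n = 20
x = rng.normal(size=n)
x[0] = 8.0                                   # one deliberately extreme x
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix
h = np.diag(H)                               # leverages h_ii
resid = y - H @ y                            # residuals
p = X.shape[1]                               # number of parameters
s2 = np.sum(resid ** 2) / (n - p)            # MSE
cooks_d = resid ** 2 / (p * s2) * h / (1 - h) ** 2
```

Point 0 has by far the largest leverage; common rules of thumb flag points with h_ii > 2p/n, or Cook's D near 1 or above, for a closer look.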