Title: Simple Linear Regression Populations and Parameters
1. Simple Linear Regression Populations and Parameters
2. Simple Regression
- Single independent variable
- Dependent variable y
- Linear function
3. Assumption of Linearity
- The slope of the equation does not change as x changes.
- Linearity is not always a reasonable assumption.
4. Random Error Term
- The model: y_i = β_0 + β_1 x_i + ε_i
- When we assume that the x_i's are constants, the only random portion of the model for y_i is the random error term ε_i.
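The role of the random error term can be sketched in a short simulation. The parameter values (β_0 = 2, β_1 = 1.5, σ = 0.8) are illustrative assumptions, not from the slides:

```python
import numpy as np

# Simulate y_i = beta_0 + beta_1 * x_i + eps_i.  The x's are fixed
# constants, so the only random part of each y_i is the error eps_i.
# All parameter values here are made up for illustration.
rng = np.random.default_rng(42)
beta0, beta1, sigma = 2.0, 1.5, 0.8

x = np.linspace(0.0, 10.0, 200)            # fixed (non-random) x values
eps = rng.normal(0.0, sigma, size=x.size)  # random error term
y = beta0 + beta1 * x + eps                # observed responses

print(y.mean())  # should land near beta0 + beta1 * x.mean() = 9.5
```

Rerunning with a different seed changes y only through ε, which is exactly the sense in which the model's randomness lives entirely in the error term.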
5. Formal Assumptions of Regression Analysis
- The relation is, in fact, linear, so that the errors all have expected value zero: E(ε_i) = 0 for all i.
- The errors all have the same variance: Var(ε_i) = σ_ε^2 for all i.
- The errors are independent of each other.
- The errors are all normally distributed: ε_i is normally distributed for all i.
6. Figure 11.2 Theoretical Distribution of y in Regression
7. Estimating Model Parameters
- Intercept β_0
- Slope β_1
- The error standard deviation σ_ε is another population parameter.
8. Multiple Regression Models
9. The Method of Least Squares
- That is, we choose the estimated model ŷ = β̂_0 + β̂_1 x_1 + ... + β̂_k x_k
- that minimizes SSE = Σ (y_i - ŷ_i)^2
- the result is the least squares prediction equation.
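As a concrete sketch, the least squares coefficients can be obtained by solving the normal equations (X'X)b = X'y; the small data set below is invented for illustration:

```python
import numpy as np

# Least squares: choose b to minimize SSE = sum of (y_i - yhat_i)^2.
# Here we solve the normal equations (X'X) b = X'y directly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy data (illustrative)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])  # intercept column plus x
b = np.linalg.solve(X.T @ X, X.T @ y)      # (b0, b1)

y_hat = X @ b
sse = np.sum((y - y_hat) ** 2)             # the minimized quantity
print(b, sse)
```

Any other choice of (b0, b1) yields a larger SSE; that is exactly what "least squares" means.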
10Scatterplots for the Data of Table 4.1
11. Estimator of σ^2 for the Multiple Regression Model with k Independent Variables
s^2 = SSE / [n - (k + 1)]
s^2 is called the mean square for error (MSE).
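A minimal sketch of this estimator on invented simple-regression data (k = 1, so the divisor is n - 2):

```python
import numpy as np

# s^2 = SSE / (n - (k + 1)): the mean square for error (MSE).
# Toy simple-regression data, so k = 1 and the divisor is n - 2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

n, k = len(y), 1
sse = np.sum((y - X @ b) ** 2)
s2 = sse / (n - (k + 1))        # MSE: SSE divided by its degrees of freedom
print(s2)
```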
12. Definition 4.1
- The multiple coefficient of determination, R^2, is defined as R^2 = 1 - SSE/SS_yy, where SSE = Σ (y_i - ŷ_i)^2, SS_yy = Σ (y_i - ȳ)^2, and ŷ_i is the predicted value of y_i for the multiple regression model.
13. Adjusted Multiple Coefficient of Determination
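The adjusted coefficient, R_a^2 = 1 - [(n - 1)/(n - (k + 1))](1 - R^2), penalizes added parameters. A sketch computing both quantities on invented data (k = 1):

```python
import numpy as np

# R^2 = 1 - SSE/SS_yy, and the adjusted version penalizes extra parameters:
# R_a^2 = 1 - [(n - 1) / (n - (k + 1))] * (1 - R^2).  Toy data, k = 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

n, k = len(y), 1
sse = np.sum((y - X @ b) ** 2)          # sum of squared residuals
ss_yy = np.sum((y - y.mean()) ** 2)     # total variation in y

r2 = 1.0 - sse / ss_yy
r2_adj = 1.0 - ((n - 1) / (n - (k + 1))) * (1.0 - r2)
print(r2, r2_adj)
```

R_a^2 is never larger than R^2 and, unlike R^2, can decrease when a useless predictor is added.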
14. Global Test
- H_0: β_1 = β_2 = ... = β_k = 0
- H_a: At least one of the coefficients is nonzero
- Test statistic: F = (R^2 / k) / [(1 - R^2) / (n - (k + 1))]
- Rejection region: F > F_α, where F is based on k numerator and n - (k + 1) denominator degrees of freedom.
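A sketch of the global F-test on invented data (k = 1, n = 5; the tabled critical value F_0.05 with (1, 3) degrees of freedom is about 10.13):

```python
import numpy as np

# Global F-test of H0: beta_1 = ... = beta_k = 0 using
# F = (R^2 / k) / ((1 - R^2) / (n - (k + 1))).  Toy data, k = 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

n, k = len(y), 1
sse = np.sum((y - X @ b) ** 2)
ss_yy = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - sse / ss_yy

F = (r2 / k) / ((1.0 - r2) / (n - (k + 1)))
f_crit = 10.13              # tabled F_0.05 with (1, 3) df (approximate)
print(F, F > f_crit)        # F is huge here, so H0 is rejected
```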
15. Caution
- A rejection of the null hypothesis H_0: β_1 = β_2 = ... = β_k = 0 in the global F-test leads to the conclusion, with 100(1 - α)% confidence, that the model is statistically useful. However, statistically useful does not necessarily mean best.
Another model may prove even more useful in terms
of providing more reliable estimates and
predictions. This global F-test is usually
regarded as a test that the model must pass to
merit further consideration.
16. Using the Model for Estimation and Prediction
17. Definition 4.4
- A parsimonious model is a model with a small number of β parameters. In situations where two
competing models have essentially the same
predictive power (as determined by an F test),
choose the more parsimonious of the two.
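The comparison between competing models can be made concrete with a nested (partial) F-test: F = [(SSE_reduced - SSE_full) / (number of dropped β's)] / MSE_full. The orthogonal toy design below is an illustrative assumption; the question is whether x2 earns its place in the model:

```python
import numpy as np

# Partial F-test: does the full model (x1, x2, x3) beat the more
# parsimonious reduced model (x1, x3)?  Toy orthogonal design; y is
# built so that x2 contributes nothing.
x1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
y = 2.0 + 3.0 * x1 + 1.5 * x3 + 0.5 * (x1 * x2)   # "noise" orthogonal to the x's
n = len(y)

def sse_of(cols):
    """SSE from regressing y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

sse_full = sse_of([x1, x2, x3])             # k = 3 predictors
sse_red = sse_of([x1, x3])                  # drops one beta (x2's)
mse_full = sse_full / (n - 4)
F = (sse_red - sse_full) / 1 / mse_full     # near zero: keep the reduced model
print(F)
```

Since dropping x2 costs essentially nothing in SSE, the partial F statistic is near zero and the more parsimonious model is preferred.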
18. Variable Screening Methods
19. Why Use a Variable Screening Method?
- In this chapter, we consider two systematic
methods designed to reduce a large list of
potential predictors to a more manageable one.
These techniques, known as variable screening
procedures, objectively determine which
independent variables in the list are the most
important predictors of y and which are the least
important predictors.
20. Stepwise Regression
- One of the most widely used variable screening
methods is known as stepwise regression. To run
a stepwise regression, the user first identifies
the dependent variable (response) y and the set of potentially important independent variables x_1, x_2, ..., x_k, where k is generally large.
Note This set of variables could include both
first-order and higher-order terms as well as
interactions. The data are entered into the
computer software, and the stepwise procedure
begins.
21. All-Possible-Regressions Selection Procedure
- R^2 Criterion
- Adjusted R^2 or MSE Criterion
- C_p Criterion
22. Caveats
- Both stepwise regression and the
all-possible-regressions selection procedure are
useful variable screening methods. Many
regression analysts, however, tend to apply these
procedures as model building methods. Why? The
stepwise (or best subset) model will often have a
high value of R^2, and all the β coefficients in
the model will be significantly different from 0
with small p-values. And, with very little work
(other than collecting the data and entering it
into the computer), you can obtain the model
using a statistical software package.
Consequently, it is extremely tempting to use the
stepwise model as the final model for predicting
and making inferences about the dependent
variable, y.
23. Stepwise
- Forward
- Backward
- True Stepwise
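A minimal sketch of the forward flavor: at each step, enter the candidate that most reduces SSE, and stop when the best remaining candidate barely helps. The 5%-of-current-SSE entry rule and the orthogonal toy design are illustrative assumptions; real packages use partial F or p-value thresholds:

```python
import numpy as np

# Forward stepwise selection sketch.  y truly depends on x1 and x3;
# x2 is irrelevant and should never be entered.
x1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
y = 2.0 + 3.0 * x1 + 1.5 * x3 + 0.5 * (x1 * x2)
n = len(y)

def sse_of(cols):
    """SSE from regressing y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

candidates = {"x1": x1, "x2": x2, "x3": x3}
chosen, cols = [], []
current_sse = sse_of(cols)
while candidates:
    # best candidate = the one whose entry yields the smallest SSE
    name = min(candidates, key=lambda k: sse_of(cols + [candidates[k]]))
    new_sse = sse_of(cols + [candidates[name]])
    if current_sse - new_sse < 0.05 * current_sse:   # crude entry rule (assumption)
        break
    chosen.append(name)
    cols.append(candidates.pop(name))
    current_sse = new_sse
print(chosen)   # ['x1', 'x3'] -- x2 is never entered
```

True stepwise differs from pure forward selection in that previously entered variables can be re-tested and removed at each step.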
24. All Possible Subsets
- If we have 4 independent variables, how many possible subsets do we have? (2^4 = 16, counting the intercept-only model.)
- Mallows' C_p = SSE_p / MSE_k + 2(p + 1) - n
- Choose p when C_p ≈ p and C_p is small.
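A sketch of the all-possible-regressions procedure with Mallows' C_p, where MSE_k comes from the full k-variable model. The orthogonal toy design is an illustrative assumption; y truly depends on x1 and x3 only:

```python
from itertools import combinations

import numpy as np

# All-possible-regressions with C_p = SSE_p / MSE_k + 2(p + 1) - n.
x1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
y = 2.0 + 3.0 * x1 + 1.5 * x3 + 0.5 * (x1 * x2)
n = len(y)
xs = {"x1": x1, "x2": x2, "x3": x3}

def sse_of(cols):
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

mse_k = sse_of(list(xs.values())) / (n - (len(xs) + 1))  # full-model MSE

cp = {}
for p in range(len(xs) + 1):
    for subset in combinations(xs, p):                   # 2^3 = 8 subsets
        cp[subset] = sse_of([xs[v] for v in subset]) / mse_k + 2 * (p + 1) - n

best = min(cp, key=cp.get)
print(best, cp[best])   # ('x1', 'x3') with C_p = 2 = p, as the criterion wants
```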
25. P-Values
- Ignore p-values and hypothesis testing when using these procedures.
26. Transformations
27. Regression
- Hypothesis Testing
- Prediction
28. Hypothesis Testing
We can now ask and answer questions about the unknown parameters.
29. Hypothesis Testing Example
Thus x_1 and x_3 are important in explaining y, but x_2 is not. We cannot say much about the original unknown parameters.
30. Prediction
31. Prediction (Continued)
We cannot say much about the original unknown parameters.
32. Which Model is the Best?
33. Which Model is the Best?
Can you compare bell peppers and apples?
34. Which Model is the Best?
- To compare models using R^2, both models must have the same dependent variable.
- To compare models with different dependent variables, we use Predicted Mean Squares, or PREDMS.
35. PREDMS
For the original model, we use
36. PREDMS1
37. Example
38. Problem Points
- High Leverage Point
- High Influence Point
39. Figure 11.11(a) High Influence Points
40. Figure 11.11(b) Low Influence Points
41. Diagnostic Measures
- Residuals
- Residual Standard Deviation: the sample standard deviation around the regression line, also called the standard error of estimate or the residual standard deviation.
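A sketch of these diagnostics, plus the hat-matrix leverage h_ii that flags the high leverage points of Slide 38. The data are invented, with the last x made extreme on purpose:

```python
import numpy as np

# Leverage = diagonal of the hat matrix H = X (X'X)^{-1} X'; large h_ii
# marks a high leverage point.  The residual standard deviation
# s = sqrt(SSE / (n - 2)) measures spread around the fitted line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 15.0])   # last point is far out in x
y = np.array([3.4, 5.1, 6.4, 8.2, 9.4, 24.6])

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)                      # h_ii values; they sum to 2 here

b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b                          # residuals
s = np.sqrt(resid @ resid / (len(y) - 2))  # residual standard deviation
print(leverage.round(3), s)
```

The extreme point carries almost all the leverage (h close to 1); whether it is also a high influence point depends on how far its y value falls from the line through the other points.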
42. SPSS