Title: MULTIPLE REGRESSION
1 MULTIPLE REGRESSION
2 The Multiple Regression Model
- The multiple regression model is an extension of the simple linear regression model introduced in bivariate analysis
- The number of explanatory variables is now p, denoted by X1, X2, X3, ..., Xp, and the model is given by
- $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_p x_p + u$
- where $V(u) = \sigma^2$ and $E(u) = 0$
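As a standard restatement (not shown on the slide itself), the model can be written compactly in the matrix notation assumed by the least squares slides below:

$$\mathbf{y} = X\boldsymbol{\beta} + \mathbf{u}, \qquad E(\mathbf{u}) = \mathbf{0}, \qquad V(\mathbf{u}) = \sigma^2 I_n$$

where $\mathbf{y}$ is the $n \times 1$ response vector, $X$ is the $n \times (p+1)$ matrix whose first column is all ones, $\boldsymbol{\beta}$ is the $(p+1) \times 1$ coefficient vector, and $\mathbf{u}$ is the $n \times 1$ error vector.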
3 Example
- The data for this example was derived from an air pollution study in forty cities. The variables are defined as follows
- TMR: total mortality rate
- SMIN, SMEAN, SMAX: biweekly sulphate readings; smallest annual, average annual, and largest annual, respectively
- PMIN, PMEAN, PMAX: biweekly suspended particulate readings; smallest annual, average annual, and largest annual, respectively
4 Example
- GE65: percent of population at least 65, multiplied by 10
- NONPOOR: percent of families above the poverty level
- PERWH: percent of whites in the population
- LPOP: logarithm of population
- PM2: population density
5 Example
6 Correlation Matrix
- Shows the simple correlations among all possible pairs of variables
- Useful for understanding how the explanatory variables influence the dependent variable
- Useful for showing the correlation among the explanatory variables
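A minimal sketch of how such a correlation matrix might be computed with pandas; the file name air_pollution.csv and the assumption that its columns carry the variable names above are hypothetical:

import pandas as pd

# Hypothetical file; assumes columns named as on the previous slides
df = pd.read_csv("air_pollution.csv")

cols = ["TMR", "SMIN", "SMEAN", "SMAX", "PMIN", "PMEAN", "PMAX",
        "GE65", "NONPOOR", "PERWH", "LPOP", "PM2"]

# Pairwise Pearson correlations among all variables,
# including the dependent variable TMR
corr = df[cols].corr()
print(corr.round(2))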
7 Example
8 Least Squares Estimation
- As in the case of the simple linear regression model, least squares is used to estimate the unknown parameters by minimizing the expression
- $\sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_p x_{ip}\right)^2$
- with respect to the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$
9 Least Squares Estimation
- The least squares estimators can be defined by the matrix expression for the vector of coefficients
- $\mathbf{b} = (X'X)^{-1}X'\mathbf{y}$
- and the equation for the fitted values
- $\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_p x_{ip}$, $i = 1, 2, \ldots, n$
10 Least Squares Estimation
11 Least Squares Estimation
12 Least Squares Estimation
- The estimator of $\sigma^2$ is given by $s^2 = \mathrm{SSE}/(n-p-1)$, the residual sum of squares divided by its degrees of freedom
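A minimal numpy sketch of the formulas on the last few slides; the function name ols_fit is illustrative, and X is assumed to be the n x (p+1) design matrix with a leading column of ones:

import numpy as np

def ols_fit(X, y):
    """Least squares fit: b = (X'X)^{-1} X'y, plus s^2 = SSE/(n-p-1).

    X is assumed to be n x (p+1) with a leading column of ones."""
    n, k = X.shape             # k = p + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)   # kept for standard errors later
    b = XtX_inv @ X.T @ y      # coefficient vector
    y_hat = X @ b              # fitted values
    sse = np.sum((y - y_hat) ** 2)
    s2 = sse / (n - k)         # estimator of sigma^2, d.f. = n - p - 1
    return b, y_hat, s2, XtX_inv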
13 Example
14 Example
15 Properties of Estimators of the Coefficients
- Least squares estimators are unbiased and have minimum variance
- The standard error of the coefficient estimator $b_j$ is estimated by $\sqrt{c_{jj}}\,s$, where $c_{jj}$ is the diagonal element of $(X'X)^{-1}$ corresponding to $b_j$
- The statistic given by $b_j/(\sqrt{c_{jj}}\,s)$ has a t distribution with (n-p-1) degrees of freedom if $\beta_j = 0$
16 Inference for Regression Coefficients
- A 100(1-α)% confidence interval for $\beta_j$ is given by
- $b_j \pm t_{\alpha/2,\,n-p-1}\,\sqrt{c_{jj}}\,s$
17 Inference for Regression Coefficients
- To test the null hypothesis H0: $\beta_j = 0$ we employ the test statistic
- $t = b_j / (\sqrt{c_{jj}}\,s)$
- which has a t distribution with (n-p-1) degrees of freedom if the null hypothesis is true
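A sketch of these inference formulas, building on the ols_fit sketch above; scipy's t distribution supplies the quantiles and p-values, and the helper name is illustrative:

import numpy as np
from scipy import stats

def coef_inference(b, s2, XtX_inv, df, alpha=0.05):
    """Standard errors, t statistics, two-sided p-values, and
    100(1-alpha)% confidence intervals; df = n - p - 1."""
    se = np.sqrt(s2 * np.diag(XtX_inv))   # sqrt(c_jj) * s for each b_j
    t = b / se                            # tests of H0: beta_j = 0
    p = 2 * stats.t.sf(np.abs(t), df)     # two-sided p-values
    half = stats.t.ppf(1 - alpha / 2, df) * se
    return se, t, p, (b - half, b + half)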
18 Example
19 Multiple Coefficient of Determination
- An extension of the coefficient of determination goodness of fit measure introduced for the simple linear regression model is the multiple coefficient of determination
- The definition is identical to that given for the coefficient of determination:
- $R^2 = \mathrm{SSR}/\mathrm{SST}$
20 Multiple Coefficient of Determination
- The sums of squares have the same definitions as in simple linear regression
- SST = $\sum_{i=1}^{n}(y_i - \bar{y})^2$
- SSR = $\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$
- SSE = SST - SSR = $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
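In code, continuing the earlier sketch (y_hat denotes the fitted values from ols_fit):

import numpy as np

def r_squared(y, y_hat):
    """Multiple coefficient of determination R^2 = SSR/SST."""
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
    sse = sst - ssr                          # error sum of squares
    return ssr / sst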
21 F Test of Goodness of Fit
- A test for the overall goodness of fit is given by the F test, similar to the F test used in simple linear regression
- H0: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$
- The test statistic is given by
- $F = \dfrac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)}$, which has an F distribution with p and (n-p-1) degrees of freedom if H0 is true
22 Analysis of Variance Table
- The information is usually summarized in the analysis of variance table given below
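The standard layout of this table, using the quantities defined on the preceding slides, is:

Source       d.f.      Sum of Squares   Mean Square         F
Regression   p         SSR              MSR = SSR/p         MSR/MSE
Error        n-p-1     SSE              MSE = SSE/(n-p-1)
Total        n-1       SST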
23 Example
24 Reduced Models
- In practice it is of interest to study reduced models which contain only a subset of the set of possible explanatory variables
- In order to compare two models, one of which is a subset of the other, we require a statistical test which will indicate whether there has been a loss of explanatory power from reducing the number of explanatory variables
- The partial F test, outlined below, serves this purpose
25 Example
26 Reduced Models
- The two models being compared are called the full model and the reduced model
- The full model contains all p explanatory variables and is given by
- $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q + \beta_{q+1} x_{q+1} + \cdots + \beta_p x_p + u$
- The reduced model eliminates the first q explanatory variables and is given by
- $y = \beta_0 + \beta_{q+1} x_{q+1} + \cdots + \beta_p x_p + u$
27 Reduced Models
- We wish to test the null hypothesis that the reduced model is as good as the full model for explaining the variation in y
- Hence we have H0: Reduced model is as good as Full model
- Equivalently, we are testing that the coefficients of the first q explanatory variables are zero
- Hence H0: $\beta_1 = \beta_2 = \cdots = \beta_q = 0$
- Note that the order of the variables in the model is arbitrary, so we may take the dropped variables to be the first q
28 Comparing Full and Reduced Models
- Denote the sums of squares and coefficient of multiple determination for the full model by SST, SSR, SSE and $R^2$
- Denote the reduced model sums of squares and coefficient of multiple determination by $\mathrm{SSR}_R$, $\mathrm{SSE}_R$ and $R_R^2$
- Note that the total sum of squares SST remains fixed
29 Test Statistic for Comparing Full and Reduced Models
- The test statistic is given by
- $F = \dfrac{(\mathrm{SSR} - \mathrm{SSR}_R)/q}{\mathrm{SSE}/(n-p-1)}$
- which has an F distribution with q and (n-p-1) d.f. if H0 is true
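A small sketch of the partial F test, assuming the full and reduced models have each been fitted with the earlier ols_fit code; the helper name is illustrative:

from scipy import stats

def partial_f_test(ssr_full, ssr_reduced, sse_full, q, df_full):
    """Partial F test: H0 says the q dropped variables are superfluous.

    df_full = n - p - 1, the error d.f. of the full model."""
    f = ((ssr_full - ssr_reduced) / q) / (sse_full / df_full)
    p_value = stats.f.sf(f, q, df_full)   # upper-tail probability
    return f, p_value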
30 Examples
31 Examples
32 Examples
- H0: Reduced model is as good as Full model, or the extra variables are superfluous
- HA: Full model is superior, or at least one of the 6 variables is important
33 Examples
- The mean of an F distribution is usually near 1
- Since F is much less than 1, we obviously cannot reject H0
- Note $F_{0.05,6,28} = 2.45$, $F_{0.01,6,28} = 3.53$, $F_{0.10,6,28} = 2.00$
34 Confidence Interval for the Mean
- At $X = x_j$ (a particular value of the explanatory variables) denote the estimator of y by $\hat{y}_j$
- A confidence interval for the mean value of y at $X = x_j$ is given by
- $\hat{y}_j \pm t_{\alpha/2,\,n-p-1}\, s \sqrt{x_j'(X'X)^{-1}x_j}$
35 Confidence Interval for Individual Predictions
- A confidence interval for a particular value of y at $X = x_j$ is given by
- $\hat{y}_j \pm t_{\alpha/2,\,n-p-1}\, s \sqrt{1 + x_j'(X'X)^{-1}x_j}$
- Note the extra term of 1 under the square root, which accounts for the variation of an individual observation around the mean
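A sketch of both intervals, reusing the quantities returned by the earlier ols_fit code; x_j must include the leading 1 for the intercept, and the helper name is illustrative:

import numpy as np
from scipy import stats

def intervals_at(x_j, b, s2, XtX_inv, df, alpha=0.05):
    """Confidence interval for the mean response and prediction
    interval for an individual response at the point x_j."""
    y_hat = x_j @ b
    t_val = stats.t.ppf(1 - alpha / 2, df)
    h = x_j @ XtX_inv @ x_j                    # x_j'(X'X)^{-1} x_j
    mean_half = t_val * np.sqrt(s2 * h)        # for the mean of y
    pred_half = t_val * np.sqrt(s2 * (1 + h))  # extra 1 for individual y
    ci_mean = (y_hat - mean_half, y_hat + mean_half)
    ci_pred = (y_hat - pred_half, y_hat + pred_half)
    return ci_mean, ci_pred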
36 Examples
37 Examples