1
Chapter 2 Simple Linear Regression
  • Ray-Bing Chen
  • Institute of Statistics
  • National University of Kaohsiung

2
2.1 Simple Linear Regression Model
  • y = β₀ + β₁x + ε
  • x: the regressor variable
  • y: the response variable
  • β₀: the intercept, unknown
  • β₁: the slope, unknown
  • ε: random error with E(ε) = 0 and Var(ε) = σ² (unknown)
  • The errors are uncorrelated.

3
  • Given x,
  • E(y|x) = E(β₀ + β₁x + ε) = β₀ + β₁x
  • Var(y|x) = Var(β₀ + β₁x + ε) = σ²
  • Responses are also uncorrelated.
  • Regression coefficients: β₀, β₁
  • β₁: the change in E(y|x) for a unit change in x
  • β₀: E(y|x = 0)

4
2.2 Least-squares Estimation of the Parameters
  • 2.2.1 Estimation of β₀ and β₁
  • n pairs (yi, xi), i = 1, …, n
  • Method of least squares: minimize
    S(β₀, β₁) = Σᵢ (yi − β₀ − β₁xi)²

5
  • Least-squares normal equations:
    n·β̂₀ + β̂₁ Σᵢ xi = Σᵢ yi
    β̂₀ Σᵢ xi + β̂₁ Σᵢ xi² = Σᵢ xi yi

6
  • The least-squares estimators:
    β̂₁ = Sxy / Sxx, where Sxx = Σᵢ (xi − x̄)² and Sxy = Σᵢ yi(xi − x̄)
    β̂₀ = ȳ − β̂₁x̄
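As a rough sketch of these formulas (NumPy assumed; the data arrays below are made up for illustration, not the textbook's):

```python
# Least-squares estimates of intercept and slope (illustrative data).
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])    # regressor (made-up)
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])  # response (made-up)

x_bar, y_bar = x.mean(), y.mean()
Sxx = np.sum((x - x_bar) ** 2)         # corrected sum of squares of x
Sxy = np.sum(y * (x - x_bar))          # corrected cross-product

beta1_hat = Sxy / Sxx                  # slope estimate
beta0_hat = y_bar - beta1_hat * x_bar  # intercept estimate
print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")
```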

7
  • The fitted simple regression model: ŷ = β̂₀ + β̂₁x
  • A point estimate of the mean of y for a
    particular x
  • Residual: ei = yi − ŷi
  • Residuals play an important role in investigating
    the adequacy of the fitted regression model and in
    detecting departures from the underlying
    assumptions!

8
  • Example 2.1 The Rocket Propellant Data
  • Shear strength is related to the age in weeks of
    the batch of sustainer propellant.
  • 20 observations
  • From scatter diagram, there is a strong
    relationship between shear strength (y) and
    propellant age (x).
  • Assumption:
  • y = β₀ + β₁x + ε

9
(No Transcript)
10
  • The least-squares fit

11
  • How well does this equation fit the data?
  • Is the model likely to be useful as a predictor?
  • Are any of the basic assumptions violated, and if
    so, how serious is this?

12
  • 2.2.2 Properties of the Least-Squares Estimators
    and the Fitted Regression Model
  • β̂₀ and β̂₁ are linear combinations of the yi.
  • β̂₀ and β̂₁ are unbiased estimators of β₀ and β₁.

13
  • Var(β̂₁) = σ²/Sxx
  • Var(β̂₀) = σ²(1/n + x̄²/Sxx)

14
  • The Gauss-Markov Theorem
  • β̂₀ and β̂₁ are the best linear unbiased
    estimators (BLUE).

15
  • Some useful properties
  • The sum of the residuals in any regression model
    that contains an intercept β₀ is always 0, i.e.,
    Σᵢ ei = 0.
  • The regression line always passes through the
    centroid of the data, (x̄, ȳ).

16
  • 2.2.3 Estimator of σ²
  • Residual sum of squares:
    SSRes = Σᵢ (yi − ŷi)² = Σᵢ ei²

17
  • Since E(SSRes) = (n − 2)σ²,
  • the unbiased estimator of σ² is
    MSRes = SSRes / (n − 2)
  • MSRes is called the residual mean square.
  • This estimate is model-dependent.
  • Example 2.2
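A short sketch of the residual mean square, continuing the illustrative made-up data used earlier (not Example 2.2's actual numbers):

```python
# Residual mean square MS_res = SS_res / (n - 2), the unbiased estimate
# of sigma^2 (illustrative data).
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])

Sxx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum(y * (x - x.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()

e = y - (beta0_hat + beta1_hat * x)   # residuals
SS_res = np.sum(e ** 2)               # residual sum of squares
MS_res = SS_res / (len(x) - 2)        # two parameters were estimated
print(f"MS_res = {MS_res:.4f}")
```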

18
  • 2.2.4 An Alternate Form of the Model
  • The new regression model (centered form):
    y = β₀′ + β₁(x − x̄) + ε, where β₀′ = β₀ + β₁x̄
  • Normal equations
  • The least-squares estimators:
    β̂₀′ = ȳ and β̂₁ = Sxy / Sxx

19
  • Some advantages
  • The normal equations are easier to solve
  • β̂₀′ and β̂₁ are uncorrelated.

20
2.3 Hypothesis Testing on the Slope and Intercept
  • Assume the errors εi are normally distributed:
  • yi ~ N(β₀ + β₁xi, σ²)
  • 2.3.1 Use of t-Tests
  • Test on the slope:
  • H0: β₁ = β₁₀ vs. H1: β₁ ≠ β₁₀


21
  • If σ² is known, under the null hypothesis,
    Z₀ = (β̂₁ − β₁₀) / √(σ²/Sxx) ~ N(0, 1)
  • (n − 2)MSRes/σ² follows a χ² distribution with
    n − 2 degrees of freedom.
  • If σ² is unknown, use
    t₀ = (β̂₁ − β₁₀) / √(MSRes/Sxx), which follows a
    t distribution with n − 2 degrees of freedom
    under H0.
  • Reject H0 if |t₀| > t(α/2, n−2).
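A hedged sketch of this t-test (SciPy assumed; made-up data; β₁₀ = 0 and α = 0.05 are arbitrary choices for the illustration):

```python
# t-test of H0: beta1 = beta10 with sigma^2 unknown (illustrative data).
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])
n, alpha, beta10 = len(x), 0.05, 0.0

Sxx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum(y * (x - x.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()
MS_res = np.sum((y - beta0_hat - beta1_hat * x) ** 2) / (n - 2)

se_beta1 = np.sqrt(MS_res / Sxx)            # standard error of the slope
t0 = (beta1_hat - beta10) / se_beta1        # test statistic
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)  # two-sided critical value
p_value = 2 * stats.t.sf(abs(t0), n - 2)
print(f"t0 = {t0:.3f}, reject H0: {abs(t0) > t_crit}, p = {p_value:.4g}")
```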

22
  • Test on the intercept:
  • H0: β₀ = β₀₀ vs. H1: β₀ ≠ β₀₀
  • If σ² is unknown, use
    t₀ = (β̂₀ − β₀₀) / √(MSRes(1/n + x̄²/Sxx))
  • Reject H0 if |t₀| > t(α/2, n−2).

23
  • 2.3.2 Testing Significance of Regression
  • H0: β₁ = 0 vs. H1: β₁ ≠ 0
  • Failing to reject H0 means there is no evidence of
    a linear relationship between x and y.

24
  • Rejecting H0 means x is of value in explaining
    the variability in y.
  • Reject H0 if |t₀| > t(α/2, n−2).

25
  • Example 2.3 The Rocket Propellant Data
  • Test significance of regression.
  • MSRes = 9244.59
  • The test statistic is t₀ = β̂₁ / √(MSRes/Sxx).
  • t(0.025, 18) = 2.101
  • Since |t₀| > t(0.025, 18), reject H0.

26
(No Transcript)
27
  • 2.3.3 The Analysis of Variance (ANOVA)
  • Use an analysis-of-variance approach to test
    significance of regression, based on the identity
    Σᵢ (yi − ȳ)² = Σᵢ (ŷi − ȳ)² + Σᵢ (yi − ŷi)²

28
  • SST: the corrected sum of squares of the
    observations. It measures the total variability
    in the observations.
  • SSRes: the residual or error sum of squares.
  • The residual variation left unexplained by the
    regression line.
  • SSR: the regression or model sum of squares.
  • The amount of variability in the observations
    accounted for by the regression line.
  • SST = SSR + SSRes

29
  • Degrees of freedom:
  • dfT = n − 1
  • dfR = 1
  • dfRes = n − 2
  • dfT = dfR + dfRes
  • Test significance of regression by ANOVA:
  • SSRes/σ² = (n − 2)MSRes/σ² ~ χ²(n−2)
  • SSR/σ² = MSR/σ² ~ χ²(1) under H0
  • SSR and SSRes are independent.
  • F₀ = MSR / MSRes ~ F(1, n−2) under H0

30
  • E(MSRes) = σ²
  • E(MSR) = σ² + β₁²Sxx
  • Reject H0 if F₀ > F(α, 1, n−2).
  • If β₁ ≠ 0, F₀ follows a noncentral F distribution
    with 1 and n − 2 degrees of freedom and
    noncentrality parameter λ = β₁²Sxx/σ².
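An illustrative computation of the ANOVA F-test under these definitions (SciPy assumed; made-up data):

```python
# ANOVA F-test for significance of regression: F0 = MSR / MSRes.
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum(y * (x - x.mean()))
beta1_hat = Sxy / Sxx

SS_T = np.sum((y - y.mean()) ** 2)   # total sum of squares, n - 1 df
SS_R = beta1_hat * Sxy               # regression sum of squares, 1 df
SS_res = SS_T - SS_R                 # residual sum of squares, n - 2 df

F0 = (SS_R / 1) / (SS_res / (n - 2))
p_value = stats.f.sf(F0, 1, n - 2)   # reject H0 if F0 > F(alpha, 1, n-2)
print(f"F0 = {F0:.2f}, p = {p_value:.4g}")
```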

31
  • Example 2.4 The Rocket Propellant Data

32
  • More About the t Test
  • The square of a t random variable with f degrees
    of freedom is an F random variable with 1 and f
    degrees of freedom.

33
2.4 Interval Estimation in Simple Linear
Regression
  • 2.4.1 Confidence Intervals on β₀, β₁, and σ²
  • Assume that the errors εi are normally and
    independently distributed.

34
  • 100(1 − α)% confidence intervals on β₁ and β₀:
    β̂₁ ± t(α/2, n−2) √(MSRes/Sxx)
    β̂₀ ± t(α/2, n−2) √(MSRes(1/n + x̄²/Sxx))
  • Interpretation of the C.I.
  • Confidence interval for σ²:
    (n − 2)MSRes/χ²(α/2, n−2) ≤ σ²
    ≤ (n − 2)MSRes/χ²(1−α/2, n−2)
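A sketch of all three intervals (SciPy assumed; made-up data; α = 0.05):

```python
# 100(1 - alpha)% confidence intervals on beta1, beta0, and sigma^2.
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])
n, alpha = len(x), 0.05

Sxx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum(y * (x - x.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()
MS_res = np.sum((y - beta0_hat - beta1_hat * x) ** 2) / (n - 2)

t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
ci_beta1 = beta1_hat + np.array([-1, 1]) * t_crit * np.sqrt(MS_res / Sxx)
ci_beta0 = beta0_hat + np.array([-1, 1]) * t_crit * np.sqrt(
    MS_res * (1 / n + x.mean() ** 2 / Sxx))
# chi-square interval for sigma^2
ci_sigma2 = ((n - 2) * MS_res / stats.chi2.ppf(1 - alpha / 2, n - 2),
             (n - 2) * MS_res / stats.chi2.ppf(alpha / 2, n - 2))
print(ci_beta1, ci_beta0, ci_sigma2)
```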

35
  • Example 2.5 The Rocket Propellant Data

36

37
  • 2.4.2 Interval Estimation of the Mean Response
  • Let x0 be the level of the regressor variable for
    which we wish to estimate the mean response.
  • x0 is in the range of the original data on x.
  • An unbiased estimator of E(y|x0) is
    μ̂(y|x0) = β̂₀ + β̂₁x0

38
  • μ̂(y|x0) follows a normal distribution with mean
    E(y|x0) and variance σ²(1/n + (x0 − x̄)²/Sxx).

39
  • A 100(1 − α)% confidence interval on the mean
    response at x0:
    μ̂(y|x0) ± t(α/2, n−2) √(MSRes(1/n + (x0 − x̄)²/Sxx))
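An illustrative computation (made-up data; x0 = 7.0 is an arbitrary point inside the data range):

```python
# Confidence interval on the mean response E(y | x0).
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])
n, alpha, x0 = len(x), 0.05, 7.0

Sxx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum(y * (x - x.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()
MS_res = np.sum((y - beta0_hat - beta1_hat * x) ** 2) / (n - 2)

mu_hat = beta0_hat + beta1_hat * x0                # point estimate
half = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
    MS_res * (1 / n + (x0 - x.mean()) ** 2 / Sxx))  # half-width
print(f"E(y|x0) in {mu_hat - half:.3f} to {mu_hat + half:.3f}")
```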

40
  • Example 2.6 The Rocket Propellant Data

41
(No Transcript)
42
  • The interval width is a minimum at x0 = x̄ and
    widens as |x0 − x̄| increases.
  • Extrapolation: the interval becomes very wide, and
    the model may not hold, outside the range of the
    original data.

43
2.5 Prediction of New Observations
  • ŷ0 = β̂₀ + β̂₁x0 is the point estimate of the
    new value of the response y0.
  • y0 − ŷ0 follows a normal distribution with mean 0
    and variance σ²(1 + 1/n + (x0 − x̄)²/Sxx), since
    the future observation y0 is independent of ŷ0.

44
  • The 100(1 − α)% confidence interval on a future
    observation at x0 (a prediction interval for the
    future observation y0):
    ŷ0 ± t(α/2, n−2) √(MSRes(1 + 1/n + (x0 − x̄)²/Sxx))
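The same sketch adapted to a prediction interval; the extra "1 +" inside the square root accounts for the variance of the future observation itself (made-up data):

```python
# Prediction interval for a single future observation y0 at x0.
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])
n, alpha, x0 = len(x), 0.05, 7.0

Sxx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum(y * (x - x.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()
MS_res = np.sum((y - beta0_hat - beta1_hat * x) ** 2) / (n - 2)

y0_hat = beta0_hat + beta1_hat * x0
half = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
    MS_res * (1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx))
print(f"PI: {y0_hat - half:.3f} to {y0_hat + half:.3f}")
```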

45
  • Example 2.7

46
(No Transcript)
47
  • The 100(1 − α)% prediction interval on the mean of
    m future observations at x0:
    ȳ0 ± t(α/2, n−2) √(MSRes(1/m + 1/n + (x0 − x̄)²/Sxx))

48
2.6 Coefficient of Determination
  • The coefficient of determination:
    R² = SSR/SST = 1 − SSRes/SST
  • The proportion of variation explained by the
    regressor x
  • 0 ≤ R² ≤ 1
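A minimal computation of R² from the two sums of squares (made-up data):

```python
# Coefficient of determination R^2 = SSR/SST = 1 - SSRes/SST.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])

Sxx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum(y * (x - x.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()

SS_T = np.sum((y - y.mean()) ** 2)                    # total variability
SS_res = np.sum((y - beta0_hat - beta1_hat * x) ** 2)  # unexplained part
R2 = 1 - SS_res / SS_T
print(f"R^2 = {R2:.4f}")
```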

49
  • In Example 2.1, R² = 0.9018. This means that
    90.18% of the variability in strength is
    accounted for by the regression model.
  • R² can be increased by adding terms to the model.
  • For a simple regression model,
  • E(R²) increases (decreases) as Sxx increases
    (decreases)

50
  • R² does not measure the magnitude of the slope of
    the regression line. A large value of R² does not
    imply a steep slope.
  • R² does not measure the appropriateness of the
    linear model.

51
2.7 Some Considerations in the Use of Regression
  • Only suitable for interpolation over the range of
    the regressors, not for extrapolation.
  • Important: the disposition of the x values. The
    slope is strongly influenced by the remote values
    of x.
  • Outliers and bad values can seriously disturb the
    least-squares fit (affecting the intercept and
    the residual mean square).
  • Regression does not imply a cause-and-effect
    relationship.

52
(No Transcript)
53
(No Transcript)
54
  • The t statistic for testing H0: β₁ = 0 for this
    model is t₀ = 27.312 and R² = 0.9842.

55
  • x may be unknown. For example, consider
    predicting maximum daily load on an electric
    power generation system from a regression model
    relating the load to the maximum daily
    temperature.

56
2.8 Regression Through the Origin
  • A no-intercept model is y = β₁x + ε.
  • Given (yi, xi), i = 1, 2, …, n, the least-squares
    estimator is β̂₁ = Σᵢ xi yi / Σᵢ xi².
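A sketch of the no-intercept fit (made-up data; note the n − 1 residual degrees of freedom, since only one parameter is estimated):

```python
# Regression through the origin: beta1_hat = sum(x*y) / sum(x^2).
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])
n = len(x)

beta1_hat = np.sum(x * y) / np.sum(x ** 2)  # least-squares slope, no intercept
e = y - beta1_hat * x                       # residuals
MS_res = np.sum(e ** 2) / (n - 1)           # only one parameter estimated
print(f"beta1_hat = {beta1_hat:.4f}, MS_res = {MS_res:.4f}")
```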

57
  • The 100(1 − α)% confidence interval on β₁:
    β̂₁ ± t(α/2, n−1) √(MSRes / Σᵢ xi²)
  • The 100(1 − α)% confidence interval on E(y|x0):
    x0β̂₁ ± t(α/2, n−1) √(x0² MSRes / Σᵢ xi²)
  • The 100(1 − α)% prediction interval on y0:
    x0β̂₁ ± t(α/2, n−1) √(MSRes(1 + x0²/Σᵢ xi²))

58
  • Misuse: fitting through the origin when the data
    lie in a region of x-space remote from the origin.

59
  • The residual mean square for the no-intercept
    model is MSRes = Σᵢ (yi − ŷi)² / (n − 1).
  • Generally R² is not a good comparative statistic
    for the two models.
  • For the intercept model, R² is based on the
    corrected total sum of squares Σᵢ (yi − ȳ)².
  • For the no-intercept model, R₀² is based on the
    uncorrected total sum of squares Σᵢ yi².
  • Occasionally R₀² > R², but MS₀,Res < MSRes.

60
  • Example 2.8 The Shelf-Stocking Data

61
(No Transcript)
62
(No Transcript)
63
(No Transcript)
64
2.9 Estimation by Maximum Likelihood
  • Assume that the errors are NID(0, σ²). Then
    yi ~ N(β₀ + β₁xi, σ²).
  • The likelihood function:
    L(β₀, β₁, σ²) = (2πσ²)^(−n/2)
    · exp(−(1/(2σ²)) Σᵢ (yi − β₀ − β₁xi)²)
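Because the maximizers have closed forms here, a small sketch can check that the MLEs of β₀ and β₁ coincide with the least-squares estimates, while the MLE of σ² uses divisor n (made-up data):

```python
# Closed-form MLEs under normal errors: slope and intercept match least
# squares; sigma^2 is estimated with divisor n (biased, unlike MS_res
# with divisor n - 2). Illustrative data.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
beta1_mle = np.sum(y * (x - x.mean())) / Sxx   # same as least squares
beta0_mle = y.mean() - beta1_mle * x.mean()    # same as least squares
sigma2_mle = np.sum((y - beta0_mle - beta1_mle * x) ** 2) / n
print(beta0_mle, beta1_mle, sigma2_mle)
```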

65
  • MLE vs. LSE
  • In general, MLEs have better statistical
    properties than LSEs.
  • MLEs are unbiased (or asymptotically unbiased) and
    have minimum variance when compared to all other
    unbiased estimators.
  • They are also consistent estimators.
  • They are a set of sufficient statistics.

66
  • MLE requires more stringent statistical
    assumptions than LSE.
  • LSE requires only assumptions on the first two
    moments of the errors.
  • MLE requires a full distributional assumption.

67
2.10 Case Where the Regressor x Is Random
  • 2.10.1 x and y Jointly Distributed
  • x and y are jointly distributed random variables,
    and this joint distribution is unknown.
  • All of our previous results hold if
  • y|x ~ N(β₀ + β₁x, σ²)
  • the x's are independent random variables whose
    probability distribution does not involve
    β₀, β₁, σ².

68
  • 2.10.2 x and y Jointly Normally Distributed: the
    Correlation Model

69

70
  • The estimator of ρ is the sample correlation
    coefficient
    r = Σᵢ (xi − x̄)(yi − ȳ)
        / √(Σᵢ (xi − x̄)² · Σᵢ (yi − ȳ)²)

71
  • Test on ρ: H0: ρ = 0 vs. H1: ρ ≠ 0, with test
    statistic t₀ = r√(n − 2)/√(1 − r²), which follows
    a t distribution with n − 2 degrees of freedom
    under H0.
  • 100(1 − α)% C.I. for ρ, based on the approximate
    normality of arctanh(r):
    tanh(arctanh(r) − z(α/2)/√(n − 3)) ≤ ρ
    ≤ tanh(arctanh(r) + z(α/2)/√(n − 3))

72
  • Example 2.9 The Delivery Time Data