Chapter 10 Simple Regression - PowerPoint PPT Presentation

About This Presentation

Title:

Chapter 10 Simple Regression

Description:

The analysis of business and economic processes makes extensive use of ... a model that will predict total sales for proposed new retail store locations. ... – PowerPoint PPT presentation

Slides: 51

Provided by: hui

Transcript and Presenter's Notes

1
Chapter 10 Simple Regression
2
10.1 Correlation Analysis
3
Introduction
The analysis of business and economic processes
makes extensive use of relationships between
variables.
4
Correlation analysis

Recall that covariance measures the direction of the
linear relationship between two random variables. It
does not, however, reveal the strength of this
relationship, because its size depends on the units of
measurement.
The correlation coefficient is unit-free and lies
between -1 and +1, so it indicates how strongly the two
random variables are related.
Correlation measures only the linear relationship.
The correlation coefficient does not indicate which
variable is dependent and which is independent.
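
The contrast between covariance and correlation can be sketched numerically. The data below are hypothetical (not from the slides), and the helper names `covariance` and `correlation` are chosen here for illustration only.

```python
import math

def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Sample correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_rescaled = [100 * value for value in y]  # same data measured in a unit 100 times smaller

print(covariance(x, y), covariance(x, y_rescaled))    # covariance changes with the units
print(correlation(x, y), correlation(x, y_rescaled))  # correlation does not
```

Rescaling Y multiplies the covariance by 100 but leaves the correlation unchanged, which is why only the correlation speaks to the strength of the relationship.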

5
Hypothesis test for correlation
6
Correlation analysis
7
Tests for Zero Population Correlation
Let r be the sample correlation coefficient,
calculated from a random sample of n pairs of
observations from a joint normal distribution. The
following tests of the null hypothesis $H_0: \rho = 0$ have
significance level $\alpha$.
1. To test $H_0$ against the alternative $H_1: \rho > 0$,
the decision rule is
Reject $H_0$ if $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha}$
8
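Tests for Zero Population Correlation (continued)
2. To test $H_0: \rho = 0$ against the two-sided
alternative $H_1: \rho \neq 0$, the decision rule is
Reject $H_0$ if $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} < -t_{n-2,\alpha/2}$ or $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha/2}$
Here, $t_{n-2,\alpha}$ is the number for which
$P(t_{n-2} > t_{n-2,\alpha}) = \alpha$,
where the random variable $t_{n-2}$ follows a Student's t distribution
with (n - 2) degrees of freedom.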
9
Example (10.5)

A research team was attempting to determine whether
political risk in countries is related to infant
mortality in those countries. The sample
correlation between the experts' political riskiness
score and the infant mortality rate in these
countries was 0.75.
Test the null hypothesis of no correlation
between these quantities against the alternative
of positive correlation.

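Set up the hypothesis test: $H_0: \rho = 0$ against $H_1: \rho > 0$.
Using the sample information (r = 0.75 and the sample size n),
the t-statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.
Thus, we can reject the null hypothesis at the 1%
significance level. The data strongly support
the conclusion that there is a positive linear relationship
between the infant mortality rate and the experts'
political riskiness score.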

11
Exercise

P374 10.7
The sample correlation for 68 pairs of annual
returns on common stocks in Country A and Country
B was found to be 0.51. Test the null hypothesis
that the population correlation is 0 against the
alternative that it is positive. (Use
$\alpha = 0.05$.)
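
One way to check an answer to this exercise is to compute the test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ directly. This is a minimal sketch: the function name is made up here, and the critical value 1.668 is an approximate table value for $t_{66,0.05}$ entered by hand, so it should be confirmed against a t table.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing zero population correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.51, 68                 # values stated in the exercise
t_stat = correlation_t_statistic(r, n)

# Assumption: approximate upper 5% point of Student's t with 66 degrees of
# freedom, read from a table rather than computed here.
t_critical = 1.668

reject_h0 = t_stat > t_critical  # one-sided test against positive correlation
print(round(t_stat, 3), reject_h0)
```

With these inputs the statistic is well above the critical value, so the null hypothesis of zero correlation would be rejected in favor of positive correlation.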

12
Linear Regression Model
Correlation: linear relationship,
no distinction between the r.v.s (neither is dependent or independent)
Regression: linear relationship,
dependency of one r.v. on the other
13
Example 10.2

The president of Knox Retailers has asked you to
develop a model that will predict total sales for
proposed new retail store locations. As part of
the project, you need to estimate a linear
equation that predicts retail sales per household
as a function of household disposable income.
The company has obtained data from a national sampling
survey of households, including the variables
retail sales (Y) and income (X) per
household.

14
Data
15
Plot the data
16

As the scatter diagram shows, corresponding to any
given X (disposable income) there is a range of values of Y
(retail sales).
In other words, each disposable income (X) has
associated with it a retail sales (Y)
population: the totality of Y values corresponding to
that X.
What does it reveal?

We have the impression that retail sales (Y)
generally increase as disposable income (X)
increases.
This tendency is more obvious if we focus on the
circled points: these give the expected value, or
population mean, of Y corresponding to the various
values of X.

18
(No Transcript)
19

The population regression line (PRL) tells us how the average value
of Y (or any dependent variable) is related to
each value of X (or any independent variable).

20
PRL

Since the PRL sketched in the previous graph is linear,
we can express the average value mathematically
in the form
$E(Y \mid X) = \beta_0 + \beta_1 X$
where $\beta_0$ is the intercept of the equation
and $\beta_1$ is the slope; they are called
parameters.
The slope measures the rate of change in the mean
value of Y per unit change in X.
The intercept is the mean value of Y if X is zero.

21
Linear Regression Population Model

The PRL gives the average value of the dependent
variable. But if we pick one household at
random, that individual household's retail sales
will not necessarily equal the mean sales
value.
How do we explain that?
The best we can say is that the actual observed
sales figure equals the average for the given
household disposable income plus or minus some
quantity.
In a simple linear equation we model the other
factors behind this deviation with a random error term.
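
The idea of "average plus or minus some quantity" can be sketched with a small simulation. Everything numeric here is an assumption made for illustration: the parameter values are hypothetical, and the error term is taken to be normal with mean 0, which the text above does not require.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA0 = 2.0   # intercept of the population regression line
BETA1 = 0.5   # slope of the population regression line
SIGMA = 1.0   # standard deviation of the random error term

def population_mean(x):
    """Average value of Y for a given X, i.e. the point on the PRL."""
    return BETA0 + BETA1 * x

def draw_observation(x, rng):
    """One observed Y: the PRL value plus a random error with mean 0."""
    error = rng.gauss(0.0, SIGMA)
    return population_mean(x) + error

rng = random.Random(0)  # fixed seed so the sketch is repeatable
x_value = 4
draws = [draw_observation(x_value, rng) for _ in range(10000)]
sample_average = sum(draws) / len(draws)

print(population_mean(x_value))  # mean of Y at this X
print(draws[:3])                 # individual values scatter around that mean
print(sample_average)            # the average of many draws is close to the mean
```

Individual draws differ from the PRL value, but their average settles near it, which is what the random error term is meant to capture.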

22
(No Transcript)
23
The nature of the random error

It may represent the variables that are not
explicitly included in the model.
It reflects the inherent randomness in human
behavior.
It also reflects errors of measurement.
It also reflects the principle of
Occam's razor: descriptions should be kept as simple
as possible until proved inadequate.

24
(No Transcript)
25
Linear regression model

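When we use the least squares method to estimate the
population model, we obtain the estimated
regression model $y_i = b_0 + b_1 x_i + e_i$.
Estimated values of the coefficients: $b_0$ and $b_1$
Predicted value of Y: $\hat{y}_i = b_0 + b_1 x_i$
Residual: $e_i = y_i - \hat{y}_i$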

26
(Figure: residuals e1 and e2)

27

Linear regression provides two important results:
1. Predicted values of the dependent (endogenous)
variable as a function of the independent (exogenous)
variable.
2. The estimated marginal change in the dependent (endogenous)
variable that results from a one-unit change in
the independent (exogenous) variable.

28
(No Transcript)
29
Exercises

Explain the difference between the residual $e_i$
and the model error $\varepsilon_i$.
$\varepsilon_i$ is the true model error, which reflects the
random error in the population regression equation.
$e_i$ is the residual, which is the difference
between the observed value and the predicted
value of Y.
$e_i$ is a combined measure of both the model error
and the estimation errors in $b_0$ and $b_1$.

30
Least Squares Coefficient Estimators

How can we estimate the population regression
line? How can we find the values of $\beta_0$ and
$\beta_1$?
Find estimators of the unknown $\beta_0$ and $\beta_1$ by
using the least squares procedure.

31
The least squares procedure obtains estimates of
the linear equation coefficients $b_0$ and $b_1$ in
the model $\hat{y}_i = b_0 + b_1 x_i$ by minimizing the sum of the squared
residuals $e_i$: $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. This results in a procedure
stated as: choose $b_0$ and $b_1$ so that the
quantity $SSE = \sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2$ is minimized. We use differential
calculus to obtain the coefficient estimators
that minimize SSE.
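
The calculus step mentioned above can be sketched as follows. This is the standard textbook derivation, written out here under the usual notation $\bar{x}$ and $\bar{y}$ for the sample means; it is not transcribed from the slides.

```latex
\begin{align*}
SSE(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial SSE}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \quad\Rightarrow\quad b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial SSE}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  \quad\Rightarrow\quad b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{align*}
```

The second result follows after substituting $b_0 = \bar{y} - b_1 \bar{x}$ into the second condition.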
32

Why do we minimize SSE (the error sum of squares)
instead of the sum of the residuals $e_i$?
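
One possible answer can be illustrated numerically: positive and negative residuals cancel, so the plain sum of residuals is zero for every line through the point of means and cannot tell a good line from a bad one. The data are hypothetical and the three slopes are arbitrary choices for this sketch.

```python
# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

def residuals(slope):
    """Residuals of the line with this slope that passes through (mean_x, mean_y)."""
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

for slope in (-1.0, 0.0, 0.6):
    e = residuals(slope)
    residual_sum = sum(e)              # cancels to (numerically) zero for every slope
    sse = sum(ei ** 2 for ei in e)     # differs from line to line
    print(slope, round(residual_sum, 10), round(sse, 10))
```

All three lines have a residual sum of essentially zero, while their SSE values differ, so squaring the residuals is what makes the criterion informative.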

33
Least-Squares Derived Coefficient Estimators
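The slope coefficient estimator is
$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r \frac{s_Y}{s_X}$
and the constant, or intercept, estimator is
$b_0 = \bar{y} - b_1 \bar{x}$
We also note that the regression line always goes through
the point of means $(\bar{x}, \bar{y})$.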
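
The estimators can be sketched in a few lines of code. The function name `least_squares` and the data are hypothetical, used only to show the arithmetic of the usual least squares formulas.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the simple regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope estimator
    b0 = mean_y - b1 * mean_x    # intercept estimator
    return b0, b1

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
fitted_at_mean = b0 + b1 * mean_x   # should equal mean_y

print(b0, b1)
print(fitted_at_mean, mean_y)
```

Because the intercept is defined as the mean of Y minus the slope times the mean of X, the fitted line necessarily passes through the point of means.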
34
Standard Assumptions for the Linear Regression
Model

The x's are fixed numbers, or they are
realizations of a random variable X that are
independent of the error terms $\varepsilon_i$. In the
latter case, inference is carried out
conditionally on the observed values of the x's.
The error terms are random variables with mean 0
and the same variance, $\sigma^2$. The latter property is called
homoscedasticity, or uniform variance.
The random error terms $\varepsilon_i$ are not correlated
with one another, so that
$E(\varepsilon_i \varepsilon_j) = 0$ for all $i \neq j$

35
Computer Computation
The regression equation is
Y Retail Sales = 1922 + 0.382 X Income
36
The Explanatory Power

Analysis of Variance (ANOVA)
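The total variability in a regression analysis
(SST) can be partitioned into a component
explained by the regression (SSR) and a
component due to unexplained error (SSE):
$SST = SSR + SSE$
where $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$, $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ and $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$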

37
(No Transcript)
38
Computation Results
39
Coefficient of Determination, $R^2$
The coefficient of determination for a regression
equation is defined as $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$. This quantity
varies from 0 to 1, and higher values indicate a
better regression. Caution should be used in
making general interpretations of $R^2$ because a
high value can result from either a small SSE or
a large SST or both.
40
Correlation and $R^2$
The multiple coefficient of determination, $R^2$,
for a simple regression is equal to the simple
correlation squared: $R^2 = r^2$
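
The partition of variability and the link between $R^2$ and the squared correlation can be checked on a small example. The data are hypothetical, and the variable names are chosen here for illustration.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx                  # least squares slope
b0 = mean_y - b1 * mean_x       # least squares intercept
fitted = [b0 + b1 * xi for xi in x]

sst = syy                                              # total sum of squares
ssr = sum((fi - mean_y) ** 2 for fi in fitted)         # explained by the regression
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # unexplained error

r_squared = ssr / sst
r = sxy / math.sqrt(sxx * syy)  # sample correlation between x and y

print(sst, ssr, sse)
print(r_squared, r ** 2)
```

Up to floating-point rounding, SSR and SSE add up to SST, and the ratio SSR/SST matches the squared sample correlation.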
41
Computation
42
Estimation of Model Error Variance
The quantity SSE is a measure of the total
squared deviation about the estimated regression
line, and $e_i$ is the residual. An estimator for
the variance of the population model error, $\sigma^2$,
is $s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{SSE}{n-2}$. Division by n - 2 instead of n - 1 results
because the simple regression model uses two
estimated parameters, $b_0$ and $b_1$, instead of one.
43
Sample variance estimators
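If the standard least squares assumptions hold,
then $b_1$ is an unbiased estimator of $\beta_1$ and has the
unbiased sample variance estimator
$s_{b_1}^2 = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_e^2}{(n-1) s_X^2}$
and $b_0$ has the unbiased sample variance estimator
$s_{b_0}^2 = \left( \frac{1}{n} + \frac{\bar{x}^2}{(n-1) s_X^2} \right) s_e^2$
where $s_e^2 = SSE/(n-2)$ is the estimated model error variance and $s_X^2$ is the sample variance of the x's.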
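
A short numerical sketch of these variance estimators follows. It uses the standard textbook formulas, with the estimated error variance taken as SSE/(n - 2); the data and variable names are hypothetical.

```python
# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)   # equals (n - 1) times the sample variance of x
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

s_e_squared = sse / (n - 2)                               # model error variance estimate
s_b1_squared = s_e_squared / sxx                          # variance estimate for b1
s_b0_squared = s_e_squared * (1 / n + mean_x ** 2 / sxx)  # variance estimate for b0

print(s_e_squared, s_b1_squared, s_b0_squared)
```

The slope variance shrinks as the x values spread out (larger `sxx`), which is one reason a wide range of X is helpful when estimating a slope.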
44
Basis for Inference About the Population
Regression Slope
Let $\beta_1$ be a population regression slope and $b_1$
its least squares estimate based on n pairs of
sample observations. Then, if the standard
regression assumptions hold and it can also be
assumed that the errors $\varepsilon_i$ are normally
distributed, the random variable $t = \frac{b_1 - \beta_1}{s_{b_1}}$ is
distributed as Student's t with (n - 2) degrees
of freedom. In addition, the central limit
theorem enables us to conclude that this result
is approximately valid for a wide range of
non-normal distributions and large sample sizes,
n.
45
Tests of the Population Regression Slope

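If the regression errors $\varepsilon_i$ are normally
distributed and the standard least squares
assumptions hold (or if the distribution of $b_1$ is
approximately normal), the following tests have
significance level $\alpha$, where $\beta_1^*$ is the hypothesized slope value.
1. To test either null hypothesis $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \le \beta_1^*$
against the alternative $H_1: \beta_1 > \beta_1^*$,
the decision rule is: reject $H_0$ if $\frac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\alpha}$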

46
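Tests of the Population Regression Slope (continued)

2. To test either null hypothesis $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \ge \beta_1^*$
against the alternative $H_1: \beta_1 < \beta_1^*$,
the decision rule is: reject $H_0$ if $\frac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\alpha}$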

47
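Tests of the Population Regression Slope (continued)

3. To test the null hypothesis $H_0: \beta_1 = \beta_1^*$
against the two-sided alternative $H_1: \beta_1 \neq \beta_1^*$,
the decision rule is: reject $H_0$ if $\frac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\alpha/2}$ or $\frac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\alpha/2}$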
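
The three decision rules for the slope can be sketched as one small helper. The function names, the summary values (a slope of 0.6 with standard error 0.2828 from n = 5 observations) and the hand-entered table values for 3 degrees of freedom are all assumptions made for this illustration, not results from the slides.

```python
def slope_t_statistic(b1, s_b1, beta1_null=0.0):
    """t statistic (b1 - beta1*) / s_b1 for a hypothesized slope beta1*."""
    return (b1 - beta1_null) / s_b1

def reject_null(t_stat, t_critical, alternative):
    """Apply the decision rule for the chosen alternative.

    Pass t_{n-2,alpha} as t_critical for the one-sided alternatives and
    t_{n-2,alpha/2} for the two-sided alternative.
    """
    if alternative == "greater":
        return t_stat >= t_critical
    if alternative == "less":
        return t_stat <= -t_critical
    if alternative == "two-sided":
        return abs(t_stat) >= t_critical
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Hypothetical summary values from a regression with n = 5 (3 degrees of freedom).
t_stat = slope_t_statistic(b1=0.6, s_b1=0.2828, beta1_null=0.0)

# Assumption: t table values for 3 degrees of freedom, entered by hand.
T_3_05 = 2.353    # upper 5% point
T_3_025 = 3.182   # upper 2.5% point

reject_upper = reject_null(t_stat, T_3_05, "greater")
reject_two_sided = reject_null(t_stat, T_3_025, "two-sided")
print(round(t_stat, 4), reject_upper, reject_two_sided)
```

With so few observations the critical values are large, so even a t statistic above 2 does not lead to rejection in this sketch.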

48
Confidence Intervals for the Population
Regression Slope $\beta_1$

If the regression errors $\varepsilon_i$ are normally
distributed and the standard regression
assumptions hold, a 100(1 - $\alpha$)% confidence
interval for the population regression slope $\beta_1$
is given by
$b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}$
where $t_{n-2,\alpha/2}$ is the number for which
$P(t_{n-2} > t_{n-2,\alpha/2}) = \alpha/2$
and the random variable $t_{n-2}$ follows a
Student's t distribution with (n - 2) degrees of
freedom.
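
The interval is simple to compute once the slope estimate, its standard error and the t table value are known. In this sketch the function name and the inputs (slope 0.6, standard error 0.2828, and 3.182 as the upper 2.5% point of Student's t with 3 degrees of freedom) are hypothetical values chosen for illustration.

```python
def slope_confidence_interval(b1, s_b1, t_critical):
    """Return (lower, upper) as b1 minus and plus t_critical * s_b1."""
    half_width = t_critical * s_b1
    return b1 - half_width, b1 + half_width

# Hypothetical inputs: n = 5 observations, so n - 2 = 3 degrees of freedom.
# Assumption: 3.182 is the table value t_{3, 0.025} for a 95% interval.
lower, upper = slope_confidence_interval(b1=0.6, s_b1=0.2828, t_critical=3.182)
print(round(lower, 3), round(upper, 3))
```

An interval that contains 0, as this one does, is consistent with failing to reject a zero slope in the two-sided test at the same significance level.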

49
F test for Simple Regression Coefficient