Title: Chapter 10 Simple Regression
10.1 Correlation Analysis
Introduction
The analysis of business and economic processes
makes extensive use of relationships between
variables.
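Correlation analysis
- Recall that covariance measures the relationship between two random variables. It does not, however, reveal the strength of this relationship, because its value depends on the units in which the variables are measured.
- The correlation coefficient indicates how strongly the two random variables are related: $\rho = \mathrm{Cov}(X, Y)/(\sigma_X \sigma_Y)$, estimated by the sample correlation $r = s_{xy}/(s_x s_y)$, with $-1 \le r \le 1$.
- Correlation measures only the linear relationship.
- The correlation coefficient does not indicate which variable is dependent and which is independent.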
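A minimal Python sketch of the point about units, using only the standard library and a small hypothetical data set (not the chapter's data): rescaling x changes the covariance but leaves the correlation unchanged. The helper names are illustrative.

```python
import statistics


def sample_covariance(x, y):
    """Sample covariance s_xy with divisor n - 1."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """r = s_xy / (s_x * s_y): unit-free, always between -1 and 1."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))


# Hypothetical illustrative data, not taken from the chapter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_scaled = [100 * xi for xi in x]  # same variable in units 100 times smaller (e.g. dollars to cents)

print(sample_covariance(x, y), sample_covariance(x_scaled, y))    # 1.5 150.0: depends on units
print(sample_correlation(x, y), sample_correlation(x_scaled, y))  # both about 0.775
```

The covariance grows by the same factor of 100 as x, while r stays the same, which is why r (and not the covariance) is used to judge strength.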
Hypothesis test for correlation
Correlation analysis
Tests for Zero Population Correlation
Let r be the sample correlation coefficient, calculated from a random sample of n pairs of observations from a jointly normal distribution. The following tests of the null hypothesis $H_0: \rho = 0$ have significance level $\alpha$.
1. To test $H_0$ against the alternative $H_1: \rho > 0$, the decision rule is
   reject $H_0$ if $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha}$
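Tests for Zero Population Correlation (continued)
2. To test $H_0$ against the two-sided alternative $H_1: \rho \ne 0$, the decision rule is
   reject $H_0$ if $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} < -t_{n-2,\alpha/2}$ or $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha/2}$
Here, $t_{n-2,\alpha}$ is the number for which $P(t_{n-2} > t_{n-2,\alpha}) = \alpha$ (with $\alpha$ replaced by $\alpha/2$ in the two-sided rule), where the random variable $t_{n-2}$ follows a Student's t distribution with (n − 2) degrees of freedom.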
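The two decision rules above can be sketched in Python as follows. This is a minimal illustration with hypothetical helper names; the standard library has no Student's t quantile function, so the critical value is assumed to be looked up in a t table and passed in by the caller.

```python
import math


def correlation_t_statistic(r, n):
    """Test statistic r * sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0 (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def reject_zero_correlation(r, n, t_crit, alternative="greater"):
    """Apply decision rule 1 or 2.

    t_crit must be looked up in a Student's t table by the caller:
    t_{n-2, alpha} for alternative="greater", t_{n-2, alpha/2} for "two-sided".
    """
    t = correlation_t_statistic(r, n)
    if alternative == "greater":
        return t > t_crit
    if alternative == "two-sided":
        return t < -t_crit or t > t_crit
    raise ValueError("alternative must be 'greater' or 'two-sided'")


# Numbers from Exercise 10.7 (p. 374): r = 0.51, n = 68, alpha = 0.05, one-sided.
# t_{66, 0.05} is approximately 1.668 (value assumed from a standard t table).
t = correlation_t_statistic(0.51, 68)
print(round(t, 2))                                      # 4.82
print(reject_zero_correlation(0.51, 68, t_crit=1.668))  # True
```

Passing the critical value explicitly keeps the sketch dependency-free; with SciPy available one could instead compute it, but that is outside this chapter's table-based approach.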
Example 10.5
- A research team was attempting to determine whether political risk in countries is related to infant mortality in those countries. The sample correlation between experts' political riskiness scores and the infant mortality rates in these countries was 0.75.
- Test the null hypothesis of no correlation between these quantities against the alternative of positive correlation.
- Set up the hypotheses: $H_0: \rho = 0$ against $H_1: \rho > 0$.
- Use the sample information: $r = 0.75$.
- The t-statistic is $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.
- Thus, we can reject the null hypothesis at the 1% significance level. The data strongly support a positive linear relationship between the infant mortality rate and experts' political riskiness scores.
Exercise
- P. 374, Exercise 10.7
- The sample correlation for 68 pairs of annual returns on common stocks in Country A and Country B was found to be 0.51. Test the null hypothesis that the population correlation is 0 against the alternative that it is positive (use α = 0.05).
Linear Regression Model
- Correlation: linear relationship; interdependence between r.v.s
- Regression: linear relationship; dependency between r.v.s
Example 10.2
- The president of Knox Retailers has asked you to develop a model that will predict total sales for proposed new retail store locations. As part of the project, you need to estimate a linear equation that predicts retail sales per household as a function of household disposable income. Data have been obtained from a national sampling survey of households, with the variables retail sales (Y) and income (X) per household.
Data
Plot the data
- As the scatter diagram shows, corresponding to any given X (disposable income) there are several values of Y (retail sales).
- In other words, each disposable income (X) has associated with it a population of retail sales (Y): the totality of Y values corresponding to that X.
- What does the scatter diagram reveal?
- We have the impression that retail sales (Y) generally increase as disposable income (X) increases.
- This tendency is more obvious if we focus on the circled points: these give the expected, or population mean, value of Y corresponding to the various X values.
- The population regression line (PRL) tells us how the average value of Y (or any dependent variable) is related to each value of X (or any independent variable).
PRL
- Since the PRL sketched in the previous graph is linear, we can express the average value of Y mathematically in the form $E(Y \mid X_i) = \beta_0 + \beta_1 X_i$, where $\beta_0$ is the intercept of the equation and $\beta_1$ is the slope; both are called parameters.
- The slope measures the rate of change in the mean value of Y per unit change in X.
- The intercept is the mean value of Y when X is zero.
Linear Regression Population Model
- The PRL gives the average value of the dependent variable. But if we pick one household at random, that household's retail sales need not equal the mean sales value.
- How do we explain that?
- The best we can say is that the observed sales equal the average for the given household disposable income plus or minus some quantity.
- In a simple linear equation we model these other factors by a random error term $\varepsilon_i$: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.
The nature of the random error
- It may represent variables that are not explicitly included in the model.
- It reflects the inherent randomness in human behavior.
- It also reflects errors of measurement.
- It also reflects the principle of Occam's razor: descriptions should be kept as simple as possible until proved to be inadequate.
Linear regression model
- When we use the least-squares method to estimate the population model, we obtain the estimated regression model $y_i = b_0 + b_1 x_i + e_i$.
- Estimated values of the coefficients: $b_0$ and $b_1$
- Predicted value of Y: $\hat{y}_i = b_0 + b_1 x_i$
- Residual: $e_i = y_i - \hat{y}_i$
(Figure: residuals e1 and e2)
- Linear regression provides two important results:
- Predicted values of the dependent or endogenous variable as a function of an independent or exogenous variable.
- The estimated marginal change in the endogenous variable that results from a one-unit change in the independent or exogenous variable.
Exercises
- Explain the difference between the residual $e_i$ and the model error $\varepsilon_i$.
- $\varepsilon_i$ is the true model error, which reflects the random error in the population regression equation.
- $e_i$ is the residual, which is the difference between the observed value and the predicted value of Y.
- $e_i$ is a combined measure of both the model error and the errors in estimating $\beta_0$ and $\beta_1$ by $b_0$ and $b_1$.
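A short simulation makes the distinction concrete. In the sketch below the population parameters BETA0, BETA1 and SIGMA are hypothetical values chosen only for illustration; because the errors are simulated we can see them, which is never possible with real data. Substituting the population model into $e_i = y_i - b_0 - b_1 x_i$ gives $e_i = \varepsilon_i + (\beta_0 - b_0) + (\beta_1 - b_1)x_i$, and the code checks that identity numerically.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

random.seed(1)
x = [float(i) for i in range(1, 21)]
eps = [random.gauss(0.0, SIGMA) for _ in x]              # true model errors epsilon_i
y = [BETA0 + BETA1 * xi + ei for xi, ei in zip(x, eps)]  # population model

# Least-squares estimates b0 and b1 from the simulated sample.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # residuals e_i

# e_i = epsilon_i + (beta0 - b0) + (beta1 - b1) * x_i
rebuilt = [ei + (BETA0 - b0) + (BETA1 - b1) * xi for ei, xi in zip(eps, x)]
print(max(abs(a - b) for a, b in zip(resid, rebuilt)))  # zero up to rounding error
print(sum(resid), sum(eps))  # residuals sum to ~0; the true errors generally do not
```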
Least-Squares Coefficient Estimators
- How can we estimate the population regression line? How can we find the values of $\beta_0$ and $\beta_1$?
- Find estimators of the unknown $\beta_0$ and $\beta_1$ by using the least-squares procedure.
The least-squares procedure obtains estimates $b_0$ and $b_1$ of the linear equation coefficients in the model $\hat{y}_i = b_0 + b_1 x_i$ by minimizing the sum of the squared residuals $e_i$:
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - (b_0 + b_1 x_i)\right)^2$
This results in a procedure stated as: choose $b_0$ and $b_1$ so that the quantity SSE is minimized. We use differential calculus to obtain the coefficient estimators that minimize SSE.
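The calculus step can be written out as the standard derivation of the normal equations: set both partial derivatives of SSE to zero and solve. Since SSE is a convex quadratic in $(b_0, b_1)$, the stationary point is the minimum, provided the $x_i$ are not all equal.

```latex
\begin{aligned}
SSE(b_0, b_1) &= \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2 \\
\frac{\partial SSE}{\partial b_0} &= -2 \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right) = 0
  \;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial SSE}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i \left(y_i - b_0 - b_1 x_i\right) = 0
  \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 \\
\text{substituting } b_0: \quad
\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} &= b_1 \left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)
  \;\Rightarrow\; b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```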
- Why do we minimize SSE (the error sum of squares) instead of the sum of the $e_i$?
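One way to see the answer: every line through the point of means $(\bar{x}, \bar{y})$ has residuals that sum to zero, because positive and negative residuals cancel, so "make the sum of the $e_i$ zero" cannot pick out a single line. The squared residuals cannot cancel. A minimal sketch on a small hypothetical data set (not the chapter's data):

```python
# Hypothetical illustrative data, not taken from the chapter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n


def residuals(b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i) for a candidate line."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Every line through (xbar, ybar) has intercept ybar - slope * xbar.
for slope in (-1.0, 0.0, 0.6, 3.0):
    e = residuals(ybar - slope * xbar, slope)
    sum_e = sum(e)                  # always 0: positive and negative residuals cancel
    sse = sum(ei ** 2 for ei in e)  # differs from line to line
    print(f"slope={slope:5.1f}  sum(e)={sum_e:6.2f}  SSE={sse:6.2f}")
```

The SSE values are 28, 6, 2.4 and 60; the smallest belongs to slope 0.6, which is the least-squares slope for these data.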
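Least-Squares Derived Coefficient Estimators
The slope coefficient estimator is
$b_1 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \dfrac{\mathrm{Cov}(x, y)}{s_x^2} = r\,\dfrac{s_y}{s_x}$
and the constant or intercept estimator is
$b_0 = \bar{y} - b_1 \bar{x}$
We also note that the regression line always passes through the point of means $(\bar{x}, \bar{y})$.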
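The two estimator formulas translate directly into code. A minimal sketch with a hypothetical helper name `least_squares` and a small hypothetical data set; it also confirms that the fitted line passes through $(\bar{x}, \bar{y})$.

```python
def least_squares(x, y):
    """Return (b0, b1) from b1 = Sxy / Sxx and b0 = ybar - b1 * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical illustrative data, not taken from the chapter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares(x, y)
print(round(b0, 6), round(b1, 6))  # 2.2 0.6
# The fitted line passes through (xbar, ybar) = (3, 4):
print(round(b0 + b1 * 3.0, 6))     # 4.0
```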
Standard Assumptions for the Linear Regression Model
- The $x_i$ are fixed numbers, or they are realizations of a random variable X that is independent of the error terms $\varepsilon_i$. In the latter case, inference is carried out conditionally on the observed values of the $x_i$.
- The error terms are random variables with mean 0 and the same variance, $\sigma^2$: $E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma^2$ for $i = 1, \ldots, n$. The latter is called homoscedasticity, or uniform variance.
- The random error terms $\varepsilon_i$ are not correlated with one another, so that $E[\varepsilon_i \varepsilon_j] = 0$ for all $i \ne j$.
Computer Computation
The regression equation is
Retail Sales (Y) = 1922 + 0.382 Income (X)
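The Explanatory Power
- Analysis of Variance (ANOVA)
- The total variability in a regression analysis (SST) can be partitioned into a component explained by the regression (SSR) and a component due to unexplained error (SSE):
$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} e_i^2$, that is, SST = SSR + SSE.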
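The partition can be checked numerically. A minimal sketch on a small hypothetical data set (not the chapter's Knox Retailers data), fitting the least-squares line and computing the three sums of squares:

```python
# Hypothetical illustrative data, not taken from the chapter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]  # predicted values

sst = sum((yi - ybar) ** 2 for yi in y)               # total variability
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained error
print(round(sst, 6), round(ssr, 6), round(sse, 6))    # 6.0 3.6 2.4
print(abs(sst - (ssr + sse)) < 1e-9)                  # True
```

The identity holds exactly only for the least-squares fit; for an arbitrary line the cross term does not vanish.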
Computation Results
Coefficient of Determination, $R^2$
The coefficient of determination for a regression equation is defined as
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
This quantity varies from 0 to 1, and higher values indicate a better regression. Caution should be used in making general interpretations of $R^2$ because a high value can result from a small SSE, a large SST, or both.
Correlation and $R^2$
The multiple coefficient of determination, $R^2$, for a simple regression is equal to the simple correlation squared: $R^2 = r^2$.
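A quick numerical check of $R^2 = r^2$, as a minimal sketch on a small hypothetical data set: $R^2$ is computed from the regression as $1 - SSE/SST$ and r directly from the sums of cross-products.

```python
import statistics

# Hypothetical illustrative data, not taken from the chapter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)   # this is SST
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy                 # R^2 = 1 - SSE / SST from the regression
r = sxy / (sxx * syy) ** 0.5              # sample correlation coefficient
print(round(r_squared, 6), round(r ** 2, 6))  # 0.6 0.6
```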
Computation
Estimation of Model Error Variance
The quantity SSE is a measure of the total squared deviation about the estimated regression line, and $e_i$ is the residual:
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
An estimator for the variance $\sigma^2$ of the population model error is
$s_e^2 = \dfrac{SSE}{n-2} = \dfrac{\sum_{i=1}^{n} e_i^2}{n-2}$
Division by n − 2 instead of n − 1 results because the simple regression model uses two estimated parameters, $b_0$ and $b_1$, instead of one.
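A minimal sketch of $s_e^2$ with a hypothetical helper `error_variance`, on a small hypothetical data set; the divisor n − 2 reflects the two degrees of freedom used up by $b_0$ and $b_1$.

```python
def error_variance(x, y):
    """s_e^2 = SSE / (n - 2); two degrees of freedom are used up by b0 and b1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2)


# Hypothetical data: SSE = 2.4 with n - 2 = 3, so s_e^2 = 0.8.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(round(error_variance(x, y), 6))  # 0.8
```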
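Sample variance estimators
If the standard least-squares assumptions hold, then $b_1$ is an unbiased estimator of $\beta_1$ with population variance
$\sigma_{b_1}^2 = \dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
and unbiased sample variance estimator
$s_{b_1}^2 = \dfrac{s_e^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
An unbiased sample variance estimator for $b_0$ is
$s_{b_0}^2 = s_e^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)$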
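These variance estimators can be sketched in Python as follows, assuming a hypothetical helper `coefficient_standard_errors` and a small hypothetical data set; the function returns the standard errors $s_{b_0}$ and $s_{b_1}$, the square roots of the estimators.

```python
import math


def coefficient_standard_errors(x, y):
    """Return (s_b0, s_b1), the square roots of the unbiased variance estimators."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2_e = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s2_b1 = s2_e / sxx                        # s_e^2 / sum (x_i - xbar)^2
    s2_b0 = s2_e * (1 / n + xbar ** 2 / sxx)  # s_e^2 (1/n + xbar^2 / sum (x_i - xbar)^2)
    return math.sqrt(s2_b0), math.sqrt(s2_b1)


# Hypothetical data: s_e^2 = 0.8, xbar = 3, sum (x_i - xbar)^2 = 10.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
s_b0, s_b1 = coefficient_standard_errors(x, y)
print(round(s_b0 ** 2, 6), round(s_b1 ** 2, 6))  # 0.88 0.08
```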
Basis for Inference About the Population Regression Slope
Let $\beta_1$ be a population regression slope and $b_1$ its least-squares estimate based on n pairs of sample observations. Then, if the standard regression assumptions hold and it can also be assumed that the errors $\varepsilon_i$ are normally distributed, the random variable
$t = \dfrac{b_1 - \beta_1}{s_{b_1}}$
is distributed as Student's t with (n − 2) degrees of freedom. In addition, the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal error distributions when the sample size n is large.
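Tests of the Population Regression Slope
- If the regression errors $\varepsilon_i$ are normally distributed and the standard least-squares assumptions hold (or if the distribution of $b_1$ is approximately normal), the following tests have significance level $\alpha$.
- 1. To test either null hypothesis $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \le \beta_1^*$
- against the alternative $H_1: \beta_1 > \beta_1^*$,
- the decision rule is: reject $H_0$ if $\dfrac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\alpha}$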
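Tests of the Population Regression Slope (continued)
- 2. To test either null hypothesis $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \ge \beta_1^*$
- against the alternative $H_1: \beta_1 < \beta_1^*$,
- the decision rule is: reject $H_0$ if $\dfrac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\alpha}$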
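Tests of the Population Regression Slope (continued)
- 3. To test the null hypothesis $H_0: \beta_1 = \beta_1^*$
- against the two-sided alternative $H_1: \beta_1 \ne \beta_1^*$,
- the decision rule is: reject $H_0$ if $\dfrac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\alpha/2}$ or $\dfrac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\alpha/2}$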
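Decision rules 1 to 3 can be sketched together in Python. The helper names are hypothetical, the data are a small hypothetical set, and the critical values for 3 degrees of freedom (3.182 for $\alpha/2 = 0.025$, 1.638 for $\alpha = 0.10$) are assumed from a standard t table rather than computed.

```python
import math


def slope_t_statistic(x, y, beta1_star=0.0):
    """Return (b1, s_b1, t) where t = (b1 - beta1_star) / s_b1 has n - 2 df."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2_e = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b1 = math.sqrt(s2_e / sxx)
    return b1, s_b1, (b1 - beta1_star) / s_b1


def reject_slope_h0(t, t_crit, alternative):
    """Rules 1-3; t_crit is t_{n-2,alpha} (one-sided) or t_{n-2,alpha/2} (two-sided)."""
    if alternative == "greater":
        return t >= t_crit
    if alternative == "less":
        return t <= -t_crit
    if alternative == "two-sided":
        return t >= t_crit or t <= -t_crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical data; critical values for 3 df assumed from a standard t table.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, s_b1, t = slope_t_statistic(x, y)          # b1 = 0.6, s_b1 about 0.283, t about 2.12
print(reject_slope_h0(t, 3.182, "two-sided"))  # alpha = 0.05: False
print(reject_slope_h0(t, 1.638, "greater"))    # alpha = 0.10: True
```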
Confidence Intervals for the Population Regression Slope $\beta_1$
- If the regression errors $\varepsilon_i$ are normally distributed and the standard regression assumptions hold, a 100(1 − α)% confidence interval for the population regression slope $\beta_1$ is given by
  $b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}$
- where $t_{n-2,\alpha/2}$ is the number for which $P(t_{n-2} > t_{n-2,\alpha/2}) = \alpha/2$,
- and the random variable $t_{n-2}$ follows a Student's t distribution with (n − 2) degrees of freedom.
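A minimal sketch of the interval, with a hypothetical helper name. The inputs come from a small hypothetical data set ($b_1 = 0.6$, $s_e^2 = 0.8$, $\sum (x_i - \bar{x})^2 = 10$), and $t_{3,0.025} = 3.182$ is assumed from a standard t table.

```python
import math


def slope_confidence_interval(b1, s_b1, t_crit):
    """100(1 - alpha)% interval b1 -/+ t_{n-2, alpha/2} * s_b1; t_crit from a t table."""
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width


# For the hypothetical data x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5]:
# b1 = 0.6, s_e^2 = 0.8 and sum((x - xbar)^2) = 10, so s_b1 = sqrt(0.08).
# t_{3, 0.025} = 3.182 is assumed from a standard t table (95% interval).
low, high = slope_confidence_interval(0.6, math.sqrt(0.08), 3.182)
print(round(low, 2), round(high, 2))  # -0.3 1.5
```

The interval contains 0, which agrees with the two-sided t test at α = 0.05 not rejecting $H_0: \beta_1 = 0$ for the same data, since $|t| = 0.6/\sqrt{0.08} \approx 2.12 < 3.182$.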
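F test for Simple Regression Coefficient
- We can test the hypothesis $H_0: \beta_1 = 0$
- against the alternative $H_1: \beta_1 \ne 0$
- by using the F statistic $F = \dfrac{MSR}{MSE} = \dfrac{SSR/1}{SSE/(n-2)} = \dfrac{SSR}{s_e^2}$.
- The decision rule is: reject $H_0$ if $F \ge F_{1,n-2,\alpha}$.
- We can also show that the F statistic is $F = t_{b_1}^2 = \left(\dfrac{b_1}{s_{b_1}}\right)^2$
- for any simple regression analysis.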
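The identity $F = t_{b_1}^2$ can be confirmed numerically. A minimal sketch on a small hypothetical data set, computing F from the ANOVA sums of squares and t from the slope's standard error:

```python
import math

# Hypothetical illustrative data, not taken from the chapter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
s2_e = sse / (n - 2)                 # MSE
f_stat = ssr / s2_e                  # F = MSR / MSE with 1 and n - 2 df
t_stat = b1 / math.sqrt(s2_e / sxx)  # t statistic for H0: beta1 = 0
print(round(f_stat, 6), round(t_stat ** 2, 6))  # 4.5 4.5
```

Because $SSR = b_1^2 \sum (x_i - \bar{x})^2$ in simple regression, dividing by $s_e^2$ gives exactly $b_1^2 / s_{b_1}^2$, so the F test and the two-sided t test always reach the same decision.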
Summary
- Analysis of Variance
- Assumptions for the Least-Squares Coefficient Estimators
- Basis for Inference About the Population Regression Slope
- Coefficient of Determination, $R^2$
- Confidence Intervals for Predictions
- Confidence Intervals for the Population Regression Slope $\beta_1$
- Correlation and $R^2$
- Estimation of Model Error Variance
- F test for Simple Regression Coefficient
- Least-Squares Procedure
- Linear Regression Outcomes