Chapter 10 Simple Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter 10 Simple Regression

Description:

The analysis of business and economic processes makes extensive use of ... a model that will predict total sales for proposed new retail store locations. ... – PowerPoint PPT presentation

Slides: 51
Provided by: hui

Transcript and Presenter's Notes


1
Chapter 10 Simple Regression
2
10.1 Correlation Analysis
3
Introduction
The analysis of business and economic processes
makes extensive use of relationships between
variables.
4
Correlation analysis
  • Recall that covariance measures the direction of
    the linear relationship between two random
    variables. It does not, however, reveal the
    strength of this relationship.
  • The correlation coefficient indicates how
    strongly the two random variables are related.
  • Correlation measures only the linear
    relationship.
  • The correlation coefficient does not indicate
    which variable is dependent and which is
    independent.

5
Hypothesis test for correlation
6
Correlation analysis
7
Tests for Zero Population Correlation
Let r be the sample correlation coefficient,
calculated from a random sample of n pairs of
observations from a joint normal distribution. The
following tests of the null hypothesis
$H_0: \rho = 0$ have significance level $\alpha$.
1. To test $H_0$ against the alternative
$H_1: \rho > 0$, the decision rule is: reject $H_0$ if
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha}$
8
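Tests for Zero Population Correlation (continued)
2. To test $H_0: \rho = 0$ against the two-sided
alternative $H_1: \rho \neq 0$, the decision rule is:
reject $H_0$ if
$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} < -t_{n-2,\alpha/2}$ or $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{n-2,\alpha/2}$
Here, $t_{n-2,\alpha}$ is the number for which
$P(t_{n-2} > t_{n-2,\alpha}) = \alpha$, where the random
variable $t_{n-2}$ follows a Student's t distribution
with (n - 2) degrees of freedom.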
9
Example (10.5)
  • A research team was attempting to determine
    whether political risk in countries is related to
    infant mortality in these countries. The sample
    correlation between the experts' political
    riskiness score and the infant mortality rate in
    these countries was 0.75.
  • Test the null hypothesis of no correlation
    between these quantities against the alternative
    of positive correlation.

10
  • Set up the hypothesis test: $H_0: \rho = 0$
    against $H_1: \rho > 0$.
  • Using the sample information, r = 0.75.
  • The t-statistic is
    $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
  • Thus, we can reject the null hypothesis at the 1%
    significance level. The data strongly support
    a positive linear relationship between the
    infant mortality rate and the experts'
    political riskiness score.

11
Exercise
  • P374 10.7
  • The sample correlation for 68 pairs of annual
    returns on common stocks in Country A and Country
    B was found to be 0.51. Test the null hypothesis
    that the population correlation is 0, against the
    alternative that it is positive. (Use
    $\alpha = 0.05$.)
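
A minimal sketch of how this exercise could be worked, assuming the one-sided test of zero population correlation with statistic t = r * sqrt(n - 2) / sqrt(1 - r^2). The function name is illustrative, and the critical value 1.668 is an approximate table value for 66 degrees of freedom that is assumed here rather than taken from the slides.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, referred to Student's t with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.51, 68  # values given in the exercise
t_stat = correlation_t_statistic(r, n)

# Assumption: approximate upper 5% point of Student's t with 66 df (from a t table).
t_crit = 1.668

# One-sided rule: reject H0 in favour of positive correlation if t exceeds the critical value.
reject_h0 = t_stat > t_crit
print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

Under these assumptions the statistic is far above the critical value, so the null hypothesis of zero correlation would be rejected at the 5% level.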

12
Linear Regression Model
Correlation: linear relationship, no distinction
between the random variables.
Regression: linear relationship, dependency
between the random variables.
13
Example 10.2
  • The president of Knox Retailers has asked you to
    develop a model that will predict total sales for
    proposed new retail store locations. As part of
    the project, you need to estimate a linear
    equation that predicts retail sales per household
    as a function of household disposable income.
    They have obtained data from a national sampling
    survey of households, including the variables
    retail sales (Y) and income (X) per household.

14
Data
15
Plot the data
16
  • As the scattergram shows, corresponding to any
    given X (disposable income) there are values of Y
    (retail sales).
  • In other words, each disposable income (X) has
    associated with it a retail sales (Y)
    population: the totality of Y values
    corresponding to that X.
  • What does it reveal?

17
  • We have the impression that retail sales (Y)
    generally increase as disposable income (X)
    increases.
  • This tendency is more obvious if we focus on the
    circled points: these give the expected value,
    or population mean, of Y corresponding to each
    value of X.

19
  • The population regression line (PRL) tells us how
    the average value of Y (the dependent variable)
    is related to each value of X (the independent
    variable).

20
PRL
  • Since the PRL sketched in the previous graph is
    linear, we can express the average value
    mathematically in the form
    $E(Y \mid X = x) = \beta_0 + \beta_1 x$
  • where $\beta_0$ is the intercept of the equation
    and $\beta_1$ is the slope; they are called
    parameters.
  • Slope measures the rate of change in the mean
    value of Y per unit change in X.
  • Intercept is the mean value of Y if X is zero.

21
Linear Regression Population Model
  • The PRL gives the average value of the dependent
    variable. But if we pick one household at
    random, that household's retail sales will not
    necessarily equal the mean sales value.
  • How do we explain that?
  • The best we can say is that the actual observed
    sales figure equals the average for the given
    household disposable income plus or minus some
    quantity.
  • In a simple linear equation we model this
    deviation, which reflects all other factors, by
    a random error term.
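
The statement above is commonly written as the population regression model below. The notation ($\beta_0$, $\beta_1$, $\varepsilon_i$) is assumed here, since the slide's own equation did not survive in the transcript.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i , \qquad E(\varepsilon_i) = 0
```

Here $\beta_0 + \beta_1 x_i$ is the mean of Y at $x_i$ (the PRL), and $\varepsilon_i$ is the "plus or minus some quantity" for household i.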

23
The nature of the random error
  • It may represent the variables that are not
    explicitly included in the model.
  • It reflects the inherent randomness in human
    behavior.
  • It also reflects the errors of measurement.
  • It also reflects the principle of Occam's razor:
    that descriptions be kept as simple as possible
    until proved to be inadequate.

25
Linear regression model
  • When we use the least squares method to estimate
    the population model, we obtain the estimated
    regression model.
  • Estimated values of the coefficients: $b_0$, $b_1$
  • Predicted value of Y: $\hat{y}_i = b_0 + b_1 x_i$
  • Residual: $e_i = y_i - \hat{y}_i$

26
(Figure: residuals e1 and e2.)

27
  • Linear regression provides two important results:
  • Predicted values of the dependent or endogenous
    variable as a function of an independent or
    exogenous variable.
  • Estimated marginal change in the endogenous
    variable that results from a one unit change in
    the independent or exogenous variable.

29
Exercises
  • Explain the difference between the residual $e_i$
    and the model error $\varepsilon_i$.
  • $\varepsilon_i$ is the true model error, which
    reflects the random error in the population
    regression equation.
  • $e_i$ is the residual, which is the difference
    between the observed value and the predicted
    value of Y.
  • $e_i$ is a combined measure of both the model
    error and the errors made in estimating
    $\beta_0$ and $\beta_1$ by $b_0$ and $b_1$.

30
Least Squares Coefficient Estimators
  • How can we estimate the population regression
    line? How can we find the values of $\beta_0$ and
    $\beta_1$?
  • Find estimators of the unknown $\beta_0$ and
    $\beta_1$ by using the least squares procedure.

31
The least squares procedure obtains estimates of
the linear equation coefficients $b_0$ and $b_1$ in
the model $\hat{y}_i = b_0 + b_1 x_i$ by minimizing
the sum of the squared residuals $e_i$. This results
in a procedure stated as: choose $b_0$ and $b_1$ so
that the quantity
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2$
is minimized. We use differential
calculus to obtain the coefficient estimators
that minimize SSE.
32
  • Why do we minimize SSE (the error sum of squares)
    instead of the sum of the $e_i$?

33
Least-Squares Derived Coefficient Estimators
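The slope coefficient estimator is
$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
and the constant or intercept estimator is
$b_0 = \bar{y} - b_1 \bar{x}$
We also note that the regression line always goes
through the point of means $(\bar{x}, \bar{y})$.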
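
As an illustration, the estimators can be computed directly from their definitions: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept is the mean of y minus the slope times the mean of x. This is a small sketch on hypothetical data (the five points below are invented for the example, not taken from the Knox Retailers data), and the function name is illustrative.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope estimator
    b0 = y_bar - b1 * x_bar    # intercept estimator
    return b0, b1


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of residuals = {sum(residuals):.6f}, SSE = {sse:.3f}")
```

In this example the residuals sum to zero (up to rounding) because positive and negative residuals cancel, which is one reason the squared residuals, not the residuals themselves, are summed. The fitted line also passes through the point of means, here (3, 4).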
34
Standard Assumptions for the Linear Regression
Model
  • The x values are fixed numbers, or they are
    realizations of a random variable X that are
    independent of the error terms $\varepsilon_i$.
    In the latter case, inference is carried out
    conditionally on the observed values of x.
  • The error terms are random variables with mean 0
    and the same variance, $\sigma^2$. The latter is
    called homoscedasticity, or uniform variance:
    $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = \sigma^2$ for $i = 1, \ldots, n$
  • The random error terms $\varepsilon_i$ are not
    correlated with one another, so that
    $E(\varepsilon_i \varepsilon_j) = 0$ for all $i \neq j$

35
Computer Computation
The regression equation is
Y Retail Sales = 1922 + 0.382 X Income
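
A brief sketch of how a fitted equation with intercept 1922 and slope 0.382 could be used for the two regression outputs: a predicted value and a marginal change. The income figure of 55,000 is a hypothetical input chosen for illustration, and the function name is not from the source.

```python
B0 = 1922.0  # intercept of the fitted equation
B1 = 0.382   # slope of the fitted equation


def predict_retail_sales(income: float) -> float:
    """Predicted retail sales per household for a given disposable income."""
    return B0 + B1 * income


income = 55_000  # hypothetical household disposable income
predicted = predict_retail_sales(income)

# Marginal change: difference in predicted sales for one extra unit of income.
marginal = predict_retail_sales(income + 1) - predicted

print(f"Predicted retail sales: {predicted:.2f}")
print(f"Change per extra unit of income: {marginal:.3f}")
```

The marginal change equals the slope regardless of the income level chosen, which is what linearity of the fitted equation implies.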
36
The Explanatory Power
  • Analysis of Variance (ANOVA)
  • The total variability in a regression analysis
    (SST) can be partitioned into a component
    explained by the regression (SSR) and a
    component due to unexplained error (SSE):
    SST = SSR + SSE
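
The partition can be checked numerically. The sketch below fits a least squares line to hypothetical data (invented for the example) and computes the three sums of squares from their usual definitions: SST from deviations of y about its mean, SSR from deviations of the fitted values about that mean, and SSE from the residuals.

```python
# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained error

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```

For these five points the explained and unexplained parts add up to the total, as the partition requires.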

38
Computation Results
39
Coefficient of Determination, $R^2$
The coefficient of determination for a regression
equation is defined as
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
This quantity varies from 0 to 1, and higher values
indicate a better regression. Caution should be
used in making general interpretations of $R^2$
because a high value can result from either a small
SSE or a large SST or both.
40
Correlation and $R^2$
The multiple coefficient of determination, $R^2$,
for a simple regression is equal to the simple
correlation squared:
$R^2 = r^2$
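
This identity can be verified on a small example. The sketch below uses hypothetical data (invented for illustration) and computes the coefficient of determination as 1 - SSE/SST and the sample correlation r from the usual sums of deviations, then compares the two.

```python
import math

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is also SST
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Coefficient of determination from the least squares fit.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / s_yy

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
```

The two quantities agree up to floating-point rounding, but only in simple regression with a single explanatory variable.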
41
Computation
42
Estimation of Model Error Variance
The quantity SSE is a measure of the total
squared deviation about the estimated regression
line, and $e_i$ is the residual. An estimator for
the variance of the population model error is
$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{SSE}{n-2}$
Division by n - 2 instead of n - 1 results
because the simple regression model uses two
estimated parameters, $b_0$ and $b_1$, instead of one.
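
A short sketch of this estimator, dividing the sum of squared residuals by n - 2. The data are hypothetical and the function name is illustrative.

```python
def model_error_variance(x, y):
    """Estimate the model error variance as SSE / (n - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Two parameters (b0 and b1) were estimated, so two degrees of freedom are lost.
    return sse / (n - 2)


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_e_squared = model_error_variance(x, y)
print(f"Estimated error variance: {s_e_squared:.3f}")
```

With five observations the divisor is 3, so the estimate is noticeably larger than SSE / n would be; the difference shrinks as n grows.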
43
Sample variance estimators
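If the standard least squares assumptions hold,
then $b_1$ is an unbiased estimator of $\beta_1$,
with an unbiased sample variance estimator
$s_{b_1}^2 = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
and an unbiased sample variance estimator for $b_0$ is
$s_{b_0}^2 = s_e^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)$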
44
Basis for Inference About the Population
Regression Slope
Let $\beta_1$ be a population regression slope and
$b_1$ its least squares estimate based on n pairs of
sample observations. Then, if the standard
regression assumptions hold and it can also be
assumed that the errors $\varepsilon_i$ are normally
distributed, the random variable
$t = \frac{b_1 - \beta_1}{s_{b_1}}$
is distributed as Student's t with (n - 2) degrees
of freedom. In addition, the central limit
theorem enables us to conclude that this result
is approximately valid for a wide range of
non-normal distributions and large sample sizes,
n.
45
Tests of the Population Regression Slope
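  • If the regression errors $\varepsilon_i$ are
    normally distributed and the standard least
    squares assumptions hold (or if the distribution
    of $b_1$ is approximately normal), the following
    tests have significance level $\alpha$.
  • 1. To test either null hypothesis
    $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \le \beta_1^*$
  • against the alternative
    $H_1: \beta_1 > \beta_1^*$
  • the decision rule is: reject $H_0$ if
    $\frac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\alpha}$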

46
Tests of the Population Regression
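Slope (continued)
  • 2. To test either null hypothesis
    $H_0: \beta_1 = \beta_1^*$ or $H_0: \beta_1 \ge \beta_1^*$
  • against the alternative
    $H_1: \beta_1 < \beta_1^*$
  • the decision rule is: reject $H_0$ if
    $\frac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\alpha}$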

47
Tests of the Population Regression
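Slope (continued)
  • 3. To test the null hypothesis
    $H_0: \beta_1 = \beta_1^*$
  • against the two-sided alternative
    $H_1: \beta_1 \neq \beta_1^*$
  • the decision rule is: reject $H_0$ if
    $\frac{b_1 - \beta_1^*}{s_{b_1}} \ge t_{n-2,\alpha/2}$ or $\frac{b_1 - \beta_1^*}{s_{b_1}} \le -t_{n-2,\alpha/2}$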
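
A sketch of the two-sided slope test, assuming the statistic t = (b1 - hypothesized slope) / s_b1 with n - 2 degrees of freedom, where s_b1 is the square root of s_e^2 divided by the sum of squared x-deviations. The data are hypothetical, the function name is illustrative, and the critical value 3.182 is the tabulated upper 2.5% point of Student's t with 3 degrees of freedom, entered by hand.

```python
import math


def slope_t_statistic(x, y, beta1_null=0.0):
    """t statistic for H0: beta1 = beta1_null in a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e_squared = sse / (n - 2)            # estimated model error variance
    s_b1 = math.sqrt(s_e_squared / s_xx)   # estimated standard error of b1
    return (b1 - beta1_null) / s_b1


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t_stat = slope_t_statistic(x, y)

# Table value: upper 2.5% point of Student's t with n - 2 = 3 df (alpha = 0.05, two-sided).
t_crit = 3.182
reject_h0 = abs(t_stat) >= t_crit
print(f"t = {t_stat:.3f}, reject H0 (beta1 = 0): {reject_h0}")
```

With only five observations the statistic falls short of the critical value, so this sample would not reject a zero slope at the 5% level even though the estimated slope is positive.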

48
Confidence Intervals for the Population
Regression Slope $\beta_1$
  • If the regression errors $\varepsilon_i$ are
    normally distributed and the standard regression
    assumptions hold, a 100(1 - $\alpha$)% confidence
    interval for the population regression slope
    $\beta_1$ is given by
    $b_1 - t_{n-2,\alpha/2} s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2} s_{b_1}$
  • where $t_{n-2,\alpha/2}$ is the number for which
    $P(t_{n-2} > t_{n-2,\alpha/2}) = \alpha/2$
  • and the random variable $t_{n-2}$ follows a
    Student's t distribution with (n - 2) degrees of
    freedom.
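
A sketch of the interval calculation, taking the interval as b1 plus or minus the critical value times the estimated standard error of b1. The data are hypothetical, the function name is illustrative, and the critical value is passed in as an argument; 3.182 is the tabulated upper 2.5% point of Student's t with 3 degrees of freedom.

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) bounds b1 -/+ t_crit * s_b1 for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt((sse / (n - 2)) / s_xx)  # estimated standard error of b1
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Table value for a 95% interval with n - 2 = 3 degrees of freedom.
lower, upper = slope_confidence_interval(x, y, t_crit=3.182)
print(f"95% confidence interval for the slope: ({lower:.3f}, {upper:.3f})")
```

The interval for this small sample contains zero, which matches the outcome of a two-sided test of a zero slope at the same significance level.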

49
F test for Simple Regression Coefficient
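  • We can test the hypothesis
    $H_0: \beta_1 = 0$
  • against the alternative
    $H_1: \beta_1 \neq 0$
  • by using the F statistic
    $F = \frac{MSR}{MSE} = \frac{SSR}{s_e^2}$
  • The decision rule is: reject $H_0$ if
    $F \ge F_{1,n-2,\alpha}$
  • We can also show that the F statistic is
    $F = t_{b_1}^2$
  • for any simple regression analysis.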
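
The relationship between the F statistic and the slope's t statistic can be checked numerically. The sketch below uses hypothetical data and assumes F = SSR / (SSE / (n - 2)), since the regression has one degree of freedom, and t = b1 / s_b1 for the test of a zero slope.

```python
import math

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
s_e_squared = sse / (n - 2)

f_stat = ssr / s_e_squared                  # MSR / MSE with 1 and n - 2 df
t_stat = b1 / math.sqrt(s_e_squared / s_xx)  # t statistic for H0: beta1 = 0

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

Because the two statistics carry the same information in simple regression, the F test and the two-sided t test of a zero slope lead to the same conclusion.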

50
Summary
  • Analysis of Variance
  • Assumptions for the Least Squares Coefficient
    Estimators
  • Basis for Inference About the Population
    Regression Slope
  • Coefficient of Determination, $R^2$
  • Confidence Intervals for Predictions
  • Confidence Intervals for the Population
    Regression Slope $\beta_1$
  • Correlation and $R^2$
  • Estimation of Model Error Variance
  • F test for Simple Regression Coefficient
  • Least-Squares Procedure
  • Linear Regression Outcomes