Title: Introduction to Linear Regression and Correlation Analysis


1
Chapter 11
  • Introduction to Linear Regression and Correlation
    Analysis

2
Chapter 11 - Chapter Outcomes
  • After studying the material in this chapter, you
    should be able to:
  • Calculate and interpret the simple correlation
    between two variables.
  • Determine whether the correlation is significant.
  • Calculate and interpret the simple linear
    regression coefficients for a set of data.
  • Understand the basic assumptions behind
    regression analysis.
  • Determine whether a regression model is
    significant.

3
Chapter 11 - Chapter Outcomes(continued)
  • After studying the material in this chapter, you
    should be able to:
  • Calculate and interpret confidence intervals for
    the regression coefficients.
  • Recognize regression analysis applications for
    purposes of prediction and description.
  • Recognize some potential problems if regression
    analysis is used incorrectly.
  • Recognize several nonlinear relationships between
    two variables.

4
Scatter Diagrams
  • A scatter plot is a graph that may be used to
    represent the relationship between two variables.
    Also referred to as a scatter diagram.

5
Dependent and Independent Variables
  • A dependent variable is the variable to be
    predicted or explained in a regression model.
    This variable is assumed to be functionally
    related to the independent variable.

6
Dependent and Independent Variables
  • An independent variable is the variable related
    to the dependent variable in a regression
    equation. The independent variable is used in a
    regression model to estimate the value of the
    dependent variable.

7
Two Variable Relationships (Figure 11-1)
(a) Linear relationship between X and Y
8
Two Variable Relationships (Figure 11-1)
(b) Linear
9
Two Variable Relationships (Figure 11-1)
(c) Curvilinear
10
Two Variable Relationships (Figure 11-1)
(d) Curvilinear
11
Two Variable Relationships (Figure 11-1)
(e) No Relationship
12
Correlation
  • The correlation coefficient is a quantitative
    measure of the strength of the linear
    relationship between two variables. The
    correlation ranges from -1.0 to +1.0. A
    correlation of ±1.0 indicates a perfect linear
    relationship, whereas a correlation of 0
    indicates no linear relationship.

13
Correlation
  • SAMPLE CORRELATION COEFFICIENT
  • where
  • r = Sample correlation coefficient
  • n = Sample size
  • x = Value of the independent variable
  • y = Value of the dependent variable
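  • With these definitions, the sample correlation
    coefficient in standard notation is:
    r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}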

14
Correlation
  • SAMPLE CORRELATION COEFFICIENT
  • or the algebraic equivalent
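  • In standard notation, the algebraic equivalent is:
    r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[ n \sum x^2 - (\sum x)^2 \right] \left[ n \sum y^2 - (\sum y)^2 \right]}}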

15
Correlation (Example 11-1)
(Table 11-1)
16
Correlation (Example 11-1)
17
Correlation (Example 11-1)
Correlation between Years and Sales
Excel Correlation Output (Figure 11-5)
18
Correlation
  • TEST STATISTIC FOR CORRELATION
  • where
  • t = Number of standard deviations r is from 0
  • r = Simple correlation coefficient
  • n = Sample size
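  • The standard form of this statistic, which has
    n - 2 degrees of freedom, is:
    t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}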

19
Correlation Significance Test (Example 11-1)
Two-tailed test of H0: ρ = 0 against HA: ρ ≠ 0
Rejection region: α/2 = 0.025 in each tail
Since t = 4.752 > 2.048, reject H0; there is a
significant linear relationship.
20
Correlation
  • Spurious correlation occurs when there is a
    correlation between two otherwise unrelated
    variables.

21
Simple Linear Regression Analysis
  • Simple linear regression analysis analyzes the
    linear relationship that exists between a
    dependent variable and a single independent
    variable.

22
Simple Linear Regression Analysis
  • SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL)
  • where
  • y = Value of the dependent variable
  • x = Value of the independent variable
  • β0 = Population's y-intercept
  • β1 = Slope of the population regression line
  • ε = Error term, or residual
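  • In standard notation, the population model these
    terms describe is:
    y = \beta_0 + \beta_1 x + \varepsilon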

23
Simple Linear Regression Analysis
  • The simple linear regression model has four
    assumptions:
  • Individual values of the error terms, εi, are
    statistically independent of one another.
  • The distribution of all possible values of ε is
    normal.
  • The distributions of possible εi values have
    equal variances for all values of x.
  • The means of the dependent variable, for all
    specified values of the independent variable, x,
    can be connected by a straight line called the
    population regression model.

24
Simple Linear Regression Analysis
  • REGRESSION COEFFICIENTS
  • In the simple regression model, there are two
    coefficients: the intercept and the slope.

25
Simple Linear Regression Analysis
  • The interpretation of the regression slope
    coefficient is that it gives the average change
    in the dependent variable for a unit increase in
    the independent variable. The slope coefficient
    may be positive or negative, depending on the
    relationship between the two variables.

26
Simple Linear Regression Analysis
  • The least squares criterion is used for
    determining a regression line that minimizes the
    sum of squared residuals.

27
Simple Linear Regression Analysis
  • A residual is the difference between the actual
    value of the dependent variable and the value
    predicted by the regression model.

28
Simple Linear Regression Analysis
Figure: Sales in Thousands (Y) plotted against Years with Company (X).
At x = 4 years, the actual value is y = 312 while the regression line
predicts 390, so the residual is 312 - 390 = -78.
29
Simple Linear Regression Analysis
  • ESTIMATED REGRESSION MODEL
  • (SAMPLE MODEL)
  • where
  • ŷ = Estimated, or predicted, y value
  • b0 = Unbiased estimate of the regression
    intercept
  • b1 = Unbiased estimate of the regression slope
  • x = Value of the independent variable
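  • In standard notation, the estimated (sample)
    regression model is:
    \hat{y} = b_0 + b_1 x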

30
Simple Linear Regression Analysis
  • LEAST SQUARES EQUATIONS
  • the slope, its algebraic equivalent, and the
    intercept, as shown below
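  • In standard notation, the least squares estimates are:
    b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
        = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}
    and
    b_0 = \bar{y} - b_1 \bar{x}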

31
Simple Linear Regression Analysis
  • SUM OF SQUARED ERRORS
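  • In standard notation, the quantity minimized by
    the least squares criterion is:
    SSE = \sum (y - \hat{y})^2 = \sum \left[ y - (b_0 + b_1 x) \right]^2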

32
Simple Linear Regression Analysis (Midwest
Example)
(Table 11-3)
33
Simple Linear Regression Analysis (Table 11-3)
The least squares regression line is computed from the sums in Table 11-3.
34
Simple Linear Regression Analysis (Figure 11-11)
Excel Midwest Distribution Results
35
Least Squares Regression Properties
  • The sum of the residuals from the least squares
    regression line is 0.
  • The sum of the squared residuals is a minimum.
  • The simple regression line always passes through
    the mean of the y variable and the mean of the x
    variable.
  • The least squares coefficients are unbiased
    estimates of β0 and β1.

36
Simple Linear Regression Analysis
  • SUM OF RESIDUALS

SUM OF SQUARED RESIDUALS
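  • In standard notation, these two properties are:
    \sum (y - \hat{y}) = 0 \quad \text{and} \quad \sum (y - \hat{y})^2 \text{ is a minimum}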
37
Simple Linear Regression Analysis
  • TOTAL SUM OF SQUARES
  • where
  • TSS = Total sum of squares
  • n = Sample size
  • y = Values of the dependent variable
  • ȳ = Average value of the dependent variable
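  • In standard notation:
    TSS = \sum (y - \bar{y})^2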

38
Simple Linear Regression Analysis
  • SUM OF SQUARES ERROR (RESIDUALS)
  • where
  • SSE = Sum of squares error
  • n = Sample size
  • y = Values of the dependent variable
  • ŷ = Estimated value for the average of y for
    the given x value
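  • In standard notation (the same quantity minimized
    by the least squares criterion):
    SSE = \sum (y - \hat{y})^2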

39
Simple Linear Regression Analysis
  • SUM OF SQUARES REGRESSION
  • where
  • SSR = Sum of squares regression
  • ȳ = Average value of the dependent variable
  • y = Values of the dependent variable
  • ŷ = Estimated value for the average of y for
    the given x value
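  • In standard notation:
    SSR = \sum (\hat{y} - \bar{y})^2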

40
Simple Linear Regression Analysis
  • SUMS OF SQUARES
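  • The standard decomposition relating the three sums
    of squares is:
    TSS = SSR + SSE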

41
Simple Linear Regression Analysis
  • The coefficient of determination is the portion
    of the total variation in the dependent variable
    that is explained by its relationship with the
    independent variable. The coefficient of
    determination is also called R-squared and is
    denoted as R2.

42
Simple Linear Regression Analysis
  • COEFFICIENT OF DETERMINATION (R2)
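  • In standard notation:
    R^2 = \frac{SSR}{TSS} = 1 - \frac{SSE}{TSS}, \qquad 0 \le R^2 \le 1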

43
Simple Linear Regression Analysis (Midwest
Example)
  • COEFFICIENT OF DETERMINATION (R2)

69.31% of the variation in the sales data for
this sample can be explained by the linear
relationship between sales and years of
experience.
44
Simple Linear Regression Analysis
  • COEFFICIENT OF DETERMINATION SINGLE INDEPENDENT
    VARIABLE CASE
  • where
  • R2 = Coefficient of determination
  • r = Simple correlation coefficient
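  • In the single-independent-variable case the
    relationship is simply:
    R^2 = r^2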

45
Simple Linear Regression Analysis
  • STANDARD DEVIATION OF THE REGRESSION SLOPE
    COEFFICIENT (POPULATION)
  • where
  • σb1 = Standard deviation of the regression slope
    (called the standard error of the slope)
  • σε = Population standard error of the estimate
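  • In standard notation:
    \sigma_{b_1} = \frac{\sigma_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}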

46
Simple Linear Regression Analysis
  • ESTIMATOR FOR THE STANDARD ERROR OF THE ESTIMATE
  • where
  • SSE = Sum of squares error
  • n = Sample size
  • k = Number of independent variables in the
    model
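  • In standard notation (with k = 1 in simple linear
    regression, so the denominator is n - 2):
    s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}}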

47
Simple Linear Regression Analysis
  • ESTIMATOR FOR THE STANDARD DEVIATION OF THE
    REGRESSION SLOPE
  • where
  • sb1 = Estimate of the standard error of the least
    squares slope
  • sε = Sample standard error of the estimate
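  • In standard notation:
    s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}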

48
Simple Linear Regression Analysis
  • TEST STATISTIC FOR TEST OF SIGNIFICANCE OF THE
    REGRESSION SLOPE
  • where
  • b1 = Sample regression slope coefficient
  • β1 = Hypothesized slope
  • sb1 = Estimator of the standard error of the
    slope
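  • In standard notation, with n - 2 degrees of freedom:
    t = \frac{b_1 - \beta_1}{s_{b_1}}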

49
Significance Test of Regression Slope (Example
11-5)
Two-tailed test of H0: β1 = 0 against HA: β1 ≠ 0
Rejection region: α/2 = 0.025 in each tail
Since t = 4.753 > 2.048, reject H0; conclude that
the true slope is not zero.
50
Simple Linear Regression Analysis
  • MEAN SQUARE REGRESSION
  • where
  • SSR = Sum of squares regression
  • k = Number of independent variables in the model
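  • In standard notation:
    MSR = \frac{SSR}{k}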

51
Simple Linear Regression Analysis
  • MEAN SQUARE ERROR
  • where
  • SSE = Sum of squares error
  • n = Sample size
  • k = Number of independent variables in the model
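  • In standard notation:
    MSE = \frac{SSE}{n - k - 1}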

52
Significance Test (Example 11-6)
Rejection region: α = 0.05
Since F = 22.59 > 4.96, reject H0; conclude that
the regression model explains a significant
amount of the variation in the dependent variable.
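The F statistic used here is the standard ratio of mean squares,
with k and n - k - 1 degrees of freedom:
    F = \frac{MSR}{MSE}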
53
Simple Regression Steps
  • Develop a scatter plot of y and x. You are
    looking for a linear relationship between the two
    variables.
  • Calculate the least squares regression line for
    the sample data.
  • Calculate the correlation coefficient and the
    simple coefficient of determination, R2.
  • Conduct one of the significance tests (a code
    sketch of these steps follows).
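  • A minimal sketch of these steps in Python, using
    made-up illustrative numbers (not the textbook's
    example data), with scipy and matplotlib:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical sample data: x = independent variable, y = dependent variable
x = np.array([3, 5, 2, 8, 2, 6, 7, 1, 4, 9], dtype=float)
y = np.array([487, 445, 272, 641, 187, 440, 346, 238, 312, 491], dtype=float)

# Step 1: scatter plot of y versus x to look for a linear relationship
plt.scatter(x, y)
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.show()

# Steps 2-4: least squares line, correlation, R-squared, and a significance test
result = stats.linregress(x, y)  # returns slope, intercept, rvalue, pvalue, stderr
print("b1 (slope)    :", result.slope)
print("b0 (intercept):", result.intercept)
print("r             :", result.rvalue)
print("R-squared     :", result.rvalue ** 2)
print("p-value for H0: slope = 0 :", result.pvalue)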

54
Simple Linear Regression Analysis
  • CONFIDENCE INTERVAL ESTIMATE FOR THE REGRESSION
    SLOPE
  • or equivalently (both forms are shown below)
  • where
  • sb1 = Standard error of the regression slope
    coefficient
  • sε = Standard error of the estimate
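  • In standard notation (t has n - 2 degrees of
    freedom), the two equivalent forms are:
    b_1 \pm t \, s_{b_1}
    \quad \text{or} \quad
    b_1 \pm t \, \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}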

55
Simple Linear Regression Analysis
  • CONFIDENCE INTERVAL FOR THE AVERAGE y, GIVEN xp
  • where
  • ŷ = Point estimate of the dependent variable
  • t = Critical value with n - 2 d.f.
  • sε = Standard error of the estimate
  • n = Sample size
  • xp = Specific value of the independent variable
  • x̄ = Mean of independent variable observations
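  • In standard notation:
    \hat{y} \pm t \, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}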

56
Simple Linear Regression Analysis
  • PREDICTION INTERVAL FOR y, GIVEN xp
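  • In standard notation (the prediction interval for an
    individual y value adds 1 under the square root):
    \hat{y} \pm t \, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}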

57
Residual Analysis
  • Before using a regression model for description
    or prediction, you should check whether the
    assumptions concerning the normal distribution
    and constant variance of the error terms have
    been satisfied. One way to do this is through
    the use of residual plots, as sketched below.
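  • A minimal residual-plot sketch in Python, again
    with made-up illustrative numbers, using numpy and
    matplotlib only:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical illustrative data (not the textbook's example data)
x = np.array([3, 5, 2, 8, 2, 6, 7, 1, 4, 9], dtype=float)
y = np.array([487, 445, 272, 641, 187, 440, 346, 238, 312, 491], dtype=float)

# Fit the least squares line, then compute fitted values and residuals
b1, b0 = np.polyfit(x, y, deg=1)  # slope and intercept
y_hat = b0 + b1 * x
residuals = y - y_hat

# Residuals vs. fitted values: look for roughly constant spread around zero
plt.scatter(y_hat, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Histogram of residuals: a rough check of the normality assumption
plt.hist(residuals, bins="auto")
plt.xlabel("Residual")
plt.show()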

58
Key Terms
  • Coefficient of Determination
  • Correlation Coefficient
  • Dependent Variable
  • Independent Variable
  • Least Squares Criterion
  • Regression Coefficients
  • Regression Slope Coefficient
  • Residual
  • Scatter Plot
  • Simple Linear Regression Analysis
  • Spurious Correlation