Chapter 14 Introduction to Linear Regression and Correlation Analysis PowerPoint PPT Presentation




1
Chapter 14: Introduction to Linear Regression and
Correlation Analysis
Business Statistics: A Decision-Making
Approach, 7th Edition
2
Ch. 14
  • 14.1 Correlation Coefficient
  • 14.2 Simple Linear Regression Analysis
  • 14.3 Uses for Regression Analysis

3
Scatter Plots and Correlation
  • A scatter plot (or scatter diagram) is used to
    show the relationship between two variables
  • Correlation analysis is used to measure the strength
    of the association (linear relationship) between
    two variables
  • Correlation is concerned only with the strength of the relationship
  • No causal effect is implied

4
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating linear relationships and curvilinear relationships]
5
Scatter Plot Examples
(continued)
[Figure: scatter plots of y against x illustrating strong relationships and weak relationships]
6
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating no relationship]
7
Correlation Coefficient
  • Correlation measures the strength of the linear
    association between two variables
  • The sample correlation coefficient r is a
    measure of the strength of the linear
    relationship between two variables, based on
    sample observations
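The formula for r appears only as an image in the original slides. A standard computational form is $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. The Python function below is a minimal sketch of that formula; the name `sample_correlation` is illustrative, not from the text. Because the Armand's Pizza data are not included in this transcript, the example applies it to the Reed Auto Sales data used later in the chapter.

```python
from math import sqrt


def sample_correlation(x, y):
    """Sample (Pearson) correlation coefficient r for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least 2 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# Reed Auto Sales: number of TV ads (x) and number of cars sold (y)
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
print(round(sample_correlation(tv_ads, cars_sold), 4))  # 0.9366
```

The result is unit free and lies between -1 and 1; squaring it gives the coefficient of determination for the same data (0.9366² ≈ .8772).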

8
Features of r
  • Unit free
  • Ranges between -1 and 1
  • The closer to -1, the stronger the negative
    linear relationship
  • The closer to 1, the stronger the positive linear
    relationship
  • The closer to 0, the weaker the linear
    relationship

9
Examples of Approximate r Values
[Figure: scatter plots of y against x with approximate correlations r = -1, r = -.6, r = 0, r = .3, and r = 1]
10
Armand's Pizza Parlors
  • Armand's Pizza Parlors is a
    chain of Italian-food restaurants
    located in a five-state area. The most
    successful locations for Armand's have been near
    college campuses. The CEO of Armand's would like
    to know the correlation between the number of
    students at a university (x) and pizza sales of
    restaurants located near the university (y).

13
Simple Linear Regression
  • Simple Linear Regression Model
  • Least Squares Method
  • Coefficient of Determination
  • Model Assumptions
  • Testing for Significance
  • Excel's Regression Tool
  • Using the Estimated Regression Equation for
    Estimation and Prediction

14
The Simple Linear Regression Model
Dependent variable: y; independent variable: x
  • Simple Linear Regression Model
  • $y = \beta_0 + \beta_1 x + \varepsilon$
  • Simple Linear Regression Equation
  • $E(y) = \beta_0 + \beta_1 x$
  • Estimated Simple Linear Regression Equation
  • $\hat{y} = b_0 + b_1 x$

15
The Simple Linear Regression Model
[Figure: regression equation with its y-intercept and slope labeled]
16
The Simple Linear Regression Model
[Figure: estimated regression equation with the estimated value of y and the observed value of x labeled]
17
Possible Regression Lines in Simple Linear
Regression
Positive Linear Relationship
[Figure: regression line $E(y) = \beta_0 + \beta_1 x$ plotted against x; intercept $\beta_0$; slope $\beta_1$ is positive]
18
Possible Regression Lines in Simple Linear
Regression
Negative Linear Relationship
[Figure: regression line $E(y) = \beta_0 + \beta_1 x$ plotted against x; intercept $\beta_0$; slope $\beta_1$ is negative]
19
Possible Regression Lines in Simple Linear
Regression
No Relationship
[Figure: horizontal regression line $E(y) = \beta_0 + 0x$ plotted against x; intercept $\beta_0$; slope $\beta_1 = 0$]
20
Possible Regression Lines in Simple Linear
Regression
Positive Linear Relationship, Negative y-intercept
[Figure: regression line $E(y) = \beta_0 + \beta_1 x$ plotted against x, with intercept $\beta_0 < 0$ and slope $\beta_1 > 0$]
21
The Least Squares Method
22
The Least Squares Method
23
The Least Squares Method
24
The Least Squares Method
  • Least Squares Criterion
  • $\min \sum (y_i - \hat{y}_i)^2$
  • where
  • $y_i$ = observed value of the dependent
    variable for the ith observation
  • $\hat{y}_i$ = estimated value of the dependent
    variable for the ith observation


25
The Least Squares Method
  • Slope for the Estimated Regression Equation
  • $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
  • y-Intercept for the Estimated Regression Equation
  • $b_0 = \bar{y} - b_1 \bar{x}$
  • where
  • $x_i$ = value of the independent variable for the ith
    observation
  • $y_i$ = value of the dependent variable for the ith
    observation
  • $\bar{x}$ = mean value of the independent variable
  • $\bar{y}$ = mean value of the dependent variable

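The slope and intercept formulas translate directly into code. The Python function below is a minimal sketch of the least squares computation; the name `least_squares` is illustrative rather than from the text. It is applied to the Reed Auto Sales data (TV ads and cars sold) from the next example.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum of cross-products / sum of squared deviations of x
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # b0 = y_bar - b1 * x_bar, so the line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = least_squares(tv_ads, cars_sold)
print(b0, b1)  # 10.0 5.0
```

For these data the sketch reproduces the estimated regression equation $\hat{y} = 10 + 5x$.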
26
Example Reed Auto Sales
  • Simple Linear Regression
  • Reed Auto periodically has a special week-long
    sale. As part of the advertising campaign, Reed
    runs one or more television commercials during
    the weekend preceding the sale. Data from a
    sample of 5 previous sales are shown below.

    Number of TV Ads (x)    Number of Cars Sold (y)
             1                        14
             3                        24
             2                        18
             1                        17
             3                        27

27
Example Reed Auto Sales
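  • Sample means: $\bar{x} = 10/5 = 2$, $\bar{y} = 100/5 = 20$
  • $\sum (x_i - \bar{x})(y_i - \bar{y}) = 20$, $\sum (x_i - \bar{x})^2 = 4$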
28
Example Reed Auto Sales
  • Slope for the Estimated Regression Equation
  • $b_1 = 20/4 = 5$
  • y-Intercept for the Estimated Regression Equation
  • $b_0 = \bar{y} - b_1 \bar{x}$
  • $b_0 = 20 - 5(2) = 10$

29
Example Reed Auto Sales
  • Slope for the Estimated Regression Equation
  • $b_1 = 20/4 = 5$
  • y-Intercept for the Estimated Regression Equation
  • $b_0 = 20 - 5(2) = 10$
  • Estimated Regression Equation
  • $\hat{y} = 10 + 5x$

30
Now You Try
  • The following table gives the percentage of women
    working in each company (x) and the percentage of
    management jobs held by women in that company
    (y).
  • Develop the estimated regression equation for
    these data.
  • Predict the percentage of management jobs held by
    women in a company in which 60% of the employees are women.

31
Now You Try
[Figure: calculation table for the x and y data]
32
Coefficient of Determination
[Figure: Sales (y) plotted against Student Population (x) with the estimated regression line]
33
Coefficient of Determination
[Figure: Sales (y) plotted against Student Population (x) with the estimated regression line]
34
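Coefficient of Determination ($r^2$)
  • A measure of the goodness of fit for the
    estimated regression equation.
  • SSE: Sum of Squares Due to Error, $\text{SSE} = \sum (y_i - \hat{y}_i)^2$
  • SSR: Sum of Squares Due to Regression, $\text{SSR} = \sum (\hat{y}_i - \bar{y})^2$
  • SST: Total Sum of Squares, $\text{SST} = \sum (y_i - \bar{y})^2$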

35
Coefficient of Determination ($r^2$)
[Figure: Sales (y) plotted against Student Population (x), marking the deviations used to calculate SST ($y_i - \bar{y}$), SSE ($y_i - \hat{y}_i$), and SSR ($\hat{y}_i - \bar{y}$)]
36
Coefficient of Determination ($r^2$)
  • Relationship Among SST, SSR, and SSE
  • SST = SSR + SSE

$r^2 = \text{SSR}/\text{SST}$
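The three sums of squares and $r^2$ can be computed together from the fitted line. This Python sketch assumes the Reed Auto estimated equation $\hat{y} = 10 + 5x$; the function name `sums_of_squares` is illustrative.

```python
def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSE) for the estimated equation y_hat = b0 + b1*x."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)
    return sst, ssr, sse


tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
sst, ssr, sse = sums_of_squares(tv_ads, cars_sold, b0=10, b1=5)
print(f"SST={sst:g}  SSR={ssr:g}  SSE={sse:g}")  # SST=114  SSR=100  SSE=14
print(f"r^2 = {ssr / sst:.4f}")                   # r^2 = 0.8772
```

The output confirms SST = SSR + SSE (114 = 100 + 14) for a least squares line.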
37
Example Reed Auto Sales
[Figure: calculation of $\text{SSE} = \sum (y_i - \hat{y}_i)^2$ using $\hat{y} = 10 + 5x$; SSE = 14]
38
Example Reed Auto Sales
[Figure: calculation of $\text{SST} = \sum (y_i - \bar{y})^2$ using $\bar{y} = 20$; SST = 114]
39
Example Reed Auto Sales
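[Figure: calculation of $\text{SSR} = \sum (\hat{y}_i - \bar{y})^2$ using $\hat{y} = 10 + 5x$ and $\bar{y} = 20$; SSR = 100]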
40
Example Reed Auto Sales
  • Coefficient of Determination
  • $r^2 = \text{SSR}/\text{SST} = 100/114 = .8772$
  • The regression relationship is very strong:
    87.72% of the variation in the number of cars
    sold can be explained by the linear relationship
    between the number of TV ads and the number of
    cars sold.

41
Now You Try
  • Given are five observations for two variables, x
    and y.
  • The estimated regression equation for these data
    is
  • Compute SSE, SST, and SSR.
  • Compute the coefficient of determination $r^2$.
    Comment on the goodness of fit.

42
Now You Try
[Figure: calculation of SSE]
43
Now You Try
[Figure: calculation of SSR using $\bar{y} = 8$]
44
Testing for Significance
  • To test for a significant regression
    relationship, we must conduct a hypothesis test
    to determine whether the value of $\beta_1$ is zero.
  • Two tests are commonly used:
  • t Test: used to test the significance of a single
    independent variable.
  • F Test: used to test the overall significance of the model;
    it is required when there is more than one independent variable.

45
Testing for Significance: t Test
  • Hypotheses
  • $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Test Statistic
  • $t = b_1 / s_{b_1}$

46
Testing for Significance
  • To test for a significant regression
    relationship, we must conduct a hypothesis test
    to determine whether the value of $\beta_1$ is zero.
  • Two tests are commonly used:
  • t Test: used to test the significance of a single
    independent variable.
  • F Test: used to test the overall significance of the model;
    it is required when there is more than one independent variable.
  • Both tests require an estimate of $\sigma^2$, the
    variance of $\varepsilon$ in the regression model.

47
Model Assumptions
  • Assumptions About the Error Term $\varepsilon$
  • The error $\varepsilon$ is a random variable with a mean of
    zero.
  • The variance of $\varepsilon$, denoted by $\sigma^2$, is the same
    for all values of the independent variable.
  • The values of $\varepsilon$ are independent.
  • The error $\varepsilon$ is a normally distributed random
    variable.

48
The Error Term ($\varepsilon$)
[Figure: Sales (y) plotted against Student Population (x) with the estimated regression line]
49
The Error Term ($\varepsilon$)
[Figure: Sales (y) plotted against Student Population (x) with the estimated regression line and one residual marked]
Residuals: the deviations of the y values about
the estimated regression line.
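Residuals are the observable counterparts of the error term. This Python sketch computes $e_i = y_i - \hat{y}_i$ for the Reed Auto data, assuming the estimated equation $\hat{y} = 10 + 5x$; the helper name `residuals` is illustrative.

```python
def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - y_hat_i for the estimated equation y_hat = b0 + b1*x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
e = residuals(tv_ads, cars_sold, b0=10, b1=5)
print(e)                         # [-1, -1, -2, 2, 2]
print(sum(e))                    # 0
print(sum(ei ** 2 for ei in e))  # 14  (the sum of squared residuals is SSE)
```

For a least squares line with an intercept, the residuals sum to zero, and their squares add up to SSE.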
50
Testing for Significance
  • An Estimate of $\sigma^2$
  • The mean square error (MSE) provides the estimate
    of $\sigma^2$, and the notation $s^2$ is also used.
  • $s^2 = \text{MSE} = \text{SSE}/(n - p - 1)$
  • where

n = sample size; p = number of independent variables
51
Testing for Significance
  • An Estimate of $\sigma$
  • To estimate $\sigma$ we take the square root of $s^2$.
  • The resulting $s$ is called the standard error of
    the estimate.
  • Estimated Standard Deviation of $b_1$
  • $s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
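These estimates chain together: MSE from SSE, s from MSE, and $s_{b_1}$ from s. The Python sketch below assumes simple linear regression (p = 1) and the Reed Auto value SSE = 14 (from SST - SSR = 114 - 100); the function name `slope_standard_error` is illustrative.

```python
from math import sqrt


def slope_standard_error(x, sse):
    """Return (s^2, s, s_b1) for simple linear regression (p = 1)."""
    n = len(x)
    p = 1                               # one independent variable
    mse = sse / (n - p - 1)             # s^2 = MSE = SSE/(n - p - 1)
    s = sqrt(mse)                       # standard error of the estimate
    x_bar = sum(x) / n
    s_b1 = s / sqrt(sum((xi - x_bar) ** 2 for xi in x))  # estimated std. dev. of b1
    return mse, s, s_b1


tv_ads = [1, 3, 2, 1, 3]
mse, s, s_b1 = slope_standard_error(tv_ads, sse=14)
print(round(mse, 3), round(s, 3), round(s_b1, 2))  # 4.667 2.16 1.08
```

The value $s_{b_1} \approx 1.08$ is the one used in the Reed Auto t test and confidence interval.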

52
Testing for Significance: t Test
  • Hypotheses
  • $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Test Statistic
  • $t = b_1 / s_{b_1}$
  • Rejection Rule
  • Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
  • where $t_{\alpha/2}$ is based on a t distribution with
  • $n - p - 1$ degrees of freedom.

53
Example Reed Auto Sales
[Figure: calculation of SSE using $\hat{y} = 10 + 5x$; SSE = 14]
54
Example Reed Auto Sales
55
Example Reed Auto Sales
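  • t Test
  • Hypotheses: $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Rejection Rule
  • For $\alpha = .05$ and d.f. = 3, $t_{.025} = 3.182$
  • Reject $H_0$ if $t > 3.182$ or $t < -3.182$
  • Test Statistic
  • $t = b_1 / s_{b_1} = 5/1.08 = 4.63$
  • Conclusion
  • Reject $H_0$: the number of TV ads is significantly related to the number of cars sold.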
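The rejection rule can be written as a small function. In this Python sketch the critical value is passed in from a t table (3.182 for $\alpha = .05$ and 3 d.f., as on the slide); if SciPy is available, `scipy.stats.t.ppf(0.975, 3)` would give the same value, but the sketch does not depend on it. The name `t_test_slope` is illustrative.

```python
def t_test_slope(b1, s_b1, t_crit):
    """Two-tailed t test of H0: beta1 = 0; t_crit is t_{alpha/2} with n - p - 1 d.f."""
    t = b1 / s_b1
    reject = t < -t_crit or t > t_crit
    return t, reject


# Reed Auto: b1 = 5, s_b1 = 1.08, t_.025 with 3 d.f. = 3.182
t, reject = t_test_slope(b1=5, s_b1=1.08, t_crit=3.182)
print(round(t, 2), reject)  # 4.63 True
```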

56
Now You Try
  • Using the data from the previous Now You Try,
  • Compute $s^2$ (MSE)
  • Compute $s$ (standard error of the estimate)
  • Compute $s_{b_1}$ (Given )
  • Use the t test to test the following hypothesis
    ($\alpha = .05$)

57
Confidence Interval for $\beta_1$
  • We can use a 95% confidence interval for $\beta_1$ to
    test the hypotheses just used in the t test.
  • $H_0$ is rejected if the hypothesized value of $\beta_1$
    is not included in the confidence interval for
    $\beta_1$.

58
Confidence Interval for $\beta_1$
  • The form of a confidence interval for $\beta_1$ is
    $b_1 \pm t_{\alpha/2} s_{b_1}$
  • where $b_1$ is the point estimate,
  • $t_{\alpha/2} s_{b_1}$ is the margin of error, and
  • $t_{\alpha/2}$ is the t value providing an area
    of $\alpha/2$ in the upper tail of a
    t distribution with $n - p - 1$ degrees
    of freedom

59
Example Reed Auto Sales
  • Rejection Rule
  • Reject $H_0$ if 0 is not included in the
    confidence interval for $\beta_1$.
  • 95% Confidence Interval for $\beta_1$
  • $5 \pm 3.182(1.08) = 5 \pm 3.44$
  • or 1.56 to 8.44
  • Conclusion
  • 0 is not included in the interval, so reject $H_0$
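The interval $b_1 \pm t_{\alpha/2} s_{b_1}$ and the "is 0 inside?" check can be sketched in Python as follows, using the Reed Auto values from the slide; the function name `slope_confidence_interval` is illustrative.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Interval b1 +/- t_{alpha/2} * s_b1 for beta1."""
    margin = t_crit * s_b1  # margin of error
    return b1 - margin, b1 + margin


low, high = slope_confidence_interval(b1=5, s_b1=1.08, t_crit=3.182)
print(f"{low:.2f} to {high:.2f}")  # 1.56 to 8.44
# H0: beta1 = 0 is rejected when 0 lies outside the interval
print("reject H0" if not (low <= 0 <= high) else "do not reject H0")  # reject H0
```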

60
Testing for Significance: F Test
  • Used to test the significance of the entire
    model.
  • Hypotheses
  • $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
  • $H_a$: One or more of the parameters
    is not equal to zero.
  • Mean Square Due to Regression (MSR)
  • $\text{MSR} = \text{SSR}/p$
  • p = the number of independent variables

61
Testing for Significance: F Test
  • Hypotheses
  • $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
  • $H_a$: One or more of the parameters
    is not equal to zero.
  • Test Statistic
  • $F = \text{MSR}/\text{MSE}$
  • Rejection Rule
  • Reject $H_0$ if $F > F_\alpha$
  • where $F_\alpha$ is based on an F distribution with p
    d.f. in the numerator and $n - p - 1$ d.f. in the
    denominator.

62
Example Reed Auto Sales
[Figure: calculation of SSR using $\hat{y} = 10 + 5x$ and $\bar{y} = 20$; SSR = 100]
63
Example Reed Auto Sales
  • $\alpha = .05$
  • d.f.: p = 1 (numerator), n - p - 1 = 3 (denominator)
  • MSR = SSR/p = 100/1 = 100
  • MSE = SSE/(n - p - 1) = 14/3 = 4.667

64
Example Reed Auto Sales
  • F Test
  • Hypotheses: $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Rejection Rule
  • For $\alpha = .05$ and d.f. = 1, 3: $F_{.05} = 10.13$
  • Reject $H_0$ if $F > 10.13$.
  • Test Statistic
  • $F = \text{MSR}/\text{MSE} = 100/4.667 = 21.43$
  • Conclusion
  • Reject $H_0$.
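The F test can be sketched the same way. As with the t test, the critical value comes from a table (10.13 for $\alpha = .05$ with 1 and 3 d.f., as on the slide); the function name `f_test` is illustrative.

```python
def f_test(ssr, sse, n, p, f_crit):
    """F test of H0: beta1 = ... = beta_p = 0; f_crit is F_alpha with (p, n - p - 1) d.f."""
    msr = ssr / p
    mse = sse / (n - p - 1)
    f = msr / mse
    return f, f > f_crit


# Reed Auto: SSR = 100, SSE = 14, n = 5, p = 1
f, reject = f_test(ssr=100, sse=14, n=5, p=1, f_crit=10.13)
print(round(f, 2), reject)  # 21.43 True
```

With a single independent variable, F equals the square of the t statistic computed with the unrounded $s_{b_1} = \sqrt{14/3}/2$, so the t test and the F test reach the same conclusion.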

65
Using Excel's Regression Tool
  • Performing the Regression Analysis
  • Step 1: Select the Tools pull-down menu
  • Step 2: Choose the Data Analysis option
  • Step 3: Choose Regression from the list of
    Analysis Tools
  • (continued)

66
Using Excel's Regression Tool
  • Performing the Regression Analysis
  • Step 4: When the Regression dialog box appears
  • Enter C1:C6 in the Input Y Range box
  • Enter B1:B6 in the Input X Range box
  • Select Labels
  • Select Confidence Level
  • Enter the desired confidence level in the
    Confidence Level box
  • Select Output Range
  • Enter A9 (or any cell) in the Output Range
    box
  • Select OK to begin the regression
    analysis

67
Using Excel's Regression Tool
  • Formula Worksheet (showing data entered)

69
Using Excel's Regression Tool
  • Estimated Regression Equation Output (left
    portion)

Note: Columns F-I are not shown.
70
Using Excel's Regression Tool
  • Estimated Regression Equation Output (left
    portion)

Note: Columns F-I are not shown.
p-value < 0.05, so TV Ads is significant at $\alpha = .05$
71
Using Excels Regression Tool
  • Estimated Regression Equation Output (right
    portion)

Note Columns C-E are hidden.
72
Using Excel's Regression Tool
  • ANOVA Output

73
Using Excel's Regression Tool
  • Regression Statistics Output

74
Now You Try
  • In a manufacturing process the assembly line
    speed (feet per minute) was thought to affect the
    number of defective parts found during the
    inspection process. To test this theory, managers
    devised a situation in which the same batch of
    parts was inspected at a variety of line speeds.
    The collected data follow:

75
Now You Try
  • Develop the estimated regression equation that
    relates line speed to the number of defective
    parts found.

76
Now You Try
77
Now You Try
  • At the .05 level of significance, determine whether
    line speed and number of defective parts are
    related (perform a t test and an F test).

78
Now You Try
79
Now You Try
[Figure: calculation of SSR]
80
Now You Try
  • Did the estimated regression equation provide a
    good fit to the data? (r2)

81
Now You Try
$r^2 > 0.60$; therefore, the estimated regression
equation provides a good fit to the data. The
regression equation explains 74% of the
variation in the number of defective parts.
82
Now You Try
  • Estimate the number of defective parts with a
    line speed of 50 feet per minute.

83
End of Chapter 14