Title: Chapter 14 Introduction to Linear Regression and Correlation Analysis
1Chapter 14: Introduction to Linear Regression and Correlation Analysis
Business Statistics: A Decision-Making Approach, 7th Edition
2Ch. 14
- 14.1 Correlation Coefficient
- 14.2 Simple Linear Regression Analysis
- 14.3 Uses for Regression Analysis
3Scatter Plots and Correlation
- A scatter plot (or scatter diagram) is used to show the relationship between two variables
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
- Only concerned with the strength of the relationship
- No causal effect is implied
4Scatter Plot Examples
[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]
5Scatter Plot Examples
(continued)
[Figure: scatter plots of y against x showing strong relationships and weak relationships]
6Scatter Plot Examples
[Figure: scatter plots of y against x showing no relationship]
7Correlation Coefficient
- Correlation measures the strength of the linear association between two variables
- The sample correlation coefficient r is a measure of the strength of the linear relationship between two variables, based on sample observations
8Features of r
- Unit free
- Ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship
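The features of r listed above can be checked with a short Python sketch. The slides do not spell out a formula for r, so this assumes the usual Pearson sample correlation; the function name and the data are hypothetical and chosen only to show the two extremes of the range.

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation coefficient r for paired observations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data (not from the slides): one perfectly increasing
# and one perfectly decreasing linear pattern.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

r_up = sample_correlation(x, y_up)      # perfect positive linear relationship
r_down = sample_correlation(x, y_down)  # perfect negative linear relationship
print(r_up, r_down)
```

Because r divides the cross-product term by the square root of the two sums of squares, the units of x and y cancel, which is why r is unit free.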
9Examples of Approximate r Values
[Figure: five scatter plots of y against x with approximate correlations r = -1, r = -.6, r = 0, r = .3, and r = 1]
10Armand's Pizza Parlors
- Armand's Pizza Parlors is a chain of Italian-food restaurants located in a five-state area. The most successful locations for Armand's have been near college campuses. The CEO of Armand's would like to know the correlation between the number of students at a university (x) and pizza sales of restaurants located near the university (y).
13Simple Linear Regression
- Simple Linear Regression Model
- Least Squares Method
- Coefficient of Determination
- Model Assumptions
- Testing for Significance
- Excel's Regression Tool
- Using the Estimated Regression Equation for Estimation and Prediction
14The Simple Linear Regression Model
Dependent variable: y. Independent variable: x.
- Simple Linear Regression Model
- $y = \beta_0 + \beta_1 x + \epsilon$
- Simple Linear Regression Equation
- $E(y) = \beta_0 + \beta_1 x$
- Estimated Simple Linear Regression Equation
- $\hat{y} = b_0 + b_1 x$
15The Simple Linear Regression Model
- In the estimated regression equation $\hat{y} = b_0 + b_1 x$, $b_0$ is the y-intercept and $b_1$ is the slope
16The Simple Linear Regression Model
- In the estimated regression equation $\hat{y} = b_0 + b_1 x$, $\hat{y}$ is the estimated value of y for an observed value of x
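The difference between the regression model, the regression equation, and the estimated regression equation can be sketched in Python. The parameter values (10, 5, and an error standard deviation of 2) and all function names are hypothetical, and the error term is assumed to be normally distributed with mean zero.

```python
import random

def expected_y(beta0, beta1, x):
    """Regression equation: E(y) = beta0 + beta1 * x (no error term)."""
    return beta0 + beta1 * x

def simulate_y(beta0, beta1, x, sigma, rng):
    """Regression model: y = beta0 + beta1 * x + epsilon,
    with epsilon drawn from a normal distribution with mean 0 and sd sigma."""
    return expected_y(beta0, beta1, x) + rng.gauss(0.0, sigma)

def estimated_y(b0, b1, x):
    """Estimated regression equation: y_hat = b0 + b1 * x,
    where b0 and b1 are sample estimates of beta0 and beta1."""
    return b0 + b1 * x

rng = random.Random(0)                # fixed seed so the draws are repeatable
beta0, beta1, sigma = 10.0, 5.0, 2.0  # hypothetical population parameters
xs = [1, 2, 3]

mean_line = [expected_y(beta0, beta1, x) for x in xs]         # points on the line
draws = [simulate_y(beta0, beta1, x, sigma, rng) for x in xs]  # scattered about it
print(mean_line)
print(draws)
```

The simulated values scatter around the line because of the error term; the population parameters are normally unknown, which is why the sample estimates $b_0$ and $b_1$ are used in practice.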
17Possible Regression Lines in Simple Linear Regression
Positive Linear Relationship
[Figure: regression line $E(y) = \beta_0 + \beta_1 x$ plotted as E(y) against x, with intercept $\beta_0$; the slope $\beta_1$ is positive]
18Possible Regression Lines in Simple Linear Regression
Negative Linear Relationship
[Figure: regression line $E(y) = \beta_0 + \beta_1 x$ plotted as E(y) against x, with intercept $\beta_0$; the slope $\beta_1$ is negative]
19Possible Regression Lines in Simple Linear Regression
No Relationship
[Figure: horizontal regression line $E(y) = \beta_0 + 0(x)$ plotted as E(y) against x, with intercept $\beta_0$; the slope $\beta_1$ is 0]
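20Possible Regression Lines in Simple Linear Regression
Positive Linear Relationship, Negative y-intercept
[Figure: regression line $E(y) = \beta_0 + \beta_1 x$ plotted as E(y) against x; the intercept $\beta_0$ is negative and the slope $\beta_1$ is positive]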
24The Least Squares Method
- Least Squares Criterion
- $\min \sum (y_i - \hat{y}_i)^2$
- where
- $y_i$ = observed value of the dependent variable for the ith observation
- $\hat{y}_i$ = estimated value of the dependent variable for the ith observation
25The Least Squares Method
- Slope for the Estimated Regression Equation
- $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
- y-Intercept for the Estimated Regression Equation
- $b_0 = \bar{y} - b_1 \bar{x}$
- where
- $x_i$ = value of independent variable for ith observation
- $y_i$ = value of dependent variable for ith observation
- $\bar{x}$ = mean value for independent variable
- $\bar{y}$ = mean value for dependent variable
26Example Reed Auto Sales
- Simple Linear Regression
- Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
27Example Reed Auto Sales
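[Figure: scatter plot of the Reed Auto data (number of cars sold, y, against number of TV ads, x), with $\bar{y} = 20$ and $\bar{x} = 2$ marked]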
28Example Reed Auto Sales
- Slope for the Estimated Regression Equation
- $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{20}{4} = 5$
- y-Intercept for the Estimated Regression Equation
- $b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$
- Estimated Regression Equation
- $\hat{y} = 10 + 5x$
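The least squares calculation for the Reed Auto data can be reproduced with a few lines of Python. This is a minimal sketch using only the five observations listed in the example; the function and variable names are illustrative.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Reed Auto data from the example
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

b0, b1 = least_squares(tv_ads, cars_sold)
print(b0, b1)  # 10.0 5.0
```

With these estimates, each additional TV ad is associated with an estimated 5 more cars sold.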
30Now You Try
- The following table gives the percentage of women working in each company (x) and the percentage of management jobs held by women in that company (y).
- Develop the estimated regression equation for these data.
- Predict the percentage of management jobs held by women in a company that has 60% women employees.
31Now You Try
[Figure: scatter plot of y against x]
32Coefficient of Determination
[Figure: scatter plot of Sales (y) against Student Population (x)]
34Coefficient of Determination ($r^2$)
- A measure of the goodness of fit for the estimated regression equation.
- SSE: Sum of Squares Due to Error
- SSR: Sum of Squares Due to Regression
- SST: Total Sum of Squares
35Coefficient of Determination ($r^2$)
[Figure: scatter plot of Sales (y) against Student Population (x), marking the deviations used to calculate SST, SSE, and SSR]
36Coefficient of Determination ($r^2$)
- Relationship Among SST, SSR, SSE
- $SST = SSR + SSE$
- $r^2 = SSR/SST$
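37Example Reed Auto Sales
- SSE, using $\hat{y} = 10 + 5x$: $SSE = \sum (y_i - \hat{y}_i)^2 = 14$
38Example Reed Auto Sales
- SST, using $\bar{y} = 20$: $SST = \sum (y_i - \bar{y})^2 = 114$
39Example Reed Auto Sales
- SSR, using $\hat{y} = 10 + 5x$ and $\bar{y} = 20$: $SSR = \sum (\hat{y}_i - \bar{y})^2 = 100$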
40Example Reed Auto Sales
- Coefficient of Determination
- $r^2 = SSR/SST = 100/114 = .8772$
- The regression relationship is very strong, since 87.7% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
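The three sums of squares and the coefficient of determination for the Reed Auto example can be verified with a short Python sketch. It assumes the data from the example and the estimated equation $\hat{y} = 10 + 5x$; variable names are illustrative.

```python
# Reed Auto data and the estimated regression equation y_hat = 10 + 5x
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

# SSE: observed values about the estimated regression line
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))
# SST: observed values about the sample mean
sst = sum((y - y_bar) ** 2 for y in cars_sold)
# SSR: estimated values about the sample mean
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

r_squared = ssr / sst
print(sse, sst, ssr, round(r_squared, 4))  # 14.0 114.0 100.0 0.8772
```

The printed values also confirm the identity SST = SSR + SSE for this data set (114 = 100 + 14).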
41Now You Try
- Given are five observations for two variables, x and y.
- The estimated regression equation for these data is [equation not recoverable from the source]
- Compute SSE, SST, and SSR.
- Compute the coefficient of determination $r^2$. Comment on the goodness of fit.
42Now You Try
[Figure: computation of SSE]
43Now You Try
[Figure: computation of SSR, using $\bar{y} = 8$]
44Testing for Significance
- To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
- Two tests are commonly used
- t Test: used to test the significance of 1 independent variable.
- F Test: must be used when working with more than 1 independent variable.
45Testing for Significance t Test
- Hypotheses
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
- Test Statistic
- $t = b_1 / s_{b_1}$
46Testing for Significance
- To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
- Two tests are commonly used
- t Test: used to test the significance of 1 independent variable.
- F Test: must be used when working with more than 1 independent variable.
- Both tests require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.
47Model Assumptions
- Assumptions About the Error Term $\epsilon$
- The error $\epsilon$ is a random variable with a mean of zero.
- The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
- The values of $\epsilon$ are independent.
- The error $\epsilon$ is a normally distributed random variable.
48The Error Term ($\epsilon$)
[Figure: scatter plot of Sales (y) against Student Population (x)]
49The Error Term ($\epsilon$)
[Figure: scatter plot of Sales (y) against Student Population (x), with a residual marked]
- Residuals: the deviations of the y values about the estimated regression line.
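The residuals described above can be computed directly. The figure on this slide plots sales against student population, but that data set does not survive in this transcript, so the sketch below assumes the Reed Auto data and the estimated equation $\hat{y} = 10 + 5x$ from the earlier example instead; variable names are illustrative.

```python
# Reed Auto data and estimated regression equation (swapped in for the
# sales / student population data, which is not available here)
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

# Residual = observed y minus the value on the estimated regression line
residuals = [y - (b0 + b1 * x) for x, y in zip(tv_ads, cars_sold)]
print(residuals)       # [-1.0, -1.0, -2.0, 2.0, 2.0]
print(sum(residuals))  # 0.0
```

The residuals of a least squares line with an intercept sum to zero, so their individual sizes and pattern, rather than their total, are what indicate how well the model assumptions hold.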
50Testing for Significance
- An Estimate of $\sigma^2$
- The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
- $s^2 = MSE = SSE/(n - p - 1)$
- where $n$ = sample size and $p$ = number of independent variables
51Testing for Significance
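- An Estimate of $\sigma$
- To estimate $\sigma$ we take the square root of $s^2$: $s = \sqrt{MSE} = \sqrt{SSE/(n - p - 1)}$
- The resulting $s$ is called the standard error of the estimate.
- Estimated Standard Deviation of $b_1$
- $s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$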
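As a rough check of these formulas, the sketch below computes the mean square error, the standard error of the estimate, and the estimated standard deviation of the slope for the Reed Auto data from the earlier example. It assumes the estimated equation $\hat{y} = 10 + 5x$ and one independent variable (p = 1); variable names are illustrative.

```python
import math

# Reed Auto data and estimated regression equation y_hat = 10 + 5x
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n, p = len(tv_ads), 1  # sample size, number of independent variables

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
mse = sse / (n - p - 1)  # s^2, the estimate of the error variance
s = math.sqrt(mse)       # standard error of the estimate

x_bar = sum(tv_ads) / n
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)
s_b1 = s / math.sqrt(s_xx)  # estimated standard deviation of b1

print(round(mse, 3), round(s, 3), round(s_b1, 3))  # 4.667 2.16 1.08
```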
52Testing for Significance t Test
- Hypotheses
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
- Test Statistic
- $t = b_1 / s_{b_1}$
- Rejection Rule
- Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
- where $t_{\alpha/2}$ is based on a t distribution with $n - p - 1$ degrees of freedom.
55Example Reed Auto Sales
- t Test
- Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
- Rejection Rule
- For $\alpha = .05$ and d.f. = 3, $t_{.025} = 3.182$
- Reject $H_0$ if $t > 3.182$ (or $t < -3.182$)
- Test Statistic
- $t = b_1 / s_{b_1} = 5/1.08 = 4.63$
- Conclusion
- Reject $H_0$
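The t test above can be traced step by step in Python. This sketch assumes the Reed Auto data and estimates from the earlier example, and it takes the critical value 3.182 directly from the slide rather than computing it from a t distribution; variable names are illustrative.

```python
import math

# Reed Auto data and estimated regression equation y_hat = 10 + 5x
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n, p = len(tv_ads), 1

# Standard error of the slope: s_b1 = s / sqrt(sum of squared x deviations)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - p - 1))
x_bar = sum(tv_ads) / n
s_b1 = s / math.sqrt(sum((x - x_bar) ** 2 for x in tv_ads))

t_stat = b1 / s_b1  # test statistic for H0: beta1 = 0
t_crit = 3.182      # t_.025 with 3 d.f., as given on the slide

reject_h0 = abs(t_stat) > t_crit  # two-tailed rejection rule
print(round(t_stat, 2), reject_h0)  # 4.63 True
```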
56Now You Try
- Using the data from the previous Now You Try,
- Compute $s^2$ (MSE)
- Compute $s$ (standard error of the estimate)
- Compute $s_{b_1}$ (Given [value not recoverable from the source])
- Use the t test to test the following hypothesis ($\alpha = .05$)
57Confidence Interval for ?1
- We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
- $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
58Confidence Interval for ?1
- The form of a confidence interval for $\beta_1$ is $b_1 \pm t_{\alpha/2} s_{b_1}$
- where $b_1$ is the point estimate,
- $t_{\alpha/2} s_{b_1}$ is the margin of error, and
- $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - p - 1$ degrees of freedom
59Example Reed Auto Sales
- Rejection Rule
- Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
- 95% Confidence Interval for $\beta_1$
- $5 \pm 3.182(1.08) = 5 \pm 3.44$
- or 1.56 to 8.44
- Conclusion
- Reject $H_0$
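The confidence interval calculation can be sketched the same way. The code below assumes the Reed Auto data from the earlier example and again takes the t value 3.182 from the slide; the names are illustrative.

```python
import math

# Reed Auto data and estimated slope from the earlier example
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n, p = len(tv_ads), 1

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - p - 1))
x_bar = sum(tv_ads) / n
s_b1 = s / math.sqrt(sum((x - x_bar) ** 2 for x in tv_ads))

t_crit = 3.182            # t_.025 with 3 d.f., as given on the slide
margin = t_crit * s_b1    # margin of error
lower, upper = b1 - margin, b1 + margin

# H0: beta1 = 0 is rejected when 0 lies outside the interval
reject_h0 = not (lower <= 0.0 <= upper)
print(round(lower, 2), round(upper, 2), reject_h0)  # 1.56 8.44 True
```

The interval leads to the same conclusion as the t test, as expected, since both use the same estimate, standard error, and critical value.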
60Testing for Significance F Test
- Used to test the significance of the entire model.
- Hypotheses
- $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
- $H_a$: One or more of the parameters is not equal to zero.
- Mean Square Due to Regression (MSR)
- $MSR = SSR/p$
- where $p$ = the number of independent variables
61Testing for Significance F Test
- Hypotheses
- $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
- $H_a$: One or more of the parameters is not equal to zero.
- Test Statistic
- $F = MSR/MSE$
- Rejection Rule
- Reject $H_0$ if $F > F_\alpha$
- where $F_\alpha$ is based on an F distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator.
62Example Reed Auto Sales
- SSR, using $\hat{y} = 10 + 5x$ and $\bar{y} = 20$: $SSR = \sum (\hat{y}_i - \bar{y})^2 = 100$
63Example Reed Auto Sales
- $\alpha = .05$
- d.f.: $p = 1$ (numerator), $n - p - 1 = 3$ (denominator)
- $MSR = SSR/p = 100/1 = 100$
- $MSE = SSE/(n - p - 1) = 14/3 = 4.667$
64Example Reed Auto Sales
- F Test
- Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
- Rejection Rule
- For $\alpha = .05$ and d.f. = 1, 3: $F_{.05} = 10.13$
- Reject $H_0$ if $F > 10.13$.
- Test Statistic
- $F = MSR/MSE = 100/4.667 = 21.43$
- Conclusion
- We can reject $H_0$.
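The F test for the Reed Auto example can be reproduced with the sketch below. It assumes the data and estimated equation from the earlier example and takes the critical value 10.13 from the slide instead of computing it from an F distribution; variable names are illustrative.

```python
# Reed Auto data and estimated regression equation y_hat = 10 + 5x
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n, p = len(tv_ads), 1

y_bar = sum(cars_sold) / n
y_hat = [b0 + b1 * x for x in tv_ads]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))

msr = ssr / p            # mean square due to regression
mse = sse / (n - p - 1)  # mean square error
f_stat = msr / mse

f_crit = 10.13           # F_.05 with 1 and 3 d.f., as given on the slide
reject_h0 = f_stat > f_crit
print(round(msr, 3), round(mse, 3), round(f_stat, 2), reject_h0)  # 100.0 4.667 21.43 True
```

With a single independent variable, the F statistic equals the square of the t statistic from the t test (4.63 squared is about 21.43), so the two tests reach the same conclusion here.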
65Using Excel's Regression Tool
- Performing the Regression Analysis
- Step 1: Select the Tools pull-down menu
- Step 2: Choose the Data Analysis option
- Step 3: Choose Regression from the list of Analysis Tools
- continued
66Using Excel's Regression Tool
- Performing the Regression Analysis
- Step 4: When the Regression dialog box appears:
- Enter C1:C6 in the Input Y Range box
- Enter B1:B6 in the Input X Range box
- Select Labels
- Select Confidence Level
- Enter desired confidence level in the Confidence Level box
- Select Output Range
- Enter A9 (any cell) in the Output Range box
- Select OK to begin the regression analysis
67Using Excel's Regression Tool
- Formula Worksheet (showing data entered)
70Using Excel's Regression Tool
- Estimated Regression Equation Output (left portion)
Note: Columns F-I are not shown.
p-value < 0.05, so TV Ads is significant at $\alpha = .05$
71Using Excel's Regression Tool
- Estimated Regression Equation Output (right portion)
Note: Columns C-E are hidden.
73Using Excel's Regression Tool
- Regression Statistics Output
74Now You Try
- In a manufacturing process the assembly line speed (feet per minute) was thought to affect the number of defective parts found during the inspection process. To test this theory, managers devised a situation in which the same batch of parts was inspected at a variety of line speeds. The collected data follow.
75Now You Try
- Develop the estimated regression equation that
relates line speed to the number of defective
parts found.
77Now You Try
- At the .05 level of significance, determine whether line speed and number of defective parts are related (perform a t test and an F test).
80Now You Try
- Did the estimated regression equation provide a good fit to the data? (Use $r^2$.)
81Now You Try
$r^2 > 0.60$, therefore the estimated regression equation provides a good fit to the data. The regression equation explains 74% of the variation in the number of defective parts.
82Now You Try
- Estimate the number of defective parts with a
line speed of 50 feet per minute.
83End of Chapter 14