Title: Simple Linear Regression
1- Simple Linear Regression and Correlation
2Learning Objectives
- Describe the Linear Regression Model
- State the Regression Modeling Steps
- Explain Ordinary Least Squares
- Compute Regression Coefficients
- Predict Response Variable
- Describe Residual and Influence Analysis
- Interpret Computer Output
3Deterministic Models
- Hypothesize Exact Relationships
- Suitable When Prediction Error Is Negligible
- Force Is Exactly Mass Times Acceleration
- $F = ma$
4Probabilistic Models
- Hypothesize 2 Components
- Deterministic
- Random Error
- Sales Volume Is 10 Times Advertising Spending Plus Random Error
- $Y = 10X + e$
- Random Error May Be Due to Factors Other Than Advertising
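The two components can be sketched in a few lines of Python. This is only an illustration: the function name `simulate_sales`, the normal distribution for the random error, and the error spread `sigma=2.0` are assumptions, not part of the slides.

```python
import random


def simulate_sales(advertising, sigma, rng):
    """Return Y = 10*X + e for each X, with e drawn from Normal(0, sigma)."""
    return [10 * x + rng.gauss(0.0, sigma) for x in advertising]


rng = random.Random(42)          # fixed seed so the run is repeatable
advertising = [1, 2, 3, 4, 5]    # hypothetical spending levels
sales = simulate_sales(advertising, sigma=2.0, rng=rng)

for x, y in zip(advertising, sales):
    deterministic = 10 * x
    print(f"X={x}  deterministic={deterministic}  "
          f"observed={y:.2f}  error={y - deterministic:+.2f}")
```

Setting `sigma` to 0 removes the random component and reduces the model to the deterministic case.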
5Regression Models
- Answer "What Is the Relationship Between the Variables?"
- Equation Used
- 1 Numerical Dependent Variable
- What Is to Be Predicted
- 1 or More Numerical or Categorical Independent Variables
- Used Mainly for Prediction
6Regression Modeling Steps
- 1. Define Problem or Question
- 2. Specify Model
- 3. Collect Data
- 4. Do Descriptive Data Analysis
- 5. Estimate Unknown Parameters
- 6. Evaluate Model
- 7. Use Model for Prediction
7Specifying the Model
- Define Variables
- Conceptual (e.g., Advertising, Price)
- Empirical (e.g., List Price, Regular Price)
- Measurement (e.g., Dollars, Units)
- Hypothesize Nature of Relationship
- Expected Effects (i.e., Coefficients Signs)
- Functional Form (Linear or Non-Linear)
- Interactions
8Which Functional Form?
[Figure: four panels, each plotting Sales against Advertising with a different candidate functional form]
9Linear Regression Model
- Relationship Between Variables Is a Linear Function
$$Y_i = a + b_1 X_i + e_i$$
10Population Linear Regression Model
[Figure: population regression line on X and Y axes, with observed values marked]
11Sample Linear Regression Model
[Figure: sample regression line on X and Y axes, with an observed value and an unsampled observation marked]
12Scatter Diagram
- 1. Plot of All (Xi, Yi) Pairs
- 2. Suggests How Well Model Will Fit
13 Ordinary Least Squares
- Best Fit Means Differences Between Actual Values ($Y_i$) and Predicted Values ($\hat{Y}_i$) Are a Minimum
- But Positive Differences Offset Negative Ones
- OLS Minimizes the Sum of the Squared Differences (or Errors)
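A short sketch of why the raw differences cannot serve as the criterion. It uses the five (advertising, sales) pairs from the parameter estimation example later in the deck; the two candidate lines and the helper name `residuals` are illustrative assumptions.

```python
def residuals(xs, ys, intercept, slope):
    """Differences between actual Y and the value predicted by the line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

# Two candidate lines, given as (intercept, slope).
candidates = {
    "flat line at mean of Y": (2.0, 0.0),
    "sloped line": (-0.1, 0.7),
}

for name, (a, b1) in candidates.items():
    e = residuals(xs, ys, a, b1)
    total = sum(e)                      # positives cancel negatives
    squared = sum(r * r for r in e)     # squaring removes the cancellation
    print(f"{name}: sum of errors = {total:.2f}, "
          f"sum of squared errors = {squared:.2f}")
```

Both lines have errors that sum to zero, so the plain sum cannot tell them apart; the sum of squared errors (6.00 versus 1.10) does.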
14Ordinary Least Squares Graphically
OLS Minimizes $\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2$
[Figure: fitted line on X and Y axes with residuals $e_1$, $e_2$, $e_3$ marked]
15 Coefficient Equations
Sample Regression Equation
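The sample regression equation and the least-squares coefficients can be written as follows. This is the standard computational form, shown here because it uses the same column sums that appear in the solution table ($\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\sum X_i Y_i$); the exact notation is an assumption.

```latex
\begin{align}
\hat{Y}_i &= a + b_1 X_i \\
b_1 &= \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}
            {\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} \\
a &= \bar{Y} - b_1 \bar{X}
\end{align}
```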
16Parameter Estimation Example
- You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad    Sales (Units)
1     1
2     1
3     2
4     2
5     4
- What is the relationship between sales and advertising?
17Parameter Estimation Solution Table
Xi    Yi    Xi^2    Yi^2    XiYi
1     1     1       1       1
2     1     4       1       2
3     2     9       4       6
4     2     16      4       8
5     4     25      16      20
15    10    55      26      37     (column sums)
18Parameter Estimation Solution
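The solution can be reproduced from the column sums in the table. The sketch below assumes the usual least-squares formulas, $b_1 = (\sum XY - \sum X \sum Y / n) / (\sum X^2 - (\sum X)^2 / n)$ and $a = \bar{Y} - b_1 \bar{X}$; the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]   # advertising
ys = [1, 1, 2, 2, 4]   # sales (units)
n = len(xs)

# Column sums from the solution table.
sum_x = sum(xs)                              # 15
sum_y = sum(ys)                              # 10
sum_x2 = sum(x * x for x in xs)              # 55
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 37

# Least-squares slope and intercept.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b1 * sum_x / n

print(f"b1 = {b1:.2f}, a = {a:.2f}")   # b1 = 0.70, a = -0.10
```

Under these formulas the fitted line is $\hat{Y}_i = -0.1 + 0.7 X_i$: each additional unit of advertising is associated with an expected increase of 0.7 units of sales.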
19Interpretation of Coefficients
- Slope (b1)
- Estimated Y Changes by b1 for Each 1 Unit Increase in X
- If b1 = 2, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising (X)
- Y-Intercept (a)
- Average Value of Y When X = 0
- If a = 4, then Average Sales (Y) Is Expected to Be 4 When Advertising (X) Is 0
21Parameter Estimation SPSS Output
22Evaluating the Model
- 1. How Well Does the Model Describe the Relationship Between the Variables?
- 2. Closeness of Best Fit
- The Closer the Points to the Line, the Better
- 3. Assumptions Met
- 4. Significance of Parameter Estimates
- 5. Outliers (Unusual Observations)
23Evaluating Model Steps
- 1. Examine Variation Measures
- 2. Test Coefficients for Significance
- 3. Do Residual Analysis
- 4. Do Influence Analysis
24Random Error Variation
- Variation of Actual Y from Predicted Y
- Measured by Standard Error of Estimate
- Sample Standard Deviation of e
- Denoted $S_{YX}$
- Affects Several Factors
- Parameter Significance
- Prediction Accuracy
25Standard Error of Estimate
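For simple linear regression the standard error of estimate is usually defined as below. The $n - 2$ divisor is the $n - k - 1$ degrees of freedom with $k = 1$ independent variable; the notation is an assumption chosen to match the rest of the deck.

```latex
\begin{equation}
S_{YX} = \sqrt{\frac{SSE}{n - 2}}
       = \sqrt{\frac{\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}{n - 2}}
\end{equation}
```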
26Standard Error of Estimate Solution
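A minimal sketch of the calculation for the advertising data, assuming $S_{YX} = \sqrt{SSE / (n - 2)}$ and refitting the line inside the block so it runs on its own. Variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]   # advertising
ys = [1, 1, 2, 2, 4]   # sales (units)
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit.
ss_x = sum(x * x for x in xs) - n * x_bar ** 2
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ss_x
a = y_bar - b1 * x_bar

# Unexplained variation: squared differences between actual and predicted Y.
sse = sum((y - (a + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Standard error of estimate with n - 2 degrees of freedom.
s_yx = math.sqrt(sse / (n - 2))

print(f"SSE = {sse:.2f}, S_YX = {s_yx:.4f}")   # SSE = 1.10, S_YX = 0.6055
```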
27Rule of Thumb for Interpreting the Standard Error of Estimate
- Regression line ± 1(std. error): about 68% of the data points are expected to fall in this interval
- Regression line ± 2(std. error): about 95% of the data points are expected to fall in this interval
- Regression line ± 3(std. error): about 99.7% of the data points are expected to fall in this interval
28Graphic Representation of Standard Error of Estimate
[Figure: regression line on X and Y axes with bands at one and two standard errors on each side of the line; $\bar{X}$ and $X_{given}$ marked on the X axis]
29Prediction With Regression Models
- Types of Predictions
- Point Estimates
- Interval Estimates
- What Is Predicted
- Population Mean Response ($\mu_{YX}$) for Given X
- Point on Population Regression Line
- Individual Response (Yi) for Given X
30What Is Predicted
[Figure: regression line on X and Y axes, marking at $X_{given}$ the prediction $\hat{Y}$, an individual response $Y_i$, and the mean response $\mu_{YX}$]
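31Confidence Interval Estimate of Mean Y ($\mu_{YX}$)
$$\hat{Y} - t_{n-k-1,\,\alpha/2}\, S_{\hat{Y}} \le \mu_{YX} \le \hat{Y} + t_{n-k-1,\,\alpha/2}\, S_{\hat{Y}}$$
where
$$S_{\hat{Y}} = S_{YX} \sqrt{\frac{1}{n} + \frac{\left(X_{given} - \bar{X}\right)^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$$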
32Factors Affecting Interval Width
- 1. Level of Confidence ($1 - \alpha$)
- Width Increases as Confidence Increases
- 2. Data Dispersion ($S_{YX}$)
- Width Increases as Variation Increases
- 3. Sample Size
- Width Decreases as Sample Size Increases
- 4. Distance of $X_{given}$ from Mean $\bar{X}$
- Width Increases as Distance Increases
33Why Distance from Mean?
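[Figure: two sample regression lines (Sample 1 Line, Sample 2 Line) on X and Y axes, with $\bar{Y}$, $X_1$, and $X_2$ marked; the dispersion between the lines at $X_2$ is greater than at $X_1$]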
34Confidence Interval Estimate Solution
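A sketch of the interval for the advertising data, using $\hat{Y} \pm t_{n-k-1,\,\alpha/2}\, S_{\hat{Y}}$ with $k = 1$ independent variable. The choice of $X_{given} = 4$ and a 95% confidence level are assumptions made for illustration, and the critical value 3.182 is taken from a t table for 3 degrees of freedom rather than computed.

```python
import math

xs = [1, 2, 3, 4, 5]   # advertising
ys = [1, 1, 2, 2, 4]   # sales (units)
n, k = len(xs), 1
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit and standard error of estimate.
ss_x = sum(x * x for x in xs) - n * x_bar ** 2       # sum(Xi^2) - n*Xbar^2
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ss_x
a = y_bar - b1 * x_bar
sse = sum((y - (a + b1 * x)) ** 2 for x, y in zip(xs, ys))
s_yx = math.sqrt(sse / (n - k - 1))

X_GIVEN = 4      # assumed value of X (not taken from the slides)
T_CRIT = 3.182   # t table value: 3 degrees of freedom, 95% two-sided

# Point estimate and standard error of the mean response at X_GIVEN.
y_hat = a + b1 * X_GIVEN
s_yhat = s_yx * math.sqrt(1 / n + (X_GIVEN - x_bar) ** 2 / ss_x)

lower = y_hat - T_CRIT * s_yhat
upper = y_hat + T_CRIT * s_yhat
print(f"Y_hat = {y_hat:.2f}, 95% CI for mean Y: ({lower:.2f}, {upper:.2f})")
```

Moving `X_GIVEN` farther from the mean of X (3 here) enlarges the second term under the square root and so widens the interval.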
35Prediction Interval of Individual Response
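$$\hat{Y} - t_{n-k-1,\,\alpha/2}\, S_{ind} \le Y_i \le \hat{Y} + t_{n-k-1,\,\alpha/2}\, S_{ind}$$
where
$$S_{ind} = S_{YX} \sqrt{1 + \frac{1}{n} + \frac{\left(X_{given} - \bar{X}\right)^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$$
Note the 1 added under the square root.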
36Prediction Interval of Individual Response Solution
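A sketch of the prediction interval for a single response, $\hat{Y} \pm t_{n-k-1,\,\alpha/2}\, S_{ind}$, on the advertising data. As before, $X_{given} = 4$, the 95% level, and the table value 3.182 for 3 degrees of freedom are assumptions for illustration.

```python
import math

xs = [1, 2, 3, 4, 5]   # advertising
ys = [1, 1, 2, 2, 4]   # sales (units)
n, k = len(xs), 1
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit and standard error of estimate.
ss_x = sum(x * x for x in xs) - n * x_bar ** 2
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ss_x
a = y_bar - b1 * x_bar
sse = sum((y - (a + b1 * x)) ** 2 for x, y in zip(xs, ys))
s_yx = math.sqrt(sse / (n - k - 1))

X_GIVEN = 4      # assumed value of X (not taken from the slides)
T_CRIT = 3.182   # t table value: 3 degrees of freedom, 95% two-sided

y_hat = a + b1 * X_GIVEN
distance_term = 1 / n + (X_GIVEN - x_bar) ** 2 / ss_x

# The extra 1 accounts for the scatter of an individual Y around its mean.
s_ind = s_yx * math.sqrt(1 + distance_term)
s_mean = s_yx * math.sqrt(distance_term)   # for comparison only

lower = y_hat - T_CRIT * s_ind
upper = y_hat + T_CRIT * s_ind
print(f"95% PI for individual Y: ({lower:.2f}, {upper:.2f})")
print(f"S_ind = {s_ind:.4f} versus S_Yhat = {s_mean:.4f}")
```

Because of the extra 1, the prediction interval for an individual response is always wider than the confidence interval for the mean response at the same X.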
37Hyperbolic Interval Bands
[Figure: regression line $\hat{Y}_i = a + b_1 X_i$ with upper and lower confidence limits inside wider upper and lower prediction limits; the bands are narrowest at $\bar{X}$, and $X_{given}$ is marked on the X axis]
38Interval Estimate SPSS Output
39Measures of Variation in Regression
- Total Sum of Squares (SST)
- Measures Variation of Observed $Y_i$ Around the Mean $\bar{Y}$
- Explained Variation (SSR)
- Variation Due to Relationship Between X and Y
- Unexplained Variation (SSE)
- Variation Due to Other Factors
40Variation Measures
[Figure: variation measures illustrated for a single observation $(X_i, Y_i)$ on X and Y axes]
41Relationship
$SST = SSR + SSE$
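The decomposition can be checked numerically on the advertising data. In this sketch SST is measured around the mean of Y, SSR from the fitted values to the mean, and SSE from the actual values to the fitted values; the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]   # advertising
ys = [1, 1, 2, 2, 4]   # sales (units)
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit.
ss_x = sum(x * x for x in xs) - n * x_bar ** 2
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ss_x
a = y_bar - b1 * x_bar
fitted = [a + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained variation
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # unexplained variation

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```

For this data set SST = 6.00, SSR = 4.90, and SSE = 1.10, so the two parts add back to the total.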
42Coefficient of Determination
- Proportion of Variation Explained by Relationship Between X and Y
$$0 \le r^2 \le 1$$
43Coefficient of Determination Examples
[Figure: four scatter plots with fitted line $\hat{Y}_i = b_0 + b_1 X_i$: two with $r^2 = 1$, one with $r^2 = .8$, and one with $r^2 = 0$]
44Adjusted Coefficient of Determination
- Proportion of Variation Explained by Relationship Between X and Y
- Reflects
- Sample Size
- Number of Independent Variables
- Equation
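A commonly used form of the adjusted coefficient of determination is given below, where $n$ is the sample size and $k$ is the number of independent variables. Treat it as an assumption about which equation the slide intended.

```latex
\begin{equation}
r^2_{adj} = 1 - \left(1 - r^2\right) \frac{n - 1}{n - k - 1}
\end{equation}
```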
45Coefficient of Determination Solution
81.67% of Variation in Sales Is Due to Advertising
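The 81.67% figure can be reproduced as $r^2 = SSR / SST$ on the advertising data. The sketch also evaluates the adjusted value using the common formula $1 - (1 - r^2)(n - 1)/(n - k - 1)$, which is an assumption about the intended equation.

```python
xs = [1, 2, 3, 4, 5]   # advertising
ys = [1, 1, 2, 2, 4]   # sales (units)
n, k = len(xs), 1
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit.
ss_x = sum(x * x for x in xs) - n * x_bar ** 2
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ss_x
a = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation
ssr = sum((a + b1 * x - y_bar) ** 2 for x in xs)        # explained variation

r2 = ssr / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"r^2 = {r2:.4f}, adjusted r^2 = {r2_adj:.4f}")
# r^2 = 0.8167, adjusted r^2 = 0.7556
```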
46Coefficient of Determination SPSS Output
47Correlation Models
- Answer "How Strong Is the Linear Relationship Between 2 Variables?"
- Coefficient of Correlation Used
- Population Correlation Coefficient Denoted $\rho$ (Rho)
- Values Range from -1 to 1
- Measures Degree of Association
- Used Mainly for Understanding
48Sample Coefficient of Correlation
- Pearson Product-Moment Coefficient of Correlation
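$$r = \pm\sqrt{\text{Coefficient of Determination}} = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \, \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}}$$
(the sign of $r$ matches the sign of the slope $b_1$)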
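A minimal sketch of the Pearson product-moment calculation on the advertising data, written directly from the deviation sums. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Sample coefficient of correlation from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]   # advertising
ys = [1, 1, 2, 2, 4]   # sales (units)

r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")   # r = 0.9037, r^2 = 0.8167
```

Squaring $r$ recovers the coefficient of determination for the same data, and $r$ is positive because the fitted slope is positive.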
49Coefficient of Correlation Values
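[Figure: scale of correlation values running from -1.0 (Perfect Negative Correlation) through -.5, 0 (No Correlation), and .5 to 1.0 (Perfect Positive Correlation); the degree of negative correlation increases toward -1.0 and the degree of positive correlation increases toward 1.0]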
50Coefficient of Correlation Regression Model
[Figure: four scatter plots with fitted line $\hat{Y}_i = a + b_1 X_i$, for $r = 1$, $r = -1$, $r = .89$, and $r = 0$]