Title: Regression Analysis: Estimating Relationships
1Regression Analysis and Modeling for Decision
Support
2Regression Analysis Estimating Models for
Decision Support
Regression Analysis is a study of relationship
between a set of independent variables and the
dependent variable. Independent variables are
characteristics that can be measured directly
(example the area of a house). These variables
are also caled predictor variables (used to
predict the dependent variable) or explanatory
variables (used to explain the behavior of the
dependent variable). Dependent variable is a
characteristic whose value depends on the values
of independent variables.
3Purpose of Regression Analysis
Future/Unknown
Past / Experience / Known
Now
time
ExplanationUse regression analysis to develop a
mathematical model to explain the variance in the
dependent variable based on values of independent
variables.
Prediction If the regression model adequately
explains the dependent variable, use the model to
predict values of the dependent variable.
Explain Selling Price of a house (dependent)
based on its characteristics (independents). If
the model is valid, use it for prediction.
Develop Regression Model using known data
(sample) Selling Price 40,000 100(Sq.ft)
20,000(Baths) If the above model is reliable and
valid, Use this model to predict the Selling
Price of any house based on its area (Sq.ft.) and
the number of bathrooms (Baths) The constant
term (40,000) is the fixed price of the house.
This is not dependent on the values of the
variables considered. Can be interpreted as the
price of the lot and transaction costs. The
coefficient of Sq.ft. (100) is the change in
Selling Price for an additional Square Foot. Can
be interpreted as Price per Sq.Foot.
4Procedure for Building Regression Models
Define/Clarify Purpose. Identify, graph and
describe the measurement of the dependent
variable. Specific y
Identify possible independent variables
(predictors should make sense). Describe each
X. Use scatter plots and correlations for y with
each x.
Estimate Regression Coefficients (using least
squares method).
Test to see if all coefficients are significant
(reliability). Establish validity (are
relationships as expected, do predictions match
actuals?).
Implement the model in Decision Support System.
Incorporate error in predictions. Outline
limitations/constraints of the model.
Compare predictions with actual values.
Modify/Refine/Expand model if necessary.
5Selecting Independent Variables Scatter Plots
Scatter Plots are used to visualize the
relationship between any two variables. For
regression analysis, we are looking for strong
linear relationship between the independent and
dependent variable.
Y-Intercept (Constant) Value of the dependent
variable irrespective of the value(s) of the
independent variable(s).
X-Coefficient (Slope) Change in dependent
variable per unit change in independent variable.
Overhead 3996 43 M_Hrs 883 Runs
R-Squared Proportion of variance in dependent
variable explained by independent variable(s).
6Selecting Independent Variables Correlation
Analysis
Correlation Coefficients are used to measure the
linear relationship between any two variables.
For regression analysis, we are looking for
strong linear relationship between the
independent and dependent variable, and low
correlations among independent variables .
Correlation of MachHrs with Overhead (should be
high)
Correlation of MachHrs with ProdRuns (should be
low)
Correlation of ProdRuns with Overhead (should
be high)
Multicollinearity exists when two independent
variables are highly correlated (redundancy).
7Simple Linear Regression
Linear regression function One dependent and one
independent variables Mathematical form Y
b0 b1X e b0 and b1 are parameters (unknown
constants) and their values are estimated from a
known sample of X and corresponding Y.
Estimated Model Y-Pred b0 b1X
Y-actual
e
Y-pred
b0 and b1 are estimates (based on a sample) of
B0 and B1 which are parameters (based on
population) Estimation of b0 and b1
(coefficients) is done by the Least Squares
Method. This method selects the line that has
the smallest squared error
B1 slope
B0 y -intercept
X
8Example of Simple Linear Regression Defining
Objective(s)
Define Objectives
- Pharmex is a chain of drugstores that operates
around the country. - To see how effective their advertising and other
promotional activities are, the company has
collected data from 50 randomly selected
metropolitan regions. - In each region it has compared its own
promotional expenditures and sales to those of
the leading competitor in the region over the
past year. - So, Pharmexs objective is to model the
relationship between Promotion expenditures and
Sales - Since Pharmex is interested in improving its
sales, relative to its largest competitor, the
dependent (outcome) variable for this situation
is - Sales Pharmexs sales as a percentage of those
of the leading competitor. This is the dependent
(or predicted) variable.
9Example of SLR Select Independent Variable
Variable Selection
The company expects that there is a positive
relationship between the Relative measures of
Sales and Promotion Expenditures, so that regions
with relatively more expenditures have relatively
more sales.
Promote Pharmexs promotional expenditures as a
percentage of those of the leading competitor.
This is the independent variable (or predictor
variable), one which can be controlled by
Pharmex.
- Selection Criteria
- Based in Common Sense and Experience
- Scatter Plots and Correlations
Description of Variables List each variable, how
measured, and expected relationship with
dependent variable. In this section report
results of Correlation Analyses, Scatter Plots,
etc.
10Example of SLR Collect and Organize Data
Data Collection
11Example of SLR Estimate Coefficients
Estimate Model
Regression Procedure in Excel
R-Square 45 of the variance in Sales is
explained by Promote (model)
Estimated Coefficients
Y-intercept (b0) 25.12 Slope (b1) 0.762
Sales-predicted
25.12 0.762 Promote
P-Value Indicates the probability of making a
Type I error (the possibility that the
coefficient is 0, that is there is no
relationship). If this value is greater than .05
do not use the variable as a predictor.
12Example of SLR Testing the Model
- Reliability and Validity
- Does the model make intuitive sense? Is the model
easy to understand and interpret? - Are all coefficients statistically significant?
(p-values less than .05) - Are the signs associated with the coefficients as
expected? - Does the model predict values that are reasonably
close to the actual values? - Is the model sufficiently sound? (High R-square,
low standard error, etc.)
13Example of SLR Implementing and Using the Model
Develop a Spreadsheet Model (Decision Support
System)
Estimated
Decision Variable
Forecast (regression formula)
What-if Pharmex spent 160K on promotions?
(Sensitivity analysis) What will Pharmex have to
do to achieve 20 sales more than its competitor?
(goal seeking) What will happen to Pharmexs
sales if its Competitors promotion can be any
value between 130K and 140K? (Monte-Carlo
Simulation)
14- Estimating Demand for a Product
15Conceptual Structure of Demand Model
Firm Demand Total Industry Demand Market
Share Define Market Share Firm Demand / Total
Industry Demand Market Share Firm
Demand / (Avg. Demand Number of Firms)
Market Share Relative Demand / Number of
Firms Firm Demand Total Industry Demand
(Relative Demand / N)
Macro-economic Influences Seasonal
Patterns Stage of Life Cycle
Exogenous Demand
Industry Activity Pricing, Promotion,
Quality
Endogenous Demand
FD
Competitive Profile Relative Pricing,
Promotion, Quality, and Loyalty
Relative Demand
16Total Industry Demand
17Relative Demand(Measure of Market Share)
RD is firm specific and a measure of market
share, the predictor variables should also be
relative to industry averages. For example,
relative price of the firm is PREL Firms
Price / Industry Avg. Price
18Estimating Demand
19 20Objectives
- Obtaining reliable forecasts for the Firm Demand
- Build a more dependable model based on regression
analysis - Make operations more efficient based on more
reliable forecasts - Monitor patterns in the overall demand for the
industry
21 22Descriptions of the Variables
- FD Firm Demand
- Demand for our Firms product
- TID Total Industry Demand
- Includes demand for all competitors in the market
- MS Market Share
- Firms percentage share of the market
- N Number of Competitors
- AFD Average Demand
- Average Demand for all competitors in the market
- Avg. Demand Total Industry Demand / N
- RD - Relative Demand
- The firms demand relative to the average demand
of the market - Relative Demand Firm Demand / Avg. Demand
23Descriptions of the Variables (continued)
- Firm Demand Total Industry Demand Market
Share - Market Share Firm Demand / Total Industry
Demand - Market Share Firm Demand / (Avg. Demand N)
- Market Share Relative Demand / N
- Hence
- Firm Demand Total Industry Demand (Relative
Demand / N)
24Predictor Variables for Total Industry Demand
- Total Industry Demand is a factor of individual
company demand - Demand for each company depends on macro-economic
influences and overall industry trends - Average Price Indicates industry trends in
pricing, and is reasonable to use instead of
individual competitor prices - Average Advertising Indicates industry
expenditures in promotions, marketing - In this case, also factors in RD
25Predictor Variables for Relative Demand
- Relative Demand is a measure of the firms market
share - Hence, the predictor variables should also be
relative to Industry Average - PREL Relative Price
- PREL Firm Price / Average Price
- AREL Relative Advertising
- AREL Firm Advertising / Average Advertising
- RD1 Relative Loyalty
- This is estimated for the analysis by using the
Relative Demand from the previous quarter
26 27TID Summary Measures
28TID Describing Avg. Price with Graphs
- Values seem to be trending down over time
- Relatively stable - In the 375 range for the 19
quarters observed - Median is to the right of the Mean in the
Box-Plot - Indicates negative skewness
29TID Describing Avg. Advertising with Graphs
- Some indications of seasonality from the
time-series plot - Values change significantly with time
- Median is almost the same as the mean
- Slight negative skewness exists
30TID Describing Total Industry Demand with Graphs
- Time series plot indicates a steadily increasing
demand for the product - Short term increases are steep
- Mean is to the right of the Median
- Indicates positive skewness
31TID Correlations Matrix
- Decent correlation between Quarter and TID
- Consider Quarter for regression analysis
- High correlation between Avg Price and TID
- Good candidate for regression analysis
- High correlation exists between Avg Advertising
and TID - Good candidate for regression analysis
- Significant correlation between Avg Price, Avg
Advertising and Quarter - Potential candidates for variable exclusion
during regression analysis
32RD Scatter Plot Arel vs RD
- Trendline indicates a positive relationship
- A correlation factor of .378 indicating less
correlation between the two variables - Arel is a potential candidate to be discarded
33RD Scatter Plot Prel vs RD
- Trendline indicates a negative relationship
- Has a correlation factor of -0.67 indicating a
fair amount of correlation between the two
variables
34RD Scatter Plot RD1 vs RD
- Trendline indicates a positive relationship
- Has a correlation factor of 0.711indicating a
fair amount of correlation between the two
variables
35RD Correlations Matrix
- RD has a reasonably high correlation to Prel and
RD1 - RD has a relatively lower correlation to Arel
- The variable could be discarded during regression
analysis - The correlations between Prel, Arel, and RD1 are
low enough to indicate that there will not be
problems related to multicollinearity
36 37TID Model Regression Analysis (1)
- Dependent Variable
- Total Industry Demand
- Independent Variable
- Quarter
- Resulting Equation
- TID 14218.0702 (645.9825 Quarter)
- The R2 for the resulting equation indicates that
the variable Quarter explains 48 of TID, and
hence the unexplained value is 52 - This makes for a very poor model despite the
p-value being very low which implies that the
probability of a Type 1 error is minimal
38TID Model Regression Analysis (2)
- Dependent Variable
- Total Industry Demand
- Independent Variables
- Quarter, Avg. Price, Avg. Advertising
- Resulting Equation
- TID 130249 (132.228 Quarter) (-358.613
Avg Price) (0.263 Avg. Advertising) - The R2 for the resulting equation indicates that
the model explains 90.69 of TID, and hence the
unexplained value is about 9, this generally
signifies a good model - p-value is within tolerance levels for all
variables, except Quarter which has a p-value of
0.21 and includes the 0 value in its range. Hence
Quarter must be discarded
39TID Model Regression Analysis (3)
- Dependent Variable
- Total Industry Demand
- Independent Variables
- Avg. Price, Avg. Advertising
- Resulting Equation
- TID 164336.17 (-445.168 Avg Price) (0.262
Avg. Advertising) - The R2 for the resulting equation indicates that
the model explains 89.67 of TID, and hence the
unexplained value is about 10.5 - p-value is within tolerance levels for all
variables, and none of the variables include the
0 value in its range - This model is good, and is the model that will be
used for TID
40RD Model Regression Analysis (3)
- Dependent Variable
- Relative Demand
- Independent Variables
- Relative Pricing (Prel), Relative Advertising
(Arel), Loyalty (RD1) - Resulting Equation
- RD 16.13 (-16.445 Prel) (0.779 Arel)
(0.533 RD1) - The R2 for the resulting equation indicates that
the model explains 95.75 of RD, and hence the
unexplained value is about 4.25 - p-value is within tolerance levels for all
variables, and none of the variables include the
0 value in its range - This model is very good, and is the model that
will be used for RD
41 42Verification of TID Model
43Verification of RD Model
44 45DSS Forecasting Firm Demand
- The coefficients from the TID model and the RD
model are entered into this model - Based on estimated values for the various items
on the left, the Firm Demand is calculated by
this model