Statistics for Business and Economics - PowerPoint PPT Presentation

Slides: 113
Provided by: johnjmcgil

Transcript and Presenter's Notes

Title: Statistics for Business and Economics


1
Statistics for Business and Economics
  • Chapter 11: Multiple Regression and Model Building
  • John J. McGill/Lyn Noble
  • Revisions by Peter Jurkat

2
Learning Objectives
  • Explain the Linear Multiple Regression Model
  • Describe Inference About Individual Parameters
  • Test Overall Significance
  • Explain Estimation and Prediction
  • Describe Various Types of Models
  • Describe Model Building
  • Explain Residual Analysis
  • Describe Regression Pitfalls

3
Types of Regression Models
4
Models With Two or More Quantitative Variables
5
Types of Regression Models
6
Multiple Regression Model
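  • General form:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
  • k independent variables
  • $x_1, x_2, \dots, x_k$ may be functions of variables
  • e.g., $x_2 = (x_1)^2$
  • Example: PropertyPrice = $\beta_0 + \beta_1$(LotSize) +
    $\beta_2$(LivingArea) + $\beta_3$(NoRooms) + $\beta_4$(BRs) + $\beta_5$(Pool)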

7
Regression Modeling Steps
  • Hypothesize deterministic component
  • Estimate unknown model parameters
  • Specify probability distribution of random error
    term
  • Estimate standard deviation of error
  • Evaluate model
  • Use model for prediction and estimation

8
Probability Distribution of Random Error
9
Linear Multiple Regression Model
10
Types of Regression Models
11
Regression Modeling Steps
  • Hypothesize deterministic component
  • Estimate unknown model parameters
  • Specify probability distribution of random error
    term
  • Estimate standard deviation of error
  • Evaluate model
  • Use model for prediction and estimation

12
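First-Order Multiple Regression Model
  • Relationship between 1 dependent and 2 or more
    independent variables is a linear function
  • Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
  • $y$: dependent (response) variable
  • $\beta_0$: population Y-intercept
  • $\beta_1, \dots, \beta_k$: population slopes
  • $x_1, \dots, x_k$: independent (explanatory) variables
  • $\varepsilon$: random error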
13
Assumptions for Probability Distribution of $\varepsilon$
  • Mean is 0
  • Constant variance, $\sigma^2$
  • Normally Distributed
  • Errors are independent

14
First-Order Model With 2 Independent Variables
  • Relationship between 1 dependent and 2
    independent variables is a linear function
  • Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
  • Assumes no interaction between x1 and x2
  • Effect of x1 on E(y) is the same regardless of x2
    values

15
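Population Multiple Regression Model
[Figure: bivariate model. A response plane with intercept $\beta_0$ drawn over the x1 and x2 axes; the observed y at the point (x1i, x2i) differs from the plane by the error $\varepsilon_i$.]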

16
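Sample Multiple Regression Model
[Figure: bivariate model. The estimated response plane with intercept $\hat{\beta}_0$ drawn over the x1 and x2 axes; the observed y at the point (x1i, x2i) differs from the plane by the residual $\hat{\varepsilon}_i$.]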
17
Parameter Estimation
18
Regression Modeling Steps
  • Hypothesize Deterministic Component
  • Estimate Unknown Model Parameters
  • Specify Probability Distribution of Random Error
    Term
  • Estimate Standard Deviation of Error
  • Evaluate Model
  • Use Model for Prediction and Estimation

19
First-Order Model Worksheet

Case, i   yi    x1i   x2i
1         1     1     3
2         4     8     5
3         1     3     2
4         3     5     6
...       ...   ...   ...

Run regression with y, x1, x2
20
Multiple Linear Regression Equations
Too complicated by hand!
Ouch!
21
Interpretation of Estimated Coefficients
22
1st Order Model Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.)
    and newspaper circulation (000) on the number of
    ad responses (00). Estimate the unknown
    parameters.
  • NYTAdSizeCirc.xls

You've collected the following data:

Resp (y)   Size (x1)   Circ (x2)
1          1           2
4          8           8
1          3           1
3          5           7
2          6           4
4          10          6
23
Parameter Estimation Computer Output
Parameter Estimates

                  Parameter   Standard   T for H0:
Variable    DF    Estimate    Error      Param=0     Prob>|T|
INTERCEP    1     0.0640      0.2599     0.246       0.8214
ADSIZE      1     0.2049      0.0588     3.484       0.0399
CIRC        1     0.2805      0.0686     4.089       0.0264
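The estimates in this output can be reproduced by least squares. The following is a minimal sketch, assuming the six (Resp, Size, Circ) observations listed for this example; the helper names `cross` and `fit_two_predictors` are hypothetical, and the closed form shown applies only to a model with exactly two independent variables.

```python
# NYT example data from the slides:
# responses (00), ad size (sq. in.), circulation (000)
y = [1, 4, 1, 3, 2, 4]
x1 = [1, 8, 3, 5, 6, 10]
x2 = [2, 8, 1, 7, 4, 6]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Least-squares estimates for y = b0 + b1*x1 + b2*x2 (k = 2 only)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero when x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(y, x1, x2)
print(f"INTERCEP {b0:.4f}  ADSIZE {b1:.4f}  CIRC {b2:.4f}")
```

With more than two independent variables the normal equations are usually solved with matrix routines, which is why the deck leaves the computation to software.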

24
Interpretation of Coefficients Solution
25
Estimation of $\sigma^2$
26
Regression Modeling Steps
  • Hypothesize deterministic component
  • Estimate unknown model parameters
  • Specify probability distribution of random error
    term
  • Estimate standard deviation of error
  • Evaluate model
  • Use model for prediction and estimation

27
Regression Modeling Steps
  • Hypothesize Deterministic Component
  • Estimate Unknown Model Parameters
  • Specify Probability Distribution of Random Error
    Term
  • Estimate Standard Deviation of Error
  • Evaluate Model
  • Use Model for Prediction and Estimation

28
Estimation of $\sigma^2$
For a model with k independent variables:
$s^2 = \dfrac{SSE}{n - (k + 1)}$, with $s = \sqrt{s^2}$
29
Calculating $s^2$ and $s$ Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.),
    x1, and newspaper circulation (000), x2, on the
    number of ad responses (00), y. Find SSE, $s^2$, and
    $s$.

30
Analysis of Variance Computer Output
Analysis of Variance

Source           DF   SS         MS         F       P
Regression       2    9.249736   4.624868   55.44   .0043
Residual Error   3    .250264    .083421
Total            5    9.5
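As a rough check on this table, $s^2$ is the residual sum of squares divided by its degrees of freedom, n - (k + 1), and s is its square root. A minimal sketch, taking SSE directly from the Residual Error row (the function and variable names are illustrative):

```python
import math


def error_variance(sse, n, k):
    """Estimate of sigma^2 for a model with k independent variables."""
    df_error = n - (k + 1)
    return sse / df_error


n, k = 6, 2      # six observations, two independent variables
sse = 0.250264   # Residual Error SS from the ANOVA output

df_error = n - (k + 1)
s2 = error_variance(sse, n, k)
s = math.sqrt(s2)
print(f"df = {df_error}, s^2 = {s2:.6f}, s = {s:.4f}")
```

The value of $s^2$ matches the MS entry in the Residual Error row.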

31
Evaluating the Model
32
Regression Modeling Steps
  • Hypothesize Deterministic Component
  • Estimate Unknown Model Parameters
  • Specify Probability Distribution of Random Error
    Term
  • Estimate Standard Deviation of Error
  • Evaluate Model
  • Use Model for Prediction and Estimation

33
Evaluating Multiple Regression Model Steps
  • Examine variation measures
  • Test parameter significance
  • Individual coefficients
  • Overall model
  • Do residual analysis

34
Variation Measures
35
Evaluating Multiple Regression Model Steps
  • Examine variation measures
  • Test parameter significance
  • Individual coefficients
  • Overall model
  • Do residual analysis

36
Multiple Coefficient of Determination
  • Proportion of variation in y explained by all x
    variables taken together
  • Never decreases when new x variable is added to
    model
  • Only y values determine SSyy
  • Disadvantage when comparing models

37
Adjusted Multiple Coefficient of Determination
  • Takes into account n and number of parameters
  • Similar interpretation to $R^2$

38
Estimation of $R^2$ and $R_a^2$ Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.),
    x1, and newspaper circulation (000), x2, on the
    number of ad responses (00), y. Find $R^2$ and $R_a^2$.
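Both measures follow from the sums of squares in the Analysis of Variance output. The sketch below assumes the usual definitions, $R^2 = 1 - SSE/SS_{yy}$ and $R_a^2 = 1 - \frac{n-1}{n-(k+1)}(1 - R^2)$, with SSE = .250264 and $SS_{yy}$ = 9.5 taken from that output; the function names are illustrative.

```python
def r_squared(sse, ss_yy):
    """Proportion of the variation in y explained by the model."""
    return 1 - sse / ss_yy


def adjusted_r_squared(r2, n, k):
    """R^2 adjusted for sample size n and k independent variables."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)


n, k = 6, 2
sse, ss_yy = 0.250264, 9.5   # Residual Error SS and Total SS

r2 = r_squared(sse, ss_yy)
ra2 = adjusted_r_squared(r2, n, k)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {ra2:.4f}")
```

The adjusted value is smaller than $R^2$ because it charges the model for each estimated parameter.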

39
Excel Computer Output Solution
40
Testing Parameters
41
Evaluating Multiple Regression Model Steps
  • Examine variation measures
  • Test parameter significance
  • Individual coefficients
  • Overall model
  • Do residual analysis

42
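Inference for an Individual $\beta$ Parameter
  • Confidence Interval: $\hat{\beta}_i \pm t_{\alpha/2}\, s_{\hat{\beta}_i}$
  • Hypothesis Test: $H_0: \beta_i = 0$; $H_a: \beta_i \neq 0$ (or $<$ or $>$)
  • Test Statistic: $t = \hat{\beta}_i / s_{\hat{\beta}_i}$, with $n - (k + 1)$ df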

43
Confidence Interval Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.),
    x1, and newspaper circulation (000), x2, on the
    number of ad responses (00), y. Find a 95%
    confidence interval for $\beta_1$.
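One way to carry out this calculation is estimate ± (t critical value) × (standard error). A minimal sketch, assuming the ADSIZE estimate and standard error from the Parameter Estimates output and a t-table value of 3.182 for α/2 = .025 with 3 degrees of freedom; `confidence_interval` is a hypothetical helper name.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


b1_hat = 0.2049   # ADSIZE estimate from the computer output
se_b1 = 0.0588    # its standard error from the same row
t_crit = 3.182    # t-table value: alpha/2 = .025, n - (k + 1) = 6 - 3 = 3 df

ci_low, ci_high = confidence_interval(b1_hat, se_b1, t_crit)
print(f"95% CI for beta1: ({ci_low:.4f}, {ci_high:.4f})")
```

Because the interval excludes 0, it agrees with the two-sided p-value of 0.0399 reported for ADSIZE.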

44
Excel Computer Output Solution
45
Confidence Interval Solution
46
Hypothesis Test Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.),
    x1, and newspaper circulation (000), x2, on the
    number of ad responses (00), y. Test the
    hypothesis that the mean ad response increases as
    circulation increases (ad size constant). Use
    $\alpha = .05$.

47
Hypothesis Test Solution
  • H0:
  • Ha:
  • $\alpha$ =
  • df =
  • Critical Value(s):

Test Statistic:
Decision:
Conclusion:
48
Excel Computer Output Solution
49
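Hypothesis Test Solution
  • H0: $\beta_2 = 0$
  • Ha: $\beta_2 > 0$
  • $\alpha = .05$
  • df = 6 - 3 = 3
  • Critical Value(s): $t_{.05} = 2.353$

Test Statistic: t = 4.089 (CIRC row of the computer output)
Decision: Reject at $\alpha = .05$
Conclusion: There is evidence the mean ad response increases
as circulation increases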
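The decision for this test can be sketched as follows, assuming the CIRC estimate and standard error from the Parameter Estimates output and a one-tailed t-table value of 2.353 for α = .05 with 3 degrees of freedom (variable names are illustrative):

```python
b2_hat = 0.2805   # CIRC estimate from the computer output
se_b2 = 0.0686    # its standard error
t_crit = 2.353    # t-table value: one-tailed alpha = .05, 3 df

# Test statistic for H0: beta2 = 0 against Ha: beta2 > 0
t_stat = b2_hat / se_b2
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The output reports a two-sided p-value for CIRC, so the one-sided p-value for this upper-tailed test is half of it.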
50
Excel Computer Output Solution
P-Value
51
Evaluating Multiple Regression Model Steps
  • Examine variation measures
  • Test parameter significance
  • Individual coefficients
  • Overall model
  • Do residual analysis

52
Testing Overall Significance
  • Shows if there is a linear relationship between
    all x variables together and y
  • Hypotheses
  • $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
  • No linear relationship
  • $H_a$: At least one coefficient is not 0
  • At least one x variable affects y

53
Testing Overall Significance
  • Test Statistic: $F = \dfrac{MS(Model)}{MS(Error)}$
  • Degrees of Freedom: $\nu_1 = k$, $\nu_2 = n - (k + 1)$
  • k = Number of independent variables
  • n = Sample size

54
Testing Overall Significance Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.),
    x1, and newspaper circulation (000), x2, on the
    number of ad responses (00), y. Conduct the
    global F-test of model usefulness. Use $\alpha = .05$.

55
Testing Overall Significance Solution
  • H0:
  • Ha:
  • $\alpha$ =
  • $\nu_1$ =      $\nu_2$ =
  • Critical Value(s):

Test Statistic:
Decision:
Conclusion:
56
Testing Overall Significance Computer Output
Analysis of Variance

                Sum of    Mean
Source    DF    Squares   Square   F Value   Prob>F
Model     2     9.2497    4.6249   55.440    0.0043
Error     3     0.2503    0.0834
C Total   5     9.5000

Callouts: Error DF = n - (k + 1); F Value = MS(Model) / MS(Error)
57
Testing Overall Significance Solution
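  • H0: $\beta_1 = \beta_2 = 0$
  • Ha: At least one of $\beta_1$, $\beta_2$ is not 0
  • $\alpha = .05$
  • $\nu_1 = 2$, $\nu_2 = 3$
  • Critical Value(s): $F_{.05} = 9.55$

Test Statistic: F = 55.44
Decision: Reject at $\alpha = .05$
Conclusion: There is evidence at least 1 of the coefficients
is not zero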
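The global F statistic is the ratio of the model mean square to the error mean square. A minimal sketch, assuming the sums of squares from the Analysis of Variance output (9.249736 and .250264) and an F-table value of 9.55 for α = .05 with 2 and 3 degrees of freedom; `global_f` is a hypothetical helper name.

```python
def global_f(ss_model, ss_error, n, k):
    """F = MS(Model) / MS(Error) for a model with k independent variables."""
    ms_model = ss_model / k
    ms_error = ss_error / (n - (k + 1))
    return ms_model / ms_error


f_stat = global_f(ss_model=9.249736, ss_error=0.250264, n=6, k=2)
f_crit = 9.55   # F-table value: alpha = .05, numerator df 2, denominator df 3
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```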
58
Testing Overall Significance Computer Output
Solution
Analysis of Variance

                Sum of    Mean
Source    DF    Squares   Square   F Value   Prob>F
Model     2     9.2497    4.6249   55.440    0.0043
Error     3     0.2503    0.0834
C Total   5     9.5000

Callout: F Value = MS(Model) / MS(Error)
59
Interaction Models
60
Types of Regression Models
61
Interaction Model With 2 Independent Variables
  • Hypothesizes interaction between pairs of x
    variables
  • Response to one x variable varies at different
    levels of another x variable
  • Can be combined with other models
  • Example dummy-variable model

62
Effect of Interaction
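Given: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
  • Without interaction term, effect of x1 on y is
    measured by $\beta_1$
  • With interaction term, effect of x1 on y is
    measured by $\beta_1 + \beta_3 x_2$
  • Effect increases as x2 increases (when $\beta_3 > 0$)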
63
No Interaction
$E(y) = 1 + 2x_1 + 3x_2$
[Figure: E(y) (axis from 0 to 12) plotted against x1 (axis from 0 to 1.5).]
Effect (slope) of x1 on E(y) does not depend on
x2 value
64
Interaction Model Relationships
$E(y) = 1 + 2x_1 + 3x_2 + 4x_1 x_2$
[Figure: E(y) (axis from 0 to 12) plotted against x1 (axis from 0 to 1.5).]
Effect (slope) of x1 on E(y) depends on x2 value
65
Interaction Model Worksheet

Case, i   yi    x1i   x2i   x1i x2i
1         1     1     3     3
2         4     8     5     40
3         1     3     2     6
4         3     5     6     30
...       ...   ...   ...   ...

Multiply x1 by x2 to get x1x2. Run regression
with y, x1, x2, x1x2
66
Interaction Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.),
    x1, and newspaper circulation (000), x2, on the
    number of ad responses (00), y. Conduct a test
    for interaction. Use $\alpha = .05$.

67
Interaction Model Worksheet

yi    x1i   x2i   x1i x2i
1     1     2     2
4     8     8     64
1     3     1     3
3     5     7     35
2     6     4     24
4     10    6     60

Multiply x1 by x2 to get x1x2. Run regression
with y, x1, x2, x1x2
68
Excel Computer Output Solution
Global F-test (F and P-Value in the output) indicates
at least one parameter is not zero
69
Interaction Test Solution
  • H0:
  • Ha:
  • $\alpha$ =
  • df =
  • Critical Value(s):

Test Statistic:
Decision:
Conclusion:
70
Excel Computer Output Solution
71
Interaction Test Solution
  • H0: $\beta_3 = 0$
  • Ha: $\beta_3 \neq 0$
  • $\alpha = .05$
  • df = 6 - 4 = 2
  • Critical Value(s): $\pm t_{.025} = \pm 4.303$

Test Statistic: t = 1.8528
Decision: Do not reject at $\alpha = .05$
Conclusion: There is no evidence of interaction
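The interaction test can be reproduced by fitting the model with the x1x2 column and dividing the interaction estimate by its standard error. The following is a sketch only, assuming the six NYT observations and a t-table value of 4.303 for α/2 = .025 with 2 degrees of freedom; the small `invert` helper is a hypothetical stand-in for the matrix routine a statistics package would use.

```python
import math

y = [1, 4, 1, 3, 2, 4]
x1 = [1, 8, 3, 5, 6, 10]
x2 = [2, 8, 1, 7, 4, 6]

# Design matrix columns: intercept, x1, x2, and the product x1*x2.
X = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
xtx_inv = invert(xtx)
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]

residuals = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
sse = sum(e * e for e in residuals)
df_error = len(y) - p                 # n - (k + 1) = 6 - 4 = 2
s2 = sse / df_error
se_b3 = math.sqrt(s2 * xtx_inv[3][3])  # standard error of the x1*x2 coefficient
t_b3 = beta[3] / se_b3

t_crit = 4.303                        # t-table value: alpha/2 = .025, 2 df
reject_h0 = abs(t_b3) > t_crit
print(f"t = {t_b3:.4f}, reject H0: {reject_h0}")
```

With only 2 error degrees of freedom the critical value is large, so even a moderate t statistic does not reach significance.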
72
Second-Order Models
73
Types of Regression Models
74
Second-Order Model With 1 Independent Variable
  • Relationship between 1 dependent and 1
    independent variable is a quadratic function
  • Useful 1st model if non-linear relationship
    suspected
  • Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$

75
Second-Order Model Relationships
[Figure: four panels of y against x1. Two panels show $\beta_2 > 0$ (curve opens upward) and two show $\beta_2 < 0$ (curve opens downward).]
76
Second-Order Model Worksheet

Case, i   yi    xi    xi^2
1         1     1     1
2         4     8     64
3         1     3     9
4         3     5     25
...       ...   ...   ...

Create $x^2$ column. Run regression with y, x, $x^2$.
77
2nd Order Model Example

Errors (y)   Weeks (x)
20           1
18           1
16           2
10           4
8            4
4            5
3            6
1            8
2            10
1            11
0            12
1            12

  • The data show the number of weeks employed and
    the number of errors made per day for a sample of
    assembly line workers. Find a 2nd order model,
    conduct the global F-test, and test if $\beta_2 \neq 0$.
    Use $\alpha = .05$ for all tests.
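The fitting step for this example can be sketched by solving the normal equations for y = b0 + b1·x + b2·x². This is an illustration under stated assumptions: it uses the twelve (Errors, Weeks) pairs listed for this example, solves the 3×3 system with Cramer's rule for brevity, and covers only the estimates, not the F-test or t-test. The names `det3` and `fit_quadratic` are hypothetical.

```python
weeks = [1, 1, 2, 4, 4, 5, 6, 8, 10, 11, 12, 12]
errors = [20, 18, 16, 10, 8, 4, 3, 1, 2, 1, 0, 1]


def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(x, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x + b2*x^2."""
    s = [sum(v ** p for v in x) for p in range(5)]              # sums of x^0..x^4
    t = [sum((v ** p) * w for v, w in zip(x, y)) for p in range(3)]
    a = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    d = det3(a)
    coefs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]      # replace one column with the right-hand side
        coefs.append(det3(m) / d)
    return tuple(coefs)


b0, b1, b2 = fit_quadratic(weeks, errors)
fitted = [b0 + b1 * x + b2 * x * x for x in weeks]
sse = sum((y - f) ** 2 for y, f in zip(errors, fitted))
print(f"y-hat = {b0:.3f} + ({b1:.3f})x + ({b2:.4f})x^2, SSE = {sse:.3f}")
```

A positive b2 here corresponds to a curve that falls steeply in the early weeks and then levels off.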

78
Second-Order Model Worksheet

yi    xi    xi^2
20    1     1
18    1     1
16    2     4
10    4     16
...   ...   ...

Create $x^2$ column. Run regression with y, x, $x^2$.
79
Excel Computer Output Solution
80
Overall Model Test Solution
Global F-test (F and P-Value in the output) indicates
at least one parameter is not zero
81
$\beta_2$ Parameter Test Solution
$\beta_2$ test (t in the output) indicates a curvilinear
relationship exists
82
Types of Regression Models
83
Second-Order Model With 2 Independent Variables
  • Relationship between 1 dependent and 2
    independent variables is a quadratic function
  • Useful 1st model if non-linear relationship
    suspected
  • Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

84
Second-Order Model Relationships
[Figure: two second-order response surfaces drawn over the x1 and x2 axes.]
85
Second-Order Model Worksheet

Case, i   yi    x1i   x2i   x1i x2i   x1i^2   x2i^2
1         1     1     3     3         1       9
2         4     8     5     40        64      25
3         1     3     2     6         9       4
4         3     5     6     30        25      36
...       ...   ...   ...   ...       ...     ...

Multiply x1 by x2 to get x1x2, then create $x_1^2$ and
$x_2^2$. Run regression with y, x1, x2, x1x2, $x_1^2$,
$x_2^2$.
86
Models With One Qualitative Independent Variable
87
Types of Regression Models
88
Dummy-Variable Model
  • Involves categorical x variable with 2 levels
  • e.g., male-female; college-no college
  • Variable levels coded 0 and 1
  • Number of dummy variables is 1 less than number
    of levels of variable
  • May be combined with quantitative variable (1st
    order or 2nd order model)

See QtrGDPAnalyzed.xls
89
Dummy-Variable Model Worksheet

Case, i   yi    x1i   x2i
1         1     1     1
2         4     8     0
3         1     3     1
4         3     5     1
...       ...   ...   ...

x2 levels: 0 = Group 1, 1 = Group 2. Run
regression with y, x1, x2
90
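Interpreting Dummy-Variable Model Equation
Given: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
  • y = Starting salary of college graduates
  • x1 = GPA
  • x2 = 0 if Male, 1 if Female
  • Males ($x_2 = 0$): $E(y) = \beta_0 + \beta_1 x_1$
  • Females ($x_2 = 1$): $E(y) = (\beta_0 + \beta_2) + \beta_1 x_1$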
91
Dummy-Variable Model Example
Computer Output
x2 = 0 if Male, 1 if Female
92
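Dummy-Variable Model Relationships
[Figure: y against x1. Two parallel lines with the same slope $\beta_1$: Female with intercept $\beta_0 + \beta_2$, Male with intercept $\beta_0$.]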
93
Nested Models
94
Comparing Nested Models
  • Reduced model contains a subset of terms in the
    complete (full) model
  • Tests the contribution of a set of x variables to
    the relationship with y
  • Null hypothesis $H_0: \beta_{g+1} = \dots = \beta_k = 0$
  • Variables in set do not significantly improve the
    model when all other variables are included
  • Used in selecting x variables or models
  • Part of most computer programs
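The nested-model comparison is usually carried out with an F statistic built from the two error sums of squares. The sketch below assumes the standard form, F = [(SSE_reduced - SSE_complete) / (number of β's tested)] / [SSE_complete / (n - (k + 1))], which the slides do not spell out. As a sanity check it takes the intercept-only model as the reduced model for the NYT example (so SSE_reduced is the Total SS, 9.5), in which case the result coincides with the global F statistic. `nested_f` is a hypothetical helper name.

```python
def nested_f(sse_reduced, sse_complete, n, k_complete, num_tested):
    """F statistic for H0: the num_tested extra betas in the complete model are 0."""
    df_error = n - (k_complete + 1)
    drop_per_term = (sse_reduced - sse_complete) / num_tested
    ms_error_complete = sse_complete / df_error
    return drop_per_term / ms_error_complete


# Reduced model: intercept only, so its SSE equals the Total SS (9.5).
# Complete model: first-order model in ad size and circulation (SSE = .250264).
f_stat = nested_f(sse_reduced=9.5, sse_complete=0.250264,
                  n=6, k_complete=2, num_tested=2)
print(f"nested F = {f_stat:.2f}")
```

Testing a smaller subset works the same way: only `sse_reduced` and `num_tested` change.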

95
Selecting Variables in Model Building
96
Selecting Variables in Model Building
A butterfly flaps its wings in Japan, which
causes it to rain in Nebraska. -- Anonymous
Use Theory Only!
Use Computer Search!
97
Model Building with Computer Searches
  • Rule Use as few x variables as possible
  • Stepwise Regression
  • Computer selects x variable most highly
    correlated with y
  • Continues to add or remove variables depending on
    SSE
  • Best subset approach
  • Computer examines all possible sets

98
Residual Analysis
99
Evaluating Multiple Regression Model Steps
  • Examine variation measures
  • Test parameter significance
  • Individual coefficients
  • Overall model
  • Do residual analysis

100
Residual Analysis
  • Graphical analysis of residuals
  • Plot estimated errors versus xi values
  • Difference between actual yi and predicted yi
  • Estimated errors are called residuals
  • Plot histogram or stem-and-leaf of residuals
  • Purposes
  • Examine functional form (linear v. non-linear
    model)
  • Evaluate violations of assumptions

101
Residual Plot for Functional Form
[Figure: two residual plot panels, "Add $x^2$ Term" and "Correct Specification".]
102
Residual Plot for Equal Variance
[Figure: two residual plot panels, "Unequal Variance" and "Correct Specification".]
Unequal variance appears fan-shaped. Standardized residuals
are typically used.
103
Residual Plot for Independence
[Figure: two panels of residuals e against x, "Not Independent" and "Correct Specification".]
Plots reflect the sequence in which data were collected.
104
Residual Analysis Computer Output
       Dep Var   Predict               Student
Obs    SALES     Value     Residual    Residual
1      1.0000    0.6000     0.4000      1.044
2      1.0000    1.3000    -0.3000     -0.592
3      2.0000    2.0000     0           0.000
4      2.0000    2.7000    -0.7000     -1.382
5      4.0000    3.4000     0.6000      1.567

Plot of standardized (student) residuals (scale from -2 to 2)
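The Residual column is the observed value minus the predicted value for each case. A minimal sketch using the SALES and Predict Value columns from this output (variable names are illustrative; the studentized column is not recomputed here because it needs quantities the output does not show):

```python
observed = [1.0, 1.0, 2.0, 2.0, 4.0]    # Dep Var SALES
predicted = [0.6, 1.3, 2.0, 2.7, 3.4]   # Predict Value

# Residual = actual y minus predicted y, rounded to the output's 4 decimals.
residuals = [round(o - p, 4) for o, p in zip(observed, predicted)]
sse = sum(e * e for e in residuals)

for obs, e in enumerate(residuals, start=1):
    print(f"Obs {obs}: residual = {e:+.4f}")
print(f"Sum of residuals = {sum(residuals):.4f}, SSE = {sse:.2f}")
```

Least-squares residuals from a model with an intercept sum to zero, which gives a quick check on a residual column.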
105
Regression Pitfalls
106
Regression Pitfalls
  • Parameter Estimability
  • Number of different x-values must be at least one
    more than order of model
  • Multicollinearity
  • Two or more x-variables in the model are
    correlated
  • Extrapolation
  • Predicting y-values outside sampled range
  • Correlated Errors

107
Multicollinearity
  • High correlation between x variables
  • Coefficients measure combined effect
  • Leads to unstable coefficients depending on x
    variables in model
  • Always exists; it is a matter of degree
  • Example: using both age and height as
    explanatory variables in same model

See RealestateAnalyzed.xls
108
Detecting Multicollinearity
  • Correlations between pairs of x variables are
    significant and larger than their correlations
    with the y variable
  • Nonsignificant t-tests for most of the
    individual parameters, but overall model test is
    significant
  • Estimated parameters have wrong sign
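The first check can be sketched by comparing the correlation between the x variables with each one's correlation with y. The example below reuses the NYT ad data from earlier in the deck purely as an illustration; `corr` is a hypothetical helper computing the ordinary Pearson correlation.

```python
import math

y = [1, 4, 1, 3, 2, 4]      # ad responses (00)
x1 = [1, 8, 3, 5, 6, 10]    # ad size (sq. in.)
x2 = [2, 8, 1, 7, 4, 6]     # circulation (000)


def corr(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r_x1x2 = corr(x1, x2)
r_x1y = corr(x1, y)
r_x2y = corr(x2, y)

print(f"r(x1, x2) = {r_x1x2:.3f}")
print(f"r(x1, y)  = {r_x1y:.3f}")
print(f"r(x2, y)  = {r_x2y:.3f}")
# Warning sign from the list above: x-x correlation exceeding the x-y correlations.
print("x-x correlation exceeds x-y correlations:",
      abs(r_x1x2) > max(abs(r_x1y), abs(r_x2y)))
```

In this data set the two x variables are correlated with each other, but less strongly than either is with y, so this particular warning sign is absent.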

109
Solutions to Multicollinearity
  • Eliminate one or more of the correlated x
    variables
  • Avoid inference on individual parameters
  • Do not extrapolate

110
Extrapolation
[Figure: y against x. Interpolation lies within the sampled range of x; extrapolation lies on either side of it.]
111
NPP Not Straight
  • When regression diagnostics cast doubt on
    reliability (bent normal probability plot (NPP), high
    Sig-F), variables can be transformed; the transformation
    is usually applied to the dependent variable (DV)
  • Raise values to some power; see "ladder of
    powers" in Variable Transformations.doc
  • Powers > 1 can make left-skewed data less skewed
  • Powers < 1 can make right-skewed data less skewed
  • Could make NPP straighter for DVs

112
Conclusion
  • Explained the Linear Multiple Regression Model
  • Described Inference About Individual Parameters
  • Tested Overall Significance
  • Explained Estimation and Prediction
  • Described Various Types of Models
  • Described Model Building
  • Explained Residual Analysis
  • Described Regression Pitfalls