Title: Exam Feb 28: sets 1,2
1. Exam Feb 28: sets 1, 2
- Set 1 due Thurs
- Memo C-1 due Feb 14
- Free tutoring will be available next week
- Plan A: MW 4-6 PM, OR
- Plan B: TT 2-4 PM
- VOTE for Plan A or Plan B
- Announce results Thurs
2. Kinderman Supplement
- Ch 2 Multiple Regression
- Ch 3 Analysis of Variance
3. MULTIPLE REGRESSION
4. Example
- Reference: Statistics for Managers
- By Levine, David M.; Berenson; Stephan
- Second edition (1999)
- Prentice Hall
5. Y = dependent variable = heating oil sales (gal)
- X1 = Temperature (degrees)
- X2 = Insulation (inches)
- X1 and X2 are independent variables
- Y = b0 + b1X1 + b2X2
- Enter data into Excel
- NOTE: If you can't find Data Analysis, try Add-Ins
6. Y = 562 - 5X1 - 20X2
- Bottom table
- Coefficient column
7. Interpret coefficients
- Intercept b0 = 562: If temp = 0 and insulation = 0, heating oil sales = 562
- b1 = -5: For all homes with the same insulation, each 1 degree increase in temperature should decrease heating oil sales by 5 gallons
- b2 = -20: For all months with the same temp, each additional 1 inch of insulation should decrease sales by 20 gallons
8. Categorical Variables
- X = 0 or 1
- Example: 0 if male, 1 if female
- Example: 1 if graduate, 0 if dropout
- Example: 1 if citizen, 0 if alien
- NOTE: not in this fuel oil example
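The 0/1 coding above can be sketched in a few lines of Python. The helper name `dummy` and the sample values are assumptions made up for illustration; they are not from the course data.

```python
# Hypothetical helper: code a two-category variable as 0 or 1.
def dummy(value, one_category):
    """Return 1 if value equals one_category, otherwise 0."""
    return 1 if value == one_category else 0

# Illustrative values only (not from the fuel oil example).
genders = ["male", "female", "female", "male"]
x = [dummy(g, "female") for g in genders]
print(x)  # [0, 1, 1, 0]
```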
9. Estimate sales if temp = 30, insulation = 6
- Y = 562 - 5(30) - 20(6) = 292 gal
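As a quick check of the estimate, the fitted equation can be written as a small Python function. This is a sketch that uses the rounded coefficients shown on the slides; the function name `predict_sales` is an assumption for illustration.

```python
# Rounded coefficients from the slides: intercept, temperature, insulation.
B0, B1, B2 = 562, -5, -20

def predict_sales(temp, insulation):
    """Estimated heating oil sales (gal) from the fitted equation."""
    return B0 + B1 * temp + B2 * insulation

print(predict_sales(30, 6))                         # 292
# One extra degree at the same insulation changes the estimate by b1.
print(predict_sales(31, 6) - predict_sales(30, 6))  # -5
# One extra inch of insulation at the same temperature changes it by b2.
print(predict_sales(30, 7) - predict_sales(30, 6))  # -20
```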
10. Standard Error = 26
- Top table
- Interpret: Typical fuel oil sales were about 26 gal away from the average fuel oil sales of other homes with the same temp and insulation
11. COEFFICIENT OF MULTIPLE DETERMINATION
- Top table, R square
- Interpret: 96% of total variation in fuel oil sales can be explained by variation in temperature and insulation
12. Is there a relationship between all independent variables and the dependent variable?
- Ho: Null hypothesis: All coefficients = 0
- Ho: NO relationship
- H1: Alternative hypothesis: At least one coefficient is not zero
- H1: There is a relationship
13. Computer output = Sample data
- Hypotheses = Population parameters
- Ho: Parameters = 0, but sample data makes it appear that there is a relationship
- Simple regression: Ho zero slope vs H1 slope positive or slope negative
14. Exponents
15. Decision Rule
- Reject Ho if Significance F < alpha
- Middle table
- Fuel oil example: Significance F = 1.6E-09
- Excel: E = Exponent
- 1.6E-09 = 1.6 × 10^-9 = 0.0000000016
- Approaches zero as limit
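Python reads the same E notation that Excel prints, so the conversion above can be checked directly. This is only an illustrative sketch; the variable name `sig_f` is an assumption.

```python
import math

# Excel's "1.6E-09" is ordinary scientific notation.
sig_f = float("1.6E-09")

print(math.isclose(sig_f, 1.6 * 10**-9))  # True
print(f"{sig_f:.10f}")                    # 0.0000000016
print(sig_f < 0.05)                       # True, so reject Ho at alpha = .05
```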
16. Significance F = p-value
- Excel uses the label "p-value" only for the t distribution
- Significance F = probability that F is greater than the sample F
17. Assume alpha = .05
- Since Significance F (about 0) < .05, reject Ho
- We conclude there IS a relationship between fuel oil sales and the independent variables
18. Which independent variables seem to be important factors?
- Ho: Temperature is not an important factor
- H1: Temperature is important
- Reject Ho if p-value < alpha
- Bottom table: p-value column, X1 row
- P-value = 1.6E-09, or approximately zero
- Reject Ho
- Temp is important
19. Insulation
- Ho: Insulation unimportant
- H1: Insulation important
- P-value = 1.9E-06, or approximately zero
- Reject Ho
- Insulation important
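The same decision rule applies to both coefficients, so it can be sketched as one small function. The p-values are the ones quoted on the slides; the function name `reject_null` and alpha = .05 as the default are assumptions for illustration.

```python
ALPHA = 0.05  # significance level assumed on the slides

def reject_null(p_value, alpha=ALPHA):
    """Decision rule: reject Ho when the p-value is below alpha."""
    return p_value < alpha

# p-values from the bottom table of the fuel oil output.
p_values = {"Temperature (X1)": 1.6e-09, "Insulation (X2)": 1.9e-06}

for name, p in p_values.items():
    decision = "reject Ho" if reject_null(p) else "do not reject Ho"
    print(f"{name}: p = {p:.1e}, {decision}")
```

Both variables come out as important factors, matching the conclusions above.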
20. Analysis of Variance (ANOVA)
21. X = number of auto accidents
Live in city   Live in suburb   Live in rural
1              2                1
3              0                0
2              1                0
22. Hypothesis Testing
- Ho: µ1 = µ2 = µ3
- H1: Not all means are equal
- H1: There are differences among the 3 populations
- H1: Average number of accidents is different depending on where you live
23. This course: manual calculations
- If you used computer software, you could have as many populations as needed
- Homework, exam: 3 populations
- Computer: 4 or more populations
- Ex: Ethnic classifications at CSUN
24. Sample Sizes
- Column 1: n1 = number of drivers sampled from policyholders living in city = 3
- Column 2: n2 = number sampled from suburban drivers = 3
- Col 3: n3 = number sampled from rural drivers = 3
- Number of rows of data
- Kinderman example: Different sample sizes
25. n = n1 + n2 + n3
26. X = number of auto accidents
Live in city   Live in suburb   Live in rural
1 (X11)        2                1
3 (X21)        0                0
2 (X31)        1                0
29. Do not assume n1 = 3 on exam
31. X = number of auto accidents
Live in city   Live in suburb   Live in rural
1 (X11)        2                1
3 (X21)        0                0
2 (X31)        1                0
Σ = 6          Σ = 3            Σ = 1
Mean = 2       Mean = 1         Mean = 0.3
34. Hypotheses
- Ho: Differences in sample means are due to chance, but there would be no differences if ALL drivers were included (Prop 103)
- H1: Population means are different because city drivers have more accidents
38. Grand mean = 1.1
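The column sums, sample means, and grand mean can be reproduced from the accident table. This is a minimal sketch; the dictionary layout and variable names are assumptions. Sample sizes come from `len(...)` rather than a fixed 3, so it would also handle unequal groups.

```python
# Accident counts from the slide table, one list per group.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

# Column sums and sample means.
sums = {group: sum(x) for group, x in accidents.items()}
means = {group: sum(x) / len(x) for group, x in accidents.items()}

# Grand mean = total of all observations / total sample size n.
n = sum(len(x) for x in accidents.values())
grand_mean = sum(sums.values()) / n

print(sums)                                         # {'city': 6, 'suburb': 3, 'rural': 1}
print({g: round(m, 1) for g, m in means.items()})   # {'city': 2.0, 'suburb': 1.0, 'rural': 0.3}
print(n, round(grand_mean, 1))                      # 9 1.1
```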
39. SSB = Sum of Squares Between
- Between 3 groups
- Explained variation
- Here: Variation in number of accidents explained by where you live (city, suburb, rural)
- If where you live did not affect accidents, we would expect SSB = 0
- Next slide: SSB formula
41. X = number of auto accidents
Live in city   Live in suburb   Live in rural
1 (X11)        2                1
3 (X21)        0                0
2 (X31)        1                0
Σ = 6          Σ = 3            Σ = 1
Mean = 2       Mean = 1         Mean = 0.3
42. This example
- SSB = 3(2-1.1)^2 + 3(1-1.1)^2 + 3(.3-1.1)^2 = 4.2
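The calculation above weights each group's squared distance from the grand mean by its sample size. A minimal Python sketch, with the function name `sum_of_squares_between` assumed for illustration:

```python
# Accident counts from the slide table: city, suburb, rural.
groups = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]

def sum_of_squares_between(groups):
    """SSB = sum over groups of n_j * (group mean - grand mean)^2."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

ssb = sum_of_squares_between(groups)
print(round(ssb, 1))  # 4.2
```

The 4.2 comes from the unrounded means (1/3 and 10/9). Plugging the rounded .3 and 1.1 into the formula by hand gives about 4.4 instead.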
43. MSB = Mean Square Between
- MSB = SSB/2
- Note: OK for this course, but bigger problems would have a bigger denominator
- MSB = 4.2/2 = 2.1
44. SSE = Sum of Squared Error
- Variation within group
- Ex: Variation within group of city drivers
- Unexplained variation
- If every city driver had the same number of accidents, we would expect SSE = 0
- Formula on next slide
47. X = number of auto accidents
Live in city   Live in suburb   Live in rural
1 (X11)        2                1
3 (X21)        0                0
2 (X31)        1                0
Σ = 6          Σ = 3            Σ = 1
Mean = 2       Mean = 1         Mean = 0.3
48. SSE = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-.3)^2 + (0-.3)^2 + (0-.3)^2
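Each term above is one observation's squared distance from its own group mean. The sum is not stated on the slides; the sketch below works it out, with the function name `sum_of_squared_error` assumed for illustration.

```python
# Accident counts from the slide table: city, suburb, rural.
groups = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]

def sum_of_squared_error(groups):
    """SSE = sum of squared deviations of each value from its group mean."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total

sse = sum_of_squared_error(groups)
print(round(sse, 2))  # 4.67
```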
49. MSE = Mean Square Error
- Mean Square Within
- Next slide is the formula for this course
- Bigger problems have a bigger denominator
52. MSE = 0.78
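Both mean squares are a sum of squares divided by its degrees of freedom: k - 1 for MSB and n - k for MSE, where k is the number of groups. That is why bigger problems have bigger denominators. The sketch below is illustrative only. SSB = 4.2 is taken from the slides, while SSE = 4.67 is an assumption obtained by adding up the SSE terms, since the slides do not state it.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """A mean square is a sum of squares divided by its degrees of freedom."""
    return sum_of_squares / degrees_of_freedom

k, n = 3, 9   # 3 groups, 9 drivers in total
ssb = 4.2     # from the slides
sse = 4.67    # assumption: sum of the SSE terms, not stated on the slides

msb = mean_square(ssb, k - 1)   # denominator 2 in this course
mse = mean_square(sse, n - k)   # denominator 6 in this example
print(round(msb, 1), round(mse, 2))  # 2.1 0.78
```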
53. F RATIO
- Sample F statistic
- Test statistic
- Sam F
56. Sam F = 2.7
- Extreme case 1: Where you live does not affect number of accidents, so SSB = 0, so MSB = 0, so sam F = 0
- Extreme case 2: Every city driver has the same number of accidents, etc, so SSE = 0, so MSE = 0, so sam F is very large
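The sample F is the ratio MSB / MSE, which the two extreme cases illustrate. A small sketch using the rounded values from the slides; the function name `f_ratio` and the choice to return infinity when MSE = 0 are assumptions made for illustration.

```python
import math

def f_ratio(msb, mse):
    """Sample F = MSB / MSE; treated as infinite when MSE is 0 and MSB > 0."""
    if mse == 0:
        return math.inf if msb > 0 else math.nan
    return msb / mse

print(round(f_ratio(2.1, 0.78), 1))  # 2.7
print(f_ratio(0.0, 0.78))            # 0.0  (extreme case 1: SSB = 0)
print(f_ratio(2.1, 0.0))             # inf  (extreme case 2: SSE = 0)
```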
57. Critical F = cr F
- F table at end of Kinderman Supplement
- Appendix A, Table A.3, p 60 in Second Edition (assumes alpha = .05)
- Column 2 (denominator of MSB)
- Row n - 3 (denominator of MSE)
- Correct for this course, different for bigger problems
58. Example
- Col 2
- Row 9 - 3 = 6
- Cr F = 5.14
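The column and row used in the F table are the two degrees of freedom, k - 1 and n - k, for k groups and n observations in total. With 3 groups this gives the "column 2, row n - 3" rule. A minimal sketch follows; the function name and the 4-group case are hypothetical and shown only to illustrate a bigger problem.

```python
def f_degrees_of_freedom(k, n):
    """Return (column, row) for the F table: (k - 1, n - k)."""
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    return k - 1, n - k

print(f_degrees_of_freedom(3, 9))    # (2, 6)  -> column 2, row 6, as in this example
print(f_degrees_of_freedom(4, 20))   # (3, 16) -> hypothetical bigger problem
```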
59. Hypothesis Testing
- Ho: µ1 = µ2 = µ3
- H1: Not all means are equal
- H1: There are differences among the 3 populations
- H1: Average number of accidents is different depending on where you live
60. Decision Rule
- Reject Ho if sam F > cr F
- Only right tail, since SSB > 0, SSE > 0, so sam F > 0
- If you reject Ho, you conclude that where you live affects number of accidents
- If you do not reject Ho, you conclude that there is too much variation within city drivers, etc to draw any conclusions
61. Example
- Since 2.7 is NOT > 5.14, we can NOT reject Ho
- Differences between city and suburb, etc are NOT significant
62. Computer Approach
- Similar to multiple regression
- Reject Ho if Significance F < alpha
- Needed if more than 3 groups
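To show what the computer approach reports, the sketch below computes the sample F and its Significance F for the accident data. It is an illustration under stated assumptions, not the course's method. The function names are hypothetical, and the tail-probability shortcut (1 + 2F/df2)^(-df2/2) is valid only when the numerator degrees of freedom equal 2, that is, for exactly 3 groups. For more groups a statistics package is needed, for example SciPy's `scipy.stats.f_oneway`.

```python
def one_way_anova(groups):
    """Return (sample F, numerator df, denominator df) for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (sse / (n - k)), k - 1, n - k

def significance_f_three_groups(f, df2):
    """P(F > f) when the numerator df is 2 (closed form, 3 groups only)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

groups = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]   # city, suburb, rural
f, df1, df2 = one_way_anova(groups)
if df1 != 2:
    raise ValueError("closed form below only applies to 3 groups")
p = significance_f_three_groups(f, df2)

print(round(f, 2), round(p, 3))   # 2.71 0.145
print(p < 0.05)                   # False, so do not reject Ho
```

Both routes agree: 0.145 is not below .05, just as 2.7 is not above the critical F of 5.14.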