Simple Regression Models of Estimation - PowerPoint PPT Presentation

Slides: 34
Provided by: bagu5

1
Simple Regression Models of Estimation
  • Bernardo Aguilar-Gonzalez

2
Purpose of Regression and Correlation Analysis
  • Regression Analysis is Used Primarily for
    Prediction
  • A statistical model used to predict the values
    of a dependent or response variable based on
    values of at least one independent or
    explanatory variable
  • Closely related to Correlation Analysis, which is
    Used to Measure the Strength of the Association
    Between Numerical Variables

3
The Scatter Diagram
Plot of all (Xi , Yi) pairs
4
Types of Regression Models
Positive Linear Relationship
Relationship NOT Linear
Negative Linear Relationship
No Relationship
5
Simple Linear Regression Model
  • Relationship Between Variables Is a Linear
    Function
  • The Straight Line that Best Fits the Data

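$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  • $Y_i$: Dependent (Response) Variable
  • $\beta_0$: Y intercept
  • $\beta_1$: Slope
  • $X_i$: Independent (Explanatory) Variable
  • $\varepsilon_i$: Random Error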
6
Population Linear Regression Model
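$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\mu_{YX} = \beta_0 + \beta_1 X_i$
[Figure: Y plotted against X, showing the population regression line $\mu_{YX}$, two observed values, and the random error $\varepsilon_i$ as the vertical distance between an observed value and the line]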
7
Sample Linear Regression Model
$\hat{Y}_i = b_0 + b_1 X_i$
  • $\hat{Y}_i$: Predicted Value of Y for observation i
  • $X_i$: Value of X for observation i
  • $b_0$: Sample Y-intercept, used as an estimate of the
    population $\beta_0$
  • $b_1$: Sample Slope, used as an estimate of the
    population $\beta_1$
8
Simple Linear Regression Equation Example
Store   Square Feet   Annual Sales (000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
You wish to examine the relationship between the
square footage of produce stores and their annual
sales. Sample data for 7 stores were obtained.
Find the equation of the straight line that best
fits the data.
9
Scatter Diagram Example
10
Equation for the Best Straight Line
$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$
11
Graph of the Best Straight Line
$\hat{Y}_i = 1636.415 + 1.487 X_i$
12
Interpreting the Results
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of 1.487 means that for each increase of
one unit in X, Y is estimated to increase by
1.487 units.
For each increase of 1 square foot in the size of
the store, the model predicts that expected annual
sales increase by 1.487 in the table's units of
thousands, that is, by about 1,487.
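
The coefficients in the fitted equation can be reproduced from the seven data points with the usual least-squares formulas. The following is a minimal sketch in plain Python; the function and variable names (`fit_line`, `square_feet`, `annual_sales`) are illustrative and do not come from the slides.

```python
# Store data from the example: square feet (X) and annual sales in thousands (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of X, both about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # sample slope
    b0 = y_bar - b1 * x_bar    # sample Y-intercept
    return b0, b1


b0, b1 = fit_line(square_feet, annual_sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The printed values should agree with the slide's 1636.415 and 1.487 up to rounding.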
13
Measures of Variation: The Sum of Squares
  • SST = Total Sum of Squares
  • measures the variation of the $Y_i$ values around
    their mean $\bar{Y}$
  • SSR = Regression Sum of Squares
  • explained variation attributable to the
    relationship between X and Y
  • SSE = Error Sum of Squares
  • variation attributable to factors other than the
    relationship between X and Y

14
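Measures of Variation: The Sum of Squares
$SST = \sum (Y_i - \bar{Y})^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$
[Figure: Y against X with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$ and the mean $\bar{Y}$, marking the deviations behind SST, SSR, and SSE at a given $X_i$]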
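
To make the three sums of squares concrete, the sketch below computes them for the produce-store data used elsewhere in this deck and checks that SST = SSR + SSE. It is an illustration in plain Python with hypothetical variable names, not the software output shown in the presentation.

```python
# Square feet (X) and annual sales in thousands (Y) for the 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
s_xx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values on the regression line.
fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in annual_sales)                 # total
ssr = sum((f - y_bar) ** 2 for f in fitted)                       # explained
sse = sum((y - f) ** 2 for y, f in zip(annual_sales, fitted))     # unexplained

print(f"SST = {sst:.2f}")
print(f"SSR = {ssr:.2f}")
print(f"SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```

The identity SST = SSR + SSE holds for a least-squares line fitted with an intercept, which is the case here.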
15
Measures of Variation: The Sum of Squares Example
SSR
SSE
SST
16
The Coefficient of Determination
$r^2 = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}}$
Measures the proportion of variation that is
explained by the independent variable X in the
regression model.
17
Coefficients of Determination (r2) and Correlation (r)
[Figure: four scatter plots of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$. Panels: $r^2 = 1$, $r = +1$; $r^2 = 1$, $r = -1$; $r^2 = .8$, $r = +0.9$; $r^2 = 0$, $r = 0$]
18
Standard Error of Estimate
$S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum (Y_i - \hat{Y}_i)^2}{n-2}}$
The standard deviation of the variation of
observations around the regression line.
19
Measures of Variation Example
Syx
$r^2 = .94$
94% of the variation in annual sales can be
explained by the variability in the size of the
store as measured by square footage.
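
Both measures can be recomputed from the store data. The sketch below is a plain-Python illustration under the standard definitions (r2 = SSR / SST and Syx = sqrt(SSE / (n - 2))); the helper name `fit_summary` is hypothetical.

```python
import math

# Square feet (X) and annual sales in thousands (Y) for the 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_summary(x, y):
    """Return (r2, syx) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / sst                # equals SSR / SST
    syx = math.sqrt(sse / (n - 2))    # standard error of the estimate
    return r2, syx


r2, syx = fit_summary(square_feet, annual_sales)
print(f"r2 = {r2:.4f}")
print(f"Syx = {syx:.2f}")
```

Syx is expressed in the same units as Y (thousands here), so it can be read as the typical distance of an observed sales figure from the fitted line.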
20
Linear Regression Assumptions
For Linear Models
  • 1. Normality
  • Y Values Are Normally Distributed For Each X
  • Error is Normally Distributed
  • 2. Homoscedasticity (Constant Variance)
  • 3. Independence of Errors

21
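Inferences about the Slope: t Test
  • t Test for a Population Slope: Is There a Linear
    Relationship Between X and Y?
  • Null and Alternative Hypotheses
  • H0: $\beta_1 = 0$ (No Linear Relationship); H1:
    $\beta_1 \neq 0$ (Linear Relationship)
  • Test Statistic

$t = \dfrac{b_1 - \beta_1}{S_{b_1}}$, where $S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$
and df = n - 2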
22
Example Produce Stores
Data for 7 Stores
Store   Square Feet   Annual Sales (000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Regression Model Obtained
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487. Is there a
linear relationship between the square footage of
a store and its annual sales?
23
Inferences about the Slope: t Test Example
  • H0: $\beta_1 = 0$
  • H1: $\beta_1 \neq 0$
  • $\alpha = .05$
  • df = 7 - 2 = 5
  • Critical Value(s): $t = \pm 2.5706$ (.025 in each rejection tail)

Test Statistic: $t = b_1 / S_{b_1}$
Decision: Reject H0
Conclusion: There is evidence of a relationship.
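
The test statistic for this example can be computed directly from the data and compared with the critical value 2.5706 quoted on the slide. This is a minimal plain-Python sketch with illustrative names; it assumes the usual formula for the standard error of the slope, Syx divided by the square root of the sum of squared deviations of X.

```python
import math

# Square feet (X) and annual sales in thousands (Y) for the 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
CRITICAL_T = 2.5706  # two-tailed critical value for alpha = .05, df = 5 (from the slide)

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
s_xx = sum((x - x_bar) ** 2 for x in square_feet)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, then standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
syx = math.sqrt(sse / (n - 2))
s_b1 = syx / math.sqrt(s_xx)

# Under H0 the hypothesized slope is 0, so t = b1 / S_b1.
t_stat = b1 / s_b1
reject_h0 = abs(t_stat) > CRITICAL_T

print(f"df = {n - 2}")
print(f"t = {t_stat:.3f}")
print(f"reject H0: {reject_h0}")
```

Because the computed t falls well outside the interval from -2.5706 to 2.5706, the sketch reaches the same decision as the slide.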
24
Estimation of Predicted Values
25
Example Produce Stores
Data for 7 Stores
Store   Square Feet   Annual Sales (000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Predict the annual sales for a store with 4200
square feet.
Regression Model Obtained
$\hat{Y}_i = 1636.415 + 1.487 X_i$
26
Answer
  • Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487(4200) = 7{,}881.82$
    (in thousands)
  • Yet,
  • 1- How good is this estimate?
  • 2- Could I have values that give me a confidence
    interval?
  • You will be able to find these answers out in
    your next statistics course!
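
The prediction is a single substitution into the fitted equation. The sketch below uses the rounded coefficients from the slide and adds a check, not part of the original deck, that the store size lies inside the range of sizes observed in the sample (1,292 to 5,555 square feet), since a fitted line is generally less trustworthy outside that range. All names are illustrative.

```python
# Coefficients as rounded on the slide; sales are in thousands.
B0 = 1636.415
B1 = 1.487
X_MIN, X_MAX = 1292, 5555  # smallest and largest store in the sample


def predict_sales(square_feet):
    """Return predicted annual sales (in thousands) for a store size."""
    return B0 + B1 * square_feet


def within_sample_range(square_feet):
    """True when the store size lies inside the observed X range."""
    return X_MIN <= square_feet <= X_MAX


size = 4200
print(f"predicted sales: {predict_sales(size):.3f}")
print(f"inside observed range: {within_sample_range(size)}")
```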

27
Ok, so, do the problem in Chapter 15
  • The Data set is in the NAAGE site

28
Estimate the descriptives and correlation
29
So
  • $\widehat{GPA}_i = 2.140 + 0.168 \times \text{Reading Time}_i$
  • This means that for each additional average
    hour spent reading with parents, the model
    predicts that expected GPA increases by 0.168.
  • There is a strong, significant correlation between
    reading time and GPA (r = 0.86).
  • 73.9% of the variation in GPA can be explained by
    the variability in reading time (good $r^2$).

30
  • The slope of this model is 0.168.
  • Is there a significant linear relationship
    between reading time and GPA?
  • Examining the t statistic (8.075) and the p-value
    (0.00), we reject H0 at both the 0.05 and 0.01
    levels and conclude that there is a significant
    linear relationship.
  • So, this seems to be a good model for predicting
    this linear relationship. If a student spends on
    average 10 hours per week reading with his/her
    parents, we can estimate that his/her GPA will be
    approximately 3.82.
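
The arithmetic behind these figures is easy to check. The sketch below plugs 10 hours into the fitted GPA equation and squares the reported correlation; the constant and function names are illustrative, and the inputs are the rounded values quoted on the slides.

```python
# Rounded values quoted on the slides.
INTERCEPT = 2.140     # estimated GPA at zero reading time
SLOPE = 0.168         # estimated GPA change per average hour of reading
CORRELATION = 0.86    # reported correlation between reading time and GPA


def predict_gpa(reading_hours):
    """Return the GPA predicted by the fitted line for a reading time."""
    return INTERCEPT + SLOPE * reading_hours


print(f"predicted GPA at 10 hours: {predict_gpa(10):.2f}")
print(f"r squared from rounded r: {CORRELATION ** 2:.3f}")
```

Squaring the rounded correlation gives a value close to, but not exactly, the reported 73.9%, which is what one would expect when r itself has been rounded to two decimals.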

31
To confirm the good fit (prediction power and
strong significant relationship), we can look at
the scatter plot of the relationship between GPA
and reading time.
32
A better fit might be obtained with some other
functional form (log-linear, perhaps).
33
Homework
  • Use the General Social Survey for the Year 2000,
    GSS00 data set and simple linear regression
    estimation methods to determine if
  • There is a significant linear relationship
    between the strength of religious affiliation of
    respondents and the happiness in marriage
  • There is a significant linear relationship
    between the race of the respondent and total
    family income
  • There is a significant linear relationship
    between total family income and hours per day
    watching TV
  • There is a significant linear relationship
    between occupational prestige and total family
    income