Title: Welcome to Econ 420 Applied Regression Analysis
1Welcome to Econ 420 Applied Regression Analysis
- Study Guide
- Week Two
- Ending Sunday, September 9
- (Note: You must go over these slides and complete
every task outlined here by the end of the day on
September 8.)
2Last week
- I asked you to report your heights and weights
before Sunday, September 2.
- That meant by the end of the day on Saturday,
September 1.
- I did not hear from 4 of the students who are
registered in this class.
- Remember that this affects your grade.
3Here is our sample data on height and weight.
4Assignment 1 (Carries 30 points and is due before
noon on Thursday, September 6)
- Use the data set on the previous slide and the
formulas on Page 8 (1-5 and 1-6) to estimate the
coefficients $\beta_0$ and $\beta_1$ in the equation below.
- $W = \beta_0 + \beta_1 H$
- Make sure to show your work.
- Do the estimated coefficients make sense to you?
- What is the meaning of the estimated
coefficients?
5Assignment 1 continued
- 2. Answer Question 5 on Page 15
- 3. Answer Question 8 on Page 15
- Type your answers and send them to me as an email
attachment. Remember that I have an old version
of Word (2003). If you are using a newer version
of Word, you will need to save your work in the
old format.
6Note
- The following notes are not going to take the
place of the discussions covered in your
textbook.
- First read the book
- Then look at the notes
7Total, Explained and Residual Sum of Squares
(pp. 11-13)
- Remember our height/weight example
- What is the average weight of the class?
- Duplicate the graph on Page 12 where Y is the
weight and X is the height.
- The Fitted Line will be upward sloping.
- The Average Line (average weight) will be
horizontal.
8Suppose instead of using the fitted line to
predict someone's weight we use the average line
- $Y$ is the actual weight of a person.
- $\hat{Y}$ is the predicted weight according to the
fitted line.
- $\bar{Y}$ is the average weight in the sample.
- $(Y - \bar{Y})$ is how much the weight of a given
individual differs from the average.
- $(\hat{Y} - \bar{Y})$ is the part of that difference that
our fitted line explains: how much closer the fitted
line gets us to the actual weight than the average
weight does.
- $(Y - \hat{Y})$ is our residual: the portion of the
weight that was not predicted (explained) by our
fitted line.
9Remember we have 8 observations in our sample
- Some of our weights are below average and some
are above average.
- Look at Equation 1-8, Page 12.
- We square $(Y - \bar{Y})$, $(\hat{Y} - \bar{Y})$
and $(Y - \hat{Y})$ because we do not want the
positive differences to cancel the negative
differences.
- Note: the best fitted line will be the one with
the lowest $\sum (Y - \hat{Y})^2$.
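As a rough check of the ideas on the last two slides, here is a minimal Python sketch (not from the textbook). The eight height/weight pairs are made up for illustration and are not the class data. The slope and intercept use the standard least-squares formulas for one explanatory variable, assumed here to match the ones in the book. The sketch then checks that the total sum of squares equals the explained plus the residual sum of squares.

```python
# Hypothetical sample: 8 made-up (height in inches, weight in pounds) pairs.
# These are NOT the class data.
heights = [62, 64, 65, 67, 68, 70, 72, 74]
weights = [120, 135, 140, 150, 155, 170, 180, 195]

n = len(heights)
x_bar = sum(heights) / n   # average height
y_bar = sum(weights) / n   # average weight (the "average line")

# Standard least-squares estimates for weight = b0 + b1 * height
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted weights on the fitted line
fitted = [b0 + b1 * x for x in heights]

tss = sum((y - y_bar) ** 2 for y in weights)               # total
ess = sum((f - y_bar) ** 2 for f in fitted)                # explained
rss = sum((y - f) ** 2 for y, f in zip(weights, fitted))   # residual

print("slope:", round(b1, 3), "intercept:", round(b0, 3))
print("TSS:", round(tss, 3), "ESS + RSS:", round(ess + rss, 3))
```

With any other line through the same points, the residual sum of squares would come out larger, which is the sense in which the fitted line is "best".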
10Multiple Regression Model (Chapter 2, pp. 20-29)
- Is height the only factor affecting weight?
- Of course not.
- What are some other factors affecting an
individual's weight?
- Age
- Calorie intake per day
11So a better model will be
- $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + e$
- Where Y is weight and X1 through X3 are Height,
Age, and Calorie intake.
- We will use EViews to estimate the coefficients
of a multiple regression model.
12The meaning of the estimated coefficients
- Our estimated equation will be
- $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$
- Bonus: Can someone tell me why I didn't put an
e at the end of the above equation?
- $\hat{\beta}_1$ measures the effect of one more inch of
height on weight, holding the age and the calorie
intake constant and ignoring the effect of all
other variables on weight.
- Similarly, $\hat{\beta}_2$ measures the effect of one more
year of age on weight, holding the height and
the calorie intake constant and ignoring the
effect of all other variables on weight.
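To see what estimating a three-variable model looks like in practice, here is a small Python sketch using NumPy. It is only an illustration and not a replacement for EViews. The data are simulated, and the "true" coefficients (4 pounds per inch of height, 0.5 per year of age, 0.02 per calorie) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical sample size

# Simulated explanatory variables (made-up ranges)
height = rng.uniform(60, 76, n)        # inches
age = rng.uniform(18, 60, n)           # years
calories = rng.uniform(1500, 3500, n)  # calories per day

# Invented "true" coefficients: intercept, height, age, calories
true_beta = np.array([-150.0, 4.0, 0.5, 0.02])
e = rng.normal(0, 5, n)  # error term

# The column of ones lets the regression estimate the intercept
X = np.column_stack([np.ones(n), height, age, calories])
weight = X @ true_beta + e

# Least-squares estimates of the four coefficients
beta_hat, *_ = np.linalg.lstsq(X, weight, rcond=None)

print("estimated coefficients:", np.round(beta_hat, 3))
```

The second entry of `beta_hat` is the estimated effect of one more inch of height with age and calorie intake held constant, and it should land close to the invented value of 4.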
13How big should the sample be?
- The bigger the sample, the closer $\hat{\beta}$ will be
to $\beta$.
- Rule of thumb: Degrees of Freedom > 30
- Degrees of Freedom = n - k - 1
- Where n is the sample size and k is the number
of independent variables.
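A quick worked example of the degrees-of-freedom rule, as a short Python sketch. The function names are made up for this illustration.

```python
def degrees_of_freedom(n, k):
    """Degrees of freedom with n observations and k independent variables."""
    return n - k - 1


def smallest_sample_for_rule(k, threshold=30):
    """Smallest n for which n - k - 1 is strictly greater than threshold."""
    return threshold + k + 2


# Our height/weight sample: 8 observations, 1 independent variable
print(degrees_of_freedom(8, 1))      # 6

# The three-variable weight model would need at least this many observations
print(smallest_sample_for_rule(3))   # 35
```

So our class sample of 8 is far below the rule of thumb, even for the one-variable model.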
14The Classical Assumptions
- Assumptions that have to be met in order for OLS
to give us the best estimators.
15Assumption 1
- The regression equation
- Is linear in coefficients (not necessarily linear
in the variables)
- Is correctly specified (right functional form, no
omitted variables, no irrelevant variables)
- Has an additive error term
16Assumption 2
- Two or more independent variables are not
perfectly correlated with each other.
- If violated: Perfect Multicollinearity
- Example
- Consumption = f(inflation, real interest rate,
nominal interest rate, ...)
- Since real interest rate = nominal interest rate
- inflation,
- The 3 independent variables are perfectly and
linearly related to each other. When one
independent variable changes, at least one of the
others changes too. OLS cannot capture the effect
of one variable in isolation.
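The interest-rate example can be checked numerically. This is a minimal Python sketch with simulated numbers (the ranges are made up). It builds the data matrix with all three variables and shows that one column carries no independent information, so the matrix has fewer independent columns than it has columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # hypothetical number of observations

inflation = rng.uniform(0, 10, n)   # percent, made-up values
nominal = rng.uniform(1, 15, n)     # percent, made-up values
real = nominal - inflation          # exact linear relationship

# Intercept column plus all three variables
X_bad = np.column_stack([np.ones(n), inflation, real, nominal])
rank_bad = np.linalg.matrix_rank(X_bad)

# Dropping one of the three variables removes the problem
X_ok = np.column_stack([np.ones(n), inflation, real])
rank_ok = np.linalg.matrix_rank(X_ok)

print("columns:", X_bad.shape[1], "rank:", rank_bad)
print("columns:", X_ok.shape[1], "rank:", rank_ok)
```

When the rank is below the number of columns, the least-squares problem has no unique solution, which is the numerical face of perfect multicollinearity.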
17Assumption 3
- No correlation between the explanatory
(independent) variables and the error term
- What if it is violated?
- Example: Salary = f(Education, ..., GPA)
- What if people with low GPA lie about their GPAs?
- When GPA is low, the error is always positive
- Problem: OLS attributes the variation in salary
to the variation in GPA while it is in part
caused by the variation in the error.
18Assumption 4
- The error terms are uncorrelated with each other
- What if it is violated?
- Then we have an autocorrelation (serial
correlation) problem
- Example: Consumption = f(..., income)
- Suppose we use time series data on the US economy
to estimate the above model.
- Suppose that in 5 years of our study there was a
war and consumption dropped significantly even
though income didn't. So, we will get negative
errors during those years and they all seem to be
correlated with each other.
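The war example can be imitated with simulated data. The Python sketch below is only an illustration: the 40 "years", the income trend, and the size of the wartime drop in consumption are all invented. It fits consumption on income and then looks at the residuals in the war years and at the correlation between each residual and the one before it.

```python
import numpy as np

rng = np.random.default_rng(7)

years = np.arange(40)                                # 40 hypothetical years
income = 100 + 2.0 * years + rng.normal(0, 1, 40)    # made-up upward trend
war = (years >= 15) & (years < 20)                   # 5 invented war years

consumption = 10 + 0.8 * income + rng.normal(0, 2, 40)
consumption[war] -= 20   # consumption drops during the war, income does not

# Fit consumption = intercept + slope * income by least squares
slope, intercept = np.polyfit(income, consumption, 1)
resid = consumption - (intercept + slope * income)

# Correlation between each residual and the previous year's residual
lag1_corr = np.corrcoef(resid[:-1], resid[1:])[0, 1]

print("war-year residuals:", np.round(resid[war], 1))
print("lag-1 correlation of residuals:", round(lag1_corr, 2))
```

The war-year residuals come out negative as a group, and the residuals are positively correlated from one year to the next, which is what serial correlation looks like.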
19Assumption 5
- The error term must have a zero mean
- What if this assumption is violated?
- This is not a big deal: the intercept will pick
up the mean of the error term.
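A small simulation shows the intercept absorbing a nonzero error mean. This is a Python sketch with invented numbers: the true intercept is 2, the true slope is 3, and the error has a mean of 5 instead of zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000  # hypothetical sample size

x = rng.uniform(0, 10, n)
e = rng.normal(5, 1, n)     # error term with mean 5, not zero
y = 2 + 3 * x + e           # invented true intercept 2 and slope 3

# np.polyfit returns the slope first, then the intercept, for degree 1
slope, intercept = np.polyfit(x, y, 1)

print("estimated slope:", round(slope, 2))
print("estimated intercept:", round(intercept, 2))
```

The estimated slope stays near 3, while the estimated intercept lands near 7, that is, the true intercept plus the mean of the error.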
20Assumption 6
- The error term has a constant variance
- What if it is violated?
- Problem of Heteroskedasticity
- Example: Consumption = f(..., income)
- Suppose we use cross section data on various
individuals to estimate the above model.
- People with low levels of income will probably
spend most of their income. (The variance of the
error is small.)
- People with high levels of income may spend
anywhere between 10% and 99% of their income. (The
variance of the error is high.) (Figure 2-1)
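The income example can be simulated to see what a non-constant error variance looks like. In this Python sketch all numbers are invented, and the error's standard deviation is simply assumed to grow in proportion to income.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500  # hypothetical number of individuals

income = rng.uniform(20, 200, n)   # made-up incomes, in thousands
e = rng.normal(0, 0.1 * income)    # error spread grows with income (assumed)
consumption = 5 + 0.6 * income + e

# Fit consumption = intercept + slope * income by least squares
slope, intercept = np.polyfit(income, consumption, 1)
resid = consumption - (intercept + slope * income)

# Compare residual variance for the lower and upper halves of income
median_income = np.median(income)
var_low = resid[income <= median_income].var()
var_high = resid[income > median_income].var()

print("residual variance, low income: ", round(var_low, 1))
print("residual variance, high income:", round(var_high, 1))
```

The residual variance for the high-income half is several times larger than for the low-income half, which is the fan shape you would see in a plot of the residuals against income.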
21Assumption 7 (Not Necessary)
- The error term is normally distributed
- What is a normal distribution?
- Symmetric, continuous, bell shaped
- Can be characterized by its mean and variance
- Must know if it is violated
- If violated, some statistical tests are not
applicable.
- As the size of the sample goes up, the distribution
becomes more normal.