Regression Assumptions - PowerPoint PPT Presentation

Slides: 28
Provided by: AkosRo4
Learn more at: https://pages.ucsd.edu
1
Regression Assumptions
2
Best Linear Unbiased Estimate (BLUE)
  • If the following assumptions are met:
  • The Model is
      • Complete
      • Linear
      • Additive
  • Variables are
      • measured at an interval or ratio scale
      • measured without error
  • The regression error term
      • is normally distributed
      • has an expected value of 0
      • is independent across cases
      • is homoscedastic
      • is unrelated to the predictors
  • In a system of interrelated equations the errors
    are unrelated to each other
  • Characteristics of OLS if the sample is a probability
    sample
      • Unbiased
      • Efficient

3
The Three Desirable Characteristics
  • Lack of bias
  • $E(b) = \beta$, where $b$ is the sample coefficient and $\beta$ is the true
    population coefficient
  • On average we are on target
  • Efficiency
  • The standard error will be at its minimum
  • Remember
  • OLS will minimize $s^2$ (the error variance)
  • Consistency
  • As N increases the standard error decreases
  • Notice that as N increases so does $\sum x_i^2$
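The first and third characteristics can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the true model (Y = 1 + 2X + e), the error spread, the sample sizes and the helper names `ols_slope` and `simulate_slopes` are all assumptions made for illustration.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n, reps, beta=2.0, sigma=3.0, seed=0):
    """Draw `reps` samples of size n from Y = 1 + beta*X + e; return the fitted slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [1.0 + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return slopes

small = simulate_slopes(n=25, reps=1000)
large = simulate_slopes(n=400, reps=1000)

# Lack of bias: the average of the sample slopes sits near the true slope of 2.
print("mean slope, N=25 :", round(statistics.fmean(small), 3))
print("mean slope, N=400:", round(statistics.fmean(large), 3))
# Consistency: the spread of the sample slopes shrinks as N grows.
print("sd of slopes, N=25 :", round(statistics.stdev(small), 3))
print("sd of slopes, N=400:", round(statistics.stdev(large), 3))
```

The spread of the slopes across repeated samples is what the standard error estimates, so the second pair of numbers illustrates the consistency bullet.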

4
Completeness
. regress API13 MEALS AVG_ED P_EL P_GATE EMER DMOB if AVG_ED>0 & AVG_ED<6, beta

      Source |       SS       df       MS              Number of obs =   10082
-------------+------------------------------           F(  6, 10075) = 2947.08
       Model |  65503313.6     6  10917218.9           Prob > F      =  0.0000
    Residual |  37321960.3 10075  3704.41293           R-squared     =  0.6370
-------------+------------------------------           Adj R-squared =  0.6368
       Total |   102825274 10081  10199.9081           Root MSE      =  60.864

------------------------------------------------------------------------------
       API13 |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
       MEALS |   .1843877   .0394747     4.67   0.000                 .0508435
      AVG_ED |   92.81476   1.575453    58.91   0.000                 .6976283
        P_EL |   .6984374   .0469403    14.88   0.000                 .1225343
      P_GATE |   .8179836   .0666113    12.28   0.000                 .0769699
        EMER |  -1.095043   .1424199    -7.69   0.000                 -.046344
        DMOB |   4.715438   .0817277    57.70   0.000                 .3746754
       _cons |   52.79082   8.491632     6.22   0.000                        .
------------------------------------------------------------------------------
Meals
. regress API13 MEALS AVG_ED P_EL P_GATE EMER DMOB PCT_AA PCT_AI PCT_AS PCT_FI PCT_HI PCT_PI PCT_MR if AVG_ED>0 & AVG_ED<6, beta

      Source |       SS       df       MS              Number of obs =   10082
-------------+------------------------------           F( 13, 10068) = 1488.01
       Model |    67627352    13     5202104           Prob > F      =  0.0000
    Residual |  35197921.9 10068  3496.01926           R-squared     =  0.6577
-------------+------------------------------           Adj R-squared =  0.6572
       Total |   102825274 10081  10199.9081           Root MSE      =  59.127

------------------------------------------------------------------------------
       API13 |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
       MEALS |    .370891   .0395857     9.37   0.000                 .1022703
      AVG_ED |   89.51041   1.851184    48.35   0.000                 .6727917
        P_EL |   .2773577   .0526058     5.27   0.000                 .0486598
      P_GATE |   .7084009   .0664352    10.66   0.000                 .0666584
        EMER |  -.7563048   .1396315    -5.42   0.000                 -.032008
        DMOB |   4.398746   .0817144    53.83   0.000                  .349512
      PCT_AA |  -1.096513   .0651923   -16.82   0.000                -.1112841
      PCT_AI |  -1.731408   .1560803   -11.09   0.000                -.0718944
      PCT_AS |   .5951273   .0585275    10.17   0.000                 .0715228
      PCT_FI |   .2598189   .1650952     1.57   0.116                 .0099543
      PCT_HI |   .0231088   .0445723     0.52   0.604                 .0066676
      PCT_PI |  -2.745531   .6295791    -4.36   0.000                -.0274142
      PCT_MR |  -.8061266   .1838885    -4.38   0.000                -.0295927
       _cons |   96.52733   9.305661    10.37   0.000                        .
------------------------------------------------------------------------------
Parents' education
5
Diagnosis and Remedy
  • Diagnosis
  • Theoretical
  • Remedy
  • Including new variables
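The two regressions on the Completeness slide show coefficients moving once omitted variables are added (for example, MEALS goes from .184 to .371). The sketch below reproduces that mechanism on simulated data. It is an illustration only: the data-generating model, the coefficient values and the helper names are assumptions, not the API13 data.

```python
import random
import statistics

def centered(v):
    m = statistics.fmean(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slope_one(x, y):
    """Slope of y on x alone (with intercept)."""
    x, y = centered(x), centered(y)
    return dot(x, y) / dot(x, x)

def slopes_two(x1, x2, y):
    """Slopes of y on x1 and x2 (with intercept), from the 2x2 normal equations."""
    x1, x2, y = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)
n = 5000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.7 * z + rng.gauss(0, 1) for z in x2]        # x1 is correlated with x2
y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

b1_short = slope_one(x1, y)                 # incomplete model: x2 omitted
b1_full, b2_full = slopes_two(x1, x2, y)    # complete model

print("slope of x1, x2 omitted :", round(b1_short, 3))
print("slope of x1, x2 included:", round(b1_full, 3))
```

With x2 left out, the slope of x1 absorbs part of the effect of x2 and lands well above its true value of 1; including x2 brings it back.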

6
Linearity
  • Violation of linearity
  • An almost perfect non-linear relationship can appear as a
    weak one when a straight line is fitted to it
  • Almost all linear relations stop being linear at
    a certain point

7
Diagnosis and Remedy
  • Diagnosis
  • Visual: scatter plots
  • Comparing regression with continuous and dummied
    independent variable
  • Remedy
  • Use dummies
  • $Y = a + bX + e$ becomes
  • $Y = a + b_1 D_1 + \dots + b_{k-1} D_{k-1} + e$, where X is broken up into
    k dummies ($D_i$) and k-1 of them are included. If the
    R-square of this equation is significantly higher
    than the R-square of the original, that is a sign
    of non-linearity. The pattern of the slopes ($b_i$)
    will indicate the shape of the non-linearity.
  • Transform the variables through a non-linear
    transformation, so that
  • $Y = a + bX + e$ becomes
  • Quadratic: $Y = a + b_1 X + b_2 X^2 + e$
  • Cubic: $Y = a + b_1 X + b_2 X^2 + b_3 X^3 + e$
  • Kth degree polynomial: $Y = a + b_1 X + \dots + b_k X^k + e$
  • Logarithmic: $Y = a + b \log(X) + e$, or
  • Exponential: $\log(Y) = a + bX + e$, or $Y = \exp(a + bX + e)$
  • Inverse: $Y = a + b/X + e$, etc.
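The dummy-variable diagnosis can be sketched as follows. The data are simulated with a U-shaped relationship, which is an assumption chosen to make the contrast obvious, and the helper names are hypothetical. A regression on an intercept plus k-1 dummies predicts the group mean of Y at each value of X, so its R-square can be computed from group means directly.

```python
import random
import statistics

def r2_linear(x, y):
    """R-square of the straight-line fit Y = a + bX + e."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def r2_dummies(x, y):
    """R-square when X enters as k-1 dummies: predictions are the group means."""
    my = statistics.fmean(y)
    groups = {}
    for a, b in zip(x, y):
        groups.setdefault(a, []).append(b)
    means = {a: statistics.fmean(v) for a, v in groups.items()}
    ss_res = sum((b - means[a]) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot, means

rng = random.Random(2)
x = [rng.randint(1, 5) for _ in range(2000)]
y = [(a - 3) ** 2 + rng.gauss(0, 0.5) for a in x]   # U-shaped in X (assumed)

lin = r2_linear(x, y)
dum, group_means = r2_dummies(x, y)

print("R-square, linear :", round(lin, 3))
print("R-square, dummies:", round(dum, 3))
for value in sorted(group_means):
    print(value, round(group_means[value], 2))   # the pattern reveals the shape
```

Here the straight line explains almost nothing while the dummies explain most of the variance, and the group means fall and then rise, which points to a quadratic term. A formal comparison would use an F-test on the two R-squares.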

8
Example
9
Meaningless!
Turning point of the parabola: $-b_1/(2b_2) = -(-3.666183)/(2 \times .0181756) = 100.85425$
As you approach 100 the negative
effect disappears
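The arithmetic on this slide can be reproduced in a few lines. This sketch assumes the garbled digits read b1 = -3.666183 (coefficient on X) and b2 = .0181756 (coefficient on X squared); the variable and function names are illustrative.

```python
b1 = -3.666183   # coefficient on X, as read from the slide
b2 = 0.0181756   # coefficient on X squared, as read from the slide (assumed reading)

# A quadratic a + b1*X + b2*X**2 changes direction where its slope is zero.
turning_point = -b1 / (2 * b2)
print(round(turning_point, 2))

def marginal_effect(x):
    """Slope of Y with respect to X at a given X: dY/dX = b1 + 2*b2*X."""
    return b1 + 2 * b2 * x

# The effect of X is negative at low values and fades as X approaches 100.
for x in (0, 50, 100):
    print(x, round(marginal_effect(x), 3))
```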
10
Other non-linear functions. Example: Count Data

                             N      Minimum   Maximum   Mean   Std. Deviation
childs NUMBER OF CHILDREN    1751   0         8         1.89   1.665

DEPENDENT VARIABLE
Underdispersion: Variance/Mean < 1
Overdispersion: Variance/Mean > 1
Here the variance (1.665 squared, about 2.77) is larger than
the mean (1.89), so we have a case of (mild) overdispersion. We care
about dispersion because it tells us something
not just about how spread out the distribution is
but also about its shape. Remember that count
data cannot be less than 0. So if the mean is
less than the standard deviation, the
distribution will have to be asymmetric (often
with lots of 0s to keep the mean low, but a few
very large values to pull the Std.Dev. up).
11
Poisson and Negative Binomial Regressions
Poisson regression assumes for the dependent
variable that Mean = Variance (no over- or
underdispersion). The model is then written in terms of a
parameter vector that stands for all
the coefficients to be estimated (constant and
slopes). Use Negative Binomial regression when
there is overdispersion (when the mean is smaller
than the variance). Overdispersion often happens
when you have a lot of 0s.
alpha = 0 means no over- or underdispersion
Here alpha is small but significantly different
from 0 (the 95% confidence interval does not
include 0).
Log of expected counts is now the unit of the
dependent variable
In this case the overdispersion is slight, so the Poisson and
Negative Binomial results should be close; because alpha is
significantly different from 0, the Negative Binomial
regression is the safer choice.
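A quick dispersion check from the descriptive statistics can be sketched like this. A Poisson distribution has variance equal to its mean, so the check compares the variance (the squared standard deviation), not the standard deviation itself, with the mean. The numbers are the `childs` statistics from the slide; the variable names are illustrative.

```python
n, mean, sd = 1751, 1.89, 1.665   # `childs` summary statistics from the slide

variance = sd ** 2
dispersion_ratio = variance / mean   # equals 1 for a Poisson distribution

print("variance:", round(variance, 3))
print("variance / mean:", round(dispersion_ratio, 3))

if dispersion_ratio > 1:
    print("overdispersed relative to Poisson")
elif dispersion_ratio < 1:
    print("underdispersed relative to Poisson")
else:
    print("equidispersed")
```

This is only a rough, unconditional check. What the regression models care about is dispersion after conditioning on the predictors, which is what the alpha estimate reports.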
12
Additivity
  • $Y = a + b_1 X_1 + b_2 X_2 + e$
  • The assumption is that $X_1$ and $X_2$ each
    add to Y separately, regardless of the value of
    the other.
  • When additivity is violated you cannot simply add the two: $X_1$ works
    differently depending on the value of $X_2$.
  • There are many examples of the violation of
    additivity
  • E.g., the effect of previous knowledge ($X_1$) and
    effort ($X_2$) on grades (Y)
  • Less effort will bring better grades if you have
    previous knowledge about the material taught in
    the class.
  • The effect of gender and education on income
    (discrimination)
  • Women increase their income less by increasing
    their educational achievements. Education does
    not pay the same way for men and women.
  • The effect of paternal and maternal education on
    academic achievement
  • If you have an educated father, your mom's
    education matters less (or if you have an
    educated mom, your father's education matters
    less). You cannot just add the effect of the two
    parents' education.

13
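Diagnosis and Remedy
  • Diagnosis
  • Try other functional forms and compare R-squares
  • Remedy
  • Introducing the multiplicative term as a new
    variable, so
  • $Y = a + b_1 X_1 + b_2 X_2 + e$ becomes
  • $Y = a + b_1 X_1 + b_2 X_2 + b_3 Z + e$, where $Z = X_1 X_2$
  • Suppose $X_2$ is a dummy variable
  • If $X_2 = 0$
  • $Y = a + b_1 X_1 + b_2 X_2 + b_3 Z + e = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + e$
    $= a + b_1 X_1 + b_2 \cdot 0 + b_3 X_1 \cdot 0 + e$
  • $= a + b_1 X_1 + e$
  • If $X_2 = 1$
  • $Y = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + e = a + b_1 X_1 + b_2 \cdot 1 + b_3 X_1 \cdot 1 + e$
  • $= (a + b_2) + (b_1 + b_3) X_1 + e = a' + b_1' X_1 + e$
  • So when $X_2 = 0$ the constant is $a$ and the slope
    is $b_1$
  • And when $X_2 = 1$ the constant is $a'$ and the slope is
    $b_1'$
  • The difference between $a$ and $a'$ is $b_2$
  • The difference between $b_1$ and $b_1'$ is $b_3$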

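[Figure: Y plotted against X1 with two regression lines, one with intercept a and slope b1, the other with intercept a' and slope b1'; the intercepts differ by b2 and the slopes by b3]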
14
Example with one dummy variable
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .720(a)   .519       .519                70.918
a. Predictors: (Constant), ESCHOOL, AVG_ED

Does parents' education matter more in elementary
school or later?

Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B          Std. Error         Beta                        t         Sig.
1 (Constant)      510.030    2.738                                          186.250   .000
  AVG_ED          87.476     .930               .649                        94.085    .000
  ESCHOOL         54.352     1.424              .264                        38.179    .000
a. Dependent Variable: API13

ESCHOOL = 1 if it is an elementary school, ESCHOOL = 0
otherwise

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .730(a)   .533       .533                69.867
a. Predictors: (Constant), INTESXED, AVG_ED, ESCHOOL

Coefficients(a)
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B          Std. Error         Beta                        t         Sig.
1 (Constant)                    454.542    4.151                                          109.497   .000
  AVG_ED                        107.938    1.481              .801                        72.896    .000
  ESCHOOL                       145.801    5.386              .707                        27.073    .000
  AVG_ED*ESCHOOL (interaction)  -33.145    1.885              -.495                       -17.587   .000
a. Dependent Variable: API13
15
Equations
  • Pred(API13) = 454.542 + 107.938*AVG_ED +
    145.801*ESCHOOL + (-33.145)*AVG_ED*ESCHOOL
  • IF ESCHOOL = 1, i.e. the school is an elementary school
  • Pred(API13) = 454.542 + 107.938*AVG_ED +
    145.801*1 + (-33.145)*AVG_ED*1
  • = 454.542 + 107.938*AVG_ED + 145.801 + (-33.145)*AVG_ED
  • = (454.542 + 145.801) + (107.938 - 33.145)*AVG_ED
  • = 600.343 + 74.793*AVG_ED
  • IF ESCHOOL = 0, i.e. the school is not an elementary but
    a middle or high school
  • Pred(API13) = 454.542 + 107.938*AVG_ED +
    145.801*0 + (-33.145)*AVG_ED*0
  • = 454.542 + 107.938*AVG_ED
  • The effect of parental education is larger after
    elementary school!
  • Is this difference statistically significant? Yes

Coefficients(a)
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B          Std. Error         Beta                        t         Sig.
1 (Constant)                    454.542    4.151                                          109.497   .000
  AVG_ED                        107.938    1.481              .801                        72.896    .000
  ESCHOOL                       145.801    5.386              .707                        27.073    .000
  AVG_ED*ESCHOOL (interaction)  -33.145    1.885              -.495                       -17.587   .000
a. Dependent Variable: API13
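The two lines implied by the interaction model can be recomputed from the four coefficients in the table. This is a small sketch; the function names and the AVG_ED value of 3 used in the last line are illustrative choices, not taken from the slides.

```python
# Coefficients from the interaction model in the table above
a, b_ed, b_es, b_int = 454.542, 107.938, 145.801, -33.145

def predicted_api13(avg_ed, eschool):
    """Prediction from the fitted interaction model; eschool is 1 or 0."""
    return a + b_ed * avg_ed + b_es * eschool + b_int * avg_ed * eschool

def line_for(eschool):
    """Intercept and AVG_ED slope implied for one school type."""
    return a + b_es * eschool, b_ed + b_int * eschool

for eschool in (1, 0):
    intercept, slope = line_for(eschool)
    print("ESCHOOL =", eschool, "intercept:", round(intercept, 3), "slope:", round(slope, 3))

# Example prediction for an elementary school with a hypothetical AVG_ED of 3
print(round(predicted_api13(3, 1), 3))
```

The slope difference between the two school types is exactly the interaction coefficient (-33.145), and its t-value of -17.587 is what makes the difference statistically significant.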
16
Example with continuous variables
Does parents' education work differently
depending on the percent of English learners?
Yes. The larger the proportion of English learners,
the smaller the positive effect of parents'
education.
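With two continuous variables the same multiplicative term applies. As a sketch, take $X_1$ to be parents' education and $X_2$ the percent of English learners (this labelling is an assumption; the fitted coefficients for this example are not in the transcript). The effect of $X_1$ is then a function of $X_2$:

```latex
Y = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + e
\quad\Longrightarrow\quad
\frac{\partial Y}{\partial X_1} = b_1 + b_3 X_2
```

A positive $b_1$ together with a negative $b_3$ gives the pattern described here: the effect of parents' education is positive but shrinks as the share of English learners grows.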
17
Proper Level of Measurement
18
Measurement Error
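  • Take $Y = a + bX + e$
  • Suppose $X = X' + e'$, where $X$ is the real value, $X'$ is the
    measured value and $e'$ is a random measurement error
  • Then $Y = a + bX + e$ → $Y = a + b(X' + e') + e = a + bX' + be' + e$ →
  • $Y = a + b'X' + E$, where $E = be' + e$ and $b' = b$
  • The slope (b) will not change but the error will
    increase as a result. (This requires $e'$ to be unrelated
    to the measured value $X'$. If instead the measured value equals
    the real value plus noise, $E$ is correlated with $X'$ and the
    slope is pulled toward 0.)
  • Our R-square will be smaller
  • Our standard errors will be larger → t-values
    smaller → significance smaller
  • Suppose $X = X' + cW + e'$, where $W$ is a systematic
    measurement error and $c$ is a weight
  • Then $Y = a + bX + e$ → $Y = a + b(X' + cW + e') + e = a + bX' + bcW + E$
  • $b' = b$ iff $r_{wx} = 0$ or $r_{wy} = 0$; otherwise $b' \neq b$, which
    means that the slope will change together with
    the increase in the error. Apart from the
    problems stated above, that means that
  • Our slope will be wrong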
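Whether random measurement error leaves the slope alone depends on how the error relates to the measured value. The simulation below is a minimal sketch under assumed values (true slope 2, unit variances, hypothetical names): in case A the real value equals the measured value plus unrelated noise, as in the derivation above; in case B the measured value equals the real value plus noise, the classical textbook setup.

```python
import random
import statistics

def ols_slope(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

rng = random.Random(3)
n, b = 20000, 2.0

# Case A: real X = measured X + noise, noise unrelated to the measured value.
measured_a = [rng.gauss(0, 1) for _ in range(n)]
real_a = [m + rng.gauss(0, 1) for m in measured_a]
y_a = [1.0 + b * t + rng.gauss(0, 1) for t in real_a]
slope_a = ols_slope(measured_a, y_a)

# Case B: measured X = real X + noise (classical measurement error).
real_b = [rng.gauss(0, 1) for _ in range(n)]
measured_b = [t + rng.gauss(0, 1) for t in real_b]
y_b = [1.0 + b * t + rng.gauss(0, 1) for t in real_b]
slope_b = ols_slope(measured_b, y_b)

print("case A slope:", round(slope_a, 3))   # stays near the true slope
print("case B slope:", round(slope_b, 3))   # shrinks toward 0
```

In case B the expected slope is the true slope times var(real X) / (var(real X) + var(noise)), which is 2 × 1/2 = 1 with the values assumed here.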

19
Diagnosis and Remedy
  • Diagnosis
  • Look at the correlation of the measure with other
    measures of the same variable
  • Remedy
  • Use multiple indicators and structural equation
    models
  • Confirmatory factor analysis
  • Better measures

20
Normally Distributed Error
21
Non-Normal Error
  • Our calculations of statistical significance
    depend on this assumption
  • Statistical inference can be robust even when
    error is non-normal
  • Diagnosis
  • You can look at the distribution of the error.
    Because of the homoscedasticity assumption (see
    later) the error pooled across all
    predictions should also be normal. (In principle,
    we have multiple observations for each
    prediction.)
  • Remember! Our measured variables (Y and X) do not
    have to have a normal distribution! Only the
    error for each prediction does.
  • Remedy
  • Any non-linear transformation will change the
    shape of the distribution of the error

22
Error Has a Non-Zero Mean
  • The solid line gives a negative mean error
  • The dotted line gives a positive mean error
  • This can happen when we have some selection
    problem
  • Diagnosis
  • Visual scatter plot will not help unless we know
    in advance somehow the true regression line
  • Remedy
  • If it is a selection problem try to address it.

23
Non-independent errors
  • Example 1: Suppose you take a survey of 10 people
    but you interview everyone 10 times.
  • Now your N = 100 but your errors are not
    independent. For the same person you will have
    similar errors
  • Example 2: Suppose you take 10 countries and you
    observe them in 10 different time periods
  • Now your N = 100 but your errors are not
    independent. For the same country you will have
    similar errors
  • Example 3: Suppose you take 100 countries and you
    observe them only once. Now your N = 100. But
    countries that are next to each other are often
    similar (same geography and climate, similar
    history, cooperation etc.). If your model
    underpredicts Denmark, it is likely to
    underpredict Sweden as well.
  • Example 4: Suppose you take 100 people but they
    are all couples, so what you really have is 50
    couples. Husband and wife tend to be similar. If
    your model underestimates one, chances are it does
    the same for the other. Spouses have similar
    errors.
  • Statistical inference assumes that each case is
    independent of the others, and in the examples
    above that is not the case. In fact, your effective N < 100.
  • This biases your standard error because the
    formula is tricked into believing that you have
    a larger sample than you actually have and larger
    samples give smaller standard errors and better
    statistical significance.
  • This may also bias your estimates of the
    intercept and the slope. Non-linearity is a
    special case of correlated errors.
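Example 1 can be simulated to show how the usual formula is "tricked". This is a simplified sketch: it estimates a mean rather than a regression slope, and the error variances, the number of replications and all names are assumptions made for illustration.

```python
import random
import statistics

def clustered_sample(rng, people=10, interviews=10):
    """10 people interviewed 10 times: each person carries a persistent error."""
    ys = []
    for _ in range(people):
        person_error = rng.gauss(0, 1)
        ys.extend(person_error + rng.gauss(0, 1) for _ in range(interviews))
    return ys

rng = random.Random(4)
means, naive_ses = [], []
for _ in range(2000):
    ys = clustered_sample(rng)
    means.append(statistics.fmean(ys))
    # The textbook formula treats the 100 answers as independent cases.
    naive_ses.append(statistics.stdev(ys) / len(ys) ** 0.5)

actual_se = statistics.stdev(means)        # how much the estimate really varies
naive_se = statistics.fmean(naive_ses)     # what the formula reports on average

rho = 0.5                                  # within-person error correlation assumed above
effective_n = 100 / (1 + (10 - 1) * rho)   # Kish design-effect approximation

print("actual standard error:", round(actual_se, 3))
print("naive standard error :", round(naive_se, 3))
print("effective N          :", round(effective_n, 1))
```

The naive standard error comes out far smaller than the actual sampling variation, and the design-effect approximation puts the effective N at about 18 rather than 100.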

24
Diagnosis and Remedy
  • It is called autocorrelation because the
    correlation is between cases and not variables,
    although autocorrelations often can be traced to
    certain variables such as common geographic
    location or same country or person or family.
  • Diagnosis
  • Visual, scatterplot
  • Checking groups of cases that are theoretically
    suspect
  • Certain forms of serial or spatial
    autocorrelations can be diagnosed by calculating
    certain statistics (e.g., Durbin-Watson test)
  • Remedy
  • You can include new variables in the equation
  • E.g. for serial (temporal) correlation you can
    include the value of Y in t-1 as an independent
    variable
  • For spatial correlation we can often model the
    relationships by introducing a weight matrix
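The Durbin-Watson statistic mentioned above is simple to compute from residuals ordered in time. The sketch below uses two simulated series as stand-ins for regression residuals; the series, the autocorrelation of 0.8 and the function name are assumptions.

```python
import random

def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 with no first-order serial correlation,
    toward 0 for positive and toward 4 for negative correlation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

rng = random.Random(5)
independent = [rng.gauss(0, 1) for _ in range(2000)]

serial = [rng.gauss(0, 1)]
for _ in range(1999):
    serial.append(0.8 * serial[-1] + rng.gauss(0, 1))   # each error carries over 80% of the last

dw_independent = durbin_watson(independent)
dw_serial = durbin_watson(serial)

print("independent errors:", round(dw_independent, 2))
print("serially correlated errors:", round(dw_serial, 2))
```

The statistic is roughly 2 × (1 - r), where r is the correlation between consecutive residuals, so values well below 2 signal positive serial correlation.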

25
Heteroscedasticity
  • Homoscedasticity means equal variance
  • Heteroscedasticity means unequal variance
  • We assume not just that each prediction is on
    target on average but also that we make the same
    amount of error at every predicted value
  • Heteroscedasticity results in biased standard
    errors and therefore misleading statistical significance
  • Diagnosis
  • Visual, scatter plot
  • Remedy
  • Introducing a weight matrix (e.g. using 1/X)
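Weighting by 1/X is the appropriate remedy when the error variance grows in proportion to X. The simulation below is a sketch under exactly that assumption, with a one-predictor model and hypothetical names; it compares how much the OLS and the weighted slopes vary across repeated samples.

```python
import random
import statistics

def wls_slope(xs, ys, ws):
    """Weighted least-squares slope; with all weights equal it is the OLS slope."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx

rng = random.Random(6)
ols, wls = [], []
for _ in range(500):
    xs = [rng.uniform(0.1, 10) for _ in range(100)]
    # Error variance proportional to X (standard deviation = sqrt(X)).
    ys = [1.0 + 2.0 * x + rng.gauss(0, x ** 0.5) for x in xs]
    ols.append(wls_slope(xs, ys, [1.0] * len(xs)))
    wls.append(wls_slope(xs, ys, [1.0 / x for x in xs]))

print("mean slope  OLS:", round(statistics.fmean(ols), 3), " WLS:", round(statistics.fmean(wls), 3))
print("sd of slope OLS:", round(statistics.stdev(ols), 3), " WLS:", round(statistics.stdev(wls), 3))
```

Both estimators are centered on the true slope of 2, but the weighted one varies less from sample to sample. The slopes themselves are not biased by heteroscedasticity; the problem lies in efficiency and in the reported standard errors.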

26
Predictor Related to Error
  • Error represents all factors influencing Y that
    are not included in the regression equation
  • If an omitted variable is related to X the
    assumption is violated. This is the same as the
    Completeness or Omitted Variable Problem
  • Diagnosis
  • The estimated residuals will ALWAYS be uncorrelated with X
    by construction, so there is no way to establish the TRUE error
    from the data
  • Theoretical
  • Remedy
  • Adding new variables to the model
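Why the diagnosis has to be theoretical can be seen in a short simulation. It is a sketch on made-up data with hypothetical names: an omitted variable drives both X and the true error, yet the residuals from the fitted line show no correlation with X at all.

```python
import random
import statistics

def corr(u, v):
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

rng = random.Random(7)
n = 5000
omitted = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * z + rng.gauss(0, 1) for z in omitted]            # X is related to the omitted variable
true_error = [2.0 * z + rng.gauss(0, 1) for z in omitted]   # the omitted variable sits in the error
y = [1.0 + 1.0 * a + e for a, e in zip(x, true_error)]

# Fit Y = a0 + b*X by least squares and compute the residuals.
mx, my = statistics.fmean(x), statistics.fmean(y)
b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
a0 = my - b * mx
residuals = [c - (a0 + b * a) for a, c in zip(x, y)]

print("corr(X, true error):", round(corr(x, true_error), 3))
print("corr(X, residuals) :", round(corr(x, residuals), 6))
print("fitted slope (true slope is 1):", round(b, 3))
```

The residuals look clean even though the slope is badly biased, so only knowledge of what has been left out of the model can reveal the problem.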

27
Correlated errors across interrelated equations
  • We sometimes estimate more than one regression.
  • Suppose $Y_t = a + b_1 X_{t-1} + b_2 Z_{t-1} + e$ but
  • $X_t = a' + b_1' Y_{t-1} + b_2' Z_{t-1} + e'$
  • $e$ and $e'$ will be correlated
  • (whatever is omitted from both equations will
    show up in both $e$ and $e'$, making them correlated)
  • This is also the case in sample selection models
  • $S = a + b_1 X + b_2 Z + e$, where S is whether one is selected into
    the sample
  • $Y = a' + b_1' X + b_2' Z + b_3' W + b_4' V + e'$, where Y is the outcome of
    interest
  • $e$ and $e'$ will be correlated
  • (whatever is omitted from both equations will
    show up in both $e$ and $e'$, making them correlated)