Title: Univariate Linear Regression
1Univariate Linear Regression
- Chapter Eight
- Basic Problem
- Definition of Scatterplots
- What to check for
2Basic Empirical Situation
- Unit of data.
- Two interval (or ratio) scales measured for each
unit.
- Example: observational study; independent
variable is score of student on first exam in
AMS315, dependent variable is score on final
exam.
- Objective is to assess the strength of the
association between score on first exam and final.
3Examining the scatterplot
- Regression techniques ASSUME
- 1. Linear regression function
- 2. Independent errors of measurement
- 3. Constant error variance
- 4. Normal distribution of errors.
- If assumptions 1 and 3 are met, the scatterplot is a
football-shaped cloud of points.
4How to use a scatterplot
- Look at it!
- Check whether linear regression function appears
reasonable (pencil test).
- Check whether there is a horn shaped pattern in
the scatterplot (homoscedasticity violated).
- Check for outliers or other unusual patterns.
5Ordinary Least Squares Line
- Residual
- ASSUME intercept is a and slope b
- ASSUME dependent variable value is y1 and
independent variable value is x1
- Residual r1(a,b) = y1 - a - b x1
- Choose slope b and intercept a so that the sum of
the residuals squared is as small as possible.
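The least squares criterion can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the exam scores are made-up data, and the function names `sum_squared_residuals` and `ols_line` are hypothetical. The closed-form minimizer used here is the standard one (slope equals the sum of cross-deviations divided by the sum of squared x-deviations).

```python
# Hypothetical exam scores (illustrative data, not from the course).
x = [55.0, 62.0, 70.0, 78.0, 85.0, 91.0]  # first exam
y = [58.0, 60.0, 74.0, 75.0, 88.0, 90.0]  # final exam


def sum_squared_residuals(a, b, xs, ys):
    """Sum over units of r_i(a, b)^2, where r_i(a, b) = y_i - a - b * x_i."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(xs, ys))


def ols_line(xs, ys):
    """Return the (intercept, slope) pair minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a_hat, b_hat = ols_line(x, y)
best = sum_squared_residuals(a_hat, b_hat, x, y)
```

Moving either coefficient away from `a_hat` or `b_hat` should only increase the value returned by `sum_squared_residuals`.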
6OLS Estimate for the Slope
- The solution is always the same; you should
memorize the following.
7OLS Estimate of the Slope
- The correlation coefficient is r.
- The standard deviation of the y data is sY.
- The standard deviation of the x data is sX.
- There are other formulas as well that are useful
for solving specific distributional problems.
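With r, sY, and sX defined as above, the usual textbook form of the OLS slope is b = r sY / sX, which is presumably the formula meant here. The sketch below checks numerically that this agrees with the sum-of-products form of the slope. The data are made up, and `correlation` is a hypothetical helper written out by hand so the example needs only the standard library.

```python
import math
import statistics

# Hypothetical exam scores (illustrative data, not from the course).
x = [55.0, 62.0, 70.0, 78.0, 85.0, 91.0]
y = [58.0, 60.0, 74.0, 75.0, 88.0, 90.0]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)


def correlation():
    """Pearson correlation coefficient r of the x and y data."""
    return sxy / math.sqrt(sxx * syy)


r = correlation()
s_x = statistics.stdev(x)  # sample standard deviation (n - 1 divisor)
s_y = statistics.stdev(y)

slope_from_r = r * s_y / s_x   # b = r * sY / sX
slope_from_sums = sxy / sxx    # sum-of-products form of the same slope
```

The two values agree because the n - 1 divisors in sY and sX cancel.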
8Point Slope Form of the Regression Line
- Memorize the following formula
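A common way to write the point-slope form of the fitted line is given below. The symbols $\bar{x}$ and $\bar{y}$ (the sample means) and $\hat{y}$ (the fitted value) are notation introduced here as an assumption; b, r, sX, and sY are as on the earlier slides.

```latex
\[
  \hat{y} - \bar{y} = b\,(x - \bar{x}),
  \qquad b = r\,\frac{s_Y}{s_X}
\]
```

Equivalently, the line passes through the point of means $(\bar{x}, \bar{y})$ with slope b, so the intercept is $a = \bar{y} - b\,\bar{x}$.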
9Univariate Linear Regression Model
- Value of dependent variable on i-th unit is Yi
and independent variable is xi.
- There are three quantities to be estimated: β0,
β1, and σ. These are the intercept, slope, and
standard deviation of error.
10Four Assumptions of Univariate Linear Regression
- Regression function is linear.
- Observations have independent errors.
- Variance of error is the same for all
observations.
- Errors are normally distributed.
11Implication of Assumptions
- Each Yi is normally distributed with expected
value β0 + β1 xi and variance σ².
- The most important question is whether the data
indicate that the slope is different from zero.
- From these facts, we can derive the distribution
of the OLS estimate of the slope.
12OLS estimate of the slope
- The estimate given in the last class is the most
practical and interpretable estimate.
- There is another formula that gives exactly the
same result but is easier to work with.
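The alternative formula is most likely the standard sum-of-products form. In the sketch below, $\hat{\beta}_1$ denotes the OLS estimate of the slope and $\bar{x}$, $\bar{Y}$ denote sample means; this notation is an assumption, not taken from the slides.

```latex
\[
  \hat{\beta}_1
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,Y_i}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
```

The second equality holds because $\sum_{i}(x_i - \bar{x})\,\bar{Y} = 0$.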
13Using the new formula
- The estimate is a linear combination of the Yi,
which are normally distributed.
- Therefore, the distribution of the estimate is
normal.
- If only we knew its expected value and variance!
14Using the new formula
- The estimate can be rewritten
- where
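One standard way to write this rewriting uses weights that depend only on the x values. The names $w_i$ and $\hat{\beta}_1$ are notation assumed here for illustration.

```latex
\[
  \hat{\beta}_1 = \sum_{i=1}^{n} w_i\,Y_i,
  \qquad
  w_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}
\]
```

These weights satisfy $\sum_i w_i = 0$, $\sum_i w_i x_i = 1$, and $\sum_i w_i^2 = 1 / \sum_j (x_j - \bar{x})^2$.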
15Using the new formula
- When we write in what the model is, we get
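A sketch of this substitution, assuming the model is written as $Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i$ with independent standard normal $Z_i$, and assuming the estimate is expressed as $\hat{\beta}_1 = \sum_i w_i Y_i$ with weights $w_i = (x_i - \bar{x}) / \sum_j (x_j - \bar{x})^2$ (notation introduced here):

```latex
\[
\begin{aligned}
  \hat{\beta}_1
  &= \sum_{i=1}^{n} w_i \,(\beta_0 + \beta_1 x_i + \sigma Z_i) \\
  &= \beta_0 \sum_{i=1}^{n} w_i
     + \beta_1 \sum_{i=1}^{n} w_i x_i
     + \sigma \sum_{i=1}^{n} w_i Z_i \\
  &= \beta_1 + \sigma \sum_{i=1}^{n} w_i Z_i
\end{aligned}
\]
```

The last step uses $\sum_i w_i = 0$ and $\sum_i w_i x_i = 1$.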
16Expected value of estimated slope
- Expectation is a linear operator.
- We apply the standard calculations to the
previous formula to find
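Assuming the estimate has been written as $\hat{\beta}_1 = \beta_1 + \sigma \sum_i w_i Z_i$, where the $w_i$ are fixed weights and each $Z_i$ is standard normal (notation assumed here), the calculation can be sketched as:

```latex
\[
  E(\hat{\beta}_1)
  = \beta_1 + \sigma \sum_{i=1}^{n} w_i \, E(Z_i)
  = \beta_1 + \sigma \sum_{i=1}^{n} w_i \cdot 0
  = \beta_1
\]
```

That is, the OLS estimate of the slope is unbiased when the model is correct.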
17Variance of the OLS estimate of the slope
- The formula for the variance of the sum of two
random variables generalizes. The general result
is
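The general result for a linear combination, stated with generic constants $c_i$ and random variables $W_i$ (symbols chosen here for illustration):

```latex
\[
  \operatorname{Var}\!\Big(\sum_{i=1}^{n} c_i W_i\Big)
  = \sum_{i=1}^{n} c_i^2 \operatorname{Var}(W_i)
  + 2 \sum_{i<j} c_i c_j \operatorname{Cov}(W_i, W_j)
\]
```

With $n = 2$ this reduces to the familiar two-variable formula $\operatorname{Var}(c_1 W_1 + c_2 W_2) = c_1^2 \operatorname{Var}(W_1) + c_2^2 \operatorname{Var}(W_2) + 2 c_1 c_2 \operatorname{Cov}(W_1, W_2)$.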
18Variance of the OLS estimate of the slope
- We apply this formula to the last term in the
formula for the OLS estimate of the slope
19Variance of the OLS estimate of the slope
- Remember that the Zi are independent standard
normal random variables.
- That is, each variance is one.
- Each covariance is zero.
20Variance of the OLS estimate of the slope
- Then, the variance of the OLS estimate of the
slope is given by
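A sketch of the result, assuming the estimate is written as $\hat{\beta}_1 = \beta_1 + \sigma \sum_i w_i Z_i$ with weights $w_i = (x_i - \bar{x}) / \sum_j (x_j - \bar{x})^2$ and independent standard normal $Z_i$ (notation assumed here):

```latex
\[
  \operatorname{Var}(\hat{\beta}_1)
  = \sigma^2 \sum_{i=1}^{n} w_i^2 \operatorname{Var}(Z_i)
  = \sigma^2 \sum_{i=1}^{n} w_i^2
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
```

The covariance terms vanish because the $Z_i$ are independent, and the constant $\beta_1$ contributes no variance.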
21Summary of Results
- When the model is correct, the distribution of
the OLS estimate of the slope is
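Under the four assumptions, the standard result is that the slope estimate is normal with mean β1 and variance σ² / Σ(xi - x̄)². The simulation below is a rough check of that claim, not a proof. The parameter values, the x grid, the seed, and the helper `simulate_slope` are all illustrative assumptions.

```python
import random
import statistics

# Illustrative parameter values (assumptions, not from the lecture).
BETA0, BETA1, SIGMA = 1.0, 0.5, 2.0
x = [float(v) for v in range(1, 11)]  # fixed design points 1, 2, ..., 10

x_bar = statistics.mean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
weights = [(xi - x_bar) / sxx for xi in x]  # slope estimate = sum of w_i * Y_i

rng = random.Random(0)  # fixed seed so the run is reproducible


def simulate_slope():
    """Draw one data set from the model and return its OLS slope estimate."""
    ys = [BETA0 + BETA1 * xi + SIGMA * rng.gauss(0.0, 1.0) for xi in x]
    return sum(w * yi for w, yi in zip(weights, ys))


slopes = [simulate_slope() for _ in range(20000)]
sim_mean = statistics.mean(slopes)      # should be close to BETA1
sim_var = statistics.variance(slopes)   # should be close to theory_var
theory_var = SIGMA ** 2 / sxx
```

Comparing `sim_mean` with `BETA1` and `sim_var` with `theory_var` gives a quick sanity check on the derivation.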
22Tests of Hypotheses and Confidence Intervals
- ASSUME σ² is known.
- Then you can test a null hypothesis and find a
confidence interval for β1 using procedures as
before.
- These results are most useful for designing
studies.
- We will focus on this next class.
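When σ is treated as known, the slope estimate can be standardized and compared with the standard normal distribution. The sketch below assumes the standard error σ / sqrt(Σ(xi - x̄)²); the function name `slope_z_inference`, its parameters, and the data are hypothetical.

```python
import math
from statistics import NormalDist


def slope_z_inference(xs, ys, sigma, beta1_null=0.0, level=0.95):
    """z-test and confidence interval for the slope when sigma is known."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sum((xi - x_bar) * yi for xi, yi in zip(xs, ys)) / sxx
    se = sigma / math.sqrt(sxx)              # standard error of the slope
    z = (slope - beta1_null) / se            # test statistic
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    ci = (slope - z_crit * se, slope + z_crit * se)
    return {"slope": slope, "se": se, "z": z, "p": p_two_sided, "ci": ci}


# Illustrative data with an assumed known sigma of 1.0.
result = slope_z_inference([1.0, 2.0, 3.0, 4.0, 5.0],
                           [2.0, 4.0, 6.0, 8.0, 10.0],
                           sigma=1.0)
```

Because the standard error depends only on σ and the x values, the same expression can be evaluated before any data are collected, which is why it is useful for design.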
23Tests of Hypotheses and Confidence Intervals
- ASSUME σ² is unknown.
- Estimate σ² by MSE.
- Use a t-test
- Also have an F test
24Tests of Hypotheses and Confidence Intervals
- For t-test, degrees of freedom from MSE.
- Degrees of freedom is n-2.
- Alternatives can be right, left, or two-sided.
- For F-test, one numerator and n-2 denominator degrees of
freedom.
- Test is always right-sided.
- With respect to coefficients, F-test is a
two-sided test about the coefficients.
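The link between the two tests can be seen numerically: for the null hypothesis that the slope is zero, the F statistic equals the square of the t statistic. The sketch below computes both from made-up data. The function name `slope_t_and_f` is hypothetical, and only the test statistics are computed, since p-values require t and F distribution functions that the standard library does not provide.

```python
import math

# Hypothetical exam scores (illustrative data, not from the course).
x = [55.0, 62.0, 70.0, 78.0, 85.0, 91.0]
y = [58.0, 60.0, 74.0, 75.0, 88.0, 90.0]


def slope_t_and_f(xs, ys):
    """Return (t, F, df) for testing that the slope is zero."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(xs, ys))
    df = n - 2
    mse = sse / df                     # estimate of the error variance
    t_stat = b1 / math.sqrt(mse / sxx)
    msr = b1 ** 2 * sxx                # regression mean square (1 df)
    f_stat = msr / mse
    return t_stat, f_stat, df


t_stat, f_stat, df = slope_t_and_f(x, y)
```

A large F in the right tail corresponds to a large |t| in either tail, which is why the right-sided F-test is two-sided with respect to the slope. If SciPy is available, p-values could be obtained with, for example, `scipy.stats.t.sf` and `scipy.stats.f.sf`.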
25Additional Tests and Confidence Intervals
- Can get confidence interval for β0.
- Can get confidence interval for the value of the
regression function at a specific argument.
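For reference, the usual textbook forms of these two intervals when σ² is estimated by MSE are sketched below. The symbols $\hat{\beta}_0$, $\hat{\beta}_1$ (the OLS estimates), $x_0$ (the specific argument), and $S_{xx} = \sum_i (x_i - \bar{x})^2$ are notation assumed here.

```latex
\[
  \hat{\beta}_0 \;\pm\; t_{\alpha/2,\,n-2}
  \sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}
\]
\[
  \hat{\beta}_0 + \hat{\beta}_1 x_0 \;\pm\; t_{\alpha/2,\,n-2}
  \sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)}
\]
```

The first interval is the special case of the second with $x_0 = 0$.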
26Prediction Intervals
- Covered in a later lecture.
27Next Class
- Design issues in two independent sample studies.
- Design issues in regression analysis.