Statistical Techniques I
Provided by: jamespg
1
  • Statistical Techniques I

EXST7005
Simple Linear Regression
2
  • Simple Linear Regression
  • Measuring and describing a relationship between
    two variables
  • Simple Linear Regression provides a measure of
    the rate of change of one variable relative to
    another variable.
  • Variables are always paired: one is termed the
    independent variable (often referred to as the X
    variable) and the other the dependent variable
    (the Y variable).
  • There is a change in the value of variable Y as
    the value of variable X changes.

3
  • Simple Linear Regression (continued)
  • For each value of X there is a population of
    values for the variable Y (normally distributed).

4
  • Simple Linear Regression (continued)
  • The linear model which describes this
    relationship is given as
  • Yi = b0 + b1Xi
  • this is the equation for a straight line
  • where b0 is the value of the intercept (the
    value of Y when X = 0)
  • b1 is the amount of change in Y for each unit
    change in X (i.e. if X changes by 1 unit, Y
    changes by b1 units). b1 is also called the
    slope or REGRESSION COEFFICIENT
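
As a minimal sketch of this equation in Python (the intercept 2.2 and slope 0.6 are hypothetical values chosen only for illustration, not from the slides):

```python
# Hypothetical intercept and slope for illustration only.
b0 = 2.2   # intercept: the value of Y when X = 0
b1 = 0.6   # slope: the change in Y per unit change in X

def predict(x):
    """A point on the straight line Y = b0 + b1*X."""
    return b0 + b1 * x

print(predict(0))                # the intercept, 2.2
print(predict(1) - predict(0))   # the slope, 0.6 (up to rounding)
```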

5
  • Simple Linear Regression (continued)
  • Population Parameters
  • μY.X = the true population mean of Y at each
    value of X
  • β0 = the true value of the Y intercept
  • β1 = the true value of the slope, the change in
    Y per unit of X
  • μY.X = β0 + β1Xi
  • this is the population equation for a straight
    line

6
  • Simple Linear Regression (continued)
  • The equation for the line describes a perfect
    line with no variation. In practice there is
    always variation about the line. We include an
    additional term to represent this variation.
  • Yi = β0 + β1Xi + εi for a population
  • Yi = b0 + b1Xi + ei for a sample
  • when we put this term in the model, we are
    describing individual points as their position on
    the line, plus or minus some deviation

7
  • Simple Linear Regression (continued)

8
  • Simple Linear Regression (continued)
  • the SS of deviations from the line will form the
    basis of a variance for the regression line
  • when we leave the ei off the sample model, we are
    describing a point on the regression line
    predicted from the sample. To indicate this we
    put a HAT on the Yi value (Ŷi)

9
  • Characteristics of a Regression Line
  • The line will pass through the point (X̄, Ȳ)
    (also the point (0, b0))
  • The sum of squared deviations (measured
    vertically) of the points from the regression
    line will be a minimum.
  • Values on the line can be described by the
    equation Ŷi = b0 + b1Xi
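
These characteristics can be checked numerically; a sketch with hypothetical data (the estimation formulas themselves are derived on later slides):

```python
from statistics import mean

# Hypothetical data, used only to check the property numerically.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

# Least-squares estimates (formulas derived on later slides).
n = len(X)
sxx = sum(x * x for x in X) - sum(X) ** 2 / n
sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
b1 = sxy / sxx
b0 = mean(Y) - b1 * mean(X)

# The fitted line passes through (Xbar, Ybar):
print(abs((b0 + b1 * mean(X)) - mean(Y)) < 1e-9)   # True
```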

10
  • Fitting the line
  • Fitting the line starts with a corrected
    SSDeviations: the SS of deviations of the
    observations from a horizontal line through the
    mean.

11
  • Fitting the line (continued)
  • The fitted line is pivoted on the point (X̄, Ȳ)
    until it has a minimum SSDeviations.

12
  • Fitting the line (continued)
  • How do we know the SSDeviations are a minimum?
    Actually, we solve the equation for ei, and use
    calculus to determine the solution that has a
    minimum Σei².
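
The minimum can also be verified numerically: nudging either parameter away from the least-squares solution always increases the sum of squared deviations. A sketch with hypothetical data (for which b0 = 2.2, b1 = 0.6 are the least-squares estimates):

```python
# Hypothetical data; b0 = 2.2, b1 = 0.6 are its least-squares estimates.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def sse(b0, b1):
    """Sum of squared vertical deviations from the line b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

best = sse(2.2, 0.6)
# Perturbing either parameter in either direction increases the SS.
for d in (-0.1, 0.1):
    assert sse(2.2 + d, 0.6) > best
    assert sse(2.2, 0.6 + d) > best
```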

13
  • Fitting the line (continued)
  • The line has some desirable properties
  • E(b0) = β0
  • E(b1) = β1
  • E(Ŷ|X) = μY.X
  • Therefore, the parameter estimates and predicted
    values are unbiased estimates.
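
Unbiasedness can be illustrated with a small simulation: generate many samples from a known model and average the estimates. The "true" parameters below are hypothetical, chosen only for the demonstration:

```python
import random

random.seed(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0   # hypothetical "true" parameters
X = [1, 2, 3, 4, 5, 6, 7, 8]

def fit(Y):
    """Least-squares estimates (b0, b1) for the fixed X values."""
    n = len(X)
    sxx = sum(x * x for x in X) - sum(X) ** 2 / n
    sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
    b1 = sxy / sxx
    b0 = sum(Y) / n - b1 * sum(X) / n
    return b0, b1

# Draw many samples from the true model and average the estimates.
fits = [fit([beta0 + beta1 * x + random.gauss(0, sigma) for x in X])
        for _ in range(5000)]
avg_b0 = sum(f[0] for f in fits) / len(fits)
avg_b1 = sum(f[1] for f in fits) / len(fits)
# avg_b0 and avg_b1 land close to beta0 and beta1 (unbiasedness)
```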

14
  • The regression of Y on X
  • Y the "dependent" variable, the variable to be
    predicted
  • X the "independent" variable, also called the
    regressor or predictor variable.
  • Assumptions - general assumptions
  • Y variable is normally distributed at each value
    of X
  • The variance is homogeneous (across X).
  • Observations are independent of each other and ei
    independent of the rest of the model.

15
  • The regression of Y on X (continued)
  • Special assumption for regression.
  • Assume that all of the variation is attributable
    to the dependent variable (Y), and that the
    variable X is measured WITHOUT ERROR.
  • Note that the deviations are measured vertically,
    not horizontally or perpendicular to the line.

16
  • Derivation of the formulas
  • Any observation can be written as
  • Yi = b0 + b1Xi + ei for a sample
  • where ei = the deviation of the observed point
    from the regression line
  • note: the idea of regression is to minimize the
    squared deviations of the observations from the
    regression line; this is called a Least Squares
    Fit

17
  • Derivation of the formulas (continued)
  • Σei = 0
  • the sum of the squared deviations is
  • Σei² = Σ(Yi - Ŷi)²
  • Σei² = Σ(Yi - b0 - b1Xi)²
  • The objective is to select b0 and b1 such that
    Σei² is a minimum; this is done with calculus
  • You do not need to know this derivation!

18
  • A note on calculations
  • We have previously defined the uncorrected sum of
    squares and corrected sum of squares of a
    variable Yi
  • The uncorrected SS is ΣYi²
  • The correction factor is (ΣYi)²/n
  • The corrected SS is ΣYi² - (ΣYi)²/n
  • Your book calls this SYY; the correction factor
    is CYY
  • We could define the exact same series of
    calculations for Xi, and call it SXX
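
These quantities take only a few lines of Python; a sketch using hypothetical data:

```python
# Hypothetical data used to illustrate the calculations.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(Y)

uss_y = sum(y * y for y in Y)   # uncorrected SS of Y
cyy = sum(Y) ** 2 / n           # correction factor CYY
syy = uss_y - cyy               # corrected SS, SYY
print(uss_y, cyy, syy)          # 86 80.0 6.0

sxx = sum(x * x for x in X) - sum(X) ** 2 / n   # same calculation for X
print(sxx)                      # 10.0
```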

19
  • A note on calculations (continued)
  • We will also need a crossproduct for regression,
    and a corrected crossproduct
  • The crossproduct is XiYi
  • The sum of crossproducts is ΣXiYi, which is
    uncorrected
  • The correction factor is (ΣXi)(ΣYi)/n = CXY
  • The corrected crossproduct is ΣXiYi - (ΣXi)(ΣYi)/n
  • which your book calls SXY
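
The crossproduct calculation mirrors the sums of squares; a sketch with the same hypothetical data:

```python
# Hypothetical data, as before.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

scp = sum(x * y for x, y in zip(X, Y))  # uncorrected sum of crossproducts
cxy = sum(X) * sum(Y) / n               # correction factor CXY
sxy = scp - cxy                         # corrected crossproduct SXY
print(scp, cxy, sxy)                    # 66 60.0 6.0
```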

20
  • Derivation of the formulas (continued)
  • the partial derivative is taken with respect to
    each of the parameters. For b0:
  • ∂(Σei²)/∂b0 = 2Σ(Yi - b0 - b1Xi)(-1)

21
  • Derivation of the formulas (continued)
  • set the partial derivative to 0 and solve for b0
  • 2Σ(Yi - b0 - b1Xi)(-1) = 0
  • -ΣYi + nb0 + b1ΣXi = 0
  • nb0 = ΣYi - b1ΣXi
  • b0 = Ȳ - b1X̄
  • So b0 is estimated using b1 and the means of X
    and Y
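
A quick numeric check of this estimator, using hypothetical data whose slope (computed on the following slides) is b1 = 0.6:

```python
from statistics import mean

# Hypothetical data; b1 = 0.6 is its slope (derived on later slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

b1 = 0.6
b0 = mean(Y) - b1 * mean(X)   # b0 = Ybar - b1 * Xbar
print(round(b0, 10))          # 2.2
```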

22
  • Derivation of the formulas (continued)
  • Likewise for b1 we obtain the partial derivative

23
  • Derivation of the formulas (continued)
  • set the partial derivative to 0 and solve for b1
  • 2 S(Yi-b0-b1Xi)(-Xi) 0
  • - S(YiXi b0Xi b1 Xi2) 0
  • -SYiXi b0SXi b1 SXi2) 0
  • and since b0 Y - b1X ) , then
  • SYiXi (SYi/n - b1 SXi/n )SXi b1 SXi2
  • SYiXi SXiSYi/n - b1 (SXi)2/n b1 SXi2
  • SYiXi - SXiSYi/n b1 SXi2 - (SXi)2/n
  • b1 SYiXi - SXiSYi/n / SXi2 - (SXi)2/n

24
  • Derivation of the formulas (continued)
  • b1 SYiXi - SXiSYi/n / SXi2 - (SXi)2/n
  • b1 SXY / SXX
  • so b1 is the corrected crossproducts over the
    corrected SS of X
  • The intermediate statistics needed to solve all
    elements of a SLR are SXi, SYi, n, SXi2 , SYiXi
    and SYi2 (this last term we haven't seen in the
    calculations above, but we will need later)
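
Computing b1 from those intermediate statistics can be sketched as follows, with hypothetical data:

```python
# Hypothetical data used to compute the intermediate statistics.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_y2 = sum(y * y for y in Y)   # not needed for b1, but needed later

# b1 = SXY / SXX, built from the intermediate statistics only.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
print(b1)  # 0.6
```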

25
  • Derivation of the formulas (continued)
  • Review
  • We want to fit the best possible line, we define
    this as the line that minimizes the vertically
    measured distances from the observed values to
    the fitted line.
  • The line that achieves this is defined by the
    equations
  • b0 Y - b1X
  • b1 SYiXi - SXiSYi/n / SXi2 - (SXi)2/n

26
  • Derivation of the formulas (continued)
  • These calculations provide us with two parameter
    estimates that we can then use to get the
    equation for the fitted line.
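
The whole procedure can be collected into one small function; a sketch in Python, applied to hypothetical data:

```python
def fit_line(X, Y):
    """Least-squares estimates (b0, b1) for the line Yhat = b0 + b1*X."""
    n = len(X)
    sxx = sum(x * x for x in X) - sum(X) ** 2 / n                  # SXX
    sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n   # SXY
    b1 = sxy / sxx
    b0 = sum(Y) / n - b1 * sum(X) / n
    return b0, b1

# Hypothetical data:
b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
yhat = [b0 + b1 * x for x in [1, 2, 3, 4, 5]]   # points on the fitted line
```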

27
  • Numerical example
  • See Regression handout

28
  • About Crossproducts
  • Crossproducts are used in a number of related
    calculations.
  • a crossproduct: YiXi
  • Sum of crossproducts: ΣYiXi; corrected, this is
    SXY
  • Covariance = SXY / (n-1)
  • Slope b1 = SXY / SXX
  • SSRegression = S²XY / SXX
  • Correlation r = SXY / √(SXX·SYY)
  • R² = r² = S²XY / (SXX·SYY) = SSRegression /
    SSTotal
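
All of these quantities fall out of the corrected SS and crossproduct; a sketch using the same hypothetical data as the earlier examples:

```python
from math import sqrt

# Hypothetical data, as in the earlier sketches.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

sxx = sum(x * x for x in X) - sum(X) ** 2 / n
syy = sum(y * y for y in Y) - sum(Y) ** 2 / n
sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

cov = sxy / (n - 1)           # covariance
b1 = sxy / sxx                # slope
ss_reg = sxy ** 2 / sxx       # SS Regression
r = sxy / sqrt(sxx * syy)     # correlation
r2 = ss_reg / syy             # R^2 = SSRegression / SSTotal
print(cov, b1, ss_reg, r2)    # 1.5 0.6 3.6 0.6
```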