Single Variable Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Single Variable Regression

Slides: 26
Provided by: Kash56
Learn more at: http://cs.furman.edu

Transcript and Presenter's Notes

Title: Single Variable Regression


1
Single Variable Regression
2
Which Approach Is Appropriate When?
  • Choosing the right method for the data is the key
    statistical expertise that you need to have.

3
Do I Need to Know the Formulas?
  • You do not need to know exact formulas.
  • You do need to understand the concept behind them
    and the general statistical concepts embedded in
    the use of the formulas.
  • You do not need to be able to do correlation and
    regression by hand.
  • You must be able to do it on a computer using
    Excel.

4
Table of Contents
  • Objectives
  • Purpose of Regression
  • Correlation or Regression?
  • First Order Linear Model
  • Probabilistic Linear Relationship
  • Estimating Regression Parameters
  • Assumptions
  • Sum of squares
  • Tests
  • Percent of variation explained
  • Example
  • Regression Analysis in Excel
  • Normal Probability Plot
  • Residual Plot
  • Goodness of Fit
  • ANOVA For Regression

5
Objectives
  • To learn the assumptions behind and the
    interpretation of single and multiple variable
    regression.
  • To use Excel to calculate regressions and test
    hypotheses.

6
Purpose of Regression
  • To determine whether values of one or more
    variables are related to the response variable.
  • To predict the value of one variable based on the
    value of one or more other variables.
  • To test hypotheses.

7
Correlation or Regression?
  • Use correlation if you are interested only in
    whether a relationship exists.
  • Use regression if you are interested in building
    a mathematical model that can predict the
    response variable.
  • Use regression if you are interested in the
    relative effectiveness of several variables in
    predicting the response variable.

8
First Order Linear Model
  • A deterministic mathematical model between y and
    x
  • $y = \beta_0 + \beta_1 x$
  • $\beta_0$ is the intercept with the y axis, the
    value of y at which x = 0
  • $\beta_1$ is the slope of the line, the ratio of
    rise divided by run in the figure to the right. It
    measures the change in y for one unit of change
    in x.

9
Probabilistic Linear Relationship
  • But the relationship between x and y is not always
    exact. Observations do not always fall on a
    straight line.
  • To accommodate this, we introduce a random error
    term referred to as epsilon:
    $y = \beta_0 + \beta_1 x + \varepsilon$
  • The task of regression analysis then is to
    estimate the parameters $b_0$ and $b_1$ in the
    equation
  • $\hat{y} = b_0 + b_1 x$
  • so that the difference between $y$ and $\hat{y}$
    is minimized
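
The minimization on this slide can be written compactly as below. This is a sketch in standard least-squares notation; the index $i$ and the sample size $n$ are introduced here and do not appear on the slides.

```latex
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 ,
\qquad \hat{y}_i = b_0 + b_1 x_i
```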

10
Estimating Regression Parameters
  • Red dots show the observations
  • The solid line shows the estimated regression
    line
  • The vertical distance between each observation and
    the solid line is called the residual
  • Minimize the sum of the squared residuals
    (differences between line and observations).
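
The slides do the fitting in Excel, but the same estimates can be sketched in a few lines of Python. The function name `fit_line` and the data are hypothetical, invented for illustration; the formulas are the standard closed-form least-squares estimates for one predictor.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and of squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical data, invented for illustration (not from the slides).
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
b0, b1 = fit_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

These two values are the ones that minimize the sum of squared residuals; any other line through the same points gives a larger sum.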

11
Assumptions
  • The dependent (response) variable is measured on
    an interval scale
  • The probability distribution of the error is
    Normal with mean zero
  • The standard deviation of error is constant and
    does not depend on values of x
  • The error term associated with any particular
    value of Y is independent of the error terms
    associated with other values of Y
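
The three assumptions about the error term are often summarized in one line. This is a sketch in conventional notation; the symbol $\sigma$ for the constant standard deviation of the error and the index $i$ are introduced here, not taken from the slides.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i ,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)
```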

12
Sum of Squares
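  • Total variation in y (SST) = SSR + SSE, where SSR
    is the sum of squares explained by the regression
    and SSE is the sum of squared errors
  • MSR divided by MSE is the test statistic for the
    ability of the regression to explain the data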
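
A minimal Python sketch of this decomposition for a single predictor follows. The function name `sums_of_squares` is hypothetical, and the degrees of freedom used (1 for the regression, n - 2 for the error) are the standard ones for one x variable.

```python
def sums_of_squares(x, y):
    """Return SST, SSR, SSE, MSR, MSE and F for a one-predictor fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    msr = ssr / 1          # 1 degree of freedom: one predictor
    mse = sse / (n - 2)    # n - 2 degrees of freedom for the error
    return {"SST": sst, "SSR": ssr, "SSE": sse,
            "MSR": msr, "MSE": mse, "F": msr / mse}


# Hypothetical data, invented for illustration (not from the slides).
table = sums_of_squares([1, 2, 3, 4], [1, 3, 2, 4])
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

For a least-squares fit with an intercept, SST equals SSR + SSE up to floating-point rounding.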

13
Tests
  • The hypothesis that the regression equation does
    not explain variation in Y can be tested using
    the F test.
  • The hypothesis that the coefficient for x is zero
    can be tested using the t statistic.
  • The hypothesis that the intercept is 0 can be
    tested using the t statistic.
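
The t statistics for the two coefficients can be sketched as each estimate divided by its standard error. The function name `t_statistics` and the data are hypothetical; the standard-error formulas are the usual ones for one predictor. Converting the statistics to p-values needs the t distribution with n - 2 degrees of freedom, which is left out here because Python's standard library does not provide it.

```python
import math


def t_statistics(x, y):
    """Return (t for intercept, t for slope) in a one-predictor fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                                 # error variance estimate
    se_b1 = math.sqrt(mse / s_xx)                       # standard error of slope
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / s_xx))  # standard error of intercept
    return b0 / se_b0, b1 / se_b1


# Hypothetical data, invented for illustration (not from the slides).
t_intercept, t_slope = t_statistics([1, 2, 3, 4], [1, 3, 2, 4])
print(f"t(intercept) = {t_intercept:.3f}, t(slope) = {t_slope:.3f}")
```

With a single x variable, the square of the slope's t statistic equals the F statistic, so the two tests agree.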

14
Percent of Variation Explained
  • $R^2$ is the coefficient of determination.
  • The minimum $R^2$ is zero. The maximum is 1.
  • $1 - R^2$ is the proportion of variation left
    unexplained.
  • If Y is not related to X, or is related in a
    non-linear fashion, then $R^2$ will tend to be small.
  • Adjusted $R^2$ shows the value of $R^2$ after
    adjustment for degrees of freedom. It protects
    against an $R^2$ that is artificially inflated by
    increasing the number of variables in the model.
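
Both quantities are simple to compute once the sums of squares are known. The sketch below uses the standard formulas; the function names and the example numbers are hypothetical, with n the number of observations and k the number of x variables.

```python
def r_squared(sse, sst):
    """Proportion of the variation in y explained by the regression."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """R-squared adjusted for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical sums of squares, invented for illustration.
r2 = r_squared(sse=1.8, sst=5.0)
adj = adjusted_r_squared(r2, n=4, k=1)
print(f"R2 = {r2:.2f}, adjusted R2 = {adj:.2f}")
```

The adjusted value is never larger than the unadjusted one, and the gap widens as more variables are added relative to the number of observations.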

15
Example
  • Is waiting time related to satisfaction ratings?
  • Predict what will happen to satisfaction ratings
    if waiting time reaches 15 minutes.

16
Regression Analysis in Excel
  • Select tools
  • Select data analysis
  • Select regression analysis
  • Identify the x and y data ranges, which must be of
    equal length
  • Ask for residual plots to test assumptions
  • Ask for a normal probability plot to test the
    Normality assumption

17
Normal Probability Plot
  • A Normal Probability Plot compares the percent of
    errors falling in particular bins to the
    percentage expected from a Normal distribution.
  • If the assumption is met, then the plot should
    look like a straight line.
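
One common way to build such a plot by hand is to pair the sorted residuals with the quantiles a standard Normal distribution would give at the same ranks. The sketch below takes that approach; the function name `normal_plot_points` and the plotting position (i - 0.5) / n are assumptions made here, and Excel's own chart may be constructed differently.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard Normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, r in enumerate(ordered, start=1):
        p = (i - 0.5) / n      # plotting position for the i-th smallest value
        points.append((std_normal.inv_cdf(p), r))
    return points


# Hypothetical residuals, invented for illustration.
points = normal_plot_points([0.3, -0.9, 0.9, -0.3])
for quantile, residual in points:
    print(f"{quantile:7.3f}  {residual:5.2f}")
```

If the residuals are roughly Normal, plotting the second column against the first gives points close to a straight line.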

18
Residual Plot
The difference between the observed value of the
dependent variable ($y$) and the predicted value
($\hat{y}$) is called the residual ($e$). Each data
point has one residual. Residual = Observed value -
Predicted value, that is, $e = y - \hat{y}$.
  • The residual plot tests that residuals have a mean
    of zero and constant standard deviation
  • It also tests that residuals are not dependent on
    values of x
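
Computing the residuals is a one-line operation once the line is fitted. In this sketch the function name `compute_residuals`, the data, and the coefficients are hypothetical; the coefficients are the least-squares estimates for this particular made-up data set.

```python
def compute_residuals(x, y, b0, b1):
    """Observed value minus predicted value for each data point."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data and its least-squares coefficients (illustration only).
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
e = compute_residuals(x, y, b0=0.5, b1=0.8)
print([round(ei, 2) for ei in e])
print("sum of residuals:", round(sum(e), 10))
```

Plotting `e` against `x` gives the residual plot. For a least-squares line with an intercept the residuals always sum to zero, so the plot is used to check constant spread and the absence of a pattern, not the mean.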

19
Residual Plot
  • A residual plot is a graph that shows the
    residuals on the vertical axis and the
    independent variable on the horizontal axis.
  • If the points in a residual plot are randomly
    dispersed around the horizontal axis, a linear
    regression model is appropriate for the data;
    otherwise, a non-linear model is more
    appropriate.
  • The chart below displays the residual (e) and
    independent variable (X) as a residual plot.
  • This random pattern indicates that a linear model
    provides a decent fit to the data.

20
Residual Plot
  • Below, the residual plots show three typical
    patterns.
  • The first plot shows a random pattern, indicating
    a good fit for a linear model.
  • The other plot patterns are non-random (U-shaped
    and inverted U), suggesting a better fit for a
    non-linear model.

[Figure: three residual plots. Panels: Random pattern; Non-random, U-shaped; Non-random, inverted U]

21
Linear Equation
  • Satisfaction = 121.3 - 4.8 × Waiting time
  • At 15 minutes waiting time, satisfaction is
    predicted to be
  • 121.3 - 4.8 × 15 ≈ 48.87 (the rounded
    coefficients shown give 49.3; the reported value
    presumably reflects unrounded coefficients)
  • The t statistics for both the intercept and the
    waiting time coefficient are statistically
    significant.
  • The hypotheses that the coefficients are zero are
    rejected.
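
The prediction step can be sketched as a small function. The names below are hypothetical, and the coefficients are the rounded values printed on the slide, so the result (about 49.3) differs slightly from the 48.87 the slide reports.

```python
# Rounded coefficients as printed on the slide.
INTERCEPT = 121.3
SLOPE = -4.8   # satisfaction points lost per extra minute of waiting


def predict_satisfaction(waiting_minutes):
    """Predicted satisfaction rating for a given waiting time in minutes."""
    return INTERCEPT + SLOPE * waiting_minutes


print(round(predict_satisfaction(15), 2))
```

The negative slope means each additional minute of waiting lowers the predicted rating by about 4.8 points.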

22
Goodness of Fit
  • 57% of variation in satisfaction ratings is
    explained by the equation
  • 43% of variation in satisfaction ratings is left
    unexplained

23
ANOVA For Regression
  • The regression model has a mean square (MSR) of
    347.
  • The mean square error (MSE) is 33. Note that the
    error term is called residuals in Excel.
  • The F statistic is about 10; the probability of
    observing a statistic this large if the null
    hypothesis is true is 0.02.
  • The hypothesis that the MSR and MSE are equal is
    rejected. Significant variation is explained by
    the regression.
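
As a check on the arithmetic, the F statistic is the ratio of the two mean squares quoted on this slide. The inputs are already rounded, so the result only roughly matches the value of 10 reported.

```latex
F = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{347}{33} \approx 10.5
```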

24
Null Hypothesis
  • The null hypothesis corresponds to a general or
    default position.
  • For example, the null hypothesis might be that
    there is no relationship between two measured
    phenomena or that a potential treatment has no
    effect.
  • It is important to understand that the null
    hypothesis can never be proven.
  • A set of data can only reject a null hypothesis
    or fail to reject it.
  • For example, if comparison of two groups (e.g.
    treatment, no treatment) reveals no statistically
    significant difference between the two, it does
    not mean that there is no difference in reality.
  • It only means that there is not enough evidence
    to reject the null hypothesis (in other words,
    the experiment fails to reject the null
    hypothesis)

25
What is a P value?
  • P stands for probability
  • Measures the strength of the evidence against the
    null hypothesis (that our regression has no
    significance)
  • Smaller P values indicate stronger evidence
    against the null hypothesis
  • By convention, p-values of < .05 are often
    accepted as statistically significant, but this
    is an arbitrary cut-off.
