Simple Linear Regression and Correlation

Slides: 39
Provided by: siegm
Learn more at: https://www.ecs.csun.edu

1
Chapter 11
  • Simple Linear Regression and Correlation

2
Learning Objectives
  • Use simple linear regression for building
    empirical models
  • Estimate the parameters in a linear regression
    model
  • Determine if the regression model is an adequate
    fit to the data
  • Test statistical hypotheses and construct
    confidence intervals
  • Predict a future observation
  • Use simple transformations to achieve a linear
    regression model
  • Understand correlation

3
Regression analysis
  • Relationships between two or more variables
  • Useful for these types of problems
  • Predict a new observation
  • Sometimes a regression model will arise from a
    theoretical relationship
  • At other times, no theoretical knowledge is available
  • Choice of the model is then based on inspection of a
    scatter diagram
  • Such a model is called an empirical model

4
Regression Model
  • Mean of the random variable Y is related to x
  • $E(Y \mid x) = \mu_{Y|x} = \beta_0 + \beta_1 x$
  • $\beta_0$ and $\beta_1$ are called regression coefficients
  • Appropriate way to generalize this to a
    probabilistic model
  • Assume that the expected value of Y is a linear
    function of x
  • Actual value of Y is determined by the mean value
    function plus a random error term
  • $Y = \beta_0 + \beta_1 x + \epsilon$
  • where $\epsilon$ is called the random error term

5
$\beta_1$ and $\epsilon$
  • Suppose that the mean and variance of $\epsilon$ are 0
    and $\sigma^2$
  • Slope, $\beta_1$, can be interpreted as the change in
    the mean of Y for a unit change in x
  • Height of the line at any value of x is just the
    expected value of Y for that x
  • Variability of Y at a particular value of x is
    determined by the error variance $\sigma^2$
  • Implies that there is a distribution of Y-values
    at each x
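
The claim that each x carries its own distribution of Y can be checked with a small simulation. This is only a sketch: the parameter values, the seed, and the choice of a normal error distribution are assumptions made for illustration (the slides so far fix only the mean and variance of $\epsilon$).

```python
import random

def simulate_y(x, beta0, beta1, sigma, rng):
    """Draw one observation Y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2)."""
    return beta0 + beta1 * x + rng.gauss(0.0, sigma)

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 1.0, 2.0, 0.5
rng = random.Random(0)  # fixed seed so the run is repeatable

x0 = 3.0
draws = [simulate_y(x0, BETA0, BETA1, SIGMA, rng) for _ in range(10_000)]

# Sample mean and sample variance of the simulated Y values at x0.
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(f"mean of Y at x={x0}: {sample_mean:.3f} (model mean {BETA0 + BETA1 * x0})")
print(f"variance of Y at x={x0}: {sample_var:.3f} (model variance {SIGMA ** 2})")
```

The sample mean should land near the height of the line at x0, and the sample variance near $\sigma^2$, whatever x0 is chosen.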

6
Graph of the Variability
  • Distribution of Y for any given value of x
  • Values of x are fixed, and Y is a random variable
    with the following mean and variance
  • Mean: $\mu_{Y|x} = \beta_0 + \beta_1 x$
  • Variance: $\sigma^2$
  • [Figure: distribution of Y about the true regression line, with Y plotted against x]

7
Simple Linear Regression
  • Values of the intercept, slope and the error
    variance will not be known
  • Must be estimated from sample data
  • Fitted model is used in prediction of future
    observations of Y at a particular level of x

8
Method of Least Squares
  • True relationship between Y and x is a straight
    line
  • Assume n pairs of observations
  • Estimates of $\beta_0$ and $\beta_1$ result in a line that is a
    best fit to the data
  • Called method of least squares

9
Least Squares Method
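  • Assuming the n observations in the sample,
    $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, 2, \ldots, n$
  • Sum of the squares of the deviations of the
    observations from the true regression line:
    $L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
  • Taking the partial derivatives with respect to $\beta_0$ and
    $\beta_1$ and setting them to zero at $\hat\beta_0$, $\hat\beta_1$:
    $-2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$
    $-2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) x_i = 0$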

10
Least Squares Method-Cont.
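  • Simplifying gives the least squares normal equations:
    $n \hat\beta_0 + \hat\beta_1 \sum x_i = \sum y_i$
    $\hat\beta_0 \sum x_i + \hat\beta_1 \sum x_i^2 = \sum x_i y_i$
  • Results are
    $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
    $\hat\beta_1 = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$
  • Fitted or estimated regression line is
    $\hat y = \hat\beta_0 + \hat\beta_1 x$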

11
Using Special Symbols
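  • Convenient to use special symbols, so that the slope
    estimate is $\hat\beta_1 = S_{xy}/S_{xx}$
  • Numerator of the slope estimate:
    $S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$
  • Denominator of the slope estimate:
    $S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - (\sum x_i)^2/n$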
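
The least squares estimates can be computed directly from the two sums $S_{xx}$ and $S_{xy}$. The sketch below assumes a small made-up data set (not the pavement data from the slides), and the function name `least_squares_fit` is purely illustrative.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope, s_xx, s_xy) for the line y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx                  # beta1_hat = S_xy / S_xx
    intercept = y_bar - slope * x_bar    # beta0_hat = y_bar - beta1_hat * x_bar
    return intercept, slope, s_xx, s_xy

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1, s_xx, s_xy = least_squares_fit(x, y)
print(f"S_xx = {s_xx:.2f}, S_xy = {s_xy:.2f}")
print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} x")
```

For these five points the fitted line works out to roughly $\hat y = 0.22 + 1.96x$.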

12
Residual Error
  • Describes the error in the fit of the model to
    the ith observation $y_i$
  • Each pair of observations satisfies
    $y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i$, $i = 1, 2, \ldots, n$
  • Denoted by $e_i = y_i - \hat y_i$

13
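Estimating $\sigma^2$
  • Another unknown parameter is $\sigma^2$, the variance of the
    error term $\epsilon$
  • Residuals $e_i$ are used to obtain an estimate of $\sigma^2$
  • Sum of squares of the residuals, often called the
    error sum of squares:
    $SS_E = \sum e_i^2 = \sum (y_i - \hat y_i)^2$, giving $\hat\sigma^2 = SS_E/(n-2)$
  • A more convenient computing formula:
    $SS_E = SS_T - \hat\beta_1 S_{xy}$
  • $SS_T = \sum (y_i - \bar y)^2$ is the total sum of squares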
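
A short sketch of the error variance estimate follows. It computes the error sum of squares twice, once from the residuals and once from the computing formula, so the two can be compared. The data are hypothetical and the function name is an illustrative choice.

```python
def error_variance(x, y):
    """Return (sse_direct, sse_shortcut, sigma2_hat) for a straight-line fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Direct route: square and sum the residuals e_i = y_i - y_hat_i.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse_direct = sum(e ** 2 for e in residuals)

    # Computing formula: SS_E = SS_T - beta1_hat * S_xy.
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    sse_shortcut = ss_t - b1 * s_xy

    # Two parameters were estimated, so divide by n - 2.
    sigma2_hat = sse_direct / (n - 2)
    return sse_direct, sse_shortcut, sigma2_hat

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

sse_direct, sse_shortcut, sigma2_hat = error_variance(x, y)
print(f"SS_E (residuals) = {sse_direct:.4f}")
print(f"SS_E (shortcut)  = {sse_shortcut:.4f}")
print(f"sigma^2 estimate = {sigma2_hat:.4f}")
```

Both routes give the same error sum of squares up to rounding, which is why the shortcut is preferred for hand calculation.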

14
Example
  • Regression methods were used to analyze the data
    from a study investigating the relationship
    between roadway surface temperature (x) and
    pavement deflection (y).
  • Summary quantities were as follows

15
Questions
  • (a) Calculate the least squares estimates of the
    slope and intercept. Graph the regression line.
  • (b) Use the equation of the fitted line to
    predict what pavement deflection would be
    observed when the surface temperature is 85°F.
  • (c) What is the mean pavement deflection when the
    surface temperature is 90°F?
  • (d) What change in mean pavement deflection would
    be expected for a 1°F change in surface
    temperature?

16
Solution
  • Need to have
  • Hence, the slope and intercept
  • Regression line

17
Solution-Cont.
  • Graph of the regression line

18
Solution-Cont.
  • Pavement deflection
  • Mean pavement deflection
  • Change in mean pavement deflection

19
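Properties of the Least Squares Estimators
  • Assumed that the error term $\epsilon$ in the model is a
    random variable
  • Estimators will be viewed as random variables
  • Properties of the slope:
    $E(\hat\beta_1) = \beta_1$, $V(\hat\beta_1) = \sigma^2/S_{xx}$
  • Properties of the intercept:
    $E(\hat\beta_0) = \beta_0$, $V(\hat\beta_0) = \sigma^2 [1/n + \bar x^2/S_{xx}]$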

20
Analysis of Variance Approach
  • Used to test for significance of regression
  • Partitions the total variability in the response
    variable into two components:
    $\sum (y_i - \bar y)^2 = \sum (\hat y_i - \bar y)^2 + \sum (y_i - \hat y_i)^2$
  • First term on the right is called the regression sum of squares
  • Second term is called the error sum of squares
  • Symbolically, $SS_T = SS_R + SS_E$
  • $SS_T$ is the total corrected sum of squares

21
Analysis of Variance
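  • $SS_T$, $SS_R$, and $SS_E$ have $n-1$, 1, and $n-2$ degrees of
    freedom, respectively
  • $SS_R = \hat\beta_1 S_{xy}$ and $SS_E = SS_T - \hat\beta_1 S_{xy}$
  • Divide each sum of squares by its degrees of freedom
  • $MS_R = SS_R/1$ and $MS_E = SS_E/(n-2)$
  • Then $F = MS_R/MS_E$ follows the $F_{1,n-2}$ distribution when $\beta_1 = 0$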

22
Hypothesis Tests for Slope
  • Adequacy of a linear regression model
  • Appropriate hypotheses for slope are
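  • $H_0: \beta_1 = \beta_{1,0}$
  • $H_1: \beta_1 \neq \beta_{1,0}$
  • Test statistic for significance of regression
    ($\beta_{1,0} = 0$): $F_0 = MS_R/MS_E$
  • Follows the $F_{1,n-2}$ distribution under $H_0$
  • Reject $H_0$ if $f_0 > f_{\alpha,1,n-2}$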
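
The analysis of variance quantities and the F test for significance of regression can be sketched as follows. The data are hypothetical, and because the standard library has no F distribution, the critical value is entered by hand: 10.13 is the tabulated $f_{0.05,1,3}$ for this five-point example, not a value from the slides.

```python
def anova_for_regression(x, y):
    """Return the ANOVA quantities for a straight-line fit as a dict."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx

    ss_t = sum((yi - y_bar) ** 2 for yi in y)  # total, n - 1 d.o.f.
    ss_r = b1 * s_xy                           # regression, 1 d.o.f.
    ss_e = ss_t - ss_r                         # error, n - 2 d.o.f.
    ms_r = ss_r / 1
    ms_e = ss_e / (n - 2)
    return {"ss_t": ss_t, "ss_r": ss_r, "ss_e": ss_e,
            "ms_r": ms_r, "ms_e": ms_e, "f0": ms_r / ms_e}

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

table = anova_for_regression(x, y)
F_CRITICAL = 10.13  # f_{0.05,1,3} read from an F table (n - 2 = 3 here)
reject_h0 = table["f0"] > F_CRITICAL

for name, value in table.items():
    print(f"{name:>5} = {value:.4f}")
print("reject H0: beta1 = 0" if reject_h0 else "fail to reject H0")
```

With a nearly perfect linear trend in the made-up data, the statistic is far above the critical value, so the hypothesis of zero slope is rejected.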

23
Analysis of Variance for Testing Significance of Regression

24
Example
  • Consider the data from the previous example on
    x = roadway surface temperature and y = pavement
    deflection.
  • (a) Test for significance of regression using
    $\alpha = 0.05$. What conclusions can you draw?
  • (b) Estimate the standard errors of the slope and
    intercept.

25
Solution
  • Use the steps in hypothesis testing
  • 1) Parameter of interest is the slope of the
    regression line, $\beta_1$
  • 2) $H_0: \beta_1 = 0$
  • 3) $H_1: \beta_1 \neq 0$
  • 4) $\alpha = 0.05$
  • 5) The test statistic is $f_0 = MS_R/MS_E$
  • 6) Reject $H_0$ if $f_0 > f_{\alpha,1,18}$, where
    $f_{0.05,1,18} = 4.416$

26
Solution
  • 7) Using the results from the previous example
  • Hence, the test statistic is $f_0 = 73.95$
  • 8) Since 73.95 > 4.416, reject $H_0$ and conclude
    the model specifies a useful relationship at
    $\alpha = 0.05$
  • Standard error

27
Confidence Intervals on the Slope and Intercept
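  • Interested in obtaining C.I. estimates of the
    parameters
  • Width of these C.I.s is a measure of the overall
    quality of the regression line
  • $100(1-\alpha)\%$ C.I. on the slope $\beta_1$:
    $\hat\beta_1 \pm t_{\alpha/2,n-2} \sqrt{\hat\sigma^2/S_{xx}}$
  • $100(1-\alpha)\%$ C.I. on the intercept $\beta_0$:
    $\hat\beta_0 \pm t_{\alpha/2,n-2} \sqrt{\hat\sigma^2 [1/n + \bar x^2/S_{xx}]}$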
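
The two intervals can be sketched in a few lines, using the usual standard errors $\sqrt{\hat\sigma^2/S_{xx}}$ for the slope and $\sqrt{\hat\sigma^2 [1/n + \bar x^2/S_{xx}]}$ for the intercept. The data are hypothetical, and the critical value is supplied by the caller because the standard library has no t distribution: 3.182 is the tabulated $t_{0.025,3}$ for a 95% interval with five points.

```python
import math

def slope_intercept_intervals(x, y, t_crit):
    """Return ((slope_lo, slope_hi), (intercept_lo, intercept_hi))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    sigma2_hat = (ss_t - b1 * s_xy) / (n - 2)

    # Estimated standard errors of the slope and the intercept.
    se_b1 = math.sqrt(sigma2_hat / s_xx)
    se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))

    slope_ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    intercept_ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    return slope_ci, intercept_ci

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
T_CRIT = 3.182  # t_{0.025,3} read from a t table (n - 2 = 3 here)

slope_ci, intercept_ci = slope_intercept_intervals(x, y, T_CRIT)
print(f"95% C.I. on slope:     ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"95% C.I. on intercept: ({intercept_ci[0]:.3f}, {intercept_ci[1]:.3f})")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.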

28
Confidence Interval on the Mean Response
  • Constructed on the mean response at a specified
    value of x, say, x0
  • Called a C.I. about the regression line
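  • C.I. about the mean response at the value $x = x_0$:
    $\hat\mu_{Y|x_0} \pm t_{\alpha/2,n-2} \sqrt{\hat\sigma^2 [1/n + (x_0 - \bar x)^2/S_{xx}]}$,
    where $\hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$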
  • Applies only to the interval

29
Prediction of New Observations
  • An important application of a regression model
  • New observation is independent of the
    observations used to develop the regression model
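  • C.I. for $\mu_{Y|x}$ is inappropriate
  • Prediction interval on a future observation at
    the value $x_0$:
    $\hat y_0 \pm t_{\alpha/2,n-2} \sqrt{\hat\sigma^2 [1 + 1/n + (x_0 - \bar x)^2/S_{xx}]}$
  • Always wider than the C.I. at $x_0$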
  • Depends on both the error from the fitted model
    and the error associated with future observations
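
The difference between the confidence interval on the mean response and the prediction interval comes down to one extra term under the square root. The sketch below computes both at the same $x_0$, assuming hypothetical data and a hand-entered critical value (3.182 is the tabulated $t_{0.025,3}$ for this five-point example).

```python
import math

def mean_ci_and_prediction_interval(x, y, x0, t_crit):
    """Return (y0_hat, mean_ci, pred_int) at the point x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    sigma2_hat = (ss_t - b1 * s_xy) / (n - 2)

    y0_hat = b0 + b1 * x0
    # Shared term: uncertainty in the fitted line at x0.
    spread_term = 1 / n + (x0 - x_bar) ** 2 / s_xx
    half_ci = t_crit * math.sqrt(sigma2_hat * spread_term)
    # The extra "1 +" accounts for the error of the future observation itself.
    half_pi = t_crit * math.sqrt(sigma2_hat * (1 + spread_term))

    mean_ci = (y0_hat - half_ci, y0_hat + half_ci)
    pred_int = (y0_hat - half_pi, y0_hat + half_pi)
    return y0_hat, mean_ci, pred_int

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
T_CRIT = 3.182  # t_{0.025,3} read from a t table (n - 2 = 3 here)

y0_hat, mean_ci, pred_int = mean_ci_and_prediction_interval(x, y, 4.0, T_CRIT)
print(f"point estimate at x0 = 4: {y0_hat:.3f}")
print(f"95% C.I. on mean response: ({mean_ci[0]:.3f}, {mean_ci[1]:.3f})")
print(f"95% prediction interval:   ({pred_int[0]:.3f}, {pred_int[1]:.3f})")
```

Both intervals share the same center, and the prediction interval is wider at every $x_0$ because of the added variance of the new observation.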

30
Example
  • The first example presented data on roadway
    surface temperature x and pavement deflection y
  • Find a 99% confidence interval on each of the
    following
  • (a) Slope
  • (b) Intercept
  • (c) Mean deflection when temperature x = 85°F
  • (d) Find a 99% prediction interval on pavement
    deflection when the temperature is 90°F.

31
Solution
  • a) Confidence interval on the slope
  • Critical value
  • $t_{\alpha/2,n-2} = t_{0.005,18} = 2.878$
  • Hence
  • b) Confidence interval on the intercept

32
Solution
  • c) 99% confidence interval on $\mu_{Y|x}$ when x = 85°F
  • d) 99% prediction interval when x = 90°F

33
Residual Analysis
  • Helpful in checking that the errors are approximately
    normally distributed with constant variance
  • Useful in determining whether additional terms in
    the model are required
  • Construct normal probability plot of residuals
  • Patterns of residual plots
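
The coordinates for a normal probability plot of the residuals can be produced without a plotting library. This sketch uses hypothetical data and assumes the plotting positions $(j - 0.5)/n$, which is one common convention and not necessarily the one used in the original slides.

```python
from statistics import NormalDist

def residuals_and_normal_scores(x, y):
    """Return (residuals, points), where points pairs each normal score
    with the corresponding ordered residual."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ordered = sorted(residuals)
    # Standard normal quantiles at the assumed plotting positions (j - 0.5)/n.
    scores = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    return residuals, list(zip(scores, ordered))

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

residuals, points = residuals_and_normal_scores(x, y)
for score, resid in points:
    print(f"normal score {score:+.3f}   residual {resid:+.3f}")
```

If the errors are roughly normal, the printed pairs fall close to a straight line when plotted. Plotting the residuals against x or against the fitted values is the usual way to look for the patterns mentioned above.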

34
Coefficient of Determination ($R^2$)
  • Judge the adequacy of a regression model
  • Coefficient of determination:
    $R^2 = SS_R/SS_T = 1 - SS_E/SS_T$
  • Often referred to as the amount of variability in the
    data explained by the regression model, with $0 \le R^2 \le 1$
  • SSR is that portion of SST that is explained by
    the use of the regression model
  • SSE is that portion of SST that is not explained
    by the use of the regression model
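
A minimal sketch of the coefficient of determination, computed both as the explained share $SS_R/SS_T$ and as one minus the unexplained share $SS_E/SS_T$. The data are hypothetical and the function name is illustrative.

```python
def coefficient_of_determination(x, y):
    """Return (r2_from_ssr, r2_from_sse) for a straight-line fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_r = b1 * s_xy  # portion of SS_T explained by the model
    # Portion of SS_T left unexplained, computed from the residuals.
    ss_e = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    return ss_r / ss_t, 1 - ss_e / ss_t

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

r2_from_ssr, r2_from_sse = coefficient_of_determination(x, y)
print(f"R^2 = SS_R/SS_T     = {r2_from_ssr:.4f}")
print(f"R^2 = 1 - SS_E/SS_T = {r2_from_sse:.4f}")
```

The two expressions agree because $SS_T = SS_R + SS_E$.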

35
Transformation of Data Points
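  • Straight-line regression model is sometimes
    inappropriate
  • Nonlinearity may be visible in the scatter diagram
  • Consider the exponential function
  • $Y = \beta_0 e^{\beta_1 x} \epsilon$, which can be transformed to a straight line
  • By a logarithmic transformation
  • $\ln Y = \ln \beta_0 + \beta_1 x + \ln \epsilon$
  • Another intrinsically linear function is
    $Y = \beta_0 + \beta_1 (1/x) + \epsilon$
  • By using the reciprocal transformation $z = 1/x$
  • $Y = \beta_0 + \beta_1 z + \epsilon$
  • Requires that the transformed error terms (such as
    $\ln \epsilon$) are normally distributed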
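
The logarithmic transformation can be sketched by regressing $\ln y$ on x and converting the intercept back. To keep the check simple, the data below are generated noise-free from assumed values $\beta_0 = 2$ and $\beta_1 = 0.5$. Both the values and the function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Least squares (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def fit_exponential(x, y):
    """Fit Y = beta0 * exp(beta1 * x) by regressing ln(y) on x."""
    if any(yi <= 0 for yi in y):
        raise ValueError("the log transformation needs strictly positive y")
    ln_beta0, beta1 = fit_line(x, [math.log(yi) for yi in y])
    # The fitted intercept estimates ln(beta0), so exponentiate it.
    return math.exp(ln_beta0), beta1

# Noise-free hypothetical data generated from beta0 = 2, beta1 = 0.5.
x = [0, 1, 2, 3, 4]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

beta0_hat, beta1_hat = fit_exponential(x, y)
print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
```

With real, noisy data the recovered values would only approximate the true parameters, and the fit minimizes squared error on the log scale, not on the original scale.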

36
Correlation
  • So far, assumed that x is a mathematical variable and
    that Y is a random variable
  • Many applications involve situations in which
    both X and Y are random variables
  • Suppose observations are jointly distributed
    random variables
  • Correlation coefficient measures the strength of linear
    association between two variables and is denoted by $\rho$
  • Shows how closely the points in a scatter diagram
    are spread around the regression line

37
Hypothesis Tests
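  • Useful to test the hypotheses
  • $H_0: \rho = 0$
  • $H_1: \rho \neq 0$
  • Appropriate test statistic:
    $T_0 = R \sqrt{n-2} / \sqrt{1 - R^2}$, where $R$ is the sample
    correlation coefficient
  • Follows the t distribution with n-2 degrees of
    freedom
  • Reject the null hypothesis if $|t_0| > t_{\alpha/2,n-2}$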
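
A sketch of the sample correlation coefficient and the t statistic for testing $\rho = 0$ is given below. The data are hypothetical, and the critical value 3.182 is the tabulated $t_{0.025,3}$ for this five-point example, entered by hand.

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * SS_T)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * ss_t)

def correlation_t_statistic(r, n):
    """t0 = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
T_CRIT = 3.182  # t_{0.025,3} read from a t table (n - 2 = 3 here)

r = sample_correlation(x, y)
t0 = correlation_t_statistic(r, len(x))
reject_h0 = abs(t0) > T_CRIT

print(f"r = {r:.4f}, t0 = {t0:.2f}")
print("reject H0: rho = 0" if reject_h0 else "fail to reject H0")
```

In simple linear regression $r^2$ equals the coefficient of determination, and $t_0^2$ equals the F statistic for significance of regression, so this test and the test on the slope lead to the same conclusion.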

38
Next Agenda
  • Chapter 13 deals with designing and conducting
    engineering experiments
  • ANOVA in designing single factor experiments will
    be emphasized