Title: Regression
1Regression
2Is there a relationship between two or more
quantitative variables?
- Univariate data: one variable is measured on each subject.
- Bivariate data: two variables are measured on each subject.
- Examples of bivariate data include
- Height and Arm Length
- Movie Theater Attendance and Concession Sales
- Diastolic and systolic blood pressure
- Elevation and air temperature
- There is typically one variable that is used to
predict the other.
- The variable used to predict is the independent,
explanatory, or predictor variable and is
represented by x.
- The variable we want to predict is the response
or dependent variable. It is represented by y.
3Scatter Plots
- Graphing method for understanding the
relationship between the two variables
4Is there a relationship between the amount of
mail at the post office and the number of
man-hours needed to process the mail?
6Is there a relationship between the number of
establishments in a community that sell alcohol
and the number of DUI arrests? Data from Jean
Blackwell, Diem-Trang Tran, Liane Jitchaku,
6/8/01.
10Is there a relationship between Suspended Solids
in water and Turbidity?
11Notice the outlier.
14Bivariate Data Analysis
- Scatter Plot
- Correlation
- Determining if there is significant Correlation
- Coefficient of Determination ($r^2$)
- The Least Squares Regression Line
- Analyzing the Residuals
- Possibly making a prediction of y based on x.
15Correlation
- The correlation shows the strength of the linear
relationship between x and y.
- r is the sample correlation.
- ρ is the population correlation.
- Correlation is the covariance of x and y divided
by the product of the standard deviations of x and y.
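16Correlation is the covariance of x and y divided
by the product of the standard deviations of x and y:
$r = \frac{s_{xy}}{s_x s_y}$, where $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$.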
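The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_correlation` and the five data pairs are made up for the example and do not come from the slides.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance of x and y (divides by n - 1).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations of x and y.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / (s_x * s_y)

# Hypothetical illustrative data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))  # prints 0.7746
```

For these made-up values the covariance is 1.5 and the standard deviations are about 1.581 and 1.225, which gives r of about 0.77, a fairly strong positive linear relationship.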
17Correlation
- $-1 \le r \le 1$,
- where $r = 0$ means no correlation
- $r = 1$ or $r = -1$ means perfect positive or negative
correlation.
- Correlation does not prove cause and effect.
18Determining if there is significant Correlation
- Use Excel.
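The slides defer this step to Excel and do not say which procedure is used there. One common approach, shown here only as an assumed sketch, is the t test for a correlation: under the null hypothesis of no linear relationship, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ follows a t distribution with n - 2 degrees of freedom. The function name and the input values are hypothetical.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for testing H0: population correlation = 0."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical example: r = 0.7746 computed from n = 5 data pairs.
t = correlation_t_statistic(0.7746, 5)
print(round(t, 2))  # prints 2.12

# The two-sided 5% critical value of t with 3 degrees of freedom is 3.182,
# so this sample correlation would not be judged significant at that level.
```

With so few points, even a fairly large r can fail to reach significance, which is why the test is worth running before interpreting the correlation.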
19Coefficient of Determination
- It is common to report the $r^2$ value. This value
represents the proportion of the total variation
in the y values that can be explained by the
linear relationship between x and y.
- $0 \le r^2 \le 1$.
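To make "proportion of variation explained" concrete, the sketch below computes $r^2$ as one minus the ratio of unexplained variation (squared deviations from the fitted least squares line) to total variation (squared deviations from the mean of y). The function name and data are illustrative assumptions, not material from the slides.

```python
def r_squared(xs, ys):
    """Proportion of the variation in y explained by a straight-line fit on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # slope of the least squares line
    a = y_bar - b * x_bar    # y-intercept of the least squares line
    # Unexplained variation: squared vertical distances from the line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances from the mean of y.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical illustrative data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 2))  # prints 0.6
```

Here 60% of the variation in the made-up y values is explained by the line. The same number results from squaring the sample correlation of these data (about 0.7746).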
20Least Squares Regression Line
- To model the relationship between x and y, we use
the line that is known as the least squares
regression line.
- Performing the regression is often stated as
"regress y on x."
- The slope of the line is given by b.
- The y-intercept is given by a, so the fitted line
is $\hat{y} = a + bx$.
- The difference between the actual value of y and
the predicted value is called the residual (the
vertical deviation). It represents the error
in the prediction. The ith residual is the observed
response minus the predicted response:
$e_i = y_i - \hat{y}_i$.
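As a minimal sketch of the calculation, the slope and intercept of the least squares line can be computed from the standard formulas $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. The function names and the data below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def least_squares_line(xs, ys):
    """Return (a, b): y-intercept and slope of the least squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys, a, b):
    """Observed response minus predicted response for each data pair."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical illustrative data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
res = residuals(x, y, a, b)
prediction = a + b * 3.5  # predicted y at x = 3.5, inside the observed x range

print(round(a, 2), round(b, 2))       # prints 2.2 0.6
print([round(e, 2) for e in res])     # prints [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(prediction, 2))           # prints 4.3
```

The residuals of a least squares line with an intercept always sum to zero (up to rounding), which is a quick check on the arithmetic. Predictions are most trustworthy for x values inside the range of the observed data.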