Title: Examining Linear Relationships: Correlation and Regression
1Examining Linear Relationships: Correlation and
Regression
Topics
- Scatterplots
- Correlation review
- Least squares regression line
- Interpretation of regression model
2Examining Linear Relationships: Correlation and
Regression
- We can examine the relationship between two
numerical variables by observing a scatterplot.
- The simplest relationship is linear.
3Creating the Scatterplot
- In regression analysis, the variable of main
interest is called the response variable.
- The variable that we believe affects or drives
the response variable is called the explanatory
variable.
4Creating the Scatterplot
- We always put the explanatory variable on the
horizontal axis and the response variable on the
vertical axis.
- If a store manager believes that large
promotional expenditures cause larger values of
sales, he should put Sales on the vertical axis
and Expenditure on the horizontal axis.
5Examining Linear Relationships: Correlation and
Regression
- The relationship is strong if the points in a
scatterplot cluster tightly around some straight
line.
- If this line rises from left to right, the
relationship is positive. If it falls from left
to right, the relationship is negative.
6Examining Linear Relationships: Correlation and
Regression
- The correlation coefficient r is a numerical
measure of the strength of the linear
relationship between two variables.
- r = 1 for a perfect positive linear relationship
- r = -1 for a perfect negative linear relationship
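The two extreme values of r listed above can be checked with a short sketch. This is a minimal illustration, assuming the usual sample (Pearson) correlation formula; the function name `pearson_r` and the data are made up for this example and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # points exactly on a rising line
falling = [10 - 3 * v for v in x]  # points exactly on a falling line

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
print(r_rising, r_falling)
```

Points that lie exactly on a rising line give r = 1 and points exactly on a falling line give r = -1; real data with scatter around the line falls somewhere in between.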
8Assessing Linear Relationships with Correlation
Coefficient r
9Regression Line
- Whenever the correlation is high enough to
indicate at least a moderately strong linear
relationship between the two variables, we usually
draw a line through the scatterplot points to
model the relationship.
- The best fitting line through the points is
called the regression line.
10Regression Line
- The regression line is a straight line that
describes how the response variable Y changes as
an explanatory variable X changes.
- The ultimate goal is often to use the regression
line to predict the value of Y for a given value
of X.
11Least Squares Estimation
- The best fitting (regression) line through the
points is obtained by satisfying the following
condition:
- The sum of the squares of the vertical distances
from the sample points to the line must be as
small as possible.
- This method is known as least squares estimation.
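The least squares condition above can be sketched in a few lines. This is a minimal illustration, assuming the standard closed-form solution for a single explanatory variable; the function names `least_squares_line` and `sum_squared_errors` and the data are hypothetical and not part of the slides.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (not the Pharmex data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
best = sum_squared_errors(x, y, a, b)

# Any other line, such as one with a slightly different slope or
# intercept, has a sum of squared distances at least as large.
nudged_slope = sum_squared_errors(x, y, a, b + 0.1)
nudged_intercept = sum_squared_errors(x, y, a + 0.1, b)
print(a, b, best, nudged_slope, nudged_intercept)
```

Comparing `best` with the two nudged values shows the defining property: moving the fitted line in any direction increases the sum of squared vertical distances.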
12Scenario for Regression Example
- To see how effective their advertising and other
promotional activities are, the Pharmex drugstore
collected data from 50 randomly selected
metropolitan regions.
- In each region it compared Pharmex's promotional
expenditures and sales to those of the leading
competitor in the region over the past year.
13Regression Scenario Variable Names
- Promote: Pharmex's promotional expenditures as a
percentage of those of the leading competitor
- Sales: Pharmex's sales as a percentage of those
of the leading competitor
14Regression Scenario Scatterplot
15Regression Scenario Least Squares Equation
predicted Sales = 25.126 + 0.762 × Promote
16Generic Equation of a Straight Line
Y = a + bX
- Slope, b: the change in the mean value of Y for
each unit increase in X.
- Intercept, a: (theoretically) the mean value of
Y when X equals zero. May not have a practical
meaning. Valid only if the data includes
observations with X = 0.
17Interpretation of Slope and Intercept Pharmex
Regression
predicted Sales = 25.126 + 0.762 × Promote
(intercept = 25.126, slope = 0.762)
- Slope: For each unit increase in the promotional
expenses index, the sales index is expected to
increase by about 0.76.
- Intercept: In a region that does no promotions,
the sales index is expected to be about 25.1
(theoretical).
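The slope and intercept readings above can be checked by plugging values into the fitted equation. The sketch below uses the two coefficients given on the slide; the function name `predict_sales` and the Promote values passed to it are assumptions chosen only for illustration.

```python
# Coefficients taken from the Pharmex least squares equation on the slide
INTERCEPT = 25.126
SLOPE = 0.762

def predict_sales(promote):
    """Predicted Sales index for a given Promote index (both in percent)."""
    return INTERCEPT + SLOPE * promote

# Intercept: predicted Sales index when Promote is zero (theoretical)
at_zero = predict_sales(0)

# Slope: change in predicted Sales for a one-unit increase in Promote
# (50 and 51 are arbitrary illustrative values)
one_unit_change = predict_sales(51) - predict_sales(50)

# Matching the competitor's promotional spending (Promote = 100)
at_parity = predict_sales(100)

print(at_zero, one_unit_change, at_parity)
```

The one-unit difference equals the slope regardless of the starting value of Promote, which is what "for each unit increase" means for a straight line.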
18Causation
- Unless the data is obtained in a carefully
controlled experiment, we should never make
definitive statements about causation in
regression analysis.
- Reason: we can almost never rule out the
possibility that some other (lurking) variable is
causing the variation in both of the observed
variables.
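The lurking variable argument above can be illustrated with a small simulation. This is a sketch under stated assumptions: the data is artificial, the variable names and noise levels are made up, and neither observed variable is constructed from the other; both depend only on a third, hidden variable.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000

# Hidden (lurking) variable that drives both observed variables
lurking = [rng.gauss(0, 1) for _ in range(n)]

# x and y each depend on the lurking variable plus independent noise;
# x is never used to compute y, and y is never used to compute x
x = [z + rng.gauss(0, 0.5) for z in lurking]
y = [z + rng.gauss(0, 0.5) for z in lurking]

r_xy = pearson_r(x, y)
print(round(r_xy, 2))
```

The two observed variables come out strongly correlated even though neither causes the other, so a regression of y on x fitted to such data would say nothing about what happens if x were changed directly.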