Title: Bivariate Relationships Between Interval/Ratio Level Variables
1. Bivariate Relationships Between Interval/Ratio Level Variables
- Correlation Coefficient (r)
- Regression Analysis
2. Scatterplots to examine a relationship between X and Y
3. Scatterplots
4. Positive Relationship
5. Negative Relationship
6. No Relationship (Independence)
7. Covariance
- The correlation coefficient is based on the covariance.
- For a sample, the covariance is calculated as
- $s_{xy} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$
- Interpretation: Covariance tells us how variation in one variable goes with variation in another variable (how the two covary).
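
The sample covariance formula can be sketched in a few lines of Python. The data values and the function name `sample_covariance` are made up for illustration and do not come from the slides.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, divided by N - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # (Xi - X-bar)(Yi - Y-bar), summed over all observations
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - 1)

# Hypothetical data: Y tends to rise with X, so the covariance is positive
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))  # 1.5
```

A positive result means above-average X values tend to go with above-average Y values; a negative result means the opposite.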
8. Covariance
- Two variables that are statistically independent (perfectly unrelated) have covariance = 0.
- Positive relationships are indicated by a positive (+) value, negative relationships by a negative (-) value.
- Problem with covariance as a measure of association?
9. Correlation
- Correlation Coefficient (Pearson's r)
- A way of standardizing the covariance.
- $r_{xy} = s_{xy} / (s_x s_y)$
- Interpretation: Measures the strength of a linear relationship.
- $-1 \le r \le 1$
- X and Y are linearly unrelated (uncorrelated) iff $r_{xy} = 0$; statistically independent variables have $r_{xy} = 0$
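
A minimal sketch of the standardization, using hypothetical data and helper names (`sample_cov`, `pearson_r`) that are not from the slides. It also illustrates one well-known limitation of the covariance: its size depends on the units of measurement, while r does not.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the N - 1 denominator."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """r_xy = s_xy / (s_x * s_y).

    A variable's covariance with itself is its variance, so the
    standard deviation is the square root of sample_cov(v, v).
    """
    s_x = math.sqrt(sample_cov(x, x))
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]  # the same variable in different units

# The covariance changes with the units of X ...
print(sample_cov(x, y), sample_cov(x_rescaled, y))
# ... but the correlation coefficient does not (about 0.77 in both cases)
print(round(pearson_r(x, y), 2), round(pearson_r(x_rescaled, y), 2))
```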
10. Example: The Determinants of State Welfare Generosity
- What explains variation in the generosity of state welfare expenditures? (STATES 55)
- 110 - Clinton
- 97 - Teenmom
- 40 - STATETAX
- 146 - FLEGIS
11. Regression Analysis
- Regression is concerned with the dependence of one variable (the dependent variable, measured at the interval/ratio level) on one or more other variables (independent variables, measured at the interval, ratio, ordinal, or nominal levels).
- Bivariate vs. multivariate regression analysis
- Y is used as the dependent variable and X as the independent variable.
12. Regression vs. Correlation
- The correlation coefficient measures the strength of a linear association between two variables measured at the interval level.
- In a scatterplot, this is the degree to which the points cluster around a best-fitting line.
13. Regression vs. Correlation
- The purpose of regression analysis is to determine exactly what that line is (i.e., to estimate the equation for the line).
- The regression line represents predicted values of Y based on the value of X.
14. Equation for a Line (Perfect Linear Relationship)
- $Y_i = a + bX_i$
- a = Intercept, or Constant: the value of Y when X = 0
- b = Slope coefficient: the change (+ or -) in Y given a one-unit increase in X
15. Linear Equation for a Regression Model (with error)
- $Y_i = a + bX_i + e_i$
- Residual ($e_i$): for every observation, the difference between the observed value of Y and the regression line
16. Estimating the Regression Coefficients
- Using statistical calculations, for any relationship between X and Y, we can determine the best-fitting line for the relationship.
- This means finding specific values for a and b for the regression equation:
- $Y_i = a + bX_i + e_i$
17. Estimating the Regression Coefficients
- Regression analysis finds the line that minimizes the sum of squared residuals.
- $Y_i = a + bX_i + e_i$
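
For the bivariate case, the line that minimizes the sum of squared residuals has a closed-form solution: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept is the mean of Y minus the slope times the mean of X. These formulas are standard least-squares results rather than something stated on the slides, and the data and function names below are hypothetical.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

def sum_squared_residuals(a, b, x, y):
    """Sum over observations of (observed Y - predicted Y) squared."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print(round(a, 3), round(b, 3))

# Nudging either coefficient away from the fitted values makes the fit worse
best = sum_squared_residuals(a, b, x, y)
print(best < sum_squared_residuals(a + 0.1, b, x, y))
print(best < sum_squared_residuals(a, b + 0.1, x, y))
```

The last two lines are only a spot check of the minimizing property, not a proof.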
18. Interpreting the Regression Coefficients
- a = the expected value of Y when X = 0
- b = the expected change in Y given a one-unit increase in X
- $Y_i = a + bX_i + e_i$
19. Calculating Predicted Values
- We can calculate a predicted value for the dependent variable for any value of X by using the equation for the regression line:
- $\hat{Y}_i = a + bX_i$
20. Calculating Predicted Values for Y from a Regression Equation: The 2000 Election
- Research question: Did the butterfly ballot result in an unusual number of votes for Pat Buchanan in the 2000 election in Palm Beach Co.?
- Unit of analysis: Fla. counties (66 counties, all but Palm Beach)
- Dependent variable (Y): vote for Buchanan in 2000
- Independent variable (X): vote for Buchanan in 1996
21. Calculating Predicted Values for Y from a Regression Equation: The 2000 Election
- The estimated regression equation is
- Vote(2000) = 40.60879 + .0739 × Vote(1996)

      Source |       SS       df       MS          Number of obs =      66
    ---------+------------------------------       F(1, 64)      =  378.53
       Model |  2832257.19     1  2832257.19       Prob > F      =  0.0000
    Residual |  478868.813    64   7482.3252       R-squared     =  0.8554
    ---------+------------------------------       Adj R-squared =  0.8531
       Total |  3311126.00    65    50940.40       Root MSE      =   86.50

    ---------------------------------------------------------------------------
       buch2000 |     Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
    ------------+--------------------------------------------------------------
    Buch1996(b) |  .0739179   .0037993   19.456   0.000    .066328    .0815079
       _cons(a) |  40.60879   13.85208    2.932   0.005   12.93607    68.28151
    ---------------------------------------------------------------------------
22. Regression Example: The 2000 Election
- To generate a predicted value for Palm Beach in 2000, we can simply plug in the appropriate X value and solve for Y.
- In 1996, Buchanan received 8788 votes in Palm Beach. Our prediction for Palm Beach in 2000 based on this regression is
- 40.6088 + .0739(8788) = 690.04
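
The same plug-in calculation can be sketched in Python using the coefficients reported in the regression output. The function name `predict_vote_2000` is made up for illustration.

```python
# Coefficients as reported in the regression output
a = 40.60879     # _cons (intercept)
b = 0.0739179    # slope on Buchanan's 1996 vote

def predict_vote_2000(vote_1996):
    """Predicted 2000 Buchanan vote for a county, given its 1996 Buchanan vote."""
    return a + b * vote_1996

palm_beach_1996 = 8788
print(round(predict_vote_2000(palm_beach_1996), 1))  # 690.2
```

Using the unrounded slope gives about 690.2; rounding the slope to .0739, as in the hand calculation, gives 690.04. The difference is rounding only.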
23. The 2000 Election (FL)
[Figure: plot with the Palm Beach observation labeled]
24. Calculating Residuals
- We can calculate the residual for any observation by first calculating the predicted value for Y, and then subtracting the predicted value from the observed value of Y:
- $e_i = Y_i - \hat{Y}_i$
25. Interpreting Residuals
- For any observation in our data, the residual represents the prediction error for that observation (based on the regression equation):
- $e_i = Y_i - \hat{Y}_i$
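
A minimal sketch of the residual calculation. The data, the coefficients a = 2.2 and b = 0.6, and the function name `residuals` are hypothetical and chosen only for illustration.

```python
def residuals(a, b, x, y):
    """Residual for each observation: observed Y minus predicted Y (a + b*X)."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data and regression coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

for xi, yi, ei in zip(x, y, residuals(a, b, x, y)):
    print(f"X={xi}  Y={yi}  predicted={a + b * xi:.1f}  residual={ei:+.1f}")
```

A positive residual means the observation lies above the regression line (the equation under-predicts Y); a negative residual means it lies below the line.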
26. Regression Analysis and Statistical Significance
- Testing for statistical significance for the slope
- The p-value: the probability of observing a sample slope at least as far from 0 as the one we observe in our sample, IF THE NULL HYPOTHESIS IS TRUE
- P-values closer to 0 are stronger evidence against the null hypothesis (.05 is usually the threshold for statistical significance)
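
As a worked check against the output reported for the 2000 election example, the t statistic for each coefficient is the estimate divided by its standard error. This is a standard result that the slides do not spell out.

```latex
t_b = \frac{b}{SE(b)} = \frac{0.0739179}{0.0037993} \approx 19.456,
\qquad
t_a = \frac{a}{SE(a)} = \frac{40.60879}{13.85208} \approx 2.932
```

Both values match the t column in the output. The reported p-value for the slope (0.000) is below the .05 threshold, so the slope is statistically significant by the usual standard.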
27. The Fit of the Regression Line
- The R-squared: the proportion of variation in the dependent variable (Y) explained by the independent variable (X).
- In bivariate regression analysis, it is simply the square of the correlation coefficient (r).
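
A short sketch that recovers the R-squared from the sums of squares reported in the 2000 election output. The relationship used here (R-squared = Model SS / Total SS) is the standard definition and is not stated explicitly on the slides.

```python
import math

# Sums of squares as reported in the 2000 election regression output
model_ss = 2832257.19
total_ss = 3311126.00

# Share of the total variation in Y that the regression accounts for
r_squared = model_ss / total_ss
print(round(r_squared, 4))  # 0.8554, matching the reported R-squared

# In the bivariate case, |r| is the square root of R-squared;
# the sign of r matches the sign of the slope (positive here).
r = math.sqrt(r_squared)
print(round(r, 3))  # about 0.925
```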
28. Summary of Regression Statistics
- Intercept (a)
- Slope (b)
- Predicted values of Y
- Residuals
- P-value for the slope
- R-squared
29. Examples of Regression Analysis
- What explains variation in the generosity of state welfare expenditures? (STATES 55)
- 110 - Clinton
- 97 - Teenmom
- 40 - STATETAX
- 146 - FLEGIS