Bivariate Relationships Between Interval/Ratio Level Variables

Provided by: rfor6

1
Bivariate Relationships Between Interval/Ratio
Level Variables
  • Correlation Coefficient (r)
  • Regression Analysis

2
Scatterplots to examine a relationship between
X and Y

3
Scatterplots

4
Positive Relationship

5
Negative Relationship

6
No Relationship (Independence)

7
Covariance
  • The correlation coefficient is based on the
    covariance.
  • For a sample, the covariance is calculated as
  • $s_{xy} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$
  • Interpretation: Covariance tells us how variation
    in one variable goes with variation in another
    variable (how the two covary).
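
The covariance formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name and the toy data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations, over N - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Hypothetical toy data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
print(cov_xy)  # positive: Y tends to be above its mean when X is above its mean
```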

8
Covariance
  • When two variables are statistically independent
    (perfectly unrelated), their covariance = 0.
  • Positive relationships are indicated by a + value,
    negative relationships by a - value.
  • Problem with covariance as a measure of
    association?

9
Correlation
  • Correlation Coefficient (Pearson's r)
  • A way of standardizing the covariance.
  • $r_{xy} = s_{xy} / (s_x s_y)$
  • Interpretation: Measures the strength of a linear
    relationship.
  • $-1 \le r \le 1$
  • X and Y are uncorrelated (no linear relationship)
    iff $r_{xy} = 0$; independent variables always
    have $r_{xy} = 0$
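
As a rough sketch of why standardizing helps, the code below computes Pearson's r as the covariance divided by the product of the two standard deviations, then rescales X. The function name and toy data are hypothetical; the point is only that the covariance changes with the units of X while r does not.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sample covariance divided by the product of the
    sample standard deviations of X and Y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical toy data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
# Measuring X in different units (here, 100 times larger) leaves r unchanged,
# although the covariance itself would be 100 times larger.
r_rescaled = pearson_r([100 * v for v in x], y)
print(r, r_rescaled)
```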

10
Example: The Determinants of State Welfare
Generosity
  • What explains variation in the generosity of state
    welfare expenditures? (STATES 55)
  • 110 - Clinton
  • 97 - Teenmom
  • 40 - STATETAX
  • 146 - FLEGIS

11
Regression Analysis
  • Regression is concerned with the dependence of one
    variable (the dependent variable, measured at the
    interval/ratio level) on one or more other
    variables (independent variables, measured at the
    interval, ratio, ordinal or nominal levels).
  • Bivariate vs. multivariate regression analysis
  • Y is used as the dependent variable and X as the
    independent variable.

12
Regression vs. Correlation
  • The correlation coefficient measures the strength
    of a linear association between two variables
    measured at the interval level
  • In a scatterplot, this is the degree to which the
    points cluster around a best-fitting line

13
Regression vs. Correlation
  • The purpose of regression analysis is to
    determine exactly what that line is (i.e. to
    estimate the equation for the line)
  • The regression line represents predicted values
    of Y based on the value of X

14
Equation for a Line (Perfect Linear Relationship)
  • $Y_i = a + bX_i$
  • a = Intercept, or Constant: the value of Y
    when X = 0
  • b = Slope coefficient: the change (+ or -) in Y
    given a one-unit increase in X

15
Linear Equation for a Regression Model (with
error)
  • $Y_i = a + bX_i + e_i$
  • Residual ($e_i$): for every observation, the
    difference between the observed value of Y and
    the regression line

16
Estimating the Regression Coefficients
  • Using statistical calculations, for any
    relationship between X and Y, we can determine
    the best-fitting line for the relationship
  • This means finding specific values for a and b
    for the regression equation
  • $Y_i = a + bX_i + e_i$

17
Estimating the Regression Coefficients
  • Regression analysis finds the line that minimizes
    the sum of squared residuals
  • $Y_i = a + bX_i + e_i$
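
The slides do not show how the minimizing values of a and b are computed. One common route, sketched here as an assumption rather than as the presenter's method, uses the standard closed-form least-squares formulas: b is the sum of cross-products of deviations divided by the sum of squared deviations of X, and a is the mean of Y minus b times the mean of X. Function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimates (a, b) for Y = a + bX + e."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared differences between observed Y and the line a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = sum_squared_residuals(x, y, a, b)
# Any other line, such as one with a slightly steeper slope, fits worse.
other = sum_squared_residuals(x, y, a, b + 0.1)
print(a, b, best, other)
```

Comparing `best` with `other` is only a spot check against one alternative line, not a proof that the fitted line is the minimum.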

18
Interpreting the Regression Coefficients
  • a = the expected value of Y when X = 0
  • b = the expected change in Y given a one-unit
    increase in X
  • $Y_i = a + bX_i + e_i$

19
Calculating Predicted Values
  • We can calculate a predicted value for the
    dependent variable for any value of X by using
    the regression equation for the regression line
  • $\hat{Y}_i = a + bX_i$

20
Calculating Predicted Values for Y from a
Regression Equation: The 2000 Election
  • Research Question: Did the butterfly ballot
    result in an unusual number of votes for Pat
    Buchanan in the 2000 election in Palm Beach Co.?
  • Unit of analysis: Florida counties (66 counties,
    all but Palm Beach)
  • Dependent variable (Y): vote for Buchanan in
    2000
  • Independent variable (X): vote for Buchanan in
    1996

21
Calculating Predicted Values for Y from a
Regression Equation: The 2000 Election
  • The estimated regression equation is
  • $\text{Vote}(2000) = 40.60879 + .0739 \times \text{Vote}(1996)$

    Source    |          SS   df          MS
    ----------+-----------------------------
    Model     |  2832257.19    1  2832257.19
    Residual  |  478868.813   64   7482.3252
    ----------+-----------------------------
    Total     |  3311126.00   65    50940.40

    Number of obs = 66
    F(1, 64)      = 378.53
    Prob > F      = 0.0000
    R-squared     = 0.8554
    Adj R-squared = 0.8531
    Root MSE      = 86.50

    buch2000     |     Coef.  Std. Err.       t  P>|t|  [95% Conf. Interval]
    -------------+----------------------------------------------------------
    Buch1996 (b) |  .0739179   .0037993  19.456  0.000   .066328   .0815079
    _cons (a)    |  40.60879   13.85208   2.932  0.005  12.93607   68.28151

22
Regression Example: The 2000 Election
  • To generate a predicted value for Palm Beach in
    2000, we could simply plug in the appropriate X
    value and solve for Y.
  • In 1996, Buchanan received 8788 votes in Palm
    Beach. Our prediction for Palm Beach in 2000
    based on this regression is
  • $40.6088 + .0739 \times 8788 = 690.04$
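
The plug-in calculation above can be reproduced directly. The sketch below takes the intercept and the rounded slope from the slides; the function and constant names are hypothetical.

```python
INTERCEPT = 40.60879      # _cons (a) from the regression output
SLOPE_ROUNDED = 0.0739    # slope (b) as rounded in the slide's equation
SLOPE_FULL = 0.0739179    # slope (b) as reported in the coefficient table


def predict_vote_2000(vote_1996, a=INTERCEPT, b=SLOPE_ROUNDED):
    """Predicted 2000 Buchanan vote for a county, given its 1996 vote."""
    return a + b * vote_1996


palm_beach_1996 = 8788
rounded_prediction = predict_vote_2000(palm_beach_1996)
full_prediction = predict_vote_2000(palm_beach_1996, b=SLOPE_FULL)
print(round(rounded_prediction, 2))  # 690.04, matching the slide
print(round(full_prediction, 2))     # slightly higher with the unrounded slope
```

The two results differ by a fraction of a vote, which only reflects rounding of the slope in the slide's equation.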

23
The 2000 Election (FL)
[Figure: plot with the Palm Beach observation labeled]

24
Calculating Residuals
  • We can calculate the residual for any observation
    by first calculating the predicted value for Y,
    and then subtracting the predicted value from the
    observed value of Y
  • $e_i = Y_i - \hat{Y}_i$

25
Interpreting Residuals
  • For any observation in our data, the residual
    represents the prediction error for that
    observation (based on the regression equation)
  • $e_i = Y_i - \hat{Y}_i$
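
A small sketch of the residual calculation, observed Y minus predicted Y, follows. The data and the line (a = 2.2, b = 0.6, the least-squares fit to this toy data) are hypothetical and are not taken from the election example.

```python
def residuals(xs, ys, a, b):
    """Residual for each observation: observed Y minus predicted Y (a + bX)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical toy data and its least-squares line (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

e = residuals(x, y, a, b)
for xi, yi, ei in zip(x, y, e):
    # A positive residual means the line under-predicts that observation.
    print(xi, yi, round(ei, 2))
```

For a least-squares line with an intercept, the residuals sum to zero, so large individual residuals stand out as observations the line predicts poorly.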

26
Regression Analysis and Statistical Significance
  • Testing for statistical significance for the
    slope
  • The p-value: the probability of observing a sample
    slope at least as far from 0 as the one we
    observe in our sample IF THE NULL HYPOTHESIS IS
    TRUE
  • P-values closer to 0 indicate stronger evidence
    against the null hypothesis (.05 is usually the
    threshold for statistical significance)
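
The slides do not spell out where the p-value comes from. In standard regression output it is derived from a t statistic, which is the coefficient divided by its standard error. The sketch below only reproduces the t column of the 2000 election output from the reported coefficients and standard errors; the function name is hypothetical.

```python
def t_statistic(coef, std_err):
    """t statistic for testing whether a coefficient differs from 0."""
    return coef / std_err


# Values copied from the coefficient table for the 2000 election regression
slope_t = t_statistic(0.0739179, 0.0037993)
intercept_t = t_statistic(40.60879, 13.85208)

print(round(slope_t, 3))      # 19.456, as in the t column for Buch1996
print(round(intercept_t, 3))  # 2.932, as in the t column for _cons
```

The larger the absolute t statistic, the smaller the p-value, which is why the slope's p-value is reported as 0.000.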

27
The Fit of the Regression Line
  • R-squared = the proportion of variation in
    the dependent variable (Y) explained by the
    independent variable (X).
  • In bivariate regression analysis it is simply the
    square of the correlation coefficient (r)
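
As an illustration, R-squared can be recovered from the sums of squares in the 2000 election output as Model SS divided by Total SS, and r as its square root. The variable names are hypothetical; taking the positive root is an assumption justified here by the positive slope.

```python
import math

# Sums of squares copied from the 2000 election regression output
model_ss = 2832257.19
residual_ss = 478868.813
total_ss = 3311126.00

# Proportion of the variation in Y explained by X
r_squared = model_ss / total_ss

# In bivariate regression r has the same sign as the slope (positive here)
r = math.sqrt(r_squared)

print(round(r_squared, 4))  # 0.8554, matching the reported R-squared
print(round(r, 4))
```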

28
Summary of Regression Statistics
  • Intercept (a)
  • Slope (b)
  • Predicted values of Y
  • Residuals
  • P-value for the slope
  • R-squared

29
Examples of Regression Analysis
  • What explains variation in the generosity of state
    welfare expenditures? (STATES 55)
  • 110 - Clinton
  • 97 - Teenmom
  • 40 - STATETAX
  • 146 - FLEGIS