Title: Measures of Association and Regression Introduction
1Measures of Association and Regression
Introduction
2Measures of Association
- Ask the question, "How strong is the relationship?" or "How well can we predict y using x?"
- What is the difference between MoA and...
  - Chi-square or other hypothesis test results?
- Recall the difference between regression and correlation
3How Strong is the Relationship?
4How Strong is the Relationship?
- Without an independent variable, our best prediction is a measure of central tendency (the mode).
- With an independent variable, we might be able to improve our ability to predict the dependent variable.
- How much better can we predict the dependent variable with another variable than when we only had the mode?
5Gender and Guns
- Mode = Favor Gun Ban
- If all we knew was this, our best prediction would be that everyone favors the gun ban.
- We would be right 807 times and in error 707 times.
6Gender and Guns
- When we add gender to the mix, we would predict that all females favor (449 right, 226 wrong) and all males oppose (481 right, 358 wrong).
- That gives us a total of 930 right predictions and 584 errors.
7How much better do we do?
- Without knowing gender, we made 707 errors
- Knowing gender, we reduced that to 584 errors
- We made 123 fewer errors by knowing gender.
- We can show, as a proportion, how big the reduction in error is: (707 - 584) / 707 = 123 / 707 ≈ 0.17
- We make about 17% fewer errors than we did knowing the mode alone.
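The error counts above can be reproduced from the crosstab itself. The following is a minimal sketch, assuming the four cell counts implied by the two Gender and Guns slides; the function name `lambda_pre` is introduced here for illustration only.

```python
# Gender-by-opinion counts implied by the slides:
# females: 449 favor, 226 oppose; males: 358 favor, 481 oppose.
table = {
    "female": {"favor": 449, "oppose": 226},
    "male": {"favor": 358, "oppose": 481},
}

def lambda_pre(table):
    """Return (errors with mode only, errors with the I.V., lambda)."""
    categories = list(next(iter(table.values())).keys())
    # Column totals give the overall mode of the dependent variable.
    col_totals = {c: sum(row[c] for row in table.values()) for c in categories}
    n = sum(col_totals.values())
    errors_mode_only = n - max(col_totals.values())
    # Within each row, predict that row's own modal category.
    errors_with_iv = sum(sum(row.values()) - max(row.values())
                         for row in table.values())
    reduction = (errors_mode_only - errors_with_iv) / errors_mode_only
    return errors_mode_only, errors_with_iv, reduction

e1, e2, lam = lambda_pre(table)
print(e1, e2, round(lam, 3))  # prints: 707 584 0.174
```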
8Proportional Reduction of Error
- This framework is called PRE (proportional reduction of error).
- This measure, lambda (λ), is appropriate for crosstabs that involve a nominal variable.
- Similar logic can be applied to analyses of two ordinal variables.
9Ordinal PRE Measures
- Basic Logic
  - Look at each case
  - Did the measure of central tendency alone get it right?
  - Did adding the independent variable get it right?
  - Compare to see which is better
- Criterion for good choices?
  - Fits your research needs
  - Conservative estimate
10Choices
- Gamma
  - Drops cases that are ties
  - Overestimates strength
- Kendall's tau-b
  - Allows for ties, best for square tables
- Kendall's tau-c
  - Allows for ties, best for non-square tables
- Somers' d
  - Variant of tau statistics, not quite as common
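One way to see why gamma tends to come out larger than tau-b is to count pairs of cases directly. The sketch below uses a made-up 3x3 table of two ordinal variables (the counts are hypothetical, not course data) and the usual textbook definitions: gamma = (C - D) / (C + D), which ignores tied pairs, and tau-b, which keeps tied pairs in the denominator. The helper names are illustrative.

```python
import math

def pair_counts(table):
    """Count concordant and discordant pairs in an ordered table of counts."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for r1 in range(rows):
        for c1 in range(cols):
            for r2 in range(r1 + 1, rows):  # rows below only: each pair once
                for c2 in range(cols):
                    pairs = table[r1][c1] * table[r2][c2]
                    if c2 > c1:
                        concordant += pairs
                    elif c2 < c1:
                        discordant += pairs
    return concordant, discordant

def untied_pairs(n, totals):
    """Pairs of cases that are NOT tied on the variable with these totals."""
    return n * (n - 1) / 2 - sum(t * (t - 1) / 2 for t in totals)

# Hypothetical counts: rows and columns both run low -> high.
table = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]
n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

c, d = pair_counts(table)
gamma = (c - d) / (c + d)
tau_b = (c - d) / math.sqrt(untied_pairs(n, row_totals) * untied_pairs(n, col_totals))
print(round(gamma, 3), round(tau_b, 3))
```

Because gamma drops every tied pair from the denominator while tau-b does not, gamma is the larger of the two for this table.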
11Nominal Measures
- If one or both variables are nominal, you should use lambda (λ).
- Sometimes lambda doesn't work well
  - Because we predict that all women approve and that all men approve, we get the same prediction with or without gender, so lambda is 0.
- We need something else
12Cramér's V (aka phi / φ)
- We know that the χ² statistic said there is a real relationship, so we know that its strength is greater than 0.
- Cramér's V is a mathematical manipulation of the χ² statistic that turns it into a measure of association.
- V = sqrt(χ² / (n × m)), where n is the number of cases and m is the lesser of (r-1) and (c-1), for a table with r rows and c columns
- This is not a PRE measure of association, though, so its interpretation is a little different.
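As a rough illustration of that manipulation, the sketch below computes the Pearson χ² (without a continuity correction) and then V = sqrt(χ² / (n × m)) for a table of counts. Reusing the gender and gun ban counts from the earlier slides is an assumption made here for the example; the function name `cramers_v` is illustrative.

```python
import math

def cramers_v(table):
    """Return (chi-square, Cramér's V) for a table of observed counts."""
    rows, cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    m = min(rows - 1, cols - 1)  # the lesser of (r-1) and (c-1)
    return chi2, math.sqrt(chi2 / (n * m))

# Rows: female, male. Columns: favor, oppose.
guns = [[449, 226], [358, 481]]
chi2, v = cramers_v(guns)
print(round(chi2, 2), round(v, 3))
```

For a 2x2 table like this one, m is 1, so V reduces to sqrt(χ² / n).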
13Interpreting PRE MoAs
- All lie on the interval [-1, 1]
- SPSS will calculate them all for you.
- General Guidelines
14Interpreting PRE MoAs
- SPSS may or may not get the direction of the association right (positive or negative). It depends on how the variables are coded.
- You should place the appropriate sign by your measure of association (but not by Cramér's V; it's always positive).
- A PRE measure tells you the proportion of errors made by the measure of central tendency alone that are no longer errors when we add the independent variable.
15Interpreting Cramér's V
- For our purposes, you may use the same table as for PRE results (weak, moderate).
- You cannot interpret it in terms of reduction of error.
16Directionality
- Some tests are directional while others are symmetric.
- In directional tests, you get different answers depending on which variable is the D.V. and which is the I.V.
- In symmetric tests, the answer is the same.
- Given the option, choose the directional test and specify the right D.V.
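Lambda is a convenient example of a directional measure. The sketch below, which assumes the gender and gun ban counts from the earlier slides, computes lambda twice: once predicting opinion from gender and once predicting gender from opinion. The function name `lambda_pre` is illustrative.

```python
def lambda_pre(table):
    """Lambda for predicting the column variable (D.V.) from the row variable (I.V.)."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    errors_mode_only = n - max(col_totals)
    errors_with_iv = sum(sum(row) - max(row) for row in table)
    return (errors_mode_only - errors_with_iv) / errors_mode_only

# Rows: female, male. Columns: favor, oppose.
by_gender = [[449, 226], [358, 481]]
# Transposed, so rows are opinion and columns are gender.
by_opinion = [list(col) for col in zip(*by_gender)]

lam_opinion_dv = lambda_pre(by_gender)   # D.V. = opinion on the gun ban
lam_gender_dv = lambda_pre(by_opinion)   # D.V. = gender
print(round(lam_opinion_dv, 3), round(lam_gender_dv, 3))
```

The two values differ, which is why the choice of D.V. has to be specified for a directional measure.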
17Finally,
- We only observe a sample, and Cramér's V or any of the PRE MoAs could be 0 in the population; we might observe a value higher than 0 by sampling error only!
- We can do tests of significance on many of these.
- SPSS will do this for you; I won't make you do it by hand.
18Regression
19Multiple Variables
- So far, we have learned
  - How to analyze one variable at a time (CI)
  - How to compare two means or proportions (a relationship between one variable measured at any level and a nominal or ordinal variable with two categories)
  - How to compare relationships between two nominal or ordinal level variables with any number of categories (using simple crosstabs)
- Today, we learn about relationships between two interval-level variables.
20Two Interests
- Magnitude of the relationship between the independent variable and the dependent variable (how much change in one yields how much change in the other).
- Correlation: the predictive power of one variable on another. This is a Measure of Association (but not a PRE Measure of Association).
21Correlation
22Types of Correlation
- Positive Correlation: an increase in one variable is associated with an increase in the other.
- Negative Correlation: an increase in one variable is associated with a decrease in the other.
23Correlation analysis asks
- How good a predictor is the independent variable of the dependent variable?
- How good a predictor of income is education?
- How accurate is our prediction of the effect of education on income?
- How close are the dots to the line?
- It is a Measure of Association
24Computing the Correlation Coefficient, Pearson's r
- Assess how much X and Y move together (covariance) out of the amount they move individually (variance)
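In symbols, that description corresponds to r = cov(X, Y) / sqrt(var(X) × var(Y)). A minimal sketch of the computation follows; the education and income values are made up for illustration, and the function name `pearson_r` is not from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: years of education and income (in thousands).
education = [8, 10, 12, 12, 14, 16, 18]
income = [9, 12, 14, 18, 17, 22, 25]
r = pearson_r(education, income)
print(round(r, 3))
```

In Stata, `corr education income` would report the same quantity for variables with those names.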
25Computing r in STATA
- corr var1 var2 ... var100
- Output looks like this

. corr unemplyd mdnincm flood age65 black
(obs=427)

             | unemplyd  mdnincm    flood    age65    black
-------------+---------------------------------------------
    unemplyd |   1.0000
     mdnincm |  -0.4960   1.0000
       flood |   0.0827   0.0083   1.0000
       age65 |   0.0319  -0.1634  -0.0272   1.0000
       black |   0.5037  -0.3065   0.0703  -0.1038   1.0000
26Interpreting the Correlation Coefficient
27Note: Correlation is Linear
[Figure: four scatterplot panels labeled r = 0, r = .9, r = 0, r = .8]
28Correlation and Regression
- Regression effects are depicted by the slope of the line.
- Correlation can be seen as the spread of points around the regression line. The greater the spread of points around the regression line, the less predictive X is of Y and, consequently, the weaker the correlation.
29Correlation = 1, Slope = 1
30Correlation = 1, Slope = -2
31Imperfect Correlation and Relationships
- We rarely see perfect correlation.
- However, even with imperfect correlation, we can have some expectation of what will happen on average.
- While correlation is rarely perfect, we can draw a line to summarize the trend in the data points. This is the Regression Line.
32Formula for a line
- y = mx + b (algebraic)
- y = a + bx (statistical)
- It is the same thing. We'll add one more thing: error
- yi = a + bxi + ei is called the sample regression function
- Yi = α + βxi + εi is called the population regression function
33Making Predictions
34Establishing Relationships
35Establishing Relationships
Now Add 5 years of education
10 Years of Education Means about 12,000 Income
It adds an Additional 4,000 of Income!
36Where do we Draw the Line?
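- Least Squares Principle: choose the line that minimizes the sum of squared errors (the vertical distances between the points and the line).
- Under the Gauss-Markov assumptions, the Ordinary Least Squares estimator is the Best Linear Unbiased Estimator
- OLS is BLUE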
37What is the estimator?
- In the bivariate case (1 dependent, 1
independent), the least squares principle gives
us these equations for calculating the slope and
intercept.
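The equations themselves did not survive in this transcript. The standard bivariate least squares solutions, which match the shape of the calculation on the "Calculating a and b" slide (a ratio for the slope, then the mean of y minus b times the mean of x), are:

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)},
\qquad
a = \bar{y} - b\,\bar{x}
```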
39Calculating a and b
- b = 54.73 / 4.19 = 13.06
- a = 235.7 - b(11.2)
- = 235.7 - 13.06(11.2)
- = 235.7 - 146.27
- = 89.43
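The arithmetic above can be checked in a few lines. This sketch assumes, from the form of the calculation, that 54.73 is the covariance of x and y, 4.19 is the variance of x, 235.7 is the mean of y, and 11.2 is the mean of x; the slide does not label them, so the variable names are guesses.

```python
# Assumed meanings of the slide's numbers (not labeled in the source).
cov_xy = 54.73   # covariance of x and y
var_x = 4.19     # variance of x
mean_y = 235.7   # mean of y
mean_x = 11.2    # mean of x

# Slope, rounded to two decimals as on the slide.
b = round(cov_xy / var_x, 2)
# Intercept: the fitted line passes through (mean_x, mean_y).
a = round(mean_y - b * mean_x, 2)

print(b, a)  # prints: 13.06 89.43
```

Carrying the unrounded slope through instead would shift the intercept slightly, to about 89.41.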
40STATA command for Regression
- regress y x
- Output looks like this

. regress turnout diplomau

      Source |       SS       df       MS              Number of obs =     426
-------------+------------------------------           F(  1,   424) =   55.40
       Model |  1.2806e+11     1  1.2806e+11           Prob > F      =  0.0000
    Residual |  9.8018e+11   424  2.3117e+09           R-squared     =  0.1156
-------------+------------------------------           Adj R-squared =  0.1135
       Total |  1.1082e+12   425  2.6076e+09           Root MSE      =   48081

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   2164.063   290.7549     7.44   0.000     1592.563    2735.564
       _cons |   172999.4   6300.284    27.46   0.000     160615.7    185383.1
------------------------------------------------------------------------------
41How do we interpret it?
- For now, we look at one key thing: the coefficients (slope and intercept)
- y = 172999.4 + 2164.063x
- Every 1-unit increase in the percentage of university diploma holders is associated with an increase in voter turnout of about 2,164 votes, on average.
- If there were a district with no university diploma holders, we would expect 172,999 people to turn out, on average.
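Both readings follow from plugging values into the fitted line. The sketch below uses the coefficients reported in the Stata output (`_cons` and `diplomau`); the function name `predicted_turnout` and the example percentages are illustrative.

```python
INTERCEPT = 172999.4   # _cons from the regression output
SLOPE = 2164.063       # coefficient on diplomau

def predicted_turnout(diploma_pct):
    """Predicted turnout for a district with the given percentage of diploma holders."""
    return INTERCEPT + SLOPE * diploma_pct

# A district with no diploma holders: the prediction is the intercept.
baseline = predicted_turnout(0)

# A one-point difference in the percentage shifts the prediction by the slope.
one_point_change = predicted_turnout(11) - predicted_turnout(10)

print(baseline, round(one_point_change, 3))
```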