Title: Agenda
1. Agenda
- Review Association for Nominal/Ordinal Data
- χ²-Based Measures, PRE Measures
- Introduce Association Measures for I-R Data
- Regression, Pearson's r, R²
- HOMEWORKS
- Two left, each due one week after posting
- HW 5 posted later this afternoon
- HW 6 posted on April 28
2. Why measures of association?
- What do significance tests tell us?
- What is the point of calculating measures of association?
- Which do you calculate first (significance tests or association measures), and why?
3. χ²-Based Measures
- How does χ² tap into association?
- Indicates how different our findings are from what is expected under the null
- Since the null is no relationship, a high χ² suggests a stronger relationship
- Why not just use χ²?
- Phi
- Cramér's V
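A minimal sketch of how the two χ²-based measures named above are usually computed. It uses the standard textbook formulas (phi = sqrt(χ²/N) for 2x2 tables, V = sqrt(χ²/(N · min(rows − 1, columns − 1)))). The function names and the input values are illustrative assumptions, not figures from these slides. Dividing by N is what removes the sample-size dependence that makes raw χ² a poor measure of strength.

```python
import math


def phi(chi2: float, n: int) -> float:
    """Phi for a 2x2 table: sqrt(chi-square / N)."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramer's V: sqrt(chi-square / (N * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical inputs for illustration only (not from the slides).
print(round(phi(10.0, 250), 2))              # 0.2
print(round(cramers_v(18.0, 200, 3, 4), 2))  # 0.21
```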
4. PRE Measures
- χ²-based measures have no intuitive meaning
- PRE measures: proportionate reduction in error
- Does knowing someone's value on the independent variable (e.g., sex) improve our prediction of their score/value on the dependent variable (e.g., whether or not criminal)?
- Lambda
- Gamma
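The PRE logic can be sketched for Lambda as (E1 − E2) / E1, where E1 is the number of prediction errors made while ignoring the independent variable and E2 is the number made while using it. The crosstab counts and the function name below are hypothetical, chosen only to illustrate the arithmetic.

```python
def lambda_pre(table):
    """Lambda for a crosstab: rows = DV categories, columns = IV categories."""
    n = sum(sum(row) for row in table)
    # E1: errors when always guessing the modal DV category, ignoring the IV.
    e1 = n - max(sum(row) for row in table)
    # E2: errors when guessing the modal DV category within each IV category.
    columns = list(zip(*table))
    e2 = sum(sum(col) - max(col) for col in columns)
    return (e1 - e2) / e1


# Hypothetical counts (not from the slides):
# rows = criminal record (yes, no), columns = sex (male, female).
table = [[30, 10],
         [20, 40]]
print(lambda_pre(table))  # 0.25
```

Here knowing sex would reduce prediction errors from 40 to 30, a 25% reduction.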
5. ORDINAL MEASURE OF ASSOCIATION
- GAMMA
- For examining the STRENGTH & DIRECTION of the relationship between collapsed ordinal variables (< 6 categories)
- Like Lambda, a PRE-based measure
- Range is -1.0 to +1.0
6. GAMMA
- Logic: Applying PRE to PAIRS of individuals
Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb
7. GAMMA
- CONSIDER THE KENNY-DEB PAIR
- In the language of Gamma, this is a "same" pair
- The direction of difference on one variable is the same as the direction on the other
- If you focused on the Kenny-Eric pair, you would come to the same conclusion
Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb
8. GAMMA
- NOW LOOK AT THE TIM-JOEY PAIR
- In the language of Gamma, this is a "different" pair
- The direction of difference on one variable is the opposite of the difference on the other
Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb
9. GAMMA
- Logic: Applying PRE to PAIRS of individuals
- Formula: $\gamma = \frac{\text{same} - \text{different}}{\text{same} + \text{different}}$
Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb
10. GAMMA
- If you were to account for all the pairs in this table, you would find that there are 9 "same" & 9 "different" pairs
- Applying the Gamma formula, we would get
- $\gamma = \frac{9 - 9}{9 + 9} = \frac{0}{18} = 0.0$
Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb
11. GAMMA
- 3-case example
- Applying the Gamma formula, we would get
- $\gamma = \frac{3 - 0}{3 + 0} = \frac{3}{3} = 1.00$
Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny
Middle                    Deb
High                                     Barb
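The pair-counting logic behind Gamma can be checked with a short sketch. It assumes each person is coded as a (class rank, prejudice rank) tuple, with 1 = lowest category. That coding and the function name are illustrative choices, not part of the slides. Pairs tied on either variable are skipped, which is how Gamma treats ties.

```python
from itertools import combinations


def gamma(cases):
    """Return (same, different, gamma) for a list of (rank_x, rank_y) tuples."""
    same = different = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            same += 1        # ordered the same way on both variables
        elif product < 0:
            different += 1   # ordered in opposite ways
        # product == 0: tied on at least one variable, so the pair is ignored
    return same, different, (same - different) / (same + different)


# (class rank, prejudice rank); class 1 = lower .. 3 = upper,
# prejudice 1 = low .. 3 = high, read off the 9-person table.
people = {
    "Kenny": (1, 1), "Tim": (2, 1), "Kim": (3, 1),
    "Joey": (1, 2), "Deb": (2, 2), "Ross": (3, 2),
    "Randy": (1, 3), "Eric": (2, 3), "Barb": (3, 3),
}

print(gamma(list(people.values())))                          # (9, 9, 0.0)
print(gamma([people[n] for n in ("Kenny", "Deb", "Barb")]))  # (3, 0, 1.0)
```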
12. Gamma Example 1
- Examining the relationship between
- FEHELP (Wife should help husband's career first)
- FEFAM (Better for man to work, woman to tend home)
- Both variables are ordinal, coded 1 (strongly agree) to 4 (strongly disagree)
13. Gamma Example 1
- Based on the info in this table, does there seem to be a relationship between these factors?
- Does there seem to be a positive or negative relationship between them?
- Does this appear to be a strong or weak relationship?
14. GAMMA
- Do we reject the null hypothesis of independence between these 2 variables?
- Yes, the Pearson chi-square p value (.000) is < alpha (.05)
- It's worthwhile to look at gamma.
- Interpretation
- There is a strong positive relationship between these factors.
- Knowing someone's view on a wife's first priority improves our ability to predict whether they agree that women should tend home by 75.5%.
15. ASSOCIATION BETWEEN INTERVAL-RATIO VARIABLES
16. Scattergrams
- Allow quick identification of important features of the relationship between interval-ratio variables
- Two dimensions
- Scores of the independent (X) variable (horizontal axis)
- Scores of the dependent (Y) variable (vertical axis)
17. 3 Purposes of Scattergrams
- 1. To give a rough idea about the existence, strength & direction of a relationship
- The direction of the relationship can be detected by the angle of the regression line
- 2. To give a rough idea about whether a relationship between 2 variables is linear (defined with a straight line)
- 3. To predict scores of cases on one variable (Y) from the score on the other (X)
18.
- IV and DV?
- What is the direction of this relationship?
19.
- IV and DV?
- What is the direction of this relationship?
20. The Regression Line
- Properties
- The sum of positive and negative vertical distances from it is zero
- The standard deviation of the points from the line is at a minimum
- The line passes through the point (mean x, mean y)
- Bivariate Regression Applet
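Two of these properties can be checked numerically with a small least-squares sketch. The five data points are hypothetical (not from the slides), and `fit_line` is an illustrative helper using the usual least-squares formulas for a and b.

```python
def fit_line(xs, ys):
    """Least-squares intercept (a) and slope (b) for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

# Vertical distances of each point from the line (positive and negative).
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

print(round(a, 2), round(b, 2))                # 2.2 0.6
print(abs(sum(residuals)) < 1e-9)              # True: distances sum to zero
print(abs((a + b * mean_x) - mean_y) < 1e-9)   # True: passes through the means
```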
21. Regression Line Formula
- $Y = a + bX$
- Y = the score on the dependent variable
- X = the score on the independent variable
- a = the Y intercept
- the point where the regression line crosses the Y axis
- b = the slope of the regression line
- SLOPE = the amount of change produced in Y by a unit change in X or,
- a measure of the effect of the X variable on the Y variable
22. Regression Line Formula
- $Y = a + bX$
- y-intercept (a) = 102
- slope (b) = .9
- $Y = 102 + (.9)X$
- This information can be used to predict weight from height.
- Example: What is the predicted weight of a male who is 70" tall (5'10")?
- $Y = 102 + (.9)(70) = 102 + 63$
- = 165 pounds
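The height-to-weight prediction can be reproduced with a one-line function. The name `predict` is an illustrative choice. The intercept, slope, and height are the values given on the slide.

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted Y from the regression line Y = a + bX."""
    return a + b * x


# Slide values: a = 102, b = .9, height X = 70 inches.
weight = predict(102, 0.9, 70)
print(round(weight, 1))  # 165.0

# The slope is the change in predicted Y for a one-unit change in X.
print(round(predict(102, 0.9, 71) - weight, 2))  # 0.9
```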
23. Example 2: Examining the link between hours of daily TV watching (X) & # of cans of soda consumed per day (Y)
Case   Hours TV / Day (X)   Cans Soda Per Day (Y)
1      1                    2
2      3                    6
3      2                    3
4      2                    4
5      1                    1
6      4                    6
7      6                    7
8      4                    2
9      4                    5
10     2                    0
24. Example 2
- Example 2: Examining the link between hours of daily TV watching (X) & # of cans of soda consumed per day (Y)
- The regression line for this problem:
- $Y = 0.7 + .99X$
- If a person watches 3 hours of TV per day, how many cans of soda would he be expected to consume according to the regression equation?
- $Y = .7 + .99(3) = 3.67$
25. The Slope (b): A Strength & A Weakness
- We know that b indicates the change in Y for a unit change in X, but b is not really a good measure of strength
- Weakness
- It is unbounded (can be > 1 or < -1), making it hard to interpret
- The size of b is influenced by the scale that each variable is measured on
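The scale problem can be seen in a short sketch: re-expressing X in minutes instead of hours shrinks b by a factor of 60, while Pearson's r stays the same. The data and the helper `slope_and_r` are hypothetical illustrations, not the Example 2 data.

```python
import math


def slope_and_r(xs, ys):
    """Least-squares slope b and Pearson's r for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
hours = [1, 2, 3, 4, 5]
cans = [2, 4, 5, 4, 5]
minutes = [60 * h for h in hours]  # same information, different scale

b_hours, r_hours = slope_and_r(hours, cans)
b_minutes, r_minutes = slope_and_r(minutes, cans)

print(round(b_hours, 3), round(r_hours, 3))      # 0.6 0.775
print(round(b_minutes, 3), round(r_minutes, 3))  # 0.01 0.775
```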
26. Pearson's r Correlation Coefficient
- By contrast, Pearson's r is bounded
- a value of 0.0 indicates no linear relationship and a value of ±1.00 indicates a perfect linear relationship
27. Pearson's r
- $Y = 0.7 + .99X$
- $s_x = 1.51$
- $s_y = 2.24$
- Converting the slope to a Pearson's r correlation coefficient
- Formula: $r = b(s_x / s_y)$
- $r = .99(1.51 / 2.24)$
- $r = .67$
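The slope-to-r conversion can be sketched as below, using the slope and standard deviations given on the slide. The function name `r_from_slope` is an illustrative choice. Multiplying by s_x / s_y re-expresses the slope in standard deviation units, which is what keeps r between -1 and +1.

```python
def r_from_slope(b: float, s_x: float, s_y: float) -> float:
    """Pearson's r from the regression slope: r = b * (s_x / s_y)."""
    return b * (s_x / s_y)


# Slide values: b = .99, s_x = 1.51, s_y = 2.24.
r = r_from_slope(0.99, 1.51, 2.24)
print(round(r, 2))  # 0.67
```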
28. The Coefficient of Determination
- The interpretation of Pearson's r (like Cramér's V) is not straightforward
- What is a "strong" or "weak" correlation?
- Subjective
- The coefficient of determination (r²) is a more direct way to interpret the association between 2 variables
- r² represents the amount of variation in Y explained by X
- You can interpret r² with PRE logic
- predict Y while ignoring info. supplied by X
- then account for X when predicting Y
29. Coefficient of Determination Example
- Without info about X (hours of daily TV watching), the best predictor we have is the mean # of cans of soda consumed (mean of Y)
- The green line (the regression line) is what we would predict WITH info about X
30. Coefficient of Determination
- Conceptually, the formula for r² is
- $r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$
- The proportion of the total variation in Y that is attributable to or explained by X.
- The variation not explained by X is called the unexplained variation
- Usually attributed to measurement error, random chance, or some combination of other variables
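Written out as sums of squares, the conceptual formula looks like the sketch below. The symbols $\hat{Y}$ (the score predicted by the regression line), $\bar{Y}$ (the mean of Y), and the labels $E_1$ and $E_2$ (errors made without and with X) are notation assumed here, not taken from the slides.

```latex
\begin{align*}
\underbrace{\sum (Y - \bar{Y})^2}_{\text{total } (E_1)}
  &= \underbrace{\sum (\hat{Y} - \bar{Y})^2}_{\text{explained}}
   + \underbrace{\sum (Y - \hat{Y})^2}_{\text{unexplained } (E_2)} \\[4pt]
r^2 &= \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}
     = \frac{E_1 - E_2}{E_1}
\end{align*}
```

The last form has the same shape as any other PRE measure: errors made ignoring X, minus errors made using X, as a proportion of the errors made ignoring X.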
31. Coefficient of Determination
- Interpreting the meaning of the coefficient of determination in the example
- Squaring Pearson's r (.67) gives us an r² of .45
- Interpretation
- The # of hours of daily TV watching (X) explains 45% of the total variation in soda consumed (Y)
32. Another Example: Relationship between Mobility Rate (x) & Divorce Rate (y)
- The formula for this regression line is
- $Y = -2.5 + (.17)X$
- 1) What is this slope telling you?
- 2) Using this formula, if the mobility rate for a given state was 45, what would you predict the divorce rate to be?
- 3) The standard deviation (s) for x = 6.57 & the s for y = 1.29. Use this info to calculate Pearson's r. How would you interpret this correlation?
- 4) Calculate & interpret the coefficient of determination (r²)
33. Another Example: Relationship between Mobility Rate (x) & Divorce Rate (y)
- The formula for this regression line is
- $Y = -2.5 + (.17)X$
- 1) What is this slope telling you?
- For every one-unit increase in x (mobility rate), divorce rate (y) goes up .17
- 2) Using this formula, if the mobility rate for a given state was 45, what would you predict the divorce rate to be?
- $Y = -2.5 + (.17)(45) = 5.15$
- 3) The standard deviation (s) for x = 6.57 & the s for y = 1.29. Use this info to calculate Pearson's r. How would you interpret this correlation?
- $r = .17(6.57 / 1.29) = .17(5.093) = .866$
- There is a strong positive association between mobility rate & divorce rate.
- 4) Calculate & interpret the coefficient of determination (r²)
- $r^2 = (.866)^2 = .75$
- A state's mobility rate explains 75% of the variation in its divorce rate.
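The four answers can be re-derived in a few lines as a check on the arithmetic. All inputs are the values given in the example. The variable names are illustrative.

```python
# Values given in the example.
a, b = -2.5, 0.17          # intercept and slope of the regression line
s_x, s_y = 6.57, 1.29      # standard deviations of mobility and divorce rates

predicted_divorce = a + b * 45   # 2) prediction for a mobility rate of 45
r = b * (s_x / s_y)              # 3) Pearson's r from the slope
r_squared = r ** 2               # 4) coefficient of determination

print(round(predicted_divorce, 2))  # 5.15
print(round(r, 3))                  # 0.866
print(round(r_squared, 2))          # 0.75
```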