Title: Correlation Vs. Causation
1Correlation Vs. Causation
2Cautions about Correlation and Regression
- Correlation and regression describe ONLY
linear relationships
- r and the least-squares line are NOT resistant
- Extreme values and influential points can have
a large effect
- Plot your scatter plot FIRST!
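To see what "NOT resistant" means in practice, here is a minimal sketch using made-up numbers (the data and the helper name `pearson_r` are assumptions for illustration, not from the slides). It computes r for a perfectly linear data set and then again after adding a single extreme point.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfectly linear pattern (y = 2x).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 6, 8, 10, 12, 14, 16]
r_clean = pearson_r(x, y)

# Same data plus ONE extreme point far from the pattern.
x_out = x + [20]
y_out = y + [0]
r_with_outlier = pearson_r(x_out, y_out)

print("r without the extreme point:", round(r_clean, 3))
print("r with the extreme point:   ", round(r_with_outlier, 3))
```

With these made-up values, one point is enough to turn a perfect positive correlation into a weak negative one, which is why the scatter plot should come first.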
3Extrapolation
- Predicting y for x values outside the range of
your data (Extrapolation)
- You SHOULD remain within the domain of your data
- Or very close to it
- Predictions Outside your domain are often VERY
inaccurate
Consider the least-squares regression equation
obtained for a young child's height in feet (y)
compared to her age in years (x).
Assuming the girl will live to be 52, predict her
height at this ripe old age.
Predicted height: 10 feet tall
Obviously, people don't continue to grow
throughout their lives.
Just remember to be careful when extrapolating!
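The same trap can be sketched in a few lines of code. The ages and heights below are hypothetical (they are NOT the data behind the slide's equation), and `least_squares_line` and `predict_height` are names made up for this sketch. The line is fitted on ages 2 to 8 and then asked about age 52.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical measurements for one child (illustration only).
ages_years = [2, 3, 4, 5, 6, 7, 8]
heights_feet = [2.8, 3.1, 3.3, 3.6, 3.8, 4.0, 4.2]

intercept, slope = least_squares_line(ages_years, heights_feet)

def predict_height(age):
    """Predicted height in feet from the fitted line."""
    return intercept + slope * age

inside = predict_height(5)    # age 5 is inside the observed range
outside = predict_height(52)  # age 52 is far outside it

print("Predicted height at age 5: ", round(inside, 2), "feet")
print("Predicted height at age 52:", round(outside, 2), "feet")
```

The prediction inside the data range is sensible. The one at age 52 comes out well over 10 feet, because the line knows nothing about growth stopping.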
4Lurking Variables
- Lurking Variable
- A variable not in your study that can (and
probably does) affect the interpretation of the
relationship between your two measured variables
- Often accounts for the variation that r² leaves
unexplained
- May be hidden
- Can create a strong or weak relationship that
isn't real
- Dangerous to data and interpretations
What do I do about them?
Try to identify them BEFORE the study
Use a residual plot with time as your x to try to
identify potential effects
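The residual-versus-time check can also be sketched numerically. In the made-up example below, the response secretly drifts upward over time (the lurking variable), and the correlation between the residuals and the time order stands in for eyeballing the residual plot. All data and function names are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Hypothetical study: observations listed in the order they were taken.
time_order = list(range(10))
x = [5, 1, 8, 3, 9, 2, 7, 4, 6, 0]
# The response depends on x AND drifts upward over time (lurking variable).
y = [2 * xi + 0.5 * t for xi, t in zip(x, time_order)]

intercept, slope = least_squares_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# A clear pattern in residuals vs. time hints at a time-related lurking variable.
r_resid_time = pearson_r(time_order, residuals)
print("Correlation of residuals with time order:", round(r_resid_time, 3))
```

A value near zero would suggest no time trend in the residuals. A value near 1 or -1, as in this constructed case, says something related to time is affecting the response.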
5Should I use Averaged Data?
- Averaged data is okay, BUT
- It shouldn't really be used to predict or
interpret for INDIVIDUALS
- Correlations based on averaged data are often too
high when applied to individuals
- Averaged data should be used to make predictions
about averages
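The caution about averaged data can be checked with a small made-up example. Below, nine hypothetical individuals fall into three groups. The sketch compares r computed on the individuals with r computed on the three group averages (the data, group labels, and `pearson_r` helper are all assumptions for illustration).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical individuals in three groups: (x values, y values).
groups = {
    "A": ([1, 2, 3], [3, 1, 2]),
    "B": ([4, 5, 6], [6, 4, 5]),
    "C": ([7, 8, 9], [9, 7, 8]),
}

# Correlation using every individual.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
r_individuals = pearson_r(all_x, all_y)

# Correlation using only the group averages.
mean_x = [sum(xs) / len(xs) for xs, _ in groups.values()]
mean_y = [sum(ys) / len(ys) for _, ys in groups.values()]
r_averages = pearson_r(mean_x, mean_y)

print("r for individuals:   ", round(r_individuals, 3))
print("r for group averages:", round(r_averages, 3))
```

Averaging hides the scatter inside each group, so the averages line up perfectly here while the individuals do not.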
So What Do I Need to Do?
- Pay attention to the WHOLE Situation
- Look at the Data (Contextually)
- Look for Possible Lurking Variables
- Make sure to DOUBLE CHECK any Contextual
Inferences you make!!
6Causation
- r and r², our regression statistics, describe
an association between 2 variables.
- But does this association mean that the
explanatory variable CAUSES the response
variable?
- An obvious example comes from a real study that
found the association described below
An actual study performed over a one year time
span found a statistically strong relationship
between the number of ice cream cones sold in a
month and the number of homicides in the same
month.
While there appeared to be a statistical
association between these two variables, we know
that it would be incorrect to say that the number
of ice cream cones sold CAUSES the number of
homicides.
This is where a LURKING variable comes into play
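Here is a rough simulation of how a lurking variable can manufacture an association. It ASSUMES, purely for illustration, that the lurking variable is temperature and that it drives both counts. The numbers are simulated, not the study's data, and the function names are made up for this sketch.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals_after_fit(xs, ys):
    """Residuals of ys after a least-squares fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000                # simulated months (hypothetical)

# Assumed lurking variable: temperature drives BOTH counts.
temperature = [rng.uniform(0, 30) for _ in range(n)]
ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]
homicides = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Raw association between the two measured variables.
r_raw = pearson_r(ice_cream, homicides)

# Association left over once temperature is accounted for.
r_adjusted = pearson_r(residuals_after_fit(temperature, ice_cream),
                       residuals_after_fit(temperature, homicides))

print("r (ice cream vs. homicides):      ", round(r_raw, 3))
print("r after accounting for temperature:", round(r_adjusted, 3))
```

In this simulation neither variable affects the other, yet the raw r is strong. Once the assumed lurking variable is accounted for, the leftover association is close to zero.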
7Causation (visually)
- Below are three visual examples of different
situations and underlying variables that can
explain an association
Legend: dotted line = association; arrow = causal
relationship
Causation doesn't mean there aren't other factors
that affect the result, just that the response is
directly caused by the explanatory variable
[Diagrams: Causation, Common Response (lurking
variable), Confounding]
8Causation (direct)
- Let's look at situations where direct causation
occurs
- A study recorded the heights of young males
(between the ages of 12 and 15) and their
fathers. The study found an association between
the two heights with an r² of about 25%.
A study performed on a number of lab rats found
an association between the number of ounces of
battery acid eaten and the thickness level of the
stomach lining.
There is a direct causal relationship between the
height of a father and that of his son through
heredity. It is possible to have direct
causation with a low r²; it just says that the
father's height explains only about 25% of the
variation in the son's height.
While there is a direct causal link between the
ounces of battery acid eaten and the thickness of
the rat's stomach lining, this is an example of a
situation that you can't generalize to all cases,
i.e., the effect might not be the same for humans.
9Common Response (lurking variable)
- Let's look at situations where there is a
lurking variable
- An actual study performed over a one-year time
span found a statistically strong relationship
between the number of ice cream cones sold in a
month and the number of homicides in the
same month
- Earlier we found a fairly good association
between the number of TVs that a person owns and
their life expectancy.
The MORAL: Association doesn't mean CAUSATION
While this study may show an association between
the two, we know that there are many other
lurking variables that can have an effect on
life expectancy and the number of TVs you own.
(DISCUSSION!!)
10Confounding
Two variables are confounded when you can't
tell which variable is affecting the response
Mr. Arnold and Mr. Reed have been selected to
compare the effectiveness of two well-known
laundry detergents, PRIDE and NONE. Each takes
his detergent home, washes his clothes, and then
submits them to a panel of judges. It is found
that PRIDE is the better detergent because
Mr. Reed's clothes are cleaner.
The MORAL: Association doesn't mean CAUSATION
While we can say that the detergent had an effect
on the cleanliness of their clothes, there are
other factors that could equally have affected
the outcome: washer quality, water quality,
laundry cycle, etc. When we can't tell whether
the lurking variables or the explanatory variable
had the effect, the variables are CONFOUNDED.
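A toy model makes the detergent problem concrete. It ASSUMES that Mr. Reed used PRIDE and that each man used his own washer, so detergent and washer always change together. The scoring function, the baseline score, and the effect sizes are all invented for illustration.

```python
def cleanliness(detergent_effect, washer_effect, uses_pride, uses_better_washer):
    """Hypothetical cleanliness score: a baseline plus whichever effects apply."""
    baseline = 5.0
    return (baseline
            + detergent_effect * uses_pride
            + washer_effect * uses_better_washer)

def observed_gap(detergent_effect, washer_effect):
    """Difference the judges would see between Mr. Reed and Mr. Arnold."""
    # Assumed setup: Mr. Reed has PRIDE and (possibly) the better washer.
    reed = cleanliness(detergent_effect, washer_effect, 1, 1)
    # Mr. Arnold has NONE and the other washer.
    arnold = cleanliness(detergent_effect, washer_effect, 0, 0)
    return reed - arnold

# World 1: the detergent does all the work, the washer does nothing.
gap_if_detergent_matters = observed_gap(detergent_effect=3.0, washer_effect=0.0)

# World 2: the detergent does nothing, the washer does all the work.
gap_if_washer_matters = observed_gap(detergent_effect=0.0, washer_effect=3.0)

print("Gap if only the detergent matters:", gap_if_detergent_matters)
print("Gap if only the washer matters:   ", gap_if_washer_matters)
```

Both worlds give the judges exactly the same result, so this study design cannot tell them apart. Separating the two effects would require varying the detergent while holding the washer fixed.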
11So When Can I say CAUSE?
- Remember, even HIGH correlation doesn't mean
CAUSATION
- When can I say it?
12Moral Of the Story
- Correlation and association don't mean
CAUSATION
- Really examine the CONTEXT of your data
- Don't just look at the numbers
13Homework
- 38-48,72,73,75
- Multiple Choice Test Next Class: 3.1-3.3, 4.1, 4.2