Title: LSP 121
1 LSP 121
- Introduction to Correlation
2 Correlation
- The news is filled with examples of correlation
  - If you eat so many helpings of tomatoes
  - One alcoholic beverage a day
  - Driving faster than the speed limit
  - Women who smoke during pregnancy
- Often, we can quantify correlation
3 How Do You Calculate Correlation in Excel?
- Make an XY scatterplot of the data, putting one variable on the x-axis and one variable on the y-axis.
  - Select the two columns you wish to graph
  - Choose Insert -> Scatter
- Insert a linear trendline on the graph and include the R² value
  - Click one of the data points on the chart
  - Right-click, choose Add Trendline
  - Check boxes/buttons for Linear, Display Equation, Display R²
- Interpret the results
- Try it with CigarettesBirthweight.xls
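The trendline equation and R² value that Excel displays can also be computed by hand. The sketch below is one way to do it in plain Python with a least-squares fit. The function name `linear_trendline` and the data values are hypothetical, made up for illustration; they are not taken from CigarettesBirthweight.xls.

```python
def linear_trendline(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation) / (total variation)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical example data (NOT the contents of CigarettesBirthweight.xls)
smokes_per_day = [0, 5, 10, 15, 20, 25]
birth_weight_lb = [7.9, 7.6, 7.1, 7.0, 6.4, 6.3]

slope, intercept, r_squared = linear_trendline(smokes_per_day, birth_weight_lb)
print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r_squared:.3f}")
```

With this made-up data the slope comes out negative (birth weight falls as smokes/day rises) and R² is close to 1.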
4 Smokes/day and Birth Weight
5 Interpreting the Results
- The higher the R² value, the greater the likelihood that there is a correlation
- Crude estimate:
  - R² > 0.5: most people say there is a correlation
  - R² < 0.3: most say correlation is essentially non-existent
  - R² between 0.3 and 0.5: gray area; further analysis is needed
- If you only have a few data points, then you need a higher R² value in order to decide whether or not there is a correlation
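The crude rule of thumb above can be written as a small function. This is only a sketch: the function name `interpret_r_squared` is hypothetical, and treating exactly 0.3 and exactly 0.5 as part of the gray area is an assumption, since the slide does not say which side those values fall on.

```python
def interpret_r_squared(r_squared):
    """Apply the crude R^2 rule of thumb from this slide."""
    if r_squared > 0.5:
        return "correlation"
    if r_squared < 0.3:
        return "essentially no correlation"
    # Assumption: the boundary values 0.3 and 0.5 count as gray area
    return "gray area: further analysis needed"


for value in (0.74, 0.41, 0.08):
    print(value, "->", interpret_r_squared(value))
```

Remember that the rule does not account for the number of data points; with only a few points, a higher R² is needed before deciding.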
6 Examples: Are they correlated?
- Look at
  - CigarettesBirthweight.xls
  - SpeedLimits.xls (under Older Data)
  - HeightWeight.xls
  - Grades.xls (under Older Data)
  - WineConsumption.xls (under Older Data)
  - BreastCancerTemperature.xls
7 How Do We Calculate Correlation in SPSS/PASW?
- In SPSS, click on Analyze -> Correlate -> Bivariate
- Select the two columns of data you want to analyze (move them from the left box to the right box)
- You can actually pick more than two columns, but we'll keep it simple for now
8 How Do We Calculate Correlation in SPSS/PASW?
- Make sure the checkbox for Pearson Correlation Coefficients is checked
- Click OK to run the correlation
- You should get an output window something like the one on the following slide
9 The correlation between height and weight is 0.861
The Pearson Correlation value is not the same as Excel's R-squared value: it can be positive or negative
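For a straight-line trendline, Excel's R² is the square of the Pearson value, which is why R² loses the sign. The sketch below computes Pearson's r in plain Python and squares it. The function name `pearson_r` and the TV/grades numbers are hypothetical, chosen only to show a negative r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours of TV per day and class grade
hours_tv = [1, 2, 3, 4, 5]
grade = [90, 85, 80, 70, 65]

r = pearson_r(hours_tv, grade)
print(f"Pearson r = {r:.3f}")      # negative: grades fall as TV hours rise
print(f"r squared = {r ** 2:.3f}")  # always between 0 and 1, sign is lost
```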
10 Positive and Negative Correlation
- Positive correlation: as the values of one variable increase, the values of a second variable increase (values from 0 to 1.0)
- Negative correlation: as the values of one variable increase, the values of a second variable decrease (values from 0 to -1.0)
11 Positive vs. Negative Correlation
- There is a negative correlation between TV viewing and class grades: students who spend more time watching TV tend to have lower grades (or, students with higher grades tend to spend less time watching TV).
- There is a negative correlation between exercise and heart disease
- There is a positive correlation between exercise and self-esteem
12 Positive and Negative Correlation on a graph
[Figure: two graphs, one labeled "Positive correlation" and one labeled "Negative correlation"]
13 How would you classify these correlations?
[Figure: three graphs, labeled "Negative correlation", "Positive correlation", and "NO correlation"]
14 Positive and Negative Correlation
- When looking for correlation, positive correlation is not necessarily greater than negative correlation
- Which correlation is the greatest?
  - -.34, .72, -.81, .40, -.12
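The strength of a correlation is judged by how far the value is from 0, regardless of sign. A minimal sketch of that comparison, using the five values from the slide (the variable names are just for illustration):

```python
correlations = [-0.34, 0.72, -0.81, 0.40, -0.12]

# Compare by absolute value: the sign gives direction, not strength
strongest = max(correlations, key=abs)
print(strongest)  # -0.81
```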
15 Correlation vs. Causation
- Correlation: two concepts are related in some way.
- Causation: changing one of the factors also causes a change in the other factor.
- e.g., smoking and cancer are correlated. They also have a causal relationship.
  - If you do something to increase smoking, you increase the chance of cancer
- e.g., ice cream sales and crime rates also have a correlation. However, they do NOT have a causal relationship. (Can you think why they are correlated?)
  - If you do something to increase ice cream sales, you do not see an increase in crime
16 What Can We Conclude?
- If two variables are correlated, then we can predict one based on the other
- But correlation does NOT imply causation!
- It might be the case that having more education causes a person to earn a higher income. It might be the case that having higher income allows a person to go to school more. There could also be a third variable. Or a fourth. Or a fifth.
17 Causation (aka Causality)
- Causation: one variable, A, actually causes a change in another variable, B.
- Here are some examples of correlations that also have causality
  - Increased smoking -> increased likelihood of lung cancer
  - Increased exercise -> decreased likelihood of heart disease
- Key point: many, many, many things in life have correlations. But this does not mean that they have causation.
  - See next slide
18 Correlation does NOT imply causation!
- OFTEN (very often!), two items that are correlated are falsely assumed to have a causal relationship.
- Usually, the reason for falsely assuming causation is the presence of a common underlying factor. That is, A may be correlated with B, but this is due to some other factor, C.
- Example: none of these three correlations has a causal relationship. Can you identify the other factor?
  - As ice cream sales go up, so do crime rates
    - Summer! Crime tends to go up in the summer. Not surprisingly, more people buy ice cream in the summer as well.
  - People who wear top-hats live longer (an actual study from the Victorian era)
    - Income. Wealthier people wear top hats and can also afford better health care, medicines, doctors, etc.
  - Hormone therapy for breast cancer decreases likelihood of heart disease
    - As with the previous example: socioeconomic status. Hormone therapy in and of itself increases the likelihood of heart disease! However, people who are wealthier are more likely to have better general medical care, resulting in early detection of breast cancer, proper treatments, etc. For the same reason, they are also more likely to be more educated about heart disease (eat better, exercise more, smoke less, etc.). So even though hormone therapy increases the likelihood of heart disease, on the whole, the majority of people on this therapy tend to have less heart disease.
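The "common underlying factor" idea can be illustrated with a small simulation. This is a sketch under made-up assumptions, not real data: a hypothetical factor C (temperature) drives both ice cream sales and crime, and the two never influence each other. The function names, the multipliers 2 and 3, and the random seed are all arbitrary choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Simulated (made-up) data: C affects A and B, but A and B do not affect each other
temperature = [rng.gauss(0, 1) for _ in range(n)]            # C
ice_cream = [2 * t + rng.gauss(0, 1) for t in temperature]   # A
crime = [3 * t + rng.gauss(0, 1) for t in temperature]       # B

r_raw = pearson_r(ice_cream, crime)
# Remove the part of each variable explained by temperature, then correlate the rest
r_controlled = pearson_r(residuals(ice_cream, temperature),
                         residuals(crime, temperature))

print(f"ice cream vs crime:              r = {r_raw:.2f}")
print(f"after accounting for temperature: r = {r_controlled:.2f}")
```

In this simulation the raw correlation is strong, but it nearly vanishes once the common factor is accounted for.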
19 Causation or not?
- What do you think of this example?
  - Studies have demonstrated a clear correlation between ease of faculty grading and faculty evaluations. That is, faculty who teach less challenging courses routinely receive better evaluations.
20 Correlation vs. Causation
- Do not confuse correlation with causation.
- Just because two things are correlated (e.g. height and weight) does not mean that there is a causal relationship.
  - In other words, it does not mean that making a change in A will predictably cause a change in B
  - Giving somebody a top-hat will not make them live longer (see next slide).
  - This is an example of where there is a correlation, but there is not causation.
- Very important point: expect 1-2 exam questions on this idea!
21 What Can We Conclude?
- Sheer coincidence: the two variables have nothing in common, but they create a strong R or R² value
- Both variables are changing over time: divorce rates are going up and so are drug offenses. Is an increase in divorce causing more people to use drugs (and get caught)?
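Two quantities that both rise over time will show a strong R even if they are unrelated. The sketch below simulates that with made-up numbers: two independent series that each drift upward. Comparing period-to-period changes instead of raw levels is one common check (an assumption here, not something from the slides). The names, slopes, and seed are arbitrary.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def changes(series):
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(1)  # fixed seed so the run is repeatable
periods = range(2000)

# Simulated (made-up) series: each trends upward on its own, with independent noise
divorce_rate = [0.5 * t + rng.gauss(0, 5) for t in periods]
drug_offenses = [0.3 * t + rng.gauss(0, 5) for t in periods]

r_levels = pearson_r(divorce_rate, drug_offenses)
r_changes = pearson_r(changes(divorce_rate), changes(drug_offenses))

print(f"raw values:               r = {r_levels:.2f}")
print(f"period-to-period changes: r = {r_changes:.2f}")
```

Here the raw values are almost perfectly correlated only because both increase with time; the changes from one period to the next show essentially no relationship.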