Title: Describing Relationships: Regression, Prediction, and Causation
Chapter 15
- Describing Relationships: Regression, Prediction, and Causation
Thought Question 1
Suppose you were to make a scatterplot of (adult)
sons' heights versus fathers' heights, by
collecting data on both from several of your male
friends. You would now like to predict how tall
your nephew will be when he grows up, on the
basis of his father's height. Could you use your
scatterplot to help you make this prediction?
Explain.
Thought Question 2
A strong positive correlation has been found in a
certain city in the northeastern United States
between weekly sales of hot chocolate and weekly
sales of facial tissues. Would you interpret
that to mean that hot chocolate causes people to
need facial tissues? Explain.
Thought Question 3
Researchers have shown that there is a positive
correlation between the average fat intake and
the breast cancer rate across countries. In
other words, countries with higher fat intake
tend to have higher breast cancer rates. Does
this correlation provide evidence that dietary
fat is a contributing cause of breast cancer?
Explain.
Thought Question 4
If you were to draw a scatterplot of number of
women in the work force versus number of Christmas
trees sold in the United States for each year
between 1930 and the present, you would find a
very strong positive correlation. Why do you
think this would be true? Does one cause the
other?
Linear Regression
- Objective: To quantify the linear relationship
between an explanatory variable and response
variable. We can then predict the average
response for all subjects with a given value of
the explanatory variable.
- Regression equation: y = a + bx
- x is the value of the explanatory variable
- y is the average value of the response variable
- note that a and b are just the intercept and
slope of a straight line
- note that r and b are not the same thing, but
their signs will agree
Least Squares
- Used to determine the best line
- We want the line to be as close as possible to
the data points in the vertical (y) direction
(since that is what we are trying to predict)
- Least squares: use the line that minimizes the
sum of the squares of the vertical distances of
the data points from the line
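The least squares idea can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce any figure in these slides: the function names and the five data points are made up for the example, and the slope and intercept use the standard closed-form formulas for a line y = a + bx.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the
    sum of squared vertical distances to the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and sum of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, a, b)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, SSE = {best_sse:.2f}")

# Any other line has a larger sum of squared vertical distances
shifted_sse = sum_squared_errors(xs, ys, a + 0.1, b)
print(f"SSE after shifting the line up by 0.1: {shifted_sse:.2f}")
```

Shifting or tilting the fitted line in any direction increases the sum of squared vertical distances, which is what "best line" means here.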
Prediction via Regression Line: Husband and Wife
Ages
Hand et al., A Handbook of Small Data Sets,
London: Chapman and Hall
- The regression equation is y = 3.6 + 0.97x
- y is the average age of all husbands who have
wives of age x
- For all women aged 30, we predict the average
husband age to be 32.7 years
- 3.6 + (0.97)(30) = 32.7 years
- Suppose we know that an individual wife's age is
30. What would we predict her husband's age to
be?
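The arithmetic on this slide can be checked with a short sketch. The function name is hypothetical; the intercept 3.6 and slope 0.97 are the values quoted above. Note that the equation returns the same number whether we ask about the average husband of all 30-year-old wives or about one particular husband; the difference is that an individual's actual age will typically fall further from the prediction than the group average does.

```python
def predict_husband_age(wife_age, intercept=3.6, slope=0.97):
    """Predicted (average) husband age from y = 3.6 + 0.97x,
    the regression equation quoted on the slide."""
    return intercept + slope * wife_age


predicted = predict_husband_age(30)
print(round(predicted, 1))  # prints 32.7
```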
Coefficient of Determination (R^2)
- Measures usefulness of regression prediction
- R^2 (or r^2, the square of the correlation)
measures the proportion of the variation in the
values of the response variable (y) that is
explained by the regression line
- r = 1: R^2 = 1; regression line explains all (100%)
of the variation in y
- r = 0.7: R^2 = 0.49; regression line explains almost
half (about 50%) of the variation in y
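To make "variation explained" concrete, the sketch below computes R^2 two ways on a small made-up data set: as the square of the correlation r, and as 1 minus the ratio of leftover variation (squared vertical distances from the line) to total variation in y. The function names and data are assumptions for illustration only.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_from_fit(xs, ys):
    """R^2 = 1 - (variation left around the least squares line)
                 / (total variation in y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total


# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r2 = r_squared_from_fit(xs, ys)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 from the fit = {r2:.3f}")
```

For a straight-line fit with one explanatory variable, the two calculations agree, which is why the slide treats R^2 and r^2 as the same quantity.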
A Caution: Beware of Extrapolation
- Sarah's height was plotted against her age
- Can you predict her height at age 42 months?
- Can you predict her height at age 30 years (360
months)?
A Caution: Beware of Extrapolation
- Regression line: y = 71.95 + 0.383x
- height at age 42 months? y = 88
- height at age 30 years? y = 209.8
- She is predicted to be 6' 10.5" at age 30.
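The two predictions can be reproduced with a short sketch. The function names are hypothetical, and the conversion assumes the heights are in centimeters; the slide does not state the unit, but a height of about 6 ft 10.5 in for y = 209.8 is consistent with centimeters.

```python
CM_PER_INCH = 2.54  # assumes y is measured in centimeters (not stated on the slide)


def predicted_height(age_months):
    """Predicted height from the slide's regression line y = 71.95 + 0.383x."""
    return 71.95 + 0.383 * age_months


def cm_to_feet_inches(cm):
    """Convert centimeters to (whole feet, remaining inches)."""
    total_inches = cm / CM_PER_INCH
    feet = int(total_inches // 12)
    inches = total_inches - 12 * feet
    return feet, inches


for age in (42, 360):
    height = predicted_height(age)
    feet, inches = cm_to_feet_inches(height)
    print(f"age {age} months: {height:.1f} cm, about {feet} ft {inches:.1f} in")
```

The equation happily produces a number for any age, including 360 months. Nothing in the arithmetic warns that the line was fitted to a child's growth and says nothing reliable about adult height.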
Correlation Does Not Imply Causation
- Even very strong correlations may not correspond
to a real causal relationship.
Evidence of Causation
- A properly conducted experiment establishes the
connection
- Other considerations:
- A reasonable explanation for a cause and effect
exists
- The connection happens in repeated trials
- The connection happens under varying conditions
- Potential confounding factors are ruled out
- Alleged cause precedes the effect in time
Reasons Two Variables May Be Related (Correlated)
- Explanatory variable causes change in response
variable
- Response variable causes change in explanatory
variable
- Explanatory variable may be a cause, but is not the
sole cause of changes in the response variable
- Confounding variables may exist
- Both variables may result from a common cause
- for example, both variables changing over time
- The correlation may be merely a coincidence
Explanatory causes Response
- Explanatory: pollen count from grasses
- Response: percentage of people suffering from
allergy symptoms
- Explanatory: amount of food eaten
- Response: hunger level
Response causes Explanatory
- Explanatory: divorce among men
- Response: percent abusing alcohol
- Conclusion was that getting divorced caused
alcohol abuse in men.
- Could it be that alcohol abuse
caused divorce?
Explanatory is not Sole Contributor
- Explanatory: possession of gun in home
- Response: occurrence of a homicide
- tendency toward violence may be
another contributor
Confounding Variables
Case Study: Meditation vs. Aging
- Explanatory: meditation
- Response: aging (measurable aging factor)
- general concern for one's well-being may be
confounded with the decision to try meditation
Common Response (both variables change due to
common cause)
- Explanatory: divorce among men
- Response: percent abusing alcohol
- Both may result from an unhappy
marriage.
Both Variables are Changing Over Time
- Both divorces and suicides have increased
dramatically since 1900.
- Are divorces causing suicides?
- Are suicides causing divorces?
- The population has increased dramatically since
1900 (causing both to increase).
- Better to investigate: has the rate of divorce
or the rate of suicide changed over time?
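The difference between comparing counts and comparing rates can be illustrated with entirely synthetic numbers. In the sketch below, the population grows steadily while two per-person rates (labeled A and B) merely wobble around fixed values, unrelated to each other. Every figure here is hypothetical and chosen only for illustration; none of it is real divorce or suicide data.

```python
from math import cos, sin, sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic, hypothetical data: 100 "years" of a steadily growing population
years = range(100)
population = [1000 + 100 * t for t in years]

# Two per-person rates that fluctuate around constants, with no trend
rate_a = [0.010 * (1 + 0.1 * sin(t)) for t in years]
rate_b = [0.002 * (1 + 0.1 * cos(1.7 * t)) for t in years]

# Yearly counts are rate times population, so both inherit the population trend
count_a = [p * r for p, r in zip(population, rate_a)]
count_b = [p * r for p, r in zip(population, rate_b)]

r_counts = pearson_r(count_a, count_b)
r_rates = pearson_r(rate_a, rate_b)
print(f"correlation of yearly counts: {r_counts:.2f}")
print(f"correlation of yearly rates:  {r_rates:.2f}")
```

The counts are strongly correlated only because both grow with the population; once each series is expressed as a rate, the apparent relationship largely disappears.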
The Relationship May Be Just a Coincidence
- We will see some strong correlations (or
apparent associations) just by chance, even when
the variables are not related in the population
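A small simulation gives a feel for how often chance alone produces a strong correlation. The sketch below draws many pairs of unrelated variables and records the strongest correlation found. The sample size (5 points), the number of pairs (200), and the random seed are arbitrary choices for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(0)   # fixed seed so the run is repeatable
n_points = 5     # small samples make chance correlations more likely
n_pairs = 200    # number of unrelated variable pairs to examine

strongest = 0.0
for _ in range(n_pairs):
    # xs and ys are generated independently, so they are unrelated by construction
    xs = [random.gauss(0, 1) for _ in range(n_points)]
    ys = [random.gauss(0, 1) for _ in range(n_points)]
    strongest = max(strongest, abs(pearson_r(xs, ys)))

print(f"strongest |r| among {n_pairs} unrelated pairs: {strongest:.2f}")
```

With samples this small, searching through enough unrelated pairs almost always turns up at least one correlation that looks impressive on its own.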
Coincidence (?)
Case Study: Vaccines and Brain Damage
- A required whooping cough vaccine was blamed for
seizures that caused brain damage
- led to reduced production of vaccine (due to
lawsuits)
- Study of 38,000 children found no evidence for
the accusations (reported in New York Times)
- people confused association with
cause-and-effect
- virtually every kid received the vaccine; it was
inevitable that, by chance, brain damage caused
by other factors would occasionally occur in a
recently vaccinated child
Case Study
Social Relationships and Health
House, J., Landis, K., and Umberson, D. "Social
Relationships and Health," Science, Vol. 241
(1988), pp. 540-545.
- Does lack of social relationships cause people to
become ill?
- Or, are unhealthy people less likely to establish
and maintain social relationships?
- Or, is there some other factor that predisposes
people both to have lower social activity and
become ill?
Key Concepts
- Least Squares Regression Equation
- R^2
- Correlation does not imply causation
- Confirming causation
- Reasons variables may be correlated